Chapter 3: ErgoTree Structure
PRE-ALPHA WARNING: This is a pre-alpha version of The Sigma Book. Content may be incomplete, inaccurate, or subject to change. Do not use as a source of truth. For authoritative information, consult the official repositories:
- sigmastate-interpreter — Reference Scala implementation
- sigma-rust — Rust implementation
- ergo — Ergo node
Prerequisites
- Binary representation concepts (bits, bytes, bitwise operations)
- Variable-Length Quantity (VLQ) encoding—a method for encoding integers using a variable number of bytes
- Understanding of Abstract Syntax Trees (ASTs) as hierarchical representations of code structure
- Prior chapters: Chapter 1 for the three-layer architecture, Chapter 2 for type codes used in serialization
Learning Objectives
By the end of this chapter, you will be able to:
- Parse and interpret ErgoTree header bytes, extracting version and feature flags
- Explain how constant segregation enables template sharing and caching optimizations
- Describe the version mechanism and how it enables soft-fork protocol upgrades
- Read and write the complete ErgoTree binary format
ErgoTree Overview
When you write an ErgoScript contract, the compiler transforms it into ErgoTree—a compact binary format designed for blockchain storage and deterministic execution. Every UTXO box contains an ErgoTree that defines its spending conditions.
ErgoTree is specifically designed to be12:
- Self-sufficient: Contains everything needed for evaluation (no external dependencies)
- Compact: Optimized binary encoding minimizes on-chain storage
- Forward-compatible: Version mechanism enables protocol upgrades without hard forks
- Deterministic: Same bytes always produce the same evaluation result
The structure consists of:
- Header byte — Format version and feature flags
- Size field (optional) — Total size for fast skipping
- Constants array (optional) — Extracted constants for template sharing
- Root expression — The actual script logic, returning
SigmaProp
const ErgoTree = struct {
header: HeaderType,
constants: []const Constant,
root: union(enum) {
parsed: SigmaPropValue,
unparsed: UnparsedTree,
},
proposition_bytes: ?[]const u8,
pub fn bytes(self: *ErgoTree, allocator: Allocator) ![]u8 {
if (self.proposition_bytes) |b| return b;
return try serialize(self, allocator);
}
pub fn bytesHex(self: *ErgoTree, allocator: Allocator) ![]u8 {
const b = try self.bytes(allocator);
return std.fmt.allocPrint(allocator, "{x}", .{b});
}
};
Header Format
The first byte uses a bit-field format34:
7 6 5 4 3 2 1 0
┌──┬──┬──┬──┬──┬──┬──┬──┐
│ │ │ │ │ │ │ │ │
└──┴──┴──┴──┴──┴──┴──┴──┘
│ │ │ │ │ └──┴──┴── Version (bits 0-2)
│ │ │ │ └─────────── Size flag (bit 3)
│ │ │ └────────────── Constant segregation (bit 4)
│ │ └───────────────── Reserved (bit 5, must be 0)
│ └──────────────────── Reserved for GZIP (bit 6, must be 0)
└─────────────────────── Extended header (bit 7)
const HeaderType = packed struct(u8) {
version: u3, // bits 0-2
has_size: bool, // bit 3
constant_segregation: bool, // bit 4
reserved1: bool = false, // bit 5
reserved_gzip: bool = false, // bit 6
multi_byte: bool = false, // bit 7
pub const VERSION_MASK: u8 = 0x07;
pub const SIZE_FLAG: u8 = 0x08;
pub const CONST_SEG_FLAG: u8 = 0x10;
pub fn fromByte(byte: u8) HeaderType {
return @bitCast(byte);
}
pub fn toByte(self: HeaderType) u8 {
return @bitCast(self);
}
pub fn v0(constant_segregation: bool) HeaderType {
return .{
.version = 0,
.has_size = false,
.constant_segregation = constant_segregation,
};
}
pub fn v1(constant_segregation: bool) HeaderType {
return .{
.version = 1,
.has_size = true, // Required for v1+
.constant_segregation = constant_segregation,
};
}
};
Common Header Values
| Byte | Binary | Meaning |
|---|---|---|
0x00 | 00000000 | v0, no segregation, no size |
0x08 | 00001000 | v0, no segregation, with size |
0x10 | 00010000 | v0, constant segregation, no size |
0x18 | 00011000 | v0, constant segregation, with size |
0x09 | 00001001 | v1, with size (required) |
0x19 | 00011001 | v1, constant segregation, with size |
Binary Format
┌──────────────────────────────────────────────────────────────────┐
│ ErgoTree │
├─────────┬─────────────┬──────────────────┬───────────────────────┤
│ Header │ [Size] │ [Constants] │ Root Expression │
│ 1 byte │ VLQ (opt) │ Array (opt) │ Serialized tree │
└─────────┴─────────────┴──────────────────┴───────────────────────┘
If header bit 3 is set (hasSize):
Size = VLQ-encoded size of (Constants + Root Expression)
If header bit 4 is set (isConstantSegregation):
Constants = VLQ count + Array of serialized constants
Root Expression = Serialized expression tree (SigmaPropValue)
const ErgoTreeSerializer = struct {
pub fn deserialize(reader: anytype) !ErgoTree {
// 1. Read header byte
const header = HeaderType.fromByte(try reader.readByte());
// 2. Read extended header if bit 7 set
if (header.multi_byte) {
// VLQ continuation - read additional bytes
_ = try readVlqExtension(reader);
}
// 3. Read size if flag set
var tree_size: ?u32 = null;
if (header.has_size) {
tree_size = try readVlq(reader);
}
// 4. Read constants if segregation enabled
var constants: []Constant = &.{};
if (header.constant_segregation) {
const count = try readVlq(reader);
// Bounds check: prevent DoS via excessive allocation
const MAX_CONSTANTS: u32 = 4096;
if (count > MAX_CONSTANTS) {
return error.TooManyConstants;
}
constants = try allocator.alloc(Constant, count);
for (constants) |*c| {
c.* = try Constant.deserialize(reader);
}
}
// NOTE: In production, use a pre-allocated pool instead of dynamic
// allocation during deserialization. See ZIGMA_STYLE.md.
// 5. Read root expression
const root = try Expr.deserialize(reader);
return ErgoTree{
.header = header,
.constants = constants,
.root = .{ .parsed = root },
.proposition_bytes = null,
};
}
};
Constant Segregation
Constant segregation is an optimization technique that extracts literal values from the expression tree and stores them in a separate array5. The expression tree then references these constants via placeholder indices. This seemingly simple change enables several powerful optimizations:
Without segregation:
┌─────────────────────────────────────────────────┐
│ header: 0x00 │
│ root: AND(GT(HEIGHT, IntConstant(100)), pk) │
└─────────────────────────────────────────────────┘
With segregation:
┌─────────────────────────────────────────────────┐
│ header: 0x10 │
│ constants: [IntConstant(100)] │
│ root: AND(GT(HEIGHT, Placeholder(0)), pk) │
└─────────────────────────────────────────────────┘
Benefits:
- Template sharing: Same template, different constants
- Caching: Templates cached for repeated evaluation
- Substitution: Constants replaced without re-parsing
/// Substitute ConstantPlaceholder nodes with actual constants
pub fn substConstants(
root: *const Expr,
constants: []const Constant,
) Expr {
return switch (root.*) {
.constant_placeholder => |ph| .{
.constant = constants[ph.index],
},
.and => |a| .{
.and = .{
.left = substConstants(a.left, constants),
.right = substConstants(a.right, constants),
},
},
// ... other node types
else => root.*,
};
}
// NOTE: In production, use an iterative approach with an explicit work stack
// to guarantee bounded stack depth and prevent stack overflow on deep trees.
Version Mechanism
The ErgoTree version field (bits 0-2) enables soft-fork protocol upgrades without breaking consensus6. Each ErgoTree version corresponds to a minimum required block version in the Ergo protocol—nodes running older protocol versions will skip validation of scripts with newer ErgoTree versions rather than rejecting them as invalid.
| ErgoTree Version | Min Block Version | Key Features |
|---|---|---|
| v0 | 1 | Original format with Ahead-of-Time (AOT) costing calculated during compilation |
| v1 | 2 | Just-in-Time (JIT) costing calculated during execution; size field required |
| v2 | 3 | Extended operations and new opcodes |
| v3 | 4 | UnsignedBigInt type and enhanced collection methods |
The size field became mandatory in v1 to support forward compatibility—nodes can skip over scripts they cannot fully parse by reading the size and advancing past the unknown content.
pub fn setVersionBits(header: HeaderType, version: u3) HeaderType {
var h = header;
h.version = version;
// Size flag required for version > 0
if (version > 0) {
h.has_size = true;
}
return h;
}
Unparsed Trees
When a node encounters an ErgoTree with an unknown opcode—typically from a newer protocol version—deserialization fails. Rather than rejecting the transaction entirely, the raw bytes are preserved as an "unparsed tree"7. This design is critical for soft-fork compatibility: older nodes can process blocks containing newer script versions without understanding their contents.
const UnparsedTree = struct {
bytes: []const u8,
err: DeserializationError,
};
/// Convert to proposition, handling unparsed case
pub fn toProposition(self: *const ErgoTree, replace_constants: bool) !SigmaPropValue {
return switch (self.root) {
.parsed => |tree| blk: {
if (replace_constants and self.constants.len > 0) {
break :blk substConstants(tree, self.constants);
}
break :blk tree;
},
.unparsed => |u| return u.err,
};
}
Creating ErgoTrees
pub fn fromProposition(prop: SigmaPropValue) ErgoTree {
return fromPropositionWithHeader(HeaderType.v0(false), prop);
}
pub fn fromPropositionWithHeader(header: HeaderType, prop: SigmaPropValue) ErgoTree {
// Simple constants don't need segregation
if (prop == .sigma_prop_constant) {
return withoutSegregation(header, prop);
}
// Complex expressions benefit from segregation
return withSegregation(header, prop);
}
fn withSegregation(header: HeaderType, prop: SigmaPropValue) ErgoTree {
var constants = std.ArrayList(Constant).init(allocator);
const segregated = extractConstants(prop, &constants);
return ErgoTree{
.header = .{
.version = header.version,
.has_size = header.has_size,
.constant_segregation = true,
},
.constants = constants.toOwnedSlice(),
.root = .{ .parsed = segregated },
.proposition_bytes = null,
};
}
Template Extraction
The template is root expression bytes without constant values:
pub fn template(self: *const ErgoTree) ![]const u8 {
// Serialize root with placeholders (no constant substitution)
var buf = std.ArrayList(u8).init(allocator);
try self.root.parsed.serialize(buf.writer());
return buf.toOwnedSlice();
}
Templates are useful for:
- Identifying script patterns regardless of constants
- Contract template matching
- Caching deserialized templates
Properties
const ErgoTree = struct {
// ... fields ...
/// Returns true if tree contains deserialization operations
pub fn hasDeserialize(self: *const ErgoTree) bool {
return switch (self.root) {
.parsed => |p| containsDeserializeOp(p),
.unparsed => false,
};
}
/// Returns true if tree uses blockchain context
pub fn isUsingBlockchainContext(self: *const ErgoTree) bool {
return switch (self.root) {
.parsed => |p| containsContextOp(p),
.unparsed => false,
};
}
/// Convert to SigmaBoolean if simple proposition
pub fn toSigmaBooleanOpt(self: *const ErgoTree) ?SigmaBoolean {
const prop = self.toProposition(self.header.constant_segregation) catch return null;
return switch (prop) {
.sigma_prop_constant => |c| c.value,
else => null,
};
}
};
Summary
This chapter covered the complete ErgoTree binary format—the serialized representation of smart contracts stored in every UTXO box:
- ErgoTree is a self-sufficient serialized contract format containing everything needed for evaluation without external dependencies
- The header byte uses a bit-field layout: version (bits 0-2), size flag (bit 3), constant segregation flag (bit 4), with reserved bits for future extensions
- Constant segregation (bit 4) extracts literal values into a separate array, enabling template sharing, caching, and runtime substitution without re-parsing
- The version mechanism enables soft-fork protocol upgrades—newer ErgoTree versions are skipped by older nodes rather than causing consensus failures
- ErgoTree versions 1+ require the size flag, allowing nodes to skip past unknown content
- UnparsedTree preserves raw bytes when deserialization fails, maintaining block validity even with unknown opcodes
- Simple cryptographic propositions can be extracted as SigmaBoolean values for direct signature verification
Next: Chapter 4: Value Nodes
Scala: ErgoTree.scala:24-80
Rust: ergo_tree.rs:33-41
Scala: ErgoTree.scala:227-270
Rust: tree_header.rs:10-32
Scala: ErgoTree.scala:307-322
Scala: ErgoTree.scala:263-305
Scala: ErgoTree.scala:19-22