One feature Skein has but Blake lacks is tree hashing. The kind of tree I’ll describe here has contiguous leaves, so it is suitable for incremental updates/verification. For parallel hashing, other constructions, such as interleaving, might be superior.
I’m using the Blake compression function without any changes, but I change the counter scheme. The counter scheme combines ideas from my earlier Alternative Padding suggestion and the UTF-8 encoding. I’ll only describe the Blake256 variant here, but generalization to Blake512 is straightforward.
Description
The tree hash takes three parameters:

- The LeafSize. A power of two, at least 4.
- The FanOut. A power of two, at least 2.
- The maximum tree height MaxHeight, at least 2.
As with my earlier proposal, bit 63 marks the final block of a layer, and bit 62 indicates the presence of bit padding. The remaining 62 bits serve as a combined layer indication and position.
The number of leading 1 bits indicates the layer: no leading 1 bits corresponds to the leaf layer, one leading 1 bit to the layer above it, and so on. The remaining bits count the length of the message in bytes up to and including the current block.
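This counter layout can be written out in a few lines of Python. The sketch below is mine, not part of the proposal; `make_counter` and the constant names are illustrative:

```python
HASH_SIZE = 32      # Blake256 output size in bytes
COUNTER_BITS = 64

def make_counter(byte_count, layer, final=False, bit_padded=False):
    """Build the 64-bit counter word for one compression call.

    byte_count - layer bytes up to and including the current block
    layer      - 0 for leaves; encoded as `layer` leading 1 bits
    final      - bit 63, set on the last block of a layer
    bit_padded - bit 62, set when bit padding was applied
    """
    counter = byte_count
    if final:
        counter |= 1 << (COUNTER_BITS - 1)
    if bit_padded:
        counter |= 1 << (COUNTER_BITS - 2)
    for i in range(layer):
        counter |= 1 << (COUNTER_BITS - 3 - i)
    return counter
```

For instance, the final block of a 96-byte layer at height 1 gets `make_counter(96, 1, final=True)`, i.e. 0xA000000000000060.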
For the leaf layer, divide the message into pieces of LeafSize * HashSize bytes. For the higher layers, divide the layer into pieces of FanOut * HashSize bytes.
Hash each piece, producing one hash that’ll become part of the next layer. Don’t reset the counter when moving to the next piece; only reset the chaining value to the IV.
If you reach MaxHeight
, hash the whole lower layer in one go, instead of splitting it into pieces.
Repeat until the length of a layer is equal to the HashSize
. That layer is the root hash.
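To illustrate how the layers shrink, here is a small byte-granular sketch (my own helper, with hypothetical naming) that computes the byte length of each successive layer:

```python
import math

def layer_sizes(message_bytes, leaf_size=4, fan_out=2,
                hash_size=32, max_height=255):
    """Byte length of each layer, leaves first, per the scheme above."""
    sizes = [message_bytes]
    cur, layer = message_bytes, 0
    while True:
        piece = (leaf_size if layer == 0 else fan_out) * hash_size
        if layer == max_height - 1:
            piece = cur               # height cap: hash the layer in one go
        pieces = max(1, math.ceil(cur / piece))
        cur = pieces * hash_size      # each piece yields one hash
        sizes.append(cur)
        layer += 1
        if cur == hash_size:          # a single hash remains: the root
            return sizes
```

With the parameters of the example below, a 313-byte message gives layers of 313, 96, 64, and 32 bytes.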
Pseudocode
TreeHash(Message, LeafSize, FanOut, MaxHeight)
    requires (LeafSize >= 4) and IsPowerOfTwo(LeafSize)
    requires (FanOut >= 2) and IsPowerOfTwo(FanOut)
    requires MaxHeight >= 2

    const HashSizeInBytes = 32
    const BlockSizeInBytes = HashSizeInBytes * 2
    const CounterSizeInBits = 64

    CurrentLayer = Message
    if Message.LengthInBits == 0
        return BlakeCompress(IV, BlockSizeInBytes zero bytes, 2**(CounterSizeInBits - 1))
    LayerIndex = 0
    repeat
        NextLayer = ""
        if LayerIndex == 0
            PieceSizeInBytes = LeafSize * HashSizeInBytes
        else
            PieceSizeInBytes = FanOut * HashSizeInBytes
        if (PieceSizeInBytes > CurrentLayer.LengthInBytes) or (LayerIndex == MaxHeight - 1)
            PieceSizeInBytes = CurrentLayer.LengthInBytes
        for(pieceIndex = 0; pieceIndex < Ceiling(CurrentLayer.LengthInBits / (PieceSizeInBytes * 8)); pieceIndex++)
            State = IV
            Piece = CurrentLayer[pieceIndex * PieceSizeInBytes, PieceSizeInBytes]
            if Piece.LengthInBits % 8 != 0 // Add bit padding to a partial final byte
                Piece.AppendBit(1)
            while Piece.LengthInBits % (BlockSizeInBytes * 8) != 0 // Pad to block size
                Piece.AppendBit(0)
            for(blockIndex = 0; blockIndex < Piece.LengthInBytes / BlockSizeInBytes; blockIndex++)
                Counter = Min(
                    pieceIndex * PieceSizeInBytes + (blockIndex + 1) * BlockSizeInBytes,
                    Ceiling(CurrentLayer.LengthInBits / 8))
                if Counter == Ceiling(CurrentLayer.LengthInBits / 8) // Last block of layer
                    Counter.SetBit(CounterSizeInBits - 1)
                    if CurrentLayer.LengthInBits % 8 != 0 // Bit padding was used
                        Counter.SetBit(CounterSizeInBits - 2)
                for(i = 0; i < LayerIndex; i++) // Set the LayerIndex most significant layer bits
                    Counter.SetBit(CounterSizeInBits - (3 + i))
                State = BlakeCompress(State, Piece[blockIndex * BlockSizeInBytes, BlockSizeInBytes], Counter)
            NextLayer.Append(State)
        CurrentLayer = NextLayer
        LayerIndex++
    until CurrentLayer.LengthInBytes == HashSizeInBytes
    return CurrentLayer
(The division sign / denotes exact division, not C-style truncating division; = is assignment, and == is the equality test.)
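The pseudocode translates fairly directly into Python. Implementing the real Blake256 compression function is out of scope here, so the sketch below substitutes a SHA-256 based stand-in for BlakeCompress (and a zero IV, which is not the real Blake256 IV), and handles whole-byte messages only, so the bit-padding flag never fires; everything else follows the pseudocode:

```python
import hashlib
import math

HASH_SIZE = 32            # Blake256 output size in bytes
BLOCK_SIZE = 2 * HASH_SIZE
COUNTER_BITS = 64
IV = b"\x00" * HASH_SIZE  # placeholder IV, not the real Blake256 constants

def compress(state, block, counter):
    # Stand-in for the Blake256 compression function; any fixed-size
    # mix of (state, block, counter) demonstrates the tree structure.
    return hashlib.sha256(state + block + counter.to_bytes(8, "big")).digest()

def tree_hash(message, leaf_size=4, fan_out=2, max_height=255):
    assert leaf_size >= 4 and leaf_size & (leaf_size - 1) == 0
    assert fan_out >= 2 and fan_out & (fan_out - 1) == 0
    assert max_height >= 2
    if not message:  # empty input: one call with only the final-block bit set
        return compress(IV, b"\x00" * BLOCK_SIZE, 1 << (COUNTER_BITS - 1))
    layer, layer_index = bytes(message), 0
    while True:
        piece_size = (leaf_size if layer_index == 0 else fan_out) * HASH_SIZE
        if layer_index == max_height - 1:
            piece_size = len(layer)  # height cap: hash the layer in one go
        next_layer = b""
        for p in range(math.ceil(len(layer) / piece_size)):
            state = IV  # reset the chaining value, but not the counter
            piece = layer[p * piece_size:(p + 1) * piece_size]
            piece += b"\x00" * (-len(piece) % BLOCK_SIZE)  # zero-pad to block size
            for b in range(len(piece) // BLOCK_SIZE):
                # counter = layer bytes hashed so far, clamped to the layer length
                counter = min(p * piece_size + (b + 1) * BLOCK_SIZE, len(layer))
                if counter == len(layer):       # last block of the layer
                    counter |= 1 << (COUNTER_BITS - 1)
                for i in range(layer_index):    # leading 1 bits encode the layer
                    counter |= 1 << (COUNTER_BITS - 3 - i)
                block = piece[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
                state = compress(state, block, counter)
            next_layer += state
        layer, layer_index = next_layer, layer_index + 1
        if len(layer) == HASH_SIZE:
            return layer
```

`tree_hash(b"x" * 313)` walks the same tree shape as the example below: three leaf pieces, two middle pieces, one root piece.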
Example
HashSizeInBytes = 32
FanOut = 2
LeafSize = 4
MaxHeight = 255
Message.LengthInBits = 2500
Each block corresponds to a compression function call; the number in it is the counter as a hexadecimal number.
                                              +---------------+
                                              | B000 ... 0040 |
                                              +---------------+
                                                      |
                                 +--------------------+---------------------+
                                 |                                          |
                         +---------------+                          +---------------+
                         | 2000 ... 0040 |                          | A000 ... 0060 |
                         +---------------+                          +---------------+
                                 |                                          |
                +----------------+----------------+                         |
                |                                 |                         |
+---------------+---------------+ +---------------+---------------+ +---------------+
| 0000 ... 0040 | 0000 ... 0080 | | 0000 ... 00C0 | 0000 ... 0100 | | C000 ... 0139 |
+---------------+---------------+ +---------------+---------------+ +---------------+
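The counter values in the example can be reproduced with a little arithmetic. In this Python sketch (`FINAL`, `PADDED`, and `layer_bits` are my names, not part of the proposal), the leaf layer counts 2500 bits as Ceiling(2500/8) = 313 = 0x139 bytes, with both flag bits set on the last block:

```python
COUNTER_BITS = 64
FINAL = 1 << (COUNTER_BITS - 1)   # bit 63: last block of a layer
PADDED = 1 << (COUNTER_BITS - 2)  # bit 62: bit padding present

def layer_bits(layer):
    # `layer` leading 1 bits just below the two flag bits
    c = 0
    for i in range(layer):
        c |= 1 << (COUNTER_BITS - 3 - i)
    return c

# Leaf layer: five 64-byte blocks covering 2500 bits = 313 bytes.
leaf_counters = [0x40, 0x80, 0xC0, 0x100, FINAL | PADDED | 313]
# Layer 1: three 32-byte hashes = 96 bytes, in 64-byte pieces.
mid_counters = [layer_bits(1) | 0x40, layer_bits(1) | FINAL | 96]
# Layer 2: the root block over 64 bytes.
root_counter = layer_bits(2) | FINAL | 0x40
```

These evaluate to exactly the hexadecimal values shown in the example blocks.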
Properties/Open Questions
- The Blake256 based variant can hash 2^64 - 8 bits, or 2^61 - 1 bytes.
- I didn’t describe a configuration block yet. Without such a block, trees with different parameters might interact in undesirable ways.
- Should we reduce the minimum LeafSize to 2 * HashSize? That’d reduce the maximum input size or add additional complications.
- Can it be extended to include interleaved hashing in an elegant way?