
Executable Format Analysis under SSCLI (2)

Now we come to the most exciting part: metadata.

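First, locate the metadata through the MetaData entry (an IMAGE_DATA_DIRECTORY) in IMAGE_COR20_HEADER. The entry gives an RVA, 0x00002148, which has to be converted to a file offset: subtract the RVA at which the containing section is loaded (0x00002000) and add the section's offset in the file (0x00000200). That gives 0x00002148 - 0x00002000 + 0x00000200 = 0x00000348.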
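The conversion can be sketched in C++ as below. This is a minimal illustration, assuming a hypothetical `SectionInfo` struct that holds only the three IMAGE_SECTION_HEADER fields the calculation needs. It picks the section whose virtual range contains the RVA and applies the same subtraction and addition as above.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical subset of IMAGE_SECTION_HEADER: only the fields needed here.
struct SectionInfo {
    std::uint32_t virtual_address;      // RVA where the section is loaded
    std::uint32_t virtual_size;         // size of the section in memory
    std::uint32_t pointer_to_raw_data;  // file offset of the section's data
};

// Converts an RVA to a file offset using the section that contains it.
// Returns std::nullopt when no section covers the RVA.
std::optional<std::uint32_t> rva_to_file_offset(
        std::uint32_t rva, const std::vector<SectionInfo>& sections) {
    for (const SectionInfo& s : sections) {
        // Written as a difference to avoid overflow in va + size.
        if (rva >= s.virtual_address && rva - s.virtual_address < s.virtual_size)
            return rva - s.virtual_address + s.pointer_to_raw_data;
    }
    return std::nullopt;
}
```

With a section loaded at RVA 0x2000, stored at file offset 0x200 and large enough to cover 0x2148, the function returns 0x348 for the MetaData RVA.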

Metadata uses a simplified compound document format:

```text
+-----------------+
| Signature       |
+-----------------+
| STORAGEHEADER   |
| Extra data      |
| STORAGESTREAM[] |
+-----------------+
| Streams         |  <- located by each STORAGESTREAM's offset
+-----------------+
```

In clr\src\md\inc\MdFileFormat.h:

```cpp
struct STORAGESIGNATURE {
    ULONG  m_lSignature;             // "Magic" signature 0x424a5342 ("BSJB")
    USHORT m_iMajorVer;              // Major version 0x0001
    USHORT m_iMinorVer;              // Minor version 0x0001
    ULONG  m_lExtraData;             // Offset to next structure
    ULONG  m_lVersionStringLength;   // 0x00000008 (padded to a multiple of 4)
    BYTE   m_pVersion[0];            // "v1.0.0"
};

struct STORAGEHEADER {
    BYTE   fFlags;                   // STGHDR_xxx flags
    BYTE   pad;
    USHORT iStreams;                 // Number of streams
};
```
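A minimal sketch of reading these two headers from a raw byte buffer that starts at the metadata (file offset 0x348 here). The helper names and the `MetadataRoot` struct are hypothetical. Fields are read byte by byte as little-endian to avoid alignment issues, and the sketch simply rejects a non-zero fFlags instead of handling extra data.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Little-endian readers; callers must ensure the bytes are in range.
static std::uint16_t rd16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t rd32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Hypothetical summary of STORAGESIGNATURE + STORAGEHEADER.
struct MetadataRoot {
    std::string version;          // version string, e.g. "v1.0.0"
    std::uint16_t stream_count;   // STORAGEHEADER::iStreams
    std::size_t streams_offset;   // where the STORAGESTREAM[] array begins
};

std::optional<MetadataRoot> parse_root(const std::uint8_t* md, std::size_t size) {
    if (size < 16 || rd32(md) != 0x424A5342) return std::nullopt;  // "BSJB"
    const std::size_t ver_len = rd32(md + 12);  // m_lVersionStringLength
    const std::size_t hdr = 16 + ver_len;        // STORAGEHEADER follows the string
    if (hdr + 4 > size) return std::nullopt;
    if (md[hdr] != 0) return std::nullopt;       // fFlags != 0: extra data not handled
    const char* v = reinterpret_cast<const char*>(md + 16);
    std::size_t n = 0;
    while (n < ver_len && v[n] != '\0') ++n;      // the string is NUL-padded
    return MetadataRoot{std::string(v, n), rd16(md + hdr + 2), hdr + 4};
}
```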

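After the STORAGEHEADER (and any extra data) come the STORAGESTREAM entries, one per stream. Each entry records the stream's offset from the start of the metadata, its size in bytes, and its null-terminated name, padded to a 4-byte boundary. There are five common streams: #Strings, #Blob, #GUID, #US (user strings), and #~. Each can appear at most once; #US and #Blob are optional.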
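Walking the STORAGESTREAM array could look like the sketch below. `StreamEntry` and `read_streams` are illustrative names, and the assumed entry layout is the one just described: two little-endian ULONGs (offset, size) followed by the NUL-terminated name padded to 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoded form of one STORAGESTREAM entry.
struct StreamEntry {
    std::uint32_t offset;  // relative to the start of the metadata
    std::uint32_t size;    // stream size in bytes
    std::string name;      // e.g. "#~" or "#Strings"
};

static std::uint32_t le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Reads `count` entries starting at byte `pos` of the metadata buffer.
// Returns an empty vector if the buffer is truncated.
std::vector<StreamEntry> read_streams(const std::uint8_t* md, std::size_t size,
                                      std::size_t pos, std::uint16_t count) {
    std::vector<StreamEntry> out;
    for (std::uint16_t i = 0; i < count; ++i) {
        if (pos + 8 > size) return {};
        StreamEntry e{le32(md + pos), le32(md + pos + 4), std::string()};
        const std::size_t name_start = pos + 8;
        std::size_t end = name_start;
        while (end < size && md[end] != 0) ++end;  // find the NUL
        if (end == size) return {};
        e.name.assign(reinterpret_cast<const char*>(md + name_start), end - name_start);
        const std::size_t name_bytes = end - name_start + 1;      // including the NUL
        pos = name_start + ((name_bytes + 3) & ~std::size_t(3));  // pad to 4 bytes
        out.push_back(e);
    }
    return out;
}
```

Adding each entry's `offset` to the metadata's file offset gives the stream's position in the file, which is how the stream addresses below are computed.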

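#Strings and #US streams: both store strings. #Strings holds identifier strings (the names of types, members, and so on) as null-terminated UTF-8, and always begins with an empty string at offset 0. #US holds user strings, i.e. the string literals used by the program, stored as UTF-16: each entry starts with its length in bytes as a compressed integer, and the UTF-16 characters are followed by one extra terminal byte. In both heaps an index is a byte offset, not a row number; likewise, for a user-string token of type mdtString (0x70xxxxxx), the RID is the entry's offset within the #US stream.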
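Because a #Strings index is a byte offset, looking a name up is pointer arithmetic followed by a scan for the terminating NUL. A minimal sketch, with `string_at` as a hypothetical helper that returns the UTF-8 bytes as-is:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Returns the #Strings entry that starts at byte offset `index` (the value
// stored in a table column), or an empty view if the offset is out of range.
std::string_view string_at(const std::uint8_t* heap, std::size_t size, std::uint32_t index) {
    if (index >= size) return {};
    const char* s = reinterpret_cast<const char*>(heap + index);
    std::size_t n = 0;
    while (index + n < size && s[n] != '\0') ++n;  // stop at the NUL terminator
    return std::string_view(s, n);
}
```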

Example: viewed in ILDASM, Hello.exe uses the tokens 0x70000017 and 0x7000002F, which resolve to “Hello world” and “Echo: {0}”, the #US entries at offsets 0x17 and 0x2F from the start of the stream.
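As a check on the #US layout, the two offsets agree with each other. Assuming “Hello world” is stored as exactly 11 UTF-16 code units, its payload is 22 bytes plus the terminal byte, and since that is below 0x80 the compressed length prefix takes a single byte:

$$
\begin{aligned}
\text{payload} &= 2 \times 11 + 1 = 23 = \mathtt{0x17}\ \text{bytes} \\
\text{entry}   &= 1 + \mathtt{0x17} = \mathtt{0x18}\ \text{bytes} \\
\text{next}    &= \mathtt{0x17} + \mathtt{0x18} = \mathtt{0x2F}
\end{aligned}
$$

So the entry after “Hello world” begins exactly where the second token points.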

#GUID stream: an array of 16-byte GUIDs. GUID indexes are 1-based, so index 1 refers to the first GUID. Our file contains a single GUID: {6BE151AB-E2FF-3D10-5CA8-7B77DA98426C}.
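To print a #GUID entry, note that GUIDs are stored in the usual Windows in-memory layout: the first three fields (4, 2 and 2 bytes) are little-endian and the last eight bytes are stored as-is. A sketch with hypothetical helpers `guid_at` (which applies the 1-based index) and `format_guid`:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Returns a pointer to GUID number `index` (1-based), or nullptr if the
// index is 0 (no GUID) or past the end of the heap.
const std::uint8_t* guid_at(const std::uint8_t* heap, std::size_t size, std::uint32_t index) {
    if (index == 0 || std::size_t(index) * 16 > size) return nullptr;
    return heap + (std::size_t(index) - 1) * 16;
}

// Formats 16 GUID bytes in registry form: Data1..Data3 are byte-swapped
// (little-endian on disk), the final 8 bytes are printed in order.
std::string format_guid(const std::uint8_t* g) {
    char buf[39];  // 38 characters + NUL
    std::snprintf(buf, sizeof buf,
                  "{%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  g[3], g[2], g[1], g[0], g[5], g[4], g[7], g[6],
                  g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
    return buf;
}
```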

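#Blob stream: binary large objects, such as method and field signatures; each blob is prefixed with its length as a compressed integer. In our file the stream starts at 0x00000348 + 0x000002F0 = 0x00000638 (the metadata start plus the stream's offset).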
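Both #Blob and #US entries begin with an ECMA-335 compressed unsigned integer: 1 byte for values up to 0x7F, 2 bytes (first byte 10xxxxxx) up to 0x3FFF, and 4 bytes (first byte 110xxxxx) up to 0x1FFFFFFF, stored big-endian. The sketch below decodes that prefix and returns an entry's payload together with the offset of the next entry. `decode_compressed`, `entry_at` and `user_string` are hypothetical names, and `user_string` assumes the #US payload layout described earlier (UTF-16LE plus one terminal byte).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Decodes a compressed unsigned integer at p; `used` receives its width.
std::optional<std::uint32_t> decode_compressed(const std::uint8_t* p, std::size_t avail,
                                               std::size_t& used) {
    if (avail < 1) return std::nullopt;
    if ((p[0] & 0x80) == 0) { used = 1; return p[0]; }  // 0xxxxxxx
    if ((p[0] & 0xC0) == 0x80) {                          // 10xxxxxx xxxxxxxx
        if (avail < 2) return std::nullopt;
        used = 2;
        return (std::uint32_t(p[0] & 0x3F) << 8) | p[1];
    }
    if ((p[0] & 0xE0) == 0xC0) {                          // 110xxxxx + 3 bytes
        if (avail < 4) return std::nullopt;
        used = 4;
        return (std::uint32_t(p[0] & 0x1F) << 24) | (std::uint32_t(p[1]) << 16) |
               (std::uint32_t(p[2]) << 8) | p[3];
    }
    return std::nullopt;  // 111xxxxx is not a valid prefix
}

struct HeapEntry {
    const std::uint8_t* data;  // payload (after the length prefix)
    std::uint32_t length;      // payload length in bytes
    std::size_t next;          // offset of the following entry
};

// Returns the #Blob or #US entry at byte offset `index`.
std::optional<HeapEntry> entry_at(const std::uint8_t* heap, std::size_t size, std::uint32_t index) {
    if (index >= size) return std::nullopt;
    std::size_t used = 0;
    auto len = decode_compressed(heap + index, size - index, used);
    if (!len || index + used + *len > size) return std::nullopt;
    return HeapEntry{heap + index + used, *len, index + used + *len};
}

// Decodes a #US payload: UTF-16LE code units, ignoring the trailing byte.
std::u16string user_string(const HeapEntry& e) {
    std::u16string s;
    for (std::uint32_t i = 0; i + 1 < e.length; i += 2)
        s.push_back(char16_t(e.data[i] | (e.data[i + 1] << 8)));
    return s;
}
```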

#~ stream: the most important stream, where the metadata tables live. It starts at 0x00000348 + 0x00000068 = 0x000003B0 and begins with the table schema, described by CMiniMdSchemaBase and its derived class CMiniMdSchema (defined in clr\src\md\inc\Metamodel.h):

```cpp
class CMiniMdSchemaBase {
public:
    ULONG            m_ulReserved;  // 0
    BYTE             m_major;       // 0x01
    BYTE             m_minor;       // 0x00
    BYTE             m_heaps;       // Heap size bits
    BYTE             m_rid;         // log-base-2 of largest rid
    unsigned __int64 m_maskvalid;   // Bit mask of present table counts
    unsigned __int64 m_sorted;      // Bit mask of sorted tables
};

class CMiniMdSchema : public CMiniMdSchemaBase {
public:
    ULONG m_cRecs[TBL_COUNT];       // Table row counts
    ULONG m_ulExtra;                // Extra data
};
```
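The fixed part of this header is 24 bytes with no padding between fields, so it can be decoded field by field. A sketch assuming `s` points at the start of the #~ stream (file offset 0x3B0 here); `SchemaBase` and `read_schema_base` are hypothetical names, not the SSCLI classes:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical decoded copy of the on-disk CMiniMdSchemaBase fields.
struct SchemaBase {
    std::uint8_t major, minor, heaps, rid;
    std::uint64_t maskvalid, sorted;
};

// Reads an n-byte little-endian value.
static std::uint64_t le(const std::uint8_t* p, int n) {
    std::uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

std::optional<SchemaBase> read_schema_base(const std::uint8_t* s, std::size_t size) {
    if (size < 24) return std::nullopt;
    // Bytes 0-3: m_ulReserved; 4-7: major, minor, heaps, rid;
    // 8-15: m_maskvalid; 16-23: m_sorted.
    return SchemaBase{s[4], s[5], s[6], s[7], le(s + 8, 8), le(s + 16, 8)};
}
```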

TBL_COUNT, the number of table types, is 42 in SSCLI and is defined in clr\src\inc\metamodelpub.h. m_heaps is a bitmap of heap sizes: bit 0 (0x01) is #Strings, bit 1 (0x02) is #GUID, and bit 2 (0x04) is #Blob. When a bit is set, indexes into the corresponding heap are 4 bytes wide instead of 2.
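Turning m_heaps into index widths is a matter of testing those three bits. In this sketch the constant and function names are illustrative, not the SSCLI identifiers:

```cpp
#include <cstdint>

// Bits of m_heaps as described above (names are hypothetical).
constexpr std::uint8_t kStringHeap4 = 0x01;  // #Strings indexes are 4 bytes
constexpr std::uint8_t kGuidHeap4   = 0x02;  // #GUID indexes are 4 bytes
constexpr std::uint8_t kBlobHeap4   = 0x04;  // #Blob indexes are 4 bytes

struct HeapIndexSizes { int strings, guid, blob; };

// Width in bytes of an index into each heap, given m_heaps.
constexpr HeapIndexSizes heap_index_sizes(std::uint8_t heaps) {
    return { (heaps & kStringHeap4) ? 4 : 2,
             (heaps & kGuidHeap4)   ? 4 : 2,
             (heaps & kBlobHeap4)   ? 4 : 2 };
}
```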

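m_maskvalid and m_sorted are 64-bit bitmaps indexed by table number: m_maskvalid marks which tables are present, and m_sorted marks which tables are sorted. In our example m_maskvalid = 0x901a21557, which has 14 bits set, so only 14 of the 42 tables contain data. On disk, a row count is stored only for each present table, so CMiniMdSchema::LoadFrom walks the bitmap and reads one ULONG count per set bit.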
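The following sketch expands m_maskvalid into table numbers and reads the packed row counts that follow CMiniMdSchemaBase, one little-endian ULONG per set bit. The name table follows the ECMA-335 numbering 0x00-0x29, assumed here to match SSCLI's 42 tables; `present_tables` and `read_row_counts` are hypothetical helpers, not the LoadFrom code itself.

```cpp
#include <cstdint>
#include <vector>

// Table names in table-number order (ECMA-335 numbering, assumed to match SSCLI).
const char* const kTableNames[42] = {
    "Module", "TypeRef", "TypeDef", "FieldPtr", "Field", "MethodPtr", "Method",
    "ParamPtr", "Param", "InterfaceImpl", "MemberRef", "Constant",
    "CustomAttribute", "FieldMarshal", "DeclSecurity", "ClassLayout",
    "FieldLayout", "StandAloneSig", "EventMap", "EventPtr", "Event",
    "PropertyMap", "PropertyPtr", "Property", "MethodSemantics", "MethodImpl",
    "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA", "ENCLog", "ENCMap",
    "Assembly", "AssemblyProcessor", "AssemblyOS", "AssemblyRef",
    "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType",
    "ManifestResource", "NestedClass"};

// Returns the table numbers whose bit is set in m_maskvalid.
std::vector<int> present_tables(std::uint64_t maskvalid) {
    std::vector<int> ids;
    for (int i = 0; i < 42; ++i)
        if (maskvalid & (std::uint64_t(1) << i)) ids.push_back(i);
    return ids;
}

// Reads one little-endian ULONG per present table, in table-number order;
// absent tables get a count of 0.
std::vector<std::uint32_t> read_row_counts(const std::uint8_t* p, std::uint64_t maskvalid) {
    std::vector<std::uint32_t> rows(42, 0);
    for (int i : present_tables(maskvalid)) {
        rows[i] = std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
                  (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
        p += 4;
    }
    return rows;
}
```

Applied to 0x901a21557, `present_tables` yields table numbers 0, 1, 2, 4, 6, 8, 10, 12, 17, 21, 23, 24, 32 and 35, which correspond to the 14 tables listed below.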

The 14 tables present in Hello.exe and their row counts:

  • Module: 1
  • TypeRef: 4
  • TypeDef: 3
  • Field: 1
  • Method: 6
  • Param: 2
  • MemberRef: 4
  • CustomAttribute: 1
  • StandAloneSig: 2
  • PropertyMap: 1
  • Property: 1
  • MethodSemantics: 2
  • Assembly: 1
  • AssemblyRef: 1
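These row counts matter because they fix the width of every column that refers to another table. Under the ECMA-335 rules, a simple index is 2 bytes when the target table has fewer than 2^16 rows, and a coded index, which spends its low bits on a tag naming the target table, is 2 bytes when every possible target has fewer than 2^(16 - tag bits) rows. A minimal sketch with hypothetical function names:

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>

// Width in bytes of a simple index into a table with `rows` rows.
constexpr int simple_index_size(std::uint32_t rows) { return rows < 0x10000 ? 2 : 4; }

// Width in bytes of a coded index whose low `tag_bits` bits select one of
// the target tables; `target_rows` must be non-empty.
int coded_index_size(int tag_bits, std::initializer_list<std::uint32_t> target_rows) {
    const std::uint32_t max_rows = std::max(target_rows);
    return max_rows < (std::uint32_t(1) << (16 - tag_bits)) ? 2 : 4;
}
```

For example, TypeDefOrRef uses 2 tag bits to choose among TypeDef, TypeRef and TypeSpec; with 3, 4 and 0 rows here it needs only 2 bytes. With at most 6 rows in any table, every table index in Hello.exe fits in 2 bytes.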

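After the schema comes the table data itself, organized much like relational database tables: each present table is stored as an array of fixed-size rows (records) whose columns are defined by the table schema, and the tables follow one another in table-number order.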
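Where the table data begins follows from the schema: CMiniMdSchemaBase occupies 4 + 4 × 1 + 8 + 8 = 24 bytes, and 14 row counts of 4 bytes each follow it. Assuming no m_ulExtra field is present in this file, the first table (Module) would start at:

$$
\begin{aligned}
\text{schema size} &= 24 + 14 \times 4 = 80 = \mathtt{0x50}\ \text{bytes} \\
\text{table data}  &= \mathtt{0x3B0} + \mathtt{0x50} = \mathtt{0x400}
\end{aligned}
$$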

This post is licensed under CC BY 4.0 by the author.