Executable Format Analysis under SSCLI (2)
Now we begin analyzing the exciting part — Metadata.
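First, locate the metadata root through the IMAGE_DATA_DIRECTORY entry named MetaData in IMAGE_COR20_HEADER. Its RVA is 0x00002148; subtracting the containing section's virtual address (0x00002000) and adding that section's raw-data file offset (0x00000200) gives the position in the file: 0x00002148 - 0x00002000 + 0x00000200 = 0x00000348.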
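The same arithmetic can be written as a small helper. This is only a sketch: the function name and parameter names are made up for illustration and do not come from the SSCLI sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SSCLI): convert an RVA to a file offset,
// given the virtual address and raw-data file offset of the section that
// contains the RVA.
inline std::uint32_t rvaToFileOffset(std::uint32_t rva,
                                     std::uint32_t sectionVirtualAddress,
                                     std::uint32_t sectionPointerToRawData) {
    // The RVA must lie at or after the start of the section.
    assert(rva >= sectionVirtualAddress);
    return rva - sectionVirtualAddress + sectionPointerToRawData;
}
```

With the values from this file, rvaToFileOffset(0x00002148, 0x00002000, 0x00000200) yields 0x00000348.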
Metadata uses a simplified compound document format:
+-----------------+
| Signature       |
+-----------------+
| Streams         |
+-----------------+
| STORAGEHEADER   |
| Extra data      |
| STORAGESTREAM[] |
+-----------------+
| offset          |
+-----------------+
In clr\src\md\inc\MdFileFormat.h:
struct STORAGESIGNATURE {
ULONG m_lSignature; // "Magic" signature 0x424a5342
USHORT m_iMajorVer; // Major version 0x0001
USHORT m_iMinorVer; // Minor version 0x0001
ULONG m_lExtraData; // Offset to next structure
ULONG m_lVersionStringLength; // 0x00000008
BYTE m_pVersion[0]; // "v1.0.0"
};
struct STORAGEHEADER {
BYTE fFlags; // STGHDR_xxx flags
BYTE pad;
USHORT iStreams; // Number of streams
};
Following the header is one STORAGESTREAM entry per stream. There are 5 common streams: #Strings, #Blob, #GUID, #US (User Strings), and #~. Each can appear at most once; #US and #Blob are optional.
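To make the layout concrete, here is a rough sketch of walking the stream directory from a raw byte buffer. It assumes the on-disk form of each STORAGESTREAM entry is a 4-byte offset, a 4-byte size, and a null-terminated name padded to a 4-byte boundary, and that the header's fFlags does not announce any extra data. StreamInfo, listStreams and the read helpers are illustrative names, not SSCLI code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Little-endian readers over raw bytes (the caller checks bounds).
inline std::uint16_t readU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
inline std::uint32_t readU32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical result type, not an SSCLI structure.
struct StreamInfo {
    std::uint32_t offset;  // relative to the start of the metadata root
    std::uint32_t size;
    std::string name;
};

// `md` holds the bytes starting at the metadata root (file offset 0x348 in
// the example). Returns an empty list if the signature does not match.
// Assumption: no extra data follows STORAGEHEADER.
std::vector<StreamInfo> listStreams(const std::vector<std::uint8_t>& md) {
    std::vector<StreamInfo> out;
    if (md.size() < 20 || readU32(&md[0]) != 0x424A5342u) return out;

    // STORAGESIGNATURE is 16 bytes, followed by the version string.
    std::size_t pos = 16 + static_cast<std::size_t>(readU32(&md[12]));
    if (pos + 4 > md.size()) return out;

    // STORAGEHEADER: fFlags, pad, iStreams.
    const std::uint16_t count = readU16(&md[pos + 2]);
    pos += 4;

    for (std::uint16_t i = 0; i < count; ++i) {
        if (pos + 8 > md.size()) break;
        StreamInfo s;
        s.offset = readU32(&md[pos]);
        s.size = readU32(&md[pos + 4]);
        pos += 8;
        while (pos < md.size() && md[pos] != 0)
            s.name.push_back(static_cast<char>(md[pos++]));
        pos += 1;                                          // skip the terminator
        pos = (pos + 3) & ~static_cast<std::size_t>(3);    // pad to 4 bytes
        out.push_back(s);
    }
    return out;
}
```

Adding a returned offset to the file offset of the metadata root gives the position of that stream in the file.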
#Strings and #US Streams: Both store strings. #Strings holds descriptor strings (names) in UTF-8, separated by \0 and always starting with an empty string. #US holds user-defined strings (string literals from code) in UTF-16, each prefixed with its length in bytes. When a string is accessed through an mdtString token, the RID is not a row number but a byte offset within the #US stream.
Example: using ILDASM to view Hello.exe, tokens 0x70000017 and 0x7000002F resolve to “Hello world” and “Echo: {0}” respectively at offsets 0x17 and 0x2F from the #US stream start.
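The token-to-string lookup can be sketched as follows. The sketch assumes the ECMA-335 encoding for #US entries: a compressed length prefix, then UTF-16LE bytes, then one trailing flag byte that is counted in the length. This also matches the example: "Hello world" is 11 characters, so 1 + 22 + 1 = 24 bytes, which is exactly 0x2F - 0x17. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The low 24 bits of a token are the RID; the top byte is the token type.
inline std::uint32_t ridFromToken(std::uint32_t token) {
    return token & 0x00FFFFFFu;
}
inline bool isUserStringToken(std::uint32_t token) {
    return (token & 0xFF000000u) == 0x70000000u;  // mdtString
}

// Read the entry at `offset` in a #US heap. Returns an empty string when the
// offset or length is out of range.
std::u16string readUserString(const std::vector<std::uint8_t>& us,
                              std::size_t offset) {
    std::u16string out;
    if (offset >= us.size()) return out;

    // Decode the compressed length (1, 2 or 4 bytes, big-endian payload).
    std::uint32_t len = 0;
    const std::uint8_t b0 = us[offset];
    if ((b0 & 0x80u) == 0) {
        len = b0;
        offset += 1;
    } else if ((b0 & 0xC0u) == 0x80u) {
        if (offset + 2 > us.size()) return out;
        len = ((b0 & 0x3Fu) << 8) | us[offset + 1];
        offset += 2;
    } else {
        if (offset + 4 > us.size()) return out;
        len = (static_cast<std::uint32_t>(b0 & 0x1Fu) << 24) |
              (static_cast<std::uint32_t>(us[offset + 1]) << 16) |
              (static_cast<std::uint32_t>(us[offset + 2]) << 8) |
              us[offset + 3];
        offset += 4;
    }
    if (len == 0 || offset + len > us.size()) return out;

    // `len` covers the UTF-16 bytes plus one trailing flag byte.
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        out.push_back(static_cast<char16_t>(us[offset + i] |
                                            (us[offset + i + 1] << 8)));
    }
    return out;
}
```

For token 0x70000017, ridFromToken returns 0x17, and that value is passed as the offset into the #US bytes.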
#GUID Stream: An array of GUIDs. For our file it contains one GUID: {6BE151AB-E2FF-3D10-5CA8-7B77DA98426C}.
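#Blob Stream: Binary large objects. The stream offset (0x000002F0) is relative to the metadata root, so the stream starts at file offset 0x00000348 + 0x000002F0 = 0x00000638.
#~ Stream: Where the metadata tables live, which makes it the most important stream. It starts at 0x00000348 + 0x00000068 = 0x000003B0 and opens with the schema data described by CMiniMdSchemaBase and its derived class CMiniMdSchema (both defined in clr\src\md\inc\Metamodel.h).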
class CMiniMdSchemaBase {
ULONG m_ulReserved; // 0
BYTE m_major; // 0x01
BYTE m_minor; // 0x00
BYTE m_heaps; // Heap size bits
BYTE m_rid; // log-base-2 of largest rid
unsigned __int64 m_maskvalid; // Bit mask of present table counts
unsigned __int64 m_sorted; // Bit mask of sorted tables
};
class CMiniMdSchema : public CMiniMdSchemaBase {
ULONG m_cRecs[TBL_COUNT]; // Table row counts
ULONG m_ulExtra; // Extra data
};
TBL_COUNT is 42 in SSCLI, defined in clr\src\inc\metamodelpub.h. m_heaps is a bitmap: bit 0 (0x01) = #Strings, bit 1 (0x02) = #GUID, bit 2 (0x04) = #Blob. When a bit is set, indexes into the corresponding heap are 4 bytes wide instead of 2.
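As a small illustration of the m_heaps rule, the sketch below returns the index width for a given heap. The enum and function names are made up for this example; only the bit values (0x01, 0x02, 0x04) reflect the format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the three heap-size bits in m_heaps.
enum HeapBit : std::uint8_t {
    HeapString = 0x01,  // #Strings
    HeapGuid   = 0x02,  // #GUID
    HeapBlob   = 0x04   // #Blob
};

// Width in bytes of an index into the given heap: 4 if its bit is set, else 2.
inline unsigned heapIndexSize(std::uint8_t heaps, HeapBit which) {
    return (heaps & which) ? 4u : 2u;
}
```

A small file such as Hello.exe would normally have none of these bits set, so every heap index in its tables would occupy 2 bytes.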
m_maskvalid and m_sorted are 64-bit bitmaps with one bit per table. In our example m_maskvalid = 0x0000000901a21557, which has 14 bits set, so only 14 of the 42 tables actually have data. CMiniMdSchema::LoadFrom uses this bitmap to load row counts only for the tables that exist.
The 14 tables present in Hello.exe and their row counts:
- Module: 1
- TypeRef: 4
- TypeDef: 3
- Field: 1
- Method: 6
- Param: 2
- MemberRef: 4
- CustomAttribute: 1
- StandAloneSig: 2
- PropertyMap: 1
- Property: 1
- MethodSemantics: 2
- Assembly: 1
- AssemblyRef: 1
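The list above can be reproduced from the bitmap and the packed row counts. The sketch below is not the SSCLI LoadFrom code; it only assumes that the file stores one 32-bit count per set bit of m_maskvalid, in ascending bit order, and that table numbers follow the ECMA-335 numbering. kTableNames and expandRowCounts are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Table names indexed by table number (0..41), following ECMA-335 numbering.
static const char* const kTableNames[42] = {
    "Module", "TypeRef", "TypeDef", "FieldPtr", "Field", "MethodPtr",
    "Method", "ParamPtr", "Param", "InterfaceImpl", "MemberRef", "Constant",
    "CustomAttribute", "FieldMarshal", "DeclSecurity", "ClassLayout",
    "FieldLayout", "StandAloneSig", "EventMap", "EventPtr", "Event",
    "PropertyMap", "PropertyPtr", "Property", "MethodSemantics", "MethodImpl",
    "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA", "ENCLog", "ENCMap",
    "Assembly", "AssemblyProcessor", "AssemblyOS", "AssemblyRef",
    "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType",
    "ManifestResource", "NestedClass"
};

// Pair each set bit of the mask with the next packed row count.
std::vector<std::pair<std::string, std::uint32_t>>
expandRowCounts(std::uint64_t maskValid,
                const std::vector<std::uint32_t>& packed) {
    std::vector<std::pair<std::string, std::uint32_t>> rows;
    std::size_t next = 0;
    for (int i = 0; i < 42 && next < packed.size(); ++i) {
        if (maskValid & (static_cast<std::uint64_t>(1) << i)) {
            rows.emplace_back(kTableNames[i], packed[next++]);
        }
    }
    return rows;
}
```

Calling expandRowCounts with 0x901a21557 and the 14 counts in file order (1, 4, 3, 1, 6, 2, 4, 1, 2, 1, 1, 2, 1, 1) produces the same table and row-count pairs as the list, because the set bits are 0, 1, 2, 4, 6, 8, 10, 12, 17, 21, 23, 24, 32 and 35.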
After the schema data comes the metadata table data, with a structure similar to relational database tables: rows and columns, each row representing a record.