Explore NTFS
Webcrazy (tsu00@263.net)
NTFS is the file system introduced with Windows NT, and it brings many new features. This article explores the underlying on-disk structure of NTFS, limited to how files are laid out on an NTFS volume. In NTFS, everything stored on the volume is described by a file called $MFT, the Master File Table. $MFT consists of an array of file records. A file record has a fixed size, usually 1 KB, and plays the role of an inode in Linux. File records are physically contiguous within the $MFT file and are numbered from 0. $MFT is itself one of the files that the file system uses to describe its own structure; NTFS calls these files metadata. The metadata files of NTFS as shipped with Windows 2000 are listed below (partial output of the sample code given later in this article).
File Record (inode)  FileName
-------------------  --------
0                    $MFT
1                    $MFTMirr
2                    $LogFile
3                    $Volume
4                    $AttrDef
5                    .
6                    $Bitmap
7                    $Boot
8                    $BadClus
9                    $Secure
10                   $UpCase
11                   $Extend
In Windows 2000 the dir command (even with the /ah switch) cannot list these metadata files the way it lists ordinary files. The file system driver (ntfs.sys) maintains a variable, NtfsProtectSystemFiles, that hides them. It is set to TRUE by default, so dir /ah returns no files. Knowing this, you can use i386kd to clear NtfsProtectSystemFiles and then list the metadata files:
kd> x ntfs!NtfsProtect*
fe213498 ntfs!NtfsProtectSystemFiles
fe21349c ntfs!NtfsProtectSystemAttributes
kd> dd ntfs!NtfsProtectSystemFiles l 2
fe213498 00000001 00000001
kd> ed ntfs!NtfsProtectSystemFiles 0
kd> dd ntfs!NtfsProtectSystemFiles l 2
fe213498 00000000 00000001
kd>
D:\>ver
Microsoft Windows 2000 [Version 5.00.2195]
D:\>dir /ah $*
 Volume in drive D is W2KNTFS
 Volume Serial Number is E831-9D04
 Directory of D:\
2000-04-27 19:31         36,000 $AttrDef
2000-04-27 19:31              0 $BadClus
2000-04-27 19:31         67,336 $Bitmap
2000-04-27 19:31          8,192 $Boot
2000-04-27 19:31 <DIR>          $Extend
2000-04-27 19:31     13,139,968 $LogFile
2000-04-27 19:31     27,575,296 $MFT
2000-04-27 19:31          4,096 $MFTMirr
2000-04-27 19:31        131,072 $UpCase
2000-04-27 19:31              0 $Volume
        9 File(s)     40,961,960 bytes
        1 Dir(s)      51,863,552 bytes free
Note that ntfs.sys opens the metadata files in a special way. Even after NtfsProtectSystemFiles has been cleared, calling ReadFile on one of these files generates an IRP_MJ_READ IRP, which leads to a page fault (see Gary Nebbett, "Windows NT/2000 Native API Reference").
The discussion above is based on the $MFT file, that is, on the file records (inodes) in $MFT. To support what follows, here is the structure of the file record header:
typedef struct {
    ULONG Type;
    USHORT UsaOffset;
    USHORT UsaCount;
    USN Usn;
} NTFS_RECORD_HEADER, *PNTFS_RECORD_HEADER;

typedef struct {
    NTFS_RECORD_HEADER Ntfs;
    USHORT SequenceNumber;
    USHORT LinkCount;
    USHORT AttributesOffset;
    USHORT Flags;               // 0x0001 = InUse, 0x0002 = Directory
    ULONG BytesInUse;
    ULONG BytesAllocated;
    ULONGLONG BaseFileRecord;
    USHORT NextAttributeNumber;
} FILE_RECORD_HEADER, *PFILE_RECORD_HEADER;
Next, how to locate $MFT. Anyone with some operating-system background knows the boot sector, which is the first sector of the volume. Here is the first sector as shown by dskprobe.exe (Windows 2000 Resource Kit); other tools such as WinHex work just as well:
File: d:\sector00.bin
Size: 0x00000200 (512)
Address  | 00 01 02 03-04 05 06 07:08 09 0A 0B-0C 0D 0E 0F | 0123456789ABCDEF
---------|-------------------------------------------------|-----------------
00000000 | EB 52 90 4E-54 46 53 20:20 20 20 00-02 08 00 00 | .R.NTFS    .....
00000010 | (row unreadable in the source)
00000020 | (row unreadable in the source)
00000030 | 04 00 00 00-00 00 00 00:09 1C 04 00-00 00 00 00 | ................
00000040 | F6 00 00 00-01 00 00 00:04 9D 31 E8-BB 31 E8 94 | ..........1..1..
.
.
000001F0 | 00 00 00 00-00 00 00 00:83 A0 B3 C9-00 00 55 AA | ..............U.
These 512 bytes have the following format (excerpted from Gary Nebbett's book; many of the definitions in this article come from or refer to that book):
#pragma pack(push, 1)
typedef struct {
    UCHAR Jump[3];
    UCHAR Format[8];
    USHORT BytesPerSector;
    UCHAR SectorsPerCluster;
    USHORT BootSectors;
    UCHAR Mbz1;
    USHORT Mbz2;
    USHORT Reserved1;
    UCHAR MediaType;
    USHORT Mbz3;
    USHORT SectorsPerTrack;
    USHORT NumberOfHeads;
    ULONG PartitionOffset;
    ULONG Reserved2[2];
    ULONGLONG TotalSectors;
    ULONGLONG MftStartLcn;
    ULONGLONG Mft2StartLcn;
    ULONG ClustersPerFileRecord;
    ULONG ClustersPerIndexBlock;
    ULONGLONG VolumeSerialNumber;
    UCHAR Code[0x1AE];
    USHORT BootSignature;
} BOOT_BLOCK, *PBOOT_BLOCK;
#pragma pack(pop)
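Because the structure is packed, its layout can be checked at compile time. The following is only a sketch: it restates BOOT_BLOCK with fixed-width integer types (an assumption made so that it builds without the Windows headers), the name BootBlock is illustrative, and the asserted offsets are simply the sums of the preceding field sizes.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct BootBlock {                      // same field order as BOOT_BLOCK above
    uint8_t  Jump[3];
    uint8_t  Format[8];
    uint16_t BytesPerSector;
    uint8_t  SectorsPerCluster;
    uint16_t BootSectors;
    uint8_t  Mbz1;
    uint16_t Mbz2;
    uint16_t Reserved1;
    uint8_t  MediaType;
    uint16_t Mbz3;
    uint16_t SectorsPerTrack;
    uint16_t NumberOfHeads;
    uint32_t PartitionOffset;
    uint32_t Reserved2[2];
    uint64_t TotalSectors;
    uint64_t MftStartLcn;
    uint64_t Mft2StartLcn;
    uint32_t ClustersPerFileRecord;
    uint32_t ClustersPerIndexBlock;
    uint64_t VolumeSerialNumber;
    uint8_t  Code[0x1AE];
    uint16_t BootSignature;
};
#pragma pack(pop)

// The whole structure must cover exactly one 512-byte sector.
static_assert(sizeof(BootBlock) == 512, "boot block must be one sector");
// Spot checks of individual field offsets.
static_assert(offsetof(BootBlock, BytesPerSector) == 0x0B, "BytesPerSector");
static_assert(offsetof(BootBlock, MftStartLcn) == 0x30, "MftStartLcn");
static_assert(offsetof(BootBlock, BootSignature) == 0x1FE, "BootSignature");
```

If the packing pragma were omitted, the compiler would insert padding after the odd-sized fields and these assertions would fail.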
The meaning of each field is largely clear from its name. The Linux-NTFS project (http://sf.net/projects/linux-ntfs) also provides detailed documentation, which I do not reproduce here for reasons of space. The following code reads the first sector of the volume:
hVolume = CreateFile(drive, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, 0,
                     OPEN_EXISTING, 0, 0);
ReadFile(hVolume, &bootb, sizeof(bootb), &n, 0);
bootb is a BOOT_BLOCK structure. On my volume it contains the following (compare with sector00.bin above):
Dump of BootBlock:
BytesPerSector: 200
SectorsPerCluster: 8
BootSectors: 0
SectorsPerTrack: 3F
NumberOfHeads: F0
PartitionOffset: 3F
TotalSectors: 41C090
MftStartLcn: 4
Mft2StartLcn: 41C09
ClustersPerFileRecord: F6
ClustersPerIndexBlock: 1
VolumeSerialNumber: E8319D04
BootSignature: AA55
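The dumped values are enough to work out where $MFT starts and how large a file record is. The sketch below assumes the usual NTFS reading of ClustersPerFileRecord: a value below 0x80 counts clusters, while a larger value such as F6 encodes a negative power of two, giving 2^(0x100 - 0xF6) = 1024 bytes here. The names VolumeGeometry and compute_geometry are illustrative and not part of the article's code.

```cpp
#include <cstdint>

struct VolumeGeometry {
    uint32_t bytesPerCluster;
    uint32_t bytesPerFileRecord;
    uint64_t mftByteOffset;     // offset of $MFT from the start of the volume
};

// Derive sizes and the $MFT position from boot block fields.
inline VolumeGeometry compute_geometry(uint16_t bytesPerSector,
                                       uint8_t  sectorsPerCluster,
                                       uint64_t mftStartLcn,
                                       uint32_t clustersPerFileRecord)
{
    VolumeGeometry g{};
    g.bytesPerCluster = uint32_t(bytesPerSector) * sectorsPerCluster;

    // Only the low byte carries the encoded value (assumption, see above).
    const uint32_t v = clustersPerFileRecord & 0xFF;
    g.bytesPerFileRecord = v < 0x80
        ? v * g.bytesPerCluster          // record size counted in clusters
        : 1u << (0x100 - v);             // record size is 2^(-v) bytes

    g.mftByteOffset = mftStartLcn * g.bytesPerCluster;
    return g;
}
```

Called as compute_geometry(0x200, 8, 4, 0xF6), this gives 4096 bytes per cluster, 1024 bytes per file record, and a byte offset of 0x4000 (sector 0x20) for $MFT, relative to the start of the volume.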
MftStartLcn above is the cluster at which $MFT starts within the volume. The cluster is the basic and smallest allocation unit of NTFS: a file that is allocated disk space occupies at least one cluster, even if it holds a single byte. NTFS uses the LCN (Logical Cluster Number) to denote a physical location in the volume; LCNs simply number the clusters from 0 up to the total number of clusters in the volume. For an individual file, NTFS uses the VCN (Virtual Cluster Number), which is mapped to LCNs to describe how the file is laid out. From MftStartLcn we know that $MFT starts at LCN 4; together with SectorsPerCluster and BytesPerSector this locates $MFT on the disk. Once $MFT has been located, reading every file record in it yields every file in the volume (as mentioned earlier, file records are simply numbered from 0). That gives the simplest picture of how files are organized, but how do we obtain information about a file, such as its name? NTFS stores everything about a file in the same way, whether it is an ordinary user file or a metadata file, and including its data: as attributes. I start with the output of nfi.exe (from the Windows NT/2000 OEM Support Tools):
D:\>copy con file
testforntfs^Z
        1 file(s) copied.
D:\>nfi d:\file
NTFS File Sector Information Utility.
Copyright (C) Microsoft Corporation 1999. All rights reserved.
\file
    $STANDARD_INFORMATION (resident)
    $FILE_NAME (resident)
    $DATA (resident)
D:\>echo testforattr>file:attr
D:\>nfi d:\file
NTFS File Sector Information Utility.
Copyright (C) Microsoft Corporation 1999. All rights reserved.
\file
    $STANDARD_INFORMATION (resident)
    $FILE_NAME (resident)
    $DATA (resident)
    $DATA attr (resident)
The items in NFI's output, $STANDARD_INFORMATION, $FILE_NAME, $DATA and so on, are what NTFS calls attributes. Attributes are either resident or nonresident. A file's data is itself an attribute, which may seem to stretch the meaning of the word, but it gives NTFS a uniform way of organizing files. It is also what lets NTFS support multiple streams (as with file:attr above). The following code finds a given attribute in a specified file record:
template <class T1, class T2> inline
T1* Padd(T1* p, T2 n) { return (T1*)((char *)p + n); }   // advance p by n bytes

PATTRIBUTE FindAttribute(PFILE_RECORD_HEADER file,
                         ATTRIBUTE_TYPE type, PWSTR name)
{
    // Walk the attribute list; an AttributeType of -1 marks its end.
    for (PATTRIBUTE attr = PATTRIBUTE(Padd(file, file->AttributesOffset));
         attr->AttributeType != -1;
         attr = Padd(attr, attr->Length)) {
        if (attr->AttributeType == type) {
            // Unnamed attribute requested and found.
            if (name == 0 && attr->NameLength == 0) return attr;
            // Named attribute: the length must match before the names are compared.
            if (name != 0 && wcslen(name) == attr->NameLength
                && _wcsicmp(name, PWSTR(Padd(attr, attr->NameOffset))) == 0) return attr;
        }
    }
    return 0;
}
The FindAttribute function above, as provided by Gary Nebbett, has a bug when the attribute name (the third parameter) is not an empty string. The cause is that _wcsicmp compares Unicode strings as standard C strings terminated by \0, whereas the attribute name stored in a file record is not \0-terminated. I have corrected this error in the code that accompanies this article.
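One way to avoid the problem just described is to compare exactly NameLength characters instead of relying on a terminator. The following is only a sketch of that idea, not the correction shipped with this article's code. The function name attribute_name_equals is hypothetical, and the sketch uses wchar_t for portability, which is 16 bits wide only on Windows.

```cpp
#include <cstddef>
#include <cwchar>
#include <cwctype>

// Compare a NUL-terminated name with `storedLen` characters that are NOT
// NUL-terminated, ignoring case. Never reads past stored[storedLen - 1].
inline bool attribute_name_equals(const wchar_t* name,
                                  const wchar_t* stored,
                                  std::size_t storedLen)
{
    // Different lengths can never match, and this bounds the loop below.
    if (std::wcslen(name) != storedLen) return false;

    for (std::size_t i = 0; i < storedLen; ++i) {
        if (std::towlower(name[i]) != std::towlower(stored[i])) return false;
    }
    return true;
}
```

With the Microsoft C runtime, calling _wcsnicmp(name, stored, storedLen) after the length check would be the one-call equivalent of the loop.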
Below I use SoftICE to step through this code as it fetches the $FILE_NAME attribute of $MFT, and from it the file name of $MFT. The same approach works for the $FILE_NAME of other files (such as the file created above) and for other attributes such as $DATA.
:bpx FindAttribute
Break due to BPX FindAttribute (ET=6.89 seconds)
:locals
[EBP-4] struct ATTRIBUTE * attr = 0x00344D68 <{...}>
[EBP+8] struct FILE_RECORD_HEADER * file = 0x00344D38 <{...}>
[EBP+C] enum ATTRIBUTE_TYPE type = AttributeFileName (30)
[EBP+10] unsigned short * name = 0x004041BC <"$MFT">
:? file
struct FILE_RECORD_HEADER * = 0x00344D38 <{...}>
    struct NTFS_RECORD_HEADER Ntfs = {...}
    unsigned short SequenceNumber = 0x1, "\0\x01"
    unsigned short LinkCount = 0x1, "\0\x01"
    unsigned short AttributesOffset = 0x30, "\00"
    unsigned short Flags = 0x1, "\0\x01"
    unsigned long BytesInUse = 0x2D8, "\0\0\x02\xD8"
    unsigned long BytesAllocated = 0x400, "\0\0\x04\0"
    unsigned quad BaseFileRecord = 0x0, "\0\0\0\0\0\0\0\0"
    unsigned short NextAttributeNumber = 0x6, "\0\x06"
The file parameter holds the record I read from $MFT. Its physical address in the volume follows from $MFT's LCN of 4, as described above, and can also be checked with DiskProbe. The SoftICE output follows:
:dd @file    // compare with the FILE_RECORD_HEADER definition listed earlier
0023:00344D38 454C4946 0003002A 6D4AC04D 00000000  FILE*...M.Jm....
0023:00344D48 00010001 00010030 000002D8 00000400  ....0...........
                           ----
                           |_ AttributesOffset
0023:00344D58 00000000 00000000 04340006 0000FA0D  ..........4.....
0023:00344D68 00000010 00000060 00180000 00000000  ....`...........
              -------- --------
              |        |_ Length of this attribute (defined below).
              |_ Attribute header, located via AttributesOffset (defined below). 00000010 identifies this attribute as StandardInformation.
0023:00344D78 00000048 00000018 2C1761D0 01BFB03C  H........a.,<...
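The header fields can also be decoded directly from the dwords shown in the dump. The sketch below is illustrative only: the helper names are hypothetical, and the byte offsets are derived from the FILE_RECORD_HEADER definition listed earlier, on the assumption that the compiler inserts no padding between its fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// First 0x30 bytes of the record, as dwords from 00344D38 to 00344D67.
const std::vector<uint32_t> kHeaderDump = {
    0x454C4946, 0x0003002A, 0x6D4AC04D, 0x00000000,
    0x00010001, 0x00010030, 0x000002D8, 0x00000400,
    0x00000000, 0x00000000, 0x04340006, 0x0000FA0D,
};

// Expand 32-bit words into the byte order they have in little-endian memory.
inline std::vector<uint8_t> dwords_to_bytes(const std::vector<uint32_t>& words)
{
    std::vector<uint8_t> bytes;
    for (uint32_t w : words)
        for (int shift = 0; shift < 32; shift += 8)
            bytes.push_back(static_cast<uint8_t>(w >> shift));
    return bytes;
}

inline uint16_t le16(const std::vector<uint8_t>& b, std::size_t off)
{
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

inline uint32_t le32(const std::vector<uint8_t>& b, std::size_t off)
{
    return le16(b, off) | (static_cast<uint32_t>(le16(b, off + 2)) << 16);
}

struct FileRecordSummary {
    bool     isFileRecord = false;      // true when the signature is "FILE"
    uint16_t sequenceNumber = 0;
    uint16_t linkCount = 0;
    uint16_t attributesOffset = 0;
    uint16_t flags = 0;                 // 0x0001 = InUse, 0x0002 = Directory
    uint32_t bytesInUse = 0;
    uint32_t bytesAllocated = 0;
    uint16_t nextAttributeNumber = 0;
};

inline FileRecordSummary decode_file_record_header(const std::vector<uint8_t>& b)
{
    FileRecordSummary s;
    if (b.size() < 0x2A) return s;      // header not fully present
    s.isFileRecord = b[0] == 'F' && b[1] == 'I' && b[2] == 'L' && b[3] == 'E';
    s.sequenceNumber      = le16(b, 0x10);
    s.linkCount           = le16(b, 0x12);
    s.attributesOffset    = le16(b, 0x14);
    s.flags               = le16(b, 0x16);
    s.bytesInUse          = le32(b, 0x18);
    s.bytesAllocated      = le32(b, 0x1C);
    s.nextAttributeNumber = le16(b, 0x28);
    return s;
}
```

Decoding dwords_to_bytes(kHeaderDump) reproduces the values SoftICE printed for file: AttributesOffset 0x30, Flags 0x1, BytesInUse 0x2D8, BytesAllocated 0x400 and NextAttributeNumber 0x6.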
The attribute headers are defined as follows:
typedef struct {
    ATTRIBUTE_TYPE AttributeType;
    ULONG Length;
    BOOLEAN Nonresident;
    UCHAR NameLength;
    USHORT NameOffset;
    USHORT Flags;               // 0x0001 = Compressed
    USHORT AttributeNumber;
} ATTRIBUTE, *PATTRIBUTE;

typedef struct {
    ATTRIBUTE Attribute;
    ULONG ValueLength;
    USHORT ValueOffset;
    USHORT Flags;               // 0x0001 = Indexed
} RESIDENT_ATTRIBUTE, *PRESIDENT_ATTRIBUTE;

typedef struct {
    ULONGLONG DirectoryFileReferenceNumber;
    ULONGLONG CreationTime;     // Saved when filename last changed
    ULONGLONG ChangeTime;       // ditto
    ULONGLONG LastWriteTime;    // ditto
    ULONGLONG LastAccessTime;   // ditto
    ULONGLONG AllocatedSize;    // ditto
    ULONGLONG DataSize;         // ditto
    ULONG FileAttributes;       // ditto
    ULONG AlignmentOrReserved;
    UCHAR NameLength;
    UCHAR NameType;             // 0x01 = Long, 0x02 = Short
    WCHAR Name[1];
} FILENAME_ATTRIBUTE, *PFILENAME_ATTRIBUTE;
ATTRIBUTE_TYPE is an enumerated type: 0x10 is StandardInformation and 0x30 is FileName. Because the FileName attribute is always resident, I have also listed the RESIDENT_ATTRIBUTE definition. Now we can go on to dump the next attribute:
:dd @file+30+60    // @file + file->AttributesOffset + Length of the StandardInformation attribute
0023:00344DC8 00000030 00000068 00180000 00030000  0...h...........
              --------          --------
              |                 |_ The NameLength here is the length of the attribute's own name. Do not confuse it with the file name of $MFT.
              |_ Identifies this as a FileName attribute.
0023:00344DD8 0000004A 00010018 00000005 00050000  J...............
              --------     ---- --------
              |            |    |_ The address that ValueOffset points to, i.e. the start of the FILENAME_ATTRIBUTE.
              |            |_ ValueOffset
              |_ ValueLength
0023:00344DE8 2C1761D0 01BFB03C 2C1761D0 01BFB03C  .a.,<....a.,<...
0023:00344DF8 2C1761D0 01BFB03C 2C1761D0 01BFB03C  .a.,<....a.,<...
0023:00344E08 00004000 00000000 00004000 00000000  .@.......@......
0023:00344E18 00000006 00000000 00240304 0046004D  ..........$.M.F.
                                ----  -- --------
                                |     |_ NameLength
                                |_ The file name of $MFT (continues through the first dword of the next row).
0023:00344E28 00000054 00000000 00000080 00000190  T...............
0023:00344E38 00400001 00010000 00000000 00000000  ..@.............
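The walk through the dump above can be repeated in code. The sketch below takes the 0x68 bytes of the FileName attribute exactly as dumped and extracts the name. It is illustrative only: the function and type names are hypothetical, the offsets are derived from the ATTRIBUTE, RESIDENT_ATTRIBUTE and FILENAME_ATTRIBUTE definitions listed earlier, and the name is converted assuming it contains only ASCII characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The FileName attribute as dwords from 00344DC8 to 00344E2F.
const std::vector<uint32_t> kFileNameAttrDump = {
    0x00000030, 0x00000068, 0x00180000, 0x00030000,
    0x0000004A, 0x00010018, 0x00000005, 0x00050000,
    0x2C1761D0, 0x01BFB03C, 0x2C1761D0, 0x01BFB03C,
    0x2C1761D0, 0x01BFB03C, 0x2C1761D0, 0x01BFB03C,
    0x00004000, 0x00000000, 0x00004000, 0x00000000,
    0x00000006, 0x00000000, 0x00240304, 0x0046004D,
    0x00000054, 0x00000000,
};

// Expand 32-bit words into the byte order they have in little-endian memory.
inline std::vector<uint8_t> to_bytes(const std::vector<uint32_t>& words)
{
    std::vector<uint8_t> bytes;
    for (uint32_t w : words)
        for (int shift = 0; shift < 32; shift += 8)
            bytes.push_back(static_cast<uint8_t>(w >> shift));
    return bytes;
}

inline uint16_t rd16(const std::vector<uint8_t>& b, std::size_t off)
{
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

inline uint32_t rd32(const std::vector<uint8_t>& b, std::size_t off)
{
    return rd16(b, off) | (static_cast<uint32_t>(rd16(b, off + 2)) << 16);
}

struct FileNameInfo {
    bool        ok = false;
    uint32_t    attributeType = 0;
    uint32_t    length = 0;
    uint32_t    valueLength = 0;
    uint16_t    valueOffset = 0;
    uint8_t     nameType = 0;
    std::string name;                   // ASCII-only conversion
};

inline FileNameInfo decode_file_name_attribute(const std::vector<uint8_t>& b)
{
    FileNameInfo info;
    if (b.size() < 0x18 || b[8] != 0) return info;   // need a resident header
    info.attributeType = rd32(b, 0x00);
    info.length        = rd32(b, 0x04);
    info.valueLength   = rd32(b, 0x10);
    info.valueOffset   = rd16(b, 0x14);

    const std::size_t fn = info.valueOffset;         // start of FILENAME_ATTRIBUTE
    if (b.size() < fn + 0x42) return info;
    const std::size_t nameLength = b[fn + 0x40];     // counted in characters
    info.nameType = b[fn + 0x41];
    if (b.size() < fn + 0x42 + 2 * nameLength) return info;

    for (std::size_t i = 0; i < nameLength; ++i) {
        const uint16_t ch = rd16(b, fn + 0x42 + 2 * i);
        info.name += ch < 0x80 ? static_cast<char>(ch) : '?';
    }
    info.ok = true;
    return info;
}
```

Applied to to_bytes(kFileNameAttrDump), this yields ValueOffset 0x18, ValueLength 0x4A, a NameLength of 4 and the name $MFT, matching the annotations in the dump.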
That is how an attribute can be dumped. Finally, here is the code that traverses all file records. Before giving it, I should explain the $BITMAP attribute of $MFT. It is the counterpart of the s_inode_bitmap array in Linux ext2 (Linux 2.0), so its role is easy to understand: each bit indicates whether the corresponding file record is in use. The code of DumpAllFileRecord follows:
BOOLEAN BitSet(PUCHAR bitmap, ULONG i)
{
    // Bit i lives in byte i / 8, at bit position i % 8.
    return (bitmap[i >> 3] & (1 << (i & 7))) != 0;
}

VOID DumpAllFileRecord()
{
    PATTRIBUTE attr = FindAttribute(MFT, AttributeBitmap, 0);
    PUCHAR bitmap = new UCHAR[AttributeLengthAllocated(attr)];
    ReadAttribute(attr, bitmap);
    ULONG n = AttributeLength(FindAttribute(MFT, AttributeData, 0)) / BytesPerFileRecord;
    PFILE_RECORD_HEADER file = PFILE_RECORD_HEADER(new UCHAR[BytesPerFileRecord]);
    for (ULONG i = 0; i < n; i++) {
        if (!BitSet(bitmap, i)) continue;       // skip records that are not in use
        ReadFileRecord(i, file);
        if (file->Ntfs.Type == 'ELIF' && (file->Flags & 3)) {
            attr = FindAttribute(file, AttributeFileName, 0);
            if (attr == 0) continue;
            PFILENAME_ATTRIBUTE name =
                PFILENAME_ATTRIBUTE(Padd(attr, PRESIDENT_ATTRIBUTE(attr)->ValueOffset));
            printf("%8lu %.*ws\n", i, int(name->NameLength), name->Name);
        }
    }
    delete [] bitmap;
    delete [] PUCHAR(file);
}

This article borrows a number of definitions from Gary Nebbett's book. Some of them may differ slightly on particular versions of Windows 2000. Still, the Internet has its magic: although Microsoft does not publish this information, projects such as Linux-NTFS are very good material for studying NTFS, and this article also draws on many of the documents that project provides. In addition, Mark Russinovich's "Inside NTFS", "Exploring NTFS On-Disk Structures" and related articles are very good sources on NTFS.

This article does not yet cover how directories are organized in NTFS (B-trees); perhaps I will introduce that later. The full code presented in the text can be downloaded from http://webcrazy.yeah.net. Corrections are welcome by mail (tsu00@263.net)!

Finally, I thank Anton Altaparmakov, my colleague who bought Gary Nebbett's book for me while on a business trip, and the original authors of all the material I consulted. Thank you!

Reference:
1. Gary Nebbett, "Windows NT/2000 Native API Reference"
2. Linux-NTFS Project, NTFS Documentation, Version 0.4
3. Mark Russinovich, related documentation
4. David Solomon, "Inside Windows NT, 2nd Edition"