Using memory-mapped files in VC++ to process large files

xiaoxiao 2021-03-06

Introduction

File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; commonly used ones include the Win32 API functions CreateFile(), WriteFile() and ReadFile(), and the CFile class provided by MFC. In general these meet the requirements of most situations. However, some specialized application areas need storage of tens of GB, hundreds of GB, or even several TB, and the usual file handling methods are clearly not adequate for files of that size. At present, such large files are generally handled by means of memory-mapped files. This article discusses this Windows core programming technique.

Memory-mapped files

Memory-mapped files are somewhat similar to virtual memory. A memory-mapped file lets you reserve a region of address space and commit physical storage to that region. The difference is that the physical storage comes from a file that already exists on disk rather than from the system page file, and the file must be mapped before it can be operated on, after which it behaves as if the entire file had been loaded from disk into memory. As a result, when a file stored on disk is processed through a memory-mapped file, the application no longer has to perform explicit I/O operations on it, and it no longer has to request and allocate buffers for the file: all file caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and freeing the memory blocks are eliminated, memory-mapped files play an important role when processing large volumes of data. In addition, real systems often need to share data between several processes. If the amount of data is small, there are many flexible ways to do this, but if the shared data is very large, a memory-mapped file is needed. In fact, memory-mapped files are the most effective way to share data between multiple processes on the same machine.

Memory-mapped files are not simple file I/O; they rely on a core part of Windows programming, namely memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory management mechanism of the Windows operating system. That subject is complex and beyond the scope of this article; interested readers can refer to other books on the topic. The general procedure for using a memory-mapped file is as follows:

First, create or open a file kernel object with the CreateFile() function. This object identifies the file on disk that will be used as the memory-mapped file. Calling CreateFile() tells the operating system where the physical storage for the mapping is located, but it only specifies the path of the file, not the length of the mapping. To specify how much physical storage the file mapping object needs, call CreateFileMapping() to create a file mapping kernel object, which tells the system the size of the file and how it will be accessed. After the file mapping object has been created, a region of address space must be reserved for the file data, and the file data must be committed as the physical storage mapped to that region. The MapViewOfFile() function does this: under the system's management, it maps all or part of the file mapping object into the process address space. From this point on, the memory-mapped file is used and processed in much the same way as file data that has been loaded into memory in the usual way. When the memory-mapped file is no longer needed, a series of cleanup operations releases the resources that were used. This part is relatively simple: UnmapViewOfFile() removes the view of the file data from the process address space, and CloseHandle() closes the file mapping object and the file object created earlier.

Functions related to memory-mapped files

The API functions used with memory-mapped files are mainly those mentioned above. They are introduced in turn below:

HANDLE CreateFile(LPCTSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurityAttributes, DWORD dwCreationDisposition, DWORD dwFlagsAndAttributes, HANDLE hTemplateFile);

CreateFile() is commonly used to create and open files even in ordinary file operations. When working with memory-mapped files, it creates or opens a file kernel object and returns its handle. When calling the function, set the parameters dwDesiredAccess and dwShareMode according to whether the data will be read and written and how the file is to be shared. Incorrect settings will cause the corresponding later operations to fail.

HANDLE CreateFileMapping(HANDLE hFile, LPSECURITY_ATTRIBUTES lpFileMappingAttributes, DWORD flProtect, DWORD dwMaximumSizeHigh, DWORD dwMaximumSizeLow, LPCTSTR lpName);

The CreateFileMapping() function creates a file mapping kernel object. The parameter hFile specifies the handle of the file to be mapped into the process address space (the handle is obtained from CreateFile()). Because the physical storage of a memory-mapped file is actually a file on disk, not memory allocated from the system page file, the system does not reserve an address space region for it on its own initiative, nor does it automatically map the file's storage to such a region. So that the system can determine which protection attributes to apply to the pages, they are set through the parameter flProtect. The protection attributes PAGE_READONLY, PAGE_READWRITE and PAGE_WRITECOPY mean, respectively, that after the file mapping object is mapped the file data can be read, read and written, or read and written copy-on-write. With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ; PAGE_READWRITE requires that CreateFile() was called with GENERIC_READ | GENERIC_WRITE; PAGE_WRITECOPY requires at least GENERIC_READ (either GENERIC_READ alone or GENERIC_READ | GENERIC_WRITE). The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: together they specify the maximum size of the file in bytes. Because the two parameters combine into a 64-bit value, the maximum supported file length is 16 EB, which is enough for almost any large-file processing task.
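Because the maximum size is passed as two 32-bit halves, a 64-bit byte count has to be split before the call. The following is a minimal sketch of that arithmetic in portable C++. The names `SizeParts`, `splitSize` and `joinSize` are hypothetical and introduced only for illustration, and `std::uint32_t` stands in for the Win32 `DWORD` type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves that CreateFileMapping()
// takes as dwMaximumSizeHigh and dwMaximumSizeLow.
struct SizeParts {
    std::uint32_t high;  // upper 32 bits of the size
    std::uint32_t low;   // lower 32 bits of the size
};

// Recombine the halves into the original 64-bit byte count.
inline std::uint64_t joinSize(SizeParts parts) {
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}

// Split a 64-bit byte count into its high and low 32-bit halves.
inline SizeParts splitSize(std::uint64_t size) {
    SizeParts parts;
    parts.high = static_cast<std::uint32_t>(size >> 32);
    parts.low  = static_cast<std::uint32_t>(size & 0xFFFFFFFFu);
    assert(joinSize(parts) == size);  // sanity check: nothing was lost
    return parts;
}
```

For a 30 GB mapping (30 * 2^30 bytes) this gives high = 0x7 and low = 0x80000000. Passing 0 as the high half, as is tempting for small files, would truncate any size of 4 GB or more.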

LPVOID MapViewOfFile(HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap);

The MapViewOfFile() function maps the file data into the address space of the process. The parameter hFileMappingObject is the file mapping object handle returned by CreateFileMapping(). The parameter dwDesiredAccess specifies once again how the file data will be accessed, and it must be compatible with the protection attribute set in the call to CreateFileMapping(). Setting the protection attributes repeatedly may seem redundant, but it gives the application finer control over how the data is protected. MapViewOfFile() can map all or only part of the file. When mapping, you specify the offset into the data file and the length to map. The file offset is the 64-bit value formed by the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. On Windows the allocation granularity is 64 KB in practice, but it can also be obtained dynamically with the following code:

    SYSTEM_INFO sinf;
    GetSystemInfo(&sinf);
    DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;
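Since data rarely starts exactly on a granularity boundary, the requested position usually has to be rounded down before it is passed as the view offset. The sketch below shows that rounding in portable C++. `AlignedOffset` and `alignOffset` are hypothetical names, and the granularity is passed in as a plain number rather than read from GetSystemInfo().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: where the view must start, and how far into
// the view the requested byte lies.
struct AlignedOffset {
    std::uint64_t viewBase;  // offset for the view, a multiple of the granularity
    std::uint64_t delta;     // distance from viewBase to the requested offset
};

// Round an arbitrary file offset down to the allocation granularity.
inline AlignedOffset alignOffset(std::uint64_t offset, std::uint32_t granularity) {
    assert(granularity != 0);
    AlignedOffset r;
    r.viewBase = offset - (offset % granularity);
    r.delta = offset - r.viewBase;
    return r;
}
```

With a granularity of 65536 and a requested offset of 100000, the view has to start at 65536 and the requested byte is found 34464 bytes into the view. The view length must then cover delta plus the number of bytes actually needed.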

The parameter dwNumberOfBytesToMap specifies how many bytes of the data file to map (in current SDK headers it is declared as SIZE_T). Note in particular that on Windows 9x, if MapViewOfFile() cannot find a region large enough to hold the entire file mapping object, it returns NULL. On Windows 2000, MapViewOfFile() only needs to find a region large enough for the requested view, regardless of the size of the whole file mapping object.
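A related constraint, stated here as a known property of MapViewOfFile() rather than something this article spells out: a view may not extend past the end of the file mapping object, so the length of the last view usually has to be clamped. The following is a minimal portable sketch of that calculation; `viewLength` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes to request for a view that starts at
// viewBase, given the size of the mapping object and the preferred view size.
inline std::uint64_t viewLength(std::uint64_t mappingSize,
                                std::uint64_t viewBase,
                                std::uint64_t blockSize) {
    assert(viewBase <= mappingSize);  // a view cannot start past the end
    std::uint64_t remaining = mappingSize - viewBase;
    return remaining < blockSize ? remaining : blockSize;
}
```

A result of 0 means that no room is left in the mapping. It should not be passed on as the view length, because MapViewOfFile() interprets a length of 0 as "map from the offset to the end of the mapping object".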

After the file data mapped into the process address space has been processed, the view of the file data must be released with the UnmapViewOfFile() function, whose prototype is:

BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);

Its only parameter, lpBaseAddress, specifies the base address of the region to release, and it must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() must have a matching call to UnmapViewOfFile(); otherwise the reserved region is not released until the process terminates. In addition, the file kernel object and the file mapping kernel object created earlier must be released with CloseHandle() before the process terminates, or resources will leak.

Besides these essential API functions, other auxiliary functions should be used as the situation requires. For example, to improve speed, the system caches the data pages of a memory-mapped file and does not immediately update the file's disk image while a view of the file mapping is being processed. To deal with this, consider using the FlushViewOfFile() function, which forces the system to write some or all of the modified data back to the disk image, so that all data updates are saved to disk promptly.

An example: using a memory-mapped file to process a large file

A concrete example illustrates how memory-mapped files are used. The example receives data from a port and stores it on disk in real time. Because the amount of data is large (tens of GB), a memory-mapped file is used. The code below is the main part of the worker thread MainProc, which is started when the program runs. When data arrives at the port, the event hEvent[0] is signaled; once WaitForMultipleObjects() sees this event, the received data is saved to disk. When reception is terminated, the event hEvent[1] is signaled, and its handler is responsible for releasing resources and closing the file. The thread function is implemented as follows:

    ......
    // Create the file kernel object; its handle is stored in hFile
    HANDLE hFile = CreateFile("Recv1.zip", GENERIC_WRITE | GENERIC_READ,
        FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

    // Create the file mapping kernel object; its handle is stored in hFileMapping
    HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
        0, 0x4000000, NULL);
    // Release the file kernel object
    CloseHandle(hFile);

    // Set the size, offsets and other parameters
    // (sinf is the SYSTEM_INFO structure filled in by GetSystemInfo())
    __int64 qwFileSize = 0x4000000;
    __int64 qwFileOffset = 0;   // total number of bytes written so far
    __int64 qwViewBase = 0;     // file offset at which the current view starts
    __int64 T = 600 * sinf.dwAllocationGranularity;
    DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;

    // Map the file data into the address space of the process
    PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS,
        (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
    while (bLoop)
    {
        // Wait for event hEvent[0] or event hEvent[1]
        DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
        ret -= WAIT_OBJECT_0;
        switch (ret)
        {
        // Data-received event triggered
        case 0:
            // Receive data from the port and store it in the memory-mapped file.
            // pbFile points at file offset qwViewBase, so the write position
            // is taken relative to the start of the current view.
            nReadLen = syio_Read(port[1], pbFile + (DWORD)(qwFileOffset - qwViewBase), QueueLen);
            qwFileOffset += nReadLen;

            // When the view is 60% full, open a new view further on to prevent overflow
            if (qwFileOffset > T)
            {
                // A view offset must be a multiple of the allocation granularity,
                // so round the current position down
                qwViewBase = qwFileOffset - (qwFileOffset % sinf.dwAllocationGranularity);
                T = qwViewBase + 600 * sinf.dwAllocationGranularity;

                // A view cannot extend past the end of the mapping object
                DWORD dwBytesToMap = dwBytesInBlock;
                if (qwFileSize - qwViewBase < dwBytesInBlock)
                    dwBytesToMap = (DWORD)(qwFileSize - qwViewBase);

                UnmapViewOfFile(pbFile);
                pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS,
                    (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesToMap);
            }
            break;

        // Termination event triggered
        case 1:
            bLoop = FALSE;

            // Remove the file data view from the address space of the process
            UnmapViewOfFile(pbFile);

            // Close the file mapping object
            CloseHandle(hFileMapping);
            break;
        }
    }
    ......

If the termination handler only calls UnmapViewOfFile() and CloseHandle(), the actual size of the data is not reflected in the file. For example, if the memory-mapped file is 30 GB and only 14 GB of data was received, the saved file is still 30 GB long after the program above has run. The file therefore has to be restored to its actual size once processing is complete. The main code for this is as follows:

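    // Create another file kernel object
    hFile2 = CreateFile("Recv.zip", GENERIC_WRITE | GENERIC_READ,
        FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

    // Create another file mapping kernel object, sized to the actual data length
    hFileMapping2 = CreateFileMapping(hFile2, NULL, PAGE_READWRITE,
        (DWORD)(qwFileOffset >> 32), (DWORD)(qwFileOffset & 0xFFFFFFFF), NULL);

    // Close the file kernel object
    CloseHandle(hFile2);

    // Remove the receive view; the copy below maps its own views
    UnmapViewOfFile(pbFile);

    // Copy the data from the original memory-mapped file to the new one,
    // one block at a time. A single view over the whole file would not fit
    // into a 32-bit address space for files of tens of GB. dwBytesInBlock is
    // a multiple of the allocation granularity, so every qwPos is a valid
    // view offset.
    for (__int64 qwPos = 0; qwPos < qwFileOffset; qwPos += dwBytesInBlock)
    {
        DWORD dwLen = dwBytesInBlock;
        if (qwFileOffset - qwPos < dwBytesInBlock)
            dwLen = (DWORD)(qwFileOffset - qwPos);

        // Map matching views of both files into the address space of the process
        pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS,
            (DWORD)(qwPos >> 32), (DWORD)(qwPos & 0xFFFFFFFF), dwLen);
        pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2, FILE_MAP_ALL_ACCESS,
            (DWORD)(qwPos >> 32), (DWORD)(qwPos & 0xFFFFFFFF), dwLen);

        memcpy(pbFile2, pbFile, dwLen);

        // Remove the file data views from the address space of the process
        UnmapViewOfFile(pbFile);
        UnmapViewOfFile(pbFile2);
    }

    // Close the file mapping objects
    CloseHandle(hFileMapping);
    CloseHandle(hFileMapping2);

    // Delete the temporary file
    DeleteFile("Recv1.zip");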
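When the received data is larger than a single view can cover, the copy into the second file has to proceed block by block. The block boundaries can be worked out independently of the Win32 calls. The following portable C++ sketch does only that planning step; `Chunk` and `planChunks` are hypothetical names, and the block size is assumed to be a multiple of the allocation granularity so that every chunk offset is a valid view offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one view: where it starts and how long it is.
struct Chunk {
    std::uint64_t offset;  // file offset of the view (a multiple of blockSize)
    std::uint32_t length;  // number of bytes to map and copy
};

// Split totalBytes into consecutive chunks of at most blockSize bytes.
inline std::vector<Chunk> planChunks(std::uint64_t totalBytes, std::uint32_t blockSize) {
    assert(blockSize != 0);
    std::vector<Chunk> chunks;
    for (std::uint64_t pos = 0; pos < totalBytes; pos += blockSize) {
        std::uint64_t remaining = totalBytes - pos;
        Chunk c;
        c.offset = pos;
        // The last chunk may be shorter than a full block.
        c.length = remaining < blockSize ? static_cast<std::uint32_t>(remaining) : blockSize;
        chunks.push_back(c);
    }
    return chunks;
}
```

For 150000 bytes and a block size of 65536 this yields three chunks: two full blocks at offsets 0 and 65536, and a final chunk of 18928 bytes at offset 131072. Each chunk corresponds to one pair of MapViewOfFile() calls, one memcpy() and one pair of UnmapViewOfFile() calls.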

Conclusion

Practical testing shows that memory-mapped files perform well when processing large files, with a clear advantage over the usual file handling methods based on the CFile class or on ReadFile() and WriteFile(). The code described in this article was compiled with Microsoft Visual C++ 6.0 under Windows 98.

When reposting, please cite the original address: https://www.9cbs.com/read-113555.html
