Using Memory-Mapped Files in VC++ to Process Large Files

xiaoxiao2021-03-05  26

Introduction

File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; commonly used ones include the Win32 API functions CreateFile(), WriteFile(), and ReadFile(), and the MFC class CFile. In general these meet the requirements of most cases. Some specialized applications, however, must handle files of tens of GB, hundreds of GB, or even several TB, and the usual file-processing methods are clearly inadequate for them. Files of this size are generally handled with memory-mapped files, and this Windows core programming technique is discussed below.

Memory-mapped files are similar to virtual memory: through a memory-mapped file you can reserve a region of address space and commit physical storage to it. The difference is that the physical storage comes from a file that already exists on disk rather than from the system's paging file, and the file must be mapped before it is operated on, as if the whole file had been loaded from disk into memory. Consequently, when a memory-mapped file is used to process a file stored on disk, the program no longer has to perform explicit I/O on the file, which also means it no longer has to request and allocate buffers for the file; all buffering is managed directly by the system. Because the steps of loading the file data into memory, writing data back from memory to the file, and freeing the memory blocks are eliminated, memory-mapped files play a very important role when large amounts of data are processed. In addition, real-world systems often need to share data among multiple processes. If the amount of data is small, there are many flexible ways to do so, but if the shared data is very large, a memory-mapped file is needed. In fact, memory-mapped files are the most effective way to share data among processes on the same machine.
A memory-mapped file is not a simple file I/O operation; it actually relies on a core Windows programming technology, memory management. To understand memory-mapped files in depth, you must therefore have a clear understanding of the memory management mechanism of Windows. That subject is quite complex and beyond the scope of this article; interested readers can refer to other books on the subject. The general method for using a memory-mapped file is as follows.

First, create or open a file kernel object with CreateFile(); this object identifies the file on disk that will be used as the memory-mapped file. CreateFile() tells the operating system where the file image is located, but it specifies only the path of the file, not the length of the image. To specify how much physical storage the file-mapping object requires, you then create a file-mapping kernel object with CreateFileMapping(), which tells the system the size of the file and how it will be accessed. After the file-mapping object has been created, you must reserve a region of address space for the file's data and commit the file's data as the physical storage mapped to that region. MapViewOfFile() does this, mapping all or part of the file-mapping object into the process's address space under the system's management. From this point on, the memory-mapped file is used and processed in much the same way as file data that has been loaded into memory in the usual manner. When you have finished with the memory-mapped file, a series of operations cleans up and releases the resources used. This part is relatively simple: UnmapViewOfFile() removes the file data image from the process's address space, and CloseHandle() closes the file-mapping object and the file object created earlier.
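The sequence of steps described above can be sketched roughly as follows. This is only a minimal illustration, not the article's own code: the helper names SumBytes and SumFileBytes are hypothetical, the file is mapped read-only in a single view (which only suits files that fit into the process's address space), and the Win32 part is compiled only on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical stand-in for "processing": once a view is mapped, the file's
// bytes are handled like any other memory range.
inline std::uint64_t SumBytes(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) sum += data[i];
    return sum;
}

#ifdef _WIN32
// Hypothetical helper: maps a whole, non-empty file read-only and sums its
// bytes. Returns false if any step fails.
inline bool SumFileBytes(const wchar_t* path, std::uint64_t* result) {
    // 1. Create/open the file kernel object.
    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (hFile == INVALID_HANDLE_VALUE) return false;

    LARGE_INTEGER size;
    bool ok = GetFileSizeEx(hFile, &size) && size.QuadPart > 0;

    // 2. Create the file-mapping kernel object (size 0/0 = current file size).
    //    CreateFileMapping reports failure with NULL, not INVALID_HANDLE_VALUE.
    HANDLE hMapping = ok
        ? CreateFileMappingW(hFile, nullptr, PAGE_READONLY, 0, 0, nullptr)
        : nullptr;
    ok = ok && hMapping != nullptr;

    // 3. Map a view; a length of 0 maps from the offset to the end of the mapping.
    const void* view = ok ? MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0) : nullptr;
    ok = ok && view != nullptr;

    // 4. Use the view as ordinary memory.
    if (ok) {
        *result = SumBytes(static_cast<const unsigned char*>(view),
                           static_cast<std::size_t>(size.QuadPart));
    }

    // 5. Clean up in reverse order of creation.
    if (view) UnmapViewOfFile(view);
    if (hMapping) CloseHandle(hMapping);
    CloseHandle(hFile);
    return ok;
}
#endif
```

For files too large to map in one view, the same steps apply, but MapViewOfFile() is called repeatedly with successive offsets and a bounded length.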

Functions Related to Memory-Mapped Files

The API functions used with memory-mapped files are mainly the ones mentioned above. They are introduced in turn below:

HANDLE CreateFile(LPCTSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurityAttributes, DWORD dwCreationDisposition, DWORD dwFlagsAndAttributes, HANDLE hTemplateFile);

CreateFile() is also used to create and open files in ordinary file operations. When working with memory-mapped files, it creates or opens a file kernel object and returns its handle. The parameters dwDesiredAccess and dwShareMode must be set according to whether the data will be read or written and how the file is to be shared; wrong settings will cause the corresponding later operations to fail.

HANDLE CreateFileMapping(HANDLE hFile, LPSECURITY_ATTRIBUTES lpFileMappingAttributes, DWORD flProtect, DWORD dwMaximumSizeHigh, DWORD dwMaximumSizeLow, LPCTSTR lpName);

CreateFileMapping() creates a file-mapping kernel object. The parameter hFile specifies the handle of the file to be mapped into the process's address space (the handle is the return value of CreateFile()). Because the physical storage of a memory-mapped file is actually the file on disk, not memory allocated from the system's paging file, the system does not reserve a region of address space on its own initiative, nor does it automatically map the file's storage to such a region. So that the system can determine which protection attributes to give the pages, they must be set through the parameter flProtect. The protection attributes PAGE_READONLY and PAGE_READWRITE indicate that, once the file-mapping object is mapped, the file data can be read or can be read and written, respectively; with PAGE_WRITECOPY, writes go to private copies of the pages rather than to the file.
When PAGE_READONLY is used, CreateFile() must have been called with GENERIC_READ; PAGE_READWRITE requires that CreateFile() was called with GENERIC_READ | GENERIC_WRITE; and PAGE_WRITECOPY only requires that CreateFile() was called with either GENERIC_READ or GENERIC_READ | GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also quite important: they specify the maximum size of the file in bytes. Together the two parameters form a 64-bit value, so the maximum file length that can be handled is 16 EB, which satisfies the requirements of almost any large-data file processing.

LPVOID MapViewOfFile(HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap);

MapViewOfFile() maps the file data into the process's address space. The parameter hFileMappingObject is the file-mapping object handle returned by CreateFileMapping(). The parameter dwDesiredAccess again specifies how the file data will be accessed, and it must be compatible with the protection attribute given to CreateFileMapping(). Although setting the protection twice may seem redundant, it gives the application finer control over the protection of the data.
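Because sizes and offsets are passed as separate high and low DWORD halves, a 64-bit value has to be split before the call. A minimal sketch of that arithmetic follows; SizeParts, SplitSize, and JoinSize are hypothetical names, and std::uint32_t is used as a stand-in for DWORD so that the snippet does not depend on the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two 32-bit halves; std::uint32_t has the same
// width as DWORD.
struct SizeParts {
    std::uint32_t high;  // would be passed as dwMaximumSizeHigh / dwFileOffsetHigh
    std::uint32_t low;   // would be passed as dwMaximumSizeLow / dwFileOffsetLow
};

// Split a 64-bit byte count into its upper and lower 32 bits.
inline SizeParts SplitSize(std::uint64_t bytes) {
    return SizeParts{ static_cast<std::uint32_t>(bytes >> 32),
                      static_cast<std::uint32_t>(bytes & 0xFFFFFFFFu) };
}

// Recombine the two halves into the original 64-bit value.
inline std::uint64_t JoinSize(std::uint32_t high, std::uint32_t low) {
    const std::uint64_t joined = (static_cast<std::uint64_t>(high) << 32) | low;
    assert(SplitSize(joined).high == high && SplitSize(joined).low == low);
    return joined;
}
```

For a 30 GB mapping, for instance, SplitSize(30ull << 30) yields a high part of 7 and a low part of 0x80000000, which would be passed as the dwMaximumSizeHigh and dwMaximumSizeLow arguments.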

MapViewOfFile() can map all or part of the file. When mapping, you specify the offset into the file and the length to map. The offset is given by the 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. On Windows the allocation granularity is 64 KB. It can also be obtained dynamically from the operating system with the following code:

    SYSTEM_INFO sinf;
    GetSystemInfo(&sinf);
    DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;

The parameter dwNumberOfBytesToMap specifies the length of file data to map. Note in particular that under Windows 9x, if MapViewOfFile() cannot find a region large enough to hold the entire file-mapping object, it returns NULL; under Windows 2000, MapViewOfFile() needs to find only a region large enough for the requested view, regardless of the size of the entire file-mapping object.

After the file mapped into the process's address space has been processed, the file data image must be released with UnmapViewOfFile(), whose prototype is as follows:

BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);

Its only parameter, lpBaseAddress, specifies the base address of the region to be released and must be the return value of MapViewOfFile(). Every call to MapViewOfFile() must have a corresponding call to UnmapViewOfFile(); otherwise the reserved region is not released until the process terminates. In addition, the file kernel object and the file-mapping kernel object created earlier must be released with CloseHandle() before the process terminates, or resources will leak.
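Since a view can only start on a multiple of the allocation granularity, reaching an arbitrary byte in the file means mapping from the nearest lower boundary and then indexing into the view. A minimal sketch of that calculation is shown below; AlignDown and OffsetInView are hypothetical helper names, and the granularity is passed in as a parameter (on Windows it would come from sinf.dwAllocationGranularity).

```cpp
#include <cassert>
#include <cstdint>

// Largest multiple of `granularity` that is <= offset; this is a valid
// starting offset for a view.
inline std::uint64_t AlignDown(std::uint64_t offset, std::uint32_t granularity) {
    assert(granularity != 0);
    return offset - (offset % granularity);
}

// Distance from the aligned view start to the requested byte; add this to the
// pointer returned for the view to reach the byte at `offset`.
inline std::uint32_t OffsetInView(std::uint64_t offset, std::uint32_t granularity) {
    assert(granularity != 0);
    return static_cast<std::uint32_t>(offset % granularity);
}
```

With a 64 KB granularity, the byte at file offset 100000 would be reached by mapping a view at offset 65536 and reading index 34464 within it.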
Besides these essential API functions, other auxiliary functions should be used as circumstances require. For example, to improve speed the system caches the file's data pages, so the file's disk image is not updated immediately while the mapped view is being processed. To address this, consider calling FlushViewOfFile(), which forces some or all of the modified data to be written back to the disk image so that all updates are saved to disk promptly.

Application Example: Processing a Large File with a Memory-Mapped File

The following concrete example further illustrates how memory-mapped files are used. The example receives data from a port and stores it on disk in real time; because the amount of data is large (tens of GB), a memory-mapped file is used. Part of the main code of the worker thread MainProc is given below. The thread runs from program start. When data arrives at the port, event hEvent[0] is signaled; WaitForMultipleObjects() detects the event and the received data is saved to disk. When reception ends, event hEvent[1] is signaled, and its handler releases the resources and closes the file.

The thread handler is implemented as follows:

    ......
    // Create the file kernel object; its handle is stored in hFile
    HANDLE hFile = CreateFile("Recv1.zip", GENERIC_WRITE | GENERIC_READ, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    // Create the file-mapping kernel object; its handle is stored in hFileMapping
    HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0x4000000, NULL);
    // Release the file kernel object
    CloseHandle(hFile);
    // Set the size, offset, and other parameters
    __int64 qwFileSize = 0x4000000;
    __int64 qwFileOffset = 0;   // total number of bytes written so far
    __int64 qwViewBase = 0;     // file offset at which the current view starts
    __int64 T = 600 * sinf.dwAllocationGranularity;
    DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;
    // Map the file data into the process's address space
    PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS, (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
    while (bLoop)
    {
        // Wait for event hEvent[0] or event hEvent[1]
        DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
        ret -= WAIT_OBJECT_0;
        switch (ret)
        {
        // Data-received event
        case 0:
            // Receive data from the port and save it into the memory-mapped file;
            // pbFile points at qwViewBase, so index relative to the view start
            nReadLen = syio_Read(port[1], pbFile + (qwFileOffset - qwViewBase), ...);
            qwFileOffset += nReadLen;
            // Once 60% of the view is filled, open a new view to prevent overflow
            if (qwFileOffset > T)
            {
                // A view must start on an allocation-granularity boundary and
                // must not extend past the end of the mapping object
                qwViewBase = qwFileOffset - qwFileOffset % sinf.dwAllocationGranularity;
                T = qwViewBase + 600 * sinf.dwAllocationGranularity;
                DWORD dwMap = (DWORD)min((__int64)dwBytesInBlock, qwFileSize - qwViewBase);
                UnmapViewOfFile(pbFile);
                pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS, (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwMap);
            }
            break;
        // Termination event
        case 1:
            bLoop = FALSE;
            // Remove the file data image from the process's address space
            UnmapViewOfFile(pbFile);
            // Close the file-mapping object
            CloseHandle(hFileMapping);
            break;
        }
    }
    ......
If the termination-event handler simply calls UnmapViewOfFile() and CloseHandle(), the actual size of the file is never recorded. That is, if the memory-mapped file was opened at 30 GB and only 14 GB of data was received, the saved file is still 30 GB long after the program above has run.
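When the receive loop above switches to a new view, the view's offset has to be a multiple of the allocation granularity and the view must not extend past the end of the mapping object. A minimal sketch of that bookkeeping is given below, separated from the Win32 calls; ViewWindow, PlanView, and NeedsRemap are hypothetical names, and the 60% threshold is taken relative to the current view's length, which is an assumption of this sketch rather than the article's exact rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of one mapped view.
struct ViewWindow {
    std::uint64_t base;    // file offset where the view starts (granularity-aligned)
    std::uint64_t length;  // bytes to map, clamped to the end of the mapping object
};

// Choose the view that contains writePos: start at the granularity boundary at
// or below writePos and map at most viewBytes, without running past mappingSize.
inline ViewWindow PlanView(std::uint64_t writePos, std::uint64_t mappingSize,
                           std::uint64_t viewBytes, std::uint32_t granularity) {
    assert(granularity != 0 && writePos < mappingSize);
    const std::uint64_t base = writePos - (writePos % granularity);
    return ViewWindow{ base, std::min(viewBytes, mappingSize - base) };
}

// True once the write position has passed 60% of the current view.
inline bool NeedsRemap(std::uint64_t writePos, const ViewWindow& view) {
    return writePos > view.base + view.length / 10 * 6;
}
```

The base and length of the planned window would then be split into high and low halves and passed to MapViewOfFile(), and the write pointer would be the view address plus (writePos - base).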

When reposting, please cite the original address: https://www.9cbs.com/read-35706.html
