Berkeley DB Overview
Picong
Berkeley DB is an open source embedded database library developed by Sleepycat Software in the United States. It provides scalable, high-performance, transactional data management services and exposes a simple function-call API for accessing and managing data.
It is a classic library-style toolkit that gives application developers a broad set of functions for industrial-strength database services. Its main features are as follows:
Embedded: The library is linked directly into the application and runs in the same address space. Database operations therefore require no inter-process communication, whether between computers on a network or between processes on the same computer.
Berkeley DB provides APIs for a variety of programming languages, including C, C++, Java, Perl, Tcl, Python, and PHP, and all database operations take place inside the library. Multiple processes, or multiple threads of one process, can use the database simultaneously as if each were using it alone; underlying services such as locking, transaction logging, shared buffer management, and memory management are performed transparently by the library.
Portable: It runs on almost all UNIX and Linux systems and their variants, on Windows, and on a variety of embedded real-time operating systems, on both 32-bit and 64-bit hardware. It has been deployed in high-end Internet servers, desktops, handheld computers, set-top boxes, network switches, and other settings. Once Berkeley DB is linked into an application, the end user is generally unaware that a database system is present.
Scalable: This shows in several ways. The library itself is compact (less than 300KB of text space), yet it can manage databases of up to 256TB. It supports high concurrency, with thousands of users operating on the same database at the same time. Berkeley DB can run on systems with very little memory and storage, or use several GB of memory and several TB of disk space on high-end servers.
Berkeley DB outperforms relational and object-oriented databases in embedded applications for two reasons: (1) Because the database library runs in the same address space as the application, database operations require no inter-process communication. The cost of communication between processes, whether on one machine or across machines, is far greater than the overhead of a function call;
(2) Because Berkeley DB uses a single set of API functions for all operations, there is no query language to parse and no execution plan to generate, which greatly improves efficiency.
Berkeley DB system structure
Berkeley DB consists of five major subsystems: the access methods subsystem, the memory pool subsystem, the transaction subsystem, the locking subsystem, and the logging subsystem. The access methods subsystem is the core component inside the Berkeley DB database process package, while the other subsystems sit outside it.
Each subsystem supports a different level of application need.
1. The Access Methods subsystem supports creating and accessing database files. Berkeley DB provides four file storage methods: hash, B-tree, fixed-length record (queue), and variable-length record (based on record numbers), and the application can select the file organization that suits it best. The programmer can use any of these structures when creating a table and can mix files of different storage types in the same application. When no transaction management is needed, the modules in this subsystem can be used on their own to give the application fast and efficient data access. The access methods subsystem suits applications that need fast, formatted file storage without transactions.
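The difference between the access methods is easiest to see in what each one makes cheap. The following is a minimal sketch in plain Python, not Berkeley DB's API: `SortedStore` and `RecnoStore` are hypothetical names, a `dict` stands in for a hash database, and the 1-based record numbering is an assumption made for the illustration.

```python
import bisect

# Hash-style access: exact-match lookup by key, no useful key order.
hash_db = {b"pear": b"3", b"apple": b"1", b"fig": b"2"}


class SortedStore:
    """B-tree-style access: keys are kept sorted, so range scans are cheap."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range(self, lo, hi):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        for key in self._keys[start:end]:
            yield key, self._values[key]


class RecnoStore:
    """Record-number access: records are addressed by position (1-based here)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records)   # record number of the new record

    def get(self, recno):
        return self._records[recno - 1]
```

Under these assumptions, an application that only ever looks up exact keys would lean toward the hash style, one that needs ordered or range access toward the B-tree style, and one whose records have no natural key toward record numbers.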
2. The Memory Pool subsystem manages the shared buffers used by Berkeley DB. It allows multiple processes, or multiple threads of one process, that access the database at the same time to share one cache, and it is responsible for writing modified pages back to the file and allocating memory for newly read pages. It can also be used on its own, independently of the rest of Berkeley DB, to allocate memory for an application's own files and pages. The memory pool subsystem suits applications that need flexible, page-oriented, buffered access to shared files.
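The two duties described above (writing modified pages back and making room for newly read pages) can be sketched as a small page cache. This is an illustrative model only: `PagePool` is a hypothetical class, a `dict` stands in for the file, and least-recently-used eviction is an assumed policy, not a statement about Berkeley DB's internals.

```python
from collections import OrderedDict


class PagePool:
    """Toy page cache with LRU eviction and write-back of dirty pages."""

    def __init__(self, backing, capacity):
        self.backing = backing        # dict: page number -> bytes (stands in for the file)
        self.capacity = capacity      # maximum number of cached pages (>= 1)
        self._pages = OrderedDict()   # page number -> [data, dirty flag], oldest first

    def _evict_if_full(self):
        while len(self._pages) >= self.capacity:
            pgno, (data, dirty) = self._pages.popitem(last=False)
            if dirty:
                self.backing[pgno] = data   # write the modified page back

    def get(self, pgno):
        """Return the page contents, reading from the backing store on a miss."""
        if pgno in self._pages:
            self._pages.move_to_end(pgno)   # mark as most recently used
        else:
            self._evict_if_full()
            self._pages[pgno] = [self.backing.get(pgno, b""), False]
        return self._pages[pgno][0]

    def put(self, pgno, data):
        """Modify a page in the cache only; the file is updated later."""
        self.get(pgno)                      # ensure the page is cached and recent
        self._pages[pgno] = [data, True]

    def sync(self):
        """Flush every dirty page to the backing store."""
        for pgno, entry in self._pages.items():
            if entry[1]:
                self.backing[pgno] = entry[0]
                entry[1] = False
```

A modified page reaches the backing store only when it is evicted or when `sync()` is called, which is what makes the cache worth sharing between readers and writers.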
3. The Transaction subsystem provides transaction management for Berkeley DB. It allows a group of modifications to the database to be treated as one atomic unit: either all of them take effect or none do. By default the system provides strict ACID properties, but the application can choose to relax the isolation guarantees. The subsystem uses two-phase locking and write-ahead logging to ensure the correctness and consistency of the data. It can also be used by an application to protect updates to its own data. The transaction subsystem suits applications that need transactional guarantees for data modifications.
4. The Locking subsystem provides the locking mechanism for Berkeley DB, giving multiple-reader, single-writer shared access control. The access methods subsystem uses it to obtain read and write permission on pages or records; the transaction subsystem uses it to implement concurrency control across transactions. The subsystem can also be used on its own by applications. It suits applications that need a flexible, fast, configurable lock manager.
5. The Logging subsystem implements write-ahead logging and supports the transaction subsystem in recovering data and keeping it consistent. It is rarely used on its own by applications; it normally serves as a module called by the transaction subsystem.
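How write-ahead logging supports all-or-nothing updates can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: `TxnStore`, `Txn`, and `recover` are hypothetical names, the log is an in-memory list, and changes are applied to the data only at commit, which is easier to follow than what a real engine does.

```python
class TxnStore:
    """Toy key/value store with write-ahead logging."""

    def __init__(self):
        self.log = []       # append-only log (stands in for the log file)
        self.data = {}      # stands in for the database pages
        self._next_id = 1

    def begin(self):
        txn = Txn(self, self._next_id)
        self._next_id += 1
        return txn


class Txn:
    def __init__(self, store, txn_id):
        self.store = store
        self.txn_id = txn_id
        self.pending = {}

    def put(self, key, value):
        # The log record is written before the data is touched.
        self.store.log.append(("put", self.txn_id, key, value))
        self.pending[key] = value

    def commit(self):
        self.store.log.append(("commit", self.txn_id))
        self.store.data.update(self.pending)   # all changes become visible together
        self.pending = {}

    def abort(self):
        self.store.log.append(("abort", self.txn_id))
        self.pending = {}                      # none of the changes are applied


def recover(log):
    """Rebuild the data from the log, replaying only committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    data = {}
    for rec in log:
        if rec[0] == "put" and rec[1] in committed:
            data[rec[2]] = rec[3]
    return data
```

In this model a transaction that was aborted, or that never reached its commit record before a crash, leaves no trace in the recovered data, because recovery replays only the transactions whose commit record is in the log.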
Together, these parts make up the entire Berkeley DB database system. The relationship among them is as follows. In this model, the application calls the access methods subsystem and the transaction subsystem directly, and these in turn call the lower-level memory pool, locking, and logging subsystems. Because the subsystems are relatively independent, the application can specify at startup which data management services it will use. It can use all of them or only some. For example, an application that needs to support concurrent multi-user operation but does not need transaction management can use the locking subsystem alone, without transactions. An application that needs a fast, single-user B-tree store without transaction management can disable the locking and transaction subsystems, which reduces overhead.
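As an illustration of the "locking without transactions" case, here is a minimal multiple-reader, single-writer lock built on the Python standard library. `ReadWriteLock` is a hypothetical class written for this sketch and is not Berkeley DB's lock manager; it also makes no attempt to prevent writer starvation.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or one writer, but never both."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of readers currently holding the lock
        self._writer = False    # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()       # wake waiting readers and writers


def increment(lock, counter, times):
    """Example writer: update shared state only while holding the write lock."""
    for _ in range(times):
        lock.acquire_write()
        try:
            counter["n"] += 1
        finally:
            lock.release_write()
```

Several threads running `increment` against the same lock and counter serialize their updates, while any number of threads may hold the read lock at once as long as no writer is active.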
Berkeley DB Storage Function Overview
The logical unit of organization in Berkeley DB is the database. Databases may be independent of one another or related, and each consists of a number of records, all of which take the form (key, value).
If a set of related (key, value) pairs is viewed as a table, each database can store only one table, which differs from a typical relational database. In effect, a "database" in Berkeley DB corresponds to a table in a relational database system, and a "key/data" pair corresponds to a row. Berkeley DB does not provide direct access to columns as a relational database does; instead, the application encapsulates fields (columns) inside the data part of each "key/data" pair. Physically, the application chooses a storage structure for each database according to the characteristics of its data. The four available structures are: hash, B-tree, fixed-length record (queue), and variable-length record (based on record numbers).
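Since the library sees only opaque keys and data, the application decides how fields are laid out inside the data part. One possible layout is sketched below with Python's `struct` module; the record format (a 4-byte id, a 2-byte age, then a UTF-8 name), the key `b"emp:7"`, and the function names are all hypothetical, and a `dict` stands in for a single database.

```python
import struct

# Assumed layout: big-endian 4-byte unsigned id, 2-byte unsigned age, then the name.
HEADER = struct.Struct(">IH")


def pack_record(emp_id, age, name):
    """Encode the fields of one row into a single data value."""
    return HEADER.pack(emp_id, age) + name.encode("utf-8")


def unpack_record(data):
    """Decode a data value back into its fields."""
    emp_id, age = HEADER.unpack_from(data)
    name = data[HEADER.size:].decode("utf-8")
    return emp_id, age, name


table = {}                                   # one "database": key bytes -> data bytes
table[b"emp:7"] = pack_record(7, 41, "Ada")  # one "row"
```

Reading a column then means fetching the data by key and decoding it in the application, for example `unpack_record(table[b"emp:7"])`.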
A physical file can hold a single database, or it can hold several related or unrelated databases, and these databases can use any of the organization methods except queue. A queue database must be stored alone in its own file and cannot be mixed with other storage types.
Apart from limits on maximum file length and storage space, a file can hold any number of databases. Locating a database therefore usually requires two parameters, "File Name" and "Database Name", which is another way Berkeley DB differs from a typical relational database.
The Berkeley DB storage system provides a range of interface functions that applications use to administer and operate on databases. These include:
(1) creating, opening, closing, deleting, and renaming databases, and so on;
(2) reading database information and information about the database environment, clearing the contents of a database, synchronous backup of a database, version upgrade, reporting messages, and so on;
(3) a cursor mechanism for accessing groups of records, and for associating and performing equality joins on two or more related databases;
(4) interface functions for tuning the storage strategy: the application can set the B-tree sort comparison function, the minimum number of keys per page, the hash bucket fill factor, the hash function, the maximum hash table size, the maximum queue length, the byte order in which the database is stored, the size of the underlying storage page, the memory allocation functions, the cache size, the length and pad byte of fixed-length records, the delimiter used by variable-length records, and so on.
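Two of the tuning options above, the B-tree comparison function and the byte order, interact in a way that a short example makes concrete. The sketch below is plain Python and does not call Berkeley DB; `compare_le_uint32` is a hypothetical comparison function, and the keys are assumed to be 32-bit integers stored in little-endian byte order.

```python
import struct
from functools import cmp_to_key


def compare_le_uint32(a, b):
    """Compare two keys as little-endian 32-bit unsigned integers."""
    x = struct.unpack("<I", a)[0]
    y = struct.unpack("<I", b)[0]
    return (x > y) - (x < y)   # negative, zero, or positive, like a C comparator


# Integer keys encoded in little-endian byte order.
keys = [struct.pack("<I", n) for n in (1, 256, 2)]

# Byte-by-byte (lexical) ordering, the usual default for opaque keys.
lexical = sorted(keys)

# Ordering with the application-supplied comparison function.
numeric = sorted(keys, key=cmp_to_key(compare_le_uint32))
```

With byte-wise comparison the key for 256 sorts before the key for 1, because its first byte is zero; supplying a numeric comparison function (or storing integer keys in big-endian order) keeps range scans in the order the application expects.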