Cache policy for database objects
Foreword
This article examines the database-object cache policies of JIVE (an open-source Java forum) and Hibernate (an open-source Java persistence layer), and explains the cache policy used by the author's Lightor (an open-source Java persistence helper).
The discussion is based on an earlier open-source release of the JIVE code, the Hibernate 2.1.7 source, and the author's Lightor code.
This article uses ID (short for identifier) to mean the key of a data record.
Data object queries generally fall into two kinds: conditional queries, which return a list of data objects that satisfy a condition, and ID queries, which return the data object corresponding to an ID.
This article mainly discusses the cache policies for conditional queries and ID queries.
Only caching within a single JVM is discussed, not distributed caching. Likewise, only the caching of data objects that map to a single table is discussed, not objects that span associated tables.
First, the JIVE cache policy
1. How the JIVE cache policy works:
(1) For a conditional query, JIVE issues an SQL statement of the form SELECT id FROM table_name WHERE ... (selecting only the ID column) and gets back a list of IDs.
(2) For each ID in the list, JIVE first checks whether the corresponding data object is in the cache. If it is, JIVE takes it from the cache and adds it to the result list. If it is not, JIVE issues SELECT * FROM table_name WHERE id = {id value} to load the data object, adds it to the result list, and stores it in the cache under its ID.
(3) For an ID query, JIVE follows a procedure similar to step (2): it looks up the ID in the cache first and, on a miss, queries the database and puts the result into the cache.
(4) When data is deleted, updated, or inserted, the cache is updated at the same time.
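The four steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not JIVE's actual code: FakeForumDb, JiveStyleCache, and all method names are hypothetical, data objects are plain strings, and the WHERE condition is simulated by passing in the IDs that would match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the database: it counts queries instead of running SQL.
class FakeForumDb {
    final Map<Long, String> rows = new HashMap<>();
    int queryCount = 0;

    // Stands in for: SELECT id FROM table_name WHERE ...
    List<Long> selectIds(List<Long> matchingIds) {
        queryCount++;
        return new ArrayList<>(matchingIds);
    }

    // Stands in for: SELECT * FROM table_name WHERE id = ?
    String selectById(long id) {
        queryCount++;
        return rows.get(id);
    }
}

// Sketch of a JIVE-style policy: fetch the ID list, then do one cache lookup per ID.
class JiveStyleCache {
    private final FakeForumDb db;
    private final Map<Long, String> idCache = new HashMap<>();

    JiveStyleCache(FakeForumDb db) {
        this.db = db;
    }

    // Step (3): ID query. Look in the cache first, go to the database on a miss.
    String findById(long id) {
        String cached = idCache.get(id);
        if (cached != null) {
            return cached;
        }
        String loaded = db.selectById(id);
        if (loaded != null) {
            idCache.put(id, loaded);
        }
        return loaded;
    }

    // Steps (1) and (2): conditional query. The ID-list query always runs.
    List<String> findByCondition(List<Long> matchingIds) {
        List<String> result = new ArrayList<>();
        for (long id : db.selectIds(matchingIds)) {
            result.add(findById(id));
        }
        return result;
    }

    // Step (4): writes go to the database and the cache together.
    void update(long id, String value) {
        db.rows.put(id, value);
        idCache.put(id, value);
    }

    void delete(long id) {
        db.rows.remove(id);
        idCache.remove(id);
    }
}
```

With rows for IDs 1, 2, and 3 in the fake database, a findByCondition call that matches {1, 2} costs three queries (one for the ID list plus two loads), and a following call that matches {2, 3} costs two, because the object with ID 2 is already cached.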
2. Advantages of the JIVE cache policy:
(1) For an ID query, if the ID is already in the cache, the object is returned directly, which saves one database query.
(2) When the result sets of several conditional queries overlap, the data objects in the overlap need not be fetched from the database again; they come directly from the cache.
For example, the first query returns the ID list {1, 2}, and the data objects are then loaded from the database one by one by ID, giving the result set {a (ID = 1), b (ID = 2)}.
The next query returns the ID list {2, 3}. Since the data object with ID = 2 is already in the cache, only the data object with ID = 3 has to be loaded from the database.
3. Disadvantages of the JIVE cache policy:
(1) When a list of data objects is looked up, the database query that obtains the ID list in step (1) can never be avoided.
(2) If the ID list returned in step (1) contains n IDs, then at the worst hit rate (none of the IDs is in the cache) JIVE must query the database another n times. In the worst case, a total of n + 1 database queries are required.
Second, Hibernate's second-level cache policy
Hibernate wraps the lifetime of a database connection, from open to close, in the Session class.
A Session maintains an internal collection of data objects, containing the objects selected and operated on in that Session. This is called the Session-internal cache and is Hibernate's first-level cache. It is built-in behavior and needs no configuration (nor is there any way to configure it :-). A Session's life is short, so the first-level cache is short-lived too, and its hit rate is naturally low. Its main role, though, is to keep the data state inside the Session consistent.
If you need a global cache that spans Sessions and has a higher hit rate, you must configure Hibernate's second-level cache. In general, data objects of the same type (Class) share one second-level cache (or one region of it).
1. How Hibernate's second-level cache policy works:
(1) For a conditional query, Hibernate always issues an SQL statement of the form SELECT * FROM table_name WHERE ... (selecting all columns) and gets all the data objects in one query.
(2) All data objects obtained are placed in the second-level cache under their IDs.
(3) When Hibernate accesses a data object by ID, it first looks in the Session's first-level cache. If the object is not there and a second-level cache is configured, it looks in the second-level cache. If the object is still not found, it queries the database and puts the result into the cache under its ID.
(4) When data is deleted, updated, or inserted, the cache is updated at the same time.
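The lookup order in step (3) can be sketched as below. This is only an illustration of the order of the three lookups and does not use Hibernate's API: SessionSketch, SecondLevelCacheSketch, and CountingDb are hypothetical names, and data objects are plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared cache: one region for one data type, keyed by ID.
class SecondLevelCacheSketch {
    final Map<Long, String> region = new ConcurrentHashMap<>();
}

// Hypothetical stand-in for the database: it counts queries instead of running SQL.
class CountingDb {
    final Map<Long, String> rows = new HashMap<>();
    int queryCount = 0;

    // Stands in for: SELECT * FROM table_name WHERE id = ?
    String selectById(long id) {
        queryCount++;
        return rows.get(id);
    }
}

// Not Hibernate's Session class: a sketch of the lookup order only.
class SessionSketch {
    private final Map<Long, String> firstLevel = new HashMap<>(); // lives and dies with this session
    private final SecondLevelCacheSketch secondLevel;             // null when not configured
    private final CountingDb db;

    SessionSketch(SecondLevelCacheSketch secondLevel, CountingDb db) {
        this.secondLevel = secondLevel;
        this.db = db;
    }

    String load(long id) {
        // 1. First-level (session) cache.
        String obj = firstLevel.get(id);
        if (obj != null) {
            return obj;
        }
        // 2. Second-level cache, if one is configured.
        if (secondLevel != null) {
            obj = secondLevel.region.get(id);
        }
        // 3. Database, storing the result under its ID.
        if (obj == null) {
            obj = db.selectById(id);
            if (obj != null && secondLevel != null) {
                secondLevel.region.put(id, obj);
            }
        }
        if (obj != null) {
            firstLevel.put(id, obj);
        }
        return obj;
    }
}
```

Two SessionSketch instances that share one SecondLevelCacheSketch need only one database query between them for the same ID, while a session created with a null second-level cache always pays its own query.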
2. Advantages of Hibernate's second-level cache policy:
(1) The same as advantage (1) of the JIVE cache policy: for an ID query, if the ID is already in the cache, the object is returned directly, which saves one database query.
(2) It does not have disadvantage (2) of the JIVE cache policy: Hibernate never needs n + 1 database queries in the worst case.
3. Disadvantages of Hibernate's second-level cache policy:
(1) As with disadvantage (1) of the JIVE cache policy, the database query in step (1) of a conditional query can never be avoided. Moreover, Hibernate selects all columns, which costs considerably more time and space than selecting only the ID column.
(2) It lacks advantage (2) of the JIVE cache policy: for a conditional query, the data objects must be loaded from the database again, even if objects with those IDs are already in the cache.
Third, Hibernate's Query cache policy
As the above shows, both the JIVE cache and Hibernate's second-level cache are cache policies for ID queries only; they do little for conditional queries. (Advantage (2) of the JIVE cache does avoid reloading data objects with the same ID from the database, but the SELECT id FROM ... query is still required for every conditional query.)
For this reason, Hibernate provides a Query cache for conditional queries.
1. How Hibernate's Query cache policy works:
(1) A conditional query request generally carries the following information: the SQL, the parameters the SQL requires, the record range (start position rowStart, maximum number of records maxRows), and so on.
(2) Hibernate first builds a Query Key from this information and looks it up in the Query cache. If the key is present, the cached result list is returned. If not, Hibernate queries the database, obtains the result list, and stores the whole result list in the Query cache under the Query Key.
(3) The SQL in a Query Key refers to certain table names. If any record in one of these tables is modified, deleted, or inserted, the related Query Keys are evicted from the cache.
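A query cache of this shape can be sketched as follows. This is an illustration under stated assumptions, not Hibernate's implementation: QueryKeySketch and QueryCacheSketch are hypothetical names, cached rows are plain strings, and the caller states which tables the SQL involves instead of the cache parsing the SQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key: everything that identifies one conditional query.
final class QueryKeySketch {
    private final String sql;
    private final List<Object> params;
    private final int rowStart;
    private final int maxRows;

    QueryKeySketch(String sql, int rowStart, int maxRows, Object... params) {
        this.sql = sql;
        this.params = Arrays.asList(params.clone());
        this.rowStart = rowStart;
        this.maxRows = maxRows;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof QueryKeySketch)) return false;
        QueryKeySketch k = (QueryKeySketch) o;
        return rowStart == k.rowStart && maxRows == k.maxRows
                && sql.equals(k.sql) && params.equals(k.params);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sql, params, rowStart, maxRows);
    }
}

// Sketch of a query cache: whole result lists stored by key, evicted per table.
class QueryCacheSketch {
    private final Map<QueryKeySketch, List<String>> results = new HashMap<>();
    private final Map<String, Set<QueryKeySketch>> keysByTable = new HashMap<>();

    // Step (2), lookup: returns null on a miss.
    List<String> get(QueryKeySketch key) {
        return results.get(key);
    }

    // Step (2), store: the caller names the tables the SQL involves.
    void put(QueryKeySketch key, List<String> rows, String... tables) {
        results.put(key, new ArrayList<>(rows));
        for (String table : tables) {
            keysByTable.computeIfAbsent(table, t -> new HashSet<>()).add(key);
        }
    }

    // Step (3): any insert, update, or delete on a table evicts every key that involves it.
    void tableChanged(String table) {
        Set<QueryKeySketch> keys = keysByTable.remove(table);
        if (keys != null) {
            results.keySet().removeAll(keys);
        }
    }
}
```

Two keys are equal only when the SQL, the parameters, rowStart, and maxRows all match, so the same SQL with a different record range is a separate cache entry.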
2. Advantages of Hibernate's Query cache policy:
(1) For a conditional query, if the Query Key is already in the cache, the result is returned without querying the database at all.
3. Disadvantages of Hibernate's Query cache policy:
(1) If any record in a table involved in the query is inserted, deleted, or changed, every Query Key related to that table becomes invalid in the cache.
For example, consider the following Query Keys, all of whose SQL involves Table1.
SQL = SELECT * FROM Table1 WHERE c1 = ? ...., parameter = 1, rowStart = 11, maxRows = 20.
SQL = SELECT * FROM Table1 WHERE c1 = ? ...., parameter = 1, rowStart = 21, maxRows = 20.
SQL = SELECT * FROM Table1 WHERE c1 = ? ...., parameter = 2, rowStart = 11, maxRows = 20.
SQL = SELECT * FROM Table1 WHERE c1 = ? ...., parameter = 2, rowStart = 11, maxRows = 20.
SQL = SELECT * FROM Table1 WHERE c2 = ? ...., parameter = 'abc', rowStart = 11, maxRows = 20.
When any data object (any field) of Table1 is changed, inserted, or deleted, there is no guarantee that the result sets for these Query Keys are unchanged.
It is hard to determine precisely, from a change to a data object, which Query Keys' result sets are affected. The simplest implementation is to evict every Query Key whose SQL involves Table1.
(2) In the Query cache, a Query Key maps to a list of data objects. If the lists for different Query Keys overlap, the data objects in the overlap are stored more than once.
For example, Query Key 1 maps to the list {a (ID = 1), b (ID = 2)} and Query Key 2 maps to {a (ID = 1), c (ID = 3)}. Object a then has a copy in each of the two lists.
4. Synchronization between the second-level cache and the Query cache
Suppose that in the Query cache a Query Key maps to the result list {a (ID = 1), b (ID = 2), c (ID = 3)}, while the second-level cache holds the data object a for ID = 1.
What is the relationship between these two copies of data object a? Can their state be kept in sync?
I read the relevant Hibernate source code and did not find any synchronization between the two caches; put differently, there is no relationship between them. As noted above, whenever table data changes, the related Query Keys must be evicted anyway. So perhaps the synchronization problem need not be considered?
Fourth, Lightor's cache policy
Lightor is the author's open-source Java persistence framework. The name means Lightweight O/R. Hibernate, JDO, and EJB CMP are persistence layers; Lightor does not amount to a layer and is just a helper. The O/R here does not stand for Object/Relational but for Object/ResultSet. :-)
Lightor's cache policy mainly follows Hibernate's caching ideas: Lightor's cache is likewise divided into a Query cache and an ID cache. There is one difference, however: the two caches are not independent of each other but work together.
1. How Lightor's cache policy works:
(1) A conditional query request generally carries the following information: the SQL, the corresponding SQL parameters, the start record position (rowStart), the maximum number of records (maxRows), and so on.
(2) Lightor first builds a Query Key from this information and uses it to look up the corresponding result ID list in the Query cache. Note that what is stored there is a list of IDs.
If the result ID list is in the Query cache, Lightor fetches the data object for each ID in the list from the ID cache. If the data objects for all the IDs are found, it returns the resulting list of data objects. Note that this is a list of complete data objects (all fields).
If the result ID list is not in the Query cache, or the object for some ID in the list is not in the ID cache, Lightor queries the database and obtains the result list. It then puts each loaded data object into the ID cache under its ID, assembles the IDs into an ID list, and stores that list in the Query cache under the Query Key. Note that what goes into the Query cache is the ID list, not the list of whole objects.
(3) For an ID query, Lightor looks up the ID in the ID cache. On a miss, it queries the database and puts the result into the ID cache.
(4) The SQL in a Query Key refers to certain table names. If any record in one of these tables is modified, deleted, or inserted, the related Query Keys are evicted from the cache.
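The way the two caches work together can be sketched as below. This is a minimal illustration and not Lightor's actual API: LinkedCacheSketch, its nested Db interface, and rowWritten are hypothetical names, data objects are plain strings, and the Query Key is simply a list built from the SQL, parameters, and range. Step (4) is simplified: any write drops every cached ID list instead of tracking which keys involve which table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Query cache of ID lists linked to an ID cache of objects.
class LinkedCacheSketch {
    // Stand-in for the database. selectAll should return an ordered map (ID -> object).
    interface Db {
        Map<Long, String> selectAll(String sql, Object[] params, int rowStart, int maxRows);
        String selectById(long id);
    }

    private final Db db;
    private final Map<List<Object>, List<Long>> queryCache = new HashMap<>(); // Query Key -> ID list
    private final Map<Long, String> idCache = new HashMap<>();                // ID -> data object

    LinkedCacheSketch(Db db) {
        this.db = db;
    }

    // Steps (1) and (2): conditional query.
    List<String> query(String sql, int rowStart, int maxRows, Object... params) {
        List<Object> key = Arrays.asList(sql, Arrays.asList(params.clone()), rowStart, maxRows);
        List<Long> ids = queryCache.get(key);
        if (ids != null) {
            List<String> objects = new ArrayList<>();
            for (Long id : ids) {
                String obj = idCache.get(id);
                if (obj == null) {
                    objects = null; // one missing object means the whole lookup is a miss
                    break;
                }
                objects.add(obj);
            }
            if (objects != null) {
                return objects; // full hit: no database query
            }
        }
        // Miss: one query for complete objects, then fill both caches.
        Map<Long, String> rows = db.selectAll(sql, params, rowStart, maxRows);
        idCache.putAll(rows);
        queryCache.put(key, new ArrayList<>(rows.keySet())); // only the IDs are stored here
        return new ArrayList<>(rows.values());
    }

    // Step (3): ID query.
    String findById(long id) {
        String obj = idCache.get(id);
        if (obj == null) {
            obj = db.selectById(id);
            if (obj != null) {
                idCache.put(id, obj);
            }
        }
        return obj;
    }

    // Step (4), simplified: refresh the ID cache and drop all cached ID lists.
    // Pass null as the value to signal a delete.
    void rowWritten(long id, String valueOrNull) {
        if (valueOrNull == null) {
            idCache.remove(id);
        } else {
            idCache.put(id, valueOrNull);
        }
        queryCache.clear();
    }
}
```

Because the Query cache holds only IDs, an object that appears in the results of several queries is stored once, in the ID cache, and an ID query can reuse an object that a conditional query loaded.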
2. Advantages of Lightor's cache policy:
(1) Lightor's ID cache has the same advantage as the JIVE cache and Hibernate's second-level (ID) cache: for an ID query, if the ID is already in the cache, the object is returned directly, which saves one database query.
(2) Lightor's Query cache has the same advantage as Hibernate's Query cache: for a conditional query, if the Query Key is already in the cache, the database need not be queried at all.
(3) In Lightor's Query cache, a Query Key maps to a list of IDs, not a list of data objects; the actual data objects live only in the ID cache. So if the ID lists for different Query Keys overlap, the data objects for the shared IDs are not stored more than once.
(4) Lightor's cache does not have the JIVE cache's worst case of n + 1 database queries.
3. Disadvantages of Lightor's cache policy:
(1) Lightor's Query cache has the same shortcoming as Hibernate's Query cache: if any record in a table involved in the conditional query is inserted, deleted, or changed, every Query Key related to that table becomes invalid in the cache.
(2) Lightor's ID cache has the same shortcoming as Hibernate's second-level (ID) cache: when a conditional query goes to the database, the data objects are loaded and put into the cache again, even if objects with those IDs are already in the cache.
Fifth, the efficiency of the Query Key
The Query Key used by the Query cache is relatively expensive in both space and time.
A Query Key stores quite a lot: the SQL, the parameters, and the range (start position and count).
The largest of these is the SQL, which also dominates the time cost (hashCode, equals).
The two most important methods of a Query Key are hashCode and equals, and their cost is dominated by the hashCode and equals of the SQL string.
Because Lightor uses SQL directly, its recommended approach is to hold each SQL statement in a static final String to save space and time, so that a Query Key becomes about as efficient as an ID key.
As for Hibernate's QueryKey, interested readers can download the source code of successive Hibernate versions and trace how the QueryKey implementation was optimized.
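The effect of keeping SQL in static final String constants can be sketched as follows. This is my own illustration, not Lightor's or Hibernate's key class: MessageSql and CheapQueryKey are hypothetical names. The sketch assumes an OpenJDK-style String, where equals begins with a reference comparison and the hash code is cached inside the String after the first call, so keys built from the same constant compare their SQL in constant time.

```java
import java.util.Arrays;

// Hypothetical holder: each SQL statement exists once, as a shared constant.
final class MessageSql {
    static final String BY_C1 = "SELECT * FROM Table1 WHERE c1 = ?";

    private MessageSql() {
    }
}

// Hypothetical immutable key that keeps hashCode and equals cheap.
final class CheapQueryKey {
    private final String sql;
    private final Object[] params;
    private final int rowStart;
    private final int maxRows;
    private final int hash; // computed once, because the key never changes

    CheapQueryKey(String sql, int rowStart, int maxRows, Object... params) {
        this.sql = sql;
        this.params = params.clone();
        this.rowStart = rowStart;
        this.maxRows = maxRows;
        int h = sql.hashCode();
        h = 31 * h + Arrays.hashCode(this.params);
        h = 31 * h + rowStart;
        h = 31 * h + maxRows;
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CheapQueryKey)) return false;
        CheapQueryKey k = (CheapQueryKey) o;
        // Cheap int comparisons first. The SQL check is a reference comparison
        // when both keys were built from the same constant, and falls back to
        // a character comparison otherwise.
        return hash == k.hash
                && rowStart == k.rowStart
                && maxRows == k.maxRows
                && (sql == k.sql || sql.equals(k.sql))
                && Arrays.equals(params, k.params);
    }
}
```

A key built from a separately constructed copy of the same SQL text is still equal to one built from the constant; it just pays for the character-by-character comparison that the shared constant avoids.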
Sixth, summary
The table below summarizes the characteristics of the cache policies of JIVE, Hibernate, and Lightor.
Cache           | N + 1 problem | Duplicate ID cache problem | Query cache support
JIVE cache      | Yes           | No                         | Not supported
Hibernate cache | No            | Yes                        | Supported
Lightor cache   | No            | Yes                        | Supported
Note:
The "duplicate ID cache problem" means that every conditional query that reaches the database fetches not just the ID list but the list of complete objects (all fields). As a result, the data object for a given ID may be put into the cache again even though it is already there. See the disadvantages of the relevant caches above.
How large the negative effect of the duplicate ID cache problem is depends on how much faster your SELECT id FROM ... (ID only) is than your SELECT * FROM ... (all fields). The main factors are the number of fields, the length of the field values, and the network transfer speed to the database server.
Either way, even selecting all fields is still just one database query, whereas the worst-case negative effect of the N + 1 problem (n + 1 database queries) can be very large.
When choosing a cache policy, weigh the probability of each of these situations against the size of its positive and negative effects.