Thread-Safe Caching Object with File and Http Implementations

From: http://aspn.activeState.com/aspn/cookbook/python/recipe/302997

Title: Thread-Safe Caching Object with File and HTTP Implementations
Submitter: Nicolas Lehuen
Last Updated: 2006/02/08
Version No: 1.8
Category: Algorithms

Description: Implementation of an abstract, thread-safe cache with minimal locking. Four concrete implementations: a validating file cache, a validating HTTP cache, an experimental Python module cache and a function cache. Plus, an abstract cache with weak references to its values.

Source:

# -*- coding: ISO-8859-1 -*-

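# Python 2 code: the rfc822, urllib2 and new modules do not exist in Python 3.
from os import stat
from time import time, mktime
from rfc822 import parsedate
from calendar import timegm
import urllib2
import re
import weakref
import new

try:
    from threading import Lock
except ImportError:
    # Python builds without thread support
    from dummy_threading import Lock

# Sentinel marking an entry whose value has not been built yet
NOT_INITIALIZED = object()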

class Entry(object):
    """ A cache entry, mostly an internal object. """
    def __init__(self, key):
        object.__init__(self)
        self._key = key
        self._value = NOT_INITIALIZED
        self._lock = Lock()

class Cache(object):
    """ An abstract, thread-safe cache object. """

    def __init__(self, max_size=0):
        """ Builds a cache with a limit of max_size entries.
            If this limit is exceeded, the Least Recently Used entry is discarded.
            If max_size == 0, the cache is unbounded (no LRU rule is applied).
        """
        object.__init__(self)
        self._maxsize = max_size
        self._dict = {}
        self._lock = Lock()

        # Header of the access list
        if self._maxsize:
            self._head = Entry(None)
            self._head._previous = self._head
            self._head._next = self._head

    def __setitem__(self, name, value):
        """ Populates the cache with a given name and value. """
        key = self.key(name)

        entry = self._get_entry(key)

        entry._lock.acquire()
        try:
            self._pack(entry, value)
            self.commit()
        finally:
            entry._lock.release()

    def __getitem__(self, name):
        """ Gets a value from the cache, builds it if needed.
        """
        return self._checkitem(name)[2]

    def __delitem__(self, name):
        self._lock.acquire()
        try:
            key = self.key(name)
            del self._dict[key]
        finally:
            self._lock.release()

    def _get_entry(self, key):
        self._lock.acquire()
        try:
            entry = self._dict.get(key)
            if not entry:
                entry = Entry(key)
                self._dict[key] = entry
                if self._maxsize:
                    entry._next = entry._previous = None
                    self._access(entry)
                    self._checklru()
            elif self._maxsize:
                self._access(entry)
            return entry
        finally:
            self._lock.release()

    def _checkitem(self, name):
        """ Gets a value from the cache, builds it if needed.
            Returns a tuple is_new, key, value, entry.
            If is_new is True, the result had to be rebuilt.
        """
        key = self.key(name)

        entry = self._get_entry(key)

        entry._lock.acquire()
        try:
            value = self._unpack(entry)
            is_new = False
            if value is NOT_INITIALIZED:
                opened = self.check(key, name, entry)
                value = self.build(key, name, opened, entry)
                is_new = True
                self._pack(entry, value)
                self.commit()
            else:
                opened = self.check(key, name, entry)
                if opened is not None:
                    value = self.build(key, name, opened, entry)
                    is_new = True
                    self._pack(entry, value)
                    self.commit()
            return is_new, key, value, entry
        finally:
            entry._lock.release()

    def mru(self):
        """ Returns the Most Recently Used key """
        if self._maxsize:
            self._lock.acquire()
            try:
                return self._head._previous._key
            finally:
                self._lock.release()
        else:
            return None

    def lru(self):
        """ Returns the Least Recently Used key """
        if self._maxsize:
            self._lock.acquire()
            try:
                return self._head._next._key
            finally:
                self._lock.release()
        else:
            return None

    def key(self, name):
        """ Override this method to extract a key from the name passed to the [] operator """
        return name

    def commit(self):
        """ Override this method if you want to do something each time the underlying dictionary is modified (e.g. make it persistent). """
        pass

    def clear(self):
        """ Clears the cache """
        self._lock.acquire()
        try:
            self._dict.clear()
            if self._maxsize:
                self._head._next = self._head
                self._head._previous = self._head
        finally:
            self._lock.release()

    def check(self, key, name, entry):
        """ Override this method to check whether the entry with the given name is stale. Return None if it is fresh
            or an opened resource if it is stale. The object returned will be passed to the 'build' method as the 'opened' parameter.
            Use the 'entry' parameter to store meta-data if required. Don't worry about multiple threads accessing the same name,
            as this method is properly isolated.
        """
        return None

    def build(self, key, name, opened, entry):
        """ Build the cached value with the given name from the given opened resource. Use entry to obtain or store meta-data if needed.
            Don't worry about multiple threads accessing the same name, as this method is properly isolated.
        """
        raise NotImplementedError()

    def _access(self, entry):
        """ Internal use only, must be invoked within a cache lock. Updates the access list. """
        if entry._next is not self._head:
            if entry._previous is not None:
                # remove the entry from the access list
                entry._previous._next = entry._next
                entry._next._previous = entry._previous
            # insert the entry at the end of the access list
            entry._previous = self._head._previous
            entry._previous._next = entry
            entry._next = self._head
            entry._next._previous = entry
            if self._head._next is self._head:
                self._head._next = entry

    def _checklru(self):
        """ Internal use only, must be invoked within a cache lock. Removes the LRU entry if needed. """
        if len(self._dict) > self._maxsize:
            lru = self._head._next
            lru._previous._next = lru._next
            lru._next._previous = lru._previous
            del self._dict[lru._key]

    def _pack(self, entry, value):
        """ Store the value in the entry. """
        entry._value = value

    def _unpack(self, entry):
        """ Recover the value from the entry, returns NOT_INITIALIZED if it is not OK. """
        return entry._value

class WeakCache(Cache):
    """ This cache holds weak references to the values it stores. Whenever a value is no longer
        normally referenced, it is removed from the cache. Useful for sharing the result of long
        computations but letting them go as soon as they are not needed by anybody.
    """

    def _pack(self, entry, value):
        entry._value = weakref.ref(value, lambda ref: self.__delitem__(entry._key))

    def _unpack(self, entry):
        if entry._value is NOT_INITIALIZED:
            return NOT_INITIALIZED

        value = entry._value()
        if value is None:
            return NOT_INITIALIZED
        else:
            return value

class FileCache(Cache):
    """ A file cache. Returns the content of the files as a string, given their filename.
        Whenever the files are modified (according to their modification time) the cache is updated.
        Override the build method to obtain more interesting behaviour.
    """
    def __init__(self, max_size=0, mode='rb'):
        Cache.__init__(self, max_size)
        self.mode = mode

    def check(self, key, name, entry):
        # stat() first: the file is only opened when it has to be (re)read
        timestamp = stat(key).st_mtime

        if entry._value is NOT_INITIALIZED:
            entry._timestamp = timestamp
            return open(key, self.mode)
        else:
            if entry._timestamp != timestamp:
                entry._timestamp = timestamp
                return open(key, self.mode)
            else:
                return None

    def build(self, key, name, opened, entry):
        """ Return the content of the file as a string. Override this for better behaviour. """
        try:
            return opened.read()
        finally:
            opened.close()

def parseRFC822Time(t):
    return mktime(parsedate(t))

re_max_age = re.compile(r'max-age\s*=\s*(\d+)', re.I)

class HTTPEntity(object):
    def __init__(self, entity, metadata):
        self.entity = entity
        self.metadata = metadata

    def __repr__(self):
        return 'HTTPEntity(%s, %s)' % (repr(self.entity), self.metadata)

    def __str__(self):
        return self.entity

class HTTPCache(Cache):
    """ An HTTP cache. Returns the entity found at the given URL.
        Uses Expires, ETag and Last-Modified headers to minimize bandwidth usage.
        Partial Cache-Control support (only max-age is supported).
    """
    def check(self, key, name, entry):
        request = urllib2.Request(key)

        # Still fresh according to the last response: no request at all
        try:
            if time() < entry._expires:
                return None
        except AttributeError:
            pass

        # Conditional request if a validator was stored previously
        try:
            header, value = entry._validator
            request.headers[header] = value
        except AttributeError:
            pass

        opened = None
        try:
            opened = urllib2.urlopen(request)
            headers = opened.info()

            # expiration handling
            expiration = False
            try:
                match = re_max_age.match(headers['cache-control'])
                if match:
                    entry._expires = time() + int(match.group(1))
                    expiration = True
            except (KeyError, ValueError):
                pass
            if not expiration:
                try:
                    date = parseRFC822Time(headers['date'])
                    expires = parseRFC822Time(headers['expires'])
                    entry._expires = time() + (expires - date)
                    expiration = True
                except KeyError:
                    pass

            # validator handling
            validation = False
            try:
                entry._validator = 'If-None-Match', headers['etag']
                validation = True
            except KeyError:
                pass
            if not validation:
                try:
                    entry._validator = 'If-Modified-Since', headers['last-modified']
                except KeyError:
                    pass

            return opened
        except urllib2.HTTPError, error:
            if opened: opened.close()
            if error.code == 304:
                # Not Modified: the cached entity is still valid
                return None
            else:
                raise error

    def build(self, key, name, opened, entry):
        try:
            return HTTPEntity(opened.read(), dict(opened.info()))
        finally:
            opened.close()

re_not_word = re.compile(r'\W+')

class ModuleCache(FileCache):
    """ A module cache. Give it a file name, it returns a module
        which results from the execution of the Python script it contains.
        This module is not inserted into sys.modules.
    """
    def __init__(self, max_size=0):
        FileCache.__init__(self, max_size, 'r')

    def build(self, key, name, opened, entry):
        try:
            module = new.module(re_not_word.sub('_', key))
            module.__file__ = key
            exec opened in module.__dict__
            return module
        finally:
            opened.close()

class HttpModuleCache(HTTPCache):
    """ A module cache. Give it an HTTP URL, it returns a module
        which results from the execution of the Python script it contains.
        This module is not inserted into sys.modules.
    """
    def __init__(self, max_size=0):
        HTTPCache.__init__(self, max_size)

    def build(self, key, name, opened, entry):
        try:
            module = new.module(re_not_word.sub('_', key))
            module.__file__ = key
            text = opened.read().replace('\r\n', '\n')
            code = compile(text, name, 'exec')
            exec code in module.__dict__
            return module
        finally:
            opened.close()

class FunctionCache(Cache):
    def __init__(self, function, max_size=0):
        Cache.__init__(self, max_size)
        self.function = function

    def __call__(self, *args, **kw):
        if kw:
            # a dict is not hashable so we build a tuple of (key, value) pairs
            kw = tuple(kw.iteritems())
            return self[args, kw]
        else:
            return self[args, ()]

    def build(self, key, name, opened, entry):
        args, kw = key
        return self.function(*args, **dict(kw))

Discussion:

Two years ago I was a definite Java fan (writing in this language since ...). Now I'm in love with Python. The trouble is that I left behind quite a few useful classes I wrote (thread-safe caches, pools, etc.), so I had to reimplement them in Python.

A cache is a pretty simple dictionary-like object: you provide it an index or a name, and it gives you back an object. For example, for an HTTP cache, the index is a URL and the object is the data you can fetch from that URL.

The trick is that the corresponding object can be quite expensive (in CPU, bandwidth, time or memory) to build, so you have to balance between building the object every time you need it and pre-building all the objects you could require, knowing that the target object can change over time (think of how a URL can point to different data over time). A cache is precisely a way to find a balance between these two extremes.

This recipe provides you with an abstract Cache class, from which you can inherit by overriding the check() and build() methods, and four specialisations. FileCache and HTTPCache are just what their names describe. ModuleCache is an experimental specialisation of FileCache, which can come in handy when playing with dynamic code, since it allows you to load any arbitrary file as a Python module and dynamically reload it each time the file is modified. FunctionCache is the good old function call cache (already presented many times in this Cookbook), with thread-safety included thanks to the Cache base class.

Thread-safety of the cache is a must, since the purpose of a cache is to be shared and used by as much code as possible. A multi-threaded application such as a web application server has a strong need for thread-safe cache and pool structures.

Sample usage of FileCache:

>>> fc = FileCache(10) # 10 files in memory at most
>>> f = open('test.txt', 'w')
>>> f.write('Hello, World!')
>>> f.close()
>>> fc['test.txt']
'Hello, World!'
>>> fc['test.txt'] # this time the file is checked but not read
'Hello, World!'
>>> f = open('test.txt', 'w')
>>> f.write('Hello, me!')
>>> f.close()
>>> fc['test.txt'] # this time the file is checked and re-read
'Hello, me!'

Sample usage of HTTPCache:

>>> hc = HTTPCache(1000) # maximum 1000 documents in the cache
>>> hc['http://www.google.com/']
HTTPEntity('[snipped]', ...)
>>> hc['http://www.google.com'] # the problem is, Google doesn't want it cached, so there is no gain
HTTPEntity('[snipped]', ...)
>>> hc['http://www.google.com'].metadata # that's why: no Last-Modified, no ETag, no Expires headers
{'content-length': '2360', 'set-cookie': 'PREF=[snipped]; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.fr', 'server': 'GWS/2.1', 'connection': 'Keep-Alive', 'cache-control': 'private', 'date': 'Wed, 01 Sep 2004 21:21:50 GMT', 'content-type': 'text/html'}
>>> hc['http://diveintomark.org/xml/atom.xml']
HTTPEntity('[snipped]', ...)
>>> hc['http://diveintomark.org/xml/atom.xml'] # the second call is much faster, since Mark put some cache hints in order to save his bandwidth
HTTPEntity('[snipped]', ...)
>>> hc['http://diveintomark.org/xml/atom.xml'].metadata # here's why: nice Expires, Last-Modified and ETag headers
{'content-length': '9785', 'accept-ranges': 'bytes', 'expires': 'Thu, 02 Sep 2004 01:23:43 GMT', 'vary': '*', 'server': 'Apache/1.3.31 (Debian GNU/Linux)', 'last-modified': 'Wed, 01 Sep 2004 03:26:16 GMT', 'connection': 'close', 'etag': '"E80A6-2639-41354158"', 'cache-control': 'max-age=14400', 'date': 'Wed, 01 Sep 2004 21:23:43 GMT', 'content-type': 'application/xml'}

Sample usage of FunctionCache:

>>> from time import sleep
>>> def my_long_function(value):
...     sleep(5)
...     return value+1
...
>>> my_long_function(2) # 5 seconds later...
3
>>> cached = FunctionCache(my_long_function, 10) # keep the 10 last calls in memory
>>> cached(2) # 5 seconds later...
3
>>> cached(2) # immediate answer
3
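The "minimal locking" of the Cache class comes from using two levels of locks: one cache-wide lock held only while an entry is looked up or created, and one lock per entry held while its value is checked and built. The following is a minimal sketch of that idea only, written for Python 3 (an assumption, since the recipe itself targets Python 2). MiniCache, _Entry, _MISSING and slow_square are hypothetical names, and the sketch leaves out the LRU list and the check()/build() split.

```python
import threading

_MISSING = object()  # sentinel meaning "value not built yet"


class _Entry:
    def __init__(self):
        self.value = _MISSING
        self.lock = threading.Lock()


class MiniCache:
    """Hypothetical, simplified stand-in for the recipe's Cache class."""

    def __init__(self, build):
        self._build = build            # callable: key -> value
        self._entries = {}
        self._lock = threading.Lock()  # protects the dictionary only

    def __getitem__(self, key):
        # Short critical section: find or create the entry.
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = self._entries[key] = _Entry()
        # Long critical section: only threads asking for this key wait here.
        with entry.lock:
            if entry.value is _MISSING:
                entry.value = self._build(key)
            return entry.value


calls = []


def slow_square(n):
    calls.append(n)  # record every real computation
    return n * n


cache = MiniCache(slow_square)
threads = [threading.Thread(target=cache.__getitem__, args=(7,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight threads ask for the same key, yet slow_square runs once: the other threads block on that entry's lock and then find the value already built. Threads asking for a different key are not held up by the build, since the cache-wide lock has already been released.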

Update, Nicolas Lehuen, 2004/09/07
Changes:
- added a WeakCache abstract class
- all short member names such as entry._p, cache._d and so on have been renamed to longer, more explicit names. Hopefully the code is more readable now.
- Cache.extract is renamed to Cache._unpack, and must return NOT_INITIALIZED if the entry is invalid. Cache._pack does the opposite of Cache._unpack. See WeakCache for an example use.
- ModuleCache now returns Module objects, which still are placeholder classes but with a better repr().

Update, Nicolas Lehuen, 2004/10/13
In __setitem__, an entry lock was forgotten when setting the value on an already existing entry.
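The _pack/_unpack pair described in the 2004/09/07 update can be illustrated in isolation. This is a minimal sketch for Python 3 (an assumption), not the recipe's code: WeakSlot, Result and _MISSING are hypothetical names, with _MISSING playing the role of NOT_INITIALIZED.

```python
import gc
import weakref

_MISSING = object()  # plays the role of NOT_INITIALIZED


class WeakSlot:
    """Hypothetical holder mimicking WeakCache's _pack/_unpack pair."""

    def __init__(self):
        self._ref = _MISSING

    def pack(self, value, on_collect=None):
        # Keep only a weak reference; on_collect runs once value is collected.
        self._ref = weakref.ref(value, on_collect)

    def unpack(self):
        if self._ref is _MISSING:
            return _MISSING
        value = self._ref()  # None once the referent is gone
        return _MISSING if value is None else value


class Result:
    """Plain class instances support weak references; int, str and tuple do not."""


collected = []
slot = WeakSlot()
result = Result()
slot.pack(result, lambda ref: collected.append(True))
alive = slot.unpack() is result   # still strongly referenced by 'result'
del result                        # drop the last strong reference
gc.collect()                      # not needed on CPython, harmless elsewhere
gone = slot.unpack() is _MISSING  # the slot now reports "not initialized"
```

One consequence worth keeping in mind: since built-in strings cannot be weakly referenced, a weak cache of this kind only suits values that are instances of classes supporting weak references.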

Update, Nicolas Lehuen, 2005/10/05
Updated the code to the latest version I'm using. The API for check() and build() has slightly changed: the "key" parameter has been added. The ModuleCache class now uses real module objects instead of fake ones.

Update, Nicolas Lehuen, 2006/02/08
Compatibility improvements: this latest version is compatible with Python 2.2, and with Python versions which do not include thread support.
Performance improvements: FileCache now calls stat() on the file to test whether it has been modified, and open()s it only if it has. Previously the file was opened even if it had not been modified.
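The stat()-before-open() check from the 2006/02/08 update can be sketched on its own. This is a minimal, single-threaded illustration for Python 3 (an assumption); MiniFileCache and its reads counter are hypothetical and leave out the locking and LRU handling of the real FileCache.

```python
import os
import tempfile


class MiniFileCache:
    """Hypothetical, single-threaded stand-in for FileCache's freshness check."""

    def __init__(self):
        self._entries = {}  # path -> (mtime, content)
        self.reads = 0      # how many times a file was actually opened

    def __getitem__(self, path):
        mtime = os.stat(path).st_mtime  # cheap: no open() needed
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]            # fresh: skip the read entirely
        with open(path, "rb") as f:     # unknown or stale: read the file
            content = f.read()
        self.reads += 1
        self._entries[path] = (mtime, content)
        return content


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")
    with open(path, "wb") as f:
        f.write(b"Hello, World!")
    cache = MiniFileCache()
    first = cache[path]     # read from disk
    second = cache[path]    # checked with stat() but not read
    with open(path, "wb") as f:
        f.write(b"Hello, me!")
    # Force a different mtime: two writes within the same clock tick
    # could otherwise carry the same timestamp.
    os.utime(path, (0, 0))
    third = cache[path]     # checked and re-read
    reads = cache.reads
```

Comparing modification times is only as reliable as the file system's timestamp resolution, which is why the sketch sets the timestamp explicitly before the last lookup.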
