Caching in Python with Examples
Every Python programmer should be familiar with the concept of caching.
Caching is a programming technique that stores frequently needed data in a temporary location so it can be accessed faster than fetching it from the main source every time.
In this tutorial, we will learn how to use the cachetools library to cache frequently needed data in a Python program. The cachetools module includes a number of classes that implement caches with various eviction algorithms. All of them derive from the Cache class, which in turn derives from collections.abc.MutableMapping. Each cache also provides maxsize and currsize attributes that report the maximum and current size of the cache. When a cache is full, Cache.__setitem__() repeatedly calls self.popitem() until the new item can be inserted.
This module contains a number of memoizing collections and decorators, including variations of the @lru_cache function decorator from the Python Standard Library.
Install cachetools
Install the cachetools library using the following command:
pip install cachetools
cachetools.Cache
The cachetools.Cache class provides a mutable mapping that can be used as a simple cache or as a cache base class. The Cache class calls popitem() to make space when necessary, so derived classes can override popitem() to implement specific caching algorithms. A subclass can also override __getitem__(), __setitem__(), and __delitem__() if it needs to track item access, insertion, or deletion. For example:
from cachetools import Cache
# maxsize is the maximum size of the cache (with the default getsizeof, the number of items it can hold)
cache_data = Cache(maxsize=50000)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item4), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example4.com
cachetools.FIFOCache
FIFOCache is a First In First Out (FIFO) cache implementation. This class removes items in the order they were added to make space when necessary: popitem() removes the item that was inserted first. For example:
from cachetools import FIFOCache
cache_data = FIFOCache(maxsize=50000)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item2), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example2.com
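The FIFO eviction order is easiest to see with a tiny maxsize. This sketch assumes the default getsizeof, where every item counts as size 1:

```python
from cachetools import FIFOCache

cache_data = FIFOCache(maxsize=2)
cache_data["a"] = 1
cache_data["b"] = 2
cache_data["c"] = 3  # cache is full: "a", the first item inserted, is evicted

print("a" in cache_data)  # False
print("b" in cache_data)  # True
```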
cachetools.LFUCache
LFUCache is a Least Frequently Used (LFU) cache implementation. This class keeps count of how often each item is retrieved and discards the items used least often to free space when necessary: popitem() removes the least frequently used item from the cache. For example:
from cachetools import LFUCache
cache_data = LFUCache(maxsize=50000)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item2), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example2.com
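A small cache makes the use counts visible. In this sketch, "a" is retrieved twice, so when room is needed the less frequently used "b" is the one discarded:

```python
from cachetools import LFUCache

cache_data = LFUCache(maxsize=2)
cache_data["a"] = 1
cache_data["b"] = 2
cache_data["a"]       # retrieve "a" so it is used more often than "b"
cache_data["a"]
cache_data["c"] = 3   # "b" is the least frequently used item, so it is evicted

print("b" in cache_data)  # False
print("a" in cache_data)  # True
```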
cachetools.LRUCache
LRUCache is a Least Recently Used (LRU) cache implementation. This class discards the item that has gone unused for the longest time to free space when necessary: popitem() removes the least recently used item from the cache. For example:
from cachetools import LRUCache
cache_data = LRUCache(maxsize=50000)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item2), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example2.com
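Unlike LFU, recency rather than frequency decides eviction. In this sketch, touching "a" just before the cache fills makes "b" the least recently used item:

```python
from cachetools import LRUCache

cache_data = LRUCache(maxsize=2)
cache_data["a"] = 1
cache_data["b"] = 2
cache_data["a"]       # touch "a" so "b" becomes the least recently used item
cache_data["c"] = 3   # "b" is evicted

print("b" in cache_data)  # False
print("a" in cache_data)  # True
```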
cachetools.MRUCache
MRUCache is a Most Recently Used (MRU) cache implementation. This class discards the most recently used item to free space when necessary: popitem() removes the most recently used item from the cache. For example:
from cachetools import MRUCache
cache_data = MRUCache(maxsize=50000)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item2), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example2.com
cachetools.RRCache
RRCache is a Random Replacement (RR) cache implementation. This class selects an item at random and discards it to free space when necessary: popitem() removes a randomly chosen item from the cache. The choice parameter accepts any callable that picks an element from a sequence (by default random.choice). For example:
from cachetools import RRCache
import random
cache_data = RRCache(maxsize=50000, choice=random.choice)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item2), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example2.com
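With random replacement we cannot predict which existing item is discarded, only that one of them is. A small sketch:

```python
import random
from cachetools import RRCache

cache_data = RRCache(maxsize=2, choice=random.choice)
cache_data["a"] = 1
cache_data["b"] = 2
cache_data["c"] = 3   # a randomly chosen existing item ("a" or "b") is evicted

print(len(cache_data))    # 2
print("c" in cache_data)  # True: eviction happens before the new item is stored
```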
cachetools.TTLCache
TTLCache is an LRU cache variant in which every item is assigned the same time-to-live (TTL) value, set when the cache is created. An item is no longer accessible once its time-to-live expires, and expired items are removed automatically; when the cache is full, popitem() removes the least recently used item that has not yet expired. For example:
from cachetools import TTLCache
from datetime import datetime, timedelta
# Creating a cache where each item will be accessible for 1 hour
cache_data = TTLCache(maxsize=50000, ttl=timedelta(hours=1), timer=datetime.now)
data_item1 = "http://example1.com"
data_item2 = "http://example2.com"
data_item3 = "http://example3.com"
data_item4 = "http://example4.com"
data_item5 = "http://example5.com"
#Storing data in cache for no longer than 1 hour
cache_data[hash(data_item1)] = data_item1
cache_data[hash(data_item2)] = data_item2
cache_data[hash(data_item3)] = data_item3
cache_data[hash(data_item4)] = data_item4
cache_data[hash(data_item5)] = data_item5
#Accessing data from cache
item = cache_data.get(hash(data_item2), None)
print("Getting from cache = ", item)
The output of the above code is as follows:
Getting from cache = http://example2.com
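Expiry can be demonstrated deterministically by passing a custom timer callable instead of waiting for real time to pass. This sketch uses a fake clock (the clock list is just an illustrative device):

```python
from cachetools import TTLCache

clock = [0]  # a fake clock so expiry can be shown without sleeping
cache_data = TTLCache(maxsize=10, ttl=5, timer=lambda: clock[0])
cache_data["a"] = 1

clock[0] = 3
print("a" in cache_data)  # True: only 3 time units have passed

clock[0] = 6
print("a" in cache_data)  # False: the 5-unit time-to-live has expired
```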
Memoizing Decorators
The cachetools module also provides decorators for memoizing function and method calls. Memoization is a technique that stores the results of expensive function calls so that the cached result can be returned, rather than recomputed, when the same inputs occur again.
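Conceptually, memoization is just a lookup table keyed by the function's arguments. A minimal hand-rolled sketch (the memoize and square names are illustrative) shows the pattern that the cachetools decorators generalize:

```python
def memoize(func):
    results = {}  # maps an argument to its previously computed result
    def wrapper(n):
        if n not in results:   # cache miss: compute and remember
            results[n] = func(n)
        return results[n]      # cache hit: return the stored result
    return wrapper

@memoize
def square(n):
    return n * n

print(square(12))  # 144, computed on the first call
print(square(12))  # 144, returned from the results dict
```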
@cachetools.cached
cachetools.cached is a decorator that wraps a function with a memoizing callable and stores the results in a cache. Example:
from cachetools import cached
import time
# WITHOUT CACHE: calculating Fibonacci numbers
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)
start_time = time.time()
print("Result without cache = ",fibonacci(30))
end_time = time.time()
print("Time Taken without cache : ", end_time - start_time)
# WITH CACHE: speed up calculating Fibonacci numbers
@cached(cache={})
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)
start_time = time.time()
print("Result with cache = ",fibonacci(30))
end_time = time.time()
print("Time Taken with cache : ", end_time - start_time)
The output of the above code is as follows:
Result without cache = 832040
Time Taken without cache : 0.19904470443725586
Result with cache = 832040
Time Taken with cache : 0.0
Here's another example of calling a method with and without caching the result. The caching method saves the result for no longer than 10 minutes:
from cachetools import cached, TTLCache
import urllib.request
import time
BASE_URL = "https://api.openweathermap.org/data/2.5/weather?"
API_KEY = "YOUR_API_KEY"  # Get your key at https://home.openweathermap.org/api_keys
def get_weather(city):
    URL = BASE_URL + "q=" + city + "&appid=" + API_KEY
    f = urllib.request.urlopen(URL)
    if f.status == 200:
        data = f.read().decode('utf-8')
        return data
    return None
start_time = time.time()
print(get_weather("Boston"))
end_time = time.time()
print("Time Taken without using cache = ", end_time - start_time)
# cache weather data for no longer than ten minutes
@cached(cache=TTLCache(maxsize=1024, ttl=600))
def get_weather(city):
    URL = BASE_URL + "q=" + city + "&appid=" + API_KEY
    f = urllib.request.urlopen(URL)
    if f.status == 200:
        data = f.read().decode('utf-8')
        return data
    return None
start_time = time.time()
print(get_weather("Boston"))
end_time = time.time()
print("Time Taken using cache = ", end_time - start_time)
The output of the above code is as follows:
{"coord":{"lon":-71.0598,"lat":42.3584},"weather":[{"id":801,"main":"Clouds","description":"few clouds","icon":"02n"}],"base":"stations","main":{"temp":264.65,"feels_like":258.32,"temp_min":260.9,"temp_max":268.12,"pressure":1040,"humidity":63},"visibility":10000,"wind":{"speed":4.12,"deg":220},"clouds":{"all":20},"dt":1644982968,"sys":{"type":1,"id":3486,"country":"US","sunrise":1644925256,"sunset":1644963363},"timezone":-18000,"id":4930956,"name":"Boston","cod":200}
Time Taken without using cache = 2.791274309158325
{"coord":{"lon":-71.0598,"lat":42.3584},"weather":[{"id":801,"main":"Clouds","description":"few clouds","icon":"02n"}],"base":"stations","main":{"temp":264.65,"feels_like":258.32,"temp_min":260.9,"temp_max":268.12,"pressure":1040,"humidity":63},"visibility":10000,"wind":{"speed":4.12,"deg":220},"clouds":{"all":20},"dt":1644982968,"sys":{"type":1,"id":3486,"country":"US","sunrise":1644925256,"sunset":1644963363},"timezone":-18000,"id":4930956,"name":"Boston","cod":200}
Time Taken using cache = 0.43514037132263184