Understanding Memcached Python

If you have an application that handles a lot of data or makes frequent database queries, you may want to use memcached from Python to speed it up. Memcached is fast and efficient because it avoids repeatedly hitting a slow database, or recomputing the same data every time an action is performed.

Memcached is a separate tool with client libraries for many languages. Python does offer caching possibilities out of the box, such as functools.lru_cache, which limits the amount of data being cached by evicting items on a ‘least recently used’ basis. These built-in tools are local to the Python process, however, which means they do not scale when you are running multiple copies of an application across a bigger platform. That’s where memcached starts to come in handy.

Memcached as a Dictionary

Memcached is quite easy to understand: think of it as one huge dictionary available to the whole network. It holds keys and values, which are bytes, and each entry is valid for a given time period.

You interact with memcached through ‘set’ and ‘get’ operations: ‘set’ assigns a value to a key, and ‘get’ retrieves the value that was previously stored.
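As a minimal sketch, here is what set and get look like from Python, assuming the pymemcache client library and a memcached server already running locally on the default port 11211:

    from pymemcache.client.base import Client

    # Connect to a memcached server running locally on the default port.
    client = Client(("localhost", 11211))

    # 'set' stores a value under a key; 'get' retrieves it.
    client.set("greeting", "hello world")
    value = client.get("greeting")  # returns b"hello world" (values come back as bytes)
    print(value)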

Memcached is not local to your Python process, so it can be shared across the whole network: set up a memcached instance, then make your Python application a client. The memcached network protocol is fast and efficient. Filling memcached with data from the canonical data source does take time, but it only has to be done once per validity period; after that, all future calls read from the memcached dictionary instead.
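Because the client only needs a host and port, the application does not have to run on the same machine as memcached. A short sketch, with a hypothetical hostname standing in for wherever your memcached instance lives:

    from pymemcache.client.base import Client

    # The hostname below is hypothetical; point it at whichever machine
    # on your network is running the memcached instance (default port 11211).
    client = Client(("memcached.example.internal", 11211))
    client.set("shared_key", "visible to every app server that connects here")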

Keeping Data Relevant

When you store data in memcached, you can set how long it remains valid. This expiration time, given in seconds, tells memcached how long to keep the key and its value around. After that time, memcached automatically removes the key from the cache, and you can repopulate the cache with fresh data if necessary.
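With pymemcache, for example, the validity period can be passed as the expire argument to set, in seconds (a sketch; the key and value here are made up):

    from pymemcache.client.base import Client

    client = Client(("localhost", 11211))

    # Keep this entry for 60 seconds; after that, memcached drops the key
    # and a subsequent get returns None.
    client.set("user_count", "1024", expire=60)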

The cache validity period you choose will vary depending on the type of application you are building; there is no single right or wrong answer. For some applications you may want the cache to hold data for just a few seconds; for others, a validity period measured in hours could be fine.

One thing you need to watch out for is cache invalidation: if the cache falls out of sync with your current data, users will be served stale results, which you most definitely do not want when the application handles important data. Memory is limited, so you cannot simply cache the whole database and refresh it over and over; instead, cache the most frequently accessed data and choose a TTL that is reasonable given how critical the data is to the application. Remember, too, that the cache will need to be ‘warmed up’ again if memcached crashes or the server is restarted.
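One common way to keep memory usage bounded while coping with a cold or expired cache is a fetch-or-compute pattern: try memcached first, and fall back to the canonical data source on a miss. A sketch, assuming pymemcache and a hypothetical load_profile_from_db helper:

    from pymemcache.client.base import Client

    client = Client(("localhost", 11211))

    def get_profile(user_id, ttl=300):
        """Return a user's profile, caching it in memcached for `ttl` seconds."""
        key = f"profile:{user_id}"
        cached = client.get(key)
        if cached is not None:
            return cached  # served from memcached, no database hit

        # Cache miss: the entry expired, was never stored, or memcached was
        # restarted. Fall back to the canonical data source and repopulate.
        profile = load_profile_from_db(user_id)  # hypothetical database helper
        # pymemcache stores strings/bytes by default; richer objects need a serializer.
        client.set(key, profile, expire=ttl)
        return profile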