.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

.. include:: ../../common.defs

.. _developer-cache-ram-cache:

RAM Cache
*********

New RAM Cache Algorithm (CLFUS)
===============================

The new RAM Cache algorithm, called CLFUS (Clocked Least Frequently Used by
Size), uses ideas from a number of cache replacement policies and algorithms,
including LRU, LFU, CLOCK, GDFS and 2Q. It avoids any patented algorithms and
includes the following features:

* Balances recency, frequency and size to maximize hit rate (not byte hit
  rate).

* Is scan resistant and extracts robust hit rates even when the working set
  does not fit in the RAM Cache.

* Supports compression at 3 levels: fastlz, gzip (libz), and xz (liblzma).
  Compression can be moved to another thread.

* Has very low CPU overhead, only slightly more than a basic LRU. Rather than
  using an O(lg n) heap, it uses a probabilistic replacement policy for O(1)
  cost with a low constant factor.

* Has relatively low memory overhead of approximately 200 bytes per object in
  memory.

The rationale for emphasizing hit rate over byte hit rate is that the overhead
of pulling more bytes from secondary storage is low compared to the cost of a
request.

The RAM Cache consists of an object hash fronting 2 LRU/CLOCK lists and a *seen*
hash table.
The first list (the *Cached List*) contains objects in memory, while the second
(the *History List*) contains a history of objects which have either recently
been in memory or are being considered for keeping in memory. The *seen* hash
table is used to make the algorithm scan resistant.

The list entries record the following information:

============== ================================================================
Value          Description
============== ================================================================
key            16 byte unique object identifier.
auxkeys        8 bytes worth of version number (in our system, the block in the
               partition). When the version of an object changes, old entries
               are purged from the cache.
hits           Number of hits within this clock period.
size           Size of the object in the cache.
len            Length of the object, which differs from *size* because of
               compression and padding.
compressed_len Compressed length of the object.
compressed     Compression type, or ``none`` if no compression. Possible types
               are: *fastlz*, *libz*, and *liblzma*.
uncompressible Flag indicating that content cannot be compressed (true), or
               that it may be compressed (false).
copy           Whether or not this object should be copied in and copied out
               (e.g. HTTP HDR).
LRU link
HASH link
IOBufferData   Smart pointer to the data buffer.
============== ================================================================

The interface to the cache is *Get* and *Put* operations. Get operations check
if an object is in the cache and are called on a read attempt. The Put operation
decides whether or not to cache the provided object in memory. It is called
after a read from secondary storage.

Seen Hash
=========

The *Seen Hash* becomes active after the *Cached* and *History* lists become
full following a cold start. The purpose is to make the cache scan resistant,
which means that the cache state must not be affected at all by a long sequence
of Get and Put operations on objects which are seen only once.
This is essential, and without it not only would the cache be polluted, but it
could lose critical information about the objects that it cares about. It is
therefore essential that the Cached and History Lists are not affected by Get or
Put operations on objects seen the first time. The Seen Hash maintains a set of
16 bit hash tags. Requests which do not hit in the object cache (that is, the
object is in neither the Cached List nor the History List) and do not match the
hash tag result in the hash tag being updated but are otherwise ignored. The
Seen Hash is sized to approximately the number of objects in the cache in order
to match the number that are passed through it with the CLOCK rate of the Cached
and History Lists.

Cached List
===========

The *Cached List* contains objects actually in memory. The basic operation is
LRU with new entries inserted into a FIFO queue and hits causing objects to be
reinserted. The interesting bit comes when an object is being considered for
insertion. A check is first made against the Object Hash to see if the object is
in the Cached List or History List. Cached List hits result in updating the
``hits`` field and reinsertion of the object. History hits result in the
``hits`` field being updated and a comparison to see if this object should be
kept in memory. The comparison is against the least recently used members of the
Cached List, and is based on a weighted frequency::

   CACHE_VALUE = hits / (size + overhead)

A new object must outvalue enough bytes worth of currently cached objects to
cover its own size. Each time an object is considered for replacement the CLOCK
moves forward. If the History object has a greater value then it is inserted
into the Cached List and the replaced objects are removed from memory and their
list entries are inserted into the History List. If the History object has a
lesser value it is reinserted into the History List. Objects considered for
replacement (at least one) but not replaced have their ``hits`` field set to
``0`` and are reinserted into the Cached List.
This is the CLOCK operation on the Cached List.

History List
============

Each CLOCK, the least recently used entry in the History List is dequeued. If
its ``hits`` field is greater than ``1`` (it was hit at least once in the
History or Cached List), ``hits`` is set to ``0`` and the entry is requeued on
the History List. Otherwise it is deleted.

Compression and Decompression
=============================

Compression is performed by a background operation (currently called as part of
Put) which maintains a pointer into the Cached List and runs toward the head
compressing entries. Decompression occurs on demand during a Get. In the case of
objects tagged ``copy``, the compressed version is reinserted in the LRU since
we need to make a copy anyway. Those not tagged ``copy`` are inserted
uncompressed in the hope that they can be reused in uncompressed form. This is a
compile time option and may be something we want to change.

There are 3 algorithms and levels of compression (speed on an Intel i7 920
series processor using one thread):

======= ================ ================== ====================================
Method  Compression Rate Decompression Rate Notes
======= ================ ================== ====================================
fastlz  173 MB/sec       442 MB/sec         Basically free since disk or network
                                            will limit first; ~53% final size.
libz    55 MB/sec        234 MB/sec         Almost free, particularly
                                            decompression; ~37% final size.
liblzma 3 MB/sec         50 MB/sec          Expensive; ~27% final size.
======= ================ ================== ====================================

These are ballpark numbers, and your mileage will vary enormously. JPEG, for
example, will not compress with any of these (or at least will only do so at
such a marginal level that the cost of compression and decompression is wholly
unjustified), and the same is true of many other media and binary file types
which embed some form of compression.
The RAM Cache does detect the achieved compression level and will declare an
object *incompressible* if its compressed form does not get below 90% of the
original size. This flag is cached so that the RAM Cache will not attempt to
compress the object again (at least as long as it is in the history).
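
The incompressibility check described above can be sketched as follows. This is
only an illustrative sketch under stated assumptions, not the Traffic Server
implementation: it is written in Python, uses the standard ``zlib`` module as a
stand-in for libz, and the ``Entry`` class, ``try_compress`` function, and
``INCOMPRESSIBLE_RATIO`` constant are hypothetical names chosen for the example.
Only the 90% threshold and the cached flag come from the description above.

.. code-block:: python

    import os
    import zlib

    # Threshold from the text: the compressed form must get below 90% of the
    # original size. The constant name itself is hypothetical.
    INCOMPRESSIBLE_RATIO = 0.90


    class Entry:
        """Hypothetical stand-in for a RAM Cache list entry (not the ATS type)."""

        def __init__(self, data: bytes):
            self.data = data
            self.len = len(data)             # uncompressed length
            self.compressed_len = len(data)  # equals len until compressed
            self.compressed = "none"         # compression type
            self.uncompressible = False      # cached "do not retry" flag


    def try_compress(entry: Entry) -> bool:
        """Return True only if this call replaced the data with a compressed copy."""
        if entry.uncompressible or entry.compressed != "none":
            return False  # flagged earlier or already compressed: skip the work
        packed = zlib.compress(entry.data)
        if len(packed) >= entry.len * INCOMPRESSIBLE_RATIO:
            entry.uncompressible = True  # remember the result for later passes
            return False
        entry.data = packed
        entry.compressed_len = len(packed)
        entry.compressed = "libz"
        return True


    # Demo: repetitive text versus random bytes (which resemble media that is
    # already compressed).
    text = Entry(b"hello world " * 500)
    noise = Entry(os.urandom(4096))
    print(try_compress(text), text.compressed, text.compressed_len)
    print(try_compress(noise), noise.uncompressible)
    print(try_compress(noise))  # second attempt is skipped via the cached flag

In this sketch the repetitive text should end up stored in compressed form,
while the random bytes are flagged once and skipped on every later call.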