Cache Tools

Tools and techniques for cache monitoring and inspection.

Topics to be done

  • Resident alternates
  • Object refresh

Cache Consistency

The cache is completely consistent, up to and including kicking the power cord out, if the write buffer on consumer disk drives is disabled. You need to use:

hdparm -W0

The cache validates that all the data for the document is available and will silently mark a partial document as a miss on read. There is no gentle shutdown for Traffic Server. You simply kill the process and the recovery code (fsck) is run every time Traffic Server starts up.

On startup the two versions of the index are checked, and the last valid one is read into memory. Traffic Server then moves forward from the last snapped write cursor and reads all the fragments written to disk and updates the directory (as in a log-based file system). It stops reading at the write before the last valid write header it sees (as a write is not necessarily atomic because of sector reordering). Then the new updated index is written to the invalid version (in case of a crash during startup) and the system starts.

Volume Tagging

Currently, cache volumes are allocated somewhat arbitrarily from storage elements. This enhancement allows storage.config to assign storage units to specific volumes although the volumes must still be listed in volume.config in general and in particular to map domains to specific volumes. A primary use case for this is to be able to map specific types of content to different storage elements. This can be employed to have different storage devices for various types of content (SSD vs. rotational).

Version Upgrade

It is currently the case that any change to the cache format will clear the cache. This is an issue when upgrading the Traffic Server version and should be kept in mind.

Controlling the cache key

The cache key is by default the URL of the request. There are two possible choices, the original (pristine) URL and the remapped URL. Which of these is used is determined by the configuration value proxy.config.url_remap.pristine_host_hdr.

This is an INT value. If set to 0 (disabled) then the remapped URL is used, and if it is not 0 (enabled) then the original URL is used. This setting also controls the value of the HOST header that is placed in the request sent to the origin server, using the hostname from the original URL if not 0 and the host name from the remapped URL if 0. It has no other effects.

For caching, this setting is irrelevant if no remapping is done or there is a one-to-one mapping between the original and remapped URLs.

It becomes significant if multiple original URLs are mapped to the same remapped URL. If pristine headers are enabled, requests to different original URLs will be stored as distinct objects in the cache. If disabled, the remapped URL will be used and there may be collisions. This is bad if the contents different, but quite useful if they are the same (as in situations where the original URLs are just aliases for the same underlying server resource).

This is also an issue if a remapping is changed because it is effectively a time axis version of the previous case. If an original URL is remapped to a different server address then the setting determines if existing cached objects will be served for new requests (enabled) or not (disabled). Similarly, if the original URL mapped to a particular URL is changed then cached objects from the initial original URL will be served from the updated original URL if pristine headers is disabled.

These collisions are not by themselves good or bad. An administrator needs to decide which is appropriate for their situation and set the value correspondingly.

If a greater degree of control is desired, a plugin must be used to invoke the API calls TSHttpTxnCacheLookupUrlSet() or TSCacheUrlSet() to provide a specific cache key. The TSCacheUrlSet() API can be called as early as TS_HTTP_READ_REQUEST_HDR_HOOK but no later than TS_HTTP_POST_REMAP_HOOK. It can be called only once per transaction; calling it multiple times has no additional effect.

A plugin that changes the cache key must do so consistently for both cache hit and cache miss requests because two different requests that map to the same cache key will be considered equivalent by the cache. Use of the URL directly provides this and so must any substitute. This is entirely the responsibility of the plugin; there is no way for the Traffic Server core to detect such an occurrence.

If TSHttpTxnCacheLookupUrlGet() is called after new cache url set by TSHttpTxnCacheLookupUrlSet() or TSCacheUrlSet(), it should use a URL location created by TSUrlCreate() as its third input parameter instead of getting url_loc from the client request.

It is a requirement that the string be syntactically a URL but otherwise it is completely arbitrary and need not have any path. For instance, if the company Network Geographics wanted to store certain content under its own cache key, using a document GUID as part of the key, it could use a cache key like

ngeo://W39WaGTPnvg

The scheme ngeo was picked specifically because it is not a valid URL scheme, and so will never collide with any valid URL.

This can be useful if the URL encodes both important and unimportant data. Instead of storing potentially identical content under different URLs (because they differ on the unimportant parts) a url containing only the important parts could be created and used.

For example, suppose the URL for Network Geographics content encoded both the document GUID and a referral key.

http://network-geographics-farm-1.com/doc/W39WaGTPnvg.2511635.UQB_zCc8B8H

We don’t want to serve the same content for every possible referrer. Instead, we could use a plugin to convert this to the previous example and requests that differed only in the referrer key would all reference the same cache entry. Note that we would also map the following to the same cache key

http://network-geographics-farm-56.com/doc/W39WaGTPnvg.2511635.UQB_zCc8B8H

This can be handy for sharing content between servers when that content is identical. Plugins can change the cache key, or not, depending on any data in the request header. For instance, not changing the cache key if the request is not in the doc directory. If distinguishing servers is important, that can easily be pulled from the request URL and used in the synthetic cache key. The implementor is free to extract all relevant elements for use in the cache key.

While there is no explicit requirement that the synthetic cache key be based on the HTTP request header, in practice it is generally necessary due to the consistency requirement. Because cache lookup happens before attempting to connect to the origin server, no data from the HTTP response header is available, leaving only the request header. The most common case is the one described above where the goal is to elide elements of the URL that do not affect the content to minimize cache footprint and improve cache hit rates.