Extractor Reference

A feature is created by applying a feature expression, which consists of a mix of literal strings and extractors.

For convenience, because a single extractor is by far the most common case, unquoted strings are treated as a single extractor. Consider the extractor ua-req-host. This can be used in the following feature expressions, presuming the host is “example.one”.

Feature String                 Extracted Feature
"Host = {ua-req-host}"         Host = example.one
"Host = {ua-req-host:*<15}"    Host = example.one****
ua-req-host                    example.one
"ua-req-host"                  ua-req-host
"{ua-req-host}"                example.one
"{{ua-req-host}}"              {ua-req-host}
!literal "{ua-req-host}"       {ua-req-host}

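Extractors may accept or require a parameter that affects what is extracted. The parameter is supplied as an argument in angle brackets after the extractor name, e.g. ua-req-field<Host>. Because the argument is part of the extractor, it can be used inside a feature expression.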
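
As a minimal sketch of how these forms combine, the fragment below builds one feature from two extractors and a literal separator, then selects on the result. It uses only the with / select / match / otherwise pattern that appears elsewhere in this reference. The host value and the empty do bodies are illustrative placeholders, and both extractors are assumed to be usable on the hook where the fragment runs.

```yaml
# A feature expression: literal text "://" mixed with two extractors.
with: "{ua-req-scheme}://{ua-req-host}"
select:
- match: "https://example.one"
  do: # Handling for "https" requests to example.one.
- otherwise:
  do: # Everything else.
```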

Extractors

HTTP Messages

There is a lot of information in the HTTP messages handled by Traffic Server and many extractors to access it. These are divided into families, each one based around one of the basic messages:

  • User Agent Request - the request sent by the user agent to Traffic Server.

  • Proxy Request - the request sent by Traffic Server (the proxy) to the upstream.

  • Upstream Response - the response sent by the upstream to Traffic Server in response to the Proxy Request.

  • Proxy Response - the response sent by Traffic Server to the user agent.

There is also the “pre-remap” or “pristine” user agent URL. This is a URL only, not a request, and is a copy of the user agent URL just before URL rewriting.

In addition, for the remap hook, there are two other URLs (not requests) that are available. These are the “target” and “replacement” URLs which are the values literally specified in the URL rewrite rule. Note these can be problematic for regular expression remap rules as they are frequently not valid URLs.

Host and Port Handling

The host and port for a request or URL require special handling due to vagaries in the HTTP specification. The most important distinction is that these can appear in two different places: the URL itself, the Host field of the request, or both. This can make modifying them in a specific way challenging. The complexity described here exists to make it possible to do exactly what is needed. In particular, although the HTTP specification says that if the host and port are in both the URL and the Host field they must be the same, in practice many proxies are configured to make them different before sending the request upstream, generally so that the Host field is based on what the user agent sent and the URL is changed to where the proxy routes the request. Therefore there are extractors which work on the request as a whole, considering both the URL and the Host field, and others that always use the URL.

Beyond this, the port is optional, which presents some problems. One is a result of the ATS plugin API, which makes it impossible to distinguish between “http://delain.nl” and “http://delain.nl:80”. Both port 80 and port 443 are treated specially, the former for scheme “HTTP” and the latter for “HTTPS”. I intend to add that distinction at some point but currently it cannot be done. The result is that it is difficult to impossible to properly set these values in a configuration language such as TxnBox has, and even if possible it would be rather painful to do repeatedly. Therefore TxnBox has the concept of “location”, which corresponds to the host and port (the HTTP specification calls this the “authority” but everyone thought using that term was a terrible idea). This makes it easy to access the host, the port, or both, which matters most when interacting with directives that set those values. Here is a chart to illustrate the three terms:

url                              host          port   loc
http://evil-kow.ex/path          evil-kow.ex   80     evil-kow.ex
http://evil-kow.ex:80/path       evil-kow.ex   80     evil-kow.ex
https://evil-kow.ex/path         evil-kow.ex   443    evil-kow.ex
http://evil-kow.ex:4443/path     evil-kow.ex   4443   evil-kow.ex:4443
https://evil-kow.ex:4443/path    evil-kow.ex   4443   evil-kow.ex:4443
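
The chart can be read as a selection on the location. This is a sketch only, assuming the user agent request extractor ua-req-loc (described later in this reference) follows the chart. The do bodies are placeholders.

```yaml
# The location carries the port only when it is not the default for the scheme.
with: ua-req-loc
select:
- match: "evil-kow.ex:4443"
  do: # Explicit, non-default port 4443.
- match: "evil-kow.ex"
  do: # Default port: 80 for "http", 443 for "https".
```

When only the host matters, regardless of port, ua-req-host avoids having to list each location variant.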

Paths

Unfortunately, due to how the plugin API works, paths are a bit odd. The main consequence is:

Important

Paths do not have a leading slash.

Given the URL “http://delain.nl/pix”, the path is “pix”, not “/pix”. The existence of the slash is implied by the existence of the path. There is, unfortunately, no way to distinguish a missing path from an empty path. E.g. “http://delain.nl” and “http://delain.nl/” are not distinguishable by looking at the value from a path extractor; both will yield an empty string. This matters less than it appears because both ATS and the upstream will treat them identically. Note this applies only to the slash separating the “authority” / “location” from the path. The paths of the URLs “http://delain.nl/pix/charlotte” and “http://delain.nl/pix/charlotte/” are distinguishable.
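
A short sketch of what this means when selecting on a path, using the ua-req-path extractor and the match / prefix comparisons shown elsewhere in this reference. It is assumed that match accepts an empty string. The do bodies are placeholders.

```yaml
with: ua-req-path
select:
- match: ""
  do: # "http://delain.nl" and "http://delain.nl/" both end up here.
- match: "pix"
  do: # "http://delain.nl/pix" - the value is "pix", not "/pix".
- prefix: "pix/"
  do: # "http://delain.nl/pix/charlotte" and anything else under "pix/".
```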

User Agent Request

ua-req-method
Result:
string

The user agent request method.

ua-req-url
Result:
string

The URL in the user agent request.

ua-req-scheme
Result:
string

The URL scheme in the user agent request.

ua-req-loc
Result:
string

The location for the request, consisting of the host and the optional port. This is retrieved from the URL if present, otherwise from the Host field.

ua-req-host
Result:
string

Host for the user agent request. This is retrieved from the URL if present, otherwise from the Host field. This does not include the port.

ua-req-port
Result:
integer

The port for the user agent request. This is pulled from the URL if present, otherwise from the Host field. If not specified, the canonical default based on the scheme is used.

ua-req-path
Result:
string

The path of the URL in the user agent request. This does not include a leading slash.

ua-req-query
Result:
string

The query string for the user agent request if present, an empty string if not.

ua-req-query-value
Result:
string
Argument:
Query parameter key.

The value for a specific query parameter, identified by key. This assumes the standard format for a query string, key / value pairs (joined by ‘=’) separated by ‘&’ or ‘;’. The key comparison is case insensitive. NIL is returned if the key is not found.
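
For illustration, a sketch assuming a request for “http://delain.nl/pix?album=live&id=7”; the key name and values are made up for the example. Per the description above, the key “album” selects the value “live”, and “Album” would select the same value because key comparison is case insensitive.

```yaml
with: ua-req-query-value<album>
select:
- match: "live"
  do: # The query string contains "album=live".
- otherwise:
  do: # A different value, or the key was not found.
```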

ua-req-fragment
Result:
string

The fragment of the URL in the user agent request if present, an empty string if not.

ua-req-url-host
Result:
string

Host for the user agent request URL.

ua-req-url-port
Result:
integer

The port for the user agent request URL.

ua-req-url-loc
Result:
string

The location for the user agent request URL, consisting of the host and the optional port.

ua-req-field
Result:
NULL, string, string list
Argument:
name

The value of a field in the client request. This requires a field name as an argument. To get the value of the “Host” field the extractor would be “ua-req-field<Host>”. The field name is case insensitive.

If the field is not present, the NULL value is returned. Note this is distinct from the empty string which is returned if the field is present but has no value. If there are duplicate fields then a string list is returned, each element of which corresponds to a field.
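
A minimal sketch of selecting on a field value. The host name is a placeholder, and how a string list from duplicate fields compares against a single string is not covered here, so the fallback case is left generic.

```yaml
# "ua-req-field<host>" would name the same field; field names are case insensitive.
with: ua-req-field<Host>
select:
- match: "example.one"
  do: # The Host field is present with exactly this value.
- otherwise:
  do: # Any other result.
```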

Pre-Remap

The following extractors extract data from the user agent request URL, but from the URL as it was before URL rewriting (“remapping”). Only the URL is preserved, not any of the fields or the method. These are referred to elsewhere as “pristine” but that is a misnomer. If the user agent request is altered before URL rewriting, that will be reflected in the data from these extractors. These do not necessarily return the URL as it was received by ATS from the user agent. All of these have an alias with “pristine” instead of “pre-remap” for old school operations staff. There are no directives to modify these values, they are read only.
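
As a sketch, the pre-remap extractors listed below are used like any other extractor. The alias in the comment follows the naming rule stated above (“pristine” in place of “pre-remap”); the host name is a placeholder.

```yaml
# Equivalent, per the alias rule above: pristine-host
with: pre-remap-host
select:
- match: "example.one"
  do: # The URL named example.one just before URL rewriting.
```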

pre-remap-scheme
Result:
string

The URL scheme in the pre-remap user agent request URL.

pre-remap-url
Result:
string

The full URL of the pre-remap user agent request.

pre-remap-path
Result:
string

The URL path in the pre-remap user agent request URL. This does not include a leading slash.

pre-remap-host
Result:
string

The host in the pre-remap user agent request URL. This does not include the port.

pre-remap-port
Result:
integer

The port in the pre-remap user agent request URL. If not specified, the canonical default based on the scheme is used.

pre-remap-query
Result:
string

The query string for the pre-remap user agent request URL.

pre-remap-query-value
Result:
string
Argument:
Query parameter key.

The value for a specific query parameter, identified by key. This assumes the standard format for a query string, key / value pairs (joined by ‘=’) separated by ‘&’ or ‘;’. The key comparison is case insensitive. NIL is returned if the key is not found.

pre-remap-fragment
Result:
string

The fragment of the URL in the pre-remap user agent request if present, an empty string if not.

Rewrite Rule URLs

During URL rewriting there are two additional URLs available, the “target” and the “replacement” URL. These are fixed values from the rule itself, not the user agent. For this reason there are extractors to get data from these URLs but no directives to modify them. These values are available only for the “remap” hook, that is directives invoked from a rule in “remap.config”. Query values are not permitted in these URLs and so no extractor for that is provided.
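
One possible use, sketched under the assumption that the fragment is invoked from a rule in “remap.config”: checking the scheme of the rule's replacement URL with the remap-replacement-scheme extractor listed below. The do bodies are placeholders.

```yaml
with: remap-replacement-scheme
select:
- match: "https"
  do: # The rule's replacement URL uses the "https" scheme.
- otherwise:
  do: # Any other scheme.
```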

remap-target-url
Result:
string

The full target URL.

remap-target-scheme
Result:
string

The scheme in the target URL.

remap-target-loc
Result:
string

The network location of the target URL.

remap-target-host
Result:
string

The host in the target URL. This does not include the port, if any.

remap-target-port
Result:
integer

The port in the target URL. If not specified, the default based on the scheme is extracted.

remap-target-path
Result:
string

The path in the target URL.

remap-replacement-url
Result:
string

The full replacement URL.

remap-replacement-scheme
Result:
string

The scheme in the replacement URL.

remap-replacement-loc
Result:
string

The network location in the replacement URL.

remap-replacement-host
Result:
string

The host in the replacement URL. This does not include the port, if any.

remap-replacement-port
Result:
integer

The port in the replacement URL. If not specified, the default based on the scheme is extracted.

remap-replacement-path
Result:
string

The path in the replacement URL.

Proxy Request

proxy-req-method
Result:
string

The proxy request method.

proxy-req-url
Result:
string

The URL in the request.

proxy-req-scheme
Result:
string

The URL scheme in the proxy request.

proxy-req-loc
Result:
string

The network location in the request. This is retrieved from the URL if present, otherwise from the Host field.

proxy-req-host
Result:
string

Host for the request. This is retrieved from the URL if present, otherwise from the Host field. This does not include the port.

proxy-req-path
Result:
string

The path of the URL in the request. This does not include a leading slash.

proxy-req-port
Result:
integer

The port for the request. This is pulled from the URL if present, otherwise from the Host field.

proxy-req-query
Result:
string

The query string in the proxy request.

proxy-req-query-value
Result:
string
Argument:
Query parameter key.

The value for a specific query parameter, identified by key. This assumes the standard format for a query string, key / value pairs (joined by ‘=’) separated by ‘&’ or ‘;’. The key comparison is case insensitive. NIL is returned if the key is not found.

proxy-req-fragment
Result:
string

The fragment of the URL in the proxy request if present, an empty string if not.

proxy-req-url-host
Result:
string

The host in the request URL.

proxy-req-url-port
Result:
integer

The port in the request URL.

proxy-req-url-loc
Result:
string

The location in the URL if present, an empty string if not.

proxy-req-field
Result:
NULL, string, string list
Argument:
name

The value of a field. This requires a field name as an argument. To get the value of the “Host” field the extractor would be “proxy-req-field<Host>”. The field name is case insensitive.

If the field is not present, the NULL value is returned. Note this is distinct from the empty string which is returned if the field is present but has no value. If there are duplicate fields then a string list is returned, each element of which corresponds to a field.

Upstream Response

upstream-rsp-status
Result:
integer

The code of the response status.
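
Because the status is an integer, it can be bucketed with the gt comparison used elsewhere in this reference. This sketch assumes cases are tried in order, as in the random example later in this document, so the second case only sees values that failed the first.

```yaml
with: upstream-rsp-status
select:
- gt: 499
  do: # 5xx - upstream failure.
- gt: 399
  do: # 4xx - the 5xx range was already handled above.
- otherwise:
  do: # Anything below 400.
```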

upstream-rsp-status-reason
Result:
string

The reason of the response status.

upstream-rsp-field
Result:
NULL, string, string list
Argument:
name

The value of a field. This requires a field name as an argument. The field name is case insensitive.

If the field is not present, the NULL value is returned. Note this is distinct from the empty string which is returned if the field is present but has no value. If there are duplicate fields then a string list is returned, each element of which corresponds to a field.

Proxy Response

proxy-rsp-status
Result:
integer

The code of the response status.

proxy-rsp-status-reason
Result:
string

The reason of the response status.

proxy-rsp-field
Result:
NULL, string, string list
Argument:
name

The value of a field. This requires a field name as an argument. The field name is case insensitive.

If the field is not present, the NULL value is returned. Note this is distinct from the empty string which is returned if the field is present but has no value. If there are duplicate fields then a string list is returned, each element of which corresponds to a field.

Transaction

is-internal
Result:
boolean

This returns a boolean value, true if the request is an internal request, and false if not.
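
A boolean result pairs naturally with the is-true comparison shown for has-inbound-protocol-prefix. A minimal sketch with placeholder do bodies:

```yaml
with: is-internal
select:
- is-true:
  do: # Internal request.
- otherwise:
  do: # Request from an external user agent.
```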

Session

inbound-txn-count
Result:
integer

The number of transactions, including the current one, that have occurred on the inbound session.

inbound-addr-remote
Result:
IP address

The remote address for the inbound connection. This is also known as the “client address”, the address from which the connection originates.

inbound-addr-local
Result:
IP address

The local address for the inbound connection, which is the address used to accept the inbound session.

inbound-sni
Result:
string

The SNI name sent on the inbound session.

has-inbound-protocol-prefix
Result:
boolean
Argument:
protocol tag prefix

For the inbound session there is a list of protocol tags that describe the network protocols used for that network connection. This extractor checks the inbound session list to see if it contains a tag that has a specific prefix. The most common use is to determine if the inbound session is TLS:

with: has-inbound-protocol-prefix<tls>
select:
-  is-true:
   do: # TLS only stuff.

Note

Checking a request for the scheme “https” is not identical to checking for TLS. Nothing prevents a user agent from sending a scheme at variance with the network protocol stack. This extractor checks the network protocol, not the request.

Checking for IPv6 can be done in a similar way.

with: has-inbound-protocol-prefix<ipv6>
select:
- is-true:
  do: # IPv6 special handling.

inbound-protocol
Result:
string
Argument:
protocol tag prefix

For the inbound session there is a list of protocol tags that describe the network protocols used for that network connection. This extractor searches the inbound session list and if there is a prefix match, returns the matched protocol tag. This can be used to check for different versions of TLS.

with: inbound-protocol<tls>
select:
-  match: "tls/1.3"
   do: # TLS 1.3 only stuff.
-  prefix: "tls"
   do: # Older TLS stuff.
-  otherwise:
   do: # Non-TLS stuff.

inbound-protocol-stack
Result:
tuple of strings

This extracts the entire stack of tags for the network protocols of the inbound connection as a tuple. This could be used to check for an IPv4 connection:

with: inbound-protocol-stack
select:
-  for-any:
      match: "ipv4"
   do:
   # IPv4 only things.

In general, though, has-inbound-protocol-prefix is usually a better choice for doing such checking unless the full stack or a full tag is needed.

inbound-cert-verify-result
Result:
integer

The result of verifying the inbound remote (client) certificate. Due to issues in the OpenSSL library this can be a bit odd. If the inbound session is not TLS the result will be X509_V_ERR_INVALID_CALL, which as of this writing has the value 69. Otherwise, if no client certificate was provided and one was not required, the result is X509_V_OK, which has the value 0. The lack of a certificate can be detected indirectly by all of the certificate extractors returning empty strings.
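
A sketch of acting on the verification result, using only the gt comparison shown elsewhere in this reference. It relies on X509_V_OK being 0 and the error codes being positive, as described above; the do bodies are placeholders.

```yaml
with: inbound-cert-verify-result
select:
- gt: 0
  do: # Not a TLS session (X509_V_ERR_INVALID_CALL) or verification failed.
- otherwise:
  do: # 0 (X509_V_OK) - verified, or no client certificate was sent or required.
```

Since 0 also covers the case of no certificate at all, a check on one of the inbound-cert-remote extractors is still needed to confirm that a certificate was actually presented.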

inbound-cert-local-issuer-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the local (ATS) certificate issuer for an inbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

inbound-cert-local-subject-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the local (ATS) certificate subject for an inbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

inbound-cert-remote-issuer-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the remote (client) certificate issuer for an inbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

If a client certificate wasn’t provided or failed validation, this will yield an empty string.

inbound-cert-remote-subject-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the remote (client) certificate subject for an inbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

If a client certificate wasn’t provided or failed validation, this will yield an empty string.
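
For example, the common name of the client certificate subject can be selected on. This is a sketch: “CN” is the usual short name for the common name entry, the “svc-” naming convention is hypothetical, and it is assumed that match accepts an empty string.

```yaml
# Entry names are case sensitive: "CN", not "cn".
with: inbound-cert-remote-subject-field<CN>
select:
- match: ""
  do: # No client certificate was provided, or it failed validation.
- prefix: "svc-"
  do: # Hypothetical convention: certificates issued to internal services.
- otherwise:
  do: # Any other client.
```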

outbound-cert-local-issuer-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the local (ATS) certificate issuer for an outbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

outbound-cert-local-subject-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the local (ATS) certificate subject for an outbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

outbound-cert-remote-issuer-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the remote (server) certificate issuer for an outbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

If the destination didn’t provide a certificate or the certificate failed validation, this will yield an empty string.

outbound-cert-remote-subject-field
Result:
string
Argument:
Entry name.

Extract the value for an entry in the remote (server) certificate subject for an outbound session. This will accept a short or long name as the argument. Note these names are case sensitive.

If the destination didn’t provide a certificate or the certificate failed validation, this will yield an empty string.

outbound-txn-count
Result:
integer

The number of transactions between the Traffic Server proxy and the origin server from a single session. Any value greater than zero indicates connection reuse.

with: outbound-txn-count
select:
- gt: 10
  do:
   - proxy-rsp-field<Connection>: "close"

Warning

For ATS versions before 10, this will return 0 and the value should not be taken into consideration to determine connection reuse.

outbound-addr-remote
Result:
IP address

The address of the origin server for a transaction.

outbound-addr-local
Result:
IP address

The local address of the server connection for a transaction.

Duration

A “duration” is a span of time. This is specified by one of a set of extractors.

milliseconds
Result:
duration
Argument:
count

A duration of count milliseconds.

seconds
Result:
duration
Argument:
count

A duration of count seconds.

minutes
Result:
duration
Argument:
count

A duration of count minutes.

hours
Result:
duration
Argument:
count

A duration of count hours.

Utility

This is an eclectic collection of extractors that do not depend on transaction or session data.

...
Result:
any

The feature for the most recent with.

random
Result:
integer

Generate a random integer in a uniform distribution. The default range is 0..99 because the most common use is for a percentage. This can be changed by adding arguments. A single numeric argument changes the upper bound. Two arguments change both bounds of the range. E.g.

random<199> generates integers in the range 0..199.

random<1,100> generates integers in the range 1..100.

The usual style for using this in a percentage form is:

with: random
select:
- lt: 5 # match 5% of the time
  do: # ...
- lt: 25 # match 20% of the time - 25% less the previous 5%
  do: # ...

text-block
Result:
string
Argument:
name

Extract the content of the text block (defined by a text-block-define) for name.

ip-col
Argument:
Column name or index

This must be used in the context of the modifier ip-space which creates the row context needed to extract the column value for that row. The argument can be the name of the column, if it has a name, or the index. Note index 0 is the IP address range, and data columns start at index 1.

stat
Result:
integer
Argument:
Plugin statistic name.

This extracts the value of a plugin statistic, which is currently limited to integers by Traffic Server.

Note statistic values are eventually consistent. There can be a delay of multiple seconds between incrementing a statistic with stat-update and the value changing.

env
Result:
string
Argument:
Variable name

Extract the value of the named variable from the process environment.

inbound-tcp-info
Result:
integer
Argument:
Field name

Extracts a field value from the tcp_info data available on some operating systems. If not available, NULL is returned.

The currently supported fields are

rtt

Round trip time.

rto

Retransmission timeout.

retrans

Retransmits.

snd-cwnd

Outbound congestion window.

ts-uuid
Result:
string

The process level UUID for this instance of Traffic Server.