Usage Guide¶
This section focuses on tasks rather than mechanism, to illustrate how to use the mechanisms.
Basic Tips¶
Based on production deployment experience, there are a few general items to keep in mind.
How to "do" it¶
Except for top level when
directives for global configuration, all directives are grouped
under a do
keyword in some object. This makes the most important context for a directive
which do
contains it and transitively which object contains that do
. It is that
object that will determine whether the nested directives are invoked.
For example to invoke directives conditionally, a comparison is used. The comparison contains a
do
and the directives attached to that are invoked if the comparison succeeds. Because of
YAML structuring this is very sensitive to indentation. Consider this example from actual production
use.
The goal is to shift traffic unless the query string contains the "qx" key. The original attempt was
with: pre-remap-query
select:
- none-of:
- contains: "qx="
do:
- ua-req-host: "beta.service.ex"
This didn't work because the do
was in the wrong location. Because YAML nodes are order
independent this is identical to
with: pre-remap-query
do:
- ua-req-host: "beta.service.ex"
select:
- none-of:
- contains: "qx="
Now the problem is clear - the traffic shifting is always done and the comparison has no directives. The correct configuration is
with: pre-remap-query
select:
- none-of:
- contains: "qx="
do:
- ua-req-host: "beta.service.ex"
The key rule here is to line up the do
with the containing object that should trigger the
directives. In this case it is the comparison none-of
because the traffic should be shifted
if that matches (i.e. contains
does not match). Therefore the do
should line up
with none-of
so it is triggered by none-of
. In the erroneous case the do
lined up with with
and so was triggered by that with
. This is a bit clearer in the
example because the configuration isn't deeply nested. In production the difference between 5 levels
of indentation and 6 is not always so obvious.
Summary
Always line up do
with the directive or comparison that should trigger the directives
in the do
.
Working with HTTP fields¶
A main use of TxnBox is to manipulate the fields in the HTTP header. There are a variety of directives and extractors, classified primarily by which HTTP message is to be modified or examined If a particular directive or extractor is not allowed on a hook, that indicates it's not useful. For instance, there is no use in changing anything in the client request during the "send proxy response" hook, as it would have no observable effect. Conversely the proxy response can't be changed during the "read client request" hook because the proxy response doesn't exist.
The four prefixes used are
ua-req |
User agent request (inbound to proxy) |
proxy-req |
Proxy request (outbound from proxy) |
upstream-rsp |
Upstream response (inbound to proxy) |
proxy-rsp |
Proxy response (outbound from proxy) |
The field related directives and extractors require an argument, which is the name of the HTTP field. This name is case insensitive because the HTTP fields names are case insensitive. The value is a feature expression which should evaluate to a string or a list of strings. A list of strings represents a list of duplicate fields, all with the sane name but distinct values, one for each element of the list.
To set the field "Best-Band" to the string "Delain" in the proxy request
proxy-req-field<Best-Band>: "Delain"
To set the field "TLS-Source" to the SNI name and the client IP address (see inbound-sni
and
inbound-addr-remote
)
proxy-req-field<TLS-Source>: "{inbound-sni}@{inbound-addr-remote}"
For a connection that had an SNI of "delain.nl" from the address 10.12.97.156, the proxy request sent to the upstream would have "TLS-Source: delain.nl@10.12.97.256".
Consider the case where various requests get remapped to the same upstream host name, but the upstream
needs the value of the "Host" field from the original request. This could by copying the Host
field
to the Org-Host
field -
proxy-req-field<Org-Host>: ua-req-field<Host>
If this was intended for debugging and therefore to be more human readable, it could be done as
proxy-req-field<Org-Host>: "Original host was {ua-req-field<Host>}"
Another common use case is to have a default value. For instance, set the field "Accept-Encoding" to "identity" if not already set.
proxy-req-field<Accept-Encoding>: [ proxy-req-field<Accept-Encoding> , { else: "identity" } ]
This assigns to "Accept-Encoding" as before, but the modifier else
is applied after
retrieving the current value of that field. This modifier keeps the original value unless it's
empty, in which case it uses its own value.
Because the input is YAML, the previous example could also be written in long hand as
proxy-req-field<Accept-Encoding>:
- proxy-req-field<Accept-Encoding>
- else: "identity"
From the TxnBox point of view, these are indistinguishable. In both cases the feature expression is a list of an unquoted string and an object, the first treated as an extractor and the second as modifier. Further note the extractor being the same field as the directive is happenstance - it could be any field, or any extractor or feature expression. This is how values can be easily copied between fields.
A field can also be removed by assigning it the NULL
value. To remove the "X-Forwarded-For"
field from the client request
ua-req-field<X-Forwarded-For>: NULL
Note this is distinct from assigning the string "NULL"
ua-req-field<X-Forwarded-For>: "NULL"
and not the same as assigning the empty string, such that the field is present but without a value
ua-req-field<X-Forwarded-For>: ""
For a list based example consider the Via
header. This can extend over multiple fields. For this
reason the extractor proxy-req-field
can return a list.
Rewriting URLs¶
There are a number of ways to rewrite URLs in a client request. It can be done by specifying the entire replacement URL or by changing it piecewise.
The primary directive for this in a remap invoked configuration is the ua-req-url
directive. This always applies to
the user agent request, and takes a full URL as its value. The user agent request is updated to be to that
URL. If the existing URL is a full URL, it is changed to the URL in the value. Otherwise only
the path is copied over. If the value URL scheme is different, the request is modified to use
that scheme (e.g., if the value URL has "https://" then the proxy request will use TLS). The
"Host field is also updated to contain the host from the value URL.
For instance, to send the request to the upstream "app.txnbox"
ua-req-host: "app.txnbox"
This will change the host in the URL if already present and set the "Host" field. This could also be done as
ua-req-url-host: "app.txnbox"
ua-req-field<Host>: "app.txnbox"
The difference is this will cause the host to be in the URL regardless if it was already present.
Using Variables¶
For each transaction, TxnBox supports a set of named variables. The names can be arbitrary strings
and the value any feature. A variable is set using the var
directive with an
argument of the variable name and the value a feature. To set the variable "Best-Band" to "Delain"
var<Best-Band>: "Delain"
To later set the field "X-Best-Band" to the value of that variable
proxy-req-field<X-Best-Band>: var<Best-Band>
Note variables are not fields in the HTTP transaction, they are entirely an internal feature of TxnBox. In the preceding example, there is only a relationship between the variable "Best-Band" and the proxy request field "X-Best-Band" because of the explicit assignment. If either is changed later, the other is not [1]. Each transaction starts with no variables set, variables do not carry over from one transaction to any other.
One common use case for variables is to cache a value in an early hook for use in a later hook. Note there is only one transaction name space for variables and variables set in global hooks are available in remap and vice versa. This is handy if some remap behavior should depend on the original client request URL or host, and not on the post-remap one. This can be done, in a limited way, with the "proxy.config.http.pristine_host_header" configuration, but that has other potential side effects and may not be usable because of other constraints. In contrast, caching the original host name in a variable is easy
when: ua-req
do:
var<pristine-host>: ua-req-host
A specific use case for this is handling cross site scripting fields, where these should be set unless the original request was to the static image server at "images.txnbox", which may have been remapped to a different upstream shard, changing the host in the client request. This could be done by selecting on the "pristine-host" variable and setting the cross site fields if that is not "image.txnbox" or a subdomain of it.
when: proxy-sp
do:
with: var<pristine-host>
select:
- none-of:
- tld: "image.txnbox" # This domain or any subdomain
do:
- proxy-rsp-field<Expect-CT>: "max-age=31536000, report-uri=\"http://csp.txnbox\""
- proxy-rsp-field<X-XSS-Protection>: "1; mode=block"
Variables can be used to simplify configurations, if there is a complex configuration needed in multiple places, the results can be placed in a variable and then that variable's value used later, avoiding much of the complexity. For instance, remap rules could set a variable as a flag to indicate which remap rule triggered.
Filter Techniques¶
The filter
modifier can be used to perform a variety of tasks. The most common is to
filter out elements in a list.
filter
can also be used as a primitive lookup table, akin to a "switch" or "case" statement.
Consider an example where a proxy request field should be set to "high" for a set of domains,
"medium" for another, and "low" for any other domain. This could be done as
proxy-req-field<Priority>:
- ua-req-host
- filter:
- tld: "important.tld"
replace: "high"
- tld: "interesting.tld"
replace: "medium"
- replace: "low"
In essence, each comparison does a replace
to provide the translated value, with a final
replace
with no comparison that matches anything not already matched.
Footnotes