Logstash

Logstash is an open source data processing tool that accepts unstructured text from many different sources (directly over TCP/UDP, via Unix sockets, or by reading files from disk, for example) in many different formats, and transforms those inputs into structured, searchable documents.

One of the most common use cases is to take text-based system and application logs, extract individual fields (e.g. host names, error codes, timing data, and so on), and make the data available in Elasticsearch for searching and reporting.

In this guide, we will cover the basics of getting your Traffic Server log data into Logstash. Going the next step and building fancy Kibana dashboards on top of that is currently left as an exercise for the reader.

Traffic Server Log Formats

Traffic Server provides a very flexible set of logging outputs. Almost any format can be constructed. The full range of options is covered in the Logging chapter.

This guide will walk you through using the appropriate filters in Logstash for the common logging formats in Traffic Server. If you have constructed your own custom log formats, you will need to build upon these examples and refer to the Logstash documentation to produce custom filters capable of parsing your own formats.

Logstash Input

For the on-disk logs produced by Traffic Server, you will want to use Logstash’s file input plugin. Note that your logs must be in ASCII format, not binary, for the plugin to work.

Assuming that your Traffic Server event logs are named access-<rotationtimestamp>.log and stored at /var/log/trafficserver/, the following Logstash input configuration should work:

input {
  file {
    path => "/var/log/trafficserver/access-*.log"
  }
}

Logstash provides additional tuning options, which are explained in the file plugin documentation, but the above is the bare minimum required to have Logstash read log data from local disk.
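As a sketch of what that tuning might look like, the following variant sets two commonly used file plugin options, start_position and sincedb_path, plus the common type option. The sincedb location and the type value are assumptions chosen for illustration, not values required by Traffic Server:

```
input {
  file {
    path => "/var/log/trafficserver/access-*.log"
    # Read files from the top the first time they are seen,
    # instead of only tailing newly appended lines (the default is "end").
    start_position => "beginning"
    # Where Logstash records how far it has read in each file.
    # Hypothetical path; pick one writable by the Logstash user.
    sincedb_path => "/var/lib/logstash/sincedb-trafficserver"
    # Label events so later filter blocks can be made conditional on it.
    type => "trafficserver"
  }
}
```

Note that start_position only affects files Logstash has not seen before; files already tracked in the sincedb resume from their recorded offset.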

Logstash Filters

The grok filter in Logstash allows you to completely tailor the parsing of your source data and extract as many or as few fields as you like.

Logstash ships with many prebuilt patterns that can be used directly. If you have built custom log formats for Traffic Server, however, you may need to write your own patterns.
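As a minimal sketch of a custom pattern, assume a hypothetical custom Traffic Server format that logs only the client IP, the cache result code, and the response status (%<chi> %<crc>/%<pssc>). The grok filter's pattern_definitions option can define a pattern inline for it; the pattern name ATS_CACHECODE is invented for this illustration:

```
filter {
  grok {
    # Inline pattern, available only to this grok block.
    # ATS_CACHECODE is a hypothetical name, not a pattern shipped with Logstash.
    pattern_definitions => {
      "ATS_CACHECODE" => "[A-Z_]+"
    }
    match => { "message" => "%{IPORHOST:clientip} %{ATS_CACHECODE:cachecode}/%{NUMBER:response}" }
  }
}
```

For patterns shared across several filters, a separate patterns file referenced through the patterns_dir option may be easier to maintain than inline definitions.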

Squid Compatible

The Squid log format includes, unsurprisingly, a few useful fields for proxy servers. Using the following grok pattern will extract this information from your Traffic Server logs if you employ the Squid compatible log format:

filter {
  grok {
    # GREEDYDATA captures the rest of the line; a trailing DATA is non-greedy and would match nothing.
    match => { "message" => "%{NUMBER:timestamp} %{NUMBER:timetoserve} %{IPORHOST:clientip} %{WORD:cachecode}/%{NUMBER:response} %{NUMBER:bytes} %{WORD:verb} %{NOTSPACE:request} %{USER:auth} %{NOTSPACE:route} %{GREEDYDATA:contenttype}" }
  }
  date {
    match => [ "timestamp", "UNIX" ]
  }
}

The resulting structured document will contain the following fields:

Field        Description
timestamp    Date and time of the client request.
timetoserve  Time, in milliseconds, from initial client connection to Traffic Server until the last byte has been sent back to the client from Traffic Server.
clientip     Client IP address or hostname.
cachecode    Cache result code for the request (e.g. TCP_HIT, TCP_MISS); see Cache Result Codes.
response     HTTP response status code sent by Traffic Server to the client.
bytes        Length, in bytes, of the Traffic Server response to the client, including headers.
verb         HTTP method (e.g. GET, POST, etc.) of the client request.
request      URL specified by the client request.
auth         Authentication username supplied by the client, if present.
route        Proxy hierarchy route; the route used by Traffic Server to retrieve the cache object.
contenttype  Content type of the response.
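Grok captures every field as a string unless told otherwise, which can get in the way of numeric aggregations later on. A minimal sketch of one way to handle this, assuming you want the status code, byte count, and serve time stored as integers, is to add a mutate filter after the grok block:

```
filter {
  mutate {
    # Convert selected grok captures from strings to integers.
    # The choice of fields here is an assumption; adjust it to your needs.
    convert => {
      "response"    => "integer"
      "bytes"       => "integer"
      "timetoserve" => "integer"
    }
  }
}
```

Grok also accepts a type suffix directly in the pattern, for example %{NUMBER:bytes:int}, if you prefer to keep the conversion next to the capture.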

Netscape Common

If your Traffic Server instance is already outputting Netscape Common format logs, then Logstash’s COMMONAPACHELOG pattern will handle your logs out of the box. Add the following filter block to your Logstash configuration:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

This will produce a structured document for each log entry with the following fields:

Field        Description
clientip     Client IP address or hostname.
ident        Always a literal - character for Traffic Server logs.
auth         The authentication username for the client request. A - means no authentication was required (or supplied).
timestamp    The date and time of the client request.
verb         HTTP method used for the request (e.g. GET, POST, etc.).
request      URL specified by the client request.
httpversion  HTTP version (e.g. 1.1) used by the client.
rawrequest   See note below.
response     HTTP status code used for Traffic Server response (not the origin’s response code).
bytes        Length of Traffic Server response to client, in bytes.
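Unlike the Squid example, the filter above leaves timestamp as a plain string, so events are stamped with the time Logstash read them rather than the time of the request. A minimal sketch of a date filter for this format, assuming the timestamp uses the usual Netscape/Apache layout (e.g. 03/Sep/2014:12:34:56 -0600):

```
filter {
  date {
    # Parse the Netscape-style timestamp captured by COMMONAPACHELOG.
    # By default the parsed value is written to the event's @timestamp field.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```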

Note

rawrequest is populated when the usual "<verb> <request> HTTP/<httpversion>" pattern is not matched. In that event, those three fields will be missing from the document, and rawrequest will instead contain the original request string.
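Entries that fall back to rawrequest, or that fail to match the grok pattern at all, are easier to find later if they are tagged. The following is a sketch only; the tag names ats_rawrequest and ats_unparsed are hypothetical, while _grokparsefailure is the tag grok itself adds by default when no pattern matches:

```
filter {
  # The request line did not look like "<verb> <request> HTTP/<httpversion>".
  if [rawrequest] {
    mutate { add_tag => [ "ats_rawrequest" ] }
  }
  # The whole line failed to match the grok pattern.
  if "_grokparsefailure" in [tags] {
    mutate { add_tag => [ "ats_unparsed" ] }
  }
}
```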

Netscape Extended

The following pattern extends COMMONAPACHELOG to support the additional fields found in Netscape Extended:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG} %{NUMBER:originstatus} %{NUMBER:originrespbytes} %{NUMBER:clientreqbytes} %{NUMBER:proxyreqbytes} %{NUMBER:clienthdrbytes} %{NUMBER:proxyresphdrbytes} %{NUMBER:proxyreqhdrbytes} %{NUMBER:originhdrbytes} %{NUMBER:timetoserve}" }
  }
}

Because this starts out with the COMMONAPACHELOG pattern, you will get all of the fields mentioned in Netscape Common above, as well as the following:

Field              Description
originstatus       HTTP status code returned by origin server.
originrespbytes    Body length, in bytes, of origin’s response to Traffic Server.
clientreqbytes     Body length, in bytes, of client request to Traffic Server.
proxyreqbytes      Body length, in bytes, of Traffic Server request to origin.
clienthdrbytes     Header length, in bytes, of client request to Traffic Server.
proxyresphdrbytes  Header length, in bytes, of Traffic Server response to client.
proxyreqhdrbytes   Header length, in bytes, of Traffic Server request to origin.
originhdrbytes     Header length, in bytes, of origin’s response to Traffic Server.
timetoserve        Time, in seconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.
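None of the input and filter blocks in this guide send the parsed documents anywhere on their own. For the Elasticsearch use case mentioned in the introduction, a minimal output section might look like the sketch below. It assumes an unsecured Elasticsearch node listening on localhost:9200, and the index name is a hypothetical choice:

```
output {
  elasticsearch {
    # Assumed local, unauthenticated node; adjust for your cluster.
    hosts => ["http://localhost:9200"]
    # Hypothetical index name, with one index per day.
    index => "trafficserver-%{+YYYY.MM.dd}"
  }
  # Optional: also print each parsed event to the console while testing filters.
  stdout { codec => rubydebug }
}
```

The stdout output is useful while developing grok patterns and can be removed once the fields look right.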