Logstash

Logstash is a powerful, open-source program for processing unstructured data. It accepts text data from many different sources (for example, directly over TCP/UDP, via Unix sockets, or by reading files from disk) and in many different formats, and transforms those inputs into structured, searchable documents.

One of the most common use cases is to take text-based system and application logs, extract individual fields (e.g. host names, error codes, timing data, and so on), and make the data available in Elasticsearch for searching and reporting.

In this guide, we will cover the basics of getting your Traffic Server log data into Logstash. Going the next step and building fancy Kibana dashboards on top of that is currently left as an exercise for the reader.
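For orientation, a Logstash pipeline configuration is built from up to three kinds of blocks: input, filter, and output. The skeleton below is a minimal sketch of how the pieces discussed in this guide fit together. The Elasticsearch address `http://localhost:9200` and the index name `trafficserver-%{+YYYY.MM.dd}` are assumptions chosen for illustration; neither Traffic Server nor Logstash requires them.

```logstash
# Skeleton pipeline: read Traffic Server logs, parse them, ship them to Elasticsearch.
input {
  file {
    path => "/var/log/trafficserver/access-*.log"
  }
}

filter {
  grok {
    # Swap in the grok pattern that matches your Traffic Server log format.
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

output {
  elasticsearch {
    # Assumed local Elasticsearch node; adjust for your environment.
    hosts => ["http://localhost:9200"]
    # Assumed daily index naming scheme.
    index => "trafficserver-%{+YYYY.MM.dd}"
  }
}
```

Depending on your Logstash and Elasticsearch versions, index management features may call for different output settings, so treat the output block as a starting point only.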

Traffic Server Log Formats

Traffic Server provides a very flexible set of logging outputs. Almost any format can be constructed. The full range of options is covered in the Logging chapter.

This guide will walk you through using the appropriate filters in Logstash for the common logging formats in Traffic Server. If you have constructed your own custom log formats, you will need to build upon these examples and refer to the Logstash documentation to produce custom filters capable of parsing your own formats.

Logstash Input

For the on-disk logs produced by Traffic Server, you will want to use Logstash’s file input plugin. Note that your logs must be in ASCII format, not binary, for the plugin to work.

Assuming that your Traffic Server event logs are named access-<rotationtimestamp>.log and stored at /var/log/trafficserver/, the following Logstash input configuration should work:

input {
  file {
    path => "/var/log/trafficserver/access-*.log"
  }
}

Logstash provides some additional tuning options, which are explained in the file plugin documentation, but the above is the bare minimum required to have Logstash read log data from local disk.
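As an illustration of those additional options, the sketch below sets three of them. By default the file input tails files from the end, so log lines written before Logstash first sees a file are skipped; `start_position => "beginning"` changes that for files Logstash has not tracked before. The `sincedb_path` location and the `type` value shown here are assumptions for this example, not defaults.

```logstash
input {
  file {
    path => "/var/log/trafficserver/access-*.log"
    # Read newly discovered files from the top instead of tailing from the end.
    start_position => "beginning"
    # Assumed location for the file that records how far each log has been read.
    sincedb_path => "/var/lib/logstash/sincedb-trafficserver"
    # Assumed label; lets later filter blocks be wrapped in a conditional on [type].
    type => "trafficserver"
  }
}
```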

Logstash Filters

The grok filter in Logstash allows you to completely tailor the parsing of your source data and extract as many or as few fields as you like.

Logstash ships with a library of prebuilt patterns that cover several common log formats and can be used with very little configuration. If you have built custom log formats for Traffic Server, however, you may need to write your own patterns.
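When developing or adjusting a pattern, it can help to run a throwaway pipeline that reads lines from the terminal and prints the parsed result. The following is a minimal sketch of that approach; the COMMONAPACHELOG pattern is only a placeholder for whichever pattern you are testing.

```logstash
# Throwaway pipeline for testing a grok pattern by hand.
input {
  stdin { }
}

filter {
  grok {
    # Placeholder: substitute the pattern under test.
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

output {
  # Pretty-print every event so the extracted fields are easy to inspect.
  stdout { codec => rubydebug }
}
```

Saved to a file (for example a hypothetical `grok-test.conf`) and started with `bin/logstash -f grok-test.conf`, this lets you paste a log line into the terminal and inspect the fields that come out. By default, events that do not match the pattern are tagged `_grokparsefailure`.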

Squid Compatible

The Squid log format includes, unsurprisingly, a few useful fields for proxy servers. Using the following grok pattern will extract this information from your Traffic Server logs if you employ the Squid compatible log format:

filter {
  grok {
    match => { "message" => "%{NUMBER:timestamp} %{NUMBER:timetoserve} %{IPORHOST:clientip} %{WORD:cachecode}/%{NUMBER:response} %{NUMBER:bytes} %{WORD:verb} %{NOTSPACE:request} %{USER:auth} %{NOTSPACE:route} %{GREEDYDATA:contenttype}" }
  }
  date {
    match => [ "timestamp", "UNIX" ]
  }
}

The resulting structured document will contain the following fields:

Field	Description
timestamp	Date and time of the client request.
timetoserve	Time, in milliseconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.
clientip	Client IP address or hostname.
cachecode	Cache result code (e.g. `TCP_HIT`, `TCP_MISS`) describing how Traffic Server handled the request.
response	HTTP response status code sent by Traffic Server to the client.
bytes	Length, in bytes, of the Traffic Server response to the client, including headers.
verb	HTTP method (e.g. `GET`, `POST`, etc.) of the client request.
request	URL specified by the client request.
auth	Authentication username supplied by the client, if present.
route	Proxy hierarchy route; the route used by Traffic Server to retrieve the cache object.
contenttype	Content type of the response.
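Unless told otherwise, grok stores every captured value as a string, which limits numeric queries and aggregations downstream. One way to address this, sketched below under the assumption that the field names match the Squid pattern above, is to add a mutate filter after the grok filter and convert the numeric fields.

```logstash
filter {
  mutate {
    # Convert string captures from the Squid pattern into integers.
    convert => {
      "timetoserve" => "integer"
      "response"    => "integer"
      "bytes"       => "integer"
    }
  }
}
```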

Netscape Common

If your Traffic Server instance is already outputting Netscape Common format logs, then Logstash’s COMMONAPACHELOG pattern will handle your logs out of the box. Add the following filter block to your Logstash configuration:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

This will produce a structured document for each log entry with the following fields:

Field	Description
clientip	Client IP address or hostname.
ident	Always a literal `-` character for Traffic Server logs.
auth	The authentication username for the client request. A `-` means no authentication was required (or supplied).
timestamp	The date and time of the client request.
verb	HTTP method used for the request (e.g. `GET`, `POST`, etc.).
request	URL specified by the client request.
httpversion	HTTP version (e.g. `1.1`) used by the client.
rawrequest	See note below.
response	HTTP status code used for Traffic Server response (not the origin’s response code).
bytes	Length of Traffic Server response to client, in bytes.

Note

rawrequest is populated only when the request line does not match the usual "<verb> <request> HTTP/<httpversion>" pattern. In that case, the verb, request, and httpversion fields are absent from the document, and rawrequest holds the original string instead.
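Two practical points are worth a sketch here. First, on newer Logstash releases where ECS compatibility is enabled, COMMONAPACHELOG emits ECS-style nested field names rather than the flat names listed above; setting grok's `ecs_compatibility` option to `disabled` is one way to keep the names in the table. Second, unlike the Squid example, the filter above does not use the parsed request time as the event time. The variant below addresses both, assuming the timestamp uses the usual Common Log Format layout (for example `10/Oct/2000:13:55:36 -0700`).

```logstash
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
    # Keep the legacy flat field names (clientip, verb, response, ...).
    ecs_compatibility => "disabled"
  }
  date {
    # Use the request time from the log line as the event's @timestamp.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```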

Netscape Extended

The following pattern extends COMMONAPACHELOG to capture the additional fields found in Netscape Extended:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG} %{NUMBER:originstatus} %{NUMBER:originrespbytes} %{NUMBER:clientreqbytes} %{NUMBER:proxyreqbytes} %{NUMBER:clienthdrbytes} %{NUMBER:proxyresphdrbytes} %{NUMBER:proxyreqhdrbytes} %{NUMBER:originhdrbytes} %{NUMBER:timetoserve}" }
  }
}

Because this starts out with the COMMONAPACHELOG pattern, you will get all of the fields mentioned in Netscape Common above, as well as the following:

Field	Description
originstatus	HTTP status code returned by origin server.
originrespbytes	Body length, in bytes, of origin’s response to Traffic Server.
clientreqbytes	Body length, in bytes, of client request to Traffic Server.
proxyreqbytes	Body length, in bytes, of Traffic Server request to origin.
clienthdrbytes	Header length, in bytes, of client request to Traffic Server.
proxyresphdrbytes	Header length, in bytes, of Traffic Server response to client.
proxyreqhdrbytes	Header length, in bytes, of Traffic Server request to origin.
originhdrbytes	Header length, in bytes, of origin’s response to Traffic Server.
timetoserve	Time, in seconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.
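Since all of the additional fields are numeric, it may be convenient to have grok convert them as it captures them. Grok accepts an optional `:int` or `:float` suffix after the field name for this purpose. The following is a sketch of the same pattern with those suffixes added; treating timetoserve as a float is an assumption made so that fractional seconds are preserved.

```logstash
filter {
  grok {
    # Same pattern as above, with type suffixes so values are stored as numbers.
    match => { "message" => "%{COMMONAPACHELOG} %{NUMBER:originstatus:int} %{NUMBER:originrespbytes:int} %{NUMBER:clientreqbytes:int} %{NUMBER:proxyreqbytes:int} %{NUMBER:clienthdrbytes:int} %{NUMBER:proxyresphdrbytes:int} %{NUMBER:proxyreqhdrbytes:int} %{NUMBER:originhdrbytes:int} %{NUMBER:timetoserve:float}" }
  }
}
```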