Logstash

Logstash is a powerful, open source, unstructured data processing program that can accept text data from many different sources (directly over TCP/UDP, via Unix sockets, or by reading in files from disk for example), in many different formats and transform those inputs into structured, searchable documents.

One of the most common use cases is to take text based system and application logs, extract individual fields (e.g. host names, error codes, timing data, and so on), and make the data available in Elasticsearch for searching and reporting.

In this guide, we will cover the basics of getting your Traffic Server log data into Logstash. Going the next step and building fancy Kibana dashboards on top of that is currently left as an exercise for the reader.

Traffic Server Log Formats

Traffic Server provides a very flexible set of logging outputs. Almost any format can be constructed. The full range of options is covered in the Logging chapter.

This guide will walk you through using the appropriate filters in Logstash for the common logging formats in Traffic Server. If you have constructed your own custom log formats, you will need to build upon these examples and refer to the Logstash documentation to produce custom filters capable of parsing your own formats.

Logstash Input

For the on-disk logs produced by Traffic Server, you will want to use Logstash’s file input plugin. Note that your logs must be in ASCII format, not binary, for the plugin to work.

Assuming that your Traffic Server event logs are named access-<rotationtimestamp>.log and stored at /var/log/trafficserver/, the following Logstash input configuration should work:

input {
  file {
    path => /var/log/trafficserver/access-*.log
  }
}

Logstash provides some additional tweaking options, which are explained in the file plugin documentation but the above provides the bare minimum required to have Logstash read log data from local disks.

Logstash Filters

The grok filter in Logstash allows you to completely tailor the parsing of your source data and extract as many or as few fields as you like.

Some patterns are already built and can be used very easily. If you have built custom log formats for Traffic Server, you may need to write your own patterns, however.

Squid Compatible

The Squid log format includes, unsurprisingly, a few useful fields for proxy servers. Using the following grok pattern will extract this information from your Traffic Server logs if you employ the Squid compatible log format:

filter {
  grok {
    match => { "message" => "%{NUMBER:timestamp} %{NUMBER:timetoserve} %{IPORHOST:clientip} %{WORD:cachecode}/%{NUMBER:response} %{NUMBER:bytes} %{WORD:verb} %{NOTSPACE:request} %{USER:auth} %{NOTSPACE:route} %{DATA:contenttype}" }
  }
  date {
    match => [ "timestamp", "UNIX" ]
  }
}

The resulting structured document will contain the following fields:

Field

Description

timestamp

Date and time of the client request.

timetoserve

Time, in seconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.

clientip

Client IP address or hostname.

cachecode

Cache Result Codes.

response

HTTP response status code sent by Traffic Server to the client.

bytes

Length, in bytes, of the Traffic Server response to the client, including headers.

verb

HTTP method (e.g. GET, POST, etc.) of the client request.

request

URL specified by the client request.

auth

Authentication username supplied by the client, if present.

route

Proxy hierarchy route; the route used by Traffic Server to retrieve the cache object.

contenttype

Content type of the response.

Netscape Common

If your Traffic Server instance is already outputting Netscape Common format logs, then Logstash’s COMMONAPACHELOG pattern will handle your logs out of the box. Add the following filter block to your Logstash configuration:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

This will produce a structured document for each log entry with the following fields:

Field

Description

clientip

Client IP address or hostname.

ident

Always a literal - character for Traffic Server logs.

auth

The authentication username for the client request. A - means no authentication was required (or supplied).

timestamp

The date and time of the client request.

verb

HTTP method used for the request (e.g. GET, POST, etc.).

request

URL specified by the client request.

httpversion

HTTP version (e.g. 1.1) used by the client.

rawrequest

See note below.

response

HTTP status code used for Traffic Server response (not the origin’s response code).

bytes

Length of Traffic Server response to client, in bytes.

Note

rawrequest is populated when the usual "<verb> <request> http/<httpversion>" pattern was not matched. In that event, those three fields will be missing from the document, and instead rawrequest will have the original string.

Netscape Extended

The following pattern adds to Common Apache to support the additional fields found in Netscape Extended:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG} %{NUMBER:originstatus} %{NUMBER:originrespbytes} %{NUMBER:clientreqbytes} %{NUMBER:proxyreqbytes} %{NUMBER:clienthdrbytes} %{NUMBER:proxyresphdrbytes} %{NUMBER:proxyreqhdrbytes} %{NUMBER:originhdrbytes} %{NUMBER:timetoserve}" }
  }
}

Because this starts out with the COMMONAPACHELOG pattern, you will get all of the fields mentioned in Netscape Common above, as well as the following:

Field

Description

originstatus

HTTP status code returned by origin server.

originrespbytes

Body length, in bytes, of origin’s response to Traffic Server.

clientreqbytes

Body length, in bytes, of client request to Traffic Server.

proxyreqbytes

Body length, in bytes, of Traffic Server request to origin.

clienthdrbytes

Header length, in bytes, of client request to Traffic Server.

proxyresphdrbytes

Header length, in bytes, of Traffic Server response to client.

proxyreqhdrbytes

Header length, in bytes, of Traffic Server request to origin.

originhdrbytes

Header length, in bytes, of origin’s response to Traffic Server.

timetoserve

Time, in seconds, from initial client connection to Traffic Server until the last byte has been sent back to client from Traffic Server.