Ben Clifford

Performance Monitoring w/ ELK - Part II: Monitoring HCP Access Logs

Blog Post created by Ben Clifford Employee on Sep 19, 2018

Introduction

This is the second in a series of blog posts outlining how to use open source ELK (Elasticsearch, Logstash, Kibana) to visualize performance of a system.  If you will be implementing the monitoring described in this post you will first want to follow the steps in the first post: Performance Monitoring w/ ELK - Part I: Installing ELK

 

In this post we will cover how to visualize an HCP cluster's performance using the HCP REST/S3/Swift gateway access logs. To accomplish this you will need an HCP cluster running version 7.3 or greater. For real-time monitoring you must have network connectivity between your HCP cluster and the server running the ELK software. In this post we will configure ELK to receive HCP access logs over syslog port 514, and we will configure the HCP cluster to send the access logs to the ELK server.

 

Logstash Configuration

In the previous post we installed the Logstash software. Logstash is a software component which can be configured to receive messages, A.K.A. "events", and process those events in a particular way. Events can be messages received over the network, outputs from running commands, or the result of tailing log files, just to name a few. Logstash parses events according to the instructions in a configuration file. For each event, values parsed from the event message are assigned to named fields which will be passed to an indexing service, in this case Elasticsearch. In this section we break down a Logstash configuration to better understand each of its elements.

 

A Logstash configuration file has 3 main sections: inputs, filters, and outputs. The input section of the Logstash configuration file defines one or more inputs from which events are received. The filter section defines how that event is processed, and the output section defines one or more outputs to receive the results. This is referred to as the "processing pipeline". The contents of each section are defined using plugins which are described in detail at the following links for input plugins, filter plugins, and output plugins. Let's have a look at examples of each of these sections from a configuration designed to receive access logs from HCP over syslog, parse the access log events, and send them to the Elasticsearch index.

 

Input Section

The input section specifies where the Logstash events are coming from. In our case we will be listening on UDP port 514 to receive HTTP gateway access logs via syslog:

input {
   udp {
     port => 514
     type => syslog
   }
#  stdin {
#    type => syslog
#  }
}

In this example you see two input plugins being used, udp and stdin. The udp plugin will listen on the specified port. The type field will set the event's type to the value specified. Using event type you can have multiple input plugins in the same configuration, and use the type field to distinguish among events received by different input plugins. The stdin plugin is useful for debugging as you can trigger an event by pasting a message into the Logstash terminal without having to wait for some external event. You will notice the stdin plugin is commented out using the # symbol.

 

Filter Section

The filter section is where the real work happens and will constitute the majority of code in any Logstash configuration. The following filter code is a bit simplified from what you will find in the configuration file attached to this article, in order to allow us to focus on the most meaningful aspects. Take a look at the code and the sample event, and we will break it down below.

filter {
  if [type] == "syslog" {
  # Sample event:
  # 172.18.12.24 -  myuser  [22/Sep/2018:10:58:16 -0400] "GET /folder/object.png HTTP/1.1" 200 210 MyNamespace.Mytenant 007 0
    grok {
      match => {
        "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} +%{HTTPDUSER:auth} +\[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response:int} (?:%{NUMBER:response_bytes:int}|-) ((?[a-zA-Z0-9-]+)\.)?(?[a-zA-Z0-9-]+)(@hs3)? %{NUMBER:latency:int}( %{INT:node_number:int})?( %{INT:request_bytes:int})?.*"
      }
    }

    date {

      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]

    }

  }
}

The first thing you will notice is conditional logic if [type] == "syslog". Type is an event field that was set by the input plugin. You can learn more about using conditionals and accessing event fields here.

 

The next line after the if statement is the beginning of a grok filter plugin definition. Grok plugins are very useful for taking unstructured text like web server logs and turning it into structured data that can be indexed and queried. In this example the match option is matching the content of the message event field1 on the left with the provided expression on the right. The pattern string matches using a combination of predefined Logstash patterns and regular expressions. Predefined Logstash patterns are specified as %{<pattern name>.<event field name>.<data type>}. Grok assigns matched values to the specified event field name and casts the event field datatype if specified. For example:

%{HTTPDATE:timestamp} - Finds a date that matches the Apache HTTP access log date format (see grok-patterns) and assigns the text value to the timestamp event field. All fields are text by default.

%{NUMBER:response:int} - Matches a number and assigns the value to the response event field, casting the field as an integer.

 

The other plugin in the filter section is the date filter plugin. The date filter is used for parsing dates from fields, and then using that date or timestamp as the Logstash timestamp for the event. For example:

match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] - Converts the text stored in the timestamp event field into a date using the provided joda-time date pattern, and assigns that timestamp to the Logstash event.

 

1 Logstash puts the data from the input event into the message field automatically

 

Output Section

Finally the output section describes what to do with the data which has been stored in the event fields. Typically, and in the following example, data is sent to an Elasticsearch index.

output {

  elasticsearch {

    hosts => ["localhost:9200"]

    user => elastic

    password => changeme

    index => "access_logs-%{+yyMMdd}"

  }

#  stdout { codec => rubydebug }

}

The Elasticsearch output plugin specifies the host and port where the Elasticsearch index is running, the username and password, and the name of the index to add the record to. If the index does not exist it will be created. In this example if the event timestamp were September 7th 2018 the index name would be access_logs-180907 .

 

The stdout output plugin is a simple output which prints to the stdout of the shell running Logstash. This is useful for debugging as you can see how the input event data has mapped to the output event.

 

Elasticsearch Indexes and Managing Indexes w/ Kibana

Elasticsearch is a distributed, RESTful search and analytics engine based on Apache Lucene index and search technology. Elasticsearch is very scalable and can provide near real-time search. Elasticsearch is distributed which means indices can be divided into shards which are distributed among the nodes in a cluster to provide horizontal scaling, while replicas of each shard are distributed to provide high availability.

 

As you read in the previous section, we use Logstash to receive unstructured log messages, parse the messages, and output a structured index document which we send to the Elasticsearch index. We can then use Kibana to manage the indexes we have created. Kibana has a tab called Dev Tools which offers a console where you can run live commands against the Elasticsearch RESTful API. In this blog I will show you some useful commands for managing your indexes with Kibana.

 

Visualizing Elasticsearch Data w/ Kibana

Kibana is a data visualization software for Elasticsearch. It lets you visualize the content indexed in Elasticsearch in several formats including: bar, line or pie charts, tables, and heat maps. Following the steps in this post you will use Kibana to create index patterns for Elasticsearch indexes, explore the data in your index, build visualizations to represent your data in a meaningful way, and finally to collect your visualizations on dashboards.

 

Monitoring HCP with ELK - Step by Step

In order to follow the steps in this section you must have a running HCP, and have installed the ELK stack. Elasticsearch and Kibana should be running as services. To configure a running ELK system you can refer to the fist blog in this series Performance Monitoring w/ ELK - Part I: Installing ELK.

 

Step 1: Configure HCP for Monitoring

The first thing you will need to do is configure HCP to output HTTP gateway access log event messages via syslog. To do this log into your HCP System Management Console as a user with the "Administrator" role, and go to the Monitoring => Syslog page.

 

1a: Check the "Enable syslog" and "Send log messages for HTTP-based data access requests" check boxes and click "Update Settings".

1b: Next you will need to register your Logstash server as a Syslog Server to receive the HCP HTTP access events. Enter the IP address of the server where you installed and will be running Logstash. By default Syslog uses port 514 for communication, if you will be listening on port 514, no port is required. If you will be listening on anything other than port 514 you must enter the port number by typing a colon and the port number after the IP address. Click "Add" to add the record to the list.

Step 2: Logstash Configuration

2a: Download the attached zip file logstash.hcpsyslog-allversions.conf.zip and extract the enclosed configuration file. Copy the configuration file to your Logstash server, and place it in the /etc/logstash/manual_configs folder.

 

2b: Before we run the Logstash configuration for the first time, let's edit our configuration to listen to stdin, and to output documents to stdout. Comment out the udp and elasticsearch plugins, and uncomment the stdin and stdout plugins as shown below:

input {
#  udp {
#    port => 514
#    type => syslog
#  }
  stdin {
    type => syslog
  }
}
...
output {
#  elasticsearch {
#    hosts => ["localhost:9200"]
#    user => elastic
#    password => changeme
#    index => "access_logs-%{+yyMMdd}"
#  }
  stdout {
    codec => rubydebug
  }
}

 

2c: Now you are ready to run Logstash! SSH to your ELK host and run the following command:

/usr/share/logstash/bin/logstash -f /etc/logstash/manual_configs/logstash.hcpsyslog-allversions.conf -w 6 --path.data /usr/share/logstash/hcpsyslog/

2d: It will take a few seconds for Logstash to start, when it is ready you will see the following message. You can ignore a Warning about the logstash.yml file if you see one.

The stdin plugin is now waiting for input:

2e: Paste the following text into the command shell where you are running Logstash:

<142>Sep 22 10:58:17 HCP_System_Event: 0019025005 172.18.12.24 -  -  [22/Sep/2018:10:58:16 -0400] "GET /folder/object.png HTTP/1.1" 200 210 MyNamespace.Mytenant 007 0

If everything is working you should see the following output, though the fields may not be listed in the same order. If you do not see this output you will need to troubleshoot following the guidance in the Logstash Troubleshooting section of this article.

{

    "response_Mbytes_per_second" => 0,

      "request_bytes_per_second" => 0,

                       "message" => "<142>Sep 22 10:58:17 HCP_System_Event: 0019025005 172.18.12.24 -  -  [22/Sep/2018:10:58:16 -0400] \"GET /folder/object.png HTTP/1.1\" 200 210 MyNamespace.Mytenant 007 0",

     "request_Mbytes_per_second" => 0,

                          "type" => "apache_access",

                          "auth" => "-",

                        "tenant" => "Mytenant",

                   "node_number" => 0,

                          "verb" => "GET",

                      "clientip" => "172.18.12.24",

                          "host" => "tmpelk.tech.archivas.com",

                "response_bytes" => 210,

                      "response" => 200,

                     "namespace" => "MyNamespace",

     "response_bytes_per_second" => 30000,

                    "@timestamp" => 2018-09-22T14:58:16.000Z,

                      "@version" => "1",

                       "request" => "/folder/object.png",

                       "latency" => 7

}

2f: At this stage you can edit your input filter to listen to the UDP port, reversing the change made in step 2b. After you do press ctrl-c to stop Logstash, and rerun the Logstash command to restart. Once Logstash is running issue some REST requests to a namespace (GET, PUT, HEAD, DELETE, etc.) using HCP's Namespace Browser, cURL, Cloudberry, or any other method you choose. Every request you issue against the HCP should result in output very much like the output above. If you do not see output when you issue REST requests to a namespace you will need to troubleshoot connectivity between the HCP node and your ELK server. Refer to the General Troubleshooting section of this post for tips.

 

2g: Now that you have validated your configuration file and you have validated that you are receiving HCP syslog messages it is time to start indexing. Edit your configuration file to comment out the stdout plugin and uncomment the elasticsearch plugin. After you edit the configuration press ctrl-c to stop Logstash, and rerun the Logstash command to restart. Once Logstash is running issue some REST requests to a namespace in HCP. In the next section I will show you how to confirm that your index has been created. If your index is not created when you issue REST requests to a namespace you will need to troubleshoot connectivity between Logstash and your Elasticsearch service. Refer to the General Troubleshooting section of this post for tips.

 

2h: Optional: You can run Logstash in the background using the following command:

nohup /usr/share/logstash/bin/logstash -f /etc/logstash/manual_configs/logstash.hcpsyslog-allversions.conf -w 6 --path.data /usr/share/logstash/hcpsyslog/ &

 

Step 3: Confirm Index Creation

Before we move on we must confirm we are successfully indexing. All of the steps from this point forward will be done within the Kibana UI. Make sure you issued some REST requests after enabling the elasticsearch output plugin as described in step 2g above.

 

3a: Log into your Kibana web application at  http://172.18.12.24:5601/  where 172.18.12.24 is your ELK server's IP address. If you followed my instructions from the first blog post you will not need a username or password.

 

3b: In the left pane you will see a link for Dev Tools. Click the link to be taken to a page where you have a console to execute queries against Elasticsearch.

 

3c: In the left hand pane of the console enter the following query

GET _cat/indices?v

To the right of your query (or the selected query if there are multiple) you will see a green play button.

Click the play button to execute the query against Elasticsearch. If you are indexing successfully you will see output similar to the following where you have an index named access_logs-yymmdd. If you do not have an index named access_logs-yymmdd you must go back to step 2g and troubleshoot.

 

health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   access_logs-180723 wSbM0h_WRqex8vDZPsOntQ   5   1          2            0     51.8mb         51.8mb
yellow open   .kibana            muTxANaNSlqhpdvsU6TuTg   1   1         45            1     70.1kb         70.1kb

 

3d: Now that you have confirmed everything is working it will be good to have some data in the index to look at. If your HCP is already busy handling REST requests you can skip this step. If not, and if you have a way to drive load to your HCP, go ahead and kick that off now, that way when you get to the final step you will actually have some data to look at.

 

Step 4: Create Your Index Pattern

Once you are successfully indexing HCP access log records in Elasticsearch you will want to create an index pattern which is required for our visualizations.

 

4a: Click the Management link in the left Kibana navigation pane.

 

4b: Click the Index Patterns link to go to the Create index pattern page in Kibana.

 

4c: In the field labeled Index pattern enter  access_logs-*

 

4d: If you entered the correct text you should see a success message indicating your pattern matches at least 1 index. Click the Next step > button to proceed to the Configure settings page for your index pattern.

 

4e: In the Time Filter field name dropdown selector, select @timespamp as your your time field. You may recall capturing this time with the date filter plugin in your configuration. This is an important field as most of the visualizations you will use are time series visualizations.

 

4f: Expand the advanced option by clicking the Show advanced options link to reveal a field labeled Custom index pattern ID. Enter the value access-logs-index-pattern

 

4g: Click the Create index pattern button to create your index pattern.

 

4h: You should now find yourself at a page displaying the details of your index pattern. You will see all the fields Kibana discovered in the index and their data types.

 

One thing to note at this point in the process. The configuration file is designed to work with 7.3+ HCP systems. If you create your index pattern with index documents from an HCP 7.3.x system, and you later begin indexing 8.0+ access log data you will need to recreate your index pattern. This is because certain fields did not exist in 7.3 and would not be discovered in Kibana unless you recreate the index pattern after you have indexed some 8.0+ documents. In this case, just delete your pattern and follow the steps above again.

 

Step 5: Import The Visualizations and the Dashboard

The final step before you are up and running.

 

5a: Download the attached kibana-access-log-exports.zip and extract the 3 enclosed files to the workstation where you have Kibana loaded in the browser.

 

5b: Click the Management link in the left Kibana navigation pane.

 

5c: Click the Saved Objects link to go to the Edit Saved Objects page

 

5d: Import your visualizations. In the upper right corner click the Import link and select the access-log-index-visualizations.json file that you extracted in step 5a. If prompted click Yes to overwrite existing objects.

 

5e: Import your dashboard. In the upper right corner click the Import link and select the access-log-index-performance-dashboard.json file that you extracted in step 5a. If prompted click Yes to overwrite existing objects.

 

5f: Optionally import Timelion Visualizations. In the upper right corner click the Import link and select the access-log-index-timelion-visualizations.json file that you extracted in step 5a. If prompted click Yes to overwrite existing objects.

 

Step 6: View The Dashboard

 

6a: Click the Dashboard link in the left Kibana navigation pane.

 

6b: Click the AL Performance Dashboard link to open the Access Log Performance Dashboard.

 

6c: Don't see your data? Click the time selector in the top right and expand the time window to include your data.

On this dashboard you will see a number of visualizations that I find useful for example Latency by Count and Size below. This visualization allows you to see the impact object size may have on latency (total request time), or increased latency may have on count of operations.

Feel free to add and remove any visualization to or from the dashboard, just click the Edit link at the top right of the screen and you will be able to add, remove, resize and reposition visualizations. Click save to save to update the dashboard or to save as a new dashboard.

 

Step 7: View and Edit Visualizations

 

7a: Click the Visualize link in the left Kibana navigation pane.

 

7b: Click the AL Count by Verb link to open the visualization editor.

 

In the visualization below you can see you have a nice stacked bar chart showing the count of requests received by your cluster, broken down by HTTP Verb (PUT, GET, etc.).

 

But suppose you are interested in seeing the success or failure of transactions?

 

7c: Expand the Split Series area under Buckets. In the Field input, select response then click the blue play button at the top of the editing frame. Notice that now your breakdown is by response code (200 OK, 201 Created, etc):

 

Click save to update or save as a new visualizations.

 

Step 8: Get to Know Kibana

 

Play around with these visualizations and dashboards, there is much more to learn than can be described in this post. Create new visualizations and dashboards, update existing. Don't worry about breaking stuff, you can always delete everything and start over by re-importing.

 

Tips and Tricks

Elasticsearch

  1. If you ever make a mistake (who doesn't) and want to blow away your indexes and start over it is very easy to do. In the Kibana UI go to the Dev Tools tab. You can delete the index for a specific date or delete for all dates using one of the following queries:
    DELETE access_logs-180919
    DELETE access_logs-*
  2. You can also query the index directly in the Dev Tools tab. For example this query shows a count of documents where the verb is GET or HEAD:
    GET /access_logs-*/_count
    {
      "query" : {
        "bool" : {
          "must_not" : {
            "terms" : { "verb" : ["GET", "HEAD"]}
          }
        }
      }
    }
    You can also use direct queries to delete specific rows from your indexes. This will require you read the documentation to learn more about it.

Kibana

  1. When in Kibana, hover over the white-space in any time-series visualization until you see a large + symbol. Click and drag across your visualization to zoom in on the selected time window.
  2. When you see values in the legend, for example response codes in the example below, click on one of the values. You will see a magnifying glass with a + and one with a -, click on either to create a filter to show only results with that value or without that value.
  3. Filters are super useful, click the Add a filter + link at the top of the page to create a new filter. Choose the metric the criteria, and the value, values, or range to filter on. If you plan to pin the filter give it a meaningful name as well.
  4. When you have a filter you use a lot you can "pin" it and the filter will stick around.  Hover over the filter's icon in the filter bar at the top of the page and then click the pin symbol:

    Click the checkbox on the filter to activate and deactivate the filter. Click the - magnifying glass to make the filter exclusive (exclude matches), and the + magnifying glass will make it inclusive again. Click the trash can to delete and the pencil icon to edit the filter's rule.
  5. When splitting a series, such as in the AL Count by Verb chart above which is split by the verb metric, if you do not see all of the expected values make sure that the Size value selected for the split is >=  the number of values in the split, otherwise it will show only the top N matches based on your sorting criteria. For this example if there were more than 5 verbs in the index (GET, PUT, HEAD, DELETE,...) I would only see the 5 most commonly occurring based on my sort.
  6. You may wish to have several metrics in the same chart, such as shown in the AL Latency by Count and Size chart shown above, here are some tips to keep the chart readable and account for metrics on different scales:
    1. Assign specific colors to certain metrics: When editing a visualization you can choose which colors to use for your metrics. Click on the metric in the legend and choose the color you wish to display.
    2. Use different "chart types" for different metrics (Line, Bar, and Area charts only): In the visualization editing panel on the left open the Metrics & Axes tab. In the Metrics section for each metric you may choose the Chart Type of either Line, Bar, or Area. For each chart type you may choose either Normal or Stacked mode. Using different chart types on the same chart greatly increases readability when mixing metrics.
    3. Arrange metrics back to front: You can arrange which metrics appear in front, or on top, and which appear behind, or underneath. As a general rule of thumb I would put area charts in the back, lines in the front, and bars in between. You can order this on the Data tab of the visualization editing panel, each metric has a little slider, grab the metric and slide it to the top to place in the back, and the bottom to place in the front.
    4. Use different axes for metrics of different scales: Some of your metrics may be percentages, some may average in the millions and others in the thousands. If you have them all on the same axis, only the millions will show up, the thousands and percentages will just look like a flat line across the bottom. To fix this go to the Metrics & Axes tab in the visualization editing panel, expand the metric you want to create a new axis for, and in the Value Axis dropdown, select New Axis.... You can customize the axis in the Y-Axes section just below in the same tab, use Advanced Options to edit the scale and range.

Logstash Troubleshooting

When your Logstash configuration isn't working as expected it can be a bit tricky to pinpoint exactly what the error is, here are some pointers to help get you started. Effectively troubleshooting or writing Logstash configurations may require that you know regular expressions or are willing to learn.

  1. Start with the error message. These are not super helpful but sometimes help identify the line where things went wrong. Sometimes it is helpful to copy to a text editor like notepad++ to replace "\n" with actual newline characters for readability. Look for where it says "expected <some character> after <some location>", often the issue is in the line preceding the specified location.
  2. Run the Logstash configuration interactively. Comment out the external input plugins and uncomment the stdin input plugin. Comment out the external output plugins and uncomment the stdout output plugin. Now run the configuration and copy paste messages/events into the terminal window.
  3. Recursively modify the filter element of your configuration. This is where the bugs pretty much always are and how debugging gets done. Typically the issue will be in a grok match expression.
    1. Eliminate from the filter the bit you think is causing the problem. You can comment out the entire filter body, a block of code, an individual line, or selectively remove parts of a grok match expression. For example if this is your grok expression:
      match => { 
        "access_log" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} +%{HTTPDUSER:auth}"
      }
      Remove the suspected troublesome bits of the expression and replace them with .*
      match => { 
        "access_log" => "%{IPORHOST:clientip}.*"
      }
    2. Save your changes and rerun Logstash. If it works without error this confirms that you have removed the troublesome code.
    3. Make small incremental changes to add the code back that you removed. Each time you make a small change you save it, restart logstash, and process an event.
    4. Continue doing step 3 until you pinpoint specifically what code addition causes the error to return.
    5. Fix the broken code following the same process outlined above.

 

    General Troubleshooting

    1. Make sure you have disabled your firewall or opened the ports on the firewall as described in the first post in this series.
    2. Make sure you have configured Kibana and Elasticsearch to listen on the public IP address of the ELK server as described in the previous post.
    3. Verify your Kibana and Elasticsearch service are running as described in the previous post.
    4. Verify network connectivity between the HCP and the ELK server.
    5. Verify that you have configured HCP as described in step 1 of this blog post, and that the IP address is correct, and the port is the same as the port specified in your Logstash configuration.
    6. Be sure that your Logstash configuration input and output plugins are configured properly and that you did not accidentally leave something commented out. For example, Elasticsearch indexes will not be created if the elasticsearch output plugin has been commented out.

     

    Conclusion

    The ELK tools are very flexible, powerful, and reasonably intuitive. While there is much more to know about these tools than can be covered by these guides, I hope that this series gives you a head start to set up your own ELK instance and configure it to monitor HCP.

     

    If you do follow the instructions in this series I would love to hear from you. Please comment below to say you tried and it worked, tried and failed, or to provide any feedback which I can incorporate to improve the content for future readers.

     

    Thanks!

    Outcomes