
A client of mine recently asked me something seemingly simple: how can I quickly check which of my namespaces are cloud optimized?


The built-in way of going through the admin GUI does not scale well beyond a handful of tenants. You have to open each tenant to see its namespaces and then navigate to the settings tab of each namespace, so it takes at least a couple of clicks to find out whether a namespace is cloud optimized. This client has many tenants with hundreds of namespaces, so some serious clicking would be needed...


Fortunately there is a better way: automate this through our management API (MAPI). Instead of clicking around, we fire some requests against the API to gather the information, collect everything, and export it in a sensible format. The MAPI documentation can be easily accessed through the admin help in the HCP GUI itself.


I am by no means a capable programmer, but I decided to cobble together a few lines of Python to make the HCP admin's life easier. The flow to get to the data is something like this:


  1. get the list of tenants
  2. get the list of namespaces for each tenant
  3. get the list of set options for each namespace
  4. aggregate and export as .csv
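The four steps above can be sketched in a few lines of Python using only the standard library. The MAPI base URL, port, token format, and JSON field names below are assumptions for illustration; check the MAPI documentation in your HCP admin help and adjust for your environment:

```python
import json
import ssl
import urllib.request

BASE = "https://admin.hcp.example.com:9090/mapi"  # hypothetical MAPI endpoint
TOKEN = "HCP <base64-user>:<md5-password>"        # hypothetical auth token

def mapi_get(path):
    """GET a MAPI path and parse the JSON response."""
    ctx = ssl._create_unverified_context()  # enable certificate validation in production!
    req = urllib.request.Request(
        BASE + path,
        headers={"Authorization": TOKEN, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)

def collect(get_json=mapi_get):
    """Steps 1-3: walk tenants and namespaces, gathering each namespace's settings."""
    rows = []
    for tenant in get_json("/tenants")["name"]:                       # 1. tenants
        for ns in get_json(f"/tenants/{tenant}/namespaces")["name"]:  # 2. namespaces
            settings = dict(get_json(f"/tenants/{tenant}/namespaces/{ns}"))  # 3. options
            settings.update({"tenant": tenant, "namespace": ns})
            rows.append(settings)
    return rows  # 4. ready to aggregate and export as .csv
```

Passing the fetcher in as a parameter makes the traversal easy to test without a live cluster.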


There were a couple of difficulties I had to solve with my limited programming skills:

  1. the number of fields returned is not the same for every namespace. This makes it difficult to produce a nice .csv with a single header row and properly aligned data fields. I ended up using a fixed list of the fields to be included in the output.
  2. the MAPI returns nested JSON for some of the attributes, which again makes it hard to fit into a table format. Here I used the json_normalize function from pandas to flatten the result.
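Both workarounds can be illustrated in a few lines; the field names here are invented for the example, not actual MAPI attribute names:

```python
import pandas as pd

# Two namespaces returning different sets of fields, one with nested JSON
namespaces = [
    {"name": "ns1", "optimizedFor": "CLOUD",
     "versioningSettings": {"enabled": True, "prune": False}},
    {"name": "ns2", "optimizedFor": "ALL"},  # fewer fields than ns1
]

# json_normalize flattens nested keys into dotted column names
flat = pd.json_normalize(namespaces)

# A fixed field list yields a single header row with aligned columns;
# fields a namespace does not return simply come out empty
FIELDS = ["name", "optimizedFor", "versioningSettings.enabled"]
flat.reindex(columns=FIELDS).to_csv("namespaces.csv", index=False)
```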


The result looks something like this for a subset of namespace properties:




Instead of putting this into a .csv file, a database could also be a good target. Together with a timestamp, this would then allow you to capture configuration drift.


The script is attached below. Some things you need to adjust to adapt for your environment:

  • make sure the management API is enabled
  • swap out URLs and authorization tokens
  • adjust output name and path
  • enable certificate validation for your production environment


Feel free to use and modify. Also feel free to share back if you enhance it or spot mistakes.





Anyone who is familiar with the limitations of the Hadoop clustered storage architecture knows there is a huge opportunity for purpose-built mass storage like Hitachi Content Platform as a data offload target in this environment. If you are not familiar with those limitations, see Nick DeRoo's post A Better Big Data Ecosystem with Hadoop and Hitachi Content Platform (Part 1). There are a variety of possible solutions to enable data offload, and the viability of these solutions ultimately rests on whether they satisfy customer requirements. The Content Solutions Engineering (CSE) team has had the opportunity to engage with several customers (all large financial institutions) and customer account teams to understand their requirements. What we have heard so far are variations of the following three main requirements:


  1. They want to offload cold data from Hadoop cluster storage to external storage
    1. To free capacity for warm/hot data
    2. To avoid expanding Hadoop to accommodate PBs of cold data
    3. To save money
    4. To reduce complexity
  2. They do not want to change their applications
    1. They do not want to move the data
    2. They do not want to tag the data to move
  3. They want uninterrupted access to their cold data
    1. Cool/cold data may still be accessed 1-2 times per month/year
    2. Data paths/URIs must be unaltered
    3. Cold data access may be slower than hot, but must still be fast
    4. When cold data is accessed it may be accessed again shortly


Most of the customers we have heard from are looking for all 3 of these requirements, a combination we are referring to as "seamless offload". Seamless offload automatically tiers cold data to external storage freeing internal capacity for new data. It provides uninterrupted access to tiered data and is completely hidden from the application layer.


In this post we will cover capabilities that enable these requirements. We will evaluate the capabilities of the three main Hadoop distributions, Hortonworks (HDP), Cloudera (CDH), and MapR. We will also look at the capabilities of Alluxio, a 3rd party solution the CSE team evaluated previously and discussed in this blog post: Certification of HCP with Alluxio.


Offload Capabilities

This section describes several capabilities in Hadoop platforms and software that are relevant to the offload use case. The terminology used in this section is not necessarily standard terminology, as each vendor may use different terms to describe their version of these features.


Seamless Offload

Seamless offload refers to the ability of the Hadoop platform and software to tier cold data from cluster storage to external storage without affecting the application layer in any way (other than less performant retrieval of cold data). This is achieved by combining several of the capabilities listed below.



S3A

S3A is an S3 protocol connector that ships with recent versions of Apache Hadoop, replacing the now-deprecated S3N connector. S3A allows applications to directly access data in an S3 bucket, with full read and write capability. S3A does not support cache on read or rehydration; every read must be serviced directly from the S3 bucket.


To use S3A, applications must address the bucket directly with a URI like s3a://folder/. Applications can move data between HDFS and S3A, and use tools like DistCp to bulk copy data. That said, the application is entirely responsible for managing data movement and keeping track of where the data is. Also, the S3A protocol is different from the HDFS protocol, so interfacing with S3A requires separate API logic.


Unified Namespace

Unified namespace refers to the ability to read both HDFS and S3 data in the same namespace using the same protocol. Required for seamless offload.


Outside of a seamless offload solution, the primary value of unified namespace is simplification of application coding. It provides the ability to read data from an S3 bucket in a previously mounted hdfs:// file system without having to use different API logic.


Unified Namespace Write

Same as unified namespace but allows writing to the S3 bucket. Required for seamless offload.


Read Caching

Read caching, or rehydrating, refers to persisting data that has recently been recalled from S3 in a cache local to the Hadoop cluster, either in RAM or on SSD or HDD.


This provides much faster subsequent data access, because data which has been accessed recently is generally more likely to be accessed again in the near future. While not a key capability for seamless offload, read caching is a highly desirable capability to enhance the performance of the offload solution.


File Tiering Service

The file tiering service moves data between the storage tiers defined in the Hadoop configuration. Required for seamless offload.


Automatic Tiering Policy

An automatic tiering policy is a rules-based policy that automatically identifies data to be moved based on a defined set of rules. Required for seamless offload.


It is important to differentiate between automatic and manual tiering policies. Apache Hadoop has a feature called "storage policies" which is used to flag the data to be tiered. Hadoop storage policies are not automatic or rules-based: they must be applied manually (by the application) to individual files or directories. Conversely, automatic tiering policies are applied by the platform, not by the application, and are applied based on the policy's rules. For example, I might set up a cold data policy for data that was last modified more than 270 days ago and is larger than 10 MB. Any data which matches the criteria will automatically be tiered without requiring additional action from the application.
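The example rule can be expressed as a simple predicate; this is only a sketch of the idea, not any vendor's actual policy syntax:

```python
from datetime import datetime, timedelta

def is_cold(mtime, size_bytes, now):
    """Cold data rule: last modified more than 270 days ago and larger than 10 MB."""
    return now - mtime > timedelta(days=270) and size_bytes > 10 * 1024 * 1024

now = datetime(2019, 1, 1)
print(is_cold(datetime(2018, 1, 1), 50 * 1024 * 1024, now))   # True: old and large
print(is_cold(datetime(2018, 12, 1), 50 * 1024 * 1024, now))  # False: recently modified
```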


URI Preservation

URI preservation refers to the capability of the platform or software to move data from cluster storage to external storage while allowing applications to continue to access the data using the original URI path. Required for seamless offload.


This is similar to file stubbing technology used in HDI and other cloud gateway solutions. While the data has been moved, the application's view of the data is unaltered, and the application's ability to access the data using the original path is uninterrupted.


Block Level Tiering

Block level tiering allows policies to be applied at the storage block level, as opposed to policies that are applied at the whole file or whole directory level. This allows parts of very large files to be tiered without requiring the whole file to be tiered. Required for seamless offload of table or stream data; this capability is not required for seamless offload of file data.


Tiering of Table and Stream Data

It is my current understanding that block level tiering is the key to this capability, so for the purposes of this post I will keep them together.


It is possible to offload tables (i.e. the underlying files behind the tables) in their entirety to an S3 bucket. This is not tiering, though; it is a manual migration. It is unclear at this point what the performance of tables stored in an S3 bucket would be, but I suspect it would be poor.


Capability Matrix

Now that we have introduced the capabilities required for offload of Hadoop data to an S3 bucket, let's look at the big 3 Hadoop distributions and Alluxio to see which capabilities each possess.


| Offload Capability | HDP 3.1 | CDH 6.0 | MapR 6.1 | Alluxio 1.8 |
|---|---|---|---|---|
| Seamless Offload | N | N | Y | N |
| Unified Namespace 1 | Y 2 | N | Y | Y |
| Unified Namespace Write 1 | N | N | Y | Y |
| Read Caching | N | N | Y | Y |
| File Tiering Service 1 | N | N | Y | N |
| Automatic Tiering Policy 1 | N | N | Y | N |
| URI Preservation 1 | N | N | Y | N |
| Block Level Tiering | N | N | Y | N |
| Tiering of Table and Stream Data | N | N | Y | N |

1 - required for seamless

2 - HDFS-9806 is read only


As you can see from the matrix above, only MapR 6.1 has all of the capabilities to enable seamless offload of data from cluster storage to an S3 bucket. Apache Hadoop is adding capabilities and may catch up at some point (see HDFS-12090 and HDFS-7343), but today you cannot do seamless offload with HDP or CDH without bringing in other technology. Alluxio can be a valuable addition in HDP and CDH environments, particularly when cache on read is required to enable analytics of data stored in S3. However, because Alluxio lacks tiering capabilities and URI preservation, it cannot be seen as an enabler for a seamless offload solution.



Big data platforms like Hadoop have accumulated massive amounts of data, and are continuing to grow at a rapid rate. The owners of these platforms are clamoring for options to scale capacity more cheaply by tiering cold data to a less expensive tier. While not all big data platforms have the built-in capabilities necessary to support the customers' requirements, the CSE team is busy exploring options for each of these platforms to help our customers offload their data to HCP.


The MapR 6.1 release went GA on September 29th, 2018, and the CSE team is currently testing these capabilities. We will be posting updates to the blog, so check back soon. Until then, if you have any comments or feedback, or if there are capabilities we missed or mischaracterized, please let us know in the comment section.


The Hitachi Vantara Content Solutions Engineering Team has successfully certified the Hitachi Content Platform (HCP) as an Alluxio understore. The certification testing involved running several big data benchmarking tools against a Hadoop cluster using Alluxio virtualized Hadoop Distributed File System (HDFS) and HCP storage. After reviewing the results of the certification testing, Alluxio engineering has approved the certification of HCP as an Alluxio understore.


Big data platforms are known for delivering high performance analytics at massive scale. They achieve this by co-locating data and compute on commodity hardware nodes where storage and compute resources are balanced. When additional compute or storage resources are required, nodes are added to the cluster. Over time these clusters can grow to be hundreds or thousands of nodes, which can accumulate great quantities of older, less active cold data. When this occurs, enterprises are forced to scale Hadoop clusters well beyond their computational requirements, in order to meet these increasing storage requirements.


Big data application developers also face the challenge of how to unlock the value of unstructured data stored in HCP. With tools like the HCP metadata query engine and Hitachi Content Intelligence, developers have powerful tools for data discovery. However, they do not have a tightly integrated, performance-optimized method to access and analyze that data. They need a solution that directly exposes their HCP data to their big data applications while minimizing the cost of repetitive data retrieval.

Hitachi Vantara Content Solutions Engineering Team has partnered with Alluxio to certify a solution that addresses these challenges. Together, HCP and Alluxio empower enterprise customers to extract big data value from their object data, and to recover precious HDFS capacity occupied by cold data.

HCP and Alluxio

Hitachi Content Platform (HCP) is Hitachi Vantara's market-leading object storage platform. Available as an appliance or as software only, HCP scales to store many billions of objects at multi-petabyte scale. HCP delivers durable data protection with much greater storage efficiency than can be achieved with the Hadoop standard 3-replica configuration. HCP offers data protection both by replication and by geographically distributed erasure coding, where fewer than 2 full copies of the data must be stored to deliver the same durability as three replicas.

Alluxio is a data access layer that lies between compute and heterogeneous storage resources. Alluxio unifies data access at memory-speed and bridges big data frameworks with multiple storage platforms. Applications simply connect with Alluxio to access data stored in any underlying storage system, including HCP. Using Alluxio global namespace, existing data analytics applications such as Apache Hadoop MapReduce, Apache Spark, and Apache Presto can continue using the same industry-standard interfaces to access data federated between HCP and native Hadoop storage.

Use Cases Validated

There are two primary use cases that have been validated with HCP and Alluxio in a big data ecosystem. The first use case is to simplify access to data stored in HCP in order to enable Hadoop applications to perform analytics on HCP data. An HCP bucket virtualized with Alluxio can be accessed by big data applications using Alluxio's Hadoop Compatible File System interface. The Hadoop Compatible File System interface mimics the HDFS interface. By simply changing the URI scheme from "hdfs://" to "alluxio://" big data applications are able to access and analyze data in an HCP bucket using the familiar HDFS interface.


Alluxio provides several client interfaces, and virtualizes a variety of storage types


The second use case is to simplify the movement of data between HDFS and HCP in order to enable the offload of cold data. Virtualizing both HDFS and an HCP bucket with Alluxio provides a unified namespace for Hadoop applications to read and write data to and from both HCP and HDFS. Applications can then move data from HDFS to the HCP bucket as easily as moving data from one directory to another.


The downside of moving data from HDFS to HCP is that analytics performed on cloud data is slower than analytics performed on data stored locally in HDFS. However, Alluxio addresses this issue by providing HDD, SSD, and RAMDISK cache on the Hadoop node where cloud data can be promoted to the Alluxio cache for analysis, enabling memory-speed analytics with object-store savings. We verified the performance benefits of promoting cloud data to the Alluxio cache by analyzing the same HCP data set multiple times. After the data in HCP was promoted to the Alluxio cache during the initial analysis, the performance of analyzing the HCP data in the Alluxio cache was comparable to the performance of analyzing a local HDFS data set.

Software Configuration and Test Methodology

To test HCP and Hadoop together, we installed a Hadoop cluster on four D51B-2U nodes running CentOS 7 and configured for 10G networking. Hadoop version HDP- was provisioned and managed using Apache Ambari software. All four nodes had the necessary Hadoop software for the benchmark test suites performed, including, but not limited to, HDFS, Spark2, and MapReduce2. In addition, all four Hadoop nodes were running Alluxio Enterprise 1.7.1 software.


All S3 bucket tests were performed against HCP 8.1 software running on a 4 node G10 cluster with 10G network and VSP G600 storage volumes. HCP network traffic was routed through a Pulse Secure Virtual Traffic Manager load balancer.


The certification testing was performed using various big data performance testing tools including HiBench, DFSIO, and TPC-DS. Each test was run three times. The first test was a benchmark test and used the S3A protocol to go directly to HCP. Then two consecutive tests were run using Alluxio with HCP as the understore. Validation of the test results involved verifying that the performance of recalling data from HCP with Alluxio was comparable to the S3A benchmark, and that subsequent analyses of previously recalled data showed the performance benefits of being locally cached by Alluxio.


HCP Specific configuration Settings in Alluxio

Alluxio exposes an UnderFileSystem interface that enables HCP to be configured as the underlying storage for the Alluxio filesystem. When HCP is configured as the understore in Alluxio, HCP acts as the primary storage backend for all applications that interact with Alluxio. This configuration can completely replace HDFS or coexist with HDFS in a big data ecosystem. In this configuration, the root directory of the Alluxio filesystem is mapped to the root directory of the HCP namespace, making for a one-to-one mapping between files and directories in the Alluxio filesystem and the HCP namespace. To configure HCP as an understore, the following configuration properties were set in the $alluxioHome/conf/ configuration file:


#HCP Config
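As a rough sketch, the HCP-specific properties look something like the following. The property names are drawn from the Alluxio 1.x documentation for s3a understores, and the values are placeholders, not the exact configuration used in our testing:

```properties
# Illustrative placeholders, not the tested configuration
aws.accessKeyId=<base64-encoded HCP username>
aws.secretKey=<md5 hex of HCP password>
alluxio.underfs.address=s3a://namespace/
alluxio.underfs.s3.endpoint=tenant.hcp.example.com
alluxio.underfs.s3.disable.dns.buckets=true
alluxio.underfs.s3a.list.objects.v1=true
```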
Some of these settings may not be necessary but represent the configuration used in our testing. For example, list.objects.v1=true was originally set for HCP 8.0 compatibility, but was likely not necessary for HCP 8.1 testing. accessKeyId and secretKey are the base64-encoded username and the MD5 hash of the password of the HCP namespace data access user.
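The credential derivation described above looks like this in Python (with a hypothetical user "myuser" and password "password"):

```python
import base64
import hashlib

username = "myuser"    # hypothetical HCP data access user
password = "password"  # hypothetical password

access_key_id = base64.b64encode(username.encode()).decode()  # base64 of the username
secret_key = hashlib.md5(password.encode()).hexdigest()       # MD5 hex of the password

print(access_key_id)  # bXl1c2Vy
print(secret_key)     # 5f4dcc3b5aa765d61d8327deb882cf99
```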


Another method for configuring HCP with Alluxio would be to mount HCP to a specific directory in the Alluxio filesystem. The primary use case for this would be non-seamless HDFS offload. Alluxio would be configured with HDFS as the under filesystem (as described here) and HCP would be mounted to a subdirectory within the root filesystem of Alluxio. The end result is that both HDFS and HCP are presented as a single filesystem. This would be accomplished by following Alluxio's documentation to configure Alluxio with HDFS, and then using the alluxio fs mount command to mount an HCP namespace as shown in the following command:


./bin/alluxio fs mount --option aws.accessKeyId=<base64_Username> --option aws.secretKey=<md5_Password> /mnt/HCP s3a://namespace/directory/


Properties not explicitly set in the mount command will be inherited from the $alluxioHome/conf/ configuration file as described above.

HCP Considerations

HCP software versions prior to HCP 8.1 have not been certified to work with Alluxio. There are known functional differences between 8.1 and prior versions; for example, the multi object delete (bulk delete) API is not implemented in earlier versions of HCP. Alluxio has a configuration option to disable invocation of this API, but this was not tested as part of this certification. On each HCP namespace to be configured as an under filesystem in Alluxio, the S3 compatible API will need to be enabled, along with the 'Optimize for Cloud' feature. The Optimize for Cloud feature must be enabled for HCP to support multipart upload. Depending on scale and workload, the following configuration settings may need to be tuned:



For More Information

For more information about Alluxio, please refer to these links which describe Alluxio architecture and data flow, or reach out directly to the Alluxio team. If you have questions about the solution described in this brief, or have an opportunity where you think this solution may be a fit, please reach out to our ISV team.


This is the second in a series of blog posts outlining how to use open source ELK (Elasticsearch, Logstash, Kibana) to visualize performance of a system.  If you will be implementing the monitoring described in this post you will first want to follow the steps in the first post: Performance Monitoring w/ ELK - Part I: Installing ELK


In this post we will cover how to visualize an HCP cluster's performance using the HCP REST/S3/Swift gateway access logs. To accomplish this you will need an HCP cluster running version 7.3 or greater. For real-time monitoring you must have network connectivity between your HCP cluster and the server running the ELK software. In this post we will configure ELK to receive HCP access logs over syslog port 514, and we will configure the HCP cluster to send the access logs to the ELK server.


Logstash Configuration

In the previous post we installed the Logstash software. Logstash is a software component which can be configured to receive messages, A.K.A. "events", and process those events in a particular way. Events can be messages received over the network, outputs from running commands, or the result of tailing log files, just to name a few. Logstash parses events according to the instructions in a configuration file. For each event, values parsed from the event message are assigned to named fields which will be passed to an indexing service, in this case Elasticsearch. In this section we break down a Logstash configuration to better understand each of its elements.


A Logstash configuration file has 3 main sections: inputs, filters, and outputs. The input section of the Logstash configuration file defines one or more inputs from which events are received. The filter section defines how that event is processed, and the output section defines one or more outputs to receive the results. This is referred to as the "processing pipeline". The contents of each section are defined using plugins which are described in detail at the following links for input plugins, filter plugins, and output plugins. Let's have a look at examples of each of these sections from a configuration designed to receive access logs from HCP over syslog, parse the access log events, and send them to the Elasticsearch index.


Input Section

The input section specifies where the Logstash events are coming from. In our case we will be listening on UDP port 514 to receive HTTP gateway access logs via syslog:

input {
  udp {
    port => 514
    type => syslog
  }
#  stdin {
#    type => syslog
#  }
}

In this example you see two input plugins being used, udp and stdin. The udp plugin will listen on the specified port. The type field will set the event's type to the value specified. Using event type you can have multiple input plugins in the same configuration, and use the type field to distinguish among events received by different input plugins. The stdin plugin is useful for debugging as you can trigger an event by pasting a message into the Logstash terminal without having to wait for some external event. You will notice the stdin plugin is commented out using the # symbol.


Filter Section

The filter section is where the real work happens and will constitute the majority of code in any Logstash configuration. The following filter code is a bit simplified from what you will find in the configuration file attached to this article, in order to allow us to focus on the most meaningful aspects. Take a look at the code and the sample event, and we will break it down below.

filter {
  if [type] == "syslog" {
  # Sample event:
  # -  myuser  [22/Sep/2018:10:58:16 -0400] "GET /folder/object.png HTTP/1.1" 200 210 MyNamespace.Mytenant 007 0
    grok {
      match => {
        "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} +%{HTTPDUSER:auth} +\[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response:int} (?:%{NUMBER:response_bytes:int}|-) ((?<namespace>[a-zA-Z0-9-]+)\.)?(?<tenant>[a-zA-Z0-9-]+)(@hs3)? %{NUMBER:latency:int}( %{INT:node_number:int})?( %{INT:request_bytes:int})?.*"
      }
    }
    date {
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}



The first thing you will notice is conditional logic if [type] == "syslog". Type is an event field that was set by the input plugin. You can learn more about using conditionals and accessing event fields here.


The next line after the if statement is the beginning of a grok filter plugin definition. Grok plugins are very useful for taking unstructured text like web server logs and turning it into structured data that can be indexed and queried. In this example the match option is matching the content of the message event field1 on the left against the provided expression on the right. The pattern string matches using a combination of predefined Logstash patterns and regular expressions. Predefined Logstash patterns are specified as %{<pattern name>:<event field name>:<data type>}. Grok assigns matched values to the specified event field name and casts the event field's datatype if specified. For example:

%{HTTPDATE:timestamp} - Finds a date that matches the Apache HTTP access log date format (see grok-patterns) and assigns the text value to the timestamp event field. All fields are text by default.

%{NUMBER:response:int} - Matches a number and assigns the value to the response event field, casting the field as an integer.
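Under the hood, grok's %{PATTERN:field} syntax (and the (?<name>...) constructs in the expression above) are essentially named regular-expression captures. A rough Python equivalent of the verb/request portion of the pattern:

```python
import re

# Named groups play the role of grok's event field names
m = re.match(r"(?P<verb>\w+) (?P<request>\S+)", "GET /folder/object.png HTTP/1.1")
print(m.group("verb"))     # GET
print(m.group("request"))  # /folder/object.png
```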


The other plugin in the filter section is the date filter plugin. The date filter is used for parsing dates from fields, and then using that date or timestamp as the Logstash timestamp for the event. For example:

match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] - Converts the text stored in the timestamp event field into a date using the provided joda-time date pattern, and assigns that timestamp to the Logstash event.
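For comparison, the same conversion in Python: the date filter's joda-time pattern roughly corresponds to %d/%b/%Y:%H:%M:%S %z in strftime terms, and the parsed timestamp normalizes to UTC just as the Logstash event timestamp does:

```python
from datetime import datetime, timezone

# Parse the access-log timestamp and normalize it to UTC
ts = datetime.strptime("22/Sep/2018:10:58:16 -0400", "%d/%b/%Y:%H:%M:%S %z")
print(ts.astimezone(timezone.utc).isoformat())  # 2018-09-22T14:58:16+00:00
```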


1 Logstash puts the data from the input event into the message field automatically


Output Section

Finally the output section describes what to do with the data which has been stored in the event fields. Typically, and in the following example, data is sent to an Elasticsearch index.

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    user => elastic
    password => changeme
    index => "access_logs-%{+yyMMdd}"
  }
#  stdout { codec => rubydebug }
}


The Elasticsearch output plugin specifies the host and port where the Elasticsearch index is running, the username and password, and the name of the index to add the record to. If the index does not exist, it will be created. In this example, if the event timestamp were September 7th 2018, the index name would be access_logs-180907.
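As a quick sanity check, the Logstash %{+yyMMdd} (joda-time) suffix corresponds to %y%m%d in Python's strftime:

```python
from datetime import datetime

# Index name for an event timestamped September 7th 2018
name = "access_logs-" + datetime(2018, 9, 7).strftime("%y%m%d")
print(name)  # access_logs-180907
```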


The stdout output plugin is a simple output which prints to the stdout of the shell running Logstash. This is useful for debugging as you can see how the input event data has mapped to the output event.


Elasticsearch Indexes and Managing Indexes w/ Kibana

Elasticsearch is a distributed, RESTful search and analytics engine based on Apache Lucene index and search technology. Elasticsearch is very scalable and can provide near real-time search. Elasticsearch is distributed which means indices can be divided into shards which are distributed among the nodes in a cluster to provide horizontal scaling, while replicas of each shard are distributed to provide high availability.


As you read in the previous section, we use Logstash to receive unstructured log messages, parse the messages, and output a structured index document which we send to the Elasticsearch index. We can then use Kibana to manage the indexes we have created. Kibana has a tab called Dev Tools which offers a console where you can run live commands against the Elasticsearch RESTful API. In this blog I will show you some useful commands for managing your indexes with Kibana.
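For example, the following are standard Elasticsearch index APIs you can paste into the Dev Tools console; the index name here follows the naming scheme used in the Logstash output section:

```
# List all indexes with document counts and on-disk sizes
GET _cat/indices?v

# Inspect the field mappings created for one day of access logs
GET access_logs-180907/_mapping

# Delete an old index to reclaim space
DELETE access_logs-180907
```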


Visualizing Elasticsearch Data w/ Kibana

Kibana is data visualization software for Elasticsearch. It lets you visualize the content indexed in Elasticsearch in several formats, including bar, line, or pie charts, tables, and heat maps. Following the steps in this post you will use Kibana to create index patterns for Elasticsearch indexes, explore the data in your index, build visualizations to represent your data in a meaningful way, and finally collect your visualizations on dashboards.


Monitoring HCP with ELK - Step by Step

In order to follow the steps in this section you must have a running HCP, and have installed the ELK stack. Elasticsearch and Kibana should be running as services. To configure a running ELK system you can refer to the first blog in this series: Performance Monitoring w/ ELK - Part I: Installing ELK.


Step 1: Configure HCP for Monitoring

The first thing you will need to do is configure HCP to output HTTP gateway access log event messages via syslog. To do this log into your HCP System Management Console as a user with the "Administrator" role, and go to the Monitoring => Syslog page.


1a: Check the "Enable syslog" and "Send log messages for HTTP-based data access requests" check boxes and click "Update Settings".

1b: Next you will need to register your Logstash server as a Syslog Server to receive the HCP HTTP access events. Enter the IP address of the server where you installed and will be running Logstash. By default syslog uses port 514 for communication; if you will be listening on port 514, no port entry is required. If you will be listening on anything other than port 514, you must enter the port number by typing a colon and the port number after the IP address. Click "Add" to add the record to the list.

Step 2: Logstash Configuration

2a: Download the attached zip file and extract the enclosed configuration file. Copy the configuration file to your Logstash server, and place it in the /etc/logstash/manual_configs folder.


2b: Before we run the Logstash configuration for the first time, let's edit our configuration to listen to stdin, and to output documents to stdout. Comment out the udp and elasticsearch plugins, and uncomment the stdin and stdout plugins as shown below:

input {
#  udp {
#    port => 514
#    type => syslog
#  }
  stdin {
    type => syslog
  }
}

output {
#  elasticsearch {
#    hosts => ["localhost:9200"]
#    user => elastic
#    password => changeme
#    index => "access_logs-%{+yyMMdd}"
#  }
  stdout {
    codec => rubydebug
  }
}


2c: Now you are ready to run Logstash! SSH to your ELK host and run the following command:

/usr/share/logstash/bin/logstash -f /etc/logstash/manual_configs/logstash.hcpsyslog-allversions.conf -w 6 /usr/share/logstash/hcpsyslog/

2d: It will take a few seconds for Logstash to start; when it is ready you will see the following message. You can ignore a warning about the logstash.yml file if you see one.

The stdin plugin is now waiting for input:

2e: Paste the following text into the command shell where you are running Logstash:

<142>Sep 22 10:58:17 HCP_System_Event: 0019025005 -  -  [22/Sep/2018:10:58:16 -0400] "GET /folder/object.png HTTP/1.1" 200 210 MyNamespace.Mytenant 007 0

If everything is working you should see the following output, though the fields may not be listed in the same order. If you do not see this output you will need to troubleshoot following the guidance in the Logstash Troubleshooting section of this article.


{
    "response_Mbytes_per_second" => 0,
      "request_bytes_per_second" => 0,
                       "message" => "<142>Sep 22 10:58:17 HCP_System_Event: 0019025005 -  -  [22/Sep/2018:10:58:16 -0400] \"GET /folder/object.png HTTP/1.1\" 200 210 MyNamespace.Mytenant 007 0",
     "request_Mbytes_per_second" => 0,
                          "type" => "apache_access",
                          "auth" => "-",
                        "tenant" => "Mytenant",
                   "node_number" => 0,
                          "verb" => "GET",
                      "clientip" => "",
                          "host" => "",
                "response_bytes" => 210,
                      "response" => 200,
                     "namespace" => "MyNamespace",
     "response_bytes_per_second" => 30000,
                    "@timestamp" => 2018-09-22T14:58:16.000Z,
                      "@version" => "1",
                       "request" => "/folder/object.png",
                       "latency" => 7
}


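Those field names come from the grok filter in the attached configuration. As a rough illustration only (this is not the actual grok pattern from the attached file, and `parse` is a hypothetical helper), the same line can be pulled apart in Python. Note how the throughput figure is derived: 210 response bytes over a 7 ms total request time works out to 30000 bytes per second.

```python
import re

LINE = ('<142>Sep 22 10:58:17 HCP_System_Event: 0019025005 -  -  '
        '[22/Sep/2018:10:58:16 -0400] "GET /folder/object.png HTTP/1.1" '
        '200 210 MyNamespace.Mytenant 007 0')

# Rough equivalent of part of the grok match; field names mirror the output above.
PATTERN = re.compile(
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" '       # request line
    r'(?P<response>\d+) (?P<response_bytes>\d+) '  # status code and size
    r'(?P<namespace>[^.]+)\.(?P<tenant>\S+) '      # namespace.tenant
    r'(?P<latency>\d+) (?P<node_number>\d+)'       # latency (ms) and node
)

def parse(line):
    """Extract a dict of fields from one HCP access-log syslog line."""
    fields = PATTERN.search(line).groupdict()
    for key in ('response', 'response_bytes', 'latency', 'node_number'):
        fields[key] = int(fields[key])
    # Throughput as in the indexed document: bytes / (latency in seconds).
    if fields['latency'] > 0:
        fields['response_bytes_per_second'] = round(
            fields['response_bytes'] / (fields['latency'] / 1000))
    return fields
```

Running `parse(LINE)` on the sample message yields the verb, status, namespace, tenant, and latency shown in the rubydebug output.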
2f: At this stage you can edit your input filter to listen on the UDP port, reversing the change made in step 2b. After you do, press Ctrl-C to stop Logstash, and rerun the Logstash command to restart it. Once Logstash is running, issue some REST requests to a namespace (GET, PUT, HEAD, DELETE, etc.) using HCP's Namespace Browser, cURL, CloudBerry, or any other method you choose. Every request you issue against the HCP should result in output very much like the output above. If you do not see output when you issue REST requests to a namespace, you will need to troubleshoot connectivity between the HCP node and your ELK server. Refer to the General Troubleshooting section of this post for tips.


2g: Now that you have validated your configuration file and confirmed that you are receiving HCP syslog messages, it is time to start indexing. Edit your configuration file to comment out the stdout plugin and uncomment the elasticsearch plugin. After you edit the configuration, press Ctrl-C to stop Logstash, and rerun the Logstash command to restart it. Once Logstash is running, issue some REST requests to a namespace in HCP. In the next section I will show you how to confirm that your index has been created. If your index is not created when you issue REST requests to a namespace, you will need to troubleshoot connectivity between Logstash and your Elasticsearch service. Refer to the General Troubleshooting section of this post for tips.


2h: Optional: You can run Logstash in the background using the following command:

nohup /usr/share/logstash/bin/logstash -f /etc/logstash/manual_configs/logstash.hcpsyslog-allversions.conf -w 6 /usr/share/logstash/hcpsyslog/ &
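If events ever fail to show up, it helps to rule out basic UDP connectivity separately from HCP. The snippet below is a side aid, not part of the official setup: `send_test_event` is a hypothetical helper that fires one syslog-formatted datagram (the same sample line used in step 2e). The self-contained demo sends and receives over loopback, but pointing it at your ELK server's IP and port 514 while Logstash is listening exercises the real path.

```python
import socket

# Sample HCP access-log syslog line (same format as the step 2e test message).
MESSAGE = (b'<142>Sep 22 10:58:17 HCP_System_Event: 0019025005 -  -  '
           b'[22/Sep/2018:10:58:16 -0400] "GET /folder/object.png HTTP/1.1" '
           b'200 210 MyNamespace.Mytenant 007 0')

def send_test_event(host, port=514):
    """Send one UDP datagram to the syslog listener, as HCP would."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(MESSAGE, (host, port))

if __name__ == "__main__":
    # Loopback demo: bind a throwaway UDP listener, send, and read it back.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))        # port 0 = pick a free port
    _, port = listener.getsockname()
    send_test_event("127.0.0.1", port)
    data, _ = listener.recvfrom(4096)
    print(data == MESSAGE)                 # expect True when the datagram arrives
    listener.close()
```

Against the real listener, each datagram that arrives should produce a rubydebug event in the Logstash terminal (while the stdout plugin is enabled).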


Step 3: Confirm Index Creation

Before we move on we must confirm we are successfully indexing. All of the steps from this point forward will be done within the Kibana UI. Make sure you issued some REST requests after enabling the elasticsearch output plugin as described in step 2g above.


3a: Log into your Kibana web application at your ELK server's IP address (Kibana listens on port 5601 by default). If you followed my instructions from the first blog post you will not need a username or password.


3b: In the left pane you will see a link for Dev Tools. Click the link to be taken to a page where you have a console to execute queries against Elasticsearch.


3c: In the left hand pane of the console enter the following query

GET _cat/indices?v

To the right of your query (or the selected query if there are multiple) you will see a green play button.

Click the play button to execute the query against Elasticsearch. If you are indexing successfully you will see output similar to the following where you have an index named access_logs-yymmdd. If you do not have an index named access_logs-yymmdd you must go back to step 2g and troubleshoot.


health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   access_logs-180723 wSbM0h_WRqex8vDZPsOntQ   5   1          2            0     51.8mb         51.8mb
yellow open   .kibana            muTxANaNSlqhpdvsU6TuTg   1   1         45            1     70.1kb         70.1kb
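If you prefer to script this check rather than eyeball the Dev Tools output, the text returned by the query above is easy to parse. A sketch: `find_access_log_indices` is a hypothetical helper that takes the raw `GET _cat/indices?v` response (fetching it, e.g. with curl against your Elasticsearch host, is left out) and returns any access_logs-yymmdd index names it finds.

```python
import re

def find_access_log_indices(cat_output):
    """Return access_logs-yymmdd index names found in `GET _cat/indices?v` output."""
    names = []
    for line in cat_output.splitlines()[1:]:      # skip the header row
        cols = line.split()
        # The index name is the third column of the cat/indices table.
        if len(cols) >= 3 and re.fullmatch(r'access_logs-\d{6}', cols[2]):
            names.append(cols[2])
    return names
```

Feeding it the sample output above returns `['access_logs-180723']`; an empty list means you should go back to step 2g and troubleshoot.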


3d: Now that you have confirmed everything is working, it will be good to have some data in the index to look at. If your HCP is already busy handling REST requests you can skip this step. If not, and if you have a way to drive load to your HCP, go ahead and kick that off now; that way, when you get to the final step, you will actually have some data to look at.


Step 4: Create Your Index Pattern

Once you are successfully indexing HCP access log records in Elasticsearch you will want to create an index pattern which is required for our visualizations.


4a: Click the Management link in the left Kibana navigation pane.


4b: Click the Index Patterns link to go to the Create index pattern page in Kibana.


4c: In the field labeled Index pattern, enter access_logs-*


4d: If you entered the correct text you should see a success message indicating your pattern matches at least 1 index. Click the Next step > button to proceed to the Configure settings page for your index pattern.


4e: In the Time Filter field name dropdown selector, select @timestamp as your time field. You may recall capturing this time with the date filter plugin in your configuration. This is an important field as most of the visualizations you will use are time series visualizations.


4f: Expand the advanced options by clicking the Show advanced options link to reveal a field labeled Custom index pattern ID. Enter the value access-logs-index-pattern.


4g: Click the Create index pattern button to create your index pattern.


4h: You should now find yourself at a page displaying the details of your index pattern. You will see all the fields Kibana discovered in the index and their data types.


One thing to note at this point in the process: the configuration file is designed to work with HCP systems running 7.3 or later. If you create your index pattern with documents indexed from an HCP 7.3.x system and later begin indexing 8.0+ access log data, you will need to recreate your index pattern. Certain fields did not exist in 7.3 and will not be discovered in Kibana unless you recreate the index pattern after you have indexed some 8.0+ documents. In this case, just delete your pattern and follow the steps above again.


Step 5: Import The Visualizations and the Dashboard

The final step before you are up and running.


5a: Download the attached zip file and extract the 3 enclosed files to the workstation where you have Kibana loaded in the browser.


5b: Click the Management link in the left Kibana navigation pane.


5c: Click the Saved Objects link to go to the Edit Saved Objects page


5d: Import your visualizations. In the upper right corner click the Import link and select the access-log-index-visualizations.json file that you extracted in step 5a. If prompted click Yes to overwrite existing objects.


5e: Import your dashboard. In the upper right corner click the Import link and select the access-log-index-performance-dashboard.json file that you extracted in step 5a. If prompted click Yes to overwrite existing objects.


5f: Optionally import Timelion Visualizations. In the upper right corner click the Import link and select the access-log-index-timelion-visualizations.json file that you extracted in step 5a. If prompted click Yes to overwrite existing objects.


Step 6: View The Dashboard


6a: Click the Dashboard link in the left Kibana navigation pane.


6b: Click the AL Performance Dashboard link to open the Access Log Performance Dashboard.


6c: Don't see your data? Click the time selector in the top right and expand the time window to include your data.

On this dashboard you will see a number of visualizations that I find useful, for example Latency by Count and Size below. This visualization lets you see the impact object size may have on latency (total request time), or the impact increased latency may have on the count of operations.

Feel free to add any visualization to the dashboard or remove any from it: click the Edit link at the top right of the screen and you will be able to add, remove, resize, and reposition visualizations. Click Save to update the dashboard or to save it as a new dashboard.


Step 7: View and Edit Visualizations


7a: Click the Visualize link in the left Kibana navigation pane.


7b: Click the AL Count by Verb link to open the visualization editor.


In the visualization below you can see you have a nice stacked bar chart showing the count of requests received by your cluster, broken down by HTTP Verb (PUT, GET, etc.).


But suppose you are interested in seeing the success or failure of transactions?


7c: Expand the Split Series area under Buckets. In the Field input, select response then click the blue play button at the top of the editing frame. Notice that now your breakdown is by response code (200 OK, 201 Created, etc):


Click Save to update the visualization or to save it as a new one.


Step 8: Get to Know Kibana


Play around with these visualizations and dashboards; there is much more to learn than can be described in this post. Create new visualizations and dashboards, and update existing ones. Don't worry about breaking stuff: you can always delete everything and start over by re-importing.


Tips and Tricks


  1. If you ever make a mistake (who doesn't?) and want to blow away your indexes and start over, it is very easy to do. In the Kibana UI go to the Dev Tools tab. You can delete the index for a specific date or delete for all dates using one of the following queries:
    DELETE access_logs-180919
    DELETE access_logs-*
  2. You can also query the index directly in the Dev Tools tab. For example, this query shows a count of documents where the verb is GET or HEAD:
    GET /access_logs-*/_count
    {
      "query" : {
        "bool" : {
          "must" : {
            "terms" : { "verb" : ["GET", "HEAD"] }
          }
        }
      }
    }
    You can also use direct queries to delete specific rows from your indexes. This will require you to read the documentation to learn more about it.
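The same count can be fetched outside Kibana with a few lines of Python. This is a sketch using only the standard library; `verb_count_query` and `count_by_verbs` are hypothetical helpers, and the default `http://localhost:9200` endpoint with no authentication is an assumption to adjust for your setup (if your cluster requires the user/password from the elasticsearch output plugin, add the appropriate auth header).

```python
import json
from urllib import request

def verb_count_query(verbs):
    """Build a _count body matching documents whose verb is one of `verbs`."""
    return {"query": {"bool": {"must": {"terms": {"verb": verbs}}}}}

def count_by_verbs(verbs, es_url="http://localhost:9200"):
    """POST the query to /access_logs-*/_count and return the document count."""
    req = request.Request(
        es_url + "/access_logs-*/_count",
        data=json.dumps(verb_count_query(verbs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["count"]
```

For example, `count_by_verbs(["GET", "HEAD"])` mirrors the Dev Tools query above.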


  1. When in Kibana, hover over the white-space in any time-series visualization until you see a large + symbol. Click and drag across your visualization to zoom in on the selected time window.
  2. When you see values in the legend, for example response codes in the example below, click on one of the values. You will see a magnifying glass with a + and one with a -, click on either to create a filter to show only results with that value or without that value.
  3. Filters are super useful, click the Add a filter + link at the top of the page to create a new filter. Choose the metric, the criteria, and the value, values, or range to filter on. If you plan to pin the filter give it a meaningful name as well.
  4. When you have a filter you use a lot you can "pin" it and the filter will stick around.  Hover over the filter's icon in the filter bar at the top of the page and then click the pin symbol:

    Click the checkbox on the filter to activate and deactivate the filter. Click the - magnifying glass to make the filter exclusive (exclude matches), and the + magnifying glass will make it inclusive again. Click the trash can to delete and the pencil icon to edit the filter's rule.
  5. When splitting a series, such as in the AL Count by Verb chart above which is split by the verb metric, if you do not see all of the expected values make sure that the Size value selected for the split is >=  the number of values in the split, otherwise it will show only the top N matches based on your sorting criteria. For this example if there were more than 5 verbs in the index (GET, PUT, HEAD, DELETE,...) I would only see the 5 most commonly occurring based on my sort.
  6. You may wish to have several metrics in the same chart, such as shown in the AL Latency by Count and Size chart shown above, here are some tips to keep the chart readable and account for metrics on different scales:
    1. Assign specific colors to certain metrics: When editing a visualization you can choose which colors to use for your metrics. Click on the metric in the legend and choose the color you wish to display.
    2. Use different "chart types" for different metrics (Line, Bar, and Area charts only): In the visualization editing panel on the left open the Metrics & Axes tab. In the Metrics section for each metric you may choose the Chart Type of either Line, Bar, or Area. For each chart type you may choose either Normal or Stacked mode. Using different chart types on the same chart greatly increases readability when mixing metrics.
    3. Arrange metrics back to front: You can arrange which metrics appear in front, or on top, and which appear behind, or underneath. As a general rule of thumb I would put area charts in the back, lines in the front, and bars in between. You can order this on the Data tab of the visualization editing panel, each metric has a little slider, grab the metric and slide it to the top to place in the back, and the bottom to place in the front.
    4. Use different axes for metrics of different scales: Some of your metrics may be percentages, some may average in the millions and others in the thousands. If you have them all on the same axis, only the millions will show up, the thousands and percentages will just look like a flat line across the bottom. To fix this go to the Metrics & Axes tab in the visualization editing panel, expand the metric you want to create a new axis for, and in the Value Axis dropdown, select New Axis.... You can customize the axis in the Y-Axes section just below in the same tab, use Advanced Options to edit the scale and range.

Logstash Troubleshooting

When your Logstash configuration isn't working as expected, it can be tricky to pinpoint exactly what the error is. Here are some pointers to get you started. Effectively troubleshooting or writing Logstash configurations may require that you know regular expressions, or are willing to learn.

  1. Start with the error message. These are not super helpful, but they sometimes identify the line where things went wrong. It can help to copy the message into a text editor like Notepad++ and replace "\n" with actual newline characters for readability. Look for where it says "expected <some character> after <some location>"; often the issue is in the line preceding the specified location.
  2. Run the Logstash configuration interactively. Comment out the external input plugins and uncomment the stdin input plugin. Comment out the external output plugins and uncomment the stdout output plugin. Now run the configuration and copy paste messages/events into the terminal window.
  3. Recursively modify the filter element of your configuration. This is where the bugs pretty much always are and how debugging gets done. Typically the issue will be in a grok match expression.
    1. Eliminate from the filter the bit you think is causing the problem. You can comment out the entire filter body, a block of code, an individual line, or selectively remove parts of a grok match expression. For example, if this is your grok expression:
      match => {
        "access_log" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} +%{HTTPDUSER:auth}"
      }
      Remove the suspected troublesome bits of the expression and replace them with .*
      match => {
        "access_log" => "%{IPORHOST:clientip}.*"
      }
    2. Save your changes and rerun Logstash. If it works without error this confirms that you have removed the troublesome code.
    3. Make small incremental changes to add the code back that you removed. Each time you make a small change you save it, restart logstash, and process an event.
    4. Continue doing step 3 until you pinpoint specifically what code addition causes the error to return.
    5. Fix the broken code following the same process outlined above.
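The remove-and-rebuild loop above can be rehearsed offline with plain Python regexes before touching the Logstash file. In this sketch, `IPORHOST` and `HTTPDUSER` are crude stand-ins for the real grok patterns (which are more elaborate), and `bisect_step` is a hypothetical helper: if the trimmed pattern matches but the full one does not, the bug lives in the part you removed.

```python
import re

# Crude stand-ins for the grok patterns named above (illustrative only;
# the real IPORHOST and HTTPDUSER grok definitions are more elaborate).
IPORHOST = r'[\w.\-]+'
HTTPDUSER = r'[\w.\-]+'

full_expr = re.compile(
    rf'(?P<clientip>{IPORHOST}) (?P<ident>{HTTPDUSER}) +(?P<auth>{HTTPDUSER})')
trimmed_expr = re.compile(
    rf'(?P<clientip>{IPORHOST}).*')   # suspect tail replaced with .*

def bisect_step(line):
    """One debugging step: returns (trimmed_matches, full_matches).
    (True, False) means the problem lives in the part you removed."""
    return bool(trimmed_expr.match(line)), bool(full_expr.match(line))
```

For instance, `bisect_step('10.0.0.5 - admin')` returns (True, True), while a line missing the ident and auth fields returns (True, False), pointing you at the tail of the expression.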


General Troubleshooting

  1. Make sure you have disabled your firewall or opened the ports on the firewall as described in the first post in this series.
  2. Make sure you have configured Kibana and Elasticsearch to listen on the public IP address of the ELK server as described in the previous post.
  3. Verify that your Kibana and Elasticsearch services are running as described in the previous post.
  4. Verify network connectivity between the HCP and the ELK server.
  5. Verify that you have configured HCP as described in step 1 of this blog post, that the IP address is correct, and that the port is the same as the port specified in your Logstash configuration.
  6. Be sure that your Logstash configuration input and output plugins are configured properly and that you did not accidentally leave something commented out. For example, Elasticsearch indexes will not be created if the elasticsearch output plugin has been commented out.


The ELK tools are very flexible, powerful, and reasonably intuitive. While there is much more to know about these tools than can be covered by these guides, I hope this series gives you a head start to set up your own ELK instance and configure it to monitor HCP.


If you do follow the instructions in this series I would love to hear from you. Please comment below to say you tried and it worked, tried and failed, or to provide any feedback which I can incorporate to improve the content for future readers.



    Hi, this is just a quick post to share a particularly helpful method for troubleshooting issues between a Java client application and the HCP S3 Gateway. Most Java-based software will allow you to inject Java system properties at launch time, either by editing a configuration file or a launch script. This post does not cover how to achieve that step; to answer that question, use the product documentation, Google, or the vendor's support team.


    If you will be adding Java system properties by configuration you want to add the following name value pair (choose the correct value for your system type):


    If you will be adding the system property by modifying the JVM launch script you will want to add the following to the JVM launch:



    Finally, you will want to create the properties file with the following contents. This will log to both a file and to stdout; you can remove either file or stdout from the log4j.rootLogger directive to eliminate one or the other. Make sure you modify the log4j.appender.file.File directive to indicate the desired output file location.

    # Root logger option

    log4j.rootLogger=INFO, stdout, file

    # Class Level Logging Directives

    log4j.logger.com.amazonaws=DEBUG
    log4j.logger.org.apache.http=DEBUG

    # Direct log messages to stdout

    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.Target=System.out
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c -  %m%n

    # Direct log messages to a log file

    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File=/path/to/aws-sdk-debug.log
    log4j.appender.file.MaxFileSize=10MB
    log4j.appender.file.MaxBackupIndex=10
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d [%t] %-5p %c -  %m%n


    The properties file above turns up AWS and HTTP logging and will provide detailed output on all the data sent and received by the SDK, as well as information about how the SDK is calculating the V2 or V4 signature. Here is an example of debug output for an S3 create bucket request. Keep in mind Jive may have changed a bit of content here and there.

    2018-09-13 09:01:01,881 [main] DEBUG com.amazonaws.AmazonWebServiceClient -  Internal logging successfully configured to commons logger: true
    2018-09-13 09:01:01,931 [main] DEBUG com.amazonaws.metrics.AwsSdkMetrics -  Admin mbean registered under
    Creating bucket bc1

    2018-09-13 09:01:02,120 [main] DEBUG com.amazonaws.request -  Sending Request: PUT / Headers: (User-Agent: aws-sdk-java/1.11.84 Linux/4.8.6-300.fc25.x86_64 OpenJDK_64-Bit_Server_VM/25.121-b14/1.8.0_121, amz-sdk-invocation-id: b6959e76-3a13-70c0-8406-6dd42df586b5, Content-Type: application/octet-stream, )
    2018-09-13 09:01:02,149 [main] DEBUG -  Calculated string to sign:

    Thu, 13 Sep 2018 13:01:02 GMT
    2018-09-13 09:01:02,169 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies -  CookieSpec selected: default
    2018-09-13 09:01:02,176 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache -  Auth cache not set in the context
    2018-09-13 09:01:02,177 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager -  Connection request: [route: {}->][total kept alive: 0; route allocated: 0 of 200; total allocated: 0 of 200]
    2018-09-13 09:01:02,189 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager -  Connection leased: [id: 0][route: {}->][total kept alive: 0; route allocated: 1 of 200; total allocated: 1 of 200]
    2018-09-13 09:01:02,191 [main] DEBUG org.apache.http.impl.execchain.MainClientExec -  Opening connection {}->
    2018-09-13 09:01:02,198 [main] DEBUG org.apache.http.impl.conn.DefaultHttpClientConnectionOperator -  Connecting to
    2018-09-13 09:01:02,201 [main] DEBUG org.apache.http.impl.conn.DefaultHttpClientConnectionOperator -  Connection established<->
    2018-09-13 09:01:02,201 [main] DEBUG org.apache.http.impl.conn.DefaultManagedHttpClientConnection -  http-outgoing-0: set socket timeout to 50000
    2018-09-13 09:01:02,201 [main] DEBUG org.apache.http.impl.execchain.MainClientExec -  Executing request PUT / HTTP/1.1
    2018-09-13 09:01:02,201 [main] DEBUG org.apache.http.impl.execchain.MainClientExec -  Proxy auth state: UNCHALLENGED
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> PUT / HTTP/1.1
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> Host:
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> Authorization: AWS ZGV2:ma3WomzhOJfQ42bDxlQilmbxVxA=
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> User-Agent: aws-sdk-java/1.11.84 Linux/4.8.6-300.fc25.x86_64 OpenJDK_64-Bit_Server_VM/25.121-b14/1.8.0_121
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> amz-sdk-invocation-id: b6959e76-3a13-70c0-8406-6dd42df586b5
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> amz-sdk-retry: 0/0/500
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> Date: Thu, 13 Sep 2018 13:01:02 GMT
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> Content-Type: application/octet-stream
    2018-09-13 09:01:02,203 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> Content-Length: 0
    2018-09-13 09:01:02,204 [main] DEBUG org.apache.http.headers -  http-outgoing-0 >> Connection: Keep-Alive
    2018-09-13 09:01:02,204 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "PUT / HTTP/1.1[\r][\n]"
    2018-09-13 09:01:02,204 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "Host:[\r][\n]"
    2018-09-13 09:01:02,204 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "Authorization: AWS ZGV2:ma3WomzhOJfQ42bDxlQilmbxVxA=[\r][\n]"
    2018-09-13 09:01:02,205 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "User-Agent: aws-sdk-java/1.11.84 Linux/4.8.6-300.fc25.x86_64 OpenJDK_64-Bit_Server_VM/25.121-b14/1.8.0_121[\r][\n]"
    2018-09-13 09:01:02,206 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "amz-sdk-invocation-id: b6959e76-3a13-70c0-8406-6dd42df586b5[\r][\n]"
    2018-09-13 09:01:02,206 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "amz-sdk-retry: 0/0/500[\r][\n]"
    2018-09-13 09:01:02,206 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "Date: Thu, 13 Sep 2018 13:01:02 GMT[\r][\n]"
    2018-09-13 09:01:02,206 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "Content-Type: application/octet-stream[\r][\n]"
    2018-09-13 09:01:02,207 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "Content-Length: 0[\r][\n]"
    2018-09-13 09:01:02,207 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
    2018-09-13 09:01:02,207 [main] DEBUG org.apache.http.wire -  http-outgoing-0 >> "[\r][\n]"
    2018-09-13 09:01:02,210 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "HTTP/1.1 200 OK[\r][\n]"
    2018-09-13 09:01:02,210 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Date: Thu, 13 Sep 2018 13:01:03 GMT[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Pragma: no-cache[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Cache-Control: no-cache,no-store,must-revalidate[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "X-Content-Type-Options: nosniff[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "X-DNS-Prefetch-Control: off[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "X-Download-Options: noopen[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "X-Frame-Options: SAMEORIGIN[\r][\n]"
    2018-09-13 09:01:02,211 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "X-XSS-Protection: 1; mode=block[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-eval' 'unsafe-inline'; connect-src 'self'; img-src 'self'; style-src 'self' 'unsafe-inline'; object-src 'self'; frame-ancestors 'self';[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Strict-Transport-Security: max-age=31536000; includeSubDomains[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Expires: Thu, 01 Jan 1970 00:00:00 GMT[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Location: /bc1[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Content-Type: application/xml;charset=UTF-8[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "Transfer-Encoding: chunked[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "12E[\r][\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "  BucketAlreadyOwnedByYou[\n]"
    2018-09-13 09:01:02,212 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "  Your previous request to create the named bucket succeeded and you already own it[\n]"
    2018-09-13 09:01:02,213 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "  1536843663173[\n]"
    2018-09-13 09:01:02,213 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "  Y2x1c3RlcjU1dy0xLmxhYi5hcmNoaXZhcy5jb206MTM4[\n]"
    2018-09-13 09:01:02,213 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\n]"
    2018-09-13 09:01:02,213 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\n]"
    2018-09-13 09:01:02,213 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\r][\n]"
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << HTTP/1.1 200 OK
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Date: Thu, 13 Sep 2018 13:01:03 GMT
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Pragma: no-cache
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Cache-Control: no-cache,no-store,must-revalidate
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << X-Content-Type-Options: nosniff
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << X-DNS-Prefetch-Control: off
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << X-Download-Options: noopen
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << X-Frame-Options: SAMEORIGIN
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << X-XSS-Protection: 1; mode=block
    2018-09-13 09:01:02,215 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-eval' 'unsafe-inline'; connect-src 'self'; img-src 'self'; style-src 'self' 'unsafe-inline'; object-src 'self'; frame-ancestors 'self';
    2018-09-13 09:01:02,216 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Strict-Transport-Security: max-age=31536000; includeSubDomains
    2018-09-13 09:01:02,216 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Expires: Thu, 01 Jan 1970 00:00:00 GMT
    2018-09-13 09:01:02,216 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Location: /bc1
    2018-09-13 09:01:02,216 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Content-Type: application/xml;charset=UTF-8
    2018-09-13 09:01:02,216 [main] DEBUG org.apache.http.headers -  http-outgoing-0 << Transfer-Encoding: chunked
    2018-09-13 09:01:02,221 [main] DEBUG org.apache.http.impl.execchain.MainClientExec -  Connection can be kept alive for 60000 MILLISECONDS
    2018-09-13 09:01:02,224 [main] DEBUG com.amazonaws.request -  Received successful response: 200, AWS Request ID: null
    2018-09-13 09:01:02,225 [main] DEBUG com.amazonaws.requestId -  x-amzn-RequestId: not available
    2018-09-13 09:01:02,225 [main] DEBUG com.amazonaws.requestId -  AWS Request ID: not available
    2018-09-13 09:01:02,225 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "0[\r][\n]"
    2018-09-13 09:01:02,225 [main] DEBUG org.apache.http.wire -  http-outgoing-0 << "[\r][\n]"
    2018-09-13 09:01:02,225 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager -  Connection [id: 0][route: {}->] can be kept alive for 60.0 seconds
    2018-09-13 09:01:02,226 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager -  Connection released: [id: 0][route: {}->][total kept alive: 1; route allocated: 1 of 200; total allocated: 1 of 200]
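Debug output at this level is verbose. When you mainly care about what was signed and what actually went over the wire, a small filter helps; `interesting_lines` is a hypothetical helper, and the marker substrings are simply taken from the log excerpt above.

```python
def interesting_lines(log_text):
    """Keep only signature-calculation and raw-wire lines from SDK debug output."""
    markers = ('Calculated string to sign', 'org.apache.http.wire')
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in markers)]
```

Piping a captured debug log through this leaves the signing details and the exact request/response bytes, which is usually where a signature or connectivity problem shows itself.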


    Hope this helps the next time you can't figure out why your client won't talk to HCP, or why a request is failing.

    In part 1 of this series, we introduced the challenges our customers currently face with storing data long term in Hadoop. In this blog, we’ll discuss how new Hadoop functionality brings object storage closer to the Hadoop ecosystem and how future Hadoop functionality will continue to simplify big data management.

    Review: The big data problem

    As we discussed in part 1 of this series, storing petabytes of data in the Hadoop Distributed File System (HDFS) and expanding storage in HDFS is costly and inefficient. It requires you to expand compute and storage capacity together. We then reviewed how customers can reduce the cost of storing data in HDFS by offloading data to an object storage system, like Hitachi Content Platform (HCP). However, that only solves part of the problem. Although HDFS offloading solutions exist, they require applications to move their data, or storage administrators to update the application's database after data has been moved. Surely there must be a better way to offload data more effectively?

    What about application owners who don't want to modify their applications just to move their data? And aren't storage administrators responsible for maintaining the backend storage, so why can't they solve the growing storage problem in a way that is seamless to the applications? The Hadoop community and Hitachi Vantara recognize this problem and are working towards a seamless Hadoop offload solution to address it.

    Decoupling storage and compute in Hadoop

    Apache Hadoop is currently addressing the issue of uneven storage and compute growth by adding functionality to decouple growing storage capacity from compute capacity. One way Hadoop addresses this is with Heterogeneous Storage, which has made strides towards managing data directly in Hadoop by introducing storage types and storage policies. In Hadoop 2.3, new functionality changed the data node storage model from a single storage per data node to a collection of storages, in which each 'store' corresponds to a physical storage medium. This brought the concept of storage types (DISK and SSD) to Hadoop. Storage policies allow data to be stored in different storage types based on a policy, which enables data to be moved between storage types or volumes by setting the storage policy on a file or directory.

    Another important Hadoop feature for decoupling storage and compute has been the addition of an archival storage type. Nodes with higher density and less expensive storage can be used for archival storage. A new data migration tool called 'Mover' was added for archiving data; it periodically scans the files in HDFS to check whether the block placement satisfies the storage policy. Although storage policies allow 'Mover' to identify and move blocks that are supposed to be in a different storage tier, the functionality to transition files from one storage policy to another does not exist. Hadoop is missing a policy engine that looks at file attributes, access patterns, and other higher-level metadata, and then, based on what it finds, chooses the storage policy for the data.

    To transition data from the Hot storage policy to Cold, the storage administrator either needs to manually tag files and directories or build and maintain complicated tools and logic. Even with an automated storage policy, there is still significant room for cost savings by tiering Cold data outside of the Hadoop filesystem to an object storage system. 
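    No such policy engine exists in Hadoop today, but the idea is easy to illustrate. Here is a toy sketch in Python of what an age-based policy engine might decide; the thresholds and the decision logic are invented for illustration, and only the policy names (HOT/WARM/COLD) come from the standard HDFS storage policies:

```python
from datetime import datetime, timedelta

# Toy policy engine: choose an HDFS storage policy name from file age.
# The 30/180-day thresholds are illustrative assumptions, not HDFS defaults.
def choose_policy(last_access: datetime, now: datetime) -> str:
    age = now - last_access
    if age < timedelta(days=30):
        return "HOT"    # keep all replicas on DISK
    if age < timedelta(days=180):
        return "WARM"   # mix of DISK and ARCHIVE replicas
    return "COLD"       # all replicas on ARCHIVE storage

now = datetime(2018, 9, 1)
print(choose_policy(datetime(2018, 8, 20), now))  # recently accessed data
print(choose_policy(datetime(2017, 1, 1), now))   # long-idle data
```

    A real engine would then apply the chosen policy with the HDFS tooling (for example, the `hdfs storagepolicies` subcommand) and let 'Mover' relocate the blocks.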

    Bringing object storage closer to Hadoop

    In Apache Hadoop 3.1, external storage can be mounted as a PROVIDED storage type. (See HDFS-9806 for more details.) This brings object storage closer to the Hadoop ecosystem but limits customers to creating a read-only image of a remote namespace. PROVIDED storage allows data stored outside HDFS to be mapped to and accessed from HDFS. Clients accessing data in PROVIDED storage can cache replicas in local media, enforce HDFS security and quotas, and address more data than the cluster could persist in the storage attached to its data nodes. Although more data can be addressed, data still cannot be seamlessly tiered from HDFS to the PROVIDED storage tier.

    Currently, Apache Hadoop is working to extend the tiering functionality to external storage mounted as the PROVIDED storage type. HDFS-12090 is an open item to handle writes from HDFS to a PROVIDED storage target. This enhancement is referenced in a presentation from the Data Works Summit, whose demo shows the 'hdfs syncservice' and 'hdfs providedstorage' subcommands being used to assign a storage policy to a data set and then tier that data to external storage. Unfortunately, the functionality described in HDFS-12090 and shown in the Data Works Summit demo is still being designed and is not yet scheduled for an Apache release. This leaves us with an open question: how do we seamlessly offload data from HDFS to object storage with the existing HDFS functionality?


    MapR 6.1 functionality

    MapR is another Hadoop distribution, but unlike Apache Hadoop and Cloudera, MapR is fully proprietary and under its own development. MapR recently announced that, as part of its 6.1 release, it will support seamless offloading of data from MapR-FS to "cost-optimized" storage (in other words, an S3 bucket). They describe this functionality as: "Policy-driven automatic data placement across performance-optimized, capacity-optimized and cost-optimized tiers, on-premises or in cloud, with Object Tiering."

    The MapR 6.1 announcement also describes the ability to have one global namespace that can transparently store hot, warm, and cold data, eliminating the need to create segregated namespaces. As data transitions from hot to cold, it can be moved to cost-optimized storage, and applications can continue to access data at the same path. Object tiering in MapR 6.1 can be deployed using simple policies: in a given policy, administrators identify the data to be tiered, the criteria for tiering, and the choice of a public or private cloud target. Although the described functionality sounds like an end-all solution to the data offload problem, it only benefits customers who run their Hadoop environments on MapR today, or plan to transition to a MapR Hadoop environment in the future.

    What’s Next?

    The Content Solutions Engineering team recognizes that this is an opportunity to simplify data management and reduce costs for our customers. We are currently evaluating the feasibility of a few different solutions that will provide this seamless Hadoop offload functionality for Apache Hadoop and Cloudera distributions. Keep an eye on the Hitachi community site for more information as we continue to define the solutions around this use case. Please feel free to reach out to the Content Solutions Engineering team if you have feedback or would like to share some customer use cases.

    This first video is a demonstration of leveraging the Hitachi Content Platform (HCP) as an under filesystem in Alluxio and the benefits of using Alluxio to integrate HCP into your Hadoop ecosystem.


    DEMO: Hitachi Content Platform with Alluxio


    The second video is a brief demonstration of the functionality available when integrating Hitachi Content Platform (HCP) with an existing Hadoop filesystem using Alluxio. Additionally, we demonstrate moving data from the Hadoop filesystem to HCP with the Alluxio command line.


    DEMO: Hadoop Offload to Hitachi Content Platform Using Alluxio

    In this blog, I’ll explore the challenges our customers are facing with storing data long term in Hadoop, and discuss what the Hitachi Content team is doing to help our customers solve these challenges.

    The Big Data Problem with Hadoop

    Data is at the center of our digital world, and for years Hadoop has been the go-to data processing platform because it is fast and scalable. While Hadoop has solved the data storage and processing problem for the last ~10 years, it achieves this by scaling storage and compute capacity in parallel. As a result, Hadoop environments have continued to expand compute capacity well beyond their needs as more and more of the storage is consumed by older, inactive data. Although HDFS is effective at storing small-to-mid-size repositories of data, it becomes vastly more costly and inefficient as storage needs expand, since this requires increasing both storage and compute. HDFS also relies on data replication (storing multiple copies of each block) for protection. As these data sets grow into the petabytes, the growing cost of old data and idle compute in your Hadoop ecosystem becomes unsustainable.
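    To put rough numbers on the replication cost: with the default HDFS replication factor of 3, every logical petabyte needs three petabytes of raw disk. A quick Python sketch makes the comparison; the ~1.33x erasure-coding overhead used for the object store is an illustrative assumption, not an HCP specification:

```python
# Raw capacity needed to hold a data set, given a storage overhead factor.
def raw_capacity_pb(logical_pb: float, overhead: float) -> float:
    return logical_pb * overhead

logical = 2.0                         # 2 PB of actual data
hdfs = raw_capacity_pb(logical, 3.0)  # default HDFS 3x replication
obj = raw_capacity_pb(logical, 1.33)  # assumed erasure-coded object store
print(f"HDFS raw: {hdfs:.2f} PB, object store raw: {obj:.2f} PB")
```

    The gap only widens as the data set grows, which is why tiering cold data out of HDFS pays off so quickly.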

    Offloading Solution

    Every storage administrator is thinking about how they can reduce the cost of data storage while still getting the best performance out of their hardware. With this in mind, Apache Hadoop has been continually improving the concept of tiered storage, and in Hadoop 2.6 many improvements to the tiered storage concept have been added. These features allow you to attach a storage policy to a directory, categorize it as Hot, Warm, Cold, or Frozen, and define how many block replicas of the data to keep for that policy. Although storage administrators can reduce the number of copies of data they have to store, they still have the challenge of compute sitting idle. This is where offloading data outside of HDFS can offer huge benefits.

    How Can Object Storage Help Reduce My HDFS Footprint?

    Object storage offers significant cost savings to customers by increasing density and providing greater control over data. Offloading data from Hadoop to an object store like Hitachi Content Platform (HCP) enables customers to unlock a new, cheaper storage tier. The Hitachi Content Solutions engineering team is working with Alluxio to bring in-memory caching and object store efficiencies to existing big data challenges. Alluxio is a memory-speed, virtually distributed storage layer that enables any application to interact with any data from any storage at memory speed. With Alluxio and HCP, HDFS applications can virtualize object storage and move data from HDFS to object storage through a single protocol and interface.

    Why Hitachi Content Platform and Alluxio?

    When configuring Hitachi Content Platform as an understore or a mounted directory in the Alluxio filesystem, applications can simplify and expand their data ecosystem. In this environment, Hadoop applications can read and write data to and from the HCP and Hadoop filesystems. Applications can move data from HDFS to object storage as simply as moving data from one directory to another. With Alluxio caching, data can be recalled from HCP to the Alluxio in-memory file system on the Hadoop node, enabling memory-speed analytics with object store savings. With HCP and Alluxio, applications can unify data access protocols and offload cold data to cost-effective storage.


    Looking Ahead

    In the next blog in this series, I'll discuss how the new functionality in Hadoop 3.1 brings object storage closer to the Hadoop ecosystem and how future functionality will continue to simplify big data management. Read the next blog post.

    Check out our Demo Videos


    This guide is the first in a series explaining how to use open source ELK to visualize the performance of a system. This post includes instructions to install the ELK software. The second guide in the series, Performance Monitoring w/ ELK - Part II: Monitoring HCP Access Logs, gives instructions to configure HCP and your newly installed ELK software to visually monitor HCP. Following the instructions in these 2 posts, you can be visualizing HCP HTTP gateway access logs in under 2 hours. All you need to begin is a Linux server or workstation and a running HCP. Let's get started.


    "ELK" consists of three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch. Kibana is a visualization tool that lets users visualize data in Elasticsearch with charts and graphs.


    The following chart was generated in real time by transmitting HCP access logs to Logstash over the syslog protocol, indexing the logs in Elasticsearch, and visualizing them with Kibana. This chart visualizes transaction load distribution among HCP nodes.
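    If you want to exercise such a pipeline before pointing HCP at it, a few lines of Python can push a syslog-style test message over UDP to the Logstash listener. The message format and facility code below are assumptions for illustration; in production, HCP itself emits the real access logs:

```python
import socket

def send_syslog(message: str, host: str = "127.0.0.1", port: int = 514) -> int:
    """Send one syslog-style UDP datagram; returns the number of bytes sent."""
    # <134> = facility local0, severity informational
    payload = f"<134>{message}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))

# A fake HCP-style access-log line, purely for testing the pipeline:
sent = send_syslog('10.0.0.1 - - "GET /rest/obj1 HTTP/1.1" 200 1024')
print(f"sent {sent} bytes")
```

    Because this is UDP, the send succeeds even with nothing listening; watch the Logstash output (or the Kibana Discover view) to confirm the message actually arrived.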


    This first guide explains how to install the ELK stack on a Linux server that supports RPM based install (Redhat, Fedora, CentOS, SUSE, etc.). In subsequent posts, I’ll explain how to apply ELK to monitoring the performance of any Linux system as well as how to monitor the performance of several specific software systems like Pulse Secure vADC load balancer and Hitachi Content Platform.


    For my own ELK performance monitoring, I have been using a CentOS VM with 64G RAM and 12 vCPUs. This has been adequate for monitoring a 4-node HCP system under significant load. In addition, your system should have significant storage capacity, as the ELK indexes can get quite large. Note that this guide is not intended to give any guidance on sizing a production ELK environment or configuring your system for availability. These instructions are intended only for POC ELK environments.


    After you have installed the Linux OS (not covered here) on your host, you'll need to install Java, Elasticsearch, Kibana, and Logstash. The steps described below are specific to CentOS using YUM but should be easily translatable to other front-end software package managers, like DNF. Make sure that you have network connectivity between the ELK host and the systems you wish to monitor.


    Important: This monitoring solution is not provided or supported by Hitachi Vantara. If you are looking for a supported Hitachi Vantara monitoring solution, use the Hitachi Content Monitor (HCM). For more information, refer to this announcement in the Hitachi Content Intelligence space: Announcing Hitachi Content Intelligence v1.3.


    Installing ELK

    Step 1: Disable Firewall or Open Ports

    You can either disable your firewall entirely, or open the ports needed for Kibana, Elasticsearch, and any logstash listeners you configure.


    Examples on CentOS

    1a: To disable the firewall:

    systemctl disable firewalld
    systemctl stop firewalld

    1b: Or to open ports:

    firewall-cmd --zone=public --add-port=5601/tcp --permanent
    firewall-cmd --zone=public --add-port=9200/tcp --permanent
    firewall-cmd --zone=public --add-port=514/udp --permanent
    firewall-cmd --zone=public --add-port=515/udp --permanent
    systemctl restart firewalld
    • 5601: kibana web application
    • 9200: elastic rest API
    • 514: logstash syslog listener for HCP
    • 515: logstash syslog listener for vADC


    Step 2: Install Java

    1: Download the latest Java 8 SDK RPM from Oracle Technology Network.


    There are multiple ways you can get the rpm to your node. For convenience, here is a command to download the Java SE x64 Development Kit 8u172 directly to your ELK node via the command line. This may not work, as the file locations can change:

    # wget --no-cookies --no-check-certificate --header "Cookie:; oraclelicense=accept-securebackup-cookie;" ""


    2: Install Java by invoking the rpm installation:

    rpm -ivh jdk-8u172-linux-x64.rpm


    Step 3: Install Elasticsearch

    1: Create the file /etc/yum.repos.d/elasticsearch.repo with the following content:

    name=Elasticsearch repository for 6.x packages
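    The repository definition above lost its URLs in publishing. For reference, the standard Elastic 6.x yum repository file looks like the following; the URLs are the well-known Elastic package endpoints, but verify them against the current Elastic installation docs:

```
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
```

    The Kibana and Logstash repo files created in later steps point at the same baseurl; only the section header and name= line differ.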


    2: Install elasticsearch using yum:

    yum install elasticsearch


    3: Edit the file /etc/elasticsearch/elasticsearch.yml to make elasticsearch available to external IP. Set the following property: _<netadapter>:ipv4_,_local_

    Where <netadapter> is your network adapter for the IP on which you want to expose the service. You can use the ifconfig command to find your adapter. In my case the adapter is ens32 so my setting was: _ens32:ipv4_,_local_
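    The property name was lost above; the elasticsearch.yml setting being described here is network.host (confirm against the Elasticsearch network settings documentation). With my ens32 adapter, the resulting line looks like this:

```yaml
# /etc/elasticsearch/elasticsearch.yml
network.host: [_ens32:ipv4_, _local_]
```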


    4: Edit the file /etc/elasticsearch/jvm.options to give elastic jvm enough working memory (heap space). Set the following properties:



    This is one of those settings you will have to tune in a production configuration. On a very heavily loaded system with high transaction counts, expect to need many CPUs and a large Elasticsearch heap.
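    The heap values themselves were lost above. In /etc/elasticsearch/jvm.options they take the form below; 4g is just an illustrative POC starting point, not a sizing recommendation:

```
-Xms4g
-Xmx4g
```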


    5: Enable the service to start automatically on reboot, start the service, and verify that the service is running:

    systemctl enable elasticsearch
    systemctl start elasticsearch
    systemctl status elasticsearch


    Step 4: Install Kibana

    1: Create the file /etc/yum.repos.d/kibana.repo with the following content:

    name=Kibana repository for 6.x packages


    2: Install kibana using yum:

    yum install kibana


    3: Edit the file /etc/kibana/kibana.yml and set the bind address to <ipaddress> to make Kibana available on an external IP, where <ipaddress> is the IP on which you want to expose the service. You can use the ifconfig command to find your IP.


    elasticsearch.requestTimeout: 360000

    This is the time in milliseconds to wait for responses from the backend or Elasticsearch. Six minutes is more than enough to avoid having your queries time out.
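    Putting the kibana.yml edits from this step together, the file ends up with lines like these; server.host is Kibana's standard bind-address property, and the IP shown is a placeholder:

```yaml
# /etc/kibana/kibana.yml
server.host: "10.0.0.50"              # IP on which to expose Kibana
elasticsearch.requestTimeout: 360000  # 6 minutes, in milliseconds
```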


    4: Create the file /etc/kibana/jvm.options to give kibana jvm enough working memory (heap space). Use the following content:

    ## JVM configuration

    ## IMPORTANT: JVM heap size
    ## You should always set the min and max JVM heap
    ## size to the same value. For example, to set
    ## the heap to 4 GB, set:
    ## -Xms4g
    ## -Xmx4g
    ## See
    ## for more information

    # Xms represents the initial size of total heap space
    # Xmx represents the maximum size of total heap space
    -Xms4g
    -Xmx4g


    If by chance the file already exists, just set the -Xms and -Xmx properties.


    5: Enable the service to start automatically on reboot, start the service, and verify that the service is running:

    systemctl enable kibana
    systemctl start kibana
    systemctl status kibana


    Step 5: Install Logstash

    1: Create the file /etc/yum.repos.d/logstash.repo with the following content:

    name=Elastic repository for 6.x packages


    2: Install logstash using yum:

    yum install logstash



    Following this guide, you should be able to install and configure all of the ELK components needed to begin visualizing system performance. From beginning to end, the entire process should take under 1 hour.


    Check out the next guide in the series, Performance Monitoring w/ ELK - Part II: Monitoring HCP Access Logs. There you will find instructions to configure HCP for monitoring, and to configure your newly installed ELK components to visually monitor HCP HTTP gateway access logs.


    If you do choose to follow the instructions in this series I would love to hear from you. Please comment below to say you tried and it worked, or tried and failed, or to provide feedback which I can incorporate for future readers. Thanks!

    HCP chargeback reports contain valuable information that is useful for understanding HCP utilization and workloads. The problem is that the data can be overwhelming: trying to understand it in its tabular form is not humanly possible. What we need is a visual representation, but building charts and graphs is time-consuming, isn't it? Actually no; you can visualize chargeback report data in under 5 minutes using the PivotChart features in Excel. Read on to find out how.


    In the HCP System Management Console go to the Monitoring => Chargeback page. Select the range of dates you would like to report and choose Hour or Day reporting interval. Hour is recommended if you would like to be able to identify time of day peaks in application activity.

    Click the Download Report button and open the downloaded report.csv file in Excel. The example shown uses Microsoft Excel 2016, but the pivot chart features have existed since at least 2008. Save the file as an .xls/.xlsx file; the .csv format cannot save the changes you will be making.


    Click anywhere inside the data in your spreadsheet and then, in the Insert menu, select PivotChart.

    On the Create PivotChart dialog, your data has automatically been fully selected, and it will default to placing your PivotChart in a new sheet. Accept the defaults and click OK.

    Excel will create a new sheet in your workbook, into which it will insert a new PivotTable (green) and a new PivotChart (red). Both Pivot objects are empty because we have not added anything to them just yet. If you do not see a dialog window titled "PivotChart Fields", just click anywhere inside the PivotChart area. If you still do not see it, right-click in the PivotChart area and select "Show field list".

    In the PivotChart Fields dialog you will select the fields from the data that you want to visualize, and how you want to represent and filter that data. Typically you will want to be able to filter by tenantName and namespaceName, so go ahead and drag those fields into the 'Filters' section. We also typically like to display the activity in these reports over time, so go ahead and drag startTime into the 'Axis (Categories)' section. Finally, you need to decide what data values you want to visualize; for our first chart let's visualize the number of read and write operations, so we will drag reads and writes into the 'Values' section.


    Notice that after we drop the fields in the Values section, the text changes to "Sum of <field>". This is because we have to specify field aggregation rules. We need these rules if, for example, we decide to roll our results up by month instead of day or hour. In some cases we may wish to show the sum of the values (the default), or the average, or perhaps the min or the max. To change the aggregation rule for a field, click the down arrow next to the field in the Values section, click "Value Field Settings...", and select your aggregation rule. For now we are going to leave our aggregation at Sum.

    If you have been following along you should now see something like what you see below. Your PivotTable (green) now has data in it, as does your PivotChart (red). The first thing you may notice is that your nice daily or hourly chargeback data has been rolled up by month. You may also notice that the numbers shown are much higher than what the system is actually doing. Let's start by addressing the inflated numbers.

    Because of the way chargeback reports roll up the data by tenant (tenant not blank, namespace blank) and by system (tenant and namespace blank), if you aggregate the whole report you will be triple counting values: once in the namespace, once in the tenant rollup, and a third time in the system rollup. For our example let's just look at system numbers, so we will filter on tenantName=blank. In the upper left corner of your PivotChart click on tenantName.

    And in the selection dialog select '(blank)' and click OK.
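    The same system-level filter is easy to reproduce outside Excel. Here is a small Python sketch that keeps only the rows where both tenantName and namespaceName are blank; the column names match the chargeback report, but the sample rows are invented for illustration:

```python
import csv
import io

# Invented sample resembling an HCP chargeback report: two namespace rows,
# a tenant rollup (namespaceName blank), and a system rollup (both blank).
report = """tenantName,namespaceName,startTime,reads,writes
t1,ns1,2018-11-01T00:00,100,50
t1,ns2,2018-11-01T00:00,200,70
t1,,2018-11-01T00:00,300,120
,,2018-11-01T00:00,300,120
"""

def system_rows(csv_text: str):
    """Return only the system-level rollup rows (tenant and namespace blank)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [r for r in rows if not r["tenantName"] and not r["namespaceName"]]

rows = system_rows(report)
print(sum(int(r["reads"]) for r in rows))  # each value counted once, not thrice
```

    Filtering this way before aggregating avoids the triple counting described above, just as selecting '(blank)' does in the PivotChart filter.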

    Now that we are no longer triple counting data, let's fix the time rollup issue. In the PivotTable, right-click on any month value, 'Nov' in the example below.

    Select 'Group' from the context menu, and in the Grouping dialog box select Hours, Days, and Months and click OK.

    Now you will see your data reported by hour.

    A bar chart may be difficult to digest, so let's convert to a different chart type. In the PivotChart Tools Design menu click "Change Chart Type".

    In the Change Chart Type dialog box, choose line and click OK.


    You now know enough to get started and to quickly visualize HCP workload information. You can see from the examples above that you can quickly choose different field values to visualize, filter the data by tenant or namespace, and choose the right chart type to graphically represent your data.

    The Hitachi Content Platform is a great object storage platform. It provides the advantages of traditional cloud providers inside your own datacenter. This article describes how to perform several common HCP tasks via PowerShell so that you can automate them.


    Connect to HCP via PowerShell

    Generating Secure Access Keys and Secret Keys for the HCP Platform with Microsoft .Net for C# and PowerShell

    The Hitachi Content Platform is a great object store for your datacenter. To use it programmatically, you need an Access Key and a Secret Key, just like with every other object storage. This quick blog post will show you how to create those keys with PowerShell or C#.






    #################################################################
    #                                                               #
    # Name: Get-HS3AccessKey.ps1                                    #
    # Author: Carlos Vargas                                         #
    # Version : 1.0                                                 #
    # Contact : carlos dot vargas at hds dot com                    #
    # Note: Script to convert information for HS3 API for HCP       #
    #                                                               #
    #################################################################




    # Intro


    Write-Host "HCP HS3 Access Key Conversion Tool"

    Write-Host ""

    Write-Host ""


    # Function to convert the account name to Base64
    function ConvertTo-Base64($string) {

       $bytes  = [System.Text.Encoding]::UTF8.GetBytes($string);

       $encoded = [System.Convert]::ToBase64String($bytes);

       return $encoded;
    }



    # Function to convert the password to an MD5 hex digest
    Function Get-StringHash([String] $String, $HashName = "MD5")
    {
        $StringBuilder = New-Object System.Text.StringBuilder

        # Hash the UTF8 bytes of the string and append each byte as lowercase hex
        [System.Security.Cryptography.HashAlgorithm]::Create($HashName).ComputeHash([System.Text.Encoding]::UTF8.GetBytes($String)) | ForEach-Object {
            [Void]$StringBuilder.Append($_.ToString("x2"))
        }

        return $StringBuilder.ToString()
    }

    # Get Tenant Account Name and Password

    $accesskey = Read-host  "Type the HCP Tenant Account Name"

    $SecretKey = Read-host  "Type the HCP Tenant Account Password" -AsSecureString


    # Convert Secret to plain text

    $BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($SecretKey)

    $PlainPassword = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR)


    # Send values to formulas

    $aktemp =  ConvertTo-Base64($accesskey)

    $sktemp =  Get-StringHash($PlainPassword)


    # Output values

    Write-Host ""

    Write-host "The HS3 Access Key for $accesskey is: " $aktemp -ForegroundColor Yellow

    Write-host "The HS3 Secret Key for $accesskey is: " $sktemp -ForegroundColor Green

    Write-Host ""

    Write-Host ""
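    If you want to sanity-check the output, the same two transformations are a one-liner each in Python; the account name below is an example, not a real account:

```python
import base64
import hashlib

def hs3_keys(account: str, password: str):
    """HCP HS3 credentials: access key = Base64(account), secret key = MD5 hex(password)."""
    access = base64.b64encode(account.encode("utf-8")).decode("ascii")
    secret = hashlib.md5(password.encode("utf-8")).hexdigest()
    return access, secret

access, secret = hs3_keys("lgreen", "p@ssw0rd")
print(access)  # bGdyZWVu
print(secret)
```

    Whichever language you use, the results must match, since the HS3 gateway always expects Base64 of the account name and the MD5 hex digest of the password.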






    This is a quick blog to help PowerShell developers and administrators leverage the power of HCP via PowerShell.


    1. Install the Hitachi Storage Adapter for Microsoft Windows PowerShell



    2. Open a Windows PowerShell ISE Window.



    3. To add the HDS PowerShell snap-ins to your PowerShell session, follow these steps.

    # Add Hitachi Data System PowerShell SnapIn

    Write-Host ""

    Write-Host "-------------Importing Hitachi Data System PowerShell Module --------------" -BackgroundColor White -ForegroundColor Black

    Add-PSSnapin "Hitachi.Storage.Management.Powershell2.Admin"

    Add-PSSnapin "Hitachi.Storage.Management.Powershell.Admin.HCP"





    4. Next let us add a function to convert the password into the correct format needed to authenticate against HCP.

    # Functions


    function ConvertFrom-SecureToPlain {


        param( [Parameter(Mandatory=$true)][System.Security.SecureString] $SecurePassword)


        # Create a "password pointer"

        $PasswordPointer = [Runtime.InteropServices.Marshal]::SecureStringToBSTR($SecurePassword)


        # Get the plain text version of the password

        $PlainTextPassword = [Runtime.InteropServices.Marshal]::PtrToStringAuto($PasswordPointer)


        # Free the pointer
        [Runtime.InteropServices.Marshal]::ZeroFreeBSTR($PasswordPointer)

        # Return the plain text password
        return $PlainTextPassword
    }




    5. Now let us add a few variables for the HCP system name, username, and password

    # Variables

    $HCPFQDN = Read-Host "Please Type the FQDN or DNS Name of your HCP System. Ex."

    $HCPAdminUser = Read-Host "Please type your administrator account username for the HCP System"

    # Capture the password as a SecureString so it is not stored in plain text
    $HCPAdminPass = Read-Host "Please type your administrator account password for the HCP System" -AsSecureString



    6. Let's call the ConvertFrom-SecureToPlain Function and then connect to the HCP

    # Convert the password from a SecureString to the appropriate format

    $HCPAdminPassUnEncrypted = ConvertFrom-SecureToPlain($HCPAdminPass)


    # Connect to HCP

    $ConnectToHCP = Add-HCP -HCPSystem $HCPFQDN -UserID $HCPAdminUser -Password $HCPAdminPassUnEncrypted



    7. Now let's retrieve the HCP system details and store them in a variable called $HCP

    # Get HCP details

    $HCP = Get-HCP




    8. Now we can execute our script and connect to the HCP System and get some basic information like the current tenants in the HCP.




    Hope this has been helpful and that you can use the code to create great scripts.