Splunk Implementation Guide

By R J David Burke posted 10-11-2020 23:15


Hitachi Analytic Infrastructure 002: Splunk Implementation Overview

Owned by David Pascuzzi

Revision History

2020-8-21 | 1 | Initial Release | David Pascuzzi

Background of the Tech Note

The purpose of this tech note is to provide a basic understanding of Splunk and of the deployment of a Splunk cluster. This is not an implementation guide. The customer will install and configure Splunk.
Splunk is a large but relatively little-known software company. During its FY 2020, Splunk had $2.9 billion in revenue. It currently has 19,000 installations, which vastly outweighs Cloudera. Its deployments vary from a single machine to hundreds of machines spread across multiple sites.

Basic Splunk Components

Splunk has two main software components: Splunk Enterprise and Splunk Universal Forwarder.
Splunk Universal Forwarder is installed on existing computers. It gathers data from the local machine and then forwards that data. It does not store or process data. There can be thousands of universal forwarders in a deployment, spread out across the world. The universal forwarder can be installed on a handful of operating systems:
  • Linux Kernels 3.x and 4.x
  • Apple macOS
  • Microsoft Windows 10, Microsoft Windows Server 2016 and 2019
Splunk Enterprise is the binary that is used for Splunk Search Heads, Splunk Indexers, and Splunk Heavy Forwarders. All three roles use the same binary, and an instance in one role can perform the actions of any of the others.
  • Search Heads are the instances that users use to issue queries, view reports, and so forth.
  • Heavy Forwarders gather data from either the local machine or a remote machine. They can store and filter the data before passing it on. They are useful for monitoring systems, like switches, that a universal forwarder cannot be installed on. Heavy forwarders still do not require much in the way of resources. They may be deployed on existing machines, VMs, or new machines. They should be located near the data being monitored.
  • Indexers are the data store and processing computers. They can be standalone, loosely clustered, or clustered with a master computer in charge.

System Requirements

Supported file systems on Linux: ext3, ext4, btrfs, XFS, and NFS 3 and 4.
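As a rough illustration, the file system list above can be checked before installation. This is a minimal sketch, not a Splunk tool: the function names is_supported_fs and fs_type_of are hypothetical, and fs_type_of assumes GNU df (the --output option) and that df reports NFS mounts as nfs, nfs3, or nfs4.

```sh
# Hypothetical helper: succeeds if the given file system type
# is one of the types named in the list above.
is_supported_fs() {
    case "$1" in
        ext3|ext4|btrfs|xfs|nfs|nfs3|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the file system type of the mount
# that holds the given path (assumes GNU df).
fs_type_of() {
    df --output=fstype "$1" | tail -n 1
}

# Example use (the path is an assumption, adjust to your layout):
#   is_supported_fs "$(fs_type_of /opt)" && echo "file system is supported"
```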
As of Splunk 8, released 22 Oct 2019, Splunk no longer supports Python 2.
Splunk Enterprise is managed through a browser interface. Splunk Enterprise supports the following browsers:
  • Mozilla Firefox (latest)
  • Microsoft Internet Explorer 11 (Splunk Enterprise does not support this browser in compatibility mode.)
  • Apple Safari (latest)
  • Google Chrome (latest)

Storage Configuration

Storage configuration, in terms of number of drives, space, and RAID, depends on the component and on user requirements. Usually you will use as many drives as the chassis holds.
  • Search heads. Usually this is one set of drives.
      • This is used for the operating system and the Splunk deployment.
      • This uses disk mirroring.
  • Forwarders. Usually this is one set of drives.
      • This is used for the operating system and the Splunk deployment.
      • This uses disk mirroring.
      • If searching, indexing, or augmenting of the data is done, there could be extra storage.
      • Storage configuration is TBD. It depends on the number of drives and how much storage the customer wants.
  • Indexers
      • Operating system: two drives with disk mirroring.
      • On Splunk without SmartStore, hot and warm data share the same storage devices.
      • SmartStore is a Splunk feature that uses external storage for warm and cold data.
      • Hitachi provides a solution for this based on Hitachi Content Platform.
      • RAID configurations depend on the number of drives and customer requirements.
      • Standard option 1 is one large partition for all data temperatures.
      • Standard option 2 is one partition for hot and warm data with a second partition for cold data.
      • If there are two different drive types, the faster type is used for hot and warm storage.
      • Because Splunk moves data automatically, it may be wise to place hot and warm data on different devices than cold data.
      • Cold and frozen data can reside on NFS or other remote storage.
      • Frozen data can be stored alongside cold data or can be stored remotely.
      • If a thawed data area exists, it is normally on the cold devices.
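To make standard option 2 concrete, the sketch below prints an indexes.conf stanza that places hot and warm buckets on one mount and cold and thawed buckets on another. The settings homePath, coldPath, and thawedPath are indexes.conf settings. The function name print_index_stanza, the index name, and the mount points are assumptions for illustration only, and a real deployment would size and name these to suit the customer.

```sh
# Hypothetical helper: print an indexes.conf stanza for one index.
#   $1 = index name
#   $2 = mount point for hot and warm data (assumed faster storage)
#   $3 = mount point for cold and thawed data (assumed larger storage)
print_index_stanza() {
    index_name=$1
    hot_root=$2
    cold_root=$3
    cat <<EOF
[$index_name]
homePath   = $hot_root/$index_name/db
coldPath   = $cold_root/$index_name/colddb
thawedPath = $cold_root/$index_name/thaweddb
EOF
}

# Example use (paths are placeholders):
#   print_index_stanza web /data/hot /data/cold
```

Keeping thawedPath under the cold mount follows the note above that a thawed data area is normally on the cold devices.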
Splunk’s minimum storage I/O requirements are based upon using SATA drives. SAS configurations will surpass them. SSD and NVMe drives will be even faster.
For details on designing a cluster and different cluster layouts see Splunk Validated Architectures (PDF), Splunk Enterprise Capacity Planning Manual, and Splunk Distributed Deployment Manual.
Figure 1 shows a sample configuration of a multi-site deployment with multiple active-active indexing clusters and a shared search head cluster. A load balancer is used to spread the data from the collection tier to the indexers in multiple sites.
Figure 1

Often the data collection tier will be more complex and will include a Kafka layer between the data and the Splunk cluster. Figure 2 depicts one such deployment. A Kafka cluster sits in front of Splunk forwarders that use Splunk’s Kafka data reader. From there, some of the data is passed directly to the indexers. Other data is passed from Kafka to Splunk’s HTTP Event Collector (HEC), which then passes it to the indexers. Note that two different strategies are used here to go from Kafka to the Splunk indexers. A deployment can have multiple strategies. The data sources, indexing requirements, and querying requirements are a few of the factors that determine which method is used.
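The HEC hop in that flow is an HTTP POST of a JSON event. The sketch below shows the shape of such a request with curl, not the Kafka integration itself. The function names hec_payload and send_to_hec and the variables HEC_URL and HEC_TOKEN are hypothetical placeholders, and the payload builder assumes the message contains no characters that need JSON escaping.

```sh
# Hypothetical helper: build a minimal HEC JSON payload.
# Assumes $1 (event text) and $2 (sourcetype) contain no quotes,
# backslashes, or newlines, since no JSON escaping is done here.
hec_payload() {
    printf '{"event": "%s", "sourcetype": "%s"}' "$1" "$2"
}

# Hypothetical helper: post one event to HEC.
# HEC_URL (for example https://splunk.example.com:8088) and HEC_TOKEN
# are placeholders that the caller must set.
send_to_hec() {
    curl --silent --show-error \
        --header "Authorization: Splunk $HEC_TOKEN" \
        --data "$(hec_payload "$1" "$2")" \
        "$HEC_URL/services/collector/event"
}

# Example use:
#   HEC_URL=https://splunk.example.com:8088 HEC_TOKEN=... send_to_hec "disk full" syslog
```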
Figure 2


Installation of Splunk software

Download the installation package for your platform and run the appropriate command.
  • rpm -i splunk_package_name.rpm
  • dpkg -i splunk_package_name.deb
  • tar xvzf splunk_package_name.tgz -C /opt
  • On Microsoft Windows, run the MSI file.
At this point, the software is ready to run.
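The choice of command above depends only on the package format, so it can be sketched as a small helper. The function name install_command is hypothetical. It only prints the command from the list above rather than running it, so that it can be reviewed first.

```sh
# Hypothetical helper: print the install command that matches
# the package file extension, using the commands listed above.
install_command() {
    case "$1" in
        *.rpm) echo "rpm -i $1" ;;
        *.deb) echo "dpkg -i $1" ;;
        *.tgz) echo "tar xvzf $1 -C /opt" ;;
        *)     echo "unsupported package: $1" >&2; return 1 ;;
    esac
}

# Example use:
#   install_command splunk_package_name.rpm
# After installing, Splunk can typically be started with the splunk
# command in the bin directory of the install location, for example
# /opt/splunk/bin/splunk start (the path is an assumption).
```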

Configuration of Splunk software

The actual configuration has these options: 
  • Single site
  • Multisite
  • Clustered or non-clustered search heads
  • Clustered or non-clustered indexers, with either a master indexer or load balancers
  • Kafka as an intermediate layer, or passing the data in directly
  • High availability and redundancy
  • Forwarder location
It is more complex than installing an SAP HANA scale-out cluster with high availability and disaster recovery.
The first time you run the software, it will prompt for some configuration information on each computer running Splunk Enterprise. 
Splunk is configured through the web console by connecting to a master Splunk instance or deployment instance, by connecting to each Splunk instance, or, in some cases, both.
When an application is deployed, it can be deployed and configured by logging in to each instance directly, through master or deployment instances that push it out to the other nodes, or through a combination of the two.
Universal forwarders need to be configured on each instance. 
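As an illustration of that per-instance configuration, a forwarder is told where to send data through outputs.conf. The sketch below prints a minimal outputs.conf using the tcpout stanza with its defaultGroup and server settings. The function name print_outputs_conf, the group name, the host names, and port 9997 are assumptions chosen for the example.

```sh
# Hypothetical helper: print a minimal outputs.conf for a forwarder.
#   $1 = name of the target group
#   $2 = comma-separated list of indexer host:port pairs
print_outputs_conf() {
    group_name=$1
    server_list=$2
    cat <<EOF
[tcpout]
defaultGroup = $group_name

[tcpout:$group_name]
server = $server_list
EOF
}

# Example use (hosts and port are placeholders):
#   print_outputs_conf primary_indexers idx1.example.com:9997,idx2.example.com:9997
```

With thousands of forwarders, this kind of file would more likely be pushed from a deployment instance than written by hand on each machine.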
Setting up a cluster and getting anything running on it is more of a consulting service than a deployment service. 
