
SAP HANA is an in-memory database. New and changed data is written first to the SAP HANA memory, from where it is copied to the database persistence.

 

SAP HANA protects the database from power failures using its log area, frequent log area backups, and database savepoints (periodic save-to-disk of the newest data in the database). These all help to ensure that when the lights go out, the database is in a consistent state and can be safely recovered when the server is switched back on.

Unfortunately, what SAP HANA cannot protect against is a loss of the database persistence itself. Protection from the loss of database persistence requires regular database backups.

 

Hitachi Data Systems provides a backup tool for regular database backups: Hitachi Data Instance Director (HDID). This tool fully integrates with single-node SAP HANA databases to provide database snapshot-based backups on Hitachi Thin Image pools on Hitachi storage systems.

 

Data Instance Director can be used with many applications. Using HDID alongside a SAP HANA database allows the following three main functions:

  • Database backup and restore using HDID snapshots
  • Database backup and restore using HDID snapshots, plus SAP HANA log backup replay
  • Database copy (production to non-production, for example), using HDID snapshots

 

Database Backup, Restore Points, and Data Loss

One of the most important concepts to consider when implementing a backup policy is the recovery point objective — or RPO. This concept determines how much data might be lost in case of failure. If the RPO is one hour, then up to one hour’s data might be lost. Backing up with a lower RPO protects the database better, but also creates a higher load on the system. The following graphic describes the different backup and restore options available in a SAP HANA system.

 

[Figure: SAP HANA backup and restore options]

Source: SAP

 

The illustration above shows that, in case of failure, the complete database can be restored from the following:

  • A database snapshot or a full backup
  • Log area backups
  • The log area itself, for the log entries not present in the latest log area backup

 

HDID can protect your SAP HANA database by creating backups of the first two of these items.

 

Database Backup and Restore Using HDID Snapshots

Hitachi Data Instance Director can create a snapshot of the SAP HANA data area using the native savepoint feature of SAP HANA itself, which adds very little overhead to the running system. By default, SAP HANA writes such a savepoint every five minutes. These savepoint snapshots are then copied to a Hitachi Thin Image pool on the attached Hitachi storage system.

 

If a database restore becomes necessary, then HDID can do the following:

  • Stop the running SAP HANA instance
  • Revert the SAP HANA data area to the contents of the snapshot

 

The database can then be restored using the standard SAP HANA restore tools (SAP HANA Studio or the command line). Snapshot-based backups allow the database to be restored to the date and time the snapshot was taken.
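
The exact commands depend on the SAP HANA revision and installation layout, but as a rough, hedged sketch of that final step (a single-container system with a hypothetical SID HDB and instance 00; the recoverSys options should be verified against the SAP HANA administration guide), the recovery might be driven like this:

```python
# Illustrative sketch only: the final recovery step after HDID has stopped the
# instance and reverted the data area to the snapshot. SID, instance number,
# paths and the recoverSys option syntax are assumptions to verify.
import subprocess

SID = "HDB"                                   # hypothetical system ID
EXE_DIR = f"/usr/sap/{SID}/HDB00/exe"         # instance 00 assumed

recover_cmd = (
    f"{EXE_DIR}/HDBSettings.sh recoverSys.py "
    "--command='RECOVER DATA USING SNAPSHOT CLEAR LOG'"
)

# Run as the <sid>adm operating system user.
subprocess.run(["su", "-", f"{SID.lower()}adm", "-c", recover_cmd], check=True)
```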

 

Database Backup and Restore Using HDID Snapshots, Plus SAP HANA Log Backup Replay

For a more fine-grained backup policy, use Hitachi Data Instance Director to back up the SAP HANA log backup files. Every five minutes or so, SAP HANA makes a backup of its redo log area. If these log backup files are present on the SAP HANA system, they allow the database to be restored to a more fine-grained point in time than the snapshot date and time alone.

 

Using HDID to create backups of these log area backup files, alongside regular database snapshots, can allow the effective RPO of the backup to be as low as five minutes. It is not possible to back up the SAP HANA log area itself. This limitation exists in SAP HANA, because the database locks the files for its exclusive use. Only the log area backups can be saved by HDID.

 

Database backup using snapshots plus log area backups allows more fine-grained backup and restore times. A database restore then consists of the following steps:

  • Restore the SAP HANA data area from the snapshot using HDID
  • Restore the log area backup files to their original locations (or another directory, if preferred), again using HDID
  • Perform the database restore using the standard SAP HANA tools

 

The database snapshot is the basis for the restore. The log backups taken after the snapshot date and time are read automatically by the restore process.
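
To make the last step concrete, here is a hedged sketch of the kind of RECOVER statement involved, shown for a tenant database driven from the system database with hdbsql; the tenant name, credentials, timestamp, and the exact RECOVER clauses are assumptions and should be checked against the SAP HANA Administration Guide for your revision:

```python
# Illustrative sketch: point-in-time recovery after HDID has reverted the data
# area to a snapshot and restored the log backup files to their original paths.
# Instance number, tenant, credentials and RECOVER clauses are assumptions.
import subprocess

RECOVER_SQL = (
    "RECOVER DATABASE FOR TEN UNTIL TIMESTAMP '2017-03-01 12:00:00' "
    "USING SNAPSHOT"   # log backups taken after the snapshot are replayed automatically
)

subprocess.run(
    ["hdbsql", "-i", "00", "-d", "SYSTEMDB", "-u", "SYSTEM", "-p", "<password>",
     RECOVER_SQL],
    check=True,
)
```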

 

Database Copy Using HDID Snapshots

While the main use of a database backup is to protect the installed applications from catastrophic data loss, one additional use case is system copy.

 

During a project lifecycle, it can be useful from time to time to refresh development and preproduction environments with a copy of the production database. Then you can use this copy for performance tests on comparable data volumes in the preproduction environment. Or, you can perform regression tests on real-life situations without impacting the production system.

 

Hitachi Data Instance Director can make the SAP HANA database snapshot visible to a second host. This makes replacing the SAP HANA data area in development or preproduction with the contents of the snapshot a simple operation. You can do this with only a couple of commands under Linux.
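
What those "couple of commands" look like depends entirely on the storage layout; as a purely illustrative sketch (the device name and mount point are hypothetical), the sequence on the target host might resemble:

```python
# Purely illustrative: mounting a snapshot that HDID has presented to a
# development host in place of that host's SAP HANA data area.
# The device name and mount point are hypothetical; adapt to your layout.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["rescan-scsi-bus.sh"])                                   # detect the newly presented LUN (sg3_utils)
run(["mount", "/dev/mapper/hana_data_copy", "/hana/data"])    # mount it over the data area
```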

 

Conclusion

Using Hitachi Data Instance Director, you can implement efficient backup and restore policies for your SAP HANA database, and even use it to make copies of your SAP HANA database.

 

To learn more about using Hitachi Data Instance Director on a scale-up SAP HANA database, please check the following implementation guide from Hitachi Data Systems: https://www.hds.com/en-us/pdf/white-paper/hdid-backup-for-scale-up-databases-on-sap-hana.pdf

The SAP HANA database keeps the bulk of its data in memory. It uses persistent storage to provide a fallback in case of failure. However, if the persistent storage is damaged, you need snapshots of the database for recovery. The ability to quickly restore a real-time data processing system, such as the SAP HANA database, translates directly into revenue for your business.

 

In this post, we discuss how to leverage Hitachi Storage Adapter for the SAP HANA Cockpit, storage management software that simplifies IT management. When used for infrastructure management, Storage Adapter for SAP HANA Cockpit uses Hitachi Thin Image (HTI) and Hitachi ShadowImage heterogeneous replication:

  • Hitachi Thin Image creates a storage snapshot of the data volumes. You can restore a SAP HANA database using this replica snapshot.
  • Hitachi ShadowImage heterogeneous replication creates a replica of a logical volume in the same storage system without involving the host. ShadowImage enables concurrent operations, such as backup and batch processing, with minimal impact on online operation continuity.

 

This solution is for use by IT administrators, database administrators, storage administrators, SAP HANA administrators, and architects implementing snapshot, recovery, and cloning solutions.

 

Incremental Software and Hardware Components

As additional components to an existing scale-up implementation, you need the following.

 

Hitachi Storage Adapter for SAP HANA Cockpit

Hitachi Storage Adapter for SAP HANA Cockpit is free-to-download software, which runs as a web application. When needed, the adapter calls the web service on the management server of the SAP HANA platform scale-up configuration. Then, the web service communicates to the attached Hitachi storage and returns the information to the adapter.

 

In addition, the web service collects device-mapping information from the SAP HANA host by using SSH. The web service executes SQL queries for retrieving information from the SAP HANA database.
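
As an illustration of that mechanism, the following minimal sketch queries a SAP HANA monitoring view with the hdbcli Python driver; the connection details are placeholders, and the specific queries the adapter's web service runs are not published, so this only shows the general approach:

```python
# Minimal sketch: retrieving volume information from SAP HANA monitoring views
# with the hdbcli driver, the same general mechanism the web service uses.
# Host, port and credentials are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=30015, user="SYSTEM", password="<password>")
cursor = conn.cursor()
cursor.execute("SELECT * FROM M_VOLUMES")   # data and log volume layout per service
for row in cursor.fetchall():
    print(row)
conn.close()
```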

 

The web service manages all configuration information, storing it in the local disk of the management server. Then, the adapter passes configuration information to the web service.

 

Hitachi ShadowImage Heterogeneous Replication

The high-speed, nondisruptive local mirroring technology of Hitachi ShadowImage heterogeneous replication rapidly creates multiple copies of mission-critical information within all Hitachi storage systems.

 

ShadowImage heterogeneous replication keeps data RAID-protected and fully recoverable, without affecting service or performance levels. You can split replicated data volumes from the host applications and use them for system backups, application testing, and data mining applications, while business continues to run at full capacity.

 

Hitachi Thin Image

The high-speed, nondisruptive snapshot technology of Hitachi Thin Image snapshot software rapidly creates up to one million point-in-time copies of mission-critical information within any Hitachi storage system or virtualized storage pool without impacting host service or performance levels.

 

Because snapshots store only the changed data, the volume of storage capacity required for each snapshot copy is substantially smaller than the source volume. As a result, Thin Image snapshot software can provide significant savings over full volume cloning methods.

 

Thin Image snapshot copies are fully read/write compatible with other hosts. They can be used for system backups, application testing, and data mining applications while the business continues to run at full capacity. Hitachi Storage Adapter for SAP HANA Cockpit uses Thin Image to create snapshots.

 

Additional Hard Drives

You need additional hard drives to create a storage snapshot of the SAP HANA data volumes using Hitachi Thin Image or to create an exact replica of the logical volumes using Hitachi ShadowImage heterogeneous replication.

 

Use Cases and Best Practices

The following are the use cases and best practices for Hitachi Storage Adapter for the SAP HANA Cockpit in Hitachi Unified Compute Platform for SAP HANA in a scale-up configuration, using storage management software to simplify IT management.

 

Replicate Using Hitachi Thin Image

  • Use Hitachi Thin Image for backup to create a replica of the specified logical volume using a snapshot. Using Thin Image, you can create and restore a replica from an arbitrary point in time.
  • The capacity used to store a snapshot can be smaller than the storage needed to copy and store the entire volume data. This is because the snapshot data in the pool volume only stores data that is different from what is in the specified logical volume.
  • Hitachi Storage Adapter for SAP HANA Cockpit uses Thin Image to create snapshots of the SAP HANA database.

Replicate Using Hitachi ShadowImage Heterogeneous Replication

  • Hitachi ShadowImage heterogeneous replication creates a replica of a logical volume (a pair) in the same storage system without involving a host. This software enables concurrent operations, such as backup and batch processing, with minimal impact on online operation continuity. The ShadowImage pair performs all copy operations asynchronously after receiving the pair operation command.

Recover and Restore the Operating System LUN

  • Use Hitachi ShadowImage heterogeneous replication to create a local replica (S-VOL) of the operating system LUN (P-VOL) in the same storage. This replica can be used to restore the operating system in case of failure.

Recover and Restore the SAP HANA Database Using a Storage Snapshot

  • Create a SAP HANA storage snapshot to recover the SAP HANA database after a failure. Hitachi Storage Adapter for SAP HANA Cockpit uses Hitachi Thin Image to create a storage snapshot of the data volumes. Restore the SAP HANA database using this replica for a point in time recovery.

Copy and Clone the SAP HANA Database with Hitachi ShadowImage Heterogeneous Replication

  • Use Hitachi ShadowImage heterogeneous replication to create local replicas (S-VOLs) of the following LUNs (P-VOLs) in the same storage:
    • Operating system
    • SAP HANA shared
    • SAP HANA log
    • SAP HANA data
  • You can use these replicas to copy or clone the SAP HANA system.

 

 

Learn More

To learn more, see the following resources:

  Hitachi Storage Adapter for the SAP HANA Cockpit in Hitachi Unified Compute Platform for the SAP HANA Platform in a Scale-Up Configuration Best Practice Guide (PDF)

Analysts, software companies, and even some end users are convinced that every microgram of data counts. It must be retrieved from wherever it “lives”, no matter what it looks like. Then, it has to be housed long term in a place that guarantees the best analysis conditions.

 

There are, of course, several real-life targets for that, such as Smart City, Smart Factory, Predictive Maintenance, Intelligent Vehicle, and Precision Agriculture. Some may sound futuristic whereas others are already part of our lives.

 

From a terminology perspective, we have to deal with Natural Language Processing, Sentiment Analysis, IoT, Data Streaming, Event Processing, Machine Learning, Predictive Analytics… and of course, “Big Data”.

 

To achieve that, on the technical side we need tools and storage. In the SAP world we have those, of course: HANA (and IQ). There is nothing to say against the quality of the tools, but the cost of the storage itself is daunting: as you remember, we work (mostly) in memory! So, the idea is to put the huge amounts of data into a place that is tightly integrated with the SAP world. This place is Hadoop.

 


In short, Hadoop is an Apache (and therefore ~free) project for storing and handling big amounts of data. The technical inspiration came from Google, which published the MapReduce paper in 2004. As for the less technical part, and how the name came to be: at the time, the son of Doug Cutting (Hadoop's creator) had a toy elephant named Hadoop.

 

SAP use cases

 

SAP supports Hadoop in some technical and business scenarios.

 

[Figure: The Data Lifecycle Manager moving cold data from SAP HANA to Hadoop]

The DWF v1.0 (Data Warehousing Foundation) is a set of tools for data management. Among these tools, the DLM (Data Lifecycle Manager) is able to move data according to its “temperature”. Cold data, i.e. infrequently used data, can be stored according to predefined rules in a “cold store”, which can be IQ, the HANA Dynamic Tiering engine, or Hadoop.

 

The red arrows and shapes show the data movement, which is customized (“what to move and where”) and scheduled (“when to move”) in HANA, in an application running on top of the internal application server (the XS engine). In this example, data is moved from HANA to Hadoop.

 

The black arrows show the path of an SQL query fetching data. This happens through a “Union View”, which makes the underlying physical structures transparent to the application.
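
Conceptually, the generated union view resembles the following sketch; all names are invented for illustration, and DLM generates and maintains the real view itself:

```python
# Conceptual sketch (all names invented): the union view combines the in-memory
# (hot) table with a virtual table pointing at the Hadoop cold store, so
# applications keep querying a single name wherever the rows physically live.
UNION_VIEW_SQL = """
CREATE VIEW "MYSCHEMA"."SALES_ALL" AS
    SELECT * FROM "MYSCHEMA"."SALES_HOT"      -- hot rows, still in SAP HANA
    UNION ALL
    SELECT * FROM "MYSCHEMA"."SALES_COLD_VT"  -- virtual table on the Hadoop cold store
"""
```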

 

This mechanism is purely database oriented. In other words, it is a table-oriented tool with no application-level intelligence, meaning that it is not possible to ship purchase orders (for example) from HANA to Hadoop. A purchase order is a business object spread over many tables. To be able to ship it from one place to another, we need knowledge similar to that of the archiving process (relationships between tables and even other related objects).

 

Hopefully the data aging process of S/4HANA will be able to use it in future versions.

 

SAP CAR (Customer Activity Repository), a retail-industry-oriented application, is also among the processes and applications for which SAP has documented how to couple it with Hadoop. One of the functions of SAP CAR is to integrate the sales data from the stores, aggregate it, and then send the aggregated results to the ERP (sales orders, goods movements, and so on).

 

The non-aggregated remaining data could certainly be reused, for example for behavior analysis and forecasting purposes. The problem is of course the volume. And here comes Hadoop again.

 

[Figure: Offloading SAP CAR data to Hadoop]

SAP explains that there are two ways to ship and hold CAR data on Hadoop:

  1. Transfer data from CAR to Hadoop with a report provided by SAP. Currently, this option (the table content aging report) does not have a lot of documentation, except this explanation on the SAP website.
  2. Use SDA (Smart Data Access) and create, for example, the TLOGF table (one of the biggest) on Hadoop. More on that can be found in the “Quickstart HDA for CAR 2.0 FP2” guide; a hedged sketch of this option follows the list.
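
As a rough illustration of option 2, the sketch below exposes the TLOGF table stored in Hadoop as a virtual table in SAP HANA through an already configured SDA remote source; the remote source, schema, and table names are assumptions, and the authoritative steps are in the CAR and SDA guides:

```python
# Hedged sketch: creating a virtual table in SAP HANA that points to the TLOGF
# data held in Hadoop, via an existing Smart Data Access remote source.
# Remote source, schema and table names are assumptions.
from hdbcli import dbapi

VT_SQL = (
    'CREATE VIRTUAL TABLE "SAPCAR"."TLOGF_VT" '
    'AT "HADOOP_SOURCE"."<NULL>"."default"."tlogf"'
)

conn = dbapi.connect(address="hana-host", port=30015, user="SYSTEM", password="<password>")
conn.cursor().execute(VT_SQL)
conn.close()
```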

 

Of course, all other trendy processes and applications dealing with big data and running on SAP HANA or other SAP engines can take advantage of Hadoop. Below is an example combining the Complex Event Processing platform with its sources (IoT, Sentiment Analysis) and outputs.

 

[Figure: Complex event processing combining IoT and sentiment analysis sources with Hadoop]

Almost every data stream, whatever its nature, can go through an event processing engine for real-time analysis (for computing KPIs and raising alerts, for example). This can be sensor data from factories, trains, or football players, or it can be discussion flows on Twitter.

 

Here also, the question is the same as for retailers with CAR: what do we do with this data, which could potentially contain important information? The answer could be the same: Hadoop.

 

What is Hadoop (more than a Google smelly toy)?

[Figure: The four core components of a Hadoop cluster]

 

The heart of Hadoop has four major components:

  • Entry-level servers.
  • A distributed filesystem (HDFS) that spreads data across the cluster in a redundant way.
  • A resource negotiator (YARN) that handles the distribution of workload across the cluster's resources.
  • A Java framework (MapReduce) that enables application development on top of the Hadoop cluster.

 

The next picture depicts that minimalistic Hadoop landscape from above in the center of a large ecosystem. It is not possible to represent all the members but here are some major ones:

 

[Figure: The Hadoop ecosystem around the core components]

It is important to note that most of these tools have a counterpart, in terms of functionality, in the SAP world (for example, graph processing and machine learning are integrated into HANA, the same goes for the workflow engine and the scheduler, and other tools compete with SAP CEP, Data Services, and Lumira).

 

Other “semi-free” products, like Pentaho, can also cover several aspects around Hadoop, such as data integration and analytics, as well as act as a bridge to other ecosystems (SAP, MongoDB…). It is “semi-free” because some of the tools need to be purchased on a subscription model (see, for example, the Pentaho Wikipedia page).

 

All of these tools are covered by plenty of literature and are Apache (sub-)projects in their own right, so we won't talk about each of them; YouTube and Wikipedia are good entry points to learn more. However, we cannot talk about Hadoop without saying a few words about the frameworks, and especially about MapReduce (all the more since the goal is to discuss SAP Vora later on). Let's try to understand its mechanism using an example.

 

The most common MapReduce example is the word count program. It is shipped among the example programs when you install Hadoop. The word count program tells you, for a given input file, how many times each word appears.

 

Here is how it works:

 

[Figure: Simplified MapReduce word count flow]

This is a simplified version because words were replaced by single letters and some intermediate steps (sort & shuffle) are not represented.

 

Programs running in the MapReduce framework have two procedures: map() and reduce(). These procedures are distributed and run in parallel in the Hadoop cluster (a runnable sketch follows the list):

  • The Map procedures filter and sort the input and write output files containing key/value pairs; here, each letter is associated with a 1. These Map output files are used by the Reduce procedures as input files.
  • The Reduce procedures aggregate the Map output and produce a result file.
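
For readers who want to see this in code, here is the classic word count as a single Python script for Hadoop Streaming; it is a sketch, with input and output paths left as placeholders.

```python
#!/usr/bin/env python3
# Word count for Hadoop Streaming; one script serves both phases:
#   map phase:    "python3 wordcount.py map"    emits one "word<TAB>1" per word
#   reduce phase: "python3 wordcount.py reduce" sums the counts per word
# (Hadoop sorts the map output by key before it reaches the reducer.)
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

It would be submitted with the streaming jar shipped with the distribution, for example: hadoop jar hadoop-streaming-*.jar -files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce" -input /data/in -output /data/out (the jar location varies by distribution).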

 

What does not appear in the schema is that all the intermediate results are written to and read from files, which makes a MapReduce program I/O intensive. The answer to that problem is Spark, another framework that also runs on top of Hadoop (HDFS & YARN).

 

Here is the rough difference at the technical processing level:

[Figure: MapReduce versus Spark processing]

  • Spark uses memory rather than files for intermediate storage. Where MapReduce defines a “small” (~100 MB) buffer for intermediate storage and writes to file when the buffer overflows, Spark relies on the operating system's memory management mechanisms: data is written to virtual memory, and the operating system decides whether to put it into RAM or swap.
  • MapReduce systematically reads all the data from the input file and then starts working. Spark starts processing only once it knows what kind of result is expected, so it can, for example, filter the input file and fetch only the relevant lines (a short PySpark sketch follows this list).
  • Spark is not bound to YARN and HDFS. It supports other cluster engines (Mesos) and also has its own. The same goes for the distributed filesystem: you can also use Amazon S3.
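
As a small illustration of the first two points (in-memory intermediates and lazy evaluation), here is the same word count in PySpark with a filter applied before the aggregation; this is a generic sketch with placeholder paths, not tied to any particular distribution:

```python
# Word count in PySpark. Transformations are lazy: nothing is read or computed
# until the final action, and intermediate results stay in (virtual) memory
# instead of being written to files between steps.
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-sketch")

counts = (
    sc.textFile("hdfs:///data/in")             # lazily declared input
      .flatMap(lambda line: line.split())
      .filter(lambda word: len(word) > 3)      # only the relevant records are kept
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)
)

counts.saveAsTextFile("hdfs:///data/out")      # the action that triggers execution
sc.stop()
```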

 

On the “functional” side, Spark also differs from MapReduce:

  • Spark natively includes some functions that have to be installed separately in a MapReduce context, such as machine learning, streaming, and graph processing. Graph processing, for example, benefits from running in Spark with respect to I/O, because this kind of processing produces a lot of intermediate results; better to keep them in memory.
  • The initial Spark user interface is a shell (three are available: Scala, Python & R), which means that you don't have to go through complex developments. The same spirit can be found on the MapReduce side if you decide to install and use Hive.
  • Of course, some tools can work with both: Oozie, Avro, Parquet.

 

When SAP enters the ring

 

For a couple of years (2001-2002), the trend was e-commerce and the underlying J2EE application servers. SAP acquired one of these Java application server editors: IN-Q-MY (with CEO Shai Agassi – do you remember?). The interesting part came when SAP “opened” the J2EE server, modified it (developing closer integration with the ABAP engine and adding table buffering capabilities), and named its whole technical layer (ABAP & Java) the Web Application Server (WAS).

 

Similarly to this “Java adventure”, now that the trend is big data and IoT, SAP is on its way towards Hadoop and comes with “Vora” in its luggage. Or, according to SAP AG, “HANA Vora”, even though it does not sit on HANA but is integrated into Hadoop Spark. Vora contains SAP developments and, since Vora 1.2, even third-party tools: HashiCorp Consul replacing (?) the Zookeeper functionality of earlier releases. SAP remodelled the Java engine to fit its needs, and now the same is happening to Hadoop.

 

Here is a high-level view of HANA and Vora.

 

[Figure: High-level view of SAP HANA and Vora]

Note: The arrow depicting the relationship goes in both directions:

  • Vora can access HANA (HANA is seen as a data source accessible using SPARKSQL); and
  • HANA can access Vora (via Spark Adapter/Controller).

 

For a complete overview of HANA <-> HADOOP integration, here is a link to SAP Online Help.

 

Check also the Vora developer's guide for more details on how to access HANA data from Vora.

 

What can I do with Vora? The answer in version 1.2 is: “SAP HANA Vora enables OLAP analysis of Hadoop data, through data hierarchy enhancements in SparkSQL.”

 

[Figure: Train sensor data in Hadoop combined with the bill of material in SAP HANA, organized as a hierarchy]

Here is an example of the OLAP way of seeing data when it is organized in hierarchies, thanks to Vora and HANA (the business-related technical terms are in French, but that isn't critically important for general understanding).

  • On the left hand side, we have a train (most probably the common ancestor of TGV & Shinkansen) with sensors sending raw data to Hadoop.
  • On the right hand side, there is an application running in HANA which has the knowledge of the Bill of Material of our train. Could be an MRO application.
  • Both of these worlds have to be combined (joined) to produce suitable information. With hierarchical enrichment of the sensor data we are able to:
    • Raise an alert only if both thermometers are giving extreme values, because we know they are on the same hierarchy level. If only one shows a critical value, we have to investigate further to find out whether the problem is with that thermometer alone or perhaps with the train as well.
    • Anticipate which parent components will fail in case of a child failure, or the other way round
    • And more.

 

This is possible because Vora knows how to deal with hierarchies. They are integrated in a normalized form into Vora (cf. the table in the picture) and can be queried with special functions like level(u), is_root(u), is_child(u,v), is_ancestor(u,v)… More on that in the developer's guide.
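
To give a flavor of what such a query could look like, here is a heavily hedged Spark SQL sketch that uses the hierarchy predicates named above against an assumed bill-of-material hierarchy view; the table names, the way the hierarchy view is created, and even the Spark entry point differ between Vora releases, so the authoritative syntax is the one in the developer's guide:

```python
# Hedged sketch (not actual Vora DDL): joining sensor readings with an assumed
# bill-of-material hierarchy view and using the hierarchy predicates mentioned
# above. All table/view and column names are invented for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vora-hierarchy-sketch").getOrCreate()

alerts = spark.sql("""
    SELECT r1.sensor_id, r2.sensor_id, r1.temperature, r2.temperature
    FROM sensor_readings r1
    JOIN sensor_readings r2 ON r1.sensor_id < r2.sensor_id
    JOIN bom_hierarchy   h1 ON h1.component = r1.component
    JOIN bom_hierarchy   h2 ON h2.component = r2.component
    WHERE level(h1.node) = level(h2.node)          -- both sensors on the same hierarchy level
      AND r1.temperature > 90 AND r2.temperature > 90
""")
alerts.show()
```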

 

Check also videos on this subject from the HANA Academy, available on YouTube. Here is the first in a series of 3 videos.

 

According to the “DMM200 – SAP HANA Vora: Overview, Architecture, Use Cases, and Roadmap” session at TECHED 2016, release 1.3 of Vora should also incorporate a time series engine and a graph engine, among other things. These features already exist in HANA, so when running both HANA and Vora, the question will be: “On which side should I run my calculations?”

 

Vora from a technician’s perspective (install, operate, size..)

 

Installation. Vora is an SAP product, and in the SAP context installation tasks come with strong guidelines. Currently, three Hadoop distributions are certified by SAP: MapR, Hortonworks, and Cloudera. Check SAP Note 2213226 – Prerequisites for installing SAP HANA Vora: Operating Systems and Hadoop Components.

 

Operations. Administration of the Hadoop cluster is done with tools like Ambari. For monitoring, Ganglia is a good candidate.

 

Data backup and safeguarding. Production data in a Hadoop cluster can have the same criticality as in an ERP. Traditional backup tools exist with Hadoop agents (for example, Commvault). Another way to safeguard data is to create a replicated cluster and use the Hadoop-native Apache DistCp (version 2) tool.
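
For instance, a replicated cluster can be kept in sync with a scheduled DistCp run; a minimal sketch, with the NameNode addresses and paths as placeholders:

```python
# Minimal sketch: copying HDFS data to a second (backup) cluster with DistCp.
# NameNode addresses and paths are placeholders.
import subprocess

subprocess.run(
    ["hadoop", "distcp", "-update",
     "hdfs://prod-nn:8020/data/sensor",
     "hdfs://backup-nn:8020/data/sensor"],
    check=True,
)
```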

 

Development. In a Hadoop environment, developments have the same importance as anywhere else in the IT world. This means that versioning, deployment, and overall organization must rely on robust processes and tools. Here we are talking about tools like Git, Jenkins, and Maven, as well as home-grown scripts.

 

Regarding landscape and sizing, SAP, as well as each component, comes with recommendations.

 

[Figure: Initial landscape and sizing recommendations]

The figures here are initial recommendations taken from installation and sizing guides. In addition to the initial guidelines, SAP also gives formulas for more precise estimations.

 

Here is a starting point:

 

In the SAP world, we are used to having fairly accurate sizings with the SAP Quicksizer tool. There is no such tool in the Hadoop world. The best recommendation is to run sizing benchmarks with a significant amount of data, to be able to make the best extrapolations.

 

Here are three examples of hardware hosting Hadoop clusters:

 

[Figure: Three example hardware configurations for hosting Hadoop clusters]

Hadoop runs well on a Raspberry Pi. I have some doubts regarding Spark & Vora.

 

----------------------------------------------------------------------


Christian Lindholm is a leading Technical SAP Architect at oXya’s headquarters in Paris, France. Joining oXya in 2008, Christian has nearly 20 years of experience in technical SAP roles, starting as an SAP Basis Admin and progressing to one of oXya’s leading Technical SAP Architects around the world. In his role, Christian works with SAP customers around the world to design, optimize and implement complex solutions, to serve customers’ unique needs.

 

oXya, a Hitachi Group Company, is a technical SAP consulting company, established in 1998. oXya provides ongoing managed services (outsourcing) for enterprises around the world. In addition, oXya helps customers that run SAP with various projects, including upgrades and migrations, including to SAP HANA. oXya currently employs ~700 SAP experts, who service more than 330 enterprise customers with hundreds of thousands of SAP users around the world.