Historically, high performance computing (a.k.a. supercomputing) has placed significant demands on storage technologies. Often these are bleeding-edge requirements that become mainstream requirements within a few years.
The most prominent trend I've seen in the HPC industry has been the shift from HDD to SSD. Before lower-cost SSD, system designers would build supercomputers that relied on HDDs for IO performance. That meant greatly over-provisioning capacity in order to gain disk IO performance, which led to systems configured with many PB of capacity but only TBs of used capacity, essentially short-stroking the drives. The result was high energy costs, excessive floorspace requirements, sometimes unnecessary capacity license costs, maintenance costs, excessive hardware failures, industry price pressures, and so on, all in the name of more IO performance.
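To put rough numbers on that over-provisioning effect, here is a minimal back-of-envelope sketch in Python; the IOPS and capacity figures are illustrative assumptions, not measurements from any particular system:

# Back-of-envelope: drives needed to reach an IOPS target (all figures are assumptions).
target_iops = 1_000_000                    # random-read IOPS the workload demands
hdd_iops, hdd_capacity_tb = 150, 4         # assumed 7.2K RPM HDD
ssd_iops, ssd_capacity_tb = 100_000, 4     # assumed enterprise SSD

hdd_count = -(-target_iops // hdd_iops)    # ceiling division
ssd_count = -(-target_iops // ssd_iops)

print(f"HDDs: {hdd_count} drives, ~{hdd_count * hdd_capacity_tb} TB of capacity")
print(f"SSDs: {ssd_count} drives, ~{ssd_count * ssd_capacity_tb} TB of capacity")
# The HDD build ends up with thousands of drives and tens of PB of capacity purely
# to buy IOPS, even when only a few TB are actually used -- the short-stroking above.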
SSD allows much higher IO performance with far fewer storage devices. I think it was around 2007-2008 when I read of the first all-flash supercomputer at the San Diego Supercomputer Center, Gordon (nicknamed "Flash Gordon"). The goal was to eliminate all of the incidental costs of HDD-based IO subsystems listed above by using SSD instead.
The other trend is toward large, purely RAM-based systems and clusters. Today's 100 TB memory systems will give way to PB-scale memory systems as memory costs fall. Also expect hybrids of RAM plus SSD memory extensions for even larger memory-based systems for in-memory processing.
I guess the common thread across these trends is "lower IO latency".
Agreed. Using SSD/RAM technology you can provide the currently fastest IOPS and lowest latency to the processor. But there still remains the issue of 'feeding the beast': the SSD/RAM still needs to be filled and emptied at very high speed, otherwise the mighty CPUs will spend expensive idle cycles. So bringing a lot of data very close to the CPU certainly helps, but the problem is not resolved, only made less urgent.
And HPC I/O demands are very high. Before the numbers can be crunched, the SSD/RAM needs to be filled with multiple TBs or even PBs of data, and HPC can produce TBs of data per second.
These trends and advances all push the rock a little further up the hill. One of the biggest IO headaches for supercomputer designers is exacerbated by one of the trends I mentioned: large RAM systems. Checkpointing the working state of memory in order to restart from the last known good point takes longer as memory grows. In fact, for a time, large memory systems were reaching 100 TB of RAM while IO was still on spinning disk. There was a huge mismatch for a few years until SSD helped push the bottleneck somewhere else.
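A quick sketch of that checkpoint mismatch; the bandwidth figures below are assumed round numbers for illustration, not any specific machine:

# Checkpoint time = memory size / aggregate write bandwidth of the checkpoint target.
memory_tb = 100                       # working state to be checkpointed
hdd_filesystem_gbps = 50              # assumed aggregate GB/s of an HDD parallel file system
ssd_burst_buffer_gbps = 1_000         # assumed aggregate GB/s of an SSD burst buffer

def checkpoint_minutes(size_tb, bandwidth_gbps):
    return size_tb * 1_000 / bandwidth_gbps / 60

print(f"HDD target: {checkpoint_minutes(memory_tb, hdd_filesystem_gbps):.0f} min per checkpoint")
print(f"SSD target: {checkpoint_minutes(memory_tb, ssd_burst_buffer_gbps):.1f} min per checkpoint")
# Roughly 33 minutes vs under 2 minutes: on spinning disk, frequent checkpoints of a
# 100 TB image consume a large slice of the machine's useful time.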
Actually, I believe one of the new bottlenecks is what you are describing, Mark van Bijnen: staging datasets and results into and out of the working storage space (RAM, extended RAM, and high-speed storage) of a supercomputer. These repositories are designed for capacity and resilience and can't match the speed of the target staging area, so processing stalls while large datasets transfer to the system.
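Roughly quantifying that stall (dataset sizes, repository throughput, and job length below are assumptions chosen only to illustrate the point):

# Fraction of wall-clock time lost to staging data through the capacity tier.
dataset_tb = 500                      # assumed input data staged in before the run
results_tb = 100                      # assumed results staged back out afterwards
repository_gbps = 20                  # assumed throughput of the capacity/resilience tier
compute_hours = 12                    # assumed time the job spends actually computing

stage_hours = (dataset_tb + results_tb) * 1_000 / repository_gbps / 3600
total_hours = compute_hours + stage_hours
print(f"Staging: {stage_hours:.1f} h, stall fraction: {stage_hours / total_hours:.0%}")
# With these assumptions the system spends ~8 hours just moving data, roughly 40% of
# the wall clock, while the fast tier and the CPUs sit idle.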
Hi Eric, I'm not reading a real question here. But HPC is like Formula racing: new high-performance technologies are developed there that later find their way into the mainstream world.
HPC storage can be defined as:
1. very expensive
2. niche market (low volumes, high prices)
3. meeting very high demands for IOPS and low latency
You see that the large industry storage providers often don't want to be in this league. They watch this space while startups experiment and take the risks.