On-premises S3-compatible Storage for Snowflake with Hitachi Content Platform
With their June 2023 update, the folks at Snowflake announced general availability of S3-compatible storage support for Hitachi Content Platform and HCP for Cloud Scale (collectively, HCP). With this announcement, customers can directly integrate their HCP data sets, whether on-premises or in colocated private datacenters, with their cloud-based Snowflake compute to enable hybrid cloud workflows.
Snowflake’s data platform has revolutionized data warehousing as a service with an impressive suite of capabilities, resulting in rapid adoption and tremendous growth. Snowflake’s secret sauce is the ease with which data can be shared on the platform, enabling data producers and consumers to exchange data securely and instantly in the Snowflake Marketplace. But there is much more to recommend Snowflake to its customers, including:
· the appeal of virtually unlimited and elastic performance scaling with little or no startup or maintenance costs,
· broad support for structured and semi-structured data formats,
· excellent security and governance tools and enforcement,
· and good integration with many of the most popular data integration and BI tools used by their customers.
As good as this all is, Snowflake (like many cloud data platform providers) is facing headwinds as consumers begin to pull back on the size and term of their cloud services investments. The appeal of no startup and maintenance costs is counterbalanced by the reality of complex pricing models with high operating, storage, data ingestion, and egress costs. Customers also have concerns about data sovereignty and vendor lock-in once their data is in a cloud platform entirely outside of their control. This is particularly true for their most sensitive data, such as data under HIPAA or GDPR regulation, or intellectual property that represents the corporate crown jewels. For these reasons, customers are again focusing on hybrid cloud and on-premises options that will allow them to reap the benefits of the cloud data platform while balancing their concerns over data control and sovereignty.
With Snowflake’s announcement of S3-compatible storage support for HCP, Hitachi customers now have the flexibility they have been looking for. Snowflake can now directly integrate cloud workloads with on-premises data sets, without the need to first copy or move the data into the cloud. Snowflake external tables, built on top of on-premises data, give data administrators the ability to expose only those data sets they wish to expose, and to control access to them using the same fine-grained access controls that protect their cloud-native data. With features such as column masking and column-level access controls, data admins can expose only the data that each group of users requires for their job.

Data queried in this way is loaded directly into cloud process memory and/or a transient cache; it is not imported into cloud storage or persisted in the cloud unless the workload saves the results to native cloud storage. Results may just as easily be persisted to HCP storage, so that both inputs and outputs use on-premises HCP storage. In addition to direct query access, this new capability can be used to import data from HCP into Snowflake, or to export data from Snowflake to HCP. This allows customers to easily move less-active data from their Snowflake storage back to their on-premises HCP to comply with long-term retention requirements while sensibly managing their cloud spend.
Integrating HCP with Snowflake is as simple as creating an external stage using your HCP endpoint, access key, and secret key with the new “s3compat://” URL scheme:
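A minimal sketch of the stage definition follows, based on Snowflake’s documented S3-compatible stage syntax; the stage name, bucket, endpoint hostname, and credentials shown are placeholders you would replace with your own HCP values:

```sql
-- Create an external stage over an HCP bucket using the
-- "s3compat://" URL scheme. Endpoint, bucket, and keys below
-- are illustrative placeholders, not real values.
CREATE OR REPLACE STAGE hcp_stage
  URL = 's3compat://my-hcp-bucket/snowflake/'
  ENDPOINT = 'hcp.example.com'
  CREDENTIALS = (AWS_KEY_ID = '<access_key>' AWS_SECRET_KEY = '<secret_key>');

-- Sanity check: list the files visible through the stage.
LIST @hcp_stage;
```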
To export data from Snowflake to HCP use the COPY INTO @stage SQL statement:
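For example, assuming a stage named `hcp_stage` has been created as described above, an unload to HCP might look like the following (the table name and path are illustrative):

```sql
-- Unload a table (or any query result) to the HCP stage as
-- Parquet files under the exports/orders/ prefix.
COPY INTO @hcp_stage/exports/orders/
  FROM (SELECT * FROM orders)
  FILE_FORMAT = (TYPE = PARQUET);
```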
And to create tables over the data stored on HCP use the CREATE EXTERNAL TABLE SQL statement with location=@stage, and a pattern to match your file name(s):
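A sketch of the external table definition, again assuming the hypothetical `hcp_stage` stage and Parquet files from the examples above; note that Snowflake documents that event-driven auto-refresh is not available for S3-compatible storage, so the table metadata is refreshed manually here:

```sql
-- Define an external table over the Parquet files stored on HCP.
-- AUTO_REFRESH is disabled because S3-compatible stages do not
-- support event notifications; refresh metadata on demand instead.
CREATE OR REPLACE EXTERNAL TABLE orders_ext
  LOCATION = @hcp_stage/exports/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  PATTERN = '.*[.]parquet'
  AUTO_REFRESH = FALSE;

-- Pick up any files added since the table was created.
ALTER EXTERNAL TABLE orders_ext REFRESH;
```

Queries against `orders_ext` then read directly from HCP, with results flowing into Snowflake’s compute layer rather than into cloud storage.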
Hitachi Content Platform and HCP for Cloud Scale provide extensive support for open table formats such as Iceberg, Delta Lake, and Hudi, as well as for open file formats like Parquet, Avro, ORC, CSV and JSON. We have an extensive ecosystem of analytics ISV partners from traditional RDBMS and Data Warehouse partners like MSSQL, Vertica, and Teradata, to the more modern Data Lakehouse solution providers like Dremio and Starburst. These partners and more are making big investments to support open table and file formats to unlock customer data from proprietary storage formats and to share data more easily across analytics platforms and workloads. Hitachi is all in on supporting our customers and our partners as they embrace open table formats for the next generation of hybrid cloud analytics workloads.
#snowflake #HybridCloud #ReportingAndAnalytics #iceberg