By Puneet Lakhanpal | Sr. Data Architect
In the growing world of digital innovation, many enterprises are looking to evolve from modern data warehousing to modern data platforms so they can onboard all of their data onto a single platform. This allows them to create a data marketplace where they can monetize business insights for internal and external partners.
Snowflake fulfills this vision and propels the journey from data to insights for its customers. When onboarding customer data into Snowflake, we generally come across three types of ETL/ELT solutions:
- Cloud-native solutions such as Azure Data Factory
- Enterprise solutions such as Informatica, Qlik, Striim, Talend, Confluent, etc.
- Legacy solutions such as SQL Server Integration Services (SSIS)
According to Datanyze, Microsoft's SQL Server Integration Services (SSIS) holds a 13.40% share of the US market across 141 data integration technologies, second only to Informatica at 18.52%. How pervasive SSIS remains in enterprises is also documented by Charles Yorek, Senior Sales Engineer at Snowflake, in a recent Snowflake blog post. In that post, Charles lays out three strategies for integrating SSIS with Snowflake:
- Option 1: Using native Snowflake ODBC connector and leaving SSIS packages unchanged
- Option 2: Using native Snowflake ODBC connector but modifying SSIS packages to utilize PUT/COPY commands
- Option 3: Using CData’s SSIS Snowflake Components to update existing SSIS packages
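As a sketch of what Option 2 involves, the PUT/COPY pattern first stages local files into Snowflake and then bulk-loads them into a table. The stage, database, table, and file names below are illustrative assumptions, not from the original article:

```sql
-- Hedged sketch of the PUT/COPY pattern behind Option 2.
-- Stage @my_stage, table my_db.my_schema.orders, and the file path
-- are hypothetical placeholders.

-- Stage pre-compressed files (skip Snowflake's own compression step):
PUT file:///tmp/orders_*.csv.gz @my_stage AUTO_COMPRESS = FALSE;

-- Bulk-load the staged files into the target table:
COPY INTO my_db.my_schema.orders
  FROM @my_stage
  FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
  PURGE = TRUE;
```

Issued from an SSIS package (for example via an Execute Process task running SnowSQL), these two statements replace a row-by-row ODBC insert with a parallel bulk load.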
To explore some of these strategies, Trace3 prepared a two-part article that examines a real Snowflake sales cycle in which SSIS components were created for data ingestion and transformation for a customer evaluating Snowflake against on-premises SQL Server and Hadoop. Part one of the article, available for download below, takes an in-depth look at a variation of Option 2 where, instead of using the Snowflake ODBC driver, an SSIS script task was used to create multiple gzipped CSV files that were then loaded with SnowSQL.
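The file-preparation step of that script task can be sketched in a few lines: split the extracted rows into several gzipped CSV files so that Snowflake can load them in parallel. This is a minimal illustration, not the article's actual SSIS code; the function and file names are assumptions:

```python
import csv
import gzip
import os
import tempfile

def write_gzipped_chunks(rows, out_dir, prefix="chunk", chunk_size=100_000):
    """Split rows into multiple gzipped CSV files, mimicking the
    multi-file output the SSIS script task handed to SnowSQL.
    Multiple smaller files let Snowflake's COPY load in parallel."""
    paths = []
    for i in range(0, len(rows), chunk_size):
        path = os.path.join(out_dir, f"{prefix}_{i // chunk_size:04d}.csv.gz")
        with gzip.open(path, "wt", newline="") as f:
            csv.writer(f).writerows(rows[i:i + chunk_size])
        paths.append(path)
    return paths

if __name__ == "__main__":
    sample = [["1", "9.99"], ["2", "4.50"], ["3", "7.25"]]
    with tempfile.TemporaryDirectory() as d:
        files = write_gzipped_chunks(sample, d, chunk_size=2)
        print(len(files), "files written")
```

In the actual solution, the generated files would then be staged and loaded via SnowSQL PUT and COPY commands.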
Here are the results from the use case:
- Data Transformation: Compared to 34 hours in SQL Server, the Snowflake transformations completed in 4.33 minutes using parallel Large Snowflake virtual warehouses and consumed only 4.53 credits. At $4 per credit on Snowflake's Business Critical Edition, this equates to roughly $18.12 based on Snowflake's cost calculator.
- Data Access: A live connection to Snowflake demonstrated performance gains over both Hadoop and Tableau extracts, and with that live connection, much more complex Tableau calculations could be performed in Snowflake.
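The transformation cost above follows directly from Snowflake's credit-based pricing model: credits consumed times the per-credit price for the chosen edition. A minimal sketch of the arithmetic, assuming a flat per-credit rate:

```python
def snowflake_cost(credits: float, price_per_credit: float) -> float:
    """Credit-based cost estimate: credits consumed multiplied by
    the per-credit price, rounded to cents."""
    return round(credits * price_per_credit, 2)

# Figures from the use case: 4.53 credits at $4/credit (Business Critical Edition).
print(snowflake_cost(4.53, 4.0))  # → 18.12
```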
As you will learn, there are several ways this architecture can be enhanced, such as integrating with an object store for files and leveraging Snowflake's Tri-Secret Secure and storage integrations. However, the intent of this article is to show that even traditional integration engines like SSIS can be used to ingest and transform data in Snowflake (more on transformations in Part 2 of the article) while achieving excellent performance.
At Trace3, we value our partner relationship with Snowflake and look forward to helping customers on their journey from data to insights.