DataDirect Redshift Drivers: Million Row Challenge

Status
Not open for further replies.
S

Sumit Sarkar

Guest
DataDirect tests a 1M row bulk load of Redshift data.

Can DataDirect Improve Data Performance?


The challenge? Load one million rows of data into Amazon Redshift, a process that usually takes six hours, in less than one. All using my usual tools and no help from an Amazon S3 bucket. That was the challenge I accepted last October. I can’t say that it wasn’t a little bit daunting.

The reason for this challenge is simple: Until now, the only way to load data into Redshift was to use Amazon S3 buckets. Loading data into Redshift is an isolated, time-consuming and frustrating process, outside of usual workflows. This inefficiency just won’t cut it in today’s performance-driven world. I wanted to prove that this process could be much faster than people realize and easily integrated into your daily workflow.

Bulk-Loading Data at Lightning Speed


So, how did we do? Pretty well, I’d say. Using Progress® DataDirect® drivers, we were able to cut the time to load one million rows of data from six hours down to only eight minutes.

Your Step-by-Step Instructions


It really is as simple as downloading our Progress® DataDirect® Amazon Redshift ODBC driver or JDBC driver. After some minor reconfiguration on Redshift, you are ready to bulk upload data at lightning speeds. In my demo, I used Oracle Data Integrator, but the drivers I used are compatible with many more tools including:

  • Microsoft SSIS
  • IBM DataStage
  • Informatica PowerCenter
  • Ab Initio
  • SAP Data Service
  • Pentaho Data Integrator
  • Talend
  • Syncsort DMExpress
  • Qlikview Expressor
  • SAS ETL
  • Actian DataConnect

DDL for target supplier table


Once you’ve chosen your tool, just follow these steps:

  1. Obtain Amazon Redshift credentials or sign up for a free trial: http://aws.amazon.com/redshift/free-trial/
  2. Download a free trial of DataDirect Amazon Redshift ODBC driver or DataDirect Amazon Redshift JDBC driver.
  3. Connect to Amazon Redshift and create the DDL for the target supplier table as shown at right.
  4. Download a CSV source file with sample data.
  5. Build a basic workflow to load data from your CSV file into Amazon Redshift using the DataDirect driver.
  6. Run workflow.
The Results Are In


The following images are sample results using Microsoft SQL Server Integration Services 2012 (SSIS) and finishing in less than 10 minutes compared to six hours with the open source Postgres ODBC driver.


Fir. 1: Million Row Challenge Results







Fig. 2: Data task wofkflow and validation.




Webinar: Get Tips on Better Database Performance


This tutorial shows one way you can get massive improvement in your data connectivity performance, but it’s just a sample insider tip from Progress DataDirect. If you want to discover more ways to improve performance, be sure to register for this February 11 webinar: Industry Insight: Optimize Your Data for Better Performance. We look forward to seeing you there! If you want to get started now, Get Your Free ODBC Driver Trial Now.

I also talk about the challenge in this video: https://www.youtube.com/watch?v=T5pTD-gZ5eo

Author information

sumit-sarkar-w75.jpg
Sumit Sarkar

Principal Systems Engineer, Progress DataDirect at Progress Software

Data Access Evangelist, Speaker and World's leading consultant on direct SQL access to Cloud Data.

Sumit Sarkar has been working in the data connectivity field for 10 years. His interests include performance tuning of the data access layer for which he has developed a patent pending technology for its analysis; business intelligence and data warehousing for SaaS platforms; and data connectivity for aPaaS environments - with a focus on standards such as ODBC, JDBC, ADO.NET, and ODATA.

He is an IBM Certified Consultant for IBM Cognos Business Intelligence and TDWI member.

He has presented sessions on data connectivity at various conferences including IUC34, IUC35, Progress Revolution, Oracle User's Group, Collaborate 12, Dreamforce 2012, Cloud Computing Expo 2012, PRGS13 Exchange, Dreamforce 2013, Oracle OpenWorld 14, PRGS14, DF14

Presented related webinars in collaboration with IOUG, DZone, Progress DataDirect and Progress Communities.

| Twitter | Google+ | LinkedIn |

Continue reading...
 
Status
Not open for further replies.
Top