[Progress News] [Progress OpenEdge ABL] Using AWS Glue and Spark with MongoDB via JDBC

Susan Lamatrice · Apr 2, 2018

Learn how to access MongoDB using a DataDirect JDBC driver with AWS Glue.

What is AWS Glue?

AWS Glue is an Extract, Transform, Load (ETL) service available as part of Amazon’s hosted web services. Glue is intended to make it easy for users to connect their data in a variety of data stores, edit and clean the data as needed, and load the data into an AWS-provisioned store for a unified view. Announced in 2016 and officially launched in Summer 2017, Glue greatly simplifies the cumbersome process of setting up and maintaining ETL jobs.

Why MongoDB?

MongoDB is an open-source NoSQL data store. Rather than the tabular rows and columns of a relational database, MongoDB stores data as JSON-like documents with flexible schemas. MongoDB has grown in popularity and is generally ranked among the top 5 most popular data stores. At Progress, we've seen increased interest in learning how to use MongoDB in an AWS Glue environment.

JDBC and Glue

Glue supports accessing data via JDBC. Currently, the databases Glue supports through JDBC are Postgres, MySQL, Redshift and Aurora. Of course, JDBC drivers exist for many other data sources besides these four. To access any other database, you can use its JDBC driver through a Spark connection. The data can then be processed in Spark or joined with other data sources, and AWS Glue can fully leverage the data in Spark.

Using JDBC connectors, you can access many other data sources via Spark for use in AWS Glue. For example, an AWS blog post demonstrates the use of Amazon QuickSight for BI against data in an AWS Glue catalog. QuickSight supports Amazon data stores and a few other sources like MySQL and Postgres.

With DataDirect JDBC through Spark, you can open up any JDBC-capable BI tool to the full breadth of databases supported by DataDirect drivers, including MongoDB, Salesforce, Oracle and many others.

Accessing JDBC Data through Spark with DataDirect

So, how do you set up a JDBC connection to access data through Spark using a JDBC driver? Here is a quick overview of the simple steps to get started.

1. Download and locally install the DataDirect JDBC driver, then copy the driver jar to Amazon Simple Storage Service (S3). The drivers have a free 15-day trial license period, so you’ll easily be able to get this set up and tested in your environment.
2. Create your AWS Glue job in the AWS Glue Console.
3. Follow our detailed tutorial for an example using the DataDirect Salesforce driver. The same steps will apply for MongoDB or any other DataDirect JDBC driver.
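
The steps above can be sketched as a short PySpark job script. This is only a minimal sketch, not the code from the tutorial: the connection URL layout, the driver class name, the job parameter names (MONGO_HOST and so on) and the helper function names are all assumptions made for illustration, so check them against the documentation for your driver version. The driver jar is assumed to already be in S3 and referenced from the job's dependent jars setting.

```python
"""Sketch: read a MongoDB collection through a JDBC driver from a Spark-based Glue job."""


def build_jdbc_options(host, port, database, table, user, password,
                       driver_class="com.ddtek.jdbc.mongodb.MongoDBDriver"):
    """Build the option map passed to Spark's generic JDBC data source."""
    # ASSUMPTION: the URL layout and the default driver class follow the usual
    # DataDirect naming; confirm both in the documentation for your driver.
    url = "jdbc:datadirect:mongodb://{}:{};DatabaseName={}".format(host, port, database)
    return {
        "url": url,
        "driver": driver_class,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options):
    """Load the table named in options["dbtable"] as a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def run_glue_job(argv):
    """Entry point intended to run inside a Glue job (not called here)."""
    # Imported lazily so the helpers above can be tried without a Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Hypothetical job parameter names, supplied as --MONGO_HOST etc. on the job.
    args = getResolvedOptions(
        argv, ["MONGO_HOST", "MONGO_DB", "MONGO_TABLE", "MONGO_USER", "MONGO_PASSWORD"]
    )
    spark = GlueContext(SparkContext.getOrCreate()).spark_session
    options = build_jdbc_options(
        args["MONGO_HOST"], 27017, args["MONGO_DB"], args["MONGO_TABLE"],
        args["MONGO_USER"], args["MONGO_PASSWORD"],
    )
    df = read_table(spark, options)
    df.printSchema()  # From here the DataFrame can be transformed or joined in Spark.
    return df
```

In an actual job script you would end the file with a call such as run_glue_job(sys.argv). Keeping the option-building step in its own function means the same read path can be pointed at another JDBC driver by changing only the URL and driver class.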

Get Started with DataDirect JDBC and AWS Glue

The industry standard for JDBC database connectivity, the Progress DataDirect JDBC drivers solve the limitations of Type 4 JDBC drivers, delivering the fastest, most scalable Java application performance. The DataDirect line of JDBC drivers supports all major databases and includes advanced enterprise functionality such as application failover, bulk load, SSL data encryption, and operating system authentication using the Kerberos protocol. DataDirect also publishes a Security Vulnerability Response Policy for addressing vulnerabilities in a timely manner across all supported data sources, including SaaS, big data and relational sources.

Download a DataDirect JDBC driver today and get started with AWS Glue.
