Exporting Progress Data to SQL Server

Huzzah1000

New Member
I am unsure if this is the appropriate forum to post this in, so I apologize if I am asking the wrong location.

I am attempting to create an ETL process that pulls data from Progress and imports it into SQL Server. The data is intended to be used in a data warehouse.

I do not have access to the server (I believe Unix) that the Progress database resides on. I do have the connection information (username, password, server, port and database) and can connect via ODBC and query the database. I believe the user account I have is a read-only account.

I was hoping someone might be able to provide some insight on the best approach to pull the data into SQL Server (or more accurately, to extract if from Progress).

Obvious options are to attempt using the ODBC connection directly. Honestly, this approach seems a bit slower than I had hoped. (I am attempting to pull just a single table without doing any joins or other logic that might impact performance.)

I have also read about the use of the sqldump utility. I have done some quick tests with a Windows version of this console application without success. I think the primary issue here is that the Windows version I found is an older version of Progress, which I assume is my first problem.

This leads to my questions about sqldump:

  • Can the sqldump utility extract text files (output) when run from a machine other than that hosting the Progress data? Or rather, can it be run remotely?
  • If yes, the output files would reside on the machine running sqldump, correct?
  • I'll ask the obvious, but these should be the same version, correct?
  • Any issues extracting via a Windows version of sqldump from a Unix installation of Progress?
I do not have access to create any Progress code itself, so I assume that would limit other options available to me. I am also attempting to initiate and run this process completely outside of the Progress server/environment -- we have multiple remote locations running the Progress source data and I would rather not have to maintain code between these various locations to manage the extracts.

Any information or input would be greatly appreciated. Thank you!

EDIT: Naturally, I forget to ask one of my questions! If sqldump is feasible, are there any special server settings or user account permissions (beyond being able to read the data in a read-only capacity) that I would need? Thanks again.
 

RealHeavyDude

Well-Known Member
You don't mention the Progress/OpenEdge version you are trying to connect to.

Depending on the version of Progress/OpenEdge you are trying to connect to it may suck more or less. That is because the Progress/OpenEdge database is not an SQL database to talk via ODBC/JDBC to in the first place. Secondly SQL92 access must be setup accordingly in order get it up and running in a somewhat performing and stable way.

What sqldump utility do you use? I have never used such a beast with the Progress database, and - but here I may be wrong - I think there is none.

If you don't have any access to any Progress environment then your best bet is the ODBC connection unless there is any chance to pull the data from an AppSever that is running against that database.

Heavy Regards, RealHeavyDude.
 

Huzzah1000

New Member
Thank you for the reply. Naturally you would want to know the version I'm using, sorry about that!

I am told we are using version 10.1B.

The sqldump utility I am referring to is this: http://dba.fyicenter.com/Progress-Database-Administration-Guide/580_SQL-92_tables.html. From what I can tell, it appears to be a command-line utility that is capable of outputting delimited files; one for each table (well, I believe two, technically, one file for the schema and one for the data).

Early testing has shown that using some of the native SQL language features via the ODBC driver, such as joining many tables, yields extremely poor performance. I know these same queries will be rather fast in SQL Server, my assumption is that this has a lot to do with the underlying database engines. I believe I read somewhere that Progress is not actually a true relational database like the usual suspects (SQL Server, Oracle, DB2, etc.), and that the SQL layer is more or less an additional layer tacked on top of the regular data access method.

I don't believe we'll be able to do any AppServer or similar code, unfortunately. This is mostly a black box for us.

Thanks again for your time.
 

RealHeavyDude

Well-Known Member
Progress is a true relational database - it is just not an SQL database in the first place. That means that there is some sort of separation between the ABL (that's how Progress calls their language since OpenEdge 10, before that is was called 4GL) and the SQL engine. The natural way to maintain the database schema is the ABL data dictionary. Everything that is maintained in the ABL world schema-wise can not be modified through SQL - these definitions are read only in SQL. Furthermore the contents of this table can not be dumped with the sqldump utility. This utility can only dump tables which are created through SQL (which are not visible to the ABL engine). Therefore I don't think that this dump/load utility is of any use for you.

Are there any chances that you can access the database with the data administration tool? This would give you the possibility to dump the data in ASCII format per table.

Regarding the slowness of your queries: It is true that the Progress SQL engine is not as fast as others (compared to Oracle) but as always - it depends. Query performance is mainly determined whether indexes exist to support the query. If there are none no relational database will be fast. Additionally, it is mandatory to run UPDATE STATISTICS on a Progress database that is accessed via the SQL engine on a regular basis. From what you tell me I understand that this might never have happened.

Has this database been accessed via SQL before your requirement popped up or are you the first to access it? If the latter is the case then I fear that almost nothing is set up to provide a reasonable SQL performance.

Furthermore I know that most Progress database are not even reasonably tuned for usage with the ABL engine too ...

At least what you can do is to set the transaction isolation level to READ_UNCOMMITTED.


Heavy Regards, RealHeavyDude.
 

TomBascom

Curmudgeon
The Progress "storage engine" supports 2 "languages", sql-92 and 4gl. The 4gl is the "native" language and virtually all progress based applications use it. The 4gl uses a static query optimizer (compile-time selection of indexes). sql-92 exists primarily to support reporting tools. SQL-92 uses a cost-based, run-time, query optimizer. As RHD has pointed out it is absolutely critical to run UPDATE STATISTICS from time to time. Without those statistics the cost-based optimizer just makes a lot of bad guesses and performance is poor.

Performance may also be poor because, as RHD pointed out, many installed Progress databases are essentially untuned. Progress applications are usually created by an "application partner" (aka "ISV"). These people are good at understanding a business problem and creating an application to solve it. A lot of times they deploy a "one size fits all" solution that works well for small shops and is easy to support and maintain but which absolutely stinks from a performance POV if you become a mid-sized or larger user. It doesn't have to be that way.
 

Huzzah1000

New Member
Thanks to both of you for your assistance, it is helping me out quite a bit.

I have some assumptions based on past observations:

I think the database is a SQL-only (or mostly SQL database). When connecting via ODBC, you can view all tables. There is an existing ETL which is completely designed around SQL queries written against the ODBC driver, then converted to Progress code (looks like the terminology is ABL/4GL code) for the actual extraction of text files.

I honestly have no clue on the maintenance performed against the tables. My assumption is that poor performance would not be tolerated, but as Tom guessed, this is a product made and sold by an ISV as a solution.

Regarding the usage of the data administration tool to potentially extract data, is that something that can be automated, run via command-line or run programmatically (perhaps via an API or some sort)? I am attempting to automate the extraction as much as possible without actually having to install anything on the Progress servers.

Finally, I installed the evaluation version of Progress OpenEdge over the weekend and spent some time tinkering with it (in SQL mode, of course). It was a very interesting experience, to say the least -- Progress is quite a bit different than any other database that I've worked with. After a bit of fooling around and a lot of reading, I did discover the answer to one item that I inquired about in my original post; you can run sqldump remotely from one machine to another. It also seems rather fast, although I haven't yet attempted to compare ODBC performance in my test environment.

Anyway, enough rambling for now! Thank you again. I'll see what I can dig up regarding where the tables are ABL/4GL versus SQL, and also what sort of maintenance plans or activities may be in place that might affect pulling data via ODBC.
 

TomBascom

Curmudgeon
By default all 4GL tables are visible to SQL-92 in the PUB schema. SQL-92 tables that are created outside of PUB are not available to the 4GL.
 

medu

Member
I think the database is a SQL-only (or mostly SQL database). When connecting via ODBC, you can view all tables. There is an existing ETL which is completely designed around SQL queries written against the ODBC driver, then converted to Progress code (looks like the terminology is ABL/4GL code) for the actual extraction of text files.

If the tool access the database using ODBC driver then there is no ABL code involved, it's a completely separate broker that server the SQL clients (ODBC/JDBC).

I honestly have no clue on the maintenance performed against the tables. My assumption is that poor performance would not be tolerated, but as Tom guessed, this is a product made and sold by an ISV as a solution.

The regular ETL queries are plain selects on a single table, no joins are made against production database tables... just fetch what you need from there as fast as you can in a 'staging' area and then do what ever transformation needed there before loading to the target database. For that kind of scenario you'll hardly see any performance improvement if the tables are being analysed or not, as the one and only heavy guy already said the biggest performance improvement you can get is to switch from SERIALIZED to READ-UNCOMMITED transaction isolation level... on large tables you'll probably never be able to look all records in a running system :)

Regarding the usage of the data administration tool to potentially extract data, is that something that can be automated, run via command-line or run programmatically (perhaps via an API or some sort)? I am attempting to automate the extraction as much as possible without actually having to install anything on the Progress servers.
Yes, it can be automated but that need to be run in an progress environment... but if you leave your ETL environment and start doing custom extract you have to handle all possible errors yourself (when dumping and transporting files from one server to another if needed).

you can run sqldump remotely from one machine to another. It also seems rather fast, although I haven't yet attempted to compare ODBC performance in my test environment.
yes you can, but as far as I know the sqldump utility output can only be loaded with sqlload counterpart which means load the data in another progress database which doesn't look like being what you need... I might be wrong though, never tried to use it against other SQL database.
 
Top