Comparison Among All Data Replication Methods

Hi All,

I want to understand which data replication method is more economical to implement. Below are couple of methods (AFAIK):-

1. Pro2

Pros
- Real Time replication
Cons
- Paid Tool, big cost to company for each license

2. CDC

Pros
- Real Time replication
Cons
- Paid, cost for company for each CDC license (seems less then Pro2)
- Dumping data from change tables and loading that to target database is all manual process

3. Auditing

Pros
- Real Time replication
- I guess, no license required
Cons
- Dumping data from change tables and loading that to target database is all manual process

4. OpenEdge Replication

Pros
- Real Time replication
- No manual intervention (till the time both source and target are in sync)
- Target DB available in read only mode for reports.
Cons
- Paid, big cost to company for each license
- Both source and target should be on same LAN.

5. AI Roll Forward

Pros
- No License required
Cons
- No Real Time replication
- Target DB is down


Please suggest which method is more economical for a company to implement and should be reliable also.

Thanks and Regards,
 

Cringer

ProgressTalk.com Moderator
Staff member
First of all, Auditing requires a license. It may be it's already included in your license, but a license is required for the tool nonetheless.

That being said, and IMHO, you are approaching this all wrong. Not all data replication are created equal. You first need to evaluate what you want to achieve. What are you mitigating against. This will immediately remove some options from the list. Also, licenses are not the only cost. There is also infrastructure and training as examples. Then you need to establish what the cost of NOT implementing them is in the worst case scenario. Then you can evaluate the financial and physical costs of the solutions against the cost of not implementing them and see what the actual benefits are. It's called a cost benefit analysis.

Once you have done this you can take your analysis to the business and have your argument ready to go as to why they need to pay for something.

Also, cutting corners and using tools that are not designed for replication to do replication will cost you more than you think in R&D time and ongoing maintenance as well.
 

TomBascom

Curmudgeon
Using after imaging to maintain a hot spare probably puts you in the position of needing “DR” licenses. In which case you should just go ahead and license OE Replication since that is the same cost.

Your various “manual” points are all really places where you would code something. It’s not like you would manually run ad-hoc commands. A big consideration then becomes “how good are you (or your company) at this kind of stuff?”
 

TomBascom

Curmudgeon
It is, however, possible to pervert auditing to support replication or CDC-like data transfers. There is (or was) at least one vendor hawking such a solution.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I want to understand which data replication method is more economical to implement.
Cost considerations aside, you are equating things that are not equal. OpenEdge Replication and Pro2 both get data from one database to another in some way but they do not do the same things and they do not solve the same business need. For example, Pro2 is not intended for disaster recovery.

Replication of any sort is a solution; a means to an end. You haven't articulated the business problem you want to solve. If you want to learn what is appropriate for your needs, please start by describing your needs. What is the problem you want to solve?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
It is, however, possible to pervert auditing to support replication or CDC-like data transfers. There is (or was) at least one vendor hawking such a solution.
Well, sure, if you build something on top of auditing. If you squint, an empty field is a house. ;)

OE Auditing, with configured policies and events, writes data into audit tables in the database being audited. And it provides utilities for manually dumping that data to binary dump files and for loading those dump files into the audit tables of another database. That's where it ends. It's a long way from there to turning that data into anything resembling the original data, in application tables.

These days, it would be just about the worst way to get application data from one OE database to another.
 

TomBascom

Curmudgeon
1. Pro2

Pros
- Real Time replication
Cons
- Paid Tool, big cost to company for each license

There are (at least) two flavors of Pro2. The original trigger based approach and the more modern CDC based approach.

One weakness with the trigger based code is that it is unaware of changes that might be made by SQL connections. The CDC based version does not have this problem.

Generally speaking you use Pro2 for ETL purposes. In theory, you _could_ use it for whole database replication but that would be an awful lot of setup and ongoing maintenance.

Because Pro2 (and similar approaches) use many "queues" to perform the data transfer there can also be issues with synchronization. It is quite common for some queues to significantly lag behind others.

A nice "pro" for Pro2 is that the target database can be either an OpenEdge database or a SQL database.

Pro2 depends on RECIDs to track the data that has been replicated. This means that anything which results in new RECIDs (like a dump & load) will require you to start over again with a fresh "bulk load".


2. CDC

Pros
- Real Time replication
Cons
- Paid, cost for company for each CDC license (seems less then Pro2)
- Dumping data from change tables and loading that to target database is all manual process


CDC has many nuances. Its purpose is to efficiently support ETL tasks rather than general purpose whole database replication. It is very good at efficiently identifying those changes.

As you mention, there is no pre-built set of tools to act on the change records that CDC creates. Several vendors offer solutions built on top of CDC or you can roll your own code.

CDC is better than triggers because it sees all changes, even SQL changes, and acts at a very low level.

CDC has several levels of detail and if you implement "maximum" you can potentially avoid the issue of needing to "re baseline" or "bulk load" your target. (The CDC based Pro2 uses the "minimum" level at this time.)


3. Auditing

Pros
- Real Time replication
- I guess, no license required
Cons
- Dumping data from change tables and loading that to target database is all manual process


Using auditing for replication or change data capture is a stretch. Yes, you can sort of do it and yes there is at least one commercial product that offers such a capability but you're really abusing the tool at that point.

OpenEdge Auditing requires management of the audit logs. You need to archive and purge the audit data on a regular basis or you will very quickly be in a difficult position. We have had to deal with multiple cases where inattention to basic audit log maintenance resulted in databases where the vast majority of the data is in the audit areas. Among other things this means that your backups and restores are huge and take far too long. Purging that data after it becomes huge is also deeply painful. In one case we had to run the purges every night for almost six months to get things under control.

OE Audit logs are, by default, poorly indexed. The data itself is also not well organized for ETL or replication purposes. Interpreting and extracting the data is painful compared to CDC or trigger based solutions.

Pre-CDC, perverting OE Auditing had some attractions because, like CDC, it acts at a low level and sees SQL changes as well as 4gl changes.


4. OpenEdge Replication

Pros
- Real Time replication
- No manual intervention (till the time both source and target are in sync)
- Target DB available in read only mode for reports.
Cons
- Paid, big cost to company for each license
- Both source and target should be on same LAN.


If you need a hot spare for high availability or disaster recovery purposes OE Replication is what you want.

The entire database is replicated. The replication process occurs at a low level and is based on after-image notes. So it tracks all changes from both 4gl and SQL sources. OE Replication automatically handles schema changes - adding new tables, fields, and indexes "just happens".

There is, however, an unfortunately long list of maintenance tasks that still force you to "rebaseline" replication. On the bright side that list gets shorter and shorter with every release of OpenEdge.

OpenEdge Replication is all or nothing -- you are replicating the entire database in real time. You cannot exclude certain tables and you cannot delay transactions. This means that if a user accidentally deletes all of your orders that happens on both the source and on the target(s). So you still need to manage your after-image files and be prepared to restore and roll-forward. (This is true of *all* of these solutions but it is frighteningly common for people to somehow think that after-imaging does not matter any more if they are using OE Replication.)

If you have OpenEdge Replication Plus the target databases can be used as read-only connections for reporting or inquiry purposes taking some of the load off your production database.


5. AI Roll Forward

Pros
- No License required
Cons
- No Real Time replication
- Target DB is down


Target db must *stay* down. If you open it you get to start over.

As I mentioned previously you are not exempted from needing a proper DR license. If you have OpenEdge installed on another server and you are rolling forward after-image files to have a warm spare then you need that license.

Mentioned elsewhere is the idea of restoring a db and running BUFFER-COMPARE to find changes

Obviously this is not real time.

It also requires someone to write some code, manage, and maintain the whole process.

The comparisons are probably going to need to look at a *lot* of data and unless your schema has been cleverly organized in advance you will be scanning the entire table every time (you are very unlikely to have a "last change" field helping you to determine where to start scanning).

The comparisons will take longer and longer as the database grows.

None the less - this might be workable for small databases with a modest number of tables and now need for real time extracts.
 

TomBascom

Curmudgeon
Another replication solution is to attack it from the OS or SAN level using snapshots or some other technology such as "v motion".

This can seem attractive because it is outside of the OpenEdge environment and may not seem to require much administration or coding.

However:

1) You cannot trust these technologies to "just work". *All* of the database must be carefully synchronized with the snapshot. Vendors will, of course, claim that is no problem and that they do that. Most of them are lying.

2) These solutions often appear to work in test environments. Would anyone like to guess when they fail?

3) To successfully and reliably use a snapshot or similar technique you need to properly enable and disable database quiet points. This requires scripting that not only issues the proquiet commands but which also monitors the .lg file to ensure that the commands have taken affect before you execute the snapshot.

Like OE Replication this technology is all or nothing. If carefully implemented it can be useful for DR and HA purposes but it is probably not helpful for ETL purposes.

You still need properly managed backups and after-imaging. You may also still have licensing issues to consider.
 
Last edited:
There are (at least) two flavors of Pro2. The original trigger based approach and the more modern CDC based approach.

One weakness with the trigger based code is that it is unaware of changes that might be made by SQL connections. The CDC based version does not have this problem.

Generally speaking you use Pro2 for ETL purposes. In theory, you _could_ use it for whole database replication but that would be an awful lot of setup and ongoing maintenance.

Because Pro2 (and similar approaches) use many "queues" to perform the data transfer there can also be issues with synchronization. It is quite common for some queues to significantly lag behind others.

A nice "pro" for Pro2 is that the target database can be either an OpenEdge database or a SQL database.

Pro2 depends on RECIDs to track the data that has been replicated. This means that anything which results in new RECIDs (like a dump & load) will require you to start over again with a fresh "bulk load".





CDC has many nuances. Its purpose is to efficiently support ETL tasks rather than general purpose whole database replication. It is very good at efficiently identifying those changes.

As you mention, there is no pre-built set of tools to act on the change records that CDC creates. Several vendors offer solutions built on top of CDC or you can roll your own code.

CDC is better than triggers because it sees all changes, even SQL changes, and acts at a very low level.

CDC has several levels of detail and if you implement "maximum" you can potentially avoid the issue of needing to "re baseline" or "bulk load" your target. (The CDC based Pro2 uses the "minimum" level at this time.)





Using auditing for replication or change data capture is a stretch. Yes, you can sort of do it and yes there is at least one commercial product that offers such a capability but you're really abusing the tool at that point.

OpenEdge Auditing requires management of the audit logs. You need to archive and purge the audit data on a regular basis or you will very quickly be in a difficult position. We have had to deal with multiple cases where inattention to basic audit log maintenance resulted in databases where the vast majority of the data is in the audit areas. Among other things this means that your backups and restores are huge and take far too long. Purging that data after it becomes huge is also deeply painful. In one case we had to run the purges every night for almost six months to get things under control.

OE Audit logs are, by default, poorly indexed. The data itself is also not well organized for ETL or replication purposes. Interpreting and extracting the data is painful compared to CDC or trigger based solutions.

Pre-CDC, perverting OE Auditing had some attractions because, like CDC, it acts at a low level and sees SQL changes as well as 4gl changes.





If you need a hot spare for high availability or disaster recovery purposes OE Replication is what you want.

The entire database is replicated. The replication process occurs at a low level and is based on after-image notes. So it tracks all changes from both 4gl and SQL sources. OE Replication automatically handles schema changes - adding new tables, fields, and indexes "just happens".

There is, however, an unfortunately long list of maintenance tasks that still force you to "rebaseline" replication. On the bright side that list gets shorter and shorter with every release of OpenEdge.

OpenEdge Replication is all or nothing -- you are replicating the entire database in real time. You cannot exclude certain tables and you cannot delay transactions. This means that if a user accidentally deletes all of your orders that happens on both the source and on the target(s). So you still need to manage your after-image files and be prepared to restore and roll-forward. (This is true of *all* of these solutions but it is frighteningly common for people to somehow think that after-imaging does not matter any more if they are using OE Replication.)

If you have OpenEdge Replication Plus the target databases can be used as read-only connections for reporting or inquiry purposes taking some of the load off your production database.





Target db must *stay* down. If you open it you get to start over.

As I mentioned previously you are not exempted from needing a proper DR license. If you have OpenEdge installed on another server and you are rolling forward after-image files to have a warm spare then you need that license.



Obviously this is not real time.

It also requires someone to write some code, manage, and maintain the whole process.

The comparisons are probably going to need to look at a *lot* of data and unless your schema has been cleverly organized in advance you will be scanning the entire table every time (you are very unlikely to have a "last change" field helping you to determine where to start scanning).

The comparisons will take longer and longer as the database grows.

None the less - this might be workable for small databases with a modest number of tables and now need for real time extracts.

Thanks a lot Tom for this detailed explanation, really appreciate it!

My all doubts are clarified from your explanation above except one regarding DR license. Why DR license is required because DR is all about below three steps and doesn't require license (AFAIK):

1. Enable AI imaging, Archiver on source (With Enterprise Edition)
2. Transfer of AI files to DR server
3. Applying AI files on DR server manually or via some script
4. Done

Best Regards,
 

Cringer

ProgressTalk.com Moderator
Staff member
You need some sort of license for the database on the second machine. A DR license is what you need to do that. It will be cheaper than a full license.
 
Top