purge and resize db

parttime Admin

New Member
Hi Everyone,

I‘m an openedge admin in the making
Progress 11.7.5, Runtime Version
Db disk-size around 700GB (663 GB according to promon), lots of historic Data (running since 15 yrs)

Recent Database Block Usage Summary
Active blocks: 170322260
Data blocks: 170319324
Free blocks: 2936
Empty blocks: 3545172
Extent blocks: 46
Total blocks: 173867432

Recently upgraded from Progress 10.something and spent dozens of hours testing, restoring, copying.
I’d like to purge old data to a) clean up the application and b) being able to subsequently resize the db.
This should speed up backup, restore, copy, idxrebuild and every other db administration task and save my nerves
The data to delete is spread all over the storage areas and all large tables are affected

That’s my layman’s plan
  • Stepwise delete data from within the application (no other access possible, will take weeks anyway)
    Check prostrct db statistics: total Blocks/Area should grow at usual rate, empty Blocks should become significantly more
  • When done deleting:
    Check Percentage of empty Blocks/totalBlocks for each area (single digit at the moment , should rise to appr. 30-40% dep. on how recent data mgmt allows to delete )
  • Create new .st file with smaller/fewer extents in \\testserver\dbfdir
    Size of fixed extents for each area should be around Blocksize x Active Blocks * 1.01
    add 1 variable extent/area
  • Restore recent production.backup on Testserver with new .st file
  • Maybe run idxrebuild (~ 2h) ?
  • Roll forward .ai Files since last BU from production-db until test-db is roughly up to date (few minutes lag won‘t harm)
  • Stop both dbs (test and production)
    this hurts! 24/7 operation with a pain limit of 30 min max. downtime during nighttime
  • Copy db files from \\testserver\dbfdir to \\production\dbfdir (@approx. 500 MB/s -> current DB ca. 22 min, „new“ smaller DB quicker)
  • Start production db and subsequent services
  • Enable aiarchiver again
My question now :
is this a reasonable approach or should i consider alternatives like dump&load,..??

Thank you!

btw here’s a promon snapshot

1703687738632.png
 
Sorry, I do not think it will work...
The backup will restore each records in its current block number (it does not eliminate the empty blocks)
You will have the same number of blocks in each areas (even if you could use different extent size or number ).

If you run an idxbuild you will lost the ability to rollfowrward

You should consider alternatives like dump&load,PRO2, ...
 
As Patrice says - this won't work as you wish. When you delete data you also probably won't create much in the way of "free" and "empty" blocks so you are not going to see much evidence in prostrct db statistics either.

To reclaim disk space you will need to dump & reload. And if you are going to do that you should invest some time in making sure that the new db structure makes sense for your actual data. A lot has probably changed in 15 years so revisiting what tables are assigned to which storage areas and with what settings (rows per block, blocks per cluster, create & toss limits) will result in a much better outcome.

I suggest that you download ProTop, and run the "dbanalys review & reports" (control-r). This will give you lots of food for thought regarding changes you might want to consider.
 
Your table and LOB data lives in record blocks (sometimes called "RM blocks"). Once a block is formatted as a record block, it remains a record block. Even if you delete all the record fragments it contains, while the the record block is in a sense logically empty, its block type does not change. It does not become an empty block. Empty blocks are found above the high water mark of an area, so they are separate from the region in the area where your data currently resides.

Also, purging should be approached carefully, e.g. with a utility provided by the application developers, that will preserve referential integrity, so you do not inadvertently create orphan records and prevent the application from working as users expect.

If you are relatively new to OpenEdge database administration, it's a good idea to review the basics. This is an excellent overview from Tom Bascom on how to optimize your storage areas:
http://download.psdn.com/media/exchange_online_2010/1004TomBascom.wmv

There are also many threads here with advice on how to design your structure for a dump and load. Here is one example:
https://www.progresstalk.com/threads/dump-and-reload-can-i-reduce-the-fixed-database-size.172269
 
Hi everyone,

Great food for thought, thank y'all. I really appreciate your expertise and willingness to help
  • Size and number of extents are up to date, reflecting the application structure and following manufacturers guidelines during last major upgrade to 10.7
  • Purging is done by means of the application, so full referential integrity is ensured. That's the reason it takes so long btw.
  • Dump&load is the way to go -> I got this. Will be next year's challenge, looking forward to maaany test runs and learning a lot
  • My main concern is downtime when taking that final step to production:
    You think the last 5 steps of my original post are reasonable ?
..edit: damn...
1703757544974.png
 
Last edited:
You might want to share who the manufacturer is. Someone may be familiar with them and have specific advice. In any case, OEM provided storage area configurations are frequently "substantially less than optimal".

Is "10.7" the application version number? (That's not a valid OpenEdge version...)

Required downtime is mostly a function of available hardware. You have less than a terabyte of data so on "reasonable" hardware the process should take an experienced OpenEdge DBA under 12 hours. On the other hand, if your server is a bean-counter special it might take weeks.
 
You should also think real hard about the value you are going to get by deleting data. Once deleted, it is no longer available for query and reporting ... you are losing information. Disk is cheap.
 
Data _can_ be a liability. Keeping it beyond legally mandated record retention periods can expose you to needing to provide it during legal "discovery" processes. Which can be fairly onerous in terms of work to be done to extract it, review it, and prepare it for whoever wants it. Whereas if you follow an established data retention policy that purges old data you can avoid all of that.
 
Data _can_ be a liability. Keeping it beyond legally mandated record retention periods can expose you to needing to provide it during legal "discovery" processes. Which can be fairly onerous in terms of work to be done to extract it, review it, and prepare it for whoever wants it. Whereas if you follow an established data retention policy that purges old data you can avoid all of that.
... and if you do have such a policy to follow - make sure to destroy the old backups.
 
I’d like to purge old data to a) clean up the application and b) being able to subsequently resize the db.
This should speed up backup, restore, copy, idxrebuild and every other db administration task and save my nerves
Data. There are only two types of Data. Data you have, and Data you don't have. There is no old.

Issues regarding "speed" are relative to database size, but that's not the only factor. Everyone here is going with the assumption that database size is the bottleneck.

How long does it take to take a database backup of this 700GB database ?
 
Issues regarding "speed" are relative to database size, but that's not the only factor. Everyone here is going with the assumption that database size is the bottleneck.

The original post says:

This should speed up backup, restore, copy, idxrebuild and every other db administration task and save my nerves

True, the concern is the speed of those tasks and there are ways to improve that stuff which do not involve purging data but I actually found this rationale to be a refreshingly solid set of reasons for wanting to purge. The more typical reasons usually boil down to either "we have always done it, so it must be important" or "it will obviously improve OLTP performance" (in some unspecified and unlikely to be true way). So it is a nice change of pace to see someone having reasonable reasons for wanting to purge their data.

FWIW, I'm less sold on "save my nerves". I don't see how making the database smaller would help with that.
 
Hello,
  • Old Version was 10.2B (sorry)
  • Daily full backup takes 90 min (online) WIN19,130GB RAM, 12*2 cores at 3 Mhz , VMWare7.1 (no system expert though)
  • I keep an "archive instance" of db and Application on a dedicated server with all data from day one to end of 2023.
  • However data retention could become a potential problem, being in the EU and handling personal information.
    That's not my main concern at the moment though, but an argument nevertheless (no clear legal policy right now in my country)
  • Concerning my nerves, let me put it this way
    "...This should speed up backup, ...TRAINING RUNS and therefore skill-building and thus save my nerves in the long run".
  • 12 hrs downtime is not possible. Like really not.

    Would deleting and "emptying data blocks" at least free space for future growth? Better than nothing?
    System is ever growing anyhow, as more recent data is stored than the amount of historic data, that could safely be deleted, because our business is growing.
 
I don't think you have described your database structure in detail yet, e.g. whether your application data is in Type 1 or Type 2 storage areas. This is relevant for this discussion, as they do space management differently.

What is the issue that makes you set on freeing space for growth? Even if you delete whatever "could safely be deleted", that's a one-time gain. It might slightly postpone having to address the problem but it isn't a solution. A business accumulates data and a growing business accumulates data at a growing rate. Your employer eventually needs to make it a business priority to address that fact head-on. As Tom said, better hardware can yield improved performance even without deleting any data. A platform upgrade may be a better use of time and money than purging data.

Alternatively, if you are looking for an efficient way to purge large amounts of historic data from certain fast-growing tables, you may want to look into the Table Partitioning add-on product licence for the OpenEdge Enterprise RDBMS. There is a learning curve but it may be suitable for your needs.

https://www.progress.com/openedge/components/table-partitioning
https://docs.progress.com/bundle/openedge-database-table-partitioning-117/page/Introducing-Table-Partitioning.html
 
  • Type II
  • safely means old enough to no longer be required in current business, not in technical sense
  • huge 1st time purge would be followed by regular incremental purges to keep only data up to a certain age.
  • Concerning growth, i would reason: delete 30-40% of total data (acquired more than 10yrs ago over the course of 10yrs)->
    at least make room for the next 5,6, .. years.
    or will "empty" (recordblocks with no content) blocks stay empty forever and new growth will only fill the last extent requiring new blocks?
 
Last edited:
The space in a block freed by deleting records will be reused as new records are added (and if old records grow).

For index blocks you need to run idxcompact to reclaim the space. (That can be done online.)

There are many ways to dump and load. Some methods can greatly reduce the required downtime. A lot depends on your resources and your appetite for adventure/complexity/expense… my 12 hour comment was a worst case estimate for a straight-forward standard d&l.
 
If you like adventure, complexity (not too much) , ...
you could have a look at Nectar's breakout session at the last PUG on Online Dump-load .
(The ZERO downtime Production Dump and load - PUG Challenge)

It will allow you to:
- Dump-Load on line
- Change your RPB for each table
- have small downtimes to do it

There is some tricks to reclaim free space .

There is another step to add to his his breakout session.
This next step could allow you :
- to reclaim the space for your futur deletion plane (with small downtime but you will have to realign your backup).
- decrease the elapse time to "reorganize" your DB for the next years


Patrice
 
Just be aware that that technique requires the horizontal table partitioning add-on and that could create problems with older code if the old code uses RECID in inappropriate ways (almost all usage of RECID is inappropriate).
 
Back
Top