Can we do tabanalys while the DB is Up and Running

Reynold · Aug 28, 2018

Hello All -

I am little new to the DB admin tasks and I need your expert advice.
I need to count the Database Table Records for the production DB. So could anyone suggest whats the best way to do so.
I know using tabanalys we can get the record count of each table in the DB.

proutil sports2000 -C tabanalys

BUT the big question is .. will the tabanalys will work on the up and running DB and if yes do I get the correct results.
And If I need to run on the up and running DB will I need to modify the above command statement with some extra params...

could anyone please suggest, as I need some help little urgent....

Thank You ...

TomBascom · Aug 28, 2018

Yes, it will run.

The results are, of course, an approximation. Records may have been added or deleted as the counting occurs.

Why is it urgent? What problem do you think you can solve by knowing the record count?

jdpjamesp · Aug 29, 2018

proutil sports2000 -C tabanalys > tabanalys.txt
That will ensure the output goes to a file for later use.
Tabanalys is the best way of counting records in the database. It will have an impact on performance though. Whether it's noticeable or not to the users depends on a lot of factors though so we can't say, just beware.
Also, a slight word of warning, if it tries to read a corrupt block, then, like any other process, it will shutdown the DB. The thing is, it will read every block in the database unlike day to day operations, so if there's corruption you will know about it. Unlikely, but something to be aware of.

Rob Fitzpatrick · Aug 29, 2018

In general, it is helpful to tell us your Progress version when asking questions. Some options may not be available to you depending on your release.

Reynold said:
I need to count the Database Table Records for the production DB.

Be careful in production. While it is true that proutil tabanalys is read-only, i.e. it doesn't make any changes to the data stored in the database, you shouldn't assume you can run it any time you like. It does make changes to the state of an online database and to the system on which it runs.

The database has a cache called the buffer pool. Its size is specified by the -B parameter on the primary broker (and optionally also by the -B2 parameter, on 10.2B+). Whenever a client or utility accesses data, like a record or an index key, the database block on disk that contains that data is copied into a buffer in the buffer pool, where the needed data is extracted and provided to the requester. That block then remains in the buffer pool until it too is evicted to make space for another block. If another request is made for data in that block while it is in the buffer pool, then it can be accessed very quickly because it is already in memory; no disk I/O is required. The duration of a buffer pool access is measured in microseconds; a disk I/O could take many thousands of times longer and could be affected by many other factors.

All reads and writes go through the buffer pool, so both its size and its content are tremendously important to the overall performance of the connected applications. Its size is determined at startup by the DBA, via the -B and -B2 parameters. If it is too small, it won't be able to cache enough of the data useful to the application; the data will have to be re-read from disk frequently.

The content of the buffer pool changes constantly, based on demands from connected processes. The principle is that data that has been accessed recently is likely to be accessed again soon so it is useful to have it cached for faster access. (Computer science wonks call this "temporal locality of reference".

) Thus it is possible for a single client to temporarily "foul" the buffer pool by reading a lot of data into the buffer pool that will be of no use to any other client, and evicting the data that would be useful to them. Then their subsequent reads result in cache "misses" (their requested data is not in the buffer pool) and they have to wait longer for their data due to greater disk I/Os to resolve their queries.

This brings us back to proutil tabanalys. As a database client, the above applies to it as well. Most of the data blocks in your database are record blocks and a proutil dbanalys has to read all of them to produce its report. So they all get loaded into the buffer pool and, if it is full, an equivalent number of blocks have to be evicted from the buffer pool to make space for them. The larger the database is in relation to the size of the buffer pool, the greater the effect. This can be doubly problematic for other clients. First, as stated above, their data is now less likely to be in the buffer pool so they will likely need to wait for disk I/O. Second, the proutil is likely causing a lot of disk I/O while it is running, so those clients are having to do many more disk I/Os than usual just at the time when disk I/Os are most time-consuming.

The bottom line is that if your application requires low latency, e.g. if you have an SLA for response time, running a mass read like a proutil tabanalys or dbanalys can result in real impact to the business. This isn't theoretical, I've seen it happen. So depending on your situation, you may want to run your analysis at off-peak hours to minimize impact to production.

Reynold said:
And If I need to run on the up and running DB will I need to modify the above command statement with some extra params...

You can mitigate some of this impact with tuning but not all of it. The proutil analysis commands cause a lot of I/O by their nature; you can't avoid that (with one exception, below). But you can deal with the problem of buffer pool fouling.

The proutil tabanalys is a database client so you can specify a client startup parameter in its command line. Specifically, the useful parameter here is "private read-only buffers", or -Bp. It is documented in the Startup Command and Parameter Reference manual. It specifies the number of buffers in the buffer pool to be used only by this client for read-only access. For example:
proutil dbname -C tabanalys -Bp 10 > dbname.tba
This runs a table analysis against database "dbname" with 10 private read-only buffers, saving the report to the file dbname.tba. The impact to caching is that the utility reuses the same ten buffers for its reads, regardless of the size of the database. Here is some output from promon to illustrate the difference:

Code:

after proserve:

08/29/18        Status: Buffer Cache
03:43:43

Total buffers:                     3002
Hash table size:                   887
Used buffers:                      39   <---
Empty buffers:                     2963

after dbanalys:

08/29/18        Status: Buffer Cache
03:46:18

Total buffers:                     3002
Hash table size:                   887
Used buffers:                      1028   <---
Empty buffers:                     1974

after restart and dbanalys -Bp 10:

08/29/18        Status: Buffer Cache
03:48:54

Total buffers:                     3002
Hash table size:                   887
Used buffers:                      49   <---
Empty buffers:                     2953

After starting this sports database there were 39 used buffers in the buffer pool. After a dbanalys the number of buffers jumped to 1028. But after a database restart, a dbanalys -Bp 10 caused the used buffers to go from 39 to 49. The rest of the buffer pool wasn't touched.

Another way to limit the effect of an analysis command is that it can be run on a single storage area:
proutil dbname -C tabanalys area areaname -Bp 10 > dbname.tba
If you have a very large database and you only need record counts for some tables, this will reduce the amount of I/O done by the analysis.

Can we do tabanalys while the DB is Up and Running

Reynold

Member

TomBascom

Curmudgeon

jdpjamesp

ProgressTalk.com Moderator

Rob Fitzpatrick

ProgressTalk.com Sponsor