AI Performance

ron

Member
We've just moved from:

8.3E, 2K db blocksize ... no AI ... no VST

to:

9.1D, 8K db blocksize ... WITH AI ... WITH VST (incl table stats)

... and performance has taken a very sharp hit (batch jobs take 4 times longer). We're poking-into everything to see if we can improve things - but I thought there might be someone "out there" who would like to comment on our situation.

We are running on a Sun V480 with Solaris 8. We have 16 GB of memory and the DB is on a RAID 1+0 (s/w) array spread over eight discs (+8 for the mirror). There are six DBs - but one dominates, at 110 GB.

BI uses a 16K blocksize - and is on a solo disc (mirrored).
AI is also 16K blocksize - also on a solo disc (mirrored).

We've set-up variable AI extents - because fixed ones (in testing) caused a space problem once they had reached their high-water mark (HWM).
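For reference, fixed and variable AI extents are just different "a" lines in the structure (.st) file. A rough sketch is below - the paths and sizes are purely illustrative (not our real ones), and the exact syntax should be checked against the prostrct documentation for your release:

# Fixed AI extents: five fixed extents plus a variable overflow (sizes in KB)
a /ai/prod.a1 f 512000
a /ai/prod.a2 f 512000
a /ai/prod.a3 f 512000
a /ai/prod.a4 f 512000
a /ai/prod.a5 f 512000
a /ai/prod.a6

# Variable AI extents: no size given, so each extent grows until it is swapped
a /ai/prod.a1
a /ai/prod.a2
a /ai/prod.a3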

Below is a typical iostat display when batch work is running. md14 is a single disc for AI; md17 is another single disc for BI -- and md23 is the DB (8 discs).

Clearly the AI %b is a worry, at 49%. During batch work it hovers between about 45% and 60%.

I have three particular questions:

(1) Is it normal for I/O activity on AI to be so much higher than on BI? I would have expected them to be closer.
(2) I know there is a performance penalty for using variable AI extents vs fixed extents. Can anyone quantify it? Is it a minor, or MAJOR concern?
(3) All references indicate there is no performance penalty for having VSTs. Is this true?

Thanks to anyone who can shed light on this matter ...
Ron.


                     extended device statistics                     tty        cpu
device    r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b   tin tout  us sy wt id
md0       0.0    1.3    0.0    8.8   0.0   0.0   24.2   0   2     0   61  19  5 15 61
md14      0.0   71.4    0.0  570.6   0.0   0.5    7.0   0  49
md17      0.0   25.1    0.0  200.1   0.0   0.1    5.8   0  14
md23     22.5   51.5  180.5  415.9   0.1   1.0   14.0   4  23
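(For anyone wanting to reproduce this, the display comes from Solaris iostat; something like the following - using the metadevice names above, which are obviously site-specific - keeps just the interesting devices on screen while batch work runs:)

# Extended device statistics every 5 seconds, filtered to the AI, BI and DB metadevices
iostat -x 5 | egrep 'device|md14|md17|md23'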

Here are the startup parameters for the major db:

-B 22500 # Blocks in Database buffers
-L 200000 # number-locks
-n 150 # Users
-spin 5000
-rr
-basetable 1
-tablerangesize 307
-baseindex 1
-indexrangesize 1093
-bibufs 30
 
1. Yes, if it has to extend the AI extent each time. Extending the AI is a significant overhead on top of the actual AI write.

2. The performance penalty will depend on disk bandwidth and utilisation, but it will always be there. Whether or not it is significant will depend on the app, etc.

3. Nearly all VSTs carry no performance hit. However, the index-range and table-range VSTs are only enabled when you specify the range parameters (which you have done), and even then I would expect that there is little or no penalty now.

Comments on tuning (if I may!)
175 MB of buffers (-B 22500 x 8 KB blocks) for a 110 GB DB seems low - what is the buffer hit rate?
-L 200000 - OUCH
-spin 5000 - I would double this at least
-rr is a client session parameter - no good on the server
-bibufs 30 - Checkpoints? BI cluster size?

-aibufs - if AI is a problem then raising this may help
-Mf may help as well, but I wouldn't be sure.
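Pulling those suggestions together, a revised parameter file might look something like the sketch below. The values are illustrative starting points only (in particular, -B should be driven by the observed hit rate and -L by actual peak lock usage), not recommendations:

-B 100000            # database buffers - raise until the hit rate stops improving
-L 8192              # lock table entries - size to real peak usage rather than 200000
-n 150               # users
-spin 10000          # at least double the current 5000
-bibufs 50           # raise further if BI buffer waits appear
-aibufs 50           # raise further if AI buffer waits appear
-basetable 1
-tablerangesize 307
-baseindex 1
-indexrangesize 1093

(-rr is dropped because it belongs on the client session, not the server.)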

Try reducing your BI and AI block size to reduce the size of partial writes (if partial writes are being incurred).

Patch 9.1D06 is out for Solaris and may contain some happiness for you.

Also - you haven't mentioned -S for servers, so there will probably be a -T (temporary file directory) on the clients - where is that pointing?
 

ron

Member
toby.Harman said:
Patch 9.1D06 is out for Solaris and may contain some happiness for you.

Patch 9.1D06 contained MUCH HAPPINESS !! Performance has now improved by a major degree.

We are back to experimenting with AI. We tried Variable AI extents and PROVED that the performance was slugged badly. But it did give us very trim archived copies of the AI extents.

Now we're trying Fixed extents. Performance is substantially better ... but is giving us major problems with disc space for archived copies.

Sometimes it's hard to win!
 
I am happy that the fundamental problem has gone away and you can now concentrate on the more trivial issues (like Disk utilisation!) :)

Any comments on any of the other settings which you would like to share or is everything so perfect now that you want to just leave it all alone?!
 

ron

Member
Norman Biggar said:
Ron,
Obvious question for someone starting off with After Imaging - Are you using an AIW?

Yes, Norman, AIW is loaded ... and 30 aibufs (same as bibufs).

Now that we've applied 9.1D06, performance has improved by a major degree. Using Fixed AI extents, AI no longer has much performance impact ... but it gives us troubles with the sizes of the archived extents. We're in "think mode" trying to solve this.
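For anyone else setting this up on Unix, the writers are simply started after the broker - a minimal sketch, with an illustrative database name and parameter file:

# Start the broker, then the helper processes, for an example database "prod"
proserve prod -pf prod.pf
probiw prod          # before-image writer
proaiw prod          # after-image writer
proapw prod          # asynchronous page writer (one or more)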

Thanks for your input.
Ron.
(PS: Where's Motherwell?)
 
Ron,

Apart from compressing your archived AI's and deleting old files once you've done your backup (whether full or incremental), the only obvious solution is to get more disk space.

As for where Motherwell is, it's an ex-steel producing town 15 miles south-east of Glasgow, Scotland. I don't think you'd want to go there on holiday!
 

ron

Member
Norman Biggar said:
Ron,

Apart from compressing your archived AI's and deleting old files once you've done your backup (whether full or incremental), the only obvious solution is to get more disk space.

As for where Motherwell is, it's an ex-steel producing town 15 miles south-east of Glasgow, Scotland. I don't think you'd want to go there on holiday!

Oh, they're compressed alright ... but we have 5 x 500 MB Fixed extents ... because batch activity can generate a "huge" flow of AI. Trouble is, when we swap a not-full extent on a time-schedule the extent still has a whole lot of trash in it from when it was last filled-up. Thus when it's compressed it ends-up being a "big" file.

When we tried using Variable extents the total space occupied by compressed archived AI extents was about 300 MB/day (which is "good"). Now - using Fixed extents - we end-up with 2.3 GB/day (which is "NOT good"). It's the same "effective" data - but just consumes seven times the amount of disc space.

We're considering shutting-down the eight databases at (say) 4.30am each day and re-creating all the AI extents to "clear them" so that they will compress properly. But no decision's been made yet ... we're still contemplating.
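For context, each scheduled swap/archive cycle amounts to something like the sketch below (one database shown, paths illustrative; the rfutil qualifiers and their exact output should be verified against the 9.1D documentation before scripting against them):

#!/bin/sh
# Time-based AI swap/archive for one database - a sketch only
DB=/db/prod
ARCH=/archive/ai
STAMP=`date +%Y%m%d%H%M`

# Switch writes to the next extent ...
rfutil $DB -C aimage new

# ... then copy, compress and re-empty the oldest full extent (if any)
EXTENT=`rfutil $DB -C aimage extent full`
if [ -n "$EXTENT" ]
then
    cp "$EXTENT" $ARCH/prod.$STAMP.a
    compress $ARCH/prod.$STAMP.a
    rfutil $DB -C aimage empty "$EXTENT"
fi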

Ron :drink:
 
I always thought the "rfutil aimage empty" command would delete the old information. We do this after we've backed up the extent.
 

ron

Member
Norman Biggar said:
I always thought the "rfutil aimage empty" command would delete the old information. We do this after we've backed up the extent.

No it doesn't "delete" it at all, it just hangs around to fill-up my file system!

aimage empty just "logically" resets the extent ... but leaves all the data in situ. If the Fixed extent has been recently created then the end of the extent will be blank ... and it compresses into nothing. But after a day or two of activity nearly all the extents have become full (or substantially full) at some time ... so every time they get compressed the old junk gets compressed along with the "good" data in the front of the file.
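The effect is easy to demonstrate outside Progress. A rough sketch (nothing Progress-specific - it just shows why stale bytes defeat the compressor; it assumes some source of incompressible data such as /dev/urandom is available):

#!/bin/sh
# A 500 MB file full of old, random-looking data compresses to nearly 500 MB,
# while a freshly created extent that is mostly zero bytes compresses to almost nothing.
dd if=/dev/urandom of=old_extent bs=1024k count=500    # stands in for a once-full extent
dd if=/dev/zero    of=new_extent bs=1024k count=500    # stands in for a freshly created one
compress -v old_extent new_extent
ls -l old_extent.Z new_extent.Z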

(I hope that makes sense!)

Ron.
 

ron

Member
toby.Harman said:
I am happy that the fundamental problem has gone away and you can now concentrate on the more trivial issues (like Disk utilisation!) :)

Any comments on any of the other settings which you would like to share or is everything so perfect now that you want to just leave it all alone?!

I would NEVER claim that anything about maintaining a big database was "perfect"!

Now that we're collecting lots of statistics (VSTs, iostat, vmstat, etc), we're having a play with a few things. -B is 27500. We've tried substantially increasing it ... and there was no change to anything. Our buffer hit ratio very rarely goes below 100:1 ... and often stays well over 500:1 ... so I guess it's no surprise that increasing -B had no effect. -aibufs/-bibufs are both 30. Next week we'll be seeing what happens if we bump them both up - say, to 50. We'll also try reducing the AI/BI blocksize to 8K (it's 16K now). But the incidence of partial writes is rather low - so I doubt that we'll see any difference with that, either.

I'll let you know the outcome. (It will take us about a month - because there are other things on the boil.)

Ron.
 
If compressing the archived AI's isn't the answer, how about taking more regular online incremental backups.

I would expect that they would only contain the changes and (once verified) would allow you to "lose" some of your archived after image files.
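(In other words, something along these lines per database, with an illustrative backup path - the incremental only contains blocks changed since the previous backup:)

# Online incremental backup of one database
probkup online /db/prod incremental /backup/prod.inc1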
 

Corwin

New Member
From previous posts you are definitely doing "time" based archives instead of "full"....

Time: Every x minutes you say "move to next ai extent".
Full: Every x minutes you check to see if you have any full AI extents.

I like using the Full method for a couple of reasons:

1) No wasted space! You don't get a lot of copies of partially filled AIs.
2) There was a bug at one time in marking AIs full on the fly. It was very infrequent, but if you do them enough times it would happen (v8.3, Solaris, don't have the rest of the details, was awhile ago...).

So think about:
1) Using lots of small fixed-length AI extents, so that at peak volume you get 2-3 extents filled in "x" minutes (15?).
2) Have your process check for full ones (iteratively) rather than just marking them full.
3) Still use a "Time" based method for off hours (low volume work being done in the middle of the night is an example).

This should not change your performance or recovery ability, and should decrease your overall size.
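In script form the Full method is just a polling loop run from cron every x minutes, along these lines (paths illustrative, and the usual caveat about checking the rfutil qualifiers for your release):

#!/bin/sh
# "Full"-based archiving: drain every extent that has filled since the last run
DB=/db/prod
ARCH=/archive/ai

while :
do
    EXTENT=`rfutil $DB -C aimage extent full`
    [ -z "$EXTENT" ] && break            # nothing full - done until the next run
    STAMP=`date +%Y%m%d%H%M%S`
    cp "$EXTENT" $ARCH/prod.$STAMP.a
    compress $ARCH/prod.$STAMP.a
    rfutil $DB -C aimage empty "$EXTENT"
done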

Have fun thinking about it!
 

ron

Member
Corwin said:
From previous posts you are definitely doing "time" based archives instead of "full"....

Time: Every x minutes you say "move to next ai extent".
Full: Every x minutes you check to see if you have any full AI extents.

I like using the Full method for a couple of reasons:

1) No wasted space! You don't get a lot of copies of partially filled AIs.
2) There was a bug at one time in marking AIs full on the fly. It was very infrequent, but if you do them enough times it would happen (v8.3, Solaris, don't have the rest of the details, was awhile ago...).

So think about:
1) Using lots of small fixed-length AI extents, so that at peak volume you get 2-3 extents filled in "x" minutes (15?).
2) Have your process check for full ones (iteratively) rather than just marking them full.
3) Still use a "Time" based method for off hours (low volume work being done in the middle of the night is an example).

This should not change your performance or recovery ability, and should decrease your overall size.

Have fun thinking about it!

We have other constraints that I didn't mention ... SORRY!

There are eight databases, six of which "belong together" and have to be synchronised. We have chosen to swap AI files on a time basis because (to within about 2 seconds) we can swap all databases together. Also - we will take the daily full backup off the "warm standby" system and when that happens the state of all databases must be at a known point.

Also - because the database size is rather large (now 117 GB) - and split over eight DBs - and using several AI extents for each DB ... we are having trouble with the number of open files. We had to reduce the number of AI extents per DB from 10 to 5 two weeks ago to get us out of trouble.

Having multiple DBs is ... to be quite honest ... a pain in the a?se ... and we hope that in the not too distant future we can coalesce the six "related" DBs into one. That would allow us to, maybe, do something along the lines you have suggested, Corwin. But for now, swapping at particular times seems to be unavoidable. (NB: we also check for Full extents in-between the fixed time slots ... in case the rate of generation of AI data becomes very high ... as it does during batch jobs sometimes.)
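(The synchronised swap itself is conceptually simple - the switch is fired at all the databases as close together as possible and then waited on, roughly like this sketch with made-up database names:)

#!/bin/sh
# Switch the busy AI extent on every database as near-simultaneously as possible
for DB in /db/prod1 /db/prod2 /db/prod3 /db/prod4 /db/prod5 /db/prod6
do
    rfutil $DB -C aimage new &
done
wait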

Ron. :drink:
 

ron

Member
Norman Biggar said:
If compressing the archived AI's isn't the answer, how about taking more regular online incremental backups.

I would expect that they would only contain the changes and (once verified) would allow you to "lose" some of your archived after image files.

The AI is used for (a) the "general" security of being able to recover the databases from a disaster, and (b) to update a warm spare (the "Backup" server).

If we took incremental backups then the backup of each of our eight DBs would represent a different time. We need them to be sync'd.

Ron. :drink:
 
ron said:
We have other constraints that I didn't mention ... SORRY!

Also - because the database size is rather large (now 117 GB) - and split over eight DBs - and using several AI extents for each DB ... we are having trouble with the number of open files. We had to reduce the number of AI extents per DB from 10 to 5 two weeks ago to get us out of trouble.

I don't know if you had time to review it but one way to reduce file handles is R-Code Libraries.......

They take a bit of planning and implementation, but they can significantly improve this sort of situation, as well as allowing shared libraries to reduce the memory usage by clients....

If your buffer hit rate is 100:1 then that's not too foul. 500:1 is substantially nicer though!

We had a problem at my last place of work with huge jobs that handled masses of data, and the only way to improve the buffer hit rate was to allocate enough memory to the database to hold all the data we needed at one time. We hit a "knee" where we would get buffer hit rates around the level you are seeing. By increasing the memory allocation by several hundred percent we were able to get it up to about 500:1 and hold it there. The hardware cost was phenomenal, but jobs that took 24 hours to run dropped to 8 hours, and that was the business requirement.

We solved the problem another way by splitting the database into 10 smaller databases (each holding data that would be processed "together") but you probably don't want to hear about that given your problems with coalescing data! :D
 

ron

Member
toby.Harman said:
I don't know if you had time to review it but one way to reduce file handles is R-Code Libraries.......

They take a bit of planning and implementation, but they can significantly improve this sort of situation, as well as allowing shared libraries to reduce the memory usage by clients....

If your buffer hit rate is 100:1 then that's not too foul. 500:1 is substantially nicer though!

We had a problem at my last place of work with huge jobs that handled masses of data, and the only way to improve the buffer hit rate was to allocate enough memory to the database to hold all the data we needed at one time. We hit a "knee" where we would get buffer hit rates around the level you are seeing. By increasing the memory allocation by several hundred percent we were able to get it up to about 500:1 and hold it there. The hardware cost was phenomenal, but jobs that took 24 hours to run dropped to 8 hours, and that was the business requirement.

We solved the problem another way by splitting the database into 10 smaller databases (each holding data that would be processed "together") but you probably don't want to hear about that given your problems with coalescing data! :D

R-code libraries are not used at present ... but the issue is "on the list" ... mainly because of the memory-mapped feature.

I won't argue the point that splitting work across multiple databases can increase throughput. I can see several reasons why using multiple databases could be beneficial. But we have a package - and we have only two options: use six databases or merge them into one. With six databases, one of them (our recently gathered statistics tell us) causes an average of 94% of all disc i/o. Another causes 5% - and the remaining 1% is due to a third database. Three databases are, effectively, dormant. So, in our case, I expect that joining the six DBs into one would have just the tiniest impact on performance ... if any. But we'll see!
 
The only reason I brought up splitting the databases is that we were able to significantly reduce the memory footprint by making sure that the "right" data was grouped together.

One of the really useful features that Oracle provides is the ability to have data stored in particular areas based on data in the tables, e.g. customers with account codes from 1 to 100000 go in area 1 and 100001 to 200000 go in area 2. This can reduce the number of blocks required to read the data you specifically want and thus reduce the -B required.

In your case it might be more appropriate to say "Put all the bills for customers who have their bill generated between the 1st and the 5th in area 1 ..." etc....

Some of the v10 stuff looks hopeful for this!
 