Thoughts On Caching...

Rob Fitzpatrick

ProgressTalk.com Sponsor
I have an idea for a possible DB enhancement that's been rattling around in my head for a bit. I thought I'd write it here, as it was inspired by an idea from Tom Bascom, to solicit feedback from Tom and others. Also, writing it down forces me to organize my thoughts, so if it does have merit I'll have a better chance of writing a coherent enhancement request on the Community site.

I'll start with Tom's idea: a "tertiary buffer pool". I'll admit I didn't understand the concept (or rather, the utility of it) when I first heard it. But it eventually sank in. I believe the idea can be stated as follows: when you read a huge table, especially historic data that isn't otherwise useful, say for a monthly or annual report, you drag all that RM and IX data through the primary buffer pool and evict the data that is useful to the other DB clients, temporarily harming their performance until the cache "warms up" again with their data. But if you had a way of specifying that reads from Huge Table X always (or maybe on demand, programmatically) go to a tertiary buffer pool, then the useful contents of the primary buffer pool wouldn't be flushed when you read big swaths of Huge Table X.

On the downside, you would still have to manage this pool manually, you would have to know the behaviour of your application code/BI queries in advance, and you would have to pre-allocate some amount of memory for this buffer pool on startup even if you weren't going to use it often. So there are some challenges with this approach (if I understand it correctly). And one could argue that private read-only buffers could also address this use case.
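To make that concrete, here's a minimal sketch (Python, purely illustrative; the pool names, sizes, and the table-to-pool map are my own invention, not anything OpenEdge exposes) of routing a designated table's reads to a separate pool, so a big scan can only evict that pool's contents:

```python
from collections import OrderedDict

class BufferPool:
    """A simple LRU pool: most-recently-used entries live at the end of the OrderedDict."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block_id -> block data

    def get(self, block_id, read_from_disk):
        if block_id in self.blocks:          # cache hit: move to MRU end
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        data = read_from_disk(block_id)      # cache miss: physical I/O
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity: # evict from the LRU end
            self.blocks.popitem(last=False)
        return data

class BufferCache:
    """Routes each table's reads to its assigned pool (default: primary)."""
    def __init__(self, primary_size, tertiary_size):
        self.pools = {"PRIMARY": BufferPool(primary_size),
                      "TERTIARY": BufferPool(tertiary_size)}
        self.table_pool = {}                 # table name -> pool name

    def assign(self, table, pool_name):
        self.table_pool[table] = pool_name

    def read(self, table, block_id, read_from_disk):
        pool = self.pools[self.table_pool.get(table, "PRIMARY")]
        return pool.get(block_id, read_from_disk)

# A scan of "HugeTableX" (hypothetical name) now competes only for TERTIARY buffers:
# cache = BufferCache(primary_size=100_000, tertiary_size=5_000)
# cache.assign("HugeTableX", "TERTIARY")
```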

That got me thinking about caching in general, and the contrast between caching techniques used in OSes and DBs. There are some similarities between the two, e.g. demand-based paging, the use of shared memory, the Windows kernel's modified page writer thread (which is roughly analogous to an APW), etc. But there are obvious differences too. Modern OS memory managers use a lot of optimizations like pre-fetching. When an OS or an application starts, there are lots of files (drivers, libraries, executables, etc.) that are always needed from disk, often in the same order, so their layout on disk can be optimized and they can be pre-fetched into memory so that by the time the code first accesses them it's a cache hit. DB read I/O is in some ways much less predictable: application code paths vary with code changes, data changes, user behaviour, and other factors the DB can't reliably predict or model.

All the server knows with certainty is what the client wants right now; pre-fetching (i.e. speculative reading) is hard to do well. If the client is in the fifth iteration of a FOR EACH customer block, it might want to read the sixth customer record next (and very soon), but it also might hit a RETURN, QUIT, PUBLISH, RUN, LEAVE, etc., and not want to read that sixth customer until an hour from now, or maybe never. And if the server did speculatively page in the next IX block in the bracket chosen for the query, or the RM block containing the next rowid after the current one, and the client never requested it, then (a) that would have been an expensive wasted physical I/O to get it into the buffer pool, (b) it would increase memory pressure, paging out valuable cache contents (as the other servers would be behaving the same way), and (c) it would take on the order of -B other unique buffer accesses until this probably-useless block and others like it were flushed out of the buffer pool.

Now imagine that a server processing a particular query for a given client can selectively tune its read behaviour, not unlike the self-tuning an APW does. If a client reads two records quickly in a FOR EACH or QUERY, it may or may not request the third and subsequent records equally quickly; the probability is relatively low, especially if it's an interactive client. By contrast, if it is a batch client that has barrelled through 999 records in a query, there is a much higher probability that it will equally quickly request the 1000th, 1001st, etc. So if the server can see this behaviour and start to pre-fetch potentially useful blocks, say reading the next IX block and the next RM block or two that are referenced by the rowids in that IX block, then maybe by the time the client requests those records they will already be in the buffer pool and it won't have to wait synchronously on those physical I/Os.

If the server kept guessing correctly that a block would be requested, and it was (within some time frame), it could increment a "confidence score" about the usefulness of pre-fetching for that query, and maybe become more aggressive with it over time. And if it guessed wrong (as it must, as all queries eventually end), it could decrement or zero out the score and revert to demand-based paging for that client.

I suppose the database would need a way of distinguishing speculatively-read blocks from demand-paged blocks in the buffer pool, and a way of determining the interval between a speculative block read and a data read from that block, to score the block read as having been valuable. Obviously these comments about servers also apply to self-service clients acting as their own server.
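Something like this scoring loop is what I have in mind; a rough sketch in Python, with the hit window, the thresholds, and the depth scaling all made up for illustration:

```python
import time

class PrefetchGovernor:
    """Tracks, per query, whether speculative reads are paying off."""
    def __init__(self, max_depth=8, hit_window=1.0):
        self.confidence = 0           # running score for this query
        self.max_depth = max_depth    # cap on how far ahead we read
        self.hit_window = hit_window  # seconds within which a hit "counts"
        self.outstanding = {}         # block_id -> time it was prefetched

    def depth(self):
        """How many blocks ahead to read: more confidence, more aggression."""
        return min(self.confidence // 2, self.max_depth)

    def prefetched(self, block_id):
        """Record that we speculatively read this block."""
        self.outstanding[block_id] = time.monotonic()

    def on_demand_read(self, block_id):
        """The client actually asked for a block: score any earlier guess."""
        t = self.outstanding.pop(block_id, None)
        if t is None:
            return
        if time.monotonic() - t <= self.hit_window:
            self.confidence += 1      # good guess: get more aggressive
        else:
            self.confidence = 0       # stale guess: revert to demand paging

    def on_query_end(self):
        """Any remaining guesses were wasted I/O; reset the score."""
        self.outstanding.clear()
        self.confidence = 0
```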

Part of the problem, as I understand it, is that there are only two ways to access the LRU chain: you can start at the head (the MRU end) and traverse backwards, or start at the tail (the LRU end) and traverse forward. So a block that is added to the chain can only be added at the head, as happens today, or at the tail, which would be pointless as it would fall off almost immediately. But what if there were a third pointer into the chain, say 80% of the way (or whatever) toward the LRU end, where a speculatively-read (rather than demand-paged) block could be inserted? This would somewhat alleviate problem (c) above, as the cost of putting a useless block into the primary buffer pool would be waiting for only 0.2 * -B unique block accesses for that block to be paged out. And the 80% most-recently accessed blocks (probably, depending on your use of -lruskips) would be unaffected.
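A toy model of that chain (Python; the 80% figure and the names are placeholders) might look like this. A wrong guess now only has to wait out the bottom 20% of the chain before it's paged out, while a correct guess gets promoted to the MRU head on its first demand read:

```python
class LRUChain:
    """Index 0 is the MRU head; the last element is the LRU tail."""
    def __init__(self, capacity, spec_point=0.8):
        self.capacity = capacity
        self.spec_index = int(capacity * spec_point)  # third pointer into the chain
        self.chain = []                               # list of block ids

    def touch(self, block_id):
        """Demand access: move (or insert) the block at the MRU head."""
        if block_id in self.chain:
            self.chain.remove(block_id)
        self.chain.insert(0, block_id)
        self._evict()

    def insert_speculative(self, block_id):
        """Speculative read: enter ~80% of the way toward the LRU end."""
        if block_id in self.chain:
            return                                    # already cached
        self.chain.insert(min(self.spec_index, len(self.chain)), block_id)
        self._evict()

    def _evict(self):
        while len(self.chain) > self.capacity:
            self.chain.pop()                          # page out from the LRU tail

# With a capacity of 10, a wrong guess sits only ~2 unique accesses from eviction:
# lru = LRUChain(capacity=10)
```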

I don't know how I/O is done in SQL Server or Oracle, and I don't imagine this is a novel idea (whether good or bad). But I have to think that there is, at least in theory, a more efficient way of caching data than to wait until the instant that the client requests it as OpenEdge does today.

Thoughts?
 

TheMadDBA

Active Member
Oh boy... Another chance to compare Oracle and OE :D

I am going to (mostly) leave Oracle Exadata out of this discussion... mostly because it is crazy complicated under the covers. But the high-level idea is that the storage hardware and the database know quite a bit about each other, and you will not find a more optimized DB/storage combination out there. If you have time, take a look at some of their whitepapers on Exadata. The optimizations that just happen make everything I describe below seem like dBase.

For most Oracle installs you have 3 distinct buffer pools (DEFAULT, KEEP and RECYCLE), and people use them for what their names imply... but they all use the same storage/aging concepts. You also get a default buffer pool for each block size (the instance has a default block size, but you can specify different block sizes by tablespace), though this is usually only used in the largest of large shops. In Oracle terms that means 50TB or larger with very high volumes of IO. In reality the vast majority of Oracle shops just set the default pool and forget about it (they also ignore most of the concepts below).

LRU/buffer pool processing is much more efficient in Oracle than in OE, meaning the relative cost to get a block in or out of a pool is much lower. That alone solves a lot of issues. You also have many different data access methods in Oracle, and each one interacts with the buffer pools differently. For each query the cost-based optimizer (CBO) decides:
  • Which index or indexes should be used, or whether scanning a table/partition would be best
  • Which access method should be used for each index
  • Whether the query can be satisfied by reading only index entries
  • What order any joins happen in and how the joins should be performed
  • Whether the query should be optimized to retrieve the first set of rows or the entire result set
All of those decisions can be altered based on configuration options, the current workload of your hardware, the distribution of the data and any hints passed to the query.

Based on all of that, you get an access path for each operation. Each access path has rules to decide whether using the buffer pool is a good idea or not. If you are scanning a lot of data it will just read from disk (usually with read-ahead and large block sizes) and bypass the buffer pool entirely.
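As a toy decision rule in the same spirit (Python; this is not Oracle's actual algorithm, and the threshold is invented):

```python
def choose_read_path(blocks_to_read, pool_capacity, threshold=0.1):
    """Big scans bypass the pool; small reads go through it and get cached."""
    if blocks_to_read > pool_capacity * threshold:
        return "direct-path read"   # large multi-block reads straight from disk
    return "buffered read"          # worth caching; likely to be re-read soon
```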

You can also control the caching behavior at an index, table or LOB level. In most cases I set the LOBs so they aren't cached in the buffer pools but obviously that is application specific.

You also have the option of result set caching, where the query isn't actually executed again and the results are just pulled from memory. I turn this on for semi-static tables like states, countries, etc.
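A toy illustration of the concept (Python; nothing like Oracle's actual implementation): cache the result rows keyed on the query text, and invalidate whenever a dependent table changes:

```python
class ResultCache:
    def __init__(self):
        self.results = {}      # query text -> (rows, table versions at cache time)
        self.version = {}      # table name -> modification counter

    def _versions(self, tables):
        return tuple(self.version.get(t, 0) for t in tables)

    def get(self, query, tables, run_query):
        entry = self.results.get(query)
        if entry and entry[1] == self._versions(tables):
            return entry[0]                    # served from memory, no execution
        rows = run_query()                     # actually execute the query
        self.results[query] = (rows, self._versions(tables))
        return rows

    def table_changed(self, table):
        """Any DML on a table invalidates results that depended on it."""
        self.version[table] = self.version.get(table, 0) + 1
```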

SQL Server and DB2 have many of the same concepts as Oracle but aren't quite as sophisticated.

TL;DR: Oracle/SQL Server are light years ahead of OE as far as IO and caching go. OE needs to work on the performance of the database in general and embrace multiple access methods in the 4GL like they do on the SQL side.

One of these years I need to do a session at the PUG about how awesome Oracle is :p
 

TomBascom

Curmudgeon
Unless there is a monumental change in market fortunes I doubt that we will ever see parity with Oracle. OTOH it is nice to see what Oracle can do and to think about how some of that stuff could be handy to OE.

My thinking was that a 3rd buffer pool would be relatively simple to implement and easy for a DBA to understand the use of.

Actually I'd like to see an arbitrary number of buffer pools and the ability to assign various policies to them. I think I'd probably refer to the policy that Rob mentions me suggesting as "sequential reads". Another policy that I think would be a good one is "read only". These sorts of things strike me as stuff that Progress *could* find the resources to implement and which would benefit a lot of users, whereas tight integration with various HW seems very narrowly focused and rather unlikely.
 

TheMadDBA

Active Member
I agree that OE will never catch up with Oracle across the board. That ship sailed back in the 90s. None of what I described above is an Exadata feature, btw... it all comes with any version of Oracle that runs on the same hardware that OE does.

It is a crying shame that quite a few of the OE DB team seem to have their heads in the sand about Oracle/SQL Server/etc. though. So many ideas to steal. It would prevent a lot of those annoying conversations where they claim feature X is impossible... only to be told it exists in most other databases. The whole table partitioning debacle could have been prevented by a simple read of the Oracle tuning guide... or by listening to certain people who assured them, pre-development, that performance would not improve with their design.

As far as the buffer pool conversation goes... I wouldn't mind seeing another pool available if that were our only option. A table/index would still need to live in one pool or the other regardless of the kind of access; otherwise things get complicated and/or you waste buffers.

At some point they just need to bite the bullet and embrace the fact that different types of queries benefit from different types of IO and buffering methods. Their bolt-on SQL engine should not be able to outperform the ABL, especially when that SQL engine is ancient in comparison to other databases.
 

TheMadDBA

Active Member
I guess it depends on your definition :p

I consider it a debacle because it was presented (to my company at the time) as a route to huge performance gains and an easy way to do online maintenance for multi-TB databases. There was some lively debate about their approach and why it would not provide the benefits they claimed... and we ended up with a feature with no real-world performance gains and no ability to manage the partitions online (in the common usage of "online").

It may have some limited value for certain environments. But compared to the partitioning of other databases it is a very pale shadow of what could have been.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
My annoyance was that in one of this year's PUG sessions its use was pitched as being "transparent to the application". It clearly is not.
 

Chris Hughes

ProgressTalk.com Sponsor
I'd like to see a few more buffer pools. I'd agree with Tom: just a few more of what we have now, say B2 through B9. It shouldn't be too hard to do.

I must admit to still assigning the alternate buffer pool at the area level - I just find it easier to manage that way.

The only other "feature" enhancement I would really like to see is being able to dedicate a buffer pool to the SQL engine; we have so many customers that use their own reporting / auto-SQL-generating tools (that don't do a great job).

Slightly off track from your original post: I also think the main buffer pool should start with its LRU chain disabled. I raised this at PUG Europe last year but didn't get a very welcoming response. I think there was an obsession at the time with "how big is your database", and clearly quite a few folks have databases over a TB these days, but I still work quite often with sub-5GB DBs.

Cheers
 