[Progress Communities] [Progress OpenEdge ABL] Forum Post: RE: Question about the performance of ABL queries resolved by multiple indexes

Simon L. Prinsloo (Guest)
Hi dbeavon

Sorry, I do not know where "Rought" came from. It should have been "roughly".

In any case, in my example the equality match on doc-type selected the index on doc-type, but the match on inv-date was a range match, i.e. >= the first day of the fiscal month and <= the last day of the fiscal month, so the compiler ignored that index. The important thing to note here is that only indexes where all fields are used in equality matches are combined. This means that only the index pages for doc-type were loaded. That index holds no information about inv-date, so in order to resolve the second part of the filter the db engine had to read every record satisfying the equality match in the doc-type index and compare its inv-date to the selected range to determine whether it needed to be returned to the client. So not only did 98% of the index have to be read into memory, but 98% of the data had to be read as well, even though only a small subset was returned to the client.

I cannot remember the actual number of records or the specs of the hardware, as this was over 15 years ago. But yes, the volume was sufficient to flush all records from the buffers and consequently slow down all other processes as well.

The point is that indexing a single field that would normally be accessed with a range match, like a date, is pointless in cases where some other indexed field will always be involved in an equality match as well, because the ranged index will be ignored. So given a single index on company, a single index on fiscal-year and a single index on week-of-year, when reporting on a quarter only the first two will be combined, forcing the db engine to read a full year's worth of data for the given company. It will then discard 75% of the records because they belong to other quarters and return only the 25% that match.

In short: when the equality matches bracket a relatively low number of records in each index, combining indexes is great. On the other hand, when an equality match brackets a significantly large amount of data, the index will mostly work against you.

Consider this scenario:
- A client transaction table contains the following key fields, each with its own single-field index: company-code, client-code, sku.
- There is also an index on fiscal-year, fiscal-month.
- There are four companies, A, B, C and D, which generate data in the table at a ratio of 5:2:2:1.
- Company A generates 1,000,000 transactions in a month.
- There are 200,000 customers generating an average of 10 transactions each in a month.
- There are 10,000 SKUs generating roughly 200 transactions each in a month.
- There are 2,000,000 transactions in a month.
- The database contains five years' worth of data.

Working with the indexes where the data distribution is high:
- An equality match on customer will match 600 index entries.
- An equality match on sku will match 12,000 index entries.

Finding out when a given customer bought a given item will result in a hash join working with two result sets of 600 and 12,000 entries. This is reasonably efficient, given that there are 120,000,000 records in the table (2,000,000 a month over five years).

If we generate a statement for the customer for a given quarter, the range match on the period will exclude that index. The db engine will bracket over 600 records; each one will be read, its fiscal-year and fiscal-month will be inspected, 30 will be returned and 570 will be discarded. This is quite acceptable if you produce the statement for one customer.
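As an illustration, here is a minimal sketch of such a per-customer quarterly statement query. The table and field names (trans, client-code, fiscal-year, fiscal-month) are only the illustrative names from the scenario above, and the customer code and year are placeholders:

    FOR EACH trans NO-LOCK
        /* Equality match: the single-field client-code index brackets roughly
           600 entries for this customer (10 a month over five years). */
        WHERE trans.client-code  EQ "C0123456"
        /* fiscal-month is a range match, so, per the behaviour described above,
           the (fiscal-year, fiscal-month) index is not combined; the period is
           checked against each of the ~600 bracketed records instead. */
          AND trans.fiscal-year  EQ 2023
          AND trans.fiscal-month GE 1
          AND trans.fiscal-month LE 3:
        /* Build the statement lines for the roughly 30 matching transactions. */
    END.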
If you produce a quarterly statement for all customers, however, this will ultimately read all the data, but return only 6,000,000 records (5%).

Working with the indexes where the data distribution is low:
- An equality match on company D will match 12,000,000 index entries.
- An equality match on company A will match 60,000,000 index entries.
- An equality match on fiscal-year and fiscal-month will match 2,000,000 index entries.

Reading the transactions for company A for a given month will result in a hash join between a result set of 60,000,000 entries and one of 2,000,000. That is a lot of work. Generating a quarterly report for company A will cause a range match on the period, so that index will be ignored. Therefore 60,000,000 records will be retrieved and inspected, 3,000,000 will be returned and 57,000,000 will be discarded.

That is where humans come in. Just looking at the X-REF for the above cases, the compiler's choices might look good, but if you know something about the data distribution in your database you will spot many of the problems even without an X-REF. Most of the time you do not even need to investigate the actual data, as your gut will give you a good feel for the distribution. For example, given that you have only a few users in the cash office, the data distribution on the user-id in the receipt table will be low. Or, given that most sales come in through the web, the sales rep on most orders will be "WWW".

As you pointed out, in your case there is most likely no use case for reading data for a fiscal month outside the context of a fiscal year, nor is there one for reading a company's data without at least some sort of period associated with it. I also foresee very little use for reading a period's data without the context of a company, and even if you need that, there are normally quite efficient ways to do it, as you will see below.

For instance, if your index is on company, fiscal-year and week-of-year and you want to produce a summary group report, the following will be much more efficient than joining two different indexes, given that it is much simpler to read a very small table than to hash join two or three very large result sets:

    FOR EACH company NO-LOCK,
        EACH trans NO-LOCK
        WHERE trans.company-code EQ company.company-code
          AND trans.fiscal-year  EQ x
          AND trans.week-of-year EQ y:

Better yet, reading data for a quarter can still use the full index, because the bracket is on the last component.
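To make that last point concrete, here is a minimal sketch of the quarterly variant, assuming the same illustrative composite index on company-code, fiscal-year and week-of-year, and assuming the quarter spans weeks 1 to 13. The equality matches on the two leading components plus the range on the trailing week-of-year component keep the bracket inside that single index:

    /* Placeholder fiscal year, standing in for the x used above. */
    DEFINE VARIABLE x AS INTEGER INITIAL 2023 NO-UNDO.

    FOR EACH company NO-LOCK,
        EACH trans NO-LOCK
        WHERE trans.company-code EQ company.company-code  /* equality on the leading component  */
          AND trans.fiscal-year  EQ x                     /* equality on the second component   */
          AND trans.week-of-year GE 1                     /* range on the last component only,  */
          AND trans.week-of-year LE 13:                   /* so the bracket stays in one index  */
        /* Accumulate or display the summary figures here. */
    END.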
