Index Questions

KMoody

Member
Version: OpenEdge 11.7.5.00

My table INVHIST has the following indexes:
Code:
Table: INVHIST

Flags Index Name              St Area Cnt Field Name
----- ----------------------- ------- --- ----------------------
      ACT-DATE-PART-LINE      10        6
                                          + ACT-DATE
                                          + ACT-TIME
                                          + PART-NUM
                                          + CTO-CODE
                                          + ORDER-NUM
                                          + LINE-NUM


      ACT-INV-CTRL-PN-CTO     10        4
                                          + ACT-DATE
                                          + INV-CONTROL
                                          + PART-NUM
                                          + CTO-CODE

I ran the following queries against INVHIST:

Query 1:
Code:
do curDate = startDate to endDate:
    FOR EACH bINVHIST WHERE bINVHIST.act-date = curDate
    AND bINVHIST.inv-control = '9' NO-LOCK:
        ....

Query 2:
Code:
FOR EACH bINVHIST WHERE bINVHIST.act-date >= startDate
            and bINVHIST.act-date <= endDate
            AND bINVHIST.inv-control = '9' NO-LOCK:
        ....

Query 3:
Code:
FOR EACH bINVHIST WHERE bINVHIST.act-date >= startDate
            and bINVHIST.act-date <= endDate
            AND bINVHIST.inv-control = '9' NO-LOCK USE-INDEX ACT-INV-CTRL-PN-CTO:
        ....

My results:
  1. Query 1 was the fastest and automatically chose ACT-INV-CTRL-PN-CTO as its index.
  2. Query 2 was considerably slower and chose ACT-DATE-PART-LINE as its index.
  3. Query 3 was even slower than Query 2 and chose ACT-INV-CTRL-PN-CTO, the index I forced it to use.
My questions:
  1. Why didn't Query 2 use ACT-INV-CTRL-PN-CTO? We were under the impression that Progress chooses indexes by matching as many query fields to index fields as it can, and ACT-INV-CTRL-PN-CTO is a closer match than ACT-DATE-PART-LINE. Is it because Query 2 looked at a range of dates?
  2. Why was Query 3 so slow? Why wouldn't Progress for each ACT-DATE in that range quickly find the appropriate INV-CONTROL records to match our query?
  3. Would Query 3 have been faster if we flipped the order of the first two fields in index ACT-INV-CTRL-PN-CTO? If INV-CONTROL was first and ACT-DATE was second, would it be more efficient?
 
I don't use elapsed time as a measure of query efficiency as it can be inconsistent, even for consecutive runs of the same query. And it can be influenced by factors that are external to your code, including database cache contents, file system cache contents, system workload, I/O workload, virtualization host workload, client configuration, etc.

I suggest you look at table and index access statistics (from _tablestat/_indexstat), as they will not be affected by the factors I mentioned. Querying these tables is straightforward; it won't take long to write the code. Or you can let ProTop do it for you. :) Using programmer mode (Ctrl-p) is particularly useful in this case. Let me know if you have questions about that.

You can also obtain useful query diagnostic information in your client log from the LOG-MANAGER by using the QryInfo log entry type. This is documented in the Diagnostics and Troubleshooting manual.

If you want to get into even more obscure territory, there are undocumented client parameters for query info, -zqil and -zqilv, that provide some info on query resolution. Be careful where you use them as they write to the database log. Although they are undocumented, there is some info about them in the KB. They only work with shared-memory clients.

  1. In your second query, you have a range match on the first field in the WHERE clause, so the other fields are not considered for index selection (I suppose technically it's index elimination) by the compiler.

  2. In both query 2 and query 3, you have a query with a range bracket on the first component of the selected index. I don't know why query 3 was "so" slow because I don't know the duration delta or the record counts involved. Check the access statistics. It may also matter how large the indexes are. In some cases there can be significant size differences between a table's indexes.

  3. You cannot "flip" the order of fields in an index. You can create a new index with different component fields, or the same fields in a different order. But be careful about removing existing indexes, as this could make other queries perform worse.
    You could create a new index with inv-control as the first component and act-date as the second component. Then, after removing the USE-INDEX phrase, for query 3 the compiler could select your new index and then have an equality match on the first field and a range match on the second, so you would potentially read fewer records.
P.S.: I recommend against trying to outsmart the compiler and use the USE-INDEX phrase unless you can prove that your selection is more efficient, given your data. And if you can, you should comment the code appropriately so a future maintainer knows that you knew what you were doing and doesn't remove it.

P.P.S.: FYI, on 11.7.5, you are currently 16 updates behind. And OE 11.7 retires on April 1, 2025.
 
Thanks so much! This is really comprehensive and helpful.

A couple follow up questions:

1)
In your second query, you have a range match on the first field in the WHERE clause, so the other fields are not considered for index selection (I suppose technically it's index elimination) by the compiler.
Does this mean that the order of the fields in a WHERE clause (example: querying for INV-CONTROL before ACT-DATE) affect index selection?

2) We have several tables like INVHIST that have decades of data. If we're usually interested in a specific date range (such as the past three years), could partitioning these tables improve queries? (My guess? It depends.)
 
Does this mean that the order of the fields in a WHERE clause (example: querying for INV-CONTROL before ACT-DATE) affect index selection?
I misspoke. The second query has a range match on act-date and an equality match on inv-control. The compiler selected the index ACT-DATE-PART-LINE, whose components are act-date, act-time, etc. As the WHERE clause has a range match on the first index component, act-date, it can only bracket on that component. So the query returns all records between startDate and endDate to the client, for all values of inv-control.

2) We have several tables like INVHIST that have decades of data. If we're usually interested in a specific date range (such as the past three years), could partitioning these tables improve queries? (My guess? It depends.)
Do you really have a business need to retain decades of data? Or did it just accumulate because no one made a plan to remove it when it was no longer needed?

I have heard anecdotal stories of table partitioning implementations where it did not provide expected gains in improving query performance. My sense (not based on hands-on experience) is that the largest benefit of table partitioning is likely to be ease of maintenance, not OLTP gains. For example, let's say you partitioned invhist and you chose a date field (act-date?) as the partition key. Then you could create partition policies to put, say, each year's data into a separate partition. Then you would be able to purge old data by removing partitions rather than running an ABL purge routine. The latter can be time-consuming to run and can cause post-purge query performance problems.

As far as I am aware, TP isn't a widely-used feature. If you decide you want to use it, you should clearly define why you are using it and what you hope to gain. And then test thoroughly to ensure that you can actually achieve that benefit in exchange for the extra costs of licensing, maintenance, and administrative complexity.
 
Back
Top