Usage of Index

Pavan Yadav

Member
Hi All,
I am regular visitor of Progress Talk, a perfect place for Progress Technology related Knowledge and clarified most of my concepts from here.
I faced a strange behaviour of Indexes with a query and so needs to share over here to get a good picture on that and to clarify myself on Index concepts.
I have a table named table1 and having some fields like :
field1, field2, field3, field4, field5, field6 etc.

It's having index as :
index1 pu with fields as in sequence -- field1 field2 field3 field4 field5
index2 u with fields as in sequence -- field1 field3 field5 field4 field2
My query is :
Code:
for each table1 where field1 = "--" and field2 = "--" and field3 = "--" and field4 = "--" and field5 = "--"  no-lock:
display field1.
As per me it should pick index1, but when i compiled it using Xref , output shows that the index used is Index2. I tried it 2-3 times to make myself sure for no mistakes.
How can it be possible or in which scenarios it could be possible...??
Please share yout thoughts for the same.

Cheers,
Pavan Yadav
 

Pavan Yadav

Member
No Tom.. This is the only scenario. There are few more indexes on this table but i guess that should not to do anything with this.
Also, I have a doubt over here that does sequence of fields does really matters while selecting an index because both indexes have same fields but with different sequence only, so are these two indexes really required?
Please guide.
 

RealHeavyDude

Well-Known Member
From my point of view it does not matter which index the compiler picks because both are equally valid choices: Each index contains each field specified in the query - just in a different order. Therefore depending on the index picked, the records are fetched in a different order. If the order would matter one would specify a by clause.

OTOH - I don't remember the rule of 7 exactly ( there are seven rules which the compiler tries to apply sequentially in order to find out which index to pick ), but, as far as I remember, the last rule says the name of the index in alphabetical order.

Are the table, fields and indexes really named like in your sample code - or is this just sample code and you need to reveal the real names?


Heavy Regards, RealHeavyDude.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
No Tom.. This is the only scenario. There are few more indexes on this table but i guess that should not to do anything with this.
Also, I have a doubt over here that does sequence of fields does really matters while selecting an index because both indexes have same fields but with different sequence only, so are these two indexes really required?
Please guide.

It depends on your WHERE clause. In this particular query, as RHD said, either index yields the same result. Given that all operators are AND, the WHERE clause could be rewritten as "where field1 = "--" and field3 = "--" and field5 = "--" and field4 = "--" and field2 = "--"" and it is logically the same. This matches the component order in index2. However that doesn't mean all queries work equally well with both indexes. If one of your components had a range match instead of an equality match (e.g. field3 > "value"), or if your WHERE clause contained fewer than five fields, then order would matter.
 

Pavan Yadav

Member
Real Indexes are :
pu ld_loc_p_lot
u ld_part_lot
and query picked ld_part_lot instead of ld_loc_p_lot I was expecting as per above scenario.
@ RHD : I am totally agreed with your point that it doesn't matters because both are equally valid choice, But I just put this point for clearing some Index concepts in this particular scenario.

@Rob : Yes, it matters if we will be having some range match or some field would be differnt from one's specified in Index. But here, there is no such case :(
So just had thought of to dicuss that what could be scenario of such behaviour.
 

TomBascom

Curmudgeon
Order of fields in the WHERE clause is not supposed to matter.

Order of fields within the index definition does matter -- having two different indexes with the same fields, but in a different order, means that there are two different sort orders. If you do not need two sort orders then, yes, you only need one of those indexes. But it is likely that whoever created the 2nd one did so because there was a need for an alternative ordering.

The simple index selection rules use the "primary" attribute as a tie-breaker prior to "alphabetical order". If there is an undisclosed BY clause that is also a tie-breaker that ranks before the test for "primary".

However there can be complexities -- especially if one of these indexes has more fields in it than we are being told or if the real query has more comparisons or if some of the comparisons are not actually "=" etc.

Continuing to speculate on why made up stuff behaves in an imaginary way probably won't get anyone anything except frustrated.

Posting the real field and index definitions and the actual query code would be necessary for me to say anything more. The actual XREF output would also be interesting. Without real definitions and a real query I cannot see the point in continuing.
 

Pavan Yadav

Member
Thanks a lot for all of your valuable outputs.. I will provide the actual table, index definitions and actual query code by tomorrow for a better understanding.
 

Pavan Yadav

Member
Code I am Using :
Code:
define variable l_ipfile as character.
define variable l_count as integer.
 
define temp-table tt_lddet
    field tt_site like ld_det.ld_site
    field tt_part like ld_det.ld_part
    field tt_loc like ld_det.ld_loc
    field tt_lot like ld_det.ld_lot
    field tt_ref like ld_det.ld_ref.
 
empty temp-table tt_lddet.
l_ipfile = "Allocated_Inv.csv".
 
input from value(l_ipfile).
      import unformatted ^.
      repeat :
         create tt_lddet.
         import delimiter ",":U tt_lddet.
      end.
INPUT CLOSE.
 
status default "Updating Record...".
l_count = 0.
for each tt_lddet:
    for each ld_det exclusive-lock
        where ld_domain = "Abc"
          and ld_site = tt_site
          and ld_loc  = tt_loc
          and ld_part = tt_part
          and ld_lot  = tt_lot
          and ld_ref  = tt_ref:
        /* assign ld_qty_all = 0. */
        l_count = l_count + 1.
    end.
end.
Message "Total records updated: " l_count view-as alert-box.

Actual Index :
pu ld_loc_p_lot 6 + ld_domain
+ ld_site
+ ld_loc
+ ld_part
+ ld_lot
+ ld_ref
u ld_part_lot 6 + ld_domain
+ ld_part
+ ld_lot
+ ld_ref
+ ld_site
+ ld_loc

Xref Output :
./qty_update.p ./qty_update.p 1 COMPILE qty_update.p
./qty_update.p ./qty_update.p 1 CPINTERNAL ISO8859-1
./qty_update.p ./qty_update.p 1 CPSTREAM ISO8859-1
./qty_update.p ./qty_update.p 1 STRING "l_ipfile" 8 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 2 STRING "l_count" 7 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 STRING "tt_lddet" 8 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 STRING "tt_site" 7 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 REFERENCE qaddb.ld_det ld_site
./qty_update.p ./qty_update.p 4 STRING "tt_part" 7 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 REFERENCE qaddb.ld_det ld_part
./qty_update.p ./qty_update.p 4 STRING "tt_loc" 6 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 REFERENCE qaddb.ld_det ld_loc
./qty_update.p ./qty_update.p 4 STRING "tt_lot" 6 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 REFERENCE qaddb.ld_det ld_lot
./qty_update.p ./qty_update.p 4 STRING "tt_ref" 6 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 4 REFERENCE qaddb.ld_det ld_ref
./qty_update.p ./qty_update.p 12 STRING "Allocated_Inv.csv" 17 NONE TRANSLATABLE
./qty_update.p ./qty_update.p 22 STRING "Updating Record..." 18 NONE TRANSLATABLE
./qty_update.p ./qty_update.p 25 STRING "ld_det" 6 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 25 ACCESS qaddb.ld_det ld_domain
./qty_update.p ./qty_update.p 25 ACCESS qaddb.ld_det ld_site
./qty_update.p ./qty_update.p 25 ACCESS qaddb.ld_det ld_loc
./qty_update.p ./qty_update.p 25 ACCESS qaddb.ld_det ld_part
./qty_update.p ./qty_update.p 25 ACCESS qaddb.ld_det ld_lot
./qty_update.p ./qty_update.p 25 ACCESS qaddb.ld_det ld_ref
./qty_update.p ./qty_update.p 25 STRING "Abc" 3 NONE TRANSLATABLE
./qty_update.p ./qty_update.p 25 SEARCH qaddb.ld_det ld_part_lot
./qty_update.p ./qty_update.p 36 STRING "Total records updated: " 23 NONE TRANSLATABLE
./qty_update.p ./qty_update.p 36 STRING "default" 7 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 36 STRING "default" 7 NONE UNTRANSLATABLE
./qty_update.p ./qty_update.p 36 STRING "ld_part_lot" 11 NONE UNTRANSLATABLE

This is the exact picture. Please share your thoughts.
 

TomBascom

Curmudgeon
... Please share your thoughts.

I've no ideas on why that paticular index was chosen and, yes, it does seem odd but you're counting records -- there is no order implied or needed and either index is equally useful and valid. Why do you care which one is chosen?
 

Pavan Yadav

Member
Thanks for looking into this Tom... I don't care about this but I was just trying to understand that why it was picked that way. Anyhow, it doesn't bother me till the time I won't sort it and if required I will try to use use-index to explicitly specify index :)
 

RealHeavyDude

Well-Known Member
Never, never, never use USE-INDEX. It is worst practice.

As I already said: Whenever the sort order matters you should always specify the sort order with the by option. The compiler does a hell of a job picking the best index. Whenever you think that you are more clever than the compiler you might wind up table scanning the hell out of your system.

Heavy Regards, RealHeavyDude.
 
would use USE-INDEX is a direct order, and in the case of a table with multiple index is best, and do not let Progress decide alone.
 

RealHeavyDude

Well-Known Member
I can just repeat myself: Never, never, never use use-index. The usage of use-index is worst practice. Full stop.

Heavy Regards, RealHeavyDude.
 

LarryD

Active Member
Not to get into a religious war (although I do agree that USE-INDEX is bad practice and I don't really ever use it), I can think of one instance (and perhaps incorrectly) where use-index MAY (and I am no longer sure) be used.

Say you have an order table where the ord-num is the primary index, and you have two secondary indexes of cust-num ord-date (perhaps other fields) and ord-date cust-num (perhaps other fields).

You want to run a report that has a range of customers and a range of dates.

While there are other ways to accomplish this (and I can think of a few ways to write this code better to utilize both indexes), let's say you use a simple for each.
Code:
for each order no-lock 
    where order.cust-num >= beg-cust 
    and order.cust-num <= end-cust 
    and order.ord-date >= beg-date
    and order.ord-date <= end-date:
...
end.

So if you use say a year for the date range and customers between 1 and 10, there would be many more date records read than customers read depending on the index used.

Now given there is a range involved, whichever of the secondary indexes are used, it will only use the first part of the index due to the ranges.

My understanding (and perhaps mistakenly) always has been that given a "tie", OE will always use the index that returns the most records. Is that true ?

From a speed/efficiency standpoint, since I know the distribution of the data and how many records would be retrieved (and discarded), could this not be a case for use-index if OE indeed would choose the one with the most records? Or am I totally mistaken? (or should I rewrite it anyway?)
 
I can just repeat myself: Never, never, never use use-index. The usage of use-index is worst practice. Full stop.

Heavy Regards, RealHeavyDude.

Do not get angry, but I would like to know, why not use USE_INDEX?. This was recommended in some manuals of progress and we have several systems using this for years.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Not to get into a religious war (although I do agree that USE-INDEX is bad practice and I don't really ever use it), I can think of one instance (and perhaps incorrectly) where use-index MAY (and I am no longer sure) be used.

It's certainly possible to conceive of situations where forcing the compiler not to select an index according to its index selection rules results in a more efficient query. My experience with developers is that they often don't trust the compiler, or think they're smarter, or they refer back to how things worked back in v6 when they last studied index selection mechanics in depth. I believe the platform improves at a greater rate than many (perhaps most) developers.

My understanding (and perhaps mistakenly) always has been that given a "tie", OE will always use the index that returns the most records. Is that true ?

Do you mean the *least* records? As a thought experiment, let's assume that's true. Nothing is known about your data at compile time so this choice would have to be made at run time. How in theory could the 4GL server determine which index selection results in a smaller bracket? It would have to access both indexes and count the keys returned for each, make a choice, and then read through the index keys again and return the records to the client. With large data sets that approach would be increasingly inefficient and could well result in more logical and physical I/O than just deterministically choosing one index based on rules.

Also, if there is record sorting to be done on the client side, it could be the case that the smaller bracket does not result in the faster query.

From a speed/efficiency standpoint, since I know the distribution of the data and how many records would be retrieved (and discarded), could this not be a case for use-index if OE indeed would choose the one with the most records? Or am I totally mistaken? (or should I rewrite it anyway?)

It could be a case for USE-INDEX. But your data can change and invalidate your decisions. If that happens, will you be aware of the change, remember the code you touched that is affected by the change, and deliver a fix? And if there is turnover and now someone else owns that code, will she know to make the fix?
 
Top