FOR FIRST vs FIND FIRST?

Jobaq · May 9, 2007

Hi all, I just recently received a document with Progress standards and it said that I must use FOR FIRST <table>: END. instead of FIND FIRST <table>.

Is this going to make my app less efficient because FOR FIRST will buffer up additional reads?

Thanks

TomBascom · May 9, 2007

Neither statement is likely to be a good choice.

They both have significant problems and, IMHO, should both be banned.

Neither statement works the way that you would naturally expect it to work.

If the order of records matters (and if you are saying FIRST or LAST then it apparently does matter) then:

1) You are dealing with a set of records. If the first or last record is treated specially then you are breaking normalization.

2) Since you are dealing with a set of records you should be using a BY phrase in a FOR EACH (or a QUERY).

From a performance perspective if either statement makes a difference you have an indexing problem that should be fixed. Using FIRST to cure performance problems is a band-aid and a bad habit.

tamhas · May 9, 2007

To reinforce Tom's answer, the issue with FIND FIRST is that it can easily not mean what you assume it means. The one context in which it has possible validity is when all you care about is that at least one record with the desired criteria exists, not which of several records it might be. If you think you are predictably going to get a particular record, then you probably don't understand what it is actually doing.
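
A minimal sketch of that one valid use, assuming the sports2000 customer table and its discount field (the zero-discount criterion is only an illustration, not something from the posts above). CAN-FIND answers the existence question without making any record available, so nothing downstream can come to depend on which record was found:

```abl
/* Assumes sports2000; asks only "is there at least one?", never "which one?". */
if can-find(first customer where customer.discount = 0) then
  message "At least one customer has a zero discount." view-as alert-box.
else
  message "No customer has a zero discount." view-as alert-box.
```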

If you construct a FOR statement with a BY clause, which would potentially return the whole set matching your criteria in a particular order, then you can put a LEAVE in it to exit the loop after the first record is retrieved and know exactly which record you have. As Tom points out, this may also not be good practice, but at least it is predictable.

TomBascom · May 9, 2007

FOR FIRST, on the other hand, does not work as expected -- that is to say that it is not equivalent to FOR EACH table BY field: LEAVE. END.

mpowell_esq · May 13, 2007

FOR FIRST is telling Progress:
"Go off and interrogate the data. I expect multiple rows to be returned that match the criteria, though I am only interested in the first row found."
FIND FIRST is telling Progress:
"Go off and interrogate the data. I expect multiple rows to match the criteria, though I am only interested in the first row found. Make that row available and put it into the record buffer for use."

TomBascom · May 14, 2007

The problem (beyond the fact that you're making the FIRST record special and thus violating the principles of normalization) with both statements is the BY criteria. You're specifying the FIRST record in an ordered set. But:

1) In the case of FIND you cannot use a BY clause -- FIND does not support BY. So you must either depend upon your knowledge of index selection or you must specify a particular index which will (hopefully) give you a particular result.

2) Try this code snippet against sports2000:

Code:

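/* FOR EACH ... BY sorts the whole set first, so LEAVE stops on the
   customer with the lowest discount. */
for each cust no-lock by discount:
  leave.
end.
display custNum name discount.

/* FOR FIRST picks its record before BY is applied, so this shows the
   first customer in index order instead. */
for first cust no-lock by discount:
  display custNum name discount.
end.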
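
Point 1 can be sketched in the same spirit. This assumes that sports2000's customer table has indexes named CustNum and Name (check your copy of the schema; the names are an assumption here). The statement is the same both times, and which customer counts as FIRST depends entirely on the index in play:

```abl
/* Assumes sports2000 index names CustNum and Name on customer. */
find first customer use-index CustNum no-lock no-error.
if available customer then
  message "First by CustNum index:" customer.custNum customer.name
    view-as alert-box.

find first customer use-index Name no-lock no-error.
if available customer then
  message "First by Name index:" customer.custNum customer.name
    view-as alert-box.
```

If the two messages name different customers, that is the whole point: without a BY phrase, FIRST means first in whatever index Progress ends up using.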

Frankly I strongly discourage the creation of magic records (just in case that isn't obvious). But if you must do it then the safest and most correct way to do it is to use the:

Code:

for EACH cust no-lock BY discount: leave. end.

Because this construct makes the ordering (which must be important since you are stating that the FIRST is special) explicit and correct.

BTW -- it should also be obvious from this example why making such records magic is a bad idea. But just in case it isn't...

Suppose that your clever business analyst has decided that everyone with a discount of 0 should get the same "terms" and that an even more clever programmer (who, of course, you are now replacing) has thus decided that the code only needs to check the FIRST discount = 0 record to obtain the proper terms for any order.

None of the solutions above (even my "correct" solution) for finding the FIRST record are going to work. They all have nasty failure modes. The fatal flaw isn't the code (at least not if you do the FOR EACH thing) -- it's the idea that the FIRST record in the set is special.

Think it doesn't happen? Nobody could ever be that silly? Guess again.

People mostly think that they're getting away with this stuff because they get lucky and the bugs that they create don't occur often enough or in critical enough areas that they get noticed. But every now and then one of these things creates a fairly spectacular problem. Using FIRST and LAST is a bad habit. Don't form it. If the FIRST record is special then the schema is not in 3rd normal form. Fix that problem and the need for FIRST goes away. And so do all the bugs.

joey.jeremiah · May 14, 2007

Hello Tom,

I respectfully beg to differ.

I have not followed the entire thread, but in my humble opinion using FIRST or LAST isn't a bad habit, or at least not always, not if it's used properly. There are cases where using FIRST or LAST is perfectly legitimate.

For example -

1. Where the order of records isn't important, for example, if you only want to check whether any records exist for that WHERE filter.

2. Using FIRST or LAST with USE-INDEX, which sets the order of records.

Although it can be argued that using BY (which would work in exactly the same, equally efficient way) would be clearer, and that's a very important argument, organizing code isn't an exact science, and in some cases you may want to emphasize that a certain index is being used.

3. There are also QUERYs, and lots of them.

The OPEN QUERY statement requires using either EACH, FIRST or LAST, and using FIRST is faster than EACH (which would require the database engine to check for additional records).

Meaning -

Code:

for each orderline no-lock
    ,order of orderline no-lock: /* only possible with a static for each statement */
end.

For QUERYs -

Code:

/* Using the first query would be faster than the second one */
define query qOrderLine for orderline, order.

open query qOrderLine
    for each orderline no-lock
        ,FIRST order of orderline no-lock.

open query qOrderLine
    for each orderline no-lock
        ,EACH order of orderline no-lock.

I do not really agree that using FIRST or LAST implies the data isn't normalized.

To the original poster: normalization in a nutshell means, most importantly, that a table cannot hold data for more than one "thing", otherwise there would be duplication. For example, if the orderline table held its order header data, that data would be duplicated, and if you had to make changes to the header data it would have to be done in many places.

I tell guys to break the data into all the "things" you need to manage data for. I know it's just a phrase, it's too simplistic and there are other rules and practices, but I believe it's a very graphic, helpful description.

There can be complicated queries and data analysis, and I completely agree that we should do everything to make them as clear, obvious and intuitive as possible and comment where needed, but I believe that's more a question of organization, and it sometimes depends on what you want to emphasize.

Just my 2 cents.

tamhas · May 14, 2007

I don't think you are quite getting Tom's points here.

One point is that find first and for first don't necessarily give you the results that you might think they do and therefore other approaches should be used in order to be both clear and safe relative to intent. Yes, there are techniques for obtaining the first or last in an ordered set which work reliably and that is what you should do when that is your intent.

The other point is that one needs to question the purpose in such cases. If the purpose is merely to identify that the set is not empty, then one doesn't really care if one obtains the first record, does one? What Tom is cautioning against ... and he is quite right that it is a violation of normalization (you do know that there are multiple levels of normalization?) ... is in designing in such a way that the information in the first or last record is somehow different than the information in the other records in the set, either that it is more authoritative, as in the case he gave, or that it contains values that the others do not, e.g., storing a total in the last line of a set.

This is not to say that there aren't cases where it is legitimate to look at such records. E.g., one might look at the set of telephone calls for a particular date and be interested in the time of the first call and the time of the last call. But, there is nothing in that which will suddenly break if a new first or last record is created.

TomBascom · May 14, 2007

It may, in some limited cases, be an acceptable workaround to a specific problem but, IMHO, it's a horrible habit that leads people astray in some very subtle ways. Far, far too many people use FIRST indiscriminately any time that they code FIND. And the behavior of FOR FIRST is not at all what people expect it to be.

BTW -- the best formulation of the rules of normalization that I've ever seen is:

"The rules leading to and including the third normal form can be summed up in a single statement: Each attribute must be a fact about the key, the whole key and nothing but the key."

Wiorkowski and Kull
DB2 Design and Development Guide

(FIRST is not a fact about the key. It is a fact about a particular result set at a given instant in time.)

joey.jeremiah · May 14, 2007

Thomas,

Although I don't have your or Tom's experience, I wasn't born yesterday.

I am in full agreement with the example you gave, but what I'm saying is that it doesn't necessarily apply to using FIRST or LAST.

Tom,

Yes. I agree that it could be misleading.

But I see a lot of FIRSTs (mostly in QUERYs and mostly as unique joins; come to think of it, I don't think I've used LAST, at least not for years) and I have no experience of them being misused.

But of course there are lots of cowboys out there writing terrible code.

I just don't think or feel like there should be a red flag, so to speak, every time there's a FIRST in the code, IMVHO.

But more interestingly back to normalization.

The reason I don't like that formulation as much (and it's basically another variation of

"All the "data" that there's a separate unique one for every "thing" (entity) the primary key represents."

although it's catchy) is that it's too technical and doesn't really give a good description of what normalization is.

The same goes for the step-by-step normalization process.

Besides, most of the time you'd be using normalization for designing tables, not refactoring tables; you may not even have fields to start with.

In most cases the specs or description of the data is more of a story, all kind of mixed up together ... we want to build a management system for hotels where we collect x information on our clients, and there's a history about the rooms and which ones are available, etc.

I usually start by thinking about what the "things" in that story are, but I only want the "things" that I need to manage data for (again, that there's a separate unique one for every "thing", which is pretty intuitive), kind of splitting the data into "things" to give me an idea of what tables I'd need to create.

For example, we have clients we want to manage data for, we have rooms that have data, orders that have data etc.

I'm personally a strong believer in using a UID separate from the real world data for primary keys, and then there's the whole matter of naming conventions, etc.

Again, just my 2 cents.

tamhas · May 14, 2007

"I just don't think or feel like there should be a red flag, so to speak, every time there's a FIRST in the code, IMVHO."

As noted, there are two reasons for being somewhat forceful on this point. The normalization issue points to the potential for bad design ... not a guarantee of it, but enough of a suspicion that one should be asking oneself if it is actually the right thing to do. But, the more immediate reason for wanting a red flag is simply that ABL doesn't do what one might be tempted to think it would do in these cases so, whether or not what one wanted was reasonable or best, there is a very good chance that it is not doing what one thinks. I.e., it is the kind of structure that, if one does end up using it, it probably deserves a comment explaining why.

"... it's too technical and doesn't really give a good description of what normalization is. And likewise with the step by step normalization process."

I may be old fashioned, but I tend to think that normalization is one of those things that we have an obligation as professionals to understand. It really isn't that complicated and it is important to know why one follows it, why one might sometimes not follow it, and what are the risks associated with not following it. Is it the only rule that one needs to understand in order to design or refactor data structures? Certainly not. But, it is one of the rules.

TomBascom · May 14, 2007

In many shops if FIRST were the red flag that it should be you'd be seeing little but red.

Which doesn't make it right to code with FIRST at every opportunity. It does make it difficult to change things though.

Quite frankly if it were up to me FIRST and LAST would have been deprecated years ago and removed from the language shortly thereafter. Sadly, I'm not in charge of such things.

It is, and should be, a red flag in code. At best it is a sign of lazy ignorance. I generally approve of a certain amount of judiciously applied laziness and I'm not one to condemn innocent ignorance. But now that your awareness has been raised I expect you to do better.

In any event the original poster is apparently getting started at a shop where an enlightened person in a position of authority appears to be in agreement with my position with regards to FIND FIRST. That person obviously deserves a raise!

Unfortunately there has either been a transcription error, or there is no awareness that FOR FIRST also has a really big problem or two associated with it.

TomBascom · May 14, 2007

Normalization is absolutely fundamental to writing good, robust business applications.

Anything that suggests lack of normalization is a red flag of the highest order.

De-normalization has its uses and there are times when it is the right thing to do. But they are few and far between and they should be presumed guilty until proven innocent. De-normalization should be the very rare exception -- not the rule.

The rule that I quoted is extremely simple and surprisingly powerful. Follow it and your life becomes much improved.

joey.jeremiah · May 15, 2007

In regards to FIRST and LAST, I still don't agree that it is a red flag. The main point is that it is to some degree misleading and it can be misused.

But I don't think it's that extreme. I can probably come up with misuse cases for almost anything in the language, and lots of real world examples.

I don't agree with the connection you're making between normalization and FIRST and LAST, and I think that the example given is somewhat reaching.

In regards to "lazy" coding, I've worked very hard at making my code look lazy (as in elegant, simple and clear). I think we all pretty much believe that code needs to look good to work good.

Back to normalization,

It's certainly a very fundamental and not a complicated subject, and we've all been doing this for quite a while, at least more than a decade, and we have our own ways of explaining things, seeing things, opinions and so on.

Personally, if I needed to explain normalization to some new guy, I would not tell him to technically do steps 1, 2, 3 ... 6 or use the other valid but, IMO, still technical formula. I would explain it as splitting the data into "things" that you need to manage data for, as a general approach.

And then send him to read and understand each one of the rules (I know, I know, they're dependent steps).

Hey, different strokes for different folks.

TomBascom · May 16, 2007

I guess we will have to agree to disagree.

Hopefully our original poster has benefited from the discussion.

TomBascom · May 16, 2007

I forgot to mention... the example is, of course, "contrived" out of necessity to make it work with "sports2000" so that everyone can share an easily replicated test case.

But as it happens I was dealing with one of these bugs yesterday in a certain very well known vendor application where the buggy code was using FIND LAST to fetch the allegedly latest order#. The code had worked correctly for 18 months. But then, somehow, a bogus order# got entered and suddenly that code no longer worked.
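
A hypothetical reconstruction of that kind of bug, not the vendor's actual code. It assumes sports2000's order table, with orderNum as the primary index and an orderDate field. FIND LAST returns the highest order number, which is only the latest order for as long as numbers are handed out in increasing order; the FOR EACH version at least states the ordering it relies on:

```abl
/* Assumes sports2000: order.orderNum is the primary index,
   order.orderDate is a date field. */

/* Highest order number; "latest" only while numbers never jump. */
find last order no-lock no-error.
if available order then
  message "Highest order#:" order.orderNum view-as alert-box.

/* Explicit ordering: most recent order date, ties broken by order number. */
for each order no-lock
    by order.orderDate descending
    by order.orderNum descending:
  leave.
end.
if available order then
  message "Most recent by date:" order.orderNum order.orderDate
    view-as alert-box.
```

One mistyped, far-too-large order number is enough to make the first message wrong from then on, while the second depends only on the ordering it spells out.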
