Question: Performance loss due to Temp-Tables

Hello everyone, hope you are doing well. :)

I was in a discussion about how temp-tables are loaded into and deleted from memory. Below is the problem we discussed and the example I tried:

Problem:

One temp-table (for example, abc) has around 60000 records, and an external procedure (containing temp-table definitions) is called inside a FOR EACH on that temp-table (abc). So for 60000 iterations of the temp-table, the external procedure is called 60000 times. My question is: does Progress need to load/delete the temp-table into/from memory each time the .p (the external procedure with the temp-table definition) is called? If so, isn't that a big performance loss due to the temp-table (please suggest)?

I tried a similar example to check the performance difference, but didn't find any significant difference.

Code:
DEF VAR i   AS INT NO-UNDO.
DEF VAR t1 AS INT NO-UNDO.

t1 = ETIME(YES).  /* reset the millisecond timer */

DEF TEMP-TABLE t-prac
   FIELD a AS INT
   FIELD b AS INT
   FIELD c AS INT
   FIELD d AS CHAR
   FIELD f  AS CHAR.

DO i = 1 TO 60000:
  CREATE t-prac.
  ASSIGN  /* assigning dummy values here */
    t-prac.a = i
    t-prac.b = i
    t-prac.c = i
    t-prac.d = "dummy"
    t-prac.f = "dummy".
END.

FOR EACH t-prac:
  RUN calc.P.  /*program containing temp-table definitions*/
END.
t1 = ETIME.
MESSAGE t1
  VIEW-AS ALERT-BOX.

I didn't see much difference when I ran the program above under the following scenarios:

1. I didn't define anything in the calc.p program.
2. I defined temp-tables in calc.p.
3. Along with defining the temp-table, I also created records in calc.p with some dummy data (roughly as in the sketch below).
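
Roughly, calc.p in scenario 3 looked something like this (a simplified sketch with placeholder fields, not the exact code I ran):

Code:
/* calc.p - scenario 3: define a temp-table and create one dummy record */
DEF TEMP-TABLE t-calc
   FIELD x AS INT
   FIELD y AS CHAR.

CREATE t-calc.
ASSIGN
   t-calc.x = 1
   t-calc.y = "dummy".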

Please give your inputs on this.

Thanks & Regards!
Rajat.
 

GregTomkins

Active Member
I'm kind of confused by your post, but in case this helps:

1. I don't think the time taken to run calc.P should be any different if you were to remove the TT entirely and just run calc.P 60,000 times.

2. Of course any work you do inside calc.P including defining TT's will slow things down; but if all you do is define a TT similar to t-prac and literally do nothing with it, I would expect the time taken to run calc.P and define a TT 60,000 times to be no more than a few hundred ms.

3. Maybe you want calc.P to act on the TT that you're creating. If you pass the TT by value, it will certainly slow things down, roughly linear to how big the TT is. But you can avoid passing the TT by value by either (a) passing a handle, (b) passing it BY-REFERENCE, or (c) making it shared. If you do (c) you will rightly get lectured on the evils of shared data, so don't do that.
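
For what it's worth, option (b) would look roughly like this (untested sketch, names made up):

Code:
/* main.p (caller) */
DEF TEMP-TABLE t-prac NO-UNDO
   FIELD a AS INT.

RUN calc.p (INPUT TABLE t-prac BY-REFERENCE).

/* calc.p (callee) */
DEF TEMP-TABLE t-prac NO-UNDO
   FIELD a AS INT.

DEF INPUT PARAMETER TABLE FOR t-prac.

FOR EACH t-prac:
   /* works directly against the caller's records, no deep copy */
END.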

Incidentally: we would almost always say 'DEF TEMP-TABLE t-prac NO-UNDO'. I don't know how much performance benefit this has, but it probably helps some.

HTH
 

GregTomkins

Active Member
Damn. I should have tried it before opining. Here's from a quick test I ran on one of our Unix servers:

1. Run calc.p 60,000 times with no TT definitions or other work: 1.6 seconds

2. Run calc.p 60,000 times with a simple TT definition, 6 fields and an index: 3.9 seconds.

So, I'm slightly surprised by that ;)
 

tamhas

ProgressTalk.com Sponsor
Look up -Bt and -tmpbsize parameters and play with them. The really cool thing about temp-tables is that they will slop to disk automatically when they get too big. The really uncool thing about temp-tables is that there is an understandable performance hit when this happens. Bumping up these two parameters will increase the memory footprint of your client, but can make dramatic differences in speed.
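
For example, add something like the following to the client's startup parameters (or its .pf file); the values are only illustrative, so tune them to your own data volumes:

Code:
-Bt 5000
-tmpbsize 8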
 
Thanks for replying Greg, Tamhas,

@Greg,

I agree. If we don't use NO-UNDO on a temp-table definition, Progress writes undo notes to the .BI file and we grow the .BI file unnecessarily.

I agree. If we pass the temp-table BY-REFERENCE, then instead of deep-copying the temp-table to the called procedure, Progress provides a link (pointer) from the calling to the called procedure. But I think we should only bother with that when the temp-table is huge or has 30-40 fields. I tried it as well and it didn't make much of an impact here.

My question was: does Progress need to fetch every defined temp-table from disk into memory and then remove it again on each calc.p call (i.e. 60000 times)? If that happens, isn't it a big performance loss, and what can we do about it in that program? Please suggest.

@tamhas
I have tried the -Bt parameter but it didn't make much difference. I'm probably not familiar enough with these parameters and how to use them. If possible, could you please point me to any relevant documentation?

Regards!
Rajat.
 

tamhas

ProgressTalk.com Sponsor
All the info is in the manual. The other clue is to look in the -T directory using -t to see what happens to the disk files ... I think it is DBI for temp-tables. If that is not growing, then the temp-table is all in memory already.

In any case, it does not fetch the whole thing every time, just what is needed. If you are moving sequentially through a table and that table is partly on disk, then there will just be a few page reads to bring in the next chunk and then it will process that chunk.
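
For example (on a Unix client; names and paths here are just illustrative): start the session with -T and -t so the temporary files stay visible, something like

Code:
pro mydb -p test.p -T /tmp/mytmp -t

and then, while it runs, do an ls -l /tmp/mytmp/DBI* from another shell. If that file is not growing, the temp-table has not spilled out of memory.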
 

TheMadDBA

Active Member
You are missing an important part of the equation here... how exactly is the TT information being passed from the main program to calc.p? In your example they are two different temp-tables.
 
@tamhas
I am not very familiar with these parameter settings; I will go through the relevant docs. What I understand so far is:

If we are only defining a temp-table, does it need to be fetched from disk into memory at all? I think Progress only moves a temp-table between memory and disk when it grows larger than the -Bt allocation; otherwise it won't (please suggest).

Thanks for replying TheMadDBA,

I think we aren't passing any information to the temp-table defined in the calc.p program (I will look into that again); just defining the temp-table seems to create this overhead, I think. I'm probably not understanding something. Could you please elaborate?

Regards!
Rajat.
 

tamhas

ProgressTalk.com Sponsor
By default, temp-tables will only be in memory. It is when the available memory is used up that they go to disk with *the part that doesn't fit*. If you use -tmpbsize and -Bt you can allocate *more* memory and a greater part will then be in memory. If you can give it enough room, then it will all be in memory and never go to disk. This can make a huge difference in processing speed.
 

TheMadDBA

Active Member
So you have a completely different temp-table in calc.p (no relationship to t-prac)? In that case nothing is happening (on disk or in memory) for t-prac when you run calc.p.

But... when you run calc.p the temp-table is instantiated, meaning it allocates space either in memory or on disk depending on your -Bt settings (the buffer cache for temp-tables). It is then destroyed/cleared when calc.p ends. That can add some overhead depending on your system/parameters, but usually it is minor enough that it only shows up in artificial, non-real-world tests.

If you are actually having an issue, then this might be a good case for making calc.p a persistent procedure and running an internal procedure instead (see the sketch below). That way the TT only gets instantiated once (when the persistent procedure is first run) instead of on each call. But first ask yourself what the real-world performance impact is.
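
A rough sketch of that approach (untested, names made up):

Code:
/* main.p */
DEF VAR hCalc AS HANDLE NO-UNDO.

DEF TEMP-TABLE t-prac NO-UNDO
   FIELD a AS INT.

/* ... populate t-prac as in the original example ... */

RUN calc.p PERSISTENT SET hCalc.  /* TT in calc.p instantiated once, here */

FOR EACH t-prac:
   RUN do-calc IN hCalc.          /* internal procedure call, no TT setup/teardown */
END.

DELETE PROCEDURE hCalc.

/* calc.p */
DEF TEMP-TABLE t-calc NO-UNDO
   FIELD x AS INT.

PROCEDURE do-calc:
   /* work with t-calc here */
END PROCEDURE.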

-Bt and -tmpbsize are good things to tune no matter what.
 
@tamhas
As per my understanding, temp-tables are database tables stored in a temporary database, which is why we can create indexes on them, unlike work-tables, which are stored in memory (please suggest).

@TheMadDBA
Sure, I will look into that again. I am just curious: what happens if the two temp-tables are related, or if similar temp-tables are defined in both external procedures?

Regards!
Rajat.
 

TheMadDBA

Active Member
Similar doesn't matter.. even the same temp-table defined twice in two different programs behaves like I said before. There is nothing special about a temp-table name that makes it shared between two different procedures.

If you have a specific question and example that actually passes temp-table information between two procedures using parameters then we can discuss that.
 

GregTomkins

Active Member
"If we aren't using no-undo with define temp-table then it will write notes (logs) to .BI file and unnecessary we are growing the size of .BI file."

Just to clarify ... I believe local (eg. TT and variable) activity gets written to a so-called "LBI" file, unique to each session, and unrelated to the database .BI file. You see these lying around the file system all the time. Presumably it's still costly, of course. In real life I think NO-UNDO is probably tied with DEF as the single most frequently used keyword in our code base.

"As per my understanding, temp-tables are database tables and stored in temp database that's why we could create index on them unlike work-tables which are stored in memory (please suggest)."

I don't believe Progress has a temp database in the way you are thinking. Unless you get into 100's of 1000's of records, and assuming you use NO-UNDO, I think you can generally assume that there's no disk or DB activity happening at all (in the same way you would assume in a, say, C++ program that arrays and variables create no disk activity ... well, unless you create such a huge heap that the OS starts swapping pages).

I think of temp-tables as flexible arrays of unlimited size. They also could be the single best feature of Progress, but that's another story.
 
Thanks for replying Greg!

I agree that we should always use NO-UNDO with variables and temp-tables.

I agree that when a temp-table grows beyond the specified size (set by the -Bt parameter) it moves between memory and disk, depending on where the data currently lives. But where does a temp-table exist initially? I guess they aren't initially part of main memory or RAM, because if they were, how could we differentiate them from work-tables? At the same time I also think they aren't part of permanent disk storage, because temp-tables are volatile in nature. Perhaps they use some part of the disk as virtual or temporary memory (I think).

Please suggest.

Regards!
Rajat.
 

TheMadDBA

Active Member
DBI* temporary files are used for storing temp-table information that exceeds the memory allocated by -Bt * -tmpbsize. That is the total of all temp-tables defined, not per temp-table.

Temp-tables were devised because of the flaws of workfiles. Workfiles can't have indexes defined and only exist in the client's memory. Nobody should be using workfiles at this point in time. Back in the ancient days we had to make scratch database tables to store information once workfiles couldn't handle the volume (shudder).

Initial existence will be in memory for the definitions... the contents (rows/indexes) will be stored in memory (up to -Bt * -tmpbsize) and then in the DBI* temporary files. Temp-tables are client objects and not stored in the database.
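
For example (illustrative numbers only): with -Bt 5000 and -tmpbsize 4 the client has roughly 5000 x 4 KB = 20 MB of buffer space shared by every temp-table in the session, and only whatever does not fit in that goes to the DBI* files.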
 

andre42

Member
See Progress KB 000048200 : http://knowledgebase.progress.com/articles/Article/000048200

"Prior to OpenEdge 11, static temp-tables are fully instantiated when the program that defines them are loaded.
Starting with OpenEdge 11, the instantiation of the record and index storage is delayed, and this will be performed on-demand.
...
Delayed instantiation was introduced to reduce the initial performance hit if a procedure has many tables defined, and to avoid spending resources until the table is actually used.
...
Currently, the temp-table instantiation happens whenever records are created or when any query is run on a temp-table (including FOR EACH loops or CAN-FIND function).
...
Delayed instantiation can be disabled with the -nottdelay startup parameter. This should only be used if the new mechanisms are a suspected cause of an issue."
 

tamhas

ProgressTalk.com Sponsor
Delayed TT instantiation really only impacts the use case where one has multiple TTs defined, but they may not actually be used. The classic was a bunch defined in widely used includes so that often the TT was not used in the particular compile unit. If one is sensible and only defines it where it is used, this delayed instantiation has no impact on overall performance.
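
The classic pattern looked something like this (simplified, made-up names):

Code:
/* ttdefs.i - shared include pulled into dozens of programs */
DEF TEMP-TABLE tt-orders NO-UNDO
   FIELD order-num AS INT.
DEF TEMP-TABLE tt-lines NO-UNDO
   FIELD line-num AS INT.

/* report.p - only ever touches one of them */
{ttdefs.i}

FOR EACH tt-orders:
   /* tt-lines is never used; prior to 11 it was instantiated anyway */
END.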
 

TheMadDBA

Active Member
100% correct. If you are actually using the temp-tables (or realistically most of them) then delaying the instantiation isn't going to make a measurable difference.
 

TomBascom

Curmudgeon
To clarify one more thing several posts back:

"I guess they aren't initially part of main memory or RAM, because if they were, how could we differentiate them from work-tables?"

Just because something is "part of main memory" does not make it the same as everything else that is in RAM. Variables are also in memory and they are not confused with work tables either. My office contains a desk and some chairs and while I might occasionally sit on a table it is not a chair even though it is in my office. Temp-tables and work-tables share the nomenclature "table" as part of their name and they are organized using some similar paradigms (records and fields, CREATE statements etc.). But the compiler and the run-time environment know that they are not the same, allocate memory to the appropriate data structures and use very different algorithms "under the hood" without any issues.
 