Compile Time

GregTomkins

Active Member
I have observed the following in a procedure that runs the same series of about 60 subprocedures about 100 times (e.g., processing 100 orders, each of which entails running about 60 procedures). In all cases the -q parameter is in effect.

1. Time required to run when everything is precompiled = 30 seconds
2. Time required to compile all of the programs involved once = 15 seconds
3. Time required to run when nothing is precompiled = 4 minutes
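For concreteness, the structure is roughly like this (a sketch with invented names; the real subprocedures do actual work):

Code:
/* process_orders.p - hypothetical driver, names invented for illustration */
define variable iOrder as integer no-undo.
define variable iStep  as integer no-undo.

do iOrder = 1 to 100:   /* ~100 orders */
  do iStep = 1 to 60:   /* ~60 subprocedures per order */
    /* each step*.p is a separate source file on the propath */
    run value(substitute("step&1.p", string(iStep))).
  end.
end.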

This makes no sense to me. I would think that #3 should be roughly the sum of #1 and #2. The following stats are from the client.mon file:

From case 1 - everything precompiled:
Compilations of .p's: 1
Checks of files with stat: 62

From case 3 - nothing precompiled:
Compilations of .p's: 61
Checks of files with stat: 38471

BTW, case 3 with no -q parameter is only a little slower, at about 5 minutes. I would have expected it to be vastly slower. It does suggest that -q is being used, though.

Anyway, that 38,471 looks suspicious to me. My guess is that in the -q logic, if the file is not precompiled, it checks the file system for each RUN (and each include file, which might explain the huge number), even if it eventually decides to use the session-compiled code.

Any comments? Am I missing something here?

Thanks and Cheers.
 
Your guess is correct.
 
I also noticed that the 38,000 number skyrockets to over 100,000 when -q is disabled. So -q *is* having a major effect, but I can't account for the difference, or for why the numbers are so large in the first place. The whole thing kinda doesn't make sense. The documentation on this is sketchy and the KB is no help.
 
Roughly 60,000 more OS calls are being made without -q. I'm not sure if you already knew that.

KB 5756 (How to interpret the -y statistics in the log file) said:
Checks of file with stat line indicates a relatively time-consuming
system call. Unless you invoke the Quick Request (-q) startup option,
PROGRESS makes a stat call each time a precompiled subprocedure is
called with RUN

As it says 'precompiled', which is at odds with your case 3 above, I guess this message doesn't add much to your understanding, but some of this stuff is new to me and might be to others reading.

I'm assuming here that a 'stat call' is something similar to the various 'stat' OS functions used in other languages.
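For what it's worth, the 4GL's own FILE-INFO handle does a comparable filesystem lookup every time you assign its FILE-NAME (an analogy only; I'm not claiming this is what -q does internally):

Code:
/* each FILE-INFO:FILE-NAME assignment triggers a stat-like filesystem check */
file-info:file-name = "xyzzy.p".
display file-info:full-pathname
        file-info:file-mod-date
        file-info:file-size.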

But you are correct, the documentation is sketchy. There is quite a lot of low-level stuff in "R-code Features and Functions", though, if you are interested in that sort of thing.
 
PSCTS says 'stat' is the same as the C function, and that they (basically) call it every time they encounter a RUN, etc.

I did some testing, and it seems that every time you do a RUN, the 'stat' number increments by TWO. Without -q, that happens on every single RUN; with -q, it happens only on the first RUN of each unique target.
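A quick way to see this for yourself, assuming you start the session with -y so that client.mon gets written on exit:

Code:
/* stattest.p (name invented) - watch "Checks of files with stat" in client.mon */
run xyzzy.p.  /* first RUN: the stat count goes up by about two, with or without -q */
run xyzzy.p.  /* repeat RUN: up by two again without -q; flat with -q               */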

I'm still baffled why, with -q, I am getting 38,000 stat calls, when I know from the remainder of the client.mon output that I am only running around 60 programs. With -q my stat number should be about 120.

This is not some esoteric exercise: the 120 vs. 38,000 difference translates into a runtime difference of about 10 seconds vs. about 4 minutes.
 
Running 60 programs how many times?

Code:
/* xyzzy.p */
return.

Code:
/* the caller */
define variable i as integer no-undo.
do i = 1 to 1000:
  run xyzzy.p.
end.

(Starting and stopping "pro" without doing anything costs 154 stat() calls on my system...)

Without -q:
Compilations of .p's: 2
Checks of files with stat: 2194

With -q:
Compilations of .p's: 2
Checks of files with stat: 182

Now, just for fun, add an include file to xyzzy.p:

xyzzy.p:
Code:
{./nothing.i}
return.

nothing.i:
Code:
/* nothing */

Without -q:
Compilations of .p's: 2
Checks of files with stat: 3229
 
Yeah, I figured .i's had something to do with it. Your numbers more or less coincide with my findings, but I think you omitted the key one: the last case, with the include file and -q enabled. Don't you think it should be somewhere around 2? (I assume the 182 in your earlier number was mostly due to the 'pro' startup that you mentioned.)

Maybe I should explain what I am really doing here. I have a test environment set up on a Unix server for which we only have a runtime license. It works great and is super fast, but it is a major hassle to copy the code (and data files) over there and compile all the time. So I am trying to port it to Windows, where we have a development license.

My hope is: don't have any .R's, compile on the fly all the time, which is maximally efficient when developing. With -q, the performance should be roughly the performance with .R's plus the time required to compile each file once. I have measured the former at about 30 seconds and the latter at about 15 seconds, for a theoretical total of 45 seconds.
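In other words, the Windows session would start with something like this (paths and names invented):

Code:
/* startup.p (hypothetical) - source-only propath, no .r files anywhere */
propath = "c:\dev\src," + propath.
run process_orders.p.  /* everything session-compiles on its first RUN */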

The actual total is about 4 minutes. The difference seems to be directly accounted for by those stat calls. client.mon confirms it is only compiling each file once.
 
It is always a good idea to tell us what you are really doing.

You're making my head spin.

pro -y -q, compile on the fly, with the empty include file = 1091 checks with stat().

The existence of an include file seems to cause an extra 1000 or so stat() calls.

Adding a second include doubles the excess to 2000.

Nesting includes bumps it up too.
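For example, a chain like this (file names invented):

Code:
/* xyzzy.p */
{./outer.i}
return.

/* outer.i */
{./inner.i}

/* inner.i */
/* empty */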

The part that I find surprising is that the case with no include file and -q results in reduced stat() calls. I generally think of compile on the fly as canceling out -q, so I would have expected 1,000+ stat() calls for that case.
 
I generally think of compile on the fly as canceling out -q

That is not what the documentation implies, to me; the way I read it, -q should result in minimal compiling (which it does) AND minimal stat checks (which it does not). Those extra stat checks would seem to be useless, since it's going to use the session-compiled code anyway.

Logically, I think the time spent running uncompiled code using -q should equal the time to compile each .p/.i once, plus the time spent running the same task precompiled.

However, unfortunately for me, it looks like you are correct: to get reasonable performance you have to precompile, and -q doesn't really help in this case.

(To all the people saying "of COURSE you have to precompile to get reasonable performance!": you aren't understanding the question.)

Thanks a lot for your input Tom, much appreciated.
 
I agree with most of what you've said in this interesting thread, apart from:

don't have any .R's, compile on the fly all the time, which is maximally efficient when developing.

I'm puzzled as to why you seem reluctant to have precompiled programs in your development propath. Surely the vast majority don't need recompiling from one session to the next? And those that do will be compiled after you have changed them, no?
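If the worry is keeping those .r's fresh, a small helper can recompile only what changed. A minimal sketch, assuming r-code is saved under an rcode directory (names invented):

Code:
/* recompile xyzzy.p only if the source is newer than its saved .r */
define variable dtSrc as datetime no-undo.
define variable dtR   as datetime no-undo.

file-info:file-name = "xyzzy.p".
dtSrc = datetime(file-info:file-mod-date, file-info:file-mod-time * 1000).

file-info:file-name = "rcode/xyzzy.r".
dtR = (if file-info:full-pathname = ? then ?
       else datetime(file-info:file-mod-date, file-info:file-mod-time * 1000)).

if dtR = ? or dtSrc > dtR then
  compile value("xyzzy.p") save into value("rcode").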

But perhaps I have misunderstood: what am I missing? (I grant you the point that -q seems not to be as efficient as hoped; personally, I've never had a need for it.)
 
I'd actually sort of view this behavior in a positive light for a development environment. It just might give me some performance feedback that I might otherwise miss.
 