Question: Out of control SRT file

First time poster. I apologize in advance if I've missed anything of value.
OpenEdge 10.2A on Linux.
Issue: the SRT file hits 2 GB and the session crashes (this is not an index rebuild).

I have a new program that has to make calls to a fairly old "processing engine," as they call it, which might run through 8-10 scenarios (calling a 'C' program) and eventually return the recommended formula based on the data I pass it.

The good: it has worked in the lab for about two months.
The bad: as soon as I hooked it up for a pilot, it failed (every time) with the SRT file issue described above.
The known: changes were made to the calling program (which one would think would make the problem easy to find, since I know the stack trace).

The structure is as follows:
<code>
/* my code */
RUN GetPointers.p.
RUN PrepExtract.p.   /* passes the handles along */

FOR EACH ttTable EXCLUSIVE-LOCK:
    /* I can't touch ProcessingEngine.p, and I noticed it was much
       faster when fed one row at a time */
    CREATE bttTable.
    BUFFER-COPY ttTable TO bttTable.
    /* existing code */
    RUN ProcessingEngine.p (INPUT-OUTPUT TABLE bttTable).
    FIND FIRST bttTable.   /* reposition: the by-value call rebuilds the table */
    BUFFER-COPY bttTable TO ttTable.
    EMPTY TEMP-TABLE bttTable.
END.
</code>

Some things to note about ProcessingEngine.p: it's old. Lots of shared variables and shared temp-tables. Lots of shared locks. Lots of bad guys in general. There are no queries in this old program.

I've:
- Enabled -t
- Enabled client statement caching
- Run through it with the debugger
- Used the -noautoresultlist startup parameter, even though I didn't think it would help

At this point I'm just seeing a bunch of letters and numbers on a page. Any suggestions (or insults :)) would be very much appreciated.

Best Regards and thanks in advance,

Rod Anderson
 

TheMadDBA

Active Member
0) Are you 100% certain it is the SRT file and not the DBI file?

1) Is this all of the code or just a simplified version of it? No BREAK BY or PRESELECT anywhere?

2) ttTable is a temp-table?

3) Why are you creating more records for ttTable inside a FOR EACH on ttTable? That probably isn't going to do what you want, depending on the index used.

4) What happens when you run your code without calling ProcessingEngine.p?

5) Have you ever used the Profiler tool before?
 
Thanks for the response.
0) Yes, it is the SRT file. Same condition you would usually get with an index rebuild. The DBI file is large but consistent, about 200 MB. You get "SYSTEM ERROR: Seek error 22, file 30, addr -2147483648. (163)", the same as an index rebuild.
1) It's just a simplified version for readability. There is a WHERE clause that filters out about a third of the temp-table, and it is a FOR EACH: no query, no BREAK BY.
2) ttTable is a temp-table (about 170,000 records).
3) I'm copying one record at a time to a second "like" temp-table (bttTable) and back after the call returns. As noted in the comments, I found that sending the older program one record at a time in a temp-table was faster, even with the extra overhead.
4) Runs to completion in seconds. In fact, if I switch back to the previous version of ProcessingEngine.p used in the proof of concept, it works fine.
5) No, but I'll look into it tonight.

Thanks again,

R
 

TomBascom

Curmudgeon
What errors do you actually get? What is the *first* error?

You are creating a lot of bttTable widgets and never deleting them. (You are "emptying" them but that isn't the same.) You could create it once (even statically) and just do the copy/process/empty operations in the loop.

Have you enabled any client logging? There are some very useful log types for tracking memory leaks.
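
For example, a minimal sketch (the exact DynObjects.* entry types available on 10.2A may vary):

<code>
/* sketch: turn on client logging for dynamic-object tracking --
   the DynObjects.* entry types log handle creates and deletes */
LOG-MANAGER:LOGFILE-NAME    = "client.log".
LOG-MANAGER:LOGGING-LEVEL   = 3.
LOG-MANAGER:LOG-ENTRY-TYPES = "DynObjects.DB,DynObjects.Other".
</code>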

Also, passing bttTable by value (the default) is going to hurt. I suggest adding "by reference".
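
Something like this -- just a sketch, assuming ProcessingEngine.p declares a matching TABLE parameter:

<code>
/* BY-REFERENCE binds the callee to the caller's temp-table instance
   instead of deep-copying every record on the way in and out */
RUN ProcessingEngine.p (INPUT-OUTPUT TABLE bttTable BY-REFERENCE).
</code>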

Does "processingEngine" do anything unfortunate like install itself as a super-procedure?

I see an exclusive-lock but no explicit transaction. Is that perhaps because there is more code not being shown and the actual transaction scope is much wider than expected?

Why does ttTable need to be exclusive locked? You're just copying the record -- it seems like no-lock would work equally well. (Does "tt" mean that it is a temp-table? If so, is it a no-undo temp-table?)

I second the motion to run the profiler -- identifying where the time goes may well lead to where the problem is. Although if the session is crashing that will be a problem since you will lose the profiling data unless you gather it in a working environment like the lab.
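
A rough sketch of driving the profiler from code so the data gets flushed before any crash (file name and placement are just examples):

<code>
/* sketch: profile the engine call and write the data out explicitly,
   so a later crash doesn't take the profiling data with it */
PROFILER:ENABLED     = TRUE.
PROFILER:FILE-NAME   = "engine.prof".   /* hypothetical output file */
PROFILER:DESCRIPTION = "ProcessingEngine pilot run".
PROFILER:PROFILING   = TRUE.

RUN ProcessingEngine.p (INPUT-OUTPUT TABLE bttTable).

PROFILER:PROFILING = FALSE.
PROFILER:WRITE-DATA().
</code>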

I also wonder, along with TheMadDBA, if the issue is DBI file rather than SRT file.

Lastly, it "worked" in the lab -- was the lab db the same size as the db which fails?
 

TomBascom

Curmudgeon
2GB limit errors should be a thing of the past.

What does $DLC/version say?

Have large files been enabled?

Is this a 32 bit or 64 bit executable?

Is the -T filesystem large file capable?

How do you know that file descriptor 30 is the SRT file?
 

TomBascom

Curmudgeon
In fact, if I switch back to the previous version of ProcessingEngine.p used in the proof of concept, it works fine.

What is different in the two versions of ProcessingEngine.p?
 

TheMadDBA

Active Member
Well, you know where to look now... the core issue is in the new version of ProcessingEngine.p (or its new children). Your other code may or may not have issues, but at least it runs to completion with the old version.

Do a diff between the versions and find out what has changed.

Download ProTop if you haven't already and take a look at which records are being read.
 
Thanks for all the responses. While it's not my first rodeo, I'm assuming I'm doing something stupid. To clarify: when I used the term "lab" I was referring to our development server in the "lab". The db and associated data are the same; I even performed a prorest to ensure no differences existed. ProTop indicates that it's for 10.2B (I'm on 10.2A) but I'll give it a try.

I'm running the 32-bit version on SUSE. Large File Support is turned on. ProcessingEngine.p is huge. It calls about 5 other programs plus the previously mentioned external 'C' program. It can take 1-18 seconds, depending on the number of passes, to return the best formula results. I'm basically trying to determine formula changes from one release to another based on data changes that drive dynamic formula changes. I know I should be able to follow the code and find the answer right in front of my face; I just thought I was missing something obvious.

I know (or think I do) that it's the SRT file because I can watch it grow, and the "addr -2147483648" part of the message (-2147483648 is -2^31, i.e. exactly the 2 GB signed 32-bit boundary) is almost exactly the size of the SRT file when it core dumps. Maybe I'm making a bad assumption? I know the old business logic runs to completion, but the new logic, with significant changes, crashes.

Here is where someone could help explain what I don't understand. If I understand correctly, Progress uses the SRT file to manage queries, BREAK BYs, FOR EACHes, index rebuilds, etc., especially on bad index reads. Wouldn't the need for this be destroyed (for lack of a better term) at the completion of each ProcessingEngine.p run? It's older-style code, so there are no queries, handles, or anything else that would create a memory leak or such.
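
For example (a purely hypothetical table and field), my understanding is that something like this has to sort in the SRT file when no index supports the ordering:

<code>
/* illustrative only: an ordering no index can satisfy is resolved
   by sorting the result list in the SRT file */
FOR EACH orderLine NO-LOCK BREAK BY orderLine.shipDate:
    /* ... */
END.
</code>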

Thanks again for all the feedback. I'll continue my pursuit.

Rod
 

TomBascom

Curmudgeon
cat $DLC/version

You are on Linux, so use "lsof -p <PID>" to get a list of the open files for the process. (You will need to be "root" or get a root user to do it for you.) This will reveal which file descriptor is #30. It may indeed be the SRT file, but it would be a really good idea to verify that.

When you run the old code does the SRT file grow anywhere near 2GB?

Can you create a file greater than 2GB on the -T filesystem using standard OS tools? i.e. dd if=/dev/zero of=/protemp/bigfile bs=1G count=3

Yes, ProcessingEngine.p is huge. Got it. But it didn't *all* change recently did it?

You *are* doing things with handles. Specifically, you are creating temp-tables dynamically -- as many as 170,000 of them, according to your data above. You CREATE them but never DELETE them -- that is a handle-based memory leak in action. If it has not bitten you yet, it will eventually.
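
The general pattern, as a sketch:

<code>
/* every dynamically CREATEd object needs an explicit DELETE OBJECT;
   EMPTY TEMP-TABLE only removes the records, not the handle */
DEFINE VARIABLE hTT AS HANDLE NO-UNDO.
CREATE TEMP-TABLE hTT.
/* ... add fields, TEMP-TABLE-PREPARE, use it ... */
DELETE OBJECT hTT.
</code>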

Yes, you have accurately described what the SRT file is used for. In older releases it also held r-code -- I'm not sure exactly when the RCD files appear but if there aren't any visible with -t turned on then you are on a release that stores r-code in the SRT file.

Transaction and record scope issues might also impact how long things are kept in the SRT file (I have not tested that -- I'm just speculating). That's why I'm double checking the apparent issues with scope in the code snippet that you have shared.
 

TheMadDBA

Active Member
If you are using OPEN QUERY or dynamic queries and not closing them, the space in the SRT file will not be reused. I'm not 100% sure about other things that would stop SRT space from being reused rather than grown. I try not to write code that causes SRT files ;)
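
i.e. something along these lines (a sketch with a hypothetical buffer):

<code>
/* close and delete dynamic queries so their SRT space can be reused */
DEFINE VARIABLE hQuery AS HANDLE NO-UNDO.

CREATE QUERY hQuery.
hQuery:SET-BUFFERS(BUFFER ttTable:HANDLE).
hQuery:QUERY-PREPARE("FOR EACH ttTable").
hQuery:QUERY-OPEN().
/* ... consume the results ... */
hQuery:QUERY-CLOSE().
DELETE OBJECT hQuery.
</code>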

I think the r-code change to RCD happened around 10.1C or so.
 
Thanks for hanging in there with me. Here are my latest findings:
"lsof -p <PID>" verified it is the srt file
"dd if=/dev/zero of=/protemp/bigfile bs=1G count=3" verified support

I do want to verify my understanding of one thing. Tom said:
"You *are* doing things with handles. Specifically you are creating temp tables dynamically. As many as 170,000 of them according to your data above. You CREATE them but never DELETE them -- that is a handle based memory leak in action. If it has not bit you yet it will eventually. "

I get the whole "if you create it you must delete it" thing. My handles, queries, etc. are destroyed (deleted) and verified. The two temp-tables in my very 'simplistic' example code are both statically defined. I am not passing handles (or buffers via handles), nor creating and assigning handles. Are you saying that even a statically defined temp-table emptied with EMPTY TEMP-TABLE still leaves a memory leak? If so, I did not realize that.

I can swap out the versions of the procedure and it works/fails consistently, so I'll continue sweeping through the code. I'm sure you'll appreciate this: I'm dealing with variables like "ok" as a logical. "ok" is passed to a procedure as a new variable called "inbound"; that procedure also has its own variable called "ok" that means something else. Not relevant to the conversation, but I thought you would enjoy my pain.

R
 

TomBascom

Curmudgeon
My mistake -- I misread the code at the top and thought that you were creating a new bttTable rather than just a record within it.
 

TheMadDBA

Active Member
Are you running 32-bit or 64-bit Progress? If you are running 32-bit, then 2GB will still be a limit for temporary files. Some of the early 64-bit versions still had 2GB limits in certain places (temp files, output, etc.).

I am pretty sure ProTop will still work with 10.2A; if not, you can always set -tablerangesize and -indexrangesize manually and investigate the DB activity using the _UserTableStat and _UserIndexStat VSTs, or at the very least _TableStat and _IndexStat.
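
A rough sketch of reading them (assumes -tablerangesize covers the table numbers involved):

<code>
/* report per-table read counts from the _TableStat VST */
FOR EACH _TableStat NO-LOCK:
    FIND _File NO-LOCK
        WHERE _File._File-Number = _TableStat._TableStat-Id NO-ERROR.
    IF AVAILABLE _File AND _TableStat._TableStat-Read > 0 THEN
        DISPLAY _File._File-Name FORMAT "x(20)"
                _TableStat._TableStat-Read.
END.
</code>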

Also... when you have a decent-sized SRT file, try this from the shell: strings SRTfile > somefile

Sometimes you can glean useful information from the character contents of the SRT file.
 
Thanks TheMadDBA & Tom. I'll update when I figure it out. Have a great Labor Day weekend. FYI Tom, I agree and added the BY-REFERENCE. :)

R
 

TomBascom

Curmudgeon
Depends on which ProTop -- the old version "xx" works fine with 10.2A but the user interface is sometimes challenging. ProTop 3.x says that it requires 10.2B, but I have made some effort to make things compile conditionally and it /might/ work with 10.2A -- perhaps with a few small source code tweaks, if you have the appetite for that sort of thing.
 

TomBascom

Curmudgeon
"lsof -p <PID>" verified it is the srt file
"dd if=/dev/zero of=/protemp/bigfile bs=1G count=3" verified support

I'm a "trust but verify" kind of a guy. Lots of times some really crucial assumption that I was *sure* was true has turned out not to be. So when things seem really strange I start by verifying things that should be true. Sometimes you save a lot of time and energy that way.
 
Tom, TheMadDBA,

If you wouldn't mind, I was hoping you could either validate or disprove my final assessment. Actually, Tom's comment "start by verifying things that should be true" led me down the path. Here is what I discovered (facts/observations):
  • I was trying to run in excess of 170K transactions through an existing engine that only processed, on average, 150 transactions a day.
  • I was focusing only on changed code and not the big picture. I discovered two bad reads of a db table with over 300K records, causing a very large read scope in many cases.
  • I did introduce an environmental difference inadvertently. I tuned the startup parameters when I first started the project (adding -B, -TM, and -TB, among others). None of these existed on the production systems (ouch!).
  • Even with the bad reads, the previously failing systems run once -B is tuned, with the buffer cache getting about 99% of the hits.
  • With the code fix and no tuning, they ran without failing.
  • I also discovered a memory leak from not cleaning up the memory pointer when the 'C' program was called (again, old code) -- see the sketch after this list.
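The leak looked roughly like this (names are hypothetical; the fix was the missing SET-SIZE = 0):

<code>
/* sketch: a MEMPTR allocated for the C call must be released,
   otherwise every call leaks the allocation */
DEFINE VARIABLE m AS MEMPTR NO-UNDO.
SET-SIZE(m) = 1024.                /* buffer handed to the C routine */
RUN callFormulaEngine (INPUT m).   /* hypothetical external call */
SET-SIZE(m) = 0.                   /* free it -- this was missing */
</code>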
I still don't really understand why the SRT file would grow even after the program finished and was then called again. Could it have had something to do with the memory leak, or the bad reads, or a combination thereof?

Either way, I'm moving on. Thanks again for your much-appreciated help and advice.

Rod
 