Progress DB Engine Is Stupid About Some Things

RE: Running 4GL on same machine as DB.

Yes you need to do this for performance.

See e.g. article 18342, "How to improve Client Server Performance in Progress":

http://tinyurl.com/fktm6

If you want to call 4GL stuff from a client, don't forget the AppServer option.
 

TomBascom

Curmudgeon
As a general statement RAID5 sucks if the performance of an application matters. Regardless of whether the application is a Progress application or some other application.

You often don't notice that performance sucks (hence the generally low level of recognition that RAID5 is devil spawn) because read operations dominate most applications and RAID5 read performance is mostly ok (not great, just ok). Furthermore vendors realize that RAID5 sucks so they don't really deliver RAID5 -- they deliver RAID5 plus a big RAM cache which mostly masks most of the problem most of the time. But note the word "most" (used thrice).

For the RAM cache to be effective it has to lie to the OS about when writes are completed ("write back" vs "write thru"). If it is lying ("write back") and it has reordered the write operations (a common optimization) and if a bad thing, such as a power outage, happens then the database can be corrupted. This used to be fairly common. Vendors have become smarter but you usually have to pay a lot more for a trustworthy implementation of RAID5 (think EMC or IBM). If you have a low cost RAID5 "solution" you probably have a lot more risk than you realize. (All the more reason to be aggressively leveraging after-imaging and getting those ai files archived onto another machine pronto...)

It is also important to recognize that even "large" caches can be easily saturated by certain operations. It doesn't take very long for a database restore, an ai roll forward, a dump and load, an index rebuild, a dbanalys, dbtool, or a large table scan to saturate the cache (do the math, it isn't hard). At which point you're limited by the naked RAID5 -- and RAID5 write performance is essentially that of a single disk. While many of these operations are relatively rare they also take place under extreme time pressure -- restoring and rolling forward, for instance, isn't usually something that you're doing just for kicks. The boss is probably pacing the aisle by your cube :eek:
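The "do the math" invitation above can be sketched quickly. This is a back-of-envelope illustration, not a benchmark; the per-disk IOPS, cache size, and transfer rates below are assumed figures:

```python
# Back-of-envelope RAID math (illustrative numbers, not measurements).

def raid5_write_iops(n_disks, disk_iops):
    # Each small random write on RAID5 costs 4 physical I/Os:
    # read old data, read old parity, write new data, write new parity.
    return n_disks * disk_iops / 4

def raid10_write_iops(n_disks, disk_iops):
    # Each write on RAID10 costs 2 physical I/Os (one per mirror side).
    return n_disks * disk_iops / 2

def seconds_to_fill_cache(cache_gb, incoming_mb_per_s, drain_mb_per_s):
    # Once sustained writes exceed what the naked array can drain,
    # the cache fills at the difference; after that you run at raw speed.
    surplus = incoming_mb_per_s - drain_mb_per_s
    if surplus <= 0:
        return float("inf")  # cache never fills
    return cache_gb * 1024 / surplus

# 6 disks at an assumed ~180 random IOPS each:
print(raid5_write_iops(6, 180))   # 270.0
print(raid10_write_iops(6, 180))  # 540.0
# 4 GB cache, a restore streaming 200 MB/s in, array draining 60 MB/s:
print(round(seconds_to_fill_cache(4, 200, 60)))  # 29 seconds
```

Even a "large" 4 GB cache buys under half a minute against a sustained restore-style write stream, which is why those rare bulk operations expose the naked RAID5 so quickly.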

Read performance can also suck with RAID5. When a disk fails and the data is being reconstructed from parity you'll see what I mean ;)

Lastly -- even if you aren't having any particular performance problems your performance is significantly less than it could be if you had instead implemented RAID10 (striping aka RAID0 and mirroring aka RAID1 combined).

Database performance is not dependent on disk space (which is what RAID5 optimizes). Nor does it generally depend on disk throughput as measured in bytes per second (that matters for sequential IO, like streaming video on demand). Instead an RDBMS depends on disk operations -- IO ops per second. The only way to get more IO ops per second is to get more disks -- and since disks grow in capacity faster than they grow in speed, that generally means you'll have lots of empty space. Bean counters have a hard time with that concept.

Never allow the empty space to be called "wasted" -- set it aside for particular purposes that do not contend with the database for the IO ops. Good uses include a recovery area, so that if you have to restore you do not have to overwrite your database (sometimes backups turn out to be no good; you'll want your broken production database if that happens to you), an upgrade area (it is very handy to be able to fall back on the untouched database if an upgrade has to be rolled back), and so forth.
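The size-by-IOPS-not-gigabytes point can be made concrete. A minimal sketch, with assumed figures for database size, load, and per-disk ratings:

```python
import math

def disks_needed_for_space(db_gb, disk_gb):
    # Sizing by capacity: how bean counters think.
    return math.ceil(db_gb / disk_gb)

def disks_needed_for_iops(target_iops, per_disk_iops):
    # Sizing by I/O operations: how databases actually behave.
    return math.ceil(target_iops / per_disk_iops)

# A 300 GB database doing 2000 random IOPS, on 600 GB disks
# rated at an assumed ~150 random IOPS each:
by_space = disks_needed_for_space(300, 600)   # 1 disk "fits" the data
by_iops  = disks_needed_for_iops(2000, 150)   # 14 disks carry the load
print(by_space, by_iops)
```

Fourteen disks to carry the load versus one to hold the bytes -- the other thirteen disks' worth of "empty" space is exactly the capacity you should earmark for recovery and upgrade areas.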
 

BCM

Member
Here's what I have learned: We use RAID0+1 (RAID10) as recommended. Our drives are SCSI.

You gentlemen have indicated that 4GL database processing will be considerably faster when the code is executing in shared memory with the database rather than through the TCP/IP stack. How do you enable this when the database is in multi-user mode (during business hours we always have at least 50 concurrent users spread across 5 offices from Atlantic coast to Pacific coast)?
 
BCM said:
You gentlemen have indicated that 4GL database processing will be considerably faster when the code is executing in shared memory with the database rather than through the TCP/IP stack. How do you enable this when the database is in multi-user mode (during business hours we always have at least 50 concurrent users spread across 5 offices from Atlantic coast to Pacific coast)?

This isn't going to be very helpful but...

I'll leave the config suggestions to those who know about the subject.

You've said you have no control over the source -- so is it a modern application (i.e. AppServer/web services) or an old one (client-server)? What do your suppliers say about how it should be configured?

If the application is modern and well written, you should be able, with your supplier's help, to get to the best configuration.

On the other hand, if the application is old or (as you've previously suggested) your application suppliers are useless, and your configuration options are limited, perhaps you need to consider exporting/converting your data to a new application.
 
Assuming the worst case, that the application is client-server and you cannot do much more about the configuration, one option to consider may be something like Citrix, which (as I understand it; I have never used it) runs the application windows local to the database and transmits only the screen updates to the remote clients, not the data.

http://www.citrix.com/English/ps2/p...2671&title=About+Citrix+Presentation+Server+4

Apparently it's very expensive.

Obviously, the 'junk the application'/'use Citrix' options are last resorts to use if you have no leeway on the configuration issues -- but to be honest, I don't understand why you are not getting more support from the supplier.
 

BCM

Member
We have users spread across 5 offices from Atlantic to Pacific. The burden on Citrix is too much. In 2004 prior to my coming onboard, the firm worked with the application supplier to develop a browser based substitute front-end. The browser is Internet Explorer, the browser code is ASP, the web server is IIS, the web server talks to the application through an API developed by the supplier. The API talks to the Progress AppServer. The AppServer talks to the database. That is how most users get to the data today. It is slow.

The supplier long ago acquired this application and never cleaned house or re-wrote it. There are 565 application tables, and we actually have data in half of those. Many tables have lengthy lists of columns (100+). Many of those columns are not keys but contain data that is available in other columns in other tables. Columns are not consistently named. Each table has many indices (6-12 per table), and some of the indexed columns are never populated.

I could go on and on, but I would just become angry again. Thanks for your suggestions, thoughts and comments.

By the way, how can a 4GL application running on the same box as the database be configured to use shared memory rather than TCP/IP when the database must be run in multi-user mode?
 

TomBascom

Curmudgeon
Simply stop using -H & -S and the connection will be a shared memory connection. (This *only* works if the database is on the same machine as the client.)

Multi-user isn't really relevant -- "shared" memory is shared between multiple users.

You mention that app servers are used to access the data. Thus it is the app servers that should be running on the same box and using shared memory (aka "self service") connections.

50 users is not very many.
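As a sketch of the two connection styles: the database name, host, and service below are placeholders, not your actual configuration.

```
# Remote (client/server) connection -- routed through TCP:
mpro mydb -H dbhost -S mydb-service

# Self-service (shared memory) connection -- no -H/-S, client must be
# on the same machine as the database:
mpro mydb
```

The same rule applies to AppServer agents: start them without -H and -S and they attach to the database's shared memory directly.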
 

joey.jeremiah

ProgressTalk Moderator
Staff member
In a self-service connection there's nothing between the client and memory (or disk).

There's no server in between that processes the queries; in a sense the client is the server.

Of course there are other server processes, e.g. the watchdog, the asynchronous page writers, the broker's general housekeeping tasks, etc.


In a remote connection the server processes queries on behalf of the client.

In a way it acts as an extension of how the client works in a self-service connection.

That is extremely problematic over a network and maybe impossible over a WAN.


Here's a recent post on the subject. I've even cited a recent thread of yours as the perfect example.

http://progresstalk.com/showthread.php?t=96905&page=2


Besides that, again, I think you'll be happier if you also subscribe to peg.com.

There aren't as many of us here and we're not as famous (yet), well, except for Tom.
 

JeremyGiberson

New Member
Order of testing

I do not want to contribute to the "why Progress" debate, but I wanted to address the issue of speeding up the select. In my experience working with ODBC and SQL-92, the order of conditionals in the SQL statement affects the speed of the initial select.

For instance, take a table "payment-trans" with a great many rows and 30 fields, two of which are indexed. When selecting from this table with a WHERE clause that involves one indexed field and one regular field, the statement with the indexed-field conditional first returns suitably fast, whereas putting the non-indexed-field conditional first can take minutes.

I would suggest a couple of things to try to speed up your select. First, put the false conditional right up front. If it still takes a long time to process, try combinations of conditionals involving the indexed fields you are selecting on and your false conditional.

IE:
Code:
WHERE 1 <> 1 AND "indexed-field1" = 'blah' AND "indexed-field2" = 'blah' AND "non-indexed-field" = 'blah'
WHERE "indexed-field1" = 'blah' AND "indexed-field2" = 'blah' AND 1 <> 1 AND "non-indexed-field" = 'blah'
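Whether conditional order matters at all is engine-specific; a cost-based optimizer ignores it entirely. One way to find out is to look at the query plan. A minimal sketch, with SQLite standing in for whatever engine you are using (table and index names are made up):

```python
import sqlite3

# In-memory database with an indexed and a non-indexed column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payment_trans (acct TEXT, amount REAL)")
con.execute("CREATE INDEX ix_acct ON payment_trans (acct)")

# Ask the engine how it intends to execute the query.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM payment_trans WHERE acct = 'A1' AND amount > 10"
).fetchall()
print(plan)  # the plan text names ix_acct when the index is chosen
```

If the plan shows a full table scan instead of the index, reordering conditionals is worth trying; if the index is already chosen, the order is irrelevant on that engine.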

I would also be sure to order unique indexed fields before regular indexed fields.

Let me know if this improves your query.

BCM said:
I have occasional need to build routines that copy data from our Progress database to SQL Server. A simple way to quickly generate an empty table structure is to issue a SELECT ... INTO <TABLENAME> FROM ... WHERE <your criteria> AND 1 <> 1. Because the number 1 is always equal to 1, no records will be returned, but the table structure will be created. With a robust database engine the above query completes in a fraction of a second because the database engine determines that no records will pass the test where 1 <> 1. However, Progress just blindly proceeds to scan all rows.

To be fair, I used the Procedure Editor to try a similar test in 4GL. Go ahead, you try: "FOR EACH <TABLE> WHERE 1 <> 1 NO-LOCK: END."
Sure enough, Progress scanned every row. Frankly, my dog is smarter than this.
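The empty-structure trick in the quote is easy to demonstrate. A small sketch with SQLite standing in for SQL Server (an optimizer that short-circuits the always-false predicate; table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (id INTEGER, name TEXT, amount REAL)")
con.execute("INSERT INTO src VALUES (1, 'a', 9.99)")

# Always-false predicate: copies the column structure but no rows.
con.execute("CREATE TABLE dst AS SELECT * FROM src WHERE 1 <> 1")

cols = [row[1] for row in con.execute("PRAGMA table_info(dst)")]
rows = con.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
print(cols, rows)  # ['id', 'name', 'amount'] 0
```

A smart engine answers this instantly regardless of table size, because it never touches the data; an engine that scans every row anyway is the behavior being complained about.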
 

BCM

Member
JG -
I don't really need to select data where 1<>1. It was just a quick and dirty way to see a column list without having to log into the Data Administration tool.

Some database engines are sensitive to the order with which conditions are given.

Thanks for your response.
 

JeremyGiberson

New Member
BCM,
I understand that; I am suggesting a way to shorten the time it takes the engine to realize nothing will be selected and thus return your column set.
 