Intermittent Read Errors on Schema Database

MarkP

Good morning.

In a story I'm sure you've heard before, I first heard of Progress about 5 months ago, as the rest of this post will amply demonstrate. Since then, we have resolved many issues with our system, but we are plagued by one in particular that we just can't get a good read on.

We are running OpenEdge 10.1A, in conjunction with an Oracle database. Most of our users connect via WebSpeed, but we have several automated processes connecting via ProWin32.

These processes are supposed to run 24x7, but occasionally they crash with the following error. They often crash hours after they are launched, and I'm definitely missing something here, because I'm not sure why they would be connecting to the Schema Holder again.

bkioRead:Unknown O/S error during Read, errno 2, fd 256, len 4096, offset 64, file path\database.d1. (9451)

The errno is always 2, which should mean "File does not exist". The file descriptor is most often 256, but may be 192, 260, or 390. This is on a Windows 2003 SP2 server, so I would not expect a file handle limit of 256. The len and offset values are generally, if not always, the same from one occurrence to the next.
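To compare occurrences across environments, it can help to pull the numbers out of each message mechanically. The following is a minimal Python sketch, assuming the (9451) text is worded exactly as quoted above; `decode_9451` is a hypothetical helper name. It also looks errno 2 up in the C runtime table, where it is `ENOENT` ("No such file or directory"). The Win32 table also uses 2, for `ERROR_FILE_NOT_FOUND`, so either reading points at a "not found" condition.

```python
import errno
import os
import re

# Assumption: the (9451) message is worded exactly as quoted in this thread.
PATTERN_9451 = re.compile(
    r"errno (?P<errno>\d+), fd (?P<fd>\d+), len (?P<len>\d+), "
    r"offset (?P<offset>\d+), file (?P<file>.+?)\. \(9451\)"
)


def decode_9451(line):
    """Return the fields of a 9451 message plus the errno's symbolic name."""
    match = PATTERN_9451.search(line)
    if match is None:
        return None  # not a 9451 message
    fields = {
        key: (value if key == "file" else int(value))
        for key, value in match.groupdict().items()
    }
    # Look the code up in the C runtime errno table (2 -> ENOENT).
    fields["errno_name"] = errno.errorcode.get(fields["errno"], "unknown")
    fields["errno_text"] = os.strerror(fields["errno"])
    return fields


msg = (r"bkioRead:Unknown O/S error during Read, errno 2, fd 256, len 4096, "
       r"offset 64, file path\database.d1. (9451)")
print(decode_9451(msg))
```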

I should mention that I am absolutely positive the file exists at that path, and that the problem is not related to the ACL. This process runs successfully hundreds or thousands of times before it fails.

The database file referenced is about 13 MB.

I have 3 different environments with drastically different usage levels, and I receive the error in all 3. I receive it under both high- and low-usage scenarios, but I believe the automated processes generate more usage than my actual users, even under high usage. This happens on average about once a day, although I may get multiple instances in an hour followed by none for a few days.

Incidentally, does anyone have recommendations for a good book for an MS-SQL guy who was recently told, "Hey, Progress is a database, SQL is a database, our Progress guy is gone, here ya go"?
 
The first things that pop into my head whenever I see these sorts of intermittent problems on Windows are:

1) Is there an anti-virus program running? ("Yes" is the wrong answer.)

2) Is there a 3rd party backup program running? (Again "yes" is the wrong answer.)

Both of these cause problems under Windows because they often lock the files they are scanning, resulting in strange errors inside Progress.

3) Is the db on a network drive? That might be a problem if the network is somewhat flaky and access to the file comes and goes.
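One way to test the flaky-share theory independently of Progress is to read the same block the error mentions (4096 bytes at offset 64) over and over and record when the read fails. The following is a minimal Python sketch under that assumption; `probe_read` is a hypothetical helper, and because it opens the file fresh on every attempt while Progress keeps the file open, it is only a rough stand-in for what the client does. Each failure is timestamped so it can be lined up against other events, such as network or storage traffic.

```python
import time


def probe_read(path, attempts, offset=64, length=4096, interval=1.0):
    """Read `length` bytes at `offset` from `path` repeatedly.

    Returns one (attempt, timestamp, error) tuple per failed attempt;
    an empty list means every read succeeded.
    """
    failures = []
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                f.seek(offset)
                f.read(length)
        except OSError as exc:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            failures.append((attempt, stamp, f"{type(exc).__name__}: {exc}"))
        if interval:
            time.sleep(interval)  # pause between probes
    return failures


# Hypothetical placeholder path; substitute the real location of the .d1 file.
# print(probe_read(r"\\server\share\database.d1", attempts=3600))
```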

As for good books... for a DBA I would suggest visits to either the White Star or Bravepoint websites. Obviously I prefer wss.com ;) A mentoring engagement with an experienced consultant would also probably do you a world of good.
 
I have antivirus running on the server, but the Progress directories are on the exclusion lists, so they will not be scanned. (It was not originally set up this way: the database files were under real-time, on-access scanning. I changed this as soon as I saw it, a few months ago. Our Progress guy set it up that way...)

My backups are block level SAN Replication, and are not currently enabled on this LUN to help isolate this issue. The Schema Holder database, specifically, isn't modified very often, and shouldn't be touched even by replication for months. Unless I have a basic misunderstanding about this file?

It is on a fileshare. I see no Error Log entries about network issues, but that's far from conclusive. Would the normal best practice be to place a copy of the Schema Holder on every machine that accesses the database? That wouldn't be horrible in my deployment: 2 systems running automated processes, 1 Terminal Server, and 2 WebSpeed servers would cover it, I think. But wouldn't it be a nightmare if the majority of your clients didn't use the web front end?

I'll be spending some time on the WSS and Bravepoint sites, and then I'll see what I can convince the new bosses to go for regarding training engagements. First, though, I want to be sure I've read enough to ask the right questions and to understand the answers; I'd hate to have access to an experienced expert and waste his or her time and mine on the basics.
 
I don't know that I would go so far as to call it a "best practice". Putting a db on a network drive should be safe (so long as the network is reliable). But if the network is questionable, I'd give local copies a try, or I'd do it just to rule out the possibility.

I'd also grab a copy of the Sysinternals tools and start trying to figure out whether there is a pattern to the fd being reported. Actually, offhand, those seem like awfully high fd numbers for a schema holder db. Why so many open file descriptors? Progress can support thousands, but it is strange for a small db to have so many.
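A quick tally of the fd values already reported is a cheap first step toward spotting a pattern. Below is a minimal Python sketch, assuming the 9451 messages have been collected one per line (for example from a client log) in the wording quoted at the top of the thread; `tally_fds` is a hypothetical name.

```python
import re
from collections import Counter

# Assumption: messages use the wording quoted in this thread.
FD_PATTERN = re.compile(r"fd (\d+),.*\(9451\)")


def tally_fds(lines):
    """Count how often each file descriptor appears in 9451 errors."""
    counts = Counter()
    for line in lines:
        match = FD_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "errno 2, fd 256, len 4096, offset 64, file x.d1. (9451)",
    "errno 2, fd 192, len 4096, offset 64, file x.d1. (9451)",
    "errno 2, fd 256, len 4096, offset 64, file x.d1. (9451)",
    "an unrelated message",
]
print(tally_fds(sample).most_common())  # [(256, 2), (192, 1)]
```

For a real log, an open file object can be passed directly, since iterating over a file yields its lines.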
 
I have numerous instances of Prowin32 running, plus a handful for the OS and related applications.

It appears that it may be a physical hardware issue. A network trace shows that the file server is responding to the request for the Schema Holder, but the server making the request does not ACK the packets. There is some suggestion of a correlation with SAN traffic, or certain kinds of SAN traffic, that bears further investigation.

It looks to me like this will end up being a shared issue, where "interesting" error handling and possibly less-than-optimal hardware combine to cause the problem.

Our application "support" suggests moving the servers off the blade and SAN, and onto a regular 2U server. I'm told no one else is running a SAN...

I'll post back if I come up with a better solution, in case someone else has a similar issue. Thank you for your time; I'm slowly reading back issues of Progressions in my spare time!
 
Lots of people run Progress on blades and SANs. It may be that your particular vendor has little or no experience with such things. Many Progress partners don't have much experience with deployment issues. Sometimes their customers are way ahead of them WRT such things. That's where guys like me come in handy ;)