Running Progress on a RAID disk array

Chris Kelleher

Administrator
Staff member
FAQ - PROGRESS & RAID
=====================

From: McIntyre_Bruce/amer_tech@qad.com
Date: Tue, 25 Apr 95 10:06:20 -0400
Subject: [PEG] [RE] Multi Volume Databases
To: progress-list@margaret.peg.happmis.com

From message by Rimmy Kaur:
>
> Our database server is a Compaq Proliant 4000 currently hosting
> a single processor. It has 64 MB of RAM and is running Unixware.
> The RAID 5 implementation consists of five 2.1 GB SCSI-2 hard drives
> and one SMART SCSI array controller. There will be about 30 users
> accessing the system at any given time. The application will be host based.

This is a perfect example of the worst kind of RAID 5 implementation. Everything
is on one "logical" disk, made up of several medium-size disks. There
is no ability to "tune" the system for various kinds of load, or to isolate
certain disks for certain kinds of activity. While this may indeed work OK
with the load you have specified, it certainly is penny wise and pound foolish.

1. Can't isolate where and how SWAP is set up and used (and EVERYTHING is
paged to swap at some point).
2. Can't isolate the users' TMP files from database reads/writes and BI
reads and writes.
3. One controller channel for five drives virtually guarantees it will be
maxed out.
4. Performance will be "spotty" because of conflicts between the various uses
of the filesystem, and will be impossible to isolate and/or cure.
5. All writes to the BI file will conflict with writes (and reads) from the
database.

If you MUST use RAID 5, use it ONLY for the database, and never put more than
four physical drives on a SCSI-2 controller channel. If I were setting
up this system, I would use ten 1.2 GB disks, three controllers,
and about 128 MB of RAM.

Then I might use RAID 5 on the DB drives and the users' drives, but would
still have the BI file on a mirrored pair. I would put the root filesystem
on a single drive, and leave the rest as a separate RAID 5 set. This will
cost a little more, but will give you high availability and great
performance.
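
To make that concrete, a structure file for that sort of layout might look
something like the sketch below (paths, names and sizes are placeholders
only, and the exact .st syntax and utility names should be checked against
your Progress release):

   # mfgpro.st -- hypothetical multi-volume layout
   # BI extent on its own mirrored pair
   b /bi/mfgpro.b1
   # fixed data extents spread over the database volumes (sizes in KB)
   d /db1/mfgpro.d1 f 204800
   d /db2/mfgpro.d2 f 204800
   # last data extent left variable
   d /db3/mfgpro.d3

   # then, for example:  prostrct create mfgpro mfgpro.st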

> The past few days discussion on RAID 5 vs. RAID 0 and RAID 1 has been
> very informative. PSC response to RAID 5 is that Progress does not care
> as long as RAID 5 is implemented properly at an hardware level. RAID 0
> and RAID 1 does offer the best performance and protection but is also
> the most expensive to implement. RAID 5 provides a less expensive
> alternative while maintaing fault tolerance. Wouldn't it be equally
> effective to stripe the bi file (under RAID 5) as having the bi file on a
> seperate disk and mirroring it?

The BI file is basically the most constrained resource in a Progress database.
All transaction notes and rollback notes MUST be written to the BI file. This
is done basically in a sequential fashion. Once you lay that disk drive head
down to write, you never want it to lift. Anything other than that virtually
guarantees that the BI file will become the constraining factor in I/O
throughput. The same thing would be true of the AI file if you were using it.

If you put the BI file on a RAID set, or anywhere it must contend for disk
access time, you will directly and immediately impact the ability of the
entire database to process transactions.

> Rimmy Kaur
(703) 696-4925

Bruce
--
===========================================================================
Bruce A. McIntyre, Industry Analyst for Technology, QAD Inc. (bam@qad.com)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: tom@bass.lkw.com (Tom Bascom)
Subject: re: tuning NCR
Date: Fri, 14 Apr 1995 17:46:55 -0400 (EDT)

> Sorry to cover old ground again. At Freedom Furniture we have an NCR 3500
> machine with 512mb of ram and 18 gig of disk running RAID 5.
>
> We are running Progress 6.3? on this machine.

6.3 what? According to Bruce 6.3b stands for 6.3 Bugs.

> We currently have 200 users on this machine.
>
> About 2/3rds of the users are at the stores and the distribution centres,
> which spread across Australia.

How are they connected?

> The response time of the machine is very bad. We have tried a lot of things
> but don't seem to have gained any ground.

How bad is bad? Have you measured bad? How much better do you need to make
bad in order for it to be good?

> If any one has any information on Tuning this machine or any other
> alternative, be it getting rid of RAID 5 and installing fixed mirrored
> disks, please send me your opinions.

What is the disk layout?

Get rid of RAID 5. Striping and mirroring are good. Make sure that
the .bi file is on its own, dedicated, spindle. Stripe the db
across as many spindles and controllers as possible. Don't share
the .db disks with other functions. Put the temp files on a disk of
their own. Put the r-code on its own disk. Put the OS on a disk
of its own. Lots of disks are lots better than a few great big
disks.

How much RAM do you have and are you using it?

Plenty of RAM is a good thing. 2.5 MB per user is a good start.
Make sure that you're using it; that is, set -B to a useful value
that does not cause swapping (isn't larger than physical RAM). For
200 users a -B setting of 200,000 would not be unreasonable. Such
a setting will require kernel tuning outside the experience of
most system administrators. SAR lies about how much RAM is being
used on that machine so you have to figure it out empirically.
If -B is set to some ridiculous value like 500, open the box up,
remove 500 MB or so of that RAM and send it to me. Nobody will
notice; you aren't using it, but I will :)
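
For instance, something along these lines (the database name and numbers are
placeholders only, and the exact kernel parameters vary by platform):

   # roughly 200MB of buffer cache with 1KB database blocks
   proserve /db/freedom -B 200000 -n 220 -L 20000

   # a -B that size needs an equally large shared memory segment, so kernel
   # limits such as SHMMAX usually have to be raised first -- check your OS
   # documentation before trying it.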

How many processors?

I'd assume that a 3500 has at least a couple of processors in it.
If so, are you using the -spin startup parameter? If you're doing
more than a couple hundred process switches/sec then you're probably
thrashing.

What are the broker and client startup parameters?

What does PROMON say your problems are?

Have you considered moving to 7.3?

-----------------------------------------------------------------
| Tom Bascom You've just got to be persistent. |
| EDS |
| tom@lkw.com |
-----------------------------------------------------------------

===========================================================================

From: McIntyre_Bruce/amer_tech@qad.com
Date: Mon, 24 Apr 95 04:18:15 -0400
Subject: [QAD] RAID and what it means
To: faq

WHAT IS RAID AND WHAT DOES IT MEAN TO YOU!
==========================================
Understanding the Technology
by Bruce McIntyre, qad.inc


Avoid the 'Bigger Is Better' Disease
------------------------------------
The benefits of redundant arrays of independent disks (RAID) are
real. Our benchmark tests prove it. They offer reliability,
reasonable performance, easy upgrades, and better availability than
the alternatives. If your data is worth more than your hardware,
RAID is the way to go.

Historians, journalists and others have documented the American mind-set
of 'Bigger is Better' for many years. Quality and efficiency
take a back seat to volume or size. This was fine for 25 years or
so, but the rules have changed. However, that mind-set won't die
easily.

Disk storage illustrates the point. Should you buy a few large drives
or lots of smaller drives? Do you hear a little voice telling
you to buy the big ones? That's the disease whispering in your ear.
It's especially hard for data center people to adjust their thinking.
Big is their life. When it's time to buy disks for the MFG/PRO
environment they are tempted to handle it in the same way they
handled the mainframe. It's easy to forget what made distributed
computing attractive in the first place: inexpensive hardware and
software. When it comes to these new systems, smaller is often less
expensive.

Especially in an MFG/PRO and PROGRESS environment, you can never have
too many disks or too many controllers. The idea is to let work
continue on several disks at the same time, and to be able to
distribute the database over several spindles for this purpose. In
fact, make sure that the disk controllers that you select can handle
simultaneous access to multiple drives at the same time as well. The
newer RAID implementations work with this approach very well, and may
even offer improved performance in addition to improved data security.

Understanding the Technology
----------------------------

A redundant array of independent disks (RAID) is any disk subsystem
architecture that combines two or more standard, physical disk drives
into a single logical drive in order to achieve fault-tolerant data
redundancy. Performance will often degrade in many configurations in
a high-transaction-rate environment. Because one of the real-world
reasons to use a RAID architecture is to improve performance, the
differences among the RAID architectures show up as various levels
of performance, some of which may improve both speed and
security.

Although you might be tempted simply to purchase the RAID with the
highest number (5), keep in mind that RAID implementations are
created by array vendors to fill existing market needs. There are no
official standards to establish RAID guidelines. Therefore, later
RAID specifications augment, but do not supersede, earlier RAID
levels. The key is finding the RAID level that satisfies your needs.
There are generally six (6) accepted levels of RAID. Each level is
described below:

RAID level 0
------------
This level uses disk striping, in which data is written across
multiple disks rather than on just one disk. For example, segment 1
is written to drive 1, segment 2 is written to drive 2, and so on.
When the system reaches the last drive, it starts over writing to the
first available segment on drive 1. Sometimes this is handled at the
hardware level and sometimes this is handled at the software level.
Generally speaking, if it is handled at the hardware or disk sub-system
level, it will offer better reliability and performance. However if it
is only available at the software level, it is generally still better
than not using it.
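
In other words, for an N-drive stripe set the placement is simple modular
arithmetic; a quick illustration (a 5-drive set assumed):

   echo 12 | awk '{ n=5; d = ($1 - 1) % n + 1; print "segment " $1 " -> drive " d }'
   # -> segment 12 -> drive 2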

RAID 0 has NO data redundancy: If one drive fails, the entire drive
subsystem goes down. It is ideal for high performance applications
such as MFG/PRO and PROGRESS that don't require data protection. In
fact, qad.inc has seen better performance on "striped" drives than on
a multi-extent database. The stripes should be as small as is
realistically possible. (One megabyte is a good target slice size!)

Because it has no fault tolerance (remember our above definition of a
RAID system), RAID 0 is technically not true RAID. Every time a
vendor claims it supports RAID 0, it is talking about disk striping
and not a true RAID technology. RAID 0 and RAID 1 can be combined to
achieve fault tolerance however.

RAID level 1
------------
Often called mirrored or shadowed disks, RAID 1 has a duplicate
backup disk for each active disk in the system. Data redundancy is
obvious; every byte is duplicated. Although simple and relatively
easy to implement, RAID 1 is twice as expensive as the single,
nonredundant drive approach. Read performance can be improved
substantially because both drives can read different pieces of data
at the same time. RAID 1 has slightly slower average access times
than does a single drive. It has better seek times when reading and
worse when writing. In an MFG/PRO application, it does offer the
best reliability, but at a performance reduction. The benefit comes
from being able to "break" the mirror. To perform the fastest possible
backups, the PROGRESS server is shut down, the "mirror" is broken or
unlinked, and then the PROGRESS server is re-started. The backup of
the "mirror image" of the database can now be done without impacting
the users' access to data. When the backup is complete, the
"mirror" is re-enabled, and the second copy of the database is
brought back into sync.
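
In outline the sequence is something like this (a sketch only; the mirror
split and rejoin steps are volume-manager specific, and the database name is
just a placeholder):

   proshut /db/mfgpro -by     # stop the broker cleanly
   # ... split/unlink the mirror with your volume manager's own command ...
   proserve /db/mfgpro        # restart the broker; users carry on
   # ... back up the idle mirror copy (OS backup, or probkup run against it) ...
   # ... rejoin the mirror and let it resynchronise ...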

This will not work on ALL versions of RAID 1, but would offer maximum
reliability with minimum support issues.

RAID level 2
------------
Using a bit-interleave process that spreads data across all of the
drives in the array, RAID 2 gets around the 100 percent disk overhead
of RAID 1. In this way, the first drive in the array contains the
first bit, the second drive contains the second bit, and so on.
Additional drives contain error correcting code (ECC) or parity
information. RAID 2 was designed for mini, mainframe, and
supercomputers rather than microcomputers, which track drive errors
through internal checksums on the disk and standard error flags
performed by the drive and controller. As a result, RAID 2 systems
are too robust for microcomputers, but ideal for minicomputers or
super servers.

RAID level 3
------------
Basically, RAID 3 is RAID 2 for microcomputers. RAID 3 has two or
more data drives and only one parity (ECC) drive. Again, data is
interleaved across all data drives. Data can be interleaved at bit
level, byte level (the most common), or any other logical size.
Because data is interleaved across all data drives, a single read
request is performed by multiple drives. Each drive reads a portion
of the data and all of the drives transfer their portions to the
controller in parallel. This yields high transfer rates for serial
I/O, making RAID 3 ideal for applications that need high I/O
bandwidth. But only ONE I/O transaction can be processed at a time
because every drive is involved in each read or write transaction.
RAID 3's parallel data transfers often work well for workstations
that require fast sequential access to single large files such as
image processing or CAD systems. It is generally not recommended for
an environment such as MFG/PRO, which is made up of lots of small
amounts of data accessed in a basically random pattern.

RAID level 4
------------
RAID 3's primary disadvantage is its inability to perform
simultaneous I/O transactions because large blocks of data are
interleaved across all drives. RAID 4, on the other hand, places the
entire first transfer block on the first data drive, the second
transfer block on the second drive, and so on. This process improves
disk performance by enabling multiple reads. However, there is still
only one parity drive present. The parity drive contains the parity
for all the data drives and is involved in every write transaction,
forcing them to be performed one at a time. This is RAID 4's primary
disadvantage.

A multitasking operating system such as UNIX can process independent
read transactions for each data drive in the array. In an array with
four data drives, for example, the array can perform four times as
many reads as a single drive can in the same period.


RAID level 5
------------
The use of dedicated parity drives in RAID levels 2 through 4 limits
each of these architectures to one write transaction at a time. RAID
5 eliminates the need for a dedicated parity drive. Each drive in a
RAID 5 array contains both data and parity blocks. As in RAID 4, an
entire transfer block is placed on a single drive and the parity for
that block is stored on a different drive. When a drive fails, its
data can be reconstructed from the remaining drives. Eliminating the
dedicated parity drive removes most of the single-write bottleneck
and lets RAID 5 perform multiple read and write transactions in
parallel.

Compared to a single drive, an array with four drives can perform
four times as many reads and two times as many writes (because each
write involves two drives) in a given interval. In a combined
read-write environment, the virtual transfer rates could be increased
by a factor of one half the number of drives in the array, compared
to a single drive. As the ratio of reads to writes increases, the
transfer rate increase factor approaches the number of drives
installed. And since MFG/PRO and PROGRESS have a four- or five-to-one
bias of reads to writes, this will show up dramatically over a single
drive.
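
One rough way to model it (ignoring parity read-modify-write overhead and any
caching): if a read costs one drive access and a write costs two, an N-drive
array speeds up a workload of r reads per w writes by about N*(r+w)/(r+2w):

   # 5 drives, 4 reads for every write
   echo "5 4 1" | awk '{ n=$1; r=$2; w=$3; printf "%.1fx a single drive\n", n*(r+w)/(r+2*w) }'
   # -> 4.2x a single drive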

Because UNIX, NetWare and VMS are multitasking operating systems,
RAID 5 makes good sense in these environments as long as you are not
going to do disk mirroring at all, or on those segments of the
filesystem that are not going to be mirrored. However, because these
systems are NOT optimized for sequential writes, the BI file and the AI
file should NOT be placed on RAID 5. Only do this if COST or the ability
to add enough controllers is the overriding concern. Even then, make
sure you have the option of changing to RAID 10 if required.

RAID level 6 and beyond
-----------------------
Although no final design has yet won de facto industry approval,
several vendors have proposed an implementation similar to RAID 5 but
with two sections of each disk set aside for parity. It is highly
redundant, and therefore more expensive, but offers higher fault
tolerance than any existing implementation except disk mirroring.

Others are using even more esoteric models in an attempt to minimize
the performance hit of RAID 5 on relational databases. Make sure
that you have the vendor's guarantee that if performance is not
adequate that you can revert to RAID 10 later. Also make sure that
any CACHE memory in the disk sub-system is of the WRITE-THROUGH
mode to eliminate any chance of loss of database integrity.

RAID level 10
-------------
The level known as RAID 10 is a combination of RAID 1 and RAID 0.
Several vendors offer RAID 10, but some call it RAID 1/0 or simply
RAID 0 with fault tolerance. In our benchmark tests, this mode often
offers the best combination of speed and reliability. However, even
here, make sure that the BI and AI files are placed on single mirrored
pairs rather than on a "striped" set.

Overall, this approach offers the best chance of approaching the
performance available on a non-redundant system, as the writes can be
delayed and the reads can be multi-threaded.

CONCLUSIONS
-----------
When looking to your hardware vendor to help implement MFG/PRO and
PROGRESS, remember to ask about the different versions of RAID that
may be supported, and make sure that you do NOT buy into the idea
that "ONE BIG DISK WILL BE MUCH CHEAPER!" And also make sure that
the hardware vendor understands how you will be using the system so
that they will suggest the best arrangement to deliver the best
reliability along with the best performance. Also, make sure to
review the possibility of putting the MFG/PRO production database on
a mirrored drive rather than using the PROGRESS roll-forward
facility. Both of these aim specifically at managing media failure,
and it is better handled at the controller level than at the
process level.

After long and painful experience, the optimum performance along with
reliability will be gained by putting the database segments on one or
more mirrored stripe sets (RAID 10), and putting the BI and AI segments on
separate mirrored pairs (RAID 1). If the disk subsystem has a good
cache memory system, the result will be even better. Do NOT place the
users' files (-T) on any of the database and/or BI/AI segments. If you
size your system allowing for filling only half of each physical drive,
then you will be much better off, and have built in the ability to
handle database dump/loads and/or re-indexes.

Remember to include plenty of disk controllers as part of the setup.
Many vendors will sell you a long chain of "striped" drives, but on
one controller channel. This totally mangles the ability of the
drive sub-system to perform at a reasonable I/O speed. Make sure that
you allow plenty of disk I/O channels, regardless of what the hardware
vendor says is possible. Remember, relational databases are almost
always I/O limited, and controllers have their limits as well.

Also remember to "mirror" the disk controllers wherever possible, so
that the system can remain on-line even if a controller is lost. Some
systems will support special dual or triple-ported disks that can be
attached to more than one CPU at the same time. These arrangements
add the possibility to offer even higher availability at an incremental
cost level. You may find yourself setting up multiple stripe-sets
instead of one large stripe set in order to use multiple controllers.

You may wish to pre-allocate multiple filesystems on each physical drive
as part of the RAID implementation. By allocating the "central" portion
of each drive (the middle tracks) to one overall filesystem, you can gain
additional performance by minimizing how far the disk heads must move
beyond an optimum zone of access.

Please do not disregard using multi-extent databases just because you
are using RAID. There are some inherent performance problems when any
filesystem file gets too big. We have found that even when using a
RAID subsystem on UNIX, a database extent should still be in the range
of 100 to 200 megabytes, depending on overall database size. Since
Progress can handle only a total of 100 database extents, including the
BI, AI and LG segments, targeting a total of about 90 extents is a good
idea.
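
A quick sanity check on the arithmetic (the sizes here are just an example):

   # extents needed for a 5 GB database at 150 MB per fixed extent
   echo "5120 150" | awk '{ n = int(($1 + $2 - 1) / $2); print n " extents" }'
   # -> 35 extents, comfortably inside the ~90-extent target above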

There can be other performance benefits from using multi-extent databases
on RAID systems as well. There will tend to be less internal fragmentation,
including in the BI and AI segments, allowing better overall performance.
Generally, it is best to only truncate the BI and AI segments when it is
absolutely required. There is considerable overhead involved in extending
and formatting these elements.

Three Common RAID Levels for MFG/PRO and PROGRESS
=================================================
Level 0
-------
Disk 1       Disk 2       Disk 3       Disk 4       Disk 5
==========   ==========   ==========   ==========   ==========
Segment 1    Segment 2    Segment 3    Segment 4    Segment 5
Segment 6    Segment 7    Segment 8    Segment 9    Segment 10
Segment 11   Segment 12   Segment 13   Segment 14   Segment 15
Segment 16   Segment 17   Segment 18   Segment 19   Segment 20
Segment 21   Segment 22   Segment 23   Segment 24   Segment 25

Data Segments are written to each drive simultaneously, but there's
no parity for backup.

Level 2
-------
Disk 1       Disk 2       Disk 3       Disk 4       Disk 5
==========   ==========   ==========   ==========   ==========
Segment 1    Segment 2    Segment 3    Segment 4
Segment 5    Segment 6    Segment 7    Segment 8    Parity for
Segment 9    Segment 10   Segment 11   Segment 12   Segments
Segment 13   Segment 14   Segment 15   Segment 16   1 - 20
Segment 17   Segment 18   Segment 19   Segment 20

Data Segments are spread over several drives. All parity is written
to one drive for backup.

Level 5
-------
Disk 1          Disk 2          Disk 3          Disk 4          Disk 5
==========      ==========      ==========      ==========      ==========
Segment 1       Segment 2       Segment 3       Segment 4       Parity for 1-4
Segment 5       Segment 6       Segment 7       Parity for 5-8  Segment 8
Segment 9       Segment 10      Parity 9-12     Segment 11      Segment 12
Segment 13      Parity 13-16    Segment 14      Segment 15      Segment 16
Parity 17-20    Segment 17      Segment 18      Segment 19      Segment 20

Each drive holds both data segments and parity segments, which offers the
greatest reliability for backup. However, this still requires two writes
for each persistent write, just as a mirrored pair does, but with less
ability to multi-stream reads.
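
The parity in each row is simply the XOR of that row's data segments, which is
why any single failed drive can be rebuilt from the survivors. A tiny
illustration with made-up byte values (bash arithmetic):

   d1=0x11; d2=0x22; d3=0x33; d4=0x44
   parity=$(( d1 ^ d2 ^ d3 ^ d4 ))        # what the parity segment stores
   rebuilt=$(( parity ^ d1 ^ d2 ^ d4 ))   # drive 3 lost: XOR the rest back
   printf 'parity=%#x rebuilt d3=%#x\n' "$parity" "$rebuilt"
   # -> parity=0x44 rebuilt d3=0x33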

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: tom@bass.lkw.com (Tom Bascom)
Subject: Re: Multi-volume databases
Date: Thu, 20 Apr 1995 12:57:45 -0400 (EDT)
To: progress-list@margaret.peg.happmis.com

> Hi folks,
>
> We are implementing a 5GB multi-volume database on our Pentium server
> runing Unixware. We are using RAID(5). I am curious to find out
> what size database extents should be defined. Is it better to define
> many smaller size extents as opposed to fewer bigger size extents?
> Progress documentation mentions that extents can be no greater than
> 500 MB and that there is a performance hit for extents larger than 100MB.
>
> All comments/suggestions are welcome. Thanks in advance for your help.

You'd be better off ditching the RAID 5 in favor of RAID 0 and RAID 1.

Failing that, keep the extents below the size where the OS goes to triple
indirect blocks or any other "knee" in file IO performance (quite a bit of
this depends on the OS and the FS installed). Traditionally, keeping file
sizes under 64 MB is a good idea. Keep them on disks of their own (you are
using lots of small disks and plenty of controllers instead of 1 or 2
really big disks, right?). Especially keep the .bi file ( -g ) on its own
spindle and try really hard to keep the OS somewhere other than where the
db lives. It's also nice to have the temp files (-T) on a spindle of
their own.

How many users? How much RAM? SMP? Client/server?

-----------------------------------------------------------------
| Tom Bascom You've just got to be persistent. |
| EDS |
| tom@lkw.com |
-----------------------------------------------------------------

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: McIntyre_Bruce/amer_tech@qad.com
Date: Thu, 20 Apr 95 16:25:16 -0400
Subject: Re: Multi-volume databases
To: progress-list@margaret.peg.happmis.com

From message by donleyj@elroy.mukilteo.wednet.edu:
>
> > You'd be better off ditching the RAID 5 in favor of RAID 0 and RAID 1.
>
> I've seen this type question and reply regarding RAID 5 a number of times on
> the List, and would like some more perspective. I understand the concept
> for RAID 5 and it's inherent weaknesses in terms of 'Writes'. Is this the
> primary problem with RAID 5 and Progress, or is there something more
> specific with Progress that compounds the problem?
>
> My reason for asking is that Digital has a controller in their StorageWorks
> product line that has hardware based RAID (0, 1, 0 + 1, 3, and 5) with
> battery-backed write-back cache. This write-back cache seems to negate the
> downside of RAID 5 while providing good performance at a much lesser cost
> than RAID 0+1 (N+1 vs. 2N). Granted, you don't get the advantage of
> mirroring, but I know we can't afford that luxury.
>
> Jeff Donley
> Mukilteo School District
> donleyj@mukilteo.wednet.edu

Do Not REPEAT DO NOT use a write-back cache. This will invalidate the PSC
support of the RDBMS, especially on the BI file. While this is a strong comment,
it should not be discarded lightly.

Can a well engineered RAID 5 for the database offer adequate performance?
Yes it can. But it will NEVER be as good as a combination of RAID 0 and 1,
with the same integrity. Progress, like most relational DBs, is heavily
read-oriented, but it is the integrity writes that create the performance
bottlenecks.

Progress has done an excellent job of writing a "fuzzy logic" cache buffer,
and to subvert this with external logic is not a good idea.

Bruce McIntyre, Industry Analyst for Technology, QAD Inc. (bam@qad.com)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Date: Fri, 21 Apr 1995 09:23:20 GMT
From: Peter Headland - Matrix Link <Peter_Headland@matrixlk.demon.co.uk>
Subject: Re: Multi-volume databases
To: progress-list@margaret.peg.happmis.com

Jeff Donley said:
> My reason for asking is that Digital has a controller in their StorageWorks
> product line that has hardware based RAID (0, 1, 0 + 1, 3, and 5) with
> battery-backed write-back cache. This write-back cache seems to negate the
> downside of RAID 5 while providing good performance at a much lesser cost
> than RAID 0+1 (N+1 vs. 2N). Granted, you don't get the advantage of
> mirroring, but I know we can't afford that luxury.

Bruce McIntyre replied:
> Do Not REPEAT DO NOT use a write-back cache. This will invalidate the PSC
> support of the RDBMS. Especially on the BI file. While this is strong comment,
> it should not be discarded lightly.

I disagree. If the cache is an integral part of the RAID box and is
battery backed, then it is every bit as good as the discs in that box
(very probably better!). If Digital warrant this set-up as equivalent
in security to naked disc drives, then PSC would have no business
refusing support (and I'm sure that they would not do so).

This prejudice against anything which isn't magnetic media is way out
of date! Would you counsel against using one of the "solid-state
discs" that are now on the market (though still extremely expensive)?
If the MTBF is similar to that for magnetic disc, what's the problem?

BM again:
> Can a well engineered RAID 5 for the database offer adequate performance?
> Yes it can. But it will NEVER be as good as a combination of RAID 0 and 1,
> with the same integrity.

But with an adequately-sized write cache and a real-world application
it is very unlikely there would be any detectable difference. And, I
would rather take RAID 5 with (secure) write cache than uncached RAID
0/1 because the cached solution will give better write performance.

BM:
> Progress has done an excellent job of writing a "fuzzy logic" cache buffer,
> and to subvert this with external logic is not a good idea.

"Subvert" implies some conflict between the layers; there is none.
It's worth remembering that any decent SCSI hard drive will have
around 512KB cache built-in anyway...

--
Peter Headland | Peter_Headland@matrixlk.demon.co.uk
Managing Director |
Matrix Link Limited | "If at first you do succeed --
Stoke-on-Trent UK | try to hide your astonishment."

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: McIntyre_Bruce/amer_tech@qad.com
Date: Fri, 21 Apr 95 22:19:09 -0400
Subject: [RE] Multi-volume databases
To: progress-list@margaret.peg.happmis.com

From message by Peter_Headland@matrixlk.demon.co.uk:
> Jeff Donley said:
> > My reason for asking is that Digital has a controller in their StorageWorks
> > product line that has hardware based RAID (0, 1, 0 + 1, 3, and 5) with
> > battery-backed write-back cache. This write-back cache seems to negate the
> > downside of RAID 5 while providing good performance at a much lesser cost
> > than RAID 0+1 (N+1 vs. 2N). Granted, you don't get the advantage of
> > mirroring, but I know we can't afford that luxury.
>
> Bruce McIntyre replied:
> > Do Not REPEAT DO NOT use a write-back cache. This will invalidate the PSC
> > support of the RDBMS. Especially on the BI file. While this is strong
> > comment, it should not be discarded lightly.
>
> I disagree. If the cache is an integral part of the RAID box and is
> battery backed, then it is every bit as good as the discs in that box
> (very probably better!). If Digital warrant this set-up as equivalent
> in security to naked disc drives, then PSC would have no business
> refusing support (and I'm sure that they would not do so).
>
> This prejudice against anything which isn't magnetic media is way out
> of date! Would you counsel against using one of the "solid-state
> discs" that are now on the market (though still extremely expensive)?
> If the MTBF is similar to that for magnetic disc, what's the problem?

First of all, the comparison was to be between RAID 5 and RAID 0-1, and if
you include the same cache memory with RAID 0-1, it will STILL be faster than
RAID 5.

Second, this is not prejudice. If your CACHE were WRITE-THROUGH cache, then
there would not be an issue. The problem comes when the write-back cache
does some fancy elevator seek logic to change the timing of what is written
back to the database and when. This DOES subvert the logic in the -B cache,
and can result in runtime errors, even though the hardware vendor says that
it can't happen. I have SEEN it happen. Turning off CACHE fixed the problem.
While this wasn't DEC's version, until I have a few production sites running
this way on a particular hardware mix, I would be very reluctant to say that
it "works always".

Another good example of this is the SUN PrestoServe, which is NVRAM on the
motherboard (battery backed), and does a GREAT job of improving performance
on Solaris. I DO recommend this to our SUN accounts. But at the same time
PSC does not guarantee that there will be no integrity issues, as they have
not done a certification test on it.

Again, I do recommend solid-state disks for things like BI and TMP files.
These do not impact on the Progress "fuzzy-write" logic.

> BM again:
> > Can a well engineered RAID 5 for the database offer adequate performance?
> > Yes it can. But it will NEVER be as good as a combination of RAID 0 and 1,
> > with the same integrity.
>
> But with an adequately-sized write cache and a real-world application
> it is very unlikely there would be any detectable difference. And, I
> would rather take RAID 5 with (secure) write cache than uncached RAID
> 0/1 because the cached solution will give better write performance.

Again, let's compare apples to apples. I would rather have that cache on
a RAID 0-1 than on a RAID 5 implementation. Disk drives are CHEAP in
comparison to the data on them. And RAID 5 just can't deliver the backup
options that RAID 0-1 can. And RAID 5 cache can be blown when you have users
doing both write and read transactions at a high rate. Meanwhile, the dual
heads of a mirrored pair offer better total throughput regardless of the
speed of the controllers, as long as the bus can handle the load.

I say again: in a heavy transaction environment, comparably configured,
RAID 0-1 will be a better solution (at somewhat more money) than RAID 5.

> BM:
> > Progress has done an excellent job of writing a "fuzzy logic" cache buffer,
> > and to subvert this with external logic is not a good idea.
>
> "Subvert" implies some conflict between the layers; there is none.
> It's worth remembereing that any decent SCSI hard drive will have
> around 512KB cache built-in anyway...

I'll stand by my statement. Combining the two logic layers must create
some level of incoherency as far as the database persistence is concerned.
If the external logic were a write-through cache, then this would NOT be
an issue.

> --
> Peter Headland | Peter_Headland@matrixlk.demon.co.uk
>

Bruce McIntyre, Industry Analyst for Technology, QAD Inc., (bam@qad.com)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Date: Sat, 22 Apr 1995 10:36:21 GMT
From: Peter_Headland@matrixlk.demon.co.uk (Peter Headland - Matrix Link)
Subject: Re: [RE] Multi-volume databases
To: progress-list@margaret.peg.happmis.com

> Second, this is not prejudice. If your CACHE were WRITE-THROUGH cache, then
> there would not be an issue. The problem comes when the write-back cache
> does some fancy elevator seek logic to change the timing of what is written
> back to the database and when. This DOES subvert the logic in the -B cache,
> and can result in runtime errors, even though the hardware vendor says that
> it can't happen. I have SEEN it happen. Turning off CACHE fixed the problem.

The interesting thing is that it seems like this class of problem may
also occur on "unadorned" disc drives (which do, in fact, have built-in
cache). Think back to recent PEG correspondence on the subject of
Conner disc drives... Of course, write-through cache doesn't solve the
RAID-5 performance problem at all.

I think I should have made it clear that I would only use write-cached
RAID which was certified by a major vendor (HP/Digital/DG/SUN/etc.) and
which had been shipping for a while. I would also get the RAID vendor
to provide a written warranty with no limit on consequential damages!

I should also make it clear that I was only talking about RAID
subsystems which have a single connection to the cpu (and thus appear
to be nothing more than a large, fast disc drive) and battery backup
with a life of at least a week or a *built-in* dual-channel UPS which
gives enough autonomy to flush RAM to disc. Any solution which uses
the cpu to do part of the RAID and caching work (typically by using one
or more multi-channel SCSI cards and special device driver software),
is NOT SAFE to use with write-cache.

> While this wasn't DEC's version, until I have a few production sites running
> this way on a particular hardware mix, I would be very reluctant to say that
> it "works always".

Fair comment. I wouldn't go anywhere near an offering from one of the
small companies which have sprung up in this market recently - I'd
almost *expect* those to be untrustworthy as small firms just don't
have the development budget to do that kind of work.

> on Solaris. I DO recommend this to our SUN accounts. But at the same time
> PSC does not guarantee that there will be no integrity issues, as they have
> not done a certification test on it.

I think it's about time PSC worked with some of the big RAID vendors to
certify these platforms.

> Again, I do recommend solid-state disks for things like BI and TMP files.
> These do not impact on the Progress "fuzzy-write" logic.

But why not the db? If a solid-state disc is exactly equivalent to a
magnetic disc (talking about unplugging one SCSI drive and plugging
in another - forget RAID/caching/etc.), then what's the problem?

> I say again. In a heavy transaction environment, comparabley configured,
> RAID 0-1 will be a better solution (at somewhat more money) than RAID 5.

Of course. But that "somewhat more money" seems to be an obstacle to
some folks... Hey, Bruce, I'm on your side! I'm constantly at war
with my customers over the fact they won't buy as many spindles as they
should. I lose lots of those battles. Slick RAID salesmen have a lot
to answer for...

> I'll stand by my statement. Combinining the two logic layers must create
> some level of incoherency as far as the database persistence is concerned.
> If the external logic were a write-through cache, then this would NOT be
> an issue.

We'll just have to disagree. If the RAID box looks like a single fast
disc drive to the CPU box (and assuming it has equivalent reliability),
then there cannot be any such issue. From outside the box it is
irrelevant how the data are stored. If there is a problem it is
because there is a severe bug in the RAID vendor's software, not
because of any inherent feature of the architecture.

--
Peter Headland | Peter_Headland@matrixlk.demon.co.uk
Managing Director |
Matrix Link Limited | "If at first you do succeed --
Stoke-on-Trent UK | try to hide your astonishment."

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: NMCRSHQ@delphi.com
Date: Mon, 24 Apr 1995 15:57:20 -0400 (EDT)
Subject: Re: Multi Volume Databases
To: progress-list@margaret.peg.happmis.com

> You'd be better off ditching the RAID 5 in favor of RAID 0 and RAID 1.

> Failing that keep the extents below the size where the OS goes to triple
> indirect blocks or any other "knee" in file IO performance (quite a bit of
> this depends on the OS and the FS installed). Traditionally keeping file
> sizes under 64mb is a good idea. Keep them on disks of their own (you are
> using lot's of small disks and plenty of controllers instead of 1 or 2
> really big disks right?). Especially keep the .bi file ( -g ) on it's own
> spindle and try really hard to keep the OS somewhere other than where the
> db lives. It's also nice to have the temp files (-T) on a spindle of
> their own.

> How many users? How much RAM? SMP? Client/server?

> -----------------------------------------------------------------
> | Tom Bascom You've just got to be persistent. |
> | EDS |
> | tom@lkw.com |
> -----------------------------------------------------------------


Tom, please excuse my ignorance, but could you explain the term
"triple indirect blocks"? Also, the recommendation is to keep
the extent sizes under 64 MB. How do we determine this number?

Our database server is a Compaq Proliant 4000 currently hosting
a single processor. It has 64 MB of RAM and is running Unixware.
The RAID 5 implementation consists of five 2.1 GB SCSI-2 hard drives
and one SMART SCSI array controller. There will be about 30 users
accessing the system at any given time. The application will be host based.

The past few days' discussion on RAID 5 vs. RAID 0 and RAID 1 has been
very informative. PSC's response to RAID 5 is that Progress does not care
as long as RAID 5 is implemented properly at a hardware level. RAID 0
and RAID 1 do offer the best performance and protection but are also
the most expensive to implement. RAID 5 provides a less expensive
alternative while maintaining fault tolerance. Wouldn't it be equally
effective to stripe the bi file (under RAID 5) as having the bi file on a
separate disk and mirroring it?


Thanks for the information.


Rimmy Kaur
______________________________________________________________________

Navy Marine Corps Relief Society
Arlington, VA
nmcrshq@delphi.com
(703) 696-4925
_______________________________________________________________________

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: tom@bass.lkw.com (Tom Bascom)
Subject: Re: Multi Volume Databases
Date: Mon, 24 Apr 1995 21:45:46 -0400 (EDT)
To: progress-list@margaret.peg.happmis.com

> Tom, please excuse my ignorance but could you explain the term -
> "triple indirect blocks". Also, the recomendation is to keep
> the extent sizes under 64 MB. How do we determine this number?

Some (mostly older) UNIX file systems store pointers to data blocks in a
way that involves levels of indirection that depend on the file size. Each
level of indirection is a potential disk access. Needing too many levels of
indirection (pointers to pointers to pointers...) to find a block
of data degrades performance. As a rule of thumb, files (or database
extents) bigger than 64 MB are suspiciously large.
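
For a classic System V style filesystem with 1 KB blocks, 10 direct pointers
in the inode and 4-byte pointers in the indirect blocks, the arithmetic behind
that rule of thumb looks roughly like this (a sketch; real filesystems vary):

   awk 'BEGIN { bs=1024; p=bs/4; direct=10*bs; single=direct+p*bs; double=single+p*p*bs;
     printf "direct: %dKB  single: %dKB  double: ~%dMB\n", direct/1024, single/1024, double/(1024*1024) }'
   # -> direct: 10KB  single: 266KB  double: ~64MB
   # beyond roughly 64MB the triple indirect level kicks in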

> Our database server is a Compaq Proliant 4000 currently hosting
> a single processor. It has 64 MB of RAM and is running Unixware.
> The RAID 5 implementation consists of five 2.1 GB SCSI-2 hard drives
> and one SMART SCSI array controller. There will be about 30 users
> accessing the system at any given time. The application will be host based.

-B 60000 sounds like a good start...

> The past few days discussion on RAID 5 vs. RAID 0 and RAID 1 has been
> very informative. PSC response to RAID 5 is that Progress does not care
> as long as RAID 5 is implemented properly at an hardware level. RAID 0
> and RAID 1 does offer the best performance and protection but is also
> the most expensive to implement. RAID 5 provides a less expensive
> alternative while maintaing fault tolerance. Wouldn't it be equally
> effective to stripe the bi file (under RAID 5) as having the bi file on a
> seperate disk and mirroring it?

No. The bi file is especially sensitive because it's basically a lot of
sequential writes. Head movement or contention in such a scenario is bad.
OTOH you shouldn't need a 2gb disk for the .bi file. Go out and buy a
(relatively) small and very fast drive and a controller just for that
drive for the best performance. Then again with 30 self service clients
you probably don't need to get too worked up about most of this.

I hope this helps :)

-----------------------------------------------------------------
| Tom Bascom You've just got to be persistent. |
| EDS |
| tom@lkw.com |
-----------------------------------------------------------------

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: McIntyre_Bruce/amer_tech@qad.com
Date: Tue, 25 Apr 95 16:40:28 -0400
Subject: [RE] Multi-volume databases. Slight digression to include AI
To: progress-list@margaret.peg.happmis.com

From message by andrew.scott@dundee.attgis.com:
>
> Obviously buffered AI writes are better performance wise but riskier due
> to the disconnect between the logical write and the physical write to the
> disk. Is this an unreasonable fear or is unbuffered writing more secure?
>
> I guess it is ultimately down to a personal choice if nothing else, but
> what are the percentages involved?
>
> Andrew Scott
> andrew.scott@dundee.attgis.com
> ----------
>

I would not be too concerned over using buffered AI as long as I was not
trying to do transactions over multiple databases. The only issue would
be that when you used the AI file to "restore" the database, you could
only restore to the point in time that data from the filesystem buffers
had been flushed out to disk.

Since in most versions of UNIX the default time for this is about one minute,
the worst case is probably that you could lose up to two minutes' worth
of transactions.

On the other hand, if you were doing multi-db transactions, you MUST use
RAW AI in order to guarantee data concurrency. Otherwise, you might as
well not bother using it at all.

Bruce McIntyre (bam@qad.com)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: tom@bass.lkw.com (Tom Bascom)
Subject: Re: Multi Volume Databases
Date: Tue, 25 Apr 1995 20:48:10 -0400 (EDT)
To: progress-list@margaret.peg.happmis.com

> On Mon, 24 Apr 1995, Tom Bascom wrote:
>
> > Some (mostly older) UNIX file systems store pointers to data blocks in a
> > way that involves levels of indirection that depend on the file size. Each
> > level of indirection is a potential disk access. Too many levels of
>
> I thought ALL unix systems stored files this way. (I.E. Inode points to
> first 10 blocks of file, larger files require inode pointing to an
> indirect block {single-indirect} and so on.)
>
> What mechanisms are used now and when did this change?

I was hoping someone smarter than me would bail me out of this one... oh
well, here goes... svr4 implemented something called vnodes, which SUN had
apparently been using for a while. While similar in name to an inode,
that's about where it ends. Vnodes are used to support virtual file
systems, which could be the traditional s5 or ufs file systems or
something really strange like a dos or a cdrom fs. "The Magic Garden
Explained", which reveals the internals of svr4, goes into all of this in
great detail; there's a much better explanation (via a diagram) than the
one I gave of triple indirect blocks on pg. 391. Anyone who's interested
really ought to get this book; it's a lot of fun to read.

-----------------------------------------------------------------
| Tom Bascom You've just got to be persistent. |
| EDS |
| tom@lkw.com |
-----------------------------------------------------------------

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From: McIntyre_Bruce/amer_tech@qad.com
Date: Wed, 26 Apr 95 07:35:43 -0400
Subject: [RE:AGAIN] Multi Volume Databases. bi file on RAID
To: progress-list@margaret.peg.happmis.com

From message by andrew.scott@dundee.attgis.com:
> >
> >If you put the BI file on a RAID set, or anywhere it must contend for disk
> >access time, you will directly and immediately impact the ability of the
> >entire database to process transactions.
>
> Does this hold even for RAID 0 or 1?

RAID 0 YES, RAID 1 NO (after all, RAID 1 is mirroring, not really RAID).

> What would you suggest for a system which uses multiple databases? Where
> would the bi files be best located?
> I suppose having each on its own disk with its own controller is the ideal
> situation. We have ALL (6) bi files on a RAID 0 layout. In addition most of
> the bi files have one fixed and one variable extent.
>
You are correct that the best approach is each on a separate disk. However,
unless these are all similarly intense production BI files, that is very
expensive. An alternate view is to combine a couple of the less intensive
BI files on one disk, understanding that this is sub-optimal but probably
OK. Yes, with this approach they MUST be multi-extent. You might get this
arrangement down to two or three disk drives with acceptable results.

Bruce
--
===========================================================================
Bruce A. McIntyre, Industry Analyst for Technology, QAD Inc. (bam@qad.com)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: McIntyre_Bruce/amer_tech@qad.com
Date: Sun, 30 Apr 95 19:39:52 -0400
Subject: [RE-RE:] Multi-volume databases
To: progress-list@margaret.peg.happmis.com

From message by wizard@mail.msen.com:
>
> I use a VERIT[A]S based volume manager, it requires a bit more setup then just
> plugging in a hardware device and *off to the races*,... But *any* good
> diskfarm _REQUIRES_ forethought and planning!
>
> Does RAID allow one to select what slices of each disk belong to this or
> that filesystem, I hope it provides striped access vs: concatinated access,
> and at least RAID-1 will mirror. I can mirror anything _at_will_ [provided
> the disk space], and I stripe my filesystems with a single cylinder width.
>
> I build the file system accross up to 8 spindels [currently 4] and create a
> single PROGRESS DB extent file that consumes the *whole* free space. My OS
> & VMsoftware _love_ me!
>
> Name: Dex T Peterson
> E-mail: wizard@mail.msen.com (Electro-Wire Products)

Actually, Veritas is both a good and a poor answer.

First of all, since it is a CPU Device Driver, it does use CPU time to do the
mirroring and RAID calculations. The best RAID subsystems have a separate
CPU in the drive system for this purpose. Second, I have had sites running
Veritas (especially on ROOT filesystems) that have had severe recovery
problems under certain conditions.

Third, this product is a solid solution, and several hardware platforms have
picked up some or all of the VERITAS functionality for inclusion. However,
a poor layout will still be a poor layout, whether it is a HARDWARE RAID,
or a SOFTWARE RAID. I would also suggest that a single Progress DB extent
is probably not the optimum, based on past testing and especially on HP,
SCO and a few other platforms.

Finally, it really doesn't matter what approach you use; the key is to do it
carefully, with a full understanding of the good and bad of each solution,
and then to document it carefully for later rebuilding by someone else. And
it is never wrong to add more disk drives than you ever think you will need,
and to use them effectively.

Bruce McIntyre (bam@qad.com)

rehandba

New Member
help required regarding raw partition

Hello Progress talkers,

We are trying to implement raw devices for storing read-only data (in the GB range). We performed some tests on a test server, which failed.

Our environment is:
Progress Database/AppServer: version 9.1B patch 13
AIX: version 5L (5.0 or 5.2)

Work done [for testing the scenario on test servers]:
- create the raw partition, i.e. abc
(we find two files [character and block device files] named abc and rabc in the /dev directory)
- give permissions on the 2 files (i.e. chmod 777 /dev/*abc)
- create a link (e.g. /test> ln -s /dev/rabc testdb.d1)
- create a .st file for a multi-volume database having 2 data extents
[ d "test" 7,32 testdb.d1 r 5120, d "test" 7,32 testdb.d2 ]
- create the db using the above .st file, which ends with 2 system-level errors (i.e. trying to exceed the 2 GB limit; testdb.d1 size is too small)
- after db creation, any Progress command ends up generating core and protrace files


Can anyone tell us what is wrong? Also note that we performed the same operations for both types of files [character and block device files].

Any suggestions/comments/guidance are appreciated. Kindly guide us as to how to create raw partitions.


Thanks
Rehan Ahmed Khan
Reply to me at: rehan_ahmed@cdcpak.com
 