Question About RAID vs. VM Configurations

Hello All,

I went through some info on the basics in this area:
Loads of Progress info and documentation links...
My question is about this remark:
Note your RAID level(s), if any. RAID 1, good. RAID 10, great. RAID 5 (and other parity-based variants): eeeevil. Avoid it like the plague.
How does a VM host with RAID 5 (pretty much the default setting for all our VM hosts) fit into this sentence? I have a Linux box on a VM with only one huge filesystem where everything is located.

Code:
Filesystem   1K-blocks      Used  Available Use% Mounted on
/dev/sda1     24797380   4811720   18705684  21% /
/dev/sda3    376931920  34419928  323056076  10% /XXX

/XXX is where the whole application and the database are located - in my opinion a rather unprofessional approach. Does anyone have experience with VM configurations and the performance considerations involved?

As a DBA I always like to make a clear separation between the application, flat files and databases using dedicated file systems. Since this is a third-party configuration I will probably have little say in it, but I am curious what people think.

Cheers,
Ryszard
 

RealHeavyDude

Well-Known Member
RAID 5 in combination with virtualization is the worst-case scenario when it comes to performance for any RDBMS. Don't get me wrong: it might be great from an administrative point of view and also cost-wise. But you just have to live with the "kinda" performance you might get out of this combination.

Heavy Regards, RealHeavyDude.
 

TheMadDBA

Active Member
Tom will be in shortly to yell at me about this :D but... Your real world experience with RAID 5 will vary based on a number of factors.

While it is very true that writing to a parity-based RAID is slower than to a comparable RAID 1/10 array... sometimes that doesn't matter. Or doesn't matter enough. Some databases are tiny relative to the hardware they run on, and most are heavily weighted towards reads instead of writes.

The easiest tests for RAID 5 are the edge cases (restores, adding extents, index rebuilds, etc.). Those are more likely to flood the cache and show noticeable performance differences for most databases/apps. They are also more likely to raise eyebrows on the management side when you tell them how long a restore will take. Or to surprise you with how little it matters for your configuration.
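If you want a rough feel for how much the parity penalty will hurt before one of those edge cases bites you, a crude direct sequential-write comparison on each array is usually enough. This is only a sketch - the mount points and sizes below are placeholders, not anything from a real setup:

Code:
# Rough sequential-write comparison, bypassing the OS page cache (oflag=direct).
# /mnt/raid5_scratch and /mnt/raid10_scratch are hypothetical mount points on the arrays being compared.
dd if=/dev/zero of=/mnt/raid5_scratch/ddtest.dat  bs=1M count=4096 oflag=direct
dd if=/dev/zero of=/mnt/raid10_scratch/ddtest.dat bs=1M count=4096 oflag=direct

# Remove the scratch files when done.
rm /mnt/raid5_scratch/ddtest.dat /mnt/raid10_scratch/ddtest.dat

dd reports a throughput figure at the end of each run; the gap between the two numbers gives a first idea of the write penalty you will see during restores and index rebuilds.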

As far as splitting file systems out by purpose... not a huge fan for most applications. Especially when the underlying storage presented to the host is going to be the same disks anyways. Logical separation is usually enough unless the non database usage approaches some meaningful percentage of the IO.

Obviously if you have a very large, very active, high-user-count install, all of this matters much, much more. But those are fairly rare in the OE world.
 

TomBascom

Curmudgeon
Actually I agree.

For many people it does not matter. Quite a few databases are small enough or have little enough activity that the RAID level and VM configuration aren't terribly important. Although "cookie cutter" VM implementations continue to be a big problem.

But when it does matter it matters a lot. People usually call me in when it matters. Often long after they should have. Most of them would have been well served to call me in (or read my posts or pay attention to my PUG lectures) when they were *planning* their systems and *before* they spent a whole lot of money on the wrong configuration. If you have a mission critical system that needs to consistently perform at a very high level you should be willing to put some thought into its configuration. A few dollars on independent upfront planning will save you a whole lot in the long run.

BTW -- you can do worse than RAID 5. RAID 6 (also known as RAID-DP) is much worse than RAID 5.
 

TheMadDBA

Active Member
Only the following RAID levels should exist (IMO): 10, 1, 50 and 5. Anything else is just asking for tragedy.

I am still trying to encourage a certain former employer to embrace the joys of internal SSD for their databases. Not much luck so far because managers just love SAN storage.

Not that SAN is 100% evil if configured properly... the problem is that most of the time you have to become a SAN expert yourself, because the average SAN admin is just clicking buttons with very little thought about performance.
 

RealHeavyDude

Well-Known Member
Totally agree with everything said. To me it matters a lot, and this Solaris zones and SAN setup (EMC, as far as I know, with some RAID 5 under the covers) just drives me nuts performance-wise. The bigger production system needs to handle some 35 to 40 GB of transaction throughput (that is the cumulative size of the AI extents that fill up) during the night. On a decent system with local spinning rust this was no problem at all.
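(For what it's worth, that nightly figure is nothing more than the summed size of the AI extents that filled up; something like the following, with the directory being only an example, gives the cumulative size:)

Code:
# Sum the sizes of the after-image extents that filled overnight.
# /db/prod/ai is only an example path; Progress AI extents are typically named dbname.a1, dbname.a2, ...
du -ch /db/prod/ai/*.a* | tail -1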

Now that we have been migrated to Solaris zones and SAN file systems we are hardly able to meet our SLA. Although I have control over neither the Solaris zone nor the SAN configuration - it is a standard configuration with ZFS file systems (which can be another PITA) - you can guess who gets the blame.

I can fully understand that managers and system administrators love this stuff.

RealHeavyDude.
 
As for this remark:
As far as splitting file systems out by purpose... not a huge fan for most applications. Especially when the underlying storage presented to the host is going to be the same disks anyways. Logical separation is usually enough unless the non database usage approaches some meaningful percentage of the IO.
One of the primary reasons I plan my installations with the various parts separated is that the layout stays very clear; for example, all my databases are on file systems starting with /dXX. Just a day ago I wanted to find a script, and running a simple "find / -name XXX" on a huge file system with databases on it can become a real problem.
Also, I recently heard about a case at a company that uses IBM to support its systems, where databases were lost because someone deleted the database files. If the database files sit in a dedicated folder and are protected by different permission rules, this is less likely to happen. Somebody who has no clue what the system is made up of would have to think twice before doing anything in a folder like /dxx or /database/XX. In my case the database files are mixed in with .csv files and log files. I am not surprised that someone with less experience, and without due consideration, deleted files with similar extensions when text .log files are mixed in among database .log files. It just shows how good habits tend to be ignored.
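A minimal sketch of what I mean by a dedicated location with its own permission rules - the mount point, user and group below are only examples, not our actual setup:

Code:
# Keep the database extents under a dedicated mount point, owned by the
# database service account and closed to everyone else.
mkdir -p /d01/db/prod
chown -R progress:dba /d01/db/prod
chmod -R 750 /d01/db/prod

# Searching for a stray script then no longer has to crawl the database file systems:
find / -path /d01 -prune -o -name "XXX" -print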

As for the performance aspect, I tend to look at the parameters related to disk latency on VMs and take regular snapshots for future comparison. I am not highly experienced, so I don't know if this is the correct approach. As long as a system is part of a VM disk structure there is very little one can do: for any performance issue you have to look at the VM level, not the server level. The same goes for any SAN configuration. Any rule that says you need to split your disk areas onto different spindles - a rule that applies to Progress, Oracle, SQL Server, etc. - becomes largely moot. That poses a challenge sometimes, because where on a dedicated server you have some control over where your data, indexes, temp and BI files are, in a VM situation that is left to some randomness, which doesn't help in locating any bottlenecks you encounter.
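For the record, those "snapshots" are nothing fancy - just extended iostat output saved under a dated name so I can compare the latency figures later. The output path is only an example:

Code:
# Capture extended per-device statistics (await, %util) with timestamps:
# 5 samples at 60-second intervals, written to a dated file for later comparison.
iostat -dxt 60 5 > /var/tmp/iostat_$(hostname)_$(date +%F).log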

Thanks everyone for the interesting thoughts.
Regards,
Ryszard
 