Question: Disk Block Size

BigSlick

Member
Hi all,

Our supplier is upgrading our storage to VMAX LUNs and has asked if we would consider formatting the disks to 64k.

We currently have 10.2B SP07 running a 4k database block size on 4k disks on Windows Server 2008.

I've just read the below:

Optimal NTFS block size for Progress 10.1C database?

where RHD says:
AFAIK you can set the blocksize when you format a drive with NTFS.

The best: Database blocksize = Filesystem blocksize.

Never: Database blocksize < Filesystem blocksize !!!! (performance hit)

Might be dangerous: Database blocksize = n * Filesystem blocksize (Gus - wizard of wizards - once said that this might cause silent database corruption. Progress gives one block to write to the filesystem but the filesystem has to write more blocks - when something happens in between writing the blocks Progress might not be aware ...

AFAIK for Windoze and Linux best database blocksize is 4K, whereas 8K for Unix.

Therefore 4K is the winner on Windoze.

HTH, RealHeavyDude.

I was wondering if this was still the case? I will argue to up the database block size to 8k. (we are going to 11.6.2 also! :) )

TIA
BS
 

TomBascom

Curmudgeon
The block size on the VMAX LUN is not the same entity as the filesystem block size.

FWIW, if performance is important, you would be much better off using the VMAX (or any SAN) for a boat anchor and putting your db on internal SSD.
 

BigSlick

Member
Thanks Tom.

Could you elaborate on this, please? "The block size on the VMAX LUN is not the same entity as the filesystem block size." It's something I'm not familiar with.

Maybe I'll not quote that response and go for a slightly gentler approach with our supplier. :)
 

TomBascom

Curmudgeon
There are many layers: DB, filesystem, SAN, actual disks. The LUN block size sits underneath the filesystem. Beneath that are the disks themselves (which are probably 512 byte blocks). Depending on the specific infrastructure there may, in fact, be several more layers that I have omitted. Each layer has its own characteristics. Larger LUN blocks (or "stripe sizes") are usually OK.
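
If you want to see two of those layers from the Windows side, a quick check looks something like this (just a sketch: the drive letter is an example and fsutil needs an elevated prompt). The LUN block / stripe size is configured on the array itself and will not show up in either output.
Code:
rem NTFS layer: "Bytes Per Cluster" is the filesystem allocation unit
fsutil fsinfo ntfsinfo D:

rem Disk layer: logical sector size as reported by the disk driver
wmic diskdrive get Model,BytesPerSector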

A SAN is *external* and on the wrong end of a cable. That means it will *always* be slower than internal disks of similar (or better) quality. You cannot fix the latency, it is built in. Once upon a time external disks provided some advantages for large databases because of limits on internal capacity. Those days ended about 10 years ago. The performance problems are also not just because of the cable -- when a SAN is *shared*, a whole new crop of performance problems is introduced.

If you want it to be fast use internal SSD.

If performance does not matter at all feel free to implement on a SAN.

(Internal SSD is also usually quite a lot less expensive than a SAN solution...)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I have a client who spent hundreds of thousands of dollars on a SAN. It provides storage for our application solution and many other LOB applications. It's big, it's expensive, it does site-to-site SAN replication; spinning disks in RAID 5; the storage guys love it. It's also abysmally slow for database storage.

<rant>I call this approach "penny-wise and pound-foolish". People still use RAID 5 because it provides more capacity per dollar than RAID 1 or RAID 10. It doesn't provide more fault-tolerance. Four disks of capacity X in a RAID 5 array provide 3X capacity, whereas the same four disks in a RAID 10 array provide 2X capacity. But given the small size of my DBs (few dozen to few hundred GB) compared to the huge size of disks these days, who cares about the cost of one disk? Or a few disks? For your mission-critical application? You can find that money in the coffee donation jar in the kitchen. For a fraction of the cost of that SAN I can buy storage that is reliable and far out-performs the SAN. I can also buy the Linux server to run it and the backup appliance too.</rant>

I have a little I/O test script I run at every client site (tip of the hat to Mike Furgal at Progress for the idea); here's the Windows version:
Code:
@echo off
rem simple BI grow test
echo.
echo %time%: Creating a sports database...
prodb sports sports
echo.
echo %time%: Setting the BI cluster size to 32 MB...
call proutil sports -C truncate bi -bi 32768
echo.
echo %time%: Growing the BI by 4 additional clusters (256 MB total)...
call proutil sports -C bigrow 4
echo.
echo %time%: Removing the database...
echo y | prodel sports
del sports.st
echo.
echo %time%: Done.

It tests sequentially writing 256 MB of BI data (which is unbuffered). Granted, it is synthetic and tests only one metric of I/O performance. It doesn't behave like a database. But it tends to tell you the general performance ballpark you're working in. In my experience, a good time is 6 seconds or less. I can live with 10 seconds or less. On the expensive SAN (also using OS virtualization, for added perf-tuning fun), the best run time we got was 150 seconds. The worst was 15 minutes. For perspective, the old *PC* on which I'm writing this (with a slow spinning consumer-grade disk) runs it in 55 seconds. My 6-year-old RHEL 6.3 dev box with internal 10K SAS disks in RAID 1 (ext4) runs it in 6.5 seconds. Several of my clients run it in 5 seconds or less. 150 seconds is beyond ridiculous.
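
If you want the elapsed time of just the bigrow step without doing the %time% arithmetic by hand, something like the following works (assuming proutil is on the PATH, e.g. in a proenv session); it's only a convenience sketch, not part of the test itself.
Code:
rem Time only the 256 MB BI write and print the elapsed seconds
powershell -Command "Measure-Command { proutil sports -C bigrow 4 } | Select-Object -ExpandProperty TotalSeconds"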

Ultimately, the real test is your application workload (which I'd guess is probably mostly read-heavy): Monday-to-Friday work, batch reports, month-end and year-end reports, database backups, database restores, dump and load; whatever you're going to ask the system to do. If the powers that be are going to tell you to accept this storage solution then you need to test it however you can before it goes live. That will give you the best-case scenario: when the SAN isn't rebuilding an array because a disk died, when it isn't replicating to another SAN, when someone else's 500 GB application database isn't being backed up during your batch run and saturating the SAN's multi-GB cache that "totally makes up for the underlying storage being RAID 5", when the intervening network device or interface isn't misbehaving or misconfigured, etc.

I will argue to up the database block size to 8k. (we are going to 11.6.2 also! :) )
FWIW I use 8K database block size across all file systems. In theory, it's a little more efficient for DB I/O and storage than 4K. If you're going to 11.6, which is a good target, you might as well start with the latest (SP03).
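
One caveat if you do go to 8K: the block size can't be changed in place, so the usual route is to create a new 8K database and dump & load into it. Roughly, the pieces look like this from a proenv prompt (database names and the .st file are placeholders; check the exact syntax against the docs for your OpenEdge release):
Code:
rem What block size is the current database using?
proutil mydb -C describe

rem Create a void 8K structure, then copy in the 8K empty database shipped in %DLC%
prostrct create newdb newdb.st -blocksize 8192
procopy %DLC%\empty8 newdb

rem ...then dump from mydb and load into newdb (binary D&L, Data Dictionary, etc.)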
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I should also say that, in spite of the rant above, I'm not opposed to SAN storage for databases, in principle. Databases, OpenEdge or otherwise, can run very well on SAN storage.

The point is that having SAN storage is no guarantee of great or even passable I/O performance, particularly if you follow many worst practices for storage.
 

RealHeavyDude

Well-Known Member
I can only second what Rob says regarding SANs. In my experience you need deep expert knowledge of file systems and the underlying layers in order to achieve decent write performance. Out of the box you will most likely get really bad write performance. If you have access to that kind of deep expertise you might achieve sufficient performance to be compliant with your service level agreements.

Still, DBA tasks like backup/restore, dump & load and/or index rebuilds are incredibly slow.

Heavy Regards, RealHeavyDude.
 