Question: Set noatime on DB filesystems? General deployment suggestions.

Jimbro

New Member
Hi all,

We are experiencing an extreme performance impact during the incremental online backups of our databases. We are in the process of upgrading the SAN; however, this will not be immediate.

I want to set noatime on the EXT3 filesystems that hold the Progress Database extents, however I am not sure if this is taboo. Can anyone confirm it's a bad idea?

Also, from a general hardware perspective, is there any documentation on how the hardware should be set up and, more importantly, how filesystems should be tuned for the Progress database?

Thanks!
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I typically set the noatime mount option on ext3 file systems, and I have heard from others that they do as well for Linux Progress DB servers. I don't have any other general tips for file system tuning.
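For what it's worth, it's a one-line change; something like this, with the device and mount point as placeholders for your own:

```shell
# Example /etc/fstab entry for a database filesystem
# (device and mount point are placeholders):
#   /dev/sdb1  /db  ext3  defaults,noatime  1 2

# Apply it without a reboot; remount is safe on a mounted ext3 filesystem
mount -o remount,noatime /db
```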

The Database Essentials manual has a section called Administrative Planning that you may want to look at. In general:
  • keep your AI files on separate media from your database (for data protection rather than performance);
  • use RAID 10; don't use parity-based RAID levels like 5, 50, DP, etc.;
  • use quality enterprise disks, SSD if possible;
  • spreading your I/O across more "spindles" can improve performance;
  • don't use low-quality SANs; and even high-quality ($$$) SANs may not be optimally configured out of the box;
  • don't write your backups, dumps, etc. to the same storage volume that holds your database.
Just curious: why are you running incremental backups, and how often do they run? Are the backups written to the database disk(s)? What is your Progress version? What is your SAN?
 

Jimbro

New Member
Hi Rob, Thanks for the feedback.

We do AI to a remote box, so it's not on the same set of disks as the DB extents.

We do use RAID 10 here in this situation.

We are using SAS 10k disks, we do not have SSD.

We have this spread across maybe 12 10k SAS drives.

We are using an HP MSA p2000G3 10GbE iSCSI SAN. As I said, we are planning to upgrade the controllers to FC or bring in a new SAN with a mix of 15k drives and SSD.

We are not writing the backups to the same storage; we write them to a remote box, like the AI files.

We run incremental backups once a night, with a full backup on the weekend. We were trying to reduce the impact of running a full backup each night, because we were seeing performance issues during the backup. In fact, we see impact on most of our Progress servers during the full online backup. The incremental takes less time to run (well, it used to), so the DBA team implemented it. It does reduce the impact at most locations.

I am of the opinion that perhaps Progress DB should not be virtualized in some cases.

Thanks again for the information; I will remount with noatime.

Jimmy
 

TheMadDBA

Active Member
In most cases you are not going to see amazing differences with FS parameter tweaks. Backing up to a remote/NFS mounted volume will be much slower than local for sure.

Are you having issues with how fast the backup runs or how much it slows down the production system?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
When you say "we do AI to a remote box" do you mean you write archived AI extents to a remote file system, or are the AI files that are in the database structure actually mounted on a remote file system?

I typically write DB backups to local storage and then move the file off the box once it is complete. When backing up to remote storage there is a greater chance that an I/O operation can block (e.g. due to network issues) which could block other DB clients as well.
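As a rough sketch of that approach (database name, paths, and host are all placeholders):

```shell
# Back up to local disk first, then ship the file off-box, so a network
# hiccup can never block the backup I/O itself.
probkup online /db/proddb /backup/proddb.bck && \
  rsync -av /backup/proddb.bck backuphost:/archive/proddb/
```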

An incremental backup can take less total time to write, as there is less to write, although it does as many reads as a full backup. Some people on SANs use a split mirror or SAN snap copy during a DB quiet point instead of probkup; you may want to test that approach.
 

Jimbro

New Member
Hi,

AI to local disk, then copied to remote storage via NFS.

This is a unique situation: we were writing backups directly to NFS storage and we moved them local to improve performance. It did not help, so my feeling is that the disk reads are the issue, meaning the SAN may be underpowered.

I completely agree; an incremental, although it writes less data, may read the same amount or more, since it has to read every block to find the differences since the last full backup. I suspect it causes even more read I/O than a full backup, but that is speculation.

We are going to implement quiet points for our backups. I do have a question about that functionality.

With quiet-point backups... I hear mention of split mirrors and SAN snap copies, which is fine if you have the space or SAN integration. However, isn't it possible to simply enable a quiet point and then copy the data directly from the filesystem to a tape or disk target? I have used LVM snapshots successfully, but I am curious whether we can copy the data straight from the filesystem with the quiet point enabled.
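Something along these lines is what I have in mind (untested sketch; paths are placeholders):

```shell
# Untested sketch only -- paths are placeholders.
proquiet /db/proddb enable       # request the quiet point
# ...confirm in the .lg file that the quiet point is actually in effect...
rsync -a /db/proddb* /backup/staging/   # copy the extents while quiesced
proquiet /db/proddb disable      # resume normal updates
```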

Thanks again
 

Jimbro

New Member
In most cases you are not going to see amazing differences with FS parameter tweaks. Backing up to a remote/NFS mounted volume will be much slower than local for sure.

Yes, we tried moving it local to no avail... so we figured we would try filesystem tweaks at the ext3 level.

Are you having issues with how fast the backup runs or how much it slows down the production system?

The issue is the production impact: we have slow scans and line disconnects. If the backup took 3 hours and caused no impact, that would be just fine. Unfortunately, it is killing production.
 

TheMadDBA

Active Member
The incremental online backup does read every single block, just like a full backup. Fixing this has been on the wish list forever, but I am not holding my breath.

Until you get the disk I/O and memory on the production box sorted out, you have a few options (not all mutually exclusive):

1) Pause the AI applies on the warm site and do a probkup/OS backup there, moving the workload off the production box. I did this for years on multi-terabyte databases and it worked out better all around: we tested our AI process on a frequent basis and removed workload from the production box. Using probkup for a 2.4TB database is not a lot of fun.

2) use quiet points (very very carefully) and split the mirrors or use rsync to merge the files to another location. Just copying the files would probably cause more of a problem because of the increased IO, probably for a shorter time though.

3) look into your specific OS options for controlling IO. Some flavors of UNIX/Linux have options to control how much IO a process can consume. This will not speed up the probkup but might lessen the impact while it is running.

4) Some file system/OS combos let you mount file systems with options to (mostly) bypass the buffer cache, or at least stop rampant usage of the buffer cache for that FS.
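As a sketch of option 3, on kernels new enough to have cgroup blkio throttling (the device numbers and the limit here are examples):

```shell
# Cap read bandwidth for the backup using the cgroup v1 blkio controller
# (requires a reasonably modern kernel; not an option on very old ones).
mkdir -p /sys/fs/cgroup/blkio/backup
# Limit reads on device 8:0 (sda) to ~50 MB/s
echo "8:0 52428800" > /sys/fs/cgroup/blkio/backup/blkio.throttle.read_bps_device
# Put this shell (and its children) in the group, then run the backup
echo $$ > /sys/fs/cgroup/blkio/backup/tasks
probkup online /db/proddb /backup/proddb.bck
```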
 

Jimbro

New Member
Thanks, these are excellent pointers.

Option 2 is the direction I am going to push toward: a quiet point with rsync to another set of disks, to avoid I/O contention at the disk layer. I think this is an excellent option.
I did some tests simply copying database extents, and the impact was much, much less, which is encouraging.

Option 3 is out of the question: cgroups are not available because we are running an antiquated kernel (RHEL 5). ionice is not an answer here, because it only sets a priority and will throttle a process only when something else is contending for the disk.

Option 4 sounds interesting but I need to read more to understand.

Thanks again
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
When making OS copies of databases, be sure your structure is up to date, i.e. don't miss any files.

When using quiet points, note that the QP begins only when a message to that effect appears in the DB log or promon screen 5, not when the command returns to the command prompt.
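One way to script that wait (the exact message text differs between versions, so check your own .lg file first and adjust the pattern; paths are placeholders):

```shell
proquiet /db/proddb enable
# Do NOT start copying yet -- poll the DB log until the quiet-point
# message actually appears (verify the exact wording for your version).
until grep -q "Quiet point has been enabled" /db/proddb.lg; do
  sleep 1
done
# ...now it is safe to start the copy or snapshot...
```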
 

TheMadDBA

Active Member
Like Rob said you have to be very careful when dealing with quiet points, especially in scripts. Test, re-test and re-re-test over and over again.

That is the nice thing about probkup (online or offline) - it always gets all of your DB files.

Something I forgot to mention earlier... You should run iostat during the backups to see exactly where the bottleneck is. You might be able to adjust the adapter settings and/or the queue depth settings to help out with the performance issues.
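Something as simple as this, left running while the backup executes:

```shell
# Extended per-device stats every 5 seconds, in kB.
# Watch avgqu-sz (queued requests), await (average latency, ms) and
# %util (device saturation) for the disks holding the DB extents.
iostat -xk 5
```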
 

Jimbro

New Member
Hi, more great advice, thanks.

So, the proquiet thing: I am using the .st file to gather the DB extents that need to be backed up. Talking with the DBA, she said it's best to run prostrct list [DB] before the actual quiet point, to make sure the .st file is up to date and we have everything. Of course we will grab all the necessary files, not only the extents. We will be testing this week.

I have run iostat during the backups, and indeed it shows 100% utilization, so the bottleneck is disk related.
Running iotop further confirms that the culprit is the incremental online backup; it is the top hog at the time of the problems. Limiting the I/O allowed to these processes would fix it, but as I said, unfortunately the older kernel versions don't allow for that.

I imagine I should look into modifying the queue depth settings at the SAN level, since this is a VM.

Thank you
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Re: structure
Your DBA is correct: you can't assume the structure file is current, so you should recreate it. If someone comes along and adds extents to your DB, or worse, entire areas, without updating the structure file, and you use dbname.st to copy "all" of the DB files, then you're in for a nasty surprise. The prostrct list dbname command recreates the structure file from the DB's internal structure in the _Area and _AreaExtent tables.

But make sure you check for errors. For example, if your script inherits a bad environment config tomorrow and can't find the prostrct command, it won't overwrite the structure file, so it will be unchanged from what it is today. Check your command exit statuses and check that your .st file's time stamp remains current.
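A minimal sketch of that check in a backup script (the database path is a placeholder):

```shell
# Regenerate the structure file and abort if anything goes wrong;
# a stale .st file here means an incomplete copy later.
if ! prostrct list /db/proddb /db/proddb.st; then
  echo "prostrct list failed -- aborting backup" >&2
  exit 1
fi
# Belt and braces: confirm the .st file was rewritten in the last minute
find /db/proddb.st -mmin -1 | grep -q . || {
  echo "/db/proddb.st was not refreshed -- aborting" >&2
  exit 1
}
```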

Re: virtualization
There's no reason why you can't virtualize a DB server, but it does add a layer of complexity and requires some level of specialized knowledge. Some shops use virtualization extensively or exclusively, but they aren't newbies. It also isn't a free lunch; there will be some performance hit.
 

TheMadDBA

Active Member
Queue depth actually applies at the Linux level as well as the SAN. SAN defaults are usually much more reasonable than the OS defaults.

Linux/AIX usually default to something low, like 8 outstanding requests per disk. Look at the iostat output for avgqu-sz, sqfull, or something similar; that will tell you whether the OS is queuing I/O requests. Sometimes fixing the queue depth can make a lot of I/O-related issues go away (assuming the underlying SAN can handle the load).
 

Jimbro

New Member
OK thank you again.

My queue depth is set to 32 for my 4 sd devices, sd[a-d]:

cat /sys/block/sd[a-d]/device/queue_depth
32
32
32
32

What should I look out for in the avgqu-sz column as a warning sign when I run iostat during the full backup? I should know this already, and I will do some reading as well.

Thanks again
 

TheMadDBA

Active Member
Basically anything greater than 0 means requests are queuing up. Read up on your version of iostat for the exact options and column names.

To set it properly you will need to know how many physical disks sit under the covers of what you see as one disk, and whether you are sharing those physical disks with other systems.

Some SAN vendors have basic (and conservative) guidelines based on the number of spindles in use. I start with assuming 16-32 entries per physical disk. So if sda ends up being 8 physical disks on the SAN I would set it to 128 to start with, never exceeding 256 per disk.
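That rule of thumb is easy to put into a tiny helper (my numbers above, nothing more):

```shell
#!/bin/sh
# Starting-point queue depth: 16 outstanding requests per physical
# spindle behind the LUN, never exceeding 256.
suggest_queue_depth() {
  spindles=$1
  qd=$((spindles * 16))
  [ "$qd" -gt 256 ] && qd=256
  echo "$qd"
}

suggest_queue_depth 8     # 8 spindles -> prints 128
suggest_queue_depth 24    # 24 * 16 = 384, capped -> prints 256

# Then apply it per device, e.g.:
#   echo 128 > /sys/block/sda/device/queue_depth
```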

Good luck :)
 

TomBascom

Curmudgeon
I Googled "HP MSA p2000G3 10GbE iSCSI SAN"

HP refers to the MSA as "entry level" and "effective cost vs capacity trade-off".

Those are terms that are synonymous with RAID 5 and its ilk. I somehow doubt that you actually have RAID 10.

In any event -- SANs are never deployed to make databases go faster. They are *shared* infrastructure. The trade-off that you are always making is reduced performance compared to dedicated infrastructure.

The same is true of virtualization. Especially if someone implements "thin provisioning" or over commits your server.

That doesn't mean that you cannot or should not use a SAN or virtualize. But neither approach will make your system go faster and if you do not understand where you are compromising performance you are unlikely to be able to make it any better.

Back to the title topic... "noatime" is a very sensible setting for filesystems containing databases. Once in a while I can even imagine that I see it making a difference when benchmarking but the impact is quite small (less than 10%) and difficult to reliably reproduce.

Something that might help a lot with iSCSI: implement "jumbo frames". And use a distinct, dedicated physical network for the iSCSI traffic; don't share it with everything else.
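The basic moves, with the interface name and target IP as examples (the MTU has to match on every switch port and on the SAN itself):

```shell
# Raise the MTU on the dedicated iSCSI interface (example name)
ip link set dev eth2 mtu 9000
# Verify end to end: a 9000-byte frame is 8972 bytes of ICMP payload
# plus 28 bytes of IP/ICMP headers; -M do sets don't-fragment
ping -M do -s 8972 192.168.10.10
```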
 

Jimbro

New Member
Will look into the MSA P2000 recommendations.


Thanks Tom.

Yes, the P2000 was a bad choice for sure; nothing but problems with this thing. The 10GbE segment is dedicated to iSCSI traffic only, so from that perspective we are OK. The P2000 is like any basic SAN where you can create "VDISKs" at different RAID levels. We carved out a VDISK of 12 x 10k disks in RAID 10 for the Progress database. The thing is that the controllers struggle, and we just need to upgrade soon. We are looking at a couple of options: either upgrade these controllers to FC MSA 2040 controllers and add a shelf of 15k disks, or bring in a more enterprise-class SAN and run the higher-demand VMs off that.

I agree that not all servers are virtualization candidates. It's hard to communicate that to people who don't understand, though. In some cases, actually in this case, we went from a dedicated server with 15k disks to a virtualized environment where the 10k spindles are shared... there is no way you will get the same kind of I/O in this case. It's been workable up until lately.
 