Partial backup verify returning -1

Cringer

ProgressTalk.com Moderator
Staff member
OE 12.2.3
Windows

I've got an Ant backup script that takes a backup with probkup and then does a partial verify on it. More accurately, it does this on n databases, with up to p of them in parallel.

Most of the time it works fine, but recently I've been seeing the partial verify return -1. Or at least I presume it's -1; it's rendered as 255.

The output from the verify all looks ok, so I'm a little confused. The upshot though is that my Ant script gets marked as a failed build.

Has anyone seen -1 or 255 from a partial verify before? Any ideas what might cause it?

VerifyBackup:
[echo] Working on Database: SmartDB Full path: C:/Datenbanken/Prod/SmartDB-Prod/SmartDB
[echo] Working on Database: WBPDB Full path: C:/Datenbanken/Prod/WBPDB/WBPDB
[exec] This is a full backup of C:\Datenbanken\Prod\SmartDB-Prod\SmartDB.db. (6759)
[exec] This backup was taken Mon Mar 1 21:15:17 2021. (6760)
[exec] The blocksize is 8192. (6994)
[exec] Partial verification successfully read backup volume. (6765)
[exec] Verify pass started. (3751)
[exec] This is a full backup of C:\Datenbanken\Prod\WBPDB\WBPDB.db. (6759)
[exec] This backup was taken Mon Mar 1 21:15:17 2021. (6760)
[exec] The blocksize is 8192. (6994)
[exec] Partial verification successfully read backup volume. (6765)
[exec] Verify pass started. (3751)
[exec] Verified 15394 db blocks in 00:00:00
[exec] Backup for C:\Datenbanken\Prod\SmartDB-Prod\SmartDB.db verified ok. (6758)
[exec]
[echo] Working on Database: drutmp Full path: C:/Datenbanken/Prod/drutmp/drutmp
[exec] This is a full backup of C:\Datenbanken\Prod\drutmp\drutmp.db. (6759)
[exec] This backup was taken Mon Mar 1 21:15:20 2021. (6760)
[exec] The blocksize is 4096. (6994)
[exec] Partial verification successfully read backup volume. (6765)
[exec] Verify pass started. (3751)
[exec] Verified 4716 db blocks in 00:00:00
[exec] Backup for C:\Datenbanken\Prod\drutmp\drutmp.db verified ok. (6758)
[exec]
[echo] Working on Database: docu Full path: C:/Datenbanken/Prod/docu/docu
[exec] This is a full backup of C:\Datenbanken\Prod\docu\docu.db. (6759)
[exec] This backup was taken Mon Mar 1 21:15:21 2021. (6760)
[exec] The blocksize is 4096. (6994)
[exec] Partial verification successfully read backup volume. (6765)
[exec] Verify pass started. (3751)
[exec] Verified 2790 db blocks in 00:00:00
[exec] Backup for C:\Datenbanken\Prod\docu\docu.db verified ok. (6758)
[exec]
[exec] Verified 1512036 db blocks in 00:00:58
[exec] Backup for C:\Datenbanken\Prod\WBPDB\WBPDB.db verified ok. (6758)
[exec]

BUILD FAILED
C:\Consultingwerk\Jenkins\workspace\ProdSicherungAnt\DBBackup.xml:92: The following error occurred while executing this line:
C:\Consultingwerk\Jenkins\workspace\ProdSicherungAnt\DBBackup.xml:95: exec returned: 255

Total time: 5 minutes 34 seconds
Build step 'Ant aufrufen' marked build as failure
Finished: FAILURE
 

TomBascom

Curmudgeon
I see that "exec returned 255" but it is unclear what exec was executing. Presumably some form of probkup is involved but it isn't clear what the command line was.

Do any of the .lg files contain any clues?

This is probably not related at all except for the error number, but I have very recently had a problem with rsync returning 255 from time to time. That turned out to be caused by the attempted network connection being rejected, apparently for lack of resources on the target. Perhaps your highly parallel process is sometimes exhausting some system resource?
 

Cringer

ProgressTalk.com Moderator
Staff member
Thanks Tom - this is the Ant command:

<exec executable="${progress.DLC}/bin/_dbutil" failonerror="true" dir="${Database.List.@{database}.Location}">
  <arg line="prorest ${Temp.Directory}/${Database.List.@{database}.FileName} ${Database.List.@{database}.LocalBackup}${DSTAMP}${Database.List.@{database}.BackupFile} -vp" />
</exec>

Which will build the following command:

_dbutil prorest <db> <backupfile> -vp

I can't see which of the parallel commands it fails on as all of them seem to be completing.

I could change the parallel number to only do 1 at a time and see if that sheds any light, but ideally I want them running in parallel to reduce backup durations. And yes, I've tested that it does reduce the duration by a decent factor. :)
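One way to see which verify returns the non-zero code without giving up the parallel run might be to let each exec store its exit code in its own property and fail explicitly afterwards. This is only a sketch: it assumes the exec sits somewhere `@{database}` is expanded (a macrodef or an ant-contrib `for` loop), and the property name `Verify.Result.@{database}` is made up for illustration.

```xml
<!-- Sketch only: Verify.Result.@{database} is a hypothetical property name. -->
<!-- failonerror="false" lets Ant carry on so the exit code can be inspected. -->
<exec executable="${progress.DLC}/bin/_dbutil" failonerror="false"
      resultproperty="Verify.Result.@{database}"
      dir="${Database.List.@{database}.Location}">
  <arg line="prorest ${Temp.Directory}/${Database.List.@{database}.FileName} ${Database.List.@{database}.LocalBackup}${DSTAMP}${Database.List.@{database}.BackupFile} -vp" />
</exec>

<!-- Log the exit code next to the database name. -->
<echo message="prorest -vp for @{database} returned ${Verify.Result.@{database}}" />

<!-- Fail the build with a message that names the database. -->
<fail message="Partial verify of @{database} returned ${Verify.Result.@{database}}">
  <condition>
    <not>
      <equals arg1="${Verify.Result.@{database}}" arg2="0" />
    </not>
  </condition>
</fail>
```

Because Ant properties are immutable once set, the result property needs a distinct name per database, which is why the database name is part of it here.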
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
What's the value of the verification? Did something happen with bad backups at an earlier time that necessitated it?
 

Cringer

ProgressTalk.com Moderator
Staff member
Belt and braces Rob. Not specifically had anything bad happen but it can't harm, surely?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Cringer said: "Belt and braces Rob. Not specifically had anything bad happen but it can't harm, surely?"
It can spend your time by leading you down this path. ;)

I don't have an expert opinion on -vp, but I've never used it, or felt the need to. I remember someone from PSC at a conference describing it as "fairly useless", or words to that effect.
 

Cringer

ProgressTalk.com Moderator
Staff member
That's interesting. I actually built it into the script after someone at the EMEA PUG, someone well respected in the DBA world, asked me why my backup script doesn't contain a verify... I can't quite remember exactly who asked it though so I won't throw around accusations.
I mean - I have no problem taking the step out. :)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Reading between the lines (Ant task, offline DBs, small DBs, many operations in parallel), are these DBs part of a CI/CD pipeline, as opposed to a production or production-support environment? And if so, is verification really necessary? Just curious.
 

Cringer

ProgressTalk.com Moderator
Staff member
They're online dbs and very much part of a production environment. All bar one of them are small - the trappings of ADM2 mostly and our SmartDB.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Cringer said: "They're online dbs and very much part of a production environment. All bar one of them are small - the trappings of ADM2 mostly and our SmartDB."
And the users don't mind daily downtime?

I think that being able to restore a backup and open that restored DB is a much better test that your backups are usable than prorest -vp or -vf. If you did online backups and then copied them to DR, you could:
  • eliminate the daily downtime for backup and verification;
  • reduce the overall time taken and I/O load on the prod server, by removing the prorest step(s);
  • have a robust verification step, if desired, that runs on the DR server(s);
  • probably improve post-backup application performance by having a non-empty buffer pool.
 

Cringer

ProgressTalk.com Moderator
Staff member
Not sure where you're getting the downtime from - the -vp works online. Or at least, it finishes ok online. Maybe those two are different things! :)

I'm not in control of what happens after I take the backups - the volume they're put on is snapshotted afterwards. I don't have access to a machine that I could use to do a restore. We do use the backups semi-frequently to restore test and development environments, and they keep a long history of backups and AI files.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
My understanding is that the prorest -vp and -vf options compare block CRCs or block contents between a backup file and a reference database. If that database is online and being updated between the backup and the verification, I wouldn't expect that comparison to always succeed as it's an apples-to-oranges comparison.
 

Cringer

ProgressTalk.com Moderator
Staff member
Ah I see - I wasn't aware of that. Although the DBs aren't being used at the backup time. But good to know. Might explain though why it fails. Still need to get to the DB logs to check this.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
It would be interesting to see if it's always the same database where it fails. Maybe there is some indication in the DB logs, or in the Jenkins job console output.
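Since the console output of the parallel verifies is interleaved, one option for attributing a failure might be to send each verify's output to its own file via the `output` attribute of exec. A sketch, assuming `${Temp.Directory}` is writable by the Jenkins job; the file name `verify-@{database}.log` is hypothetical.

```xml
<!-- Sketch only: the verify-@{database}.log file name is hypothetical. -->
<!-- output= redirects the messages of this one verify into its own file. -->
<exec executable="${progress.DLC}/bin/_dbutil" failonerror="true"
      dir="${Database.List.@{database}.Location}"
      output="${Temp.Directory}/verify-@{database}.log">
  <arg line="prorest ${Temp.Directory}/${Database.List.@{database}.FileName} ${Database.List.@{database}.LocalBackup}${DSTAMP}${Database.List.@{database}.BackupFile} -vp" />
</exec>
```

With `output` set, the verify messages should land in that file rather than in the Jenkins console, so the per-database files would need to be checked after a failed run.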
 

TomBascom

Curmudgeon
If it is something external to probkup that is causing an unknown problem (255 often means "unknown problem"), then you might also find a clue in the Windows Event Viewer.
 