Partial backup verify returning -1

jdpjamesp

OE 12.2.3
Windows

I've got an Ant backup script that takes a backup with probkup and then does a partial verify on it. More accurately, it does this for n databases, with up to p of them in parallel.

Most of the time it works fine, but recently I've been seeing the partial verify return -1. Or at least I presume it's -1; it's rendered as 255.

The output from the verify all looks ok, so I'm a little confused. The upshot though is that my Ant script gets marked as a failed build.

Has anyone seen -1 or 255 from a partial verify before? Any ideas what might cause it?

VerifyBackup:
[echo] Working on Database: SmartDB Full path: C:/Datenbanken/Prod/SmartDB-Prod/SmartDB
[echo] Working on Database: WBPDB Full path: C:/Datenbanken/Prod/WBPDB/WBPDB
[exec] Das ist ein full Backup von C:\Datenbanken\Prod\SmartDB-Prod\SmartDB.db. (6759)
[exec] Diese Backup wurde genommen Mon Mar 1 21:15:17 2021. (6760)
[exec] Die Blockgröße is 8192. (6994)
[exec] Teilweises, vergleichendes Lesen des Sicherungsbandes erfolgreich. (6765)
[exec] Verify-Durchlauf gestartet. (3751)
[exec] Das ist ein full Backup von C:\Datenbanken\Prod\WBPDB\WBPDB.db. (6759)
[exec] Diese Backup wurde genommen Mon Mar 1 21:15:17 2021. (6760)
[exec] Die Blockgröße is 8192. (6994)
[exec] Teilweises, vergleichendes Lesen des Sicherungsbandes erfolgreich. (6765)
[exec] Verify-Durchlauf gestartet. (3751)
[exec] Verified 15394 db blocks in 00:00:00
[exec] Backup für C:\Datenbanken\Prod\SmartDB-Prod\SmartDB.db als OK bestätigt. (6758)
[exec]
[echo] Working on Database: drutmp Full path: C:/Datenbanken/Prod/drutmp/drutmp
[exec] Das ist ein full Backup von C:\Datenbanken\Prod\drutmp\drutmp.db. (6759)
[exec] Diese Backup wurde genommen Mon Mar 1 21:15:20 2021. (6760)
[exec] Die Blockgröße is 4096. (6994)
[exec] Teilweises, vergleichendes Lesen des Sicherungsbandes erfolgreich. (6765)
[exec] Verify-Durchlauf gestartet. (3751)
[exec] Verified 4716 db blocks in 00:00:00
[exec] Backup für C:\Datenbanken\Prod\drutmp\drutmp.db als OK bestätigt. (6758)
[exec]
[echo] Working on Database: docu Full path: C:/Datenbanken/Prod/docu/docu
[exec] Das ist ein full Backup von C:\Datenbanken\Prod\docu\docu.db. (6759)
[exec] Diese Backup wurde genommen Mon Mar 1 21:15:21 2021. (6760)
[exec] Die Blockgröße is 4096. (6994)
[exec] Teilweises, vergleichendes Lesen des Sicherungsbandes erfolgreich. (6765)
[exec] Verify-Durchlauf gestartet. (3751)
[exec] Verified 2790 db blocks in 00:00:00
[exec] Backup für C:\Datenbanken\Prod\docu\docu.db als OK bestätigt. (6758)
[exec]
[exec] Verified 1512036 db blocks in 00:00:58
[exec] Backup für C:\Datenbanken\Prod\WBPDB\WBPDB.db als OK bestätigt. (6758)
[exec]

BUILD FAILED
C:\Consultingwerk\Jenkins\workspace\ProdSicherungAnt\DBBackup.xml:92: The following error occurred while executing this line:
C:\Consultingwerk\Jenkins\workspace\ProdSicherungAnt\DBBackup.xml:95: exec returned: 255

Total time: 5 minutes 34 seconds
Build step 'Ant aufrufen' marked build as failure
Finished: FAILURE
 
I see that "exec returned 255" but it is unclear what exec was executing. Presumably some form of probkup is involved but it isn't clear what the command line was.

Do any of the .lg files contain any clues?

Probably not at all related except for the error number but I have very recently had a problem with rsync returning 255 from time to time - that turns out to have been from the attempted network connection being rejected, apparently for lack of resources on the target. Perhaps your highly parallel process is sometimes exhausting some system resource?
 
Thanks Tom - this is the Ant command:

<exec executable="${progress.DLC}/bin/_dbutil" failonerror="true" dir="${Database.List.@{database}.Location}">
<arg line="prorest ${Temp.Directory}/${Database.List.@{database}.FileName} ${Database.List.@{database}.LocalBackup}${DSTAMP}${Database.List.@{database}.BackupFile} -vp" />
</exec>

Which will build the following command:

_dbutil prorest <db> <backupfile> -vp

I can't see which of the parallel commands it fails on as all of them seem to be completing.

I could change the parallel number to only do 1 at a time and see if that sheds any light, but ideally want them running in parallel to reduce backup durations. And yes I've tested it does reduce the duration by a decent factor. :)
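
One way to see which of the parallel verifies returns 255, without dropping back to one at a time, might be to capture each exit code in its own property rather than letting failonerror abort the build. A minimal sketch, assuming the exec sits inside a macrodef with a database attribute (as the @{database} references suggest); the property name verify.rc.@{database} is hypothetical, and the prorest arguments are copied unchanged from the command above:

```xml
<!-- Sketch only: same prorest -vp call, but the exit code is stored per database
     instead of failing the build at the first non-zero result. -->
<exec executable="${progress.DLC}/bin/_dbutil"
      failonerror="false"
      resultproperty="verify.rc.@{database}"
      dir="${Database.List.@{database}.Location}">
  <arg line="prorest ${Temp.Directory}/${Database.List.@{database}.FileName} ${Database.List.@{database}.LocalBackup}${DSTAMP}${Database.List.@{database}.BackupFile} -vp" />
</exec>

<!-- Log the exit code next to the database name so the console output shows the culprit. -->
<echo message="Verify of @{database} returned ${verify.rc.@{database}}" />

<!-- Still fail the build, but with a message that names the database. -->
<fail message="prorest -vp failed for @{database} with exit code ${verify.rc.@{database}}">
  <condition>
    <not><equals arg1="${verify.rc.@{database}}" arg2="0" /></not>
  </condition>
</fail>
```

Because Ant properties are immutable once set, the property name has to be unique per database, which is why the database name is part of it. The build still ends up failed, but the message should say which database the 255 came from.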
 
Belt and braces Rob. Not specifically had anything bad happen but it can't harm, surely?
 
That's interesting. I actually built it into the script after someone at the EMEA PUG, someone well respected in the DBA world, asked me why my backup script doesn't contain a verify... I can't quite remember exactly who asked it though so I won't throw around accusations.
I mean - I have no problem taking the step out. :)
 
Reading between the lines (Ant task, offline DBs, small DBs, many operations in parallel), are these DBs part of a CI/CD pipeline, as opposed to a production or production-support environment? And if so, is verification really necessary? Just curious.
 
They're online dbs and very much part of a production environment. All bar one of them are small - the trappings of ADM2 mostly and our SmartDB.
 
And the users don't mind daily downtime?

I think that being able to restore a backup and open that restored DB is a much better test that your backups are usable than prorest -vp or -vf. If you did online backups and then copied them to DR, you could:
  • eliminate the daily downtime for backup and verification;
  • reduce the overall time taken and I/O load on the prod server, by removing the prorest step(s);
  • have a robust verification step, if desired, that runs on the DR server(s);
  • probably improve post-backup application performance by having a non-empty buffer pool.
 
Not sure where you're getting the downtime from - the -vp works online. Or at least, it finishes ok online. Maybe those two are different things! :)

I'm not in control of what happens after I take the backups - the volume they're put on is snapshotted afterwards. I don't have access to a machine that I could use to do a restore. We do use the backups semi-frequently to restore test and development environments, and they keep a long history of backups and AI files.
 
My understanding is that the prorest -vp and -vf options compare block CRCs or block contents between a backup file and a reference database. If that database is online and being updated between the backup and the verification, I wouldn't expect that comparison to always succeed as it's an apples-to-oranges comparison.
 
Ah I see - I wasn't aware of that. The DBs aren't being used at backup time, but it's good to know, and it might explain why it fails. I still need to get to the DB logs to check this.
 
It would be interesting to see if it's always the same database where it fails. Maybe there is some indication in the DB logs, or in the Jenkins job console output.
 
If it is something external to probkup that is causing an unknown problem (255 often means "unknown problem") then you might also find a clue in the Windows Event Viewer.
 