tar problem

ron

I am reworking the scripts that back up to tape. There is no problem with the PROBKUP side of things, but I hit a problem with tar (we copy quite a large number of Unix files to tape too).

The old script included a long list of directory names in the tar command. That worked just fine, but I wanted a file list as well, so I changed the logic to create a file with find first, like this:

cd $C_Home
$FIND . -print >${FINDLIST}

and then:

tar cvfb /dev/rmt/0 20 -I ${FINDLIST} >>$FILELOG 2>&1

I found that the tape archive was HUGE. Investigation showed that (as we all know) find by default writes out the names of directories as well as files to ${FINDLIST}, and tar recurses into every directory it is given. If file cccc is in the path ./aaaa/bbbb/cccc then it gets written to the tape FOUR times: once because "." is an entry in ${FINDLIST}, again because "./aaaa" is an entry, once more because "./aaaa/bbbb" is an entry, and finally because "./aaaa/bbbb/cccc" itself is an entry!
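To make the arithmetic concrete, here is a small sketch that predicts how many copies of each name end up in the archive, working only from the find list (no tape or tar needed). It assumes tar recurses into every directory named on its list and does not skip names it has already written; the temporary tree is made-up test data mirroring the ./aaaa/bbbb/cccc example.

```shell
#!/bin/sh
# Predict how many times each entry of a `find . -print` list would be
# archived, ASSUMING tar recurses into every directory on the list and
# keeps duplicates. Only the list is inspected; tar is not run.
tmp=$(mktemp -d)
mkdir -p "$tmp/tree/aaaa/bbbb"
: > "$tmp/tree/aaaa/bbbb/cccc"            # the file from the example

( cd "$tmp/tree" && find . -print ) > "$tmp/findlist"

# An entry is written once for itself and once more for every entry in
# the list that is an ancestor directory of it ("." included). The
# trailing "/" stops "./aaaa" from matching a sibling like "./aaaab".
counts=$(awk '{ path[NR] = $0 }
END {
    for (i = 1; i <= NR; i++) {
        n = 0
        for (j = 1; j <= NR; j++)
            if (path[i] == path[j] || index(path[i], path[j] "/") == 1)
                n++
        print n, path[i]
    }
}' "$tmp/findlist")

printf '%s\n' "$counts"
rm -rf "$tmp"
```

On that tree it prints 1 for ".", 2 for "./aaaa", 3 for "./aaaa/bbbb" and 4 for "./aaaa/bbbb/cccc", which is the FOUR copies counted above.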

I have dealt with the output of find many times ... and worked with tar many times - and I SHOULD have seen this coming ... but I didn't.

I spent a very long time searching the web to see what others have done about this problem, but found nothing. However, I found MANY references to people recommending using tar like this:

tar cvfz archive.tgz `find /home -ctime -1 -depth -print`

As far as I can see, that will cause exactly the same kind of problem I've had.

I solved the problem by keeping only regular files in the list, adding -type f to the find, like this:

$FIND . -type f -print >${FINDLIST}
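The corrected step can be sketched as a self-contained script. It reuses the variable names from the fragments above (C_Home, FIND, FINDLIST, FILELOG), but writes to an ordinary archive file instead of /dev/rmt/0 and builds a throwaway tree; both are assumptions made so the sketch can run anywhere. Solaris tar reads a name list with -I, while GNU and BSD tar use -T for the same job, so the script picks the option from uname.

```shell
#!/bin/sh
# Sketch of the corrected backup step: list regular files only, archive
# them from the list, then check that no member was stored twice.
# The tree, archive path and log path are hypothetical stand-ins.
tmp=$(mktemp -d)
C_Home=$tmp/home
FIND=find
FINDLIST=$tmp/findlist
FILELOG=$tmp/filelog
ARCHIVE=$tmp/backup.tar               # stands in for /dev/rmt/0

mkdir -p "$C_Home/aaaa/bbbb"
: > "$C_Home/f1"
: > "$C_Home/aaaa/bbbb/cccc"

# Solaris tar takes the name list with -I; GNU/BSD tar use -T.
case $(uname -s) in
    SunOS) listopt=-I ;;
    *)     listopt=-T ;;
esac

cd "$C_Home" || exit 1
$FIND . -type f -print > "$FINDLIST"  # directories are no longer listed
tar cvf "$ARCHIVE" $listopt "$FINDLIST" >> "$FILELOG" 2>&1

# uniq -d prints each name that occurs more than once in the archive.
dups=$(tar tf "$ARCHIVE" | sort | uniq -d)
members=$(tar tf "$ARCHIVE" | wc -l | tr -d ' ')
echo "members: $members, duplicates: ${dups:-none}"
cd / && rm -rf "$tmp"
```

One design note: -type f also leaves out symbolic links, device files and empty directories. If symbolic links matter, `! -type d` (every entry that is not a directory) keeps them while still keeping directories off the list.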


Has anyone else hit this problem? If so what did you do about it?

Thanks,
Ron.
 
3 alternatives spring to mind.

1. Use fully qualified file names as per the example.
2. Change your script to back up only the top-level directories and let tar take care of the rest.
3. Use tar without the -I.

Create your file list as required. Then try the tar command as follows:
tar cvfz archive.gz `cat filelist`

This will create one backup of each file, as will:
tar cvfz archive.tgz `find /home -ctime -1 -depth -print`

This is a case where you can practice on a small directory with a couple of subdirectories.

Try reviewing the archive file with
tar tvfz archive.gz | sort | more
and you should be able to see each file once and only once.
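Paging through a sorted listing works for a small test, but on a tape-sized archive it is easier to print only the names that repeat. The helper below is a hypothetical function (report_dups is not a system command) meant to be fed the output of `tar tf ARCHIVE`; the sample listing is made-up test data.

```shell
#!/bin/sh
# report_dups (hypothetical helper): read member names, one per line, on
# standard input and print "COUNT NAME" for every name seen more than once.
report_dups() {
    LC_ALL=C sort | uniq -c | awk '$1 > 1 {
        n = $1
        sub(/^[ \t]*[0-9]+[ \t]/, "")   # strip the uniq -c count, keep the name
        print n, $0
    }'
}

# Typical use:  tar tf archive.tar | report_dups
# Here a made-up listing stands in for real tar output.
dups=$(printf '%s\n' ./dir/file1 ./dir/ ./dir/file1 ./dir/file2 ./dir/file1 |
    report_dups)
printf '%s\n' "$dups"
```

An empty result means every member appears once and only once.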
 
Thanks, Toby. I had already done a test, but I've just done it again:

rhunt >tar cvf ../yy2 `find /home/users/rhunt/a -depth -print`
a /home/users/rhunt/a/f1 0K
a /home/users/rhunt/a/f2 0K
a /home/users/rhunt/a/aa/aa.f1 0K
a /home/users/rhunt/a/aa/aa.f2 0K
a /home/users/rhunt/a/aa/ 0K
a /home/users/rhunt/a/aa/aa.f1 0K
a /home/users/rhunt/a/aa/aa.f2 0K
a /home/users/rhunt/a/ 0K
a /home/users/rhunt/a/f1 0K
a /home/users/rhunt/a/f2 0K
a /home/users/rhunt/a/aa/ 0K
a /home/users/rhunt/a/aa/aa.f1 0K
a /home/users/rhunt/a/aa/aa.f2 0K
rhunt > >tar tvf ../yy2
tar: blocksize = 15
-rw-rw-r-- 2807/100 0 May 5 08:06 2003 /home/users/rhunt/a/f1
-rw-rw-r-- 2807/100 0 May 5 08:06 2003 /home/users/rhunt/a/f2
-rw-rw-r-- 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/aa.f1
-rw-rw-r-- 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/aa.f2
drwxrwxrwx 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/
-rw-rw-r-- 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/aa.f1
-rw-rw-r-- 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/aa.f2
drwxrwxrwx 2807/100 0 May 5 08:06 2003 /home/users/rhunt/a/
-rw-rw-r-- 2807/100 0 May 5 08:06 2003 /home/users/rhunt/a/f1
-rw-rw-r-- 2807/100 0 May 5 08:06 2003 /home/users/rhunt/a/f2
drwxrwxrwx 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/
-rw-rw-r-- 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/aa.f1
-rw-rw-r-- 2807/100 0 May 5 08:07 2003 /home/users/rhunt/a/aa/aa.f2
rhunt >

As you see, files in the current directory occur twice in the archive, and files one level down occur three times.

I get the same effect with relative or absolute path names.

The original example I gave was simplified. In real life the find has a change-date qualification, so I can't just list the directories, or I get "everything".

Ron.
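If GNU tar is available, another way round the date-qualified case is to keep directories in the list but tell tar not to descend into them: GNU tar's `--no-recursion` does that, and `-T -` reads the list from standard input. This is a sketch assuming GNU tar (check whether your own tar has a similar switch). It uses -mtime with `touch -t` instead of -ctime because a file's ctime cannot be set back for a test; that substitution and the file names are assumptions.

```shell
#!/bin/sh
# Sketch, ASSUMING GNU tar: archive a date-qualified find list that still
# names directories, without tar pulling in everything below them.
tmp=$(mktemp -d)
mkdir -p "$tmp/tree/aaaa"
: > "$tmp/tree/aaaa/new"
: > "$tmp/tree/aaaa/old"
touch -t 200301010000 "$tmp/tree/aaaa/old"   # make this file look stale

listing=skipped
if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    # -mtime -1 stands in for the real change-date test. "." and "./aaaa"
    # qualify too, but --no-recursion stores only their directory entries,
    # so ./aaaa/old is not swept in through its parent.
    ( cd "$tmp/tree" &&
      find . -mtime -1 -print | tar cf "$tmp/new.tar" --no-recursion -T - )
    listing=$(tar tf "$tmp/new.tar")
fi
printf '%s\n' "$listing"
rm -rf "$tmp"
```

Unlike -type f, this keeps directory entries (and so their ownership and permissions) for directories that pass the date test.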
 
Strange. I just performed the same sort of tar using the Solaris 7 tar in /usr/bin/tar and got a very different result.

Using exactly the command you specified (except changing the directories) I did not get the problem you see.

The only way around this I can think of is to check your version of tar and see if there is a flag which controls how tar deals with directories.

Good luck
 
We're using Solaris 8 - but I'd be a bit surprised if that makes a difference.

rhunt >ls -l $(whence tar)
lrwxrwxrwx 1 root root 11 Jan 22 21:12 /usr/bin/tar -> ../sbin/tar
rhunt >ls -l /usr/sbin/tar
-r-xr-xr-x 1 root bin 65876 Sep 27 2002 /usr/sbin/tar

I'll keep poking around and see what I can "find".

Thanks,
Ron.
 
I can suggest one difference already: the tar I have doesn't support compressing the output as part of the tar process (the z flag). For that you need the GNU version of tar (or some such) to handle compressed tarballs.
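With a tar that has no z flag, a common workaround is to write the archive to standard output (`f -`) and pipe it through gzip, then reverse the pipe to list or extract. A minimal sketch, assuming gzip is on the PATH; the directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: build a compressed archive with a tar that has no z option by
# writing the archive to stdout ("f -") and compressing it with gzip.
tmp=$(mktemp -d)
mkdir -p "$tmp/data"
echo hello > "$tmp/data/f1"

cd "$tmp" || exit 1
tar cf - data | gzip -c > "$tmp/archive.tar.gz"

# Listing (or extracting with xf -) reverses the pipe: decompress first,
# then let tar read the archive from stdin.
listing=$(gzip -dc "$tmp/archive.tar.gz" | tar tf -)
printf '%s\n' "$listing"
cd / && rm -rf "$tmp"
```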

The tar I have is:
-r-xr-xr-x 1 bin bin 63568 Mar 23 1999 /usr/sbin/tar

An alternative, I suppose, is to use fsdump. The name of this command varies with your file system: if you are using VxFS it will be vxfsdump, and ufsdump for UFS.

This will handle various levels of backup (complete with examples) for creating incremental backups, but has the disadvantage that it prefers an unmounted file system.

Have a look through the pkginfo output to see if something has been installed.
 