Forum Post: RE: Identify Duplicate Blobs

  • Thread starter: Peter Judge
Status
Not open for further replies.

Peter Judge

Guest
Technically, you can read the BLOB fields and calculate a hash (using MESSAGE-DIGEST or related functions), then compare the hashes to determine uniqueness. The challenge comes when the files change: do all the stored versions need to change? I.e., if foo.pdf is stored in 3 records and someone updates foo.pdf with a signature, do all 3 records need to change? If so, you may want to normalise the db some more to reduce duplicates.

-- peter

From: James Palmer [mailto:bounce-jdpjamesp@community.progress.com]
Sent: Wednesday, 03 December, 2014 09:25
To: TU.OE.Development@community.progress.com
Subject: [Technical Users - OE Development] Identify Duplicate Blobs

Thread created by James Palmer

In our database we have a table that stores files as BLOBs. Because of the sorts of files people are adding, it's bloating our database very quickly. I'm in the process of implementing compression on the files, which has gained us around 30GB already, but it's come to my attention that, because of the way the code works, we quite often end up with duplicate files in different BLOB records. We currently store the file name in the DB record, and I can easily determine the size of the BLOB using LENGTH(). I can test this for equality, but is there an alternative method I can use to establish whether files are identical?

Progress 11.2.1 on Windows
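As a rough illustration of the hash comparison suggested in the reply, here is a minimal sketch. It is written in Python rather than ABL, purely to show the idea, and it assumes the BLOB contents have already been read into memory as (record id, bytes) pairs. The function name, the sample data, and the choice of SHA-256 are all hypothetical and not taken from the thread.

```python
import hashlib
from collections import defaultdict


def find_duplicate_blobs(records):
    """Group record ids whose blob contents are byte-for-byte identical.

    `records` is a hypothetical iterable of (record_id, blob_bytes) pairs,
    standing in for rows read from the table that stores the files.
    """
    groups = defaultdict(list)
    for record_id, blob in records:
        # Length alone cannot prove equality, so pair it with a digest.
        key = (len(blob), hashlib.sha256(blob).hexdigest())
        groups[key].append(record_id)
    # Keep only the groups that contain more than one record.
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample data: records 1 and 3 hold the same file contents.
sample = [
    (1, b"contents of foo.pdf"),
    (2, b"contents of bar.pdf"),
    (3, b"contents of foo.pdf"),
]
duplicates = find_duplicate_blobs(sample)
print(duplicates)  # [[1, 3]]
```

If the files are stored compressed, it is probably safer to hash the uncompressed bytes, since two copies of the same file only share a digest when their stored bytes match exactly.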
