CLOB experiences/best practices

Rob Fitzpatrick

ProgressTalk.com Sponsor
Version: 10.2B06
Runtime: AppServer on Linux or Windows

I was approached by a couple of developers who are prototyping a new application. I have few specifics, but they will be storing "large" amounts of XML messages in the database. For each message there would be a create, a few reads, and then little or no CRUD activity after that; historical data that would eventually be archived.

I will be meeting with them soon to drag out more details, including whether it makes sense to store this data in the DB at all, as opposed to flat files. I've just started my research but one notion I had was to store the data in a CLOB field. I know a bit about CLOBs from the docs but I haven't actually worked with them.

This app already has two databases so one possible approach is to add a third to store these messages and associated data. The downside, compared with adding the new table to one of the two existing DBs, would be incrementally more admin overhead from having another database. The upside would be avoiding the impact to the existing DB's transaction subsystem and buffer pool from moving large amounts of XML data through them.

I'm looking for any experiences, best practices, war stories, cautionary tales, etc. that you folks have to share re CLOBs. I am aware that they are ignored by dbanalys, at least in 10.2X. I haven't checked in 11.1 yet.
 
War story: I forgot to specify the storage area... the CLOB ended up in the schema area. Lots of data got written. Ugliness ensued.

Warning: proutil doesn't know anything about CLOBs. You will not be able to find any data to help you analyze storage utilization.

Possible tall tale: The "proper" rows per block for CLOBs may not be 1. A famous engine crew personage has argued that other values may work better. But without working data from proutil it is difficult to know if this is true.
 
We do store all sorts of documents as BLOBs in our database for some 4 years. I said database because I wouldn't want to spawn transactions across several databases for many reasons. So far we don't have any issues but from my point of view it is absolutely necessary to have dedicated storage areas for the BLOBs and CLOBs and there should not be anything else in these areas.

Heavy Regards, RealHeavyDude.
 
I would also make sure the CLOB fields are defined as using codepage UTF-8, as progress writes xml data in codepage UTF-8 by default.
 
We do store all sorts of documents as BLOBs in our database for some 4 years. I said database because I wouldn't want to spawn transactions across several databases for many reasons. So far we don't have any issues but from my point of view it is absolutely necessary to have dedicated storage areas for the BLOBs and CLOBs and there should not be anything else in these areas.

Heavy Regards, RealHeavyDude.

Thanks RHD. I understand your point about a single database but that ship has already sailed long ago.

My tentative plan, if we go ahead with this, is to have a dedicated database for this one table, with three areas: one for the table, one for the index, and one for the CLOB field.

I may assign the table and its indexes to B2 so that arbitrary amounts of XML data don't flush them out of the buffer. Too early to tell as I don't yet know this table's I/O profile in the application.
 
War story: I forgot to specify the storage area... the CLOB ended up in the schema area. Lots of data got written. Ugliness ensued.

Warning: proutil doesn't know anything about CLOBs. You will not be able to find any data to help you analyze storage utilization.

Possible tall tale: The "proper" rows per block for CLOBs may not be 1. A famous engine crew personage has argued that other values may work better. But without working data from proutil it is difficult to know if this is true.

Thanks Tom. I'm going to test with 10.2B07 and 11.1 first. If proutil is still oblivious to CLOBs in those releases I'll bug support again as PSC is in a listening mood at the moment... :)
 
CLOB fields? I thought code page is defined at the database level, via cpinternal.

CLOB fields can have their own codepage, independent of cpinternal or database codepage.
Look for documentation on the CLOB-CODEPAGE option for database definitions and the COLUMN-CODEPAGE option for temp-table definitions.
 
There was a discussion on PEG last April about BLOB vs CLOB. It sort of just died out, I couldn't figure out why I would want to limit myself to a CLOB.

COPY-LOB does all conversions equally regardless of the field being a CLOB or BLOB - assuming your -cpinternal is utf-8 - otherwise you cannot even get the utf-8 file into the CLOB whereas the BLOB only has an issue when trying display the retrieved result.

The test text file was created with notepad and saved as ansi and as utf-8 with a hello <euro sign> world content.

Code:
[FONT=arial]DEFINE TEMP-TABLE tt NO-UNDO[/FONT]
[FONT=arial]   FIELD bbu AS [/FONT][FONT=arial]BLOB[/FONT]
[FONT=arial]   FIELD ccu AS [/FONT][FONT=arial]CLOB[/FONT]

[FONT=arial]   FIELD bba as [/FONT][FONT=arial]BLOB[/FONT]
[FONT=arial]   FIELD cca as [/FONT][FONT=arial]CLOB[/FONT]
[FONT=arial]   .[/FONT]

[FONT=arial]DEF VAR lbbu AS LONGCHAR.[/FONT]
[FONT=arial]DEF VAR lbba AS LONGCHAR.[/FONT]
[FONT=arial]DEF VAR lccu AS LONGCHAR.[/FONT]
[FONT=arial]DEF VAR lcca AS LONGCHAR.[/FONT]

[FONT=arial]CREATE tt.[/FONT]
[FONT=arial]COPY-LOB FROM FILE "c:/temp/test.utf.txt" TO tt.bbu. /* cpinternal is [/FONT][FONT=arial]irrelevant */[/FONT]
[FONT=arial]COPY-LOB FROM FILE "c:/temp/test.utf.txt" TO tt.ccu. /* cpinternal must be[/FONT] [FONT=arial]utf-8 otherwise error 12009 */[/FONT]

[FONT=arial]COPY-LOB FROM tt.bbu TO lbbu.[/FONT]
[FONT=arial]COPY-LOB FROM tt.ccu TO lccu.[/FONT]

[FONT=arial]COPY-LOB FROM FILE "c:/temp/test.ansi.txt" TO tt.bba.[/FONT]
[FONT=arial]COPY-LOB FROM FILE "c:/temp/test.ansi.txt" TO tt.cca.[/FONT]

[FONT=arial]COPY-LOB FROM tt.bba TO lbba.[/FONT]
[FONT=arial]COPY-LOB FROM tt.cca TO lcca.[/FONT]

[FONT=arial]MESSAGE[/FONT]
[FONT=arial]   STRING(lbbu) skip[/FONT]
[FONT=arial]   STRING(lccu) SKIP[/FONT]
[FONT=arial]   STRING(tt.ccu) skip(1)[/FONT]

[FONT=arial]   STRING(lbba) skip[/FONT]
[FONT=arial]   STRING(lcca) skip[/FONT]
[FONT=arial]   STRING(tt.cca)[/FONT]
[FONT=arial]VIEW-AS ALERT-BOX.[/FONT]
 
Back
Top