Answered D/L Unicode databases.

ron · Mar 24, 2013

AIX 7.1 -- OE10.2B05

In the next few weeks I have to migrate a DB to be Unicode. I am aware that "-cpinternal UTF-8" is required when re-indexing a Unicode DB. Is there anything else that is different when a Unicode DB is dumped/reloaded?

Second question: the Progress docs say: "When an existing database is converted to UTF-8, the amount of storage required by each non-ASCII character increases. Roughly, each non-ASCII Latin-alphabet character converted to UTF-8 tends to require two bytes, while each double-byte Chinese, Japanese, or Korean character converted to UTF-8 tends to require three bytes." How should I interpret that? Does it mean that each character in a char field in a single-byte DB (iso8859-1) becomes two bytes after Unicode conversion? Or does it become three bytes?

Third question: the size limit of a field is 32K. Is this limit 32K physical bytes, or 32K logical (i.e., Unicode) characters?

Ron.

Stefan · Mar 25, 2013

2. I would guess it varies depending on the content.

Code:

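/* Compare bytes used (RAW) with characters held (CHARACTER) in a UTF-8 LONGCHAR */
DEFINE VARIABLE lcc AS LONGCHAR NO-UNDO.

FIX-CODEPAGE(lcc) = "utf-8".

lcc = "hëllö".

MESSAGE
    LENGTH(lcc, "RAW") SKIP        /* 7: ë and ö take two bytes each */
    LENGTH(lcc, "CHARACTER")       /* 5 */
    VIEW-AS ALERT-BOX.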

3. I would guess 32K bytes.
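If the limit is counted in bytes, as guessed above, the number of characters a UTF-8 field can hold depends on what is in it. A minimal Python sketch, assuming a hypothetical 32,000-byte limit (the exact OpenEdge figure is not confirmed in this thread), counts how many leading characters of a string fit:

```python
# Hypothetical byte budget; the real OpenEdge field limit is not confirmed here.
BYTE_LIMIT = 32_000


def chars_that_fit(text: str, limit: int = BYTE_LIMIT, encoding: str = "utf-8") -> int:
    """Return how many leading characters of text fit within limit bytes."""
    used = 0
    for count, ch in enumerate(text):
        size = len(ch.encode(encoding))  # bytes this character needs
        if used + size > limit:
            return count
        used += size
    return len(text)


print(chars_that_fit("a" * 40_000))   # ASCII: 1 byte each
print(chars_that_fit("é" * 40_000))   # accented Latin: 2 bytes each in UTF-8
print(chars_that_fit("中" * 40_000))  # CJK: 3 bytes each in UTF-8
```

With that budget, pure ASCII fits 32,000 characters, accented Latin text 16,000, and Chinese text 10,666.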

RealHeavyDude · Mar 25, 2013

The code page (-cpstream) matters a great deal when you do an ASCII dump & load of a database. But since you mention the index rebuild, I am guessing that you are planning a binary dump & load. If that's the case, binary dump & load does nothing with code pages: it dumps and loads the data as is. Therefore you need to make sure that the source and target databases have the same code page, so as not to screw up your data.
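To see why a code page mismatch damages data, here is a small, generic Python illustration (not Progress-specific): the same bytes read back under a different code page give different text, and a raw byte copy has no way of noticing.

```python
# Bytes as a UTF-8 database would store them
utf8_bytes = "hëllö".encode("utf-8")

# Read back as if they were ISO-8859-1: each 2-byte sequence turns into 2 characters
as_latin1 = utf8_bytes.decode("latin-1")
print(as_latin1)  # hÃ«llÃ¶

# The reverse: ISO-8859-1 bytes are not valid UTF-8, so ë and ö become
# U+FFFD replacement characters when decoded with errors="replace"
latin1_bytes = "hëllö".encode("latin-1")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(as_utf8)
```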

How many bytes are needed per character depends very much on the character itself. You might want to have a look yourself here http://en.wikipedia.org/wiki/UTF-8.
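To check a few characters yourself, a small Python sketch (Python rather than ABL, purely for illustration) prints how many bytes UTF-8 uses for characters from different scripts; the sample characters are arbitrary choices:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


samples = [
    ("A", "ASCII letter"),
    ("é", "accented Latin letter (also in ISO-8859-1)"),
    ("€", "euro sign (not in ISO-8859-1)"),
    ("中", "Chinese ideograph"),
    ("😀", "emoji, outside the Basic Multilingual Plane"),
]

for ch, description in samples:
    print(f"U+{ord(ch):04X}  {utf8_len(ch)} byte(s)  {description}")
```

ASCII stays at one byte, é takes two, € and 中 take three, and the emoji takes four.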

Heavy Regards, RealHeavyDude.

ron · Mar 25, 2013

Thanks a lot! I didn't appreciate that UTF-8 keeps single-byte encoding for ASCII, and uses 2 or 3 bytes for other characters (for example, Chinese). That makes me a lot happier, because it means the conversion should not materially change the size of the database. It will only become an issue in the future, as the database collects new data in other languages.
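That holds as long as the existing data is mostly ASCII: every ISO-8859-1 character above 0x7F (é, ü, Å and so on) grows from one byte to two. A rough Python sketch, assuming the source data is ISO-8859-1 and using made-up sample values, estimates the extra bytes for a set of field values:

```python
def utf8_growth(text: str) -> int:
    """Extra bytes needed when ISO-8859-1 text is re-encoded as UTF-8.

    Every ISO-8859-1 character above 0x7F takes two bytes in UTF-8,
    so the result equals the number of non-ASCII characters.
    """
    before = len(text.encode("latin-1"))
    after = len(text.encode("utf-8"))
    return after - before


# Hypothetical field values standing in for a real table
rows = ["Smith", "Müller", "Ångström"]
print(sum(utf8_growth(r) for r in rows))  # 3 extra bytes
```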

From the looks of things I need to include -cpinternal, -cpstream, -cpcoll and -cpcase as startup parameters.

Ron.

urgent · Mar 26, 2013

I remember those days; I had to do a D/L to a different language.
This is what I had to do. In my example I'll refer to the old and new code pages as "oldcode" and "newcode":

1. Copy the source database you want to use over to the location of the target database.
2. Restore the database in a separate subdirectory using the new language code.
3. Start the database with the switches for the original code page ("oldcode" in this example).
4. Dump the definitions (.df) and the data (.d) for all tables (*), using the new target database name for the .df file name and selecting the UNDEFINED code page.
5. Once all the steps above are complete, delete the database you used for the D&L.
6. In the same directory, create a new empty database with procopy from empty8.db in the prolang directory for "newcode".

NOTE: At this point I have a new, empty "newcode" database, plus a data dump and a .df dump from the "oldcode" database (code page UNDEFINED for now, since I selected undefined).

Procopy the database and move it to the new final location using the "newcode" prolang codes:

procopy example.db newdb.db -cpinternal newcode -cpstream newcode -cpcoll OPTION
Then
prorest newdb.db example.db -cpinternal newcode -cpstream newcode -cpcoll OPTION
