English & Russian in the same database

umityaz

New Member
We have a database created in a local code page (English). Without changing our program code, how can we use the same database for both English and Russian?
 

RealHeavyDude

Well-Known Member
You don't say anything about your Progress/OpenEdge version. This is important information, as Unicode support was not there from the beginning.

In a nutshell:

You're talking about different code pages. The only code page that allows you to correctly store characters from different code pages into one is unicode ( UTF-8 ). Therefore the only correct way is to have your database code page changed to UTF-8.

But there's more to it: the clients use code pages too. Most importantly, they use a -cpinternal and a -cpstream. If the conversion is possible ( say from ibm850 to iso8859-1, because the characters of one code page also appear in the other, just at different codes ) Progress does it automatically for you. If the conversion is not possible ( say from iso8859-1 to 1250 [ Polish ] ) then Progress won't.

This introduces another problem that can produce headaches of all sorts. When the database code page is Unicode and you have clients with different code pages connected to that same database and changing each other's data, that's when the nightmare really begins. Suppose a client using the iso8859-1 code page retrieves data that was entered using an Eastern European code page. The characters which are not present in iso8859-1 will be displayed as question marks. When the user then changes and saves the data, the question marks are written back and your data gets screwed up.
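The question-mark effect is not specific to Progress. As a rough illustration only, here is a minimal Python sketch of the same lossy round trip; Python stands in for the client here, and the sample string is made up.

```python
# Text as entered by a client using a Cyrillic code page (made-up sample data).
original = "Иванов, Moscow"

# A UTF-8 database can store it without loss.
stored = original.encode("utf-8")

# A client limited to iso8859-1 cannot represent the Cyrillic letters;
# errors="replace" substitutes a question mark for each of them.
seen_by_latin1_client = stored.decode("utf-8").encode("iso8859-1", errors="replace")
displayed = seen_by_latin1_client.decode("iso8859-1")

# If the user saves the record now, the question marks are what gets written back.
written_back = displayed.encode("utf-8")

print(displayed)
print(written_back == stored)
```

The Latin part survives and the Cyrillic part does not, which is why a mixed set of client code pages against one UTF-8 database is risky.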

The only solution for that is having your clients run Unicode too. Client support for UTF-8 was introduced in 10.0B, and only for the GUI client. But I wouldn't use 10.0B - you should at least be on 10.1A for that.

And then, Unicode does not come for free. There are all sorts of other things you need to be aware of, of which different sort orders are the least of the issues. Much more significant are the string functions like LENGTH ( ) and any code that assumes one byte per character. With a single-byte "English" code page the number of bytes always matches the number of characters. In UTF-8 it doesn't: plain ASCII characters still take one byte, but Cyrillic characters take two and some characters take up to four. That means not only will the database grow in size, you also need to check every place where your code counts, compares or cuts strings and decide whether it means bytes or characters.
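The gap between bytes and characters is easy to see outside of Progress. A minimal Python sketch (the helper name `utf8_sizes` and the sample strings are made up for illustration):

```python
def utf8_sizes(text):
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


# English text: one byte per character in UTF-8.
print(utf8_sizes("Moscow"))

# Russian text: two bytes per Cyrillic character in UTF-8.
print(utf8_sizes("Москва"))

# Mixed text: the byte count depends on which characters are present.
print(utf8_sizes("Москва 2010"))
```

Any field width, substring offset or buffer size in the application that was sized in bytes needs the same kind of check.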

HTH, RealHeavyDude.
 

umityaz

New Member
Thanks for your reply, and sorry for insufficient information.

Some of our clients use OpenEdge 10.1B and some use OpenEdge 10.2A. I could convert my database to UTF-8 and change the clients' -cpinternal and -cpstream parameters to UTF-8 (if there is a walkthrough for that, it would be appreciated).

But what worries me is the sorting. How will our indexes be sorted? Will they be in Russian order, or English, or UTF-8? When I retrieve data, will it be sorted properly? Can I change the sort order for each client to suit its language choice? Will that slow down sorting?

Also, do I have to change my source code to UTF-8 as well? If I change only the server database, will my clients be able to connect to our database without reinstalling OpenEdge, which has already been installed with the local code page?
 

RealHeavyDude

Well-Known Member
  1. Changing the code page of your database to UTF-8 is rather easy *) ( I think that was introduced around 9.1c or 9.1d ):
    • Make a backup of your database
    • Truncate the before image
    • Use the proutil <dbname> -C convchar convert UTF-8 utility to change the code page
  2. To change the client code pages you just need to supply the -cpinternal and -cpstream parameters to the client startup. To change this globally you can change these settings in the startup.pf which you will find in the installation directory. This file is created during installation and the parameters in it reflect the choice you made during the setup. This file is used by every Progress process you start from that installation. UTF-8 is one of the languages that gets installed automatically whether you choose it or not ( have a look in the prolang directory - you will find a utf directory ).
  3. The sort order is determined by the collation tables in the database. If you just change the code page these don't get changed - therefore your sort order will be the same as it was before; it might just be that it is not sufficient for clients using other code pages. You can load a different collation ( have a look, in the utf directory you will find a lot of ICU-*.df files for different sort orders for different languages ) into your database, but, AFAIK, it can only have one - just like the Highlander.
  4. You also need to take into account that the CONTAINS operator uses the word break rules in your database - which are undefined for UTF-8 by default. Therefore you will get an error message whenever you try to execute a data retrieval statement containing the CONTAINS operator. For that you need to use the word break compiler ( I don't know off the top of my head how it works - look into the knowledge base ).
  5. For different sorting ( other than the default which is defined in the database's collation tables ) you can use the COLLATE option on data retrieval statements - but you need to change your code for that.
  6. Running your GUI client with -cpinternal UTF-8 and -cpstream UTF-8 requires ( AFAIK ) that you re-compile your application with the same settings. Otherwise you might get screwed up displays.
  7. If you just change the code page of the database to UTF-8 your clients will connect happily to it no matter which code page setting they use.
HTH, RealHeavyDude.

*) The proutil <dbname> -C convchar utility has an option charscan to scan for characters that you know are not present in the target code page - which is not relevant for UTF-8.
 

tamhas

ProgressTalk.com Sponsor
While it won't help you immediately, I know that PSC is working on sorting of multiple languages simultaneously. In the meantime, you might need a custom collation table, but that shouldn't be that bad to construct if all you need is English and Russian ... all you need to do is decide what the correct sort order is when they are mixed.
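To make "decide what the correct sort order is" concrete, here is a small Python sketch, not a Progress collation table. It compares plain code point order with one possible custom rule: Latin first, then Cyrillic in Russian alphabet order. The rule, the helper name `mixed_key` and the word list are assumptions chosen for illustration.

```python
# The 33 letters of the Russian alphabet in dictionary order.
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def mixed_key(word):
    """Sort key: non-Cyrillic characters first (by code point),
    then Cyrillic letters in Russian alphabet order."""
    key = []
    for ch in word.lower():
        pos = RUSSIAN.find(ch)
        if pos >= 0:
            key.append((1, pos))
        else:
            key.append((0, ord(ch)))
    return key


words = ["яблоко", "ёлка", "ель", "apple"]

# Code point order puts "ё" (U+0451) after "я" (U+044F).
by_code_point = sorted(words)

# The custom rule puts "ё" right after "е", where a Russian reader expects it.
by_custom_rule = sorted(words, key=mixed_key)

print(by_code_point)
print(by_custom_rule)
```

Whether Latin should sort before Cyrillic, or the two should be interleaved, is exactly the decision a custom collation table would have to encode.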
 

umityaz

New Member
Thanks for your replies. I followed the steps listed at the end of this reply to create a UTF-8 database and dump and load my database into it.

When I ran the command
proenv>c:\Progress\OpenEdge\bin\prowin32.exe -p c:\ProgramFolder\program.r
-db myDB -H localhost -U myUser -P myPassword -N TCP -S 20000 -E -ininame c:\ProgramFolder\program.ini -basekey INI -T c:\ProgramFolder\temp -s 20000 -cpinternal UTF-8 -cpstream UTF-8


it gave this error:

Collation table for code page UTF-8 and collation name LOCAL was not found in convmap.cp. (1043)

To solve that, I uninstalled OpenEdge 10.2A with the local code page and installed OpenEdge 10.2A with code page UTF-8. When I then ran a similar command, it gave me a similar error:
proenv>c:\Progress\OpenEdge\bin\prowin32.exe .... -cpinternal 1251 -cpstream 1251

Collation table for code page 1251 and collation name BASIC was not found in convmap.cp. (1043)

When I ran
proenv>c:\Progress\OpenEdge\bin\prowin32.exe .... -cpinternal UTF-8 -cpstream UTF-8

it worked fine.

Now I need a way to make my database work without uninstalling my clients' existing installations, which were set up on their PCs in their local language.

Also, because I reinstalled my OpenEdge 10.2A with code page UTF-8, I see lots of errors when I try to compile my source code. Do I have to write my code in code page UTF-8? If so, is there an easy way to convert my existing code to UTF-8? I have almost 800 .w files.


To create a UTF-8 database:
1) Dump the existing database.
2) Create a new empty UTF-8 database:
prodb <new-dbname> %DLC%\prolang\utf\empty.db
3) Compile a new version of the word break table for UTF-8 to a rule number <rule-num>:
proutil -C wbreak-compiler %DLC%\prolang\convmap\utf8-bas.wbt <rule-num>
where <rule-num> is a number between 1 and 255.
4) Place the newly created file proword.<rule-num> in %DLC%
5) Apply the new word rules to the database:
proutil <dbname> -C word-rules <rule-num>
6) Load the database.
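On the question above about roughly 800 .w files: whether the sources have to be converted at all is not settled in this thread. If it does turn out to be necessary, a batch conversion could look like the Python sketch below. It assumes the files are currently in Windows-1251; that is a guess, so pass the real source code page. The function names are made up, and the script should only be run once, on a copy of the source tree, because running it twice would convert already converted files again.

```python
from pathlib import Path


def convert_file(path, source_encoding="cp1251"):
    """Rewrite one file as UTF-8, assuming it is currently in source_encoding."""
    raw = Path(path).read_bytes()
    text = raw.decode(source_encoding)
    # Bytes in, bytes out: line endings are left exactly as they were.
    Path(path).write_bytes(text.encode("utf-8"))


def convert_tree(root, pattern="*.w", source_encoding="cp1251"):
    """Convert every matching file under root; returns the paths converted."""
    converted = []
    for path in sorted(Path(root).rglob(pattern)):
        convert_file(path, source_encoding)
        converted.append(path)
    return converted
```

For example, `convert_tree(r"c:\ProgramFolder_copy")` would process every .w file under that (hypothetical) copy of the program folder.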
 

RealHeavyDude

Well-Known Member
When you install OpenEdge you need to make sure that the languages you use are installed too. In your case this would be English and Russian. UTF-8 gets installed by default - you don't need to do an installation with UTF-8. During installation the convmap.cp file (which you find in the installation directory) is created. It contains the information used to handle internationalization behind the scenes.

To get an understanding you should check out this knowledge base article:


This might also be helpful:


HTH, RealHeavyDude.
 

RealHeavyDude

Well-Known Member
Sorry - my fault. Best is if you look up the knowledge base with the following search strings

"I18N. How CONVMAP Relates to Code Page and Collation Settings"

and

"How to add a collation table within convmap.cp for a specified code page specified"


HTH, RealHeavyDude.
 