How to read UTF-16 file with Progress 9.1E?

andre42 · Dec 29, 2009

Hi everybody,

I am currently trying to process a UTF-16 text file with Progress 9.1E.
At first I thought this would be easy, just

Code:

input stream ImStream from ... convert source 'utf-16':U

but then I found that UTF-16 is only supported starting with OpenEdge. Unfortunately the particular customer is running Progress 9.1E (and no, an upgrade is not an option at the moment - maybe in one or two years).
(cpinternal is 1252 if that matters.)

I tried to open the stream with UCS2 instead, but only the first three bytes (BOM and first character) are read, the following zero byte seems to terminate the input. A second import statement does not read anything.

I also tried to create a convmap.cp for Progress 9.1E using the prolang source files from OpenEdge but 9.1E seems to lack the necessary support.

Now I am trying to read the file manually, i.e. using

Code:

input stream ImStream from ... binary.

and a memptr variable. This seems to succeed in reading the file, but I am still having trouble converting it. I would like to try codepage-convert from UCS2 to 1252, but codepage-convert only works on characters, and when I convert the memptr variable to a character variable it is again terminated by the zero byte.

My next idea is to use the Win32 API (WideCharToMultiByte looks promising), although I am not that familiar with it and it seems like overkill.

Any good suggestions on what I might try? I don't really want to implement a UTF-16 parser in 4GL.

Regards,

André
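
Editor's illustration, in Python rather than 4GL: a minimal sketch of why a zero byte gets in the way, assuming (as the post describes) that the conversion to a character value stops at the first zero byte. The helper `truncate_at_nul` is hypothetical and only mimics that behaviour; it is not a Progress function.

```python
text = "Hi"

# Encode with an explicit byte order so that no BOM is prepended.
little_endian = text.encode("utf-16-le")  # b'H\x00i\x00'
big_endian = text.encode("utf-16-be")     # b'\x00H\x00i'


def truncate_at_nul(data: bytes) -> bytes:
    """Hypothetical helper: keep only the bytes before the first zero byte,
    the way a NUL-terminated string would be read."""
    return data.split(b"\x00", 1)[0]


# Every ASCII character carries a zero byte in UTF-16, so a
# NUL-terminated reading keeps at most one character.
print(little_endian, "->", truncate_at_nul(little_endian))
print(big_endian, "->", truncate_at_nul(big_endian))
```

In both byte orders the data has to be handled as raw bytes until it has been converted to a single-byte code page.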

Cecil Snodgrass · Dec 29, 2009

Hi Andre,

You didn't mention where the source data file came from, may not matter.

Have you tried using a copy of the OE10 convmap.cp file as a temporary replacement for V9?

TomBascom · Dec 29, 2009

I think you should reconsider the "no upgrade" policy. It seems kind of silly to waste a bunch of time and effort on kludges when a simple upgrade would solve your problem.

andre42 · Dec 30, 2009

Hi Cecil,

Cecil Snodgrass said:
You didn't mention where the source data file came from, may not matter.

Have you tried using a copy of the OE10 convmap.cp file as a temporary replacement for V9?

The file is produced by another piece of software. It seems that older versions of that software wrote normal ASCII files. After an upgrade the software now writes UTF-16 files. (At least that's what I deduced from a sample.)

I tried exchanging the convmap.cp but that gives an error message about a version mismatch. (I expected something like that.)

@Tom: The problem is that the version of our application the customer is using is only supported with Progress 9.1E. Personally I think that it will run with OpenEdge and that we would be able to correct the few resulting issues. But since our application is somewhat complex (for that release: > 1,000 tables, > 13,000 source code files, > 300 MB source code) and mission critical (ERP system), nobody is eager to try this. (And I am in no position to overrule this.)
In another case (the customer wanted to use web services which are only supported from a certain OpenEdge version) we did an update to a newer version of our product to be able to switch to OpenEdge, but that is a much better reason to upgrade and the customer wanted to do the upgrade. (Besides, all of this means that costs are incurred for the customer.)

Cecil Snodgrass · Dec 30, 2009

You should be able to read the file with Windows Notepad and save it with ANSI encoding instead of Unicode Big Endian. On Linux, use iconv.

andre42 · Dec 30, 2009

That's the workaround the customer is currently using. (Loading and saving with notepad or some other editor.)
The interface to the other software is supposed to be automated - and it was until the switch to UTF-16 (big endian btw.).

I have now solved this by reading the whole file into a memptr buffer, converting it to ANSI into another memptr buffer with WideCharToMultiByte, saving that buffer to another file, and then finally reading the converted file normally. A bit awkward, but at least I didn't reinvent the wheel, it is automated, and it is independent of external components (given a Windows OS).

Thanks anyway,

André
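
Editor's illustration of the steps described above (read the whole file as bytes, convert to a single-byte code page, write a second file, then read that file normally). This is a sketch in Python, not the 4GL/Win32 code actually used in the thread; the function name `utf16_file_to_cp1252`, its parameters, and the sample data are assumptions made for the example.

```python
import codecs
import os
import tempfile


def utf16_file_to_cp1252(src_path, dst_path, default_encoding="utf-16-be"):
    """Convert a UTF-16 file to Windows-1252 and return the bytes written.

    Assumption: if the file has no BOM, it is treated as big endian,
    matching the files described in the thread.
    """
    with open(src_path, "rb") as src:
        raw = src.read()  # whole file in one buffer, like the memptr

    # Use the BOM, if present, to pick the byte order, and strip it.
    if raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[2:].decode("utf-16-be")
    elif raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[2:].decode("utf-16-le")
    else:
        text = raw.decode(default_encoding)

    # Characters that do not exist in code page 1252 become "?".
    converted = text.encode("cp1252", errors="replace")
    with open(dst_path, "wb") as dst:
        dst.write(converted)
    return len(converted)


if __name__ == "__main__":
    # Build a small big-endian sample file with a BOM, then convert it.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in_utf16.txt")
    dst = os.path.join(workdir, "out_1252.txt")
    with open(src, "wb") as f:
        f.write(codecs.BOM_UTF16_BE + "Müller;42\r\n".encode("utf-16-be"))
    print(utf16_file_to_cp1252(src, dst), "bytes written to", dst)
```

The converted file contains one byte per character and no zero bytes, so a 1252 session can read it with an ordinary input stream.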
