Unicode from Google Maps.

ferver

I retrieve some information from Google Maps to validate and geocode an address, but Progress (10.1B03) has problems translating the UTF-8 characters. This is an example (without any error handling, to keep it simple) that you can test in the Editor.

The right answer must be 3580 Jose (with a tilde ´ on the e) Antonio .......

Thanks for any help on this.

/*****************************************/
DEFINE VARIABLE hParser  AS HANDLE    NO-UNDO.
DEFINE VARIABLE hhandler AS HANDLE    NO-UNDO.
DEFINE VARIABLE clee     AS CHARACTER NO-UNDO.
DEFINE VARIABLE pdirec   AS CHARACTER NO-UNDO.

CREATE SAX-READER hParser.
hParser:HANDLER = THIS-PROCEDURE.
hParser:SET-INPUT-SOURCE("FILE", "http://maps.google.com/maps/geo?q=3580+cabrera+buenos+aires+argentina&output=xml&key=abcdefg").
hParser:SAX-PARSE() NO-ERROR.
DELETE OBJECT hParser.

PROCEDURE StartElement:
  DEFINE INPUT PARAMETER namespaceURI AS CHARACTER.
  DEFINE INPUT PARAMETER localName    AS CHARACTER.
  DEFINE INPUT PARAMETER qname        AS CHARACTER.
  DEFINE INPUT PARAMETER attributes   AS HANDLE.

  clee = localName.
END PROCEDURE.

PROCEDURE Characters:
  DEFINE INPUT PARAMETER charDataA AS MEMPTR  NO-UNDO.
  DEFINE INPUT PARAMETER numChars  AS INTEGER NO-UNDO.
  DEFINE VARIABLE charData AS MEMPTR NO-UNDO.

  CASE clee:
    WHEN "address" THEN
      pdirec = pdirec + GET-STRING(charDataA, 1, GET-SIZE(charDataA)).
  END CASE.
END PROCEDURE.

PROCEDURE EndElement:
  DEFINE INPUT PARAMETER pcNamespaceURI AS CHARACTER NO-UNDO.
  DEFINE INPUT PARAMETER pcLocalName    AS CHARACTER NO-UNDO.
  DEFINE INPUT PARAMETER pcQName        AS CHARACTER NO-UNDO.

  CASE clee:
    WHEN "address" THEN
      MESSAGE pdirec VIEW-AS ALERT-BOX INFORMATION.
  END CASE.
  clee = "".
END PROCEDURE.
/***************************************************/
 
Of course, I forgot to mention: the internal code page is ISO8859-1 and the stream code page is ISO8859-1.

The same problem appears even if you are not connected to a database.

Correct me if I'm wrong: the XML comes from Google encoded in UTF-8, and the Progress parser must convert the data to the session code page.
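
To make that conversion concrete, here is a minimal sketch that builds the UTF-8 bytes of "José" by hand and converts them with the CODEPAGE-CONVERT function. It assumes the character in question is é (e with an acute accent, U+00E9), which UTF-8 encodes as the two bytes 0xC3 0xA9. The variable names are made up for illustration, and this has not been run against 10.1B03.

```abl
/* Sketch only: variable names are hypothetical. */
DEFINE VARIABLE mUtf8  AS MEMPTR    NO-UNDO.
DEFINE VARIABLE cRaw   AS CHARACTER NO-UNDO.
DEFINE VARIABLE cFixed AS CHARACTER NO-UNDO.

/* The five UTF-8 bytes of "José": 4A 6F 73 C3 A9. */
SET-SIZE(mUtf8) = 5.
PUT-BYTE(mUtf8, 1) = 74.   /* J */
PUT-BYTE(mUtf8, 2) = 111.  /* o */
PUT-BYTE(mUtf8, 3) = 115.  /* s */
PUT-BYTE(mUtf8, 4) = 195.  /* 0xC3, first byte of é in UTF-8  */
PUT-BYTE(mUtf8, 5) = 169.  /* 0xA9, second byte of é in UTF-8 */

/* GET-STRING copies the bytes as they are; no code page conversion here. */
cRaw = GET-STRING(mUtf8, 1, 5).

/* Declare the bytes to be UTF-8 and ask for the session's internal code page. */
cFixed = CODEPAGE-CONVERT(cRaw, SESSION:CPINTERNAL, "UTF-8").

MESSAGE "Raw:" cRaw SKIP "Converted:" cFixed VIEW-AS ALERT-BOX INFORMATION.

SET-SIZE(mUtf8) = 0.
```

On an ISO8859-1 session the raw string would be expected to show two characters where the é belongs, because each UTF-8 byte is displayed as its own Latin-1 character. The same CODEPAGE-CONVERT call could be applied to the string read in the Characters callback.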

If you use IE, XML Notepad, or many similar tools, the data appears correctly, but with Progress the SAX parser "eats" some characters. I tried with DOM X-documents, with the same results.

Thanks for your answer. Do you think this could be a bug? It sounds too basic to be a bug.
 
If you try my example, please change 3580 to 3581 or 3582, ....; too many requests to Google Maps without a valid key get the address blocked.

Thanks.
 
I'm not sure which code page is being used here ... my guess would be stream. Does 8859-1 have e tilde? From what I can see here http://publib.boulder.ibm.com/bookmgr/pictures/qb3aq501.p3316z.gif it doesn't. So, you have a fundamental problem. Even if you use UTF-8 for the stream, what is it supposed to map to in 8859-1?

So, for starters, you are going to have to use UTF-8 to even receive such data and then once you get it, you are going to have to decide what you want to do with what you got. To preserve it, you are going to have to move the database and everything else to UTF-8 or some other Unicode standard that has all the characters you expect.

FWIW, super reference on code pages here: http://www.i18nguy.com/unicode/codepages.html
 
If I understand correctly, your problem is that your session code page is Latin-1 (ISO8859-1) for both cpstream and cpinternal.
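
To confirm which code pages the session is really using, a quick check along these lines can be run in the Editor. SESSION:CPINTERNAL and SESSION:CPSTREAM are standard session attributes; the message layout is only illustrative.

```abl
/* Show the session's internal and stream code pages. */
MESSAGE "cpinternal:" SESSION:CPINTERNAL SKIP
        "cpstream:"   SESSION:CPSTREAM
    VIEW-AS ALERT-BOX INFORMATION.
```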

The accented characters are converted into ? (Question marks).

I don't think that OpenEdge is actually converting the characters into the session code page; the ABL is just doing a best match.

I think the solution is:

Code:
PROCEDURE Characters:
  DEFINE INPUT PARAMETER charDataA AS MEMPTR  NO-UNDO.
  DEFINE INPUT PARAMETER numChars  AS INTEGER NO-UNDO.

  DEFINE VARIABLE mpTEMPMEMPTR AS MEMPTR NO-UNDO.

  /**define var charData as memptr no-undo. WHY IS THIS HERE? **/

  SET-SIZE(mpTEMPMEMPTR) = 0.
  CASE clee:
    WHEN "address" THEN DO:
      /* Convert the UTF-8 bytes from the parser to the session code page. */
      COPY-LOB FROM OBJECT charDataA TO OBJECT mpTEMPMEMPTR
        CONVERT SOURCE CODEPAGE "UTF-8" TARGET CODEPAGE "ISO8859-1".
      pdirec = pdirec + GET-STRING(mpTEMPMEMPTR, 1, GET-SIZE(mpTEMPMEMPTR)).
    END.
  END CASE.

  /* Release the temporary buffer. */
  SET-SIZE(mpTEMPMEMPTR) = 0.
END PROCEDURE.

I have not tested this myself, so I don't know if it's going to work. Sometimes characters just cannot be mapped between code pages, and that is why you end up with question marks.
 
Sometimes characters just cannot be mapped between code pages

Yes, the bottom line here is that e tilde is not a character in the 8859-1 code page, so one needs to do one of the following:
* change the code page one is using to store the data;
* provide a convmap, e.g., from e tilde to e; or
* accept the ?
That pretty much exhausts the options.
 