Process compressed XML response

Potish

Member
I have an application that sends an XML request document to an external vendor. The vendor then responds with an XML document. The response can be relatively large (> 5 MB) and therefore takes a long time to arrive over the WAN. The vendor recommended adding 'Accept-Encoding: gzip' to the request, which allows them to compress the response. The compressed file is much smaller. However, I am not familiar with how to uncompress the new response and convert it back to an XML file I can process using Progress 4GL. Has anyone developed a solution for a similar process and can share ideas? I am running Progress OpenEdge 11 on a Windows 2008 Server.


Code I am running looks as follows

Code:
DEFINE INPUT PARAMETER pmop-file      AS MEMPTR    NO-UNDO. /* Memptr of XML request file */

DEFINE VARIABLE vcposturl      AS CHARACTER NO-UNDO.
DEFINE VARIABLE vcfilepath_ZIP AS CHARACTER NO-UNDO.

DEFINE VARIABLE vcHost    AS CHARACTER    INITIAL "xml.vendor.com" NO-UNDO.
DEFINE VARIABLE vcPort    AS CHARACTER    INITIAL "7080"                NO-UNDO.
DEFINE VARIABLE vhSocket  AS HANDLE                                    NO-UNDO.

ASSIGN
    vcposturl      = "http://xml.vendor.com"
    vcfilepath_ZIP  = "C:/.../response.zip".

 
CREATE SOCKET vhSocket.

vhSocket:CONNECT('-H ' + vcHost + ' -S ' + vcPort) NO-ERROR.
 
IF vhSocket:CONNECTED() = FALSE THEN
DO:
    MESSAGE "Vendor Connection failure" VIEW-AS ALERT-BOX.
    RETURN.
END.
ELSE 
DO:
    MESSAGE "Vendor Connection successful" VIEW-AS ALERT-BOX.
END.         

vhSocket:SET-READ-RESPONSE-PROCEDURE('getResponse').

RUN PostRequest (INPUT vcposturl).

WAIT-FOR READ-RESPONSE OF vhSocket.
vhSocket:DISCONNECT() NO-ERROR.

DELETE OBJECT vhSocket.

/* QUIT. */
RETURN.

PROCEDURE getResponse:
 
    DEFINE VARIABLE vcWebResp    AS LONGCHAR        NO-UNDO.
    DEFINE VARIABLE vcXMLExtract AS LONGCHAR        NO-UNDO.
    DEFINE VARIABLE viWebRespIdx AS INTEGER          NO-UNDO.
    DEFINE VARIABLE lSucess      AS LOGICAL          NO-UNDO.
    DEFINE VARIABLE mResponse    AS MEMPTR          NO-UNDO.
 
    IF vhSocket:CONNECTED() = FALSE THEN do:
        MESSAGE "Vendor getResponse Not Connected" VIEW-AS ALERT-BOX.
        RETURN.
    END.
    lSucess = TRUE.     
     
    DO WHILE vhSocket:GET-BYTES-AVAILABLE() > 0: 
        SET-SIZE(mResponse) = vhSocket:GET-BYTES-AVAILABLE() + 1.
        SET-BYTE-ORDER(mResponse) = BIG-ENDIAN.
        vhSocket:READ(mResponse,1,1,vhSocket:GET-BYTES-AVAILABLE()).
        vcWebResp = vcWebResp + GET-STRING(mResponse,1).   
    END.
     
    /* Code below saves response to file. However I don't think I have this correct as the zip file gives errors when I try to extract files from it */   
    vcXMLExtract = vcWebResp.
    OUTPUT TO VALUE(vcfilepath_ZIP).
    EXPORT vcXMLExtract.
    OUTPUT CLOSE.
   
    /* Need ideas on how to process zip file to uncompress and extract XML file(s) for processing using progress */
       
END.

PROCEDURE PostRequest:
    DEFINE INPUT PARAMETER postUrl AS CHAR.

    DEFINE VARIABLE vcRequest      AS CHARACTER.
    vcRequest = 'POST ' + postUrl + ' HTTP/1.1~r~n' +
                'Host: xml.vendor.com~r~n' +
                'Content-Type: application/x-www-form-urlencoded~r~n' +
                'Content-Length: ' + STRING(LENGTH(STRING(pmop-file))) + '~r~n'  +
                'Accept-Encoding: gzip ' + '~r~n'  +
                '~r~n' +
                STRING(pmop-file) + '~r~n'.

    SET-SIZE(pmop-file)            = 0.
    SET-SIZE(pmop-file)            = LENGTH(vcRequest) + 1.
    SET-BYTE-ORDER(pmop-file)      = BIG-ENDIAN.
    PUT-STRING(pmop-file,1)        = vcRequest .
 
    vhSocket:WRITE(pmop-file, 1, LENGTH(vcRequest)).
   
    vhSocket:WRITE(pmop-file, 1, GET-SIZE(pmop-file)).
     
   
END PROCEDURE.
 
You can do two things:
  1. Roll your own HTTP client and make calls to zlib1.dll to handle the decompression of the stream as part of the HTTP response.
  2. Shell out to the OS and make command-line calls to curl.exe. cURL is going to handle the transport of requests and responses better than what can be written in the ABL.
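
Option 2 could look roughly like the sketch below. It assumes curl.exe is on the PATH, that curl was built with compression support, and that the request has already been written to a file; request.xml and response.xml are placeholder names, and the host and port are taken from the original post. curl's --compressed option asks the server for a compressed response and decompresses it before writing the output file, so no zlib call is needed on the ABL side.

Code:
/* Sketch only: file names are placeholders and curl.exe is assumed to be on the PATH. */
DEFINE VARIABLE cCommand AS CHARACTER NO-UNDO.

/* --data-binary posts the file unchanged; --compressed requests a compressed
   response and has curl decompress it before it is written to --output */
cCommand = 'curl.exe --silent --compressed'
         + ' --data-binary "@request.xml"'
         + ' --output "response.xml"'
         + ' "http://xml.vendor.com:7080/"'.

OS-COMMAND SILENT VALUE(cCommand).

/* FULL-PATHNAME is unknown (?) when the file does not exist */
FILE-INFO:FILE-NAME = "response.xml".
IF FILE-INFO:FULL-PATHNAME = ? THEN
    MESSAGE "curl did not produce response.xml" VIEW-AS ALERT-BOX.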
I've bitched & moaned at Progress Software for years to incorporate a robust HTTP/S client handler. Their response was that you could write your own in the ABL, that's what sockets are for. :o(
 
Code:
&global zlib C:\windows\system32\zlib1.dll

PROCEDURE uncompress EXTERNAL "{&zlib}" CDECL PERSISTENT: /* PRIVATE */
    DEFINE INPUT        PARAMETER pDestBuf    AS MEMPTR NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER iDestSize  AS LONG NO-UNDO.
    DEFINE INPUT        PARAMETER pSourceBuf  AS MEMPTR NO-UNDO.
    DEFINE INPUT        PARAMETER iSourceSize AS LONG NO-UNDO.
    DEFINE RETURN PARAMETER iretcode AS LONG NO-UNDO.
END PROCEDURE.


FUNCTION DeCompressBuffer RETURNS INTEGER
        (INPUT  InputBuffer  AS MEMPTR,
        OUTPUT OutputBuffer AS MEMPTR,
        OUTPUT OutputSize  AS INTEGER):

  /* DeCompress a piece of memory and return a pointer to the decompressed data,
    in case of failure the size of decompressed data = -1
  */

  DEFINE VARIABLE InputSize  AS INTEGER NO-UNDO.
  DEFINE VARIABLE TempBuffer AS MEMPTR  NO-UNDO.
 
  DEFINE VARIABLE retcode AS INT NO-UNDO.

  InputSize  = GET-SIZE(InputBuffer).
  OutputSize = (InputSize * 100).
  SET-SIZE(TempBuffer) = OutputSize.

  RUN uncompress (TempBuffer,
                  INPUT-OUTPUT OutputSize,
                  InputBuffer,
                  InputSize,
                  OUTPUT retcode).
 
  IF retcode = 0 THEN
  DO:
    SET-SIZE(OutputBuffer) = OutputSize.
    OutputBuffer = GET-BYTES(TempBuffer, 1, OutputSize).
  END.
  ELSE
    OutputSize = -1.
  SET-SIZE(TempBuffer) = 0.

  RETURN retcode.
END FUNCTION. /* DeCompress Buffer */
 
I've seen this use of zlib before and wondered about the assumed compression factor of 100. It's oversimplified: wasteful for larger files, and theoretically too small for unusually compressible ones. Has anyone got a better version?
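
One possible alternative to the fixed factor is to start with a modest buffer and retry when zlib reports Z_BUF_ERROR (-5), which is the code uncompress returns when the destination is too small. The sketch below repeats the uncompress declaration from the post above so that it stands alone. DeCompressGrowing is a hypothetical name, and the starting size (4x the input, at least 64 KB) and the 512 MB ceiling are arbitrary assumptions.

Code:
/* Sketch only: same zlib1.dll declaration as in the post above. */
&global zlib C:\windows\system32\zlib1.dll

PROCEDURE uncompress EXTERNAL "{&zlib}" CDECL PERSISTENT:
    DEFINE INPUT        PARAMETER pDestBuf    AS MEMPTR NO-UNDO.
    DEFINE INPUT-OUTPUT PARAMETER iDestSize   AS LONG   NO-UNDO.
    DEFINE INPUT        PARAMETER pSourceBuf  AS MEMPTR NO-UNDO.
    DEFINE INPUT        PARAMETER iSourceSize AS LONG   NO-UNDO.
    DEFINE RETURN       PARAMETER iretcode    AS LONG   NO-UNDO.
END PROCEDURE.

/* DeCompressGrowing is a hypothetical name, not part of zlib or OpenEdge. */
FUNCTION DeCompressGrowing RETURNS INTEGER
        (INPUT  InputBuffer  AS MEMPTR,
         OUTPUT OutputBuffer AS MEMPTR,
         OUTPUT OutputSize   AS INTEGER):

  DEFINE VARIABLE TempBuffer AS MEMPTR  NO-UNDO.
  DEFINE VARIABLE BufSize    AS INTEGER NO-UNDO.
  DEFINE VARIABLE retcode    AS INTEGER NO-UNDO.

  ASSIGN
    BufSize = MAXIMUM(GET-SIZE(InputBuffer) * 4, 65536)
    retcode = -5.

  /* Z_BUF_ERROR (-5): destination too small, so double it and retry.
     The loop stops retrying once the buffer size passes 512 MB. */
  DO WHILE retcode = -5 AND BufSize <= 536870912:
    SET-SIZE(TempBuffer) = 0.
    SET-SIZE(TempBuffer) = BufSize.
    OutputSize = BufSize.

    RUN uncompress (TempBuffer,
                    INPUT-OUTPUT OutputSize,
                    InputBuffer,
                    GET-SIZE(InputBuffer),
                    OUTPUT retcode).

    IF retcode = -5 THEN
      BufSize = BufSize * 2.
  END.

  IF retcode = 0 THEN
  DO:
    /* copy only the bytes zlib actually produced */
    SET-SIZE(OutputBuffer) = OutputSize.
    OutputBuffer = GET-BYTES(TempBuffer, 1, OutputSize).
  END.
  ELSE
    OutputSize = -1.
  SET-SIZE(TempBuffer) = 0.

  RETURN retcode.
END FUNCTION.

The return value and output parameters match DeCompressBuffer, so it could be called the same way. Any error other than -5 (for example -3) ends the loop immediately.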
 
I've bitched & moaned at Progress Software for years to incorporate a robust HTTP/S client handler. Their response was that you could write your own in the ABL, that's what sockets are for. :o(

It's not called ADVANCED business language for nothing...
 
Thank you for the suggestion and sample code using zlib1.dll. I went ahead and set up a simple test to see if it would work with a zip file saved on my hard drive, and I keep getting a return code of -3 from the function DeCompressBuffer. According to the zlib manual, -3 indicates a data error:

#define Z_DATA_ERROR (-3)

Below is the sample code I added to the bottom of the code provided for my test.

Code:
DEFINE VARIABLE vczipfile    AS CHARACTER NO-UNDO.
DEFINE VARIABLE vcunzipfile  AS CHARACTER NO-UNDO.

DEFINE VARIABLE zipfileBuffer  AS MEMPTR  NO-UNDO.
DEFINE VARIABLE unzipfileBuffer AS MEMPTR  NO-UNDO.
DEFINE VARIABLE viOutputSize  AS INTEGER NO-UNDO.
DEFINE VARIABLE viSuccess      AS INTEGER NO-UNDO.

ASSIGN
    vczipfile    = "C:\sometest.zip"
    vcunzipfile  = "C:\sometest-out.xml".

FILE-INFO:FILE-NAME = vczipfile.
SET-SIZE(zipfileBuffer) = FILE-INFO:FILE-SIZE.

/* read zip file into MEMPTR */
INPUT FROM VALUE(vczipfile) BINARY NO-MAP NO-CONVERT.
IMPORT zipfileBuffer.
INPUT CLOSE.

viSuccess = DeCompressBuffer(zipfileBuffer, unzipfileBuffer, viOutputSize).
MESSAGE "Success = " STRING(viSuccess) "~r~n Outputsize = " STRING(viOutputSize) VIEW-AS ALERT-BOX.

/* write the decompressed MEMPTR out to an xml file, but only on success */
IF viSuccess = 0 THEN
DO:
    OUTPUT TO VALUE(vcunzipfile) BINARY NO-MAP NO-CONVERT.
    EXPORT unzipfileBuffer.
    OUTPUT CLOSE.
END.
 
I can see how you might think it would be possible to uncompress a ZIP file this way; however, ZIP is an archive format, which is more complex than a single compressed stream, and zlib won't be able to handle ZIP files directly. What you need to do to get your test code to work (possibly) is to download a free set of GNU tools for Windows, which contains a command-line executable called gzip.exe. Using gzip.exe, compress a file, then use your test code to try to decompress it.

In fact, it might be possible to use gzip.exe to uncompress the data stream sent from the web server, e.g. gzip.exe -d "<your compressed file>".
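
Called from the ABL, that could look something like the sketch below. It assumes gzip.exe is on the PATH and that the input file contains only the gzip data, with no HTTP header in front of it; response.gz and response.xml are placeholder names. Feeding the file through standard input with -c avoids relying on the file having a .gz suffix.

Code:
/* Sketch only: placeholder file names, gzip.exe assumed to be on the PATH. */
DEFINE VARIABLE cCompressed AS CHARACTER NO-UNDO INITIAL "response.gz".
DEFINE VARIABLE cPlain      AS CHARACTER NO-UNDO INITIAL "response.xml".

/* -d decompresses, -c writes to standard output, which is redirected to cPlain */
OS-COMMAND SILENT VALUE('gzip.exe -d -c < "' + cCompressed + '" > "' + cPlain + '"').

/* an absent or empty output file suggests gzip.exe rejected the input */
FILE-INFO:FILE-NAME = cPlain.
IF FILE-INFO:FULL-PATHNAME = ? OR FILE-INFO:FILE-SIZE = 0 THEN
    MESSAGE "gzip.exe produced no output for" cCompressed VIEW-AS ALERT-BOX.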

http://gnuwin32.sourceforge.net/
http://gnuwin32.sourceforge.net/packages/gzip.htm
http://gnuwin32.sourceforge.net/downlinks/gzip-bin-zip.php

I hope that makes sense. Someone else might be able to explain the ins and outs of compression better than I can. If all else fails, you still have curl.exe to fall back on.
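
To see which format a file or buffer actually holds, the first two bytes are usually enough. The sketch below is only a rough check, and DetectFormat is a hypothetical helper name. As far as I know, zlib's uncompress expects the zlib wrapper rather than a gzip or ZIP header, so a result of "gzip" or "zip" here would be consistent with the -3 return code.

Code:
/* DetectFormat is a hypothetical helper name. */
FUNCTION DetectFormat RETURNS CHARACTER (INPUT pData AS MEMPTR):
    DEFINE VARIABLE b1 AS INTEGER NO-UNDO.
    DEFINE VARIABLE b2 AS INTEGER NO-UNDO.

    IF GET-SIZE(pData) < 2 THEN
        RETURN "unknown".

    ASSIGN
        b1 = GET-BYTE(pData, 1)
        b2 = GET-BYTE(pData, 2).

    IF b1 = 31 AND b2 = 139 THEN RETURN "gzip".   /* 0x1F 0x8B */
    IF b1 = 80 AND b2 = 75  THEN RETURN "zip".    /* "PK"      */

    /* zlib header: deflate method (8) in the low nibble of byte 1, and the
       two bytes read as a 16-bit number are a multiple of 31 */
    IF b1 MODULO 16 = 8 AND (b1 * 256 + b2) MODULO 31 = 0 THEN RETURN "zlib".

    RETURN "unknown".
END FUNCTION.

/* Example: check the test file from the post above */
DEFINE VARIABLE mFile AS MEMPTR NO-UNDO.
COPY-LOB FROM FILE "C:\sometest.zip" TO OBJECT mFile.
MESSAGE DetectFormat(mFile) VIEW-AS ALERT-BOX.
SET-SIZE(mFile) = 0.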
 
Thank you for the clarification on the use of zlib1.dll. I think part of the reason the function returns error code -3 (data error) is that the MEMPTR passes some of the HTTP header information received with the response to the zlib1.dll procedure call. The text is as follows:

!ISO8859-1!HTTP/1.1 200 OK
Server: nginx
Date: Wed, 07 Aug 2013 04:05:57 GMT
Content-Type: text/xml
Content-Length: 89075
Connection: keep-alive
Access-Control-Allow-Origin: *
Content-Encoding: gzip

In the past, when working with ASCII XML files, I used SUBSTRING to remove this text from the response. However, now that the response is compressed and binary in nature, I am not sure SUBSTRING is a good option.
 
Run this procedure, postProcessDataFile, after the disconnect. It will strip the header from the HTTP stream and create a new file called httpcontent.blob.

Code:
PROCEDURE postProcessDataFile:
    /** Let's crack this egg and see what goodies are inside. **/
    DEFINE VARIABLE mSourceData AS MEMPTR      NO-UNDO.
   
    SET-SIZE(mSourceData) = 0.
    COPY-LOB FROM FILE 'sometest-out.xml' TO OBJECT mSourceData.

    DEFINE VARIABLE inBlobLength AS INTEGER    NO-UNDO.
    DEFINE VARIABLE inPos AS INTEGER    NO-UNDO.

    inBlobLength = GET-SIZE(mSourceData).

    HTTP-HEADER-BLOCK:
    /* stop 3 bytes early so the 4-byte read never runs past the end of the buffer */
    DO inPos = 1 TO inBlobLength - 3:
        IF GET-STRING(mSourceData, inPos, 4) = "~r~n~r~n" THEN
            LEAVE HTTP-HEADER-BLOCK.
    END.
   
    COPY-LOB FROM OBJECT mSourceData STARTING AT (inPos + 4) TO FILE 'httpcontent.blob'.

    RETURN.
END PROCEDURE.

I've modified the getResponse procedure to output to a file, as it is completely possible for this procedure to be called multiple times during a single session.
I also think there might have been a bug in the call to the READ method. It was only reading 1 byte at a time, causing many thousands of loop iterations. This might be why it was slow to download a 5 MB file.

Code:
PROCEDURE getResponse:

    DISPLAY "getting data".
    DEFINE VARIABLE vcWebResp    AS LONGCHAR        NO-UNDO.
    DEFINE VARIABLE vcXMLExtract AS LONGCHAR        NO-UNDO.
    DEFINE VARIABLE viWebRespIdx AS INTEGER        NO-UNDO.
    DEFINE VARIABLE lSucess      AS LOGICAL        NO-UNDO.
    DEFINE VARIABLE mResponse    AS MEMPTR          NO-UNDO.

    IF vhSocket:CONNECTED() = FALSE THEN do:
        MESSAGE "Vendor  Not Connected" VIEW-AS ALERT-BOX.
        RETURN.
    END.
    lSucess = TRUE.   
   
    DO WHILE vhSocket:GET-BYTES-AVAILABLE() > 0:
        SET-SIZE(mResponse) = 0.
        SET-SIZE(mResponse) = vhSocket:GET-BYTES-AVAILABLE() .
        SET-BYTE-ORDER(mResponse) = BIG-ENDIAN.
        vhSocket:READ(mResponse, 1, vhSocket:GET-BYTES-AVAILABLE()).
        COPY-LOB FROM OBJECT mResponse TO FILE 'sometest-out.xml' APPEND.
    END.
   
END.
 
You are spot on about the READ method possibly being the cause of the delay in receiving the XML response from the vendor, because it was reading 1 byte at a time. Using your updated getResponse procedure I am getting significantly better response times. I still need to work through making sure the entire XML document is received. Right now it seems like the socket gets disconnected after only part of the message has been received.

Thank you very much for your assistance and suggestions with this.
 
You might need to change the main block of the code to repeat. As I mentioned before, the READ-RESPONSE event can call the internal procedure multiple times. You need to make sure you capture all the data being sent from the web server, then leave the repeat block once the host server issues a disconnect.

Something like this should help:

Code:
  WAIT-SOCKET:
  REPEAT:

    PROCESS EVENTS.

    IF NOT VALID-HANDLE(hSocket) OR 
         NOT hSocket:CONNECTED() THEN
      LEAVE WAIT-SOCKET.

    /* see what comes back from the host, using ReadResponse */
    WAIT-FOR READ-RESPONSE OF hSocket.

  END.
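
The sample response header earlier in the thread shows "Connection: keep-alive", so the server may not disconnect right after sending. One alternative, sketched below, is to read Content-Length from the header text and treat the response as complete once that many bytes have arrived after the blank line that ends the header. GetContentLength is a hypothetical helper name, and the sketch assumes the header text has already been separated from the body.

Code:
/* GetContentLength is a hypothetical helper name. */
FUNCTION GetContentLength RETURNS INTEGER (INPUT cHeader AS CHARACTER):
    DEFINE VARIABLE iLine AS INTEGER   NO-UNDO.
    DEFINE VARIABLE cLine AS CHARACTER NO-UNDO.

    DO iLine = 1 TO NUM-ENTRIES(cHeader, "~n"):
        /* TRIM also removes the carriage return left at the end of each line */
        cLine = TRIM(ENTRY(iLine, cHeader, "~n")).
        /* "Content-Length:" is 15 characters, so the value starts at 16 */
        IF cLine BEGINS "Content-Length:" THEN
            RETURN INTEGER(TRIM(SUBSTRING(cLine, 16))).
    END.

    RETURN ?.   /* header not present */
END FUNCTION.

/* Example using header lines quoted earlier in the thread */
MESSAGE GetContentLength("HTTP/1.1 200 OK~r~nServer: nginx~r~nContent-Length: 89075~r~n")
    VIEW-AS ALERT-BOX.

This would not help if the server used chunked transfer encoding instead of Content-Length, which the sample header does not show.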
 
I was able to get a working version of the program to read the XML response data from the host using the pointers provided above. I need to clean up the program and will post the final working version. Thank you all for your suggestions.
 