[Progress Communities] [Progress OpenEdge ABL] Forum Post: RE: Read a PDF File using Progress

David Abdala · Oct 4, 2019

Sure, but you are talking about a "full" and "generic" reading implementation. I'm not. When you need to get some specific info from a PDF, in a specific PDF version, things get a lot easier. I've done more than five of these implementations and, as you state, there are lots of complexities if you plan to make it fully standard compliant (Gus is referring to this too).

To be clearer: you and Gus are absolutely right about the amount of work required to get anything from a PDF in a fully standard-compliant way. In most situations where you want "something" from a PDF, that is not the case. Usually you need to extract a specific piece of information from a specific PDF format, which makes things a lot easier.

To make things even easier, there are free tools that transform PDF streams (I don't remember which one I'm using right now). They convert "any" PDF format to the simpler "older" formats with unencoded streams, from which getting "text" becomes a simpler matter.

I'm not saying an implementation like this is easy or fast. Getting to some understanding of the PDF standard alone requires a couple of weeks.

"Text extraction" is not implemented in ABLPDF (not even close), just as it wasn't implemented in PDFInclude, and not much additional work has been done on it besides "moving" the code to an OO version. But you can use what PDFInclude already does (primarily metadata extraction) to retrieve the streams from which to get the text you are looking for.

Easy? No.
Challenging? Absolutely.
Recommended? NO.
Solution? Ask for the data in a format other than PDF; it was not designed for data exchange.
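To illustrate why unencoded streams make the job simpler, here is a minimal sketch in Python (not ABL, and not the code from ABLPDF or PDFInclude). It assumes the PDF has already been rewritten with uncompressed content streams (qpdf's `--qdf` mode is one tool that can do this, though not necessarily the one mentioned above) and that the text you want is shown with plain literal strings and the `Tj` operator. The function names are hypothetical, and real files often use `TJ` arrays, hex strings, or custom font encodings that this does not handle.

```python
import re

# Body between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# A literal string operand "(...)" followed by the Tj (show text) operator.
# Assumption: handles backslash escapes, but not nested unescaped parentheses.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def unescape(raw: bytes) -> str:
    """Undo only the simple escapes \\( \\) and \\\\ (a deliberate simplification)."""
    return re.sub(rb"\\([()\\])", rb"\1", raw).decode("latin-1")


def extract_text(pdf_bytes: bytes) -> list[str]:
    """Return the strings shown with Tj in every unencoded stream, in file order."""
    texts = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        for match in TJ_RE.finditer(stream.group(1)):
            texts.append(unescape(match.group(1)))
    return texts


if __name__ == "__main__":
    # Hypothetical fragment of an uncompressed PDF, for demonstration only.
    sample = (
        b"4 0 obj\n<< /Length 44 >>\nstream\n"
        b"BT /F1 12 Tf 72 720 Td (Invoice 1234) Tj ET\n"
        b"endstream\nendobj\n"
    )
    print(extract_text(sample))
```

This only works because the target format is known in advance, which is the whole point: once you know which stream and which operators carry the value you need, a targeted scan is enough, while a generic reader would need the full standard.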

