Question Parsing out HTML content in text

Tracy Hall

New Member
I have some data that was embedded in an HTML table and stored in a field in the database.

Does anyone have advice on a strategy for stripping out the HTML and extracting the data?

I am doing this to work with Magento, so we are just bringing the data in and writing it out to a CSV file for the Magento import. It is just a matter of getting the data into a temp-table without the HTML tags.
 

TomBascom

Curmudgeon
Perhaps you could post some sample data?

(It would be best to wrap it in "code tags". That's the funny looking icon 5th from the right in the toolbar while you are posting...)

This is what I expect it to look like:
Code:
<table>
  <tr><td>row 1</td><td>field 1</td><td> field 2</td></tr>
  <tr><td>row 2</td><td>field 1</td><td> field 2</td></tr>
  <tr><td>row 3</td><td>field 1</td><td> field 2</td></tr>
</table>

But that's just a guess based on nothing other than knowing how an HTML table could be coded. There are lots of possible variations.
 

Tracy Hall

New Member
Code:
<table Border="1" Cellpadding="0" Cellspacing="0" Style="border-collapse: collapse ; font-size: 13px; " Bordercolor="#Ff6600" > 

<tr><td width="50%">&nbsp;  Starting Watts</td> <td width="50%" align="center" > 13500 Watts</td> </tr> 

<tr><td width="50%">&nbsp;  Running Watts</td> <td width="50%" align="center" > 8000 Watts</td> </tr> 

<tr><td width="50%">&nbsp;  Engine</td> <td width="50%" align="center" >15 Hp Briggs & Stratton Vanguard&#8482; Commercial Ohv </td> </tr> 

<tr><td width="50%">&nbsp;  Fuel Tank Capacity</td> <td width="50%" align="center" > 7 Gallon / 10 Hour Run Time At Half Load </td> </tr> 

<tr><td width="50%">&nbsp;  Electric Start </td> <td width="50%" align="center" >Yes (With Battery Included)</td> </tr> 

<tr><td width="50%">&nbsp;  Dimensions</td> <td width="50%" align="center" >28.1"L X 18.5"W X 26.75"H</td> </tr> 

<tr><td width="50%">&nbsp;  Alternator</td> <td width="50%" align="center" >Powersurge&#8482; </td> </tr> 

<tr><td width="50%">&nbsp;  Muffler </td> <td width="50%" align="center" >Lo-Tone Muffler</td> </tr> 
.............

This is a snippet of the data in the table. It is super frustrating that the data is so embedded in all that mess.

Thanks for any insight!
Tracy
 

Rodrigo RUbio

New Member
Hi Tracy,

There are a few PHP HTML parsers out there (Google them). I've used a couple in the past; I'll see if I can find one and update this post.

Rodrigo
 

Tracy Hall

New Member
Thanks Rodrigo,

I am sure I could find it. I was just trying to do it the Progress way. I am a Progress developer in training, and I think these tasks are meant to help me learn to write programs. Sometimes I hunt for the answers and only find them after I have come up with a not-so-graceful solution, like my SAX XML parsing. I think I have a web of 5 different procedures to put together 4 XML files and spit them out for my Magento load.

I wish I could pause time for a bit so I can find the answer. I will be glad when I have built up my tool box a bit.

I have about 4 1/2 - 5 months of experience under my belt.

Thanks for your help. If you find the PHP parser I will definitely take the info.

Thanks!
Tracy
 

Stefan

Well-Known Member
The SAX-parser will also handle HTML - the TM characters are causing some havoc with the stream, but that's a detail:

Code:
DEFINE VARIABLE p_hs       AS HANDLE      NO-UNDO.
DEFINE VARIABLE p_cxpath   AS CHARACTER   NO-UNDO.
DEFINE VARIABLE p_icolumn  AS INTEGER     NO-UNDO.

DEFINE TEMP-TABLE ttproperty NO-UNDO
   FIELD cname    AS CHAR FORMAT "x(16)"
   FIELD cvalue   AS CHAR FORMAT "x(16)" 
   .

CREATE SAX-READER p_hs.

p_hs:SET-INPUT-SOURCE( "file", "c:/temp/table.html" ).
p_hs:SAX-PARSE().

/* Release the reader once parsing is done. */
DELETE OBJECT p_hs.

RUN showtt.

PROCEDURE startElement:
   DEFINE INPUT PARAMETER i_cnamespaceuri AS CHARACTER NO-UNDO.  
   DEFINE INPUT PARAMETER i_clocalname    AS CHARACTER NO-UNDO.  
   DEFINE INPUT PARAMETER i_cqname        AS CHARACTER NO-UNDO.  
   DEFINE INPUT PARAMETER i_hattributes   AS HANDLE    NO-UNDO.
   
   p_cxpath = p_cxpath + "/" + i_clocalname.
   CASE p_cxpath:

      WHEN "/table/tr" THEN DO:
         CREATE ttproperty.
         p_icolumn = 0.
      END.
      WHEN "/table/tr/td" THEN DO:
         p_icolumn = p_icolumn + 1.
      END.

   END CASE.

END PROCEDURE.

PROCEDURE endElement:
   DEFINE INPUT PARAMETER i_cnamespaceuri AS CHARACTER NO-UNDO.  
   DEFINE INPUT PARAMETER i_clocalname    AS CHARACTER NO-UNDO.  
   DEFINE INPUT PARAMETER i_cqname        AS CHARACTER NO-UNDO.  

   IF p_cxpath = "/table/tr" THEN 
      RELEASE ttproperty.

   p_cxpath = SUBSTRING( p_cxpath, 1, R-INDEX( p_cxpath, "/":U ) - 1 ).

END PROCEDURE.

PROCEDURE characters:  
   DEFINE INPUT PARAMETER i_carray        AS LONGCHAR NO-UNDO.  
   DEFINE INPUT PARAMETER i_iarraylength  AS INTEGER  NO-UNDO.

   /* A SAX parser may deliver one text node in several chunks,
      so append to the field rather than overwrite it. */
   CASE p_cxpath:
      WHEN "/table/tr/td" THEN DO:
         CASE p_icolumn:
            WHEN 1 THEN ttproperty.cname  = ttproperty.cname  + i_carray.
            WHEN 2 THEN ttproperty.cvalue = ttproperty.cvalue + i_carray.
         END CASE.
      END.
   END CASE.

END PROCEDURE.

PROCEDURE showtt:

   FOR EACH ttproperty:
      DISPLAY
         ttproperty.
   END.

END PROCEDURE.
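
One caveat about the sample posted earlier in the thread: as stored, it is HTML rather than well-formed XML. `&nbsp;` is not an entity that XML predefines, and the bare `&` in "Briggs & Stratton" is not allowed either, so an XML parser would be expected to stop on them. Below is a minimal sketch of a cleanup pass. It assumes the fragment has been written to c:/temp/table.html as in the code above; the output file name c:/temp/table-clean.html is made up for illustration.

```abl
/* Sketch only: make the stored HTML fragment acceptable to an XML parser
   before handing it to the SAX-READER. Both paths are assumptions. */
DEFINE VARIABLE lcHtml AS LONGCHAR NO-UNDO.

/* Read the whole fragment into memory. */
COPY-LOB FROM FILE "c:/temp/table.html" TO lcHtml.

/* &nbsp; is an HTML entity that XML does not predefine. */
lcHtml = REPLACE( lcHtml, "&nbsp;", " " ).

/* A bare ampersand, as in "Briggs & Stratton", must be escaped. */
lcHtml = REPLACE( lcHtml, " & ", " &amp; " ).

/* Write the cleaned copy next to the original. */
COPY-LOB FROM lcHtml TO FILE "c:/temp/table-clean.html".
```

Pointing SET-INPUT-SOURCE at the cleaned file instead of the original should then let the parse run through. The two REPLACE calls only cover what appears in the sample; other HTML entities, or ampersands without spaces around them, would need their own handling.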
 