Single pass EDI parsing without an XML schema - Possible?

jastium

Guest
Before you sigh and hold your head in your hands, please understand that I'm working with a pretty old system on a rather tight timeline.

We have a single pass EDI parser written in a business language. Currently, the data definitions including the loop level, area, and name of each segment are stored in a database table. This table also assigns each segment within an area an incremental sequence number. E.g., the 004010 810 Header area:

  Segment               Sequence
  BIG                   5
  NTE                   10
  CUR                   15
  REF                   20
  YNQ                   25
  PER                   30
  N1 (start of loop)    35
  N2                    40

etc. etc.

So, if you read the segments in the order that they appear in the standard, we can say that each one can be assigned a sequential number, a "depth" (how many loops "down" it appears) and a name (2-3 characters).
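To make the shape of that table concrete, here is a minimal Python sketch of one row and a bounded lookup. The class name, field names, the `find` helper, the area number and the depth values are all assumptions made for illustration; only the segment names and sequence numbers come from the 810 header excerpt above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDef:
    area: int      # assumed numbering: 1 = header
    sequence: int  # incremental sequence number within the area
    depth: int     # how many loops "down" the segment appears (assumed values)
    name: str      # 2-3 character segment name


# Rows taken from the 004010 810 header excerpt; depth values are guesses.
HEADER_810 = [
    SegmentDef(1, 5, 0, "BIG"),
    SegmentDef(1, 10, 0, "NTE"),
    SegmentDef(1, 15, 0, "CUR"),
    SegmentDef(1, 20, 0, "REF"),
    SegmentDef(1, 25, 0, "YNQ"),
    SegmentDef(1, 30, 0, "PER"),
    SegmentDef(1, 35, 1, "N1"),  # start of loop
    SegmentDef(1, 40, 1, "N2"),
]


def find(defs, name, area, low, high):
    """Return the lowest-sequence definition called `name` in `area`
    whose sequence lies within [low, high], or None if there is none."""
    for d in sorted(defs, key=lambda d: d.sequence):
        if d.area == area and d.name == name and low <= d.sequence <= high:
            return d
    return None
```

A call such as `find(HEADER_810, "REF", area=1, low=5, high=40)` is the kind of bounded search the rest of this post relies on.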

The algorithm followed by the parser at present is as follows:

Reset currentArea to 1
For each segment in the document
{
    Search for the segment's name in the table, restricting the search to area >= currentArea.
    If not found, we have an error.
    else
    {
        If the area changed
        {
            Empty the temporary "search bounds" table. Create a single record with lower bound equal to MIN(sequence in current area) and upper bound equal to MAX(sequence in current area).
        }
        If the area did not change
        {
            Search for the next segment with a matching name, but only within the bounds of the last "bounds" record created.
            If the segment is found and the loop level changed as a result
            {
                Create a new bounds record with lower bound = MIN(sequence in current loop) and upper bound = MAX(sequence in current loop).
            }
            If the segment is not found within the searched bounds
            {
                "Pop" a bounds record out of the table to widen the search, and repeat until a segment having the same name is found.
            }
        }
    }
}
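For readers who prefer something executable, the steps above can be sketched roughly as follows in Python. This is a simplification, not the production parser: it handles a single area, it takes the first match inside the current bounds rather than tracking a "next" position, and it pushes a bounds record whenever the matched segment's loop bounds differ from the top of the stack. The `loop` field, the sequence numbers, and the N9 row inside the W12 loop are assumptions for illustration; the segment order follows the 945 detail area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDef:
    sequence: int
    name: str
    loop: tuple  # enclosing loop headers, outermost first (hypothetical field)


# Hypothetical rows for the 945 detail area; sequence numbers are made up.
DETAIL_945 = [
    SegmentDef(5, "LX", ("LX",)),
    SegmentDef(10, "MAN", ("LX",)),
    SegmentDef(15, "PAL", ("LX",)),
    SegmentDef(20, "N9", ("LX",)),
    SegmentDef(25, "W12", ("LX", "W12")),
    SegmentDef(30, "G69", ("LX", "W12")),
    SegmentDef(35, "N9", ("LX", "W12")),        # assumed to exist in the W12 loop
    SegmentDef(40, "LX", ("LX", "W12", "LX")),  # inner W12.LX loop header
]


def loop_bounds(defs, loop):
    """MIN and MAX sequence over a loop and everything nested inside it."""
    seqs = [d.sequence for d in defs if d.loop[:len(loop)] == loop]
    return min(seqs), max(seqs)


def match_segments(names, defs):
    """Return the definition sequence matched for each incoming segment name."""
    stack = [loop_bounds(defs, ())]  # widest bounds record: the whole area
    matched = []
    for name in names:
        hit = None
        while hit is None:
            low, high = stack[-1]
            hit = next((d for d in defs
                        if d.name == name and low <= d.sequence <= high), None)
            if hit is None:
                if len(stack) == 1:
                    raise ValueError(f"segment {name} is not defined in this area")
                stack.pop()  # widen the search to the enclosing bounds
        bounds = loop_bounds(defs, hit.loop)
        if bounds != stack[-1]:
            stack.append(bounds)  # entered a different loop: narrow the search
        matched.append(hit.sequence)
    return matched
```

In this sketch, a name that exists in both an outer and an inner loop always resolves to the definition inside the innermost bounds currently on the stack.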


Unfortunately, I'm not sure that I have the time or the means to implement an XML-based solution using an actual document schema. I am currently researching several such parsers, and they seem to be able to magically arrange EDI according to the schema, no matter how it looks.

The problem I'm facing is this:

In the 945 document, the Detail area looks like this (excerpt):

<DETAIL>
<LX>
<MAN>
<PAL>
<N9>
<W12 (loop header)>
<G69>
...
<miscellaneous other segments>
...
<LX (loop header)>
...
<miscellaneous other segments in LX loop>
...
...
</DETAIL>


In my raw data, we have:

LX*1~
MAN*GM*0000803225000421444452~
N9*2I*12150-1~
W12*CC*2*2*0*EA*101199007289*VN*10007~
N9*LI*1~
LX*5~
MAN*GM*0000803225000421444453~
N9*2I*12150-2~
... (other segments)
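For reference, this is roughly how the raw data breaks down into segment names before any table lookup happens. It is a small Python sketch that assumes the delimiters visible in the excerpt (`~` ends a segment, `*` separates elements); `split_segments` is a hypothetical helper, not part of the existing parser.

```python
def split_segments(raw, segment_terminator="~", element_separator="*"):
    """Split raw EDI text into segments, each a list of elements.
    Delimiters default to the ones seen in the excerpt above."""
    segments = []
    for chunk in raw.split(segment_terminator):
        chunk = chunk.strip()
        if chunk:
            segments.append(chunk.split(element_separator))
    return segments


raw = (
    "LX*1~ MAN*GM*0000803225000421444452~ N9*2I*12150-1~ "
    "W12*CC*2*2*0*EA*101199007289*VN*10007~ N9*LI*1~ "
    "LX*5~ MAN*GM*0000803225000421444453~ N9*2I*12150-2~"
)

# The first element of each segment is its name.
names = [segment[0] for segment in split_segments(raw)]
```

The parser only ever sees this flat sequence of names, with nothing marking where a loop ends.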

Based on the algorithm above, when the second LX segment is hit, there is currently a "loop bounding" record spanning from the first segment in the W12 loop (W12) to the last possible segment within it (FA.FA2). Thus, when the search is performed on the document's standards table, the next LX found in the definition is the LX that opens its own loop within the W12 loop. This is wrong: the detail area is actually resetting here, and this LX is the first segment in the area, not the start of the W12.LX loop. Because the parser naively searches the standards table bottom-up, loop by loop, it cannot tell the two cases apart.

Changing the parser to look at the start of the area (top down) rather than the current model creates the opposite problem. If the trading partner actually intended to open the inner W12.LX loop, the parser would interpret it as the start of a new detail area.

Is solving this case possible with a single pass parser that's using the standards as defined in the table I've described? Is finding some way to hack an XML solution into our rather old system the only approach here? Since EDI does not have "end tags", the only way I can be sure that a loop is actually over is by "looking ahead" in the document for scenarios that would be impossible, like a MAN segment appearing after the inner W12.LX (since the detail area MUST reset for the MAN segment to be used again).
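To illustrate the "looking ahead" idea, here is a rough Python sketch of what I have in mind. It scans the segments after an ambiguous LX and stops at the next LX. `resolve_lx` and both name sets are hypothetical; the sets would have to be derived from the standards table, and the sketch simply gives up when nothing decisive shows up.

```python
def resolve_lx(upcoming, outer_only, inner_only):
    """Classify an ambiguous LX by peeking at the segment names that follow it.

    upcoming   -- names of the segments after the LX, in document order
    outer_only -- names legal only in the outer detail LX loop (assumption)
    inner_only -- names legal only in the inner W12.LX loop (assumption)

    Returns "outer", "inner", or None when the lookahead proves nothing.
    """
    for name in upcoming:
        if name == "LX":
            break  # the next LX is a fresh ambiguity; stop scanning here
        if name in outer_only:
            return "outer"
        if name in inner_only:
            return "inner"
    return None


# MAN after the second LX can only mean the detail area has reset.
decision = resolve_lx(["MAN", "N9"], outer_only={"MAN", "PAL"}, inner_only=set())
```

The weak spot is the `None` case: if only segments valid in both loops (such as N9) follow the LX, the lookahead cannot decide either way.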

I'm at the end of my rope, and any ideas would be welcome.
