Phonetic comparison code - soundex

Chris Kelleher

Administrator
Staff member
<BLOCKQUOTE><font size="1" face="Arial, Verdana">code:</font><HR><pre>/*
The soundex code is a useful function for finding names phonetically.
The best use of this would be to have a file with an non-unique indexed
field called Soundex, and to place into this field the soundex code for
the name. Then, when the user inputs a name, convert that input to a
soundex code, and do the find on that. This will narrow down the
choices considerably, and the user can then choose from the resulting
list.
Note that better algorithms exist for phonetic matching then soundex,
(for example, 'PH' and 'F' should be equivalent, 'KN' at the beginning
of words should match 'N', etc.) but soundex is popular both for its
speed and simplicity, and also the fact that it works well most of the
time.
*/

DEFINE INPUT PARAMETER name AS CHARACTER NO-UNDO.
DEFINE OUTPUT PARAMETER code AS CHARACTER NO-UNDO.

DEFINE VARIABLE e AS INTEGER NO-UNDO.
DEFINE VARIABLE i AS INTEGER NO-UNDO.
DEFINE VARIABLE k AS CHARACTER NO-UNDO.
DEFINE VARIABLE l AS CHARACTER NO-UNDO.

ASSIGN
l = ""
name = CAPS(name)
code = SUBSTRING(name,1,1).
DO i = 2 TO LENGTH(name):
e = ASC(SUBSTRING(name,i,1)) - 64.
IF e >= 1 AND e <= 26 THEN DO:
k = SUBSTRING("01230120022455012623010202",e,1).
IF k <> l AND k <> "0" THEN code = code + k.
IF LENGTH(code) > 3 THEN LEAVE.
END.
l = k.
END.
code = SUBSTRING(code + "000",1,4).
RETURN.
[/code]
 

Chris Kelleher

Administrator
Staff member
<BLOCKQUOTE><font size="1" face="Arial, Verdana">code:</font><HR><pre>
/* Soundex routine written by Rick Terrell from an algorithm published in */
/* SPECTRUM Magazine (circa 1986) */
/* Procedure containing soundex routine. Implemented as a procedure with an */
/* internal procedure so that it could be run persistently. */
/* Could be a stand-alone program by removing the PROCEDURE & END PROCEDURE. */
/* Routine skips double letters, but not numbers, so it can be used to verity */
/* street addresses. */
/* SOUNDEX code is passed back via the RETURN statement -- use RETURN-VALUE */
/* function to retrieve */

PROCEDURE soundex:
DEFINE INPUT PARAMETER iword AS CHARACTER NO-UNDO.

DEFINE VAR ocode AS CHARACTER NO-UNDO.
DEFINE VAR code-tbl AS CHARACTER EXTENT 43 NO-UNDO INITIAL "".
DEFINE VAR sbstrg AS CHARACTER NO-UNDO.
DEFINE VAR idx AS INTEGER NO-UNDO.
DEFINE VAR i AS INTEGER NO-UNDO.
DEFINE VAR tbl-idx AS INTEGER NO-UNDO.

/* number = number + 1 11 thru 17 are symbols */
/* a = 18 f = 23 k = 28 p = 33 u = 38 */
/* b = 19 g = 24 l = 29 q = 34 v = 39 */
/* c = 20 h = 25 m = 30 r = 35 w = 40 */
/* d = 21 i = 26 n = 31 s = 36 x = 41 */
/* e = 22 j = 27 o = 32 t = 37 y = 42 */
/* z = 43 */
/* Set the code table to necessary values */
/* NOTE: not all extents have a value */
ASSIGN code-tbl[01] = "0" code-tbl[02] = "1"
code-tbl[03] = "2" code-tbl[04] = "3"
code-tbl[05] = "4" code-tbl[06] = "5"
code-tbl[07] = "6" code-tbl[08] = "7"
code-tbl[09] = "8" code-tbl[10] = "9"
code-tbl[19] = "1" code-tbl[20] = "2"
code-tbl[21] = "3" code-tbl[23] = "1"
code-tbl[24] = "2" code-tbl[27] = "2"
code-tbl[28] = "2" code-tbl[29] = "4"
code-tbl[30] = "5" code-tbl[31] = "5"
code-tbl[33] = "1" code-tbl[34] = "2"
code-tbl[35] = "6" code-tbl[36] = "2"
code-tbl[37] = "3" code-tbl[39] = "1"
code-tbl[41] = "2" code-tbl[43] = "2".
/* First convert to all UPPERCASE for later ASCII conversion */
ASSIGN iword = CAPS(iword)
sbstrg = "".

/* Now let's check for some letter combinations that cause problems */
idx = INDEX(iword,"ITE").
IF idx > 0 THEN sbstrg = "ITE".
ELSE DO:
idx = INDEX(iword,"YTE").
IF idx > 0 THEN sbstrg = "ITE".
ELSE DO:
idx = INDEX(iword,"DG").
IF idx > 0 THEN sbstrg = "DG".
ELSE DO:
idx = INDEX(iword,"TIA").
IF idx > 0 THEN sbstrg = "TIA".
ELSE DO:
idx = INDEX(iword,"CHM").
IF idx > 0 THEN sbstrg = "CHM".
ELSE DO:
idx = INDEX(iword,"DT").
IF idx > 0 THEN sbstrg = "DT".
ELSE DO:
idx = INDEX(iword,"CKS").
IF idx > 0 THEN sbstrg = "CKS".
ELSE DO:
idx = INDEX(iword,"CK").
IF idx > 0 THEN sbstrg = "CK".
END. /* of CK ELSE */
END. /* of CKS ELSE */
END. /* of DT ELSE */
END. /* of TIA ELSE */
END. /* of DG ELSE */
END. /* of YTE ELSE */
END. /* of ITE ELSE */

CASE sbstrg:
WHEN "ITE" THEN iword = SUBSTRING(iword,1,idx - 1) + "IGHT" + SUBSTRING(iword,idx + 3).
WHEN "DG" THEN iword = SUBSTRING(iword,1,idx - 1) + "G" + SUBSTRING(iword,idx + 2).
WHEN "TIA" THEN iword = SUBSTRING(iword,1,idx - 1) + "SHA" + SUBSTRING(iword,idx + 3).
WHEN "CHM" THEN iword = SUBSTRING(iword,1,idx - 1) + "M" + SUBSTRING(iword,idx + 3).
WHEN "DT" THEN iword = SUBSTRING(iword,1,idx - 1) + "T" + SUBSTRING(iword,idx + 2).
WHEN "CKS" THEN iword = SUBSTRING(iword,1,idx - 1) + "X" + SUBSTRING(iword,idx + 3).
WHEN "CK" THEN iword = SUBSTRING(iword,1,idx - 1) + "C" + SUBSTRING(iword,idx + 2).
END CASE.

/* Handle those nasty non-phonetic or silent beginning letters */
sbstrg = SUBSTRING(iword,1,2).
CASE sbstrg:
WHEN "PH" OR WHEN "PF" THEN iword = "F" + SUBSTRING(iword,3).
WHEN "WR" THEN iword = "R" + SUBSTRING(iword,3).
WHEN "KN" OR WHEN "PN" THEN iword = "N" + SUBSTRING(iword,3).
OTHERWISE DO:
IF SUBSTRING(sbstrg,1,1) = "X" THEN iword = "Z" + SUBSTRING(iword,2).
IF SUBSTRING(sbstrg,1,1) = "K" THEN iword = "C" + SUBSTRING(iword,2).
END.
END CASE.

ASSIGN idx = LENGTH(iword)
ocode = SUBSTRING(iword,1,1).

DO i = 2 to idx:
/* skip unwanted characters */
IF ASC(SUBSTRING(iword,i,1)) LT 48 /* 0 */
OR ASC(SUBSTRING(iword,i,1)) GT 90 /* Z */
THEN NEXT.
/* skip multiple letters, but not numbers */
IF NOT CAN-DO("0,1,2,3,4,5,6,7,8,9",SUBSTRING(iword,i,1)) AND
(SUBSTRING(iword,i,1) = SUBSTRING(iword,i - 1,1)) THEN NEXT.
tbl-idx = MAX((ASC(SUBSTRING(iword,i,1)) - 47),1). /* ASCII value or 1 if < 1 */
tbl-idx = MIN(tbl-idx,43). /* tbl-idx value or 43 if > 43 */
ocode = ocode + code-tbl[tbl-idx].
END.

RETURN ocode.

END PROCEDURE.
[/code]
 
Top