Topic: Soundex sefun

z993126

Soundex sefun
« on: February 05, 2012, 09:59:07 AM »
A quick Soundex encoder function, which I put in the english.c file. Free to use, etc.

Code:
string soundex( string s_str ){
    string s_new, s_tmp, s_consonants;
    int i_count, i_max, i_count2;
    if( !s_str || !stringp( s_str ) ){ return ""; }
    /* Fold to lowercase so the filter and the switch agree on case. */
    s_str = lower_case( s_str );
    s_new = capitalize( s_str[0..0] );
    /* Keep consonants only: 'a' falls outside the range, the rest are listed. */
    s_consonants = filter( s_str[1..], (:
        ( $1 >= 'b' && $1 <= 'z' ) &&
        ( $1 != 'e' && $1 != 'h' && $1 != 'i' && $1 != 'o' && $1 != 'u' && $1 != 'w' && $1 != 'y' )
    :) );
    i_max = sizeof( s_consonants );
    for( i_count = 0; i_count < i_max; i_count++ ){
        switch( s_consonants[i_count..i_count] ){
            case "b": case "f": case "p": case "v": s_tmp = "1"; break;
            case "c": case "g": case "j": case "k": case "q": case "s": case "x": case "z": s_tmp = "2"; break;
            case "d": case "t": s_tmp = "3"; break;
            case "l": s_tmp = "4"; break;
            case "m": case "n": s_tmp = "5"; break;
            case "r": s_tmp = "6"; break;
        }
        /* Append the digit unless it repeats the previous one; stop after three. */
        if( s_new[<1..<1] != s_tmp ){ s_new += s_tmp; i_count2++; }
        if( i_count2 > 2 ){ break; }
    }
    /* Left-justify and pad with '0' to four characters. */
    return sprintf( "%-'0'4s", s_new );
}

Optionally, add a space character to the filter for the consonants and generate a code for a multi-word string.  Or whatever.
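
An alternative that leaves the filter untouched is to split the string on spaces and encode each word on its own. A minimal sketch, assuming soundex() as posted above is callable from here; soundex_words() is a made-up helper name, not part of any lib:

```lpc
/* Hypothetical helper: returns one Soundex code per space-separated word. */
string *soundex_words( string s_str ){
    string *codes = ({ });
    if( !s_str || !stringp( s_str ) ){ return codes; }
    foreach( string s_word in explode( s_str, " " ) ){
        /* Skip empty pieces that repeated spaces can leave behind. */
        if( s_word == "" ){ continue; }
        codes += ({ soundex( s_word ) });
    }
    return codes;
}
```

This returns an array rather than one long code, so each word can still be compared separately.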

Tricky

Re: Soundex sefun
« Reply #1 on: February 05, 2012, 12:04:39 PM »
A simplified rewrite based on Quix's version. I had problems with "trying to put int in string" errors.

For array input, soundex() calls itself recursively on each element and appends the results to the end of the return array.

For Dead Souls (DS) libs, replace query_name() with GetName().

Code:
protected string _soundex(string word)
{
  mapping codex = ([
    "B": 1, "F": 1, "P": 1, "V": 1,
    "C": 2, "G": 2, "J": 2, "K": 2, "Q": 2, "S": 2, "X": 2, "Z" :2,
    "D": 3, "T": 3,
    "L": 4,
    "M": 5, "N": 5,
    "R": 6
  ]);
  string array letters = ({ });
  string ret;
  int match, current = 0, last = 0;

  if (!word || !stringp(word)) return "Z000";

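  /* The codex keys are upper case, so normalise the word before the lookup. */
  word = upper_case(word);
  ret = word[0..0];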
  letters = explode(word[1..<1], "");

  foreach (string letter in letters)
  {
    match = codex[letter];

    if (undefinedp(match)) continue;
    if (match == last) continue;

    last = match;
    current *= 10;
    current += match;

    if (current > 999) break;
  }

  ret += current + "000";

  return ret[0..3];
}


string array soundex(mixed data)
{
  string array ret = ({ });

  if (undefinedp(data) || !data) return ({ "Z000" });
  if (objectp(data)) return ({ _soundex(data->query_name()) });

  if (stringp(data))
  {
    mixed array assoc;
    int i, sz;

    /* Strip out whole words including hyphens and apostrophes. */
    assoc = reg_assoc(data, ({ "[A-Za-z'-]+", }), ({ 1, }), 0);

    for (i = 0, sz = sizeof(assoc[1]) ; i < sz ; i++)
    {
      if (!assoc[1][i]) continue;

      ret += ({ _soundex(upper_case(assoc[0][i])) });
    }
  }
  else
  if (arrayp(data))
    foreach(mixed datum in data)
      ret += soundex(datum);
  else
    ret += ({ "Z000" });

  return ret;
}
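
A usage sketch for the three input shapes soundex() handles, assuming both functions above are compiled into the same object or installed as sefuns. soundex_demo() and the sample names are made up for illustration:

```lpc
/* Hypothetical demo function; not part of the posted code. */
void soundex_demo()
{
  /* A string is split into words and each word yields one code. */
  string array a = soundex("Robert Rupert");

  /* An array is walked recursively, nested arrays included,
     and all the codes end up in one flat array. */
  string array b = soundex(({ "Ashcraft", ({ "Tymczak", "Pfister" }) }));

  /* Anything that is not an object, string or array gives the fallback code. */
  string array c = soundex(3.5);

  write(sprintf("%O\n%O\n%O\n", a, b, c));
}
```

Because the return value is always an array, callers that pass a single word still need to index the result, e.g. soundex("Robert")[0].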

Tricky