Author Topic: Metaphone sefun  (Read 2353 times)

Offline z993126

Metaphone sefun
« on: February 16, 2012, 04:21:46 PM »
I put both of these in /secure/sefun/english.c
This is Metaphone, following the rules enumerated at http://en.wikipedia.org/wiki/Metaphone (though the rules listed there are a bit imprecise: "'C' transforms to 'X' if followed by 'IA' or 'H'", taken literally, means you'd only change the C to X, when what it should actually do is change the whole CIA or CH to X).

Code: [Select]
string strip_punctuation( string s_str ){
return filter( s_str, (:
( $1 >= 'A' && $1 <= 'Z' ) ||
( $1 >= 'a' && $1 <= 'z' ) ||
( $1 >= '0' && $1 <= '9' ) ||
$1 == ' '
:) );
}

string metaphone( string s_str ){
string s_new = "", s_tmp, s_character;
int *sa_vowels = ({ 'A', 'E', 'I', 'O', 'U' }); // character codes, so int * rather than string *
string *sa_vowels2 = ({ "A", "E", "I", "O", "U" });
int i_count, i_max;
if( !s_str || !stringp( s_str ) || sizeof( s_str ) < 1 ){ return ""; }
s_str = upper_case( strip_punctuation( s_str ) );
if( !sizeof( s_str ) ){ return ""; } // nothing left once punctuation is stripped

// 1. drop duplicate adjacent letters, except for C (and G, for GG case in step 7)
i_max = sizeof( s_str );
for( i_count = 0; i_count < i_max; i_count++ ){
if( i_count < i_max - 1 ){
if(
s_str[i_count + 1] != s_str[i_count] ||
( ( s_str[i_count] == 'C' || s_str[i_count] == 'G' ) && s_str[i_count + 1] == s_str[i_count] )
){
s_new += s_str[i_count..i_count];
}
}else{
s_new += s_str[i_count..i_count];
}
}

// 2. if the word begins with KN, GN, PN, AE, or WR, drop the first letter
i_max = sizeof( s_new );
if( i_max > 1 ){
switch( s_new[0..1] ){
case "KN":
case "GN":
case "PN":
case "AE":
case "WR":
s_new = s_new[1..];
break;
default:
break;
}
}

// 3. drop B at end of word if after M
// (check the length first so a short string is never indexed out of bounds)
i_max = sizeof( s_new );
if( i_max > 1 && s_new[<2..] == "MB" ){
s_new = s_new[0..<2];
}

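// 4. SCH into SK; CIA, TCH or CH into X; CK into K; CI, CE, CY into S; else C into K
// (TCH and CK are handled here because steps 9 and 13 run after every C is already gone)
s_new = replace_string( s_new, "SCH", "SK" );
s_new = replace_string( s_new, "CIA", "x" );
s_new = replace_string( s_new, "TCH", "x" );
s_new = replace_string( s_new, "CH", "x" );
s_new = replace_string( s_new, "CK", "K" );
s_new = replace_string( s_new, "CI", "S" );
s_new = replace_string( s_new, "CE", "S" );
s_new = replace_string( s_new, "CY", "S" );
s_new = replace_string( s_new, "C", "K" );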

// 5. DGE, DGY, DGI into J, else D to T
s_new = replace_string( s_new, "DGE", "J" );
s_new = replace_string( s_new, "DGY", "J" );
s_new = replace_string( s_new, "DGI", "J" );
s_new = replace_string( s_new, "D", "T" );

// 6. drop G if before H and H not at end or before vowel, drop G if before N or NED and N or NED at end
i_max = sizeof( s_new );
s_tmp = "";
for( i_count = 0; i_count < i_max; i_count++ ){
if(
s_new[i_count] == 'G' && i_count < i_max - 2 && s_new[i_count + 1] == 'H' &&
member_array( s_new[i_count + 2], sa_vowels ) == -1
){
continue; // silent G: GH followed by a consonant
}
s_tmp += s_new[i_count..i_count];
}
s_new = s_tmp;
i_max = sizeof( s_new );
// step 5 has already turned every D into T, so a trailing GNED now reads GNET
if( i_max > 2 && s_new[<2..] == "GN" ){
s_new = s_new[0..<3] + "N";
}else if( i_max > 4 && s_new[<4..] == "GNET" ){
s_new = s_new[0..<5] + "NET";
}

// 7. GG into K; GI, GE, GY into J; else G into K
s_new = replace_string( s_new, "GG", "K" );
s_new = replace_string( s_new, "GI", "J" );
s_new = replace_string( s_new, "GE", "J" );
s_new = replace_string( s_new, "GY", "J" );
s_new = replace_string( s_new, "G", "K" );

// 8. drop H if after vowel and not before vowel
i_max = sizeof( s_new );
s_tmp = "";
for( i_count = 0; i_count < i_max; i_count++ ){
if(
s_new[i_count] == 'H' && i_count > 0 &&
member_array( s_new[i_count - 1], sa_vowels ) > -1 &&
( i_count == i_max - 1 || member_array( s_new[i_count + 1], sa_vowels ) == -1 )
){
continue; // silent H
}
s_tmp += s_new[i_count..i_count];
}
s_new = s_tmp;

// 9. CK into K
s_new = replace_string( s_new, "CK", "K" );

// 10. PH into F
s_new = replace_string( s_new, "PH", "F" );

// 11. Q into K
s_new = replace_string( s_new, "Q", "K" );

// 12. SH, SIO, SIA into X
s_new = replace_string( s_new, "SH", "x" );
s_new = replace_string( s_new, "SIO", "x" );
s_new = replace_string( s_new, "SIA", "x" );

// 13. TIA, TIO, TCH into X; TH into 0
s_new = replace_string( s_new, "TIA", "x" );
s_new = replace_string( s_new, "TIO", "x" );
s_new = replace_string( s_new, "TCH", "x" );
s_new = replace_string( s_new, "TH", "0" );

// 14. V into F
s_new = replace_string( s_new, "V", "F" );

// 15. WH into W if at beginning, drop W if not before vowel
if( sizeof( s_new ) > 1 && s_new[0..1] == "WH" ){ s_new[0..1] = "W"; }
// build the result in s_tmp; deleting in place would shift the characters while i_max stays stale
i_max = sizeof( s_new );
s_tmp = "";
for( i_count = 0; i_count < i_max; i_count++ ){
if(
s_new[i_count] == 'W' &&
( i_count == i_max - 1 || member_array( s_new[i_count + 1], sa_vowels ) == -1 )
){
continue; // W not followed by a vowel
}
s_tmp += s_new[i_count..i_count];
}
s_new = s_tmp;

// 16. X into S if at beginning, else X into KS
if( sizeof( s_new ) && s_new[0] == 'X' ){ s_new[0] = 'S'; }
s_new = replace_string( s_new, "X", "KS" );

// 17. drop Y if not before vowel
// build the result in s_tmp; deleting in place would shift the characters while i_max stays stale
i_max = sizeof( s_new );
s_tmp = "";
for( i_count = 0; i_count < i_max; i_count++ ){
if(
s_new[i_count] == 'Y' &&
( i_count == i_max - 1 || member_array( s_new[i_count + 1], sa_vowels ) == -1 )
){
continue; // Y not followed by a vowel
}
s_tmp += s_new[i_count..i_count];
}
s_new = s_tmp;

// 18. Z into S
s_new = replace_string( s_new, "Z", "S" );

// 19. drop all vowels unless at beginning
if( sizeof( s_new ) > 1 ){
s_tmp = s_new[1..];
foreach( s_character in sa_vowels2 ){
s_tmp = replace_string( s_tmp, s_character, "" );
}
s_new = s_new[0..0] + s_tmp;
}

// 20. x back into X
s_new = replace_string( s_new, "x", "X" );

return s_new;
}

(ed. note; forgot to delete a debug write in step 19, oops)
« Last Edit: February 16, 2012, 04:26:46 PM by z993126 »

Offline detah

Re: Metaphone sefun
« Reply #1 on: February 17, 2012, 08:42:52 AM »
What is the value of this function to muds? I am familiar with soundex in the context of spellchecking massive databases. Are you trying to make a system which guesses what some poor hapless typo-prone mudder is typing? Is the goal here to suggest the proper syntax?

Code: [Select]
> get tocrh from bag
Do you mean 'torch'?

Is this the goal?

Offline quixadhal

Re: Metaphone sefun
« Reply #2 on: February 17, 2012, 01:40:01 PM »
I dunno about Ardneh's goal, other than spending a great deal of extra time on something that's marginally better than soundex... but other MUDs have used soundex to find "close" matches against help files and whatnot.

>help color

There are help entries for:  colour, ansi colours, and prompt colours.

That kind of stuff.  I guess you could apply it to the parser in general, although the value probably diminishes since if you have two targets in a room who share the same (or close) soundex values, you might typo and hit the wrong one.

You see a proud city guard standing here.
You also see Gard, a shady little man.

kill gaurd

You start to raise your fist to the proud city guard.
He looks at you.
You are DEAD!
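
For the help lookup case, here is a minimal sketch in LPC, assuming the metaphone() sefun from the first post is installed. build_help_index() and similar_help_topics() are hypothetical names, not part of any mudlib. The idea is to file each word of each help topic under its phonetic key, so a single-word query only has to compute one key and look it up:

```lpc
// Hypothetical helpers; they assume the metaphone() sefun posted above.

// Map each phonetic key to the list of help topics containing a word with that key.
mapping build_help_index( string *sa_topics ){
    mapping m_index = ([]);
    string s_topic, s_word, s_key;
    foreach( s_topic in sa_topics ){
        foreach( s_word in explode( s_topic, " " ) ){
            s_key = metaphone( s_word );
            if( !sizeof( s_key ) ){ continue; } // word had nothing to encode
            if( !m_index[s_key] ){ m_index[s_key] = ({}); }
            if( member_array( s_topic, m_index[s_key] ) == -1 ){
                m_index[s_key] += ({ s_topic });
            }
        }
    }
    return m_index;
}

// Return every indexed topic that sounds like the query, or an empty array.
string *similar_help_topics( mapping m_index, string s_query ){
    string *sa_found = m_index[metaphone( s_query )];
    return sa_found ? sa_found : ({});
}
```

Whether two spellings actually share a key depends entirely on the rules. Plurals, for instance, keep their trailing S and so will generally not collide with the singular.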

Offline detah

Re: Metaphone sefun
« Reply #3 on: February 17, 2012, 02:23:12 PM »
This doesn't pass the smell test with me. I think the default error message with no spellchecking is superior in every way to trying to guess what the player intended and actually implementing the command. I see this going badly for the player...

Code: [Select]
> l
An Open Clearing
Lots of grass and stuff. There's a big oak tree to the north.
Exits: east, west.
Rabbtei, the Supreme Overlord of Death
A tiny rabbit.

> kill rabbti

You miss.
Rabbtei smashes you with a devastating blow.
You die.
Death says: WELCOME, MORTAL ONE!

No. I don't see this working out well at all.
« Last Edit: February 17, 2012, 02:30:27 PM by detah »

Offline z993126

Re: Metaphone sefun
« Reply #4 on: February 17, 2012, 04:45:16 PM »
lol.

I suppose it was more just an exercise in programming, than necessarily being useful (though it could be applied in the specific case where there are no matches to a command's argument, so it checks for similarities, and uses that to generate a better error message for the player...?  no idea)
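
A rough sketch of that idea, assuming the metaphone() sefun above; suggest_match() is a hypothetical name, not an existing function. It is meant to be called only after the normal lookup has found nothing, and it only builds a hint for the error message rather than acting on the guess:

```lpc
// Hypothetical helper; assumes the metaphone() sefun posted above.
// Returns a hint such as "Do you mean 'torch'?", or 0 if no id sounds like s_arg.
string suggest_match( string s_arg, string *sa_ids ){
    string s_key = metaphone( s_arg ), s_id;
    if( !sizeof( s_key ) ){ return 0; } // nothing to compare against
    foreach( s_id in sa_ids ){
        if( metaphone( s_id ) == s_key ){
            return "Do you mean '" + s_id + "'?";
        }
    }
    return 0;
}
```

A command would call this in its failure branch and fall back to the usual message when it returns 0. Since the key is phonetic, a pure transposition like 'tocrh' is not guaranteed to share a key with 'torch'.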