Newsgroups : Borland : borland.public.delphi.internet.winsock : 2006 Jun : Re: spam sender addresses

www.cryer.info
Managed Newsgroup Archive

Re: spam sender addresses

Subject:Re: spam sender addresses
Posted by:"theo" (nospam@for.me)
Date:Thu, 1 Jun 2006 23:02:20

Remy Lebeau (TeamB) wrote:

>
> Syntax-wise, those are valid addresses.  So a syntax filtering algorithm
> will not help you.  What you are asking for is a content filtering
> algorithm, and that is very hard to implement.

After reading some computational linguistics websites today, I gave up on
the idea of doing it "the scientific way".
Then I simply tried to write a function that does what I mean (see below).
This is all I need. The function guesses whether a word is a "natural"
word (Result = 0) or consists of random characters (Result > 0).

It is capable of detecting 3 of the 4 words I initially posted in this
thread as spam:

WxOHT -> not detected as spam
rokqawmwrp -> spam
jhmtnr -> spam
xwldsxu -> spam

It simply analyzes the sequence of letters and counts a hit for three
Soundex-similar letters, three vowels, or more than three consonants in a row.

All my normal email contacts pass the test (Result=0), even
"Gschwandtner" and "Burckhardt".

That's really all I wanted: just to tag messages as "suspicious" based on
the sender (user name, not host).
The function can certainly be improved. I wrote it two hours ago...


----------------------

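function AnalyzeWord(input: ShortString): integer;
// Needs SysUtils in the uses clause (for StringReplace).
var
  i, len, countsdex, countvow, countcons: integer;
  // Integer rather than byte: Score can return -1 (for H and W),
  // which does not fit into a byte when range checking is enabled.
  tmpChar, last: integer;
const
  // Soundex-like code for each character from 'A' (65) to 'z' (122).
  // 0 = vowel or non-letter, -1 = H and W, 1..6 = consonant groups.
  CSoundexTable: array[65..122] of ShortInt =
  // A  B  C  D  E  F  G  H   I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W   X  Y  Z
    (0, 1, 2, 3, 0, 1, 2, -1, 0, 2, 2, 4, 5, 5, 0, 1, 2, 6, 2, 3, 0, 1, -1, 2, 0, 2,
  // [  \  ]  ^  _  `
     0, 0, 0, 0, 0, 0,
  // a  b  c  d  e  f  g  h   i  j  k  l  m  n  o  p  q  r  s  t  u  v  w   x  y  z
     0, 1, 2, 3, 0, 1, 2, -1, 0, 2, 2, 4, 5, 5, 0, 1, 2, 6, 2, 3, 0, 1, -1, 2, 0, 2);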

  function Score(AChar: Integer): Integer;
  begin
    Result := 0;
    if (AChar >= Low(CSoundexTable)) and (AChar <= High(CSoundexTable)) then
      Result := CSoundexTable[AChar];
  end;

  procedure ExceptionalCase;
  begin
    inc(Result);
    countsdex := 0;
    countvow := 0;
    countcons := 0;
  end;

begin
  last := 99;
  countsdex := 0;
  countvow := 0;
  countcons := 0;
  Result := 0;

  // Replace some known combinations (I don't know the linguistic term for this).
  // This reduces false alerts for names like "Gschwandtner" --> "Gswantner"
  // and "Burckhardt" --> "Burkhart".
  input := StringReplace(input, 'sch', 's', [rfReplaceAll, rfIgnoreCase]);
  input := StringReplace(input, 'dt', 't', [rfReplaceAll, rfIgnoreCase]);
  input := StringReplace(input, 'sh', 's', [rfReplaceAll, rfIgnoreCase]);
  input := StringReplace(input, 'ck', 'k', [rfReplaceAll, rfIgnoreCase]);
  input := StringReplace(input, 'ch', 'k', [rfReplaceAll, rfIgnoreCase]);
  input := StringReplace(input, 'th', 't', [rfReplaceAll, rfIgnoreCase]);
  input := StringReplace(input, 'oao', 'oan', [rfReplaceAll, rfIgnoreCase]); // pt "joao"
  // to be extended

  len := length(input);
  for i := 1 to len do
  begin
    tmpChar := Score(Ord(input[i]));

    if last = tmpChar then inc(countsdex) else countsdex:=0;
    if tmpChar = 0 then inc(countvow) else countvow := 0;
    if tmpChar <> 0 then inc(countcons) else countcons := 0;
    last := tmpChar;

    if countsdex > 1 then ExceptionalCase else
      if countvow > 2 then ExceptionalCase else
        if countcons > 3 then ExceptionalCase;
  end;
end;
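
Since only the user name (not the host) is meant to be checked, the address has to be cut at the "@" before it goes into AnalyzeWord. Below is a minimal sketch of that step. The helper name ExtractUserName is hypothetical and not part of the function above, and it assumes the sender has already been reduced to the bare "user@host" form (no display name, no angle brackets).

```pascal
// Hypothetical helper (not from the original post): returns the part of an
// e-mail address before the first '@'. If the string contains no '@',
// it is returned unchanged.
function ExtractUserName(const Address: string): string;
var
  AtPos: Integer;
begin
  AtPos := Pos('@', Address);
  if AtPos > 0 then
    Result := Copy(Address, 1, AtPos - 1)  // everything before the '@'
  else
    Result := Address;
end;
```

A message could then be tagged as suspicious when AnalyzeWord(ExtractUserName(SenderAddress)) > 0, where SenderAddress stands for however your mail component exposes the From address.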
