norm=0.8300


norm=0.8300

Dickson, Paul

How does the norm correlate, if at all, to the spam caught/not caught?  Here is my rebuild output:

 

Found 50782678 spam words, 61185972 non-spam words.

Generating weighted keys...

norm=0.8300

Saving rebuilt SPAM database**************************************************

 

 

TIA!

 

Paul


RE: norm=0.8300

Peter Awad-2
Paul,
Don't quote me on it, but I believe the norm relates to statistical measures and means normalized weighting. I don't know how the number is generated, but I don't think it's anything as simple as spam over non-spam times message count.

Peter

----- Original Message -----
From: [hidden email] on behalf of "Dickson, Paul"
Sent: Tue, 2/28/2006 9:36am
To: [hidden email]
Subject: [Assp-user] norm=0.8300


RE: norm=0.8300

Dickson, Paul
In reply to this post by Dickson, Paul

I guess it would help if I stated my goal: to figure out whether I should be trying to get the norm lower or higher, and what the best methods of doing that would be, or whether it is relevant at all and I should just disregard it.

 


From: [hidden email] [mailto:[hidden email]] On Behalf Of Peter Awad
Sent: Tuesday, February 28, 2006 11:38 AM
To: [hidden email]
Subject: RE: [Assp-user] norm=0.8300

 


Re: norm=0.8300

Fritz Borgstedt
In reply to this post by Dickson, Paul

>How does the norm correlate, if at all, to the spam caught/not
>caught?  Here is my rebuild output:
>Found 50782678 spam words, 61185972 non-spam words.
>Generating weighted keys...
>
>norm=0.8300


The norm is the ratio of spam words to non-spam words.
In other words, your 50782678 spam words are 83% of your
61185972 non-spam words.
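As a quick check of that arithmetic, here is a minimal sketch in Python. It is not taken from the ASSP rebuild script; the function name `norm` and the four-decimal formatting are assumptions chosen to mirror the `norm=0.8300` line in the rebuild output.

```python
def norm(spam_words: int, ham_words: int) -> float:
    """Ratio of spam words to non-spam (ham) words, as described above."""
    if ham_words <= 0:
        raise ValueError("ham_words must be positive")
    return spam_words / ham_words


# Word counts copied from the rebuild output quoted in this thread.
print(f"norm={norm(50782678, 61185972):.4f}")  # norm=0.8300
```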



-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Assp-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/assp-user

Re: norm=0.8300

Fritz Borgstedt
In reply to this post by Dickson, Paul

>I guess it would help if I related my goal and that is to figure out
>whether I should be trying to get it lower, or higher, and what the
>best possible methods of doing that would be.. or if it is relevant
>at all and I should just disregard?

Lower norm means fewer false positives. It should be below 1 and above
0.5, so 0.83 is ok.

There are different ways to influence the norm.
The first is the maxfiles number. If you raise maxfiles, the norm
goes lower, because HAM tends to be longer (more words) than SPAM.

In my installation the average size of HAM is 8k and the average size
of SPAM is 4k.
That gives me a norm between 0.5 and 0.6.
I reduced my maxsize to 4k. With that my norm went to 0.9, and I could
increase maxfiles without a long rebuild time.
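To see why capping the message size pushes the norm up, here is a rough toy model in Python. It is only a sketch under loud assumptions: message size in bytes stands in for word count, every message has the same size, there are equally many spam and ham messages, and `estimated_norm` and `max_size` are made-up names, not ASSP settings or functions.

```python
def estimated_norm(spam_sizes, ham_sizes, max_size=None):
    """Toy estimate: total spam size over total ham size.

    If max_size is given, each message is truncated to that many bytes
    before counting, which is the assumed effect of a size cap.
    """
    def capped(size):
        return size if max_size is None else min(size, max_size)

    spam_total = sum(capped(s) for s in spam_sizes)
    ham_total = sum(capped(s) for s in ham_sizes)
    if ham_total == 0:
        raise ValueError("no ham data to compare against")
    return spam_total / ham_total


# Hypothetical corpus: 100 spam messages of 4k, 100 ham messages of 8k.
spam = [4000] * 100
ham = [8000] * 100

print(estimated_norm(spam, ham))                 # 0.5
print(estimated_norm(spam, ham, max_size=4000))  # 1.0
```

The toy numbers land exactly on 0.5 and 1.0 because every message is the same size. A real corpus with mixed sizes would give less tidy values, such as the 0.5 to 0.6 and 0.9 reported above.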






RE: norm=0.8300

Dickson, Paul
In reply to this post by Dickson, Paul
So having a lower norm, closer to .5, is better?

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Fritz
Borgstedt
Sent: Tuesday, February 28, 2006 12:43 PM
To: [hidden email]
Subject: Re: [Assp-user] norm=0.8300



Re: norm=0.8300

Fritz Borgstedt

>So having a lower norm, closer to .5 is better?

It depends on what you want. Call it the sharpness: fewer false
positives means more undetected SPAM. Anything between 0.5 and 1.0 is ok.
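If you want to check this after every rebuild, a small Python sketch like the following could pull the norm out of the rebuild output and compare it with the 0.5 to 1.0 range. The function name `check_norm` is hypothetical, the line format is assumed from the output posted at the top of this thread, and treating the range as inclusive is also an assumption.

```python
import re

# Matches a line such as "norm=0.8300" in the rebuild output.
NORM_RE = re.compile(r"^norm=(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def check_norm(rebuild_output: str, low: float = 0.5, high: float = 1.0):
    """Return (norm, in_range) for the first norm= line in the output."""
    match = NORM_RE.search(rebuild_output)
    if match is None:
        raise ValueError("no norm= line found in rebuild output")
    value = float(match.group(1))
    return value, low <= value <= high


sample = """Found 50782678 spam words, 61185972 non-spam words.
Generating weighted keys...
norm=0.8300
Saving rebuilt SPAM database"""

print(check_norm(sample))  # (0.83, True)
```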


