Question: SPAM corpus pollution due to reporting of blocked files possible?

K Post
Now that ASSP_AFC v 4.36 is doing such a great job of blocking MS Office
macro files, we're seeing a lot of "your invoice" emails coming through
with their attachments stripped.  I'm thrilled to see this.

I also see users reporting these as spam, because they are.

My question is whether there's a possibility of the spam corpus being
tainted by the contents of the replacement text file, which now contains:
The attached file (FILENAME) was removed from this email by ASSP for policy
reasons!
Our version actually says:
The attached file (FILENAME) was removed from this email by ASSP for policy
reasons!  Contact [hidden email] for help.

The text file is getting back to the server as a report.

Would phrases from that file now be included in the corpus?  Would legit
mail that comes in with, say, the text "Contact [hidden email] for
help." in it now be more likely to be rejected by Bayes or HMM?

------------------------------------------------------------------------------

_______________________________________________
Assp-test mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/assp-test
Re: Question: SPAM corpus pollution due to reporting of blocked files possible?

Thomas Eckardt/eck
>Would phrases from that file now be included in the corpus?

No. To prevent this, the text is placed in an attachment. Only the SHA1 hash
of the first 512 bytes of an attachment goes into the spamdb/hmmdb.
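
To illustrate why the notice wording cannot end up in the corpus, here is a
minimal Python sketch of the idea described above. It is not ASSP's actual
code: the function name attachment_token, the PREFIX_LEN constant, and the
example.com address are assumptions made for illustration. The only thing
taken from the reply is that an attachment is represented by the SHA1 hash of
its first 512 bytes.

```python
import hashlib

# Per the reply: only the first 512 bytes of an attachment are hashed.
PREFIX_LEN = 512


def attachment_token(data: bytes, prefix_len: int = PREFIX_LEN) -> str:
    """Return the SHA-1 hex digest of the first prefix_len bytes of data.

    Hypothetical helper: it shows that the corpus would see one opaque
    token per attachment, never the individual words inside it.
    """
    return hashlib.sha1(data[:prefix_len]).hexdigest()


# Example notice text; the contact address is a placeholder.
notice = (
    b"The attached file (invoice.doc) was removed from this email "
    b"by ASSP for policy reasons!  Contact admin@example.com for help."
)

token = attachment_token(notice)
print(token)  # a 40-character hex string, not the notice wording
```

Under this scheme a legitimate mail whose body merely contains "Contact ...
for help." shares no token with the reported attachment, because the body
words and the attachment hash are unrelated strings.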

Thomas




From:    K Post <[hidden email]>
To:      ASSP development mailing list <[hidden email]>
Date:    22 Aug 2016 20:20
Subject: [Assp-test] Question: SPAM corpus pollution due to
reporting of blocked files possible?






