URIBL patch

Przemek Czerkas
- fixes some false positives
- adds a check for the http://www.printeryml*com type of obfuscation
  (Important: replace "*" with ".")
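To illustrate the kind of obfuscation the second item refers to, here is a minimal sketch that tests only the new "www"/"ftp" prefix alternation from the step 2 loop against a few sample strings. The sample hosts are placeholders (example.com), not taken from real spam.

```perl
use strict;
use warnings;

# Prefix alternation from the patched step 2 loop: "www" or "ftp" followed by
# a dot written literally, as quoted-printable "=2e", or as the HTML numeric
# character reference "&#46;" / "&#046;" (trailing ";" optional).
my $prefix = qr/(?:www|ftp)(?:\=2e|\&\#0?46\;?|\.)/i;

# Hypothetical sample strings.
my @samples = (
    'www.example.com',           # plain dot
    'www=2eexample=2ecom',       # quoted-printable dot
    'www&#46;example&#46;com',   # HTML numeric character reference
    'wwwexample.com',            # no separator after "www"
);

for my $s (@samples) {
    printf "%-28s %s\n", $s, ($s =~ /^$prefix/ ? 'matches' : 'no match');
}
```

With the old pattern only the literal "www." form was recognised as a bare host.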

1. REPLACE:
# URI components
$URICharRe='(?:\=(?:\015?\012|\015)|[=%][0-9a-f]{2}|\#\&\d{1,3};?|[0-9a-z\-\_\.\~\!\*\'\(\)\;\:\@\&\=\+\$\,\?\%\#\[\]])';

WITH:
# URI components - RFC3986 section 2, 'Characters'
$URIContinuationRe='\=(?:\015?\012|\015)';
$URIEncodedCharRe='[\=\%][a-f0-9]{2}|\&\#\d{1,3}\;?';
$URIUnreservedCharRe='[a-z0-9\-\_\.\~]';
$URIGenDelimsCharRe='[\:\/\?\#\[\]\@]';
$URISubDelimsCharRe='[\!\$\&\'\(\)\*\+\,\;\=\%\^\`\{\}\|]'; # RFC3986 sub-delims, relaxed to also allow % ^ ` { } |
$URIReservedCharRe=$URIGenDelimsCharRe.'|'.$URISubDelimsCharRe;

# URI compounds
$URICommonRe=$URIContinuationRe.'|'.$URIEncodedCharRe.'|'.$URIUnreservedCharRe;
$URIHostRe='(?:'.$URICommonRe.'|'.$URISubDelimsCharRe.')+';
$URIRe='(?:'.$URICommonRe.'|'.$URIReservedCharRe.')+';
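A minimal sketch of how these pieces compose, assuming they are evaluated as plain Perl strings exactly as written above (declared with "my" here so the sketch runs under "use strict"; the sample text is hypothetical). $URIRe includes the gen-delims, so it runs to the end of the URI, while $URIHostRe leaves them out and therefore stops at the end of the host part.

```perl
use strict;
use warnings;

# Component strings copied from the patch.
my $URIContinuationRe   = '\=(?:\015?\012|\015)';
my $URIEncodedCharRe    = '[\=\%][a-f0-9]{2}|\&\#\d{1,3}\;?';
my $URIUnreservedCharRe = '[a-z0-9\-\_\.\~]';
my $URIGenDelimsCharRe  = '[\:\/\?\#\[\]\@]';
my $URISubDelimsCharRe  = '[\!\$\&\'\(\)\*\+\,\;\=\%\^\`\{\}\|]';
my $URIReservedCharRe   = $URIGenDelimsCharRe . '|' . $URISubDelimsCharRe;

# Compounds, also copied from the patch.
my $URICommonRe = $URIContinuationRe . '|' . $URIEncodedCharRe . '|' . $URIUnreservedCharRe;
my $URIHostRe   = '(?:' . $URICommonRe . '|' . $URISubDelimsCharRe . ')+';
my $URIRe       = '(?:' . $URICommonRe . '|' . $URIReservedCharRe . ')+';

# $URIHostRe stops at the first gen-delim (: / ? # [ ] @).
my $text = 'example.com/path?q=1';
my ($full) = $text =~ /^($URIRe)/i;
my ($host) = $text =~ /^($URIHostRe)/i;
print "URIRe:     $full\n";   # example.com/path?q=1
print "URIHostRe: $host\n";   # example.com
```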

2. REPLACE:
  while ($b=~/(?:https?|ftp)[\041-\176]{0,3}\:\/{1,3}($URICharRe+)|((?:www|ftp)\.$URICharRe+)/gi) {
   $uri=$1 || $2;
   
# RFC 2821, section 4.5.2, 'Transparency': delete leading period char
   $uri=~s/\=(?:\015?\012|\015)\.?//g;
   # decode 'at' character
   $uri=~s/[=%]40/@/g;
   $uri=~s/&#0?64;?/@/g;
   if ($uri=~/(?:[^\s\/\@]+\@)?($URICharRe+)/io) {
    $orig_uri=$uri=$1;
    $uri=~s/[=%]([0-9a-f]{2})/chr(hex($1))/gie;
    $uri=~s/&#(\d{1,3});?/chr($1)/ge;
    $uri=~tr/;//d;
    $uri=~s/\.{2,}/\./g;
    $uri=~s/^\.//;
    $uri=~s/[\. ]+$//;

WITH:
  while ($b=~/(?:ht|f)tps?[\041-\176]{0,3}\:\/{1,3}($URIRe)|((?:www|ftp)(?:\=2e|\&\#0?46\;?|\.)$URIRe)/gio) {
   $uri=$1 || $2;
   # RFC 2821, section 4.5.2, 'Transparency': delete leading period char
   $uri=~s/$URIContinuationRe\.?//go; # and strip line continuations
   # decode quoted-printables
   $uri=~s/\=([a-f0-9]{2})/chr(hex($1))/gie;
   # decode 'at' character
   $uri=~s/\%40/@/g;
   $uri=~s/\&\#0?64\;?/@/g;
   if ($uri=~/(?:[^\s\/\@]+\@)?($URIHostRe)/io) {
    $uri=$1;
    $uri=~s/[\'\)]+$//; # fix HTML
    $orig_uri=$uri;
    $uri=~s/\%([a-f0-9]{2})/chr(hex($1))/gie; # decode percents
    $uri=~s/\&\#(\d{1,3})\;?/chr($1)/ge; # decode &#ddd's
    # strip redundant dots
    $uri=~s/\.{2,}/\./g;
    $uri=~s/^\.//;
    $uri=~s/\.$//;
    $uri=~s/$URISubDelimsCharRe//go; # strip leftover sub-delim chars (more tricks?)
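The patched loop can be exercised outside ASSP with a small harness like the one below. It is only a sketch, not ASSP code: extract_uri_hosts is a hypothetical name, $body stands in for the $b message buffer, the /o flags are dropped, and everything after the last line shown in step 2 (including the actual URIBL lookup) is omitted. It shows a quoted-printable dot, an "&#46;" dot and a QP soft line break all normalising to a plain host name.

```perl
use strict;
use warnings;

# Component regexes from step 1 of the patch.
my $URIContinuationRe   = '\=(?:\015?\012|\015)';
my $URIEncodedCharRe    = '[\=\%][a-f0-9]{2}|\&\#\d{1,3}\;?';
my $URIUnreservedCharRe = '[a-z0-9\-\_\.\~]';
my $URIGenDelimsCharRe  = '[\:\/\?\#\[\]\@]';
my $URISubDelimsCharRe  = '[\!\$\&\'\(\)\*\+\,\;\=\%\^\`\{\}\|]';
my $URIReservedCharRe   = $URIGenDelimsCharRe . '|' . $URISubDelimsCharRe;
my $URICommonRe = $URIContinuationRe . '|' . $URIEncodedCharRe . '|' . $URIUnreservedCharRe;
my $URIHostRe   = '(?:' . $URICommonRe . '|' . $URISubDelimsCharRe . ')+';
my $URIRe       = '(?:' . $URICommonRe . '|' . $URIReservedCharRe . ')+';

# Hypothetical helper: runs the step 2 normalisation on a message body and
# returns the cleaned host strings.
sub extract_uri_hosts {
    my ($body) = @_;
    my @hosts;
    while ($body =~ /(?:ht|f)tps?[\041-\176]{0,3}\:\/{1,3}($URIRe)|((?:www|ftp)(?:\=2e|\&\#0?46\;?|\.)$URIRe)/gi) {
        my $uri = $1 || $2;
        $uri =~ s/$URIContinuationRe\.?//g;           # strip QP soft line breaks
        $uri =~ s/\=([a-f0-9]{2})/chr(hex($1))/gie;   # decode quoted-printables
        $uri =~ s/\%40/@/g;                           # decode 'at' character
        $uri =~ s/\&\#0?64\;?/@/g;
        next unless $uri =~ /(?:[^\s\/\@]+\@)?($URIHostRe)/i;
        $uri = $1;
        $uri =~ s/[\'\)]+$//;                         # fix HTML
        $uri =~ s/\%([a-f0-9]{2})/chr(hex($1))/gie;   # decode percents
        $uri =~ s/\&\#(\d{1,3})\;?/chr($1)/ge;        # decode &#ddd's
        $uri =~ s/\.{2,}/\./g;                        # strip redundant dots
        $uri =~ s/^\.//;
        $uri =~ s/\.$//;
        $uri =~ s/$URISubDelimsCharRe//g;             # strip leftover sub-delims
        push @hosts, $uri;
    }
    return @hosts;
}

print join("\n", extract_uri_hosts("Visit http://www=2eexample=2ecom/offer today")), "\n";
```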


Przemek


_______________________________________________
Assp-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/assp-devel

Re: URIBL patch - update

Przemek Czerkas
- fixes more false positives (FPs)

REPLACE (in the code added by step 2 of the original patch):
    $uri=~s/[\'\)]+$//; # fix HTML

WITH:
    # fix HTML
    $uri=~s/[\'\)]+$//;
    $uri=~s/\&(?:nbsp|amp|quot|gt|lt)\;?//gi;
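To see why the entity stripping matters, consider a host captured with a trailing HTML entity. $URIHostRe accepts "&" and ";" as sub-delims, so "&quot;" stays attached to the host, and the final sub-delim strip in step 2 then removes only the "&" and ";", leaving "quot" glued to the domain. The sketch below compares the result with and without the new line; clean_host is a hypothetical helper that keeps only the relevant clean-up lines (the percent and &#ddd; decoding steps are left out because they do not change this input).

```perl
use strict;
use warnings;

# Sub-delims class from step 1 of the patch.
my $URISubDelimsCharRe = '[\!\$\&\'\(\)\*\+\,\;\=\%\^\`\{\}\|]';

# Hypothetical helper: tail of the step 2 clean-up, with or without the
# entity-stripping line added by this update.
sub clean_host {
    my ($uri, $strip_entities) = @_;
    $uri =~ s/[\'\)]+$//;                                        # fix HTML
    $uri =~ s/\&(?:nbsp|amp|quot|gt|lt)\;?//gi if $strip_entities;
    $uri =~ s/$URISubDelimsCharRe//g;                            # strip leftover sub-delims
    return $uri;
}

my $captured = 'www.example.com&quot;';
print clean_host($captured, 0), "\n";   # www.example.comquot (before the update)
print clean_host($captured, 1), "\n";   # www.example.com (after the update)
```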

