
Time-out, workers get stuck and increased cpu usage

assp-3

Dear All,

 

I’ve been using ASSP for two months, and on the whole it does a great job.

Unfortunately, I keep hitting an intermittent error and need some help resolving it.

My configuration essentially follows this guide: https://vorkbaard.nl/installing-assp-spamfilter-on-ubuntu-server-14-04-lts/

The problem is that various time-out errors occur almost every day, 1 to 10 per day, mostly on incoming connections but regularly on outgoing ones as well. (ASSP handles around 10,000 to 15,000 mails per day.)

The most serious case is when a local-domain user tries to send a mail and the message cannot be sent:

ASSP log:

Jan-24-17 18:00:41 [Worker_1] 10.125.201.11 info: authentication - login is used

Jan-24-17 18:00:41 m1-77241-04754 [Worker_1] 10.125.201.11 <”sender address”> info: found message size announcement: 58.33 kByte

Jan-24-17 18:03:53 m1-77241-04754 [Worker_1] 10.125.201.11 <”sender address”> to: „recipient address” Connection idle for 180 secs - timeout

Jan-24-17 18:03:53 m1-77241-04754 [Worker_1] 10.125.201.11 < sender address > to: recipient address [SMTP Status] 451 Connection timeout, try later
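To put numbers on the first log excerpt, here is a small worked calculation in Python. It only uses the two timestamps and the 180-second figure from the log lines above. The timestamp format string is my reading of how ASSP prints dates in this excerpt, and the conclusion assumes the idle counter restarts on every bit of socket activity, which I have not verified against ASSP's code.

```python
from datetime import datetime

# Format inferred from the log lines above, e.g. "Jan-24-17 18:00:41".
ASSP_TS_FORMAT = "%b-%d-%y %H:%M:%S"


def seconds_between(start, end):
    """Return the whole seconds elapsed between two ASSP-style timestamps."""
    t0 = datetime.strptime(start, ASSP_TS_FORMAT)
    t1 = datetime.strptime(end, ASSP_TS_FORMAT)
    return int((t1 - t0).total_seconds())


# Size announcement logged at 18:00:41, timeout logged at 18:03:53.
elapsed = seconds_between("Jan-24-17 18:00:41", "Jan-24-17 18:03:53")

# From "Connection idle for 180 secs - timeout".
idle_timeout = 180

# Time during which the session was still seeing activity (assumption:
# the idle counter is reset by each read/write on the connection).
active = elapsed - idle_timeout

print(f"elapsed: {elapsed} s, active before going idle: {active} s")
```

If that assumption holds, the session went quiet roughly 12 seconds after the size announcement and then sat idle for the full 180 seconds.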

Here is another ASSP log. As you can see, the MTA (the Postfix daemon) never receives the message: the SMTP session does not reach the DATA command:

Jan-31-17 22:11:35 m1-96903-05919 [Worker_3] 192.168.168.2 <sender address> to: recipient address Connection idle for 180 secs - timeout

Jan-31-17 22:11:35 m1-96903-05919 [Worker_3] 192.168.168.2 <sender address> to: recipient address [SMTP Status] 451 Connection timeout, try later

Jan-31-17 22:11:35 m1-96903-05919 [Worker_3] 192.168.168.2 <sender address> to: recipient address disconnected: session:100D1A00 192.168.168.2 - command list was 'EHLO,AUTH,MAIL FROM,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO' - used 14 SocketCalls - processing time 0 seconds
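For anyone who wants to pull these sessions out of a larger log file, here is a rough Python sketch. The regular expressions are derived only from the log lines quoted above, not from any ASSP documentation, so the field layout is an assumption and may need adjusting for other log settings. Lines without a message ID (such as the "authentication - login is used" line) are simply skipped.

```python
import re

# Sample lines copied from the ASSP log above (addresses are redacted placeholders).
SAMPLE = """\
Jan-31-17 22:11:35 m1-96903-05919 [Worker_3] 192.168.168.2 <sender address> to: recipient address Connection idle for 180 secs - timeout
Jan-31-17 22:11:35 m1-96903-05919 [Worker_3] 192.168.168.2 <sender address> to: recipient address [SMTP Status] 451 Connection timeout, try later
Jan-31-17 22:11:35 m1-96903-05919 [Worker_3] 192.168.168.2 <sender address> to: recipient address disconnected: session:100D1A00 192.168.168.2 - command list was 'EHLO,AUTH,MAIL FROM,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO' - used 14 SocketCalls - processing time 0 seconds
"""

# Assumed layout: "<date> <time> <message id> [<worker>] <ip> <rest>".
LINE_RE = re.compile(r"^(\S+ \S+) (m1-\S+) \[(Worker_\d+)\] (\S+) (.*)$")
IDLE_RE = re.compile(r"Connection idle for (\d+) secs - timeout")
CMDS_RE = re.compile(r"command list was '([^']*)'")


def summarize(log_text):
    """Group log lines by message ID and collect idle timeout and command list."""
    sessions = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # line has no message ID or a different layout
        _stamp, msg_id, worker, ip, rest = match.groups()
        info = sessions.setdefault(
            msg_id,
            {"worker": worker, "ip": ip, "idle_secs": None, "commands": None},
        )
        idle = IDLE_RE.search(rest)
        if idle:
            info["idle_secs"] = int(idle.group(1))
        cmds = CMDS_RE.search(rest)
        if cmds:
            info["commands"] = cmds.group(1).split(",")
    return sessions


def timed_out_before_data(sessions):
    """Return message IDs that hit the idle timeout without ever sending DATA."""
    return [
        msg_id
        for msg_id, s in sessions.items()
        if s["idle_secs"] is not None
        and s["commands"] is not None
        and "DATA" not in s["commands"]
    ]


sessions = summarize(SAMPLE)
print(timed_out_before_data(sessions))
```

Note that the command list is only printed with the extra debugging turned on, so sessions logged without it will not be reported by `timed_out_before_data`.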

Postfix log:

Jan 24 18:00:41 vpss-mail postfix/smtpd[106559]: connect from vpss-mail[10.125.x.x]

Jan 24 18:00:41 vpss-mail postfix/smtpd[106559]: 9235C1609FA: client=vpss-mail[10.125.x.x], sasl_method=LOGIN, sasl_username=xy@localhost

Jan 24 18:03:53 vpss-mail postfix/smtpd[106559]: lost connection after RCPT from vpss-mail[10.125.x.x]

Jan 24 18:03:53 vpss-mail postfix/smtpd[106559]: disconnect from vpss-mail[10.125.x.x] ehlo=1 auth=1 mail=1 rcpt=4 commands=7
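The same check can be done from the Postfix side. This is a minimal Python sketch that counts at which SMTP stage smtpd lost the connection and reads the per-session command counters from the "disconnect" line. The patterns are based only on the two line shapes shown above, and the counter values are kept as strings because I am not assuming they are always plain integers.

```python
import re
from collections import Counter

# Sample lines copied from the Postfix log above (IP addresses are redacted).
SAMPLE = """\
Jan 24 18:00:41 vpss-mail postfix/smtpd[106559]: connect from vpss-mail[10.125.x.x]
Jan 24 18:03:53 vpss-mail postfix/smtpd[106559]: lost connection after RCPT from vpss-mail[10.125.x.x]
Jan 24 18:03:53 vpss-mail postfix/smtpd[106559]: disconnect from vpss-mail[10.125.x.x] ehlo=1 auth=1 mail=1 rcpt=4 commands=7
"""

LOST_RE = re.compile(r"postfix/smtpd\[(\d+)\]: lost connection after (\S+) from")
DISC_RE = re.compile(r"postfix/smtpd\[(\d+)\]: disconnect from \S+ (.*)$")


def lost_stages(log_text):
    """Count how often the connection was lost after each SMTP stage."""
    return Counter(m.group(2) for m in LOST_RE.finditer(log_text))


def disconnect_counters(log_text):
    """Map smtpd PID -> {command: count} taken from 'disconnect from' lines."""
    result = {}
    for line in log_text.splitlines():
        match = DISC_RE.search(line)
        if match:
            pairs = (item.split("=", 1) for item in match.group(2).split())
            result[match.group(1)] = {key: value for key, value in pairs}
    return result


print(lost_stages(SAMPLE))
print(disconnect_counters(SAMPLE))
```

For the session above this shows one connection lost after RCPT and no `data` counter at all, which matches what the ASSP log says: the session never got as far as DATA.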

Exchange server’s send connector log (smart host: ASSP), after the usual connection and the MAIL FROM (OK) and RCPT TO (OK) steps:

2017-01-24T17:03:18.761Z,”ASSP hostname” ,08D3EBC54E87F0B0,34,10.125.201.11:35812,10.125.x.x:25,*,,"HandleError has encountered a suspicious connection reset from a remote, non-mailbox transport server (will retry in 00:10:00)."

 

After such an error, the affected ASSP worker gets stuck in the ThreadGetNewCon state with a loop age of 0 sec, and it never changes again. Ten minutes later the Exchange server retries the mail, the same error occurs, another worker gets stuck in the ThreadGetNewConnection state, and so on. At the same time, the assp.perl process starts to consume a lot of CPU. The Ubuntu server is a guest on a VMware ESXi host. The guest normally uses around 50-150 MHz of CPU, but after the first error it jumps to 7000-8000 MHz and stays there until the ASSP service is restarted.
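To catch the moment the CPU usage jumps without watching the ESXi graphs, something like the following Python sketch could be run inside the Ubuntu guest. It is not part of ASSP. It reads the standard Linux /proc/<pid>/stat file, and the PID has to be supplied by whoever runs it. It reports a percentage of one core as seen from inside the guest, not the MHz figure that ESXi shows.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so cut at the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_secs, ticks_per_sec):
    """CPU usage over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / ticks_per_sec / interval_secs


def sample(pid, interval_secs=5.0):
    """Measure the CPU usage of a running process (Linux only)."""
    path = f"/proc/{pid}/stat"
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    with open(path) as handle:
        before = cpu_ticks(handle.read())
    time.sleep(interval_secs)
    with open(path) as handle:
        after = cpu_ticks(handle.read())
    return cpu_percent(before, after, interval_secs, ticks_per_sec)
```

Calling `sample(pid)` in a loop and logging the result with a timestamp would make it possible to line up the jump with the time-out entries in the ASSP log. A multi-threaded process can report more than 100%.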

99% of ASSP’s local domains live on remote Exchange servers that use ASSP as their smart host; they connect over the internet to the ASSP server’s public IP address (SASL auth to Postfix). A few domains are hosted on the same ESXi host as the ASSP + Postfix Ubuntu guest. The error occurs in both cases: on the local intranet inside the ESXi host and on connections from remote Exchange servers over the internet.

 

What I’ve done so far:

First, I turned TLS off completely (set to droptls) and disabled damping.

I created dedicated network interfaces inside the ESXi host to connect the local Exchange servers and ASSP.

I changed the Postfix configuration several times; here is the current one (main.cf):

 

smtpd_relay_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination
maximal_queue_lifetime = 1d
bounce_queue_lifetime = 1d
smtpd_banner = „x.x.hu”
biff=no
append_dot_mydomain = no
readme_directory = no
smtpd_tls_security_level=none
smtpd_relay_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination
myhostname = „x.y.hu”
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
mydestination =
relayhost =
mynetworks = 127.0.0.0/8 10.125.x.x/32 192.168.x.x [::ffff:127.0.0.0]/104 [::1]/128
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
inet_protocols = all
smtpd_client_restrictions = permit_mynetworks, reject
smtpd_delay_reject = no
transport_maps = hash:/etc/postfix/transport
smtpd_recipient_restrictions = permit_sasl_authenticated, permit_mynetworks, reject_unauth_destination
smtpd_sender_restrictions = permit_sasl_authenticated, permit_mynetworks, reject_unknown_sender_domain
smtpd_sasl_auth_enable = yes
broken_sasl_auth_clients = yes
smtpd_sasl_security_options = noanonymous
smtpd_sasl_local_domain =
message_size_limit = 41943040

# Virtual Mailbox Domain Settings
virtual_alias_maps = mysql:/etc/postfix/mysql_virtual_alias_maps.cf
virtual_mailbox_domains = mysql:/etc/postfix/mysql_virtual_domains_maps.cf
virtual_mailbox_maps = mysql:/etc/postfix/mysql_virtual_mailbox_maps.cf
virtual_mailbox_limit = 51200000
virtual_minimum_uid = 5000
virtual_uid_maps = static:5000
virtual_gid_maps = static:5000
virtual_mailbox_base = /home/vmail
virtual_transport = virtual
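One thing that stands out in the main.cf above is that smtpd_relay_restrictions is set twice. Postfix only remembers the last definition of a parameter, and here both values are identical, so this is probably harmless and unlikely to be related to the time-outs. Still, after several rounds of config changes it is easy to end up with conflicting duplicates. A minimal Python sketch to spot them follows. The sample is a shortened excerpt of the config above, and the parser is deliberately simplified: it skips comments and continuation lines but does not expand `$name` references.

```python
from collections import defaultdict

# Shortened excerpt of the main.cf posted above.
SAMPLE_MAIN_CF = """\
smtpd_relay_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination
maximal_queue_lifetime = 1d
smtpd_tls_security_level=none
smtpd_relay_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination
# Virtual Mailbox Domain Settings
virtual_transport = virtual
"""


def find_duplicates(main_cf_text):
    """Return {parameter: [line numbers]} for parameters defined more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(main_cf_text.splitlines(), start=1):
        # Skip blank lines, comments, and continuation lines (which start
        # with whitespace and belong to the previous parameter).
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        if "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        seen[name].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


print(find_duplicates(SAMPLE_MAIN_CF))
```

To check a real file, read /etc/postfix/main.cf into a string and pass it to `find_duplicates`.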

 

I turned on all the debugging functions in ASSP (the second ASSP log above is an example):

command list was 'EHLO,AUTH,MAIL FROM,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO,RCPT TO' - used 14 SocketCalls - processing time 0 seconds

DNS response times are good: I use two local Windows Server 2012 R2 DNS servers, and response times are 0 and 1 ms.
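For reference, DNS response times like these can also be measured from the ASSP host itself with a few lines of Python. This is only a sketch. It goes through the system resolver, so /etc/hosts entries and any local cache are included in the measurement, and it does not query a specific DNS server directly. The hostname in the example is a placeholder.

```python
import socket
import time


def time_lookup(hostname):
    """Return the lookup time in milliseconds, or None if resolution fails."""
    start = time.perf_counter()
    try:
        # Port 25 because the lookups of interest here are for SMTP peers.
        socket.getaddrinfo(hostname, 25, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return None
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Placeholder name; replace with the mail hosts that matter.
    for name in ["localhost"]:
        elapsed_ms = time_lookup(name)
        if elapsed_ms is None:
            print(f"{name}: lookup failed")
        else:
            print(f"{name}: {elapsed_ms:.1f} ms")
```

Running this at the moment a worker gets stuck would show whether name resolution is slow at that specific time, which an average response time would hide.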

The server load is minimal. The error also appears when mail traffic is almost zero, all the workers are sleeping, and no other SMTP connections are open, so I don’t think it is a performance issue.

The ASSP version is the current one, and I’ve already updated the Perl modules.

Perl Version:       5.022001

 

Sometimes there are also timeouts on incoming connections from the internet, like this:

190.252.20.18 <[hidden email]> to: ..@...hu info: file data_disk/spam/Your_order_Canceled_fraud--131742.eml was deleted - reason: MTA reply 421 4.4.2 „ASSP+postfix server name” Error: timeout exceeded

 

Please give me some advice :)

 

Thank you!

 

Csanad

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Assp-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/assp-user

Re: Time-out, workers get stuck and increased cpu usage

Robert K Coffman Jr. -Info From Data Corp.
I think I might be experiencing the same issue, but I didn't make the
connection to stuck workers.  I've been restarting ASSP to work around
it.  I'm on 2.5.6(17013).  The previous version (and possibly one prior
to that) would peg the processor.  This version, processor utilization
goes up but not to 100%.  I'll see if I can confirm that I'm also seeing
workers stuck.

- Bob Coffman



Re: Time-out, workers get stuck and increased cpu usage

Robert K Coffman Jr. -Info From Data Corp.
Further information.  Unsure if related.

I did not restart ASSP today, hoping to trigger this condition.

At around 12:11, ASSP died with this error:

  Warning: got unexpected signal ALRM in Main_Thread: package - main, file - sub main::ConToThread, line - 80!

The last line before this was a call to ASSP_AFC.  I had plugin version
4.40.  I upgraded that to 4.45 and I'm monitoring it.

- Bob

