Unable to detect any running worker


MK
Using ASSP CVS 2.5.6/17184.

I have a server that pumps about 1800 messages into a queue, and Exim on that server makes connections to ASSP to forward the mail. Basically, ASSP is the outgoing mail server.

It gets through about 140 messages, at which point the SMTP connections time out (per Exim's logs). I'm not sure what concurrency Exim generates, but the connections to the proxy SMTP server it sends to reach about 40 right away and then drop off (so I assume my concurrency is about 40).

Meanwhile, ASSP shows:
...[all is fine to here]...
Jul-07-17 08:27:46 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
...[repeated]...
Jul-07-17 08:27:47 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
Jul-07-17 08:27:47 [Main_Thread] Info: ConnectionTransferTimeOut (30 seconds) is now reached
Jul-07-17 08:27:47 [Main_Thread] Warning: Main_Thread is unable to transfer connection to any worker - try again!
Jul-07-17 08:27:47 [Main_Thread] Error: Main_Thread is unable to transfer connection to any worker within 120 seconds - restart ASSP!
Jul-07-17 08:27:47 [Main_Thread] Initializing shutdown sequence
Jul-07-17 08:27:47 [Shutdown] Info: removing all SMTP and Proxy listeners
Jul-07-17 08:27:47 [Worker_4] Info: shutdown: Worker_4: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_3] Info: shutdown: Worker_3: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_5] Info: shutdown: Worker_5: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_3] Worker_3 finished
Jul-07-17 08:27:47 [Worker_4] Worker_4 finished
Jul-07-17 08:27:47 [Worker_5] Worker_5 finished
Jul-07-17 08:27:47 [Worker_2] Info: shutdown: Worker_2: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_2] Worker_2 finished
Jul-07-17 08:27:47 [Shutdown] Waiting for all SMTP-Workers to be finished
Jul-07-17 08:27:47 [Worker_1] Info: shutdown: Worker_1: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.

Once ASSP restarts and the retry interval is reached, delivery is tried again, makes it through about 200 messages, and then ends the same way.


Of course what it's doing is flooding ASSP with SMTP connections.
The host is in AcceptAllMail (yes, I know we're not using relayport, but we need to make sure the SMTP server can handle a flood of connections gracefully).
maxSMTPSessions is 64 and MaxSMTPipSessions is 15 and, given the status of the workers, I don't think it's hitting those limits.
5 NumComWorkers (SMTP Threads), EnableHighPerformance (off), ThreadCycleTime (3000), IO::Poll engine.
Using a local bind resolver which shows nothing strange.

Any thoughts?

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Assp-test mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/assp-test

Re: Unable to detect any running worker

Colin

There's a lot to consider here.

First up, you could do with figuring out the rate at which Exim is dumping mail into the queue, then throttling it in Exim to see if the problem goes away. That way you'll know whether the problem is the rate or the number/type of emails. Look at things like smtp_accept_max or maybe queue_smtp_domains to make deliveries go through the queue rather than open up a new SMTP thread for every message.

Size of email and encoding/attachments will likely make things take longer.
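
As a rough way to get at the rate, something like the Python sketch below could count delivery lines per minute in Exim's main log. It assumes the default log layout (date, time, message ID, then a flag such as `=>` or `->` for a delivery); the function name and the sample lines are made up for illustration, so adjust to whatever your log actually contains.

```python
from collections import Counter


def deliveries_per_minute(lines):
    """Count Exim delivery lines ('=>' or '->') per minute.

    Assumes the default main-log layout: DATE TIME MESSAGE-ID FLAG ...
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 4 and parts[3] in ("=>", "->"):
            minute = f"{parts[0]} {parts[1][:5]}"  # "YYYY-MM-DD HH:MM"
            counts[minute] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "2017-07-07 08:25:01 1dTVa1-0000aa-01 <= list@example.org H=localhost [127.0.0.1] P=esmtp S=2048",
    "2017-07-07 08:25:02 1dTVa1-0000aa-01 => a@example.com R=smarthost T=remote_smtp H=assp.example.net [192.0.2.10]",
    "2017-07-07 08:25:40 1dTVa2-0000ab-02 => b@example.com R=smarthost T=remote_smtp H=assp.example.net [192.0.2.10]",
    "2017-07-07 08:26:03 1dTVa3-0000ac-03 => c@example.com R=smarthost T=remote_smtp H=assp.example.net [192.0.2.10]",
    "2017-07-07 08:26:05 1dTVa3-0000ac-03 Completed",
]

if __name__ == "__main__":
    for minute, n in sorted(deliveries_per_minute(sample).items()):
        print(minute, n)
```

Run against the real log before and after throttling, the per-minute counts should show whether the restarts line up with bursts or happen regardless of rate.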

Secondly, you need to use debugging to find out if something is happening that is causing ASSP to take a long time to handle messages.

Thirdly, resources. I've a max of 112 concurrent connections showing on the stats page, though it is only 43 since the last restart on Sunday, so the general average is lower. I have two VMs running Ubuntu with 16 vCPUs each: 12GB on the primary and 16GB on the secondary, as this runs the rebuild. MySQL is on a separate machine, again with 16 vCPUs, and 8GB RAM.

So ASSP can easily handle the throughput you're looking at and more; you need to look for bottlenecks and other errors. The actual issue will have occurred at least 30s before the logs you have posted, i.e. at 08:27:17 or earlier, as that is when the timeout counter that expired at 08:27:47 started.
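
To make that arithmetic concrete, here is a small Python sketch that works backwards from the line logging the timeout. The timestamp format string is inferred from the posted log lines (month-day-year, then time), and the function name is just illustrative.

```python
from datetime import datetime, timedelta

# Inferred from the posted log lines, e.g. "Jul-07-17 08:27:47".
ASSP_TS_FORMAT = "%b-%d-%y %H:%M:%S"


def wait_started_at(timeout_line, timeout_seconds=30):
    """Return when the wait must have begun, given the line that logs its expiry."""
    stamp = " ".join(timeout_line.split()[:2])
    expired = datetime.strptime(stamp, ASSP_TS_FORMAT)
    return expired - timedelta(seconds=timeout_seconds)


line = ("Jul-07-17 08:27:47 [Main_Thread] Info: "
        "ConnectionTransferTimeOut (30 seconds) is now reached")

if __name__ == "__main__":
    print(wait_started_at(line).strftime("%H:%M:%S"))  # 08:27:17
```

Passing `timeout_seconds=120` (the figure in the "restart ASSP" error line) gives 08:25:47 instead; which of the two applies depends on how ASSP counts its retries, which the posted excerpt does not show. Either way, the log lines worth reading are the ones before that point.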

"Cannot pack NaN" makes me suspicious as well, for the usual reasons: check that all Perl modules and ancillary files are up to date, as well as the main assp.pl. Something isn't right.

Then there's another question about the config. Is there a particular reason the Exim server needs to run through ASSP? All my servers accept email then hand off to Exim for delivery. There are plenty of servers that use ASSP as a smart host, but I'd question putting a server that dumps mail like that through it. The reason is to think about the types of emails and the effect on the corpus. If you're dumping a mailing list through, then you're going to affect the Bayes/HMM database. You could redlist, but then why waste the resources and not just have Exim send direct?

I know it's been a week or so since you posted; hopefully you've done some or all of that by now, as it is fairly standard troubleshooting rather than anything specific to ASSP. If you've confirmed your setup is in order and can pull some logs that show ASSP actually causing a problem, then that's what the list is for.

All the best,
Colin.

On 07/07/2017 16:39, MK wrote:
[snip]

