Dependency Graph

Dependency Graph
related to related to child of child of duplicate of duplicate of

View Issue Details

IDProjectCategoryView StatusLast Update
0003026SOGoBackend Generalpublic2015-01-10 10:19
ReporterSigurd Holter Assigned Toludovic  
PriorityimmediateSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
Platform[Server] LinuxOSRHEL/CentOSOS Version7
Product Version2.2.11a 
Fixed in Version2.2.14 
Summary0003026: WOWatchDogRequestTimeout stopped working with 2.2.11a
Description

After upgrading to 2.2.11a WOWatchDogRequestTimeout = 120 is not honored any more, and processes are killed after 10 minutes.

TagsNo tags attached.

Relationships

related to 0003051 resolvedludovic ActiveSync raises exceptions, child terminated with signal 11 after 10 minutes 

Activities

ludovic

ludovic

2014-12-16 16:38

administrator   ~0007846

That code has NOT changed since March 2011!

I've just tried with the following values:

WOWatchDogRequestTimeout = 120;
SOGoMailCustomFromEnabled = YES;
SOGoMaximumPingInterval = 3600;
SOGoMaximumSyncInterval = 3600;
SOGoInternalSyncInterval = 60;

and Apache proxy configuration:

ProxyPass /Microsoft-Server-ActiveSync http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync retry=0 connectiontimeout=5 timeout=3600

and it works as expected. The daemon was running well after 10 mins.

...
Dec 16 17:36:07 sogod [16855]: [WARN] <0x0x7fd1501225d0[WOWatchDogChild]> pid 16856 has been hanging in the same request for 10 minutes
Dec 16 17:37:07 sogod [16855]: [WARN] <0x0x7fd1501225d0[WOWatchDogChild]> pid 16856 has been hanging in the same request for 11 minutes
Dec 16 17:38:07 sogod [16855]: [WARN] <0x0x7fd1501225d0[WOWatchDogChild]> pid 16856 has been hanging in the same request for 12 minutes
...

Sigurd Holter

Sigurd Holter

2014-12-16 17:34

reporter   ~0007850

I get repeated occurences (38 times between 0330 and 1030 today) of this sequence :

Dec 16 09:50:22 sogod [15628]: |SOGo| starting method 'OPTIONS' on uri '/SOGo/Microsoft-Server-ActiveSync?User=benjamin&DeviceId=B4389A49654394ACF8174149614DDA60&DeviceType=WP8'
2014-12-16 09:50:22.179 sogod[15628] -[WEClientCapabilities initWithRequest:]: Unknown WebClient: user-agent='MSFT-WP/8.10.12393'
Dec 16 09:50:22 sogod [15628]: |SOGo| request took 0.013008 seconds to execute
Dec 16 09:50:22 sogod [15628]: |SOGo| starting method 'POST' on uri '/SOGo/Microsoft-Server-ActiveSync?jREUBBC0OJpJZUOUrPgXQUlhTdpgBAAAAAADV1A4'
Dec 16 09:50:22 sogod [15628]: |SOGo| request took 0.001679 seconds to execute
Dec 16 09:50:22 sogod [15628]: |SOGo| starting method 'POST' on uri '/SOGo/Microsoft-Server-ActiveSync?jQkUBBC0OJpJZUOUrPgXQUlhTdpgBAAAAAADV1A4'
Dec 16 09:50:22 sogod [15628]: <0x0x7ffb8a20b110[NGImap4Client]> TLS started successfully.
Dec 16 09:50:23 sogod [15628]: |SOGo| request took 1.165617 seconds to execute
Dec 16 09:50:23 sogod [15628]: |SOGo| starting method 'POST' on uri '/SOGo/Microsoft-Server-ActiveSync?jQAUBBC0OJpJZUOUrPgXQUlhTdpgBAAAAAADV1A4'
Dec 16 09:51:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 1 minutes
Dec 16 09:52:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 2 minutes
Dec 16 09:53:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 3 minutes
Dec 16 09:54:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 4 minutes
Dec 16 09:55:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 5 minutes
Dec 16 09:56:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 6 minutes
Dec 16 09:57:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 7 minutes
Dec 16 09:58:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 8 minutes
Dec 16 09:59:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 9 minutes
Dec 16 10:00:23 sogod [18704]: [WARN] <0x0x7ffb890b3180[WOWatchDogChild]> pid 15628 has been hanging in the same request for 10 minutes
Dec 16 10:00:27 sogod [18704]: <0x0x7ffb890b3180[WOWatchDogChild]> child 15628 exited
Dec 16 10:00:27 sogod [18704]: <0x0x7ffb890b3180[WOWatchDogChild]> (terminated due to signal 11)
Dec 16 10:00:27 sogod [18704]: <0x0x7ffb88e38cd0[WOWatchDog]> child spawned with pid 16763

I have in sogo.conf :

WOWatchDogRequestTimeout = 120;

I have also added to the startup script (to see if it helps) :

DAEMON_OPTS="-WOWorkersCount $PREFORK -WOPidFile $PIDFILE -WOLogFile $LOGFILE -WOWatchDogRequestTimeout 120"

ps aux shows :

sogo 18737 0.0 0.0 429748 16364 ? Ss des.14 0:01 /usr/sbin/sogod -WOWorkersCount 30 -WOPidFile /var/run/sogo/sogo.pid -WOLogFile /var/log/sogo/sogo.log -WOWatchDogRequestTimeout 120

... and most importantly of all - the setup and sogo.conf file have not changed for a month or more.
It is the first time the WOWatchDogRequestTimeout has been ignored.
When first installing SOGo I started out with 10 minutes, but due to large mailboxes I have gradually been increasing this.
So this is not a sudden change on our part.

greping sogo.log shows NO "has been hanging for 11 minutes" etc., it always exits after 10 minutes.

What else can be causing this ?

ludovic

ludovic

2014-12-16 18:22

administrator   ~0007851

The child existed with a segfault - signal 11. It was NOT terminated by the partent.

Attach using gdb to a "hanging child' and produce a stacktrace when that happens:

http://www.sogo.nu/fr/nc/support/faq/article/how-do-i-debug-sogo.html

Sigurd Holter

Sigurd Holter

2014-12-18 10:42

reporter  

backtrace.log.tar.gz (989,279 bytes)
Sigurd Holter

Sigurd Holter

2014-12-18 10:44

reporter   ~0007875

Backtrace uploaded. Note that crash occured after about 20 minutes this time.

Sigurd Holter

Sigurd Holter

2014-12-18 10:46

reporter   ~0007876

Last edited: 2014-12-18 10:46

Same crash occurs after 10 minutes when same user uses Outlook 2013 with ActiveSync - same account/server etc.

ludovic

ludovic

2014-12-18 13:40

administrator   ~0007878

Could you be running out of RAM?

Sigurd Holter

Sigurd Holter

2014-12-18 13:55

reporter   ~0007879

I thought it was a memory issue, too, at first and set SxVMemLimit = 8192

But no, doesn't seem that way. There's still 4+ GB free when SOGOd process claims about 2 GB. And no OOM messages either.

It's the same server/setup I've been using since August, and it runs fine in production for other users on Thunderbird etc.

ludovic

ludovic

2014-12-30 17:45

administrator   ~0007946

Try with 2.2.13 with the new SOGoMaximumSyncResponseSize configuration parameter.

Sigurd Holter

Sigurd Holter

2015-01-05 11:50

reporter   ~0007988

Sorry, I see I should have responded to this issue instead of posting a new bug at 0003051.

But anyways, the issue persists in 2.2.13 with SOGoMaximumSyncResponseSize=10240.
Process stil claims about 2GB and there's free RAM on the server on exit.

Backtraces have been uploaded to 0003051.

Issue History

Date Modified Username Field Change
2014-12-15 13:39 Sigurd Holter New Issue
2014-12-16 16:38 ludovic Note Added: 0007846
2014-12-16 16:38 ludovic Status new => closed
2014-12-16 16:38 ludovic Assigned To => ludovic
2014-12-16 16:38 ludovic Resolution open => unable to reproduce
2014-12-16 17:34 Sigurd Holter Note Added: 0007850
2014-12-16 17:34 Sigurd Holter Status closed => feedback
2014-12-16 17:34 Sigurd Holter Resolution unable to reproduce => reopened
2014-12-16 18:22 ludovic Note Added: 0007851
2014-12-18 10:42 Sigurd Holter File Added: backtrace.log.tar.gz
2014-12-18 10:44 Sigurd Holter Note Added: 0007875
2014-12-18 10:44 Sigurd Holter Status feedback => assigned
2014-12-18 10:46 Sigurd Holter Note Added: 0007876
2014-12-18 10:46 Sigurd Holter Note Edited: 0007876
2014-12-18 13:40 ludovic Note Added: 0007878
2014-12-18 13:55 Sigurd Holter Note Added: 0007879
2014-12-30 17:45 ludovic Note Added: 0007946
2015-01-05 11:50 Sigurd Holter Note Added: 0007988
2015-01-09 11:22 Christian Mack Relationship added related to 0003051
2015-01-10 10:19 ludovic Status assigned => resolved
2015-01-10 10:19 ludovic Fixed in Version => 2.2.14
2015-01-10 10:19 ludovic Resolution reopened => fixed