|
Hi @qhivert, I've had SOGo hanging a few times yesterday and today, and after looking closely at the SOGo logs and the matching rproxy logs, I think I have some patterns.
I see some heavy users of the calendar doing some REPORT requests, and soon after, those threads die and stop taking requests. These requests come from client "dataaccessd" (Apple devices), which first appears to do an unauthenticated REPORT request, which fails, and then retries with authentication.
See the following sequence: (rproxy logs first, then sogo)
X.Y.Z.141 - - [18/Feb/2025:07:44:34 +0000] "OPTIONS /SOGo/dav/user8/ HTTP/2.0" 401 0 "-" "macOS/15.0.1 (24A348) dataaccessd/1.0"
X.Y.Z.141 - user8 [18/Feb/2025:07:44:34 +0000] "OPTIONS /SOGo/dav/user8/ HTTP/2.0" 200 0 "-" "macOS/15.0.1 (24A348) dataaccessd/1.0"
X.Y.Z.141 - - [18/Feb/2025:07:44:34 +0000] "REPORT /SOGo/dav/user8/ HTTP/2.0" 401 0 "-" "macOS/15.0.1 (24A348) dataaccessd/1.0"
X.Y.Z.141 - user8 [18/Feb/2025:07:44:34 +0000] "REPORT /SOGo/dav/user8/ HTTP/2.0" 207 404 "-" "macOS/15.0.1 (24A348) dataaccessd/1.0"
X.Y.Z.141 - - [18/Feb/2025:07:44:34 +0000] "PROPFIND /SOGo/dav/user8/Calendar/ HTTP/2.0" 401 0 "-" "macOS/15.0.1 (24A348) dataaccessd/1.0"
X.Y.Z.141 - user8 [18/Feb/2025:07:44:34 +0000] "PROPFIND /SOGo/dav/user8/Calendar/ HTTP/2.0" 207 2043 "-" "macOS/15.0.1 (24A348) dataaccessd/1.0"
Feb 18 08:44:34 sogod [1014543]: |SOGo| starting method 'OPTIONS' on uri '/SOGo/dav/user8/'
Feb 18 08:44:34 sogod [1014543]: |SOGo| request took 0.006724 seconds to execute
Feb 18 08:44:34 sogod [1014543]: X.Y.Z.141, 192.168.21.100 "OPTIONS /SOGo/dav/user8/ HTTP/1.1" 401 0/0 0.009 - - 0 - 14
Feb 18 08:44:34 sogod [1014543]: |SOGo| starting method 'OPTIONS' on uri '/SOGo/dav/user8/'
Feb 18 08:44:34 sogod [1014543]: |SOGo| request took 0.003697 seconds to execute
Feb 18 08:44:34 sogod [1014543]: X.Y.Z.141, 192.168.21.100 "OPTIONS /SOGo/dav/user8/ HTTP/1.1" 200 0/0 0.005 - - 0 - 14
Feb 18 08:44:34 sogod [1014543]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user8/'
Feb 18 08:44:34 sogod [1014543]: |SOGo| request took 0.000831 seconds to execute
Feb 18 08:44:34 sogod [1014543]: X.Y.Z.141, 192.168.21.100 "REPORT /SOGo/dav/user8/ HTTP/1.1" 401 0/768 0.002 - - 0 - 14
Feb 18 08:44:34 sogod [1014543]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user8/'
Feb 18 08:44:34 sogod [1014543]: |SOGo| request took 0.034553 seconds to execute
Feb 18 08:44:34 sogod [1014543]: X.Y.Z.141, 192.168.21.100 "REPORT /SOGo/dav/user8/ HTTP/1.1" 207 404/768 0.037 1380 70% 0 - 13
Feb 18 08:44:34 sogod [1014543]: |SOGo| starting method 'PROPFIND' on uri '/SOGo/dav/user8/Calendar/'
Feb 18 08:44:34 sogod [1014543]: |SOGo| request took 0.000864 seconds to execute
Feb 18 08:44:34 sogod [1014543]: X.Y.Z.141, 192.168.21.100 "PROPFIND /SOGo/dav/user8/Calendar/ HTTP/1.1" 401 0/2237 0.002 - - 0 - 13
Feb 18 08:44:34 sogod [1014543]: |SOGo| starting method 'PROPFIND' on uri '/SOGo/dav/user8/Calendar/'
Feb 18 08:44:34 sogod [1014543]: |SOGo| request took 0.041733 seconds to execute
Feb 18 08:44:34 sogod [1014543]: X.Y.Z.141, 192.168.21.100 "PROPFIND /SOGo/dav/user8/Calendar/ HTTP/1.1" 207 2043/2237 0.043 23464 91% 0 - 13
[ ... ]
Feb 18 08:44:48 sogod [1014543]: |SOGo| starting method 'PROPFIND' on uri '/SOGo/dav/user25/Contacts/ldaplocal/'
Feb 18 08:44:48 sogod [1014543]: |SOGo| request took 0.415206 seconds to execute
Feb 18 08:44:48 sogod [1014543]: D.E.F.136, 192.168.21.100 "PROPFIND /SOGo/dav/user25/Contacts/ldaplocal/ HTTP/1.1" 207 2803/231 0.417 51721 94% 0 - 13
Feb 18 08:44:50 sogod [1014543]: |SOGo| starting method 'OPTIONS' on uri '/SOGo/dav/user14/Calendar/mycompany_1AC-674D9580-F5-2EDD2A80/'
Feb 18 08:45:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 1 minutes
Feb 18 08:46:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 2 minutes
Feb 18 08:47:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 3 minutes
Feb 18 08:48:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 4 minutes
Feb 18 08:49:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 5 minutes
Feb 18 08:50:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 6 minutes
Feb 18 08:51:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 7 minutes
Feb 18 08:52:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 8 minutes
Feb 18 08:53:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> pid 1014543 has been hanging in the same request for 9 minutes
Feb 18 08:54:50 sogod [1011081]: [WARN] <0x0x56321433b420[WOWatchDogChild]> safety belt -- sending KILL signal to pid 1014543
Feb 18 08:54:50 sogod [1011081]: <0x0x56321433b420[WOWatchDogChild]> child 1014543 exited
As you can see, the thread 1014543 interacted with dataaccessd and soon after, just 14 seconds later, interacts with another process but then dies until sogod is restarted.
I have seen this with other users. The following one has a lot of subscriptions, and his Apple client did a tsunami of REPORTS. Immediately after that, all 22 threads were dead. (note the missing newlines are verbatim, it's how they were logged in sogo.log, for some reason).
Feb 17 10:23:45 sogod [1004377]: z.x.y.144, 192.168.21.100 "PROPFIND /SOGo/dav/user26/Calendar/ HTTP/1.1" 207 1727/2237 0.020 14696 88% 0 - 14
Feb 17 10:23:50 sogod [1004377]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user23_personal/'Feb 17 10:23:50 sogod [1004360]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user22_personal/'
Feb 17 10:23:50 sogod [1004380]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/personal/'
Feb 17 10:23:50 sogod [1004298]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user5_personal/'
Feb 17 10:23:50 sogod [1004296]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user6_personal/'
Feb 17 10:23:50 sogod [1004374]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/personal/'
Feb 17 10:23:50 sogod [1004365]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/39008A-63FFC180-5-263FAB40/'
Feb 17 10:23:51 sogod [1004369]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user7_personal/'
Feb 17 10:23:51 sogod [1004375]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user8_personal/'
Feb 17 10:23:51 sogod [1004317]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user9_personal/'
Feb 17 10:23:51 sogod [1004363]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/mycompany/Calendar/3E2B-5F152900-1-2789DE40/'
Feb 17 10:23:51 sogod [1004319]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user10_personal/'
Feb 17 10:23:51 sogod [1004314]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user11_personal/'
Feb 17 10:23:51 sogod [1004381]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/3911C3-63FFEC00-1-78222400/'
Feb 17 10:23:51 sogod [1004300]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user12_personal/'
Feb 17 10:23:51 sogod [1004311]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user13_personal/'
Feb 17 10:23:51 sogod [1004297]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/mycompany_personal/'
Feb 17 10:23:51 sogod [1004301]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user14_personal/'
Feb 17 10:23:51 sogod [1004313]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user15_personal/'
Feb 17 10:23:51 sogod [1004303]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user16_personal/'
Feb 17 10:23:51 sogod [1004384]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/mycompany-holidays_personal/' Feb 17 10:23:51 sogod [1004304]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user1/Calendar/user4_personal/'
Feb 17 10:23:52 sogod [1004303]: [ERROR] <0x0x55eae17268c0[LDAPSource]> Could not bind to the LDAP server ldap://sogo.service.mycompany.com (389) using the bind DN: cn=admin,dc=sogo,dc=local,dc=mycompany,dc=com Feb 17 10:23:52 sogod [1004303]: [ERROR] <0x0x55eae17268c0[LDAPSource]> <NSException: 0x55eae1898610> NAME:LDAPException REASON:operation bind failed: Can't contact LDAP server (0xFFFFFFFF) INFO:{"error_code" = "-1"; login = "cn=admin,dc=sogo,dc=local,dc=mycompany,dc=com"; }
Feb 17 10:23:57 sogod [1004306]: |SOGo| starting method 'GET' on uri '/SOGo/so/user18/Calendar/view' Feb 17 10:24:07 sogod [1004302]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user17/Calendar/personal/'Feb 17 10:24:07 sogod [1004385]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user17/Calendar/mycompany-holidays_personal/'
Feb 17 10:24:07 sogod [1004299]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user17/Calendar/mycompany_personal/'
Feb 17 10:24:35 sogod [1004307]: |SOGo| starting method 'OPTIONS' on uri '/SOGo/dav/user8/'
Feb 17 10:24:35 sogod [1004307]: |SOGo| request took 0.000834 seconds to execute
Feb 17 10:24:35 sogod [1004307]: X.Y.Z.141, 192.168.21.100 "OPTIONS /SOGo/dav/user8/ HTTP/1.1" 401 0/0 0.002 - - 0 - 12 Feb 17 10:24:35 sogod [1004307]: |SOGo| starting method 'OPTIONS' on uri '/SOGo/dav/user8/'
Feb 17 10:24:35 sogod [1004307]: |SOGo| request took 0.001694 seconds to execute Feb 17 10:24:35 sogod [1004307]: X.Y.Z.141, 192.168.21.100 "OPTIONS /SOGo/dav/user8/ HTTP/1.1" 200 0/0 0.003 - - 0 - 12
Feb 17 10:24:35 sogod [1004307]: |SOGo| starting method 'GET' on uri '/SOGo/so/user19/Calendar/eventslist?asc=1&day=20250217&filterpopup=view_next7&search=title_Category_Location&sort=start&value=' Feb 17 10:24:35 sogod [1004308]: |SOGo| starting method 'GET' on uri '/SOGo/so/user19/Calendar/eventsblocks?ed=20250223&sd=20250217&view=weekview'
Feb 17 10:24:35 sogod [1004352]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user8/'
Feb 17 10:24:35 sogod [1004308]: |SOGo| request took 0.102340 seconds to execute
Feb 17 10:24:35 sogod [1004352]: |SOGo| request took 0.001017 seconds to execute
Feb 17 10:24:35 sogod [1004352]: X.Y.Z.141, 192.168.21.100 "REPORT /SOGo/dav/user8/ HTTP/1.1" 401 0/768 0.002 - - 0 - 12
Feb 17 10:24:35 sogod [1004308]: A.B.C.44, 192.168.21.100 "GET /SOGo/so/user19/Calendar/eventsblocks?ed=20250223&sd=20250217&view=weekview HTTP/1.1" 200 9412/0 0.106 - - 2M - 13
Feb 17 10:24:35 sogod [1004307]: |SOGo| request took 0.123402 seconds to execute
Feb 17 10:24:35 sogod [1004307]: A.B.C.44, 192.168.21.100 "GET /SOGo/so/user19/Calendar/eventslist?asc=1&day=20250217&filterpopup=view_next7&search=title_Category_Location&sort=start&value= HTTP/1.1" 200 8040/0 0.125 - - 5M - 13
Feb 17 10:24:35 sogod [1004307]: |SOGo| starting method 'REPORT' on uri '/SOGo/dav/user8/'
Feb 17 10:24:35 sogod [1004307]: |SOGo| request took 0.023053 seconds to execute
Feb 17 10:24:35 sogod [1004307]: X.Y.Z.141, 192.168.21.100 "REPORT /SOGo/dav/user8/ HTTP/1.1" 207 400/768 0.025 1380 71% 0 - 13
Feb 17 10:24:35 sogod [1004307]: |SOGo| starting method 'PROPFIND' on uri '/SOGo/dav/user8/Calendar/'
Feb 17 10:24:35 sogod [1004307]: |SOGo| request took 0.000833 seconds to execute
Feb 17 10:24:35 sogod [1004307]: X.Y.Z.141, 192.168.21.100 "PROPFIND /SOGo/dav/user8/Calendar/ HTTP/1.1" 401 0/2237 0.002 - - 0 - 13
Feb 17 10:24:35 sogod [1004307]: |SOGo| starting method 'PROPFIND' on uri '/SOGo/dav/user8/Calendar/'
Feb 17 10:24:35 sogod [1004307]: |SOGo| request took 0.042530 seconds to execute
Feb 17 10:24:35 sogod [1004307]: X.Y.Z.141, 192.168.21.100 "PROPFIND /SOGo/dav/user8/Calendar/ HTTP/1.1" 207 2043/2237 0.044 23464 91% 200K - 13
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae14caeb0[WOWatchDogChild]> pid 1004377 has been hanging in the same request for 1 minutes
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae14bbc30[WOWatchDogChild]> pid 1004360 has been hanging in the same request for 1 minutes
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae14ca7d0[WOWatchDogChild]> pid 1004380 has been hanging in the same request for 1 minutes
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae1470780[WOWatchDogChild]> pid 1004298 has been hanging in the same request for 1 minutes
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae146f830[WOWatchDogChild]> pid 1004296 has been hanging in the same request for 1 minutes
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae14cb100[WOWatchDogChild]> pid 1004374 has been hanging in the same request for 1 minutes
Feb 17 10:24:50 sogod [1004291]: [WARN] <0x0x55eae14bb140[WOWatchDogChild]> pid 1004365 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae14baaf0[WOWatchDogChild]> pid 1004369 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae14cafe0[WOWatchDogChild]> pid 1004375 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae1479280[WOWatchDogChild]> pid 1004317 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae14bb420[WOWatchDogChild]> pid 1004363 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae147a0b0[WOWatchDogChild]> pid 1004319 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae1479520[WOWatchDogChild]> pid 1004314 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae14ca6a0[WOWatchDogChild]> pid 1004381 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae1470130[WOWatchDogChild]> pid 1004300 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae1479a40[WOWatchDogChild]> pid 1004311 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae1478cf0[WOWatchDogChild]> pid 1004297 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae146fec0[WOWatchDogChild]> pid 1004301 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae1479850[WOWatchDogChild]> pid 1004313 has been hanging in the same request for 1 minutes
Feb 17 10:24:51 sogod [1004291]: [WARN] <0x0x55eae146fa40[WOWatchDogChild]> pid 1004303 has been hanging in the same request for 1 minutes
Note we have Evolution clients also doing REPORT requests, but the difference is the unauthenticated request is not done, unlike with Apple, Evo tries authenticated REPORT requests directly. These interactions don't seem to zombify the server threads.
Hopefully these findings provide some useful clue. We're available to provide more info! |