Description | I have a memory exhaustion problem with SOGO ActiveSync with
large mailboxes and judging by the memory that gets used up, the amount
of memory needed is insane and looks like a bug to me.
So here we have a Mailbox with approx. 1100 Folders (including subfolders):
MariaDB [sogo]> select count() from sogo_cache_folder_username where c_path like '/0719227322A744D180EEC2476AC6FA65%';
+----------+
| count() |
+----------+
| 1143 |
+----------+
1 row in set (0,004 sec)
Now let's check how much memory the cached entries use up in encoded form:
MariaDB [sogo]> select sum(length(c_content)) from sogo_cache_folder_username where c_path like '/0719227322A744D180EEC2476AC6FA65%';
+------------------------+
| sum(length(c_content)) |
+------------------------+
| 3166553 |
+------------------------+
1 row in set (0,020 sec)
This is 3MB, not that much.
However when the MS Outlook client tries to sync the mailbox, the sogod process constantly blows up:
Oct 05 10:37:46 sogod [21946]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1717 MB)
Oct 05 10:39:43 sogod [22214]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1710 MB)
Oct 05 10:41:39 sogod [23618]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1717 MB)
Oct 05 10:43:35 sogod [24654]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1721 MB)
Oct 05 10:45:34 sogod [24988]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1720 MB)
Oct 05 10:47:35 sogod [23931]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1717 MB)
Oct 05 10:49:31 sogod [25633]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1716 MB)
Oct 05 10:51:28 sogod [25947]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1725 MB)
Oct 05 10:53:25 sogod [26250]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1716 MB)
Oct 05 10:55:22 sogod [26511]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1716 MB)
Oct 05 10:57:18 sogod [26825]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1725 MB)
Oct 05 10:58:16 sogod [21941]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1717 MB)
Oct 05 11:00:13 sogod [27281]: |SOGo| terminating app, vMem size limit (1024 MB) has been reached (currently 1716 MB)
We have the following settings (as can be seen from error msg):
SxVMemLimit = 1024;
SOGoMaximumSyncWindowSize = 100;
SOGoMaximumSyncResponseSize = 10240;
WOWatchDogRequestTimeout = 60;
WOWorkersCount = 10;
Now in order to find the cause for this ridiculous amount of memory being used, strace has been used.
We see:
strace: Process 28020 attached
brk(0x560f8b0c0000) = 0x560f8b0c0000
brk(0x560f8b0e1000) = 0x560f8b0e1000
sendto(13, "]\0\0\0\3SELECT FROM sogo_cache_folder_user WHERE cpath = '/0719227322A744D180EEC2476AC6FA65';", 97, MSG
recvfrom(13, 0x560f4ea0a2f0, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
recvfrom(13, "Y1N2Q1\nZjAwMDBl...
brk(0x560f8b125000) = 0x560f8b125000
recvfrom(13, "\5\0\0\f\376\0\0\2\0", 16384, MSG_DONTWAIT, NULL, NULL) = 9
brk(0x560f8b17d000) = 0x560f8b17d000
brk(0x560f8b1a9000) = 0x560f8b1a9000
brk(0x560f8b1d5000) = 0x560f8b1d5000
brk(0x560f8b1f6000) = 0x560f8b1f6000
brk(0x560f8b217000) = 0x560f8b217000
sendto(13, "]\0\0\0\3SELECT FROM sogo_cache_folder_user WHERE cpath = '/0719227322A744D180EEC2476AC6FA65';", 97, MSG
recvfrom(13, 0x560f4ea0a2f0, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
recvfrom(13, "Y1N2Q1\nZjAwMDBl...
brk(0x560f8b25b000) = 0x560f8b25b000
recvfrom(13, "\5\0\0\f\376\0\0\2\0", 16384, MSG_DONTWAIT, NULL, NULL) = 9
brk(0x560f8b2b4000) = 0x560f8b2b4000
brk(0x560f8b2e0000) = 0x560f8b2e0000
brk(0x560f8b30c000) = 0x560f8b30c000
brk(0x560f8b32d000) = 0x560f8b32d000
brk(0x560f8b34e000) = 0x560f8b34e000
sendto(13, "]\0\0\0\3SELECT * FROM sogo_cache_folder_user WHERE cpath = '/0719227322A744D180EEC2476AC6FA65';", 97, MSG
recvfrom(13, 0x560f4ea0a2f0, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
recvfrom(13, "Y1N2Q1\nZjAwMDBl...
brk(0x560f8b392000) = 0x560f8b392000
recvfrom(13, "\5\0\0\f\376\0\0\2\0", 16384, MSG_DONTWAIT, NULL, NULL) = 9
brk(0x560f8b3ea000) = 0x560f8b3ea000
brk(0x560f8b416000) = 0x560f8b416000
brk(0x560f8b442000) = 0x560f8b442000
brk(0x560f8b463000) = 0x560f8b463000
brk(0x560f8b484000) = 0x560f8b484000
...
This goes on and on, presumably one call for each folder and this is just the Root path folder, not the
actual folder!
...
sendto(13, "]\0\0\0\3SELECT * FROM sogo_cache_folder_user WHERE cpath = '/0719227322A744D180EEC2476AC6FA65';", 97, MSG
recvfrom(13, 0x560f4ea0a2f0, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
recvfrom(13, "Y1N2Q1\nZjAwMDBlNDk2ZjljZQB0bQAAEFN5bmNSZXF1ZXN0K21haWwvNzA1MDExMDY0NmI1MTk2NTdk\nNWYwMDAwZTQ5NmY5Y2UAdG0AABBT
brk(0x560fa5248000) = 0x560fa5248000
recvfrom(13, "\5\0\0\f\376\0\0\2\0", 16384, MSG_DONTWAIT, NULL, NULL) = 9
brk(0x560fa52a0000) = 0x560fa52a0000
brk(0x560fa52cc000) = 0x560fa52cc000
brk(0x560fa52f8000) = 0x560fa52f8000
brk(0x560fa5319000) = 0x560fa5319000
brk(0x560fa533a000) = 0x560fa533a000
MariaDB [sogo]> select length(c_content) FROM sogo_cache_folder_username WHERE c_path = '/0719227322A744D180EEC2476AC6FA65';
+-------------------+
| length(c_content) |
+-------------------+
| 180257 |
+-------------------+
180kb Expands to about 1MB in memory according to brk()! Quite a lot!
So if we assume 1100 folders and 1 call for each folder, we are wasting 1.1 GB without loading the actual folder object.
Then it goes on with loading the actual cached folder objects:
sendto(12, "71 status \"INBOX.Archiv.Sample\" (UIDVALIDITY)", 44, MSG_NOSIGNAL, NULL, 0) = 44
sendto(12, "\r\n", 2, MSG_NOSIGNAL, NULL, 0) = 2
recvfrom(12, 0x560f4d603c20, 512, 0, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=12, events=POLLRDNORM}], 1, 3600000) = 1 ([{fd=12, revents=POLLRDNORM}])
recvfrom(12, " STATUS INBOX.Archiv.Sample (UIDVALIDITY 1696172129)\r\n71 OK Status completed (0.001 + 0.000 secs).\r\n", 512
sendto(13, "\204\0\0\0\3SELECT FROM sogo_cache_folder_user WHERE c_path = '/0719227322A744D180EEC2476AC6FA65+folder00c
recvfrom(13, 0x560fa5961d80, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
sendto(13, "\226\0\0\0\3SELECT * FROM sogo_cache_folder_user WHERE c_path = '/0719227322A744D180EEC2476AC6FA65+folder00c
recvfrom(13, 0x560fa5961d80, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
sendto(12, "72 unselect", 11, MSG_NOSIGNAL, NULL, 0) = 11
sendto(12, "\r\n", 2, MSG_NOSIGNAL, NULL, 0) = 2
recvfrom(12, 0x560f4d603c20, 512, 0, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=12, events=POLLRDNORM}], 1, 3600000) = 1 ([{fd=12, revents=POLLRDNORM}])
recvfrom(12, "72 OK Unselect completed (0.001 + 0.001 secs).\r\n", 512, 0, NULL, NULL) = 48
sendto(12, "73 select \"INBOX.Archiv.Sample\"", 30, MSG_NOSIGNAL, NULL, 0) = 30
sendto(12, "\r\n", 2, MSG_NOSIGNAL, NULL, 0) = 2
recvfrom(12, 0x560f4d603c20, 512, 0, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=12, events=POLLRDNORM}], 1, 3600000) = 1 ([{fd=12, revents=POLLRDNORM}])
recvfrom(12, " FLAGS (\Answered \Flagged \Deleted \Seen \Draft)\r\n OK [PERMANENTFLAGS (\Answered \Flagged \Delete
sendto(13, "\226\0\0\0\3SELECT FROM sogo_cache_folder_user WHERE c_path = '/0719227322A744D180EEC2476AC6FA65+folder00c
recvfrom(13, 0x560fa5961d80, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Die Ressource ist zur Zeit nicht verfügbar)
poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
recvfrom(13, "\2\0\0\1\10\1[\0\0\2\3def\4sogo\32sogo_cache_folder_user\32sogo_cache_folder_user\6c_path\6c_path\0\f!
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
poll([{fd=12, events=POLLIN}], 1, 5) = 0 (Timeout)
sendto(12, "74 status \"INBOX.Diverses.Sample2\" (UIDVALIDITY)", 46, MSG_NOSIGNAL, NULL, 0) = 46
sendto(12, "\r\n", 2, MSG_NOSIGNAL, NULL, 0) = 2
recvfrom(12, " STATUS INBOX.Diverses.Sample2 (UIDVALIDITY 1696171750)\r\n74 OK Status completed (0.001 + 0.000 secs).\r\n", 5
brk(0x560fa66d7000) = 0x560fa66d7000
The allocated memory for these calls isn't that much as above, judging by brk(), some use more, some less,
according to size, but usual around a few hundered kb.
Together with the additional 1MB per folder from above, it is clear, that it blows up with not enough memory.
Here is what I could spot with the debugger:
(please refer to attachment, as pasting it here causes internal sever error due to known problems with filtering html tags)
I don't know anything about Objective-C memory management, as I only understand C, which is
far simple with its malloc/free functions, but I guess the statement
globalMetadata = [self globalMetadataForDevice];
cannot just be put outside the loop and one needs to have a "fresh" state for every folder to sync?
But as can be seen in strace, it clearly leaks loads of memory by doing so.
Do we just need a RELEASE(globalMetadata) at the end of the loop?
Whatever, the current behaviour makes EAS barely usable on large mailboxes with a lot of folders
in its folder structure and causes severe problems for me. |
---|