View Issue Details

IDProjectCategoryView StatusLast Update
0003657SOGoBackend Address Bookpublic2016-05-16 20:11
ReporterMcMichaeli Assigned Toplevesque  
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionno change required 
PlatformClientOSAppleOS VersioniOS 9.3.1
Product Versionnightly v2 
Summary0003657: Unicode Emoji break CardDAV Sync with iPhone running IOS 9.3.1
Description

When adding some Emoji (used http://apps.timwhitlock.info/emoji/tables/unicode to identify their character codes) to the name fields of a vCard the data is not then synchronised to/from the iPhone.

Test were carried out with the following characters:

U+00A9 (Copyright) - works
U+2714 (Heavy Check Mark) - works
U+2764 (Heart) - works

U+1F347 (Grapes) - fails
U+1F3C9 (Rugby Football) - fails
U+1F46B (Holding Hands) - fails
U+1F602 (Tears of Joy) - fails

As the longer characters fail (in this limited random test) is this a UTF bit length problem, maybe? The sogo.log shows no particular errors.

Adding an unsupported emoji on the iPhone results in sync but when clicking on the record in the addressbook list (which is present but still without the extra characters) no vCard is displayed on the right.

Adding an unsupported emoji in the web interface results in it being displayed correctly. However, the updated record is never pushed to the iPhone.

Please let me know if there is any more data I can collect?

Steps To Reproduce

See the description and edit a contact record to add one of the listed emoji to either the given name, surname or display name fields using either the web interface or native iOS app.

Additional Information

This was tested with SOGo nightly build v3.0.2.20160505-1 on a 64-bit Ubuntu 14.04 LTS server.

TagsNo tags attached.

Activities

McMichaeli

McMichaeli

2016-05-09 19:24

reporter   ~0010083

I have conducted some more tests with "SOGoDebugRequests = YES;" in sogo.conf and during the testing data loss has been experienced. This is now against SOGo nightly 3.0.2.20160509-1. Given the result, should the severity of this bug be raised from minor?

Test case 1:

Log into the web interface.
Add a long unicode (emoji) character (see list in original note) to the surname of an existing vCard and save it.
Move to iPhone and refresh the address book.
Verify that the vCard has vanished from the iPhone.
Verify that the vCard still appears in the web interface.

Edit the web vCard again and remove the long unicode (emoji) character.
Save the record.
Verify that the record is still missing from the iPhone.
Clicking on the record entry in the web index generates a "Not found" notification and sogo.log responds with a 404 to:

GET /SOGo/so/user@example.net/Contacts/personal/2E25-572B7000-1-4C8B3C00.vcf/view

Test case 2:

Edit a contact on the iPhone.
Add a long unicode (emoji) character (see list in original note) to the surname field and change a standard ASCII character in another filed and save it.
Verify contact remains on iPhone and appears correct (with emoji).
Refresh the web interface.
Clicking on the record entry in the index generates a "Not found" notification and sogo.log responds with a 404 to:

GET /SOGo/so/user@example.net/Contacts/personal/4951-56898080-8D-1AC6C3C0/view

On the iPhone, remove the long unicode (emoji) character from the surname and save the entry.
Refresh the web interface.
Verify that the record can now be opened and the other ASCII edit has been synchronised.

Upon investigation in the MySQL tables, 2E25-572B7000-1-4C8B3C00.vcf appears in the sogouserxxx_quick table but not in sogouserxxx at all. Obviously something in the backend transaction that updates the tables breaks and does not complete writing the main data. This is why the web interface displays the i 2E25-572B7000-1-4C8B3C00.vcfndex entry (from the _quick table) and generates a 404 when it goes to retrieve the rest.

Let me know if I can investigate further in some way?

ludovic

ludovic

2016-05-09 19:26

administrator   ~0010084

Are you sure your database tables in MySQL are set to use UTF-8?

plevesque

plevesque

2016-05-09 19:54

reporter   ~0010085

It should be noted that MySQL offers only partial utf8 support before version 5.5.3.

From 5.5.3 onwards, full Unicode is supported, but MySQL database schemas must specify the utf8mb4 encoding to use non-BMP characters such as emoji.

See https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html for further information.

McMichaeli

McMichaeli

2016-05-09 20:01

reporter   ~0010086

I think they are correct and have always used the supplied scripts to create/upgrade them as appropriate:

mysql> select table_name,table_collation from information_schema.tables where table_schema = 'sogo';
+-------------------------------------------------+-----------------+
| table_name | table_collation |
+-------------------------------------------------+-----------------+
| sogo_alarms_folder | utf8_general_ci |
| sogo_cache_folder_userb_A_example_D_co_D_uk | utf8_general_ci |
| sogo_cache_folder_usera_A_example_D_net | utf8_general_ci |
| sogo_cache_folder_userc_A_example_D_co_D_uk | utf8_general_ci |
| sogo_folder_info | utf8_general_ci |
| sogo_sessions_folder | utf8_general_ci |
| sogo_user_profile | utf8_general_ci |
| sogouserb0015681af0e | utf8_general_ci |
| sogouserb0015681af0e_acl | utf8_general_ci |
| sogouserb0015681af0e_quick | utf8_general_ci |
| sogouserb0024fba5f5e | utf8_general_ci |
| sogouserb0024fba5f5e_acl | utf8_general_ci |
| sogouserb0024fba5f5e_quick | utf8_general_ci |
| sogouserb00305219539 | utf8_general_ci |
| sogouserb00305219539_acl | utf8_general_ci |
| sogouserb00305219539_quick | utf8_general_ci |
| sogouserb00115938786 | utf8_general_ci |
| sogouserb00115938786_acl | utf8_general_ci |
| sogouserb00115938786_quick | utf8_general_ci |
| sogouserb00168c2a3af | utf8_general_ci |
| sogouserb00168c2a3af_acl | utf8_general_ci |
| sogouserb00168c2a3af_quick | utf8_general_ci |
| sogouserd00114403dff | utf8_general_ci |
| sogouserd00114403dff_acl | utf8_general_ci |
| sogouserd00114403dff_quick | utf8_general_ci |
| sogouserd00120958972 | utf8_general_ci |
| sogouserd00120958972_acl | utf8_general_ci |
| sogouserd00120958972_quick | utf8_general_ci |
| sogouserd001247a21d6 | utf8_general_ci |
| sogouserd001247a21d6_acl | utf8_general_ci |
| sogouserd001247a21d6_quick | utf8_general_ci |
| sogouserd0014ec8f800 | utf8_general_ci |
| sogouserd0014ec8f800_acl | utf8_general_ci |
| sogouserd0014ec8f800_quick | utf8_general_ci |
| sogouserd00277dc8f9b | utf8_general_ci |
| sogouserd00277dc8f9b_acl | utf8_general_ci |
| sogouserd00277dc8f9b_quick | utf8_general_ci |
| sogousera001021d7a68 | utf8_general_ci |
| sogousera001021d7a68_acl | utf8_general_ci |
| sogousera001021d7a68_quick | utf8_general_ci |
| sogousera002753ed0c2 | utf8_general_ci |
| sogousera002753ed0c2_acl | utf8_general_ci |
| sogousera002753ed0c2_quick | utf8_general_ci |
| sogousera001109dc3c5 | utf8_general_ci |
| sogousera001109dc3c5_acl | utf8_general_ci |
| sogousera001109dc3c5_quick | utf8_general_ci |
| sogousera0011bada576 | utf8_general_ci |
| sogousera0011bada576_acl | utf8_general_ci |
| sogousera0011bada576_quick | utf8_general_ci |
| sogouserc0015d11fd1b | utf8_general_ci |
| sogouserc0015d11fd1b_acl | utf8_general_ci |
| sogouserc0015d11fd1b_quick | utf8_general_ci |
| sogouserc00170ab880c | utf8_general_ci |
| sogouserc00170ab880c_acl | utf8_general_ci |
| sogouserc00170ab880c_quick | utf8_general_ci |
| sogouserc00257df3175 | utf8_general_ci |
| sogouserc00257df3175_acl | utf8_general_ci |
| sogouserc00257df3175_quick | utf8_general_ci |
| sogousere0012d2735c6 | utf8_general_ci |
| sogousere0012d2735c6_acl | utf8_general_ci |
| sogousere0012d2735c6_quick | utf8_general_ci |
| sogousere0017fd97cf7 | utf8_general_ci |
| sogousere0017fd97cf7_acl | utf8_general_ci |
| sogousere0017fd97cf7_quick | utf8_general_ci |
| sogouserf00107134022 | utf8_general_ci |
| sogouserf00107134022_acl | utf8_general_ci |
| sogouserf00107134022_quick | utf8_general_ci |
| sogouserf00177e97da5 | utf8_general_ci |
| sogouserf00177e97da5_acl | utf8_general_ci |
| sogouserf00177e97da5_quick | utf8_general_ci |
| sogouserg0016dc82354 | utf8_general_ci |
| sogouserg0016dc82354_acl | utf8_general_ci |
| sogouserg0016dc82354_quick | utf8_general_ci |
| sogouserg00257120339 | utf8_general_ci |
| sogouserg00257120339_acl | utf8_general_ci |
| sogouserg00257120339_quick | utf8_general_ci |
| sogouserh00127cdceec | utf8_general_ci |
| sogouserh00127cdceec_acl | utf8_general_ci |
| sogouserh00127cdceec_quick | utf8_general_ci |
| sogouserh0021bcf8a87 | utf8_general_ci |
| sogouserh0021bcf8a87_acl | utf8_general_ci |
| sogouserh0021bcf8a87_quick | utf8_general_ci |
+-------------------------------------------------+-----------------+
82 rows in set (0.00 sec)

McMichaeli

McMichaeli

2016-05-09 20:05

reporter   ~0010087

I'm running MySQL 5.5.49 on Ubuntu 14.04 LTS. Does this mean it could work and I need to alter the tables in some way? Is the SOGo code all compliant with utf8mb4?

plevesque

plevesque

2016-05-09 20:18

reporter   ~0010088

I did not test this mysqlf, but as far as I know, emoji was working at some point on some setup using MySQL. You can check the encoding (your SQL query above returned the collation, not the encoding) with this:

SELECT
TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM
information_schema.COLUMNS
WHERE
table_schema = 'sogo';

You should get 'utf8mb4' as CHARACTER_SET_NAME; if you get 'utf8', emoji will not work.

McMichaeli

McMichaeli

2016-05-09 20:36

reporter   ~0010089

Last edited: 2016-05-09 20:36

I have 'utf8' in all the character columns when I run the suggested SQL query. I guess this explains the issues I am seeing.

As per the v3.0.2 Installation and Configuration guide, I have the following in the my.cnf file:

[mysqld]
character_set_server=utf8
character_set_client=utf8

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

Given that these settings would have been present when the database was initially created, is this the reason for the lack of utf8mb4 support?

What should I do to fix this? Update the my.cnf (all lines?) to utf8mb4? Then I will have to run some sort of magic ALTER TABLE command to migrate the column types?

Given that the standard installation (following the manual) presumably only supports utf8, should the data loss element of this bug report not be investigated further as I would imagine my configuration is fairly standard?

plevesque

plevesque

2016-05-09 21:15

reporter   ~0010090

Your assertions appear valid to me; changes to the my.cnf file and fixing the tables will be required. This guide should ensure everything is covered: https://mathiasbynens.be/notes/mysql-utf8mb4

As for the data loss, I would admit the documentation could be improved when it comes to MySQL peculiarities; Unicode non-BMP support lacks uniformity (for instance, Red Hat 6 still pushes MySQL 5.1 and will be EOL'd only in 2020) and character sets defaults vary wildly from one distro to the next.

I will file this ticket as resolved, please do not hesitate to reopen if utf8mb4 doesn't completely solve your issues.

plevesque

plevesque

2016-05-16 20:11

reporter   ~0010140

I dug into the issue and found out SOPE needs to be forced to connect using utf8mb4 (it was using utf8). Code to handle that was just pushed to the master branch.

There is now a tested and documented way to ensure emoji are stored correctly under MySQL. It should work starting from tomorrow's nightly, or you may wait until 3.1.0 is released.

Note that this involves serious database server reconfiguration and might be challenging for already deployed database servers, especially if they include data other than SOGo's.

Documentation does not include conversion of an already-deployed SOGo instance, but should suffice for a seasoned MySQL administrator to achieve it.

Documentation is at:
https://github.com/inverse-inc/sogo/blob/master/Documentation/SOGoInstallationGuide.asciidoc#user-content-mysql-complete-unicode-compliance

Issue History

Date Modified Username Field Change
2016-05-05 16:54 McMichaeli New Issue
2016-05-09 19:24 McMichaeli Note Added: 0010083
2016-05-09 19:26 ludovic Note Added: 0010084
2016-05-09 19:54 plevesque Note Added: 0010085
2016-05-09 20:01 McMichaeli Note Added: 0010086
2016-05-09 20:05 McMichaeli Note Added: 0010087
2016-05-09 20:18 plevesque Note Added: 0010088
2016-05-09 20:36 McMichaeli Note Added: 0010089
2016-05-09 20:36 McMichaeli Note Edited: 0010089
2016-05-09 21:15 plevesque Note Added: 0010090
2016-05-09 21:16 plevesque Status new => resolved
2016-05-09 21:16 plevesque Resolution open => no change required
2016-05-09 21:16 plevesque Assigned To => plevesque
2016-05-16 20:11 plevesque Note Added: 0010140