This section describes issues pertaining to Unicode support that you may face when upgrading to MySQL 5.5 from an older MySQL release. It also provides guidelines for downgrading from MySQL 5.5 to an older release.
In most respects, upgrading to MySQL 5.5 should present few problems with regard to Unicode usage, although there are some potential areas of incompatibility. These are the primary areas of concern:
            For the variable-length character data types
            (VARCHAR and the
            TEXT types), the maximum
            length in characters is less for utf8mb4
            columns than for utf8 columns.
          
            For all character data types
            (CHAR,
            VARCHAR, and the
            TEXT types), the maximum
            number of characters that can be indexed is less for
            utf8mb4 columns than for
            utf8 columns.
          
        Consequently, if you want to upgrade tables from
        utf8 to utf8mb4 to take
        advantage of supplementary-character support, it may be
        necessary to change some column or index definitions.
      
        Tables can be converted from utf8 to
        utf8mb4 by using ALTER
        TABLE. Suppose that a table was originally defined as
        follows:
      
CREATE TABLE t1 ( col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL, col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ) CHARACTER SET utf8;
        The following statement converts t1 to use
        utf8mb4:
      
ALTER TABLE t1
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY col1 CHAR(10)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  MODIFY col2 CHAR(10)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
        In terms of table content, conversion from
        utf8 to utf8mb4 presents
        no problems:
      
            For a BMP character, utf8 and
            utf8mb4 have identical storage
            characteristics: same code values, same encoding, same
            length.
          
            For a supplementary character, utf8
            cannot store the character at all, while
            utf8mb4 requires four bytes to store it.
            Since utf8 cannot store the character at
            all, you do not have any supplementary characters in
            utf8 columns and you need not worry about
            converting characters or losing data when upgrading
            utf8 data from older versions of MySQL.
          
        In terms of table structure, the catch when converting from
        utf8 to utf8mb4 is that
        the maximum length of a column or index key is unchanged in
        terms of bytes. Therefore, it is smaller in
        terms of characters because the maximum
        length of a character is four bytes instead of three. For the
        CHAR,
        VARCHAR, and
        TEXT data types, watch for these
        things when converting your MySQL tables:
      
            Check all definitions of utf8 columns and
            make sure they will not exceed the maximum length for the
            storage engine.
          
            Check all indexes on utf8 columns and
            make sure they will not exceed the maximum length for the
            storage engine. Sometimes the maximum can change due to
            storage engine enhancements.
          
        If the preceding conditions apply, you must either reduce the
        defined length of columns or indexes, or continue to use
        utf8 rather than utf8mb4.
      
Here are some examples where structural changes may be needed:
            A TINYTEXT column can hold up
            to 255 bytes, so it can hold up to 85 three-byte or 63
            four-byte characters. Suppose that you have a
            TINYTEXT column that uses
            utf8 but must be able to contain more
            than 63 characters. You cannot convert it to
            utf8mb4 unless you also change the data
            type to a longer type such as
            TEXT.
          
            Similarly, a very long
            VARCHAR column may need to be
            changed to one of the longer
            TEXT types if you want to
            convert it from utf8 to
            utf8mb4.
          
            InnoDB has a maximum index length of 767
            bytes, so for utf8 or
            utf8mb4 columns, you can index a maximum
            of 255 or 191 characters, respectively. If you currently
            have utf8 columns with indexes longer
            than 191 characters, you will need to index a smaller number
            of characters. In an InnoDB table, these
            column and index definitions are legal:
          
col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
            To use utf8mb4 instead, the index must be
            smaller:
          
col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
        The preceding types of changes are most likely to be required
        only if you have very long columns or indexes. Otherwise, you
        should be able to convert your tables from
        utf8 to utf8mb4 without
        problems. You can do this by using ALTER
        TABLE as described earlier in this section after
        upgrading in place to 5.5.
      
The following items summarize other potential areas of incompatibility:
            Performance of four-byte UTF-8 (utf8mb4)
            is slower than for three-byte UTF-8
            (utf8). If you do not want to incur this
            penalty, continue to use utf8.
          
            SET NAMES 'utf8mb4' causes use of the
            four-byte character set for connection character sets. As
            long as no four-byte characters are sent from the server,
            there should be no problems. Otherwise, applications that
            expect to receive a maximum of three bytes per character may
            have problems. Conversely, applications that expect to send
            four-byte characters must ensure that the server understands
            them.
          
            Applications cannot send utf16 or
            utf32 character data to an older server
            that does not understand them.
          
            For replication, if the character sets that support
            supplementary characters are going to be used on the master,
            all slaves must understand them as well. If you attempt to
            replicate from a MySQL 5.5 master to an older
            slave, utf8 data will be seen as
            utf8 by the slave and should replicate
            correctly. But you cannot send utf8mb4,
            utf16, or utf32 data.
          
            Also, keep in mind the general principle that if a table has
            different definitions on the master and slave, this can lead
            to unexpected results. For example, the differences in
            limitations on index key length makes it risky to use
            utf8 on the master and
            utf8mb4 on the slave.
          
If you have upgraded to MySQL 5.5, and then decide to downgrade back to an older release, these considerations apply:
            ucs2 and utf8 data
            should present no problems.
          
            Any definitions that refer to the
            utf8mb4, utf16, or
            utf32 character sets will not be
            recognized by the older server.
          
            For object definitions that refer to the
            utf8mb4 character set, you can dump them
            with mysqldump in MySQL 5.5,
            edit the dump file to change instances of
            utf8mb4 to utf8, and
            reload the file in the older server, as long as there are no
            four-byte characters in the data. The older server will see
            utf8 in the dump file object definitions
            and create new objects that use the (three-byte)
            utf8 character set.
          


User Comments
Add your own comment.