Metadata is “the data about the
        data.” Anything that describes the
        database — as opposed to being the
        contents of the database — is
        metadata. Thus column names, database names, user names, version
        names, and most of the string results from
        SHOW are metadata. This is also
        true of the contents of tables in
        INFORMATION_SCHEMA, because those tables by
        definition contain information about database objects.
      
Representation of metadata must satisfy these requirements:
            All metadata must be in the same character set. Otherwise,
            neither the SHOW commands nor
            SELECT statements for tables
            in INFORMATION_SCHEMA would work properly
            because different rows in the same column of the results of
            these operations would be in different character sets.
          
Metadata must include all characters in all languages. Otherwise, users would not be able to name columns and tables using their own languages.
To satisfy both requirements, MySQL stores metadata in a Unicode character set, namely UTF-8. This does not cause any disruption if you never use accented or non-Latin characters. But if you do, you should be aware that metadata is in UTF-8.
        The metadata requirements mean that the return values of the
        USER(),
        CURRENT_USER(),
        SESSION_USER(),
        SYSTEM_USER(),
        DATABASE(), and
        VERSION() functions have the
        UTF-8 character set by default.
      
        The server sets the
        character_set_system system
        variable to the name of the metadata character set:
      
mysql> SHOW VARIABLES LIKE 'character_set_system';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| character_set_system | utf8  |
+----------------------+-------+
        Storage of metadata using Unicode does not
        mean that the server returns headers of columns and the results
        of DESCRIBE functions in the
        character_set_system character
        set by default. When you use SELECT column1 FROM
        t, the name column1 itself is
        returned from the server to the client in the character set
        determined by the value of the
        character_set_results system
        variable, which has a default value of
        latin1. If you want the server to pass
        metadata results back in a different character set, use the
        SET NAMES statement to force the server to
        perform character set conversion. SET NAMES
        sets the character_set_results
        and other related system variables. (See
        Section 9.1.4, “Connection Character Sets and Collations”.) Alternatively, a client
        program can perform the conversion after receiving the result
        from the server. It is more efficient for the client perform the
        conversion, but this option is not always available for all
        clients.
      
        If character_set_results is set
        to NULL, no conversion is performed and the
        server returns metadata using its original character set (the
        set indicated by
        character_set_system).
      
Error messages returned from the server to the client are converted to the client character set automatically, as with metadata.
        If you are using (for example) the
        USER() function for comparison or
        assignment within a single statement, don't worry. MySQL
        performs some automatic conversion for you.
      
SELECT * FROM t1 WHERE USER() = latin1_column;
        This works because the contents of
        latin1_column are automatically converted to
        UTF-8 before the comparison.
      
INSERT INTO t1 (latin1_column) SELECT USER();
        This works because the contents of
        USER() are automatically
        converted to latin1 before the assignment.
      
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings. For more information about coercion of strings, see Section 9.1.7.5, “Special Cases Where Collation Determination Is Tricky”.


User Comments
Add your own comment.