ucs2 Character Set (UCS-2 Unicode Encoding)
utf16 Character Set (UTF-16 Unicode Encoding)
utf32 Character Set (UTF-32 Unicode Encoding)
utf8 Character Set (Three-Byte UTF-8 Unicode Encoding)
utf8mb3 “Character Set” (Alias for utf8)
utf8mb4 Character Set (Four-Byte UTF-8 Unicode Encoding)
The initial implementation of Unicode support (in MySQL 4.1) included two character sets for storing Unicode data:
ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character
These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:
Their code values are between 0 and 65535 (or U+0000 .. U+FFFF)
They can be encoded with a fixed 16-bit word, as in ucs2
They can be encoded with 8, 16, or 24 bits, as in utf8
They are sufficient for almost all characters in major languages
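The encoding sizes listed above can be checked with a short Python sketch. It uses Python's own codecs (utf-16-be and utf-8) as stand-ins for ucs2 and utf8; that is an assumption made for illustration, not MySQL code, and the sample characters and the helper name bmp_sizes are arbitrary.

```python
# Sample BMP characters: ASCII, Latin-1, Cyrillic, CJK (all <= U+FFFF).
samples = ["A", "é", "Ж", "中"]

def bmp_sizes(ch):
    """Return (code point, UTF-16BE byte count, UTF-8 byte count) for one BMP character."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("not a BMP character")
    return code_point, len(ch.encode("utf-16-be")), len(ch.encode("utf-8"))

for ch in samples:
    cp, fixed_len, utf8_len = bmp_sizes(ch)
    # Every BMP character fits one 16-bit word; UTF-8 needs 1 to 3 bytes.
    print(f"U+{cp:04X}: {fixed_len} bytes as a 16-bit word, {utf8_len} byte(s) as UTF-8")
```

For these samples the fixed-width size is always 2 bytes, while the UTF-8 size grows from 1 to 3 bytes as the code point increases.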
Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. As of MySQL 5.5.3, Unicode support is extended to include supplementary characters, which requires new character sets that have a broader range and therefore take more space. The following table shows a brief feature comparison of previous and current Unicode support.
| Before MySQL 5.5 | MySQL 5.5 |
| All Unicode 3.0 characters | All Unicode 5.0 characters |
| No supplementary characters | With supplementary characters |
| ucs2 character set, BMP only | No change |
| utf8 character set for up to three bytes, BMP only | No change |
| | New utf8mb4 character set for up to four bytes, BMP or supplemental |
| | New utf16 character set, BMP or supplemental |
| | New utf32 character set, BMP or supplemental |
These changes are upward compatible. If you want to use the new character sets, there are potential incompatibility issues for your applications; see Section 9.1.11, “Upgrading from Previous to Current Unicode Support”. That section also describes how to convert tables from utf8 to the (four-byte) utf8mb4 character set, and what constraints may apply in doing so.
MySQL 5.5 supports these Unicode character sets:
ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
utf16, the UTF-16 encoding for the Unicode character set; like ucs2 but with an extension for supplementary characters
utf32, the UTF-32 encoding for the Unicode character set using 32 bits per character
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character
utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character
ucs2 and utf8 support BMP characters. utf8mb4, utf16, and utf32 support BMP and supplementary characters.
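One way to decide which of these character sets can hold a given string is to look for code points above U+FFFF. The following is a minimal sketch in Python; the helper names has_supplementary and charsets_for are hypothetical, and the logic reflects only the BMP/supplementary split described above, not any check MySQL itself performs.

```python
BMP_ONLY = ("ucs2", "utf8")
FULL_RANGE = ("utf8mb4", "utf16", "utf32")

def has_supplementary(text):
    """True if any character lies outside the BMP (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

def charsets_for(text):
    """Return the MySQL 5.5 Unicode character sets able to represent text."""
    if has_supplementary(text):
        return FULL_RANGE
    return BMP_ONLY + FULL_RANGE

print(charsets_for("Grüße"))        # BMP only: all five character sets qualify
print(charsets_for("\U0001F600"))   # supplementary: only the three newer sets
```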
A similar set of collations is available for each Unicode character set. For example, each has a Danish collation, the names of which are ucs2_danish_ci, utf16_danish_ci, utf32_danish_ci, utf8_danish_ci, and utf8mb4_danish_ci. All Unicode collations are listed at Section 9.1.14.1, “Unicode Character Sets”, which also describes collation properties for supplementary characters.
Note that although many of the supplementary characters come from East Asian languages, what MySQL 5.5 adds is support for more Japanese and Chinese characters in Unicode character sets, not support for new Japanese and Chinese character sets.
The MySQL implementation of UCS-2, UTF-16, and UTF-32 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of values. Other database systems might use little-endian byte order or a BOM. In such cases, conversion of values will need to be performed when transferring data between those systems and MySQL.
MySQL uses no BOM for UTF-8 values.
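When data arrives from a system that writes a BOM or uses little-endian order, the conversion mentioned above might look like the following Python sketch. The function names are hypothetical, and treating BOM-less input as already big-endian is an assumption; little-endian data without a BOM cannot be detected this way and would need to be decoded explicitly.

```python
import codecs

def to_big_endian_utf16(raw):
    """Convert UTF-16 bytes (with optional BOM) to big-endian UTF-16 without a BOM."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        # Assumption: input without a BOM is already big-endian.
        text = raw.decode("utf-16-be")
    return text.encode("utf-16-be")

def strip_utf8_bom(raw):
    """Remove a leading UTF-8 BOM, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

# Little-endian "A" with a BOM becomes big-endian "A" without one.
print(to_big_endian_utf16(b"\xff\xfeA\x00"))
print(strip_utf8_bom(b"\xef\xbb\xbfabc"))
```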
Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2, utf16, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET. (See Section 9.1.4, “Connection Character Sets and Collations”.)
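An application that builds its SET NAMES statement from configuration could guard against the disallowed client character sets before sending anything to the server. This is a sketch only; the helper set_names_statement is hypothetical, and it produces the statement text without connecting to a server.

```python
# Character sets that cannot be used as a client character set, per the text above.
NOT_ALLOWED_AS_CLIENT = {"ucs2", "utf16", "utf32"}

def set_names_statement(charset):
    """Return a SET NAMES statement, rejecting names that cannot be client character sets."""
    name = charset.lower()
    if name in NOT_ALLOWED_AS_CLIENT:
        raise ValueError(f"{name} cannot be used as a client character set")
    if not name.isalnum():
        # Defensive check so that arbitrary text is never placed in the statement.
        raise ValueError("unexpected characters in character set name")
    return f"SET NAMES '{name}'"

print(set_names_statement("utf8"))
print(set_names_statement("utf8mb4"))
```

The returned string would then be passed to whatever query function the client library provides.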
The following sections provide additional detail on the Unicode character sets in MySQL.
User Comments
Connect with the same character set as your data so that it displays correctly. This example connects to the MySQL server using UTF-8:
mysql --default-character-set=utf8 -uyour_username -p -h your_databasehost.your_domain.com your_database
If you run into trouble with a PHP-based web application, check the character set configuration of these components:
1) the MySQL database
2) php.ini
3) httpd.conf
4) your server
If you get data via PHP from your MySQL database (everything UTF-8)
but still get '?' for some special characters in your browser
(<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />),
try this:
After mysql_connect() and mysql_select_db(), add this line:
mysql_query("SET NAMES utf8");
Worked for me.
I first tried utf8_encode, but that only worked for äüöéè
and so on, not for Cyrillic and other characters.
I had a problem submitting Unicode data from ASP pages to the MySQL server even though everything was set to utf8.
It turned out that my ODBC driver, version 3.5.1, was the cause. Installing version 5.1 solved the problem.
http://dev.mysql.com/downloads/connector/odbc/
As of MySQL 5.x you can use the init_connect setting to force UTF-8 for any client connection.
I have blogged about this here: http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections
This removes the need to use SET NAMES in your PHP/ASP/Ruby/C++ code.