This section discusses the procedure for adding a new character set to MySQL. You must have a MySQL source distribution to use these instructions. There is one procedure for MySQL 4.1 and a different one for MySQL 4.0 or older. For either procedure, the instructions depend on whether the character set is simple or complex:
If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple.
If the character set needs either of those features, it is complex.
      For example, greek and swe7
      are simple character sets, whereas big5 and
      czech are complex character sets.
    
      In the following instructions, MYSET
      represents the name of the character set that you want to add.
    
If you have MySQL 4.1, use this procedure to add a new character set:
          Add a <charset> element for
          MYSET to the
          sql/share/charsets/Index.xml file. Use
          the existing contents in the file as a guide to adding new
          contents.
        
          The <charset> element must list all
          the collations for the character set. These must include at
          least a binary collation and a default collation. The default
          collation is usually named using a suffix of
          general_ci (general, case insensitive). It
          is possible for the binary collation to be the default
          collation, but usually they are different. The default
          collation should have a primary flag. The
          binary collation should have a binary flag.
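The requirement above can be checked mechanically. The following is a minimal sketch, assuming the collations of a <charset> element have already been read into memory. The struct collation_entry type and the has_required_collations() function are hypothetical names used only for illustration; they are not part of the MySQL source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of one collation listed in Index.xml. */
struct collation_entry {
    const char *name;
    int id;
    int is_primary;  /* carries the primary flag (default collation) */
    int is_binary;   /* carries the binary flag */
};

/*
 * Return 1 if the list contains at least one default (primary) collation
 * and at least one binary collation, 0 otherwise. A single entry may carry
 * both flags, because the binary collation is allowed to be the default.
 */
int has_required_collations(const struct collation_entry *c, size_t n)
{
    int have_primary = 0, have_binary = 0;
    size_t i;

    assert(c != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (c[i].is_primary)
            have_primary = 1;
        if (c[i].is_binary)
            have_binary = 1;
    }
    return have_primary && have_binary;
}
```

A list holding MYSET_general_ci (primary) and MYSET_bin (binary) passes this check; a list holding only MYSET_general_ci does not.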
        
You must assign a unique ID number to each collation, chosen from the range 1 to 254. To see the currently used collation IDs, use this query:
SHOW COLLATION;
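As an illustration of choosing an ID, the sketch below picks the lowest unused value in the range 1 to 254. It assumes the IDs already in use have been copied by hand from the Id column of the SHOW COLLATION output into an array; first_free_collation_id() is a hypothetical helper, not a MySQL function.

```c
#include <assert.h>
#include <stddef.h>

/* Lowest and highest collation IDs allowed by the text above. */
enum { MIN_COLLATION_ID = 1, MAX_COLLATION_ID = 254 };

/*
 * Return the smallest ID in 1..254 that does not appear in used[0..n-1],
 * or 0 if every ID in the range is already taken. Values outside the
 * range are ignored.
 */
int first_free_collation_id(const int *used, size_t n)
{
    unsigned char taken[MAX_COLLATION_ID + 1] = {0};
    size_t i;
    int id;

    assert(used != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (used[i] >= MIN_COLLATION_ID && used[i] <= MAX_COLLATION_ID)
            taken[used[i]] = 1;
    }
    for (id = MIN_COLLATION_ID; id <= MAX_COLLATION_ID; id++) {
        if (!taken[id])
            return id;
    }
    return 0;
}
```

Any unused ID in the range is acceptable; taking the lowest one is just a convenient convention for this sketch.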
This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires a C source file that defines collation functions, multi-byte functions, or both.
          For a simple character set, create a configuration file,
          MYSET.xml, in the sql/share/charsets directory. (You
          can use a copy of latin1.xml as the basis
          for this file.) The syntax for the file is very simple:
        
              Comments are written as ordinary XML comments
              (<!-- text -->).
              Words within <map> array elements
              are separated by arbitrary amounts of whitespace.
            
              Each word within <map> array
              elements must be a number in hexadecimal format.
            
              The <map> array element for the
              <ctype> element has 257 words.
              The other <map> array elements
              after that have 256 words. See
              Section 9.4.1, “The Character Definition Arrays”.
            
              For each collation listed in the
              <charset> element for the
              character set in Index.xml,
              MYSET.xml must contain a <collation>
              element that defines the character ordering.
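The word-count rules for <map> array elements can be checked before rebuilding. The following is a minimal sketch, assuming the text between the opening and closing <map> tags has already been extracted into a string. count_hex_words() and map_has_expected_length() are hypothetical helpers, and the sketch assumes each word consists only of hexadecimal digits with no prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count whitespace-separated words in text, requiring every word to consist
 * only of hexadecimal digits. Returns the word count, or -1 if some word
 * contains a non-hexadecimal character.
 */
long count_hex_words(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;
    long count = 0;

    assert(text != NULL);
    while (*p != '\0') {
        while (isspace(*p))          /* skip any run of whitespace */
            p++;
        if (*p == '\0')
            break;
        while (*p != '\0' && !isspace(*p)) {
            if (!isxdigit(*p))
                return -1;
            p++;
        }
        count++;
    }
    return count;
}

/* Expected counts from the text: 257 words for <ctype>, 256 for the others. */
int map_has_expected_length(const char *map_body, int is_ctype)
{
    return count_hex_words(map_body) == (is_ctype ? 257 : 256);
}
```

Running such a check on each array before reconfiguring can catch a dropped or duplicated word early.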
            
For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:
              Create the file ctype-MYSET.c
              in the strings directory. Look at one
              of the existing ctype-*.c files (such
              as ctype-big5.c) to see what needs to
              be defined. The arrays in your file must have names like
              ctype_MYSET,
              to_lower_MYSET,
              and so on. These correspond to the arrays for a simple
              character set. See Section 9.4.1, “The Character Definition Arrays”.
              For each collation listed in the
              <charset> element for the
              character set in Index.xml, the
              ctype-MYSET.c
              file must provide an implementation of the collation.
If you need string collating functions, see Section 9.4.2, “String Collating Support”.
If you need multi-byte character support, see Section 9.4.3, “Multi-Byte Character Support”.
          Follow these steps to modify the configuration information.
          Use the existing configuration information as a guide to
          adding information for MYSET. The
          example here assumes that the character set has default and
          binary collations, but more lines will be needed if
          MYSET has additional collations.
        
              Edit mysys/charset-def.c, and
              “register” the collations for the new
              character set.
            
Add these lines to the “declaration” section:
#ifdef HAVE_CHARSET_MYSET
extern CHARSET_INFO my_charset_MYSET_general_ci;
extern CHARSET_INFO my_charset_MYSET_bin;
#endif
Add these lines to the “registration” section:
#ifdef HAVE_CHARSET_MYSET
  add_compiled_collation(&my_charset_MYSET_general_ci);
  add_compiled_collation(&my_charset_MYSET_bin);
#endif
              If the character set uses
              ctype-MYSET.c,
              edit strings/Makefile.am and add
              ctype-MYSET.c
              to each definition of the CSRCS
              variable, and to the EXTRA_DIST
              variable.
            
              If the character set uses
              ctype-MYSET.c,
              edit libmysql/Makefile.shared and add
              ctype-MYSET.lo
              to the mystringsobjects definition.
            
              Edit configure.in:
            
                  Add MYSET to one of the
                  define(CHARSETS_AVAILABLE...) lines
                  in alphabetic order.
                
                  Add MYSET to
                  CHARSETS_COMPLEX. This is needed
                  even for simple character sets, or
                  configure will not recognize
                  --with-charset=MYSET.
                  Add MYSET to the first
                  case control structure. Omit the
                  USE_MB and
                  USE_MB_IDENT lines for 8-bit
                  character sets.
                
MYSET)
  AC_DEFINE(HAVE_CHARSET_MYSET, 1, [Define to enable charset MYSET])
  AC_DEFINE([USE_MB], 1, [Use multi-byte character routines])
  AC_DEFINE(USE_MB_IDENT, 1)
  ;;
                  Add MYSET to the second
                  case control structure:
                
MYSET)
  default_charset_default_collation="MYSET_general_ci"
  default_charset_collations="MYSET_general_ci MYSET_bin"
  ;;
Reconfigure, recompile, and test.
If you have MySQL 4.0 or older, use this procedure to add a new character set:
          Add MYSET to the end of the
          sql/share/charsets/Index file. Assign a
          unique number to it.
        
This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires a C source file that defines collation functions, multi-byte functions, or both.
          For a simple character set, create a configuration file that
          describes the character set properties. Create the file
          MYSET.conf in the sql/share/charsets directory.
          (You can use a copy of latin1.conf as the
          basis for this file.) The syntax for the file is very simple:
        
              Comments start with a “#”
              character and continue to the end of the line.
            
Words are separated by arbitrary amounts of whitespace.
When defining the character set, every word must be a number in hexadecimal format.
              The ctype array takes up the first 257
              words. The to_lower[],
              to_upper[], and
              sort_order[] arrays take up 256 words
              each after that. See Section 9.4.1, “The Character Definition Arrays”.
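The syntax rules above imply that a complete .conf file holds 257 + 3 × 256 = 1025 hexadecimal words once comments are removed. The following is a minimal sketch of a checker, assuming the file contents have already been read into a string. count_conf_words() is a hypothetical helper, and treating a “#” in the middle of a word as the end of that word is an assumption of this sketch, not something the text specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* 257 words for ctype plus 256 each for to_lower, to_upper, and sort_order. */
enum { CONF_TOTAL_WORDS = 257 + 3 * 256 };

/*
 * Count hexadecimal words in the text of a .conf file, skipping "#" comments
 * that run to the end of the line. Returns -1 if a word contains a
 * non-hexadecimal character.
 */
long count_conf_words(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;
    long count = 0;

    assert(text != NULL);
    while (*p != '\0') {
        if (*p == '#') {                 /* comment: skip to end of line */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (isspace(*p)) {        /* separator between words */
            p++;
        } else {                         /* start of a word */
            while (*p != '\0' && !isspace(*p) && *p != '#') {
                if (!isxdigit(*p))
                    return -1;
                p++;
            }
            count++;
        }
    }
    return count;
}
```

A file is plausibly complete when count_conf_words() returns CONF_TOTAL_WORDS; the first 257 words then belong to ctype and each following group of 256 to the next array.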
            
For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:
              Create the file ctype-MYSET.c
              in the strings directory. Look at one
              of the existing ctype-*.c files (such
              as ctype-big5.c) to see what needs to
              be defined. The arrays in your file must have names like
              ctype_MYSET,
              to_lower_MYSET,
              and so on. These correspond to the arrays for a simple
              character set. See Section 9.4.1, “The Character Definition Arrays”.
Near the top of the file, place a special comment like this:
/*
 * This comment is parsed by configure to create ctype.c,
 * so don't change it unless you know what you are doing.
 *
 * .configure. strxfrm_multiply_MYSET=N
 * .configure. mbmaxlen_MYSET=N
 */
The configure program uses this comment to include the character set into the MySQL library automatically.
              If you need string collating functions, you must specify
              the
              strxfrm_multiply_MYSET=N
              value in the special comment at the top of the source
              file. N must be a positive
              integer that indicates the maximum ratio to which strings
              may grow during execution of the
              my_strxfrm_MYSET()
              function.
              If you need multi-byte character set functions, you must
              specify the
              mbmaxlen_MYSET=N
              value in the special comment at the top of the
              ctype-MYSET.c
              source file for your character set.
              N should be set to the size in
              bytes of the largest character in the set.
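To make the two N values concrete, the sketch below shows how each can be read as a worst-case size multiplier. This is an interpretation of the descriptions above, not MySQL's own allocation code; strxfrm_buffer_size() and max_bytes_for_chars() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on the transformed length of a src_len-byte string, reading
 * strxfrm_multiply as the maximum growth ratio during my_strxfrm_MYSET().
 */
size_t strxfrm_buffer_size(size_t src_len, unsigned strxfrm_multiply)
{
    assert(strxfrm_multiply >= 1);   /* N must be a positive integer */
    return src_len * strxfrm_multiply;
}

/*
 * Upper bound on the bytes needed to store num_chars characters, reading
 * mbmaxlen as the size in bytes of the largest character in the set.
 */
size_t max_bytes_for_chars(size_t num_chars, unsigned mbmaxlen)
{
    assert(mbmaxlen >= 1);
    return num_chars * mbmaxlen;
}
```

Under this reading, a 10-byte string with strxfrm_multiply_MYSET=4 needs at most 40 bytes after transformation, and 20 characters with mbmaxlen_MYSET=2 occupy at most 40 bytes.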
            
If you need string collating functions, see Section 9.4.2, “String Collating Support”.
If you need multi-byte character support, see Section 9.4.3, “Multi-Byte Character Support”.
          Follow these steps to modify the configuration information.
          Use the existing configuration information as a guide to
          adding information for MYSET.
        
              Add the character set name to the
              CHARSETS_AVAILABLE list in
              configure.in.
            
              If the character set uses
              ctype-MYSET.c,
              edit strings/Makefile.am and add
              ctype-MYSET.c
              to the EXTRA_DIST variable.
            
Reconfigure, recompile, and test.
      The sql/share/charsets/README file includes
      additional instructions.
    

