Disk seeks are a huge performance bottleneck. This problem becomes more apparent when the amount of data starts to grow so large that effective caching becomes impossible. For large databases where you access data more or less randomly, you can be sure that you need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlinking files to different disks or striping the disks:
Using symbolic links
              This means that, for MyISAM tables, you
              symlink the index file and data files from their usual
              location in the data directory to another disk (that may
              also be striped). This makes both the seek and read times
              better, assuming that the disk is not used for other
              purposes as well. See Section 7.6.1, “Using Symbolic Links”.
            
              Striping means that you have many disks and put the first
              block on the first disk, the second block on the second
              disk, and the N-th block on the
              (N MOD
              number_of_disks
The speed difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks, you may get differences measured in orders of magnitude. You have to choose to optimize for random or sequential access.
          For reliability, you may want to use RAID 0+1 (striping plus
          mirroring), but in this case, you need 2 ×
          N drives to hold
          N drives of data. This is probably
          the best option if you have the money for it. However, you may
          also have to invest in some volume-management software to
          handle it efficiently.
        
          A good option is to vary the RAID level according to how
          critical a type of data is. For example, store semi-important
          data that can be regenerated on a RAID 0 disk, but store
          really important data such as host information and logs on a
          RAID 0+1 or RAID N disk. RAID
          N can be a problem if you have many
          writes, due to the time required to update the parity bits.
        
          On Linux, you can get much more performance by using
          hdparm to configure your disk's interface.
          (Up to 100% under load is not uncommon.) The following
          hdparm options should be quite good for
          MySQL, and probably for many other applications:
        
hdparm -m 16 -d 1
          Note that performance and reliability when using this command
          depend on your hardware, so we strongly suggest that you test
          your system thoroughly after using hdparm.
          Please consult the hdparm manual page for
          more information. If hdparm is not used
          wisely, file system corruption may result, so back up
          everything before experimenting!
        
You can also set the parameters for the file system that the database uses:
          If you do not need to know when files were last accessed
          (which is not really useful on a database server), you can
          mount your file systems with the -o noatime
          option. That skips updates to the last access time in inodes
          on the file system, which avoids some disk seeks.
        
          On many operating systems, you can set a file system to be
          updated asynchronously by mounting it with the -o
          async option. If your computer is reasonably stable,
          this should give you more performance without sacrificing too
          much reliability. (This flag is on by default on Linux.)
        


User Comments
Add your own comment.