%
% $Id: copyedit-doc.tex,v 1.2 2014/07/17 09:44:04 cvr Exp cvr $
%
\IfFileExists{stm.cls}
  {\documentclass[doc,qlinks]{stm}}
  {%
    \documentclass[a4paper]{article}
    \usepackage[svgnames,dvipsnames]{xcolor}
  } 

\usepackage[lat=2,lang=ca,abbr=italic,last=false,eitem=1]{copyedit}
\usepackage{xspace}
\usepackage[osf,sc]{mathpazo}
\usepackage[osf]{sourcesanspro}
\usepackage{shortvrb}
\MakeShortVerb{\|}

 \usepackage[T1]{fontenc}

\makeatletter
\IfFileExists{inconsolata.sty}{%
  \usepackage[scaled=.9]{inconsolata}
   \def\verbatim@font{\normalfont\fontfamily{zi4}%
     \color{DarkSlateGray}\fontseries{m}\selectfont}
}{}
\def\latex{\LaTeX\xspace}
\def\tex{\TeX\xspace}
\def\tsc#1{{\small #1}\xspace}
\def\texht{\TeX4ht\xspace}
\long\def\textSF#1{\bgroup\color{DarkOliveGreen}\sffamily#1\egroup\xspace}
\def\textmd#1{\bgroup\fontseries{b}\selectfont#1\egroup}


\ifcsname version\endcsname\else\let\version\@gobble\fi
\ifcsname contact\endcsname\else\let\contact\@gobble\fi

\ifcsname qlinkfont\endcsname
  \qlinkfont{\sffamily\fontseries{m}\fontsize{8}{8.5}}
  \colorlet{quicklinkcolor}{white}
  \footerfont{\sffamily\fontseries{m}\fontsize{8}{8.5}}
  \colorlet{footercolor}{white}
\else
  \let\@author\@empty
  \renewcommand\author[2][]{\g@addto@macro\@author
    {\ifnum#1=2\relax \space and \fi
      #2\textsuperscript{\@alph{#1}}}} 
  \newcommand\address[2][]{\g@addto@macro\@author
    {\\\footnotesize\itshape\textsuperscript{\@alph{#1}}#2}}
  \let\footeraddress\@gobble
  \let\dummysection\relax
  \usepackage[colorlinks]{hyperref}
\fi

\makeatother

\begin{document}


\title{\tex and Copyediting}
\version{1.2}
\contact{cvr3@river-valley.org}
\author[1]{SK Venkatesan}
\author[2]{CV Rajagopal}
\date{20 Jul 2014}
\address[1]{\textsc{tnq} Books and Journals Pvt Ltd, Dr Vikram
  Sarabhai Instronic Estate, Kottivakkam, Chennai 600041, India}
\address[2]{River Valley Technologies, \textsc{jwra} 34, Jagathy, 
  Trivandrum 695014, India}

\maketitle

\footeraddress{\href{http://www.tnq.co.in}{\textsc{tnq} Books and
    Journals Pvt Ltd}, Kottivakkam, Chennai 600041, India\\
  \href{http://www.river-valley.com}{River Valley Technologies},
    \textsc{jwra} 34, Jagathy, Trivandrum 695014, India}


\addtolength\baselineskip{2pt}

\section{Introduction}
              
There can be many a slip between the cup and the lip in the publishing
process. The manuscript that arrives in a modern publisher's office,
usually as a \latex or a \textsc{ms w}ord file, gets transformed bit by
bit into a central \tsc{XML} form and then it is typeset into its
final \tsc{PDF} form. It is a bit like smelting and purifying iron
from its raw form and molding it into the final finished
product. Copyediting is a crucial step in the process and is receiving
increasing attention now as the copyediting changes are being clearly
indicated in the proofing process to the author.
              
Copyediting involves a broad range of activity: the accurate
conversion of the initial input to \tsc{XML}; ensuring consistency of
usage within the manuscript, correcting basic language and grammar,
applying the finer aspects of the publisher's style, and placing
\tsc{XML} hooks to ensure finer typographic aspects are taken
care of. The \tsc{XML} keeps the link alive between the present print-led
world and future worlds such as \tsc{HTML5}. Copyeditors and \tsc{XML}
form the bridge between these two worlds. However, there exist a lot
of different ways in which \tex can be misused to make life difficult
for a copyeditor~\cite{b1}, but we have come a long way from the
earlier days when the technology was still under-developed
\cite{b2}. \latex's own secret little macros and \texht have also made
it easier to form this bridge between the two worlds.
              
Like all professionals, copyeditors come from a long lineage
of tradition. Copyediting tries to filter out what it deems
imperfections and inconsistencies in the manuscript and also ensures
that author-reader communication is improved. Each publisher has an
in-house style guide that has been refined over many years and forms
the basis for copyediting. Our experience with different publishers
has established that it is possible to design a generic set of \tex
macros that can be used in the spirit of Bib\latex macros.

It should be mentioned here that these sets of macros are not designed
to replace copyeditors but to make it easier for them to take care of
mundane aspects of copyediting in a systematic way, so that they will
be able to concentrate on improving the crucial author-reader semantic
communication aspects. Despite market trends that go in the reverse
direction, the role of copyediting has never been more important in
the present world with varied rendering devices, with different aspect
ratios and modern semantic capabilities.

\section{Copyediting macros}
              
Copyediting involves quite a broad spectrum of activity. At one end of
the spectrum it improves semantic communication between the author and
the reader. At the other end of the spectrum it reinforces certain
stylistic and typographic conventions of the publisher. Semantic
aspects are much beyond the capability of ordinary \tex macros, so it
is at the latter end of the spectrum that most of this effort will be
focussed.
              
We first attempt to break the copyediting process into various modular
components:

\begin{enumerate}
\item Localization --- British-American-Australian-Canadian
\item Close-up, Hyphenation, and Spaced words
\item Latin abbreviations
\item Acronyms and Abbreviations
\item Itemization, nonlocal lists and labels
\item Parenthetical and serial commas
\item Non-local tokenization in language through Abbreviations and
  pronouns.
%\item Subject specific macros such as Genus-species identification

\end{enumerate}


\section{ Localization --- British-American-Canadian-Australian}

There are many sub-categories in British-American-Australian-Canadian
variations:

\subsection{DG (Am) versus DGE (Au, Br, Ca)}

In American spelling, words like \textSF{Acknowledg\textmd{e}ment} and
\textSF{Judg\textmd{e}ment} lose the \textSF{e} and become
\textSF{Acknowledgment} and \textSF{Judgment}.

\subsection{Z (Am, Ca) versus S (Br, Au)}

American and Canadian spelling prefers \textSF{ize}, while
Australian and British spelling uses \textSF{ise}, as in
\textSF{apolog\textmd{ize}/apolog\textmd{ise}} or
\textSF{author\textmd{ize}/author\textmd{ise}}.  However, the rule is
different for \textSF{yze/yse} patterns, as in
\textSF{anal\textmd{yze}/anal\textmd{yse}}: here only American spelling
prefers \textSF{z}, while the rest use the British \textSF{s}.

\subsection{S (Am, Ca) and C (Br, Au)}

In words like \textSF{defen\textmd{s}e/defen\textmd{c}e} and
\textSF{offen\textmd{s}e/offen\textmd{c}e}, American and Canadian
spelling prefers \textSF{s} instead of \textSF{c}.

\subsection{G (Am) and GUE (Au, Br, Ca)}
In words like \textSF{dialog/dialog\textmd{ue}} and
\textSF{catalog/catalog\textmd{ue}}, Americans prefer to drop the
\textSF{ue}.

\subsection{OR (Am)  and OUR (Au, Br, Ca)} 

In words like \textSF{color/colo\textmd{u}r} and
\textSF{favor/favo\textmd{u}r}, Americans do away with the \textSF{u} while
the rest keep the British spelling.

\subsection{ER (Am) and RE (Au, Br, Ca)} 

In words like \textSF{cent\textmd{er}/cent\textmd{re}} and
\textSF{calib\textmd{er}/calib\textmd{re}}, Americans prefer the
\textSF{er} spelling while the rest follow the British spelling.

\subsection{L (Am) and LL (Au, Br, Ca)} 

In words like \textSF{cance\textmd{l}ed/cance\textmd{ll}ed} and
\textSF{mode\textmd{l}ed/mode\textmd{ll}ed}, Americans prefer the single
\textSF{l} spelling while the rest prefer the double \textSF{l}.

\subsection{Others}

There are also many other patterns and differences that do not fall
into the above set of regular expression patterns, so these can
only be handled by a word list with a language mapping table.

We use a very simple macro to take care of all of this complexity:
|\vara{color}|
handles the British-American-Australian-Canadian variations.
The switch to a particular language spelling can be made by using:
\begin{verbatim}
 \usepackage[lang=uk]{copyedit}
\end{verbatim}
in the preamble.  Both |\vara{color}| and |\vara{colour}| would then
produce the same output, \textSF{colour}, so the author's original
need not be changed.  The options for the language switch in this
context include |lang=uk,ca,au|. The default language for the package is
British spelling.  To force a particular spelling in a particular
instance, use the starred form: |\vara*{analog}| leaves the input
unchanged as \textSF{analog}.
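
As an illustration only, the table lookup behind such a macro can be
sketched in a few lines of \latex. The names |\demovariant| and
|\demovara| are hypothetical and are not part of the package, whose
actual implementation may differ; the sketch assumes British spelling
as the target:
\begin{verbatim}
\documentclass{article}
\makeatletter
% Hypothetical word table: each known spelling maps to its British form.
\newcommand\demovariant[2]{\@namedef{demo@uk@#1}{#2}}
\demovariant{color}{colour}
\demovariant{colour}{colour}
% Starred form keeps the input; unstarred form consults the table
% and falls back to the input for unknown words.
\newcommand\demovara{\@ifstar\@firstofone\demo@vara}
\newcommand\demo@vara[1]{%
  \@ifundefined{demo@uk@#1}{#1}{\@nameuse{demo@uk@#1}}}
\makeatother
\begin{document}
\demovara{color}, \demovara{colour}, \demovara*{analog}
\end{document}
\end{verbatim}
Here both spellings typeset as \textSF{colour}, and the starred call
leaves \textSF{analog} untouched.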

\section{Close-up, Hyphenation, and Spaced words}

Although American spellings use less hyphenation, the modern
preference for closed prefixes has a few exceptions:
\begin{enumerate}
\item if the root word is a proper noun or a number (\textSF{post-Depression},
  \textSF{pre-2001})
\item for double prefix (\textSF{non-self-governing})
\item if the prefix precedes a proper open compound, in which case an
  en dash is used (\textSF{pre--Civil War})
\item if two instances of the letter
  \textSF{i} or the letter \textSF{a} are adjacent
  (\textSF{anti-intellectual}, \textSF{extra-action}), or another
  combination of letters that could hamper reading (\textSF{pro-labor})
\item for a double prefix (\textSF{anti-antibody})
\item for a repeated prefix with implicit use (\textSF{over- and
  understimulation})
\end{enumerate}

However, many house styles have their own preferences, which can be
dealt with using the starred macros.  We use |\hyp{anti}{body}|
to hyphenate a compound word, |\closeup{anti}{body}| for a closed-up
word, and |\sword{Civil}{War}| for compound words that occur as two
separate words.  You might wonder what use such macros are
in a \latex file.  They give visibility to the corrections the
copyeditor makes and offer hooks to produce a global inventory of
various changes, while at the same time making it convenient to make
switches on a global scale.
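
A short usage sketch, based only on the three macros named above; the
comments give the rendering one would expect from the description,
not verified package output:
\begin{verbatim}
  the \hyp{pre}{2001} figures      % hyphenated: pre-2001
  an \closeup{anti}{body} test     % closed up: antibody
  after the \sword{Civil}{War}     % two words: Civil War
\end{verbatim}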

\section{Latin abbreviations}

Latin abbreviations such as:

\medskip
\begin{tabular}{ll}
cf. & compare\\
et al. & and others\\
etc. & and so forth\\
e.g. & for example\\
i.e. & that is\\
NB   & note\\
viz. & namely
\end{tabular}

are quite straightforward to handle using macros such as |\lat{et al.}|, where
the stylistic aspect will be taken care of by global switches such as:
\begin{verbatim}
  \usepackage[lat=0,abbr=italic]{copyedit}
\end{verbatim}
The default |lat=0| leaves the text as it is, and |abbr=italic| sets it
in italic style.  The option |lat=1| removes all the dots and
|lat=2| replaces the abbreviation with its English equivalent shown above.
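
Going by the description above, a single occurrence would be rendered
as follows under the three settings; this is a sketch of the expected
behaviour, not verified output:
\begin{verbatim}
  % in the preamble
  \usepackage[lat=2]{copyedit}
  % in the text
  small birds, \lat{e.g.} warblers and wrens
  % lat=0: e.g.    lat=1: eg    lat=2: for example
\end{verbatim}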

\section{Acronyms and Abbreviations}

When an initial-letter abbreviation is spoken as a word,
as in \tsc{AIDS} (Acquired Immune Deficiency Syndrome), the
term acronym is used, but we will not make this distinction here and
treat the two as one and the same. A simple macro, |\ac{AIDS}|, is good
enough, and the default global switch will ensure that it is expanded
correctly the first time. The mapping between the acronym and its
expansion is declared the first time as:
\begin{verbatim}
  \newacro{AIDS}{Acquired Immune Deficiency Syndrome}
\end{verbatim}
However, many standard acronyms would be available by default from the
package, and only new acronyms need to be added this way. This can be
checked during compilation.
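
A minimal usage sketch, assuming |\ac| behaves as described above (the
exact form of the first-use expansion is not specified here):
\begin{verbatim}
  \newacro{AIDS}{Acquired Immune Deficiency Syndrome}
  ...
  \ac{AIDS} remains a global concern.   % first use: expanded
  Research on \ac{AIDS} continues.      % later uses: short form
\end{verbatim}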

\section{Itemizations and nonlocal lists and labels}

In many cases where a list has only a few items, we tend
to write it as in this example:
\begin{eitem}
\sffamily\color{Brown}\small
\item  this is an endangered species;
\item  humans find them delicious;
\item  they are only found on this island.
\end{eitem}
In this example we could as well have used first, second, third
instead of the |*ly| forms, making that a global option. It is also
possible to change this into a standard arabic numeral list:
|1)... 2)... 3)...| etc.  To keep open the possibility of making
such changes with a simple switch, one can use macros:
\begin{verbatim}
  \begin{eitem}
    \item this is an endangered species;
    \item humans find them delicious;
    \item they are only found on this island.
  \end{eitem}
\end{verbatim}
When \latex is run for the third and final time, there is an option to
change the last item in the list to \textSF{lastly}, but that is a
global switch:
\begin{verbatim}
  \usepackage[eitem=0,last]{copyedit}
\end{verbatim}
where |eitem=0| is the default option that produces
\textSF{firstly}, \textSF{secondly\dots} and |last| indicates that the last
item should be \textSF{lastly}. If |eitem=1| is set then the \textSF{ly}
drops out, and for |eitem=2,3,...| it switches to standard enumerated
and bulleted lists.
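
The need for more than one \latex run can be illustrated with a
stand-alone sketch in which a list records its length in the
|.aux| file so that the next run can recognize the last item. The names
|demolist|, |\demoentry|, |\demototal| and |\demosettotal| are
hypothetical, the sketch handles a single list per document, and the
package itself may work differently:
\begin{verbatim}
\documentclass{article}
\makeatletter
\newcounter{demoitem}
% Length of the list as recorded by the previous run (0 on the first run).
\newcommand\demototal{0}
\newcommand\demosettotal[1]{\gdef\demototal{#1}}
\newenvironment{demolist}
  {\setcounter{demoitem}{0}\begin{itemize}}
  {\end{itemize}%
   % Record the final count in the .aux file for the next run.
   \immediate\write\@auxout{\string\demosettotal{\arabic{demoitem}}}}
% Label an item "lastly" only when its number matches the recorded total.
\newcommand\demoentry{\stepcounter{demoitem}%
  \item[\ifnum\value{demoitem}=\demototal\relax lastly\else next\fi]}
\makeatother
\begin{document}
\begin{demolist}
\demoentry this is an endangered species;
\demoentry humans find them delicious;
\demoentry they are only found on this island.
\end{demolist}
\end{document}
\end{verbatim}
On the first run no item matches the stored total, so the label
\textSF{lastly} appears only from the second run onwards.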

\subsubsection*{eitem=0, last=true}

\cesetup{eitem=0,last=true}
\begin{eitem}
\item  this is an endangered species;
\item  humans find them delicious;
\item  they are only found on this island.
\end{eitem}

\subsubsection*{eitem=1, last=true}

\cesetup{eitem=1,last=true}
\begin{eitem}
\item  this is an endangered species;
\item  humans find them delicious;
\item  they are only found on this island.
\end{eitem}

\subsubsection*{eitem=2}

\cesetup{eitem=2,last=true}
\begin{eitem}
\item  this is an endangered species;
\item  humans find them delicious;
\item  they are only found on this island.
\end{eitem}

\subsubsection*{eitem=3}

\cesetup{eitem=3,last=true}
\begin{eitem}
\item  this is an endangered species;
\item  humans find them delicious;
\item  they are only found on this island.
\end{eitem}

\subsubsection*{eitem=4, last=true}

This option and the succeeding one set the list in paragraph
mode instead of the usual vertical list. Also, the semicolon at the
end of each item and the \textSF{and} connector at the end of the
penultimate item will be automatically added.

\begin{quote}
\cesetup{eitem=4,last=true}
\begin{eitem}
\item  this is an endangered species
\item  humans find them delicious
\item  they are only found on this island.
\end{eitem}
\end{quote}


\subsubsection*{eitem=5, last=true}

\begin{quote}
\cesetup{eitem=5,last=true}
\begin{eitem}
\item  this is an endangered species
\item  humans find them delicious
\item  they are only found on this island.
\end{eitem}
\end{quote}

\section{Parenthetical and serial commas}

Many long sentences are difficult to read and can be communicated
better with parenthetical constructs or footnotes rather than
commas. It would be nice to have switches that can make this
change. For example:
\begin{quote}
\noindent
The enthusiastic young ducks flying in front of the group |\pc{led|
  |by| |the| |sagacious| |older| |ones| |at| |the| |back}| make a
lot of noise and turbulence, which are used by older ones at
the back to warm their heart and the wings.
\end{quote}
would output:
\begin{quote}
\noindent
The enthusiastic young ducks flying in front of the group
\pc{\textSF{led by the sagacious older ones at the back}} make a lot of
noise and turbulence, which are used by older ones at the back to warm
their heart and the wings.
\end{quote}
Depending on the global switch |pc=0,1,2,3| or |4|, we have the option of
choosing parenthetical commas, parentheses, em dashes, a footnote or a sidenote.

See the output when |pc=1| (in parentheses):
\cesetup{pc=1}
\begin{quote}
\noindent The enthusiastic young ducks flying in front of the group
\pc{\textSF{led by the sagacious older ones at the back}} make a lot of
noise and turbulence, which are used by older ones at the back to warm
their heart and the wings.
\end{quote}

See the output when |pc=2| (between emdashes):
\cesetup{pc=2}
\begin{quote}
\noindent
The enthusiastic young ducks flying in front of the group
\pc{\textSF{led by the sagacious older ones at the back}} make a lot of
noise and turbulence, which are used by older ones at the back to warm
their heart and the wings.
\end{quote}

See the output when |pc=3| (as a footnote):
\cesetup{pc=3}
\begin{quote}
\noindent
The enthusiastic young ducks flying in front of the group
\pc{\textSF{led by the sagacious older ones at the back}} make a lot of
noise and turbulence, which are used by older ones at the back to warm
their heart and the wings.
\end{quote}

See the output when |pc=4| (as a marginpar):
\cesetup{pc=4}
\begin{quote}
\noindent
The enthusiastic young ducks flying in front of the group
\pc{\textSF{led by the sagacious older ones at the back}} make a lot of
noise and turbulence, which are used by older ones at the back to warm
their heart and the wings.
\end{quote}

\subsubsection{Elist}

For a list of items such as: \textSF{Suddenly warblers, tits, wrens,
and hummingbirds started singing in chorus from the bushes\dots} we
turn them into:

\begin{quote}
  \noindent Suddenly |\elist{warblers,tits,wrens,hummingbirds}|
  started singing in chorus from the bushes\dots
\end{quote}
This will be transformed into:
\begin{quote}
  \noindent Suddenly \textSF{\elist{warblers,tits,wrens,hummingbirds}}
  started singing in chorus from the bushes\dots
\end{quote}

This macro will help bring consistency across the document regarding
the placement of the comma before and after \textSF{and} in the last item
and in ensuring proper white space after the comma.

\section{Non-local tokenization in language through Abbreviations and pronouns}

In a typical news column, the copyeditor deals with a sequence of
minimization operations:
\begin{quote}
\noindent
His Holiness, the Prince of Mangoistan addressed a gathering of
ordinary mangoes in the capital New Mango. The Prince of Mangoistan
pointed out the serious threat of foreign insects in the country. He
further pointed out the precautionary methods taken such as the use of
organic insect repellants like the neem leaves and cow-dung to keep
the country free of foreign pests\dots
\end{quote}
\textSF{His Holiness the Prince of Mangoistan} shrinks to \textSF{The
  Prince of Mangoistan} and then finally to \textSF{He}. This
copyediting operation can be denoted using:
\begin{verbatim}
 \definetoken{mango}{His Holiness, the Prince of Mangoistan}
     {The Prince of Mangoistan}{He}
\end{verbatim}
at the first instance and then just |\Token{mango}| at the later
instances. The |\Token{mango}| macro would be quite useful even
just to indicate what the important pronouns link to in a
paragraph. However, not all pronouns have corresponding original
objects, as in the case of \textSF{it} in \textSF{It is raining}.
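
Applied to the passage above, the markup might look as follows; this
sketch assumes, as the description suggests, that |\definetoken|
typesets the full form where it is declared and that each later
|\Token{mango}| yields the next shorter form:
\begin{verbatim}
  \definetoken{mango}{His Holiness, the Prince of Mangoistan}
      {The Prince of Mangoistan}{He}
  addressed a gathering of ordinary mangoes in the capital New Mango.
  \Token{mango} pointed out the serious threat of foreign insects in
  the country. \Token{mango} further pointed out ...
\end{verbatim}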

% \section{Subject specific macros such as Genus-species identification}

% The Genus species formatting is similar to latin abbreviations in many
% ways but it follows its own conventions as well. The macro:
% \begin{verbatim}
%   \gensp{E. coli}
% \end{verbatim}
% italicizes all instances and expands the abbreviations at the first
% instance.  Quite like the abbreviation macro |\abbr| this also allows
% the embedding of new undefined genus species entities as in:
% \begin{verbatim}
%   \gensp{E. coli}{Escherichia coli}
% \end{verbatim}

\section{Interactive proofing}

The above set of macros brings a certain level of transparency and
consistency to the copyediting process. Using additional macros, this
also has the potential to convey the key aspects of
copyediting to the author through menus and dashboards, bringing a certain
interactive aspect to the proofing process.

\section{Conclusion}

We have made an attempt at bringing together many copyediting aspects
as \latex macros. This involves some amount of drastic simplification
and abstraction that may or may not work in all cases. The starred
macros could be used in those cases where one needs to escape the
global switch. The non-local linkages work just as in the case of
bibliography links, by multiple compilations of \latex that pass
information through auxiliary files. Of course, this is only a small
step towards the Himalayan task of climbing the semantic hill through
\latex macros as envisaged by Sense\tex \cite{b3}.

\section*{Acknowledgements}

We would like to thank Lorna O'Brien for important inputs on English
language and its varied usages across countries and publishers. Of
course, this work would not have been possible without the constant
encouragement of Mariam Ram, \tsc{TNQ} and C.V.~Radhakrishnan, River
Valley Technologies.

\begin{thebibliography}{0}

\bibitem{b1} E. Gregorio (2005) Horrors in \latex: How to misuse \latex
  and make a copy editor unhappy, \textit{TUGboat} \textbf{26}(3), 273--279.

\bibitem{b2} P. Flynn (1993) \tex and \tsc{SGML}: A Recipe for Disaster?
  \textit{TUGboat} \textbf{14}(3), 227--230.

\bibitem{b3} S.K. Venkatesan (2005) Moving from bytes to words to
  semantics. \textit{TUGboat} \textbf{26}(2), 165--168.

\end{thebibliography}

\dummysection{}
\end{document}