A common error when writing large amounts of prose is to accidentally duplicate words. Typically this shows up in text as something like "the the program does the following..." When the text is online, the duplicated words often occur at the end of one line and the beginning of the next, making them very difficult to spot.
This program, dupword.awk, scans through a file one line at a time
and looks for adjacent occurrences of the same word.  It also saves the last
word on a line (in the variable prev) for comparison with the first
word on the next line.
The first statement converts the line to lowercase,
so that, for example, "The" and "the" compare equal to each other.
The next statement replaces every character that is neither alphanumeric
nor blank (a space or a tab) with a space, so that punctuation does not
affect the comparison either.
The characters are replaced with spaces so that formatting controls
don't create nonsense words (e.g., the Texinfo @code{NF}
becomes codeNF if punctuation is simply deleted).  The record is
then re-split into fields, yielding just the actual words on the line,
and ensuring that there are no empty fields.
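The difference between deleting punctuation and replacing it with spaces can be seen with a short shell sketch. The helper `split_words` is hypothetical: it applies the same `tolower()` and `gsub()` steps as the program, takes the replacement string as its only argument, and prints the resulting fields joined with `|` so the field boundaries are visible. It assumes an awk that supports POSIX character classes such as `[:alnum:]`.

```shell
#!/bin/sh
# Hypothetical helper: normalize each input line the way dupword.awk
# does, except that the replacement for punctuation is passed in as
# $1 (use '' to delete punctuation, ' ' to replace it with spaces).
# Prints the resulting fields joined with "|" so the splits are visible.
split_words() {
    awk -v rep="$1" '{
        $0 = tolower($0)
        gsub(/[^[:alnum:][:blank:]]/, rep)
        $0 = $0                     # re-split into fields
        out = ""
        for (i = 1; i <= NF; i++)
            out = (i == 1) ? $i : (out "|" $i)
        print out
    }'
}

echo 'See @code{NF}.' | split_words ''    # prints: see|codenf
echo 'See @code{NF}.' | split_words ' '   # prints: see|code|nf
```

With deletion, the Texinfo markup fuses into the single nonsense word `codenf` (lowercase, because case folding happens first); with spaces, `code` and `nf` remain separate fields that can be compared sensibly.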
If there are no fields left after removing all the punctuation, the
current record is skipped.  Otherwise, the program compares the first
word with the last word of the previous line, and then loops through
the remaining words, comparing each one to the word before it:
# dupword.awk --- find duplicate words in text
{
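    # Fold case so that "The" and "the" compare equal
    $0 = tolower($0)
    # Turn each character that is neither alphanumeric nor blank into a space
    gsub(/[^[:alnum:][:blank:]]/, " ")
    $0 = $0         # re-split
    # Skip lines that have no words left
    if (NF == 0)
        next
    # Compare the first word with the last word of the previous line
    if ($1 == prev)
        printf("%s:%d: duplicate %s\n",
            FILENAME, FNR, $1)
    # Compare each remaining word with the one before it
    for (i = 2; i <= NF; i++)
        if ($i == $(i-1))
            printf("%s:%d: duplicate %s\n",
                FILENAME, FNR, $i)
    # Remember the last word for comparison with the next line
    prev = $NF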
}
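To see the program catch a duplicate that straddles a line break, the following sketch wraps the program body in a shell function and runs it on a small sample file. The function name `dupword`, the file name `sample.txt`, and the use of `mktemp -d` for a scratch directory are assumptions made for illustration; any awk with POSIX character classes should behave the same way.

```shell
#!/bin/sh
# Hypothetical wrapper: run the dupword.awk program body on the named files.
dupword() {
    awk '{
        $0 = tolower($0)
        gsub(/[^[:alnum:][:blank:]]/, " ")
        $0 = $0                     # re-split
        if (NF == 0)
            next
        if ($1 == prev)             # duplicate across a line break
            printf("%s:%d: duplicate %s\n", FILENAME, FNR, $1)
        for (i = 2; i <= NF; i++)   # duplicates within the line
            if ($i == $(i-1))
                printf("%s:%d: duplicate %s\n", FILENAME, FNR, $i)
        prev = $NF
    }' "$@"
}

# Work in a scratch directory so FILENAME is a short, predictable name.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > sample.txt <<'EOF'
The the program does the following:
it reads a line and
and compares words.
EOF

dupword sample.txt
# prints:
#   sample.txt:1: duplicate the
#   sample.txt:3: duplicate and
```

The first report comes from the loop over adjacent fields within line 1. The second comes from the `$1 == prev` test: "and" ends line 2 and begins line 3, which is exactly the case that is hardest to spot by eye.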