Make a better layout detector. Every character should end up on its correct line.
Separate more of the merged characters. (Not so easy.)
Deal better with frames, lines, pictures, etc.
