Sunday 21 February 2021

Replacing CRLF from files

Remove CRLF from files


The following is a number of methods you can use to strip ^M from files.

This was more of an exercise than a feature I need regularly. However, there was a recent incident at work where someone had checked-in such an abomination. Some tools just don't like Windows line endings in files, ie. CRLF, Carriage Return Line Feed. So, as an exercise, here is a collation of the numerous ways to fix these aberrations ...

Note Most editors have a quick way of doing this.

file

You can see whether a file has ^M using the file command:

Example

Consider a file containing ^Mfile reports:

$ file test.txt
test.txt: ASCII text, with CRLF, LF line terminators

After stripping ^M, we have:

$ file test-fixed.txt
test-fixed.txt: ASCII text

dos2unix

Probably the simplest way is to use the dos2unix command.

Example

Show file has ^M line endings:

$ dos2unix -i test.txt
       3      16       0  no_bom    text    test.txt

Now fix:

$ dos2unix test.txt

Proof that file has been fixed:

$ dos2unix -i test.txt
       0      19       0  no_bom    text    test.txt

Haskell

There is a simple Haskell program to do this:

fixcrlf.hs
#!/usr/bin/env runhaskell
 
import           System.Environment (getArgs, getProgName)
 
main = getArgs >>= \args -> case length args of
  1 -> mapM_ (putStrLn . filter (/= '\r')) . lines =<< readFile (head args)
  _ -> getProgName >>= \p -> error $ "Usage: " ++ p ++ " [file_name]"

Example

$ ./fixcrlf.hs test.txt > test-fixed.txt
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Perl

Using Perl:

$ perl -pi~ -e 's/^M//g' source.file

Where ^M is a control character entered by CTRL-v followed by CTRL-m.

This will keep a backup with the original file saved to source.file~.

Example

$ perl -pi~ -e 's/^M//g' test.txt
$ file test.txt~ test.txt
test.txt~: ASCII text, with CRLF line terminators
test.txt:  ASCII text

Python

Using Python:

fixcrlf.py
#!/usr/bin/env python3
 
import sys
 
if len(sys.argv) == 2:
    with open(sys.argv[1], "r") as f:
        for line in f:
            print(line.rstrip('\r\n'))
else:
    print(f"Usage: {sys.argv[0]} [file-to-fix] > [fixed-file]")

Example

$ ./fixcrlf.py test.txt > test-fixed.txt
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Ruby

Using Ruby:

fixcrlf.rb
#!/usr/bin/env ruby
 
raise "Usage: #{$PROGRAM_NAME} [file-to-fix] > [fixed-file]" if ARGV.length != 1
 
File.foreach(ARGV[0]) { |line| puts line.chop }

Example

$ ./fixcrlf.rb test.txt > test-fixed.txt 
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Using shell tools

Using the translate tool, tr:

$ cat source.file | tr -d '\r' > source.file.fixed

This can also be used in a more general fashion to remove non-printable characters from a file:

tr -dc '[:print:]\n' < source.file > source.file.changed

Example

$ cat test.txt | tr -d '\r' > test-fixed.txt 
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text