
Sunday, 21 February 2021

Replacing CRLF from files

Remove CRLF from files


The following are several methods you can use to strip ^M (carriage return) characters from files.

This was more of an exercise than something I need regularly. However, there was a recent incident at work where someone had checked in such an abomination. Some tools just don't like Windows line endings, i.e. CRLF (Carriage Return, Line Feed), in files. So, as an exercise, here is a collection of the numerous ways to fix these aberrations ...

Note: Most editors have a quick way of doing this.

file

You can see whether a file has ^M using the file command:

Example

For a file containing ^M, file reports:

$ file test.txt
test.txt: ASCII text, with CRLF, LF line terminators

After stripping ^M, we have:

$ file test-fixed.txt
test-fixed.txt: ASCII text
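
The same check can be approximated in a few lines of Python. This is only a rough sketch and not how file itself works: the function name count_line_endings is made up for illustration, and it simply counts byte sequences in the raw file contents.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line breaks in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Example on an in-memory sample with mixed line endings
sample = b"one\r\ntwo\nthree\r\n"
print(count_line_endings(sample))
```

A non-zero crlf count alongside a non-zero lf count corresponds to the "CRLF, LF" case reported by file above. To check a real file, read it in binary mode (open(path, "rb")) so that Python does not translate the line endings first.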

dos2unix

Probably the simplest way is to use the dos2unix command.

Example

Show that the file has ^M line endings (the first three columns count DOS, Unix and Mac line breaks):

$ dos2unix -i test.txt
       3      16       0  no_bom    text    test.txt

Now fix:

$ dos2unix test.txt

Proof that the file has been fixed:

$ dos2unix -i test.txt
       0      19       0  no_bom    text    test.txt
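
For comparison, the core of that conversion can be sketched in Python. This is an approximation under stated assumptions rather than a description of dos2unix internals: the helper name crlf_to_lf is hypothetical, it works on raw bytes, and it ignores the BOM handling and binary-file checks that dos2unix performs.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Turn every CRLF pair into a single LF; a lone CR is left untouched."""
    return data.replace(b"\r\n", b"\n")


# Mixed input: two DOS line breaks and one Unix line break
sample = b"one\r\ntwo\nthree\r\n"
fixed = crlf_to_lf(sample)
print(fixed)
```

Replacing only the CRLF pair, rather than deleting every carriage return, keeps any stray CR in the middle of a line, which differs from the filter-style approaches further down.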

Haskell

There is a simple Haskell program to do this:

fixcrlf.hs
#!/usr/bin/env runhaskell

import System.Environment (getArgs, getProgName)

-- Print the named file to stdout with every carriage return removed.
main :: IO ()
main = getArgs >>= \args -> case args of
  [path] -> mapM_ (putStrLn . filter (/= '\r')) . lines =<< readFile path
  _      -> getProgName >>= \p -> error $ "Usage: " ++ p ++ " [file_name]"

Example

$ ./fixcrlf.hs test.txt > test-fixed.txt
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Perl

Using Perl:

$ perl -pi~ -e 's/^M//g' source.file

Where ^M is a literal control character entered by CTRL-v followed by CTRL-m. The escape \r can be used in the pattern instead, which avoids typing the control character.

This keeps a backup of the original file in source.file~.

Example

$ perl -pi~ -e 's/^M//g' test.txt
$ file test.txt~ test.txt
test.txt~: ASCII text, with CRLF line terminators
test.txt:  ASCII text
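
The edit-in-place-with-backup idea can also be sketched in Python. This is a minimal illustration and not what perl -i does internally: the function name fix_in_place and its backup_suffix parameter are assumptions made for this example, and the whole file is read into memory.

```python
import shutil
from pathlib import Path


def fix_in_place(path: str, backup_suffix: str = "~") -> None:
    """Strip carriage returns from path, keeping the original as path + suffix."""
    src = Path(path)
    backup = src.with_name(src.name + backup_suffix)
    shutil.copy2(src, backup)  # save the original before touching it
    data = backup.read_bytes()
    src.write_bytes(data.replace(b"\r", b""))  # same effect as s/\r//g
```

Calling fix_in_place("test.txt") would leave the untouched original in test.txt~ and the stripped version in test.txt, matching the file names in the example above.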

Python

Using Python:

fixcrlf.py
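#!/usr/bin/env python3

import sys

if len(sys.argv) == 2:
    # Text mode reads \r\n as a single line ending
    with open(sys.argv[1], "r") as f:
        for line in f:
            # Drop the line ending; print adds a plain newline back
            print(line.rstrip('\r\n'))
else:
    # Usage goes to stderr with a non-zero exit, so it never lands in the redirected file
    sys.exit(f"Usage: {sys.argv[0]} [file-to-fix] > [fixed-file]")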

Example

$ ./fixcrlf.py test.txt > test-fixed.txt
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Ruby

Using Ruby:

fixcrlf.rb
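#!/usr/bin/env ruby

# Print usage on stderr and exit non-zero unless exactly one file is given
abort "Usage: #{$PROGRAM_NAME} [file-to-fix] > [fixed-file]" if ARGV.length != 1

# chomp removes a trailing \r\n, \n or \r and leaves an unterminated last line intact
File.foreach(ARGV[0]) { |line| puts line.chomp }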

Example

$ ./fixcrlf.rb test.txt > test-fixed.txt 
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Using shell tools

Using the translate tool, tr:

$ cat source.file | tr -d '\r' > source.file.fixed

This can also be used in a more general fashion to remove non-printable characters from a file. Note that tabs are not in [:print:], so they are deleted as well:

$ tr -dc '[:print:]\n' < source.file > source.file.changed

Example

$ cat test.txt | tr -d '\r' > test-fixed.txt 
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text
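
Both tr invocations map onto bytes.translate in Python. The sketch below is illustrative only: the names delete_cr, keep_printable, KEEP and DROP are made up here, and the printable set assumes the C/ASCII locale, where [:print:] covers the bytes from space (0x20) to tilde (0x7E).

```python
# Bytes that survive the filter: printable ASCII plus newline (assumes the C locale)
KEEP = bytes(range(0x20, 0x7F)) + b"\n"
# Everything else is deleted, like tr -dc '[:print:]\n'
DROP = bytes(b for b in range(256) if b not in KEEP)


def delete_cr(data: bytes) -> bytes:
    """Equivalent of tr -d '\\r': delete every carriage return."""
    return data.translate(None, b"\r")


def keep_printable(data: bytes) -> bytes:
    """Equivalent of tr -dc '[:print:]\\n': delete all but printable bytes and LF."""
    return data.translate(None, DROP)


print(delete_cr(b"a\r\nb\r\n"))
print(keep_printable(b"a\tb\r\n\x00c\n"))
```

The second call shows that the tab and the NUL byte are dropped along with the carriage return, so the general form is only appropriate when losing tabs is acceptable.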