Showing posts with label bash. Show all posts

Sunday, 21 February 2021

Replacing CRLF from files

Remove CRLF from files


The following are several methods you can use to strip ^M (carriage return) characters from files.

This was more of an exercise than a feature I need regularly. However, there was a recent incident at work where someone had checked in such an abomination. Some tools just don't like Windows line endings in files, i.e. CRLF (Carriage Return, Line Feed). So, as an exercise, here is a collation of the numerous ways to fix these aberrations ...

Note: Most editors have a quick way of doing this.

file

You can see whether a file has ^M using the file command:

Example

Consider a file containing ^M. file reports:

$ file test.txt
test.txt: ASCII text, with CRLF, LF line terminators

After stripping ^M, we have:

$ file test-fixed.txt
test-fixed.txt: ASCII text
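
Where the file command is not to hand, a similar check can be sketched with grep. This is only a rough illustration: the function name count_cr_lines is made up here, and it counts lines containing a carriage return anywhere, not just at the end of the line.

```shell
#!/bin/sh

# count_cr_lines FILE: print how many lines of FILE contain a carriage return.
# The name count_cr_lines is hypothetical, chosen for this sketch.
count_cr_lines() {
    # printf emits a literal CR; command substitution strips only trailing newlines
    cr=$(printf '\r')
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -c -- "$cr" "$1" || true
}
```

For example, after running printf 'a\r\nb\n' > sample.txt (sample.txt being a throwaway file), count_cr_lines sample.txt prints 1.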

dos2unix

Probably the simplest way is to use the dos2unix command.

Example

Show that the file has ^M line endings (the first three columns count DOS, Unix and Mac line breaks):

$ dos2unix -i test.txt
       3      16       0  no_bom    text    test.txt

Now fix:

$ dos2unix test.txt

Proof that the file has been fixed:

$ dos2unix -i test.txt
       0      19       0  no_bom    text    test.txt

Haskell

There is a simple Haskell program to do this:

fixcrlf.hs
#!/usr/bin/env runhaskell
 
import           System.Environment (getArgs, getProgName)
 
main = getArgs >>= \args -> case length args of
  1 -> mapM_ (putStrLn . filter (/= '\r')) . lines =<< readFile (head args)
  _ -> getProgName >>= \p -> error $ "Usage: " ++ p ++ " [file_name]"

Example

$ ./fixcrlf.hs test.txt > test-fixed.txt
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Perl

Using Perl:

$ perl -pi~ -e 's/^M//g' source.file

Where ^M is a control character entered by CTRL-v followed by CTRL-m. Writing \r in the pattern works as well and avoids typing the control character.

This edits the file in place and keeps a backup of the original in source.file~.

Example

$ perl -pi~ -e 's/^M//g' test.txt
$ file test.txt~ test.txt
test.txt~: ASCII text, with CRLF line terminators
test.txt:  ASCII text

Python

Using Python:

fixcrlf.py
#!/usr/bin/env python3
 
import sys
 
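if len(sys.argv) == 2:
    # Text mode already translates '\r\n' to '\n' on read; rstrip() drops what is left
    with open(sys.argv[1], "r") as f:
        for line in f:
            print(line.rstrip('\r\n'))
else:
    # Usage goes to stderr with a non-zero exit, so it never lands in the redirected output
    sys.exit(f"Usage: {sys.argv[0]} [file-to-fix] > [fixed-file]")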

Example

$ ./fixcrlf.py test.txt > test-fixed.txt
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Ruby

Using Ruby:

fixcrlf.rb
#!/usr/bin/env ruby
 
raise "Usage: #{$PROGRAM_NAME} [file-to-fix] > [fixed-file]" if ARGV.length != 1
 
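# chomp removes only a trailing line ending (\r\n, \n or \r); chop would also
# eat the last character of a final line that has no newline
File.foreach(ARGV[0]) { |line| puts line.chomp }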

Example

$ ./fixcrlf.rb test.txt > test-fixed.txt 
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text

Using shell tools

Using the translate tool, tr:

$ cat source.file | tr -d '\r' > source.file.fixed

This can also be used in a more general fashion to remove non-printable characters from a file. Note that this deletes tabs too, since [:print:] does not include them:

$ tr -dc '[:print:]\n' < source.file > source.file.changed

Example

$ cat test.txt | tr -d '\r' > test-fixed.txt 
$ file test.txt test-fixed.txt
test.txt:       ASCII text, with CRLF line terminators
test-fixed.txt: ASCII text
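
Unlike the Perl one-liner, tr cannot edit a file in place. A minimal sketch of a wrapper that does so through a temporary file is shown below. The function name strip_cr_inplace and the .bak suffix are assumptions made for this example, not part of any tool.

```shell
#!/bin/sh

# strip_cr_inplace FILE: remove every carriage return from FILE,
# keeping the original as FILE.bak. Hypothetical helper for illustration.
strip_cr_inplace() {
    local src=$1 tmp
    tmp=$(mktemp) || return 1
    # Write the cleaned copy first; only touch the original if tr succeeded.
    tr -d '\r' < "$src" > "$tmp" &&
        cp -p -- "$src" "$src.bak" &&
        cat -- "$tmp" > "$src"   # overwrite contents, keeping the file's permissions
    local status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

Calling strip_cr_inplace test.txt would then leave the cleaned text in test.txt and the untouched original in test.txt.bak.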

Friday, 7 March 2014

Bash arrays - what is the difference between array[*] and array[@]?

I've just been updating some Bash scripts and noticed that there are two ways to refer to all the elements of an array: one using "*", the other using "@". So what is the difference? According to the Advanced Bash-Scripting Guide (linked below):

The "@" variable allows word splitting within quotes (extracts variables separated by whitespace). This corresponds to the behaviour of "$@" and "$*" in positional parameters.

In other words, inside double quotes "${ARRAY[@]}" expands to one word per element, while "${ARRAY[*]}" expands to a single word containing all the elements. So you get two different outcomes, as demonstrated below:

$ ARRAY=(0 1 2 3 4 5 6 7 8 9)
 
# first using "@" inside quotes
# (for this array you get the same output with no quotes,
# because no element contains whitespace) ...
$ for i in "${ARRAY[@]}"; do echo "$i"; done
0
1
2
3
4
5
6
7
8
9
 
# the same loop but using "*": the whole array becomes a single word ...
$ for i in "${ARRAY[*]}"; do echo "$i"; done
0 1 2 3 4 5 6 7 8 9
 
# but without quotes ...
$ for i in ${ARRAY[*]}; do echo "$i"; done
0
1
2
3
4
5
6
7
8
9
Source: http://www.tldp.org/LDP/abs/html/arrays.html
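
The difference matters most when elements contain whitespace, which the digits above do not. The sketch below counts how many words each form expands to. The helper count_args and the array FILES are made up for this illustration.

```shell
#!/usr/bin/env bash

# count_args: print how many separate words (arguments) it received.
count_args() {
    echo "$#"
}

# Hypothetical array whose elements contain spaces.
FILES=("my file.txt" "notes.md" "a b c")

count_args "${FILES[@]}"   # 3: one word per element
count_args "${FILES[*]}"   # 1: all elements joined into a single word
count_args ${FILES[@]}     # 6: unquoted, every element is split on whitespace
count_args ${FILES[*]}     # 6: unquoted, same splitting

# "${FILES[*]}" joins the elements with the first character of IFS.
joined=$(IFS=,; echo "${FILES[*]}")
echo "$joined"             # my file.txt,notes.md,a b c
```

So "${FILES[@]}" is usually the form you want when looping over elements, and "${FILES[*]}" is handy when you want one joined string.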