Tuesday, August 14, 2012

Using `awk` to Convert CSV Format Data

The awk utility provides a handy way to extract and transform data.

In many application scenarios, we wish to quickly convert a form of tabular data to another tabular form that 1) is separated using a different character, 2) contains a subset of all the fields, or 3) contains a reordered set of fields. The awk utility can be very helpful in these situations by calling only its print command.

1.       To CSV

For example, the table1.txt contains data in tabular form of:

1    a1   b1
2    a2   b2
3    a3   b3
4    a4   b4

And the command below:

$ awk '{print $3","$1;}' table1.txt

will extract the 3rd and the 1st fields of table1.txt which are separated by tabs (or spaces), and generate the results in the comma separated format:

b1,1
b2,2
b3,3
b4,4

2.       From CSV

Or you can also specify the separator in the file by the `-F` argument. For example, the table2.txt contains comma separated data of:

1,a1,b1
2,a2,b2
3,a3,b3
4,a4,b4

And the command of:

$ awk -F, '{print $NF"\t"$2"\t"$1;}' table2.txt

states that the separator of the fields will be `,`. And it will convert the fields in table2.txt into the tab separated form (NF means the number of last field):
b1 a1 1
b2 a2 2
b3 a3 3
b4 a4 4



No comments:

Post a Comment

(Coding && Eating) || Sleeping