I recently ran into a problem after downloading multiple CSV files. I wanted to firstly merge all those files and then extract/filter data from that merged file. Doing this is possible using code, but Linux has a bunch of powerful tools that can also handle such tasks and can do them very quickly(in comparison to Python).
Let us say we have a bunch of CSV files labelled like so: data_1.csv, data_2.csv and so on
In order to merge all these CSV files, we can use
sed 1d data-*.csv > merged.csv
The command collects all the data- files and after merging them, saves them to
merged.csv. From the reference article I learned that if you used
*.csv it may also merge the
merged.csv file and create duplicates and other issues.
Filtering - exact match
In order to find an exact entry in the CSV file, we can use
awk -F"," '$1==somestring' merged.csv
awk to view the comma as a separator. The
$1 says to look in the first column.
somestring is the exact match we are looking for in column 1.
merged.csv is the file we want to look into for the value.
Filtering - partial strings
My main filtering goal was to use partial strings to find certain values that match the partial string. Again we will use
awk 'index($1, "partialstring")' merged.csv
Similar to the above,
$1 looks in column 1 for the
"partialstring" string. In both the
awk commands you can also save the output to a file like so:
awk 'index($1, "partialstring")' merged.csv > output.txt
I hope these 3 commands help you save a lot of time if you are attempting to do anything with CSV files.