Using multiple delimiters when one of them is a pipe character - awk

I have a text file where fields are separated by a pipe character. Since it is human-readable text, spaces are used for column alignment.
Here is a sample input:
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
<more such lines>
+------------------------------------------+----------------+------------------+
How can I use awk to extract the third field in this case?
I tried:
awk -F '[ |]' '{print $3}' file
awk -F '[\|| ]' '{print $3}' file
awk -F '[\| ]' '{print $3}' file
The expected result is:
<blank>
Column3
<blank>
third_text
<more column 3 values>
<blank>
I am trying to achieve this with a single awk command. Isn't that possible?
The following post talks about using pipe as a delimiter in awk but it doesn't talk about the case of multiple delimiters where one of them is a pipe character:
Using pipe character as a field separator

Am I missing something?
Example input:
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
| some_text2| other_text2 | third_text2 | fourth_text2 | last_text2 |
+------------------------------------------+----------------+------------------+
Command:
gawk -F '[| ]*' '{print $4}' <file>
Output:
<blank>
Column3
<blank>
third_text
third_text2
<blank>
This works for every column: to get column i, print field i+1, because the leading | makes the first field empty (and the +----- ruler lines produce blanks).
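If you want the column number as a parameter instead of hard-coding $4, the same idea can be generalized (a sketch; col is an illustrative variable name, and [| ]+ is used instead of [| ]* so the separator regex can never match an empty string):
awk -v col=3 -F'[| ]+' '{print $(col + 1)}' file
As above, field col+1 is printed because the leading | produces an empty first field.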

perl is better suited for this use case:
$ perl -F'\s*\|\s*' -lane 'print $F[3]' File
# Perl's -F switch takes the delimiter with full regex support (like awk's -F, but more powerful).
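For comparison, plain awk accepts essentially the same regex once the pipe is placed in a bracket expression so it is taken literally (a sketch against the sample above):
awk -F' *[|] *' '{print $4}' file
Here $4 is column 3 for the same reason as before: the leading | creates an empty first field.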

First pre-parse with sed: remove the first, third, and last lines; replace each run of spaces + | + spaces with a single |; and strip the leading |. Then just split with awk on | (it could really be cut -d'|' -f3):
sed '1d;3d;$d;s/ *| */|/g;s/^|//' file |
awk -F'|' '{print $3}'
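For instance, the cut variant of the same pipeline (a sketch, with the sample saved as file):
sed '1d;3d;$d;s/ *| */|/g;s/^|//' file | cut -d'|' -f3
On the example input above this should print Column3, third_text, and third_text2, one per line.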

Related

AWK - Parsing SQL output

I have SQL output, something like below, from a custom tool. I would appreciate any help in finding what I am doing incorrectly.
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
I am trying to extract the columns I need, in my case column1, column2, and column7. I have tried piping like this, but it just prints column1:
tool check | awk '{print $1, $2}'
column1 |
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
cec75 |
(1 row)
It would be nice to have something like this.
cec75,1234,SOME_TEXT
My file contents:
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
6601c | 2396 | 123 | | 9350 | | SOME_TEXT | | | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
cec75 | 1567 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
You need to set the correct FS and somehow filter out the undesired (junk) lines. I would do it the following way. Let the content of file.txt be:
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
then
awk 'BEGIN{FS="[[:space:]]+\\|[[:space:]]+";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt
output:
cec75,1234,2020-12-07 20:28:46.865+00
Explanation: I set the field separator (FS) to one-or-more whitespace, a literal |, then one-or-more whitespace, where [[:space:]] means any whitespace character. Depending on your data you might elect to use zero or more rather than one or more; to do so, replace + with *. For every line which is not the first one (this filters out the header) and which has at least 2 fields (this filters out the lines made of - and + and the (1 row) line), I print the content of the 1st column, followed by a comma, the content of the 2nd column, another comma, and the content of the 7th column.
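For example, the zero-or-more variant looks like this (a sketch; note that with this data it changes how the empty columns between adjacent pipes are counted, so $7 becomes SOME_TEXT rather than the date):
awk 'BEGIN{FS="[[:space:]]*\\|[[:space:]]*";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt
On the sample above this should print cec75,1234,SOME_TEXT.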
Description:
Command line switches...
The delimiter is | surrounded by spaces. (Note that we need a couple of backslashes to escape the | when the delimiter regex is fed in from the command line.)
In addition to the input delimiter (input field separator), the output delimiter (output field separator) can also be set with a command-line switch.
The awk script...
If a header is encountered, or a ( is seen on a line, it's not a valid line, so just ignore it.
If the line has any alphanumeric characters, it's a valid line to operate on, so we strip the leading spaces off the line and then print the columns we want.
tool check | awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }'
Examining the data more closely... It looks as though the date-stamp (which always has a : in it) might be present on all valid records... If so, the script can be reduced to something much simpler.
tool check | awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }'
EDIT: Since the OP added an edited set of samples, adding this solution now. It assumes that you want to print the lines that come right after lines starting with ---.
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}' Input_file
OR
your_command |
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'
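If you want the comma-separated output shown in the question, the same command just needs an OFS (a small variation, not in the original answer):
your_command |
awk -F'[[:space:]]*\\|[[:space:]]*' -v OFS=',' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'
which should print something like cec75,1567,SOME_TEXT for the second sample.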

Separate two words in the same column in awk CLI?

In the CLI I type:
$ timedatectl | grep Time | awk '{print $3}'
Which gives me the correct output:
Country/City
How can I just print the City without the Country?
Use this command instead:
timedatectl | grep Time | awk '{print $3}' | cut -d/ -f2
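If you'd rather stay entirely in awk, its split() function can replace the cut step (a sketch, assuming the same Country/City format in field 3):
timedatectl | grep Time | awk '{split($3, parts, "/"); print parts[2]}'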

How to get the empty values in data using awk command

I have a sample result set from the Linux lvs command. I am trying to re-arrange the fields using an awk command, but with that I am not able to skip the empty values in the data.
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
I tried the following command:
awk '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10}' lvs.txt
But the output is:
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
My expected result is:
LV | VG | Attr | LSize | Pool | Origin | Data% | Meta% | Move | Log
root | centos | -wi-ao---- | 45.62g | | root | | | | Online
swap | centos | -wi-ao---- | | | root | | | | Offline
Please help me through this.
Any other possible ways are also welcome. Thanks in advance.
Solution: use lvs with --separator and format the output with awk -F (field separator) and printf:
lvs --separator ',' | awk -F ',' '{printf "%-15s| %-10s| %10s| %10s| %10s| %10s| %10s| %10s| %10s|%10s| \n", $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}'
Output:
LV | VG | Attr| LSize| Pool| Origin| Data%| Meta%| Move| Log|
logicalTest1 | testgroup | -wi-a-----| 1.00g| | | | | | |
logicalTest2 | testgroup | -wi-a-----| 1.00g| | | | | | |
Explanation
I installed LVM and created a couple of LVs to test.
The lvs command produces output without field separators, just a bunch of spaces. After redirecting the output to a file, I proceeded to display the spaces:
root#florida:~# cat test2 | tr " " "*"
**Host*******Attr*******KMaj*LSize*OSize*Origin
**florida****-wi-a-----**254*1.00g*************
**florida****-wi-a-----**254*1.00g*************
Spaces aren't good delimiters, so after reading man lvs I found it has several options for printing the output. One of them is the --separator parameter, which allows you to use a personalized separator between columns; I used a comma, which is common for CSV.
Then, looking for ways to delimit fields in awk, I found its -F field-separator option and glued that together with the printf formatting explained here.
I just had to search and read a little; all the answers for this problem were on the internet.
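If you want exactly the pipe-separated layout from the question instead of fixed-width columns, the comma output can be re-joined with a pipe OFS (a sketch, assuming the values themselves never contain a comma):
lvs --separator ',' | awk -F',' -v OFS=' | ' '{$1=$1; print}'
The $1=$1 assignment forces awk to rebuild the record with the new output separator.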

Piping grep result to awk

I would like to delete columns from CSV files that contain some text as the column header.
I would like the output file to have the same name as the file found by grep.
I did the following
grep -l "pattern" * | xargs -0 awk -F'\t' '{print $1"\t"$2}' > output_file
How can I output the result to the same file found by grep?
Thank you.
Just do this:
grep -l "pattern" * | xargs awk -F'\t' '{print $1"\t"$2 > FILENAME}'
FILENAME is the awk built-in variable holding the name of the current input file.
Example:
$ cat file1
ABC zzz
EFG xxx
HIJ yyy
$ cat file2
123 aaa
456 bbb
789 ccc
grep -l "123" * | xargs awk '{print $2"\t"$1 > FILENAME}'
This switches columns 1 and 2 in the file containing "123" and overwrites file2.
$ cat file1
ABC zzz
EFG xxx
HIJ yyy
$ cat file2
aaa 123
bbb 456
ccc 789
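If the matched file names can contain spaces, GNU grep can emit them NUL-separated for xargs (a variant of the same command, assuming GNU grep and xargs):
grep -lZ "pattern" * | xargs -0 awk -F'\t' '{print $1"\t"$2 > FILENAME}'
Note that writing back to FILENAME while awk is still reading it is risky for large files, so keep a backup if the data matters.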

Using multicharacter field separator using AWK

I'm having problems with awk's field delimiter. The input file appears as below:
1 | all | | synonym |
1 | root | | scientific name |
2 | Bacteria | Bacteria | scientific name |
2 | Monera | Monera | in-part |
2 | Procaryotae | Procaryotae | in-part |
2 | Prokaryota | Prokaryota | in-part |
2 | Prokaryotae | Prokaryotae | in-part |
2 | bacteria | bacteria | blast name |
The field delimiter here is tab, pipe, tab: \t|\t.
So, in my attempt to print just the 1st and 2nd columns:
awk -F'\t|\t' '{print $1 "\t" $2}' nodes.dmp | less
Instead of the desired output, the output is the 1st column followed by the pipe character. I tried escaping the pipe (\t\|\t), but the output remains the same.
1 |
1 |
2 |
2 |
2 |
2 |
Printing the 1st and 3rd columns gave me the intended output:
awk -F'\t|\t' '{print $1 "\t" $3}' nodes.dmp | less
but I'm puzzled as to why this is not working as intended.
I understand that the perl one-liner below will work, but what I really want is to use awk.
perl -aln -F"\t\|\t" -e 'print $F[0],"\t",$F[1]' nodes.dmp | less
The pipe | character isn't taken literally: awk treats the -F value as a regular expression, in which \t|\t means the field separator is either \t or \t, i.e. just a tab. Tell awk to interpret the | literally:
$ awk -F'\t[|]\t' '{print $1 "\t" $2}'
1 all
1 root
2 Bacteria
2 Monera
2 Procaryotae
2 Prokaryota
2 Prokaryotae
2 bacteria
From your posted input:
your lines can end in |, not |\t, and
you have cases (the first 2 lines) where the input contains |\t|, and
your lines start with a tab
So, an FS of tab-pipe-tab is wrong: it won't handle any of the above cases. The first is just tab-pipe; in the second, the middle tab is consumed by the tab-pipe-tab match for the preceding field, leaving only pipe-tab before the following field; and the third leaves you with an undesirable leading tab.
What you actually need is to set the FS to just tab-pipe and then strip off the leading tab from each field:
awk -F'\t[|]' -v OFS='\t' '{for (i=1; i<=NF; i++) sub(/^\t/, "", $i); print $1, $2}' file
That way you can handle all fields from 1 to NF-1 exactly the same as each other.
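For instance, under the same assumptions (every line ends in tab-pipe, so the last field is empty), a sketch that prints all real fields 1 through NF-1 tab-separated:
awk -F'\t[|]' '{for (i=1; i<NF; i++) {sub(/^\t/, "", $i); printf "%s%s", $i, (i < NF-1 ? "\t" : "\n")}}' file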
Using the cut command:
cut -f1,2 -d'|' file.txt
Without the pipe in the output:
cut -f1,2 -d'|' file.txt | tr -d '|'