Using multiple delimiters when one of them is a pipe character - awk

I have a text file where fields are separated by a pipe character. Since it is human-readable text, spaces are used for column alignment.
Here is a sample input:
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
<more such lines>
+------------------------------------------+----------------+------------------+
How can I use awk to extract the third field in this case?
I tried:
awk -F '[ |]' '{print $3}' file
awk -F '[\|| ]' '{print $3}' file
awk -F '[\| ]' '{print $3}' file
The expected result is:
<blank>
Column3
<blank>
third_text
<more column 3 values>
<blank>
I am trying to achieve this with a single awk command. Isn't that possible?
The following post talks about using pipe as a delimiter in awk but it doesn't talk about the case of multiple delimiters where one of them is a pipe character:
Using pipe character as a field separator

Am I missing something?
Example input:
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
| some_text2| other_text2 | third_text2 | fourth_text2 | last_text2 |
+------------------------------------------+----------------+------------------+
Command:
gawk -F '[| ]*' '{print $4}' <file>
Output:
<blank>
Column3
<blank>
third_text
third_text2
<blank>
This works for every column: to get column i, print field i+1, because the leading | makes the first field empty (and the +----- ruler lines produce blanks).
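If you want the column number as a parameter instead of hard-coding $4, the same idea can be generalized (a sketch; col is an illustrative variable name, and [| ]+ is used instead of [| ]* so the separator regex can never match an empty string):
awk -v col=3 -F'[| ]+' '{print $(col + 1)}' file
As above, field col+1 is printed because the leading | produces an empty first field.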

perl is better suited for this use case:
$ perl -F'\s*\|\s*' -lane 'print $F[3]' File
# Perl's -F switch takes the delimiter with full regex support (like awk's -F, but more powerful).
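For comparison, plain awk accepts essentially the same regex once the pipe is placed in a bracket expression so it is taken literally (a sketch against the sample above):
awk -F' *[|] *' '{print $4}' file
Here $4 is column 3 for the same reason as before: the leading | creates an empty first field.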

First pre-parse with sed: remove the first, third, and last lines; replace each run of spaces + | + spaces with a single |; and strip the leading |. Then just split with awk on | (it could really be cut -d'|' -f3):
sed '1d;3d;$d;s/ *| */|/g;s/^|//' file |
awk -F'|' '{print $3}'
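For instance, the cut variant of the same pipeline (a sketch, with the sample saved as file):
sed '1d;3d;$d;s/ *| */|/g;s/^|//' file | cut -d'|' -f3
On the example input above this should print Column3, third_text, and third_text2, one per line.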

Related

AWK - Parsing SQL output

I have SQL output, something like below, from a custom tool. I would appreciate any help in finding what I am doing incorrectly.
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
I am trying to extract the columns I need, in my case column1, column2, and column7. I have tried piping like this, but it just prints column1:
tool check | awk '{print $1, $2}'
column1 |
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
cec75 |
(1 row)
It would be nice to have something like this.
cec75,1234,SOME_TEXT
My file contents:
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
6601c | 2396 | 123 | | 9350 | | SOME_TEXT | | | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
cec75 | 1567 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
You need to set the correct FS and somehow filter out the undesired (junk) lines. I would do it the following way. Let the content of file.txt be:
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
then
awk 'BEGIN{FS="[[:space:]]+\\|[[:space:]]+";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt
output:
cec75,1234,2020-12-07 20:28:46.865+00
Explanation: I set the field separator (FS) to one-or-more whitespace, a literal |, then one-or-more whitespace, where [[:space:]] means any whitespace character. Depending on your data you might elect to use zero or more rather than one or more; to do so, replace + with *. For every line which is not the first one (this filters out the header) and which has at least 2 fields (this filters out the lines made of - and + and the (1 row) line), I print the content of the 1st column, followed by a comma, the content of the 2nd column, another comma, and the content of the 7th column.
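For example, the zero-or-more variant looks like this (a sketch; note that with this data it changes how the empty columns between adjacent pipes are counted, so $7 becomes SOME_TEXT rather than the date):
awk 'BEGIN{FS="[[:space:]]*\\|[[:space:]]*";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt
On the sample above this should print cec75,1234,SOME_TEXT.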
Description:
Command line switches...
The delimiter is | surrounded by spaces. (Note that we need a couple of backslashes to escape the | when the delimiter regex is fed in from the command line.)
In addition to the input delimiter (input field separator), the output delimiter (output field separator) can also be set with a command-line switch.
The awk script...
If a header is encountered, or a ( is seen on a line, it's not a valid line, so just ignore it.
If the line has any alphanumeric characters, it's a valid line to operate on, so we strip the leading spaces off the line and then print the columns we want.
tool check | awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }'
Examining the data more closely... It looks as though the date-stamp (which always has a : in it) might be present on all valid records... If so, the script can be reduced to something much simpler.
tool check | awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }'
EDIT: Since the OP added an edited set of samples, adding this solution now. It assumes that you want to print the lines that come right after lines starting with ---.
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}' Input_file
OR
your_command |
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'
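If you want the comma-separated output shown in the question, the same command just needs an OFS (a small variation, not in the original answer):
your_command |
awk -F'[[:space:]]*\\|[[:space:]]*' -v OFS=',' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'
which should print something like cec75,1567,SOME_TEXT for the second sample.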

Separate two words in the same column in awk CLI?

In the CLI I type:
$ timedatectl | grep Time | awk '{print $3}'
Which gives me the correct output:
Country/City
How can I just print the City without the Country?
Use this command instead:
timedatectl | grep Time | awk '{print $3}' | cut -d/ -f2
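If you'd rather stay entirely in awk, its split() function can replace the cut step (a sketch, assuming the same Country/City format in field 3):
timedatectl | grep Time | awk '{split($3, parts, "/"); print parts[2]}'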

How to get the empty values in data using awk command

I have a sample result set from the Linux lvs command. I am trying to re-arrange the fields using an awk command, but with that I am not able to skip the empty values in the data.
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
I tried the following command:
awk '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10}' lvs.txt
But the output is:
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
My expected result is:
LV | VG | Attr | LSize | Pool | Origin | Data% | Meta% | Move | Log
root | centos | -wi-ao---- | 45.62g | | root | | | | Online
swap | centos | -wi-ao---- | | | root | | | | Offline
Please help me through this.
Any other possible ways are also welcome. Thanks in advance.
Solution: use lvs with --separator and format the output with awk -F (field separator) and printf:
lvs --separator ',' | awk -F ',' '{printf "%-15s| %-10s| %10s| %10s| %10s| %10s| %10s| %10s| %10s|%10s| \n", $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}'
Output:
LV | VG | Attr| LSize| Pool| Origin| Data%| Meta%| Move| Log|
logicalTest1 | testgroup | -wi-a-----| 1.00g| | | | | | |
logicalTest2 | testgroup | -wi-a-----| 1.00g| | | | | | |
Explanation
I installed LVM and created a couple of LVs to test.
The lvs command produces output without field separators, just a bunch of spaces. After redirecting the output to a file, I proceeded to display the spaces:
root#florida:~# cat test2 | tr " " "*"
**Host*******Attr*******KMaj*LSize*OSize*Origin
**florida****-wi-a-----**254*1.00g*************
**florida****-wi-a-----**254*1.00g*************
Spaces aren't good delimiters, so after reading man lvs I found it has several options for printing the output. One of them is the --separator parameter, which allows you to use a personalized separator between columns; I used a comma, which is common for CSV.
Then, looking for ways to delimit fields in awk, I found its -F field-separator option and glued that together with the printf formatting explained here.
I just had to search and read a little; all the answers for this problem were on the internet.
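If you want exactly the pipe-separated layout from the question instead of fixed-width columns, the comma output can be re-joined with a pipe OFS (a sketch, assuming the values themselves never contain a comma):
lvs --separator ',' | awk -F',' -v OFS=' | ' '{$1=$1; print}'
The $1=$1 assignment forces awk to rebuild the record with the new output separator.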

Piping grep result to awk

I would like to delete columns from CSV files that contain some text as the column header.
I would like the output file to have the same name as the file found by grep.
I did the following
grep -l "pattern" * | xargs -0 awk -F'\t' '{print $1"\t"$2}' > output_file
How can I output the result to the same file found by grep?
Thank you.
Just do this:
grep -l "pattern" * | xargs awk -F'\t' '{print $1"\t"$2 > FILENAME}'
FILENAME is the awk built-in variable holding the name of the current input file.
Example:
$ cat file1
ABC zzz
EFG xxx
HIJ yyy
$ cat file2
123 aaa
456 bbb
789 ccc
grep -l "123" * | xargs awk '{print $2"\t"$1 > FILENAME}'
This switches columns 1 and 2 in the file containing "123" and overwrites file2.
$ cat file1
ABC zzz
EFG xxx
HIJ yyy
$ cat file2
aaa 123
bbb 456
ccc 789
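If the matched file names can contain spaces, GNU grep can emit them NUL-separated for xargs (a variant of the same command, assuming GNU grep and xargs):
grep -lZ "pattern" * | xargs -0 awk -F'\t' '{print $1"\t"$2 > FILENAME}'
Note that writing back to FILENAME while awk is still reading it is risky for large files, so keep a backup if the data matters.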

Using multicharacter field separator using AWK

I'm having problems with awk's field delimiter. The input file appears as below:
1 | all | | synonym |
1 | root | | scientific name |
2 | Bacteria | Bacteria | scientific name |
2 | Monera | Monera | in-part |
2 | Procaryotae | Procaryotae | in-part |
2 | Prokaryota | Prokaryota | in-part |
2 | Prokaryotae | Prokaryotae | in-part |
2 | bacteria | bacteria | blast name |
The field delimiter here is tab, pipe, tab: \t|\t.
So, in my attempt to print just the 1st and 2nd columns:
awk -F'\t|\t' '{print $1 "\t" $2}' nodes.dmp | less
Instead of the desired output, the output is the 1st column followed by the pipe character. I tried escaping the pipe (\t\|\t), but the output remains the same.
1 |
1 |
2 |
2 |
2 |
2 |
Printing the 1st and 3rd columns gave me the intended output:
awk -F'\t|\t' '{print $1 "\t" $3}' nodes.dmp | less
but I'm puzzled as to why this is not working as intended.
I understand that the perl one-liner below will work, but what I really want is to use awk.
perl -aln -F"\t\|\t" -e 'print $F[0],"\t",$F[1]' nodes.dmp | less
The pipe | character isn't taken literally: awk treats the -F value as a regular expression, in which \t|\t means the field separator is either \t or \t, i.e. just a tab. Tell awk to interpret the | literally:
$ awk -F'\t[|]\t' '{print $1 "\t" $2}'
1 all
1 root
2 Bacteria
2 Monera
2 Procaryotae
2 Prokaryota
2 Prokaryotae
2 bacteria
From your posted input:
your lines can end in |, not |\t, and
you have cases (the first 2 lines) where the input contains |\t|, and
your lines start with a tab
So, an FS of tab-pipe-tab is wrong: it won't handle any of the above cases. The first is just tab-pipe; in the second, the middle tab is consumed by the tab-pipe-tab match for the preceding field, leaving only pipe-tab before the following field; and the third leaves you with an undesirable leading tab.
What you actually need is to set the FS to just tab-pipe and then strip off the leading tab from each field:
awk -F'\t[|]' -v OFS='\t' '{for (i=1; i<=NF; i++) sub(/^\t/, "", $i); print $1, $2}' file
That way you can handle all fields from 1 to NF-1 exactly the same as each other.
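For instance, under the same assumptions (every line ends in tab-pipe, so the last field is empty), a sketch that prints all real fields 1 through NF-1 tab-separated:
awk -F'\t[|]' '{for (i=1; i<NF; i++) {sub(/^\t/, "", $i); printf "%s%s", $i, (i < NF-1 ? "\t" : "\n")}}' file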
Using the cut command:
cut -f1,2 -d'|' file.txt
Without the pipe in the output:
cut -f1,2 -d'|' file.txt | tr -d '|'