Using a multicharacter field separator in AWK

I'm having problems with AWK's field delimiter.
The input file looks like this:
1 | all | | synonym |
1 | root | | scientific name |
2 | Bacteria | Bacteria | scientific name |
2 | Monera | Monera | in-part |
2 | Procaryotae | Procaryotae | in-part |
2 | Prokaryota | Prokaryota | in-part |
2 | Prokaryotae | Prokaryotae | in-part |
2 | bacteria | bacteria | blast name |
The field delimiter here is tab, pipe, tab (\t|\t),
so in my attempt to print just the 1st and 2nd columns I ran:
awk -F'\t|\t' '{print $1 "\t" $2}' nodes.dmp | less
Instead of the desired output, I get the 1st column followed by the pipe character. I tried escaping the pipe (\t\|\t), but the output remains the same.
1 |
1 |
2 |
2 |
2 |
2 |
Printing the 1st and 3rd column gave me the original intended output.
awk -F'\t|\t' '{print $1 "\t" $3}' nodes.dmp | less
But I'm puzzled as to why this is not working as intended.
I understand that the Perl one-liner below works, but what I really want is to use awk.
perl -aln -F"\t\|\t" -e 'print $F[0],"\t",$F[1]' nodes.dmp | less

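In an awk regular expression the pipe | is the alternation operator, so \t|\t means "a tab or a tab", i.e. just a single tab. That is why $2 is the pipe itself and $3 is the name. Escaping it with a single backslash (\t\|\t) doesn't help in awks such as gawk, because that backslash is consumed when escape sequences in the -F string are processed (\t\\|\t would work). The simplest fix is to put the pipe inside a bracket expression so that awk interprets it literally:
$ awk -F'\t[|]\t' '{print $1 "\t" $2}' nodes.dmp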
1 all
1 root
2 Bacteria
2 Monera
2 Procaryotae
2 Prokaryota
2 Prokaryotae
2 bacteria
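A self-contained way to see the difference is to feed awk a couple of records that really do contain tab-pipe-tab separators. The sample.dmp file below is made up for illustration (it imitates the layout in the question, including the trailing tab-pipe); the file name is hypothetical.

```shell
# Hypothetical sample file: tab-pipe-tab between fields, tab-pipe at line end.
printf '1\t|\tall\t|\t\t|\tsynonym\t|\n' > sample.dmp
printf '2\t|\tBacteria\t|\tBacteria\t|\tscientific name\t|\n' >> sample.dmp

# Unescaped pipe: the regex \t|\t means "tab OR tab", so $2 is the pipe itself.
awk -F'\t|\t' '{print $2}' sample.dmp

# Pipe inside a bracket expression: matched literally, so $2 is the name.
awk -F'\t[|]\t' '{print $1 "\t" $2}' sample.dmp
```

The first command prints a lone | for each record; the second prints 1, a tab, all and then 2, a tab, Bacteria.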

From your posted input:
your lines can end in |, not |\t, and
you have cases (the first 2 lines) where the input contains |\t|, and
your lines start with a tab
So an FS of tab-pipe-tab is wrong, since it won't match correctly in any of the above cases: in the first there is only tab-pipe at the end of the line; in the second, the tab in the middle of |\t| is consumed by the tab-pipe-tab of the preceding separator, which leaves only pipe-tab before the following field; and in the third you are left with an unwanted leading tab.
What you actually need is to set FS to just tab-pipe (with the pipe in a bracket expression so it is literal) and then strip the leading tab from each field:
awk -F'\t[|]' -v OFS='\t' '{for (i = 1; i < NF; i++) sub(/^\t/, "", $i); print $1, $2}' file
That way you can handle all fields from 1 to NF-1 exactly the same as each other.
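To check the tab-pipe approach against the awkward cases listed above, the made-up record below starts with a tab, contains an empty field (|\t\t|) and ends in tab-pipe; the file name edge.dmp is hypothetical. The loop strips one leading tab from every field except the last, which is always empty because of the trailing tab-pipe.

```shell
# Hypothetical record: leading tab, an empty 3rd field, and a trailing tab-pipe.
printf '\t1\t|\tall\t|\t\t|\tsynonym\t|\n' > edge.dmp

# FS is a tab followed by a literal pipe; fields 1..NF-1 then lose their
# leading tab, and assigning to $i rebuilds the record with OFS (a tab).
awk -F'\t[|]' -v OFS='\t' '{
  for (i = 1; i < NF; i++) sub(/^\t/, "", $i)
  print $1, $2, $3, $4
}' edge.dmp
```

This prints 1, all, an empty field and synonym, separated by tabs.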

Using the cut command:
cut -f1,2 -d'|' file.txt
Without the pipe in the output:
cut -f1,2 -d'|' file.txt | tr -d '|'
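Note that cut splits on the pipe alone, so the tabs around each pipe stay in the output. On a made-up record with real tabs (the file name cut.dmp is hypothetical), the two commands give:

```shell
# Hypothetical record with real tabs around each pipe.
printf '1\t|\tall\t|\t\t|\tsynonym\t|\n' > cut.dmp

# cut splits on "|" only, so the tabs next to each pipe are kept.
cut -f1,2 -d'|' cut.dmp              # 1<TAB>|<TAB>all<TAB>
cut -f1,2 -d'|' cut.dmp | tr -d '|'  # 1<TAB><TAB>all<TAB>
```

If those extra tabs matter downstream, one of the awk approaches is the cleaner choice.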

Related

check if column matches any line in file with awk

Say I have some output from the command openstack security group list:
+--------------------------------------+---------+------------------------+----------------------------------+------+
| ID | Name | Description | Project | Tags |
+--------------------------------------+---------+------------------------+----------------------------------+------+
| 1dda8a57-fff4-4832-9bac-4e806992f19a | default | Default security group | 0ce266c801ae4611bb5744a642a01eda | [] |
| 2379d595-0fdc-479f-a211-68c83caa9d42 | default | Default security group | 602ad29db6304ec39dc253bcbba408a7 | [] |
| 431df666-a9ba-4643-a3a0-9a70c89e1c05 | tempest | tempest test | b320a32508a74829a0563078da3cba2e | [] |
| 5b54e63c-f2e5-4eda-b2b9-a7061d19695f | default | Default security group | 57e745b9612941709f664c58d93e4188 | [] |
| 6381ebaf-79fb-4a31-bc32-49e2fecb7651 | default | Default security group | f5c30c42f3d74b8989c0c806603611da | [] |
| 6cce5c94-c607-4224-9401-c2f920c986ef | default | Default security group | e3190b309f314ebb84dffe249009d9e9 | [] |
| 7402fdd3-0f1e-4eb1-a9cd-6896f1457567 | default | Default security group | d390b68f95c34cefb0fc942d4e0742f9 | [] |
| 76978603-545b-401d-9959-9574e907ec57 | default | Default security group | 3a7b5361e79f4914b09b022bcae7b44a | [] |
| 7705da1e-d01e-483d-ab82-c99fdb9eba9c | default | Default security group | 1da03b5e7ce24be38102bd9c8f99e914 | [] |
| 7fd52305-850c-4d9a-a5e9-0abfb267f773 | default | Default security group | 5b20d6b7dfab4bfbac0a1dd3eb6bf460 | [] |
| 82a38caa-8e7f-468f-a4bc-e60a8d4589a6 | default | Default security group | d544d2243caa4e1fa027cfdc38a4f43e | [] |
| a4a5eaba-5fc9-463a-8e09-6e28e5b42f80 | default | Default security group | 08efe6ec9b404119a76996907abc606b | [] |
| e7c531e3-cdc3-4b7c-bf32-934a2f2de3f1 | default | Default security group | 539c238bf0e84463b8639d0cb0278699 | [] |
| f96bf2e8-35fe-4612-8988-f489fd4c04e3 | default | Default security group | 2de96a1342ee42a7bcece37163b8dfa0 | [] |
+--------------------------------------+---------+------------------------+----------------------------------+------+
And I have a list of Project IDs:
0ce266c801ae4611bb5744a642a01eda
b320a32508a74829a0563078da3cba2e
57e745b9612941709f664c58d93e4188
f5c30c42f3d74b8989c0c806603611da
e3190b309f314ebb84dffe249009d9e9
d390b68f95c34cefb0fc942d4e0742f9
3a7b5361e79f4914b09b022bcae7b44a
5b20d6b7dfab4bfbac0a1dd3eb6bf460
d544d2243caa4e1fa027cfdc38a4f43e
08efe6ec9b404119a76996907abc606b
539c238bf0e84463b8639d0cb0278699
2de96a1342ee42a7bcece37163b8dfa0
which is the intersection of two files that I get from running fgrep -x -f projects secgrup.
How can I extract the values of the ID column from the rows whose Project column matches one of the IDs in this list?
It would be something like:
openstack security group list | awk '$2 && $2!="ID" && $10 in $(fgrep -x -f projects secgrup) {print $2}'
which should yield:
1dda8a57-fff4-4832-9bac-4e806992f19a
431df666-a9ba-4643-a3a0-9a70c89e1c05
5b54e63c-f2e5-4eda-b2b9-a7061d19695f
6381ebaf-79fb-4a31-bc32-49e2fecb7651
6cce5c94-c607-4224-9401-c2f920c986ef
7402fdd3-0f1e-4eb1-a9cd-6896f1457567
76978603-545b-401d-9959-9574e907ec57
7fd52305-850c-4d9a-a5e9-0abfb267f773
82a38caa-8e7f-468f-a4bc-e60a8d4589a6
a4a5eaba-5fc9-463a-8e09-6e28e5b42f80
e7c531e3-cdc3-4b7c-bf32-934a2f2de3f1
f96bf2e8-35fe-4612-8988-f489fd4c04e3
But obviously this doesn't work.
You can use this awk command:
awk -F ' *\\| *' 'FNR == NR {arr[$1]; next}
$5 in arr {print $2}' projects secgrup
1dda8a57-fff4-4832-9bac-4e806992f19a
431df666-a9ba-4643-a3a0-9a70c89e1c05
5b54e63c-f2e5-4eda-b2b9-a7061d19695f
6381ebaf-79fb-4a31-bc32-49e2fecb7651
6cce5c94-c607-4224-9401-c2f920c986ef
7402fdd3-0f1e-4eb1-a9cd-6896f1457567
76978603-545b-401d-9959-9574e907ec57
7fd52305-850c-4d9a-a5e9-0abfb267f773
82a38caa-8e7f-468f-a4bc-e60a8d4589a6
a4a5eaba-5fc9-463a-8e09-6e28e5b42f80
e7c531e3-cdc3-4b7c-bf32-934a2f2de3f1
f96bf2e8-35fe-4612-8988-f489fd4c04e3
Here:
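-F ' *\\| *' sets the input field separator to a literal | surrounded by 0 or more spaces on both sides (the doubled backslash survives awk's escape processing of the -F string as \|, which the regex treats as a literal pipe).
FNR == NR is true only while the first file, projects, is being read, so each project ID is stored as a key of arr. In secgrup, $1 is the empty text before the leading |, so $2 is the ID column and $5 is the Project column; $2 is printed whenever $5 is a key of arr.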
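If the intermediate file isn't wanted, the table can be piped straight into the same awk program, because awk reads standard input when a file operand is -. In this sketch, secgroup_list is a hypothetical shell function that replays a few rows of the table in place of the real openstack security group list call, and projects is a two-line sample.

```shell
# Hypothetical stand-in for `openstack security group list` (rows from above).
secgroup_list() {
cat <<'EOF'
+--------------------------------------+---------+------------------------+----------------------------------+------+
| ID | Name | Description | Project | Tags |
+--------------------------------------+---------+------------------------+----------------------------------+------+
| 1dda8a57-fff4-4832-9bac-4e806992f19a | default | Default security group | 0ce266c801ae4611bb5744a642a01eda | [] |
| 2379d595-0fdc-479f-a211-68c83caa9d42 | default | Default security group | 602ad29db6304ec39dc253bcbba408a7 | [] |
| 431df666-a9ba-4643-a3a0-9a70c89e1c05 | tempest | tempest test | b320a32508a74829a0563078da3cba2e | [] |
+--------------------------------------+---------+------------------------+----------------------------------+------+
EOF
}

# One project ID per line.
printf '%s\n' 0ce266c801ae4611bb5744a642a01eda b320a32508a74829a0563078da3cba2e > projects

# "-" makes awk read the piped table as its second input.
secgroup_list | awk -F ' *\\| *' 'FNR == NR {arr[$1]; next} $5 in arr {print $2}' projects -
```

With the real command you would write openstack security group list | awk ... projects - instead. One caveat of the FNR == NR idiom: if projects is empty, FNR == NR is also true while reading the table, so nothing is printed.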
With your shown samples only, please try the following awk code. Written and tested in GNU awk (the three-argument match() is a gawk extension).
awk '
FNR==NR{
arr1[$0]
next
}
match($0,/^\| *([^ |]+) *\|[^|]*\|[^|]*\| *([^ |]+) *\|/,arr2) && (arr2[2] in arr1){
print arr2[1]
}
' ids Input_file
Explanation:
The FNR==NR condition is TRUE only while the first Input_file, named ids (where your IDs are stored), is being read.
While it is, an array named arr1 is created, indexed by the current line.
The next keyword skips all further statements for that line.
Then the match function is used with a regex that captures the ID column (1st capturing group) and the Project column (2nd capturing group) into the array named arr2.
Finally, if the project value arr2[2] is present in arr1, the ID arr2[1] is printed; otherwise nothing is done.

Using multiple delimiters when one of them is a pipe character

I have a text file where fields are separated by a pipe character. Since it is human-readable text, spaces are used for column alignment.
Here is a sample input:
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
<more such lines>
+------------------------------------------+----------------+------------------+
How can I use awk to extract the third field in this case?
I tried:
awk -F '[ |]' '{print $3}' file
awk -F '[\|| ]' '{print $3}' file
awk -F '[\| ]' '{print $3}' file
The expected result is:
<blank>
Column3
<more column 3 values>
<blank>
third_text
I am trying to achieve this with a single awk command. Isn't that possible?
The following post talks about using pipe as a delimiter in awk but it doesn't talk about the case of multiple delimiters where one of them is a pipe character:
Using pipe character as a field separator
Am I missing something?
Example input:
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
| some_text2| other_text2 | third_text2 | fourth_text2 | last_text2 |
+------------------------------------------+----------------+------------------+
Command:
gawk -F '[| ]+' '{print $4}' <file>
Output:
<blank>
Column3
<blank>
third_text
third_text2
<blank>
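This works for every column as long as the values contain no spaces (Last Column, for example, would be split in two). Just use $(i+1) for column i, because the first field is empty on lines that start with |, and the +----- border lines print as blank lines.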
perl is better suited for this use case:
$ perl -F'\s*\|\s*' -lane 'print $F[3]' File
The -F switch accepts a full Perl regex as the delimiter (like awk's -F, but more powerful).
First preparse with sed: remove the first, third and last lines, replace every run of spaces + | + spaces with a single |, and remove the leading |. Then just split with awk using | (this last step could just as well be cut -d'|' -f3):
sed '1d;3d;$d;s/ *| */|/g;s/^|//' file |
awk -F'|' '{print $3}'
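If a single awk command is preferred and values may contain spaces (like Last Column), one option is to split on a literal pipe with optional spaces around it and act only on lines that start with a pipe, so the +----- borders are skipped rather than printed as blanks. The table.txt file below is a made-up copy of the sample input.

```shell
# Made-up copy of the sample table (file name is hypothetical).
cat > table.txt <<'EOF'
+------------------------------------------+----------------+------------------+
| Column1 | Column2 | Column3 | Column4 | Last Column |
+------------------------------------------+----------------+------------------+
| some_text | other_text | third_text | fourth_text | last_text |
| some_text2| other_text2 | third_text2 | fourth_text2 | last_text2 |
+------------------------------------------+----------------+------------------+
EOF

# FS = a literal pipe plus any spaces around it. $1 is the empty text before
# the leading "|", so the third column is $4 and "Last Column" stays whole as $6.
awk -F' *[|] *' '/^[|]/ {print $4}' table.txt
awk -F' *[|] *' '/^[|]/ {print $6}' table.txt
```

The first command prints Column3, third_text and third_text2; the second prints Last Column, last_text and last_text2. If the blank lines for the borders are wanted, as in the expected result, drop the /^[|]/ pattern.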

Separate two words in the same column in awk CLI?

In the CLI I type:
$ timedatectl | grep Time | awk '{print $3}'
Which gives me the correct output:
Country/City
How can I print just the city, without the country?
Use this command instead:
timedatectl | grep Time | awk '{print $3}' | cut -d/ -f2
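The same can be done in a single awk call that also copes with zones containing more than one slash (such as America/Argentina/Buenos_Aires, where cut -f2 would return Argentina): split the third field on / and print the last piece. The fake_timedatectl function below is a hypothetical, hand-written imitation of timedatectl output, not captured from a real system.

```shell
# Hand-written stand-in for timedatectl output (hypothetical values).
fake_timedatectl() {
  printf '               Local time: Mon 2024-01-01 12:00:00 CET\n'
  printf '                Time zone: America/Argentina/Buenos_Aires (-03, -0300)\n'
}

# On the "Time zone" line, split $3 on "/" and print its last component.
fake_timedatectl | awk '/Time zone/ {n = split($3, parts, "/"); print parts[n]}'
```

On a real system this becomes timedatectl | awk '/Time zone/ {n = split($3, parts, "/"); print parts[n]}', with no grep or cut needed.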

How to edit the '|' result from sqlite3 using the sed command

Hey guys, so I have this database:
id | item_name | number_of_store| store_location|
+----+---------+-------------------+-------------+
| 3 | margarine | 2 | QLD |
| 4 | margarine | 2 | NSW |
| 5 | wine | 3 | QLD |
| 6 | wine | 3 | NSW |
| 7 | wine | 3 | NSW |
| 8 | laptop | 1 | QLD |
+----+---------+-------------------+-------------+
I got the result I wanted using DISTINCT in sqlite3, which is the following:
id | item_name | number_of_store| store_location|
+----+---------+-------------------+-------------+
| 3 | margarine | 2 | QLD |
| 4 | margarine | 2 | NSW |
The command is:
sqlite3 store.sqlite "select item_name,number_of_store,store_location from store where item_name = 'margarine'" > store.txt
But when I saved it to a txt file I got:
3|margarine|2|QLD
4|margarine|2|NSW
However, my desired output in the txt file is:
3,margarine,2,QLD
4,margarine,2,NSW
I think I should use sed, but I'm not quite sure how to do it. I tried:
'|sed 's/|//g' |sed 's/|//g'|sed 's/^//g'|sed 's/$//g'
However, this only erases the '|'; I'm not sure how to change it to ','.
Though you should really do this in sqlite3 itself (for example with its -separator ',' option), as per your request you could use the following awk:
awk '{gsub(/[|]/, ",")} 1' Input_file
Or in sed:
sed 's#|#,#g' Input_file
If you want to save the output into Input_file itself, use sed's -i.bak option: it keeps a backup copy of Input_file (as Input_file.bak) and writes the output into Input_file itself.
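Both commands can be checked on the two lines from the question, along with an awk variant that avoids a regex altogether by splitting on | and rejoining the fields with a comma as OFS. Here store.txt is recreated from the sample lines shown above.

```shell
# Recreate the two sample lines from the question.
printf '3|margarine|2|QLD\n4|margarine|2|NSW\n' > store.txt

awk '{gsub(/[|]/, ",")} 1' store.txt
sed 's#|#,#g' store.txt

# Variant: split on "|" and rebuild the record with "," between fields;
# assigning $1 = $1 forces awk to rebuild $0 using OFS.
awk -F'|' -v OFS=',' '{$1 = $1} 1' store.txt
```

All three print 3,margarine,2,QLD and 4,margarine,2,NSW.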

How to get the empty values in data using awk command

I have a sample result set from the Linux lvs command, and I am trying to rearrange its fields using awk. However, I am not able to handle the empty values in the data.
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
I tried the following command,
awk '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10}' lvs.txt
But Output is
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
My expected result is:
LV | VG | Attr | LSize | Pool | Origin | Data% | Meta% | Move | Log
root | centos | -wi-ao---- | 45.62g | | root | | | | Online
swap | centos | -wi-ao---- | | | root | | | | Offline
Please help me with this.
Any other approach is also welcome. Thanks in advance.
Solution: use lvs's --separator option, then split the fields with awk -F (field separator) and format the output with printf:
lvs --separator ',' | awk -F ',' '{printf "%-15s| %-10s| %10s| %10s| %10s| %10s| %10s| %10s| %10s|%10s| \n", $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}'
Output:
LV | VG | Attr| LSize| Pool| Origin| Data%| Meta%| Move| Log|
logicalTest1 | testgroup | -wi-a-----| 1.00g| | | | | | |
logicalTest2 | testgroup | -wi-a-----| 1.00g| | | | | | |
Explanation
I installed LVM and created a couple of LVs to test.
The lvs command produces output without field separators, just a bunch of spaces. After redirecting the output to a file, I displayed the spaces as asterisks:
root#florida:~# cat test2 | tr " " "*"
**Host*******Attr*******KMaj*LSize*OSize*Origin
**florida****-wi-a-----**254*1.00g*************
**florida****-wi-a-----**254*1.00g*************
Spaces aren't good delimiters here, because an empty column is just more spaces. Reading man lvs, I found that it has several options for formatting the output; one of them is --separator, which lets you choose the string placed between columns. I used a "," (comma), which is common for CSV.
Then, looking at awk's options, I found -F to set the field separator, and combined it with the printf formatting explained here.
I just had to search and read a little; all the answers to this problem were on the internet.
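To get exactly the pipe-joined layout from the question instead of fixed-width columns, awk can rebuild each record with " | " as OFS, which keeps empty fields as empty slots. In this sketch lvs_sample is a hypothetical stand-in for lvs --separator ',' built from the question's rows; the two leading spaces imitate the indentation lvs normally puts on its report lines.

```shell
# Hypothetical stand-in for `lvs --separator ','` using the question's rows.
lvs_sample() {
  printf '  LV,VG,Attr,LSize,Pool,Origin,Data%%,Meta%%,Move,Log\n'
  printf '  root,centos,-wi-ao----,45.62g,,root,,,,Online\n'
  printf '  swap,centos,-wi-ao----,,,root,,,,Offline\n'
}

# Drop leading spaces (this re-splits $0), then assign $1 to itself so awk
# rebuilds the record with OFS; empty fields survive between separators.
lvs_sample | awk -F',' -v OFS=' | ' '{sub(/^ +/, ""); $1 = $1; print}'
```

With LVM installed, replace lvs_sample with lvs --separator ','.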