Check if column matches any line in file with awk

Say I have some output from the command openstack security group list:
| ID | Name | Description | Project | Tags |
| 1dda8a57-fff4-4832-9bac-4e806992f19a | default | Default security group | 0ce266c801ae4611bb5744a642a01eda | [] |
| 2379d595-0fdc-479f-a211-68c83caa9d42 | default | Default security group | 602ad29db6304ec39dc253bcbba408a7 | [] |
| 431df666-a9ba-4643-a3a0-9a70c89e1c05 | tempest | tempest test | b320a32508a74829a0563078da3cba2e | [] |
| 5b54e63c-f2e5-4eda-b2b9-a7061d19695f | default | Default security group | 57e745b9612941709f664c58d93e4188 | [] |
| 6381ebaf-79fb-4a31-bc32-49e2fecb7651 | default | Default security group | f5c30c42f3d74b8989c0c806603611da | [] |
| 6cce5c94-c607-4224-9401-c2f920c986ef | default | Default security group | e3190b309f314ebb84dffe249009d9e9 | [] |
| 7402fdd3-0f1e-4eb1-a9cd-6896f1457567 | default | Default security group | d390b68f95c34cefb0fc942d4e0742f9 | [] |
| 76978603-545b-401d-9959-9574e907ec57 | default | Default security group | 3a7b5361e79f4914b09b022bcae7b44a | [] |
| 7705da1e-d01e-483d-ab82-c99fdb9eba9c | default | Default security group | 1da03b5e7ce24be38102bd9c8f99e914 | [] |
| 7fd52305-850c-4d9a-a5e9-0abfb267f773 | default | Default security group | 5b20d6b7dfab4bfbac0a1dd3eb6bf460 | [] |
| 82a38caa-8e7f-468f-a4bc-e60a8d4589a6 | default | Default security group | d544d2243caa4e1fa027cfdc38a4f43e | [] |
| a4a5eaba-5fc9-463a-8e09-6e28e5b42f80 | default | Default security group | 08efe6ec9b404119a76996907abc606b | [] |
| e7c531e3-cdc3-4b7c-bf32-934a2f2de3f1 | default | Default security group | 539c238bf0e84463b8639d0cb0278699 | [] |
| f96bf2e8-35fe-4612-8988-f489fd4c04e3 | default | Default security group | 2de96a1342ee42a7bcece37163b8dfa0 | [] |
And I have a list of Project IDs:
which is the intersection of two files that I get from running fgrep -x -f projects secgrup
How can I extract the values of the ID column from the rows whose Project column matches this list?
It would be something like:
openstack security group list | awk '$2 && $2!="ID" && $10 in $(fgrep -x -f projects secgrup) {print $2}'
which should yield:
but obviously this doesn't work.

You can use this awk for this:
awk -F ' *\\| *' 'FNR == NR {arr[$1]; next}
$5 in arr {print $2}' <(fgrep -x -f projects secgrup) <(openstack security group list)
-F ' *\\| *' sets input field separator to | surrounded with 0 or more spaces on both sides.
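To see how that separator numbers the fields, here is a minimal sketch on two rows taken from the question. The file names ids.txt and table.txt are made up for illustration, and the single project ID stands in for the real list.

```shell
# Hypothetical sample files: one project ID to look up, header plus two rows.
tmp=$(mktemp -d)
printf '%s\n' 'b320a32508a74829a0563078da3cba2e' > "$tmp/ids.txt"
cat > "$tmp/table.txt" <<'EOF'
| ID | Name | Description | Project | Tags |
| 431df666-a9ba-4643-a3a0-9a70c89e1c05 | tempest | tempest test | b320a32508a74829a0563078da3cba2e | [] |
| 5b54e63c-f2e5-4eda-b2b9-a7061d19695f | default | Default security group | 57e745b9612941709f664c58d93e4188 | [] |
EOF

# Each table row starts with "| ", so $1 is empty, $2 is ID and $5 is Project.
result=$(awk -F ' *\\| *' 'FNR == NR {arr[$1]; next}
$5 in arr {print $2}' "$tmp/ids.txt" "$tmp/table.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the tempest row has a Project value present in the list, so only its ID is printed.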

With your shown samples only, please try following awk code. Written and tested in GNU awk.
awk '
FNR==NR{ arr1[$0]; next }
match($0,/.*default \| Default security group \| (\S+)/,arr2) && (arr2[1] in arr1){
  print arr2[1]
}
' ids Input_file
The FNR==NR condition is TRUE only while the first file, ids (where your IDs are stored), is being read.
For those lines, an array named arr1 gets an entry indexed by the current line.
The next keyword then skips all further statements for that line.
For the second file, the match function is used with the regex .*default \| Default security group \| (\S+), which has 1 capturing group whose value is stored in the array named arr2.
Finally, if arr2[1] is present as an index in arr1, it is printed; otherwise nothing happens.


awk code to filter sequences below 80 bp and 80% coverage

I want to filter my table down to sequences that have at least 80 base pairs (end-begin+1 >= 80) and that span at least 80% of their total length (the base pairs left should be <= 20% of the total length, where (end-begin+1) + left = total length).
| query sequence | begin | end | (left)|
| -------------- | ------| --- | ----- |
| D1 | 1 | 330 | (1939)|
| D2 | 2180 | 2269| (0) |
| D3 | 4 | 168 | (0) |
| D4 | 1 | 1610| (0) |
| D5 | 1 | 402 | (84) |
| D6 | 1 | 58 | (0) |
| D7 | 1 | 79 | (0) |
| D8 | 4 | 167 | (437) |
| D9 |310 | 478 | (214) |
| D10 |1 | 227 | (234) |
| D11 |2 | 604 | (141) |
that is my awk code:
awk '{print $0, $7-$6+1, $7+$8, ($7-$6+1)/($7+$8)}' | awk '$18 >= 0.8 {print $0}'
However, some sequences get through that violate either the 80 base pair minimum or the 80% of total length rule. Where am I going wrong?
the expected output:
| query sequence | begin | end | (left)|
| -------------- | ------| --- | ----- |
| D2 | 2180 | 2269| (0) |
| D3 | 4 | 168 | (0) |
| D4 | 1 | 1610| (0) |
| D5 | 1 | 402 | (84) |
Column $8 (left) has parentheses around the numbers, therefore awk fails to interpret $8 as a number and uses 0 instead. Example: awk '{print $1+2}' <<< '(3)' prints 2 instead of 5.
You can extract the number inside the parentheses into a variable using left=$8; gsub(/[()]/,"",left).
By the way: No need for 2 awk scripts. You can do everything in one script:
awk '{left=$8; gsub(/[()]/,"",left); bp=$7-$6+1; tl=bp+left} bp>=80 && bp>=0.8*tl'
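As a worked example of the same idea, here is a sketch on four rows from the question. It assumes a simplified layout with name, begin, end and (left) in $1 to $4, rather than the $6 to $8 positions of the real file.

```shell
# Simplified layout (an assumption): name, begin, end, (left) in $1..$4.
rows='D1 1 330 (1939)
D5 1 402 (84)
D6 1 58 (0)
D7 1 79 (0)'

result=$(printf '%s\n' "$rows" | awk '{
    left = $4; gsub(/[()]/, "", left)   # "(84)" -> "84"
    bp = $3 - $2 + 1                    # aligned length in base pairs
    tl = bp + left                      # total length
} bp >= 80 && bp >= 0.8 * tl')
printf '%s\n' "$result"
```

D1 covers only 330 of 2269 base pairs, and D6 and D7 are shorter than 80 base pairs, so only the D5 row (402 of 486, about 83%) is printed.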
You might set a custom field separator to get just the numbers in $8 (and other columns) rather than digits inside ( and ), i.e. replace
awk '{print $0, $7-$6+1, $7+$8, ($7-$6+1)/($7+$8)}'
with
awk 'BEGIN{FS="[)[:space:](]+"}{print $0, $7-$6+1, $7+$8, ($7-$6+1)/($7+$8)}'
Explanation: treat any combination of ) whitespace ( as field separator (FS). Not tested due to lack of sample input as text. If you want to know more about FS read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
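A quick way to check what this separator does is to feed it a single row. The sketch below assumes a four-column row with no leading whitespace, and an awk that supports POSIX character classes such as [:space:].

```shell
# Runs of ")", "(" and whitespace separate the fields, so "(84)" becomes 84.
result=$(printf '%s\n' 'D5 1 402 (84)' |
    awk 'BEGIN{FS="[)[:space:](]+"} {print $4 + 2}')
printf '%s\n' "$result"
```

Here $4 is the bare number, so the addition gives 86. Note that with a regex separator like this one, a line that begins with whitespace gets an empty $1, which shifts every field number by one.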

AWK Sum and group by : output with headers

I have a huge csv with this structure (sample):
| DATE | WEEKDAY | Shop Code |Shop Manager|Item Presentation Time|Item Sell|
|02-Mar |MONDAY | BOG | Tom |1030 |0 |
|02-Mar |TUESDAY | TEF | Lucas |1300 |1 |
|02-Mar |WEDNESDAY | TDC | Eriberto |1300 |1 |
|02-Mar |THURSDAY | TEF | Lucas |1300 |1 |
|02-Mar |FRIDAY | TEF | Lucas |1300 |1 |
|02-Mar |SATURDAY | GTY | Maya |1600 |1 |
|02-Mar |SUNDAY | TDC | Eriberto |1300 |1 |
I am interested in the sum of successful events ($6) per weekday, the count of presentations per weekday ($2), and the percentage of successful events (sum $6 / count $2 * 100).
I wrote the following script:
#!/bin/awk -f
BEGIN {FS = OFS = ","}
{if (NR!=1) a[$2]+=$6;count[$2]++$2}END{for (i in a){ print i","a[i] "," count[i]","a[i]/count[i]*100}}
The script runs:
$ awk -f script.awk raw_file.csv > new_file.csv
It works out perfectly and the output is:
|MONDAY | 2 | 10 |0.20|
|TUESDAY | 18 | 30 |0.60|
|WEDNESDAY | 10 | 20 |0.50|
|THURSDAY | 1 | 20 |0.05|
|FRIDAY | 1 | 15 |0.07|
|SATURDAY | 60 | 100 |0.60|
|SUNDAY | 47 | 80 |0.59|
However, I would like to add a header to the output (WEEKDAY, SUCCESSFUL_EVENTS, TOTAL_EVENTS and SUCCESSFUL_RATE). I have no idea how to combine the header with the NR check in the same script.
I can show the output with:
awk 'NR==1 {print $0}' new_file.csv
but no way to integrate this in the script
Any suggestion is really appreciated
You can do this in the BEGIN section of your script:
#!/bin/awk -f
BEGIN {
    FS = OFS = ","
    print "WEEKDAY,SUCCESSFUL_EVENTS,TOTAL_EVENTS,SUCCESSFUL_RATE"
}
# ...
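Putting the header into a BEGIN block alongside the aggregation could look like the sketch below. The three data rows are a made-up sample in the CSV layout from the question, and the array names sold and count are illustrative.

```shell
# Hypothetical small sample in the CSV layout from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
DATE,WEEKDAY,Shop Code,Shop Manager,Item Presentation Time,Item Sell
02-Mar,MONDAY,BOG,Tom,1030,0
02-Mar,MONDAY,TEF,Lucas,1300,1
02-Mar,TUESDAY,TEF,Lucas,1300,1
EOF

result=$(awk '
BEGIN {
    FS = OFS = ","
    print "WEEKDAY,SUCCESSFUL_EVENTS,TOTAL_EVENTS,SUCCESSFUL_RATE"   # header first
}
NR > 1 { sold[$2] += $6; count[$2]++ }   # skip the input header row
END {
    for (day in count)
        print day, sold[day], count[day], sold[day] / count[day] * 100
}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The BEGIN block runs before any input is read, so the header always comes out first. The order of the weekday rows from for (day in count) is not defined by awk, so pipe the body through sort if a fixed order matters.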

How to edit '|' result from sqlite3 using sed command

hey guys so i have this database:
| id | item_name | number_of_store| store_location|
| 3 | margarine | 2 | QLD |
| 4 | margarine | 2 | NSW |
| 5 | wine | 3 | QLD |
| 6 | wine | 3 | NSW |
| 7 | wine | 3 | NSW |
| 8 | laptop | 1 | QLD |
i got the result i wanted using the distinct from sqllite3 syntax which are the following:
| id | item_name | number_of_store| store_location|
| 3 | margarine | 2 | QLD |
| 4 | margarine | 2 | NSW |
The command is:
sqlite3 store.sqlite "select item_name,number_of_store,store_location from store where item_name = 'margarine'" > store.txt
but when i saved it to txt i got
however my desired output in the txt are
i think i should use SED but not quite sure how to do it
i tried with
'|sed 's/|//g' |sed 's/|//g'|sed 's/^//g'|sed 's/$//g'
however the result only erase the '|' i'm not too sure how to change it to ','
Although you should really do this in SQL itself, as per your request you could use the following awk:
awk '{gsub(/\|/,",")} 1' Input_file
Or in sed:
sed 's#|#,#g' Input_file
If you want to save the output into Input_file itself, use sed's -i.bak option: it edits Input_file in place and keeps a backup of the original as Input_file.bak.
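For a quick check of the substitution, here is a sketch on two rows shaped like sqlite3's default pipe-separated output. The rows are reconstructed from the table in the question, not captured from a real run.

```shell
# Hypothetical rows in sqlite3's default pipe-separated list output.
rows='margarine|2|QLD
margarine|2|NSW'

# Replace every "|" on each line with ",".
result=$(printf '%s\n' "$rows" | sed 's/|/,/g')
printf '%s\n' "$result"
```

In sed's basic regular expressions an unescaped | is a literal character, so no backslash is needed. If changing the query side is an option, the sqlite3 shell's -csv flag is another way to get comma-separated output; that route is not exercised here.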

How to get the empty values in data using awk command

I have sample output from the Linux lvs command. I am trying to rearrange the fields using awk, but I am not able to handle the empty values in the data.
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
I tried the following command,
awk '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10}' lvs.txt
But Output is
LV VG Attr LSize Pool Origin Data% Meta% Move Log
root centos -wi-ao---- 45.62g root Online
swap centos -wi-ao---- root Offline
My expected result must be,
LV | VG | Attr | LSize | Pool | Origin | Data% | Meta% | Move | Log
root | centos | -wi-ao---- | 45.62g | | root | | | | Online
swap | centos | -wi-ao---- | | | root | | | | Offline
Please help me through this.
Any other possible ways is also welcome.Thanks in advance.
Solution: use lvs with --separator, then format the output with awk -F (field separator) and printf:
lvs --separator ',' | awk -F ',' '{printf "%-15s| %-10s| %10s| %10s| %10s| %10s| %10s| %10s| %10s|%10s| \n", $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}'
LV | VG | Attr| LSize| Pool| Origin| Data%| Meta%| Move| Log|
logicalTest1 | testgroup | -wi-a-----| 1.00g| | | | | | |
logicalTest2 | testgroup | -wi-a-----| 1.00g| | | | | | |
Installed lvm and created a couple of lvs to test.
The lvs command produces output without field separators, just runs of spaces. After redirecting the output to a file, I displayed the spaces:
root@florida:~# cat test2 | tr " " "*"
Spaces aren't good delimiters here. Reading man lvs, I found several options that control the output; one of them is the --separator parameter, which lets you set a custom separator between columns. I used a "," (comma), which is common for CSV.
Then, looking for a way to split fields in awk, I found its field separator option and combined that with the print formatting explained here.
I just had to search and read a little, but all the answers for this problem were on the internet.
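Since lvs itself needs real volumes to run, the sketch below uses a made-up line shaped like lvs --separator ',' output for the root volume in the question. It shows that a comma separator keeps empty columns as empty fields.

```shell
# Hypothetical line, shaped like lvs --separator ',' output for the
# "root" volume in the question; empty columns stay as empty fields.
line='root,centos,-wi-ao----,45.62g,,root,,,,Online'

# Assigning $1 to itself makes awk rebuild the record with OFS.
result=$(printf '%s\n' "$line" | awk -F ',' -v OFS=' | ' '{$1 = $1; print}')
printf '%s\n' "$result"
```

All ten fields survive, including the four empty ones, which is what the default whitespace splitting loses. The printf version above is the better fit when aligned columns are needed.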

SQL and NOW() into bash script

I have a file, "sharp.csv" in which data are like this :
Edit : These data are filled by a perl script
I want to insert these data into a SQL database with the help of a BASH script
| Field | Type | Null | Key | Default | Extra |
| code_site | varchar(64) | NO | PRI | NULL | |
| ip_source | varchar(64) | NO | PRI | NULL | |
| mac_relevee | varchar(64) | NO | PRI | NULL | |
| ip_relevee | varchar(64) | YES | | NULL | |
| vlan_concerne | varchar(64) | YES | | NULL | |
| date_polling | datetime | YES | | NULL | |
So my commands are just :
$mysql -h $db_address -u $db_user -p$db_passwd $db_name -e "load data local infile '$arp_router_file' REPLACE INTO TABLE $db_arp_router_table fields terminated by ';';"
But my NOW() command won't work.
Data are being inserted, but the "date_polling" column is filled with "0000-00-00 00:00:00"
NOW() is a function but the load data command will treat all the fields in the input file as literal data, thus trying to insert "NOW()" in a datetime field. The string "NOW()" can't be converted to a valid datetime value, so you end up with the default "0000-00-00 00:00:00" value.
You will have to build actual SQL INSERT queries from your input file, you can do this using awk:
cat input_file.csv | awk -F';' '{print "INSERT INTO routeur_arp (code_site, ip_source, mac_relevee, ip_relevee, vlan_concerne, date_polling) VALUES (\"" $1 "\", \"" $2 "\", \"" $3 "\", \"" $4 "\", \"" $5 "\", " $6 ");"}' > sql_statements.sql
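To see what such a generated statement looks like, here is a sketch on one made-up row. The values site1, 10.0.0.1 and so on are placeholders, and the column list is passed in through a hypothetical awk variable named cols to keep the format string short.

```shell
# Hypothetical input row; the last field is the literal text NOW().
row='site1;10.0.0.1;aa:bb:cc:dd:ee:ff;10.0.0.2;Vlan1;NOW()'
cols='code_site, ip_source, mac_relevee, ip_relevee, vlan_concerne, date_polling'

result=$(printf '%s\n' "$row" | awk -F';' -v cols="$cols" '{
    # Quote the five string columns; $6 stays unquoted so MySQL evaluates it.
    printf "INSERT INTO routeur_arp (%s) VALUES (\"%s\", \"%s\", \"%s\", \"%s\", \"%s\", %s);\n",
           cols, $1, $2, $3, $4, $5, $6
}')
printf '%s\n' "$result"
```

Because NOW() ends up outside the quotes, MySQL treats it as a function call rather than a string. This sketch does no escaping, so it is only safe for input whose fields are known not to contain quotes or semicolons.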
To avoid sending the NOW() instruction to MySQL, I'll send it a datetime generated in perl:
use Time::Piece ;
my $t = localtime ;
my $date = $t->ymd;
my $time = $t->hms;
And instead of
It'll be
<name>;<src_ip_address>;<mac_address>;<ip_address>;Vlan1;$date $time\n
Perl sends it as a string, and MySQL understands it as a valid datetime.
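If the timestamp were produced in the bash script rather than in perl, the same "date time" format is available from the date command. This is a sketch only; the other fields in the line are placeholders, not real data.

```shell
# Build "YYYY-MM-DD HH:MM:SS" in the shell; the other fields are placeholders.
stamp=$(date '+%Y-%m-%d %H:%M:%S')
line="site1;10.0.0.1;aa:bb:cc:dd:ee:ff;10.0.0.2;Vlan1;$stamp"
printf '%s\n' "$line"
```

The resulting sixth field has the same shape as the $date $time string built in perl, which load data can store in a datetime column as literal data.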