Combining awk commands - awk

I currently have this piece of code:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" '$4 ~ re' $fruit_file
This uses awk to find the lines in $fruit_file whose fourth field ($4) matches the pattern the user supplied in $user_fruit. However, I need to alter the awk command so that it only displays matching lines when the word apple is also on the line.
Any help would be greatly appreciated!

You can extend the awk pattern using boolean operators:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" '/apple/ && $4 ~ re' "$fruit_file"
I.e. print records when the record matches /apple/ and the fourth field matches the regex.

In case you want to check for the presence of literal, fixed strings, you can use index instead of the regex search:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)' file
Here,
index($0, "apple") - checks if there is an apple substring anywhere on the line (if its index is not 0)
&& - AND condition
index($4, re) - checks if the user-supplied string (re) occurs as a substring in Field 4 (if its index is not 0).
See an online demo:
s='one:two:three:2-plum+pear
apple:two:three:1-plum+pear'
user_fruit='plum+pear'
awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)' <<< "$s"
# => apple:two:three:1-plum+pear
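To see why the choice between the two matters, here is a small sketch. The third record and the variable names regex_hits and literal_hits are made up for illustration. The user's input contains +, which a regex match treats as a quantifier, while index looks for it literally:

```shell
# Hypothetical sample data; field 4 holds the fruit list.
data='one:two:three:2-plum+pear
apple:two:three:1-plum+pear
apple:two:three:3-plumpear'
user_fruit='plum+pear'

# Regex match: "m+" means "one or more m", so this finds "plumpear"
# and does not find the literal text "plum+pear".
regex_hits=$(printf '%s\n' "$data" |
  awk -F ":" -v re="$user_fruit" '/apple/ && $4 ~ re')

# Fixed-string match: index() searches for the literal text "plum+pear".
literal_hits=$(printf '%s\n' "$data" |
  awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)')

printf 'regex:   %s\n' "$regex_hits"
printf 'literal: %s\n' "$literal_hits"
```

So if the user is expected to type plain fruit names rather than patterns, the index version is probably the safer one.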

Related

grep command or awk to get particular data

I have a large text file:
https://www.google.com/
https://www.google.com/hello?url=xxxxxx
https://www.google.com/admin?x=y&file=zzz
https://www.google.com/abc.png
https://www.google.com//abc.png
https://www.google.com/abc.svg
https://www.google.com/abc.jpg
https://www.google.com/admin?x=aaa&file=yyyy
https://www.google.com/hello?
All I want is the URLs that have parameters, and the same parameters should not appear again with different values. I want this result:
https://www.google.com/hello?url=xxxxxx
https://www.google.com/admin?x=y&file=zzz
You can use an associative array in awk to check whether you have already seen the part of the URL before the ?, and so print only the first instance that has a non-empty query string:
$ cat bar.txt
https://www.google.com/
https://www.google.com/hello?url=xxxxxx
https://www.google.com/admin?x=y&file=zzz
https://www.google.com/abc.png
https://www.google.com//abc.png
https://www.google.com/abc.svg
https://www.google.com/abc.jpg
https://www.google.com/admin?x=aaa&file=yyyy
https://www.google.com/hello?
$ awk -F'?' '$2 != "" && !already_seen[$1]++' bar.txt
https://www.google.com/hello?url=xxxxxx
https://www.google.com/admin?x=y&file=zzz
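If the one-liner is hard to read, the same logic can be spelled out step by step. This is only a sketch: the input is a shortened, hypothetical version of bar.txt, and the names urls, seen and first_seen are mine.

```shell
# Hypothetical, shortened version of bar.txt.
urls='https://www.google.com/
https://www.google.com/hello?url=xxxxxx
https://www.google.com/admin?x=y&file=zzz
https://www.google.com/admin?x=aaa&file=yyyy
https://www.google.com/hello?'

first_seen=$(printf '%s\n' "$urls" | awk -F'?' '
  $2 == ""     { next }                # nothing after the ? (or no ? at all): skip
  ($1 in seen) { next }                # this path was already printed: skip
               { seen[$1] = 1; print } # first URL with parameters for this path
')
printf '%s\n' "$first_seen"
```

Note that the key is the path ($1), so a second URL with the same path is dropped even if its parameter names differ.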

How do I obtain a specific row with the cut command?

Background
I have a file, named yeet.d, that looks like this
JET_FUEL = /steel/beams
ABC_DEF = /michael/jackson
....50 rows later....
SHIA_LEBEOUF = /just/do/it
....73 rows later....
GIVE_FOOD = /very/hungry
NEVER_GONNA = /give/you/up
I am familiar with the f and d options of the cut command. The f option lets you specify which column(s) to extract, while the d option lets you specify the delimiter.
Problem
I want this output returned using the cut command.
/just/do/it
From what I know, this is part of the command I want to enter:
cut -f2 -d= yeet.d
since I want the values to the right of the equals sign, with the equals sign as the delimiter. However, this would return:
/steel/beams
/michael/jackson
....50 rows later....
/just/do/it
....73 rows later....
/very/hungry
/give/you/up
Which is more than what I want.
Question
How do I use the cut command to return only /just/do/it and nothing else from the situation above? This is different from How to get second last field from a cut command because I want to select a row within a large file, not just one near the end or the beginning.
This looks like it would be easier to express with awk...
# awk -v _s="${_string}" '$3 == _s {print $3}' "${_path}"
## The above could be a more _scriptable_ form of the example below
awk -v _search="/just/do/it" '$3 == _search {print $3}' <<'EOF'
JET_FUEL = /steel/beams
SHIA_LEBEOUF = /just/do/it
NEVER_GONNA = /give/you/up
EOF
## Either way, output should be similar to
## /just/do/it
-v _something="Some Thing" bit allows for passing Bash variables to awk
$3 == _search bit tells awk to match only when column 3 is equal to the search string
To match a regular expression anywhere within a line, one can use $0 ~ _search instead
{print $3} bit tells awk to print column 3 for any matches
And the <<'EOF' bit tells Bash to not expand anything within the opening and closing EOF tags
... however, the above will still output duplicate matches, eg. if yeet.d somehow contained...
JET_FUEL = /steel/beams
SHIA_LEBEOUF = /just/do/it
NEVER_GONNA = /give/you/up
AGAIN = /just/do/it
... there'd be two /just/do/it lines output by awk.
The quickest way around that would be to pipe (|) the output to head -1, but a better way is to tell awk to exit right after it has printed...
_string='/just/do/it'
_path='yeet.d'
awk -v _s="${_string}" '$3 == _s {print $3; exit}' "${_path}"
... though that now assumes that only the first match is wanted. Obtaining the nth match is possible, though it was outside the scope of the question when last read.
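For what it's worth, picking the nth match could be sketched with a counter. Everything here is an assumption for illustration: the sample lines stand in for yeet.d, and _n, _count and nth_key are names I made up. It prints the key in column 1 so that the two matches can be told apart:

```shell
# Hypothetical stand-in for yeet.d with a duplicated value.
sample='JET_FUEL = /steel/beams
SHIA_LEBEOUF = /just/do/it
AGAIN = /just/do/it
NEVER_GONNA = /give/you/up'
_string='/just/do/it'
_n=2   # which occurrence is wanted

# Count matches of column 3 and stop at the _n-th one.
nth_key=$(printf '%s\n' "$sample" |
  awk -v _s="$_string" -v _n="$_n" '$3 == _s && ++_count == _n {print $1; exit}')
printf '%s\n' "$nth_key"
```

With _n=2 this should report AGAIN, the key of the second line whose third column equals the search string.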
Updates
To have awk match on the first column while printing the third column and exiting after the first match, the command may look like...
_string='SHIA_LEBEOUF'
_path='yeet.d'
awk -v _s="${_string}" '$1 == _s {print $3; exit}' "${_path}"
... and generalize even further...
_string='^SHIA_LEBEOUF '
_path='yeet.d'
awk -v _s="${_string}" '$0 ~ _s {print $3; exit}' "${_path}"
... because awk totally gets regular expressions, mostly.
It depends on how you want to identify the desired line.
You could identify it by the line number. In this case you can use sed
cut -f2 -d= yeet.d | sed '53q;d'
This extracts the 53rd line.
Or you could identify it by a keyword. In this case use grep
cut -f2 -d= yeet.d | grep just
This extracts all lines containing the word just.
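A small sketch of both approaches on a three-line sample (the sample and the variable names are assumptions, not the real yeet.d, and line 2 stands in for line 53). One thing it makes visible: splitting on = keeps the blank that follows the equals sign, so the value comes back with a leading space. If every line has exactly one space on each side of the =, splitting on the space avoids that:

```shell
# Hypothetical three-line stand-in for yeet.d.
sample='JET_FUEL = /steel/beams
SHIA_LEBEOUF = /just/do/it
NEVER_GONNA = /give/you/up'

# Select by line number (line 2 of the sample).
by_line=$(printf '%s\n' "$sample" | cut -f2 -d= | sed '2q;d')
# Select by keyword.
by_word=$(printf '%s\n' "$sample" | cut -f2 -d= | grep just)
# Split on the space instead, so the value is field 3 with no leading blank.
no_blank=$(printf '%s\n' "$sample" | cut -d' ' -f3 | grep just)

# Brackets make the leading blank visible.
printf '[%s]\n' "$by_line" "$by_word" "$no_blank"
```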

awk -Search pattern through Variable

We have written a shell script that searches for file names matching a pattern.
file format:
<number>_<20180809>.txt
That is, the name starts with a single number and ends with an 8-digit number.
Command:
awk -v string="12_1234" -v serch="^[0-9]+_+[0-9][0-9][0-9][0-9]$" "BEGIN{ if (string ~/serch$/) print string }"
If the string matches, the value should be returned.
You can just change your command in the following way and it will work:
awk -v string='12_1234' -v search='^[0-9]+_+[0-9][0-9][0-9][0-9]$' 'BEGIN{ if (string ~ search) print string }'
12_1234
You do not need the /.../ syntax for the regex when you use the ~ operator with a variable (inside /.../, serch is taken as literal text, not as the variable), and you also had one extra $. You were really close!
Then you must adapt the search regex to ^[0-9]_[0-9]{8}$ to match your <number>_<20180809> pattern exactly.
Also, if you are just extracting this information from a file, you can use grep:
$ awk -v string='1_12345678' -v search='^[0-9]_[0-9]{8}$' 'BEGIN{ if (string ~ search) print string }'
1_12345678
$ search='^[0-9]_[0-9]{8}$'; echo '1_12345678' | grep -oE "$search"
1_12345678
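The difference between a regex literal and a regex held in a variable can be checked side by side. This is just a sketch with a simplified pattern of my own (no interval expressions, so it does not depend on the awk version); the names literal and dynamic are assumptions:

```shell
string='12_1234'
search='^[0-9]+_[0-9]+$'   # hypothetical, simplified pattern

# /search/ is a regex literal: it looks for the text "search" inside string.
literal=$(awk -v string="$string" -v search="$search" \
  'BEGIN { if (string ~ /search/) print "match"; else print "no match" }')

# A bare variable on the right of ~ is used as a dynamic regex.
dynamic=$(awk -v string="$string" -v search="$search" \
  'BEGIN { if (string ~ search) print "match"; else print "no match" }')

printf 'literal: %s\ndynamic: %s\n' "$literal" "$dynamic"
```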

awk doesn't assign variable and uses $0 instead

I'm using the following /bin/sh code to parse the output of apt show and print only the names of the packages matching the second pattern, but it doesn't work. Instead it outputs the pattern itself in both prints as if the variable pac was never assigned and instead uses $0 in all cases.
apt show vim peazip 2> /dev/null | \
awk '
/^Package:/ {
pac = substr($0, 10);
print "found name "$pac;
}
/APT-Sources: \/var\/lib\/dpkg\/status/ {
print "bingo "$pac;
}
'
Output: (gawk on Ubuntu)
found name Package: vim
found name Package: peazip:i386
bingo APT-Sources: /var/lib/dpkg/status
What am I doing wrong?
awk is C-like in that you don't use $ to get the value of a variable:
$ awk 'BEGIN { x=42; print x }'
42
I think of $ in awk as an operator that fetches the value of the field number identified by the expression after $. For example, the 2nd field is $2, the last field is $NF where NF is a variable whose value is the number of fields in the current record.
Now, why does $pac act like $0?
awk, in a numeric context, treats an arbitrary string like this: take the string, truncate it at the first non-digit character; if the truncation results in an empty string, numerically treat the string as zero.
$ echo "foo bar" | awk '{x="2cats"; print $x}'
bar
The value of pac does not start with digits, so numerically it has the value zero; applying the $ "operator" then gives $0, i.e. the whole record:
$ echo "foo bar" | awk '{x="no-cats"; print $x}'
foo bar
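Treating $ as an operator also explains expressions like $NF and $(NF-1). A quick sketch (the input words and the variable name fields are just for illustration):

```shell
# n is a plain variable; $n is "the field whose number is the value of n".
fields=$(echo "alpha beta gamma" | awk '{ n = 2; print n, $n, $NF, $(NF-1) }')
printf '%s\n' "$fields"
```

Here n prints as 2, $n is the second field, $NF is the last field and $(NF-1) is the one before it.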
Awk does not use $ in front of variable names :) Your code should look like this instead:
apt show vim peazip 2> /dev/null | \
awk '
/^Package:/ {
pac = substr($0, 10);
print "found name "pac;
}
/APT-Sources: \/var\/lib\/dpkg\/status/ {
print "bingo "pac;
}
'

Awk: retrieve lines based on a range of columns of a file

I want to retrieve the lines in which the 16 characters starting at character 83 are all spaces, and ignore all lines that have any other character/string/integer in this range.
Here is my file.txt:
7653903747235209876401 HGFDKJKK 98765435475237 caJHGFDSQ200 00779999 654321000704 2014100812204898764513165432
7653903747235209854311 KJH 98765435475280 lkjUIHJ100808442700001298765432 654321009999 2014100812204898764513165432
7653903747235209854311 BBB 98765435475280 lkjUIHJ100808442700001298765432 654321009999 2014100812204898764513165432
7653903747235209876401 GHJUYTHH 98765435475237 caJHGFDSQ200 00779999 654321000704 2014100812204898764513165432
Here is my code; I want to add this condition to it:
#!/bin/sh
var='^20141008'
awk -v var=$var '$1~/[01]1$/ && $7 ~ var' file.txt
You could add another regex match to your current awk line:
$ awk -v var="$var" '$1~/[01]1$/ && $7 ~ var && substr($0,83,16) ~ /^ +$/' file.txt
The check is that the substring containing 16 characters starting from character 83 matches the pattern. The pattern ensures that only spaces occur between the start and end of the string.
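One caveat worth testing on your data: on a line that ends inside the range, substr returns fewer characters, and / +$/ anchored at both ends still matches if those are all spaces. A stricter variant compares against a string of exactly the right number of blanks. The sketch below uses a 3-character range starting at column 5 instead of 16 characters at column 83, and the records and variable names are made up:

```shell
# Three hypothetical records: blanks in columns 5-7, non-blanks there,
# and a short line that ends after a single blank in column 5.
records=$(printf 'abcd%3sxyz\nabcd12 xyz\nabcd%1s\n' '' '')

# Regex check as above: any non-empty run of spaces in the range matches.
regex_hits=$(printf '%s\n' "$records" |
  awk 'substr($0, 5, 3) ~ /^ +$/ { print NR }')

# Stricter check: the range must be exactly three spaces long.
exact_hits=$(printf '%s\n' "$records" |
  awk 'BEGIN { blank = sprintf("%3s", "") } substr($0, 5, 3) == blank { print NR }')

printf 'regex matched lines: %s\n' "$(echo $regex_hits)"
printf 'exact matched lines: %s\n' "$exact_hits"
```

For the original file the equivalent would be sprintf("%16s", "") compared with substr($0,83,16), assuming short lines should be rejected.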
It could be:
awk -v var="$var" '$1~/[01]1$/ && $7~var && /^.{82} {16}/' file.txt
Actually, referring to $7 can cause problems, as it seems that this is not a whitespace-separated list but a COBOL-like fixed-width record. So I would rewrite it as a single pattern using absolute positions:
var=20141008
awk -v var="$var" 'match($0,"^.{20}[01]1.{60} {16}.{38}"var)' file.txt
or, a little bit shorter:
var=20141008
awk '/^.{20}[01]1.{60} {16}.{38}'"$var"'/' file.txt
In older gawk (before version 4.0), the --posix or --re-interval option should be added to enable interval expressions such as {16}.