How to extract a word from a string that may or may not start with a single quote - awk

Sample string:
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
cut -d' ' -f 1 string.txt gives me
'kernel-rt|kernel-alt|/kernel-'
But how do we proceed further to get just the 'kernel' from it?

Assuming you want only the third kernel (the one in /kernel-) and not the others:
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
Here is how to extract it with a single awk command. This needs GNU awk (the usual awk on Linux), because match() with a third array argument is a gawk extension.
input="'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'"
echo "$input" | awk -F"|" '{split($3,a,"-");match(a[1],"[[:alnum:]]+",b);print b[0]}'
Explanation:
-F"|" sets the field separator to |, so the third field is /kernel-' 'headers
split($3,a,"-") splits the third field on -; the part before the first - (/kernel) is stored in a[1]
match(a[1],"[[:alnum:]]+",b) finds the first run of alphanumeric characters in a[1] and stores it (kernel) in b[0]
print b[0] prints the matched string.
To extract kernel from the first or second field instead, change $3 to $1 or $2.
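If GNU awk is not available, the same extraction can be sketched in POSIX awk. This is a minimal sketch, not the answer's method: the function name extract_word is hypothetical, and it assumes the word you want is the part of the field before its first -, with quotes and slashes stripped.

```shell
# extract_word N: print the word in the Nth |-separated field of each input
# line. Hypothetical helper; uses only POSIX awk (no gawk-only 3-arg match).
extract_word() {
    awk -F'|' -v n="$1" '{
        w = $n
        sub(/-.*/, "", w)             # cut the field at its first "-"
        gsub(/[^A-Za-z0-9]/, "", w)   # strip quotes, slashes and the like
        print w
    }'
}

# Usage (prints: kernel)
line="'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'"
printf '%s\n' "$line" | extract_word 3
```

Because the field number is passed with -v, the same function covers the first, second or third kernel without editing the awk program.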

$ cat file
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
$
$ awk '{print $1}' file
'kernel-rt|kernel-alt|/kernel-'
$
$ awk '{gsub(/\047/,"",$1); print $1}' file
kernel-rt|kernel-alt|/kernel-
$
$ awk '{gsub(/\047/,""); split($1,f,/[|]/); print f[1]}' file
kernel-rt
And, just to make you think:
$ awk '{gsub(/\047|\|.*/,"")}1' file
kernel-rt
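Since the title asks about a string that may or may not start with a single quote, here is a small sketch that handles both cases. The function name first_item is hypothetical, and it assumes you want the first |-separated entry of the first word. sub(/^\047/, ...) removes a leading quote only when one is present, so both inputs give the same result.

```shell
# first_item: print the first |-separated entry of the first word, with an
# optional leading single quote removed (\047 is the octal escape for ').
first_item() {
    awk '{
        w = $1
        sub(/^\047/, "", w)   # no-op when there is no leading quote
        split(w, f, "[|]")    # "[|]" matches a literal pipe
        print f[1]
    }'
}

# Both of these print: kernel-rt
printf '%s\n' "'kernel-rt|kernel-alt|/kernel-' 'headers|xen'" | first_item
printf '%s\n' "kernel-rt|kernel-alt|/kernel-" | first_item
```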

Related

Red Hat: how can I use a space and double quote as separators to output the 2nd, 4th and last column from each line in a file

I have the following multiple lines in a file on Linux, the line information differs but the format is always the same:
-item bread.maker -model "modelname model type modelnum-43453-23241.7" -date1 23.10.01 -date2 30.10.04 -date3 04.02.05
I want to output only the 2nd, 4th and last columns of each line. I've tried with awk -F, and print $NF, but I cannot seem to get it to treat the double-quoted part as one column.
With any awk:
$ awk 'match($0,/"[^"]*"/){print $2, substr($0,RSTART,RLENGTH), $NF}' file
bread.maker "modelname model type modelnum-43453-23241.7" 04.02.05
or:
$ awk -v OFS='"' '{split($0,f,/"/); print $2, f[2], $NF}' file
bread.maker"modelname model type modelnum-43453-23241.7"04.02.05
or with GNU awk for FPAT:
$ awk -v FPAT='[^" ]+|"[^"]*"' '{print $2, $4, $10}' file
bread.maker "modelname model type modelnum-43453-23241.7" 04.02.05
Set OFS as appropriate if you want something other than a blank char to separate the output fields. I used " as the OFS for the 2nd script since it must not be present in your input if you're already using it to quote strings.
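FPAT requires GNU awk. A hedged POSIX-awk sketch of the same idea walks the line with match() and collects every token that is either a run of non-blank, non-quote characters or a complete "..." string. The function name quoted_fields is hypothetical, and the pattern assumes quotes are never nested or escaped.

```shell
# quoted_fields [FILE...]: print the 2nd, 4th and last field of each line,
# where a "double quoted" part counts as one field (like the FPAT above).
quoted_fields() {
    awk '{
        split("", tok)    # clear the tokens left over from the previous line
        n = 0
        s = $0
        while (match(s, /[^" ]+|"[^"]*"/)) {
            tok[++n] = substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)   # continue after this token
        }
        print tok[2], tok[4], tok[n]
    }' "$@"
}
```

Usage: quoted_fields file. Every token is at least one character long, so the loop always consumes input and terminates.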
With bash 5.1 or later, an even-numbered list of words assigned to an associative array is treated as a key-value list:
declare -A fields
while IFS= read -r line; do
eval "fields=($line)" # yeah, eval is needed here to respect
# the quotes in the line
printf '%s,%s,%s\n' "${fields[-item]}" "${fields[-model]}" "${fields[-date3]}"
done < file
bread.maker,modelname model type modelnum-43453-23241.7,04.02.05

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but it prints only https because the separator is :. What is the problem with my grep command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : characters in your input, printing $2 will not work in awk, because it gives you only the 2nd field. You actually need an equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
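The non-regex idea above can be wrapped in a small reusable function. This is a sketch, and get_value is a hypothetical name: index() finds the first :, the text before it is compared to the key as a plain string, and everything after it is printed, so further colons in the value are kept. As an added assumption of this sketch, it exits with status 1 when the key is absent so a calling script can test for that.

```shell
# get_value KEY [FILE...]: print the value of the first KEY:VALUE line whose
# key equals KEY exactly; reads stdin when no file is given.
get_value() {
    key=$1
    shift
    awk -v k="$key" '
        {
            i = index($0, ":")                 # position of the first ":"
            if (i > 0 && substr($0, 1, i - 1) == k) {
                print substr($0, i + 1)        # the rest, colons included
                found = 1
                exit
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$@"
}

# Usage: URL=$(get_value GOOGLE_URL input.txt) && printf '%s\n' "$URL"
```

A line such as KEY:GOOGLE_URL: is not matched, because only the text before the first : is compared.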
Could you please try the following, written and tested with the shown samples in GNU awk. It looks for lines starting with GOOGLE_URL: and prints the http or https URL that follows; if you need only https, change http[s]? to https in the following solution.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: the same code, with comments.
awk ' ##Start the awk program.
/^GOOGLE_URL:/{ ##If the line starts with GOOGLE_URL:, do the following.
match($0,/http[s]?:\/\/.*/) ##Match http or https (s is optional), then :// and the rest of the line.
print substr($0,RSTART,RLENGTH) ##Print the substring found by match().
}
' Input_file ##Name of the input file.
Second solution: if you need everything after the first :, try the following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK the following way for this task.
Let the content of file.txt be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: in GNU AWK, FS can be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line; GOOGLE_URL: in the middle or at the end of a line is then not a separator (see the 3rd line of the input). With this FS each line has either 1 or 2 fields. It has 2 only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, if it is 2, print the 2nd field ($2), because the 1st field is empty in that case.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Exact string match in awk

I have a file test.txt with the following lines:
1997 100 500 2010TJ
2010TJXML 16 20 59
I'm using the following awk command to get information only about the string 2010TJ:
awk -v var="2010TJ" '$0 ~ var {print $0}' test.txt
But the code prints both lines. How can I get only the line containing the exact string:
1997 100 500 2010TJ
the string can be placed in any column of the file.
Several options:
Use a gawk word boundary (not POSIX awk...):
$ gawk '/\<2010TJ\>/' file
Match the actual space, tab, or whatever separates the columns:
$ awk '/^2010TJ /' file
Or compare the field directly to the string:
$ awk '$1=="2010TJ"' file
You can loop over the fields to test each field if you wish:
$ awk '{for (i=1;i<=NF;i++) if ($i=="2010TJ") {print; next}}' file
Or, given your example of setting a variable, those same using a variable:
$ gawk -v s=2010TJ '$0~"\\<" s "\\>"' file
$ awk -v s=2010TJ '$0~"^" s " "' file
$ awk -v s=2010TJ '$1==s' file
Note that the first is a little different from the second and third: the first matches the standalone string 2010TJ anywhere in $0, while the second and third match only when the string is the first thing on the line.
Try this (it tests only column 1):
awk '$1 == "2010TJ" {print $0}' test.txt
or, grep-like (all columns):
gawk '/\<2010TJ\>/ {print $0}' test.txt
Note:
\< and \> are word boundaries.
Another (GNU) awk with word boundaries:
gawk '/\y2010TJ\y/' file
Note: \y matches either the beginning or the end of a word.
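\<, \> and \y are GNU awk extensions. A hedged, portable sketch of the same whole-word test pads the line with a space on each side and requires a non-word character on both sides of the string. has_word is a hypothetical name; the sketch assumes the search string contains no regex metacharacters (2010TJ does not) and treats letters, digits and underscore as word characters.

```shell
# has_word WORD [FILE...]: print lines containing WORD as a whole word,
# in any column, using only POSIX awk.
has_word() {
    w=$1
    shift
    awk -v s="$w" '(" " $0 " ") ~ ("[^A-Za-z0-9_]" s "[^A-Za-z0-9_]")' "$@"
}
```

Usage: has_word 2010TJ test.txt prints 1997 100 500 2010TJ but not the 2010TJXML line. The padding removes the need for ^ and $ inside the pattern.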

AWK that reads up to the /

I have the following lines of text :
170311 005201 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) <0157357069/OK> ##[ti=7672,
170311 005323 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) </NOREAD> ##[ti=7672,
I have the following script :
grep "itemAddBarCodeData" %myItemHandling% | gawk -F "[<>]+" -v OFS=, "{for(i=1;i<=NF;++i){if($i~/Barcode/){print substr($1,5,2)substr($1,3,2)substr($1,1,2),substr($1,8,6),$(i+1)}}}" > %myOutputPath%%myFilename%
What I need is a script that reads only the /NOREAD and the /OK, so that the output looks like:
11-03-17,00:52:01,NOREAD
11-03-17,00:53:23,OK
Any help would be greatly appreciated.
Thanks
Complex gawk approach:
awk -F"[ />]" '{patsplit($1, a, /[0-9]{2}/); patsplit($2, b, /[0-9]{2}/);
printf("%s-%s-%s,%s:%s:%s,%s\n",a[3],a[2],a[1],b[1],b[2],b[3],$10)}' inputfile
The output:
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD
-F"[ />]": a bracket-expression field separator, so each single space, / or > separates fields
patsplit(string, array [, fieldpat [, seps ] ])
Divide string into pieces defined by fieldpat and store the pieces in array and the separator strings in the seps array.
You can use the following script:
script.awk
/\/[A-Z]+>/ { match($1"-"$2,/(..)(..)(..)-(..)(..)(..)/,ts)
dt=mktime( sprintf("20%s %s %s %s %s %s",
ts[1], ts[2], ts[3],
ts[4], ts[5], ts[6]) )
dtd = strftime( "%d-%m-%y", dt )
dts = strftime( "%H:%M:%S", dt )
match ( $0, /\/[A-Z]+>/) # set RSTART and RLENGTH
print dtd, dts, substr( $0, RSTART+1, RLENGTH-2)
}
Run it like this: awk -v OFS=, -f script.awk yourfile
The important part is the second match() call, which matches a string of capital letters ([A-Z]+) preceded by a / and followed by a >.
It should match the OK and NOREAD cases and not the Barcode(1/1).
The variables RSTART and RLENGTH are set by match(); we have to correct them by +1 and -2, because the matched text includes the / and the >.
The first match() call and the mktime(), strftime() and sprintf() calls are another way to format the date and time. The time functions and the three-argument form of match() are GNU AWK extensions.
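A minimal sketch isolating the RSTART/RLENGTH arithmetic described above (status_of is a hypothetical name). It uses only the two-argument match(), which is standard awk, so unlike the date handling it does not need GNU extensions.

```shell
# status_of [FILE...]: print the status word between "/" and ">" (OK, NOREAD).
# The match covers e.g. "/OK>", so the word starts one character after
# RSTART and is two characters shorter than RLENGTH.
status_of() {
    awk 'match($0, /\/[A-Z]+>/) { print substr($0, RSTART + 1, RLENGTH - 2) }' "$@"
}
```

The / in Barcode(1/1) is followed by a digit, so the leftmost match on each sample line is the /OK> or /NOREAD> part.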
Regular awk version:
awk '
{
d=$1$2
gsub(/../,"& ",d)
split(d,T)
split($8,R,"[/>]")
printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]
}
' file
With the script in a file:
script.awk:
{
d=$1$2
gsub(/../,"& ",d)
split(d,T)
split($8,R,"[/>]")
printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]
}
awk -f script.awk file
Crammed onto one line:
awk '{d=$1$2; gsub(/../,"& ",d); split(d,T); split($8,R,"[/>]"); printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]}' file
You don't need grep when you're using awk. With GNU awk for gensub():
$ awk '/itemAddBarCodeData/{print gensub(/(..)(..)(..) (..)(..)(..).*\/([^>]+).*/,"\\3-\\2-\\1,\\4:\\5:\\6,\\7",1)}' file
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD
Here's a pragmatic combination of awk and sed that is conceptually relatively simple:
On Linux and BSD/macOS:
awk -F'[ />]' -v OFS=, '/itemAddBarCodeData/ {print $1, $2, $10}' file |
sed -E 's/^(..)(..)(..),(..)(..)(..)/\3-\2-\1,\4:\5:\6/'
On a Windows system, invoked from cmd.exe, different quoting and line continuation rules apply (assumes the presence of ported GNU utilities):
awk -F"[ />]" -v OFS=, "/itemAddBarCodeData/ {print $1, $2, $10}" file ^
| sed -E "s/^(..)(..)(..),(..)(..)(..)/\3-\2-\1,\4:\5:\6/"
Note how:
"..." strings rather than '...' strings must be used to protect the embedded content from interpretation by the shell
Unlike with "..." on Unix, $ has no special meaning to cmd.exe, so it can be used as-is.
^ as the very last character on a line serves as the explicit line-continuation character, and the line must be broken before the | (whereas on Unix a line ending in | is implicitly continued).
This is only used for readability here; of course, you can place your command on a single line.

awk to parse field by using period and output unique digits

I am trying to use awk to parse $2 on the first . in the string and output the digits, with the header row above them. The current output is close, but both commands seem to be taking $1 as well. Do I need to specify something in the command to print only the digits in $2? Thank you :).
file
R_2016_09_20_12_47
IonXpress_007 16-0001.xxx.xxx
IonXpress_008 16-0002.xxx.xxx
IonXpress_009 16-0003.xxx.xxx
R_2016_09_20_12_46
IonXpress_007 16-0004.xxx.xxx
IonXpress_008 16-0005.xxx.xxx
IonXpress_009 16-0006.xxx.xxx
desired output
R_2016_09_20_12_47
16-0001
16-0002
16-0003
R_2016_09_20_12_46
16-0004
16-0005
16-0006
awk
awk -F. '{print $1}' file
cut
cut -d'.' -f1 file
current output
R_2016_09_20_12_47
IonXpress_007 16-0001
IonXpress_008 16-0002
IonXpress_009 16-0003
R_2016_09_20_12_46
IonXpress_007 16-0004
IonXpress_008 16-0005
IonXpress_009 16-0006
Try this :
% awk -F'[ .]' '{print $2 ? $2 : $1}' file
R_2016_09_20_12_47
16-0001
16-0002
16-0003
R_2016_09_20_12_46
16-0004
16-0005
16-0006
NOTE
I use both space and . as field separators.
I use the ternary operator to test $2: the header lines contain no space or ., so $2 is empty there and $1 is printed instead.
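A slightly more explicit variant of the same idea, as a hedged sketch (sample_ids is a hypothetical name): lines with a single field are treated as header rows and printed unchanged, and for the other lines the second field is cut at its first ".". One difference from the ternary: a second field that happens to be 0 is still printed, whereas $2 ? $2 : $1 treats it as false.

```shell
# sample_ids [FILE...]: print header rows (one field) unchanged and, for the
# other rows, the second field up to its first ".".
sample_ids() {
    awk 'NF == 1 { print; next }
         { split($2, p, "[.]"); print p[1] }' "$@"
}
```

Usage: sample_ids file produces the desired output shown in the question.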