Awk equivalent of MySQL's substring_index - awk

I have strings with delimited fields, but a different number of fields in each, eg:
this/that
this/that/theother
this/that/theother/stuff
I want to retrieve the last two fields in each case, ie:
this/that
that/theother
theother/stuff
This is easy in MySQL with the substring_index function, and I see this thread explains how to do it in PHP.
Can someone help me achieve the same with awk in the command line? Thanks

echo 'this/that/theother/stuff' | awk -F/ '{print $(NF-1) "/" $(NF)}'

Related

awk to filter lines in file by removing pattern

Trying to use awk to remove the IonCode_4 digits (always 4 may be different) and leave the file extension. Is the below the best way? Thank you :).
file
1112233 ID_1234_000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
1112231 ID_1234_000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
awk
awk '/_tn_/ {next} gsub ("^.*/|_.*$|IonCode_...._", "", $2)'f
current
1112233 000000-Control
1112231 000000-Control
desired
1112233 000000-Control.txt
1112231 000000-Control.txt
Split records by 1+ spaces or underscore, so the 4th field will be the part you're interested in.
awk -F '[[:space:]]+|_' '!/_tn_/{print $1,$4".txt"}' file
Could you please try following. This is simplest I could think, though we could do it with number of fields mentioning too but that will be more like hard-coding of numbers, so I went with this approach here.
awk '
{
sub(/[^_]*_/,"",$2)
sub(/[^_]*_/,"",$2)
sub(/_.*/,".txt")
}
1
' Input_file
with sed
$ sed -E 's/ID_[0-9]{4}_([^_]+).*(\..*)/\1\2/' file
1112233 000000-Control.txt
1112231 000000-Control.txt

Need help AWK script

Could you let me know how to print "user.%" string in below text by awk?
The value of 'user' is not fixed and the number of strings in '( )' are not fixed.
start user1.table% NOT (%OLD, %2016%) user.% another strings
UPDATE
It is the basis of SQL processing. $2 means schema.table but here user can use '%' and also exclude by NOT keyword. It ends with ')'. The next one is a second schema.table and that is the one I want to catch.
I think I should parse the string after ')' with a regular expression but failed.
Regular expression:
[)]\s+(\S+)
Above expression can be used to catch that string I guess.
How can I apply this one in awk script(Not one liner).
If the structure of the query keeps the same, you can use this:
awk -F'[).]' '{print $3".%"}'
I'm using the closing parenthesis or the literal dot as the delimiter. Doing so the value of interest is in field 3.
While it is simple it leaves some whitespace in front of user. We can enhance the field delimiter regex to fix this:
awk -F')[[:space:]]*|[.]' '{print $3".%"}'
Btw, you may use this sed command alternatively:
sed 's/.*)[[:space:]]*\([^.]*\).*/\1.%/'
or if you have GNU grep, use this:
grep -oP '\)\s*\K[^%]*%'
Try this (GNU awk):
awk '{match($0, /[)] +([^ ]+)/, var);print var[1];}'
You need to match first (GNU awk function).
Given your posted sample input, all you need is:
awk '{print $6}'
e.g.:
$ echo 'start user1.table% NOT (%OLD, %2016%) user.% another strings' |
awk '{print $6}'
user.%
If that doesn't work for you then your posted sample input isn't representative enough of your real input so edit your question to include a few lines of truly representative sample input and the expected output given that input.

Combining awk search with standard awk and awk delimiter

I`m working on a set of data for which I need specific fields as output:
The data looks like this:
/home/oracle/db.log.gz:2013-1-19T00:00:25 <user.info> 1 2013-1-19T00:00:53.911 host_name RT_FLOW [junos#26.1.1.1.2.4 source-address="10.1.2.0" source-port="616" destination-address="100.1.1.2" destination-port="23" service-name="junos-telnet" nat-source-address="20x.2x.1.2" nat-source-port="3546" nat-destination-address="9x.12x.3.0"]
From above I need three things:
(I) - 2013-1-19T00:00:53.911 which is $4
(II)- source-address="10.1.2.0" which is $8 of which I need only 10.1.2.0
(III) - destination-address="100.1.1.2" which $10 of which I need only 100.1.1.2
I cannot use simple awk like this -> awk '{ print $4 \t $8 \t $10 }' since there are some fields after "device_name" in the log file which are not always present in all log lines so I have to make use of delimiters such as
awk -F 'source-address=' '{print $2}' | awk '{print $1} -> this gives source-addressIP which is (II) requirement
I`m not sure how do I combine using a awk search for I and II and III.
Can someone help?
I believe sed is better for this job
sed -r 's/([^ ]+[ ]+){3}([^ ]+).*[ ]+source-address="([^"]+)".*[ ]+destination-address="([^"]+)".*/\2\t\3\t\4/' file
Output:
2013-1-19T00:00:53.911 10.1.2.0 100.1.1.2
What do you exactly want?
solve the problem using any (reasonably standard) tool
solve this challenge using one instance of awk
solve the problem using just awk, no matter how many instances it costs
For the first case, you could parse the line using scripting language of your choice (mine would be Perl), or do it the hard way using sed and a single big substitution. Or something between the two – use three regexes to get the parts you want.
For the second case, you could adapt any of the former solutions, preferably the sed one. Awk and sed solutions have already been posted.
For the third case, you could just run the obvious awk solutions you mentioned in your question and send the results to a single pipe like { awk …; awk …; awk …; } < file | consumer.
Try doing this :
awk '{print gensub(/.*\s+([0-9]{4}-[0-9]+-[0-9]+T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]+).*source-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*destination-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/, "(I) \\1\n(II) \\2\n(III) \\3", "g"); }' file
Another solution using perl :
perl -lne 'print "(", "I" x ++$c, ") $_" for m/.*?\s+(\d{4}-\d+-\d+T\d{2}:\d{2}:\d{2}.\d+).*source-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*destination-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*/' file
Outputs :
(I) 2013-1-19T00:00:53.911
(II) 10.1.2.0
(III) 100.1.1.2

awk extract of a series of lines

I am stuck at getting a right solution using awk to extract versions between "[]" from
Version Repository Repository URL
[1.0.0.44] repo-0 file://test/test-1.0.0.44-features.xml
[1.0.0.21] repo-0 file://test/test-1.0.0.21-features.xml
Is there any quick efficient one-liners anyone can help with please?
With awk, using square brackets as the field separators, output field 2 except for record number 1:
awk -F '[][]' 'NR > 1 {print $2}'
Or, grep with -o is useful for extracting substrings
grep -oP '(?<=\[)[^]]+'

How to quote a shell variable in a TCL-expect string

I'm using the following awk command in an expect script to get the gateway for a particular destination
route | grep $dest | awk '{print $2}'
However the expect script does not like the $2 in the above statement.
Does anyone know of an alternative to awk to perform the same function as above? ie. output 2nd column.
You can use cut:
route | grep $dest | cut -d \ -f 2
That uses spaces as the field delimiter and pulls out the second field
To answer your Expect question, single quotes have no special meaning to the Tcl parser. You need to use braces to protect the body of the awk script:
route | grep $dest | awk {{print $2}}
And as awk can do what grep does, you can get away with one less process:
route | awk -v d=$dest {$0 ~ d {print $2}}
Before switching to another utility, check if changing field separator worrks. Documentation for field separators in GNU Awk here.
SED is the best alternative to use. If you don't mind a dependency, Perl should also be sufficient to solve the task
Depending on the structure of your data, you can use either cut, or use sed to do both filtering and printing the second column.
Alternatively, you could use Perl:
perl -ne 'if(/foo/) { #_ = split(/:/); print $_[1]; }'
This will print second token of each line containing foo, with : as token separator.