Change output field separator on first row - awk

I have a file like this:
name|age
Bob|30
Tom|50
Cindy|10
I want the first row to have a different separator, "^". I tried:
awk 'NR==1 { gsub("|","^")1}1' f
But I keep getting:
^n^a^m^e^|^a^g^e^
Bob|30
Tom|50
Cindy|10
The desired output is:
name^age
Bob|30
Tom|50
Cindy|10

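Your code, gsub("|","^"), does not escape the special metacharacter | (used for alternation in regex). Unescaped, | is an empty alternation that matches the empty string, so it matches at every position in the input and a ^ is inserted around every character.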
You may use this awk without involving any regex:
awk 'BEGIN{FS=OFS="|"} FNR==1{OFS="^"; $1=$1; OFS=FS} 1' f
name^age
Bob|30
Tom|50
Cindy|10
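Details:
FS=OFS="|": Sets both FS and OFS to |
FNR==1: Applies the block to the first line only
OFS="^": Sets OFS to ^
$1=$1: Forces awk to rebuild the record, joining the fields with the current OFS
OFS=FS: Restores OFS to | for the remaining lines
1: An always-true condition, so every line is printed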
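A minimal sketch of what $1=$1 does on its own, assuming the first two lines of the sample are fed in with printf instead of reading f. Changing OFS alone leaves $0 untouched; only the field assignment makes awk rebuild the record with the new OFS.

```shell
#!/bin/sh
# Sketch: the first two sample lines are inlined with printf instead of reading f.
result=$(printf 'name|age\nBob|30\n' | awk -F'|' '
NR == 1 {
    OFS = "^"
    print        # $0 has not been rebuilt yet, so this prints name|age
    $1 = $1      # assigning any field makes awk rebuild $0 using OFS
    print        # this prints name^age
}')
printf '%s\n' "$result"
```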

You can also use sed:
sed '1 s/|/^/' ip.txt
1: the address for the command, which here is the first line
| is not special, because sed uses BRE by default; see this Q&A for BRE vs ERE differences
Use 1 s/|/^/g if the first line can have multiple matches
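A small sketch of the difference the g flag makes, using a hypothetical input whose first line has two | separators (printf stands in for ip.txt):

```shell
#!/bin/sh
# Hypothetical input: the first line has two | separators.
input='a|b|c
x|y|z'
first=$(printf '%s\n' "$input" | sed '1 s/|/^/')    # only the first | on line 1
every=$(printf '%s\n' "$input" | sed '1 s/|/^/g')   # every | on line 1
printf '%s\n' "$first" "$every"
```

Line 2 is left alone in both cases, because the 1 address restricts the substitution to the first line.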

Like this:
awk -F'|' 'NR==1{print $1,$2;next}1' OFS='^' file
Or, a mix of anubhava's answer and mine:
awk -F'|' 'NR==1{$1=$1}1' OFS='^' file

Could you please try the following.
awk 'FNR==1{sub(/\|/,"^")} 1' Input_file
Use gsub in place of sub in case multiple occurrences need to be changed.
awk 'FNR==1{gsub(/\|/,"^")} 1' Input_file
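A sketch contrasting sub and gsub on a hypothetical first line with two | characters. It also shows the string form of the same escaped regex: inside a string the backslash must be doubled ("\\|"), because awk processes string escapes before using the string as a regex. The variable names are illustrative only.

```shell
#!/bin/sh
# Hypothetical input: the first line has two | separators.
input='a|b|c
x|y|z'
# /\|/ is a regex literal with the | escaped, so it matches a literal pipe.
one=$(printf '%s\n' "$input" | awk 'FNR==1{sub(/\|/,"^")} 1')
all=$(printf '%s\n' "$input" | awk 'FNR==1{gsub(/\|/,"^")} 1')
# String form: "\\|" becomes the regex \| after string escape processing.
str=$(printf '%s\n' "$input" | awk 'FNR==1{gsub("\\|","^")} 1')
printf '%s\n' "$one" "$all" "$str"
```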

Related

How to extract string from a file in bash

I have a file called DB_create.sql which has this line:
CREATE DATABASE testrepo;
I want to extract only testrepo from this. So I've tried:
cat DB_create.sql | awk '{print $3}'
This gives me testrepo;
I need only testrepo. How do I get this?
With your shown samples, please try the following.
awk -F'[ ;]' '{print $(NF-1)}' DB_create.sql
OR
awk -F'[ ;]' '{print $3}' DB_create.sql
OR without setting any field separators try:
awk '{sub(/;$/,"");print $3}' DB_create.sql
Simple explanation: make the field separator a space OR a semicolon, then print the second-to-last field ($(NF-1)), which is what the OP needs here. Also, you don't need cat with awk, because awk can read Input_file by itself.
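To see why $(NF-1) is the right field, here is a sketch that prints NF alongside the value (the line from DB_create.sql is inlined with printf instead of reading the file):

```shell
#!/bin/sh
# The sample line from DB_create.sql, inlined with printf.
result=$(printf 'CREATE DATABASE testrepo;\n' | awk -F'[ ;]' '{
    print NF         # the trailing semicolon produces an empty last field
    print $(NF-1)    # so the wanted value is the second-to-last field
}')
printf '%s\n' "$result"
```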
Using GNU awk, you can set the record separator to ; followed by a line break:
awk -v RS=';\r?\n' '{print $3}' file.sql
testrepo
Or, using any POSIX awk, just call sub to strip the trailing ;:
awk '{sub(/;$/, "", $3); print $3}' file.sql
testrepo
You can use
awk -F'[;[:space:]]+' '{print $3}' DB_create.sql
where the field separator is set to a [;[:space:]]+ regex that matches one or more occurrences of ; and/or whitespace chars. Then, Field 3 will contain the string you need without the semicolon.
More pattern details:
[ - start of a bracket expression
; - a ; char
[:space:] - any whitespace char
] - end of the bracket expression
+ - a POSIX ERE one or more occurrences quantifier.
See the online demo.
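A sketch of why the + and [:space:] matter, using a hypothetical variant of the line with two spaces, a tab, and a space before the ;. The narrower [ ;] separator (single space or semicolon, no +) splits this line differently:

```shell
#!/bin/sh
# Hypothetical line: two spaces, a tab, and a space before the semicolon.
line='CREATE  DATABASE\ttestrepo ;\n'
# One or more semicolons and/or whitespace chars act as a single separator.
result=$(printf "$line" | awk -F'[;[:space:]]+' '{print $3}')
# Each single space or semicolon is a separator, so empty fields appear
# and the tab is not treated as a separator at all.
narrow=$(printf "$line" | awk -F'[ ;]' '{print $3}')
printf '%s\n' "$result" "$narrow"
```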
Use your own code, but add the function sub():
cat DB_create.sql | awk '{sub(/;$/, "",$3);print $3}'
Although it's better not to use cat. Here you can see why: Comparison of cat pipe awk operation to awk command on a file
So this way is better:
awk '{sub(/;$/, "",$3);print $3}' file

How to print specific string from a sentence using awk

I have the following sentence within a file:
FQDN=joe.blogs.com.
How can I print the string "joe"?
I have tried using awk -F"=" '{print $2}' file
but this returns joe.blogs.com, as "=" is the only delimiter.
Is it possible to use 2 delimiters on the same line?
You might use a regular expression as FS. Let the content of file.txt be
FQDN=joe.blogs.com.
then
awk 'BEGIN{FS="[=.]"}{print $2}' file.txt
Output:
joe
In case you are OK with sed, could you please try the following.
sed 's/.*=\([^.]*\)\..*/\1/' Input_file
With GNU grep and its -oP flags, we could try the following too.
grep -oP '(.*=)\K([^.]*)' Input_file
You could use GNU grep:
grep -oP '(?<=FQDN=)[^.]+' file
Where:
[^.]+ : all characters up to a '.'
(?<=FQDN=) : lookbehind for 'FQDN='
-oP : only print the match, using Perl-style regex
Or with Perl:
perl -lne 'print $1 if /(?<=FQDN=)([^.]+)/' file
With awk I would probably do:
awk 'BEGIN{FS="[.=]"} /FQDN=/{print $2}' file
Why not keep it simple and pipe one awk into another?
awk -F"=" '{print $2}' file | awk -F"." '{print $1}'
Can I use two field delimiters on one line?
No. You may do further string manipulation as post-processing, or you could use a regex as the field delimiter.
Another option is to use awk's split function:
awk -F= '{ split($2,map,".");print map[1] }' file
Split the second =-separated field into the array map, using "." as the delimiter. Print the first element of the array.
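A sketch of the split call on the sample line, also printing the number of elements split returns (the input is inlined with printf instead of reading file):

```shell
#!/bin/sh
result=$(printf 'FQDN=joe.blogs.com.\n' | awk -F= '{
    n = split($2, map, ".")   # a single-character separator is matched literally
    print n                   # the trailing dot leaves an empty last element
    print map[1]
}')
printf '%s\n' "$result"
```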

Issue with field separator in AWK script

I have a very large file, with a few sample lines shown below. Each line has two fields, name and revision, separated by a colon. I need to print only the second column.
sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58
I tried this code:
#!/bin/bash
awk -F: '{print $2}'
This prints:
7.[0
8.[6
58
Output should be:
7.[0:6]
8.[6:6]_rev[2:4] h_ack[2:6]
58
What went wrong in my code?
The problem in your awk expression is that you are splitting on all :.
Instead, you want to split only on the first : from the start.
$ awk -F'^[^:]+:' '{print $2}' file
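The regex pattern matches the start of the string (^), one or more characters other than : ([^:]+), and finally a :. Because this separator can only match at the start of the record, $1 is empty and $2 holds everything after the first :.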
If you specify : as the field separator, it is normal behavior for awk to output 7.[0 as $2, because the rest of the value ends up in the columns after $2.
cut better suits the requirement here:
cut -d: -f2- file
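A quick check of the cut command against the three sample lines, inlined here with printf instead of reading file. The -f2- range keeps field 2 and every field after it, so the colons inside the revision survive:

```shell
#!/bin/sh
# The three sample lines from the question.
input='sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58'
# Cut on ':' and keep field 2 through the end of the line.
result=$(printf '%s\n' "$input" | cut -d: -f2-)
printf '%s\n' "$result"
```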
Could you please try the following.
awk '
match($0,/:.*/){
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file

Print word after string

How do I print the word after a certain string?
I have a file:
Hello word.
I want to get this output:
#word
Something like:
awk '{ print '#' }'
Thank you
You may use
awk '{sub(/[[:punct:]]+$/, "", $2);print "#"$2}' file > newfile
See the online demo.
The sub(/[[:punct:]]+$/, "", $2) operation removes 1 or more punctuation chars ([[:punct:]]+) from the end ($) of Field 2 and print "#"$2 prepends it with # and prints it.
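A sketch of the sub call on the sample line plus a hypothetical second line ending in several punctuation characters, which the + quantifier removes in one go (printf stands in for file):

```shell
#!/bin/sh
# Second line is hypothetical, with a run of trailing punctuation.
result=$(printf 'Hello word.\nHello world!!!\n' |
    awk '{sub(/[[:punct:]]+$/, "", $2); print "#"$2}')
printf '%s\n' "$result"
```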
To make sure you get the word after Hello, you may use
cat file | grep -Po 'Hello\s+\K\w+' | awk '{print "#"$0}'
See the online demo
Or, with the help of sed:
sed 's/.*Hello \([[:alnum:]]*\).*/#\1/' file > newfile
See another demo.
Here, .*Hello \([[:alnum:]]*\).* matches any 0+ chars, then Hello, a space, then captures 0 or more alphanumeric chars into Group 1 (\1) and then matches the rest of the line. The #\1 pattern in the RHS leaves just what was captured with a # in front.
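A sketch running the sed command on the sample plus two hypothetical lines: one with Hello in the middle and one without Hello at all.

```shell
#!/bin/sh
# Lines 2 and 3 are hypothetical test input.
result=$(printf 'Hello word.\nSay Hello there, friend\ngoodbye\n' |
    sed 's/.*Hello \([[:alnum:]]*\).*/#\1/')
printf '%s\n' "$result"
```

Lines that do not match (goodbye here) are printed unchanged; running sed -n with the p flag on the s command would print only the substituted lines.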
Another solution with awk, using . and / as field delimiters (-F'[/.]'):
awk -F'[/.]' '{for(i=1;i<=NF;i++){if($i=="54517334"){print "#"$(i+1)}}}' file
See an online demo
Here, for(i=1;i<=NF;i++) enumerates all the fields, and if the field is equal to 54517334, the next field with a # prepended at the start is printed.
EDIT: In case you want to search for a string and print the string next to it, try the following. I am looking for the string hello here; you can change it as per your need.
awk '{for(i=1;i<=NF;i++){if(tolower($i)=="hello"){print "#"$(i+1)}}}' Input_file
OR (create an awk variable and compare against it; this lets you change the search string in the variable itself instead of editing hello in the code)
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){print "#"$(i+1)}}}' Input_file
In case you want to find a string, print the word after it, and remove punctuation from that word, try the following.
awk -v word_to_search="hello" '{for(i=1;i<=NF;i++){if(tolower($i)==word_to_search){sub(/[[:punct:]]+/,"",$(i+1));print "#"$(i+1)}}}' Input_file
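A sketch of the last command, with a hypothetical second line in upper case to show that tolower makes the comparison case-insensitive (printf stands in for Input_file):

```shell
#!/bin/sh
# Second line is hypothetical, written in upper case on purpose.
result=$(printf 'Hello word.\nHELLO there!\n' | awk -v word_to_search="hello" '{
  for(i=1;i<=NF;i++){
    if(tolower($i)==word_to_search){
      sub(/[[:punct:]]+/,"",$(i+1))   # strip punctuation from the next word
      print "#"$(i+1)
    }
  }
}')
printf '%s\n' "$result"
```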
Could you please try the following.
awk -F'[ .]' '{print "#"$2}' Input_file
OR
awk -F'[ .]' '{print "#"$(NF-1)}' Input_file

Grep part of string after symbol and shuffle columns

I would like to take the number after the - sign and put it as column 2 in my matrix. I know how to grep the string, but not how to print it after the text string.
in:
1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC
4-376323 GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
5-221398 GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT
6-180339 TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT
out:
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
awk -F'[[:space:]-]+' '{print $3,$2}' file
Seems like a simple substitution should do the job:
sed -E 's/[0-9]+-([0-9]+)[[:space:]]*(.*)/\2 \1/' file
Capture the parts you're interested in and use them in the replacement.
Alternatively, using awk:
awk 'sub(/^[0-9]+-/, "") { print $2, $1 }' file
Remove the leading digits and - from the start of the line. When this is successful, sub returns true, so the action is performed, printing the second field, followed by the first.
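A sketch showing the effect of using sub's return value as the pattern: a hypothetical header line without the digits- prefix is skipped, because sub makes no replacement there and returns 0.

```shell
#!/bin/sh
# Two rows from the question plus a hypothetical header line without the N- prefix.
input='1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
header without a numeric prefix
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC'
# sub() returns the number of replacements (0 or 1), which acts as the pattern.
result=$(printf '%s\n' "$input" | awk 'sub(/^[0-9]+-/, "") { print $2, $1 }')
printf '%s\n' "$result"
```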
Using regex ( +|-) as field separator:
$ awk -F"( +|-)" '{print $3,$2}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
Here is another awk:
$ awk 'split($1,a,"-") {print $2,a[2]}' file
awk '{sub(/.-/,"");print $2,$1}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339