awk command to print columns with colum data - awk

cat file1.txt | awk -F '{print $1 "|~|" $2 "|~|" $3}' > file2.txt
I am using above command to filter first three columns from file1 and put into file.
But only getting the column names and not the column data.
How to do that?
|~| - is the delimiter.
file1.txt has values as :
a|~|b|~|c|~|d|~|e
1|~|2|~|3|~|4|~|5
11|~|22|~|33|~|44|~|55
111|~|222|~|333|~|444|~|555
my expedted output is :
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333

With your shown samples, please try following awk code. You need to set field separator to |~| and remove starting space from lines, then print the lines.
awk -F'\\|~\\|' -v OFS='|~|' '{sub(/^[[:blank:]]+/,"");print $1,$2,$3}' Input_file
In case you want to keep spaces(which was in initial post before edit) then try following:
awk -F'\\|~\\|' -v OFS='|~|' '{print $1,$2,$3}' Input_file
NOTE: Had a chat with user in room and got to know why this code was not working for user because of gunzip -c file was being used wrongly, its output was being saved into a variable on which user was running awk program, so correcting that command generated right file and awk program ran fine on it. Adding this as a reference for future readers.

One approach would be:
awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
The approach simply replaces all "|~|" with a "," setting the output file separator to "|~|". All leading whitespace is trimmed with sub().
Example Use/Output
With your data in file1.txt, you would have:
$ awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
Let me know if this is what you intended. You can simply redirect, e.g. > file2.txt to write to the second file.

For such cases, my bash+awk script rcut comes in handy:
rcut -Fd'|~|' -f-3 ip.txt
The -F option enables fixed string input delimiter (which is given using the -d option). And by default, the output field separator will also be same as -d when -F is active. -f-3 is similar to cut syntax to specify first three fields.
For better speed, use hck command:
hck -Ld'|~|' -D'|~|' -f-3 ip.txt
Here, -L enables literal field separator and -D specifies output field separator.
Another benefit is that hck supports -z option to automatically handle common compressed formats based on filename extension (adding this since OP had an issue with compressed input).

Another way:
sed 's/|~|/\t/g' file1.txt | awk '{print $1"|~|"$2"|~|"$3}' > file2.txt
First replace the |~| delimiter, and use the default awk separator, then print columns what you need.

Related

Remove '||'(double pipe) separated multiple columns from file using shell script command

Please suggest perfect shell script command to remove last two '||' delimiter separated columns from the file.(Lets assume below example)
File Name: abc.dat
"a1"||"a2"||"a3"||"a4"
"b1"||"b2"||"b3"||"b4"
"c1"||"c2"||"c3"||"c4"
output should be like :
"a1"||"a2"
"b1"||"b2"
"c1"||"c2"
I tried below cut and awk command but not worked:
awk -F '||' '{print $1$2}' ${file} >> ${file}
cut -d'||' -f2 --complement ${file} >> ${file} (not working as cut: the delimiter must be a single character)
With your shown samples, please try following. Make field separator as ||(escaping it to treat literal character) along with setting OFS to || too. Then print 1st and 2nd fields for each line of Input_file.
awk -F'\\|\\|' -v OFS="||" '{print $1,$2}' Input_file
Once you are happy with results of above command, also to make changes into Input_file itself try following.
awk -F'\\|\\|' -v OFS="||" '{print $1,$2}' Input_file > temp && mv temp Input_file
2nd solution: Using GNU grep try following.
grep -oP '^.*?\|\|"[^"]*' Input_file
Rather than assuming || is the delimiter, assume that | is the delimiter and the second field is empty.
$ cut -d'|' -f1-3 <<EOF
> "a1"||"a2"||"a3"||"a4"
> "b1"||"b2"||"b3"||"b4"
> "c1"||"c2"||"c3"||"c4"
> EOF
"a1"||"a2"
"b1"||"b2"
"c1"||"c2"
(This assumes that || was chosen for some aesthetic reason, rather than to allow for single pipes in each field.)
You may use:
awk '{sub(/(\|{2}[^|]*){2}$/, "")} 1' file
"a1"||"a2"
"b1"||"b2"
"c1"||"c2"
Or if you just want to remove last 2 columns without caring how many columns are there in total use:
awk -F '\\|{2}' -v OFS='||' '{
$NF = $(NF-1) = ""
sub(/([|]{2})*$/, "")
} 1' file

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt which prints only http because the seperator is :. What is the problem with my grep command and how should I print the entire URL.
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : in your input, getting $2 will not work in awk because it will just give you 2nd field. You actually need an equivalent of cut -d: -f2- but you also need to check key name that comes before first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
Could you please try following, written and tested with shown samples in GNU awk. This will look for string GOOGLE_URL and will catch further either http or https value from url, in case you need only https then change http[s]? to https in following solution please.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^GOOGLE_URL:/{ ##Checking condition if line starts from GOOGLE_URL: then do following.
match($0,/http[s]?:\/\/.*/) ##Using match function to match http[s](s optional) : till last of line here.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched value from above function.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you need anything coming after first : then try following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK following way for that task:
Let file.txt content be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: GNU AWK FS might be pattern, so I set it to GOOGLE_URL: anchored (^) to begin of line, so GOOGLE_URL: in middle/end will not be seperator (consider 3rd line of input). With this FS there might be either 1 or 2 fields in each line - latter is case only if line starts with GOOGLE_URL: so I check number of fields (NF) and if this is second case I print 2nd field ($2) as first record in this case is empty.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Append prefix to first column of a file with awk

I have a couple of hundreds of files which I want to process with xargs. They all need a fix of their first column.
Therefore I need an awk command to append the prefix "ID_" to the first column of a file (except for the first header line). Can anyone help me with this?
Something along the line:
gawk -f ';' "{$1='ID_' $1; print $0}" file.csv > file_processed.csv
I am not expert for the command, though. And I would rather like to have some inplace processing instead of making a copy of each file. Beforehand, I made it in VIM, but then I only had one file.
:%s/^-/ID_/
I hope someone can help me here.
gawk 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv > file_processed.csv
FS and OFS set the input and output field separators, respectively.
NR>1 checks whether current line number is larger than 1, so we don't modify the header line.
You can also modify the file in place with -i inplace option:
gawk -i inplace 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv
Edit
After elaborating the original question, here's the final version:
gawk -i inplace 'BEGIN{FS=OFS=";"} NR>1{sub(/^-/,"ID_",$2)} 1' file.csv
which substitutes - in the beginning of second column with ID_.
NR>1 action applies for all but first (header) line. 1 invokes the default default print action.
If you just want to do something, particularly adding a prefix, on the first field, it is not different from adding the prefix to the whole line.
So you can just awk '$0 = "ID_" $0' file.csv it should do the work. If you want to make it "change in place", you can:
awk '$0="ID_"$0' csv >/tmp/foo && mv /tmp/foo file.csv
You can also make use of sed:
sed -i 's/^/ID_/' file
The -i does "in-place modification"
You mentioned vim, and gave s/^-/ID_/ cmd, it doesn't add the prefix (ID_), it will replace the leading - by the ID_, they are different.

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' Somefile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
If the "patterns" are actually fixed strings, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' Somefile.txt File.txt
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh will support that, others will probably too, didn't tested.
Simpler as the above and supported by all shells would be to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!

Using each line of awk output as grep pattern

I want to find every line of a file that contains any of the strings held in a column of a different file.
I have tried
grep "$(awk '{ print $1 }' file1.txt)" file2.txt
but that just outputs file2.txt in its entirety.
I know I've done this before with a pattern I found on this site, but I can't find that question anymore.
I see in the OP's comment that maybe the question is no longer a question. However, the following slight modification will handle the blank line situation. Just add a check to make sure the line has at least one field:
grep "$(awk '{if (NF > 0) print $1}' file1)" file2
And if the file with the patterns is simply a set of patterns per line, then a much simpler version of it is:
grep -f file1 file2
That causes grep to use the lines in file1 as the patterns.
THere is no need to use grep when you have awk
awk 'FNR==NR&&NF{a[$0];next}($1 in a)' file2 file1
$(awk '{ print $1 }' file1.txt) | grep text > file.txt