#!/bin/bash
cat $1 | awk ' /Info/, /<\/Body>/ {print $0}' | while read line; do
file=`awk -F '>' "{print $4}"`
echo "$file"
done
Basically, the input file has some information extracted by the first awk command. What I'm trying to do is use awk -F to print what comes after the third '>', which is the 4th field. I can't just search for the '>' itself, because the file is HTML and has hundreds of them.
Update: maybe someone can answer this. When I run the script now, it doesn't look at the fourth field; it just removes all of the '>' characters, which is not the goal. I am trying to locate the field that comes after the third '>', which would be field 4, but that's not what I'm getting. Any help would be great!
awk requires two parts:
Options
A file to work on
In your example, you have given the options, i.e. the delimiter and what to print, but you have not mentioned which file to work with.
Try this
On the command prompt
cat file | awk -F ">" '{print $4}'
In script
result=`cat file | awk -F ">" '{print $4}'`
echo $result
For a text file named "file" containing the data
a>b>c>d>e>f
Both the above will display 'd'.
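For reference, here is a minimal sketch of the original script with both issues fixed: single quotes around the awk program so the shell does not expand $4 itself, and the current line fed to the inner awk on stdin. The /Info/ range pattern is kept from the question.

```shell
#!/bin/bash
# Print the block from the line matching /Info/ through </Body>,
# then take the 4th '>'-separated field of each line in that block.
awk '/Info/, /<\/Body>/' "$1" |
while IFS= read -r line; do
    # Single quotes keep $4 for awk; printf feeds awk the current line.
    file=$(printf '%s\n' "$line" | awk -F '>' '{print $4}')
    echo "$file"
done
```

This is only a sketch of the structure; the exact range pattern and field number depend on your actual HTML.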
Related
cat file1.txt | awk -F '{print $1 "|~|" $2 "|~|" $3}' > file2.txt
I am using the above command to filter the first three columns from file1.txt and put them into file2.txt.
But I am only getting the column names, not the column data.
How can I do that?
|~| is the delimiter.
file1.txt has values as :
a|~|b|~|c|~|d|~|e
1|~|2|~|3|~|4|~|5
11|~|22|~|33|~|44|~|55
111|~|222|~|333|~|444|~|555
My expected output is:
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
With your shown samples, please try the following awk code. You need to set the field separator to |~| and remove the leading space from the lines, then print them.
awk -F'\\|~\\|' -v OFS='|~|' '{sub(/^[[:blank:]]+/,"");print $1,$2,$3}' Input_file
In case you want to keep the spaces (which were in the initial post, before the edit), try the following:
awk -F'\\|~\\|' -v OFS='|~|' '{print $1,$2,$3}' Input_file
NOTE: I had a chat with the user in a room and found out why this code was not working for them: gunzip -c file was being used incorrectly, and its output was being saved into a variable on which the user was running the awk program. Correcting that command generated the right file, and the awk program ran fine on it. Adding this as a reference for future readers.
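As an illustration of that pitfall (the file name is a placeholder), the fix is to stream the decompressed data straight into awk rather than going through a shell variable:

```shell
# Wrong pattern: capturing the decompressed data in a variable and
# echoing it unquoted mangles the data (newlines collapse to spaces):
#   data=$(gunzip -c file1.gz); echo $data | awk ...
# Better: pipe gunzip output directly into awk.
gunzip -c file1.gz |
    awk -F'\\|~\\|' -v OFS='|~|' '{print $1,$2,$3}'
```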
One approach would be:
awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
The approach simply replaces all "|~|" with ",", while setting the output field separator to "|~|". All leading whitespace is trimmed with sub().
Example Use/Output
With your data in file1.txt, you would have:
$ awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
Let me know if this is what you intended. You can simply redirect, e.g. > file2.txt to write to the second file.
For such cases, my bash+awk script rcut comes in handy:
rcut -Fd'|~|' -f-3 ip.txt
The -F option enables a fixed-string input delimiter (which is given using the -d option). By default, the output field separator will also be the same as -d when -F is active. -f-3 is similar to cut syntax for specifying the first three fields.
For better speed, use hck command:
hck -Ld'|~|' -D'|~|' -f-3 ip.txt
Here, -L enables literal field separator and -D specifies output field separator.
Another benefit is that hck supports -z option to automatically handle common compressed formats based on filename extension (adding this since OP had an issue with compressed input).
Another way:
sed 's/|~|/\t/g' file1.txt | awk '{print $1"|~|"$2"|~|"$3}' > file2.txt
First replace the |~| delimiter with a tab, let awk use its default separator, then print the columns you need.
I've got a file with the following records:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001
depots/import/HDN1YYAA_20102018.txt;32;CLI001
depots/import/HDN1YYAA_25102018.txt;1;CAB001
depots/import/HDN1YYAA_50102018.txt;1;CAB001
depots/import/HDN1YYAA_65102018.txt;1;CAB001
depots/import/HDN1YYAA_80102018.txt;2;CLI001
depots/import/HDN1YYAA_93102018.txt;2;CLI001
When I execute the following awk one-liner:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR==1){print $1}}END {}'
the output is not what I expected:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
while I was supposed to get only the first column.
If I run it through all the records:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR>0){print $1}}END {}'
then it will start filtering only after the second line and I get the following output:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_25102018.txt
depots/import/HDN1YYAA_50102018.txt
depots/import/HDN1YYAA_65102018.txt
depots/import/HDN1YYAA_80102018.txt
depots/import/HDN1YYAA_93102018.txt
Does anybody know why awk is skipping only the first line?
I tried deleting the first record, but the behaviour is the same: it still skips the first line.
First, it should be
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}END {}' filename
You can omit the END block if it is empty:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}' filename
You can use the -F command line argument to set the field delimiter:
awk -F';' '{if(NR==1){print $1}}' filename
Furthermore, awk programs consist of a sequence of CONDITION [{ACTIONS}] elements, so you can omit the if:
awk -F';' 'NR==1 {print $1}' filename
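As a side note on why the original one-liner skips only the first line: an assignment to FS inside the main rule takes effect when the next record is split, so line 1 has already been split on the default whitespace separator. A small demonstration:

```shell
# FS is changed only after the first record has been read and split,
# so $1 for line 1 is the whole line, while line 2 is split on ';'.
printf 'a;b\nc;d\n' | awk '{FS=";"} {print $1}'
# prints:
#   a;b
#   c
```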
You need to specify the delimiter either in the BEGIN block or as a command-line option:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}'
awk -F ';' '{ if(NR==1){print $1}}'
cut might be better suited here, for all lines
$ cut -d';' -f1 file
to skip the first line
$ sed 1d file | cut -d';' -f1
to get the first line only
$ sed 1q file | cut -d';' -f1
however at this point it's better to switch to awk
if you have a large file and are only interested in the first line, it's better to exit early
$ awk -F';' '{print $1; exit}' file
I have following lines in a file. Please note I have intentionally kept the extra hash between 2 and 0 in the 2nd line.
File name : test.txt
Name#|#Age#|#Dept
AC#|#2#0#|#Science
BC#|#22#|#Commerce
I am using awk to get the data in Dept column
awk -F "#|#" -v c="Dept" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){p=i; break}; next} {print $p}' "test.txt" >> result.txt
The result.txt shows me the following
|
Commerce
The first line of the output is coming out as a pipe because of the extra # in the second input line.
Can anyone help with this?
Currently the delimiter means: match # or #. The pipe character | acts in this case as an OR (alternation) operator; instead, try using:
awk -F '#[|]#' ...
By putting | into a character class [ ... ], awk will match it literally.
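A quick side-by-side on the sample line from the question shows the difference:

```shell
line='AC#|#2#0#|#Science'
# "#|#" is a regex alternation, so awk splits on every single '#':
printf '%s\n' "$line" | awk -F '#|#' '{print $2}'    # prints: |
# "#[|]#" matches the literal three-character delimiter:
printf '%s\n' "$line" | awk -F '#[|]#' '{print $2}'  # prints: 2#0
```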
If you want to extract Dept from your content, here's another option you can choose:
awk -F'#' 'NR>1{print $NF}' test.txt
output:
Science
Commerce
I have 3 fasta files like following
>file_1_head
haszhaskjkjkjkfaiezqbsga
>file_1_body
loizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdja
>file_1_tail
mnnbasnbdnztoaosdhgas
I would like to concatenate them into a single like following
>file_1
haszhaskjkjkjkfaiezqbsgaloizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdjamnnbasnbdnztoaosdhgas
I tried with the cat command, cat file_1_head.fasta file_1_body.fasta file_1_tail.fasta, but it didn't concatenate them into a single line like the above. Is it possible with awk? Kindly guide me.
Do you mean your three files have the content
file_1_head.fasta
>file_1_head
haszhaskjkjkjkfaiezqbsga
file_1_body.fasta
>file_1_body
loizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdja
and file_1_tail.fasta
>file_1_tail
mnnbasnbdnztoaosdhgas
including the name of each of them within them as the first line?
Then you could do
(echo ">file_1"; tail -qn -1 file_1_{head,body,tail}.fasta | tr -d "\n\t ") > file_1.fasta
to get file_1.fasta as
>file_1
haszhaskjkjkjkfaiezqbsgaloizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdjamnnbasnbdnztoaosdhgas
This also removes some extra whitespace at the end of the lines in your input that I got when I copied them verbatim.
You can do this simply with
cat file1 file2 file3 | tr -d '\n' > new_file
tr deletes the newline characters.
EDIT:
For your specific first line just do
echo ">file_1" > new_file
cat file1 file2 file3 | tr -d '\n' >> new_file
The first command creates the file with the single line >file_1 in it. The cat ... command then appends the joined sequence to this file. (Note the quotes around ">file_1": without them, the > would be taken as a redirection.)
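One caveat with the cat | tr approach: each input file's own header line (>file_1_head and so on) also gets joined into the sequence. A variant that drops those per-file headers first, assuming the file names from the question and that each sequence may span several lines:

```shell
{
  echo ">file_1"
  # tail -n +2 skips each file's own header line before joining
  tail -q -n +2 file_1_head.fasta file_1_body.fasta file_1_tail.fasta |
      tr -d '\n'
  echo    # terminate the joined sequence with a newline
} > file_1.fasta
```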
What about this?
awk 'BEGIN { RS=""} {for (i=1;i<=NF;i++) { printf "%s",$i } }' f1_head f1_body f1_tail
I have 100 files and want to search a specific word in the first column of each file and print the content of all columns from this word to a new file
I tried this code, but it doesn't work well: it prints only the content of one file, not all of them:
ls -t *.txt > Filelist.tmp
cat Filelist.tmp | while read line do; grep "searchword" | awk '{print $0}' > outputfile.txt; done
This is what you want:
$ awk '$1~/searchword/' *.txt >> output
This compares the first field against searchword and appends the line to output if it matches. The default field separator in awk is whitespace.
The main problem with your attempt is that you are overwriting (>) the file every time; you want to use append (>>).
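If you prefer to keep a per-file loop like the one in the question, here is a minimal corrected sketch (searchword is a placeholder; the single redirect after done writes all matches at once instead of reopening the output on every iteration):

```shell
# Loop over the files directly; each awk call gets a real file name,
# and the one redirect after 'done' collects every file's matches.
for f in *.txt; do
    awk '$1 ~ /searchword/' "$f"
done > output
```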