Giving an argument list of file names extracted from another tab-separated file - awk

I have a tab-delimited file as below. The first column holds a list of file names without the .txt extension, which I want to pass as an argument list to another awk command.
File1 abcd xyz 234 pqr
File2 abcd xyz 234 pqr
File3 abcd xyz 234 pqr
File4 abcd xyz 234 pqr
For example, assuming this is my awk command, I want to pass arguments as
awk -F"\t" '---Command-----' File1.txt File2.txt File3.txt File4.txt >> Final.txt
so that it takes each value from the first column, with the .txt extension appended, as an input file and creates the Final.txt output file. Note that the number of columns may vary each time.
I thought of doing this in a bash script, but I am not able to build the argument list by appending each successive value from the first column as the next argument.

Going by my understanding of your requirements, you want to read the tab-separated file, take the file names from column 1, add the .txt extension to them, and pass them to another command. First, use mapfile to collect the names from the tab-separated file:
mapfile -t fileNames < <(awk -v FS="\t" '{print $1}' tabfile)
Now, to pass this as an argument list to another command, expand the quoted array while suffixing the .txt extension to each element:
awk ... "${fileNames[@]/%/.txt}"
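Putting the two steps together, a minimal sketch, assuming the tab-separated list is named tabfile and the output goes to Final.txt as in the question ('{print $0}' stands in for the real command):
#!/usr/bin/env bash
# collect column 1 of the tab-separated file into an array
mapfile -t fileNames < <(awk -v FS="\t" '{print $1}' tabfile)
# expand the array, appending .txt to each element, as awk's argument list
awk -F"\t" '{print $0}' "${fileNames[@]/%/.txt}" >> Final.txt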

Not completely sure here as the requirement is not fully clear. Based on your statement that you want to get file names from one awk and pass them to another awk, the following could be tried.
awk '{print $0}' <(awk 'NF{print $1".txt"}' Input_file)
Instead of print $0 you could do your operations here; I just printed the lines to check that the file names come through properly. Also add -F"\t" to the second awk (the one reading Input_file) in case your Input_file is tab-delimited, and change $1 to another field in case the file names are not in the first column.
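Note that the command above only prints the file names; to actually open those files with the outer awk, the names have to become arguments rather than input. One hedged way to do that, assuming the file names contain no whitespace, is xargs:
awk 'NF{print $1".txt"}' Input_file | xargs awk -F"\t" '{print $0}' >> Final.txt
Here again '{print $0}' is a placeholder for the real processing.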

You can try this awk
awk '{file=$1".txt"; while ((getline < file) > 0) print $2; close(file)}' infile
It appends .txt to each $1 of infile to get a file name like File2.txt, then prints $2 of that file if it exists. The parenthesized (getline < file) > 0 test reads line by line and stops cleanly at end-of-file or on a read error, and close(file) avoids exhausting open file descriptors when infile lists many names.
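Since the question's files are tab-delimited, a hedged variant that keeps the tab separator when re-reading, and prints whole lines instead of $2, could be:
awk -F'\t' '{file=$1".txt"; while ((getline line < file) > 0) print line; close(file)}' infile >> Final.txt
Reading into the variable line leaves the current record's fields untouched.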

Related

I need to sum all the values in a column across multiple files

I have a directory with multiple csv text files, each with a single line in the format:
field1,field2,field3,560
I need to output the sum of the fourth field across all files in a directory (can be hundreds or thousands of files). So for an example of:
file1.txt
field1,field2,field3,560
file2.txt
field1,field2,field3,415
file3.txt
field1,field2,field3,672
The output would simply be:
1647
I've been trying a few different things, with the most promising being an awk command that I found here in response to another user's question. It doesn't quite do what I need it to do, and I am an awk newb so I'm unsure how to modify it to work for my purpose:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]}' file1.txt file2.txt
This correctly outputs 975.
However, if I try to pass it a 3rd file, rather than adding field 4 from all 3 files, it adds file1 to file2, then file1 to file3:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]}' file1.txt file2.txt file3.txt
975
1232
Can anyone show me how I can modify this awk statement to accept more than two files or, ideally (because there are thousands of files to sum up), use an * glob to output the sum of the fourth field of all files in the directory?
Thank you for your time and assistance.
A couple of issues with the current code:
NR==FNR is used to indicate special processing for the 1st file; in this case there is no processing that is 'special' for just the 1st file (ie, all files are to be processed the same)
an array (eg, a[NR]) is used to maintain a set of values; in this case you only have one global value to maintain so there is no need for an array
Since you're only looking for one global sum, a bit simpler code should suffice:
$ awk -F',' '{sum+=$4} END {print sum+0}' file{1..3}.txt
1647
NOTES:
in the (unlikely?) case all files are empty, sum will be undefined, so print sum would display a blank line; sum+0 ensures we print 0 if sum remains undefined (ie, all files are empty)
for a variable number of files, file{1..3}.txt can be replaced with whatever pattern will match the desired set of files, eg, file*.txt, *.txt, etc; for very large file counts, see the sketch below
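With thousands of files, an expanded glob can exceed the shell's argument-length limit (ARG_MAX). A hedged workaround is to let find batch the files and pipe everything into a single awk, so END fires exactly once and prints one grand total:
$ find . -maxdepth 1 -name '*.txt' -exec cat {} + | awk -F',' '{sum+=$4} END {print sum+0}'
(-maxdepth 1 keeps find from descending into subdirectories; it is available in GNU and BSD find.)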
Here we go (no need to test NR==FNR in a concatenation):
$ cat file{1,2,3}.txt | awk -F, '{count+=$4}END{print count}'
1647
Or the same without the extra pipe:
$ awk -F, '{count+=$4}END{print count}' file{1,2,3}.txt
1647
$ perl -MList::Util=sum0 -F, -lane'push @a,$F[3];END{print sum0 @a}' file{1..3}.txt
1647
$ perl -F, -lane'push @a,$F[3];END{foreach(@a){ $sum +=$_ };print "$sum"}' file{1..3}.txt
1647
$ cut -d, -f4 file{1..3}.txt | paste -sd+ - | bc
1647

AWK print specific line & header

I have a tab-separated file containing 1000+ rows and a header. The samples are defined by the value in column 1. I want to split the file into multiple files by column 1 but ALSO include the header. Currently I can easily split into files using:
awk -F'\t' '{print>$1}' file.tab
and that gives me x files, each containing all the rows pertaining to one sample. However, I also want to include the header (row 1) in each of these files. How can I go about doing this?
Thanks.
Command:
awk -F'\t' 'NR==1 { H=$0; next } {if(!d[$1]) print H>$1; print>$1; d[$1]=1 }' file.tab
Example input:
FN DATA
1.txt abc
2.txt bcd
1.txt xyz
1.txt:
FN DATA
1.txt abc
1.txt xyz
2.txt:
FN DATA
2.txt bcd
Another similar awk:
awk -F'\t' 'NR==1{h=$0; next} {print (a[$1]++?"":h ORS) $0 > $1}' file
The trick is keeping track of which headers have already been printed, indexed by the file-name key. If the file is sorted by key, there is an easier solution, sketched below.
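A minimal sketch of that easier solution, assuming the data rows are sorted on column 1: on each key change, close the finished output file and start the new one with the header, so at most one output file is open at a time:
awk -F'\t' 'NR==1{h=$0; next} $1!=prev{if (prev) close(prev); print h > $1; prev=$1} {print > $1}' file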

awk/gawk - remove line if field 2 doesn't exist

I have a .txt file with 2 fields per line and a separator; some lines only contain 1 field though, so I want to remove those that only contain 1 field.
Example lines:
Line to keep,
Iamnotyours:email@email.com
Line to remove,
Iamnotyours:
Given your posted sample input all you need is:
grep -v ':$' file
or if you insist on awk for some reason:
awk '!/:$/' file
If that's not all you need then edit your question to clarify your requirements.
awk to the rescue!
$ awk -F: 'NF==2' file
prints only the lines with two fields
$ awk -F: 'NF>1' file
prints lines with more than one field. In your case the separator is in place, so the field count will still be two; you need to check whether the second field is empty:
$ awk -F: '$2!=""' file
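For the posted sample data (assuming the file holds just the two data lines), a run might look like:
$ printf 'Iamnotyours:email@email.com\nIamnotyours:\n' | awk -F: '$2!=""'
Iamnotyours:email@email.com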

I'm trying to compare two fastq files (paired reads) and print line number n of the other file

I'm trying to compare two fastq reads (paired reads) such that the position (line number) of a pattern match in file1.fastq is looked up in file2.fastq, and I want to print what lies at the same position or line number in file2.fastq. I'm trying to do this with awk. For example, if my pattern match lies on line number 200 in file1, I want to see what is on line 200 in file2. Any suggestion on this is appreciated.
In general, you want this form:
awk '
{ getline line_2 < "file2" }
/pattern/ { print FNR, line_2 }
' file1
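As a concrete invocation (the file names and pattern here are hypothetical):
awk '{ getline line_2 < "file2.fastq" } /^@read42/ { print FNR, line_2 }' file1.fastq
One caveat: getline does not signal end-of-file in this form, so if file2 is shorter than file1, line_2 silently keeps its last value; with properly paired fastq files the lengths match, so this is safe.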
Alternately, paste the files together first (assuming your shell is bash)
paste -d $'\1' file1 file2 | awk -F $'\1' '$1 ~ /pattern/ {print FNR, $2}'
I'm using Ctrl-A (octal \1) as the field delimiter, assuming that character does not appear in your files.
My understanding is that you have three files: a pattern file and two data files. You want to find the line numbers of the patterns in data file 1 and find the corresponding lines in data file 2. You'll get more help if you can clarify the question and perhaps provide input files and expected output.
awk to the rescue!
awk -F: -vOFS=: 'NR==FNR{lines[$1]=$0;next} FNR in lines{print lines[FNR],$0}' <(grep -nf pattern data1) data2
will print the line number, the pattern-matching line from data file 1, and the corresponding line from data file 2. For my made-up files with quasi-random data I got:
1:s1265e:s1265e
2:s28629e:s28629e
3:s6630e:s6630e
4:s24530e:s24530e
5:s23216e:s23216e
6:s25985e:s25985e
My novice attempt so far
zcat file1.fastq.gz | awk '$0~/pattern/{print NR}' > matches.csv
awk 'FNR==NR{a[$1]=$0; next} (FNR in a)' matches.csv <(zcat file2.fastq.gz)
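Those two steps can also be combined into a single awk call; a sketch, assuming both inputs are gzipped and pattern stands in for the actual regex:
awk 'NR==FNR{if ($0~/pattern/) want[FNR]=1; next} FNR in want' <(zcat file1.fastq.gz) <(zcat file2.fastq.gz)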

Using grep and awk to search and print the output to new file

I have 100 files and want to search for a specific word in the first column of each file and print all columns of the matching lines to a new file.
I tried this code but it doesn't work well; it prints only the content of one file, not all:
ls -t *.txt > Filelist.tmp
cat Filelist.tmp | while read line do; grep "searchword" | awk '{print $0}' > outputfile.txt; done
This is what you want:
$ awk '$1~/searchword/' *.txt >> output
This compares the first field against searchword and appends the line to output if it matches. The default field separator with awk is whitespace.
The main problem with your attempt is that you are overwriting (>) the file every time; you want to be appending with >>.
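For completeness, a hedged fix of the original loop (the do keyword moves after the semicolon, grep gets "$line" as its input file, the redirection becomes an append, and the no-op awk '{print $0}' stage is dropped):
ls -t *.txt > Filelist.tmp
cat Filelist.tmp | while read -r line; do grep "searchword" "$line" >> outputfile.txt; done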