awk/gawk - remove line if field 2 doesn't exist

I have a .txt file where each line has 2 fields and a separator, but some lines only contain 1 field, so I want to remove those.
Example lines:
Line to keep,
Iamnotyours:email#email.com
Line to remove,
Iamnotyours:

Given your posted sample input all you need is:
grep -v ':$' file
or if you insist on awk for some reason:
awk '!/:$/' file
If that's not all you need then edit your question to clarify your requirements.

awk to the rescue!
$ awk -F: 'NF==2' file
prints only the lines with two fields
$ awk -F: 'NF>1' file
prints lines with more than one field. In your case the separator is always in place, so the field count will be two even when the second field is empty; you need to check whether the second field is empty:
$ awk -F: '$2!=""' file
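For example, assuming the two sample data lines are saved in file, the last command keeps only the line whose second field is non-empty:
$ awk -F: '$2!=""' file
Iamnotyours:email#email.com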

Related

Print filenames & line number with number of fields greater than 'x'

I am running Ubuntu Linux. I need to print the filenames and line numbers of lines containing more than 7 columns, across several hundred thousand files.
I am able to print the number of columns per file using awk. However, the output I am after is something like
file1.csv-463, meaning that file1.csv has more than 7 fields on line 463. I am using the awk command awk -F"," '{print NF}' * to print the number of fields across all files.
Please could I request help?
If you have GNU awk, try the following code. It checks whether NF is greater than 7; if so, it prints that file's name along with the line number, and nextfile then jumps to the next input file, which saves time because the rest of the current file need not be read.
awk -F',' 'NF>7{print FILENAME,FNR;nextfile}' *.csv
The above prints only the very first match per file; to print all matching lines, try the following:
awk -F',' 'NF>7{print FILENAME,FNR}' *.csv
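Both commands print FILENAME and FNR separated by the default output separator (a space). If you want the exact file1.csv-463 form from the question, join them with a hyphen instead:
awk -F',' 'NF>7{print FILENAME"-"FNR}' *.csv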
This might work for you (GNU sed):
sed -Ens 's/\S+/&/8;T;F;=;p' *.csv | paste - - -
The substitution tests for an eighth field; if there is none, the T command branches to the end and the line is skipped.
Otherwise, output the file name (F) and the line number (=), and print the current line (p).
Feed the output into a paste command which joins each group of three lines into one.
N.B. The -s option resets the line numbers for each file; without it, lines are numbered across the entire input.
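As an illustration with made-up data: if line 463 of file1.csv were the first line with an eighth field, each match would contribute three lines that paste then joins into one tab-separated record, e.g.
file1.csv	463	a,b,c,d,e,f,g,h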

Awk one-liner to copy last field to each member of line

Looking for an awk one-liner to format some text in a file of this form, where the number of fields and the number of lines are arbitrary:
abcd,abce,test1
bbcd,bbee,bbvc,test2
ccdd,ccbb,ccbd,ccab,testxyz
The desired output appends the last field of each line to every other field on that line:
abcd,test1
abce,test1
bbcd,test2
bbee,test2
bbvc,test2
ccdd,testxyz
ccbb,testxyz
ccbd,testxyz
ccab,testxyz
Assuming all lines have at least 2 fields:
awk -F, '{OFS=","; for(i=1;i<NF;i++) print $i,$NF}' file
can do what you expect.
If there can be lines with just one field and you want to print them as-is:
awk -F, '{OFS=","; for(i=1;i<NF;i++) print $i,$NF; if(NF==1) print $0}' file
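Equivalently, OFS can be set once in a BEGIN block rather than being assigned on every line (a stylistic variant of the above, not a behavioral change):
awk -F, 'BEGIN{OFS=","} {for(i=1;i<NF;i++) print $i, $NF; if(NF==1) print}' file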
This might work for you (GNU sed):
sed -r 's/,(.*(,[^,]*))$/\2\n\1/;P;D' file
If the line contains 2 or more commas, replace the first comma with a comma, the last field, and a newline; print up to the first newline (P); delete up to the first newline (D); and repeat.
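Running it on the sample input reproduces the expected output:
$ sed -r 's/,(.*(,[^,]*))$/\2\n\1/;P;D' file
abcd,test1
abce,test1
bbcd,test2
bbee,test2
bbvc,test2
ccdd,testxyz
ccbb,testxyz
ccbd,testxyz
ccab,testxyz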

I'm trying to compare two fastq files (paired reads) and print line number n of another file

I'm trying to compare two fastq reads (paired reads) such that the position (line number) of a pattern match in file1.fastq is compared to file2.fastq. I want to print what lies at the same position or line number in file2.fastq. I'm trying to do this with awk. For example, if my pattern match lies on line number 200 in file1, I want to see what is on line 200 in file2. Any suggestions appreciated.
In general, you want this form:
awk '
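# read the corresponding line of file2 into line_2, keeping the two files in lockstep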
{ getline line_2 < "file2" }
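# when the pattern matches in file1, print the shared line number and the file2 line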
/pattern/ { print FNR, line_2 }
' file1
Alternatively, paste the files together first (assuming your shell is bash):
paste -d $'\1' file1 file2 | awk -F $'\1' '$1 ~ /pattern/ {print FNR, $2}'
I'm using Ctrl-A (the \1 character) as the field delimiter, assuming that character does not appear in your files.
My understanding is you have three files: a pattern file and two data files. You want to find the line numbers of the patterns in data file 1 and then find the corresponding lines in data file 2. You'll get more help if you clarify the question and perhaps provide input files and expected output.
awk to the rescue!
awk -F: -vOFS=: 'NR==FNR{lines[$1]=$0;next} FNR in lines{print lines[FNR],$0}' <(grep -nf pattern data1) data2
will print the line number, the pattern matched from data file 1, and the corresponding line from data file 2. For my made-up files with quasi-random data I got:
1:s1265e:s1265e
2:s28629e:s28629e
3:s6630e:s6630e
4:s24530e:s24530e
5:s23216e:s23216e
6:s25985e:s25985e
My novice attempt so far
zcat file1.fastq.gz | awk '/pattern/{print NR}' > matches.csv
awk 'FNR==NR{a[$1]=$0;next;}(FNR in a)' matches.csv file2.fastq.gz
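Note that plain awk cannot read the gzipped file2.fastq.gz directly. A minimal sketch of the second step, assuming bash process substitution and the file names above:
awk 'FNR==NR{a[$1];next} (FNR in a)' matches.csv <(zcat file2.fastq.gz)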

Command to replace specific column of csv file for first 100 rows

The following command replaces the second column with the value e throughout a csv file,
but what if I want to replace it only in the first 100 rows?
awk -F, '{$2="e";}1' OFS=, file
The rest of the rows of the csv file should remain intact.
awk -F, 'NR<101{$2="e";}1' OFS=, file
The built-in NR variable holds the current record (line) number (or, in an END block, the total number of records processed). With the pattern NR<101, the action runs only for the first 100 lines. The trailing 1 is an always-true pattern whose default action is to print the current line, so every line, modified or not, is printed.
try this:
awk -F, 'NR<=100{$2="e"}1' OFS=, file
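As a quick sanity check on a three-line toy file (made-up data), replacing the second column in only the first two rows:
$ printf 'a,1\nb,2\nc,3\n' | awk -F, 'NR<=2{$2="e"}1' OFS=,
a,e
b,e
c,3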

Keep only the line that is latest in the file and is a duplicate based on two fields

This is related to the questions
awk - Remove line if field is duplicate
sed/awk + regex delete duplicate lines where first field matches (ip address)
I have a file like this:
FOO,BAR,100,200,300
BAZ,TAZ,500,600,800
FOO,BAR,900,1000,1000
HERE,THERE,1000,200,100
FOO,BAR,100,10000,200
BAZ,TAZ,100,40,500
The duplicates are determined by the first two fields. In addition, the more "recent" record (lower in the file / higher line number) is the one that should be retained.
What is an awk script that will output:
BAZ,TAZ,100,40,500
FOO,BAR,100,10000,200
HERE,THERE,1000,200,100
Output order is not so important.
Explanation of awk syntax would be great.
This is easy in awk: we just need to feed an array whose key combines the 1st and 2nd columns, with the rest as the value:
$ awk -F, '{a[$1","$2]=$3","$4","$5}END{for(i in a)print i,a[i]}' OFS=, file.txt
BAZ,TAZ,100,40,500
HERE,THERE,1000,200,100
FOO,BAR,100,10000,200
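Since the question asked for an explanation of the syntax, here is the same one-liner written out with comments:
awk -F, '
{
    # key on the first two fields; assigning on every line means a later
    # duplicate overwrites an earlier one, so the most recent record wins
    a[$1","$2] = $3","$4","$5
}
END {
    # print each key followed by its surviving value; for (i in a)
    # traversal order is unspecified, hence "output order is not so important"
    for (i in a) print i, a[i]
}' OFS=, file.txt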
This might work for you (tac and GNU sort):
tac file | sort -sut, -k1,2
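tac reverses the file so the most recent occurrence of each key comes first; sort -s (stable) -u (unique) -t, -k1,2 then keeps that first occurrence for each key formed from the first two comma-separated fields. On the sample input this reproduces the expected output:
$ tac file | sort -sut, -k1,2
BAZ,TAZ,100,40,500
FOO,BAR,100,10000,200
HERE,THERE,1000,200,100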