Print filenames & line number with number of fields greater than 'x' - awk

I am running Ubuntu Linux. I need to print the filenames and line numbers of lines containing more than 7 columns. There are several hundred thousand files.
I am able to print the number of columns per file using awk. However, the output I am after is something like
file1.csv-463, which means that file1.csv has more than 7 fields on line 463. I am using awk -F"," '{print NF}' * to print the number of fields across all files.
Please could I request help?

If you have GNU awk, try the following. It checks whether NF is greater than 7; if so, it prints the current file's name along with the line number, and nextfile then jumps straight to the next input file, which saves time because the rest of the current file need not be read.
awk -F',' 'NF>7{print FILENAME,FNR;nextfile}' *.csv
The above prints only the first match per file; to print all matching lines, drop nextfile:
awk -F',' 'NF>7{print FILENAME,FNR}' *.csv
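A minimal sanity check, using two hypothetical sample files and printing in the filename-linenumber format the question asked for:

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'a,b,c\n1,2,3,4,5,6,7,8\n' > file1.csv   # line 2 has 8 fields
printf 'x,y\n' > file2.csv                       # never exceeds 7 fields

# Report filename-linenumber for every line with more than 7 comma-separated fields
out=$(awk -F',' 'NF>7{print FILENAME "-" FNR}' *.csv)
echo "$out"
```

Only file1.csv line 2 qualifies, so the output is file1.csv-2.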

This might work for you (GNU sed):
sed -Ens 's/[^,]+/&/8;T;F;=;p' *.csv | paste - - -
The substitution succeeds only if an eighth comma-separated field exists; otherwise T branches to the end of the script and, because of -n, nothing is printed.
On success, F outputs the file name, = outputs the line number, and p prints the current line.
The output is piped into paste, which joins each group of three lines into one.
N.B. The -s option resets the line numbers for each file; without it, lines are numbered across the entire input.
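Assuming GNU sed (needed for -s and the F command), a quick check with a hypothetical sample file, using [^,]+ so the field count applies to comma-separated data:

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'a,b,c\n1,2,3,4,5,6,7,8\n' > file1.csv   # line 2 has 8 fields

# Succeed only when an 8th comma-separated field exists; then emit
# filename (F), line number (=), and the line itself (p), pasted into one row
out=$(sed -Ens 's/[^,]+/&/8;T;F;=;p' *.csv | paste - - -)
echo "$out"
```

The three printed lines are joined with tabs: file1.csv, 2, and the matching line.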

Related

Returning Nth line from multiple files

Given a folder with multiple .csv files, I want to return the Nth line from each file and write to a new file.
For a single file, I use
awk 'NR==5' file.csv
For multiple files, I figured
ls *.csv | xargs awk 'NR==5'
...however that only returns the 5th line from the first file in the list.
Thanks!
Try the following (GNU awk should help here, I believe):
awk 'FNR==5{print;nextfile}' *.csv
If you need the output in a single file, append > output_file to the end of the command.
Explanation:
FNR==5: checks whether the line number within the current input file is 5; if so, the actions that follow run.
{print;}: print is awk's built-in statement that outputs the current line, so only the 5th line is printed.
nextfile: as the name suggests, skips all remaining lines of the current input file and moves on to the next one (since *.csv passes all csv files to awk one by one). This saves time because we do NOT need to read the entire input file; we only wanted the 5th line, and we have it.
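A small demonstration with two hypothetical files, showing that FNR (unlike NR) restarts at 1 for each input file:

```shell
dir=$(mktemp -d) && cd "$dir"
seq 1 10  > a.csv   # 5th line is "5"
seq 11 20 > b.csv   # 5th line is "15"

# Print the 5th line of each file, then skip to the next file
out=$(awk 'FNR==5{print;nextfile}' *.csv)
echo "$out"
```

This prints 5 and 15, one per input file, whereas the plain NR==5 variant would fire only once for the whole stream.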

Merge the next line if the current line contains a pattern at the end

I have a very big text file. I want to merge the next line into the current line if the current line has a word OR in the end.
E.g., the two lines below
somerandomstring OR
someotherrandomstring
The above 2 lines should become
somerandomstring OR someotherrandomstring
Only those lines should change. Rest of the lines must be kept as they are. Thanks in advance.
Allow me to extend the question a bit further.
I also want to handle the case where the next line starts with OR and the current line does not end with OR. How can both cases be handled together?
With GNU sed:
sed '/ OR$/{N;s/\n/ /}' file
Search for a space followed by OR at end of line ($); if found, read the next line into the pattern space (N) and replace the newline in the pattern space with a space (s/\n/ /).
If you want to edit your file "in place" use sed's option -i.
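A quick run on a hypothetical three-line file, checking that only the OR-terminated line is merged with its successor:

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'somerandomstring OR\nsomeotherrandomstring\nuntouched line\n' > file

# Merge the next line into any line ending in " OR"
out=$(sed '/ OR$/{N;s/\n/ /}' file)
echo "$out"
```

The first two lines collapse into one; the third is printed unchanged.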
You can do this in awk by assigning the "OR" row to a variable and printing whatever is stored in the variable when there is no "OR" found.
awk '$NF=="OR"{buffer=buffer$0" "} $NF!="OR"{print buffer$0;buffer=""}' test.txt
This also works on multiple rows that may have "OR" by concatenating the row to the buffer variable until it finds a non-OR row, prints it and clears the buffer variable.
Another option in awk is to use printf on the OR rows and print on the non-OR rows (which is similar in spirit to the GNU sed example above, but in awk):
awk '$NF=="OR"{printf "%s ", $0} $NF!="OR"{print $0}' test.txt
And this is the same beast, but using a ternary operator within printf:
awk '{printf $NF=="OR"?"%s ":"%s\n", $0}' test.txt
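The ternary variant in action on hypothetical input (with the ternary parenthesized, which some awks require inside printf):

```shell
out=$(printf 'somerandomstring OR\nsomeotherrandomstring\nkept as is\n' \
  | awk '{printf ($NF=="OR" ? "%s " : "%s\n"), $0}')
echo "$out"
```

Lines ending in OR get a trailing space instead of a newline, so the next line is joined onto them; everything else passes through untouched. Note that, like the other trailing-OR answers, this does not handle the leading-OR case from the extended question.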
Another awk, for this test input file:
$ cat file
x OR
y
x
OR y
$ awk -v RS='^$' -v k='OR' '{gsub(k ORS,k FS); gsub(ORS k,FS k); printf "%s",$0}' file
x OR y
x OR y

awk/gawk - remove line if line 2 doesn't exist

I have a .txt file with two fields per line and a separator; some lines contain only one field, though, so I want to remove those.
example of lines are
Line to keep,
Iamnotyours:email#email.com
Line to remove,
Iamnotyours:
Given your posted sample input all you need is:
grep -v ':$' file
or if you insist on awk for some reason:
awk '!/:$/' file
If that's not all you need then edit your question to clarify your requirements.
awk to the rescue!
$ awk -F: 'NF==2' file
prints only the lines with two fields
$ awk -F: 'NF>1' file
prints lines with more than one field. In your case the separator is present, so the field count will be two even when there is nothing after it; you need to check whether the second field is empty:
$ awk -F: '$2!=""' file

print last two words of last line

I have a script which returns few lines of output and I am trying to print the last two words of the last line (irrespective of number of lines in the output)
$ ./test.sh
service is running..
check are getting done
status is now open..
the test is passed
I tried running as below but it prints last word of each line.
$ ./test.sh | awk '{ print $NF }'
running..
done
open..
passed
how do I print the last two words "is passed" using awk or sed?
Just say:
awk 'END {print $(NF-1), $NF}'
"normal" awks store the last line (but not all of them!), so that it is still accessible by the time you reach the END block.
Then it is just a matter of printing the penultimate and last fields, using $(NF-1) and $NF.
For robustness, if your last line may contain only one field, or your awk doesn't retain the field values in the END section:
awk '{split($0,a)} END{print (NF>1?a[NF-1]OFS:"") a[NF]}'
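A minimal check with GNU awk (which retains the last record's fields in the END block), using the sample output from the question:

```shell
# Last line is "the test is passed"; print its last two words
out=$(printf 'service is running..\nthe test is passed\n' \
  | awk 'END {print $(NF-1), $NF}')
echo "$out"
```

This prints "is passed". With an awk that does not keep field values in END, fall back to the split-based variant above.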
This might work for you (GNU sed):
sed '$s/.*\(\<..*\<.*\)/\1/p;d' file
This deletes every line, but on the last line it first replaces everything with the last two words and prints them if the substitution succeeds.

Command to replace specific column of csv file for first 100 rows

The following command replaces the second column with the value e throughout a csv file,
but what if I want to replace it only in the first 100 rows?
awk -F, '{$2="e";}1' OFS=, file
The rest of the rows of the csv file should remain intact.
awk -F, 'NR<101{$2="e";}1' OFS=, file
The built-in variable NR gives you either the total number of records processed (in an END block) or the current line number, depending on where it is used. In the above example, NR holds the current line number, so the pattern NR<101 makes the action run for the first 100 lines only. The trailing 1 is a pattern that is always true, and with no action it defaults to printing the line, so every line (modified or not) is output.
try this:
awk -F, 'NR<=100{$2="e"}1' OFS=, file
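A sketch on a tiny hypothetical file, scaled down to NR<=2 so the cutoff is visible in three lines:

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'a,b,c\nd,f,g\nh,i,j\n' > file

# Replace column 2 with "e" in the first 2 rows only; print everything
out=$(awk -F, 'NR<=2{$2="e"}1' OFS=, file)
echo "$out"
```

Rows one and two come out as a,e,c and d,e,g, while the third row h,i,j is untouched; with NR<=100 the same logic applies to a real 100-row window.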