Delete lines from a file with awk

I have a file file.dat containing numbers, for example:
4
6
7
I would like to use the numbers in this file to delete the corresponding lines of another file.
Is there any way to pass these numbers as parameters to awk and delete those lines from another file?
I have this awk solution, but I don't like it much:
awk 'BEGIN { while( (getline x < "./file.dat" ) > 0 ) a[x]=0; } NR in a { next; }1' /path/to/another/file
Can you suggest something more elegant?

Using NR==FNR to test which file awk is reading:
$ awk '{if(NR==FNR)idx[$0];else if(!(FNR in idx))print}' idx.txt data.txt
Or
$ awk 'NR==FNR{idx[$0]; next}; !(FNR in idx)' idx.txt data.txt
Here the index numbers go in idx.txt and the data in data.txt.
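A self-contained check of the two-file idiom (the data file here is made up for illustration):

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

# idx.txt holds the line numbers to delete; data.txt is sample data.
printf '4\n6\n7\n' > idx.txt
seq 10 | sed 's/^/line /' > data.txt

# While the first file is read, NR==FNR holds, so its lines become
# array keys; for the second file, lines whose number is a key are
# suppressed.
awk 'NR==FNR{idx[$0]; next}; !(FNR in idx)' idx.txt data.txt
```

This prints every line of data.txt except lines 4, 6 and 7. One caveat: if idx.txt is empty, awk skips it entirely and NR==FNR then also holds while the data file is read.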

I would use sed instead of awk:
$ sed $(for i in $(<idx.txt);do echo " -e ${i}d";done) file.txt
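The command substitution just builds one -e Nd expression per number, so for the sample idx.txt the command above is equivalent to the following (note that $(<idx.txt) is a bash-ism; in a POSIX shell use $(cat idx.txt)):

```shell
cd "$(mktemp -d)"
seq 10 > file.txt

# With idx.txt containing 4, 6 and 7, the loop expands to exactly:
sed -e 4d -e 6d -e 7d file.txt
```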

Merge multiple files and split output to multiple files based in each column (post 2)

I have many CSV files with exactly the same row and column format. In the example below I show only 2 input files, but I have many more with the same characteristics.
For each input file the goal is to:
Take the values in rows 1, 2 and 3.
Example from the first file:
6174
15
3
Then print the first column from rows 4 to 6.
Do the same for all input files and output one file with the information from every file read.
When the process is done for all files and the first column, do the same for the remaining columns.
At the end there will be 4 output files in total, as there are 4 columns in each file.
Input1
Record Number 6174
Vibrator Identification 15
Start Time Error 3 us
1.6,19.5,,,
1.7,23.2,28.3,27.0
1.8,26.5,27.0,25.4
Input2
Record Number 6176
Vibrator Identification 17
Start Time Error 5 us
1.6,18.6,,,
1.5,23.5,19.7,19.2
1.3,26.8,19.2,18.5
Using the code below I get the 4 output files as desired, although files 3-4 are not as expected: the first rows contain empty values and my code does not handle them. I also have a problem getting the right value from row 3 of each file: I get us instead of a number.
output file1
6174,15,3,1.6,1.7,1.8
6176,17,5,1.6,1.5,1.3
output file2
6174,15,3,19.5,23.2,26.5
6176,17,5,18.6,23.5,26.8
output file3
6174,15,3,0,0,28.3,27.0
6176,17,5,0,0,19.7,19.2
output file4
6174,15,3,0,0,27.0,25.4
6176,17,5,0,0,19.2,18.5
code used
The code almost works: it merges the CSV files and outputs the 4 required files, but files 3-4 are wrong when there are empty values.
for f in *.csv ; do
    awk -F, 'NR==1 {n=split($NF,f," ");print f[n]}' "$f" >> a-"$f"
    awk -F, 'NR==2 {n=split($NF,f," ");print f[n]}' "$f" >> a-"$f"
    awk -F, 'NR==3 {n=split($NF,f," ");print f[n]}' "$f" >> a-"$f"
    sed -i 's/\r$//' a-"$f"
    for i in $(seq 1 4); do
        awk -F, 'NR>=4{f=1} f{print '"$""$i"'} f==6{exit}' "$f" > "a""$i"-"$f"
        cat a-"$f" a"$i""-""$f" >> t"$i"
        sed -i 's/\r$//' t"$i"
    done
    for i in $(seq 1 4); do
        awk -v RS= -v OFS=',' -v ORS='\n' '{$1=$1}1' t"$i" > file"$i".csv
    done
done
rm -f ./a* ./t*
Appreciate your help
With GNU awk for ENDFILE and automatic handling of multiple open files, and assuming your posted sample output (where file3 and file4 each have more fields than file1 and file2) is a mistake:
$ cat tst.awk
BEGIN { FS=OFS=","; numHdrFlds=3 }
FNR <= numHdrFlds {
    gsub(/[^0-9]/,"")
    hdr = (FNR==1 ? "" : hdr OFS) $0
    next
}
{
    for (i=1; i<=NF; i++) {
        data[i] = (FNR==(numHdrFlds+1) ? "" : data[i] OFS) ($i)+0
    }
}
ENDFILE {
    for ( fileNr=1; fileNr<=NF; fileNr++ ) {
        print hdr, data[fileNr] > ("outputFile" fileNr)
    }
}
$ awk -f tst.awk file1 file2
$ for i in outputFile*; do echo "$i"; cat "$i"; echo "---"; done
outputFile1
6174,15,3,1.6,1.7,1.8
6176,17,5,1.6,1.5,1.3
---
outputFile2
6174,15,3,19.5,23.2,26.5
6176,17,5,18.6,23.5,26.8
---
outputFile3
6174,15,3,0,28.3,27
6176,17,5,0,19.7,19.2
---
outputFile4
6174,15,3,0,27,25.4
6176,17,5,0,19.2,18.5
---
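A reduced reproduction of the run above with just the first input file (it skips quietly when GNU awk is not installed):

```shell
cd "$(mktemp -d)"
command -v gawk >/dev/null || { echo "gawk not available, skipping"; exit 0; }

# The script exactly as shown above.
cat > tst.awk <<'EOF'
BEGIN { FS=OFS=","; numHdrFlds=3 }
FNR <= numHdrFlds {
    gsub(/[^0-9]/,"")
    hdr = (FNR==1 ? "" : hdr OFS) $0
    next
}
{
    for (i=1; i<=NF; i++) {
        data[i] = (FNR==(numHdrFlds+1) ? "" : data[i] OFS) ($i)+0
    }
}
ENDFILE {
    for ( fileNr=1; fileNr<=NF; fileNr++ ) {
        print hdr, data[fileNr] > ("outputFile" fileNr)
    }
}
EOF

# First sample input from the question.
cat > file1 <<'EOF'
Record Number 6174
Vibrator Identification 15
Start Time Error 3 us
1.6,19.5,,,
1.7,23.2,28.3,27.0
1.8,26.5,27.0,25.4
EOF

gawk -f tst.awk file1
cat outputFile1
```

This prints 6174,15,3,1.6,1.7,1.8, matching the first line of outputFile1 above: the header lines are stripped to their digits, and the first column of the data rows is appended.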

Replace end of line with comma and put parentheses in sed/awk

I am trying to process the contents of a file from this format:
this1,EUR
that2,USD
other3,GBP
to this format:
this1(EUR),that2(USD),other3(GBP)
The result should be a single line.
So far I have come up with this pipeline of commands, which works fine:
cat myfile | sed -e 's/,/\(/g' | sed -e 's/$/\)/g' | tr '\n' , | awk '{print substr($0, 0, length($0)- 1)}'
Is there a simpler way to do the same by just an awk command?
Another awk:
$ awk -F, '{ printf "%s%s(%s)", c, $1, $2; c = ","} END { print ""}' file
this1(EUR),that2(USD),other3(GBP)
The following awk may help with the same:
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file
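A quick check of the accumulator version against the sample input:

```shell
cd "$(mktemp -d)"
printf 'this1,EUR\nthat2,USD\nother3,GBP\n' > Input_file

# val grows by one name(code) chunk per line; the ternary skips the
# leading separator on the first line, and OFS=, supplies the commas.
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file
```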
Toying around with separators and gsub:
$ awk 'BEGIN{RS="";ORS=")\n"}{gsub(/,/,"(");gsub(/\n/,"),")}1' file
this1(EUR),that2(USD),other3(GBP)
Explained:
$ awk '
BEGIN {
    RS=""       # record ends at an empty line, not at each newline
    ORS=")\n"   # supplies the last )
}
{
    gsub(/,/,"(")    # replace commas with (
    gsub(/\n/,"),")  # and newlines with ),
}1' file             # output
Using paste+sed
$ # paste -s combines all input lines into a single line
$ seq 3 | paste -sd,
1,2,3
$ paste -sd, ip.txt
this1,EUR,that2,USD,other3,GBP
$ # post processing to get desired format
$ paste -sd, ip.txt | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD),other3(GBP)

Filter file with awk and keep header in output

I have following CSV file:
a,b,c,d
x,1,1,1
y,1,1,0
z,1,0,0
I want to keep the lines whose values add up to more than 1, so I execute this awk command:
awk -F "," 'NR > 1{s=0; for (i=2;i<=NF;i++) s+=$i; if (s>1)print}' file
And obtain this:
x,1,1,1
y,1,1,0
How can I do the same but retain the first line (header)?
$ awk -F "," 'NR==1; NR > 1{s=0; for (i=2;i<=NF;i++) s+=$i; if (s>1)print}' file
a,b,c,d
x,1,1,1
y,1,1,0
Since it's only 0s and 1s:
$ awk 'NR==1 || gsub(/1/, "1") > 1' file
a,b,c,d
x,1,1,1
y,1,1,0
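This works because gsub() returns the number of substitutions it made: replacing every 1 with itself leaves the line intact but counts the 1s. A runnable check with the file from the question:

```shell
cd "$(mktemp -d)"
printf 'a,b,c,d\nx,1,1,1\ny,1,1,0\nz,1,0,0\n' > file

# Keep the header, plus any line containing at least two 1s. Note that
# this counts 1s anywhere on the line, so it relies on the first
# column never containing the digit 1.
awk 'NR==1 || gsub(/1/, "1") > 1' file
```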

awk to split variable length record and add unique number on each group of records

I have a file with variable-length records:
x|y|XREC|DELIMITER|ab|cd|ef|IREC|DELIMITER|j|a|CREC|
p|q|IREC|DELIMITER|ww|xx|ZREC|
What I would like is:
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|
So far I have only managed to get a sequence number at the beginning:
awk '{printf "%d|%s\n", NR, $0}' oldfile > with_seq.txt
Any help?
You could set the delimiter to DELIMITER:
$ awk -F 'DELIMITER[|]' '{for (i=1;i<=NF;i++)print NR"|"$i}' file
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|
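Since FS is a regular expression, DELIMITER[|] consumes the marker together with its trailing pipe, so each field keeps its own trailing |. A runnable check:

```shell
cd "$(mktemp -d)"
cat > file <<'EOF'
x|y|XREC|DELIMITER|ab|cd|ef|IREC|DELIMITER|j|a|CREC|
p|q|IREC|DELIMITER|ww|xx|ZREC|
EOF

# Split each line on "DELIMITER|" and print every piece, prefixed
# with the input line number.
awk -F 'DELIMITER[|]' '{for (i=1;i<=NF;i++)print NR"|"$i}' file
```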
Using awk
awk -F "DELIMITER" '{for(i=1;i<=NF;i++)print NR "|" $i}' file | sed 's/||/|/g'
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|

How to use awk to sort by column 3

I have a file (user.csv) like this:
ip,hostname,user,group,encryption,aduser,adattr
I want to print all columns sorted by user.
I tried awk -F ":" '{print|"$3 sort -n"}' user.csv, but it doesn't work.
How about just sort?
sort -t, -nk3 user.csv
where
-t, - defines your delimiter as ,.
-n - gives you a numerical sort. Added since you used it in your attempt; if your user field is text only, you don't need it.
-k3 - defines the field (key). user is the third field.
Use awk to put the user ID in front, sort, then use sed to remove the duplicated user ID (assuming user IDs do not contain any spaces):
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^.* //'
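This is the decorate-sort-undecorate pattern; a small check with made-up rows (the greedy ^.* in the sed works only because the data lines contain no spaces of their own):

```shell
cd "$(mktemp -d)"
cat > user.csv <<'EOF'
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.1,gw,router,router,-,-,-
EOF

# Decorate each line with its third field, sort on that prefix, then
# strip the prefix again.
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^.* //'
```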
Seeing that the original question asked how to use awk, that every one of the first 7 answers uses sort instead, and that this is the top hit on Google, here is how to use awk.
Sample net.csv file with headers:
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-
And sort.awk:
#!/usr/bin/awk -f
# usage: ./sort.awk -v f=FIELD FILE
BEGIN {
    FS=","
}
# each line: remember the whole record and its sort key
{
    a[NR]=$0 ""
    s[NR]=$f ""
}
END {
    isort(s,a,NR)
    for(i=1; i<=NR; i++) print a[i]
}
# insertion sort of A[1..n], keyed by S[1..n]
function isort(S, A, n,   i, j, hs, ha) {
    for (i=2; i<=n; i++) {
        hs = S[j=i]
        ha = A[j=i]
        while (j > 1 && S[j-1] > hs) {
            j--
            S[j+1] = S[j]
            A[j+1] = A[j]
        }
        S[j] = hs
        A[j] = ha
    }
}
To use it:
awk -f sort.awk f=3 < net.csv # OR
chmod +x sort.awk
./sort.awk f=3 net.csv
You can choose a delimiter; in this case I chose a colon and print the first column, sorted and de-duplicated (that is what -u does):
awk -F\: '{print $1|"sort -u"}' /etc/passwd
awk -F, '{ print $3, $0 }' user.csv | sort -nk2
and for reverse order
awk -F, '{ print $3, $0 }' user.csv | sort -nrk2
Try this:
awk '{print $0 | "sort -t, -nk3"}' user.csv
OR
sort -t',' -nk3 user.csv
awk -F "," '{print $0}' user.csv | sort -nk3 -t ','
This should work
To exclude the first line (header) from sorting, I split it out into two buffers.
df | awk 'BEGIN{header=""; body=""} { if(NR==1){header=$0}else{body=(body==""?$0:body"\n"$0)}} END{print header; print body|"sort -nk3"}'
With GNU awk:
awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file
See 8.1.6 Using Predefined Array Scanning Orders with gawk for more sorting algorithms.
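A runnable sketch with made-up rows (it needs gawk, so it skips quietly elsewhere; note that a[$3] keeps only one line per distinct key, so rows with duplicate user values would overwrite each other):

```shell
cd "$(mktemp -d)"
command -v gawk >/dev/null || { echo "gawk not available, skipping"; exit 0; }

printf 'x,foo,delta\ny,bar,alpha\nz,baz,charlie\n' > file

# Index the array by the sort key; "@ind_str_asc" makes "for (i in a)"
# visit the indices in ascending string order.
gawk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file
```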
I'm running Linux (Ubuntu) with mawk:
tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
mawk (and gawk) has an option to redirect the output of print to a command. From man awk chapter 9. Input and output:
The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.
Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces command line clutter:
tmp$ cat input.csv
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3
tmp$ cat sort.awk
# print header line
/^alpha,num/ {
print
}
# all other lines are data lines that should be sorted
!/^alpha,num/ {
print | "sort --field-separator=, --key=2 --numeric-sort"
}
tmp$ awk -f sort.awk input.csv
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10
See man sort for the details of the sort options:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-k, --key=KEYDEF
sort via a key; KEYDEF gives location and type
-n, --numeric-sort
compare according to string numerical value