This is test.txt:
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76
If I run
awk -F, 'BEGIN{OFS=","}{$2="";print $0}' test.txt
the result is:
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
The $2 wasn't deleted, it just became empty.
I hope, when printing $0, that the result is:
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
All the existing solutions are good though this is actually a tailor made job for cut:
cut -d, -f 1,3- file
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
If you want to remove 3rd field then use:
cut -d, -f 1,2,4- file
To remove 4th field use:
cut -d, -f 1-3,5- file
I believe simplest would be to use sub function to replace first occurrence of continuous ,,(which are getting created after you made 2nd field NULL) with single ,. But this assumes that you don't have any commas in between field values.
awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file
2nd solution: OR you could use match function to catch regex from first comma to next comma's occurrence and get before and after line of matched string.
awk '
match($0,/,[^,]*,/){
print substr($0,1,RSTART-1)","substr($0,RSTART+RLENGTH)
}' Input_file
It's a bit heavy-handed, but this moves each field after field 2 down a place, and then changes NF so the unwanted field is not present:
$ awk -F, -v OFS=, '{ for (i = 2; i < NF; i++) $i = $(i+1); NF--; print }' test.txt
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01
0x01,0x00,0x76
$
Tested with both GNU Awk 4.1.3 and BSD Awk ("awk version 20070501" on macOS Mojave 10.14.6 — don't ask; it frustrates me too, but sometimes employers are not very good at forward thinking). Setting NF may or may not work on older versions of Awk — I was a little surprised it did work, but the surprise was a pleasant one, for a change.
If Awk is not an absolute requirement, and the input is indeed as trivial as in your example, sed might be a simpler solution.
sed 's/,[^,]*//' test.txt
This is especially elegant if you want to remove the second field. A more generic approach to remove, the nth field would require you to put in a regex which matches the first n - 1 followed by the nth, then replace that with just the the first n - 1.
So for n = 4 you'd have
sed 's/\([^,]*,[^,]*,[^,]*,\)[^,]*,/\1/' test.txt
or more generally, if your sed dialect understands braces for specifying repetitions
sed 's/\(\([^,]*,\)\{3\}\)[^,]*,/\1/' test.txt
Some sed dialects allow you to lose all those pesky backslashes with an option like -r or -E but again, this is not universally supported or portable.
In case it's not obvious, [^,] matches a single character which is not (newline or) comma; and \1 recalls the text from first parenthesized match (back reference; \2 recalls the second, etc).
Also, this is completely unsuitable for escaped or quoted fields (though I'm not saying it can't be done). Every comma acts as a field separator, no matter what.
With GNU sed you can add a number modifier to substitute nth match of non-comma characters followed by comma:
sed -E 's/[^,]*,//2' file
Using awk in a regex-free way, with the option to choose which line will be deleted:
awk '{ col = 2; n = split($0,arr,","); line = ""; for (i = 1; i <= n; i++) line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] ); print line }' test.txt
Step by step:
{
col = 2 # defines which column will be deleted
n = split($0,arr,",") # each line is split into an array
# n is the number of elements in the array
line = "" # this will be the new line
for (i = 1; i <= n; i++) # roaming through all elements in the array
line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] )
# appends a comma (except if line is still empty)
# and the current array element to the line (except when on the selected column)
print line # prints line
}
Another solution:
You can just pipe the output to another sed and squeeze the delimiters.
$ awk -F, 'BEGIN{OFS=","}{$2=""}1 ' edward.txt | sed 's/,,/,/g'
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
$
Commenting on the first solution of #RavinderSingh13 using sub() function:
awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file
The gnu-awk manual: https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html
It is important to note that making an assignment to an existing field changes the value of $0 but does not change the value of NF, even when you assign the empty string to a field." (4.4 Changing the Contents of a Field)
So, following the first solution of RavinderSingh13 but without using, in this case,sub() "The field is still there; it just has an empty value, delimited by the two colons":
awk 'BEGIN {FS=OFS=","} {$2="";print $0}' file
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
My solution:
awk -F, '
{
regex = "^"$1","$2
sub(regex, $1, $0);
print $0;
}'
or one line code:
awk -F, '{regex="^"$1","$2;sub(regex, $1, $0);print $0;}' test.txt
I found that OFS="," was not necessary
I would do it following way, let file.txt content be:
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76
then
awk 'BEGIN{FS=",";OFS=""}{for(i=2;i<=NF;i+=1){$i="," $i};$2="";print}' file.txt
output
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
Explanation: I set OFS to nothing (empty string), then for 2nd and following column I add , at start. Finally I set what is now comma and value to nothing. Keep in mind this solution would need rework if you wish to remove 1st column.
I have a text file with the following structure:
bla1
bla2
bla3
bla4
bla5
So you can see that some lines of text are preceeded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-lines comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the first line(s) in the file were empty so this may be what you want if you want lines of all space chars considered to be empty lines:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can use the hold buffer easily to print the line before the blank like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
above command does these for each line:
if p is 1, print line;
if line is empty, set p to 1.
if lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
note that this one may not work with all awks, a portable version of this would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
see them online here, and here.
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|--'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
tested by gnu sed, your data in 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
add -i option precedes -n to real editing
I'd like to transform a table in such a way that for duplicated
values in column #2 it would have corresponding values from column #1.
I.e. something like that...
MZ00024296 AC148152.3_FG005
MZ00047079 AC148152.3_FG006
MZ00028122 AC148152.3_FG008
MZ00032922 AC148152.3_FG008
MZ00048218 AC148152.3_FG008
MZ00024680 AC148167.6_FG001
MZ00013456 AC149475.2_FG003
to
AC148152.3_FG005 MZ00024296
AC148152.3_FG006 MZ00047079
AC148152.3_FG008 MZ00028122|MZ00032922|MZ00048218
AC148167.6_FG001 MZ00024680
AC149475.2_FG003 MZ00013456
As I need it to computations in R I tried to use:
x=aggregate(mz_grmz,by=list(mz_grmz[,2]),FUN=paste(mz_grmz[,1],sep="|"))
but it don't work (wrong function)
Error in match.fun(FUN) :
'paste(mz_grmz[, 1], sep = "|")' is not a function, character or symbol
I also remind myself about unstack() function, but it isn't what I need.
I tried to do it using awk, based on my base knowledge I reworked code given here:
site1
#! /bin/sh
for y do
awk -v FS="\t" '{
for (x=1;x<=NR;x++) {
if (NR>2 && x=x+1) {
print $2"\t"x
}
else {print NR}
}
}' $y > $y.2
done
unfortunately it doesn't work, it's only produce enormous file with field #2 and some numbers.
I suppose it is easy task, but it is above my skills right now.
Could somebody give me a hint? Maybe just function to use in aggregate in R.
Thanks
You could do it in awk like this:
awk '
{
if ($2 in a)
a[$2] = a[$2] "|" $1
else
a[$2] = $1
}
END {
for (i in a)
print i, a[i]
}' INFILE > OUTFILE
to keep the output as same as the text in your question (empty lines etc..):
awk '{if($0 &&($2 in a))a[$2]=a[$2]"|"$1;else if ($0) a[$2]=$1;}\
END{for(x in a){print x,a[x];print ""}}' inputFile
test:
kent$ echo "MZ00024296 AC148152.3_FG005
MZ00047079 AC148152.3_FG006
MZ00028122 AC148152.3_FG008
MZ00032922 AC148152.3_FG008
MZ00048218 AC148152.3_FG008
MZ00024680 AC148167.6_FG001
MZ00013456 AC149475.2_FG003"|awk '{if($0 &&($2 in a))a[$2]=a[$2]"|"$1;else if ($0) a[$2]=$1;}END{for(x in a){print x,a[x];print ""}}'
AC149475.2_FG003 MZ00013456
AC148152.3_FG005 MZ00024296
AC148152.3_FG006 MZ00047079
AC148152.3_FG008 MZ00028122|MZ00032922|MZ00048218
AC148167.6_FG001 MZ00024680
This GNU sed solution might work for you:
sed -r '1{h;d};H;${x;s/(\S+)\s+(\S+)/\2\t\1/g;:a;s/(\S+\t)([^\n]*)(\n+)\1([^\n]*)\n*/\1\2|\4\3/;ta;p};d' input_file
Explanation: Use the extended regex option-r to make regex's more readable. Read the whole file into the hold space (HS). Then on end-of-file, switch to the HS and firstly swap and tab separate fields. Then compare the first fields in adjacent lines and if they match, tag the second field from the second record to the first line separated by a |. Repeated until no further adjacent lines have duplicate first fields then print the file out.