How can I adjust a text file in VIM to have two columns instead of three while not splitting paired data? - awk

Hello there and thank you for reading this!
I have a very large text file that looks something like this:
a1 b1 a2
b2 a3 b3
a4 b4 a5
b5 a6 b6
I would like my text file to look like this:
a1 b1
a2 b2
a3 b3
a4 b4
a5 b5
a6 b6
In reality, these values are paired lon/lat coordinates. If it is useful, the values look like:
1.591336e+02 4.978998e+01 1.591162e+02
4.977995e+01 1.590988e+02 4.976991e+01
1.590815e+02 4.975988e+01 1.590641e+02
4.974984e+01 1.590468e+02 4.973980e+01
I have been learning vim, but I do have access to other tools if this is easier done elsewhere. Should I be looking for a sed or awk command that will count the spaces in a given row? I appreciate any and all advice, and if I can offer any more information, I would be glad to!
I have searched for other folks who have had this question, but I don't know how to apply the solutions I've seen for similar problems to this, and I'm afraid of messing up this very large file. I expect the answer to involve sed or awk, but I haven't been able to make those commands work with what I've found so far. I'm rather new to coding and this site, so if I missed this question already being asked, I apologize!
All the best!
EDIT: I used
sed 's/\s/\n/2;P;D' file.txt > newfile.txt
to turn my file into:
a1 b1
a2^M
b2 a3
b3^M
a4 b4
a5^M
b5 a6
b6^M
I then used:
dos2unix newfile.txt
to get rid of the ^M within the data. I haven't reached the desired structure yet, but I am one step closer.

$ tr ' ' '\n' <input_file|paste -d" " - -
a1 b1
a2 b2
a3 b3
$ sed 's/ /\n/2; P; D' <(tr '\n' ' ' <input_file)
a1 b1
a2 b2
a3 b3
$ tr '\n' ' ' <input_file|xargs -d" " printf '%s %s\n'
a1 b1
a2 b2
a3 b3

An approach using awk
% awk '{for(i=1;i<=NF;i++){x++; printf x%2==0 ? $i"\n" : $i" "}}' file
a1 b1
a2 b2
a3 b3
a4 b4
a5 b5
a6 b6
Data
% cat file
a1 b1 a2
b2 a3 b3
a4 b4 a5
b5 a6 b6

Using GNU sed
$ sed -Ez 's/([^ \n]*)[ \n]([^ \n]* ?)\n?/\1 \2\n/g' input_file
a1 b1
a2 b2
a3 b3
a4 b4
a5 b5
a6 b6

Using cut:
$ cut -d' ' -f1,2 test
a1 b1
b2 a3
a4 b4
b5 a6

In vim you can
:%j
to join all the lines, then
:s/\([^ ]\+ [^ ]\+\) /\1\r/g
to turn every 2nd space into a newline.
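If you'd like to sanity-check that substitution outside Vim first, the same join-then-split idea can be reproduced on the question's sample data with paste and GNU sed (Vim's \r becomes \n here; \+ and \n in the replacement are GNU sed extensions):

```shell
# Join all lines into one, then break after every second token
printf 'a1 b1 a2\nb2 a3 b3\n' |
  paste -sd' ' - |
  sed 's/\([^ ]\+ [^ ]\+\) /\1\n/g'
```

This prints a1 b1, a2 b2, a3 b3 on separate lines.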
With perl
perl -0777 -lpe 's/(\S+)\s+(\S+)\s+/$1 $2\n/g' file
That reads the whole file into memory, so it depends on what "very large" is. Is it smaller than the amount of memory you have?

FWIW, here is a bit of a meta-answer.
Vim lets you filter all or some of the lines in your buffer through an external program with :help :!. It is very handy because, while Vim can do a lot, there are plenty of use cases for which external tools do a better job.
So… if you already have the file opened in Vim, you should be able to apply the provided answers with little effort:
:%!tr ' ' '\n' | paste -d" " - -
:%!tr '\n' ' ' | sed 's/ /\n/2; P; D'
:%!perl -0777 -lpe 's/(\S+)\s+(\S+)\s+/$1 $2\n/g'
etc.
Of note:
The command after the [range]! takes the lines covered by [range] as standard input which makes constructs like <filename unnecessary in this context.
Vim expands % to the current filename and # to the alternate filename if they exist so they usually need to be escaped, as in:
:%!tr '\n' ' ' | xargs printf '\%s \%s\n'
There's a lot to learn from this thread. Good luck.

This might work for you (GNU sed):
sed '$!N;y/\n/ /;s/ /\n/2;P;D' file
Append the following line if not the last.
Translate all newlines to spaces.
Replace the second space by a newline.
Print the first line, delete the first line and repeat.
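Running that command on the sample data from the question (GNU sed, since the replacement text uses \n):

```shell
printf 'a1 b1 a2\nb2 a3 b3\na4 b4 a5\nb5 a6 b6\n' |
  sed '$!N;y/\n/ /;s/ /\n/2;P;D'
```

This prints the six pairs a1 b1 through a6 b6, one per line.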

Related

how to extract two strings from each line

I have a file of the following content:
product a1 version a2 owner a3
owner b1 version b2 product b3 size b4
....
I am interested in extracting product and version from each line using a shell script, and write them in 2 columns with product first and version second. So the output should be:
a1 a2
b3 b2
...
I used "while read line", but it is extremely slow. I tried to use awk, but couldn't figure out how to do it. Any help is appreciated.
The following will do what you want:
$ nl dat
1 product a1 version a2 owner a3
2 owner b1 version b2 product b3 size b4
$ awk 'NF { delete row;
           for( i=1; i <= NF; i += 2 ) {
               row[$i] = $(i+1)
           }
           print( row["product"], row["version"])
         }' dat
a1 a2
b3 b2
This builds an associative array from the name-value pairs in your data file by position, and then retrieves the values by name. The NF in the pattern ensures blank lines are ignored. If product or version are otherwise missing, they'll likewise be missing in the output.
A different perl approach:
perl -lane 'my %h = @F; print "$h{product} $h{version}"' input.txt
Uses auto-split mode to put each word of each line in an array, turns that into a hash/associative array, and prints out the keys you're interested in.
Here is a perl to do that:
perl -lnE '
$x=$1 if /(?=product\h+(\H+))/;
$y=$1 if /(?=version\h+(\H+))/;
say "$x $y" if $x && $y;
$x=$y="";' file
Or, same method with GNU awk:
gawk '/product/ && /version/{
match($0,/product[ \t]+([^ \t]+)/,f1)
match($0,/version[ \t]+([^ \t]+)/,f2)
print f1[1],f2[1]
}' file
With the example, either prints:
a1 a2
b3 b2
The advantage here is only complete lines are printed where both targets are found.
With awk:
$ awk '{for(i=1;i<NF;i++){
if($i=="version")v=$(i+1)
if($i=="product")p=$(i+1)}}
{print p,v}' data.txt
a1 a2
b3 b2
If you have lines without a version or product number, and you want to skip them:
awk '{ok=0}
{for(i=1;i<NF;i++){
if($i=="version"){v=$(i+1);ok++}
if($i=="product"){p=$(i+1);ok++}}}
ok==2{print p,v}' data.txt
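To see the ok==2 guard in action, here is a quick run with a made-up middle line (owner x1 size x4 is hypothetical, not from the question) that contains neither keyword:

```shell
printf 'product a1 version a2 owner a3\nowner x1 size x4\nowner b1 version b2 product b3 size b4\n' |
  awk '{ok=0}
       {for(i=1;i<NF;i++){
          if($i=="version"){v=$(i+1);ok++}
          if($i=="product"){p=$(i+1);ok++}}}
       ok==2{print p,v}'
```

Only a1 a2 and b3 b2 are printed; the incomplete line is skipped.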
Thank you guys for the quick and excellent replies. I ended up using the awk version as it is most convenient for me to insert into an existing shell script. But I learned a lot from other scripts too.

Print lines between two patterns along with the header

I have a file like this below:
Name: DB1
========================================================
Primary :
f3
f6
f7
f9
f0
Secondary :
internal input
internal output
internal Loaded
internal output
internal Loaded
Name: DB2
========================================================
Primary :
s2
m5
m7
m8
m9
Secondary :
External output
External Revoke
External Reuse
External input
But I need the output like this below; I need to extract the lines between Primary and Secondary along with the names:
Name: DB1
========================================================
f3
f6
f7
f9
f0
Name: DB2
========================================================
Primary :
s2
m5
m7
m8
I tried this :
$ awk '/Primary :/{flag=1; next} /Undriven :/{flag=0} flag' file
f3
f6
f7
f9
f0
s2
m5
m7
m8
m9
I am not getting the names. Can anyone please help me with this?
It looks like you're pretty close, except that (a) you're never explicitly matching the Name: line, and (b) you're matching the word "Undriven" which doesn't appear in your sample data.
I would probably do something like this:
awk '
/^Name:/
/^====/
/^Primary :/{flag=1; next}
/^Secondary :/{flag=0}
flag
' file
Which produces as output:
Name: DB1
========================================================
f3
f6
f7
f9
f0
Name: DB2
========================================================
s2
m5
m7
m8
m9
If this isn't all you need then edit your question to provide more truly representative sample input/output that this doesn't work for:
$ awk -v RS= -F'\n' '{for (i=1;i<=8;i++) print $i; print ""}' file
Name: DB1
========================================================
Primary :
f3
f6
f7
f9
f0
Name: DB2
========================================================
Primary :
s2
m5
m7
m8
m9
or:
$ awk -v RS= -v FS='\n' '{print $1 ORS $2; for (i=4;i<=8;i++) print $i; print ""}' file
Name: DB1
========================================================
f3
f6
f7
f9
f0
Name: DB2
========================================================
s2
m5
m7
m8
m9

Replace in every nth line starting from a certain line

I want to replace text on every third line, starting from the second line, using sed.
Input file
A1
A2
A3
A4
A5
A6
A7
.
.
.
Expected output
A1
A2
A3
A4_edit
A5
A6
A7_edit
.
.
.
I know there are many solutions related to this available here, but for this specific problem I was unable to find one.
My try:
sed '1n;s/$/_edit/;n'
This only replaces on every second line from the beginning.
Something like this?
$ seq 10 | sed '1b ; n ; n ; s/$/_edit/'
1
2
3
4_edit
5
6
7_edit
8
9
10_edit
This breaks down to a cycle of
1b if this is the first line in the input, start the next cycle, using sed default behaviour to print the line and read the next one - which skips the first line in the input
n print the current line and read the next line - which skips the first line in a group of three
n print the current line and read the next line - which skips the second line in a group of three
s/$/_edit/ substitute the end of line for _edit on the third line of each group of three
then use the default sed behaviour to print, read next line and start the cycle again
If you want to skip more than one line at the start, change 1b to, say, 1,5b.
As Wiktor Stribiżew has pointed out in the comments, as an alternative, there is a GNU range extension first~step which allows us to write
sed '4~3s/$/_edit/'
which means substitute on every third line starting from line 4.
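A quick way to convince yourself the two spellings agree (bash and GNU sed assumed) is to diff their output on the same input:

```shell
# diff prints nothing and exits 0 when both commands produce identical output
diff <(seq 10 | sed '1b; n; n; s/$/_edit/') \
     <(seq 10 | sed '4~3s/$/_edit/')
```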
In case you are OK with awk, try the following.
awk -v count="-1" '++count==3{$0=$0"_edit";count=0} 1' Input_file
Append > temp_file && mv temp_file Input_file in case you want to save output into Input_file itself.
Explanation:
awk -v count="-1" ' ##Starting awk code here and mentioning variable count whose value is -1 here.
++count==3{ ##Checking condition if increment value of count is equal to 3 then do following.
$0=$0"_edit" ##Appending _edit to current line value.
count=0 ##Making value of count as ZERO now.
} ##Closing block of condition ++count==3 here.
1 ##Mentioning 1 will print edited/non-edited lines.
' Input_file ##Mentioning Input_file name here.
Another awk
awk 'NR>3&&NR%3==1{$0=$0"_edit"}1' file
A1
A2
A3
A4_edit
A5
A6
A7_edit
A8
A9
A10_edit
A11
A12
A13_edit
NR>3 tests if the line number is greater than 3
NR%3==1 and whether it is every third line after that
{$0=$0"_edit"} edit the line
1 print everything
You can use sed's ~ step operator.
sed '4~3s|$|_edit|'
~ is a feature of GNU sed, so it will be available in most (all?) distros of Linux. But to use it on macOS (which comes with BSD sed), you would have to install GNU sed to get this feature: brew install gnu-sed.

Convert a tree to a list of paths using awk [duplicate]

I have input files with the structure like the next:
a1
  b1
    c1
    c2
    c3
  b2
    c1
      d1
      d2
  b3
  b4
a2
a3
  b1
  b2
    c1
    c2
Each level is indented by 2 spaces. The needed output is:
a1/b1/c1
a1/b1/c2
a1/b1/c3
a1/b2/c1/d1
a1/b2/c1/d2
a1/b3
a1/b4
a2
a3/b1
a3/b2/c1
a3/b2/c2
It is like a filesystem: if the next line has bigger indentation, the current one is like a "directory", and when it has the same indentation it is like a "file". I need to print the full paths of the "files".
I am trying to solve this without any high-level language like Python or Perl, with only basic bash commands.
My current code/idea is based on a recursive function call and working with a stack, but I have a problem with the logic. The code currently outputs the following:
a1 b1 c1
a1 b1
a1
DD: line 8: [0-1]: bad array subscript
Only the 1st line is OK, so my handling of the recursion is wrong...
input="ifile.tree"
#stack array
declare -a stack
#stack manipulation
pushstack() { stack+=("$1"); }
popstack() { unset stack[${#stack[@]}-1]; }
printstack() { echo "${stack[*]}"; }
#recursive function
checkline() {
    local uplev=$1
    #read line - if no more lines - print the stack and return
    read -r level text || (printstack; exit 1) || return
    #if the current line level is larger than the previous level
    if [[ $uplev < $level ]]
    then
        pushstack "$text"
        checkline $level #recurse
    fi
    printstack
    popstack
}
# MAIN PROGRAM
# change the input from indented spaces to
# level_number<space>text
(
    #subshell - change IFS
    IFS=,
    while read -r spaces content
    do
        echo $(( (${#spaces} / 2) + 1 )) "$content"
    done < <(sed 's/[^ ]/,&/' < "$input")
) | ( #pipe to another subshell
    checkline 0 #recurse by levels
)
Sorry for the long code. Can anybody help?
Interesting question.
This awk command (it could be a one-liner) does the job:
awk -F'  ' 'NF<=p{for(i=1;i<=p;i++)printf "%s%s", a[i],(i==p?RS:"/")
    if(NF<p)for(i=NF;i<=p;i++) delete a[i]}
    {a[NF]=$NF;p=NF}
    END{for(i=1;i<=NF;i++)printf "%s%s", a[i],(i==NF?RS:"/")}' file
As you can see above, there is duplicated code; you can extract it into a function if you like.
Test with your data:
kent$ cat f
a1
  b1
    c1
    c2
    c3
  b2
    c1
      d1
      d2
  b3
  b4
a2
a3
  b1
  b2
    c1
    c2
kent$ awk -F'  ' 'NF<=p{for(i=1;i<=p;i++)printf "%s%s", a[i],(i==p?RS:"/")
        if(NF<p)for(i=NF;i<=p;i++) delete a[i]}
        {a[NF]=$NF;p=NF}END{for(i=1;i<=NF;i++)printf "%s%s", a[i],(i==NF?RS:"/")}' f
a1/b1/c1
a1/b1/c2
a1/b1/c3
a1/b2/c1/d1
a1/b2/c1/d2
a1/b3
a1/b4
a2
a3/b1
a3/b2/c1
a3/b2/c2
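As the answer suggests, the duplicated print loop can be pulled out into a function; here is one possible sketch (the function name flush is my choice, and -F'  ' assumes the two-space indentation from the question):

```shell
awk -F'  ' '
  # flush prints the current path a[1..n], "/"-joined, newline-terminated
  function flush(n,  i) {
      for (i = 1; i <= n; i++) printf "%s%s", a[i], (i == n ? RS : "/")
  }
  NF <= p {
      flush(p)
      if (NF < p) for (i = NF; i <= p; i++) delete a[i]
  }
  { a[NF] = $NF; p = NF }
  END { flush(NF) }
' file
```

Run against the same f file, this prints the identical list of paths.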
I recently had to do something similar; with a few tweaks I can post my script here:
#!/bin/bash
prev_level=-1
# Index into node array
i=0
# Regex to screen-scrape all nodes
tc_re="^(( )*)(.*)$"
while IFS= read -r ln; do
    if [[ $ln =~ $tc_re ]]; then
        # folder level indicated by spaces in preceding node name
        spaces=${#BASH_REMATCH[1]}
        # 2 space characters per level
        level=$(($spaces / 2))
        # Name of the folder or node
        node=${BASH_REMATCH[3]}
        # get the rest of the node path from the previous entry
        curpath=( ${curpath[@]:0:$level} $node )
        # increment i only if the current level is <= the level of the previous
        # entry
        if [ $level -le $prev_level ]; then
            ((i++))
        fi
        # add this entry (overwrite previous if $i was not incremented)
        tc[$i]="${curpath[@]}"
        # save level for next iteration
        prev_level=$level
    fi
done
for p in "${tc[@]}"; do
    echo "${p// //}"
done
Input is taken from STDIN, so you'd have to do something like this:
$ ./tree2path.sh < ifile.tree
a1/b1/c1
a1/b1/c2
a1/b1/c3
a1/b2/c1/d1
a1/b2/c1/d2
a1/b3
a1/b4
a2
a3/b1
a3/b2/c1
a3/b2/c2
$

Deleting columns from a file with awk or from command line on linux

How can I delete some columns from a tab separated fields file with awk?
c1 c2 c3 ..... c60
For example, delete columns 3 through 29.
This is what the cut command is for:
cut -f1,2,30- inputfile
The default is tab. You can change that with the -d switch.
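As a quick sanity check, you can generate a 60-column tab-separated line (seq -f and -s are GNU extensions) and count what survives:

```shell
tab=$(printf '\t')
seq -f 'c%g' -s "$tab" 60 | cut -f1,2,30-   # keeps c1, c2, and c30..c60 (33 fields)
```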
You can loop over all columns and filter out the ones you don't want:
awk '{for (i=1; i<=NF; i++) if (i<3 || i>29) printf $i " "; print""}' input.txt
where the NF gives you the total number of fields in a record.
For each column that meets the condition we print the column followed by a space " ".
EDIT: updated after remark from johnny:
awk 'BEGIN{FS="\t"}{for (i=1; i<=NF-1; i++) if(i<3 || i>5) {printf $i FS};{print $NF}}' input.txt
This is improved in 2 ways:
keeps the original separators
does not append a separator at the end
awk '{for(z=3;z<=15;z++)$z="";$0=$0;$1=$1}1'
Emptying fields 3 to 15 leaves runs of extra separators behind; $0=$0 forces awk to re-split the record with the default FS, which drops the now-empty fields, and $1=$1 then rebuilds $0 with single separators.
Input
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21
Output
c1 c2 c16 c17 c18 c19 c20 c21
Perl 'splice' solution which does not add leading or trailing whitespace:
perl -lane 'splice @F,2,27; print join " ",@F' file
Produces output:
c1 c2 c30 c31