Shell script: How to split line? - awk

here's my scenario:
my input file looks like:
/tmp/abc.txt
/tmp/cde.txt
/tmp/xyz/123.txt
and I'd like to obtain the following output in 2 files:
first file
/tmp/
/tmp/
/tmp/xyz/
second file
abc.txt
cde.txt
123.txt
thanks a lot

Here it is all in one single awk:
awk -F/ -v OFS=/ '{print $NF > "file2"; $NF=""; print > "file1"}' input
cat file1
/tmp/
/tmp/
/tmp/xyz/
cat file2
abc.txt
cde.txt
123.txt
Here we set both the input and the output field separator to /.
Then we print the last field $NF to file2.
Finally we set the last field to the empty string and print the rest (the directory part, trailing slash included) to file1.
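As a quick sanity check, the one-liner can be run against the sample paths from the question (the file names input, file1, and file2 are just the ones used above):

```shell
# Recreate the sample input from the question.
printf '%s\n' /tmp/abc.txt /tmp/cde.txt /tmp/xyz/123.txt > input

# Last field (the file name) goes to file2; blanking it and reprinting
# with OFS=/ leaves the directory part, trailing slash included, in file1.
awk -F/ -v OFS=/ '{print $NF > "file2"; $NF=""; print > "file1"}' input

cat file1   # /tmp/  /tmp/  /tmp/xyz/
cat file2   # abc.txt  cde.txt  123.txt
```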

I realize you already have an answer, but you might be interested in the following two commands:
basename
dirname
If they're available on your system, you can get what you want just by piping through them:
cat input | xargs -l dirname > file1
cat input | xargs -l basename > file2
Enjoy!
Edit: Fixed per quantdev's comment. Good catch!

Through grep,
grep -o '.*/' file > file1.txt
grep -o '[^/]*$' file > file2.txt
.*/ matches all the characters from the start of the line up to the last / symbol (the match is greedy).
[^/]*$ matches zero or more characters other than /; the $ anchors the match at the end of the line.
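The two commands can be verified against the same sample input (grep -o prints only the part of the line that matches):

```shell
printf '%s\n' /tmp/abc.txt /tmp/cde.txt /tmp/xyz/123.txt > file

grep -o '.*/' file > file1.txt     # greedy: everything up to the last /
grep -o '[^/]*$' file > file2.txt  # everything after the last /

cat file1.txt file2.txt
```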

The awk solution is probably the best, but here is a pure sed solution:
#n sed script to get base and file paths
h
s/.*\/\(.*\.txt\)/\1/
w file2
g
s/\(.*\/\).*\.txt/\1/
w file1
Note how we save the line in the hold buffer with h, restore it with g, and use the write (w) command to produce the two output files. There are many other ways to do it with sed, but I like this one for using multiple different commands.
To use it:
sed -f sed_script testfile
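Put together, the script can be tested end to end; the version below writes the directory part (trailing slash included) to file1 and the file name to file2, matching the layout asked for in the question:

```shell
# #n on the first line is equivalent to sed -n (no automatic printing).
cat > sed_script <<'EOF'
#n sed script to get base and file paths
h
s/.*\/\(.*\.txt\)/\1/
w file2
g
s/\(.*\/\).*\.txt/\1/
w file1
EOF

printf '%s\n' /tmp/abc.txt /tmp/cde.txt /tmp/xyz/123.txt > testfile
sed -f sed_script testfile
```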

Here is another one-liner that uses tee and bash process substitution:
cat f1.txt | tee >(xargs -n 1 dirname >> f2.txt) >(xargs -n 1 basename >> f3.txt) >/dev/null
The copy that tee writes to stdout is discarded via /dev/null.

Related

How to add N blank lines between all rows of a text file?

I have a file that looks like:
a
b
c
d
Suppose I want to add N blank lines between all of them (in the example 3, but I actually need 20 or 100 depending on the file):
a



b



c



d
I can add one blank line between all of them with sed
sed -i '0~1 a\\' file
But sed -i '0~3 a\\' file inserts one line every 3 rows.
You may use with GNU sed:
sed -i 'G;G;G' file
The G;G;G will append three empty lines below each line: every G appends the (empty) hold space plus a newline to the pattern space.
Or, awk:
awk 'BEGIN{ORS="\n\n\n\n"};1'
Since ORS replaces the single newline after each record, four newlines produce three blank lines.
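The sed variant is easy to check on a tiny input (every G appends the empty hold space, i.e. one newline, to the pattern space):

```shell
# Two input lines; each comes out followed by three blank lines.
printf 'a\nb\n' | sed 'G;G;G'
```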
If you need to set the number of blank lines dynamically, pass a literal newline in a variable and build an ORS of count+1 newlines:
nl="
"
awk -v nl="$nl" 'BEGIN{for(c=0;c<=3;c++) v=v nl; ORS=v};1' file > newfile
With GNU awk:
awk -i inplace -v lines=3 '{print; for(i=0;i<lines;i++) print ""}' file
Update with Ed's hints (see comments):
awk -i inplace -v lines=3 '{print; for(i=1;i<=lines;i++) print ""}' file
Update (without trailing empty lines):
awk -i inplace -v lines=3 'NR==1; NR>1{for(i=1;i<=lines;i++) print ""; print}' file
Output to file:
a



b



c



d
With sed and coreutils:
N=4
sed "\$b;$(yes G\; | head -n$N)" infile
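For example, with N=4 the command substitution generates four G commands (newline-separated), and \$b skips them on the last line:

```shell
N=4
# yes G\; | head -n$N emits N sed G commands; \$b branches to the end
# on the last line so no trailing blank lines are produced.
printf 'a\nb\n' | sed "\$b;$(yes G\; | head -n$N)"
```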
Similar trick with awk, building an ORS of N+1 literal \n sequences (awk's command-line assignment expands them into real newlines, which a plain command substitution would otherwise strip):
N=4
awk 1 ORS="\n$(yes \\n | head -n$N | tr -d '\n')" infile
Unlike the sed version, this one also appends blank lines after the final line.
This might work for you (GNU sed):
sed ':a;G;s/\n/&/2;Ta' file
This will add 2 blank lines after each line.
Change the 2 to whatever number of blank lines you desire between lines.
An alternative (more efficient?):
sed '1{x;:a;/^.\{2\}/!s/^/\n/;ta;s/.//;x};G' file
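The looping variant can be seen in action (GNU sed; the loop keeps appending newlines via G until s/\n/&/2 finds a second one):

```shell
# Each of the two input lines comes out followed by two blank lines.
printf 'a\nb\n' | sed ':a;G;s/\n/&/2;Ta'
```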

sed replace text between comma

I have csv files in which f needs to be changed to 0 and t to 1, but only when the value stands alone between commas, in every single csv where it matches. From:
,t,t,f,f,a,t,f,t,f,f,t,f,
tftf
to:
,1,1,0,0,a,1,0,1,0,0,1,0,
tftf
It works this way, but I would like to know a better way that reduces the time spent on the replacements:
for i in 1 2 3 4 5 6
do
echo "converting tables for mariaDB"
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,t\,/\,1\,/g'
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,f\,/\,0\,/g'
echo "$i time(s) changed "
done
I expect that one single command could change the lines.
Could you please try the following. Though it is not a perfect solution, it is the simplest to use in case you don't have gawk's latest version, where the -i inplace edit option is present.
for file in *.csv
do
awk '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' "$file" > temp && mv temp "$file"
done
OR
for file in *.csv
do
awk -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' "$file" > temp && mv temp "$file"
done
2nd solution: using gawk's latest version, where we can save the edit into the Input_file itself:
gawk -i inplace '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' *.csv
OR
gawk -i inplace -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' Input_file
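As a quick check of the field-walking variant on the sample line from the question (sample.csv is an illustrative name; no gawk or in-place editing is needed for the check):

```shell
printf '%s\n' ',t,t,f,f,a,t,f,t,f,f,t,f,' tftf > sample.csv

# Walk fields 2..NF-1 and map lone t/f values; other fields are untouched,
# and the bare tftf line has no commas, so it passes through unchanged.
awk -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' sample.csv
```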
The main problem, in this case, is that a regular expression does not allow overlap when parsing it with sed 's/ere/str/g' or awk '{gsub(ere,str,$0)}'. This comment nicely explains how you can circumvent this in sed using the t<label> command, which means: if a change happened to the pattern space, move to <label>. The comment shows a generic way of doing it. The awk alternative to this rule would be:
$ awk '{while(match($0,ere)) gsub(ere,str)}1'
An alternative sed solution in the case of the OP's example could use the following idea:
duplicate all commas. Since we are searching for strings of the form ",t,", this duplication avoids overlap in the s command.
since no overlap is possible, replace all ",f," with ",0," and all ",t," with ",1,".
we can now revert all duplicated commas again. As no overlap is allowed, sequences like ,,,, will be nicely converted to ,, and not to ,.
In POSIX sed this looks like:
$ sed -e 's/,/,,/g' -e 's/,f,/,0,/g' \
-e 's/,t,/,1,/g' -e 's/,,/,/g' file > file.tmp
$ mv file.tmp file
With GNU sed we can do it in one go:
$ sed -i 's/,/,,/g;s/,f,/,0,/g;s/,t,/,1,/g;s/,,/,/g' file
With awk, this would look like:
$ awk 'BEGIN{FS=",";OFS=FS FS}
{$1=$1;gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(OFS,FS)}1' file > file.tmp
$ mv file.tmp file
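Checking the comma-doubling trick on the sample line from the question:

```shell
printf '%s\n' ',t,t,f,f,a,t,f,t,f,f,t,f,' tftf > file

# Double, replace, then halve the commas; the doubling guarantees that
# every lone t/f is surrounded by its own pair of commas, so no two
# s/// matches need to share a comma.
sed -e 's/,/,,/g' -e 's/,f,/,0,/g' \
    -e 's/,t,/,1,/g' -e 's/,,/,/g' file
```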

awk/sed solution for printing only next line after it matches a pattern

I have multiple files in a folder. This is what a file looks like:
File1.txt
ghfgh gfghh
dffd kjkjoliukjkj
sdf ffghf
sf 898575
sfkj utiith
##
my data to be extracted
I want to extract the line immediately below the "##" pattern from all the files and write those lines to an output file. I want the file name to be appended too in the output file.
Desired output
>File1
My data to be extracted
>File2
My data to be extracted
>File3
My data to be extracted
This is what I tried:
awk '/##/{getline; print FILENAME; print ">"; print}' *.txt > output.txt
This assumes one extraction per file (otherwise the filename header will be repeated):
$ awk '/##/{f=1; next} f{print ">"FILENAME; print; f=0}' *.txt > output.txt
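A self-contained run of the flag-based awk (file names and contents here are illustrative; note that FILENAME is printed as-is, extension included):

```shell
printf '%s\n' 'sf 898575' '##' 'my data' > File1.txt
printf '%s\n' '##' 'other data' > File2.txt

# /##/ arms the flag; the next line is printed with its file name,
# and the flag is disarmed so only one line per marker is emitted.
awk '/##/{f=1; next} f{print ">"FILENAME; print; f=0}' File1.txt File2.txt
```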
Perl to the rescue!
perl -ne 'print ">$ARGV\n", scalar <> if /^##/' -- *.txt > output.txt
-n reads the input line by line
$ARGV contains the current input file name
scalar <> reads one line from the input
A quick way with grep:
grep -A1 '##' *.txt | grep -v '##' > output.txt
Note that with multiple files grep prefixes each line with its file name, and -A1 emits -- group separators that you may also want to filter out.
POSIX or GNU sed:
$ sed -n '/^##/{n;p;}' file
my data to be extracted
grep and sed:
$ grep -A 1 '##' file | sed '1d'
my data to be extracted
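The sed command can be checked on a small sample (file is an illustrative name):

```shell
printf '%s\n' 'sfkj utiith' '##' 'my data to be extracted' > file

# On a /^##/ match, n replaces the pattern space with the next line; p prints it.
sed -n '/^##/{n;p;}' file    # my data to be extracted
```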

awk print overwrite strings

I have a problem using awk in the terminal.
I need to move many files as a group from the current directory to another one, and I have the list of the necessary files in a text file, like:
filename.txt
file1
file2
file3
...
I usually type:
paste filename.txt | awk '{print "mv "$1" ../dir/"}' | sh
and it executes:
mv file1 ../dir/
mv file2 ../dir/
mv file3 ../dir/
It usually works, but now the command has changed its behaviour: awk overwrites the beginning of each line with the last string ../dir/, restarting the print from the initial position, so I obtain:
../dire1 ../dir/
../dire2 ../dir/
../dire3 ../dir/
and of course it cannot be executed.
What's happened?
How do I solve it?
Your input file contains carriage returns (\r aka control-M). Run dos2unix on it before running a UNIX tool on it.
idk what you're using paste for though, and you should not be using awk for this at all anyway; it's just a job for a simple shell command. Remove the echo once you've tested this:
$ < file xargs -n 1 -I {} echo mv "{}" "../dir"
mv file1 ../dir
mv file2 ../dir
mv file3 ../dir
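The dry run is easy to reproduce (with echo in place, nothing is actually moved):

```shell
printf '%s\n' file1 file2 file3 > filename.txt

# Each input line becomes one echoed mv command.
< filename.txt xargs -n 1 -I {} echo mv "{}" "../dir"
```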

Trying to use variable in sed or awk

I have 2 separate text files, each in the same exact format. I can grep FILE1.txt for a specific search term and output the line numbers of every match. The line numbers are written in numeric order to a file or a variable.
I want to use each line number and print that line from FILE2.txt, in numeric order, to a single OUTPUT.txt. Does anyone know a way, using awk or sed, to do this?
I have a string variable $linenumbers with values of 25 26 27 28.
I use the following command:
for i in $linenumbers; do sed -n "/$I/p" $i test_read2.fastq >> test.fastq; done
I get errors of
sed: can't read 25: No such file or directory
sed: can't read 26: No such file or directory
sed: can't read 27: No such file or directory
sed: can't read 28: No such file or directory
If I do this sed command one by one, I can pull line number 25, 26, 27 and 28 from the file and print it to file using the following command;
sed -n "25p" test_read2.fastq >> test.fastq
I want to replace "25p" with variable so it pulls out multiple lines (25,26,27,28) from the file without doing this one by one...
Try this:
grep -n interesting FILE1.txt | cut -d: -f1 | while read l
do
sed -n "$l p" FILE2.txt
done
Example:
$ cat FILE1.txt
foo
bar
baz
$ cat FILE2.txt
qux
quux
quuux
$ grep -n bar FILE1.txt | cut -d: -f1 | while read l; do sed -n "$l p" FILE2.txt; done
quux
Not sure what exactly you want to do. If you want to print the lines of file that are defined in lines, you could do:
awk 'NR==FNR{a[$0];next}FNR in a' lines file
test:
$ cat lines
1
3
7
$ cat file
a
b
c
d
e
f
g
$ awk 'NR==FNR{a[$0];next}FNR in a' lines file
a
c
g
sed -n "$(grep -n 'YourPattern' File1.txt | sed 's/:.*/p/')" File2.txt
Be careful with substitution and (double) quotes in YourPattern.