awk print overwrite strings - awk

I have a problem using awk in the terminal.
I need to move a group of files from the current directory to another one, and I have the list of the files to move in a text file, like this:
filename.txt
file1
file2
file3
...
I usually type:
paste filename.txt | awk '{print "mv "$1" ../dir/"}' | sh
and it executes:
mv file1 ../dir/
mv file2 ../dir/
mv file3 ../dir/
It usually works, but now the command behaves differently: awk prints the trailing string ../dir/ over the beginning of the line, as if the output restarted from the first column, producing:
../dire1 ../dir/
../dire2 ../dir/
../dire3 ../dir/
and of course it cannot be executed.
What happened?
How do I solve it?

Your input file contains carriage returns (\r aka control-M). Run dos2unix on it before running a UNIX tool on it.
I don't know what you're using paste for, though, and you should not be using awk for this at all anyway. It's a job for a simple shell command, e.g. (remove the echo once you've tested this):
$ < file xargs -I {} echo mv "{}" "../dir"
mv file1 ../dir
mv file2 ../dir
mv file3 ../dir
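To confirm the diagnosis without installing dos2unix, the carriage returns can be made visible and stripped inside awk itself. This is only a sketch: the temporary directory and the sample list are made up for illustration, and the sub(/\r$/, "") call is one assumed way to drop the trailing \r, not the fix proposed above.

```shell
# Build a sample list with DOS line endings (CRLF); hypothetical reproduction.
workdir=$(mktemp -d)
printf 'file1\r\nfile2\r\n' > "$workdir/filename.txt"

# Unfixed: od shows a stray \r right after each file name.
awk '{print "mv "$1" ../dir/"}' "$workdir/filename.txt" | od -c | head -n 3

# Fixed: strip a trailing carriage return before building the command.
fixed=$(awk '{sub(/\r$/, ""); print "mv " $1 " ../dir/"}' "$workdir/filename.txt")
printf '%s\n' "$fixed"
```

The \r is what sends the terminal cursor back to the first column, which is why ../dir/ appears to overwrite the start of each line.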

Related

sed replace text between comma

I have CSV files in which f must be changed to 0 and t to 1, but only where the value stands alone between commas. From:
,t,t,f,f,a,t,f,t,f,f,t,f,
tftf
to:
,1,1,0,0,a,1,0,1,0,0,1,0,
tftf
It works this way, but I would like to know a better way that reduces the time the replacement takes:
for i in 1 2 3 4 5 6
do
echo "converting tables for mariaDB"
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,t\,/\,1\,/g'
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,f\,/\,0\,/g'
echo "$i time(s) changed "
done
I expect a single command to change the whole line.
Could you please try the following. It is not a perfect solution, but it is the simplest one to use if you don't have a recent version of gawk, where the -i inplace option is available.
for file in *.csv
do
awk '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' "$file" > temp && mv temp "$file"
done
OR
for file in *.csv
do
awk -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' "$file" > temp && mv temp "$file"
done
2nd solution: with a recent version of gawk, the edits can be saved into the input file itself.
gawk -i inplace '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' *.csv
OR
gawk -i inplace -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' Input_file
The main problem here is that matches cannot overlap when a regular expression is applied globally with sed 's/ere/str/g' or awk '{gsub(ere,str,$0)}': in ,t,t, the second ,t, shares its leading comma with the first match, so a single pass skips it. This comment nicely explains how you can circumvent this in sed using the t<label> command, which means: if a substitution changed the pattern space, branch to <label>. The comment shows a generic way of doing it. The awk equivalent would be:
$ awk '{while(match($0,ere)) gsub(ere,str)}'
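As a concrete sketch of the retry-until-stable idea, applied to the sample line from the question: the temporary file and the variable names are made up for illustration, and the loops below are one assumed way to write it, not the generic form from the comment.

```shell
# Sample input from the question, written to a throwaway file.
sample=$(mktemp)
printf ',t,t,f,f,a,t,f,t,f,f,t,f,\ntftf\n' > "$sample"

# sed: repeat each substitution until it no longer matches
# (":a" defines a label, "ta" branches back to it after a successful s).
sed_out=$(sed -e ':a' -e 's/,t,/,1,/' -e 'ta' \
              -e ':b' -e 's/,f,/,0,/' -e 'tb' "$sample")

# awk: gsub returns the number of replacements, so loop until it returns 0.
awk_out=$(awk '{ while (gsub(/,t,/, ",1,")) ; while (gsub(/,f,/, ",0,")) ; print }' "$sample")

printf '%s\n' "$sed_out"
```

Both loops terminate here because the replacement text can never match the pattern again; with a replacement that still matches, they would spin forever.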
An alternative sed solution in the case of the OP's example could use the following idea:
1. Duplicate all commas. Since we are searching for strings of the form ",t,", this duplication avoids any overlap between matches.
2. Since no overlap is possible, replace all ",f," with ",0," and all ",t," with ",1,".
3. Revert all duplicated commas. As no overlap is allowed, sequences like ,,,, will be nicely converted to ,, and not ,
In POSIX sed this looks like:
$ sed -e 's/,/,,/g' -e 's/,f,/,0,/g' \
-e 's/,t,/,1,/g' -e 's/,,/,/g' file > file.tmp
$ mv file.tmp file
With GNU sed we can do it in one go:
$ sed -i 's/,/,,/g;s/,f,/,0,/g;s/,t,/,1,/g;s/,,/,/g' file
With awk, this would look like:
$ awk 'BEGIN{FS=",";OFS=FS FS}
{$1=$1;gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(OFS,FS)}1' file > file.tmp
$ mv file.tmp file

awk/sed solution for printing only next line after it matches a pattern

I have multiple files in a folder. This is what a file looks like:
File1.txt
ghfgh gfghh
dffd kjkjoliukjkj
sdf ffghf
sf 898575
sfkj utiith
##
my data to be extracted
I want to extract the line immediately below "##" pattern from all the files and write them to an output file. I want the file name to be appended too in the output file.
Desired output
>File1
My data to be extracted
>File2
My data to be extracted
>File3
My data to be extracted
This is what I tried:
awk '/##/{getline; print FILENAME; print ">"; print}' *.txt > output.txt
This assumes one extract per file (otherwise the filename header will be repeated):
$ awk '/##/{f=1; next} f{print ">"FILENAME; print; f=0}' *.txt > output.txt
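The desired output shows ">File1" rather than ">File1.txt", so the extension has to be removed from FILENAME. A minimal sketch of the same flag technique with that extra step, run on made-up sample files in a temporary directory (the file contents and the variable name grab are illustrative assumptions):

```shell
workdir=$(mktemp -d)
printf 'sf 898575\n##\nmy data to be extracted\n' > "$workdir/File1.txt"
printf 'sfkj utiith\n##\nother data\ntrailing line\n' > "$workdir/File2.txt"

# Run inside the directory so FILENAME is the bare file name.
result=$(cd "$workdir" && awk '
    FNR == 1 { grab = 0 }               # do not carry the flag across files
    /^##/    { grab = 1; next }         # marker seen: take the next line
    grab     { name = FILENAME
               sub(/\.txt$/, "", name)  # File1.txt -> File1
               print ">" name
               print
               grab = 0 }
' *.txt)
printf '%s\n' "$result"
```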
Perl to the rescue!
perl -ne 'print ">$ARGV\n", scalar <> if /^##/' -- *.txt > output.txt
-n reads the input line by line
$ARGV contains the current input file name
scalar <> reads one line from the input
A quick way with grep:
grep -A1 '##' *.txt|grep -v '##' > output.txt
POSIX or GNU sed:
$ sed -n '/^##/{n;p;}' file
my data to be extracted
grep and sed:
$ grep -A 1 '##' file | sed '1d'
my data to be extracted

Redirecting AWK output back to the input file

Consider the bash command, where file is a file with a single nonempty line.
awk '{print "stuff"}' file >> file
It seems like this should do the following: awk reads a line of file, appends "stuff" to the file, and then proceeds to the next line, at which point it should append "stuff" again, and so on forever. Instead, it terminates after writing once. Why is this? Is this a property of the file system, of Unix redirection, or of awk?
It works, you just need a bigger file:
$ echo foo > foo
$ awk '{print $1}' foo >> foo
$ wc -l foo
2 foo
But:
$ for i in {1..4096} ; do echo $i ; done >> foo
$ awk '{print $1}' foo >> foo
^C
$ wc -l foo
19429617 foo
This example uses GNU awk. I assume it opens the file and reads one full block of data, not just one record. If the file holds no more than one block of data, awk reaches EOF and stops reading before any appended output arrives. If it holds more, awk keeps reading until EOF while its own output keeps being appended to the end of the file, so it never catches up.
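If the goal is simply to append awk's output to its own input without this feedback loop, one option is to capture the output first and append it afterwards. This is a sketch under that assumption; the temporary files and the 5000-line input are made up for illustration.

```shell
# Build an input large enough to trigger the runaway behaviour described above.
f=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i }' > "$f"

# Snapshot awk's output first, then append it, so awk never reads its own output.
snapshot=$(mktemp)
awk '{print $1}' "$f" > "$snapshot" && cat "$snapshot" >> "$f"
rm -f "$snapshot"

# The file now holds exactly twice the original number of lines.
lines=$(($(wc -l < "$f")))
echo "$lines"
```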

Concatenating multiple files into a single line in linux

I have 3 fasta files like following
>file_1_head
haszhaskjkjkjkfaiezqbsga
>file_1_body
loizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdja
>file_1_tail
mnnbasnbdnztoaosdhgas
I would like to concatenate them into a single file like the following:
>file_1
haszhaskjkjkjkfaiezqbsgaloizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdjamnnbasnbdnztoaosdhgas
I tried the cat command, cat file_1_head.fasta file_1_body.fasta file_1_tail.fasta, but it doesn't concatenate them into a single line like above. Is it possible with awk? Kindly guide me.
Do you mean your three files have the content
file_1_head.fasta
>file_1_head
haszhaskjkjkjkfaiezqbsga
file_1_body.fasta
>file_1_body
loizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdja
and file_1_tail.fasta
>file_1_tail
mnnbasnbdnztoaosdhgas
including the name of each of them within them as the first line?
Then you could do
(echo ">file_1"; tail -qn -1 file_1_{head,body,tail}.fasta | tr -d "\n\t ") > file_1.fasta
to get file_1.fasta as
>file_1
haszhaskjkjkjkfaiezqbsgaloizztzezzqieovbahsgzqwqoiropoqiwoioioiweoitwwerweuiruwieurhcabccjashdjamnnbasnbdnztoaosdhgas
This also removes some extra whitespace at the end of the lines in your input that I got when I copied them verbatim.
You can do this simply with
cat file1 file2 file3 | tr -d '\n' > new_file
tr deletes the newline character.
EDIT:
For your specific first line just do
echo file_1 > new_file
cat file1 file2 file3 | tr -d '\n' >> new_file
The first command creates the file with one line file_1 in it. Then the cat... command just appends to this file.
What about this?
awk 'BEGIN { RS=""} {for (i=1;i<=NF;i++) { printf "%s",$i } }' f1_head f1_body f1_tail
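Since the question asks about awk specifically, here is a sketch that skips each file's own header line and joins only the sequence lines under a new header. The short sequences and the temporary directory are placeholders, and the header name file_1 is taken from the desired output in the question.

```shell
workdir=$(mktemp -d)
# Placeholder sequences; the first one has trailing whitespace on purpose.
printf '>file_1_head\nAAA \n' > "$workdir/file_1_head.fasta"
printf '>file_1_body\nCCC\n'  > "$workdir/file_1_body.fasta"
printf '>file_1_tail\nGGG\n'  > "$workdir/file_1_tail.fasta"

merged=$(awk '
    BEGIN { print ">file_1" }                       # new combined header
    /^>/  { next }                                  # drop per-file headers
          { gsub(/[ \t\r]/, ""); printf "%s", $0 }  # join lines, no newline
    END   { print "" }                              # final newline
' "$workdir/file_1_head.fasta" "$workdir/file_1_body.fasta" "$workdir/file_1_tail.fasta")
printf '%s\n' "$merged"
```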

Shell script: How to split line?

Here's my scenario:
My input file looks like this:
/tmp/abc.txt
/tmp/cde.txt
/tmp/xyz/123.txt
and I'd like to obtain the following output in 2 files:
first file
/tmp/
/tmp/
/tmp/xyz/
second file
abc.txt
cde.txt
123.txt
Thanks a lot.
Here is all in one single awk
awk -F\/ -vOFS=\/ '{print $NF > "file2";$NF="";print > "file1"}' input
cat file1
/tmp/
/tmp/
/tmp/xyz/
cat file2
abc.txt
cde.txt
123.txt
Here we set input and output separator to /
Then print last field $NF to file2
Set the last field to nothing, then print the rest to file1
I realize you already have an answer, but you might be interested in the following two commands:
basename
dirname
If they're available on your system, you'll be able to get what you want just piping through these:
cat input | xargs -l dirname > file1
cat input | xargs -l basename > file2
Enjoy!
Edit: Fixed per quantdev's comment. Good catch!
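If starting one dirname and one basename process per line is a concern, the same split can be sketched with shell parameter expansion alone. This assumes every line contains at least one slash; the temporary directory and the file descriptors 3 and 4 are illustrative choices.

```shell
workdir=$(mktemp -d)
printf '/tmp/abc.txt\n/tmp/cde.txt\n/tmp/xyz/123.txt\n' > "$workdir/input"

# ${path%/*} drops the last "/" and what follows; ${path##*/} keeps only what follows it.
while IFS= read -r path; do
    printf '%s/\n' "${path%/*}"  >&3   # directory part, with the trailing slash restored
    printf '%s\n'  "${path##*/}" >&4   # file name part
done < "$workdir/input" 3> "$workdir/file1" 4> "$workdir/file2"

cat "$workdir/file1" "$workdir/file2"
```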
Through grep,
grep -o '.*/' file > file1.txt
grep -o '[^/]*$' file > file2.txt
.*/ matches all the characters from the start of the line up to and including the last / symbol.
[^/]*$ matches zero or more characters that are not /. The $ asserts that we are at the end of the line.
The awk solution is probably the best, but here is a pure sed solution :
#n sed script to split each path into directory and file name
h
s/\(.*\/\).*\.txt/\1/
w file1
g
s/.*\/\(.*\.txt\)/\1/
w file2
Note how we hold the buffer with h, and how we use the write (w) command to produce the output files. There are many other ways to do it with sed, but I like this one for using multiple different commands.
To use it :
> sed -f sed_script testfile
Here is another one-liner that uses tee:
cat f1.txt | tee >(xargs -n 1 dirname >> f2.txt) >(xargs -n 1 basename >> f3.txt) &>/dev/null