A script to change file names - awk

I am new to awk and shell programming. I have a bunch of files named file_0001.dat, file_0002.dat, ..., file_1000.dat. I want to rename them so that the number after file_ is multiplied by 4. So I want to change
file_0001.dat to file_0004.dat
file_0002.dat to file_0008.dat
and so on.
Can anyone suggest a simple script to do this? I have tried the following, but without success.
#!/bin/bash
a=$(echo $1 sed -e 's:file_::g' -e 's:.dat::g')
b=$(echo "${a}*4" | bc)
shuf file_${a}.dat > file_${b}.dat

This script will do the trick for you (it relies on the Perl-based rename command, which takes a substitution expression):
#!/bin/bash
for i in `ls -r *.dat`; do
a=`echo $i | sed 's/file_//g' | sed 's/\.dat//g'`
almost_b=`bc -l <<< "$a*4"`
b=`printf "%04d" $almost_b`
rename "s/$a/$b/g" $i
done
Files before:
file_0001.dat file_0002.dat
Files after first execution:
file_0004.dat file_0008.dat
Files after second execution:
file_0016.dat file_0032.dat

Here's a pure bash way of doing it (without bc, rename or sed).
#!/bin/bash
for i in $(ls -r *.dat); do
prefix="${i%%_*}_"
oldnum="${i//[^0-9]/}"
newnum="$(printf "%04d" $(( 10#$oldnum * 4 )))"
mv "$i" "${prefix}${newnum}.dat"
done
To test it you can do
mkdir tmp && cd $_
touch file_{0001..1000}.dat
(paste code into convert.sh)
chmod +x convert.sh
./convert.sh
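The loops above all walk the files in reverse order; otherwise file_0001.dat would be moved onto file_0004.dat before that file has been renamed itself. A minimal sketch of the same idea that uses a glob instead of parsing ls output, assuming bash and exactly four digits in every name (rename_times4 is a hypothetical name chosen for this sketch):

```shell
#!/bin/bash
# rename_times4 DIR: multiply the number in every DIR/file_NNNN.dat by 4.
# Hypothetical helper for illustration; assumes 4-digit numbers and results below 10000.
rename_times4() {
    local dir=${1:-.} files i f num
    files=("$dir"/file_[0-9][0-9][0-9][0-9].dat)   # the glob expands in ascending order
    [ -e "${files[0]}" ] || return 0               # no match: the pattern stays literal
    # Walk the list backwards so a target name is always free when mv runs.
    for (( i = ${#files[@]} - 1; i >= 0; i-- )); do
        f=${files[i]}
        num=${f##*_}       # drop everything through the last underscore: 0002.dat
        num=${num%.dat}    # drop the extension: 0002
        # 10# forces base 10, so 0008 is not rejected as an invalid octal number.
        mv -- "$f" "$dir/$(printf 'file_%04d.dat' "$(( 10#$num * 4 ))")"
    done
}
```

Calling `rename_times4 .` in the test directory created above should give the same result as convert.sh.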

Using bash/sed/find:
files=$(find -name 'file_*.dat' | sort -r)
for file in $files; do
n=$(sed 's/[^_]*_0*\([^.]*\).*/\1/' <<< "$file")
let n*=4
nfile=$(printf "file_%04d.dat" "$n")
mv "$file" "$nfile"
done

ls -r1 | awk -F '[_.]' '{printf "%s %s_%04d.%s\n", $0, $1, 4*$2, $3}' | xargs -n2 mv
ls -r1 lists the files in reverse order, so that no file is renamed onto a name that is still in use.
The awk part generates the new file name and prints it next to the old one. For example, for file_0002.dat it prints: file_0002.dat file_0008.dat
xargs -n2 passes two arguments at a time to mv.
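To check what the awk stage produces before letting mv loose, the middle of the pipeline can be run on its own. A small sketch, with the awk program wrapped in a function whose name (new_name_pairs) is made up for this example:

```shell
#!/bin/bash
# new_name_pairs: read file names on stdin, print "old new" pairs.
# Hypothetical wrapper around the awk program from the one-liner above.
new_name_pairs() {
    # -F '[_.]' splits file_0002.dat into: file, 0002, dat
    awk -F '[_.]' '{printf "%s %s_%04d.%s\n", $0, $1, 4*$2, $3}'
}

# Dry run on a single name; nothing is renamed here.
printf 'file_0002.dat\n' | new_name_pairs
# prints: file_0002.dat file_0008.dat
```

Once the pairs look right, piping them into xargs -n2 mv performs the renames.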

This might work for you:
paste <(seq -f'mv file_%04g.dat' 1000) <(seq -f'file_%04g.dat' 4 4 4000) |
sort -r |
sh

This can help:
#!/bin/bash
for i in `cat /path/to/requestedfiles |grep -o '[0-9]*'`; do
count=`bc -l <<< "$i*4"`
echo $count
done

Make 'awk' exit if it is given an empty file list from a subshell

I run:
find mydir -type f -name "the_thing.txt"
And I get nothing (the file is not there).
Then I run:
awk '{print $0}' $(find mydir -type f -name "the_thing.txt")
And I get the shell stuck in awk (because the input file was not specified, and awk is now waiting for standard input).
How can I make awk (or cat) just print nothing and exit in case find does not output anything?
Your previous post included the -maxdepth 1 option, which makes the file path unique.
That is why I asked about it. Now that the option has been removed, I understand what you mean by some subdirectory.
Then would you please try:
find mydir -type f -name "the_thing.txt" -print0 | xargs -0 -r awk '{print $0}'
Please note that the -r option to xargs suppresses the execution if the input is empty.
If you want to limit the input to a single file with the head command, you can say:
find mydir -type f -name "the_thing.txt" -print0 | head -n1 -z | xargs -0 -r awk '{print $0}'
The -z option to head was introduced in coreutils 8.25 (around January 2016).
If your head command does not support that option, use this instead:
find mydir -type f -name "the_thing.txt" | head -n1 | xargs -r awk '{print $0}'
which is less robust against file names that contain blank characters.
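The effect of -r is easy to try in a scratch directory. A minimal sketch, assuming an xargs that supports -0 and -r (GNU findutils does); print_matches is a hypothetical wrapper name used only here:

```shell
#!/bin/bash
# print_matches DIR NAME: print the contents of every file called NAME under DIR.
# Hypothetical helper; with no match, xargs -r never starts awk, so nothing
# waits on standard input.
print_matches() {
    local dir=$1 name=$2
    find "$dir" -type f -name "$name" -print0 | xargs -0 -r awk '{print $0}'
}
```

With an empty directory, `print_matches mydir the_thing.txt` should return immediately with no output; after the file is created, it should print the file's contents.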

How to paste values of specific columns of a file into another command?

I want to use fastacmd to extract specific regions of FASTA sequences.
To do that I need to supply the name of the FASTA file (-d), the name of the sequence (-s) and the positions of the region to extract (-L). For example:
fastacmd -d OAP11402.1.fa -s OAP11402.1 -L 50,100
The problem is that I have hundreds of files (each file holds one sequence with the same name as the file), and the positions to extract for each sequence are in a protein database (info_sequences.txt). So I want to write a loop that pastes the file name, the sequence name and the positions from info_sequences.txt into the fastacmd command.
info_sequences.txt looks like this:
File seq_id position_start position_end
OAP11402.1.fa OAP11402.1 50 100
OAP15774.1.fa OAP15774.1 75 200
OAP10214.1.fa OAP10214.1 33 310
I think that awk could help, but I'm struggling with how to paste the info into the fastacmd command.
source <(
awk 'NR > 1 {
printf "echo fastacmd -d %s -s %s -L %d,%d\n", $1, $2, $3, $4
}' info_sequences.txt
)
The awk command spits out all the commands.
Then the source <( ... ) evaluates the commands in your current shell.
Same advice as Cyrus: if the output looks OK, remove the echo.
Or, do it all in awk:
awk 'NR > 1 {
cmd = "echo fastacmd -d " $1 " -s " $2 " -L " $3 "," $4
system(cmd)
}' info_sequences.txt
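# -L 1 runs one command per input line and splits the line into separate arguments
# (with -I {} the whole line would reach fastacmd as a single argument once echo is removed)
awk 'NR>1 {print "-d",$1,"-s",$2,"-L",$3","$4}' info_sequences.txt | xargs -L 1 echo fastacmd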
Output:
fastacmd -d OAP11402.1.fa -s OAP11402.1 -L 50,100
fastacmd -d OAP15774.1.fa -s OAP15774.1 -L 75,200
fastacmd -d OAP10214.1.fa -s OAP10214.1 -L 33,310
If everything looks okay, remove echo.
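The same loop can also be written in the shell alone, letting read split the columns. A minimal sketch, assuming whitespace-separated columns and a single header row as in the sample above (build_fastacmd_calls is a hypothetical name):

```shell
#!/bin/bash
# build_fastacmd_calls TABLE: print one fastacmd command per data row of TABLE.
# Hypothetical helper; it only echoes the commands, as in the answers above.
build_fastacmd_calls() {
    local table=$1 file seq_id start end
    # tail -n +2 skips the header row; read splits each row into four fields.
    tail -n +2 "$table" | while read -r file seq_id start end; do
        echo fastacmd -d "$file" -s "$seq_id" -L "$start,$end"
    done
}
```

As with the other answers, `build_fastacmd_calls info_sequences.txt` is a dry run; dropping the echo inside the loop would execute fastacmd for each row.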

Make a find | awk pipeline work in a Makefile

I have this find/awk line to count the lines of my Python code::
$ find ./ -name '*.py' -exec wc -l {} \; | sort -n| awk '{print $0}{s+=$0}END{print s}'
12 ./gb/__init__.py
23 ./gb/value_type.py
40 ./setup.py
120 ./gb/libcsv.py
200
$
I tried to put it in a Makefile::
$ cat Makefile
python_count_lines: clean
	@find ./ -name '*.py' -exec wc -l {} \; | sort -n| awk '{print \$0}{s+=\$0}END{print s}'
But this did not work::
$ make python_count_lines
awk: line 1: syntax error at or near }
Makefile:12: recipe for target 'python_count_lines' failed
make: *** [python_count_lines] Error 2
$
Bertrand Martel is correct that you need to escape dollar signs from make by doubling them, not by prefixing them with backslashes.
However, the rest of that suggestion is not right and won't work. First, you should almost never use the shell function in a recipe. Second, using the info function here cannot work: the first line sets a shell variable RES to some value, and the second line then tries to print the make variable RES. On top of that, each recipe line is run in a separate shell, and all make variable and function references are expanded up front, before any part of the recipe is passed to the shell.
You just need to do this:
python_count_lines: clean
	@find ./ -name '*.py' -exec wc -l {} \; | sort -n| awk '{print $$0}{s+=$$0}END{print s}'
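The doubling can be tried without touching the real Makefile. A minimal sketch, assuming make is installed; demo_make_dollar and the show target are hypothetical names, and the Makefile is a throwaway written to a temporary directory:

```shell
#!/bin/bash
# demo_make_dollar: write a one-rule Makefile whose recipe contains $$2 and run it.
# Hypothetical helper for illustration only.
demo_make_dollar() {
    local dir
    dir=$(mktemp -d) || return 1
    # The format string is single-quoted, so the shell leaves $$2 untouched;
    # \t becomes the tab that must start a recipe line.
    printf 'show:\n\t@echo "a b c" | awk '\''{print $$2}'\''\n' > "$dir/Makefile"
    # make turns $$2 into $2 before handing the line to the shell.
    make -s -f "$dir/Makefile" show
    rm -rf "$dir"
}

# Usage: demo_make_dollar   (prints: b)
```

With a backslash instead of the second dollar sign, make would expand $2 itself (to nothing) and awk would receive a broken program, which matches the syntax error in the question.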

Linux Grep or Awk to find strings and store into array

I would like to print the strings in the following pattern and store them in an array. Please help me; I need the output as follows:
test11
orcl
My attempt:
egrep -i ":Y|:N" /etc/oratab | cut -d":" -f1 | grep -v "\#" | grep -v "\*" | tr -d '\n' | sed 's/ /\n/g' | awk '{print $1}'
Output of the above command:
test11orcl
The contents of oratab are as follows:
[oracle@rhel6112 scripts]$ cat /etc/oratab
#
# This file is used by ORACLE utilities. It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
# $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N
End of cat output
From the above file I am trying to extract the string before the :/
As a start, try this:
$ cat input.txt
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N
$ awk -F: '{print $1}' input.txt
test11
orcl
update
Using bash:
#!/bin/bash
ARRAY=()
while read -r line
do
[[ "$line" = \#* ]] && continue
data=$(awk -F: '{print $1}' <<< "$line")
ARRAY+=("$data")
done < input.txt
for i in "${ARRAY[@]}"
do
echo "$i"
done
In action:
$ ./db.sh
test11
orcl
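The read loop can also be replaced by a single awk call whose output is loaded into the array in one step. A minimal sketch, assuming bash 4 or later for mapfile; read_sids and the SIDS array are hypothetical names for this example:

```shell
#!/bin/bash
# read_sids [FILE]: fill the global array SIDS with the first field of every
# non-comment entry in FILE (default /etc/oratab). Hypothetical helper.
read_sids() {
    local oratab=${1:-/etc/oratab}
    # Skip comment lines; NF > 1 also drops blank lines and lines without a colon.
    mapfile -t SIDS < <(awk -F: '!/^[[:space:]]*#/ && NF > 1 {print $1}' "$oratab")
}
```

After `read_sids input.txt`, looping over "${SIDS[@]}" should print test11 and orcl for the sample file.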
You could use sed also,
sed -r 's/^([^:]*):.*$/\1/g' file
Example:
$ cat cc
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N
$ sed -r 's/^([^:]*):.*$/\1/g' cc
test11
orcl
OR
$ sed -nr 's/^(.*):\/.*$/\1/p' file
test11
orcl

Get total number of lines of code?

Does anyone know if it is possible to get the total number of lines of code from all the classes in my Objective-C project?
Right now I am guessing that this is not possible but I just wanted to make sure.
If it is possible does anyone know how to do it?
If you like the terminal and have all your files in the same folder, try:
$ wc *.m
To get at the number in your code, you could run it as a shell script build phase that generates a header file for you. E.g.
cd source_folder
wc -l *.m \
| tail -1 \
| awk '{ print "#define kNumberOfLines " $1 }' \
> lines_of_code_header.h
Then include that file and use the constant as you like.
find . -type f -name "*.[mh]" -exec wc -l '{}' \; | awk '{sum+=$1} END {print sum}'
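A variant of the same pipeline concatenates the files and counts once, instead of starting wc for every file. A minimal sketch (count_objc_lines is a hypothetical name; files that lack a final newline may be undercounted by one line each, because cat joins them to the next file):

```shell
#!/bin/bash
# count_objc_lines [DIR]: print the total number of lines in all .m and .h
# files under DIR (default: current directory). Hypothetical helper.
count_objc_lines() {
    local dir=${1:-.}
    # -exec cat {} + passes many files to each cat; awk prints the record count.
    find "$dir" -type f -name '*.[mh]' -exec cat {} + | awk 'END {print NR}'
}
```

Run as `count_objc_lines source_folder`; the number counts blank lines and comments as well, just like wc -l.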