How to number the ls output in unix? - awk

I am trying to write a file with format - "id file_absolute_path" which basically lists down all the files recursively in a folder and give an identifier to each file listed like 1,2,3,4.
I can get the absolute path of the files recursively using the following command:
ls -d -1 $PWD/**/*/*
However, I am unable to give an identifier from the output of the ls command. I am sure this can be done using awk, but can't seem to solve it.

Pipe the output through cat -n.

Assuming x is your command:
x | awk '{print NR, $0}'
will number the output lines

Two posible commands:
ls -d -1 $PWD/**/*/* | cat -n
ls -d -1 $PWD/**/*/* | nl
nl puts numbers to file lines.
I hope this clarifies too.

There is a tool named nl for that.
ls -la | nl

If you do ls -i, you'll get the inode number which is great as an id.
The only potential issue with using inodes is if you folder spans multiple file systems as an inode is only guaranteed to be unique within a file system.

ls -d -1 $PWD/**// | awk ' {x = x + 1} {print x " " $0} '

Related

How to parse a column from one file in mutiple other columns and concatenate the output?

I have one file like this:
head allGenes.txt
ENSG00000128274
ENSG00000094914
ENSG00000081760
ENSG00000158122
ENSG00000103591
...
and I have a multiple files named like this *.v7.egenes.txt in the current directory. For example one file looks like this:
head Stomach.v7.egenes.txt
ENSG00000238009 RP11-34P13.7 1 89295 129223 - 2073 1.03557 343.245
ENSG00000237683 AL627309.1 1 134901 139379 - 2123 1.02105 359.907
ENSG00000235146 RP5-857K21.2 1 523009 530148 + 4098 1.03503 592.973
ENSG00000231709 RP5-857K21.1 1 521369 523833 - 4101 1.07053 559.642
ENSG00000223659 RP5-857K21.5 1 562757 564390 - 4236 1.05527 595.015
ENSG00000237973 hsa-mir-6723 1 566454 567996 + 4247 1.05299 592.876
I would like to get lines from all *.v7.egenes.txt files that match any entry in allGenes.txt
I tried using:
grep -w -f allGenes.txt *.v7.egenes.txt > output.txt
but this takes forever to complete. Is there is any way to do this in awk or?
Without knowing the size of the files, but assuming the host has enough memory to hold allGenes.txt in memory, one awk solution comes to mind:
awk 'NR==FNR { gene[$1] ; next } ( $1 in gene )' allGenes.txt *.v7.egenes.txt > output.txt
Where:
NR==FNR - this test only matches the first file to be processed (allGenes.txt)
gene[$1] - store each gene as an index in an associative array
next stop processing and go to next line in the file
$1 in gene - applies to all lines in all other files; if the first field is found to be an index in our associative array then we print the current line
I wouldn't expect this to run any/much faster than the grep solution the OP is currently using (especially with shelter's suggestion to use -F instead of -w), but it should be relatively quick to test and see ....
GNU Parallel has a whole section dedicated to grepping n lines for m regular expressions:
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Grepping-n-lines-for-m-regular-expressions
You could try with a while read loop :
#!/bin/bash
while read -r line; do
grep -rnw Stomach.v7.egenes.txt -e "$line" >> output.txt
done < allGenes.txt
So here tell the while loop to read all the lines from the allGenes.txt, and for each line, check whether there are matching lines in the egenes file. Would that do the trick?
EDIT :
New version :
#!/bin/bash
for name in $(cat allGenes.txt); do
grep -rnw *v7.egenes.txt* -e $name >> output.txt
done

bash script variables weird results when using cp

If I use cp inside a bash script the copied file will have weird charachters around the destination filename.
The destination name comes from the results of an operation, it's put inside a variable, and echoing the variable shows normal output.
The objective is to name a file after a string.
#!/bin/bash
newname=`cat outputfile | grep 'hostname ' | sed 's/hostname //g'
newecho=`echo $newname`
echo $newecho
cp outputfile "$newecho"
If I launch the script the echo looks ok
$ ./rename.sh
mo-swc-56001
However the file is named differently
~$ ls
'mo-swc-56001'$'\r'
As you can see the file contains extra charachters which the echo does not show.
Edit: the newline of the file is like this
# file outputfile
outputfile: ASCII text, with CRLF, CR line terminators
I tried in every possible way to get rid of the ^M charachter but this is an example of the hundreds of attempts
# cat outputfile | grep 'hostname ' | sed 's/hostname //g' | cat -v
mo-swc-56001^M
# cat outputfile | grep 'hostname ' | sed 's/hostname //g' | cat -v | sed 's/\r//g' | cat -v
mo-swc-56001^M
This newline will stay there. Any ideas?
Edit: crazy, the only way is to perform a dos2unix on the output...
Looks like your outputfile has \r characters in it, so you could add logic there to remove them and give it a try then.
#!/bin/bash
##remove control M first from outputfile by tr command.
tr -d '\r' < outputfile > temp && mv temp outputfile
newname=$(sed 's/hostname //g' outputfile)
newecho=`echo $newname`
echo $newecho
cp outputfile "$newecho"
The only way was to use dos2unix

SSH recursively change all subfolders names to a specific name

i have hundreads of folders with a subfolder named "thumbs" under each folder. i need to change the "thumbs" subfolder name with "thumb", under each subfolder.
i tried
find . -type d -exec rename 's/^thumbs$/thumb/' {} ";"
and i run this in shell when i am inside the folder that contains all subfolders, and each one of these subfolders contains the "thumbs" folder that need to be renamed with "thumb".
well I ran that command and shell stayed a lot of time thinking, then i gave a CTRL+C to stop, but I checked and no folder was renamed under current directory, I dont know if i renamed folders outside the directory i was in, can someone tell me where i am wrong with the code?
Goal 1: To change a subfolder "thumbs" to "thumb" if only one level deep.
Example Input:
./foo1/thumbs
./foo2/thumbs
./foo2/thumbs
Solution:
find . -maxdepth 2 -type d | sed 'p;s/thumbs/thumb/' | xargs -n2 mv
Output:
./foo1/thumb
./foo2/thumb
./foo2/thumb
Explanation:
Use find to give you all "thumbs" folders only one level deep. Pipe the output to sed. The p option prints the input line and the rest of the sed command changes "thumbs" to "thumb". Finally, pipe to xargs. The -n2 option tells xargs to use two arguments from the pipe and pass them to the mv command.
Issue:
This will not catch deeper subfolders. You can't simple not use depth here because find prints the output from the top and since we are replacing things with sed before we mv, mv will result in a error for deeper subfolders. For example, ./foo/thumbs/thumbs/ will not work because mv will take care of ./foo/thumbs first and make it ./foo/thumb, but then the next output line will result in an error because ./foo/thumbs/thumbs/ no longer exist.
Goal 2: To change all subfolders "thumbs" to "thumb" regardless of how deep.
Example Input:
./foo1/thumbs
./foo2/thumbs
./foo2/thumbs/thumbs
./foo2/thumbs
Solution:
find . -type d | awk -F'/' '{print NF, $0}' | sort -k 1 -n -r | awk '{print $2}' | sed 'p;s/\(.*\)thumbs/\1thumb/' | xargs -n2 mv
Output:
./foo1/thumb
./foo2/thumb
./foo2/thumb/thumb
./foo2/thumb
Explanation:
Use find to give you all "thumbs" subfolders. Pipe the output to awk to print the number of '/'s in each path plus the original output. sort the output numerically, in reverse (to put the deepest paths on top) by the number of '/'s. Pipe the sorted list to awk to remove the counts from each line. Pipe the output to sed. The p option prints the input line and the rest of the sed command finds the last occurrence of "thumbs" and changes only it to "thumb". Since we are working with sorted list in the order of deepest to shallowest level, this will provide mv with the right commands. Finally, pipe to xargs. The -n2 option tells xargs to use two arguments from the pipe and pass them to the mv command.

Shell script: How to split line?

here's my scanerio:
my input file like:
/tmp/abc.txt
/tmp/cde.txt
/tmp/xyz/123.txt
and i'd like to obtain the following output in 2 files:
first file
/tmp/
/tmp/
/tmp/xyz/
second file
abc.txt
cde.txt
123.txt
thanks a lot
Here is all in one single awk
awk -F\/ -vOFS=\/ '{print $NF > "file2";$NF="";print > "file1"}' input
cat file1
/tmp/
/tmp/
/tmp/xyz/
cat file2
abc.txt
cde.txt
123.txt
Here we set input and output separator to /
Then print last field $NF to file2
Set the last field to nothing, then print the rest to file1
I realize you already have an answer, but you might be interested in the following two commands:
basename
dirname
If they're available on your system, you'll be able to get what you want just piping through these:
cat input | xargs -l dirname > file1
cat input | xargs -l basename > file2
Enjoy!
Edit: Fixed per quantdev's comment. Good catch!
Through grep,
grep -o '.*/' file > file1.txt
grep -o '[^/]*$' file > file2.txt
.*/ Matches all the characters from the start upto the last / symbol.
[^/]*$ Matches any character but not of / zero or more times. $ asserts that we are at the end of a line.
The awk solution is probably the best, but here is a pure sed solution :
#n sed script to get base and file paths
h
s/.*\/\(.*.txt\)/\1/
w file1
g
s/\(.*\)\/.*.txt/\1/
w file2
Note how we hold the buffer with h, and how we use the write (w) command to produce the output files. There are many other ways to do it with sed, but I like this one for using multiple different commands.
To use it :
> sed -f sed_script testfile
Here is another oneliner that uses tee:cat f1.txt | tee >(xargs -n 1 dirname >> f2.txt) >(xargs -n 1 basename >> f3.txt) &>/dev/random

grep a number from the line and append it to a file

I went through several grep examples, but don't see how to do the following.
Say, i have a file with a line
! some test here and number -123.2345 text
i can get this line using
grep ! input.txt
but how do i get the number (possibly positive or negative) from this line and append it to the end of another file? Is it possible to apply grep to grep results?
If yes, then i could get the number via something like
grep -Eo "[0-9]{1,}|\-[0-9]{1,}"
p/s/ i am using OS-X
p/p/s/ i'm trying to fetch data from several files and put into a single file for later plotting.
The format with your commands would be:
grep ! input.txt | grep -Eo "[0-9]{1,}|\-[0-9]{1,}" >> output
To grep from grep we use the pipe operator | this lets us chain commands together. To append this output to a file we use the redirection operator >>.
However there are a couple of problems. You regexp is better written: grep -Eoe '-?[0-9.]+' this allows for the decimal and returns the single number instead of two and if you want lines that start with ! then grep ^! is better to avoid matches with lines what contain ! but don't start with it. Better to do:
grep '^!' input | grep -Eoe '-?[0-9.]+' >> output
perl -lne 'm/.*?([\d\.\-]+).*/g;print $1' your_file >>anotherfile_to_append
$foo="! some test here and number -123.2345 text"
$echo $foo | sed -e 's/[^0-9\.-]//g'
$-123.2345
Edit:-
for a file,
[ ]$ cat log
! some test here and number -123.2345 text
some blankline
some line without "the character" and with number 345.566
! again a number 34
[ ]$ sed -e '/^[^!]/d' -e 's/[^0-9.-]//g' log > op
[ ]$ cat op
-123.2345
34
Now lets see the toothpicks :) '/^[^!]/d' / start of pattern, ^ not (like multiply with false), [^!] anyline starting with ! and d delete. Second expression, [^0-9.-] not matching anything within 0 to 9, and . and -, (everything else) // replace with nothing (i.e. delete) and done :)