bzip2: Input file file.txt has 1 other link - bzip2

When calling
bzip2 file.txt
I get this error message
bzip2: Input file file.txt has 1 other link
I'm using OSX, but I think this problem is not specific to OSX, so I'm asking here.

I solved it using the force flag: -f
Don't know why.

My solution was to copy the file:
cp file.txt tmp
rm file.txt
mv tmp file.txt
bzip2 file.txt
But perhaps someone could explain it anyway?
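The message is about hard links: the data in file.txt is also reachable under a second name. bzip2 writes a new file.txt.bz2 and deletes the original name, which would leave the other link pointing at the uncompressed data, so it refuses unless -f is given (the man page notes that --force also breaks hard links). Copying works because cp creates a fresh file with a link count of 1. A minimal sketch of the situation, assuming a scratch directory and the hypothetical name other-name.txt:

```shell
# Work in a throwaway directory (hypothetical setup, not from the question).
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "some data" > file.txt
ln file.txt other-name.txt   # second hard link to the same data

ls -l file.txt               # the link count column (2nd field) now shows 2

# bzip2 file.txt             # would refuse here: "has 1 other link"

# The copy workaround from above: the copy is a new file with one link.
cp file.txt tmp && rm file.txt && mv tmp file.txt
ls -l file.txt               # link count is 1 again, bzip2 would accept it
```

Running ls -l on the file before compressing is a quick way to see whether this applies: any link count above 1 triggers the message.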

How to modify sed awk command to work with relative path

Context
I had a SO question successfully answered at https://stackoverflow.com/a/59244265/80353
I have successfully used the command that was given.
cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
|tee -a "$2")
What does this command do?
This command downloads the captions of the YouTube video given as parameter $1 as a .vtt file,
then writes a simplified version of that .vtt file to the file named by parameter $2.
This works as advertised.
How to call the command
In the terminal I will run the above command once and then run cap $youtube_url $full_path_to_output_file
What changes I would like
Currently, the $2 parameter must be a full path. Also, if the file named by $2 doesn't exist, it is created. I would like this behavior to hold for relative paths as well, so that a new file is still created when $2 is a relative path.
Update
I see from the comments that there is apparently nothing wrong with the command.
However, I did try running
cap $youtube_url $relative_path_to_a_text_file and it definitely did not work for me in macOS
Perhaps I am missing something else?
Update 2
This is a video of me running the awk/sed command. First I ran it with just a relative path: no output file shows up in the current working directory. Then I ran it with the full path, and it works.
https://www.loom.com/share/1c179506fa5b48b4a3d62c81a9d2a411
I hope this clarifies the question I am raising, and that the commenters would kindly update their comments based on this video.
EDIT: Adding a solution, after OP's comment, that does the checks inside OP's function itself. Warning: I have not tested it.
cap()(
    path_details="$2"
    ##Resolve a relative path against the caller's directory before the cd /tmp below.
    case "$path_details" in
        /*) ;;
        *) path_details="$PWD/$path_details" ;;
    esac
    user_path=$(echo "$path_details" | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH)}')
    if [[ -d "$user_path" ]]
    then
        echo "Present path $user_path."
        ##Call your program here....##
        cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
        sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
        |tee -a "$path_details"
    else
        echo "NOT present path $user_path."
        ##Can exit from here. if needed.##
    fi
)
I believe OP wants to check whether the directory of the relative path passed as the 2nd argument is present or not. If that is the case, one could try the following.
cat file.ksh
path_details="$2"
PWD=`pwd`
##Why I am going to your path is, in case you are running this from cron, so in that case you can mention complete path here, rather than pwd as mentioned above.
cd "$PWD"
user_path=$(echo "$path_details" | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH)}')
if [[ -d "$user_path" ]]
then
echo "Present path $user_path."
##Call your program here....##
else
echo "NOT present path $user_path."
##Can exit from here. if needed.##
fi
Explanation: Adding detailed explanation for above code.
cat file.ksh ##cat is used here only to show OP the content of the script.
path_details="$2" ##Create variable path_details whose value is $2 (the 2nd argument passed to the script).
PWD=`pwd` ##Create variable PWD whose value is the current working directory.
##If you are running this from cron, you can put a complete path here instead of pwd.
cd "$PWD" ##Change to that directory; setting PWD above to whatever path you need helps when the script runs from cron.
user_path=$(echo "$path_details" | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH)}') ##Extract the directory part (everything up to the last /) of the 2nd argument.
if [[ -d "$user_path" ]] ##Check whether the directory in user_path exists on the system.
then
echo "Present path $user_path."
##Call your program here....## ##If the path exists, call your program.
else ##If the path does NOT exist, exit from the program or print a message, up to you :)
echo "NOT present path $user_path."
##Can exit from here. if needed.##
fi ##Closing the if condition here.
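The likely reason a relative $2 appears to do nothing is that the function runs cd /tmp first, so tee -a "$2" resolves the relative name against /tmp rather than the directory the function was called from. A small sketch of that effect, assuming two throwaway directories (work stands in for the caller's directory, scratch for /tmp) and hypothetical file names:

```shell
work=$(mktemp -d)      # stands in for the directory cap is called from
scratch=$(mktemp -d)   # stands in for /tmp inside the function

cd "$work" || exit 1

# Same shape as the function body: change directory, then write to a relative name.
( cd "$scratch" && echo "captions" | tee -a out.txt >/dev/null )

if [ -e "$scratch/out.txt" ]; then echo "out.txt landed in the scratch directory"; fi
if [ ! -e "$work/out.txt" ]; then echo "nothing was written in the caller's directory"; fi

# Making the path absolute before the cd keeps the output where the caller expects it.
out="$PWD/out2.txt"
( cd "$scratch" && echo "captions" | tee -a "$out" >/dev/null )

if [ -e "$work/out2.txt" ]; then echo "out2.txt landed in the caller's directory"; fi
```

If this is what is happening, the "missing" files from the relative-path runs should be sitting in /tmp.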

Extract a value from a file in bash script

I have this content in my sample file "haproxy-monitoring.conf":
[[inputs.haproxy]]
servers = ["http://localhost:31330/haproxy?stats" ]
Can you please help me extract just the port number '31330' from the file haproxy-monitoring.conf in a bash script?
with sed
$ sed -rn '/servers/s/.*:([0-9]+).*/\1/p' file
or similarly with GNU awk (gensub is a gawk extension)
$ awk '/servers/{print gensub(/.*:([0-9]+).*/,"\\1",1)}' file
awk -F'[:/]' '/servers/{print $5}' file
31330
Or something like
grep -Eo '[0-9]+' file
Questions that state what the output should be without explaining why it should be that leave you open to all sorts of answers unrelated to what you really are trying to do. idk if this is what you want or not since you haven't told us:
$ tr -cd '0-9' < file
31330
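Since the question asks about using this inside a script, here is a hedged sketch that captures the port in a variable and checks that something was found. It uses the sed approach from above in portable basic-regex form; the temporary file and the variable names conf and port are assumptions for illustration:

```shell
# Recreate the sample file from the question in a temporary location.
conf=$(mktemp)
printf '%s\n' '[[inputs.haproxy]]' \
    'servers = ["http://localhost:31330/haproxy?stats" ]' > "$conf"

# Take the digits that follow the last colon on the "servers" line.
port=$(sed -n '/servers/s/.*:\([0-9][0-9]*\).*/\1/p' "$conf")

if [ -n "$port" ]; then
    echo "port is $port"
else
    echo "no port found in $conf" >&2
fi
```

Restricting the match to the servers line keeps the result stable if other lines in the real file happen to contain digits.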

AWK to process compressed files and printing original (compressed) file names

I would like to process multiple .gz files with gawk.
I was thinking of decompressing and passing it to gawk on the fly
but I have an additional requirement to also store/print the original file name in the output.
The thing is there's 100s of .gz files with rather large size to process.
Looking for anomalies (~0.001% rows) and want to print out the list of found inconsistencies ALONG with the file name and row number that contained it.
If I could have all the files decompressed I would simply use FILENAME variable to get this.
Because of large quantity and size of those files I can't decompress them upfront.
Any ideas how to pass filename (in addition to the gzip stdout) to gawk to produce required output?
Assuming you are looping over all the files and piping their decompressed output directly into awk, something like the following will work.
for file in *.gz; do
gunzip -c "$file" | awk -v origname="$file" '.... {print origname " whatever"}'
done
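To make the loop concrete, here is a small runnable sketch. The two sample archives and the "anomaly" rule (a row that does not have exactly 3 fields) are assumptions for illustration only; the real condition goes in place of NF != 3. Because each file is piped into its own awk process, NR is the row number within that file.

```shell
# Build two small sample archives in a scratch directory (hypothetical data).
dir=$(mktemp -d)
printf 'a b c\nd e\n' | gzip > "$dir/one.gz"
printf 'f g h\ni j k\nl\n' | gzip > "$dir/two.gz"

report=$(
    for file in "$dir"/*.gz; do
        # ${file##*/} strips the directory so only the archive name is printed.
        gunzip -c "$file" |
            awk -v origname="${file##*/}" 'NF != 3 { print origname ":" NR ": " $0 }'
    done
)
printf '%s\n' "$report"
```

Nothing is decompressed to disk here, so the approach should scale to hundreds of large files one at a time.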
Edit: To use a list of filenames from some source other than a direct glob something like the following can be used.
$ ls *.awk
a.awk e.awk
$ while IFS= read -d '' filename; do
echo "$filename";
done < <(find . -name \*.awk -printf '%P\0')
e.awk
a.awk
Using xargs instead of the above loop will, I believe, require the body of the command to be in a pre-written script file, which xargs can then call with the filename.
This uses a combination of xargs and sh (so that a pipe can be used between the two commands, gzip and awk):
find *.gz -print0 | xargs -0 -I fname sh -c 'gzip -dc fname | gawk -v origfile="fname" -f printbadrowsonly.awk >> baddata.txt'
I'm wondering if there's any bad practice with the above approach…
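One thing that could be considered risky in the xargs version is that -I fname pastes the file name into the sh -c command text, so a name containing quotes, spaces or $ can change what the shell runs. A common alternative is to pass the name as a positional argument and refer to it as "$1". The sketch below assumes a sample archive and an inline stand-in for printbadrowsonly.awk (the rule NF != 3 is hypothetical):

```shell
dir=$(mktemp -d)
printf 'ok ok ok\nbad\n' | gzip > "$dir/my file.gz"   # name with a space on purpose

# The file name arrives as $1 inside sh; it is never part of the command text.
find "$dir" -name '*.gz' -print0 |
    xargs -0 -n 1 sh -c '
        gzip -dc "$1" | awk -v origfile="$1" "NF != 3 { print origfile \":\" NR }"
    ' sh > "$dir/baddata.txt"

cat "$dir/baddata.txt"
```

Using find with -name instead of find *.gz also avoids depending on the shell expanding the glob first, which can hit argument-length limits with very many files.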

How to run an .awk file without 'awk -f' command?

I am new to awk scripting. I am trying to figure out how to run an awk file without the awk -f command. I see people keep saying to add "#!bin/awk -f" as the first line of an awk file, but this didn't work for me. It still gives me a "no file or directory" error.
My question is: what does "#!bin/awk -f" really mean, and what does it do?
It's #!/bin/awk -f, not #!bin/awk. That will probably work, but there's no guarantee. If someone who has awk installed in a different location runs your script, it won't work. What you want is this: #!/usr/bin/env awk -f.
#! is what tells the system what to use to interpret your script. It should go at the very top of your file. It's called a shebang. Right after that, you put the path to the interpreter.
/usr/bin/env finds where awk is located, and uses that program as the interpreter. So if they installed awk somewhere else, like /usr/local/bin, then it'll find it. This probably won't matter for you, but it's a good habit to get into. It's more portable, and the script can be shared more easily.
The -f says that awk is gonna read from a file. You could do awk -f yourfilename.awk in bash, but in the shebang, -f means the rest of the code will be the file it reads from.
I hope this helped. Feel free to ask me any questions if it doesn't work, or isn't clear enough.
UPDATE
If you get the error message:
/usr/bin/env: ‘awk -f’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines
then change the first line of your script to #!/usr/bin/env -S awk -f (tested with GNU bash, version 4.4.23)
You probably want
#!/bin/awk -f
(The first slash after the #! is important).
This tells unix what program it should use to 'run' the script with.
It is usually called the 'shebang' which comes from hash + bang.
If you want to run your script like this you need to make sure it is executable (chmod +x <script>).
Otherwise you can just run your script by typing the command /bin/awk -f <script>
The Shebang for Awk Explained
#! is the start of a shebang line, which tells the shell which interpreter to use for the script.
/bin/awk is the path to your awk executable. You may need to change this if your awk is installed elsewhere, or if you want to use a different version of awk.
-f is a flag that tells awk to read its program from the file named by the flag's argument. In a shebang line, the system appends the script's own path after the flag, so awk ends up running the remainder of the script as its program.
Your Shebang is (Probably) Broken
You are using #!bin/awk -f which is unlikely to work, unless you have awk installed as $PWD/bin/awk. You probably meant to use #!/bin/awk -f instead.
In some instances, passing a flag on the shebang line may not work with your shell or your awk. If you have the rest of the shebang line correct, you might try removing the -f flag and see if that works for you.
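Putting the pieces together, here is a minimal sketch that builds an executable awk script and runs it directly. To avoid guessing where awk lives, it fills the shebang path in from command -v awk; that lookup, the temporary file and the one-line awk program are all assumptions for illustration.

```shell
awk_path=$(command -v awk)        # absolute path of awk on this machine
script=$(mktemp)

# First line is the shebang with an absolute path and -f; the rest is awk code.
printf '#!%s -f\n{ print NR ": " $0 }\n' "$awk_path" > "$script"

chmod +x "$script"                # without this the script cannot be run directly

out=$(printf 'a\nb\n' | "$script")
printf '%s\n' "$out"
```

head -n 1 "$script" shows the generated shebang, which is a quick way to compare it against the one in a script that fails.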

Execute a command in the reverse order of ids present in a file

I am running the following command using awk on file.txt. Currently it runs the command on the ids present in file.txt from top to bottom. I want the command to be run in the reverse order of the ids present in file.txt. Any inputs on how we can do this?
git command $(awk '{print $1}' file.txt)
file.txt contains.
97a65fd1d1b3b8055edef75e060738fed8b31d3
fb8df67ceff40b4fc078ced31110d7a42e407f16
a0631ce8a9a10391ac4dc377cd79d1adf1f3f3e2
.....
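If you aren't bound to using awk, then tail with the -r (for reverse) argument will do the trick. Note that -r is a BSD/macOS option; GNU tail does not have it, and GNU coreutils provides tac for the same job.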
myFile.txt
97a65fd1d1b3b8055edef75e060738fed8b31d3
fb8df67ceff40b4fc078ced31110d7a42e407f16
a0631ce8a9a10391ac4dc377cd79d1adf1f3f3e2
Now to print it in reverse...
$ tail -r myFile.txt
a0631ce8a9a10391ac4dc377cd79d1adf1f3f3e2
fb8df67ceff40b4fc078ced31110d7a42e407f16
97a65fd1d1b3b8055edef75e060738fed8b31d3
EDIT:
To output this to a file simply redirect it out...
$ tail -r myFile.txt > newFile.txt
EDIT:
Want to write to the same file? No problem!
tail -r myFile.txt > temp.txt; cat temp.txt > myFile.txt; rm temp.txt;
When I redirected tail -r straight to the same file it came back blank, because the shell truncates the file for output before tail gets to read it. This workaround avoids that issue by writing to a temporary "buffer" file first.
To reverse the lines in a file using awk, use
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file
Use $1 instead of $0 above to operate on the first field only instead of the whole line.
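Tying this back to the original command substitution, a hedged sketch: the awk reversal is dropped into $(...) in place of the plain print. The sample ids and the temporary file are made up, and git command is the placeholder from the question, so it is only echoed here.

```shell
ids=$(mktemp)
printf '%s\n' 111 222 333 > "$ids"     # stand-in for file.txt

# Collect the first field of every line, then print them last to first.
reversed=$(awk '{ a[n++] = $1 } END { for (j = n - 1; j >= 0; j--) print a[j] }' "$ids")

# Left unquoted on purpose so each id becomes a separate argument.
echo git command $reversed
```

This relies only on awk, so it behaves the same on systems with or without tail -r or tac.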