stat vs mkdir with EEXIST

I need to create a folder if it does not exist, so I use:
bool mkdir_if_not_exist(const char *dir)
{
    bool ret = false;
    if (dir) {
        // first check if folder exists
        struct stat folder_info;
        if (stat(dir, &folder_info) != 0) {
            if (errno == ENOENT) { // create folder
                if (mkdir(dir, S_IRWXU | S_IXGRP | S_IRGRP | S_IROTH | S_IXOTH) != 0) // 755
                    perror("mkdir");
                else
                    ret = true;
            } else
                perror("stat");
        } else
            ret = true; // dir exists
    }
    return ret;
}
The folder is created only during the first run of the program; after that the call is just a check.
There is a suggestion to skip the stat call, call mkdir unconditionally, and check errno against EEXIST.
Does that give any real benefit?

More importantly, the stat + mkdir approach has a race condition: between the stat and the mkdir another process could create the directory, so your mkdir could still fail with EEXIST.

There's a slight benefit. Look up 'LBYL vs EAFP' or 'Look Before You Leap' vs 'Easier to Ask Forgiveness than Permission'.
The slight benefit is that the stat() system call has to parse the directory name and get to the inode - or the missing inode in this case - and then mkdir() has to do the same. Granted, the data needed by mkdir() is already in the kernel buffer pool, but it still involves two traversals of the path specified instead of one. So, in this case, it is slightly more efficient to use EAFP than to use LBYL as you do.
However, whether that is really a measurable effect in the average program is highly debatable. If you are doing nothing but create directories all over the place, then you might detect a benefit. But it is definitely a small effect, essentially unmeasurable, if you create a single directory at the start of a program.
You might need to deal with the case where dir is "/some/where/or/another" and "/some/where" exists, but neither "/some/where/or" nor (of necessity) "/some/where/or/another" does. Your current code does not handle missing directories in the middle of the path; it just reports the ENOENT that mkdir() would report. Your look-before-you-leap code does not check that dir actually is a directory, either - it just assumes that if it exists, it is a directory. Handling these variations properly is trickier.

Similar to Race condition with stat and mkdir in sequence, your solution is incorrect not only because of the race condition (as the other answers here have already pointed out), but also because you never check whether the existing file is a directory or some other type of file.
When re-implementing functionality that's already widely available in existing command-line tools in UNIX, it always helps to see how it was implemented in those tools in the first place.
For example, take a look at how mkdir(1)'s -p option is implemented across the BSDs (bin/mkdir/mkdir.c#mkpath in OpenBSD and NetBSD). On mkdir(2) failure, all of them immediately call stat(2) and run the S_ISDIR macro to ensure that the existing file is a directory, and not just any other type of file.


Get directories count of IPFS

I installed ipfs version 0.8.0 on WSL Ubuntu 18.04, started ipfs using sudo ipfs daemon, and added 2 directories using the command sudo ipfs add -r /home/user/ipfstest. The result looks like this:
added QmfYH2KVxANPA3um1W5MYWA6zR4Awv8VscaWyhhQBVj65L ipfstest/abc.sh
added QmTXny9ZjuFPm4C4KbQSEYxvUp2MYbSCLppPQirW7ap4Go ipfstest
Likewise, I added one more directory containing 2 files. Now I need the total number of files and directories in my ipfs using go-ipfs-api. This is my code:
package main

import (
    "context"
    "fmt"
    "net/http"
    "os"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"

    "github.com/ipfs/go-ipfs-api"
)

var sh *shell.Shell

func main() {
    sh := shell.NewShell("localhost:5001")
    dir, err := sh.FilesLs(context.TODO(), "")
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %s", err)
        os.Exit(1)
    }
    fmt.Printf("Dir are: %d", dir)
    pins, err := sh.Pins()
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %s", err)
        os.Exit(1)
    }
    fmt.Printf("Pins are: %d", len(pins))
    dqfs_pincount.Add(float64(len(pins)))
    prometheus.MustRegister(dqfs_pincount)
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8090", nil)
}
If I run this code, I get the output as:
Dir are: [824634392256] Pins are: 18
The pin count increments as I add files. But what is this output [824634392256]? And why only one?
I tried giving a path to the function: dir, err := sh.FilesLs(context.TODO(), "/.ipfs"), since I guessed the files and directories must be stored in ~/.ipfs. But this gives an error:
error: files/ls: file does not exist
How can I get all directories of ipfs? Where am I mistaken? What path should I provide as a parameter? Please help and guide.
There's a bit to unpack here.
Why are you using sudo?
IPFS is meant to be run as a regular user. Generally you don't want to run it as root, but you'd instead run the same commands, just without sudo:
ipfs daemon
ipfs add -r /home/user/ipfstest
...
Code doesn't compile
Let's begin with the code and make sure it's working as intended before moving forward. First off, your import:
"github.com/ipfs/go-ipfs-api"
Should read:
shell "github.com/ipfs/go-ipfs-api"
Otherwise the code won't compile, because of your use of shell later in the code.
Why does dir produce the output it does?
Next, let's look at your usage of dir. FilesLs gives you back a slice of pointers to MfsLsEntry (MfsLsEntry). You're outputting that with the format verb %d, which renders a base-10 integer (docs), so the "824634392256" is just the memory address of the MfsLsEntry object in the first (and only) index of the slice - not a count of anything.
Why does sh.FilesLs(context.TODO(),"/.ipfs") fail?
Well, FilesLs isn't querying the regular filesystem your OS runs on, but MFS. MFS is stored locally, but using the add API doesn't automatically add anything to your MFS. You can use FilesCp to add a CID to your MFS after you add it, though.
How do I list my directories on IPFS?
This is a bit of a tricky question. The only data really retained on IPFS is either data that is pinned, or data referenced in the MFS. As we learned above, the FilesLs command lists the files/directories on your MFS. To list your recursive pins (directories), it's quite simple from the command line:
ipfs pin ls -t recursive
For the API, you'll first want to call something like Shell.Pins(), filter for the pins you want (a quick scan pulling out anything recursive), then query the CIDs using Shell.ObjectStat or whatever you prefer.
If working with the pins though, do remember that it won't feel quite like a regular mutable filesystem, because it isn't. It's much easier to navigate through CIDs added to MFS. So that's how I'd recommend you list your directories on IPFS.
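As a rough sketch of that filtering step: the plain map below stands in for whatever structure Shell.Pins() actually returns (the "recursive"/"indirect" type strings are an assumption on my part, matching what `ipfs pin ls` prints), and the CIDs are the ones from the question:

```go
package main

import (
	"fmt"
	"sort"
)

// recursivePins filters a map of CID -> pin type down to the recursive
// pins, which correspond to directories added with `ipfs add -r`.
func recursivePins(pins map[string]string) []string {
	var dirs []string
	for cid, typ := range pins {
		if typ == "recursive" {
			dirs = append(dirs, cid)
		}
	}
	sort.Strings(dirs) // map iteration order is random; sort for stable output
	return dirs
}

func main() {
	pins := map[string]string{
		"QmTXny9ZjuFPm4C4KbQSEYxvUp2MYbSCLppPQirW7ap4Go": "recursive",
		"QmfYH2KVxANPA3um1W5MYWA6zR4Awv8VscaWyhhQBVj65L": "indirect",
	}
	fmt.Println(recursivePins(pins))
}
```

The CIDs that survive the filter are the ones you'd then feed to Shell.ObjectStat (or similar) to enumerate their contents.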

How to create directory if it doesn't exist?

If I want to write a file to C:/output/results.csv what's a simple way to make the directory if it doesn't exist? I want to do this because CSV.write(path,data) errors if C:/output/ doesn't exist.
mkdir errors if the directory already exists. I am currently doing the following, but is there a safer/cleaner way to do this?
try
    mkdir("C:/output")
catch
    # if it errors, the directory likely already exists
end
Edit:
As one of the commenters pointed out, mkpath will create a directory if it doesn't exist, and in either case will return the directory name.
My question was confounding the usage of mkdir (which errors if directory exists) and mkpath which does not error in that case.
You could explicitly check whether the directory exists beforehand using isdir:
isdir(dir) || mkdir(dir)
CSV.write(joinpath(dir, "results.csv"), data)
But this will not necessarily handle all corner cases, like when the path already exists but is a link to another directory. The mkpath function in the standard library should handle everything for you:
mkpath(path)
CSV.write(joinpath(path, "results.csv"), data)
mkpath(path) will create the directory if it does not exist and return the path after doing so. If it already exists, the path is simply returned.

Efficient way to find if a given string is _not_ listed in (sqlite3) table

I have a Db table listing media files which have been archived to LTO (4.3 million of them). The ongoing archiving process is manual, carried out by different people as and when downtime arises. We need an efficient way of determining which files in a folder are not archived so we can complete the job if needed, or confidently delete the folder if it's all archived.
(For the sake of argument let's assume all filenames are unique, we do need to handle duplicates but that's not this question.)
I should probably just fire up Perl/Python/Ruby and talk to the Db thru them. But it would take me quite a while to get back up to speed in those and I have a nagging feeling that it would be overkill.
I can think of two simpler approaches, but each has drawbacks and I wonder if there's a better way.
Method 1: simply bash-recurse down each directory structure, invoking sqlite3 per file and outputting the filename if the query returns an empty result.
This is probably less efficient than
Method 2: recurse through the directory structure and produce an sql file which will:
create a table with all our on-disk files in it (let's call it the "working table")
compare that with the archive table - select all files in the working table but not in the archive table
destroy the working table, or quit without saving
While 2 seems likely to be more efficient than 1, building the comparison table in the first place might incur some overhead, and I had imagined the backup table as a monolithic read-only thing that people refer to and don't write into.
Is there any way in pure SQL to just output a list of not-founds (without them existing in another table)?
Finding values not in some other table is easy:
SELECT *
FROM SomeTable
WHERE File NOT IN (SELECT File
                   FROM OtherTable);
To create the other table, you can write a series of INSERT statements, or just use the .import command of the shell from a plain text file.
A temporary table will not be saved.
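For reference, Method 2 can be sketched end to end with nothing but a TEMP table and the NOT IN query above. This uses Python's built-in sqlite3 module; the table and column names follow the question, and the filenames are made up:

```python
import sqlite3

# In-memory stand-in for the real archive database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Backup_Table (filename TEXT)")
con.executemany("INSERT INTO Backup_Table VALUES (?)",
                [("a.mov",), ("b.mov",)])

# The "working table": TEMP tables live only for this connection,
# so the archive database itself is never modified.
con.execute("CREATE TEMP TABLE OnDisk (filename TEXT)")
con.executemany("INSERT INTO OnDisk VALUES (?)",
                [("a.mov",), ("c.mov",)])

unarchived = [row[0] for row in con.execute(
    "SELECT filename FROM OnDisk "
    "WHERE filename NOT IN (SELECT filename FROM Backup_Table)")]
print(unarchived)  # ['c.mov']
```

The same two statements work verbatim in the sqlite3 shell, where you would fill OnDisk with the .import command instead of INSERTs.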
Sooo, I think I have to answer my own question.
tl;dr - use a scripting language (the thing I was hoping to avoid)
Trying that and the other two approaches (details below) on my system yields the following numbers when checking a 33-file directory structure against the 4.3 million record Db:
A Ruby script: 0.27s
Bash running sqlite3 once per file ("Method 1"): 0.73s
SQL making a temp table and using "NOT IN" (Method 2): 8s
The surprising thing for me is that the all-SQL approach is an order of magnitude slower than Bash. This was true with both the macOS (10.12) command-line sqlite3 and the GUI "DB Browser for SQLite".
The details
Script method
This is the crux of my Ruby script. Ruby is of course not the fastest language out there, and you could probably do better than this (but if you really need speed, it might be time for C):
require "sqlite3"

db = SQLite3::Database.open 'path/to/mydb.db'

# This will skip POSIX hidden files, which is fine by me
Dir.glob("search_path/**/*") do |f|
  file = File.stat(f)
  next unless file.file?
  short_name = File.basename(f)
  quoted_short_name = short_name.gsub("'", "''")
  size = File.size(f)
  sql_cmd = "select * from 'Backup_Table' where filename='#{quoted_short_name}' and sizeinbytesincrsrc=#{size}"
  count = db.execute(sql_cmd).length
  if count == 0
    puts "UNARCHIVED: #{f}"
  end
end
(Note the next two are Not The Answer, but I'll include them if anyone wants to check my methodology)
Bash
This is a crude Bash recurse-through-files which will print a list of files that are backed up (not what I want, but gives me an idea of speed):
#! /bin/bash
recurse() {
    for file in *; do
        if [ -d "${file}" ]; then
            thiswd=`pwd`
            (cd "${file}" && recurse)
            cd "${thiswd}"
        elif [ -f "${file}" ]; then
            fullpath=`pwd`/${file}
            filesize=`stat -f%z "${file}"`
            sqlite3 /path/to/mydb.db "select filename from 'Backup_Table' where filename='$file'"
        fi
    done
}
cd "$1" && recurse
SQL
CL has detailed Method 2 nicely in their answer.

How to avoid race condition when checking if file exists and then creating it?

I'm thinking of corner cases in my code and I can't figure out how to avoid problem when you check if file exists, and if it does not, you create a file with that filename. The code approximately looks like this:
// 1
status = stat(filename, &info);
if (status != 0) {
    // 2: file does not seem to exist
    create_file(filename);
}
Between the call to 1 and 2 another process could create the filename. How to avoid this problem and is there a general solution to this type of problems? They happen often in systems programming.
This is what the O_EXCL | O_CREAT flags to open() were designed for:
If O_CREAT and O_EXCL are set, open() shall fail if the file exists. The check for the existence of the file and the creation of the file if it does not exist shall be atomic with respect to other threads executing open() naming the same filename in the same directory with O_EXCL and O_CREAT set. If O_EXCL and O_CREAT are set, and path names a symbolic link, open() shall fail and set errno to [EEXIST], regardless of the contents of the symbolic link. If O_EXCL is set and O_CREAT is not set, the result is undefined.
So:
fd = open(FILENAME, O_EXCL | O_CREAT | O_RDWR, 0644);
if (fd < 0) { /* file exists, or there were problems like permissions */
    fprintf(stderr, "open() failed: \"%s\"\n", strerror(errno));
    abort();
}
/* file was newly created */
You're supposed to just try to create the file, telling the OS whether you want a new file created if one doesn't already exist. You shouldn't perform a separate existence check beforehand.

How do I extend this batch command?

I came across this piece of batch code. It should find the path to every single .exe file if you enter it.
@set Which=%~$PATH:1
@if "%Which%"=="" ( echo %1 not found in path ) else ( echo %Which% )
For instance, if you save this code in the file which.bat and then go to its directory in DOS, you can write
which notepad.exe
The result will be: C:\WINDOWS\System32\notepad.exe
But it's a bit limited in that it can't find other executables. I've done a bit of batch, but I don't see how I can edit this code so that it can crawl the hard drive and return the exact path.
When you want to find an executable (or other file) anywhere on the drive, not just in PATH, then perhaps only the following will work reliably:
dir /s /b \*%~x1 | findstr "%1"
But still, it's horribly slow. And it doesn't work with cyclic directory structures. And it probably eats children.
You may be much better off using either Windows Search (depending on the OS) or writing a program from scratch that does exactly what you want (the cyclic-directory thing can happen on recent Windows versions pretty easily; afaik they ship with such links by default).
Here's the same thing written in python:
import os

def which(program, additional_dirs=[]):
    path = os.environ["PATH"]
    # os.pathsep is ";" on Windows and ":" elsewhere
    path_components = path.split(os.pathsep)
    path_components.extend(additional_dirs)
    for item in path_components:
        location = os.path.join(item, program)
        if os.path.exists(location):
            return location
    return None
If called with just one argument, this will only search the PATH. If called with two arguments (the second being a list), the extra directories will be searched as well. Here are some snippets:
# this will search for notepad.exe in the PATH variable
print(which("notepad.exe"))
# this will search for whatever.exe in PATH. If not found there,
# it will continue searching in the D:\ drive and in Program Files
print(which("whatever.exe", ["D:/", "C:/Program Files"]))