Automatically add database entry after ftp upload - sql

Sorry if this seems stupid but I wonder if it's possible to add a database entry after an ftp upload.
To be more clear, thanks to winSCP I have several folders sending everything I put in there automatically to my server.
However, I would like to create a mysql entry for each uploaded files and once again, automatically. Is it possible to do that? How?
To gives the full details of what I need to do, you can read the following.
I have several folders with pictures and each folders are uploaded automatically.
Each of those folders belong to one user and the goal is to give them an account and allow them to see and download those files through a web interface. Since one account = one folder, that's kinda easy.
And I think a simple .htaccess can simply secure things so one user can only see and download the file in his own repository, no?
However if I want them to be able to see what's new (=something they didn't download or simply mark as read) I think I need a table to manage those files.
Something like id | file (string) | read (bool).
If you think this way to proceed is bad, they I'm open to change how to do things, but to be clear uploading the file need to work this way. Not using any kind of formulary.
Thanks for reading that, sorry for my english.

Your problem contains three steps:
Folders/Files been automatically uploaded to your server directory, as you say, this been efficiently handled by winSCP.
You need to update your database with all the files and folders present in your server directory.
You need to update whether or not it is been read/downloaded by the user.
Since your first step is in place, we don't need anything there. For second step, you should write a script and schedule that script to run at a fixed time interval using CRON (if using LINUX or UNIX, or WINDOWS). The script would be responsible to create a list of file(s) present in the directory, and simply insert the file(s) information that are not present in your database.
EDIT:
This edit is to describe how your script file should work. As I explained, the cron jobs would simply help you run your script file in fixed set of interval (which can be every minute, or every hour, or every day, and so on). Lets say your database table has following columns:
fileid (varchar[20])
filepath (varchar[20])
status (boolean)
Your script file should do following things:
Create a list of existing filepaths in your server directory
Run a select query, create a list of existing filepaths from database table.
Compare list1 with list2, and find the ones that doesn't exist in list2 (This would give you a list of filepath that needs to be inserted into table)
Just insert the list of file paths you got above, and set there status to be false (which means the file is not read/downloaded yet)
NOTE: Please keep in mind that I am not advising right now that how your database table should look like. It can be what you have proposed or can even differ depending on your will or requirements.
For the third step, simply keep the status of your file to be unread when creating entries in your table from the second step, and then when user click on the file link in your application whether to view or download it, send a POST request to your server updating the file status to be marked as read.
Let me know if this helps!

Related

Multiple users executing the same workflow

Are there guidelines regarding how to share a Snakemake workflow among multiple users on the same data under Linux, or is the whole thing considered bad practice?
Let me explain in case it's not clear:
Suppose user A executes a workflow in directory dir/. Assume the workflow terminates successfully, and he/she then properly sets file/directory permissions recursively on all output and intermediate files and the .snakemake/ subdirectory for other users to read/write, of course.
User B subsequently navigates to dir/, adds input files to the workflow, then executes it. Can anything go wrong?
TL;DR: I'm asking about non-concurrent execution of the same workflow by distinct users on the same system, and on the same data on disk. Is Snakemake designed for such use cases?
It's possible to run snakemake --nolock which will prevent locking of the directory, so multiple runs can be made from inside the same directory. However, without lock, there's now an opening for errors due to concurrent runs trying to modify the same files. It's probably OK, if you are certain that this will be avoided, e.g. if you are in constant communication with another user about which files will be modified.
An alternative option is to create a third directory/path, and put all the data there. This way you can work from separate directories/path and avoid costly recomputes.
I would say that from the point of view of snakemake, and workflow management in general, it's ok for user B to add or update input files and re-run the pipeline. After all, one of the advantages of a workflow management system is to update results according to new input. The problem is that user A could find her results updated without being aware of it.
From the top of my head and without more detail this is what I would suggest. Make snakemake read the list of input files from a table (pandas comes in handy for this) or from some configuration file. Keep this sample sheet under version control (with git/github) together with the Snakefile and other source code.
When users update the working directory with new files, they will also need to update the sample sheet in order for snakemake to "see" the new input and other users will know about it via version control. I prefer this setup over dumping files in a directory and letting snakemake process whatever is in there.

What's a best approach to create a filestore

This is an open ended question. I have noob understanding of databases but willing to learn whatever is required. Though I believe my problem could be done without learning a lot.
So, here goes the question:
I have large amount of files getting generated in mt projects(depending on the builds) and I need to archive them and also need to reproduce them according to buildNumber if requested by users. I don't expect these requests to be a lot. May be 1-2 requests a day.
For eg: 16GB data per build every week. Most of the files in weekly builds are duplicate. And I don't want to archive them again and again. I prefer to store them only once. There is one caveat that it can happen that the files relative location can change, even though content hasn't changed.
My approach is as follow: Create a hash from each file. Create the key-value pair as fileHash-actual file and store it. Store this information in some kind of manifest file for each build. So, I should be able to create the builds back with correct files/paths etc.
Can it ever happen that 2 different files will ever have the same hash? Can some database help to do it efficiently? I am currently thinking of dumping all files in one folder.
Thanks

Move file in one AccuRev workspace that has been edited in another workspace

We have a need to refactor a code base. The thing is that this will be done by one person and it would be desirable to avoid having the rest of the development team sitting idle while this job takes place.
We therefore tried the following scenario to see if it is possible to work in parallel.
Created file test.txt in directory first in developer A's workspace.
Promoted this file.
Updated developer B's workspace, thereby getting file test.txt
In A's workspace moved file test.txt to directory second.
Promoted this move.
In B's workspace edited file test.txt while it still resides in directory first (no update is made thereby emulating that work is done while refactoring is taking place).
Tried to promote and got a message saying that file test.txt had been modified (correct, file has been moved).
Tried to merge but got an error message saying that AccuRev can't merge since the file is missing in directory second (where it has been moved).
Tried to update B's workspace but that is not allowed since there is a modified file that needs to be merged first.
We are now stuck in a catch 22 situation.
We did try to place a fake file in directory second but that is not being recognized since this file does not belong to the workspace.
Has anyone out there tried something like this and gotten it to work?
It is of course possible to copy files but if there is a better way we would be grateful to hear about this. Or if this is a known bug or limitation in the tool.
We will contact also contact AccuRev support but I thought that I might be able to get some useful tips from the community.
Currently we are using AccuRev client 5.5.0.
Thanks for any suggestions on how to make the tool support this operation.
Referring to your steps 6 & 7: In AccuRev 5.5 after a file is edited and has a (modified) status you first have to keep before you can promote.
At step 8 you could try doing the merge from the Browse Versions view of the file. That way you can select any node to merge with, including the one that has been moved.
Step 9. An AccuRev update will not run successfully if one of the files to be updated is (modified). This is by design. You can keep the file so it has (kept)(member) status then run the update.
David Howland
After contact with AccuRev support the answer is that the only option available is to copy the file to some temp directory, revert the changes, update the workspace and copy the file into the new location in the workspace.
AccuRev will at least tell you which files you have to copy since they will be marked as modified.
I could experimentally verify David's remark to step 9 using AccuRev 5.5.
Let's assume that in the workspace of user A the file was moved and the move was promoted, while in the workspace of user B the file was modified and user B is about to promote his/her change.
Before the file is kept, it will not be possible for user B neither to merge nor to update. But after keeping the modified file the update is possible. The file is first marked as overlap, then the merge succeeds in the new location. Basically, this avoids creating a copy of the file, reverting it and restoring it in the new location after an update, which can be quite cumbersome, as AccuRev does not reveal easily where the move goes.
If user B promotes the modification before user A promotes the move, all goes smoothly, i.e. on update the moved file appears as overlap, but easily merges into the moved file in the new location.
Similar results are obtained when the two users have workspaces connected to different streams and the overlap occurs on a common parent stream. Only if the file is unkept, an error can occur (i.e. only if the move is promoted before the change). Then a simple keep allows to proceed as usual (update, merge, then promote).

How do i force a file to be deleted? Windows server 2008

On my site a user may upload a file (pic, zip, audio, video, whatever). He then may decide to replace it with a newer revision. This user may upload a file, make a post then decide to put up a new revision replacing the old (lets say its a large zip or tar.gz file). Theres a good chance people may be downloading it if he sent out an email or even im for the home user.
Problem. I need to replace the file and people may be downloading and it may be some minutes before it is deleted. I dont want my code to stall until i cant delete or check every second to see if its unused (especially bad if another user can start and he takes long creating a cycle).
How do i delete the file while users are downloading the file? i dont care if they stop i just care that the file can be replaced and new downloads are the new revision.
What about referencing the files indirectly?
A mapping script, maps a virtual file entry from your site to a real file . If the user wants to upload a new revision of his file you just update the mapping, not the real file.
You can install a daily task that scans all files and deletes all files without a mapping and without open connections.
lajuette's answer is right, the easiest solution is to work around the file locking altogether:
When a user uploads file foo.zip, internally store it as foo-v1.zip.
Create a mapping file somewhere (database, code, whatever) that maps foo.zip to foo-v1.zip.
Rather than exposing a direct link to the file, expose a link to a service that gets the file: mysite.com/Download?foo.zip or something. This service uses the mapping to determine which version of the file to send to the client.
When a new version is uploaded, create foo-v2.zip and update the mapping file.
It wouldn't be that hard to write a scheduled task that cleans up old, un-mapped files.
If your oppose to a database and If the filenames are in a fix format (such as user/id.ext) you could append the id with a revision number and enumerate the folder using a pattern (user/id-*) and use the latest revision.

Accessing tables from different .mdb files

I need to show a grid of saved projects (compare "orders") in a datagrid, where the projects are saved in an Access 2000 database with a similar schema as follows:
ID Name Country_ID Plant_Type
1 'Test' 1 1
2 'Second' 2 2
Let's call the file "Projects.mdb". This is then showed in the datagrid as:
ID Name Country Plant Type
1 'Test' 'Germany' 'Free Range'
2 'Second' 'France' 'Inclined Roof'
where the countries and "Plant Types" are fetched from a different table in a different .mdb file (also Access 2000, call it "Language.mdb", although there is a lot of different background data in it), depending on the current user's language preference. It is unfortunately not an alternative to merge these .mdb's into one file.
To be able to show the datagrid I have so far linked the tables from "Language.mdb" into "Projects.mdb", but this screws up when the project is being installed on another computer with the .msi file i created (we'd like to have this easily packaged and installed), as the "Language.mdb" doesn't exist on the linked path on the target computer (Basically the problem here).
I can come up with the following solutions:
Force all users to install on the same path, so that the links will work (undesirable)
Use connection strings in the query as shown here on MSDN (still trying this out, but I need to work on the details)
make a post-install script that relinks the tables according to the correct path.
But I think I'm doing something wrong here. As stated above, it is not an option to merge the .mdb-files, but other suggestions to changing the database schema or whatever it could be (I'm not very experienced with databases) would be very appreciated.
To get around the 'different install paths' problem I use code (on every database load) which first looks for any back end databases in the current db folder; if not found, it asks the user to locate the missing .mdb file. Then the code relinks the database(s). Once the dbs have been successfully linked, the database saves the path and checks this path first on subsequent loads.
Well, based on the constraints that you have put on the solution. I would either go with option 2 or 3. There is not an elegant solution to this at all.
I would however, lean towards your third option, as a "one time" fix to get the files linked, so that the path between them is known, and you are not dynamically adding path information into every query.
note
I'll just mention, but I'm sure you already know this, that if you are looking at doing something like this, it just feels wrong to be doing it with Access, let alone access 2000 at this time for client deployments. I would strongly recommend additionally truly evaluating the solution and see if you can either merge to one, or possibly move to SQL Server Express or something that you could send off to the user as an installer
Is Project split, as it should be, to allow a front end on each user's computer? If so, can you not store the path on the front-end and only re-link if it changes? Code to re-link tables is quite simple, for the most part. The user can be allowed to browse for the location and the Connect property can be updated accordingly.