Changing hash of a files - scripting

I have a folder full of binary files and I want to make a change to these files so that the hash of these files will change. I want to do this is a fashion that doesn't pertinently corrupt the files. Meaning that the change should still allow the file to operate normally or that I should be able to undo the change at any point in time.
Does anyone know of a script that I could use to do this or many a program that will automate this?
Cheers
UPDATE
Its a edge case that I am trying to deal with. I have a system that only allows me to store a file with a given hash once. Hence I am wanting to change the content hash of the file to allow the file to be stored. Note the system in question is not one I control or can change.
Couldn't I just add a random 1 to the end of the file and then remove it afterward without breaking anything? I'm just not sure how to script this - as in how to modify the binary data in this way. Note I'm in a windows environment.

Without knowing the format of the files, we can't tell. It may in fact be impossible - for instance if these binary files are self-signed with some private key. Changing any single bit within the file is likely to render it invalid.
Is your hash calculated purely from the contents, and not any other metadata that you can change (such as filename or modified date)? If so, you're probably out of luck. If the hash is meant to detect when the content changes, but you're trying to change the hash without actually changing the content, you've clearly got a problem...
What is the hash used for? Why do you want to change it? There may be an alternative solution if you could give us more information about the bigger picture.
EDIT: One alternative is to effectively create your own container format - so while a file is stored in your container format, it's not usable in its original form, but it can be extracted easily. Your container could be as simple as "add four bytes at the end as a seed to disturb the hash" - "extracting" the file would just involve copying it and removing the last four bytes. But the important point is that what you end up with isn't an MP3 file or whatever you started with - it's your custom format, simple as it is. You need to package/extract the file any time you interact with the store.

Related

Get MD5 Checksum of online file before download in VB

I have a website that has the old "list files" style of doing things, and I want to perform a hash on a file there before downloading it to the user's local system. I know how to hash a local file, but it seems there's not a lot of info as to whether or not I can do this without downloading the online file. My logic is, if the user already has the same file, why waste time downloading it? So, is it possible to do this?
After further contemplation I decided that the date modified comparison is actually the behavior that I want. If a client were to modify a file on accident, there is now an option to correct it. If they modify it on purpose, I certainly don't want to wipe out their work.

Database Schema, pointer to file

This is probably a really simple question, but just making sure. I am designing a database schema and some of tables should link to files on the file system (PDF, PPT, etc).
How should this be done?
My initial idea is varchar(255) with the absolute/relative path to the file. Is there a better way to do this? I've searched online and found varbinary(max), but not sure if that's what I actually want; I don't wish to actually load any binary into the database, merely to have a pointer to a file.
This depends on the OS and the max length of a valid path. What you are calling a "pointer" is just a text field with the file path, so no different than other character data.
I would usually store the relative path, and have the root folder specified in my application. This way you can move files to a different drive, for example, and not have to udpate the rows in your db.
The actual data type you choose depends on the dbms you are using. Some databases also provide specific data types for files that you may want to explore, e.g., the FileStream data type introduced in SQL Server 2008.
You need to store in the database de name of the file, and it's path, is that right? Then you should create a fild with varchar(255). I always used like that and never had problems.
Hope it helped.
If you don't want to store the file's binary data in the database, then storing the path is the only way to go. Whether you store the absolute path or the relative path is up to you.
Yep that's basically it.
Relative path from some location configured as a parameter in Db is the usual way of it.
Aside from getting round length restrictions.
If you had say C:MySystem\MyData as the base path. Then you could do Images\MyImageFile.jpg, Docs\MyDopc.pdf etc.
Note the impact on backup and restore though. You have to do the database and the file system.
One other potential consideration is filenames have to be unique. So you If Fred and Wilma both up load Picture1.jpg, the db is okay, but the file system will be stuffed.
Usual way round this is to have a user filename and an actual filename.
So Fred's Picture1.jpg is actually p000004566.jpg
Don't forget to add code to cope with the file you think should be there has been deleted by some twit.
Also some sort of admin task to tidy up orphaned files might be in order, in the infinitely unlikely event that a coding error was made. :)
Also if the path to the file is configurable by software, make sure you check that the account that will be doing the work has read write access, might also want to use a UNC path, but don't saddle yourself with a mapped drive.

Where can I put temporary files that I don't want show to the user?

in my Mac software I need decrypt a file and, after I do my operations on it, I will remove it. My problem is: Where can I put this file? I don't want show it to the user.
The following API will give you a directory path that is "out of the way":
NSTemporaryDirectory();
Do you mean "decrypt a file in a place the user can't access?" Any place your app can write to, the user can see. And in theory, a user can access any bit or byte on a computer to which they have physical access.
There are obfuscations and such that reduce the odds a user will come across sensitive data, but they are meant for particular situations.
Can you tell us more about your end goal here? Are you trying to implement a DRM/copy protection scheme? Are you trying to prevent cheating in a game? Do you just not trust your user? What?
I think your best bet would be to keep it in memory.
If that's not an option, it depends on what you want to do with it. It's possible you can open a temporary file, and immediately delete it - keeping the valid filehandle open, but not keeping a link to it on the disk.
Another option, perhaps - can you get your secondary program to read from STDIN or a pipe? You could then decrypt the file and pass it's content via a pipe? Clearly, the more complex this process is, the more weak links it might have, but sometimes you just have to get things working.

How should I format user uploaded pictures' filenames?

My website deals with pictures that users upload. I'm kind of conflicted on what my picture filename should consist of. I'm worried about scalability simply and possibly security? Maybe someone out there deals with the same thing and can tell me what their use on their site?
Currently, my filename convention is
{pictureId}_{userId}_{salt}_{variant}.{fileExt}
where salt is a token generated server-side (not sure why I decided to put this here, maybe for security purposes I don't know) and variant is something like t where it signifies it's a thumbnail. So it would look something like
12332_22_hb8324jk_t.jpg
Please advise, thanks.
In addition to the previous comments, you may want to consider creating a directory hierarchy for your files. Depending on volume and the particular OS hosting the files, you can easily reach a point where you have an unreasonably large number of files in a single directory. There may be limits on the number of files allowed per folder. If you ever need to do any manual QA or maintenance on your files, this may be problematic (especially if such maintenance is not scripted).
I once worked on a project with a high volume of images. We decided to record a subpath in our database in addition to the filename of each file. Our folder names looked like this:
a/e/2/f/9
3/3/2/b/7
Essentially, we created folders 5 deep with a single hex value as the folder name. The depth was probably excessive, but effective. I suppose this could have led to us reaching a limit on the number of folders on a volume (not sure if such a limit exists).
I would also consider storing a drive in addition to a path (assuming you have a bunch of disks for storage). This way you can move images around and then update your database (assuming you have one) as part of the move.
My 2 pence worth; there is a bit of a conflict between scalability and security in this problem I would say.
If you have real security concerns, then you should not rely at all on the filename of the target image : this is just security-by-obfusication - somebody could just guess the name eventually.[even with your salt idea, which makes it harder]
Instead you should at least have a login mechanism to create a session between client and server , to make sure you can only get at stuff once you have authenticated: even then stuff is sniffable: if security really is a concern , then I would say you have to use SSL.
Regarding scalability : I would suggest you actually do give your images sequential numbers: and store them in 'bins' of (say) 500 images each. As you fill up a bin, create a new one. Store bin (min-image-id, max-image id) information in one DB table and image numbers in another: you can then comparitively cheaply find which bin a particular image lives in from its id. This is a fairly common solution for storing lots of docs/images.
You could then map your URLs to the bin+image id: but then to avoid the problem noted by Jason Williams (sequential numbering, makes it easy to probe), you really should address security separately as in point 1.
You might like to consider replacing the underscores with (e.g.) minuses. (Underscores are used as wildcards in SQL, so you could potentially run into trouble one day in a LIKE comparison). (And of course, underscores are just plain evil :-)
It looks form your example like you're avoiding spaces and upper-case characters - good move. I'd keep everything lowercase and use case-insensitive comparisons to eliminate any potential case-sensitivity issues with different file systems.
Scalability should be fine as long as you can cope with any number of digits in your user, picture and type IDs. You're very unlikely to hit any filename length limits with this scheme.
Security could be an issue if you use sequential IDs, as someone could potentially tweak the numbers and request a picture they shouldn't be able to access - but the salt should make it virtually impossible for someone to guess the correct filename for another picture. If users can't see/access the internal filename in any way, that may be an unnecessary measure though.
The first thing to do is to setup a directory structure that models your use case. In your case you have a user that uploads a picture. You would probably have a directory structure like this (probably on a network share somewhere):
-Pictures
-UserID1
-PictureID1~^~Variant.jpg
-PictureID2~^~Variant.jpg
-UserID2
-PictureID1~^~Variant.jpg
-PictureID2~^~Variant.jpg
Pictures - simply the root directory for the following.
UserID - is the database user ID.
PictureID is simply the picture ID from the database (assuming you record the filename of each uploaded picture in a database.)
~^~ - This is simply a delimitor. You can use a one character or X character sequence. I like three characters as it is easily handled with the split function and is readily distinguishable in the file name.
Sometimes I like to add the size of the picture in with the file name .256.jpg or .1024.jpg.
At any rate, all of this depends on your use case. The most important thing is setting up the directory structure properly. That will make it easier to access/serve and manage the pictures.
You can add any other information you need into the filename as long it doesn't exceed the maximum filename length on your system.

Vb.Net Document Storage

I am attempting to add a document storage module to our AR software.
I will be prompting the user to attach a doc/image to thier account. I will then put a copy of this file into our folder so that we can reference it without having to rely on them keeping the file in its original place. This system is not using a database but instead its using multiple flat files.
I am looking for guidance on how to handle these files once they have attached them to our system.
How should I store these attached files?
I was thinking I could copy the file over to a sub directory then renaming it to a auto-generated number so that we do not have duplicates. The bad thing about this, is the contents of the folder can get rather large.
Anyone have a better way? Should I create directories and store them...?
This system is not using a database but instead its using multiple flat files.
This sounds like a multi-user system. How are you handing concurrent access issues? Your answer to that will greatly influence anything we tell you here.
Since you aren't doing anything special with your other files to handle concurrent access, what I would do is add a new folder under your main data folder specifically for document storage, and write your user files there. Additionally, you need to worry about name collisions. To handle that, I'd name each file there with by appending the date and username to the original file name and taking the md5 or sha1 hash of that string. Then add a file to your other data files to map the hash values to original file names for users.
Given your constraints (and assuming a limited number of total users) I'd also be inclined to go with a "documents" folder -- plus a subfolder for each user. Each file name should include the date to prevent collisions. Over time, you'll have to deal with getting rid of old or outdated files either administratively or with a UI for users. Consider setting a maximum number of files or maximum byte count for each user. You'll also want to handle the files of departed users.