Digital Asset Management tool for large files that are not photos or videos - assets

Most DAMs that I have found are geared towards media like photos and videos. I need to manage large binary files like ISOs and IMG files.
Does anybody know of a DAM that can manage non-media files? Specifically, something that runs on-premises? Going to a DAM in the cloud would be too expensive because of the amount of storage we would need and the bandwidth it would consume.

DAMs have functionality tailored to visual content. For example, a DAM system will create previews of the files it stores and may extract metadata from the files themselves. It will also give you options to transform and download content in various formats. Since all of these features are built around media, I would not expect much from them in terms of previews, metadata extraction, or transformations for large binary files such as ISO and IMG files.
You can, however, use most DAMs to upload any file you want. The system will simply accept the file and let you tag it with metadata. An example would be Elvis DAM, where you can upload content (I would use hot-folder uploads for large files) and tag it with metadata. You can create custom fields such as OS version, applications, etc. and store them against the ISO files. These fields become searchable, and the system will scale to hold all of this information and let you quickly find your content.
There may be simpler and less expensive solutions out there that just keep a file and assign metadata to it.
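As a rough idea of what such a simpler solution could look like, here is a minimal sketch of a file catalog built on Python's standard sqlite3 module. The schema, the function names (open_catalog, add_asset, find_assets), and the tag names are assumptions made for illustration; this is not how Elvis or any other DAM works internally.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalog(db_path=":memory:"):
    """Create (or open) a tiny catalog: one row per file, plus key/value tags."""
    con = sqlite3.connect(db_path)
    con.executescript(
        """
        CREATE TABLE IF NOT EXISTS assets (
            id     INTEGER PRIMARY KEY,
            path   TEXT UNIQUE NOT NULL,
            size   INTEGER NOT NULL,
            sha256 TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tags (
            asset_id INTEGER NOT NULL REFERENCES assets(id),
            key      TEXT NOT NULL,
            value    TEXT NOT NULL
        );
        """
    )
    return con


def add_asset(con, path, **tags):
    """Record a file's path, size and SHA-256, and attach arbitrary tags."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so a multi-gigabyte image never sits in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    cur = con.execute(
        "INSERT INTO assets (path, size, sha256) VALUES (?, ?, ?)",
        (str(path), path.stat().st_size, digest.hexdigest()),
    )
    con.executemany(
        "INSERT INTO tags (asset_id, key, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, key, str(value)) for key, value in tags.items()],
    )
    con.commit()
    return cur.lastrowid


def find_assets(con, key, value):
    """Return the paths of all assets carrying the given tag, sorted by path."""
    rows = con.execute(
        "SELECT a.path FROM assets a JOIN tags t ON t.asset_id = a.id "
        "WHERE t.key = ? AND t.value = ? ORDER BY a.path",
        (key, value),
    )
    return [row[0] for row in rows]
```

The files themselves stay wherever they already live; only the path, size, checksum, and tags go into the database, so the catalog stays small even when the images are huge.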

Try NeoFinder
Its original incarnation was as a catalog program for CDs, but it supports extensive metadata for tagging, as well as pulling metadata from images.
https://www.cdfinder.de

We solved our need by using Git Large File Storage (LFS) to manage our large binary files. We tried out git-annex as well, which worked well, but in the end we went with Git LFS.

Related

Is Dropbox considered a Distributed File System?

I was just reading this https://en.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems
The definition of a DFS seems to describe Dropbox exactly, but Dropbox isn't in the list of examples, which I think it would be if it were one.
So what is different about Dropbox which makes it not fall into this category?
Usually, when talking about distributed file systems, you expect properties that Dropbox doesn't support. For example, if you and I share a folder, I can create a file called "work.txt" in it and you can create a file called "work.txt" in it, and if we do it fast enough (or while we're not syncing with Dropbox) we'll have conflicting copies of the same file.
A similar example would be if we both edit the same file concurrently: we'll have conflicting copies, which is something a distributed file system should prevent. In the link you refer to, this is called "Concurrency transparency; all clients have the same view of the state of the file system".
Another example of a property Dropbox doesn't support: if my computer fails (e.g., my hard drive is corrupted), I might lose data that wasn't yet uploaded to Dropbox. There is a small window in which I believe my data has been safely written because it is on the local disk, but if my computer fails during that window, I lose that data.
Lastly, I'm not sure how Dropbox operates with file locks. For example, MS Office takes locks on .doc files to ensure no one else is working on them at the same time. I don't think Dropbox supports this feature.
I've written a blog post about some of the complexities of implementing a distributed file system; you might find it helpful as well.

embed identification in file and resistance to detection

Say I'm distributing a file that I want to be secret, and I assign each person that I give the file a unique id.
How can I embed this id in the file so that I can determine who leaks my file?
Some file formats have a section in which I can put information that won't render the file corrupt. But this is easily detectable by looking at the specific section, or by changing the information.
I would guess that any solution is identifiable by byte comparison, but I was wondering if there exist solutions that embed the id in a part of the file that, if changed, renders the file corrupt. (I would guess this would be file-format specific, but this question is to learn about techniques, so I'd gladly read about specific cases.)
Thanks!
For image files and Unicode text you may use steganography.
For audio files there are special watermarking algorithms that add noise not heard by humans.
You may use metadata to add watermarks, but they can be easily removed by the end user.
See what is currently possible in this SO question: Good library for Digital watermarking
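To make the Unicode text case concrete, here is a minimal sketch that hides a recipient id in invisible zero-width characters. The choice of characters, the 16-bit id width, and the function names are assumptions for illustration. It is also fragile by design: anyone who strips or normalizes these characters removes the mark, so it only shows the idea, not a robust watermark.

```python
ZERO = "\u200b"  # zero-width space, used here to encode a 0 bit
ONE = "\u200c"   # zero-width non-joiner, used here to encode a 1 bit
ID_BITS = 16     # assumed id width: supports ids 0..65535


def embed_id(text, recipient_id):
    """Return a copy of text with recipient_id hidden after its first character."""
    if not 0 <= recipient_id < 2 ** ID_BITS:
        raise ValueError("recipient_id does not fit in ID_BITS bits")
    bits = format(recipient_id, f"0{ID_BITS}b")
    mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return text[:1] + mark + text[1:]


def extract_id(text):
    """Recover the hidden id, or None if the text carries no complete mark."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    if len(bits) < ID_BITS:
        return None
    return int(bits[:ID_BITS], 2)
```

A leaked copy can then be passed to extract_id to see which recipient's copy it was, provided the invisible characters survived whatever the leaker did to the text.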

How to put files inside files

MS Word's .docx files contain a bunch of .xml files.
Setup.exe files spit out hundreds of files that a program uses.
Zips, rars etc also hold lots of compressed stuff.
So how are they made? What does MS Word or another program that produces these files have to do to put files inside files?
When I looked this up I just got a bunch of results about compression, but let's say I wanted to make a program that 'wraps' files inside a file without making the final result any smaller. What would I even have to write?
I'm not asking/expecting any source code that does this, I just need a pointer. Is there something you think I'm misunderstanding based on what I've asked here?
Even a simple link to an article or some documentation would be greatly appreciated.
Ok, I'll just come up with some headers for ordinary files and write them along with the bytes of the actual files into one custom-defined file. You guys were very helpful, thank you!
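A plan like this (a small header per file, followed by the file's bytes) could look roughly like the sketch below. The magic string, the header layout, and the function names pack and unpack are made up for this example; real container formats add things like checksums, an index, and versioning.

```python
import struct

MAGIC = b"WRAP1\n"            # hypothetical signature identifying the format
ENTRY = struct.Struct("<HQ")  # per-file header: name length (uint16), data length (uint64)


def pack(files, out):
    """Write a {name: bytes} mapping to the binary stream `out`, uncompressed."""
    out.write(MAGIC)
    for name, data in files.items():
        encoded = name.encode("utf-8")
        out.write(ENTRY.pack(len(encoded), len(data)))
        out.write(encoded)
        out.write(data)


def unpack(src):
    """Read a container written by pack() back into a {name: bytes} dict."""
    if src.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a WRAP container")
    files = {}
    while True:
        header = src.read(ENTRY.size)
        if not header:
            break  # clean end of file: no more entries
        if len(header) < ENTRY.size:
            raise ValueError("truncated entry header")
        name_len, data_len = ENTRY.unpack(header)
        name = src.read(name_len).decode("utf-8")
        files[name] = src.read(data_len)
    return files
```

Because each header stores the exact lengths, the reader never has to guess where one file ends and the next begins, and the result is exactly as large as the inputs plus a few bytes of bookkeeping per file.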
Historically, Windows had a number of technologies to support solutions like this. These were often called Compound Files or Structured Storage. However, I don't think the newer Office documents use these technologies. I think the Office file formats are similar to ZIP files with a different extension. If you change the extension of a .docx file to .zip and open it with your favorite compression tool, you'll see a bunch of folders and XML files.
Here are some links to descriptions of different file formats that create "files within files":
Zip file format
Compound File Binary Format (CFBF)
Structured Storage
Compound Document File Format
Office Open XML I: Exploring the Office Open XML Formats
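The ZIP route can be tried with Python's standard zipfile module. The sketch below wraps a few named byte strings in an archive without compressing them and then lists what is inside; the helper names and the sample member names are placeholders, and the result is not a valid Word document.

```python
import io
import zipfile


def build_package(parts):
    """Wrap a {member_name: bytes} mapping in an uncompressed ZIP archive."""
    buf = io.BytesIO()
    # ZIP_STORED writes each member as-is, so nothing is compressed.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def list_package(blob):
    """Return (member name, stored size, original size) for each member."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return [(info.filename, info.compress_size, info.file_size)
                for info in zf.infolist()]
```

The same module can inspect a real document: zipfile.ZipFile("report.docx").namelist() lists the parts inside it, where report.docx stands for any .docx file you have at hand.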
At least on POSIX systems (e.g. Linux), a file is only a stream (i.e. a sequence) of bytes. And you can only grow (or shrink, i.e. truncate) it at the end - there is no way to insert bytes in the middle (without copying the rest).
You need some conventions, and some additional software, to handle it otherwise.
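The "no insert in the middle" point can be illustrated with a small sketch: to insert bytes, the code has to read everything after the insertion point and write it back further along. The function name insert_bytes is made up, and holding the whole tail in memory is a simplification that would not suit very large files.

```python
def insert_bytes(path, offset, payload):
    """Insert payload at offset by rewriting everything that follows it."""
    with open(path, "r+b") as fh:
        fh.seek(offset)
        tail = fh.read()   # the rest of the file has to be copied...
        fh.seek(offset)
        fh.write(payload)  # ...so the new bytes can take its place
        fh.write(tail)     # and the old tail is written back after them
```

Appending, by contrast, needs no copying at all, which is one reason many container formats put their index or directory at the end of the file.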
You might be interested in SQLite, which gives you a library to handle a file (e.g. *.sqlite) as an SQL database.
You could also use GDBM - a library giving you some indexed file abstraction.
libtar is a library to manipulate tar archives. See also tardy, a tar file postprocessor.

How Can I Share Referenced Resources Between PDF Files

I create hundreds of PDF files with the same images and fonts. Is there a way I can share these resources between all the files instead of having them embedded in each PDF? It sure would be a disk space saver.
No. PDFs are meant to be stand-alone files which fully encompass font information, vector graphics and whatnot in a single file. Sharing between files would break this. If you're looking to save space (and application requirements), you might consider generating the PDFs on the fly.
If you just want to share linked files, you can embed external links to them in the PDF.

Use ZIP-archives to store NSDocument data

I noticed that Apple started using zip archives to replace document packages (folders appearing as a single file in Finder) in the iWork applications. I'm considering doing the same, as I keep getting support emails about my document packages getting corrupted when they are copied to a Windows file server.
My question is: what would be the best way to do this in an NSDocument-based application?
I guess the easiest way would be to create a directory file wrapper, create an archive of it and return it in NSDocument's
- (NSFileWrapper *)fileWrapperOfType:(NSString *)typeName error:(NSError **)outError
But I fail to understand how to create a zip archive of the NSFileWrapper.
If you just want to make a zip file your format (i.e., "mydoc.myextension" is actually a zip file), there's no convenient, built-in Cocoa mechanism for creating zip archives with code. Take a look at this Google Code project: ziparchive. I don't believe a file wrapper will help in that case, though.
Since you cited iWork, I don't own iWork 09, but previous versions use a package format (ie, NSFileWrapper would be ideal) but zip the XML that describes the document's structure, while keeping attachments (like embedded media, images, etc.) in a resource folder, all within the package. I assume they do this because XML can be quite large for large, complicated documents, but compresses very well because it's text. This results in an overall smaller document.
If indeed Apple has moved to making the entire document one big zip archive (which I would find odd), they'd either be extracting necessary resources to a temp folder somewhere or loading the whole thing into memory (a step backward from their package-based approach, IMO). These are considerations you'll need to take into account as well.
You’ll want to take the data from the file wrapper and feed it into something like ziparchive.
Pierre-Olivier Latour has written an extension to NSData that deals with zip compression. You can get it here: http://code.google.com/p/polkit/
I know this is a little late to the party but I thought I'd offer up another link that could help anyone that comes across this post.
Looks like the ZipBrowser sample from Apple would be a good start http://developer.apple.com/library/mac/#samplecode/ZipBrowser/Introduction/Intro.html
HTH