Transfer images from ArangoDB’s file system to Amazon S3

I am currently using ArangoDB to store all the data I'm using for my application, including images. Now I want to migrate to S3 to store the image files and transfer the files I currently have in ArangoDB.
I am aware that the images are stored in the file system, but I am not sure how to actually transfer them to S3.
Thank you for your help

The location of the data files is implementation-specific, as it can be changed at install and startup. On Linux, the default directory is /var/lib/arangodb3.
But in my experience, backing up the raw storage files is not a good idea. I have found it very difficult to restore or access data with this method. Instead, I recommend one of these two "official" methods:
Hot backups (enterprise-edition only)
JSON export (using arangoexport/arangoimport)
Snapshot-style "Hot Backups" are really great - truly the preferred method. They have everything you would need (speed, reliability, portability, etc.), with only a few case-dependent limitations. The real downside is that they're only available in the Enterprise Edition (including Oasis).
JSON export is the "thrifty" backup option - I would forget about arangorestore (it does horrible things to your _id/_key values, and takes forever to do so). The good news about JSON export is that it's EXTREMELY portable. Almost ANY code base (and even most good DBs) can work with it, so you're never locked into a single product or workflow, or even a specific version of ArangoDB (making upgrades and downgrades much easier).
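To make the migration itself concrete, here is a minimal sketch (Python, using the python-arango and boto3 libraries) that pushes image documents straight from the database into S3 rather than going through an arangoexport dump; the pattern is the same either way. The "images" collection and the "filename"/"payload" attribute names are assumptions - adjust them to your schema and credentials:

    # Minimal sketch: copy base64-encoded images out of ArangoDB into S3.
    # Collection and attribute names below are assumptions, not your schema.
    import base64

    import boto3
    from arango import ArangoClient  # pip install python-arango boto3

    db = ArangoClient(hosts="http://localhost:8529").db(
        "mydb", username="root", password="secret"
    )
    s3 = boto3.client("s3")

    for doc in db.collection("images").all():
        s3.put_object(
            Bucket="my-image-bucket",               # hypothetical bucket
            Key=doc["filename"],                    # assumed attribute
            Body=base64.b64decode(doc["payload"]),  # assumed attribute
        )

If the images live in ArangoDB in some other shape (say, one attribute per thumbnail), the loop body changes, but the iterate-decode-upload pattern stays the same.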

Can Npgsql dump/restore an entire database?

Is it possible to use Npgsql in a way that basically mimics pg_dumpall to a single output file without having to iterate through each table in the database? Conversely, I'd also like to be able to take such output and use Npgsql to restore an entire database if possible.
I know that with more recent versions of Npgsql I can use the BeginBinaryExport, BeginTextExport, or BeginRawBinaryCopy methods to export from the database to STDOUT or to a file. On the other side of the process, I can use the BeginBinaryImport, BeginTextImport, or BeginRawBinaryCopy methods to import from STDIN or an existing file. However, from what I've been able to find so far, these methods use the COPY SQL syntax, which (AFAIK) is limited to a single table at a time.
Why am I asking this question? I currently have an old batch file that I use to export my production database to a file (using pg_dumpall.exe) before importing it back into my testing environment (using psql.exe with the < operation). This has been working pretty much flawlessly for quite a while now, but we've recently moved the server to an off-site hosted environment, which is causing a delay that prevents the batch file from completing successfully. Because of the potential for other connectivity/timeout issues, I'm thinking of moving the batch file's functionality to a .NET application, but this part has got me a bit stumped.
Thanks for your help and let me know if you need any further clarification.
This has been asked for in https://github.com/npgsql/npgsql/issues/1397.
Long story short, Npgsql doesn't have any sort of support for dumping/restoring entire databases. Implementing that would be a pretty significant effort that would pretty much duplicate all the pg_dump logic, and the danger of subtle omissions and bugs would be considerable.
If you just need to dump data for some tables, then as you mentioned, the COPY API is pretty good for that. If, however, you also need to save the schema itself as well as other, non-table entities (sequence state, extensions...), then the only current option AFAIK is to execute pg_dump as an external process (or use one of the other backup/restore options).
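If it helps, the external-process route is only a few lines. A rough sketch (Python here just to keep it short - in .NET the equivalent plumbing is System.Diagnostics.Process - and the hosts/credentials below are placeholders):

    # Rough sketch: shell out to pg_dumpall for the dump, psql for the restore.
    import os
    import subprocess

    env = dict(os.environ, PGPASSWORD="secret")  # or rely on ~/.pgpass

    # Dump everything, like the old batch file did.
    subprocess.run(
        ["pg_dumpall", "--host", "db.example.com", "--username", "postgres",
         "--file", "backup.sql"],
        env=env, check=True,
    )

    # Restore into the test server.
    subprocess.run(
        ["psql", "--host", "test.example.com", "--username", "postgres",
         "--file", "backup.sql", "postgres"],
        env=env, check=True,
    )

Running it from your own process also lets you wrap retries or timeouts around the connectivity hiccups you mentioned.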

JSON vs classic schema design [duplicate]

The Project
I've been asked to work on an interesting project -- what amounts to a basic Web CMS -- that uses HTML/CSS/jQuery with PHP. However, one requirement is that there won't be a database to house the data (they want flat files for the documents/pages -- preferably in JSON format).
In a very basic sense, it'll be used to generate HTML pages via a very "non-techie" interface. Each installation would only have around 20 pages, but a few may get up to 100. It has to be fairly easy to drop onto a PHP capable server and run, with very little setup needed.
What's Out There
There are tons of CMS options and quite a few flat-file versions. But an OSS or other existing CMS is not an option. They need a simple proprietary system.
Initial Thoughts
So flat files it is... but I'd really like to get some feedback on the drawbacks, and on whether it is worth the effort to try to convince them to use something like MySQL (SQLite or CouchDB are out, since none of the servers can be configured to run them at the present time).
Of course the document files are pretty straightforward, but we're also talking about login info for 1 or 2 admins per installation, a few lists, as well as configs/settings (which also can easily be stored in a file with protection).
The Dilemma
If there are benefits to using MySQL rather than JSON-formatted files and some arrays in a simple project like this -- beyond my own preconceived notions :) -- I'll be sure to argue them.
But honestly I can't see any that outweigh their need to not have a database system.
I'd appreciate your insight and opinions.
If you can't cite a specific need for relational table design, then you're good with flat files. Build as specified. The moment you can cite a specific need, let them know; upgrading isn't that hard, if your perception is timely (that is, if you aren't in the position of having to normalize data that should have been integrated earlier).
It's a shame you can't use CouchDB; this seems like the perfect application for it. Keep in mind that using flat files severely constrains your architecture and, especially, scalability.
What's the best-case scenario for your CMS app? It's successful and people want to use it more? If you're using flat files, it'll be harder to service and improve your system (e.g. make it more robust, or add new features for future versions), and performance will not scale well. So "success" in this case is at best short-lived, as success translates into more and more work for less and less gain in feature set and performance.
Then again, if the CMS is designed right, switching from flat files to an RDBMS should be as simple as swapping in a different data access file.
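To illustrate that "different data access file" idea: if every page read/write goes through one small interface, the backend can change without touching the rest of the CMS. A minimal sketch (in Python purely for brevity - the project itself is PHP - with all names invented):

    # Minimal sketch of a swappable storage layer for page documents.
    import json
    from pathlib import Path

    class PageStore:
        """The rest of the CMS only ever calls load/save."""
        def load(self, slug): raise NotImplementedError
        def save(self, slug, data): raise NotImplementedError

    class JsonFilePageStore(PageStore):
        def __init__(self, root):
            self.root = Path(root)

        def load(self, slug):
            return json.loads((self.root / f"{slug}.json").read_text())

        def save(self, slug, data):
            (self.root / f"{slug}.json").write_text(json.dumps(data, indent=2))

Upgrading later means writing a MySqlPageStore with the same two methods; no other code changes.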
Will this be installed on any shared hosting sites? For this to work somewhat safely, a mechanism like suEXEC needs to be set up properly, as the web server will need write permissions to various directories.
What would be cool with a simple site that was fed via JSON and jQuery is that the site wouldn't need to reload on each click. Just the relevant data would change. You could then use hashes in the location bar to keep track of where you were (e.g. http://localhost/#about).
The problem being, if they are editing the raw JSON file, they can mess it up pretty quickly. I think your admin tools would have to generate the JSON files based on the input, so that you can ensure nothing breaks. The admin tools would be more involved than the site (though isn't that always the case with dynamic sites?).
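One way for the admin tool to "ensure nothing breaks" is to never let a half-written file reach disk: serialize first, then swap the new file into place atomically. A small sketch (Python for brevity; path and names are placeholders):

    # Sketch: save a page's JSON atomically so an interrupted write
    # can't leave a corrupted file behind.
    import json
    import os
    import tempfile

    def save_page(path, data):
        payload = json.dumps(data, indent=2)  # fails here if data is bad
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            f.write(payload)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows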
What are the predicted data sizes for the CMS?
A large reason for using an RDBMS is quick, specific access to large amounts of data. The data format might not be large, but if there is a lot of data, then it might be better in the long run to use an RDBMS.
While an RDBMS may be necessary for a very large CMS, a small one could run off flat files very well. A lot of CMS products out there fall down in that regard, I think, by throwing an RDBMS into the mix when there's no real need.
However, if you are using flat files, there are security issues which others have highlighted. Another issue I've come across is hosting providers using the disable_functions directive in php.ini to disable file I/O functions like fopen() and friends. If you're hosting your CMS on a box you control, you won't have this problem but if you're using a third-party provider, check first.
As the original poster, I wasn't signed in, so I'm following up to the answers so far in an answer (sorry if this is bad form).
There may be instances where this is on a shared host.
Though the JSON files can technically be edited, this won't be the case. The admin interface will be robust enough to do all of the creating/editing of pages.
The size for each install will be relatively small: 1-2 admins, 10-100 pages. A few lists of common items may run longer (snippets of copy, for example).
Security will be a big issue -- any other options/suggestions on this specifically?
Well, isn't the real problem that they're distrustful of any database system? Isn't the problem more in their thinking than in the technology? Maybe they are afraid of a database because it sounds complex to them. In that case, if you just present them with some very simple CMS (like CMS Made Simple, which I've heard really is simple and quick to learn), and they see that everything is easy, then maybe they just won't care what's behind it, whether it's a database or whatever!
They might also listen to arguments like easier maintenance, lower maintenance costs, and a much smoother handover to another webmaster than with a proprietary solution (they are not dependent on you), etc.

Provide example for why it is not advisable to store images in CoreData?

This question has been asked many times, and I have read many users saying that it is not advisable to store images in a DB, in particular within Core Data. But they all seem to omit the reason why. Even Apple's documentation states this, everybody points in that direction, and every discussion ends like this: "well, you can, but storing the path is better".
Apart from opinions, I would like to have a concrete example of why it is not a good solution.
Let me explain better. I have a strong background in building web applications. A concrete example from my point of view would be: do not store images in a DB, but rather the paths to them, because then the web server can serve the images and apply all of its caching.
But in a desktop environment, and especially in an iOS application, what are the downsides of storing images in Core Data using SQLite, provided that:
There's a separate entity holding the images; it is not an attribute of the main entity.
Also, there seems to be a limit of 100 KB for images. Why? What happens with 110, 120... 200 KB, etc.?
thanks
There's nothing special about what Core Data normally does here. It's just using an SQLite database. You can put large blobs of data into it, but it just doesn't scale all that well. You can read more about it here: Internal Versus External BLOBs in SQLite.
That said, Core Data has support for external blobs, which in Core Data terminology is called external storage (iOS 5.0 and later). Again, there's nothing magic about it; it's just storing the large pieces of data in the file system, separately from the SQLite DB itself. The benefit is that Core Data manages all of this for you.
When you're in Xcode, there'll be a checkbox called Allows External Storage that you can check for Binary Data properties.
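The tradeoff measured in that SQLite article is easy to picture: either the bytes live inside the database row, or only a path does and the bytes live in the filesystem. A tiny sketch of the two layouts (Python's sqlite3; all names invented):

    # Sketch of the two layouts from "Internal Versus External BLOBs".
    import sqlite3
    from pathlib import Path

    db = sqlite3.connect("images.db")
    db.execute("CREATE TABLE IF NOT EXISTS internal (name TEXT, data BLOB)")
    db.execute("CREATE TABLE IF NOT EXISTS external (name TEXT, path TEXT)")

    raw = b"\x89PNG..." * 1000  # stand-in for real image bytes

    # Internal: the blob bloats the database file itself.
    db.execute("INSERT INTO internal VALUES (?, ?)", ("cat.png", raw))

    # External: the database only stores a path; bytes go to the filesystem.
    Path("blobs").mkdir(exist_ok=True)
    Path("blobs/cat.png").write_bytes(raw)
    db.execute("INSERT INTO external VALUES (?, ?)", ("cat.png", "blobs/cat.png"))
    db.commit()

The article's finding, roughly: small blobs read faster from inside the database, large ones from the filesystem - which is exactly the line the Allows External Storage checkbox draws for you.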
The filesystem, and the APIs surrounding it, are (just like a web server) optimized to serve files of any size and to apply caching where appropriate.
CoreData is optimized for handling an object graph with tiny pieces of data, like integers and short strings.
Also, there are a number of other issues that tend to creep up on you, like periodically vacuuming the SQLite database Core Data uses; otherwise it can only grow, never shrink.
Leonardo,
With Lion/iOS 5, Core Data started handling file system storage of large BLOBs for you.
The choice is really determined by how many images you are going to have open. If you have many, then you should keep them in the DB. Why? Because you only have a modest number of file descriptors, one of which is used for each open image stored in the file system.
That said, there is still a reason to manage the files yourself. If your BLOBs are really big, say 2+ MB, you will want to map them into memory and not just read them in. (When memory warnings come, this lets the OS automatically purge them from your resident memory. This is a very good thing.) Even so, you still have the limited-file-descriptor problem.
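To illustrate the map-don't-read point: a memory-mapped file gives the OS freedom to page the bytes in and out on demand, instead of your process holding a private copy. A rough sketch of the idea (Python's mmap standing in; on iOS you would reach for NSData's mapped-reading options):

    # Rough sketch: map a large image instead of reading it into memory.
    import mmap

    with open("big_image.jpg", "rb") as f:
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        header = view[:16]  # pages fault in lazily, only as they are touched
        view.close()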
Andrew

Is there an editor for inserting/editing rows into a Core Data DB?

I've created a Core Data schema in Xcode (3.2.5, if it matters), so I have the .xcdatamodel file with the proper entities and relations.
Now: how can I insert, edit, and/or delete data in it, NOT from within the code?
Like what phpMyAdmin is for MySQL.
Thanks.
Core Data is meant to be used programmatically. Once you run the app, it should create a file somewhere on disk (exactly where is probably specified in the AppDelegate class). This file will likely be a SQLite database, but it doesn't have to be (the point of Core Data is to abstract your data away from the file format used to store it). It could also be an XML file or a binary file.
If it's a SQLite file, then you can open it in your favorite SQLite editor.
HOWEVER
The schema used in the SQLite format is not documented. If you go mucking around in it, you might get stuff to work, but it's also very likely that you could irreparably screw it up. (If it's an XML file or a binary file, you're probably totally out of luck)
In the end, Core Data is supposed to be used programmatically. To use it in a different way (such as what you're asking for) would be to use it in a way for which it was not intended and therefore not designed.
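If you do go poking at a SQLite-backed store, open it read-only so you can't screw anything up. A small sketch (Python's sqlite3; the store path is a placeholder) that lists the Z-prefixed tables Core Data generates:

    # Sketch: inspect a Core Data SQLite store without risking writes.
    import sqlite3

    db = sqlite3.connect("file:MyApp.sqlite?mode=ro", uri=True)  # read-only
    rows = db.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
    for (name,) in rows:
        print(name)  # Core Data entities show up as ZENTITYNAME tables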
I don't know if you already solved your problem, but there's this SQLite Manager plug-in for Firefox: http://code.google.com/p/sqlite-manager/
I haven't tried importing data or using the INSERT command to insert individual rows, but you could give it a try. It's free and works very well for me as is.
There are quite a few database management tools available for SQLite that allow you to do this. I've tried a few, but to be honest none of them have impressed me much as yet.
Would be great to have something like Toad available.
Anyway, find wherever your database file is, then drop it onto whichever application.
You can then add, delete, and edit rows and columns.
Of course, you will need to maintain any foreign keys and such like.
I find the generated Core Data models to be pretty easy to understand.
Example tools are SQLite Database Browser (free), SQLiteManager (not free), and Base. A quick Google search should reveal those and a few more.
I normally use SQLite Database Browser although it does crash occasionally.
See Christian Kienle's Core Data Editor. It's not free, but it is designed to work directly with Core Data models and stores via Apple's API, supports binary data, builds relationships, even triggers validation, etc. I've found it's worth the $20.

How can I sync a filesystem structure to SQL?

I currently have a filesystem path I would like to index into a SQL database. I need to access the data so that I can run queries against files based on modified times, partial names, and many other attributes.
Is there a way to somehow sync a filesystem to a database automatically, or even access a filesystem in a sql-like interface, without having to crawl through folders recursively?
I was checking out the Microsoft Sync Framework 2.0, since it supports SQL databases now; however, it doesn't appear to support syncing files to databases.
I'm sure other vendors do something similar, such as Microsoft for the database of files in Media Center, or programs like TVersity storing a database of the files as well.
You don't mention a programming language, but here is how I would do it, and I think it's the way most media apps that maintain a library do it (although they might be written in different languages and use the Win32 API).
Using .NET: to get your initial data, I would recursively scan the directory once and add all the information to the database. Then I would run a service that uses a FileSystemWatcher object to be notified about changes and processes the events accordingly.
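The initial crawl-and-index step is only a few lines. A rough sketch (Python's os.walk and sqlite3 standing in for the .NET equivalents; the root path and table name are placeholders), after which the queries you want are plain SQL:

    # Sketch: one-time recursive index of a directory tree into SQLite.
    import os
    import sqlite3

    db = sqlite3.connect("fileindex.db")
    db.execute("""CREATE TABLE IF NOT EXISTS files
                  (path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)""")

    for root, _dirs, names in os.walk(r"C:\media"):  # placeholder root
        for name in names:
            full = os.path.join(root, name)
            st = os.stat(full)
            db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                       (full, name, st.st_size, st.st_mtime))
    db.commit()

    # The kinds of queries the question asks for:
    db.execute("SELECT path FROM files WHERE name LIKE ?", ("%vacation%",))
    db.execute("SELECT path FROM files WHERE mtime > ?", (1262304000,))

The watcher service then just turns created/changed/deleted events into the corresponding INSERT/UPDATE/DELETE.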
It sounds like what you want is Windows Search (or the Windows 7 Libraries feature, if you're feeling bleeding edge).
Ultimately though something has to crawl the disk to pull the info out, whether it's you or a third party service.
As an aside, SQL may not be the best tool for that particular job, depending on exactly what you want to search for. Tree structures are notoriously tricky to represent efficiently in relational databases; your searches could get quite expensive!