Database options in a Mac application - Objective-C

I am currently using a MySQL database on the server and SQLite locally in my application. I am facing lots of problems with the local database: sometimes the database locks, sometimes I am unable to update, etc.
Is there any option other than Core Data and SQLite for storing data locally in a Mac application?

Just replacing one database with another is unlikely to fix locking problems. SQLite and Core Data (which often uses SQLite underneath) are solid technologies that are used by many, if not most, Mac applications.
Without more information about the locks you're experiencing, I'd suggest it's more likely that you're using the database incorrectly. Are you trying to access the database from multiple threads? Are you correctly closing (finalizing) your prepared statements?
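Without seeing the code it's only a guess, but most "database is locked" reports I've come across boil down to a statement that was never finalized, or a connection shared between threads with no serialization. A minimal sketch of the pattern using the SQLite C API (the items table and its columns are made up for illustration):

#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Sketch: every sqlite3_prepare_v2() is paired with a sqlite3_finalize(),
// so no half-finished statement keeps the database locked.
static BOOL UpdateItemName(sqlite3 *db, int itemID, NSString *name) {
    const char *sql = "UPDATE items SET name = ? WHERE id = ?;";
    sqlite3_stmt *stmt = NULL;

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
        NSLog(@"prepare failed: %s", sqlite3_errmsg(db));
        return NO;
    }
    sqlite3_bind_text(stmt, 1, [name UTF8String], -1, SQLITE_TRANSIENT);
    sqlite3_bind_int(stmt, 2, itemID);

    BOOL ok = (sqlite3_step(stmt) == SQLITE_DONE);
    sqlite3_finalize(stmt);   // always finalize, even when the step failed
    return ok;
}

If more than one thread touches the same connection, funnelling every call like this through a single serial queue is usually enough to make the lock errors go away.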

You could continue to use Core Data but with a different storage backend, e.g. the binary one (which should be less problematic concerning locking and transaction safety in general). See the Core Data Programming Guide for the different persistent store types.
Regarding Stephen Darlington's answer: I don't quite agree. Depending on the concurrency control in SQLite, a transaction may be aborted because it modifies data that is currently "in use", and with SQLite that locking happens at the granularity of the whole database file. Using a "less transaction-safe" backend like the binary store may already be sufficient in this case. This puts the burden of managing consistency on you, but if you are sure your transactions don't conflict you should be fine.
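If you want to try the binary store, the switch is just the store type passed when the store is added to the coordinator. A rough sketch using the classic Core Data stack (the managed object model and the store URL are assumed to come from elsewhere in your app):

#import <CoreData/CoreData.h>

// Sketch: attach a binary store instead of the default SQLite store.
static NSPersistentStoreCoordinator *CoordinatorWithBinaryStore(NSManagedObjectModel *model,
                                                                NSURL *storeURL) {
    NSPersistentStoreCoordinator *psc =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];

    NSError *error = nil;
    if (![psc addPersistentStoreWithType:NSBinaryStoreType
                           configuration:nil
                                     URL:storeURL
                                 options:nil
                                   error:&error]) {
        NSLog(@"Could not add binary store: %@", error);
        return nil;
    }
    return psc;
}

Keep in mind that the binary store is an atomic store: the whole file is read and written in one go, so it is only practical for data sets that fit comfortably in memory.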

I'd say that Core Data is the best one. Why don't you use this?
You can also save things as XML property lists. This is quite handy for small amounts of data.
You can save a dictionary, for example, like this:
[dict writeToFile:@"YOUR_PATH" atomically:NO];
You can get the dictionary back like this:
NSDictionary *dict = [NSDictionary dictionaryWithContentsOfFile:@"YOUR_PATH"];

Related

Consistency/Atomicity (or even ACID) properties in multiple SQL/NoSQL databases architecture

I'm used to using a single database (say PostgreSQL or Elasticsearch).
But currently I'm using a mix (PG and ES) in a prototype app and may throw other kinds of databases into the mix (e.g. Redis).
Say some piece of data needs to be persisted to each database in a different way.
How do you keep the system consistent in the event of a failure of one of the components/databases?
Example scenario that I'm facing:
A data update succeeds on PostgreSQL, but Elasticsearch is unavailable.
At this point, the system is inconsistent, as I should have updated both databases.
As I'm using an SQL database, I can simply abort the transaction to put the system back in its previous consistent state.
But what is the best way to keep the system consistent?
Check every time that the value has been persisted in all databases?
In case of failure, restore the previous state? But some NoSQL databases have no transaction/ACID mechanism, so I can't revert to the previous state as easily.
Additionally, if multiple databases must be kept in sync, is there any good practice to follow, like adding some kind of "version" metadata (whether a timestamp or a home-made incrementing version number) so you can put your databases back in sync? (I'm not talking about CouchDB, where this is built in!)
Moreover, the databases are not all updated atomically, so some parts are inconsistent for a short period. I think it depends on the business needs of the app, but does anyone have thoughts about the problems that may occur or ways to fix them? I guess it must be tough and depends a lot on the configuration (for perhaps very few real benefits).
I guess this may be a common architecture issue, but I'm having trouble finding information on the subject.
Keep things simple.
The search engine can and will lag behind sometimes. You may fight it. You may embrace it. It's fine, and most of the time it's acceptable.
Don't mix the data. If you use Redis for sessions - good. Don't store stuff from database A in B and vice versa.
Select a proper database with ACID and strong consistency for your Super Important Business Data™®.
Again, do not mix the data.
Using more than one database technology in one product is a decision one shouldn't make lightly. The more technologies you use, the more complex your project will become in development, deployment, maintenance and administration. Also, every database technology becomes an individual point of failure. That means it is often much wiser to stick to one technology, even when it means that you need to make some compromises.
But when you have a good(!) reason to use multiple DBMSs, you should try to keep them as separated as possible. Avoid related data that spans multiple databases. When possible, no feature should require more than one DBMS to work (preferably, a failure of one DBMS would only affect the features that use it). Storing redundant data in two different DBMSs should also be avoided.
When you can't avoid redundancies and relationships spanning multiple DBMSs, you should decide on one system to be the single source of truth (preferably the one you trust most regarding consistency). When there are inconsistencies between systems, resolve them by synchronizing the data with the SSOT.
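To make that reconciliation concrete: if every record carries a version (or timestamp) that is bumped on each write to the SSOT, a repair job only has to compare versions and re-write the stale copy. A tiny sketch, with made-up record and field names (only the comparison matters here, not the storage APIs):

#import <Foundation/Foundation.h>

// Sketch: the relational store is the SSOT; the search-index copy is
// overwritten whenever its version lags behind. All names are invented.
int main(void) {
    @autoreleasepool {
        NSDictionary *ssotCopy  = @{ @"id": @42, @"price": @10.5, @"version": @7 };
        NSDictionary *indexCopy = @{ @"id": @42, @"price": @9.0,  @"version": @6 };

        if ([(NSNumber *)indexCopy[@"version"] compare:ssotCopy[@"version"]] == NSOrderedAscending) {
            // The secondary copy is stale: push the SSOT copy back into the index.
            NSLog(@"re-indexing record %@ from the SSOT: %@", ssotCopy[@"id"], ssotCopy);
        }
    }
    return 0;
}

Run periodically (or triggered by the failure log of the original write), a check like this is usually enough to bring a lagging secondary store back in line without distributed transactions.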

What database for crawler/scraper?

I am currently researching what database to use for a project I am working on. Hopefully you guys can give me some hints.
The project is an automated web crawler that checks websites as per a user's request, scrapes data under certain circumstances, and creates log files of what was done.
Requirements:
Only a few tables with few columns; predefining columns is no problem
No overly complex associations between models
A huge number of date- and time-based queries
Due to logging, the database will grow rapidly and use up a lot of space
Should be able to scale over multiple servers
Fields contain mostly ids (int), strings (around 200-500 characters max), and unix timestamps
Two different types of servers will simultaneously read/write data directly to/from it:
One (later more) Rails app that takes user input and displays results upon request
One (later more) Node.js server that functions as the executing crawler/scraper. It will have enough load to run continuously and will make dozens of database queries every second.
I assume it will be neither a graph database (no complex associations) nor a memory-based key/value store (too much data to hold in memory). I'm still on the fence for every other type of database I could find; each seems to have its merits.
So, any advice from the pros how I should decide?
Thanks.
I would agree with Vladimir that you would want to consider a document-based database for this scenario. I am most familiar with MongoDB. My reasons for using it here are as follows:
Your 'schema requirements' of "only a few tables with few columns" fits well with the NoSQL nature of MongoDB.
Same as above for "no overly complex associations between nodes" -- you will want to decide whether you'd prefer nested documents or using dbref (I prefer the former)
Huge amount of time-based data (and other scaling requirements) - MongoDB scales well via sharding or partitioning
Read/write access - this is why I am recommending MongoDB over something like Hadoop. The interactive query requirement is best met by something other than a Hadoop-style store, as this type of storage is designed for batch (rather than interactive query) requirements.
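To illustrate the nested-document option mentioned above: a single crawl-log entry can embed its scraped results instead of referencing them from another collection. The shape might look roughly like this (all field names are invented; the dictionary literal is just a convenient way to show the JSON/BSON structure):

#import <Foundation/Foundation.h>

// Hypothetical shape of one crawl-log document, with the scraped results
// nested inside the entry rather than referenced via DBRef.
int main(void) {
    @autoreleasepool {
        NSDictionary *logEntry = @{
            @"siteId":    @117,
            @"url":       @"http://example.com/page",
            @"crawledAt": @1364169600,   // unix timestamp
            @"status":    @"scraped",
            @"results":   @[ @{ @"field": @"title", @"value": @"Example page" },
                             @{ @"field": @"price", @"value": @"19.99" } ]
        };

        NSData *json = [NSJSONSerialization dataWithJSONObject:logEntry
                                                       options:NSJSONWritingPrettyPrinted
                                                         error:NULL];
        NSLog(@"%@", [[NSString alloc] initWithData:json encoding:NSUTF8StringEncoding]);
    }
    return 0;
}

With that shape, the date/time-heavy queries only need an index on crawledAt, and a log entry is read or written in a single round trip.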
Google built a database called "BigTable" for crawling, indexing and the search-related business. They released a paper about it (google for "BigTable" if you're interested). There are several open-source implementations of BigTable-like designs; one of them is Hypertable. We have a blog post describing a crawler/indexer implementation (http://hypertable.com/blog/sehrchcom_a_structured_search_engine_powered_by_hypertable/) written by the guys from sehrch.com. Looking at your requirements: all of them are supported and are common use cases.
(Disclaimer: I work for Hypertable.)
Take a look at a document-oriented database like CouchDB or MongoDB.

How to go from full SQL querying to something like NoSQL?

In one of my processes I have an SQL query that takes 10-20% of the total execution time. This query filters my database and loads a list of PricingGrid objects.
So I want to improve its performance.
So far I have come up with two solutions:
Use a NoSQL solution; AFAIK these are good for improving the reading process.
But the migration seems hard and needs a lot of work (like importing the data from SQL Server to NoSQL on a regular basis).
I don't have any experience with them; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it's made by the .NET community).
I might have some things to change in my model to make my objects suitable for a NoSQL database.
Load all my PricingGrid objects in memory (in a static IEnumerable).
This might be a problem if my server doesn't have enough memory to load everything.
I might be reinventing the wheel (indexes...) that the NoSQL providers have already built.
I think I'm not the first one wondering about this, so what would be the best solution? Are there any tools that could help me?
.NET 3.5, SQL Server 2005, Windows Server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution; it is more likely that you can speed up your queries with database optimizations and still retain all the added value of an RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If you prefer a NoSQL-style solution, perhaps you should try Memcached + MySQL (InnoDB). This will allow you to get the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it; I find that I either need NoSQL for the reasons I stated above, or that I can optimize the RDBMS using stored procedures, indexes and table views in a way that is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven; they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its session objects. Yes, your data validation needs to be done in code, but you should already be doing that anyway. In my experience this is an over-hyped con.
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface form this, and use it to build another repository class for your Raven/Mongo persistence. Both DB's have plenty good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (ex. when fetching an Order object, ensure not only its properties but associated collections like Items are loaded in memory.
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, and their collections' data nested within. Save changes and close the repository. Note: You may break this step down into as many little pieces as your data deems necessary.
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application models a little to adjust the way they are persisted to Raven/Mongo - for instance, you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMSs). Watch out here though, as doing so somewhat goes against the principles and performance behind NoSQL, since now you have to tap the DB twice to get an Order and its Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use cases. It is an amazing tool, but not a golden solution for all data persistence. In your case, try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".

Is there a way of sharing a Core Data store between processes?

What am I trying to do?
A UI process that reads data from a Core Data store on disk. It wouldn't need to edit the data, just read and display the data.
A command line process that writes to the same data store as accessed by the UI.
Why?
So that the command line process can be running all the time but the user can quit the UI process and forget about the app until they need to look at the data it's captured.
What would be the simplest and most reliable way of achieving this?
What Have I Tried?
I've read up on sharing a data store between threads and implemented this once before, but I can't find anything in the docs or on the web indicating how to share a store between processes.
Is it as simple as pointing both processes at the same data store file? I've experimented with this briefly. It appeared to work OK, but I'm worried I might run into problems with locking etc when it's really put under stress.
Finally
I'd really appreciate someone giving me pointers on what direction to go with this. Thanks.
This might be one of those situations in which you'll simply have to Try It And See™.
As far as I can remember, SQLite (which is the data store you'll most likely want to be using) has built-in mechanisms for file locking and so on, so the integrity of the file is likely to be assured. If, on the other hand, you use the Core Data XML store, you might run into problems.
In other words; use the SQLite backing for your file, and you should likely be fine.
You can do exactly what you want. You probably want to use the SQLite store; otherwise, saving and committing every time you want to sync out data will be horrifically slow. You just need some sort of IPC doorbell between the apps so that you can inform one app that it needs to re-check the persistent store on disk and merge in the new data.
Apple documents using multiple persistent store coordinators as a valid option in Multi-Threading with Core Data (in "General Guidelines", point 2). That discussion happens to be about completely parallel Core Data stacks in the same process, but it is just as valid if they are in completely separate address spaces.
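The doorbell can be as simple as a distributed notification: the command-line process posts it after a successful save, and the UI process refetches when it arrives. A minimal sketch (the notification name and the storeDidChange: selector are made up):

#import <Foundation/Foundation.h>

// Shared, made-up notification name known to both processes.
static NSString * const kStoreDidChangeNote = @"com.example.MyAppStoreDidChange";

// Command-line process: ring the doorbell after saving the store.
static void RingDoorbell(void) {
    [[NSDistributedNotificationCenter defaultCenter]
        postNotificationName:kStoreDidChangeNote
                      object:nil
                    userInfo:nil
          deliverImmediately:YES];
}

// UI process: listen, then refetch/merge in storeDidChange: when it rings.
static void ListenForDoorbell(id observer) {
    [[NSDistributedNotificationCenter defaultCenter]
        addObserver:observer
           selector:@selector(storeDidChange:)
               name:kStoreDidChangeNote
             object:nil];
}

The observer's storeDidChange: method would then re-execute its fetches (or rebuild its managed object context) so the UI picks up whatever the command-line process wrote.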
Nearly two years on, and I've just found a much better way of doing this.
The answer seems to lie with Sync Services. I didn't even realise it existed! There's an excellent post about this at:
http://www.timisted.net/blog/archive/core-data-and-sync-services/
I've not tried this with my app yet, but it seems like an excellent way of sharing a core data store between two processes or applications.
If I experience any performance issues, I'll update this answer accordingly, but this seems like the Apple recommended way of doing it.
You need to re-think your architecture. If you want a daemon to own the data store, then have your GUI app connect to the daemon. Trying to share the data store is a can of worms you don't want to open.

Highest Performance Database Storage Mechanism

I need ideas for implementing a (really) high-performance in-memory database/storage mechanism. We're talking about storing 20,000+ objects, with each object updated every 5 or so seconds. I would like a FOSS solution.
What is my best option? What are your experiences?
I am working primarily in Java, but I need the datastore to have good performance, so the solution need not be Java-centric.
I also need to be able to query these objects, and I need to be able to restore all of the objects on program startup.
SQLite is an open-source self-contained database that supports in-memory databases (just connect to :memory:). It has bindings for many popular programming languages. It's a traditional SQL-based relational database, but you don't run a separate server – just use it as a library in your program. It's pretty quick. Whether it's quick enough, I don't know, but it may be worth an experiment.
Java driver.
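Opening an in-memory database is just a matter of the special :memory: filename, and the language bindings generally accept the same name. A minimal sketch with the C API:

#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Sketch: an in-memory SQLite database lives only as long as its connection;
// the restore-on-startup requirement would need a load/flush step against an
// on-disk copy (e.g. via SQLite's backup API).
int main(void) {
    sqlite3 *db = NULL;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) {
        NSLog(@"open failed: %s", sqlite3_errmsg(db));
        return 1;
    }
    sqlite3_exec(db, "CREATE TABLE objects (id INTEGER PRIMARY KEY, payload TEXT);",
                 NULL, NULL, NULL);
    sqlite3_close(db);
    return 0;
}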
Are you updating 20K objects every 5 seconds, or updating one of the 20K every 5 seconds?
What kind of objects? Why is a traditional RDBMS not sufficient?
Check out HSQLDB and Prevayler. Prevayler is a paradigm shift from traditional RDBMS - one which I have used (the paradigm, that is, not specifically Prevayler) in a number of projects and found it to have real merit.
Depends exactly how you need to query it, but have you looked into memcached?
http://www.danga.com/memcached/
Other options could include MySQL MEMORY tables, or the APC cache if you're using PHP.
Some more detail about the project/requirements would be helpful.
An in-memory store?
1) A simple C malloc'ed array in which all your structures are indexed.
2) Berkeley DB: http://www.oracle.com/technology/products/berkeley-db/index.html. It is fast because you build your own indexes (secondary databases) and there are no SQL expressions to be evaluated.
Look at some of the products listed here: http://en.wikipedia.org/wiki/In-memory_database
What level of durability do you need? 20,000 updates every 5 seconds will probably be difficult for most IO hardware in terms of number of transactions if you write the data back to disc for every one.
If you can afford to lose some updates, you could probably flush to disc every 100 ms with no problem on fairly cheap hardware, if your database and OS support doing that.
If it's really an in-memory database that you don't want to flush to disc often, that sounds pretty trivial. I've heard that H2 is pretty good, but SQLite may work as well. A properly tuned MySQL instance could also do it (but may be more convoluted).
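If SQLite were the pick, the "flush less often" trade-off maps onto a couple of pragmas. A sketch of a throughput-over-durability configuration for an on-disk database (whether losing the last few transactions after a crash is acceptable is the application's call):

#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Sketch: WAL journaling plus synchronous=NORMAL stops SQLite from fsyncing
// on every commit; a crash can lose the most recent transactions but should
// not corrupt the file.
static void ConfigureForThroughput(sqlite3 *db) {
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;",   NULL, NULL, NULL);
    sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", NULL, NULL, NULL);
}

Batching many object updates into one transaction (BEGIN ... COMMIT) has a similar effect and works with any journal mode.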
Oracle TimesTen In-Memory Database. See: http://www.informationweek.com/whitepaper/Business-Intelligence/Datamarts-Data-Warehouses/oracle-timesten-in-memory-databas-wp1228511232361
Chronicle Map is a pure-Java key-value store.
It has really high performance, sustaining 1 million writes/second from a single thread. It's a myth that a fast database can't be written in Java.
It seamlessly stores and loads any serializable Java objects and provides a simple Map interface.
It is licensed under the LGPLv3.
Since you don't have many "tables", a full-blown SQL database could be overkill; indexes and queries could be implemented with a handful of distinct key-value stores, updated manually by vanilla Java code. Chronicle Map provides mechanisms to make such updates isolated from each other under concurrency, if you need that.