Html5 local datastore, and sync across devices - sql

I am building a full featured web application. Naturally, you can save when you are in 'offline' mode to the local datastore. I want to be able to sync across devices, so people can work on one machine, save, then get on another machine and load their stuff.
The questions are:
1) Is it a bad idea to store json on the server? Why parse the json on the server into model objects when it is just going to be passed back to the (other) client(s) as json?
2) Im not sure if I would want to try a NoSql technology for this. I am not breaking the json down, for now the only relationships in the db would be from a user account to their entries. Other than the user data, the domain model would be a String, which is the json. Advice welcome.
In theory, in the future I might want to do some processing on the server or set up more complicated relationships. In other words, right now I would just be saving the json, but in the future I might want a more traditional relational system. Would NoSQL approach get in the way of this?
3) Are there any security concerns with this? JS injection for example? In theory, for this use case, the user doesn't get to enter anything, at least right now.
Thank you in advance.
EDIT - Thanx for the answers. I chose the answer I did because it went into the most detail on the advantages and disadvantages of NoSql.

JSON on the SERVER
It's not a bad idea at all to store JSON on the server, especially if you go with a noSQL solution like MongoDB or CouchDB. Both use JSON as their native format(MongoDB actually uses BSON but it's quite similar).
noSQL Approach: Assuming CouchDB as the storage engine
Baked in replication and concurrency handling
Very simple Rest API, talk to the data base with HTTP.
Store data as JSON natively and not in blobs or text fields
Powerful View/Query engine that will allow you to continue to grow the complexity of your documents
Offline Mode. You can talk to CouchDb directly using javascript and have the entire app continue to run on the client if the internet isn't available.
Security
Make sure you're parsing the JSON documents with the browers JSON.parse or a Javascript library that is safe(json2.js).
Conclusion
I think the reason I'd suggest going with noSQL here, CouchDB in particular, is that it's going to handle all of the hard stuff for you. Replication is going to be a snap to setup. You won't have to worry about concurrency, etc.
That said, I don't know what kind of App you're building. I don't know what your relationship is going to be to the clients and how easy it'll be to get them to put CouchDB on their machines.
Links
CouchDB # Apache
CouchOne
CouchDB the definitive guide
MongoDB
Update:
After looking at the app I don't think CouchDB will be a good client side option as you're not going to require folks to install a database engine to play soduku. That said, I still think it'd be a great server side option. If you wanted to sync the server CouchDb instance with the client you could use something like BrowserCouch which is a JavaScript implementation of CouchDB for local-storage.

If most of your processing is going to be done on the client side using JavaScript, I don't see any problem in storing JSON directly on the server.
If you just want to play around with new technologies, you're most welcome to try something different, but for most applications, there isn't a real reason to depart from traditional databases, and SQL makes life simple.
You're safe as long as you use the standard JSON.parse function to parse JSON strings - some browsers (Firefox 3.5 and above, for example) already have a native version, while Crockford's json2.js can replicate this functionality in others.

Just read your post and I have to say I quite like your approach, it heralds the way many web applications will probably work in the future, with both an element of local storage (for disconnected state) and online storage (the master database - to save all customers records in one place and synch to other client devices).
Here are my answers:
1) Storing JSON on server: I'm not sure I would store the objects as JSON, its possible to do so if your application is quite simple, however this will hamper efforts to use the data (running reports and emailing them on a batch job for example). I would prefer to use JSON for TRANSFERRING the information myself and a SQL database for storing it.
2) NoSQL Approach: I think you've answered your own question there. My preferred approach would be to setup a SQL database now (if the extra resource needed is not a problem), that way you'll save yourself a bit of work setting up the data access layer for NoSQL since you will probably have to remove it in the future. SQLite is a good choice if you dont want a fully-featured RDBMS.
If writing a schema is too much hassle and you still want to save JSON on the server, then you can hash up a JSON object management system with a single table and some parsing on the server side to return relevant records. Doing this will be easier and require less permissioning than saving/deleting files.
3) Security: You mentioned there is no user input at the moment:
"for this use case, the user doesn't
get to enter anything"
However at the begining of the question you also mentioned that the user can
"work on one machine, save, then get
on another machine and load their
stuff"
If this is the case then your application will be storing user data, it doesn't matter that you havent provided a nice GUI for them to do so, you will have to worry about security from more than one standpoint and JSON.parse or similar tools only solve half the the problem (client-side).
Basically, you will also have to check the contents of your POST request on the server to determine if the data being sent is valid and realistic. The integrity of the JSON object (or any data you are tying to save) will need to be validated on the server (using php or another similar language) BEFORE saving to your data store, this is because someone can easily bypass your javascript-layer "security" and tamper with the POST request even if you didnt intend them to do so and then your application will be sending the evil input out the client anyway.
If you have the server side of things tidied up then JSON.parse becomes a bit obsolete in terms of preventing JS injection. Still its not bad to have the extra layer, specially if you are relying on remote website APIs to get some of your data.
Hope this is useful to you.

Related

Advantage of LDAP over RDBMS?

I have an application with a backend as database.
The application is sort of PUB-SUB model where users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or periodically and all the changes have to be written to database.
Now, I am being asked to find the possibility of replacing this RDBMS with LDAP. Probably they want unified DB for all applications but anyways I have to find the advantage/disadvantages of both approaches.
I cannot directly compare RDBMS a with LDAP as I have almost no idea of LDAP though I tried to get some.
I understand that LDAP is designed for directory access and is optimized for Read access, so it is write once and read many. I have read that frequent writes will reduce the performance of LDAP server as each write will result a trigger to indexing process.
Just to give a scenario in regards with indexing in LDAP, my table will have few columns say 2 viz. Name and Desc. Now in LDAP I suppose this would become two attributes as Name and Desc. In my scenario it's Desc which will be frequently updated. I assume Name will be indexed so even if Desc is changing frequently it won't trigger indexing process.
I point is worth mentioning that the database will be hosted on some cloud platform.
I tried to find out the differences but nothing conclusive I could find out.
LDAP is a protocol, REST is a service based on the HTTP (protocol). So when the LDAP server shall not be exposed to the internet, how do you want to get the data from it? As LDAP is the protocol you would need direct access to the LDAP-server. Its like a database server that you would not expose directly to the internet. You would build an interface to encapsulate it. and that might as well be a REST interface.
I'd try to get the point actos that one is the transfer protocol and a storage backend and the ither is the public interface to its data. It's a bit like why is mysql better than a webinterface. You'd never make the mysql-server publicly available but encapsulate its protocol into an application.
REST is an interface. It doesn't matter how you orgsnize your data behind that interface. When you decide that you want to organize it differently you can do so without the consumer of your API noticing any change. And you can provide different versions of your API depending on improvements of your service.
LDAP on the other hand is an implementation. You can't change the way your data is handled without the consumer noticing it. So there's no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL even to LDAP without notice which you won't be able with LDAP.
Hope that helps
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.

Which db should I choose for my transaction logging

I have a database question. I am developing an application where users sends some request and gets an answer from a vendor. I have a server receiving the request (through a rest call or a running web service, haven't decided which yet).
Whenever a new request comes in it should be logged in a database and when the vendor responds the record should be updated indicating whether it was accepted or not and stuff like that. The only reason for this storage of transactions is for reporting and logging purposes. So now that I have stated my requirement I need help from someone with more expertise in this.
What I've come up with so far is that it would be best to use a structured database since all records will have one type and the same information, so there's no need to waste space using a semi-structured database with each record containing both structure and information.
But I don't know if there are any databases that are particularly good for this kind of "create/update operations only" ?? As I said I only need to read the data perhaps once a month or so.
Any inputs are appreciated!
You can use any open source database like postgreSql as you are mostly going to do inserts and not much other features needed. My suggestion will try to put logging process in separate threads rather than the one you are using for processing to have better performance for your api calls.
I'm developing a application with a lot of create/update queries and currently using Neo4j.
It's fast and really good with j2E and php. NoSQL is really fast to learn with it, and the web interface is really user friendly :)

What's the best way to get a 'lot' of small pieces of data synced between a Mac App and the Web?

I'm considering MongoDB right now. Just so the goal is clear here is what needs to happen:
In my app, Finch (finchformac.com for details) I have thousands and thousands of entries per day for each user of what window they had open, the time they opened it, the time they closed it, and a tag if they choose one for it. I need this data to be backed up online so it can sync to their other Mac computers, etc.. I also need to be able to draw charts online from their data which means some complex queries hitting hundreds of thousands of records.
Right now I have tried using Ruby/Rails/Mongoid in with a JSON parser on the app side sending up data in increments of 10,000 records at a time, the data is processed to other collections with a background mapreduce job. But, this all seems to block and is ultimately too slow. What recommendations does (if anyone) have for how to go about this?
You've got a complex problem, which means you need to break it down into smaller, more easily solvable issues.
Problems (as I see it):
You've got an application which is collecting data. You just need to
store that data somewhere locally until it gets sync'd to the
server.
You've received the data on the server and now you need to shove it
into the database fast enough so that it doesn't slow down.
You've got to report on that data and this sounds hard and complex.
You probably want to write this as some sort of API, for simplicity (and since you've got loads of spare processing cycles on the clients) you'll want these chunks of data processed on the client side into JSON ready to import into the database. Once you've got JSON you don't need Mongoid (you just throw the JSON into the database directly). Also you probably don't need rails since you're just creating a simple API so stick with just Rack or Sinatra (possibly using something like Grape).
Now you need to solve the whole "this all seems to block and is ultimately too slow" issue. We've already removed Mongoid (so no need to convert from JSON -> Ruby Objects -> JSON) and Rails. Before we get onto doing a MapReduce on this data you need to ensure it's getting loaded into the database quickly enough. Chances are you should architect the whole thing so that your MapReduce supports your reporting functionality. For sync'ing of data you shouldn't need to do anything but pass the JSON around. If your data isn't writing into your DB fast enough you should consider Sharding your dataset. This will probably be done using some user-based key but you know your data schema better than I do. You need choose you sharding key so that when multiple users are sync'ing at the same time they will probably be using different servers.
Once you've solved Problems 1 and 2 you need to work on your Reporting. This is probably supported by your MapReduce functions inside Mongo. My first comment on this part, is to make sure you're running at least Mongo 2.0. In that release 10gen sped up MapReduce (my tests indicate that it is substantially faster than 1.8). Other than this you can can achieve further increases by Sharding and directing reads to the the Secondary servers in your Replica set (you are using a Replica set?). If this still isn't working consider structuring your schema to support your reporting functionality. This lets you use more cycles on your clients to do work rather than loading your servers. But this optimisation should be left until after you've proven that conventional approaches won't work.
I hope that wall of text helps somewhat. Good luck!

What kind of server for operational transform operations?

I am hoping to use the Diff-Match-Patch algorithms available from google as apart of the Google-Mobwrite real time collaborative text editor protocol in order to embed a real time collaborative text editor in my program.
Anyways I was wondering what exactly might be the most efficient way of storing "global" copies of each document that users are editing. I would like to have each document stored on a server that is not local to any user and each time a user performs an "operation" ( delete insert paste cut ) that the diff is computed between their copy and the server and its patched etc... if you know the Google mobwrite protocol you probably understand what I am saying.
Should the servers text files be stored as a file that is changed or inside an sql database as a long string or what? Should I be using websockets to communicate with the server? I am honestly kind of an amateur when it comes to this but am generally a fast learner. Does anyone have any tips or resources I could follow perhaps? Thanks lot
This would be a big project to tackle from scratch, so I suggest you use one of the many open source projects in this area. For example, etherPad:
https://code.google.com/p/etherpad/
Mobwrite is using Differential Synchronization technique and its totally different from Operational Transformation technique.
Differential Synchronization suppose to have a communication circle that always starts from the client(the browser), which means you cant use web-sockets to send diffs from the server directly. The browser needs to request the server frequently to get the updates (lets say every 2 seconds), otherwise your shadow-copies will be out of sync.
For storing your shadow-copies when the user is active, you can use whatever you want, but its better to to use in-memory DB (Redis) since you need fast access to do the diffs and patches. And when the user leaves the session you don't need his copy anymore. But, If you need persistence in you app, you should persist only the server-copy not the shadow-copy (shadow-copies are used to find-out the diffs), then you can use MySQL or whatever you like.
But for Operational Transformation technique there are some nice libs out there
NodeJS:
ShareJS (sharejs.org): supports all operations for JSON.
RacerJS: synchronization model built on top of ShareJS
DerbyJS: Complete framework that uses RacerJS as its model.
OpenCoweb (opencoweb.org):
The server is either Java or Python, the client is built with Dojo

Database of images and text

background:
I'm in the design phase of building an app.
I want the app to display text and images, the problem is that I will have A LOT of them. hundreds to thousands.
This is my largest app so far, and I am unsure on how to handle all the data.
The question???????:
What would be the best way to store and access these images and text?
Would I use a formal database approach like SQL?
Or would it be better to navigate files/folders e.g. dropping all the files in res/drawable?
potentially useful facts:
The database will be stored and accessed natively so it can be accessed off-line.
The user will not be adding to the database in anyway, only accessing the data.
the database will be updated every 6 months.
The application 'page' will display 1-5 images along with several blocks of text.
Concept:
the app will be like a recipe app...the user will pick some parameters e.g. ingredients, type, diet.. then select a recipe. And then several images and blocks of text will be displayed showing and detailing the process of some recipe.
I apologize if this is repeated but I didn't see a specific answer for my purposes.
The "Best" approach will depend on the functionality of the database server in question.
Generally, you should store the images "In" the database until that becomes a performance issue. Once you start storing images "Outside" of the database you will have to handle all the issue that are normally taken care of by the database. Disk space management, orphan records, file name conflicts, folder file limits, to name just a few. Depending on your situation these may be big issues or thay may be nothing to worry about.
I've seen several application where images (or attachements) were kept "Outside" the database, and in each case it was done poorly. There are just so many issues to handle, and most developers don't even think of half of them. In many cases the performance of storing the images "In" the databse was acceptable, but the developers decided against it because they just knew it would not perform well.
If your using SQL server 2008 the Filestream data type is ideal for your case. It stores the binary files outside of the database but behaves as a normal field. Also you are able to read/write the files using a stream instead of getting/setting the whole file as a byte array (like when using varbin(max))
If you don't have this functionality in your database, I would recommend storing the images outside of the DB
Its probably a better idea to use a file based approach for deployed static resources.
At the very least because taking a dependency on file system is typically easier to manage then taking a dependency on a DB.
Also this line indicates some sort of non-web client
The database will be stored and accessed natively so it can be accessed off-line."
This means if you go with the DB approach you'll have a couple of other interesting problems
Deployment
Depending on the platform deploying a DB can be a real bear depending on your target platform. What happens if they if already have the engine but its a different version.
Resources
Is your DB going to be client/server based (like MySQL/SQL Server etc)? If so then your app has to now manage the current state of its process. If not then you'll be using a file-based db SQL Lite/MS Access, at which point I would question why using a static DB is worth doing at all.
One final note. There's nothing stopping your Content Production environment from using a DB. Its quite common for Content producers to maintain a database for their content that will you will later use to produce the files for publishing/deployment.