I have an application with an RDBMS as its backend database.
The application follows a sort of pub-sub model: users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or periodically, and all of them have to be written to the database.
Now I am being asked to look into the possibility of replacing this RDBMS with LDAP. Probably they want a unified DB for all applications, but in any case I have to find the advantages and disadvantages of both approaches.
I cannot directly compare an RDBMS with LDAP, as I have almost no knowledge of LDAP, though I have tried to acquire some.
I understand that LDAP is designed for directory access and is optimized for reads, so it is write once and read many. I have read that frequent writes will reduce the performance of an LDAP server, as each write triggers the indexing process.
To give a scenario with regard to indexing in LDAP: my table will have a few columns, say two, Name and Desc. In LDAP I suppose these would become two attributes, Name and Desc. In my scenario it is Desc that will be frequently updated. I assume Name will be indexed, so even if Desc changes frequently it won't trigger the indexing process.
One point worth mentioning is that the database will be hosted on a cloud platform.
I tried to find the differences but could not find anything conclusive.
LDAP is a protocol; REST is a service based on HTTP (also a protocol). So if the LDAP server must not be exposed to the internet, how do you want to get the data from it? Because LDAP is the protocol, you would need direct access to the LDAP server. It's like a database server that you would not expose directly to the internet: you would build an interface to encapsulate it, and that might as well be a REST interface.
I'd try to get the point across that one is the transfer protocol and storage backend and the other is the public interface to its data. It's a bit like asking why MySQL is better than a web interface. You'd never make the MySQL server publicly available; you'd encapsulate its protocol in an application.
REST is an interface. It doesn't matter how you organize your data behind that interface. When you decide that you want to organize it differently, you can do so without the consumers of your API noticing any change. And you can provide different versions of your API as your service improves.
LDAP, on the other hand, is an implementation. You can't change the way your data is handled without the consumer noticing it, so there's no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL, or even to LDAP, without the consumer noticing, which you cannot do when consumers talk LDAP directly.
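To make the "interface hides the backend" point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `InMemoryStore`, `SqliteStore` and `handle_get` are hypothetical, and `handle_get` only stands in for what a REST handler might return. An LDAP-backed store with the same two methods could be swapped in the same way.

```python
import sqlite3


class InMemoryStore:
    """Backend 1 (hypothetical): a plain dict."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        return self._data.get(name)

    def put(self, name, desc):
        self._data[name] = desc


class SqliteStore:
    """Backend 2 (hypothetical): a relational table holding the same two fields."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE entry (name TEXT PRIMARY KEY, description TEXT)"
        )

    def get(self, name):
        row = self._db.execute(
            "SELECT description FROM entry WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def put(self, name, desc):
        self._db.execute(
            "INSERT OR REPLACE INTO entry (name, description) VALUES (?, ?)",
            (name, desc),
        )
        self._db.commit()


def handle_get(store, name):
    """Stand-in for a REST handler: the response shape never mentions the backend."""
    desc = store.get(name)
    if desc is None:
        return {"status": 404}
    return {"status": 200, "body": {"name": name, "desc": desc}}
```

The consumer only ever sees the dictionary returned by `handle_get`, so replacing one store with the other changes nothing on their side.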
Hope that helps
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.
I will have multiple computers on the same network with the same C# application running, connecting to a SQL database.
I am wondering if I need to use the service broker to ensure that if I update record A in table B on Machine 1, the change is pushed to Machine 2. I have seen applications that need to use messaging servers to accomplish this before but I was wondering why this is necessary, surely if they connect to the same database, any changes from one machine will be reflected on the other?
Thanks :)
This is mostly about consistency and latency.
If your applications always perform atomic operations on the database, and they always read whatever they need with no caching, everything will be consistent.
In practice, this is seldom the case. There are plenty of hidden opportunities for caching. Take an edit form: it holds the values the entity had before you started editing, but what if someone modified those in the meantime? You'd just overwrite their changes with your data.
Solving this is a bunch of architectural decisions. Different scenarios require different approaches.
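One common approach to the edit-form case is optimistic concurrency: store a version number with the row and refuse a write that is based on an older version. The sketch below uses Python's `sqlite3` as a stand-in for the real database; the table `item`, its `version` column, and the helpers `load` and `save` are assumptions made up for the example, not something from the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, value TEXT, version INTEGER NOT NULL)"
)
db.execute("INSERT INTO item VALUES (1, 'original', 1)")
db.commit()


def load(item_id):
    """Read the value together with the version the edit form will be based on."""
    return db.execute(
        "SELECT value, version FROM item WHERE id = ?", (item_id,)
    ).fetchone()


def save(item_id, new_value, expected_version):
    """Write only if nobody else has committed since the form was loaded."""
    cur = db.execute(
        "UPDATE item SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, item_id, expected_version),
    )
    db.commit()
    return cur.rowcount == 1  # False means the form was stale


# Two users open the same edit form and both see version 1.
_, version_a = load(1)
_, version_b = load(1)

first_ok = save(1, "edit by A", version_a)   # accepted, row moves to version 2
second_ok = save(1, "edit by B", version_b)  # rejected, B must reload and retry
```

Whether to reject, merge, or silently let the last writer win is exactly the kind of architectural decision meant above.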
Once data is committed in the database, everyone reading it will see the same thing - but only if they actually get around to reading it, and the two reads aren't separated by another commit.
Update notifications are mostly concerned with invalidating caches, and perhaps some push-style processing (e.g. IM client might show you a popup saying you got a new message). However, SQL Server notifications are not reliable - there is no guarantee that you'll get the notification, and even less so that you'll get it in time. This means that to ensure consistency, you must not depend on the cached data, and you have to force an invalidation once in a while anyway, even if you didn't get a change notification.
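A minimal sketch of "invalidate on notification, but expire anyway" could look like this in Python. The class `TtlCache`, the 30-second TTL, and the dict standing in for the database are all assumptions for illustration; the clock is injected only so the example is deterministic.

```python
import time


class TtlCache:
    """Cache whose entries expire after ttl_seconds, notification or not."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader    # function that reads the current value from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (value, time it was loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self._ttl:
            value = self._loader(key)
            self._entries[key] = (value, now)
            return value
        return entry[0]

    def invalidate(self, key):
        """Call this when a change notification does arrive."""
        self._entries.pop(key, None)


database = {"A": "v1"}   # stands in for the real table
fake_now = [0.0]         # controllable clock for the example
cache = TtlCache(database.__getitem__, ttl_seconds=30, clock=lambda: fake_now[0])

first = cache.get("A")   # loaded from the database
database["A"] = "v2"     # another machine commits; assume the notification is lost
stale = cache.get("A")   # the cache still serves the old value
fake_now[0] = 31.0       # the TTL runs out
fresh = cache.get("A")   # the value is reloaded from the database
```

The TTL bounds how long a lost notification can leave stale data around; it does not remove the staleness window entirely.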
Remember, even if you're actually using a database that's close enough to ACID, it's usually not the default setting (for performance and availability, mostly). You need to understand what kind of guarantees you're getting, and how to write code to handle this. Even the most perfect ACID database isn't going to help your consistency if your application introduces those inconsistencies :)
I have a database question. I am developing an application where users send a request and get an answer from a vendor. I have a server receiving the request (through a REST call or a running web service, I haven't decided which yet).
Whenever a new request comes in it should be logged in a database and when the vendor responds the record should be updated indicating whether it was accepted or not and stuff like that. The only reason for this storage of transactions is for reporting and logging purposes. So now that I have stated my requirement I need help from someone with more expertise in this.
What I've come up with so far is that it would be best to use a structured database since all records will have one type and the same information, so there's no need to waste space using a semi-structured database with each record containing both structure and information.
But I don't know if there are any databases that are particularly good for this kind of "create/update operations only" workload. As I said, I only need to read the data perhaps once a month or so.
Any inputs are appreciated!
You can use any open source database, such as PostgreSQL, since you are mostly going to do inserts and don't need many other features. My suggestion is to run the logging in a separate thread from the one that processes requests, so that your API calls perform better.
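A rough sketch of that suggestion in Python: the request handler only puts a record on a queue, and a background thread owns the database connection and does the inserts. `sqlite3` is used here purely as a stand-in for PostgreSQL, and the names `handle_request`, `log_writer` and the `request_log` table are assumptions for the example.

```python
import os
import queue
import sqlite3
import tempfile
import threading

log_queue = queue.Queue()
_STOP = object()  # sentinel telling the writer thread to finish


def log_writer(db_path):
    """Background worker: owns the DB connection and drains the queue."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS request_log (request_id TEXT, status TEXT)"
    )
    while True:
        item = log_queue.get()
        if item is _STOP:
            break
        db.execute("INSERT INTO request_log VALUES (?, ?)", item)
        db.commit()
    db.close()


def handle_request(request_id):
    """Request path: enqueue the log record and return without touching the DB."""
    log_queue.put((request_id, "received"))
    return "accepted " + request_id


db_path = os.path.join(tempfile.mkdtemp(), "log.db")
writer = threading.Thread(target=log_writer, args=(db_path,), daemon=True)
writer.start()

handle_request("req-1")
handle_request("req-2")

log_queue.put(_STOP)  # on shutdown, let the writer flush what is queued
writer.join()
```

The trade-off is that records still sitting in the queue are lost if the process dies, so this fits reporting logs better than anything that must never be dropped.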
I'm developing an application with a lot of create/update queries and am currently using Neo4j.
It's fast and works really well with J2EE and PHP. It makes NoSQL quick to learn, and the web interface is really user friendly :)
I am building a full featured web application. Naturally, you can save when you are in 'offline' mode to the local datastore. I want to be able to sync across devices, so people can work on one machine, save, then get on another machine and load their stuff.
The questions are:
1) Is it a bad idea to store json on the server? Why parse the json on the server into model objects when it is just going to be passed back to the (other) client(s) as json?
2) I'm not sure if I would want to try a NoSQL technology for this. I am not breaking the json down; for now the only relationships in the db would be from a user account to their entries. Other than the user data, the domain model would be a String, which is the json. Advice welcome.
In theory, in the future I might want to do some processing on the server or set up more complicated relationships. In other words, right now I would just be saving the json, but in the future I might want a more traditional relational system. Would NoSQL approach get in the way of this?
3) Are there any security concerns with this? JS injection for example? In theory, for this use case, the user doesn't get to enter anything, at least right now.
Thank you in advance.
EDIT - Thanks for the answers. I chose the answer I did because it went into the most detail on the advantages and disadvantages of NoSQL.
JSON on the SERVER
It's not a bad idea at all to store JSON on the server, especially if you go with a NoSQL solution like MongoDB or CouchDB. Both use JSON as their native format (MongoDB actually uses BSON, but it's quite similar).
NoSQL Approach: Assuming CouchDB as the storage engine
Baked-in replication and concurrency handling
Very simple REST API; you talk to the database over HTTP.
Stores data as JSON natively and not in blobs or text fields
Powerful View/Query engine that will allow you to continue to grow the complexity of your documents
Offline Mode. You can talk to CouchDB directly using JavaScript and have the entire app continue to run on the client if the internet isn't available.
Security
Make sure you're parsing the JSON documents with the browser's JSON.parse or a JavaScript library that is safe (json2.js).
Conclusion
I think the reason I'd suggest going with NoSQL here, CouchDB in particular, is that it's going to handle all of the hard stuff for you. Replication is going to be a snap to set up. You won't have to worry about concurrency, etc.
That said, I don't know what kind of App you're building. I don't know what your relationship is going to be to the clients and how easy it'll be to get them to put CouchDB on their machines.
Links
CouchDB # Apache
CouchOne
CouchDB the definitive guide
MongoDB
Update:
After looking at the app, I don't think CouchDB will be a good client-side option, as you're not going to require folks to install a database engine to play sudoku. That said, I still think it'd be a great server-side option. If you wanted to sync the server CouchDB instance with the client, you could use something like BrowserCouch, which is a JavaScript implementation of CouchDB for local storage.
If most of your processing is going to be done on the client side using JavaScript, I don't see any problem in storing JSON directly on the server.
If you just want to play around with new technologies, you're most welcome to try something different, but for most applications, there isn't a real reason to depart from traditional databases, and SQL makes life simple.
You're safe as long as you use the standard JSON.parse function to parse JSON strings - some browsers (Firefox 3.5 and above, for example) already have a native version, while Crockford's json2.js can replicate this functionality in others.
Just read your post and I have to say I quite like your approach, it heralds the way many web applications will probably work in the future, with both an element of local storage (for disconnected state) and online storage (the master database - to save all customers records in one place and synch to other client devices).
Here are my answers:
1) Storing JSON on server: I'm not sure I would store the objects as JSON. It's possible to do so if your application is quite simple, but it will hamper efforts to use the data (running reports and emailing them in a batch job, for example). I would prefer to use JSON for TRANSFERRING the information and a SQL database for storing it.
2) NoSQL Approach: I think you've answered your own question there. My preferred approach would be to set up a SQL database now (if the extra resource needed is not a problem); that way you'll save yourself the work of setting up a data access layer for NoSQL that you will probably have to remove in the future. SQLite is a good choice if you don't want a fully-featured RDBMS.
If writing a schema is too much hassle and you still want to save JSON on the server, then you can hack together a JSON object management system with a single table and some parsing on the server side to return relevant records. Doing this will be easier and require less permissioning than saving/deleting files.
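As a rough idea of what such a single-table store might look like, here is a Python sketch using the standard `sqlite3` and `json` modules. The table name `document`, its columns, and the helpers `save_document` and `load_documents` are hypothetical choices for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE document ("
    " user_id TEXT NOT NULL,"
    " doc_key TEXT NOT NULL,"
    " body TEXT NOT NULL,"   # the JSON, stored as text
    " PRIMARY KEY (user_id, doc_key))"
)


def save_document(user_id, doc_key, obj):
    """Insert or overwrite one JSON document belonging to a user."""
    db.execute(
        "INSERT OR REPLACE INTO document (user_id, doc_key, body) VALUES (?, ?, ?)",
        (user_id, doc_key, json.dumps(obj)),
    )
    db.commit()


def load_documents(user_id):
    """Return all of a user's documents, parsed back into Python objects."""
    rows = db.execute(
        "SELECT doc_key, body FROM document WHERE user_id = ? ORDER BY doc_key",
        (user_id,),
    )
    return {key: json.loads(body) for key, body in rows}


save_document("u1", "game-1", {"level": 3, "done": False})
save_document("u1", "game-1", {"level": 3, "done": True})  # overwrites the first save
save_document("u2", "game-1", {"level": 1, "done": False})
```

The only relationship the database knows about is user to document, which matches the situation described in the question; anything inside `body` is opaque to SQL until you parse it.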
3) Security: You mentioned there is no user input at the moment:
"for this use case, the user doesn't get to enter anything"
However, at the beginning of the question you also mentioned that the user can
"work on one machine, save, then get on another machine and load their stuff"
If this is the case, then your application will be storing user data. It doesn't matter that you haven't provided a nice GUI for them to do so: you will have to worry about security from more than one standpoint, and JSON.parse or similar tools only solve half the problem (the client side).
Basically, you will also have to check the contents of your POST request on the server to determine whether the data being sent is valid and realistic. The integrity of the JSON object (or any data you are trying to save) needs to be validated on the server (using PHP or a similar language) BEFORE saving to your data store. Someone can easily bypass your JavaScript-layer "security" and tamper with the POST request, even if you didn't intend them to, and then your application will be sending the evil input out to the client anyway.
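A server-side check along those lines might look like the sketch below. It is written in Python rather than PHP, and the expected shape (a `title` string and an integer `score`), the size limit, and the function name `validate_entry` are all assumptions invented for the example; your real rules depend on what your app actually saves.

```python
import json

MAX_BODY_BYTES = 10_000  # hypothetical upper bound on a saved entry


def validate_entry(raw_body):
    """Return the cleaned entry if it is acceptable, otherwise raise ValueError."""
    if len(raw_body) > MAX_BODY_BYTES:
        raise ValueError("body too large")
    try:
        data = json.loads(raw_body)
    except ValueError as exc:  # covers malformed JSON and undecodable bytes
        raise ValueError("not valid JSON") from exc

    # Accept exactly the fields we expect, nothing more and nothing less.
    if not isinstance(data, dict) or set(data) != {"title", "score"}:
        raise ValueError("unexpected fields")

    title, score = data["title"], data["score"]
    if not isinstance(title, str) or not 0 < len(title) <= 100:
        raise ValueError("bad title")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError("bad score")
    if not 0 <= score <= 1_000_000:
        raise ValueError("score out of range")

    return {"title": title, "score": score}
```

Only what `validate_entry` returns should reach the data store; the raw POST body never should.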
If you have the server side of things tidied up, then JSON.parse becomes a bit obsolete in terms of preventing JS injection. Still, it's not bad to have the extra layer, especially if you are relying on remote website APIs to get some of your data.
Hope this is useful to you.
I'm currently evaluating possible solutions to the following problem:
A set of data entries must be synchronized between multiple clients, where each client may only view (or even know about the existence of) a subset of the data.
Each client "owns" some of the elements, and the decision about who else can read or modify those elements may only be made by the owner. To complicate the situation even more, each element (and each element revision) must have a unique identifier that is the same for all clients.
While the latter sounds like a perfect task for CouchDB (and a document-based data model would fit my needs perfectly), I'm not sure if the authentication/authorization subsystem of CouchDB can handle these requirements: while it should be possible to restrict write access using validation functions, there doesn't seem to be a way to authorize read access. All solutions I've found for this problem propose routing all CouchDB requests through a proxy (or an application layer) that handles authorization.
So, the question is: Is it possible to implement an authorization layer that filters requests to the database so that access is granted only to documents that the requesting client has read access to, and still use the replication mechanism of CouchDB? Simplified, this would be some kind of "selective replication" where only some of the documents, and not the whole database, are replicated.
I would also be thankful for pointers to detailed information about how replication works. The CouchDB wiki and even the "Definitive Guide" book are not too specific about that.
This begs for replication filters. You filter outbound replication based on whatever criteria you impose, and give the owner of the target unrestricted access to their own copy.
I haven't had the opportunity to play with replication filters directly, but the idea would be that each doc carries some information about who has access to it, and the filtering mechanism then allows outbound replication of only those documents that you have access to. Replication from the target back to the master would be unrestricted, allowing the master to remain a rollup copy and potentially multicast changes to overlapping sets of data.
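To show the idea only, here is a plain Python simulation of filtered outbound replication. This is not CouchDB's API (real filter functions live in design documents and are written in JavaScript), and the `owner` and `readers` fields as well as the function names are assumptions for the sketch.

```python
def may_replicate(doc, requesting_user):
    """Filter predicate: True if this document may leave the master for this user."""
    return (
        doc.get("owner") == requesting_user
        or requesting_user in doc.get("readers", [])
    )


def filtered_changes(master_docs, requesting_user):
    """Simulate outbound replication: only permitted documents are sent."""
    return [doc for doc in master_docs if may_replicate(doc, requesting_user)]


master = [
    {"_id": "a", "owner": "alice", "readers": ["bob"]},
    {"_id": "b", "owner": "alice", "readers": []},
    {"_id": "c", "owner": "bob", "readers": []},
]

bob_copy = filtered_changes(master, "bob")      # documents a and c
alice_copy = filtered_changes(master, "alice")  # documents a and b
```

Bob's copy never contains document `b`, so he cannot even learn that it exists, which is the "selective replication" the question asks for. The filter only helps if the server, not the client, decides who `requesting_user` is.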
What you are after is replication filters. According to Chris Anderson, it is a 0.11 feature.
"The current status is that there is an API for filtering the _changes feed. The replicator in 0.10 consumes the changes feed, so the next step is getting the replicator to use the filter API.
There is work in progress on this, so it should be fully ready to go in 0.11."
See the original post
Here is a new link to some documentation about this:
http://blog.couchbase.com/what%E2%80%99s-new-apache-couchdb-011-%E2%80%94-part-three-new-features-replication
Indeed, as others have said, replication filters are the way to go for this. Here is a link with some information on using them.
One caveat I would add is that at scale replication filters can be extremely slow. More information about this and other nuances of CouchDB can be found in this excellent blog post: "what every developer should know about couchdb". For large-scale systems, performing replication in the application layer has proven faster and more reliable.
I've come across LDAP in many projects I've been involved in but, truth be told, I don't really understand it. I thought it was just a person directory, but later I discovered that it can contain any objects in a hierarchical structure.
I installed OpenLDAP on my box and found many tutorials, but they only cover the installation.
What is LDAP? What are the scenarios where LDAP is the right choice? What are the LDAP concepts I should know for working with it? What are the advantages of LDAP? Is it used just because old applications used it? Is there a good doc anywhere on the internet explaining all these questions?
UPDATE:
Complementing the answers, I found this link, which contains a quick start guide for LDAP newbies like me.
What is LDAP? What are the scenarios where LDAP is the right choice?
At its core, LDAP is a protocol for accessing objects that are suitable for storage in a directory. Whether something is "suitable" is an entirely subjective determination that's left up to implementers, but typically this means collections of many objects that each have infrequently (or never) updated data, where each object has an obvious or canonical way to be looked up:
a phone book (look up by name or by phone number)
titles in a library (look up by title, author, etc.)
tenants in a building (look up by floor, suite, name, etc.)
and so on.
Note that LDAP itself is just a protocol and doesn't provide any actual storage -- in much the same way, HTTP doesn't imply anything about whether you're using Apache, Jetty, Tomcat, Mongrel, et al. as a web server. (One problem with LDAP in general is the confusing reuse of names to mean different things. Wikipedia has a good section on this.)
DITs are a hierarchical description scheme that lends itself to B-tree algorithms very nicely, resulting in tremendous search performance in most cases. Directory servers like OpenDS return indexed searches in microseconds, whereas RDBMS systems are much slower. Directory servers (often called LDAP servers) trade resources (RAM, CPU) for fast read response. RDBMS systems provide greater functionality for managing the data in question. Need speed with few or zero updates, simplicity, and a small network protocol? Use a directory server. Need data management and mining capabilities, and/or a high rate of change of the database with relational aspects defined between data? Use an RDBMS (MySQL is your best bet here).
LDAP has O(1) read performance, in exchange for O(something worse) write performance. It's ideal for data that's accessed frequently, but changed rarely - directories of people, machine names and addresses, and so on. (hence the acronym: Lightweight Directory Access Protocol.)
LDAP is the right choice where the pain of using a database that isn't relational, in terms of decreased developer familiarity and strange performance characteristics, is less than the gain of blindingly fast read access.
This link will explain LDAP http://blogs.oracle.com/raghuvir/entry/ldap
We use LDAP in our office for company-wide email address lookups. We also use it as a single sign-on service for our internal apps.
One perspective I like to harp on is LDAP is an app on top of a persistence store and a database is a persistence store. Both can be used to store user information.
LDAP gives you a hierarchy which is harder to do in a database. You can make a hierarchy in a database but it's harder to do things like delegation (these rows belong to you only) or ACLs on rows. So pushing security problems out of the database is easier if you use LDAP for storing user identities. Trying to solve it in the database is weird.
At the same time, LDAP is terrible for reporting against (transform LDAP to a DB for reporting). Storing attributes deep in the tree that need to be searched quickly can be problematic for performance (don't do this, have a DB on the side or try to flatten the query by redesigning your DIT). Storing attributes all over the place in a really deep DIT is just bad LDAP or system design but sometimes it's unavoidable if you're tied to a vendor product or legacy app.
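The hierarchy and delegation point above can be pictured with distinguished names (DNs): an entry belongs to a subtree if its DN ends with the subtree's base DN. The Python sketch below is deliberately naive, it splits on commas and lowercases everything, so it ignores escaped commas and attribute-specific matching rules; the DNs and the function names `parse_dn` and `is_under` are made up for illustration.

```python
def parse_dn(dn):
    """Split a DN into normalized RDNs, leaf first. Naive: ignores escaped commas."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def is_under(entry_dn, base_dn):
    """True if entry_dn is the base itself or lies somewhere below it in the tree."""
    entry, base = parse_dn(entry_dn), parse_dn(base_dn)
    return len(entry) >= len(base) and entry[len(entry) - len(base):] == base


people = "ou=people,dc=example,dc=com"
alice = "uid=alice, ou=People, dc=example, dc=com"
printer = "cn=printer1,ou=devices,dc=example,dc=com"

alice_is_person = is_under(alice, people)      # alice sits below ou=people
printer_is_person = is_under(printer, people)  # the printer is in another branch
```

A delegated administrator for `ou=people` can be granted rights on everything for which `is_under(..., people)` holds, which is the kind of rule that is awkward to express per row in a relational table. Real directory servers evaluate this through their own ACL mechanisms, not application code like this.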
LDAP is just a protocol; the Wikipedia article explains it adequately: http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol
It's a way to query an underlying organizational structure like Microsoft's Active Directory. You can use LDAP queries to get all kinds of information about users, use it for setting application rights, etc.
I am working part time and a full time student. My curriculum encourages (read requires) many group projects.
I have used OpenLDAP and phpLDAPadmin to control access to my Subversion and Mercurial repos, Trac projects, Hudson, etc. It wasn't easy to install, but the time saved in administration was a godsend.
If you have projects where you will have many groups of people who need to be able to use different resources, it is a good tool.
See this link, which explains LDAP in depth:
http://www.umich.edu/~dirsvcs/ldap/doc/guides/slapd/1.html#RTFToC1
LDAP is an access protocol; it only provides an API to the underlying technology for which you are trying to find applications - a directory service. OpenLDAP is one of the open source directory services; Sun has another implementation called OpenDS. Active Directory and Novell NDS are another two commonly seen in the field.
The directory can be used for storing information about any sort of resource, and the relationships between the resources - for example, rights of a user to a directory, a printer, or a network access device.
Is there a good doc anywhere on the internet explaining all these questions?
IBM published an excellent Red Book about LDAP. The title is:
Understanding LDAP - Design and Implementation.
It can be downloaded from the previous link.
In one of my old workplaces we used LDAP as our primary user authentication system.
This in turn provided our various systems with information such as which department users belonged to, where their home directories should be mounted, and their contact details, and it supported employee management.
Other things we had set up to work through LDAP, though not necessarily controlled by it, were SQL users, K4, Samba and email account generation.