Use SQL or NoSQL? - sql

I'm designing a system that checks a given website for any security vulnerabilities. The system includes a client (firefox plugin) and a server. The server does all the scanning while the client just relays that info to the user. If a website is dangerous, it is blacklisted; otherwise whitelisted.
The system must hypothetically be able to handle several thousands of requests and updates to the database simultaneously.
Although the database is expected to have a very simple structure, I am still considering using NoSQL because my understanding is that it can handle a greater amount of queries. Is this true? Which db technology is better suited for my system?

I suggest a NoSQL database.
In fact I've been working with two databases in the last weeks, and searching on internet I found the differences between a NoSQL an a SQL database.
Pratically, you should use a NoSQL db if you have a lot of data to query. Remind that it's not sure the data recovery in case of a db disaster.
Instead, use a SQL database if your data MUST be permanent, and you can't lose it. But query times will be longer, so it's not suggested if you have tons of data.
I understood, from what you wrote, that you need lot of queries and you "can lose" the data (if you lose a website of the list, you'll just need to re-check it, right?).
So I suggest you to go for a NoSQL db (I worked with MongoDb, it is the most famous worl-wide).

If you consider NoSQL Databases you have to analyze your data to get the right Database.
For your use case I think you should look at document databases (like MongoDB) or, if you want really high performance, a key-value Database like Redis or Riak.
With Key-Value databases you can only use the key to find the data you want.
With document databases you still have some kind of querys to find the data.
For further information look at: http://nosql-database.org/

Related

SQL Server and Oracle terminology

SQL Server and Oracle terminology -
In SQL Server If I have two applications and want to keep the database completely separate, I could simply create 1 database for each application therefore I end up with 2 databases.
If I wanted to do the same thing in oracle, what do I need to create?
- create a new "Databases"? "Instance", "Schema", or "Tablespace" per application?
(Note, these two applications is the same application used by two different companies, that do not share data!)
Reference: http://www.codeproject.com/Tips/492342/Concept-mapping-between-SQL-Server-and-Oracle
Having worked with SQL Server a lot in the past, I have sympathy with trying to figure out how Oracle organizes things as I struggled with the same thing. My comments below are from SQL Server 2000 and 2003 so forgive me if things have changed since then.
Previous responders have been helpful. I think one problematic assumption here is that there is an exact "level" equivalency between SQL Server and Oracle. What I mean by "level" is something that occupies the same space in the hierarchies that you have diagrammed above (and which, btw, I think are a good place to start but might need a bit of editing in a couple of places, for example how you have diagrammed "user" and "schema" in the Oracle hierarchy, I might put them side-by-side.) I do not think these concept "levels" match exactly between the DB platforms.
A schema in Oracle is somewhat equivalent to a separate database in SQL Server but not entirely.
I would say that the "walls" -- not an exact technical term but oh well -- between databases in SQL server are a bit higher than the "walls" between schemas in Oracle. Others might disagree but here is my reasoning:
a. A schema in Oracle is a purely logical construct. It denotes who has ownership of objects. It has nothing to do with the physical location or layout of the objects. A tablespace (orthagonal concept, as noted by a previous poster) indicates the physical location of objects. A tablespace can hold objects that are in multiple schemas and vice versa. In SQL Server these two concepts are sort of merged into one -- a database is both tablespace and schema, more or less, although in some respects within a DB in SQL Server you then have multiple owners with various object ownership. This can get a bit confusing because as I remember (it's been a couple of years) if not using NT Authentication the users are defined at the server level and then have to "link" to the users in the individual DBs.
b. I remember finding it easier, or at least a bit simpler, to assure myself that users to two separate DBs in SQL Server had no access to the relative other user's DB than I have found it in Oracle.
c. Because a DB in SQL server represents both physical storage and logical ownership, you can detach the DB and move it to another SQL Server Instance and attach it. You can't do this with a schema in Oracle. I mean, you can datapump the data out or back it up or whatever to another server and another schema, but that all takes at least some scripting and such or at least a fair amount of clicking in Enterprise Manager. It doesn't give you the one-click "Detach DB" option that you have in SQL Server which makes it a lot easier to get the idea that SQL Server DBs are units that you can more-or-less move back and forth between databases.
To sum things up, I think either option would work. That is, 1) Create two separate instances of Oracle with one schema in each instance for each application, or 2) Create two separate schemas in one Oracle instance.
There are pros and cons for each option. Option 1 is probably going to be more work to set up and configure but will also give you more separation, independence, ability to have separate hardware, etc., for each DB. Option 2 will be quite a bit simpler but gives you less separation between the data and greater risk of configuration screw-ups or other things allowing users of one schema to access the other. It also means you have to be a bit more careful that someone writing a query accessing data in one schema doesn't use all the CPU and IO resources and starve a user on the other schema.
Also, yes, you could use pluggable databases in 12c. However, given the fact that you need to ask these questions (no shame, just pointing out where you're at) makes me hesitant about recommending what can easily be a more complex setup.
TL;DR -- SQL Server isn't Oracle and Oracle isn't SQL Server. Either option works and there are pros and cons to each.
If you're using 12.1 or later with the multitenant option, you could create separate pluggable databases in a single container database. The other option, which works in any version of Oracle, would be to create a separate schema. It would be possible, as well, to create a separate database, though that is generally not the preferred approach unless you have a particular need to do things like upgrade the database that one application is using without affecting the other.
Creating a Database
If you create a separate database, you'd end up with complete separate memory structures (i.e. the SGA and PGA for each database would be separate) as well as a completely separate set of background processes (each database would have its own log writer process(es) for example). That is a very heavyweight option-- you can't have too many databases on a single server before you start having a lot of contention for RAM, for scheduling all the background processes, etc. It does provide for the maximum separation between different applications-- each database can be running a different version of Oracle with a different set of initialization parameters-- but this also tends to increase the complexity of managing the environment. This generally only makes sense when you have third party applications that require a specific version of the database or a specific set of initialization parameters.
Creating a Schema
If you create a separate schema, you still have a single database so the two schemas are sharing the same memory structures (competing with each other for space in the SGA's buffer cache, for example), initialization parameters, etc. You have to exercise a modicum of planning to ensure that that the two don't interfere with each other-- you'd probably want to make sure that nether application creates public synonyms or at least that they won't wan to create the same public synonym as the other application-- but this is generally pretty trivial.
Creating a Pluggable Database
This only works in 12.1 and only if you have the multitenant option. This is the most similar to the SQL Server concept of creating a new database for each application.
You should create a new instance (schema) on the same database, where the schema in oracle is the same as the SQL server database

Querying multiple database servers?

I am working on a database for a monitoring application, and I got all the business logic sorted out. It's all well and good, but one of the requirements is that the monitoring data is to be completely stand-alone.
I'm using a local database on my web-server to do some event handling and caching notifications. Since there is one event row per system on my monitor database, it's easy to just get the id and query the monitoring data if needed, and since this is something only my web server uses, integrity can be enforced externally. Querying is not an issue either, as all the relationships are one-to-one so it's very straight forward.
My problem comes with user administration. My original plan had it on yet another database (to meet the requirement of leaving the monitoring database alone), but I don't think I was thinking straight when I thought of that. I can get all the ids of the systems a user has access to easily enough, but how then can I efficiently pass that to a query on the other database? Is there a solution for this? Making a chain of ors seems like an ugly and buggy solution.
I assume this kind of problem isn't that uncommon? What do most developers do when they have to integrate different database servers? In any case, I am leaning towards just talking my employer into putting user administration data in the same database, but I want to know if this kind of thing can be done.
There are a few ways to accomplish what you are after:
Use concepts like linked servers (SQL Server - http://msdn.microsoft.com/en-us/library/ms188279.aspx)
Individual connection strings within your front end driving the database layer
Use things like replication to duplicate the data
Also, the concept of multiple databases on a single database server instance seems like it would not be violating your business requirements, and I investigate that as a starting point, with the details you have given.

Should a Text File be stored in my SQL Server Database?

We are using several text files as Templates to create the results of a WCF Data Services - Service Operation call.
The text files are each less than 3000 Bites Max.
What are the pros and cons of storing my template files on the file system with the WCF Data Services files vs storing them in a SQL Server 2008 R2 server?
Prior to SQL Server 2008, I recommended strongly against storing large objects like text files in the database. It tended to slow down access and made them generally harder to work with. Instead, I generally recommended storing links to the files in question.
Of course, this meant that the database would not protect the files in the event someone deleted something they shouldn't and the files needed to be backed up and transferred separately from the database.
With SQL Server 2008, I think many of the former problems have been overcome using the filestream functions and I think that storing files using filestream can be quite useful at times. It continues to store the actual data outside the database, which avoids many of the former complications. But it still binds the two together and permits the database to protect the files rather than just relying on the links in the database to be correct.
There's a lot of pros and cons for either storage method. Nowadays (my opinion has changed, and may change again some day), I'd focus on security and managability.
If it is sensitive data, you might get a bit more security by storing them in the database. If nothing else, it might be more difficult to hack a database than a file system. If security is not so important, it can be easier/simpler to store it on the OS.
For managing, if the data gets updated (and how frequently does that happen), how easy is it to update? One instance in a database is simpler to update (or corrupt...) than an instance on each of however many servers are in your web farm. (1 server, no problem, 20 servers, possible headache.)
I think it's better to store the data directly in the database. This makes it even faster to access because a database is more efficient in reading and generally handling data. You could always store movies in databases - that's no problem. Then it's also possible to stream large data.
For security reasons existing more as enough options to configure your database. And if you cluster your database - this is even more scalable.

should i advocate migrating from access to (my)sql

We have a windows MFC app that is written against an access database on a company server. The db is not that big: 19 MB. There are at most 2-3 users accessing it at any one time. It is used in a factory environment where access speed (or lack thereof) over the intranet becomes noticeable as it is part of the manufacturing time for our widgets.
The scenario is this: as each widget is completed, it gets a record in the db.. by the end of the year, the db is larger and searching for a record takes longer and longer. The solution so far has been to manually move older records to an archival table about once a year.
We are reworking other portions of this app right now, and it would be a good time to move to another db if we are going to do it.
It is my understanding that if we were using sql, the search time would not go up as the table gets bigger because the entire .mdb does not have to be sent over the network each time. Is this correct? Does anyone have any insight about whether it could be worth it to go to the trouble (time and money) of migrating to a new db, or should I just add more functionality to the application we have now, and maybe automatically purge the older records from time to time, and add additional facilities to the app to get at the older records when needed?
Thanks for any wisdom you can share..
Since your database is small and very few users, I could not make a solid case for migration. I would definetly set up an script to archive old records on a more frequent basis (don't archive into same db, this would somewhat defeat the purpose).
But also make sure two things are correct as well.
INDEXES. If your queries start slowing down, make sure you have proper indexes
http://support.microsoft.com/kb/304272
Your network connection between computers is fast. Maybe upgrade to gigabit cards and router? Possibly put the db on a scsi drive (raid 10 for speed and redundancy)
Throwing advanced technology at simple problems is an expensive way to go and not always the answer!
First of all, the information that the whole table and the whole database is transferred across the network is simply incorrect. If the queries are indexed, then the search times should not go up that much over time.
As others have mentioned spending the time + money to setup and maintain and then have someone maintain and manage and support that database server is certainly a possibility here. However, keep in mind that simply migrating a JET based application to sql server in many cases will run slower, and in fact sql server is slower then JET when no network is involved.
So, I would take some time to ascertain why things slow down so much, and also check into how indexing is setup.
So, just keep in mind that it is pure folklore and myth that the whole tables and whole database is transferred over the network. This concept is ONLY DUE to most people really not having any computer training and not knowing and understanding how the JET data engine works.
I would probably move to either Microsoft SQL Server 2008 R2 Express Edition (free) or MySQL (free) if there is both funding and time to put in a data access layer. Because you will be making requests of a remote server and not operating on data at the local workstation this move is very involved from the development standpoint.
However you should analyze whether or not its more cost effective to perform your archival process quarterly or monthly, and just move the archive database to SQL Server 2008 R2 Express Edition. (You can install the Microsoft SQL Server Management Studio client tools on workstations and query the archival database for faster reports on historical data without rewriting your entire production application; similar solutions exist for using MySQL or other OSS/free RDBMS).
I have cilents with 300 mb databases although they should be upsizing to SQL Server for other reasons. 19 Mb is relatively small. If performance is bad enough that archiving speeds things up then check the indexes to the tables for all your sorting and selection fields. Albert gave you a good URL there to check.
Entire MDB files do not go down the wire. Unless you are missing indexes.
Instead of shipping the DB over the network to the client and then performing queries, you could instead write a small wrapper on the server that handles requests, looks up the result in the Access DB (using SQL + the Access ODBC driver), and returns the result. This avoids the overhead of a large migration you might not need and still gets rid of the basic problem the users are experiencing.
Moving to a "proper" database solution is the best long term solution, but if your needs scale linearly and slowly over the next 30 years, it's hard to justify an expensive migration. That said, if you expect to really ramp up, or want to be more "future-proof", migrating now will likely save money/time.
It is my understanding that if we were
using sql, the search time would not
go up as the table gets bigger because
the entire .mdb does not have to be
sent over the network each time. Is
this correct?
This general idea is true for almost all databases. The idea of a database is to separate your application from the actual data. The data resides in a database server. Your application doesn't.
Does anyone have any insight about
whether it could be worth it to go to
the trouble (time and money) of
migrating to a new db
Yes. Having proposed this many times. It's expensive. It's complicated. Your MS-Access database will never get better or faster.
Other database servers will (and can) get faster and more sophisticated. After all, you're not sending .MDB files through a network anymore. The limitations are reduced. You're working with standard SQL through ODBC. Any database will work at the end of ODBC. You can fire vendors to find better, faster, cheaper products. Once you stop using Access you have choices.
Either stop using Access now or plan to suffer with it forever. And remake this decision every year until the end of time.

Single or multiple databases

SQL Server 2008 database design problem.
I'm defining the architecture for a service where site users would manage a large volume of data on multiple websites that they own (100MB average, 1GB maximum per site). I am considering whether to split the databases up such that the core site management tables (users, payments, contact details, login details, products etc) are held in one database, and the database relating to the customer's own websites is held in a separate database.
I am seeing a possible gain in that I can distribute the hardware architecture to provide more meat to the heavy lifting done in the websites database leaving the site management database in a more appropriate area. But I'm also conscious of losing the ability to directly relate the sites to the customers through a Foreign key (as far as I know this can't be done cross database?).
So, the question is two fold - in general terms should data in this sort of scenario be split out into multiple databases, or should it all be held in a single database?
If it is split into multiple, is there a recommended way to protect the integrity and security of the system at the database layer to ensure that there is a strong relationship between the two?
Thanks for your help.
This question and thus my answer may be close to the gray line of subjective, but at the least I think it would be common practice to separate out the 'admin' tables into their own db for what it sounds like you're doing. If you can tie a client to a specific server and db instance then by having separate db instances, it opens up some easy paths for adding servers to add clients. A single db would require you to monkey with various clustering approaches if you got too big.
[edit]Building in the idea early that each client gets it's own DB also just sets the tone for how you develop when it is easy to make structural and organizational changes. Discovering 2 yrs from now you need to do it will become a lot more painful. I've worked with split dbs plenty of times in the past and it really isn't hard to deal with as long as you can establish some idea of what the context is. Here it sounds like you already have the idea that the client is the context.
Just my two cents, like I said, you could be close to subjective on this one.
Single Database Pros
One database to maintain. One database to rule them all, and in the darkness - bind them...
One connection string
Can use Clustering
Separate Database per Customer Pros
Support for customization on per customer basis
Security: No chance of customers seeing each others data
Conclusion
The separate database approach would be valid if you plan to support per customer customization. I don't see the value if otherwise.
You can use link to connect the databases.
Your architecture is smart.
If you can't use a link, you can always replicate critical data to the website database from the users database in a read only mode.
concerning security - The best way is to have a service layer between ASP (or other web lang) and the database - so your databases will be pretty much isolated.
If you expect to have to split the databases across different hardware in the future because of heavy load, I'd say split it now. You can use replication to push copies of some of the tables from the main database to the site management databases. For now, you can run both databases on the same instance of SQL Server and later on, when you need to, you can move some of the databases to a separate machine as your volume grows.
Imagine we have infinitely fast computers, would you split your databases? Of course not. The only reason why we split them is to make it easy for us to scale out at some point. You don't really have any choice here, 100MB-1000MB per client is huge.