I'm designing a system that checks a given website for security vulnerabilities. The system includes a client (a Firefox plugin) and a server. The server does all the scanning, while the client just relays the results to the user. If a website is dangerous, it is blacklisted; otherwise it is whitelisted.
The system must hypothetically be able to handle several thousands of requests and updates to the database simultaneously.
Although the database is expected to have a very simple structure, I am still considering NoSQL because my understanding is that it can handle a greater number of queries. Is this true? Which database technology is better suited to my system?
I suggest a NoSQL database.
I've been working with both kinds of database over the last few weeks, and while searching the internet I read up on the differences between NoSQL and SQL databases.
In practice, you should use a NoSQL database if you have a lot of data to query. Keep in mind that data recovery is not guaranteed if the database fails.
Use a SQL database instead if your data MUST be permanent and you can't afford to lose it. Query times will be longer, though, so it's not recommended if you have tons of data.
From what you wrote, I understand that you need a lot of queries and that you "can lose" the data (if a website drops off the list, you just need to re-check it, right?).
So I suggest you go for a NoSQL database (I have worked with MongoDB, which is the best known worldwide).
If you are considering NoSQL databases, you have to analyze your data to pick the right one.
For your use case I think you should look at document databases (like MongoDB) or, if you want really high performance, a key-value database like Redis or Riak.
With key-value databases, the key is the only way to find the data you want.
With document databases, you still have some kind of query language for finding the data.
For further information look at: http://nosql-database.org/
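To make the key-value versus document distinction above concrete, here is a minimal sketch in plain Python. It does not use any real database client: the dictionary and list only stand in for the two storage models, and the host names, field names and the `find` helper are all made up for illustration.

```python
# Key-value model: the only lookup path is the key itself (here, the host name).
kv_store = {
    "example.com": "whitelisted",
    "bad.example": "blacklisted",
}

def kv_lookup(host):
    """Return the stored status for a host, or 'unknown' if it was never scanned."""
    return kv_store.get(host, "unknown")

# Document model: each record has several fields, and any field can be queried.
documents = [
    {"host": "example.com", "status": "whitelisted", "last_scan": "2012-01-10"},
    {"host": "bad.example", "status": "blacklisted", "last_scan": "2012-01-12"},
]

def find(docs, **criteria):
    """Hypothetical query helper: return documents whose fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

print(kv_lookup("bad.example"))
print([d["host"] for d in find(documents, status="blacklisted")])
```

If the plugin only ever asks "what is the status of this host?", the key-value shape is enough. The document shape starts to pay off once you also want questions like "which sites were blacklisted?".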
I'm currently developing a service for an app with WCF. I want to host it on Windows Azure, and it should store data from different users. I'm looking for the right design for my database. In my opinion there are only two possibilities:
Create a new database for every customer
Add a customer ID to every table (or only to the main table, if every other table is connected to it via relationships)
The first approach offers very good speed and isolation, but it's very expensive on Windows Azure (or am I misunderstanding the Azure pricing?). I also don't know how to configure a WCF service so that it uses a different database for each customer.
The second approach is slower and its isolation is poor, but it's easy to implement and cheaper.
Now to my question:
Is there any other way to get high isolation of data and also easy integration in a WCF- service using azure?
What design should I use and why?
You have two additional options: build multiple schema containers within a database (see my blog post about this technique), or even better use SQL Database Federations (you can use my open-source project called Enzo SQL Shard to access federations). The links I am providing give you access to other options as well.
In the end it's a rather complex decision that involves a tradeoff between performance, security and manageability. I usually recommend Federations, even though it has its own set of limitations, because it is a flexible multitenant option for the cloud with the option to filter data automatically. Check out the open source project: you will see how to implement good separation of customer data independently of the physical storage.
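As a rough illustration of keeping customer data separated in a shared table, here is a minimal Python sketch using the standard `sqlite3` module. It is not how Federations or Enzo SQL Shard work internally. The `orders` table, the `customer_id` column and the `TenantScope` helper are assumptions made up for this example. It only shows the general idea of binding every query to one customer so that calling code cannot forget the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, item TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "apples"), (1, "pears"), (2, "plums")],
)

class TenantScope:
    """Hypothetical helper: every query it runs is bound to one customer_id."""

    def __init__(self, connection, customer_id):
        self._conn = connection
        self._customer_id = customer_id

    def orders(self):
        # The customer filter is applied here, not by the caller.
        rows = self._conn.execute(
            "SELECT item FROM orders WHERE customer_id = ? ORDER BY id",
            (self._customer_id,),
        )
        return [item for (item,) in rows]

print(TenantScope(conn, 1).orders())
print(TenantScope(conn, 2).orders())
```

In a WCF service the equivalent scope object would be created once per request from the authenticated customer, so individual operations never build their own customer filter.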
I'm working on a .NET MVC SQL application that will contain sensitive data, for example HIV test results or income. I want to error-proof this privacy as much as possible so that no one except the user can access it (think Joe the Plumber having his information hacked by a state employee).
I read here that splitting the database in two doesn't seem reasonable:
Is splitting databases a legitimate security measure?
although I've heard of this being done. If we could just use two tables... better.
But when I say error-proofing, I mean impossible for ANYONE in our company to access both databases/tables. I'm thinking about putting access to the application code (which would access both databases) and to both databases in the hands of a deep-pockets third party (like PWC or EY) for when the government came calling or some other real need to see both data sources came along.
Anyone have any thoughts on the cleanest way to do this? We'd want to design the tables such that most queries would not require access to both data sources so the relative cost in throughput wouldn't be that much.
You can encrypt a column of data in SQL Server. For the columns that hold sensitive data (e.g. HIV test results or income), you can encrypt the values as they are stored in the database.
Check the details here:
http://msdn.microsoft.com/en-us/library/ms179331.aspx
http://msdn.microsoft.com/en-us/library/bb964742.aspx
Let me know if it helps.
I'm a lead developer on a project which is building web applications for my company's SaaS offering. We are currently using LDAP to store user data such as IDs, passwords, contact details, preferences and other user-specific data.
One of the applications we are building is a reporting service that will both collect and present management information to our end users. Obviously this service will require a RDBMS but it will also need to access user data stored in LDAP.
As I see it, we have two basic implementation options:
Duplicate user data in both LDAP and the RDBMS.
Have the reporting service access LDAP whenever it needs user data.
Although duplicating data (and implementing the mechanisms to make this happen) as suggested in option 1 seems the wrong way to go, my gut feeling is that option 2 would not perform well enough (how do you 'join' LDAP data to RDBMS data as efficiently as a pure RDBMS implementation?).
I did find a related question but I'm still unsure which approach to take. I'd be interested in seeing what people thought of either option or perhaps other options.
Why would you feel that duplicating data would be the wrong way to go? Reporting tools (web based and otherwise) are mostly built around RDBMS's, so any mix'n'match will introduce unnecessary complexities. Reports are likely to need to be changed fairly frequently (from experience), so you want them to be as simple as possible. The data you store about users is unlikely to change its format very often, so once you have your import function working, you won't need to touch it again.
The only obstacle I can see is latency: how do you ensure that your RDBMS copy is up to date? You might need to ensure that your updating code writes to both destinations. Personally, also, I wouldn't necessarily use LDAP for application specific personal preferences: LDAP can't handle transactions, so what happens when data is updated from several directions? (Transactionality is of course also a problem with letting updaters write to both stores...) I'd rather let the RDBMS be the master for most data, and let LDAP worry only about identity, credentials and entitlements, which are rarely changed and only for one set of purposes. For myself, LDAP's ability to deal with hierarchical data isn't all that great a selling point.
Data duplication is not always a bad thing, especially when the usage scenarios are different enough.
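As a rough sketch of the duplication approach discussed above, the following Python example copies user records into a relational table and then joins them to reporting data in plain SQL. It uses the standard `sqlite3` module. The `directory_users` list is only a stand-in for records read from LDAP (no real LDAP client is used), and the table names, column names and `sync_users` function are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for user records read from the directory.
directory_users = [
    {"uid": "alice", "mail": "alice@example.com"},
    {"uid": "bob", "mail": "bob@example.com"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid TEXT PRIMARY KEY, mail TEXT NOT NULL)")
conn.execute("CREATE TABLE report_events (uid TEXT NOT NULL, action TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO report_events (uid, action) VALUES (?, ?)",
    [("alice", "login"), ("alice", "export"), ("bob", "login")],
)

def sync_users(connection, users):
    """Copy directory records into the RDBMS; re-running it refreshes existing rows."""
    connection.executemany(
        "INSERT OR REPLACE INTO users (uid, mail) VALUES (:uid, :mail)", users
    )
    connection.commit()

def events_per_user(connection):
    """An ordinary SQL join, possible because the user data now lives in the RDBMS."""
    return connection.execute(
        "SELECT u.mail, COUNT(*) FROM report_events e "
        "JOIN users u ON u.uid = e.uid GROUP BY u.mail ORDER BY u.mail"
    ).fetchall()

sync_users(conn, directory_users)
print(events_per_user(conn))
```

How often `sync_users` runs determines how stale the copy can get, which is the latency concern raised above.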
I'm no beginner to using SQL databases, and in particular SQL Server. However, I've been primarily a SQL 2000 guy and I've always been confused by schemas in 2005+. Yes, I know the basic definition of a schema, but what are they really used for in a typical SQL Server deployment?
I've always just used the default schema. Why would I want to create specialized schemas? Why would I assign any of the built-in schemas?
EDIT: To clarify, I guess I'm looking for the benefits of schemas. If you're only going to use them as a security scheme, it seems like database roles already filled that.. er.. um.. role. And using them as a namespace specifier seems to be something you could already do with ownership (dbo versus user, etc.).
I guess what I'm getting at is: what do schemas do that you couldn't do with owners and roles? What are their specific benefits?
Schemas logically group tables, procedures, views together. All employee-related objects in the employee schema, etc.
You can also give permissions to just one schema, so that users can only see the schema they have access to and nothing else.
They work much like namespaces in C# code.
They can also provide a kind of naming collision protection for plugin data. For example, the new Change Data Capture feature in SQL Server 2008 puts the tables it uses in a separate cdc schema. This way, they don't have to worry about a naming conflict between a CDC table and a real table used in the database, and for that matter can deliberately shadow the names of the real tables.
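The naming-collision point can be sketched in Python with the standard `sqlite3` module. This is only an analogy: SQLite has no SQL Server style schemas, so an attached database named `cdc` stands in for a schema here. The table and column names are hypothetical and do not reflect the tables Change Data Capture actually creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An attached database plays the role of a second schema named "cdc".
conn.execute("ATTACH DATABASE ':memory:' AS cdc")

# The same table name can exist once per schema without any conflict.
conn.execute("CREATE TABLE main.employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE cdc.employees (employee_id INTEGER, operation TEXT)")

conn.execute("INSERT INTO main.employees (name) VALUES ('Sam')")
conn.execute("INSERT INTO cdc.employees VALUES (1, 'insert')")

# The schema-qualified name selects which of the two tables is meant.
real_rows = conn.execute("SELECT name FROM main.employees").fetchall()
tracked_rows = conn.execute(
    "SELECT employee_id, operation FROM cdc.employees"
).fetchall()

print(real_rows)
print(tracked_rows)
```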
I know it's an old thread, but I just looked into schemas myself and think the following could be another good candidate for schema usage:
In a data warehouse with data coming from different sources, you can use a different schema for each source and then, for example, control access based on the schemas. This also avoids possible naming collisions between the various sources, as another poster replied above.
If you keep your schema discrete then you can scale an application by deploying a given schema to a new DB server. (This assumes you have an application or system which is big enough to have distinct functionality).
As an example, consider a system that performs logging. All logging tables and SPs are in the [logging] schema. Logging is a good example because it is rare (if it ever happens) that other functionality in the system would overlap with (that is, join to) objects in the logging schema.
A hint for using this technique -- have a different connection string for each schema in your application / system. Then you deploy the schema elements to a new server and change your connection string when you need to scale.
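The hint above can be sketched in a few lines of Python. The server names, database names and both helper functions are hypothetical. The point is only that each schema resolves its own connection string, so moving one schema to another server is a configuration change and not a code change.

```python
# Hypothetical connection strings, one per schema. Initially every schema
# lives on the same server.
connection_strings = {
    "dbo": "Server=db1;Database=App",
    "logging": "Server=db1;Database=App",
}

def connection_string_for(schema):
    """Return the connection string the application should use for a schema."""
    return connection_strings[schema]

def relocate_schema(schema, new_connection_string):
    """Point one schema at a new server; the other schemas are unaffected."""
    connection_strings[schema] = new_connection_string

# Scaling step: the logging objects have been deployed to a second server.
relocate_schema("logging", "Server=db2;Database=AppLogging")

print(connection_string_for("logging"))
print(connection_string_for("dbo"))
```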
At an Oracle shop I worked at for many years, schemas were used to encapsulate procedures (and packages) that applied to different front-end applications. A different 'API' schema for each application often made sense, as the use cases, users, and system requirements were quite different. For example, one 'API' schema was for a development/configuration application to be used only by developers. Another 'API' schema was for accessing the client data via views and procedures (searches). Another 'API' schema encapsulated code that was used for synchronizing development/configuration and client data with an application that had its own database. Some of these 'API' schemas, under the covers, would still share common procedures and functions with each other (via other 'COMMON' schemas) where it made sense.
I will say that not having a schema is probably not the end of the world, though it can be very helpful. Really, it is the lack of packages in SQL Server that really creates problems in my mind... but that is a different topic.
I tend to agree with Brent on this one... see this discussion here. http://www.brentozar.com/archive/2010/05/why-use-schemas/
In short... schemas aren't terribly useful except for very specific use cases. Makes things messy. Do not use them if you can help it. And try to obey the K(eep) I(t) S(imple) S(tupid) rule.
I don't see the benefit in aliasing out users tied to Schemas. Here is why....
Most people initially connect their user accounts to databases via roles. As soon as you assign a user to either the sysadmin role or the db_owner database role, in any form, that account is either aliased to the "dbo" user account or has full permissions on the database. Once that occurs, no matter how you assign yourself to a schema beyond your default schema (which has the same name as your user account), those dbo rights are assigned to the objects you create under your user and schema. It's kind of pointless: it is just a namespace, and it confuses true ownership of those objects. It's poor design if you ask me, whoever designed it.
What they should have done is create "Groups", throw out schemas and roles, and just allow you to tier groups of groups in any combination you like, then at each tier tell the system whether permissions are inherited, denied, or overwritten with custom ones. This would have been so much more intuitive and would have allowed DBAs to better control who the real owners of those objects are. Right now it's implied in most cases that the default SQL Server user dbo has those rights, not the user.
I think schemas are like a lot of new features (whether to SQL Server or any other software tool). You need to carefully evaluate whether the benefit of adding it to your development kit offsets the loss of simplicity in design and implementation.
It looks to me like schemas are roughly equivalent to optional namespaces. If you're in a situation where object names are colliding and the granularity of permissions is not fine enough, here's a tool. (I'd be inclined to say there might be design issues that should be dealt with at a more fundamental level first.)
The problem can be that, if it's there, some developers will start casually using it for short-term benefit; and once it's in there it can become kudzu.
In SQL Server 2000, objects were tied to the user who created them. If a user, say Sam, created an object, say Employees, that table would appear as Sam.Employees. What happens if Sam leaves the company or moves to some other business area? As soon as you delete the user Sam, what would happen to the Sam.Employees table? You would probably have to change the ownership first, from Sam.Employees to dbo.Employees. Schemas provide a solution to this problem. Sam can create all his objects within a schema such as Emp_Schema. If he creates an object Employees within Emp_Schema, the object is referred to as Emp_Schema.Employees. Even if the user account Sam needs to be deleted, the schema is not affected.
Development: each of our devs gets their own schema as a sandbox to play in.
Here is a good implementation example of using schemas with SQL Server. We had several MS Access applications that we wanted to convert to an ASP.NET app portal. Every MS Access application is rewritten as an app for that portal, and every one has its own database tables. Some of those tables are related across applications, so we put them in the common dbo schema of SQL Server. The rest get their own schemas. That way, the tables that belong to a given app on the ASP.NET portal can easily be navigated, visualised and maintained.