Should the query side of a CQRS application call the database directly?

As title says:
Should the query side of CQRS applications call the database directly in the controllers/handlers and skip application services, domains and repositories?
What if the query logic is complex and/or I also need to publish an event (related to the read operation) to a message broker? In what layer would that logic fit?

The query side will only contain the methods for getting data, so it can and should be really simple. The domain model from the command side is definitely not part of the query side; the queries are separate from the model we have in our domain. An abstraction on top of your persistence is not required either.
Simple query logic will make your life easier. The secret sauce of CQRS is polyglot persistence: you may maintain multiple denormalized representations of your data, also known as materialized views, each tailored to a particular query. You can keep multiple projections of your data in different databases depending on your query needs, and if you do that, the query side tends to become simple.
For example, if you have a projection of something that is an entity in your domain, like a customer, you can persist it in Mongo and query it by id: really simple and performant. If you have some report spanning multiple orders, you can persist those in a relational database and run SQL queries: again simple and performant. This way you end up with GET queries that do database queries and return the read models without any additional mapping.
Having said that, I would like to state that this is the typical use case, but your read models can also just be slightly different queries on the same table of a single DB. This makes the queries a bit more complex, but it might be good enough too.
I also don't think that you should publish an event from the query side. What would that event be?
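To make the "thin query side" concrete, here is a minimal sketch of a query handler that reads a projection directly over JDBC; the read model, table, and column names are illustrative assumptions, not anything from the question.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

record CustomerReadModel(long id, String name, String email) {}

class CustomerQueryHandler {
    private final Connection readDb; // connection to the read-side projection

    CustomerQueryHandler(Connection readDb) { this.readDb = readDb; }

    // Queries the denormalized projection directly: no aggregate,
    // no repository, no extra mapping layer.
    CustomerReadModel byId(long id) throws SQLException {
        try (PreparedStatement ps = readDb.prepareStatement(
                "SELECT id, name, email FROM customer_view WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                        ? new CustomerReadModel(rs.getLong("id"),
                              rs.getString("name"), rs.getString("email"))
                        : null;
            }
        }
    }
}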

Use SQL or NoSQL?

I'm designing a system that checks a given website for any security vulnerabilities. The system includes a client (firefox plugin) and a server. The server does all the scanning while the client just relays that info to the user. If a website is dangerous, it is blacklisted; otherwise whitelisted.
The system must hypothetically be able to handle several thousands of requests and updates to the database simultaneously.
Although the database is expected to have a very simple structure, I am still considering NoSQL because my understanding is that it can handle a greater volume of queries. Is this true? Which DB technology is better suited to my system?
I suggest a NoSQL database.
In fact I've been working with two databases over the last few weeks, and from searching the internet I learned the main differences between NoSQL and SQL databases.
Practically speaking, you should use a NoSQL DB if you have a lot of data to query, but bear in mind that many NoSQL setups make weaker durability guarantees, so data recovery after a database disaster is not assured.
Use a SQL database instead if your data MUST be permanent and you can't lose it; the trade-off is that queries can be slower, so it's not suggested if you have tons of data.
From what you wrote, I understand that you need lots of queries and you "can lose" the data (if you lose a website from the list, you'll just need to re-check it, right?).
So I suggest you go for a NoSQL DB (I worked with MongoDB; it is the most famous worldwide).
If you consider NoSQL databases, you have to analyze your data to pick the right one.
For your use case I think you should look at document databases (like MongoDB) or, if you want really high performance, a key-value database like Redis or Riak.
With key-value databases you can only use the key to find the data you want.
With document databases you still have some kind of queries to find the data.
For further information look at: http://nosql-database.org/
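To illustrate the key-value approach for this use case, here is a minimal sketch using the Jedis client for Redis; the key scheme and verdict values are assumptions invented for the example.
import redis.clients.jedis.Jedis;

public class VerdictStore {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // The URL is the key, the verdict is the value.
            jedis.set("site:example.com", "blacklisted");
            // Lookup is a single fetch by key; no query language needed.
            String verdict = jedis.get("site:example.com");
            System.out.println(verdict); // prints "blacklisted"
        }
    }
}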

Querying multiple database servers?

I am working on a database for a monitoring application, and I got all the business logic sorted out. It's all well and good, but one of the requirements is that the monitoring data is to be completely stand-alone.
I'm using a local database on my web-server to do some event handling and caching notifications. Since there is one event row per system on my monitor database, it's easy to just get the id and query the monitoring data if needed, and since this is something only my web server uses, integrity can be enforced externally. Querying is not an issue either, as all the relationships are one-to-one so it's very straight forward.
My problem comes with user administration. My original plan had it on yet another database (to meet the requirement of leaving the monitoring database alone), but I don't think I was thinking straight when I came up with that. I can get all the IDs of the systems a user has access to easily enough, but how can I then efficiently pass those to a query on the other database? Is there a solution for this? Building a chain of ORs seems like an ugly and buggy solution.
I assume this kind of problem isn't that uncommon? What do most developers do when they have to integrate different database servers? In any case, I am leaning towards just talking my employer into putting user administration data in the same database, but I want to know if this kind of thing can be done.
There are a few ways to accomplish what you are after:
Use concepts like linked servers (SQL Server - http://msdn.microsoft.com/en-us/library/ms188279.aspx)
Individual connection strings within your front end driving the database layer
Use things like replication to duplicate the data
Also, hosting multiple databases on a single database server instance does not seem to violate your business requirements; given the details you have provided, I would investigate that as a starting point.
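If you do end up on two servers, the usual workaround for the "chain of ORs" problem is two round trips with a parameterized IN clause. A minimal JDBC sketch, with made-up connection strings and table names:
import java.sql.*;
import java.util.*;

public class CrossDbQuery {
    public static void main(String[] args) throws SQLException {
        // Round trip 1: fetch the system IDs the user may access.
        List<Long> systemIds = new ArrayList<>();
        try (Connection admin = DriverManager.getConnection(
                 "jdbc:postgresql://admin-host/useradmin", "app", "secret");
             PreparedStatement ps = admin.prepareStatement(
                 "SELECT system_id FROM user_systems WHERE user_id = ?")) {
            ps.setLong(1, 42L);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) systemIds.add(rs.getLong(1));
            }
        }
        if (systemIds.isEmpty()) return; // IN () is not valid SQL

        // Round trip 2: one parameterized IN (...) instead of a chain of ORs.
        String placeholders = String.join(",", Collections.nCopies(systemIds.size(), "?"));
        try (Connection monitoring = DriverManager.getConnection(
                 "jdbc:postgresql://monitor-host/monitoring", "app", "secret");
             PreparedStatement ps = monitoring.prepareStatement(
                 "SELECT * FROM events WHERE system_id IN (" + placeholders + ")")) {
            for (int i = 0; i < systemIds.size(); i++) ps.setLong(i + 1, systemIds.get(i));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) { /* process event rows */ }
            }
        }
    }
}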

Allow some SQL in public API?

I'm exposing a more or less public API that allows the user to query datasets from a database. Since the user will need to filter out specific datasets I'm tempted to accept the WHERE-part of the SELECT statement as an API parameter. Thus the user could perform queries as complex as she'd like without worrying about a cluttered API interface.
I'm aware of the fact that I would have to catch SQL-injection attempts.
Do you think that this would circumvent the purpose of an API wrapping a database too much or would you consider this a sane approach?
In general, I'd recommend against letting them embed actual SQL in their requests.
You can allow them to submit where conditions in their request pretty easily:
<where>
  <condition field="name" operator="equal" value="Fred"/>
</where>
or something similar.
The value of doing this is manifold:
You can parse each condition and make sure it is correct before running it
You can create 'fake' fields, such as "full_name", that may not actually exist in the schema
You can limit the columns they can put conditions on
You can isolate the users from actual changes in your underlying database.
I think the last point is actually the most important. The day will come when you'll need to make changes to the underlying schema of the database; eventually, it will happen. At that point you'll appreciate having a 'translation' layer between what the users send in and the queries you actually run, because it lets you absorb those changes without breaking your users.
The API should present an 'abstracted' version of the actual tables themselves that meet the users needs and isolate them from changes to the actual underlying database.
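As a sketch of what such a translation layer might look like (the field mappings, operators, and SQL expressions below are invented for illustration): parse the XML, map exposed field names onto real columns, whitelist the operators, and emit a parameterized WHERE fragment.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;

class WhereTranslator {
    // Exposed API fields -> real columns; "full_name" is a 'fake' field.
    private static final Map<String, String> FIELDS = Map.of(
            "name", "customers.name",
            "full_name", "customers.first_name || ' ' || customers.last_name");
    private static final Map<String, String> OPERATORS = Map.of(
            "equal", "=", "greater", ">", "less", "<");

    // Returns e.g. "customers.name = ?" and adds the bind values to params.
    static String translate(String xml, List<Object> params) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> parts = new ArrayList<>();
        NodeList conditions = doc.getElementsByTagName("condition");
        for (int i = 0; i < conditions.getLength(); i++) {
            Element c = (Element) conditions.item(i);
            String column = FIELDS.get(c.getAttribute("field"));
            String op = OPERATORS.get(c.getAttribute("operator"));
            if (column == null || op == null)
                throw new IllegalArgumentException("Condition rejected");
            parts.add(column + " " + op + " ?");
            params.add(c.getAttribute("value"));
        }
        return String.join(" AND ", parts);
    }
}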
I would recommend limiting your users' account by modifying the permissions to only allow the user to SELECT from tables. Don't allow updating, inserting, or deleting records. Lock down the account as much as possible, ideally at the table level.
If the WHERE clause is limited to only a few columns and the comparator is limited to >, = or < then perhaps you could just have the user pass in some extra parameters to represent columns and comparators. You then build the WHERE safely on your server side.
If this is too messy then by all means let them pass a full WHERE clause - it's not too hard to sanitise and if you combine that with running the query under a locked-down account (SELECT only), then any potential damage is limited.
Personally I would not want to allow users to be able to pass in SQL directly to my database, the risks are too great.
If you fail to catch every injection attempt, you risk data theft, someone simply destroying your database, or someone hijacking it for some other use that you really don't want.

Is directly executing SQL bad app design?

I'm developing an iOS application that's a manager/viewer for another project. The idea is that the app will be able to process the data stored in a database into a number of visualizations, the overall effect being similar to Cacti. I'm making the visualizations fully user-configurable: the user defines what she wants to see and adds restrictions.
She might specify, for instance, to graph a metric over the last three weeks with user accounts that are currently active and aren't based in the United States.
My problem is that the only design I can think of is more or less passing direct SQL from the iOS app to the backend server to be executed against the database. I know it's bad practice and everything should be written in terms of stored procedures. But how else do I maintain enough flexibility to keep fully user-defined queries?
While the application does compose the SQL, direct SQL is never visible or injectable by the user. That's all abstracted away in UIDateTimeChoosers, UIPickerViews, and the like.
Is all of the data in the database available to all of the users, or do you only permit each user to access a subset of the data? If the latter, simply restricting the database login to read-only access isn't enough to secure your data.
As a trivial example, a user could compromise your interface in order to submit the query SELECT password, salt FROM users WHERE login = 'admin', hijack the response to get at the raw data, and brute force your admin password. As the popularity of an app grows, the pool of malicious users grows more than linearly, until eventually their collective intelligence exceeds that of your team; you shouldn't put yourself in a situation where success will be your downfall.
You could take the SQL query sent by the client application and try to parse it server-side in order to apply appropriate restrictions on the query, to fence the user in, so to speak. But getting there would require you to write a mini SQL parser in your server code, and who wants to do all that work? It's much easier to write code that can write SQL than it is to write code that can read it.
My team solved a similar problem for a reporting interface in a rather complex web application, and our approach went something like this:
Since you already intend to use a graphical interface to build the query, it would be fairly easy to turn the raw data from the interface elements into a data structure that represents the user's input (and in turn, the query). For example, a user might specify, using your interface, the condition that they want the results to be confined to those collected on May 5, 2010 by everyone but John. (Suppose that John's UserID is 3.) Using a variant of the JSON format my team used, you would simply rip that data from the UI into something like:
{ "ConditionType": "AND",
"Clauses": [
{ "Operator": "Equals",
"Operands": [
{ "Column": "CollectedDate" },
{ "Value": "2010-05-05" }
]
},
{ "Operator": "NotEquals",
"Operands": [
{ "Column": "CollectedByUserID" },
{ "Value": 3 }
]
}
]
}
On the client side, creating this kind of data structure is pretty much isomorphic to the task of creating an SQL query, and is perhaps somewhat easier, since you don't have to worry about SQL syntax.
There are subtleties here that I'm glossing over. This only represents the WHERE part of the query, and would have to live in a larger object ({ "Select": ..., "From": ..., "Where": ..., "OrderBy": ... }). More complicated scenarios are possible, as well. For example, if you require the user to be able to specify multiple tables and how they JOIN together, you have to be more specific when specifying a column as an operand in a WHERE clause. But again, all of this is work you would have to do anyway to build the query directly.
The server would then deserialize this structure. (It's worth pointing out that the column names provided by the user shouldn't be taken dirty – we mapped them onto a list of allowed columns in our application; if the column wasn't on the list, deserialization failed and the user got an error message.) With a simple object structure to work with, making changes to the query is almost trivial. The server application can modify the list of WHERE clauses to apply appropriate data access restrictions. For example, you might say (in pseudo-code) Query.WhereClauses.Add(new WhereClause(Operator: 'Equals', Operands: { 'User.UserID', LoggedInUser.UserID } )).
The server code then passes the object into a relatively simple query builder that walks the object and spits back an SQL query string. This is easier than it sounds, but make sure that all of the user-provided parameters are passed in cleanly. Don't sanitize – use parameterized queries.
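For a flavour of what that builder could look like (a simplified sketch, not our production code; it handles only a flat AND list of the clause shape shown above):
import java.util.*;

class Clause {
    String operator;  // "Equals", "NotEquals", ...
    String column;    // from the "Column" operand
    Object value;     // from the "Value" operand
}

class WhereBuilder {
    private static final Set<String> ALLOWED_COLUMNS =
            Set.of("CollectedDate", "CollectedByUserID");
    private static final Map<String, String> OPERATORS =
            Map.of("Equals", "=", "NotEquals", "<>");

    // Returns e.g. "CollectedDate = ? AND CollectedByUserID <> ?" and fills
    // params with the bind values, in order, for a parameterized query.
    static String build(List<Clause> clauses, List<Object> params) {
        List<String> parts = new ArrayList<>();
        for (Clause c : clauses) {
            if (!ALLOWED_COLUMNS.contains(c.column))
                throw new IllegalArgumentException("Unknown column: " + c.column);
            String op = OPERATORS.get(c.operator);
            if (op == null)
                throw new IllegalArgumentException("Unknown operator: " + c.operator);
            parts.add(c.column + " " + op + " ?");
            params.add(c.value);
        }
        return String.join(" AND ", parts);
    }
}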
This approach ultimately worked out really nicely for us, for a few reasons:
It allowed us to break up the complexity of composing a query from a graphical interface.
It ensured that user-generated queries were never executed dirty.
It enabled us to add arbitrary clauses to queries for various kinds of access restrictions.
It was extensible enough that we were able to do nifty things like allowing users to search on custom fields.
On the surface, it may seem like a complex solution, but my team found that the benefits were many and the implementation was clean and maintainable.
EDIT: I have come to dislike my answer here. I agree with some of the commenters below, and I would like to recommend that you build "Query" objects on the client and pass those to a web service which constructs the SQL statement using prepared statements. This is safe from SQL injection because you are using prepared statements, and you can control the security of what is being constructed in the web service which you control.
End of Edit
There is nothing wrong with executing SQL passed from the client. Especially in query building situations.
For example, you can add as many WHERE clauses as you need by joining them with "AND". However, what you should not do is allow a user to specify the SQL itself. You should instead provide an interface that lets your users build the queries. There are a couple of reasons this is advantageous:
Better user experience (who wants to write SQL other than developers?)
Safer from injection. There is just no way you could possibly filter out all dangerous SQL strings.
Other than that, it's absolutely fine to execute dynamic SQL instead of using a stored procedure. Your view that everything should be written in terms of stored procedures seems misguided to me. Sure, stored procedures are nice in a lot of ways, but there are also many downsides to using them.
In fact, overuse of stored procs sometimes leads to performance problems since developers reuse the same stored procedure in multiple places even when they don't need all the data it returns.
One thing you might want to look into though is building the SQL on the server side and passing over some kind of internal representation of the built query. If you have some kind of web service which is exposed and allows your client to run whatever SQL it wants to run, then you have a security concern. This would also help in versioning. If you modify the database, you can modify the web service with it and not worry about people using old clients building invalid SQL.
I see these fully user-configurable visualizations more like building blocks.
I wouldn't pass direct SQL queries to the back-end. I would make the user send parameters (which view to use, filters in the WHERE clause, and so on). Letting the user inject SQL is a potential nightmare (both for security and maintenance).
If you want to let users send over actual SQL, try filtering out keywords like DROP and TRUNCATE. If you have to allow deletes, you can enforce that they use a primary key.
There is nothing wrong with an application sending SQL commands to a database, as long as you are aware of injection issues. So don't do this in your code:
(Java/JDBC)
// Vulnerable: user input is concatenated straight into the SQL string.
String sqlCommand = "SELECT something FROM YOURTABLE WHERE A='" + aTextInputFieldInYourGui + "'";
Statement cmd = connection.createStatement();
ResultSet rs = cmd.executeQuery(sqlCommand);
Why not? See what happens if the user enters this line into aTextInputFieldInYourGui:
'; DELETE FROM YOURTABLE; --
(assuming your DB is MS SQL Server here; for other RDBMSs the syntax differs slightly)
Use prepared statements and parameter binding instead:
(Java/JDBC)
String sqlCommand = "SELECT something FROM YOURTABLE WHERE A = ?";
PreparedStatement cmd = connection.prepareStatement(sqlCommand);
cmd.setString(1, aTextInputFieldInYourGui); // the value is bound, never parsed as SQL
ResultSet rs = cmd.executeQuery();

What's the best way to insert/update/delete multiple records in a database from an application?

Given a small set of entities (say, 10 or fewer) to insert, delete, or update in an application, what is the best way to perform the necessary database operations? Should multiple queries be issued, one for each entity to be affected? Or should some sort of XML construct that can be parsed by the database engine be used, so that only one command needs to be issued?
I ask this because a common pattern at my current shop seems to be to format up an XML document containing all the changes, then send that string to the database to be processed by the database engine's XML functionality. However, using XML in this way seems rather cumbersome given the simple nature of the task to be performed.
It depends on how many you need to do, and how fast the operations need to run. If it's only a few, then doing them one at a time with whatever mechanism you have for doing single operations will work fine.
If you need to do thousands or more, and it needs to run quickly, you should re-use the connection and command, changing the arguments for the parameters to the query during each iteration. This will minimize resource usage. You don't want to re-create the connection and command for each operation.
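As a minimal JDBC sketch of that reuse (the table, column, and type names are invented for the example), batching also lets the driver send the whole set in fewer round trips:
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

record Widget(long id, BigDecimal price) {}

class BatchUpdater {
    // One connection, one prepared command; only the parameters change.
    static void updatePrices(Connection conn, List<Widget> widgets) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE widgets SET price = ? WHERE id = ?")) {
            for (Widget w : widgets) {
                ps.setBigDecimal(1, w.price());
                ps.setLong(2, w.id());
                ps.addBatch();   // queue this row's update
            }
            ps.executeBatch();   // send the queued updates together
        }
    }
}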
You didn't mention which database you are using, but in SQL Server 2008 you can use table-valued parameters to pass complex data like this to a stored procedure. Parse it there and perform your operations. For more info, see Scott Allen's article on OdeToCode.
Most databases support some form of bulk UPDATE or bulk DELETE operation.
From a "business entity" design standpoint, if you are doing different operations on each of a set of entities, you should have each entity handle its own persistence.
If there are common batch activities (like "delete all older than x date", for instance), I would write a static method on a collection class that executes the batch update or delete. I generally let entities handle their own inserts atomically.
The answer depends on the volume of data you're talking about. If you've got a fairly small set of records in memory that you need to synchronise back to disk then multiple queries is probably appropriate. If it's a larger set of data you need to look at other options.
I recently had to implement a mechanism where an external data feed gave me ~17,000 rows of data that I needed to synchronise with a local table. The solution I chose there was to load the external data into a staging table and call a stored proc that did the synchronisation completely within the database.
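A rough sketch of that pattern in JDBC (the staging table and procedure names are made up; the real reconciliation logic lives in the stored proc):
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

record FeedRow(String externalId, String payload) {}

class FeedSync {
    static void sync(Connection conn, List<FeedRow> feedRows) throws SQLException {
        conn.setAutoCommit(false);
        // Step 1: bulk-load the external feed into a staging table.
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO staging_feed (external_id, payload) VALUES (?, ?)")) {
            for (FeedRow row : feedRows) {
                ps.setString(1, row.externalId());
                ps.setString(2, row.payload());
                ps.addBatch();
            }
            ps.executeBatch();
        }
        // Step 2: one stored proc reconciles staging with the live table.
        try (CallableStatement cs = conn.prepareCall("{call sync_feed()}")) {
            cs.execute();
        }
        conn.commit();
    }
}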