I am developing an application that in some ways mimics a notebook. Users log in to the web application, connect to a data source (database/CSV), and then write a series of SQL queries. These are typically queries used to compute metrics.
The workflow is then to run these queries on a periodic basis to compute the metrics and persist them as time-series data.
Since the users can write SQL queries here, what would be the suggested approach to persist the query in a backend store?
You can store SQL queries as text.
But you must not let users define SQL queries and then run them verbatim without first having a human vet each query to make sure it's not malicious (or simply erroneous or unwise).
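As a rough sketch of that advice, the query text can live in an ordinary text column next to an approval flag, and the scheduler only ever picks up rows a human has approved. The table name `saved_queries`, its columns, and the helper functions are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

def init_store(conn):
    # sql_text holds the user's query verbatim; approved is set by a reviewer.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS saved_queries (
               id INTEGER PRIMARY KEY,
               owner TEXT NOT NULL,
               sql_text TEXT NOT NULL,
               approved INTEGER NOT NULL DEFAULT 0
           )"""
    )

def save_query(conn, owner, sql_text):
    # The query text is bound as a parameter, so it is stored as plain data.
    cur = conn.execute(
        "INSERT INTO saved_queries (owner, sql_text) VALUES (?, ?)",
        (owner, sql_text),
    )
    return cur.lastrowid

def approve(conn, query_id):
    conn.execute("UPDATE saved_queries SET approved = 1 WHERE id = ?", (query_id,))

def runnable_queries(conn):
    # Only vetted queries are handed to the periodic job.
    rows = conn.execute(
        "SELECT sql_text FROM saved_queries WHERE approved = 1 ORDER BY id"
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
init_store(conn)
first = save_query(conn, "alice", "SELECT count(*) FROM orders")
second = save_query(conn, "bob", "DROP TABLE orders")
approve(conn, first)
```

Here `runnable_queries(conn)` returns only the first query, because the second was never approved.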
Related
I have a Java application that does a POST with the sql query that is typed in the UI and is executed using JDBC. Since the query is user defined, I'm unable to find a way to prevent the SQL injection issue. For instance if this is the query the user issues :
select * from test_table where id=123
a POST is done with this string to the servlet, where it is executed as a query. Is there any way to get around this, since there is no restriction on what the user can send in?
Thanks
Technically if the user is allowed to write the entire query, it's not an injection attack risk, it's simply an attack risk
Run the query using a database user that has permission only to carry out the types of operations you deem acceptable on the tables you're willing to give access to.
For example, only permit SELECT on tableX, tableY and tableZ. No DML, no DDL and no selecting from any other table
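On a server DBMS this would normally be done with GRANT on a dedicated account. As a self-contained illustration of the same idea, the sketch below uses SQLite's authorizer hook (exposed in Python as `Connection.set_authorizer`) as a stand-in, since SQLite has no user accounts. The table names and the `try_query` helper are hypothetical.

```python
import sqlite3

ALLOWED_TABLES = {"table_x", "table_y"}  # hypothetical tables users may read

def select_only(action, arg1, arg2, db_name, source):
    # Allow the SELECT statement itself.
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    # Allow column reads only from whitelisted tables (arg1 is the table name).
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    # Everything else (DML, DDL, other tables, functions) is refused.
    return sqlite3.SQLITE_DENY

def try_query(conn, sql):
    # Returns the rows, or None if the database refused the statement.
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError:
        return None

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_x (id INTEGER, value REAL);
    CREATE TABLE secrets (token TEXT);
    INSERT INTO table_x VALUES (1, 9.5);
    INSERT INTO secrets VALUES ('hunter2');
    """
)
# Install the restriction only after the setup statements have run.
conn.set_authorizer(select_only)
```

With the authorizer in place, `try_query(conn, "SELECT id, value FROM table_x")` returns rows, while a SELECT against `secrets`, a DELETE, or a DROP TABLE comes back as `None`.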
If your DBMS of choice doesn't allow fine-grained control in this way, then instead run a regular batch script that creates another database containing only a few tables. Permit your users to query this new DB. If it does get wrecked, the script will soon drop it and replace it with a working one containing updated data. This is also beneficial if the copy is placed on another server: it stops your live system from being innocently DoSed by a user executing a duff query that takes up all the resources on the server.
SQL injection would be passing select * from test_table where id=123 in place of a parameter.
Not sure exactly what information you are letting the application use, but I would suggest granting access only to a specific schema. That would provide a consistent security model.
As others have suggested, this is not SQL injection - I call this a "designed in" SQL injection. How you deal with it depends on the use case:
Design a separate interface that does not require the full SQL statement
As Caius suggested, if you can limit the privs in the DB account to only do what the user can do, that would limit the damage
If this is an administrative interface, you may want to limit the usage of this interface to "trusted" users. If you go that route, you want to be very careful to document that users with this privilege have full access to the database, and provide an auditing mechanism to make sure that that list of users is well known.
It is not realistically possible to limit the SQL statement through validation - it's a powerful language, especially in the context of modern databases.
See also this related question
Is there anyway to get around this since there is no restriction on what user can send in?
I'm not sure what you mean by "get around." Is it not the design of this application to allow users to run any query?
If you want to prevent them from running unauthorized queries, then you'll have to implement some Java code in the servlet to check the query and decide whether it's one they're authorized to run.
Some people do this by whitelisting a specific set of known queries. Just match the user's input query against the whitelist.
If they can run a given query with a variety of different constant values, then replace constant values with a ? in both the whitelisted form and in a copy of the user's input SQL query.
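A minimal sketch of that whitelist check, written in Python rather than in the servlet's Java and assuming a deliberately simple notion of "constant" (single-quoted strings and plain numbers). The names `normalize`, `is_allowed`, and `WHITELIST` are made up for the example.

```python
import re

# Matches a single-quoted string literal ('' is an escaped quote)
# or a standalone integer/decimal number.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

def normalize(sql):
    """Replace constants with ? and fold whitespace and case for comparison."""
    without_literals = _LITERAL.sub("?", sql)
    collapsed = " ".join(without_literals.split())
    return collapsed.rstrip(";").strip().lower()

# The whitelisted forms are normalized the same way as the user's input.
WHITELIST = {
    normalize("select * from test_table where id=123"),
}

def is_allowed(sql):
    # Anything that does not match a known shape exactly is rejected.
    return normalize(sql) in WHITELIST
```

With this, `is_allowed("SELECT * FROM test_table WHERE id=456")` is accepted, while `is_allowed("select * from test_table where id=1 or 1=1")` is rejected because its normalized shape is not in the list. The check fails closed: a query the normalizer does not understand simply will not match.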
If they can run a variety of different queries, like with optional clauses and stuff, so that it's impossible to make a whitelist of finite length, then you'll have to implement a SQL parser in your Java servlet and some kind of business rule engine to decide if their query is authorized before you run it against the real database.
At this point, it seems easier to change the application front-end so that users are not allowed to submit arbitrary SQL queries!
I'm designing a UWP app that uses an SQLite database to store its information. From previous research I have learnt that the SQLite functions SQLiteConnection.Update() and SQLiteConnection.Insert() are safe to use, as the inputs are sanitised before being entered into the database.
The next step I need to do is sync that data with an online database - in this case SQL Server - using a service layer as my go-between. Given that the data was previously sanitised by the SQLite database insert, do I still need to parameterise the object values using the service layer before they are passed to my SQL Server database?
The simple assumption says yes because, despite them being sanitised by the SQLite input, they are technically still raw strings that could have an effect on the main database if not parameterised when sending them there.
Should I just simply employ the idea of "If in doubt, parameterise" ?
I would say that you should always use SQL parameters. There are a few reasons why you should do so:
Security.
Performance. If you use parameters the reuse of execution plans could increase. For details see this article.
Reliability. It is always easier to make a mistake if you build SQL commands by concatenating strings.
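To illustrate the security point, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the service layer's SQL Server access (the placeholder syntax differs between drivers; `?` is what `sqlite3` uses). The `notes` table and `save_note` helper are hypothetical.

```python
import sqlite3

def save_note(conn, title):
    # The value is bound as a parameter and never spliced into the SQL text.
    conn.execute("INSERT INTO notes (title) VALUES (?)", (title,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT)")

# A value that was stored "safely" elsewhere is still just a raw string here.
hostile = "x'); DROP TABLE notes; --"
save_note(conn, hostile)

rows = conn.execute("SELECT title FROM notes").fetchall()
```

The hostile string ends up stored as ordinary data and the table is untouched, regardless of where the value came from or how it was handled upstream.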
I know that OLAP is used in Power Pivot, as far as I know, to speed up interacting with data.
But I know that big data databases like Google BigQuery and Amazon RedShift have appeared in the last few years. Do SQL targeted BI solutions like Looker and Chart.io use OLAPs or do they rely on the speed of the databases?
Looker relies on the speed of the database but does model the data to help with speed. Mode and Periscope are similar to this. Not sure about Chartio.
OLAP was used to organize data to help with query speeds. While used by many BI products like Power Pivot and Pentaho, several companies have built their own ways of organizing data to help with query speed. Sometimes this includes storing data in their own data structures to organize the data. Many cloud BI companies like Birst, Domo and Gooddata do this.
Looker created a modeling language called LookML to model data stored in a data store. As databases are now faster than they were when OLAP was created, Looker took the approach of connecting directly to the data store (Redshift, BigQuery, Snowflake, MySQL, etc) to query the data. The LookML model allows the user to interface with the data and then run the query to get results in a table or visualization.
That depends. I have some experience with BI solutions (for example, we worked with Tableau), and it can operate in two main modes: it can execute the query against your server, or it can collect the relevant data and store it on the user's machine (or on the server where the app is installed). When working with large volumes, we used to make Tableau query the SQL Server itself, because our SQL Server machine was very strong compared to the other machines we had.
In any case, even if you store the data locally and want to "refresh" it, the update needs to retrieve the data from the database, which can sometimes also be an expensive operation (depending on how your data is built and organized).
You should also note that you are comparing two different families of products: while Google BigQuery and Amazon Redshift are database engines that are used to store the data and also query it, most BI and reporting solutions are more concerned with querying the data and visualizing it and therefore (generally speaking) are less focused on having smart internal databases (at least in my experience).
I have windows server 2008 r2 with microsoft sql server installed.
In my application, I am currently designing a tool for my users that queries the database to see whether a user has any notifications. Since my users can access the application multiple times in a short timespan, I was thinking about putting some kind of cache on my query logic. But then I thought that my MS SQL Server probably does that for me already. Am I right? Or do I need to configure something to make it happen? If it does, then for how long does it keep the cache?
It's safe to assume that MSSQL will have the caching worked out pretty well =)
Don't bother trying to build anything yourself on top of it, simply make sure that the method you use to query for changes is efficient (eg. don't query on non-indexed columns).
PS: wouldn't caching locally defeat the whole purpose of checking for changes on the database?
Internally the database does all sorts of things, including 'caching', but at all times it works incredibly hard to make sure your users see up-to-date data. So it has to do some work each time your application makes a request.
If you want to reduce the workload by keeping static data in your application then you have to implement it yourself.
The later versions of the .net framework have caching features built in so you should take a look at those (building your own caching can get very complex).
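For a sense of what application-side caching involves, here is a minimal time-based cache sketched in Python rather than with the .NET caching features mentioned above. The class name `TTLCache`, the 30-second lifetime, and the loader function are all assumptions made for the example; a real implementation would also need eviction and thread safety.

```python
import time

class TTLCache:
    """Keeps a loaded value per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # key -> (time stored, value)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the database round trip
        value = loader()     # missing or expired: query again
        self._entries[key] = (now, value)
        return value

# Usage with a fake clock and a stand-in for the notification query.
fake_now = [0.0]
cache = TTLCache(30, clock=lambda: fake_now[0])
calls = []

def load_notifications():
    calls.append(1)      # counts how often the "database" is hit
    return len(calls)
```

The trade-off is the one raised earlier in the thread: for up to 30 seconds a user may see stale notifications, so the lifetime has to be chosen with that in mind.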
SQL Server will handle caching for you, yes. When you create a query or a stored procedure SQL Server will cache that execution plan and reuse it accordingly. From MSDN:
SQL Server execution plans have the following main components:
Query Plan: The bulk of the execution plan is a re-entrant, read-only data structure used by any number of users. This is referred to as the query plan. No user context is stored in the query plan. There are never more than one or two copies of the query plan in memory: one copy for all serial executions and another for all parallel executions. The parallel copy covers all parallel executions, regardless of their degree of parallelism.
Execution Context: Each user that is currently executing the query has a data structure that holds the data specific to their execution, such as parameter values. This data structure is referred to as the execution context. The execution context data structures are reused. If a user executes a query and one of the structures is not being used, it is reinitialized with the context for the new user.
If you wish to clear this cache you can execute sp_recompile (for a single object) or DBCC FREEPROCCACHE (for the whole plan cache).
It seems that one could stop all threat of Sql injection once and for all by simply rejecting all queries that don't use named parameters. Any way to configure Sql server to do that? Or else any way to enforce that at the application level by inspecting each query without writing an entire SQL parser? Thanks.
Remove the grants for a role to be able to SELECT/UPDATE/INSERT/DELETE against the table(s) involved
Grant EXECUTE to the role on the stored procedures/functions/etc
Associate the role to database user(s) you want to secure
It won't stop an account that also has the ability to GRANT access, but it will stop the users associated to the role (assuming no other grants on a per user basis) from being able to execute queries outside of the stored procedure/functions/etc that exist.
There are only a couple ways to do this. OMG Ponies has the best answer: don't allow direct sql statements against your database and instead leverage the tools and security sql server can provide.
An alternative way would be to add an additional tier which all queries would have to go through. In short you'd pass all queries (SOA architecture) to a new app which would evaluate the query for passing on to sql server. I've seen exactly one company do this in reaction to sql injection issues their site had.
Of course, this is a horrible way of doing things because SQL injection is only one potential problem.
Beyond SQL injection, you also have the issue of what happens when the site itself is cracked. Once you can write a new page to a web server it becomes trivial to pass any query you want to the associated database server. This would easily bypass any code-level thing you could put in place. And it would allow the attacker to just write select * from ... or truncate table ... Heck, an internal person could potentially just connect directly to the SQL Server using the site's credentials and run any query they wanted.
The point is, if you leverage the security built into SQL Server to prevent direct table access, then you can control through stored procedures the full range of actions available to anyone attempting to connect to the server.
And how do you want to check for that? Queries sometimes have constant values that would just as easy be added to the query. For instance, I have a database that is prepared to be multi lingual, but not all code is, so my query looks like this:
SELECT NAME FROM SOMETABLE WHERE ID = :ID AND LANGUAGEID = 1
The ID is a parameter, but the language id isn't. Should this query be blocked?
You ask to block queries that don't use named parameters. That can be easily enforced. Just block any query that doesn't specify any parameters. You can do this in your application layer. But it will be hard to block queries like the one above, where one value is a parameter and the other one isn't. You'll need to parse that query to detect it, and it will be hard too.
I don't think sql server has any built in features to do this.
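The application-layer check described above might look roughly like the sketch below. It assumes the `:NAME` parameter style used in the example query (SQL Server itself uses `@name`), and `uses_named_parameters` is a made-up helper. It is intentionally naive: it does not parse SQL, so a colon inside a string literal would fool it.

```python
import re

# A colon followed by an identifier, not preceded by another colon or a
# word character (so "::" casts and times like 12:30 are not counted).
_NAMED_PARAM = re.compile(r"(?<![:\w]):[A-Za-z_]\w*")

def uses_named_parameters(sql):
    return bool(_NAMED_PARAM.search(sql))

mixed = "SELECT NAME FROM SOMETABLE WHERE ID = :ID AND LANGUAGEID = 1"
literal_only = "SELECT NAME FROM SOMETABLE WHERE ID = 5"
```

`uses_named_parameters(literal_only)` is false, so that query would be blocked. `uses_named_parameters(mixed)` is true even though `LANGUAGEID = 1` is a hard-coded constant, which is exactly the case that cannot be caught without parsing the query.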