"Safely" allow users to search with SQL - sql

For example, I've often wanted to search Stack Overflow with
SELECT whatever FROM questions WHERE
views * N + votes * M > answers AND NOT(answered) ORDER BY views;
or something like that.
Is there any reasonable way to allow users to use SQL as a search/filter language?
I see a few problems with it:
Accessing/changing stuff (a carefully setup user account should fix that)
SQL injection (given the previous, the worst they should be able to do is get back junk and crash their session).
DOS attacks with pathological queries
What indexes do you give them?
Edit: I'd like to allow joins and whatnot as well.

Accessing/changing stuff
No problem: just run the query as a locked-down user that has permission only to SELECT.
SQL injection
Just sanitize the query
DOS attacks
Time out the query and throttle access by IP. I guess you can also throttle CPU usage on some servers.
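A minimal sketch of the first and third points in C# (the account name, connection string and database are made up): the user's query runs on a connection that logs in as a SELECT-only account, and the command gets a short timeout so pathological queries are cut off. Per-IP throttling would sit in the web layer in front of this, not in the database code.

using System.Data;
using System.Data.SqlClient;

static class SandboxedSearch
{
    // "search_ro" is a hypothetical account granted SELECT on the searchable
    // tables and nothing else; its permissions, not string checks, are the fence.
    private const string ConnStr =
        "Server=.;Database=qa;User Id=search_ro;Password=...;";

    public static DataTable Run(string userQuery)
    {
        using (var conn = new SqlConnection(ConnStr))
        using (var cmd = new SqlCommand(userQuery, conn))
        {
            cmd.CommandTimeout = 5; // seconds; bounds runaway (DOS-style) queries
            conn.Open();
            var result = new DataTable();
            result.Load(cmd.ExecuteReader());
            return result;
        }
    }
}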

If you SQL-encode your users' input (and make sure to remove all semicolons as well!), I see no huge safety flaw (other than that we're still handing nukes out to psychos...) in having three input boxes: one for the table, one for the columns and one for the conditions. They won't be able to have strings in their conditions, but queries like your example should work. You will do the actual pasting together of the SQL statement, so you'll be in control of what is actually executed. If your setup is good enough, you'll be safe.
BUT, I wouldn't for my life let my users enter SQL like that. If you really want customizable search options, provide either a bunch of flags for the search field, or a bunch of form elements that can be combined at will.
Another option is to invent some kind of "markup language", sort of like Markdown (the format SO uses for all these questions and answers...), that you can translate to SQL; see the sketch below. Then you can make sure that only "harmless" SELECTs are performed, and you can protect user data etc.
In fact, if you ever implement this, you should see whether you can run the queries from a separate account on the SQL server that has access only to what is strictly needed, and obviously read access only.
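To make the markup-language idea concrete, here is a minimal sketch (grammar, field names and table all invented for illustration): the user writes terms like votes>10 views>=100, only whitelisted fields and operators ever reach the SQL text, and every value is bound as a parameter.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text.RegularExpressions;

static class MiniQuery
{
    // Whitelist of fields; nothing the user types is ever pasted into the SQL.
    private static readonly HashSet<string> Fields =
        new HashSet<string> { "views", "votes", "answers" };

    // Grammar: space-separated "field op number" terms, e.g. "votes>10".
    private static readonly Regex Term =
        new Regex(@"^(\w+)(<=|>=|=|<|>)(\d+)$", RegexOptions.Compiled);

    public static SqlCommand Translate(string input)
    {
        var cmd = new SqlCommand();
        var where = new List<string>();
        int i = 0;
        foreach (var token in input.Split(' '))
        {
            var m = Term.Match(token);
            if (!m.Success || !Fields.Contains(m.Groups[1].Value))
                throw new System.ArgumentException("disallowed term: " + token);
            string p = "@p" + i++;
            where.Add(m.Groups[1].Value + " " + m.Groups[2].Value + " " + p);
            cmd.Parameters.AddWithValue(p, int.Parse(m.Groups[3].Value));
        }
        cmd.CommandText =
            "SELECT title FROM questions WHERE " + string.Join(" AND ", where);
        return cmd;
    }
}

Joins, ORDER BY and the rest can be added the same way: each extra construct gets its own whitelisted rule, which is the same spirit as gutting a real grammar down to an allowed subset.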

Facebook does this with FQL. See the blog post or presentation.

I just thought of a strong sanitizing method that could be used to restrict what can be used:
Use MySQL and grab its lex/yacc files.
Use the lex file as-is.
Gut the yacc file down to only the constructs you want to allow.
Add action rules that emit the input on success.

Related

Updating the friendly name of a Liferay page through SQL

Is there a way to update a Liferay site page's friendly name through a SQL script?
We generally do this in the Control Panel through an admin user.
While #steven35's answer might do the job, you're hitting a pet peeve of mine. On a different level: doing it through the Control Panel or through the API is doing it right, and you should not even think about writing to Liferay's database directly. It might work for the moment, but it might also fail in unforeseen ways, sometimes long after your update.
There have been enough examples of this happening. If you change data while Liferay is running, the cache will not be updated. If these values are also indexed in the search index, they won't be updated there, and later searches might not find the correct page until you reindex everything. The same value might be stored somewhere else, or translated. Numerous conditions can fail, and there's always one condition more than you expect and cater for. That one condition might break your neck.
Granted, the friendly name of a page might not fall into the most complex of these cases, but just don't get into the habit of writing to Liferay's database. Or, if you do, don't complain about future upgrades failing or requiring extra work because the database contains values the API didn't expect. The problem is that during the next upgrade (if you do it in, say, one year) you'll long have forgotten that you manually changed data in the database, and you'll blame Liferay for problems during your upgrade.
Changing data is exactly what the UI and the API are for.
Friendly URLs are stored in LayoutFriendlyURL.friendlyURL in your Liferay database, so the following query should work:
UPDATE "yourdatabase"."LayoutFriendlyURL" SET "friendlyURL"="/newurl" WHERE "layoutFriendlyURLId"=12345;
You will also need to update the Layout table accordingly to match the new friendly URL.

GetOrCreate in RavenDB, or a better alternative?

I have just started using RavenDB on a personal project and so far inserting, updating and querying have all been very easy to implement. However, I have come across a situation where I need a GetOrCreate method and I'm wondering what the best way to achieve this is.
Specifically I am integrating with OpenID and once authentication has taken place the user is redirected to my site. At this point I'd either like to retrieve their user record from Raven (by querying on the ClaimsIdentifier property) or create a new record. The user's ID is currently being set by Raven.
Obviously I can write this in two statements, but without some sort of transaction around the select and the create I could potentially end up with two user records in the database with the same claims identifier.
Is there any way to achieve this kind of functionality? Possibly even more importantly, do you think I'm going down the wrong path? I'm assuming that even if I could create a transaction, it would make scaling out to multiple servers difficult and in any case could add a performance bottleneck.
Would a better approach be to have the query and create operations as separate statements, check for duplicates when the user is retrieved, and merge at that point? Or to do something similar on a scheduled task?
I can't help but feel I'm missing something obvious here so any advice on this problem would be greatly appreciated.
Note: while scaling out to multiple servers may seem unnecessary for a personal project, I'm using it as an evaluation of Raven before using it at work.
Dan, although RavenDB has support for transactions, I wouldn't go that way in your case. Instead, you could just use the user's ClaimsIdentifier as the user document's id, because those are guaranteed to be unique.
Alternatively, you can stick with user ids generated by Raven (HiLo, by the way) and use the new UniqueConstraintsBundle, which lets you mark certain properties as unique. Internally it will create an additional document that has the value of your unique property as its id.
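A minimal sketch of the natural-id approach, assuming the standard .NET client (the User class and id prefix are made up): loads by id in RavenDB are ACID and never go through a possibly-stale index, so GetOrCreate collapses to a load followed by a store on the same id, and two concurrent creates collide instead of duplicating.

using Raven.Client;

public class User
{
    public string Id { get; set; }               // "users/<claims identifier>"
    public string ClaimsIdentifier { get; set; }
    public string DisplayName { get; set; }
}

public class UserService
{
    private readonly IDocumentStore store;
    public UserService(IDocumentStore store) { this.store = store; }

    public User GetOrCreate(string claimsIdentifier, string displayName)
    {
        using (var session = store.OpenSession())
        {
            session.Advanced.UseOptimisticConcurrency = true;
            var id = "users/" + claimsIdentifier;
            var user = session.Load<User>(id);   // by-id load, never stale
            if (user != null)
                return user;

            user = new User
            {
                Id = id,
                ClaimsIdentifier = claimsIdentifier,
                DisplayName = displayName
            };
            session.Store(user);                 // a concurrent create of the same id
            session.SaveChanges();               // now fails instead of duplicating
            return user;
        }
    }
}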

How does enterprise search display results for the user and hide unauthorized results?

I am looking to understand how enterprise search solutions tackle the issue of user-permissions.
My question is on displaying the search results for users. The naive approach would display the search results to the user, and then if the user clicks a document he is not authorized to see, he will fail to open it. However, it is even forbidden to display a document's title or excerpt if the user does not have permission to read it. So do the various enterprise search engines:
index each document together with its ACL?
index all documents with no permission info, but check each link in every search result to see whether the querying user has permission to view this link?
Option #2 makes more sense to me, but also seems much slower than option #1.
Option #1 suffers from the need to constantly propagate permission changes to the indexed documents.
I am looking to understand what is the common approach in the existing solutions in the market today. Is there a third option?
I'm surprised to see that this five-year-old question hasn't got any answers, as I think it's quite a common and important problem in enterprise search.
As outlined in the question, there are two common approaches to dealing with document-level security:
early-binding security: indexing ACLs along with the content, and
late-binding security: handling security at query time, by filtering out protected results.
Handling security only at display time is never recommended, as by that point confidential information might already have been revealed (e.g. the title or preview of a document in the search result).
The advantage of implementing security with a late-binding approach is that it's very flexible, because there is no need to re-index content when ACLs change. The biggest drawback, however, is that confidential information might be leaked via facet values, and it's not possible to retrieve and display correct facet counts. It also makes it more difficult to properly populate the result list and handle pagination. Last but not least, this approach can significantly slow down performance.
The advantage of implementing security with an early-binding approach is that it addresses all of the above disadvantages, at the price of re-indexing content as soon as ACLs change. Leaks are still possible, however, e.g. when a group membership or ACL has just changed and isn't yet reflected in the search index. To close this gap, the two approaches are often combined.
Last but not least, there might be a third option, depending on the enterprise search platform you are using: Attivio's Active Security is based on query-time joins, which allow security information to be indexed independently of the document itself; at query time the two documents are merged to ensure that only authorised content makes it into the search results.
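A minimal sketch of the early-binding idea, engine-agnostic and with all names invented: the ACL travels with the document into the index, and the user's group memberships become a mandatory filter, so titles and excerpts of unauthorized documents never leave the engine.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical indexed form of a document: content plus its ACL.
public record IndexedDoc(string Title, string Excerpt, HashSet<string> AllowedGroups);

public static class SearchIndex
{
    // Early binding: the permission check is part of the query itself
    // (a real engine would express it as an indexed-field filter).
    public static IEnumerable<IndexedDoc> Search(
        IEnumerable<IndexedDoc> index, string term, ISet<string> userGroups)
    {
        return index
            .Where(d => d.Title.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .Where(d => d.AllowedGroups.Overlaps(userGroups));
    }
}

Because the filter runs inside the engine, facet counts and pagination are computed over the already-filtered set, which is exactly what the late-binding approach struggles with.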

CMS Settings, .INI file or MySQL?

I'm looking for the best solution for storing the settings of a website, like the limit of posts for users, the limit of users online, ranks, the minimum number of posts needed to be able to do something.
Like here, if you're new you can't thumbs up/down a post, or whatever, so how would you store all of these?
I thought of creating a table with constants in MySQL, but I don't think it's the best solution to add a new MySQL query on every page refresh.
Why not? After all, MySQL can handle a large number of requests, and I've seen much more complex queries than checking access rights. Under the hood, a MySQL query reads from a file just as checking an INI file does, only in an optimized way. I'm guessing that if you don't expect a huge amount of traffic, you'll be fine with a database.
Here it's a matter of preference. I prefer to do this in MySQL because I don't like parsing files and find querying a database easier. Also, editing rows is easier than changing values in a text file.
I'd say your first thought was spot-on. Put constants into a database.
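If the query-per-page-view is the only worry, a small cache bridges the two positions. A sketch in C# for illustration (hypothetical settings(name, value) table, MySQL Connector/NET client): the table is read once and re-read only after a TTL, so a normal page view costs no query at all.

using System;
using System.Collections.Generic;
using MySql.Data.MySqlClient;

// Assumed table layout: settings(name VARCHAR, value VARCHAR).
// No locking shown for brevity; a real app would guard the refresh.
public static class SiteSettings
{
    private static Dictionary<string, string> cache;
    private static DateTime loadedAt;
    private static readonly TimeSpan Ttl = TimeSpan.FromMinutes(5);

    public static string Get(string name, string connectionString)
    {
        if (cache == null || DateTime.UtcNow - loadedAt > Ttl)
        {
            var fresh = new Dictionary<string, string>();
            using (var conn = new MySqlConnection(connectionString))
            using (var cmd = new MySqlCommand("SELECT name, value FROM settings", conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        fresh[reader.GetString(0)] = reader.GetString(1);
            }
            cache = fresh;
            loadedAt = DateTime.UtcNow;
        }
        return cache.TryGetValue(name, out var v) ? v : null;
    }
}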

How do you check your URL for SQL Injection Attacks?

I've seen a few attempted SQL injection attacks on one of my web sites. It comes in the form of a query string that includes the "cast" keyword and a bunch of hex characters which, when "decoded", amount to an injection of banner adverts into the DB.
My solution is to scan the full URL (and params), search for the presence of "cast(0x", and if it's there redirect to a static page.
How do you check your URLs for SQL injection attacks?
I don't.
Instead, I use parametrized SQL Queries and rely on the database to clean my input.
I know, this is a novel concept to PHP developers and MySQL users, but people using real databases have been doing it this way for years.
For example (using C#):
// Bad!
SqlCommand foo = new SqlCommand("SELECT FOO FROM BAR WHERE LOL='" + Request.QueryString["LOL"] + "'");
// Good! The database now treats the parameter as raw text, never as executable SQL.
SqlCommand foo = new SqlCommand("SELECT FOO FROM BAR WHERE LOL = @LOL");
foo.Parameters.AddWithValue("@LOL", Request.QueryString["LOL"]);
This.
edit: MSDN's Patterns & Practices guide on preventing SQL injection attacks. Not a bad starting point.
I don't. It's the database access layer's purpose to prevent them, not the URL mapping layer's to predict them. Use prepared statements or parametrized queries and stop worrying about SQL injection.
I think it depends on what level you're looking to check/prevent SQL Injection at.
At the top level, you can use URLScan or some Apache Mods/Filters (somebody help me out here) to check the incoming URLs to the web server itself and immediately drop/ignore requests that match a certain pattern.
At the UI level, you can put some validators on the input fields that you give to a user and set maximum lengths for those fields. You can also whitelist certain values/patterns as needed; see the sketch below.
At the code level, you can use parametrized queries, as mentioned above, to make sure that string inputs go in as purely string inputs and don't attempt to execute T-SQL/PL-SQL commands.
You can do it at multiple levels; most of my stuff to date covers the second two, and I'm working with our server admins to get the top-layer stuff in place.
Is that more along the lines of what you want to know?
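A minimal sketch of the UI-level whitelist idea above (field name, pattern and length are invented; adjust per field): describe what a valid value looks like instead of enumerating dangerous fragments.

using System.Text.RegularExpressions;

public static class InputRules
{
    // Whitelist: accept only what a username may look like, rather than
    // blacklisting SQL fragments. A "real name" field, by contrast, would
    // have to allow apostrophes (think Fred O'Connor).
    private static readonly Regex UserName =
        new Regex(@"^[A-Za-z0-9_\-]{1,32}$", RegexOptions.Compiled);

    public static bool IsValidUserName(string input) =>
        input != null && UserName.IsMatch(input);
}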
There are several different ways to do a SQL injection attack, either via a query string or a form field. The best thing to do is to sanitize your input and ensure that you are only accepting valid data, instead of trying to detect and block things that might be bad.
What I don't understand is how terminating the request as soon as a SQL injection is detected in the URL would not be part of a defense.
(I'm not claiming this is the entire solution, just part of the defense.)
Every database has its own extensions to SQL. You'd have to understand the syntax deeply and block possible attacks for the various types of query. Do you understand the rules for interactions between comments, escaped characters, quotes, etc. for your database? Probably not.
Looking for fixed strings is fragile. In your example, you block cast(0x, but what if the attacker uses CAST (0x? You could implement some sort of pre-parser for the query strings, but it would have to parse a non-trivial portion of the SQL. SQL is notoriously difficult to parse.
It muddies up the URL dispatch, view, and database layers. Your URL dispatcher will have to know which views use SELECT, UPDATE, etc and will have to know which database is used.
It requires active updating of the URL scanner. Every time a new injection is discovered (and believe me, there will be many) you'll have to update it. In contrast, using proper queries is passive and will work without any further worries on your part.
You'll have to be careful that the scanner never blocks legitimate URLs. Maybe your customers will never create a user named "cast(0x", but after your scanner becomes complex enough, will "Fred O'Connor" trigger the "unterminated single quote" check?
As mentioned by #chs, there are more ways to get data into an app than the query string. Are you prepared to test every view that can be POSTed to? Every form submission and database field?
Thanks for the answers and links. Incidentally, I was already using parameterized queries, and that's why the attack was an "attempted" attack and not a successful one. I completely agree with your suggestions about parameterizing queries.
The MSDN link posted above mentions "constraining the input" as part of the approach, which is part of my current strategy. It also mentions that a drawback of this approach is that you may miss some input that is dangerous.
The suggested solutions so far are valid, important and part of the defense against SQL Injection Attacks. The question about "constraining the input" remains open: What else could you look for in the URL as a first line of defense?
What else could you look for in the URL as a first line of defense?
Nothing. There is no defense to be found in scanning URLs for dangerous strings.
Nothing. There is no defense to be found in scanning URLs for dangerous strings.
#John - can you elaborate?
What I don't understand is how terminating the request as soon as a SQL injection is detected in the URL would not be part of a defense.
(I'm not claiming this is the entire solution, just part of the defense.)