Search - Order By Keywords - sql

Is there any way to do something like this:
var keywords = SearchUtilities.FindKeyWords(q);
var j = (from p in _dataContext.Jobs
         orderby p.JobKeywords.Select(jobKeyword => jobKeyword.Keyword)
                              .Intersect(keywords).Count() descending
         select p)
        .Take(10).AsEnumerable();
The main idea here is to order the search results by the count of keywords that exist both in the search query and in the keywords associated with the jobs.
I don't want to bring all the records over from SQL-land first and then order them in memory, because that's very slow. When I try that code it throws:
Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator.
Ideas?

If you are looking for more search performance, I suggest you use a stored procedure. In my opinion T-SQL works faster than LINQ here, because this LINQ query ends up pulling all the results, while a stored procedure (T-SQL) doesn't have to fetch every record.
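A minimal T-SQL sketch of that stored-procedure route, assuming hypothetical Jobs(Id, ...) and JobKeywords(JobId, Keyword) tables and a table type to carry the parsed search keywords:
-- Table type holding the keywords parsed from the search query
CREATE TYPE KeywordList AS TABLE (Keyword nvarchar(100));
GO
CREATE PROCEDURE SearchJobsByKeywords
    @Keywords KeywordList READONLY
AS
BEGIN
    -- Rank jobs by how many of their keywords appear in the search query
    SELECT TOP (10) j.*
    FROM Jobs AS j
    ORDER BY (SELECT COUNT(*)
              FROM JobKeywords AS jk
              JOIN @Keywords AS k ON k.Keyword = jk.Keyword
              WHERE jk.JobId = j.Id) DESC;
END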

Related

Sql to Hql query in grails for a very large association

In my app there's the following relation: Page hasMany Paragraphs, and I need to create a query that returns all pages where the number of paragraphs is less than a limit. The problem is that the pages are created in another app at roughly 2 per second, and the paragraphs table contains more than 2 million rows. All the standard Grails approaches, like dynamic finders and criteria queries, just hang, as they generate very suboptimal SQL. In the database console the following query does the job:
select * from (
select a.id, count(b.page_id) count from page a
left join paragraph b ON a.id = b.page_id
group by 1) sub
WHERE sub.count <= 10 LIMIT 1000
And I couldn't translate this query into HQL. I know there's Groovy SQL available, but its rows method returns a list of GroovyRowResult objects, not a list of domain classes. Is there a better approach to the issue?
If a query gets too complicated I tend to do something like this:
def results = new Sql(dataSource).rows(SQL)*.id*.asType(Integer).collect(DomainClass.&get)
I know it doesn't look too great and you'd probably get no kudos for it, but it gets the job done.
However, if you'd like to use something more expressive, you could give jOOQ (http://www.jooq.org/) a try.
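As an aside, the subquery in the original SQL can usually be folded into a HAVING clause, which tends to be easier to express in HQL or a criteria query; a sketch against the same page/paragraph schema:
SELECT a.id, COUNT(b.page_id) AS paragraph_count
FROM page a
LEFT JOIN paragraph b ON a.id = b.page_id
GROUP BY a.id
HAVING COUNT(b.page_id) <= 10
LIMIT 1000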

Parameterizing 'limit' and 'order' in sqlite3

I have a sqlite query that I'm looking to parameterize, to avoid the bad SQL injection things on the internet...
So things like:
Select * From myTable Where id = $id
are fine if I have $id defined somewhere and pass that as a parameter to my db calls.
parameters.$id = 150;
db.all(myQuery, parameters, function (err, rows) {
results = rows;
});
I wonder if I need to go out of my way to also parameterize things that are sorted and paginated (both are inputs that users can give)...
I tried to do something like:
var sorter = JSON.parse(value);
parameters.$sortMethod = sorter.method;
parameters.$sortOrder = sorter.order;
sort_filter += 'ORDER BY $sortMethod $sortOrder';
No dice though. I'm guessing sqlite3 just doesn't let you parameterize things in ORDER BY, LIMIT, and OFFSET. I thought there might be something really sneaky folks could do by ending a sqlite statement prematurely in the ORDER BY and then starting a new malicious statement, but maybe SQLite3 only lets you execute one statement at a time (http://www.qtcentre.org/threads/54748-Execute-multiple-sql-command-in-SQLITE3).
Should I not worry about parameterizing things in ORDER BY, LIMIT, and OFFSET? For reference, I'm running this on node.js with this sqlite library: https://github.com/mapbox/node-sqlite3
Thanks much in advance!
SQLite (and any other database) allows you to parameterize expressions, that is, any numbers, strings, blobs, or NULL values that appear in a statement.
This includes the values in the LIMIT/OFFSET clauses.
Anything else cannot be parameterized.
This would be table and column names, operators, or any other keyword (like SELECT, ORDER BY, or ASC).
If you need to change any parts of your SQL statements that are not expressions, you have to create the statement on the fly.
(There is no danger of SQL injection as long as your code constructs the statement by itself, not using any unchecked user data.)
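To make the boundary concrete, a sketch against the myTable example above (the parameter and column names are illustrative):
-- Values in expressions bind fine, including LIMIT/OFFSET:
SELECT * FROM myTable WHERE id = $id ORDER BY name LIMIT $limit OFFSET $offset;

-- Identifiers and keywords do not bind; a placeholder after ORDER BY is
-- treated as an expression (a constant), not as a column name or ASC/DESC:
-- SELECT * FROM myTable ORDER BY $sortMethod $sortOrder;  -- will not work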

SQL, how to pick portion of list of items via SELECT, WHERE clauses?

I'm passing three parameters to the URL: &p1=eventID, &p2=firstItem, &p3=numberOfItems. The first parameter is a column of the table. The second parameter is the first item that I'm looking for. The third parameter says how many items to pick after firstItem.
For example, in the first query the user sends &p1=1, &p2=0, &p3=20. Therefore, I need the first 20 items of the list.
Next time, if the user sends &p1=1, &p2=21, &p3=20, then I should return the second 20 items (from item 21 to item 41).
The PHP file is able to get the parameters. Currently, I'm using the following string to query the database:
public function getPlaceList($eventId, $firstItem, $numberOfItems) {
$records = array();
$query = "select * from {$this->TB_ACT_PLACES} where eventid={$eventId}";
...
}
The result is a long list of items. If I add LIMIT 20 at the end of the string, the result is always the same, since I'm not using any token/offset.
How do I change the query in a way that meets my requirement?
Any suggestion would be appreciated. Thanks.
=> Update:
I can get the whole list of items and then select from the first item to the last via a for(;;) loop. But I want to know whether it's possible to do something similar via SQL? I guess that would be more efficient.
I would construct the final query to be like this:
SELECT <column list> /* never use "select *" in production! */
FROM Table
WHERE p1ColumnName >= p2FirstItem
ORDER BY p1ColumnName
LIMIT p3NumberOfItems
The ORDER BY is important; according to my reading of the documentation, PostgreSQL won't guarantee which rows you get without it. I know SQL Server works much the same way.
Beyond this, I'm not sure how to build this query safely. You'll need to be very careful that your method for constructing the SQL statement does not leave you vulnerable to SQL injection attacks. At first glance, it looks like I could very easily put a malicious value into your column name URL parameter.
If you are using mysql, then use the LIMIT keyword.
SELECT *
FROM Table
LIMIT $firstItem, $numberOfItems
See the documentation here (search for LIMIT in the page).
I found my answer here: stackoverflow.com/a/8242764/513413
For others' reference, I changed my query based on what Joel and the above link said:
$query = "select places.title, places.summary, places.description, places.year, places.date from places where places.eventid = $eventId order by places.date limit $numberOfItems offset $firstItem";

Building Query from Multi-Selection Criteria

I am wondering how others would handle a scenario like such:
Say I have multiple choices for a user to choose from.
Like, Color, Size, Make, Model, etc.
What is the best solution or practice for handling the building of a query for this scenario?
So what if they select 6 of the 8 possible colors, 4 of the 7 possible makes, and 8 of the 12 possible brands?
You could do dynamic OR statements or dynamic IN Statements, but I am trying to figure out if there is a better solution for handling this "WHERE" criteria type logic?
EDIT:
I am getting some really good feedback (thanks everyone)...one other thing to note is that some of the selections could even be like (40 of the selections out of the possible 46) so kind of large. Thanks again!
Thanks,
S
What I would suggest is creating a function that takes in a delimited list of makeIds, colorIds, etc. (probably ints, or whatever your key type is) and splits them into a table for you.
Your SP will take in a list of makes, colors, etc as you've said above.
YourSP '1,4,7,11', '1,6,7', '6'....
Inside your SP you'll call your splitting function, which will return a table:
SELECT *
FROM Cars C
JOIN YourFunction(@models) YF ON YF.Id = C.ModelId
JOIN YourFunction(@colors) YF2 ON YF2.Id = C.ColorId
Then, if they select nothing they get nothing. If they select everything, they'll get everything.
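If you're on SQL Server 2016 or later, the built-in STRING_SPLIT can stand in for the hand-rolled splitting function; a sketch assuming the same Cars schema and comma-delimited parameters:
-- @models and @colors are the delimited lists passed into the SP
SELECT C.*
FROM Cars C
JOIN STRING_SPLIT(@models, ',') M ON M.value = C.ModelId
JOIN STRING_SPLIT(@colors, ',') CL ON CL.value = C.ColorId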
What is the best solution or practice for handling the build of your query for this scenario?
Dynamic SQL.
A single parameter represents two states: NULL/non-existent, or having a value. Each additional parameter doubles the number of total possibilities: 2 parameters yield 4, 3 yield 8, etc. A single, non-dynamic query can contain all the possibilities but will perform horribly between the use of:
ORs
overall non-sargability
and inability to reuse the query plan
...when compared to a dynamic SQL query that constructs the query out of only the absolutely necessary parts.
The query plan is cached in SQL Server 2005+ if you use the sp_executesql command; it is not if you only use EXEC.
I highly recommend reading The Curse and Blessing of Dynamic SQL.
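A minimal sketch of that approach, assuming two comma-delimited parameters, hypothetical Cars columns, and SQL Server 2016+ for STRING_SPLIT; sp_executesql keeps the statement parameterized so the plan can be reused:
DECLARE @sql nvarchar(max) = N'SELECT CarId, MakeId, ColorId FROM Cars WHERE 1 = 1';

-- Append only the predicates that are actually needed
IF @colors IS NOT NULL
    SET @sql += N' AND ColorId IN (SELECT value FROM STRING_SPLIT(@colors, '',''))';
IF @makes IS NOT NULL
    SET @sql += N' AND MakeId IN (SELECT value FROM STRING_SPLIT(@makes, '',''))';

EXEC sp_executesql @sql,
     N'@colors varchar(200), @makes varchar(200)',
     @colors = @colors, @makes = @makes;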
For something this complex, you may want a session table that you update when the user selects their criteria. Then you can join the session table to your items table.
This solution may not scale well to thousands of users, so be careful.
If you want to create dynamic SQL it won't matter if you use the OR approach or the IN approach. SQL Server will process the statements the same way (maybe with little variation in some situations.)
You may also consider using temp tables for this scenario. You can insert the selections for each criteria into temp tables (e.g., #tmpColor, #tmpSize, #tmpMake, etc.). Then you can create a non-dynamic SELECT statement. Something like the following may work:
SELECT <column list>
FROM MyTable
WHERE MyTable.ColorID in (SELECT ColorID FROM #tmpColor)
OR MyTable.SizeID in (SELECT SizeID FROM #tmpSize)
OR MyTable.MakeID in (SELECT MakeID FROM #tmpMake)
The dynamic OR/IN and the temp table solutions work fine if each condition is independent of the other conditions. In other words, if you need to select rows where ((Color is Red and Size is Medium) or (Color is Green and Size is Large)) you'll need to try other solutions.

How bad is my query?

OK, I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note the 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something when no parameters are used; when the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
To improve performance, use column indexes on the fields listed in the WHERE clause.
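For the question's my_table, that might look like the following (the filter column names are hypothetical):
CREATE INDEX idx_my_table_field1 ON my_table (Field1);
CREATE INDEX idx_my_table_field2 ON my_table (Field2);
CREATE INDEX idx_my_table_ordering ON my_table (ordering_fld);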
Standard SQL Injection Disclaimers here...
One thing you could do, to avoid SQL injection, since you know it's only four parameters, is use a stored procedure where you pass values for the fields or NULL. I am not sure of MySQL stored proc syntax, but the query would boil down to:
SELECT *
FROM my_table
WHERE Field1 = ISNULL(@Field1, Field1)
AND Field2 = ISNULL(@Field2, Field2)
...
ORDER BY ordering_fld
We were doing something similar not too long ago, and there are a few things we observed:
Setting up indexes on the columns we were (possibly) filtering improved performance.
The WHERE 1 part can be left out completely if the filters aren't used (not sure if that applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten.
Also, if you 'only' have 4 filters, you could build a stored procedure and pass in null values and check for them (just like n8wrl suggested in the meantime).
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad. I didn't know this snippet for getting rid of the 'is it the first filter?' question.
Though you should be ashamed of your code (^^), it doesn't do anything to performance, as any DB engine will optimize it away.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
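Applied to the question's four filters, that pattern would look something like this (the column names are placeholders):
SELECT <column list>
FROM my_table
WHERE (@field1 IS NULL OR field1 = @field1)
  AND (@field2 IS NULL OR field2 = @field2)
  AND (@field3 IS NULL OR field3 = @field3)
  AND (@field4 IS NULL OR field4 = @field4)
ORDER BY ordering_fld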
I have usually done something like this:
for (int i = 0; i < numConditions; i++) {
    sql += (i == 0 ? "WHERE " : "AND ");
    sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the WHERE clauses in an array and then join them together:
my @wherefields;
foreach $c (@conditionfields) {
    push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if (@wherefields) { $sql .= " WHERE " . join(" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.