SQL to HQL query in Grails for a very large association - sql

In my app there's the following relation: Page hasMany Paragraphs, and I need to create a query that returns all pages where the number of paragraphs is less than a limit. The problem is that the pages are created in another app at approximately 2 per second, and the paragraphs table contains more than 2 million rows. All standard Grails approaches, like dynamic finders and criteria queries, just hang because they generate very suboptimal SQL. In the database console the following query does the job:
select * from (
select a.id, count(b.page_id) count from page a
left join paragraph b ON a.id = b.page_id
group by 1) sub
WHERE sub.count <= 10 LIMIT 1000
And I couldn't translate this query into HQL. I know Groovy SQL is available, but its rows method returns a List of GroovyRowResult, not a list of domain classes. Is there a better approach to the issue?

If a query gets too complicated I tend to do something like this:
def results = new Sql(dataSource).rows(SQL)*.id*.asType(Integer).collect(DomainClass.&get)
I know it doesn't look too great and you'd probably get no kudos for it, but it gets the job done.
However, if you'd like to use something more expressive, you could give jOOQ a try (http://www.jooq.org/).
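The two-step idea in this answer (let raw SQL find the ids, then load full objects by id) can be sketched outside Grails as well. The following is only an illustration in Python with an in-memory SQLite database standing in for the real one; the table layout, the helper names page_ids_with_at_most and load_pages, and the sample data are assumptions, not part of the original app. It uses HAVING instead of the outer subquery, which should be equivalent for this purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE paragraph (id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES page(id));
    CREATE INDEX paragraph_page_idx ON paragraph (page_id);
""")
conn.executemany("INSERT INTO page (id, title) VALUES (?, ?)",
                 [(1, "empty"), (2, "short"), (3, "long")])
# Page 1 has 0 paragraphs, page 2 has 2, page 3 has 5.
conn.executemany("INSERT INTO paragraph (page_id) VALUES (?)",
                 [(2,)] * 2 + [(3,)] * 5)


def page_ids_with_at_most(conn, max_paragraphs, batch_size):
    """Step 1: let the database do the counting and return ids only."""
    rows = conn.execute(
        """
        SELECT p.id
        FROM page p
        LEFT JOIN paragraph g ON g.page_id = p.id
        GROUP BY p.id
        HAVING COUNT(g.page_id) <= ?
        ORDER BY p.id
        LIMIT ?
        """,
        (max_paragraphs, batch_size),
    ).fetchall()
    return [r[0] for r in rows]


def load_pages(conn, ids):
    """Step 2: fetch full rows by primary key (the role DomainClass.get plays)."""
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, title FROM page WHERE id IN ({marks}) ORDER BY id", ids
    ).fetchall()


ids = page_ids_with_at_most(conn, 2, 1000)
pages = load_pages(conn, ids)
print(pages)
```

Loading all ids in one IN query, rather than one get per id, avoids a round trip per row; whether that matters depends on the batch size.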

Related

How to paginate results in Legacy SQL

We are using Legacy SQL on a specific request. We can't use standard SQL for some internal reasons.
We would like to paginate our results, because we have a lot of rows. Like this:
SELECT ... FROM ... LIMIT 10000 OFFSET 30000 // In standard SQL
But in Legacy SQL, OFFSET doesn't exist. So how can we do the same job?
Edit :
I don't want to order. I want to paginate. For example get 1000 rows after skipping 2000 rows. A simple LIMIT clause with an offset, like in traditional SQL Database or like in BigQuery Standard SQL.
To do this, I want to use Big Query Legacy SQL.
The pagination you are talking about is done via the tabledata.list API.
Based on your question and follow-up comments, it might be the way for you to go, even though it does not involve querying: just the API or the related method in the client of your choice.
The pageToken parameter allows you to page through the result.
Btw, another benefit of this approach: it is free of charge.
If you still need to do pagination via a query, your option is using ROW_NUMBER().
In this case, you can prepare your data in a temp table with the query below:
SELECT <needed fields>, ROW_NUMBER() OVER() num
FROM `project.dataset.table`
Then, you can page it using num
SELECT <needed fields>
FROM `project.dataset.temp`
WHERE num BETWEEN 10000 AND 30000
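To see the ROW_NUMBER() paging idea in action without a BigQuery project, here is a small sketch in Python against SQLite (window functions need SQLite 3.25 or newer). The events table, the numbered temp table and the fetch_page helper are made up for the illustration. It numbers rows with ORDER BY id so that pages are repeatable; with an empty OVER() the numbering is not guaranteed to be stable between runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 51)])

# Materialise the numbered rows once, like the temp table in the answer.
conn.execute("""
    CREATE TEMP TABLE numbered AS
    SELECT id, payload, ROW_NUMBER() OVER (ORDER BY id) AS num
    FROM events
""")


def fetch_page(conn, skip, take):
    """Return `take` rows after skipping `skip` rows, using num instead of OFFSET."""
    return conn.execute(
        "SELECT id, payload FROM numbered WHERE num > ? AND num <= ? ORDER BY num",
        (skip, skip + take),
    ).fetchall()


page = fetch_page(conn, 20, 10)
print([row[0] for row in page])
```

Note that BETWEEN is inclusive at both ends, so half-open bounds (num > skip AND num <= skip + take) avoid returning one row too many per page.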

SQL selecting results from multiple tables

Hello I want to display results from unrelated tables where a text string exists in a column which is common to all tables in the database.
I can get the desired result with this:
SELECT *
FROM Table1
WHERE Title LIKE '%Text%'
UNION
SELECT *
FROM Table2
WHERE Title LIKE '%Text%'
However, my question is: is there a more efficient way to go about this, as I need to search dozens of tables? Thanks for any help you can give!
P.S. The system I am using supports most dialects, but I would prefer to keep it simple with SQL Server as that is what I am used to.
There is a SP script you can find online called SearchAllTables (http://vyaskn.tripod.com/search_all_columns_in_all_tables.htm).
When you call it, pass in the string; it will return the tables and columns as well as the full string.
You can modify it to work with other datatypes quite easily. It's a fantastic resource for tasks exactly like yours.
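If only one known column (Title) needs searching, a lighter alternative to a search-everything script is to read the catalog and generate the UNION query instead of typing it by hand. The sketch below shows the idea in Python with SQLite as a stand-in: it uses sqlite_master and PRAGMA table_info, where SQL Server would use something like INFORMATION_SCHEMA.COLUMNS. The sample tables and the helper names tables_with_column and search_all are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Title TEXT);
    CREATE TABLE Table2 (Id INTEGER, Title TEXT, Extra TEXT);
    CREATE TABLE NoTitle (Id INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES (1, 'Some Text here'), (2, 'nothing');
    INSERT INTO Table2 VALUES (7, 'More Text', 'x');
    INSERT INTO NoTitle VALUES (9, 'Text');
""")


def tables_with_column(conn, column):
    """List user tables that contain the given column (catalog lookup)."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    found = []
    for name in names:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{name}")')]
        if column in cols:
            found.append(name)
    return found


def search_all(conn, column, needle):
    """Build one UNION ALL query over every matching table and run it."""
    tables = tables_with_column(conn, column)
    if not tables:
        return []
    # Identifiers come from the catalog; the search text is always bound.
    template = ("SELECT '{t}' AS source, \"{c}\" AS value "
                "FROM \"{t}\" WHERE \"{c}\" LIKE ?")
    parts = [template.format(t=t, c=column) for t in tables]
    sql = " UNION ALL ".join(parts) + " ORDER BY source, value"
    return conn.execute(sql, [f"%{needle}%"] * len(tables)).fetchall()


hits = search_all(conn, "Title", "Text")
print(hits)
```

UNION ALL is used because the source column already makes every row distinct, so the duplicate-removal pass of UNION would be wasted work. A leading-wildcard LIKE still scans each table, so this saves typing rather than I/O.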

How to create a faceted search with SQL Server

I have an application that will query SQL Server for data filtered by selections made in the application, as in any common faceted search. I did see some out-of-the-box solutions, but these are expensive and I would prefer to build something custom; I just don't know where to start.
The database structure is like this:
The data from the PRODUCT table would be searched by tags from the TAG table. Values which would be found in the TAG table would be something like this:
ID NAME
----------------------
1 Blue
2 Green
3 Small
4 Large
5 Red
They would be related to products through the ProductTag table.
I would need to return two groups of data from this setup:
The Products that are only related to the Tags selected, whether single or multiple
The Remaining tags that are also available to select for the products which have already been refined by single or multiple selected tags.
I would like this to be done entirely within SQL Server if possible, as 2 separate stored procedures.
Most websites have this feature built into it these days, ie: http://www.gnc.com/family/index.jsp?categoryId=2108294&cp=3593186.3593187 (They've called it 'Narrow By')
I have been searching for a while how to do this, and I'm taking a wild guess that if a stored procedure has to be created in this nature, that there would need to be 1 param that accepts CSV values, like this:
[dbo].[GetFacetedProducts] @Tags_Selected = '1,3,5'
[dbo].[GetFacetedTags] @Tags_Selected = '1,3,5'
So with this architecture, does anyone know what types of queries need to be written for these stored procedures, or is the architecture flawed in any way? Has anyone created a faceted search like this before? If so, what types of queries would be needed to make something like this? I guess I'm just having trouble wrapping my head around it, and there isn't much out there that shows how to make something like this.
An RDBMS is the wrong tool for faceted searching. Faceted searching is a multidimensional search, which is difficult to express in the set-based SQL language. Using a data cube or the like might give you some of the desired functionality, but would be quite a bit of work to build.
When we were faced with similar requirements we ultimately decided to utilize the Apache Solr search engine, which supports faceting as well as many other search-oriented functions and features.
It is possible to do faceted search in SQL Server. However don't try to use your live product data tables. Instead create a de-normalised "fact" table which holds every product (rows) and every tag (columns) so that the intersection is your product-tag values. You can re-populate this periodically from your main product table.
It is then straightforward and relatively efficient to get the facet counts for the matching records for each tag the user checks.
The approach I have described will be perfectly good for small cases, e.g. 1,000 product rows and 50-100 tags (attributes). Also there is an interesting opportunity with the forthcoming SQL Server 2014, which can place tables in memory - that should allow much larger fact tables.
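As a rough picture of what such a de-normalised table and its facet counts could look like, here is a sketch in Python with SQLite standing in for SQL Server. The ProductFacts layout (one 0/1 column per tag), the sample rows and the facet_counts helper are all assumptions made for the example, not a description of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per product, one 0/1 column per tag (hypothetical layout).
    CREATE TABLE ProductFacts (
        ProductID INTEGER PRIMARY KEY,
        Blue INTEGER, Green INTEGER, Small INTEGER, Large INTEGER, Red INTEGER
    );
    INSERT INTO ProductFacts VALUES
        (1, 1, 0, 1, 0, 0),
        (2, 1, 0, 0, 1, 0),
        (3, 0, 1, 1, 0, 0),
        (4, 1, 0, 1, 0, 1);
""")

TAG_COLUMNS = ("Blue", "Green", "Small", "Large", "Red")


def facet_counts(conn, selected):
    """Count, per tag, the products that carry every selected tag."""
    unknown = set(selected) - set(TAG_COLUMNS)
    if unknown:
        # Column names are interpolated, so only known names are accepted.
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    sums = ", ".join(f"COALESCE(SUM({c}), 0)" for c in TAG_COLUMNS)
    where = " AND ".join(f"{c} = 1" for c in selected) or "1 = 1"
    row = conn.execute(f"SELECT {sums} FROM ProductFacts WHERE {where}").fetchone()
    return dict(zip(TAG_COLUMNS, row))


counts = facet_counts(conn, ["Blue"])
print(counts)
```

A single pass over the fact table yields every facet count at once, which is what makes this layout convenient; the cost is that adding a tag means adding a column and repopulating.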
I have also used Solr, and as STW points out this is the "correct" tool for facet searches. It is orders of magnitude faster than a SQL Server solution.
However, there are some major disadvantages to using Solr. The main issue is that you have to set up not only another platform (Solr) but also all the paraphernalia that goes with it: Java and some kind of Java servlet container (of which there are several). And whilst Solr runs on Windows quite nicely, you will still soon find yourself immersed in a world of command lines and editing of configuration files and environment variables that will remind you of all that was great about the 1980s ... or possibly not. And when that is all working, you then need to export your product data to it, using various methods: there is a SQL Server connector which works fairly well, but many prefer to post data to it as XML. And then you have to create a webservice-type process in your application to send it the user's query and parse the resulting list of matches and counts back into your application (again, XML is probably the best method).
So if your dataset is relatively small, I would stick with SQL Server. You can still get a sub-second response, and SQL 2014 will hopefully allow much bigger datasets. If your dataset is big then Solr will give remarkably fast results (it really is very fast) but be prepared to make a major investment in learning and supporting a whole new platform.
There are other places where you can get examples of turning a CSV parameter into a table variable. Assuming you have done that part, your query boils down to the following:
GetFacetedProducts:
Find Product records where all tags passed in are assigned to each product.
If you wrote it by hand you could end up with:
SELECT P.*
FROM Product P
INNER JOIN ProductTag PT1 ON PT1.ProductID = P.ID AND PT1.TagID = 1
INNER JOIN ProductTag PT2 ON PT2.ProductID = P.ID AND PT2.TagID = 3
INNER JOIN ProductTag PT3 ON PT3.ProductID = P.ID AND PT3.TagID = 5
While this does select only the products that have those tags, it is not going to work with a dynamic list. In the past some people have built up the SQL and executed it dynamically; don't do that.
Instead, let's assume that the same tag can't be applied to a product twice, so we could change our question to:
Find me products where the number of tags matching (dynamic list) is equal to the number of tags in (dynamic list)
DECLARE @selectedTags TABLE (ID int)
DECLARE @tagCount int
INSERT INTO @selectedTags VALUES (1)
INSERT INTO @selectedTags VALUES (3)
INSERT INTO @selectedTags VALUES (5)
SELECT @tagCount = COUNT(*) FROM @selectedTags
SELECT
    P.ID
FROM Product P
JOIN ProductTag PT
    ON PT.ProductID = P.ID
JOIN @selectedTags T
    ON T.ID = PT.TagID
GROUP BY
    P.ID,
    P.Name
HAVING COUNT(PT.TagID) = @tagCount
This returns just the IDs of products that match all your tags. You could then join this back to the products table if you want more than just an ID; otherwise you're done.
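For anyone who wants to try the "count of matching tags equals count of selected tags" idea end to end, here is a sketch in Python with SQLite. A temp table plays the role of the selected-tags table variable, and the CSV string is split on the client rather than inside a stored procedure; the sample data and the names parse_tag_csv and products_with_all_tags are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductTag (ProductID INTEGER, TagID INTEGER,
                             PRIMARY KEY (ProductID, TagID));
    INSERT INTO Product VALUES (10, 'Shirt'), (11, 'Hat'), (12, 'Sock');
    INSERT INTO ProductTag VALUES (10, 1), (10, 3), (10, 5), (11, 1), (11, 3), (12, 5);
""")


def parse_tag_csv(csv):
    """Turn '1,3,5' into a sorted list of distinct ints."""
    return sorted({int(part) for part in csv.split(",") if part.strip()})


def products_with_all_tags(conn, csv):
    """Return IDs of products that carry every tag in the CSV list."""
    tags = parse_tag_csv(csv)
    if not tags:
        return []
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS selectedTags (ID INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM selectedTags")
    conn.executemany("INSERT INTO selectedTags (ID) VALUES (?)", [(t,) for t in tags])
    rows = conn.execute("""
        SELECT P.ID
        FROM Product P
        JOIN ProductTag PT ON PT.ProductID = P.ID
        JOIN selectedTags T ON T.ID = PT.TagID
        GROUP BY P.ID
        HAVING COUNT(PT.TagID) = (SELECT COUNT(*) FROM selectedTags)
        ORDER BY P.ID
    """).fetchall()
    return [r[0] for r in rows]


matches = products_with_all_tags(conn, "1,3,5")
print(matches)
```

The primary key on (ProductID, TagID) enforces the assumption that a tag cannot be applied to the same product twice; without it the COUNT comparison could be fooled by duplicates. De-duplicating the input list matters for the same reason.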
As for your second query, once you have the product IDs that match, you want a list of all tags for those product IDs that aren't in your list:
SELECT DISTINCT
    PT2.TagID
FROM ProductTag PT2
WHERE PT2.ProductID IN (
    SELECT
        P.ID
    FROM Product P
    JOIN ProductTag PT
        ON PT.ProductID = P.ID
    JOIN @selectedTags T
        ON T.ID = PT.TagID
    GROUP BY
        P.ID,
        P.Name
    HAVING COUNT(PT.TagID) = @tagCount
)
AND PT2.TagID NOT IN (SELECT ID FROM @selectedTags)
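Faceted UIs usually show a count next to each remaining tag, so a small variation on the second query is sketched here in Python with SQLite. It passes the selected tags as bound IN parameters instead of a table variable, purely to keep the example short; the sample data and the remaining_tags helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTag (ProductID INTEGER, TagID INTEGER,
                             PRIMARY KEY (ProductID, TagID));
    INSERT INTO ProductTag VALUES
        (10, 1), (10, 3), (10, 5),
        (11, 1), (11, 3),
        (12, 5),
        (13, 1), (13, 4);
""")


def remaining_tags(conn, selected):
    """Tags still available on products that carry every selected tag,
    with the number of such products per tag."""
    selected = sorted(set(selected))
    if not selected:
        # Nothing selected yet: every tag is available.
        return conn.execute(
            "SELECT TagID, COUNT(*) FROM ProductTag GROUP BY TagID ORDER BY TagID"
        ).fetchall()
    marks = ",".join("?" * len(selected))
    sql = f"""
        SELECT PT2.TagID, COUNT(*) AS Products
        FROM ProductTag PT2
        WHERE PT2.ProductID IN (
            SELECT PT.ProductID
            FROM ProductTag PT
            WHERE PT.TagID IN ({marks})
            GROUP BY PT.ProductID
            HAVING COUNT(*) = ?
        )
        AND PT2.TagID NOT IN ({marks})
        GROUP BY PT2.TagID
        ORDER BY PT2.TagID
    """
    # Parameters follow the order of the placeholders in the SQL text.
    return conn.execute(sql, [*selected, len(selected), *selected]).fetchall()


facets = remaining_tags(conn, [1])
print(facets)
```

Only the placeholders are generated; the tag values themselves are always bound, so this is not the string-built dynamic SQL the answer warns against.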

Search - Order By Keywords

Is there any way to do something like this:
var keywords = SearchUtilities.FindKeyWords(q);
var j = (from p in _dataContext.Jobs
         orderby p.JobKeywords.Select(jobKeyword => jobKeyword.Keyword)
                  .Intersect(keywords).Count()
         select p)
        .Take(10).AsEnumerable();
The main idea here is to order search results by the count of keywords that exist both in the search query and in the keywords that are associated with the jobs.
I don't want to bring all the records over from SQL-land first and then order them, because it's very slow. When I try that code it throws:
Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator.
Ideas?
If you are looking for better search performance, I suggest using a stored procedure. In my opinion T-SQL is faster than LINQ here, because the LINQ approach ends up fetching all the rows, whereas a stored procedure (T-SQL) does not have to return every record.
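Whichever way the query is issued, the ranking itself can be done on the server with a join, an IN list and a GROUP BY. The error message in the question says Contains is the one operator that accepts a local sequence, and an IN list is the SQL shape that corresponds to it. Below is a sketch of that SQL, run from Python against SQLite; the table layout, the sample data and the top_jobs helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE JobKeywords (JobID INTEGER, Keyword TEXT);
    INSERT INTO Jobs VALUES (1, 'Backend dev'), (2, 'DBA'), (3, 'Designer');
    INSERT INTO JobKeywords VALUES
        (1, 'sql'), (1, 'python'), (1, 'linux'),
        (2, 'sql'), (2, 'backup'),
        (3, 'css');
""")


def top_jobs(conn, keywords, limit=10):
    """Rank jobs by how many of the search keywords they carry."""
    keywords = sorted(set(keywords))
    if not keywords:
        return []
    marks = ",".join("?" * len(keywords))
    sql = f"""
        SELECT J.JobID, J.Title, COUNT(DISTINCT K.Keyword) AS Matches
        FROM Jobs J
        JOIN JobKeywords K ON K.JobID = J.JobID
        WHERE K.Keyword IN ({marks})
        GROUP BY J.JobID, J.Title
        ORDER BY Matches DESC, J.JobID
        LIMIT ?
    """
    return conn.execute(sql, [*keywords, limit]).fetchall()


ranked = top_jobs(conn, ["sql", "python", "go"])
print(ranked)
```

Jobs with no matching keyword drop out because of the inner join; a LEFT JOIN with the keyword filter moved into the ON clause would keep them with a count of zero.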

Building Query from Multi-Selection Criteria

I am wondering how others would handle a scenario like this:
Say I have multiple choices for a user to choose from,
like Color, Size, Make, Model, etc.
What is the best solution or practice for handling the build of your query for this scenario?
So what if they select 6 of the 8 possible colors, 4 of the possible 7 makes, and 8 of the 12 possible brands?
You could do dynamic OR statements or dynamic IN Statements, but I am trying to figure out if there is a better solution for handling this "WHERE" criteria type logic?
EDIT:
I am getting some really good feedback (thanks everyone)...one other thing to note is that some of the selections could even be like (40 of the selections out of the possible 46) so kind of large. Thanks again!
Thanks,
S
What I would suggest is creating a function that takes a delimited list of makeIds, colorIds, etc. (probably ints, or whatever your key type is) and splits it into a table for you.
Your SP will take in a list of makes, colors, etc as you've said above.
YourSP '1,4,7,11', '1,6,7', '6'....
Inside your SP you'll call your splitting function, which will return a table-
SELECT * FROM
Cars C
JOIN YourFunction(@models) YF ON YF.Id = C.ModelId
JOIN YourFunction(@colors) YF2 ON YF2.Id = C.ColorId
Then, if they select nothing they get nothing. If they select everything, they'll get everything.
What is the best solution or practice for handling the build of your query for this scenario?
Dynamic SQL.
A single optional parameter has two states: NULL/non-existent, or having a value. Each additional parameter doubles the number of possible combinations: 2 parameters yield 4, 3 yield 8, and so on (2^n for n parameters). A single, non-dynamic query can contain all the possibilities but will perform horribly because of:
ORs
overall non-sargability
and inability to reuse the query plan
...when compared to a dynamic SQL query that constructs the query out of only the absolutely necessary parts.
The query plan is cached in SQL Server 2005+, if you use the sp_executesql command - it is not if you only use EXEC.
I highly recommend reading The Curse and Blessing of Dynamic SQL.
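To make "only the absolutely necessary parts" concrete, here is a sketch of a query builder in Python with SQLite. It adds a clause only for criteria the user actually supplied and binds every value as a parameter; column names come from a fixed whitelist. The Cars layout, the ALLOWED mapping and the build_query helper are assumptions for the illustration, and in SQL Server the resulting text would be run through sp_executesql rather than from client code like this.

```python
import sqlite3

# Filter name -> column name. Only whitelisted columns ever reach the SQL text.
ALLOWED = {"color": "ColorId", "make": "MakeId", "model": "ModelId"}


def build_query(filters):
    """Return (sql, params) containing a clause only for supplied criteria."""
    clauses, params = [], []
    for name, values in filters.items():
        if not values:
            continue  # criterion not supplied: leave it out entirely
        column = ALLOWED[name]  # raises KeyError for unknown filter names
        marks = ",".join("?" * len(values))
        clauses.append(f"{column} IN ({marks})")
        params.extend(values)
    sql = "SELECT Id FROM Cars"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY Id", params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars (Id INTEGER PRIMARY KEY, ColorId INTEGER, MakeId INTEGER, ModelId INTEGER);
    INSERT INTO Cars VALUES (1, 1, 1, 1), (2, 2, 1, 2), (3, 1, 2, 3);
""")

sql, params = build_query({"color": [1], "make": []})
ids = [r[0] for r in conn.execute(sql, params)]
print(sql, ids)
```

Because unused criteria never appear in the statement, the optimizer sees a plain conjunction of IN lists instead of a catch-all predicate full of ORs.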
For something this complex, you may want a session table that you update when the user selects their criteria. Then you can join the session table to your items table.
This solution may not scale well to thousands of users, so be careful.
If you want to create dynamic SQL it won't matter if you use the OR approach or the IN approach. SQL Server will process the statements the same way (maybe with little variation in some situations.)
You may also consider using temp tables for this scenario. You can insert the selections for each criterion into temp tables (e.g., #tmpColor, #tmpSize, #tmpMake, etc.). Then you can create a non-dynamic SELECT statement. Something like the following may work:
SELECT <column list>
FROM MyTable
WHERE MyTable.ColorID in (SELECT ColorID FROM #tmpColor)
OR MyTable.SizeID in (SELECT SizeID FROM #tmpSize)
OR MyTable.MakeID in (SELECT MakeID FROM #tmpMake)
The dynamic OR/IN and the temp table solutions work fine if each condition is independent of the other conditions. In other words, if you need to select rows where ((Color is Red and Size is Medium) or (Color is Green and Size is Large)) you'll need to try other solutions.
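One possible "other solution" for dependent conditions, offered only as a sketch, is to store the allowed combinations as rows and join on both columns at once. The example below uses Python with SQLite; the Items table, the tmpPair temp table and the items_matching_pairs helper are assumptions, and the independent query is included just to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (Id INTEGER PRIMARY KEY, Color TEXT, Size TEXT);
    INSERT INTO Items VALUES
        (1, 'Red', 'Medium'), (2, 'Red', 'Large'),
        (3, 'Green', 'Large'), (4, 'Green', 'Medium');
""")


def items_matching_pairs(conn, pairs):
    """Return items whose (Color, Size) equals one of the allowed pairs."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS tmpPair (Color TEXT, Size TEXT)")
    conn.execute("DELETE FROM tmpPair")
    conn.executemany("INSERT INTO tmpPair VALUES (?, ?)", pairs)
    rows = conn.execute("""
        SELECT DISTINCT I.Id
        FROM Items I
        JOIN tmpPair P ON P.Color = I.Color AND P.Size = I.Size
        ORDER BY I.Id
    """).fetchall()
    return [r[0] for r in rows]


# Independent filters accept every color/size combination...
independent = [r[0] for r in conn.execute(
    "SELECT Id FROM Items WHERE Color IN ('Red', 'Green') "
    "AND Size IN ('Medium', 'Large') ORDER BY Id")]

# ...while the pair table keeps only the combinations that were asked for.
paired = items_matching_pairs(conn, [("Red", "Medium"), ("Green", "Large")])
print(independent, paired)
```

The join treats each row of the pair table as one AND group and the set of rows as the OR between groups, which is the ((Red and Medium) or (Green and Large)) shape described above.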