Many to many relation, multi where clause on the same column and hibernate - sql

Sorry for the bad question title, I couldn't think of anything better.
Anyway, my tables are Tags - Poststags - Posts. Poststags is a junction table for the many-to-many relation. I need to select all posts with the given tags, and I don't know how many tags the user will choose to search for. One way I found to do this is in the code below; however, I would need to loop over all the tags given by the user and construct the query string from there, since the number of tags is unknown. That seems like a pretty bad solution to me.
Another solution would be to store all tags in one column in the Posts table as a plain string, but I don't want to do that because of other application requirements.
I have a working SQL query, since I was trying pure SQL before trying to implement it in Hibernate, but I don't like selecting all posts containing each tag and then joining the queries. Is there a way to specify the same column multiple times in the WHERE clause? Something along the lines of WHERE pt.tag_id = x AND pt.tag_id = y? (I know this won't work.) The IN operator won't work either, since it will give me posts that contain any of the supplied tags and not just the posts containing ALL of the supplied tags.
Also, how would I implement such a query in HQL (if subqueries like this are even supported)? Or can I somehow manage this via Criteria? Or do I have to resort to the createSQLQuery method of a Hibernate session?
SELECT * FROM
( SELECT * FROM posts p
inner join poststags pt on pt.post_id = p.id
WHERE pt.tag_id = 1 ) AS A
INNER JOIN
( SELECT * FROM posts p
inner join poststags pt on pt.post_id = p.id
WHERE pt.tag_id = 2 ) AS B ON A.id = B.id
And yes, I know this query is not returning the Post entity itself, but I can handle that later.

Don't use Hibernate or an ORM for this kind of complex select; it may work, but in a bad way.
Your use case should be solved by full-text search, which means each Post will need to have its own tags.
I don't see much value in making Tag an entity. It's just a string.
Full-text search could be heavy for the database. A better way is to use Elasticsearch to help. Spring has integration through spring-data-elasticsearch and it's not difficult to use. Elasticsearch is very powerful for free-text search.

Here is a solution that 'should' work using Criteria queries in Hibernate.
Assuming that you have an entity for Post and an entity for PostTag and PostTag has reference to Post (which I think it should given the example query that you provided), I believe that something like this should do what you want:
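import java.util.List;
import org.hibernate.Criteria;
import org.hibernate.criterion.DetachedCriteria;
import org.hibernate.criterion.Projections;
import org.hibernate.criterion.Property;
import org.hibernate.criterion.Restrictions;

// One sub-query per tag: selects the postId of every PostTag row carrying that tag.
// The index keeps the root alias of each sub-query unique.
static DetachedCriteria getPostTagCriteria(String tagString, int index)
{
    DetachedCriteria criteria = DetachedCriteria.forClass(PostTag.class, "postTag_" + index);
    criteria.createAlias("tag", "tag");
    criteria.add(Restrictions.eq("tag.tagString", tagString));
    criteria.setProjection(Projections.property("postId"));
    return criteria;
}

static List<Post> getPosts(List<String> tagStrings)
{
    Criteria criteria = getCurrentSession().createCriteria(Post.class, "post");
    // AND together one "post.id IN (sub-query)" restriction per requested tag.
    for (int i = 0; i < tagStrings.size(); i++)
    {
        criteria.add(Property.forName("post.id").in(getPostTagCriteria(tagStrings.get(i), i)));
    }
    @SuppressWarnings("unchecked")
    List<Post> ret = criteria.list();
    return ret;
}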
This assumes that you have reasonable entities to represent Post, PostTag and Tag and that they all reference each other in obvious parent/child sort of ways that I have completely made up here.
But, the general idea of creating multiple detached criteria objects based on your input should solve your problem. This solution also comes with the same caveats regarding SQL complexity mentioned above. You will be creating a sub-query for each tag passed in. So, depending on your indexes and table sizes, you may need to consider a different approach.
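To make the shape of the generated SQL concrete, here is a minimal sketch of the same "one sub-query per tag" idea, written in Python against an in-memory SQLite database rather than Hibernate. The schema, the sample rows and the function name posts_with_all_tags are assumptions made up for illustration; only the table names come from the question.

```python
import sqlite3

# Hypothetical schema mirroring the question's tables (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE poststags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tags VALUES (1, 'java'), (2, 'sql'), (3, 'hibernate');
    INSERT INTO poststags VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3), (3, 1);
""")

def posts_with_all_tags(conn, tag_names):
    """Return ids of posts carrying every tag in tag_names, one sub-query per tag."""
    subquery = (
        "p.id IN (SELECT pt.post_id FROM poststags pt "
        "JOIN tags t ON t.id = pt.tag_id WHERE t.name = ?)"
    )
    # With no tags there is nothing to restrict, so every post matches.
    where = " AND ".join([subquery] * len(tag_names)) or "1 = 1"
    sql = f"SELECT p.id FROM posts p WHERE {where} ORDER BY p.id"
    # Tag names are bound as parameters, never concatenated into the SQL text.
    return [row[0] for row in conn.execute(sql, list(tag_names))]

matches = posts_with_all_tags(conn, ["java", "sql"])
```

The query text grows with the number of tags, which is the complexity caveat mentioned above, but the values themselves stay parameterized.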

Related

ORMlite join and order by over three tables

I have a many to many relationship and am trying to order by the one side. So in SQL this would be:
select * from
patient join patientuserrelation on patient.id=patientuserrelation.p_id
join user on patientuserrelation.u_id=user.id
order by user.name
Which I have implemented in Ormlite as:
QueryBuilder<Visit, String> qbVisit = setupAccess(Visit.class)
.queryBuilder();
QueryBuilder<UserVisitRelation, String> qbUserVisitRelation = setupAccess(
UserVisitRelation.class).queryBuilder();
QueryBuilder<User, String> qbUser = setupAccess(User.class)
.queryBuilder();
qbUser.orderBy(sortByThisColumn, true);
qbUserVisitRelation.join(qbUser);
qbVisit.join(qbUserVisitRelation);
return qbVisit.distinct().query();
However, this does not work. The results are not ordered at all. I could try to use raw SQL and a RawRowMapper, but that would bloat my code.
There is a similar question here: ORMLITE order by a column from another table. Unfortunately with no answer. Is there a helpful expert around?
OK, to answer my own question for posterity: it seems that join and order across multiple tables is not supported in ORMLite 4.48. If you think about it for a while you figure out why this is probably the case. Anyway, the solution is to write a raw SQL statement, select only the necessary columns WITHOUT foreign collections, and map the rows to your object using RawRowMapper and GenericRawResults. Not what you like to do when using an ORM, but OK.
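The raw-SQL-plus-row-mapper pattern is not specific to ORMLite. As a rough, language-neutral illustration, the sketch below does the same thing in Python with SQLite: it runs the three-table join from the question, selects only the needed columns, and maps each row to a plain object. The PatientRow class, the column names and the sample data are all assumptions, and nothing here is ORMLite API.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class PatientRow:
    """Plain result object holding only the selected columns (hypothetical)."""
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patientuserrelation (p_id INTEGER, u_id INTEGER);
    INSERT INTO patient VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO user VALUES (1, 'zoe'), (2, 'adam');
    INSERT INTO patientuserrelation VALUES (1, 1), (2, 2);
""")

def patients_ordered_by_user(conn):
    """Join across the relation table and order by a column of the far table."""
    sql = """
        SELECT patient.id, patient.name
        FROM patient
        JOIN patientuserrelation ON patient.id = patientuserrelation.p_id
        JOIN user ON patientuserrelation.u_id = user.id
        ORDER BY user.name, patient.id
    """
    # The "row mapper" step: turn each raw tuple into a typed object.
    return [PatientRow(*row) for row in conn.execute(sql)]

rows = patients_ordered_by_user(conn)
```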

Rails ActiveRecord finding questions by tag in named scope

I want the equivalent of SO search by tag, so I need an exists query but I also still need to left join on all tags. I've tried a couple of approaches and I'm out of ideas.
The Question - Tag relationship is through has_and_belongs_to_many both ways (i.e. I have a QuestionTags join table)
e.g.
Question.joins(:tags).where('tags.name = ?', tag_name).includes(:tags)
I would expect this to do what I need but actually it just mashes up the includes with the join and I just end up with basically an inner join.
Question.includes(:tags)
.where("exists (
select 1 from questions_tags
where question_id = questions.id
and tag_id = (select id
from tags
where tags.name = ?))", tag_name)
This fetches the correct results but a) is really ugly and b) gives a deprecation warning as again it seems to confuse the includes with the join:
DEPRECATION WARNING: It looks like you are eager loading table(s) (one
of: questions, tags) that are referenced in a string SQL snippet. For
example:
Post.includes(:comments).where("comments.title = 'foo'")
Note I'm trying to write these as named scopes.
Let me know if the question isn't clear. Thanks in advance.
OK, got it. I know of no built-in syntax to do it. I have used an alternative before; you can do it like this:
Question.includes(:tags).where("questions.id IN (
#{ Question.joins(:tags).where('tags.name = ?', tag_name).select('questions.id').to_sql})")
You can also join this subquery to your questions table instead of using IN. Alternatively, if you are not against adding gems and you are using Postgres, use this gem.
It provides really neat syntax for advanced queries.
Use preload instead of includes:
Question.preload(:tags).where("exists ....

How to create a faceted search with SQL Server

I have an application that will access SQL Server to return data filtered by selections made in the application, as in any common faceted search. I did see some out-of-the-box solutions, but these are expensive and I would prefer to build something custom; I just don't know where to start.
The database structure is like this:
The data from the PRODUCT table would be searched by tags from the TAG table. Values which would be found in the TAG table would be something like this:
ID NAME
----------------------
1 Blue
2 Green
3 Small
4 Large
5 Red
They would be related to products through the ProductTag table.
I would need to return two groups of data from this setup:
The Products that are only related to the Tags selected, whether single or multiple
The Remaining tags that are also available to select for the products which have already been refined by single or multiple selected tags.
I would like this to be done entirely within SQL Server if possible, as 2 separate stored procedures.
Most websites have this feature built into it these days, ie: http://www.gnc.com/family/index.jsp?categoryId=2108294&cp=3593186.3593187 (They've called it 'Narrow By')
I have been searching for a while for how to do this, and I'm taking a wild guess that if stored procedures of this nature have to be created, each would need one parameter that accepts CSV values, like this:
[dbo].[GetFacetedProducts] @Tags_Selected = '1,3,5'
[dbo].[GetFacetedTags] @Tags_Selected = '1,3,5'
So with this architecture, does anyone know what types of queries need to be written for these stored procedures, or is the architecture flawed in any way? Has anyone created a faceted search like this before? If so, what types of queries would be needed? I guess I'm just having trouble wrapping my head around it, and there isn't much out there that shows how to make something like this.
An RDBMS is the wrong tool for faceted searching. Faceted searching is a multidimensional search, which is difficult to express in the set-based SQL language. Using a data cube or the like might give you some of the desired functionality, but would be quite a bit of work to build.
When we were faced with similar requirements we ultimately decided to utilize the Apache Solr search engine, which supports faceting as well as many other search-oriented functions and features.
It is possible to do faceted search in SQL Server. However don't try to use your live product data tables. Instead create a de-normalised "fact" table which holds every product (rows) and every tag (columns) so that the intersection is your product-tag values. You can re-populate this periodically from your main product table.
It is then straightforward and relatively efficient to get the facet counts for the matching records for each tag the user checks.
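As a rough sketch of what such a fact table and its facet counts could look like, the following Python example builds a small de-normalised table in SQLite with one 0/1 column per tag. The table name ProductFact, the sample rows and the helper facet_counts are assumptions for illustration; the tag names are the ones listed in the question, and a SQL Server version would differ in details.

```python
import sqlite3

# One column per tag, taken from the TAG table shown in the question.
TAG_COLUMNS = ["Blue", "Green", "Small", "Large", "Red"]

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"{c} INTEGER NOT NULL DEFAULT 0" for c in TAG_COLUMNS)
conn.execute(f"CREATE TABLE ProductFact (ProductID INTEGER PRIMARY KEY, {cols})")
conn.executemany(
    "INSERT INTO ProductFact (ProductID, Blue, Green, Small, Large, Red) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 0, 1, 0, 0),  # blue, small
        (2, 1, 0, 0, 1, 0),  # blue, large
        (3, 0, 0, 1, 0, 1),  # small, red
    ],
)

def facet_counts(conn, selected):
    """Count, per tag column, the products that carry every selected tag."""
    # Column names come from the fixed whitelist above, never from user input.
    unknown = set(selected) - set(TAG_COLUMNS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    sums = ", ".join(f"COALESCE(SUM({c}), 0)" for c in TAG_COLUMNS)
    where = " AND ".join(f"{c} = 1" for c in selected) or "1 = 1"
    row = conn.execute(f"SELECT {sums} FROM ProductFact WHERE {where}").fetchone()
    return dict(zip(TAG_COLUMNS, row))

counts = facet_counts(conn, ["Blue"])
```

Each count says how many of the currently matching products also carry that tag, which is the number a "Narrow By" list would display next to each remaining option.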
The approach I have described will be perfectly good for small cases, e.g. 1,000 product rows and 50-100 tags (attributes). Also there is an interesting opportunity with the forthcoming SQL Server 2014, which can place tables in memory - that should allow much larger fact tables.
I have also used Solr, and as STW points out this is the "correct" tool for facet searches. It is orders of magnitude faster than a SQL Server solution.
However, there are some major disadvantages to using Solr. The main issue is that you have to set up not only another platform (Solr) but also all the paraphernalia that goes with it: Java and some kind of Java servlet container (of which there are several). And whilst Solr runs on Windows quite nicely, you will still soon find yourself immersed in a world of command lines and editing of configuration files and environment variables that will remind you of all that was great about the 1980s ... or possibly not. And when that is all working you then need to export your product data to it, using various methods. There is a SQL Server connector which works fairly well, but many prefer to post data to it as XML. And then you have to create a webservice-type process in your application to send it the user's query and parse the resulting list of matches and counts back into your application (again, XML is probably the best method).
So if your dataset is relatively small, I would stick with SQL Server. You can still get a sub-second response, and SQL 2014 will hopefully allow much bigger datasets. If your dataset is big then Solr will give remarkably fast results (it really is very fast) but be prepared to make a major investment in learning and supporting a whole new platform.
There are other places where you can get examples of turning a CSV parameter into a table variable. Assuming you have done that part, your query boils down to the following:
GetFacetedProducts:
Find Product records where all tags passed in are assigned to each product.
If you wrote it by hand you could end up with:
SELECT P.*
FROM Product P
INNER JOIN ProductTag PT1 ON PT1.ProductID = P.ID AND PT1.TagID = 1
INNER JOIN ProductTag PT2 ON PT2.ProductID = P.ID AND PT2.TagID = 3
INNER JOIN ProductTag PT3 ON PT3.ProductID = P.ID AND PT3.TagID = 5
While this does select only the products that have those tags, it is not going to work with a dynamic list. In the past some people have built up the SQL and executed it dynamically; don't do that.
Instead, let's assume that the same tag can't be applied to a product twice, so we can change our question to:
Find me products where the number of tags matching (dynamic list) is equal to the number of tags in (dynamic list)
DECLARE @selectedTags TABLE (ID int)
DECLARE @tagCount int
INSERT INTO @selectedTags VALUES (1)
INSERT INTO @selectedTags VALUES (3)
INSERT INTO @selectedTags VALUES (5)
SELECT @tagCount = COUNT(*) FROM @selectedTags
SELECT
    P.ID
FROM Product P
JOIN ProductTag PT
    ON PT.ProductID = P.ID
JOIN @selectedTags T
    ON T.ID = PT.TagID
GROUP BY
    P.ID,
    P.Name
HAVING COUNT(PT.TagID) = @tagCount
This returns just the IDs of products that match all your tags. You could then join this back to the Product table if you want more than just an ID; otherwise you're done.
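For readers who want to try the count-matching idea outside SQL Server, here is a small sketch in Python with SQLite. It also covers the CSV step by splitting a string such as '1,3,5' on the application side. The function name faceted_products, the sample products and the composite primary key (which enforces the "no tag applied twice" assumption) are all illustrative choices, not part of the original stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductTag (ProductID INTEGER, TagID INTEGER,
                             PRIMARY KEY (ProductID, TagID));
    INSERT INTO Product VALUES (10, 'shirt'), (11, 'hat'), (12, 'sock');
    INSERT INTO ProductTag VALUES (10, 1), (10, 3), (10, 5), (11, 1), (11, 3), (12, 5);
""")

def faceted_products(conn, tags_csv):
    """Return IDs of products that carry every tag in a CSV string such as '1,3,5'."""
    # De-duplicate so the count comparison is not thrown off by repeated IDs.
    tag_ids = sorted({int(part) for part in tags_csv.split(",") if part.strip()})
    if not tag_ids:
        return []
    placeholders = ", ".join("?" * len(tag_ids))
    sql = f"""
        SELECT P.ID
        FROM Product P
        JOIN ProductTag PT ON PT.ProductID = P.ID
        WHERE PT.TagID IN ({placeholders})
        GROUP BY P.ID
        HAVING COUNT(PT.TagID) = ?
        ORDER BY P.ID
    """
    # Bind the tag IDs first, then the number of tags for the HAVING clause.
    return [row[0] for row in conn.execute(sql, tag_ids + [len(tag_ids)])]

all_three = faceted_products(conn, "1,3,5")
```

Only the number of placeholders depends on the input; the tag values are always bound as parameters.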
As for your second query, once you have the product IDs that match, you want a list of all tags for those product IDs that aren't in your list:
SELECT DISTINCT
    PT2.TagID
FROM ProductTag PT2
WHERE PT2.ProductID IN (
    SELECT
        P.ID
    FROM Product P
    JOIN ProductTag PT
        ON PT.ProductID = P.ID
    JOIN @selectedTags T
        ON T.ID = PT.TagID
    GROUP BY
        P.ID,
        P.Name
    HAVING COUNT(PT.TagID) = @tagCount
)
AND PT2.TagID NOT IN (SELECT ID FROM @selectedTags)
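The "remaining tags" query can be exercised the same way. The sketch below is a Python/SQLite approximation with made-up sample data and a hypothetical helper named remaining_tags; it first finds the products that carry every selected tag, then lists the other tags found on those products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTag (ProductID INTEGER, TagID INTEGER,
                             PRIMARY KEY (ProductID, TagID));
    INSERT INTO ProductTag VALUES (10, 1), (10, 3), (10, 5),
                                  (11, 1), (11, 3),
                                  (12, 5), (12, 4);
""")

def remaining_tags(conn, tag_ids):
    """Return tag IDs still selectable once products are narrowed by tag_ids."""
    tag_ids = sorted(set(tag_ids))
    if not tag_ids:
        # Nothing selected yet: every tag in use is still available.
        sql = "SELECT DISTINCT TagID FROM ProductTag ORDER BY TagID"
        return [row[0] for row in conn.execute(sql)]
    placeholders = ", ".join("?" * len(tag_ids))
    sql = f"""
        SELECT DISTINCT PT2.TagID
        FROM ProductTag PT2
        WHERE PT2.ProductID IN (
            SELECT PT.ProductID
            FROM ProductTag PT
            WHERE PT.TagID IN ({placeholders})
            GROUP BY PT.ProductID
            HAVING COUNT(PT.TagID) = ?
        )
        AND PT2.TagID NOT IN ({placeholders})
        ORDER BY PT2.TagID
    """
    # Parameter order follows the placeholders: inner IN list, count, outer NOT IN list.
    params = tag_ids + [len(tag_ids)] + tag_ids
    return [row[0] for row in conn.execute(sql, params)]

after_tag_1 = remaining_tags(conn, [1])
```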

Rails double match from has_and_belongs_to_many

Say that I have a has_and_belongs_to_many relationship where I have posts and categories. It is simple to find all the posts in a category, or all the categories that a particular post is a member of. However, what if I want to find a list of posts that belong to multiple categories? For example, a list of posts that are on the topic of security in Rails, I might want the posts that belong to the categories "Security" and "Rails".
Is it possible to do this with the finder methods build into ActiveRecord, or will I need to use SQL? Can someone please explain how?
You can use includes or joins, like:
@result = Post.includes(:categories).where("categories.name = 'Security' OR categories.name = 'Rails'")
or
@result = Post.joins(:categories).where("categories.name = 'Security' OR categories.name = 'Rails'")
I also suggest checking this Railscast to understand the difference between joins and includes, so you can decide which is better in your case.
i don't know anything about rails, but i'm attempting a similar thing with some sql. this may or may not work for either of us....
i have a table of articles, and a look-up table of applied categories. to get an article that has the 'security' category and the 'rails' category, i'm joining the article table to the category table, of course, but also re-joining it a second time. each join of the category table uses a hint in the table alias name (ie language or topic)
pseudo code:
SELECT article.*,
category_language.category_id,
category_topic.category_id
FROM category category_language
INNER JOIN article ON category_language.articleID = article.articleID
INNER JOIN category category_topic ON article.articleID = category_topic.articleID
WHERE category_language.category_id in (420) /* rails */
and category_topic.category_id in (421) /* security */
this isn't completely ironed out, and i hope that if i am showing my ignorance here, someone will speak up.
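For anyone who wants to check the re-joining idea, here is a small Python/SQLite sketch that joins the look-up table once per required category, each time under its own alias. The table and column names follow the pseudo code above; the sample rows and the helper articles_in_all are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (articleID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE category (articleID INTEGER, category_id INTEGER);
    INSERT INTO article VALUES (1, 'Securing Rails'), (2, 'Rails basics'), (3, 'TLS primer');
    INSERT INTO category VALUES (1, 420), (1, 421), (2, 420), (3, 421);
""")

def articles_in_all(conn, category_ids):
    """Return IDs of articles that belong to every category in category_ids."""
    # One INNER JOIN per category; the alias index keeps the joins distinct.
    joins = " ".join(
        f"INNER JOIN category c{i} "
        f"ON c{i}.articleID = a.articleID AND c{i}.category_id = ?"
        for i in range(len(category_ids))
    )
    sql = f"SELECT a.articleID FROM article a {joins} ORDER BY a.articleID"
    return [row[0] for row in conn.execute(sql, list(category_ids))]

both = articles_in_all(conn, [420, 421])  # rails AND security
```

Because every join is an inner join, an article drops out as soon as one of the required categories is missing, which is what distinguishes this from an OR or IN filter.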

Any way to merge two queries in solr?

In my project, we use Solr to index many different kinds of documents, for example Books and Persons, with some common fields (like the name) and some type-specific fields (like the category, or the group people belong to).
We would like to do queries that can find both books and persons, with for each document type some filters applied. Something like:
find all Books and Persons with "Jean" in the name and/or content
but only Books from category "fiction" and "fantasy"
and only Persons from the group "pangolin"
everything sorted by score
A very simple way to do that would be:
q = name:jean content:jean
&
fq=
(type:book AND category:(fiction fantasy))
OR
(type:person AND group:pangolin)
But alas, since fq results are cached, I'd prefer simpler and therefore more reusable fq clauses, like:
fq=type:book,
fq=type:person,
fq=category:(fiction fantasy),
fq=group:pangolin.
Is there a way to tell solr to merge or combine many queries? Something like 'grouping' fq together.
I read a bit about nested queries with _query_, but the scant documentation about it makes me think it's not the solution I'm looking for.
As Geert-Jan mentioned in his answer, the ability to OR several fq together is a requested Solr feature, but with very little support so far: https://issues.apache.org/jira/browse/SOLR-1223
So I managed to simulate what I want to in a simple way:
for each field a document type can have, we always have to define a value (so if, in my own example, Books can have no category, at index time we still have to define something like category=noCategoryCode)
when using a filter on one of these fields in a query on multiple types, we add a non-present condition to the filter, so fq=category:fiction becomes fq=category:fiction (*:* AND -category:*)
This way, all other types (like Person) will pass through this filter, and the filter stays quite atomic and is used often, so caching is still useful.
So, my full example becomes:
q = name:jean content:jean
&
fq= type:(book person)
&
fq= category:(fiction fantasy) (*:* AND -category:*)
&
fq= group:(pangolin) (*:* AND -group:*)
Still, I can't wait for SOLR-1223 to be patched :)
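On the client side, the workaround comes down to sending several independent fq parameters, each built the same way. The Python sketch below assembles such a query string with the standard library; the helper name passthrough_filter is made up, and the filter syntax is copied from the example above, so it inherits the same assumption that the two clauses of each fq are combined with the default OR operator.

```python
from urllib.parse import urlencode, parse_qsl

def passthrough_filter(field, values):
    """Build an fq matching the given values or any document lacking the field."""
    return f"{field}:({' '.join(values)}) (*:* AND -{field}:*)"

# A list of pairs (not a dict) so that the fq key can repeat.
params = [
    ("q", "name:jean content:jean"),
    ("fq", "type:(book person)"),
    ("fq", passthrough_filter("category", ["fiction", "fantasy"])),
    ("fq", passthrough_filter("group", ["pangolin"])),
]

# urlencode percent-escapes each value and joins the pairs with '&'.
query_string = urlencode(params)
```

Keeping each fq as its own parameter is what lets Solr cache the filters separately.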
You can apply multiple filter queries at the same time
q=name:jean content:jean&fq=type:book&fq=type:person&fq=category:(fiction fantasy)&fq=group:pangolin
Perhaps I am not understanding your issue, but the only difference between a query and a filter is that the filter is cached. If you don't care about the caching, just modify the query:
real query +((type:book category:fiction) (type:person group:pangolin))