Ruby on Rails search for querying two tables - sql

I want to build a search on my website that is similar to Facebook.
For example, entering a search phrase and returning results from multiple tables.
I have two tables on my site, Accounts and Posts, and I want to make a search that returns results from both tables based on a search phrase.
I am confused about how to do this.
Can someone point me in the right direction, please?
Thank you,
Brian

You can search on a joined field by using joins to bring in the second table. For example:
Account.where(:your_attribute => search_term).joins(:posts).where("posts.some_attribute = ?", search_term_2)
Or if you're searching from the opposite direction:
Post.where(:some_attribute => search_term).joins(:account).where("accounts.your_attribute = ?", search_term_2)
If you want an OR between the two tables, you can. Just modify the query a little (note that each placeholder needs its own bind value):
Post.joins(:account).where("posts.attribute = ? OR accounts.attribute = ?", search_term, search_term)

If I'm reading your post correctly, you want a single search box that searches multiple tables and returns multiple types of data. On Facebook a search returns people, apps, pages, etc. By doing a join you will return Posts together with their associated accounts, but this won't return accounts that don't have any posts, and even if you did an outer join on the table, it wouldn't scale if you wanted to search additional models.
Your simplest solution without introducing more software into the mix is to create a database view that maps the data into a structure that is more easily queryable. In Rails/Ruby, you would then query that view like a normal database table.
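As a rough illustration of the view approach (everything here is a sketch with made-up names: a search_results view that unions the searchable columns of both tables, plus a plain Rails model on top of it):
CREATE VIEW search_results AS
  SELECT id AS source_id, 'Account' AS source_type, username AS matched_text FROM accounts
  UNION ALL
  SELECT id, 'Post', title FROM posts;
# app/models/search_result.rb -- Rails infers the search_results view as its "table"
class SearchResult < ActiveRecord::Base
end
# the search itself then becomes a single query against the view
# (ILIKE is PostgreSQL; use LOWER(...) LIKE on other databases)
SearchResult.where("matched_text ILIKE ?", "%#{search_phrase}%")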
A more complex solution would be to use a full text index such as Apache Solr and use a gem like acts_as_solr_reloaded to query the full text index. At the end of the day, this would be a more robust and scalable solution.

Related

How to retrieve identical entries from two different queries which are based on the same table

I think I got a knot in my line of thought, surely you can untie it.
Basically I have two working queries which are based on the same table and return an identical structure (the same as the source table). They are simply two different kinds of row filters. Now I would like to "stack" these filters, meaning that I want to retrieve all the entries that are in both query A and query B.
Why do I want that?
Our club is structured in several local groups, and I need to hand different kinds of lists (e.g. members with an email entry) to these groups. In this example I would have a query "groupA" and a query "newsletter". Another combination could be "groupB" and "activemember", but also "groupB" and "newsletter". Unfortunately each query is based on a set of conditions, which in my opinion are best stored in a single query instead of being copied several times into different queries (in case something changes).
Judging from Venn diagrams, I suppose I need to use an INNER JOIN, but I could not get it to work, neither with the LibreOffice Base query wizard nor with SQL code. I tried this:
SELECT groupA.*
FROM groupA
INNER JOIN newsletter
ON groupA.memberID = newsletter.memberID
The error code says: Cannot be in ORDER BY clause in statement
I suppose the problem comes from the fact that both queries are based on the same table.
Maybe there is an even easier way of nesting queries?
I am hoping for something like
SELECT * FROM groupA
WHERE groupA.memberID = newsletter.memberID
Thank you, and sorry if this is a duplicate; I just could not find the right search terms.
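For reference, the nested form being hoped for here is normally written with a subquery; a minimal sketch, assuming both saved queries expose a memberID column (whether LibreOffice Base lets you reference saved queries by name this way depends on its SQL engine):
SELECT groupA.*
FROM groupA
WHERE groupA.memberID IN (SELECT memberID FROM newsletter)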

Search for multiple words, across multiple models

I'm trying to create search functionality in a site, and I want the user to be able to search for multiple words, performing substring matching against criteria which exist in various models.
For the sake of this example, let's say I have the following models:
Employee
Company
Municipality
County
A county has multiple municipalities, which have multiple companies, which have multiple employees.
I want the search to be able to search against a combination of Employee.firstname, Employee.lastname, Company.name, Municipality.name and County.name, and I want the end result to be Employee instances.
For example a search for the string "joe tulsa" should return all Employees where both words can be found somewhere in the properties I named in the previous sentence. I'll get some false positives, but at least I should get every employee named "Joe" in Tulsa county.
I've tried a couple of approaches, but I'm not sure I'm going down the right path. I'm looking for a nice RoR-ish way of doing this, and I'm hoping someone with more RoR wisdom can help outline a proper solution.
What I have tried:
I'm not very experienced with this kind of search, but outside RoR I'd manually create an SQL statement to join all the tables together, create WHERE clauses for each separate search word covering the different tables (perhaps using a builder), then execute the query, loop through the results, instantiate Employee objects manually and add them to an array.
To solve this in RoR, I've been:
1) Dabbling with named scopes in what in my project corresponds to the Employee model, but I got stuck when I needed to join in tables two or more "steps" away (Municipality and County).
2) Creating a view (called "search_view") joining all the tables together, to simplify the query. I then thought I'd use Employee.find_by_sql() on this view, which would yield those nice Employee objects. I figured I'd use a builder to create the SQL, and it seemed that Arel was the thing to use, so I tried something like:
view = Arel::Table.new(:search_view)
But the resulting Arel::Table does not contain any columns, so it's not usable for building my query. At this point I'm a bit stuck, as I don't know how to get a working query builder.
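For reference, the multi-step join from 1) can be expressed with nested association hashes; a rough sketch, assuming conventional association names (Employee belongs_to :company, Company belongs_to :municipality, Municipality belongs_to :county):
words = "joe tulsa".split
scope = Employee.joins(:company => { :municipality => :county })
words.each do |w|
  # chained where calls are ANDed, so every word must match at least one column
  scope = scope.where(
    "employees.firstname LIKE :w OR employees.lastname LIKE :w OR companies.name LIKE :w OR municipalities.name LIKE :w OR counties.name LIKE :w",
    :w => "%#{w}%"
  )
end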
I strongly recommend using a proper search engine for something like this; it will make your life a lot easier. I had a similar problem and thought, "Boy, I bet setting up something like Sphinx means I have to read thousands of manuals and tutorials first." Well, that's not the case.
Thinking Sphinx is a Rails gem I recommend which makes it very easy to integrate Sphinx. You don't need to have much experience at all to get started:
http://freelancing-god.github.com/ts/en/
I haven't tried other search engines, but I'm very satisfied with Sphinx. I managed to set up a relatively complex real-time search in less than a day.
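As a very rough sketch of what a Thinking Sphinx index for the example models might look like, using the define_index DSL from the Thinking Sphinx versions of that era (the column and association names are assumptions):
class Employee < ActiveRecord::Base
  belongs_to :company

  define_index do
    indexes firstname, lastname
    indexes company.name, :as => :company_name
    # deeper associations (municipality, county) can be indexed the same way
    # once the associations are defined, or denormalized into columns
  end
end
# after building the index (rake thinking_sphinx:index):
Employee.search "joe tulsa"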

Search in huge table

I have a table with over 1 million rows.
The table holds user information, e.g. userName, email, gender, marital status, etc.
I'm going to write a search over all rows in this table, with some conditions applied.
In the simplest case, when the search is performed only on userName, it takes 4-7 seconds to find a result:
select * from u where u.name ilike '...'
Yes, I have indexes on some fields, and I checked with EXPLAIN ANALYSE that they are used.
How can the search be made faster?
I heard something about Lucene; can it help?
I'm also wondering how Facebook's search works: they have billions of users and their search is much faster.
There is a great difference between these three queries:
a) SELECT * FROM u WHERE u.name LIKE "George%"
b) SELECT * FROM u WHERE u.name LIKE "%George"
c) SELECT * FROM u WHERE u.name LIKE "%George%"
a) The first will use the index on u.name (if there is one) and will be very fast.
b) The second will not be able to use any index on u.name but there are ways to circumvent that rather easily.
For example, you could add another field nameReversed in the table where REVERSE(name) is stored. With an index on that field, the query will be rewritten as (and will be as fast as the first one):
b2) SELECT * FROM u WHERE u.nameReversed LIKE REVERSE("%George")
c) The third query poses the greatest difficulty, as neither of the two previous indexes will be of any help and the query will scan the whole table. Alternatives are:
Using a solution dedicated to such problems (search for "full text search"), like Sphinx. See this question on SO for more details: which-is-best-search-technique-to-search-records
If your field contains names only (or another limited set of words, say a few hundred different words), you could create another auxiliary table with those names (words) and store only a foreign key in table u.
If of course that is not the case and you have tens of thousands or millions of different words, or the field contains whole phrases, then solving the problem with auxiliary tables amounts to building a full text search tool yourself. It's a nice exercise and you won't need Sphinx (or similar) besides the RDBMS, but it's not trivial.
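Rough sketches of the two auxiliary structures described above, in PostgreSQL flavour since the question uses ILIKE and EXPLAIN ANALYSE (all names are made up):
-- b) reversed-name column for suffix searches; reverse() needs a recent PostgreSQL,
--    and the column must be kept in sync via a trigger or application code
ALTER TABLE u ADD COLUMN nameReversed text;
UPDATE u SET nameReversed = reverse(name);
-- text_pattern_ops lets the index serve LIKE 'prefix%' under non-C locales
CREATE INDEX idx_u_name_reversed ON u (nameReversed text_pattern_ops);
SELECT * FROM u WHERE nameReversed LIKE reverse('%George');
-- auxiliary word table: the table of distinct names is tiny, so even scanning it is cheap
CREATE TABLE names (id serial PRIMARY KEY, name text UNIQUE);
ALTER TABLE u ADD COLUMN name_id integer REFERENCES names (id);
SELECT u.* FROM u JOIN names ON names.id = u.name_id WHERE names.name ILIKE '%George%';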
Take a look at Hibernate Search.
It uses Lucene but is a lot easier to implement.
Google and Facebook use different approaches: they have distributed systems. Google's BigTable is a good keyword, and the MapReduce concept (Apache Hadoop) is a good starting point for further research.
Try table partitioning.
In large-table scenarios it can be helpful to partition a table.
For PostgreSQL, see PostgreSQL Partitioning.
For highly scalable, high-performance searches it can also be useful to adopt a NoSQL database (as Facebook does).
I heard something about Lucene, can it help?
Yes, it can. I'm sure you will love it!
I had the same problem: a table with roughly 1.2 million messages. Searching through these messages took a few seconds; a full-text search on the "message" column took about 10 seconds.
On the same server hardware, Lucene returns the result in about 200-400 ms.
That's very fast.
Cached results come back in roughly 5-10 ms.
Lucene can be fed from your SQL database (for example MySQL): it scans your database and builds a searchable index.
How you search that index depends on the kind of application.
In my case, my PHP web application uses Solr to search the Lucene index.
http://lucene.apache.org/solr/

How to efficiently build and store semantic graph?

Surfing the net I ran into Aquabrowser (no need to click, I'll post a pic of the relevant part).
It has a nice way of presenting search results and discovering semantically linked entities.
Here is a screenshot taken from one of the demos.
On the left side you have the word you typed and related words.
Clicking them refines your results.
Now as an example project I have a data set of film entities and subjects (like world-war-2 or prison-escape) and their relations.
Now I imagine several use cases, first where a user starts with a keyword.
For example "world war 2".
Then I would somehow like to calculate related keywords and rank them.
I'm thinking of an SQL query like this:
Let's assume "world war 2" has id 3.
SELECT keywordId, COUNT(keywordId) AS total
FROM keywordRelations
WHERE movieId IN (SELECT movieId
                  FROM keywordRelations
                  JOIN movies USING (movieId)
                  WHERE keywordId = 3)
GROUP BY keywordId
ORDER BY total DESC
which basically selects all movies that have the keyword world-war-2, then looks up the keywords those films also have and picks the ones that occur most often.
I think with these keywords I can select the movies that match best and build a nice tag cloud of similar movies and related keywords.
I think this should work, but it's very, very, very inefficient.
And it's also only one level of relation.
There must be a better way to do this, but how?
I basically have a collection of entities. They could be different kinds of entities (movies, actors, subjects, plot-keywords), etc.
I also have relations between them.
It must somehow be possible to efficiently calculate "semantic distance" for entities.
I also would like to implement more levels of relation.
But I am totally stuck. I have tried different approaches, but everything ends up in algorithms that take ages to compute and whose runtime grows exponentially.
Are there any database systems available optimized for that?
Can someone point me in the right direction?
You probably want an RDF triplestore. Redland is a pretty commonly used one, but it really depends on your needs. Queries are done in SPARQL, not SQL. Also... you have to drink the semantic web koolaid.
From your tags I see you're more familiar with SQL, and I think it's still possible to use it effectively for your task.
I have an application with a custom-made full-text search implemented using SQLite as the database. In the search field I can enter terms, and a popup list shows suggestions for the current word; for each subsequent word, only suggestions that appear in articles containing the previously entered words are shown. So it's similar to the task you described.
To keep things simple, let's assume we have only three tables. I suppose you have a different schema and the details may differ, but my explanation is just to give the idea.
Words
[Id, Word] The table contains words (keywords)
Index
[Id, WordId, ArticleId]
This table (indexed also by WordId) lists articles where this term appeared
ArticleRanges
[ArticleId, IndexIdFrom, IndexIdTo]
This table lists ranges of Index.Id for any given article (obviously also indexed by ArticleId). It requires that for any new or updated article, the Index table contains entries within a known from-to range. I suppose this can be achieved in any RDBMS with a little help from the autoincrement feature.
So for any given string of words you:
Intersect all articles in which all the previously entered words appeared; this narrows the search: SELECT ArticleId FROM Index WHERE WordId = ... INTERSECT ...
For that list of articles, get the ranges of records from the ArticleRanges table.
For those ranges, efficiently query the WordId lists from Index, grouping the results to get a count and finally sorting by it.
Although I listed them as separate actions, the final query can be just one big SQL statement built from the parsed query string.
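A rough sketch of what that combined query might look like for two already-entered words with ids 1 and 2, using the schema above ("Index" is quoted because it is a reserved word in most SQL dialects):
SELECT i.WordId, COUNT(*) AS total
FROM "Index" i
JOIN ArticleRanges r ON i.Id BETWEEN r.IndexIdFrom AND r.IndexIdTo
WHERE r.ArticleId IN (
    SELECT ArticleId FROM "Index" WHERE WordId = 1
    INTERSECT
    SELECT ArticleId FROM "Index" WHERE WordId = 2
)
GROUP BY i.WordId
ORDER BY total DESC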

DB Design question: Tree (one table) vs. Two tables for tweets and retweets?

I've heard that on Stack Overflow, questions and answers are stored in the same DB table.
Suppose you were to build a Twitter-like service that only allowed one level of commenting, i.e. one tweet and then comments/replies to that tweet, but no replies to replies.
Would you use two tables for tweets and retweets, or just one table where the field parent_tweet_id is optional?
I know this is an open question, but what are some advantages of either solution?
Retweets are still normal tweets as well. So one table. You wouldn't want to have to load from two tables to include the retweets.
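A minimal sketch of the single-table layout (column names are purely illustrative, PostgreSQL-flavoured):
CREATE TABLE tweets (
    id               bigserial PRIMARY KEY,
    user_id          bigint NOT NULL,
    body             text NOT NULL,
    parent_tweet_id  bigint REFERENCES tweets (id),  -- NULL for a top-level tweet
    created_at       timestamp NOT NULL DEFAULT now()
);
-- a partial index keeps "top-level tweets only" queries fast despite the shared table
CREATE INDEX idx_tweets_top_level ON tweets (created_at) WHERE parent_tweet_id IS NULL;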
Advantages of one table:
You can search through all tweets and comments in a simple way.
You can use one identity column easily for all posts.
Every post has the same set of columns.
Advantages of two tables:
If it's more common to search or display only top-level tweets instead of tweets + comments, the table of tweets is that much smaller without comments.
Two tables can have different sets of columns, so if there are columns meaningful for one type of post but not the other, you can put these columns in the respective table without having to leave them null when not applicable.
Indexes can also be different on two tables, so if you have the need to search comments in different ways, you can make indexes specialized to that task.
In short, it depends on how you use the data, not only how it's structured. You haven't said much about the operations you need to do with the data.
Like all design questions, it depends.
I don't normally like to mix concepts in a single table. I find it can quickly damage the conceptual integrity of the database schema. For example, I would not put posts and replies in the same table because they are different entities.