SELECT with ORs including table joins - sql

I've got a database with three tables: Books (with book details, PK is CopyID), Keywords (list of keywords, PK is ID) and KeywordsLink which is the many-many link table between Books and Keywords with the fields ID, BookID and KeywordID.
I'm trying to make an advanced search form in my app where you can search on various criteria. At the moment I have it working with Title, Author and Publisher (all from the Book table). It produces SQL like:
SELECT * FROM Books WHERE Title Like '%Software%' OR Author LIKE '%Spolsky%';
I want to extend this search to also search using tags - basically to add another OR clause to search the tags. I've tried to do this by writing the following:
SELECT *
FROM Books, Keywords, Keywordslink
WHERE Title LIKE '%Joel%'
OR (Name LIKE '%good%' AND BookID=Books.CopyID AND KeywordID=Keywords.ID)
I thought using the brackets might separate the 2nd part into its own kinda clause, so the join was only evaluated in that part - but it doesn't seem to be so. All it gives me is a long list of multiple copies of the one book that satisfies the Title LIKE '%Joel%' bit.
Is there a way of doing this using pure SQL, or would I have to use two SQL statements and combine them in my app (removing duplicates in the process)?
I'm using MySQL at the moment if that matters, but the app uses ODBC and I'm hoping to make it DB agnostic (might even use SQLite eventually or have it so the user can choose what DB to use).

You need to join the 3 tables together, which gives you a tabular result set. You can then check any columns you like, and make sure you get distinct results (i.e. no duplicates).
Like this:
select distinct b.*
from books b
left join keywordslink kl on kl.bookid = b.copyid
left join keywords k on kl.keywordid = k.id
where b.title like '%assd%'
or k.name like '%asdsad%'
You should also try to avoid starting your LIKE values with a percent sign (%), as a leading wildcard means the database can't use an index on that column and has to perform a full (and slow) table scan. Dropping the leading % turns it into a "starts with" query, which can use an index.
Maybe also consider your database's full-text search options (SQL Server, MySQL and others have them).
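For example, in MySQL (which the question mentions) a minimal full-text setup might look like this, depending on your MySQL version and storage engine; the index name and the choice of the Title and Author columns are just assumptions for illustration:
-- Hypothetical full-text index on the Books table (MySQL syntax)
ALTER TABLE Books ADD FULLTEXT INDEX ft_books_title_author (Title, Author);

-- Full-text search instead of a leading-wildcard LIKE
SELECT *
FROM Books
WHERE MATCH(Title, Author) AGAINST('software spolsky');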

What you've done here is made a cartesian result set by having the tables joined with the commas but not having any join criteria. Switch your statements to use outer join statements and that should allow you to reference the keywords. I don't know your schema, but maybe something like this would work:
SELECT
*
FROM
Books
LEFT OUTER JOIN KeywordsLink ON KeywordsLink.BookID = Books.CopyID
LEFT OUTER JOIN Keywords ON Keywords.ID = KeywordsLink.KeywordID
WHERE Books.Title LIKE '%JOEL%'
OR Keywords.Name LIKE '%GOOD%'
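One thing to note: if a book matches several keywords, the query above returns one row per matching keyword. A DISTINCT over the Books columns collapses those duplicates (a sketch; it leaves out the Keywords columns, since selecting them would bring the duplicates back):
SELECT DISTINCT Books.*
FROM Books
LEFT OUTER JOIN KeywordsLink ON KeywordsLink.BookID = Books.CopyID
LEFT OUTER JOIN Keywords ON Keywords.ID = KeywordsLink.KeywordID
WHERE Books.Title LIKE '%JOEL%'
OR Keywords.Name LIKE '%GOOD%'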

Use UNION.
(SELECT Books.* FROM <first kind of search>)
UNION
(SELECT Books.* FROM <second kind of search>)
The point is that you could write two (or more) simple and efficient queries instead of one complicated query that tries to do everything at once.
If the number of resulting rows is low, then UNION will have very little overhead (and you can use the faster UNION ALL if you don't have duplicates or don't care about them).
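Applied to the schema in the question, the two pieces might look something like this (a sketch; plain UNION removes the books that match both criteria twice):
(SELECT Books.*
 FROM Books
 WHERE Title LIKE '%Joel%')
UNION
(SELECT Books.*
 FROM Books
 JOIN KeywordsLink ON KeywordsLink.BookID = Books.CopyID
 JOIN Keywords ON Keywords.ID = KeywordsLink.KeywordID
 WHERE Keywords.Name LIKE '%good%')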

SELECT * FROM books WHERE title LIKE '%Joel%' OR copyid IN
(SELECT bookid FROM keywordslink WHERE keywordid IN
(SELECT id FROM keywords WHERE name LIKE '%good%'))
Beware that older versions of MySQL (before 4.1) didn't support subselects; newer versions do.

You must also limit the product of the join by specifying something like
Books.FK1 = Keywords.FK1 and
Books.FK2 = Keywordslink.FK2 and
Keywords.FK3 = Keywordslink.FK3
But I don't know your exact data model, so your solution may be slightly different.

I'm not aware of any way to accomplish a "conditional join" in SQL. I think you'll be best served with executing the two statements separately and combining them in the application. This approach is also more likely to stay DB-agnostic.
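For illustration, the two statements could be as simple as the following (a sketch using the question's table and column names; the application would merge the two result sets on CopyID to drop duplicates):
-- Statement 1: match on the book fields
SELECT * FROM Books WHERE Title LIKE '%Joel%';

-- Statement 2: match on the keywords
SELECT Books.*
FROM Books
JOIN KeywordsLink ON KeywordsLink.BookID = Books.CopyID
JOIN Keywords ON Keywords.ID = KeywordsLink.KeywordID
WHERE Keywords.Name LIKE '%good%';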

It looks like Neil Barnwell has covered the answer that I would have given, but I'll add one thing...
Books can have more than one author. If your data model is really designed as your query implies you might want to consider changing it to accommodate that fact.


Items in Multiple Categories Show Multiple Times

I've got items in an "equip" table that are linked to the equipcat table using a junction table. The problem is that I want to get a list of all items where the user-supplied search term is found inside one of numerous fields, including the equipcat (aka category description) field. But I want each item to only be listed once.
It seems I must have some fundamental misunderstanding about SQL because I've faced this problem before and had trouble figuring it out. I'm not only looking to solve this particular issue but to also understand it better for future needs.
Here's my SQL. Please ignore the fuzzy searches as I realize they don't scale/perform well. I'm also aware that my use of a single field to hold keywords violates good design and I'm simply asking that you ignore that unless you feel that it is important to the question I'm asking.
SELECT equip.equipid, equip.equipdesc, equip.equipgeneraldesc,
equip.keywords, equip.dayprice, equip.weekprice,
equip.monthprice, equip.hideyn, equipcat.equipcat,
equipcat.equipcatkeywords
FROM (equip INNER JOIN equip_equipcat ON equip.equipid = equip_equipcat.equipid)
INNER JOIN equipcat ON equip_equipcat.equipcatid = equipcat.equipcatid
WHERE (equip.equipdesc LIKE '%rake%' OR equip.keywords LIKE '%rake%' OR
equipcat.equipcat LIKE '%rake%' OR equipcat.equipcatkeywords LIKE '%rake%')
AND (equip.hideyn = 0)
ORDER BY equipdesc ASC;
SELECT e.*
FROM equip e
WHERE
(
    e.equipid IN
    (
        SELECT ec.equipid
        FROM equip_equipcat ec
        JOIN equipcat c
          ON c.equipcatid = ec.equipcatid
        WHERE c.equipcat LIKE '%rake%'
           OR c.equipcatkeywords LIKE '%rake%'
    )
    OR e.equipdesc LIKE '%rake%'
    OR e.keywords LIKE '%rake%'
)
AND e.hideyn = 0
ORDER BY
    e.equipdesc
A) You included fields from the equipcat table in your result, so a row is needed for each matching category anyway => remove those columns from your query.
B) You can then add a DISTINCT keyword to your query (i.e. SELECT DISTINCT ...) to reduce the multiple rows to distinct ones only.
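Put together, that might look like this (a sketch, keeping only the equip columns so DISTINCT can collapse the duplicate rows):
SELECT DISTINCT equip.equipid, equip.equipdesc, equip.equipgeneraldesc,
       equip.keywords, equip.dayprice, equip.weekprice,
       equip.monthprice, equip.hideyn
FROM (equip INNER JOIN equip_equipcat ON equip.equipid = equip_equipcat.equipid)
INNER JOIN equipcat ON equip_equipcat.equipcatid = equipcat.equipcatid
WHERE (equip.equipdesc LIKE '%rake%' OR equip.keywords LIKE '%rake%' OR
       equipcat.equipcat LIKE '%rake%' OR equipcat.equipcatkeywords LIKE '%rake%')
AND (equip.hideyn = 0)
ORDER BY equip.equipdesc ASC;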

How to avoid Cartesian product in an INNER JOIN query?

I have 6 tables, let's call them a,b,c,d,e,f. Now I want to search all the columns (except the ID columns) of all tables for a certain word, let's say 'Joe'. What I did was make INNER JOINs over all the tables and then use LIKE to search the columns.
INNER JOIN ... ON ...
INNER JOIN ... ON ...   etc.
WHERE a.firstname ~* 'Joe'
OR a.lastname ~* 'Joe'
OR b.favorite_food ~* 'Joe'
OR c.job ~* 'Joe'   etc.
The results are correct; I get all the columns I was looking for. But I also get some kind of cartesian product: I get 2 or more lines with almost the same results.
How can I avoid this? I want to have each line only once, since the results should appear in a web search.
UPDATE
I first tried to figure out whether SELECT DISTINCT would work by using this statement: pastie.org/970959. But it still gives me a cartesian product. What's wrong with this?
try SELECT DISTINCT?
On what condition do you JOIN these tables? Do you have foreign keys or something?
Maybe you should search for that word in each table separately?
What kind of server are you using? Microsoft SQL Server has a full-text index feature (I think others have something like this too) which lets you search for keywords in a much less resource-intensive way.
Also consider using UNION instead of joining the tables.
Without seeing your tables, I can only really assume what's going on here is you have a one-to-many relationship somewhere. You probably want to do everything in a subquery, select out the distinct IDs, then get the data you want to display by ID. Something like:
SELECT a.*, b.*
FROM (SELECT DISTINCT a.ID
FROM ...
INNER JOIN ...
INNER JOIN ...
WHERE ...) x
INNER JOIN a ON x.ID = a.ID
INNER JOIN b ON x.ID = b.ID
A couple of things to note, however:
This is going to be sloooow and you probably want to use full-text search instead (if your RDBMS supports it).
It may be faster to search each table separately rather than to join everything in a Cartesian product first and then filter with ORs.
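As a sketch of searching each table separately (the ID and foreign-key column names here are placeholders, since the real schema isn't shown; the application would combine and de-duplicate the results, or a UNION could do it if the column lists match):
-- One query per table, using the example columns from the question
SELECT a.id   FROM a WHERE a.firstname ~* 'Joe' OR a.lastname ~* 'Joe';
SELECT b.a_id FROM b WHERE b.favorite_food ~* 'Joe';  -- a_id: placeholder FK back to a
SELECT c.a_id FROM c WHERE c.job ~* 'Joe';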
If your tables are entity type tables, for example a being persons and b being companies, I don't think you can avoid a cartesian product if you search for the results in this way (single query).
You say you want to search all the tables for a certain word, but you probably want to separate the results into the corresponding types. Right? Otherwise a web search would not make much sense.
So if you search for 'Joe', you want to see persons containing the name 'Joe' and, for example, the company named 'Joe's gym'. Since you are searching for different entities, you should split the search into different queries.
If you really want to do this in one query, you will have to change your database structure to accommodate it. You will need some form of 'search table' containing an entity ID (PK) and entity type, and a list of keywords you want that entity to be found with. For example:
EntityType, EntityID, Keywords
------------------------------
Person, 4, 'Joe', 'Doe'
Company, 12, 'Joe''s Gym', 'Gym'
Something like that?
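A sketch of how querying such a table might look, assuming it is named SearchIndex and stores one row per entity/keyword pair rather than the comma-separated list shown above:
-- Hypothetical SearchIndex table: EntityType, EntityID, Keyword
SELECT EntityType, EntityID
FROM SearchIndex
WHERE Keyword LIKE '%Joe%';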
However, it's different when your search returns only one type of entity, say a Person, and you want to return the Persons for which you get a hit on that keyword (in any table related to that Person). Then you will need to select all the fields you want to show and group by them, leaving out the fields in which you are searching. Including them inevitably leads to a cartesian product.
I'm just brainstorming here, by the way. I hope it's helpful.

'dynamic' SQL join possible?

I have a table Action with a (part of the) structure like this:
Action:
ActionTypeID,
ActionConnectionID
...
ActionTypeID refers to types 1-5 which correspond to different tables (1=Pro, 2=Project etc.)
ActionConnectionID is the primary key in the corresponding table; i.e. ActionTypeID=1, ActionConnectionID=43 would point to Pro.ProID=43, and ActionTypeID=2, ActionConnectionID=233 would point to Project.ProjectID=233.
Is there a way to 'dynamically' join the different tables depending on the value in the ActionTypeID column?
ie. for records with the ActionTypeID=1 this would be:
Select Action.*
From Action Left Join Pro On Action.ActionConnectionID=Pro.ProID
for records with the ActionTypeID=2 this would be:
Select Action.*
From Action Left Join Project On Action.ActionConnectionID=Project.ProjectID
etc.
If this is not possible to accomplish in one query I will have to loop over all the possible ActionTypes and perform the query and then afterwards join the data in one query again - that would be possible, but doesn't sound like the most efficient way :-)
Something like this should do:
Select Action.*
From Action
Left Join Pro
ON Action.ActionConnectionID=Pro.ProID and ActionTypeID=1
Left Join Project
ON Action.ActionConnectionID=Project.ProjectID and ActionTypeID=2
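If you also want a column from whichever table matched, COALESCE across the joins can pick it up (purely illustrative; it assumes Pro and Project each have a Name column, which the question doesn't state):
Select Action.*,
       Coalesce(Pro.Name, Project.Name) as ConnectedName  -- Name columns are assumed
From Action
Left Join Pro
ON Action.ActionConnectionID=Pro.ProID and ActionTypeID=1
Left Join Project
ON Action.ActionConnectionID=Project.ProjectID and ActionTypeID=2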
If that doesn't work, either try using dynamic SQL (which is a bad solution) or properly normalize your data.
Are you just trying to select everything without any filters at all? I always hate when people give answers that are basically "don't do it like that, do it like this instead", but now I'm going to go ahead and do that myself. Have you considered a different schema where you don't have to write this kind of query? I assume that the Pro, Project, etc. tables all have the same schema - can they be combined into one table? Perhaps you don't have control over that and are working with a DB you can't change (been there myself). You should explore using UNION to join up the pieces that you need.
(Select Action.*
From Action Left Join Pro On Action.ActionConnectionID=Pro.ProID)
UNION
(Select Action.*
From Action Left Join Project On Action.ActionConnectionID=Project.ProjectID)

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a query result of all the columns in the table.
Now, suppose the 42nd column of the animals table is is_parent, and I want to return it in my results just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work (select is_parent, ani.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
A similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used, and they're all perfectly correct. However, I don't think any of them answer the original question of why this particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "but why!!!" bucket, along with "why can't you group by an alias?"
The use case for the alias.* format is as follows:
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name; the bigger problem is inefficiency when not all columns are required in the result set (network traffic, CPU and memory load).
Of course, if you're adding columns from other tables (as is the case in this example), it can be dangerous, as these tables may over time have columns with matching names; select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata requires the same. As both are quite old (maybe better to call them mature :-) DBMSes, this might be for historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.

Attempt at database localization using table-valued functions

I'm looking for opinions on the following localization technique:
We start with 2 tables:
tblProducts: ProductID, Name, Description, SomeAttribute
tblProductsLocalization: ProductID, Language, Name, Description
and a table-valued function:
CREATE FUNCTION [dbo].[LocalizedProducts](@locale nvarchar(50))
RETURNS TABLE
AS RETURN (
    SELECT a.ProductID, COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description], a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
)
What I plan to do is include the function whenever I need localized data returned:
select * from LocalizedProducts('en-US') where ProductID=1
instead of
select * from tblProducts where ProductID=1
I'm interested if there are major performance concerns arround this or any showstoppers. Any reasons I shouldn't adopt this?
Edit: I've tagged this SQL2005, although I develop this using 2008. I think the deployment target only has SQL2005. I could upgrade to 2008 if the need arises though.
Later edit:
I have created a view, with identical content, but without the parameter:
CREATE VIEW [dbo].[LocalizedProductsView]
AS
SELECT b.[Language], a.ProductID, COALESCE(b.Name, a.Name) AS [Name],
       COALESCE(b.Description, a.Description) AS [Description], a.SomeAttribute
FROM tblProducts a
LEFT OUTER JOIN tblProductsLocalization b ON a.ProductID = b.ProductID
I then proceeded to run some tests:
The estimated execution plan looks identical for both queries:
select * from LocalizedProducts('us-US') where SomeNonIndexedParameter=2
select * from LocalizedProductsView where (Language='us-US' or Language is null) and SomeNonIndexedParameter=2
The final question that arises is: should I understand that the TVF computes the translations for ALL the products, regardless of the WHERE parameters? Is the view doing the same thing?
Short answer: As a general rule, there is nothing wrong with using a TVF for this sort of thing, but I would suggest making the ID a parameter as well:
CREATE FUNCTION [dbo].[LocalizedProducts](@ID int, @locale nvarchar(50))
RETURNS TABLE
AS RETURN (
    SELECT a.ProductID, COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description], a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
    WHERE a.ProductID = @ID
)
Used like so:
select * from LocalizedProducts(1, 'en-US')
Longer explanation:
I've never tried something like this in SQL 2008 yet, so it's possible that SQL Server can optimize this issue away.
My experience in earlier versions, though, seems to suggest that SQL Server tends to handle
User-Defined Functions in a more procedural than declarative fashion, so it doesn't interpret what you want and then figure out the best way to get you what you want, but actually performs in order the instructions you've written. So it appears to me that this method would:
select all English-language text, placing it into a table variable.
take the results of step #1 and select any records with the given ID.
This would mean a lot of wasted cycles, putting mostly-unused English text into the table variable, before applying the ID filter to that result set. On the other hand, putting all of the filters into the UDF would let SQL Server determine whether it's easiest to filter by ID first (more likely, assuming a standard indexing scheme), and then apply the locale filter, or vice versa. Either way, you should have less data moved around in the background, and thus better performance, if you put all your filters in one spot. Again, this all assumes that SQL Server is not now making giant leaps in optimization. But if so, that's even more reason to say, yes, there is no problem using the TVF.
It's a safe bet that you'll have to translate more than product names. So I'd design the translation solution to handle any kind of string.
For example, you could have a localization table like:
Id, TranslatableStringId, Language, Translation
Then each product could have a translatable string associated with it. But also the explanatory text on top of the product list.
For products, you'd query like:
SELECT *
FROM Products p
INNER JOIN Translations t
ON p.DescriptionId = t.TranslatableStringId
AND t.language = 'en-US'
For an explanatory text, you'd get a simple:
SELECT t.Translation
FROM Translations t
WHERE t.TranslatableStringId = 123 -- ID of string
AND t.language = 'en-US'
P.S. For a real program, I'd use a shorter name than TranslatableStringId, like tsid, because translations tend to pop up everywhere.
I wanted to come back with an answer to this after doing a lot more testing.
It appears to me that SQL 2008 is actually looking inside the TVF when building the query plan and optimizing accordingly:
For instance:
select pr.* from LocalizedProducts('en-US') pr
inner join LocalizedPhotos('en-US') ph on ph.ProductId = pr.ProductID
where pr.SomeUnindexProperty = 5
This query needs to touch 4 tables:
Products
Products_Localization
Photos
Photos_Localization
The way the query plan looks is (let me see if I can format this):
Products gets a Clustered Index Seek
-> nested loop with Photos
-> nested loop with Products_Localization
-> nested loop with Photos_Localization
Which is not what you would expect if the TVF were a black box. The simple fact that Products gets an index seek suggests to me that the query does not blindly evaluate the entire TVF.
I ran a lot of performance tests, and on average the "localization" TVFs are between 50% and 100% slower than direct table queries, but that is to be expected, as twice as many tables are involved in the TVFs as in the normal queries.