SQL many-to-many matching - sql

I'm implementing a tagging system for a website. There are multiple tags per object and multiple objects per tag. This is accomplished by maintaining a table with two values per record, one for the ids of the object and the tag.
I'm looking to write a query to find the objects that match a given set of tags. Suppose I had the following data (in [object] -> [tags]* format)
apple -> fruit red food
banana -> fruit yellow food
cheese -> yellow food
firetruck -> vehicle red
If I want to match (red), I should get apple and firetruck. If I want to match (fruit, food) I should get (apple, banana).
How do I write a SQL query to do what I want?
@Jeremy Ruten,
Thanks for your answer. The notation above was only meant to show sample data; my database does have a table with one object id and one tag per record.
Second, my problem is that I need to get the objects that match all of the tags. Substituting an AND for your OR, like so:
SELECT object WHERE tag = 'fruit' AND tag = 'food';
Yields no results when run.

Given:
object table (primary key id)
objecttags table (foreign keys objectId, tagid)
tags table (primary key id)
SELECT distinct o.*
from object o join objecttags ot on o.Id = ot.objectid
join tags t on ot.tagid = t.id
where t.Name = 'fruit' or t.name = 'food';
Using OR seems backwards, since you want AND, but the two tags are never on the same row, so an AND yields nothing: a single row cannot be both 'fruit' and 'food'.
The join produces one row per object per matching tag, so without DISTINCT you would usually get duplicates.
If you really want AND semantics here, you need a GROUP BY and a HAVING that compares the count to the number of tags you are searching for, for example:
SELECT o.name, count(*) as tag_count
from object o join objecttags ot on o.Id = ot.objectid
join tags t on ot.tagid = t.id
where t.Name = 'fruit' or t.name = 'food'
group by o.name
having count(*) = 2;
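As a way to check the GROUP BY / HAVING idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the three tables named in this answer; the helper name objects_with_all_tags, the id values, and the use of SQLite are assumptions made for illustration only.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE objecttags (objectid INTEGER, tagid INTEGER);
    INSERT INTO object VALUES (1,'apple'),(2,'banana'),(3,'cheese'),(4,'firetruck');
    INSERT INTO tags VALUES (1,'fruit'),(2,'red'),(3,'food'),(4,'yellow'),(5,'vehicle');
    INSERT INTO objecttags VALUES
        (1,1),(1,2),(1,3),   -- apple: fruit red food
        (2,1),(2,4),(2,3),   -- banana: fruit yellow food
        (3,4),(3,3),         -- cheese: yellow food
        (4,5),(4,2);         -- firetruck: vehicle red
""")

def objects_with_all_tags(conn, wanted):
    """Return names of objects that carry every tag in `wanted` (hypothetical helper)."""
    wanted = sorted(set(wanted))  # duplicates would break the count comparison
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT o.name
        FROM object o
        JOIN objecttags ot ON o.id = ot.objectid
        JOIN tags t ON ot.tagid = t.id
        WHERE t.name IN ({placeholders})
        GROUP BY o.id, o.name
        HAVING COUNT(DISTINCT t.id) = ?   -- one distinct match per wanted tag
        ORDER BY o.name
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

red_matches = objects_with_all_tags(conn, ["red"])
fruit_food_matches = objects_with_all_tags(conn, ["fruit", "food"])
```

COUNT(DISTINCT t.id) is used instead of COUNT(*) so that an accidental duplicate row in objecttags does not inflate the count.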

Oh gosh I may have mis-interpreted your original comment.
The easiest way to do this in SQL would be to have three tables:
1) Tags ( tag_id, name )
2) Objects (whatever that is)
3) Object_Tag( tag_id, object_id )
Then you can ask virtually any question you want of the data quickly, easily, and efficiently (provided you index appropriately). If you want to get fancy, you can allow multi-word tags, too (there's an elegant way, and a less elegant way, I can think of).
I assume that's what you've got, so this SQL below will work:
The literal way:
SELECT obj
FROM object
WHERE EXISTS( SELECT *
FROM tags
WHERE tag = 'fruit'
AND oid = object_id )
AND EXISTS( SELECT *
FROM tags
WHERE tag = 'Apple'
AND oid = object_id )
There are also other ways you can do it, such as:
SELECT oid
FROM tags
WHERE tag = 'Apple'
INTERSECT
SELECT oid
FROM tags
WHERE tag = 'Fruit'

@Kyle: Your query should be more like:
SELECT object WHERE tag IN ('fruit', 'food');
Your query was looking for rows where the tag was both fruit AND food, which is impossible seeing as the field can only have one value, not both at the same time.

Combine Steve M.'s suggestion with Jeremy's and you'll get a single record with what you are looking for:
select object
from tblTags
where tag = @firstMatch
and (
@secondMatch is null
or
object in (select object from tblTags where tag = @secondMatch)
)
Now, that doesn't scale very well, but it will get what you are looking for. I think there is a better way to do this, so that you can match N tags without much impact on the code, but it currently escapes me.

I recommend the following schema.
Objects: objectID, objectName
Tags: tagID, tagName
ObjectTag: objectID,tagID
With the following query.
select distinct
objectName
from
ObjectTag ot
join Objects o
on o.objectID = ot.objectID
join Tags t
on t.tagID = ot.tagID
where
tagName in ('red','fruit')

I'd suggest making your table have 1 tag per record, like this:
apple -> fruit
apple -> red
apple -> food
banana -> fruit
banana -> yellow
banana -> food
Then you could just
SELECT object WHERE tag = 'fruit' OR tag = 'food';
If you really want to do it your way though, you could do it like this:
SELECT object WHERE tag LIKE 'red' OR tag LIKE '% red' OR tag LIKE 'red %' OR tag LIKE '% red %';

Related

Flatten multiple query results with same ID to single row?

I'm curious about something in a SQL Server database. My current query pulls data about my employer's items for sale. It finds information for just under 105,000 items, which is correct. However, it returns over 155,000 rows, because each item has other things related to it. Right now, I run that data through a loop in Python, manually flattening it out by checking if the item the loop is working on is the same one it just worked on. If it is, I start filling in that item's extra information. Ideally, the SQL would return all this data already put into one row.
Here is an overview of the setup. I'm leaving out a few details for simplicity's sake, since I'm curious about the general theory, not looking for something I can copy and paste.
Item: contains the item ID, SKU, description, vendor ID, weight, and dimensions.
AttributeName: contains attr_id and attr_text. For instance, "color", "size", or "style".
AttributeValue: contains attr_value_id and attr_text. For instance, "blue" or "small".
AttributeAssign: contains item_id and attr_id. This ties attribute names to items.
attributeValueAssign: contains item_id and attr_value_id, tying attribute values to items.
A series of attachments is set up in a similar way, but with attachment and attachmentAssignment. Attachments can have only values, no names, so there is no need for the extra complexity of a third table as there is with attributes.
Vendor is simple: the ID is used in the item table. That is:
select item_id, vendorName
from item
join vendor on vendor_id = item.vendorNumber
gets you the name of an item's vendor.
Now, the fun part: items may or may not have vendors, attributes, or attachments. If they have either of the latter two, there's no way to know how many they have. I've seen items with 0 attributes and items with 5. Attachments are simpler, as there can only be 0 or 1 per item, but the possibility of 0 still demands a left outer join so I am guaranteed to get all the items.
That's how I get multiple rows per item. If an item has three attributes, I get either four or seven rows for just that item--I'm not sure if it's a row per name/value or a row per name AND a row per value. Either way, this is the kind of thing I'd like to stop. I want each row in my result set to contain all attributes, with a cap at seven and null for any missing attribute. That is, something like:
item_id; item_title; item_sku; ... attribute1_name; attribute1_value; attribute2_name; attribute2_value; ... attribute7_value
1; some random item; 123-45; ... color; blue; size; medium; ... null
Right now, I'd get multiple rows for that, such as (only ID and attributes):
ID; attribute 1 name; attribute 1 value; attribute 2 name; attribute 2 value
1; color; blue; null; null
1; color; blue; size; medium
I'm after the second row only--all the information put together into one row per unique item ID. Currently, though, I get multiple rows, and Python has to put everything together. I'm outputting this to a spreadsheet, so information about an item has to be on that item's row.
I can just keep using Python if this is too much bother. But I wondered if there was a way to do it that would be relatively easy. My script works fine, and execution time isn't a concern. This is more for my own curiosity than a need to get anything working. Any thoughts on how--or if--this is possible?
Here is @WCWedin's answer modified to use a CTE.
WITH attrib_rn as
(
select
*, row_number() over(partition by item_id order by name, attribute_id) as row_number
from attributes
)
select
i.item_id,
attr1.name as attribute1_name, attr1.value as attribute1_value,
...
attr7.name as attribute7_name, attr7.value as attribute7_value
from items i
left join attrib_rn as attr1 ON attr1.item_id = i.item_id AND attr1.row_number = 1
left join attrib_rn as attr2 ON attr2.item_id = i.item_id AND attr2.row_number = 2
left join attrib_rn as attr3 ON attr3.item_id = i.item_id AND attr3.row_number = 3
left join attrib_rn as attr4 ON attr4.item_id = i.item_id AND attr4.row_number = 4
left join attrib_rn as attr5 ON attr5.item_id = i.item_id AND attr5.row_number = 5
left join attrib_rn as attr6 ON attr6.item_id = i.item_id AND attr6.row_number = 6
left join attrib_rn as attr7 ON attr7.item_id = i.item_id AND attr7.row_number = 7
Since you only want the first 7 attributes and you want to keep all of the logic in the SQL query, you're probably looking at using row_number. Subqueries will do the job directly with multiple joins, and the performance will probably be pretty good since you're only joining so many times.
select
i.item_id,
attr1.name as attribute1_name, attr1.value as attribute1_value,
...
attr7.name as attribute7_name, attr7.value as attribute7_value
from
items i
left join (
select
*, row_number() over(partition by item_id order by name, attribute_id) as row_number
from
attributes
) AS attr1 ON
attr1.item_id = i.item_id
AND attr1.row_number = 1
...
left join (
select
*, row_number() over(partition by item_id order by name, attribute_id) as row_number
from
attributes
) AS attr7 ON
attr7.item_id = i.item_id
AND attr7.row_number = 7
In SQL Server, you can tackle this with a subquery containing 'ROW_NUMBER() OVER', and a few CASE expressions to map the top 7 into columns.
A little tricky, but post your full query that returns the big list and I'll demonstrate how to transpose it.
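The ROW_NUMBER-plus-CASE idea can be sketched as follows. This is only an illustration under assumptions: it runs against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), it reuses the simplified items/attributes tables from the answers above rather than the asker's real schema, and it caps at two attributes instead of seven to stay short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema: one row per (item, attribute name, value).
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE attributes (
        attribute_id INTEGER PRIMARY KEY, item_id INTEGER, name TEXT, value TEXT
    );
    INSERT INTO items VALUES (1, 'some random item'), (2, 'item without attributes');
    INSERT INTO attributes VALUES (10, 1, 'size', 'medium'), (11, 1, 'color', 'blue');
""")

FLATTEN_SQL = """
    WITH attrib_rn AS (
        -- Number each item's attributes 1, 2, 3, ... in a stable order.
        SELECT item_id, name, value,
               ROW_NUMBER() OVER (PARTITION BY item_id
                                  ORDER BY name, attribute_id) AS rn
        FROM attributes
    )
    SELECT i.item_id,
           -- Each CASE picks out one numbered attribute; MAX collapses the group.
           MAX(CASE WHEN a.rn = 1 THEN a.name  END) AS attribute1_name,
           MAX(CASE WHEN a.rn = 1 THEN a.value END) AS attribute1_value,
           MAX(CASE WHEN a.rn = 2 THEN a.name  END) AS attribute2_name,
           MAX(CASE WHEN a.rn = 2 THEN a.value END) AS attribute2_value
    FROM items i
    LEFT JOIN attrib_rn a ON a.item_id = i.item_id AND a.rn <= 2
    GROUP BY i.item_id
    ORDER BY i.item_id
"""

flattened = conn.execute(FLATTEN_SQL).fetchall()
```

Compared with seven separate left joins, this joins the numbered attributes once and lets the GROUP BY produce one row per item; items with no attributes still appear, with NULL in every attribute column.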

How to use a variable AS a where clause?

I have one where clause which I have to use multiple times. I am quite new to Oracle SQL, so please forgive me for my newbie mistakes :). I have read this website, but could not find the answer :(. Here's the SQL statement:
var condition varchar2(100)
exec :condition := 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from table_name
where category = Y AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
The content field is a CLOB field and unfortunately all values needed are in the same column. My query does not work, of course.
You can't use a bind variable for that much of a where clause, only for specific values. You could use a substitution variable if you're running this in SQL*Plus or SQL Developer (and maybe some other clients):
define condition = 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND &condition
...
From other places, including JDBC and OCI, you'd need to have the condition as a variable and build the query string using that, so it's repeated in the code that the parser sees. From PL/SQL you could use dynamic SQL to achieve the same thing. I'm not sure why just repeating the conditions is a problem though, binding arguments if values are going to change. Certainly with two clauses like this it seems a bit pointless.
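To illustrate building the query string in client code, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for an Oracle client, substr in place of DBMS_LOB.SUBSTR, and hypothetical names (my_table, column1, column2) that are not from the original query. The shared condition is a constant written by the programmer, never user input, and the changing values are still passed as bind arguments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the real table.
    CREATE TABLE my_table (category TEXT, column1 INTEGER, column2 INTEGER, content TEXT);
    INSERT INTO my_table VALUES
        ('X', 1, 2, 'abcdef'),
        ('Y', 1, 2, 'hello world'),
        ('X', 9, 9, 'zzzzzz');
""")

# The condition is written once and spliced into the SQL text twice,
# so the parser sees it repeated. Only trusted, constant text goes here.
SHARED_CONDITION = "column1 = ? AND column2 = ?"

QUERY = f"""
    SELECT a.content, b.content
    FROM (SELECT DISTINCT substr(content, 1, 3) AS content
          FROM my_table
          WHERE category = ? AND {SHARED_CONDITION}) a,
         (SELECT DISTINCT substr(content, 1, 100) AS content
          FROM my_table
          WHERE category = ? AND {SHARED_CONDITION}) b
"""

# The bind values follow the order of the placeholders in the final text.
condition_values = (1, 2)
params = ("X", *condition_values, "Y", *condition_values)
result = conn.execute(QUERY, params).fetchall()
```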
But maybe you could approach this from a different angle and remove the need to repeat the where clause. Querying the table twice might not be efficient anyway. You could apply your condition once as a subquery, but without knowing your indexes or the selectivity of the conditions this could be worse:
with sub_table as (
select category, content
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from sub_table
where category = X
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from sub_table
where category = Y
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
I'm not sure what the grouping is for - to eliminate duplicates? This only really makes sense if you have a single X and Y record matching the other conditions, doesn't it? Maybe I'm not following it properly.
You could also use a case statement:
select max(content_x), max(content_y)
from (
select
case when category = X
then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3) end as content_x,
case when category = Y
then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100) end as content_y
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)

Parent-child sql query with order by and limit

I have two tables, DOCUMENT and ATTRIBUTE, like these
DOCUMENT(id),
ATTRIBUTE(name, value, doc_fk).
I need to run a query that works like this "abstract query"
select top 100 documents
where $state='COMPLETED'
order by $creationDate
Where $state and $creationDate are two attributes.
Note that the limit is on documents, not attributes, and sort and filter are on two different attributes. The final query should return all document attributes, not only the filtered/sorted ones.
I was able to write this with a very complex query and I'm looking for better alternatives. I could post my solution if useful, but I do not want to point you in the, possibly, wrong direction.
It's ok to get a FEW extra documents, like 1000 instead of 100, and filter/sort in memory.
Could be ok for the limit not to be exact, like 74 instead of the required limit 100, but not too far from it.
Extra "soft" requirements:
the query should work with several databases (oracle, mysql and sqlserver), so weird analytic functions should be avoided unless available on all platforms
should work with JPA (eclipselink 2.4.0 implementation)
The expected output is something like this
DOC_ID ATTRIBUTE_NAME VALUE
123 state COMPLETED
123 creationDate 21/11/2012
123 userid someone
456 state COMPLETED
...
Ah, the flaws of an EAV design.
Try this.
select
top 100
document.*
from document
inner join attribute astate on document.id = astate.doc_fk
and astate.name='state'
and astate.value = 'COMPLETED'
inner join attribute acreation on document.id = acreation.doc_fk
and acreation.name='creationDate'
order by cast(acreation.value as date)
But it's only going to get more complicated if you persist with this EAV structure.
(PS. MySQL doesn't use TOP, but LIMIT instead)
SELECT doc_id, attr_name, attr_val, creationDate FROM
(
SELECT * FROM (
SELECT
doc.id as 'doc_id', attr.name as 'attr_name', null as 'attr_val', attr.value as 'creationDate'
FROM
ATTRIBUTE attr
LEFT JOIN
DOCUMENT doc ON attr.doc_fk = doc.id
WHERE
attr.name='creationDate'
ORDER BY creationDate desc
) AS dt1
UNION ALL
SELECT * FROM(
SELECT
doc.id as 'doc_id', attr.name as 'attr_name', attr.value as 'attr_val', null as 'creationDate'
FROM
ATTRIBUTE attr
LEFT JOIN
DOCUMENT doc ON attr.doc_fk = doc.id
) as dt2
) as dt0 GROUP BY doc_id ORDER by creationDate desc LIMIT 100;
Derived table 1 (dt1) gives you all the creation-date attributes, so that you can order the results by each document's creation date.
Derived table 2 (dt2) gives you all the attributes. Putting the two together with UNION ALL lets you group by document and then order by creation date.
Hope this is in the right direction.
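Another way to read the requirement is as two steps: first pick the limited, sorted set of document ids, then fetch every attribute of those documents. The sketch below tries that against SQLite via Python's sqlite3 module. It rests on assumptions that are not in the question: dates are stored as ISO text (so they sort correctly as strings), LIMIT is available (Oracle and SQL Server spell the row limit differently), and the helper name top_documents is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY);
    CREATE TABLE attribute (name TEXT, value TEXT, doc_fk INTEGER);
    INSERT INTO document VALUES (1), (2), (3);
    INSERT INTO attribute VALUES
        ('state', 'COMPLETED', 1), ('creationDate', '2012-11-21', 1), ('userid', 'someone', 1),
        ('state', 'PENDING',   2), ('creationDate', '2012-11-20', 2),
        ('state', 'COMPLETED', 3), ('creationDate', '2012-11-19', 3), ('userid', 'other', 3);
""")

def top_documents(conn, limit):
    """Return all attributes of the first `limit` COMPLETED documents by creation date."""
    sql = """
        SELECT a.doc_fk, a.name, a.value
        FROM attribute a
        JOIN (
            -- Step 1: filter on one attribute, sort on another, limit on documents.
            SELECT s.doc_fk AS doc_id, c.value AS created
            FROM attribute s
            JOIN attribute c ON c.doc_fk = s.doc_fk AND c.name = 'creationDate'
            WHERE s.name = 'state' AND s.value = 'COMPLETED'
            ORDER BY c.value
            LIMIT ?
        ) top_docs ON top_docs.doc_id = a.doc_fk
        -- Step 2: every attribute of the chosen documents, in document order.
        ORDER BY top_docs.created, a.doc_fk, a.name
    """
    return conn.execute(sql, (limit,)).fetchall()

first_only = top_documents(conn, 1)
all_completed = top_documents(conn, 100)
```

The limit applies to documents rather than attribute rows because it sits inside the derived table, before the join back to attribute.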

SQL query on a condition

I'm writing a query to retrieve translated content. I want it so that if there isn't a translation for the given language id, it automatically returns the translation for the default language, with Id 1.
select Translation.Title
,Translation.Summary
from Translation
where Translation.FkLanguageId = 3
-- If there is no LanguageId of 3, select the record with LanguageId of 1.
I'm working in MS SQL but I think the theory is not DBMS-specific.
Thanks in advance.
This assumes one row per Translation only, based on how you phrased the question. If you have multiple rows per FkLanguageId and I've misunderstood, please let us know and the query becomes more complex of course
select TOP 1
Translation.Title
,Translation.Summary
from
Translation
where
Translation.FkLanguageId IN (1, 3)
ORDER BY
FkLanguageId DESC
You'd use LIMIT in another RDBMS
Assuming the table contains different phrases grouped by PhraseId
WITH Trans As
(
select Translation.Title
,Translation.Summary
,ROW_NUMBER() OVER (PARTITION BY PhraseId ORDER BY FkLanguageId DESC) RN
from Translation
where Translation.FkLanguageId IN (1,3)
)
SELECT *
FROM Trans WHERE RN=1
This assumes the existence of a TranslationKey that associates one "topic" with several different translation languages:
SELECT
isnull(tX.Title, t1.Title) Title
,isnull(tX.Summary, t1.Summary) Summary
from Translation t1
left outer join Translation tX
on tx.TranslationKey = t1.Translationkey
and tx.FkLanguageId = @TargetLanguageId
where t1.FkLanguageId = 1 -- "Default" language
Maybe this is a dirty solution, but it can help you
if not exists(select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3)
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 1
else
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3
Since your reference to pastie.org shows that you're looking up phrases or specific menu item names in a table I'm going to assume that there is a phrase ID to identify the phrases in question.
SELECT ISNULL(forn_lang.Title, default_lang.Title) Title,
ISNULL(forn_lang.Summary, default_lang.Summary) Summary
FROM Translation default_lang
LEFT OUTER JOIN Translation forn_lang ON default_lang.PhraseID = forn_lang.PhraseID AND forn_lang.FkLanguageId = 3
WHERE default_lang.FkLanguageId = 1
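The self-join fallback can be tried out with a small sketch like the one below. It assumes a PhraseID column as in the answer above, uses SQLite through Python's sqlite3 module rather than MS SQL, and swaps the T-SQL ISNULL for the standard COALESCE; the sample rows and the helper name translations are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Translation (PhraseID INTEGER, FkLanguageId INTEGER, Title TEXT, Summary TEXT);
    INSERT INTO Translation VALUES
        (1, 1, 'Home',    'Start page'),
        (1, 3, 'Accueil', 'Page de depart'),
        (2, 1, 'Contact', 'How to reach us');  -- no language 3 row for phrase 2
""")

def translations(conn, language_id, default_language_id=1):
    """Return each phrase in `language_id`, falling back to the default language."""
    sql = """
        SELECT d.PhraseID,
               COALESCE(t.Title, d.Title)     AS Title,
               COALESCE(t.Summary, d.Summary) AS Summary
        FROM Translation d
        LEFT JOIN Translation t
               ON t.PhraseID = d.PhraseID AND t.FkLanguageId = ?
        WHERE d.FkLanguageId = ?
        ORDER BY d.PhraseID
    """
    return conn.execute(sql, (language_id, default_language_id)).fetchall()

language_3 = translations(conn, 3)
language_7 = translations(conn, 7)  # no rows exist for language 7
```

Note that the fallback here is per column: if a translated row exists but its Summary is NULL, the default Summary is used alongside the translated Title.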

How to write a query returning non-chosen records

I have written a psychological testing application, in which the user is presented with a list of words, and s/he has to choose ten words which very much describe himself, then choose words which partially describe himself, and words which do not describe himself. The application itself works fine, but I was interested in exploring the meta-data possibilities: which words have been most frequently chosen in the first category, and which words have never been chosen in the first category. The first query was not a problem, but the second (which words have never been chosen) leaves me stumped.
The table structure is as follows:
table words: id, name
table choices: pid (person id), wid (word id), class (value between 1-6)
Presumably the answer involves a left join between words and choices, but there has to be a modifying statement - where choices.class = 1 - and this is causing me problems. Writing something like
select words.name
from words left join choices
on words.id = choices.wid
where choices.class = 1
and choices.pid = null
causes the database manager to go on a long trip to nowhere. I am using Delphi 7 and Firebird 1.5.
TIA,
No'am
Maybe this is a bit faster:
SELECT w.name
FROM words w
WHERE NOT EXISTS
(SELECT 1
FROM choices c
WHERE c.class = 1 and c.wid = w.id)
Something like that should do the trick:
SELECT name
FROM words
WHERE id NOT IN
(SELECT DISTINCT wid -- DISTINCT is actually redundant
FROM choices
WHERE class = 1)
SELECT words.name
FROM
words
LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
WHERE choices.pid IS NULL
Make sure you have an index on choices (class, wid).
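To compare the approaches side by side, here is a minimal sketch on made-up data, run against SQLite via Python's sqlite3 module rather than Firebird 1.5. It shows the NOT EXISTS and LEFT JOIN ... IS NULL forms returning the same words, and why the query from the question returns nothing: the class filter in the WHERE clause discards the unmatched rows, and a comparison with = null is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE choices (pid INTEGER, wid INTEGER, class INTEGER);
    INSERT INTO words VALUES (1, 'calm'), (2, 'bold'), (3, 'shy');
    INSERT INTO choices VALUES
        (100, 1, 1),   -- 'calm' chosen in class 1
        (100, 2, 3),   -- 'bold' chosen, but only in class 3
        (101, 1, 2);   -- 'shy' never chosen at all
    CREATE INDEX idx_choices_class_wid ON choices (class, wid);
""")

NOT_EXISTS_SQL = """
    SELECT w.name FROM words w
    WHERE NOT EXISTS (SELECT 1 FROM choices c WHERE c.class = 1 AND c.wid = w.id)
    ORDER BY w.name
"""

# The class filter belongs in the ON clause so unmatched words survive the join.
LEFT_JOIN_SQL = """
    SELECT words.name
    FROM words
    LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
    WHERE choices.pid IS NULL
    ORDER BY words.name
"""

# The shape of the query from the question: filter in WHERE, and '= null'.
BROKEN_SQL = """
    SELECT words.name
    FROM words LEFT JOIN choices ON words.id = choices.wid
    WHERE choices.class = 1 AND choices.pid = NULL
"""

never_chosen_not_exists = [r[0] for r in conn.execute(NOT_EXISTS_SQL)]
never_chosen_left_join = [r[0] for r in conn.execute(LEFT_JOIN_SQL)]
broken_result = [r[0] for r in conn.execute(BROKEN_SQL)]
```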