I've got a SQL related question regarding a general database structure that seems to be somewhat common. I came up with it one day while trying to solve a problem and (later on) I've seen other people do the same thing (or something remarkably similar) so I think the structure itself makes sense. I just have trouble trying to form certain queries against it.
The idea is that you've got a table with "items" in it and you want to store a set of fields and their values for each item. Normally this would be done by simply adding columns to the items table, the problem is that the field(s) themselves (not just the values) vary from item to item. For example, I might have two items:
Article 1
product_id = aproductid
hidden_key = ahiddenkeyvalue
Article 2
product_id = anotherproductid
address = anaddress
You can see that both items have a product_id field (with different values) but the data stored for each item is different.
The structure in the database ends up something like this:
ItemsTable
id
itemdata_1
itemdata_2
...
FieldsTable
id
field_name
...
And the table that relates them and makes it work
FieldsItemRelationsTable
field_id
item_id
value
Well when I'm trying to do something that involves just one "dynamic field" value there's no problem. I usually do something similar to:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE v.value = 50 AND f.name = 'product_id';
Which selects all items where product_id=50
The problem arises when I need to do something involving multiple "dynamic field" values. Say I want to select all items where product_id = 50 AND hidden_key = 30. Is it possible with a single SQL statement? I've tried:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE (v.value = 50 AND f.name = 'product_id')
AND (v.value = 30 AND f.name = 'hidden_key');
But it just returns zero rows.
You'll need to do a seperate join for each value you are bringing back...
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id
WHERE v.value = 50 AND f.name = 'product_id'
AND (v2.value = 30 AND f2.name = 'hidden_key');
er...that query might not function (a bit of a copy/paste sludge job on my part), but you get the idea...you'll need the second value held in a second instance of the table(s) (v2 and f2 in my example here) that is seperate than the first instance. v1.value = 30 and v2.value = 50. v1.value = 50 and v1.value = 30 should never return rows as nothing will equal 30 and 50 at the same time
As an after thought...the query will probably read easier had you put the where clause in the join statement
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id and v.value = 50
INNER JOIN FieldsTable f ON f.id = v.field_id and f.name = 'product_id'
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id and v2.value = 30
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id and f2.name = 'hidden_key'
Functionally both queries should operate the same though. I'm not sure if there's a logical limit...in scheduling systems you'll often see a setup for 'exceptions'...I've got a report query that's joining like this 28 times...one for each exception type returned.
It's called EAV
Some people hate it
There are alternatives (SO)
Sorry to be vague, but I would investigate your options more.
Try doing some left or right joins to see if you get any results. inner joins will not return results sometimes if there are null fields.
its a start.
Dont forget though, outer join = cartesian product
Related
Could anyone help me speed this query up? It currently take 17 minutes to run but does return the correct data and it populates a subform in MS Access. Functions in the rest of the VBA are declared as long to try to speed up more.
Here's the full query:
SELECT lots of things
FROM (((((((((((((((ngstest
INNER JOIN patients
ON ngstest.internalpatientid = patients.internalpatientid)
INNER JOIN referral
ON ngstest.referralid = referral.referralid)
INNER JOIN checker
ON ngstest.bookby = checker.check1id)
INNER JOIN ngspanel
ON ngstest.ngspanelid = ngspanel.ngspanelid)
LEFT JOIN ngspanel AS ngspanel_1
ON ngstest.ngspanelid_b = ngspanel_1.ngspanelid)
INNER JOIN status
ON ngstest.statusid = status.statusid)
INNER JOIN dbo_patient_table
ON patients.patientid = dbo_patient_table.patienttrustid)
LEFT JOIN dna
ON ngstest.dna = dna.dnanumber)
INNER JOIN status AS status_1
ON patients.s_statusoverall = status_1.statusid)
LEFT JOIN gw_gendertable
ON dbo_patient_table.genderid = gw_gendertable.genderid)
LEFT JOIN ngswesbatch
ON ngstest.wesbatch = ngswesbatch.ngswesbatchid)
LEFT JOIN checker AS checker_1
ON ngstest.check1id = checker_1.check1id)
LEFT JOIN checker AS checker_2
ON ngstest.check2id = checker_2.check1id)
LEFT JOIN checker AS checker_3
ON ngstest.check3id = checker_3.check1id)
LEFT JOIN ngspanel AS ngspanel_2
ON ngstest.ngspanelid_c = ngspanel_2.ngspanelid)
LEFT JOIN checker AS checker_4
ON ngstest.check4id = checker_4.check1id
WHERE ((ngstest.referralid IN
(SELECT referralid FROM referral
WHERE grouptypeid = 14)
AND ngstest.ngstestid IN
(SELECT ngstest.ngstestid
FROM ngsanalysis
INNER JOIN ngstest
ON ngsanalysis.ngstestid = ngstest.ngstestid
WHERE ngsanalysis.pedigree = 3302) )
AND status.statusid = 1202218800)
ORDER BY ngstest.priority,
ngstest.daterequested;
The two nested queries are strings from elsewhere in the code so are called in the vba as " & includereferralls & " And " & ParentsStatusesFilter & "
They are:
ParentsStatusesFilter = "NGSTest.NGSTestID in
(SELECT NGSTest.NGSTestID
FROM NGSAnalysis
INNER JOIN NGSTest
ON NGSAnalysis.NGSTestID = NGSTest.NGSTestID
WHERE NGSAnalysis.Pedigree IN (3302,3303,3304)"
And
includereferrals = "NGSTest.ReferralID
(SELECT referralid FROM referral WHERE referral.grouptypeid = 14)"
The query needs to remain readable (and therefore editable) so can't use things like Distinct, Group By or contain any Unions. Have tried Exists instead of In for the nested queries but that stops it from actually filtering the results.
WHERE EXISTS (SELECT NGSTest.NGSTestID
FROM NGSAnalysis
INNER JOIN NGSTest
ON NGSAnalysis.NGSTestID = NGSTest.NGSTestID
WHERE NGSAnalysis.Pedigree IN (3302,3303,3304)
So the exist clause you have there isn't tied to the outer query which would run similar to just added 1 = 1 to the where clause. I took your where clause and converted it. It should look something like this...
WHERE EXISTS (
SELECT referralid
FROM referral
WHERE grouptypeid = 14 AND ngstest.referralid = referral.referralid)
AND EXISTS (
SELECT ngsanalysis.ngstestid
FROM ngsanalysis
WHERE ngsanalysis.pedigree IN (3302,3303,3304) AND ngstest.ngstestid = ngsanalysis.ngstestid
)
AND status.statusid = 1202218800
Adding exists will speed it up a bit, but the the bulk of the slowness is the left joins. Access does not handle the left joins as well as SQL Server does. Change all your joins to inner joins and you will see the query runs very fast. This is obviously not ideal since some relationships are optional. What I have done to get around this is add a default record that replaces a null relationship.
Here is what that looks like for you: In the checker table you could add a record that represents a null value. So put a record into the checker table with check1id of -1 or 0. Then default check1id, check2id, check3id on ngstest to -1 or 0. You will need to do that type of thing for all tables you need to left join on.
I am facing a complex SQL query in some code, which is suppose to return products without duplicates (by the use of DISTINCT keywork at the beginning), here is the query:
SELECT DISTINCT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value` , s.`name` AS supplier_name
FROM `ps_product` p
INNER JOIN ps_product_shop product_shop
ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1)
LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`)
LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`)
LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.id_shop = 1 )
LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`)
LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`)
LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`)
LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`)
WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND p.`price` between 0 and 1000
AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1
ORDER BY p.`id_product` ASC LIMIT 1,4
But it returns 4 product with 2 products with same "id_product" (11941)
What I need is to return 4 products but of different ids each.
Anyone ?
Thanks a lot
Aymeric
[EDIT]
The result of this query shows 4 rows, with 2 having the same exact columns values EXCEPT for the id_feature_value column which 36 for one and 38 for the other.
SELECT DISTINCT gets all the distinct combinations of all selected fields in your query, not just the first field.
Now, you could solve that by using GROUP BY to select only distinct values of id_product specifically, like:
SELECT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value` , s.`name` AS supplier_name
FROM `ps_product` p
INNER JOIN ps_product_shop product_shop
ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1)
LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`)
LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`)
LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.id_shop = 1 )
LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`)
LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`)
LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`)
LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`)
WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND p.`price` between 0 and 1000
AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1
GROUP BY p.`id_product`
ORDER BY p.`id_product` ASC LIMIT 1,4
However, the problem now is that your query has multiple different values of all the other fields you are selected to choose from, and no deterministic way to pick from them. Even though the id_product is unique in it's table, it's not unique in the result set because in at least one of your JOINs there is a one-to-many relationship, meaning there are several rows that match the JOIN conditions.
On older versions of MySQL, it will just pick the first value it finds in this case, but on SQL Server it will actually error out and tell you that the remaining fields either have to be mentioned in the GROUP BY clause, or they have to be aggregated. So, you've got a few ways you can go from here:
You are on an old version of MySQL and you don't particularly care which values are returned for the rest of the fields, so leave the query as I've posted and use that. I wouldn't recommend this, as it's undefined behaviour so in theory it could change at MySQL's whim. All the values returned will be from the same result row though.
Add aggregate functions, such as MIN() or MAX() to the rest of the remaining fields in the select clause. This will reduce the possible values for the fields down to one, but you will probably end up with a mixture of values from different rows.
Remove any one-to-many JOINs from your query so that you only ever get one row back in the result set for each individual id_product. Then, fetch the remaining data you need in a separate query.
There may be other alternative solutions, but it depends a lot on which values you want returned for the rest of the rows and what RDBMS you are using. For example, on SQL Server you could potentially make use of PARTITION BY to select the first row for each distinct id_product deterministically.
I know there ust be a few hundred of this similar post, but I have tried all the other ways in MS Access and still cannot get it to work.
So my working code is as follows
SELECT FVR.*, V.[Week Commencing], F.Date, V.Date
FROM FVR
INNER JOIN (F
INNER JOIN V ON (F.[Week Commencing] = V.[Week Commencing]) AND (F.GUID = V.GUID))
ON (FVR.GUID = V.GUID) AND (FVR.GUID = F.GUID)
My desired effect would be to show the Dates of the "F" table that have no entries in the "V"Table.
Sorry for being crpytic on the tables but it is for work. I thoght i had a good idead on how to do most of this.
any help would be amazing as I have been pulling my hair over this for a while now.
Cheers and thanks in advance.
Editing this to add in the full code as it will make more sense.
I basically have am unable to produce the Data range from F(Forecast) that Does not match in V(Visits) am trying to bring up a list of forecasted dates that have not been visited using the Week Commencing and GUID from both tables, The FVR table is just a table that holds the regional data matching up to the GUID. #Hogan I tried your way and ended up with syntax errors, I almost got somewhere and then lost it again. I thought I had a bit more knowledge of SQL than this.
Full code is as follows
SELECT FVR.*, [Visits].[Week Commencing], [Forecast].[Forecast Date], [Visits].Date
FROM ForecastVisitRegion
INNER JOIN ([Forecast] INNER JOIN [Visits] ON ([Forecast].[Week Commencing] = [Visits].[Week Commencing])
AND ([Forecast].GUID = [Visits].GUID)) ON (FVR.GUID = [Visits].GUID)
AND (FVR.GUID = [External - Forecast].GUID)
Thanks again
Stephen Edwards
You need to use left joins:
SELECT FVR.*, V.[Week Commencing], NZ(V.Date,F.Date) as virtual_date
FROM FVR
LEFT JOIN F ON FVR.GUID = F.GUID
LEFT JOIN V ON FVR.GUID = V.GUID F.[Week Commencing] = V.[Week Commencing]
Not sure I understand why FVR is coming into the mix but you need a left Join.
Select F.*
from F
left join V on F.[Week Commencing] = V.[Week Commencing] AND F.GUID = V.GUID
where V.GUID is null
The left join ensures all the records (matched or not) from F are included in the result set. Then the where V.GUID is null removes the records where no match was found in V leaving you with the F records with no match.
Another approach would be to use the NOT EXISTS statement in the WHERE Clause
Select F.*
from F
where not exists (select * from V where F.[Week Commencing] = V.[Week Commencing] AND F.GUID = V.GUID)
I have a large stored procedure that is used to return results for a dialog with many selections. I have a new criteria to get "extra" rows if a particular bit column is set to true. The current setup looks like this:
SELECT
CustomerID,
FirstName,
LastName,
...
FROM HumongousQuery hq
LEFT JOIN (
-- New Query Text
) newSubQuery nsq ON hq.CustomerID = nsq.CustomerID
I have the first half of the new query:
SELECT DISTINCT
c.CustomerID,
pp.ProjectID,
ep.ProductID
FROM Customers c
JOIN Evaluations e (NOLOCK)
ON c.CustomerID = e.CustomerID
JOIN EvaluationProducts ep (NOLOCK)
ON e.EvaluationID = ep.EvaluationID
JOIN ProjectProducts pp (NOLOCK)
ON ep.ProductID = pp.ProductID
JOIN Projects p
ON pp.ProjectID = p.ProjectID
WHERE
c.EmployeeID = #EmployeeID
AND e.CurrentStepID = 5
AND p.IsComplete = 0
The Projects table has a bit column, AllowIndirectCustomers, which tells me that this project can use additional customers when the value is true. As far as I can tell, the majority of the different SQL constructs are geared towards adding additional columns to the result set. I tried different permutations of the UNION command, with no luck. Normally, I would turn to a table-valued function, but I haven't been able to make it work with this scenerio.
This one has been a stumper for me. Any ideas?
So basically, you're looking to negate the need to match pp.ProjectID = p.ProjectID when the flag is set. You can do that right in the JOIN criteria:
JOIN Projects p
ON pp.ProjectID = p.ProjectID OR p.AllowIndirectCustomers = 1
Depending on the complexity of your tables, this might not work out too easily, but you could do a case statement on your bit column. Something like this:
select table1.id, table1.value,
case table1.flag
when 1 then
table2.value
else null
end as secondvalue
from table1
left join table2 on table1.id = table2.id
Here's a SQL Fiddle demo
I've got a bunch of tables that I'm joining using a unique item id. The majority of the where clause conditions will be built programatically from a user sumbitted form (search box) and multiple conditions will often be tested against the same table, in this case item tags.
My experience with SQL is minimal, but I understand the basics. I want to find the ids of active (status=1) items that have been tagged with a tag of a certain type, with the values "cats" and "kittens". Tags are stored as (id, product_id, tag_type_id, value), with id being the only column requiring a unique value. My first attempt was;
select
distinct p2c.product_id
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id
inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id
where
tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
and p.status = 1
and lower(pt.value) = "cats"
and lower(pt.value) = "kittens"
but that returned nothing. I realised that the final AND condition was the problem, so tried using a self-join instead;
select
distinct p2c.product_id
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id
inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
inner join '.TABLE_PRODUCT_TAG.' pt2 on p.id = pt2.product_id
inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id
where
tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
and p.status = 1
and lower(pt.value) = "cats"
and lower(pt2.value) = "kittens"
Now everything works as expected and the result set is correct. So what do I want to know? To re-iterate, the results I'm after are the ids of active (status = 1) items that have been tagged with a tag of a certain type, with the values "cats" AND "kittens"...
Are self-joins the best way of achieving these results?
This query has the potential to be huge (I've omitted a category condition, of which there may be ~300), so does this self-join approach scale well? If not, is there an alternative?
Will the self-join approach be the best way forward (assuming there is an alternative) if I allow users to specify complex tag searches? ie "cats" and ("kittens" or "dogs") not "parrots".
wouldn't this work in your first query?
instead of
and lower(pt.value) = "cats"
and lower(pt.value) = "kittens"
do this
and lower(pt.value) in ("cats","kittens")
select
distinct p2c.product_id
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id
inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id
where
tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
and p.status = 1
and (lower(pt.value) = "cats" or lower(pt.value) = "kittens")
The problem with the initial query was this:
and lower(pt.value) = "cats"
and lower(pt.value) = "kittens"
There exists no tag for which the value is both "cats" and "kittens", therefore no records will be returned. Using an IN clause as SQLMenace suggests would be the solution - that way you're saying, "give me back any active item that has been tagged 'cats' or 'kittens'".
But if you want any active item that has BOTH tags - then you need to do something like your second query. It's not perfectly clear from your question if that's what you're after.
For something like your Question #3:
"cats" and ("kittens" or "dogs") not "parrots".
you would want pt1, pt2, and (in a subquery) pt3, and something like this:
and lower(pt1.value) = "cats"
and lower(pt2.value) in ("kittens", "dogs")
and not exists (select * from '.TABLE_PRODUCT_TAG.' pt3 where pt3.product_id = p.id and lower(pt3.value) = "parrots")
The broadly general case could get quite messy...
Your answer is "yes, that's a scalable technique". As far as adding complexity, I think you'll overreach your users' ability to understand what they are doing before you have an efficient-query problem.
Ok, let me re-state the question to make sure I understand:
You are trying to show all products that have two different specific tags ("cats" and "kittens") but the tags are stored in a 1-to-many table.
The double-join does work, but here's another alternative:
SELECT ...
FROM P
WHERE p.status = 1
AND p.ProductID IN (SELECT Product_ID FROM tags WHERE value = "cats")
AND p.ProductID IN (SELECT Product_ID FROM tags WHERE value = "kittens")
Just add additional AND statements depending on the options the user selects.
The SQL optimizer should actually treat this the same way it treats a join, so I don't think performance would scale any worse than your version. Worth testing with your dataset, though, to make sure.
You're building yet another entity-attribute-value data model. Since you asked about scalability, here's a warning: EAV models usually don't scale and don't perform on top of RDBMS. Ultimately this 'flexible' data model ends up clobbering the optimizer and you'll be scanning millions and millions of rows to fetch your few dogs and kittens. Wikipedia has a topic covering this model and some of the downsides. Don't know what your target DB is, for instance SQL Server CAT published a white paper with common problems in the EAV model.
AIR CODE
select
distinct p2c.product_id
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id
where
and p.status = 1
and 2 = (
SELECT COUNT(1)
FROM '.TABLE_PRODUCT_TAG.' pt
INNER JOIN '.TABLE_TAG_TYPE.' tt ON pt.tag_type_id = tt.id
WHERE tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
AND pt.product_id = p.id /* edit */
lower(pt.value) IN( "cats", "kittens" )
)