I have the following table structure.
USERS
PROPERTY_VALUE
PROPERTY_NAME
USER_PROPERTY_MAP
I am trying to retrieve user/s from the users table who have matching properties in property_value table.
A single user can have multiple properties. The example data here has 2 properties for user '1', but there can be more than 2. I want to use all those user properties in the WHERE clause.
This query works if a user has a single property, but it fails for more than one property:
SELECT * FROM users u
INNER JOIN user_property_map upm ON u.id = upm.user_id
INNER JOIN property_value pv ON upm.property_value_id = pv.id
INNER JOIN property_name pn ON pv.property_name_id = pn.id
WHERE (pn.id = 1 AND pv.id IN (SELECT id FROM property_value WHERE value like '101')
AND pn.id = 2 AND pv.id IN (SELECT id FROM property_value WHERE value like '102')) and u.user_name = 'user1' and u.city = 'city1'
I understand since the query has pn.id = 1 AND pn.id = 2 it will always fail because pn.id can be either 1 or 2 but not both at the same time. So how can I re-write it to make it work for n number of properties?
In the example data above there is only one user, with id = 1, that has both matching properties used in the WHERE clause. The query should return a single record with all columns of the USERS table.
To clarify my requirements
I am working on an application that has a users list page on the UI listing all users in the system. This list has information like user id, user name, city etc. - all columns of the USERS table. Users can have properties as detailed in the database model above.
The users list page also provides functionality to search users based on these properties. When searching for users with 2 properties, 'property1' and 'property2', the page should fetch and display only matching rows. Based on the test data above, only user '1' fits the bill.
A user with 4 properties including 'property1' and 'property2' qualifies. But a user with only one property 'property1' would be excluded due to the missing 'property2'.
This is a case of relational-division. I added the tag.
Indexes
Assuming a PK or UNIQUE constraint on USER_PROPERTY_MAP(property_value_id, user_id) - columns in this order to make my queries fast. Related:
Is a composite index also good for queries on the first field?
You should also have an index on PROPERTY_VALUE(value, property_name_id, id). Again, columns in this order. Add the last column id only if you get index-only scans out of it.
For a given number of properties
There are many ways to solve it. This should be one of the simplest and fastest for exactly two properties:
SELECT u.*
FROM users u
JOIN user_property_map up1 ON up1.user_id = u.id
JOIN user_property_map up2 USING (user_id)
WHERE up1.property_value_id =
(SELECT id FROM property_value WHERE property_name_id = 1 AND value = '101')
AND up2.property_value_id =
(SELECT id FROM property_value WHERE property_name_id = 2 AND value = '102')
-- AND u.user_name = 'user1' -- more filters?
-- AND u.city = 'city1'
Not visiting table PROPERTY_NAME, since you seem to have resolved property names to IDs already, according to your example query. Else you could add a join to PROPERTY_NAME in each subquery.
We have assembled an arsenal of techniques under this related question:
How to filter SQL results in a has-many-through relation
For an unknown number of properties
@Mike and @Valera have very useful queries in their respective answers. To make this even more dynamic:
WITH input(property_name_id, value) AS (
VALUES -- provide n rows with input parameters here
(1, '101')
, (2, '102')
-- more?
)
SELECT *
FROM users u
JOIN (
SELECT up.user_id AS id
FROM input
JOIN property_value pv USING (property_name_id, value)
JOIN user_property_map up ON up.property_value_id = pv.id
GROUP BY 1
HAVING count(*) = (SELECT count(*) FROM input)
) sub USING (id);
Only add / remove rows from the VALUES expression. Or remove the WITH clause and the JOIN for no property filters at all.
The problem with this class of queries (counting all partial matches) is performance. My first query is less dynamic, but typically considerably faster. (Just test with EXPLAIN ANALYZE.) Especially for bigger tables and a growing number of properties.
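To make the comparison concrete, here is a minimal sketch that runs the two-join query and the counting query against the same toy data. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example is self-contained; the sample rows (including users 2 and 3) are invented for illustration, and the USING joins are spelled out as ON conditions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INT, value TEXT);
CREATE TABLE user_property_map (user_id INT, property_value_id INT,
                                PRIMARY KEY (property_value_id, user_id));
-- Hypothetical sample data: only user 1 has both (1,'101') and (2,'102').
INSERT INTO users VALUES (1,'user1','city1'), (2,'user2','city1'), (3,'user3','city2');
INSERT INTO property_value VALUES (1,1,'101'), (2,2,'102'), (3,2,'999');
INSERT INTO user_property_map VALUES (1,1), (1,2), (2,1), (3,1), (3,3);
""")

# One join to user_property_map per required property.
two_joins = """
SELECT u.id
FROM users u
JOIN user_property_map up1 ON up1.user_id = u.id
JOIN user_property_map up2 ON up2.user_id = u.id
WHERE up1.property_value_id =
      (SELECT id FROM property_value WHERE property_name_id = 1 AND value = '101')
AND   up2.property_value_id =
      (SELECT id FROM property_value WHERE property_name_id = 2 AND value = '102')
"""

# Count partial matches per user and keep users that matched every input row.
counting = """
WITH input(property_name_id, value) AS (VALUES (1, '101'), (2, '102'))
SELECT up.user_id AS id
FROM input i
JOIN property_value pv
  ON pv.property_name_id = i.property_name_id AND pv.value = i.value
JOIN user_property_map up ON up.property_value_id = pv.id
GROUP BY up.user_id
HAVING count(*) = (SELECT count(*) FROM input)
"""

ids_two_joins = [row[0] for row in con.execute(two_joins)]
ids_counting = [row[0] for row in con.execute(counting)]
print(ids_two_joins, ids_counting)
```

On data this small both queries return the same single user. The performance difference only shows with realistic volumes, so it is worth checking with EXPLAIN ANALYZE on your own tables.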
Best of both worlds?
This solution with a recursive CTE should be a good compromise: fast and dynamic:
WITH RECURSIVE input AS (
SELECT count(*) OVER () AS ct
, row_number() OVER () AS rn
, *
FROM (
VALUES -- provide n rows with input parameters here
(1, '101')
, (2, '102')
-- more?
) i (property_name_id, value)
)
, rcte AS (
SELECT i.ct, i.rn, up.user_id AS id
FROM input i
JOIN property_value pv USING (property_name_id, value)
JOIN user_property_map up ON up.property_value_id = pv.id
WHERE i.rn = 1
UNION ALL
SELECT i.ct, i.rn, up.user_id
FROM rcte r
JOIN input i ON i.rn = r.rn + 1
JOIN property_value pv USING (property_name_id, value)
JOIN user_property_map up ON up.property_value_id = pv.id
AND up.user_id = r.id
)
SELECT u.*
FROM rcte r
JOIN users u USING (id)
WHERE r.ct = r.rn; -- has all matches
The manual about recursive CTEs.
The added complexity does not pay for small tables where the additional overhead outweighs any benefit or the difference is negligible to begin with. But it scales much better and is increasingly superior to "counting" techniques with growing tables and a growing number of property filters.
Counting techniques have to visit all rows in user_property_map for all given property filters, while this query (as well as the 1st query) can eliminate irrelevant users early.
Optimizing performance
With current table statistics (reasonable settings, autovacuum running), Postgres has knowledge about "most common values" in each column and will reorder joins in the 1st query to evaluate the most selective property filters first (or at least not the least selective ones). Up to a certain limit: join_collapse_limit. Related:
Postgresql join_collapse_limit and time for query planning
Why does a slight change in the search term slow down the query so much?
This "deus-ex-machina" intervention is not possible with the 3rd query (recursive CTE). To help performance (possibly a lot) you have to place more selective filters first yourself. But even with the worst-case ordering it will still outperform counting queries.
Related:
Check statistics targets in PostgreSQL
Much more gory details:
PostgreSQL partial index unused when created on a table with existing data
More explanation in the manual:
Statistics Used by the Planner
SELECT *
FROM users u
WHERE u.id IN(
select m.user_id
from property_value v
join USER_PROPERTY_MAP m
on v.id=m.property_value_id
where (v.property_name_id, v.value) in( (1, '101'), (2, '102') )
group by m.user_id
having count(*)=2
)
OR
SELECT u.id
FROM users u
INNER JOIN user_property_map upm ON u.id = upm.user_id
INNER JOIN property_value pv ON upm.property_value_id = pv.id
WHERE (pv.property_name_id=1 and pv.value='101')
OR (pv.property_name_id=2 and pv.value='102')
GROUP BY u.id
HAVING count(*)=2
The property_name table is not needed in the query if the property_name_id values are known.
If you want just to filter:
SELECT users.*
FROM users
where (
select count(*)
from user_property_map
left join property_value on user_property_map.property_value_id = property_value.id
left join property_name on property_value.property_name_id = property_name.id
where user_property_map.user_id = users.id -- join with users table
and (property_name.name, property_value.value) in (
values ('property1', '101'), ('property2', '102') -- filter properties by name and value
)
) = 2 -- number of properties you filter by
Or, if you need users ordered descending by number of matches, you could do:
select * from (
SELECT users.*, (
select count(*) as property_matches
from user_property_map
left join property_value on user_property_map.property_value_id = property_value.id
left join property_name on property_value.property_name_id = property_name.id
where user_property_map.user_id = users.id -- join with users table
and (property_name.name, property_value.value) in (
values ('property1', '101'), ('property2', '102') -- filter properties by name and value
)
)
FROM users
) t
order by property_matches desc
SELECT * FROM users u
INNER JOIN user_property_map upm ON u.id = upm.user_id
INNER JOIN property_value pv ON upm.property_value_id = pv.id
INNER JOIN property_name pn ON pv.property_name_id = pn.id
WHERE (pn.id = 1 AND pv.id IN (SELECT id FROM property_value WHERE value
like '101') )
OR ( pn.id = 2 AND pv.id IN (SELECT id FROM property_value WHERE value like
'102'))
OR (...)
OR (...)
You can't use AND because there is no case where id is both 1 and 2 for the SAME ROW; the WHERE condition is evaluated for each row separately!
If you run a simple test, like
SELECT * FROM users where id=1 and id=2
you will get 0 results. To achieve that use
id in (1,2)
or
id=1 or id=2
That query can be optimised more but this is a good start I hope.
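One caveat worth spelling out: OR on its own returns users that match any of the listed properties, not all of them, so a user with only 'property1' would still show up. Below is a minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in database and invented sample rows (user 2 is hypothetical and has only the first property).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INT, value TEXT);
CREATE TABLE user_property_map (user_id INT, property_value_id INT);
INSERT INTO property_value VALUES (1,1,'101'), (2,2,'102');
-- Hypothetical data: user 1 has both properties, user 2 has only the first.
INSERT INTO user_property_map VALUES (1,1), (1,2), (2,1);
""")

base = """
SELECT upm.user_id
FROM user_property_map upm
JOIN property_value pv ON upm.property_value_id = pv.id
WHERE (pv.property_name_id = 1 AND pv.value = '101')
   OR (pv.property_name_id = 2 AND pv.value = '102')
"""

# OR alone: one row per matching property, so partial matches slip through.
any_match = sorted({row[0] for row in con.execute(base)})

# Grouping and requiring one hit per condition keeps only full matches.
all_match = [row[0] for row in con.execute(
    base + " GROUP BY upm.user_id HAVING count(*) = 2")]

print(any_match, all_match)
```

The GROUP BY ... HAVING count(*) = 2 variant is the same idea as the counting queries in the other answers; it assumes a user cannot be mapped to the same property value twice.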
You are using the AND operator between pn.id = 1 and pn.id = 2, so no single row can satisfy both of these conditions:
(SELECT id FROM property_value WHERE value like '101') and
(SELECT id FROM property_value WHERE value like '102')
So, as the comments above suggest, use the OR operator.
Update 1:
SELECT * FROM users u
INNER JOIN user_property_map upm ON u.id = upm.user_id
INNER JOIN property_value pv ON upm.property_value_id = pv.id
INNER JOIN property_name pn ON pv.property_name_id = pn.id
WHERE pn.id in (1,2) AND pv.id IN (SELECT id FROM property_value WHERE value like '101' or value like '102');
If you just want the distinct columns in U, it is:
SELECT DISTINCT u.*
FROM Users u INNER JOIN USER_PROPERTY_MAP upm ON u.id = upm.[user_id]
INNER JOIN PROPERTY_VALUE pv ON upm.property_value_id = pv.id
INNER JOIN PROPERTY_NAME pn ON pv.property_name_id = pn.id
WHERE (pn.id = 1 AND pv.[value] = '101')
OR (pn.id = 2 AND pv.[value] = '102')
Notice I used pv.[value] = instead of the subquery to reacquire id... this is a simplification.
If I understand your question correctly I would do it like this.
SELECT u.id, u.user_name, u.city FROM users u
WHERE (SELECT count(*) FROM property_value v, user_property_map m
WHERE m.user_id = u.id AND m.property_value_id = v.id AND v.value IN ('101', '102')) = 2
This should return a list of users that have all the properties listed in the IN clause. The 2 represents the number of properties searched for.
Assuming you want to select all the fields in the USERS table
SELECT u.*
FROM USERS u
INNER JOIN
(
SELECT USERS.id as user_id, COUNT(*) as matching_property_count
FROM USERS
INNER JOIN (
SELECT m.user_id, n.name as property_name, v.value
FROM PROPERTY_NAME n
INNER JOIN PROPERTY_VALUE v ON n.id = v.property_name_id
INNER JOIN USER_PROPERTY_MAP m ON m.property_value_id = v.id
WHERE (n.id = @property_id_1 AND v.value = @property_value_1) -- Property Condition 1
OR (n.id = @property_id_2 AND v.value = @property_value_2) -- Property Condition 2
OR (n.id = @property_id_3 AND v.value = @property_value_3) -- Property Condition 3
OR (n.id = @property_id_N AND v.value = @property_value_N) -- Property Condition N
) USER_PROPERTIES ON USER_PROPERTIES.user_id = USERS.id
GROUP BY USERS.id
HAVING COUNT(*) = N --N = the number of Property Condition in the WHERE clause
-- Note :
-- Use HAVING COUNT(*) = N if property matches will be "MUST MATCH ALL"
-- Use HAVING COUNT(*) > 0 if property matches will be "MUST MATCH AT LEAST ONE"
) USER_MATCHING_PROPERTY_COUNT ON u.id = USER_MATCHING_PROPERTY_COUNT.user_id
Related
I have the following problem. I run this statement, which completes successfully but reports "0 (rows affected)". In short, I am trying to write a query that adds rows to the main table. Because the main table is linked to child tables, I am trying to retrieve the child IDs and put them in the main table.
INSERT INTO Articles
([ID],
[ArtName],
[ArtType],
[SerNo],
[MACNo],
[UserID],
[Available],
[CityID],
[StoreID],
[WorkplaceID],
[ItemPrice],
[IP_01],
[IP_02],
[Note])
SELECT
(SELECT max([ID])+1 FROM Articles),
'HP',
art.ID,
'123',
'А18Н31',
u.ID,
av.ID,
c.ID ,
s.ID,
w.ID,
'14.23',
'192.168.11.3',
'192.168.11.3',
GetDate()
FROM Articles a
INNER JOIN Workplace w ON a.WorkplaceID = w.ID
INNER JOIN Store s ON a.StoreID = s.ID
INNER JOIN City c ON a.CityID = c.ID
INNER JOIN Avaiable av ON a.Available = av.ID
INNER JOIN Users u ON a.UserID = u.ID
INNER JOIN ArtType art ON a.ArtType = art.ID
WHERE c.CityName LIKE '%Sofia%' AND art.ArtTypeName LIKE '%FirstType%' AND s.StoreName LIKE '%First%' AND av.AvaiableName LIKE '%yes%' AND u.UserName LIKE '%Valq%' AND w.WorkplaceName LIKE '%FWorkplace%'
This says "0 rows affected" because the SELECT returns no rows. This could be because nothing matches the JOINs. This could be because the WHERE clause filters out all rows. Without sample data, there is no way to tell. You have to investigate yourself.
That said, this is highly suspicious:
(SELECT max([ID])+1 FROM Articles),
This is not the right way to have an incremental id in a table. You should be using an identity column. Or perhaps default to a sequence. In either case, the value of id would be set automatically when rows are inserted.
Also note that if this inserts multiple rows, all would get the same id, which is presumably not what you want.
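Here is a minimal sketch of both points, using Python's built-in sqlite3 module instead of SQL Server so that it runs standalone. The table names articles and incoming are hypothetical, and SQLite's INTEGER PRIMARY KEY only plays the role of an identity column here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- INTEGER PRIMARY KEY makes SQLite assign ids automatically,
-- standing in for an identity column.
CREATE TABLE articles (id INTEGER PRIMARY KEY, art_name TEXT);
CREATE TABLE incoming (art_name TEXT);   -- hypothetical source rows
INSERT INTO articles (art_name) VALUES ('existing');
INSERT INTO incoming VALUES ('HP'), ('Dell'), ('Lenovo');
""")

# The subquery sees the table as it is now, so every selected row
# is offered the same "next" id.
manual_ids = [row[0] for row in con.execute(
    "SELECT (SELECT max(id) + 1 FROM articles) FROM incoming")]

# Leaving id out of the column list lets the database number the rows.
con.execute("INSERT INTO articles (art_name) SELECT art_name FROM incoming")
auto_ids = [row[0] for row in con.execute("SELECT id FROM articles ORDER BY id")]

print(manual_ids, auto_ids)
```

The first list repeats a single value, which is the duplicate-id problem described above; the second list has one distinct id per row.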
select
picks.`fbid`,
picks.`time`,
categories.`name` as cname,
options.`name` as oname,
users.`name`
from
picks
left join categories
on (categories.`id` = picks.`cid`)
left join options
on (options.`id` = picks.oid)
left join users
on (users.fbid = picks.`fbid`)
order by
time desc
That query returns a result like this:
My question is: I would like to modify the query to select only DISTINCT fbids (perhaps only the first row per fbid, sorted by time).
Can someone help with this?
select
p2.fbid,
p2.time,
c.`name` as cname,
o.`name` as oname,
u.`name`
from
( select p1.fbid,
min( p1.time ) FirstTimePerID
from picks p1
group by p1.fbid ) as FirstPerID
JOIN Picks p2
on FirstPerID.fbid = p2.fbid
AND FirstPerID.FirstTimePerID = p2.time
LEFT JOIN Categories c
on p2.cid = c.id
LEFT JOIN Options o
on p2.oid = o.id
LEFT JOIN Users u
on p2.fbid = u.fbid
order by
time desc
I don't know why you originally had LEFT JOINs, as it appears that all picks must be associated with a valid category, option and user... I would then remove the left, and change them to INNER joins instead.
The first inner query grabs for each fbid, the FIRST entry time which will result in a single entity for the FBID. From that, it re-joins to the picks table for the same ID and timeslot... then continues for the rest of the category, options, users join criteria of that single entry.
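The join back to the earliest time per fbid can be tried on a cut-down picks table. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, leaves out the category, option and user joins, and the sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE picks (fbid INT, time INT, cid INT);
-- Hypothetical rows: fbid 10 picked three times, fbid 20 once.
INSERT INTO picks VALUES (10, 300, 1), (10, 100, 2), (10, 200, 3), (20, 150, 4);
""")

rows = con.execute("""
SELECT p2.fbid, p2.time, p2.cid
FROM (SELECT fbid, min(time) AS first_time
      FROM picks
      GROUP BY fbid) AS first_per_id
JOIN picks p2
  ON p2.fbid = first_per_id.fbid
 AND p2.time = first_per_id.first_time
ORDER BY p2.time DESC
""").fetchall()

print(rows)
```

Each fbid comes back once, with the columns of its earliest pick. If two picks for the same fbid share that earliest time, both rows are returned, so a tie-breaker would be needed in that case.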
There are two options. You could write a GROUP BY clause.
Or you could write a nested query joined back to itself to get pertinent info.
Nested aliased table:
SELECT
n.fBids
FROM
MyTable t
INNER JOIN
(SELECT DISTINCT fBids
FROM MyTable) n
ON n.fBids = t.fBids
Or group by option
SELECT fBId from MyTable
GROUP BY fBID
select picks.`fbid`, picks.`time`, categories.`name` as cname,
options.`name` as oname, users.`name` from picks left join categories
on (categories.`id` = picks.`cid`) left join options on (options.`id` = picks.oid)
left join users on (users.fbid = picks.`fbid`)
GROUP BY picks.`fbid` order by time desc
select
picks.fbid,
MIN(picks.time) as first_time,
MAX(picks.time) as last_time
from
picks
group by
picks.fbid
order by
MIN(picks.time) desc
However, if you want only distinct fbid's you cannot display cname and other columns at the same time.
I'm getting results with mixed date values: instead of the last revision for each title, I get values from different revisions mixed together.
I'm using MySQL.
The general idea is to retrieve one row for each entry: the last revision of that entry.
My current sql query:
SELECT DISTINCT
w.owner_id,
w.date,
w.title,
MAX(w.revision),
u.name AS updater
FROM wiki_pages AS w
JOIN users AS u ON w.owner_id = u.id
GROUP BY title
ORDER BY title ASC
SQL TABLE
Use:
SELECT wp.owner_id,
wp.date,
wp.title,
wp.revision,
u.name AS updater
FROM WIKI_PAGES wp
JOIN USERS u ON u.id = wp.owner_id
JOIN (SELECT t.title,
MAX(t.revision) AS max_rev
FROM WIKI_PAGES t
GROUP BY t.title) x ON x.title = wp.title
AND x.max_rev = wp.revision
In your query, the only things you can guarantee are the title and that the revision value is the highest for it. The other column values aren't necessarily from the same row, hence the join to a derived table...
I asked this question on SO. However, I wish to extend it further. I would like to find the max value of the 'Reading' column only where the 'state' is of value 'XX' for example.
So if I join the two tables, how do I get the row with max(Reading) value from the result set. Eg.
SELECT s.*, g1.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE s.state = 'SA' -- how do I get row with max(Reading) column from this result set
The table details are:
Table1 = Schools
Columns: id(PK), state(nvarchar(100)), schoolname
Table2 = Grades
Columns: id(PK), id_schools(FK), Year, Reading, Writing...
I'd think about using a common table expression:
WITH SchoolsInState (id, state, schoolname)
AS (
SELECT id, state, schoolname
FROM Schools
WHERE state = 'XX'
)
SELECT *
FROM SchoolsInState AS s
JOIN Grades AS g
ON s.id = g.id_schools
WHERE g.Reading = max(g.Reading)
The nice thing about this is that it creates this SchoolsInState pseudo-table which wraps all the logic about filtering by state, leaving you free to write the rest of your query without having to think about it.
I'm guessing [Reading] is some form of numeric value.
SELECT TOP (1)
s.[Id],
s.[State],
s.[SchoolName],
MAX(g.[Reading]) Reading
FROM
[Schools] s
JOIN [Grades] g on g.[id_schools] = s.[Id]
WHERE s.[State] = 'SA'
Group By
s.[Id],
s.[State],
s.[SchoolName]
Order By
MAX(g.[Reading]) DESC
UPDATE:
Looking at Tom's answer, I don't think that would work, but here is a modified version that does.
WITH [HighestGrade] (Reading)
AS (
SELECT
MAX([Reading]) Reading
FROM
[Grades]
)
SELECT
s.*,
g.*
FROM
[HighestGrade] hg
JOIN [Grades] AS g ON g.[Reading] = hg.[Reading]
JOIN [Schools] AS s ON s.[id] = g.[id_schools]
WHERE s.state = 'SA'
This CTE method should give you what you want. I also had it break down by year (grade_year in my code to avoid the reserved word). You should be able to remove that easily enough if you want to. This method also accounts for ties (you'll get both rows back if there is a tie):
;WITH MaxReadingByStateYear AS (
SELECT
S.id,
S.school_name,
S.state,
G.grade_year,
RANK() OVER(PARTITION BY S.state, G.grade_year ORDER BY Reading DESC) AS ranking
FROM
dbo.Grades G
INNER JOIN Schools S ON
S.id = G.id_schools
)
SELECT
id,
state,
school_name,
grade_year
FROM
MaxReadingByStateYear
WHERE
state = 'AL' AND
ranking = 1
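To see the tie behaviour, here is a minimal sketch of the same RANK() pattern. It runs on Python's built-in sqlite3 module instead of SQL Server, which assumes a bundled SQLite of version 3.25 or later for window functions, and the schools and scores are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE schools (id INTEGER PRIMARY KEY, state TEXT, school_name TEXT);
CREATE TABLE grades (id INTEGER PRIMARY KEY, id_schools INT, grade_year INT, reading INT);
-- Hypothetical data: schools 1 and 2 tie for the best reading score in 'AL'.
INSERT INTO schools VALUES (1,'AL','North'), (2,'AL','South'), (3,'AL','East'), (4,'SA','West');
INSERT INTO grades (id_schools, grade_year, reading)
VALUES (1, 2010, 90), (2, 2010, 90), (3, 2010, 70), (4, 2010, 99);
""")

rows = con.execute("""
WITH ranked AS (
    SELECT s.id, s.state, g.grade_year,
           RANK() OVER (PARTITION BY s.state, g.grade_year
                        ORDER BY g.reading DESC) AS ranking
    FROM grades g
    JOIN schools s ON s.id = g.id_schools
)
SELECT id FROM ranked
WHERE state = 'AL' AND ranking = 1
ORDER BY id
""").fetchall()

top_ids = [r[0] for r in rows]
print(top_ids)
```

Both tied schools are returned because RANK() gives equal scores the same rank. The higher score in state 'SA' does not interfere, since ranking restarts in each state and year partition.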
One way would be this:
SELECT...
FROM...
WHERE...
AND g1.Reading = (select max(G2.Reading)
from Grades G2
inner join Schools s2
on s2.id = g2.id_schools
and s2.state = s.state)
There are certainly more.
Our application allows administrators to add “User Properties” in order for them to be able to tailor the system to match their own HR systems. For example, if your company has departments, you can define “Departments” in the Properties table and then add values that correspond to “Departments” such as “Jewelry”, “Electronics” etc… You are then able to assign a department to users.
Here is the schema:
(source: mindgravy.net)
In this schema, a User can have only one UserPropertyValue per Property, but doesn’t have to have a value for the property.
I am trying to build a query that will be used in SSRS 2005 and also have it use the PropertyValues as the filter for users. My query looks like this:
SELECT UserLogin, FirstName, LastName
FROM Users U
LEFT OUTER JOIN UserPropertyValues UPV
ON U.ID = UPV.UserID
WHERE UPV.PropertyValueID IN (1, 5)
When I run this, if the user has ANY of the property values, they are returned. What I would like to have is where this query will return users that have values BY PROPERTY.
So if PropertyValueID = 1 is of Department (Jewelry), and PropertyValueID = 5 is of EmploymentType (Full Time), I want to return all users that are in Department Jewelry that are EmployeeType of Full Time, can this be done?
Here's a full data example:
User A has Department(Jewelry value = 1) and EmploymentType(FullTime value = 5)
User B has Department(Electronics value = 2) and EmploymentType(FullTime value = 5)
User C has Department(Jewelry value = 1) and EmploymentType(PartTime value = 6)
My query should only return User A using the above query
UPDATE:
I should state that this query is used as a dataset in SSRS, so the parameter passed to the query will be @PropertyIDs and it is defined as a multi-value parameter in SSRS.
WHERE UPV.PropertyValueID IN (@PropertyIDs)
I figured out how to get this working in a completely hacky fashion:
SELECT UserLogin, FirstName, LastName
FROM Users
INNER JOIN
(
SELECT UserID
FROM UserPropertyValues
WHERE PropertyValueID IN (@PropertyIDs)
GROUP BY UserID
HAVING COUNT(UserID) =
(
SELECT COUNT(*) FROM
(
SELECT PropertyID FROM PropertyValues
WHERE ID IN (@PropertyIDs) GROUP BY PropertyID
) p
)
) filtered on Users.UserID = filtered.UserID
Because we can determine the number of properties used in the parameter @PropertyIDs (returned by the select count(*)), we can make sure that the users returned from the query have the same number of properties as what was passed in @PropertyIDs.
SELECT UserLogin, FirstName, LastName
FROM Users
where UserID in (
select upv1.UserID
from UserPropertyValues upv1
inner join UserPropertyValues upv2 on upv1.UserID = upv2.UserID
and upv1.PropertyValueID = 1 and upv2.PropertyValueID = 5
)
You need to join to the key-value store individually for each property you want to get (2 properties, 2 joins; 100 properties, 100 joins). This is why it is a poor design choice for performance.
Expanding on OrbMan's answer. This is going to give you one row per user/property.
SELECT U.UserLogin, U.FirstName, U.LastName, P.Name AS PropertyName, PV.Name AS PropertyValue
FROM Users U
INNER JOIN UserPropertyValues UPV
ON U.ID = UPV.UserID
INNER JOIN Properties P
ON UPV.PropertyID = P.ID
INNER JOIN PropertyValues PV
ON UPV.PropertyValueID = PV.ID
where UPV.UserID in (
select upv1.UserID
from UserPropertyValues upv1
inner join UserPropertyValues upv2 on upv1.UserID = upv2.UserID
and upv1.PropertyValueID IN (@PropertyIDs))