Select rows with most similar set of attributes - sql

I have a PostgreSQL 8.3.4 DB to keep information about photo tagging.
First off, my table definitions:
create table photos (
id integer
, user_id integer
, primary key (id, user_id)
);
create table tags (
photo_id integer
, user_id integer
, tag text
, primary key (user_id, photo_id, tag)
);
What I'm trying to do, with a simple example:
I am trying to return all the photos that have at least k other photos with at least j common tags.
I.e., if photo X has these tags (the tag column of the tags table):
gold
clock
family
And photo Y has these tags:
gold
sun
family
flower
X and Y have 2 tags in common, so for k = 1 and j = 2, both X and Y will be returned.
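The rule can be stated precisely with plain set intersection. The following minimal Python sketch is not part of the question. It assumes the tags are already loaded into a hypothetical in-memory dict that maps each photo to its tag set. For each photo, it counts the other photos that share at least j tags with it and keeps the photo if that count is at least k.

```python
def similar_photos(tags_by_photo, k, j):
    """Return the photos that have at least k other photos sharing at least j tags."""
    result = set()
    for photo, tags in tags_by_photo.items():
        # Count the *other* photos whose tag sets overlap this one in >= j tags.
        matches = sum(
            1
            for other, other_tags in tags_by_photo.items()
            if other != photo and len(tags & other_tags) >= j
        )
        if matches >= k:
            result.add(photo)
    return result


photos = {
    "X": {"gold", "clock", "family"},
    "Y": {"gold", "sun", "family", "flower"},
}
print(sorted(similar_photos(photos, k=1, j=2)))  # ['X', 'Y']
```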
What I have tried
SELECT tags1.user_id , users.name, tags1.photo_id
FROM users, tags tags1, tags tags2
WHERE ((tags1.info = tags2.info) AND (tags1.photo_id != tags2.photo_id)
AND (users.id = tags1.user_id))
GROUP BY tags1.user_id, tags1.photo_id, tags2.user_id, tags2.photo_id, users.name
HAVING ((count(tags1.info) = <j>) and (count(*) >= <k>))
ORDER BY user_id asc, photo_id asc
My failed results:
When I ran it on these tables:
photos
photo_id user_id
0 0
1 0
2 0
20 1
23 1
10 3
tags
photo_id user_id tag
0 0 Car
0 0 Bridge
0 0 Sky
20 1 Car
20 1 Bridge
10 3 Sky
The result for k = 1 and j = 1:
Expected:
| user_id | User Name | photo_id |
| 0 | Bob | 0 |
| 1 | Ben | 20 |
| 3 | Lev | 10 |
Actual:
| user_id | User Name | photo_id |
| 0 | Bob | 0 |
| 3 | Lev | 10 |
For k = 2 and j = 1:
Expected:
| user_id | User Name | photo_id |
| 0 | Bob | 0 |
Actual: empty result.
For k = 2 and j = 2:
Expected: empty result.
Actual:
| user_id | User Name | photo_id |
| 0 | Bob | 0 |
| 1 | Ben | 20 |
How do I solve this properly?

Working with your current design, this uses only basic SQL features and should work for Postgres 8.3, too (untested):
SELECT *
FROM photos p
WHERE (
SELECT count(*) >= 1 -- k other photos
FROM (
SELECT 1
FROM tags t1
JOIN tags t2 USING (tag)
WHERE t1.photo_id = p.id
AND t1.user_id = p.user_id
AND (t2.photo_id <> p.id OR
t2.user_id <> p.user_id)
GROUP BY t2.photo_id, t2.user_id
HAVING count(*) >= 1 -- j common tags
) t1
);
Or:
SELECT *
FROM (
SELECT id, user_id
FROM (
SELECT t1.photo_id AS id, t1.user_id
FROM tags t1
JOIN tags t2 USING (tag)
WHERE (t2.photo_id <> t1.photo_id OR
t2.user_id <> t1.user_id)
GROUP BY t1.photo_id, t1.user_id, t2.photo_id, t2.user_id
HAVING count(*) >= 1 -- j common tags
) sub1
GROUP BY 1, 2
HAVING count(*) >= 1 -- k other photos
) sub2
JOIN photos p USING (id, user_id);
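The following minimal sketch checks the second query against the question's sample data, using Python's built-in sqlite3 module. SQLite is only a stand-in for Postgres here, chosen for portability. The query is otherwise the same, with these changes: j and k are bound as the named parameters :j and :k, the inner columns get explicit aliases, GROUP BY uses column names instead of positions, and an ORDER BY is added.

```python
import sqlite3

# Sample data from the question; photos are keyed by (id, user_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (id INTEGER, user_id INTEGER, PRIMARY KEY (id, user_id));
CREATE TABLE tags (photo_id INTEGER, user_id INTEGER, tag TEXT,
                   PRIMARY KEY (user_id, photo_id, tag));
INSERT INTO photos VALUES (0, 0), (1, 0), (2, 0), (20, 1), (23, 1), (10, 3);
INSERT INTO tags VALUES (0, 0, 'Car'), (0, 0, 'Bridge'), (0, 0, 'Sky'),
                        (20, 1, 'Car'), (20, 1, 'Bridge'), (10, 3, 'Sky');
""")

# The second query, with j and k bound as named parameters.
QUERY = """
SELECT p.id, p.user_id
FROM (
    SELECT id, user_id
    FROM (
        SELECT t1.photo_id AS id, t1.user_id AS user_id
        FROM tags t1
        JOIN tags t2 USING (tag)
        WHERE (t2.photo_id <> t1.photo_id OR t2.user_id <> t1.user_id)
        GROUP BY t1.photo_id, t1.user_id, t2.photo_id, t2.user_id
        HAVING count(*) >= :j   -- j common tags
    ) sub1
    GROUP BY id, user_id
    HAVING count(*) >= :k       -- k other photos
) sub2
JOIN photos p USING (id, user_id)
ORDER BY p.user_id, p.id
"""


def run(k, j):
    """Return (photo id, user_id) pairs for photos with >= k others sharing >= j tags."""
    return conn.execute(QUERY, {"k": k, "j": j}).fetchall()


print(run(k=1, j=1))  # [(0, 0), (20, 1), (10, 3)]
```

For k = 2 and j = 1 this returns only photo 0, and for k = 2 and j = 2 it returns no rows. Both match the expected results in the question.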
In Postgres 9.3 or later you could use a correlated subquery with a LATERAL join ...
The above are probably even faster than my first query:
SELECT *
FROM (
SELECT photo_id, user_id
FROM tags t
GROUP BY 1, 2
HAVING (
SELECT count(*) >= 1 -- k other photos
FROM (
SELECT photo_id, user_id
FROM tags
WHERE tag = ANY(array_agg(t.tag))
AND (photo_id <> t.photo_id OR
user_id <> t.user_id)
GROUP BY 1, 2
HAVING count(*) >= 2 -- j common tags
) t1
)
) t
JOIN photos p ON p.id = t.photo_id
AND p.user_id = t.user_id;
SQL Fiddle showing both on Postgres 9.3.
The 1st query just needs the right basic indexes.
For the 2nd, I would build a materialized view with integer arrays, install the intarray module, and add a GIN index on the integer array column for better performance ...
Related:
Order result by count of common array elements
Proper design
It would be much more efficient to have a single-column serial PK for photos and to store only tag IDs per photo ...:
CREATE TABLE photo (
photo_id serial PRIMARY KEY
, user_id int NOT NULL
);
CREATE TABLE tag (
tag_id serial PRIMARY KEY
, tag text UNIQUE NOT NULL
);
CREATE TABLE photo_tag (
photo_id int REFERENCES photo
, tag_id int REFERENCES tag
, PRIMARY KEY (photo_id, tag_id)
);
This would make the query much simpler and faster, too.
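Here is a minimal sketch of how the query could look with the normalized design. It again runs through SQLite from Python as a stand-in for Postgres. The sample rows are hypothetical: photos 1, 2 and 3 play the roles of photos 0, 20 and 10 from the question. The query itself is an illustration, not the answerer's own.

```python
import sqlite3

# Hypothetical sample rows in the normalized design: photos 1, 2, 3 stand in
# for photos 0, 20, 10 from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo (photo_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT UNIQUE NOT NULL);
CREATE TABLE photo_tag (
    photo_id INTEGER REFERENCES photo,
    tag_id   INTEGER REFERENCES tag,
    PRIMARY KEY (photo_id, tag_id)
);
INSERT INTO photo VALUES (1, 0), (2, 1), (3, 3);
INSERT INTO tag VALUES (1, 'Car'), (2, 'Bridge'), (3, 'Sky');
INSERT INTO photo_tag VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 3);
""")

# With a single-column photo key, two photos only have to match on tag_id.
QUERY = """
SELECT photo_id
FROM (
    SELECT a.photo_id AS photo_id, b.photo_id AS other_id
    FROM photo_tag a
    JOIN photo_tag b ON b.tag_id = a.tag_id AND b.photo_id <> a.photo_id
    GROUP BY a.photo_id, b.photo_id
    HAVING count(*) >= :j   -- j common tags
) pairs
GROUP BY photo_id
HAVING count(*) >= :k       -- k other photos
ORDER BY photo_id
"""


def similar(k, j):
    """Return IDs of photos that have >= k other photos sharing >= j tags."""
    return [row[0] for row in conn.execute(QUERY, {"k": k, "j": j})]


print(similar(k=2, j=1))  # [1]
```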
How to implement a many-to-many relationship in PostgreSQL?

If I understood you correctly, you want to calculate similarity between all photos of all users by common tags.
I think you need this:
SELECT
A.user_id AS a_user_id,
A.id AS a_photo_id,
B.user_id AS b_user_id,
B.id AS b_photo_id,
(
SELECT COUNT(*)
FROM
tags TA
JOIN tags TB ON TA.tag = TB.tag
WHERE
A.user_id = TA.user_id
AND A.id = TA.photo_id
AND B.user_id = TB.user_id
AND B.id = TB.photo_id
) AS common_tags
FROM
photos A
,photos B
WHERE
-- Exclude pairing a photo with itself (photos are keyed by (id, user_id))
(A.user_id <> B.user_id
OR A.id <> B.id);


SQL - How to check if users are in the same hierarchy?

I want to find out if users are directly in a parent-child relation.
Given my user table schema:
User_id | Parent_ID | Name
For example, I have a list of user_ids and I want to know if they are all in the same hierarchical tree.
I have tried using a recursive CTE.
Sample data
User_id | Parent_ID | Name
1 | | A
2 | 1 | B
3 | 2 | C
4 | 3 | D
5 | 2 | E
6 | | F
7 | 6 | G
user_id varchar(100)
parent_id varchar(100)
Desired result: Input [2,3,4] => Same Team
Input [2,3,7] => Not same team
Use the top-level parent's user_id as the hierarchy identifier:
with recursive hierarchies as (
select user_id, user_id as hierarchy_id
from ttable
where parent_id is null
union all
select c.user_id, p.hierarchy_id
from hierarchies p
join ttable c on c.parent_id = p.user_id
)
select * from hierarchies;
With that mapping of each user_id to a single hierarchy_id, you can join it to your list of users. The users are in the same hierarchy if they all map to one hierarchy_id.
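Here is a minimal sketch of that approach in SQLite via Python. SQLite also supports WITH RECURSIVE, and using it instead of Postgres is an assumption made for portability. The helper same_hierarchy is hypothetical. It reads the user_id-to-hierarchy_id mapping and reports whether every requested user exists and all of them share one hierarchy_id. This checks for the same tree, so [2, 3, 5] also counts as the same team.

```python
import sqlite3

# Sample data from the question (user_id and parent_id are text, as in varchar(100)).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ttable (user_id TEXT, parent_id TEXT, name TEXT);
INSERT INTO ttable VALUES
  ('1', NULL, 'A'), ('2', '1', 'B'), ('3', '2', 'C'), ('4', '3', 'D'),
  ('5', '2', 'E'), ('6', NULL, 'F'), ('7', '6', 'G');
""")

# Map every user to the user_id of its top-level ancestor.
HIERARCHIES = """
WITH RECURSIVE hierarchies(user_id, hierarchy_id) AS (
    SELECT user_id, user_id FROM ttable WHERE parent_id IS NULL
    UNION ALL
    SELECT c.user_id, p.hierarchy_id
    FROM hierarchies p
    JOIN ttable c ON c.parent_id = p.user_id
)
SELECT user_id, hierarchy_id FROM hierarchies
"""


def same_hierarchy(user_ids):
    """Hypothetical helper: True if all users exist and share one hierarchy_id."""
    mapping = dict(conn.execute(HIERARCHIES).fetchall())
    roots = {mapping.get(u) for u in user_ids}
    return None not in roots and len(roots) == 1


print(same_hierarchy(["2", "3", "4"]))  # True
print(same_hierarchy(["2", "3", "7"]))  # False
```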
EDIT BEGINS
Since you added sample data and example results that do not match your original question, here is how the above can be tweaked slightly to match the newly added examples:
with recursive subhierarchies as (
select user_id, array[user_id] as path
from ttable
where parent_id is null
union all
select c.user_id, p.path||c.user_id as path
from subhierarchies p
join ttable c on c.parent_id = p.user_id
)
select d.user_ids, count(s.path) > 0 as same_team
from (values (array[2, 3, 4]), (array[2, 3, 7])) as d(user_ids)
left join subhierarchies s
on s.path @> d.user_ids
group by d.user_ids
;
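The edited query treats a list as the same team when a single root-to-node path contains every ID. In Postgres, that containment test is the array operator @>. SQLite has no array type, so here is the same idea as a plain Python sketch. The helpers root_paths and same_team are hypothetical, and the sample data is encoded as a child-to-parent dict that is assumed to contain no cycles.

```python
def root_paths(parent_of):
    """Map each user to its root-to-user path, e.g. "4" -> ["1", "2", "3", "4"].

    Assumes the parent links contain no cycles.
    """
    paths = {}
    for user in parent_of:
        path, node = [], user
        while node is not None:
            path.append(node)
            node = parent_of[node]
        paths[user] = path[::-1]
    return paths


def same_team(parent_of, user_ids):
    """True if a single root-to-node path contains every requested ID."""
    wanted = set(user_ids)
    return any(wanted <= set(path) for path in root_paths(parent_of).values())


# Sample data from the question as a child -> parent mapping (None = top level).
parent_of = {"1": None, "2": "1", "3": "2", "4": "3", "5": "2", "6": None, "7": "6"}

print(same_team(parent_of, ["2", "3", "4"]))  # True
print(same_team(parent_of, ["2", "3", "7"]))  # False
```

Unlike the hierarchy_id check, this version rejects [2, 3, 5]: 3 and 5 are both children of 2, so no single path contains both.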

SQL referrals view

I have a table with users:
id referred_by_user_id
-----------------------------------
1 null
2 1
3 1
4 2
I need to write a query that gets the number of people referred by each user, over two levels.
First: direct referrals (e.g. user 1 referred users 2 and 3, so the count for level 1 = 2).
Second: user 1 referred users 2 and 3, and user 2 referred user 4, so the count for level 2 should be 1.
The result of the query should be:
id referred_user_tier_one_total referred_user_tier_two_total
------------------------------------------------------------------------------
1 2 1
2 1 null
3 null null
4 null null
I figured out how to count referred_user_tier_one_total:
select
"id", referred_user_tier_one_total
from
"user"
inner join
(select
count(*) as referred_user_tier_one_total, referred_by_user_id
from
"user"
where
"user".referred_by_user_id is not null
group by
"user".referred_by_user_id) ur on "user".id = ur.referred_by_user_id
But I don't understand how to calculate referred_user_tier_two_total. Please help.
UPD:
Thanks @Stoff for the SQL Server solution.
Here is the script rewritten for Postgres:
WITH RECURSIVE agg AS
(
SELECT
a.ID, a.referred_by_user_id,
COUNT(b.referred_by_user_id) AS "count"
FROM
"user" a
LEFT JOIN
"user" b ON a.ID = b.referred_by_user_id
GROUP BY
a.ID, a.referred_by_user_id
)
SELECT
a.ID,
a.Count AS referred_user_tier_one_total,
CASE
WHEN SUM(b.count) IS NULL
THEN 0
ELSE SUM(b.count)
END AS referred_user_tier_two_total
FROM
agg a
LEFT JOIN
agg b ON a.ID = b.referred_by_user_id
GROUP BY
a.ID, a.Count
ORDER BY
a.ID
Here is a solution which works in Postgres.
More levels can be added in the same way, with one more self-join per level.
create table referals(
id int,
referred_by_user_id int);
insert into referals values
(1 , null),
(2 , 1),
(3 , 1),
(4 , 2 );
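select
t0.id,
-- distinct: a tier-1 user appears on one row per tier-2 referral
count(distinct t1.id) tier1,
count(distinct t2.id) tier2
from
referals t0
left join referals t1
on t0.id = t1.referred_by_user_id
left join referals t2
on t1.id = t2.referred_by_user_id
group by t0.id
order by t0.id;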
id | tier1 | tier2
-: | ----: | ----:
1 | 2 | 1
2 | 1 | 0
3 | 0 | 0
4 | 0 | 0
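The two-level self-join can be exercised in SQLite from Python, used here as a stand-in for Postgres. The extra row (5, 2) is a hypothetical addition. Once a tier-1 user has two referrals of their own, that user appears on two joined rows. A plain count(t1.id) then counts them twice, while count(DISTINCT t1.id) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE referals (id INTEGER, referred_by_user_id INTEGER);
INSERT INTO referals VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

COUNT_PLAIN = "count(t1.id) AS tier1, count(t2.id) AS tier2"
COUNT_DISTINCT = "count(DISTINCT t1.id) AS tier1, count(DISTINCT t2.id) AS tier2"


def tiers(count_columns):
    """Run the two-level self-join with the given count expressions."""
    sql = f"""
    SELECT t0.id, {count_columns}
    FROM referals t0
    LEFT JOIN referals t1 ON t0.id = t1.referred_by_user_id
    LEFT JOIN referals t2 ON t1.id = t2.referred_by_user_id
    GROUP BY t0.id
    ORDER BY t0.id
    """
    return conn.execute(sql).fetchall()


before = tiers(COUNT_PLAIN)  # [(1, 2, 1), (2, 1, 0), (3, 0, 0), (4, 0, 0)]

# Hypothetical extra row: user 2 also referred user 5, so user 2 now
# appears on two joined rows for user 1.
conn.execute("INSERT INTO referals VALUES (5, 2)")
after_plain = tiers(COUNT_PLAIN)        # user 1 -> (1, 3, 2): tier1 overcounted
after_distinct = tiers(COUNT_DISTINCT)  # user 1 -> (1, 2, 2)
```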

How can I get last 2 records from another table as columns

I have a table called products with this schema:
CREATE TABLE products (
id INT PRIMARY KEY,
sku TEXT NOT NULL,
fee REAL
);
And another table, a fee change log, with this schema:
CREATE TABLE fee_change(
id SERIAL PRIMARY KEY,
sku_id INT NOT NULL,
old_fee REAL NOT NULL,
new_fee REAL NOT NULL,
FOREIGN KEY (sku_id) REFERENCES products(id)
);
Is there any way to get the last 2 fee changes for each SKU in one SQL query, as columns rather than as 2 rows per SKU? I want 4 new columns: old_fee_1, new_fee_1, old_fee_2, new_fee_2:
Desired result:
id | sku | old_fee_1 | new_fee_1 | old_fee_2 | new_fee_2
1 | ASC | 4 | 2.5 | 3 | 4
2 | CF2 | 4 | 1 | 3 | 4
3 | RTG | 0.5 | 1 | 2 | 0.5
4 | VHN5 | null | null | null | null
dbfiddle
As a starting point, I took your query from the fiddle you linked:
SELECT *
FROM products AS p
LEFT JOIN LATERAL (
SELECT *
FROM fee_change
WHERE sku_id = p.id
ORDER BY id DESC
LIMIT 2
) AS oo
ON true
You can use the FILTER clause (a CASE WHEN construct works as well) to pivot your joined table. To get the pivot value, add a row number using the row_number() window function:
SELECT
p.id, p.sku, p.fee,
MAX(old_fee) FILTER (WHERE row_number = 1) AS old_fee_1, -- 2
MAX(new_fee) FILTER (WHERE row_number = 1) AS new_fee_1,
MAX(old_fee) FILTER (WHERE row_number = 2) AS old_fee_2,
MAX(new_fee) FILTER (WHERE row_number = 2) AS new_fee_2
FROM products AS p
LEFT JOIN LATERAL (
SELECT
*,
row_number() OVER (PARTITION BY sku_id ORDER BY id DESC) -- 1
FROM fee_change
WHERE sku_id = p.id
ORDER BY id DESC
LIMIT 2
) AS oo ON true
GROUP BY p.id, p.sku, p.fee -- 2
1. Create the pivot value.
2. Do the filtered aggregation to create the pivoted table.
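The same two steps can be tried in SQLite from Python, used as a stand-in for Postgres. SQLite has no LATERAL join, so this sketch swaps in a plain derived table numbered with row_number() OVER (PARTITION BY sku_id ORDER BY id DESC). It pivots with MAX(CASE WHEN ...), the CASE WHEN alternative mentioned above, which avoids relying on FILTER support. Window functions need SQLite 3.25 or later. The fee_change rows are hypothetical, chosen to reproduce the desired result. ASC has three changes, so only its newest two should appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INT PRIMARY KEY, sku TEXT NOT NULL, fee REAL);
CREATE TABLE fee_change (
    id INTEGER PRIMARY KEY,
    sku_id INT NOT NULL REFERENCES products(id),
    old_fee REAL NOT NULL,
    new_fee REAL NOT NULL
);
INSERT INTO products VALUES (1, 'ASC', 2.5), (2, 'CF2', 1), (3, 'RTG', 1), (4, 'VHN5', 3);
-- Hypothetical history; ASC has three changes, so only its last two must appear.
INSERT INTO fee_change VALUES
  (1, 1, 5, 3), (2, 1, 3, 4), (3, 2, 3, 4), (4, 3, 2, 0.5),
  (5, 1, 4, 2.5), (6, 2, 4, 1), (7, 3, 0.5, 1);
""")

QUERY = """
SELECT p.id, p.sku,
       MAX(CASE WHEN f.rn = 1 THEN f.old_fee END) AS old_fee_1,
       MAX(CASE WHEN f.rn = 1 THEN f.new_fee END) AS new_fee_1,
       MAX(CASE WHEN f.rn = 2 THEN f.old_fee END) AS old_fee_2,
       MAX(CASE WHEN f.rn = 2 THEN f.new_fee END) AS new_fee_2
FROM products p
LEFT JOIN (
    -- Step 1: number the changes of each SKU from newest (rn = 1) to oldest.
    SELECT sku_id, old_fee, new_fee,
           row_number() OVER (PARTITION BY sku_id ORDER BY id DESC) AS rn
    FROM fee_change
) f ON f.sku_id = p.id AND f.rn <= 2
GROUP BY p.id, p.sku   -- Step 2: the conditional aggregation pivots rn 1 and 2
ORDER BY p.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```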
Something like this should do the trick (LEFT JOIN LATERAL needs Postgres 9.3 or later):
SELECT p.id,
p.sku,
latest.old_fee AS old_fee_1,
latest.new_fee AS new_fee_1,
prev.old_fee AS old_fee_2,
prev.new_fee AS new_fee_2
FROM products p
LEFT JOIN LATERAL (SELECT fee.old_fee, fee.new_fee
FROM fee_change fee WHERE fee.sku_id = p.id
ORDER BY fee.id DESC LIMIT 1 OFFSET 0) latest ON true -- most recent change
LEFT JOIN LATERAL (SELECT fee.old_fee, fee.new_fee
FROM fee_change fee WHERE fee.sku_id = p.id
ORDER BY fee.id DESC LIMIT 1 OFFSET 1) prev ON true; -- the change before it

Join one table with two other ones by id

I am trying to join one table with two others that are unrelated to each other but are each linked to the first one by an id.
I have the following tables
create table groups(
id int,
name text
);
create table members(
id int,
groupid int,
name text
);
create table invites(
id int,
groupid int,
status int -- 2 if accepted, 1 if pending
);
Then I inserted the following data
insert into groups (id, name) values(1,'group');
insert into members(id, groupid, name) values(1,1,'admin'),(1,1,'other');
insert into invites(id, groupid, status) values(1,1,2),(2,1,1),(3,1,1);
Notes:
The admin does not have an invite.
The group has an accepted invitation with status 2 (because the member 'other' joined).
The group has two pending invites with status 1.
I am trying to write a query that returns the following result:
groupid | name | inviteId
1 | admin | null
1 | other | null
1 | null | 2
1 | null | 3
I have tried the following queries with no luck:
select g.id, m.name, i.id from groups g
left join members m ON m.groupid = g.id
left join invites i ON i.groupid = g.id and i.status = 1;
select g.id, m.name, i.id from groups g
join (select groupid, name from members) m ON m.groupid = g.id
join (select groupid, id from invites where status = 1) i ON i.groupid = g.id;
Any ideas of what I am doing wrong?
Because members and invites are not related, you need two separate queries combined with UNION (which removes duplicates) or UNION ALL (which keeps them) to get the output you want:
select g.id as groupid, m.name, null as inviteid from groups g
join members m ON m.groupid = g.id
union all
select g.id, null, i.id from groups g
join invites i ON (i.groupid = g.id and i.status = 1);
Output:
groupid | name | inviteid
---------+-------+----------
1 | admin |
1 | other |
1 | | 3
1 | | 2
(4 rows)
Without a UNION, joining both tables to groups implies that members and invites are related, so their columns are combined side by side: every member is paired with every pending invite of the same group. Since you want each member and each invite on its own row, with NULL in the other column, you need to stack the two results vertically with UNION ALL.
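The difference shows up clearly when both forms run on the sample data. This minimal sketch uses SQLite from Python as a stand-in for Postgres. The table name groups is double-quoted as a precaution, since GROUPS is a keyword in recent SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "groups" (id INT, name TEXT);
CREATE TABLE members (id INT, groupid INT, name TEXT);
CREATE TABLE invites (id INT, groupid INT, status INT);  -- 2 accepted, 1 pending
INSERT INTO "groups" VALUES (1, 'group');
INSERT INTO members VALUES (1, 1, 'admin'), (1, 1, 'other');
INSERT INTO invites VALUES (1, 1, 2), (2, 1, 1), (3, 1, 1);
""")

# Side by side: every member is paired with every pending invite of the group.
JOINED = """
SELECT g.id, m.name, i.id
FROM "groups" g
LEFT JOIN members m ON m.groupid = g.id
LEFT JOIN invites i ON i.groupid = g.id AND i.status = 1
"""

# Stacked: members and pending invites become separate rows.
STACKED = """
SELECT g.id AS groupid, m.name AS name, NULL AS inviteid
FROM "groups" g JOIN members m ON m.groupid = g.id
UNION ALL
SELECT g.id, NULL, i.id
FROM "groups" g JOIN invites i ON i.groupid = g.id AND i.status = 1
ORDER BY inviteid, name
"""

joined = conn.execute(JOINED).fetchall()
stacked = conn.execute(STACKED).fetchall()
print(stacked)  # [(1, 'admin', None), (1, 'other', None), (1, None, 2), (1, None, 3)]
```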
Disclosure: I work for EnterpriseDB (EDB)

SELECT to pick users who both viewed a page

I have a table that logs page views of each user:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| view_id | int(11) | NO | PRI | NULL | auto_increment |
| page_id | int(11) | YES | MUL | NULL | |
| user_id | int(11) | YES | MUL | NULL | |
+--------------+--------------+------+-----+---------+----------------+
For every pair of users, I would like to generate a count of how many pages they have both looked at.
I simply do not know how to do this. :) I am using MySQL, in case it has a non-standard feature that makes this a breeze.
select u1.user_id, u2.user_id, count(distinct u1.page_id) as NumPages
from logtable u1
join
logtable u2
on u1.page_id = u2.page_id
and u1.user_id < u2.user_id /* This avoids counting pairs twice */
group by u1.user_id, u2.user_id;
But you should consider filtering this somewhat...
(Edited above to use u1.page_id; it was originally just page_id, which is ambiguous in this self-join and really bad of me.)
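Here is a minimal sketch of the pair-counting self-join, run through SQLite from Python instead of MySQL. SQLite is used only for portability; the query is standard SQL. The view rows are hypothetical. User 1 views page 10 twice, which shows why count(DISTINCT u1.page_id) is needed: without DISTINCT, the pair (1, 2) would be counted as 3 pages instead of 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logtable (
    view_id INTEGER PRIMARY KEY AUTOINCREMENT,
    page_id INTEGER,
    user_id INTEGER
);
-- Hypothetical views; user 1 viewed page 10 twice.
INSERT INTO logtable (page_id, user_id) VALUES
  (10, 1), (10, 1), (10, 2), (11, 1), (11, 2), (11, 3), (12, 3);
""")

QUERY = """
SELECT u1.user_id, u2.user_id, count(DISTINCT u1.page_id) AS NumPages
FROM logtable u1
JOIN logtable u2
  ON u1.page_id = u2.page_id
 AND u1.user_id < u2.user_id   -- each unordered pair appears once
GROUP BY u1.user_id, u2.user_id
ORDER BY u1.user_id, u2.user_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2, 2), (1, 3, 1), (2, 3, 1)]
```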
SELECT page_id
FROM logtable
WHERE user_id = 1 OR user_id = 2
GROUP BY page_id
HAVING COUNT(DISTINCT user_id) = 2
This query returns all pages both users have looked at. If you want the count, just make it a subquery and count the rows:
SELECT COUNT(*) FROM (the query above) s;
Update: let's do it for all pairs of users, then:
SELECT u1.user_id, u2.user_id, COUNT(DISTINCT u1.page_id)
FROM logtable u1, logtable u2
WHERE u1.user_id < u2.user_id
AND u1.page_id = u2.page_id
GROUP BY u1.user_id, u2.user_id
For user_ids 100 and 200:
SELECT
page_id
FROM table1
WHERE user_id IN (100, 200)
GROUP BY page_id
HAVING MAX(CASE WHEN user_id = 100 THEN 1 ELSE 0 END) = 1
AND MAX(CASE WHEN user_id = 200 THEN 1 ELSE 0 END) = 1;
select a.user_id as user1, b.user_id as user2, count(distinct a.page_id) as views
from yourtable a, yourtable b
where a.page_id = b.page_id
and a.user_id < b.user_id
group by a.user_id, b.user_id
Change yourtable to the name of your table.