Reducing number of records in CTE table - sql

In PostgreSQL 9.5.4 I keep player infos from various social networks:
# TABLE words_social;
sid | social | female | given | family | photo | place | stamp | uid
-------+--------+--------+---------+--------+-------+-------+------------+-----
aaaaa | 1 | 0 | Abcde1 | | | | 1470237061 | 1
aaaaa | 2 | 0 | Abcde2 | | | | 1477053188 | 1
aaaaa | 3 | 0 | Abcde3 | | | | 1477053330 | 1
kkkkk | 3 | 0 | Klmnop3 | | | | 1477053810 | 2
kkkkk | 4 | 0 | Klmnop4 | | | | 1477053857 | 2
ggggg | 2 | 0 | Ghijk2 | | | | 1477053456 | 3
ggggg | 3 | 0 | Ghijk3 | | | | 1477053645 | 3
ggggg | 4 | 0 | Ghijk4 | | | | 1477053670 | 3
xxxxx | 4 | 0 | Xyzok | | | | 1470237393 | 4
(9 rows)
The 1, 2, 3, 4 values in column social mean "Facebook", "Twitter", etc.
For a player I can always select her most recent info by:
# select * from words_social s1 WHERE stamp =
(SELECT max(stamp) FROM words_social s2 WHERE s1.uid = s2.uid);
sid | social | female | given | family | photo | place | stamp | uid
-------+--------+--------+---------+--------+-------+-------+------------+-----
aaaaa | 3 | 0 | Abcde3 | | | | 1477053330 | 1
kkkkk | 4 | 0 | Klmnop4 | | | | 1477053857 | 2
ggggg | 4 | 0 | Ghijk4 | | | | 1477053670 | 3
xxxxx | 4 | 0 | Xyzok | | | | 1470237393 | 4
(4 rows)
Then there is another table storing current games (I have omitted some columns below):
# select gid, created, finished, player1, player2 from words_games;
gid | created | finished | player1 | player2
-----+-------------------------------+----------+---------+---------
1 | 2016-10-21 14:51:12.624507+02 | | 4 | 1
2 | 2016-10-21 14:51:22.631507+02 | | 3 |
(2 rows)
Whenever a user (for example with uid 1) connects to the server, I send her the games she is taking part in:
# select gid, created, finished, player1, player2 from words_games where player1 = 1
union select gid, created, finished, player2, player1 from words_games where player2 = 1;
gid | created | finished | player1 | player2
-----+-------------------------------+----------+---------+---------
1 | 2016-10-21 14:51:12.624507+02 | | 4 | 1
(1 row)
My problem: to the above UNION SELECT statement I need to add user infos from words_social table - so that I can display user photos and names above the game board in my 2-player game.
So I try this with CTE (and add the i.given column with the user first name):
# with user_infos AS (select * from words_social s1 WHERE stamp =
(SELECT max(stamp) FROM words_social s2 WHERE s1.uid = s2.uid))
select g.gid, g.created, g.finished, g.player1, g.player2, i.given from words_games g join user_infos i on (g.player1=i.uid) where g.player1 = 1
union select g.gid, g.created, g.finished, g.player2, g.player1, i.given from words_games g join user_infos i on (g.player2=i.uid) where g.player2 = 1;
gid | created | finished | player1 | player2 | given
-----+-------------------------------+----------+---------+---------+--------
1 | 2016-10-21 14:51:12.624507+02 | | 1 | 4 | Abcde3
(1 row)
This works well, but I still have the following problem -
I am worried that the CTE-table user_infos will get very large, once my game has many players.
How to rewrite the query, so that user_infos only holds relevant records?
I can not just perform
# with user_infos AS (select * from words_social s1 WHERE stamp =
(SELECT max(stamp) FROM words_social s2 WHERE s1.uid = s2.uid))
AND s1.uid = 1
...
because I also need the infos (given and family names, photo) of the game opponents.

The user_infos query can be rewritten and used as follows:
with user_infos as (
select row_number() over (partition by uid order by stamp desc), * from words_social
)
select g.gid, g.created, g.finished, g.player1, g.player2, i.given from words_games g
join user_infos i on g.player1=i.uid and i.row_number = 1 and g.player1 = 1
union select g.gid, g.created, g.finished, g.player2, g.player1, i.given from words_games g
join user_infos i on g.player2=i.uid and i.row_number =1 and g.player2 = 1;

You should approach it the other way around.
Start from the words_games table, then join to the words_social table.
You can also use DISTINCT ON (a PostgreSQL-specific clause) to avoid a second lookup of the table.
So in the end:
with game_finder as (
select g.gid, g.player1, g.player2
from words_games g where g.player1 = 1
union
select g.gid,g.player2, g.player1
from words_games g where g.player2 = 1),
player1_infos as (
-- one row per game: the most recent social record of player1
select distinct on (gf.gid)
gf.gid,
uid,
social,
given
from words_social ws
inner join game_finder gf on gf.player1 = ws.uid
ORDER BY gf.gid, stamp DESC
),
player2_infos as (
-- one row per game: the most recent social record of player2
select distinct on (gf.gid)
gf.gid,
uid,
social,
given
from words_social ws
inner join game_finder gf on gf.player2 = ws.uid
ORDER BY gf.gid, stamp DESC
)
select *
from game_finder gf
left outer join player1_infos p1 on gf.gid = p1.gid
left outer join player2_infos p2 on gf.gid = p2.gid;
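Another way to keep the lookup small, sketched here under some assumptions: PostgreSQL 9.3 and later support LATERAL joins, so the newest words_social row can be fetched per player of each matching game instead of building a per-user result first. The demo_social and demo_games tables are hypothetical scratch copies of the question's sample data (trimmed to the columns used), and the my_uid / opp_uid aliases are made up for illustration.

```sql
-- Hypothetical scratch tables mirroring the sample data in the question.
CREATE TEMP TABLE demo_social (sid text, social int, given text, stamp int, uid int);
INSERT INTO demo_social (sid, social, given, stamp, uid) VALUES
    ('aaaaa', 1, 'Abcde1',  1470237061, 1),
    ('aaaaa', 2, 'Abcde2',  1477053188, 1),
    ('aaaaa', 3, 'Abcde3',  1477053330, 1),
    ('kkkkk', 3, 'Klmnop3', 1477053810, 2),
    ('kkkkk', 4, 'Klmnop4', 1477053857, 2),
    ('ggggg', 2, 'Ghijk2',  1477053456, 3),
    ('ggggg', 3, 'Ghijk3',  1477053645, 3),
    ('ggggg', 4, 'Ghijk4',  1477053670, 3),
    ('xxxxx', 4, 'Xyzok',   1470237393, 4);

CREATE TEMP TABLE demo_games (gid int, player1 int, player2 int);
INSERT INTO demo_games (gid, player1, player2) VALUES (1, 4, 1), (2, 3, NULL);

-- Step 1: find the games of user 1, always putting her in the first column.
-- Step 2: for each side, fetch only the most recent social row via LATERAL.
SELECT g.gid,
       mine.given   AS my_given,
       theirs.given AS opponent_given
FROM (
    SELECT gid, player1 AS my_uid, player2 AS opp_uid
    FROM demo_games WHERE player1 = 1
    UNION ALL
    SELECT gid, player2, player1
    FROM demo_games WHERE player2 = 1
) AS g
LEFT JOIN LATERAL (
    SELECT s.given
    FROM demo_social s
    WHERE s.uid = g.my_uid
    ORDER BY s.stamp DESC
    LIMIT 1
) AS mine ON true
LEFT JOIN LATERAL (
    SELECT s.given
    FROM demo_social s
    WHERE s.uid = g.opp_uid
    ORDER BY s.stamp DESC
    LIMIT 1
) AS theirs ON true;
-- With the sample data: gid 1, my_given 'Abcde3', opponent_given 'Xyzok'.
```

An index on (uid, stamp DESC) on the social table would probably let each LATERAL subquery stop after a single row, though that is worth confirming with EXPLAIN on real data.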

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using DISTINCT ON, which gave the desired result, but I fear the query results may not be reliable or consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
"I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count."
Your question is quite complicated, with a very complicated SQL query. However, the sentence quoted above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
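On the worry that DISTINCT ON may not be reliable: PostgreSQL keeps the first row of each group in ORDER BY order, so the result is only arbitrary when two apps tie on the count. A minimal sketch of one way to make it repeatable is to add a final tie-breaker to the ORDER BY. The demo_activations table is a hypothetical scratch copy of the question's data, and breaking ties alphabetically by app_id is an assumption, not something the question asked for.

```sql
-- Hypothetical scratch copy of the activations sample data.
CREATE TEMP TABLE demo_activations (
    id int, user_id_owner int, referral_id int,
    amount_earned numeric, activated_at int, app_id text
);
INSERT INTO demo_activations VALUES
    (2, 2, 3, 3.0, 3, 'a'),
    (4, 1, 1, 6.0, 5, 'b'),
    (5, 4, 4, 3.0, 6, 'c'),
    (1, 1, 2, 2.0, 2, 'b'),
    (3, 1, 2, 5.0, 4, 'b'),
    (6, 1, 2, 7.0, 8, 'a');

-- One row per referral: the app with the highest activation count.
-- The trailing app_id key decides ties, so repeated runs agree.
SELECT DISTINCT ON (referral_id)
       referral_id,
       app_id   AS best_selling_app,
       count(*) AS best_selling_app_count
FROM demo_activations
GROUP BY referral_id, app_id
ORDER BY referral_id, count(*) DESC, app_id;
-- With the sample data: (1, b, 1), (2, b, 2), (3, a, 1), (4, c, 1).
```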

Sql join multiple tables, get count of certain rows, and also check some rows satisfy condition

I have a Zoo, each Zoo has many Cages, each Cage has many Animals.
Zoo:
+----+
| Id |
+----+
| 1 |
| 2 |
+----+
Cage:
+----+-------+
| Id | ZooId |
+----+-------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 2 |
+----+-------+
Animal:
+----+--------+----------+
| Id | CageId | IsHungry |
+----+--------+----------+
| 1 | 1 | 0 |
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 4 | 2 | 1 |
| 5 | 3 | 0 |
| 6 | 4 | 0 |
| 7 | 5 | 0 |
+----+--------+----------+
I'm trying to design a query to show each Zoo, the number of cages in that Zoo, and whether or not the Zoo has hungry Animals.
Here are the results I expect:
+-------+-----------+--------------+
| ZooID | CageCount | AnyoneHungry |
+-------+-----------+--------------+
| 1 | 2 | 1 |
| 2 | 3 | 0 |
+-------+-----------+--------------+
I can get the number of Cages in a Zoo:
SELECT
[c].[ZooId],
COUNT(*) AS [NumCages]
FROM [Cage] [c]
GROUP BY [c].[ZooId]
ORDER BY [NumCages] DESC
I can determine if a Cage has a hungry animal or not:
SELECT CASE WHEN EXISTS (
SELECT NULL
FROM [Animal] [a]
WHERE [a].[CageId] = @CageId AND [a].[IsHungry] = 1
) THEN 1 ELSE 0 END
But I'm having trouble combining these two into a single query that runs efficiently (in this universe zoos are very popular and have millions of cages and animals).
SELECT
[c].[ZooId],
COUNT(*) AS [CageCount],
MAX(CONVERT(INT, [x].[AnyoneHungry])) AS [AnyoneHungry]
FROM [Cage] [c]
INNER JOIN (
SELECT [a].[CageId], MAX(CONVERT(INT, [a].[IsHungry])) AS [AnyoneHungry]
FROM [Animal] [a]
GROUP BY [a].[CageId]
) [x] on [x].[CageId] = [c].[Id]
GROUP BY [c].[ZooId]
I feel like I'm missing something and it should be possible to run this query using a simpler statement.
This should do
SELECT
Z.Id,
COUNT(DISTINCT C.Id) AS CageCount,
COALESCE(MAX(CAST(A.IsHungry AS INT)), 0) AS AnyHungry /*The cast is only required if A.IsHungry is BIT and not INT*/
FROM Zoo Z
LEFT JOIN Cage C ON Z.Id = C.ZooId
LEFT JOIN Animal A ON C.Id = A.CageId
GROUP BY Z.Id
If you only need the zoo id and hungry animals:
SELECT c.zooid,
COUNT(DISTINCT C.Id) as CageCount,
COALESCE(MAX(CONVERT(int, a.IsHungry)), 0) AS AnyHungry
FROM Cage C LEFT JOIN
Animal A
ON c.Id = a.CageId AND a.IsHungry = 1
GROUP BY c.zooid;
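A further variant, offered as a sketch rather than a benchmarked answer: count the cages and test for hungry animals in two separate correlated subqueries, so that cage rows are never multiplied by animal rows and the EXISTS test can stop at the first hungry animal it finds. The demo_ tables are hypothetical copies of the question's data, IsHungry is assumed to be an integer here, and whether this beats the join-and-aggregate form depends on indexes on Cage.ZooId and Animal.CageId.

```sql
-- Hypothetical scratch tables mirroring the question's sample data.
CREATE TABLE demo_zoo    (Id int PRIMARY KEY);
CREATE TABLE demo_cage   (Id int PRIMARY KEY, ZooId int NOT NULL);
CREATE TABLE demo_animal (Id int PRIMARY KEY, CageId int NOT NULL, IsHungry int NOT NULL);

INSERT INTO demo_zoo (Id) VALUES (1), (2);
INSERT INTO demo_cage (Id, ZooId) VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2);
INSERT INTO demo_animal (Id, CageId, IsHungry) VALUES
    (1, 1, 0), (2, 1, 0), (3, 1, 0), (4, 2, 1), (5, 3, 0), (6, 4, 0), (7, 5, 0);

SELECT z.Id AS ZooID,
       -- cages are counted on their own, so no DISTINCT is needed
       (SELECT COUNT(*) FROM demo_cage c WHERE c.ZooId = z.Id) AS CageCount,
       -- 1 as soon as any animal in any cage of this zoo is hungry
       CASE WHEN EXISTS (
                SELECT 1
                FROM demo_cage c
                JOIN demo_animal a ON a.CageId = c.Id
                WHERE c.ZooId = z.Id AND a.IsHungry = 1
            ) THEN 1 ELSE 0 END AS AnyoneHungry
FROM demo_zoo z
ORDER BY z.Id;
-- With the sample data: (1, 2, 1) and (2, 3, 0).
```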

SQL - Join with multiple condition

I'm trying to join my users table with my jobs table based on a mapping table users_jobs:
Here is what the users table looks like:
users
|--------|------------------|
| id | name |
|--------|----------------- |
| 1 | Ozzy Osbourne |
| 2 | Lemmy Kilmister |
| 3 | Ronnie James Dio |
| 4 | Jimmy Page |
|---------------------------|
jobs table looks like this:
|--------|-----------------|
| id | title |
|--------|-----------------|
| 1 | Singer |
| 2 | Guitar Player |
|--------------------------|
And users_jobs table looks like this:
|--------|-------------|-------------|---------------|-------------|
| id | user_id | job_id | column3 | column4 |
|--------|-------------|-------------|---------------|-------------|
| 1 | 1 | 1 | 0 | 1 |
| 2 | 2 | 1 | 1 | 0 |
| 3 | 3 | 1 | 0 | 1 |
| 4 | 4 | 2 | 1 | 0 |
|----------------------|-------------|---------------|-------------|
For example, let's say Ozzy runs a query.
Here is what he should see:
|--------|------------------|------------|--------- |
| id | name | column3 | column4 |
|--------|----------------- |------------|----------|
| 1 | Ozzy Osbourne | 0 | 1 |
| 2 | Lemmy Kilmister | 1 | 0 |
| 3 | Ronnie James Dio | 0 | 1 |
|---------------------------|------------|----------|
Basically, he should only see the job (role) he is registered in and the users who share that job.
I tried to do this:
SELECT u1.*, uj1.colum3, uj1.column4
FROM users AS u1
JOIN users_jobs AS uj1 ON uj1.user_id = 1
JOIN jobs AS j1 ON j1.id = up1.job_id
WHERE uj1.job_id = 1
Any help would be great!
It looks like you need an INNER JOIN. Try this:
select u.id, u.name, uj.column3, uj.column4 from users u
inner join users_jobs uj on u.id=uj.user_id
inner join jobs j on j.id=uj.job_id
where uj.job_id=1;
If you need to filter by a certain user_id:
select u.id, u.name, uj.column3, uj.column4 from users u
inner join users_jobs uj on u.id=uj.user_id
inner join jobs j on j.id=uj.job_id
where uj.job_id=1
and u.id=1;
I found a solution.
Using @stackFan's approach and adding an EXISTS clause to make sure that the requesting user is registered for the job:
SELECT u.id, u.name, uj.column3, uj.column4
FROM users u
INNER JOIN users_jobs uj on u.id = uj.user_id
INNER JOIN jobs j on j.id = uj.job_id
WHERE uj.job_id = <job-ID>
AND
EXISTS (
SELECT *
FROM users_jobs AS uj2
WHERE uj2.job_id = <job-ID>
AND uj2.user_id = <user-ID>
);
Try LEFT JOIN. It will display all users, whether they have a job or not.
SELECT u.id, u.name, uj.column3, uj.column4
FROM users AS u
LEFT JOIN users_jobs uj ON uj.user_id = u.id
LEFT JOIN jobs j ON j.id = uj.job_id
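If only the requesting user's id is known and not the job id, one possible sketch is to join users_jobs to itself: the first copy finds the job(s) of the requesting user, the second copy lists everyone registered for those jobs. The demo_ tables are hypothetical copies of the question's data, and the alias mine is made up for illustration.

```sql
-- Hypothetical scratch tables mirroring the question's sample data.
CREATE TABLE demo_users (id int PRIMARY KEY, name varchar(50));
CREATE TABLE demo_users_jobs (
    id int PRIMARY KEY, user_id int, job_id int, column3 int, column4 int
);

INSERT INTO demo_users (id, name) VALUES
    (1, 'Ozzy Osbourne'), (2, 'Lemmy Kilmister'),
    (3, 'Ronnie James Dio'), (4, 'Jimmy Page');
INSERT INTO demo_users_jobs (id, user_id, job_id, column3, column4) VALUES
    (1, 1, 1, 0, 1), (2, 2, 1, 1, 0), (3, 3, 1, 0, 1), (4, 4, 2, 1, 0);

-- mine: the mapping rows of the requesting user (user 1, Ozzy)
-- uj:   all mapping rows that share a job with him, including his own
SELECT u.id, u.name, uj.column3, uj.column4
FROM demo_users_jobs AS mine
JOIN demo_users_jobs AS uj ON uj.job_id = mine.job_id
JOIN demo_users AS u ON u.id = uj.user_id
WHERE mine.user_id = 1
ORDER BY u.id;
-- With the sample data: users 1, 2 and 3 are returned; Jimmy Page (job 2) is not.
```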

joining more than two tables without repeating values

I want to join three tables.
I have three tables, user, profession and education, where "uid" is the primary key of the user table and a foreign key in the other two tables. I want to join these tables to produce the result in one single table.
user profession education
+------+-------+ +-----+----------+ +-----+---------+
| uid | uName | | uid | profName | | uid | eduName |
+------+-------+ +-----+----------+ +-----+---------+
| 1 | aaa | | 1 | prof1 | | 1 | edu1 |
| 2 | bbb | | 1 | prof2 | | 1 | edu2 |
| 3 | ccc | | 2 | prof1 | | 1 | edu3 |
| | | | 3 | prof3 | | 3 | edu4 |
| | | | 3 | prof2 | | | |
+------+-------+ +-----+----------+ +-----+---------+
Expected output
+------+-------+-----+----------+-----+---------+
| uid | uName | uid | profName | uid | eduName |
+------+-------+-----+----------+-----+---------+
| 1 | aaa | 1 | prof1 | 1 | edu1 |
| null | null | 1 | prof2 | 1 | edu2 |
| null | null |null | null | 1 | edu3 |
| 2 | bbb | 2 | prof1 | null| null |
| 3 | ccc | 3 | prof3 | 3 | edu4 |
| null | null | 3 | prof2 | null| null |
+------+-------+-----+----------+-----+---------+
I tried following query
select u.uid ,u.uName,p.uid , p.profName,e.uid,e.eduName
from user u inner join profession p on u.uid=p.uid
inner join education e on u.uid = e.uid
where u.uid=p.uid
and u.uid=e.uid
and u.uid=1
Which gives me duplicate values
+------+-------+-----+----------+-----+---------+
| uid | uName | uid | profName | uid | eduName |
+------+-------+-----+----------+-----+---------+
| 1 | aaa | 1 | prof1 | 1 | edu1 |
| 1 | aaa | 1 | prof2 | 1 | edu1 |
| 1 | aaa | 1 | prof1 | 1 | edu2 |
| 1 | aaa | 1 | prof2 | 1 | edu2 |
| 1 | aaa | 1 | prof1 | 1 | edu3 |
| 1 | aaa | 1 | prof2 | 1 | edu3 |
+------+-------+-----+----------+-----+---------+
Is there a way to get the output without repeating the values?
Thanks
Bit of a swine this one.
I agree with @GordonLinoff that ideally this presentation would be done on the client side.
However, if we wish to do it in SQL, then the basic approach is that you have to get the maximum number of rows that will be consumed by each user (based on a count of how many entries they have in each of the professions and educations tables, and then of these counts, the max count).
Once we have the number of rows required for each user, we expand the rows out for each user as necessary using a numbers table (I've included a number generator for the purpose).
Then we join each table on, according to the uid and the row number of the entry in the joined table relative to the row number of the "expanded" rows for each user. Then we select the relevant columns, and that's us done. Pay the nurse on the way out!
WITH
number_table(number) AS
(
SELECT
(ones.n) + (10 * tens.n) + (100 * hundreds.n) AS number
FROM --available range 0 to 999
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS ones(n)
,(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS tens(n)
,(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS hundreds(n)
)
,users(u_uid, userName) AS
(
SELECT 1, 'aaa'
UNION ALL
SELECT 2, 'bbb'
UNION ALL
SELECT 3, 'ccc'
)
,professions(p_u_uid, profName) AS
(
SELECT 1, 'prof1'
UNION ALL
SELECT 1, 'prof2'
UNION ALL
SELECT 2, 'prof1'
UNION ALL
SELECT 3, 'prof3'
UNION ALL
SELECT 3, 'prof2'
)
,educations(e_u_uid, eduName) AS
(
SELECT 1, 'edu1'
UNION ALL
SELECT 1, 'edu2'
UNION ALL
SELECT 1, 'edu3'
UNION ALL
SELECT 3, 'edu4'
)
,row_counts(uid, row_count) AS
(
SELECT u_uid, COUNT(u_uid) FROM users GROUP BY u_uid
UNION ALL
SELECT p_u_uid, COUNT(p_u_uid) FROM professions GROUP BY p_u_uid
UNION ALL
SELECT e_u_uid, COUNT(e_u_uid) FROM educations GROUP BY e_u_uid
)
,max_counts(uid, max_count) AS
(
SELECT uid, MAX(row_count) FROM row_counts GROUP BY uid
)
SELECT
u_uid
,userName
,p_u_uid
,profName
,e_u_uid
,eduName
FROM
max_counts
INNER JOIN
number_table ON number BETWEEN 1 AND max_count
LEFT JOIN
(
SELECT u_uid, userName, ROW_NUMBER() OVER (PARTITION BY u_uid ORDER BY userName) AS user_match
FROM users
) AS users
ON u_uid = uid
AND number = user_match
LEFT JOIN
(
SELECT p_u_uid, profName, ROW_NUMBER() OVER (PARTITION BY p_u_uid ORDER BY profName) AS prof_match
FROM professions
) AS professions
ON p_u_uid = uid
AND number = prof_match
LEFT JOIN
(
SELECT e_u_uid, eduName, ROW_NUMBER() OVER (PARTITION BY e_u_uid ORDER BY eduName) AS edu_match
FROM educations
) AS educations
ON e_u_uid = uid
AND number = edu_match
ORDER BY
IIF(COALESCE(u_uid, p_u_uid, e_u_uid) IS NULL, 1, 0) ASC --nulls last
,COALESCE(u_uid, p_u_uid, e_u_uid) ASC
,IIF(COALESCE(p_u_uid, e_u_uid) IS NULL, 1, 0) ASC --nulls last
,COALESCE(p_u_uid, e_u_uid) ASC
,IIF(e_u_uid IS NULL, 1, 0) ASC --nulls last
,e_u_uid ASC
And the results:
u_uid userName p_u_uid profName e_u_uid eduName
----------- -------- ----------- -------- ----------- -------
1 aaa 1 prof1 1 edu1
NULL NULL 1 prof2 1 edu2
NULL NULL NULL NULL 1 edu3
2 bbb 2 prof1 NULL NULL
3 ccc 3 prof2 3 edu4
NULL NULL 3 prof3 NULL NULL
Did you try the DISTINCT keyword?
select DISTINCT u.uid ,u.uName,p.uid , p.profName,e.uid,e.eduName
from user u inner join profession p on u.uid=p.uid
inner join education e on u.uid = e.uid
where u.uid=p.uid
and u.uid=e.uid
and u.uid=1

SQL Count from Sub Table

I want to count some columns from sub table. My table structure is below:
Persons
+-----+-------+---------+
| Pid | Name  | Surname |
+-----+-------+---------+
| 1   | Per A | D       |
| 2   | Per B | E       |
| 3   | Per C | F       |
+-----+-------+---------+
Childs
+-----+----------+---------+-------+-----+
| Cid | CName    | School  | Sex   | Pid |
+-----+----------+---------+-------+-----+
| 1   | John     | High    | Man   | 1   |
| 2   | Alice    | Primary | Woman | 2   |
| 3   | Mel      | High    | Man   | 3   |
| 4   | Angelina | High    | Woman | 2   |
+-----+----------+---------+-------+-----+
So I want to output
+-----+---------+------+---------+-------+-----+------------+
| Pid | PerName | High | Primary | Woman | Man | ChildCount |
+-----+---------+------+---------+-------+-----+------------+
| 1   | Per A   | 1    | 0       | 0     | 1   | 1          |
| 2   | Per B   | 1    | 1       | 2     | 0   | 2          |
| 3   | Per C   | 1    | 0       | 0     | 1   | 1          |
+-----+---------+------+---------+-------+-----+------------+
How can I get this output?
I tried the method below, but I have many more columns like these to calculate from the Childs table, so the query becomes slow.
select Pid,Name,Surname,
(select count(*) from Childs where Persons.Pid=Childs.Pid) ChildCount,
(select count(*) from Childs where Persons.Pid=Childs.Pid and School='Primary') Primary
from Persons
You can do this with join and conditional aggregation:
select p.Pid, p.Name,
sum(case when c.school = 'High' then 1 else 0 end) as high,
sum(case when c.school = 'Primary' then 1 else 0 end) as "Primary", -- PRIMARY is a reserved word, so the alias is quoted
sum(case when c.sex = 'Man' then 1 else 0 end) as Man,
sum(case when c.sex = 'Woman' then 1 else 0 end) as Woman,
count(c.Cid) as ChildCount -- counts matched children only, so a person without children gets 0
from persons p left join
childs c
on p.pid = c.pid
group by p.Pid, p.Name;
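One detail worth checking with a LEFT JOIN is what exactly gets counted: COUNT(*) counts the joined row even when no child matched, while COUNT(c.Cid) counts only rows with a non-NULL child id. The sketch below shows the difference. The demo_ tables are hypothetical copies of the question's data, and 'Per D' is an invented person without children, added only to make the difference visible.

```sql
-- Hypothetical scratch tables; 'Per D' is invented and has no children.
CREATE TABLE demo_persons (Pid int PRIMARY KEY, Name varchar(20), Surname varchar(20));
CREATE TABLE demo_childs (
    Cid int PRIMARY KEY, CName varchar(20), School varchar(20),
    Sex varchar(10), Pid int
);

INSERT INTO demo_persons (Pid, Name, Surname) VALUES
    (1, 'Per A', 'D'), (2, 'Per B', 'E'), (3, 'Per C', 'F'), (4, 'Per D', 'G');
INSERT INTO demo_childs (Cid, CName, School, Sex, Pid) VALUES
    (1, 'John', 'High', 'Man', 1),
    (2, 'Alice', 'Primary', 'Woman', 2),
    (3, 'Mel', 'High', 'Man', 3),
    (4, 'Angelina', 'High', 'Woman', 2);

SELECT p.Pid, p.Name,
       COUNT(*)     AS rows_after_join,  -- 1 for Per D: the unmatched row is still a row
       COUNT(c.Cid) AS ChildCount,       -- 0 for Per D: NULL child ids are not counted
       SUM(CASE WHEN c.School = 'High' THEN 1 ELSE 0 END) AS High
FROM demo_persons p
LEFT JOIN demo_childs c ON c.Pid = p.Pid
GROUP BY p.Pid, p.Name
ORDER BY p.Pid;
-- With this data: (1, Per A, 1, 1, 1), (2, Per B, 2, 2, 1),
--                 (3, Per C, 1, 1, 1), (4, Per D, 1, 0, 0).
```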
Try This One:
select Pid,Name,Surname,
ifNull((select count(*) from Childs where Persons.Pid=Childs.Pid),0) ChildCount,
ifNull((select count(*) from Childs where Persons.Pid=Childs.Pid AND School='High' GROUP By Childs.Pid),0) High,
ifNull((select count(*) from Childs where Persons.Pid=Childs.Pid AND School='Primary' GROUP By Childs.Pid),0) 'primary',
ifNull((select count(*) from Childs where Persons.Pid=Childs.Pid AND Sex='Woman' GROUP By Childs.Pid),0) Woman,
ifNull((select count(*) from Childs where Persons.Pid=Childs.Pid AND Sex='Man' GROUP By Childs.Pid),0) Man
from Persons;