PostgreSQL Waiting List Query - sql

I have a table that contains WaitingListPosition records, which is essentially a ticket to signify a user is waiting to join a membership for a Club, which has its own record in the Clubs table. In the WaitingListPositions table there are multiple positions for various Clubs. This is what the table currently looks like with a SELECT query:
id | club_id | location_id | user_id | child_id | created_at
----+---------+-------------+---------+----------+-------------------------------
2 | 8 | 3 | 10 | | 2021-07-05 12:49:24.036091+00
4 | 8 | 3 | 33 | | 2021-07-05 12:55:57.54674+00
5 | 9 | 5 | | 5 | 2021-07-05 12:58:16.319837+00
I have an API that returns an array of WaitingListPosition objects, containing all the WaitingListPositions that the user has. I need to get the following data out but have been unsuccessful:
Each object must contain all the WaitingListPosition data
The current position in the waiting list (the list is ordered by created_at (timestamp) field)
The total amount of WaitingListPositions that correspond to the same Club of the current WaitingListPosition object
I need to figure out a query that would produce the following results for the API (using the previous data as example):
id | club_id | location_id | user_id | child_id | created_at | position | total_count
----+---------+-------------+---------+----------+-------------------------------+-----------+---------------
2 | 8 | 3 | 10 | | 2021-07-05 12:49:24.036091+00 | 1 | 2
4 | 8 | 3 | 33 | | 2021-07-05 12:55:57.54674+00 | 2 | 2
5 | 9 | 5 | | 5 | 2021-07-05 12:58:16.319837+00 | 1 | 1
I was able to get the total_count into the query without any problems, perhaps not in the most efficient way, by running a subquery of all the memberships where the user/child is involved and then getting the count for each club. That worked fine. I just can't for the life of me get the position...
This is what I've got so far, obviously the position is incorrect and is taking the row number of the wrong data.
WITH total_counts AS (
    SELECT club_id,
           COUNT(id)
    FROM ClubWaitingListPositions
    WHERE club_id = ANY((SELECT DISTINCT(club_id)
                         FROM ClubWaitingListPositions
                         WHERE user_id = $1
                            OR (child_id = ANY(SELECT id
                                               FROM Children
                                               WHERE $1 = ANY(parents)))))
    GROUP BY club_id
),
positions AS (
    SELECT results.*,
           ClubLocations.location_name AS location_name,
           Clubs.name AS club_name
    FROM (SELECT ClubWaitingListPositions.*,
                 ROW_NUMBER() OVER(ORDER BY created_at) AS position,
                 (SELECT count
                  FROM total_counts
                  WHERE club_id = ClubWaitingListPositions.club_id) AS total_count
          FROM ClubWaitingListPositions) results
    INNER JOIN ClubLocations ON ClubLocations.id = results.location_id
    INNER JOIN Clubs ON ClubLocations.club_id = Clubs.id
    WHERE (user_id = $1)
       OR (child_id = ANY(SELECT id FROM Children WHERE $1 = ANY(parents)))
)
SELECT positions.*,
CONCAT(Users.first_name, ' ', Users.last_name) AS member_name
FROM positions
INNER JOIN Users ON Users.id = positions.user_id
WHERE user_id IS NOT NULL
UNION
SELECT positions.*,
CONCAT(Children.first_name, ' ', Children.last_name) AS member_name
FROM positions
INNER JOIN Children ON Children.id = positions.child_id
WHERE child_id IS NOT NULL
ORDER BY created_at DESC
Any help appreciated ! :)
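One possible direction, sketched under assumptions: the position has to be numbered per club, over all rows of that club, before the result is narrowed down to one user's rows. The reduced table waiting_list below is hypothetical and stands in for ClubWaitingListPositions; the literal 33 stands in for the $1 parameter, and the child lookup and the joins to Clubs, ClubLocations and Users are left out.

```sql
-- Hypothetical, reduced stand-in for ClubWaitingListPositions.
CREATE TEMP TABLE waiting_list (
    id          integer PRIMARY KEY,
    club_id     integer NOT NULL,
    location_id integer NOT NULL,
    user_id     integer,
    child_id    integer,
    created_at  timestamptz NOT NULL
);

INSERT INTO waiting_list VALUES
    (2, 8, 3, 10,   NULL, '2021-07-05 12:49:24.036091+00'),
    (4, 8, 3, 33,   NULL, '2021-07-05 12:55:57.54674+00'),
    (5, 9, 5, NULL, 5,    '2021-07-05 12:58:16.319837+00');

-- Number every row within its club first, then filter in the outer query.
-- Filtering inside the subquery would renumber only the user's own rows.
SELECT *
FROM (
    SELECT w.*,
           ROW_NUMBER() OVER (PARTITION BY club_id ORDER BY created_at) AS position,
           COUNT(*)     OVER (PARTITION BY club_id)                     AS total_count
    FROM waiting_list w
) ranked
WHERE user_id = 33;   -- stands in for the $1 parameter
-- For the sample rows this returns id 4 with position 2 and total_count 2.
```

Because both window functions partition by club_id, the separate total_counts CTE would no longer be needed in this form.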

Related

Postgres create view with column values based on another table?

I'm implementing a view to store leaderboard data of the top 10 users that is computed using an expensive COUNT(*). I'm planning on the view to look something like this:
id SERIAL PRIMARY KEY
user_id TEXT
type TEXT
rank INTEGER
count INTEGER
-- adding an index to user_id
-- adding a two-column unique index to user_id and type
I'm having trouble with seeing how this view should be created to properly account for the rank and type. Essentially, I have a big table (~30 million rows) like this:
+----+---------+---------+----------------------------+
| id | user_id | type | created_at |
+----+---------+---------+----------------------------+
| 1 | 1 | Diamond | 2021-05-11 17:35:18.399517 |
| 2 | 1 | Diamond | 2021-05-12 17:35:17.399517 |
| 3 | 1 | Diamond | 2021-05-12 17:35:18.399517 |
| 4 | 2 | Diamond | 2021-05-13 17:35:18.399517 |
| 5 | 1 | Clay | 2021-05-14 17:35:18.399517 |
| 6 | 1 | Clay | 2021-05-15 17:35:18.399517 |
+----+---------+---------+----------------------------+
With the table above, I'm trying to achieve something like this:
+----+---------+---------+------+-------+
| id | user_id | type | rank | count |
+----+---------+---------+------+-------+
| 1 | 1 | Diamond | 1 | 3 |
| 2 | 2 | Diamond | 2 | 1 |
| 3 | 1 | Clay | 1 | 2 |
| 4 | 1 | Weekly | 1 | 5 | -- 3 diamonds + 2 clay obtained between Mon-Sun
| 5 | 2 | Weekly | 2 | 1 |
+----+---------+---------+------+-------+
By Weekly I am counting the time from the last Sunday to the upcoming Sunday.
Is this doable using only SQL, or is some kind of script needed? If doable, how would this be done? It's worth mentioning that there are thousands of different types, so not having to manually specify type would be preferred.
If there's anything unclear, please let me know and I'll do my best to clarify. Thanks!
The "weekly" rows are produced in a different way compared to the "user" rows (I called them two different "categories"). To get the result you want you can combine two queries using UNION ALL.
For example:
select 'u' as category, user_id, type,
rank() over(partition by type order by count(*) desc) as rk,
count(*) as cnt
from scores
group by user_id, type
union all
select 'w', user_id, 'Weekly',
rank() over(order by count(*) desc),
count(*) as cnt
from scores
group by user_id
order by category, type desc, rk
Result:
category user_id type rk cnt
--------- -------- -------- --- ---
u 1 Diamond 1 3
u 2 Diamond 2 1
u 1 Clay 1 2
w 1 Weekly 1 5
w 2 Weekly 2 1
See running example at DB Fiddle.
Note: For the sake of simplicity I left the filtering by timestamp out of the query. If you really needed to include only the rows of the last 7 days (or other period of time), it would be a matter of adding a WHERE clause in both subqueries.
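As a rough illustration of that WHERE clause, here is a sketch of the weekly branch only, assuming the week runs from the most recent Sunday (inclusive) to the next Sunday (exclusive) as described in the question. The table definition is a hypothetical stand-in for scores; the date arithmetic is one way to get a Sunday-based week, since date_trunc('week', ...) snaps to Monday.

```sql
-- Hypothetical stand-in for the scores table from the question.
CREATE TEMP TABLE scores (
    id         serial PRIMARY KEY,
    user_id    integer NOT NULL,
    type       text NOT NULL,
    created_at timestamp NOT NULL
);

-- Weekly branch only: rank users by the number of rows created
-- between the most recent Sunday and the upcoming Sunday.
-- Shifting by one day before truncating moves the week start
-- from Monday to Sunday.
SELECT user_id,
       'Weekly' AS type,
       rank() OVER (ORDER BY count(*) DESC) AS rk,
       count(*) AS cnt
FROM scores
WHERE created_at >= date_trunc('week', current_date + 1) - interval '1 day'
  AND created_at <  date_trunc('week', current_date + 1) + interval '6 days'
GROUP BY user_id;
```

A half-open range (>= start, < end) avoids counting a row that falls exactly on the boundary in two consecutive weeks.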
I think this is what you were talking about, right?
WITH scores_plus_weekly AS ((
SELECT id, user_id, 'Weekly' AS type, created_at
FROM scores
WHERE created_at BETWEEN '2021-05-10' AND '2021-05-17'
)
UNION (
SELECT * FROM scores
))
SELECT
row_number() OVER (ORDER BY CASE "type" WHEN 'Diamond' THEN 0 WHEN 'Clay' THEN 1 ELSE 2 END, count(*) DESC) as "id",
user_id,
"type",
row_number() OVER (PARTITION BY "type" ORDER BY count(*) DESC) as "rank",
count(*)
FROM scores_plus_weekly
GROUP BY user_id, "type"
ORDER BY "id";
I'm sure this is not the only way, but I thought the result wasn't too complex. This query first combines the original table with a copy of all scores from this week, relabeled as type 'Weekly'. For the sake of consistency I picked a date range that matches your entire example set. It then groups by user_id and type to get the counts for each combination. The two row_number() calls give you the overall id and the rank per type. A big part of this query consists of sorting by type, so if you're joining another table that contains the order or priority of the types, the CASE can probably be simplified.
Then, lastly, this entire query can be wrapped in a view by prefixing it with CREATE VIEW score_ranks AS.

Getting a distinct value from one column if all rows matches a certain criteria

I'm trying to find a performant and easy-to-read query to get a distinct value from one column, if all rows for that value match a certain criterion.
I have a table that tracks e-commerce orders and whether they were delivered on time, with contents and schema as follows:
> select * from orders;
+----+--------------------+-------------+
| id | delivered_on_time | customer_id |
+----+--------------------+-------------+
| 1 | 1 | 9 |
| 2 | 0 | 9 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
| 5 | 0 | 11 |
+----+--------------------+-------------+
I would like to get all distinct customer_id's which have had all their orders delivered on time. I.e. I would like an output like this:
+-------------+
| customer_id |
+-------------+
| 10 |
+-------------+
What's the best way to do this?
I've found a solution, but it's a bit hard to read and I doubt it's the most efficient way to do it (using double CTE's):
> with hits_all as (
select memberid,count(*) as count from orders group by memberid
),
hits_true as
(select memberid,count(*) as count from orders where hit = true group by memberid)
select
*
from
hits_true
inner join
hits_all on
hits_all.memberid = hits_true.memberid
and hits_all.count = hits_true.count;
+----------+-------+----------+-------+
| memberid | count | memberid | count |
+----------+-------+----------+-------+
| 10 | 2 | 10 | 2 |
+----------+-------+----------+-------+
You can use group by and having as follows:
select customer_id
from orders
group by customer_id
having sum(delivered_on_time) = count(*)
This works because an on-time delivery is identified by delivered_on_time = 1, so the sum of delivered_on_time equals the number of records for a customer exactly when every one of that customer's orders was on time.
You can use aggregation and having:
select customer_id
from orders
group by customer_id
having min(delivered_on_time) = 1;
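If delivered_on_time is actually stored as a boolean rather than as 0/1 (the asker's own CTE compares hit = true, which hints at that), sum() is not defined for it in PostgreSQL. A minimal sketch for that case uses the bool_and aggregate; the table orders_bool is a hypothetical boolean variant of the sample data.

```sql
-- Hypothetical boolean variant of the orders table from the question.
CREATE TEMP TABLE orders_bool (
    id                integer PRIMARY KEY,
    delivered_on_time boolean NOT NULL,
    customer_id       integer NOT NULL
);

INSERT INTO orders_bool VALUES
    (1, true,  9),
    (2, false, 9),
    (3, true,  10),
    (4, true,  10),
    (5, false, 11);

-- bool_and is true only if every row in the group is true.
SELECT customer_id
FROM orders_bool
GROUP BY customer_id
HAVING bool_and(delivered_on_time);
-- For the sample rows this returns customer_id 10.
```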

Postgres query IN query

Is it possible in Postgres to determine if at least one result of query 1 is inside query 2 results?
For example:
SELECT * FROM items
WHERE
(SELECT id FROM users) IN (SELECT user_id FROM user_items WHERE item_id = 1)
I know that this query may be nonsense; I'm just asking how to do that check in the where clause. In my real query (which is more complex), I'm getting:
(Postgrex.Error) ERROR 21000 (cardinality_violation): more than one row returned by a subquery used as an expression
if there is more than one result from query1 (query1 IN query2)
EDIT
select user_id
from notification_token n
join notification_folder f on n.user_id = f.user_id
where ((SELECT tag_id FROM notification_folder_tag WHERE notification_folder_id = f.id) IN (SELECT tag_id FROM event_tag WHERE event_id = 1))
tables:
notification_token
| user_id | notification_token |
--------------------------------------------------
| 1 | token1 |
| 2 | token2 |
| 3 | token3 |
notification_folder
| user_id | data |
--------------------------------------------------
| 1 | "useless string" |
notification_folder_tag
| notification_folder_id | tag_id |
--------------------------------------------------
| 1 | 1 |
| 1 | 2 |
| 2 | 5 |
event_tag
| event_id | tag_id |
--------------------------------------------------
| 1 | 1 |
| 2 | 2 |
| 3 | 8 |
The result that I want is user_id 1 from notification_token.
The where clause should be true because at least one tag_id from the left side of the IN (result 1, 2) is contained in the right side of the IN (result 1).
However, I get the error whenever the left side of the IN returns more than one row. It works properly with just one row.
Try this
SELECT * FROM items
WHERE
EXISTS (SELECT 1 FROM users u
WHERE u.id IN
(SELECT user_id FROM user_items WHERE item_id = 1));
If this doesn't work, go for relational database queries.
You seem to want items anyone who ordered item_1 has also ordered. If this interpretation is correct, then here is one way to write the query:
select distinct i.*
from items i join
user_items ui
on ui.item_id = i.item_id
where ui.user_id in (select ui2.user_id
from user_items ui2
where ui2.item_id = 1
);
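For the query in the question's EDIT, the "at least one value in common" check can be written as an EXISTS over a join of the two tag sets. The following is a sketch under assumptions: the column types are guessed from the sample data, and notification_folder is given an id column (not shown in the question's table listing, but referenced as f.id in its query).

```sql
-- Table shapes guessed from the question; notification_folder.id is assumed.
CREATE TEMP TABLE notification_token (user_id integer, notification_token text);
CREATE TEMP TABLE notification_folder (id integer, user_id integer, data text);
CREATE TEMP TABLE notification_folder_tag (notification_folder_id integer, tag_id integer);
CREATE TEMP TABLE event_tag (event_id integer, tag_id integer);

INSERT INTO notification_token VALUES (1, 'token1'), (2, 'token2'), (3, 'token3');
INSERT INTO notification_folder VALUES (1, 1, 'useless string');
INSERT INTO notification_folder_tag VALUES (1, 1), (1, 2), (2, 5);
INSERT INTO event_tag VALUES (1, 1), (2, 2), (3, 8);

-- EXISTS is true as soon as one folder tag matches one tag of event 1,
-- so it does not matter how many rows either side produces.
SELECT n.user_id
FROM notification_token n
JOIN notification_folder f ON f.user_id = n.user_id
WHERE EXISTS (
    SELECT 1
    FROM notification_folder_tag ft
    JOIN event_tag et ON et.tag_id = ft.tag_id
    WHERE ft.notification_folder_id = f.id
      AND et.event_id = 1
);
-- For the sample rows this returns user_id 1.
```

The original error comes from using a multi-row subquery as a scalar on the left side of IN; EXISTS avoids that because it only tests whether any row is produced.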

Find number of rows identical on some columns, but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
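To illustrate the note about NULL values, here is a small sketch in which REF_VALUE is NULL in both rows. The table data_with_nulls is a hypothetical copy of the question's schema. With plain =, the two rows would never match, because NULL = NULL is not true; is not distinct from treats two NULLs as equal.

```sql
-- Hypothetical copy of the question's table, with NULLs in ref_value.
CREATE TEMP TABLE data_with_nulls (
    project_id varchar,
    task_id    varchar,
    ref_id     varchar,
    ref_value  varchar
);

INSERT INTO data_with_nulls VALUES
    ('1', '1', '1', NULL),
    ('1', '2', '1', NULL);

-- NULL-safe version of the self join.
SELECT d1.task_id AS task_id_1, d2.task_id AS task_id_2, count(*)
FROM data_with_nulls d1
JOIN data_with_nulls d2
  ON d1.project_id IS NOT DISTINCT FROM d2.project_id
 AND d1.ref_id     IS NOT DISTINCT FROM d2.ref_id
 AND d1.ref_value  IS NOT DISTINCT FROM d2.ref_value
 AND d1.task_id <> d2.task_id
GROUP BY d1.task_id, d2.task_id;
-- For the sample rows this returns the pairs (1, 2) and (2, 1), each with count 1.
```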
This is how I understand your question. This assumes there are only two tasks for the same combination.
SQL DEMO
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I added more columns to make clear which elements are the same:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |

Select all users who wrote a minimum amount of messages in a fixed time frame

Table user_message:
+----+---------+-------+------------+
| id | from_id | to_id | time_stamp |
+----+---------+-------+------------+
| 1 | 1 | 2 | 1414700000 |
| 2 | 2 | 1 | 1414700100 |
| 3 | 3 | 1 | 1414701000 |
| 4 | 3 | 2 | 1414701001 |
| 5 | 3 | 4 | 1414701002 |
| 6 | 1 | 3 | 1414701100 |
+----+---------+-------+------------+
I am now trying to get all users who wrote a minimum amount of messages, let's say 3, to other users in a fixed time frame, let's say 5 seconds. As in this example, I'd like to get a result looking similar to this:
+---------+-------+
| from_id | count |
+---------+-------+
| 3 | 3 |
+---------+-------+
The idea of this is to check the messages for spam. A nice bonus would be to only take messages into account that share the same content.
The following uses a join for this purpose:
select um.*, count(*) as cnt
from user_message um join
user_message um2
on um.from_id = um2.from_id and
um2.time_stamp between um.time_stamp and um.time_stamp + 4 -- 5-second window, inclusive
group by um.id
having count(*) >= 3;
For performance, you would want an index on user_message(from_id, time_stamp). Even with the index, if you have a large-ish table, the performance might not be so great.
EDIT:
Actually, another way to write this that might be more efficient is:
select um.*,
(select count(*)
from user_message um2
where um.from_id = um2.from_id and
um2.time_stamp between um.time_stamp and um.time_stamp + 4 -- 5-second window, inclusive
) as cnt
from user_message um
having cnt >= 3;
This uses a MySQL extension that allows having in a non-aggregation query.
For every message (u1), find all messages (u2) sent by the same user in the same second or the four previous seconds. Keep those u1 that have at least 3 matching u2. Finally, group by from_id to show one record per from_id with the maximum number of messages sent within such a window.
select from_id, max(cnt) as max_count
from
(
select u1.id, u1.from_id, count(*) as cnt
from user_message u1
join user_message u2
on u2.from_id = u1.from_id
-- and u2.content = u1.content
and u2.time_stamp between u1.time_stamp - 4 and u1.time_stamp
group by u1.id, u1.from_id
having count(*) >= 3
) as init
group by from_id;
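The same "this second or the four previous seconds" count can also be sketched without a self join, using a window frame with a RANGE offset. This is an alternative formulation, not taken from the answers above; it assumes PostgreSQL 11 or later (where RANGE with an offset is supported) and an integer time_stamp column. The table definition is a guess based on the sample data.

```sql
-- Table shape guessed from the question's sample data.
CREATE TEMP TABLE user_message (
    id         integer PRIMARY KEY,
    from_id    integer NOT NULL,
    to_id      integer NOT NULL,
    time_stamp integer NOT NULL
);

INSERT INTO user_message VALUES
    (1, 1, 2, 1414700000),
    (2, 2, 1, 1414700100),
    (3, 3, 1, 1414701000),
    (4, 3, 2, 1414701001),
    (5, 3, 4, 1414701002),
    (6, 1, 3, 1414701100);

-- For each message, count the sender's messages whose time_stamp lies
-- in [time_stamp - 4, time_stamp], then keep senders that reach 3.
SELECT from_id, max(cnt) AS max_count
FROM (
    SELECT from_id,
           count(*) OVER (PARTITION BY from_id
                          ORDER BY time_stamp
                          RANGE BETWEEN 4 PRECEDING AND CURRENT ROW) AS cnt
    FROM user_message
) windowed
WHERE cnt >= 3
GROUP BY from_id;
-- For the sample rows this returns from_id 3 with max_count 3.
```

To count only messages with identical content, a content column (not present in the sample table) could be added to the PARTITION BY list.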