PostgreSQL: Students who took more than one test on the date of their most recent test

PostgreSQL
Data:
Tests:
- student (name, all unique)
- date (MM/DD, assume same year)
Example:
Tests:
student | date
aa | 01/01
aa | 01/01
bb | 01/01
bb | 01/02
Expected output:
student | date
aa | 01/01
bb is excluded because bb took only one test on their most recent test date (01/02). The output should list the students who took two or more tests on the date of their most recent test.

Your query never restricts the rows to each student's most recent test date.
So I took your query and added a subquery that finds this date for each student (DISTINCT ON keeps the first row per student after ordering by date descending). Joining that with your query filters out every other test date, so the GROUP BY / HAVING only counts tests on the most recent date.
SELECT
*
FROM exams e
JOIN (
SELECT DISTINCT ON (e.student)
*
FROM exams e
ORDER BY e.student, e.date DESC
) s USING (student, date)
GROUP BY e.student, e.date
HAVING COUNT(e.date) >= 2
ORDER BY e.student

Here is one way, using analytic functions:
SELECT student, date
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY student ORDER BY date DESC) rn,
COUNT(*) OVER (PARTITION BY student, date) cnt
FROM exams
) t
WHERE rn = 1 AND cnt > 1;
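The window-function approach above can be tried locally with a small sketch. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added), and the helper name multi_test_on_latest_date is made up for this example. It also relies on the question's assumption that MM/DD strings within one year sort correctly as text.

```python
import sqlite3

def multi_test_on_latest_date(rows):
    """Return (student, date) pairs where the student's latest test date has 2+ tests."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
    con.execute("CREATE TABLE exams (student TEXT, date TEXT)")
    con.executemany("INSERT INTO exams VALUES (?, ?)", rows)
    query = """
        SELECT student, date
        FROM (
            SELECT student, date,
                   -- rn = 1 marks a row on the student's most recent date
                   ROW_NUMBER() OVER (PARTITION BY student ORDER BY date DESC) AS rn,
                   -- cnt = number of tests the student took on that same date
                   COUNT(*) OVER (PARTITION BY student, date) AS cnt
            FROM exams
        ) t
        WHERE rn = 1 AND cnt > 1
        ORDER BY student
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Sample data from the question
sample = [("aa", "01/01"), ("aa", "01/01"), ("bb", "01/01"), ("bb", "01/02")]
print(multi_test_on_latest_date(sample))  # [('aa', '01/01')]
```

Because rn = 1 picks exactly one row per student, no DISTINCT or GROUP BY is needed in the outer query.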

Related

How to use FIND_IN_SET and sum column in my SQL query

Can anyone help me? I have a table result like this:
id_user | score | type
001 | 30 | play
001 | 40 | play
001 | 30 | redeem
002 | 20 | play
002 | 30 | redeem
I want to sum the score column grouped by id_user for type 'play', and then show a ranking using find_in_set. This is the result I want to display:
id_user | total | rank
001 | 70 | 1
002 | 20 | 2
Previously I used the rank() function in MySQL version 10.4, but it does not work in MySQL version 15.1. This is my previous query:
SELECT id_user, SUM(score) AS total,
RANK() OVER (ORDER BY total DESC) AS rank
FROM result
WHERE type='play'
GROUP BY id_user
I have made some changes to your query and it works now. Inside the OVER() clause of RANK(), the ORDER BY has to use SUM(score) instead of the column alias total. And since RANK is a reserved word, I used rnk as the alias instead.
DB-Fiddle:
create table result (id_user varchar(5), score int, type varchar(20));
insert into result values('001',30 ,'play');
insert into result values('001',40 ,'play');
insert into result values('001',30 ,'redeem');
insert into result values('002',20 ,'play');
insert into result values('002',30 ,'redeem');
Query:
select id_user, SUM(score) AS total, RANK() OVER (ORDER BY SUM(score) DESC) AS rnk FROM result where type='play' GROUP BY id_user
Output:
id_user | total | rnk
001 | 70 | 1
002 | 20 | 2
If your MySQL version doesn't support rank(), you can use a subquery to achieve the same result:
Query:
-- rank = 1 + number of users with a strictly higher total
select t.id_user, t.total,
(select count(*) from (select sum(score) as total from result
where type='play'
group by id_user) t2
where t2.total > t.total)+1 AS rnk
from (select id_user, sum(score) as total from result
where type='play'
group by id_user) t
order by rnk
Output:
id_user | total | rnk
001 | 70 | 1
002 | 20 | 2
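The idea behind ranking without RANK() is that a user's rank equals one plus the number of users whose total is strictly higher. The sketch below checks that idea with Python's built-in sqlite3 module standing in for MySQL (an assumption made only so the example runs anywhere). The function name rank_play_totals and the third user '003' are hypothetical additions, included to show that the count still works with more than two users.

```python
import sqlite3

def rank_play_totals(rows):
    """Rank users by their summed 'play' score without any window function."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
    con.execute("CREATE TABLE result (id_user TEXT, score INTEGER, type TEXT)")
    con.executemany("INSERT INTO result VALUES (?, ?, ?)", rows)
    query = """
        SELECT t.id_user, t.total,
               -- rank = 1 + number of users with a strictly higher total
               (SELECT COUNT(*)
                FROM (SELECT SUM(score) AS total FROM result
                      WHERE type = 'play' GROUP BY id_user) t2
                WHERE t2.total > t.total) + 1 AS rnk
        FROM (SELECT id_user, SUM(score) AS total FROM result
              WHERE type = 'play' GROUP BY id_user) t
        ORDER BY rnk, t.id_user
    """
    ranked = con.execute(query).fetchall()
    con.close()
    return ranked

rows = [
    ("001", 30, "play"), ("001", 40, "play"), ("001", 30, "redeem"),
    ("002", 20, "play"), ("002", 30, "redeem"),
    ("003", 10, "play"),  # hypothetical extra user, not in the question
]
print(rank_play_totals(rows))  # [('001', 70, 1), ('002', 20, 2), ('003', 10, 3)]
```

Users with equal totals receive the same rank and the next rank is skipped, which matches the behaviour of RANK() rather than DENSE_RANK().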

Doing a distinct count on an employee history table, based on departments at a current point in time

So I have an employee table with data on all employees since the beginning, and it contains everything I should need: the employee's startdate and enddate (null if there is none) and the name of the department. If an employee changes department, that employee gets a new row with the new department value, and two date columns, "DepValidFrom" and "DepValidto", give the period during which the employee was in that department.
My goal is a matrix with the departments as rows, year and month as columns, and the number of employees in each department at that time as values. I have all the data; I just cannot find the right way to write my Power BI measure, or perhaps even a SQL query.
So.... I am trying to pull this into Power BI, and I am getting an incomplete view. I want my data to look like the following:
Department | Jan | Feb | Mar | Apr |
Dep1 | 3 | 5 | 6 | 4 |
Dep2 | 2 | 3 | 2 | 3 |
Dep3 | 1 | 1 | 2 | 3 |
Right now I am just using a very simple DISTINCTCOUNT(Emp_Table[EmployeeInitials]) which gives me an incomplete view, as it only counts on the specific date, and doesn't retain the number into a total, leaving a bunch of empty values.
I hope someone can understand what I mean, and that someone can help!
Thanks!
You can start by unpivoting the dates and generating a query that gives the number of employees per department and date:
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
Then, you can do conditional aggregation to pivot the results - this requires enumerating the dates though:
select dept,
max(case when dt >= '20200101' and dt < '20200201' then cnt else 0 end) cnt_202001,
max(case when dt >= '20200201' and dt < '20200301' then cnt else 0 end) cnt_202002,
...
from (
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
) t
group by dept
When an employee changes department in the middle of a month, they are counted in both departments for that month.
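To make the target matrix concrete, here is a rough sketch of the counting rule in plain Python rather than SQL or DAX. It assumes an employee counts toward a department in a given month if the validity interval overlaps that month at all, which is consistent with mid-month movers being counted in both departments. The function name monthly_headcount, the tuple layout, and the sample rows are all hypothetical.

```python
from datetime import date
import calendar

def monthly_headcount(rows, year, months):
    """rows: (employee, dept, valid_from, valid_to or None).
    Returns {dept: {month: distinct employee count}}."""
    members = {}
    for month in months:
        month_start = date(year, month, 1)
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        for emp, dept, valid_from, valid_to in rows:
            # Interval overlap test: started before the month ended and
            # had not left before the month started (None = still there).
            active = valid_from <= month_end and (valid_to is None or valid_to >= month_start)
            if active:
                members.setdefault(dept, {}).setdefault(month, set()).add(emp)
    # Turn the employee sets into counts, filling empty months with 0
    return {d: {m: len(members[d].get(m, ())) for m in months} for d in members}

# Hypothetical sample: AB moves from Dep1 to Dep2 in mid-February
history = [
    ("AB", "Dep1", date(2020, 1, 1), date(2020, 2, 15)),
    ("AB", "Dep2", date(2020, 2, 16), None),
    ("CD", "Dep1", date(2020, 1, 10), None),
    ("EF", "Dep2", date(2020, 3, 1), None),
]
print(monthly_headcount(history, 2020, [1, 2, 3]))
# {'Dep1': {1: 2, 2: 2, 3: 1}, 'Dep2': {1: 0, 2: 1, 3: 2}}
```

Using a set per department and month gives a distinct count, so an employee with several rows for the same department in one month is still counted once.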

Get employee with on-off-on weekend work pattern

I have an employee table which has columns like
employee_ID, punch_in_date, punch_out_date.
Now, what I need is to find those employees who have worked on-off-on weekend pattern.
It is like if an employee has worked in week1 then he/she should not have worked in week2 and must have worked in Week3.
Week1, week2, and week3 are the consecutive weekend days.
I tried using the lag function of sql.
SELECT employee_id,
punch_in_date,
Lag(punch_in_date) OVER(partition BY employee_id ORDER BY employee_id) AS week_lag,
Datediff(day,Lag(punch_in_date) OVER(partition BY employee_id ORDER BY employee_id) ,punch_in_date) AS days
FROM employee
WHERE Datediff(day,Lag(punch_in_date) OVER(partition BY employee_id ORDER BY employee_id) ,punch_in_date)>= 14
AND datediff(day, punch_in_date, 'Today's date') <= 90 /*This means the data must falls under 3 months duration*/;
But I am getting an error like
SQL Error [4108] [S0001]: Windowed functions can only appear in the
SELECT or ORDER BY clauses.
How can I get the required result?
sample data:
employee_ID |punch_in_date |punch_out_date |
------------|--------------|---------------|
2 |2015-12-05 |2015-12-05 |
2 |2015-12-12 |2015-12-12 |
2 |2015-12-19 |2015-12-19 |
2 |2016-01-02 |2016-01-02 |
2 |2016-01-23 |2016-01-24 |
2 |2016-01-24 |2016-01-25 |
2 |2016-01-30 |2016-01-30 |
2 |2016-02-06 |2016-02-06 |
2 |2016-02-06 |2016-02-06 |
2 |2016-02-06 |2016-02-07 |
2 |2016-02-13 |2016-02-14 |
2 |2016-02-27 |2016-02-28 |
2 |2016-03-12 |2016-03-13 |
I suspect you want:
select employee_id, punch_in_date, week_lag,
datediff(day, week_lag, punch_in_date) AS days
from (select e.*,
lag(punch_in_date) over (partition by employee_id order by punch_in_date) as week_lag
from employee e
) e
where datediff(day, week_lag, punch_in_date) >= 14 and
datediff(day, punch_in_date, getdate()) <= 90 ;
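When using window functions, be very careful about where you filter. A WHERE clause at the same query level is applied before the window function is evaluated, so rows removed there are never seen by the window function and you might miss some rows that you want.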
As the error message states, windowed functions are only allowed in the SELECT and ORDER BY clauses.
What you can do is wrap your query in a subquery and filter on the computed column in the outer query:
Select employee_id, punch_in_date, week_lag, [days] FROM (
SELECT employee_id,
punch_in_date,
-- order by punch_in_date so that LAG returns the previous punch-in
Lag(punch_in_date) OVER(partition BY employee_id ORDER BY punch_in_date) AS week_lag,
Datediff(day, Lag(punch_in_date) OVER(partition BY employee_id ORDER BY punch_in_date), punch_in_date) AS [days]
FROM employee
where punch_in_date >= dateadd(day,-90,getdate())
) q
WHERE [days] >= 14
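The subquery pattern above can be reproduced in a small, runnable form. This sketch uses Python's built-in sqlite3 module in place of SQL Server (an assumption, so DATEDIFF is replaced by a julianday() difference) and needs SQLite 3.25 or newer for LAG. The function name gaps_of_at_least is hypothetical, and the rows are a subset of the question's sample data. It only finds gaps of at least 14 days between consecutive punch-ins; it does not try to validate the full on-off-on weekend pattern.

```python
import sqlite3

def gaps_of_at_least(rows, min_days):
    """Return punch-ins that follow the previous punch-in by at least min_days."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
    con.execute("CREATE TABLE employee (employee_id INTEGER, punch_in_date TEXT)")
    con.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    query = """
        SELECT employee_id, punch_in_date, prev_date,
               CAST(julianday(punch_in_date) - julianday(prev_date) AS INTEGER) AS gap_days
        FROM (
            -- The window function lives in the inner SELECT ...
            SELECT employee_id, punch_in_date,
                   LAG(punch_in_date) OVER (PARTITION BY employee_id
                                            ORDER BY punch_in_date) AS prev_date
            FROM employee
        ) q
        -- ... so the outer WHERE can filter on its result
        WHERE julianday(punch_in_date) - julianday(prev_date) >= ?
        ORDER BY employee_id, punch_in_date
    """
    found = con.execute(query, (min_days,)).fetchall()
    con.close()
    return found

punches = [
    (2, "2015-12-05"), (2, "2015-12-12"), (2, "2015-12-19"),
    (2, "2016-01-02"), (2, "2016-01-23"),
]
print(gaps_of_at_least(punches, 14))
# [(2, '2016-01-02', '2015-12-19', 14), (2, '2016-01-23', '2016-01-02', 21)]
```

The first punch-in of each employee has no previous row, so prev_date is NULL and the comparison drops it automatically.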

Combining COUNT and RANK - PostgreSQL

What I need to select is total number of trips made by every 'id_customer' from table user and their id, dispatch_seconds, and distance for first order. id_customer, customer_id, and order_id are strings.
It should look like this:
+------+--------+------------+--------------------------+------------------+
| id | count | #1order id | #1order dispatch seconds | #1order distance |
+------+--------+------------+--------------------------+------------------+
| 1ar5 | 3 | 4r56 | 1 | 500 |
| 2et7 | 2 | dc1f | 5 | 100 |
+------+--------+------------+--------------------------+------------------+
Cheers!
Original post was edited as during discussion S-man helped me to find exact problem solution. Solution by S-man https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e16aa6008990107e55a26d05b10b02b5
SELECT
customer_id,
count,
order_id,
order_timestamp,
dispatch_seconds,
distance
FROM (
SELECT
*,
count(*) over (partition by customer_id), -- A
first_value(order_id) over (partition by customer_id order by order_timestamp) -- B
FROM orders
)s
WHERE order_id = first_value -- C
https://www.postgresql.org/docs/current/static/tutorial-window.html
A: a window function that returns the total record count per user.
B: a window function that orders all records per user by timestamp and returns the first order_id of that user. Using first_value instead of min has one benefit: your order IDs might not increase with the timestamp (two orders could come in simultaneously, or the IDs might be some sort of hash rather than a sequence).
Both A and B add new columns.
C: keep only the rows where "first_value" (the first order_id by timestamp) equals the order_id of the current row. This gives the row of the first order for each user.
Result:
customer_id count order_id order_timestamp dispatch_seconds distance
----------- ----- -------- ------------------- ---------------- --------
1ar5 3 4r56 2018-08-16 17:24:00 1 500
2et7 2 dc1f 2018-08-15 01:24:00 5 100
Note that in these test data the order "dc1f" of user "2et7" has a smaller timestamp but comes later in the rows. It is not the first occurrence of the user in the table but nevertheless the one with the earliest order. This should demonstrate the case first_value vs. min as described above.
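The difference between "first by timestamp" and "first by table position" can be checked with a small sketch. It runs the same count/first_value pattern through Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or newer is needed for window functions). The function name first_order_per_customer and every order row other than 4r56 and dc1f are hypothetical.

```python
import sqlite3

def first_order_per_customer(rows):
    """Return (customer_id, order count, first order id, dispatch_seconds, distance)."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
    con.execute(
        "CREATE TABLE orders (customer_id TEXT, order_id TEXT, "
        "order_timestamp TEXT, dispatch_seconds INTEGER, distance INTEGER)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT customer_id, cnt, order_id, dispatch_seconds, distance
        FROM (
            SELECT *,
                   COUNT(*) OVER (PARTITION BY customer_id) AS cnt,
                   FIRST_VALUE(order_id) OVER (PARTITION BY customer_id
                                               ORDER BY order_timestamp) AS first_id
            FROM orders
        ) s
        WHERE order_id = first_id
        ORDER BY customer_id
    """
    found = con.execute(query).fetchall()
    con.close()
    return found

orders = [
    ("1ar5", "4r56", "2018-08-16 17:24:00", 1, 500),
    ("1ar5", "7u8i", "2018-08-17 09:00:00", 3, 250),  # hypothetical
    ("1ar5", "9k2m", "2018-08-18 12:30:00", 2, 900),  # hypothetical
    ("2et7", "zz9a", "2018-08-16 10:00:00", 4, 300),  # hypothetical, inserted first
    ("2et7", "dc1f", "2018-08-15 01:24:00", 5, 100),  # earlier timestamp, later row
]
print(first_order_per_customer(orders))
# [('1ar5', 3, '4r56', 1, 500), ('2et7', 2, 'dc1f', 5, 100)]
```

For customer 2et7 a plain min(order_id) would also return dc1f here only by coincidence of the string ordering; first_value depends on the timestamp alone.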
You are on the right track. Just use conditional aggregation:
SELECT o.customer_id, COUNT(*),
MAX(CASE WHEN seqnum = 1 THEN o.order_id END) as first_order_id
FROM (SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) as seqnum
FROM orders o
) o
GROUP BY o.customer_id;
Your JOIN is not necessary for this query.
You can use window functions:
select distinct customer_id,
count(*) over (partition by customer_id) as no_of_order,
first_value(order_id) over (partition by customer_id order by order_timestamp) as first_order_id
from orders o;
I think there are several mistakes in your original query: your rank isn't partitioned, the ORDER BY clause seems incorrect, and you filter out all but one "random" order before applying the count.
Something like this seems closer to what you want:
SELECT
customer_id,
order_count,
order_id
FROM (
SELECT
a.customer_id,
a.order_count,
a.order_id,
RANK() OVER (PARTITION BY a.order_id, a.customer_id ORDER BY a.order_count DESC) AS rank_id
FROM (
SELECT
customer_id,
order_id,
COUNT(*) AS order_count
FROM
orders
GROUP BY
customer_id,
order_id) a) b
WHERE
b.rank_id = 1;

How to query the three best players in Oracle?

I have the following table:
NAME | SCORE
ALICE | 100
BOB | 90
CHARLES| 90
DUKE | 80
EVE | 70
...
My question is the following:
How can I extract with one query the name of the three best players? In my example the query should return four rows (ALICE, BOB, CHARLES and DUKE) because there are two silver medalists (they both have 90 points).
Thank You in advance.
Oracle has the DENSE_RANK analytical function for that exact purpose:
select name, score from (
select name, score, dense_rank() over(order by score desc nulls last) rank
-- ^^^^^^^^^^
-- reject NULL score at the end
from t
) V
where rank < 4
order by rank, name
See http://sqlfiddle.com/#!4/88445/5
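DENSE_RANK gives tied scores the same rank without leaving a gap afterwards, which is why the two silver medalists do not push the bronze medalist out. A minimal sketch of that behaviour follows, using Python's built-in sqlite3 module in place of Oracle (an assumption; SQLite 3.25 or newer is required). The function name top_three_with_ties is hypothetical, the alias rnk replaces rank, and the NULLS LAST clause is left out because the sample has no NULL scores.

```python
import sqlite3

def top_three_with_ties(rows):
    """Return every player whose score is among the three highest distinct scores."""
    con = sqlite3.connect(":memory:")  # SQLite stand-in for Oracle (assumption)
    con.execute("CREATE TABLE t (name TEXT, score INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT name, score
        FROM (
            SELECT name, score,
                   -- ties share a rank, and the next rank is not skipped
                   DENSE_RANK() OVER (ORDER BY score DESC) AS rnk
            FROM t
        ) v
        WHERE rnk < 4
        ORDER BY rnk, name
    """
    found = con.execute(query).fetchall()
    con.close()
    return found

players = [("ALICE", 100), ("BOB", 90), ("CHARLES", 90), ("DUKE", 80), ("EVE", 70)]
print(top_three_with_ties(players))
# [('ALICE', 100), ('BOB', 90), ('CHARLES', 90), ('DUKE', 80)]
```

With RANK() instead of DENSE_RANK(), DUKE would get rank 4 and be filtered out, so the choice of function matters for this requirement.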
How about the following
select *
from table1
where score >=
(select score from (
select score, rownum r from (
select distinct score from table1 order by score desc
) where rownum <= 3
) where r = 3)
order by score desc
See also this SQLFiddle: http://sqlfiddle.com/#!4/23e68/1