SQL sort that distributes results - sql

Given a table of products like this:
ID Name Seller ID Updated at
-- ---- --------- ----------
1 First 3 2012-01-01 12:00:10
2 Second 3 2012-01-01 12:00:09
3 Third 4 2012-01-01 12:00:08
4 Fourth 4 2012-01-01 12:00:07
5 Fifth 5 2012-01-01 12:00:06
I want to construct a query to sort the products like this:
ID
---
1
3
5
2
4
In other words, the query should show most recently updated products, distributed by seller to minimize the likelihood of continuous sequences of products from the same seller.
Any ideas on how to best accomplish this? (Note that the code for this application is Ruby, but I'd like to do this in pure SQL if possible).
EDIT:
Note that the query should handle this case, too:
ID Name Seller ID Updated at
-- ---- --------- ----------
1 First 3 2012-01-01 12:00:06
2 Second 3 2012-01-01 12:00:07
3 Third 4 2012-01-01 12:00:08
4 Fourth 4 2012-01-01 12:00:09
5 Fifth 5 2012-01-01 12:00:10
to produce the following results:
ID
---
5
4
2
3
1

One option demonstrated in this sqlfiddle is
select subq.*
from (
select rank() over (partition by seller_id order by updated_at desc) rnk,
p.*
from products p) subq
order by rnk, updated_at desc;
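For anyone who wants to check this without the fiddle, here is a minimal sketch that runs the same query against both sample tables from the question. It assumes SQLite (3.25 or newer, for window functions) through Python's sqlite3 module, and the column names id, name, seller_id and updated_at are my guesses at the real schema; the helper distributed_order is hypothetical.

```python
import sqlite3

# The answer's query, selecting only the id column for brevity.
QUERY = """
SELECT subq.id
FROM (
    SELECT rank() OVER (PARTITION BY seller_id ORDER BY updated_at DESC) AS rnk,
           p.*
    FROM products p
) subq
ORDER BY rnk, updated_at DESC
"""


def distributed_order(rows):
    """Load rows into an in-memory products table and return ids in query order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER, name TEXT, seller_id INTEGER, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


# Sample data from the question (first table, then the EDIT table).
first_case = [
    (1, "First", 3, "2012-01-01 12:00:10"),
    (2, "Second", 3, "2012-01-01 12:00:09"),
    (3, "Third", 4, "2012-01-01 12:00:08"),
    (4, "Fourth", 4, "2012-01-01 12:00:07"),
    (5, "Fifth", 5, "2012-01-01 12:00:06"),
]
second_case = [
    (1, "First", 3, "2012-01-01 12:00:06"),
    (2, "Second", 3, "2012-01-01 12:00:07"),
    (3, "Third", 4, "2012-01-01 12:00:08"),
    (4, "Fourth", 4, "2012-01-01 12:00:09"),
    (5, "Fifth", 5, "2012-01-01 12:00:10"),
]

print(distributed_order(first_case))
print(distributed_order(second_case))
```

The inner rank() numbers each seller's products from newest to oldest, so sorting by rnk first puts every seller's newest product ahead of anyone's second-newest. If two products of one seller can share the same updated_at, row_number() may be a safer choice than rank(), since rank() would give both the same number.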

Related

How to get a set of records from within each partition based on a condition

From a table like this:
id status  date       category
-- ------- ---------- --------
1  PENDING 2022-07-01 XYZ
2  DONE    2022-07-04 XYZ
3  PENDING 2022-07-03 DEF
4  DONE    2022-07-08 DEF
I would like to get the most recent records within each category (here 2 and 4). But there are at least two factors that complicate things.
First, there might be more than two records in the same category. (The records come in pairs.)
id status  date       category
-- ------- ---------- --------
1  PENDING 2022-07-01 XYZ
2  PENDING 2022-07-02 XYZ
3  FAILED  2022-07-04 XYZ
4  FAILED  2022-07-05 XYZ
5  PENDING 2022-07-03 DEF
6  DONE    2022-07-08 DEF
In this case, I'd have to get 3, 4, and 6. Were there six records in the XYZ category, I'd have to get the most recent three.
And, secondly, the date could be the same for the most recent records within a category.
I tried something like this:
WITH temp AS (
SELECT *,
dense_rank() OVER (PARTITION BY category ORDER BY date DESC) rnk
FROM tbl
)
SELECT *
FROM temp
WHERE rnk = 1;
But this fails when there are more than 2 records in a category and I need to get the most recent two.
EDIT:
Eli Johnson has pointed out in a comment that there should be information about which messages are pairs. Of course! I dug around a bit, and after a join or two there is.
id status  date       category prev_id
-- ------- ---------- -------- -------
1  PENDING 2022-07-01 XYZ      {}
2  PENDING 2022-07-02 XYZ      {}
3  FAILED  2022-07-04 XYZ      {1}
4  FAILED  2022-07-05 XYZ      {2}
5  PENDING 2022-07-03 DEF      {}
6  DONE    2022-07-08 DEF      {5}
The requirements here are more hard-coded than following proper design.
Based on what has been proposed in the question, I just tweaked it a little bit to get the last records.
This assumes that records always come in pairs, as mentioned in the question.
WITH temp AS (
SELECT *,
row_number() OVER (PARTITION BY category ORDER BY date1 DESC) rnk,
count(1) over (partition by category) cnt
FROM status
)
SELECT *
FROM temp
WHERE rnk*2 <= cnt;
See the fiddle here.
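As a quick check of the "latest half of each category" idea, here is a minimal sketch that runs the query on the six-row sample from the question. It assumes SQLite 3.25+ via Python's sqlite3, keeps the answer's names (table status, column date1), and renames the CTE to ranked purely as my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the answer's query, not the question's tbl/date.
conn.execute("CREATE TABLE status (id INTEGER, status TEXT, date1 TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO status VALUES (?, ?, ?, ?)",
    [
        (1, "PENDING", "2022-07-01", "XYZ"),
        (2, "PENDING", "2022-07-02", "XYZ"),
        (3, "FAILED", "2022-07-04", "XYZ"),
        (4, "FAILED", "2022-07-05", "XYZ"),
        (5, "PENDING", "2022-07-03", "DEF"),
        (6, "DONE", "2022-07-08", "DEF"),
    ],
)

QUERY = """
WITH ranked AS (
    SELECT *,
           row_number() OVER (PARTITION BY category ORDER BY date1 DESC) AS rnk,
           count(1) OVER (PARTITION BY category) AS cnt
    FROM status
)
SELECT id
FROM ranked
WHERE rnk * 2 <= cnt   -- keep the newest half of each category
ORDER BY id
"""

latest_half = [row[0] for row in conn.execute(QUERY)]
print(latest_half)
```

Note that row_number() breaks ties arbitrarily, so if the newest rows in a category share a date, which of them lands in the kept half is not guaranteed unless a tie-breaker such as id is added to the ORDER BY.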

Count values separately until certain amount of duplicates SQL

I need a statement that selects all patients and the number of their appointments. When 3 or more appointments take place on the same date, they should be counted as one appointment.
This is what my statement looks like so far:
SELECT PATSuchname, Count(DISTINCT AKTDATUM) AS AKTAnz
FROM tblAktivitaeten
LEFT OUTER JOIN tblPatienten ON (tblPatienten.PATID=tblAktivitaeten.PATID)
WHERE (AKTDeleted<>'J' OR AKTDeleted IS Null)
GROUP BY PATSuchname
ORDER BY AKTAnz DESC
The result should look like this
PATSuchname Appointments
----------------------------------------
Joey Patner 13
Billy Jean 15
Example Name 13
As you can see, Joey Patner has 13 appointments. In the real table he has 15, but three of them have the same date, so they are only counted as 1.
So how can I write a statement that does exactly that?
(I am new to Stack Overflow; sorry if the format I use is wrong, and please tell me if it is.)
In the table it looks like this.
tblPatienten
----------
PATSuchname PATID
------------------------
Joey Patner 1
Billy Jean 2
Example Name 3
tblAktivitaeten
----------
AKTDatum PATID AKTID
-----------------------------------------
08.02.2021 1 1000 ----
08.02.2021 1 1001 ---- So these 3 should counted as 1
08.02.2021 1 1002 ----
09.05.2021 1 1003
09.07.2021 2 1004 -- these 2 shouldn't be counted as 1
09.07.2021 2 1005 --
Two GROUP BY should do it:
SELECT
x.PATID, PATSuchname, SUM(ApptCount)
FROM (
SELECT
PATID, AKTDatum, CASE WHEN COUNT(*) < 3 THEN COUNT(*) ELSE 1 END AS ApptCount
FROM tblAktivitaeten
GROUP BY
PATID, AKTDatum
) AS x
LEFT JOIN tblPatienten ON tblPatienten.PATID = x.PATID
GROUP BY
x.PATID, PATSuchname
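To see what the two-level grouping does, here is a minimal sketch that runs it on the sample rows from the question. It assumes SQLite through Python's sqlite3 and stores AKTDatum as plain text; the AKTAnz alias and the ORDER BY are my additions so the output is labelled and deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPatienten (PATSuchname TEXT, PATID INTEGER)")
conn.execute("CREATE TABLE tblAktivitaeten (AKTDatum TEXT, PATID INTEGER, AKTID INTEGER)")
conn.executemany(
    "INSERT INTO tblPatienten VALUES (?, ?)",
    [("Joey Patner", 1), ("Billy Jean", 2), ("Example Name", 3)],
)
conn.executemany(
    "INSERT INTO tblAktivitaeten VALUES (?, ?, ?)",
    [
        ("08.02.2021", 1, 1000),
        ("08.02.2021", 1, 1001),
        ("08.02.2021", 1, 1002),
        ("09.05.2021", 1, 1003),
        ("09.07.2021", 2, 1004),
        ("09.07.2021", 2, 1005),
    ],
)

QUERY = """
SELECT x.PATID, PATSuchname, SUM(ApptCount) AS AKTAnz
FROM (
    -- inner level: one row per patient and date, collapsed to 1 when 3 or more
    SELECT PATID, AKTDatum,
           CASE WHEN COUNT(*) < 3 THEN COUNT(*) ELSE 1 END AS ApptCount
    FROM tblAktivitaeten
    GROUP BY PATID, AKTDatum
) AS x
LEFT JOIN tblPatienten ON tblPatienten.PATID = x.PATID
-- outer level: add up the per-date counts for each patient
GROUP BY x.PATID, PATSuchname
ORDER BY x.PATID
"""

appointment_counts = conn.execute(QUERY).fetchall()
print(appointment_counts)
```

Because the query starts from tblAktivitaeten, a patient with no activities (Example Name in this sample) does not appear at all. The sketch also leaves out the AKTDeleted filter from the question, which would go in the inner SELECT's WHERE clause.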

Select max date for each register, null if does not exists

I have these tables: Employee (id, name, number), Configuration (id, years, licence_days), Periods (id, start_date, end_date, configuration_id, employee_id, period_type):
Employee table:
id name number
---- ----- -------
1 Bob 355
2 John 467
3 Maria 568
4 Josh 871
configuration table:
id years licence_days
---- ----- ------------
1 1 8
2 3 16
3 5 24
Periods table:
id start_date end_date configuration_id employee_id period_type
---- ---------- ------- ---------------- ----------- -----------
1 2021-05-23 2021-05-31 1 1 vaccation
2 2021-05-24 2021-06-01 1 2 vaccation
3 2021-03-01 2021-03-17 2 2 vaccation
4 2021-05-05 2021-05-21 2 2 vaccation
5 2021-01-01 2021-01-17 2 4 vaccation
I want this result:
Result:
employee_id years licence_days max(end_date)
1 1 8 2021-05-31
1 3 16 null
1 5 24 null
2 1 8 2021-06-01
2 3 16 2021-05-21
2 5 24 null
3 1 8 null
3 3 16 null
3 5 24 null
4 1 8 null
4 3 16 2021-01-17
4 5 24 null
i.e., I want to select all employees with all configurations and, for each combination, the max end_date of the "vaccation" type (or null if it does not exist).
How can I do that?
Oracle supports cross joins, right? So maybe something like this?
SELECT e.id AS employee_id, c.years, c.licence_days, max(p.end_date) AS max_end_date
FROM Employee e
CROSS JOIN configuration c
LEFT JOIN Periods p
    ON e.id = p.employee_id
    AND c.id = p.configuration_id
    AND p.period_type = 'vaccation' -- only the period type the question asks for
GROUP BY e.id, c.years, c.licence_days
ORDER BY e.id, c.years
@umberto-petrov chooses wisely with the ANSI CROSS JOIN syntax for a cartesian join. However, in the unlikely case that you need configurations in the output even when there are no employees, you can go with something like the query below.
EDIT: Filtering the Periods join with 'vaccation' as asked in the comments.
If you have to filter for some employee ids, replace ON 1 = 1 with ON Employee.id IN (id1, id2, ...). It still keeps every configuration but only takes employees that match the ids.
SELECT Employee.id AS employee_id,
       Configuration.years,
       Configuration.licence_days,
       MAX(Periods.end_date) max_end_date
FROM Configuration LEFT JOIN Employee ON 1 = 1
LEFT JOIN Periods ON Periods.configuration_id = Configuration.id
    AND Periods.employee_id = Employee.id
    AND Periods.period_type = 'vaccation'
GROUP BY Employee.id,
         Configuration.years,
         Configuration.licence_days
ORDER BY Employee.id,
         Configuration.years,
         Configuration.licence_days
We start from Configuration so that every configuration row is kept, then LEFT JOIN Employee on an always-true condition (a cartesian join that still keeps the configurations when Employee is empty), and finally LEFT JOIN Periods on both keys. That way, if there are no employees, the query still outputs each configuration's years and licence_days, with NULL for employee_id and max_end_date.
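To try the CROSS JOIN approach on the sample data from the question, here is a minimal sketch. It assumes SQLite through Python's sqlite3 rather than Oracle, and that the key columns are simply named id, as in the question's table listings; dates are stored as ISO text so MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, name TEXT, number INTEGER)")
conn.execute("CREATE TABLE Configuration (id INTEGER, years INTEGER, licence_days INTEGER)")
conn.execute(
    "CREATE TABLE Periods (id INTEGER, start_date TEXT, end_date TEXT, "
    "configuration_id INTEGER, employee_id INTEGER, period_type TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Bob", 355), (2, "John", 467), (3, "Maria", 568), (4, "Josh", 871)],
)
conn.executemany(
    "INSERT INTO Configuration VALUES (?, ?, ?)",
    [(1, 1, 8), (2, 3, 16), (3, 5, 24)],
)
conn.executemany(
    "INSERT INTO Periods VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2021-05-23", "2021-05-31", 1, 1, "vaccation"),
        (2, "2021-05-24", "2021-06-01", 1, 2, "vaccation"),
        (3, "2021-03-01", "2021-03-17", 2, 2, "vaccation"),
        (4, "2021-05-05", "2021-05-21", 2, 2, "vaccation"),
        (5, "2021-01-01", "2021-01-17", 2, 4, "vaccation"),
    ],
)

QUERY = """
SELECT e.id AS employee_id, c.years, c.licence_days, MAX(p.end_date) AS max_end_date
FROM Employee e
CROSS JOIN Configuration c            -- every employee paired with every configuration
LEFT JOIN Periods p                   -- keep pairs that have no matching period
       ON p.employee_id = e.id
      AND p.configuration_id = c.id
      AND p.period_type = 'vaccation'
GROUP BY e.id, c.years, c.licence_days
ORDER BY e.id, c.years
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The period_type filter sits in the ON clause on purpose: moving it to a WHERE clause would drop the employee/configuration pairs that have no period, which are exactly the rows that should show null.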

How do I create a frequency distribution?

I'm trying to create a frequency distribution to show how many customers have transacted 1x, 2x, 3x, etc.
I have a table transactions with a column user_id. Each row is a transaction, and if a user_id shows up in multiple rows, that user has made multiple transactions.
Now I'd like to get a list that looks something like this:
Tra. | Freq.
0 | 345
1 | 543
2 | 45
3 | 20
4 | 0
5 | 3
etc
Currently I have this, but it just shows a list of users and how many transactions they have had.
SELECT user_id, COUNT(user_id) as number_of_transactions
FROM transactions
GROUP BY user_id
ORDER BY number_of_transactions DESC;
I did some digging, and it was suggested that generate_series might help, but I'm stuck and don't know how to move forward.
Use the first result as input to an outer query where you apply the count again, but this time grouping on number_of_transactions:
SELECT number_of_transactions, COUNT(*) AS freq
FROM (
SELECT user_id, COUNT(user_id) as number_of_transactions
FROM transactions
GROUP BY user_id
) A
GROUP BY number_of_transactions;
This would transform a result like:
user_id number_of_transactions
----------- ----------------------
1 2
2 1
3 2
4 4
to this:
number_of_transactions freq
---------------------- -----------
1 1
2 2
4 1
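The transformation above can be reproduced with a minimal sketch like the following. It assumes SQLite through Python's sqlite3 and a transactions table holding only user_id; the ORDER BY is my addition to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER)")
# One row per transaction: user 1 twice, user 2 once, user 3 twice, user 4 four times.
conn.executemany(
    "INSERT INTO transactions VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (4,), (4,), (4,), (4,)],
)

QUERY = """
SELECT number_of_transactions, COUNT(*) AS freq
FROM (
    -- inner query: transactions per user
    SELECT user_id, COUNT(user_id) AS number_of_transactions
    FROM transactions
    GROUP BY user_id
) A
-- outer query: how many users share each transaction count
GROUP BY number_of_transactions
ORDER BY number_of_transactions
"""

frequency = conn.execute(QUERY).fetchall()
print(frequency)
```

Counts that no user has (3 in this sample) are simply absent from the result, and users with zero transactions cannot appear because they have no rows in transactions.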

How to filter first appearance in table only

Here is the table structure:
tblApplicants:
applicantID (index) | ApplyingForYear (nvarchar)
------------------------------------------------------
1 2013/14
11 2013/14
13 2013/14
12 2013/14
15 2013/14
21 2012/13
tblApplicantSchools_shadow:
id (index) | applicantID | updated (datetime) | statusID (int) | schoolID (int)
-----------------------------------------------------------------------------------------------------
1 11 2012-09-24 00:00:00.000 3 2
1 13 2012-10-24 00:00:00.000 4 2
2 15 2012-11-24 00:00:00.000 3 4
3 13 2012-03-24 00:00:00.000 4 3
4 12 2012-09-24 00:00:00.000 4 1
5 21 2012-11-03 00:00:00.000 5 2
6 11 2012-09-04 00:00:00.000 4 4
What I need to do is:
get all applicants, that have an ApplyingForYear of '2013/14' in tblApplicants
have a statusID of 4
I only want to count them once - even if they appear twice or more in tblApplicantSchools_shadow
group the number of distinct applicants (as per the above) - by the updated date column (grouped by week)
So based on the sample data above, there should be 3 rows that come out, (because ApplicantID 13 appears twice and I only want him once).
This is how the result should look:
Datesubmitted TotalAppsPerWeek
-------------------------------------------------------
2012-10-24 00:00:00.000 1
2012-09-24 00:00:00.000 1
2012-09-04 00:00:00.000 1
This is what I have so far - but it results in 4 rows, not 3 :(
select
DATEADD(ww,(DATEDIFF(ww,0,[tblApplicantSchools_shadow].updated)),0) AS Datesubmitted,
count(DISTINCT [tblApplicantSchools_shadow].applicantID) as TotalAppsPerWeek
FROM tblApplicants
INNER JOIN tblApplicantSchools_shadow
ON tblApplicantS.ApplicantID = tblApplicantSchools_shadow.applicantID
WHERE
ApplyingForYear = '2013/14'
AND [tblApplicantSchools_shadow].statusID = 4
GROUP BY
DATEADD(ww, (DATEDIFF(ww, 0, [tblApplicantSchools_shadow].updated)), 0)
And here is a Fiddle: http://sqlfiddle.com/#!3/3aa61/42
From your title, I'm assuming the one row you want from each applicant is the one with the smallest id. You can select one row per applicant ID with the ROW_NUMBER() function:
;with latestApplication AS
(
SELECT DATEADD(ww,(DATEDIFF(ww,0,[tblApplicantSchools_shadow].updated)),0)
AS Datesubmitted,
[tblApplicantSchools_shadow].applicantID,
ROW_NUMBER() OVER (PARTITION BY [tblApplicantSchools_shadow].applicantID
ORDER BY [tblApplicantSchools_shadow].id)
AS rn
FROM tblApplicants
INNER JOIN tblApplicantSchools_shadow
ON tblApplicantS.ApplicantID = tblApplicantSchools_shadow.applicantID
WHERE ApplyingForYear = '2013/14'
AND [tblApplicantSchools_shadow].statusID = 4
)
select Datesubmitted, COUNT(1) AS TotalAppsPerWeek
FROM latestApplication
WHERE rn = 1
group by Datesubmitted
order by Datesubmitted DESC
http://sqlfiddle.com/#!3/3aa61/57
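If the fiddle is unavailable, the ROW_NUMBER() idea can be checked with a minimal sketch like this one. It assumes SQLite 3.25+ through Python's sqlite3 instead of SQL Server, so the DATEADD/DATEDIFF week bucket is replaced by SQLite's date(..., 'weekday 0', '-6 days'), which returns the Monday of a Monday-to-Sunday week. That substitution is my assumption and may bucket Sundays differently from the original expression. The CTE name firstApplication and the aliases a and s are also mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblApplicants (applicantID INTEGER, ApplyingForYear TEXT)")
conn.execute(
    "CREATE TABLE tblApplicantSchools_shadow "
    "(id INTEGER, applicantID INTEGER, updated TEXT, statusID INTEGER, schoolID INTEGER)"
)
conn.executemany(
    "INSERT INTO tblApplicants VALUES (?, ?)",
    [(1, "2013/14"), (11, "2013/14"), (13, "2013/14"),
     (12, "2013/14"), (15, "2013/14"), (21, "2012/13")],
)
conn.executemany(
    "INSERT INTO tblApplicantSchools_shadow VALUES (?, ?, ?, ?, ?)",
    [
        (1, 11, "2012-09-24 00:00:00.000", 3, 2),
        (1, 13, "2012-10-24 00:00:00.000", 4, 2),
        (2, 15, "2012-11-24 00:00:00.000", 3, 4),
        (3, 13, "2012-03-24 00:00:00.000", 4, 3),
        (4, 12, "2012-09-24 00:00:00.000", 4, 1),
        (5, 21, "2012-11-03 00:00:00.000", 5, 2),
        (6, 11, "2012-09-04 00:00:00.000", 4, 4),
    ],
)

QUERY = """
WITH firstApplication AS (
    SELECT date(s.updated, 'weekday 0', '-6 days') AS Datesubmitted,  -- Monday of that week
           s.applicantID,
           ROW_NUMBER() OVER (PARTITION BY s.applicantID ORDER BY s.id) AS rn
    FROM tblApplicants a
    INNER JOIN tblApplicantSchools_shadow s ON a.applicantID = s.applicantID
    WHERE a.ApplyingForYear = '2013/14'
      AND s.statusID = 4
)
SELECT Datesubmitted, COUNT(1) AS TotalAppsPerWeek
FROM firstApplication
WHERE rn = 1          -- one row per applicant: the one with the smallest id
GROUP BY Datesubmitted
ORDER BY Datesubmitted DESC
"""

weekly_counts = conn.execute(QUERY).fetchall()
print(weekly_counts)
```

Applicant 13 has two status-4 rows, and only the one with the smaller id survives the rn = 1 filter, so the sample yields three rows instead of four.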