Need assistance with a SQL query involving multiple sorts

I'm not sure if this is even possible, but my head starts to hurt when I think about how to solve it. I've read up on subqueries and PARTITION BY, but this is beyond my knowledge. Here is a sample of my data:
TestID StudentID ComponentID Score
-------------------------------------
14919 3445 1 20
14919 3445 4 17
14919 3445 8 20
14919 3445 11 19
14919 3445 13 19
11339 3448 1 15
11339 3448 4 23
11339 3448 8 23
**11339 3448 11 22**
11339 3448 13 20
**14773 3448 1 20**
14773 3448 4 21
**14773 3448 8 23**
14773 3448 11 21
**14773 3448 13 21**
There can be multiple test attempts for the same StudentID. Attempts are identified by TestID.
I need to query for the highest score per ComponentID across all attempts, for each StudentID. There are only 5 component IDs. So for StudentID = 3448, of the two rows with ComponentID 1, I just need the one with the highest score. I need the same for 4, 8, 11 and 13. I hope that makes sense. I highlighted the rows that should be returned. Any help is greatly appreciated.
Here is the query I've attempted. It just returns the same number of rows as the original.
SELECT DISTINCT
sts.StudentStandardizedTestID,
sts.StandardizedTestComponentID,
sts.StudentID,
MAX(sts.score) OVER (PARTITION BY sts.StudentID) HIGHSCORE
FROM
StandardizedTestScore sts
JOIN
StudentStandardizedTest sst ON sst.StudentStandardizedTestID = sts.StudentStandardizedTestID
AND sst.standardizedtestid = 1
WHERE
sst.TranscriptSchoolID = 10
AND sts.StandardizedTestComponentID = 1
OR sts.StandardizedTestComponentID = 4
OR sts.StandardizedTestComponentID = 8
OR sts.StandardizedTestComponentID = 11
OR sts.StandardizedTestComponentID = 13
ORDER BY
sts.studentid, sts.StandardizedTestComponentID

Below is the code to create your table and data.
CREATE TABLE StandardizedTestScore (`StudentStandardizedTestID` int(11) ,`studentid` int(11) ,`StandardizedTestComponentID` int(11),`score` int(11));
INSERT INTO StandardizedTestScore
(`StudentStandardizedTestID`, `studentid`, `StandardizedTestComponentID`, `score`)
VALUES
(14919,3445,1,20),
(14919,3445,4,17),
(14919,3445,8,20),
(14919,3445,11,19),
(14919,3445,13,19),
(11339,3448,1,15),
(11339,3448,4,23),
(11339,3448,8,23),
(11339,3448,11,22),
(11339,3448,13,20),
(14773,3448,1,20),
(14773,3448,4,21),
(14773,3448,8,23),
(14773,3448,11,21),
(14773,3448,13,21);
The query you are looking for is this:
SELECT studentid,StandardizedTestComponentID as componentID,MAX(score) AS score
FROM StandardizedTestScore
GROUP BY studentid,StandardizedTestComponentID
The results are:
studentid ComponentID Score
3445 1 20
3445 4 17
3445 8 20
3445 11 19
3445 13 19
3448 1 20
3448 4 23
3448 8 23
3448 11 22
3448 13 21
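To check the answer's query without a MySQL server, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's built-in sqlite3 module (SQLite is only a stand-in for the asker's database) and runs the same GROUP BY/MAX query, with an ORDER BY added for a stable result:

```python
import sqlite3

# Sample rows from the question: (TestID, StudentID, ComponentID, Score)
ROWS = [
    (14919, 3445, 1, 20), (14919, 3445, 4, 17), (14919, 3445, 8, 20),
    (14919, 3445, 11, 19), (14919, 3445, 13, 19),
    (11339, 3448, 1, 15), (11339, 3448, 4, 23), (11339, 3448, 8, 23),
    (11339, 3448, 11, 22), (11339, 3448, 13, 20),
    (14773, 3448, 1, 20), (14773, 3448, 4, 21), (14773, 3448, 8, 23),
    (14773, 3448, 11, 21), (14773, 3448, 13, 21),
]


def best_scores(rows):
    """Return (studentid, componentID, score): the highest score per student and component."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StandardizedTestScore ("
        "StudentStandardizedTestID INTEGER, studentid INTEGER, "
        "StandardizedTestComponentID INTEGER, score INTEGER)"
    )
    conn.executemany("INSERT INTO StandardizedTestScore VALUES (?, ?, ?, ?)", rows)
    # Same aggregation as the answer: one group per (student, component), keep MAX(score).
    result = conn.execute(
        "SELECT studentid, StandardizedTestComponentID AS componentID, MAX(score) AS score "
        "FROM StandardizedTestScore "
        "GROUP BY studentid, StandardizedTestComponentID "
        "ORDER BY studentid, componentID"
    ).fetchall()
    conn.close()
    return result


for row in best_scores(ROWS):
    print(*row)
```

The printed rows should match the results table above.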

It sounds to me like you need aggregation, not sorting. Something like:
SELECT studentid, componentid, MAX(score) AS score
FROM yourtable
-- leave testid out of GROUP BY, otherwise every attempt stays its own group
GROUP BY studentid, componentid
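If the TestID of the winning attempt is also needed (as in the highlighted rows), the question's own window-function idea works once the partition covers both StudentID and ComponentID, not StudentID alone. The sketch below is an illustration rather than either answer's method: it uses ROW_NUMBER() in SQLite (3.25 or later, via Python's sqlite3), a hypothetical table named scores with the sample data's column names, and breaks ties between equal scores by the higher TestID, which is an arbitrary assumption.

```python
import sqlite3

# Sample rows from the question: (TestID, StudentID, ComponentID, Score)
ROWS = [
    (14919, 3445, 1, 20), (14919, 3445, 4, 17), (14919, 3445, 8, 20),
    (14919, 3445, 11, 19), (14919, 3445, 13, 19),
    (11339, 3448, 1, 15), (11339, 3448, 4, 23), (11339, 3448, 8, 23),
    (11339, 3448, 11, 22), (11339, 3448, 13, 20),
    (14773, 3448, 1, 20), (14773, 3448, 4, 21), (14773, 3448, 8, 23),
    (14773, 3448, 11, 21), (14773, 3448, 13, 21),
]

# Rank attempts inside each (student, component) partition and keep rank 1.
QUERY = """
SELECT TestID, StudentID, ComponentID, Score
FROM (
    SELECT TestID, StudentID, ComponentID, Score,
           ROW_NUMBER() OVER (
               PARTITION BY StudentID, ComponentID
               ORDER BY Score DESC, TestID DESC  -- tie-break on TestID is an assumption
           ) AS rn
    FROM scores
) AS ranked
WHERE rn = 1
ORDER BY StudentID, ComponentID
"""


def best_attempts(rows):
    """Return the full row (including TestID) of the best attempt per student and component."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (TestID INTEGER, StudentID INTEGER, "
        "ComponentID INTEGER, Score INTEGER)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in best_attempts(ROWS):
    print(*row)
```

ROW_NUMBER returns exactly one row per partition; swapping it for RANK() would instead return every attempt tied for the top score.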

Related

How to get top values when there is a tie

I am having difficulty figuring out this dang problem. Using the data and query below, I am trying to find the email address that rented the most movies during September.
There are only 4 relevant tables in my database and they have been anonymized and shortened:
Table "cust":
cust_id f_name l_name email
---------------------------------------
1 Jack Daniels jack.daniels@google.com
2 Jose Quervo jose.quervo@yahoo.com
5 Jim Beam jim.beam@protonmail.com
Table "rent":
inv_id cust_id rent_date
------------------------
10 1 9/1/2022 10:29
11 1 9/2/2022 18:16
12 1 9/2/2022 18:17
13 1 9/17/2022 17:34
14 1 9/19/2022 6:32
15 1 9/19/2022 6:33
16 3 9/1/2022 18:45
17 3 9/1/2022 18:46
18 3 9/2/2022 18:45
19 3 9/2/2022 18:46
20 3 9/17/2022 18:32
21 3 9/19/2022 22:12
10 2 9/19/2022 11:43
11 2 9/19/2022 11:42
Table "inv":
mov_id inv_id
-------------
22 10
23 11
24 12
25 13
26 14
27 15
28 16
29 17
30 18
31 19
31 20
32 21
Table "mov":
mov_id titl rate
----------------
22 Anaconda 3.99
23 Exorcist 1.99
24 Philadelphia 3.99
25 Quest 1.99
26 Sweden 1.99
27 Speed 1.99
28 Nemo 1.99
29 Zoolander 5.99
30 Truman 5.99
31 Patient 1.99
32 Racer 3.99
And here is my current query:
SELECT cust.email,
COUNT(DISTINCT inv.mov_id) AS "Rented_Count"
FROM cust
JOIN rent ON rent.cust_id = cust.cust_id
JOIN inv ON inv.inv_id = rent.inv_id
JOIN mov ON mov.mov_id = inv.mov_id
WHERE rent.rent_date BETWEEN '2022-09-01' AND '2022-09-31'
GROUP BY cust.email
ORDER BY "Rented_Count" DESC;
And here is what it outputs:
email Rented_Count
----------------------------
jack.daniels@google.com 6
jim.beam@protonmail.com 6
jose.quervo@yahoo.com 2
And what I want it to output:
email
-----------------------
jack.daniels@google.com
jim.beam@protonmail.com
From the results I am getting, I have a tie for first place (Jim and Jack). That is fine, but I would like the query to list both tying email addresses, not just Jack's, so I don't think limiting the rows or using MAX will work.
I think it must have something to do with DENSE_RANK, but I don't know how to use it in this scenario with COUNT and GROUP BY.
Your creativity and help would be appreciated.
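You're missing the FETCH FIRST ... ROWS WITH TIES clause. It works together with the ORDER BY clause to return the highest values (FETCH FIRST), including any rows tied with them (WITH TIES). It is standard SQL, supported for example by Oracle 12c and later and PostgreSQL 13 and later; SQL Server uses TOP (1) WITH TIES instead.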
SELECT cust.email
FROM cust
INNER JOIN rent
ON rent.cust_id = cust.cust_id
INNER JOIN inv
ON inv.inv_id = rent.inv_id
INNER JOIN mov
ON mov.mov_id = inv.mov_id
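-- September has only 30 days; a half-open range also keeps rentals made later on the 30th
WHERE rent.rent_date >= '2022-09-01' AND rent.rent_date < '2022-10-01'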
GROUP BY cust.email
ORDER BY COUNT(DISTINCT inv.mov_id) DESC
FETCH FIRST 1 ROWS WITH TIES
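Since the question mentions DENSE_RANK, here is a sketch of that alternative for databases without WITH TIES support: count per customer first, rank the counts, then keep every customer at rank 1. It runs in SQLite (3.25 or later) through Python's sqlite3 module and uses small hypothetical data (example.com addresses, not the question's tables); the mov table is left out because the query never reads from it.

```python
import sqlite3

# Hypothetical sample data: Jack and Jim each rent 3 distinct movies in September.
CUSTOMERS = [(1, "jack@example.com"), (2, "jose@example.com"), (3, "jim@example.com")]
INVENTORY = [(10, 22), (11, 23), (12, 24), (13, 25)]  # (inv_id, mov_id)
RENTALS = [  # (inv_id, cust_id, rent_date) as ISO text so string comparison is chronological
    (10, 1, "2022-09-01 10:29"), (11, 1, "2022-09-02 18:16"), (12, 1, "2022-09-30 23:50"),
    (10, 3, "2022-09-01 18:45"), (11, 3, "2022-09-02 18:45"), (13, 3, "2022-09-17 18:32"),
    (10, 2, "2022-09-19 11:43"), (12, 2, "2022-10-01 09:00"),  # October rental is excluded
]

QUERY = """
WITH counts AS (
    SELECT cust.email, COUNT(DISTINCT inv.mov_id) AS rented_count
    FROM cust
    JOIN rent ON rent.cust_id = cust.cust_id
    JOIN inv ON inv.inv_id = rent.inv_id
    WHERE rent.rent_date >= '2022-09-01' AND rent.rent_date < '2022-10-01'
    GROUP BY cust.email
),
ranked AS (
    -- DENSE_RANK gives every tied count the same rank
    SELECT email, DENSE_RANK() OVER (ORDER BY rented_count DESC) AS rnk
    FROM counts
)
SELECT email FROM ranked WHERE rnk = 1 ORDER BY email
"""


def top_renters():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cust (cust_id INTEGER, email TEXT)")
    conn.execute("CREATE TABLE inv (inv_id INTEGER, mov_id INTEGER)")
    conn.execute("CREATE TABLE rent (inv_id INTEGER, cust_id INTEGER, rent_date TEXT)")
    conn.executemany("INSERT INTO cust VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO inv VALUES (?, ?)", INVENTORY)
    conn.executemany("INSERT INTO rent VALUES (?, ?, ?)", RENTALS)
    emails = [email for (email,) in conn.execute(QUERY)]
    conn.close()
    return emails


print(top_renters())
```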

How to group merge columns based on one row identifier with pandas?

I have a dataset with many entries for each location. I am trying to find a way to sum all of those entries per location without affecting any of the other columns. So, in case I'm not explaining it well enough, I want to take a dataset like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 10 12 14 17 27
Bedford 11 40 34 9 1
Bedford 7 1 2 3 3
Leeds 1 1 2 0 0
Leeds 20 13 6 1 1
Bath 101 20 33 41 3
Bath 11 2 3 1 0
And turn it into something like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 28 53 50 29 31
Leeds 21 14 8 1 1
Bath 112 22 36 42 3
Now, I have read that a groupby should work in some way, but from my understanding a groupby reduces the data to 2 columns, and I don't want to make hundreds of 2-column frames and then merge them all. Surely there's a much simpler way to do this?
IIUC, groupby+sum will work for you:
df.groupby('Locations',as_index=False,sort=False).sum()
Output:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
0 Bedford 28 53 50 29 31
1 Leeds 21 14 8 1 1
2 Bath 112 22 36 42 3
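To make explicit what groupby + sum computes here, the following plain-Python sketch (standard library only, not pandas) sums each numeric column per location and keeps locations in order of first appearance, as sort=False does. It also confirms the totals in the output above:

```python
COLUMNS = ["Cyclists", "maleRunners", "femaleRunners", "maleCyclists", "femaleCyclists"]
ROWS = [
    ("Bedford", 10, 12, 14, 17, 27),
    ("Bedford", 11, 40, 34, 9, 1),
    ("Bedford", 7, 1, 2, 3, 3),
    ("Leeds", 1, 1, 2, 0, 0),
    ("Leeds", 20, 13, 6, 1, 1),
    ("Bath", 101, 20, 33, 41, 3),
    ("Bath", 11, 2, 3, 1, 0),
]


def sum_by_location(rows):
    """Group rows by their first field and add up every remaining column."""
    totals = {}  # dicts keep insertion order, like groupby(sort=False)
    for location, *values in rows:
        running = totals.setdefault(location, [0] * len(values))
        for i, value in enumerate(values):
            running[i] += value
    return totals


print("Locations", *COLUMNS)
for location, values in sum_by_location(ROWS).items():
    print(location, *values)
```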
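A pivot table should also work for you:
import pandas as pd

# aggfunc='sum' behaves like np.sum; unlike groupby(sort=False), the result is sorted by Locations
new_df = pd.pivot_table(df, values=['Cyclists', 'maleRunners', 'femaleRunners',
'maleCyclists', 'femaleCyclists'], index='Locations', aggfunc='sum')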

Compare Current Row with Previous/Next row in SQL Server

I have a table named team that looks like the one below. I added a row number (RN) as the 3rd column.
RaidNo OutComeID RN
2 15 1
4 15 2
6 14 3
8 16 4
10 16 5
12 14 6
14 16 7
16 15 8
18 15 9
20 16 10
22 12 11
24 16 12
26 16 13
28 16 14
30 15 15
32 14 16
34 13 17
Whenever OutComeID is 16, the count should start at 1 and increase by one for each consecutive row that is also 16; all other rows get 0. The result should look like this:
RaidNo OutComeID RN Result
2 15 1 0
4 15 2 0
6 14 3 0
8 16 4 1
10 16 5 2
12 14 6 0
14 16 7 1
16 15 8 0
18 15 9 0
20 16 10 1
22 12 11 0
24 16 12 1
26 16 13 2
28 16 14 3
30 15 15 0
32 14 16 0
34 13 17 0
How can I get this result?
You can use the following query:
SELECT RaidNo, OutComeID, RN,
CASE
WHEN OutComeID <> 16 THEN 0
ELSE ROW_NUMBER() OVER (PARTITION BY OutComeID, grp ORDER BY RN)
END AS Result
FROM (
SELECT RaidNo, OutComeID, RN,
RN - ROW_NUMBER() OVER (PARTITION BY OutComeID ORDER BY RN) AS grp
FROM mytable) AS t
ORDER BY RN
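The field grp identifies slices (also called islands) of consecutive records that have the same OutComeID value: within such a run, RN and the per-OutComeID row number both increase by 1 from one row to the next, so their difference stays constant. The outer query uses grp to enumerate each record that belongs to a '16' slice. Records that belong to the other slices are assigned the value 0.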
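A runnable sketch of the gaps-and-islands query above, using Python's built-in sqlite3 module with SQLite (3.25 or later, for window functions) as a stand-in for SQL Server. The table name team comes from the question, and RaidNo is generated as 2 * RN to match the sample:

```python
import sqlite3

# OutComeID for RN = 1..17, taken from the question's sample table
OUTCOMES = [15, 15, 14, 16, 16, 14, 16, 15, 15, 16, 12, 16, 16, 16, 15, 14, 13]

QUERY = """
SELECT RaidNo, OutComeID, RN,
       CASE
           WHEN OutComeID <> 16 THEN 0
           -- number the rows inside each island of consecutive 16s
           ELSE ROW_NUMBER() OVER (PARTITION BY OutComeID, grp ORDER BY RN)
       END AS Result
FROM (
    SELECT RaidNo, OutComeID, RN,
           -- constant within a run of consecutive rows sharing an OutComeID
           RN - ROW_NUMBER() OVER (PARTITION BY OutComeID ORDER BY RN) AS grp
    FROM team
) AS t
ORDER BY RN
"""


def run_streaks(outcomes):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE team (RaidNo INTEGER, OutComeID INTEGER, RN INTEGER)")
    conn.executemany(
        "INSERT INTO team VALUES (?, ?, ?)",
        [(2 * rn, outcome, rn) for rn, outcome in enumerate(outcomes, start=1)],
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_streaks(OUTCOMES):
    print(*row)
```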

How to count rows from a different table based on same date range

EDIT: Although the accepted answer doesn't match what I was looking for when I wrote this question, it does show a better way of doing what I already had (which is what they want). The data in the production tables is what is wrong, which means the numbers are wrong.
I'm struggling to write the SQL statement for the report at the bottom of this post. I've included the statements to create my tables and test data. To get started, here is the CREATE statement for picture_stats:
CREATE TABLE [picture_stats](
[PICTURE_STATS_ID] [int] NULL,
[USER_NAME] [varchar](30) NULL,
[DATE_TIME] [datetime] NULL,
[SIZE] [float] NULL,
[CLICK_COUNT] [int] NULL
) ON [PRIMARY]
GO
Insert statement that puts data into picture_stats:
INSERT INTO [picture_stats]
([PICTURE_STATS_ID]
,[USER_NAME]
,[DATE_TIME]
,[SIZE]
,[CLICK_COUNT])
VALUES
(1 ,'A','2015-05-18',75,18),
(2 ,'A','2015-05-18',13,18),
(3 ,'A','2015-05-18',42,16),
(4 ,'A','2015-05-18',59,16),
(5 ,'A','2015-05-18',46,14),
(6 ,'A','2015-05-18',64,16),
(7 ,'A','2015-05-18',87,13),
(8 ,'A','2015-05-18',84,14),
(9 ,'A','2015-05-18',33,16),
(10,'A','2015-05-18',59,14),
(11,'B','2015-05-19',10,17),
(12,'B','2015-05-19',44,18),
(13,'B','2015-05-19',29,14),
(14,'B','2015-05-19',65,19),
(15,'B','2015-05-19',10,15),
(16,'B','2015-05-19',55,18),
(17,'B','2015-05-19',81,11),
(18,'B','2015-05-19',29,11),
(19,'B','2015-05-19',58,19),
(20,'B','2015-05-19',20,17),
(21,'C','2015-05-20',35,16),
(22,'C','2015-05-20',70,18),
(23,'C','2015-05-20',30,13),
(24,'C','2015-05-20',33,13),
(25,'C','2015-05-20',43,19),
(26,'C','2015-05-20',10,15),
(27,'C','2015-05-20',33,13),
(28,'C','2015-05-20',23,12),
(29,'C','2015-05-20',35,18),
(30,'C','2015-05-20',58,19)
GO
Table view of the data.
ID USER_NAME DATE_TIME SIZE CLICK_COUNT
-- --------- ---------- ---- -----------
1 A 2015-05-18 75 18
2 A 2015-05-18 1 18
3 A 2015-05-18 42 16
4 A 2015-05-18 59 16
5 A 2015-05-18 46 14
6 A 2015-05-18 64 16
7 A 2015-05-18 87 13
8 A 2015-05-18 84 14
9 A 2015-05-18 33 16
10 A 2015-05-18 59 14
11 B 2015-05-19 10 17
12 B 2015-05-19 44 18
13 B 2015-05-19 29 14
14 B 2015-05-19 65 19
15 B 2015-05-19 100 15
16 B 2015-05-19 55 18
17 B 2015-05-19 81 11
18 B 2015-05-19 29 11
19 B 2015-05-19 58 19
20 B 2015-05-19 20 17
21 C 2015-05-20 35 16
22 C 2015-05-20 7 18
23 C 2015-05-20 30 13
24 C 2015-05-20 33 13
25 C 2015-05-20 4 19
26 C 2015-05-20 100 15
27 C 2015-05-20 33 13
28 C 2015-05-20 23 12
29 C 2015-05-20 35 18
30 C 2015-05-20 58 19
The second table picture_comment_stats can be created with this:
CREATE TABLE [picture_comment_stats](
[PICTURE_STATS_ID] [int] NULL,
[USER_NAME] [varchar](30) NULL,
[DATE_TIME] [datetime] NULL,
[CLICK_COUNT] [int] NULL,
[LIKES] [int] NULL
) ON [PRIMARY]
GO
To script in the data:
INSERT INTO [picture_comment_stats]
([PICTURE_STATS_ID]
,[USER_NAME]
,[DATE_TIME]
,[CLICK_COUNT]
,[LIKES])
VALUES
(1 ,'X','2015-05-18',75,18),
(2 ,'X','2015-05-18',1 ,18),
(3 ,'X','2015-05-18',42,16),
(4 ,'X','2015-05-18',59,16),
(9 ,'X','2015-05-19',34,16),
(10,'X','2015-05-19',57,14),
(11,'Y','2015-05-19',11,17),
(12,'Y','2015-05-19',44,18),
(17,'Y','2015-05-20',81,11),
(18,'Y','2015-05-20',29,11),
(19,'Y','2015-05-20',55,19),
(21,'Z','2015-05-20',45,16),
(20,'Y','2015-05-21',20,17),
(22,'Z','2015-05-21',7 ,18),
(23,'Z','2015-05-21',30,13),
(24,'Z','2015-05-21',39,13),
(25,'Z','2015-05-21',4 ,19),
(26,'Z','2015-05-21',10,15),
(27,'Z','2015-05-21',33,13),
(28,'Z','2015-05-21',23,12),
(29,'Z','2015-05-21',35,18),
(30,'Z','2015-05-21',58,19)
GO
The table should look like this.
ID USER_NAME DATE_TIME COMMENT_ID LIKES
-- --------- ---------- ---------- -----
1 X 2015-05-18 75 18
2 X 2015-05-18 1 18
3 X 2015-05-18 42 16
4 X 2015-05-18 59 16
9 X 2015-05-19 34 16
10 X 2015-05-19 57 14
11 Y 2015-05-19 11 17
12 Y 2015-05-19 44 18
17 Y 2015-05-20 81 11
18 Y 2015-05-20 29 11
19 Y 2015-05-20 55 19
21 Z 2015-05-20 45 16
20 Y 2015-05-21 20 17
22 Z 2015-05-21 7 18
23 Z 2015-05-21 30 13
24 Z 2015-05-21 39 13
25 Z 2015-05-21 4 19
26 Z 2015-05-21 10 15
27 Z 2015-05-21 33 13
28 Z 2015-05-21 23 12
29 Z 2015-05-21 35 18
30 Z 2015-05-21 58 19
What I want is a report of the data that shows the user name, the count of the pictures, and the count of comments on that individual's pictures. The picture_stats_id in both tables is actually a foreign key that points to the primary key picture_id of the table 'pictures'. For the report, I want to be able to choose a date or a date range. For the different scenarios over these 3 days, I'm hoping the report would show the following:
(Filtered for Date:2015-05-18)
USER_NAME PICS COMMENTS
--------- ---- --------
A 10 4
(Filtered for Date:2015-05-19)
USER_NAME PICS COMMENTS
--------- ---- --------
B 10 2
(Filtered for Date:2015-05-20)
USER_NAME PICS COMMENTS
--------- ---- --------
C 10 1
(Filtered for Date:2015-05-18 to 2015-05-19)
USER_NAME PICS COMMENTS
--------- ---- --------
A 10 6
B 10 2
(Filtered for Date:2015-05-19 to 2015-05-20)
USER_NAME PICS COMMENTS
--------- ---- --------
B 10 2
C 10 1
(Filtered for Date:2015-05-18 to 2015-05-20)
USER_NAME PICS COMMENTS
--------- ---- --------
A 10 6
B 10 5
C 10 1
So far, all I have is:
SELECT ps.USER_NAME as USER_NAME,
COUNT(*) as PICTURE_COUNT,
SUM(1) AS COMMENT_COUNT -- can't figure out this statement for nothing.
from picture_comment_stats pcs
RIGHT OUTER JOIN picture_stats ps ON ps.PICTURE_STATS_ID = pcs.PICTURE_STATS_ID
--WHERE ps.DATE_TIME BETWEEN @beginFilterDate AND @endFilterDate
GROUP BY ps.USER_NAME
ORDER BY ps.USER_NAME
EDIT
The fiddle address, as given in the comments below: http://sqlfiddle.com/#!3/74e77
To clarify the counts I'm trying to get: the second column would be the number of pictures that user posted that day. The third column is how many comments were made on that user's pictures on the specified day ONLY. So even though X commented on A's pictures a total of 6 times (on the 18th and 19th), I only want to count 4 if I filter for the 18th, and 6 if I filter on the 18th to 19th. It's a little hard to follow, I'm sure, but that's what is wanted.
SELECT s.user_name,
count(distinct c.comment_id) as comments,
count(s.picture_stats_id) as pics
from picture_stats s
left join picture_comment_stats c
on s.picture_stats_id = c.picture_stats_id
and s.date_time = c.date_time
where s.date_time between @beginFilterDate and @endFilterDate
group by s.user_name, s.date_time
Try this and get the user name from the lookup table you have. Also filter on the dates you need.
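One way to avoid counting through a join is to aggregate each table separately. The sketch below is an alternative to the answer above, not the answer's method: it counts pictures per user in the date range and, in a correlated subquery, counts comments made in the same range on that user's pictures. It runs in SQLite via Python's sqlite3 module, loads only the columns the report needs, and uses named parameters :start and :end in place of the question's begin/end filter-date variables. With the sample data it matches the expected reports for each single day and for the 18th to 19th and 18th to 20th ranges; for the 19th to 20th range it gives B 5 comments rather than the 2 listed in the question.

```python
import sqlite3

# Only the columns the report needs; ids, users and dates follow the question's sample data.
PICTURES = (
    [(i, "A", "2015-05-18") for i in range(1, 11)]
    + [(i, "B", "2015-05-19") for i in range(11, 21)]
    + [(i, "C", "2015-05-20") for i in range(21, 31)]
)
COMMENTS = [  # (PICTURE_STATS_ID, DATE_TIME)
    (1, "2015-05-18"), (2, "2015-05-18"), (3, "2015-05-18"), (4, "2015-05-18"),
    (9, "2015-05-19"), (10, "2015-05-19"), (11, "2015-05-19"), (12, "2015-05-19"),
    (17, "2015-05-20"), (18, "2015-05-20"), (19, "2015-05-20"), (21, "2015-05-20"),
    (20, "2015-05-21"), (22, "2015-05-21"), (23, "2015-05-21"), (24, "2015-05-21"),
    (25, "2015-05-21"), (26, "2015-05-21"), (27, "2015-05-21"), (28, "2015-05-21"),
    (29, "2015-05-21"), (30, "2015-05-21"),
]

QUERY = """
SELECT p.USER_NAME,
       COUNT(*) AS PICS,
       (SELECT COUNT(*)
          FROM picture_comment_stats c
          JOIN picture_stats pic ON pic.PICTURE_STATS_ID = c.PICTURE_STATS_ID
         WHERE pic.USER_NAME = p.USER_NAME
           AND c.DATE_TIME BETWEEN :start AND :end) AS COMMENTS
FROM picture_stats p
WHERE p.DATE_TIME BETWEEN :start AND :end
GROUP BY p.USER_NAME
ORDER BY p.USER_NAME
"""


def report(start, end):
    """Return (USER_NAME, PICS, COMMENTS) for users who posted pictures in [start, end]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE picture_stats (PICTURE_STATS_ID INTEGER, USER_NAME TEXT, DATE_TIME TEXT)"
    )
    conn.execute("CREATE TABLE picture_comment_stats (PICTURE_STATS_ID INTEGER, DATE_TIME TEXT)")
    conn.executemany("INSERT INTO picture_stats VALUES (?, ?, ?)", PICTURES)
    conn.executemany("INSERT INTO picture_comment_stats VALUES (?, ?)", COMMENTS)
    rows = conn.execute(QUERY, {"start": start, "end": end}).fetchall()
    conn.close()
    return rows


print(report("2015-05-18", "2015-05-20"))
```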

select last entry for every user multi table example

Given the following tables
table message
id message time_send
14 "first" 2014-02-10 22:16:31
15 "second" 2014-02-14 09:35:20
16 "third" 2014-02-13 09:35:47
17 "fourth" 2014-03-10 22:16:31
18 "fifth" 2014-03-14 09:35:20
19 "sixth" 2014-04-12 09:35:47
20 "seventh" 2014-04-13 09:35:47
21 "eighth" 2014-04-14 09:35:47
table message_owner
id message_id owner_id cp_id
1 14 1 4
2 14 4 1
3 15 12 4
4 15 4 12
5 16 4 1
6 16 1 4
7 17 12 4
8 17 4 12
9 18 4 1
10 18 1 4
11 19 12 4
12 19 4 12
13 20 12 1
14 20 1 12
15 21 12 7
16 21 7 12
I want to query the most recent message with every counterparty (cp_id) for a given owner.
For example for owner_id=4 I would like the following output:
id message time_send owner_id cp_id
18 "fifth" 2014-03-14 09:35:20 4 1
19 "sixth" 2014-04-12 09:35:47 4 12
I see a lot of examples with one table, but I am not able to transpose them to a multi-table example.
Edit 1: added more entries.
This should work:
SELECT m.id, m.message, m.time_send, mo.owner_id, mo.cp_id
FROM message m
JOIN message_owner mo ON m.id=mo.message_id
WHERE mo.owner_id=4
AND m.time_send=(
SELECT MAX(time_send)
FROM message m2
JOIN message_owner mo2 ON mo2.message_id=m2.id
WHERE mo2.owner_id=mo.owner_id
AND mo2.cp_id =mo.cp_id
)
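... note, though, that matching a timestamp column with = in a WHERE condition can sometimes give unexpected results: for example, if two messages with the same counterparty have exactly the same time_send, both rows are returned.
The same query without explicit JOINs (using comma-separated tables in FROM):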
SELECT m.id, m.time_send, m.message, mo.owner_id, mo.cp_id
FROM message m, message_owner mo
WHERE m.id = mo.message_id
AND mo.owner_id = 4
AND m.time_send = (
SELECT MAX(time_send)
FROM message m2, message_owner mo2
WHERE mo2.message_id = m2.id
AND mo2.owner_id = mo.owner_id
AND mo2.cp_id = mo.cp_id
);
http://sqlfiddle.com/#!2/558d7/4
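A minimal sketch that runs the first answer's correlated-subquery approach against the question's sample data in SQLite (via Python's sqlite3 module), selecting time_send as the expected output does, passing owner_id as a parameter and adding an ORDER BY. time_send is stored as ISO-8601 text, so MAX compares it chronologically:

```python
import sqlite3

MESSAGES = [  # (id, message, time_send)
    (14, "first", "2014-02-10 22:16:31"), (15, "second", "2014-02-14 09:35:20"),
    (16, "third", "2014-02-13 09:35:47"), (17, "fourth", "2014-03-10 22:16:31"),
    (18, "fifth", "2014-03-14 09:35:20"), (19, "sixth", "2014-04-12 09:35:47"),
    (20, "seventh", "2014-04-13 09:35:47"), (21, "eighth", "2014-04-14 09:35:47"),
]
OWNERS = [  # (id, message_id, owner_id, cp_id)
    (1, 14, 1, 4), (2, 14, 4, 1), (3, 15, 12, 4), (4, 15, 4, 12),
    (5, 16, 4, 1), (6, 16, 1, 4), (7, 17, 12, 4), (8, 17, 4, 12),
    (9, 18, 4, 1), (10, 18, 1, 4), (11, 19, 12, 4), (12, 19, 4, 12),
    (13, 20, 12, 1), (14, 20, 1, 12), (15, 21, 12, 7), (16, 21, 7, 12),
]

QUERY = """
SELECT m.id, m.message, m.time_send, mo.owner_id, mo.cp_id
FROM message m
JOIN message_owner mo ON m.id = mo.message_id
WHERE mo.owner_id = ?
  AND m.time_send = (
      -- latest message between this owner and this counterparty
      SELECT MAX(m2.time_send)
      FROM message m2
      JOIN message_owner mo2 ON mo2.message_id = m2.id
      WHERE mo2.owner_id = mo.owner_id
        AND mo2.cp_id = mo.cp_id
  )
ORDER BY mo.cp_id
"""


def latest_per_counterparty(owner_id):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER, message TEXT, time_send TEXT)")
    conn.execute(
        "CREATE TABLE message_owner (id INTEGER, message_id INTEGER, owner_id INTEGER, cp_id INTEGER)"
    )
    conn.executemany("INSERT INTO message VALUES (?, ?, ?)", MESSAGES)
    conn.executemany("INSERT INTO message_owner VALUES (?, ?, ?, ?)", OWNERS)
    rows = conn.execute(QUERY, (owner_id,)).fetchall()
    conn.close()
    return rows


for row in latest_per_counterparty(4):
    print(*row)
```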