SQL Query - Grouping Data

So every morning at work we have a stand-up meeting. We throw the nearest object to hand around the room as a method of deciding who speaks in what order. Being slightly odd I decided it could be fun to get some data on these throws. So, every morning I memorise the order of throws (as well as other relevant things like who dropped the ball/strange sponge object that was probably once a ball too and who threw to someone who'd already been or just gave an atrocious throw), and record this data in a table:
+---------+-----+------------+----------+---------+----------+--------+--------------+
| throwid | day | date       | thrownum | thrower | receiver | caught | correctthrow |
+---------+-----+------------+----------+---------+----------+--------+--------------+
|       1 |   1 | 10/01/2012 |        1 | dan     | steve    |      1 |            1 |
|       2 |   1 | 10/01/2012 |        2 | steve   | alice    |      1 |            1 |
|       3 |   1 | 10/01/2012 |        3 | alice   | matt     |      1 |            1 |
|       4 |   1 | 10/01/2012 |        4 | matt    | justin   |      1 |            1 |
|       5 |   1 | 10/01/2012 |        5 | justin  | arif     |      1 |            1 |
|       6 |   1 | 10/01/2012 |        6 | arif    | pete     |      1 |            1 |
|       7 |   1 | 10/01/2012 |        7 | pete    | greg     |      0 |            1 |
|       8 |   1 | 10/01/2012 |        8 | greg    | alan     |      1 |            1 |
|       9 |   1 | 10/01/2012 |        9 | alan    | david    |      1 |            1 |
|      10 |   1 | 10/01/2012 |       10 | david   | dan      |      1 |            1 |
|      11 |   2 | 11/01/2012 |        1 | dan     | david    |      1 |            1 |
|      12 |   2 | 11/01/2012 |        2 | david   | alice    |      1 |            1 |
|      13 |   2 | 11/01/2012 |        3 | alice   | steve    |      1 |            1 |
|      14 |   2 | 11/01/2012 |        4 | steve   | arif     |      1 |            1 |
|      15 |   2 | 11/01/2012 |        5 | arif    | pete     |      0 |            1 |
|      16 |   2 | 11/01/2012 |        6 | pete    | justin   |      1 |            1 |
|      17 |   2 | 11/01/2012 |        7 | justin  | alan     |      1 |            1 |
|      18 |   2 | 11/01/2012 |        8 | alan    | dan      |      1 |            1 |
|      19 |   2 | 11/01/2012 |        9 | dan     | greg     |      1 |            1 |
+---------+-----+------------+----------+---------+----------+--------+--------------+
I've now got quite a few days' worth of data for this, and I'm starting to run some queries on it for my own purposes (I've not told the rest of the team yet... wouldn't like to influence the results). I've done a few with no issues, but I'm stuck trying to get a certain result out.
What I'm looking for is the number of times each person has been the last team member to receive the ball. As you can see in the table, due to absences etc. the number of throws per day is not always constant, so I can't simply select the receiver by thrownum.
For the data above, it would return:
+--------+-------------------+
| person | LastReceiverTotal |
+--------+-------------------+
| dan    |                 1 |
| greg   |                 1 |
+--------+-------------------+
I've got this far:
SELECT MAX(thrownum) AS LastThrowNum, day FROM Throws GROUP BY day
Now, this returns some useful data: I get the highest thrownum for each day. It would seem like all I need to do is get the receiver for that value and then count, grouped by receiver, to get my answer. That doesn't work directly, though, because once the query uses aggregate functions I can't simply pull the matching receiver out of the same resultset.
I suspect there's a much better way of designing tables to store the data to be honest, but equally I'm also sure there's a way to get this information with the tables as they are - some kind of inner query? I can't figure out how it would work. Can anyone shed some light on how this would be done?

The query that you have gives you the biggest thrownum for each day.
With that, you just do an inner join against your table to get the receiver and count the number of times each one appears.
select t.receiver as person, count(t.day) as LastReceiverTotal
from Throws t
inner join (SELECT MAX(thrownum) AS LastThrowNum, day
            FROM Throws
            GROUP BY day) a
        on a.LastThrowNum = t.thrownum and a.day = t.day
group by t.receiver
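If your database supports window functions, a different way to get the same counts is to rank each day's throws and keep only the last one. This is just a sketch (untested against your data) and assumes ROW_NUMBER() is available in your SQL dialect:
-- Rank throws within each day, newest first, then count the last receiver of each day
SELECT receiver AS person, COUNT(*) AS LastReceiverTotal
FROM (SELECT receiver,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY thrownum DESC) AS rn
      FROM Throws) t
WHERE rn = 1
GROUP BY receiver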

Related

SQL JOIN each id in JSON object

I have a JSON column containing values that reference a column in another table. I want to return rows from that other table for each item in the JSON object.
If this were an INT column, I would use JOIN, but here I need to JOIN on every entry in the JSON object.
Take:
writers :
| id | name | projects (JSON) |
|:-- |:-----|:------------------|
| 1 | Andy | ["1","2","3","4"] |
| 2 | Hank | ["3","4","5","6"] |
| 3 | Alex | ["1","7","8","9"] |
| 4 | Joe | ["1","5","6","7"] |
| 5 | Ken | ["2","4","5","6"] |
| 6 | Zach | ["2","7","8","9"] |
| 7 | Walt | ["2","5","6","7"] |
| 8 | Mike | ["2","3","4","5"] |
cities :
| id | name | project |
|:-- |:---------|:--------|
| 1 | Boston | 1 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 4 | Seattle | 4 |
| 5 | North | 5 |
| 6 | West | 6 |
| 7 | Miami | 7 |
| 8 | York | 8 |
| 9 | Tainan | 9 |
| 10 | Seoul | 1 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |
| 13 | Carlisle | 4 |
| 14 | Fugging | 5 |
| 15 | Turkey | 6 |
| 16 | Paris | 7 |
| 17 | Midguard | 8 |
| 18 | Fugging | 9 |
| 19 | Madrid | 1 |
| 20 | Salvador | 2 |
| 21 | Everett | 3 |
I need every city ordered by name for Mike (id=8).
Desired results:
This is what I'm getting and what I need to get (ORDER BY name).
Output :
| id | name | project |
|:---|:---------|:--------|
| 13 | Carlisle | 4 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 21 | Everett | 3 |
| 14 | Fugging | 5 |
| 5 | North | 5 |
| 20 | Salvador | 2 |
| 4 | Seattle | 4 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |
Current query, but this can't be the best way...
SQL >
SELECT c.*
FROM cities c
WHERE EXISTS (
    SELECT 1
    FROM writers w
    WHERE JSON_CONTAINS(
            w.projects, CONCAT('\"', c.project, '\"'))
      AND w.id = '8'
)
ORDER BY c.name;
DB Fiddle with the above. Is there a better way to do this "properly"?
Background
If it matters, I need to keep JSON as the datatype because the server-side software that uses this database reads that column most easily as a JSON object.
I would normally make several database calls and iterate through the JSON object in my server-side language, but that many database calls is far too expensive, and pagination makes multiple calls even more costly.
I need all the results in a single database call. So, I need to JOIN or otherwise loop through each item in the JSON object within SQL.
Start with JOIN
Per a comment from a user, there is a better way...
SQL >
SELECT c.*
FROM writers w
JOIN cities c ON JSON_CONTAINS(w.projects, CONCAT('\"', c.project, '\"'))
WHERE w.id = '8'
ORDER BY c.name;
Output is the same...
Output :
| id | name     | project |
|:---|:---------|:--------|
| 13 | Carlisle | 4       |
| 2  | Chicago  | 2       |
| 3  | Cisco    | 3       |
| 21 | Everett  | 3       |
| 14 | Fugging  | 5       |
| 5  | North    | 5       |
| 20 | Salvador | 2       |
| 4  | Seattle  | 4       |
| 11 | South    | 2       |
| 12 | Tokyo    | 3       |
DB Fiddle
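If the server is MySQL 8.0 or later, JSON_TABLE can expand the JSON array into rows, so the join works on ordinary values instead of calling JSON_CONTAINS for every row. A sketch, untested against the fiddle; the VARCHAR(16) length for the extracted value is an assumption:
-- Expand w.projects into one row per array element, then join on the extracted value
SELECT c.*
FROM writers w
CROSS JOIN JSON_TABLE(w.projects, '$[*]'
                      COLUMNS (project VARCHAR(16) PATH '$')) AS p
JOIN cities c ON c.project = p.project
WHERE w.id = '8'
ORDER BY c.name;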

SQL, query to check and list distinct entries that occur in another table within a specific time frame

I'm using Oracle.
I have two tables. One contains users and the other is an access log of sorts. I need to list all users whose latest log entry falls within a specified time frame, and include the timestamp of that latest entry. A single user can have several entries in the log.
Here are simplified versions of the tables:
Users
|-----------------------------------|
| userid | username | name          |
|-----------------------------------|
| 1      | josm     | John Smith    |
| 2      | lajo     | Laura Jones   |
| 3      | miwi     | Mike Williams |
| 4      | subo     | Susan Brown   |
| 5      | peda     | Peter Davis   |
| 6      | jami     | Jane Miller   |
|-----------------------------------|
Log
|------------------------------|
| userid | action | timestamp  |
|------------------------------|
| 3      | a      | 20-01-2020 |
| 2      | v      | 19-11-2019 |
| 2      | y      | 02-11-2019 |
| 4      | b      | 15-09-2019 |
| 1      | a      | 23-05-2019 |
| 6      | y      | 22-05-2019 |
| 3      | b      | 16-04-2019 |
| 2      | a      | 07-01-2019 |
| 5      | v      | 18-11-2018 |
| 6      | a      | 12-09-2018 |
|------------------------------|
Desired result if the time frame is set to last six months:
|---------------------------------------|
| username | name          | timestamp  |
|---------------------------------------|
| miwi     | Mike Williams | 20-01-2020 |
| lajo     | Laura Jones   | 19-11-2019 |
| subo     | Susan Brown   | 15-09-2019 |
|---------------------------------------|
Any help will be greatly appreciated.
You can use aggregation:
select u.username, u.name, max(l.timestamp) as timestamp
from log l
join users u on l.userid = u.userid
group by u.username, u.name
having max(l.timestamp) >= add_months(sysdate, -6)
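If you later need other columns from the latest log row itself (the action, for example), an alternative is to rank each user's entries and keep only the newest one before applying the time filter. A sketch, assuming the table and column names above:
-- Keep each user's newest log row, then restrict to the last six months
select username, name, timestamp
from (select u.username,
             u.name,
             l.timestamp,
             row_number() over (partition by l.userid
                                order by l.timestamp desc) as rn
      from log l
      join users u on u.userid = l.userid)
where rn = 1
  and timestamp >= add_months(sysdate, -6)
order by timestamp desc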

SQL Server 2016 count similar rows as a column without duplicating query

I have a SQL query that returns data similar to this pseudo-table:
| Name | Id1 | Id2 | Guid |
|------+-----+-----+------|
| Joe  | 1   | 1   | 1123 |
| Joe  | 2   | 1   | 1123 |
| Joe  | 3   | 1   | 1120 |
| Jeff | 1   | 1   | 1123 |
| Moe  | 3   | 42  | 1120 |
I would like to display an additional column on the output, listing the total number of records that have matching GUIDs to a given row, like this:
| Name | Id1 | Id2 | Guid | # Matching |
+------+-----+-----+------+------------+
| Joe  | 1   | 1   | 1123 | 3          |
| Joe  | 2   | 1   | 1123 | 3          |
| Joe  | 3   | 1   | 1120 | 2          |
| Jeff | 1   | 1   | 1123 | 3          |
| Moe  | 3   | 42  | 1120 | 2          |
I was able to accomplish this by joining the query with itself and doing a count. However, the query is rather large and takes a while to complete. Is there any way I can accomplish this without joining the query with itself?
You want a window function:
select t.*, count(*) over (partition by guid) as num_matching
from t;
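Since the goal is to avoid repeating the large query, you can also wrap it once in a CTE and put the window count on top. A minimal sketch; dbo.SourceData is a hypothetical placeholder for whatever your existing query actually reads from:
-- The CTE body is where your existing large query goes; dbo.SourceData is not a real object
WITH base AS (
    SELECT Name, Id1, Id2, Guid
    FROM dbo.SourceData
)
SELECT b.*,
       COUNT(*) OVER (PARTITION BY b.Guid) AS [# Matching]
FROM base AS b;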

Percentage to total in BigQuery Legacy SQL (Subqueries?)

I can't understand how to calculate percentage to total in BigQuery Legacy SQL.
So, I have a table:
ID | Name  | Group | Mark
1  | John  | A     | 10
2  | Lucy  | A     | 5
3  | Jane  | A     | 7
4  | Lily  | B     | 9
5  | Steve | B     | 14
6  | Rita  | B     | 11
I want to calculate percentage like this:
ID | Name  | Group | Mark | Percent
1  | John  | A     | 10   | 10/(10+5+7)=45%
2  | Lucy  | A     | 5    | 5/(10+5+7)=22%
3  | Jane  | A     | 7    | 7/(10+5+7)=33%
4  | Lily  | B     | 9    | 9/(9+14+11)=26%
5  | Steve | B     | 14   | 14/(9+14+11)=42%
6  | Rita  | B     | 11   | 11/(9+14+11)=32%
My table is quite large (3 million rows).
I thought I could do it with subqueries, but I can't use a subquery in the SELECT clause.
Does anyone know a way to do it?
SELECT
ID, Name, [Group], Mark,
RATIO_TO_REPORT(Mark) OVER(PARTITION BY [Group]) AS percent
FROM YourTable
Read more about RATIO_TO_REPORT in the BigQuery documentation.
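If you ever move this off Legacy SQL, the same result can be written in BigQuery Standard SQL with a plain window SUM; a sketch, assuming the same YourTable:
-- Divide each Mark by its group's total (returns a ratio, just like RATIO_TO_REPORT)
SELECT
  ID, Name, `Group`, Mark,
  Mark / SUM(Mark) OVER (PARTITION BY `Group`) AS percent
FROM YourTable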

SQL only select rows with max date within each user

SQL beginner here. I've got a simple test that users take, and each row is the answer to one of their questions. They're allowed to take the exam once per day, so some people take it a second time on another day, and thus will have many rows with different test dates. What I'm basically trying to do is get each user's most recent score.
Here is what my data looks like (table name is dumdum):
+----------+----------------+----------+------------------+
| USERNAME | CORRECT_ANSWER | RESPONSE | DATE_TAKEN       |
+----------+----------------+----------+------------------+
| matt     | 1              | 1        | 3/23/15 1:04:26  |
| matt     | 2              | 2        | 3/23/15 1:04:28  |
| matt     | 3              | 3        | 3/23/15 1:04:23  |
| david    | 1              | 3        | 3/20/15 1:04:25  |
| david    | 2              | 2        | 3/20/15 1:04:28  |
| david    | 3              | 1        | 3/20/15 1:04:30  |
| david    | 1              | 1        | 3/21/15 11:03:14 |
| david    | 2              | 3        | 3/21/15 11:03:17 |
| david    | 3              | 2        | 3/21/15 11:03:19 |
| chris    | 1              | 2        | 3/17/15 12:45:52 |
| chris    | 2              | 2        | 3/17/15 12:45:56 |
| chris    | 3              | 3        | 3/17/15 12:45:59 |
| peter    | 1              | 1        | 3/19/15 2:45:33  |
| peter    | 2              | 3        | 3/19/15 2:45:35  |
| peter    | 3              | 2        | 3/19/15 2:45:38  |
| peter    | 1              | 1        | 3/20/15 12:32:04 |
| peter    | 2              | 2        | 3/20/15 12:32:05 |
| peter    | 3              | 3        | 3/20/15 12:32:05 |
+----------+----------------+----------+------------------+
and what I'm trying to get in the end...
+----------+------------------+-------+
| USERNAME | MOST_RECENT_TEST | SCORE |
+----------+------------------+-------+
| matt     | 3/23/2015        | 100   |
| david    | 3/21/2015        | 33    |
| chris    | 3/17/2015        | 67    |
| peter    | 3/20/2015        | 100   |
+----------+------------------+-------+
I ran into some trouble because I need to go by day, and not by day/time, so I had to do a weird maneuver where I went to character and back to date... This is what I have so far, but I can't figure out how to use only the scores from the most recent test (right now it's factoring in all scores from every test ever taken)...
SELECT username,
       to_date(substr(max(test_date),1,9),'dd-MON-yy') as most_recent_test,
       round((sum(case when response=correct_answer then 1 end)/3)*100,0) as score
FROM dumdum
GROUP BY username
Any help would be appreciated! Thanks!
There are several solutions to this problem; this one uses the WITH clause and the RANK function.
It also uses the TRUNC function rather than to_date(substr(...)) to strip the time portion.
with mxDate as (
  SELECT USERNAME,
         TRUNC(DATE_TAKEN) as MOST_RECENT_TEST,
         CASE WHEN CORRECT_ANSWER = RESPONSE THEN 1 ELSE 0 END as SCORE,
         RANK() OVER (PARTITION BY USERNAME
                      ORDER BY TRUNC(DATE_TAKEN) DESC) as Rk
  FROM dumdum
)
SELECT USERNAME,
       MOST_RECENT_TEST,
       SUM(SCORE) / 3 * 100 as SCORE
FROM mxDate
WHERE Rk = 1
GROUP BY USERNAME,
         MOST_RECENT_TEST
Demo
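If you would rather avoid the analytic function, a correlated subquery on each user's latest test day gives the same result. A sketch, assuming the same dumdum table and an Oracle-style dialect (TRUNC, as in the answer above):
-- Keep only the rows from each user's most recent test day, then score them
SELECT username,
       TRUNC(date_taken) AS most_recent_test,
       ROUND(SUM(CASE WHEN response = correct_answer THEN 1 ELSE 0 END) / 3 * 100) AS score
FROM dumdum d
WHERE TRUNC(date_taken) = (SELECT MAX(TRUNC(date_taken))
                           FROM dumdum
                           WHERE username = d.username)
GROUP BY username, TRUNC(date_taken)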