Too many order by, max, subqueries for my intellect

Too many order by, max, subqueries for my intellect - sql

I can not seem to work out this query. I'm sure it needs subqueries, but I am out of options. My brain can not handle this or something. I need help :)
Little intro
I have a betting odds website. Every 15 minutes I import the latest odds (win/draw/lose -- or 1/X/2) for particular events from various bookmakers.
Every row of the odds table has the odds_type ('1', 'X' or '2'), the odds_index which is the actual odds, the bookmaker_id and the event_id.
But equally important: created_at, because I need to work with the odds from the latest import. It is very important for obvious reasons.
Scenario
In the table below we are working with imported odds for event_id #1.
Needed query
Isolate the latest set of imported odds for event_id = 1 from all bookmakers (in this example 9 records)
On that set, return the highest odds_index record for odds_type '1', 'X', '2'.
Now I prefer to do this with Rails scoping, so I can use #event.best_odds1 and #event.best_odds2, but I'll take any approach if it works. Been busting my brain on this for 5 days. Need to solve it.
Result
After the query I end up with 3 records so I can display "The best odds for event #1".
The tables
Bookmakers
ID | NAME
----------------------
1 | Unibet
2 | 888
3 | Ladbrokes
Events
ID | NAME
--------------------------
1 | Holland vs Denmark
2 | England vs Germany
3 | France vs Spain
Odds
ID | OT | OI | BI | EI | CREATED_AT
---------------------------------------------------
(first import from the bookies)
1 | '1' | 2.4 | 1 | 1 | 2010-06-10 15:00
2 | 'X' | 1.5 | 1 | 1 | 2010-06-10 15:00
3 | '2' | 6.2 | 1 | 1 | 2010-06-10 15:00
4 | '1' | 2.2 | 2 | 1 | 2010-06-10 15:58
5 | 'X' | 1.8 | 2 | 1 | 2010-06-10 15:58
6 | '2' | 5.2 | 2 | 1 | 2010-06-10 15:58
7 | '1' | 2.8 | 3 | 1 | 2010-06-10 16:56
8 | 'X' | 1.3 | 3 | 1 | 2010-06-10 16:56
9 | '2' | 7.1 | 3 | 1 | 2010-06-10 16:56
(last import from the bookies)
10 | '1' | 2.5 | 1 | 1 | 2010-06-11 17:10
11 | 'X' | 1.3 | 1 | 1 | 2010-06-11 17:10
12 | '2' | 6.4 | 1 | 1 | 2010-06-11 17:10
13 | '1' | 2.1 | 2 | 1 | 2010-06-11 18.12
14 | 'X' | 1.2 | 2 | 1 | 2010-06-11 18:58
15 | '2' | 6.2 | 2 | 1 | 2010-06-11 18:58
16 | '1' | 1.8 | 3 | 1 | 2010-06-12 14:56
17 | 'X' | 2.3 | 3 | 1 | 2010-06-12 14:56
18 | '2' | 5.1 | 3 | 1 | 2010-06-12 14:56
Abbreviated column names to fit on screen
OT = odds_type
OI = odds_index
BI = bookmaker_id
EI = event_id

You could use row_number() twice:
select *
from (
select *
, row_number() over (partition by OT order by OI desc) as rn2
from (
select *
, row_number() over (partition by EI, BI, OT
order by created_at desc) as rn1
from Odds
where EI = 1 -- for event 1
) sub1
where rn1 = 1 -- Latest row per EI, BI, OT
) sub2
where rn2 = 1 -- Highest OI per OT
But if the table keeps growing, this will perform badly. You could add a history table like OddsHistory, and move outdated Odds there. When only the latest Odds are in the Odds table, your query becomes much simpler.
Live example at SQL Fiddle.

Related

Generate 'average' column from sub query and ROW_NUMBER window function in SQL SELECT

I have the following SQL Server tables (with sample data):
Questionnaire
id | coachNodeId | youngPersonNodeId | complete
1 | 12 | 678 | 1
2 | 12 | 52 | 1
3 | 30 | 99 | 1
4 | 12 | 678 | 1
5 | 12 | 678 | 1
6 | 30 | 99 | 1
7 | 12 | 52 | 1
8 | 30 | 102 | 1
Answer
id | questionnaireId | score
1 | 1 | 1
2 | 2 | 3
3 | 2 | 2
4 | 2 | 5
5 | 3 | 5
6 | 4 | 5
7 | 4 | 3
8 | 5 | 4
9 | 6 | 1
10 | 6 | 3
11 | 7 | 5
12 | 8 | 5
ContentNode
id | text
12 | Zak
30 | Phil
52 | Jane
99 | Ali
102 | Ed
678 | Chris
I have the following T-SQL query:
SELECT
Questionnaire.id AS questionnaireId,
coachNodeId AS coachNodeId,
coachNode.[text] AS coachName,
youngPersonNodeId AS youngPersonNodeId,
youngPersonNode.[text] AS youngPersonName,
ROW_NUMBER() OVER (PARTITION BY Questionnaire.coachNodeId, Questionnaire.youngPersonNodeId ORDER BY Questionnaire.id) AS questionnaireNumber,
score = (SELECT AVG(score) FROM Answer WHERE Answer.questionnaireId = Questionnaire.id)
FROM
Questionnaire
LEFT JOIN
ContentNode AS coachNode ON Questionnaire.coachNodeId = coachNode.id
LEFT JOIN
ContentNode AS youngPersonNode ON Questionnaire.youngPersonNodeId = youngPersonNode.id
WHERE
(complete = 1)
ORDER BY
coachNodeId, youngPersonNodeId
This query outputs the following example data:
questionnaireId | coachNodeId | coachName | youngPersonNodeId | youngPersonName | questionnaireNumber | score
1 | 12 | Zak | 678 | Chris | 1 | 1
2 | 12 | Zak | 52 | Jane | 1 | 3
3 | 30 | Phil | 99 | Ali | 1 | 5
4 | 12 | Zak | 678 | Chris | 2 | 4
5 | 12 | Zak | 678 | Chris | 3 | 4
6 | 30 | Phil | 99 | Ali | 2 | 2
7 | 12 | Zak | 52 | Jane | 2 | 5
8 | 30 | Phil | 102 | Ed | 1 | 5
To explain what's happening here… There are various coaches whose job is to undertake questionnaires with various young people, and log the scores. A coach might, at a later date, repeat the questionnaire with the same young person several times, hoping that they get a better score. The ultimate goal of what I'm trying to achieve is that the managers of the coaches want to see how well the coaches are performing, so they'd like to see whether the scores for the questionnaires tend to go up or not. The window function represents a way to establish how many times the questionnaire has been undertaken by the same coach/young person combo.
I need to be able to determine the average score based on the questionnaire number. So for example, the coach 'Zak' logged scores of '1' and '3' for his first questionnaires (where questionnaireNumber = 1) so the average would be 2. For his second questionnaires (where questionnaireNumber = 2) the scores were '3' and '5' so the average would be 4. So in analysing this data we know that over time Zak's questionnaire scores have improved from an average of '2' the first time to an average of '4' the second time.
I feel like the query needs to be grouped by the coachNodeId and questionnaireNumber values so it would output something like this (I've ommitted the questionnaireId, youngPersonNodeId, youngPersonName and score columns as they aren't crucial for the output — they're only used to derive the averageScore — and wouldn't be useful the way the results are grouped):
coachNodeId | coachName | questionnaireNumber | averageScore
12 | Zak | 1 | 2 (calculation: (1 + 3) / 2)
12 | Zak | 2 | 4 (calculation: (3 + 5) / 2)
12 | Zak | 3 | 4 (only one value: 4)
30 | Phil | 1 | 5 (calculation: (5 + 5) / 2)
30 | Phil | 2 | 2 (only one value: 2)
Could anyone suggest how I can modify my query to output the average scores based on the score from the sub-query and the ROW_NUMBER window function? I've hit the limits of my SQL skills!
Many thanks.

It is a bit hard to tell without sample data, but I think you are describing aggregation:
SELECT q.coachNodeId AS coachNodeId,
cn.[text] AS coachName,
q.youngPersonNodeId AS youngPersonNodeId,
ypn.[text] AS youngPersonName,
AVG(score)
FROM Questionnaire q JOIN
ContentNode cn
ON q.coachNodeId = cn.id JOIN
ContentNode ypn
ON q.youngPersonNodeId = ypn.id LEFT JOIN
Answer a
ON a.questionnaireId = q.id
WHERE complete = 1
GROUP BY q.coachNodeID, cn.[text] AS coachName,
q.youngPersonNodeId, ypn.[text]

How to join transactional data with customer data tables and perform case-based operations in SQL

I'm trying to perform a query between two different tables and come up with a case by case scenario, coming up with a list of records of calls for a specific month.
Here are my tables:
Customer table:
+----+----------------+------------+
| id | name | number |
+----+----------------+------------+
| 1 | John Doe | 8973221232 |
| 2 | American Dad | 7165531212 |
| 3 | Michael Clean | 8884731234 |
| 4 | Samuel Gatsby | 9197543321 |
| 5 | Mike Chat | 8794029819 |
+----+----------------+------------+
Transaction data:
+----------+------------+------------+----------+---------------------+
| trans_id | incoming | outgoing | duration | date_time |
+----------+------------+------------+----------+---------------------+
| 1 | 8973221232 | 9197543321 | 64 | 2018-03-09 01:08:09 |
| 2 | 3729920490 | 7651113929 | 276 | 2018-07-20 05:53:10 |
| 3 | 8884731234 | 8973221232 | 382 | 2018-05-02 13:12:13 |
| 4 | 8973221232 | 9234759208 | 127 | 2018-07-07 15:32:30 |
| 5 | 7165531212 | 9197543321 | 852 | 2018-08-02 07:40:23 |
| 6 | 8884731234 | 9833823023 | 774 | 2018-07-03 14:27:52 |
| 7 | 8273820928 | 2374987349 | 120 | 2018-07-06 05:27:44 |
| 8 | 8973221232 | 9197543321 | 79 | 2018-07-30 12:51:55 |
| 9 | 7165531212 | 7651113929 | 392 | 2018-05-22 02:27:38 |
| 10 | 5423541524 | 7165531212 | 100 | 2018-07-21 22:12:20 |
| 11 | 9197543321 | 2983479820 | 377 | 2018-07-20 17:46:36 |
| 12 | 8973221232 | 7651113929 | 234 | 2018-07-09 03:32:53 |
| 13 | 7165531212 | 2309483932 | 88 | 2018-07-16 16:22:21 |
| 14 | 8973221232 | 8884731234 | 90 | 2018-09-03 13:10:00 |
| 15 | 3820838290 | 2093482348 | 238 | 2018-04-12 21:59:01 |
+----------+------------+------------+----------+---------------------+
What am I trying to accomplish?
I'm trying to compile a list of "costs" for each of the customers that made calls on July 2018. The costs are based on:
1) If the customer received a call (incoming), the cost of the call is equal to the duration;
2) if the customer made a call (outgoing), the cost of the call is 100 if the call is 30 or less in duration. If it exceeds 30 duration, then the cost is 100 plus 5 * duration of the exceeded period.
If the customer didn't make any calls during that month he shouldn't be on the list.
Examples:
1) Customer American Dad has 3 incoming calls and 1 outgoing call, however only trans_id 10 and 13 are for the month of July. He should be paying a total of 538:
for trans_id 10 = 450 (100 for the first 30s + 5 * 70 for the remaining)
for trans_id 13 = 88
2) Customer Samuel Gatsby has 1 incoming call and 3 outgoing calls, however only trans_id 8 and 11 are for the month of July. He should be paying a total of 722:
for trans_id 8 = 345 (100 for the first 30s + 5 * 49 for the remaining)
for trans_id 11 = 377
Considering only these two examples, the output would be:
+----+----------------+------------+------------+
| id | name | number | billable |
+----+----------------+------------+------------+
| 2 | American Dad | 7165531212 | 538 |
| 4 | Samuel Gatsby | 9197543321 | 722 |
+----+----------------+------------+------------+
Note: Mike Chat shouldn't be on the list as he didn't make or receive any calls for that specific month.
What have I tried so far?
I've been playing cat and mouse with this one, I'm using the number as uniqueID, already attempted both a full outer join and combining where incoming or outgoing is not null then applying rules by case, tried doing a left join and applying cases, but I'm circling around and I can't get to a final list. Whenever I get incoming or outgoing, I'm either not able to apply the case or not able to come with both together. Really appreciate the help!
select customer_name.name, customer_name.number, bill = (CASE
WHEN customer_name.number = transaction_data.incoming then 'sum bill'
else 'multiply and add'
end)
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
Note: I'm using sqlite to try it out online but the database is on SQL Server 2012, so I know that I can use a date format much easier, that way, but I'd like to keep as close to T-SQL as possible.
Also tried creating a case to determine whether it's incoming call or outgoing, but I'm only getting incoming as a result, even though trans_id 10 is outgoing:
select name, number, duration, case
when customer_name.number = transaction_data.incoming then 'incoming'
when customer_name.number = transaction_data.outgoing then 'outgoing'
END direction
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'

Try this:
SELECT
c."name", c.number,
SUM(CASE c.number
WHEN t.incoming THEN t.duration
ELSE IIF(t.duration - 30 < 0, 0, t.duration - 30) * 5 + 100
END) AS billable
FROM Customer AS c INNER JOIN [Transaction] AS t
ON c.number IN(t.incoming, t.outgoing)
WHERE t.date_time >= '20180701' AND t.date_time < '20180801'
GROUP BY c."name", c.number
Output:
| name | number | billable |
+---------------+------------+----------+
| John Doe | 8973221232 | 440 |
| American Dad | 7165531212 | 538 |
| Michael Clean | 8884731234 | 774 |
| Samuel Gatsby | 9197543321 | 722 |
Test it online with SQL Fiddle.

Return Max(Recent_Date) for each month, Duplicate value and inner join issue - SQL

Basically, I want to achieve , for each months, like in this example, from January until March 2013, what is the Max(Most_Recent_Day) for each users.
Example, From January to March, every month in the Database, systems will capture the Most_Recent_Day for each users.
Below are the expected results:
User | Most_Recent_Day
--------------------------------
afolabi.banu | 1/31/2013
afolabi.banu | 2/7/2013
afolabi.banu | 3/21/2013
mario.sapiter | 1/22/2013
mario.sapiter | 2/7/2013
mario.sapiter | 3/11/2013
However, I want to have another DB column as well to be display .Below is the column.
User|Total_Hits | Recent_Month| Most_Recent_Day | Most_Recent_Days_Hits
I tried to use inner join, but the result are not what i expect. I got duplicated user name and duplicated recent day. Basically, I want only to display no duplicated record for the same user name.
Below is the result that I got. Please ignore the recent_month value since it's data from database.
User |Total_Hits | Recent_Month | Most_Recent_Day | Most_Recent_Days_Hits
-------------------------------------------------------------------------------------
afolabi.banu | 223 | 25 | 2/7/2013 | 5
afolabi.banu | 223 | 25 | 2/7/2013 | 5
afolabi.banu | 211 | 13 | 1/31/2013 | 3
afolabi.banu | 223 | 25 | 2/7/2013 | 5
afolabi.banu | 296 | 31 | 3/21/2013 | 1
afolabi.banu | 296 | 31 | 3/21/2013 | 1
mario.sapiter | 95 | 7 | 2/7/2013 | 5
mario.sapiter | 7 | 7 | 3/21/2013 | 1
mario.sapiter | 7 | 37 | 3/22/2013 | 1
mario.sapiter | 249 | 37 | 2/7/2013 | 5
This is my SQL Code
SELECT t.[User],
t.Total_Hits,
t.Recent_Month,
t.Most_Recent_Day,
t.Most_Recent_Day_Hits FROM UserUsageMonthly t
INNER JOIN
(
select
[User]
, max(Most_Recent_Day) as Most_Recent_Day
from UserUsageMonthly (NoLock)
where Application_Name='Daily Production Review' and Site_Collection='wrm13'
and Most_Recent_Day between '1/1/2013' and '3/31/2013'
group by [User], datepart(month,Most_Recent_Day)
) table2
ON
t.[User]=table2.[User]
AND t.Most_Recent_Day = table2.Most_Recent_Day
order by t.[User]

You should add the month value to your SQL SELECT
SELECT
MONTH(t.Most_Recent_Day) as 'MyMonth',
t.[User],
t.Total_Hits,
t.Recent_Month,
t.Most_Recent_Day,
t.Most_Recent_Day_Hits FROM UserUsageMonthly t
Then you can group by the month column
GROUP BY MyMonth

HOW TO: SQL Server select distinct field based on max value in other field

id tmpname date_used tkt_nr
---|---------|------------------|--------|
1 | template| 04/03/2009 16:10 | 00011 |
2 | templat1| 04/03/2009 16:11 | 00011 |
5 | templat2| 04/03/2009 16:12 | 00011 |
3 | diffname| 03/03/2009 15:11 | 00022 |
4 | diffname| 03/03/2009 16:12 | 00022 |
6 | another | 03/03/2009 16:13 | NULL |
7 | somethin| 24/12/2008 11:12 | 00023 |
8 | name | 01/01/2009 12:12 | 00026 |
I would like to have the result:
id tmpname date_used tkt_nr
---|---------|------------------|--------|
5 | templat2| 04/03/2009 16:12 | 00011 |
4 | diffname| 03/03/2009 16:12 | 00022 |
7 | somethin| 24/12/2008 11:12 | 00023 |
8 | name | 01/01/2009 12:12 | 00026 |
So what I'm looking for is to have distinct tkt_nr values excluding NULL, based on the max value of datetime.
I have tried several options but always failed
SELECT *
FROM templateFeedback a
JOIN (
SELECT ticket_number, MAX(date_used) date_used
FROM templateFeedback
GROUP BY ticket_number
) b
ON a.ticket_number = b.ticket_number AND a.date_used = b.date_used
I would appreciate any help. Unfortunately I need the code to be compatible with SQL Server.

I've stopped doing things this way since I discovered windowing functions. Too often, there are two records with the same timestamp and I get two records in the resultset. Here's the code for tSQL. Similar for Oracle. I don't think mySQL supports this yet.
Select id, tmpname, date_used, tkt_nbr
From
(
Select id, tmpname, date_used, tkt_nbr,
rownum = Row_Number() Over (Partition by tkt_nbr Order by date_used desc)
) x
Where row_num=1

Get the highest odds from the last update

I have these tables in a PostgreSQL database:
bookmakers
-----------------------
| id | name |
-----------------------
| 1 | Unibet |
-----------------------
| 2 | 888 |
-----------------------
odds
---------------------------------------------------------------------
| id | odds_type | odds_index | bookmaker_id | created_at |
---------------------------------------------------------------------
| 1 | 1 | 1.55 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 2 | 2 | 3.22 | 2 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 3 | X | 3.00 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 4 | 2 | 1.25 | 1 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 5 | 1 | 2.30 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 6 | X | 2.00 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
What I am trying to query is the following:
Give me the 1/X/2 odds from the latest update (created_at) from ALL bookmakers and from that last update, give me the highest odds for each odds_type ('1', '2', 'X').
On my website I display them as:
Best odds right now: 1 | X | 2
--------------------
2.30 | 3.00 | 3.22
I have to first get the latest, because the odds from the update from yesterday are no longer valid. Then from that last update, I have - in this case - 2 odds from 2 different bookmakers, so I need to get the best one for type '1','2','X'.
Pseudo SQL would be something like:
SELECT MAX(odds_index) WHERE odds_type = '1' ORDER BY created_at DESC, odds_index DESC
But that doesn't work, because I would always get the latest odds (and not the highest/best from those latest)
I hope I'm making sense.

Subqueries to the rescue!
select o1.odds_type, max(o1.odds_index)
from odds o1
inner join (select odds_type, max(created_at) as created_at
from odds group by odds_type) o2
on o1.odds_type = o2.odds_type
and o1.created_at = o2.created_at
group by o1.odds_type
SQLFiddle: http://sqlfiddle.com/#!3/47df4/3

Your words "from the last update" contradict your example. Here are two methods.
To get from last update, how about getting the max created_at date aka last update and then using it for the rest.
declare #max_date date
select #max_date = max(created_at) from odds
select odds_type, odds_index
from odds
where created_at = #max_date
Or to match your example
select odds_type, odds_index
from odds
group by odds_type
having created_at = max(created_at)
Note: Different DBMS give different results depending on the select columns and whether there are more columns than in the group by clause.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Too many order by, max, subqueries for my intellect - sql

Related

Generate 'average' column from sub query and ROW_NUMBER window function in SQL SELECT

How to join transactional data with customer data tables and perform case-based operations in SQL

Return Max(Recent_Date) for each month, Duplicate value and inner join issue - SQL

HOW TO: SQL Server select distinct field based on max value in other field

Get the highest odds from the last update

Categories

Resources