DeCorrelated SubQueries in Google BigQuery? - sql

I have been struggling with a problem for hours. I have found myself down multiple rabbit holes and into the realms of DeCorrelated SubQueries which are frankly beyond me...
I have two tables and I'm trying to pull from both without a common column to join against. I need to take a value from table 1, find the closest value (that is lower) in table 2, and then pull the related data from table 2.
table_1
id score
1 99.983545
2 98.674359
3 97.832475
4 96.184545
5 93.658572
6 89.963544
7 87.427353
8 82.883345
table_2
average_level percentile
99.743545 99
97.994359 98
97.212485 97
96.987545 96
95.998573 95
88.213584 94
87.837384 93
80.982147 92
From the two tables above I need to:
Take the id and score
Identify the closest average_level that is lower than the score
Include the corresponding average_level and percentile
The hoped for output would look like this...
id score average_level percentile
1 99.983545 99.743545 99
2 98.674359 97.994359 98
3 97.832475 97.212485 97
4 96.184545 95.998573 95
5 93.658572 88.213584 94
6 89.963544 88.213584 94
7 87.427353 80.982147 92
8 82.883345 80.982147 92
Any help or advice would be very much appreciated

You can do this by joining both tables on table_1.score >= table_2.average_level, then taking MAX(average_level) and MAX(percentile), which will be the closest values from table_2 that are lower than or equal to the score, and grouping by the fields in table_1. (MAX(percentile) lands on the matching row only because percentile rises together with average_level in this data.)
SELECT TABLE_1.ID, TABLE_1.SCORE,
MAX(TABLE_2.AVERAGE_LEVEL) AS AVERAGE_LEVEL,
MAX(TABLE_2.PERCENTILE) AS PERCENTILE
FROM TABLE_1 INNER JOIN TABLE_2
ON TABLE_1.SCORE >= TABLE_2.AVERAGE_LEVEL
GROUP BY TABLE_1.ID, TABLE_1.SCORE
ORDER BY TABLE_1.ID
I added a fiddle example, which also includes Ömer's answer.
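To check the join-and-MAX idea against the sample data, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite only stands in for BigQuery here, the function name closest_lower_levels is made up for illustration, and the aliases t1 and t2 are not from the answer.

```python
import sqlite3

# Sample rows copied from the question.
TABLE_1 = [(1, 99.983545), (2, 98.674359), (3, 97.832475), (4, 96.184545),
           (5, 93.658572), (6, 89.963544), (7, 87.427353), (8, 82.883345)]
TABLE_2 = [(99.743545, 99), (97.994359, 98), (97.212485, 97), (96.987545, 96),
           (95.998573, 95), (88.213584, 94), (87.837384, 93), (80.982147, 92)]


def closest_lower_levels():
    """Join each score to every lower-or-equal level, then keep the maximum."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_1 (id INTEGER, score REAL)")
    con.execute("CREATE TABLE table_2 (average_level REAL, percentile INTEGER)")
    con.executemany("INSERT INTO table_1 VALUES (?, ?)", TABLE_1)
    con.executemany("INSERT INTO table_2 VALUES (?, ?)", TABLE_2)
    rows = con.execute("""
        SELECT t1.id, t1.score,
               MAX(t2.average_level) AS average_level,
               MAX(t2.percentile)    AS percentile
        FROM table_1 AS t1
        INNER JOIN table_2 AS t2
                ON t1.score >= t2.average_level   -- non-equi join: lower-or-equal levels only
        GROUP BY t1.id, t1.score
        ORDER BY t1.id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in closest_lower_levels():
        print(row)
```

On the sample data this gives the eight rows from the hoped-for output. A score lower than every average_level would drop out of the result because of the INNER JOIN.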

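If we call the first table Score and the second one average, you can try this. Note that TOP is SQL Server syntax, and ordering by the absolute difference picks the nearest average_level in either direction, not only lower ones: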
select *
from Score s
inner join average a on a.Percentile = (select top(1) al.Percentile from average al order by Abs(average_level - s.score))
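A variant of this subquery idea that only considers lower-or-equal levels could look like the sketch below. It runs on SQLite from Python, so LIMIT 1 replaces TOP(1). The thread does not confirm whether BigQuery accepts this correlated form, and the function name is hypothetical.

```python
import sqlite3

# Sample rows copied from the question, loaded under the table names this answer uses.
SCORES = [(1, 99.983545), (2, 98.674359), (3, 97.832475), (4, 96.184545),
          (5, 93.658572), (6, 89.963544), (7, 87.427353), (8, 82.883345)]
LEVELS = [(99.743545, 99), (97.994359, 98), (97.212485, 97), (96.987545, 96),
          (95.998573, 95), (88.213584, 94), (87.837384, 93), (80.982147, 92)]


def closest_lower_by_subquery():
    """Pick, per score, the single nearest level that does not exceed it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Score (id INTEGER, score REAL)")
    con.execute("CREATE TABLE average (average_level REAL, percentile INTEGER)")
    con.executemany("INSERT INTO Score VALUES (?, ?)", SCORES)
    con.executemany("INSERT INTO average VALUES (?, ?)", LEVELS)
    rows = con.execute("""
        SELECT s.id, s.score, a.average_level, a.percentile
        FROM Score AS s
        INNER JOIN average AS a
                ON a.percentile = (
                    SELECT al.percentile
                    FROM average AS al
                    WHERE al.average_level <= s.score   -- ignore higher levels
                    ORDER BY al.average_level DESC      -- nearest remaining level first
                    LIMIT 1
                )
        ORDER BY s.id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in closest_lower_by_subquery():
        print(row)
```

Because the subquery returns the percentile of one specific row, this form does not rely on percentile and average_level increasing together.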

Related

SQL query to order sum of a column

I have the following tables:
Drivers table, with a Driver_Code column
Route_files table, with a Driver_Code column and a Route_Code column
Routes table, with a Route_Code column and a Kilometers column
For every entry in the Drivers table there may be more than 1 entry in the Route_files table with the same Driver_Code. For every entry in the Route_files table, there is only one entry in the Routes table with the same Route_Code.
What I am trying to do is order the Drivers based on the total number of kilometers that they drove. So if I have the following data:
Drivers:
Driver_Code
2
3
4
Route_files:
Driver_Code Route_Code
2 20
2 50
2 30
3 30
4 40
Routes:
Route_Code Kilometers
20 1231
30 9
40 400000
50 24234
Then Driver 2 drove routes 20 30 and 50 so the total kilometers is 25474. Similarly driver 3 drove 9km and driver 4 drove 400000. The SQL query that I need should output:
Driver_Code Total_km
4 400000
2 25474
3 9
I tried to use an inner join on the Route_files and Routes tables to obtain a single "table" with all the necessary information, hoping that I could then query this joined table further, but couldn't figure out how to do that. I am working in dBase 2019 (and can't change to something better, unfortunately). Any hints and ideas are appreciated!
I finally managed to do it. This is the working query:
select
Route_files.Driver_Code,
SUM(Routes.Kilometers) as Total_km
from
Route_files
inner join Routes on Route_files.Route_Code = Routes.Route_Code
GROUP BY
Route_files.Driver_Code
ORDER BY
Total_km Descending
Initially I was doing select Driver_Code, Kilometers, SUM(Kilometers), and when I tried to add GROUP BY, dBase forced me to group by Driver_Code as well as Kilometers. That meant SUM was applied to every single entry instead of to all the entries of a single Driver_Code, which is what I needed. Now I finally understand what GROUP BY does!
Thanks everyone for your comments!
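For anyone who wants to try the grouping on the sample data, here is a small sketch using SQLite from Python in place of dBase. It uses the table and column names from the question; the function name and the aliases rf and r are made up for illustration.

```python
import sqlite3


def total_km_per_driver():
    """Sum the kilometers of all routes driven by each driver, largest first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Route_files (Driver_Code INTEGER, Route_Code INTEGER)")
    con.execute("CREATE TABLE Routes (Route_Code INTEGER, Kilometers INTEGER)")
    con.executemany("INSERT INTO Route_files VALUES (?, ?)",
                    [(2, 20), (2, 50), (2, 30), (3, 30), (4, 40)])
    con.executemany("INSERT INTO Routes VALUES (?, ?)",
                    [(20, 1231), (30, 9), (40, 400000), (50, 24234)])
    rows = con.execute("""
        SELECT rf.Driver_Code, SUM(r.Kilometers) AS Total_km
        FROM Route_files AS rf
        INNER JOIN Routes AS r ON rf.Route_Code = r.Route_Code
        GROUP BY rf.Driver_Code        -- one output row per driver
        ORDER BY Total_km DESC
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(total_km_per_driver())
```

Grouping only by Driver_Code is what lets SUM run over all of a driver's routes. Adding Kilometers to the GROUP BY would create one group per distinct distance instead.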

Grouping and Summing Totals in a Joined Table

I have two tables, Medication and Inventory. I'm trying to SELECT all the details below from both tables, but the INVENTORY table has multiple rows per medication ID, each with a different BRANCH_NO (the primary key of INVENTORY is the composite key BRANCH_NO, MEDICATION_ID).
I need to total the stock for each MEDICATION_ID and join the tables in one SELECT command, displaying all the information for each med (there are 5) with its total at the end of each row. But I'm getting muddled trying GROUP BY and SUM and, at one point, PARTITION. Help please, I'm new to this.
Below is the latest non-working version. It doesn't display the following columns, as I had hoped it might:
Medication Name
Medication Desc
Manufacturer
Pack Size
SELECT I.MEDICATION_ID,
SUM(I.STOCK_LEVEL)
FROM INVENTORY I
INNER JOIN (SELECT MEDICATION_NAME, SUBSTR(MEDICATION_DESC,1,20) "Medication Description",
MANUFACTURER, PACK_SIZE FROM MEDICATION) M ON MEDICATION_ID=I.MEDICATION_ID
GROUP BY I.MEDICATION_ID;
For the data imagine I want this sort of output:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 10
2 Bravo 20
3 Charlie 20
1 Alpha 30
4 Delta 10
5 Echo 20
5 Echo 40
2 Bravo 10
grouping and totalling into this:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 40
2 Bravo 30
3 Charlie 20
4 Delta 10
5 Echo 60
I can get this when it's just one table, but when I'm trying to join tables and also SELECT other columns it's just not working.
Thanks in advance guys. I appreciate it may be a simple solution, but it will be a big help.
You need to list every non-aggregated column explicitly in both the SELECT and GROUP BY lists. (By the way, there is no need for a nested query; if you do keep it, note that the MEDICATION_ID column is missing from it.)
SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL,
SUBSTR(M.MEDICATION_DESC,1,20) "Medication Description", M.MANUFACTURER, M.PACK_SIZE
FROM INVENTORY I
JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, SUBSTR(M.MEDICATION_DESC,1,20),
M.MANUFACTURER, M.PACK_SIZE;
This way, you'll be able to return all the listed columns.
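The query above can be tried out with a sketch like the following, which runs it on SQLite from Python. The stock levels and names come from the question. The branch numbers, descriptions, manufacturers and pack sizes are invented placeholders, and the function name is hypothetical.

```python
import sqlite3

NAMES = {1: "Alpha", 2: "Bravo", 3: "Charlie", 4: "Delta", 5: "Echo"}
# (BRANCH_NO, MEDICATION_ID, STOCK_LEVEL); branch numbers are placeholders.
INVENTORY = [(1, 1, 10), (1, 2, 20), (1, 3, 20), (2, 1, 30),
             (1, 4, 10), (1, 5, 20), (2, 5, 40), (2, 2, 10)]


def stock_per_medication():
    """Total STOCK_LEVEL per medication while keeping the descriptive columns."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE MEDICATION (MEDICATION_ID INTEGER PRIMARY KEY,
                   MEDICATION_NAME TEXT, MEDICATION_DESC TEXT,
                   MANUFACTURER TEXT, PACK_SIZE INTEGER)""")
    con.execute("""CREATE TABLE INVENTORY (BRANCH_NO INTEGER, MEDICATION_ID INTEGER,
                   STOCK_LEVEL INTEGER, PRIMARY KEY (BRANCH_NO, MEDICATION_ID))""")
    con.executemany(
        "INSERT INTO MEDICATION VALUES (?, ?, ?, ?, ?)",
        [(i, n, f"Placeholder description for {n} tablets", "Acme (placeholder)", 30)
         for i, n in NAMES.items()])
    con.executemany("INSERT INTO INVENTORY VALUES (?, ?, ?)", INVENTORY)
    rows = con.execute("""
        SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL,
               SUBSTR(M.MEDICATION_DESC, 1, 20) AS "Medication Description",
               M.MANUFACTURER, M.PACK_SIZE
        FROM INVENTORY I
        JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
        GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, SUBSTR(M.MEDICATION_DESC, 1, 20),
                 M.MANUFACTURER, M.PACK_SIZE   -- every non-aggregated column is listed
        ORDER BY I.MEDICATION_ID
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in stock_per_medication():
        print(row)
```

Since every descriptive column depends on MEDICATION_ID, adding them to GROUP BY does not split the groups any further; it only satisfies the rule that non-aggregated columns must be grouped.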

SQL select command SUM across 3 related tables

I've changed my DB structure to make it more future proof. Now I'm having trouble with the new select query.
I have a table called activities that lists activities and how many steps per minute each activity is worth. The table was structured like this:
Activities
id act_name act_steps
12 Boxing 250
14 Karate 300
17 Yoga 89
I have another table called distance that is structured like this:
Distance
id dist_activity_id dist_activity_duration member_id
1 12 60 12
2 14 90 12
3 17 30 12
I have the query that would SUM and produce a total for all activities in the distance table
SELECT ROUND(SUM(act_steps * dist_activity_duration / 2000),2) AS total_miles
FROM distance,
activities
WHERE activities.id = distance.dist_activity_id
This worked fine.
To future-proof it in case the number of steps for an activity changes, I've set up a table called steps that is structured like this:
Steps
id activity_steps
1 6
2 250
3 300
4 89
I then updated the activities table, removing the act_steps column and replacing it with steps_id so it now looks like this:
Updated activities
id act_name steps_id
12 Boxing 2
14 Karate 3
17 Yoga 4
I'm not sure how to create the select command to get the SUM using the new structure.
Could someone please help me with this?
Thanks
Wayne
Learn to use proper JOIN syntax! Your query should look like:
SELECT ROUND(SUM(a.act_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
activities a
ON a.id = d.dist_activity_id;
If you need to lookup the steps, then add another JOIN:
SELECT ROUND(SUM(s.activity_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
activities a
ON a.id = d.dist_activity_id JOIN
steps s
ON s.id = a.steps_id;
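As a quick check of the three-table version, here is a sketch on SQLite from Python using the sample rows from the question. One deliberate difference: it divides by 2000.0, because SQLite truncates integer division; whether the original database does the same is not stated in the thread. The function name is made up.

```python
import sqlite3


def total_miles():
    """Join distance -> activities -> steps and total the converted miles."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE steps (id INTEGER, activity_steps INTEGER)")
    con.execute("CREATE TABLE activities (id INTEGER, act_name TEXT, steps_id INTEGER)")
    con.execute("""CREATE TABLE distance (id INTEGER, dist_activity_id INTEGER,
                   dist_activity_duration INTEGER, member_id INTEGER)""")
    con.executemany("INSERT INTO steps VALUES (?, ?)",
                    [(1, 6), (2, 250), (3, 300), (4, 89)])
    con.executemany("INSERT INTO activities VALUES (?, ?, ?)",
                    [(12, "Boxing", 2), (14, "Karate", 3), (17, "Yoga", 4)])
    con.executemany("INSERT INTO distance VALUES (?, ?, ?, ?)",
                    [(1, 12, 60, 12), (2, 14, 90, 12), (3, 17, 30, 12)])
    (miles,) = con.execute("""
        SELECT ROUND(SUM(s.activity_steps * d.dist_activity_duration / 2000.0), 2)
                   AS total_miles               -- 2000.0 forces floating-point division
        FROM distance d
        JOIN activities a ON a.id = d.dist_activity_id
        JOIN steps s      ON s.id = a.steps_id
    """).fetchone()
    con.close()
    return miles


if __name__ == "__main__":
    print(total_miles())
```

With the sample rows the unrounded total is (250*60 + 300*90 + 89*30) / 2000 = 44670 / 2000 = 22.335 miles.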

Select row with shortest string in one column if there are duplicates in another column?

Let's say I have a database with rows like this
ID PNR NAME
1 35 Television
2 35 Television, flat screen
3 35 Television, CRT
4 87 Hat
5 99 Cup
6 99 Cup, small
I want to select each individual type of item (television, hat, cup) - but for the ones that have multiple entries in PNR I only want to select the one with the shortest NAME. So the result set would be
ID PNR NAME
1 35 Television
4 87 Hat
5 99 Cup
How would I construct such a query using SQLite? Is it even possible, or do I need to do this filtering in the application code?
Since SQLite 3.7.11, you can use MIN() or MAX() to select a row in a group:
SELECT ID,
PNR,
Name,
min(length(Name))
FROM MyTable
GROUP BY PNR;
You can use the MIN(length(Name)) aggregate function to find the minimum name length per PNR; the slightly tricky part is getting the corresponding ID and NAME into the result. The following query should work:
select mt1.ID, mt1.PNR, mt1.Name
from MyTable mt1 inner join (
select pnr, min(length(Name)) as minlength
from MyTable group by pnr) mt2
on mt1.pnr = mt2.pnr and length(mt1.Name) = mt2.minlength

Need to repeat and calculate values in a single Select statement

I hope that someone can help me with my issue. I need to create in a single SELECT statement (the system that we use has some pivot tables in Excel that handle one single SELECT) the following:
I have an INL (Invoice Lines) table that has a lot of fields, but the important one is the date.
INL_ID DATE
19 2004-03-15 00:00:00.000
20 2004-03-15 00:00:00.000
21 2004-03-15 00:00:00.000
22 2004-03-16 00:00:00.000
23 2004-03-16 00:00:00.000
24 2004-03-16 00:00:00.000
Now, I also have an ILD (Invoice Line Details) table that is related to the INL table by an ID field. From this second table I need to use the scu_qty field to "repeat" values from the first one in my results sheet.
The ILD table values that we need are:
INL_ID scu_qty
19 1
20 1
21 1
22 4
23 4
Now, using scu_qty, I need to repeat the row from the first table and add one day for each repeated record; scu_qty is the number of days of the services that we sell in the ILD table.
So I need to get something like the following (I'm going to show INL_ID 22, which as you can see has an SCU_QTY value other than 1). The result of the select has to give me something like:
INL_ID DATE
22 2004-03-15 0:00:00
22 2004-03-16 0:00:00
22 2004-03-17 0:00:00
22 2004-03-18 0:00:00
Here I only wrote the fields that need to be repeated and calculated. Of course I will need more fields, but they will simply be repeated from the INL table, so I left them out to avoid confusion.
I hope that someone can help me with this; this report is very important for us. Thanks a lot in advance
(Sorry for my English, that isn't my first language)
SELECT INL_ID, scu_qty, CalculatedDATE ...
FROM INL
INNER JOIN ILD ON ...
INNER JOIN SequenceTable ON SequenceTable.seqNo <= ILD.scu_qty
ORDER BY INL_ID, SequenceTable.seqNo
Depending on your SQL flavour, you will need to look up the date manipulation functions to do
CalculatedDATE = {INL.DATE + SequenceTable.seqNo (days)}
select INL.INL_ID, `DATE`
from
INL
inner join
ILD on INL.INL_ID = ILD.INL_ID
inner join (
select 1 as qty union select 2 union select 3 union select 4
) s on s.qty <= ILD.scu_qty
order by INL.INL_ID
Instead of that subselect you will need a table if the quantity gets much bigger. Or tell us which RDBMS you use and there may be an easier way.
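Putting the two answers together, a sketch of the whole thing on SQLite (run from Python) might look like this. The date arithmetic uses SQLite's date() function with a '+N days' modifier, so it will differ on other systems. It assumes the series starts on the line's own DATE, and the alias n, the column name calculated_date and the function name are made up.

```python
import sqlite3


def expand_invoice_lines():
    """Repeat each invoice line scu_qty times, adding one day per repetition."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE INL (INL_ID INTEGER, "DATE" TEXT)')
    con.execute("CREATE TABLE ILD (INL_ID INTEGER, scu_qty INTEGER)")
    con.executemany("INSERT INTO INL VALUES (?, ?)", [
        (19, "2004-03-15 00:00:00.000"), (20, "2004-03-15 00:00:00.000"),
        (21, "2004-03-15 00:00:00.000"), (22, "2004-03-16 00:00:00.000"),
        (23, "2004-03-16 00:00:00.000"), (24, "2004-03-16 00:00:00.000"),
    ])
    con.executemany("INSERT INTO ILD VALUES (?, ?)",
                    [(19, 1), (20, 1), (21, 1), (22, 4), (23, 4)])
    rows = con.execute("""
        SELECT INL.INL_ID,
               date(INL."DATE", '+' || (s.n - 1) || ' days') AS calculated_date
        FROM INL
        INNER JOIN ILD ON INL.INL_ID = ILD.INL_ID
        INNER JOIN (
            -- small inline sequence; a real numbers table scales better
            SELECT 1 AS n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
        ) AS s ON s.n <= ILD.scu_qty
        ORDER BY INL.INL_ID, s.n
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in expand_invoice_lines():
        print(row)
```

The join condition s.n <= ILD.scu_qty is what produces the repetition: a line with scu_qty 4 matches sequence values 1 to 4 and so appears four times, each with a different day offset.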