I have a dataset about trains. It includes a table with customer information: a number representing an age group and the amount of travellers in that age group.
The ID represents a location which has multiple departure times, each of which has multiple age groups.
The data looks something like this:
StationID  Time of Departure  TravellerID  Amount of travellers
1          12:13              4001         30
1          12:13              4002         15
1          19:45              4001         10
1          19:45              4002         20
I want to sum the amount of travellers for each departure.
I tried to code it this way:
SELECT StationID,[Time of Departure], sum(Amount)
FROM Train_Stations AS TS
INNER JOIN DepartureData AS DD
ON DD.FK_StationID = TS.PK_StationID
INNER JOIN CustomerInfo AS CI
ON CI.FK_StationID = TS.PK_StationID
GROUP BY StationID, [Time of Departure]
The result is like this:
StationID  Time of Departure  Amount
1          12:13              75
1          12:13              75
1          19:45              75
1          19:45              75
But I want it like this:
StationID  Time of Departure  Amount
1          12:13              45
1          19:45              30
It seems you are doing something different from what you posted. With the sample data you showed, the aggregation itself is correct:
WITH CTE(StationID,DEPARTURE_TIME,TRAVELLERID,AMOUNT_OF_TRAVELLERS) AS
(
SELECT 1,CAST('12:13'AS TIME),4001,30 UNION ALL
SELECT 1,CAST('12:13'AS TIME),4002,15 UNION ALL
SELECT 1,CAST('19:45'AS TIME),4001,10 UNION ALL
SELECT 1,CAST('19:45'AS TIME),4002,20
)
SELECT C.StationID,C.DEPARTURE_TIME,SUM(AMOUNT_OF_TRAVELLERS)TOTAL_TRAVELLERS
FROM CTE AS C
GROUP BY C.StationID,C.DEPARTURE_TIME
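For anyone who wants to check this aggregation without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name departures and its column names are assumptions made for this illustration; only the four sample rows come from the question.

```python
import sqlite3

# In-memory database holding the four sample rows from the question.
# The table and column names are hypothetical, chosen for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE departures ("
    "StationID INTEGER, DepartureTime TEXT, TravellerID INTEGER, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO departures VALUES (?, ?, ?, ?)",
    [(1, "12:13", 4001, 30), (1, "12:13", 4002, 15),
     (1, "19:45", 4001, 10), (1, "19:45", 4002, 20)],
)

# One output row per (station, departure), summing over the age groups.
totals = conn.execute(
    "SELECT StationID, DepartureTime, SUM(Amount) "
    "FROM departures "
    "GROUP BY StationID, DepartureTime "
    "ORDER BY StationID, DepartureTime"
).fetchall()
print(totals)  # [(1, '12:13', 45), (1, '19:45', 30)]
```

This matches the totals the question asks for (45 and 30), which suggests the problem lies in the joins rather than in the GROUP BY.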
You should qualify the columns with the table alias, e.g. DD.StationID. It should then return the expected result.
SELECT DD.StationID,DD.[Time of Departure], sum(DD.Amount)
FROM Train_Stations AS TS
INNER JOIN DepartureData AS DD
ON DD.FK_StationID = TS.PK_StationID
INNER JOIN CustomerInfo AS CI
ON CI.FK_StationID = TS.PK_StationID
GROUP BY DD.StationID, DD.[Time of Departure]
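A possible explanation for the repeated 75, offered as an assumption since the real schema is not shown: if CustomerInfo is joined on the station key only, every departure of a station is paired with every customer row of that station, so each group sums all four amounts (30 + 15 + 10 + 20 = 75). The sketch below uses Python's sqlite3 with a hypothetical, reduced layout in which CustomerInfo also carries the departure time; the DepartureTime column is not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: the real tables are not shown in the question.
conn.executescript("""
CREATE TABLE DepartureData (FK_StationID INTEGER, DepartureTime TEXT);
CREATE TABLE CustomerInfo (FK_StationID INTEGER, DepartureTime TEXT,
                           TravellerID INTEGER, Amount INTEGER);
INSERT INTO DepartureData VALUES (1, '12:13'), (1, '19:45');
INSERT INTO CustomerInfo VALUES
    (1, '12:13', 4001, 30), (1, '12:13', 4002, 15),
    (1, '19:45', 4001, 10), (1, '19:45', 4002, 20);
""")

base = """
SELECT DD.FK_StationID, DD.DepartureTime, SUM(CI.Amount)
FROM DepartureData AS DD
INNER JOIN CustomerInfo AS CI
    ON CI.FK_StationID = DD.FK_StationID {extra}
GROUP BY DD.FK_StationID, DD.DepartureTime
ORDER BY DD.DepartureTime
"""

# Join on the station only: each departure matches all four customer rows.
station_only = conn.execute(base.format(extra="")).fetchall()
# Join on station and departure time: each departure matches only its own rows.
station_and_time = conn.execute(
    base.format(extra="AND CI.DepartureTime = DD.DepartureTime")
).fetchall()

print(station_only)      # [(1, '12:13', 75), (1, '19:45', 75)]
print(station_and_time)  # [(1, '12:13', 45), (1, '19:45', 30)]
```

If the real CustomerInfo table links to a departure through some other key, the same idea applies: the join condition has to identify the departure, not just the station.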
Tables - Store
Stores  Date        Customer_ID
A       01/01/2020  1111
C       01/01/2020  1111
F       02/01/2020  1234
A       02/01/2020  1111
A       02/01/2020  2222
Tables - Customer
Customer_ID  Age_Group     Income_Level
1111         26-30         Low
1234         25 and below  Mid
2222         31-60         High
I want to know how I can get this output.
Stores  Age_Group     Percentage_by_Age  Income_Level  Percentage_By_Income
A       25 and below  10                 Low           80
A       25 and below  10                 Mid           10
A       25 and below  10                 High          10
A       26 - 30       42                 Low           15
A       26 - 30       42                 Mid           65
A       26 - 30       42                 High          20
A       31 - 60       48                 Low           30
A       31 - 60       48                 Mid           50
A       31 - 60       48                 High          20
I am using SQL to query from different tables.
First I need to aggregate the number of customers by store. Then, within each store (for example Store A), I want to find out how many customers fall into a particular age group (e.g. 25 and below), and how many of those are in each income level.
May I know how I can go about solving this query?
Thanks.
My current solution/thought process
SELECT
stores AS Stores,
Age_Group AS Age,
Income_Level AS Income,
COUNT(DISTINCT Customer_ID) AS Number_of_Customers
FROM tables JOIN tables....
GROUP BY Stores, Age, Income;
And then manually calculating the percentages.
But it doesn't seem right.
Is there a way to produce an example output table using just SQL?
As per your requirement, Common Table Expressions (CTEs) can be used. You can use the code below to get the expected output.
WITH
  data_for_percent_by_income AS (
    SELECT
      COUNT(customer_id) AS cus_count_in_per_income_level_and_agegrp,
      Age_group AS age_g,
      income_level AS inc_lvl
    FROM
      `project.dataset.Customer2`
    WHERE
      customer_id IN (
        SELECT customer_id
        FROM `project.dataset.Store5`
        WHERE stores='A')
    GROUP BY
      Age_group, income_level),
  tot_cus_in_defined_income_level AS (
    SELECT
      COUNT(customer_id) AS cus_count_in_per_income_level,
      Age_group AS ag
    FROM
      `project.dataset.Customer2`
    WHERE
      customer_id IN (
        SELECT customer_id
        FROM `project.dataset.Store5`
        WHERE stores='A')
    GROUP BY
      Age_group),
  tot_cus_storeA AS (
    SELECT
      COUNT(*) AS tot_cus_in_A
    FROM
      `project.dataset.Customer2`
    WHERE
      customer_id IN (
        SELECT customer_id
        FROM `project.dataset.Store5`
        WHERE stores='A')),
  final_view AS (
    SELECT
      ROUND(cus_count_in_per_income_level_and_agegrp*100/cus_count_in_per_income_level) AS p_by_inc,
      age_g,
      inc_lvl
    FROM
      data_for_percent_by_income
    INNER JOIN
      tot_cus_in_defined_income_level
    ON
      data_for_percent_by_income.age_g=tot_cus_in_defined_income_level.ag)
SELECT
  stores, tot_cus_in_defined_income_level.ag AS age_group, income_level,
  ROUND(cus_count_in_per_income_level*100/tot_cus_in_A) AS percentage_by_age,
  p_by_inc AS percentage_by_income
FROM
  tot_cus_in_defined_income_level, tot_cus_storeA, `project.dataset.Customer2`, `project.dataset.Store5`
INNER JOIN
  final_view
ON
  age_group=final_view.age_g AND income_level=final_view.inc_lvl
WHERE
  tot_cus_in_defined_income_level.ag = Age_group AND stores='A'
GROUP BY
  stores, percentage_by_age, age_group, income_level, percentage_by_income
ORDER BY Age_group
I have attached screenshots of the input tables and the output table (Customer Table, Store Table, Output Table).
SELECT
s.Stores AS Stores,
c.age_group AS Age,
a.income_level AS Affluence,
CAST(COUNT(DISTINCT c.Customer_ID) AS numeric)*100/SUM(CAST(COUNT(DISTINCT c.Customer_ID) AS numeric)) OVER(PARTITION BY s.Stores ) AS Perc_of_Members
This is what I did in the end.
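For reference, the idea of dividing each group's count by a windowed total can be tried out on the sample data from the question. This is only a sketch in Python's sqlite3, not the exact query above: it computes the share per age group within each store, the counts CTE and the column alias n are names invented here, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the Store and Customer tables in the question.
conn.executescript("""
CREATE TABLE Store (Stores TEXT, Date TEXT, Customer_ID INTEGER);
CREATE TABLE Customer (Customer_ID INTEGER, Age_Group TEXT, Income_Level TEXT);
INSERT INTO Store VALUES
    ('A', '01/01/2020', 1111), ('C', '01/01/2020', 1111),
    ('F', '02/01/2020', 1234), ('A', '02/01/2020', 1111),
    ('A', '02/01/2020', 2222);
INSERT INTO Customer VALUES
    (1111, '26-30', 'Low'), (1234, '25 and below', 'Mid'),
    (2222, '31-60', 'High');
""")

# Step 1 (CTE): distinct customers per store and age group.
# Step 2: divide each count by the store total via SUM(...) OVER (PARTITION BY ...).
percentages = conn.execute("""
WITH counts AS (
    SELECT s.Stores, c.Age_Group, COUNT(DISTINCT c.Customer_ID) AS n
    FROM Store AS s
    INNER JOIN Customer AS c ON c.Customer_ID = s.Customer_ID
    GROUP BY s.Stores, c.Age_Group
)
SELECT Stores, Age_Group,
       100.0 * n / SUM(n) OVER (PARTITION BY Stores) AS Percentage_by_Age
FROM counts
ORDER BY Stores, Age_Group
""").fetchall()

for row in percentages:
    print(row)
```

With the five sample visits, Store A has two distinct customers in two age groups, so each group comes out at 50 percent. The income-level percentage can be added the same way by grouping on Income_Level as well and partitioning by store and age group.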
Given an hourly table A with full heart_rate records, e.g.:
User Hour Heart_rate
Joe 1 60
Joe 2 70
Joe 3 72
Joe 4 75
Joe 5 68
Joe 6 71
Joe 7 78
Joe 8 83
Joe 9 85
Joe 10 80
And a table B holding the subset of hours in which a purchase happened, e.g.:
User Hour Purchase
Joe 3 'Soda'
Joe 9 'Coke'
Joe 10 'Doughnut'
I want to keep only those records from A that are in B or at most 2 hours before a record in B, without duplication, while preserving both the heart_rate from A and the item purchased from B, so the outcome is:
User Hour Heart_rate Purchase
Joe 1 60 null
Joe 2 70 null
Joe 3 72 'Soda'
Joe 7 78 null
Joe 8 83 null
Joe 9 85 'Coke'
Joe 10 80 'Doughnut'
How can the result be achieved with an inner join, without duplication (in this case of hours 8 and 9)? (This is an MWE; assume multiple users, and timestamps instead of hours.)
The obvious solution is to combine
Inner Join + deduplication
Left join
Can this be achieved in a more elegant way?
You could use an INNER join of the tables and conditional aggregation for the deduplication:
SELECT a.User, a.Hour, a.Heart_rate,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
WHERE a.User = 'Joe' -- remove this line if you want results for all users
GROUP BY a.User, a.Hour, a.Heart_rate;
Or with MAX() window function:
SELECT DISTINCT a.*,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) OVER (PARTITION BY a.User, a.Hour) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour;
See the demo (written for MySQL, but it is standard SQL).
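If you want to try the conditional-aggregation version locally, here is a small sketch that runs it against the sample data with Python's sqlite3 module. The ORDER BY and the explicit AS are additions for readability; everything else follows the first query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tables a (heart rates) and b (purchases) with the sample rows from the question.
conn.executescript("""
CREATE TABLE a (User TEXT, Hour INTEGER, Heart_rate INTEGER);
CREATE TABLE b (User TEXT, Hour INTEGER, Purchase TEXT);
INSERT INTO a VALUES
    ('Joe', 1, 60), ('Joe', 2, 70), ('Joe', 3, 72), ('Joe', 4, 75),
    ('Joe', 5, 68), ('Joe', 6, 71), ('Joe', 7, 78), ('Joe', 8, 83),
    ('Joe', 9, 85), ('Joe', 10, 80);
INSERT INTO b VALUES ('Joe', 3, 'Soda'), ('Joe', 9, 'Coke'), ('Joe', 10, 'Doughnut');
""")

# The range join can match one row of a to several rows of b (hours 8 and 9
# both fall in the windows of purchases 9 and 10). GROUP BY collapses those
# duplicates, and the CASE keeps a purchase only when the hours are equal.
rows = conn.execute("""
SELECT a.User, a.Hour, a.Heart_rate,
       MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) AS Purchase
FROM a INNER JOIN b
  ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
GROUP BY a.User, a.Hour, a.Heart_rate
ORDER BY a.User, a.Hour
""").fetchall()

for row in rows:
    print(row)
```

This prints seven rows (hours 1, 2, 3, 7, 8, 9, 10), with Soda, Coke and Doughnut on hours 3, 9 and 10 and None elsewhere, which is the outcome requested in the question.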
Your solutions should work and look good.
There is another way, using three nested SELECT statements.
The inner SELECT combines both tables with UNION ALL. Because only tables with the same columns can be combined, fields that exist in only one table have to be defined in the other one as well and set to null. The column hour_eat is added to record the hour of each purchase. By sorting this combined table, we can achieve that each row from table B is directly followed by the row of table A for the same hour.
In the middle SELECT, lag(Purchase) fetches the Purchase value of the preceding row. If we only look at the rows from the first table, the Purchase value from the second table is now in the right place. This comes in handy if timestamps rather than fixed hours are used. The last_value expression computes the time between the heart-rate measurement and the next purchase.
The outer SELECT filters the rows of interest: only rows of the first table, and only those at most 2 hours before a purchase.
With
heart_tbl as (SELECT "Joe" as USER, row_number() over() Hour, Heart_rate from unnest([60,70,72,75,68,71,78,83,85,80]) Heart_rate ),
eat_tbl as (Select "Joe" as User ,3 Hour , 'Soda' as Purchase UNION ALL SELECT "Joe", 9, 'Coke' UNION ALL SELECT "Joe", 10, 'Doughnut' )
SELECT user, hour,heart_rate,Purchase_,hours_till_Purchase
from
(
SELECT *,
lag(Purchase) over (order by hour, heart_rate is not null) as Purchase_,
hour-last_value(hour_eat ignore nulls) over (order by hour desc,heart_rate is not null) as hours_till_Purchase
From # combine both tables to one table (ordered by hours)
(
SELECT user, hour,heart_rate, null as Purchase, null as hour_eat from heart_tbl
UNION ALL
Select user, hour, null as heart_rate, Purchase, hour from eat_tbl
)
)
Where heart_rate is not null and hours_till_Purchase >= -2
order by hour
I have a control table in which prices are tracked per item number and date.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting the values below:
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question is: is my query right or wrong, even though I am getting my desired result?
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
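As a quick sanity check that the three variants agree, here is a sketch that runs them on the sample rows with Python's sqlite3 module. Dates are stored as ISO strings so that MAX and ORDER BY compare them correctly; that storage choice, and the explicit ORDER BY on each query, are assumptions of this example. The window version needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER, ItemNo TEXT, Price INTEGER, Date TEXT)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", [
    (1, "a001", 100, "2003-01-01"), (2, "a001", 105, "2003-01-02"),
    (3, "a001", 110, "2003-01-03"), (4, "b100", 50, "2003-01-01"),
    (5, "b100", 55, "2003-01-02"), (6, "b100", 60, "2003-01-03"),
    (7, "c501", 35, "2003-01-01"), (8, "c501", 38, "2003-01-02"),
    (9, "c501", 42, "2003-01-03"), (10, "a001", 95, "2004-01-01"),
])

queries = {
    # Join against the per-item maximum date (the query from the question).
    "join": """
        SELECT pr.* FROM prices pr
        INNER JOIN (SELECT ItemNo, MAX(Date) AS max_date
                    FROM prices GROUP BY ItemNo) p
            ON pr.ItemNo = p.ItemNo AND pr.Date = p.max_date
        ORDER BY pr.ItemNo""",
    # Correlated subquery: keep rows whose date is the item's latest date.
    "correlated": """
        SELECT p.* FROM prices p
        WHERE p.Date = (SELECT MAX(p1.Date) FROM prices p1
                        WHERE p1.ItemNo = p.ItemNo)
        ORDER BY p.ItemNo""",
    # Window function: rank rows per item by date, newest first.
    "window": """
        SELECT id, ItemNo, Price, Date
        FROM (SELECT p.*, RANK() OVER (PARTITION BY ItemNo
                                       ORDER BY Date DESC) AS rn
              FROM prices p)
        WHERE rn = 1
        ORDER BY ItemNo""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return ids 10, 6 and 9 on this data. Note that they also behave the same on ties: if an item had two rows with the same latest date, each variant would return both.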
I have been searching the forum and found a single post that is a little similar to my problem: Calculate average for Top n combined with SQL Group By.
My situation is:
I have a table tblWEIGHT that contains: ID, Date, idPONR, Weight
I have a second table tblSALES that contains: ID, Date, Sales, idPONR
I have a third table tblPONR that contains: ID, PONR, idProduct
And a fourth table tblPRODUCT that contains: ID, Product
The linking:
tblWEIGHT.idPONR = tblPONR.ID
tblSALES.idPONR = tblPONR.ID
tblPONR.idProduct = tblPRODUCT.ID
The main table of my query is tblSALES. I want all my sales listed, with the moving average of the top 5 weights of the PRODUCT where the date of the weight is earlier than the sales date and the product is the same as the sold product. It's IMPORTANT that the result isn't grouped by date. I need all the records of tblSALES.
I have gotten as far as getting the top 1 weight, but I'm not able to get the moving average instead.
The query that gets the top 1 is the following, and I am guessing that the query I need is going to look a lot like it.
SELECT tblSALES.ID, tblSALES.Date, tblPONR.idPRODUCT,
(
SELECT top 1 Weight FROM tblWEIGHT INNER JOIN tblPONR AS P2 ON tblWEIGHT.idPONR = P2.ID
WHERE P2.idPRODUCT = tblPONR.idPRODUCT AND
tblSALES.Date > tblWEIGHT.Date
ORDER BY tblWEIGHT.Date desc
) AS LatestWeight
FROM tblSALES INNER JOIN tblPONR ON tblSALES.idPONR = tblPONR.ID
This is not my exact query, since I'm Danish and the original names wouldn't make sense here. I know I'm not supposed to use Date as a field name.
I imagine the final query would be something like:
SELECT tblSALES.ID..... avg(SELECT TOP 5 weight .........)
but when I do this I keep getting an error saying that at most 1 record can be returned by this subquery.
Final question:
How do I make a query that creates a moving average of the top 5 weights of my sold product, where the date of the weight is earlier than the date I sold the product?
EDIT Sampledata:
DATEFORMAT: dd/mm/yyyy
tblWEIGHT
ID Date idPONR Weight
1 01-01-2020 1 100
2 02-01-2020 2 200
3 03-01-2020 3 200
4 04-01-2020 3 400
5 05-01-2020 2 250
6 06-01-2020 1 150
7 07-01-2020 2 200
tblSALES
ID Date Sales(amt) idPONR
1 05-01-2020 30 1
2 06-01-2020 15 2
3 10-01-2020 20 3
tblPONR
ID PONR(production Number) idProduct
1 2521 1
2 1548 1
3 5484 2
tblPRODUCT
ID Product
1 Bricks
2 Tiles
Desired outcome (read the comments for AvgWeight):
tblSALES.ID tblSALES.Date tblSales.Sales(amt) AvgWeight
1 05-01-2020 30 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<05-01-2020)
2 06-01-2020 15 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<06-01-2020)
3 10-01-2020 20 123 -->avg(top 5 newest weight of idPONR 3 since thats the only idPONR with that product, and where tblWeight.Date<10-01-2020)
Consider:
Query1
SELECT tblWeight.ID AS WeightID, tblWeight.Date AS WtDate,
tblWeight.idPONR, tblPONR.PONR, tblPONR.idProduct, tblWeight.Weight, tblSales.SalesAmt,
tblSales.ID AS SalesID, tblSales.Date AS SalesDate
FROM (tblPONR INNER JOIN tblWeight ON tblPONR.ID = tblWeight.idPONR)
INNER JOIN tblSales ON tblPONR.ID = tblSales.idPONR;
Query2
SELECT * FROM Query1 WHERE WeightID IN (
SELECT TOP 5 WeightID FROM Query1 AS Dupe WHERE Dupe.idProduct = Query1.idProduct
AND Dupe.WtDate<Query1.SalesDate ORDER BY Dupe.WtDate DESC);
Query3
SELECT Query2.SalesID, Query2.SalesDate, Query2.SalesAmt,
First(DAvg("Weight","Query2","idProduct=" & [idProduct] & " AND WtDate<#" & [SalesDate] & "#")) AS AvgWt
FROM Query2
GROUP BY Query2.SalesID, Query2.SalesDate, Query2.SalesAmt;
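For comparison, the same requirement can also be written as a single correlated subquery. The sketch below is not Access SQL: it uses Python's sqlite3 module, so TOP 5 becomes LIMIT 5, and the dates from the sample data are stored as ISO strings so that they compare correctly. The aliases s, ps, w, pw and recent are names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, with dates rewritten as yyyy-mm-dd.
conn.executescript("""
CREATE TABLE tblWEIGHT (ID INTEGER, Date TEXT, idPONR INTEGER, Weight REAL);
CREATE TABLE tblSALES (ID INTEGER, Date TEXT, Sales INTEGER, idPONR INTEGER);
CREATE TABLE tblPONR (ID INTEGER, PONR INTEGER, idProduct INTEGER);
INSERT INTO tblWEIGHT VALUES
    (1, '2020-01-01', 1, 100), (2, '2020-01-02', 2, 200),
    (3, '2020-01-03', 3, 200), (4, '2020-01-04', 3, 400),
    (5, '2020-01-05', 2, 250), (6, '2020-01-06', 1, 150),
    (7, '2020-01-07', 2, 200);
INSERT INTO tblSALES VALUES
    (1, '2020-01-05', 30, 1), (2, '2020-01-06', 15, 2), (3, '2020-01-10', 20, 3);
INSERT INTO tblPONR VALUES (1, 2521, 1), (2, 1548, 1), (3, 5484, 2);
""")

# For every sale: take the (up to) 5 newest weights of the same product that
# are dated before the sale, then average them. One output row per sale.
avg_rows = conn.execute("""
SELECT s.ID, s.Date, s.Sales,
       (SELECT AVG(recent.Weight)
        FROM (SELECT w.Weight
              FROM tblWEIGHT AS w
              INNER JOIN tblPONR AS pw ON pw.ID = w.idPONR
              WHERE pw.idProduct = ps.idProduct
                AND w.Date < s.Date
              ORDER BY w.Date DESC
              LIMIT 5) AS recent) AS AvgWeight
FROM tblSALES AS s
INNER JOIN tblPONR AS ps ON ps.ID = s.idPONR
ORDER BY s.ID
""").fetchall()

for row in avg_rows:
    print(row)
```

On the sample data, sale 1 averages weights 100 and 200, sale 2 averages 100, 200 and 250, and sale 3 averages 200 and 400. No product has more than five earlier weights here, so the limit of 5 does not yet drop anything.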
I have two queries that work perfectly.
SELECT fy.date_stop as pend
FROM account_fiscalyear fy
WHERE <any date> BETWEEN fy.date_start AND fy.date_stop
This returns the last date of the fiscal year in which <any date> falls.
and
SELECT a.id as id, COALESCE(MAX(l.date),a.purchase_date) AS date
FROM account_asset_asset a
LEFT JOIN account_move_line l ON (l.asset_id = a.id)
WHERE a.id <some condition>
GROUP BY a.id, a.purchase_date
This returns results similar to the following, giving the asset id and the purchase date or last depreciation date for the asset.
61 2014-09-01
96 2014-09-01
115 2015-02-25
181 2015-11-27
122 2015-04-03
87 2014-09-01
67 2014-09-01
207 2016-09-09
54 2014-09-01
159 2015-08-25
163 2015-08-19
....
The result I want is the asset id but this time with the last day of the financial year that the purchase date or last depreciation date can be found in. I just don't seem to be able to find a way to combine the two queries.
Solved it.
SELECT a.id as id, COALESCE(MAX(l.date), a.purchase_date) as date
FROM
(SELECT ass.id as id, fy.date_stop as purchase_date
FROM account_fiscalyear fy, account_asset_asset ass
WHERE ass.purchase_date BETWEEN fy.date_start AND fy.date_stop) a
LEFT JOIN
(SELECT mvl.asset_id as asset_id, fy.date_stop as date
FROM account_move_line mvl, account_period per, account_fiscalyear fy
WHERE mvl.period_id = per.id AND per.fiscalyear_id = fy.id) l
ON (l.asset_id = a.id)
GROUP BY a.id, a.purchase_date
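An alternative way to combine the two original queries, sketched here with Python's sqlite3 module, is to compute the last date per asset first and then look up the fiscal year containing it with a BETWEEN join. This is not the query above: it skips account_period, the last_date CTE is a name introduced for this example, and the sample rows are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only the table and column names come from the post.
conn.executescript("""
CREATE TABLE account_fiscalyear (id INTEGER, date_start TEXT, date_stop TEXT);
CREATE TABLE account_asset_asset (id INTEGER, purchase_date TEXT);
CREATE TABLE account_move_line (asset_id INTEGER, date TEXT);
INSERT INTO account_fiscalyear VALUES
    (1, '2014-07-01', '2015-06-30'), (2, '2015-07-01', '2016-06-30');
INSERT INTO account_asset_asset VALUES (61, '2014-09-01'), (115, '2015-02-25');
INSERT INTO account_move_line VALUES (115, '2015-03-31'), (115, '2015-08-31');
""")

# Step 1 (CTE): second query from the question, last depreciation date or,
# if there is none, the purchase date.
# Step 2: first query from the question, applied per asset via a BETWEEN join.
year_ends = conn.execute("""
WITH last_date AS (
    SELECT a.id AS id, COALESCE(MAX(l.date), a.purchase_date) AS d
    FROM account_asset_asset a
    LEFT JOIN account_move_line l ON l.asset_id = a.id
    GROUP BY a.id, a.purchase_date
)
SELECT ld.id, fy.date_stop AS pend
FROM last_date ld
INNER JOIN account_fiscalyear fy
    ON ld.d BETWEEN fy.date_start AND fy.date_stop
ORDER BY ld.id
""").fetchall()

print(year_ends)  # [(61, '2015-06-30'), (115, '2016-06-30')]
```

The inner join drops any asset whose last date falls outside every fiscal year; use a LEFT JOIN instead if those assets should still appear with a null year end.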