SQL combining a left join with a where between

I have two queries that work perfectly.
SELECT fy.date_stop as pend
FROM account_fiscalyear fy
WHERE <any date> BETWEEN fy.date_start AND fy.date_stop
This returns the last date of the fiscal year in which the given date falls.
and
SELECT a.id as id, COALESCE(MAX(l.date),a.purchase_date) AS date
FROM account_asset_asset a
LEFT JOIN account_move_line l ON (l.asset_id = a.id)
WHERE a.id <some condition>
GROUP BY a.id, a.purchase_date
This returns results similar to the following, giving the asset id and either the purchase date or the last depreciation date for the asset.
61 2014-09-01
96 2014-09-01
115 2015-02-25
181 2015-11-27
122 2015-04-03
87 2014-09-01
67 2014-09-01
207 2016-09-09
54 2014-09-01
159 2015-08-25
163 2015-08-19
....
The result I want is the asset id but this time with the last day of the financial year that the purchase date or last depreciation date can be found in. I just don't seem to be able to find a way to combine the two queries.
Solved it.
SELECT a.id as id, COALESCE(MAX(l.date), a.purchase_date) as date
FROM
(SELECT ass.id as id, fy.date_stop as purchase_date
FROM account_fiscalyear fy, account_asset_asset ass
WHERE ass.purchase_date BETWEEN fy.date_start AND fy.date_stop) a
LEFT JOIN
(SELECT mvl.asset_id as asset_id, fy.date_stop as date
FROM account_move_line mvl, account_period per, account_fiscalyear fy
WHERE mvl.period_id = per.id AND per.fiscalyear_id = fy.id) l
ON (l.asset_id = a.id)
GROUP BY a.id, a.purchase_date
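Another way to combine the two original queries is to keep the second one as a derived table and join it to account_fiscalyear on the BETWEEN condition. The sketch below is only an illustration: it runs against an in-memory SQLite database from Python, the sample rows are made up, and the fiscal years are assumed to match calendar years. The asker's actual database engine is not stated in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_fiscalyear (id INTEGER PRIMARY KEY, date_start TEXT, date_stop TEXT);
CREATE TABLE account_asset_asset (id INTEGER PRIMARY KEY, purchase_date TEXT);
CREATE TABLE account_move_line (id INTEGER PRIMARY KEY, asset_id INTEGER, date TEXT);

-- Made-up sample data; fiscal years are assumed to equal calendar years.
INSERT INTO account_fiscalyear VALUES
  (1, '2014-01-01', '2014-12-31'),
  (2, '2015-01-01', '2015-12-31'),
  (3, '2016-01-01', '2016-12-31');
INSERT INTO account_asset_asset VALUES (61, '2014-09-01'), (115, '2015-02-25');
INSERT INTO account_move_line VALUES (1, 61, '2015-03-31'), (2, 61, '2016-03-31');
""")

query = """
SELECT d.id, fy.date_stop AS pend
FROM (
    -- The second query from the question, unchanged apart from the WHERE clause.
    SELECT a.id AS id, COALESCE(MAX(l.date), a.purchase_date) AS date
    FROM account_asset_asset a
    LEFT JOIN account_move_line l ON l.asset_id = a.id
    GROUP BY a.id, a.purchase_date
) d
-- The first query from the question, turned into a join condition.
JOIN account_fiscalyear fy ON d.date BETWEEN fy.date_start AND fy.date_stop
ORDER BY d.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(61, '2016-12-31'), (115, '2015-12-31')]
```

Because this is an inner join, an asset whose date falls outside every fiscal year would disappear from the result; a LEFT JOIN would keep it with a NULL end date.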

Related

SQL join type to have as many rows as each date for each customer

I have these two tables
date
2017-1
2017-2
2017-3
2017-4
2017-5
2017-6
and
date    customer  no_orders  city_code
2017-1  156       1          DNZ
2017-3  156       5          LON
2017-5  156       4          DNZ
2017-6  156       2          YQB
How can I join these two tables to have one row for each customer for all the dates same as below?
If on a date, the customer has no order, its no_order should be 0 and its city_code should be the city_code of the previous date.
date    customer  no_orders  city_code_2
2017-1  156       1          DNZ
2017-2  156       0          DNZ
2017-3  156       5          LON
2017-4  156       0          LON
2017-5  156       4          DNZ
2017-6  156       2          YQB
This code by @Tim Biegeleisen resolved part 1 of my question, but now I want to handle both parts together.
SELECT d.date, c.customer, COALESCE(t.no_orders, 0) AS no_orders
FROM dates d
CROSS JOIN (SELECT DISTINCT customer FROM customers) c
LEFT JOIN customers t
ON t.date = d.date AND
t.customer = c.customer
ORDER BY c.customer, d.date;
We can use the following calendar table approach:
SELECT d.date, c.customer, COALESCE(t.no_orders, 0) AS no_orders
FROM dates d
CROSS JOIN (SELECT DISTINCT customer FROM customers) c
LEFT JOIN customers t
ON t.date = d.date AND
t.customer = c.customer
ORDER BY c.customer, d.date;
This assumes that the first table is called dates and the second table customers. The query works by using a cross join to generate a set of all dates and customers. We then left join to the second table to bring in the number of orders for a given customer on a given day. Absent number of orders are reported as zero.
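The query above covers the zero-filled no_orders but not the carried-forward city_code. One possible way to add that part is a correlated subquery that picks the customer's most recent city_code on or before each date. The sketch below runs the idea on an in-memory SQLite database from Python, using the sample rows from the question; the table names dates and customers follow the answer, and the engine choice is an assumption made only so the example is runnable. It compares the date strings lexically, which works for '2017-1' to '2017-6' but would break for two-digit months unless the values are zero-padded or stored as real dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (date TEXT);
CREATE TABLE customers (date TEXT, customer INTEGER, no_orders INTEGER, city_code TEXT);
INSERT INTO dates VALUES ('2017-1'), ('2017-2'), ('2017-3'), ('2017-4'), ('2017-5'), ('2017-6');
INSERT INTO customers VALUES
  ('2017-1', 156, 1, 'DNZ'),
  ('2017-3', 156, 5, 'LON'),
  ('2017-5', 156, 4, 'DNZ'),
  ('2017-6', 156, 2, 'YQB');
""")

query = """
SELECT d.date,
       c.customer,
       COALESCE(t.no_orders, 0) AS no_orders,
       -- Latest city_code for this customer on or before the current date.
       (SELECT p.city_code
        FROM customers p
        WHERE p.customer = c.customer AND p.date <= d.date
        ORDER BY p.date DESC
        LIMIT 1) AS city_code_2
FROM dates d
CROSS JOIN (SELECT DISTINCT customer FROM customers) c
LEFT JOIN customers t ON t.date = d.date AND t.customer = c.customer
ORDER BY c.customer, d.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this yields DNZ for 2017-2 and LON for 2017-4, matching the desired output in the question. Dates before a customer's first order would get a NULL city_code.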

SQL sum values for each ID

I have a dataset about trains. It includes a table with customer information: a number representing an age group and the number of travellers in that age group.
The ID represents a location, which has multiple departure times, each of which has multiple age groups.
The data looks something like this
StationID  Time of Departure  TravellerID  Amount of travellers
1          12:13              4001         30
1          12:13              4002         15
1          19:45              4001         10
1          19:45              4002         20
I want to sum the amount of travellers for each departure.
I tried to code it this way:
SELECT StationID,[Time of Departure], sum(Amount)
FROM Train_Stations AS TS
INNER JOIN DepartureData AS DD
ON DD.FK_StationID = TS.PK_StationID
INNER JOIN CustomerInfo AS CI
ON CI.FK_StationID = TS.PK_StationID
GROUP BY StationID, [Time of Departure]
The result is like this:
StationID  Time of Departure  Amount
1          12:13              75
1          12:13              75
1          19:45              75
1          19:45              75
But I want it like this:
StationID  Time of Departure  Amount
1          12:13              45
1          19:45              30
It seems you are doing something different. Based on your data, the query is correct:
WITH CTE(StationID,DEPARTURE_TIME,TRAVELLERID,AMOUNT_OF_TRAVELLERS) AS
(
SELECT 1,CAST('12:13'AS TIME),4001,30 UNION ALL
SELECT 1,CAST('12:13'AS TIME),4002,15 UNION ALL
SELECT 1,CAST('19:45'AS TIME),4001,10 UNION ALL
SELECT 1,CAST('19:45'AS TIME),4002,20
)
SELECT C.StationID,C.DEPARTURE_TIME,SUM(AMOUNT_OF_TRAVELLERS)TOTAL_TRAVELLERS
FROM CTE AS C
GROUP BY C.StationID,C.DEPARTURE_TIME
You should qualify the column as DD.StationID. The query will then return the expected result.
SELECT DD.StationID,DD.[Time of Departure], sum(DD.Amount)
FROM Train_Stations AS TS
INNER JOIN DepartureData AS DD
ON DD.FK_StationID = TS.PK_StationID
INNER JOIN CustomerInfo AS CI
ON CI.FK_StationID = TS.PK_StationID
GROUP BY DD.StationID, DD.[Time of Departure]
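A total of 75 on every row is the sum of all four sample amounts, which is what one would see if each departure were matched with every traveller row of the station. The post does not show the real table layouts, so the sketch below uses a hypothetical schema (the DepartureTime column and the shape of both tables are assumptions) on an in-memory SQLite database, only to illustrate how joining on the station alone inflates the sums and how adding the departure to the join condition avoids it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: the question does not show the real columns.
CREATE TABLE DepartureData (FK_StationID INTEGER, DepartureTime TEXT);
CREATE TABLE CustomerInfo (FK_StationID INTEGER, DepartureTime TEXT,
                           TravellerID INTEGER, Amount INTEGER);
INSERT INTO DepartureData VALUES (1, '12:13'), (1, '19:45');
INSERT INTO CustomerInfo VALUES
  (1, '12:13', 4001, 30), (1, '12:13', 4002, 15),
  (1, '19:45', 4001, 10), (1, '19:45', 4002, 20);
""")

# Joining on the station alone pairs every departure with every traveller row.
station_only = conn.execute("""
SELECT DD.FK_StationID, DD.DepartureTime, SUM(CI.Amount)
FROM DepartureData DD
JOIN CustomerInfo CI ON CI.FK_StationID = DD.FK_StationID
GROUP BY DD.FK_StationID, DD.DepartureTime
ORDER BY DD.DepartureTime
""").fetchall()

# Also matching on the departure keeps each traveller row with its own departure.
per_departure = conn.execute("""
SELECT DD.FK_StationID, DD.DepartureTime, SUM(CI.Amount)
FROM DepartureData DD
JOIN CustomerInfo CI
  ON CI.FK_StationID = DD.FK_StationID
 AND CI.DepartureTime = DD.DepartureTime
GROUP BY DD.FK_StationID, DD.DepartureTime
ORDER BY DD.DepartureTime
""").fetchall()

print(station_only)   # [(1, '12:13', 75), (1, '19:45', 75)]
print(per_departure)  # [(1, '12:13', 45), (1, '19:45', 30)]
```

Whether this is the actual cause depends on how CustomerInfo relates to DepartureData in the real database, which the question leaves open.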

T-SQL get values for specific group

I have a table EmployeeContract similar to this:
ContractId  EmployeeId  ValidFrom   ValidTo     Salary
12          5           2018-02-01  2019-06-31  x
25          8           2015-01-01  2099-12-31  x
50          5           2019-07-01  2021-05-31  x
52          6           2011-08-01  2021-12-31  x
72          8           2010-08-01  2014-12-31  x
52          6           2011-08-01  2021-12-31  x
The table contains the contract history of each employee in the company. I need to get the date each employee started work and the last date of their contract. Sometimes records are duplicated.
For example, based on data from above:
EmployeeId  ValidFrom   ValidTo
5           2018-02-01  2021-05-31
8           2010-08-01  2099-12-31
6           2011-08-01  2021-12-31
Based on this article: https://www.techcoil.com/blog/sql-statement-for-selecting-the-latest-record-in-each-group/
I prepared query like this:
select minv.*, maxv.maxvalidto from
(select distinct con.[EmployeeId], mvt.maxvalidto
from [EmployeeContract] con
join (select [EmployeeId], max(validto) as maxvalidto
FROM [EmployeeContract]
group by [EmployeeId]) mvt
on con.[EmployeeId] = mvt.[EmployeeId] and mvt.maxvalidto = con.validto) maxv
join
(select distinct con.[EmployeeId], mvf.minvalidfrom
from [EmployeeContract] con
join (select [EmployeeId], min(validfrom) as minvalidfrom
FROM [EmployeeContract]
group by [EmployeeId]) mvf
on con.[EmployeeId] = mvf.[EmployeeId] and mvf.minvalidfrom = con.validfrom) minv
on minv.[EmployeeId] = maxv.[EmployeeId]
order by 1
But I'm not satisfied: I think it's hard to read and probably poorly optimized. How can I do it better?
I think you want group by:
select employeeid, min(validfrom), max(validto)
from employeecontract
group by employeeid
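To check the GROUP BY approach against the sample rows from the question, here is a small sketch that loads them into an in-memory SQLite database from Python. SQLite stands in for SQL Server only so the example is runnable; the dates are stored as ISO-formatted text, which sorts the same way as real dates, and the Salary column is omitted because it plays no role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeContract "
    "(ContractId INTEGER, EmployeeId INTEGER, ValidFrom TEXT, ValidTo TEXT)"
)
# Sample rows copied from the question, including the duplicated contract 52.
conn.executemany(
    "INSERT INTO EmployeeContract VALUES (?, ?, ?, ?)",
    [
        (12, 5, "2018-02-01", "2019-06-31"),
        (25, 8, "2015-01-01", "2099-12-31"),
        (50, 5, "2019-07-01", "2021-05-31"),
        (52, 6, "2011-08-01", "2021-12-31"),
        (72, 8, "2010-08-01", "2014-12-31"),
        (52, 6, "2011-08-01", "2021-12-31"),
    ],
)

rows = conn.execute("""
SELECT EmployeeId, MIN(ValidFrom) AS ValidFrom, MAX(ValidTo) AS ValidTo
FROM EmployeeContract
GROUP BY EmployeeId
ORDER BY EmployeeId
""").fetchall()
for row in rows:
    print(row)
```

The duplicated row has no effect here, since MIN and MAX ignore repetition, so no DISTINCT is needed.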

SQL Issue with Group By's . Choose a PK based on condition

I want to display only the most recent AppointmentId per CustomerId. I know I'm doing the query all wrong, but how can I achieve the desired result?
SELECT TEMP.AppointmentId,TEMP.AppointmentDateTime,TEMP.CustomerId,MAX(TEMP.AppointmentDateTime)
FROM
(SELECT
Appointment.Id AS AppointmentId,
Appointment.DateTime AS AppointmentDateTime,
Customer.Id AS CustomerId
FROM Customer
INNER JOIN Appointment ON Appointment.CustomerId = Customer.Id
INNER JOIN CustomerUser ON CustomerUser.CustomerId = Customer.Id
WHERE Appointment.UpdatedAt >= @StartDate AND Appointment.UpdatedAt <= @EndDate
AND Appointment.Is_Active = 1
AND Customer.Is_Active = 1
AND CustomerUser.Is_Active = 1
) AS TEMP
GROUP BY TEMP.AppointmentId,TEMP.AppointmentDateTime,TEMP.CustomerId
This is the result set that I have
AppointmentId AppointmentDateTime CustomerId
8909 2020-12-24 13:39:00 98
8931 2020-12-18 10:30:00 26
8932 2020-12-17 14:30:00 26
8933 2020-11-06 15:30:00 26
8934 2020-12-30 17:31:00 153
8936 2020-12-21 11:06:00 180
8938 2020-12-25 23:00:00 153
8943 2020-12-21 17:45:00 188
9046 2020-12-30 13:49:00 98
But this is the Expected result
AppointmentId AppointmentDateTime CustomerId
8931 2020-12-18 10:30:00 26
8934 2020-12-30 17:31:00 153
8936 2020-12-21 11:06:00 180
8943 2020-12-21 17:45:00 188
9046 2020-12-30 13:49:00 98
I want to display only the most recent AppointmentId of a customer, based on the AppointmentDateTime.
One method is a correlated subquery:
select a.*
from Appointment a
where a.AppointmentDateTime = (select max(a2.AppointmentDateTime)
from Appointment a2
where a2.customerid = a.customerid
);
Your query is considerably more complicated than this, including tables and conditions not explained in the question. However, this answers the question that you have asked.
You made it too complicated. You only need one query and you need to remove your aggregated field from the group by statement.
SELECT
MAX(Appointment.Id) AS MaxAppointmentId,
MAX(Appointment.DateTime) AS AppointmentDateTime,
Customer.Id AS CustomerId
FROM Customer
INNER JOIN Appointment ON Appointment.CustomerId = Customer.Id
INNER JOIN CustomerUser ON CustomerUser.CustomerId = Customer.Id
WHERE Appointment.UpdatedAt >= @StartDate AND Appointment.UpdatedAt <= @EndDate
AND Appointment.Is_Active = 1
AND Customer.Is_Active = 1
AND CustomerUser.Is_Active = 1
GROUP BY Customer.Id
I think this will do what you want.
select *
from
( select *, row_number() over (partition by customerid order by AppointmentDateTime desc) as rn
from expandtable ) as t
where rn = 1
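The row_number() approach can be tried on the result set from the question. The sketch below loads those nine rows into a flat, hypothetical Appointment table in an in-memory SQLite database (window functions need SQLite 3.25 or later) and also computes MAX(Id) per customer for comparison. The flat table is an assumption; the real query joins several tables and filters on dates and active flags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointment "
    "(AppointmentId INTEGER, AppointmentDateTime TEXT, CustomerId INTEGER)"
)
# Rows copied from the result set shown in the question.
conn.executemany(
    "INSERT INTO Appointment VALUES (?, ?, ?)",
    [
        (8909, "2020-12-24 13:39:00", 98),
        (8931, "2020-12-18 10:30:00", 26),
        (8932, "2020-12-17 14:30:00", 26),
        (8933, "2020-11-06 15:30:00", 26),
        (8934, "2020-12-30 17:31:00", 153),
        (8936, "2020-12-21 11:06:00", 180),
        (8938, "2020-12-25 23:00:00", 153),
        (8943, "2020-12-21 17:45:00", 188),
        (9046, "2020-12-30 13:49:00", 98),
    ],
)

# Keep the newest appointment per customer by ranking on the date.
rows = conn.execute("""
SELECT AppointmentId, AppointmentDateTime, CustomerId
FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY CustomerId ORDER BY AppointmentDateTime DESC
    ) AS rn
    FROM Appointment
) t
WHERE rn = 1
ORDER BY CustomerId
""").fetchall()

# For comparison: the highest Id per customer, which need not be the latest date.
max_ids = dict(conn.execute(
    "SELECT CustomerId, MAX(AppointmentId) FROM Appointment GROUP BY CustomerId"
).fetchall())

for row in rows:
    print(row)
print(max_ids)
```

On this data the ranked query returns the five rows of the expected result, whereas taking MAX(Id) and MAX(DateTime) independently would pair customer 26 with Id 8933 and customer 153 with Id 8938, neither of which is that customer's latest appointment.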

30-day rolling/moving sum when current date is missing

I have a table (view_of_referred_events) which stores the number of visitors for a given page.
date country_id referral product_id visitors
2016-04-01 216 pl 113759 1
2016-04-03 216 pl 113759 1
2016-04-06 216 pl 113759 13
2016-04-07 216 pl 113759 10
I want to compute the 30-day rolling/moving sum for this product, even for those days which are missing. So the end result should be something like the following:
date country_id referral product_id cumulative_visitors
2016-04-01 216 pl 113759 1
2016-04-02 216 pl 113759 1
2016-04-03 216 pl 113759 2
2016-04-04 216 pl 113759 2
2016-04-05 216 pl 113759 2
2016-04-06 216 pl 113759 15
2016-04-07 216 pl 113759 25
Now, this is a simplified representation, because I have tens of different country_id, referral and product_id values. I can't pre-create a table with all possible combinations of {date, country_id, referral, product_id} because it would become intractable given the size of the table. I also don't want a row in the final table if that specific {date, country_id, referral, product_id} combination didn't exist before.
I was thinking if there was an easy way to tell Impala to use the value of the previous row (the previous day) if in view_of_referred_events there are no visitors for that day.
I wrote this query, where list_of_dates is a table with a list of days from April 1st to April 7th.
select
t.`date`,
t.country_id,
t.referral,
t.product_id,
sum(visitors) over (partition by t.country_id, t.referral, t.product_id order by t.`date`
rows between 30 preceding and current row) as cumulative_sum_visitors
from (
select
d.`date`,
re.country_id,
re.referral,
re.product_id,
sum(visitors) as visitors
from list_of_dates d
left outer join view_of_referred_events re on d.`date` = re.`date`
and re.referral = "pl"
and re.product_id = "113759"
and re.country_id = "216"
group by d.`date`, re.country_id, re.referral, re.product_id
) t
order by t.`date` asc;
This returns something similar to what I want, but not exactly that.
date country_id referral product_id cumulative_visitors
2016-04-01 216 pl 113759 1
2016-04-02 NULL NULL NULL NULL
2016-04-03 216 pl 113759 2
2016-04-04 NULL NULL NULL NULL
2016-04-05 NULL NULL NULL NULL
2016-04-06 216 pl 113759 15
2016-04-07 216 pl 113759 25
I have added another subquery to get the value from the last row in the partition. I am not sure which version of Hive/Impala you are using; the syntax is last_value(column_name, ignore null values true/false).
I assume you are trying to find the cumulative counts over 30 days (a month), so I recommend using a month field to group the rows. The month could come either from your dimension table list_of_dates or from substr(date, 1, 7); you can then get the cumulative count of visitors over rows between unbounded preceding and current row.
query:
select
`date`,
country_id,
referral,
product_id,
sum(visitors) over (partition by country_id, referral, product_id order by `date`
rows between 30 preceding and current row) as cumulative_sum_visitors
from (select
t.`date`,
-- get the last not null value from the partition window w for country_id, referral & product_id
last_value(t.country_id, true) over w as country_id,
last_value(t.referral, true) over w as referral,
last_value(t.product_id, true) over w as product_id,
if(visitors is null, 0, visitors) as visitors
from (
select
d.`date`,
re.country_id,
re.referral,
re.product_id,
sum(visitors) as visitors
from list_of_dates d
left outer join view_of_referred_events re on d.`date` = re.`date`
and re.referral = "pl"
and re.product_id = "113759"
and re.country_id = "216"
group by d.`date`, re.country_id, re.referral, re.product_id
) t
window w as (partition by t.country_id, t.referral, t.product_id order by t.`date`
rows between unbounded preceding and unbounded following)) t1
order by `date` asc;
I'm not sure how good the performance will be, but you can do this by aggregating the data twice, adding 30 days for the second aggregation and negating the count.
Something like this:
with t as (
select d.`date`, re.country_id, re.referral, re.product_id,
sum(visitors) as visitors
from list_of_dates d left outer join
view_of_referred_events re
on d.`date` = re.`date` and
re.referral = 'pl' and
re.product_id = 113759 and
re.country_id = 216
group by d.`date`, re.country_id, re.referral, re.product_id
)
select date, country_id, referral, product_id,
sum(sum(visitors)) over (partition by country_id, referral, product_id order by date) as visitors
from ((select date, country_id, referral, product_id, visitors
from t
) union all
(select date_add(date, 30), country_id, referral, product_id, -visitors
from t
)
) tt
group by date, country_id, referral, product_id;
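A different way to avoid the NULL rows is to build the date grid per combination on the fly, starting each combination at its first observed date, then zero-fill the missing days and apply the window sum. The sketch below is an illustration on an in-memory SQLite database (3.25 or later for window functions), not Impala, so syntax and performance there are untested assumptions. The CTE names keys and dense are made up for the example. A 30-day window that includes the current day corresponds to 29 preceding rows once every day is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_of_dates (date TEXT);
CREATE TABLE view_of_referred_events
  (date TEXT, country_id INTEGER, referral TEXT, product_id INTEGER, visitors INTEGER);
INSERT INTO list_of_dates VALUES
  ('2016-04-01'), ('2016-04-02'), ('2016-04-03'), ('2016-04-04'),
  ('2016-04-05'), ('2016-04-06'), ('2016-04-07');
INSERT INTO view_of_referred_events VALUES
  ('2016-04-01', 216, 'pl', 113759, 1),
  ('2016-04-03', 216, 'pl', 113759, 1),
  ('2016-04-06', 216, 'pl', 113759, 13),
  ('2016-04-07', 216, 'pl', 113759, 10);
""")

query = """
WITH keys AS (
    -- One row per combination, with the first day it was ever seen.
    SELECT country_id, referral, product_id, MIN(date) AS first_date
    FROM view_of_referred_events
    GROUP BY country_id, referral, product_id
),
dense AS (
    -- Every calendar day from first_date onward, with missing days as 0.
    SELECT d.date, k.country_id, k.referral, k.product_id,
           COALESCE(SUM(re.visitors), 0) AS visitors
    FROM keys k
    JOIN list_of_dates d ON d.date >= k.first_date
    LEFT JOIN view_of_referred_events re
      ON re.date = d.date
     AND re.country_id = k.country_id
     AND re.referral = k.referral
     AND re.product_id = k.product_id
    GROUP BY d.date, k.country_id, k.referral, k.product_id
)
SELECT date, country_id, referral, product_id,
       SUM(visitors) OVER (
           PARTITION BY country_id, referral, product_id
           ORDER BY date
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
       ) AS cumulative_visitors
FROM dense
ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data the last column comes out as 1, 1, 2, 2, 2, 15, 25, matching the desired result in the question. The grid only grows with the number of combinations that actually occur, but it is still materialized for every day after a combination first appears, so the cost on a large table would need to be measured.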