I have a table with the order lines which show the Booking Amount and the booked date, but the revenue is recognised over 3 months (so 1/3 in the booked month and a further 1/3 in each of the next 2 months).
I need to create a query that would show the total revenue recognised in each month.
Is there an analytic function that could work this out? At the moment I have cobbled together 3 joined queries that give the numbers, but in 3 separate columns, whereas I need them in one column:
select TRUNC(OM.BOOKING_DATE, 'MONTH') as Month
, SUM(OM.BOOKED_VALUE)/3 as Month_1
, M2.Month_2
, M3.Month_3
from ORDERS.OM,
(select TRUNC(ADD_MONTHS(OM.BOOKING_DATE,1), 'MONTH') as Month
, SUM(OM.BOOKED_VALUE)/3 as Month_2
from ORDERS.OM
GROUP By TRUNC(ADD_MONTHS(OM.BOOKING_DATE,1), 'MONTH')) M2,
(select TRUNC(ADD_MONTHS(OM.BOOKING_DATE,2), 'MONTH') as Month
, SUM(OM.BOOKED_VALUE)/3 as Month_3
from ORDERS.OM
GROUP By TRUNC(ADD_MONTHS(OM.BOOKING_DATE,2), 'MONTH')) M3
WHERE TRUNC(OM.BOOKING_DATE, 'MONTH') = M2.MONTH
AND TRUNC(OM.BOOKING_DATE, 'MONTH') = M3.MONTH
GROUP By TRUNC(OM.BOOKING_DATE, 'MONTH'), M2.Month_2, M3.Month_3
Order by 1 DESC
Triple every row and sum
select t.Month, SUM(t.Val) as Value
from ORDERS.OM
cross join lateral (select TRUNC(OM.BOOKING_DATE, 'MONTH') as Month, OM.BOOKED_VALUE/3.0 as Val from dual union all
select TRUNC(ADD_MONTHS(OM.BOOKING_DATE,1), 'MONTH'), OM.BOOKED_VALUE/3.0 from dual union all
select TRUNC(ADD_MONTHS(OM.BOOKING_DATE,2), 'MONTH'), OM.BOOKED_VALUE/3.0 from dual ) t
group by t.Month
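To sanity-check the "triple every row and sum" idea outside the database, here is a minimal Python sketch of the same arithmetic. The sample orders, the `add_months` helper and the `recognised_revenue` function are hypothetical names made up for illustration; they are not part of the original query.

```python
from collections import defaultdict
from datetime import date


def add_months(d, n):
    """Return the first day of the month that is n months after d's month."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def recognised_revenue(order_lines, periods=3):
    """Spread each booking evenly over `periods` months, then total per month."""
    totals = defaultdict(float)
    for booking_date, booked_value in order_lines:
        # Each order line contributes one "copy" per recognition month.
        for offset in range(periods):
            totals[add_months(booking_date, offset)] += booked_value / periods
    return dict(sorted(totals.items()))


# Hypothetical sample data: (booking date, booked value)
orders = [(date(2019, 11, 15), 300.0), (date(2019, 12, 3), 600.0)]
revenue = recognised_revenue(orders)
for month, value in revenue.items():
    print(month, value)
```

Each order line shows up in three months, which is what the three `union all` branches do for every row in the SQL version.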
I'm trying to fill missing months in a SELECT query.
It looks like this :
SELECT sl.loonperiode_dt, (sum(slr.uren)) code_220
FROM HR.soc_loonbrief_regels slr,
HR.soc_loonbrieven sl,
HR.werknemers w,
HR.v_kontrakten vk
WHERE sl.loonperiode_dt BETWEEN '01012018' AND '01122018'
AND slr.loon_code_id IN (394)
AND slr.loonbrief_id = sl.loonbrief_id
AND w.werknemer_id = sl.werknemer_id
AND w.werknemer_id = vk.werknemer_id
AND vk.functie_id IN (121, 122, 128)
AND sl.loonperiode_dt BETWEEN hist_start_dt AND last_day(nvl(hist_eind_dt, sl.loonperiode_dt))
AND w.afdeling_id like '961'
GROUP BY sl.loonperiode_dt
ORDER BY sl.loonperiode_dt
It outputs this table :
31/01/18 234
30/04/18 245,8
31/05/18 714,6
31/07/18 288,04
31/08/18 281
30/11/18 515,12
I obviously would like it to be like that :
31/01/18 234
28/02/18 0
31/03/18 0
30/04/18 245,8
31/05/18 714,6
30/06/18 0
31/07/18 288,04
31/08/18 281
30/09/18 0
31/10/18 0
30/11/18 515,12
31/12/18 0
I have a calendar table 'CONV_HC.calendar' with dates in a column named 'DAT'.
I have seen many questions and answers about this, but I can't figure out how to apply the LEFT JOIN method or any other one to my current problem.
Thanks a lot in advance,
You could join against an existing table of months and group by the date, or you can build one with a subquery or a WITH clause, something like:
WITH Months (month) AS (
SELECT 1 AS Month FROM DUAL
UNION ALL
SELECT MONTH + 1
FROM Months
WHERE MONTH < 12
)
SELECT *
FROM Months
LEFT JOIN SomeTable
ON SomeTable.month = Months.MONTH
--ON Extract(MONTH FROM SomeTable.date) = Months.MONTH
edit
A better example:
--Just to simulate some table data
WITH SomeData AS (
SELECT TO_DATE('01/01/2019', 'MM/DD/YYYY') AS Dat, 5 AS Value FROM dual
UNION ALL
SELECT TO_DATE('01/05/2019', 'MM/DD/YYYY') AS Dat, 7 AS Value FROM dual
UNION ALL
SELECT TO_DATE('03/03/2019', 'MM/DD/YYYY') AS Dat, 2 AS Value FROM dual
UNION ALL
SELECT TO_DATE('11/05/2019', 'MM/DD/YYYY') AS Dat, 9 AS Value FROM dual
)
, Months (StartDate, MaxYear) AS (
SELECT CAST(TO_DATE('01/01/2019', 'MM/DD/YYYY') AS DATE) AS StartDate, 2019 AS MaxYear FROM DUAL
UNION ALL
SELECT CAST(ADD_MONTHS(StartDate, 1) AS DATE), MaxYear
FROM Months
WHERE EXTRACT(YEAR FROM ADD_MONTHS(StartDate, 1)) <= MaxYear
)
SELECT
Months.StartDate AS Dat
, COALESCE(SUM(SomeData.Value), 0) AS SumValue -- 0 instead of NULL for months with no data
FROM Months
LEFT JOIN SomeData
ON TRUNC(SomeData.Dat, 'MONTH') = Months.StartDate -- matches on month and year
GROUP BY
Months.StartDate
ORDER BY
Months.StartDate
edit
You won't find a copy-and-paste solution; you need to take the idea and adapt it to your context.
Let's try this. You can "add" the missing months in an application, or you can JOIN against a ready-made table of months. It doesn't need to be a real table; you can build one, and the WITH clause is an example of that. So let's get every month of 2019, as its last day:
--Getting the last day of every month for 2019
WITH Months (CurrentMonth, MaxYear) AS (
SELECT CAST(TO_DATE('01/01/2019', 'MM/DD/YYYY') AS DATE) AS CurrentMonth, 2019 AS MaxYear FROM DUAL
UNION ALL
SELECT CAST(ADD_MONTHS(CurrentMonth, 1) AS DATE), MaxYear
FROM Months
WHERE EXTRACT(YEAR FROM ADD_MONTHS(CurrentMonth, 1)) <= MaxYear
)
SELECT LAST_DAY(Months.CurrentMonth) AS LastDay
FROM Months
OK, now we have all months available for the join. In your query the sum is already done, so let's skip the sum and just use your data. Just add another WITH query.
--Getting the last day of every month for 2018
WITH Months (CurrentMonth, MaxYear) AS (
SELECT CAST(TO_DATE('01/01/2018', 'MM/DD/YYYY') AS DATE) AS CurrentMonth, 2018 AS MaxYear FROM DUAL
UNION ALL
SELECT CAST(ADD_MONTHS(CurrentMonth, 1) AS DATE), MaxYear
FROM Months
WHERE EXTRACT(YEAR FROM ADD_MONTHS(CurrentMonth, 1)) <= MaxYear
)
, YourData as (
SELECT sl.loonperiode_dt, (sum(slr.uren)) code_220
FROM HR.soc_loonbrief_regels slr,
HR.soc_loonbrieven sl,
HR.werknemers w,
HR.v_kontrakten vk
WHERE sl.loonperiode_dt BETWEEN '01012018' AND '01122018'
AND slr.loon_code_id IN (394)
AND slr.loonbrief_id = sl.loonbrief_id
AND w.werknemer_id = sl.werknemer_id
AND w.werknemer_id = vk.werknemer_id
AND vk.functie_id IN (121, 122, 128)
AND sl.loonperiode_dt BETWEEN hist_start_dt AND last_day(nvl(hist_eind_dt, sl.loonperiode_dt))
AND w.afdeling_id like '961'
GROUP BY sl.loonperiode_dt
--ORDER BY sl.loonperiode_dt
)
SELECT
LAST_DAY(Months.CurrentMonth) AS LastDay
, COALESCE(YourData.code_220, 0) AS code_220
FROM Months
Left Join YourData
on Extract(MONTH FROM Months.CurrentMonth) = Extract(MONTH FROM YourData.loonperiode_dt)
--If you have more years: AND Extract(YEAR FROM Months.CurrentMonth) = Extract(YEAR FROM YourData.loonperiode_dt)
ORDER BY LastDay ASC
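The shape of the result can be cross-checked with a short Python sketch of the same "calendar left join" idea: build every month of the year, then look up each month's total and fall back to 0. The totals are the values from the question's output; `last_day` and `fill_missing_months` are hypothetical helper names.

```python
import calendar
from datetime import date


def last_day(year, month):
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def fill_missing_months(totals, year):
    """Return one (month end, total) pair per month, using 0 where no data exists."""
    # Key by (year, month) so the match does not depend on the day of month.
    by_month = {(d.year, d.month): value for d, value in totals.items()}
    return [
        (last_day(year, month), by_month.get((year, month), 0))
        for month in range(1, 13)
    ]


# Totals taken from the question's current output
totals = {
    date(2018, 1, 31): 234,
    date(2018, 4, 30): 245.8,
    date(2018, 5, 31): 714.6,
    date(2018, 7, 31): 288.04,
    date(2018, 8, 31): 281,
    date(2018, 11, 30): 515.12,
}
filled = fill_missing_months(totals, 2018)
for month_end, total in filled:
    print(month_end, total)
```

Keying on both year and month mirrors the optional year condition in the join above, so the sketch would still behave if the data covered several years.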
I'm working on a bit of PostgreSQL to grab the first 10 and last 10 invoices of every month between certain dates. I am having unexpected output in the lateral joins. Firstly the limit is not working, and each of the array_agg aggregates is returning hundreds of rows instead of limiting to 10. Secondly, the aggregates appear to be the same, even though one is ordered ASC and the other DESC.
How can I retrieve only the first 10 and last 10 invoices of each month group?
SELECT first.invoice_month,
array_agg(first.id) first_ten,
array_agg(last.id) last_ten
FROM public.invoice i
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id ASC
LIMIT 10
) first ON i.id = first.id
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id DESC
LIMIT 10
) last on i.id = last.id
WHERE i.invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
GROUP BY first.invoice_month, last.invoice_month;
This can be done with a recursive query that generates the range of months for which we need to find the first and last 10 invoices.
WITH RECURSIVE all_months AS (
SELECT date_trunc('month','2018-01-01'::TIMESTAMP) as c_date, date_trunc('month', '2018-05-11'::TIMESTAMP) as end_date, to_char('2018-01-01'::timestamp, 'YYYY-MM') as current_month
UNION
SELECT c_date + interval '1 month' as c_date,
end_date,
to_char(c_date + INTERVAL '1 month', 'YYYY-MM') as current_month
FROM all_months
WHERE c_date + INTERVAL '1 month' <= end_date
),
invocies_with_month as (
SELECT *, to_char(invoice_date::TIMESTAMP, 'YYYY-MM') invoice_month FROM invoice
)
SELECT current_month, array_agg(first_10.id), 'FIRST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date ASC limit 10
) first_10 ON TRUE
GROUP BY current_month
UNION
SELECT current_month, array_agg(last_10.id), 'LAST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date DESC limit 10
) last_10 ON TRUE
GROUP BY current_month;
In the code above, '2018-01-01' and '2018-05-11' are the dates between which we want to find the invoices. Based on those dates, we generate the months (2018-01, 2018-02, 2018-03, 2018-04, 2018-05) for which we need to find invoices.
We store this data in all_months.
After we get the months, we do a lateral join in order to join the invoices for every month. We need 2 lateral joins in order to get the first and last 10 invoices.
Finally, the result is represented as:
current_month - the month
array_agg - ids of all selected invoices for that month
type - type of the selected invoices ('first 10' or 'last 10').
So in the current implementation, you will have 2 rows for each month (if there is at least 1 invoice for that month). You can easily join that in one row if you need to.
LIMIT is working fine. It's your query that's broken. JOIN is just 100% the wrong tool here; it doesn't even do anything close to what you need. By joining up to 10 rows with up to another 10 rows, you get up to 100 rows back. There's also no reason to self join just to combine filters.
Consider instead window queries. In particular, we have the dense_rank function, which can number every row in the result set according to groups:
SELECT
invoice_month,
time_of_month,
ARRAY_AGG(id) invoice_ids
FROM (
SELECT
id,
invoice_month,
-- Categorize as end or beginning of month
CASE
WHEN month_rank <= 10 THEN 'beginning'
WHEN month_reverse_rank <= 10 THEN 'end'
ELSE 'bug' -- Should never happen. Just a fall back in case of a bug.
END AS time_of_month
FROM (
SELECT
id,
invoice_month,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date) month_rank,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date DESC) month_reverse_rank
FROM (
SELECT
id,
invoice_date,
to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
) AS fiscal_year_invoices
) ranked_invoices
-- Get first and last 10
WHERE month_rank <= 10 OR month_reverse_rank <= 10
) first_and_last_by_month
GROUP BY
invoice_month,
time_of_month
Don't be intimidated by the length. This query is actually very straightforward; it just needed a few subqueries.
This is what it does logically:
Fetch the rows for the fiscal year in question
Assign a "rank" to each row within its month, counting both from the beginning and from the end
Filter out everything that doesn't rank in the top 10 for its month (counting from either direction)
Add an indicator as to whether it was at the beginning or end of the month. (Note that if there are fewer than 20 rows in a month, it will categorize more of them as "beginning".)
Aggregate the IDs together
This is the tool set designed for the job you're trying to do. If really needed, you can adjust this approach slightly to get them into the same row, but you have to aggregate before joining the results together and then join on the month; you can't join and then aggregate.
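As a rough illustration of "rank within each month, then aggregate", here is a minimal Python sketch. It is not the PostgreSQL query; the invoice data and the `first_and_last` function are hypothetical, and ties on the date are broken by id, which is an assumption the original query does not make.

```python
from collections import defaultdict
from datetime import date, timedelta


def first_and_last(invoices, n=10):
    """invoices: iterable of (id, invoice_date).

    Returns {(year, month): (first_n_ids, last_n_ids)}.
    """
    by_month = defaultdict(list)
    for invoice_id, invoice_date in invoices:
        by_month[(invoice_date.year, invoice_date.month)].append(
            (invoice_date, invoice_id)
        )

    result = {}
    for month, rows in by_month.items():
        rows.sort()  # order by date, then id, within the month
        ids = [invoice_id for _, invoice_id in rows]
        # Aggregate each end separately; nothing is joined row by row.
        result[month] = (ids[:n], ids[-n:])
    return result


# Hypothetical data: 25 invoices in January 2018 and 3 in February 2018
invoices = [(i, date(2018, 1, 1) + timedelta(days=i)) for i in range(25)]
invoices += [(100 + i, date(2018, 2, 1 + i)) for i in range(3)]
picked = first_and_last(invoices)
print(picked[(2018, 1)])
print(picked[(2018, 2)])
```

Unlike the CASE expression in the query, this sketch lets the two lists overlap when a month has fewer than 20 invoices, so February's three invoices appear in both.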
I am trying to count unique users on a monthly basis that were not present in the previous month. So if a user has a record for January and then another one for February, then I would only count January for that user.
user_id time
a1 1/2/17
a1 2/10/17
a2 2/18/17
a4 2/5/17
a5 3/25/17
My results should look like this
Month User Count
January 1
February 2
March 1
I'm not really familiar with BigQuery, but here's how I would solve the problem using TSQL. I imagine that you'd be able to use similar logic in BigQuery.
1). Order the data by user_id first, and then time. In TSQL, you can accomplish this with the following and store it in a common table expression, which you will query in the step after this.
;WITH cte AS
(
select ROW_NUMBER() OVER (PARTITION BY [user_id] ORDER BY [time]) AS rn,*
from dbo.employees
)
2). Next query for only the rows with rn = 1 (the first occurrence for a particular user) and group by the month.
select DATENAME(month, [time]) AS [Month], count(*) AS user_count
from cte
where rn = 1
group by DATENAME(month, [time])
This is assuming that 2017 is the only year you're dealing with. If you're dealing with more than one year, you probably want step #2 to look something like this:
select year([time]) as [year], DATENAME(month, [time]) AS [month],
count(*) AS user_count
from cte
where rn = 1
group by year([time]), DATENAME(month, [time])
First aggregate by the user id and the month. Then use lag() to see if the user was present in the previous month:
with du as (
select date_trunc(time, month) as yyyymm, user_id
from t
group by date_trunc(time, month), user_id -- one row per user per month
)
select yyyymm, count(*)
from (select du.*,
lag(yyyymm) over (partition by user_id order by yyyymm) as prev_yyyymm
from du
) du
where prev_yyyymm is null or -- first month seen for this user
prev_yyyymm < date_sub(yyyymm, interval 1 month) -- or absent in the previous month
group by yyyymm;
Note: This uses the date functions, but similar functions exist for timestamp.
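The "was the user present in the previous month?" rule can be checked against the question's sample data with a small Python sketch. The `new_user_counts` function is a hypothetical name, and months are numbered as year * 12 + month so that December to January works across years.

```python
from collections import defaultdict
from datetime import date


def new_user_counts(events):
    """events: iterable of (user_id, date).

    Counts, per month, the users with no record in the immediately
    preceding calendar month.
    """
    months_by_user = defaultdict(set)
    for user_id, day in events:
        months_by_user[user_id].add(day.year * 12 + day.month - 1)

    counts = defaultdict(int)
    for months in months_by_user.values():
        for m in months:
            if m - 1 not in months:  # user absent in the previous month
                counts[(m // 12, m % 12 + 1)] += 1
    return dict(sorted(counts.items()))


# Sample data from the question
events = [
    ("a1", date(2017, 1, 2)),
    ("a1", date(2017, 2, 10)),
    ("a2", date(2017, 2, 18)),
    ("a4", date(2017, 2, 5)),
    ("a5", date(2017, 3, 25)),
]
counts = new_user_counts(events)
print(counts)
```

Under this reading, a user who skips a month is counted again when they return, which is a different result from counting only each user's first month.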
As I understand the question, a user should be excluded from a given month's count only if the same user was present in the previous month. If the user was present a few months earlier, but not in the previous month, the user should be counted.
If this is correct, try the following for BigQuery Standard SQL:
#standardSQL
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
SELECT *,
DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
FROM (
SELECT
user_id,
DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time,
EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year,
FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
FROM yourTable
GROUP BY 1, 2, 3, 4
)
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time
You can test / play with the above using the following example with dummy data from your question:
#standardSQL
WITH yourTable AS (
SELECT 'a1' AS user_id, '1/2/17' AS time UNION ALL
SELECT 'a1', '2/10/17' UNION ALL
SELECT 'a2', '2/18/17' UNION ALL
SELECT 'a4', '2/5/17' UNION ALL
SELECT 'a5', '3/25/17'
)
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
SELECT *,
DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
FROM (
SELECT
user_id,
DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time,
EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year,
FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
FROM yourTable
GROUP BY 1, 2, 3, 4
)
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time
The output is
Year Month User_Count
2017 January 1
2017 February 2
2017 March 1
Try this query:
SELECT
t1.d,
count(DISTINCT t1.user_id)
FROM
(
SELECT
EXTRACT(MONTH FROM time) AS d,
--EXTRACT(MONTH FROM time)-1 AS d2,
user_id
FROM nbitra.tmp
) t1
LEFT JOIN
(
SELECT
EXTRACT(MONTH FROM time) AS d,
user_id
FROM nbitra.tmp
) t2
ON t1.d = t2.d+1
AND t1.user_id = t2.user_id --Match the same user in the previous month
WHERE
t2.user_id IS NULL --Keep users with no row in the previous month (this also covers January)
GROUP BY t1.d;
I have two queries
1)
select Year , Month, Sum(Stores) from ABC ;
2)
select Year, Month , Sum(SalesStores) from DEF ;
I want a result like :
Year, Month, Sum(Stores), Sum(SalesStores)
How can I do it ?
I tried union & Union all
select Year , Month, Sum(Stores) from ABC union
select Year, Month , Sum(SalesStores) from DEF ;
I see only 3 columns in the output
Year, Month Sum(Stores).
Here are the tables :
Year, Month Stores
Year Month SalesStores
Is there a way I can see the result in the format I would like to see ?
Since I don't know their relationship, I prefer to use UNION ALL.
SELECT Year,
Month,
MAX(TotalStores) TotalStores,
MAX(TotalSalesStores) TotalSalesStores
FROM
(
SELECT Year, Month,
SUM(Stores) TotalStores,
NULL TotalSalesStores
FROM ABC
GROUP BY Year, Month
UNION ALL
SELECT Year, Month,
NULL TotalStores,
SUM(SalesStores) TotalSalesStores
from DEF
GROUP BY Year, Month
) a
GROUP BY Year, Month
You can UNION them in the following fashion:
SELECT Year , Month, Sum(Stores) As Stores, NULL As SalesStores from ABC GROUP BY Year, Month
UNION
SELECT Year , Month, NULL As Stores, Sum(SalesStores) As SalesStores from DEF GROUP BY Year, Month
Or use UNION ALL if your logic allows it.
Try:
SELECT Year, Month, SUM(TotalStores) as TotalAllStores, SUM(TotalSalesStore) as TotalAllSalesStore
FROM
(
SELECT Year , Month, Sum(Stores) as TotalStores, 0 as TotalSalesStore from ABC GROUP BY Year, Month
UNION ALL
SELECT Year, Month , 0 as TotalStores, Sum(SalesStores) as TotalSalesStore from DEF GROUP BY Year, Month
) SalesByYearMonth
GROUP BY Year, Month
I would use FULL OUTER JOIN thus:
SELECT ISNULL(x.[Year], y.[Year]) AS [Year],
ISNULL(x.[Month], y.[Month]) AS [Month],
x.Sum_Stores,
y.Sum_SalesStores
FROM (select Year , Month, Sum(Stores) AS Sum_Stores from ABC ...) AS x
FULL OUTER JOIN (select Year, Month , Sum(SalesStores) AS Sum_SalesStores from DEF ...) AS y
ON x.[Year] = y.[Year] AND x.[Month] = y.[Month]
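The answers above all aggregate each table first and only then line the two results up on (Year, Month). A minimal Python sketch of that order of operations, using made-up rows for ABC and DEF and hypothetical helper names, might look like this:

```python
from collections import defaultdict


def sum_by_month(rows):
    """rows: iterable of (year, month, value). Returns {(year, month): total}."""
    totals = defaultdict(int)
    for year, month, value in rows:
        totals[(year, month)] += value
    return dict(totals)


def combine(abc_rows, def_rows):
    """Aggregate each input separately, then match the totals per (year, month)."""
    stores = sum_by_month(abc_rows)
    sales_stores = sum_by_month(def_rows)
    # Union of keys behaves like a FULL OUTER JOIN: months found in only
    # one input are kept, with None for the missing side.
    keys = sorted(set(stores) | set(sales_stores))
    return [(y, m, stores.get((y, m)), sales_stores.get((y, m))) for y, m in keys]


# Hypothetical rows: (Year, Month, Stores) and (Year, Month, SalesStores)
abc = [(2013, 1, 5), (2013, 1, 7), (2013, 2, 4)]
def_ = [(2013, 1, 3), (2013, 3, 9)]
combined = combine(abc, def_)
for row in combined:
    print(row)
```

Aggregating before matching avoids the row multiplication that would occur if the raw rows of both tables were joined on Year and Month first.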
I have a table like this.
Month-----Book_Type-----sold_in_Dollars
Jan----------A------------ 100
Jan----------B------------ 120
Feb----------A------------ 50
Mar----------A------------ 60
Mar----------B------------ 30
and so on
I have to calculate the expected sales for each month and book type based on the last 2 months sales.
So for March and type A it would be (100+50)/2 = 75
For March and type B it is 120/1 since no data for Feb is there.
I was trying to use the lag function but it wouldn't work since there is data missing in a few rows.
Any ideas on this?
Since it plans to ignore missing values, this should probably work. Don't have a database to test it on at the moment but will give it another go in the morning
select
month,
book_type,
sold_in_dollars,
avg(sold_in_dollars) over (partition by book_type order by month
range between interval '2' month preceding and interval '1' month preceding) as avg_sales
from myTable;
This sort of assumes that month has a date datatype and can be sorted on... if it's just a text string then you'll need something else.
Normally you could just use rows between 2 preceding and 1 preceding, but this will take the two previous data points and not necessarily the two previous months if there are rows missing.
You could work it out with lag but it would be a bit more complicated.
As far as I know, you can give a default value to lag() :
SELECT Book_Type, Month,
(lag(sold_in_Dollars, 1, 0) OVER(PARTITION BY Book_Type ORDER BY Month) + lag(sold_in_Dollars, 2, 0) OVER(PARTITION BY Book_Type ORDER BY Month))/2 AS expected_sales
FROM your_table
(Assuming Month column doesn't really contain JAN or FEB but real, orderable dates.)
What about something like (forgive the sql server syntax, but you get the idea):
Select Book_type, AVG(sold_in_dollars)
from MyTable
where Month in (MONTH(DATEADD(month, -1, GETDATE())), MONTH(DATEADD(month, -2, GETDATE())))
group by Book_type
A partition outer join can help create the missing data. Create a set of months and join those values to each row by the month and perform the join once for each book type. I created the months January through April in this example:
with test_data as
(
select to_date('01-JAN-2010', 'DD-MON-YYYY') month, 'A' book_type, 100 sold_in_dollars from dual union all
select to_date('01-JAN-2010', 'DD-MON-YYYY') month, 'B' book_type, 120 sold_in_dollars from dual union all
select to_date('01-FEB-2010', 'DD-MON-YYYY') month, 'A' book_type, 50 sold_in_dollars from dual union all
select to_date('01-MAR-2010', 'DD-MON-YYYY') month, 'A' book_type, 60 sold_in_dollars from dual union all
select to_date('01-MAR-2010', 'DD-MON-YYYY') month, 'B' book_type, 30 sold_in_dollars from dual
)
select book_type, month, sold_in_dollars
,case when denominator = 0 then 'N/A' else to_char(numerator / denominator) end expected_sales
from
(
select test_data.book_type, all_months.month, sold_in_dollars
,count(sold_in_dollars) over
(partition by book_type order by all_months.month rows between 2 preceding and 1 preceding) denominator
,sum(sold_in_dollars) over
(partition by book_type order by all_months.month rows between 2 preceding and 1 preceding) numerator
from
(
select add_months(to_date('01-JAN-2010', 'DD-MON-YYYY'), level-1) month from dual connect by level <= 4
) all_months
left outer join test_data partition by (test_data.book_type) on all_months.month = test_data.month
)
order by book_type, month
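The numbers from the question can be checked with a small Python sketch of the same rule: average whatever sales exist in the two preceding calendar months, and report nothing when neither month has data. The `expected_sales` function and the dictionary layout are assumptions made for illustration.

```python
from datetime import date


def month_index(d):
    """Number months consecutively so that year boundaries are handled."""
    return d.year * 12 + d.month - 1


def expected_sales(sales, book_type, month):
    """sales: {(book_type, first day of month): amount}.

    Averages the amounts found in the two calendar months before `month`.
    Returns None when neither month has a row.
    """
    target = month_index(month)
    prior = [
        amount
        for (bt, m), amount in sales.items()
        if bt == book_type and target - 2 <= month_index(m) <= target - 1
    ]
    return sum(prior) / len(prior) if prior else None


# Data from the question, dated 2010 as in the test_data above
sales = {
    ("A", date(2010, 1, 1)): 100,
    ("B", date(2010, 1, 1)): 120,
    ("A", date(2010, 2, 1)): 50,
    ("A", date(2010, 3, 1)): 60,
    ("B", date(2010, 3, 1)): 30,
}
print(expected_sales(sales, "A", date(2010, 3, 1)))
print(expected_sales(sales, "B", date(2010, 3, 1)))
print(expected_sales(sales, "A", date(2010, 1, 1)))
```

Dividing by the number of months that actually have data corresponds to the count(sold_in_dollars) denominator in the query, which skips the NULL rows created by the partition outer join.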