For example, I have sales data for one year, and some of the products are not available during a specific date range.
I currently handle one date range, but what is the best practice if I have multiple exclusions?
SELECT * FROM XXX
WHERE
IF(Date BETWEEN '2018-11-22' AND '2019-03-28',
ID IN (8467,8468,8469,8470),
ID IN (8467,8468,8469,8470,9551,9552,9553)
)
In particular, how do I handle the case where the date ranges overlap?
If you are trying to exclude values, I am thinking:
SELECT *
FROM XXX
WHERE ID IN (8467, 8468, 8469, 8470, 9551, 9552, 9553) AND
(Date BETWEEN '2018-11-22' AND '2019-03-28' AND
ID NOT IN (9551, 9552, 9553) OR
Date NOT BETWEEN '2018-11-22' AND '2019-03-28'
);
You can add multiple pairs for other dates.
For a full solution, you might want to create a table with columns such as:
product_id
start_exclusion_date
end_exclusion_date
And then phrase the query as:
select xxx.*
from xxx left join
exclusions e
on xxx.id = e.product_id and
xxx.date >= e.start_exclusion_date and
xxx.date <= e.end_exclusion_date
where xxx.id in ( . . . ) and e.product_id is null;
This is likely to be easier to maintain in the long term.
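A minimal sketch of the exclusion-table idea, run against an in-memory SQLite database from Python so it can be tried without a warehouse. The table and column names follow the answer above; the sample rows and the two overlapping ranges for product 9551 are made up for illustration. NOT EXISTS drops a row as soon as any exclusion range covers it, so overlapping ranges need no special handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xxx (id INTEGER, date TEXT);
    CREATE TABLE exclusions (
        product_id INTEGER,
        start_exclusion_date TEXT,
        end_exclusion_date TEXT
    );
    -- hypothetical sales rows (ISO dates compare correctly as text)
    INSERT INTO xxx VALUES
        (8467, '2018-12-01'),
        (9551, '2018-10-01'),
        (9551, '2018-12-01'),
        (9551, '2019-04-15'),
        (9551, '2019-06-01');
    -- two overlapping exclusion ranges for the same product
    INSERT INTO exclusions VALUES
        (9551, '2018-11-22', '2019-03-28'),
        (9551, '2019-03-01', '2019-04-30');
""")

# Keep a row only if no exclusion range for that product covers its date.
rows = conn.execute("""
    SELECT x.id, x.date
    FROM xxx AS x
    WHERE NOT EXISTS (
        SELECT 1
        FROM exclusions AS e
        WHERE e.product_id = x.id
          AND x.date BETWEEN e.start_exclusion_date AND e.end_exclusion_date
    )
    ORDER BY x.id, x.date
""").fetchall()

print(rows)
```

Adding another exclusion then means inserting a row rather than editing the query.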
Try this,
select * from xxx
where not(date between '2018-11-22' and '2019-03-28' and id in(9551,9552,9553))
order by id, date
Below is an example for BigQuery Standard SQL. It shows one direction for building the "complete picture" with whitelist and blacklist rules (all with simplified dummy data, just to demonstrate it in action).
#standardSQL
WITH `project.dataset.xxx` AS (
SELECT 1 id, DATE '2018-11-22' `date` UNION ALL
SELECT 2, '2018-11-23' UNION ALL
SELECT 3, '2018-11-24' UNION ALL
SELECT 4, '2018-11-25' UNION ALL
SELECT 1, '2018-11-26' UNION ALL
SELECT 2, '2018-11-27' UNION ALL
SELECT 3, '2018-11-28' UNION ALL
SELECT 8, '2018-11-29'
), `project.dataset.whitelist` AS (
SELECT DATE '2018-11-22' start, DATE '2018-11-29' finish, [2,3] ids UNION ALL
SELECT '2018-11-22', '2018-11-22', [1]
), `project.dataset.blacklist` AS (
SELECT DATE '2018-11-26' start, DATE '2018-11-28' finish, [1,3] ids UNION ALL
SELECT '2018-11-22', '2018-11-22', [10]
)
-- rows allowed by at least one whitelist rule
SELECT DISTINCT t.*
FROM `project.dataset.xxx` t
JOIN `project.dataset.whitelist` w
ON (`date` BETWEEN w.start AND w.finish AND id IN UNNEST(w.ids))
-- minus rows hit by any blacklist rule
EXCEPT DISTINCT
SELECT t.*
FROM `project.dataset.xxx` t
JOIN `project.dataset.blacklist` b
ON (`date` BETWEEN b.start AND b.finish AND id IN UNNEST(b.ids))
with result
Row id date
1 1 2018-11-22
2 2 2018-11-27
3 2 2018-11-23
4 3 2018-11-24
In a real case all involved tables are real tables, and the query looks like this:
#standardSQL
-- rows allowed by at least one whitelist rule
SELECT DISTINCT t.*
FROM `project.dataset.xxx` t
JOIN `project.dataset.whitelist` w
ON (`date` BETWEEN w.start AND w.finish AND id IN UNNEST(w.ids))
-- minus rows hit by any blacklist rule
EXCEPT DISTINCT
SELECT t.*
FROM `project.dataset.xxx` t
JOIN `project.dataset.blacklist` b
ON (`date` BETWEEN b.start AND b.finish AND id IN UNNEST(b.ids))
I am running this query to get the average number of logins per user over the last 3 months. If the user has logged in during the last 3 months, I want the average; if not, it should return 0.
I have tried a number of different approaches, but if the user has not logged in during the last 3 months there are no matching records, so count() does not return 0. The query simply returns no rows.
1) select case count(*)
WHEN 0
THEN 0
ELSE count(creationTS) / 3
END as average
from table_name where creationTS >= add_months(now(), -3)
and userId = '110'
group by userId;
2) select COALESCE(count(creationTS)/3,0) as average
from table_name where creationTS >= add_months(now(), -3)
and userId = '110'
group by userId;
It gives the correct result if a record matches the condition 'creationTS >= add_months(now(), -3)', but if no record exists it returns nothing. How can I return 0 in that case?
Try it like this:
a) get all distinct userid-s from the base table in a full-select.
b) left join that full-select back with the base table, on equality of the user id and login date not earlier than 3 months ago
c) count the user id-s found in the base table and group by user id; COUNT() on a column skips the NULLs produced when the join finds no match, so such users get 0 (the NVL() is only a safeguard)
WITH
-- sample input data,not part of real query
indata(userid,login_dt) AS (
SELECT 'arthur', DATE '2021-09-15'
UNION ALL SELECT 'arthur', DATE '2021-08-27'
UNION ALL SELECT 'arthur', DATE '2021-08-01'
UNION ALL SELECT 'trillian', DATE '2021-09-27'
UNION ALL SELECT 'trillian', DATE '2021-08-15'
UNION ALL SELECT 'trillian', DATE '2021-06-27'
UNION ALL SELECT 'ford', DATE '2021-02-27'
UNION ALL SELECT 'ford', DATE '2021-04-27'
)
,
userids AS (
SELECT DISTINCT
userid
FROM indata
)
SELECT
userids.userid
, NVL(COUNT(indata.userid),0) AS login_count
FROM userids
LEFT JOIN indata
ON userids.userid=indata.userid
AND login_dt >= ADD_MONTHS(CURRENT_DATE,-3)
GROUP BY
userids.userid
;
userid | login_count
----------+-------------
arthur | 3
ford | 0
trillian | 2
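The same left-join pattern can be tried in an in-memory SQLite database from Python. This is only a sketch: the table name `logins` and the fixed reference date are assumptions made so the output is reproducible, and SQLite's `date(?, '-3 months')` stands in for `ADD_MONTHS(CURRENT_DATE, -3)`. Dividing the count by 3.0 gives the monthly average the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (userid TEXT, login_dt TEXT);
    INSERT INTO logins VALUES
        ('arthur', '2021-09-15'),
        ('arthur', '2021-08-27'),
        ('arthur', '2021-08-01'),
        ('trillian', '2021-09-27'),
        ('trillian', '2021-08-15'),
        ('trillian', '2021-06-27'),
        ('ford', '2021-02-27'),
        ('ford', '2021-04-27');
""")

# Fixed reference date (an assumption) instead of the current date.
today = "2021-10-01"

# The date filter lives in the ON clause, so users without recent logins
# still produce one row, and COUNT(l.userid) is 0 for them.
rows = conn.execute("""
    SELECT u.userid,
           COUNT(l.userid) / 3.0 AS avg_logins_per_month
    FROM (SELECT DISTINCT userid FROM logins) AS u
    LEFT JOIN logins AS l
           ON l.userid = u.userid
          AND l.login_dt >= date(?, '-3 months')
    GROUP BY u.userid
    ORDER BY u.userid
""", (today,)).fetchall()

for userid, avg in rows:
    print(userid, round(avg, 2))
```

Moving the date condition into a WHERE clause instead would remove the users without recent logins again, which is the behaviour the question describes.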
I am trying to join three tables and get the results. However, one of the tables has multiple event_code values for the same CSO_Item_key, which results in duplicate records.
Please note my source is Vertica and my target is SQL Server.
I tried the STUFF and FOR XML approach, but it does not work with Vertica; it says incorrect syntax for XML.
Is there any other solution?
Table 1
Entry Date Cso Item Key Fail Code
8/1/2018 4:28 BLXB796201 CSL120
8/1/2018 4:40 BLXB799101 CLL250
8/1/2018 4:55 BLXB803001 CMS130
8/1/2018 5:08 BLXB806201 CNE100
Table 2
Cso Item Key Event Code
BLXB796201 GTS
BLXB796201 LC28
BLXB796201 SDR4
BLXB799101 GTS
BLXB799101 LC28
BLXB799101 SDR4
BLXB803001 GTS
BLXB803001 LC28
BLXB803001 SDR4
BLXB806201 GTS
BLXB806201 LC28
BLXB806201 SDR4
Table 3
Fail Code Desc
CSL120 Bad Part
CLL250 Unit Scrapped
CNE100 OS Reinstall
CBN101 NTF
Expected Result:
Entry_Date Cso_Item_Key Fail_Code Desc Event_Code
8/1/2018 4:28 BLXB796201 CSL120 Bad Part GTS,LC28,SDR4
8/1/2018 4:40 BLXB799101 CLL250 Unit Scrapped GTS,LC28,SDR4
8/1/2018 4:55 BLXB803001 CMS130 Null GTS,LC28,SDR4
8/1/2018 5:08 BLXB806201 CNE100 OS Reinstall GTS,LC28,SDR4
One of the only solutions I've seen for this is the strings_package extension which can be found here on github. With it, you can use the group_concat function like so:
-- get a list of nodes
select group_concat(node_name) over () from nodes;
-- nodes with storage for a projection
select schema_name,projection_name,
group_concat(node_name) over (partition by schema_name,projection_name)
from (select distinct node_name,schema_name,projection_name from storage_containers) sc order by schema_name, projection_name;
This tries to do it all in SQL. It is a bit of a cheat, as I am relying on the fact that Table_2 always has 3 different event codes for each CSO Item Key.
If that is not the case, you would have to add a few rows, up to the maximum number of event codes per CSO Item Key, to the i index table that I'm creating as a common table expression. You would also have to LEFT JOIN that i table to tb2 and add some NULL-processing logic around each term of the expression (for example around ||','||MAX(CASE i.i WHEN 2 THEN event_code END)), so that an empty string is concatenated when the event_code is NULL.
Otherwise, with your input (which you should take out of the query when you really use it), it could look like this:
WITH
-- your input, don't use in real query ...
tb1(Entry_Date,Cso_Item_Key,Fail_Code) AS (
SELECT TIMESTAMP '8/1/2018 4:28','BLXB796201','CSL120'
UNION ALL SELECT TIMESTAMP '8/1/2018 4:40','BLXB799101','CLL250'
UNION ALL SELECT TIMESTAMP '8/1/2018 4:55','BLXB803001','CMS130'
UNION ALL SELECT TIMESTAMP '8/1/2018 5:08','BLXB806201','CNE100'
)
,
tb2(Cso_Item_Key,Event_Code) AS (
SELECT 'BLXB796201','GTS'
UNION ALL SELECT 'BLXB796201','LC28'
UNION ALL SELECT 'BLXB796201','SDR4'
UNION ALL SELECT 'BLXB799101','GTS'
UNION ALL SELECT 'BLXB799101','LC28'
UNION ALL SELECT 'BLXB799101','SDR4'
UNION ALL SELECT 'BLXB803001','GTS'
UNION ALL SELECT 'BLXB803001','LC28'
UNION ALL SELECT 'BLXB803001','SDR4'
UNION ALL SELECT 'BLXB806201','GTS'
UNION ALL SELECT 'BLXB806201','LC28'
UNION ALL SELECT 'BLXB806201','SDR4'
)
,
tb3(Fail_Code,Descr) AS (
SELECT 'CSL120','Bad Part'
UNION ALL SELECT 'CLL250','Unit Scrapped'
UNION ALL SELECT 'CNE100','OS Reinstall'
UNION ALL SELECT 'CBN101','NTF'
)
-- real WITH clause starts here - and table "i" can contain more than 3 rows..
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
tb2_w_i AS (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY cso_item_key ORDER BY event_code) AS i
FROM tb2
)
,
tb2_pivot AS (
SELECT
cso_item_key
, MAX(CASE i.i WHEN 1 THEN event_code END)
||','||MAX(CASE i.i WHEN 2 THEN event_code END)
||','||MAX(CASE i.i WHEN 3 THEN event_code END)
AS event_codes
FROM tb2_w_i JOIN i USING(i)
GROUP BY 1
)
SELECT
entry_date
, tb1.cso_item_key
, tb1.fail_code
, descr
, event_codes
FROM tb1
JOIN tb2_pivot USING(cso_item_key)
LEFT JOIN tb3 USING(fail_code)
;
The result (my NULLSTRING is the dash..)
entry_date |cso_item_key|fail_code|descr |event_codes
2018-08-01 04:28:00|BLXB796201 |CSL120 |Bad Part |GTS,LC28,SDR4
2018-08-01 04:40:00|BLXB799101 |CLL250 |Unit Scrapped|GTS,LC28,SDR4
2018-08-01 04:55:00|BLXB803001 |CMS130 |- |GTS,LC28,SDR4
2018-08-01 05:08:00|BLXB806201 |CNE100 |OS Reinstall |GTS,LC28,SDR4
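The underlying idea, collapsing Table 2 to one row per key before joining so that the join cannot multiply rows, can be sketched in an in-memory SQLite database from Python. Note that `group_concat` here is SQLite's built-in aggregate, used only as a stand-in for whatever string aggregation the source database offers; the data is a two-row subset of the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb1 (entry_date TEXT, cso_item_key TEXT, fail_code TEXT);
    CREATE TABLE tb2 (cso_item_key TEXT, event_code TEXT);
    CREATE TABLE tb3 (fail_code TEXT, descr TEXT);
    INSERT INTO tb1 VALUES
        ('2018-08-01 04:28', 'BLXB796201', 'CSL120'),
        ('2018-08-01 04:55', 'BLXB803001', 'CMS130');
    INSERT INTO tb2 VALUES
        ('BLXB796201', 'GTS'), ('BLXB796201', 'LC28'), ('BLXB796201', 'SDR4'),
        ('BLXB803001', 'GTS'), ('BLXB803001', 'LC28'), ('BLXB803001', 'SDR4');
    INSERT INTO tb3 VALUES ('CSL120', 'Bad Part');
""")

rows = conn.execute("""
    SELECT tb1.entry_date, tb1.cso_item_key, tb1.fail_code,
           tb3.descr, agg.event_codes
    FROM tb1
    JOIN (
        -- one row per key, so the join below cannot multiply tb1 rows
        SELECT cso_item_key, group_concat(event_code, ',') AS event_codes
        FROM tb2
        GROUP BY cso_item_key
    ) AS agg ON agg.cso_item_key = tb1.cso_item_key
    LEFT JOIN tb3 ON tb3.fail_code = tb1.fail_code
    ORDER BY tb1.entry_date
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN to tb3 keeps the CMS130 row even though it has no description, as in the expected result.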
EDIT: added third requirement after playing with solution from Tim Biegeleisen
EDIT2: modified Robbie's DOB to be before his parent's marriage date
I am trying to create a query that will look at two tables and determine the difference in dates based on a percentage. I know, super confusing... Let me try and explain using the tables below:
Bob and Mary are married on 2010-01-01 and expect 4 kids (Parent table)
I want to know how many years it took until they reached 50% of their expected kids (i.e. 2 of 4 kids). Using the Child table to look at the DOBs of their 4 kids, we see that Frankie is the second child, which meets our 50% threshold, so we take Frankie's DOB, subtract the parents' marriage date from it, and end up with 3 years!
If the goal isn't reached, then display no value, e.g. Mick and Jo have only had 1 child so far, so they haven't reached their goal yet.
Hoping this is doable using BigQuery standard SQL.
Parent table
id married_couple married_at expected_kids
--------------------------------------
1 Bob and Mary 2010-01-01 4
2 Mick and Jo 2010-01-01 4
Child table
id child_name parent_id date_of_birth
--------------------------------------
1 Eddie 1 2012-01-01
2 Frankie 1 2013-01-01
3 Robbie 1 2005-01-01
4 Duncan 1 2015-01-01
5 Rick 2 2014-01-01
Expected SQL result
parent_id half_goal_reached(years)
--------------------------------------
1 3
2
Below are two solutions for BigQuery Standard SQL.
The first is written in a more classic SQL way; the second is more BigQuery style (I think).
First Solution: with analytics function
#standardSQL
SELECT
parent_id,
IF(
MAX(pos) = MAX(CAST(expected_kids / 2 AS INT64)),
MAX(DATE_DIFF(date_of_birth, married_at, YEAR)),
NULL
) AS half_goal_reached
FROM (
SELECT c.parent_id, c.date_of_birth, expected_kids, married_at,
ROW_NUMBER() OVER(PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS pos
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
)
WHERE pos <= CAST(expected_kids / 2 AS INT64)
GROUP BY parent_id
Second Solution: with use of ARRAY
#standardSQL
SELECT
parent_id,
DATE_DIFF(dates[SAFE_ORDINAL(CAST(expected_kids / 2 AS INT64))], married_at, YEAR) AS half_goal_reached
FROM (
SELECT
parent_id,
ARRAY_AGG(date_of_birth ORDER BY date_of_birth) AS dates,
MAX(expected_kids) AS expected_kids,
MAX(married_at) AS married_at
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
GROUP BY parent_id
)
Dummy Data
You can test / play with both solutions using below dummy data
#standardSQL
WITH `parent` AS (
SELECT 1 id, 'Bob and Mary' married_couple, DATE '2010-01-01' married_at, 4 expected_kids UNION ALL
SELECT 2, 'Mick and Jo', DATE '2010-01-01', 4
),
`child` AS (
SELECT 1 id, 'Eddie' child_name, 1 parent_id, DATE '2012-01-01' date_of_birth UNION ALL
SELECT 2, 'Frankie', 1, DATE '2013-01-01' UNION ALL
SELECT 3, 'Robbie', 1, DATE '2014-01-01' UNION ALL
SELECT 4, 'Duncan', 1, DATE '2015-01-01' UNION ALL
SELECT 5, 'Rick', 2, DATE '2014-01-01'
)
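To cross-check what either solution should return on this dummy data, the logic can be sketched in plain Python: sort each couple's birth dates, pick the birth that reaches half of the expected kids, and take the difference in calendar years (which is what DATE_DIFF(..., YEAR) counts). The helper name `half_goal_years` and the round-down rule for odd numbers of expected kids are assumptions for illustration.

```python
from datetime import date

parents = {
    1: {"married_at": date(2010, 1, 1), "expected_kids": 4},
    2: {"married_at": date(2010, 1, 1), "expected_kids": 4},
}
children = [  # (parent_id, date_of_birth), same values as the dummy data
    (1, date(2012, 1, 1)),
    (1, date(2013, 1, 1)),
    (1, date(2014, 1, 1)),
    (1, date(2015, 1, 1)),
    (2, date(2014, 1, 1)),
]


def half_goal_years(parent_id):
    """Years from marriage to the birth that reaches half the expected kids, or None."""
    p = parents[parent_id]
    births = sorted(dob for pid, dob in children if pid == parent_id)
    # Assumption: round half down for odd counts; matches the 4 -> 2 example.
    needed = p["expected_kids"] // 2
    if needed == 0 or len(births) < needed:
        return None
    # Difference in calendar years between the qualifying birth and the marriage.
    return births[needed - 1].year - p["married_at"].year


result = {pid: half_goal_years(pid) for pid in parents}
print(result)
```

Couple 2 has only one child, so the goal is not reached and the value stays empty.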
Try the following query; its logic is a bit too verbose to explain in full. I join the parent and child tables, lining up the parent id, the number of years elapsed since marriage, the running number of children, and the expected number of children. With this information in hand, we can easily find the first row whose running number of children matches or exceeds half of the expected number.
SELECT parent_id, num_years AS half_goal_reached
FROM
(
SELECT parent_id, num_years, cnt, expected_kids,
ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY num_years) rn
FROM
(
SELECT
t2.parent_id,
YEAR(t2.date_of_birth) - YEAR(t1.married_at) AS num_years,
(SELECT COUNT(*) FROM child c
WHERE c.parent_id = t2.parent_id AND
c.date_of_birth <= t2.date_of_birth) AS cnt,
t1.expected_kids
FROM parent t1
INNER JOIN child t2
ON t1.id = t2.parent_id
) t
WHERE
cnt >= expected_kids / 2
) t
WHERE t.rn = 1;
Note that there may be issues with how I computed the yearly differences, or with how I compute the threshold for half the number of expected children. Also, if we were using a recent enterprise database, we could have used an analytic function to get the running number of children instead of a correlated subquery, but I was unsure whether BigQuery would support that, so I used the latter.
I have data in this format
student_id,month1,fees
A1,201612,22
A1,201611,33
A1,201610,44
A1,201609,55
A1,201608,66
A1,201607,77
A1,201606,88
A2,201612,12
A2,201610,24
A2,201609,36
A2,201607,48
I want the fees of every student as the average of the last three months' fees. For student A1 and month 201612, that is sum(22,33,44)/3. I used this query:
(select student_id, month1, fees,
(sum(fees) over (partition by student_id
order by student_id, month1 asc
rows between 2 preceding and current row)) / 3 as avg1
from table
where month1 > (select trim(Add_Months(cast(trim(maxrepmonth) as DATE Format 'YYYYMM'), -5) (format 'YYYYMM'))
from (select max(month1) as maxrepmonth from table) z2)
group by 1,2,3)
This works fine for student A1, who has data for all months. But for student A2 and month 201612, it takes the fees from months 201612, 201610 and 201609, which is wrong; it should take only 201612 and 201610, as 201611 is missing.
Please help.
Thanks
OLAP functions from the ANSI-99 standard are your friend here - especially the RANGE window clause.
Try this - I'm doing it with Vertica, but Teradata should be just as ANSI compliant and do it for you, too:
WITH foo(student_id,month1,fees) AS (
SELECT 'A1',DATE '2016-12-01',22
UNION ALL SELECT 'A1',DATE '2016-11-01',33
UNION ALL SELECT 'A1',DATE '2016-10-01',44
UNION ALL SELECT 'A1',DATE '2016-09-01',55
UNION ALL SELECT 'A1',DATE '2016-08-01',66
UNION ALL SELECT 'A1',DATE '2016-07-01',77
UNION ALL SELECT 'A1',DATE '2016-06-01',88
UNION ALL SELECT 'A2',DATE '2016-12-01',12
UNION ALL SELECT 'A2',DATE '2016-10-01',24
UNION ALL SELECT 'A2',DATE '2016-09-01',36
UNION ALL SELECT 'A2',DATE '2016-07-01',48
)
SELECT
*
, AVG(fees) OVER (
PARTITION BY student_id ORDER BY month1
RANGE BETWEEN INTERVAL '3 MONTHS' PRECEDING AND CURRENT ROW
) AS rolling_avg_3_months
FROM foo;
student_id|month1 |fees|rolling_avg_3_months
A1 |2016-06-01| 88| 88
A1 |2016-07-01| 77| 82.5
A1 |2016-08-01| 66| 77
A1 |2016-09-01| 55| 66
A1 |2016-10-01| 44| 55
A1 |2016-11-01| 33| 44
A1 |2016-12-01| 22| 33
A2 |2016-07-01| 48| 48
A2 |2016-09-01| 36| 42
A2 |2016-10-01| 24| 30
A2 |2016-12-01| 12| 18
Thanks Dudu, thanks Amit
So Teradata does not support RANGE ...
Now it becomes trickier.
Found a working solution, but it needs some explanation.
Hope the comments are enough.
WITH
-- the input data
foo(student_id,month1,fees) AS (
SELECT 'A1',DATE '2016-12-01',22
UNION ALL SELECT 'A1',DATE '2016-11-01',33
UNION ALL SELECT 'A1',DATE '2016-10-01',44
UNION ALL SELECT 'A1',DATE '2016-09-01',55
UNION ALL SELECT 'A1',DATE '2016-08-01',66
UNION ALL SELECT 'A1',DATE '2016-07-01',77
UNION ALL SELECT 'A1',DATE '2016-06-01',88
UNION ALL SELECT 'A2',DATE '2016-12-01',12
UNION ALL SELECT 'A2',DATE '2016-10-01',24
UNION ALL SELECT 'A2',DATE '2016-09-01',36
UNION ALL SELECT 'A2',DATE '2016-07-01',48
)
-- add two columns fees1before and fees2before, that can be null,
-- containing the fees of the two previous rows if the 'month1'
-- value of those rows is less than 3 months back
, foo_3_months AS (
SELECT
student_id
, month1
, fees AS fees_now
, CASE
WHEN
MONTHS_BETWEEN(month1,LAG(month1) OVER (PARTITION BY student_id ORDER BY month1))
< 3
THEN LAG(fees) OVER (PARTITION BY student_id ORDER BY month1)
END AS fees_1before
, CASE
WHEN MONTHS_BETWEEN(month1,LAG(month1,2) OVER (PARTITION BY student_id ORDER BY month1))
< 3
THEN LAG(fees,2) OVER (PARTITION BY student_id ORDER BY month1)
END AS fees_2before
FROM foo
)
-- finally, build a hard-wired average formula that takes care of
-- the fact that two of the three values can be NULL
-- I'm keeping the two additional columns for debugging purposes.
-- They can be removed in the end.
SELECT
*
, (fees_now+NVL(fees_1before,0)+NVL(fees_2before,0) )
/ (
1
+ (CASE WHEN fees_1before IS NOT NULL THEN 1 ELSE 0 END)
+ (CASE WHEN fees_2before IS NOT NULL THEN 1 ELSE 0 END)
)
AS rolling_avg_3months
FROM foo_3_months
;
Here's the result:
student_id|month1 |fees_now|fees_1before|fees_2before|rolling_avg_3months
A1 |2016-06-01| 88|- |- |88.000000000000000000
A1 |2016-07-01| 77| 88|- |82.500000000000000000
A1 |2016-08-01| 66| 77| 88|77.000000000000000000
A1 |2016-09-01| 55| 66| 77|66.000000000000000000
A1 |2016-10-01| 44| 55| 66|55.000000000000000000
A1 |2016-11-01| 33| 44| 55|44.000000000000000000
A1 |2016-12-01| 22| 33| 44|33.000000000000000000
A2 |2016-07-01| 48|- |- |48.000000000000000000
A2 |2016-09-01| 36| 48|- |42.000000000000000000
A2 |2016-10-01| 24| 36|- |30.000000000000000000
A2 |2016-12-01| 12| 24|- |18.000000000000000000
Not an easy task - and maybe a Request for Enhancement to Teradata?
Happy playing -
Marco the Sane
I just finished a second way of dealing with your need. Took me two hours, though, and I had to find the time to do that. But I was curious myself, so it's not wasted time.
The advantage of this approach is that it is more flexible. It's easier to change if you need 4, 5 or 6 months instead of 3, and you don't have to cater for the possibility of components to be NULL, because you can use the normal AVG() OVER().
The downside is a much more complex data preparation phase: you have to fill gaps with rows containing NULLs for the measure, and create a list of all possible firsts of month between the smallest month1 and the greatest month1 values in the base table. For this purpose, I mimic the TIMESERIES clause from Vertica.
The solution contains a lot that I think will be handy in the survival kit of anybody digging deeper into SQL, like creating in-line tables of consecutive integers, and time series out of these. That's also why I create a series of 100 integers when 7 would be enough. It also shows that a CROSS JOIN is not always a disaster.
I tried to sufficiently comment what I'm doing here, I hope it's enough.
-- WITHOUT RANGE BETWEEN - vertical version
WITH
-- the input data
foo(student_id,month1,fees) AS (
SELECT 'A1',DATE '2016-12-01',22
UNION ALL SELECT 'A1',DATE '2016-11-01',33
UNION ALL SELECT 'A1',DATE '2016-10-01',44
UNION ALL SELECT 'A1',DATE '2016-09-01',55
UNION ALL SELECT 'A1',DATE '2016-08-01',66
UNION ALL SELECT 'A1',DATE '2016-07-01',77
UNION ALL SELECT 'A1',DATE '2016-06-01',88
UNION ALL SELECT 'A2',DATE '2016-12-01',12
UNION ALL SELECT 'A2',DATE '2016-10-01',24
UNION ALL SELECT 'A2',DATE '2016-09-01',36
UNION ALL SELECT 'A2',DATE '2016-07-01',48
)
-- 1. Mimick Vertica's TIMESERIES clause:
-- Prepare a series of month-start dates
-- from the first month to the last month
-- of the time series. Assuming it's more than
-- 10 months:
-- 1.a A series of 100 ints starting from 0
-- 1.a.1 start with 10 ints
, ten_ints(idx) AS (
SELECT 0
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
)
-- 1.a.2 make 100 out of 10
, idx_series AS (
SELECT
tens.idx * 10 + units.idx AS idx
FROM ten_ints units
CROSS JOIN ten_ints tens
)
-- 1.b get limit dates and total month count
, month_limits AS (
SELECT
MIN(month1) AS start_month
, MAX(month1) AS end_month
, MONTHS_BETWEEN(MAX(month1), MIN(month1)) AS monthcount
FROM foo
)
-- 1.c create an artificial list of all student_id
-- and all possible months to fill gaps
-- This is the end of the TIMESERIES mimick.
, student_month_list AS (
SELECT
student_id
, ADD_MONTHS(start_month,idx) AS month1
FROM month_limits
JOIN idx_series
ON idx <= monthcount
CROSS
JOIN (
SELECT DISTINCT student_id FROM foo
) bar
)
-- This returns:
-- student_id|month1
-- A1 |2016-06-01
-- A1 |2016-07-01
-- A1 |2016-08-01
-- A1 |2016-09-01
-- A1 |2016-10-01
-- A1 |2016-11-01
-- A1 |2016-12-01
-- A2 |2016-06-01
-- A2 |2016-07-01
-- A2 |2016-08-01
-- A2 |2016-09-01
-- A2 |2016-10-01
-- A2 |2016-11-01
-- A2 |2016-12-01
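-- Main query:
-- left join student_month_list to the base table,
-- compute the window average over the gap-filled series,
-- and only then filter out the rows whose measure is NULL
-- (a WHERE in the same query block would run before the window function)
SELECT
student_id
, month1
, running_avg_3months
FROM (
SELECT
mth.student_id
, mth.month1
, foo.fees
, AVG(foo.fees) OVER (
PARTITION BY mth.student_id ORDER BY mth.month1
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS running_avg_3months
FROM student_month_list mth
LEFT JOIN foo
ON foo.student_id = mth.student_id
AND foo.month1 = mth.month1
) filled
WHERE fees IS NOT NULL
ORDER BY 1,2
;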
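The expected numbers for the student with gaps can be cross-checked outside the database. The plain-Python sketch below averages only the fees that fall inside the three calendar months ending at the current month; the helper names `month_index` and `rolling_avg` are made up for illustration, and the data is student A2 from the question.

```python
from datetime import date

fees = {
    "A2": {
        date(2016, 12, 1): 12,
        date(2016, 10, 1): 24,
        date(2016, 9, 1): 36,
        date(2016, 7, 1): 48,
    },
}


def month_index(d):
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month


def rolling_avg(student_id, month, window=3):
    """Average of the fees present in the `window` calendar months ending at `month`."""
    hi = month_index(month)
    values = [
        fee
        for m, fee in fees[student_id].items()
        if hi - window < month_index(m) <= hi
    ]
    return sum(values) / len(values) if values else None


a2 = {m.isoformat(): rolling_avg("A2", m) for m in sorted(fees["A2"])}
print(a2)
```

For 2016-12 only the December and October fees qualify, because November is missing and September is outside the window, giving (12 + 24) / 2 = 18.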
I want to find the gaps between date ranges with an SQL query. Here is the situation:
I have an employees table like the one below. Every month the employee should receive a payment.
ID Name From_date To_date Paid_Amount
1 ali 01/01/2002 31/01/2002 300
2 ali 01/02/2002 28/02/2002 300
3 ali 01/04/2002 30/04/2002 300
4 ali 01/05/2002 31/05/2002 300
5 ali 01/07/2002 31/07/2002 300
Now, we notice there are no payments in March and June.
So how can I get these months with an SQL query?
Try this,
with mine(ID,Name,From_date,To_date,Paid_Amount) as
(
select 1,'ali','01/01/2002','31/01/2002',300 from dual union all
select 2,'ali','01/02/2002','28/02/2002',300 from dual union all
select 3,'ali','01/04/2002','30/04/2002',300 from dual union all
select 4,'ali','01/05/2002','31/05/2002',300 from dual union all
select 5,'ali','01/07/2002','31/07/2002',300 from dual
),
gtfirst (fromdt,todt) as (
select min(to_Date(from_Date,'dd/mm/yyyy')) fromdt,max(to_Date(to_Date,'dd/mm/yyyy')) todt from mine
),
dualtbl(first,last,fromdt,todt) as
(
select * from(select TRUNC(ADD_MONTHS(fromdt, rownum-1), 'MM') AS first,TRUNC(LAST_DAY(ADD_MONTHS(fromdt, rownum-1))) AS last,fromdt,todt from gtfirst connect by level <=12)
where first between fromdt and todt and last between fromdt and todt
)
select to_char(first,'month') no_payment_date from dualtbl where first not in (select to_Date(from_Date,'dd/mm/yyyy') from mine)
and first not in (select to_Date(to_date,'dd/mm/yyyy') from mine)
If you want to get the date difference between one payment date and the previous payment date and the ID field is sequential, then you may simply join back to the table and select the previous row.
SELECT X.From_date, Y.From_date, Y.From_date - X.From_date Difference
FROM Employees X
LEFT OUTER JOIN Employees Y ON Y.ID = X.ID - 1
If the ID field is not sequential, then you can use a similar method, but build a temporary table with a row index that you can use to join back to the previous payment.
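For comparison, the month-gap check itself is small enough to sketch in plain Python: walk every month between the first and the last payment and report the ones without a payment. The helper name `missing_months` is hypothetical, and the dates are the From_date values from the question.

```python
from datetime import date

paid_from = [
    date(2002, 1, 1),
    date(2002, 2, 1),
    date(2002, 4, 1),
    date(2002, 5, 1),
    date(2002, 7, 1),
]


def missing_months(from_dates):
    """Return (year, month) pairs between the first and last payment with no payment."""
    paid = {(d.year, d.month) for d in from_dates}
    first, last = min(paid), max(paid)
    missing = []
    year, month = first
    while (year, month) <= last:
        if (year, month) not in paid:
            missing.append((year, month))
        # advance to the next calendar month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


gaps = missing_months(paid_from)
print(gaps)
```

With several employees, the same walk would be done per name, each with its own first and last payment month.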