SQL - Snowflake - Inner Join not working as expected - sql

I have a table ADS in Snowflake like so (data is inserted each day). Note the duplicate entries in rows 3 and 4:
ID  REPORT_DATE  CLICKS  IMPRESSIONS
1   Jan 01       20      400
1   Jan 02       25      600
1   Jan 03       80      900
1   Jan 03       80      900
2   Jan 01       30      500
2   Jan 02       55      650
2   Jan 03       90      950
I want to select all entries based on ID with the max REPORT_DATE - essentially I want to know the latest number of CLICKS and IMPRESSIONS for each ID:
ID  REPORT_DATE  CLICKS  IMPRESSIONS
1   Jan 03       80      900
2   Jan 03       90      950
This query successfully gives me the max DATE for each ID:
SELECT
MAX(REPORT_DATE),
ID
FROM ADS
GROUP BY
ID;
Result:
ID  MAX(REPORT_DATE)
1   Jan 03
2   Jan 03
However, when I try to conduct an inner join, duplicates arise:
SELECT
a.ID,
a.REPORT_DATE,
a.CLICKS,
a.IMPRESSIONS
FROM ADS a
INNER JOIN (
SELECT
MAX(REPORT_DATE) AS REPORT_DATE,
ID
FROM ADS
GROUP BY
ID
) b
ON a.ID = b.ID
AND a.REPORT_DATE = b.REPORT_DATE;
Result:
ID  REPORT_DATE  CLICKS  IMPRESSIONS
1   Jan 03       80      900
1   Jan 03       80      900
2   Jan 03       90      950
How can I construct my query to remove these duplicates?

You could use QUALIFY and ROW_NUMBER():
SELECT a.ID,a.REPORT_DATE,a.CLICKS,a.IMPRESSIONS
FROM ADS a
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY REPORT_DATE DESC) = 1;
Please note that ORDER BY REPORT_DATE alone is not stable in case of a tie. I would suggest adding another column to the ORDER BY so that the sort key is always unique.
If the tied rows are identical, as in your sample data, it is not actually an issue.

You can use row_number() window function:
select id, report_date, clicks, impressions from
(
select id, report_date, clicks, impressions,
row_number() over (partition by id order by report_date desc) rn
from ADS
) t
where rn = 1
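To see why the join returns the duplicate and the ROW_NUMBER() approach does not, here is a minimal sketch that runs the same logic through Python's sqlite3 module. SQLite is only a stand-in for Snowflake here (it has no QUALIFY, so the subquery form is used), the year 2023 is an assumption because the sample only shows "Jan 01", and the extra ORDER BY columns are a hypothetical tiebreaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER, report_date TEXT, clicks INTEGER, impressions INTEGER)"
)
conn.executemany(
    "INSERT INTO ads VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-01", 20, 400),
        (1, "2023-01-02", 25, 600),
        (1, "2023-01-03", 80, 900),
        (1, "2023-01-03", 80, 900),  # exact duplicate of the previous row
        (2, "2023-01-01", 30, 500),
        (2, "2023-01-02", 55, 650),
        (2, "2023-01-03", 90, 950),
    ],
)

# The join keeps every row of ads that matches the max date, duplicates included.
join_rows = conn.execute("""
    SELECT a.id, a.report_date, a.clicks, a.impressions
    FROM ads a
    INNER JOIN (SELECT id, MAX(report_date) AS report_date
                FROM ads GROUP BY id) b
      ON a.id = b.id AND a.report_date = b.report_date
    ORDER BY a.id
""").fetchall()

# ROW_NUMBER() numbers the rows inside each id, so rn = 1 keeps exactly one.
# clicks and impressions are added to the ORDER BY only as a tiebreaker.
latest_rows = conn.execute("""
    SELECT id, report_date, clicks, impressions
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY report_date DESC, clicks DESC, impressions DESC
               ) AS rn
        FROM ads a
    )
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(len(join_rows), "rows from the join")
print(latest_rows)
```

The join is doing what it was asked to do: both copies of the Jan 03 row for ID 1 match the max date, so both come back.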

Related

How to remove duplicate records based on some condition in Oracle SQL?

I have obtained this table by using multiple joins
E_name s_date year h_value l_value update_date
a 01-08-2012 2012 25 70 01-01-2012
a 23-06-2012 2010 20 55 01-01-2009
a 19-03-2020 2020 210 540 29-04-2020
a 14-02-2020 2020 78 765 29-04-2020
b 27-12-2018 2018 14 29 31-01-2019
b 19-12-2018 2018 17 30 19-12-2018
I want to remove duplicates based on E_name and year.
If two records have the same E_name and year, then
the row with the most recent update_date should be kept;
if both update_date values are the same, the row with the most recent s_date should be kept.
Required Output
E_name s_date year h_value l_value update_date
a 01-08-2012 2012 25 70 01-01-2012
a 23-06-2012 2010 20 55 01-01-2009
a 19-03-2020 2020 210 540 29-04-2020
b 27-12-2018 2018 14 29 31-01-2019
You need a group by and a row_number() on top
Select * from
( Select e_name,"year",
maxdate,update_date,
row_number() over (partition by e_name,"year" order by
update_date desc) as rn
from
( Select e_name,"year",
update_date,max(s_date) as maxdate from
sample
group by
e_name,"year",update_date
)
)
where rn =1
Check the output in this fiddle: http://sqlfiddle.com/#!4/c1646/23
A query like the one below may do the trick. Put your data into a temp table and apply the query to that table:
with MyCTE
as
(
select
E_Name
,S_Date
,year
,H_value
,L_value
,update_date
, RANK() over (partition by E_Name,year order by update_date desc,S_DATE desc) as ranking
from TempTable
)
select * from MyCTE where ranking=1
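As a rough check of the RANK() approach, the sketch below loads the sample rows into an in-memory SQLite database via Python. SQLite stands in for Oracle here, and the dates are rewritten as ISO strings (an assumption of this sketch) so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE temp_table (
        e_name TEXT, s_date TEXT, year INTEGER,
        h_value INTEGER, l_value INTEGER, update_date TEXT)
""")
conn.executemany(
    "INSERT INTO temp_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("a", "2012-08-01", 2012, 25, 70, "2012-01-01"),
        ("a", "2012-06-23", 2010, 20, 55, "2009-01-01"),
        ("a", "2020-03-19", 2020, 210, 540, "2020-04-29"),
        ("a", "2020-02-14", 2020, 78, 765, "2020-04-29"),
        ("b", "2018-12-27", 2018, 14, 29, "2019-01-31"),
        ("b", "2018-12-19", 2018, 17, 30, "2018-12-19"),
    ],
)

# Rank rows inside each (e_name, year) group: newest update_date first,
# then newest s_date to break ties on update_date.
kept = conn.execute("""
    WITH ranked AS (
        SELECT t.*,
               RANK() OVER (
                   PARTITION BY e_name, year
                   ORDER BY update_date DESC, s_date DESC
               ) AS ranking
        FROM temp_table t
    )
    SELECT e_name, s_date, year, h_value, l_value, update_date
    FROM ranked
    WHERE ranking = 1
    ORDER BY e_name, year
""").fetchall()

for row in kept:
    print(row)
```

One design note: RANK() gives the same rank to rows that tie on both sort keys, so such rows would all be returned. ROW_NUMBER() would keep exactly one of them.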

fetching records for previous month

item loc year month quantity startdate
XYZ A 2020 1 3 23-06-2020
ABC B 2020 2 218 24-06-2020
SDC C 2020 6 107 25-06-2020
QWE D 2020 7 144 25-06-2020
XYZ A 2019 12 89 23-06-2020
ABC B 2019 11 218 24-06-2020
SDC C 2020 5 117 25-06-2020
QWE D 2020 6 144 25-06-2020
If I consider the above table, my output should look like this:
item loc year month quantity startdate
XYZ A 2020 1 89 23-06-2020
ABC B 2020 2 3 24-06-2020
SDC C 2020 6 117 25-06-2020
QWE D 2020 7 144 25-06-2020
So you can see that only the quantity values changed: they are taken from the previous month, and the rest of the column values stay as they are.
It looks like you want window function lag(). For your sample data, this would produce the desired results:
select *
from (
select
item,
loc,
year,
month,
lag(quantity) over(partition by item, loc order by year, month) quantity,
startdate
from mytable
) t
where quantity is not null
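The lag() query above can be tried out with a small sketch like this one, which loads the sample rows into SQLite through Python. SQLite is an assumption of the sketch (the question does not name a database), and startdate is kept as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        item TEXT, loc TEXT, year INTEGER, month INTEGER,
        quantity INTEGER, startdate TEXT)
""")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("XYZ", "A", 2020, 1, 3, "23-06-2020"),
        ("ABC", "B", 2020, 2, 218, "24-06-2020"),
        ("SDC", "C", 2020, 6, 107, "25-06-2020"),
        ("QWE", "D", 2020, 7, 144, "25-06-2020"),
        ("XYZ", "A", 2019, 12, 89, "23-06-2020"),
        ("ABC", "B", 2019, 11, 218, "24-06-2020"),
        ("SDC", "C", 2020, 5, 117, "25-06-2020"),
        ("QWE", "D", 2020, 6, 144, "25-06-2020"),
    ],
)

# LAG() reads quantity from the previous row of the same item/loc,
# ordered by year and month. The first row of each group has no
# predecessor, gets NULL, and is dropped by the outer filter.
prev_rows = conn.execute("""
    SELECT *
    FROM (
        SELECT item, loc, year, month,
               LAG(quantity) OVER (
                   PARTITION BY item, loc ORDER BY year, month
               ) AS quantity,
               startdate
        FROM mytable
    ) t
    WHERE quantity IS NOT NULL
    ORDER BY item
""").fetchall()

for row in prev_rows:
    print(row)
```

Note that lag() takes the previous row that exists, not necessarily the previous calendar month, so a gap in the months still yields a value.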
Consider this query, which works in an Access database:
SELECT Table1.*, (SELECT TOP 1 quantity FROM Table1 AS Dupe
WHERE Dupe.item = Table1.item AND Dupe.loc = Table1.loc
AND DateSerial(Dupe.[Year],Dupe.[Month],1)<DateSerial(Table1.[Year],Table1.[Month],1)
ORDER BY DateSerial(Dupe.[Year],Dupe.[Month],1) DESC) AS PrevQty
FROM Table1;
If you want to return 0 when there is a gap in month sequence, consider:
SELECT Table1.*, Nz((SELECT quantity FROM Table1 AS Dupe
WHERE Dupe.item = Table1.item AND Dupe.loc = Table1.loc
AND DateSerial(Dupe.[Year],Dupe.[Month],1)=DateAdd("m",-1,DateSerial(Table1.[Year],Table1.[Month],1))
ORDER BY DateSerial(Dupe.[Year],Dupe.[Month],1)),0) AS PrevQty
FROM Table1;
Or
SELECT Q1.*, Nz(Q2.quantity,0) AS PrevQty FROM (
SELECT Table1.*, DateSerial([Year],[Month],1) AS FD FROM Table1) AS Q1
LEFT JOIN (
SELECT Table1.*, DateAdd("m",+1,DateSerial([Year],[Month],1)) AS PD FROM Table1) AS Q2
ON Q1.FD=Q2.PD AND Q1.item=Q2.item and Q1.loc=Q2.loc;

Grouped conditional sum in Oracle SQL

my_table shows the account balance of each person's credits N months ago. From this table, I want to get the monthly sum of each person's balances for the past 2 and 3 months and divide each sum by 2 and 3 respectively (that is, a moving average of the sum of balance for the last 2 and 3 months).
Please note that I need the sum of the balance in the past M months divided by M months.
PERSON_ID CRED_ID MONTHS_BEFORE BALANCE
01 01 1 1100
01 01 2 1500
01 01 3 2000
01 02 1 50
01 02 2 400
01 02 3 850
02 06 1 300
02 06 2 320
02 11 1 7500
02 11 2 10000
One way to do this would be to:
select
person_id, sum(balance) / 2 as ma_2
from
my_table
where
months_before <= 2
group by
person_id
and merge this result with
select
person_id, sum(balance) / 3 as ma_3
from
my_table
where
months_before <= 3
group by
person_id
I want to know if this can be handled with a case or a conditional sum or something along these lines:
select
person_id,
sum(balance) over (partition by person_id when months_before <= 2) / 2 as ma_2,
sum(balance) over (partition by person_id when months_before <= 3) / 3 as ma_3
from
my_table
The desired result would look as follows:
PERSON_ID MA_2 MA_3
01 1525.00 1966.66
02 9060.00 9060.00
If these two queries give what you want and you just need to merge them, then only ma_2 needs a conditional sum:
select person_id,
sum(case when months_before <= 2 then balance end) / 2 as ma_2,
sum(balance) / 3 as ma_3
from my_table
where months_before <= 3
group by person_id
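Here is a minimal sketch of the conditional sum, run against the sample rows in SQLite via Python. SQLite is a stand-in for Oracle, and the divisors are written as 2.0 and 3.0 because SQLite would otherwise do integer division on integer balances (Oracle's NUMBER division does not have that issue).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        person_id TEXT, cred_id TEXT, months_before INTEGER, balance INTEGER)
""")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("01", "01", 1, 1100), ("01", "01", 2, 1500), ("01", "01", 3, 2000),
        ("01", "02", 1, 50), ("01", "02", 2, 400), ("01", "02", 3, 850),
        ("02", "06", 1, 300), ("02", "06", 2, 320),
        ("02", "11", 1, 7500), ("02", "11", 2, 10000),
    ],
)

# The WHERE clause limits rows to the last 3 months for both columns;
# the CASE expression narrows ma_2 further to the last 2 months.
averages = conn.execute("""
    SELECT person_id,
           SUM(CASE WHEN months_before <= 2 THEN balance END) / 2.0 AS ma_2,
           SUM(balance) / 3.0 AS ma_3
    FROM my_table
    WHERE months_before <= 3
    GROUP BY person_id
    ORDER BY person_id
""").fetchall()

for person_id, ma_2, ma_3 in averages:
    print(person_id, round(ma_2, 2), round(ma_3, 2))
```

Because ma_3 always divides by 3, a person with only two months of history (like person 02) gets the two-month sum divided by 3.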
If you had a "month" column, you would use a window function:
select t.*,
avg(balance) over (partition by person_id
order by month
rows between 2 preceding and current row
) as avg_3month
from t;
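The window-frame version can be sketched the same way. The table t, its month column, and the balances below are all made up for illustration, since the original table has no month column; SQLite via Python is again only a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one balance per person per month.
conn.execute("CREATE TABLE t (person_id TEXT, month INTEGER, balance INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("01", 1, 100), ("01", 2, 200), ("01", 3, 300), ("01", 4, 400)],
)

# The frame covers the current row plus up to two rows before it,
# so early rows average over fewer than three months.
rolling = conn.execute("""
    SELECT person_id, month, balance,
           AVG(balance) OVER (
               PARTITION BY person_id
               ORDER BY month
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS avg_3month
    FROM t
    ORDER BY person_id, month
""").fetchall()

for row in rolling:
    print(row)
```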

SQL NOOB - Oracle joins and Row Number

I was hoping to get some guidance on a SQL script I am trying to put together for Oracle database 11g.
I am attempting to perform a count of claims from the 'claim' table, and order them by year / month / and enterprise.
I was able to get a count of claims and order them like I would like, however I need to pull data from another table and I am having trouble combining the 'row_number' function with a join.
Here is my script so far:
SELECT TO_CHAR (SYSTEM_ENTRY_DATE, 'YYYY') YEAR,
TO_CHAR (SYSTEM_ENTRY_DATE, 'MM') MONTH,
ENTERPRISE_IID,
COUNT (*) CLAIMS
FROM (SELECT CLAIM.CLAIM_EID,
CLAIM.SYSTEM_ENTRY_DATE,
CLAIM.ENTERPRISE_IID,
ROW_NUMBER () OVER (PARTITION BY CLAIM.CLAIM_EID, CLAIM.ENTERPRISE_IID
ORDER BY CLAIM.SYSTEM_ENTRY_DATE DESC) RN
FROM CLAIM
WHERE CLAIM_IID IN (SELECT DISTINCT (CLAIM_IID)
FROM CLAIM_LINE
WHERE STATUS <> 'D')
AND CLAIM.CONTEXT = '1'
AND CLAIM.CLAIM_STATUS = 'A'
AND CLAIM.LAST_ANALYSIS_DATE IS NOT NULL)
WHERE RN = 1
GROUP BY ENTERPRISE_IID,
TO_CHAR (SYSTEM_ENTRY_DATE, 'YYYY'),
TO_CHAR (SYSTEM_ENTRY_DATE, 'MM');
So far all of my data is coming from the 'claim' table. This pulls the following result:
YEAR MONTH ENTERPRISE_IID CLAIMS
---- ----- -------------- ----------
2016 01 6 1
2015 08 6 3
2016 02 6 2
2015 09 6 2
2015 07 6 2
2015 09 5 22
2015 11 5 29
2015 12 5 27
2016 04 5 8
2015 07 5 29
2015 05 5 15
2015 06 5 5
2015 10 5 45
2016 03 5 54
2015 03 5 10
2016 02 5 70
2016 01 5 55
2015 08 5 32
2015 04 5 12
19 rows selected.
The enterprise_IID is the primary key on the 'enterprise' table. The 'enterprise' table also contains the 'name' attribute for each entry. I would like to join the claim and enterprise table in order to show the enterprise name for this count, and not the enterprise_IID.
As you can tell I am rather new to Oracle and SQL, and I am a bit stuck on this one. I was thinking that I should do an inner join between the two tables, but I am not quite sure how to do that when using the row_number function.
Or perhaps I am taking the wrong approach here, and someone could push me in another direction.
Here is what I tried:
SELECT TO_CHAR (SYSTEM_ENTRY_DATE, 'YYYY') YEAR,
TO_CHAR (SYSTEM_ENTRY_DATE, 'MM') MONTH,
ENTERPRISE_IID,
ENTERPRISE.NAME,
COUNT (*) CLAIMS
FROM (SELECT CLAIM.CLAIM_EID,
CLAIM.SYSTEM_ENTRY_DATE,
CLAIM.ENTERPRISE_IID,
ROW_NUMBER () OVER (PARTITION BY CLAIM.CLAIM_EID, CLAIM.ENTERPRISE_IID
ORDER BY CLAIM.SYSTEM_ENTRY_DATE DESC) RN
FROM CLAIM, enterprise
INNER JOIN ENTERPRISE
ON CLAIM.ENTERPRISE_IID = ENTERPRISE.ENTERPRISE_IID
WHERE CLAIM_IID IN (SELECT DISTINCT (CLAIM_IID)
FROM CLAIM_LINE
WHERE STATUS <> 'D')
AND CLAIM.CONTEXT = '1'
AND CLAIM.CLAIM_STATUS = 'A'
AND CLAIM.LAST_ANALYSIS_DATE IS NOT NULL)
WHERE RN = 1
GROUP BY ENTERPRISE.NAME,
ENTERPRISE_IID,
TO_CHAR (SYSTEM_ENTRY_DATE, 'YYYY'),
TO_CHAR (SYSTEM_ENTRY_DATE, 'MM');
Thank you in advance!
"Desired Output"
YEAR MONTH NAME CLAIMS
---- ----- ---- ----------
2016 01 Ent1 1
2015 08 Ent1 3
2016 02 Ent1 2
2015 09 Ent1 2
2015 07 Ent1 2
2015 09 Ent2 22
2015 11 Ent2 29
2015 12 Ent2 27
2016 04 Ent2 8
2015 07 Ent2 29
2015 05 Ent2 15
2015 06 Ent2 5
2015 10 Ent2 45
2016 03 Ent2 54
2015 03 Ent2 10
2016 02 Ent2 70
2016 01 Ent2 55
2015 08 Ent2 32
2015 04 Ent2 12
19 rows selected.
You can try this. Joins can be used when calculating row numbers with row_number function.
SELECT TO_CHAR (SYSTEM_ENTRY_DATE, 'YYYY') YEAR,
TO_CHAR (SYSTEM_ENTRY_DATE, 'MM') MONTH,
ENTERPRISE_IID,
NAME,
COUNT (*) CLAIMS
FROM (SELECT CLAIM.CLAIM_EID,
CLAIM.SYSTEM_ENTRY_DATE,
CLAIM.ENTERPRISE_IID,
ENTERPRISE.NAME,
ROW_NUMBER () OVER (PARTITION BY CLAIM.CLAIM_EID, CLAIM.ENTERPRISE_IID
ORDER BY CLAIM.SYSTEM_ENTRY_DATE DESC) RN
FROM CLAIM --, enterprise (this is not required as the table is being joined already)
INNER JOIN ENTERPRISE ON CLAIM.ENTERPRISE_IID = ENTERPRISE.ENTERPRISE_IID
INNER JOIN (SELECT DISTINCT CLAIM_IID FROM CLAIM_LINE WHERE STATUS <> 'D') CLAIM_LINE
ON CLAIM.CLAIM_IID = CLAIM_LINE.CLAIM_IID
WHERE CLAIM.CONTEXT = '1'
AND CLAIM.CLAIM_STATUS = 'A'
AND CLAIM.LAST_ANALYSIS_DATE IS NOT NULL) t
WHERE RN = 1
GROUP BY NAME, --ENTERPRISE.NAME (The alias ENTERPRISE is not accessible here.)
ENTERPRISE_IID,
TO_CHAR(SYSTEM_ENTRY_DATE, 'YYYY'),
TO_CHAR(SYSTEM_ENTRY_DATE, 'MM');
I'd write the query like this:
SELECT TO_CHAR(TRUNC(c.system_entry_date,'MM'),'YYYY') AS year
, TO_CHAR(TRUNC(c.system_entry_date,'MM'),'MM') AS month
, e.name AS name
, COUNT(*) AS claims
FROM (
SELECT r.claim_eid
, r.enterprise_iid
, MAX(r.system_entry_date) AS system_entry_date
FROM ( SELECT DISTINCT l.claim_iid
FROM claim_line l
WHERE l.status <> 'D'
) d
JOIN claim r
ON r.claim_iid = d.claim_iid
AND r.context = '1'
AND r.claim_status = 'A'
AND r.last_analysis_date IS NOT NULL
GROUP
BY r.claim_eid
, r.enterprise_iid
) c
JOIN enterprise e
ON e.enterprise_iid = c.enterprise_iid
GROUP
BY c.enterprise_iid
, TRUNC(c.system_entry_date,'MM')
, e.name
ORDER
BY e.name
, TRUNC(c.system_entry_date,'MM')
A few notes:
I prefer to qualify ALL column references with the table name or short table alias, and assign aliases to all inline views.
Since the usage of ROW_NUMBER() appears to be to get the "latest" system_entry_date for a claim and eliminate duplicates, I'd prefer to use a GROUP BY and a MAX() aggregate.
I prefer to use a join operation rather than the IN (subquery) pattern. (Or, I would tend to use an EXISTS (correlated subquery) pattern.)
I don't think it matters too much if you use TO_CHAR or EXTRACT. TO_CHAR gets you the leading zero in the month; I don't think EXTRACT(MONTH ) gets you the leading zero. I'd use whichever gets me closest to the resultset I need. Personally, I would return just a single column, either containing the year and month as one string, e.g. TO_CHAR( , 'YYYYMM'), or just a DATE value. It all depends on what I'm going to be doing with that.
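To illustrate the point about ROW_NUMBER() versus GROUP BY with MAX(), here is a small sketch on made-up data, run in SQLite via Python. The rows, the enterprise names, and the use of strftime in place of Oracle's TO_CHAR are all assumptions of the sketch, and the claim_line and status filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enterprise (enterprise_iid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE claim (claim_eid TEXT, enterprise_iid INTEGER,
                        system_entry_date TEXT);
    INSERT INTO enterprise VALUES (5, 'Ent2'), (6, 'Ent1');
    INSERT INTO claim VALUES
        ('C1', 5, '2016-01-10'),
        ('C1', 5, '2016-02-03'),
        ('C2', 5, '2016-02-20'),
        ('C3', 6, '2016-01-15');
""")

# Variant 1: number the rows per claim, newest first, and keep rn = 1.
LATEST_BY_ROW_NUMBER = """
    SELECT claim_eid, enterprise_iid, system_entry_date
    FROM (SELECT c.*,
                 ROW_NUMBER() OVER (PARTITION BY claim_eid, enterprise_iid
                                    ORDER BY system_entry_date DESC) AS rn
          FROM claim c)
    WHERE rn = 1
"""

# Variant 2: one row per claim with the latest date via MAX().
LATEST_BY_MAX = """
    SELECT claim_eid, enterprise_iid,
           MAX(system_entry_date) AS system_entry_date
    FROM claim
    GROUP BY claim_eid, enterprise_iid
"""


def monthly_counts(latest_sql):
    """Join the de-duplicated claims to enterprise and count per month."""
    return conn.execute(f"""
        SELECT strftime('%Y', c.system_entry_date) AS yr,
               strftime('%m', c.system_entry_date) AS mon,
               e.name,
               COUNT(*) AS claims
        FROM ({latest_sql}) c
        JOIN enterprise e ON e.enterprise_iid = c.enterprise_iid
        GROUP BY yr, mon, e.name
        ORDER BY e.name, yr, mon
    """).fetchall()


counts_rn = monthly_counts(LATEST_BY_ROW_NUMBER)
counts_max = monthly_counts(LATEST_BY_MAX)
print(counts_rn)
print(counts_max)
```

Both variants reduce the claim rows to one per claim before the join, so the join to enterprise only adds the name and does not change the counts.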
Just a hypothesis to start with, because the required query output is unclear:
SELECT
C.ENTERPRISE_IID,
E.NAME,
extract(year from C.SYSTEM_ENTRY_DATE) SYSTEM_ENTRY_YEAR,
extract(month from C.SYSTEM_ENTRY_DATE) SYSTEM_ENTRY_MONTH,
count(distinct C.CLAIM_EID) CLAIM_COUNT
FROM
CLAIM C,
ENTERPRISE E
WHERE
C.CLAIM_IID IN (
SELECT DISTINCT (CLAIM_IID)
FROM CLAIM_LINE
WHERE STATUS <> 'D'
)
AND C.CONTEXT = '1'
AND C.CLAIM_STATUS = 'A'
AND C.LAST_ANALYSIS_DATE IS NOT NULL
AND E.ENTERPRISE_IID = C.ENTERPRISE_IID
GROUP BY
C.ENTERPRISE_IID,
E.NAME,
extract(year from C.SYSTEM_ENTRY_DATE),
extract(month from C.SYSTEM_ENTRY_DATE)
ORDER BY
extract(year from C.SYSTEM_ENTRY_DATE),
extract(month from C.SYSTEM_ENTRY_DATE),
E.NAME

SQL Report for grouping by date ranges

I have a table which stores data like this:
ItemID Date Value
01 1/1/15 1
01 2/1/15 2
01 3/1/15 0
01 4/1/15 0
01 5/1/15 3
01 6/1/15 1
How do I generate a report in SQL which would show the begin and end dates of all zero periods per item?
In this example, I would get :
ItemID Start End
01 3/1/15 4/1/15
The condition is that there will be multiple zero periods during the year, and all of them should appear in the report (so simple group by will not do).
Thanks very much!
This will return the start and end dates of every continuous run of zero values.
;WITH Cte AS(
SELECT *,
RN = DATEADD(MONTH,- ROW_NUMBER() OVER(PARTITION BY ItemID ORDER BY [Date]), [Date])
FROM Test
WHERE Value = 0
)
SELECT
ItemID,
Start = MIN([Date]),
[End] = MAX([Date])
FROM Cte
GROUP BY
ItemID, RN
Sample Data
ItemID Date Value
------ ---------- -----------
01 2015-01-01 1
01 2015-02-01 2
01 2015-03-01 0
01 2015-04-01 0
01 2015-05-01 3
01 2015-06-01 1
01 2015-07-01 0
01 2015-08-01 0
01 2015-09-01 0
RESULT
ItemID Start End
------ ---------- ----------
01 2015-03-01 2015-04-01
01 2015-07-01 2015-09-01
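The trick in this query is that, within a run of consecutive months, subtracting the row number (in months) from the date gives the same anchor date for every row. Below is a minimal sketch of that idea in SQLite via Python. SQLite's date() modifier replaces DATEADD, and the column names item_id and d are renamed for the sketch; it assumes one row per month, dated the first of the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (item_id TEXT, d TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("01", "2015-01-01", 1), ("01", "2015-02-01", 2),
        ("01", "2015-03-01", 0), ("01", "2015-04-01", 0),
        ("01", "2015-05-01", 3), ("01", "2015-06-01", 1),
        ("01", "2015-07-01", 0), ("01", "2015-08-01", 0),
        ("01", "2015-09-01", 0),
    ],
)

# Among the zero rows only, ROW_NUMBER() grows by 1 per row. Within a
# run of consecutive months the date also grows by 1 month per row, so
# (date - row_number months) stays constant and can serve as a group key.
zero_periods = conn.execute("""
    WITH cte AS (
        SELECT item_id, d,
               date(d, '-' || ROW_NUMBER() OVER (
                                  PARTITION BY item_id ORDER BY d
                              ) || ' months') AS grp
        FROM test
        WHERE value = 0
    )
    SELECT item_id, MIN(d) AS start_date, MAX(d) AS end_date
    FROM cte
    GROUP BY item_id, grp
    ORDER BY item_id, start_date
""").fetchall()

for row in zero_periods:
    print(row)
```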
A more general solution (works with SQL Server 2012+, which added LAG):
with x as (
select *,
case when lag(value) over(partition by itemid order by date) <> value then 1 else 0 end as l
from #t
),
y as (
select *, sum(l) over(partition by itemid order by date) as grp
from x
where value = 0
)
select itemid, min(date), max(date)
from y
group by itemid, grp
order by itemid, grp
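Since this version does not depend on the dates being exactly one month apart, it can be checked on irregular dates. The sketch below does that in SQLite via Python; the rows and the column names item_id and d are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item_id TEXT, d TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("01", "2015-01-05", 1),
        ("01", "2015-01-09", 0),
        ("01", "2015-02-20", 0),
        ("01", "2015-03-01", 2),
        ("01", "2015-03-15", 0),
    ],
)

# Step x: flag each row whose value differs from the previous row's value.
# Step y: keep only zero rows and take a running sum of the flags, so
#         every new run of zeros gets its own group number.
zero_runs = conn.execute("""
    WITH x AS (
        SELECT *,
               CASE WHEN LAG(value) OVER (PARTITION BY item_id ORDER BY d)
                         <> value
                    THEN 1 ELSE 0 END AS l
        FROM t
    ),
    y AS (
        SELECT *,
               SUM(l) OVER (PARTITION BY item_id ORDER BY d) AS grp
        FROM x
        WHERE value = 0
    )
    SELECT item_id, MIN(d) AS start_date, MAX(d) AS end_date
    FROM y
    GROUP BY item_id, grp
    ORDER BY item_id, grp
""").fetchall()

for row in zero_runs:
    print(row)
```

The flag has to be computed in x before the zero filter is applied in y; otherwise LAG() would only ever see zero rows and no run boundary would be detected.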