simple sql over (partition by) not working as expected


Feels like it should be simple but my mind has gone blank so would appreciate any help!
Let's say I have this dataset:
Date        sale_id  salesperson  Missed_payment_this_month
01/01/2016  1001     John         1
01/01/2016  1002     Bob          0
01/01/2016  1003     Bob          0
01/01/2016  1004     John         N/A
01/02/2016  1001     John         1
01/02/2016  1002     Bob          1
01/02/2016  1003     Bob          0
01/02/2016  1004     John         1
01/03/2016  1001     John         1
01/03/2016  1002     Bob          0
01/03/2016  1003     Bob          0
01/03/2016  1004     John         1
I want to add these two columns to the end. They count the missed payments in earlier months, by sale_id and by salesperson.
Previous_missed_payment_by_sale_id Previous_missed_payment_by_salesperson
0 0
0 0
0 0
0 0
1 1
0 0
0 0
0 1
2 3
1 1
0 1
1 3
The sale_id column is OK, but getting it over salespersons either gives me a GROUP BY error or adds extra rows. I need to keep the row count constant.
My best guess, which returns extra rows:
select t1.Date, t1.sale_id, t1.salesperson
,sum(case when t2.Missed_payment_this_month = '1' then 1 else 0 end) previous_missed_sales_id
,sum(case when t2.Missed_payment_this_month = '1' then 1 else 0 end) OVER (PARTITION by t1.salesperson) previous_missed_salesperson
from [dbo].[simple_join_table2] t1
inner join [dbo].[simple_join_table2] t2 on
(t2.[Date] < t1.[Date] AND t1.[sale_id] = t2.[sale_id])
group by t1.Date, t1.sale_id, t1.salesperson
,case when t2.Missed_payment_this_month = '1' then 1 else 0 end
This is the output:
Date sale_id salesperson previous_missed_sales_id previous_missed_salesperson
01/02/2016 1002 Bob 0 1
01/02/2016 1003 Bob 0 1
01/03/2016 1002 Bob 0 1
01/03/2016 1002 Bob 1 1
01/03/2016 1003 Bob 0 1
01/02/2016 1001 John 1 3
01/02/2016 1004 John 0 3
01/03/2016 1001 John 2 3
01/03/2016 1004 John 0 3
01/03/2016 1004 John 1 3
Is this possible without another subquery? I guess another way to put it is that I'm trying to mimic the SUMX and EARLIER functions of Power Pivot.

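If you are on SQL Server 2012+, use windowed aggregates. Previous = sum of all rows up to and including the current date, minus the sum for the current date. When ORDER BY is given without an explicit frame, SQL Server's default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so every row sharing the current date is included in the running sum.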
with [simple_join_table2] as(
-- sample data
select cast(valuesDate as Date) valuesDate, sale_id, salesperson, Missed_payment_this_month
from (
values
('20160101',1001,'John', 1)
,('20160101',1002,'Bob ', 0)
,('20160101',1003,'Bob ', 0)
,('20160101',1004,'John',null)
,('20160201',1001,'John', 1)
,('20160201',1002,'Bob ', 1)
,('20160201',1003,'Bob ', 0)
,('20160201',1004,'John', 1)
,('20160301',1001,'John', 1)
,('20160301',1002,'Bob ', 0)
,('20160301',1003,'Bob ', 0)
,('20160301',1004,'John', 1)
) t(valuesDate, sale_id, salesperson, Missed_payment_this_month)
)
select valuesDate, sale_id, salesperson, Missed_payment_this_month,
-- coalesce stops a NULL (N/A) payment flag from turning the difference into NULL
byidprevmonth = sum(coalesce(Missed_payment_this_month, 0)) over(partition by sale_id order by valuesDate)
- sum(coalesce(Missed_payment_this_month, 0)) over(partition by valuesDate, sale_id),
bypersonprevmonth = sum(coalesce(Missed_payment_this_month, 0)) over(partition by salesperson order by valuesDate)
- sum(coalesce(Missed_payment_this_month, 0)) over(partition by valuesDate, salesperson)
from [simple_join_table2]
order by salesperson, valuesDate
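To try the "running sum minus current period" idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build of 3.25 or later (needed for window functions). The table name sales and the short column names d and missed are made up for the example; the rows are the question's data, with N/A stored as NULL and counted as 0 through COALESCE.

```python
import sqlite3

# The question's data; dates rewritten as ISO strings so they sort correctly.
rows = [
    ("2016-01-01", 1001, "John", 1), ("2016-01-01", 1002, "Bob", 0),
    ("2016-01-01", 1003, "Bob", 0), ("2016-01-01", 1004, "John", None),
    ("2016-02-01", 1001, "John", 1), ("2016-02-01", 1002, "Bob", 1),
    ("2016-02-01", 1003, "Bob", 0), ("2016-02-01", 1004, "John", 1),
    ("2016-03-01", 1001, "John", 1), ("2016-03-01", 1002, "Bob", 0),
    ("2016-03-01", 1003, "Bob", 0), ("2016-03-01", 1004, "John", 1),
]

con = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen only for this sketch.
con.execute("CREATE TABLE sales (d TEXT, sale_id INT, salesperson TEXT, missed INT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Running total through the current date minus the total for the current date.
query = """
SELECT d, sale_id, salesperson,
       SUM(COALESCE(missed, 0)) OVER (PARTITION BY sale_id ORDER BY d)
     - SUM(COALESCE(missed, 0)) OVER (PARTITION BY sale_id, d) AS prev_by_sale,
       SUM(COALESCE(missed, 0)) OVER (PARTITION BY salesperson ORDER BY d)
     - SUM(COALESCE(missed, 0)) OVER (PARTITION BY salesperson, d) AS prev_by_person
FROM sales
ORDER BY d, sale_id
"""
result = con.execute(query).fetchall()
for r in result:
    print(r)
```

The last two values of each printed row should line up with the two columns asked for in the question, and the row count stays at twelve.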

Related

sql finding cid with most expired cards [closed]

I have a table Cards(card_id,status,cid)
With the columns:
cid - customer id
status - exp/vld
card_id - card IDs
How do I find the cid with the most expired cards?

From Oracle 12, you can use:
SELECT cid,
COUNT(*) AS num_exp
FROM cards
WHERE status = 'exp'
GROUP BY cid
ORDER BY num_exp DESC
FETCH FIRST ROW WITH TIES;
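On engines without FETCH FIRST ... WITH TIES, the same tie-keeping behaviour can be had with RANK(). Below is a small sketch in Python with sqlite3 (SQLite 3.25+ assumed for window functions). The rows are invented so that two customers tie for first place; they are not from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cards (card_id INT, status TEXT, cid INT)")
# Hypothetical rows: customers 1 and 2 both have two expired cards.
con.executemany(
    "INSERT INTO cards VALUES (?, ?, ?)",
    [(1, "exp", 1), (2, "exp", 1), (3, "vld", 1),
     (4, "exp", 2), (5, "exp", 2),
     (6, "exp", 3), (7, "vld", 3)],
)

# Count expired cards per customer, rank the counts, keep every rank-1 row.
query = """
WITH counts AS (
    SELECT cid, COUNT(*) AS num_exp
    FROM cards
    WHERE status = 'exp'
    GROUP BY cid
),
ranked AS (
    SELECT cid, num_exp, RANK() OVER (ORDER BY num_exp DESC) AS rnk
    FROM counts
)
SELECT cid, num_exp FROM ranked WHERE rnk = 1 ORDER BY cid
"""
top = con.execute(query).fetchall()
print(top)
```

RANK() gives tied counts the same rank, so both tied customers come back, which is the point of WITH TIES.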

You can count the expired cards for each customer and then pick the customer(s) whose count equals the maximum. The query below should give the result.
WITH t AS(
SELECT cid, count(1) customer_exp_cards_count
FROM Cards where status = 'exp'
group by cid)
SELECT cid FROM t t1
WHERE t1.customer_exp_cards_count IN (SELECT MAX(t2.customer_exp_cards_count)
FROM t t2)
Sample data and its result:
cardid status cid
3 exp 5
1 exp 1
2 exp 1
3 vld 1
5 vld 1
1 exp 2
2 exp 2
3 exp 2
4 vld 2
5 vld 2
6 exp 3
7 vld 4
4 vld 5
Result:
2
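For anyone who wants to replay this, here is a rough sketch that loads the sample rows above into an in-memory SQLite database through Python's sqlite3 module and runs the same CTE. The column name card_id is taken from the question; everything else follows the query above.

```python
import sqlite3

# Sample rows from above as (card_id, status, cid).
cards = [
    (3, "exp", 5), (1, "exp", 1), (2, "exp", 1), (3, "vld", 1), (5, "vld", 1),
    (1, "exp", 2), (2, "exp", 2), (3, "exp", 2), (4, "vld", 2), (5, "vld", 2),
    (6, "exp", 3), (7, "vld", 4), (4, "vld", 5),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Cards (card_id INT, status TEXT, cid INT)")
con.executemany("INSERT INTO Cards VALUES (?, ?, ?)", cards)

# Per-customer expired count, then keep whoever matches the maximum.
query = """
WITH t AS (
    SELECT cid, COUNT(1) AS customer_exp_cards_count
    FROM Cards
    WHERE status = 'exp'
    GROUP BY cid
)
SELECT cid
FROM t
WHERE customer_exp_cards_count = (SELECT MAX(customer_exp_cards_count) FROM t)
ORDER BY cid
"""
winners = [row[0] for row in con.execute(query)]
print(winners)
```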

Suppose you have these two tables (just sample data):
CUSTOMERS
CUST_ID  CUST_NAME  CUST_STATUS
-------  ---------  -----------
101      John       ACTIVE
102      Annie      ACTIVE
103      Jane       ACTIVE
104      Bob        INACTIVE
CARDS
CARD_ID  CARD_STATUS  CUST_ID
-------  -----------  -------
1001001  VALID        101
1001002  VALID        101
1001003  EXPIRED      101
1001004  EXPIRED      101
1001005  VALID        101
1002010  VALID        102
1002020  EXPIRED      102
1002030  EXPIRED      102
1002040  EXPIRED      102
1003100  VALID        103
1003200  VALID        103
If you just want the CUST_ID with the most expired cards, together with that count, you can do it without the CUSTOMERS table:
Select CUST_ID, EXPIRED_CARDS
From (Select CUST_ID, Count(CARD_ID) "EXPIRED_CARDS" From cards Where CARD_STATUS = 'EXPIRED' Group By CUST_ID)
Where EXPIRED_CARDS = (Select Max(EXPIRED_CARDS) From (Select Count(CARD_ID) "EXPIRED_CARDS" From cards Where CARD_STATUS = 'EXPIRED' Group By CUST_ID) )
--
-- R e s u l t
-- CUST_ID EXPIRED_CARDS
-- ---------- -------------
-- 102 3
You could also consider creating a CTE with the data from both tables. That gives you a dataset you can reuse for other questions, not just this one. Something like this:
WITH
customers_cards AS
(
Select
cst.CUST_ID,
cst.CUST_NAME,
cst.CUST_STATUS,
crd.CARD_ID,
crd.CARD_STATUS,
Sum(CASE WHEN crd.CUST_ID Is Null Then 0 Else 1 End) OVER(Partition By crd.CUST_ID) "TOTAL_NUM_OF_CARDS",
Sum(CASE WHEN crd.CARD_ID Is Null Then Null WHEN crd.CARD_STATUS = 'VALID' And crd.CARD_ID Is Not Null Then 1 Else 0 End) OVER(Partition By crd.CUST_ID) "VALID_CARDS",
Sum(CASE WHEN crd.CARD_ID Is Null Then Null WHEN crd.CARD_STATUS = 'EXPIRED' And crd.CARD_ID Is Not Null Then 1 Else 0 End) OVER(Partition By crd.CUST_ID) "EXPIRED_CARDS"
From
customers cst
Left Join
cards crd on(crd.CUST_ID = cst.CUST_ID)
)
/* R e s u l t :
CUST_ID  CUST_NAME  CUST_STATUS  CARD_ID  CARD_STATUS  TOTAL_NUM_OF_CARDS  VALID_CARDS  EXPIRED_CARDS
-------  ---------  -----------  -------  -----------  ------------------  -----------  -------------
101      John       ACTIVE       1001001  VALID        5                   3            2
101      John       ACTIVE       1001002  VALID        5                   3            2
101      John       ACTIVE       1001003  EXPIRED      5                   3            2
101      John       ACTIVE       1001004  EXPIRED      5                   3            2
101      John       ACTIVE       1001005  VALID        5                   3            2
102      Annie      ACTIVE       1002010  VALID        4                   1            3
102      Annie      ACTIVE       1002040  EXPIRED      4                   1            3
102      Annie      ACTIVE       1002030  EXPIRED      4                   1            3
102      Annie      ACTIVE       1002020  EXPIRED      4                   1            3
103      Jane       ACTIVE       1003100  VALID        2                   2            0
103      Jane       ACTIVE       1003200  VALID        2                   2            0
104      Bob        INACTIVE                           0
*/
This can be used to answer many more potential questions. Here is the list of customers sorted by number of expired cards (descending):
Select Distinct
CUST_ID, CUST_NAME, TOTAL_NUM_OF_CARDS, VALID_CARDS, EXPIRED_CARDS
From
customers_cards
Order By
EXPIRED_CARDS Desc Nulls Last, CUST_ID
--
-- R e s u l t :
-- CUST_ID  CUST_NAME  TOTAL_NUM_OF_CARDS  VALID_CARDS  EXPIRED_CARDS
-- -------  ---------  ------------------  -----------  -------------
-- 102      Annie      4                   1            3
-- 101      John       5                   3            2
-- 103      Jane       2                   2            0
-- 104      Bob        0
OR to answer your question:
Select Distinct
CUST_ID, CUST_NAME, TOTAL_NUM_OF_CARDS, VALID_CARDS, EXPIRED_CARDS
From
customers_cards
Where
EXPIRED_CARDS = (Select Max(EXPIRED_CARDS) From customers_cards)
Order By
CUST_ID
--
-- R e s u l t :
-- CUST_ID CUST_NAME TOTAL_NUM_OF_CARDS VALID_CARDS EXPIRED_CARDS
-- ---------- --------- ------------------ ----------- -------------
-- 102 Annie 4 1 3
Regards...

SQL - How to find whether a combination of columns has occurred before or not?

The following example demonstrates the question:
id  location  dt
--  --------  ----------
1   India     2020-01-01
2   Usa       2020-02-01
1   Usa       2020-03-01
3   China     2020-04-01
1   India     2020-05-01
2   France    2020-06-01
1   India     2020-07-01
2   Usa       2020-08-01
This table is sorted by date.
I want to create another column, which would tell if the id has been to the location before or not.
So the output would look like this:
id  location  dt          travelled
--  --------  ----------  ---------
1   India     2020-01-01  0
2   Usa       2020-02-01  0
1   Usa       2020-03-01  0
3   China     2020-04-01  0
1   India     2020-05-01  1
2   France    2020-06-01  0
1   India     2020-07-01  1
2   Usa       2020-08-01  1
The issue I am facing is that, for every row, I need to consider only the rows above it (the earlier dates).

Use EXISTS in a CASE expression:
SELECT t1.id, t1.location, t1.dt,
CASE
WHEN EXISTS (
SELECT 1
FROM tablename t2
-- the date column in the question is named dt
WHERE t2.id = t1.id AND t2.location = t1.location AND t2.dt < t1.dt
) THEN 1
ELSE 0
END travelled
FROM tablename t1

I would strongly recommend window functions for this:
select t.*,
(case when row_number() over (partition by id, location order by dt) > 1
then 1 else 0
end) as travelled
from t;
Window functions are often faster than a correlated subquery, since the table only has to be read once, though that depends on the database and the indexes available.
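As a quick sanity check that the EXISTS form and the ROW_NUMBER() form agree, here is a sketch that runs both against the question's rows in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for the window function). The table name visits is made up for the example.

```python
import sqlite3

# The question's rows as (id, location, dt).
visits = [
    (1, "India", "2020-01-01"), (2, "Usa", "2020-02-01"),
    (1, "Usa", "2020-03-01"), (3, "China", "2020-04-01"),
    (1, "India", "2020-05-01"), (2, "France", "2020-06-01"),
    (1, "India", "2020-07-01"), (2, "Usa", "2020-08-01"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (id INT, location TEXT, dt TEXT)")
con.executemany("INSERT INTO visits VALUES (?, ?, ?)", visits)

# Correlated subquery: is there an earlier row for the same id and location?
exists_sql = """
SELECT t1.id, t1.location, t1.dt,
       CASE WHEN EXISTS (
                SELECT 1 FROM visits t2
                WHERE t2.id = t1.id AND t2.location = t1.location AND t2.dt < t1.dt
            ) THEN 1 ELSE 0 END AS travelled
FROM visits t1
ORDER BY t1.dt
"""

# Window function: any row after the first in its (id, location) group is a repeat.
window_sql = """
SELECT id, location, dt,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id, location ORDER BY dt) > 1
            THEN 1 ELSE 0 END AS travelled
FROM visits
ORDER BY dt
"""

by_exists = con.execute(exists_sql).fetchall()
by_window = con.execute(window_sql).fetchall()
print([row[3] for row in by_window])
```

One difference to keep in mind: if the same id and location could appear twice on the same date, the strict `<` in the EXISTS version and the row numbering would no longer agree, so the two are only interchangeable when dates are unique within each group.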

Removing Duplicates Based on Date

I have the following query to select duplicates (it will later be used in an UPDATE statement). The goal is to remove duplicates based on the minimum service date while keeping the most recent service date.
select st.SubID, st.RecordNo, st.Fname, st.Lname, st.MemberID, st.ServiceDate, IsDeduped, DedupCriteria
from stagingtable st
join (select MemberID
from stagingtable
where SubID = 99999
and waveseqid = 1
group by MemberID
having count(*) > 1) st2
on st.MemberID = st2.MemberID
and st.ServiceDate = (Select min(ServiceDate) from stagingtable s where s.subid = 99999 and s.waveseqid = 1 and st.MemberID = s.MemberID)
where SubID = 99999
and waveseqid = 1
order by RecordNo
This seems to pull in only the rows that share the same date for a MemberID:
SurveyID  RecordNo  Fname  Lname  MemberID  Option9     IsDeduped  DedupCriteria
99999     1         John   Doe    123       10/1/2015   0          NULL           x These show on the query
99999     2         John   Doe    123       10/1/2015   0          NULL           x These show on the query
99999     3         John   Doe    123       10/8/2015   0          NULL           But expected these as well
99999     4         John   Doe    123       10/12/2015  0          NULL           But expected these as well
99999     4         John   Doe    123       10/14/2015  0          NULL           But expected these as well
99999     6         John   Doe    123       10/29/2015  0          NULL           But expected these as well
99999     7         John   Doe    123       12/14/2015  0          NULL           But expected these as well

The last AND condition in your join restricts the results to rows whose ServiceDate equals the minimum service date for that MemberID:
and st.ServiceDate = (Select min(ServiceDate) from stagingtable s where s.subid = 99999 and s.waveseqid = 1 and st.MemberID = s.MemberID)
That's why you get only the two rows dated 10/1/2015 and not all of them.
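To see the effect, and one possible alternative, here is a sketch in Python with sqlite3 (SQLite 3.25+ assumed). It is built on a guess about the intent, namely that every row except the most recent service date per member should be flagged. The table name staging is hypothetical and the RecordNo values are renumbered 1 to 7 so each row is distinguishable.

```python
import sqlite3

# Member 123's rows from the question as (RecordNo, MemberID, ServiceDate), ISO dates.
staging = [
    (1, 123, "2015-10-01"), (2, 123, "2015-10-01"), (3, 123, "2015-10-08"),
    (4, 123, "2015-10-12"), (5, 123, "2015-10-14"), (6, 123, "2015-10-29"),
    (7, 123, "2015-12-14"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staging (RecordNo INT, MemberID INT, ServiceDate TEXT)")
con.executemany("INSERT INTO staging VALUES (?, ?, ?)", staging)

# The original filter: only rows on the member's minimum service date survive.
min_only = [r[0] for r in con.execute("""
    SELECT RecordNo
    FROM staging st
    WHERE ServiceDate = (SELECT MIN(ServiceDate) FROM staging s
                         WHERE s.MemberID = st.MemberID)
    ORDER BY RecordNo
""")]

# Assumed intent: number rows newest-first per member and flag everything after the first.
all_but_latest = [r[0] for r in con.execute("""
    SELECT RecordNo
    FROM (
        SELECT RecordNo,
               ROW_NUMBER() OVER (PARTITION BY MemberID
                                  ORDER BY ServiceDate DESC, RecordNo DESC) AS rn
        FROM staging
    )
    WHERE rn > 1
    ORDER BY RecordNo
""")]

print(min_only)
print(all_but_latest)
```

The RecordNo tie-breaker in the ORDER BY is only there to make the choice deterministic when two rows share the latest date.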

Multiple joins with aggregates

I have the two following tables:
Person:
EntityId FirstName LastName
----------- ------------------ -----------------
1 Ion Ionel
2 Fane Fanel
3 George Georgel
4 Mircea Mircel
SalesQuotaHistory
SalesQuotaId EntityId SalesQuota SalesOrderDate
------------ ----------- ----------- -----------------------
1 1 1000 2014-01-01 00:00:00.000
2 1 1000 2014-01-02 00:00:00.000
3 1 1000 2014-01-03 00:00:00.000
4 3 3000 2013-01-01 00:00:00.000
5 3 3000 2013-01-01 00:00:00.000
7 4 4000 2015-01-01 00:00:00.000
8 4 4000 2015-01-02 00:00:00.000
9 4 4000 2015-01-03 00:00:00.000
10 1 1000 2015-01-01 00:00:00.000
11 1 1000 2015-01-02 00:00:00.000
I am trying to get the total SalesQuota for each person in 2014 and in 2015.
Using this query I am getting an erroneous result:
SELECT p.EntityId
, p.FirstName
, SUM(sqh2014.SalesQuota) AS '2014'
, SUM(sqh2015.SalesQuota) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh2014
ON p.EntityId = sqh2014.EntityId
AND YEAR(sqh2014.SalesOrderDate) = 2014
LEFT OUTER JOIN SalesQuotaHistory sqh2015
ON p.EntityId = sqh2015.EntityId
AND YEAR(sqh2015.SalesOrderDate) = 2015
GROUP BY p.EntityId, p.FirstName
EntityId FirstName 2014 2015
--------- ----------- ---------- --------------------
1 Ion 6000 6000
2 Fane NULL NULL
3 George NULL NULL
4 Mircea NULL 12000
In fact, EntityId 1 has a total SalesQuota of 3000 in 2014 and 2000 in 2015.
What I am asking here is: what is really happening behind the scenes? What is the order of operations in this specific case?
Thanks to my last post I was able to solve this using the following query:
SELECT p.EntityId
, p.FirstName
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014 THEN sqh.SalesQuota ELSE 0 END) AS '2014'
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015 THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
ON p.EntityId = sqh.EntityId
GROUP BY p.EntityId, p.FirstName
EntityId FirstName 2014 2015
----------- --------------------- ----------- -----------
1 Ion 3000 2000
2 Fane 0 0
3 George 0 0
4 Mircea 0 12000
But without understanding what's wrong with the first attempt, I can't let this go.
Any explanation would be greatly appreciated.

It is easy to see what is happening if you change your select to
SELECT *
and remove the GROUP BY.
Your first approach needs something like this, with each year aggregated before it is joined:
SELECT p.[EntityId]
, p.FirstName
, COALESCE(s2014,0) as [2014]
, COALESCE(s2015,0) as [2015]
FROM Person p
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2014
FROM SalesQuotaHistory
WHERE YEAR(SalesOrderDate) = 2014
GROUP BY EntityId
) as s1
ON p.[EntityId] = s1.EntityId
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2015
FROM SalesQuotaHistory
WHERE YEAR(SalesOrderDate) = 2015
GROUP BY EntityId
) as s2
ON p.[EntityId] = s2.EntityId
Each subquery contributes its total only if one exists for that id and year; COALESCE turns the missing ones into 0.
OUTPUT
| EntityId | FirstName | 2014 | 2015 |
|----------|-----------|------|-------|
| 1 | Ion | 3000 | 2000 |
| 2 | Fane | 0 | 0 |
| 3 | George | 0 | 0 |
| 4 | Mircea | 0 | 12000 |

You have multiple rows for each year, so the first method produces a Cartesian product of the 2014 rows and the 2015 rows for each person.
For instance, consider EntityId 1:
1 1 1000 2014-01-01 00:00:00.000
2 1 1000 2014-01-02 00:00:00.000
3 1 1000 2014-01-03 00:00:00.000
10 1 1000 2015-01-01 00:00:00.000
11 1 1000 2015-01-02 00:00:00.000
The intermediate result from the join produces six rows, with these SalesQuotaId:
1 10
1 11
2 10
2 11
3 10
3 11
You can then do the math: each of the six rows contributes 1000 to both sums, so both columns come out as 6000 instead of 3000 and 2000.
You seem to know how to fix the problem. The conditional aggregation approach produces the correct answer.
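The fan-out can be reproduced directly. The sketch below loads the question's two tables into SQLite through Python's sqlite3 module and looks at EntityId 1 before any grouping. strftime('%Y', ...) stands in for SQL Server's YEAR(), and the dates are stored as ISO text; both are choices made for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person (EntityId INT, FirstName TEXT)")
con.execute("CREATE TABLE SalesQuotaHistory "
            "(SalesQuotaId INT, EntityId INT, SalesQuota INT, SalesOrderDate TEXT)")
con.executemany("INSERT INTO Person VALUES (?, ?)",
                [(1, "Ion"), (2, "Fane"), (3, "George"), (4, "Mircea")])
con.executemany("INSERT INTO SalesQuotaHistory VALUES (?, ?, ?, ?)", [
    (1, 1, 1000, "2014-01-01"), (2, 1, 1000, "2014-01-02"), (3, 1, 1000, "2014-01-03"),
    (4, 3, 3000, "2013-01-01"), (5, 3, 3000, "2013-01-01"),
    (7, 4, 4000, "2015-01-01"), (8, 4, 4000, "2015-01-02"), (9, 4, 4000, "2015-01-03"),
    (10, 1, 1000, "2015-01-01"), (11, 1, 1000, "2015-01-02"),
])

# The double self-join from the question, restricted to EntityId 1.
joined = """
FROM Person p
LEFT JOIN SalesQuotaHistory a
       ON p.EntityId = a.EntityId AND strftime('%Y', a.SalesOrderDate) = '2014'
LEFT JOIN SalesQuotaHistory b
       ON p.EntityId = b.EntityId AND strftime('%Y', b.SalesOrderDate) = '2015'
WHERE p.EntityId = 1
"""

# Every 2014 row is paired with every 2015 row: 3 x 2 = 6 rows.
pairs = con.execute(
    "SELECT a.SalesQuotaId, b.SalesQuotaId " + joined + " ORDER BY 1, 2").fetchall()
# Summing over those six rows counts each 2014 row twice and each 2015 row three times.
totals = con.execute(
    "SELECT SUM(a.SalesQuota), SUM(b.SalesQuota) " + joined).fetchone()

print(pairs)
print(totals)
```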

You may be able to improve the speed of your query by restricting the join to the years you are interested in. Put the filter in the ON clause rather than in WHERE: a WHERE condition on sqh would discard the NULL rows produced by the LEFT JOIN and drop the people who have no sales in those years.
SELECT p.EntityId
, p.FirstName
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014
THEN sqh.SalesQuota ELSE 0 END) AS '2014'
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015
THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
ON p.EntityId = sqh.EntityId
AND YEAR(sqh.SalesOrderDate) IN (2014, 2015)
GROUP BY p.EntityId, p.FirstName
Otherwise, the query that you found is the way to go (good job!)
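With any filter on the outer-joined table, it matters where the predicate sits: in the ON clause, people without matching rows are kept with zero totals, while in WHERE they disappear from the result. A sketch of the difference using the question's data in SQLite through Python's sqlite3 module (strftime('%Y', ...) stands in for YEAR(), and ISO text dates are an assumption of the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person (EntityId INT, FirstName TEXT)")
con.execute("CREATE TABLE SalesQuotaHistory "
            "(SalesQuotaId INT, EntityId INT, SalesQuota INT, SalesOrderDate TEXT)")
con.executemany("INSERT INTO Person VALUES (?, ?)",
                [(1, "Ion"), (2, "Fane"), (3, "George"), (4, "Mircea")])
con.executemany("INSERT INTO SalesQuotaHistory VALUES (?, ?, ?, ?)", [
    (1, 1, 1000, "2014-01-01"), (2, 1, 1000, "2014-01-02"), (3, 1, 1000, "2014-01-03"),
    (4, 3, 3000, "2013-01-01"), (5, 3, 3000, "2013-01-01"),
    (7, 4, 4000, "2015-01-01"), (8, 4, 4000, "2015-01-02"), (9, 4, 4000, "2015-01-03"),
    (10, 1, 1000, "2015-01-01"), (11, 1, 1000, "2015-01-02"),
])

# Conditional aggregation over a single LEFT JOIN; the text ends with the ON clause.
select = """
SELECT p.EntityId,
       SUM(CASE WHEN strftime('%Y', s.SalesOrderDate) = '2014' THEN s.SalesQuota ELSE 0 END),
       SUM(CASE WHEN strftime('%Y', s.SalesOrderDate) = '2015' THEN s.SalesQuota ELSE 0 END)
FROM Person p
LEFT JOIN SalesQuotaHistory s ON p.EntityId = s.EntityId
"""
year_filter = "strftime('%Y', s.SalesOrderDate) IN ('2014', '2015')"
tail = " GROUP BY p.EntityId ORDER BY p.EntityId"

# Same predicate, once after the join (WHERE) and once as part of the join (ON ... AND).
in_where = con.execute(select + " WHERE " + year_filter + tail).fetchall()
in_on = con.execute(select + " AND " + year_filter + tail).fetchall()

print(in_where)
print(in_on)
```

Fane has no rows at all and George only has 2013 rows, so both vanish under the WHERE version and come back with zeros under the ON version.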

Getting the sum of columns based on row values

I have a table that looks like the following.
EMPNUM EMPNAME LOCATION CATEGORY COUNT
123 JOHN DOE BLDG A 1 5
123 JOHN DOE BLDG A 1 6
123 JOHN DOE BLDG A 2 4
123 JOHN DOE BLDG A 3 7
123 JOHN DOE BLDG B 1 1
123 JOHN DOE BLDG B 2 3
234 EMILY DOE BLDG A 1 1
234 EMILY DOE BLDG A 2 2
234 EMILY DOE BLDG A 3 4
234 EMILY DOE BLDG B 2 3
234 EMILY DOE BLDG B 2 9
234 EMILY DOE BLDG B 3 3
I would like to transpose it into columns to produce output similar to the one below. I need the sum of COUNT for each combination of LOCATION and CATEGORY.
EMPNUM EMPNAME SUM_A1 SUM_A2 SUM_A3 SUM_B1 SUM_B2 SUM_B3
123 JOHN DOE 11 4 7 1 3 0
234 EMILY DOE 1 2 4 0 12 3
Is there any way to do this in a SQL query, or in Crystal Reports? (I would prefer SQL.)

If you are using Oracle 11g or later, try:
select * from table1
PIVOT (SUM("COUNT")
FOR ("LOCATION","CATEGORY") IN
(('BLDG A',1) AS sum_a1,
('BLDG A',2) AS sum_a2,
('BLDG A',3) AS sum_a3,
('BLDG B',1) AS sum_b1,
('BLDG B',2) AS sum_b2,
('BLDG B',3) AS sum_b3));
Otherwise use APC's solution

This will work providing the values in LOCATION and CATEGORY are constant:
select empnum
, empname
, sum(case when location='BLDG A' and category = 1 then count else 0 end) sum_a1
, sum(case when location='BLDG A' and category = 2 then count else 0 end) sum_a2
, sum(case when location='BLDG A' and category = 3 then count else 0 end) sum_a3
, sum(case when location='BLDG B' and category = 1 then count else 0 end) sum_b1
, sum(case when location='BLDG B' and category = 2 then count else 0 end) sum_b2
, sum(case when location='BLDG B' and category = 3 then count else 0 end) sum_b3
from your_table
group by empnum
, empname
If the values are not known or not stable when you run the query you will need to use dynamic SQL.
Note that if you are on 11g you should employ A B Cade's PIVOT solution, which is more elegant.
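The conditional aggregation is plain SQL, so it can be tried on almost any engine. Here is a sketch against the question's rows using SQLite through Python's sqlite3 module. The table name emp_counts is made up, and the COUNT column is renamed cnt to keep it clear of the aggregate function name.

```python
import sqlite3

# The question's rows as (empnum, empname, location, category, cnt).
rows = [
    (123, "JOHN DOE", "BLDG A", 1, 5), (123, "JOHN DOE", "BLDG A", 1, 6),
    (123, "JOHN DOE", "BLDG A", 2, 4), (123, "JOHN DOE", "BLDG A", 3, 7),
    (123, "JOHN DOE", "BLDG B", 1, 1), (123, "JOHN DOE", "BLDG B", 2, 3),
    (234, "EMILY DOE", "BLDG A", 1, 1), (234, "EMILY DOE", "BLDG A", 2, 2),
    (234, "EMILY DOE", "BLDG A", 3, 4), (234, "EMILY DOE", "BLDG B", 2, 3),
    (234, "EMILY DOE", "BLDG B", 2, 9), (234, "EMILY DOE", "BLDG B", 3, 3),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp_counts "
            "(empnum INT, empname TEXT, location TEXT, category INT, cnt INT)")
con.executemany("INSERT INTO emp_counts VALUES (?, ?, ?, ?, ?)", rows)

# One SUM(CASE ...) per output column; non-matching rows add 0.
query = """
SELECT empnum, empname,
       SUM(CASE WHEN location = 'BLDG A' AND category = 1 THEN cnt ELSE 0 END) AS sum_a1,
       SUM(CASE WHEN location = 'BLDG A' AND category = 2 THEN cnt ELSE 0 END) AS sum_a2,
       SUM(CASE WHEN location = 'BLDG A' AND category = 3 THEN cnt ELSE 0 END) AS sum_a3,
       SUM(CASE WHEN location = 'BLDG B' AND category = 1 THEN cnt ELSE 0 END) AS sum_b1,
       SUM(CASE WHEN location = 'BLDG B' AND category = 2 THEN cnt ELSE 0 END) AS sum_b2,
       SUM(CASE WHEN location = 'BLDG B' AND category = 3 THEN cnt ELSE 0 END) AS sum_b3
FROM emp_counts
GROUP BY empnum, empname
ORDER BY empnum
"""
pivoted = con.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Because of the ELSE 0, combinations with no rows show up as 0 here, matching the output requested in the question.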

The other answers will work great if you have a known number of values to transform into columns. But if you have an unknown number, then you can use dynamic sql to generate the results.
You would create the following procedure:
CREATE OR REPLACE procedure test_dynamic_pivot(p_cursor in out sys_refcursor)
as
sql_query varchar2(1000) := 'select empnum, empname';
begin
for x in (select distinct location, category from yourtable order by 1)
loop
sql_query := sql_query ||
' , sum(case when location = '''||x.location||''' and category='||x.category||' then cnt else 0 end) as sum_'||substr(x.location, -1, 1)||x.category;
dbms_output.put_line(sql_query);
end loop;
sql_query := sql_query || ' from yourtable group by empnum, empname';
open p_cursor for sql_query;
end;
/
And then to execute it:
variable x refcursor
exec test_dynamic_pivot(:x)
print x
The result is the same as the hard-coded version:
| EMPNUM | EMPNAME | SUM_A1 | SUM_A2 | SUM_A3 | SUM_B1 | SUM_B2 | SUM_B3 |
----------------------------------------------------------------------------
| 234 | EMILY DOE | 1 | 2 | 4 | 0 | 12 | 3 |
| 123 | JOHN DOE | 11 | 4 | 7 | 1 | 3 | 0 |
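The same idea (read the distinct values, then build the SUM(CASE ...) list from them) can also be driven from client code instead of a stored procedure. Below is a rough sketch in Python with sqlite3. The table name emp_counts and the cnt column are assumptions of the example. The values are passed as bound parameters rather than concatenated into the statement, and only the generated column alias is interpolated, after a basic identifier check.

```python
import sqlite3

rows = [
    (123, "JOHN DOE", "BLDG A", 1, 5), (123, "JOHN DOE", "BLDG A", 1, 6),
    (123, "JOHN DOE", "BLDG A", 2, 4), (123, "JOHN DOE", "BLDG A", 3, 7),
    (123, "JOHN DOE", "BLDG B", 1, 1), (123, "JOHN DOE", "BLDG B", 2, 3),
    (234, "EMILY DOE", "BLDG A", 1, 1), (234, "EMILY DOE", "BLDG A", 2, 2),
    (234, "EMILY DOE", "BLDG A", 3, 4), (234, "EMILY DOE", "BLDG B", 2, 3),
    (234, "EMILY DOE", "BLDG B", 2, 9), (234, "EMILY DOE", "BLDG B", 3, 3),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp_counts "
            "(empnum INT, empname TEXT, location TEXT, category INT, cnt INT)")
con.executemany("INSERT INTO emp_counts VALUES (?, ?, ?, ?, ?)", rows)

# Step 1: discover which (location, category) pairs exist right now.
pairs = con.execute(
    "SELECT DISTINCT location, category FROM emp_counts ORDER BY 1, 2").fetchall()

# Step 2: one SUM(CASE ...) column per pair, named like sum_a1 (last letter + category).
columns, params = [], []
for location, category in pairs:
    alias = f"sum_{location[-1].lower()}{int(category)}"
    if not alias.isidentifier():
        raise ValueError(f"cannot build a safe column name from {location!r}")
    columns.append(
        f"SUM(CASE WHEN location = ? AND category = ? THEN cnt ELSE 0 END) AS {alias}")
    params += [location, category]

sql = ("SELECT empnum, empname, " + ", ".join(columns) +
       " FROM emp_counts GROUP BY empnum, empname ORDER BY empnum")

# Step 3: run the generated statement with the values bound as parameters.
cur = con.execute(sql, params)
headers = [d[0] for d in cur.description]
dynamic_rows = cur.fetchall()
print(headers)
for row in dynamic_rows:
    print(row)
```

Adding a row for a new building or category changes the column list on the next run without touching the code, which is the whole reason for going dynamic.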
