How to select rows where values changed for an ID - sql

I have a table that looks like the following
id effective_date number_of_int_customers
123 10/01/19 0
123 02/01/20 3
456 10/01/19 6
456 02/01/20 6
789 10/01/19 5
789 02/01/20 4
999 10/01/19 0
999 02/01/20 1
I want to write a query that looks at each ID to see if the salespeople have newly started working internationally between October 1st and February 1st.
The result I am looking for is the following:
id effective_date number_of_int_customers
123 02/01/20 3
999 02/01/20 1
The result would return only the salespeople who originally had 0 international customers and now have at least 1.
I have seen similar posts here that use nested queries to pull records where the first date and last have different values. But I only want to pull records where the original value was 0. Is there a way to do this in one query in SQL?

In your case, a simple aggregation would do -- assuming that 0 is the earliest value:
select id, max(number_of_int_customers)
from t
where effective_date in ('2019-10-01', '2020-02-01')
group by id
having min(number_of_int_customers) = 0;
Obviously, this is not correct if the values can decrease to zero. But this having clause fixes that problem:
having min(case when number_of_int_customers = 0 then effective_date end) = min(effective_date)
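To see how the two having clauses differ, here is a minimal sketch that runs both against the question's data in SQLite through Python's sqlite3 module. It assumes the dates are stored as ISO strings, and the id 555 (which drops from 4 to 0) is a made-up row added only to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer, effective_date text, number_of_int_customers integer)"
)
conn.executemany("insert into t values (?, ?, ?)", [
    (123, "2019-10-01", 0), (123, "2020-02-01", 3),
    (456, "2019-10-01", 6), (456, "2020-02-01", 6),
    (789, "2019-10-01", 5), (789, "2020-02-01", 4),
    (999, "2019-10-01", 0), (999, "2020-02-01", 1),
    (555, "2019-10-01", 4), (555, "2020-02-01", 0),  # hypothetical id, not in the question
])

base = """
select id, max(number_of_int_customers)
from t
where effective_date in ('2019-10-01', '2020-02-01')
group by id
having {condition}
order by id
"""

# Simple version: any id whose minimum is 0, wherever that 0 occurs.
simple = conn.execute(
    base.format(condition="min(number_of_int_customers) = 0")
).fetchall()

# Stricter version: the earliest date with a 0 must be the earliest date overall.
strict = conn.execute(
    base.format(
        condition="min(case when number_of_int_customers = 0 then effective_date end)"
                  " = min(effective_date)"
    )
).fetchall()

print(simple)
print(strict)
```

The simple version also returns the hypothetical 555, while the stricter one keeps only 123 and 999.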
An alternative is to use window functions, such as first_value():
select distinct id, last_noic
from (select t.*,
first_value(number_of_int_customers) over (partition by id order by effective_date) as first_noic,
first_value(number_of_int_customers) over (partition by id order by effective_date desc) as last_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where first_noic = 0;
Hmmm, on second thought, I like lag() better:
select id, number_of_int_customers
from (select t.*,
lag(number_of_int_customers) over (partition by id order by effective_date) as prev_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where prev_noic = 0;
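As a quick check, the lag() version can be run against the sample data. This is only a sketch under a few assumptions: SQLite 3.25 or later (for window functions) via Python's sqlite3, dates stored as ISO strings, and one extra guard that is my addition rather than part of the answer: it skips ids that were 0 and are still 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer, effective_date text, number_of_int_customers integer)"
)
conn.executemany("insert into t values (?, ?, ?)", [
    (123, "2019-10-01", 0), (123, "2020-02-01", 3),
    (456, "2019-10-01", 6), (456, "2020-02-01", 6),
    (789, "2019-10-01", 5), (789, "2020-02-01", 4),
    (999, "2019-10-01", 0), (999, "2020-02-01", 1),
])

query = """
select id, effective_date, number_of_int_customers
from (select t.*,
             lag(number_of_int_customers)
                 over (partition by id order by effective_date) as prev_noic
      from t
      where effective_date in ('2019-10-01', '2020-02-01')
     ) t
where prev_noic = 0
  and number_of_int_customers > 0  -- extra guard (my addition): skip ids that stay at 0
order by id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This returns the two rows the question asks for (ids 123 and 999, both dated 2020-02-01).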

Related

How to get the 2nd record for a customer purchase?

I'm working on a customers database and I want to get all the data for each customer's second purchase (for all of our customers, whether they have 2 or more purchases).
For example:
Customer_ID Order_ID Order_Date
1 259 09/05/2020
1 644 03/11/2020
1 617 18/04/2022
4 834 22/09/2021
4 995 07/02/2022
I want to display the second order which is:
Customer_ID Order_ID Order_Date
1 644 03/11/2020
4 995 07/02/2022
I'm facing some difficulties in finding the right logic, any idea how I can achieve my end goal? :)
*Note: I'm using snowflake
You can use a ROW_NUMBER and filter using QUALIFY clause:
select * from table qualify row_number() over(partition by customer_id order by order_date) = 2;
You can use common table expression
with CTE_RS
AS (
SELECT Customer_ID,ORDER_ID,Order_Date,ROW_NUMBER() OVER(PARTITION BY Customer_ID ORDER BY Order_Date ) ORDRNUM FROM *TABLE NAME*
)
SELECT Customer_ID,ORDER_ID,Order_Date
FROM CTE_RS
WHERE ORDRNUM = 2 ;
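For anyone who wants to try the CTE form without a Snowflake account, here is a small sketch using SQLite through Python's sqlite3. SQLite has no QUALIFY clause, so only the CTE variant is shown. The table name orders is a placeholder, and the dates from the question are stored as ISO strings so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "orders" is a hypothetical table name; the question does not give one.
conn.execute("create table orders (customer_id integer, order_id integer, order_date text)")
conn.executemany("insert into orders values (?, ?, ?)", [
    (1, 259, "2020-05-09"),
    (1, 644, "2020-11-03"),
    (1, 617, "2022-04-18"),
    (4, 834, "2021-09-22"),
    (4, 995, "2022-02-07"),
])

query = """
with cte_rs as (
    select customer_id, order_id, order_date,
           row_number() over (partition by customer_id order by order_date) as ordrnum
    from orders
)
select customer_id, order_id, order_date
from cte_rs
where ordrnum = 2  -- customers with a single order have no row numbered 2
order by customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customers with only one purchase simply drop out, because they never get a row numbered 2.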

Count lead duplicate rows

I have the below table
Table A:
row_number id start_dt end_dt cust_dt cust_id
1 101 4/8/19 4/20/19 4/10/19 725
2 101 4/21/19 5/20/19 4/10/19 456
3 101 5/1/19 6/30/19 4/10/19 725
4 101 7/1/19 8/20/19 4/10/19 725
I need to count "duplicates" in a table for testing purposes.
Criteria:
Need to exclude the start_dt and end_dt from my calculation.
It's only a duplicate if the lead row is duplicated. So, for example, rows 1, 3 and 4 are the same, but only rows 3 and 4 would be considered duplicates in this example.
What I have tried:
rank with a lead and self join but that doesn't seem to be working on my end.
How can I count the id to determine if there are duplicates?
Output: (something like below)
count id
2 101
End results for me is to have a count of 1 for the table
count id
1 101
Use the row_number analytic function as follows (this is a gaps-and-islands problem):
Select count(1), id from
(Select t.*,
row_number() over (order by row_number) as rn,
row_number() over (partition by id, cust_dt, cust_id order by row_number) as part_rn
From your_table t)
Group by id, cust_dt, cust_id, (rn-part_rn)
Having count(1) > 1
Cheers!!
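Here is a rough way to try the gaps-and-islands query on the sample rows, using SQLite (3.25+) through Python's sqlite3. The ordering column is renamed to row_num in this sketch to avoid clashing with the row_number() function, and the dates are kept as plain text because they are not used for ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_num stands in for the question's "row_number" column (renamed for this sketch).
conn.execute(
    "create table your_table (row_num integer, id integer, start_dt text,"
    " end_dt text, cust_dt text, cust_id integer)"
)
conn.executemany("insert into your_table values (?, ?, ?, ?, ?, ?)", [
    (1, 101, "4/8/19", "4/20/19", "4/10/19", 725),
    (2, 101, "4/21/19", "5/20/19", "4/10/19", 456),
    (3, 101, "5/1/19", "6/30/19", "4/10/19", 725),
    (4, 101, "7/1/19", "8/20/19", "4/10/19", 725),
])

query = """
select count(1) as cnt, id
from (select t.*,
             row_number() over (order by row_num) as rn,
             row_number() over (partition by id, cust_dt, cust_id order by row_num) as part_rn
      from your_table t)
-- consecutive rows with the same values share the same (rn - part_rn)
group by id, cust_dt, cust_id, (rn - part_rn)
having count(1) > 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Rows 3 and 4 form the only island with more than one row, so the result is a single row with a count of 2 for id 101.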
If your definition of a duplicated row is that the CUST_ID in the lead row (with the same id, ordered by row_number) equals the current CUST_ID,
you can write it simply using the LEAD analytic function.
select ID, ROW_NUMBER, CUST_ID,
case when CUST_ID = lead(CUST_ID) over (partition by id order by ROW_NUMBER) then 1 end is_dup
from tab
ID ROW_NUMBER CUST_ID IS_DUP
---------- ---------- ---------- ----------
101 1 725
101 2 456
101 3 725 1
101 4 725
The aggregated query to get the number of duplicated rows would then be
with dup as (
select ID, ROW_NUMBER, CUST_ID,
case when CUST_ID = lead(CUST_ID) over (partition by id order by ROW_NUMBER) then 1 end is_dup
from tab)
select ID, sum(is_dup) dup_cnt
from dup
group by ID
ID DUP_CNT
---------- ----------
101 1
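The LEAD-based count can be reproduced with a short sketch in SQLite (3.25+) via Python's sqlite3. As above, the ordering column is renamed to row_num here, and only the columns the query actually uses are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns used by the query; row_num replaces the "row_number" column name.
conn.execute("create table tab (row_num integer, id integer, cust_id integer)")
conn.executemany("insert into tab values (?, ?, ?)", [
    (1, 101, 725),
    (2, 101, 456),
    (3, 101, 725),
    (4, 101, 725),
])

query = """
with dup as (
    select id, row_num, cust_id,
           -- flag a row when the next row for the same id has the same cust_id
           case when cust_id = lead(cust_id) over (partition by id order by row_num)
                then 1 end as is_dup
    from tab)
select id, sum(is_dup) as dup_cnt
from dup
group by id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only row 3 is followed by a row with the same cust_id, so the count for id 101 is 1.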

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL versions before 8.0 don't support it, but you can achieve the same result there using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
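A small sketch of the conditional-aggregation version, run in SQLite through Python's sqlite3 rather than Teradata. Dates are stored as ISO strings (an assumption of this sketch), and an order by is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourtable (id integer, creation_date text,"
    " completion_date text, difference integer)"
)
conn.executemany("insert into yourtable values (?, ?, ?, ?)", [
    (123, "2016-05-09", "2016-05-16", 7),
    (123, "2016-05-14", "2016-05-16", 2),
    (456, "2016-04-26", "2016-04-30", 4),
    (456, None, "2016-04-30", None),
    (789, "2016-03-25", "2016-03-31", 6),
    (789, "2016-03-01", "2016-03-31", 30),
])

query = """
select t.*
from yourtable t
join (select id, max(creation_date) as max_creation_date
      from yourtable
      group by id
      -- drop any id that has at least one missing creation_date
      having count(case when creation_date is null then 1 end) = 0
     ) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
order by t.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Id 456 is filtered out by the having clause, and the latest row is kept for 123 and 789.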

Select info from table where row has max date

My table looks something like this:
group date cash checks
1 1/1/2013 0 0
2 1/1/2013 0 800
1 1/3/2013 0 700
3 1/1/2013 0 600
1 1/2/2013 0 400
3 1/5/2013 0 200
-- Do not need cash just demonstrating that table has more information in it
I want to get each unique group where date is max and checks is greater than 0. So the return would look something like:
group date checks
2 1/1/2013 800
1 1/3/2013 700
3 1/5/2013 200
attempted code:
SELECT group,MAX(date),checks
FROM table
WHERE checks>0
GROUP BY group
ORDER BY group DESC
problem with that though is it gives me all the dates and checks rather than just the max date row.
using ms sql server 2005
SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group
That works to get the max date..join it back to your data to get the other columns:
Select t.group, a.max_date, t.checks
from table t
inner join
(SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group)a
on a.group = t.group and a.max_date = t.date
Inner join functions as the filter to get the max record only.
FYI, your column names are horrid, don't use reserved words for columns (group, date, table).
You can use a window MAX() like this:
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY group)
FROM table
to get max dates per group alongside other data:
group date cash checks max_date
----- -------- ---- ------ --------
1 1/1/2013 0 0 1/3/2013
2 1/1/2013 0 800 1/1/2013
1 1/3/2013 0 700 1/3/2013
3 1/1/2013 0 600 1/5/2013
1 1/2/2013 0 400 1/3/2013
3 1/5/2013 0 200 1/5/2013
Using the above output as a derived table, you can then get only rows where date matches max_date:
SELECT
group,
date,
checks
FROM (
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY group)
FROM table
) AS s
WHERE date = max_date
;
to get the desired result.
Basically, this is similar to @Twelfth's suggestion but avoids a join and may thus be more efficient.
You can try the method at SQL Fiddle.
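If you want to run the window MAX() approach locally, here is a sketch in SQLite (3.25+) through Python's sqlite3. Several things are assumptions of the sketch: the names payments, grp and dt replace the reserved words table, group and date; the SQL Server "max_date = ..." alias is written as "as max_date"; dates are ISO strings; and the checks > 0 filter from the question is added inside the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# payments / grp / dt are hypothetical names standing in for table / group / date.
conn.execute("create table payments (grp integer, dt text, cash integer, checks integer)")
conn.executemany("insert into payments values (?, ?, ?, ?)", [
    (1, "2013-01-01", 0, 0),
    (2, "2013-01-01", 0, 800),
    (1, "2013-01-03", 0, 700),
    (3, "2013-01-01", 0, 600),
    (1, "2013-01-02", 0, 400),
    (3, "2013-01-05", 0, 200),
])

query = """
select grp, dt, checks
from (select p.*,
             max(dt) over (partition by grp) as max_date
      from payments p
      where checks > 0  -- the question's condition; not part of the window answer
     ) s
where dt = max_date
order by grp
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each group comes back once, with the checks value from its latest date.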
Using an in can have a performance impact. Joining two subqueries will not have the same performance impact and can be accomplished like this:
SELECT *
FROM (SELECT msisdn
,callid
,Change_color
,play_file_name
,date_played
FROM insert_log
WHERE play_file_name NOT IN('Prompt1','Conclusion_Prompt_1','silent')
ORDER BY callid ASC) t1
JOIN (SELECT MAX(date_played) AS date_played
FROM insert_log GROUP BY callid) t2
ON t1.date_played = t2.date_played
SELECT distinct
group,
max_date = MAX(date) OVER (PARTITION BY group), checks
FROM table
Should work.

fill in a null cell with cell from previous record

Hi I am using DB2 sql to fill in some missing data in the following table:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL NULL NULL
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL NULL NULL
Person 2 has lived in 3 houses, but for the middle address it is not known where or when. I can't do anything about which house they were in, but I would like to use the previous record's To date to replace the NULL From date, and the next record's From date to replace the NULL To date, i.e.
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL 2012-09-27 2004-01-01
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL 2009-08-01 9999-01-01
I understand that if there is no previous address before a null address, that will have to stay null, but if a null address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This seems to me to be the type of problem where set theory is no longer a good solution; however, I am required to find a DB2 solution because that's what my boss uses!
any pointers/suggestions welcome.
Thanks.
It might look something like this:
select
person,
house,
coalesce(from_date, prev_to_date) from_date,
case when rn = 1 then coalesce (to_date, '9999-01-01')
else coalesce(to_date, next_from_date) end to_date
from
(select person, house, from_date, to_date,
lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
row_number() over (partition by person order by from_date desc nulls last) rn
from temp
) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
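To illustrate that last point, here is a sketch of the same lag/lead idea with an explicit ordering column. The seq column is hypothetical (the question's table has nothing like it), and the sketch runs in SQLite (3.25+) through Python's sqlite3 rather than DB2, so treat it as an outline of the logic only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column recording the known order of each person's addresses.
conn.execute(
    "create table temp (seq integer, person integer, house integer,"
    " from_date text, to_date text)"
)
conn.executemany("insert into temp values (?, ?, ?, ?, ?)", [
    (1, 1, 586, "2000-04-16", "2010-12-03"),
    (2, 2, 123, "2001-01-01", "2012-09-27"),
    (3, 2, None, None, None),
    (4, 2, 104, "2004-01-01", "2012-11-24"),
    (5, 3, 987, "1999-12-31", "2009-08-01"),
    (6, 3, None, None, None),
])

query = """
select person, house,
       coalesce(from_date, prev_to_date) as from_date,
       case when rn = 1 then coalesce(to_date, '9999-01-01')  -- last row per person
            else coalesce(to_date, next_from_date) end as to_date
from (select seq, person, house, from_date, to_date,
             lag(to_date)    over (partition by person order by seq) as prev_to_date,
             lead(from_date) over (partition by person order by seq) as next_from_date,
             row_number()    over (partition by person order by seq desc) as rn
      from temp
     ) t
order by seq
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ordering by seq instead of from_date keeps the NULL row between its real neighbours, which is what makes the expected output from the question come out.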
create table Temp
(
person varchar(2),
house int,
from_date date,
to_date date
)
insert into temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL)
select A.person,
A.house,
isnull(A.from_date,BF.to_date) From_date,
isnull(A.to_date,isnull(CT.From_date,'9999-01-01')) To_date
from
((select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
on A.person = BF.person and
A.rownum = BF.rownum + 1)left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
on A.person = CT.person and
A.rownum = CT.rownum - 1