SQL - Count of instances between two dates for multiple criteria

I have a table of account numbers, each with a date range. I need to use another table to determine how many times we interacted with each account within its date range. I'm at a loss as to where to even start.
Table1
+------+------------+------------+
| Acct | EndDate | StartDate |
+------+------------+------------+
| 1 | 2017-02-14 | 2016-12-16 |
| 2 | 2017-02-14 | 2016-12-16 |
| 3 | 2017-02-13 | 2016-12-15 |
+------+------------+------------+
Table2
+------+--------------+
| acct | calllog_date |
+------+--------------+
| 1 | 2016-06-16 |
| 1 | 2016-08-15 |
| 1 | 2015-11-10 |
| 2 | 2015-11-10 |
| 2 | 2015-11-13 |
| 2 | 2015-11-16 |
| 2 | 2015-11-19 |
| 3 | 2015-11-19 |
| 3 | 2015-11-23 |
| 4 | 2015-11-30 |
+------+--------------+

Try using a JOIN based on a match between the date values:
SELECT t1.Acct, COUNT(*) AS cnt
FROM Table1 AS t1
JOIN Table2 AS t2
ON t1.Acct = t2.Acct AND t2.calllog_date BETWEEN t1.StartDate AND t1.EndDate
GROUP BY t1.Acct
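One thing to watch with the inner JOIN above: accounts with no calls inside their range drop out of the result entirely. A minimal sketch of a LEFT JOIN variant that keeps them with a count of 0, run through Python's sqlite3 module so it can be tried as-is. The table contents are the question's sample data plus one hypothetical call row (acct 1 on 2017-01-05) added only so that a non-zero count appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Acct INTEGER, EndDate TEXT, StartDate TEXT);
CREATE TABLE Table2 (acct INTEGER, calllog_date TEXT);
INSERT INTO Table1 VALUES
  (1, '2017-02-14', '2016-12-16'),
  (2, '2017-02-14', '2016-12-16'),
  (3, '2017-02-13', '2016-12-15');
INSERT INTO Table2 VALUES
  (1, '2016-06-16'), (1, '2016-08-15'), (1, '2015-11-10'),
  (2, '2015-11-10'), (2, '2015-11-13'), (2, '2015-11-16'), (2, '2015-11-19'),
  (3, '2015-11-19'), (3, '2015-11-23'), (4, '2015-11-30'),
  (1, '2017-01-05');  -- hypothetical extra call, not in the question's data
""")

# COUNT(t2.acct) counts only matched rows, so unmatched accounts get 0
# (COUNT(*) would report 1 for them because of the NULL-extended row).
rows = conn.execute("""
SELECT t1.Acct, COUNT(t2.acct) AS cnt
FROM Table1 AS t1
LEFT JOIN Table2 AS t2
  ON t1.Acct = t2.acct
 AND t2.calllog_date BETWEEN t1.StartDate AND t1.EndDate
GROUP BY t1.Acct
ORDER BY t1.Acct
""").fetchall()

for acct, cnt in rows:
    print(acct, cnt)
```

The date filter stays in the ON clause on purpose: moving it to WHERE would turn the LEFT JOIN back into an inner join.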

Related

Join with other table timestamp and sum of the columns

I'm trying to generate an aggregate table.
Let's say this is my tblA:
| type | name | timestap |
|------|------|---------------------|
| prod | t1 | 2020-06-01 01:00:00 |
| prod | t2 | 2020-06-01 01:00:02 |
| prod | t3 | 2020-06-01 01:00:03 |
| test | t4 | 2020-06-01 02:20:02 |
| test | t5 | 2020-06-01 02:20:03 |
and tblB
| tid | starttime | name | subtask | maintask |
|-----|---------------------|------|---------|----------|
| 1 | 2020-06-01 01:10:00 | t1 | 5 | 10 |
| 1 | 2020-06-01 01:10:00 | t1 | 6 | 10 |
| 1 | 2020-06-01 01:10:00 | t1 | 7 | 10 |
| 1 | 2020-06-01 01:10:00 | t1 | 8 | 10 |
| 2 | 2020-06-01 00:01:00 | t1 | 3 | 10 |
| 2 | 2020-05-01 00:02:00 | t1 | 5 | 15 |
| 4 | 2020-06-01 01:00:00 | t2 | 10 | 10 |
| 5 | 2020-06-01 11:00:10 | t2 | 10 | 20 |
| 5 | 2020-06-01 11:00:10 | t2 | 11 | 20 |
| 5 | 2020-06-01 11:00:10 | t2 | 12 | 20 |
Now I need to create a report table with the sums of subtask and maintask. There is a condition, though: for each name, we need to pick the tid, subtask, and maintask rows whose starttime is greater than tblA's timestamp, and then do the SUM.
Expected output:
| type | name | sum_of_subtask | sum_of_maintask | diff |
|------|------|----------------|-----------------|------|
| prod | t1 | 26 | 40 | 14 |
| prod | t2 | 33 | 60 | 27 |
For t1, the tid would be 1, because its starttime is > tblA.timestamp.
For t2, the tid is 5, because tblB.starttime > tblA.timestamp.
The other condition is that the rows I pick must have the MAX(tid) among those where starttime > tblA.timestamp.
Then I sum those rows and put the difference between sum_of_subtask and sum_of_maintask in the diff column.
I'm not sure how to write the logic for this.
You need a simple join and SUM for the aggregation:
select
type,
ta.name,
sum(subtask) as sum_of_subtask,
sum(maintask) as sum_of_maintask,
sum(maintask - subtask) as diff
from tblA ta
join tblB tb
on ta.name = tb.name
where starttime > timestap
group by
type,
ta.name;
output:
| type | name | sum_of_subtask | sum_of_maintask | diff |
| ---- | ---- | -------------- | --------------- | ---- |
| prod | t1 | 26 | 40 | 14 |
| prod | t2 | 33 | 60 | 27 |
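The query above matches the expected output on the sample data, but it does not enforce the MAX(tid) condition from the question. A hedged sketch of one way to add it, using a correlated subquery and run through Python's sqlite3 module against the question's sample rows. On this data only one tid qualifies per name, so the result is unchanged; the extra filter would matter only if several tids had a later starttime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblA (type TEXT, name TEXT, timestap TEXT);
CREATE TABLE tblB (tid INTEGER, starttime TEXT, name TEXT,
                   subtask INTEGER, maintask INTEGER);
INSERT INTO tblA VALUES
  ('prod', 't1', '2020-06-01 01:00:00'),
  ('prod', 't2', '2020-06-01 01:00:02'),
  ('prod', 't3', '2020-06-01 01:00:03'),
  ('test', 't4', '2020-06-01 02:20:02'),
  ('test', 't5', '2020-06-01 02:20:03');
INSERT INTO tblB VALUES
  (1, '2020-06-01 01:10:00', 't1', 5, 10),
  (1, '2020-06-01 01:10:00', 't1', 6, 10),
  (1, '2020-06-01 01:10:00', 't1', 7, 10),
  (1, '2020-06-01 01:10:00', 't1', 8, 10),
  (2, '2020-06-01 00:01:00', 't1', 3, 10),
  (2, '2020-05-01 00:02:00', 't1', 5, 15),
  (4, '2020-06-01 01:00:00', 't2', 10, 10),
  (5, '2020-06-01 11:00:10', 't2', 10, 20),
  (5, '2020-06-01 11:00:10', 't2', 11, 20),
  (5, '2020-06-01 11:00:10', 't2', 12, 20);
""")

# Keep only rows of the highest tid that starts after tblA's timestamp.
rows = conn.execute("""
SELECT ta.type, ta.name,
       SUM(tb.subtask)               AS sum_of_subtask,
       SUM(tb.maintask)              AS sum_of_maintask,
       SUM(tb.maintask - tb.subtask) AS diff
FROM tblA ta
JOIN tblB tb
  ON tb.name = ta.name AND tb.starttime > ta.timestap
WHERE tb.tid = (SELECT MAX(b2.tid)
                FROM tblB b2
                WHERE b2.name = ta.name
                  AND b2.starttime > ta.timestap)
GROUP BY ta.type, ta.name
ORDER BY ta.name
""").fetchall()

for row in rows:
    print(row)
```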

Return Max Value Date for each group in Netezza SQL

+--------+---------+----------+------------+------------+
| CASEID | USER_ID | TYPE | OPEN_DT | CLOSED_DT |
+--------+---------+----------+------------+------------+
| 1 | 1000 | MA | 2017-01-01 | 2017-01-07 |
| 2 | 1000 | MB | 2017-07-15 | 2017-07-22 |
| 3 | 1000 | MA | 2018-02-20 | NULL |
| 8 | 1001 | MB | 2017-05-18 | 2018-02-18 |
| 9 | 1001 | MA | 2018-03-05 | 2018-04-01 |
| 7 | 1002 | MA | 2018-06-01 | 2018-07-01 |
+--------+---------+----------+------------+------------+
This is a snippet of my data set. I need a query that returns just the max(OPEN_DT) row for each USER_ID in Netezza SQL.
Given the above, the results would be:
| CASEID | USER_ID | TYPE | OPEN_DT | CLOSED_DT |
| 3 | 1000 | MA | 2018-02-20 | NULL |
| 9 | 1001 | MA | 2018-03-05 | 2018-04-01 |
| 7 | 1002 | MA | 2018-06-01 | 2018-07-01 |
Any help is very much appreciated!
You can use a correlated subquery:
select t.*
from table t
where open_dt = (select max(t1.open_dt) from table t1 where t1.user_id = t.user_id);
You can also use row_number():
select t.*
from (select *, row_number() over (partition by user_id order by open_dt desc) as seq
from table t
) t
where seq = 1;
However, if there are ties on open_dt, the correlated subquery returns every tied row, and you would need a LIMIT clause in the correlated subquery to keep just one. I am not sure whether your data has ties, so I have left that out.
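For the tie case, a minimal sketch of the LIMIT idea, run through Python's sqlite3 module rather than Netezza, so treat the exact syntax as an assumption to check on your platform. It assumes caseid is unique and uses it as the tie-breaker. The table name cases and the last inserted row (a second case for user 1002 opened on the same day) are hypothetical and exist only to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (caseid INTEGER, user_id INTEGER, type TEXT,
                    open_dt TEXT, closed_dt TEXT);
INSERT INTO cases VALUES
  (1, 1000, 'MA', '2017-01-01', '2017-01-07'),
  (2, 1000, 'MB', '2017-07-15', '2017-07-22'),
  (3, 1000, 'MA', '2018-02-20', NULL),
  (8, 1001, 'MB', '2017-05-18', '2018-02-18'),
  (9, 1001, 'MA', '2018-03-05', '2018-04-01'),
  (7, 1002, 'MA', '2018-06-01', '2018-07-01'),
  (10, 1002, 'MB', '2018-06-01', NULL);  -- hypothetical tie on open_dt
""")

# Pick exactly one caseid per user: latest open_dt, highest caseid on ties.
rows = conn.execute("""
SELECT t.*
FROM cases t
WHERE t.caseid = (SELECT t1.caseid
                  FROM cases t1
                  WHERE t1.user_id = t.user_id
                  ORDER BY t1.open_dt DESC, t1.caseid DESC
                  LIMIT 1)
ORDER BY t.user_id
""").fetchall()

for row in rows:
    print(row)
```

The same tie-breaker can be added to the row_number() version by extending its ORDER BY with a second column.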

SQL Query check result with other rows in same table

I want to build a SQL query that selects rows based on a date range and also returns an additional column.
The following is my sample table:
+----+--------------+--------------+--------------+--------------+
| id | CustomerID | AccountID | DateFrom | DateTo |
+----+--------------+--------------+--------------+--------------+
| 1 | C0001 | A0001 | 21/01/2016 | 28/01/2016 |
| 2 | C0001 | A0001 | 01/02/2016 | 08/02/2016 |
| 3 | C0002 | A0002 | 09/02/2016 | 16/02/2016 |
| 4 | C0002 | A0002 | 14/01/2016 | 21/01/2016 |
| 5 | C0003 | A0003 | 07/01/2016 | 14/01/2016 |
| 6 | C0003 | A0003 | 09/02/2016 | 16/02/2016 |
| 7 | C0004 | A0004 | 01/01/2016 | 07/01/2016 |
| 8 | C0004 | A0004 | 09/03/2016 | 16/03/2016 |
+----+--------------+--------------+--------------+--------------+
If I pass the date range 01/02/2016 to 28/02/2016, I need the following result:
+----+-------------+------------+--------------+--------------+-------------+
| id | CustomerID | AccountID | DateFrom | DateTo | isPrevious |
+----+-------------+------------+--------------+--------------+-------------+
| 1 | C0001 | A0001 | 01/02/2016 | 08/02/2016 | Yes |
| 2 | C0002 | A0002 | 09/02/2016 | 16/02/2016 | Yes |
| 3 | C0003 | A0003 | 09/02/2016 | 16/02/2016 | Yes |
+----+-------------+------------+--------------+--------------+-------------+
This assumes you want every row whose range overlaps the target range, not only rows fully contained in it. It also assumes that none of your row ranges overlap each other.
SELECT t1.*,
-- EXISTS avoids an error when more than one earlier row matches
CASE WHEN EXISTS (SELECT 1
FROM yourTable t2
WHERE t2.DateFrom < t1.DateFrom
and t2.CustomerID = t1.CustomerID
and t2.AccountID = t1.AccountID)
THEN 'Yes'
ELSE 'No'
END as isPrevious
FROM yourTable t1
WHERE DateFrom <= '2016-02-28' and
DateTo >= '2016-02-01';
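To see the overlap filter and the isPrevious flag working together, here is a small sketch run through Python's sqlite3 module on the question's sample rows. It assumes the dates are stored in ISO format (yyyy-mm-dd) so that string comparison orders them correctly, and it uses EXISTS for the flag. Note that the id values returned are the original ones (2, 3, 6), not the renumbered 1 to 3 shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (id INTEGER, CustomerID TEXT, AccountID TEXT,
                        DateFrom TEXT, DateTo TEXT);
INSERT INTO yourTable VALUES
  (1, 'C0001', 'A0001', '2016-01-21', '2016-01-28'),
  (2, 'C0001', 'A0001', '2016-02-01', '2016-02-08'),
  (3, 'C0002', 'A0002', '2016-02-09', '2016-02-16'),
  (4, 'C0002', 'A0002', '2016-01-14', '2016-01-21'),
  (5, 'C0003', 'A0003', '2016-01-07', '2016-01-14'),
  (6, 'C0003', 'A0003', '2016-02-09', '2016-02-16'),
  (7, 'C0004', 'A0004', '2016-01-01', '2016-01-07'),
  (8, 'C0004', 'A0004', '2016-03-09', '2016-03-16');
""")

# A row overlaps the target range when it starts on or before the range end
# and ends on or after the range start.
rows = conn.execute("""
SELECT t1.id, t1.CustomerID,
       CASE WHEN EXISTS (SELECT 1
                         FROM yourTable t2
                         WHERE t2.DateFrom < t1.DateFrom
                           AND t2.CustomerID = t1.CustomerID
                           AND t2.AccountID = t1.AccountID)
            THEN 'Yes' ELSE 'No' END AS isPrevious
FROM yourTable t1
WHERE t1.DateFrom <= '2016-02-28'
  AND t1.DateTo   >= '2016-02-01'
ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```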
If I understand correctly, this is a simple select:
select t.*
from t
where DateFrom <= '2016-02-28' and
DateTo >= '2016-02-01';
At least, this would give the results as you describe them.

How to get cumulative max records from the table without using OLAP function?

I have a table like below
-------------------------------------
| Id | startdate | enddate |rate|
-------------------------------------
| 1 | 1/1/2015 | 2/1/2015 | 10 |
| 1 | 2/1/2015 | 3/1/2015 | 15 |
| 1 | 3/1/2015 | 4/1/2015 | 5 |
| 1 | 4/1/2015 | 5/1/2015 | 10 |
| 1 | 5/1/2015 | 6/1/2015 | 20 |
| 1 | 6/1/2015 | 7/1/2015 | 30 |
| 1 | 7/1/2015 | 8/1/2015 | 10 |
| 1 | 8/1/2015 | 9/1/2015 | 30 |
| 1 | 9/1/2015 | 12/31/2015 | 20 |
------------------------------------
I need to populate cumulative max values for each id (Id=1 for this example) including the first record, like below (SQL Server 2008):
----------------------------------
| Id | startdate | enddate |rate |
----------------------------------
| 1 | 1/1/2015 | 2/1/2015 | 10 |
| 1 | 2/1/2015 | 3/1/2015 | 15 |
| 1 | 5/1/2015 | 6/1/2015 | 20 |
| 1 | 6/1/2015 | 7/1/2015 | 30 |
| 1 | 8/1/2015 | 9/1/2015 | 30 |
-----------------------------------
Can anyone help me with this?
You can calculate the cumulative max in SQL Server 2008 using outer apply:
select t.*, t2.maxrate
from t outer apply
(select max(t2.rate) as maxrate
from t t2
where t2.id = t.id and t2.startdate <= t.startdate
) t2;
Your question appears to be about filtering, not just calculating the cumulative maximum value. You can select the rows with the max rate using a subquery:
select t.*
from (select t.*, t2.maxrate
from t outer apply
(select max(t2.rate) as maxrate
from t t2
where t2.id = t.id and t2.startdate <= t.startdate
) t2
) t
where t.rate = t.maxrate;
Rows that merely tie the running maximum (such as the second 30) are returned as well. A better way to write this is with not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.id = t.id and t2.rate > t.rate and t2.startdate < t.startdate
);
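As a quick check of the not exists approach, the sketch below runs it on the question's sample rows through Python's sqlite3 module. It assumes the dates are stored in ISO format so they compare correctly as text, and it correlates on id so that each id gets its own running maximum. The table name t follows the answer's placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, startdate TEXT, enddate TEXT, rate INTEGER);
INSERT INTO t VALUES
  (1, '2015-01-01', '2015-02-01', 10),
  (1, '2015-02-01', '2015-03-01', 15),
  (1, '2015-03-01', '2015-04-01', 5),
  (1, '2015-04-01', '2015-05-01', 10),
  (1, '2015-05-01', '2015-06-01', 20),
  (1, '2015-06-01', '2015-07-01', 30),
  (1, '2015-07-01', '2015-08-01', 10),
  (1, '2015-08-01', '2015-09-01', 30),
  (1, '2015-09-01', '2015-12-31', 20);
""")

# Keep a row only if no earlier row for the same id has a strictly higher rate.
rows = conn.execute("""
SELECT t.startdate, t.rate
FROM t
WHERE NOT EXISTS (SELECT 1
                  FROM t t2
                  WHERE t2.id = t.id
                    AND t2.rate > t.rate
                    AND t2.startdate < t.startdate)
ORDER BY t.id, t.startdate
""").fetchall()

for row in rows:
    print(row)
```

Because the comparison is strict (t2.rate > t.rate), a row that equals the running maximum is kept, which is what the expected output in the question shows for the two rows with rate 30.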

Update using Self Join Sql Server

I have huge data and sample of the table looks like below
+-----------+------------+-----------+-----------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+-----------+
| 1 | 6/3/2014 | 1 | 6/3/2014 |
| 1 | 5/22/2015 | 2 | NULL |
| 1 | 6/3/2015 | 3 | NULL |
| 1 | 11/20/2015 | 4 | NULL |
| 2 | 2/25/2014 | 1 | 2/25/2014 |
| 2 | 7/31/2014 | 2 | NULL |
| 2 | 8/26/2014 | 3 | NULL |
+-----------+------------+-----------+-----------+
Now I need to check the difference between Date in the 2nd row and Flag_Date in the 1st row. If the difference is more than 180 days, the 2nd row's Flag_Date should be set to the date in the 2nd row; otherwise it should be set to the Flag_Date of the 1st row. The same rule applies to all rows with the same Unique_ID.
update a
set a.Flag_Date=case when DATEDIFF(dd,b.Flag_Date,a.[Date])>180 then a.[Date] else b.Flag_Date end
from Table1 a
inner join Table1 b
on a.RowNumber=b.RowNumber+1 and a.Unique_ID=b.Unique_ID
When the above update query is executed once, only the second row under each Unique_ID gets updated, and the result looks like this:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | NULL |
| 1 | 2015-11-20 | 4 | NULL |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | NULL |
+-----------+------------+-----------+------------+
And I need to run it four times to achieve my desired result:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | 2015-05-22 |
| 1 | 2015-11-20 | 4 | 2015-11-20 |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | 2014-08-26 |
+-----------+------------+-----------+------------+
Is there a way to run the update only once so that all the rows are updated?
Thank you!
If you are using SQL Server 2012+, then you can use lag():
with toupdate as (
select t1.*,
lag(flag_date) over (partition by unique_id order by rownumber) as prev_flag_date
from table1 t1
)
update toupdate
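set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end)
where prev_Flag_Date is not null; -- skip rows with no previous flag so the first row is not set to NULL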
Both this version and your version can take advantage of an index on table1(unique_id, rownumber) or, better yet, table1(unique_id, rownumber, flag_date).
EDIT:
In earlier versions, this might have better performance:
with toupdate as (
select t1.*, t2.flag_date as prev_flag_date
from table1 t1 outer apply
(select top 1 t2.flag_date
from table1 t2
where t2.unique_id = t1.unique_id and
t2.rownumber < t1.rownumber
order by t2.rownumber desc
) t2
)
update toupdate
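set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end)
where prev_Flag_Date is not null; -- skip rows with no previous flag so the first row is not set to NULL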
The CTE can make use of the same index, and it is important to have the index. The reason for the better performance is that your join on row_number() cannot use an index on that field.
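One caveat applies to the self-join and to both versions above: each reads the previous row's Flag_Date as it was before the statement started, so a single run fills in only rows whose predecessor already had a value. Since the rule is sequential (each row depends on the already-computed flag of the row before it), one way to get everything in a single pass is a recursive CTE. The sketch below is an illustration run through Python's sqlite3 module, not a tested SQL Server statement: SQLite needs WITH RECURSIVE and julianday(), whereas SQL Server would use plain WITH and DATEDIFF. The CTE name carried is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Unique_ID INTEGER, [Date] TEXT,
                     RowNumber INTEGER, Flag_Date TEXT);
INSERT INTO Table1 VALUES
  (1, '2014-06-03', 1, '2014-06-03'),
  (1, '2015-05-22', 2, NULL),
  (1, '2015-06-03', 3, NULL),
  (1, '2015-11-20', 4, NULL),
  (2, '2014-02-25', 1, '2014-02-25'),
  (2, '2014-07-31', 2, NULL),
  (2, '2014-08-26', 3, NULL);
""")

# Start from RowNumber = 1 and walk forward one row at a time, carrying the
# freshly computed flag into the comparison for the next row.
rows = conn.execute("""
WITH RECURSIVE carried(Unique_ID, RowNumber, Flag_Date) AS (
  SELECT Unique_ID, RowNumber, Flag_Date
  FROM Table1
  WHERE RowNumber = 1
  UNION ALL
  SELECT t.Unique_ID, t.RowNumber,
         CASE WHEN julianday(t.[Date]) - julianday(c.Flag_Date) > 180
              THEN t.[Date] ELSE c.Flag_Date END
  FROM carried c
  JOIN Table1 t
    ON t.Unique_ID = c.Unique_ID AND t.RowNumber = c.RowNumber + 1
)
SELECT Unique_ID, RowNumber, Flag_Date
FROM carried
ORDER BY Unique_ID, RowNumber
""").fetchall()

for row in rows:
    print(row)
```

The computed values match the desired result in the question; they could then be written back by joining the CTE output to Table1 on Unique_ID and RowNumber in a single UPDATE.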