Group Duplicates in different results depending on location in data set


We would like to see how long a call has spent with each department in our ticket system. We cannot use the min and max date per department, because a call can go to the same department more than once.
For example, a call can be with Support, go to Branches, and then come back to Support. Taking min and max by group would show the call as being with Support for its entire life cycle.
My query returns the same department at different times, and I would like each of those stints grouped into its own result row.
I tried ranking, but that did not solve the problem: the same rank is applied to a department even when it appears again further down in the result set.
select
min(update_time), max(update_time), assigned_group, version, update_time,
datediff(HOUR, min(update_time), max(update_time)) as difference,
dense_rank() over (partition by assigned_group order by version) as pDenserank,
rank() over (partition by assigned_group order by version) as prank,
dense_rank() over (order by assigned_group) as denserank,
rank() over (order by assigned_group) as rank,
assign_counter
from service_req_history
where id = 405012
group by version, assign_counter, update_time, assigned_group
order by assign_counter
I would like to see the following results (expected result set):
Min Update Time  | Max Update Time  | assigned_group | Days with Department
2019/07/19 16:28 | 2019/07/22 09:01 | Support        | 3
2019/07/22 11:32 | 2019/08/26 13:25 | Branches       | 4
2019/08/26 15:44 | 2019/08/28 11:22 | Support        | 2
2019/08/28 11:47 | 2019/08/28 15:32 | Technical      | 0
Your input would be highly appreciated, thanking you in advance.
Regards Charl

To start with, you might try something like the following:
SELECT MIN(update_time) 'Min Update Time'
,MAX(update_time) 'Max Update Time'
,assigned_group
,SUM(Call_Duration)/60.0/24 AS 'Days with Department'
FROM (
SELECT LAG(update_time) OVER(ORDER BY update_time ASC) prev_update_time
,update_time
,DATEDIFF(MINUTE, LAG(update_time) OVER(ORDER BY update_time ASC), Update_time) AS 'Call_Duration'
,assigned_group
FROM service_req_history
WHERE id = 405012
) AS CallDurationSet
GROUP BY assigned_group
To cover other ids, remove the WHERE clause and add the "id" column to the GROUP BY.
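Note that grouping by assigned_group alone folds both Support stints into a single row, which is the problem described in the question. One common way to keep each stint separate is the gaps-and-islands trick of subtracting two row numbers. The following is a minimal sketch, not a drop-in solution: it runs the SQL through Python's sqlite3 module purely so the example is self-contained, the sample rows are reconstructed from the expected result set in the question, and the column list is trimmed to what the technique needs. In SQL Server the same CTE should work, with DATEDIFF applied to the MIN and MAX of each island to get the duration.

```python
import sqlite3

# In-memory SQLite database standing in for the real ticket system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_req_history "
    "(id INTEGER, update_time TEXT, assigned_group TEXT)"
)
# Illustrative rows: the first and last update of each stint from the question.
conn.executemany(
    "INSERT INTO service_req_history VALUES (?, ?, ?)",
    [
        (405012, "2019-07-19 16:28", "Support"),
        (405012, "2019-07-22 09:01", "Support"),
        (405012, "2019-07-22 11:32", "Branches"),
        (405012, "2019-08-26 13:25", "Branches"),
        (405012, "2019-08-26 15:44", "Support"),
        (405012, "2019-08-28 11:22", "Support"),
        (405012, "2019-08-28 11:47", "Technical"),
        (405012, "2019-08-28 15:32", "Technical"),
    ],
)

# The difference between the overall row number and the per-group row number
# stays constant while the group is unchanged, so it labels each stint.
ISLANDS_SQL = """
WITH numbered AS (
    SELECT id, update_time, assigned_group,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY update_time)
         - ROW_NUMBER() OVER (PARTITION BY id, assigned_group
                              ORDER BY update_time) AS island
    FROM service_req_history
)
SELECT MIN(update_time) AS first_update,
       MAX(update_time) AS last_update,
       assigned_group
FROM numbered
GROUP BY id, assigned_group, island
ORDER BY first_update
"""

stints = conn.execute(ISLANDS_SQL).fetchall()
for first_update, last_update, group in stints:
    print(first_update, last_update, group)
```

With this sample data the query returns four rows, one per stint, with Support appearing twice.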

Related

SQL: How to create supplemental time-series records "out of thin air" from existing records

Suppose I have a table CUSTEVENTS listing customers active in certain months. I now want to consider a customer active in a given month if it was active in that month or in either of the prior two months.
Simple example, the data might start as:
MONTH_ENDING | CUSTNUM
2022-10-31   | 72378
2022-11-30   | 72378
It should be transformed into the following, given the expanded definition of active:
MONTH_ENDING | CUSTNUM
2022-10-31   | 72378
2022-11-30   | 72378
2022-12-31   | 72378 (new)
2023-01-31   | 72378 (new)
I'm trying to arrive at the simplest / most elegant way to get there. I could certainly explode out the data using a time series reference table that lists all the pairs of MONTH_ENDING and "additional" MONTH_ENDING values that "count". Or perhaps I could UNION three subqueries that take MONTH_ENDING, add_months(MONTH_ENDING,1), and add_months(MONTH_ENDING,2). But maybe there's something even more concise that doesn't involve multiple unioned queries or an instrumental time-mapping table.
I happen to be using Teradata but I'm not sure I care about platform-specificity; if there's a Teradata-only approach that works, I'll gladly take it.

The general approach is to first calculate the previous event time for a given customer, which is handled by something like
LAG(EVENT_DT) OVER (PARTITION BY CUSTNUM ORDER BY EVENT_DT)
The next concept is islands. An island begins when an event happens after {your window} has elapsed since the prior one. Likewise, an island ends when the next event is more than {your window} away.
There are some great online articles about this classic problem, known as the Gaps and Islands problem.
If you understand CTEs, you can probably follow this example code I wrote. The first CTE is there simply to let you add a condition (in place of 1=1) for the events you care about.
WITH CTE_CONDITION AS (
SELECT
EVENT_DT AS dtm,
CUSTNUM
FROM
My_First_Table
WHERE
1 = 1
AND EVENT_DT is not null
),
CTE_LAGGED AS (
SELECT
dtm,
CUSTNUM,
LAG(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS previous_datetime,
LEAD(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS next_datetime,
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
CTE_CONDITION.dtm
) AS island_location
FROM
CTE_CONDITION
),
CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_start_datetime,
island_location AS island_start_location
FROM
CTE_LAGGED
WHERE
(
DATEDIFF(MONTH, previous_datetime, dtm) > 2
OR CTE_LAGGED.previous_datetime IS NULL
)
),
CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_end_datetime,
island_location AS island_end_location
FROM
CTE_LAGGED
WHERE
DATEDIFF(MONTH, dtm, next_datetime) > 2
OR CTE_LAGGED.next_datetime IS NULL
)
SELECT
CTE_ISLAND_START.CUSTNUM,
CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime,
DATEDIFF(
MONTH, CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime
) AS ISLAND_DURATION_MONTH,
(
SELECT
COUNT(*)
FROM
CTE_LAGGED
WHERE
CTE_LAGGED.dtm BETWEEN CTE_ISLAND_START.island_start_datetime
AND CTE_ISLAND_END.island_end_datetime
AND CTE_LAGGED.CUSTNUM = CTE_ISLAND_START.CUSTNUM
) AS island_row_count
FROM
CTE_ISLAND_START
INNER JOIN CTE_ISLAND_END ON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number
AND CTE_ISLAND_START.CUSTNUM = CTE_ISLAND_END.CUSTNUM
I wrote this as a Rasgo template using Snowflake syntax, but only minor adjustments should be needed to get it working in Teradata.
This result gives you the periods of activity, with gaps inside the 2 month window bridged. You can then take a calendar table with one row per month-begin and flag the customer as "active" or not based on whether that date falls into one of these active ranges.
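For the small example in the question, the asker's own idea of adding 0, 1, and 2 months can also be written as a single cross join against a three-row offset list instead of three unioned subqueries. The following is a minimal sketch under stated assumptions: it uses SQLite through Python's sqlite3 module so that it is self-contained, the `offsets` CTE is a hypothetical helper, and the month-end arithmetic relies on SQLite's date modifiers. On Teradata the add_months function mentioned in the question would take that role, and its month-end behaviour would need checking.

```python
import sqlite3

# In-memory SQLite table standing in for CUSTEVENTS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTEVENTS (MONTH_ENDING TEXT, CUSTNUM INTEGER)")
conn.executemany(
    "INSERT INTO CUSTEVENTS VALUES (?, ?)",
    [("2022-10-31", 72378), ("2022-11-30", 72378)],
)

# Each event row is repeated for offsets 0, 1 and 2 months.
# Month end n months later = first day of the month, plus n+1 months, minus 1 day.
# DISTINCT removes the overlaps between consecutive active months.
EXPAND_SQL = """
WITH offsets(n) AS (VALUES (0), (1), (2))
SELECT DISTINCT
       date(e.MONTH_ENDING, 'start of month',
            '+' || (o.n + 1) || ' months', '-1 day') AS MONTH_ENDING,
       e.CUSTNUM
FROM CUSTEVENTS AS e
CROSS JOIN offsets AS o
ORDER BY 1, 2
"""

active = conn.execute(EXPAND_SQL).fetchall()
for month_ending, custnum in active:
    print(month_ending, custnum)
```

With the two sample rows this yields the four month-ends from 2022-10-31 through 2023-01-31, matching the transformed table in the question.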

How to remove non exact duplicates in SQL Server

Currently I can pull data from each report, filtered by case type and by open status, for each case report that I want.
However, because a case can be open over several months, I only want the first month in which it appears. For instance, a case could be open in reports 201904 and 201905 and then be reopened in 201911. A lot of the information on that case changes, so the rows are not exact duplicates, but I am only after the data for the case from the 201904 report.
Currently I am using the following code:
Select ReportDate, CaseNo, Est, CaseType
From output.casedata
Where casetype='family' and Status='Open' AND (
Reportdate='201903' OR Reportdate='201904' OR Reportdate='201905'
or Reportdate='201906' or Reportdate='201907' or Reportdate='201908'
or Reportdate='201909' or Reportdate='201910' or Reportdate='201911'
or Reportdate='201912' or Reportdate='202001' or Reportdate='202002'
)

You can use the rank window function to find the row with the first date per case number, and then take all the details from it:
SELECT *
FROM (SELECT *, RANK() OVER (PARTITION BY CaseNo ORDER BY Reportdate) AS rk
FROM output.casedata
WHERE casetype = 'family' AND status='Open') t
WHERE rk = 1

If I followed you correctly, you want the earliest open record per case.
The following should do what you expect:
select c.*
from output.casedata c
where c.reportdate = (
select min(c1.reportdate)
from output.casedata c1
where
c1.caseno = c.caseno
and c1.casetype = 'family'
and c1.status = 'Open'
and c1.reportdate between '201903' and '202002'
)
For performance, you want an index on (caseno, casetype, status, reportdate).
Note that I simplified the filter on reportdate to use between instead of enumerating all possible values.

Get sum of previous 6 values including the group

I need to sum the values for the last 7 days, that is, the current day plus the previous 6. This should happen for every row, so each row's column value is the current value plus the previous 6.
(Note: I will calculate the hours by summing up the seconds.)
I tried the following query:
select SUM([drivingTime]) OVER(PARTITION BY driverid ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from [f.DriverHseCan]
The problem is that I first have to group by driver and asset for each date.
That is, the driving time for a date should be summed first, and then the previous 6 rows should be taken.
I can't do this using rank() because I still need the individual rows, as I have to show them in the report.
I have tried doing this in both SSRS and SQL.
In short, I need the total driving time for the current day plus the 6 previous days.

Try the following query
SELECT
s.date
, s.driverid
, s.assetid
, s.drivingtime
, SUM(s2.drivingtime) AS total_drivingtime
FROM f.DriverHseCan s
JOIN (
SELECT date,driverid, SUM(drivingtime) drivingtime
FROM f.DriverHseCan
GROUP BY date,driverid
) AS s2
ON s.driverid = s2.driverid AND s2.date BETWEEN DATEADD(d,-6,s.date) AND s.date
GROUP BY
s.date
, s.driverid
, s.assetid
, s.drivingtime
If you have week start/end dates, there may be better-performing alternatives, e.g. using the week number in SSRS expressions rather than doing the self join in SQL Server.

I think aggregation does what you want:
select sum(sum([drivingTime])) over (partition by driverid
order by date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
from [f.DriverHseCan]
group by driverid, date
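The same idea can be spelled out in two steps: aggregate to one row per driver and date, then apply the window to those daily totals. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module, a table named DriverHseCan without the f schema, and made-up sample rows. One caveat worth keeping in mind: ROWS BETWEEN 6 PRECEDING counts rows, not calendar days, so if a driver has no rows on some date the window reaches further back than 7 days.

```python
import sqlite3

# In-memory SQLite table with hypothetical sample data (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DriverHseCan "
    "(driverid INTEGER, assetid TEXT, date TEXT, drivingTime INTEGER)"
)
# Driver 1 uses asset A for 10 seconds on each of 8 days,
# plus asset B for 5 seconds on the first day.
sample = [(1, "A", f"2020-01-{day:02d}", 10) for day in range(1, 9)]
sample.append((1, "B", "2020-01-01", 5))
conn.executemany("INSERT INTO DriverHseCan VALUES (?, ?, ?, ?)", sample)

# Step 1: one total per driver and date. Step 2: sum the current row
# and the 6 preceding daily totals for the same driver.
ROLLING_SQL = """
WITH per_day AS (
    SELECT driverid, date, SUM(drivingTime) AS day_total
    FROM DriverHseCan
    GROUP BY driverid, date
)
SELECT driverid, date, day_total,
       SUM(day_total) OVER (PARTITION BY driverid ORDER BY date
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last7_total
FROM per_day
ORDER BY driverid, date
"""

rolling = conn.execute(ROLLING_SQL).fetchall()
for row in rolling:
    print(row)
```

In this sample the seventh day totals 75 (the first day's 15 plus six days of 10), and the eighth day drops back to 70 once the first day leaves the window.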

I guess you need to use CROSS APPLY.
Something like the following?
SELECT YT.driverID,
YT.date,
CA.Last6DayDrivingTime
FROM YourTable YT
CROSS APPLY
(
SELECT SUM(T.drivingTime) AS Last6DayDrivingTime
FROM YourTable T
WHERE T.driverID = YT.driverID
AND T.date BETWEEN DATEADD(DAY,-6,YT.date) AND YT.date
) CA
Edit:
As you commented that CROSS APPLY slows down performance, another option is to pre-calculate the weekly values in a temp table or a CTE and then use them in your main query.

SQL Query GroupBy with date parameter

Suppose I have a table, TeamRatings, that looks something like this
|---Team----|--ValuationDate--|-Rating-|
|--Saints---|---10/15/2012----|---81.1-|
|--Broncos--|---10/15/2012----|---91.1-|
|--Ravens---|---10/16/2012----|--101.1-|
|--Broncos--|---10/22/2012----|---82.1-|
|--Ravens---|---10/22/2012----|---83.1-|
|--Saints---|---10/29/2012----|---84.1-|
|--Broncos--|---10/28/2012----|---85.1-|
|--Ravens---|---10/29/2012----|---86.1-|
Also, it is assumed that a team's rating remains unchanged until they play a new game (represented by a new record). E.g. the Broncos' rating on 10/21/2012 is assumed to be 91.1.
I want a query with a date parameter that will return one record per team, representing that team's most recent game prior to the date specified. For instance,
If I input 10/23/2012 as my date parameter, the query should return
|---Team---|-ValuationDate---|-Rating-|
|--Saints--|---10/15/2012----|---81.1-|
|--Broncos-|---10/22/2012----|---82.1-|
|--Ravens--|---10/22/2012----|---83.1-|
Any help is greatly appreciated. Thanks!

On MS SQL Server 2005 or greater you can use a cte with ROW_NUMBER function:
WITH x
AS (SELECT team,
valuationdate,
rating,
rn = Row_number()
OVER(
partition BY team
ORDER BY valuationdate DESC)
FROM teamratings
WHERE valuationdate < @DateParam)
SELECT team,
valuationdate,
rating
FROM x
WHERE rn = 1

You can use a more general query like this:
select TeamRatings.Team, x.ValuationDate, TeamRatings.Rating
from TeamRatings inner join
(
select Team, max(ValuationDate) as ValuationDate
from TeamRatings
where ValuationDate < @dateParameter
group by Team
) x on TeamRatings.Team = x.Team and TeamRatings.ValuationDate = x.ValuationDate
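To show how the date parameter might be supplied from application code, here is a minimal sketch that runs the ROW_NUMBER variant with a bound parameter. It assumes SQLite through Python's sqlite3 module, and it assumes the dates are stored as ISO text (YYYY-MM-DD) so that string comparison orders them chronologically. The parameter name `as_of` is hypothetical. The rows are the sample data from the question.

```python
import sqlite3

# In-memory SQLite table holding the question's sample data,
# with dates rewritten in ISO format (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TeamRatings (Team TEXT, ValuationDate TEXT, Rating REAL)")
conn.executemany(
    "INSERT INTO TeamRatings VALUES (?, ?, ?)",
    [
        ("Saints", "2012-10-15", 81.1),
        ("Broncos", "2012-10-15", 91.1),
        ("Ravens", "2012-10-16", 101.1),
        ("Broncos", "2012-10-22", 82.1),
        ("Ravens", "2012-10-22", 83.1),
        ("Saints", "2012-10-29", 84.1),
        ("Broncos", "2012-10-28", 85.1),
        ("Ravens", "2012-10-29", 86.1),
    ],
)

# Number each team's rows from newest to oldest among those before the
# cutoff date, then keep only the newest one (rn = 1).
LATEST_SQL = """
WITH ranked AS (
    SELECT Team, ValuationDate, Rating,
           ROW_NUMBER() OVER (PARTITION BY Team
                              ORDER BY ValuationDate DESC) AS rn
    FROM TeamRatings
    WHERE ValuationDate < :as_of
)
SELECT Team, ValuationDate, Rating
FROM ranked
WHERE rn = 1
ORDER BY Team
"""

latest = conn.execute(LATEST_SQL, {"as_of": "2012-10-23"}).fetchall()
for row in latest:
    print(row)
```

For the cutoff 2012-10-23 this returns the same three records as the expected output in the question, here sorted by team name.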

Datediff between two tables

I have these two tables:
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
TransID is the key between the two tables, and I can't modify them. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one row per transaction in each table, but how do I calculate it when an item gets queued more than once?

Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered. You can then join on both the row number and TransID to stop any cross joining.
Example on SQL Fiddle
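The pairing logic can be tried out on the question's own data. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module, ISO-formatted dates, and julianday arithmetic in place of DATEDIFF. Unlike the query above, it leaves the day count as NULL for a transaction that has not been removed yet instead of measuring up to the current time, so that the output does not depend on when it is run.

```python
import sqlite3

# In-memory SQLite tables with the question's rows, dates in ISO format (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT)")
conn.execute("CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT)")
conn.executemany(
    "INSERT INTO AddTable VALUES (?, ?)",
    [(10, "2012-10-10"), (11, "2012-10-14"), (11, "2012-11-18"),
     (11, "2012-12-25"), (12, "2013-01-01")],
)
conn.executemany(
    "INSERT INTO RemoveTable VALUES (?, ?)",
    [(10, "2013-01-15"), (11, "2012-12-12"), (11, "2013-01-13"),
     (11, "2013-01-20")],
)

# Number adds and removes separately per transaction, then pair the
# n-th add with the n-th remove of the same transaction.
PAIR_SQL = """
WITH adds AS (
    SELECT TransID, AddDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY AddDate) AS rn
    FROM AddTable
), removes AS (
    SELECT TransID, RemovedDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY RemovedDate) AS rn
    FROM RemoveTable
)
SELECT a.TransID, a.AddDate, r.RemovedDate,
       CAST(julianday(r.RemovedDate) - julianday(a.AddDate) AS INTEGER) AS DaysInQueue
FROM adds AS a
LEFT JOIN removes AS r
       ON r.TransID = a.TransID AND r.rn = a.rn
ORDER BY a.TransID, a.AddDate
"""

pairs = conn.execute(PAIR_SQL).fetchall()
for row in pairs:
    print(row)
```

Each add is matched with exactly one remove, so transaction 11 produces three rows rather than the nine a join on TransID alone would give.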

DISCLAIMER: There are probably problems with this, but I hope it points you in one possible direction. Expect to have to adapt it.
You can try something along the following lines (which might work in some form depending on your system, version, etc.):
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS t
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check whether UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum within each table, build the union by conveniently leaving the other column as zero, and subtract the two sums in the outer query.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days, or by whatever unit you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.

Answer 2: after your comments. (As a side note, some of your dates, e.g. 15/1/2013 and 13/1/2013, are not in a proper date format.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
This assumes that a new record won't be added until the previous one has been removed. Also note that the following query returns numberOfDays as zero for unremoved records:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate
