Summarize time in area - SQL

I have a large amount of data on technicians, who can be OnSite or OnTheWay.
I want to summarize which sites they have been at and for how long.
Example:
id      UpdateTime               UserName  SiteID
488565  2019-02-18 19:07:24.000  stephen   null
488388  2019-02-18 17:34:52.000  stephen   297
488558  2019-02-18 18:06:48.000  stephen   297
488565  2019-02-18 18:07:24.000  stephen   297
488565  2019-02-18 14:07:24.000  stephen   null
483170  2019-02-18 13:53:14.000  stephen   299
488565  2019-02-18 11:07:24.000  stephen   null
483170  2019-02-18 10:53:14.000  stephen   297
The technician was at site 297 twice that day. I want to get this result per technician (the end time is when I get a null or a different SiteID):
UserName  InComeTime               TimeInSite(min)  SiteID
stephen   2019-02-18 10:53:14.000  14               297
stephen   2019-02-18 13:53:14.000  14               299
stephen   2019-02-18 17:34:52.000  153              297
thanks,
eyal

I can't comment yet for lack of reputation, so I'll post this as an answer even though some questions remain. In principle you can join the null-site records onto the not-null site records. If you can't guarantee that a null siteId means 'exit' and a not-null siteId means 'entry', then there is no 'starting point' and you'd need to do a table scan. If you can guarantee it (or deal with exceptions separately), the query could take the following form:
select t1.UserName,
       t1.UpdateTime as EntryTime,
       t2.UpdateTime as ExitTime,
       datediff(MI, t1.UpdateTime, t2.UpdateTime) as TimeInSite,
       t1.SiteId
from TimeTable t1
join TimeTable t2 on t2.id in
    (select id from TimeTable
     where
         -- want the same user
         UserName = t1.UserName
         -- site id null/different means 'exited site'
         and (siteId is null)
         -- now get the entry with the minimum update time that is greater than the entry time
         and UpdateTime = (select min(UpdateTime) from TimeTable where UpdateTime > t1.UpdateTime
         )
    )
where t1.SiteId is not null
order by EntryTime
This does not take into account that you can have multiple 'not-null' siteIds for the same visit (i.e. the three 297s). Ideally this should be avoided. If you can't avoid it, you could first collate those entries into a temp table so that only the first entry time is picked.
The above query outputs the following (SQL Server; note that I have added entry and exit time for clarity). It is not 100% what you wanted because of the multiple 297s, but maybe it gets you started. I'm out of time now; maybe someone else can provide a 100% solution. Good luck!
UserName EntryTime ExitTime TimeInSite SiteId
------------ ----------------------- ----------------------- ----------- -----------
stephen 2019-02-18 10:53:14.000 2019-02-18 11:07:24.000 14 297
stephen 2019-02-18 13:53:14.000 2019-02-18 14:07:24.000 14 299
stephen 2019-02-18 18:07:24.000 2019-02-18 19:07:24.000 60 297

You can do this with window functions. You want to assign groups to the rows and then aggregate. How is the grouping defined?
In this case, you want to include the next NULL value in the group. So, a definition that works for you is the number of NULL values accumulated in reverse order. That is:
select t.*,
sum(case when siteId is null then 1 else 0 end) over (partition by userName order by updatetime desc) as grp
from t;
Then you can aggregate to get what you want:
select username, min(siteid) as siteid,
min(updatetime) as incometime,
datediff(minute, min(updatetime), max(updatetime)) as minutes
from (select t.*,
sum(case when siteId is null then 1 else 0 end) over (partition by userName order by updatetime desc) as grp
from t
) t
group by username, grp;
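A minimal sketch of the grouping idea above, run against SQLite from Python so it can be tried without a server. The table name t follows the answer; the rows are the question's sample with the id column left out. The minute arithmetic via strftime('%s', ...) is an SQLite stand-in for datediff, so treat it as an assumption rather than the answer's exact query. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (updatetime text, username text, siteid integer)")
con.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("2019-02-18 19:07:24", "stephen", None),
        ("2019-02-18 17:34:52", "stephen", 297),
        ("2019-02-18 18:06:48", "stephen", 297),
        ("2019-02-18 18:07:24", "stephen", 297),
        ("2019-02-18 14:07:24", "stephen", None),
        ("2019-02-18 13:53:14", "stephen", 299),
        ("2019-02-18 11:07:24", "stephen", None),
        ("2019-02-18 10:53:14", "stephen", 297),
    ],
)

# grp = number of NULL-site rows seen so far when walking backwards in time,
# so every visit shares a group number with the NULL row that closes it.
query = """
select username,
       min(siteid)     as siteid,
       min(updatetime) as incometime,
       (cast(strftime('%s', max(updatetime)) as integer)
        - cast(strftime('%s', min(updatetime)) as integer)) / 60 as minutes
from (select t.*,
             sum(case when siteid is null then 1 else 0 end)
                 over (partition by username order by updatetime desc) as grp
      from t) g
group by username, grp
order by incometime
"""
visits = con.execute(query).fetchall()
for row in visits:
    print(row)
```

On the sample rows this yields three visits (297, 299, 297). The last one comes out at 92 whole minutes (17:34:52 to 19:07:24), not the 153 shown in the question.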

How to select a foreign key after narrowing down via Group By and Having in a subquery

I've got a unique problem. I'm querying a replicated database table cost_plan_breakdown, and the replication is known to have some duplicates due to issues with deleting records. I'm not the Admin so I'm trying to sidestep these duplicates as efficiently as possible. The table looks like this:
sys_id  sys_created_on       cost_plan                       breakdown_start_date
axr123  2020-10-01 09:31:15  Outlook KTLO - Lisa Lymon       10-01-2020
pqo100  2020-12-23 05:50:20  Outlook KTLO - Lisa Lymon       10-01-2020
cji985  2020-10-01 09:31:15  Outlook KTLO - Lisa Lymon       11-01-2020
twg795  2020-10-05 13:23:08  DataPyramid CTB - Dave Dods     10-01-2020
jqr820  2020-09-28 16:11:54  Revoluccion CTB - Marcus Vance  11-01-2020
vjo150  2021-01-13 11:10:09  Server KTLO - Tom Smith         10-01-2020
Cost Plans typically have between 1 and 12 breakdowns during their lifespan, but there should only be one breakdown per cost plan per month. Notice that the Outlook Cost Plan has two breakdowns within the same month (October) with differing sys_id and sys_created_on.
So by using a smaller subquery in the where clause, I'm trying to determine the following:
"Group the rows with identical month and year of breakdown_start_date, and identical cost_plan. Of the remaining rows, select the one with the MAX sys_created_on. Take the sys_id of that row and feed it to the parent query to only include these rows."
...rest of query above
WHERE cpb.breakdown_type = 'requirement'
AND cpb.sys_id IN
(SELECT cpb2.sys_id
FROM cost_plan_breakdown cpb2
GROUP BY cpb2.name,
YEAR(cpb2.start_date_time),
MONTH(cpb2.start_date_time)
HAVING MAX(cpb2.sys_created_on))
At this point, I'm running into the error
cpb2.sys_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I've previously semi-solved this by putting the MAX sys_created_on in the SELECT statement, and matching off that, but I realized that could pull in unwanted dupe records just because they match the sys_created_on of another.
I feel like the solution may be staring me in the face, but I'm stuck. Appreciate your help!
Use row_number to number the duplicate rows and then exclude them. Ordering the row number by sys_created_on desc ensures you get the latest of each per month.
create table #Test (sys_id varchar(6), sys_created_on datetime2(0), cost_plan varchar(32), breakdown_start_date date);
insert into #Test (sys_id, sys_created_on, cost_plan, breakdown_start_date)
values
('axr123', '2020-10-01 09:31:15', 'Outlook KTLO - Lisa Lymon', '10-01-2020'),
('pqo100', '2020-12-23 05:50:20', 'Outlook KTLO - Lisa Lymon', '10-01-2020'),
('cji985', '2020-10-01 09:31:15', 'Outlook KTLO - Lisa Lymon', '11-01-2020'),
('twg795', '2020-10-05 13:23:08', 'DataPyramid CTB - Dave Dods', '10-01-2020'),
('jqr820', '2020-09-28 16:11:54', 'Revoluccion CTB - Marcus Vance', '11-01-2020'),
('vjo150', '2021-01-13 11:10:09', 'Server KTLO - Tom Smith', '10-01-2020');
with cte as (
select *
, row_number() over (partition by cost_plan, datepart(year,breakdown_start_date), datepart(month,breakdown_start_date) order by sys_created_on desc) rn
from #Test
)
select *
from cte
where rn = 1;
As per your comments this (the CTE) is just a neat way to write a sub-query/derived table and can still be written as follows:
select *
from (
select *
, row_number() over (partition by cost_plan, datepart(year,breakdown_start_date), datepart(month,breakdown_start_date) order by sys_created_on desc) rn
from #Test
) cte
where rn = 1;
Note: If you provide DDL+DML as shown above you make it much easier for people to assist.
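A small runnable sketch of the same row_number approach, using SQLite from Python instead of SQL Server. The table name test is hypothetical, and the breakdown dates are rewritten as ISO strings on the assumption that 10-01-2020 means month-day-year; strftime('%Y-%m', ...) stands in for the two datepart calls.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table test (sys_id text, sys_created_on text, "
    "cost_plan text, breakdown_start_date text)"
)
con.executemany("insert into test values (?, ?, ?, ?)", [
    ("axr123", "2020-10-01 09:31:15", "Outlook KTLO - Lisa Lymon", "2020-10-01"),
    ("pqo100", "2020-12-23 05:50:20", "Outlook KTLO - Lisa Lymon", "2020-10-01"),
    ("cji985", "2020-10-01 09:31:15", "Outlook KTLO - Lisa Lymon", "2020-11-01"),
    ("twg795", "2020-10-05 13:23:08", "DataPyramid CTB - Dave Dods", "2020-10-01"),
    ("jqr820", "2020-09-28 16:11:54", "Revoluccion CTB - Marcus Vance", "2020-11-01"),
    ("vjo150", "2021-01-13 11:10:09", "Server KTLO - Tom Smith", "2020-10-01"),
])

# Number the rows inside each (cost_plan, year-month) group, newest first,
# and keep only row 1 of each group.
kept = [r[0] for r in con.execute("""
    select sys_id
    from (select sys_id,
                 row_number() over (
                     partition by cost_plan, strftime('%Y-%m', breakdown_start_date)
                     order by sys_created_on desc) as rn
          from test) ranked
    where rn = 1
    order by sys_id
""")]
print(kept)
```

Only axr123 is dropped, because pqo100 is the later record for the same cost plan and month. The list of surviving sys_id values is what the outer query's IN (...) filter would receive.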

How to get the set size, first and last record in a db2 ordered set with one call

I have a very big transaction table on DB2 v11, and I need to query a subset of it as efficiently as possible. All I need is the total count of the set (not known in advance; it's based on criteria, let's say 1 day), the ID of the first record, and the ID of the last record.
The old code was fetching the entire table, then using only the first record ID, the last record ID, and the size, and not making use of the rest. Now this code is timing out. It's a complex query of several joins.
Is there a way to fetch the size of the set, the first record, and the last record all in one select query?
I've read that reordering the list in order to fetch the first record (so fetch with Desc, then change to Asc) is not efficient.
sample table 1 TRANSACTION_RECORDS:
tdID TIMESTAMP name
-------------------------------
123 2020-03-31 john
234 2020-03-31 dan
456 2020-03-01 Eve
675 2020-04-01 joy
sample table 2 TRANSACTION_TYPE:
invoiceId tdID account
------------------------------
897 123 abc
898 123 def
877 234 mnc
899 456 opp
Sample query
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
group by tr.tdID
order by TR.tdID ASC
This results in multiple rows (but it requires the group by):
123,123
234,234
456,456
What I want is:
123,456
As I mentioned in the comments, this query needs neither GROUP BY nor ORDER BY. Just do:
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
It should work as expected
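As a hedged illustration, the same aggregate query can also return the set size in the same pass. The sketch below uses SQLite from Python rather than DB2. It selects tdID because the sample table has no transaction_id column, and it renames the TIMESTAMP column to ts; both are assumptions made to keep the example self-contained.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table transaction_records (tdID integer, ts text, name text);
create table transaction_type (invoiceId integer, tdID integer, account text);
insert into transaction_records values
  (123, '2020-03-31', 'john'), (234, '2020-03-31', 'dan'),
  (456, '2020-03-01', 'Eve'),  (675, '2020-04-01', 'joy');
insert into transaction_type values
  (897, 123, 'abc'), (898, 123, 'def'), (877, 234, 'mnc'), (899, 456, 'opp');
""")

# No GROUP BY: the aggregates run over the whole filtered set and give one row.
first_id, last_id, joined_rows, distinct_ids = con.execute("""
    select min(tr.tdID), max(tr.tdID), count(*), count(distinct tr.tdID)
    from transaction_records tr
    join transaction_type tt on tr.tdID = tt.tdID
    where date(tr.ts) = '2020-03-31'
""").fetchone()
print(first_id, last_id, joined_rows, distinct_ids)
```

On the sample rows the first and last IDs are 123 and 234, since 456 is dated 2020-03-01. count(*) counts joined rows (3 here, because 123 has two invoices), while count(distinct tr.tdID) counts transactions (2).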

Trouble pivoting data in DB2

Before this one is marked as a duplicate, please know that I have done my research on pivoting in DB2 (even though DB2 doesn't have PIVOT) from these links:
Pivoting in DB2 on SO and IBM Developers, but I just can't make sense of how to do it with my data and need some help. I tried to adapt my string using examples from both of those links and could not get it to work. I'm not asking anyone to write the full code for me, just to point me in the right direction on how to change my string to retrieve the desired result. Thank you in advance.
Current String:
SELECT
cfna1 AS "Customer Name", cfrisk AS "Risk Rating", cfrirc AS "Rated By", date(digits(decimal(cfrid7 + 0.090000, 7, 0))) AS "Risk Rated Date",cfuc3n3 AS "Credit Score", date(digits(decimal(cf3ud7 + 0.090000, 7, 0))) AS "CR Date"
FROM cncttp08.jhadat842.cfmast cfmast
WHERE cfcif# IN ('T000714', 'T000713', 'T000716', 'T000715')
ORDER BY
CASE cfcif#
WHEN 'T000714' THEN 1
WHEN 'T000713' THEN 2
WHEN 'T000716' THEN 3
WHEN 'T000715' THEN 4
END
Result as expected from String:
Customer Name | Risk Rating | Rated By | Risk Rated Date | Credit Score | CR Date
Elmer Fudd 8 MLA 2018-02-08 777 2018-02-08
Result I would like to achieve:
Elmer Fudd
Risk Rating 8
Rated By MLA
Risk Rated Date 2018-02-08
Credit Score 777
CR Date 2018-02-08
Use the unpivot method suggested in the IBM Developers link, and use cast to convert all columns to varchar so they can share one value column.
Example:
select st1.id1, unpivot1.col1, unpivot1.val1
from (
select id1, char1 , date1, number1
from sometable
) st1,
lateral (values
('char col', cast(st1.char1 as varchar(100))),
('date col', cast(st1.date1 as varchar(100))),
('number col', cast(st1.number1 as varchar(100)))
) as unpivot1 (col1, val1)
order by st1.id1
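For readers without a DB2 instance, here is a rough sketch of the same unpivot-with-cast idea in SQLite, driven from Python. It swaps the lateral (values ...) construct for a plain UNION ALL so that it runs there. The cfmast column names, the pos ordering column, and the pairing of T000714 with Elmer Fudd are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table cfmast (cif text, name text, risk integer, rated_by text, "
    "rated_date text, score integer, cr_date text)"
)
con.execute(
    "insert into cfmast values "
    "('T000714', 'Elmer Fudd', 8, 'MLA', '2018-02-08', 777, '2018-02-08')"
)

# One SELECT per source column; every value is cast to text so the
# branches of the UNION ALL share a single value column.
unpivoted = con.execute("""
    select label, val from (
        select cif, 1 as pos, 'Customer Name' as label, name as val from cfmast
        union all select cif, 2, 'Risk Rating', cast(risk as text) from cfmast
        union all select cif, 3, 'Rated By', rated_by from cfmast
        union all select cif, 4, 'Risk Rated Date', rated_date from cfmast
        union all select cif, 5, 'Credit Score', cast(score as text) from cfmast
        union all select cif, 6, 'CR Date', cr_date from cfmast
    ) u
    order by cif, pos
""").fetchall()
for label, val in unpivoted:
    print(label, val)
```

The pos column only exists to keep the labels in the order the question asks for.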
I don't think that output is possible in sql -- do you mean something like this?
id_group Data_Type Value
1 Name Elmer Fudd
1 Risk Rating 8
1 Rated By MLA
1 Risk Rated Date 2018-02-08
1 Credit Score 777
1 CR Date 2018-02-08
To do this we need another column that ties all the elements together. I called it "id_group"; it is the column that identifies the group.

I'm looking to find an average difference between a series of 2 rows same column in SQL

So I've looked through a lot of questions about subtraction and all that for SQL but haven't found the exact same use.
I'm using a single table and trying to find an average response time between two people talking on my site. Here's the data sample:
id created_at conversation_id sender_id receiver_id
307165 2017-05-03 20:03:27 96557 24 1755
307166 2017-05-03 20:04:22 96557 1755 24
303130 2017-04-20 18:03:53 102458 2518 4475
302671 2017-04-18 20:11:20 102505 3100 1079
302670 2017-04-18 20:09:38 103014 3100 2676
350570 2017-09-18 20:59:56 103496 5453 929
290458 2017-02-16 13:38:47 103575 2841 2282
300001 2017-04-08 16:42:16 104159 2740 1689
304204 2017-04-24 17:31:25 104531 5963 1118
284873 2017-01-12 22:33:19 104712 3657 3967
284872 2017-01-12 22:31:38 104712 3967 3657
What I want is to find an Average Response Time based on the conversation_id
Hmmm . . . You can get the "response" for a given row by getting the next row between the two conversers. The rest is getting the average -- which is database dependent.
Something like this:
select avg(next_created_at - created_at) -- exact syntax depends on the database
from (select m.*,
(select min(m2.created_at)
from messages m2
where m2.sender_id = m.receiver_id and m.sender_id = m2.receiver_id and
m2.conversation_id = m.conversation_id and
m2.created_at > m.created_at
) next_created_at
from messages m
) mm
where next_created_at is not null;
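To make the "database dependent" part concrete, here is one possible rendering for SQLite, run from Python. The table name messages follows the query above; the rows are a subset of the question's sample, and the subtraction of strftime('%s', ...) values is an SQLite-specific assumption for getting seconds.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table messages (id integer, created_at text, "
    "conversation_id integer, sender_id integer, receiver_id integer)"
)
con.executemany("insert into messages values (?, ?, ?, ?, ?)", [
    (307165, "2017-05-03 20:03:27", 96557, 24, 1755),
    (307166, "2017-05-03 20:04:22", 96557, 1755, 24),
    (303130, "2017-04-20 18:03:53", 102458, 2518, 4475),
    (284873, "2017-01-12 22:33:19", 104712, 3657, 3967),
    (284872, "2017-01-12 22:31:38", 104712, 3967, 3657),
])

# For each message, find the earliest later message sent back by the receiver
# in the same conversation, then average the gaps in seconds.
avg_seconds = con.execute("""
    select avg(cast(strftime('%s', next_created_at) as integer)
               - cast(strftime('%s', created_at) as integer))
    from (select m.*,
                 (select min(m2.created_at) from messages m2
                  where m2.sender_id = m.receiver_id
                    and m2.receiver_id = m.sender_id
                    and m2.conversation_id = m.conversation_id
                    and m2.created_at > m.created_at) as next_created_at
          from messages m) mm
    where next_created_at is not null
""").fetchone()[0]
print(avg_seconds)
```

Two messages in this subset have a reply (55 and 101 seconds later), so the average is 78 seconds; messages without a reply are skipped by the final filter.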
A CTE will take care of bringing the conversation start and end into the same row.
Then use DATEDIFF to compute the response time, and average it.
Assumes there are only ever two entries per conversation (ignores others with 1 or more than 2).
WITH X AS (
SELECT conversation_id, MIN(created_at) AS convstart, MAX(created_at) AS convend
FROM theTable
GROUP BY conversation_id
HAVING COUNT(*) = 2
)
SELECT AVG(DATEDIFF(second,convstart,convend)) AS AvgResponse
FROM X

Create column based on grouping other values

I have difficulties formulating my issue.
I have a view which brings these results. There's a need to add a column to the view, which will pair up round-trip flights with identical number.
Flt_No |From_Airport |To_Airport |Dep_Date        |RequiredResult
124    |LCA          |CDG        |10/19/14 5:00   |1
125    |CDG          |LCA        |10/19/14 10:00  |1
197    |LCA          |BCN        |10/4/12 5:00    |2
198    |BCN          |LCA        |10/4/12 11:00   |2
501    |LCA          |HER        |15/8/12 12:05   |3
502    |HER          |LCA        |15/8/12 15:15   |3
I.e. flight 124 is going from Larnaca to CDG, and flight 125 is going back from CDG to Larnaca - they both have to have the same identifier.
Round-trip flights will always have consecutive flight numbers.
I have a bunch of conditions which I won't write now.
Omitting hours is not an option, they're important.
I was thinking dense_rank() but I don't know how to create one identifier for 2 flights with different numbers, please help.
If your data is similar to the sample data posted, then the following query should give the required result:
SELECT *,
DENSE_RANK() OVER (ORDER BY CASE
WHEN From_Airport < To_Airport THEN From_Airport
ELSE To_Airport
END)
FROM mytable
Join conditions are not limited to simple equality. Assuming {Flight No, Departure, Destination} is unique on any one day, then a self join should do it:
select whatever
from flights outbound
inner join flights inbound on outbound.flt_no+1 = inbound.flt_no
and cast(outbound.dep_date as date)
= cast(inbound.dep_date as date)
and outbound.From_Airport = inbound.To_Airport
and outbound.To_Airport = inbound.From_Airport
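A sketch of how that self join could also produce the requested identifier, tried in SQLite from Python. The dense_rank over the outbound flight number and the pair_id name are my additions, not part of the answer. The departure dates are rewritten as ISO strings, which involves guessing the day/month order of the sample values; both legs of each pair fall on the same day either way.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table flights (flt_no integer, from_airport text, "
    "to_airport text, dep_date text)"
)
con.executemany("insert into flights values (?, ?, ?, ?)", [
    (124, "LCA", "CDG", "2014-10-19 05:00"),
    (125, "CDG", "LCA", "2014-10-19 10:00"),
    (197, "LCA", "BCN", "2012-10-04 05:00"),
    (198, "BCN", "LCA", "2012-10-04 11:00"),
    (501, "LCA", "HER", "2012-08-15 12:05"),
    (502, "HER", "LCA", "2012-08-15 15:15"),
])

# Pair each outbound flight with the next flight number on the same day
# flying the reverse route, then number the pairs.
pairs = con.execute("""
    select o.flt_no, i.flt_no,
           dense_rank() over (order by o.flt_no) as pair_id
    from flights o
    join flights i
      on i.flt_no = o.flt_no + 1
     and date(i.dep_date) = date(o.dep_date)
     and i.from_airport = o.to_airport
     and i.to_airport = o.from_airport
    order by o.flt_no
""").fetchall()
print(pairs)
```

Each result row carries both flight numbers of a round trip plus a shared pair_id, which can be joined back to the view to fill the RequiredResult column.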