We are a table called ticketing that tracks all the service tickets. One ticket can lead to another ticket which leads to another ticket indicated by the replaced_by_ticket_id field below
| ticket_id | is_current | replaced_by_ticket_id |
|-----------|------------|-----------------------|
| 134 | 0 | 240 |
| 240 | 0 | 321 |
| 321 | 1 | Null |
| 34 | 0 | 93 |
| 25 | 0 | 16 |
| 16 | 0 | 25 |
| 93 | 1 | Null |
How do I write a query to get the number of tickets leading to the current ones (321 & 93)? I mean I could join the table by itself, but there is no way of knowing how many times to join. Plus different tickets have different number of levels.
Here is the expected result of the query
| ticket_id | total_tickets |
|-----------|---------------|
| 321 | 3 |
| 93 | 4 |
What is the best way to do it?
You can use a recursive query; the trick is to keep track of the original "current" ticket, so you can aggregate by that in the outer query.
So:
with cte as (
select ticket_id, ticket_id as parent_id from ticketing where is_current = 1
union all
select c.ticket_id, t.ticket_id
from ticket t
inner join cte c on c.parent_id = t.replaced_by_ticket_id
)
select ticket_id, count(*) total_tickets
from cte
group by ticket_id
I need to be able to filter down a dataset to only show the first instance every 3 hours. If an instance is found, any other instances that occur up to 3 hours afterwards should be hidden.
The closes thing I've been able to find is using date_trunc to get the first instance each hour, but I need to hide specifically up to 3 hours after the first instance exactly.
Example Data:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 13:40:00" | 26 |
| "2015-12-29 13:45:00" | 80 |
| "2015-12-29 13:50:00" | 10 |
| "2015-12-29 16:40:00" | 76 |
| "2015-12-29 16:45:00" | 73 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 08:10:00" | 90 |
| "2016-01-04 08:15:00" | 52 |
| "2016-01-04 08:20:00" | 90 |
| "2016-01-04 08:25:00" | 23 |
| "2016-01-04 08:30:00" | 96 |
| "2016-01-04 13:35:00" | 53 |
| "2016-01-04 13:40:00" | 15 |
| "2016-01-04 13:45:00" | 85 |
+------------------------+-------+
Expected Result:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 16:40:00" | 76 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 13:30:00" | 7 |
+------------------------+-------+
Anyone have any ideas? Thank you so much for your help.
This is tricky, because you need to keep track of the last picked record to identify the next one - so you can't just group by 3 hours intervals.
Here is one approach using a recursive cte:
with recursive cte(ts, value) as (
select ts, value
from mytable
where ts = (select min(ts) from mytable)
union all
select x.*
from (select ts from cte order by ts desc limit 1) c
cross join lateral (
select t.ts, t.value
from mytable t
where t.ts >= c.ts + interval '3' hour
order by t.ts
limit 1
) x
)
select * from cte order by ts
The idea is to start from the earliest record in the table, then iterate by picking the first available record that is at least 3 hours later (this assumes no duplicates in the timestamp column).
Note that timestamp is not a good choice for a column name, because it conflicts with a language keyword (that's a datatype). I remaned it to ts in the query.
Demo on DB Fiddle:
ts | value
:------------------ | ----:
2015-12-29 13:35:00 | 65
2015-12-29 16:40:00 | 76
2016-01-04 08:05:00 | 87
2016-01-04 13:35:00 | 53
I am trying to computer the median number of transactions in each category.
A few notes (as the dataset below is a small snippet of a much larger dataset):
An employee can belong to multiple categories
Each transaction's median should be > 0
Not every person appears in every category
The data is set up like this:
| Person | Category | Transaction |
|:-------:|:--------:|:-----------:|
| PersonA | Sales | 27 |
| PersonB | Sales | 75 |
| PersonC | Sales | 87 |
| PersonD | Sales | 36 |
| PersonE | Sales | 70 |
| PersonB | Buys | 60 |
| PersonC | Buys | 92 |
| PersonD | Buys | 39 |
| PersonA | HR | 59 |
| PersonB | HR | 53 |
| PersonC | HR | 98 |
| PersonD | HR | 54 |
| PersonE | HR | 70 |
| PersonA | Other | 46 |
| PersonC | Other | 66 |
| PersonD | Other | 76 |
| PersonB | Other | 2 |
An ideal output would look like:
| Category | Median | Average |
|:--------:|:------:|:-------:|
| Sales | 70 | 59 |
| Buys | 60 | 64 |
| HR | 59 | 67 |
| Other | 56 | 48 |
I can get the average by:
SELECT
Category,
AVG(Transaction) AS Average_Transactions
FROM
table
GROUP BY
Category
And that works great!
This post tried to help me find the median. What I wrote was:
SELECT
Category,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM
table
GROUP BY
Category
But I get an error:
Msg 8120: Column 'Transactions' is invalid in the select list because it is not contained in either an aggregate function or the **GROUP BY** clause
How can I fix this?
You can do what you want using SELECT DISTINCT:
SELECT DISTINCT Category,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM table;
Unfortunately, SQL Server doesn't offer the PERCENTILE_ functions as window functions and doesn't have a MEDIAN() aggregation function. You can also do this using subqueries and counts.
It's not optimal but this is your solution
SELECT DISTINCT
category,
PERCENTILE_DISC(0.5)WITHIN GROUP(ORDER BY val) OVER (PARTITION BY category) AS Median_Transactions,
AVG(val) OVER (PARTITION BY d.category) [AVG]
FROM #data d;
I don't think this is pretty but it works. I didn't spend time on polishing it
with
avg_t as
( select category, avg(sales) as avg_sales
from sample
group by 1),
mn as
( select category, avg(sales) as median_sales
from (
select category, sales ,
row_number() over (partition by category order by sales asc) as r ,
count(person) over (partition by category) as total_count
from sample
) mn_sub
where (total_count % 2 = 0 and r in ( (total_count/2), ((total_count/2)+1)) ) or
(total_count % 2 <> 0 and r = ((total_count+1)/2))
group by 1
)
select avg_t.category, avg_t.avg_sales, mn.median_sales
from avg_t
inner join mn
on avg_t.category=mn.category
I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare #your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into #your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from #your_data
)
select *,
sum(days) over (partition by ClientId)
from data
https://rextester.com/HCKOR53440
You need a subquery for sum based on group by client_id and a join between you table the subquery eg:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id
I have been battling through this query/query design for sometime now and I thought it's time to ask the experts! Here's my table results:
ID | Status | date |
---------------------------------
05 | Returned | 20/6/2018 |
03 | Sent | 12/5/2018 |
01 | Pending | 07/6/2018 |
01 | Engaged | 11/4/2018 |
03 | Contacted | 16/4/2018 |
05 | Surveyed | 04/3/2017 |
05 | No Contact | 05/3/2017 |
How do I get it to return top/newest value for each ID:
ID | Status | date |
---------------------------------
05 | Returned | 20/6/2018 |
03 | Sent | 12/5/2018 |
01 | Pending | 07/6/2018 |
I've tried group by, TOP 1, Distinct and results still not what I wanted. Also, displaying the results by top 5% is won't do either as the ID can be more than just 3 types.
My QUERY below:
INSERT INTO TmpAllcomsEmployee ( StatusID, EmployeeID, CommunicationDate )
SELECT DISTINCT CommunicationLog.StatusID, TmpAllcomsEmployee.EmployeeID,
Max(CommunicationLog.CommunicationDate) AS MaxOfCommunicationDate
FROM CommunicationLog RIGHT JOIN TmpAllcomsEmployee ON
CommunicationLog.EmployeeID = TmpAllcomsEmployee.EmployeeID
GROUP BY CommunicationLog.StatusID, TmpAllcomsEmployee.EmployeeID
ORDER BY Max(CommunicationLog.CommunicationDate) DESC;
One method is a correlated subquery:
select cl.*
from CommunicationLog as cl
where cl.date = (select max(cl2.date)
from CommunicationLog as cl2
where cl2.EmployeeID = cl.EmployeeID
);
This gets the most recent record for each employee in CommunicationLog. You can join in the other table if you really need it. It does not seem unnecessary unless you are using it for filtering.