SQL MAX & MIN on same column around given date

I have a table structure like the following:
+------+------+-------------+
|Person| Code | DateOccur |
+------+------+-------------+
| 1545 | A | 1/1/2014 |
| 2324 | K | 3/4/2014 |
| 2324 | j | 3/8/2014 |
| 1545 | B | 3/6/2014 |
| 5663 | J | 3/2/2014 |
+------+------+-------------+
The Primary key could be seen as (Person, DateOccur), that is, there would never be two values with the same date for the same employee.
I want to form a query, given a particular date SomeDate, where I return the MAX(DateOccur) LESS than SomeDate as well as the MIN(DateOccur) GREATER than SomeDate, as well as the two associated codes.
An example with the table provided would be (w/ SomeDate = 3/5/2014):
+--------+------------+---------------+-----------+--------------+
| Person | CodeBefore | MaxDateBefore | CodeAfter | MinDateAfter |
+--------+------------+---------------+-----------+--------------+
| 1545 | A | 1/1/2014 | B | 3/6/2014 |
| 2324 | K | 3/4/2014 | j | 3/8/2014 |
| 5663 | J | 3/2/2014 | null | null |
+--------+------------+---------------+-----------+--------------+
What would be the simplest method of accomplishing this? I read a number of MAX/MIN StackOverflow questions, but I specifically need values around a specific date. I ideally just don't want to end up with something hacked together when I know there has to be a smooth way to do this.

Do this with conditional aggregation, then join the results back to the table to pick up the codes. Here is the code to get the dates:
select person,
       max(case when DateOccur < @TheDate then DateOccur end) as DateBefore,
       min(case when DateOccur > @TheDate then DateOccur end) as DateAfter
from table t
group by person;
And to get the rest:
select x.person, x.DateBefore, tbefore.code, x.DateAfter, tafter.code
from (select person,
             max(case when DateOccur < @TheDate then DateOccur end) as DateBefore,
             min(case when DateOccur > @TheDate then DateOccur end) as DateAfter
      from table t
      group by person
     ) x left outer join
     table tbefore
     on x.person = tbefore.person and x.DateBefore = tbefore.DateOccur left outer join
     table tafter
     on x.person = tafter.person and x.DateAfter = tafter.DateOccur;
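To check the approach above on the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The table name `events`, the ISO-formatted date strings (which compare correctly as text), and the cutoff date 2014-03-05 are assumptions made for this illustration, not part of the original question.

```python
import sqlite3

# Assumption: an in-memory SQLite table named "events" stands in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (Person INTEGER, Code TEXT, DateOccur TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1545, "A", "2014-01-01"),
        (2324, "K", "2014-03-04"),
        (2324, "j", "2014-03-08"),
        (1545, "B", "2014-03-06"),
        (5663, "J", "2014-03-02"),
    ],
)

# Conditional aggregation finds the closest date on each side of the cutoff;
# two left joins then fetch the code stored with each of those dates.
query = """
SELECT x.Person, tbefore.Code, x.DateBefore, tafter.Code, x.DateAfter
FROM (SELECT Person,
             MAX(CASE WHEN DateOccur < :d THEN DateOccur END) AS DateBefore,
             MIN(CASE WHEN DateOccur > :d THEN DateOccur END) AS DateAfter
      FROM events
      GROUP BY Person) x
LEFT JOIN events tbefore
       ON x.Person = tbefore.Person AND x.DateBefore = tbefore.DateOccur
LEFT JOIN events tafter
       ON x.Person = tafter.Person AND x.DateAfter = tafter.DateOccur
ORDER BY x.Person
"""
rows = conn.execute(query, {"d": "2014-03-05"}).fetchall()
for row in rows:
    print(row)
```

Person 5663 has no row after the cutoff, so the left join leaves both "after" columns NULL (None in Python).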

I assume you want to avoid using min/max -based subqueries? Perhaps you can conceive of a solution around the ROW_NUMBER() OVER ranking function? What do you want to happen if you have 2 rows in the first table with the same DateOccur value?
http://msdn.microsoft.com/en-us/library/ms186734.aspx

SQL Server: GROUP BY with multiple columns produces duplicate results

I'm trying to include a 3rd column into my existing SQL Server query but I am getting duplicate result values.
Here is an example of the data contained in tb_IssuedPermits:
| EmployeeName | Current |
|--------------|---------|
| Person A | 0 |
| Person A | 0 |
| Person B | 1 |
| Person C | 0 |
| Person B | 0 |
| Person A | 1 |
This is my current query which produces duplicate values based on 1 or 0 bit values.
SELECT EmployeeName, COUNT(*) AS Count, [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName, [Current]
| EmployeeName | Count | Current |
|--------------|-------|---------|
| Person A | 2 | 0 |
| Person B | 1 | 0 |
| Person C | 1 | 0 |
| Person A | 1 | 1 |
| Person B | 1 | 1 |
Any ideas on how I can amend my query to get the following expected result? I want one result row per EmployeeName, with Current set to 1 if any row for that EmployeeName has Current = 1, and 0 otherwise.
| EmployeeName | Count | Current |
|--------------|-------|---------|
| Person A | 3 | 1 |
| Person B | 2 | 1 |
| Person C | 1 | 0 |
The result does not need to be in any specific order.
TIA
If your Current column contains the string values 'FALSE' and 'TRUE' you can do this
SELECT EmployeeName, COUNT(*) AS Count,
       MAX([Current]) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName
It's a hack but it works: MAX will get the TRUE from each group if there is one.
If your Current column is a BIT, cast to INT and cast back, as @ThorstenKettner suggested.
SELECT EmployeeName,
       COUNT(*) AS Count,
       CAST(MAX(CAST([Current] AS INT)) AS BIT) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName
Alternatively, you can use conditional aggregation:
SELECT EmployeeName,
       COUNT(*) AS Count,
       CAST(COUNT(NULLIF([Current], 0)) AS BIT) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName
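As a quick check of the MAX-based idea on the sample data, here is a small sketch using Python's built-in sqlite3 module. SQLite has no BIT type, so the flag is stored as an INTEGER here and the casts are not needed; the output aliases `Cnt` and `IsCurrent` are names chosen only for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; [Current] is an INTEGER 0/1 flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_IssuedPermits (EmployeeName TEXT, [Current] INTEGER)")
conn.executemany(
    "INSERT INTO tb_IssuedPermits VALUES (?, ?)",
    [
        ("Person A", 0),
        ("Person A", 0),
        ("Person B", 1),
        ("Person C", 0),
        ("Person B", 0),
        ("Person A", 1),
    ],
)

# Grouping by name only gives one row per employee; MAX over a 0/1 flag
# is 1 as soon as any row in the group has the flag set.
query = """
SELECT EmployeeName, COUNT(*) AS Cnt, MAX([Current]) AS IsCurrent
FROM tb_IssuedPermits
GROUP BY EmployeeName
ORDER BY EmployeeName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The key change from the original query is dropping [Current] from the GROUP BY, which is what produced the duplicate rows.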
You can do it like this. Note that SUM returns the number of rows with Current = 1 rather than a 0/1 flag:
SELECT EmployeeName, COUNT(1) AS Count, SUM(CAST([Current] AS INT)) AS [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName

Find number of rows identical on some columns, but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
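The self join can be tried against the four sample rows from the question. This is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL (an assumption made so the example runs without a server); the query itself is plain SQL.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (project_id TEXT, task_id TEXT, ref_id TEXT, ref_value TEXT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("1", "1", "1", "1"),
        ("1", "1", "1", "2"),
        ("1", "2", "1", "1"),
        ("1", "2", "1", "2"),
    ],
)

# Pair every row with the rows that match on the three key columns
# but belong to a different task, then count the matches per task pair.
query = """
SELECT d1.task_id AS task_id_1, d2.task_id AS task_id_2, COUNT(*) AS conflicts
FROM data d1
JOIN data d2
  ON d1.project_id = d2.project_id
 AND d1.ref_id = d2.ref_id
 AND d1.ref_value = d2.ref_value
 AND d1.task_id <> d2.task_id
GROUP BY d1.task_id, d2.task_id
ORDER BY d1.task_id, d2.task_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each conflicting pair shows up twice, once in each direction, which matches the symmetric output the question asks for.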
This is how I understand your question. This assumes there are only two tasks for the same combination.
SQL DEMO
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I added more columns to make clear which elements are the same:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |

Summing dates across multiple rows in SQL?

We have a Table that stores alarms for certain SetPoints in our system. I'm attempting to write a query that first gets the difference between two dates (spread across two rows), and then sums all of the date differences to get a total sum for the amount of time the setpoint was in alarm.
We have one database where I've accomplished something similar, but in that case both the startTime and endTime were in the same row. Here they are not, so that approach is not adequate.
Some example Data
| Row | TagID | SetPointID | EventLogTime | InAlarm |
-------------------------------------------------------------------------------------
| 1 | 1 | 2 | 2016-01-01 01:49:18.070 | 1 |
| 2 | 1 | 1 | 2016-01-01 03:23:39.970 | 1 |
| 3 | 1 | 2 | 2016-01-01 03:23:40.070 | 0 |
| 4 | 1 | 1 | 2016-01-01 08:04:01.260 | 0 |
| 5 | 1 | 2 | 2016-01-01 08:04:01.370 | 1 |
| 6 | 1 | 1 | 2016-01-01 11:40:36.367 | 1 |
| 7 | 1 | 2 | 2016-01-01 11:40:36.503 | 0 |
| 8 | 1 | 1 | 2016-01-01 13:00:30.263 | 0 |
Results
| TagID | SetPointID | TotalTimeInAlarm |
------------------------------------------------------
| 1 | 1 | 6.004443 (hours) |
| 1 | 2 | 5.182499 (hours) |
Essentially, what I need to do is to get the start time and end time for each tag and each setpoint, then I need to get the total time in alarm. I'm thinking CTEs might be able to help, but I'm not sure.
I believe the pseudo query logic would be similar to
DECLARE @startTime DATETIME, @endTime DATETIME
SELECT TagID,
SetPointID,
ABS(First Occurrence of InAlarm = True (since last occurrence WHERE InAlarm = False)
- First Occurrence of InAlarm = False (since last occurrence WHERE InAlarm = True))
-- IF no InAlarm = False use @endTime.
GROUP BY TagID, SetPointID
You can use the LEAD windowed function (or LAG) to do this pretty easily. This assumes that the rows always come in pairs with 1-0-1-0 for "InAlarm". If that doesn't happen then it's going to throw things off. You would need to have business rules for these situations in any event.
;WITH CTE_Timespans AS
(
SELECT
TagID,
SetPointID,
InAlarm,
EventLogTime,
LEAD(EventLogTime, 1) OVER (PARTITION BY TagID, SetPointID ORDER BY EventLogTime) AS EndingEventLogTime
FROM
My_Table
)
SELECT
TagID,
SetPointID,
SUM(DATEDIFF(SS, EventLogTime, EndingEventLogTime))/3600.0 AS TotalTime
FROM
CTE_Timespans
WHERE
InAlarm = 1
GROUP BY
TagID,
SetPointID
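Here is a runnable sketch of the LEAD approach on the sample rows, using Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or later), a table named `alarms`, and `strftime('%s', ...)` as a stand-in for DATEDIFF in whole seconds; none of these come from the original answer.

```python
import sqlite3

# Assumption: an in-memory SQLite table named "alarms" replaces the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alarms (TagID INTEGER, SetPointID INTEGER, EventLogTime TEXT, InAlarm INTEGER)"
)
conn.executemany(
    "INSERT INTO alarms VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-01-01 01:49:18.070", 1),
        (1, 1, "2016-01-01 03:23:39.970", 1),
        (1, 2, "2016-01-01 03:23:40.070", 0),
        (1, 1, "2016-01-01 08:04:01.260", 0),
        (1, 2, "2016-01-01 08:04:01.370", 1),
        (1, 1, "2016-01-01 11:40:36.367", 1),
        (1, 2, "2016-01-01 11:40:36.503", 0),
        (1, 1, "2016-01-01 13:00:30.263", 0),
    ],
)

# LEAD pairs each event with the next event for the same tag and setpoint.
# Keeping only the InAlarm = 1 rows leaves one row per alarm period.
query = """
WITH spans AS (
    SELECT TagID, SetPointID, InAlarm, EventLogTime,
           LEAD(EventLogTime) OVER (
               PARTITION BY TagID, SetPointID ORDER BY EventLogTime
           ) AS EndTime
    FROM alarms
)
SELECT TagID, SetPointID,
       SUM(CAST(strftime('%s', EndTime) AS INTEGER)
           - CAST(strftime('%s', EventLogTime) AS INTEGER)) / 3600.0 AS Hours
FROM spans
WHERE InAlarm = 1 AND EndTime IS NOT NULL
GROUP BY TagID, SetPointID
ORDER BY SetPointID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the totals come out at roughly 6.004 hours for setpoint 1 and 5.182 hours for setpoint 2. As the answer notes, this only holds while the events strictly alternate between 1 and 0.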
One easy way is to use OUTER APPLY to get the next date that is not InAlarm
SELECT mt.TagID,
mt.SetPointID,
SUM(DATEDIFF(ss,mt.EventLogTime,oa.EventLogTime)) / 3600.0 AS [TotalTimeInAlarm]
FROM MyTable mt
OUTER APPLY (SELECT MIN([EventLogTime]) EventLogTime
FROM MyTable mt2
WHERE mt.TagID = mt2.TagID
AND mt.SetPointID = mt2.SetPointID
AND mt2.EventLogTime > mt.EventLogTime
AND InAlarm = 0
) oa
WHERE mt.InAlarm = 1
GROUP BY mt.TagID,
mt.SetPointID
LEAD() might perform better if using MSSQL 2012+
In SQL Server 2014+:
SELECT tagId, setPointId, SUM(DATEDIFF(second, pt, eventLogTime)) / 3600. AS diff
FROM (
SELECT *,
LAG(inAlarm) OVER (PARTITION BY tagId, setPointId ORDER BY eventLogTime, row) ppa,
LAG(eventLogTime) OVER (PARTITION BY tagId, setPointId ORDER BY eventLogTime, row) pt
FROM (
SELECT LAG(inAlarm) OVER (PARTITION BY tagId, setPointId ORDER BY eventLogTime, row) pa,
*
FROM mytable
) q
WHERE EXISTS
(
SELECT pa
EXCEPT
SELECT inAlarm
)
) q
WHERE ppa = 0
AND inAlarm = 1
GROUP BY
tagId, setPointId
This filters out consecutive events with the same alarm state.

get the value from the previous row if row is NULL

I have this pivoted table
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | NULL | 7 | ... | NULL |
| 10/1/15 | 8 | NULL | ... | NULL |
| 11/1/15 | NULL | NULL | ... | NULL |
+---------+----------+----------+-----+----------+
I wanted to fill in the NULL column with the values above them. So, the output should be something like this.
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | 7 | 7 | ... | 9 |
| 10/1/15 | 8 | 7 | ... | 9 |
| 11/1/15 | 8 | 7 | ... | 9 |
+---------+----------+----------+-----+----------+
I've found this article that might help me, but it only manipulates one column. How do I apply this to all my columns, or how else can I achieve this result, given that my columns are dynamic?
Any help would be much appreciated. Thanks!
The ANSI standard has the IGNORE NULLS option on LAG(). This is exactly what you want. Alas, SQL Server has not (yet?) implemented this feature.
So, you can do this in several ways. One is using multiple outer applys. Another uses correlated subqueries:
select p.date,
       (case when p.product1 is not null then p.product1
             else (select top 1 p2.product1 from pivoted p2
                   where p2.date < p.date and p2.product1 is not null
                   order by p2.date desc)
        end) as product1,
       (case when p.product2 is not null then p.product2
             else (select top 1 p2.product2 from pivoted p2
                   where p2.date < p.date and p2.product2 is not null
                   order by p2.date desc)
        end) as product2,
       . . .
from pivoted p ;
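The correlated-subquery idea can be tried out with Python's built-in sqlite3 module. This is only a sketch under some assumptions: SQLite uses LIMIT 1 where SQL Server uses TOP 1, COALESCE is used as a shorter equivalent of the CASE expression, the dates are rewritten as ISO strings (reading 7/1/15 as July 1, 2015), and only two product columns are shown.

```python
import sqlite3

# Assumption: an in-memory SQLite table named "pivoted" with two product columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pivoted (date TEXT, product1 INTEGER, product2 INTEGER)")
conn.executemany(
    "INSERT INTO pivoted VALUES (?, ?, ?)",
    [
        ("2015-07-01", 5, 2),
        ("2015-08-01", 7, 1),
        ("2015-09-01", None, 7),
        ("2015-10-01", 8, None),
        ("2015-11-01", None, None),
    ],
)

# For each NULL cell, look up the most recent earlier row whose value in
# the same column is not NULL. Non-NULL cells are returned unchanged.
query = """
SELECT p.date,
       COALESCE(p.product1,
                (SELECT p2.product1 FROM pivoted p2
                 WHERE p2.date < p.date AND p2.product1 IS NOT NULL
                 ORDER BY p2.date DESC LIMIT 1)) AS product1,
       COALESCE(p.product2,
                (SELECT p2.product2 FROM pivoted p2
                 WHERE p2.date < p.date AND p2.product2 IS NOT NULL
                 ORDER BY p2.date DESC LIMIT 1)) AS product2
FROM pivoted p
ORDER BY p.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The IS NOT NULL filter inside each subquery matters: without it, the last row would pick up the NULL from the row directly above it in product2 instead of the 7 two rows up.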
I would recommend an index on date for this query.
I would like to suggest a solution. If your table consists of just two columns, it will work as is.
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | NULL |
| 10/1/15 | 8 |
| 11/1/15 | NULL |
+---------+----------+
select x.[Date],
case
when x.[Product] is null
then min(c.[Product])
else
x.[Product]
end as Product
from
(
-- this subquery evaluates a minimum distance to the rows where Product column contains a value
select [Date],
[Product],
min(case when delta >= 0 then delta else null end) delta_min,
max(case when delta < 0 then delta else null end) delta_max
from
(
-- this subquery maps Product table to itself and evaluates the difference between the dates
select p.[Date],
p.[Product],
DATEDIFF(dd, p.[Date], pnn.[Date]) delta
from #products p
cross join (select * from #products where [Product] is not null) pnn
) x
group by [Date], [Product]
) x
left join #products c on x.[Date] =
case
when abs(delta_min) < abs(delta_max) then DATEADD(dd, -delta_min, c.[Date])
else DATEADD(dd, -delta_max, c.[Date])
end
group by x.[Date], x.[Product]
order by x.[Date]
In this query I joined every row of the table to the rows that contain values, using a CROSS JOIN. Then I calculated the differences between dates in order to pick the closest ones and fill the empty cells with their values.
Result:
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | 7 |
| 10/1/15 | 8 |
| 11/1/15 | 8 |
+---------+----------+
Actually, the suggested query doesn't choose the previous value. Instead of this, it selects the closest value. In other words, my code can be used for a number of different purposes.
First, you need to add an identity column to a temporary or permanent table; then it can be resolved with the following method.
--- Solution ----
Create Table #Test (ID Int Identity (1,1),[Date] Date , Product_1 INT )
Insert Into #Test ([Date], Product_1)
Values
('7/1/15',5)
,('8/1/15',7)
,('9/1/15',Null)
,('10/1/15',8)
,('11/1/15',Null)
Select ID , DATE ,
IIF ( Product_1 is null ,
(Select Product_1 from #TEST
Where ID = (Select Top 1 a.ID From #TEST a where a.Product_1 is not null and a.ID<b.ID
Order By a.ID desc)
),Product_1) Product_1
from #Test b
-- Solution End ---

Rolling up remaining rows into one called "Other"

I have written a query which selects, let's say, 10 rows for this example.
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Store9 | 1 |
| Store2 | 1 |
| Store3 | 1 |
| Store4 | 1 |
| Store5 | 0 |
| Store6 | 0 |
| Store10 | 0 |
+-----------+------------+
How would I go about displaying the top 3 rows, but with the remaining rows rolled up into a row called "Other" that sums all of their complaints?
So like this for example:
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Other | 4 |
+-----------+------------+
So the result above displays the top 3 rows, then adds the complaints of the remaining rows into a row called Other.
I have exhausted all my resources and cannot find a solution. Please let me know if this makes sense.
I have created a SQLfiddle of the above tables that you can edit if it is possible :)
Here's hoping this is possible :)
Thanks,
Mike
Something like this may work
select *, row_number() over (order by complaints desc) as sno
into #temp
from
(
SELECT
a.StoreName
,COUNT(b.StoreID) AS [Complaints]
FROM Stores a
LEFT JOIN
(
SELECT
StoreName
,Complaint
,StoreID
FROM Complaints
WHERE Complaint = 'yes') b on b.StoreID = a.StoreID
GROUP BY a.StoreName
) as t ORDER BY [Complaints] DESC
select storename,complaints from #temp where sno<4
union all
select 'other',sum(complaints) as complaints from #temp where sno>=4
I do this with double aggregation and row_number():
select (case when seqnum <= 3 then storename else 'Other' end) as StoreName,
sum(numcomplaints) as numcomplaints
from (select c.storename, count(*) as numcomplaints,
row_number() over (order by count(*) desc) as seqnum
from complaints c
where c.complaint = 'Yes'
group by c.storename
) s
group by (case when seqnum <= 3 then storename else 'Other' end) ;
From what I can see, you don't really need any additional information from stores, so this version just leaves that table out.
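Here is a small runnable sketch of the row_number() roll-up, using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). It starts from the already aggregated 10 rows in the question rather than the raw complaints table, and the table name `store_complaints` and the tie-breaker on storename are assumptions added for this illustration.

```python
import sqlite3

# Assumption: the per-store counts from the question are loaded directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_complaints (storename TEXT, complaints INTEGER)")
conn.executemany(
    "INSERT INTO store_complaints VALUES (?, ?)",
    [
        ("Store1", 4), ("Store7", 2), ("Store8", 1), ("Store9", 1), ("Store2", 1),
        ("Store3", 1), ("Store4", 1), ("Store5", 0), ("Store6", 0), ("Store10", 0),
    ],
)

# Rank the stores, keep the name for ranks 1 to 3, relabel the rest as
# 'Other', then aggregate again on that label.
# Ties are broken by storename so the ranking is deterministic.
query = """
SELECT CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END AS StoreName,
       SUM(complaints) AS complaints
FROM (SELECT storename, complaints,
             ROW_NUMBER() OVER (ORDER BY complaints DESC, storename) AS seqnum
      FROM store_complaints) s
GROUP BY CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END
ORDER BY MIN(seqnum)
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Five stores are tied on one complaint, so which of them takes third place depends on the tie-breaker. With storename as the tie-breaker it is Store2 rather than Store8, and the Other total is 4 either way.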