SQL Server and difficulty generating column with unique values - sql

I'm using Microsoft SQL Server Management Studio, and I want a new column that calculates the following:
If it has an ‘Exec’ value for category, it takes the ‘enddate’.
If it has an ‘Scop’ value for category, it takes the ‘start date’.
This new column calculates the number of months between these two.
I want SQL to do the calculation for a given id, so each id will have different values calculated.
At the moment it takes the minimum enddate and minimum 'startdate' for the entire table.
SELECT
id, category, startdate, enddate,
CASE
WHEN id = id
THEN DATEDIFF(month,
(SELECT MIN(enddate) from [A].[PP] where category = 'Exec'),
(SELECT MIN(startdate) from [A].[PP] where category = 'Scop')) --AS datemodify
ELSE NULL
END
FROM
[A].[PP]
WHERE
startdate IS NOT NULL
AND (category = 'Exec' OR category = 'Scop')
ORDER BY
id ASC
Results it produces at the moment:
id
category
startdate
enddate
NewCOlumn
1
Scop
2022-11-1
2022-10-1
11
1
Exec
2023-11-1
2023-10-1
11
2
Scop
2022-11-1
2022-10-1
11
2
Exec
2023-11-1
2023-09-1
11
The results I want:
id
category
startdate
enddate
NewCOlumn
1
Scop
2021-11-1
2022-10-1
24
1
Exec
2023-11-1
2023-11-1
24
2
Scop
2022-11-1
2022-10-1
11
2
Exec
2023-11-1
2023-09-1
11

Based on comments I'm not sure you still know you want as your output so I've come up with two different versions.
Here's how I'm created a version of your data set:
INSERT INTO #TempTable (ID, Category, StartDate, EndDate)
VALUES (1, 'Scop', '2021-11-01', '2022-10-01'),
(1, 'Exec', '2023-11-01', '2023-10-01'),
(2, 'Scop', '2022-11-01', '2022-10-01'),
(2, 'Exec', '2023-11-01', '2023-10-01');
This is the first version, this created your two lines per ID but hacks the StartDate and EndDate from different rows. This works by selecting all of the data straight out of the temp table, it then goes on to say if the row is Category = Scop then do a DateDiff between the StartDate and then fetches the EndDate from a subquery where the IDs match and the Category = Exec (it also has the same logic applied but the other way around for where the initial Category = Exec):
SELECT TT.ID,
TT.Category,
TT.StartDate,
TT.EndDate,
CASE
WHEN TT.Category = 'Scop' THEN DATEDIFF(M, TT.StartDate, (SELECT EndDate FROM #TempTable WHERE Category = 'Exec' AND ID = TT.ID))
ELSE CASE
WHEN TT.Category = 'Exec' THEN DATEDIFF(M, (SELECT StartDate FROM #TempTable WHERE Category = 'Scop' AND ID = TT.ID), TT.EndDate)
END
END AS DateDiffCalc
FROM #TempTable AS TT;
This version compresses the IDs to a single row, it initially only fetches Scop data, but then joins back to itself using ID and specifices now to get the Exec data only. Now you can DateDiff between the Scop StartDate and the Exec EndDate
SELECT DISTINCT t1.ID,
t1.Category,
t1.StartDate,
T2.Category,
T2.EndDate,
DATEDIFF(M, t1.StartDate, T2.EndDate) AS DateDiffCalc
FROM #TempTable AS t1
INNER JOIN #TempTable AS T2 ON T2.ID = T2.ID AND T2.Category = 'Exec'
WHERE t1.Category = 'Scop'
ORDER BY t1.ID;

Related

Remove duplicates from single field only in rollup query

I have a table of data for individual audits on inventory. Every audit has a location, an expected value, a variance value, and some other data that aren't really important here.
I am writing a query for Cognos 11 which summarizes a week of these audits. Currently, it rolls everything up into sums by location class. My problem is that there may be multiple audits for individual locations and while I want the variance field to sum the data from all audits regardless of whether it's the first count on that location, I only want the expected value for distinct locations (i.e. only SUM expected value where the location is distinct).
Below is a simplified version of the query. Is this even possible or will I have to write a separate query in Cognos and make it two reports that will have to be combined after the fact? As you can likely tell, I'm fairly new to SQL and Cognos.
SELECT COALESCE(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END, 'Grand Total') "Row Labels"
,SUM(NVL(expected_cost, 0)) "Sum of Expected Cost"
,SUM(NVL(variance_cost, 0)) "Sum of Variance Cost"
,SUM(ABS(NVL(variance_cost, 0))) "Sum of Absolute Cost"
,COUNT(DISTINCT location) "Count of Locations"
,(SUM(NVL(variance_cost, 0)) / SUM(NVL(expected_cost, 0))) "Variance"
FROM audit_table
WHERE audit_datetime <= #prompt('EndDate') # audit_datetime >= #prompt('StartDate') #
GROUP BY ROLLUP(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END)
ORDER BY 1 ASC
This is what I'm hoping to end up with:
Thanks for any help!
Have you tried taking a look at the OVER clause in SQL? It allows you to use windowed functions within a result set such that you can get aggregates based on specific conditions. This would probably help since you seem to trying to get a summation of data based on a different grouping within a larger grouping.
For example, let's say we have the below dataset:
group1 group2 val dateadded
----------- ----------- ----------- -----------------------
1 1 1 2020-11-18
1 1 1 2020-11-20
1 2 10 2020-11-18
1 2 10 2020-11-20
2 3 100 2020-11-18
2 3 100 2020-11-20
2 4 1000 2020-11-18
2 4 1000 2020-11-20
Using a single query we can return both the sums of "val" over "group1" as well as the summation of the first (based on datetime) "val" records in "group2":
declare #table table (group1 int, group2 int, val int, dateadded datetime)
insert into #table values (1, 1, 1, getdate())
insert into #table values (1, 1, 1, dateadd(day, 1, getdate()))
insert into #table values (1, 2, 10, getdate())
insert into #table values (1, 2, 10, dateadd(day, 1, getdate()))
insert into #table values (2, 3, 100, getdate())
insert into #table values (2, 3, 100, dateadd(day, 1, getdate()))
insert into #table values (2, 4, 1000, getdate())
insert into #table values (2, 4, 1000, dateadd(day, 1, getdate()))
select t.group1, sum(t.val) as group1_sum, group2_first_val_sum
from #table t
inner join
(
select group1, sum(group2_first_val) as group2_first_val_sum
from
(
select group1, val as group2_first_val, row_number() over (partition by group2 order by dateadded) as rownumber
from #table
) y
where rownumber = 1
group by group1
) x on t.group1 = x.group1
group by t.group1, x.group2_first_val_sum
This returns the below result set:
group1 group1_sum group2_first_val_sum
----------- ----------- --------------------
1 22 11
2 2200 1100
The most inner subquery in the joined table numbers the rows in the data set based on "group2", resulting in the records either having a "1" or a "2" in the "rownum" column since there's only 2 records in each "group2".
The next subquery takes that data and filters out any rows that are not the first (rownum = 1) and sums the "val" data.
The main query gets the sum of "val" in each "group1" from the main table and then joins on the subqueried table to get the "val" sum of only the first records in each "group2".
There are more efficient ways to write this such as moving the summation of the "group1" values to a subquery in the SELECT statement to get rid of one of the nested tabled subqueries, but I wanted to show how to do it without subqueries in the SELECT statement.
Have you tried to put the distinct at the bottom like this ?
(SUM(NVL(variance_cost,0)) / SUM(NVL(expected_cost,0))) "Variance",
COUNT(DISTINCT location) "Count of Locations"
FROM audit_table

Highlight multiple records in a date range

Working with SQL Server 2008.
fromdate todate ID name
--------------------------------
1-Aug-16 7-Aug-16 x jack
3-Aug-16 4-Aug-16 x jack
5-Aug-16 6-Aug-16 x tom
1-Aug-16 2-Aug-16 x john
3-Aug-16 4-Aug-16 x harry
5-Aug-16 6-Aug-16 x mac
Is there a way to script this so that I know if there are multiple names tagged to an ID in the same date range?
For example above, I want to flag that ID x has Name Jack and Tom tagged in the same date range.
ID multiple_flag
------------------------------------------------
x yes
y no
If there is a unique index in your table (in my example it is column i but you could also generate one by means of using ROW_NUMBER()) then you can do the following query based on an INNER JOIN to find overlapping date ranges:
CREATE TABLE #tmp (i int identity primary key,fromdate date,todate date,ID int,name varchar(32));
insert into #tmp (fromdate,todate,ID ,name) values
('1-Aug-16','7-Aug-16',3,'jack'),
('3-Aug-16','4-Aug-16',3,'tom'),
('5-Aug-16','6-Aug-16',3,'jack');
select a.*,b.name bname,b.i i2 from #tmp a
INNER join #tmp b on b.id=a.id AND b.i<>a.i
AND ( b.fromdate between a.fromdate and a.todate
OR b.todate between a.fromdate and a.todate)
(My id column is int). This will give you:
i fromdate todate ID name bname i2
- ---------- ---------- - ---- ----- --
1 2016-08-01 2016-08-07 3 jack tom 2
1 2016-08-01 2016-08-07 3 jack jack 3
Implement further filtering or grouping as required. I left a little demo here.
Please check the below sql, but it might not be the optimal one..
SELECT formdate,todate,id,tab1.name,
case when tab2.#Of >1 then 'yes' else 'no' end as multiple_flag
FROM tab1
inner join (SELECT Name, COUNT(*) as #Of
FROM tab1
GROUP BY Name) as tab2 on tab1.name=tab2.name
order by tab1.id ;
add your where condition, before the order by, if you need to add some date range on your sql.
change formdate to fromdate before run this sql, as I have used formdate in my machine.
The result looks like
One way to do it is using EXISTS CASE:
Please note this part of the query:
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
for an explanation on testing for overlapping ranges, read the overlap wiki.
Create and populate sample data (Please save us this step in your future questions)
DECLARE #T as table
(
fromdate date,
todate date,
ID char(1),
name varchar(10)
)
INSERT INTO #T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac')
The query:
SELECT DISTINCT id,
CASE WHEN EXISTS
(
SELECT 1
FROM #T t2
WHERE t1.Id = t2.Id
-- make sure it's not the same record
AND t1.fromdate <> t2.fromdate
AND t1.todate <> t2.todate
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
)
THEN 'Yes'
ELSE 'No'
END As multiple_flag
FROM #T t1
Results:
id multiple_flag
---- -------------
x Yes
y No

Fetch Date based on another date

Table 1:
Temp ResID Code Date
11 1 SPR
12 1 SPG 2009-10-05
13 1 SPR
14 1 SPG 2011-10-08
Table 2:
TempID Res ID InDate Out Date
21 1. 2009-10-05 2010-11-15
22 1. 2011-10-08 2011-11-09
Table 3: (Desired Result)
ResID Code Date
1 SPR 2010-11-15
1 SPG 2009-10-05
1 SPR 2011-11-09
1 SPG 2011-10-08
I have two tables as above.
I need to update the Table 1 Date column for every ID for every row where the code is SPR. The SPG value in table 1 date column is equal to InDate column value for the same resident
Please advice with the query. How do I achieve this with a query joining the two tables, table 1 and table 2 to get table 3
Edited from a select statement to an Update
I think this will work for you. See the SqlFiddle
Update
Table1
set
Table1.[Date] = Table2.OutDate
from
Table1
inner join
(
select
Temp,
ResID,
Code,
Date,
row_number() over(partition by ResID order by Temp) as RowId
from
Table1
where
Code = 'SPR'
) as SPR on Table1.Temp = SPR.Temp
inner join
(
select
ResID,
Code,
Date,
row_number() over(partition by ResID order by Temp) as RowId
from
Table1
where
Code = 'SPG'
) as SPG on SPG.RowId = SPR.RowId and SPG.ResID = SPR.ResID
inner join
Table2 on Table2.InDate = SPG.Date and Table2.ResID = SPG.ResID
You could delete all the SPR rows and then generate them:
delete Table1
where code = 'SPR';
insert Table1
(ResID, Code, [Date])
select t1.ResID
, 'SPR'
, OutDate
from Table1 t1
join Table2 t2
on t1.ResID = t2.ResID
and t1.[Date] = t2.InDate
where t1.Code = 'SPG';
Example at SQL Fiddle.
If table2 has value for every table one then you can just use table2, kind of weird but you can always join with the main table to get appropriate result if you want to
select Convert(varchar, indate, 102) as date, 'SPG' as code, resid from table2
union all
select convert(varchar, outdate, 102) as date, 'SPR' as code, resid from table2
SQL Fiddle

SQL to Return missing Row

I have one Scenario where I need to find missing records in Table using SQL - without using Cursor, Views, SP.
For a particular CustID initial Start_Date will be 19000101 and End_date will be any random date.
Then for next Record for the same CustID will have its Start_Date as End_Date (of previous Record) + 1.
Its End_Date again will be any random date.
And so on….
For Last record of same CustID its end Date will be 99991231.
Following population of data will explain it better.
CustID Start_Date End_Date
1 19000101 20121231
1 20130101 20130831
1 20130901 20140321
1 20140321 99991231
Basically I am trying to populate data like in SCD2 scenario.
Now I want to find missing record (or CustID).
Like below we don’t have record with CustID = 4 with Start_Date = 20120606 and End_Date = 20140101
CustID Start_Date End_Date
4 19000101 20120605
4 20140102 99991231
Code for Creating Table
CREATE TABLE TestTable
(
CustID int,
Start_Date int,
End_Date int
)
INSERT INTO TestTable values (1,19000101,20121231)
INSERT INTO TestTable values (1,20130101,20130831)
INSERT INTO TestTable values (1,20130901,20140321)
INSERT INTO TestTable values (1,20140321,99991231)
INSERT INTO TestTable values (2,19000101,99991213)
INSERT INTO TestTable values (3,19000101,20140202)
INSERT INTO TestTable values (3,20140203,99991231)
INSERT INTO TestTable values (4,19000101,20120605)
--INSERT INTO TestTable values (4,20120606,20140101) --Missing Value
INSERT INTO TestTable values (4,20140102,99991231)
Now SQL should return CustID = 4 as its has missing Value.
My idea is based on this logic. Lets assume 19000101 as 1 and 99991231 as 10. Now for all IDs, if you subtract the End_date - start_date and add them up, the total sum must be equal to 9 (10 - 1). You can do the same here
SELECT ID, SUM(END_DATE - START_DATE) as total from TABLE group by ID where total < (MAX_END_DATE - MIN_START_DATE)
You might want to find the command in your SQL that gives the number of days between 2 days and use that in the SUM part.
Lets take the following example
1 1900 2003
1 2003 9999
2 1900 2222
2 2222 9977
3 1900 9999
The query will be executed as follows
1 (2003 - 1900) + (9999 - 2003) = 1 8098
2 (2222 - 1900) + (9977 - 2222) = 2 9077
3 (9999 - 1900) = 3 8098
The where clause will eliminate 1 and 3 giving you only 2, which is what you want.
If you just need the CustID then this will do
SELECT t1.CustID
FROM TestTable t1
LEFT JOIN TestTable t2
ON DATEADD(D, 1, t1.Start_Date) = t2.Start_Date
WHERE t2.CustID IS NULL
GROUP BY t1.CustID
You need rows if the one of the following conditions is met:
Not a final row (99991231) and no matching next row
Not a start row (19000101) and no matching previous row
You can left join to the same table to find previous and next rows and filter the results where you don't find a row by checking the column values for null:
SELECT t1.CustID, t1.StartDate, t1.EndDate
FROM TestTable t1
LEFT JOIN TestTable tPrevious on tPrevious.CustID = t1.CustID
and tPrevious.EndDate = t1.StartDate - 1
LEFT JOIN TestTable tNext on tNext.CustID = t1.CustID
and tNext.StartDate = t1.EndDate + 1
WHERE (t1.EndDate <> 99991231 and tNext.CustID is null) -- no following
or (t1.StartDate <> 19000101 and tPrevious.CustID is null) -- no previous

Group events by temporal distance in SQL

In general, I need to associate (group) records which are created in similar time periods. If it helps, thinking of the example below as clickstream data where there is no sessionID and I need to build those sessions.
I have the following dataset:
UserId INT,
EventId INT,
DateCreated DATETIME,
BlockId INT
Assume the following data:
{123, 111, '2009-12-01 9:15am', NULL}
{123, 222, '2009-12-01 9:20am', NULL}
{123, 333, '2009-12-01 9:25am', NULL}
{123, 444, '2009-12-03 2:30pm', NULL}
{123, 555, '2009-12-03 2:32pm', NULL}
What I need to do is divide these events up, by user, into temporal buckets. There is a business rule that says anything > 30 minutes should be a new bucket. In the above example, events 111-333 represent a block, i.e. not more than 30 minutes separates them. Likewise, events 444-555 represent a second block.
My current solution uses a cursor and is extremely slow (therefore, unsustainable for the amount of data I need to process). I can post the code but it is pretty simple.
Any ideas?
Hopefully this will get you going in the right direction. If you're in an SP then using table variables for the StartTimes and EndTimes should make the query much easier to read and understand. This will give you start and end times for your batches, then just join back to your table and you should have it.
;WITH StartTimes AS
(
SELECT DISTINCT
T1.DateCreated AS StartTime
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.UserID = T1.UserID AND
T2.EventID = T1.EventID AND
T2.DateCreated >= DATEADD(mi, -30, T1.DateCreated) AND
T2.DateCreated < T1.DateCreated
WHERE
T2.UserID IS NULL
)
SELECT
StartTimes.StartTime,
EndTimes.EndTime
FROM
(
SELECT DISTINCT
T3.DateCreated AS EndTime
FROM
My_Table T3
LEFT OUTER JOIN My_Table T4 ON
T4.UserID = T3.UserID AND
T4.EventID = T3.EventID AND
T4.DateCreated <= DATEADD(mi, 30, T3.DateCreated) AND
T4.DateCreated > T3.DateCreated
WHERE
T4.UserID IS NULL
) AS ET
INNER JOIN StartTimes ST ON
ST.StartTime <= ET.EndTimes
LEFT OUTER JOIN StartTimes ST2 ON
ST2.StartTime <= ET.EndTimes AND
ST2.StartTime > ST.StartTime
WHERE
ST2.StartTime IS NULL
Based on comment thread,
A. Buckets are defined by the first record in the bucket, and the first record in each Bucket is defined as any row where the DateCreated is more than 30 minutes after the latest earlier DateCreated. (immediately previous record)
B. The rest of the rows in the bucket are all rows with DateCreated on or after the First Row whose DateCreated is less than 30 minutes after the immediately previous row, and there does not exist a non-qualifying, (or new bucket-defining), row since the specified Bucket-defining row.
In English:
Select The DateCreated of those records wheret he DateCreated is more than 30 minutes after the previous DateCreated and aggregate function of your choice on all the other records in table whose DateCreated is after that bucket-defining datecreated, less than 30 minutes after it's immedialte previous DateCreated, and there are no records between the bucket-defining DateCreated and this one which follow a greater than 30 minute gap.
In SQL:
Select Z.BucketDefinitionDate , Count(*) RowsInBucket
From (Select Distinct DateCreated BucketDefinitionDate
From Table Ti
Where DateCreated > DateAdd(minute, 30,
(Select Max(DateCreated) From Table
Where DateCreated < Ti.DateCreated))) Z
Join Table B
On B.DateCreated > Z.BucketDefinitionDate
And Not Exists
(Select * From Table
Where DateCreated Between Z.BucketDefinitionDate
And B.DateCreated
And DateCreated > DateAdd(minute, 30,
(Select Max(DateCreated) From Table
Where DateCreated < B.DateCreated)))
Group By Z.BucketDefinitionDate
What you can try is
DECLARE #TABLE TABLE(
ID INT,
EventID INT,
DateCreated DATETIME
)
INSERT INTO #TABLE SELECT 123, 111, '2009-12-01 9:15am'
INSERT INTO #TABLE SELECT 123, 222, '2009-12-01 9:20am'
INSERT INTO #TABLE SELECT 123, 333, '2009-12-01 9:25am'
INSERT INTO #TABLE SELECT 123, 444, '2009-12-03 2:30pm'
INSERT INTO #TABLE SELECT 123, 555, '2009-12-01 2:32pm'
SELECT ID,
DATEADD(dd, DATEDIFF(dd,0,DateCreated), 0) DayVal,
DATEPART(hh, DateCreated) HourPart,
FLOOR(DATEPART(mi, DateCreated) / 30.) MinBucket
FROM #TABLE
Now you can group by DayVal, HourPart and MinBucket.
I think I have something for you. it is not a cool single query like Tom H posted, but it seems to work. It uses a table variable as a working table.
declare #table table(
id int identity(1,1),
userId int,
eventId int,dateCreated datetime,
bucket int
)
insert into #table select 123, 111, '2009-12-01 9:15am', 0
// etc... insert more rows - note that the 'bucket' field is set to 0
declare #next_bucket int
set #next_bucket = 1
update #table
set bucket = #next_bucket, #next_bucket = #next_bucket + 1
from #table as [current]
where datecreated > dateadd(mi, 30, (select datecreated from #table as previous where [current].id = previous.id + 1))
update #table
set bucket =
coalesce(( select max(bucket)
from #table as previous
where previous.id < [current].id
and bucket <> 0
), 1)
from #table as [current]
where bucket = 0
-- return the results
select * from #table