SQL Query to generate an extra field from data in the table - sql

I have a table with 3 fields like this sample table Tbl1
Person Cost FromDate
1 10 2009-1-1
1 20 2010-1-1
2 10 2009-1-1
I want to query it and get back the 3 fields and a generated field called ToDate that defaults to 2099-1-1 unless there is an actual ToDate implied from another entry for the person in the table.
select Person,Cost,FromDate,ToDate From Tbl1
Person Cost FromDate ToDate
1 10 2009-1-1 2010-1-1
1 20 2010-1-1 2099-1-1
2 10 2009-1-1 2099-1-1

You can select the minimum date from all dates that are after the record's date. If there is none you get NULL. With COALESCE you change NULL into the default date:
select
Person,
Cost,
FromDate,
coalesce((select min(FromDate) from Tbl1 later where later.FromDate > Tbl1.FromDate), '2099-01-01') as ToDate
From Tbl1
order by Person, FromDate;

Although Thorsten's answer is perfectly fine, it would be more efficient to use window-functions to match the derived end-dates.
;WITH nbrdTbl
AS ( SELECT Person, Cost, FromDate, row_nr = ROW_NUMBER() OVER (PARTITION BY Person ORDER BY FromDate ASC)
FROM Tbl1)
SELECT t.Person, t.Cost, t.FromDate, derived_end_date = COALESCE(nxt.FromDate, '9991231')
FROM nbrdTbl t
LEFT OUTER JOIN nbrdTbl nxt
ON nxt.Person = t.Person
AND nxt.row_nr = t.row_nr + 1
ORDER BY t.Person, t.FromDate
Doing a test on a 2000-records table it's about 3 times as efficient according to the Execution plan (78% vs 22%).

Related

Extra column looking at where OpenDate > ClosedDate prevoius records

I have a hard struggle with this problem.
I have the following table:
TicketNumber
OpenTicketDate YYYY_MM
ClosedTicketDate YYYY_MM
1
2018-1
2020-1
2
2018-2
2021-2
3
2019-1
2020-6
4
2020-7
2021-1
I would like to create an extra column which would monitor the open tickets at the given OpenTicketDate.
So the new table would look like this:
TicketNumber
OpenTicketDate YYYY_MM
ClosedTicketDate YYYY_MM
OpenTicketsLookingBackwards
1
2018-1
2020-1
1
2
2018-2
2021-2
2
3
2019-1
2020-6
3
4
2020-7
2021-1
2
The logic behind the 4th (extra) column is that it looks at the previous records & current record where the ClosedTicketsDate > OpenTicketDate.
For example ticketNumber 4 has '2' open tickets because there are only 2 ClosedTicketDate records where ClosedTicketDate > OpenTicketDate.
The new column only fills data based on looking at prevoius records. It is backward looking not forward.
Is there anyone who can help me out?
You could perform a self join and aggregate as the following:
Select T.TicketNumber, T.OpenTicketDate, T.ClosedTicketDate,
Count(*) as OpenTicketsLookingBackwards
From table_name T Left Join table_name D
On Cast(concat(T.OpenTicketDate,'-1') as Date) < Cast(concat(D.ClosedTicketDate,'-1') as Date)
And T.ticketnumber >= D.ticketnumber
Group By T.TicketNumber, T.OpenTicketDate, T.ClosedTicketDate
Order By T.TicketNumber
You may also try with a scalar subquery as the following:
Select T.TicketNumber, T.OpenTicketDate, T.ClosedTicketDate,
(
Select Count(*) From table_name D
Where Cast(concat(T.OpenTicketDate,'-1') as Date) <
Cast(concat(D.ClosedTicketDate,'-1') as Date)
And T.ticketnumber >= D.ticketnumber
) As OpenTicketsLookingBackwards
From table_name T
Order By T.TicketNumber
Mostly, joins tend to outperform subqueries.
See a demo.

SQL SELECT repeating rows from table for specific time interval

I have table and I want to find repeating rows for specific time interval (DATE is input parameter for SQL query) where it will list all rows with the same PERSON and TYPE value.
ID DATE PERSON TYPE
1 01.01.2017 PERSON1 TYPE1
2 02.02.2017 PERSON1 TYPE1
3 03.03.2017 PERSON2 TYPE1
4 04.04.2017 PERSON2 TYPE2
5 05.05.2017 PERSON2 TYPE1
6 06.06.2017 PERSON1 TYPE2
So for example if DATE is between 01.01 and 04.04 it should list me rows with ID 1 and 2.
If DATE is between 01.01 and 06.06 it should list me rows with ID 1, 2, 3 and 5 because 1 and 2 have the same person and type in that interval and 3 and 5 have the same person and type in that interval.
SELECT ID FROM TABLE
WHERE DATE>='01.01.2017' AND DATE<='06.06.2017'
but I am not sure even how to start to define this repeating clause based on PERSON and TYPE columns.
Maybe can INNER JOIN help with this if referencing the same table and matching those two columns and third column ID is different?: TABLE.PERSON=TABLE.PERSON and TABLE.TYPE=TABLE.TYPE and TABLE.ID!=TABLE.ID of course table is the same but different alias can be used for this?
Please try...
SELECT ID AS ID
FROM tableName
JOIN
(
SELECT person AS person,
type AS type,
COUNT( person ) AS countOfPair
FROM tableName
WHERE date BETWEEN startDate AND endDate
GROUP BY person,
type
) tempTable ON tableName.person = tempTable.person AND
tableName.type = tempTable.type
WHERE countOfPair >= 2
The inner SELECT gathers each combination of person and type in between your start and end dates (please replace startDate and endDate with however you are referencing those) and performs a count of them.
The outer SELECT statement's JOIN then has the effect of appending the count of each combination to the end of each row containing that combination. The outer SELECT then retrieves the ID from each row that has a repeated combination.
If you have any questions or comments, then please feel free to post a Comment accordingly.
You can try this (I don't know if your version has window analytic function):
(X is the name of your table)
SELECT Y.ID, Y.DATE, Y.PERSON, Y.TYPE
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY PERSON, TYPE) AS RC
FROM X
WHERE DATE >='01.01.2017' AND DATE <='04.04.2017'
) Y
WHERE RC>1
Or this if it doesn't support them:
SELECT X.ID, X.DATE, X.PERSON, X.TYPE
FROM X
INNER JOIN (
SELECT PERSON, TYPE, COUNT(*) AS RC
FROM X
WHERE DATE >='01.01.2017' AND DATE <='04.04.2017'
GROUP BY PERSON, TYPE
) Y ON X.PERSON = Y.PERSON AND X.TYPE = Y.TYPE
WHERE RC>1
I suggest to use always appropriate conversion for date datatypes.
Another method would be:
SELECT a.id
FROM tablename a NATURAL JOIN
(SELECT person,type FROM tablename
WHERE date>='01.01.2017' AND date<='06.06.2017'
GROUP BY person, type HAVING COUNT(*)>1) b ;
The NATURAL JOIN would automatically use columns person and type.
Add "DISTINCT" clause to avoid redundancy
SELECT DISTINCT ID FROM TABLE
WHERE DATE>='01.01.2017' AND DATE<='06.06.2017'

Joining next Sequential Row

I am planing an SQL Statement right now and would need someone to look over my thougts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thougts:
I need to join each record with it's subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
select sum(case when seqnum = num then stat else - stat end) / (max(num) - 1);
from (select period, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
By using Join also you achieve this
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

What's the most efficient way to match values between 2 tables based on most recent prior date?

I've got two tables in MS SQL Server:
dailyt - which contains daily data:
date val
---------------------
2014-05-22 10
2014-05-21 9.5
2014-05-20 9
2014-05-19 8
2014-05-18 7.5
etc...
And periodt - which contains data coming in at irregular periods:
date val
---------------------
2014-05-21 2
2014-05-18 1
Given a row in dailyt, I want to adjust its value by adding the corresponding value in periodt with the closest date prior or equal to the date of the dailyt row. So, the output would look like:
addt
date val
---------------------
2014-05-22 12 <- add 2 from 2014-05-21
2014-05-21 11.5 <- add 2 from 2014-05-21
2014-05-20 10 <- add 1 from 2014-05-18
2014-05-19 9 <- add 1 from 2014-05-18
2014-05-18 8.5 <- add 1 from 2014-05-18
I know that one way to do this is to join the dailyt and periodt tables on periodt.date <= dailyt.date and then imposing a ROW_NUMBER() (PARTITION BY dailyt.date ORDER BY periodt.date DESC) condition, and then having a WHERE condition on the row number to = 1.
Is there another way to do this that would be more efficient? Or is this pretty much optimal?
I think using APPLY would be the most efficient way:
SELECT d.Val,
p.Val,
NewVal = d.Val + ISNULL(p.Val, 0)
FROM Dailyt AS d
OUTER APPLY
( SELECT TOP 1 Val
FROM Periodt p
WHERE p.Date <= d.Date
ORDER BY p.Date DESC
) AS p;
Example on SQL Fiddle
If there relatively very few periodt rows, then there is an option that may prove quite efficient.
Convert periodt into a From/To ranges table using subqueries or CTEs. (Obviously performance depends on how efficiently this initial step can be done, which is why a small number of periodt rows is preferable.) Then the join to dailyt will be extremely efficient. E.g.
;WITH PIds AS (
SELECT ROW_NUMBER() OVER(ORDER BY PDate) RN, *
FROM #periodt
),
PRange AS (
SELECT f.PDate AS FromDate, t.PDate as ToDate, f.PVal
FROM PIds f
LEFT OUTER JOIN PIds t ON
t.RN = f.RN + 1
)
SELECT d.*, p.PVal
FROM #dailyt d
LEFT OUTER JOIN PRange p ON
d.DDate >= p.FromDate
AND (d.DDate < p.ToDate OR p.ToDate IS NULL)
ORDER BY 1 DESC
If you want to try the query, the following produces the sample data using table variables. Note I added an extra row to dailyt to demonstrate no periodt entries with a smaller date.
DECLARE #dailyt table (
DDate date NOT NULL,
DVal float NOT NULL
)
INSERT INTO #dailyt(DDate, DVal)
SELECT '20140522', 10
UNION ALL SELECT '20140521', 9.5
UNION ALL SELECT '20140520', 9
UNION ALL SELECT '20140519', 8
UNION ALL SELECT '20140518', 7.5
UNION ALL SELECT '20140517', 6.5
DECLARE #periodt table (
PDate date NOT NULL,
PVal int NOT NULL
)
INSERT INTO #periodt
SELECT '20140521', 2
UNION ALL SELECT '20140518', 1

Help in correcting the SQL

I am trying to solve this query. I have the following data:
Input
Date Id Value
25-May-2011 1 10
26-May-2011 1 10
26-May-2011 2 10
27-May-2011 1 20
27-May-2011 2 20
28-May-2011 1 10
I need to query and output as:
Output
FromDate ToDate Id Value
25-May-2011 26-May-2011 1 10
26-May-2011 26-May-2011 2 10
27-May-2011 27-May-2011 1 20
28-May-2011 28-May-2011 1 10
I tried this sql but I'm not getting the correct result:
SELECT START_DATE, END_DATE, A.KEY, B.VALUE FROM
(
SELECT MIN(DATE) START_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY,VALUE
) A INNER JOIN
(
SELECT MAX(DATE) END_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY, VALUE
) B ON A.KEY = B.KEY AND A.VALUE = B.VALUE;
I think that you are trying too hard. Should be more like this:
SELECT MIN(START_DATE) AS FromDate, MAX(END_DATE) AS ToDate, KEY, VALUE
FROM KEY_VALUE
GROUP BY KEY, VALUE
This query appears to produce the correct results, though it pointed out that you missed a line in your example output '27-May-2011 ... 27-May-2011 ... 2 ... 20'.
select id, [value], date as fromdate, (
select top 1 date
from key_value kv2
where id = kv.id
and [value] = kv.[value]
and date >= kv.date
and datediff(d, kv.date, date) = (
select count(*)
from key_value
where id = kv.id
and [value] = kv.[value]
and date > kv.date
and date <= kv2.date
)
order by date desc
) as todate
from key_value kv
where not exists (
select *
from key_value
where id = kv.id
and [value] = kv.[value]
and date = dateadd(d, -1, kv.[date])
)
First it finds the min date records with the where clause, looking for records that do not have another record on the day before. Then the todate subquery gets the greatest date record by finding the number of days between it and min date then finding the number of records between the two and making sure they match. This of course assumes that the records in the table are distinct.
However if you are processing a massive table your best option may be to sort the records by key, id, date and then use a cursor to programmatically find the min and max dates as you loop over and look for values to change, then push them into a new table whether real or temp along with any other calculations you might need to do on other fields along the way.