How to subtract values between different dates in SQL? - sql

Let's say that I'm using the following SQL table called TestTable:
Date Value1 Value2 Value3 ... Name
2013/01/01 1 4 7 Name1
2013/01/14 6 10 8 Name1
2013/02/23 10 32 9 Name1
And I'd like to get the increment of the values between two dates, like:
Value1Inc Value2Inc Value3Inc Name
4 22 1 Name1
between 2013/02/23 and 2013/01/14.
Please note that the values always increase. I'm trying the following approach, found on Stack Overflow:
select (
(select value1 from TestTable where date < '2013/01/14') -
(select value1 from TestTable where date < '2013/02/23')
) as Value1Inc,
(select value2 from TestTable where date < '2013/01/14') -
(select value2 from TestTable where date < '2013/02/23')
as Value2Inc
...
and so on, but this approach gives me a huge query.
I'd like to use the MAX and MIN SQL functions to simplify the query, but I don't know exactly how to do it, as I'm not a SQL master (at least not yet :-).
Could you please give me a hand here?
Edit: Oops, I think I have found the solution myself by adding a "GROUP BY Name" at the end of the query, like this:
select name,max(value1) - min(value1) from TestTable where date < '2013-02-23' and date > '2013-01-01' GROUP BY Name
That was it!
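A minimal sketch of the MAX/MIN approach from the edit above, run against an in-memory SQLite database through Python's sqlite3 module. The engine is an assumption, since the post does not name one. Dates are stored as ISO strings so that string comparison matches date order, and BETWEEN is used so that both end dates are included. The helper name `increments` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (Date TEXT, Value1 INT, Value2 INT, Value3 INT, Name TEXT)"
)
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?, ?, ?)",
    [
        ("2013-01-01", 1, 4, 7, "Name1"),
        ("2013-01-14", 6, 10, 8, "Name1"),
        ("2013-02-23", 10, 32, 9, "Name1"),
    ],
)


def increments(conn, start, end):
    """Per-name increase of each value column between two dates (inclusive).

    Relies on the values never decreasing, so MAX - MIN equals last - first.
    """
    return conn.execute(
        """
        SELECT MAX(Value1) - MIN(Value1),
               MAX(Value2) - MIN(Value2),
               MAX(Value3) - MIN(Value3),
               Name
        FROM TestTable
        WHERE Date BETWEEN ? AND ?
        GROUP BY Name
        """,
        (start, end),
    ).fetchall()


print(increments(conn, "2013-01-14", "2013-02-23"))
```

Note that strict `<` and `>` comparisons, as in the query above, leave out rows that fall exactly on the boundary dates.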

You want to match the next record, using a join. Probably the easiest way is to enumerate and join:
with tt as (
select tt.*, row_number() over (partition by name order by date) as seqnum
from testtable tt
)
select tt.name, tt.date, ttnext.date as nextdate,
(ttnext.value1 - tt.value1) as Diff_Value1,
(ttnext.value2 - tt.value2) as Diff_Value2,
(ttnext.value3 - tt.value3) as Diff_Value3
from tt left outer join
tt ttnext
on tt.name = ttnext.name and tt.seqnum = ttnext.seqnum - 1;
If your database does not support row_number(), you can do something similar with correlated subqueries.
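One way the correlated-subquery variant could look, sketched with Python's sqlite3 module and a cut-down version of the question's table (only Value1 is kept, and the choice of SQLite is an assumption). The subquery finds the smallest later date for the same name, which plays the role of `seqnum + 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Date TEXT, Value1 INT, Name TEXT)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?)",
    [
        ("2013-01-01", 1, "Name1"),
        ("2013-01-14", 6, "Name1"),
        ("2013-02-23", 10, "Name1"),
    ],
)

# Join each row to the row with the next later date for the same name.
rows = conn.execute(
    """
    SELECT t.Name, t.Date, nxt.Date AS NextDate,
           nxt.Value1 - t.Value1 AS Diff_Value1
    FROM TestTable t
    LEFT JOIN TestTable nxt
      ON nxt.Name = t.Name
     AND nxt.Date = (SELECT MIN(n.Date)
                     FROM TestTable n
                     WHERE n.Name = t.Name AND n.Date > t.Date)
    ORDER BY t.Name, t.Date
    """
).fetchall()

for row in rows:
    print(row)
```

The last row of each name has no successor, so its NextDate and difference come back as NULL (None in Python).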

Related

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key, but I am not actually using the table it points to.
I am trying to design a query that groups the results by Random and Date, selects the MAX Date within each group, and then gives me the associated ID.
If I run the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms, since ID is the primary key and will always be unique.
The results I need would look something like this:
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Another question: there could be times when several rows share the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This selects only the rows for which no other row with the same random value has a later date.
Can also be done using IN() :
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and tt.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY e.DateField, e.Random
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.
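A small sketch of the two-stage query and its tie-breaking behaviour, using Python's sqlite3 module (the engine is an assumption). The table and column names follow the answer above. The row with ID 7 is a hypothetical addition that ties with ID 4 on the latest date for Random 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Example (ID INTEGER PRIMARY KEY, DateField TEXT, Random INT)"
)
conn.executemany(
    "INSERT INTO Example VALUES (?, ?, ?)",
    [
        (1, "2015-04-12", 2),
        (2, "2015-04-15", 2),
        (3, "2015-03-12", 2),
        (4, "2015-09-16", 3),
        (5, "2015-01-12", 3),
        (6, "2015-02-12", 3),
        (7, "2015-09-16", 3),  # hypothetical row: same max date as ID 4
    ],
)

# Stage 1 (subquery): latest date per Random.
# Stage 2 (outer query): join back and keep the highest ID among tied rows.
rows = conn.execute(
    """
    SELECT MAX(e.ID) AS ID, e.DateField, e.Random
    FROM Example AS e
    INNER JOIN (SELECT Random, MAX(DateField) AS MaxDateField
                FROM Example
                GROUP BY Random) AS latest
      ON latest.Random = e.Random
     AND latest.MaxDateField = e.DateField
    GROUP BY e.DateField, e.Random
    ORDER BY e.Random
    """
).fetchall()

print(rows)
```

Here Random 3 resolves to ID 7 rather than ID 4, because MAX(e.ID) picks the larger of the two tied IDs.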

Joining next Sequential Row

I am planning an SQL statement right now and would like someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever-flexible join syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
select 1.0 * sum(case when seqnum = num then stat else - stat end) / (max(num) - 1)
from (select period, stat, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
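To see why the endpoints are enough, here is a rough check with Python's sqlite3 module (SQLite is an assumption; the original targets SQL Server). It computes the mean of the signed differences pair by pair, then again from the first and last rows only. The variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbstats (id INTEGER PRIMARY KEY, stat INT NOT NULL, period TEXT NOT NULL)"
)
stats = [10, 25, 5, 15, 30, 9, 22, 29]
conn.executemany(
    "INSERT INTO tbstats (stat, period) VALUES (?, ?)",
    [(s, f"2008-01-{day:02d}") for day, s in enumerate(stats, start=1)],
)

# Mean of (next - current) over every adjacent pair, via a self-join on id + 1.
pairwise_mean = conn.execute(
    """
    SELECT AVG(nxt.stat - cur.stat)
    FROM tbstats cur
    JOIN tbstats nxt ON nxt.id = cur.id + 1
    """
).fetchone()[0]

# The intermediate values cancel out, leaving (last - first) / (n - 1).
telescoped_mean = conn.execute(
    """
    SELECT ((SELECT stat FROM tbstats ORDER BY period DESC LIMIT 1)
          - (SELECT stat FROM tbstats ORDER BY period ASC LIMIT 1)) * 1.0
           / (COUNT(*) - 1)
    FROM tbstats
    """
).fetchone()[0]

print(pairwise_mean, telescoped_mean)
```

Both values equal 19 / 7 for this data. The shortcut only holds for signed differences: once ABS is applied to each gap, the terms no longer cancel and every pair has to be evaluated.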
You can also achieve this with a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x
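A sketch of the LAG() approach combined with ABS, as the question describes, using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the averaging is done in Python to keep the query short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbstats (id INTEGER PRIMARY KEY, stat INT NOT NULL)")
conn.executemany(
    "INSERT INTO tbstats (stat) VALUES (?)",
    [(s,) for s in [10, 25, 5, 15, 30, 9, 22, 29]],
)

# Absolute difference between each stat and the previous one, in id order.
raw_gaps = [
    row[0]
    for row in conn.execute(
        """
        SELECT ABS(stat - LAG(stat) OVER (ORDER BY id)) AS gap
        FROM tbstats
        ORDER BY id
        """
    )
]

# The first row has no predecessor, so its gap is NULL (None) and is skipped.
gaps = [g for g in raw_gaps if g is not None]
mean_gap = sum(gaps) / len(gaps)

print(gaps, mean_gap)
```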

How to SELECT top N rows that sum to a certain amount?

Suppose:
MyTable
--
Amount
1
2
3
4
5
MyTable only has one column, Amount, with 5 rows. They are not necessarily in increasing order.
How can I create a function, which takes a @SUM INT, and returns the TOP N rows that sum to this amount?
So for input 6, I want
Amount
1
2
3
Since 1 + 2 + 3 = 6. 2 + 4 / 1 + 5 won't work since I want TOP N ROWS
For 7/8/9/10, I want
Amount
1
2
3
4
I'm using MS SQL Server 2008 R2, if this matters.
Saying "top N rows" is indeed ambiguous when it comes to relational databases.
I assume that you want to order by "amount" ascending.
I would add a second column (to a table or view) like "sum_up_to_here", and create something like this:
create view mytable_view as
select
mt1.amount,
coalesce(sum(mt2.amount), 0) as sum_up_to_here
from
mytable mt1
left join mytable mt2 on (mt2.amount < mt1.amount)
group by mt1.amount
or:
create view mytable_view as
select
mt1.amount,
coalesce((select sum(amount) from mytable where amount < mt1.amount), 0) as sum_up_to_here
from mytable mt1
and then I would select the final rows:
select amount from mytable_view where sum_up_to_here < (some value)
If you are not concerned about performance, you can of course run it in one query:
select amount from
(
select
mt1.amount,
coalesce(sum(mt2.amount), 0) as sum_up_to_here
from
mytable mt1
left join mytable mt2 on (mt2.amount < mt1.amount)
group by mt1.amount
) t where sum_up_to_here < 20
One approach:
select t1.amount
from MyTable t1
left join MyTable t2 on t1.amount > t2.amount
group by t1.amount
having coalesce(sum(t2.amount),0) < 7
In SQL Server you can use CTEs (common table expressions) to make it pretty simple to read.
Here is a CTE I wrote to sum up totals used in sequence. The CTE is similar to the joins above, and holds the total up to any given index. Outside of the CTE I join it back to the original table so I can select it along with other fields.
;with summrp as (
select m1.idx, sum(m2.QtyReq) as sumUsed
from #mrpe m1
join #mrpe m2 on m2.idx <= m1.idx
group by m1.idx
)
select RefNum, RefLineSuf, QtyReq, ProjectedDate, sumUsed from #mrpe m
join summrp on summrp.idx=m.idx
In SQL Server 2012 you can use this shortcut to get a result like Grzegorz's.
SELECT amount
FROM (
SELECT * ,
SUM(amount) OVER (ORDER BY amount ASC) AS total
from demo
) T
WHERE total <= 6
A fiddle in the hand... http://sqlfiddle.com/#!6/b8506/6
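The running-total idea from the answers above could be wrapped in a function along these lines. This is a sketch in Python with sqlite3 rather than a SQL Server function, and `rows_summing_to` is a hypothetical name. It assumes the amounts are distinct and orders them ascending, keeping each row whose preceding total is still below the target.

```python
import sqlite3


def rows_summing_to(conn, target):
    """Return amounts in ascending order until the running total reaches target."""
    query = """
        SELECT t1.amount
        FROM MyTable t1
        WHERE COALESCE((SELECT SUM(t2.amount)
                        FROM MyTable t2
                        WHERE t2.amount < t1.amount), 0) < ?
        ORDER BY t1.amount
    """
    return [row[0] for row in conn.execute(query, (target,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (amount INT)")
# Inserted out of order on purpose; the query does the ordering.
conn.executemany("INSERT INTO MyTable VALUES (?)", [(5,), (3,), (1,), (4,), (2,)])

print(rows_summing_to(conn, 6))
print(rows_summing_to(conn, 7))
```

The COALESCE matters for the smallest row: it has no smaller amounts, so the inner SUM is NULL and the row would otherwise be dropped.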

Help in correcting the SQL

I am trying to solve this query. I have the following data:
Input
Date Id Value
25-May-2011 1 10
26-May-2011 1 10
26-May-2011 2 10
27-May-2011 1 20
27-May-2011 2 20
28-May-2011 1 10
I need to query and output as:
Output
FromDate ToDate Id Value
25-May-2011 26-May-2011 1 10
26-May-2011 26-May-2011 2 10
27-May-2011 27-May-2011 1 20
28-May-2011 28-May-2011 1 10
I tried this sql but I'm not getting the correct result:
SELECT START_DATE, END_DATE, A.KEY, B.VALUE FROM
(
SELECT MIN(DATE) START_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY,VALUE
) A INNER JOIN
(
SELECT MAX(DATE) END_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY, VALUE
) B ON A.KEY = B.KEY AND A.VALUE = B.VALUE;
I think that you are trying too hard. Should be more like this:
SELECT MIN(DATE) AS FromDate, MAX(DATE) AS ToDate, KEY, VALUE
FROM KEY_VALUE
GROUP BY KEY, VALUE
This query appears to produce the correct results, though it also revealed that your example output is missing a line: '27-May-2011 ... 27-May-2011 ... 2 ... 20'.
select id, [value], date as fromdate, (
select top 1 date
from key_value kv2
where id = kv.id
and [value] = kv.[value]
and date >= kv.date
and datediff(d, kv.date, date) = (
select count(*)
from key_value
where id = kv.id
and [value] = kv.[value]
and date > kv.date
and date <= kv2.date
)
order by date desc
) as todate
from key_value kv
where not exists (
select *
from key_value
where id = kv.id
and [value] = kv.[value]
and date = dateadd(d, -1, kv.[date])
)
First it finds the min date records with the where clause, looking for records that do not have another record on the day before. Then the todate subquery gets the greatest date record by finding the number of days between it and min date then finding the number of records between the two and making sure they match. This of course assumes that the records in the table are distinct.
However, if you are processing a massive table, your best option may be to sort the records by key, id, and date, and then use a cursor to find the min and max dates programmatically: loop over the rows, watch for the values to change, and push each range into a new table (real or temporary), along with any other calculations you need to do on other fields along the way.
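The sort-and-loop idea can be sketched outside the database as well. The following Python function is an illustration, not the answerer's code: `date_ranges` is a hypothetical name, the records are the question's sample data, and the rows are assumed to be distinct.

```python
from datetime import date, timedelta


def date_ranges(records):
    """Collapse (day, id, value) records into (from, to, id, value) runs.

    A run is a stretch of consecutive days with the same id and value.
    """
    ranges = []
    # Sort so that rows for the same id/value are adjacent and in date order.
    for day, key, value in sorted(records, key=lambda r: (r[1], r[2], r[0])):
        last = ranges[-1] if ranges else None
        if (
            last is not None
            and last[2] == key
            and last[3] == value
            and day - last[1] == timedelta(days=1)
        ):
            last[1] = day  # extend the current run by one day
        else:
            ranges.append([day, day, key, value])  # start a new run
    return sorted(tuple(r) for r in ranges)


records = [
    (date(2011, 5, 25), 1, 10),
    (date(2011, 5, 26), 1, 10),
    (date(2011, 5, 26), 2, 10),
    (date(2011, 5, 27), 1, 20),
    (date(2011, 5, 27), 2, 20),
    (date(2011, 5, 28), 1, 10),
]

for from_date, to_date, key, value in date_ranges(records):
    print(from_date, to_date, key, value)
```

This makes a single pass over the sorted rows, which is the same shape of work a cursor would do.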

Retrieve/update rows with a minimal deviation in a certain column value

I have a database table with one column being dates. However, some of the rows should share the same date but due to lag on insertion there's a one second difference between them. The insert part has been fixed already but the current data in the table needs to be fixed as well.
As an example the following data is present:
2008-10-08 12:23:01 1 1 x
2008-10-08 12:23:01 1 2 y
2008-10-08 12:23:02 1 3 z
Now I want to update the last row in this example and set the date to '2008-10-08 12:23:01'.
The best way I can think of is writing an external script to do that. It's tricky to determine which columns are correct and which should be updated without having more control over the grouping. Pseudo-code:
all_rows = SELECT * FROM table ORDER BY date
last_date = NULL
rows_to_update = []
for row in all_rows:
    if last_date is NULL or row.date - last_date > X seconds:
        set date to last_date for all rows from rows_to_update
        last_date = row.date
        rows_to_update = []
    else if row.date != last_date:
        rows_to_update += row
set date to last_date for all rows from rows_to_update
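The core of the pseudo-code above might look like this in Python. It is a sketch under a few assumptions: the timestamps are already loaded and sorted, the function name `normalize_dates` and the one-second default tolerance are made up for illustration, and it returns the corrected values instead of writing them back to the table.

```python
from datetime import datetime, timedelta


def normalize_dates(dates, tolerance=timedelta(seconds=1)):
    """Snap each timestamp to the first timestamp of its cluster.

    dates must be sorted ascending. A new cluster starts whenever a timestamp
    is more than `tolerance` after the first timestamp of the current cluster.
    """
    normalized = []
    anchor = None
    for current in dates:
        if anchor is None or current - anchor > tolerance:
            anchor = current  # this timestamp starts a new cluster
        normalized.append(anchor)
    return normalized


dates = [
    datetime(2008, 10, 8, 12, 23, 1),
    datetime(2008, 10, 8, 12, 23, 1),
    datetime(2008, 10, 8, 12, 23, 2),
    datetime(2008, 10, 8, 12, 23, 5),
]

for before, after in zip(dates, normalize_dates(dates)):
    print(before, "->", after)
```

In a real script, each row whose normalized value differs from its stored date would then be written back with an UPDATE keyed on the row's identifier.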
Alternatively, something like this could work, but you might need more than one run if you want to handle cases where all three dates are different and you want to normalize two of them to the first one.
UPDATE
tbl t,
(SELECT
t.date,
(SELECT min(date)
FROM tbl
WHERE timestampdiff(SECOND,date,t.date) BETWEEN 1 AND 3) AS new_date
FROM tbl t) t2
SET t.date=t2.new_date
WHERE t.date=t2.date AND t2.new_date IS NOT NULL
For all rows:
update yourtable set date_added=date_added-'01';
For a specific row, add a WHERE clause.
"due to lag on insertion"
Why don't you get the date for insert before inserting/updating the first row and use that for all the other rows?
Assuming you have this structure:
create table tbl(id int identity, dt datetime)
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:02')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:06')
This query will only show the last item of each set that's 1 second late:
select distinct A.* from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
Using that in conjunction with an UPDATE statement, you get this:
update tbl set dt = (select top 1 dt from tbl where tbl.id < A.id order by tbl.id desc)
from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
And that updates the last record of each set to the date above it, giving the results:
1 2009-10-08 12:23:01.000
2 2009-10-08 12:23:01.000
3 2009-10-08 12:23:01.000
4 2009-10-08 12:23:05.000
5 2009-10-08 12:23:05.000
6 2009-10-08 12:23:05.000
It's quick, dirty, and unoptimized, but for a one-off data scrub it should work.
Remember to back up!