Update based on order - SQL

Is it possible to update data based on a priority defined in a column?
I have input data like this:
id  Start_date  active_flag
1   21-03-2013  N
1   23-03-2013  N
1   22-02-2013  N
1   20-02-2013  N
We have to maintain SCD2 (type 2 slowly changing dimension) history in our data and keep the row for the latest date (i.e. 23-03-2013 here) as active in our database.
We will be getting files daily, but in some cases we can get a file with combined data for 2 days. I have to make sure all the history is maintained and the row with the latest date is marked active.
My target data will look like:
id  Start_date  active_flag
1   21-03-2013  N
1   23-03-2013  Y
1   22-02-2013  N
1   20-02-2013  N
But how do I write an UPDATE that sets the flag for each id based on the order of Start_date?
Thanks in advance

CREATE TABLE #tst (id int, start_date datetime, active_flag varchar(2))

INSERT INTO #tst
SELECT 1, '2013-03-21', '' UNION
SELECT 1, '2013-03-23', '' UNION
SELECT 1, '2013-03-20', '' UNION
SELECT 1, '2013-03-19', '' UNION
SELECT 1, '2013-03-18', ''

-- Rank each row per id by start_date (latest first) and flag only rank 1 as active
UPDATE a
SET active_flag = CASE WHEN b.r_id = 1 THEN 'Y' ELSE 'N' END
FROM #tst a
JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date DESC) AS r_id, *
      FROM #tst) b
  ON a.id = b.id AND a.start_date = b.start_date

SELECT * FROM #tst
This assumes there will not be duplicate dates for the same id.
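If you are on SQL Server, a more compact variant (a sketch, only checked against the sample table above) is to update through the ranking CTE directly, which avoids the self-join:
-- Sketch, assuming SQL Server: an updatable CTE avoids the self-join
WITH ranked AS (
    SELECT active_flag,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date DESC) AS r_id
    FROM #tst
)
UPDATE ranked
SET active_flag = CASE WHEN r_id = 1 THEN 'Y' ELSE 'N' END;
Because every row gets a unique row number, exactly one row per id is flagged 'Y' even if two rows share the same date.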


Snowflake: Repeating rows based on column value

How do I repeat rows based on a column value in Snowflake using SQL?
I tried a few methods, such as dual and connect by, but they are not working.
I have two columns: Id and Quantity.
For each Id there is a different value of Quantity.
So if you have a count, you can use a generator:
with ten_rows as (
    select row_number() over (order by null) as rn
    from table(generator(ROWCOUNT => 10))
), data(id, count) as (
    select * from values
        (1, 2),
        (2, 4)
)
select
    d.*,
    r.rn
from data as d
join ten_rows as r
  on d.count >= r.rn
order by 1, 3;
ID  COUNT  RN
1   2      1
1   2      2
2   4      1
2   4      2
2   4      3
2   4      4
Ok, let's start by generating some data. We will create 10 rows, with a QTY randomly chosen as 1 or 2.
Next we want to duplicate the rows with a QTY of 2 and leave the QTY = 1 rows as they are.
Obviously you can change all the parameters to suit your needs; this solution works super fast and is, in my opinion, way better than table generation.
Simply stack SPLIT_TO_TABLE() and REPEAT() with a LATERAL join and voila: repeating a delimiter QTY - 1 times and splitting on it yields exactly QTY rows per input row.
WITH TEN_ROWS AS (
    SELECT ROW_NUMBER() OVER (ORDER BY NULL) SOME_ID,
           UNIFORM(1, 2, RANDOM()) QTY
    FROM TABLE(GENERATOR(ROWCOUNT => 10))
)
SELECT TEN_ROWS.*
FROM TEN_ROWS,
     LATERAL SPLIT_TO_TABLE(REPEAT(',', QTY - 1), ',') ALTERNATIVE_APPROACH;

SQL query: get the latest record of the group and calculate the value of those records

I have the following table (just a sample) and would like to subtract the Points of Record 1 from Record 2 (Record2 - Record1), using the latest record of each of Record 1 and Record 2. The records are entered per Match; one Match consists of 2 records, Record 1 and Record 2.
The output here should be 3, as the newest records are IDs 3 and 4 from Match 2.
ID  Name      Points  TimeRecorded  Match
1   Record 1  3       2-Mar 2pm     1
2   Record 2  5       2-Mar 2pm     1
3   Record 1  5       4-Mar 5pm     2
4   Record 2  8       4-Mar 5pm     2
I tried to get the value by subtracting the two queries below, but I feel this is not a good way, as the Match and the record Name are hard coded. How can I construct a better query that takes the latest records of the grouped match and subtracts Record 1's points from Record 2's?
SELECT
(select Points from RunRecord where Name= 'Record2' AND Match = 2)
- (select Points from RunRecord where Name= 'Record1' AND Match = 2)
You could use:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
FROM yourTable
)
SELECT
MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1;
The CTE assigns a row number within each group of records sharing the same name, with 1 assigned to the most recent record. Then we aggregate over the entire table and pivot out the points to find the difference. On the sample data, rn = 1 picks ID 3 (Record 1, 5 points) and ID 4 (Record 2, 8 points), giving diff = 8 - 5 = 3.
You can use the rank() window function to rank the records by Match descending, then take the top-ranked records and use conditional aggregation to control the sign with which the points are added:
SELECT sum(CASE x.name
             WHEN 'Record 2' THEN x.points
             WHEN 'Record 1' THEN -x.points
           END)
FROM (SELECT rr.name,
             rr.points,
             rank() OVER (ORDER BY rr.match DESC) r
      FROM runrecord rr
      WHERE name IN ('Record 1', 'Record 2')) x
WHERE x.r = 1;

Joining next Sequential Row

I am planning an SQL statement right now and need someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever-flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I can incorporate it into the SQL query twice, then join the two in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1, and 1 + 1 = 2, so it should join to the row with id 2 in the second aliased table, and so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that the result of the subtraction is always a positive integer, regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be computed by taking the difference between the first value and the last value and dividing by one less than the number of elements:
-- the sum telescopes to (last - first); * 1.0 avoids integer division
select sum(case when seqnum = num then stat else -stat end) * 1.0 / (max(num) - 1)
from (select period, stat,
             row_number() over (order by period) as seqnum,
             count(*) over () as num
      from tbstats
     ) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but the query above will also work in SQL Server 2005 and 2008.
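For completeness, a sketch of the lead() version (assumes SQL Server 2012 or later); AVG ignores the NULL gap produced for the last row:
-- Sketch, SQL Server 2012+: mean of the signed gaps between consecutive stats
select avg(1.0 * gap) as mean_gap
from (select lead(stat) over (order by period) - stat as gap
      from tbstats) t;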
You can also achieve this with a join:
SELECT t1.period,
       t1.stat,
       t1.stat - t2.stat AS gap
FROM tbstats t1
LEFT JOIN tbstats t2
  ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something while LAG(something) returns the previous something in the given order:
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

Help in correcting the SQL

I am trying to solve this query. I have the following data:
Input
Date         Id  Value
25-May-2011  1   10
26-May-2011  1   10
26-May-2011  2   10
27-May-2011  1   20
27-May-2011  2   20
28-May-2011  1   10
I need to query and output as:
Output
FromDate     ToDate       Id  Value
25-May-2011  26-May-2011  1   10
26-May-2011  26-May-2011  2   10
27-May-2011  27-May-2011  1   20
28-May-2011  28-May-2011  1   10
I tried this SQL but I'm not getting the correct result:
SELECT START_DATE, END_DATE, A.KEY, B.VALUE
FROM
(
    SELECT MIN(DATE) START_DATE, KEY, VALUE
    FROM KEY_VALUE
    GROUP BY KEY, VALUE
) A
INNER JOIN
(
    SELECT MAX(DATE) END_DATE, KEY, VALUE
    FROM KEY_VALUE
    GROUP BY KEY, VALUE
) B ON A.KEY = B.KEY AND A.VALUE = B.VALUE;
I think that you are trying too hard. It should be more like this:
SELECT MIN(START_DATE) AS FromDate, MAX(END_DATE) AS ToDate, KEY, VALUE
FROM KEY_VALUE
GROUP BY KEY, VALUE
This query appears to produce the correct results, though writing it pointed out that you missed a row in your example output: '27-May-2011 ... 27-May-2011 ... 2 ... 20'.
select id, [value], date as fromdate, (
        select top 1 date
        from key_value kv2
        where id = kv.id
          and [value] = kv.[value]
          and date >= kv.date
          and datediff(d, kv.date, date) = (
                select count(*)
                from key_value
                where id = kv.id
                  and [value] = kv.[value]
                  and date > kv.date
                  and date <= kv2.date
          )
        order by date desc
    ) as todate
from key_value kv
where not exists (
        select *
        from key_value
        where id = kv.id
          and [value] = kv.[value]
          and date = dateadd(d, -1, kv.[date])
)
First it finds the min-date records with the WHERE clause, looking for records that do not have another record on the day before. Then the todate subquery gets the greatest-date record of each run by finding the number of days between it and the min date, counting the records between the two, and making sure the two numbers match. This of course assumes that the records in the table are distinct.
However, if you are processing a massive table, your best option may be to sort the records by key, id, and date, then use a cursor to programmatically find the min and max dates as you loop over the rows and look for the value to change, pushing the results into a new table (real or temp) along with any other calculations you might need to do on other fields along the way. A set-based alternative is sketched below.
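For reference, a sketch of the classic gaps-and-islands alternative (assuming SQL Server syntax and at most one row per id/value per day): subtracting a per-group row number from the date yields a constant for each consecutive run, which can then be grouped on.
-- Sketch: gaps-and-islands via ROW_NUMBER (assumes daily granularity, distinct rows)
SELECT MIN(date) AS FromDate, MAX(date) AS ToDate, id, [value]
FROM (
    SELECT id, [value], date,
           DATEADD(d, -ROW_NUMBER() OVER (PARTITION BY id, [value] ORDER BY date), date) AS grp
    FROM key_value
) t
GROUP BY id, [value], grp;
On the sample data this returns the four expected ranges plus the 27-May row for id 2 noted above.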

Retrieve/update rows with a minimal deviation in a certain column value

I have a database table with one column containing dates. Some of the rows should share the same date, but due to lag on insertion there is a one-second difference between them. The insert part has already been fixed, but the current data in the table needs to be fixed as well.
As an example the following data is present:
2008-10-08 12:23:01 1 1 x
2008-10-08 12:23:01 1 2 y
2008-10-08 12:23:02 1 3 z
Now I want to update the last row in this example and set the date to '2008-10-08 12:23:01'.
The best way I can think of is writing an external script to do it; it's tricky to determine which rows are correct and which should be updated without having more control over the grouping. Pseudo-code:
all_rows = SELECT * FROM table ORDER BY date
last_date = NULL
rows_to_update = []
for row in all_rows:
    if last_date is NULL or row.date - last_date > X seconds:
        # a new group starts: flush the previous group first
        set date to last_date for all rows in rows_to_update
        last_date = row.date
        rows_to_update = []
    else if row.date != last_date:
        rows_to_update += row
# flush the final group
set date to last_date for all rows in rows_to_update
Alternatively, something like this could work, but you might need more than one run if you want to handle cases where all three dates are different and you want to normalize two of them to the first one.
UPDATE
    tbl t,
    (SELECT t.date,
            (SELECT min(date)
             FROM tbl
             WHERE timestampdiff(SECOND, date, t.date) BETWEEN 1 AND 3) AS new_date
     FROM tbl t) t2
SET t.date = t2.new_date
WHERE t.date = t2.date AND t2.new_date IS NOT NULL
For all rows:
update yourtable set date_added = date_added - '01';
For a specific row, add a WHERE clause.
due to lag in insertion
Why don't you get the date for insert before inserting/updating the first row and use that for all the other rows?
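For example, a sketch (assuming SQL Server syntax and a table like the tbl(dt) defined below): capture the timestamp once and reuse it for every row of the batch:
-- Sketch: capture one timestamp up front so all rows of a batch share it
DECLARE @batch_time datetime;
SET @batch_time = GETDATE();
INSERT INTO tbl (dt) VALUES (@batch_time);
INSERT INTO tbl (dt) VALUES (@batch_time);
INSERT INTO tbl (dt) VALUES (@batch_time);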
Assuming you have this structure:
create table tbl(id int identity, dt datetime)
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:02')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:06')
This query will only show the last item of each set that's 1 second late:
select distinct A.* from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
Using that in conjunction with an UPDATE statement, you get this:
update tbl set dt = (select top 1 dt from tbl where tbl.id < A.id order by tbl.id desc)
from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
And that updates the last record of each set to the date above it, giving the results:
1 2009-10-08 12:23:01.000
2 2009-10-08 12:23:01.000
3 2009-10-08 12:23:01.000
4 2009-10-08 12:23:05.000
5 2009-10-08 12:23:05.000
6 2009-10-08 12:23:05.000
It's quick and dirty and unoptimized, but for a once-off data scrub it should work.
Remember to back up!