Suppose that I have a dataframe as follows:
| ID | Value | Time |
|---------|-------|------|
| 101 | 100 | 1 |
| 101 | 0 | 2 |
| 101 | 200 | 4 |
| 101 | 200 | 7 |
| 101 | 0 | 10 |
| 102 | 100 | 2 |
| 102 | 0 | 3 |
| 102 | 200 | 5 |
For each non-zero Value, I would like to find the next Time that Value=0 for the same ID. So my desired output will be
| ID | Value | Time | NextTime |
|---------|-------|------|----------|
| 101 | 100 | 1 | 2 |
| 101 | 0 | 2 | Null |
| 101 | 200 | 4 | 10 |
| 101 | 200 | 7 | 10 |
| 101 | 0 | 10 | Null |
| 102 | 100 | 2 | 3 |
| 102 | 0 | 3 | Null |
| 102 | 200 | 5 | Null |
I have tried to use the following subquery:
SELECT *,
       CASE WHEN Value = 0 THEN NULL
            ELSE (SELECT MIN(Time)
                  FROM Table1 sub
                  WHERE sub.ID = main.ID
                    AND sub.Time > main.Time
                    AND sub.Value = 0)
       END AS NextTime
FROM Table1 AS main
ORDER BY ID, Time
This query should work, but the problem is that I am working with an extremely large dataframe (millions of records), so this query cannot finish in a reasonable time. Could anyone help with a more efficient way to get the desired result? Thanks.
You want a cumulative minimum:
select t.*,
min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
) as next_0_time
from t;
EDIT:
If you want values on the 0 rows to be NULL, then use a case expression:
select t.*,
(case when value <> 0
then min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
)
end) as next_0_time
from t;
Here is a db<>fiddle.
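For reference, a minimal script to reproduce the sample data from the question, should you want to test either query locally (the column types here are assumptions; adjust to your schema):

create table Table1 (ID int, Value int, Time int);

insert into Table1 (ID, Value, Time)
values (101, 100, 1), (101, 0, 2), (101, 200, 4), (101, 200, 7), (101, 0, 10),
       (102, 100, 2), (102, 0, 3), (102, 200, 5);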
Related
I am having trouble semi-transposing the table below based on the 'LENGTH' column. I am using an Oracle database; sample data:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | LENGTH | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 4 | 1 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I would like to lengthen this table based on the LENGTH column, basically duplicating each row once for every value from 1 up to its LENGTH.
See the desired output table below:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | NUMBER | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 1 |
| 1 | 1 | 4 | 1 |
| 1 | 2 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 3 | 1 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I typically work in Postgres, so Oracle is new to me.
I've found some solutions using the CONNECT BY clause, but they seem overly complicated, particularly when compared to the simple generate_series() function in Postgres.
A recursive CTE subtracting 1 from LENGTH until 1 is reached should work. (It works in Postgres too, BTW, should you need something cross-platform.)
WITH cte (person_id,
period_id,
number_,
flag)
AS
(
SELECT person_id,
period_id,
length number_,
flag
FROM elbat
UNION ALL
SELECT person_id,
period_id,
number_ - 1 number_,
flag
FROM cte
WHERE number_ > 1
)
SELECT *
FROM cte
ORDER BY person_id,
period_id,
number_;
db<>fiddle
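If you would rather stay close to the CONNECT BY approach the question mentions, a lateral join keeps it compact on Oracle 12c or later. This is a sketch, reusing the elbat table and number_ column naming from the answer above:

select t.person_id,
       t.period_id,
       l.number_,
       t.flag
from elbat t
cross join lateral (
       -- one row per value 1..length for the current outer row
       select level as number_
       from dual
       connect by level <= t.length
     ) l
order by t.person_id,
         t.period_id,
         l.number_;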
I have a table that looks like this:
+------+----+--------+------+
| Head | ID | Amount | Rank |
+------+----+--------+------+
| 1    | 10 | 1000   | 1    |
| 1    | 11 | 1200   | 2    |
| 1    | 12 | 1500   | 3    |
| 2    | 20 | 3400   | 1    |
| 2    | 21 | 3600   | 2    |
| 2    | 22 | 4200   | 3    |
| 2    | 23 | 1700   | 4    |
+------+----+--------+------+
I want a new column (New_column) that does the following:
+------+----+--------+------+------------+
| Head | ID | Amount | Rank | New_column |
+------+----+--------+------+------------+
| 1    | 10 | 1000   | 1    | 1000       |
| 1    | 11 | 1200   | 2    | 1000       |
| 1    | 12 | 1500   | 3    | 1200       |
| 2    | 20 | 3400   | 1    | 3400       |
| 2    | 21 | 3600   | 2    | 3400       |
| 2    | 22 | 4200   | 3    | 3600       |
| 2    | 23 | 1700   | 4    | 4200       |
+------+----+--------+------+------------+
Within each Head, if Rank is not 1, New_column takes the Amount of the row with the previous Rank in the same Head (Rank 2 takes the Amount of Rank 1, Rank 3 takes the Amount of Rank 2, and so on); Rank 1 keeps its own Amount.
I know how to do this with a for loop in other programming languages, but I don't know how to do it in SQL.
I think you basically want lag():
select t.*,
lag(amount, 1, amount) over (partition by head order by rank) as new_column
from t;
The three-argument form of lag() lets you provide a default value, so here the Rank 1 rows return their own amount instead of NULL.
You can left join the table to itself, matching each row to the row with the previous rank in the same head:
select t1.*,
       case when t1.rank = 1 then t1.amount else t2.amount end as new_amount
from your_table t1
left join your_table t2
  on t1.head = t2.head
 and t2.rank = t1.rank - 1
You can use this update (it assumes New_column already exists; note that the correlated subquery matches on Head, since ID is unique per row):
UPDATE your_table b
SET New_column = CASE WHEN b.rank = 1 THEN b.Amount
                      ELSE (SELECT a.Amount
                            FROM your_table a
                            WHERE a.Head = b.Head AND a.rank = b.rank - 1)
                 END
I have huge data, and a sample of the table looks like below:
+-----------+------------+-----------+-----------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+-----------+
| 1 | 6/3/2014 | 1 | 6/3/2014 |
| 1 | 5/22/2015 | 2 | NULL |
| 1 | 6/3/2015 | 3 | NULL |
| 1 | 11/20/2015 | 4 | NULL |
| 2 | 2/25/2014 | 1 | 2/25/2014 |
| 2 | 7/31/2014 | 2 | NULL |
| 2 | 8/26/2014 | 3 | NULL |
+-----------+------------+-----------+-----------+
Now I need to compare the Date in the 2nd row with the Flag_Date in the 1st row: if the difference is more than 180 days, the 2nd row's Flag_Date should be updated with the Date of the 2nd row; otherwise it should be updated with the Flag_Date of the 1st row. The same rule follows for all rows with the same Unique_ID.
update a
set a.Flag_Date=case when DATEDIFF(dd,b.Flag_Date,a.[Date])>180 then a.[Date] else b.Flag_Date end
from Table1 a
inner join Table1 b
on a.RowNumber=b.RowNumber+1 and a.Unique_ID=b.Unique_ID
When the above update query is executed once, only the second row under each Unique_ID gets updated, and the result looks like below:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | NULL |
| 1 | 2015-11-20 | 4 | NULL |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | NULL |
+-----------+------------+-----------+------------+
And I need to run it four times to achieve my desired result:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | 2015-05-22 |
| 1 | 2015-11-20 | 4 | 2015-11-20 |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | 2014-08-26 |
+-----------+------------+-----------+------------+
Is there a way I can run the update only once so that all the rows are updated?
Thank you!
If you are using SQL Server 2012+, then you can use lag():
with toupdate as (
select t1.*,
lag(flag_date) over (partition by unique_id order by rownumber) as prev_flag_date
from table1 t1
)
update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end);
Both this version and your version can take advantage of an index on table1(unique_id, rownumber) or, better yet, table1(unique_id, rownumber, flag_date).
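For example (the index names are illustrative; you would create one or the other):

create index ix_table1_uid_rownum
    on table1 (unique_id, rownumber);

-- the wider version also covers flag_date, so the seek never
-- has to touch the base table
create index ix_table1_uid_rownum_flag
    on table1 (unique_id, rownumber, flag_date);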
EDIT:
In earlier versions, this might have better performance:
with toupdate as (
select t1.*, t2.flag_date as prev_flag_date
from table1 t1 outer apply
(select top 1 t2.flag_date
from table1 t2
where t2.unique_id = t1.unique_id and
t2.rownumber < t1.rownumber
order by t2.rownumber desc
) t2
)
update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end);
The CTE can make use of the same index -- and it is important to have the index. The reason for the better performance is that your join on RowNumber + 1 cannot use an index on that expression.
I have an issue where I need to group a set of values and increase the group number whenever the variance between 2 columns is greater than or equal to 4; please see below.
UPDATE: I added a date column so you can view the order, but I need the group to update based on the variance, not the date.
+--------+-------+-------+----------+--------------+
| Date | Col 1 | Col 2 | Variance | Group Number |
+--------+-------+-------+----------+--------------+
| 1-Jun | 2 | 1 | 1 | 1 |
| 2-Jun | 1 | 1 | 0 | 1 |
| 3-Jun | 3 | 2 | 1 | 1 |
| 4-Jun | 4 | 1 | 3 | 1 |
| 5-Jun | 5 | 1 | 4 | 2 |
| 6-Jun | 1 | 1 | 0 | 2 |
| 7-Jun | 23 | 12 | 11 | 3 |
| 8-Jun | 12 | 11 | 1 | 3 |
| 9-Jun | 2 | 1 | 1 | 3 |
| 10-Jun | 13 | 4 | 9 | 4 |
| 11-Jun | 2 | 1 | 1 | 4 |
+--------+-------+-------+----------+--------------+
The group number is simply one more than the number of times a variance of 4 or greater has appeared, counting from the first row up to and including the current one. You can get this using a correlated subquery:
select t.*,
       (select 1 + count(*)
        from table t2
        where t2.date <= t.date and t2.variance >= 4
       ) as GroupNumber
from table t;
In SQL Server 2012+, you can also do this using a cumulative sum:
select t.*,
       1 + sum(case when variance >= 4 then 1 else 0 end) over
           (order by date
            rows between unbounded preceding and current row
           ) as GroupNumber
from table t;
I need help with a SQL that will convert this table:
===================
| Id | FK | Status|
===================
| 1 | A | 100 |
| 2 | A | 101 |
| 3 | B | 100 |
| 4 | B | 101 |
| 5 | C | 100 |
| 6 | C | 101 |
| 7 | A | 102 |
| 8 | A | 102 |
| 9 | B | 102 |
| 10 | B | 102 |
===================
to this:
==========================================
| FK | Count 100 | Count 101 | Count 102 |
==========================================
| A | 1 | 1 | 2 |
| B | 1 | 1 | 2 |
| C | 1 | 1 | 0 |
==========================================
I can do simple counts, etc., but am struggling trying to pivot the table with the derived information. Any help is appreciated.
Use:
SELECT t.fk,
SUM(CASE WHEN t.status = 100 THEN 1 ELSE 0 END) AS count_100,
SUM(CASE WHEN t.status = 101 THEN 1 ELSE 0 END) AS count_101,
SUM(CASE WHEN t.status = 102 THEN 1 ELSE 0 END) AS count_102
FROM TABLE t
GROUP BY t.fk
Alternatively, use PIVOT. The fk column is duplicated as fk1 because PIVOT aggregates one column (here, the count of fk1) and implicitly groups by every other column left in the source subquery:
select *
from (select fk, fk as fk1, status from #t) as t
pivot
(count(fk1) for status in ([100], [101], [102])) as pt
Just adding a shortcut to #OMG's answer.
You can eliminate the CASE expression:
SELECT t.fk,
SUM(t.status = 100) AS count_100,
SUM(t.status = 101) AS count_101,
SUM(t.status = 102) AS count_102
FROM TABLE t
GROUP BY t.fk
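Note that summing a bare comparison like this works in MySQL, where boolean expressions evaluate to 1 or 0, but not in SQL Server. There, keep the CASE version above, or on SQL Server 2012+ use IIF as a shorthand; a sketch, with your_table standing in for the real table name:

SELECT t.fk,
       SUM(IIF(t.status = 100, 1, 0)) AS count_100,
       SUM(IIF(t.status = 101, 1, 0)) AS count_101,
       SUM(IIF(t.status = 102, 1, 0)) AS count_102
FROM your_table t
GROUP BY t.fk;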