I have a table
| ID  | V1   | V2   |
|-----|------|------|
| 100 | 1    | 1    |
| 100 | null | 1    |
| 101 | null | null |
| 101 | 1    | 1    |
| 102 | 1    | null |
| 102 | 1    | null |
Needed sample output:
ID 100 has a V1 value in at least one of its rows, so the result should be 1.
The same goes for ID 101: it has a V1 value in at least one of its rows, so the result should be 1.
ID 102 has no V2 value in either row, so the result should be null.
Required output:
| ID  | V1 | V2   |
|-----|----|------|
| 100 | 1  | 1    |
| 101 | 1  | 1    |
| 102 | 1  | null |
I tried to combine the values into a list and take the max value.
Is there an easier function that can achieve this?
You can use aggregation; MAX() ignores NULL values:
select id, max(v1) as v1, max(v2) as v2
from table t
group by id;
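Because MAX() skips NULLs, grouping by ID keeps any non-null value that appears for V1/V2. A minimal, hypothetical sketch using Python's sqlite3 (table and column names made up to mirror the question):

```python
import sqlite3

# Hypothetical reproduction of the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, v1 INTEGER, v2 INTEGER);
INSERT INTO t VALUES
 (100,1,1),(100,NULL,1),
 (101,NULL,NULL),(101,1,1),
 (102,1,NULL),(102,1,NULL);
""")

# MAX() ignores NULLs, so a group keeps its non-null value if any row has one.
rows = conn.execute(
    "SELECT id, MAX(v1), MAX(v2) FROM t GROUP BY id ORDER BY id"
).fetchall()
for r in rows:
    print(r)
```

ID 102 still comes back with a NULL V2, because every row in that group is NULL for V2.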
Related
I have data flowing from two tables, table A and table B. I'm doing an inner join on a common column from both the tables and creating two more new columns based on different conditions. Below is a sample dataset:
Table A
| Id | StartDate |
|-----|------------|
| 119 | 01-01-2018 |
| 120 | 01-02-2019 |
| 121 | 03-05-2018 |
| 123 | 05-08-2021 |
TABLE B
| Id | CodeId | Code | RedemptionDate |
|-----|--------|------|----------------|
| 119 | 1 | abc | null |
| 119 | 2 | abc | null |
| 119 | 3 | def | null |
| 119 | 4 | def | 2/3/2019 |
| 120 | 5 | ghi | 04/7/2018 |
| 120 | 6 | ghi | 4/5/2018 |
| 121 | 7 | jkl | null |
| 121 | 8 | jkl | 4/4/2019 |
| 121 | 9 | mno | 3/18/2020 |
| 123 | 10 | pqr | null |
What I'm basically doing is joining the tables on column 'Id' where StartDate is 2018 or later, and creating two new columns: 'Unlock', by counting CodeId where RedemptionDate is null, and 'Redeem', by counting CodeId where RedemptionDate is not null. Below is the SQL query:
WITH cte1 AS (
SELECT a.id, COUNT(b.CodeId) AS 'Unlock'
FROM TableA AS a
JOIN TableB AS b ON a.Id=b.Id
WHERE YEAR(a.StartDate) >= 2018 AND b.RedemptionDate IS NULL
GROUP BY a.id
), cte2 AS (
SELECT a.id, COUNT(b.CodeId) AS 'Redeem'
FROM TableA AS a
JOIN TableB AS b ON a.Id=b.Id
WHERE YEAR(a.StartDate) >= 2018 AND b.RedemptionDate IS NOT NULL
GROUP BY a.id
)
SELECT cte1.Id, cte1.[Unlock], cte2.Redeem
FROM cte1
FULL OUTER JOIN cte2 ON cte1.Id = cte2.Id
If I break down the output of this query, result from cte1 will look like below:
| Id | Unlock |
|-----|--------|
| 119 | 3 |
| 121 | 1 |
| 123 | 1 |
And from cte2 will look like below:
| Id | Redeem |
|-----|--------|
| 119 | 1 |
| 120 | 2 |
| 121 | 2 |
The last select query will produce the following result:
| Id | Unlock | Redeem |
|------|--------|--------|
| 119 | 3 | 1 |
| null | null | 2 |
| 121 | 1 | 2 |
| 123 | 1 | null |
How can I replace the null value from Id with values from 'b.Id'? If I try coalesce or a case statement, they create new columns. I don't want to create additional columns, rather replace the null values from the column values coming from another table.
My final output should look like:
| Id | Unlock | Redeem |
|-----|--------|--------|
| 119 | 3 | 1 |
| 120 | null | 2 |
| 121 | 1 | 2 |
| 123 | 1 | null |
If I'm following correctly, you can use apply with aggregation:
select a.*, b.*
from a cross apply
(select count(RedemptionDate) as num_redeemed,
count(*) - count(RedemptionDate) as num_unlock
from b
where b.id = a.id
) b;
However, the answer to your question is to use coalesce(cte1.id, cte2.id) as id.
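SQLite has no CROSS APPLY, but the same per-id counting can be sketched with correlated subqueries. This is a hypothetical demo with made-up table names mirroring the sample data; note the counts come back as 0 rather than NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (Id INTEGER, StartDate TEXT);
CREATE TABLE b (Id INTEGER, CodeId INTEGER, Code TEXT, RedemptionDate TEXT);
INSERT INTO a VALUES (119,'2018-01-01'),(120,'2019-02-01'),
                     (121,'2018-05-03'),(123,'2021-08-05');
INSERT INTO b VALUES
 (119,1,'abc',NULL),(119,2,'abc',NULL),(119,3,'def',NULL),(119,4,'def','2019-02-03'),
 (120,5,'ghi','2018-07-04'),(120,6,'ghi','2018-04-05'),
 (121,7,'jkl',NULL),(121,8,'jkl','2019-04-04'),(121,9,'mno','2020-03-18'),
 (123,10,'pqr',NULL);
""")

# COUNT(col) counts non-NULLs, so COUNT(*) - COUNT(RedemptionDate)
# counts the NULL (unredeemed) rows. All sample rows pass the 2018 filter,
# so it is omitted here for brevity.
rows = conn.execute("""
SELECT a.Id,
       (SELECT COUNT(*) - COUNT(RedemptionDate) FROM b WHERE b.Id = a.Id) AS unlock,
       (SELECT COUNT(RedemptionDate) FROM b WHERE b.Id = a.Id) AS redeem
FROM a
ORDER BY a.Id
""").fetchall()
for r in rows:
    print(r)
```

Every id from table A appears exactly once, so there is no NULL id to patch up afterwards.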
Suppose that I have a dataframe as:
| ID | Value | Time |
|---------|-------|------|
| 101 | 100 | 1 |
| 101 | 0 | 2 |
| 101 | 200 | 4 |
| 101 | 200 | 7 |
| 101 | 0 | 10 |
| 102 | 100 | 2 |
| 102 | 0 | 3 |
| 102 | 200 | 5 |
For each non-zero Value, I would like to find the next Time that Value=0 for the same ID. So my desired output will be
| ID | Value | Time | NextTime |
|---------|-------|------|----------|
| 101 | 100 | 1 | 2 |
| 101 | 0 | 2 | Null |
| 101 | 200 | 4 | 10 |
| 101 | 200 | 7 | 10 |
| 101 | 0 | 10 | Null |
| 102 | 100 | 2 | 3 |
| 102 | 0 | 3 | Null |
| 102 | 200 | 5 | Null |
I have tried to use the following subquery:
SELECT *, CASE WHEN Value=0 THEN NULL ELSE (SELECT MIN(Time) FROM Table1 sub
WHERE sub.ID = main.ID AND sub.Time > main.Time AND sub.Value=0) END as NextTime
FROM Table1 AS main
ORDER BY
ID,
Time
This query should work, but the problem is that I am working with an extremely large table (millions of records), so this query cannot finish in a reasonable time. Could anyone help with a more efficient way to get the desired result? Thanks.
You want a cumulative minimum:
select t.*,
min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
) as next_0_time
from t;
EDIT:
If you want values on the 0 rows to be NULL, then use a case expression:
select t.*,
(case when value <> 0
then min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
)
end) as next_0_time
from t;
Here is a db<>fiddle.
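As a rough, self-contained check of the second query, here is a hypothetical sqlite3 sketch; SQLite supports the same aggregate-window and frame syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, value INTEGER, time INTEGER);
INSERT INTO t VALUES
 (101,100,1),(101,0,2),(101,200,4),(101,200,7),(101,0,10),
 (102,100,2),(102,0,3),(102,200,5);
""")

# For each non-zero row, take the minimum time among strictly later rows
# (the frame starts at 1 FOLLOWING) where value = 0; zero rows get NULL.
rows = conn.execute("""
SELECT t.*,
       CASE WHEN value <> 0 THEN
            MIN(CASE WHEN value = 0 THEN time END) OVER
                (PARTITION BY id
                 ORDER BY time
                 ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
       END AS next_0_time
FROM t
ORDER BY id, time
""").fetchall()
for r in rows:
    print(r)
```

Unlike the correlated subquery, this is a single pass per partition, which is what makes it viable on millions of rows.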
I have this query for inventory balance and it works well:
Select A.BATCH_ID,
       A.QTY_MOV - IsNull(B.QTY_USED, 0) As BALANCE
From P_BATCH_PRODUC A
Left Outer Join (Select MATERIAL_ID,
                        BATCH_MATERIAL_ID,
                        SUM(QTY_INS) QTY_USED
                 From CONSUMPTION
                 Group By MATERIAL_ID, BATCH_MATERIAL_ID) As B
  On B.MATERIAL_ID = A.PRODUCT_ID
 And A.BATCH_ID = B.BATCH_MATERIAL_ID
Where A.QTY_MOV - IsNull(B.QTY_USED, 0) > 0
  And A.PRODUCT_ID = 1
  And A.BATCH_ID = 1
But now it's possible to have more than one A.QTY_MOV for each A.BATCH_ID, so I need to change A.QTY_MOV to SUM(A.QTY_MOV). What do I need to change for that?
Sample:
Table A
+------------+------------+---------+
| Product_ID | Batch_ID | Qty_Mov |
+------------+------------+---------+
| 1 | 1 | 100 |
| 1 | 1 | 150 |
| 2 | 1 | 80 |
| 1 | 3 | 100 |
| 1 | 4 | 100 |
+------------+------------+---------+
Table B
+-------------------+------------+----------+----------+
| BATCH_MATERIAL_ID | Product_ID | Batch_ID | Qty_USED |
+-------------------+------------+----------+----------+
| 1                 | 1          | 1        | 80       |
| 2                 | 1          | 1        | 10       |
| 3                 | 1          | 2        | 150      |
| 4                 | 1          | 3        | 80       |
+-------------------+------------+----------+----------+
This is what I want
Batch_ID BALANCE
---------- ---------------
1 160
Based strictly on the question, it sounds like you want a window function:
Select A.BATCH_ID ,
SUM(A.QTY_MOV) OVER (PARTITION BY A.BATCH_ID) - IsNull(B.QTY_USED,0) As BALANCE
I don't know if this does anything useful. If it does not, you should ask a new question with sample data and an explanation of logic.
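For what it's worth, here is a hypothetical sqlite3 sketch of the window sum on the sample Table A, filtered to PRODUCT_ID = 1 as in the query. Note the windowed total repeats on every movement row, so further aggregation would still be needed to get a single row per batch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (Product_ID INTEGER, Batch_ID INTEGER, Qty_Mov INTEGER);
INSERT INTO a VALUES (1,1,100),(1,1,150),(2,1,80),(1,3,100),(1,4,100);
""")

# SUM(...) OVER (PARTITION BY ...) attaches the batch total to each row
# without collapsing the rows, unlike GROUP BY.
rows = conn.execute("""
SELECT Batch_ID, Qty_Mov,
       SUM(Qty_Mov) OVER (PARTITION BY Batch_ID) AS total_mov
FROM a
WHERE Product_ID = 1
ORDER BY Batch_ID, Qty_Mov
""").fetchall()
for r in rows:
    print(r)
```

Batch 1 shows 250 on both of its rows (100 + 150), which is the total the desired balance of 160 would be computed from (250 - 90 used).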
+----+--------------+------+-------+--------+
| ID | KEY          | CODE | VALUE | ACTIVE |
+----+--------------+------+-------+--------+
| 1  | MIN_VAL_EMP  | 111  | 100   | Y      |
| 2  | MIN_VAL_MARR | 222  | 110   | Y      |
| 3  | MIN_VAL_FOOD | 0    | 10    | Y      |
| 4  | MAX_VAL_EMP  | 121  | 8000  | Y      |
| 5  | MAX_VAL_MARR | 0    | 20    | Y      |
| 6  | MAX_VAL_FOOD | 0    | 30    | Y      |
| 7  | MIN_VAL_EMP  | 0    | 80    | Y      |
+----+--------------+------+-------+--------+
I need to write a query:
If a row with my CODE value is present, fetch it; if not, fetch the KEY's row where CODE is 0.
Also, there should not be duplicate KEYs in the result: for each KEY, either the row with my CODE or the row with the default CODE (0) appears, never both.
All of these rules apply only to records with ACTIVE = 'Y'.
For example, if my CODE is 111, the result will be:
+----+--------------+------+-------+
| ID | KEY          | CODE | VALUE |
+----+--------------+------+-------+
| 1  | MIN_VAL_EMP  | 111  | 100   |
| 3  | MIN_VAL_FOOD | 0    | 10    |
| 5  | MAX_VAL_MARR | 0    | 20    |
| 6  | MAX_VAL_FOOD | 0    | 30    |
+----+--------------+------+-------+
Here, the row below will not be part of the result, since a row for the same KEY with CODE 111 is present:
| 7 | MIN_VAL_EMP | 0 | 80 |
You can use a WHERE ... IN (note MAX(code) prefers 111 over the default 0):
select distinct id, key, code, value
from my_table
where active = 'Y'
  and (key, code) in (select key, max(code)
                      from my_table
                      where (code = 111 or code = 0) and active = 'Y'
                      group by key)
You can use the following query:
SELECT ID, KEY, CODE, VALUE
FROM (
SELECT ID, KEY, CODE, VALUE,
ROW_NUMBER() OVER (PARTITION BY KEY
ORDER BY CASE
WHEN CODE = 111 THEN 1
ELSE 2
END) rn
FROM mytable
WHERE ACTIVE = 'Y' AND CODE IN (111, 0)) t
WHERE t.rn = 1
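A hypothetical sqlite3 sketch of this ROW_NUMBER() approach, preferring rows whose CODE is 111 within each KEY (identifiers are double-quoted since KEY is a keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, "key" TEXT, code INTEGER,
                      value INTEGER, active TEXT);
INSERT INTO mytable VALUES
 (1,'MIN_VAL_EMP',111,100,'Y'),(2,'MIN_VAL_MARR',222,110,'Y'),
 (3,'MIN_VAL_FOOD',0,10,'Y'),(4,'MAX_VAL_EMP',121,8000,'Y'),
 (5,'MAX_VAL_MARR',0,20,'Y'),(6,'MAX_VAL_FOOD',0,30,'Y'),
 (7,'MIN_VAL_EMP',0,80,'Y');
""")

# Within each KEY, rank the CODE=111 row first; rn = 1 then keeps
# the specific row when it exists and the default CODE=0 row otherwise.
rows = conn.execute("""
SELECT id, "key", code, value
FROM (SELECT id, "key", code, value,
             ROW_NUMBER() OVER (PARTITION BY "key"
                                ORDER BY CASE WHEN code = 111 THEN 1
                                              ELSE 2 END) AS rn
      FROM mytable
      WHERE active = 'Y' AND code IN (111, 0)) t
WHERE rn = 1
ORDER BY id
""").fetchall()
for r in rows:
    print(r)
```

Row 7 (MIN_VAL_EMP with the default CODE 0) is ranked second in its partition and dropped, matching the expected output.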
select t.id, t.key, t.code, t.value from my_table t
where t.code = 111 and t.active = 'Y'
union
select t.id, t.key, t.code, t.value from my_table t
where t.code = 0 and t.active = 'Y' and t.key not in
(
    select t2.key from my_table t2
    where t2.code = 111 and t2.active = 'Y'
)
This should work
I have huge data, and a sample of the table looks like below:
+-----------+------------+-----------+-----------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+-----------+
| 1 | 6/3/2014 | 1 | 6/3/2014 |
| 1 | 5/22/2015 | 2 | NULL |
| 1 | 6/3/2015 | 3 | NULL |
| 1 | 11/20/2015 | 4 | NULL |
| 2 | 2/25/2014 | 1 | 2/25/2014 |
| 2 | 7/31/2014 | 2 | NULL |
| 2 | 8/26/2014 | 3 | NULL |
+-----------+------------+-----------+-----------+
Now I need to check the difference between the Date in the 2nd row and the Flag_Date in the 1st row. If the difference is more than 180 days, the 2nd row's Flag_Date should be updated with that row's Date; otherwise it should be updated with the 1st row's Flag_Date. The same rule follows for all rows with the same Unique_ID.
update a
set a.Flag_Date=case when DATEDIFF(dd,b.Flag_Date,a.[Date])>180 then a.[Date] else b.Flag_Date end
from Table1 a
inner join Table1 b
on a.RowNumber=b.RowNumber+1 and a.Unique_ID=b.Unique_ID
When the above update query is executed once, only the second row under each Unique_ID gets updated, and the result looks like below:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | NULL |
| 1 | 2015-11-20 | 4 | NULL |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | NULL |
+-----------+------------+-----------+------------+
I need to run it four times to achieve my desired result:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | 2015-05-22 |
| 1 | 2015-11-20 | 4 | 2015-11-20 |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | 2014-08-26 |
+-----------+------------+-----------+------------+
Is there a way I can run the update only once so that all the rows are updated?
Thank you!
If you are using SQL Server 2012+, then you can use lag():
with toupdate as (
select t1.*,
lag(flag_date) over (partition by unique_id order by rownumber) as prev_flag_date
from table1 t1
)
update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end);
Both this version and your version can take advantage of an index on table1(unique_id, rownumber) or, better yet, table1(unique_id, rownumber, flag_date).
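The lag() computation itself can be sketched in sqlite3 (the updatable-CTE syntax is SQL Server-specific, so this hypothetical demo only shows the prev_flag_date column the update would key off):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (unique_id INTEGER, date TEXT,
                     rownumber INTEGER, flag_date TEXT);
INSERT INTO table1 VALUES
 (1,'2014-06-03',1,'2014-06-03'),(1,'2015-05-22',2,NULL),
 (1,'2015-06-03',3,NULL),(1,'2015-11-20',4,NULL),
 (2,'2014-02-25',1,'2014-02-25'),(2,'2014-07-31',2,NULL),
 (2,'2014-08-26',3,NULL);
""")

# LAG() pulls the previous row's flag_date within each unique_id,
# replacing the self-join on rownumber + 1.
rows = conn.execute("""
SELECT unique_id, rownumber, date,
       LAG(flag_date) OVER (PARTITION BY unique_id
                            ORDER BY rownumber) AS prev_flag_date
FROM table1
ORDER BY unique_id, rownumber
""").fetchall()
for r in rows:
    print(r)
```

As with the original self-join, lag() sees the stored flag_date values, so rows whose predecessor is still NULL get a NULL prev_flag_date in a single pass.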
EDIT:
In earlier versions, this might have better performance:
with toupdate as (
select t1.*, t2.flag_date as prev_flag_date
from table1 t1 outer apply
(select top 1 t2.flag_date
from table1 t2
where t2.unique_id = t1.unique_id and
t2.rownumber < t1.rownumber
order by t2.rownumber desc
) t2
)
update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end);
The CTE can make use of the same index -- and it is important to have the index. The reason for the better performance is that your original join on RowNumber + 1 cannot make effective use of an index on that column.