Filling in missing data in Snowflake

I have a table in Snowflake like this:
TIME USER ITEM
1 frank 1
2 frank 0
3 frank 0
4 frank 0
5 frank 2
6 alf 5
7 alf 0
8 alf 6
9 alf 0
10 alf 9
I want to be able to replace all the zeroes with the next non-zero value, so in the end I have a table like this:
TIME USER ITEM
1 frank 1
2 frank 2
3 frank 2
4 frank 2
5 frank 2
6 alf 5
7 alf 6
8 alf 6
9 alf 9
10 alf 9
How would I write a query that does that in Snowflake?

You can use the conditional_change_event function for this (see the Snowflake documentation):
with base_table as (
select
t1.*,
conditional_change_event(item) over (order by time desc) event_num
from test_table t1
order by time desc
)
select
t1.time,
t1.user,
t1.item old_item,
coalesce(t2.item, t1.item) new_item
from base_table t1
left join base_table t2 on t1.event_num = t2.event_num + 1 and t1.item = 0
order by t1.time asc
Above SQL Results:
+----+-----+--------+--------+
|TIME|USER |OLD_ITEM|NEW_ITEM|
+----+-----+--------+--------+
|1   |frank|1       |1       |
|2   |frank|0       |2       |
|3   |frank|0       |2       |
|4   |frank|0       |2       |
|5   |frank|2       |2       |
|6   |alf  |5       |5       |
|7   |alf  |0       |6       |
|8   |alf  |6       |6       |
|9   |alf  |0       |9       |
|10  |alf  |9       |9       |
+----+-----+--------+--------+

You can use lead(ignore nulls):
select t.*,
(case when item = 0
then lead(nullif(item, 0) ignore nulls) over (partition by user order by time)
else item
end) as imputed_item
from t;
You can also phrase this using last_value():
select t.*,
last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc)
from t;

If you want to use first_value() or last_value() in Snowflake, keep in mind that Snowflake's default window frame for these functions differs from the ANSI standard, as described in the Snowflake documentation. The ANSI default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. In Snowflake you have to write a cumulative frame explicitly, because otherwise the default is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. That is why the LAST_VALUE example from the previous answer does not work correctly. Here is one example that works:
select t.*,
last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc rows between unbounded preceding and current row)
from t;
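
To see why the frame matters, here is a small pure-Python emulation (not Snowflake code) of last_value(... ignore nulls) over the two frames discussed above. The function names and the "running"/"full" labels are made up for this sketch. It uses the sample data from the question and assumes the same partitioning by user and ordering by time descending.

```python
# Sample rows from the question: (time, user, item)
rows = [
    (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0),
    (5, "frank", 2), (6, "alf", 5), (7, "alf", 0), (8, "alf", 6),
    (9, "alf", 0), (10, "alf", 9),
]


def last_value_ignore_nulls(values, frame):
    """Emulate LAST_VALUE(x IGNORE NULLS) over already-ordered values.

    frame == "running": ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    frame == "full":    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    """
    out = []
    for i in range(len(values)):
        window = values[: i + 1] if frame == "running" else values
        non_null = [v for v in window if v is not None]
        out.append(non_null[-1] if non_null else None)
    return out


def impute(rows, frame):
    """Return the filled items ordered by time, for the given frame."""
    by_user = {}
    for time, user, item in rows:
        by_user.setdefault(user, []).append((time, item))  # PARTITION BY user

    filled_by_time = {}
    for pairs in by_user.values():
        pairs.sort(key=lambda p: p[0], reverse=True)  # ORDER BY time DESC
        values = [item if item != 0 else None for _, item in pairs]  # NULLIF(item, 0)
        for (time, _), filled in zip(pairs, last_value_ignore_nulls(values, frame)):
            filled_by_time[time] = filled
    return [filled_by_time[t] for t in sorted(filled_by_time)]


running = impute(rows, "running")
full = impute(rows, "full")
print(running)
print(full)
```

With the running frame each zero picks up the next non-zero item for the same user. With the full frame every row in a partition gets the same value, namely the non-null item with the smallest time.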

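Nothing wrong with the above solutions, but here's a different approach that I think is simpler: split the rows into good (non-zero) and bad (zero), then give each bad row an item from the later good rows of the same user. Note that min(good.item) returns the smallest later item, which equals the next non-zero item only when items increase over time, as they do in the sample data.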
select * from good
union all
select
bad.time
,bad.user
,min(good.item)
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
Full COPY|PASTE|RUN SQL:
with cte as (
select * from (
select 1 time, 'frank' user , 1 item union
select 2 time, 'frank' user , 0 item union
select 3 time, 'frank' user , 0 item union
select 4 time, 'frank' user , 0 item union
select 5 time, 'frank' user , 2 item union
select 6 time, 'alf' user , 5 item union
select 7 time, 'alf' user , 0 item union
select 8 time, 'alf' user , 6 item union
select 9 time, 'alf' user , 0 item union
select 10 time, 'alf' user , 9) )
, good as (select * from cte where item<> 0)
, bad as (select * from cte where item= 0)
select * from good
union all
select
bad.time
,bad.user
,min(good.item )
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
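
The difference between "smallest later item" and "next later item" shows up as soon as items do not increase over time. The sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake. The three rows are hypothetical, and the columns are renamed to ts and usr for this example only. It compares the min() join from this answer with a correlated subquery that takes the item at the earliest later time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts INTEGER, usr TEXT, item INTEGER)")
# Hypothetical data: the next non-zero item after ts=1 is 7, the smallest is 3.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "frank", 0), (2, "frank", 7), (3, "frank", 3)],
)

# Same idea as the answer: smallest item among later non-zero rows.
min_based = conn.execute("""
    SELECT bad.ts, MIN(good.item)
    FROM t AS bad
    LEFT JOIN t AS good
      ON good.usr = bad.usr AND good.ts > bad.ts AND good.item <> 0
    WHERE bad.item = 0
    GROUP BY bad.ts
""").fetchall()

# Alternative: item of the earliest later non-zero row.
next_based = conn.execute("""
    SELECT bad.ts,
           (SELECT good.item
            FROM t AS good
            WHERE good.usr = bad.usr AND good.ts > bad.ts AND good.item <> 0
            ORDER BY good.ts
            LIMIT 1)
    FROM t AS bad
    WHERE bad.item = 0
    ORDER BY bad.ts
""").fetchall()

print(min_based)
print(next_based)
```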

Related

Over Partition for set column SQL

I have this table. I need to set the ID column to 1 for the row with the maximum value of minutes within each Register, and to 0 for the rest.
Initial table:
Register |minutes | ID
10 |5 | 0
10 |6 | 0
10 |0 | 0
12 |3 | 0
12 |0 | 0
12 |4 | 0
Final table:
Register |minutes | ID
10 |5 | 0
10 |6 | 1
10 |0 | 0
12 |3 | 0
12 |0 | 0
12 |4 | 1
Any idea how to do this using OVER (PARTITION BY ...)?
UPDATE A
SET ID = 1
FROM
(
Select top 1 row_number() over (PARTITION BY minutes
order by minutes asc) AS column,*
from table
)A
WHERE A.column=1

You can use row_number() in an updatable CTE:
with m as (
select *,
row_number() over(partition by register order by minutes desc) rn
from t
)
update m set id=1 where rn=1

Does this do what you want?
DECLARE @max INT
SELECT TOP 1
@max = Minutes
FROM YourTable
ORDER BY Minutes DESC
UPDATE YourTable
SET ID = CASE
WHEN Minutes = @max
THEN 1
ELSE 0
END

If you don’t want to use CTE or variable tables:
UPDATE A
SET A.ID = CASE
WHEN B.RowNumber = 1
THEN 1
ELSE 0
END
FROM table A
JOIN (
SELECT *, row_number() over (PARTITION BY Register
order by minutes DESC) AS RowNumber
FROM table
) B ON A.Register = B.Register AND A.minutes = B.minutes

I have left my previous answer in place as it represents the answer to the question as it was at the time of answering. Given the new information added to the question, the update query would be as below:
;WITH MyCTE AS
(
SELECT Register,
Minutes,
ID,
ROW_NUMBER() OVER (PARTITION BY Register ORDER BY Minutes DESC) RowN
FROM YourTable
)
UPDATE MyCTE
SET ID = CASE
WHEN RowN = 1 THEN 1
ELSE 0
END
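
For experimenting outside SQL Server, the same row_number() idea can be sketched with Python's built-in sqlite3 module. This is only an approximation of the answers above: as far as I know SQLite cannot update through a CTE, so the sketch updates by rowid instead, and it assumes a SQLite build with window functions (3.25 or newer). The table name t matches the earlier answer; the alias rid is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (register INTEGER, minutes INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, 0)",
    [(10, 5), (10, 6), (10, 0), (12, 3), (12, 0), (12, 4)],
)

# Rank rows inside each register by minutes (highest first),
# then set id = 1 for the top-ranked row and 0 for all others.
conn.execute("""
    UPDATE t
    SET id = CASE WHEN rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY register
                                      ORDER BY minutes DESC) AS rn
            FROM t
        )
        WHERE rn = 1
    ) THEN 1 ELSE 0 END
""")

flagged = conn.execute(
    "SELECT register, minutes, id FROM t ORDER BY rowid"
).fetchall()
print(flagged)
```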

SQL select top five most recent row and distinct by a specific column

Say I have a table named appModelFlat like the one below, only with a few hundred more rows. It does not have a date field, but I want to find the five most recently created environments (EnvName). There are only 14 possible environments. I want to select the five most recently inserted rows that have different EnvName values. That is, I want the 5 most recent rows with distinct EnvName (although DISTINCT doesn't work this way), and I know which rows are most recent by their id: the higher the id, the newer the row. Any help with this query would be appreciated.
id|AppName|EnvName|ServerTypeName|ServerId|OS |OSVersion|CPU|Memory|ExtraStorage|MachineDesc |
----------------------------------------------------------------------------------------------------
1 |ASB |DEV |App |1 |Windows|7 |4 |4 |100 |ASB-DEV-App |
----------------------------------------------------------------------------------------------------
5 |AMS |DEV |APP |2 |RedHat |7.2 |4 |4 |50 |AMS-DEV-App |
----------------------------------------------------------------------------------------------------
6 |SPB |TST |App |1 |Windows|7 |2 |8 |50 |SPB-TST-App |
----------------------------------------------------------------------------------------------------
7 |SBI |TST |Oracle |1 |Solaris|11 |4 |8 |100 |SBI-TST-Oracle|
----------------------------------------------------------------------------------------------------
Here is my first attempt although I'm not sure if it is right. It does give me five results.
SELECT DISTINCT top 5 [ID] = ( SELECT TOP 1 [ID] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[AppName]= ( SELECT TOP 1 [AppName] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[EnvName]
,[ServerTypeName] = ( SELECT TOP 1 [ServerTypeName] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[ServerId] = ( SELECT TOP 1 [ServerId] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[OS] = ( SELECT TOP 1 [OS] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
FROM [AppModelFlat] X order by id desc
edit:
For the expected result: let's say I only wanted to select the top 2, since I only gave 5 entries here. I would want to get back the following:
5 |AMS |DEV |APP |2 |RedHat |7.2 |4 |4 |50 |AMS-DEV-App |
----------------------------------------------------------------------------------------------------
7 |SBI |TST |Oracle |1 |Solaris|11 |4 |8 |100 |SBI-TST-Oracle|
Because I get only one row per EnvName, and each row has the highest id for its EnvName.

Use row_number() to get the latest row for each EnvName, then take only the top 5 ordered by id descending:
select top 5 *
from (
select *
, rn = row_number() over (partition by EnvName order by id desc)
from appModelFlat
) s
where rn = 1
order by id desc
top with ties version:
select top 5 *
from (
select top 1 with ties *
from appModelFlat
order by row_number() over (partition by EnvName order by id desc)
) s
order by id desc

A simple subquery would also do the trick:
SELECT TOP 5 Records.Id, AppName, Records.EnvName, ServerTypeName, ServerId, OS
FROM AppModelFlat Records
INNER JOIN (SELECT EnvName,
MAX(Id) AS Id
FROM AppModelFlat
GROUP BY EnvName) Latest ON Records.Id = Latest.Id
ORDER BY Records.Id DESC
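
The "latest row per EnvName, then the newest few" logic can be checked with Python's built-in sqlite3 module. This is a sketch rather than SQL Server code: SQLite uses LIMIT where T-SQL uses TOP, only three of the columns are kept, and a SQLite build with window functions (3.25 or newer) is assumed. The rows are the four shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appModelFlat (id INTEGER, AppName TEXT, EnvName TEXT)")
conn.executemany(
    "INSERT INTO appModelFlat VALUES (?, ?, ?)",
    [(1, "ASB", "DEV"), (5, "AMS", "DEV"), (6, "SPB", "TST"), (7, "SBI", "TST")],
)

# rn = 1 marks the row with the highest id inside each EnvName;
# the outer query keeps those rows and returns the newest five.
latest = conn.execute("""
    SELECT id, AppName, EnvName
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY EnvName ORDER BY id DESC) AS rn
        FROM appModelFlat
    )
    WHERE rn = 1
    ORDER BY id DESC
    LIMIT 5
""").fetchall()
print(latest)
```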

Removing duplicate results

I have a view with some records, many of which are duplicated. I need to filter the records and get only one of each.
I've tried
SELECT TOP 1 Item, Code, Desc, '1' AS Qty FROM vwTbl1 WHERE Code = '12' OR Code = '311'
but in this case it shows me only one record. I also tried DISTINCT, but I still get all records. Grouping by Code doesn't work.
Is there any other way to solve this?
Item | Code | Desc | QTY
a | 12 | 1 |1
a | 311 | 2 |1
b | 12 | 3 |1
b | 311 | 4 |1
c | 1 | 5 |1
The result should look like:
Item | Code | Desc | QTY
a | 12 | 1 |1
b | 311 | 3 |1
So for each criterion, get the first record.

The typical way of doing this uses row_number():
SELECT Item, Code, [Desc], 1 AS Qty
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY (SELECT NULL)) AS seqnum
FROM vwTbl1 t
WHERE Code IN ('12', '311') -- don't use single quotes if these are numbers
) v
WHERE seqnum = 1;

SELECT Top 1 *
FROM
(
SELECT Item, Code, Desc, '1' AS Qty
FROM vwTbl1 WHERE Code = '12' OR Code ='311'
)A
Edited Code based on your expected result:
Declare @YourTable table (Id INT IDENTITY(1,1),Item varchar(50),Code INT,
_Desc INT,Qty INT)
Insert into @YourTable
SELECT 'a',12,1,1 UNION ALL
SELECT 'a',311,2,1 UNION ALL
SELECT 'b',12,3,1 UNION ALL
SELECT 'b',311,4,1 UNION ALL
SELECT 'c',1 ,5 ,1
SELECT Item ,A.Code , _Desc ,Qty
FROM @YourTable T
JOIN
(
SELECT MAX(Id) Id, Code FROM @YourTable GROUP BY Code
)A ON A.Id = T.Id

Sum rows of two tables

I have a Database with two tables:
subscriber_mm
uid_local | uid_foreign | uid_partner
7 |2 |0
7 |4 |0
2 |1 |0
2 |2 |0
5 |1 |0
5 |3 |0
partner_mm
uid_local | uid_foreign | uid_partner
7 |1 |1
My goal is to count the total number of rows by uid_local from both tables
example:
count both tables by uid_local = 7
result: 3
example:
count both tables by uid_local = 2
result: 2
This is my solution (not the best) without the WHERE statement
SELECT sum(
ROWS ) AS total_rows
FROM (
SELECT count( * ) AS ROWS
FROM partner_mm
UNION ALL
SELECT count( * ) AS ROWS
FROM subscriber_mm
) AS u
How can I implement the WHERE clause?

Replace 2 with the uid_local value you want in this query:
select sum(total_count ) as total_count from
(select count(*) as total_count from subscriber_mm s where s.uid_local=2
union all
select count(*) as total_count from partner_mm m where m.uid_local=2) as a
Or, to get the counts for every uid_local at once:
select a.uid_local,sum(total_count ) as total_count from
(select s.uid_local as uid_local, count(*) as total_count from subscriber_mm s group by s.uid_local
union all
select m.uid_local as uid_local, count(*) as total_count from partner_mm m group by m.uid_local) as a
group by a.uid_local

Please try this:
select uid_local,uid_foreign,uid_partner from subscriber_mm
union
select uid_local,uid_foreign,uid_partner from partner_mm
You can use UNION to combine the rows of the two tables.

There is no need for SUM, since you only need to count the total number of rows returned from both tables for a particular uid_local. You can get the total with the UNION ALL operator, which combines the result sets of both tables without removing duplicate rows:
Select count(*) as Result From
(
Select * from subscriber_mm
where uid_local=7
union all
Select * from partner_mm
where uid_local=7
)as tmp
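
The UNION ALL count can be tried with Python's built-in sqlite3 module. This is a minimal sketch using the sample rows from the question. The helper name count_rows is made up for the example, and the queries select only uid_local rather than every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("subscriber_mm", "partner_mm"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(uid_local INTEGER, uid_foreign INTEGER, uid_partner INTEGER)"
    )
conn.executemany(
    "INSERT INTO subscriber_mm VALUES (?, ?, ?)",
    [(7, 2, 0), (7, 4, 0), (2, 1, 0), (2, 2, 0), (5, 1, 0), (5, 3, 0)],
)
conn.execute("INSERT INTO partner_mm VALUES (7, 1, 1)")


def count_rows(conn, uid_local):
    """Count rows with the given uid_local across both tables."""
    query = """
        SELECT COUNT(*) FROM (
            SELECT uid_local FROM subscriber_mm WHERE uid_local = ?
            UNION ALL
            SELECT uid_local FROM partner_mm WHERE uid_local = ?
        )
    """
    return conn.execute(query, (uid_local, uid_local)).fetchone()[0]


print(count_rows(conn, 7))
print(count_rows(conn, 2))
```

UNION ALL matters here: a plain UNION would collapse identical rows before counting them.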

SQL CTE compare rows in the same table

I have a table with customers from different data sources. There are SSN, License#, and some unique IDs but not all sources have the same IDs. I would like to compare the records on the ID columns (SSN, License, SystemID) and assign a mapped ID if same person found.
I am assuming I can use CTE but not sure where to start. Still trying to learn my way in SQL. Any help will be appreciated. Thanks.
This is how the table looks:
Source|RowID|SSN |License|SystemID
A |1 |SSN1|Lic111 |
A |2 | | |Sys666
B |3 |SSN2| |Sys777
C |4 |SSN1| |
D |5 | |Lic999 |
D |6 | |Lic333 |Sys666
E |7 | | |Sys777
Results (added MapCustomerID)
Source|RowID|SSN |License|SystemID|MapCustomerID
A |1 |SSN1|Lic111 | |1
A |2 | | |Sys666 |2
B |3 |SSN2| |Sys777 |3
C |4 |SSN1| | |1
D |5 | |Lic999 | |4
D |6 | |Lic333 |Sys666 |2
E |7 | | |Sys777 |3

Here is what may be a "good-enough" approach to the problem.
Along each of the three dimensions, find the minimum row id for that dimension (with special handling of NULLs). The overall customer identifier is then the minimum of these three ids. To make it sequential with no gaps, use dense_rank().
with ids as (
select t.*,
(case when SSN is not null
then min(RowId) over (partition by SSN)
end) as SSN_id,
(case when License is not null
then min(RowId) over (partition by License)
end) as License_id,
(case when SystemId is not null
then min(RowId) over (partition by SystemId)
end)as SystemId_id
from t
),
leastid as (
select ids.*,
(case when SSN_Id <= coalesce(License_Id, SSN_Id) and
SSN_Id <= coalesce(SystemId_id, SSN_Id)
then SSN_Id
when License_Id <= coalesce(SystemId_id, License_Id)
then License_Id
else SystemId_id
end) as LeastId
from ids
)
select Source, RowID, SSN, License, SystemID,
dense_rank() over (order by LeastId) as MapCustomerId
from leastid;
This is not a complete solution, but it works for your data. It does not work in the following case:
A |1 |SSN1|Lic111 | |1
A |2 |SSN1| |Sys666 |2
A |3 | | |Sys666 |2
Because this requires two "hops".
When I have faced this situation in the past, I have created the extra column in the table and repeatedly used update to get the minimum id over the different dimensions. Such iteration quickly connects the different pieces. It is probably possible to write a recursive CTE to do the same thing. But, the simpler solution above may solve your problem.
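
The repeated-update idea can be sketched in plain Python: keep lowering each row's group id to the smallest group id among rows that share an SSN, License, or SystemId, and stop when a full pass changes nothing. This is an illustration of the iteration described above, not the author's actual update statements. The function name map_customers is made up, and the rows are the three from the two-hop example.

```python
# (RowID, SSN, License, SystemID); None stands for a missing value.
rows = [
    (1, "SSN1", "Lic111", None),
    (2, "SSN1", None, "Sys666"),
    (3, None, None, "Sys666"),
]


def map_customers(rows):
    """Propagate the minimum row id across shared keys until stable."""
    group = {row[0]: row[0] for row in rows}  # start: every row is its own group
    changed = True
    while changed:
        changed = False
        for col in (1, 2, 3):  # SSN, License, SystemID
            # Smallest current group id seen for each non-missing key value.
            smallest = {}
            for row in rows:
                key = row[col]
                if key is not None:
                    current = group[row[0]]
                    smallest[key] = min(smallest.get(key, current), current)
            # Pull every row sharing that key down to the smallest group id.
            for row in rows:
                key = row[col]
                if key is not None and smallest[key] < group[row[0]]:
                    group[row[0]] = smallest[key]
                    changed = True
    # Renumber the surviving group ids 1, 2, 3, ... like dense_rank().
    ranks = {g: i for i, g in enumerate(sorted(set(group.values())), start=1)}
    return {row_id: ranks[g] for row_id, g in group.items()}


mapping = map_customers(rows)
print(mapping)
```

Row 3 reaches row 1 only through row 2, which is the two-hop case the single-pass query misses.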
EDIT:
Because I've faced this problem before, I wanted to come up with a single query solution (rather than iterating through updates). This is possible using recursive CTEs. Here is code that seems to work:
with t as (
select 'A' as source, 1 as RowId, 'SSN1' as SSN, 'Lic111' as License, 'ABC' as SystemId union all
select 'A', 2, 'SSN1', NULL, 'Sys666' union all
select 'A', 3, NULL, NULL, 'Sys666' union all
select 'A', 4, NULL, 'Lic222', 'Sys666' union all
select 'A', 5, NULL, 'Lic222', NULL union all
select 'A', 6, NULL, 'Lic444', NULL
),
first as (
select t.*,
(select min(RowId)
from t t2
where t2.SSN = t.SSN or
t2.License = t.License or
t2.SystemId = t.SystemId
) as minrowid
from t
),
cte as (
select rowid, minrowid
from first
union all
select cte.rowid, first.minrowid
from cte join
first
on cte.minrowid = first.rowid and
cte.minrowid > first.minrowid
),
lookup as (
select rowid, min(minrowid) as minrowid,
dense_rank() over (order by min(minrowid)) as MapCustomerId
from cte
group by rowid
)
select t.*, lookup.MapCustomerId
from t join
lookup
on t.rowid = lookup.rowid;
