ROW_NUMBER() Set Null to anything beside Min - sql

So I have this basic setup:
Declare @temp Table(t1 varchar(1)
,t2 int)
insert into @temp (t1,t2)
Values
('a','1')
,('a','2')
,('a','3')
,('a','4')
,('a',null)
select t1,t2,ROW_NUMBER() OVER ( PARTITION BY T1 ORDER BY t2) 'rnk'
from @temp
The problem is that the NULL value gets ranked first. What I am trying to do is give the first non-zero, non-null value the highest rank (lowest number). Current output is:
t1 t2 rnk
a NULL 1
a 0 2
a 1 3
a 2 4
a 3 5
I want
t1 t2 rnk
a NULL 4/5 --either or
a 0 4/5
a 1 1
a 2 2
a 3 3
I know I can do this with subqueries, but the problem is that the expression that produces t2 is a 200-character CASE statement, which I really don't want to copy and paste all over: once to calculate it, once to order by it, and so on. What I am picturing is a query to get the values, inside a query to get the rank, inside a query to pull only the rows ranked 1, which is three levels deep, and I don't like that. Note: I know the tag says Oracle, and I am sure at least one person will mark me down since this example is in SQL Server, BUT the actual code is in Oracle. I am just much better in SQL Server, and it's easy to translate unless Oracle has some magic function that makes this easier.

You can use two keys for the order by. The following is compatible with both SQL Server and Oracle:
select t1, t2,
ROW_NUMBER() OVER (PARTITION BY T1
ORDER BY (CASE WHEN t2 IS NOT NULL AND t2 <> 0 THEN 0 ELSE 1 END),
t2
) as rnk
from @temp;
Oracle supports NULLS LAST, which makes it easier: ORDER BY t2 NULLS LAST.
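A minimal sketch of the two-key ordering, run through Python's sqlite3 module so it can be tried without a server. SQLite (3.25 or later, for window functions) stands in for SQL Server/Oracle here, and the table name temp_t is an assumption. The CASE condition uses AND so that both NULL and zero land in the last group.

```python
import sqlite3

# SQLite is only a stand-in engine; temp_t is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_t (t1 TEXT, t2 INTEGER)")
conn.executemany(
    "INSERT INTO temp_t (t1, t2) VALUES (?, ?)",
    [("a", None), ("a", 0), ("a", 1), ("a", 2), ("a", 3)],
)

# First sort key: 0 for usable values, 1 for NULL or zero.
# Second sort key: t2 itself, so usable values keep their natural order.
rows = conn.execute(
    """
    SELECT t1, t2,
           ROW_NUMBER() OVER (
               PARTITION BY t1
               ORDER BY CASE WHEN t2 IS NOT NULL AND t2 <> 0 THEN 0 ELSE 1 END,
                        t2
           ) AS rnk
    FROM temp_t
    ORDER BY rnk
    """
).fetchall()

for row in rows:
    print(row)
```

The values 1, 2 and 3 receive ranks 1 to 3; NULL and 0 share the last two ranks, in whichever order the engine sorts NULLs.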

Another option would be to use the ISNULL function to set it to the max value of the type on null values. So if t2 is an integer it would be:
select t1,t2,ROW_NUMBER() OVER ( PARTITION BY T1 ORDER BY ISNULL(t2,2147483647)) 'rnk'
from @temp
This would prevent you from having to use t2 (your big case statement) in your expression more than once.
I believe Oracle uses NVL instead of ISNULL.
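The sentinel idea can be sketched the same way. This assumes SQLite as a stand-in engine and a hypothetical table temp_t; COALESCE plays the role of ISNULL/NVL. Note that this only moves NULLs to the end: a zero would still sort first.

```python
import sqlite3

# SQLite stand-in; temp_t is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_t (t1 TEXT, t2 INTEGER)")
conn.executemany(
    "INSERT INTO temp_t (t1, t2) VALUES (?, ?)",
    [("a", None), ("a", 1), ("a", 2), ("a", 3)],
)

# Replace NULL with the largest 32-bit integer so it sorts after every real value.
sentinel_rows = conn.execute(
    """
    SELECT t2,
           ROW_NUMBER() OVER (
               PARTITION BY t1
               ORDER BY COALESCE(t2, 2147483647)
           ) AS rnk
    FROM temp_t
    ORDER BY rnk
    """
).fetchall()

print(sentinel_rows)
```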

Related

How to write a hive sql that insert into table2 if there's a specific row in table1

I am facing a Hive problem.
I will get a 0 or 1 back from the SQL
"select count(*) from table1 where ..."
If the result is 1, then I will execute the SQL
"Insert Into table2 partition(d) (select xxxx from table1 where ...
group by t)"
Otherwise do nothing.
My question is how I can combine these two statements into one. I am only allowed to write a single long SQL statement.
I tried to put the first query into the where condition of the second, but it threw an error saying that operating on table1 in the subquery is not supported (I can't remember it exactly, but something like this).
It sounds like a very easy question for experienced programmers, but I only started learning Hive 2 days ago.
If the select in an insert overwrite table partition does not return any rows, nothing is overwritten.
So just calculate your count in the same dataset and filter by it. Use an analytic function if you want to aggregate data at a different level before the insert:
Insert Into table2 partition(d)
select col1, col2, sum(col3), etc, etc, partition_col
from
(
select --some columns here,
--Assign the same count to all rows
count(case when your_boolean_condition then 1 else null end) over () as cnt
from table1
) s
where cnt=1 --If this is not satisfied, no overwrite will happen
AND more_conditions
group by ...
Another possible approach is to cross join with your count:
insert Into table2 partition(d)
select xxxx ... ... ..., partition_column as d
from
(
select t.*, c.cnt
from table1 t cross join (select count(*) cnt from table1 where condition) c
)s
where cnt=1 <and another_condition>
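The cross-join variant can be tried out on a small scale. The sketch below uses SQLite instead of Hive, so there is no partition clause, and the columns val and flag and the condition flag = 1 are made up for illustration. The helper conditional_insert is hypothetical as well.

```python
import sqlite3

def conditional_insert(source_rows):
    """Load table1, run the guarded INSERT ... SELECT, return table2's contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (t TEXT, val INTEGER, flag INTEGER)")
    conn.execute("CREATE TABLE table2 (t TEXT, total INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", source_rows)
    # The one-row count subquery is attached to every row of table1,
    # so filtering on cnt = 1 keeps either all rows or none.
    conn.execute(
        """
        INSERT INTO table2 (t, total)
        SELECT s.t, SUM(s.val)
        FROM (
            SELECT t1.t, t1.val, c.cnt
            FROM table1 t1
            CROSS JOIN (SELECT COUNT(*) AS cnt FROM table1 WHERE flag = 1) c
        ) s
        WHERE s.cnt = 1
        GROUP BY s.t
        """
    )
    return conn.execute("SELECT t, total FROM table2 ORDER BY t").fetchall()

# Exactly one row matches the condition: the insert happens.
inserted = conditional_insert([("x", 10, 1), ("x", 5, 0), ("y", 7, 0)])
# No row matches the condition: nothing is inserted.
skipped = conditional_insert([("x", 10, 0), ("y", 7, 0)])
print(inserted, skipped)
```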

SQL Fill blank rows with the first data per group

This is the result set of my raw data.
What I want is to fill each blank row in the name column
with the first name per group.
In this example, rowid 1881 and 1879 should be filled with
MAR ROXAS, 1881-1887 with RODRIGO DUTERTE, and so on.
The ideal way is lag(ignore nulls), but SQL Server doesn't support that. Instead, you can use two levels of window functions:
select max(name) over (partition by name_rowid) as new_name
from (select t.*,
max(case when name is not null then rowid end) over (order by rowid) as name_rowid
from billtrans t
) t;
The above works in SQL Server 2012+. In SQL Server 2008, you can use much less efficient methods, such as outer apply:
select t.*, t2.name as new_name
from billtrans t outer apply
(select top 1 t2.name
from billtrans t2
where t2.rowid <= t.rowid and t2.name is not null
order by t2.rowid desc
) t2;
You can also phrase this using a similarly structured correlated subquery.
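A small runnable sketch of the two-level window approach, using SQLite as a stand-in for SQL Server 2012+. The column is called row_id here rather than rowid only because rowid has a special meaning in SQLite, and the sample rows are invented.

```python
import sqlite3

# SQLite stand-in; row_id replaces the question's rowid column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billtrans (row_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO billtrans VALUES (?, ?)",
    [(1, "MAR ROXAS"), (2, None), (3, None), (4, "RODRIGO DUTERTE"), (5, None)],
)

# Inner query: running MAX of the row_id of the latest non-null name,
# which labels every row with the group it belongs to.
# Outer query: MAX(name) within each group spreads the name over the blanks.
filled = conn.execute(
    """
    SELECT row_id,
           MAX(name) OVER (PARTITION BY name_rowid) AS new_name
    FROM (
        SELECT t.*,
               MAX(CASE WHEN name IS NOT NULL THEN row_id END)
                   OVER (ORDER BY row_id) AS name_rowid
        FROM billtrans t
    ) t
    ORDER BY row_id
    """
).fetchall()

print(filled)
```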

LAG functions and NULLS

How can I tell the LAG function to get the last "not null" value?
For example, see my table below where I have a few NULL values on column B and C.
I'd like to fill the nulls with the last non-null value. I tried to do that by using the LAG function, like so:
case when B is null then lag (B) over (order by idx) else B end as B,
but that doesn't quite work when I have two or more nulls in a row (see the NULL value on column C row 3 - I'd like it to be 0.50 as the original).
Any idea how can I achieve that?
(it doesn't have to be using the LAG function, any other ideas are welcome)
A few assumptions:
The number of rows is dynamic;
The first value will always be non-null;
Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.
Thanks
You can do it with outer apply operator:
select t.id,
t1.colA,
t2.colB,
t3.colC
from table t
outer apply(select top 1 colA from table where id <= t.id and colA is not null order by id desc) t1
outer apply(select top 1 colB from table where id <= t.id and colB is not null order by id desc) t2
outer apply(select top 1 colC from table where id <= t.id and colC is not null order by id desc) t3;
This will work, regardless of the number of nulls or null "islands". You may have values, then nulls, then again values, again nulls. It will still work.
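To see the difference on a tiny data set, here is a sketch that compares the single LAG from the question with a correlated subquery, which is the closest SQLite equivalent of the OUTER APPLY above (SQLite has no OUTER APPLY). The table name tbl and the sample values are assumptions.

```python
import sqlite3

# SQLite stand-in; tbl and its values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, colB REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 0.5), (2, None), (3, None), (4, 0.7), (5, None)],
)

# A single LAG only looks one row back, so the second NULL in a run stays NULL.
lag_only = conn.execute(
    """
    SELECT id,
           CASE WHEN colB IS NULL
                THEN LAG(colB) OVER (ORDER BY id)
                ELSE colB END AS colB
    FROM tbl
    ORDER BY id
    """
).fetchall()

# The correlated subquery walks back to the latest non-null value,
# however long the run of NULLs is.
filled = conn.execute(
    """
    SELECT t.id,
           (SELECT s.colB
            FROM tbl s
            WHERE s.id <= t.id AND s.colB IS NOT NULL
            ORDER BY s.id DESC
            LIMIT 1) AS colB
    FROM tbl t
    ORDER BY t.id
    """
).fetchall()

print(lag_only)
print(filled)
```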
If, however the assumption (in your question) holds:
Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.
there is a more efficient solution. We only need to find the latest (when ordered by idx) values. Modifying the above query, removing the where id <= t.id from the subqueries:
select t.id,
colA = coalesce(t.colA, t1.colA),
colB = coalesce(t.colB, t2.colB),
colC = coalesce(t.colC, t3.colC)
from table t
outer apply (select top 1 colA from table
where colA is not null order by id desc) t1
outer apply (select top 1 colB from table
where colB is not null order by id desc) t2
outer apply (select top 1 colC from table
where colC is not null order by id desc) t3;
You could make a change to your ORDER BY, to force the NULLs to be first in your ordering, but that may be expensive...
lag(B) over (order by CASE WHEN B IS NULL THEN -1 ELSE idx END)
Or, use a sub-query to calculate the replacement value once. Possibly less expensive on larger sets, but very clunky.
- Relies on all the NULLs coming at the end
- The LAG doesn't rely on that
COALESCE(
B,
(
SELECT
sorted_not_null.B
FROM
(
SELECT
table.B,
ROW_NUMBER() OVER (ORDER BY table.idx DESC) AS row_id
FROM
table
WHERE
table.B IS NOT NULL
)
sorted_not_null
WHERE
sorted_not_null.row_id = 1
)
)
(This should be faster on larger data sets than LAG or OUTER APPLY with correlated sub-queries, simply because the value is calculated once. For tidiness, you could calculate and store the [last_known_value] for each column in variables, then just use COALESCE(A, @last_known_A), COALESCE(B, @last_known_B), etc.)
If it is NULL all the way to the end, then you can take a shortcut:
declare @b varchar(20) = (select top 1 b from table where b is not null order by id desc);
declare @c varchar(20) = (select top 1 c from table where c is not null order by id desc);
select id, isnull(b,@b) as b, isnull(c,@c) as c
from table;
Select max(diff) from(
Select
Case when lag(a) over (order by b) is not null
Then (a -lag(a) over (order by b)) end as diff
From <tbl_name> where
<relevant conditions>
Order by b) k
Works fine in db visualizer.
UPDATE table
SET B = (@n := COALESCE(B , @n))
WHERE B is null;

Update Table Beginning At Record One SQL Server

I am trying to update a table with records from another table. Whenever I use an insert into statement, the records are simply appended. Instead, I want the records to be inserted from the top of the table. What is the easiest way to do this? I am thinking I could use an update statement, but that means I will have to join the tables. One of the tables (the one I am pulling records from) has only one column, so I would have to add another column to do the join. I am trying not to make it so complicated. If there is a simpler way, please let me know.
Sample:
Table One
Col1
1
2
3
4
Table 2
Col1 Col2
a
b
c
d
I want to move column 1 from table 1 to column 2 in table 2 such that table 2 will be:
Table 2
Col1 Col2
a 1
b 2
c 3
d 4
You can do the update using row_number(), but the rows will be assigned in an indeterminate order:
with toupdate as (
select t2.*, row_number() over (order by (select NULL)) as seqnum
from table2 t2
),
t1 as (
select t1.*, row_number() over (order by (select NULL)) as seqnum
from table1 t1
)
update toupdate
set col2 = t1.col1
from toupdate join
t1
on toupdate.seqnum = t1.seqnum;
Note: if you have an ordering in mind, then use the appropriate order by in the over clauses.
Unless you explicitly define an ORDER BY clause in your SELECT statements, the order of your result set is arbitrary. This is how any RDBMS operates. You should consider including a timestamp at the time of insertion to identify the latest rows.
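As a rough illustration of pairing rows by sequence number with an explicit ordering, the sketch below uses SQLite as a stand-in. It orders both tables by col1 (an assumption; pick whatever ordering you actually have in mind) and uses a correlated subquery instead of SQL Server's UPDATE ... FROM, which also assumes table2.col1 is unique.

```python
import sqlite3

# SQLite stand-in with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER)")
conn.execute("CREATE TABLE table2 (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO table2 (col1) VALUES (?)", [("a",), ("b",), ("c",), ("d",)]
)

# Number the rows of both tables, then copy the value whose sequence
# number matches the sequence number of the row being updated.
conn.execute(
    """
    WITH src AS (
        SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) AS seqnum FROM table1
    ),
    dst AS (
        SELECT col1 AS k, ROW_NUMBER() OVER (ORDER BY col1) AS seqnum FROM table2
    )
    UPDATE table2
    SET col2 = (
        SELECT src.col1
        FROM src JOIN dst ON src.seqnum = dst.seqnum
        WHERE dst.k = table2.col1
    )
    """
)

paired = conn.execute("SELECT col1, col2 FROM table2 ORDER BY col1").fetchall()
print(paired)
```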

sql delete rows with 1 column duplicated

I have a Microsoft SQL Server 2005 table where the entire row is not duplicated, but one column is.
1 aaa
1 bbb
1 ccc
2 abc
2 def
How can I delete all but one of the rows that have the first column duplicated?
For clarification I need to get rid of the second, third and fifth rows.
Try the following query in sql server 2005
WITH T AS (SELECT ROW_NUMBER()OVER(PARTITION BY id ORDER BY id) AS rnum,* FROM dbo.Table_1)
DELETE FROM T WHERE rnum>1
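The same idea can be tested against the sample data with SQLite as a stand-in. SQLite cannot delete through a CTE the way SQL Server can, so this sketch deletes by SQLite's built-in rowid instead. Ordering by col1 inside each partition is an assumption that makes the first row of each id the survivor.

```python
import sqlite3

# SQLite stand-in with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER, col1 TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [(1, "aaa"), (1, "bbb"), (1, "ccc"), (2, "abc"), (2, "def")],
)

# Number the rows within each id and delete everything after the first.
conn.execute(
    """
    DELETE FROM table_1
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY col1) AS rnum
            FROM table_1
        )
        WHERE rnum > 1
    )
    """
)

remaining = conn.execute("SELECT id, col1 FROM table_1 ORDER BY id").fetchall()
print(remaining)
```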
Let's call these the id and the Col1 columns.
DELETE myTable T1
WHERE EXISTS
(SELECT * FROM myTable T2
WHERE T2.id = T1.id AND T2.Col1 > T1.Col1)
Edit: As pointed out by Andomar, the above doesn't get rid of exact duplicate cases, where both id and Col1 are the same in different rows.
These can be handled as follow:
(note: whereby the above query is generic SQL, the following applies to MSSQL 2005 and above)
It uses the Common Table Expression (CTE) feature, along with ROW_NUMBER() function to produce a distinctive row value. It is essentially the same construct as the above except that it now works with a "table" (CTEs are mostly like a table) which has a truly distinct identifier key.
Note that by removing "AND T2.Col1 = T1.Col1", we produce a query which can handle both types of duplicates (id-only duplicates and both Id and Col1 duplicates) in a single query, i.e. in a similar fashion that Hamadri's solution (the PARTITION in his/her CTE serves the same purpose as the subquery in this solution, essentially the same amount of work is done). Depending on the situation, it may be preferable, performance-wise or other, to handle the situation in two steps.
WITH T AS
(SELECT ROW_NUMBER() OVER (ORDER BY id, Col1) AS rn, id, Col1 FROM MyTable)
DELETE T AS T1
WHERE EXISTS
(SELECT *
FROM T AS T2
WHERE T2.id = T1.id AND T2.Col1 = T1.Col1
AND T2.rn > T1.rn
)
DELETE tableName as ta
WHERE col2 NOT IN (SELECT MIN(col2) FROM tableName AS t2 GROUP BY col1)
Make sure the sub select returns the rows you want to keep.
Try this.
DELETE FROM <TABLE_NAME_HERE> WHERE <SECOND_COLUMN_NAME_HERE> IN ('bbb','ccc','def');
SQL Server is not my native SQL database, but maybe something like this? The idea is to get the duplicates and delete the ones with the larger ROW_NUMBER. This should leave only the first one. I don't know if this is what you want or if it will work, but the logic seems sound.
DELETE T1
FROM T1 T2
WHERE T1.Col1 = T2.col1
AND T1.ROW_NUMBER() > T2.ROW_NUMBER()
Please feel free to correct me if SQL Server can't handle that kind of treatment :)
--Another idea using ROW_NUMBER()
Delete MyTable
Where Id IN
(
Select T.Id FROM
(
SELECT Id, ROW_NUMBER() OVER (PARTITION BY UniqueColumn ORDER BY Id) AS RowNumber FROM MyTable
)T
WHERE T.RowNumber > 1
)