Can row_number() ignore NULLs in Oracle? - sql

I have data like this
---------------------------
| code | other column
---------------------------
| C | a
| null | a
| A | a
| null | a
| null | a
----------------------------
How can I write a query that returns a row_number which skips the rows where code is null, like this?
----------------------------------
| id | code | other column |
----------------------------------
| 1 | C | a
| | null | a
| 2 | A | a
| | null | a
| | null | a
----------------------------------

Well, not specifically. But you can get what you want by using conditional logic:
select (case when code is not null
then row_number() over (partition by (case when code is not null then 1 else 0 end)
order by . . .
)
end) as id
It is not clear to me what the ordering for row_number() should be, which is what the . . . stands for.
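To see the conditional numbering in action, here is a minimal sketch that runs the same CASE / PARTITION BY idea through Python's built-in sqlite3 module rather than Oracle. The table name mytable and the seq column used for ordering are assumptions added for the demo (the question does not say what the rows should be ordered by), and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "seq" is a hypothetical ordering column; the question does not name one.
conn.execute("CREATE TABLE mytable (seq INTEGER, code TEXT, other_column TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "C", "a"), (2, None, "a"), (3, "A", "a"), (4, None, "a"), (5, None, "a")],
)

sql = """
SELECT CASE WHEN code IS NOT NULL
            THEN row_number() OVER (
                     PARTITION BY (CASE WHEN code IS NOT NULL THEN 1 ELSE 0 END)
                     ORDER BY seq)
       END AS id,
       code,
       other_column
FROM mytable
ORDER BY seq
"""
rows = conn.execute(sql).fetchall()
# Non-null codes are numbered within their own partition; null rows get no id.
ids = [r[0] for r in rows]
codes = [r[1] for r in rows]
for r in rows:
    print(r)
```

The partition keeps the null rows from consuming numbers, and the outer CASE blanks out the numbers that the null partition would otherwise receive.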

If you need to order by code (descending in your example) with NULLs last:
select
decode(code,null,null,row_number() over (order by code DESC NULLS LAST)) rn,
code
from test;
If you need to order on OTHER column:
select
decode(code,null,null,row_number() over (order by decode(code,null,null,'x') NULLS LAST, other DESC)) rn,
code, other
from test;

You can use row_number on the desired subset and then union all the other records (sortkey is included in the select list so that the final order by can reference it):
select row_number() over (order by sortkey) as id, code, other_column, sortkey
from mytable
where code is not null
union all
select null as id, code, other_column, sortkey
from mytable
where code is null
order by sortkey;

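Another easy way would be the following. Note that PARTITION BY code restarts the numbering for each distinct code value, so every non-null row gets 1 unless codes repeat: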
Select
CASE WHEN code IS NOT NULL THEN ROW_NUMBER() OVER (PARTITION BY code order by code)
ELSE NULL END id,
code,
other_column
FROM table;

For example, in my case using IS NOT NULL did not work, so I had to change it to an expression:
SELECT A.ITEMNAME,
CASE
WHEN (SELECT T1.MAXSTOCK
FROM DATOS T1
WHERE T1.MAXSTOCK) > 5 THEN
ROW_NUMBER() OVER(PARTITION BY CASE
WHEN (SELECT T1.MAXSTOCK
FROM DATOS T1
WHERE T1.MAXSTOCK) <= 5 /*here should go IS NOT NULL*/
THEN
1
END ORDER BY A.ITEMNAME)
ELSE
NULL
END AS #ROW
FROM TABLE A

Related

Select rows based on multiple conditions

I have a table from which I have to select some rows based on the following conditions.
If more than one row exists with same DocumentRef, then select all the rows if BlockNumber is empty for all rows
If more than one row exists with same DocumentRef, then select only 1 row (ordered by DocumentId asc) with BlockNumber IS NOT EMPTY
If only one row exists with DocumentRef, select it irrespective of anything
Table:
I was trying to group it by DocumentRef and filter with having but having can only have aggregate functions. I think I will have to provide multiple conditions in having separated by OR. Please give me some direction.
Use window functions:
select t.*
from (select t.*,
sum(case when blocknumber is not null then 1 else 0 end) over (partition by documentref) as num_bn_notnull,
rank() over (partition by documentref
order by (case when blocknumber is not null then documentid end) desc nulls last
) as rnk
from t
) t
where num_bn_notnull = 0 or
rnk = 1;
Or, you can use exists clauses:
select t.*
from t
where not exists (select 1
from t t2
where t2.documentref = t.documentref and
t2.blocknumber is not null
) or
t.documentid = (select max(t2.documentid)
from t t2
where t2.documentref = t.documentref and
t2.blocknumber is not null
);
This can take advantage of an index on (documentref, blocknumber, documentid).
Actually, by a quirk of the SQL language, I think this works as well:
select t.*
from t
where t.documentid >= all (select t2.documentid
from t t2
where t2.documentref = t.documentref and
t2.blocknumber is not null
order by t2.documentid desc
fetch first 1 row only
);
The subquery returns an empty set if all blocknumbers are NULL. By definition, a comparison with ALL is true on an empty set (with ANY it would be false), so every row of such a documentref is kept. Be aware that a row with a NULL blocknumber and a documentid above the selected one also passes this condition.
Join the table to a query that returns for each documentref the maximum documentid for all the blocknumbers that are not null or null if they are all null:
select t.*
from tablename t inner join (
select
documentref,
max(case when blocknumber is not null then documentid end) maxid
from tablename
group by documentref
) d on d.documentref = t.documentref
and t.documentid = coalesce(d.maxid, t.documentid)
See the demo.
Results:
> DOCUMENTID | DOCUMENTREF | WARDID | BLOCKNUMBER
> ---------: | ----------: | -----: | ----------:
> 203962537 | 100000126 | B | A
> 203962538 | 100000130 | B | A
> 203962542 | 100000151 | null | null
> 203962543 | 100000151 | null | null
> 203962544 | 100000180 | B | A
> 203962546 | 100000181 | B | A
> 203962551 | 100000185 | null | null
> 203962552 | 100000186 | B | A
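For anyone who wants to try the join-based query without an Oracle instance, the following is a rough sketch using Python's sqlite3 module. The table name docs and the sample rows are made up for illustration and are not the data from the question; like the answers above, it keeps the highest documentid among the rows that have a blocknumber.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, chosen to cover all three cases in the question.
conn.execute("CREATE TABLE docs (documentid INTEGER, documentref INTEGER, blocknumber TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, 100, "A"), (2, 100, None), (3, 100, "B"),  # mixed: keep one row
        (4, 200, None), (5, 200, None),                # all empty: keep all rows
        (6, 300, None),                                # single row, empty
        (7, 400, "A"),                                 # single row, filled
    ],
)

sql = """
SELECT t.documentid
FROM docs t
JOIN (SELECT documentref,
             max(CASE WHEN blocknumber IS NOT NULL THEN documentid END) AS maxid
      FROM docs
      GROUP BY documentref) d
  ON d.documentref = t.documentref
 AND t.documentid = coalesce(d.maxid, t.documentid)
ORDER BY t.documentid
"""
selected = [r[0] for r in conn.execute(sql)]
print(selected)
```

When maxid is NULL (no blocknumber anywhere in the group), the coalesce compares each row with itself, which is why all rows of that documentref survive the join.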

Using the last_value function on every column | Downfilling all nulls in a table

I have an individual-level table, ordered by Person_ID and Date, ascending. There are duplicate entries at the Person_ID level. What I would like to do is "downfill" null values across every column -- my impression is that the last_value( | ignore nulls) function will work perfectly for each column.
A major problem is that the table is hundreds of columns wide, and is quite dynamic (feature creation for ML experiments). There has to be a better way than writing out a last_value statement for each variable, something like this:
SELECT last_value(var1 IGNORE NULLS) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var1,
last_value(var2 IGNORE NULLS) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var2,
...
last_value(var300 IGNORE NULLS) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var300
FROM TABLE
In summary, I have the following table:
+----------+-----------+------+------+---+------------+
| PersonID | YearMonth | Var1 | Var2 | … | Var300 |
+----------+-----------+------+------+---+------------+
| 1 | 200901 | 2 | null | | null |
| 1 | 200902 | null | 1 | | Category 1 |
| 1 | 201010 | null | 1 | | null |
+----------+-----------+------+------+---+------------+
and desire the following table:
+----------+-----------+------+------+---+------------+
| PersonID | YearMonth | Var1 | Var2 | … | Var300 |
+----------+-----------+------+------+---+------------+
| 1 | 200901 | 2 | null | | null |
| 1 | 200902 | 2 | 1 | | Category 1 |
| 1 | 201010 | 2 | 1 | | Category 1 |
+----------+-----------+------+------+---+------------+
I don't see any great options for you, but here are two approaches you might look into.
OPTION 1 -- Recursive CTE
In this approach, you use a recursive query, where each child value equals itself or, if it is null, its parent's value. Like so:
WITH
ordered AS (
SELECT yt.*,
row_number() over ( partition by yt.personid order by yt.yearmonth ) rn
FROM YOUR_TABLE yt),
downfilled ( personid, yearmonth, var1, var2, ..., var300, rn) as (
SELECT o.*
FROM ordered o
WHERE o.rn = 1
UNION ALL
SELECT c.personid, c.yearmonth,
nvl(c.var1, p.var1) var1,
nvl(c.var2, p.var2) var2,
...
nvl(c.var300, p.var300) var300,
c.rn
FROM downfilled p INNER JOIN ordered c ON c.personid = p.personid AND c.rn = p.rn + 1 )
SELECT * FROM downfilled
ORDER BY personid, yearmonth;
This replaces each expression like this:
last_value(var2 IGNORE NULLS) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var2
with an expression like this:
NVL(c.var2, p.var2)
One downside, though, is that this makes you repeat the list of 300 columns twice: once for the 300 NVL() expressions and once to specify the output columns of the recursive CTE (downfilled).
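If the query is issued from a script anyway, one way around the repetition is to build the column lists from a single list of names. The sketch below is only an illustration of that idea, run against SQLite through Python's sqlite3 module instead of Oracle: coalesce() stands in for NVL(), the table is cut down to three hypothetical var columns, and SQLite needs the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (personid INTEGER, yearmonth INTEGER, "
    "var1 INTEGER, var2 INTEGER, var3 TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 200901, 2, None, None),
        (1, 200902, None, 1, "Category 1"),
        (1, 201010, None, 1, None),
    ],
)

# The only place where the variable columns are listed.
var_cols = ["var1", "var2", "var3"]
col_list = ", ".join(var_cols)
filled = ", ".join(f"coalesce(c.{v}, p.{v})" for v in var_cols)

sql = f"""
WITH RECURSIVE
ordered AS (
    SELECT personid, yearmonth, {col_list},
           row_number() OVER (PARTITION BY personid ORDER BY yearmonth) AS rn
    FROM your_table),
downfilled (personid, yearmonth, {col_list}, rn) AS (
    SELECT personid, yearmonth, {col_list}, rn
    FROM ordered
    WHERE rn = 1
    UNION ALL
    -- each row keeps its own value or inherits the already filled parent value
    SELECT c.personid, c.yearmonth, {filled}, c.rn
    FROM downfilled p
    JOIN ordered c ON c.personid = p.personid AND c.rn = p.rn + 1)
SELECT personid, yearmonth, {col_list}
FROM downfilled
ORDER BY personid, yearmonth
"""
result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

The column names here come from a trusted list in the script; if they came from user input, they would need validating before being formatted into the SQL text.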
OPTION 2 -- UNPIVOT and PIVOT again
In this approach, you UNPIVOT your VARxx columns into rows, so that you only need to write the last_value()... expression one time.
WITH unp AS (
SELECT personid,
yearmonth,
var_column,
last_value(var_value ignore nulls)
over ( partition by personid, var_column order by yearmonth ) var_value
FROM YOUR_TABLE
UNPIVOT INCLUDE NULLS ( var_value FOR var_column IN ("VAR1","VAR2","VAR3") ) )
SELECT * FROM unp
PIVOT ( max(var_value) FOR var_column IN ('VAR1' AS VAR1, 'VAR2' AS VAR2, 'VAR3' AS VAR3 ) )
Here you still need to list each column twice. Also, I'm not sure what performance will be like if you have a large data set.

How can I check the order of column values (by date) for every unique id?

I have this table, Activity:
| ID | Date of activity | activity |
|----|---------------------|----------|
| 1 | 2016-05-01T13:45:03 | a |
| 1 | 2016-05-02T13:45:03 | b |
| 1 | 2016-05-03T13:45:03 | a |
| 1 | 2016-05-04T13:45:03 | b |
| 2 | 2016-05-01T13:45:03 | b |
| 2 | 2016-05-02T13:45:03 | b |
and this table:
| id | Right order |
|----|-------------|
| 1 | yes |
| 2 | no |
How can I check, for every ID, whether the order of the activities is similar to this order, for example?
a b a b a b ..
Of course, I'll check according to the activity date.
In SQL Server 2012+ you could use a common table expression with lag(), and then take the min() of a case expression that follows your logic, like so:
;with cte as (
select *
, prev_activity = lag(activity) over (partition by id order by date_of_activity)
from t
)
select id
, right_order = min(case
when activity = 'a' and isnull(prev_activity,'b')<>'b' then 'no'
when activity = 'b' and isnull(prev_activity,'b')<>'a' then 'no'
else 'yes'
end)
from cte
group by id
rextester demo: http://rextester.com/NQQF78056
returns:
+----+-------------+
| id | right_order |
+----+-------------+
| 1 | yes |
| 2 | no |
+----+-------------+
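The same lag() plus min(case ...) logic can be tried locally with Python's sqlite3 module. This is a sketch under a couple of assumptions: SQLite has no isnull(), so coalesce() is used in its place, and the T-SQL alias = expression syntax is written as expression AS alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date_of_activity TEXT, activity TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2016-05-01T13:45:03", "a"),
        (1, "2016-05-02T13:45:03", "b"),
        (1, "2016-05-03T13:45:03", "a"),
        (1, "2016-05-04T13:45:03", "b"),
        (2, "2016-05-01T13:45:03", "b"),
        (2, "2016-05-02T13:45:03", "b"),
    ],
)

sql = """
WITH cte AS (
    SELECT id, activity,
           lag(activity) OVER (PARTITION BY id ORDER BY date_of_activity) AS prev_activity
    FROM t)
SELECT id,
       -- 'no' sorts before 'yes', so a single violation makes min() return 'no'
       min(CASE
             WHEN activity = 'a' AND coalesce(prev_activity, 'b') <> 'b' THEN 'no'
             WHEN activity = 'b' AND coalesce(prev_activity, 'b') <> 'a' THEN 'no'
             ELSE 'yes'
           END) AS right_order
FROM cte
GROUP BY id
ORDER BY id
"""
right_order = conn.execute(sql).fetchall()
print(right_order)
```

Treating a missing previous activity as 'b' is what forces every sequence to start with 'a'.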
Prior to SQL Server 2012 you can use outer apply() to get the previous activity instead of lag() like so:
select id
, right_order = min(case
when activity = 'a' and isnull(prev_activity,'b')<>'b' then 'no'
when activity = 'b' and isnull(prev_activity,'b')<>'a' then 'no'
else 'yes'
end)
from t
outer apply (
select top 1 prev_activity = i.activity
from t as i
where i.id = t.id
and i.date_of_activity < t.date_of_activity
order by i.date_of_activity desc
) x
group by id
EDITED - Allows for variable number of Patterns per ID
Perhaps another approach
Example
Declare @Pat varchar(max)='a b'
Declare @Cnt int = 2
Select ID
,RightOrder = case when rtrim(replicate(@Pat+' ',Hits/@Cnt)) = (Select Stuff((Select ' ' +activity From t Where id=A.id order by date_of_activity For XML Path ('')),1,1,'') ) then 'Yes' else 'No' end
From (Select ID,hits=count(*) from t group by id) A
Returns
ID RightOrder
1 Yes
2 No
select id,
case when sum(flag)=0 and cnt_per_id%2=0
and max(case when rnum=1 then activity end) = 'a'
and max(case when rnum=2 then activity end) = 'b'
and min_activity = 'a' and max_activity = 'b'
then 'yes' else 'no' end as RightOrder
from (select t.*
,row_number() over(partition by id order by activitydate) as rnum
,count(*) over(partition by id) as cnt_per_id
,min(activity) over(partition by id) as min_activity
,max(activity) over(partition by id) as max_activity
,case when lag(activity) over(partition by id order by activitydate)=activity then 1 else 0 end as flag
from tbl t
) t
group by id,cnt_per_id,max_activity,min_activity
Based on the explanation the following logic has to be implemented for rightorder.
Check that the number of rows per id is even (remove this condition if there can be an odd number of rows, such as a,b,a or a,b,a,b,a and so on)
First row contains a and second b, min activity is a and max activity is b for an id.
Sum of flags (set using lag) should be 0

Select latest NOT NULL values from table

I have table with many statuses like
Id | Date | IsEnabled | IsUpdated | IsDuplicate | IsSuspended | ...
Statuses (IsEnabled, IsUpdated, IsDuplicate, IsSuspended...) are nullable bits.
I need to select the latest (but not later than some input date) non-null statuses from this table. If some status is NULL, then select the previous non-null value.
I've written a select that returns only the latest values, but I can't work out how to get the previous non-null values.
;WITH CTE AS (
SELECT cbs.*, rn = ROW_NUMBER() OVER (PARTITION BY cbs.Id ORDER BY cbs.[Date] DESC)
FROM [dbo].CompanyBusinessStatus cbs
WHERE cbs.[Date] <= @inputDate
)
SELECT *
FROM CTE
WHERE rn = 1
I'm using MS SQL 2016
Data Example :
1 | 2017-01-01 | 1 | 0 | 0 | 0
_______________________________________
1 | 2017-01-03 | 1 | NULL | NULL | 1
_______________________________________
2 | 2017-01-03 | 1 | 1 | NULL | 0
_______________________________________
1 | 2017-01-05 | 0 | 1 | 0 | NULL
In case @inputDate is '2017-01-04' I need to select
Id | IsEnabled | IsUpdated | IsDuplicate | IsSuspended
_________________________________________________________
1 | 1 | 0 | 0 | 1
_________________________________________________________
2 | 1 | 1 | NULL | 0
One way (demo) would be
SELECT Id,
IsEnabled = CAST(RIGHT(MAX(yyyymmdd + CAST(IsEnabled AS CHAR(1))), 1) AS BIT),
IsUpdated = CAST(RIGHT(MAX(yyyymmdd + CAST(IsUpdated AS CHAR(1))), 1) AS BIT),
IsDuplicate = CAST(RIGHT(MAX(yyyymmdd + CAST(IsDuplicate AS CHAR(1))), 1) AS BIT),
IsSuspended = CAST(RIGHT(MAX(yyyymmdd + CAST(IsSuspended AS CHAR(1))), 1) AS BIT)
FROM dbo.CompanyBusinessStatus cbs
CROSS APPLY (SELECT FORMAT(Date, 'yyyyMMdd')) CA(yyyymmdd)
WHERE cbs.[Date] <= @inputDate
GROUP BY Id
If you have a covering index on id (or even if you don't but get a hash aggregate) this can produce a plan with no sort operations at all and may be significantly cheaper than Gordon's answer.
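The trick in that query is that concatenating a date string with a NULL status yields NULL, MAX() skips NULLs, and the surviving string with the latest date carries the status as its last character. Below is a small sketch of the same idea in SQLite via Python's sqlite3 module; it is an adaptation, not the T-SQL above: || and substr() replace + and RIGHT(), and the column is called StatusDate (an assumed name) and already holds ISO yyyy-mm-dd text, so no FORMAT() step is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompanyBusinessStatus (Id INTEGER, StatusDate TEXT, "
    "IsEnabled INTEGER, IsUpdated INTEGER, IsDuplicate INTEGER, IsSuspended INTEGER)"
)
conn.executemany(
    "INSERT INTO CompanyBusinessStatus VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2017-01-01", 1, 0, 0, 0),
        (1, "2017-01-03", 1, None, None, 1),
        (2, "2017-01-03", 1, 1, None, 0),
        (1, "2017-01-05", 0, 1, 0, None),
    ],
)

status_cols = ["IsEnabled", "IsUpdated", "IsDuplicate", "IsSuspended"]
# date || NULL is NULL, so max() only sees rows where the status is set;
# the last character of the winning string is the status itself.
exprs = ",\n       ".join(
    f"CAST(substr(max(StatusDate || {c}), -1) AS INTEGER) AS {c}" for c in status_cols
)
sql = f"""
SELECT Id,
       {exprs}
FROM CompanyBusinessStatus
WHERE StatusDate <= ?
GROUP BY Id
ORDER BY Id
"""
latest = conn.execute(sql, ("2017-01-04",)).fetchall()
print(latest)
```

This relies on the date strings sorting in chronological order and on each status being a single character, which holds for bit columns.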
My other answer clearly misinterpreted the question. Unfortunately, SQL Server only offers FIRST_VALUE() as a window function. So, here is one method:
SELECT DISTINCT cbs.id,
MAX(cbs.date) OVER (PARTITION BY cbs.id) as date,
FIRST_VALUE(IsEnabled) OVER (PARTITION BY cbs.id ORDER BY (CASE WHEN IsEnabled IS NULL THEN 2 ELSE 1 END), cbs.date DESC) as isEnabled,
FIRST_VALUE(IsUpdated) OVER (PARTITION BY cbs.id ORDER BY (CASE WHEN IsUpdated IS NULL THEN 2 ELSE 1 END), cbs.date DESC) as IsUpdated,
. . .
FROM [dbo].CompanyBusinessStatus cbs
WHERE cbs.[Date] <= @inputDate ;
I'm not a fan of SELECT DISTINCT for this purpose, but it seems like the easiest way to express the logic.
ANSI SQL offers the IGNORE NULLS option for FIRST_VALUE() (and some other window functions). However, SQL Server 2016 does not support this option; it was only added in SQL Server 2022.
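Here is a runnable sketch of the FIRST_VALUE() approach with the NULL-last ordering, executed in SQLite through Python's sqlite3 module instead of SQL Server. The StatusDate column name and the Python loop that generates one window expression per status column are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompanyBusinessStatus (Id INTEGER, StatusDate TEXT, "
    "IsEnabled INTEGER, IsUpdated INTEGER, IsDuplicate INTEGER, IsSuspended INTEGER)"
)
conn.executemany(
    "INSERT INTO CompanyBusinessStatus VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2017-01-01", 1, 0, 0, 0),
        (1, "2017-01-03", 1, None, None, 1),
        (2, "2017-01-03", 1, 1, None, 0),
        (1, "2017-01-05", 0, 1, 0, None),
    ],
)

status_cols = ["IsEnabled", "IsUpdated", "IsDuplicate", "IsSuspended"]
# Sort non-null values first, then newest date first, and take the first value.
exprs = ",\n       ".join(
    f"first_value({c}) OVER (PARTITION BY Id "
    f"ORDER BY (CASE WHEN {c} IS NULL THEN 2 ELSE 1 END), StatusDate DESC) AS {c}"
    for c in status_cols
)
sql = f"""
SELECT DISTINCT Id,
       {exprs}
FROM CompanyBusinessStatus
WHERE StatusDate <= ?
ORDER BY Id
"""
latest_non_null = conn.execute(sql, ("2017-01-04",)).fetchall()
print(latest_non_null)
```

Every row of a partition gets the same window results, so DISTINCT collapses them to one row per Id.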
For the query below, I think the ORDER BY in ROW_NUMBER() will pick the record with the fewest NULLs as the first one for your output.
WITH CTE AS (
SELECT cbs.*, rn = ROW_NUMBER() OVER (PARTITION BY cbs.Id ORDER BY cbs.[Date] DESC, IsEnabled DESC,IsUpdated DESC,IsDuplicate DESC,IsSuspended DESC)
FROM [dbo].CompanyBusinessStatus cbs
WHERE cbs.[Date] <= @inputDate
)
SELECT *
FROM CTE
WHERE rn = 1
The only way I know to do what you want is to do a correlated sub-query for each of the "Status" columns. It's a lot of SQL to write, and doesn't look very elegant, but it will definitely work in any version of SQL Server.
There might be a more elegant solution involving UNPIVOTing and then RE-PIVOTing, but I wouldn't bother going that route unless I had over 20 different "Status" columns.

Selecting Top 1 for Every ID

I have the following table:
| ID | ExecOrd | date |
| 1 | 1.0 | 3/4/2014|
| 1 | 2.0 | 7/7/2014|
| 1 | 3.0 | 8/8/2014|
| 2 | 1.0 | 8/4/2013|
| 2 | 2.0 |12/2/2013|
| 2 | 3.0 | 1/3/2014|
| 2 | 4.0 | |
I need to get the date of the top ExecOrd per ID of about 8000 records, and so far I can only do it for one ID:
SELECT TOP 1 date
FROM TABLE
WHERE DATE IS NOT NULL and ID = '1'
ORDER BY ExecOrd DESC
A little help would be appreciated. I have been trying to find a similar question to mine with no success.
There are several ways of doing this. A generic approach is to join the table back to itself using max():
select t.date
from yourtable t
join (select max(execord) execord, id
from yourtable
group by id
) t2 on t.id = t2.id and t.execord = t2.execord
If you're using 2005+, I prefer to use row_number():
select date
from (
select row_number() over (partition by id order by execord desc) rn, date
from yourtable
) t
where rn = 1;
SQL Fiddle Demo
Note: they will give different results if ties exist.
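As a quick check of the row_number() variant against the sample data, here is a sketch using Python's sqlite3 module. It adds the DATE IS NOT NULL filter from the question's own query (the answer above leaves it out), returns the id next to the date, and names the date column exec_date, which is an assumed name; the dates are stored as the plain strings shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, execord REAL, exec_date TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, 1.0, "3/4/2014"), (1, 2.0, "7/7/2014"), (1, 3.0, "8/8/2014"),
        (2, 1.0, "8/4/2013"), (2, 2.0, "12/2/2013"), (2, 3.0, "1/3/2014"),
        (2, 4.0, None),
    ],
)

sql = """
SELECT id, exec_date
FROM (SELECT id, exec_date,
             row_number() OVER (PARTITION BY id ORDER BY execord DESC) AS rn
      FROM yourtable
      WHERE exec_date IS NOT NULL)   -- skip rows that have no date yet
WHERE rn = 1
ORDER BY id
"""
top_dates = conn.execute(sql).fetchall()
print(top_dates)
```

Without the filter, id 2 would return the row with ExecOrd 4.0, whose date is empty.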
;with cte as (
SELECT id, date, row_number() over(partition by ID order by ExecOrd DESC) r
FROM TABLE WHERE DATE IS NOT NULL )
select id, date from
cte where r=1