Select latest NOT NULL values from table - sql

I have a table with many status columns, like:
Id | Date | IsEnabled | IsUpdated | IsDuplicate | IsSuspended | ...
Statuses (IsEnabled, IsUpdated, IsDuplicate, IsSuspended...) are nullable bits.
I need to select the latest non-NULL statuses from this table, considering only rows dated no later than some input date. If a status is NULL in the latest row, the previous non-NULL value should be selected instead.
I've written a query that selects only the latest values, but I can't work out how to fall back to the previous non-NULL values.
;WITH CTE AS (
    SELECT cbs.*, rn = ROW_NUMBER() OVER (PARTITION BY cbs.Id ORDER BY cbs.[Date] DESC)
    FROM [dbo].CompanyBusinessStatus cbs
    WHERE cbs.[Date] <= @inputDate
)
SELECT *
FROM CTE
WHERE rn = 1
I'm using SQL Server 2016.
Data example:
Id | Date       | IsEnabled | IsUpdated | IsDuplicate | IsSuspended
--------------------------------------------------------------------
1  | 2017-01-01 | 1         | 0         | 0           | 0
1  | 2017-01-03 | 1         | NULL      | NULL        | 1
2  | 2017-01-03 | 1         | 1         | NULL        | 0
1  | 2017-01-05 | 0         | 1         | 0           | NULL
If @inputDate is '2017-01-04', I need to select:
Id | IsEnabled | IsUpdated | IsDuplicate | IsSuspended
-------------------------------------------------------
1  | 1         | 0         | 0           | 1
2  | 1         | 1         | NULL        | 0

One way would be:
SELECT Id,
       IsEnabled = CAST(RIGHT(MAX(yyyymmdd + CAST(IsEnabled AS CHAR(1))), 1) AS BIT),
       IsUpdated = CAST(RIGHT(MAX(yyyymmdd + CAST(IsUpdated AS CHAR(1))), 1) AS BIT),
       IsDuplicate = CAST(RIGHT(MAX(yyyymmdd + CAST(IsDuplicate AS CHAR(1))), 1) AS BIT),
       IsSuspended = CAST(RIGHT(MAX(yyyymmdd + CAST(IsSuspended AS CHAR(1))), 1) AS BIT)
FROM dbo.CompanyBusinessStatus cbs
CROSS APPLY (SELECT FORMAT(Date, 'yyyyMMdd')) CA(yyyymmdd)
WHERE cbs.[Date] <= @inputDate
GROUP BY Id
If you have a covering index on id (or even if you don't but get a hash aggregate) this can produce a plan with no sort operations at all and may be significantly cheaper than Gordon's answer.
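To see why the date-prefix trick in the query above works, here is a minimal sketch that replays the same idea against the question's sample rows. It uses Python's sqlite3 module rather than SQL Server, so `strftime`, `||` and `substr` stand in for `FORMAT`, `+` and `RIGHT`; the helper name `latest` is only for illustration.

```python
import sqlite3

# In-memory copy of the question's sample data (dates kept as ISO text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompanyBusinessStatus (Id INTEGER, Date TEXT, "
    "IsEnabled INTEGER, IsUpdated INTEGER, IsDuplicate INTEGER, IsSuspended INTEGER)"
)
conn.executemany(
    "INSERT INTO CompanyBusinessStatus VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2017-01-01", 1, 0, 0, 0),
        (1, "2017-01-03", 1, None, None, 1),
        (2, "2017-01-03", 1, 1, None, 0),
        (1, "2017-01-05", 0, 1, 0, None),
    ],
)

def latest(col):
    # Prefix each status with its date as yyyymmdd. NULL || text is NULL,
    # so MAX skips NULL statuses; the last character of the winning
    # string is the newest non-NULL value.
    return f"CAST(substr(MAX(strftime('%Y%m%d', Date) || {col}), -1) AS INTEGER) AS {col}"

columns = ["IsEnabled", "IsUpdated", "IsDuplicate", "IsSuspended"]
query = (
    "SELECT Id, " + ", ".join(latest(c) for c in columns) + " "
    "FROM CompanyBusinessStatus WHERE Date <= ? GROUP BY Id ORDER BY Id"
)
latest_statuses = conn.execute(query, ("2017-01-04",)).fetchall()
print(latest_statuses)
```

Each group is aggregated in a single pass, which is why no sort is needed.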

My other answer clearly misinterpreted the question. Unfortunately, SQL Server only offers FIRST_VALUE() as a window function. So, here is one method:
SELECT DISTINCT cbs.id,
       MAX(cbs.date) OVER (PARTITION BY cbs.id) as date,
       FIRST_VALUE(IsEnabled) OVER (PARTITION BY cbs.id ORDER BY (CASE WHEN IsEnabled IS NULL THEN 2 ELSE 1 END), cbs.date DESC) as isEnabled,
       FIRST_VALUE(IsUpdated) OVER (PARTITION BY cbs.id ORDER BY (CASE WHEN IsUpdated IS NULL THEN 2 ELSE 1 END), cbs.date DESC) as IsUpdated,
       . . .
FROM [dbo].CompanyBusinessStatus cbs
WHERE cbs.[Date] <= @inputDate ;
I'm not a fan of SELECT DISTINCT for this purpose, but it seems like the easiest way to express the logic.
ANSI SQL offers the IGNORE NULLS option for FIRST_VALUE() (and some other window functions). However, SQL Server 2016 does not support this option.
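The ordering idea behind the FIRST_VALUE() query (non-NULL values first, then newest date first) can be sketched in plain Python. This is only an illustration of the logic on the question's sample rows, not SQL Server code; `first_value_nulls_last` is a made-up helper name.

```python
from datetime import date

# (Id, Date, IsEnabled, IsUpdated, IsDuplicate, IsSuspended)
rows = [
    (1, date(2017, 1, 1), 1, 0, 0, 0),
    (1, date(2017, 1, 3), 1, None, None, 1),
    (2, date(2017, 1, 3), 1, 1, None, 0),
    (1, date(2017, 1, 5), 0, 1, 0, None),
]
cutoff = date(2017, 1, 4)

def first_value_nulls_last(partition, col):
    # Mirrors ORDER BY (CASE WHEN col IS NULL THEN 2 ELSE 1 END), date DESC:
    # non-NULL rows sort first, and among them the newest date wins.
    ordered = sorted(partition, key=lambda r: (r[col] is None, -r[1].toordinal()))
    return ordered[0][col]

result = {}
for row_id in sorted({r[0] for r in rows if r[1] <= cutoff}):
    partition = [r for r in rows if r[0] == row_id and r[1] <= cutoff]
    result[row_id] = tuple(first_value_nulls_last(partition, col) for col in range(2, 6))

print(result)
```

If every value in a partition is NULL, the first row is still a NULL row, so the column comes back as NULL, as in the expected output for Id 2.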

For the query below, I think the ORDER BY in ROW_NUMBER() will put the record with the fewest NULLs first in your output.
WITH CTE AS (
    SELECT cbs.*, rn = ROW_NUMBER() OVER (PARTITION BY cbs.Id ORDER BY cbs.[Date] DESC, IsEnabled DESC, IsUpdated DESC, IsDuplicate DESC, IsSuspended DESC)
    FROM [dbo].CompanyBusinessStatus cbs
    WHERE cbs.[Date] <= @inputDate
)
SELECT *
FROM CTE
WHERE rn = 1

The only way I know to do what you want is to do a correlated sub-query for each of the "Status" columns. It's a lot of SQL to write, and doesn't look very elegant, but it will definitely work in any version of SQL Server.
There might be a more elegant solution involving UNPIVOTing and then RE-PIVOTing, but I wouldn't bother going that route unless I had over 20 different "Status" columns.
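Since the answer above gives no query, here is a rough sketch of what one correlated sub-query per status column could look like. It runs against SQLite through Python's sqlite3 module, so it uses `LIMIT 1` where SQL Server would use `TOP (1)`; the alias `ids` and the helper `latest_non_null` are illustrative names, not from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompanyBusinessStatus (Id INTEGER, Date TEXT, "
    "IsEnabled INTEGER, IsUpdated INTEGER, IsDuplicate INTEGER, IsSuspended INTEGER)"
)
conn.executemany(
    "INSERT INTO CompanyBusinessStatus VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2017-01-01", 1, 0, 0, 0),
        (1, "2017-01-03", 1, None, None, 1),
        (2, "2017-01-03", 1, 1, None, 0),
        (1, "2017-01-05", 0, 1, 0, None),
    ],
)

def latest_non_null(col):
    # One correlated sub-query per status column: the newest non-NULL
    # value for the outer Id on or before the cutoff date.
    return (
        f"(SELECT s.{col} FROM CompanyBusinessStatus s "
        f"WHERE s.Id = ids.Id AND s.Date <= :cutoff AND s.{col} IS NOT NULL "
        f"ORDER BY s.Date DESC LIMIT 1) AS {col}"
    )

columns = ["IsEnabled", "IsUpdated", "IsDuplicate", "IsSuspended"]
query = (
    "SELECT ids.Id, " + ", ".join(latest_non_null(c) for c in columns) + " "
    "FROM (SELECT DISTINCT Id FROM CompanyBusinessStatus WHERE Date <= :cutoff) ids "
    "ORDER BY ids.Id"
)
per_column = conn.execute(query, {"cutoff": "2017-01-04"}).fetchall()
print(per_column)
```

A sub-query that finds no non-NULL row returns NULL, which matches the expected NULL for IsDuplicate on Id 2.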

Related

SQL select all rows per group after a condition is met

I would like to select all rows for each group after the last time a condition is met for that group. This related question has an answer using correlated subqueries.
In my case I will have millions of categories and hundreds of millions/billions of rows. Is there a way to achieve the same results using a more performant query?
Here is an example. The condition is all rows (per group) after the last 0 in the conditional column.
category | timestamp | condition
--------------------------------------
A | 1 | 0
A | 2 | 1
A | 3 | 0
A | 4 | 1
A | 5 | 1
B | 1 | 0
B | 2 | 1
B | 3 | 1
The result I would like to achieve is
category | timestamp | condition
--------------------------------------
A | 4 | 1
A | 5 | 1
B | 2 | 1
B | 3 | 1
If you want everything after the last 0, you can use window functions:
select t.*
from (select t.*,
max(case when condition = 0 then timestamp end) over (partition by category) as max_timestamp_0
from t
) t
where timestamp > max_timestamp_0 or
max_timestamp_0 is null;
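As a sanity check on the logic of that query (not on its performance), the same filter can be sketched in plain Python over the question's sample rows: find the last timestamp with condition 0 per category, then keep rows after it.

```python
# (category, timestamp, condition), as in the question
rows = [
    ("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 1),
]

# Equivalent of max(case when condition = 0 then timestamp end) per category.
last_zero = {}
for category, timestamp, condition in rows:
    if condition == 0:
        last_zero[category] = max(timestamp, last_zero.get(category, timestamp))

# Keep rows after the last 0, or every row when the category has no 0 at all.
after_last_zero = [
    r for r in rows
    if r[0] not in last_zero or r[1] > last_zero[r[0]]
]
print(after_last_zero)
```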
With an index on (category, condition, timestamp), the correlated subquery version might also perform quite well:
select t.*
from t
where t.timestamp > all (select t2.timestamp
from t t2
where t2.category = t.category and
t2.condition = 0
);
You might want to try window functions:
select category, timestamp, condition
from (
select
t.*,
min(condition) over(partition by category order by timestamp desc) min_cond
from mytable t
) t
where min_cond = 1
The window min() with the order by clause computes the minimum value of condition over the current and following rows of the same category: we can use it as a filter to eliminate rows for which there is a more recent row with a 0.
Compared to the correlated subquery approach, the upside of using window functions is that it reduces the number of scans needed on the table. Of course this computing also has a cost, so you'll need to assess both solutions against your sample data.
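The running minimum described above can be illustrated with a small Python sketch: walk each category from newest to oldest and keep rows only while no 0 has been seen yet. It assumes, as the question's data suggests, that condition only takes the values 0 and 1.

```python
from itertools import groupby
from operator import itemgetter

# (category, timestamp, condition), as in the question
rows = [
    ("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 1),
]

kept = []
# Sort by category, then timestamp descending, like
# over (partition by category order by timestamp desc).
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
for category, partition in groupby(ordered, key=itemgetter(0)):
    running_min = None
    for row in partition:
        running_min = row[2] if running_min is None else min(running_min, row[2])
        if running_min == 1:  # no 0 at this row or any more recent row
            kept.append(row)

kept.sort(key=lambda r: (r[0], r[1]))
print(kept)
```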

Selecting Case statement in SQL Server

I am trying to add a case statement to my query using SQL Server. I had a very long query that I have basically selected into a temporary table #Step1 giving the following table.
+---+------------+-------------+-----------------+---------------+
| | LOB | TechPrem | Label | Data |
+---+------------+-------------+-----------------+---------------+
| 1 | AOP | Yes | ADjAAL | 40331 |
| 1 | Boiler | Yes | AdjAAL | 0 |
| 1 | TRIA | NO | AdjAAL | 0 |
| 1 | AOP | Yes | PureAAL | 11904 |
| 1 | Boiler | Yes | PureAAL | 775 |
+---+------------+-------------+---------------- +---------------+
Looking at the table above, I want a CASE statement such that if 'TechPrem' is 'Yes' for AOP and Boiler, Query 1 executes; otherwise, if 'TechPrem' is 'No' for AOP and Boiler, Query 2 executes. Any suggestions or thoughts on this would be helpful.
Query 1 :
SELECT
FileID,
SUM(CAST(REPLACE(Data,',','.') AS FLOAT)) AS Summed_AAL_Attri
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY FileID, "LOB" ORDER BY "Label" ASC) AS rn
FROM
DATA WITH (NOLOCK)
WHERE
Label IN ('AdjAAL')
AND LOB IN ('AOP', 'Boiler', 'TRIA')) AS t
WHERE
t.rn = 1
AND FileID = 1
GROUP BY
FileID
Expected answer if 'Yes' : 40331
Query 2:
SELECT
FileID,
SUM(CAST(REPLACE(Data,',','.') AS FLOAT)) AS Summed_AAL_Attri
FROM
(SELECT
*,
row_number() over (partition by FileID, "LOB" ORDER BY "Label" ASC) as rn
FROM
DATA WITH (NOLOCK)
WHERE
Label IN ('AdjAAL','PureAAL')
AND LOB IN ('AOP', 'Boiler', 'TRIA')) AS t
WHERE
t.rn = 1
AND FileID = 1
GROUP BY
FileID
Expected answer if 'No' : 41106
A CASE expression in SQL Server returns a value; it cannot choose which of two queries to run. However, since the only difference I see between these queries is in the WHERE clause of the sub-query, you can use a CASE expression there.
SELECT
FileID,
SUM(CAST(REPLACE(Data,',','.') AS FLOAT)) AS Summed_AAL_Attri
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY FileID, "LOB" ORDER BY "Label" ASC) AS rn
FROM
DATA WITH (NOLOCK)
WHERE
Label IN ('AdjAAL', CASE TechPrem
WHEN 'YES' THEN ''
WHEN 'NO' THEN 'PureAAL'
END)
AND LOB IN ('AOP', 'Boiler', 'TRIA')) AS t
WHERE
t.rn = 1
AND FileID = 1
GROUP BY
FileID
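To show what the CASE inside the IN list does row by row, here is a small sketch using Python's sqlite3 module. The table name `Step1`, the normalized 'AdjAAL'/'No' spellings, and the extra TRIA/PureAAL row are assumptions made for the demo; `upper()` is used because SQLite compares strings case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Step1 (LOB TEXT, TechPrem TEXT, Label TEXT, Data TEXT)")
conn.executemany(
    "INSERT INTO Step1 VALUES (?, ?, ?, ?)",
    [
        ("AOP", "Yes", "AdjAAL", "40331"),
        ("Boiler", "Yes", "AdjAAL", "0"),
        ("TRIA", "No", "AdjAAL", "0"),
        ("AOP", "Yes", "PureAAL", "11904"),
        ("Boiler", "Yes", "PureAAL", "775"),
        ("TRIA", "No", "PureAAL", "5"),  # hypothetical row, not in the question
    ],
)

# 'AdjAAL' always qualifies; 'PureAAL' qualifies only when TechPrem is 'No',
# because the CASE yields '' (matching no label) when TechPrem is 'Yes'.
query = """
SELECT LOB, Label
FROM Step1
WHERE Label IN ('AdjAAL', CASE upper(TechPrem)
                              WHEN 'YES' THEN ''
                              WHEN 'NO' THEN 'PureAAL'
                          END)
ORDER BY LOB, Label
"""
filtered = conn.execute(query).fetchall()
print(filtered)
```

Note that the CASE is evaluated per row, so the choice follows each row's own TechPrem value rather than a single decision for the whole query.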

Can row_number() ignore null in oracle

I have data like this
---------------------------
| code | other column
---------------------------
| C | a
| null | a
| A | a
| null | a
| null | a
----------------------------
How can I write a query that assigns a row number without counting rows where code is NULL?
----------------------------------
| id | code | other column |
----------------------------------
| 1 | C | a
| | null | a
| 2 | A | a
| | null | a
| | null | a
----------------------------------
Well, not specifically. But you can get what you want by using conditional logic:
select (case when code is not null
then row_number() over (partition by (case when code is not null then 1 else 0 end)
order by . . .
)
end) as id
It is not clear to me what the ORDER BY for row_number() should be, which is why it is shown as . . . above.
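The intended result of the conditional numbering can be sketched in Python: rows with a code get consecutive ids, rows without one get none. The sketch assumes the rows already arrive in the desired order, since the question does not say what the ordering should be.

```python
# (code, other_column), in the order shown in the question
rows = [("C", "a"), (None, "a"), ("A", "a"), (None, "a"), (None, "a")]

numbered = []
next_id = 1
for code, other in rows:
    if code is None:
        # NULL codes are skipped by the counter and get no id.
        numbered.append((None, code, other))
    else:
        numbered.append((next_id, code, other))
        next_id += 1

print(numbered)
```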
If you need to order on code (descending in your example) with NULLs last:
select
decode(code,null,null,row_number() over (order by code DESC NULLS LAST)) rn,
code
from test;
If you need to order on OTHER column:
select
decode(code,null,null,row_number() over (order by decode(code,null,null,'x') NULLS LAST, other DESC)) rn,
code, other
from test;
You can use row_number on the desired subset and then union all the other records:
select row_number() over (order by sortkey) as id, code, other_column
from mytable
where code is not null
union all
select null as id, code, other_column
from mytable
where code is null
order by sortkey;
Another easy way would be:
Select
CASE WHEN code IS NOT NULL THEN ROW_NUMBER() OVER (PARTITION BY code order by code)
ELSE NULL END id,
code,
other_column
FROM table;
For example, in my case IS NOT NULL did not work, so I had to change it to an expression:
SELECT A.ITEMNAME,
CASE
WHEN (SELECT T1.MAXSTOCK
FROM DATOS T1
WHERE T1.MAXSTOCK) > 5 THEN
ROW_NUMBER() OVER(PARTITION BY CASE
WHEN (SELECT T1.MAXSTOCK
FROM DATOS T1
WHERE T1.MAXSTOCK) <= 5 /*here should go IS NOT NULL*/
THEN
1
END ORDER BY A.ITEMNAME)
ELSE
NULL
END AS #ROW
FROM TABLE A

Redshift: Getting rank of a row, filtered by a condition

Every time I add a row to a table, I want to know where it ranks in comparison with the table up to that point. This is easily done with the RANK() window function. However, I'm struggling to find a way to discover where it ranks in comparison with the table up to that point, filtered by a value.
As an example, I'm wanting to end up with this highly contrived table:
date | name | animal_bought | num_sloths_bought_before | num_camels_bought_before
------------+---------+---------------+--------------------------+--------------------------
2014-09-01 | Vincent | sloth | 0 | 0
2014-09-01 | Luis | camel | 0 | 0
2014-09-02 | Vincent | sloth | 1 | 0
2014-09-02 | Luis | camel | 0 | 1
2014-09-02 | Kevin | sloth | 0 | 0
2014-09-03 | Vincent | camel | 1 | 0
2014-09-04 | Deo | camel | 0 | 0
2014-09-04 | Vincent | sloth | 2 | 1
2014-09-05 | Luis | camel | 0 | 2
2014-09-05 | Andrew | sloth | 0 | 0
I was initially looking to see whether I could apply a filter to the window function (eg. RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before) but this isn't syntactically correct. I then tried adding a sub-query, as follows:
SELECT
date,
name,
animal_bought,
( SELECT
RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
FROM this_table
WHERE animal_bought = 'sloth'
) AS num_sloths_bought_before
FROM source_table
but Redshift threw this error:
ERROR: This type of correlated subquery pattern is not supported yet
I've also tried putting the window function in a case statement (throws the same error) and calculating the ranks in a join query (not been able to make it work).
Hmmm. I don't think this query would do what you want anyway:
SELECT date, name, animal_bought,
(SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
FROM this_table
WHERE animal_bought = 'sloth'
) AS num_sloths_bought_before
FROM source_table
For a few reasons:
The use of rank() suggests that there is more than one row in this_table that matches animal_bought. Otherwise, you could use an aggregation function.
If there is only one row that matches the where clause, then the value is always 1, because the where clause is processed before the rank().
Your question only mentions one table but your query has two
Perhaps you just want rank() without a subquery?
SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;
If you want it for both animals, then don't use rank(), use cumulative sum:
SELECT date, name, animal_bought,
       sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date) as SlothsBefore,
       sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date) as CamelsBefore
FROM source_table;
EDIT:
SELECT date, name, animal_bought,
       (sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date) -
        (case when animal_bought = 'sloth' then 1 else 0 end)
       ) as SlothsBefore,
       (sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date) -
        (case when animal_bought = 'camel' then 1 else 0 end)
       ) as CamelsBefore
FROM source_table;
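The "cumulative sum minus the current row" idea in the edited query can be sketched in Python. The sketch uses only the first five rows of the question's table and assumes each person buys at most one animal per date (rows sharing a name and date would be peers in the SQL window and would need extra care).

```python
from collections import defaultdict

# (date, name, animal_bought): first five rows of the question's table
purchases = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"),
]

# name -> animal -> number bought so far
bought = defaultdict(lambda: defaultdict(int))
counts_before = []
for day, name, animal in sorted(purchases, key=lambda p: p[0]):
    # Read the counts before recording the current purchase, which is what
    # subtracting the current row from the running sum achieves in SQL.
    counts_before.append((day, name, animal, bought[name]["sloth"], bought[name]["camel"]))
    bought[name][animal] += 1

for row in counts_before:
    print(row)
```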

Grouping SQL Results based on order

I have a table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers so that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
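The two steps (flag a new group when the previous RowNumber is higher, then take a running sum of the flags) can be sketched in Python. The function name `assign_groups` is illustrative, and the input is assumed to be the RowNumber values already ordered by ID.

```python
def assign_groups(row_numbers):
    """Return a group number per row, given RowNumber values ordered by ID."""
    groups = []
    previous = None  # plays the role of LAG(RowNumber)
    group = 0        # running sum of the NewGroup flags
    for row_number in row_numbers:
        new_group = 1 if previous is None or previous > row_number else 0
        group += new_group
        groups.append(group)
        previous = row_number
    return groups

# Sample data from the question
print(assign_groups([1, 2, 3, 1, 2, 1, 2, 3, 4]))
# Non-sequential RowNumbers from the clarified requirements
print(assign_groups([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]))
```

Because repeated values such as "1,1,2,2" never decrease, they stay in the same group; only a drop starts a new one.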
If the ids are truly sequential and RowNumber increases by exactly 1 within each group, you can do:
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.