use default value in partition clause of a window function if null - sql

I use window functions in my query to sum rows according to a combination of column values. If a row contains null in one of those columns, I have to treat it as false. What should I do? I tried adding coalesce(atg.flag,false) to the partition clause, but it didn't work.

coalesce is the way to go. Here is an example:
t=# with dset(i,bool) as (values(1,true),(2,false),(3,null))
select i, bool::text, count(1) over (partition by coalesce(bool,false))
from dset;
 i | bool  | count
---+-------+-------
 2 | false |     2
 3 |       |     2
 1 | true  |     1
(3 rows)
As you can see, count is 2 for the null and false rows (they share one partition) and 1 for the true row.
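To make the effect of coalesce visible side by side, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. It is an illustration under assumptions, not the asker's setup: SQLite has no boolean type, so 0/1 stands in for false/true, and the column is named flag rather than bool. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 0/1 stands in for false/true because SQLite has no boolean type (assumption).
query = """
with dset(i, flag) as (values (1, 1), (2, 0), (3, null))
select i,
       count(*) over (partition by flag)              as cnt_raw,
       count(*) over (partition by coalesce(flag, 0)) as cnt_coalesced
from dset
order by i
"""

# Each row is (i, count without coalesce, count with coalesce).
partition_counts = conn.execute(query).fetchall()
for row in partition_counts:
    print(row)
```

Without coalesce the null row sits in a partition of its own, so every count is 1. With coalesce the null row joins the false partition and both rows report 2.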

Related

ORACLE SELECT DISTINCT VALUE ONLY IN SOME COLUMNS

+----+------+-------+---------+---------+
| id | order| value | type | account |
+----+------+-------+---------+---------+
| 1 | 1 | a | 2 | 1 |
| 1 | 2 | b | 1 | 1 |
| 1 | 3 | c | 4 | 1 |
| 1 | 4 | d | 2 | 1 |
| 1 | 5 | e | 1 | 1 |
| 1 | 5 | f | 6 | 1 |
| 2 | 6 | g | 1 | 1 |
+----+------+-------+---------+---------+
I need to select all fields of this table but get only one row for each combination of id and type (I don't care which row of each combination is returned). I have tried several approaches without success.
As soon as I use DISTINCT, I can't include the remaining fields to make them available in a subquery. If I add ROWNUM to the subquery, every row becomes different, so DISTINCT no longer helps.
Any ideas?
My best query so far is this:
SELECT ID, TYPE, VALUE, ACCOUNT
FROM MYTABLE
WHERE ROWID IN (SELECT DISTINCT MAX(ROWID)
FROM MYTABLE
GROUP BY ID, TYPE);
It seems you need to select one (random) row for each distinct combination of id and type. If so, you could do that efficiently using the row_number analytic function. Something like this:
select id, type, value, account
from (
select id, type, value, account,
row_number() over (partition by id, type order by null) as rn
from your_table
)
where rn = 1
;
order by null leaves the rows within each (id, type) partition in arbitrary order (not truly random, just unspecified), so the ordering step, which is usually time-consuming, becomes trivial in this case. Also, Oracle optimizes such queries (for the filter rn = 1).
Or, in versions 12.1 and higher, you can get the same with the match_recognize clause:
select id, type, value, account
from my_table
match_recognize (
partition by id, type
all rows per match
pattern (^r)
define r as null is null
);
This partitions the rows by id and type, doesn't order them (which means arbitrary ordering), and selects just the "first" row from each partition. Note that some analytic functions, including row_number(), require an order by clause even when we don't care about the ordering; order by null is customary, but the clause can't be left out completely. By contrast, in match_recognize you can leave out the order by clause (the default is arbitrary order). On the other hand, you can't leave out the define clause, even if it imposes no conditions whatsoever. Why Oracle doesn't use a default for that clause too, only Oracle knows.
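For anyone who wants to try the row_number approach without an Oracle instance, here is a rough sketch using Python's sqlite3 module with the sample rows from the question. The table name my_table and the column name ord (instead of the reserved word order) are my own choices. Unlike Oracle, SQLite accepts row_number() without an order by clause, so it is left out here. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id int, ord int, value text, type int, account int)"
)
conn.executemany(
    "insert into my_table values (?, ?, ?, ?, ?)",
    [
        (1, 1, "a", 2, 1), (1, 2, "b", 1, 1), (1, 3, "c", 4, 1),
        (1, 4, "d", 2, 1), (1, 5, "e", 1, 1), (1, 5, "f", 6, 1),
        (2, 6, "g", 1, 1),
    ],
)

# Number the rows inside each (id, type) partition and keep the first one.
# Which row of a partition gets rn = 1 is arbitrary.
query = """
select id, type, value, account
from (
    select id, type, value, account,
           row_number() over (partition by id, type) as rn
    from my_table
)
where rn = 1
order by id, type
"""
one_per_group = conn.execute(query).fetchall()
for row in one_per_group:
    print(row)
```

The result has exactly one row per (id, type) combination, five rows for this sample. Which value shows up for the duplicated combinations (1, 1) and (1, 2) is not guaranteed.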

Merge rows based on a condition

Is it possible to merge a collection of rows based on a condition in Spark SQL using a sql query ?
If the difference between the purch_dt of two consecutive rows, ordered by line_num, is less than 5 days, combine them into one row. The merged row should carry the max value of purch_dt for that group. I tried using the LEAD function, but I can't get it to reset after each false condition and treat the following rows as a new group, so I am not able to get the max of purch_dt for each such group.
Input:
orderid | line_num | purch_dt
1 | 1 | 10-02-2020
1 | 2 | 12-02-2020
1 | 3 | 14-02-2020
1 | 4 | 21-03-2020
1 | 5 | 23-03-2020
Output:
orderid | purch_dt
1 | 14-02-2020 -- 1 - 3 combined into 1 row because difference is <5 between each
1 | 23-03-2020 -- 4 - 5 combined into 1 row because difference is <5 between each
Total Output rows = 2 because we have 2 groups.
Please note that line_num 4 acts as a set break, since the difference between it and line_num 3 is greater than 5 days. Hence it starts its own merged record set.
I have the sql below so far, but I can't get to break out and create the groups.
create temporary view next_dt as
select
orderid,
LEAD(purch_dt) over (partition by orderid order by line_num asc) AS next_purch_dt,
purch_dt
from orders;
select *
from (
select
orderid,
CASE WHEN datediff(next_purch_dt, purch_dt) < 5 OR next_purch_dt IS NULL THEN 'Y'
ELSE 'N'
END AS flg
from
next_dt)
WHERE flg = 'Y';
Any help is appreciated.
UPDATE:
Slight change in the requirements:
The comparison now has to be made between two different fields in consecutive records: purch_dt of the current record and return_dt of the next record.
Also, when a merged record group is output, its purch_dt should be taken from the record with the lowest line_num in that group, and its return_dt from the record with the highest line_num in the same group.
Input:
orderid | line_num | purch_dt | return_dt
1 | 1 | 10-02-2020 | 10-02-2020
1 | 2 | 12-02-2020 | 13-02-2020
1 | 3 | 14-02-2020 | 14-02-2020
1 | 4 | 21-03-2020 | 23-02-2020
1 | 5 | 23-03-2020 | 24-02-2020
Output:
orderid | purch_dt | return_dt
1 | 10-02-2020 | 14-02-2020
1 | 21-03-2020 | 24-02-2020
Total Output rows = 2 because we have 2 groups.
Note that each output record contains the purch_dt of the record with min line_num in that group. And contains return_dt populated as per the record with max line_num in that group.
You almost had it. The query below worked for me:
sql("""create temporary view next_dt_orders as
select *
from (
select
orderid,line_num,purch_dt,
case when datediff(
(lead(purch_dt) over (partition by orderid order by line_num asc)),
purch_dt) < 5
then "N"
else "Y"
end as flag
from
orders) tab
where
flag='Y'""")
sql("select * from next_dt_orders").show()
+-------+--------+----------+----+
|orderid|line_num| purch_dt|flag|
+-------+--------+----------+----+
| 1| 3|2020-02-14| Y|
| 1| 5|2020-03-23| Y|
+-------+--------+----------+----+
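A different way to get one row per group is the usual "gaps and islands" pattern: flag each row that starts a new group, take a running sum of those flags as a group id, then aggregate per group. The sketch below is not the Spark query above. It runs the pattern through Python's sqlite3 module, with julianday() standing in for Spark's datediff and the dates rewritten in ISO format, both of which are my assumptions. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (orderid int, line_num int, purch_dt text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (1, 1, "2020-02-10"), (1, 2, "2020-02-12"), (1, 3, "2020-02-14"),
        (1, 4, "2020-03-21"), (1, 5, "2020-03-23"),
    ],
)

query = """
with flagged as (
    -- is_break = 1 when the row starts a new group: it is the first row of
    -- the order, or it is 5 or more days after the previous row.
    select orderid, line_num, purch_dt,
           case when julianday(purch_dt)
                     - julianday(lag(purch_dt) over (partition by orderid
                                                     order by line_num)) < 5
                then 0 else 1 end as is_break
    from orders
),
grouped as (
    -- The running sum of break flags gives every row its group number.
    select orderid, purch_dt,
           sum(is_break) over (partition by orderid order by line_num) as grp
    from flagged
)
select orderid, max(purch_dt) as purch_dt
from grouped
group by orderid, grp
order by orderid, grp
"""
merged_rows = conn.execute(query).fetchall()
for row in merged_rows:
    print(row)
```

Because every row keeps its group number, the same grouped CTE could also feed min() or max() over other columns, which is roughly what the updated requirement asks for.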

Update based on same column

I have a table like this:
ID | Flag
-----------
1 | True
1 | True
1 | NULL
1 | True
1 | NULL
2 | False
2 | False
2 | False
2 | NULL
2 | NULL
And I want an output like this:
ID | Flag
-----------
1 | True
1 | True
1 | True
1 | True
1 | True
2 | False
2 | False
2 | False
2 | False
2 | False
I want to replace nulls with the value assigned in different records. Is there a way to do it in a single update statement?
One option uses a correlated subquery:
update mytable t
set flag = (select bool_or(flag) from mytable t1 where t1.id = t.id)
Demo on DB Fiddle:
id | flag
-: | :---
1 | t
1 | t
1 | t
1 | t
1 | t
2 | f
2 | f
2 | f
2 | f
2 | f
You can also use exists:
update t
set flag = exists (select 1 from t t2 where t2.id = t.id and t2.flag);
The advantage of exists over a subquery with aggregation is performance: the query can stop at the first row where flag is true. This is a simple lookup on an index on (id, flag).
Performance could be improved further by limiting the number of rows being updated. That actually suggests two separate statements:
update t
set flag = true
where (flag is null or not flag) and
exists (select 1 from t t2 where t2.id = t.id and t2.flag);
update t
set flag = false
where (flag is null or flag) and
not exists (select 1 from t t2 where t2.id = t.id and t2.flag);
These could be combined into a single (more complicated) statement, but the sets being updated are disjoint. This limits the updates to the rows that need to be updated, as well as limiting the subquery to a simple lookup (assuming an index on (id, flag)).
The answers provided satisfy your sample data, but may still leave you short of a satisfactory answer. That is because your sample data is missing a couple significant sets. What happens if you had the following, either instead of or in addition to your current sample data?
+----+-------+
| id | flag |
+----+-------+
| 3 | true |
| 3 | false |
| 3 | null |
| 4 | null |
| 4 | null |
+----+-------+
The answer could be significantly different.
Assuming (like your sample data suggests):
There can never be the same id with true and false in the set. Else, you'd have to define what to do.
null values remain unchanged if there is no non-null value for the same id.
This should give you best performance:
UPDATE tbl t
SET flag = t1.flag
FROM (
SELECT DISTINCT ON (id)
id, flag
FROM tbl
ORDER BY id, flag
) t1 -- avoid repeated computation for same id
WHERE t.id = t1.id
AND t.flag IS NULL -- avoid costly no-op updates
AND t1.flag IS NOT NULL; -- avoid costly no-op updates;
The subquery t1 distills target values per id once.
SELECT DISTINCT ON (id)
id, flag
FROM tbl
ORDER BY id, flag;
Since null sorts last, it effectively grabs the first non-null value per id. false sorts before true, but that has no bearing on the case as there can never be both for the same id. See:
Sort NULL values to the end of a table
Select first row in each GROUP BY group?
If you have many rows per id, there are faster techniques:
Optimize GROUP BY query to retrieve latest row per user
The added conditions in the outer query prevent all no-op updates from happening, thus avoiding major cost. Only rows are updated where a null value actually changes. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
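To check the behaviour on the edge cases without a Postgres instance, here is a small sketch using Python's sqlite3 module. It is not the DISTINCT ON query above (SQLite has neither DISTINCT ON nor UPDATE ... FROM in older versions), only a correlated-subquery variant under the same assumptions: an id never mixes true and false, and ids with only nulls stay untouched. The table name tbl and the 0/1 encoding of the flag are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id int, flag int)")  # 1 = true, 0 = false
conn.executemany(
    "insert into tbl values (?, ?)",
    [
        (1, 1), (1, 1), (1, None), (1, 1), (1, None),
        (2, 0), (2, 0), (2, 0), (2, None), (2, None),
        (4, None), (4, None),  # an id with no non-null flag at all
    ],
)

# Only null rows are touched, and only when the id has a non-null flag to copy.
# max() ignores nulls; it picks "the" value because an id never mixes 0 and 1.
conn.execute("""
update tbl
set flag = (select max(t1.flag) from tbl t1 where t1.id = tbl.id)
where flag is null
  and exists (select 1 from tbl t1
              where t1.id = tbl.id and t1.flag is not null)
""")

filled = conn.execute("select id, flag from tbl order by id").fetchall()
for row in filled:
    print(row)
```

After the update, id 1 is all 1, id 2 is all 0, and id 4 still holds nulls, since there is nothing to copy from.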

SQL : get values from column depending on boolean expressions in a column

I have a query result like this:
|bool Expression | Column A | Column B|
+----------------+----------+---------+
| true | 2 | 10 |
| false | 3 | 10 |
| true | 4 | 8 |
I need all values of Column B for which all boolean expressions are true.
The result I need in this case would be [8]; if all expressions were true, it would be [8, 10].
Thanks in advance
You can group by columnb:
select columnb
from tablename
group by columnb
having min(boolexpression::int) = 1 and max(boolexpression::int) = 1
I would simply use boolean aggregation functions:
select b
from t
group by b
having bool_and(bool_expression);
As an aside, bool_and ignores NULL inputs: a b value is kept when all of its non-NULL expressions are true, and it is filtered out when all of its expressions are NULL, because bool_and then returns NULL.
Why is the following not the solution?
select b
from table
where expression = true

Setting rank to NULL using RANK() OVER in SQL

In a SQL Server DB, I have a table of values that I am interested in ranking.
When I perform a RANK() OVER (ORDER BY VALUE DESC) as RANK, I get the following results (in a hypothetical table):
RANK | USER_ID | VALUE
------------------------
1 | 33 | 30000
2 | 10 | 20000
3 | 45 | 10000
4 | 12 | 5000
5 | 43 | 2000
6 | 32 | NULL
6 | 13 | NULL
6 | 19 | NULL
6 | 28 | NULL
The problem is, I do not want the rows which have NULL for a VALUE to get a rank - I need some way to set the rank for these to NULL. So far, searching the web has brought me no answers on how I might be able to do this.
Thanks for any help you can provide.
You can try a CASE statement:
SELECT
CASE WHEN Value IS NULL THEN NULL
ELSE RANK() OVER (ORDER BY VALUE DESC)
END AS RANK,
USER_ID,
VALUE
FROM yourtable
The CASE expression provided earlier would still count the NULL records in the rank if the ORDER BY were ascending rather than descending, because SQL Server sorts NULLs first in ascending order. The ranking would then start at 5 rather than 1, which is probably not what is desired.
To ensure that the NULLs are not counted in the rank, you can force them to the bottom by adding an initial sort criterion on whether the value IS NULL, like so:
SELECT
CASE WHEN Value IS NULL THEN NULL
ELSE RANK() OVER
(ORDER BY CASE WHEN Value IS NULL THEN 1 ELSE 0 END, VALUE DESC)
END AS RANK,
USER_ID,
VALUE
FROM yourtable
Credit to Hugo Kornelis: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/deb8a0aa-aaab-442b-a667-11220333a4e0/rank-without-counting-null-values?forum=transactsql
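To see why the extra sort key matters, here is a sketch that ranks the sample rows in ascending order, with and without it. It uses Python's sqlite3 module rather than SQL Server. SQLite also sorts NULLs first in ascending order, so the effect is the same, but the table name scores and the column aliases are my own. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (user_id int, value int)")
conn.executemany(
    "insert into scores values (?, ?)",
    [
        (33, 30000), (10, 20000), (45, 10000), (12, 5000), (43, 2000),
        (32, None), (13, None), (19, None), (28, None),
    ],
)

query = """
select user_id, value,
       -- naive: the four NULL rows take rank 1, so real values start at 5
       case when value is null then null
            else rank() over (order by value asc)
       end as naive_rank,
       -- guarded: NULLs are pushed to the bottom before ranking
       case when value is null then null
            else rank() over (order by case when value is null then 1 else 0 end,
                              value asc)
       end as guarded_rank
from scores
order by value is null, value asc, user_id
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

The naive rank of the smallest value (2000) comes out as 5, while the guarded rank starts at 1. In both columns the NULL rows get a NULL rank.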