Merging rows in SQL (Oracle) - sql

I'm looking to run an UPDATE, and a subsequent DELETE if necessary, query on my table (let's call it MY_TABLE) that would merge all rows in the following way.
Input Table
ID LowerRange UpperRange Attribute
1 10 20 A
2 20 30 A
3 40 50 A
4 15 35 B
Output table
ID LowerRange UpperRange Attribute
1 10 30 A
3 40 50 A
4 15 35 B
Notice how...
Rows 1 & 2 of the Input Table are merged into Row 1 of the Output Table because their ranges overlap and they have the same Attribute.
Row 3 of the Input Table is not merged with Rows 1 & 2 because their ranges don't overlap, despite them having the same Attribute.
Row 4 of the Input Table is not merged with Rows 1 & 2 because they don't have the same Attribute, despite having overlapping ranges.
All rows in MY_TABLE whose ranges overlap and that have the same Attribute should be merged in this way.
Let me know if you have any questions. Any help would be greatly appreciated.
Thanks,
Stephen.

Here's one way (assuming that you could have overlapping rows where the start range of the current row is less than or equal to the end range of the previous row):
with sample_data as (select 1 id, 10 lower_range, 20 upper_range, 'A' attribute from dual union all
select 2 id, 20 lower_range, 30 upper_range, 'A' attribute from dual union all
select 3 id, 40 lower_range, 50 upper_range, 'A' attribute from dual union all
select 4 id, 15 lower_range, 35 upper_range, 'B' attribute from dual union all
select 5 id, 45 lower_range, 55 upper_range, 'A' attribute from dual union all
select 6 id, 16 lower_range, 34 upper_range, 'B' attribute from dual)
select min(id) id,
min(lower_range) lower_range,
max(upper_range) upper_range,
attribute
from (select id,
lower_range,
upper_range,
attribute,
sum(diff) over (partition by attribute order by lower_range, upper_range) grp
from (select id,
lower_range,
upper_range,
attribute,
case when lag(upper_range, 1, lower_range) over (partition by attribute order by lower_range, upper_range) >= lower_range then 0 else 1 end diff
from sample_data))
group by attribute, grp;
ID LOWER_RANGE UPPER_RANGE ATTRIBUTE
---------- ----------- ----------- ---------
1 10 30 A
3 40 55 A
4 15 35 B
If your rows should only be merged when the previous upper_range is the same as the current lower_range, then just change the >= to = in the case expression.
What this does is check whether the previous row's upper_range is greater than or equal to the current row's lower_range. If it is, the rows overlap and we set the result to 0; otherwise we set it to 1 (which indicates that there is a gap between the two rows). The first row of each attribute has no previous row, so lag falls back to that row's own lower_range and the result is 0.
Next, we then perform a cumulative sum across all the rows per attribute. This then will have the same result for rows that overlap, and will increase by 1 every time it comes across a gap.
Now we can use this along with the attribute column to group the rows and find their min/max ranges along with the min(id).
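For readers who find the windowed query hard to follow, the same three steps (flag gaps, take a running sum of the flags, group by the running sum) can be sketched outside the database. This is only an illustration in Python, not part of the original answer; the function name merge_ranges and the tuple layout are assumptions. Like the query, it compares each row only with the immediately preceding row of the same attribute.

```python
from itertools import groupby

# Same rows as the sample_data CTE: (id, lower_range, upper_range, attribute)
rows = [
    (1, 10, 20, "A"), (2, 20, 30, "A"), (3, 40, 50, "A"),
    (4, 15, 35, "B"), (5, 45, 55, "A"), (6, 16, 34, "B"),
]


def merge_ranges(rows):
    """Merge overlapping ranges per attribute (hypothetical helper)."""
    merged = []
    # Equivalent of: partition by attribute order by lower_range, upper_range
    ordered = sorted(rows, key=lambda r: (r[3], r[1], r[2]))
    for attribute, partition in groupby(ordered, key=lambda r: r[3]):
        grp = 0            # running sum of the gap flags
        prev_upper = None  # plays the role of lag(upper_range)
        groups = {}
        for row_id, lower, upper, _ in partition:
            # A gap exists when the previous upper_range is below this lower_range.
            if prev_upper is not None and prev_upper < lower:
                grp += 1
            groups.setdefault(grp, []).append((row_id, lower, upper))
            prev_upper = upper
        # Equivalent of: group by attribute, grp with min/min/max aggregates
        for members in groups.values():
            merged.append((
                min(m[0] for m in members),
                min(m[1] for m in members),
                max(m[2] for m in members),
                attribute,
            ))
    return sorted(merged)


for row in merge_ranges(rows):
    print(row)
```

With the sample rows this prints the same three rows as the query output above.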

Check this out: Merge data in two row into one

Related

multiple top n aggregates query defined as a view (or function)?

I couldn't find a past question exactly like this problem. I have an orders table, containing a customer id, order date, and several numeric columns (how many of a particular item were ordered on that date). Removing some of the numeric columns, it looks like this:
customer_id date a b c d
0001 07/01/22 0 3 3 5
0001 07/12/22 12 0 50 0
0002 06/30/22 5 65 0 30
0002 07/20/22 1 0 19 2
0003 08/01/22 0 0 99 0
I need to sum each numeric column by customer_id, then return the top n customers for each column. Obviously that means a single customer may appear multiple times, once for each column. Assuming top 2, the desired output would look something like this:
column_ranked customer_id sum rank
'a' 0001 12 1
'a' 0002 6 2
'b' 0002 65 1
'b' 0001 3 2
'c' 0003 99 1
'c' 0001 53 2
'd' 0002 32 1
'd' 0001 5 2
(this assumes no date range filter)
My first thought was a CTE to collapse the table into its per-customer sums, then a union from the CTE, with a limit n clause, once for each summed column. That works if the date range is hard-coded into the CTE .... but I want to define this as a view, so it can be called by users something like this:
SELECT * from top_customers_view WHERE date_range BETWEEN ( date1 and date2 )
How can I pass the date restriction down to the CTE? Or am I taking the wrong approach entirely? If a view isn't possible, can it be done as a function? (without using a costly cursor, that is.)
Since the date ranges clearly produce a massive number of combinations you cannot generate a view with them. You can write a query, however, as shown below:
with
p as (select cast ('2022-01-01' as date) as ds, cast ('2022-12-31' as date) as de),
a as (
select top 10 customer_id, 'a' as col, sum(a) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
b as (
select top 10 customer_id, 'b' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
c as (
select top 10 customer_id, 'c' as col, sum(c) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
d as (
select top 10 customer_id, 'd' as col, sum(d) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
)
select * from a
union all select * from b
union all select * from c
union all select * from d
order by customer_id, col, s desc
The date range is in the second line.
See db<>fiddle.
Alternatively, you could create a data warehousing solution, but it would require much more effort to make it work.
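To make the intended result concrete, here is a small Python sketch of the same logic: filter by date range, sum each column per customer, then keep the top n customers per column. It is an illustration only, not a replacement for the query; the name top_customers, the tuple layout, and the reading of the sample dates as MM/DD/YY are assumptions.

```python
from collections import defaultdict
from datetime import date

COLUMNS = ("a", "b", "c", "d")

# Sample rows from the question: (customer_id, date, a, b, c, d)
orders = [
    ("0001", date(2022, 7, 1), 0, 3, 3, 5),
    ("0001", date(2022, 7, 12), 12, 0, 50, 0),
    ("0002", date(2022, 6, 30), 5, 65, 0, 30),
    ("0002", date(2022, 7, 20), 1, 0, 19, 2),
    ("0003", date(2022, 8, 1), 0, 0, 99, 0),
]


def top_customers(orders, start, end, n):
    """Return (column, customer_id, sum, rank) rows (hypothetical helper)."""
    totals = defaultdict(lambda: [0] * len(COLUMNS))
    for customer_id, order_date, *amounts in orders:
        if start <= order_date <= end:  # the date restriction
            for i, amount in enumerate(amounts):
                totals[customer_id][i] += amount
    result = []
    for i, col in enumerate(COLUMNS):
        # Highest sum first; ties are broken by customer_id for determinism.
        ranked = sorted(totals.items(), key=lambda kv: (-kv[1][i], kv[0]))[:n]
        for rank, (customer_id, sums) in enumerate(ranked, start=1):
            result.append((col, customer_id, sums[i], rank))
    return result


for row in top_customers(orders, date(2022, 1, 1), date(2022, 12, 31), 2):
    print(row)
```

Because the date range is an argument here, the sketch corresponds more closely to a parameterized function than to a view.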

JOIN by closer value to key

With the following sample data:
WITH values AS (
SELECT
1 AS shard,
2008 AS year,
1 AS value
UNION ALL
SELECT
1 AS shard,
20012 AS year,
2 AS value
UNION ALL
SELECT
2 AS shard,
2011 AS year,
3 AS value
UNION ALL
SELECT
2 AS shard,
1998 AS year,
4 AS value
UNION ALL
SELECT
2 AS shard,
2001 AS year,
5 AS value
UNION ALL
SELECT
4 AS shard,
1990 AS year,
6 AS value
ORDER BY year
),
data AS (
SELECT
1 AS id,
1 AS shard,
2010 AS year
UNION ALL
SELECT
1 AS id,
2 AS shard,
2000 AS year
UNION ALL
SELECT
1 AS id,
3 AS shard,
1990 AS year
UNION ALL
SELECT
2 AS id,
1 AS shard,
2010 AS year
UNION ALL
SELECT
2 AS id,
2 AS shard,
2000 AS year
UNION ALL
SELECT
2 AS id,
3 AS shard,
1990 AS year
)
I want to join my data collection with the values stored in the values collection. Data has an id which differentiates each process, so I want to perform the JOIN for each id. Also, the JOIN has a double mapping key, made up of the shard and year fields. I want to retrieve, for each entry in my data, the value of the CLOSEST year in my values collection that matches its shard attribute.
I have come up with this piece of code, but it is not working as expected: it doesn't consider the values.shard field, so it matches every year no matter which shard it is on.
SELECT *
FROM (
SELECT
data.id,
data.year,
values.year AS closer_year,
ABS(data.year - values.year) AS diff,
values.value,
ROW_NUMBER() OVER (PARTITION BY data.id, data.shard ORDER BY ABS(data.year - values.year)) AS rn
FROM data, values
)
WHERE rn = 1
For the sample data, the expected output should be:
id year closer_year diff value rn
1 2010 2008 2 1 1
1 2000 2001 1 5 1
1 1990 null null null 1
2 2010 2008 2 1 1
2 2000 2001 1 5 1
2 1990 null null null 1
What am I missing?
I found what I was missing just after posting the question. I will answer it in case anyone has a similar use case.
When rereading the text, I noticed that the "match the shard" property I was missing was indeed a left join, so rewriting the query like this solved the problem:
SELECT *
FROM (
SELECT
data.id,
data.year,
values.year AS closer_year,
ABS(data.year - values.year) AS diff,
values.value,
ROW_NUMBER() OVER (PARTITION BY data.id, data.shard ORDER BY ABS(data.year - values.year)) AS rn
FROM data
LEFT JOIN values
ON data.shard = values.shard
)
WHERE rn = 1
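The effect of the LEFT JOIN plus ROW_NUMBER can be checked against the sample data with a short Python sketch. This is an illustration only; the name closest_values and the tuple layout are assumptions. Rows whose shard has no entry in values keep None in the joined columns, as in the expected output.

```python
# Sample data from the question.
values = [  # (shard, year, value)
    (1, 2008, 1), (1, 20012, 2), (2, 2011, 3),
    (2, 1998, 4), (2, 2001, 5), (4, 1990, 6),
]
data = [  # (id, shard, year)
    (1, 1, 2010), (1, 2, 2000), (1, 3, 1990),
    (2, 1, 2010), (2, 2, 2000), (2, 3, 1990),
]


def closest_values(data, values):
    """For each data row, pick the same-shard value with the nearest year."""
    out = []
    for row_id, shard, year in data:
        # The join condition: data.shard = values.shard
        candidates = [
            (abs(year - v_year), v_year, value)
            for v_shard, v_year, value in values
            if v_shard == shard
        ]
        if candidates:
            # Smallest difference wins, like rn = 1; ties fall back to the lower year.
            diff, closer_year, value = min(candidates)
        else:
            # No matching shard: the LEFT JOIN keeps the row with nulls.
            diff = closer_year = value = None
        out.append((row_id, year, closer_year, diff, value))
    return out


for row in closest_values(data, values):
    print(row)
```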

How to find the last non null value of a column and recursively find the sum value of another column

Suppose I have a column A and the currently fetched value of A is null. I need to go back through the previous rows and find the last non-null value of column A. Then I need to find the sum of another column B from the row where that non-null value is seen up to the current row. After that I need to add the sum of B to A, which will be the new value of A.
For finding the column A non null value I have written the query as
nvl(last_value(nullif(A,0)) ignore nulls over (order by A),0)
But I need to do the calculation of B as mentioned above.
Can anyone please help me out ?
Sample data
A B date
null 20 14/06/2019
null 40 13/06/2019
10 50 12/06/2019
here value of A on 14/06/2019 should be replaced by sum of B + value of A on 12/06/2019(which is the 1st non null value of A)=20+40+50+10=120
If you have version 12c or higher:
with t( A,B, dte ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select * from t
match_recognize(
order by dte desc
measures
nvl(
first(a),
y.a + sum(b)
) as a,
first(b) as b,
first(dte) as dte
after match skip to next row
pattern(x* y{0,1})
define x as a is null,
y as a is not null
);
A B DTE
------ ---------- ----------
120 20 2019-14-06
100 40 2019-13-06
10 50 2019-12-06
Use conditional count to divide data into separate groups, then use this group for analytical calculation:
select a, b, dt, grp, sum(nvl(a, 0) + nvl(b, 0)) over (partition by grp order by dt) val
from (
select a, b, dt, count(case when a is not null then 1 end) over (order by dt) grp
from t order by dt desc)
order by dt desc
Sample result:
A      B  DT         GRP VAL
------ -- ---------- --- ---
(null) 20 2019-06-14   4 120
(null) 40 2019-06-13   4 100
10     50 2019-06-12   4  60
5       2 2019-06-11   3   7
6       1 2019-06-10   2   7
(null)  3 2019-06-09   1  14
7       4 2019-06-08   1  11
demo
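The conditional-count approach can be traced step by step with a small Python sketch. It is an illustration only; the name running_values is hypothetical, the rows are taken from the sample result above, and dates are assumed to be distinct.

```python
from datetime import date

# (a, b, dt) rows matching the sample result; None stands for null.
rows = [
    (7, 4, date(2019, 6, 8)),
    (None, 3, date(2019, 6, 9)),
    (6, 1, date(2019, 6, 10)),
    (5, 2, date(2019, 6, 11)),
    (10, 50, date(2019, 6, 12)),
    (None, 40, date(2019, 6, 13)),
    (None, 20, date(2019, 6, 14)),
]


def running_values(rows):
    """Return (dt, grp, val) per row, oldest first (hypothetical helper)."""
    out = []
    grp = 0      # count of non-null A values seen so far
    running = 0  # running sum of nvl(a, 0) + nvl(b, 0) within the group
    for a, b, dt in sorted(rows, key=lambda r: r[2]):
        if a is not None:
            # A non-null A starts a new group, so the sum restarts.
            grp += 1
            running = 0
        running += (a or 0) + (b or 0)
        out.append((dt, grp, running))
    return out


for dt, grp, val in reversed(running_values(rows)):
    print(dt, grp, val)
```

The loop reproduces the GRP and VAL columns shown in the sample result.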
I think what you want can be handled by using sum(<column>) over (...) together with the last_value over (...) function, as below:
with t( A,B, "date" ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select nvl(a,sum(b) over (order by 1)+
last_value(a) ignore nulls
over (order by 1 desc)
) as a,
b, "date"
from t;
A B date
--- -- ----------
120 20 14.06.2019
120 40 13.06.2019
10 50 12.06.2019
Demo

select case when number is distinct then 1 else 0

I want to get the expected output such that when a number appears for the first time within an account group, 1 is displayed; if it appears again, null or 0 is displayed. The same logic applies to the other account groups.
The logic I can think is
select *,
case when number happens first time then 1 else null
over (partition by account order by number) from table.
account number expected output
abc 20 1
abc 20 0
abc 30 1
def 20 1
def 30 1
def 30 0
use lag
select *,case when number=lag(number) over(partition by account order by number)
then 0 else 1 end as val
from table_name
You almost had it!
select account, number,
case when Row_Number() OVER (partition by account order by number) = 1 THEN 1 END ExpOut
from table
@PaulX's answer is close, but the partitioning isn't quite right. You can do:
-- CTE for sample data
with your_table (account, num) as (
select 'abc', 20 from dual
union all select 'abc', 20 from dual
union all select 'abc', 30 from dual
union all select 'def', 20 from dual
union all select 'def', 30 from dual
union all select 'def', 30 from dual
)
select account, num,
case when row_number() over (partition by account, num order by null) = 1
then 1
else 0
end as output
from your_table;
ACCOUNT NUM OUTPUT
------- ---------- ----------
abc 20 1
abc 20 0
abc 30 1
def 20 1
def 30 1
def 30 0
(adjusted for legal column names; hopefully you don't actually have quoted identifiers...)
If you want nulls rather than zeros then just leave out the else 0 part. And this just assumes by 'first' you mean the first returned in your result set, as otherwise - at least with the columns you showed - there is no obvious alternative. If you actually have other columns, particularly if you're using any others for the ordering of the result set, then you can apply the same ordering inside the partition clause to make it consistent.
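The idea of flagging the first occurrence within each (account, number) pair can also be shown with a short Python sketch. It is an illustration only; the name flag_first is hypothetical, and "first" here means first in the order the rows are supplied.

```python
rows = [
    ("abc", 20), ("abc", 20), ("abc", 30),
    ("def", 20), ("def", 30), ("def", 30),
]


def flag_first(rows):
    """Append 1 for the first time an (account, num) pair is seen, else 0."""
    seen = set()
    out = []
    for account, num in rows:
        key = (account, num)  # same grouping as partition by account, num
        out.append((account, num, 0 if key in seen else 1))
        seen.add(key)
    return out


for row in flag_first(rows):
    print(row)
```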

Oracle - Order by Alpha numeric

I need to order the rows in my result set by a column that holds varchar2 K-12.
Example:
ID Grade Expense
1 1 500
1 10 500
1 11 500
1 12 500
1 2 500
1 3 500
1 4 500
1 5 500
1 6 500
1 7 500
1 8 500
1 9 500
1 K 500
This is my order by clause which works, but I would like to have the
row with Grade = K as the first row for each ID in my result set.
order by ID, to_number(regexp_substr(grade, '^[[:digit:]]*'))
As it stands, the row with Grade = K comes last in the result set, not
first. How can I make it the first row for each ID in my result set?
ID Grade Expense
1 K 500
1 1 500
1 2 500
1 3 500
1 4 500
1 5 500
1 6 500
1 7 500
1 8 500
1 9 500
1 10 500
1 11 500
1 12 500
Thanks in advance
Simply use a case expression to map K to something below 1. This has the advantage that if you add a Pre-K later, you can modify the case to handle it as well.
With CTE as
(SELECT '1' as grade from dual union
SELECT '2' from dual union
select '10' from dual union
select 'K' from dual)
SELECT * FROM CTE
ORDER BY CASE GRADE when 'K' then -1
else to_number(regexp_substr(grade, '^[[:digit:]]*')) end
This is a bit of a kludge, but since the regex for 'K' returns null, change the order by to:
order by ID, nvl(to_number(regexp_substr(grade, '^[[:digit:]]*')),0)
This will return 0 for 'K' and sort it properly.
You can do the following:
WITH g1 AS (
SELECT 1 AS id, TO_CHAR(level) AS grade, 500 AS expense FROM dual
CONNECT BY level <= 12
UNION
SELECT 1, 'K', 500 FROM dual
UNION
SELECT 1, 'J', 500 FROM dual
)
SELECT g1.*, TO_NUMBER(REGEXP_SUBSTR(grade, '^\d+'))
, DECODE(grade, 'K', -1, TO_NUMBER(REGEXP_SUBSTR(grade, '^\d+')))
FROM g1
ORDER BY DECODE(grade, 'K', -1, TO_NUMBER(REGEXP_SUBSTR(grade, '^\d+'))) NULLS LAST
In this query I'm using CONNECT BY to build your grade table; of course you'll want to ignore that part. Note I added an extra row with a J for the grade level.
In my order by I am using DECODE() so that if grade = 'K', it will give a value of -1. For any grades that can be converted to numeric values (that is, if they start with at least one digit), I use a regex to get as many digits as possible (you can use [:digit:] or [0-9] in place of \d; but \d is nice and short).
I am specifying NULLS LAST so that any rows for which grade cannot be converted to a number, other than K, will be last.
I'm including the extra computed columns just to give a glimpse into what is actually going on and how the values are generated. They aren't needed for the query.
Please see SQL Fiddle demo here.
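The ordering rule described above (K first, then grades by their leading digits, then anything else last) can be expressed as a sort key. The following Python sketch is only an illustration of that rule; the name grade_sort_key is hypothetical.

```python
import re


def grade_sort_key(grade):
    """Sort key: 'K' first, numeric grades next, everything else last."""
    if grade == "K":
        return (0, -1)  # same trick as DECODE(grade, 'K', -1, ...)
    match = re.match(r"\d+", grade)  # leading digits, like '^\d+'
    if match:
        return (0, int(match.group()))
    return (1, 0)  # no leading digits: behaves like NULLS LAST


grades = ["1", "10", "11", "12", "2", "3", "4", "5", "6", "7", "8", "9", "K", "J"]
print(sorted(grades, key=grade_sort_key))
```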
Only change the ORDER BY clause, this way:
order by ID asc, decode(grade,'K',-1,grade) asc