Oracle - Fill null values in a column with values from another column - sql

I am using Oracle 11.1.1.9.0 and my goal is to fill the NULL values in the "Raw Materials" column with the first NOT NULL value for each product (A, B and C) in the Product column. An example table and the intended result are illustrated at the end of this request.
None of the code attempts below works:

CODE 1:
IFNULL(Raw Materials,
First_value(Raw Materials) OVER (PARTITION BY Product))

CODE 2:
IFNULL(Raw Materials, 
First_value(Raw Materials) OVER (PARTITION BY Product
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))

CODE 3:
COALESCE(lag(Raw Materials ignore null) OVER (partition by Product),
Raw Materials)
CODE 4:
IFNULL(Raw Materials, EVALUATE('LAG(%1, 1) OVER (PARTITION BY %2)' AS varchar2(20), Raw Materials, Product))
Note: the IFNULL function does work in this environment. It was tested with IFNULL(Raw Materials, '1'), which resulted in all null values in the Raw Materials column becoming 1.
Thank you.
+---------+----------+    +---------+----------+
| product | material |    | product | material |
+---------+----------+    +---------+----------+
| A       |          |    | A       | Apple    |
| A       |          |    | A       | Apple    |
| A       |          |    | A       | Apple    |
| A       |          |    | A       | Apple    |
| A       | Apple    |    | A       | Apple    |
| B       |          |    | B       | Orange   |
| B       |          |    | B       | Orange   |
| B       |          | => | B       | Orange   |
| B       |          |    | B       | Orange   |
| B       | Orange   |    | B       | Orange   |
| C       |          |    | C       | Banana   |
| C       |          |    | C       | Banana   |
| C       |          |    | C       | Banana   |
| C       |          |    | C       | Banana   |
| C       | Banana   |    | C       | Banana   |
+---------+----------+    +---------+----------+
Left is the example table data. Right is the intended result.
The Oracle Logical SQL manual linked below shows the code environment and samples of Oracle Logical SQL functions.
Oracle Logical SQL manual: https://docs.oracle.com/middleware/11119/biee/BIEUG/appsql.htm#CHDDCFJI

For your dataset, you could simply do a window MAX() or MIN():
NVL(Raw_Materials, MAX(Raw_Materials) OVER(PARTITION BY Product))
If you have a column that can be used to order the rows (I assumed id), you can use LAG() with the IGNORE NULLS clause:
NVL(Raw_Materials, LAG(Raw_Materials IGNORE NULLS) OVER(PARTITION BY Product ORDER BY id))
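For reference, here is a minimal, self-contained sketch of both approaches; the sample_data CTE and the id ordering column are hypothetical stand-ins for your table:
-- Inline sample data; "id" is an assumed ordering column
WITH sample_data AS (
  SELECT 1 AS id, 'A' AS product, CAST(NULL AS VARCHAR2(10)) AS raw_materials FROM dual UNION ALL
  SELECT 2, 'A', 'Apple'  FROM dual UNION ALL
  SELECT 3, 'B', NULL     FROM dual UNION ALL
  SELECT 4, 'B', 'Orange' FROM dual
)
SELECT id, product, raw_materials,
       -- fills every row with the partition's single non-NULL value
       NVL(raw_materials, MAX(raw_materials) OVER (PARTITION BY product)) AS fill_max,
       -- carries the last non-NULL value seen so far forward (IGNORE NULLS on LAG needs
       -- a reasonably recent Oracle version); rows before the first non-NULL stay NULL
       NVL(raw_materials, LAG(raw_materials IGNORE NULLS) OVER (PARTITION BY product ORDER BY id)) AS fill_lag
FROM sample_data
ORDER BY product, id;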

While you say that you are looking for some "first" value, your sample data suggests that you just want all rows of the same product to have the same material:
update mytable m1 set material =
(
select min(material)
from mytable m2
where m2.product = m1.product
);
If you just want to select this data, then you can use this:
select product, min(material) over (partition by product)
from mytable;
According to the docs (https://docs.oracle.com/cd/E28280_01/bi.1111/e10540/sqlref.htm#BIEMG678) it seems OBIEE uses a special syntax for analytic window functions (e.g. MIN() OVER()):
select
product,
evaluate('min(%1) over (partition by %2)', material, product)
from mytable;
You must enable this by setting EVALUATE_SUPPORT_LEVEL accordingly.
(I hope I got this right. Otherwise, read the docs on this and try something along those lines yourself.)

You can try the query below. We are using the FIRST_VALUE analytic function, since NULLIF, COALESCE, etc. work at row level, not across the rows of a column.
with temp as (select 'A' product,NULL raw_material from dual union all
select 'A',NULL from dual union all
select 'A',NULL from dual union all
select 'A',NULL from dual union all
select 'A','APPLE' from dual union all
select 'B',NULL from dual union all
select 'B',NULL from dual union all
select 'B',NULL from dual union all
select 'B',NULL from dual union all
select 'B','ORANGE' from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C','Banana' from dual)
select a.*,FIRST_VALUE(raw_material IGNORE NULLS)
OVER (partition by product ORDER BY product) first_product from temp a;

Oracle does not have an IFNULL function. Your code would have worked if you swapped IFNULL for COALESCE in either of your first two code snippets:
SELECT t.*,
COALESCE(
raw_material,
FIRST_VALUE(raw_material)
IGNORE NULLS
OVER ( PARTITION BY product )
) AS updated_raw_material
FROM test_data t;
Outputs:
PRODUCT | RAW_MATERIAL | UPDATED_RAW_MATERIAL
:------ | :----------- | :-------------------
A | null | Apple
A | null | Apple
A | null | Apple
A | Apple | Apple
B | null | Orange
B | null | Orange
B | null | Orange
B | null | Orange
B | Orange | Orange
C | null | Banana
C | null | Banana
C | null | Banana
C | null | Banana
C | null | Banana
C | Banana | Banana
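For completeness, a sketch of how a test_data table matching this output might be created (column names are taken from the query; the row set is abbreviated from the sample, so add the remaining rows the same way):
CREATE TABLE test_data (product, raw_material) AS
SELECT 'A', CAST(NULL AS VARCHAR2(10)) FROM dual UNION ALL
SELECT 'A', 'Apple'  FROM dual UNION ALL
SELECT 'B', NULL     FROM dual UNION ALL
SELECT 'B', 'Orange' FROM dual UNION ALL
SELECT 'C', NULL     FROM dual UNION ALL
SELECT 'C', 'Banana' FROM dual;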

Related

postgresql aggregate by max string length

I have a one to many relationship. In this case, it's a pipelines entity that can have many segments. The segments entity has a column to list the wells associated with this pipeline. This column is purely informational, and is only updated from a regulatory source as a comma separated list, so the data type is text.
What I want to do is to list all the pipelines and show the segment column that has the most associated wells. Each well is identified with a standardized land location (text is the same length for each well). I am also doing other aggregate functions on the segments, so my query looks something like this (I have to simplify it because it's pretty large):
SELECT pipelines.*, max(segments.associated_wells), min(segments.days_without_production), max(segments.production_water_m3)
FROM pipelines
JOIN segments ON segments.pipeline_id = pipelines.id
GROUP BY pipelines.id
This selects the associated_wells that has the highest alphabetical value, which makes sense, but is not what I want.
max(length(segments.associated_wells)) will select the record I want, but only show the length. I need to show the column value.
How can I aggregate based on the string length but show the value?
Here's an example of what I am expecting:
Segment entity:
| id | pipeline_id | associated_wells | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1 | 1 | 'location1', 'location2' | 30 | 2.3 |
| 2 | 1 | 'location1' | 15 | 1.4 |
| 3 | 2 | 'location1' | 20 | 1.8 |
Pipeline entity:
| id | name |
|----|-------------|
| 1 | 'Pipeline1' |
| 2 | 'Pipeline2' |
Desired Query Result:
| id | name | associated_wells | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1 | 'Pipeline1' | 'location1', 'location2' | 15 | 2.3 |
| 2 | 'Pipeline2' | 'location1' | 20 | 1.8 |
If I understand correctly, you want DISTINCT ON:
SELECT DISTINCT ON (p.id) p.*, s.*
FROM pipelines p JOIN
segments s
ON s.pipeline_id = p.id
ORDER BY p.id, LENGTH(s.associated_wells) DESC;
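Note that DISTINCT ON keeps whole rows, so days_without_production here comes from the longest-wells row (30 for Pipeline1), not the per-pipeline minimum shown in the desired result. If you need that mix, one possible sketch is to combine DISTINCT ON with window aggregates (untested against your real schema):
SELECT DISTINCT ON (p.id)
       p.id,
       p.name,
       s.associated_wells,
       -- window aggregates are computed before DISTINCT ON picks the surviving row
       MIN(s.days_without_production) OVER (PARTITION BY p.id) AS days_without_production,
       MAX(s.production_water_m3)     OVER (PARTITION BY p.id) AS production_water_m3
FROM pipelines p
JOIN segments s ON s.pipeline_id = p.id
ORDER BY p.id, LENGTH(s.associated_wells) DESC;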
Keep normalising: verticalise the locations (associated wells) by cross joining with a series of integers, then group twice:
WITH
segment(seg_id,pipeline_id,associated_wells,days_without_production,production_water_m3) AS (
SELECT 1,1,'location1, location2',30,2.3
UNION ALL SELECT 2,1,'location1',15,1.4
UNION ALL SELECT 3,2,'location1',20,1.8
)
,
pipeline(pipeline_id,name) AS (
SELECT 1,'Pipeline1'
UNION ALL SELECT 2,'Pipeline2'
)
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
location AS (
SELECT
seg_id
, i AS loc_id
, SPLIT_PART(associated_wells,', ',i) AS location
FROM segment CROSS JOIN i
WHERE SPLIT_PART(associated_wells, ', ', i) <> ''
)
,
pregroup AS (
SELECT
segment.pipeline_id
, location.location
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3) AS production_water_m3
FROM segment
JOIN pipeline USING(pipeline_id)
JOIN location USING(seg_id)
GROUP BY 1,2
)
SELECT
pipeline_id
, STRING_AGG(location,',') AS locations
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3) AS production_water_m3
FROM pregroup
GROUP BY 1;
pipeline_id | locations | days_without_production | production_water_m3
-------------+---------------------+-------------------------+---------------------
1 | location1,location2 | 15 | 2.3
2 | location1 | 20 | 1.8

BigQuery Standard SQL Group by aggregate multiple columns

Sample dataset:
|ownerId|category|aggCategory1|aggCategory2|
--------------------------------------------
| 1 | dog | animal | dogs |
| 1 | puppy | animal | dogs |
| 2 | daisy | flower | ignore |
| 3 | rose | flower | ignore |
| 4 | cat | animal | cats |
...
Looking to do a group by that gives the number of owners for each value across category, aggCategory1 and aggCategory2, for example outputting:
|# of owners|summaryCategory|
-----------------------------
| 1 | dog |
| 1 | puppy |
| 1 | daisy |
| 1 | rose |
| 1 | cat |
| 2 | animal |
| 2 | flower |
| 1 | dogs |
| 2 | ignore |
| 1 | cats |
Doesn't have to be that format but looking to get the above data points.
Thanks!
One method is to use union all to unpivot the data and then aggregate in an outer query:
SELECT category, COUNT(*)
FROM (SELECT ownerID, category
FROM t
UNION ALL
SELECT ownerID, aggCategory1
FROM t
UNION ALL
SELECT ownerID, aggCategory2
FROM t
) t
GROUP BY category
The more BigQuery'ish way to write this uses arrays:
SELECT cat, COUNT(*)
FROM t CROSS JOIN
UNNEST([category, aggcategory1, aggcategory2]) cat
GROUP BY cat;
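A self-contained version you can paste into the BigQuery console; COUNT(DISTINCT ownerId) is used so that, for example, owner 1's two 'animal' rows count as one owner, which is what the expected output shows:
WITH t AS (
  SELECT 1 AS ownerId, 'dog' AS category, 'animal' AS aggCategory1, 'dogs' AS aggCategory2 UNION ALL
  SELECT 1, 'puppy', 'animal', 'dogs' UNION ALL
  SELECT 2, 'daisy', 'flower', 'ignore' UNION ALL
  SELECT 3, 'rose', 'flower', 'ignore' UNION ALL
  SELECT 4, 'cat', 'animal', 'cats'
)
SELECT summaryCategory, COUNT(DISTINCT ownerId) AS num_owners
FROM t
CROSS JOIN UNNEST([category, aggCategory1, aggCategory2]) AS summaryCategory
GROUP BY summaryCategory
ORDER BY summaryCategory;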
SELECT COUNT(T.ownerID), T.category
FROM (
SELECT ownerID, category
FROM table
UNION
SELECT ownerID, aggCategory1
FROM table
UNION
SELECT ownerID, aggCategory2
FROM table
) AS T
GROUP BY T.category
With a GROUP BY over the union of all of your category columns, this works.
Use UNION inside a CTE (UNION removes duplicate owner/category pairs, so each owner is counted once per category):
with cte as
(
SELECT ownerID, category as summaryCategory
FROM table
UNION
SELECT ownerID, aggCategory1 as summaryCategory
FROM table
UNION
SELECT ownerID, aggCategory2 as summaryCategory
FROM table
)
select count(ownerID), summaryCategory
from cte
group by summaryCategory

Using Limit on Distinct group by values psql

Suppose I have a table that looks like this or maybe I am going nowhere.
create table customers (id text, name text, number int, useless text);
With values
insert into customers (id, name, number, useless)
values
('1','apple',1, 'a'),
('2','banana',3, 'b'),
('3','pear',2, 's'),
('4','apple',1,'e'),
('5','banana',3,'s'),
('6','cherry',3, 'a'),
('7','cherry',4, 's'),
('8','apple',2, 'd'),
('9','banana',4, 'c'),
('10','pear',5, 'e');
My failed psql query is this.
select id, name, number, useless
from customers
where number < 4
group by customers.name limit 2
I want the query to return the rows for the first 2 distinct customers.name values, not the first 2 rows.
In the end I want it to return
('1','apple',1, 'a'),
('4','apple',1,'e'),
('8','apple',2, 'd'),
('2','banana',3, 'b'),
('5','banana',3,'s'),
so it returns the first 2 grouped names.
How can I make this query?
Thank you.
Edit:
This query is my second try; I know I'm close.
select t.id, t.name, t.ranking
from (
SELECT id, name, dense_rank() OVER (order by name) as
ranking
FROM customers
group by name
) t
where t.ranking < 3
try this:
select id, name, number, useless
from customers
where name in (
select name
from customers
where number < 4
group by customers.name
order by name limit 2
)
| id | name | number | useless |
|----|--------|--------|---------|
| 1 | apple | 1 | a |
| 2 | banana | 3 | b |
| 4 | apple | 1 | e |
| 5 | banana | 3 | s |
| 8 | apple | 2 | d |
| 9 | banana | 4 | c |
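If you also want the number < 4 filter applied to the returned rows themselves (the desired output omits id 9, which has number = 4), a possible variation (untested):
select id, name, number, useless
from customers
where number < 4
  and name in (
    select name
    from customers
    where number < 4
    group by name
    order by name
    limit 2
  )
order by name, id;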
GROUP BY customers.name does not order your output; it just groups the rows by customers.name. What you want is to order the groups, right? So what I think you want to do is:
select id, name, number, useless
from customers
group by id, name, number, useless
order by name asc  -- or desc
asc = ascending, desc = descending, depending on which order you want.
Hope it helps you.
You can use dense_rank() as:
SELECT * FROM (
SELECT DENSE_RANK() OVER (order by name) AS rank, temp.*
FROM customers temp WHERE number < 4) data
WHERE data.rank <= 2
| rank| id| name | number | useless |
|-----|---|--------|--------|---------|
| 1 | 4 | apple | 1 | e |
| 1 | 1 | apple | 1 | a |
| 1 | 8 | apple | 2 | d |
| 2 | 5 | banana | 3 | s |
| 2 | 2 | banana | 3 | b |
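If you also want the rows presented grouped by name as in the desired result, a small variation with an explicit ORDER BY (untested):
SELECT data.*
FROM (
  SELECT DENSE_RANK() OVER (ORDER BY name) AS rank, temp.*
  FROM customers temp
  WHERE number < 4
) data
WHERE data.rank <= 2
ORDER BY data.name, data.id;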

Select query where record count = 2 and column contains either two values

Example 1
+--------------------------+
| IDENT | CURRENT | SOURCE |
+--------------------------+
| 12345 | 12345 | A |
| 23456 | 12345 | B |
| 34567 | 12345 | C |
+--------------------------+
Example 2
+--------------------------+
| IDENT | CURRENT | SOURCE |
+--------------------------+
| 12345 | 55555 | A |
| 23456 | 55555 | B |
+--------------------------+
Trying to write a select query that will show all records where the CURRENT count = 2 and SOURCE contains both A and B (NOT C).
Example 1 should not show up, as there are 3 entries for that CURRENT and the record is linked to SOURCE C.
Example 2 is what I'm looking for the query to find: the CURRENT has two records and is only linked to SOURCE 'A' and 'B'.
Currently, if I run something similar to "where SOURCE = A or SOURCE = B", the results are records that have just SOURCE A, or A+C.
NOTES: IDENT is always a unique value. CURRENT links multiple IDENTs from different SOURCEs.
We're clearly missing some information here. Let's take this example data (thanks gloomy for the initial fiddle).
| ID | CURRENT | SOURCE |
|----|---------|--------|
| 1 | 111 | A |
| 2 | 111 | B |
| 3 | 111 | C |
| 4 | 222 | A |
| 5 | 222 | B |
| 6 | 333 | A |
| 7 | 333 | C |
| 8 | 444 | B |
| 9 | 444 | C |
| 10 | 555 | B |
| 11 | 666 | A |
| 12 | 666 | A |
| 13 | 666 | B |
| 14 | 777 | A |
| 15 | 777 | A |
I assume you only need this as the result:
| ID | CURRENT | SOURCE |
|----|---------|--------|
| 4 | 222 | A |
| 5 | 222 | B |
This query will work with any number of sources and produce the expected output:
SELECT * FROM test
WHERE CURRENT IN (
SELECT CURRENT FROM test
WHERE CURRENT NOT IN (
SELECT CURRENT FROM test
WHERE SOURCE NOT IN ('A', 'B')
)
GROUP BY CURRENT
HAVING count(SOURCE) = 2 AND count(DISTINCT SOURCE) = 2
)
If SOURCE values are guaranteed to be unique per CURRENT:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(SOURCE) = 2
AND COUNT(CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
If SOURCE values aren't unique per CURRENT but CURRENTs with duplicate entries of 'A' or 'B' are allowed:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(DISTINCT SOURCE) = 2
AND COUNT(DISTINCT CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
If SOURCE values aren't unique and groups with duplicate SOURCE entries aren't allowed:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(SOURCE) = 2
AND COUNT(DISTINCT SOURCE) = 2
AND COUNT(DISTINCT CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
Every query returns only distinct CURRENT values matching the requirements. Use the query as a derived dataset and join it back to your table to get the details.
All the above options assume that either SOURCE is a NOT NULL column or that NULLs can just be disregarded.
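For example, a sketch of joining the first variant back to the table to get the detail rows (same atable name as above):
SELECT t.*
FROM atable t
JOIN (
  SELECT CURRENT
  FROM atable
  GROUP BY CURRENT
  HAVING COUNT(SOURCE) = 2
     AND COUNT(CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
) matched ON matched.CURRENT = t.CURRENT;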
Records where current count = 2:
SELECT CURRENT
FROM table
GROUP BY CURRENT
HAVING COUNT(*) = 2
Records where C is in SOURCE values:
SELECT CURRENT
FROM table
WHERE SOURCE = 'C'
Global query:
SELECT t.*
FROM TABLE t
WHERE t.CURRENT IN (
SELECT CURRENT
FROM table
GROUP BY CURRENT
HAVING COUNT(*) = 2
) AND t.CURRENT NOT IN (
SELECT CURRENT
FROM table
WHERE SOURCE = 'C'
)
http://sqlfiddle.com/#!2/69be9/8/0
select * from test where current in (
select test_a.current
from
(select *
from test
where source = 'A') as test_a
join (select *
from test
where source = 'B') as test_b
on test_b.current = test_a.current
where test_a.current not in
(select current from test where source='C')
)
SELECT *
FROM TABLE mainTbl,
(SELECT CURRENT
FROM TABLE
WHERE source IN ('A', 'B')
GROUP BY CURRENT
HAVING COUNT(1) = 2
) selectedSet
WHERE mainTbl.current = selectedSet.current
AND mainTbl.source IN ('A', 'B');

Running total of "matches" using a window function in SQL

I want to create a window function that will count how many times the value of the field in the current row appears in the part of the ordered partition coming before the current row. To make this more concrete, suppose we have a table like so:
| id| fruit | date |
+---+--------+------+
| 1 | apple | 1 |
| 1 | cherry | 2 |
| 1 | apple | 3 |
| 1 | cherry | 4 |
| 2 | orange | 1 |
| 2 | grape | 2 |
| 2 | grape | 3 |
And we want to create a table like so (omitting the date column for clarity):
| id| fruit | prior |
+---+--------+-------+
| 1 | apple | 0 |
| 1 | cherry | 0 |
| 1 | apple | 1 |
| 1 | cherry | 1 |
| 2 | orange | 0 |
| 2 | grape | 0 |
| 2 | grape | 1 |
Note that for id = 1, moving along the ordered partition, the first entry 'apple' doesn't match anything (since the implied set is empty), the next fruit, 'cherry' also doesn't match. Then we get to 'apple' again, which is a match and so on. I'm imagining the SQL looks something like this:
SELECT
id, fruit,
<some kind of INTERSECT?> OVER (PARTITION BY id ORDER by date) AS prior
FROM fruit_table;
But I cannot find anything that looks right. FWIW, I'm using PostgreSQL 8.4.
You could solve that without a window function rather elegantly with a self-left join and a count():
SELECT t.id, t.fruit, t.day, count(t0.*) AS prior
FROM tbl t
LEFT JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP BY t.id, t.day, t.fruit
ORDER BY t.id, t.day
I renamed the date column day because date is a reserved word in every SQL standard and in PostgreSQL.
I corrected a mistake in your sample data. The way you had it, it did not check out and might confuse people.
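A runnable sketch of the self-join with the sample data inlined in a CTE (PostgreSQL 8.4+; count(t0.id) is used instead of count(t0.*) simply because t0.id is NULL for the unmatched LEFT JOIN rows, so it counts the same thing):
WITH tbl (id, fruit, day) AS (
  VALUES
    (1, 'apple', 1), (1, 'cherry', 2), (1, 'apple', 3), (1, 'cherry', 4),
    (2, 'orange', 1), (2, 'grape', 2), (2, 'grape', 3)
)
SELECT t.id, t.fruit, t.day, count(t0.id) AS prior
FROM tbl t
LEFT JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP BY t.id, t.day, t.fruit
ORDER BY t.id, t.day;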
If your point is to do it with a window function, this one should work:
SELECT id, fruit, day
,count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM tbl
ORDER BY id, day
This works because, to quote the manual:
If frame_end is omitted it defaults to CURRENT ROW.
You effectively count how many rows had the same (id, fruit) on prior days - including the current row. That's what the - 1 is for.
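The same sample data with the window-function version, writing out the default frame explicitly (equivalent to omitting it):
WITH tbl (id, fruit, day) AS (
  VALUES
    (1, 'apple', 1), (1, 'cherry', 2), (1, 'apple', 3), (1, 'cherry', 4),
    (2, 'orange', 1), (2, 'grape', 2), (2, 'grape', 3)
)
SELECT id, fruit, day,
       count(*) OVER (PARTITION BY id, fruit
                      ORDER BY day
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS prior
FROM tbl
ORDER BY id, day;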