Filling null values with last not null one => huge number of columns - sql

I want to copy rows from SOURCE_TABLE to TARGET_TABLE while filling null values with last not null one.
What I have :
SOURCE_TABLE :
A | B | C | PARENT | SEQ_NUMBER
1 | 2 | NULL | 1 | 1
NULL | NULL | 1 | 1 | 2
NULL | 3 | 2 | 1 | 3
TARGET_TABLE is empty
What I want :
TARGET_TABLE :
A | B | C | PARENT | SEQ_NUMBER
1 | 2 | NULL | 1 | 1
1 | 2 | 1 | 1 | 2
1 | 3 | 2 | 1 | 3
To achieve that I'm dynamically generating the following SQL :
insert into TARGET_TABLE (A, B, C)
select coalesce(A, lag(A ignore nulls) over (partition by parent order by seq_number)) as A,
       coalesce(B, lag(B ignore nulls) over (partition by parent order by seq_number)) as B,
       coalesce(C, lag(C ignore nulls) over (partition by parent order by seq_number)) as C
from SOURCE_TABLE;
Everything works fine if SOURCE and TARGET tables have a small number of columns. In my case they have 400+ columns (yes this is bad but it is legacy and cannot be changed) and I got the following error :
ORA-01467: sort key too long
First, I don't really understand this error. Is it because I'm using too many lag functions, each with its own "partition by"/"order by"? If I replace coalesce(A, lag(A....)) with coalesce(A, A), the error disappears.
Second, is there a workaround or another way to achieve the same result?
Thx

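Just do it using a PL/SQL anonymous block:
declare
  x tgt_table%rowtype;  -- keeps the last non-null value of each column
begin
  for r in (select * from src_table order by parent, seq_number) loop
    -- forget the carried values whenever a new parent group starts
    if x.parent is null or x.parent <> r.parent then
      x := null;
    end if;
    x.parent := r.parent;
    x.seq_number := r.seq_number;
    x.a := nvl(r.a, x.a);
    ...
    x.z := nvl(r.z, x.z);
    insert into tgt_table values x;
  end loop;
end;
/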
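The same carry-forward idea, including a reset whenever PARENT changes, can be sketched outside Oracle. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database, lower-case table names, and a hypothetical helper called forward_fill; none of that comes from the question. It shows the logic of the row-by-row loop, not a way around ORA-01467 itself.

```python
import sqlite3

def forward_fill(rows, key_index, fill_indexes):
    """Carry the last non-null value forward within each key group.

    `rows` must already be sorted by (key, sequence). Hypothetical helper.
    """
    filled, last, current_key = [], {}, object()
    for row in rows:
        if row[key_index] != current_key:
            # New PARENT group: forget everything carried so far.
            current_key, last = row[key_index], {}
        row = list(row)
        for i in fill_indexes:
            if row[i] is None:
                row[i] = last.get(i)   # stays None if nothing seen yet
            else:
                last[i] = row[i]
        filled.append(tuple(row))
    return filled

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (a, b, c, parent, seq_number)")
conn.execute("CREATE TABLE target_table (a, b, c, parent, seq_number)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?, ?, ?)",
    [(1, 2, None, 1, 1), (None, None, 1, 1, 2), (None, 3, 2, 1, 3)],
)

src = conn.execute(
    "SELECT a, b, c, parent, seq_number FROM source_table "
    "ORDER BY parent, seq_number"
).fetchall()
conn.executemany(
    "INSERT INTO target_table VALUES (?, ?, ?, ?, ?)",
    forward_fill(src, key_index=3, fill_indexes=[0, 1, 2]),
)
result = conn.execute(
    "SELECT a, b, c, parent, seq_number FROM target_table "
    "ORDER BY parent, seq_number"
).fetchall()
print(result)
```

With 400+ columns the list of column positions would be generated rather than typed, just like the dynamic SQL in the question.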


How to delete duplicate rows based on one column in postgreSQL?

Say I have columns A, B and Date, and I want all rows which are duplicated in A to be removed, while keeping the one with the most recent Date. How would I do this?
I have looked at many other solutions but none seem to work for my case.
Thanks in advance for any help
This should work for you:
DELETE FROM YourTable USING
(SELECT colA, MAX(Date) maxDate
FROM YourTable
GROUP BY colA
) AS Keep
WHERE Keep.maxDate <> YourTable.Date
AND Keep.ColA = YourTable.ColA
These rows will stay:
t=# with sample(a,b,dat) as (values(1,1,1),(1,1,2),(1,2,3),(2,2,3),(2,2,4))
, comparison as (select *,max(dat) over (partition by a) from sample)
select *
from comparison
where dat = max;
a | b | dat | max
---+---+-----+-----
1 | 2 | 3 | 3
2 | 2 | 4 | 4
(2 rows)
and these are the ones to be deleted:
t=# with sample(a,b,dat) as (values(1,1,1),(1,1,2),(1,2,3),(2,2,3),(2,2,4))
, comparison as (select *,max(dat) over (partition by a) from sample)
delete
from comparison
where dat <> max
returning *;
a | b | dat | max
---+---+-----+-----
1 | 1 | 1 | 3
1 | 1 | 2 | 3
2 | 2 | 3 | 4
(3 rows)
Of course, instead of comparison you should use the name of your own table.
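For anyone who wants to try the "keep the newest row per A" delete without a PostgreSQL instance, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in, reuses the sample values from above in a table I named sample, and replaces DELETE ... USING with a correlated subquery because SQLite has no USING clause for DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (a INTEGER, b INTEGER, dat INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 2, 3), (2, 2, 3), (2, 2, 4)],
)

# Delete every row whose dat is not the maximum dat for its value of a.
deleted = conn.execute(
    """
    DELETE FROM sample
    WHERE dat <> (SELECT MAX(s2.dat) FROM sample AS s2 WHERE s2.a = sample.a)
    """
).rowcount

remaining = conn.execute("SELECT a, b, dat FROM sample ORDER BY a").fetchall()
print(deleted, remaining)
```

Note that if two rows share the same maximum date for one value of A, both survive; a tie-breaker column would be needed to pick exactly one.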

Group by 3 columns: "Each group by expression must contain at least one column that is not an outer reference"

I know questions regarding this error message have been asked already, but I couldn't find any that really fit my problem.
I have a table with three columns (A,B,C) containing different values and I need to identify all the identical combination. For example out of "TABLE A" below:
| A | B | C |
| 1 | 2 | 3 |
| 1 | 3 | 3 |
| 1 | 2 | 3 |
| 2 | 2 | 2 |
| 1 | 3 | 3 |
... I would like to get "TABLE B" below:
| A | B | C | count |
| 1 | 2 | 3 | 1 |
| 1 | 3 | 3 | 1 |
| 2 | 2 | 2 | 1 |
(I need the last column "count" with 1 in each row for later usage)
When I try with "group by A,B,C" I get the error mentioned in the title. Any help would be greatly appreciated!
FYI, I don't think it really changes the matter, but "TABLE A" is obtained from another table, "SOURCE_TABLE", with a query of this type:
select (case when ... ),(case when ...),(case when ...) from SOURCE_TABLE
and I need to build "TABLE B" with only one query.
I think what you are after is DISTINCT:
select distinct A,B,C, 1 [count] -- where 1 is a static value for later use
from (select ... from sourcetable) X
Sounds like you have the right idea. My guess is that the error is occurring due to an outer reference in your CASE statements. If you wrapped your first query in another query, it may alleviate this issue. Try:
SELECT A, B, C, COUNT(*) AS [UniqueRowCount]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C FROM SOURCE_TABLE
) AS Subquery
GROUP BY A, B, C
After re-reading your question, it seems that you're not counting at all, just putting a "1" after each distinct row. If that's the case, then you can try:
SELECT DISTINCT A, B, C, [Count]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
) AS Subquery
Assuming your outer reference exceptions were occurring in only your aggregations, you should also simply try:
SELECT DISTINCT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
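To see the difference between the two shapes (DISTINCT with a constant 1 versus GROUP BY with a real count), here is a small sketch run against the sample rows from the question. It assumes SQLite via Python's sqlite3 module rather than SQL Server, so it cannot reproduce the "outer reference" error itself, and the derived table simply selects from a table I called table_a instead of the elided CASE expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 3, 3), (1, 2, 3), (2, 2, 2), (1, 3, 3)],
)

# Distinct combinations with a constant 1, as asked for in the question.
distinct_rows = conn.execute(
    """
    SELECT DISTINCT a, b, c, 1 AS cnt
    FROM (SELECT a, b, c FROM table_a) AS x
    ORDER BY a, b, c
    """
).fetchall()

# Grouping over the derived table gives the real number of occurrences.
grouped_rows = conn.execute(
    """
    SELECT a, b, c, COUNT(*) AS unique_row_count
    FROM (SELECT a, b, c FROM table_a) AS x
    GROUP BY a, b, c
    ORDER BY a, b, c
    """
).fetchall()

print(distinct_rows)
print(grouped_rows)
```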

Oracle: Update with join not working as expected

Edit: The answer given by mathguy works perfectly. But I really wanted to understand why the first update statement isn't working while the second one does. I know that the (+) operator is not recommended, but in this case, as the second table was in a subquery, I had to use it.
Short Question. (Detailed explanation and Create/Insert statements below)
What is the difference between these 2 update statements, and why does the first one not work as expected whereas the second one does?
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t
where d.id=t.id(+)
);
And
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t,d_dim d1
where d1.id=t.id(+)
and d1.id=d.id
);
Detailed Explanation
I am trying to replicate a workplace scenario. Unfortunately SQLFiddle for Oracle is not working so couldn't create fiddle demo.
I have 2 tables, d_dim(ID,FLAG) and t_temp(ID) like below
select * from d_dim;
+----+------+
| ID | FLAG |
+----+------+
| 1 | |
| 2 | |
| 3 | |
| 4 | |
+----+------+
select * from t_temp;
+----+
| ID |
+----+
| 1 |
| 3 |
+----+
Now I need to set the FLAG in d_dim as Y or N.
If ID exists in t_temp then set it to Y. Else set it to N.
So expected output should be like.
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | N |
| 3 | Y |
| 4 | N |
+----+------+
This is the update statement I am using (using (+) because in this case I need a left join from d_dim to t_temp):
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t
where d.id=t.id(+)
)
--4 rows updated.
But ID 2 and 4 are updated as NULL.
select * from d_dim;
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | |
| 3 | Y |
| 4 | |
+----+------+
If I use just the select clause after plugging in d_dim table, I get correct output.
select d.id,
case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t,d_dim d
where d.id=t.id(+)
order by id
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | N |
| 3 | Y |
| 4 | N |
+----+------+
I did some trial and error and came up with this query, which seems to be working:
update d_dim d
set FLAG=(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t,d_dim d1
where d1.id=t.id(+)
and d1.id=d.id);
select * from d_dim;
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | N |
| 3 | Y |
| 4 | N |
+----+------+
So my question is: why doesn't the initial update statement work properly, and why does it set NULL for ids 2 and 4?
Please find the Create and Insert statements below
CREATE TABLE d_dim (id int, flag varchar2(4));
INSERT ALL INTO d_dim (id, flag) VALUES (1, NULL)
INTO d_dim (id, flag) VALUES (2, NULL)
INTO d_dim (id, flag) VALUES (3, NULL)
INTO d_dim (id, flag) VALUES (4, NULL)
SELECT * FROM dual;
CREATE TABLE t_temp (id int)
;
INSERT ALL
INTO t_temp (id) VALUES (1)
INTO t_temp (id) VALUES (3)
SELECT * FROM dual;
In the second query you have an outer join. In the first query you don't have any kind of join; you simply have a select from t, with a where clause where there is a (+) after t.id. I don't know why that syntax doesn't return an error; but when d.id doesn't exist in t, that subquery returns no rows, and that's how update works when the update value is supposed to be the output of a scalar subquery: if the subquery returns no rows, the update statement will update the field with NULL.
You didn't ask for a different way to make the update work, but if you want to see one, here it is. No doubt you know how to do this; offering it for the benefit of other forum members.
update d_dim
set FLAG = case when id in (select id from t_temp) then 'Y' else 'N' end;
EDIT: It seems the OP didn't fully understand my point so here are more details.
The Oracle documentation states explicitly:
• The (+) operator does not produce an outer join if you specify one table in the outer query and the other table in an inner query.
https://docs.oracle.com/cd/B28359_01/server.111/b28286/queries006.htm
(under the heading "Outer Joins", after the "See Also" box)
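The "scalar subquery returns no rows, so the column becomes NULL" behaviour is easy to reproduce outside Oracle. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module, and since SQLite has no (+) syntax the first statement uses a plain equality, which is what the Oracle statement effectively degrades to according to the documentation quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d_dim (id INTEGER, flag TEXT)")
conn.execute("CREATE TABLE t_temp (id INTEGER)")
conn.executemany("INSERT INTO d_dim VALUES (?, NULL)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO t_temp VALUES (?)", [(1,), (3,)])

# Correlated scalar subquery: for ids 2 and 4 it finds no row at all,
# so the CASE expression is never evaluated and flag is set to NULL.
conn.execute(
    """
    UPDATE d_dim
    SET flag = (SELECT CASE WHEN t.id IS NULL THEN 'N' ELSE 'Y' END
                FROM t_temp AS t
                WHERE t.id = d_dim.id)
    """
)
first_attempt = conn.execute("SELECT id, flag FROM d_dim ORDER BY id").fetchall()

# The CASE ... IN form always yields a value for every row of d_dim.
conn.execute(
    """
    UPDATE d_dim
    SET flag = CASE WHEN id IN (SELECT id FROM t_temp) THEN 'Y' ELSE 'N' END
    """
)
fixed = conn.execute("SELECT id, flag FROM d_dim ORDER BY id").fetchall()

print(first_attempt)
print(fixed)
```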
An update statement without a WHERE clause touches every row; what it writes into each row depends on what the subselect in the SET clause returns for that row.
When you mention
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t
where d.id=t.id(+)
);
The subquery returns a row only when d.id is equal to t.id. In your case, for id=2, d.id is 2 and t_temp has no row with id 2, so the subquery returns no rows and FLAG is set to NULL for id 2 (the row is still counted, which is why you see "4 rows updated").

Remove partial duplicates sql server

I am altering an existing view within SQL Server. My union statement creates something along the lines of:
# | Col1 | C2 | C3   | C4
--|------|----|------|-----
1 | A    | B  | NULL | NULL
2 | A    | B  | C    | NULL
3 | A    | B  | C    | D
4 | E    | F  | NULL | NULL
5 | E    | F  | G    | NULL
However, I only want (in this scenario) rows 3 and 5. I need to omit rows one and two because they contain duplicate info: their non-NULL columns hold the same values as row three, but row three is the most 'complete'. Row 5 is kept over row 4 for the same reason.
Is this an outer join / intersect issue? How the heck do you create a view in this manner?
Assuming that Col1 is never NULL, we can use ROW_NUMBER, ordering each Col1 group by the concatenation of all four columns so that the most complete row comes first:
;with cte
as
(
    select ROW_NUMBER() over (partition by Col1
                              order by (coalesce(Col1,'') +
                                        coalesce([C2],'') +
                                        coalesce([C3],'') +
                                        coalesce([C4],'')) desc) as seq,
           *
    from Table1
)
select * from cte
where seq = 1
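A quick way to check that the concatenation ordering really picks rows 3 and 5 is to run the same pattern on the sample data. This sketch assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, uses || instead of SQL Server's + for string concatenation, and a table I named table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("A", "B", None, None),
        ("A", "B", "C", None),
        ("A", "B", "C", "D"),
        ("E", "F", None, None),
        ("E", "F", "G", None),
    ],
)

# Within each col1 group, the longest concatenated string sorts first
# when ordering descending, so seq = 1 is the most complete row.
most_complete = conn.execute(
    """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY col1
                   ORDER BY COALESCE(col1, '') || COALESCE(c2, '') ||
                            COALESCE(c3, '') || COALESCE(c4, '') DESC
               ) AS seq,
               col1, c2, c3, c4
        FROM table1
    )
    SELECT col1, c2, c3, c4 FROM cte WHERE seq = 1 ORDER BY col1
    """
).fetchall()
print(most_complete)
```

This relies on every less complete row being a prefix of the most complete one, as in the example; rows in the same Col1 group that differ in C2 would need a different partition.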

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:
a and b refer to points in a different table. distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column I'm looking at is a_id. If I wanted to keep the closest b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from mytable group by a_id) as x where mytable.a_id = x.a_id and distance > dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However, I want to keep the 10 closest points for each a_id. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.
min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.
I'm using PostgreSQL.
Try this:
SELECT *
FROM (
    SELECT a_id, (
        -- the s-th closest b for this a (s = 0 is the closest)
        SELECT b_id
        FROM mytable mib
        WHERE mib.a_id = ma.a_id
        ORDER BY
            distance
        LIMIT 1 OFFSET s
    ) AS b_id
    FROM (
        SELECT DISTINCT a_id
        FROM mytable mia
    ) ma, generate_series (0, 9) s
) ab
WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
I love postgres, so I took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo"
Column | Type | Modifiers
--------+---------+-----------
value | integer |
With the values:
SELECT value FROM foo ORDER BY value;
value
-------
0
1
2
3
4
5
6
7
8
9
14
20
32
(13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:
SELECT a_id, distance FROM mytable t1
WHERE id IN
    (SELECT id FROM mytable t2 WHERE t2.a_id = t1.a_id
     ORDER BY distance LIMIT 10)
ORDER BY a_id, distance;
a_id | distance
------+----------
1 | 0.2345
1 | 1
1 | 100
2 | 0.45
2 | 110
2 | 1343.2
Does PostgreSQL have the analytic function rank()? If so try:
select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
  from mytable
) ranked  -- PostgreSQL wants an alias on a subquery in FROM
where rnk <= 10;
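To make the window-function approach concrete, here is a small sketch on the six sample rows from the question, keeping the 2 closest points per a_id instead of 10. It assumes SQLite 3.25+ via Python's sqlite3 module as a stand-in for PostgreSQL, and it uses ROW_NUMBER() with b_id as a tie-breaker instead of rank(), because rank() can return more than N rows when distances tie. The function name closest is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, a_id INTEGER, b_id INTEGER, distance REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 1.0), (2, 1, 2, 0.2345), (3, 1, 3, 100.0),
        (4, 2, 1, 1343.2), (5, 2, 2, 0.45), (6, 2, 3, 110.0),
    ],
)

def closest(conn, n):
    """Return the n closest (a_id, b_id, distance) rows for each a_id."""
    return conn.execute(
        """
        SELECT a_id, b_id, distance
        FROM (SELECT a_id, b_id, distance,
                     ROW_NUMBER() OVER (PARTITION BY a_id
                                        ORDER BY distance, b_id) AS rn
              FROM mytable) AS ranked
        WHERE rn <= ?
        ORDER BY a_id, distance
        """,
        (n,),
    ).fetchall()

top2 = closest(conn, 2)
print(top2)
```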
This SQL should find the Nth lowest value, and it should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS (note: performance is poor because of the correlated subquery):
SELECT * /*This is the outer query part */
FROM mytable tbl1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(tbl2.distance))
FROM mytable tbl2
WHERE tbl2.distance < tbl1.distance)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the tbl1 value as well.
In order to find the Nth lowest value, we just find the value that has exactly N-1 values lower than itself.
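The counting trick can be tried on the sample rows from the question. This sketch again assumes SQLite via Python's sqlite3 module; the function nth_lowest and its per_a_id switch are my own additions. The switch adds one extra correlation so that "lower" is counted only within the same a_id, which is closer to what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, a_id INTEGER, b_id INTEGER, distance REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 1.0), (2, 1, 2, 0.2345), (3, 1, 3, 100.0),
        (4, 2, 1, 1343.2), (5, 2, 2, 0.45), (6, 2, 3, 110.0),
    ],
)

def nth_lowest(conn, n, per_a_id=False):
    """Rows whose distance has exactly n-1 distinct lower distances."""
    # Optional extra condition: only compare rows that share the same a_id.
    same_group = "AND tbl2.a_id = tbl1.a_id" if per_a_id else ""
    sql = f"""
        SELECT tbl1.id, tbl1.distance
        FROM mytable AS tbl1
        WHERE ? - 1 = (SELECT COUNT(DISTINCT tbl2.distance)
                       FROM mytable AS tbl2
                       WHERE tbl2.distance < tbl1.distance {same_group})
        ORDER BY tbl1.id
    """
    return conn.execute(sql, (n,)).fetchall()

second_overall = nth_lowest(conn, 2)
second_per_a = nth_lowest(conn, 2, per_a_id=True)
print(second_overall, second_per_a)
```

Changing the comparison from "= N-1" to "< N" would keep the N lowest rows instead of only the Nth, at the same cost of one subquery evaluation per outer row.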