INSERT with VALUES and subquery, postgres - sql

I need to insert several rows at once; values for some columns should be taken from VALUES, and for some columns I would need to use a subquery. Assume that the subquery returns exactly the same number of rows as I'm trying to insert. Something like this:
WITH subq AS (
    -- returns 3 rows
    SELECT param FROM tbl2 WHERE status = 1
)
INSERT INTO tbl1 (col1, col2, col3)
VALUES
    ('col1_val1', 'col2_val1', subq.row1.param),
    ('col1_val2', 'col2_val2', subq.row2.param),
    ('col1_val3', 'col2_val3', subq.row3.param)

You can add the text values to the CTE and turn it into an INSERT INTO ... SELECT:
WITH CTE AS
(
    (SELECT 'col1_val1', 'col2_val1', param FROM tbl2 WHERE status = 1 ORDER BY param LIMIT 1 OFFSET 0)
    UNION ALL
    (SELECT 'col1_val2', 'col2_val2', param FROM tbl2 WHERE status = 1 ORDER BY param LIMIT 1 OFFSET 1)
    UNION ALL
    (SELECT 'col1_val3', 'col2_val3', param FROM tbl2 WHERE status = 1 ORDER BY param LIMIT 1 OFFSET 2)
)
INSERT INTO tbl1 (col1, col2, col3)
SELECT * FROM CTE;
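A more compact alternative (a sketch, not from the original answer) is to number both the VALUES list and the subquery rows and join on that number. The pairing is only deterministic if the subquery has an ORDER BY; ordering by param is an assumption here:
WITH vals (col1, col2, rn) AS (
    VALUES ('col1_val1', 'col2_val1', 1),
           ('col1_val2', 'col2_val2', 2),
           ('col1_val3', 'col2_val3', 3)
), subq AS (
    SELECT param,
           -- positional key; the ORDER BY that defines it is assumed
           row_number() OVER (ORDER BY param) AS rn
    FROM tbl2
    WHERE status = 1
)
INSERT INTO tbl1 (col1, col2, col3)
SELECT v.col1, v.col2, s.param
FROM vals v
JOIN subq s USING (rn);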


Merge into (in SQL), but ignore the duplicates

I am trying to merge two tables in Snowflake with:
ON CONCAT(tab1.column1, tab1.column2) = CONCAT(tab2.column1, tab2.column2)
The problem is that there are duplicates: rows where column1 and column2 are identical and the only difference is the timestamp column. I would therefore like two options: either ignore the duplicates and take only one row (the one with the largest timestamp), or distinguish them again based on the timestamp. The second would be nicer, but I have no clue how to do it.
Example:
Table1:
Col1 Col2 Col3 Timestamp
24 10 3 05.05.2022
34 19 2 04.05.2022
24 10 4 06.05.2022
Table2:
Col1 Col2 Col3
24 10 Null
34 19 Null
What I want to do:
MERGE INTO table1 AS dest USING
(SELECT * FROM table2) AS src
ON CONCAT(dest.col1, dest.col2) = CONCAT(src.col1, src.col2)
WHEN MATCHED THEN UPDATE
SET dest.col3 = src.col3
It feels like you want to update TABLE2 from TABLE1, not the other way around, because as your example stands there are no duplicates in table2.
It also feels like you want to use two equi-joins on col1 AND col2, not concatenate them together.
So, given how I read your data and your description, I think you should do this:
create or replace table table1(Col1 number, Col2 number, Col3 number, timestamp date);
insert into table1 values
(24, 10, 3, '2022-05-05'::date),
(34, 19, 2, '2022-05-04'::date),
(24, 10, 4, '2022-05-06'::date);
create or replace table table2(Col1 number, Col2 number, Col3 number);
insert into table2 values
(24, 10 ,Null),
(34, 19 ,Null);
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON d.col1 = s.col1 AND d.col2 = s.col2
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
which runs fine:
number of rows updated
2
select * from table2;
shows it has been updated:
COL1  COL2  COL3
24    10    4
34    19    2
But the JOIN can also be written your way, with CONCAT, if that is correct for your application, albeit it feels very wrong to me, because different value pairs can concatenate to the same string and produce false matches:
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON concat(d.col1, d.col2) = concat(s.col1, s.col2)
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
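To illustrate the collision risk (a made-up example, not from the original thread):
SELECT CONCAT(24, 10) AS a, CONCAT(2, 410) AS b;
-- both return '2410', so (24, 10) and (2, 410) would wrongly match;
-- a separator such as CONCAT(col1, '|', col2) avoids this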
This is it (note that updating a CTE like this is SQL Server syntax; Snowflake does not support UPDATE on a CTE):
WITH CTE AS
(
    SELECT *,
           RANK() OVER (PARTITION BY col1, col2
                        ORDER BY Timestamp DESC) AS rn
    FROM table1
)
UPDATE CTE
SET col3 = (SELECT col3 FROM table2
            WHERE CONCAT(table2.col1, table2.col2) = CONCAT(CTE.col1, CTE.col2))
WHERE CTE.rn = 1;
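For Snowflake itself, a sketch of the same idea in its UPDATE ... FROM form (same tables as above; only the most recent row per (col1, col2) is touched):
UPDATE table1 t
SET col3 = s.col3
FROM table2 s,
     -- latest timestamp per key, to single out the newest duplicate
     (SELECT col1, col2, MAX(timestamp) AS max_ts
      FROM table1
      GROUP BY col1, col2) m
WHERE t.col1 = s.col1 AND t.col2 = s.col2
  AND t.col1 = m.col1 AND t.col2 = m.col2
  AND t.timestamp = m.max_ts;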

Aggregate function COUNT not scalar

The COUNT function doesn't result in a scalar as expected:
CREATE TABLE MyTable (Col1 INT, Col2 INT, Col3 INT)
INSERT INTO MyTable VALUES(2,3,9) -- Row 1
INSERT INTO MyTable VALUES(1,5,7) -- Row 2
INSERT INTO MyTable VALUES(2,3,9) -- Row 3
INSERT INTO MyTable VALUES(3,4,9) -- Row 4
SELECT COUNT(*) AS Result
FROM MyTable
WHERE Col3=9
GROUP BY Col1, Col2
I filter down to the 3 rows where Col3=9.
In those 3 rows there are two groups:
Group 1 where Col1=2 AND Col2=3 (rows 1 and 3)
Group 2 where Col1=3 AND Col2=4 (row 4)
Finally I count those two groups.
Therefore, I expect the answer to be a scalar Result = 2 (the two groups where Col3=9).
But I get a non-scalar result.
There are other ways to solve this, so that's not the problem, but where is my thinking wrong?
Seems like you are looking for the total count of all the groups matching the condition. For this, try the following query:
SELECT COUNT(*) AS [Count] FROM
(
    SELECT COUNT(*) AS Result
    FROM MyTable
    WHERE Col3=9
    GROUP BY Col1, Col2
) T
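The inner query alone makes the problem visible: COUNT(*) with GROUP BY is evaluated once per group, so it returns one row per (Col1, Col2) group rather than a scalar:
SELECT COUNT(*) AS Result
FROM MyTable
WHERE Col3=9
GROUP BY Col1, Col2
-- Result
-- 2    (group Col1=2, Col2=3)
-- 1    (group Col1=3, Col2=4)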
You can also use a subquery with a single aggregation:
select count(*)
from (select distinct col1, col2
      from mytable
      where col3 = 9
     ) t;

Redshift sample from table based on count of another table

I have TableA of, say, 3000 rows (could be any number < 10000). I need to create TableX with 10000 rows. So I need to select (10000 - number of rows in TableA) random rows from TableB (and add in TableA as well) to create TableX. Any ideas?
Something like this (which obviously wouldn't work, since LIMIT cannot be computed from another table like this):
Create table TableX as
select * from TableA
union
select * from TableB limit (10000 - count(*) from TableA);
You could use union all and window functions. You did not list the table columns, so I assumed col1 and col2:
insert into tableX (col1, col2)
select col1, col2 from table1
union all
select t2.col1, t2.col2
from (select t2.*, row_number() over(order by random()) as rn from table2 t2) t2
inner join (select count(*) cnt from table1) t1 on t2.rn <= 10000 - t1.cnt
The first query in union all selects all rows from table1. The second query assigns random row numbers to rows in table2, and then selects as many rows as needed to reach a total of 10000.
Actually it might be simpler to tag each row with its source table, select all rows from both tables, then order by the tag and random() in the outer query and limit to 10000. Since 't1' sorts before 't2', every table1 row is kept and the remainder is filled with random table2 rows:
insert into tableX (col1, col2)
select col1, col2
from (
    select col1, col2, 't1' as which from table1
    union all
    select col1, col2, 't2' as which from table2
) t
order by which, random()
limit 10000
with inparms as (
    select 10000 as target_rows
), acount as (
    select count(*) as acount, inparms.target_rows
    from tablea
    cross join inparms
), btag as (
    select b.*, 'tableb' as tabsource,
           row_number() over (order by random()) as rnum
    from tableb
)
select a.*, 'tablea' as tabsource, row_number() over (order by 1) as rnum
from tablea a
union all
select b.*
from btag b
join acount a on b.rnum <= a.target_rows - a.acount;
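All three variants assume TableB has at least (10000 - count of TableA) rows; a quick sanity check after loading (using the tableX name from above):
select count(*) from tableX;
-- expect 10000, provided tableB had enough rows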

Match columns 1 if data not found then search column 2 oracle query

I am trying to find a way to search a table on one column and, if no data is found, fall back to searching on another column's value:
SELECT * FROM TABLE
WHERE COL1='123'
IF NULL
THEN
SELECT * FROM TABLE
WHERE COL2='ABC';
Thanks
This is a typical SQL select statement involving an OR expression.
SELECT * from TABLE WHERE Col1 = '123' or Col2 = 'ABC';
You want all rows that satisfy the first condition - but if no row matches, then you want all rows that satisfy the second condition.
I would address this with a row limiting clause (available starting with version 12c):
select *
from mytable
where 'ABC' in (col1, col2)
order by rank() over(order by case when col1 = 'ABC' then 1 else 2 end)
fetch first 1 row with ties
This is more efficient than union all because it does not require two scans on the table.
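Adapted to the question's two different search values, the same pattern would look like this (a sketch using the question's literals):
select *
from mytable
where col1 = '123' or col2 = 'ABC'
-- col1 matches rank first; if none exist, all col2 matches tie at rank 1
order by rank() over(order by case when col1 = '123' then 1 else 2 end)
fetch first 1 row with ties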
You can use exists with union all:
select t.*
from table t
where t.col1 = '123'
union all
select t.*
from table t
where t.col2 = 'ABC' and
      not exists (select 1 from table t1 where t1.col1 = '123');
If you are expecting only one row, you can use:
SELECT t.*
FROM TABLE t
WHERE COL1 = '123' OR COL2 = 'ABC'
ORDER BY (CASE WHEN COL1 = '123' THEN 1 ELSE 2 END)
FETCH FIRST 1 ROW ONLY;
With multiple possible rows in the result set, I would go for:
SELECT t.*
FROM TABLE t
WHERE COL1 = '123' OR
      (COL2 = 'ABC' AND
       NOT EXISTS (SELECT 1 FROM TABLE t2 WHERE t2.COL1 = '123'));

How to delete all records returned by a subquery?

I want to delete all records that are returned by a certain query, but I can't figure out a proper way to do this. I tried DELETE FROM mytable WHERE EXISTS (subquery); however, that deleted all records from the table, not just the ones selected by the subquery.
My subquery looks like this:
SELECT
MAX(columnA) as columnA,
-- 50 other columns
FROM myTable
GROUP BY
-- the 50 other columns above
having count(*) > 1;
This should be easy enough, but my mind is just stuck right now. I'm thankful for any suggestions.
Edit: columnA is not unique (also no other column in that table is globally unique)
Presumably, you want to use in. (Your attempt deleted everything because an uncorrelated EXISTS (subquery) is true for every row whenever the subquery returns any rows at all.)
DELETE FROM myTable
WHERE columnA IN (SELECT MAX(columnA) AS columnA
                  FROM myTable
                  GROUP BY -- the 50 other columns above
                  HAVING count(*) > 1
                 );
This assumes that columnA is globally unique in the table. Otherwise, you will have to work a bit harder.
DELETE FROM myTable t
WHERE EXISTS (SELECT 1
              FROM (SELECT MAX(columnA) AS columnA,
                           col1, col2, . . .
                    FROM myTable
                    GROUP BY -- the 50 other columns above
                    HAVING count(*) > 1
                   ) t2
              WHERE t.columnA = t2.columnA AND
                    t.col1 = t2.col1 AND
                    t.col2 = t2.col2 AND . . .
             );
And, even this isn't guaranteed to work if any of the columns have NULL values (although the conditions can be easily modified to handle this).
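As an aside (a sketch, not from the original answers): if the underlying goal is to remove duplicates while keeping one row per group, a rowid-based delete (Oracle; PostgreSQL has ctid as a rough equivalent) sidesteps both the uniqueness and the NULL-comparison problems, since rowids are always unique and non-NULL:
DELETE FROM myTable t
WHERE t.rowid NOT IN (
    -- keep one arbitrary row per group; GROUP BY treats NULLs as equal
    SELECT MIN(t2.rowid)
    FROM myTable t2
    GROUP BY t2.col1, t2.col2 -- ...the 50 grouping columns
);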
Another solution if the uniqueness is only guaranteed by a set of columns:
delete from table1 where (col1, col2, ...) in (
    select min(col1), col2, ...
    from table1
    where ...
    group by col2, ...
)
Null values will be ignored and not deleted, because NULL never compares equal inside IN. To handle NULLs as well, try something like:
with data (id, val1, val2) as
(
    select 1, '10', 10 from dual union all
    select 2, '20', 21 from dual union all
    select 2, null, 21 from dual union all
    select 2, '20', null from dual
)
-- map null values in the column to a value that cannot otherwise occur in it
select * from data d where (d.id, nvl(d.val1, '#<null>')) in
(select dd.id, nvl(dd.val1, '#<null>') from data dd)
If you need to delete all the rows of a table whose value in a given column is in the result of a query, you can use something like:
delete from mytable
where mycolumn in (select column from ...)