SQL insert query with duplicate ids

I have a table with an id column and a param column, and I am trying to add a certain param only where it does not already exist.
For example, my table is:
+--+-----+
|id|param|
+--+-----+
|2 |a    |
|2 |b    |
|3 |a    |
|3 |b    |
|4 |a    |
|4 |b    |
|4 |c    |
+--+-----+
Now I am trying to add the "c" param to all ids that don't have a "c" param.
How can I do it in one SQL query?
(The param I want to add is hard-coded, like "c" in the example, and I don't need to take it from any other table...)

You can do this with insert ... select. The select part just needs to find the ids that do not have that parameter:
insert into t (id, param)
select id, 'c'
from t
group by id
having sum(case when param = 'c' then 1 else 0 end) = 0;
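An equivalent formulation (just a sketch, assuming the same table t as above) uses not exists instead of group by / having:
insert into t (id, param)
select distinct id, 'c'   -- one new row per id that lacks 'c'
from t
where not exists (select 1
                  from t t2
                  where t2.id = t.id
                    and t2.param = 'c');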

You could do this with MERGE ... SELECT, inserting when there is no match:
MERGE INTO mytable t
USING (SELECT DISTINCT id, 'c'
       FROM mytable) AS i (key, val)
ON (t.id = i.key AND t.param = i.val)
WHEN NOT MATCHED THEN
    INSERT (id, param) VALUES (i.key, i.val);

Related

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1        |0      |
|a |1        |1      |
|b |0        |1      |
|c |1        |0      |
|c |0        |1      |
|c |1        |1      |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+------------+-----+
|ever treated|count|
+------------+-----+
|0           |1    |
|1           |3    |
+------------+-----+
First, identify the ids that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids that are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0        |2    |
|1        |4    |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id
        , max(treatment) AS ever_treated
        , count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM   t
   GROUP  BY 1
   ) sub
GROUP  BY 1;

 ever_treated | count
--------------+-------
            0 |     1
            1 |     3
db<>fiddle here
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
This would be simpler and faster with proper boolean values instead of integers.
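A sketch of that boolean variant, assuming treatment and outcome were boolean columns (ever_treated then comes out as true/false rather than 1/0):
SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id
        , bool_or(treatment)              AS ever_treated  -- assumes treatment is boolean
        , count(*) FILTER (WHERE outcome) AS outcome_ct    -- assumes outcome is boolean
   FROM   t
   GROUP  BY 1
   ) sub
GROUP  BY 1;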
(Answer to updated question)
Here is easy-to-follow subquery logic that works with integer values:
select subq.ever_treated, sum(subq.count) as count
from  (select id, max(treatment) as ever_treated, count(*) as count
       from   t
       where  outcome = 1
       group  by id) as subq
group by subq.ever_treated;

SQL count distinct values for each row

I have a table looking like this:
+-----+-----+
|Group|Value|
+-----+-----+
|A    |1    |
|B    |2    |
|C    |1    |
|D    |3    |
+-----+-----+
And I would like to add a column to my select command that counts the groups per value, looking like this:
+-----+-----+-----+
|Group|Value|COUNT|
+-----+-----+-----+
|A    |1    |2    |
|B    |2    |1    |
|C    |1    |2    |
|D    |3    |1    |
+-----+-----+-----+
Value 1 has the two groups A and C; each of the other values has one group in this example.
Additionally, is it possible to count over all values and groups even if a WHERE clause filters some of them out of the select query?
You want a window function:
select t.*, count(*) over (partition by value) as count
from t;
You have a problem if the query has a where clause: the where is applied before the window function, so filtered-out rows are no longer counted. In that case you need a subquery for the count:
select t.*
from (select t.*, count(*) over (partition by value) as count
      from t
     ) t
where . . .;
Or a correlated subquery might be convenient under some circumstances:
select t.*,
       (select count(*) from t t2 where t2.value = t.value) as count
from t
where . . .;
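To make the effect concrete, here is a sketch of the first (derived-table) form with a hypothetical filter: the count is computed over the whole table before the outer where removes rows, so group A still reports count = 2 even though C disappears from the result.
select *
from (select t.*, count(*) over (partition by value) as count
      from t
     ) t
where "group" <> 'C';   -- hypothetical filter; "group" is quoted because group is a reserved word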

How to explode an Array and create a view in Hive?

I have the following data where id is an Integer and vectors is an array:
id, vectors
1, [1,2,3]
2, [2,3,4]
3, [3,4,5]
I would like to explode the vectors column with its index positioning such that it looks like this:
+---+-----+------+
|id |index|vector|
+---+-----+------+
|1  |0    |1     |
|1  |1    |2     |
|1  |2    |3     |
|2  |0    |2     |
|2  |1    |3     |
|2  |2    |4     |
|3  |0    |3     |
|3  |1    |4     |
|3  |2    |5     |
+---+-----+------+
I figured that I can do this in Spark Scala using selectExpr:
df.selectExpr("*", "posexplode(vectors) as (index, vector)")
However, this is a relatively simple task and I would like to avoid writing ETL scripts; I was wondering whether the same expression could be used to create a view for easy access through Presto.
This is easy to do in Presto using standard SQL syntax with UNNEST:
WITH data(id, vector) AS (
    VALUES
        (1, array[1,2,3]),
        (2, array[2,3,4]),
        (3, array[3,4,5])
)
SELECT id, index - 1 AS index, value
FROM data, UNNEST(vector) WITH ORDINALITY AS t(value, index)
Note that the index produced by WITH ORDINALITY is 1-based, so I subtracted 1 from it to produce the output you included in your question.
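To get the view the question asks for, the same query can be wrapped in CREATE VIEW. This is only a sketch; the table name source_table, its vectors column, and the view name exploded_vectors are placeholders for your own names:
-- source_table and exploded_vectors are placeholder names
CREATE VIEW exploded_vectors AS
SELECT id, index - 1 AS index, value AS vector
FROM source_table
CROSS JOIN UNNEST(vectors) WITH ORDINALITY AS t(value, index);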
You can use Hive's lateral view to explode array data.
Try the query below:
select id,
       (row_number() over (partition by id order by col)) - 1 as `index`,
       col as vector
from (
      select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
      select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
      select 3 as id, array(3,4,5) as vectors from (select '1') t3
     ) t
LATERAL VIEW explode(vectors) v;
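If your Hive version has posexplode, it returns the position directly and avoids relying on row_number() ordering over the values; here is a sketch against the same inline data:
select id, pos as `index`, val as vector
from (
      select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
      select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
      select 3 as id, array(3,4,5) as vectors from (select '1') t3
     ) t
LATERAL VIEW posexplode(vectors) v as pos, val;  -- posexplode yields (position, value)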

Update each row with incremental value Postgres

I want to update every row in a table in Postgres and set each row to a different value; this value is going to be an incremental value with a start value.
For instance, suppose I have table tab_a with the following data:
|attr_a|attr_b|
|1     |null  |
|2     |null  |
|3     |null  |
|4     |null  |
The output I might want is:
|attr_a|attr_b|
|1     |5     |
|2     |6     |
|3     |7     |
|4     |8     |
Here is my script:
UPDATE tab_a
SET attr_b = gen.id
FROM generate_series(5,8) AS gen(id);
However, it is not working as expected...
You could do
UPDATE tab_a upd
SET attr_b = a.row_number + 4 -- or something like row_number + (select max(attr_a) from tab_a)
FROM (
    SELECT attr_a, row_number() over (ORDER BY attr_a)
    FROM tab_a
) a
WHERE upd.attr_a = a.attr_a;
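The variant hinted at in the comment, deriving the starting point from the current maximum of attr_a instead of hard-coding 4, would look roughly like this (a sketch):
UPDATE tab_a upd
SET attr_b = a.rn + (SELECT max(attr_a) FROM tab_a)  -- numbering starts at max(attr_a) + 1
FROM (
    SELECT attr_a, row_number() over (ORDER BY attr_a) AS rn
    FROM tab_a
) a
WHERE upd.attr_a = a.attr_a;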
Do something like this:
UPDATE pr_conf_item upd
SET item_order = a.row_number
FROM (
    SELECT id, row_number() over (ORDER BY id)
    FROM pr_conf_item
) a
WHERE upd.id = a.id;

Joining with null

I have two tables and I want to join them with null values in it.
Sample data of my first table(A_TEST):
+--+----+
|ID|NAME|
+--+----+
|  |a   |
|1 |b   |
|1 |c   |
+--+----+
Sample data of my second table(B_TEST):
+--+----+
|ID|NAME|
+--+----+
|1 |d   |
|2 |e   |
|3 |f   |
+--+----+
I need to achieve the result by joining on a_test.id = b_test.id, and if there are null values in it I need to fetch those rows too. So I tried to write the query below:
select a_test.id, a_test.name, b_test.id, b_test.name
from a_test, b_test
where (a_test.id = b_test.id
       or a_test.id is null);
I got the output below:
+--+----+--+----+
|ID|NAME|ID|NAME|
+--+----+--+----+
|  |a   |1 |d   |
|  |a   |2 |e   |
|  |a   |3 |f   |
|1 |b   |1 |d   |
|1 |c   |1 |d   |
+--+----+--+----+
But my expected result is different: since id 1 is present in a_test, I need only the corresponding row from b_test. See the output below:
+--+----+--+----+
|ID|NAME|ID|NAME|
+--+----+--+----+
|  |a   |1 |d   |
|1 |b   |1 |d   |
|1 |c   |1 |d   |
+--+----+--+----+
I tried with outer joins also but that also does not give me the expected output.
Your own query is almost correct (although you shouldn't use error-prone comma-separated joins, which went out of fashion some twenty years ago). You are only missing the condition for what must match when a_test.id is null (which is: the b_test.id must be present in table a_test).
select a.id   as a_id,
       a.name as a_name,
       b.id   as b_id,
       b.name as b_name
from   a_test a
join   b_test b
  on   (a.id = b.id)
  or   (a.id is null and b.id in (select id from a_test));
SQL fiddle: http://www.sqlfiddle.com/#!4/fae22/2.
However strange and meaningless your requirement is, this query gives you your expected result:
select A.*, B.*
from   a_test A
join   b_test B on A.id = B.id
union all
select A.*, B.*
from   a_test A
cross join b_test B
where  A.id is null
and    exists (
           select 1
           from   a_test Ax
           where  Ax.id = B.id
       )
order by 2, 4;
Enjoy!
If a NULL a_test.id is supposed to be treated as 1 when joining, use COALESCE and a sub-query to find the replacement value (to be figured out by yourself; just make sure it doesn't return more than one row):
select a_test.id,a_test.name,b_test.id,b_test.name
from a_test,b_test
where COALESCE(a_test.id,(select integervalue from sometable)) = b_test.id
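For example, if the rule were simply that a NULL id should be treated as 1 (a hypothetical rule, just to make the sketch concrete):
select a_test.id, a_test.name, b_test.id, b_test.name
from a_test
join b_test on coalesce(a_test.id, 1) = b_test.id;  -- 1 is the assumed replacement value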
select a_test.id,a_test.name,b_test.id,b_test.name
from a_test,b_test
where a_test.id = b_test.id(+)
But what do you want to see when a_test.id is null or missing?
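For reference, the (+) marker in that last query is Oracle's legacy outer-join notation; the equivalent ANSI form would be a left outer join along these lines:
select a_test.id, a_test.name, b_test.id, b_test.name
from a_test
left join b_test on a_test.id = b_test.id;  -- ANSI equivalent of the (+) query above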