SQL insert query with duplicate ids

I have a table with an id column and a param column, and I am trying to add a certain param only where it does not already exist.
For example, my table is:
+--+-----+
|id|param|
+--+-----+
|2 |a    |
|2 |b    |
|3 |a    |
|3 |b    |
|4 |a    |
|4 |b    |
|4 |c    |
+--+-----+
Now I am trying to add the "c" param to all ids that don't have a "c" param.
How can I do it in one SQL query?
(The param I want to add is hard-coded, like "c" in the example, and I don't need to take it from any other table...)

You can do this with insert ... select. The select part just needs to find the ids that do not have that parameter:
insert into t (id, param)
select id, 'c'
from t
group by id
having sum(case when param = 'c' then 1 else 0 end) = 0;
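An equivalent formulation (just a sketch, assuming the same table t as above) uses not exists instead of group by / having:
insert into t (id, param)
select distinct id, 'c'   -- one new row per id that lacks 'c'
from t
where not exists (select 1
                  from t t2
                  where t2.id = t.id
                    and t2.param = 'c');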

You could do this with MERGE ... SELECT, inserting when there is no match:
MERGE INTO mytable t
USING (SELECT DISTINCT id, 'c'
       FROM mytable) AS i (key, val)
ON (t.id = i.key AND t.param = i.val)
WHEN NOT MATCHED THEN
    INSERT (id, param) VALUES (i.key, i.val);

Related

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1        |0      |
|a |1        |1      |
|b |0        |1      |
|c |1        |0      |
|c |0        |1      |
|c |1        |1      |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+------------+-----+
|ever treated|count|
+------------+-----+
|0           |1    |
|1           |3    |
+------------+-----+
First, identify the ids that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids that are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0        |2    |
|1        |4    |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id
        , max(treatment) AS ever_treated
        , count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM   t
   GROUP  BY 1
   ) sub
GROUP  BY 1;

 ever_treated | count
--------------+-------
            0 |     1
            1 |     3
db<>fiddle here
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
This would be simpler and faster with proper boolean values instead of integers.
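A sketch of that boolean variant, assuming treatment and outcome were boolean columns (ever_treated then comes out as true/false rather than 1/0):
SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id
        , bool_or(treatment)              AS ever_treated  -- assumes treatment is boolean
        , count(*) FILTER (WHERE outcome) AS outcome_ct    -- assumes outcome is boolean
   FROM   t
   GROUP  BY 1
   ) sub
GROUP  BY 1;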
(Answer to updated question)
Here is easy-to-follow subquery logic that works with integer values:
select subq.ever_treated, sum(subq.count) as count
from  (select id, max(treatment) as ever_treated, count(*) as count
       from   t
       where  outcome = 1
       group  by id) as subq
group by subq.ever_treated;

SQL count distinct values for each row

I have a table looking like this:
+-----+-----+
|Group|Value|
+-----+-----+
|A    |1    |
|B    |2    |
|C    |1    |
|D    |3    |
+-----+-----+
And I would like to add a column to my select command that counts the groups per value, looking like this:
+-----+-----+-----+
|Group|Value|COUNT|
+-----+-----+-----+
|A    |1    |2    |
|B    |2    |1    |
|C    |1    |2    |
|D    |3    |1    |
+-----+-----+-----+
Value 1 has the two groups A and C; each of the other values has one group in this example.
Additionally, is it possible to count over all values and groups even if a WHERE clause filters some of them out of the select query?
You want a window function:
select t.*, count(*) over (partition by value) as count
from t;
You have a problem if the query has a where clause: the where is applied before the window function, so filtered-out rows are no longer counted. In that case you need a subquery for the count:
select t.*
from (select t.*, count(*) over (partition by value) as count
      from t
     ) t
where . . .;
Or a correlated subquery might be convenient under some circumstances:
select t.*,
       (select count(*) from t t2 where t2.value = t.value) as count
from t
where . . .;
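To make the effect concrete, here is a sketch of the first (derived-table) form with a hypothetical filter: the count is computed over the whole table before the outer where removes rows, so group A still reports count = 2 even though C disappears from the result.
select *
from (select t.*, count(*) over (partition by value) as count
      from t
     ) t
where "group" <> 'C';   -- hypothetical filter; "group" is quoted because group is a reserved word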

How to explode an Array and create a view in Hive?

I have the following data where id is an Integer and vectors is an array:
id, vectors
1, [1,2,3]
2, [2,3,4]
3, [3,4,5]
I would like to explode the vectors column with its index positioning such that it looks like this:
+---+-----+------+
|id |index|vector|
+---+-----+------+
|1  |0    |1     |
|1  |1    |2     |
|1  |2    |3     |
|2  |0    |2     |
|2  |1    |3     |
|2  |2    |4     |
|3  |0    |3     |
|3  |1    |4     |
|3  |2    |5     |
+---+-----+------+
I figured that I can do this in Spark Scala using selectExpr:
df.selectExpr("*", "posexplode(vectors) as (index, vector)")
However, this is a relatively simple task and I would like to avoid writing ETL scripts; I was wondering whether the same expression could be used to create a view for easy access through Presto.
This is easy to do in Presto using standard SQL syntax with UNNEST:
WITH data(id, vector) AS (
    VALUES
        (1, array[1,2,3]),
        (2, array[2,3,4]),
        (3, array[3,4,5])
)
SELECT id, index - 1 AS index, value
FROM data, UNNEST(vector) WITH ORDINALITY AS t(value, index)
Note that the index produced by WITH ORDINALITY is 1-based, so I subtracted 1 from it to produce the output you included in your question.
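To get the view the question asks for, the same query can be wrapped in CREATE VIEW. This is only a sketch; the table name source_table, its vectors column, and the view name exploded_vectors are placeholders for your own names:
-- source_table and exploded_vectors are placeholder names
CREATE VIEW exploded_vectors AS
SELECT id, index - 1 AS index, value AS vector
FROM source_table
CROSS JOIN UNNEST(vectors) WITH ORDINALITY AS t(value, index);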
You can use Hive's lateral view to explode array data.
Try the query below:
select id,
       (row_number() over (partition by id order by col)) - 1 as `index`,
       col as vector
from (
      select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
      select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
      select 3 as id, array(3,4,5) as vectors from (select '1') t3
     ) t
LATERAL VIEW explode(vectors) v;
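If your Hive version has posexplode, it returns the position directly and avoids relying on row_number() ordering over the values; here is a sketch against the same inline data:
select id, pos as `index`, val as vector
from (
      select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
      select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
      select 3 as id, array(3,4,5) as vectors from (select '1') t3
     ) t
LATERAL VIEW posexplode(vectors) v as pos, val;  -- posexplode yields (position, value)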

Update each row with incremental value Postgres

I want to update every row in a table in Postgres and set each row to a different value; this value is going to be an incremental value with a start value.
For instance, suppose I have table tab_a with the following data:
|attr_a|attr_b|
|1     |null  |
|2     |null  |
|3     |null  |
|4     |null  |
The output I might want is:
|attr_a|attr_b|
|1     |5     |
|2     |6     |
|3     |7     |
|4     |8     |
Here is my script:
UPDATE tab_a
SET attr_b = gen.id
FROM generate_series(5,8) AS gen(id);
However, it is not working as expected...
You could do
UPDATE tab_a upd
SET attr_b = a.row_number + 4 -- or something like row_number + (select max(attr_a) from tab_a)
FROM (
    SELECT attr_a, row_number() over (ORDER BY attr_a)
    FROM tab_a
) a
WHERE upd.attr_a = a.attr_a;
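The variant hinted at in the comment, deriving the starting point from the current maximum of attr_a instead of hard-coding 4, would look roughly like this (a sketch):
UPDATE tab_a upd
SET attr_b = a.rn + (SELECT max(attr_a) FROM tab_a)  -- numbering starts at max(attr_a) + 1
FROM (
    SELECT attr_a, row_number() over (ORDER BY attr_a) AS rn
    FROM tab_a
) a
WHERE upd.attr_a = a.attr_a;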
Do something like this:
UPDATE pr_conf_item upd
SET item_order = a.row_number
FROM (
    SELECT id, row_number() over (ORDER BY id)
    FROM pr_conf_item
) a
WHERE upd.id = a.id;

Joining with null

I have two tables and I want to join them with null values in it.
Sample data of my first table(A_TEST):
+--+----+
|ID|NAME|
+--+----+
|  |a   |
|1 |b   |
|1 |c   |
+--+----+
Sample data of my second table(B_TEST):
+--+----+
|ID|NAME|
+--+----+
|1 |d   |
|2 |e   |
|3 |f   |
+--+----+
I need to achieve the result by joining on a_test.id = b_test.id, and if there are null values in it I need to fetch those rows too. So I tried to write the query below:
select a_test.id, a_test.name, b_test.id, b_test.name
from a_test, b_test
where (a_test.id = b_test.id
       or a_test.id is null);
I got the output below:
+--+----+--+----+
|ID|NAME|ID|NAME|
+--+----+--+----+
|  |a   |1 |d   |
|  |a   |2 |e   |
|  |a   |3 |f   |
|1 |b   |1 |d   |
|1 |c   |1 |d   |
+--+----+--+----+
But my expected result is different: since id 1 is present in a_test, I need only the corresponding row from b_test. See the output below:
+--+----+--+----+
|ID|NAME|ID|NAME|
+--+----+--+----+
|  |a   |1 |d   |
|1 |b   |1 |d   |
|1 |c   |1 |d   |
+--+----+--+----+
I tried with outer joins also but that also does not give me the expected output.
Your own query is almost correct (although you shouldn't use error-prone comma-separated joins, which went out of fashion some twenty years ago). You are only missing the condition for what must match when a_test.id is null (which is: the b_test.id must be present in table a_test).
select a.id   as a_id,
       a.name as a_name,
       b.id   as b_id,
       b.name as b_name
from   a_test a
join   b_test b
  on   (a.id = b.id)
  or   (a.id is null and b.id in (select id from a_test));
SQL fiddle: http://www.sqlfiddle.com/#!4/fae22/2.
However strange and meaningless your requirement is, this query gives you your expected result:
select A.*, B.*
from   a_test A
join   b_test B on A.id = B.id
union all
select A.*, B.*
from   a_test A
cross join b_test B
where  A.id is null
and    exists (
           select 1
           from   a_test Ax
           where  Ax.id = B.id
       )
order by 2, 4;
Enjoy!
If a NULL a_test.id is supposed to be treated as 1 when joining, use COALESCE and a sub-query to find the replacement value (to be figured out by yourself; just make sure it doesn't return more than one row):
select a_test.id,a_test.name,b_test.id,b_test.name
from a_test,b_test
where COALESCE(a_test.id,(select integervalue from sometable)) = b_test.id
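For example, if the rule were simply that a NULL id should be treated as 1 (a hypothetical rule, just to make the sketch concrete):
select a_test.id, a_test.name, b_test.id, b_test.name
from a_test
join b_test on coalesce(a_test.id, 1) = b_test.id;  -- 1 is the assumed replacement value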
select a_test.id,a_test.name,b_test.id,b_test.name
from a_test,b_test
where a_test.id = b_test.id(+)
But what do you want to see when a_test.id is null or missing?
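For reference, the (+) marker in that last query is Oracle's legacy outer-join notation; the equivalent ANSI form would be a left outer join along these lines:
select a_test.id, a_test.name, b_test.id, b_test.name
from a_test
left join b_test on a_test.id = b_test.id;  -- ANSI equivalent of the (+) query above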