Update json in postgres using multiple columns - sql

I want to update a column of JSONB objects. Say I have a table test with a column a and a jsonb column b, and I want to delete the keys value1 and value2 from the rows where a is 1. I thought this query would work:
UPDATE test AS d
SET b = b - s.b_value
FROM (
    VALUES
        (1, 'value1'),
        (1, 'value2')
) AS s(a, b_value)
WHERE d.a = s.a
but it gives me a result where value1 was not eliminated.
Is there a simple way to fix it? I want a query to delete this sort of stuff, and it would be a blessing if it can be done in only one query.

You can subtract a text[] array of keys from a jsonb value. The problem with your query is that when several rows from the FROM clause match the same target row, UPDATE applies only one of them, so only one key is removed. Aggregate the keys per value of a first, then subtract the whole array at once:
with s (a, b_value) as (
    values (1, 'value1'), (1, 'value2')
),
dels as (
    select a, array_agg(b_value) as b_values
    from s
    group by a
)
update test
set b = b - dels.b_values
from dels
where dels.a = test.a;
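For reference, the jsonb - text[] operator (available since PostgreSQL 10, as far as I know) deletes all the listed keys in one operation; a minimal standalone check:
-- returns {"value3": 3}
select '{"value1": 1, "value2": 2, "value3": 3}'::jsonb - array['value1', 'value2'];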


Select UNIQUE, NOT DISTINCT values

I am trying to select values from a table that are not duplicates - for example, with the following input set, I would like to select only the values in Column 1 that don't have a duplicated value in Column 2
Column 1 Column 2
A X
B X
C Y
D Y
E Z
Resulting in
Column 1 Column 2
E Z
This is made harder by my having a character limit for my SQL statement, and my having to join a couple of tables in the same query.
My existing statement is here, and this is where I am stuck.
SELECT d.o_docguid, d.o_itemdesc
FROM dms_doc d
INNER JOIN
(SELECT s.o_itemno as si, s.o_projectno as sp, t.o_itemno as ti, t.o_projectno as tp
FROM env_bs1192_1 s, env_bs1192_2 t
WHERE s.TB_FILE_ID = t.TB_FILE_ID) as r
ON (si = d.o_itemno AND sp = d.o_projectno)
OR (ti = d.o_itemno AND tp = d.o_projectno)
Results look like
o_docguid o_itemdesc
aguid adescription
bguid adescription
cguid bdescription
I want to filter this list out such that all that remains are the unique descriptions and their associated guid (i.e. only the rows that have specifically a single unique entry in the description, or put another way, if there is a duplicate, throw both away - in this instance, cguid and bdescription should be the only results).
The last challenge, which I still haven't solved, is that this SQL statement needs to fit into a character limit of 242 characters.
Taking the first part as a question, the answer might be:
declare @Table table (Column1 char(1), Column2 char(1));
insert into @Table values
('A', 'X'),
('B', 'X'),
('C', 'Y'),
('D', 'Y'),
('E', 'Z');
select
    Column1 = max(Column1),
    Column2
from @Table
group by Column2
having count(*) = 1;
Here is how to do it with generic data:
DROP TABLE IF EXISTS #MyTable
CREATE TABLE #MyTable(Column1 VARCHAR(50),Column2 VARCHAR(50))
INSERT INTO #MyTable(Column1,Column2)
VALUES
('A','X'),
('B','X'),
('C','Y'),
('D','Y'),
('E','Z')
;WITH UniqueCol2 AS
(
SELECT Column2
FROM #MyTable
GROUP BY Column2
HAVING COUNT(*) = 1
)
SELECT
mt.*
FROM UniqueCol2
JOIN #MyTable mt ON mt.Column2 = UniqueCol2.Column2
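Applied back to the original joined query, the same GROUP BY/HAVING pattern would look something like the sketch below (assuming o_itemdesc is the column to de-duplicate on; squeezing it under the 242-character limit is left as an exercise in shorter aliases):
SELECT MAX(d.o_docguid) AS o_docguid, d.o_itemdesc
FROM dms_doc d
INNER JOIN
(SELECT s.o_itemno as si, s.o_projectno as sp, t.o_itemno as ti, t.o_projectno as tp
FROM env_bs1192_1 s, env_bs1192_2 t
WHERE s.TB_FILE_ID = t.TB_FILE_ID) as r
ON (si = d.o_itemno AND sp = d.o_projectno)
OR (ti = d.o_itemno AND tp = d.o_projectno)
GROUP BY d.o_itemdesc
HAVING COUNT(*) = 1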

Postgresql update column based on set of values from another table

Dummy data to illustrate my problem:
create table table1 (category_id int,unit varchar,is_valid bool);
insert into table1 (category_id, unit, is_valid)
VALUES (1, 'a', true), (2, 'z', true);
create table table2 (category_id int,unit varchar);
insert into table2 (category_id, unit)
values(1, 'a'),(1, 'b'),(1, 'c'),(2, 'd'),(2, 'e');
So the data looks like:
Table 1:
category_id unit is_valid
----------- ---- --------
1           a    true
2           z    true
Table 2:
category_id unit
----------- ----
1           a
1           b
1           c
2           d
2           e
I want to update the is_valid column in Table 1, if the category_id/unit combination from Table 1 doesn't match any of the rows in Table 2. For example, the first row in Table 1 is valid, since (1, a) is in Table 2. However, the second row in Table 1 is not valid, since (2, z) is not in Table 2.
How can I update the column using postgresql? I tried a few different where clauses of the form
UPDATE table1 SET is_valid = false WHERE...
but I cannot get a WHERE clause that works how I want.
You can just set the value of is_valid to the result of an exists (select ...) predicate:
update table1 t1
set is_valid = exists (select null
from table2 t2
where (t2.category_id, t2.unit) = (t1.category_id, t1.unit)
);
NOTES:
Advantage: the query correctly sets the is_valid column regardless of its current value, and it is a very simple query.
Disadvantage: the query sets the value of is_valid for every row in the table, even those already correctly set.
You need to decide whether the disadvantage outweighs the advantage. If so, the same basic technique works in a more complicated query that touches only the rows needing a change:
with to_valid (category_id, unit, is_valid) as
(select category_id
, unit
, exists (select null
from table2 t2
where (t2.category_id, t2.unit) = (t1.category_id, t1.unit)
)
from table1 t1
)
update table1 tu
set is_valid = to_valid.is_valid
from to_valid
where (tu.category_id, tu.unit) = (to_valid.category_id, to_valid.unit)
and tu.is_valid is distinct from to_valid.is_valid;
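With the sample data above, either statement should leave the table looking like this:
select * from table1;
-- category_id | unit | is_valid
--           1 | a    | true
--           2 | z    | false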

Count non-null values from multiple columns at once without manual entry in SQL

I have a SQL table with about 50 columns, the first represents unique users and the other columns represent categories which are scored 1-10.
Here is an idea of what I'm working with
user a    b    c
---- ---- ---- ----
abc  5    null null
xyz  null 6    null
I am interested in counting the number of non-null values per column.
Currently, my queries are:
SELECT col_name, COUNT(col_name) AS count
FROM table
WHERE col_name IS NOT NULL
Is there a way to count non-null values for each column in one query, without having to manually enter each column name?
The desired output would be:
column count
------ -----
a      1
b      1
c      0
Consider the approach below (no knowledge of column names is required at all, with the exception of user):
select column, countif(value != 'null') as non_null_count
from your_table t,
unnest(array(
    select as struct
        trim(arr[offset(0)], '"') as column,
        trim(arr[offset(1)], '"') as value
    from unnest(split(trim(to_json_string(t), '{}'))) kv,
    unnest([struct(split(kv, ':') as arr)])
    where trim(arr[offset(0)], '"') != 'user'
)) rec
group by column
If applied to the sample data in your question, the output matches the desired result shown above.
I didn't do this in BigQuery but in SQL Server; however, BigQuery has the concept of UNPIVOT as well. Basically you're transposing your columns to rows and then doing a simple aggregate to see how many records have data in each column. My example is below and should work in BigQuery without much (or any) tweaking.
Here is the table I created:
CREATE TABLE example(
user_name char(3),
a integer,
b integer,
c integer
);
INSERT INTO example(user_name, a, b, c)
VALUES('abc', 5, null, null);
INSERT INTO example(user_name, a, b, c)
VALUES('xyz', null, 6, null);
INSERT INTO example(user_name, a, b, c)
VALUES('tst', 3, 6, 1);
And here is the UNPIVOT I did:
select count(*) as amount, col
from
(select user_name, a, b, c from example) e
unpivot
(blah for col in (a, b, c)
) as unpvt
group by col
Here's an example of the output (note: I added an extra record, ('tst', 3, 6, 1), to make sure it was working properly): with that row included, a counts 2, b counts 2, and c counts 1. Again, the syntax may be slightly different in BigQuery, but I think this should get you most of the way there.
Here's a link to my db-fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=deaa0e92a4ef1de7d4801e458652816b
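For completeness, a BigQuery version of the same idea using its UNPIVOT operator might look like the sketch below (untested, assuming the example table above). Note that UNPIVOT excludes NULLs by default, so a column that is NULL in every row simply won't appear in the output rather than showing a zero count:
SELECT col, COUNT(*) AS amount
FROM example
UNPIVOT (val FOR col IN (a, b, c))
GROUP BY col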

BigQuery Merge - Insert new rows if not matched

I'm trying to write a Merge query in Google BigQuery (part of an ETL process).
I have Source (staging) and Target tables, and two ways of merging the data: the classic 'upsert' merge, or inserting a new row when not all columns match.
This is an example of the first way (the classic 'Upsert') query:
MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key
WHEN NOT MATCHED THEN
INSERT ROW
WHEN MATCHED THEN
UPDATE SET Col1 = S.Col1, Col2 = S.Col2
This way, if the key exists, it always updates the column values even if they are the same. Also, this only works if the key is not nullable.
The other way is to insert a new row when the values don't match:
MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW
I prefer this way, but I found it doesn't work when a column value is NULL, because NULL != NULL, so the condition is false whenever the values are NULL.
I can't find a proper way to write this query so that it handles NULL comparison.
It's not possible to check for NULLs in the merge condition, e.g.:
ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW
Error message:
RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
It's also not possible to reference the target table in the WHEN NOT MATCHED condition, e.g.:
ON T.A = S.A
WHEN NOT MATCHED AND
S.A IS NOT NULL AND T.A IS NOT NULL
THEN
INSERT ROW
What do you suggest? Also, supposing both ways are possible, which would be more cost effective in BigQuery? I guess the performance should be the same, and I assume I can ignore the insertion cost. Thanks!
Can you use a "magic" number or id?
This works:
CREATE OR REPLACE TABLE temp.target AS
SELECT * FROM UNNEST(
[STRUCT(1 AS A, 2 AS B, 3 AS C, 5 AS d)
, (null, 1, 3, 500)
]);
CREATE OR REPLACE TABLE temp.source AS
SELECT * FROM UNNEST(
[STRUCT(1 AS A, 2 AS B, 3 AS C, 100 AS d)
, (1, 1, 1, 1000)
, (null, null, null, 10000)
, (null, 1, 3, 10000)
]);
MERGE temp.target T
USING temp.source S
ON IFNULL(T.A, -9999999) = IFNULL(S.A, -9999999)
    AND IFNULL(T.B, -9999999) = IFNULL(S.B, -9999999)
    AND IFNULL(T.C, -9999999) = IFNULL(S.C, -9999999)
WHEN NOT MATCHED THEN
INSERT ROW;
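Given the sample rows above, the merge should leave temp.target with four rows: the two originals untouched, plus the two source rows that found no match:
SELECT * FROM temp.target ORDER BY d;
-- (1, 2, 3, 5) and (null, 1, 3, 500) were already there;
-- (1, 1, 1, 1000) and (null, null, null, 10000) were inserted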

SQL - Select from column A based on values in column B

Let's say I have a table with 2 columns (a, b) with the following values:
a b
--- ---
1 5
1 NULL
2 NULL
2 NULL
3 NULL
My desired output:
a
---
2
3
I want to select only those distinct values from column a for which every single occurrence of this value has NULL in column b. Therefore from my desired output, "1" won't come in because there is a "5" in column b even though there is a NULL for the 2nd occurrence of "1".
How can I do this using a T-SQL query?
If I understand correctly, you can do this with group by and having:
select a
from t
group by a
having count(b) = 0;
When you use count() with a column name, it counts the number of non-NULL values. Hence, if all values are NULL, then the value will be zero.
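For example, running the aggregate without the HAVING filter against the sample data makes the logic visible:
select a, count(b) as non_null_b, count(*) as all_rows
from t
group by a;
-- a = 1 -> count(b) = 1, count(*) = 2  (excluded by having)
-- a = 2 -> count(b) = 0, count(*) = 2  (kept)
-- a = 3 -> count(b) = 0, count(*) = 1  (kept)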
It's fairly simple to do:
SELECT A
FROM table1
GROUP BY A
HAVING COUNT(B) = 0
Grouping by A collapses all the rows with the same value of A into a single row in the output. Adding the HAVING clause lets you filter those grouped rows with an aggregate function. COUNT doesn't count NULL values, so when it is 0, there are no non-NULL values in B.
Two more ways to do this:
SELECT a
FROM t
EXCEPT
SELECT a
FROM t
WHERE b IS NOT NULL ;
This one can use an index on (a, b):
SELECT a
FROM t
GROUP BY a
HAVING MIN(b) IS NULL ;
Try it like this:
DECLARE #tbl TABLE(a INT, b INT);
INSERT INTO #tbl VALUES(1,5),(1,NULL),(2,NULL),(2,NULL),(3,NULL);
--Your test data
SELECT * FROM #tbl;
--And this is what you want - hopefully...
SELECT DISTINCT tbl.a
FROM #tbl AS tbl
WHERE NOT EXISTS(SELECT * FROM #tbl AS x WHERE x.a = tbl.a AND x.b IS NOT NULL)
To turn your question on its head, you want the values from column a for which there are no non-null values in column b.
select distinct a
from table1 as t1
where 0 = (select count(*)
from table1 as t2
where t1.a = t2.a
and b is not null)
Sample fiddle is here: http://sqlfiddle.com/#!6/5d1b8/1
This should do it:
SELECT DISTINCT a
FROM t
WHERE b IS NULL
AND a NOT IN (SELECT a FROM t WHERE b IS NOT NULL);