Use IN to compare an array of values against a table of data - SQL

I want to compare an array of values against the rows of a table and return only the rows where the data differ.
Suppose I have myTable:
| ItemCode | ItemName | FrgnName |
|----------|----------|----------|
| CD1 | Apple | Mela |
| CD2 | Mirror | Specchio |
| CD3 | Bag | Borsa |
Now, using the SQL operator IN, I would like to compare the rows above against an array of values pasted from an Excel file, so in theory I would have to write something like:
WHERE NOT IN (
  ARRAY['CD1', 'Apple', 'Mella'],
  ARRAY['CD2', 'Miror', 'Specchio'],
  ARRAY['CD3', 'Bag', 'Borsa']
)
The query should return rows 1 and 2; "Mella" and "Miror" are in fact typos.

You could use a VALUES expression to emulate a table of those arrays, like so:
SELECT t.*
FROM myTable AS t
LEFT JOIN (
    VALUES (1, 'CD1', 'Apple', 'Mella')
         , (1, 'CD2', 'Miror', 'Specchio')
         , (1, 'CD3', 'Bag', 'Borsa')
) AS v(rowPresence, a, b, c)
ON t.ItemCode = v.a AND t.ItemName = v.b AND t.FrgnName = v.c
WHERE v.rowPresence IS NULL
Technically, in your scenario, you could do without the "rowPresence" field I added: since none of the values in your arrays are NULL, any of the three columns would do. I basically added it to point at the more general case.
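You could also spell the same anti-join with NOT EXISTS; here is a minimal sketch, assuming a dialect such as PostgreSQL or SQL Server that accepts VALUES as a derived table (it sidesteps the rowPresence column entirely):
SELECT t.*
FROM myTable AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES ('CD1', 'Apple', 'Mella')
               , ('CD2', 'Miror', 'Specchio')
               , ('CD3', 'Bag', 'Borsa')) AS v(a, b, c)
    WHERE t.ItemCode = v.a
      AND t.ItemName = v.b
      AND t.FrgnName = v.c
);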

postgres insert data from an other table inside array type columns

I have two tables on Postgres 11, like so, with some ARRAY type columns.
CREATE TABLE test (
id INT UNIQUE,
category TEXT NOT NULL,
quantitie NUMERIC,
quantities INT[],
dates INT[]
);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (1, 'cat1', 33, ARRAY[66], ARRAY[123678]);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (2, 'cat2', 99, ARRAY[22], ARRAY[879889]);
CREATE TABLE test2 (
idweb INT UNIQUE,
quantities INT[],
dates INT[]
);
INSERT INTO test2 (idweb, quantities, dates) VALUES (1, ARRAY[34], ARRAY[8776]);
INSERT INTO test2 (idweb, quantities, dates) VALUES (3, ARRAY[67], ARRAY[5443]);
I'm trying to update data from table test2 into table test, but only on rows with the same id, appending inside the ARRAY columns of table test and keeping the original values.
I use INSERT ... ON CONFLICT; how do I update only the 2 columns quantities and dates?
Running the SQL below, I also get an error whose origin I don't understand:
Schema Error: error: column "quantitie" is of type numeric but expression is of type integer[]
INSERT INTO test (SELECT * FROM test2 WHERE idweb IN (SELECT id FROM test))
ON CONFLICT (id)
DO UPDATE
SET
quantities = array_cat(EXCLUDED.quantities, test.quantities),
dates = array_cat(EXCLUDED.dates, test.dates);
https://www.db-fiddle.com/f/rs8BpjDUCciyZVwu5efNJE/0
Is there a better way to update table test from table test2, or what am I missing in the SQL?
Update, to show the result needed on table test:
**Schema (PostgreSQL v11)**
| id | quantitie | quantities | dates | category |
| --- | --------- | ---------- | ----------- | --------- |
| 2 | 99 | 22 | 879889 | cat2 |
| 1 | 33 | 34,66 | 8776,123678 | cat1 |
Basically, your query fails because the structures of the tables do not match - so you cannot insert into test select * from test2.
You could work around this by adding "fake" columns to the select list, like so:
insert into test
select idweb, 'foo', 0, quantities, dates from test2 where idweb in (select id from test)
on conflict (id)
do update set
quantities = array_cat(excluded.quantities, test.quantities),
dates = array_cat(excluded.dates, test.dates);
But this looks much more convoluted than needed. Essentially, you want an update statement, so I would just recommend:
update test
set
dates = test2.dates || test.dates,
quantities = test2.quantities || test.quantities
from test2
where test.id = test2.idweb
Note that this uses the || concatenation operator instead of array_cat() - it is shorter to write.
Demo on DB Fiddle:
id | category | quantitie | quantities | dates
-: | :------- | --------: | :--------- | :------------
2 | cat2 | 99 | {22} | {879889}
1 | cat1 | 33 | {34,66} | {8776,123678}
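For reference, the two spellings behave identically on arrays; || additionally accepts a bare element on either side, which array_cat() does not:
-- both return {1,2,3}
SELECT array_cat(ARRAY[1,2], ARRAY[3]);
SELECT ARRAY[1,2] || ARRAY[3];
-- appending a single element also works: {1,2,3}
SELECT ARRAY[1,2] || 3;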

Insert data into a table, checking on every insert whether there is a match with the existing values

I need to insert data from one table into another, but this insert must look into the receiving table to determine whether there is a match or not, and if there is, not insert the new data.
So, I have the following tables (NODE_ID refers to values in NODE1 and NODE2; think of lines, each with two nodes):
Table A:
| ARC | NODE1 | NODE2 | STATE |
|-----|-------|-------|-------|
| x   | 1     | 2     | A     |
| y   | 2     | 3     | A     |
| z   | 3     | 4     | B     |
Table B:
| NODE_ID | VALUE |
|---------|-------|
| 1       | N     |
| 2       | N     |
| 3       | N     |
| 4       | N     |
And I want the following result, which relates NODE_ID with the arcs and writes into the result table the value of STATE from the arcs table, with only one result for each node (otherwise I would have more than one row for the same node):
Table C result:
| NODE_ID | STATE    |
|---------|----------|
| 1       | A        |
| 2       | A        |
| 3       | A (or B) |
I tried to do this with a CASE statement with EXISTS, IF, and NVL2() and so on in the select, but with no result so far.
Any idea how I could write this query?
Thank you very much for your help.
OK guys, I'm editing my message to explain how I finally did it; I've also changed my first message a little to make it clearer, because we had problems with that.
So finally I used this query, which @mathguy introduced to me:
merge into Table_C c
using (select distinct b.NODE_ID as nodes, a.STATE
       from Table_A a, Table_B b
       where b.NODE_ID = a.NODE1 or b.NODE_ID = a.NODE2) s
on (s.nodes = c.NODE_ID)
when not matched then
  insert (NODE_ID, STATE)
  values (s.nodes, s.STATE);
That's all
This can be done with insert, but often when you update one table with values from another, the merge statement is more powerful (more flexible).
merge into table_c c
using ( select arc, min(state) as state from table_a group by arc ) s
on (s.arc = c.node_id)
when not matched then insert (node_id, state)
values (s.arc, s.state)
;
Thanks to @Boneist and @ThorstenKettner for pointing out several syntax errors (now fixed).
If table C does not yet exist, use a create table ... as select statement:
create table c as select arc as node_id, state from a;
In case there can be duplicate arc (not shown in your sample) you'd need aggregation:
create table c as select arc as node_id, min(state) as state from a group by arc;
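For comparison, a sketch of the plain INSERT variant mentioned at the start (it assumes table_c already exists and skips arcs whose node_id is already present):
insert into table_c (node_id, state)
select a.arc, min(a.state)
from table_a a
where not exists (select 1 from table_c c where c.node_id = a.arc)
group by a.arc;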

Left join statement returning unexpected null values for rows/columns that have data

I have two tables that I'm writing a query against. Some of the columns can be found in one of the tables, while some of the columns are calculated.
For clarity, I will copy my query below:
select field_a,
       cast(field_b as int),
       field_c,
       field_d,
       Year,
       coalesce(cast(field_e as float), 0) as America_spend,
       sum(cast(field_e as float)) as America_spend_total,
       coalesce(cast(field_e as float) / sum(cast(field_e as float)) over (partition by Year), 0) as total_spend
from table_a
left join table_b on
    table_a.field_a = table_b.field_a1 and
    table_a.field_b = table_b.field_b1 and
    table_a.Year = table_b.Year
group by field_a,
         field_b,
         Year
I have tables that look like this
table a:
| field_a | field_b | field_c | field_d | Year | field_f | field_g | field_h |
|---------|---------|---------|---------|------|---------|---------|---------|
| data    | 1       | data    | data    | 2014 | data    | data    | data    |
| data    | 1       | data    | data    | 2014 | null    | data    | data    |
| 0       | 1       | data    | data    | 2014 | data    | data    | data    |
| data    | 1       | data    | data    | 2014 | null    | data    | data    |
| 0       | 1       | data    | data    | 2014 | data    | data    | data    |
table b:
| field_a1 | field_b1 | Year | field_c1 | field_j |
|----------|----------|------|----------|---------|
| null     | 1        | 2014 | data     | data    |
| data     | 1        | 2015 | data     | data    |
| null     | 0        | 2014 | data     | data    |
| data     | 1        | 2015 | data     | data    |
| null     | 0        | 2014 | data     | data    |
The problem that I'm having is that some of the values in the 'total spend column' get assigned a value of null. Total spend is calculated per year and this field should never be null. Likewise, the year column doesn't contain a null value in either of the tables. But for some reason when I run the query, I get results that have some of the rows in the year column with a null value. This should never happen. Most of the results conform to what I would expect, but there are some that do not.
I'm guessing that it has something to do with the fact that some of the rows in field_b are null and get converted to 0, but why does this matter?
I updated the tables and the queries to more accurately reflect the structure of the database.
Yes the query runs and I have no naming conflicts.
@SeanLange's comment most likely identifies the issue with your expected results: NULL does NOT equal NULL (NULL = NULL does not evaluate to true). NULL is an "unknown" value in SQL, and two unknowns are not equal.
But you can eliminate the NULLs if you want to match them as the same value. Simply use COALESCE() or ISNULL() and provide the same default value on both sides of your ON condition, and make sure the default is not represented within your dataset, or you will get undesired results.
DECLARE @TableA AS TABLE (FieldA VARCHAR(5), FieldB INT, Yr INT)
DECLARE @TableB AS TABLE (FieldA VARCHAR(5), FieldC INT, Yr INT)
INSERT INTO @TableA (FieldA, FieldB, Yr)
VALUES (null,1,2014),('data',1,2015),(null,1,2014),('data',1,2015),(null,1,2014)
INSERT INTO @TableB (FieldA, FieldC, Yr)
VALUES (null,1,2014),('data',1,2015),(null,1,2014),('data',1,2015),(null,1,2014)
SELECT *
FROM @TableA a
LEFT JOIN @TableB b
    ON COALESCE(a.FieldA, 'NULLVALUE') = COALESCE(b.FieldA, 'NULLVALUE')
    AND a.Yr = b.Yr
The particular example dataset you provided repeats FieldA-to-Yr combinations, so the results are a little funky, but it still works.
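If you would rather not pick a sentinel at all, you can spell the NULL match out in the ON condition instead; a sketch against the same table variables:
SELECT *
FROM @TableA a
LEFT JOIN @TableB b
    ON (a.FieldA = b.FieldA OR (a.FieldA IS NULL AND b.FieldA IS NULL))
    AND a.Yr = b.Yr
This stays correct even if 'NULLVALUE' ever turns up in the real data, at the cost of a slightly longer predicate.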

Get rows where value is not a substring in another row

I'm writing recursive SQL against a table that contains circular references.
No problem! I read that you can build a unique path to prevent infinite loops. Now I need to filter the list down to only the last record in each chain. I must be doing something wrong though. Edit: I'm adding more records to this sample to make it clearer why just selecting the longest record doesn't work.
This is an example table:
create table strings (id int, string varchar(200));
insert into strings values (1, '1');
insert into strings values (2, '1,2');
insert into strings values (3, '1,2,3');
insert into strings values (4, '1,2,3,4');
insert into strings values (5, '5');
And my query:
select * from strings str1 where not exists
(
select * from strings str2
where str2.id <> str1.id
and str1.string || '%' like str2.string
)
I'd expect to only get the last records
| id | string |
|----|---------|
| 4 | 1,2,3,4 |
| 5 | 5 |
Instead I get them all
| id | string |
|----|---------|
| 1 | 1 |
| 2 | 1,2 |
| 3 | 1,2,3 |
| 4 | 1,2,3,4 |
| 5 | 5 |
Link to sql fiddle: http://sqlfiddle.com/#!15/7a974/1
My problem was all around the LIKE comparison: the pattern must be the right-hand operand, and I had the operands swapped.
select * from strings str1
where not exists
(
    select * from strings str2
    where str2.id <> str1.id
    and str2.string like str1.string || '%'
)
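To see why the operand order matters, compare the two spellings directly (in Postgres, || binds tighter than LIKE, so no extra parentheses are needed):
SELECT '1,2,3' LIKE '1,2' || '%';  -- true: '1,2,3' is matched against the pattern '1,2%'
SELECT '1,2' || '%' LIKE '1,2,3';  -- false: the literal '1,2%' is matched against a pattern with no wildcard
One caveat: plain prefix matching would also treat a path like '1,2,30' as an extension of '1,2,3'; appending the delimiter to the pattern (str1.string || ',%') avoids such false positives.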

SQL Postgres Invalidate Rows that reference invalid Id's

I am trying to create a stored procedure that will invalidate rows that contain id references of an id in another table. The catch is that the rows to be invalidated contain groupings of these id's stored as a comma separated string. Let's take a look at the tables:
table_a table_b
+----+------+ +---------+-------+
| id | name | | ids | valid |
+----+------+ +---------+-------+
| 1 | a | | 1,2,3 | T |
| 2 | b | | 4,3,8 | T |
| 3 | c | | 5,2,5,4 | T |
| 4 | d | | 7 | T |
| 5 | e | | 6,8 | T |
| 6 | f | | 9,7,2 | T |
| 7 | g | +---------+-------+
| 8 | h |
+----+------+
Above you can see that table_b contains groupings of ids from table_a and as you can imagine the table_a.id is an integer while table_b.ids is text. The goal is to look at each table_b.ids and if it contains an id that does not exist in table_a.id then set its validity to false.
I have not worked with any SQL in quite some time and I have never worked with PostgreSQL, which is why I am having such difficulty. The closest query I could come up with, which is not working, is along the lines of:
CREATE FUNCTION cleanup_records() AS $func$
BEGIN
UPDATE table_b
SET valid = FALSE
WHERE COUNT(
SELECT regexp_split_to_table(table_b.ids)
EXCEPT SELECT id FROM table_a
) > 0;
END;
$func$ LANGUAGE PLPGSQL;
The general idea is that I am trying to turn each row of table_b.ids into a table and then use the EXCEPT operator against table_a to see whether it has any ids that are invalid. The error I receive is:
ERROR: syntax error at or near "SELECT"
LINE 1: ...able_b SET valid = FALSE WHERE COUNT(SELECT reg...
which is not very helpful as it just indicates that I do not have the correct syntax. Is this query viable? If so can you show me where I may have gone wrong - if not is there an easier or even more complicated way to achieve this?
Sample data:
CREATE TABLE table_b
(ids text, valid boolean);
INSERT INTO table_b
(ids, valid)
VALUES
('1,2,3' , 'T'),
('4,3,8' , 'T'),
('5,2,5,4' , 'T'),
('7' , 'T'),
('6,8' , 'T'),
('9,7,2' , 'T');
CREATE TABLE table_a
(id integer, name text);
INSERT INTO table_a
(id, name)
VALUES
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e'),
(6,'f'),
(7,'g'),
(8,'h');
UPDATE table_b
SET valid = FALSE
WHERE EXISTS (
    SELECT regexp_split_to_table(table_b.ids, ',')::int
    EXCEPT SELECT id FROM table_a
);
You can use EXISTS to check for the existence of a row. The previous syntax was incorrect, as COUNT can't be used that way (an aggregate cannot wrap a subquery in a WHERE clause).
groupings of these id's stored as a comma separated string
Don't do that. It's really bad database design, and is why you're having problems. See:
Is using multiple foreign keys separated by commas wrong, and if so, why?
PostgreSQL list of integers separated by comma or integer array for performance?
Also, there's a more efficient way to do your query than that shown by vkp. If you do it that way, you're splitting the string for every ID you're testing. There is no need to do that. Instead, join on a table of expanded ID lists.
Something like:
UPDATE table_b
SET valid = 'f'
FROM table_b b
CROSS JOIN regexp_split_to_table(b.ids, ',') b_ids(id)
LEFT JOIN table_a a ON (a.id = b_ids.id::integer)
WHERE table_b.ids = b.ids
AND a.id IS NULL
AND table_b.valid = 't';
You need to join on table_b even though it's the update target because you can't make a lateral function reference to the update target table directly.
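For reference, a minimal sketch of the normalized design the linked questions recommend, with one row per group/member pair instead of a comma-separated string; the table and column names here are hypothetical, and it assumes table_a.id is declared as a primary key so it can be referenced:
-- one row per (group, member) pair instead of a comma-separated string
CREATE TABLE table_b_members (
    group_id  int,
    member_id int REFERENCES table_a (id)  -- invalid ids are now rejected on insert
);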