SQL Postgres Invalidate Rows that reference invalid Ids

I am trying to create a stored procedure that will invalidate rows that reference ids missing from another table. The catch is that the rows to be invalidated contain groupings of these ids stored as a comma-separated string. Let's take a look at the tables:
table_a
+----+------+
| id | name |
+----+------+
| 1  | a    |
| 2  | b    |
| 3  | c    |
| 4  | d    |
| 5  | e    |
| 6  | f    |
| 7  | g    |
| 8  | h    |
+----+------+
table_b
+---------+-------+
| ids     | valid |
+---------+-------+
| 1,2,3   | T     |
| 4,3,8   | T     |
| 5,2,5,4 | T     |
| 7       | T     |
| 6,8     | T     |
| 9,7,2   | T     |
+---------+-------+
Above you can see that table_b contains groupings of ids from table_a and as you can imagine the table_a.id is an integer while table_b.ids is text. The goal is to look at each table_b.ids and if it contains an id that does not exist in table_a.id then set its validity to false.
I have not worked with any SQL in quite some time and I have never worked with PostgreSQL, which is why I am having such difficulty. The closest query I could come up with does not work, but is along the lines of:
CREATE FUNCTION cleanup_records() AS $func$
BEGIN
UPDATE table_b
SET valid = FALSE
WHERE COUNT(
SELECT regexp_split_to_table(table_b.ids)
EXCEPT SELECT id FROM table_a
) > 0;
END;
$func$ LANGUAGE PLPGSQL;
The general idea is that I am trying to turn each row of table_b.ids into a table and then using the EXCEPT operator against table_a to see if it has any ids that are invalid. The error I receive is:
ERROR: syntax error at or near "SELECT"
LINE 1: ...able_b SET valid = FALSE WHERE COUNT(SELECT reg...
which is not very helpful as it just indicates that I do not have the correct syntax. Is this query viable? If so can you show me where I may have gone wrong - if not is there an easier or even more complicated way to achieve this?
Sample data:
CREATE TABLE table_b
(ids text, valid boolean);
INSERT INTO table_b
(ids, valid)
VALUES
('1,2,3' , 'T'),
('4,3,8' , 'T'),
('5,2,5,4' , 'T'),
('7' , 'T'),
('6,8' , 'T'),
('9,7,2' , 'T');
CREATE TABLE table_a
(id integer, name text);
INSERT INTO table_a
(id, name)
VALUES
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e'),
(6,'f'),
(7,'g'),
(8,'h');

UPDATE table_b
SET valid = FALSE
WHERE EXISTS(
-- split on commas and cast to integer so the values compare with table_a.id
SELECT regexp_split_to_table(table_b.ids, ',')::integer
EXCEPT SELECT id FROM table_a
);
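You can use EXISTS to check whether the subquery returns any row. The original syntax was incorrect because an aggregate such as COUNT cannot be used in a WHERE clause, and a subquery passed as an argument needs its own parentheses.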
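Since the question asked for a stored procedure, here is a minimal sketch of how the check could be wrapped in a function. It assumes the sample tables from the question; the NOT EXISTS form and the aliases b, s and a are my own choices, not part of the answer above. Note that the function in the question also lacked a RETURNS clause.

```sql
CREATE OR REPLACE FUNCTION cleanup_records()
  RETURNS void
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE table_b b
   SET    valid = FALSE
   WHERE  b.valid                      -- skip rows that are already invalid
   AND    EXISTS (
      -- one row per id in the comma-separated list
      SELECT 1
      FROM   regexp_split_to_table(b.ids, ',') AS s(id)
      -- keep only ids that have no match in table_a
      WHERE  NOT EXISTS (
         SELECT 1 FROM table_a a WHERE a.id = s.id::integer
      )
   );
END
$func$;

SELECT cleanup_records();
```

With the sample data, only the row '9,7,2' is invalidated, because 9 is the only id missing from table_a.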

"groupings of these id's stored as a comma separated string"
Don't do that. It's really bad database design, and is why you're having problems. See:
Is using multiple foreign keys separated by commas wrong, and if so, why?
PostgreSQL list of integers separated by comma or integer array for performance?
Also, there's a more efficient way to do your query than that shown by vkp. If you do it that way, you're splitting the string for every ID you're testing. There is no need to do that. Instead, join on a table of expanded ID lists.
Something like:
UPDATE table_b
SET valid = 'f'
FROM table_b b
CROSS JOIN regexp_split_to_table(b.ids, ',') b_ids(id)
LEFT JOIN table_a a ON (a.id = b_ids.id::integer)
WHERE table_b.ids = b.ids
AND a.id IS NULL
AND table_b.valid = 't';
You need to join on table_b even though it's the update target because you can't make a lateral function reference to the update target table directly.
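The design advice above ("Don't do that") could look roughly like the following sketch. The table and column names b_group, b_group_member, group_id, pos and a_id are hypothetical, and the sketch assumes table_a.id is unique so that it can be the target of a foreign key.

```sql
-- Assumption: ids in table_a are unique; the sample DDL declares no key.
ALTER TABLE table_a ADD PRIMARY KEY (id);

-- One row per grouping (replaces one row of table_b).
CREATE TABLE b_group (
   group_id integer PRIMARY KEY
);

-- One row per member of a grouping (replaces one list element).
CREATE TABLE b_group_member (
   group_id integer NOT NULL REFERENCES b_group (group_id),
   pos      integer NOT NULL,              -- position within the list
   a_id     integer NOT NULL REFERENCES table_a (id),
   PRIMARY KEY (group_id, pos)
);

-- The list '5,2,5,4' becomes four rows.
INSERT INTO b_group VALUES (3);
INSERT INTO b_group_member (group_id, pos, a_id)
VALUES (3, 1, 5), (3, 2, 2), (3, 3, 5), (3, 4, 4);
```

With the foreign key in place, a member row pointing at a missing id such as 9 is rejected on insert, so no cleanup pass is needed.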

Related

Select concatenated columns based on criteria list in other table

I have a table1
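| line | a  | b | c | d   | e | f | g | h |
| ---- | -- | - | - | --- | - | - | - | - |
| 1    | 18 | 2 | 2 | 22  | 0 | 2 | 1 | 2 |
| 2    | 20 | 2 | 2 | 2   | 0 | 0 | 0 | 2 |
| 3    | 10 | 2 | 2 | 222 | 0 | 2 | 1 | 2 |
| 4    | 12 | 2 | 2 | 3   | 0 | 0 | 0 | 0 |
| 5    | 15 | 2 | 2 | 3   | 0 | 0 | 0 | 0 |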
And a table2
| line | criteria  |
| ---- | --------- |
| 1    | a,b       |
| 2    | b,c,f,h   |
| 3    | a,b,e,g,h |
| 4    | c,e       |
I am using this code to select the unique results of concatenated columns, such as concat(c,',',d), concat(b,',',d,',',g) and so on, from table1, and it works perfectly:
SELECT DISTINCT(CONCAT(c,',',d))
FROM table1
But instead of writing each expression such as concat(c,',',d) manually, I want to refer to table2.criteria to get the column references to be concatenated from table1, so that I can see all the unique results for each criteria.
I tried this, but I get an error:
SELECT DISTINCT(SELECT criteria FROM table2)
FROM table1
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
The expected unique result is something like this;
| criteria | result |
| ------------ | ---------- |
| a,b | 15,2 |
| a,b | 10,2 |
| a,b | 20,2 |
| a,b | 12,2 |
| a,b | 18,2 |
| b,c,f,h | 2,2,2,2 |
| b,c,f,h | 2,2,0,2 |
| b,c,f,h | 2,2,0,0 |
| a,b,e,g,h | 20,2,0,0,2 |
| a,b,e,g,h | 12,2,0,0,0 |
| a,b,e,g,h | 15,2,0,0,0 |
| a,b,e,g,h | 10,2,0,1,2 |
| a,b,e,g,h | 18,2,0,1,2 |
| c,e | 2,0 |
SQL does not allow parameterizing identifiers. There are various ways to work around this restriction.
It's unclear from the question, but according to comments you want to concatenate the given pattern for every row in table1.
1. Dynamic SQL
Create a helper function (once!) that concatenates and executes statements dynamically.
Basics:
Define table and column names as arguments in a plpgsql function?
CREATE OR REPLACE FUNCTION f_concat_cols(_cols text)
RETURNS TABLE (result text)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$SELECT concat_ws(',', %s) FROM table1 ORDER BY line$q$, _cols);
END
$func$;
It's a set-returning function (a.k.a. "table function"), to return one result row for every row in table1 for each given pattern.
Warning: Converting user input to code like this is a prime opportunity for SQL injection. You must make sure that table2.criteria can only hold valid strings!
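One way to check the stored patterns before running dynamic SQL is to compare every name in criteria against the actual columns of table1. This is only a sketch: it assumes table1 lives in the current schema, and the aliases u, col and unknown_column are made up for illustration.

```sql
-- List every criteria entry that is not a column of table1.
-- An empty result means all patterns name real columns.
SELECT t2.line, t2.criteria, trim(u.col) AS unknown_column
FROM   table2 t2
CROSS  JOIN LATERAL unnest(string_to_array(t2.criteria, ',')) AS u(col)
WHERE  NOT EXISTS (
   SELECT 1
   FROM   information_schema.columns c
   WHERE  c.table_schema = current_schema()   -- assumption: same schema
   AND    c.table_name   = 'table1'
   AND    c.column_name  = trim(u.col)        -- tolerate stray spaces
);
```

A check like this could also run in a trigger on table2, so that invalid patterns never get stored in the first place.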
To get the full result matrix (with distinct results per row in table2), the query is simple now:
SELECT DISTINCT line AS t2_line, criteria, t1.*
FROM table2, f_concat_cols(criteria) t1
ORDER BY t2_line;
2. Workaround with conversion to JSON
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, to_json(t) AS js FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.js->>sub, ',') AS result
FROM unnest(string_to_array(t2.criteria, ',')) sub
) c
ORDER BY t2_line;
After converting rows from t1 to a JSON record, we can access keys (converted from column names) directly.
I unnest the pattern, access each single key, and aggregate the result in LATERAL subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
You could encapsulate the logic in a function like in 1., but that's optional in this case.
3. Workaround with conversion to Postgres arrays
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, ARRAY [a,b,c,d,e,f,g,h] AS arr FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.arr[idx]::text, ',') AS result
FROM unnest(string_to_array(translate(t2.criteria, 'abcdefgh', '12345678'), ',')::int[]) idx
) c
ORDER BY t2_line;
Similar to the "trick" with JSON, we can avoid dynamic SQL by converting columns to a plain Postgres array. Then project column names to integer array indices. I use translate() for the simple case, but that only works for single letters! Use replace() or regexp_replace() or some other method for longer names.
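For column names longer than one letter, one alternative to translate() is to look up each name's position in a list of names with array_position() (available in Postgres 9.5 or later). This is a sketch under the same assumptions as the query above; the aliases u, name and ord are hypothetical.

```sql
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.result
FROM   table2 t2
CROSS  JOIN (SELECT line, ARRAY[a,b,c,d,e,f,g,h] AS arr FROM table1) t1
CROSS  JOIN LATERAL (
   SELECT string_agg(
             -- map the column name to its index in the name list,
             -- then pick the value at that index from the row array
             t1.arr[array_position('{a,b,c,d,e,f,g,h}'::text[], trim(u.name))]::text
           , ',' ORDER BY u.ord) AS result
   FROM   unnest(string_to_array(t2.criteria, ',')) WITH ORDINALITY AS u(name, ord)
   ) c
ORDER  BY t2_line;
```

The name list must be in the same order as the columns in the ARRAY constructor. A name that is not in the list yields NULL, which string_agg() silently skips.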
The rest is like the above.
fiddle - showing all.

Include data in a table looking in every insert if there is a match with the table values

I need to insert data from one table into another, but the insert must check the receiving table for a match, and if there is one, not insert new data.
So, I have the following tables (NODE_ID refers to the values in NODE1 and NODE2; think of lines with two nodes each):
Table A:
| ARC | NODE1 | NODE2 | STATE |
| x | 1 | 2 | A |
| y | 2 | 3 | A |
| z | 3 | 4 | B |
Table B:
| NODE_ID| VALUE |
| 1 | N |
| 2 | N |
| 3 | N |
| 4 | N |
I want the following result, which relates NODE_ID to the arcs and writes the value of STATE from the arcs table into the result table, with only one row for each node; otherwise I would have more than one row for the same node:
Table C result:
| NODE_ID| STATE |
| 1 | A |
| 2 | A |
| 3 |A(or B)|
I tried to do this with a CASE statement with EXISTS, IF, NVL2() and so on in the select, but have had no result so far.
Any idea how I could write this query?
Thank you very much for your help.
OK guys, I have edited my message to explain how I finally did it. I have also changed my first message a little to make it clearer, because we had problems understanding it.
In the end I used this query, which @mathguy introduced to me:
merge into Table_C c
using (select distinct b.NODE_ID as nodes, a.STATE
from Table_A a, Table_B b
where (b.NODE_ID=a.NODE1 or b.NODE_ID=a.NODE2)) s
on (s.nodes=c.NODE_ID)
when not matched then
insert (NODE_ID, STATE)
values (s.nodes, s.STATE)
That's all
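With the sample data, the DISTINCT subquery in this query returns two rows for node 3, (3, A) and (3, B), so two rows could be inserted for that node. A possible variation, sketched here in Oracle-style syntax with the table names from the question, aggregates to one state per node. Picking min(STATE) is an arbitrary choice on my part; the question accepts either A or B.

```sql
merge into Table_C c
using (
   -- one row per node: the smallest STATE among the arcs touching it
   select b.NODE_ID as node_id, min(a.STATE) as state
   from   Table_B b
   join   Table_A a on b.NODE_ID in (a.NODE1, a.NODE2)
   group  by b.NODE_ID
) s
on (s.node_id = c.NODE_ID)
when not matched then
   insert (NODE_ID, STATE)
   values (s.node_id, s.state);
```

For the sample tables this yields A for nodes 1, 2 and 3, and B for node 4.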
This can be done with insert, but often when you update one table with values from another, the merge statement is more powerful (more flexible).
merge into table_c c
using ( select arc, min(state) as state from table_a group by arc ) s
on (s.arc = c.node_id)
when not matched then insert (node_id, state)
values (s.arc, s.state)
;
Thanks to @Boneist and @ThorstenKettner for pointing out several syntax errors (now fixed).
If table C does not yet exist, use a create select statement:
create table c as select arc as node_id, state from a;
In case there can be duplicate arc (not shown in your sample) you'd need aggregation:
create table c as select arc as node_id, min(state) as state from a group by arc;

Join tables with unknown number of rows without repeating column that is joined by

Here's my quandary:
I need to join all columns of two tables based on a primary key, but I don't want to repeat the primary key in the results.
The second table has the primary key and then unknown number and names of columns.
So essentially I want
SELECT * (except for b.PK) FROM
TableA a
JOIN TableB b ON a.PK = b.PK
The obvious solution would be to select all columns explicitly from table a except for a.PK, but let's say that I don't know the number or names of columns in table a either (except I know it has the PK).
So to sum:
How do I join two tables by their PKs, where I don't know the rest of their columns explicitly, and without repeating the PK in the results?
EDIT: (Using T-SQL with SQL Server)
Something like SELECT * except column foo FROM ... doesn't exist. But you can use a natural join, which eliminates redundant columns. You haven't mentioned your RDBMS, so here's an explanation from the MySQL manual. A natural join is standard SQL though.
The columns of a NATURAL join or a USING join may be different from
previously. Specifically, redundant output columns no longer appear,
and the order of columns for SELECT * expansion may be different from
before.
Consider this set of statements:
CREATE TABLE t1 (i INT, j INT);
CREATE TABLE t2 (k INT, j INT);
INSERT INTO t1 VALUES(1,1);
INSERT INTO t2 VALUES(1,1);
SELECT * FROM t1 NATURAL JOIN t2;
SELECT * FROM t1 JOIN t2 USING (j);
Previously, the statements produced this output:
+------+------+------+------+
| i | j | k | j |
+------+------+------+------+
| 1 | 1 | 1 | 1 |
+------+------+------+------+
+------+------+------+------+
| i | j | k | j |
+------+------+------+------+
| 1 | 1 | 1 | 1 |
+------+------+------+------+
In the first SELECT statement, column j appears in both tables and
thus becomes a join column, so, according to standard SQL, it should
appear only once in the output, not twice. Similarly, in the second
SELECT statement, column j is named in the USING clause and should
appear only once in the output, not twice. But in both cases, the
redundant column is not eliminated. Also, the order of the columns is
not correct according to standard SQL.
Now the statements produce this output:
+------+------+------+
| j | i | k |
+------+------+------+
| 1 | 1 | 1 |
+------+------+------+
+------+------+------+
| j | i | k |
+------+------+------+
| 1 | 1 | 1 |
+------+------+------+
The redundant column is eliminated and the column order is correct
according to standard SQL
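As far as I know, SQL Server (the RDBMS named in the question's edit) supports neither NATURAL JOIN nor JOIN ... USING, so a different route is needed there. One option is to build the column list from the catalog and run it as dynamic SQL. The sketch below assumes the tables are dbo.TableA and dbo.TableB, that the key column is literally named PK, and that STRING_AGG is available (SQL Server 2017 or later).

```sql
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Collect every column of TableB except the key, quoted and prefixed.
SELECT @cols = STRING_AGG(CAST(N'b.' + QUOTENAME(c.name) AS nvarchar(max)), N', ')
FROM   sys.columns AS c
WHERE  c.object_id = OBJECT_ID(N'dbo.TableB')
AND    c.name <> N'PK';

-- a.* already contains PK once, so it is not repeated from b.
SET @sql = N'SELECT a.*, ' + @cols
         + N' FROM dbo.TableA AS a JOIN dbo.TableB AS b ON a.PK = b.PK;';

EXEC sys.sp_executesql @sql;
```

If TableB has no columns besides the key, @cols is NULL and so is @sql; that case would need separate handling.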

Append a zero to value if necessary in SQL statement DB2

I have a complex SQL statement in which I need to match up two tables with a join. The initial part of the complex query has a location number that is stored in a table as a SMALLINT, and the second table has the store number stored as a CHAR(4). I have been able to cast the SMALLINT to a CHAR(4) like this:
CAST(STR_NBR AS CHAR(4)) AND LOCN_NBR
The issue is that because the Smallint suppresses the leading '0' the join returns null values from the right hand side of the LEFT OUTER JOIN.
Example
Table set A(Smallint) Table Set B (Char(4))
| 96 | | 096 |
| 97 | | 097 |
| 99 | | 099 |
| 100 | <- These return -> | 100 |
| 101 | <- These return -> | 101 |
| 102 | <- These return -> | 102 |
I need to make it so that they all return, but since it is in a join statement, how do you prepend a zero in certain conditions and not in others?
SELECT RIGHT('0000' || STR_NBR, 4)
FROM TABLE_A
Casting Table B's CHAR to SMALLINT would work as well (DB2 has no TINYINT type):
SELECT ...
FROM TABLE_A A
JOIN TABLE_B B
ON A.num = CAST(B.txt AS SMALLINT)
Try LPAD function:
LPAD(col,3,'0' )
I was able to successfully match it out to obtain a 3 digit location number at all times by doing the following:
STR_NBR was originally defined as a SmallINT(2)
LOCN_NO was originally defined as a Char(4)
SELECT ...
FROM TABLE_A AS A
JOIN TABLE_B AS B
ON CAST(SUBSTR(DIGITS(A.STR_NBR),3,3)AS CHAR(4)) = B.LOCN_NO
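To see why the join condition above works, it may help to look at what DIGITS produces for a single value. For a SMALLINT, DIGITS returns a five-character string padded with leading zeros, and SUBSTR then keeps the last three characters. The literal 96 is just an example value taken from the question's table.

```sql
SELECT DIGITS(CAST(96 AS SMALLINT))               AS all_digits,  -- '00096'
       SUBSTR(DIGITS(CAST(96 AS SMALLINT)), 3, 3) AS last_three   -- '096'
FROM   SYSIBM.SYSDUMMY1;
```

Casting the three-character result to CHAR(4) pads it with a trailing blank, which matches how '096' is stored in a CHAR(4) column.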

Deleting similar columns in SQL

In PostgreSQL 8.3, let's say I have a table called widgets with the following:
id | type | count
--------------------
1 | A | 21
2 | A | 29
3 | C | 4
4 | B | 1
5 | C | 4
6 | C | 3
7 | B | 14
I want to remove duplicates based upon the type column, leaving only those with the highest count column value in the table. The final data would look like this:
id | type | count
--------------------
2 | A | 29
3 | C | 4 /* `id` for this record might be '5' depending on your query */
7 | B | 14
I feel like I'm close, but I can't seem to wrap my head around a query that works to get rid of the duplicate rows.
count is also the name of an SQL aggregate function, so I quote it as an identifier to be safe. PostgreSQL uses double quotes for that, not the square brackets of SQL Server. In any case, the following should theoretically work (but I didn't actually test it):
delete from widgets where id not in (
select max(w2.id) from widgets as w2 inner join
(select max(w1."count") as "count", type from widgets as w1 group by w1.type) as sq
on sq."count"=w2."count" and sq.type=w2.type group by w2.type
);
There is a slightly simpler answer than Asaph's, using the SQL EXISTS operator:
DELETE FROM widgets AS a
WHERE EXISTS
(SELECT * FROM widgets AS b
WHERE (a.type = b.type AND b.count > a.count)
OR (b.id > a.id AND a.type = b.type AND b.count = a.count))
The EXISTS operator returns TRUE if the subquery that follows it returns at least one row.
According to your requirements, it seems to me that this should work (rows tied on the highest count within a type are all kept):
DELETE
FROM widgets
WHERE (type, count) NOT IN
(
SELECT type, MAX(count)
FROM widgets
GROUP BY type
)
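Another possibility, not taken from the answers above, is the Postgres-specific DISTINCT ON clause, which picks one row per type in a single pass and also resolves ties. This is a sketch against the widgets table from the question; breaking ties by the higher id is my own choice.

```sql
DELETE FROM widgets
WHERE  id NOT IN (
   -- one id per type: highest count first, then highest id on ties
   SELECT DISTINCT ON (type) id
   FROM   widgets
   ORDER  BY type, count DESC, id DESC
);
```

With the sample data this keeps ids 2, 5 and 7. Ordering by id ASC instead would keep id 3 rather than 5 for type C.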