Insert data into a table only when there is no matching row already - SQL

I need to insert data from one table into another, but each insert must check the receiving table for a matching row and, if there is one, skip the insert.
So, I have the following tables (NODE_ID refers to the values in NODE1 and NODE2; think of each arc as a line with two nodes):
Table A:

| ARC | NODE1 | NODE2 | STATE |
|-----|-------|-------|-------|
| x | 1 | 2 | A |
| y | 2 | 3 | A |
| z | 3 | 4 | B |

Table B:

| NODE_ID | VALUE |
|---------|-------|
| 1 | N |
| 2 | N |
| 3 | N |
| 4 | N |
I want the following result, which relates each NODE_ID to its arcs and writes the arc's STATE into the result table, with only one row per node (otherwise I would have more than one row for the same node):
Table C result:

| NODE_ID | STATE |
|---------|-------|
| 1 | A |
| 2 | A |
| 3 | A (or B) |
I tried to do this with a CASE expression combined with EXISTS, IF, NVL2() and so on in the SELECT, but have had no success so far.
Any idea how I could write this query?
Thank you very much for your help.
OK everyone, I've edited my message to explain how I finally did it. I've also changed my first message a little to make it easier to understand, because there was some confusion about it.
In the end I used this query, which @mathguy suggested to me:
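merge into Table_C c
-- one source row per node: min() picks a single STATE when a node belongs to arcs with different states
using (select b.NODE_ID as nodes, min(a.STATE) as STATE
from Table_A a, Table_B b
where b.NODE_ID = a.NODE1 or b.NODE_ID = a.NODE2
group by b.NODE_ID) s
on (s.nodes = c.NODE_ID)
-- only nodes that are not yet in Table_C get inserted
when not matched then
insert (NODE_ID, STATE)
values (s.nodes, s.STATE);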
That's all
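Deduplicating the MERGE source with SELECT DISTINCT is not enough for this data: DISTINCT only drops identical (node, state) pairs, and node 3 belongs to one arc in state A and one in state B, so it still appears twice (and, as far as I know, a NOT MATCHED branch would then insert both rows). The sketch below checks this with Python's built-in sqlite3 module as a stand-in for Oracle; the lowercase table names are an assumption for the demo. Grouping by node and taking MIN(state) leaves exactly one row per node.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (arc TEXT, node1 INTEGER, node2 INTEGER, state TEXT);
CREATE TABLE table_b (node_id INTEGER, value TEXT);
INSERT INTO table_a VALUES ('x', 1, 2, 'A'), ('y', 2, 3, 'A'), ('z', 3, 4, 'B');
INSERT INTO table_b VALUES (1, 'N'), (2, 'N'), (3, 'N'), (4, 'N');
""")

# Each node joins to every arc that starts or ends at it.
JOIN = "FROM table_a a JOIN table_b b ON b.node_id = a.node1 OR b.node_id = a.node2"

# DISTINCT only removes identical (node, state) pairs: node 3 still shows up twice.
distinct_rows = conn.execute(
    f"SELECT DISTINCT b.node_id, a.state {JOIN} ORDER BY 1, 2").fetchall()

# GROUP BY node with MIN(state) keeps exactly one row per node.
grouped_rows = conn.execute(
    f"SELECT b.node_id, MIN(a.state) {JOIN} GROUP BY b.node_id ORDER BY 1").fetchall()

print(distinct_rows)  # [(1, 'A'), (2, 'A'), (3, 'A'), (3, 'B'), (4, 'B')]
print(grouped_rows)   # [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'B')]
```

MIN is an arbitrary but deterministic tie-breaker; MAX would give node 3 the state B instead, which the question also accepts.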

This can be done with INSERT, but when you update one table with values from another, the MERGE statement is often more powerful (more flexible).
merge into table_c c
using ( select arc, min(state) as state from table_a group by arc ) s
on (s.arc = c.node_id)
when not matched then insert (node_id, state)
values (s.arc, s.state)
;
Thanks to @Boneist and @ThorstenKettner for pointing out several syntax errors (now fixed).
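The plain-INSERT route mentioned in this answer can be sketched as INSERT ... SELECT with a NOT EXISTS check against the target. This sketch runs on Python's sqlite3 as a stand-in for Oracle, groups on the node ids from NODE1/NODE2 (as the edited question asks) rather than on arc, and uses an invented pre-existing row for node 2 to show that matching rows are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (arc TEXT, node1 INTEGER, node2 INTEGER, state TEXT);
CREATE TABLE table_b (node_id INTEGER, value TEXT);
CREATE TABLE table_c (node_id INTEGER, state TEXT);
INSERT INTO table_a VALUES ('x', 1, 2, 'A'), ('y', 2, 3, 'A'), ('z', 3, 4, 'B');
INSERT INTO table_b VALUES (1, 'N'), (2, 'N'), (3, 'N'), (4, 'N');
-- Hypothetical row that already exists in the target table.
INSERT INTO table_c VALUES (2, 'B');
""")

# Insert one row per node, but only for nodes that table_c does not contain yet.
INSERT_MISSING = """
INSERT INTO table_c (node_id, state)
SELECT s.node_id, s.state
FROM (SELECT b.node_id, MIN(a.state) AS state
      FROM table_a a
      JOIN table_b b ON b.node_id IN (a.node1, a.node2)
      GROUP BY b.node_id) AS s
WHERE NOT EXISTS (SELECT 1 FROM table_c c WHERE c.node_id = s.node_id)
"""

first = conn.execute(INSERT_MISSING).rowcount   # new rows for nodes 1, 3 and 4
second = conn.execute(INSERT_MISSING).rowcount  # nothing left to insert
rows = conn.execute("SELECT node_id, state FROM table_c ORDER BY node_id").fetchall()
print(first, second, rows)
```

Running the statement a second time inserts nothing, which is the "look for a match first" behavior the question asks for; the existing row for node 2 keeps its state.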

If table C does not yet exist, use a CREATE TABLE ... AS SELECT statement:
create table c as select arc as node_id, state from a;
If there can be duplicate arcs (not shown in your sample), you'd need aggregation:
create table c as select arc as node_id, min(state) as state from a group by arc;

Related

Deleting duplicate rows with primary keys that are connected to other tables

A process was causing duplicate rows in a table where there were not supposed to be any. There are several great answers online about deleting duplicate rows. But what if those duplicates, each with its own ID primary key, have data in other tables tied to them?
Is there a way to delete all duplicates in the first table and migrate all data tied to those keys to the single PK ID that wasn't deleted?
For example:
TABLE 1
+-------+----------+----------+------------+
| ID(PK)| Model | ItemType | Color |
+-------+----------+----------+------------+
| 1 | 4 | B | Red |
| 2 | 4 | B | Red |
| 3 | 5 | A | Blue |
+-------+----------+----------+------------+
TABLE 2
+-------+----------+---------+
| ID(PK)| OtherID | Type |
+-------+----------+---------+
| 1 | 1 | Type1 |
| 2 | 1 | Type2 |
| 3 | 2 | Type3 |
| 4 | 2 | Type4 |
| 5 | 2 | Type5 |
+-------+----------+---------+
So, in theory, I would want to delete the entry with ID 2 from TABLE 1 and then have the matching OtherID values in TABLE 2 switch to 1. This would be needed for any number of tables; in this particular case, 4 tables reference the ID primary key.
You cannot do this automatically. But you can do this with some queries. First, you set all the foreign keys to the correct id, which is presumably the smallest one:
with ids as (
select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id
from table1 t1
)
update t2
set t2.otherid = ids.min_id
from table2 t2 join
ids
on t2.otherid = ids.id
where ids.id <> ids.min_id;
Then delete the ids that are either duplicated or not referenced in table2 (depending on which you actually want):
with ids as (
select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id
from table1 t1
)
delete from ids
where id <> min_id;
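The same two steps can be tried on the sample data with Python's sqlite3 as a stand-in for SQL Server. SQLite cannot delete through a CTE and only gained UPDATE ... FROM in version 3.33.0, so this sketch prepends the same ids CTE to a plain UPDATE with a correlated subquery and to a DELETE ... WHERE ID IN (...). Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample data from the question; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Model INTEGER, ItemType TEXT, Color TEXT);
CREATE TABLE table2 (ID INTEGER PRIMARY KEY, OtherID INTEGER REFERENCES table1(ID), Type TEXT);
INSERT INTO table1 VALUES (1, 4, 'B', 'Red'), (2, 4, 'B', 'Red'), (3, 5, 'A', 'Blue');
INSERT INTO table2 VALUES (1, 1, 'Type1'), (2, 1, 'Type2'), (3, 2, 'Type3'),
                          (4, 2, 'Type4'), (5, 2, 'Type5');
""")

# Same CTE as the answer: each row paired with the smallest ID among its duplicates.
IDS = """
WITH ids AS (
    SELECT ID, MIN(ID) OVER (PARTITION BY Model, ItemType, Color) AS min_id
    FROM table1
)
"""

# Step 1: repoint child rows from duplicate IDs to the surviving ID.
conn.execute(IDS + """
UPDATE table2
SET OtherID = (SELECT min_id FROM ids WHERE ids.ID = table2.OtherID)
WHERE OtherID IN (SELECT ID FROM ids WHERE ID <> min_id)
""")

# Step 2: nothing references the duplicates any more, so they can be deleted.
conn.execute(IDS + """
DELETE FROM table1
WHERE ID IN (SELECT ID FROM ids WHERE ID <> min_id)
""")
conn.commit()

parent_ids = [r[0] for r in conn.execute("SELECT ID FROM table1 ORDER BY ID")]
child_refs = [r[0] for r in conn.execute("SELECT OtherID FROM table2 ORDER BY ID")]
print(parent_ids)  # [1, 3]
print(child_refs)  # [1, 1, 1, 1, 1]
```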
Note: If the database has concurrent users, you might want to put it in single user mode for this operation or lock the tables so they are not modified during these two operations.
To do this right, you want to wrap everything in a single transaction and perform this during a regular maintenance period. Anything else could leave things as inconsistent as they are now.
1. Determine which "key" you will keep.
2. Update all of the child tables to use the new "key" wherever the value is the old "key".
3. Once no FK dependencies on the duplicate records remain, delete them.
4. Once all ambiguities are resolved, place a unique constraint on (ItemType, Color) (or whatever the real columns are).
If there are a lot of instances, you may need to write a script to handle this and use the information in sys.foreign_keys and sys.foreign_key_columns to determine which records to update and in which order.
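The steps above, generalized over several child tables, might look roughly like this. In SQLite (used here through Python's sqlite3) the counterpart of sys.foreign_keys is PRAGMA foreign_key_list, which the sketch uses to find every column that references the parent table. child_references() and merge_duplicates() are hypothetical helpers, table3 is an invented second child table, the parent key column is assumed to be called ID, and the key columns (Model, ItemType, Color) are taken from the first answer. Everything runs in one explicit transaction, and the unique index at the end plays the role of the final unique-constraint step.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Model INTEGER, ItemType TEXT, Color TEXT);
CREATE TABLE table2 (ID INTEGER PRIMARY KEY, OtherID INTEGER REFERENCES table1(ID), Type TEXT);
-- Hypothetical second child table, to show that every referencing table is handled.
CREATE TABLE table3 (ID INTEGER PRIMARY KEY, ItemRef INTEGER REFERENCES table1(ID));
INSERT INTO table1 VALUES (1, 4, 'B', 'Red'), (2, 4, 'B', 'Red'), (3, 5, 'A', 'Blue');
INSERT INTO table2 VALUES (1, 1, 'Type1'), (2, 1, 'Type2'), (3, 2, 'Type3'),
                          (4, 2, 'Type4'), (5, 2, 'Type5');
INSERT INTO table3 VALUES (1, 2), (2, 3);
""")


def child_references(conn, parent):
    """Return (child_table, fk_column) for every foreign key that points at `parent`."""
    refs = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            if fk[2].lower() == parent.lower():
                refs.append((table, fk[3]))
    return refs


def merge_duplicates(conn, parent, key_cols):
    """Remap child FKs to the smallest ID per key, delete duplicates, add a unique index.

    Table and column names are interpolated into SQL, so they must be trusted values.
    """
    cols = ", ".join(key_cols)
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS temp.id_map")
        conn.execute(f"""
            CREATE TEMP TABLE id_map AS
            SELECT ID AS old_id, MIN(ID) OVER (PARTITION BY {cols}) AS new_id
            FROM {parent}""")
        for table, col in child_references(conn, parent):
            conn.execute(f"""
                UPDATE {table}
                SET {col} = (SELECT new_id FROM id_map WHERE old_id = {table}.{col})
                WHERE {col} IN (SELECT old_id FROM id_map WHERE old_id <> new_id)""")
        conn.execute(f"""
            DELETE FROM {parent}
            WHERE ID IN (SELECT old_id FROM id_map WHERE old_id <> new_id)""")
        # Prevent new duplicates once the existing ones are resolved.
        conn.execute(f"CREATE UNIQUE INDEX ux_{parent}_key ON {parent} ({cols})")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


merge_duplicates(conn, "table1", ("Model", "ItemType", "Color"))
```

If any statement fails, the ROLLBACK undoes every remapping done so far, so the tables are never left half-merged.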

How to query sum total of transitively linked child transactions from database?

I got an assignment with a lot of unusual requirements. I need to create an API that stores transaction details and performs some operations on them. One such operation is retrieving the sum of all transactions that are transitively linked by their parent_id to $transaction_id.
If A is the parent of B and C, and C is the parent of D and E, then
sum(A) = A + B + C + D + E
Note: this includes all descendants, not just immediate child transactions.
I have the following sample data in the SQL database:
MariaDB [test_db]> SELECT * FROM transactions;
+------+-------+----------+---------+
| t_id | t_pid | t_amount | t_type |
+------+-------+----------+---------+
| 1 | NULL | 10000.00 | default |
| 2 | NULL | 25000.00 | cars |
| 3 | 1 | 30000.00 | bikes |
| 4 | NULL | 10000.00 | bikes |
| 5 | 3 | 15000.00 | bikes |
+------+-------+----------+---------+
5 rows in set (0.000 sec)
MariaDB [test_db]>
Here t_id is a unique transaction ID and t_pid is a parent ID, which is either NULL or an existing t_id.
So when I ask for sum(t_amount) where t_id = 1, I want the result to be
sum(1+3+5) -> sum(10000 + 30000 + 15000) = 55000.
I know I can do this programmatically with recursion that issues repeated queries and adds up the amounts. But that will perform poorly if the data is very large, say millions of records.
I want to know whether this can be done with a single (complex) query, and if so, how.
I have very little knowledge of and experience with databases. I tried with what I know and couldn't do it. I also searched for similar questions here and didn't find any.
From what I have researched, I guess I can achieve this with stored procedures and the HAVING clause. Let me know if I am right and help me do this.
So, any sort of help will be appreciated.
Thanks in advance.
You need a recursive CTE:
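with recursive cte as (
-- anchor member: the transaction whose subtree we want to total
select t_id as ultimate_id, t_id, t_amount
from transactions t
where t_id = 1
union all
-- recursive member: every direct child of a row already in cte
select cte.ultimate_id, tc.t_id, tc.t_amount
from cte join
transactions tc
on tc.t_pid = cte.t_id
)
select ultimate_id, sum(t_amount)
from cte
group by ultimate_id;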
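The recursive CTE can be tried against the sample data with Python's sqlite3, which also supports WITH RECURSIVE and stands in for MariaDB here. subtree_total() is a hypothetical helper that binds the root id as a parameter. Two caveats, stated as assumptions: the parent links must not contain cycles (with a cycle, UNION ALL would keep recursing), and on millions of rows an index on t_pid is what lets each recursive step find children without scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    t_id     INTEGER PRIMARY KEY,
    t_pid    INTEGER REFERENCES transactions(t_id),
    t_amount REAL,
    t_type   TEXT
);
-- Lets each recursive step look up children by parent id instead of scanning.
CREATE INDEX ix_transactions_pid ON transactions (t_pid);
INSERT INTO transactions VALUES
    (1, NULL, 10000.00, 'default'),
    (2, NULL, 25000.00, 'cars'),
    (3, 1,    30000.00, 'bikes'),
    (4, NULL, 10000.00, 'bikes'),
    (5, 3,    15000.00, 'bikes');
""")

SUBTREE_TOTAL = """
WITH RECURSIVE cte(t_id, t_amount) AS (
    -- anchor: the root transaction
    SELECT t_id, t_amount FROM transactions WHERE t_id = ?
    UNION ALL
    -- recursive step: direct children of rows already collected
    SELECT tc.t_id, tc.t_amount
    FROM cte JOIN transactions tc ON tc.t_pid = cte.t_id
)
SELECT SUM(t_amount) FROM cte
"""


def subtree_total(conn, root_id):
    """Sum t_amount over root_id and all of its transitive children."""
    return conn.execute(SUBTREE_TOTAL, (root_id,)).fetchone()[0]


print(subtree_total(conn, 1))  # 55000.0
```

An unknown id yields an empty CTE, so SUM returns NULL (None in Python) rather than 0.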

SQL Postgres Invalidate Rows that reference invalid Id's

I am trying to create a stored procedure that will invalidate rows that reference ids which do not exist in another table. The catch is that the rows to be invalidated store these ids as a comma-separated string. Let's take a look at the tables:
table_a table_b
+----+------+ +---------+-------+
| id | name | | ids | valid |
+----+------+ +---------+-------+
| 1 | a | | 1,2,3 | T |
| 2 | b | | 4,3,8 | T |
| 3 | c | | 5,2,5,4 | T |
| 4 | d | | 7 | T |
| 5 | e | | 6,8 | T |
| 6 | f | | 9,7,2 | T |
| 7 | g | +---------+-------+
| 8 | h |
+----+------+
Above you can see that table_b contains groupings of ids from table_a and as you can imagine the table_a.id is an integer while table_b.ids is text. The goal is to look at each table_b.ids and if it contains an id that does not exist in table_a.id then set its validity to false.
I have not worked with SQL in quite some time and I have never worked with PostgreSQL, which is why I am having such difficulty. The closest query I could come up with does not work, but is along the lines of:
CREATE FUNCTION cleanup_records() AS $func$
BEGIN
UPDATE table_b
SET valid = FALSE
WHERE COUNT(
SELECT regexp_split_to_table(table_b.ids)
EXCEPT SELECT id FROM table_a
) > 0;
END;
$func$ LANGUAGE PLPGSQL;
The general idea is that I am trying to turn each row's table_b.ids into a table and then use the EXCEPT operator against table_a to see whether it contains any invalid ids. The error I receive is:
ERROR: syntax error at or near "SELECT"
LINE 1: ...able_b SET valid = FALSE WHERE COUNT(SELECT reg...
which is not very helpful as it just indicates that I do not have the correct syntax. Is this query viable? If so can you show me where I may have gone wrong - if not is there an easier or even more complicated way to achieve this?
Sample data:
CREATE TABLE table_b
(ids text, valid boolean);
INSERT INTO table_b
(ids, valid)
VALUES
('1,2,3' , 'T'),
('4,3,8' , 'T'),
('5,2,5,4' , 'T'),
('7' , 'T'),
('6,8' , 'T'),
('9,7,2' , 'T');
CREATE TABLE table_a
(id integer, name text);
INSERT INTO table_a
(id, name)
VALUES
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e'),
(6,'f'),
(7,'g'),
(8,'h');
UPDATE table_b
SET valid = FALSE
WHERE EXISTS(
-- split on commas and cast to integer so the pieces can be compared with table_a.id
SELECT regexp_split_to_table(table_b.ids, ',')::integer
EXCEPT SELECT id FROM table_a
);
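You can use EXISTS to check whether the subquery returns any row. The original syntax was incorrect because COUNT can't be applied to a subquery that way. Note also that regexp_split_to_table needs the delimiter as its second argument, and the text pieces must be cast to integer before EXCEPT can compare them with table_a.id.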
groupings of these id's stored as a comma separated string
Don't do that. It's really bad database design, and is why you're having problems. See:
Is using multiple foreign keys separated by commas wrong, and if so, why?
PostgreSQL list of integers separated by comma or integer array for performance?
Also, there's a more efficient way to do your query than that shown by vkp. If you do it that way, you're splitting the string for every ID you're testing. There is no need to do that. Instead, join on a table of expanded ID lists.
Something like:
UPDATE table_b
SET valid = 'f'
FROM table_b b
CROSS JOIN regexp_split_to_table(b.ids, ',') b_ids(id)
LEFT JOIN table_a a ON (a.id = b_ids.id::integer)
WHERE table_b.ids = b.ids
AND a.id IS NULL
AND table_b.valid = 't';
You need to join on table_b even though it's the update target because you can't make a lateral function reference to the update target table directly.
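SQLite has no regexp_split_to_table, so this sketch swaps in a recursive CTE that peels one comma-separated token off per step; otherwise it follows the same idea as the answers above: expand each ids list into rows, look for tokens without a matching table_a.id, and mark those table_b rows invalid. It runs on Python's sqlite3 with the question's sample data and stores valid as 1/0, since SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, name TEXT);
CREATE TABLE table_b (ids TEXT, valid INTEGER);
INSERT INTO table_a VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f'),(7,'g'),(8,'h');
INSERT INTO table_b VALUES ('1,2,3',1),('4,3,8',1),('5,2,5,4',1),('7',1),('6,8',1),('9,7,2',1);
""")

INVALIDATE = """
WITH RECURSIVE split(rid, token, rest) AS (
    -- Seed: one row per table_b row; the trailing comma makes every token end in ','.
    SELECT rowid, '', ids || ',' FROM table_b
    UNION ALL
    -- Take the text before the first comma as the next token, keep the remainder.
    SELECT rid,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
UPDATE table_b
SET valid = 0
WHERE valid = 1
  AND rowid IN (
      -- rows with at least one token that has no matching table_a.id
      SELECT rid FROM split
      WHERE token <> ''
        AND CAST(token AS INTEGER) NOT IN (SELECT id FROM table_a)
  )
"""

conn.execute(INVALIDATE)
conn.commit()
result = conn.execute("SELECT ids, valid FROM table_b ORDER BY rowid").fetchall()
print(result)
```

Only '9,7,2' ends up invalid, because 9 is the one id that does not exist in table_a.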

SQL Query: Search with list of tuples

I have the following table (simplified version) in SQL Server.
Table Events
-----------------------------------------------------------
| Room | User | Entered | Exited |
-----------------------------------------------------------
| A | Jim | 2014-10-10T09:00:00 | 2014-10-10T09:10:00 |
| B | Jim | 2014-10-10T09:11:00 | 2014-10-10T09:22:30 |
| A | Jill | 2014-10-10T09:00:00 | NULL |
| C | Jack | 2014-10-10T09:45:00 | 2014-10-10T10:00:00 |
| A | Jack | 2014-10-10T10:01:00 | NULL |
.
.
.
I need to create a query that returns person's whereabouts in given timestamps.
For example: Where was (Jim at 2014-10-09T09:05:00), (Jim at 2014-10-10T09:01:00), (Jill at 2014-10-10T09:10:00), ...
The result set must contain the given User and Timestamp as well as the found room (if any).
------------------------------------------
| User | Timestamp | WasInRoom |
------------------------------------------
| Jim | 2014-10-09T09:05:00 | NULL |
| Jim | 2014-10-10T09:01:00 | A |
| Jill | 2014-10-10T09:10:00 | A |
The number of User-Timestamp tuples can be > 10 000.
The current implementation retrieves all records from the Events table and does the search in Java code. I am hoping that I could push this logic into SQL. But how?
I am using MyBatis framework to create SQL queries so the tuples can be inlined to the query.
The basic query is:
select e.*
from events e
where e.[User] = 'Jim' and '2014-10-09T09:05:00' >= e.entered and ('2014-10-09T09:05:00' <= e.exited or e.exited is NULL) or
e.[User] = 'Jill' and '2014-10-10T09:10:00' >= e.entered and ('2014-10-10T09:10:00' <= e.exited or e.exited is NULL) or
. . .;
SQL Server can handle ridiculously large queries, so you can continue in this vein. However, if you have the name/time values in a table already (or it is the result of a query), then use a join:
select ut.*, e.*
from usertimes ut left join
events e
on e.[User] = ut.[User] and
ut.thetime >= e.entered and (ut.thetime <= e.exited or e.exited is null);
Note the use of a left join here. It ensures that all the original rows are in the result set, even when there are no matches.
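A quick way to check the LEFT JOIN approach on the sample data, using Python's sqlite3 as a stand-in for SQL Server. Assumptions: the user column is renamed user_name, and timestamps are stored as ISO-8601 text, which sorts chronologically, so the >= / <= comparisons behave like DATETIME comparisons here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (room TEXT, user_name TEXT, entered TEXT, exited TEXT);
INSERT INTO events VALUES
  ('A', 'Jim',  '2014-10-10T09:00:00', '2014-10-10T09:10:00'),
  ('B', 'Jim',  '2014-10-10T09:11:00', '2014-10-10T09:22:30'),
  ('A', 'Jill', '2014-10-10T09:00:00', NULL),
  ('C', 'Jack', '2014-10-10T09:45:00', '2014-10-10T10:00:00'),
  ('A', 'Jack', '2014-10-10T10:01:00', NULL);
-- The (user, timestamp) tuples to look up, one row each.
CREATE TABLE usertimes (user_name TEXT, thetime TEXT);
INSERT INTO usertimes VALUES
  ('Jim',  '2014-10-09T09:05:00'),
  ('Jim',  '2014-10-10T09:01:00'),
  ('Jill', '2014-10-10T09:10:00');
""")

# LEFT JOIN keeps every lookup row; the room is NULL when no event matches.
rows = conn.execute("""
SELECT ut.user_name, ut.thetime, e.room AS was_in_room
FROM usertimes ut
LEFT JOIN events e
  ON e.user_name = ut.user_name
 AND ut.thetime >= e.entered
 AND (e.exited IS NULL OR ut.thetime <= e.exited)
ORDER BY ut.rowid
""").fetchall()

for row in rows:
    print(row)
```

Jim's timestamp on 2014-10-09 has no matching event, so it comes back with None for the room instead of disappearing from the result.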
Answers from Jonas and Gordon got me on track, I think.
Here is a query that seems to do the job:
-- one row per (user, timestamp) tuple to look up
CREATE TABLE #SEARCH_PARAMETERS([User] VARCHAR(16), "Timestamp" DATETIME)
INSERT INTO #SEARCH_PARAMETERS([User], "Timestamp")
VALUES
('Jim', '2014-10-09T09:05:00'),
('Jim', '2014-10-10T09:01:00'),
('Jill', '2014-10-10T09:10:00')
-- LEFT JOIN keeps tuples that match no event (Room is NULL for them)
SELECT #SEARCH_PARAMETERS.*, Events.Room FROM #SEARCH_PARAMETERS
LEFT JOIN Events
ON #SEARCH_PARAMETERS.[User] = Events.[User] AND
#SEARCH_PARAMETERS."Timestamp" > Events.Entered AND
(Events.Exited IS NULL OR Events.Exited > #SEARCH_PARAMETERS."Timestamp")
DROP TABLE #SEARCH_PARAMETERS
By declaring a table-valued parameter type for the (user, timestamp) tuples, it should be simple to write a table-valued user-defined function that returns the desired result by joining the parameter table and the Events table. See http://msdn.microsoft.com/en-us/library/bb510489.aspx
Since you are using MyBatis it may be easier to just generate a table variable for the tuples inline in the query and join with that.
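Generating the tuples inline could look roughly like this: build a VALUES list with one (?, ?) placeholder pair per tuple and LEFT JOIN it to the events table. The sketch uses Python's sqlite3 as a stand-in; whereabouts() and the user_name column are hypothetical names. Bound parameters avoid quoting problems, but there are caps on how many a statement may carry (SQL Server allows 2,100 per request; SQLite's default limit is 999 before version 3.32.0 and 32,766 after), so more than 10,000 tuples would need to be sent in chunks or through a temporary table or table-valued parameter instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (room TEXT, user_name TEXT, entered TEXT, exited TEXT);
INSERT INTO events VALUES
  ('A', 'Jim',  '2014-10-10T09:00:00', '2014-10-10T09:10:00'),
  ('B', 'Jim',  '2014-10-10T09:11:00', '2014-10-10T09:22:30'),
  ('A', 'Jill', '2014-10-10T09:00:00', NULL),
  ('C', 'Jack', '2014-10-10T09:45:00', '2014-10-10T10:00:00'),
  ('A', 'Jack', '2014-10-10T10:01:00', NULL);
""")


def whereabouts(conn, tuples):
    """Return (user, timestamp, room-or-None) for each (user, timestamp) tuple."""
    if not tuples:
        return []
    # One "(?, ?)" per tuple; the values are bound, never pasted into the SQL text.
    values = ", ".join(["(?, ?)"] * len(tuples))
    sql = f"""
    WITH search(user_name, ts) AS (VALUES {values})
    SELECT s.user_name, s.ts, e.room
    FROM search AS s
    LEFT JOIN events AS e
      ON e.user_name = s.user_name
     AND s.ts >= e.entered
     AND (e.exited IS NULL OR s.ts <= e.exited)
    """
    params = [value for pair in tuples for value in pair]
    return conn.execute(sql, params).fetchall()


result = whereabouts(conn, [("Jim", "2014-10-09T09:05:00"),
                            ("Jim", "2014-10-10T09:01:00"),
                            ("Jill", "2014-10-10T09:10:00")])
print(result)
```

The query has no ORDER BY, so the row order is not guaranteed; add an index column to the VALUES list if the caller needs the original order back.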

Removing entries from a table where its values already exist in the table

I'm starting with this example table (#temp2):
| a | b |
|---|---|
| 2 | 4 |
| 2 | 5 | x
| 3 | 1 |
| 6 | 4 | x
| 6 | 5 |
| 7 | 5 | x
| 7 | 4 | x
|---|---|
This is a table of transaction keys that I want to delete from another existing table. It represents transactions that negate other transactions, where a negates b or vice versa, so a single a cannot negate multiple b values and a single b cannot negate multiple a values.
I have some logic that I thought would do it, but there is a problem. My existing logic checks whether a or b is a duplicate and deletes the row if it is. The problem is that when a row is deleted, I would like it to 'free up' the value that wasn't the reason for its deletion. I hope this isn't too confusing. In my example, I put an x next to each row I want deleted.
Currently my algorithm deletes too many rows. The row (6,5) gets deleted because a '5' already exists in the second row, but that row is itself being deleted (since '2' can't negate both '4' and '5'), which should 'free up' the '5' to negate entry 6.
This is my current code, but it is deleting too many rows:
delete t
from #temp2 t
where exists(select * from #temp2
where b = t.b
and a < t.a)
or exists(select * from #temp2
where a = t.a
and b < t.b)
Any help is much appreciated!
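The x marks match a greedy, order-dependent pass: walk the rows in (a, b) order and keep a row only if neither its a nor its b has already been claimed by a row that was kept. That visiting order is an assumption (the question does not state one), but it reproduces the marks exactly. Because each decision depends on earlier ones, a single set-based DELETE has trouble expressing it. The sketch below uses Python's sqlite3 as a stand-in for SQL Server (temp2 replaces #temp2): it first reproduces the over-deletion of the current query, then applies the greedy pass. In T-SQL the same loop could be written with a cursor or a WHILE loop.

```python
import sqlite3

ROWS = [(2, 4), (2, 5), (3, 1), (6, 4), (6, 5), (7, 5), (7, 4)]


def make_table():
    """Fresh in-memory copy of the example table (temp2 instead of #temp2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp2 (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO temp2 VALUES (?, ?)", ROWS)
    return conn


def remaining(conn):
    return conn.execute("SELECT a, b FROM temp2 ORDER BY a, b").fetchall()


# 1) The question's query in SQLite syntax: it also removes (6, 5), because the
#    EXISTS tests still see (2, 5), a row that is itself being deleted.
original = make_table()
original.execute("""
DELETE FROM temp2
WHERE EXISTS (SELECT * FROM temp2 x WHERE x.b = temp2.b AND x.a < temp2.a)
   OR EXISTS (SELECT * FROM temp2 x WHERE x.a = temp2.a AND x.b < temp2.b)
""")

# 2) Greedy pass: a row is kept only if neither value is claimed by a kept row.
#    Deleted rows claim nothing, which "frees up" their other value.
greedy = make_table()
used_a, used_b, doomed = set(), set(), []
for rowid, a, b in greedy.execute("SELECT rowid, a, b FROM temp2 ORDER BY a, b").fetchall():
    if a in used_a or b in used_b:
        doomed.append((rowid,))
    else:
        used_a.add(a)
        used_b.add(b)
greedy.executemany("DELETE FROM temp2 WHERE rowid = ?", doomed)

print(remaining(original))  # [(2, 4), (3, 1)]
print(remaining(greedy))    # [(2, 4), (3, 1), (6, 5)]
```

The greedy result deletes exactly the four rows marked with an x and keeps (6, 5).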