SQL Multiple Row Insert w/ multiple selects from different tables - sql

I am trying to do a multiple-row insert based on values that I am pulling from another table. Basically, I need to give all existing users access to a service when they previously had access to a different one. Table1 will take the data and run a job to do this.
INSERT INTO Table1 (id, serv_id, clnt_alias_id, serv_cat_rqst_stat)
SELECT
  (SELECT Max(id) + 1
   FROM Table1),
  '33',          --The new service id
  clnt_alias_id,
  'PI'           --The code to let the job know to grant access
FROM Table2
WHERE serv_id = '11' --The old service id
I am getting a Primary key constraint error on id.
Please help.
Thanks,
Colin

This query cannot work as written. The max(id) sub-select is evaluated only ONCE and returns the same value for every row in the parent query:
MariaDB [test]> create table foo (x int);
MariaDB [test]> insert into foo values (1), (2), (3);
MariaDB [test]> select *, (select max(x)+1 from foo) from foo;
+------+----------------------------+
| x    | (select max(x)+1 from foo) |
+------+----------------------------+
|    1 |                          4 |
|    2 |                          4 |
|    3 |                          4 |
+------+----------------------------+
3 rows in set (0.04 sec)
You will have to run your query multiple times, once for each record you're trying to copy. That way each max(id) picks up the id generated by the previous insert.
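Alternatively, if the database supports window functions, a single-statement version is possible (a sketch, not tested against the schema in the question): take one MAX(id) snapshot and offset it by a per-row number, so every inserted row gets a distinct id.

```sql
-- Sketch: assumes Table1/Table2 as described and window-function support.
-- MAX(id) is read once; ROW_NUMBER() makes each generated id unique.
INSERT INTO Table1 (id, serv_id, clnt_alias_id, serv_cat_rqst_stat)
SELECT (SELECT MAX(id) FROM Table1)
         + ROW_NUMBER() OVER (ORDER BY clnt_alias_id),
       '33',          -- the new service id
       clnt_alias_id,
       'PI'           -- grant-access code
FROM Table2
WHERE serv_id = '11'; -- the old service id
```

Note this is still racy if other sessions insert into Table1 concurrently; an auto_increment/identity column remains the robust fix.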

Is there a requirement that Table1.id be sequential integers? If not, just add the clnt_alias_id to Max(id). This is a nasty workaround, though, and you should really try to get that column's type changed to auto_increment, like Marc B suggested.
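Concretely, that workaround might look like this (a sketch; it assumes the clnt_alias_id values are distinct numbers, so the generated ids don't collide with each other):

```sql
-- Sketch of the Max(id) + clnt_alias_id workaround described above.
INSERT INTO Table1 (id, serv_id, clnt_alias_id, serv_cat_rqst_stat)
SELECT (SELECT MAX(id) FROM Table1) + clnt_alias_id,
       '33', clnt_alias_id, 'PI'
FROM Table2
WHERE serv_id = '11';
```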

Related

Create a procedure to merge data and avoid duplicates

I am trying to create a SQL Server merge procedure that would allow me to merge new entries into the data set and nullify duplicates in the table. Both tables are of the same type. The Id and Email will always be a one-to-one relation; however, the source table sometimes sends the same email with two different Ids. We want to keep only one record per person and nullify the email for the invalid record. My initial thought is to join the source table with the target table on email and check which emails occur twice, then nullify, but how can I put this in one procedure?
Table 1 and Table 2:
Id | Email | First | Last | Building | Date |....
Example of duplicate:
1 | tst@tst.com | ...
2 | tst@tst.com | ...
Needed output:
1 | tst@tst.com
2 | null
Procedure:
CREATE PROCEDURE mergingTwo @TableType
AS
BEGIN
    MERGE [target]
    USING [source] ON [target].Id = [source].Id OR [target].Email = [source].Email
    WHEN MATCHED THEN
        UPDATE
        SET
    WHEN NOT MATCHED BY TARGET THEN
        INSERT
You can do the MERGE first, then nullify the duplicate emails in a second update, like:
WITH cte AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS n_row
    FROM table_foo
)
UPDATE table_foo
SET email = NULL
FROM table_foo
INNER JOIN cte
    ON cte.id = table_foo.id
    AND cte.n_row > 1
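Putting both steps into one procedure might look roughly like this (a sketch, not a drop-in: the table-valued parameter type, the column list, and the table names are assumptions based on the question):

```sql
-- Sketch: @source is a hypothetical table-valued parameter matching the target.
CREATE PROCEDURE mergingTwo @source PersonTableType READONLY
AS
BEGIN
    -- Step 1: merge new entries by Id.
    MERGE [target] AS t
    USING @source AS s ON t.Id = s.Id
    WHEN MATCHED THEN
        UPDATE SET t.Email = s.Email, t.[First] = s.[First], t.[Last] = s.[Last]
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Email, [First], [Last])
        VALUES (s.Id, s.Email, s.[First], s.[Last]);

    -- Step 2: nullify the email on all but the lowest Id per email.
    WITH cte AS (
        SELECT Id, Email,
               ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) AS n_row
        FROM [target]
        WHERE Email IS NOT NULL
    )
    UPDATE cte SET Email = NULL WHERE n_row > 1;
END
```

The second UPDATE goes through an updatable CTE, which SQL Server allows as long as the updated column appears in the CTE's select list.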
Sounds like a job for a union (unless you really want those NULL entries).
SELECT Email FROM Table1
UNION
SELECT Email FROM Table2
;

SQL Insert only the rows that have data in a specific column

I am basically a noob at this and have gotten this far from Google searches alone. This is an Access VBA and SQL inventory database.
I have a table that I populate from a barcode scanner that looks like the following:
PartNo | SerialNo | Qty | Vehicle
-------+----------+-----+---------
test | | 1 | H2
test2 | | 1 | H2
test3 | test3s/n | 1 | H2
test3 | test4s/n | 1 | H2
test | | 1 | H2
I am trying to update 2 tables from this, or insert if the PartNo doesn't exist.
tblPerm2 has PartNo as primary key
tblPerm1 has PartNo, SerialNo, Qty and Vehicle
PartNo must exist in tblPerm2 to be added to tblPerm1
I can get the PartNo inserted into tblPerm2 no problem, but I'm running into problems with tblPerm1.
I'm following user Parfait's example here, Update Existing Access Records from CSV Import , native to MS Access or in VB.NET
I've tried an INSERT and an INSERT with a join. The code below adds everything to tblPerm1, including rows with no SerialNo. How can I insert only the rows from tblTemp that have a serial number?
INSERT INTO tblPerm1 (PartNo, SerialNo, Qty, Vehicle)
SELECT tblTemp.PartNo, tblTemp.SerialNo, tblTemp.Qty, tblTemp.Vehicle
FROM tblTemp
WHERE tblTemp.SerialNo IS NOT NULL;
I expect this to only insert the 2 'test3' rows, but all rows are inserted.
SELECT DISTINCT behaves the same, except there is only one entry for 'test'.
Once this is done, I'll delete from tblTemp and continue on updating and inserting. Maybe there is a better way?
Thanks in advance
Are the SerialNo columns actually empty strings instead of NULL?
If this works, then yes they are:
INSERT INTO tblPerm1 (PartNo, SerialNo, Qty, Vehicle)
SELECT tblTemp.PartNo, tblTemp.SerialNo, tblTemp.Qty, tblTemp.Vehicle
FROM tblTemp
WHERE tblTemp.SerialNo <> '';
See How to check for Is not Null And Is not Empty string in SQL server? for more on checking for empty strings, with or without counting whitespace (though the details may vary depending on which database engine you are running).
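In Access specifically, one way to cover NULL, empty, and whitespace-only serials with a single predicate (a sketch, relying on Access SQL's `&` concatenation turning NULL into an empty string) is:

```sql
-- Sketch for Access SQL: SerialNo & '' converts NULL to '', Trim drops whitespace.
INSERT INTO tblPerm1 (PartNo, SerialNo, Qty, Vehicle)
SELECT tblTemp.PartNo, tblTemp.SerialNo, tblTemp.Qty, tblTemp.Vehicle
FROM tblTemp
WHERE Trim(tblTemp.SerialNo & '') <> '';
```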

Return rows of a table that actually changed in an UPDATE

Using Postgres, I can perform an update statement and return the rows affected by the command.
UPDATE accounts
SET status = merge_accounts.status,
    field1 = merge_accounts.field1,
    field2 = merge_accounts.field2,
    etc.
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
RETURNING accounts.*
This will give me a list of all records that matched the WHERE clause, however will not tell me which rows were actually updated by the operation.
In this simplified use-case it would of course be trivial to simply add another guard, AND status != 'Closed'; however, my real-world use-case involves updating potentially dozens of fields from a merge table with 10,000+ rows, and I want to be able to detect which rows were actually changed and which are identical to their previous version. (The expectation is that very few rows will actually have changed.)
The best I've got so far is
UPDATE accounts
SET x=..., y=...
FROM accounts as old WHERE old.uid = accounts.uid
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
RETURNING accounts, old
Which will return a tuple of old and new rows that can then be diff'ed inside my Java codebase itself - however this requires significant additional network traffic and is potentially error prone.
The ideal scenario is to be able to have postgres return just the rows that actually had any values changed - is this possible?
Here on github is a more real world example of what I'm doing, incorporating some of the suggestions so far.
Using Postgres 9.1, but can use 9.4 if required. The requirements are effectively
Be able to perform an upsert of new data
Where we may only know the specific key/value pair to update on any given row
Get back a result containing just the rows that were actually changed by the upsert
Bonus - get a copy of the old records as well.
Since this question was opened I've gotten most of this working now, although I'm unsure if my approach is a good idea or not - it's a bit hacked together.
Only update rows that actually change
That saves expensive updates and expensive checks after the UPDATE.
To update every column with the new value provided (if anything changes):
UPDATE accounts a
SET   (status, field1, field2)        -- short syntax for ..
    = (m.status, m.field1, m.field2)  -- .. updating multiple columns
FROM   merge_accounts m
WHERE  m.uid = a.uid
AND   (a.status IS DISTINCT FROM m.status OR
       a.field1 IS DISTINCT FROM m.field1 OR
       a.field2 IS DISTINCT FROM m.field2)
RETURNING a.*;
Due to PostgreSQL's MVCC model any change to a row writes a new row version. Updating a single column is almost as expensive as updating every column in the row at once. Rewriting the rest of the row comes at practically no cost, as soon as you have to update anything.
Details:
How do I (or can I) SELECT DISTINCT on multiple columns?
UPDATE a whole row in PL/pgSQL
Shorthand for whole rows
If the row types of accounts and merge_accounts are identical and you want to adopt everything from merge_accounts into accounts, there is a shortcut comparing the whole row type:
UPDATE accounts a
SET   (status, field1, field2)
    = (m.status, m.field1, m.field2)
FROM   merge_accounts m
WHERE  a.uid = m.uid
AND    m IS DISTINCT FROM a
RETURNING a.*;
This even works for NULL values. Details in the manual.
But it's not going to work for your home-grown solution where (quoting your comment):
merge_accounts is identical, save that all non-pk columns are array types
It requires compatible row types, i.e. each column shares the same data type or there is at least an implicit cast between the two types.
For your special case
UPDATE accounts a
SET   (status, field1, field2)
    = (COALESCE(m.status[1], a.status)  -- default to original ..
     , COALESCE(m.field1[1], a.field1)  -- .. if m.column[1] IS NULL
     , COALESCE(m.field2[1], a.field2))
FROM   merge_accounts m
WHERE  m.uid = a.uid
AND   (m.status[1] IS NOT NULL AND a.status IS DISTINCT FROM m.status[1]
    OR m.field1[1] IS NOT NULL AND a.field1 IS DISTINCT FROM m.field1[1]
    OR m.field2[1] IS NOT NULL AND a.field2 IS DISTINCT FROM m.field2[1])
RETURNING a.*;
m.status IS NOT NULL works if columns that shouldn't be updated are NULL in merge_accounts.
m.status <> '{}' if you operate with empty arrays.
m.status[1] IS NOT NULL covers both options.
Related:
Return pre-UPDATE column values using SQL only
If you aren't relying on side effects of the update, only update the records that need to change:
UPDATE accounts
SET status = merge_accounts.status,
    field1 = merge_accounts.field1,
    field2 = merge_accounts.field2,
    etc.
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
AND NOT (status IS NOT DISTINCT FROM merge_accounts.status
     AND field1 IS NOT DISTINCT FROM merge_accounts.field1
     AND field2 IS NOT DISTINCT FROM merge_accounts.field2
    )
RETURNING accounts.*
I would recommend using the information_schema.columns table to introspect the columns dynamically, and then use those within a plpgsql function to dynamically generate the UPDATE statement.
i.e. this DDL:
create table foo
(
id serial,
val integer,
name text
);
insert into foo (val, name) VALUES (10, 'foo'), (20, 'bar'), (30, 'baz');
And this query:
select column_name
from information_schema.columns
where table_name = 'foo'
order by ordinal_position;
will yield the columns for the table in the order that they were defined in the table DDL.
Essentially you would use the above SELECT within the function to dynamically build up your UPDATE statement by iterating over the results of the above SELECT in a FOR LOOP to dynamically build up both the SET and WHERE clauses.
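A minimal sketch of that idea (assuming the foo table above plus a hypothetical staging table foo_staging with the same columns) could build the IS DISTINCT FROM predicate with string_agg and run it with EXECUTE:

```sql
-- Sketch: dynamically build the "changed row" predicate from information_schema.
DO $$
DECLARE
    predicate text;
BEGIN
    SELECT string_agg(format('t.%1$I IS DISTINCT FROM s.%1$I', column_name), ' OR ')
    INTO   predicate
    FROM   information_schema.columns
    WHERE  table_name = 'foo'
    AND    column_name <> 'id';   -- don't compare the key itself

    EXECUTE format(
        'UPDATE foo t SET (val, name) = (s.val, s.name)
         FROM foo_staging s
         WHERE s.id = t.id AND (%s)', predicate);
END $$;
```

In a real function you would generate the SET list the same way rather than hard-coding it.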
Some variation of this?
SELECT * FROM old;
 id | val
----+-----
  1 |   1
  2 |   2
  4 |   5
  5 |   1
  6 |   2
SELECT * FROM new;
 id | val
----+-----
  1 |   2
  2 |   2
  3 |   2
  5 |   1
  6 |   1
SELECT * FROM old JOIN new ON old.id = new.id;
 id | val | id | val
----+-----+----+-----
  1 |   1 |  1 |   2
  2 |   2 |  2 |   2
  5 |   1 |  5 |   1
  6 |   2 |  6 |   1
(4 rows)
WITH sel AS (
    SELECT o.id, o.val FROM old o JOIN new n ON o.id = n.id ),
upd AS (
    UPDATE old SET val = new.val FROM new WHERE new.id = old.id RETURNING old.* )
SELECT * FROM sel, upd WHERE sel.id = upd.id AND sel.val <> upd.val;
 id | val | id | val
----+-----+----+-----
  1 |   1 |  1 |   2
  6 |   2 |  6 |   1
(2 rows)
Refer to the SO answer and read the entire discussion.
If you are updating a single table and want to know if the row is actually changed you can use this query:
with rows_affected as (
update mytable set (field1, field2, field3)=('value1', 'value2', 3) where id=1 returning *
)
select count(*)>0 as is_modified from rows_affected
join mytable on mytable.id=rows_affected.id
where rows_affected is distinct from mytable;
And you can wrap your existing queries into this one without the need to modify the actual update statements.

SQL How to insert a new row in the middle of the table

Here is my problem:
I want to insert a new row in my table, but there are already some records in it. If I need to put this new row at the position where an existing record is, what should I do?
For example:
I have this table with this rows:
ID|Value
1 |Sample1
2 |Sample2
3 |Sample3
But now I want to insert a new row where Sample2 is, so the table should be like:
ID|Value
1 |Sample1
2 |NewSample
3 |Sample2
4 |Sample3
Any thoughts?
Any thoughts?
Yes. Please forget about changing the primary key (the ID) if you have references somewhere.
Rather add a column (e.g. ViewOrder) which is handling this explicitly for you:
ID|Value | ViewOrder
1 |Sample1 |1
5 |NewSample |2
2 |Sample2 |3
3 |Sample3 |4
Query to select:
SELECT ID, Value, ViewOrder FROM yourTable ORDER BY ViewOrder
Insert / Update would look something like this (where @YourRowIndex is the index at which you wish to insert your new row, of course):
UPDATE dbo.table SET ViewOrder = ViewOrder + 1 WHERE ViewOrder >= @YourRowIndex;
INSERT dbo.table (Value, ViewOrder) VALUES (@YourValue, @YourRowIndex);
The easy way is to add a new column and set it to the same value as ID. Then you have two choices: if you make it numeric, you can just add a value in between:
ID | Value | OrderCol
1 | Sample1 | 1
4 | NewSample | 1.5
2 | Sample2 | 2
3 | Sample3 | 3
Your other option is to renumber the order column, which can be slow if you have a lot of rows in the table.
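The renumbering itself can be done in one statement (a sketch in SQL Server syntax; the table and column names are assumptions):

```sql
-- Sketch: compact OrderCol back to consecutive integers via an updatable CTE.
WITH ordered AS (
    SELECT OrderCol, ROW_NUMBER() OVER (ORDER BY OrderCol) AS rn
    FROM dbo.SampleTable
)
UPDATE ordered SET OrderCol = rn;
```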
You probably don't want to change ID since there might be an external table which references this identifier.
In SQL Server, the basic approach would be:
DECLARE @Value VARCHAR(32), @ID INT = 2;
UPDATE dbo.table SET ID = ID + 1 WHERE ID >= 2;
INSERT dbo.table (ID, Value) SELECT @ID, @Value;
But keep in mind that if these values are referenced in other tables, or end users know what ID = 3 currently refers to, this is going to mess all that up (or not be possible).
Also an important thing to remember is that, by definition, a table is an unordered set of rows - there is no "middle" of a table.

Is it possible to use a PG sequence on a per record label?

Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body       | ...
---+---------+--------+------------+----
 - |       4 |      1 | "abc...."
 - |       4 |      2 | "def...."
 - |       5 |      1 | "ghi...."
 - |       5 |      2 | "xyz...."
 - |       5 |      3 | "123...."
This would be useful to generate custom urls for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?
You can use a subquery in the INSERT statement like @Clodoaldo demonstrates. However, this defeats the nature of a sequence as being safe to use in concurrent transactions; it will result in race conditions and eventually duplicate key violations.
You should rather rethink your approach. Just one plain sequence for your table and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
            , user_id
            , row_number() OVER (PARTITION BY user_id ORDER BY seq_id))
FROM   tbl;
db<>fiddle here
Old sqlfiddle
Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead to the setup as you will need triggers for managing the DDL statements for the partitions, but would effectively result in each user having their own table of posts, along with their own sequence with the benefit of being able to treat all posts as one big table also.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
 user_id | seq_id
---------+--------
       1 |      1
       1 |      2
       2 |      1
       2 |      2
(4 rows)
I left out some rather important CHECK constraints in the above setup; make sure you read the docs for how these kinds of setups are used.
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although @Erwin's advice is sensible (that is, a single sequence with the ordering done in the select query), it can be expensive.
If you don't use a sequence, there is no defeat of the nature of the sequence, and it will not result in duplicate key violations. To demonstrate, I created a table and wrote a Python script that inserts into it. I launched 3 parallel instances of the script, inserting as fast as possible, and it just works.
The table must have a primary key on those columns:
create table t (
    user_id int,
    seq_id int,
    primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python
import psycopg2, psycopg2.extensions

query = """
    begin;
    insert into t (user_id, seq_id) values
    (4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
    commit;
"""
conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()
for i in range(0, 1000):
    while True:
        try:
            cursor.execute(query)
            break
        except psycopg2.IntegrityError as e:
            print(e.pgerror)
            cursor.execute("rollback;")
cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;

 count | max
-------+------
  3000 | 3000
Just as expected. I have developed at least two applications using this logic, and one of them is more than 13 years old and has never failed. I concede that if you are Facebook or some other giant, then you could have a problem.
Yes:
CREATE TABLE your_table
(
column type DEFAULT NEXTVAL('sequence_name'),
...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html
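For example (a sketch; note that this gives you one global sequence, so seq_id keeps counting across all users rather than restarting per user_id, which is what the question actually asks for):

```sql
CREATE SEQUENCE posts_seq;

CREATE TABLE posts (
    user_id integer,
    seq_id  integer DEFAULT NEXTVAL('posts_seq'),
    body    text
);
```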