How to insert generated id into a results table - sql

I have the following query:
SELECT q.pol_id
FROM quot q
JOIN fgn_clm_hist fch
  ON q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table, for the rows in the result set above, with the generated primary key from the inserted row in table1.
table1 has two columns: id and timestamp.
I'm using db2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!

Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
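A minimal DB2 sketch of that approach, assuming quot_id can serve as the key to join back to quot (the result-table name and columns are illustrative):
CREATE TABLE result_ids (
    id INT NOT NULL GENERATED ALWAYS AS IDENTITY,
    quot_id INT, -- key used to join back to quot
    ts TIMESTAMP DEFAULT CURRENT TIMESTAMP
);

INSERT INTO result_ids (quot_id)
SELECT quot_id FROM ( <your UNION query, selecting quot_id as well> ) AS t;

UPDATE quot q
SET pol_id = (SELECT r.id FROM result_ids r WHERE r.quot_id = q.quot_id)
WHERE EXISTS (SELECT 1 FROM result_ids r WHERE r.quot_id = q.quot_id);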
Alternatively, you can do it more manually using the ROW_NUMBER() OLAP function, which I often find convenient for creating IDs. For this, use a stored procedure which does the following (a sketch for step 1 follows these steps):
1. Get the maximum old id from table1 and write it into a variable old_max_id.
2. After generating the result set, write the row numbers into table1, maybe with something like
INSERT INTO TABLE1
-- no PARTITION BY here: we want a single id sequence across all rows
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>) + OLD_MAX_ID
     , CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
3. Either write the result set into a table or return a cursor to it. Here you should either reuse the same ROW_NUMBER expression as above or read the ID directly from table1.
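For step 1, a minimal SQL PL fragment (COALESCE guards the case of an empty table1):
-- at the top of the procedure's compound statement
DECLARE OLD_MAX_ID INTEGER DEFAULT 0;
-- step 1: remember the current maximum id
SET OLD_MAX_ID = (SELECT COALESCE(MAX(id), 0) FROM table1);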

Related

Repeating Max(timestamp) value for all the duplicate ID rows in BigQuery

This might be simple, but I am having trouble resolving this issue.
I have a table with the following columns and data:
There are multiple entries for an id with different updateTimestamp values. I want to identify the max timestamp and repeat that value for all the duplicate ids in the table. Also, the table is large and I do not want to query it multiple times (the process is complex).
Here is what I am expecting the output to look like.
You need to find the maximum updateTimestamp for each ID and update the table using the ID column. Something like:
update <table_name> tgt
set max_updatetimestamp = src.maxstamp -- BigQuery does not allow the table alias in the SET clause
from (select id, max(updatetimestamp) as maxstamp from <table_name> group by 1) src
where tgt.id = src.id;
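If the goal is to read the table only once, a hedged alternative is an analytic function that computes the per-id maximum in a single pass (column names taken from the question):
select id,
       updateTimestamp,
       max(updateTimestamp) over (partition by id) as max_updatetimestamp
from <table_name>;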

Merge update records in a final table

I have a user table in Hive of the form:
User:
Id String,
Name String,
Col1 String,
UpdateTimestamp Timestamp
I'm inserting data in this table from a file which has the following format:
I/U,Timestamp when record was written to file, Id, Name, Col1, UpdateTimestamp
e.g. for inserting a user with Id 1:
I,2019-08-21 14:18:41.002947,1,Bob,stuff,123456
and updating col1 for the same user with Id 1:
U,2019-08-21 14:18:45.000000,1,,updatedstuff,123457
The columns which are not updated are returned as null.
Now simple insertion is easy in Hive: LOAD DATA INPATH into a staging table, then ignore the first two fields from the staging table.
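A hedged sketch of that staging load (the staging table name, file path, and column types are assumptions):
create table staging (
  type string, -- I/U flag
  file_ts string, -- timestamp when the record was written to the file
  id int,
  name string,
  col1 string,
  updatetimestamp timestamp
)
row format delimited fields terminated by ',';

load data inpath '/path/to/users_file' into table staging;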
However, how would I go about the update statements? I want my final row in Hive to look like this:
1,Bob,updatedstuff,123457
I was thinking of inserting all rows into a staging table and then performing some sort of merge query. Any ideas?
Typically with a merge statement your "file" would still be unique on ID and the merge statement would determine whether it needs to insert this as a new record, or update values from that record.
However, if the file is non-negotiable and will always have the I/U format, you could break the process up into two steps, the insert, then the updates, as you suggested.
In order to perform updates in Hive, you will need the users table to be stored as ORC and have ACID enabled on your cluster. For my example, I would create the users table with a cluster key, and the transactional table property:
create table test.orc_acid_example_users
(
id int
,name string
,col1 string
,updatetimestamp timestamp
)
clustered by (id) into 5 buckets
stored as ORC
tblproperties('transactional'='true');
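A minimal sketch of the insert step, assuming the staging table test.orc_acid_example_staging mirrors the file's columns (type, file_ts, id, name, col1, updatetimestamp):
insert into test.orc_acid_example_users
select id, name, col1, updatetimestamp
from test.orc_acid_example_staging
where type = 'I';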
After your insert statements, your Bob record would say "stuff" in col1.
As far as the updates - you could tackle these with an update or merge statement. I think the key here is the null values: it's important to keep the original name, or col1, or whatever, if the staging table from the file has a null value. Here's a merge example which coalesces the staging table's fields. Basically, if there is a value in the staging table, take that, or else fall back to the original value.
merge into test.orc_acid_example_users as t
using test.orc_acid_example_staging as s
on t.id = s.id
and s.type = 'U'
when matched
then update set name = coalesce(s.name, t.name),
                col1 = coalesce(s.col1, t.col1),
                -- also carry the new updatetimestamp so the final row shows 123457
                updatetimestamp = coalesce(s.updatetimestamp, t.updatetimestamp)
Now Bob will show "updatedstuff".
Quick disclaimer - if you have more than one update for Bob in the staging table, things will get messy. You will need a pre-processing step to get the latest non-null values of all the updates prior to doing the update/merge. Hive isn't really a complete transactional DB - it would be preferable for the source to send full user records any time there's an update, instead of just the changed fields.
You can reconstruct each record in the table using last_value() with the ignore-nulls option:
select h.id,
       coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.updatetimestamp)) as name,
       coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.updatetimestamp)) as col1,
       h.updatetimestamp
from history h;
You can use row_number() and a subquery if you want the most recent record.
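A hedged sketch of that variant, keeping only the newest row per id (apply it to the reconstructed rows above if the newest raw row still has nulls):
select id, name, col1, updatetimestamp
from (select h.*,
             row_number() over (partition by h.id order by h.updatetimestamp desc) as rn
      from history h
     ) ranked
where rn = 1;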

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex in order to be safe against concurrent write access. Take a look at this Postgres Wiki page, which served as a log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for a future release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned. For example, if a row was locked but not updated because an ON CONFLICT DO UPDATE ... WHERE clause condition was not satisfied, the row will not be returned.
Bold emphasis mine.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
   INSERT INTO users (name)
   VALUES ('new_usr_name')  -- input value
   ON CONFLICT (name) DO NOTHING
   RETURNING users.id
   )
SELECT id FROM ins
UNION ALL
SELECT id FROM users  -- 2nd SELECT never executed if INSERT successful
WHERE  name = 'new_usr_name'  -- input value a 2nd time
LIMIT  1;
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
For a single row insert and no update:
with i as (
   insert into users (name)
   select 'the name'
   where not exists (
      select 1
      from users
      where name = 'the name'
      )
   returning id
   )
select id
from users
where name = 'the name'
union all
select id from i
The manual, about the primary query and the WITH sub-statements:
The primary query and the WITH queries are all (notionally) executed at the same time
Although that sounds like "same snapshot" to me, I'm not sure, since I don't know what notionally means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, that same snapshot bit prevents a race condition. But again, I'm not sure whether by all the statements it refers only to the statements in the WITH subqueries, excluding the main query. To avoid any doubt, move the SELECT in the previous query into a WITH subquery:
with s as (
   select id
   from users
   where name = 'the name'
), i as (
   insert into users (name)
   select 'the name'
   where not exists (select 1 from s)
   returning id
)
select id from s
union all
select id from i

SQL insert row with one change

I have this table:
Table1:
id text
1 lala
And I want to take the first row and copy it, but with the id changed from 1 to 2.
Can you help me with this problem?
A SQL table has no concept of "first" row. You can however select a row based on its characteristics. So, the following would work:
insert into Table1(id, text)
select 2, text
from Table1
where id = 1;
As another note, when creating the table, you can have the id column be auto-incremented. The syntax varies from database to database. If id were auto-incremented, then you could just do:
insert into Table1(text)
select text
from Table1
where id = 1;
And you would be confident that the new row would have a unique id.
Kate - Gordon's answer is technically correct. However, I would like to know more about why you want to do this.
If your intent is to have the field increment with the insertion of each new row, manually setting the id column value isn't a great idea - it becomes very easy for two rows to conflict by attempting to use the same id at the same time.
I would recommend using an IDENTITY field for this (MS SQL Server -- use an AUTO_INCREMENT field in MySQL). You could then do the insert as follows:
INSERT INTO Table1 (text)
SELECT text
FROM Table1
WHERE id = 1
SQL Server would automatically assign a new, unique value to the id field.
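A hedged sketch of that setup (SQL Server syntax; the varchar size is an assumption):
CREATE TABLE Table1 (
    id INT IDENTITY(1,1) PRIMARY KEY, -- assigned automatically on each insert
    [text] VARCHAR(100)
);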

Return multiple tables from a T-SQL function in SQL Server 2008

You can return a single table from a T-SQL Function in SQL Server 2008.
I am wondering if it is possible to return more than one table.
The scenario is that I have three queries that filter 3 different tables. Each table is filtered against 5 filter tables that I would like to return from a function, rather than copying and pasting their creation into each query.
A simplified example of what this would look like with copy and paste:
CREATE FUNCTION GetValuesA(@SomeParameter int) RETURNS @ids TABLE (ID int) AS
BEGIN
    WITH Filter1 AS ( SELECT id FROM FilterTable1 WHERE Attribute = @SomeParameter )
       , Filter2 AS ( SELECT id FROM FilterTable2 WHERE Attribute = @SomeParameter )
    INSERT INTO @ids
    SELECT ID FROM ValueTableA
    WHERE ColA IN (SELECT id FROM Filter1)
      AND ColB IN (SELECT id FROM Filter2)
    RETURN
END
-----------------------------------------------------------------------------
CREATE FUNCTION GetValuesB(@SomeParameter int) RETURNS @ids TABLE (ID int) AS
BEGIN
    WITH Filter1 AS ( SELECT id FROM FilterTable1 WHERE Attribute = @SomeParameter )
       , Filter2 AS ( SELECT id FROM FilterTable2 WHERE Attribute = @SomeParameter )
    INSERT INTO @ids
    SELECT ID FROM ValueTableB
    WHERE ColA IN (SELECT id FROM Filter1)
      AND ColB IN (SELECT id FROM Filter2)
      AND ColC IN (SELECT id FROM Filter2)
    RETURN
END
So, the only difference between the two queries is the table being filtered and how it is filtered (the WHERE clause).
I would like to know if I could return Filter1 & Filter2 from a function. I am also open to suggestions on different ways to approach this problem.
No.
Conceptually, how would you expect to handle a function that returned a variable number of tables? You would JOIN on two tables at once? What if the returned fields don't line up?
Is there some reason you can't have a TVF for each filter?
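For instance, a hedged sketch of one inline TVF per filter (GetFilter2 would be defined analogously; table and column names come from the question):
CREATE FUNCTION dbo.GetFilter1(@SomeParameter int) RETURNS TABLE AS
RETURN (SELECT id FROM FilterTable1 WHERE Attribute = @SomeParameter);
GO

DECLARE @SomeParameter int = 42; -- example value
SELECT ID FROM ValueTableA
WHERE ColA IN (SELECT id FROM dbo.GetFilter1(@SomeParameter))
  AND ColB IN (SELECT id FROM dbo.GetFilter2(@SomeParameter));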
As others say, NO. A function in TSQL must return exactly one result (although that result can come in the form of a table with numerous values).
There are a couple of ways you could achieve something similar, though. A stored procedure can execute multiple select statements and deliver the results up to whatever called it, whether that be an application layer or something like SSMS. Many libraries require additional commands to access the result sets after the first one; for instance, in Pyodbc you need to call cursor.nextset().
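A hedged sketch of that stored-procedure route (the procedure name is an assumption; the filter tables come from the question):
CREATE PROCEDURE GetFilters @SomeParameter int AS
BEGIN
    -- each SELECT becomes its own result set for the caller
    SELECT id FROM FilterTable1 WHERE Attribute = @SomeParameter;
    SELECT id FROM FilterTable2 WHERE Attribute = @SomeParameter;
END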
Also, inside a function you could UNION several result sets together, although that requires each result set to have the same columns. One way to achieve that when they have different column structures is to add nulls for the missing columns in each select statement. If you need to know which select statement returned a value, you can also add a column indicating that. This should work with your simplified example, since in each case it is just returning a single ID column, but it could get awkward very quickly if the column names or types are radically different.
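A hedged sketch of that UNION shape, with a discriminator column marking which filter produced each row:
SELECT 'Filter1' AS source, id FROM FilterTable1 WHERE Attribute = @SomeParameter
UNION ALL
SELECT 'Filter2' AS source, id FROM FilterTable2 WHERE Attribute = @SomeParameter;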