Repeating Max(timestamp) value for all the duplicate ID rows in bigquery - google-bigquery

This might be simple, but I am having trouble resolving this issue.
I have a table with the following columns and data:
There are multiple entries for an id, each with a different updateTimestamp. I want to identify the max timestamp and repeat that value for all the duplicate ids in the table. Also, the table is large and I do not want to query it multiple times (the process is complex).
Here is what I am expecting the output to look like

You need to find the maximum update timestamp for each ID and then update the table by joining on the ID column. This assumes the table already has a max_updatetimestamp column to hold the result. Something like:
update <table_name> tgt
set max_updatetimestamp = src.maxstamp
from (select id, max(updatetimestamp) as maxstamp from <table_name> group by 1) src
where tgt.id = src.id;
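If the maximum only needs to appear next to each row in the query result rather than be stored, a window function may be enough, since MAX(...) OVER (PARTITION BY id) repeats the per-id maximum on every row in one query. The sketch below runs the idea against an in-memory SQLite database (3.25 or newer) as a stand-in for BigQuery; the table name events and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical table and sample rows; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, updateTimestamp TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2020-01-01 10:00:00"),
        (1, "2020-01-03 09:30:00"),
        (2, "2020-01-02 08:00:00"),
        (2, "2020-01-02 12:00:00"),
        (3, "2020-01-05 00:00:00"),
    ],
)

# MAX(...) OVER (PARTITION BY id) attaches the per-id maximum to every row
# in one query, without a self-join or a separate aggregate query.
rows = conn.execute(
    """
    SELECT id,
           updateTimestamp,
           MAX(updateTimestamp) OVER (PARTITION BY id) AS maxUpdateTimestamp
    FROM events
    ORDER BY id, updateTimestamp
    """
).fetchall()

for row in rows:
    print(row)
```

The same SELECT shape should carry over to BigQuery, where the result could be written to a new table instead of updating the original in place.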

Related

Merge update records in a final table

I have a user table in Hive of the form:
User:
Id String,
Name String,
Col1 String,
UpdateTimestamp Timestamp
I'm inserting data in this table from a file which has the following format:
I/U,Timestamp when record was written to file, Id, Name, Col1, UpdateTimestamp
e.g. for inserting a user with Id 1:
I,2019-08-21 14:18:41.002947,1,Bob,stuff,123456
and updating col1 for the same user with Id 1:
U,2019-08-21 14:18:45.000000,1,,updatedstuff,123457
The columns which are not updated are returned as null.
Simple insertion is easy in Hive: load the file into a staging table with LOAD DATA INPATH, then ignore the first two fields when copying rows out of the staging table.
However, how would I go about the update statements? So that my final row in hive looks like below:
1,Bob,updatedstuff,123457
I was thinking to insert all rows in a staging table and then perform some sort of merge query. Any ideas?
Typically with a merge statement your "file" would still be unique on ID and the merge statement would determine whether it needs to insert this as a new record, or update values from that record.
However, if the file is non-negotiable and will always have the I/U format, you could break the process up into two steps, the insert, then the updates, as you suggested.
In order to perform updates in Hive, you will need the users table to be stored as ORC and have ACID enabled on your cluster. For my example, I would create the users table with a cluster key, and the transactional table property:
create table test.orc_acid_example_users
(
id int
,name string
,col1 string
,updatetimestamp timestamp
)
clustered by (id) into 5 buckets
stored as ORC
tblproperties('transactional'='true');
After your insert statements, your Bob record would say "stuff" in col1:
As for the updates, you could tackle these with an update or merge statement. I think the key here is the null values: if the staging table from the file has a null value, it's important to keep the original name, col1, or whatever the field is. Here's a merge example which coalesces the staging table's fields. Basically, if there is a value in the staging table, take that, or else fall back to the original value.
merge into test.orc_acid_example_users as t
using test.orc_acid_example_staging as s
on t.id = s.id
and s.type = 'U'
when matched
then update set name = coalesce(s.name,t.name), col1 = coalesce(s.col1, t.col1)
Now Bob will show "updatedstuff"
Quick disclaimer - if you have more than one update for Bob in the staging table, things will get messy. You will need to have a pre-processing step to get the latest non-null values of all the updates prior to doing the update/merge. Hive isn't really a complete transactional DB - it would be preferred for the source to send full user records any time there's an update, instead of just the changed fields only.
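To make the pre-processing step concrete, here is a minimal sketch in plain Python of collapsing several updates for one id into a single record that holds the latest non-null value of each field. The function name collapse_updates, the tuple layout, and the second staged row (the rename to "Robert") are assumptions made for illustration; in practice this logic would more likely live in a Hive query over the staging table.

```python
from typing import Dict, List, Optional, Tuple

# (written_at, id, name, col1, update_timestamp); None marks an unchanged field.
Update = Tuple[str, int, Optional[str], Optional[str], Optional[int]]


def collapse_updates(updates: List[Update]) -> Dict[int, dict]:
    """Fold all updates per id into one record holding the latest non-null values."""
    latest: Dict[int, dict] = {}
    # Apply updates oldest-first so later non-null values overwrite earlier ones.
    for written_at, user_id, name, col1, update_ts in sorted(updates, key=lambda u: u[0]):
        record = latest.setdefault(
            user_id, {"name": None, "col1": None, "update_timestamp": None}
        )
        for field, value in (("name", name), ("col1", col1), ("update_timestamp", update_ts)):
            if value is not None:
                record[field] = value
    return latest


# The first row is the update from the question; the second is hypothetical.
staged = [
    ("2019-08-21 14:18:45.000000", 1, None, "updatedstuff", 123457),
    ("2019-08-21 14:18:50.000000", 1, "Robert", None, 123458),
]
collapsed = collapse_updates(staged)
print(collapsed[1])
```

A field that is still None after collapsing was never touched by any update, so the merge can fall back to the existing value with coalesce as before.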
You can reconstruct each record in the table using last_value() with the ignore-nulls option (the second argument set to true):
select h.id,
coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.timestamp)) as name,
coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.timestamp)) as col1,
h.update_timestamp
from history h;
You can use row_number() and a subquery if you want the most recent record.

Using REPLACE AND LIKE

I am trying to update a column in a temporary table using Oracle 11g SQL. The column holds a unique ID that is 12 digits long. I need to join this table to another table on that ID (the unique ID serves as the PK = FK), but the two tables format the ID slightly differently. The examples below show the difference.
UniqueID Column from TABLE xyz Syntax
AB10783421111111
UniqueID Column from TABLE zxo Syntax
383421111111
You see how the numbers are identical except for the AB107 and first '3' in the zxo table? I would like to know why both these queries are not running
UPDATE temp37 SET UNIQUE_ID = REPLACE(UNIQUE_ID, (LIKE 'AB107%'), (LIKE '3%'));
UPDATE temp37
SET UNIQUE_ID = '3%'
WHERE UNIQUE_ID = 'AB107%';
Essentially I would like to replace every case of an id with AB10755555555555 to 355555555555. Thank you for any help.
You can do:
UPDATE temp37 SET UNIQUE_ID = REPLACE(UNIQUE_ID, 'AB107', '3');
OR
UPDATE temp37 SET UNIQUE_ID = CONCAT('3', substr(UNIQUE_ID, 6)) WHERE UNIQUE_ID LIKE 'AB107%';
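The two statements are not quite equivalent: REPLACE rewrites every occurrence of 'AB107' in the value, while the SUBSTR form only touches a leading 'AB107'. The sketch below shows the difference using an in-memory SQLite database as a stand-in for Oracle (SQLite uses || for concatenation). The second sample value, which contains 'AB107' twice, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp37 (unique_id TEXT)")
conn.executemany(
    "INSERT INTO temp37 VALUES (?)",
    [
        ("AB10783421111111",),   # value from the question
        ("AB1078AB107111111",),  # hypothetical: 'AB107' also appears mid-string
        ("383421111111",),       # already in the target format
    ],
)

rows = conn.execute(
    """
    SELECT unique_id,
           REPLACE(unique_id, 'AB107', '3') AS replaced_everywhere,
           CASE WHEN unique_id LIKE 'AB107%'
                THEN '3' || SUBSTR(unique_id, 6)  -- keep everything after the 5-char prefix
                ELSE unique_id
           END AS replaced_prefix_only
    FROM temp37
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
```

For IDs shaped like the ones in the question both forms give the same result; the prefix-only form is the safer choice if 'AB107' could ever occur later in an ID.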

update rows one by one with max value + 1 in SQL

Here is my situation.
I have 2 tables.
The 1st table has all the records, and they have IDs.
The 2nd table has new records, which do not have IDs yet.
I want to generate IDs for the 2nd table, starting from max(id) + 1 of the 1st table.
When I do this, every row gets the same ID, but I want each row to get a unique, incrementing number.
e.g
select max(id) from table1 then it gives '997040'
I want to make second table rows like;
id
997041
997042
997043
997044
I think I need to use a cursor or a while loop, or both, but I could not write the actual query.
Sorry about the bad explanation, I am quite confused at this point.
Use ROWNUM (Oracle) to generate incrementing row numbers, e.g.:
SELECT someConstant + ROWNUM FROM source;
Alternatively, define the ID as an auto-increment column so the database assigns the values itself. In SQL Server syntax, seeded at the next free value:
CREATE TABLE table_name
(
ID int IDENTITY(997041,1) PRIMARY KEY
)
See also http://www.w3schools.com/sql/sql_autoincrement.asp
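No cursor or loop is needed for this: adding a per-row counter to the current maximum gives each new row its own ID in a single statement. The sketch below uses ROW_NUMBER() in an in-memory SQLite database (3.25 or newer) as a stand-in for the asker's database; the name column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (name TEXT)")  # new records, no ids yet
conn.execute("INSERT INTO table1 VALUES (997040, 'existing')")
conn.executemany("INSERT INTO table2 VALUES (?)", [("a",), ("b",), ("c",), ("d",)])

# max(id) is computed once; ROW_NUMBER() adds 1, 2, 3, ... per row of table2.
new_rows = conn.execute(
    """
    SELECT (SELECT MAX(id) FROM table1) + ROW_NUMBER() OVER (ORDER BY name) AS id,
           name
    FROM table2
    ORDER BY id
    """
).fetchall()

for row in new_rows:
    print(row)
```

Computing max(id) + n like this is only safe when nothing else inserts into table1 at the same time; an identity or sequence column avoids that concern.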

How to insert generated id into a results table

I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns. id and timestamp.
I'm using db2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
Alternatively, you can do it more manually with the ROW_NUMBER() OLAP function, which I have often found convenient for creating IDs. A stored procedure is a convenient place for this; it would do the following:
Get the maximum existing id from table1 and store it in a variable old_max_id.
After generating the result set, write the row numbers into table1, for example with something like the following (there is no PARTITION BY, so the numbering runs across the whole result set and every row gets a distinct value):
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>)
+ OLD_MAX_ID
, CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER expression as above or directly use the ID from table1.

insert a column from one table to another with constraints

I have 2 tables: legend and temp. temp has an ID and an account ID; legend has advent_id, account_id and other columns. The ID in temp and advent_id in legend hold matching values. The account_id column in legend is empty, and I want to import the relevant account_id from temp into legend. I am using PostgreSQL.
I am trying the following query, but it is not working as I am expecting. New rows are getting created and the account_id's are getting added in the new rows, not to the corresponding advent_id.
insert into legend(account_id)
select temp.account_id
from legend, temp
where legend.advent_id=temp.id;
It is inserting the account_id in the wrong place, not to the corresponding advent_id.
I am using the following query to check:
select advent_id, account_id from legend where account_id is not null;
What exactly is the problem in my insert query?
If you are trying to modify existing rows, rather than add rows, then you need an UPDATE query, not an INSERT query.
In PostgreSQL, an UPDATE can pull values from another table through a FROM clause (the UPDATE ... JOIN form is MySQL syntax and the SET column must not be table-qualified); in your example:
UPDATE legend
SET account_id = temp.account_id
FROM temp
WHERE temp.id = legend.advent_id;
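As a quick check that an UPDATE fills in existing rows instead of creating new ones, here is a small sketch against an in-memory SQLite database. It uses a correlated subquery, which is a portable way to write the same update; the table name temp_accounts (standing in for temp) and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legend (advent_id INTEGER, account_id INTEGER)")
conn.execute("CREATE TABLE temp_accounts (id INTEGER, account_id INTEGER)")
conn.executemany("INSERT INTO legend VALUES (?, NULL)", [(10,), (20,), (30,)])
conn.executemany("INSERT INTO temp_accounts VALUES (?, ?)", [(10, 501), (20, 502)])

# Fill account_id on existing legend rows; rows without a match are left alone.
conn.execute(
    """
    UPDATE legend
    SET account_id = (SELECT t.account_id
                      FROM temp_accounts t
                      WHERE t.id = legend.advent_id)
    WHERE EXISTS (SELECT 1 FROM temp_accounts t WHERE t.id = legend.advent_id)
    """
)

rows = conn.execute(
    "SELECT advent_id, account_id FROM legend ORDER BY advent_id"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM legend").fetchone()[0]
print(rows, row_count)
```

The row count stays at three, which is the behaviour the INSERT in the question could not give.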