I have the following table in Postgres, which I access through the goqu library. There will only ever be one value in the column, and I want to use SELECT ... FOR UPDATE to lock the row before overwriting the value, so that concurrent writers cannot interfere with each other.
table:
user_id
1234
This is what I tried:
type data struct {
ID int `db:"user_id"`
}
var new_id data
new_id = data{ID: 4321}
DB.Select(data{}).From(tablename).ForUpdate(exp.SkipLocked).Insert().Rows(new_id).Executor().Exec()
But this command seems to create new rows instead of overwriting the value. Any help is appreciated!
Related
I have a user table in Hive of the form:
User:
Id String,
Name String,
Col1 String,
UpdateTimestamp Timestamp
I'm inserting data in this table from a file which has the following format:
I/U,Timestamp when record was written to file, Id, Name, Col1, UpdateTimestamp
e.g. for inserting a user with Id 1:
I,2019-08-21 14:18:41.002947,1,Bob,stuff,123456
and updating col1 for the same user with Id 1:
U,2019-08-21 14:18:45.000000,1,,updatedstuff,123457
The columns which are not updated are returned as null.
Plain insertion is easy in Hive: load the file into a staging table with LOAD DATA INPATH, then ignore the first two fields when copying from the staging table.
However, how would I go about the update statements, so that my final row in Hive looks like the one below?
1,Bob,updatedstuff,123457
I was thinking to insert all rows in a staging table and then perform some sort of merge query. Any ideas?
Typically with a merge statement your "file" would still be unique on ID and the merge statement would determine whether it needs to insert this as a new record, or update values from that record.
However, if the file is non-negotiable and will always have the I/U format, you could break the process up into two steps, the insert, then the updates, as you suggested.
In order to perform updates in Hive, you will need the users table to be stored as ORC and have ACID enabled on your cluster. For my example, I would create the users table with a cluster key, and the transactional table property:
create table test.orc_acid_example_users
(
id int
,name string
,col1 string
,updatetimestamp timestamp
)
clustered by (id) into 5 buckets
stored as ORC
tblproperties('transactional'='true');
After your insert statements, your Bob record would say "stuff" in col1.
As for the updates, you could tackle these with an update or merge statement. I think the key here is the null values: if the staging table from the file has a null value, it's important to keep the original name, or col1, or whatever the column is. Here's a merge example which coalesces the staging table's fields. Basically, if there is a value in the staging table, take that, or else fall back to the original value.
merge into test.orc_acid_example_users as t
using test.orc_acid_example_staging as s
on t.id = s.id
and s.type = 'U'
when matched
then update set name = coalesce(s.name,t.name), col1 = coalesce(s.col1, t.col1)
Now Bob will show "updatedstuff".
Quick disclaimer - if you have more than one update for Bob in the staging table, things will get messy. You will need to have a pre-processing step to get the latest non-null values of all the updates prior to doing the update/merge. Hive isn't really a complete transactional DB - it would be preferred for the source to send full user records any time there's an update, instead of just the changed fields only.
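The pre-processing step mentioned above could look something like the following. This is only a minimal sketch in Python rather than Hive, and it assumes the staging rows have already been read into tuples laid out like the file (type, written-at timestamp, id, name, col1, update timestamp). The name collapse_updates and the tuple layout are hypothetical, not something from the original setup.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical staging row: (type, written_at, id, name, col1, update_timestamp).
StagingRow = Tuple[str, str, int, Optional[str], Optional[str], Optional[int]]


def collapse_updates(rows: List[StagingRow]) -> Dict[int, dict]:
    """Fold all 'U' rows per id into one record with the latest non-null value of each field."""
    collapsed: Dict[int, dict] = {}
    updates = (r for r in rows if r[0] == "U")
    # Sort by the time the record was written to the file, so later updates win.
    for _type, _written_at, user_id, name, col1, update_ts in sorted(
        updates, key=lambda r: r[1]
    ):
        record = collapsed.setdefault(
            user_id, {"name": None, "col1": None, "updatetimestamp": None}
        )
        fields = (("name", name), ("col1", col1), ("updatetimestamp", update_ts))
        for field, value in fields:
            # A null means "field not changed", so keep the earlier value.
            if value is not None:
                record[field] = value
    return collapsed
```

Each id then has at most one collapsed update row, which is the shape the merge statement expects.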
You can reconstruct each record in the table using last_value() with the ignore-nulls option:
select h.id,
coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.timestamp)) as name,
coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.timestamp)) as col1,
h.update_timestamp
from history h;
You can use row_number() and a subquery if you want the most recent record.
I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns: id and timestamp.
I'm using DB2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
Alternatively, you can do it more manually by using the ROW_NUMBER() OLAP function, which I have often found convenient for creating IDs. For this it is convenient to use a stored procedure which does the following:
Get the maximum old id from Table1 and write it into a variable old_max_id.
After generating the result set, write the row numbers into Table1, maybe with something like
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>)
+ OLD_MAX_ID
, CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER statement as above or directly use the ID from Table1.
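The three steps above can be tried out with a small sketch. It uses Python with SQLite as a stand-in for DB2, and a Python counter in place of ROW_NUMBER(), so it only illustrates the idea and is not DB2 syntax. The function name assign_new_ids and the temporary table id_map are made up for the example. The pol_ids argument is assumed to be the result of the UNION query from the question.

```python
import sqlite3


def assign_new_ids(conn: sqlite3.Connection, pol_ids) -> dict:
    """Insert one table1 row per pol_id and repoint quot.pol_id at the new id."""
    cur = conn.cursor()
    # Step 1: remember the current maximum id (0 if table1 is empty).
    old_max_id = cur.execute("SELECT COALESCE(MAX(id), 0) FROM table1").fetchone()[0]

    # Step 2: row number (1, 2, 3, ...) plus old_max_id gives each new id.
    mapping = {
        pol_id: old_max_id + row_number
        for row_number, pol_id in enumerate(sorted(pol_ids), start=1)
    }
    cur.executemany(
        "INSERT INTO table1 (id, timestamp) VALUES (?, CURRENT_TIMESTAMP)",
        [(new_id,) for new_id in mapping.values()],
    )

    # Step 3: keep the old -> new mapping in a table and apply it in ONE update,
    # so a new id that equals some other old pol_id is never remapped twice.
    cur.execute(
        "CREATE TEMP TABLE id_map (old_pol_id INTEGER PRIMARY KEY, new_id INTEGER NOT NULL)"
    )
    cur.executemany("INSERT INTO id_map VALUES (?, ?)", list(mapping.items()))
    cur.execute(
        "UPDATE quot SET pol_id = "
        "(SELECT new_id FROM id_map WHERE old_pol_id = quot.pol_id) "
        "WHERE pol_id IN (SELECT old_pol_id FROM id_map)"
    )
    cur.execute("DROP TABLE id_map")
    conn.commit()
    return mapping
```

Applying the mapping in a single UPDATE matters: updating one pol_id at a time can go wrong when a freshly generated id collides with an old pol_id that has not been processed yet.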
Someone deployed a SQL table with the schema
ConfigOptions
name VARCHAR(50)
value VARCHAR(50)
and the following logic for saving options:
int i = ExecuteNonQuery("UPDATE ConfigOptions SET value=@value WHERE name=@name");
if(i==0) i = ExecuteNonQuery("INSERT INTO ConfigOptions (name,value) VALUES (@name,@value)");
We now saw that this table is littered with duplicates, and we want to change this.
As far as I can tell, the logic is: whenever the UPDATE affected zero rows, another row is inserted. If I am not mistaken, this can be caused by:
a row by the name of @name does not exist, or
the row exists, but already contains value @value
So, all rows with the same name should be full duplicates. If not, something is completely wrong (and behaviour may be undefined).
Now I have to fix this problem of duplicates, so I want to add a PK on name. Before I can do this, I have to remove all rows with duplicate names, only keeping one of each.
In the installer (only the installer is allowed to change schema), I only have SQL queries at hand, so I can't do it with C# logic:
// Collect every name that occurs more than once, together with its row count.
Dictionary<string, int> dic = new Dictionary<string, int>();
SqlDataReader sdr = ExecuteReader("SELECT name,COUNT(value) FROM ConfigOptions GROUP BY name HAVING COUNT(value)>1");
while (sdr.Read()) dic.Add(sdr.GetString(0), sdr.GetInt32(1));
sdr.Close();
// Delete all but one row per duplicated name (TOP needs parentheses in DELETE).
foreach (var kv in dic) {
AddParameter("@name", System.Data.SqlDbType.VarChar, 50, kv.Key);
ExecuteNonQuery("DELETE TOP (" + (kv.Value - 1) + ") FROM ConfigOptions WHERE name=@name");
}
ExecuteNonQuery("ALTER TABLE ConfigOptions ADD PRIMARY KEY (name)");
Is there a way to put this into SQL logic?
Using %%physloc%%, the phys(ical) loc(ation) of the row, should do the trick:
DELETE FROM ConfigOptions
WHERE %%physloc%% NOT IN (
SELECT MIN(%%physloc%%)
FROM ConfigOptions
GROUP BY name);
After this cleanup, you can add the primary key to the table.
NOTE: this will leave you with only one row for every name. If the value column is different in two records with the same name, you will lose the newest record. If you want to change this, use GROUP BY name, value.
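To see the keep-one-row-per-name pattern in action without a SQL Server instance, here is a small sketch in Python using SQLite. It is only an analogy: SQLite's rowid stands in for %%physloc%%, and a unique index stands in for the primary key, since SQLite cannot add a primary key to an existing table. The function name dedupe_config_options and the index name are made up for the example.

```python
import sqlite3


def dedupe_config_options(conn: sqlite3.Connection) -> int:
    """Keep one row per name, then enforce uniqueness; returns the number of rows removed."""
    cur = conn.cursor()
    # Same shape as the DELETE above: keep the row with the smallest
    # row identifier in each name group and delete everything else.
    cur.execute(
        "DELETE FROM ConfigOptions "
        "WHERE rowid NOT IN (SELECT MIN(rowid) FROM ConfigOptions GROUP BY name)"
    )
    deleted = cur.rowcount
    # With the duplicates gone, uniqueness on name can be enforced.
    cur.execute("CREATE UNIQUE INDEX ux_configoptions_name ON ConfigOptions (name)")
    conn.commit()
    return deleted
```

Running the cleanup and the constraint in the same step means new duplicates cannot sneak in between the two.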
I have this table:
Table1:
id text
1 lala
I want to take the first row and copy it, but with the id changed from 1 to 2.
Can you help me with this problem?
A SQL table has no concept of "first" row. You can however select a row based on its characteristics. So, the following would work:
insert into Table1(id, text)
select 2, text
from Table1
where id = 1;
As another note, when creating the table, you can have the id column be auto-incremented. The syntax varies from database to database. If id were auto-incremented, then you could just do:
insert into Table1(text)
select text
from Table1
where id = 1;
And you would be confident that the new row would have a unique id.
Kate - Gordon's answer is technically correct. However, I would like to know more about why you want to do this.
If your intent is to have the field increment with the insertion of each new row, manually setting the id column value isn't a great idea: it becomes very easy to get a conflict, with two rows attempting to use the same id at the same time.
I would recommend using an IDENTITY field for this (MS SQL Server -- use an AUTO_INCREMENT field in MySQL). You could then do the insert as follows:
INSERT INTO Table1 (text)
SELECT text
FROM Table1
WHERE id = 1
SQL Server would automatically assign a new, unique value to the id field.
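The auto-increment variant can be tried end to end with a small sketch. It uses Python with SQLite as a stand-in (an INTEGER PRIMARY KEY column there is assigned automatically, roughly like IDENTITY or AUTO_INCREMENT), so the table definition is an assumption and not the asker's actual schema. The helper name copy_row is made up.

```python
import sqlite3
from typing import Optional


def copy_row(conn: sqlite3.Connection, source_id: int) -> Optional[int]:
    """Copy the text of the row with the given id into a new row; return the new id."""
    cur = conn.execute(
        "INSERT INTO Table1 (text) SELECT text FROM Table1 WHERE id = ?",
        (source_id,),
    )
    conn.commit()
    # rowcount is 0 when no row had the requested id, so nothing was copied.
    if cur.rowcount == 0:
        return None
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY lets SQLite pick the next id when none is supplied.
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO Table1 (id, text) VALUES (1, 'lala')")
new_id = copy_row(conn, 1)
```

The database picks the new id, so two concurrent copies cannot end up with the same value.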
I added some rows to the table manually, and I also set the ID (auto_increment) values manually. Now, when I try to add a new row to the table through my app, I get an error saying that the generated ID value already exists.
How can I manually set the next ID value? For example, the table has two IDs, so how do I tell PostgreSQL that the next ID should start from 3?
http://www.postgresql.org/docs/current/static/functions-sequence.html
select setval('sequence-name', <new-value>);
You can get the sequence name from the table definition:
id | integer | not null default nextval('id_seq'::regclass)
In this case the sequence is named 'id_seq'.
Edit (thanks to @Glenn):
SELECT setval('id_seq', max(id)) FROM table;
I think there is a simpler way:
ALTER SEQUENCE "seq_product_id" RESTART WITH 10