INSERT INTO ... FROM SELECT ... RETURNING id mappings - sql

I'm using PostgreSQL 9.3.
I want to duplicate some of the db records. Since I'm using an auto-increment pk id for the table, I want to get back the id mappings from the generated ids of duplicated records to the original ones. For example, say I have a table posts with 2 records in it:
[{'id': 1, 'title': 'first'}
, {'id': 2, 'title': 'second'}]
With SQL:
INSERT INTO posts (title) SELECT title FROM posts RETURNING id, ??
I expect to see mappings like:
[{'id': 3, 'from_id': 1}
, {'id': 4, 'from_id': 2}]
Any idea on how to fill in the question marks above to make it work? Thanks a lot!

This would be simpler for UPDATE, where additional rows joined into the update are visible to the RETURNING clause:
Return pre-UPDATE column values using SQL only
The same is currently not possible for INSERT. The manual:
The expression can use any column names of the table named by table_name
table_name being the target of the INSERT command.
You can use (data-modifying) CTEs to get this to work.
Assuming title to be unique per query, else you need to do more:
WITH sel AS (
SELECT id, title
FROM posts
WHERE id IN (1,2) -- select rows to copy
)
, ins AS (
INSERT INTO posts (title)
SELECT title FROM sel
RETURNING id, title
)
SELECT ins.id, sel.id AS from_id
FROM ins
JOIN sel USING (title);
If title is not unique per query (but at least id is unique per table):
WITH sel AS (
SELECT id, title, row_number() OVER (ORDER BY id) AS rn
FROM posts
WHERE id IN (1,2) -- select rows to copy
ORDER BY id
)
, ins AS (
INSERT INTO posts (title)
SELECT title FROM sel ORDER BY id -- ORDER redundant to be sure
RETURNING id
)
SELECT i.id, s.id AS from_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM ins) i
JOIN sel s USING (rn);
This second query relies on the undocumented implementation detail that rows are inserted in the order provided. It works in all current versions of Postgres and is probably not going to break.

If the id column of posts is of type serial, its values are generated by nextval('posts_id_seq'::regclass).
You can call this function yourself for every new row:
with
sel as (
SELECT id, title, nextval('posts_id_seq'::regclass) new_id
FROM posts
WHERE id IN (1,2)
),
ins as (
INSERT INTO posts (id, title)
SELECT new_id, title
FROM sel
)
SELECT id, new_id
FROM sel
This works with any data, including non-unique titles.

The simplest solution IMHO would be to simply add a column to your table where you could put id of the row that was cloned.
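A minimal sketch of that idea, assuming a new nullable column named from_id (a hypothetical name, not part of the original schema) and that posts.id is the primary key. Because from_id is then a column of the target table, RETURNING can see it directly:

```sql
-- from_id is a hypothetical column added only to record the source row.
ALTER TABLE posts ADD COLUMN from_id integer REFERENCES posts (id);

-- Copy the selected rows and remember which row each copy came from.
INSERT INTO posts (title, from_id)
SELECT title, id
FROM   posts
WHERE  id IN (1, 2)   -- select rows to copy
RETURNING id, from_id;
```

The trade-off is a permanent schema change for what may be a one-off mapping.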

Related

INSERT into table with WITH clause not working in postgres

Sorry for the follow-up question (to INSERT into table if doesn't exists and return id in both cases),
but I couldn't find a solution to my problem.
I have a feedback table whose columns are foreign keys to other tables. For example, scopeid is a foreign key to the id column of the scope table, userid is a foreign key to the id column of the user table, and so on.
So, I am trying to insert following data in the table:
scope: home_page,
username: abc
status: fixed
app: demoapp
To insert the data above, I am trying to write subqueries that look up the id for each value. If a value doesn't exist yet, I want to insert it and use the new id in the feedback table.
So basically I am trying to insert into multiple tables (where something doesn't exist yet) and use those ids in the insert into the final table, feedback.
I hope things are clearer now.
Here is my feedback table:
id scopeid comment rating userid statusid appid
3 1 test 5 2 1 2
All the id columns are foreign keys to other tables, so in the query below I try to get each id by name and add the row if it does not exist.
Here is my final query:
INSERT INTO feedbacks (scopeid, comment, rating, userid, statusid, appid)
VALUES
(
-- GET SCOPE ID
(
WITH rows_exists AS (
SELECT id FROM scope
WHERE appid=2 AND NAME = 'application'),
row_new AS (INSERT INTO scope (appid, NAME) SELECT 2, 'application' WHERE NOT EXISTS (SELECT id FROM scope WHERE appid=2 AND name='application') returning id)
SELECT id FROM rows_exists UNION ALL SELECT id FROM row_new
),
-- Comment
'GOD IS HERE TO COMMENT',
-- rating
5,
-- userid
(
WITH rows_exists AS (
SELECT id FROM users
WHERE username='abc'),
row_new AS (INSERT INTO users (username) SELECT 'abc' WHERE NOT EXISTS (SELECT id FROM users WHERE username='abc') returning id)
SELECT id FROM rows_exists UNION ALL SELECT id FROM row_new
),
-- statusid
(SELECT id FROM status WHERE NAME='received'),
-- appid
(
WITH rows_exists AS (
SELECT id FROM apps
WHERE name='google'),
row_new AS (INSERT INTO apps (name) SELECT 'google' WHERE NOT EXISTS (SELECT id FROM apps WHERE NAME='google') returning id)
SELECT id FROM rows_exists UNION ALL SELECT id FROM row_new
)
)
But I get the following error:
with clause containing a data-modifying statement must be at the top level
Is what I am trying to achieve even possible, this way or by another method?
The following inserts the rows that don't exist yet and then inserts the resulting id:
with s as (
select id
from scope
where appid = 2 AND NAME = 'application'
),
si as (
insert into scope (appid, name)
select v.appid, v.name
from (values (2, 'application')) v(appid, name)
where not exists (select 1 from scope s where s.appid = v.appid and s.name = v.name)
returning id
),
. . . similar logic for other tables
insert into feedback (scopeid, comment, . . . )
select (select id from s union all select id from si) as scopeid,
'test' as comment,
. . .;
You should be sure you have unique constraints in each of the tables for the values you are looking for. Otherwise, you could have a race condition and end up inserting the same row multiple times in a multithreaded environment.
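As a rough sketch of what that could look like for the scope table, assuming PostgreSQL 9.5 or later (for ON CONFLICT), no existing duplicates, and a hypothetical constraint name:

```sql
-- scope_appid_name_key is a hypothetical name; the statement fails if
-- duplicate (appid, name) pairs already exist in scope.
ALTER TABLE scope
    ADD CONSTRAINT scope_appid_name_key UNIQUE (appid, name);

-- Insert the row if it is missing, then return its id either way.
WITH ins AS (
    INSERT INTO scope (appid, name)
    VALUES (2, 'application')
    ON CONFLICT (appid, name) DO NOTHING
    RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM scope WHERE appid = 2 AND name = 'application'
LIMIT 1;
```

With the constraint in place, a concurrent duplicate insert is rejected or skipped instead of creating a second row. One caveat: if another transaction commits the same row while this statement is running, the final SELECT can come back empty and the statement may need a retry.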

How to delete the duplicate data in table (Postgres)

I want to delete the duplicated data in a table. I know there is a way to use
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
to find them, but I need to determine whether every column's value is equal, which means tableA.* = tableA.* (except id; id is the auto-increment primary key).
I tried this:
SELECT
*,
COUNT( * )
FROM
myTable
GROUP BY
*
HAVING
COUNT( * )> 1
ORDER BY
id;
but it says I can't use GROUP BY *. So how can I find and delete the duplicated data (every column's value must be equal except id)?
Use
SELECT DISTINCT
DISTINCT removes duplicate rows from the result. Since id is unique, list the columns other than id rather than *.
You need to try something similar to the query below. Apply PARTITION BY to the columns other than Id (as it is an incrementing unique value), that is, to the columns you want to check for duplicates.
Also refer to Row_Number in Postgres & Common Table expression in Postgres
WITH DuplicateTableRows AS
(
SELECT Id, Row_Number() OVER (PARTITION BY col1, col2... ORDER BY Id) AS rn
FROM
Table1
)
DELETE FROM Table1
WHERE Id IN (SELECT Id FROM DuplicateTableRows WHERE rn > 1)
You can do this using JSON:
select (to_jsonb(b) - 'id')
from basket b
group by 1
having count(*) > 1;
The result is as JSON. Unfortunately, to extract the values back into a record, you need to list the columns individually.
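For the delete itself the values do not have to be extracted again. Here is a sketch, assuming PostgreSQL 9.5 or later (for the jsonb - operator) and that basket has the id primary key described in the question. It numbers the rows within each group of identical non-id values and removes all but the lowest id:

```sql
DELETE FROM basket
WHERE  id IN (
    SELECT id
    FROM  (
        SELECT id,
               -- rows that agree on every column except id share a partition
               row_number() OVER (PARTITION BY to_jsonb(b) - 'id'
                                  ORDER BY id) AS rn
        FROM   basket b
    ) numbered
    WHERE  rn > 1   -- rn = 1 is the row that is kept
);
```

It may be worth running the inner SELECT alone first to check which rows would be removed.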

Change value of duplicated rows

There is a table with two columns (ID, Data), and there are 3 rows with the same values.
ID Data
4 192.168.0.22
4 192.168.0.22
4 192.168.0.22
Now I want to change the Data column of the third row. When I run the update, SQL Server generates an error saying that I cannot change the value.
I can delete all 3 rows, but I cannot delete the third row separately.
This table belongs to software that I bought, and I have changed the IP of the third server.
You can try the following query
create table #tblSimilarValues(id int, ipaddress varchar(20))
insert into #tblSimilarValues values (4, '192.168.0.22'),
(4, '192.168.0.22'),(4, '192.168.0.22')
Use the query below if you want to change all rows
;with oldData as (
select *,
count(*) over (partition by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_1'
where cnt > 1;
select * from #tblSimilarValues
Use the query below if you want to skip the first row
;with oldData as (
select *,
ROW_NUMBER () over (partition by id, ipaddress order by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_2'
where cnt > 1;
select * from #tblSimilarValues
drop table #tblSimilarValues
You can find the live demo here
Since there is no column that allows us to distinguish these rows from each other, there's no "third row" (nor a first or second one for that matter).
We can use a ROW_NUMBER function to apply arbitrary row numbers to these rows, however, and if we place that in a CTE, we can apply DELETE/UPDATE actions via the CTE and use the arbitrary row numbers:
declare @t table (ID int not null, Data varchar(15))
insert into @t(ID,Data) values
(4,'192.168.0.22'),
(4,'192.168.0.22'),
(4,'192.168.0.22')
;With ArbitraryAssignments as (
select *,ROW_NUMBER() OVER (PARTITION BY ID, Data ORDER BY Data) as rn
from @t
)
delete from ArbitraryAssignments where rn > 2
select * from @t
This produces two rows of output - one row was deleted.
Note that I say that the ROW_NUMBER is arbitrary. One of the expressions in both the PARTITION BY and ORDER BY clauses is the same. By definition, then, we know that no real ORDER is defined by this (because all rows within the same partition, by definition, have the same value for that expression).
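The same pattern should work for the UPDATE the question asks about. Here is a sketch against a hypothetical table dbo.ServerList with a made-up new address; since the numbering is arbitrary, "row 3" just means whichever of the identical rows happens to get that number:

```sql
-- dbo.ServerList and '192.168.0.23' are placeholders, not from the question.
;WITH ArbitraryAssignments AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID, Data
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.ServerList
)
UPDATE ArbitraryAssignments
SET    Data = '192.168.0.23'
WHERE  rn = 3;   -- changes exactly one of the three identical rows
```

ORDER BY (SELECT NULL) only states openly that no real order is being requested.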
In this case the ID column allows duplicate values, which is wrong; ID should be unique.
What you can do is create a new column and make it unique or the primary key, or change the duplicate values of the ID column and make that column unique or the primary key.
Then, using your unique or primary key, you can update the Data column with a query like the one below:
UPDATE <Table Name>
SET DATA = 'new data'
WHERE ID = 3;

Remove duplicates from table in bigquery

I found duplicates in my table by doing below query.
SELECT name, id, count(1) as count
FROM [myproject:dev.sample]
group by name, id
having count(1) > 1
Now I would like to remove these duplicates based on id and name using a DML statement, but it shows a '0 rows affected' message.
Am I missing something?
DELETE FROM PRD.GPBP WHERE
id not in(select id from [myproject:dev.sample] GROUP BY id) and
name not in (select name from [myproject:dev.sample] GROUP BY name)
I suggest you create a new table without the duplicates, then drop your original table and rename the new table to the original name.
You can build the new table like below:
Create table new_table as
Select name, id, ...... -- put your remaining 10 cols here
FROM(
SELECT *,
ROW_NUMBER() OVER(Partition by name , id Order by id) as rnk
FROM [myproject:dev.sample]
)a
WHERE rnk = 1;
Then drop the older table and rename new_table with old table name.
Below query (BigQuery Standard SQL) should be more optimal for de-duping like in your case
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `myproject.dev.sample` AS t
GROUP BY name, id
If you run it from within UI - you can just set Write Preference to Overwrite Table and you are done
Or if you want you can use DML's INSERT to new table and then copy over original one
Meantime, the easiest way is as below (using DDL)
#standardSQL
CREATE OR REPLACE TABLE `myproject.dev.sample` AS
SELECT * FROM (
SELECT AS VALUE ANY_VALUE(t)
FROM `myproject.dev.sample` AS t
GROUP BY name, id
)

remove duplicate records with a criteria

I am using a script that requires unique values only. I have a table with duplicates like the one below, and I need to keep only unique values (the first occurrence), irrespective of what is inside the brackets.
Can I delete the duplicate records and keep the unique ones using a single query?
Input table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)testing
4 (Delete)tester
5 (Del)tst
6 (Delete)tst
So the output table should be something like
Output table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)tester
4 (Del)tst
SELECT DISTINCT * FROM FOO;
It depends on how much data you have to retrieve. If you only have to change Delete -> Del, you can try REPLACE:
http://technet.microsoft.com/en-us/library/ms186862.aspx
Grouping functions should also help you.
I don't think this will be an easy query.
Assumption: The name column always has all strings in the format given in the sample data.
Try this:
;with cte as
(select *, rank() over
(partition by substring(name, charindex(')',name)+1,len(name)+1 - charindex(')',name))
order by id) rn
from tbl
),
filtered_cte as
(select * from cte
where rn = 1
)
select rank() over (partition by getdate() order by id,getdate()) id , name
from filtered_cte
How this works:
The first CTE cte uses rank() to rank the occurrence of the string outside brackets in the name column.
The second CTE filtered_cte only returns the first row for each occurence of the specified string. In this step, we get the expected results, but not in the desired format.
In the final SELECT, we partition by and order by the getdate() function. This function is chosen as a dummy to give us continuous values for the id column while using the rank function as we did in step 1.
Demo here.
Note that this solution will return filtered values, but not delete anything in the source table. If you wish, you can delete from the CTE created in step 1 to remove data from the source table.
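A sketch of that delete, reusing the partition expression from step 1 and the same tbl name. It uses row_number() rather than rank(), which gives the same numbering here as long as id is unique:

```sql
-- Number the rows per string outside the brackets, lowest id first.
;with cte as
(select *, row_number() over
    (partition by substring(name, charindex(')', name) + 1, len(name))
     order by id) rn
 from tbl
)
delete from cte
where rn > 1;   -- keeps the first occurrence of each string
```

On the sample data this should remove ids 3 and 6. The remaining ids are not renumbered.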
First use this update to make them uniform
Update table set name = replace(Name, '(Del)' , '(Delete)')
then delete the repetitive names
Delete from table where id in
(Select id from (Select Row_Number() over(Partition by Name order by id) as rn,* from table) x
where rn > 1)
First create the input data table
CREATE TABLE test
(ID int,Name varchar(20));
INSERT INTO test
(ID, Name)
VALUES
(1, '(Del)testing'),
(2, '(Del)test'),
(3, '(Delete)testing'),
(4, '(Delete)tester'),
(5, '(Del)tst'),
(6, '(Delete)tst');
Select Query
select id, name
from (
select id, name ,
ROW_NUMBER() OVER(PARTITION BY substring(name,PATINDEX('%)%',name)+1,20) ORDER BY name) rn
from test ) t
where rn= 1
order by 1
SQL Fiddle Link
http://www.sqlfiddle.com/#!6/a02b0/34