Insert Query Result Into Table - SQL

I have a problem with an insert query in PostgreSQL.
I have a query like this:
select *
from (
select *,
row_number() over (partition by id order by id) as row_number
from lookup_temp
) as rows
where row_number = 1
and I want to insert the result into the table lookup_temp.
How can I do this?

I am assuming that you are trying to keep only one row per repeating id in lookup_temp (hence the row_number() over (partition by id order by id) in your select), writing the result back into the same lookup_temp table. If so, instead of inserting, the delete below is enough for you:
delete from lookup_temp where ctid in (
select ctid from (
select ctid,
row_number() over (partition by id order by id) as row_number
from lookup_temp
) as rows
where row_number <> 1)
From the PostgreSQL documentation on ctid:
The physical location of the row version within its table. Note that
although the ctid can be used to locate the row version very quickly,
a row's ctid will change if it is updated or moved by VACUUM FULL.
Therefore ctid is useless as a long-term row identifier. The OID, or
even better a user-defined serial number, should be used to identify
logical rows.
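To make the ctid-based cleanup concrete, here is a minimal, hedged sketch using Python's sqlite3, where SQLite's rowid plays a similar role to PostgreSQL's ctid (window functions need SQLite 3.25+; the table and data are illustrative):

```python
import sqlite3

# Minimal sketch of the ctid-style duplicate delete, using SQLite's rowid
# as a stand-in for PostgreSQL's ctid (needs SQLite >= 3.25 for window
# functions; table and data are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_temp (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO lookup_temp VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (3, "e")])

# Keep the first row per id; address the rest by their physical row id.
conn.execute("""
    DELETE FROM lookup_temp WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   row_number() OVER (PARTITION BY id ORDER BY id) AS rn
            FROM lookup_temp
        ) WHERE rn <> 1
    )
""")
print(conn.execute("SELECT id FROM lookup_temp ORDER BY id").fetchall())
# [(1,), (2,), (3,)]
```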

You can perform an INSERT ... SELECT to get the result into the lookup_temp table. Note that there is no VALUES keyword, and the inserted column list must match the SELECT list (so the helper row_number column is left out):
insert into lookup_temp (specify your columns)
select (the same columns)
from (
select *,
row_number() over (partition by id order by id) as row_number
from lookup_temp
) as rows
where row_number = 1
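As a sanity check of the INSERT ... SELECT pattern, here is a small sketch in Python's sqlite3 (the lookup_clean target table and val column are made up for the example; SQLite 3.25+ needed for window functions):

```python
import sqlite3

# Hedged sketch of INSERT ... SELECT with a row_number() filter in SQLite.
# The lookup_clean target table and val column are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_temp (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE lookup_clean (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO lookup_temp VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c")])

# No VALUES keyword: the SELECT is the row source, and its column list
# matches the insert list, leaving out the helper rn column.
conn.execute("""
    INSERT INTO lookup_clean (id, val)
    SELECT id, val FROM (
        SELECT *, row_number() OVER (PARTITION BY id ORDER BY id) AS rn
        FROM lookup_temp
    ) WHERE rn = 1
""")
ids = [r[0] for r in
       conn.execute("SELECT id FROM lookup_clean ORDER BY id").fetchall()]
print(ids)  # [1, 2]
```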

Your query can be simpler with DISTINCT ON:
insert into lookup_temp
select distinct on (id) *
from lookup_temp
If you are inserting into another table, specify the columns:
insert into another_table (id, c1, c2...)
select distinct on (id) id, c1, c2...
from lookup_temp
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
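DISTINCT ON is PostgreSQL-specific. As a rough analogue for experimenting, SQLite lets bare GROUP BY output columns ride along with a single MIN()/MAX() aggregate, which also keeps one row per id (this is documented SQLite-specific behavior, not standard SQL):

```python
import sqlite3

# SQLite has no DISTINCT ON, but with exactly one MIN()/MAX() aggregate
# its bare GROUP BY output columns come from the min/max row -- a rough,
# SQLite-specific analogue of keeping the first row per id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_temp (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO lookup_temp VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c")])

rows = conn.execute("""
    SELECT id, val, MIN(rowid) AS first_rowid
    FROM lookup_temp
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'a', 1), (2, 'c', 3)]
```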

--Just do it this way with SELECT ... INTO. Note that SELECT INTO creates a new table, so the target must not already exist (use a new name such as lookup_temp_dedup rather than lookup_temp itself):
select *
INTO lookup_temp_dedup
from (
select *, row_number() over (partition by id order by id) as row_number
from lookup_temp
) as rows
where row_number = 1

Related

Creating column and filtering it in one select statement

Wondering if it is possible to create a new column and filter on that column. The following is an example:
SELECT row_number() over (partition by ID order by date asc) row# FROM table1 where row# = 1
Thanks!
Some databases support a QUALIFY clause which you might be able to use:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) = 1;
On SQL Server, you may use a TOP 1 WITH TIES trick:
SELECT TOP 1 WITH TIES *
FROM table1
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date);
More generally, you would have to use a subquery:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM table1 t
)
SELECT *
FROM cte
WHERE rn = 1;
The WHERE clause is evaluated before the SELECT, so the column has to exist before you can use it in a WHERE clause. You can achieve this by making the original query a subquery.
SELECT *
FROM
(
SELECT t.*, row_number() over (partition by ID order by date asc) row#
FROM table1 t
) a
WHERE a.row# = 1
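A quick way to see why the subquery is needed, as a hedged sketch in Python's sqlite3 (illustrative table and data; SQLite 3.25+): filtering on the window alias directly typically fails, while the subquery form works.

```python
import sqlite3

# Sketch: the window-function alias cannot be used directly in WHERE,
# but wrapping the query in a subquery makes it an ordinary column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "2024-01-01"), (1, "2024-01-02"), (2, "2024-01-03")])

# Filtering on the alias directly typically raises "misuse of window
# function", since a window function may not appear in WHERE.
try:
    conn.execute("""
        SELECT row_number() OVER (PARTITION BY ID ORDER BY date) AS rn
        FROM table1 WHERE rn = 1
    """)
except sqlite3.Error as e:
    print("error:", e)

# The subquery exposes rn as an ordinary, filterable column.
rows = conn.execute("""
    SELECT * FROM (
        SELECT t.*, row_number() OVER (PARTITION BY ID ORDER BY date) AS rn
        FROM table1 t
    ) a WHERE a.rn = 1 ORDER BY a.ID
""").fetchall()
print(rows)  # [(1, '2024-01-01', 1), (2, '2024-01-03', 1)]
```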

How to remove duplicate rows using CTE?

I would like to remove duplicate rows from my table, so I used the ROW_NUMBER() function to find the duplicate values. I then wanted to add a WHERE clause to my query, so I modified it to use a CTE, but that gives me an error:
ORA-00928: missing SELECT keyword
This is the query that runs successfully for my use case:
WITH RowNumCTE as
(
SELECT ID,parcelid,propertyaddress,saledate,saleprice,legalreference,
ROW_NUMBER() OVER
( PARTITION BY parcelid,propertyaddress,saledate,saleprice,legalreference
ORDER BY id ) AS rn
FROM housedata
)
SELECT *
FROM RowNumCTE
To delete duplicates:
delete housedata where rowid in
( select lead(rowid) over (partition by parcelid, propertyaddress, saledate, saleprice, legalreference order by id)
from housedata );
To delete duplicates using a CTE:
delete housedata where id in
( with cte as
( select id
, row_number() over(partition by parcelid, propertyaddress, saledate, saleprice, legalreference order by id) as rn
from housedata )
select id from cte
where rn > 1 );
I think you need
BEGIN
FOR d IN
(
SELECT ROW_NUMBER() OVER
( PARTITION BY parcelid,propertyaddress,saledate,saleprice,legalreference
ORDER BY id ) AS rn,
h.*
FROM housedata h
)
LOOP
IF d.rn > 1 THEN
DELETE housedata WHERE id = d.id;
END IF;
END LOOP;
COMMIT;
END;
/
considering those five columns compose the grouping criteria for the desired deletion and id is a primary key column.
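The row-by-row approach of the PL/SQL loop above can also be mimicked procedurally outside the database; here is a hedged sketch in Python over SQLite with housedata simplified to just id and parcelid:

```python
import sqlite3

# Procedural analogue of the PL/SQL loop above: rank duplicates with
# row_number(), then delete each row ranked > 1 by its primary key.
# housedata is simplified to id + parcelid for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housedata (id INTEGER PRIMARY KEY, parcelid TEXT)")
conn.executemany("INSERT INTO housedata (parcelid) VALUES (?)",
                 [("p1",), ("p1",), ("p2",), ("p2",), ("p2",)])

ranked = conn.execute("""
    SELECT id, row_number() OVER (PARTITION BY parcelid ORDER BY id) AS rn
    FROM housedata
""").fetchall()
for row_id, rn in ranked:
    if rn > 1:
        conn.execute("DELETE FROM housedata WHERE id = ?", (row_id,))
conn.commit()
print(conn.execute("SELECT id FROM housedata ORDER BY id").fetchall())
# [(1,), (3,)]
```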

Create a table with duplicate values, and use a CTE (Common Table Expression) to delete those duplicate values

Create a table with duplicate values, and use a CTE (Common Table Expression) to delete those duplicate values.
=>
Would someone please help me with how to start, because I really don't understand the question.
Assume the duplicate values can be anything.
For MS SQL Server, this would work:
;with cte as
(
select *
, row_number() over (
partition by [columns], [which], [should], [be], [unique]
order by [columns], [to], [select], [what's], [kept]
) NoOfThisDuplicate
from [TableWithDuplicates]
)
delete
from cte
where NoOfThisDuplicate > 1
SQL Fiddle Demo (based on this question: Deleting duplicate row that has earliest date).
Explanation
Create a CTE.
Populate it with all rows from the table we want to delete from.
Add a NoOfThisDuplicate column to that output.
Populate this value with the sequential number of this record within the group/partition of all records with the same values for columns [columns], [which], [should], [be], [unique].
The order of the numbering depends on the sort order of those records when sorted by columns [columns], [to], [select], [what's], [kept].
We delete all records returned by the CTE except the first of each group (i.e. all except those with NoOfThisDuplicate = 1).
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 10
UNION ALL
SELECT 2*LEVEL FROM DUAL CONNECT BY LEVEL <= 5;
Query 1:
This will select the values removing duplicates:
SELECT DISTINCT *
FROM test_data
But it does not use a CTE.
Query 2:
So, we can put it in a sub-query factoring clause (the name used in the Oracle documentation, which corresponds to the SQL Server Common Table Expression):
WITH unique_values ( value ) AS (
SELECT DISTINCT *
FROM test_data
)
SELECT * FROM unique_values;
Query 3:
The sub-query factoring clause was pointless in the previous example ... so doing it a different way:
WITH row_numbers ( value, rn ) AS (
SELECT value, ROW_NUMBER() OVER ( PARTITION BY value ORDER BY ROWNUM ) AS rn
FROM test_data
)
SELECT value
FROM row_numbers
WHERE rn = 1;
This selects only the first instance of each value found.
Delete Query:
But that didn't delete the rows ...
DELETE FROM test_data
WHERE ROWID IN (
WITH row_numbers ( rid, rn ) AS (
SELECT ROWID, ROW_NUMBER() OVER ( PARTITION BY value ORDER BY ROWNUM ) AS rn
FROM test_data
)
SELECT rid
FROM row_numbers
WHERE rn > 1
);
Which uses the ROWID pseudocolumn to match rows for deletion.

Scalable Solution to get latest row for each ID in BigQuery

I have quite a large table with a field ID and another field collection_time. I want to select the latest record for each ID. Unfortunately, the combination (ID, collection_time) is not unique in my data, and I want just one of the records with the maximum collection time. I have tried two solutions, but neither has worked for me:
First, using this query:
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY collection_time) as rn
FROM mytable) where rn=1
This results in a Resources exceeded error, which I guess is because of the ORDER BY in the query.
Second: using a join between the table and the latest time per ID:
(SELECT tab1.*
FROM mytable AS tab1
INNER JOIN EACH
(SELECT ID, MAX(collection_time) AS second_time
FROM mytable GROUP EACH BY ID) AS tab2
ON tab1.ID=tab2.ID AND tab1.collection_time=tab2.second_time)
this solution does not work for me because (ID, collection_time) is not unique, so the JOIN result contains multiple rows for some IDs.
I am wondering whether there is a workaround for the resourcesExceeded error, or a different query that would work in my case?
SELECT
agg.table.*
FROM (
SELECT
id,
ARRAY_AGG(STRUCT(table)
ORDER BY
collection_time DESC)[SAFE_OFFSET(0)] agg
FROM
`dataset.table` table
GROUP BY
id)
This will do the job for you, and it is scalable: even if the schema keeps changing, you won't have to change this query.
Short and scalable version:
select array_agg(t order by collection_time desc limit 1)[offset(0)].*
from mytable t
group by t.id;
Quick and dirty option - combine both of your queries into one - first get all records with the latest collection_time (using your second query) and then dedup them using your first query:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY tab1.ID) AS rn
FROM (
SELECT tab1.*
FROM mytable AS tab1
INNER JOIN (
SELECT ID, MAX(collection_time) AS second_time
FROM mytable GROUP BY ID
) AS tab2
ON tab1.ID=tab2.ID AND tab1.collection_time=tab2.second_time
)
)
WHERE rn = 1
And with Standard SQL (proposed by S.Mohsen sh)
WITH myTable AS (
SELECT 1 AS ID, 1 AS collection_time
),
tab1 AS (
SELECT ID,
MAX(collection_time) AS second_time
FROM myTable GROUP BY ID
),
tab2 AS (
SELECT * FROM myTable
),
joint AS (
SELECT tab2.*
FROM tab2 INNER JOIN tab1
ON tab2.ID=tab1.ID AND tab2.collection_time=tab1.second_time
)
SELECT * EXCEPT(rn)
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID) AS rn
FROM joint
)
WHERE rn=1
If you don't care about writing a piece of code for every column:
SELECT ID,
ARRAY_AGG(col1 ORDER BY collection_time DESC)[OFFSET(0)] AS col1,
ARRAY_AGG(col2 ORDER BY collection_time DESC)[OFFSET(0)] AS col2
FROM myTable
GROUP BY ID
I see no one has mentioned window functions with QUALIFY:
SELECT *, MAX(collection_time) OVER (PARTITION BY id) AS max_timestamp
FROM my_table
QUALIFY collection_time = max_timestamp
The window function adds a column max_timestamp that is accessible in the QUALIFY clause to filter on.
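For engines without QUALIFY, the same idea (compute a window MAX, then filter on it) works through a subquery; here is a small, hedged sketch with Python's sqlite3 (illustrative data, SQLite 3.25+):

```python
import sqlite3

# QUALIFY-style filtering via a subquery in SQLite: compute the window
# MAX per id, then keep rows where collection_time equals it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, collection_time INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 5)])

rows = conn.execute("""
    SELECT id, collection_time FROM (
        SELECT *, MAX(collection_time) OVER (PARTITION BY id) AS max_ts
        FROM my_table
    ) WHERE collection_time = max_ts
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 20), (2, 5)]
```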
As per your comment: considering you have a table with unique IDs for which you need to find the latest collection_time, here is another way to do it using a correlated sub-query. Give it a try.
SELECT id,
(SELECT Max(collection_time)
FROM mytable B
WHERE A.id = B.id) AS Max_collection_time
FROM id_table A
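Here is the correlated sub-query pattern as a runnable sketch in Python's sqlite3 (the id_table/mytable split mirrors the query above; the data is made up):

```python
import sqlite3

# Correlated sub-query sketch: for each id in id_table, the inner query
# looks up that id's maximum collection_time in mytable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_table (id INTEGER)")
conn.execute("CREATE TABLE mytable (id INTEGER, collection_time INTEGER)")
conn.executemany("INSERT INTO id_table VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 5)])

rows = conn.execute("""
    SELECT A.id,
           (SELECT MAX(B.collection_time)
            FROM mytable B
            WHERE A.id = B.id) AS max_collection_time
    FROM id_table A
    ORDER BY A.id
""").fetchall()
print(rows)  # [(1, 20), (2, 5)]
```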
Another solution, which could be more scalable since it avoids multiple scans of the same table (which happens with both the self-join and the correlated subquery in the answers above). This solution only works with standard SQL (uncheck the "Use Legacy SQL" option):
SELECT
ID,
(SELECT srow
FROM UNNEST(t.srows) srow
ORDER BY srow.collection_time DESC
LIMIT 1)
FROM
(SELECT ID, ARRAY_AGG(STRUCT(col1, col2, col3, ...)) srows
FROM id_table
GROUP BY ID) t

Allow max 10 items with same GUID

I have a table with 4 columns:
ID, GUID, Binary, Timestamp.
My goal is to save the last 10 modifications of the binary in the database. If an 11th modification is inserted, the oldest one should be removed.
My current approach is to do it in two steps (pseudo mssql):
1) DELETE FROM mytable WHERE GUID = 'XXX' AND
ID NOT IN (SELECT TOP 9 ID FROM mytable WHERE GUID = 'XXX' ORDER BY Timestamp)
2) INSERT new binary ...
Is there a way to do it more efficiently, maybe with one statement? And is there a way to make it both MSSQL and PostgreSQL compatible (without TOP / LIMIT)?
You could use a CTE for compatibility:
with cte as (
select
row_number() over(order by Timestamp desc) as row_num
from mytable
where GUID = 'XXX'
)
delete from cte
where row_num > 10
Edit: see Gordon Linoff's answer; my syntax does not work in PostgreSQL, I just tested it in SQL Fiddle. I'm working too much with SQL Server...
Edit 2:
About delete and insert in one query, PostgreSQL allows that:
with cte_del as (
select
id,
row_number() over(order by id desc) as row_num
from tbl
where GUID = 'XXX'
), cte_d as (
delete from tbl where id in (select id from cte_del where row_num > 10)
)
insert into ...
select id from cte_del where row_num <= 10;
sql fiddle demo
I think the following will work in both SQL Server and Postgres:
with todelete as (
select id, row_number() over (partition by GUID order by timestamp desc) as seqnum
from mytable
where GUID = 'xxx'
)
delete from mytable
where id in (select id from todelete where seqnum > 10);
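The keep-newest-N delete can be sketched end to end in Python's sqlite3 (SQLite 3.25+; GUID and Timestamp are simplified to a text guid and an integer ts, and the data is made up):

```python
import sqlite3

# Sketch of the keep-newest-10 delete: rank rows newest-first per guid
# with row_number(), then delete everything ranked past the 10th.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, guid TEXT, ts INTEGER)")
conn.executemany("INSERT INTO mytable (guid, ts) VALUES (?, ?)",
                 [("XXX", i) for i in range(12)])

conn.execute("""
    DELETE FROM mytable WHERE id IN (
        SELECT id FROM (
            SELECT id, row_number() OVER (
                       PARTITION BY guid ORDER BY ts DESC) AS seqnum
            FROM mytable WHERE guid = 'XXX'
        ) WHERE seqnum > 10
    )
""")
print(conn.execute("SELECT count(*), min(ts) FROM mytable").fetchone())
# (10, 2)
```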