I have a single table TableA. It has columns id, type, relatedId, another1, another2. Column type can have values 1, 2 or 3.
What I need is, for each row in TableA, where type = 1, insert another row in the same table and update the original row (column relatedId) with id of newly inserted row. Also, values for some columns in newly inserted row should be copied from the original one.
So for the current state:
id|type|relatedId|another1
10| 1 |null|"some text"
11| 2 |null|"somthing"
12| 1 |null|"somthing else"
The result should be the following:
id|type|relatedId|another1
10| 1 |13 |"some text" - now has relationship to 13
11| 2 |null|"somthing"
12| 1 |14 |"somthing else" - now has relationship to 14
13| 3 |null|"some text" - inserted, "another1" is copied from 10
14| 3 |null|"somthing else" - inserted, "another1" is copied from 12
Assuming the texts are unique you can do this:
WITH ins AS (
INSERT INTO tablea(type, related_id, another1)
SELECT 3, null, another1
FROM tablea
WHERE type = 1
RETURNING id, another1
)
UPDATE tablea t
SET related_id = s.id
FROM (
SELECT * FROM ins
) s
WHERE s.another1 = t.another1 AND t.type = 1
The WITH clause lets you run both statements in a single query: the INSERT creates the new rows, and the UPDATE then uses the ids it returns via RETURNING to update the old rows. Because each copy has to be matched to its original, the text serves as the identifier.
This only works if you do not have two rows with (1, 'something'). Otherwise it would be hard to tell which of the two records is the original for each copy.
Another way would be to store the type-1 ids in the related_id column of the new type-3 rows as well. If that is acceptable, you could do this:
WITH ins AS (
INSERT INTO tablea(type, related_id, another1)
SELECT 3, id, another1
FROM tablea
WHERE type = 1
RETURNING id, related_id, another1
)
UPDATE tablea t
SET related_id = s.id
FROM (
SELECT * FROM ins
) s
WHERE s.related_id = t.id
This stores the original type-1 ids in the related_id column of the new rows, so the original id can always be found through this value.
Unfortunately, you cannot NULL out these columns in another WITH clause, because all sub-statements of a WITH query see the table as it was before the query started. At that point the query is not finished yet, so the new records are not visible to them.
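The limitation above applies within a single WITH query. As a rough illustration of the overall idea (park the original id on the copy, link back, then clear the helper value in a separate statement), here is a minimal sketch in Python using the standard sqlite3 module. SQLite is only a stand-in so the example is runnable: it has no data-modifying CTEs, so the three steps run as separate statements inside one transaction. The max_id cut-off used to recognise the new rows is an assumption of this sketch, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (
        id INTEGER PRIMARY KEY,
        type INTEGER,
        related_id INTEGER,
        another1 TEXT
    );
    INSERT INTO tablea (id, type, related_id, another1) VALUES
        (10, 1, NULL, 'some text'),
        (11, 2, NULL, 'something'),
        (12, 1, NULL, 'something else');
""")

with conn:  # one transaction: commit on success, roll back on error
    # Remember the highest existing id so the copies can be told apart later
    # (hypothetical helper value, assumed for this sketch).
    max_id = conn.execute("SELECT max(id) FROM tablea").fetchone()[0]

    # Step 1: copy every type-1 row and park the original id in related_id.
    conn.execute("""
        INSERT INTO tablea (type, related_id, another1)
        SELECT 3, id, another1 FROM tablea WHERE type = 1
    """)

    # Step 2: point each original at the copy that carries its id.
    conn.execute("""
        UPDATE tablea
        SET related_id = (
            SELECT c.id FROM tablea AS c
            WHERE c.id > :max_id AND c.related_id = tablea.id
        )
        WHERE type = 1 AND id <= :max_id
    """, {"max_id": max_id})

    # Step 3: the copies are visible now, so the helper value can be cleared.
    conn.execute(
        "UPDATE tablea SET related_id = NULL WHERE id > :max_id",
        {"max_id": max_id},
    )

rows = conn.execute(
    "SELECT id, type, related_id, another1 FROM tablea ORDER BY id"
).fetchall()
```

Because the link is made through the parked id, this does not rely on the texts being unique or on any insertion order.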
This one could work...
WITH to_be_copied AS (
SELECT id, another1
FROM tablea
WHERE type = 1
), ins AS (
INSERT INTO tablea(type, related_id, another1)
SELECT 3, null, another1
FROM to_be_copied
ORDER BY id -- 1
RETURNING id, another1
)
UPDATE tablea t
SET related_id = s.type3_id
FROM (
SELECT
*
FROM
(SELECT id as type1_id, row_number() OVER (ORDER BY id) FROM to_be_copied) tbc
JOIN
(SELECT id as type3_id, row_number() OVER (ORDER BY id) FROM ins) i
ON tbc.row_number = i.row_number
) s
WHERE t.id = s.type1_id
This solution assumes that the ORDER BY at (1) determines the insertion order of the new records. In fact, I am not quite sure about that. But if so: first, all type-1 records are queried. After that, they are copied (in the same order!). Then the ids of the old and the new records are taken. The row_number() window function adds a consecutive row count to the records. So if both data sets have the same order, the old ids should get the same row numbers as their corresponding new ids, which makes it possible to match them. For the small example this works...
Edit: This answer seems to confirm that the order is preserved since Postgres 9.6: https://stackoverflow.com/a/50822258/3984221
According to this question, Postgres retains the order of rows inserted via a SELECT with an explicit ORDER BY as of 9.6. We can use this to connect the inserted rows with the rows they come from using row_number().
WITH
"cte1"
AS
(
SELECT "id",
3 "type",
"related_id",
"another1",
row_number() OVER (ORDER BY "id") "rn"
FROM "tablea"
WHERE "type" = 1
),
"cte2"
AS
(
INSERT INTO "tablea"
("type",
"another1")
SELECT "type",
"another1"
FROM "cte1"
ORDER BY "id"
RETURNING "id"
),
"cte3"
AS
(
SELECT "id",
row_number() OVER (ORDER BY "id") "rn"
FROM "cte2"
)
UPDATE "tablea"
SET "related_id" = "cte3"."id"
FROM "cte1"
INNER JOIN "cte3"
ON "cte3"."rn" = "cte1"."rn"
WHERE "cte1"."id" = "tablea"."id";
In the first CTE we get all the rows that should be inserted, along with their row_number() ordered by ID. In the second one we insert them by selecting from the first CTE, explicitly ordering by the ID. We return the inserted IDs from the second CTE so that we can select them in the third CTE, where we again add a row_number() ordered by the ID. We can now join the first and the third CTE via the row number to get pairs of original IDs and newly inserted IDs. Based on that, we can update the table, setting the related IDs.
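The pairing step can be sketched outside Postgres as well. The following is a minimal, hedged illustration in Python with the standard sqlite3 module: it inserts the copies with an explicit ORDER BY, numbers the originals and the copies with row_number(), and joins the two numberings. SQLite has no data-modifying CTEs, so the steps are separate statements in one transaction, and the max_id cut-off for recognising the copies is an assumption of this sketch. Like the Postgres version, it relies on ids being assigned in the order of the ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (
        id INTEGER PRIMARY KEY,
        type INTEGER,
        related_id INTEGER,
        another1 TEXT
    );
    INSERT INTO tablea (id, type, related_id, another1) VALUES
        (10, 1, NULL, 'some text'),
        (11, 2, NULL, 'something'),
        (12, 1, NULL, 'something else');
""")

# Number originals and copies separately, then pair equal row numbers.
PAIRS_SQL = """
    WITH originals AS (
        SELECT id, row_number() OVER (ORDER BY id) AS rn
        FROM tablea
        WHERE type = 1 AND id <= :max_id
    ),
    copies AS (
        SELECT id, row_number() OVER (ORDER BY id) AS rn
        FROM tablea
        WHERE id > :max_id
    )
    SELECT copies.id, originals.id
    FROM originals
    JOIN copies ON copies.rn = originals.rn
"""

with conn:  # one transaction
    # Highest id before the insert (hypothetical helper for this sketch).
    max_id = conn.execute("SELECT max(id) FROM tablea").fetchone()[0]

    # Insert the copies in id order so new ids follow the originals' order.
    conn.execute("""
        INSERT INTO tablea (type, another1)
        SELECT 3, another1 FROM tablea WHERE type = 1 ORDER BY id
    """)

    # Each pair is (new copy id, original id).
    pairs = conn.execute(PAIRS_SQL, {"max_id": max_id}).fetchall()
    conn.executemany("UPDATE tablea SET related_id = ? WHERE id = ?", pairs)

rows = conn.execute(
    "SELECT id, type, related_id, another1 FROM tablea ORDER BY id"
).fetchall()
```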
I have multiple tables that I am inserting into, and I would like one of the tables to be partitioned after inserting into it so that I can determine the most updated IDs and label their ACTIVE status. My SQL for one of the tables that already contains my data looks as follows:
CREATE TABLE IF NOT EXISTS MY_TABLE
(
LINK_ID BINARY NOT NULL,
LOAD TIMESTAMP NOT NULL,
SOURCE STRING NOT NULL,
SOURCE_DATE TIMESTAMP NOT NULL,
ORDER BIGINT NOT NULL,
ID BINARY NOT NULL,
ATTRIBUTE_ID BINARY NOT NULL
);
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
Currently, I am inserting records that show a change in any of the columns to the table. I have extended the table to now include an ACTIVE column and defaulted every record to TRUE for the current records in the table.
ALTER TABLE MY_TABLE ADD COLUMN ACTIVE BOOLEAN DEFAULT FALSE;
When a new record is inserted which indicates a change in one of the column values, I want that ACTIVE value for the new record to be TRUE for that ID group while changing the ACTIVE value for the other records within the ID group to be FALSE (so the previous records would be not considered active/would be FALSE while the most recent record that is inserted indicated by the ORDER value would be active/TRUE)
At first, I tried doing:
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY,
ACTIVE
)
SELECT *, OFFSET_NUMBER = MAX(OFFSET_NUMBER) OVER (PARTITION BY O_KEY) AS ACTIVE
FROM TEST_TABLE;
However, this does not seem to change the records for each ID group that already exist in the table to false when a new record is inserted that is considered to be the most recent active record. Is there a way I can run this select statement below after all the new records are inserted, but still have it be in the same statement that contains the insertion process?
SELECT *, ORDER = MAX(ORDER) OVER (PARTITION BY ID) AS ACTIVE
FROM MY_TABLE
I have a table that looks like:
ID|CREATED |VALUE
1 |1649122158|200
1 |1649122158|200
1 |1649122158|200
That I'd like to look like:
ID|CREATED |VALUE
1 |1649122158|200
And I run the following query:
DELETE FROM MY_TABLE T
USING (
SELECT ID, CREATED, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CREATED DESC) AS RANK_IN_KEY
FROM MY_TABLE T
) X
WHERE X.RANK_IN_KEY <> 1 AND T.ID = X.ID AND T.CREATED = X.CREATED
But it removes everything from MY_TABLE and not just other rows with the same value. This is more than just selecting distinct records, I'd like to enforce a unique constraint to get the latest value of ID and keep just one record for it, even if there were duplicates.
So
ID|CREATED |VALUE
1 |1649122158|200
1 |1649122159|300
2 |1649122158|200
2 |1649122158|200
3 |1649122170|500
3 |1649122160|200
Would become (using the same final unique constraint statement):
ID|CREATED |VALUE
1 |1649122159|300
2 |1649122158|200
3 |1649122170|500
How can I improve my logic to properly handle these unique constraint modifications?
Check out this post: https://community.snowflake.com/s/question/0D50Z00008EJgemSAD/how-to-delete-duplicate-records-
If all columns together make up a unique record, the recommended solution is to insert all the records into a new table with SELECT DISTINCT * and do a swap. You could also do an INSERT OVERWRITE INTO the same table.
Something like INSERT OVERWRITE INTO tableA SELECT DISTINCT * FROM tableA;
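To illustrate what a DISTINCT rebuild does and does not cover, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for Snowflake (an assumption made only so the example is runnable; the table name t is hypothetical and the data is taken from the question). It rebuilds the table from its distinct rows inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INT, created INT, value INT);
    INSERT INTO t VALUES
        (1, 1649122158, 200),
        (1, 1649122159, 300),
        (2, 1649122158, 200),
        (2, 1649122158, 200),
        (3, 1649122170, 500),
        (3, 1649122160, 200);
""")

with conn:  # one transaction: either the whole rebuild happens or none of it
    # Collect every distinct row, empty the table, and put them back.
    unique_rows = conn.execute(
        "SELECT DISTINCT id, created, value FROM t"
    ).fetchall()
    conn.execute("DELETE FROM t")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", unique_rows)

remaining = sorted(conn.execute("SELECT id, created, value FROM t").fetchall())
```

Only the exact duplicate for id 2 disappears here. Ids 1 and 3 still have two rows each, because their rows differ in CREATED and VALUE, so keeping just the latest row per id needs a ranking step on top of this.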
The following setup should leave only the rows with id 1 and 3, rather than deleting all rows as you describe.
Schema
create table t (
id int,
created int ,
value int
);
insert into t values(1, 1649122158, 200);
insert into t values(1 ,1649122159, 300);
insert into t values(2 ,1649122158, 200);
insert into t values(2 ,1649122158, 200);
insert into t values(3 ,1649122170, 500);
insert into t values(3 ,1649122160, 200);
Delete statement
with x as (
SELECT
id, created,
row_number() over(partition by id) as r
FROM t
)
delete from t
using x
where x.id = t.id and x.r <> 1 and x.created = t.created
;
Output
select * from t;
1 1649122158 200
3 1649122170 500
The logic is that the table in the USING clause is joined with the table being operated on. Following the join logic, rows are matched by a key. In your case, the key is {id, created}. This key is duplicated for the rows with id 2, so the whole group is deleted.
I'm not an expert in database schemas, but as a thought, you could add a column with a rank to the existing table and then proceed with the deletion. This way you do not need to create another table and insert values into it. Be warned that the data may become fragmented (physically, on disk), so you may need to run some kind of tune-up later.
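The underlying point is that the delete needs a value that distinguishes otherwise identical rows. As a hedged illustration, the sketch below uses Python's standard sqlite3 module, where every ordinary table has an implicit rowid that can play that role. This is an assumption specific to SQLite and does not carry over to Snowflake as is, where a rank or sequence column would have to be added first. The table name t is hypothetical and the data is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INT, created INT, value INT);
    INSERT INTO t VALUES
        (1, 1649122158, 200),
        (1, 1649122159, 300),
        (2, 1649122158, 200),
        (2, 1649122158, 200),
        (3, 1649122170, 500),
        (3, 1649122160, 200);
""")

with conn:
    # Rank rows per id, newest first; rowid breaks ties between exact
    # duplicates, so exactly one row per id gets rn = 1 and survives.
    conn.execute("""
        DELETE FROM t
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       row_number() OVER (
                           PARTITION BY id
                           ORDER BY created DESC, rowid
                       ) AS rn
                FROM t
            )
            WHERE rn > 1
        )
    """)

remaining = conn.execute(
    "SELECT id, created, value FROM t ORDER BY id"
).fetchall()
```

Matching on the per-row key instead of {id, created} means the two identical rows for id 2 are no longer deleted together.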
Update
You may find this near one-liner from another Stack Overflow answer interesting. I will duplicate the code here, as it is so small and well written.
WITH
u AS (SELECT DISTINCT * FROM your_table),
x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;
I am trying to get the row number of an inserted record so I can use it for a select statement. What I am trying to accomplish is insert a person into one table, get that row number and then select something from another table where the row numbers match. Here is what I got so far:
INSERT INTO TableA Values ('Person')
Select timeToken
From
(
Select
Row_Number() Over (Order By tokenOrder) As RowNum
, *
From TableB WHERE taken = false
) t2
Where RowNum = (Row Number of Inserted Item)
How do I get the row number of the inserted item, I want to compare ids as some records might have been deleted so they would not match.
TABLEA Data (primary key is id)
id name
3 John
12 Steve
TABLEB Data (primary key is id)
id timeToken tokenOrder taken
2 1:00am 1 false
3 2:00am 2 false
5 3:00am 3 true
6 4:00am 4 false
My expected result: when I insert a person, the select should return 4:00am.
I am doing this in a stored procedure.
It is an error to think that rows have numbers unless an ORDER BY clause is included.
The only way to find a row after you have inserted it is to search for it. Presumably your table has a primary key; use that to search for it.
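As a rough sketch of that advice, the example below uses Python's standard sqlite3 module (a stand-in for SQL Server, chosen only so the code is runnable). It reads the generated key of the inserted row, derives the row's position by counting rows with an id up to that key, and uses that position to pick a token. Interpreting "row number" as the position when TableA is ordered by id is an assumption about what the question intends, and taken is stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TableB (
        id INTEGER PRIMARY KEY,
        timeToken TEXT,
        tokenOrder INTEGER,
        taken INTEGER  -- 0 = false, 1 = true
    );
    INSERT INTO TableA VALUES (3, 'John'), (12, 'Steve');
    INSERT INTO TableB VALUES
        (2, '1:00am', 1, 0),
        (3, '2:00am', 2, 0),
        (5, '3:00am', 3, 1),
        (6, '4:00am', 4, 0);
""")

with conn:
    cur = conn.execute("INSERT INTO TableA (name) VALUES (?)", ("Person",))
    new_id = cur.lastrowid  # primary key generated for the new row

    # Position of the new row when TableA is ordered by id. Gaps left by
    # deleted rows do not matter, because rows are counted, not ids compared.
    position = conn.execute(
        "SELECT count(*) FROM TableA WHERE id <= ?", (new_id,)
    ).fetchone()[0]

    # Pick the untaken token whose row number equals that position.
    token = conn.execute("""
        SELECT timeToken FROM (
            SELECT timeToken,
                   row_number() OVER (ORDER BY tokenOrder) AS RowNum
            FROM TableB
            WHERE taken = 0
        )
        WHERE RowNum = ?
    """, (position,)).fetchone()
```

With the sample data the new person is the third row of TableA, and the third untaken token is the one the question expects.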
Try this. It may help you out:
DECLARE @TableA_PK BIGINT
INSERT INTO TableA Values ('Person')
SET @TableA_PK = SCOPE_IDENTITY()
Select timeToken
From
(
Select
Row_Number() Over (Order By tokenOrder) As RowNum
, *
From TableB WHERE taken = false
) t2
Where RowNum = @TableA_PK
SCOPE_IDENTITY() returns the last identity value inserted in the current scope, i.e. the primary key of the record just inserted. It can be stored in a variable and reused later.
By the sounds of it, you are trying to do something like what is described in the following link: SQL Server - Return value after INSERT
Basically :
INSERT INTO TableA (name)
OUTPUT Inserted.ID
VALUES('bob');
Adding a foreign key constraint(referencing primary key in table A) in table b will be good since you won't be able to delete records from table A without deleting them from table B. It'll be helpful for comparing the records using ID.
Try this
declare @rowNum int;
INSERT INTO TableA Values ('Person')
SET @rowNum = SCOPE_IDENTITY()
select * from TableA where id = @rowNum
To clarify the title: in the WHERE clause of a SELECT statement, I need to check the table I am selecting from using another SELECT. In that second SELECT, I have to find all the secondary IDs. Here is what I have worked out so far:
Declare @id INT
--inserting values in temp table
SELECT
rn = ROW_NUMBER() OVER (ORDER BY adt_trl_dt_tm),
*
INTO #Temp
FROM dbo.EVNT_HSTRY
ORDER BY adt_trl_dt_tm DESC
--Searching for items that are deleted and have not been restored
SELECT *
FROM dbo.EVNT_HSTRY hstry
WHERE evnt_hstry_cd LIKE '3' and
adt_trl_dt_tm > (SELECT adt_trl_dt_tm FROM dbo.EVNT_HSTRY WHERE evnt_id = evnt_id
DROP TABLE #Temp
To clarify the code, evnt_id is a foreign key. The primary key is evnt_Hstry_id. The evnt_hstry_cd 3 means deleted. What I am trying to do is to see if the field adt_trl_dt_tm (latest date modified) of the row being read is the latest by comparing it with all the adt_trl_dt_tm fields that have the same evnt_id.
The table I am doing the select on is the table where we store the history of the events. It is where we say when the event has been added, modified, deleted and or restored.
Sadly, I cannot do that in my application, as this statement is run in an SSIS package.
Overall, I need to compare the adt_trl_dt_tm with the other adt_trl_dt_tm that have the same evnt_id and select the latest.
Can you test this with your data?
SELECT *
FROM dbo.EVNT_HSTRY hstry
WHERE evnt_hstry_cd LIKE '3' and
not exists (select 1 from EVNT_HSTRY WHERE hstry.evnt_id = evnt_id
AND hstry.adt_trl_dt_tm < adt_trl_dt_tm) -- no later row exists for the same evnt_id
SELECT *
FROM dbo.EVNT_HSTRY hstry
WHERE evnt_hstry_cd = '3' and
adt_trl_dt_tm = (
SELECT max(adt_trl_dt_tm) FROM dbo.EVNT_HSTRY WHERE evnt_id = hstry.evnt_id
)
This returns a row if the code 3 entry is the most recent entry in hstry for that evnt_id, and no row if there is a more recent row that does not have code 3.
Change LIKE to = if the value has to match exactly.
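To make the behaviour of the correlated MAX() filter concrete, here is a small sketch in Python using the standard sqlite3 module as a stand-in for SQL Server. The sample rows, the timestamps, and every history code other than '3' are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EVNT_HSTRY (
        evnt_hstry_id INTEGER PRIMARY KEY,
        evnt_id INTEGER,
        evnt_hstry_cd TEXT,
        adt_trl_dt_tm TEXT
    );
    -- Hypothetical data: code '3' means deleted, the other codes are made up.
    INSERT INTO EVNT_HSTRY VALUES
        (1, 100, '1', '2024-01-01 09:00:00'),
        (2, 100, '3', '2024-01-02 09:00:00'),
        (3, 200, '1', '2024-01-01 10:00:00'),
        (4, 200, '3', '2024-01-03 10:00:00'),
        (5, 200, '4', '2024-01-04 10:00:00');
""")

# Keep a 'deleted' row only when it is the newest history row of its event.
deleted_not_restored = conn.execute("""
    SELECT hstry.evnt_hstry_id, hstry.evnt_id
    FROM EVNT_HSTRY AS hstry
    WHERE hstry.evnt_hstry_cd = '3'
      AND hstry.adt_trl_dt_tm = (
          SELECT max(adt_trl_dt_tm)
          FROM EVNT_HSTRY
          WHERE evnt_id = hstry.evnt_id
      )
""").fetchall()
```

Event 100 is returned because its deletion is its latest entry, while event 200 is skipped because a later entry follows the deletion.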
I have an Access table of the form (I'm simplifying it a bit)
ID AutoNumber Primary Key
SchemeName Text (50)
SchemeNumber Text (15)
This contains some data eg...
ID SchemeName SchemeNumber
--------------------------------------------------------------------
714 Malcolm ABC123
80 Malcolm ABC123
96 Malcolms Scheme ABC123
101 Malcolms Scheme ABC123
98 Malcolms Scheme DEF888
654 Another Scheme BAR876
543 Whatever Scheme KJL111
etc...
Now. I want to remove duplicate names under the same SchemeNumber. But I want to leave the record which has the longest SchemeName for that scheme number. If there are duplicate records with the same longest length then I just want to leave only one, say, the lowest ID (but any one will do really). From the above example I would want to delete IDs 714, 80 and 101 (to leave only 96).
I thought this would be relatively easy to achieve but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop it programmatically but I'd rather have a single DELETE query.
See if this query returns the rows you want to keep:
SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
(SELECT
SchemeNumber,
SchemeName,
Len(SchemeName) AS name_length,
ID
FROM tblSchemes
) AS r
INNER JOIN
(SELECT
SchemeNumber,
Max(Len(SchemeName)) AS name_length
FROM tblSchemes
GROUP BY SchemeNumber
) AS w
ON
(r.SchemeNumber = w.SchemeNumber)
AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;
If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.
DELETE
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);
Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:
DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));
DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
WHERE t1.SchemeNumber = t2.SchemeNumber
AND Length(t2.SchemeName) > Length(t1.SchemeName)
)
Depending on your RDBMS, you may need a different function than Length (Oracle: length, MySQL: length, SQL Server: LEN).
delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))
(SQL Server flavored)
Now updated to include the specified tie resolution. Although, you may get better performance doing it in two queries: first deleting the schemes with shorter names as in my original query and then going back and deleting the higher ID where there was a tie in name length.
I'd do this in multiple steps. Large delete operations done in a single step make me too nervous -- what if you make a mistake? There's no sql 'undo' statement.
-- Setup the data
DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS bar;
DROP TABLE IF EXISTS bat;
DROP TABLE IF EXISTS baz;
CREATE TABLE foo (
id int(11) NOT NULL,
SchemeName varchar(50),
SchemeNumber varchar(15),
PRIMARY KEY (id)
);
insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );
-- Find all the records that have dups, find the longest one
create table bar as
select max(length(SchemeName)) as max_length, SchemeNumber
from foo
group by SchemeNumber
having count(*) > 1;
-- Find the one we want to keep
create table bat as
select min(a.id) as id, a.SchemeNumber
from foo a join bar b on a.SchemeNumber = b.SchemeNumber
and length(a.SchemeName) = b.max_length
group by a.SchemeNumber;
-- Select into this table all the rows to delete
create table baz as
select a.id from foo a join bat b on a.SchemeNumber = b.SchemeNumber
and a.id != b.id;
This will give you a new table with only records for rows that you want to remove.
Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.
Then, when you're ready, use this command to delete the rows.
delete from foo where id in (select id from baz);
This seems like more work because of the different tables, but it's safer and probably just as fast as the other ways. Plus, you can stop at any step and make sure the data is what you want before you do any actual deletes.
If your platform supports ranking functions and common table expressions:
with cte as (
select row_number()
over (partition by SchemeNumber order by len(SchemeName) desc) as rn
from Table)
delete from cte where rn > 1;
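As a hedged, runnable variant of the ranking idea, the sketch below uses Python's standard sqlite3 module with the sample data from the question. Two things differ from the statement above and are assumptions of this sketch: SQLite cannot delete through a CTE, so the ranked ids are fed into a DELETE ... WHERE ID IN (...), and ID is added as a second sort key so that ties in name length keep the lowest ID, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSchemes (
        ID INTEGER PRIMARY KEY,
        SchemeName TEXT,
        SchemeNumber TEXT
    );
    INSERT INTO tblSchemes VALUES
        (714, 'Malcolm', 'ABC123'),
        (80, 'Malcolm', 'ABC123'),
        (96, 'Malcolms Scheme', 'ABC123'),
        (101, 'Malcolms Scheme', 'ABC123'),
        (98, 'Malcolms Scheme', 'DEF888'),
        (654, 'Another Scheme', 'BAR876'),
        (543, 'Whatever Scheme', 'KJL111');
""")

with conn:
    # Rank rows per scheme number: longest name first, lowest ID on ties.
    # Everything ranked below first place is deleted.
    conn.execute("""
        DELETE FROM tblSchemes
        WHERE ID IN (
            SELECT ID FROM (
                SELECT ID,
                       row_number() OVER (
                           PARTITION BY SchemeNumber
                           ORDER BY length(SchemeName) DESC, ID
                       ) AS rn
                FROM tblSchemes
            )
            WHERE rn > 1
        )
    """)

kept = [row[0] for row in conn.execute("SELECT ID FROM tblSchemes ORDER BY ID")]
```

For scheme ABC123 this removes IDs 714, 80 and 101 and leaves 96, which matches the outcome described in the question.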
try this:
Select * From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or this:
Select * From Table t
Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
if either of these selects the records that should be deleted, just change it to a delete
Delete
From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or using the second construction:
Delete From Table t Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))