I have a table that looks like:
ID|CREATED |VALUE
1 |1649122158|200
1 |1649122158|200
1 |1649122158|200
That I'd like to look like:
ID|CREATED |VALUE
1 |1649122158|200
And I run the following query:
DELETE FROM MY_TABLE T
USING (
  SELECT ID, CREATED,
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CREATED DESC) AS RANK_IN_KEY
  FROM MY_TABLE T
) X
WHERE X.RANK_IN_KEY <> 1
  AND T.ID = X.ID
  AND T.CREATED = X.CREATED
But it removes everything from MY_TABLE, not just the duplicate rows. This is more than selecting distinct records: I want to enforce uniqueness on ID, keeping only one record per ID with the latest CREATED value, even if there were duplicates.
So
ID|CREATED |VALUE
1 |1649122158|200
1 |1649122159|300
2 |1649122158|200
2 |1649122158|200
3 |1649122170|500
3 |1649122160|200
Would become (after applying the same deduplication statement):
ID|CREATED |VALUE
1 |1649122159|300
2 |1649122158|200
3 |1649122170|500
How can I improve my logic to properly handle these unique constraint modifications?
Check out this post: https://community.snowflake.com/s/question/0D50Z00008EJgemSAD/how-to-delete-duplicate-records-
If all columns make up a unique record, the recommended solution is to insert all the records into a new table with SELECT DISTINCT * and do a swap. You could also do an INSERT OVERWRITE INTO the same table.
Something like INSERT OVERWRITE INTO tableA SELECT DISTINCT * FROM tableA;
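A minimal sketch of the create-and-swap variant, assuming Snowflake (the table names are illustrative):

-- Deduplicate into a new table, then atomically swap it with the original.
CREATE OR REPLACE TABLE my_table_dedup AS
SELECT DISTINCT * FROM my_table;

ALTER TABLE my_table_dedup SWAP WITH my_table;

-- The old (duplicated) data now lives in my_table_dedup; drop it once satisfied.
DROP TABLE my_table_dedup;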
The following setup should leave only the rows with id 1 and 3, and not delete all rows as you describe.
Schema
create table t (
  id int,
  created int,
  value int
);
insert into t values (1, 1649122158, 200);
insert into t values (1, 1649122159, 300);
insert into t values (2, 1649122158, 200);
insert into t values (2, 1649122158, 200);
insert into t values (3, 1649122170, 500);
insert into t values (3, 1649122160, 200);
Delete statement
delete from t
using (
  select
    id, created,
    row_number() over (partition by id order by created desc) as r
  from t
) x
where x.id = t.id and x.r <> 1 and x.created = t.created;
Output
select * from t;
1 1649122159 300
3 1649122170 500
The logic is this: the table in the USING clause is joined with the table being operated on, and rows are matched by some key. In your case, that key is {id, created}. The key is duplicated for the rows with id 2, so every physical row in that group matches a row with r <> 1, and the whole group is deleted.
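A way around this, sticking with Snowflake, is to skip the join entirely and rewrite the surviving rows in place. A minimal sketch, assuming the table layout from the question:

-- Keep only the latest row per id; exact duplicates collapse to a single
-- arbitrary survivor because ROW_NUMBER breaks ties arbitrarily.
INSERT OVERWRITE INTO my_table
SELECT id, created, value
FROM my_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY created DESC) = 1;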
I'm not savvy with database schemas. But as a thought, you could add a rank column to the existing table and then proceed with the deletion. That way you do not need to create another table and insert values into it. Be warned that the data may become fragmented (physically, on disk), so you may need to run some kind of tune-up later.
Update
You may find this almost one-liner interesting:
SO answer
I will duplicate the code here, as it is so small and well written.
WITH
u AS (SELECT DISTINCT * FROM your_table),
x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;
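Note that this snippet relies on data-modifying CTEs, which are a PostgreSQL feature; Snowflake's WITH clause only supports SELECT, so there the INSERT OVERWRITE form shown earlier is the closer equivalent.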
I have the following table:
create table test(id serial, key int, type text, words text[], numbers int[]);
insert into test(key,type,words) select 1,'Name',array['Table'];
insert into test(key,type,numbers) select 1,'product_id',array[2];
insert into test(key,type,numbers) select 1,'price',array[40];
insert into test(key,type,numbers) select 1,'Region',array[23,59];
insert into test(key,type,words) select 2,'Name',array['Table1'];
insert into test(key,type,numbers) select 2,'product_id',array[1];
insert into test(key,type,numbers) select 2,'price',array[34];
insert into test(key,type,numbers) select 2,'Region',array[23,59,61];
insert into test(key,type,words) select 3,'Name',array['Chair'];
insert into test(key,type,numbers) select 3,'product_id',array[5];
I was using the query below to pivot the table for users.
select key,
max(array_to_string(words,',')) filter(where type='Name') as "Name",
cast(max(array_to_string(numbers,',')) filter(where type='product_id') as int) as "product_id",
cast(max(array_to_string(numbers,',')) filter(where type='price') as int) as "price" ,
max(array_to_string(numbers,',')) filter(where type='Region') as "Region"
from test group by key
But I couldn't unnest the Region column during the pivot in order to use the Region column in a join with another table.
My expected output is below
Since we are using unnest("Region") to do the pivot, there must be a row with region data for each product.
Or the code below will do the trick by creating an array containing a single null:
unnest(CASE WHEN array_length("Region", 1) >= 1
THEN "Region"
ELSE '{null}'::int[] END)
Schema:
create table test(id serial, key int, type text, words text[], numbers int[]);
insert into test(key,type,words) select 1,'Name',array['Table'];
insert into test(key,type,numbers) select 1,'product_id',array[2];
insert into test(key,type,numbers) select 1,'price',array[40];
insert into test(key,type,numbers) select 1,'Region',array[23,59];
insert into test(key,type,words) select 2,'Name',array['Table1'];
insert into test(key,type,numbers) select 2,'product_id',array[1];
insert into test(key,type,numbers) select 2,'price',array[34];
insert into test(key,type,numbers) select 2,'Region',array[23,59,61];
insert into test(key,type,words) select 3,'Name',array['Chair'];
insert into test(key,type,numbers) select 3,'product_id',array[5];
select key,"Name",product_id,price,unnest(CASE WHEN array_length("Region", 1) >= 1
THEN "Region"
ELSE '{null}'::int[] END) from
(
select key,
max(array_to_string(words,',')) filter(where type='Name') as "Name",
cast(max(array_to_string(numbers,',')) filter(where type='product_id') as int) as "product_id",
cast(max(array_to_string(numbers,',')) filter(where type='price') as int) as "price" ,
max(numbers) filter(where type='Region') as "Region"
from test group by key
)t order by key
key | Name   | product_id | price | unnest
1   | Table  | 2          | 40    | 23
1   | Table  | 2          | 40    | 59
2   | Table1 | 1          | 34    | 23
2   | Table1 | 1          | 34    | 59
2   | Table1 | 1          | 34    | 61
3   | Chair  | 5          | null  | null
db<>fiddle here
Very strange database design... I'm assuming you inherited it?
If none of the other array values will ever have a cardinality > 1, then you can simply unnest:
select
key,
(max (words) filter (where type = 'Name'))[1] as name,
(max (numbers) filter (where type = 'product_id'))[1] as product_id,
(max (numbers) filter (where type = 'price'))[1] as price,
unnest (max (numbers) filter (where type = 'Region')) as region
from test
group by key
If they can have multiple values, that can also be handled.
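For instance, a minimal sketch of that case using a lateral join, assuming the test schema above (treating words as the multi-valued column is just for illustration):

-- Expand every element of the array into its own row instead of
-- picking element [1].
select t.key, w.name
from test t
cross join lateral unnest(t.words) as w(name)
where t.type = 'Name';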
-- EDIT 3/15/2021 --
Short version: an unnest against a null won't produce a row, so if you coalesce the null value into an array of a single null element, that should take care of this part:
select
key,
(max (words) filter (where type = 'Name'))[1] as name,
(max (numbers) filter (where type = 'product_id'))[1] as product_id,
(max (numbers) filter (where type = 'price'))[1] as price,
unnest (coalesce (max (numbers) filter (where type = 'Region'), array[null]::integer[])) as region
from test
group by key
order by key
Now for the part you didn't ask... I and at least one other have been gently nudging you that your database design is going to cause multiple problems at every turn. The fact that it's in production doesn't mean you shouldn't fix it as soon as you can.
This design is what's known as EAV - Entity - Attribute - Value. It has its use cases, but like most good things it can also be applied when it shouldn't. The use case that comes to mind is if you want users to be able to dynamically add attributes to certain objects. Even then, there might be better/easier ways.
And as one example, if you have one million objects, five attributes means you have to store that as five million rows, and the majority of that space will be occupied with repeating the key and attribute names.
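For contrast, a minimal sketch of a more conventional design for this data (table and column names are illustrative):

-- One row per product; the multi-valued Region becomes a child table.
create table product (
  product_id int primary key,
  name       text,
  price      int
);

create table product_region (
  product_id int references product (product_id),
  region     int,
  primary key (product_id, region)
);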
Just food for thought. We can continue to triage this with every new scenario you find, but it would be better to redo the design.
I need to replace non-zero values in a column within a select statement.
SELECT Status, Name, Car from Events;
I can do it like this:
SELECT Replace(Status, '1', 'Ready'), Name, Car FROM Events;
Or using Case/Update.
But I have numbers from -5 to 10, and writing a Replace or something similar for each case is not a good idea.
How can I do this comparison and replacement without updating the database?
Table looks like this:
Status Name Car
0 John Porsche
1 Bill Dodge
5 Megan Ford
The standard method is to use case:
select t.*,
(case when status = 1 then 'Ready'
else 'Something else'
end) as status_string
from t;
I would instead recommend, though, that you have a status reference table:
create table statuses (
status int primary key,
name varchar(255)
);
insert into statuses (status, name)
values (0, 'UNKNOWN'),
(1, 'READY'),
. . . -- for the rest of the statuses
Then use JOIN:
select t.*, s.name
from t join
statuses s
on t.status = s.status;
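One caveat: an inner join drops any row whose status is missing from the reference table. A hedged variant that keeps such rows (the 'UNKNOWN' fallback label is illustrative):

select t.*, coalesce(s.name, 'UNKNOWN') as status_name
from t
left join statuses s
     on t.status = s.status;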
SELECT IF(status = 1, 'Approved', 'Pending') FROM TABLENAME
I am new to SQL and I am trying to understand the GROUP BY statement.
I have inserted the following data in SQL:
CREATE TABLE table( id integer, type text);
INSERT INTO table VALUES (1,'start');
INSERT INTO table VALUES (2,'start');
INSERT INTO table VALUES (2,'complete');
INSERT INTO table VALUES (3,'complete');
INSERT INTO table VALUES (3,'start');
INSERT INTO table VALUES (4,'start');
I want to select those IDs that do not have a type 'complete'. For this example I should get IDs 1, 4.
I have tried multiple GROUP BY - HAVING combinations. My best approach is:
SELECT id from customers group by type having type!='complete';
but the resulting IDs are 4, 3, 2.
Could anyone give me a hint about what I am doing wrong?
You are close. The having clause needs an aggregation function and you need to aggregate by id:
select id
from table t
group by id
having sum(case when type = 'complete' then 1 else 0 end) = 0;
Normally, if you have something called an id, you would also have a table with that as primary key. If so, you can also do:
select it.id
from idtable it
where not exists (select 1
from table t
where t.type = 'complete' and it.id = t.id
);
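Another option, with the same schema (table shortened to t as in the queries above), is standard set difference:

-- ids that appear at all, minus ids that have a 'complete' row
select id from t
except
select id from t where type = 'complete';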
I've got a problem once again :D
A little info first:
I'm trying to copy data from one table to another table (the structure is the same).
Now one cell needs to be incremented, beginning per group at 1 (just like a history).
I have this table:
-- same structure for both My_Test and My_Test2
create table My_Test (
  my_Id   Number(8,0),
  my_Num  Number(6,0),
  my_Data Varchar2(100)
);
(my_Id and my_Num form a composite primary key)
If I want to insert a new row, I need to check whether the value in my_Id already exists.
If it does, then I need to use the next my_Num for that id.
I have this in my table:
My_Id My_Num My_Data
1 1 'test1'
1 2 'test2'
2 1 'test3'
If I now add a row for my_Id 1, the new row would look like this:
My_Id My_Num My_Data
1 3 'test4'
This sounds pretty easy; now I need to express it in SQL.
On SQL Server I had the same problem, and I used this:
Insert Into My_Test (My_Id,My_Num,My_Data)
SELECT my_Id,
(
SELECT
CASE (
CASE MAX(a.my_Num)
WHEN NULL
THEN 0
Else Max(A.My_Num)
END) + b.My_Num
WHEN NULL
THEN 1
ELSE (
CASE MAX(a.My_Num)
WHEN NULL
THEN 0
Else Max(A.My_Num)
END) + b.My_Num
END
From My_Test A
where my_id = 1
)
,My_Data
From My_Test2 B
where my_id = 1;
This select returns null if no rows are found in the subselect.
Is there a way I could use MAX in the CASE, so that if it returns null, 0 or 1 is used instead?
Edit:
I'm now using this:
Insert INTO My_Test ( My_Id,My_Num,My_Data )
SELECT B.My_Id,
(
SELECT COALESCE(MAX(a.My_Num),0) + b.my_Num
FROM My_Test A
Where a.My_Id = b.My_Id)
,b.My_Data
FROM My_Test2 B
WHERE My_Id = 1
THX to Bharat and OMG Ponies
greets
Auro
Try this one
Insert Into My_Test (My_Id,My_Num,My_Data)
SELECT my_Id,(
SELECT NVL(MAX(My_Num), 0) + 1
From My_Test
where my_id = b.my_id
)
,My_Data
From My_Test2 B
where my_id = <your id>;
Insert Into My_Test (My_Id,My_Num,My_Data)
select My_id, coalesce(max(My_num), 0) + 1, 'test4' from My_Test
where My_id=1
group by My_id
All of these solutions share a problem: they don't work in a multi-user environment. If two sessions issue that insert statement at the same time, they will both get the same (my_id, my_num) combination, and one of them will fail with an ORA-00001 unique constraint violation. Therefore, if you need this to work in a multi-user environment, the best advice is to use only one primary key column and populate it with a sequence. Keep your my_id column as well, as it is a sort-of-grouping column or foreign key column. If your end users really like to see the "my_num" column in their (web) application, you can use the row_number analytic function.
You can read more about this scenario in this blogpost of mine: http://rwijk.blogspot.com/2008/01/sequence-within-parent.html
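A minimal sketch of that approach, assuming Oracle (the sequence name and the surrogate-key column are illustrative):

create sequence my_test_seq;

-- Single-column surrogate primary key; my_id stays as the grouping column.
create table my_test (
  pk      number primary key,
  my_id   number(8,0),
  my_data varchar2(100)
);

insert into my_test (pk, my_id, my_data)
values (my_test_seq.nextval, 1, 'test4');

-- Derive my_num for display instead of storing it:
select my_id,
       row_number() over (partition by my_id order by pk) as my_num,
       my_data
from my_test;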
Regards,
Rob.