SQLite: How to update rows with a sequence of numbers? - sql

In SQLite I would like to renumber the values in a specific column with a sequence of numbers.
For example the relevance-column in these rows:
relevance | value
----------+---------
3         | value1
5         | valueb
8         | valuex
9         | valueaa
must be updated starting from 1 with increment 1:
relevance | value
----------+---------
1         | value1
2         | valueb
3         | valuex
4         | valueaa
What I'm looking for, is something like this:
-- first set all to startvalue
UPDATE MyTable SET relevance = 0;
-- then renumber:
UPDATE MyTable SET relevance = (some function to increase by 1 to the previous row);
I tried this, but it's not increasing; it seems like Max is not re-evaluated for each row:
UPDATE MyTable SET relevance = (SELECT Max(relevance ))+1;

First create a temporary table holding the relevance column from your table plus a second column with the new sequence, generated by the ROW_NUMBER() window function, and then update from this temporary table (note the update correlates on relevance, so this assumes the existing relevance values are distinct):
drop table if exists temp.tmp;

create temporary table tmp as
select relevance, row_number() over (order by relevance) rn
from MyTable;

update MyTable
set relevance = (
    select rn from temp.tmp
    where temp.tmp.relevance = MyTable.relevance
);

drop table temp.tmp;
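As an aside: on SQLite 3.33.0 or newer you can skip the temporary table entirely with UPDATE ... FROM. A minimal sketch, assuming MyTable is a regular rowid table (keying on rowid also handles duplicate relevance values):

-- requires SQLite 3.33+ for UPDATE ... FROM
UPDATE MyTable
SET relevance = src.rn
FROM (
    SELECT rowid AS rid,
           row_number() OVER (ORDER BY relevance) AS rn
    FROM MyTable
) AS src
WHERE MyTable.rowid = src.rid;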


Remove duplicate rows based on specific columns

I have a table that contains these columns:
ID (varchar)
SETUP_ID (varchar)
MENU (varchar)
LABEL (varchar)
The thing I want to achieve is to remove all duplicates from the table based on two columns (SETUP_ID, MENU).
Table I have:
id | setup_id | menu   | label |
--------------------------------
1  | 10       | main   | txt   |
2  | 10       | main   | txt   |
3  | 11       | second | txt   |
4  | 11       | second | txt   |
5  | 12       | third  | txt   |
Table I want:
id | setup_id | menu   | label |
--------------------------------
1  | 10       | main   | txt   |
3  | 11       | second | txt   |
5  | 12       | third  | txt   |
You can achieve this with a common table expression (CTE):
with cte as (
    select id,
           row_number() over (partition by setup_id, menu order by id) rownum
    from atable
)
delete from atable
where id in (select id from cte where rownum >= 2);
The order by id inside the window ensures the row with the smallest id in each group is the one kept. This will give you your desired output.
See the PostgreSQL documentation on common table expressions.
Assuming a table named tbl where both setup_id and menu are defined NOT NULL and id is the PRIMARY KEY.
EXISTS will do nicely:
DELETE FROM tbl t0
WHERE EXISTS (
   SELECT FROM tbl t1
   WHERE  t1.setup_id = t0.setup_id
   AND    t1.menu = t0.menu
   AND    t1.id < t0.id
   );
This deletes every row where a dupe with lower id is found, effectively only keeping the row with the smallest id from each set of dupes. An index on (setup_id, menu) or even (setup_id, menu, id) will help performance with big tables a lot.
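For example (the index name is made up):

CREATE INDEX tbl_setup_id_menu_idx ON tbl (setup_id, menu, id);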
If there is no PK and no reliable UNIQUE (combination of) column(s), you can fall back to using the ctid. If NULL values can be involved, you need to specify how to deal with those.
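A minimal sketch of that fallback, keeping one arbitrary row per group; it only relies on tid equality, so it works on any Postgres version (though NOT IN does not scale well to very big tables):

DELETE FROM tbl
WHERE ctid NOT IN (
   SELECT DISTINCT ON (setup_id, menu) ctid
   FROM tbl
   ORDER BY setup_id, menu
   );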
Consider:
Delete duplicate rows from small table
How to delete duplicate rows without unique identifier
How do I (or can I) SELECT DISTINCT on multiple columns?
After cleaning up duplicates, add a UNIQUE constraint to prevent new dupes:
ALTER TABLE tbl ADD CONSTRAINT tbl_setup_id_menu_uni UNIQUE (setup_id, menu);
If you had an index on (setup_id, menu), drop that now. It's superseded by the UNIQUE constraint.
I have found a solution that fits me the best.
Here it is if anyone needs it:
DELETE FROM table_name
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY setup_id, menu
                                  ORDER BY id) AS row_num
        FROM table_name
    ) t
    WHERE t.row_num > 1
);
link: https://www.postgresql.org/docs/current/queries-union.html
https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT
Let's say the table name is a:
select distinct on (setup_id,menu ) a.* from a;
Key point: The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
This means that within this distinct on query you can only order by setup_id, menu (optionally followed by further tiebreaker expressions).
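For example, to deterministically keep the row with the smallest id per (setup_id, menu) group:

select distinct on (setup_id, menu) a.*
from a
order by setup_id, menu, id;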
If you want the opposite:
EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.
SELECT * FROM a
EXCEPT
select distinct on (setup_id,menu ) a.* from a;
You can try something along these lines to delete all but the first row in case of duplicates (please note that this is not tested in any way!):
DELETE FROM your_table WHERE id IN (
    SELECT unnest(duplicate_ids[2:]) FROM (
        SELECT array_agg(id ORDER BY id) AS duplicate_ids
        FROM your_table
        GROUP BY setup_id, menu
        HAVING COUNT(*) > 1
    ) dup
);
The above collects the ids of the duplicate rows (COUNT(*) > 1) into a sorted array (array_agg ... ORDER BY id), then takes all but the first element of that array ([2:]) and "explodes" the remaining id values into rows (unnest).
The outer query just deletes every id that ends up in that result.
For MySQL, a similar question is already answered here: Find and remove duplicate rows by two columns
See if either of these approaches helps.
I like the one below for MySQL (note that ALTER IGNORE was removed in MySQL 5.7, so it only works on older versions):
ALTER IGNORE TABLE your_table ADD UNIQUE (SETUP_ID, MENU);
DELETE t1
FROM table_name t1
JOIN table_name t2
  ON t2.setup_id = t1.setup_id
 AND t2.menu = t1.menu
 AND t2.id < t1.id;
There are many ways to find and delete duplicate rows based on conditions, but I like the inner join method, which is fast even with large amounts of data. Please check the following:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
    T1.id > T2.id
    AND T1.<ColumnName1> = T2.<ColumnName1>
    AND T1.<ColumnName2> = T2.<ColumnName2>;
In your case you can write it as follows:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
    T1.id > T2.id
    AND T1.setup_id = T2.setup_id
    AND T1.menu = T2.menu;
Let me know if you face any issue or need more help.

ORACLE - Setting RANK of duplicated on a big table, optimization needed

This is a simplified extract from a more complex algorithm.
The problem is I have a simple table C_HASH like this:
CREATE TABLE C_HASH
(
    HASH CHAR(48),
    RANK INTEGER
);
First I fill the table with all the hash values. But because there can be duplicates in HASH, I need to set the RANK per HASH to tell the duplicates apart.
I use this SQL statement, but it takes way too long; I have indexed the HASH column, with no effect:
UPDATE C_HASH a
SET RANK = (
    SELECT temp.rank
    FROM (
        SELECT rowid, RANK() OVER (PARTITION BY HASH ORDER BY ROWID) rank
        FROM C_HASH
    ) temp
    WHERE temp.rowid = a.rowid
);
I need to optimize this! A clue?
You could use the merge syntax:
merge into c_hash c
using (
    select rowid, row_number() over (partition by hash order by rowid) rank
    from c_hash
) c1
on (c1.rowid = c.rowid)
when matched then update set c.rank = c1.rank;
Sample data:
HASH | RANK
-----+-----
foo  | null
foo  | null
foo  | null
bar  | null
Results:
HASH | RANK
-----+-----
foo  |    1
foo  |    2
foo  |    3
bar  |    1
If you are going to update a lot of rows, it might be more efficient to create a new table, using the create table ... as select syntax:
create table c_hash2 as
select hash, row_number() over(partition by hash order by rowid) as rank
from c_hash
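If you go that route, you would then swap the new table in, e.g. (assuming there are no dependent constraints, triggers or grants to carry over):

DROP TABLE c_hash;
ALTER TABLE c_hash2 RENAME TO c_hash;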
This is going to take a long time, because you are updating all rows. But you can simplify the logic to:
update c_hash h
set rank = (
    select count(*)
    from c_hash h2
    where h2.hash = h.hash and h2.rowid <= h.rowid
);
This should be able to take advantage of your existing index.

SQL SELECT column value based on value

I am using Postgresql. I would like to write a SELECT statement with a column value based on the value in the database.
For example.
| id | indicator |
|----|-----------|
| 1  | 0         |
| 2  | 1         |
indicator can only be 0 or 1, where 0 = manual and 1 = auto.
Expected output from a SELECT *
1 manual
2 auto
You can use a case expression:
select id,
       case indicator
           when 0 then 'manual'
           when 1 then 'auto'
       end as indicator
from the_table;
If you need that frequently you could create a view for that.
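For example (the view name is made up):

create view the_table_labeled as
select id,
       case indicator
           when 0 then 'manual'
           when 1 then 'auto'
       end as indicator
from the_table;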
In the long run, it might be better to create a proper lookup table and join to that:
create table indicator
(
    id   integer primary key,
    name text not null
);

insert into indicator (id, name)
values (0, 'manual'), (1, 'auto');
alter table the_table
add constraint fk_indicator
foreign key (indicator) references indicator (id);
Then join to it:
select t.id, i.name as indicator
from the_table t
join indicator i on i.id = t.indicator;

Make sure no two rows contain identical values in Postgresql

I have a table and I want to make sure that no two rows can be alike.
So, for example, this table would be valid:
user_id | printer
--------+-------------
1       | LaserWriter
4       | LaserWriter
1       | ThinkJet
2       | DeskJet
But this table would not be:
user_id | printer
--------+-------------
1       | LaserWriter
4       | LaserWriter
1       | ThinkJet     <-- error (duplicate row)
2       | DeskJet
1       | ThinkJet     <-- error (duplicate row)
This is because the last table has two instances of 1 | ThinkJet.
So, user_id can be repeated (e.g. 1) and printer can be repeated (e.g. LaserWriter), but once a record like 1 | ThinkJet has been entered, that combination cannot be entered again.
How can I prevent such occurrences in a Postgresql 11.5 table?
I would try experimenting with SQL code but alas I am still new on the matter.
Please note this is for INSERTing data into the table, not SELECTing it. Like a constraint iirc.
Thanks
Here's your script
ALTER TABLE tableA ADD CONSTRAINT some_constraint PRIMARY KEY (user_id, printer);

INSERT INTO tableA (user_id, printer)
VALUES (1, 'LaserWriter')
ON CONFLICT (user_id, printer)
DO NOTHING;
You can use DISTINCT to filter duplicates out of a query result. For example:
SELECT DISTINCT user_id, printer FROM my_table;
Note that this only de-duplicates SELECT output; it does not prevent duplicate rows from being inserted. That's all. Hope it helps!
You need a series of steps (assuming there is no already assigned unique key):
1. Add a temporary column to make each row unique.
2. Assign a value to the new column.
3. Remove the already existing duplicates.
4. Create a unique or primary key constraint on the composite columns.
5. Remove the temporary column.
alter table your_table add temp_unique integer unique;

do $$
declare
    row_num integer := 1;
    c_assign cursor for
        select temp_unique
        from your_table
        for update;
begin
    for rec in c_assign
    loop
        update your_table
        set temp_unique = row_num
        where current of c_assign;
        row_num := row_num + 1;
    end loop;
end;
$$;
delete from your_table ytd
where exists (
    select 1
    from your_table ytk
    where ytd.user_id = ytk.user_id
      and ytd.printer = ytk.printer
      and ytd.temp_unique > ytk.temp_unique
);

alter table your_table add constraint id_prt_uk unique (user_id, printer);
alter table your_table drop temp_unique;
I found the answer. When creating the table I needed to specify the two columns as UNIQUE. Observe:
CREATE TABLE foo (user_id INT, printer VARCHAR(20), UNIQUE (user_id, printer));
Now, here are my results:
=# INSERT INTO foo VALUES (1, 'LaserWriter');
INSERT 0 1
=# INSERT INTO foo VALUES (4, 'LaserWriter');
INSERT 0 1
=# INSERT INTO foo VALUES (1, 'ThinkJet');
INSERT 0 1
=# INSERT INTO foo VALUES (2, 'DeskJet');
INSERT 0 1
=# INSERT INTO foo VALUES (1, 'ThinkJet');
ERROR: duplicate key value violates unique constraint "foo_user_id_printer_key"
DETAIL: Key (user_id, printer)=(1, ThinkJet) already exists.
=# SELECT * FROM foo;
 user_id |   printer
---------+-------------
       1 | LaserWriter
       4 | LaserWriter
       1 | ThinkJet
       2 | DeskJet
(4 rows)

PostgreSQL query on text array value

I have a table where one column has an array - but stored in a text format:
mytable
id  ids
--  -------
1   '[3,4]'
2   '[3,5]'
3   '[3]'
etc ...
I want to find all records that have the value 5 as an array element in the ids column.
I was trying to achieve this by using the "string to array" function and removing the [ symbols with the translate function, but couldn't find a way.
You can do this: http://www.sqlfiddle.com/#!1/5c148/12
select *
from tbl
where translate(ids, '[]','{}')::int[] && array[5];
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
You can also use bool_or: http://www.sqlfiddle.com/#!1/5c148/11
with a as
(
    select id, unnest(translate(ids, '[]','{}')::int[]) as elem
    from tbl
)
select id
from a
group by id
having bool_or(elem = 5);
To see the original elements:
with a as
(
    select id, unnest(translate(ids, '[]','{}')::int[]) as elem
    from tbl
)
select id, '[' || array_to_string(array_agg(elem), ',') || ']' as ids
from a
group by id
having bool_or(elem = 5);
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
PostgreSQL DDL is atomic, so if it's not too late in your project, restructure your stringly-typed array into a real array: http://www.sqlfiddle.com/#!1/6e18c/2
alter table tbl
add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
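Optionally, a GIN index lets the array operators below use an index rather than a sequential scan (the index name is made up):

create index tbl_id_array_gin on tbl using gin (id_array);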
Query:
select *
from tbl
where id_array && array[5]
Output:
| ID | ID_ARRAY |
-----------------
| 2 | 3,5 |
You can also use the contains operator: http://www.sqlfiddle.com/#!1/6e18c/6
select *
from tbl
where id_array @> array[5];
I prefer the && syntax though; it directly connotes intersection, reflecting that you are testing whether the two arrays (viewed as sets) have any elements in common.
http://www.postgresql.org/docs/8.2/static/functions-array.html
If you store the string representation of your arrays slightly differently, you can cast to array of integer directly:
INSERT INTO mytable
VALUES
(1, '{3,4}')
,(2, '{3,5}')
,(3, '{3}');
SELECT id, ids::int[]
FROM mytable;
Else, you have to put in one more step:
SELECT (translate(ids, '[]','{}'))::int[]
FROM mytable
I would consider making the column an array type to begin with.
Either way, you can find your row like this:
SELECT id, ids
FROM (
    SELECT id, ids, unnest(ids::int[]) AS elem
    FROM mytable
) x
WHERE elem = 5;
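One more option, as a sketch: the stored strings like '[3,4]' happen to be valid JSON, so on PostgreSQL 9.4+ you could also use jsonb containment without rewriting the brackets:

SELECT *
FROM mytable
WHERE ids::jsonb @> '5'::jsonb;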