PostgreSQL - Preserve relative position when using DISTINCT

I ran a query which returned a table like this.
d | e | f
---+-----+----
2 | 103 | C
6 | 201 | AB
1 | 102 | B
1 | 102 | B
1 | 102 | B
1 | 102 | B
1 | 102 | B
3 | 105 | E
3 | 105 | E
3 | 105 | E
What I want is the distinct rows, kept in the order in which they first appear. Basically I want this:
2 | 103 | C
6 | 201 | AB
1 | 102 | B
3 | 105 | E
I tried DISTINCT and GROUP BY, but they do not always preserve the position (they did preserve it in some other cases I had). Is there an easy way to do this, or does it need something like rank()?

SQL tables represent unordered sets. There is no ordering, unless you have an explicit order by with a column or expression.
If you have such an ordering, you can do what you want using group by:
select d, e, f
from t
group by d, e, f
order by min(a); -- assuming a is the column that specifies the ordering
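As a rough, runnable illustration of the GROUP BY / ORDER BY MIN(a) idea, here is a sketch that uses Python's built-in sqlite3 module as a test bed. The ordering column a is an assumption (it is not part of the original table); here it is simply an auto-assigned integer key that records insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: "a" is an explicit ordering column added for this sketch.
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, d INTEGER, e INTEGER, f TEXT)")
data = [(2, 103, "C"), (6, 201, "AB")] + [(1, 102, "B")] * 5 + [(3, 105, "E")] * 3
conn.executemany("INSERT INTO t (d, e, f) VALUES (?, ?, ?)", data)

# Each distinct (d, e, f) is sorted by the smallest "a" at which it appears.
distinct_in_order = conn.execute(
    "SELECT d, e, f FROM t GROUP BY d, e, f ORDER BY MIN(a)"
).fetchall()
print(distinct_in_order)
```

Without a column like a there is nothing reliable to sort on, which is the point of the answer above.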

Use a CASE expression in the ORDER BY (note that the string values must be quoted):
order by case when f = 'C' then 1 when f = 'AB' then 2
when f = 'B' then 3 when f = 'E' then 5 else null end

You can try ordering by the ctid system column, which describes the physical location of a row. The PostgreSQL documentation describes it as follows:
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.
Use row_number() as a window function to number the rows of each (d, e, f) group by ctid.
Then keep only the rows with rn = 1 and order them by ctid.
CREATE TABLE T(
d int,
e int,
f varchar(5)
);
insert into t values (2,103, 'C');
insert into t values (6,201, 'AB');
insert into t values (1,102, 'B');
insert into t values (1,102, 'B');
insert into t values (1,102, 'B');
insert into t values (1,102, 'B');
insert into t values (1,102, 'B');
insert into t values (3,105, 'E');
insert into t values (3,105, 'E');
insert into t values (3,105, 'E');
Query 1:
select d,e,f
from (
select d,e,f,ctid,row_number() over(partition by d,e,f order by ctid) rn
FROM T
)t1
where rn = 1
order by ctid
Results:
| d | e | f |
|---|-----|----|
| 2 | 103 | C |
| 6 | 201 | AB |
| 1 | 102 | B |
| 3 | 105 | E |

Related

How do I insert data into a table that already exists?

I'm trying to insert data into a table that already exists, but I can't find anything on how to do this. I only found how to insert this data into a new table.
Syntax error at or near Insert
Tutorial I visited
SELECT film_category.film_id, film_category.category_id, rental_duration, rental_rate
INSERT INTO category_description
FROM film_category
LEFT JOIN FILM
ON film_category.film_id = film.film_id
A simplified test to showcase methods to insert.
CREATE TABLE TableA (
ID INT GENERATED ALWAYS AS IDENTITY,
ColA1 INT,
ColA2 VARCHAR(30)
);
--
-- INSERT VALUES into existing table
--
INSERT INTO TableA (ColA1, ColA2) VALUES
(10, 'A'),
(20, 'B'),
(30, 'C');
3 rows affected
--
-- SELECT INTO new table
--
SELECT ID, ColA1+2 AS ColB1, ColA2||'2' AS ColB2
INTO TableB
FROM TableA;
3 rows affected
--
-- INSERT from SELECT with explicit columns
--
INSERT INTO TableA (ColA1, ColA2)
SELECT ColB1+1, CONCAT(LEFT(ColB2,1),'3') AS ColB23
FROM TableB;
3 rows affected
SELECT * FROM TableA;
id | cola1 | cola2
-: | ----: | :----
1 | 10 | A
2 | 20 | B
3 | 30 | C
4 | 13 | A3
5 | 23 | B3
6 | 33 | C3
--
-- INSERT from SELECT without columns
-- Only works when they have the same number of columns.
--
INSERT INTO TableB
SELECT *
FROM TableA;
6 rows affected
SELECT * FROM TableB;
id | colb1 | colb2
-: | ----: | :----
1 | 12 | A2
2 | 22 | B2
3 | 32 | C2
1 | 10 | A
2 | 20 | B
3 | 30 | C
4 | 13 | A3
5 | 23 | B3
6 | 33 | C3
The order is wrong: INSERT INTO has to come before the SELECT. See https://www.w3schools.com/sql/sql_insert_into_select.asp
Also see this answer
Insert into ... values ( SELECT ... FROM ... )
INSERT INTO category_description
SELECT
film_category.film_id,
film_category.category_id,
rental_duration,
rental_rate
FROM
film_category
LEFT JOIN FILM ON film_category.film_id = film.film_id
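The corrected statement order can be tried out with the sketch below, which runs it through Python's sqlite3 module. The table definitions and sample rows are assumptions made up for the illustration, since the question does not show the real schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schemas and data; the real tables are not shown in the question.
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, rental_duration INTEGER, rental_rate REAL);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    CREATE TABLE category_description (
        film_id INTEGER, category_id INTEGER, rental_duration INTEGER, rental_rate REAL);
    INSERT INTO film VALUES (1, 6, 0.99), (2, 3, 4.99);
    INSERT INTO film_category VALUES (1, 6), (2, 11), (3, 8);
""")

# INSERT INTO comes first; the SELECT that supplies the rows follows it.
conn.execute("""
    INSERT INTO category_description (film_id, category_id, rental_duration, rental_rate)
    SELECT fc.film_id, fc.category_id, f.rental_duration, f.rental_rate
    FROM film_category AS fc
    LEFT JOIN film AS f ON fc.film_id = f.film_id
""")
inserted = conn.execute("SELECT * FROM category_description ORDER BY film_id").fetchall()
print(inserted)
```

Because of the LEFT JOIN, the category row for film 3 (which has no film record in this sample data) is still inserted, with NULLs in the two rental columns.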

Compare two tables and show the value of another table if exist, if not exist just show status in SQL

I have two tables to compare in SQL. When the id from one exists in the other, the result I want is the value of data from the second table; when it doesn't exist it will show "Data not Exist" in the 'value' field name.
Example
Table 1
| id|
-----
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10|
Table 2
|id | value
---------
| 1 | 10|
| 2 | 9 |
| 3 | 7 |
| 4 | 8 |
| 5 | 6 |
I've tried the query below:
select a.id,
CASE when exists(select a.id from table2 b where a.id = b.id)
THEN value
else 'Data Not Exist'
END as Result_Value
from table1 a inner join table2 b
on a.id=b.id
order by a.id;
The Result is:
|id | Result_Value
---------
| 1 | 10|
| 2 | 9 |
| 3 | 7 |
| 4 | 8 |
| 5 | 6 |
That result is not what I wanted. The result I expect is:
|id | Result_Value
---------
| 1 | 10 |
| 2 | 9 |
| 3 | 7 |
| 4 | 8 |
| 5 | 6 |
| 6 | Data Not Exist |
| 7 | Data Not Exist |
| 8 | Data Not Exist |
| 9 | Data Not Exist |
| 10| Data Not Exist |
Note: This is a simplified version of my query; the real one is more complex and inner joins another table. I don't know exactly where I'm going wrong with EXISTS.
Just use a LEFT JOIN, and COALESCE any NULL values to 'Data not exist'. (If value is an integer column, some databases such as SQL Server and PostgreSQL need it cast to a character type first.)
SELECT a.id, COALESCE(b.value, 'Data not exist') AS value
FROM table1 a
LEFT JOIN table2 b ON b.id = a.id
ORDER BY a.id
Output:
id value
1 10
2 9
3 7
4 8
5 6
6 Data not exist
7 Data not exist
8 Data not exist
9 Data not exist
10 Data not exist
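A runnable sketch of the LEFT JOIN plus COALESCE approach, using Python's sqlite3 module as a test bed (an assumption, since the question does not name a database). The explicit CAST keeps both branches of COALESCE textual, which is what stricter engines require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, value INTEGER);
    INSERT INTO table1 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
    INSERT INTO table2 VALUES (1,10),(2,9),(3,7),(4,8),(5,6);
""")

# LEFT JOIN keeps every table1 row; unmatched rows have b.value = NULL,
# which COALESCE replaces with the status text.
result = conn.execute("""
    SELECT a.id, COALESCE(CAST(b.value AS TEXT), 'Data Not Exist') AS result_value
    FROM table1 AS a
    LEFT JOIN table2 AS b ON b.id = a.id
    ORDER BY a.id
""").fetchall()
for row in result:
    print(row)
```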
I found two issues here:
An inner join drops the rows of table1 that have no match in table2; use a left join instead.
The data types of the values returned by the CASE branches should be the same.
Here's your query:
select a.id,
case
when b.id is not null
then cast(b.value as varchar(50))
else
'Data Not Exist'
END as Result_Value
from table1 a
left join table2 b on a.id=b.id
order by a.id;
Alternatively, use a LEFT JOIN between Table1 and Table2 and ISNULL to replace NULL with 'Data not Exist'. ISNULL returns the type of its first argument, so cast the value to a character type first:
SELECT a.id, ISNULL(CAST(b.value AS VARCHAR(50)),'Data not Exist') AS value FROM dbo.Table1 a
LEFT JOIN dbo.Table2 b ON a.id=b.id
You can get the desired results by using a LEFT JOIN alongside one of:
COALESCE() expression.
ISNULL() function.
CASE expression.
IIF() function.
as follows:
SELECT T1.Id,
COALESCE(CAST(T2.Value AS VARCHAR(10)), 'Data Not Exist') ByCoalesce,
ISNULL(CAST(T2.Value AS VARCHAR(10)), 'Data Not Exist') ByIsNull,
CASE WHEN T2.Value IS NULL
THEN 'Data Not Exist'
ELSE CAST(T2.Value AS VARCHAR(10))
END ByCaseExpression,
IIF(T2.Value IS NULL, 'Data Not Exist', CAST(T2.Value AS VARCHAR(10))) ByIifFunction
FROM
(
VALUES
(1),
(2),
(3),
(4),
(5),
(6),
(7),
(8),
(9),
(10)
) T1(Id) LEFT JOIN
(
VALUES
(1, 10),
(2, 9 ),
(3, 7 ),
(4, 8 ),
(5, 6 )
) T2(Id, Value)
ON T1.Id = T2.Id;
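Note that you need to CAST() / CONVERT() the INT values to VARCHAR(n) because the VARCHAR data type has a lower precedence than the INT data type; without the cast, SQL Server would try to convert the string 'Data Not Exist' to INT and fail.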

UPDATE based on multiple "WHERE IN" conditions

Let's say I have a table I want to update based on multiple conditions. Each of these conditions is an equal-sized array, and the only valid cases are the ones which match the same index in the arrays.
That is, if we use the following SQL clause
UPDATE Foo
SET bar = 1
WHERE a IN ( 1, 2, 3, 4, 5)
AND b IN ( 6, 7, 8, 9, 0)
AND c IN ('a', 'b', 'c', 'd', 'e')
bar will be set to 1 for any row which has, for example, a = 1, b = 8, c = 'e'.
That is not what I want.
I need a clause where only a = 1, b = 6, c = 'a' or a = 2, b = 7, c = 'b' (etc.) works.
Obviously I could rewrite the clause as
UPDATE Foo
SET bar = 1
WHERE (a = 1 AND b = 6 AND c = 'a')
OR (a = 2 AND b = 7 AND c = 'b')
OR ...
This would work, but it's hardly extensible. Given the values of the conditions are variable and obtained programmatically, it'd be far better if I could set each array in one place instead of having to build a string-building loop to get that WHERE call right.
So, is there a better, more elegant way to have the same behavior as this last block?
Use a Table Value Constructor:
UPDATE f
SET bar = 1
FROM Foo f
WHERE EXISTS (
SELECT * FROM (VALUES (1,6,'a'),(2,7,'b'),(3,8,'c')) AS Trios(a,b,c)
WHERE Trios.a = f.a AND Trios.b = f.b AND Trios.c = f.c
)
You can use values() and join:
UPDATE f
SET bar = 1
FROM Foo f JOIN
(VALUES (1, 6, 'a'),
(2, 7, 'b'),
. . .
) v(a, b, c)
ON f.a = v.a AND f.b = v.b AND f.c = v.c;
Try this, it might work:
DECLARE @Temp AS Table ( a int, b int, c varchar(50))
INSERT INTO @Temp(a,b,c)
VALUES(1, 6, 'a'),
(2, 7, 'b'),
(3, 8, 'c'),
(4, 9, 'd'),
(5, 0, 'e')
UPDATE F
SET bar = 1
FROM FOO F INNER JOIN @Temp T
ON F.a = T.a AND F.b = T.b AND F.c = T.c
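Since the question says the values are obtained programmatically, the same idea can be sketched from application code: pair the arrays by index, load the pairs into a temporary table, and update through it. The sketch below uses Python's sqlite3 module; the table name trios, the sample rows in foo and the use of SQLite are all assumptions for illustration only.

```python
import sqlite3

# The three equally sized "arrays" from the question.
a_vals = [1, 2, 3, 4, 5]
b_vals = [6, 7, 8, 9, 0]
c_vals = ["a", "b", "c", "d", "e"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER, c TEXT, bar INTEGER DEFAULT 0)")
# Made-up sample rows: some match a same-index trio, some only match "IN" lists.
conn.executemany(
    "INSERT INTO foo (a, b, c) VALUES (?, ?, ?)",
    [(1, 6, "a"), (1, 8, "e"), (2, 7, "b"), (5, 0, "e"), (3, 9, "c")],
)

# zip() pairs the arrays by index, so only same-index combinations are stored.
conn.execute("CREATE TEMP TABLE trios (a INTEGER, b INTEGER, c TEXT)")
conn.executemany("INSERT INTO trios VALUES (?, ?, ?)", zip(a_vals, b_vals, c_vals))

conn.execute("""
    UPDATE foo SET bar = 1
    WHERE EXISTS (
        SELECT 1 FROM trios AS t
        WHERE t.a = foo.a AND t.b = foo.b AND t.c = foo.c
    )
""")
updated = conn.execute("SELECT a, b, c, bar FROM foo ORDER BY a, b").fetchall()
print(updated)
```

The rows (1, 8, 'e') and (3, 9, 'c') would have matched the three separate IN lists, but they are left untouched here because no trio has those values at the same index.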
When you read the data, don't save it as separate values but as a single string per combination, and then use the following:
update foo
set bar = 1
where concat(a,b,c) in ('16a','27b','38c','49d','50e')
It may not be the most elegant way, but it is very practical and simple.
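One caveat with the concatenation trick is that different combinations can produce the same string once a value has more than one digit. The sketch below (Python's sqlite3 module, with made-up rows) shows the collision and one possible workaround using a separator; it assumes the separator character never occurs in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER, c TEXT)")
# Two different (a, b, c) combinations, made up for this illustration.
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", [(1, 16, "a"), (11, 6, "a")])

# Plain concatenation: both rows collapse to the same key.
plain = conn.execute("SELECT a || b || c FROM foo ORDER BY a").fetchall()

# With a separator the two keys stay distinct.
separated = conn.execute(
    "SELECT a || '|' || b || '|' || c FROM foo ORDER BY a"
).fetchall()
print(plain, separated)
```

Wrapping the columns in a function also tends to prevent the database from using an index on a, b or c, so this approach suits small tables best.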
I could be entirely off the mark here--I'm not sure if you're passing in a set of values or what-have-you--but my first thought is using a series of CTEs.
I'm making considerable assumptions about your data, but here's an example you can run in SSMS based on my thoughts of your question.
-- Create @Data and insert some, er... data ---
DECLARE @Data TABLE ( id INT IDENTITY(100,1) PRIMARY KEY, a VARCHAR(1), b VARCHAR(1), c VARCHAR(1) );
INSERT INTO @Data ( a ) VALUES ('1'), ('2'), ('3'), ('4'), ('5');
INSERT INTO @Data ( b ) VALUES ('6'), ('7'), ('8'), ('9'), ('0');
INSERT INTO @Data ( c ) VALUES ('a'), ('b'), ('c'), ('d'), ('e');
So let's assume this is your data. I've kept it simple to make it easier to understand.
+-----+---+---+---+
| id | a | b | c |
+-----+---+---+---+
| 100 | 1 | | |
| 101 | 2 | | |
| 102 | 3 | | |
| 103 | 4 | | |
| 104 | 5 | | |
| 105 | | 6 | |
| 106 | | 7 | |
| 107 | | 8 | |
| 108 | | 9 | |
| 109 | | 0 | |
| 110 | | | a |
| 111 | | | b |
| 112 | | | c |
| 113 | | | d |
| 114 | | | e |
+-----+---+---+---+
Query the data with aligned "array" indexes:
;WITH CTE_A AS (
SELECT
id,
ROW_NUMBER() OVER ( ORDER BY id ) AS a_row_id,
a
FROM @Data WHERE a IS NOT NULL
)
, CTE_B AS (
SELECT
id,
ROW_NUMBER() OVER ( ORDER BY id ) AS b_row_id,
b
FROM @Data WHERE b IS NOT NULL
)
, CTE_C AS (
SELECT
id,
ROW_NUMBER() OVER ( ORDER BY id ) AS c_row_id,
c
FROM @Data WHERE c IS NOT NULL
)
SELECT
CTE_A.id, CTE_A.a_row_id, CTE_A.a
, CTE_B.id, CTE_B.b_row_id, CTE_B.b
, CTE_C.id, CTE_C.c_row_id, CTE_C.c
FROM CTE_A
JOIN CTE_B ON CTE_A.a_row_id = CTE_B.b_row_id
JOIN CTE_C ON CTE_A.a_row_id = CTE_C.c_row_id;
Which returns:
+-----+----------+---+-----+----------+---+-----+----------+---+
| id | a_row_id | a | id | b_row_id | b | id | c_row_id | c |
+-----+----------+---+-----+----------+---+-----+----------+---+
| 100 | 1 | 1 | 105 | 1 | 6 | 110 | 1 | a |
| 101 | 2 | 2 | 106 | 2 | 7 | 111 | 2 | b |
| 102 | 3 | 3 | 107 | 3 | 8 | 112 | 3 | c |
| 103 | 4 | 4 | 108 | 4 | 9 | 113 | 4 | d |
| 104 | 5 | 5 | 109 | 5 | 0 | 114 | 5 | e |
+-----+----------+---+-----+----------+---+-----+----------+---+
Again, assumptions made on your data (in particular an id exists that can be sorted), but this basically pivots it by linking the a, b and c values on their relative "index" (ROW_NUMBER). By using ROW_NUMBER in this way, we can create a makeshift array index value ( a_row_id, b_row_id, c_row_id ) that can be used to join the resulting values.
This example can easily be changed to an UPDATE statement.
Does this address your question?

Classify records based on matching table

I have two tables: ITEMS and MATCHING_ITEMS, as below:
ITEMS:
|---------------------|------------------|
| ID | Name |
|---------------------|------------------|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
| 6 | F |
| 7 | G |
|---------------------|------------------|
MATCHING_ITEMS:
|---------------------|------------------|
| ID_1 | ID_2 |
|---------------------|------------------|
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| 4 | 5 |
| 4 | 6 |
| 5 | 6 |
|---------------------|------------------|
The MATCHING_ITEMS table defines items that match each other and thus belong to the same group: items 1, 2 and 3 match each other and form one group, and the same goes for items 4, 5 and 6. Item 7 has no match and so does not belong to any group.
I now need to add a 'Group' column on the ITEMS table which contains a unique integer for each group, so it would look as follows:
ITEMS:
|---------------------|------------------|------------------|
| ID | Name | Group |
|---------------------|------------------|------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 3 | C | 1 |
| 4 | D | 2 |
| 5 | E | 2 |
| 6 | F | 2 |
| 7 | G | NULL |
|---------------------|------------------|------------------|
So far I have been using a stored procedure to do this, looping over each line in the MATCHING_ITEMS table and updating the ITEMS table with a group value. The problem is that I eventually need to do this for a table containing millions of records, and the looping method is far too slow.
Is there a way that I can achieve this without using a loop?
If you have all pairs of matches in the matching table, then you can just use the minimum id to assign the group. For this:
select i.*,
(case when mi.grp_id is not null
then dense_rank() over (order by mi.grp_id)
end) as grouping
from items i left join
-- list each item once per pair it appears in, on either side
(select m.id, min(least(m.id_1, m.id_2)) as grp_id
from (select id_1 as id, id_1, id_2 from matching_items
union all
select id_2 as id, id_1, id_2 from matching_items
) m
group by m.id
) mi
on i.id = mi.id;
Note: This works only if all pairs are in the matching items table. Otherwise, you will need a recursive/hierarchical query to get all the pairs.
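For the case mentioned in the note, where not every pair is listed, a recursive query can follow the matches transitively. The sketch below is one possible shape of such a query, run through Python's sqlite3 module; the incomplete sample pairs, the CTE names edges and reach, and the final numbering step done in Python are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matching_items (id_1 INTEGER, id_2 INTEGER);
    INSERT INTO items VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'F'),(7,'G');
    -- Deliberately incomplete: pairs (1,3) and (4,6) are missing.
    INSERT INTO matching_items VALUES (1,2),(2,3),(4,5),(5,6);
""")

rows = conn.execute("""
    WITH RECURSIVE
    edges(a, b) AS (              -- treat every match as a two-way link
        SELECT id_1, id_2 FROM matching_items
        UNION
        SELECT id_2, id_1 FROM matching_items
    ),
    reach(id, other) AS (         -- every item reachable from each item
        SELECT a, b FROM edges
        UNION                     -- UNION drops repeats, so the recursion stops
        SELECT r.id, e.b FROM reach AS r JOIN edges AS e ON e.a = r.other
    )
    SELECT i.id, i.name, MIN(r.other) AS rep   -- smallest id in the item's group
    FROM items AS i LEFT JOIN reach AS r ON r.id = i.id
    GROUP BY i.id, i.name
    ORDER BY i.id
""").fetchall()

# Number the representatives 1, 2, ... in ascending order; unmatched items stay None.
reps = sorted({rep for _, _, rep in rows if rep is not None})
group_no = {rep: n for n, rep in enumerate(reps, start=1)}
grouped = [(item_id, name, group_no.get(rep)) for item_id, name, rep in rows]
print(grouped)
```

The reach set grows with the square of the group size, so for millions of rows with large groups this is only a starting point, not a tuned solution.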
You could use min and max at first, then dense_rank to assign group numbers:
select id, name, dense_rank() over (order by mn, mx) grp
from (
select distinct id, name,
min(id_1) over (partition by name) mn,
max(id_2) over (partition by name) mx
from items left join matching_items on id in (id_1, id_2))
order by id
The pairs 2,3 and 5,6 in the Matching_items table seem redundant as they could be derived (if I am reading your question right)
Here is how I did it. I just reused id_1 from your example as the group no:
create table
items (
ID number,
name varchar2 (2)
);
insert into items values (1, 'A');
insert into items values (2, 'B');
insert into items values (3, 'C');
insert into items values (4, 'D');
insert into items values (5, 'E');
insert into items values (6, 'F');
insert into items values (7, 'G');
create table
matching_items (
ID number,
ID_2 number
);
insert into matching_items values (1, 2);
insert into matching_items values (1, 3);
insert into matching_items values (2, 3);
insert into matching_items values (4, 5);
insert into matching_items values (4, 6);
insert into matching_items values (5, 6);
with new_grp as
(
select id, id_2, id as group_no
from matching_items
where id in (select id from items)
and id not in (select id_2 from matching_items)),
assign_grp as
(
select id, group_no
from new_grp
union
select id_2, group_no
from new_grp)
select items.id, name, group_no
from items left outer join assign_grp
on items.id = assign_grp.id;

Using Limit on Distinct group by values psql

Suppose I have a table that looks like this (though maybe I am going nowhere with this).
create table customers (id text, name text, number int, useless text);
With values
insert into customers (id, name, number, useless)
values
('1','apple',1, 'a'),
('2','banana',3, 'b'),
('3','pear',2, 's'),
('4','apple',1,'e'),
('5','banana',3,'s'),
('6','cherry',3, 'a'),
('7','cherry',4, 's'),
('8','apple',2, 'd'),
('9','banana',4, 'c'),
('10','pear',5, 'e');
My failed psql query is this.
select id, name, number, useless
from customers
where number < 4
group by customers.name limit 2
The query I want should return all rows belonging to the first 2 distinct values of customers.name, not the first 2 rows.
In the end I want it to return
('1','apple',1, 'a'),
('4','apple',1,'e'),
('8','apple',2, 'd'),
('2','banana',3, 'b'),
('5','banana',3,'s'),
so it returns the first 2 grouped names.
How can I make this query?
Thank you.
Edit:
This query is my second try; I know I am kind of close.
select t.id, t.name, t.ranking
from (
SELECT id, name, dense_rank() OVER (order by name) as
ranking
FROM customers
group by name
) t
where t.ranking < 3
Try this (the number < 4 filter is repeated in the outer query so that rows such as banana 4 are not returned):
select id, name, number, useless
from customers
where number < 4
and name in (
select name
from customers
where number < 4
group by customers.name
order by name limit 2
)
order by name, id
| id | name | number | useless |
|----|--------|--------|---------|
| 1 | apple | 1 | a |
| 4 | apple | 1 | e |
| 8 | apple | 2 | d |
| 2 | banana | 3 | b |
| 5 | banana | 3 | s |
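To check the "first two names" idea end to end, here is a small sketch that loads the question's sample data into an in-memory database via Python's sqlite3 module (SQLite is an assumption; the question uses PostgreSQL) and picks the first two names with a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, name TEXT, number INTEGER, useless TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    ("1", "apple", 1, "a"), ("2", "banana", 3, "b"), ("3", "pear", 2, "s"),
    ("4", "apple", 1, "e"), ("5", "banana", 3, "s"), ("6", "cherry", 3, "a"),
    ("7", "cherry", 4, "s"), ("8", "apple", 2, "d"), ("9", "banana", 4, "c"),
    ("10", "pear", 5, "e"),
])

# The subquery picks the first two names (alphabetically) that have a row
# with number < 4; the outer query returns every qualifying row for them.
first_two_groups = conn.execute("""
    SELECT id, name, number, useless
    FROM customers
    WHERE number < 4
      AND name IN (
          SELECT DISTINCT name FROM customers
          WHERE number < 4
          ORDER BY name
          LIMIT 2
      )
    ORDER BY name, id
""").fetchall()
for row in first_two_groups:
    print(row)
```

Note that id is a text column in the question, so ORDER BY id sorts it as text; that happens to give the expected order for this sample data.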
GROUP BY does not order your output; it only groups rows by customers.name. PostgreSQL also rejects it here, because the selected columns id, number and useless are neither grouped nor aggregated. If what you want is to order the rows by name, use ORDER BY:
select id, name, number, useless
from customers
order by name asc
Use asc for ascending order or desc for descending order.
Hope it helps you.
You can use dense_rank() as:
SELECT * FROM (
SELECT DENSE_RANK() OVER (order by name) AS rank, temp.*
FROM customers temp WHERE number < 4) data
WHERE data.rank <= 2
| rank| id| name | number | useless |
|-----|---|--------|--------|---------|
| 1 | 4 | apple | 1 | e |
| 1 | 1 | apple | 1 | a |
| 1 | 8 | apple | 2 | d |
| 2 | 5 | banana | 3 | s |
| 2 | 2 | banana | 3 | b |