MySQL get rows but prefer one column value over another - sql

A bit of a strange one: I want to write a MySQL query that gets results from a table but prefers one value of a column over another, i.e.
id name value priority
1 name1 value1 NULL
2 name1 value1 1
3 name2 value2 NULL
4 name3 value3 NULL
So here name1 has two entries, but one has a priority of 1. I want to get all the values from the table, but prefer the rows with whatever priority I'm after.
The results I'd be after would be
id name value priority
2 name1 value1 1
3 name2 value2 NULL
4 name3 value3 NULL
An equivalent way of saying it would be 'get all rows from the table, but prefer rows with a priority of x'.

This should do it:
SELECT
T1.id,
T1.name,
T1.value,
T1.priority
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.name = T1.name AND
T2.priority > COALESCE(T1.priority, -1)
WHERE
T2.id IS NULL
This also allows you to have multiple priority levels with the highest being the one that you want to return (if you had a 1 and 2, the 2 would be returned).
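To see the self-join at work, here is a minimal sketch that runs the same query against the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only to keep the example self-contained; the join logic itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE My_Table (id INTEGER PRIMARY KEY, name TEXT, value TEXT, priority INTEGER);
INSERT INTO My_Table VALUES
  (1, 'name1', 'value1', NULL),
  (2, 'name1', 'value1', 1),
  (3, 'name2', 'value2', NULL),
  (4, 'name3', 'value3', NULL);
""")

# A row survives only if no other row with the same name has a higher priority.
# COALESCE(..., -1) lets a NULL priority lose against any real priority.
rows = conn.execute("""
SELECT T1.id, T1.name, T1.value, T1.priority
FROM My_Table T1
LEFT OUTER JOIN My_Table T2
  ON T2.name = T1.name
 AND T2.priority > COALESCE(T1.priority, -1)
WHERE T2.id IS NULL
ORDER BY T1.id
""").fetchall()

print(rows)
```

Row 1 is dropped because row 2 has the same name and a higher priority; rows 3 and 4 have no competitor and are kept.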
I will also say though that it does seem like there are some design problems in the DB. My approach would have been:
My_Table (id, name)
My_Values (id, priority, value)
with an FK on id to id. PKs on id in My_Table and id, priority in My_Values. Of course, I'd use appropriate table names too.

You need to redesign your table first.
It should be:
YourTable (Id, Name, Value)
YourTablePriority (PriorityId, Priority, Id)
Update:
select * from YourTable a
where a.Id not in
(select b.Id from YourTablePriority b)
This should work in SQL Server; you may need a small change to make it work in MySQL.

Maybe something like:
SELECT id, name, value, priority FROM
table_name GROUP BY name ORDER BY priority
Although not having a database in front of me I can't test it...

If I understand correctly, you want the value of a name given a specific priority, or the value associated with a NULL priority. (You do not necessarily want the MAX(priority) that exists.)
Yes, you've got some awkward design issues which you should address, but let's solve the problem you do have at present (and you can later migrate to the problem you ought to have :) ):
mysql> SET @priority = 1; -- the priority we want, if recorded
mysql> PREPARE stmt FROM "
SELECT
t0.*
FROM
t t0
LEFT JOIN
(SELECT DISTINCT name, priority FROM t WHERE priority = ?) t1
ON t0.name = t1.name
WHERE
t0.priority = t1.priority
OR
t1.priority IS NULL
";
mysql> EXECUTE stmt USING @priority;
+----+-------+--------+----------+
| id | name | value | priority |
+----+-------+--------+----------+
| 2 | name1 | valueX | 1 |
| 3 | name2 | value2 | NULL |
| 4 | name3 | value3 | NULL |
+----+-------+--------+----------+
3 rows in set (0.00 sec)
(Note that I changed the prioritized value of "name1" to "valueX" in the above -- your original formulation had identical value values for "name1" regardless of priority, which made it hard for me to understand why you cared to discriminate one from the other.)
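The same parameterized query can be tried outside the mysql client. The sketch below is only an illustration: it uses Python's sqlite3 module instead of MySQL, binds the wanted priority through a `?` placeholder, and wraps the call in a helper named `rows_preferring`, which is a made-up name for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, value TEXT, priority INTEGER);
INSERT INTO t VALUES
  (1, 'name1', 'value1', NULL),
  (2, 'name1', 'valueX', 1),
  (3, 'name2', 'value2', NULL),
  (4, 'name3', 'value3', NULL);
""")

QUERY = """
SELECT t0.id, t0.name, t0.value, t0.priority
FROM t t0
LEFT JOIN (SELECT DISTINCT name, priority FROM t WHERE priority = ?) t1
  ON t0.name = t1.name
WHERE t0.priority = t1.priority   -- the name has a row at the wanted priority
   OR t1.priority IS NULL         -- the name has no row at the wanted priority
ORDER BY t0.id
"""

def rows_preferring(priority):
    """Hypothetical helper: run the query with the wanted priority bound to '?'."""
    return conn.execute(QUERY, (priority,)).fetchall()

preferred = rows_preferring(1)
print(preferred)
```

Note that for a name with no row at the requested priority, every row of that name comes back, whatever its priority.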

Related

Remove duplicate rows based on specific columns

I have a table that contains these columns:
ID (varchar)
SETUP_ID (varchar)
MENU (varchar)
LABEL (varchar)
The thing I want to achieve is to remove all duplicates from the table based on two columns (SETUP_ID, MENU).
Table I have:
id | setup_id | menu | label |
-------------------------------------
1 | 10 | main | txt |
2 | 10 | main | txt |
3 | 11 | second | txt |
4 | 11 | second | txt |
5 | 12 | third | txt |
Table I want:
id | setup_id | menu | label |
-------------------------------------
1 | 10 | main | txt |
3 | 11 | second | txt |
5 | 12 | third | txt |
You can achieve this with a common table expression (cte)
with cte as (
select id, setup_id, menu,
row_number() over (partition by setup_id, menu order by id) rownum
from atable )
delete from atable a
where id in (select id from cte where rownum >= 2)
This will give you your desired output.
Common Table Expression docs
Assuming a table named tbl where both setup_id and menu are defined NOT NULL and id is the PRIMARY KEY.
EXISTS will do nicely:
DELETE FROM tbl t0
WHERE EXISTS (
SELECT FROM tbl t1
WHERE t1.setup_id = t0.setup_id
AND t1.menu = t0.menu
AND t1.id < t0.id
);
This deletes every row where a dupe with lower id is found, effectively only keeping the row with the smallest id from each set of dupes. An index on (setup_id, menu) or even (setup_id, menu, id) will help performance with big tables a lot.
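A quick way to check the EXISTS approach against the question's data is sketched below. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, so two details are adapted: the subquery uses `SELECT 1` (SQLite does not accept an empty select list) and the outer table is referenced by its own name instead of an alias. The id column is an integer here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE tbl (
  id       INTEGER PRIMARY KEY,
  setup_id INTEGER NOT NULL,
  menu     TEXT    NOT NULL,
  label    TEXT
);
INSERT INTO tbl VALUES
  (1, 10, 'main',   'txt'),
  (2, 10, 'main',   'txt'),
  (3, 11, 'second', 'txt'),
  (4, 11, 'second', 'txt'),
  (5, 12, 'third',  'txt');
""")

# Delete every row for which a duplicate with a smaller id exists.
conn.execute("""
DELETE FROM tbl
WHERE EXISTS (
  SELECT 1
  FROM tbl AS t1
  WHERE t1.setup_id = tbl.setup_id
    AND t1.menu     = tbl.menu
    AND t1.id       < tbl.id
)
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM tbl ORDER BY id")]
print(remaining)
```

Only the smallest id of each (setup_id, menu) group is left, which matches the table the question asks for.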
If there is no PK and no reliable UNIQUE (combination of) column(s), you can fall back to using the ctid. If NULL values can be involved, you need to specify how to deal with those.
Consider:
Delete duplicate rows from small table
How to delete duplicate rows without unique identifier
How do I (or can I) SELECT DISTINCT on multiple columns?
After cleaning up duplicates, add a UNIQUE constraint to prevent new dupes:
ALTER TABLE tbl ADD CONSTRAINT tbl_setup_id_menu_uni UNIQUE (setup_id, menu);
If you had an index on (setup_id, menu), drop that now. It's superseded by the UNIQUE constraint.
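The effect of the constraint can be illustrated with a small sketch. It uses SQLite via Python's sqlite3 module, where `ALTER TABLE ... ADD CONSTRAINT` is not available, so a unique index with the same name is created instead; that substitution is an assumption of this example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE tbl (
  id       INTEGER PRIMARY KEY,
  setup_id INTEGER NOT NULL,
  menu     TEXT    NOT NULL,
  label    TEXT
);
-- The table after the duplicates have been removed.
INSERT INTO tbl VALUES
  (1, 10, 'main',   'txt'),
  (3, 11, 'second', 'txt'),
  (5, 12, 'third',  'txt');
-- SQLite substitute for: ALTER TABLE tbl ADD CONSTRAINT ... UNIQUE (setup_id, menu)
CREATE UNIQUE INDEX tbl_setup_id_menu_uni ON tbl (setup_id, menu);
""")

# Trying to insert a new duplicate of (10, 'main') is now rejected.
try:
    conn.execute("INSERT INTO tbl VALUES (6, 10, 'main', 'txt')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
print(rejected, row_count)
```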
I have found a solution that fits me the best.
Here it is if anyone needs it:
DELETE FROM table_name
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY setup_id,
menu
ORDER BY id ) AS row_num
FROM table_name ) t
WHERE t.row_num > 1 );
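To make the ROW_NUMBER() step visible, the sketch below first prints the ranking that the inner query produces and then runs the delete. It assumes SQLite 3.25 or newer (for window functions) through Python's sqlite3 module, used here as a stand-in for PostgreSQL, with an integer id for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
conn.executescript("""
CREATE TABLE table_name (id INTEGER PRIMARY KEY, setup_id INTEGER, menu TEXT, label TEXT);
INSERT INTO table_name VALUES
  (1, 10, 'main',   'txt'),
  (2, 10, 'main',   'txt'),
  (3, 11, 'second', 'txt'),
  (4, 11, 'second', 'txt'),
  (5, 12, 'third',  'txt');
""")

RANKED = """
SELECT id,
       ROW_NUMBER() OVER (PARTITION BY setup_id, menu ORDER BY id) AS row_num
FROM table_name
"""

# Each (setup_id, menu) group is numbered from 1 in id order.
ranking = conn.execute(RANKED + " ORDER BY id").fetchall()
print(ranking)

# Everything numbered 2 or higher is a duplicate and gets deleted.
conn.execute(
    "DELETE FROM table_name WHERE id IN "
    "(SELECT id FROM (" + RANKED + ") t WHERE t.row_num > 1)"
)
kept = [row[0] for row in conn.execute("SELECT id FROM table_name ORDER BY id")]
print(kept)
```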
link: https://www.postgresql.org/docs/current/queries-union.html
https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT
Let's say the table name is a:
select distinct on (setup_id,menu ) a.* from a;
Key point: The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
Which means that in this DISTINCT ON query the ORDER BY must begin with setup_id, menu; any further ORDER BY expressions decide which row of each group is kept.
Want the opposite:
EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.
SELECT * FROM a
EXCEPT
select distinct on (setup_id,menu ) a.* from a;
You can try something along these lines to delete all but the first row in case of duplicates (please note that this is not tested in any way!):
DELETE FROM your_table WHERE id IN (
SELECT unnest(duplicate_ids[2:]) FROM (
SELECT array_agg(id) AS duplicate_ids FROM your_table
GROUP BY SETUP_ID, MENU
HAVING COUNT(*) > 1
) AS dupes
)
The above collects the ids of the duplicate rows (COUNT(*) > 1) in an array (array_agg), then takes all but the first element in that array ([2:]) and "explodes" the id values into rows (unnest).
The outer query just deletes every id that ends up in that result.
For MySQL, a similar question is already answered here: Find and remove duplicate rows by two columns
See whether any of the approaches there helps.
I like the below one for MySQL (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it only works on older versions):
ALTER IGNORE TABLE your_table ADD UNIQUE (SETUP_ID, MENU);
DELETE t1
FROM table_name t1
join table_name t2 on
t2.setup_id = t1.setup_id and t2.menu = t1.menu and t2.id < t1.id
There are many ways to find and delete duplicate rows based on conditions, but I like the inner join method, which works very fast even on large amounts of data. The general form is:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
T1.id > T2.id AND
T1.<ColumnName1> = T2.<ColumnName1> AND T1.<ColumnName2> = T2.<ColumnName2>;
In your case you can write as follows :
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
T1.id > T2.id AND
T1.setup_id = T2.setup_id AND T1.menu = T2.menu;
Let me know if you face any issue or need more help.

(SQL) How to select all fields from table 1 except the id's from table 2?

I got 3 tables.
Users:
id__| login__
--------------
_1__| root
_2__| admin
_3__| user
Table 1
id__ | name__ | data_
---------------------
1____| name1__| data1
2____| name2__| data2
3____| name3__| data3
4____| name4__| data4
Table2
id__ | table1_id_| user_id
---------------------------
1____| ____3_____| ___3___
2____| ____2_____| ___3___
3____| ____2_____| ___1___
4____| ____3_____| ___1___
I want to get 'name' and 'data' from table1, excluding the rows whose ids appear in table2 for users.id = 3. That means I need to get this:
Table result (all fields from table1) (result for user_id = 3) (btw for user_id = 1 result must be the same):
Desired output:
id__| name__ | data_
--------------------
1___| name1__| data1
4___| name4__| data4
---------------------
Which SQL query should I use?
Although you can solve this using NOT IN, I recommend using NOT EXISTS instead:
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS (SELECT 1
FROM table2 t2
WHERE t2.table1_id = t1.id AND t2.user_id = 3
);
Why? NOT IN behaves rather strangely if any table1_id value is NULL. If that occurs, then the NOT IN only returns false and NULL -- it never returns true. Hence, no rows at all will be returned if even one column value is NULL.
NOT EXISTS, on the other hand, behaves more intuitively, so you don't have to worry about this condition.
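The difference is easy to reproduce. The sketch below loads the question's tables into SQLite through Python's sqlite3 module and adds one extra table2 row whose table1_id is NULL; that extra row is hypothetical and exists only to trigger the behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, data TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, user_id INTEGER);
INSERT INTO table1 VALUES
  (1, 'name1', 'data1'), (2, 'name2', 'data2'),
  (3, 'name3', 'data3'), (4, 'name4', 'data4');
INSERT INTO table2 VALUES
  (1, 3, 3), (2, 2, 3), (3, 2, 1), (4, 3, 1),
  (5, NULL, 3);  -- hypothetical row, not in the question
""")

# NOT IN: the NULL in the subquery result makes the test NULL for ids 1 and 4.
not_in = conn.execute("""
SELECT id FROM table1
WHERE id NOT IN (SELECT table1_id FROM table2 WHERE user_id = 3)
ORDER BY id
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so ids 1 and 4 come back.
not_exists = conn.execute("""
SELECT t1.id FROM table1 t1
WHERE NOT EXISTS (
  SELECT 1 FROM table2 t2
  WHERE t2.table1_id = t1.id AND t2.user_id = 3
)
ORDER BY t1.id
""").fetchall()

print(not_in, not_exists)
```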
If I understood correctly, this should be what you want:
SELECT * FROM table1 WHERE id NOT IN (SELECT table1_id FROM table2 WHERE user_id = 3);

Determine source on COALESCE fields

I have two tables which are identical in structure but belong to different schemas (schemas A and B). All rows in question will always appear in A.table but may or may not appear in B.table. B.table is essentially an override for the defaults in A.table.
As such my query uses a COALESCE on each field similar to:
SELECT COALESCE(B.id, A.id) as id,
COALESCE(B.foo, A.foo) as foo,
COALESCE(B.bar, A.bar) as bar
FROM A.table LEFT JOIN B.table ON (A.id = B.id)
WHERE A.id in (1, 2, 3)
This works great, but I also want to add the source of the data. In the example above, assuming id=2 existed in B.table but not 1 or 3, I would want to include some indication that A is the source for 1 and 3 and B is the source for 2.
So the data might look like the following
+---------------------------------+
| id | foo | bar | source |
+---------------------------------+
| 1 | a | b | A |
| 2 | c | d | B |
| 3 | e | f | A |
+---------------------------------+
I don't really care what the value of source is as long as I can distinguish A from B.
I am no pgsql expert (not by a long shot) but I have tinkered around with EXISTS and a subquery but have had no luck so far.
As records showing the default value (from A.table) have NULLs for B.id, all you need is to add this column specification to your query:
CASE WHEN B.id IS NULL THEN 'A' ELSE 'B' END AS Source
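Put together with the original query, this could look like the sketch below. It runs on SQLite through Python's sqlite3 module, so the two schemas are replaced by two plain tables named a and b; those names and the sample values for id 2 in a are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);  -- defaults
CREATE TABLE b (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);  -- overrides
INSERT INTO a VALUES (1, 'a', 'b'), (2, 'x', 'y'), (3, 'e', 'f');
INSERT INTO b VALUES (2, 'c', 'd');
""")

rows = conn.execute("""
SELECT COALESCE(b.id,  a.id)  AS id,
       COALESCE(b.foo, a.foo) AS foo,
       COALESCE(b.bar, a.bar) AS bar,
       -- b.id is NULL exactly when the LEFT JOIN found no override row
       CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source
FROM a
LEFT JOIN b ON a.id = b.id
WHERE a.id IN (1, 2, 3)
ORDER BY a.id
""").fetchall()

print(rows)
```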
The USING clause would simplify the query you have:
SELECT id
, COALESCE(B.foo, A.foo) AS foo
, COALESCE(B.bar, A.bar) AS bar
, CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source -- like @Terje provided
FROM a
LEFT JOIN b USING (id)
WHERE a.id IN (1, 2, 3);
But typically, this alternative query should serve you better:
SELECT x.* -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT *, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT *, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
Advantages:
You don't have to add another COALESCE construct for every column you want to add to the result.
The same query works for any number of columns in a and b.
The query even works if the column names are not identical. Only number and data types of columns must match.
Of course, you can always list selected, compatible columns as well:
SELECT * -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT foo, bar, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT foo2, bar17, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
The first SELECT determines names, data types and number of columns.
This query doesn't break if columns in b are not defined NOT NULL.
COALESCE cannot tell the difference between b.foo IS NULL and no row with matching id in b. So the source of any result column (except id) can still be 'A', even if the result row says 'B' - if any relevant column in b can be NULL.
My alternative returns all values from b if the row exists - including NULL values. So the result can be different if columns in b can be NULL. It depends on your requirements which behavior is desirable.
Either query assumes that id is defined as primary key (so exactly 1 or 0 rows per given id value).
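The NULL caveat can be shown with one override row that has a NULL column. The sketch below uses SQLite via Python's sqlite3 module, which has no LATERAL, so the row-level pick is emulated with UNION ALL plus NOT EXISTS; that is a swapped-in technique for illustration, not the LATERAL query above. Table names and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);
INSERT INTO a VALUES (1, 'a', 'b'), (2, 'x', 'y'), (3, 'e', 'f');
INSERT INTO b VALUES (2, NULL, 'd');  -- override row with a NULL foo
""")

# Column-by-column COALESCE: foo silently falls back to a.foo,
# although the row is labelled as coming from B.
per_column = conn.execute("""
SELECT a.id,
       COALESCE(b.foo, a.foo),
       COALESCE(b.bar, a.bar),
       CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END
FROM a LEFT JOIN b ON a.id = b.id
WHERE a.id = 2
""").fetchall()

# Row-level pick: take the whole row from b if it exists, else from a.
per_row = conn.execute("""
SELECT id, foo, bar, 'B' AS source FROM b WHERE id = 2
UNION ALL
SELECT id, foo, bar, 'A' FROM a
WHERE id = 2 AND NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
""").fetchall()

print(per_column, per_row)
```

The first result mixes values from both tables under the label 'B'; the second keeps the NULL from b. Which one is right depends on the requirements.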
Related:
Select first record if none match
What is the difference between LATERAL and a subquery in PostgreSQL?

SQL Server : Nested Select Query

I have a SQL query returning results based on a where clause.
I would like to include some more results, from the same table, dependent on what is found in the first select.
My select returns rows with IDs that meet the where criteria. The table may have more rows with the same ID that do not meet the initial where criteria. Rather than re-querying the DB with a separate call, I would like to use one select statement to also get these extra rows with the same ID. ID is not the index/primary key; it's just a naming convention I am using here.
Pseudo: (two steps)
1: select * from table where condition=xxx
2: for each row returned, (select * from table where id=row.id)
I want to do:
select
id as thisID, field1, field2,
(select id, field1, field2 from table where id = thisID)
from
table
where
condition=xxx
I have multiple joins in my real query, and just can't get the above to work. Unfortunately I cannot supply the real query, but I get an error of:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS. Invalid column name 'thisID'
My query works fine with the multiple joins, without the above. I am trying to retrieve these extra records as part of the current working query.
Example:
TABLE
select * from table where col3 = 'green'
id, col1, col2, col3
123 | blue | red | green
-------------------------
567 | blue | red | green
-------------------------
123 | blue | red | blue
-------------------------
890 | blue | red | green
-------------------------
I want to return all 4 rows, because although row 3 fails the where condition, it has the same id value as row 1 (123), and I need to include it, as it is part of a "set" that I need to locate / import, called / referenced by id=123.
What I am doing manually now, is getting row one, and then running another query based on row 1's ID, to get row 3 as well.
You can use Where IN
select id as thisID, field1, field2 from table
where id in
(select id from table where condition=xxx)
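Applied to the example rows from the question, the IN approach can be checked with the sketch below. It uses SQLite through Python's sqlite3 module instead of SQL Server, names the table items (a placeholder), and adds one extra row with id 999 that is not in the question, only to show that unrelated ids stay out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.executescript("""
CREATE TABLE items (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
INSERT INTO items VALUES
  (123, 'blue', 'red', 'green'),
  (567, 'blue', 'red', 'green'),
  (123, 'blue', 'red', 'blue'),
  (890, 'blue', 'red', 'green'),
  (999, 'blue', 'red', 'blue');  -- hypothetical row, no 'green' sibling
""")

# The subquery finds the ids that meet the condition; the outer query
# then returns every row carrying one of those ids.
rows = conn.execute("""
SELECT id, col3
FROM items
WHERE id IN (SELECT id FROM items WHERE col3 = 'green')
ORDER BY id, col3
""").fetchall()

print(rows)
```

Both rows for id 123 come back, including the one that fails the condition on its own, while id 999 is left out.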
Try this.
Let's say your table is as below and is called #Temp:
Id Col1 Col2 Col3
123 blue red green
567 blue red green
123 blue red blue
890 blue red green
First, get the matching ids into a temp table:
Create Table #T1(Id int)
Insert Into #T1
Select Id
From #Temp
Where Col3='green'
Then
Select distinct *
From #Temp
Where Id in (select Id from #T1) Or Col3='Green'
This returns all the rows from the main table.
Update
If you want to keep the approach you are currently using, try something like the below:
select
id as thisID, field1, field2,
(select top 1 id from table where id = t.id) as Id,
(select top 1 field1 from table where id = t.id) as field1,
(select top 1 field2 from table where id = t.id) as field2
from
table t
where
condition=xxx

In postgresql, how can I fill in missing values within a column?

I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value on a given column. For instance, in the below example, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John
ID # | Name
-------|-----
1 | Bob
1 | (null)
1 | (null)
2 | John
2 | (null)
2 | (null)
EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.
CREATE TABLE bobjohn
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bobjohn(id, zname) VALUES
(1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
;
UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
;
SELECT * FROM bobjohn;
NOTE: this query will fail if more than one name exists for a given Id. (and it won't touch records for which no non-null name exists)
If you are on a Postgres version >= 9, you could use a CTE to fetch the source tuples (this is equivalent to a subquery, but is easier to write and read, IMHO). The CTE also tackles the duplicate-values problem (in a rather crude way):
--
-- CTEs don't work in update queries for Postgres versions below 9
--
WITH uniq AS (
SELECT DISTINCT id
-- if there are more than one names for a given Id: pick the lowest
, min(zname) as zname
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
)
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
;
SELECT * FROM bobjohn;
UPDATE tbl
SET name = x.name
FROM (
SELECT DISTINCT ON (id) id, name
FROM tbl
WHERE name IS NOT NULL
ORDER BY id, name
) x
WHERE x.id = tbl.id
AND tbl.name IS NULL;
DISTINCT ON does the job alone. No need for additional aggregation.
In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.
Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.
If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:
UPDATE some_table AS t1
SET name = (
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
)
WHERE name IS NULL;
If you only want to query the table and have this information filled in on the fly, you can use a similar query:
SELECT
t1.id,
(
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
) AS name
FROM some_table AS t1;
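Both variants can be tried on the question's Bob/John data. The sketch below runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; because of that, the UPDATE refers to the outer table by name rather than by alias. It assumes, as stated above, that each id has at most one distinct non-null name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE some_table (id INTEGER NOT NULL, name TEXT);
INSERT INTO some_table VALUES
  (1, 'Bob'),  (1, NULL), (1, NULL),
  (2, 'John'), (2, NULL), (2, NULL);
""")

# Variant 1: fill the name in on the fly, leaving the table untouched.
filled = conn.execute("""
SELECT t1.id,
       (SELECT t2.name
        FROM some_table AS t2
        WHERE t2.id = t1.id AND t2.name IS NOT NULL
        LIMIT 1) AS name
FROM some_table AS t1
ORDER BY t1.id
""").fetchall()

# Variant 2: write the looked-up name into the rows where it is missing.
conn.execute("""
UPDATE some_table
SET name = (SELECT t2.name
            FROM some_table AS t2
            WHERE t2.id = some_table.id AND t2.name IS NOT NULL
            LIMIT 1)
WHERE name IS NULL
""")
stored = conn.execute("SELECT id, name FROM some_table ORDER BY id").fetchall()

print(filled)
print(stored)
```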