Using Limit on Distinct group by values psql - sql

Suppose I have a table that looks like this:
create table customers (id text, name text, number int, useless text);
With values
insert into customers (id, name, number, useless)
values
('1','apple',1, 'a'),
('2','banana',3, 'b'),
('3','pear',2, 's'),
('4','apple',1,'e'),
('5','banana',3,'s'),
('6','cherry',3, 'a'),
('7','cherry',4, 's'),
('8','apple',2, 'd'),
('9','banana',4, 'c'),
('10','pear',5, 'e');
My failed psql query is this.
select id, name, number, useless
from customers
where number < 4
group by customers.name limit 2
I want the query to return every row belonging to the first 2 distinct values of customers.name, not just the first 2 rows.
In the end I want it to return
('1','apple',1, 'a'),
('4','apple',1,'e'),
('8','apple',2, 'd'),
('2','banana',3, 'b'),
('5','banana',3,'s'),
so it returns the first 2 grouped names.
How can I make this query?
Thank you.
Edit:
This query is my second try; I think I am getting close.
select t.id, t.name, t.ranking
from (
SELECT id, name, dense_rank() OVER (order by name) as
ranking
FROM customers
group by name
) t
where t.ranking < 3

Try this:
select id, name, number, useless
from customers
where name in (
select name
from customers
where number < 4
group by customers.name
order by name limit 2
)
| id | name | number | useless |
|----|--------|--------|---------|
| 1 | apple | 1 | a |
| 2 | banana | 3 | b |
| 4 | apple | 1 | e |
| 5 | banana | 3 | s |
| 8 | apple | 2 | d |
| 9 | banana | 4 | c |
SQL Fiddle DEMO
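Note that the output above still contains banana row 9 (number 4), because number < 4 is applied only inside the subquery. Here is a minimal sketch, assuming the filter should also apply to the returned rows. It runs on SQLite through Python's sqlite3 module purely as a convenient way to execute it (the question targets PostgreSQL), and the final order by with a cast on id is an addition made here only to get a deterministic row order:

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    ("1", "apple", 1, "a"), ("2", "banana", 3, "b"), ("3", "pear", 2, "s"),
    ("4", "apple", 1, "e"), ("5", "banana", 3, "s"), ("6", "cherry", 3, "a"),
    ("7", "cherry", 4, "s"), ("8", "apple", 2, "d"), ("9", "banana", 4, "c"),
    ("10", "pear", 5, "e"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id text, name text, number int, useless text)")
conn.executemany("insert into customers values (?, ?, ?, ?)", rows)

query = """
select id, name, number, useless
from customers
where number < 4              -- assumption: filter the returned rows too
  and name in (
      select name
      from customers
      where number < 4
      group by name
      order by name
      limit 2                 -- the first two distinct names
  )
order by name, cast(id as integer)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the sample data this yields the three apple rows followed by banana rows 2 and 5, which is the result the question asks for.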

GROUP BY customers.name does not order your output; it only groups the rows by customers.name. What you want is to order the result, right? So I think what you want is:
select id, name, number, useless
from customers
order by name [asc | desc]
Pick asc (ascending) or desc (descending) depending on the order you want. The group by is dropped here because PostgreSQL rejects a select list containing non-aggregated columns (such as id) that are not in the group by clause.
Hope it helps you.

You can use dense_rank() as:
SELECT * FROM (
SELECT DENSE_RANK() OVER (order by name) AS rank, temp.*
FROM customers temp WHERE number < 4) data
WHERE data.rank <= 2
| rank| id| name | number | useless |
|-----|---|--------|--------|---------|
| 1 | 4 | apple | 1 | e |
| 1 | 1 | apple | 1 | a |
| 1 | 8 | apple | 2 | d |
| 2 | 5 | banana | 3 | s |
| 2 | 2 | banana | 3 | b |

Related

SQL group by a field and only return one joined row for each grouping

Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
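For engines without distinct on, a correlated subquery with an explicit tie-breaker gives the same "one earliest row per name" result. The sketch below is a swapped-in technique, not the answer's own query. It assumes SQLite via Python's sqlite3, stores required_by as ISO date text so it sorts correctly (the year 2023 is made up for illustration), and breaks ties on the lowest id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table data (id integer, required_by text, name text, another_field text)"
)
conn.executemany(
    "insert into data values (?, ?, ?, ?)",
    [
        # Hypothetical year; the question only gives day and month.
        (1, "2023-08-07", "cat", "X"),
        (2, "2023-08-07", "cat", "Y"),
        (3, "2023-08-10", "cat", "Z"),
        (4, "2023-08-11", "dog", "A"),
    ],
)

query = """
select d.id, d.name, d.required_by, d.another_field
from data d
where d.id = (
    select d2.id
    from data d2
    where d2.name = d.name
    order by d2.required_by, d2.id   -- earliest date, then lowest id on ties
    limit 1
)
order by d.id
"""
earliest = conn.execute(query).fetchall()
for row in earliest:
    print(row)
```

Because ties are broken on id, this always picks row 1 rather than row 2 for cat. Either choice satisfies the question.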
Online example
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)

How to delete the rows with three same data columns and one different data column

I have a table "MARK_TABLE" as below.
How can I delete the rows with same "STUDENT", "COURSE" and "SCORE" values?
| ID | STUDENT | COURSE | SCORE |
|----|---------|--------|-------|
| 1 | 1 | 1 | 60 |
| 3 | 1 | 2 | 81 |
| 4 | 1 | 3 | 81 |
| 9 | 2 | 1 | 80 |
| 10 | 1 | 1 | 60 |
| 11 | 2 | 1 | 80 |
Now I already filtered the data I want to KEEP, but without the "ID"...
SELECT student, course, score FROM mark_table
INTERSECT
SELECT student, course, score FROM mark_table
The output:
| STUDENT | COURSE | SCORE |
|---------|--------|-------|
| 1 | 1 | 60 |
| 1 | 2 | 81 |
| 1 | 3 | 81 |
| 2 | 1 | 80 |
Use the following query to delete the desired rows:
DELETE FROM MARK_TABLE M
WHERE
EXISTS (
SELECT
1
FROM
MARK_TABLE M_IN
WHERE
M.STUDENT = M_IN.STUDENT
AND M.COURSE = M_IN.COURSE
AND M.SCORE = M_IN.SCORE
AND M.ID < M_IN.ID
)
db<>fiddle demo
Cheers!!
Use distinct:
SELECT distinct student, course, score FROM mark_table
Assuming you don't just want to select the unique data you want to keep (you mention you've already done this), you can proceed as follows:
1. Create a temporary table to hold the data you want to keep.
2. Insert the data you want to keep into the temporary table.
3. Empty the source table.
4. Re-insert the data you want to keep into the source table.
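The steps above can be sketched as follows. This is only an illustration under stated assumptions: it runs on SQLite via Python's sqlite3, the temporary table name keep_rows is hypothetical, and it keeps the lowest ID of each duplicate group (the question does not say which ID should survive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mark_table (id integer, student integer, course integer, score integer)"
)
conn.executemany(
    "insert into mark_table values (?, ?, ?, ?)",
    [(1, 1, 1, 60), (3, 1, 2, 81), (4, 1, 3, 81),
     (9, 2, 1, 80), (10, 1, 1, 60), (11, 2, 1, 80)],
)

with conn:  # commit everything together, or roll back on error
    # Steps 1 and 2: create the temporary table and fill it with the rows to keep.
    conn.execute(
        """
        create temporary table keep_rows as
        select min(id) as id, student, course, score   -- assumption: keep lowest id
        from mark_table
        group by student, course, score
        """
    )
    # Step 3: empty the source table.
    conn.execute("delete from mark_table")
    # Step 4: re-insert the kept rows.
    conn.execute(
        "insert into mark_table (id, student, course, score) "
        "select id, student, course, score from keep_rows"
    )
    conn.execute("drop table keep_rows")

remaining = conn.execute(
    "select id, student, course, score from mark_table order by id"
).fetchall()
print(remaining)
```

On a large table with few duplicates, a targeted delete is usually cheaper than emptying and refilling the table, so treat this as a fit for small tables or heavily duplicated data.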
select * from
(
select row_number() over (partition by student,course,score order by score)
rn,student,course,score from mark_table
) t
where rn=1
Use a CTE with row_number():
create table #MARK_TABLE (ID int, STUDENT int, COURSE int, SCORE int)
insert into #MARK_TABLE
values
(1,1,1,60),
(3,1,2,81),
(4,1,3,81),
(9,2,1,80),
(10,1,1,60),
(11,2,1,80)
;with cteDeleteID as(
Select id, row_number() over (partition by student,course,score order by score) [row_number] from #MARK_TABLE
)
delete from #MARK_TABLE where id in
(
select id from cteDeleteID where [row_number] != 1
)
select * from #MARK_TABLE
drop table #MARK_TABLE

Postgresql : Mark the first row of a group

I have a table t like this :
id | group_id | name
------------------------
1 | 1 | richard
2 | 1 | ray
3 | 2 | enzo
4 | 2 | shiela
5 | 2 | anne
I have no problem selecting each group, but I want to mark the first occurrence within each group_id by adding a column that flags whether the row is the first of its group.
E.g., Richard for group 1, Enzo for group 2, and so on.
I should be able to use something like:
select
t.*,
case
when (condition)
...(boolean result here)
end as is_first_row
from t
and get this result:
id | group_id | name |is_first_row
-------------------------------
1 | 1 | richard | t
2 | 1 | ray | f
3 | 2 | enzo | t
4 | 2 | shiela | f
5 | 2 | anne | f
How do I formulate the condition statement for the select query?
Use row_number():
with my_table(id, group_id, name) as (
values
(1, 1, 'richard'),
(2, 1, 'ray'),
(3, 2, 'enzo'),
(4, 2, 'shiela'),
(5, 2, 'anne')
)
select *, row_number() over w = 1 as is_first_row
from my_table
window w as (partition by group_id order by id);
id | group_id | name | is_first_row
----+----------+---------+--------------
1 | 1 | richard | t
2 | 1 | ray | f
3 | 2 | enzo | t
4 | 2 | shiela | f
5 | 2 | anne | f
(5 rows)
Select row_number() to see how it works. Row numbers are calculated in partitions by group_id i.e. for every group_id separately, in order by id:
with my_table(id, group_id, name) as (
values
(1, 1, 'richard'),
(2, 1, 'ray'),
(3, 2, 'enzo'),
(4, 2, 'shiela'),
(5, 2, 'anne')
)
select *, row_number() over w
from my_table
window w as (partition by group_id order by id);
id | group_id | name | row_number
----+----------+---------+------------
1 | 1 | richard | 1
2 | 1 | ray | 2
3 | 2 | enzo | 1
4 | 2 | shiela | 2
5 | 2 | anne | 3
(5 rows)
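If window functions are not available, the same flag can be computed by comparing each id with the smallest id of its group. This is an alternative technique rather than the row_number() approach shown above. The sketch assumes SQLite via Python's sqlite3, where the comparison comes back as 1 or 0 instead of t or f:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (id integer, group_id integer, name text)")
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [(1, 1, "richard"), (2, 1, "ray"), (3, 2, "enzo"),
     (4, 2, "shiela"), (5, 2, "anne")],
)

query = """
select m.id, m.group_id, m.name,
       m.id = (
           select min(m2.id)          -- first row of the group = lowest id
           from my_table m2
           where m2.group_id = m.group_id
       ) as is_first_row
from my_table m
order by m.id
"""
flagged = conn.execute(query).fetchall()
for row in flagged:
    print(row)
```

This only works when "first" is defined by a single comparable column such as id. For orderings over several columns, the window-function version is the simpler choice.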
Please check my answer and let me know if there is any error in the logic:
Create Table #Temp(id int,group_id int,name nvarchar(max))
Insert into #Temp values
(1,1,'richard')
,(2,1,'ray')
,(3,2,'enzo')
,(4,2,'shiela')
,(5,2,'anne')
Select t2.id,t2.group_id,t2.name,t1.group_id_c, case
when t1.group_id_c=1 then 't'
else 'f'
end AS is_firstrow from #temp t2 join
(Select t.*, row_number() over (partition by group_id order by id) as group_id_c from #Temp t ) t1
on t1.id=t2.id

Select query where record count = 2 and column contains either two values

Example 1
+--------------------------+
| IDENT | CURRENT | SOURCE |
+--------------------------+
| 12345 | 12345 | A |
| 23456 | 12345 | B |
| 34567 | 12345 | C |
+--------------------------+
Example 2
+--------------------------+
| IDENT | CURRENT | SOURCE |
+--------------------------+
| 12345 | 55555 | A |
| 23456 | 55555 | B |
+--------------------------+
I'm trying to write a select query that shows all records where the CURRENT value occurs exactly twice and its SOURCE values are exactly A and B (not C).
Example 1 should not show up: there are 3 entries for that CURRENT, and one of them is linked to SOURCE C.
Example 2 is what I want the query to find: CURRENT has two records and is only linked to SOURCE 'A' and 'B'.
Currently, if I run something like "where SOURCE = A or SOURCE = B", the results also include CURRENT values that have only SOURCE A, or A plus C.
NOTES: IDENT is always a unique value. CURRENT links multiple IDENTs from different SOURCEs.
We're clearly missing more information. Let's take example data (thanks gloomy for the initial fiddle).
| ID | CURRENT | SOURCE |
|----|---------|--------|
| 1 | 111 | A |
| 2 | 111 | B |
| 3 | 111 | C |
| 4 | 222 | A |
| 5 | 222 | B |
| 6 | 333 | A |
| 7 | 333 | C |
| 8 | 444 | B |
| 9 | 444 | C |
| 10 | 555 | B |
| 11 | 666 | A |
| 12 | 666 | A |
| 13 | 666 | B |
| 14 | 777 | A |
| 15 | 777 | A |
I assume you only need this as the result:
| ID | CURRENT | SOURCE |
|----|---------|--------|
| 4 | 222 | A |
| 5 | 222 | B |
This query works with any number of sources and produces the expected output:
SELECT * FROM test
WHERE CURRENT IN (
SELECT CURRENT FROM test
WHERE CURRENT NOT IN (
SELECT CURRENT FROM test
WHERE SOURCE NOT IN ('A', 'B')
)
GROUP BY CURRENT
HAVING count(SOURCE) = 2 AND count(DISTINCT SOURCE) = 2
)
If SOURCE values are guaranteed to be unique per CURRENT:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(SOURCE) = 2
AND COUNT(CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
If SOURCE values aren't unique per CURRENT but CURRENTs with duplicate entries of 'A' or 'B' are allowed:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(DISTINCT SOURCE) = 2
AND COUNT(DISTINCT CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
If SOURCE values aren't unique and groups with duplicate SOURCE entries aren't allowed:
SELECT CURRENT
FROM atable
GROUP BY CURRENT
HAVING COUNT(SOURCE) = 2
AND COUNT(DISTINCT SOURCE) = 2
AND COUNT(DISTINCT CASE WHEN SOURCE IN ('A', 'B') THEN SOURCE END) = 2
;
Every query returns only distinct CURRENT values matching the requirements. Use the query as a derived dataset and join it back to your table to get the details.
All the above options assume that either SOURCE is a NOT NULL column or that NULLs can just be disregarded.
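To see how the three HAVING variants differ, here is a small sketch that runs them against the sample data from the earlier answer (CURRENT values 111 to 777). It assumes SQLite via Python's sqlite3. The column is renamed to current_id to sidestep keyword clashes, and the helper matching() is hypothetical:

```python
import sqlite3

rows = [
    (1, 111, "A"), (2, 111, "B"), (3, 111, "C"),
    (4, 222, "A"), (5, 222, "B"),
    (6, 333, "A"), (7, 333, "C"),
    (8, 444, "B"), (9, 444, "C"),
    (10, 555, "B"),
    (11, 666, "A"), (12, 666, "A"), (13, 666, "B"),
    (14, 777, "A"), (15, 777, "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table atable (id integer, current_id integer, source text)")
conn.executemany("insert into atable values (?, ?, ?)", rows)

def matching(having):
    """Return the current_id values whose group satisfies the HAVING clause."""
    sql = (
        "select current_id from atable group by current_id "
        "having " + having + " order by current_id"
    )
    return [r[0] for r in conn.execute(sql)]

ab_only = "case when source in ('A', 'B') then source end"

# Variant 1: only correct if SOURCE is unique per CURRENT (not true here).
assumes_unique = matching(f"count(source) = 2 and count({ab_only}) = 2")
# Variant 2: duplicate 'A'/'B' rows within a group are tolerated.
duplicates_allowed = matching(
    f"count(distinct source) = 2 and count(distinct {ab_only}) = 2"
)
# Variant 3: exactly two rows, one 'A' and one 'B'.
duplicates_rejected = matching(
    f"count(source) = 2 and count(distinct source) = 2 "
    f"and count(distinct {ab_only}) = 2"
)

print(assumes_unique, duplicates_allowed, duplicates_rejected)
```

On this data the first variant also returns 777 (two 'A' rows), which shows why its uniqueness assumption matters. The second additionally accepts 666, and only the third returns 222 alone.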
Records where current count = 2:
SELECT CURRENT
FROM table
GROUP BY CURRENT
HAVING COUNT(*) = 2
Records where C is in SOURCE values:
SELECT CURRENT
FROM table
WHERE SOURCE = 'C'
Global query:
SELECT t.*
FROM TABLE t
WHERE t.CURRENT IN (
SELECT CURRENT
FROM table
GROUP BY CURRENT
HAVING COUNT(*) = 2
) AND t.CURRENT NOT IN (
SELECT CURRENT
FROM table
WHERE SOURCE = 'C'
)
http://sqlfiddle.com/#!2/69be9/8/0
select * from test where current in (
select test_a.current
from
(select *
from test
where source = 'A') as test_a
join (select *
from test
where source = 'B') as test_b
on test_b.current = test_a.current
where test_a.current not in
(select current from test where source='C')
)
SELECT *
FROM TABLE mainTbl,
(SELECT CURRENT
FROM TABLE
WHERE source IN ('A', 'B')
GROUP BY CURRENT
HAVING COUNT(1) = 2
) selectedSet
WHERE mainTbl.current = selectedSet.current
AND mainTbl.source IN ('A', 'B');

SQL query to search by multiple tags with relevance sorting

I've got a set of cities that have a many-to-many relationship with a set of tags. The user gives me a collection of tags (which may contain duplicates!), and I need to return a list of matching entries, sorted by relevance.
The Data
Here's some sample data to illustrate the problem:
Cities:
--------------------
| id | city |
--------------------
| 1 | Atlanta |
| 2 | Baltimore |
| 3 | Cleveland |
| 4 | Denver |
| 5 | Eugene |
--------------------
Tags:
------
| id |
------
| 1 |
| 2 |
| 3 |
| 4 |
------
The cities are tagged like this:
Atlanta: 1, 2
Baltimore: 3
Cleveland: 1, 3, 4
Denver: 2, 3
Eugene: 1, 4
...so the CityTags table looks like:
------------------------
| city_id | tag_id |
------------------------
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 3 |
| 3 | 4 |
| 4 | 2 |
| 4 | 3 |
| 5 | 1 |
| 5 | 4 |
------------------------
Example 1
If the user gives me the tag ids: [1, 3, 3, 4], I want to count how many matches I have for each of the tags, and return a relevance-sorted result like:
------------------------
| city | matches |
------------------------
| Cleveland | 4 |
| Baltimore | 2 |
| Denver | 2 |
| Eugene | 2 |
| Atlanta | 1 |
------------------------
Since Cleveland matched all four tags, it's first, followed by Baltimore, Denver and Eugene, which each had two tags match, etc.
Example 2
One more example for good measure. For the search [2, 2, 2, 3, 4], we'd get:
------------------------
| city | matches |
------------------------
| Denver | 4 |
| Atlanta | 3 |
| Cleveland | 2 |
| Baltimore | 1 |
| Eugene | 1 |
------------------------
SQL
If I ignore the repeated tags, then it's trivial:
SELECT name,COUNT(name) AS relevance FROM
(SELECT name FROM cities,citytags
WHERE id=city_id AND tag_id IN (1,3,3,4)) AS matches
GROUP BY name ORDER BY relevance DESC;
But that's not what I need. I need to respect the duplicates. Can someone suggest how I might accomplish this?
Solution in Postgresql
Aha! A temporary table is what I needed. PostgreSQL lets me do this with its WITH syntax. Here's the solution:
WITH search(tag) AS (VALUES (1), (3), (3), (4))
SELECT name, COUNT(name) AS relevance FROM cities
INNER JOIN citytags ON cities.id=citytags.city_id
INNER JOIN search ON citytags.tag_id=search.tag
GROUP BY name ORDER BY relevance DESC;
Thank you very much to those that answered.
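As a quick check of the WITH approach, the sketch below runs it against the sample data with the Example 2 search [2, 2, 2, 3, 4]. It assumes SQLite via Python's sqlite3 rather than PostgreSQL, names the city column name as the queries above do, builds the VALUES list from bound parameters, and adds name as a secondary sort key so ties come out in a fixed order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table cities (id integer primary key, name text);
    create table citytags (city_id integer, tag_id integer);
    insert into cities values
        (1, 'Atlanta'), (2, 'Baltimore'), (3, 'Cleveland'),
        (4, 'Denver'), (5, 'Eugene');
    insert into citytags values
        (1, 1), (1, 2), (2, 3), (3, 1), (3, 3), (3, 4),
        (4, 2), (4, 3), (5, 1), (5, 4);
    """
)

tags = [2, 2, 2, 3, 4]  # user-supplied tags, duplicates kept on purpose

# One "(?)" row per tag, so the values are bound rather than pasted into SQL.
values_sql = ", ".join("(?)" for _ in tags)
query = f"""
with search(tag) as (values {values_sql})
select name, count(name) as relevance
from cities
inner join citytags on cities.id = citytags.city_id
inner join search on citytags.tag_id = search.tag
group by name
order by relevance desc, name
"""
ranked = conn.execute(query, tags).fetchall()
for row in ranked:
    print(row)
```

Each duplicate in the search list becomes its own row in search, so the join counts a matching tag once per occurrence, which is what makes the relevance respect duplicates.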
If the user list comes in as a comma-separated list, you could try turning it into a temp table and join on that instead. I don't know the relevant syntax for PostgreSQL, so here is the idea in MySQL:
create temporary table usertags (tag_id int);
insert usertags values (1),(3),(3),(4);
SELECT name, COUNT(name) AS relevance
FROM cities
JOIN citytags on cities.id = citytags.city_id
JOIN usertags on citytags.tag_id = usertags.tag_id
GROUP BY name ORDER BY relevance DESC;
Converting the comma-separated list to the above code would be as simple as doing a replace all of , to ),( using your server-side language, and then embedding that into a VALUES statement to populate the temp table.
Demo (MySql): http://www.sqlize.com/1qNThhD9tC
Stick all the tags into a table and then JOIN instead of including them in an IN list.
CREATE TABLE #input (
tag_id INT NOT NULL
)
;
INSERT INTO #input
SELECT 1
UNION ALL SELECT 3
UNION ALL SELECT 3
UNION ALL SELECT 4
;
SELECT
city.name,
search.relevance
FROM
city
INNER JOIN
(
SELECT
city_id,
COUNT(*) AS relevance
FROM
citytags
INNER JOIN
#input
ON #input.tag_id = citytags.tag_id
GROUP BY
city_id
)
AS search
ON search.city_id = city.id
ORDER BY
search.relevance DESC
;