LIMIT by distinct values in PostgreSQL

I have a table of contacts with phone numbers similar to this:
Name Phone
Alice 11
Alice 33
Bob 22
Bob 44
Charlie 12
Charlie 55
I can't figure out how to query such a table while LIMITing the rows not by plain row count but by the number of distinct names. For example, if I had a magic LIMIT_BY clause, it would work like this:
SELECT * FROM "Contacts" ORDER BY "Phone" LIMIT_BY("Name") 1
Alice 11
Alice 33
-- ^ only the first contact
SELECT * FROM "Contacts" ORDER BY "Phone" LIMIT_BY("Name") 2
Alice 11
Charlie 12
Alice 33
Charlie 55
-- ^ now with Charlie because his phone 12 goes right after 11. Bob isn't here because he's third, beyond the limit
How could I achieve this result?
In other words, select all rows containing top N distinct Names ordered by Phone

I don't think that PostgreSQL provides any particularly efficient way to do this, but for 6 rows it doesn't need to be very efficient. You could do a subquery to compute which people you want to see, then join that subquery back against the full table.
select * from
"Contacts" join
(select name from "Contacts" group by name order by min(phone) limit 2) as limited
using (name)
You could put the subquery in an IN-list rather than a JOIN, but that often performs worse.
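A minimal sketch of the join approach, run against the question's sample rows. It uses Python's built-in sqlite3 module instead of PostgreSQL purely to stay self-contained; the lowercase table and column names, the integer phone column, and the helper `limit_by_name` are assumptions made for illustration. The outer ORDER BY is added here so the output matches the ordering shown in the question.

```python
import sqlite3

def limit_by_name(conn, n):
    """Return all rows for the first n distinct names, ranked by each name's lowest phone."""
    return conn.execute(
        """
        SELECT c.name, c.phone
        FROM contacts AS c
        JOIN (SELECT name
              FROM contacts
              GROUP BY name
              ORDER BY MIN(phone)
              LIMIT ?) AS limited USING (name)
        ORDER BY c.phone
        """,
        (n,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone INTEGER)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Alice", 11), ("Alice", 33), ("Bob", 22), ("Bob", 44), ("Charlie", 12), ("Charlie", 55)],
)

print(limit_by_name(conn, 1))  # [('Alice', 11), ('Alice', 33)]
print(limit_by_name(conn, 2))  # [('Alice', 11), ('Charlie', 12), ('Alice', 33), ('Charlie', 55)]
```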

If you want all names that appear in the first n rows, you can use IN:
select t.*
from t
where t.name in (select t2.name
from t t2
order by t2.phone
limit 2
);
If you want the first n names by phone:
select t.*
from t
where t.name in (select t2.name
from t t2
group by t2.name
order by min(t2.phone)
limit 2
);
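The two variants only differ when one name occupies several of the first rows. The sketch below makes that visible with a small data set chosen for the purpose (Alice holds the two lowest phones); it runs on SQLite through Python's sqlite3 module, and the data and variable names are assumptions for illustration rather than part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, phone INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Alice", 11), ("Alice", 12), ("Bob", 22), ("Charlie", 13)],
)

# Variant 1: names that appear among the first 2 rows by phone.
names_in_first_rows = conn.execute(
    """
    SELECT DISTINCT name FROM t
    WHERE name IN (SELECT name FROM t ORDER BY phone LIMIT 2)
    ORDER BY name
    """
).fetchall()

# Variant 2: the first 2 distinct names, ranked by each name's lowest phone.
first_names = conn.execute(
    """
    SELECT DISTINCT name FROM t
    WHERE name IN (SELECT name FROM t GROUP BY name ORDER BY MIN(phone) LIMIT 2)
    ORDER BY name
    """
).fetchall()

print(names_in_first_rows)  # [('Alice',)]
print(first_names)          # [('Alice',), ('Charlie',)]
```

Only the second variant matches the "top N distinct names" behaviour the question asks for.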

Try this:
SELECT distinct X.name
,X.phone
FROM (
SELECT *
FROM (
SELECT name
,rn
FROM (
SELECT name
,phone
,row_number() OVER (
ORDER BY phone
) rn
FROM "Contacts"
) AA
) DD
WHERE rn <= 2 --rn is the "limit" variable
) EE
,"Contacts" X
WHERE EE.name = X.name
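This seems to work correctly on the following dataset (note that rn counts rows rather than names, so the query returns the names found in the first 2 rows):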
create table "Contacts" (name text, phone text);
insert into "Contacts" (name, phone) VALUES
('Alice', '11'),
('Alice', '33'),
('Bob', '22'),
('Bob', '44'),
('Charlie', '13'),
('Charlie', '55'),
('Dennis', '12'),
('Dennis', '66');

Related

SQL for Exclude

I have a table which is a simple list of ID numbers and NAMES. I am trying to write a SQL query that only returns rows where the NAME does not have particular IDs.
This has been stumping me: the query below returns every name, because each of them also has other IDs outside the exclude list (which covers a large range of IDs). How can I structure a query so that only those who don't have ID 2 or 3 are returned, i.e. only 'bob' for the table below?
select * from TABLE where ID not in (2, 3)
ID NAMES
1 bob
1 alice
2 alice
1 dave
2 dave
3 dave
4 dave
Thank you.

One method is group by and having:
select name
from t
group by name
having sum(case when ID in (2, 3) then 1 else 0 end) = 0;
If you want the original ids, you can add listagg(id, ',') within group (order by id) to the select. Or use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.name = t.name and
t2.id in (2, 3)
);
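Both methods can be checked against the question's sample rows. This is a small sketch on SQLite via Python's sqlite3 module; the table name `t` and column name `name` follow the answer rather than the question's `TABLE`/`NAMES`, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "bob"), (1, "alice"), (2, "alice"), (1, "dave"), (2, "dave"), (3, "dave"), (4, "dave")],
)

# GROUP BY / HAVING: keep names with zero rows whose id is in the excluded set.
by_having = conn.execute(
    """
    SELECT name FROM t
    GROUP BY name
    HAVING SUM(CASE WHEN id IN (2, 3) THEN 1 ELSE 0 END) = 0
    """
).fetchall()

# NOT EXISTS: keep rows whose name has no row at all with an excluded id.
by_not_exists = conn.execute(
    """
    SELECT t.id, t.name FROM t
    WHERE NOT EXISTS (SELECT 1 FROM t AS t2
                      WHERE t2.name = t.name AND t2.id IN (2, 3))
    """
).fetchall()

print(by_having)      # [('bob',)]
print(by_not_exists)  # [(1, 'bob')]
```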

Specific format Join results on SQL Server

I have trawled the internet looking for a solution but nothing so far.
Here are 2 sample tables joined on SID/ID
SID Name Attendance Class
1 abc good 1A
2 xyz bad 1B
3 dsk good 1A
4 uij bad 1B
5 sss bad 1A
6 fff good 1D
7 ccc good 1A
ID Lesson Result
1 Read Pass-67%
1 Write Pass-89%
1 Sing Pass-99%
2 Read Pass-75%
3 Sing Fail-47%
3 Read Pass-55%
4 Write Pass-90%
4 Sing Fail-10%
The results need to be in the following format.
A row showing the student name, followed by rows of the students' results.
If a student does not have any results they will not be included.
1, abc, good, 1A
1, Read, Pass-67%
1, Write, Pass-89%
1, Sing, Pass-99%
2, xyz, bad, 1B
2, Read, Pass-75%
3, dsk, good, 1A
3, Sing, Fail-47%
3, Read, Pass-55%
4, uij, bad, 1B
4, Write, Pass-90%
4, Sing, Fail-10%
I attempted to use UNION to no avail. It is similar to a pivot, but I have not had any luck with that either. I assume I'm missing a trick here; how can I get this done?
I have included the data if it makes it any easier!
CREATE TABLE RESULTS (ID Int, Lesson varchar(12), Result nvarchar(8))
insert into RESULTS (ID, Lesson, Result)
values
(1,'Read', 'Pass-67%'),
(1,'Write', 'Pass-89%'),
(1,'Sing', 'Pass-99%'),
(2,'Read', 'Pass-75%'),
(3,'Sing', 'Fail-47%'),
(3,'Read','Pass-55%'),
(4,'Write', 'Pass-90%'),
(4,'Sing', 'Fail-10%')
CREATE TABLE STUDENTS (ID int, Name varchar(5), Attendance nvarchar(10),
Class nvarchar (3))
insert into STUDENTS values
(1,'abc','good','1A'),
(2,'xyz','bad','1B'),
(3,'dsk','good','1A'),
(4,'uij','bad','1B'),
(5,'sss','bad','1A'),
(6,'fff','good','1D'),
(7,'ccc','good','1A')

You can use a UNION with a few workarounds.
;WITH Data AS
(
SELECT
S.ID,
S.Name,
S.Attendance,
S.Class,
IsStudent = 1
FROM
Students AS S
WHERE
EXISTS (SELECT 'at least one result' FROM Results AS R WHERE R.ID = S.ID)
UNION ALL
SELECT
ID = R.ID,
Name = R.Lesson,
Attendance = R.Result,
Class = NULL,
IsStudent = 0
FROM
Results AS R
)
SELECT
D.ID,
D.Name,
D.Attendance,
D.Class
FROM
Data AS D
ORDER BY
ID,
IsStudent DESC
But, as the final column names show, this mixes different kinds of data in the same columns, which is not a good thing to do.

Use UNION ALL:
select t.*
from (select ID, Name, Attendance, class
from STUDENTS s
where exists (select 1 from RESULTS where id = s.id)
union all
select ID, Lesson, Result, null
from RESULTS r
) t
order by id, (case when class is not null then 0 else 1 end);
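The UNION ALL idea with an explicit sort key can be tried on a cut-down copy of the sample data. This is only a sketch on SQLite via Python's sqlite3 module, not SQL Server; the `is_student` flag mirrors the first answer, while the trimmed data set and the extra tie-breaker on the lesson name are assumptions added so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER, name TEXT, attendance TEXT, class TEXT);
    CREATE TABLE results (id INTEGER, lesson TEXT, result TEXT);
    INSERT INTO students VALUES (1,'abc','good','1A'), (2,'xyz','bad','1B'), (5,'sss','bad','1A');
    INSERT INTO results VALUES (1,'Read','Pass-67%'), (1,'Write','Pass-89%'), (2,'Read','Pass-75%');
    """
)

rows = conn.execute(
    """
    SELECT id, name, attendance, class
    FROM (SELECT id, name, attendance, class, 1 AS is_student
          FROM students AS s
          WHERE EXISTS (SELECT 1 FROM results AS r WHERE r.id = s.id)
          UNION ALL
          SELECT id, lesson, result, NULL, 0
          FROM results)
    -- is_student DESC puts each student's header row before that student's results;
    -- the lesson name is an arbitrary tie-breaker so the output is deterministic.
    ORDER BY id, is_student DESC, name
    """
).fetchall()

for row in rows:
    print(row)
```

Student 5 has no results and is therefore filtered out by the EXISTS condition.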

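Simply concatenate those columns and UNION them, keeping the ID and a flag so the rows can be sorted:
SELECT S.ID, 1 AS IsStudent,
CONVERT(VARCHAR(10),S.ID)+' , '+Name+' , '+Attendance+' , '+Class AS ResultSet
INTO #T FROM dbo.STUDENTS S
-- skip students without any results, as the question requires
WHERE EXISTS (SELECT 1 FROM dbo.RESULTS R WHERE R.ID = S.ID)
UNION ALL
SELECT ID, 0, CONVERT(VARCHAR(10),ID)+' , '+Lesson+' , '+ Result
FROM dbo.RESULTS
-- student row first, then that student's results
SELECT ResultSet FROM #T ORDER BY ID, IsStudent DESC
DROP TABLE #T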

Comparing two rows in Oracle DB and Showing Column Mismatch

Table Columns: Id, Name, Age
First Row:
select 11, 'James', 22 from dual;
This will return
11 James 22
Second Row:
select * from supplier where id=11;
This will return
11 Vinod 25
Now I want to compare both rows:
11 James 22
11 Vinod 25
It should return the columns which have differences.
Name Mismatch
Age Mismatch
I am using 12c. Is there a built-in feature in Oracle which will solve this,
or any other way to achieve the same result?
Thanks in advance.

You can use a join with DECODE (or, alternatively, CASE) to check whether each column value matches:
with cte(id, name, age) as (select 11, 'James', 22 from dual)
select
s.id,
decode(s.name, t.name, null, 'Name mismatch') name_check,
decode(s.age, t.age, null, 'Age mismatch') age_check
from supplier s
inner join cte t
on s.id = t.id
where s.id = 11;
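For readers without an Oracle instance, the same comparison can be sketched with CASE instead of DECODE. The example below runs on SQLite via Python's sqlite3 module, which is an assumption of this sketch rather than the asker's setup; it uses SQLite's `IS NOT` operator so that two NULLs count as equal, which is also how DECODE compares values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (id INTEGER, name TEXT, age INTEGER)")
conn.execute("INSERT INTO supplier VALUES (11, 'Vinod', 25)")

row = conn.execute(
    """
    WITH cte(id, name, age) AS (SELECT 11, 'James', 22)
    SELECT s.id,
           -- each CASE yields NULL when the two values match
           CASE WHEN s.name IS NOT t.name THEN 'Name mismatch' END AS name_check,
           CASE WHEN s.age  IS NOT t.age  THEN 'Age mismatch'  END AS age_check
    FROM supplier AS s
    JOIN cte AS t ON s.id = t.id
    WHERE s.id = 11
    """
).fetchone()

print(row)  # (11, 'Name mismatch', 'Age mismatch')
```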

Get distinct individual column values (not distinct pairs) from two tables in single query

I have two tables like the following. One is for sport talents of some people and second for arts talents. One may not have a sport talent to list and same applies for art talent.
CREATE TABLE SPORT_TALENT(name varchar(10), TALENT varchar(10));
CREATE TABLE ART_TALENT(name varchar(10), TALENT varchar(10));
INSERT INTO SPORT_TALENT(name, TALENT) VALUES
('Steve', 'Footbal')
,('Steve', 'Golf')
,('Bob' , 'Golf')
,('Mary' , 'Tennnis');
INSERT INTO ART_TALENT(name, TALENT) VALUES
('Steve', 'Dancer')
, ('Steve', 'Singer')
, ('Bob' , 'Dancer')
, ('Bob' , 'Singer')
, ('John' , 'Dancer');
Now I want to list down sport talent and art talent of one person. I would like to avoid duplication. But I don't mind if there is a "null" in any output. I tried the following
select distinct sport_talent.talent as s_talent,art_talent.talent as a_talent
from sport_talent
JOIN art_talent on sport_talent.name=art_talent.name
where (sport_talent.name='Steve' or art_talent.name='Steve');
s_talent | a_talent
----------+----------
Footbal | Dancer
Golf | Singer
Footbal | Singer
Golf | Dancer
I would like to avoid redundancy and need something like the following (distinct values of sport talents + distinct values of art talents).
s_talent | a_talent
----------+----------
Footbal | Dancer
Golf | Singer
As mentioned in subject, I am not looking for distinct combinations. But at the same time, it's OK if there are some records with "null" value in one column. I am relatively new to SQL.

Try:
SELECT s_talent, a_talent
FROM (
SELECT distinct on (talent) talent as s_talent,
dense_rank() over (order by talent) as x
FROM SPORT_TALENT
WHERE name='Steve'
) x
FULL OUTER JOIN (
SELECT distinct on (talent) talent as a_talent,
dense_rank() over (order by talent) as x
FROM ART_TALENT
WHERE name='Steve'
) y
ON x.x = y.x
Demo: http://sqlfiddle.com/#!15/66e04/3

There are no duplicates in your query result. Each of the four records your query returns is unique. The result may not be what you want, but the problem is not duplication.

Postgres 9.4
... introduces unnest() with multiple arguments. Does exactly what you want, and should be fast, too. Per documentation:
The special table function UNNEST may be called with any number of
array parameters, and it returns a corresponding number of columns, as
if UNNEST (Section 9.18) had been called on each parameter separately
and combined using the ROWS FROM construct.
About ROWS FROM:
Compare result of two table functions using one column from each
SELECT *
FROM unnest(
ARRAY(SELECT DISTINCT talent FROM sport_talent WHERE name = 'Steve')
, ARRAY(SELECT DISTINCT talent FROM art_talent WHERE name = 'Steve')
) AS t(s_talent, a_talent);
Postgres 9.3 or older
SELECT s_talent, a_talent
FROM (
SELECT talent AS s_talent, row_number() OVER () AS rn
FROM sport_talent
WHERE name = 'Steve'
GROUP BY 1
) s
FULL JOIN (
SELECT talent AS a_talent, row_number() OVER () AS rn
FROM art_talent
WHERE name = 'Steve'
GROUP BY 1
) a USING (rn);
Similar previous answers with more explanation:
What type of JOIN to use
Sort columns independently, such that all nulls are last per column
This is similar to what @kordirko posted, but uses GROUP BY to get distinct talents, which is evaluated before window functions. So we only need a bare row_number() and not the more expensive dense_rank().
About the sequence of events in a SELECT query:
Best way to get result count before LIMIT was applied
SQL Fiddle.
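The positional pairing that the answers build in SQL (multi-argument unnest(), or a FULL JOIN on a row number) behaves much like Python's `itertools.zip_longest`: the two distinct lists are lined up by position and the shorter one is padded with NULLs. The sketch below illustrates that idea only; it fetches the distinct values from SQLite through sqlite3 and pairs them in Python, and the helpers `distinct_talents` and `paired_talents` are hypothetical names, not part of any answer.

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sport_talent (name TEXT, talent TEXT);
    CREATE TABLE art_talent (name TEXT, talent TEXT);
    INSERT INTO sport_talent VALUES
        ('Steve','Footbal'), ('Steve','Golf'), ('Bob','Golf'), ('Mary','Tennnis');
    INSERT INTO art_talent VALUES
        ('Steve','Dancer'), ('Steve','Singer'), ('Bob','Dancer'), ('Bob','Singer'), ('John','Dancer');
    """
)

def distinct_talents(table, person):
    # The table name is interpolated into the SQL, so only the two known tables are allowed.
    if table not in ("sport_talent", "art_talent"):
        raise ValueError(f"unexpected table: {table}")
    query = f"SELECT DISTINCT talent FROM {table} WHERE name = ? ORDER BY talent"
    return [talent for (talent,) in conn.execute(query, (person,))]

def paired_talents(person):
    # Pair the two lists by position; the shorter one is padded with None (NULL).
    return list(zip_longest(distinct_talents("sport_talent", person),
                            distinct_talents("art_talent", person)))

print(paired_talents("Steve"))  # [('Footbal', 'Dancer'), ('Golf', 'Singer')]
print(paired_talents("Bob"))    # [('Golf', 'Dancer'), (None, 'Singer')]
```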

Simple Query to Grab Max Value for each ID

OK I have a table like this:
ID Signal Station OwnerID
111 -120 Home 1
111 -130 Car 1
111 -135 Work 2
222 -98 Home 2
222 -95 Work 1
222 -103 Work 2
This is all for the same day. I just need the Query to return the max signal for each ID:
ID Signal Station OwnerID
111 -120 Home 1
222 -95 Work 1
I tried using MAX() and the aggregation messes up with the Station and OwnerID being different for each record. Do I need to do a JOIN?

Something like this? Join your table with itself, and exclude the rows for which a higher signal was found.
select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
select *
from yourtable high
where high.id = cur.id
and high.signal > cur.signal
)
This would list one row for each highest signal, so there might be multiple rows per id.

You are doing a group-wise maximum/minimum operation. This is a common trap: it feels like something that should be easy to do, but in SQL it aggravatingly isn't.
There are a number of approaches (both standard ANSI and vendor-specific) to this problem, most of which are sub-optimal in many situations. Some will give you multiple rows when more than one row shares the same maximum/minimum value; some won't. Some work well on tables with a small number of groups; others are more efficient for a larger number of groups with fewer rows per group.
Here's a discussion of some of the common ones (MySQL-biased but generally applicable). Personally, if I know there are no multiple maxima (or don't care about getting them) I often tend towards the null-left-self-join method, which I'll post as no-one else has yet:
SELECT reading.ID, reading.Signal, reading.Station, reading.OwnerID
FROM readings AS reading
LEFT JOIN readings AS highersignal
ON highersignal.ID=reading.ID AND highersignal.Signal>reading.Signal
WHERE highersignal.ID IS NULL;

In classic SQL-92 (that is, without the OLAP operations used by Quassnoi), you can use:
SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM t
GROUP BY id) AS g
JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal;
(Unchecked syntax; assumes your table is 't'.)
The sub-query in the FROM clause identifies the maximum signal value for each id; the join combines that with the corresponding data row from the main table.
NB: if there are several entries for a specific ID that all have the same signal strength and that strength is the MAX(), then you will get several output rows for that ID.
Tested against IBM Informix Dynamic Server 11.50.FC3 running on Solaris 10:
+ CREATE TEMP TABLE signal_info
(
id INTEGER NOT NULL,
signal INTEGER NOT NULL,
station CHAR(5) NOT NULL,
ownerid INTEGER NOT NULL
);
+ INSERT INTO signal_info VALUES(111, -120, 'Home', 1);
+ INSERT INTO signal_info VALUES(111, -130, 'Car' , 1);
+ INSERT INTO signal_info VALUES(111, -135, 'Work', 2);
+ INSERT INTO signal_info VALUES(222, -98 , 'Home', 2);
+ INSERT INTO signal_info VALUES(222, -95 , 'Work', 1);
+ INSERT INTO signal_info VALUES(222, -103, 'Work', 2);
+ SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM signal_info
GROUP BY id) AS g
JOIN signal_info AS t ON g.id = t.id AND g.MaxSignal = t.Signal;
111 -120 Home 1
222 -95 Work 1
I named the table Signal_Info for this test - but it seems to produce the right answer.
This only shows that there is at least one DBMS that supports the notation. However, I am a little surprised that MS SQL Server does not - which version are you using?
It never ceases to surprise me how often SQL questions are submitted without table names.
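The NB about ties can be seen with a small variation of the sample data. In this sketch (SQLite via Python's sqlite3 module) the 'Car' reading for ID 111 is changed to -120 so that it ties with 'Home'; that change, the shortened data set, and the final ORDER BY are assumptions made to get a deterministic demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, signal INTEGER, station TEXT, ownerid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(111, -120, "Home", 1), (111, -120, "Car", 1), (111, -135, "Work", 2), (222, -95, "Work", 1)],
)

rows = conn.execute(
    """
    SELECT g.id, g.max_signal, t.station, t.ownerid
    FROM (SELECT id, MAX(signal) AS max_signal FROM t GROUP BY id) AS g
    JOIN t ON g.id = t.id AND g.max_signal = t.signal
    ORDER BY g.id, t.station
    """
).fetchall()

print(rows)
# [(111, -120, 'Car', 1), (111, -120, 'Home', 1), (222, -95, 'Work', 1)]
```

ID 111 comes back twice because both of its rows carry the maximum signal.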

WITH q AS
(
SELECT c.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) rn
FROM mytable c
)
SELECT *
FROM q
WHERE rn = 1
This will return one row even if there are duplicates of MAX(signal) for a given ID.
Having an index on (id, signal) will greatly improve this query.
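A quick way to confirm the one-row-per-ID behaviour is to run the ROW_NUMBER() query on data containing a tie. The sketch below uses SQLite through Python's sqlite3 module and assumes SQLite 3.25 or newer, since older versions lack window functions; the tied 'Car' reading and the extra `station` tie-breaker in the ORDER BY are assumptions added so that the chosen row is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, signal INTEGER, station TEXT, ownerid INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(111, -120, "Home", 1), (111, -120, "Car", 1), (111, -135, "Work", 2), (222, -95, "Work", 1)],
)

rows = conn.execute(
    """
    WITH q AS (
        SELECT m.*,
               -- number the rows of each id from strongest to weakest signal
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC, station) AS rn
        FROM mytable AS m
    )
    SELECT id, signal, station, ownerid FROM q WHERE rn = 1 ORDER BY id
    """
).fetchall()

print(rows)  # [(111, -120, 'Car', 1), (222, -95, 'Work', 1)]
```

Without a tie-breaker in the window's ORDER BY, which of the tied rows gets rn = 1 is up to the database.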

with tab(id, sig, sta, oid) as
(
select 111 as id, -120 as signal, 'Home' as station, 1 as ownerId union all
select 111, -130, 'Car', 1 union all
select 111, -135, 'Work', 2 union all
select 222, -98, 'Home', 2 union all
select 222, -95, 'Work', 1 union all
select 222, -103, 'Work', 2
) ,
tabG(id, maxS) as
(
select id, max(sig) as sig from tab group by id
)
select g.*, p.* from tabG g
cross apply ( select top(1) * from tab t where t.id=g.id order by t.sig desc ) p

We can do this using a self join:
SELECT T1.ID,T1.Signal,T2.Station,T2.OwnerID
FROM (select ID,max(Signal) as Signal from mytable group by ID) T1
LEFT JOIN mytable T2
ON T1.ID=T2.ID and T1.Signal=T2.Signal;
Or you can also use the following query
SELECT t0.ID,t0.Signal,t0.Station,t0.OwnerID
FROM mytable t0
LEFT JOIN mytable t1 ON t0.ID=t1.ID AND t1.Signal>t0.Signal
WHERE t1.ID IS NULL;

select a.id, b.signal, a.station, a.ownerid from
mytable a
join
(SELECT ID, MAX(Signal) as Signal FROM mytable GROUP BY ID) b
on a.id = b.id AND a.Signal = b.Signal

SELECT * FROM StatusTable AS S
-- correlate on ID so a row only matches the maximum of its own ID
WHERE S.Signal = (
SELECT MAX(A.Signal)
FROM StatusTable AS A
WHERE A.ID = S.ID
);

select
id,
signal,
station,
ownerId
FROM (
select t.*, rank() over(partition by id order by signal desc) as signal_rank from mytable t
) ranked
where signal_rank = 1;
