How to join table with maximum value in column?

This seems like a fairly simple SQL statement, but I'm just not seeing it. I have:
a table called Groups. columns are Id, Name, Address, and a bunch of other columns we don't care about.
a table called People. columns are Id, GroupId, Name, Priority, and a bunch of other columns we don't care about.
I want a list of all Groups. Each row in this list needs to have the Id, Name, and Address of the Group. It also needs to have the Name of the person with the highest Priority. Priority can be NULL as well; a person with a NULL Priority should be selected only if no other person in that group has a non-NULL Priority. If there are any ties for Priority, I don't care which one gets selected as long as it's one of them.
Sample data:
Group: (Id, Name, Address)
1, Group A, Address A
2, Group B, Address B
3, Group C, Address C
4, Group D, Address D
People: (Id, GroupId, Name, Priority)
1, 1, Alice, 39
2, 1, Bob, 22
3, 1, Craig, 88
4, 2, David, NULL
5, 2, Elise, 3
6, 3, Frank, NULL
Results should be:
1, Group A, Address A, Craig
2, Group B, Address B, Elise
3, Group C, Address C, Frank
4, Group D, Address D, NULL
I can use SELECT * FROM People ORDER BY Priority DESC to get the rows in the right order, but I'd need to only get the first row for each GroupId from that.

The other answer will also work, but this is useful if you also want to, say, know both the person and the priority of that person (In my experience this kind of thing is common):
SELECT Id, Name, Address, Person, Priority
FROM
(
SELECT g.Id, g.Name, g.Address, p.Name Person, p.Priority,
row_number() OVER (PARTITION BY g.ID ORDER BY p.Priority DESC) rn
FROM [Group] g
LEFT JOIN People p ON p.GroupID = g.ID
) t
WHERE rn = 1

Use a sub-query to obtain the relevant Person per group.
SELECT *
, (
SELECT TOP 1 p.Name
FROM People p
WHERE p.GroupId = g.Id
ORDER BY p.Priority DESC
) AS Person
FROM [Group] g;
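For anyone who wants to check the row_number() approach against the sample data from the question, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module rather than SQL Server, so it assumes SQLite 3.25 or later for window functions, and it names the table [Groups] as described in the question. It uses a LEFT JOIN so that Group D, which has no people, still appears with a NULL person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Groups] (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE People (Id INTEGER PRIMARY KEY, GroupId INTEGER, Name TEXT, Priority INTEGER);
INSERT INTO [Groups] VALUES (1,'Group A','Address A'),(2,'Group B','Address B'),
                            (3,'Group C','Address C'),(4,'Group D','Address D');
INSERT INTO People VALUES (1,1,'Alice',39),(2,1,'Bob',22),(3,1,'Craig',88),
                          (4,2,'David',NULL),(5,2,'Elise',3),(6,3,'Frank',NULL);
""")

# Rank people inside each group by Priority, highest first.
# In SQLite (as in SQL Server) NULL sorts last under ORDER BY ... DESC,
# so a NULL priority only wins when nobody else is in the group.
top_person_per_group = conn.execute("""
SELECT Id, Name, Address, Person
FROM (
    SELECT g.Id, g.Name, g.Address, p.Name AS Person,
           row_number() OVER (PARTITION BY g.Id ORDER BY p.Priority DESC) AS rn
    FROM [Groups] g
    LEFT JOIN People p ON p.GroupId = g.Id   -- LEFT JOIN keeps groups without people
) t
WHERE rn = 1
ORDER BY Id
""").fetchall()

for row in top_person_per_group:
    print(row)
```

NULL ordering differs between databases (Oracle and PostgreSQL put NULLs first under DESC by default), so on another engine the ORDER BY may need an explicit NULLS LAST or equivalent.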

Related

Group items by 2 columns

Let's say we have a Sqlite table containing:
name;city;age;id
Alice;New-York;25;13782749
Eve;Chicago;23;1938679
Bob;New-York;25;824697624
How to group by h=CONCAT(city,age):
h;name;id
group1;Alice;13782749
group1;Bob;824697624
group2;Eve;1938679
Instead of group1, group2, it's ok to have 1, 2, or even a hash f68ac46, c3155a0 for each group.
The closest I could get is:
select (city||age) as h, * from mytable order by h;
but I'd like a group number or a hash instead, and not display city||age (which in my real case can be long).
You could enumerate the groups using dense_rank():
select dense_rank() over (order by city, age) as grpnum, name, id
from mytable;
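A quick way to try the dense_rank() idea on the sample rows is sketched below. It assumes SQLite 3.25 or later (for window functions) driven from Python, and the table name mytable is taken from the question. The group numbers follow the sort order of (city, age), so Chicago comes out as group 1 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (name TEXT, city TEXT, age INTEGER, id INTEGER);
INSERT INTO mytable VALUES ('Alice','New-York',25,13782749),
                           ('Eve','Chicago',23,1938679),
                           ('Bob','New-York',25,824697624);
""")

# Rows sharing the same (city, age) get the same group number,
# and the numbers are consecutive with no gaps.
rows = conn.execute("""
SELECT dense_rank() OVER (ORDER BY city, age) AS grpnum, name, id
FROM mytable
ORDER BY grpnum, name
""").fetchall()

for row in rows:
    print(row)
```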

Filter on specific columns and return all columns

I am trying to left join two tables and retrieve all columns from table one but remove duplicates based on a set of columns.
SELECT A.*, B.impact
FROM #Site_one AS A WITH (NOLOCK)
LEFT JOIN #Progress AS B With (NOLOCK)
ON lower(A.site_code) = lower(B.site_code)
GROUP BY A.date, A.operationid, A.worklocation, A.siteid, A.alias
This does not work, as there will be columns in A which either need to be aggregated or be added to the group by clause. The issue with that is that I do not want to filter on those columns and do not want them aggregated.
Is there a way to select all columns in A and the impact column in B and still be able to filter out duplicates on the columns specified in the group by clause?
Any pointers/help would be greatly appreciated.
"...and still be able to filter out duplicates on the columns specified in the group by clause"
But, how does the database really know which rows to throw away? Suppose you have:
Person
John, 42, Stockbroker
John, 36, Train driver
John, 58, Retired
John, 58, Metalworker
And you think "I wanna dedupe those based on the name":
SELECT * FROM person GROUP BY name
So which three Johns should the DB throw away?
It cannot decide this for you; you have to write the query to make it clear what you want to keep or throw away.
You could MAX everything:
SELECT name, MAX(age), MAX(job) FROM person GROUP BY name
That'll work.. but it gives you a John that never existed in the original data:
John, 58, Train driver
You could say "I'll only keep the person with the max age":
SELECT p.*
FROM
person p
INNER JOIN (SELECT name, max(age) as maxage FROM person GROUP BY name) maxp
ON p.name = maxp.name AND p.age = maxp.maxage
.. but there are two people with the same max age.
Your DB might have a row number analytic, which is nice:
SELECT *, row_number() over(PARTITION BY name ORDER BY age DESC) rn
FROM person
One of your 58 year old Johns will get row number 1 - can't be sure which one, but you could then discard all the rows with an rn > 1:
WITH x as (
SELECT *, row_number() over(PARTITION BY name ORDER BY age DESC) rn
FROM person
)
SELECT name, age, job
INTO newtable
FROM x
WHERE rn = 1
..but what if you discarded the wrong John...
You're going to have to go and think about this some more, and exactly specify what to throw away...
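As one illustration of "exactly specifying" the rule, the sketch below adds a tie-breaker to the ORDER BY so that the surviving row is fully determined. The rule itself (oldest first, then alphabetically first job) is purely an assumption for the example; the real rule has to come from your data. It runs on SQLite 3.25 or later through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (name TEXT, age INTEGER, job TEXT);
INSERT INTO person VALUES ('John',42,'Stockbroker'),('John',36,'Train driver'),
                          ('John',58,'Retired'),('John',58,'Metalworker');
""")

# The extra "job ASC" key breaks the tie between the two 58-year-olds,
# so row number 1 is always the same row.
kept = conn.execute("""
WITH x AS (
    SELECT *, row_number() OVER (PARTITION BY name ORDER BY age DESC, job ASC) AS rn
    FROM person
)
SELECT name, age, job
FROM x
WHERE rn = 1
""").fetchall()

print(kept)
```

Without the second sort key, either of the two 58-year-old rows could come back, and the choice could change between runs.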

Find the customer name of any customers who have an interest in ALL artists

I am looking for customer name of any customers who have an interest in all artists. I am not sure how to get it using sql. Needed help here.
I have these tables with columns.
Artist (ARTISTID,LASTNAME, FIRSTNAME)
Customer (CUSTOMERID, LASTNAME, FIRSTNAME)
CUSTOMER_ARTIST_INT (ARTISTID, CUSTOMERID)
You can get the customer ids by doing:
select customerid
from customer_artist_int cai
group by customerid
having count(*) = (select count(*) from artist);
You can then use a join or in to get the rest of the customer information.
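A minimal sketch of that last step, using IN to pull the customer details, might look like the following. The rows are made up for illustration, and it runs on SQLite through Python rather than on the asker's database. It counts DISTINCT artist ids so that a duplicated interest row would not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artistid INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE customer (customerid INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE customer_artist_int (artistid INTEGER, customerid INTEGER);
-- Hypothetical sample rows
INSERT INTO artist VALUES (1,'Monet','Claude'),(2,'Kahlo','Frida'),(3,'Klee','Paul');
INSERT INTO customer VALUES (1,'Smith','Ann'),(2,'Jones','Ben'),(3,'Brown','Cy');
INSERT INTO customer_artist_int VALUES (1,1),(2,1),(3,1),(2,2);
""")

# A customer qualifies when the number of distinct artists they follow
# equals the total number of artists.
fans_of_all = conn.execute("""
SELECT c.customerid, c.lastname, c.firstname
FROM customer c
WHERE c.customerid IN (
    SELECT cai.customerid
    FROM customer_artist_int cai
    GROUP BY cai.customerid
    HAVING COUNT(DISTINCT cai.artistid) = (SELECT COUNT(*) FROM artist)
)
ORDER BY c.customerid
""").fetchall()

print(fans_of_all)
```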
A neat way to approach such problems is to invert them - find all the customers for which there exists no artist they aren't interested in. It may sound a bit convoluted in English, but when you convert it to SQL, it becomes quite elegant:
SELECT *
FROM customer c
WHERE NOT EXISTS (SELECT *
FROM artist a
WHERE NOT EXISTS (SELECT *
FROM customer_artist_int cai
WHERE cai.artistid = a.artistid
AND cai.customerid = c.customerid))
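The "no artist they aren't interested in" wording translates to two nested NOT EXISTS clauses: the outer one ranges over artists, the inner one looks for the matching interest row. Here is a small sketch on made-up rows, run on SQLite through Python, so treat the data and names as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artistid INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE customer (customerid INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE customer_artist_int (artistid INTEGER, customerid INTEGER);
-- Hypothetical sample rows: customer 3 has no interests at all
INSERT INTO artist VALUES (1,'Monet','Claude'),(2,'Kahlo','Frida'),(3,'Klee','Paul');
INSERT INTO customer VALUES (1,'Smith','Ann'),(2,'Jones','Ben'),(3,'Brown','Cy');
INSERT INTO customer_artist_int VALUES (1,1),(2,1),(3,1),(2,2);
""")

# Keep a customer only if there is no artist for which
# an interest row linking that artist to the customer is missing.
interested_in_all = conn.execute("""
SELECT c.customerid, c.lastname, c.firstname
FROM customer c
WHERE NOT EXISTS (
    SELECT 1
    FROM artist a
    WHERE NOT EXISTS (
        SELECT 1
        FROM customer_artist_int cai
        WHERE cai.artistid = a.artistid
          AND cai.customerid = c.customerid
    )
)
ORDER BY c.customerid
""").fetchall()

print(interested_in_all)
```

A single NOT EXISTS over the join of artist and customer_artist_int would instead return customers with no interests at all (customer 3 in this data), which is the opposite of what is wanted.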
If I understand correctly, this may be a way.
Setup:
create table Artist (ARTISTID,LASTNAME, FIRSTNAME) as
select '1', 'lastname_artist_1', 'first_name_artist_1' from dual union all
select '2', 'lastname_artist_2', 'first_name_artist_2' from dual union all
select '3', 'lastname_artist_3', 'first_name_artist_3' from dual;
create table Customer (CUSTOMERID, LASTNAME, FIRSTNAME) as
select '1', 'lastname_customer_1', 'first_name_customer_1' from dual union all
select '2', 'lastname_customer_2', 'first_name_customer_2' from dual union all
select '3', 'lastname_customer_3', 'first_name_customer_3' from dual;
create table CUSTOMER_ARTIST_INT (ARTISTID, CUSTOMERID) as
select 1, 1 from dual union all
select 2, 1 from dual union all
select 3, 1 from dual union all
select 2, 2 from dual;
Query:
SELECT CUSTOMERID,
LASTNAME,
FIRSTNAME,
COUNT(DISTINCT artistid)
FROM customer c INNER JOIN customer_artist_int USING (customerid)
GROUP BY CUSTOMERID,
LASTNAME,
FIRSTNAME
HAVING COUNT(DISTINCT artistid) = (SELECT COUNT(1) FROM Artist)
Here you count the number of distinct artists that every customer is interested in and check whether it is equal to the total number of artists.
I don't think mine is the most elegant of all the answers, but here is my attempt:
SELECT c.FIRSTNAME, c.LASTNAME, c.CUSTOMERID
FROM DTOOHEY.CUSTOMER c, DTOOHEY.CUSTOMER_ARTIST_INT cai
WHERE c.CUSTOMERID = cai.CUSTOMERID
AND c.CUSTOMERID IN
(SELECT cai.CUSTOMERID
FROM DTOOHEY.CUSTOMER_ARTIST_INT cai
GROUP BY cai.CUSTOMERID
HAVING COUNT (*) = (SELECT COUNT (*) FROM DTOOHEY.ARTIST)
)
GROUP BY c.FIRSTNAME, c.LASTNAME, c.CUSTOMERID;
Based on my limited knowledge, the query works like this:
1) Select the customer ID, first name and last name of each customer
2) from the two tables (cai and c),
3) joining them to give a single data set,
4) keeping only rows whose c.customerid appears in the subquery.
This is where the magic begins!
5) The subquery selects the CustomerID
6) from the cai table,
7) grouped by CustomerID, so each customer appears once,
8) having COUNT(*), the number of rows for that customer, equal to the number of artists in the dtoohey.artist table.
The main logic is that a customer interested in every artist has as many rows in CUSTOMER_ARTIST_INT as there are rows in the ARTIST table (11 in my data), so we can compare the count from CUSTOMER_ARTIST_INT against the count from ARTIST.

How to count the number of times an element appears consecutively in a table in Teradata?

I have a table that looks like this
ID, Order, Segment
1, 1, A
1, 2, B
1, 3, B
1, 4, C
1, 5, B
1, 6, B
1, 7, B
1, 8, B
Ordering the data by the Order column, I would like to find the number of consecutive B's for each ID. Ideally the output I would like is
ID, Consec
1, 2
1, 4
Because the segment B appears consecutively in row 2 and 3 (2 times), and then again in row 5,6,7,8 (4 times).
I can't think of a solution in SQL since there is no loop facility in SQL.
Are there elegant solutions in Teradata SQL?
P.S. The data I am dealing with has ~20 million rows.
The way to do it in R has been published here.
How to count the number of times an element appears consecutively in a data.table?
It is easy to do with analytic functions. While I don't know anything about teradata, quickly googling makes it appear as though it does support analytic functions.
In any case, I've tested the following in Oracle --
select id,
count(*)
from (select x.*,
row_number() over(partition by id order by ord) -
row_number() over(partition by id, seg order by ord) as grp
from tbl x) x
where seg = 'B'
group by id, grp
order by grp
The trick is establishing the 'groups' of Bs.
Fiddle: http://sqlfiddle.com/#!4/4ed6c/2/0
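Since the fiddle link may not stay alive, here is a self-contained sketch of the same row_number() difference trick on the sample data from the question. It runs on SQLite 3.25 or later through Python, not on Teradata or Oracle, and keeps the column names ord and seg from the query above. The only change is the final ORDER BY, which sorts the runs by where they start.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER, ord INTEGER, seg TEXT);
INSERT INTO tbl VALUES (1,1,'A'),(1,2,'B'),(1,3,'B'),(1,4,'C'),
                       (1,5,'B'),(1,6,'B'),(1,7,'B'),(1,8,'B');
""")

# Within one unbroken run of the same segment, both row numbers advance
# together, so their difference stays constant and labels the run.
runs = conn.execute("""
SELECT id, COUNT(*) AS consec
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY id ORDER BY ord)
         - row_number() OVER (PARTITION BY id, seg ORDER BY ord) AS grp
    FROM tbl t
) x
WHERE seg = 'B'
GROUP BY id, grp
ORDER BY id, MIN(ord)
""").fetchall()

print(runs)
```

For the sample rows the difference is 1 for rows 2 and 3 and 2 for rows 5 to 8, which gives the two runs of length 2 and 4.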

SQL Query Logic suggestion

I have a peculiar scenario and need help with the logic for a SQL query. I tried my best but I'm still stuck.
I have a list of companies along with their corresponding sets of directors. Let's assume company 'x' has 5 directors (aa, bb, cc, dd, ee). I need to find out whether any other company in the list has the same set of directors (i.e. aa, bb, cc, dd, ee are the directors of company 'z' too). If even one director differs, the pair should not be considered a match.
lets consider simple example
company director
-------------------
a xx
a yy
b zz
b xx
c xx
c yy
O/P required (Since a and c has same set of directors)
company1 company2 director
---------------------------
a c xx
a c yy
Logic tried so far:
I replicated the input table for comparison and performed a simple inner join. It fetches values, but the real problem is grouping the company names, which is troublesome in every iteration.
Can anyone help with this? Really thankful.
I can think of a hack using listagg
with x as (
select 'A' as company, 1 as director from dual union all
select 'A' as company, 2 as director from dual union all
select 'B' as company, 1 as director from dual union all
select 'B' as company, 3 as director from dual union all
select 'B' as company, 4 as director from dual union all
select 'C' as company, 1 as director from dual union all
select 'C' as company, 2 as director from dual union all
select 'D' as company, 4 as director from dual union all
select 'E' as company, 4 as director from dual union all
select 'F' as company, 5 as director from dual union all
select 'F' as company, 4 as director from dual union all
select 'G' as company, 4 as director from dual union all
select 'G' as company, 5 as director from dual
) , companies as (
select company,
listagg(director,',') within group (order by director) directors
from x
group by company
)
select directors,
listagg(company,',') within group (order by company) as companies
from companies
group by directors
having count(*) > 1
and the result is
DIRECTORS COMPANIES
------------------------------ ------------------------------
1,2 A,C
4 D,E
4,5 F,G
P.S.: if you need to fetch the data for further manipulation and string types are not usable, you can use COLLECT instead of LISTAGG, but you would have to define a custom TYPE with MAP and ORDER functions to be able to group by the collected list of values.
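The idea behind the listagg hack is to turn each company's directors into one canonical, sorted key and then group companies by that key. Outside the database the same idea can be sketched in a few lines of Python; this is only an illustration on the test rows from the query above, not a replacement for the SQL.

```python
from collections import defaultdict

# (company, director) pairs, same as the test rows in the listagg query
pairs = [("A", 1), ("A", 2), ("B", 1), ("B", 3), ("B", 4), ("C", 1), ("C", 2),
         ("D", 4), ("E", 4), ("F", 5), ("F", 4), ("G", 4), ("G", 5)]

# Step 1: collect the set of directors for each company.
directors_by_company = defaultdict(set)
for company, director in pairs:
    directors_by_company[company].add(director)

# Step 2: use the sorted director list as a key and group companies by it.
companies_by_directors = defaultdict(list)
for company in sorted(directors_by_company):
    key = tuple(sorted(directors_by_company[company]))
    companies_by_directors[key].append(company)

# Step 3: keep only director sets shared by more than one company.
shared = {k: v for k, v in companies_by_directors.items() if len(v) > 1}
print(shared)
```

Sorting before building the key plays the same role as WITHIN GROUP (ORDER BY director): without it, (4, 5) and (5, 4) would count as different sets.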
My SQL is a bit rusty but I'd try something like this:
SELECT c1, c2 FROM
(select company as c0, count(director) as cd0 from data group by company) ALPHA
JOIN
(select count(d.director) as d0, d.company as c1, d.director as d1, e.company as c2, e.director as d2 from data d LEFT JOIN data e ON d.director = e.director
WHERE d.company <> e.company
group by c1, c2) BETA
ON ALPHA.cd0 = BETA.d0 AND ALPHA.c0 = BETA.c1
If you SELECT * above, you'll see that either column d1 or d2 gives you xx but not both xx and yy in the result set. However, it should be straightforward enough to write an outer query that gives you that.
Explanation: I'm computing the join of a company's directors against other company's directors and checking against the count of directors for the company itself. In your example:
Company a has 2 directors, and both match against only company c.
Company b has 2 directors and no matches for both against companies a and c.
Company c has 2 directors and both match against company a.
This makes a and c a perfect match.
I tested against some more dummy data, as well as companies with 3 or 4 directors, and it worked on those too.
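A variation on the count-matching idea is sketched below. It is not the query above: it compares the number of shared directors against the director counts of both companies, so a company whose directors are only a subset of another's is not reported. It assumes each (company, director) pair appears once, uses the table name data from the answer, and runs on SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (company TEXT, director TEXT);
INSERT INTO data VALUES ('a','xx'),('a','yy'),('b','zz'),('b','xx'),('c','xx'),('c','yy');
""")

# cnt:      number of directors per company
# same_set: company pairs with equal director counts whose number of
#           shared directors equals that count, i.e. identical sets
matches = conn.execute("""
WITH cnt AS (
    SELECT company, COUNT(*) AS n FROM data GROUP BY company
),
same_set AS (
    SELECT d1.company AS company1, d2.company AS company2
    FROM data d1
    JOIN data d2 ON d2.director = d1.director AND d1.company < d2.company
    JOIN cnt c1 ON c1.company = d1.company
    JOIN cnt c2 ON c2.company = d2.company
    WHERE c1.n = c2.n
    GROUP BY d1.company, d2.company, c1.n
    HAVING COUNT(*) = c1.n
)
SELECT s.company1, s.company2, d.director
FROM same_set s
JOIN data d ON d.company = s.company1
ORDER BY s.company1, s.company2, d.director
""").fetchall()

for row in matches:
    print(row)
```

The d1.company < d2.company condition reports each pair once and drops self-matches; the final join back to data produces the company1, company2, director layout asked for in the question.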
How about this:
WITH dirlist AS (
SELECT company, LISTAGG(director, ',') WITHIN GROUP (ORDER BY director) directors
FROM companies
GROUP BY company
)
SELECT DISTINCT c1.company, c2.company, c1.directors
FROM dirlist c1, dirlist c2
WHERE c1.directors=c2.directors;
Edit: I can see now that I haven't answered your question exactly. This query only detects matches (without eliminating duplicates and self-joins). Hope it helps a bit.
A slight update on the answer by GoranM
WITH dirlist AS (
SELECT company, LISTAGG(director, ',') WITHIN GROUP (ORDER BY director) directors
FROM companies
GROUP BY company
)
SELECT c1.company, MAX(c2.company), COUNT(c1.directors) AS DuplicateDirList
FROM dirlist c1, dirlist c2
WHERE c1.directors=c2.directors
GROUP BY c1.company
HAVING COUNT(c1.directors) >1
This will give you each company whose director list is duplicated, together with the alphabetically last company sharing that list (which may be the company itself).