PostgreSQL selecting rows which appear maximum number of times - sql

I have 3 tables: startups, which contains the startup id and name; investor_groups, which contains the investor id and investor group name; and deals, which contains startup_id (the id of the startup that submitted the funding application) and investor_group_id (the id of the investor group that the startup submitted the funding application to).
I have to find the startups which submitted the most applications, and the names of the groups they applied to.
What I'm trying to do is -
SELECT S.name AS Startup_name, COUNT(S.name) as num
FROM deals D
INNER JOIN startups S ON D.startup_id = S.id
INNER JOIN investor_groups I ON D.investor_group_id = I.id
GROUP BY Startup_name
ORDER BY num DESC
LIMIT 2
But this is giving me the result as -
startup_name, num
HJ Inc, 3
smoothies, 3
What I want is -
startup_name, investor_name
HJ Inc, abc
HJ Inc , def
HJ Inc , ghi
smoothies, xyz
smoothies, rst
smoothies, lmn
When I add the investor group names to the SELECT statement, I get an error saying that the investor group name needs to be in the GROUP BY clause.
The sample data from the tables -
For table 'startups': [image: startups]
For table 'investor_groups': [image: investor_groups]
For table 'deals': [image: deals]

You could try using count() over()
SELECT
S.name AS Startup_name
, I.name AS investor_name
, COUNT(*) over(partition by S.name) AS num
FROM deals D
INNER JOIN startups S ON D.startup_id = S.id
INNER JOIN investor_groups I ON D.investor_group_id = I.id
ORDER BY num DESC, Startup_name, investor_name
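The query above returns every startup together with its application count. To keep only the startups with the highest count, one option is to compare each row's count against the overall maximum. Below is a minimal sketch of that idea, run through Python's sqlite3 module so that it is self-contained. The sample rows (including the startup 'other') are made up for illustration, and it assumes a SQLite build with window functions (3.25 or later). The SQL itself should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data modelled on the desired output in the question.
conn.executescript("""
CREATE TABLE startups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE investor_groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE deals (startup_id INTEGER, investor_group_id INTEGER);
INSERT INTO startups VALUES (1, 'HJ Inc'), (2, 'smoothies'), (3, 'other');
INSERT INTO investor_groups VALUES
    (1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'xyz'), (5, 'rst'), (6, 'lmn');
INSERT INTO deals VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 6), (3, 1);
""")

query = """
SELECT startup_name, investor_name
FROM (
    -- second pass: attach the overall maximum count to every row
    SELECT startup_name, investor_name, num,
           MAX(num) OVER () AS max_num
    FROM (
        -- first pass: count the applications per startup
        SELECT S.name AS startup_name,
               I.name AS investor_name,
               COUNT(*) OVER (PARTITION BY S.id) AS num
        FROM deals D
        INNER JOIN startups S ON D.startup_id = S.id
        INNER JOIN investor_groups I ON D.investor_group_id = I.id
    ) AS counted
) AS ranked
WHERE num = max_num
ORDER BY startup_name, investor_name
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike LIMIT 2, this keeps every startup tied for the highest count, however many there are.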

Related

How to sum up max values from another table with some filtering

I have 3 tables
User Table
id  Name
1   Mike
2   Sam
Score Table
id  UserId  CourseId  Score
1   1       1         5
2   1       1         10
3   1       2         5
Course Table
id  Name
1   Course 1
2   Course 2
What I'm trying to return is one row per user, showing the user id and user name along with the sum of the maximum score per course for that user.
In the example tables the output I'd like to see is:
Result
User_Id  User_Name  Total_Score
1        Mike       15
2        Sam        0
The SQL I've tried so far is:
select TOP(3) u.Id as User_Id, u.UserName as User_Name, SUM(maxScores) as Total_Score
from Users as u,
(select MAX(s.Score) as maxScores
from Scores as s
inner join Courses as c
on s.CourseId = c.Id
group by s.UserId, c.Id
) x
group by u.Id, u.UserName
I want to use a having clause to link the Users to Scores after the group by in the sub query, but I get an exception saying:
The multi-part identifier "u.Id" could not be bound
It works if I hard code a user id in the having clause I want to add, but it needs to be dynamic and I'm stuck on how to do this.
What would be the correct way to structure the query?
You were close; you just needed to return s.UserId from the sub-query and correctly join the sub-query to your Users table (I've joined in the reverse order to yours because to me it's more logical to start with the base data and then join on more details as required). Take note of the scope of aliases, i.e. aliases inside your sub-query are not available in your outer query.
select u.Id as [User_Id], u.UserName as [User_Name]
, sum(maxScore) as Total_Score
from (
select s.UserId, max(s.Score) as maxScore
from Scores as s
inner join Courses as c on s.CourseId = c.Id
group by s.UserId, c.Id
) as x
inner join Users as u on u.Id = x.UserId
group by u.Id, u.UserName;
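Note that with an inner join, a user who has no scores (Sam in the example) does not appear in the result at all, whereas the question expects Sam with a total of 0. One possible variant is to start from Users, LEFT JOIN the per-course maximums and wrap the sum in COALESCE. Here is a minimal sketch, run through Python's sqlite3 module so it is self-contained, using the sample rows from the question. It assumes the table and column names from the queries (Users.UserName and so on), and it drops the join to Courses because only CourseId is needed for the grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows taken from the question's example tables.
conn.executescript("""
CREATE TABLE Users (Id INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Scores (Id INTEGER PRIMARY KEY, UserId INTEGER,
                     CourseId INTEGER, Score INTEGER);
INSERT INTO Users VALUES (1, 'Mike'), (2, 'Sam');
INSERT INTO Scores VALUES (1, 1, 1, 5), (2, 1, 1, 10), (3, 1, 2, 5);
""")

query = """
SELECT u.Id AS User_Id,
       u.UserName AS User_Name,
       COALESCE(SUM(x.maxScore), 0) AS Total_Score  -- 0 when there are no scores
FROM Users AS u
LEFT JOIN (
    -- best score per user and course
    SELECT s.UserId, MAX(s.Score) AS maxScore
    FROM Scores AS s
    GROUP BY s.UserId, s.CourseId
) AS x ON x.UserId = u.Id
GROUP BY u.Id, u.UserName
ORDER BY u.Id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```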

Most popular pairs of shops for workers from each company

I've got 2 tables, one with sales and one with companies:
Sales Table
Transaction_Id Shop_id Sale_date Client_ID
92356 24234 11.09.2018 12356
92345 32121 11.09.2018 32121
94323 24321 11.09.2018 21231
94278 45321 11.09.2018 42123
Company table
Client_ID Company_name
12345 ABC
13322 ABC
32321 BCD
22221 BCD
What I want to achieve is distinct count of Clients from each Company for each pair of shops(Clients who had at least 1 transaction in both of shops) :
Shop_Id_1 Shop_id_2 Company_name Count(distinct Client_id)
12356 12345 ABC 31
12345 14278 ABC 23
14323 12345 BCD 32
14278 12345 BCD 43
I think that I have to use a self join, but my queries, even with a filter for one week, are killing the DB. Any thoughts on that? I'm using Microsoft SQL Server 2012.
Thanks
I think this is a self-join and aggregation, with a twist. The twist is that you want to include the company in each sales record, so it can be used in the self-join:
with sc as (
select s.*, c.company_name
from sales s join
companies c
on s.client_id = c.client_id
)
select sc1.shop_id, sc2.shop_id, sc1.company_name, count(distinct sc1.client_id)
from sc sc1 join
sc sc2
on sc1.client_id = sc2.client_id and
sc1.company_name = sc2.company_name
group by sc1.shop_id, sc2.shop_id, sc1.company_name;
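As written, the self-join also pairs every shop with itself and returns each pair twice (A-B and B-A). A possible refinement, which is an addition to the answer rather than part of it, is to reduce the CTE to distinct (shop, client, company) rows and to require sc1.shop_id < sc2.shop_id, which also shrinks the join considerably. Here is a minimal sketch with made-up sample rows, run through Python's sqlite3 module so it is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; ids and dates are invented for illustration.
conn.executescript("""
CREATE TABLE sales (transaction_id INTEGER, shop_id INTEGER,
                    sale_date TEXT, client_id INTEGER);
CREATE TABLE companies (client_id INTEGER, company_name TEXT);
INSERT INTO companies VALUES (1, 'ABC'), (2, 'ABC'), (3, 'BCD');
INSERT INTO sales VALUES
    (100, 10, '2018-09-11', 1),
    (101, 20, '2018-09-11', 1),
    (102, 10, '2018-09-12', 2),
    (103, 20, '2018-09-12', 2),
    (104, 20, '2018-09-12', 2),
    (105, 10, '2018-09-12', 3),
    (106, 30, '2018-09-12', 3);
""")

query = """
WITH sc AS (
    -- one row per shop/client, so repeat purchases do not inflate the join
    SELECT DISTINCT s.shop_id, s.client_id, c.company_name
    FROM sales s
    JOIN companies c ON s.client_id = c.client_id
)
SELECT sc1.shop_id AS shop_id_1,
       sc2.shop_id AS shop_id_2,
       sc1.company_name,
       COUNT(DISTINCT sc1.client_id) AS clients
FROM sc sc1
JOIN sc sc2
  ON sc1.client_id = sc2.client_id
 AND sc1.shop_id < sc2.shop_id   -- skip self-pairs and mirrored pairs
GROUP BY sc1.shop_id, sc2.shop_id, sc1.company_name
ORDER BY shop_id_1, shop_id_2, sc1.company_name
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```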
I think there are some issues with your question. I interpreted it to mean that the company table contains the shop IDs, not the client IDs.
First you can create a solution to get the shops as rows for each company. Here I chose a maximum of 5 shops per company. Don't forget the semicolon at the end of the previous statement before the CTEs.
WITH CTE_Comp AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY CompanyName ORDER BY ShopID) AS RowNumb
FROM Company AS C
)
SELECT C1.ShopID,
C2.ShopID AS ShopID_2,
C3.ShopID AS ShopID_3,
C4.ShopID AS ShopID_4,
C5.ShopID AS ShopID_5,
C1.CompanyName
INTO ShopsByCompany
FROM CTE_Comp AS C1
LEFT JOIN CTE_Comp AS C2 ON C1.CompanyName = C2.CompanyName AND C2.RowNumb = 2
LEFT JOIN CTE_Comp AS C3 ON C1.CompanyName = C3.CompanyName AND C3.RowNumb = 3
LEFT JOIN CTE_Comp AS C4 ON C1.CompanyName = C4.CompanyName AND C4.RowNumb = 4
LEFT JOIN CTE_Comp AS C5 ON C1.CompanyName = C5.CompanyName AND C5.RowNumb = 5
WHERE C1.RowNumb = 1
After that, in a few steps, I think you could get the desired result:
WITH ClientsPerShop AS
(
SELECT ShopID,
COUNT (DISTINCT ClientID) AS TotalClients
FROM Sales
GROUP BY ShopID
)
, ClienstsPerCompany AS
(
SELECT CompanyName,
SUM (TotalClients) AS ClientsPerComp
FROM Company AS C
INNER JOIN ClientsPerShop AS CPS ON C.ShopID = CPS.ShopID
GROUP BY CompanyName
)
SELECT *
FROM ClienstsPerCompany AS CPA
INNER JOIN ShopsByCompany AS SBC ON SBC.CompanyName = CPA.CompanyName
Hopefully this will bring you closer to your solution, best of luck!

How to perform max on an inner join with 2 different counts on columns?

How to find the user with the most referrals that have at least three blue shoes using PostgreSQL?
table 1 - users
name (matches shoes.owner_name)
referred_by (foreign keyed to users.name)
table 2 - shoes
owner_name (matches persons.name)
shoe_name
shoe_color
What I have so far is separate queries returning parts of what I want above:
(SELECT count(*) as shoe_count
FROM shoes
GROUP BY owner_name
WHERE shoe_color = “blue”
AND shoe_count>3) most_shoes
INNER JOIN
(SELECT count(*) as referral_count
FROM users
GROUP BY referred_by
) most_referrals
ORDER BY referral_count DESC
LIMIT 1
Two subqueries seem like the way to go. They would look like:
SELECT s.owner_name, s.shoe_count, r.referral_count
FROM (SELECT owner_name, count(*) as shoe_count
FROM shoes
WHERE shoe_color = 'blue'
GROUP BY owner_name
HAVING count(*) >= 3 -- PostgreSQL does not allow the shoe_count alias in HAVING
) s JOIN
(SELECT referred_by, count(*) as referral_count
FROM users
GROUP BY referred_by
) r
ON s.owner_name = r.referred_by
ORDER BY r.referral_count DESC
LIMIT 1 ;
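To check the shape of the result, here is a minimal sketch that runs the same two-subquery join on a few made-up rows through Python's sqlite3 module. The names and shoes are hypothetical. 'bob' has the most referrals but only two blue shoes, so 'ann' should be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for illustration only.
conn.executescript("""
CREATE TABLE users (name TEXT PRIMARY KEY, referred_by TEXT);
CREATE TABLE shoes (owner_name TEXT, shoe_name TEXT, shoe_color TEXT);
INSERT INTO users VALUES
    ('ann', NULL), ('bob', NULL),
    ('c1', 'ann'), ('c2', 'ann'),
    ('c3', 'bob'), ('c4', 'bob'), ('c5', 'bob');
INSERT INTO shoes VALUES
    ('ann', 'a1', 'blue'), ('ann', 'a2', 'blue'), ('ann', 'a3', 'blue'),
    ('bob', 'b1', 'blue'), ('bob', 'b2', 'red'), ('bob', 'b3', 'blue');
""")

query = """
SELECT s.owner_name, s.shoe_count, r.referral_count
FROM (
    -- owners with at least three blue shoes
    SELECT owner_name, COUNT(*) AS shoe_count
    FROM shoes
    WHERE shoe_color = 'blue'
    GROUP BY owner_name
    HAVING COUNT(*) >= 3
) s
JOIN (
    -- number of referrals per referrer
    SELECT referred_by, COUNT(*) AS referral_count
    FROM users
    GROUP BY referred_by
) r ON s.owner_name = r.referred_by
ORDER BY r.referral_count DESC
LIMIT 1
"""

rows = conn.execute(query).fetchall()
print(rows)
```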

T SQL Adress Table with the same Company need latest Contact

I have an Address table with primary and secondary company locations, for example:
ADDRESSES:
ID CompanyName AdressType MainID Location
1 ExampleCompany H 0 Germany
2 ExampleCompany N 1 Sweden
3 ExampleCompany N 1 Germany
We also have a Contacts table that includes the latest contact with each of the company locations:
Contacts
ID SuperID Datecreate Notes
1 1 10.04.2018 XY
2 3 09.04.2018 YX
3 2 11.04.2018 XX
Now we want to select the latest contact per company and sort them, so that we get a list of all the customers we have not contacted in a long time.
I thought about something like this:
SELECT
ADDRH.ID,
ADDRH.COMPANY1,
TOPCONT.ID,
TOPCONT.DATECREATE,
TOPCONT.NOTES0
FROM dbo.ADDRESSES ADDRH
OUTER APPLY (SELECT TOP 1 ID, SUPERID, DATECREATE, CREATEDBY, NOTES0 FROM DBO.CONTACTS CONT WHERE ADDRH.ID = CONT.SUPERID ORDER BY DATECREATE DESC) TOPCONT
WHERE
TOPCONT.ID IS NOT NULL
ORDER BY TOPCONT.DATECREATE
But this still ignores the fact that the same company appears multiple times in the addresses table. How can I create a list that has each company with its latest contact?
Thanks for your help
Greetings
Well, you have to remove duplicates from address as well. Because of the structure of your data, I think the best approach is to use row_number():
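SELECT ac.*
-- c.ID is aliased so that the derived table has unique column names
FROM (SELECT a.ID, a.COMPANY1, c.ID AS CONTACTID, c.DATECREATE, c.NOTES0,
ROW_NUMBER() OVER (PARTITION BY a.COMPANY1 ORDER BY c.DATECREATE DESC) as seqnum
FROM dbo.ADDRESSES a JOIN
DBO.CONTACTS c
ON a.ID = c.SUPERID
WHERE c.ID IS NOT NULL
) ac
WHERE seqnum = 1
-- the alias c is not visible outside the subquery, so sort on ac
ORDER BY ac.DATECREATE;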
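Here is a minimal sketch of the row_number() approach on the sample rows from the question, run through Python's sqlite3 module so it is self-contained. It assumes dates are stored as ISO text (so they sort correctly as strings), uses the column names from the queries (COMPANY1, NOTES0), and adds a made-up second company ('OtherCompany') to show the final ordering. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows for ExampleCompany come from the question; OtherCompany is hypothetical.
conn.executescript("""
CREATE TABLE ADDRESSES (ID INTEGER PRIMARY KEY, COMPANY1 TEXT,
                        ADRESSTYPE TEXT, MAINID INTEGER, LOCATION TEXT);
CREATE TABLE CONTACTS (ID INTEGER PRIMARY KEY, SUPERID INTEGER,
                       DATECREATE TEXT, NOTES0 TEXT);
INSERT INTO ADDRESSES VALUES
    (1, 'ExampleCompany', 'H', 0, 'Germany'),
    (2, 'ExampleCompany', 'N', 1, 'Sweden'),
    (3, 'ExampleCompany', 'N', 1, 'Germany'),
    (4, 'OtherCompany',   'H', 0, 'France');
INSERT INTO CONTACTS VALUES
    (1, 1, '2018-04-10', 'XY'),
    (2, 3, '2018-04-09', 'YX'),
    (3, 2, '2018-04-11', 'XX'),
    (4, 4, '2018-01-05', 'ZZ');
""")

query = """
SELECT ac.COMPANY1, ac.CONTACTID, ac.DATECREATE, ac.NOTES0
FROM (
    SELECT a.COMPANY1, c.ID AS CONTACTID, c.DATECREATE, c.NOTES0,
           -- number each company's contacts, newest first, across all its addresses
           ROW_NUMBER() OVER (PARTITION BY a.COMPANY1
                              ORDER BY c.DATECREATE DESC) AS seqnum
    FROM ADDRESSES a
    JOIN CONTACTS c ON a.ID = c.SUPERID
) ac
WHERE ac.seqnum = 1
ORDER BY ac.DATECREATE   -- longest-uncontacted companies first
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```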

SQL - Select highest value when data across 3 tables

I have 3 tables:
Person (with a column PersonKey)
Telephone (with columns Tel_NumberKey, Tel_Number, Tel_NumberType e.g. 1=home, 2=mobile)
xref_Person+Telephone (columns PersonKey, Tel_NumberKey, CreatedDate, ModifiedDate)
I'm looking to get the most recent (e.g. the highest Tel_NumberKey) from the xref_Person+Telephone for each Person and use that Tel_NumberKey to get the actual Tel_Number from the Telephone table.
The problem I am having is that I keep getting duplicates for the same Tel_NumberKey. I also need to be sure I get both the home and mobile from the Telephone table, which I've been looking to do via 2 individual joins for each Tel_NumberType - again getting duplicates.
Been trying the following but to no avail:
-- For HOME
SELECT
p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1 -- e.g. Home phone number
AND pn.Tel_NumberKey = (SELECT MAX(pn1.Tel_NumberKey) AS Tel_NumberKey
FROM Person AS p1
INNER JOIN xref_Person+Telephone AS x1 ON p1.PersonKey = x1.PersonKey
INNER JOIN Telephone AS pn1 ON x1.Tel_NumberKey = pn1.Tel_NumberKey
WHERE pn1.Tel_NumberType = 1
AND p1.PersonKey = p.PersonKey
AND pn1.Tel_Number = pn.Tel_Number)
ORDER BY
p.PersonKey
And have been looking over the following links but again keep getting duplicates.
SQL select max(date) and corresponding value
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
SQL Server: SELECT only the rows with MAX(DATE)
I'm sure this must be possible, but I've been at this for a couple of days and can't believe it's that difficult to get the most recent / highest value when referencing 3 tables. Any help greatly appreciated.
select *
from
( SELECT p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
, row_number() over (partition by p.PersonKey, pn.Phone_Number order by pn.Tel_NumberKey desc) rn
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1
) tt
where tt.rn = 1
ORDER BY
tt.PersonKey
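If the goal is one row per person and phone type (home and mobile in a single query), a possible variant is to partition by PersonKey and Tel_NumberType instead of by the phone number, and to drop the filter on a single type. Here is a minimal sketch with made-up rows, run through Python's sqlite3 module so it is self-contained. It assumes the column names from the table description (Tel_Number rather than Phone_Number), leaves out the date columns of the xref table, and writes the table name in square brackets because of the + sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; Tel_NumberType 1 = home, 2 = mobile.
conn.executescript("""
CREATE TABLE Telephone (Tel_NumberKey INTEGER PRIMARY KEY,
                        Tel_Number TEXT, Tel_NumberType INTEGER);
CREATE TABLE [xref_Person+Telephone] (PersonKey INTEGER, Tel_NumberKey INTEGER);
INSERT INTO Telephone VALUES
    (10, '111', 1), (11, '222', 1), (12, '333', 2),
    (20, '444', 2), (21, '555', 2);
INSERT INTO [xref_Person+Telephone] VALUES
    (1, 10), (1, 11), (1, 12), (2, 20), (2, 21);
""")

query = """
SELECT PersonKey, Tel_NumberType, Tel_NumberKey, Tel_Number
FROM (
    SELECT x.PersonKey, t.Tel_NumberType, t.Tel_NumberKey, t.Tel_Number,
           -- rn = 1 marks the highest key per person and phone type
           ROW_NUMBER() OVER (PARTITION BY x.PersonKey, t.Tel_NumberType
                              ORDER BY t.Tel_NumberKey DESC) AS rn
    FROM [xref_Person+Telephone] AS x
    JOIN Telephone AS t ON t.Tel_NumberKey = x.Tel_NumberKey
) AS tt
WHERE rn = 1
ORDER BY PersonKey, Tel_NumberType
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```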
You have to use the max() function and then order by rownum in descending order, like this:
select f.empno
from(select max(empno) empno from emp e
group by rownum)f
order by rownum desc
It will give you all employees from the highest employee number to the lowest. Now apply it to your case and let me know.