The table is as follows:
Company, Vertical, Counts
For each company, I want the SUM of Counts, shown next to the Vertical that has the highest count.
Company Vertical Counts
IBM Finance 10
IBM R&D 5
IBM PR 2
I would like to get the following output
IBM Finance 17
A self-join should do it.
select t.company, t.vertical, a.total_count
from (
select company, sum(counts) as total_count
from the_table
group by company
) a
join the_table t on t.company = a.company
where t.counts = (select max(counts) from the_table t2 where t2.company = t.company);
Depending on your RDBMS, you can also use a window function (e.g. sum(counts) over (partition by company) as total_count) and not have to worry about the join.
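A minimal sketch of the window-function variant, run against SQLite through Python's standard sqlite3 module as a stand-in for your RDBMS (it assumes SQLite 3.25+ for window functions; the table name company_counts is hypothetical). Both window aggregates are partitioned by company, so each company gets its own total, and the outer filter keeps the row(s) holding that company's maximum count.

```python
import sqlite3

# Sample rows from the question; the table name company_counts is hypothetical.
ROWS = [("IBM", "Finance", 10), ("IBM", "R&D", 5), ("IBM", "PR", 2)]

# Per-company total and maximum are computed as window aggregates, then every
# row that is not the company's maximum is filtered out (ties keep all rows).
QUERY = """
SELECT company, vertical, total_count
FROM (
    SELECT company, vertical, counts,
           SUM(counts) OVER (PARTITION BY company) AS total_count,
           MAX(counts) OVER (PARTITION BY company) AS max_count
    FROM company_counts
) AS w
WHERE counts = max_count
ORDER BY company, vertical
"""


def top_vertical_with_total(rows):
    """Return (company, vertical, total_count) for each company's top vertical."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE company_counts (company TEXT, vertical TEXT, counts INTEGER)"
        )
        conn.executemany("INSERT INTO company_counts VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(top_vertical_with_total(ROWS))  # [('IBM', 'Finance', 17)]
```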
It's a twist on the "How to get the MAX row" problem (DBA.SE link):
1. get the total and the highest count per Company in a simple aggregate
2. use these to identify the row in the source table
Something like this (untested):
SELECT
t.Company, t.Vertical, m.CompanyCount
FROM
( --get the total and the highest count per Company
SELECT
SUM(Counts) AS CompanyCount,
MAX(Counts) AS CompanyMaxCount,
Company
FROM MyTable
GROUP BY Company
) m
JOIN --back to get the row for that company with the highest count
MyTable t ON m.Company = t.Company AND m.CompanyMaxCount = t.Counts
Edit: I used a plain aggregate and join instead of ROW_NUMBER because we don't know the platform, and this works even on databases without window functions.
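The aggregate-then-join-back idea can be sketched in SQLite via Python's sqlite3 module. This is an illustration under assumptions (the table name company_counts is hypothetical), not the answerer's tested query: the grouped subquery produces each company's total and highest count, and the join finds the row(s) holding that count. No window functions are involved.

```python
import sqlite3

# Sample rows from the question; the table name company_counts is hypothetical.
ROWS = [("IBM", "Finance", 10), ("IBM", "R&D", 5), ("IBM", "PR", 2)]

# Step 1: aggregate per company (total and highest count).
# Step 2: join back to the source table to find the row(s) with that count.
QUERY = """
SELECT t.company, t.vertical, m.company_total
FROM (
    SELECT company,
           SUM(counts) AS company_total,
           MAX(counts) AS company_max
    FROM company_counts
    GROUP BY company
) AS m
JOIN company_counts AS t
  ON t.company = m.company
 AND t.counts = m.company_max
ORDER BY t.company, t.vertical
"""


def top_vertical_join_back(rows):
    """Return (company, vertical, company_total) using only GROUP BY and a join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE company_counts (company TEXT, vertical TEXT, counts INTEGER)"
        )
        conn.executemany("INSERT INTO company_counts VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(top_vertical_join_back(ROWS))  # [('IBM', 'Finance', 17)]
```

If two verticals tie for the highest count, both rows come back, each carrying the same company total.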
select Company,
Vertical,
SumCounts
from (
select Company,
Vertical,
row_number() over(partition by Company order by Counts desc) as rn,
sum(Counts) over(partition by Company) as SumCounts
from YourTable
) as T
where rn = 1
SELECT company,
vertical,
total_sum
FROM (
SELECT Company,
Vertical,
sum(counts) over (partition by Company) as total_sum,
rank() over (partition by Company order by counts desc) as count_rank
FROM the_table
) t
WHERE count_rank = 1
Related
I am trying to find the most popular job position among employees across a combination of companies. If there is a tie, both positions should be included in the result.
I have a file called employees_data.txt.
I have their name, company, job position, and age in that order.
Natali, Google, IT, 45
Nadia, Facebook, Sales, 25
Jacob, Google, IT, 32
Leonard, Bing, Custodian, 65
Kami, Amazon, Driver, 43
Paul, Facebook, Engineer, 31
Ashley, Walmart, IT, 34
Robert, Fedex, IT, 27
Rebecca, Ups, Driver, 29
Mal, Apple, Custodian, 73
Erin, Bing, Sales, 38
I know the expected outcome should be the IT position; I'm just unsure of the SQL command to read through the data and keep track of the positions.
Any help is greatly appreciated!
Feels like homework :)
You need an aggregate function (COUNT, SUM, MIN, MAX, etc.) and a GROUP BY:
select count(*), position
from t
group by position
https://www.db-fiddle.com/f/dUqdZaUGpHTAYv8vH1YhU1/0
To return only the 'top record', we can use a self-join with a row_number calculation like this. There is probably an easier and cleaner way to do it, but you get the idea.
SELECT count(*) as recordcount, t.position
FROM t
INNER JOIN (
SELECT *
,row_number() OVER (
ORDER BY recordCount DESC
) AS rn
FROM (
SELECT count(*) AS recordCount
,position
FROM t
GROUP BY position
) as a
) d ON t.position = d.position
AND d.rn = 1
group by t.position
https://www.db-fiddle.com/f/dUqdZaUGpHTAYv8vH1YhU1/1
You want aggregation with a window function. That is:
select p.*
from (select position, count(*) as cnt,
rank() over (order by count(*) desc) as seqnum
from t
group by position
) p
where seqnum = 1;
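A minimal sketch of this rank-over-aggregate query, run in SQLite through Python's sqlite3 module (assumes SQLite 3.25+; the table name employees is an assumption, and the rows are the sample from employees_data.txt). Because RANK() gives tied counts the same rank, a tie for first place returns every tied position, which is what the question asks for.

```python
import sqlite3

# Rows from employees_data.txt in the question: (name, company, position, age).
# The table name employees is an assumption.
EMPLOYEES = [
    ("Natali", "Google", "IT", 45),
    ("Nadia", "Facebook", "Sales", 25),
    ("Jacob", "Google", "IT", 32),
    ("Leonard", "Bing", "Custodian", 65),
    ("Kami", "Amazon", "Driver", 43),
    ("Paul", "Facebook", "Engineer", 31),
    ("Ashley", "Walmart", "IT", 34),
    ("Robert", "Fedex", "IT", 27),
    ("Rebecca", "Ups", "Driver", 29),
    ("Mal", "Apple", "Custodian", 73),
    ("Erin", "Bing", "Sales", 38),
]

# RANK() is evaluated after GROUP BY, so it ranks the per-position counts.
QUERY = """
SELECT position, cnt
FROM (
    SELECT position, COUNT(*) AS cnt,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS seqnum
    FROM employees
    GROUP BY position
) AS p
WHERE seqnum = 1
ORDER BY position
"""


def most_popular_positions(rows):
    """Return every (position, count) pair tied for the highest count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, company TEXT, position TEXT, age INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(most_popular_positions(EMPLOYEES))  # [('IT', 4)]
```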
Since Postgres 13, you don't even need a subquery, because it supports FETCH FIRST ... WITH TIES:
select position, count(*) as cnt
from t
group by position
order by count(*) desc
fetch first 1 row with ties;
I suspect your assignment calls for a query something along these lines:
select job_position
from employees_data
group by job_position
order by count(*) desc
fetch first 1 row with ties
Assuming the table is called jobpositions and the columns are as follows:
name, company, position, age
I would use:
select * from (
select position, COUNT(position) as countpos, ROW_NUMBER() OVER(ORDER BY count(position) DESC) as numpos
from jobpositions group by position order by count(position) desc
) tb1 where tb1.numpos=1
This seems to work in Postgres, and I like it because it is simple.
I have a table Tabl1 with columns id, name, country, year, medal.
How can I find the top 10 countries by number of medals for each year in a single query?
Thanks :)
You haven't told us anything about your table schema or the data, so this is a guess!
I'm going to assume your medal column contains the number of medals for each id/name, so you just need to rank by the sum of medals. Something along these lines:
select [year], country, [Rank] from (
select [year], country, Rank() over(partition by [year] order by Sum(medal) desc ) [Rank]
from Tabl1
group by [year],country
)x
where [Rank]<=10
order by [year], [Rank]
Here is how you can get the top 10 countries in each year:
select * from
(
select country, year, count(*) as medals, row_number() over (partition by year order by count(*) desc) as rn
from Tabl1
group by country, year
) tt
where tt.rn < 11
The subquery groups the data by country and year and gives you the count(*) of each group. At the same time, the row_number() window function numbers the groups within each year in descending order of count(*), so the country with the most medals in each year gets row number 1. Since you need the top 10, the main query filters on tt.rn < 11.
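A small sketch of the per-year numbering, run in SQLite through Python's sqlite3 module (assumes SQLite 3.25+). The medal rows are made up for illustration and assume one row per medal won; the real Tabl1 also has id, name and medal columns, which are left out here. The limit is a parameter so the behaviour is visible with fewer than ten countries.

```python
import sqlite3

# Hypothetical rows, one per medal won: (country, year).
MEDALS = (
    [("USA", 2000)] * 3 + [("CHN", 2000)] * 2 + [("GBR", 2000)]
    + [("CHN", 2004)] * 3 + [("GBR", 2004)] * 2 + [("USA", 2004)]
)

# PARTITION BY year restarts the numbering for every year.
QUERY = """
SELECT year, country, medals
FROM (
    SELECT country, year, COUNT(*) AS medals,
           ROW_NUMBER() OVER (PARTITION BY year ORDER BY COUNT(*) DESC) AS rn
    FROM Tabl1
    GROUP BY country, year
) AS tt
WHERE tt.rn <= ?
ORDER BY year, rn
"""


def top_countries_per_year(rows, n=10):
    """Return (year, country, medals) for the top n countries of each year."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Tabl1 (country TEXT, year INTEGER)")
        conn.executemany("INSERT INTO Tabl1 VALUES (?, ?)", rows)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()


print(top_countries_per_year(MEDALS, 2))
# [(2000, 'USA', 3), (2000, 'CHN', 2), (2004, 'CHN', 3), (2004, 'GBR', 2)]
```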
If you want 10 countries per year:
with data as (
select country, "year" as yr,
rank() over (partition by "year" order by count(*) desc) as rnk
from T
group by country, "year"
)
select yr as "year", country from data
where rnk <= 10
order by yr, rnk;
Note that if ties are possible this could return more than ten rows for any given year.
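To see the tie behaviour concretely, here is a sketch that runs the same CTE in SQLite via Python's sqlite3 module (assumes SQLite 3.25+) with either RANK() or ROW_NUMBER(). The rows are hypothetical: FRA and ITA both have two medals in 2000, so asking for the top 1 returns two rows with RANK() but only one with ROW_NUMBER().

```python
import sqlite3

# Hypothetical medal rows (country, year): FRA and ITA tie with 2 medals in 2000.
MEDALS = [("FRA", 2000), ("FRA", 2000), ("ITA", 2000), ("ITA", 2000), ("ESP", 2000)]


def top_n_per_year(rows, n, ranking="RANK"):
    """Return (year, country) pairs whose per-year ranking is <= n."""
    # Only these two function names are ever interpolated into the SQL text.
    if ranking not in ("RANK", "ROW_NUMBER"):
        raise ValueError("ranking must be 'RANK' or 'ROW_NUMBER'")
    query = f"""
        WITH data AS (
            SELECT country, year AS yr,
                   {ranking}() OVER (PARTITION BY year ORDER BY COUNT(*) DESC) AS rnk
            FROM T
            GROUP BY country, year
        )
        SELECT yr, country FROM data
        WHERE rnk <= ?
        ORDER BY yr, rnk, country
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE T (country TEXT, year INTEGER)")
        conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
        return conn.execute(query, (n,)).fetchall()
    finally:
        conn.close()


print(top_n_per_year(MEDALS, 1, "RANK"))  # [(2000, 'FRA'), (2000, 'ITA')]
# With ROW_NUMBER only one row comes back; which tied country wins is arbitrary.
print(top_n_per_year(MEDALS, 1, "ROW_NUMBER"))
```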
I am having an issue with an SQL query.
Currently I have two tables. The first lists sales by vendor and country, for example (there are many more rows, but this is the gist):
Country id Sale
US 1 100
UK 2 1000
US 3 150
UK 2 200
The second table maps each id to the vendor's name, for example:
id name
1 john
2 david
3 tom
I need to get the top vendor in each country by total sales. The output should look something like this:
country id name sum_sales
Could you help? Currently I can only group by and sum, but I can't work out how to get the top vendor in each country. Thank you!
I am running this on BigQuery SQL.
Use dense_rank() with aggregation:
select yr, Country, id, name, total_sales
from (select extract(year from s.date) as yr,
s.Country, s.id, v.name, sum(s.sale) as total_sales,
dense_rank() over (partition by extract(year from s.date), s.country order by sum(s.sale) desc) as seq
from sales s inner join
vendors v
on v.id = s.id
group by extract(year from s.date), s.Country, s.id, v.name
) t
where seq = 1;
EDIT: For a specific year format, use FORMAT_DATETIME:
FORMAT_DATETIME("%Y", DATETIME "2020-03-19")
This way, you get the top-selling vendors for each country.
Note: if two or more vendors have the same total sales, this returns all of them. If you want only one, use row_number() instead of dense_rank().
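A minimal sketch of the dense_rank() approach on the question's sample data, run in SQLite through Python's sqlite3 module rather than BigQuery (assumes SQLite 3.25+; the table names sales and vendors and the column name sale are taken from the question's layout). The sample has no date column, so the year partition is left out here. The join must compare the sales id to the vendor id; joining a column to itself matches every vendor to every sale.

```python
import sqlite3

# Sample rows from the question: sales(country, id, sale) and vendors(id, name).
SALES = [("US", 1, 100), ("UK", 2, 1000), ("US", 3, 150), ("UK", 2, 200)]
VENDORS = [(1, "john"), (2, "david"), (3, "tom")]

# Sum per (country, vendor), then rank vendors within each country.
QUERY = """
SELECT country, id, name, sum_sales
FROM (
    SELECT s.country, s.id, v.name, SUM(s.sale) AS sum_sales,
           DENSE_RANK() OVER (PARTITION BY s.country
                              ORDER BY SUM(s.sale) DESC) AS seq
    FROM sales AS s
    JOIN vendors AS v ON v.id = s.id
    GROUP BY s.country, s.id, v.name
) AS t
WHERE seq = 1
ORDER BY country, id
"""


def top_vendor_per_country(sales, vendors):
    """Return (country, id, name, sum_sales) for the top vendor(s) per country."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (country TEXT, id INTEGER, sale INTEGER)")
        conn.execute("CREATE TABLE vendors (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
        conn.executemany("INSERT INTO vendors VALUES (?, ?)", vendors)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(top_vendor_per_country(SALES, VENDORS))
# [('UK', 2, 'david', 1200), ('US', 3, 'tom', 150)]
```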
In BigQuery, you can use window functions with aggregation:
select id, name, country, sum_sales
from (select s.id, v.name, s.country, sum(s.sale) as sum_sales,
row_number() over (partition by s.country order by sum(s.sale) desc) as seqnum
from sales s join
vendors v
on v.id = s.id
group by s.id, v.name, s.country
) sv
where seqnum = 1;
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY sum_sale DESC LIMIT 1)[OFFSET(0)]
FROM (
SELECT country, id, name, SUM(sale) sum_sale
FROM `project.dataset.vendors`
JOIN `project.dataset.sales`
USING(id)
GROUP BY id, name, country
) t
GROUP BY country
I am using SQL Server and I have a table "a"
month segment_id price
-----------------------------
1 1 100
1 2 200
2 3 50
2 4 80
3 5 10
I want a query that returns the original columns, keeping only the row with the maximum price in each month.
The result should be:
month segment_id price
----------------------------
1 2 200
2 4 80
3 5 10
I tried to write SQL code:
Select
month, segment_id, max(price) as MaxPrice
from
a
but I got an error:
Column segment_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I tried to fix it in many ways but couldn't find a solution.
You need a GROUP BY clause without segment_id:
Select month, max(price) as MaxPrice
from a
Group By month
since you want results per month, and segment_id is not aggregated in your original SELECT statement.
If you want every row's segment_id together with its month's maximum price repeated on each row, use max() as a window (analytic) function without a GROUP BY clause:
Select month, segment_id,
max(price) over ( partition by month ) as MaxPrice
from a
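One detail worth checking with window MAX: adding ORDER BY inside OVER changes the default frame to "from the start of the partition up to the current row", so MAX becomes a running maximum rather than the month's maximum. This sketch runs both forms in SQLite via Python's sqlite3 module as a stand-in for SQL Server (assumes SQLite 3.25+; both engines apply this default frame), using the question's table a.

```python
import sqlite3

# Sample rows from the question: table a(month, segment_id, price).
ROWS = [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]

# No ORDER BY inside OVER: the frame is the whole month partition.
WHOLE = """
SELECT month, segment_id, MAX(price) OVER (PARTITION BY month) AS MaxPrice
FROM a ORDER BY month, segment_id
"""

# ORDER BY inside OVER: the default frame ends at the current row (running max).
RUNNING = """
SELECT month, segment_id,
       MAX(price) OVER (PARTITION BY month ORDER BY segment_id) AS MaxPrice
FROM a ORDER BY month, segment_id
"""


def run(query, rows=ROWS):
    """Load the rows into an in-memory table a and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
        conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(WHOLE))    # [(1, 1, 200), (1, 2, 200), (2, 3, 80), (2, 4, 80), (3, 5, 10)]
print(run(RUNNING))  # [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]
```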
Edit (based on your latest desired results): you need one more window function, row_number(), as @Gordon already mentioned:
Select month, segment_id, price From
(
Select a.*,
row_number() over ( partition by month order by price desc ) as Rn
from a
) q
Where rn = 1
I would recommend a correlated subquery:
select t.*
from t
where t.price = (select max(t2.price) from t t2 where t2.month = t.month);
The "canonical" solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by month order by price desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery often performs better.
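A sketch that runs both queries from this answer in SQLite through Python's sqlite3 module (assumes SQLite 3.25+ for row_number()) and checks they agree on the sample data. The index on (month, price) is an assumption about what "the right indexes" could look like; whether the correlated subquery actually wins depends on the engine and data, and this sketch does not measure performance.

```python
import sqlite3

# Sample rows from the question: table a(month, segment_id, price).
ROWS = [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]

# For each row, look up the maximum price of its own month.
CORRELATED = """
SELECT t.month, t.segment_id, t.price
FROM a AS t
WHERE t.price = (SELECT MAX(t2.price) FROM a AS t2 WHERE t2.month = t.month)
ORDER BY t.month
"""

# Number rows within each month by descending price and keep the first.
ROW_NUMBER = """
SELECT month, segment_id, price
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY price DESC) AS seqnum
    FROM a
) AS t
WHERE seqnum = 1
ORDER BY month
"""


def run(query, rows=ROWS):
    """Load the rows into an in-memory table a and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
        # Hypothetical index that lets the per-month MAX lookup use an index seek.
        conn.execute("CREATE INDEX a_month_price ON a (month, price)")
        conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(CORRELATED))  # [(1, 2, 200), (2, 4, 80), (3, 5, 10)]
```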
Only because it was not mentioned.
Yet another option is the WITH TIES clause.
To be clear, the approaches by Gordon and Barbaros would be slightly more performant, but this technique neither requires nor generates an extra column.
Select Top 1 with ties *
From YourTable
Order By row_number() over (partition by month order by price desc)
With not exists:
select t.*
from tablename t
where not exists (
select 1 from tablename
where month = t.month and price > t.price
)
or:
select t.*
from tablename t inner join (
select month, max(price) as price
from tablename
group By month
) g on g.month = t.month and g.price = t.price
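The NOT EXISTS and join-to-grouped-max queries differ from row_number() when two rows share a month's top price: they return every tied row, while row_number() keeps exactly one. This sketch runs all three in SQLite via Python's sqlite3 module (assumes SQLite 3.25+), using the question's rows plus one hypothetical extra row, segment 6, that ties segment 2 at 200 in month 1.

```python
import sqlite3

# Question's rows plus a hypothetical tie: segment 6 also has price 200 in month 1.
ROWS = [(1, 1, 100), (1, 2, 200), (1, 6, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)]

# Keep a row if no other row in the same month has a higher price.
NOT_EXISTS = """
SELECT t.month, t.segment_id, t.price
FROM a AS t
WHERE NOT EXISTS (
    SELECT 1 FROM a AS t2
    WHERE t2.month = t.month AND t2.price > t.price
)
ORDER BY t.month, t.segment_id
"""

# Join each row to its month's maximum price.
JOIN_BACK = """
SELECT t.month, t.segment_id, t.price
FROM a AS t
INNER JOIN (
    SELECT month, MAX(price) AS price
    FROM a
    GROUP BY month
) AS g ON g.month = t.month AND g.price = t.price
ORDER BY t.month, t.segment_id
"""

# row_number() picks a single row per month, even when prices tie.
ROW_NUMBER = """
SELECT month, segment_id, price
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY price DESC) AS seqnum
    FROM a
) AS t
WHERE seqnum = 1
ORDER BY month
"""


def run(query, rows=ROWS):
    """Load the rows into an in-memory table a and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
        conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(NOT_EXISTS))  # [(1, 2, 200), (1, 6, 200), (2, 4, 80), (3, 5, 10)]
```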
Grouping by the highest number in a column works great with MAX(), but what if I want the most common value instead?
For example:
ID
100
250
250
300
200
250
So I would like to group by ID, but instead of getting the lowest (MIN) or highest (MAX) number, I would like to get the most common one (that would be 250, because it appears 3 times).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the ids with the highest count. This also handles ties for the highest count.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
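A minimal sketch of the dense_rank() mode query, run in SQLite through Python's sqlite3 module as a stand-in for SQL Server 2012 (assumes SQLite 3.25+; the table name t follows the answers). It returns a list so that ties for the most common value all come back.

```python
import sqlite3

# Sample values from the question.
IDS = [100, 250, 250, 300, 200, 250]

# Count each id, rank the counts, keep every id sharing the top count.
QUERY = """
SELECT id
FROM (
    SELECT id, DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM t
    GROUP BY id
) AS x
WHERE rnk = 1
ORDER BY id
"""


def modes(ids):
    """Return all most-common values (more than one when counts tie)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(modes(IDS))  # [250]
```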
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;
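A sketch of this CTE run in SQLite through Python's sqlite3 module (assumes SQLite 3.25+; the alias the_mode replaces MODE only to avoid any keyword clash). The CTE holds one row per distinct id, so MAX(id) over it is the overall maximum, and the CASE picks the id numbered 1 by count. If several ids tie for the top count, row_number() picks one of them arbitrarily.

```python
import sqlite3

# Sample values from the question.
IDS = [100, 250, 250, 300, 200, 250]

# One row per distinct id with its count; seqnum = 1 marks the most common id.
QUERY = """
WITH cte AS (
    SELECT id, COUNT(*) AS cnt,
           ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS seqnum
    FROM t
    GROUP BY id
)
SELECT MAX(id) AS themax,
       MAX(CASE WHEN seqnum = 1 THEN id END) AS the_mode
FROM cte
"""


def mode_and_max(ids):
    """Return (max, mode) of the values in a single query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


print(mode_and_max(IDS))  # (300, 250)
```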