SQL GROUP BY multiple columns with one non-repeating column

I am attempting to query multiple columns in order to display the heaviest ship for each builder/company name.
With my current query I receive a row for every ship's weight instead of only the heaviest ship for each builder. I have spent a few hours trying to work out what is needed to make the builder column distinct.

You have to remove 'shipname' from the GROUP BY list to get the maximum weight for each builder, then join that query back to the original table to get the ship name, as follows:
select T.builder, T.shipname, T.weight
from ship T join
(
select builder, max(weight) mx
from ship
group by builder
) D
on T.builder=D.builder and T.weight=D.mx
order by T.builder
You may also use the DENSE_RANK() function to get the required results, as follows:
select top 1 with ties
builder, shipname, weight
from ship
order by dense_rank() over (partition by builder order by weight desc)
See a demo on SQL Server (I assumed from the posted image that you are using SQL Server).
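TOP 1 WITH TIES is specific to SQL Server. As a rough, portable sketch of the same ranking idea, the DENSE_RANK() value can be computed in a derived table and filtered on. The example below runs it through Python's sqlite3 module (this assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived); the lighter ships in the sample data are invented for illustration.

```python
import sqlite3

# Sample data: the heaviest ships follow the thread's sample rows.
# The lighter ships are hypothetical, added so the filter has rows to discard.
SHIPS = [
    ("Ace Shipbuilding Corp", "Ocean V", 95000),
    ("Ace Shipbuilding Corp", "Sea Peace", 95000),
    ("Ace Shipbuilding Corp", "Small Fry", 40000),   # hypothetical
    ("Ajax", "Prince Al", 90000),
    ("Ajax", "Dinghy One", 30000),                   # hypothetical
    ("Jones", "Princess of Florida", 95000),
    ("Master", "Queen Shiney", 80000),
]


def heaviest_by_rank():
    """Return the heaviest ship(s) per builder, keeping ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ship (builder TEXT, shipname TEXT, weight INTEGER)")
    conn.executemany("INSERT INTO ship VALUES (?, ?, ?)", SHIPS)
    rows = conn.execute(
        """
        SELECT builder, shipname, weight
        FROM (
            SELECT builder, shipname, weight,
                   DENSE_RANK() OVER (PARTITION BY builder
                                      ORDER BY weight DESC) AS rnk
            FROM ship
        )
        WHERE rnk = 1
        ORDER BY builder, shipname
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in heaviest_by_rank():
        print(row)
```

Because DENSE_RANK() gives tied rows the same rank, both Ace Shipbuilding Corp ships at 95000 are returned, which matches the WITH TIES behaviour of the original query.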

You don't need to apply a GROUP BY clause in this situation. It is sufficient to check whether the ship's weight is the highest weight for the current builder. This can be done with a simple correlated subquery:
SELECT
builder, shipname, weight
FROM
ship
WHERE
weight = (SELECT MAX(i.weight) FROM ship i WHERE i.builder = ship.builder)
ORDER BY builder;
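To check that the correlated subquery and the derived-table join from the earlier answer really agree, here is a small sketch that runs both against the same in-memory SQLite table. The table contents are partly hypothetical (the lighter ships are invented), and an explicit secondary sort on shipname is added so the two result lists can be compared directly.

```python
import sqlite3

# The heaviest ships follow the thread's sample; lighter ones are hypothetical.
SHIPS = [
    ("Ace Shipbuilding Corp", "Ocean V", 95000),
    ("Ace Shipbuilding Corp", "Sea Peace", 95000),
    ("Ace Shipbuilding Corp", "Small Fry", 40000),   # hypothetical
    ("Ajax", "Prince Al", 90000),
    ("Ajax", "Dinghy One", 30000),                   # hypothetical
    ("Jones", "Princess of Florida", 95000),
    ("Master", "Queen Shiney", 80000),
]

# Derived table of per-builder maxima, joined back to the base table.
JOIN_QUERY = """
    SELECT T.builder, T.shipname, T.weight
    FROM ship T
    JOIN (SELECT builder, MAX(weight) AS mx FROM ship GROUP BY builder) D
      ON T.builder = D.builder AND T.weight = D.mx
    ORDER BY T.builder, T.shipname
"""

# Correlated subquery: keep a row only if its weight is its builder's maximum.
SUBQUERY = """
    SELECT builder, shipname, weight
    FROM ship
    WHERE weight = (SELECT MAX(i.weight) FROM ship i WHERE i.builder = ship.builder)
    ORDER BY builder, shipname
"""


def run_both():
    """Run both formulations and return their result lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ship (builder TEXT, shipname TEXT, weight INTEGER)")
    conn.executemany("INSERT INTO ship VALUES (?, ?, ?)", SHIPS)
    joined = conn.execute(JOIN_QUERY).fetchall()
    filtered = conn.execute(SUBQUERY).fetchall()
    conn.close()
    return joined, filtered


if __name__ == "__main__":
    joined, filtered = run_both()
    print(joined == filtered)
```

Both forms return every ship that ties for a builder's maximum, so neither guarantees exactly one row per builder.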

I always think the simplest way to handle a complex query starts with a Common Table Expression (CTE). If you’re new to this, a CTE is a query which is run as a first step, so that you can use its results in the next step.
WITH cte AS (
SELECT builder, max(weight) AS weight
FROM data
GROUP BY builder
)
SELECT *
FROM cte JOIN data ON cte.builder=data.builder AND cte.weight=data.weight;
The CTE above fetches the maximum weight for each builder:
SELECT builder, max(weight) AS weight
FROM data
GROUP BY builder
builder               | weight |
================================
Ace Shipbuilding Corp | 95000  |
Ajax                  | 90000  |
Jones                 | 95000  |
Master                | 80000  |
You now have the maximum weight for each builder.
You then join this result with the original data to fetch the rows which match the weights and builders:
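builder               | weight | builder               | shipname            | weight |
=======================================================================================
Master                | 80000  | Master                | Queen Shiney        | 80000  |
Jones                 | 95000  | Jones                 | Princess of Florida | 95000  |
Ajax                  | 90000  | Ajax                  | Prince Al           | 90000  |
Ace Shipbuilding Corp | 95000  | Ace Shipbuilding Corp | Ocean V             | 95000  |
Ace Shipbuilding Corp | 95000  | Ace Shipbuilding Corp | Sea Peace           | 95000  |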
Note that there is a tie for the Ace Shipbuilding Corp.
The above solution supposes that the data you have in your sample is the original data.

Related

How to group by data in one column and distribute it in another column in HiveSQL?

I have the following data:
CompanyID | Department | No of People | Country |
=================================================
45390     | HR         | 100          | UK      |
45390     | Service    | 250          | UK      |
98712     | Service    | 300          | US      |
39284     | Admin      | 142          | Norway  |
85932     | Admin      | 260          | Germany |
I wish to know how many people belong to the same department from different countries?
Required Output
Department | No of People | Country |
=====================================
HR         | 100          | UK      |
Service    | 250          | UK      |
           | 300          | US      |
Admin      | 142          | Norway  |
           | 260          | Germany |
I was able to get the data with this query, but the Department value is repeated on every row:
select Department, Country, count(Department) from dataset
group by Country, Department
order by Department
How can I get the desired output?

The result set that you are producing is not really a relational result set. Why? Because rows depend on what is in the "previous" row. And in a relational database, there is no such thing as a "previous" row. This type of processing is often handled in the application layer.
Of course, SQL can do what you want. You just need to be careful:
select (case when 1 = row_number() over (partition by Department order by Country)
then Department
end) as Department,
Country, count(*) as num_people
from dataset
group by Country,Department
order by Department, Country;
Note that the order by needs to match the window function clause to be sure that what row_number() considered to be the first row is really the first row in the result set.
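The row_number() trick can be tried out locally. The sketch below uses SQLite through Python's sqlite3 module rather than Hive, so treat it as an illustration of the technique and not as verified HiveQL. Two things are assumptions of this sketch: the "No of People" column is stored as people and summed (to match the required output), and the blanked-out label is given its own alias, dept_label, so that ORDER BY Department unambiguously sorts on the underlying column.

```python
import sqlite3

# Rows copied from the question; the column name "people" is an assumption
# standing in for "No of People".
ROWS = [
    (45390, "HR", 100, "UK"),
    (45390, "Service", 250, "UK"),
    (98712, "Service", 300, "US"),
    (39284, "Admin", 142, "Norway"),
    (85932, "Admin", 260, "Germany"),
]


def department_report():
    """Show each department name only on the first row of its group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset "
        "(CompanyID INTEGER, Department TEXT, people INTEGER, Country TEXT)"
    )
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY Department
                                            ORDER BY Country) = 1
                    THEN Department
               END AS dept_label,
               Country,
               SUM(people) AS num_people
        FROM dataset
        GROUP BY Department, Country
        ORDER BY Department, Country
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for label, country, num_people in department_report():
        # NULL labels come back as None; print them as blanks.
        print(f"{label or '':<10} {country:<10} {num_people}")
```

The final ORDER BY mirrors the window's PARTITION BY and ORDER BY, so the row that received row number 1 is also the first row printed for its department.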

SQL AVG function

What is the correct syntax to display the data from all records using GROUP (or DISTINCT) and AVG?
My data looks like this:
TABLE NAME: Pensje
COLUMNS:
ID, Company, Position, Salary
Example:
ID Company Position Salary
1 Atari Designer 24000
2 Atari Designer 20000
3 Atari Programmer 35000
4 Amiga Director 40000
I need to arrange data in this way (only records from 1 company need to be displayed)
Position a , average Salary from all the records with same Company and Position
Position b , average Salary from all the records with same Company and Position
Ex. Atari
Designer, 22000
Programmer, 35000
My SQL looks like this:
SELECT Position, AVG(Salary)
FROM Pensje
WHERE Company = %s
GROUP BY Position
ORDER BY Position ASC
In the above example, "Position" is displayed correctly but "Salary" is not displayed at all. After removing AVG(), Salary is displayed, but only for the first position found in the table.
Many thanks for taking your time to help me!

Posting an answer from MCP_infiltrator which helped:
In your WHERE clause you should be using WHERE Company LIKE '%s', not =. The way you have it written should produce the results you want, so I do not understand why it is not working properly for you. You may also want to alias your AVG() result, for example AVG(Salary) AS Average_Salary, otherwise your column will come back with no name.
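For reference, here is a minimal sketch of the aliased query run with a bound parameter. It assumes the %s in the question is a driver placeholder; Python's sqlite3 module spells the placeholder ?, and other drivers differ. With the alias in place, the averaged column can be looked up by name.

```python
import sqlite3

# Rows copied from the question's example table.
PENSJE = [
    (1, "Atari", "Designer", 24000),
    (2, "Atari", "Designer", 20000),
    (3, "Atari", "Programmer", 35000),
    (4, "Amiga", "Director", 40000),
]


def average_salaries(company):
    """Return (column_names, rows) of average salary per position for one company."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Pensje "
        "(ID INTEGER, Company TEXT, Position TEXT, Salary INTEGER)"
    )
    conn.executemany("INSERT INTO Pensje VALUES (?, ?, ?, ?)", PENSJE)
    cur = conn.execute(
        """
        SELECT Position, AVG(Salary) AS Average_Salary
        FROM Pensje
        WHERE Company = ?
        GROUP BY Position
        ORDER BY Position ASC
        """,
        (company,),  # bound parameter, no manual quoting needed
    )
    names = [col[0] for col in cur.description]
    rows = cur.fetchall()
    conn.close()
    return names, rows


if __name__ == "__main__":
    names, rows = average_salaries("Atari")
    print(names)
    for row in rows:
        print(row)
```

If the application reads result columns by name, an unaliased AVG(Salary) column is easy to miss, which may explain why the salary appeared to be absent.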

weighted ranking/ combined score in Google Big Query

...Spent several hours trying what not and researching this forum. Quite pessimistic at this point about the usefulness of Google Big Query (GBQ) for anything more than trivial queries, but here is one last desperate try, maybe someone has better ideas:
Let's say we have a COUNTRY table with average population weight(in kilograms) and height (in meters) per country as follows:
country | continent | weight | height |
============================================
US | America | 200 | 2.00 |
Canada | America | 170 | 1.90 |
France | Europe | 160 | 1.78 |
Germany | Europe | 110 | 2.00 |
Let's say you want to pick out and live in the European country with "smallest" people, where you define the measure "smallness" as the weighted sum of body weight and height with some constant weights, such as 0.6 for body weight and 0.4 for body height.
In Oracle or MS SQL server this can be done elegantly and compactly by using analytic window functions such as rank() and row_number(), for example:
select country, combined_score
from (select
country
,( 0.6*rank() over(order by weight) + 0.4*rank() over(order by height) ) combined_score
from country
where continent = 'Europe')
order by combined_score
Note that the ranking is done after the filtering for continent. The continent filter is dynamic (say input from a web form), so the ranking can not be pre-calculated and stored in the table in advance!
In GBQ there are no rank() , row_number() or over(). Even if you try some "poor man" hacks it is still not going to work because GBQ does not support correlated queries. Here are similar attempts by other people with some pretty unsatisfactory and inefficient results:
BigQuery SQL running totals
Row number in BigQuery?
Any ideas how this can be done? I can even restructure the data to use nested records, if it helps. Thank you in advance!

In your specific example, I think you can compute the result without using RANK and OVER at all:
SELECT country, score
FROM (SELECT country, 0.6 * weight + 0.4 * height AS score
FROM t WHERE continent = 'Europe')
ORDER BY score;
However, I'm assuming that this is a toy example and that your real problem involves use of RANK more in line with your example query. In that case, BigQuery does not yet support analytic functions directly, but we'll consider this to be a feature request. :-)
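To make the rank-based version of the score concrete, here is a small sketch that runs it on SQLite (3.25 or newer) through Python's sqlite3 module. It is not BigQuery code; it only illustrates what the question's query is meant to compute, with the ranks calculated after the continent filter is applied.

```python
import sqlite3

# Rows copied from the question's COUNTRY table.
COUNTRIES = [
    ("US", "America", 200, 2.00),
    ("Canada", "America", 170, 1.90),
    ("France", "Europe", 160, 1.78),
    ("Germany", "Europe", 110, 2.00),
]


def combined_scores(continent):
    """Rank weight and height within one continent and combine the ranks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE country "
        "(country TEXT, continent TEXT, weight REAL, height REAL)"
    )
    conn.executemany("INSERT INTO country VALUES (?, ?, ?, ?)", COUNTRIES)
    rows = conn.execute(
        """
        SELECT country, combined_score
        FROM (
            SELECT country,
                   0.6 * RANK() OVER (ORDER BY weight)
                 + 0.4 * RANK() OVER (ORDER BY height) AS combined_score
            FROM country
            WHERE continent = ?   -- filter first, so ranks are per continent
        )
        ORDER BY combined_score
        """,
        (continent,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, score in combined_scores("Europe"):
        print(name, round(score, 2))
```

For Europe, Germany ranks first on weight and second on height, France the reverse, so Germany ends up with the lower (smaller) combined score.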

An equivalent for RANK in BigQuery is row_number().
For example, the top 5 contributors to Wikipedia, with row_number giving their place:
SELECT
ROW_NUMBER() OVER() row_number,
contributor_username,
count,
FROM (
SELECT contributor_username, COUNT(*) count,
FROM [publicdata:samples.wikipedia]
GROUP BY contributor_username
ORDER BY COUNT DESC
LIMIT 5)

SQL Selecting distinct rows from multiple columns based on max value in one column

This is my SQL View - lets call it MyView :
ECode SHCode TotalNrShare CountryCode Country
000001 +00010 100 UKI United Kingdom
000001 ABENSO 900 USA United States
000355 +00012 1000 ESP Spain
000355 000010 50 FRA France
000042 009999 10 GER Germany
000042 +00012 999 ESP Spain
000787 ABENSO 500 USA United States
000787 000150 500 ITA Italy
001010 009999 100 GER Germany
I would like to return the single row with the highest number in the column TotalNrShare for each ECode.
For example, I’d like to return these results from the above view:
ECode SHCode TotalNrShare CountryCode Country
000001 ABENSO 900 USA United States
000355 +00012 1000 ESP Spain
000042 +00012 999 ESP Spain
000787 ABENSO 500 USA United States
001010 009999 100 GER Germany
(Note: in the case of ECode 000787, where there are two SHCodes with 500 each, we can just return the first row rather than both. It isn't important to me which row is returned, since this will happen very rarely and my analysis doesn't need to be 100% accurate.)
I've tried various things but do not seem to be able to return either unique results or the additional country code/country info that I need.
This is one of my attempts (based on other solutions on this site, but I am doing something wrong):
SELECT tsh.ECode, tsh.SHCode, tsh.TotalNrShare, tsh.CountryCode, tsh.Country
FROM dbo.MyView AS tsh INNER JOIN
(SELECT DISTINCT ECode, MAX(TotalNrShare) AS MaxTotalSH
FROM dbo.MyView
GROUP BY ECode) AS groupedtsh ON tsh.ECode = groupedtsh.ECode AND tsh.TotalNrShare = groupedtsh.MaxTotalSH

WITH
sequenced_data AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ECode ORDER BY TotalNrShare DESC) AS sequence_id
FROM
myView
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
This should, however, give the same results as your example query. It's simply a different approach to accomplish the same thing.
As you say that something is wrong, however, please could you elaborate on what is going wrong? Is TotalNrShare actually a string for example? And is that messing up your ordering (and so the MAX())?
EDIT:
Even if the above code was not compatible with your SQL Server, it shouldn't crash it out completely. You should just get an error message. Try executing Select * By Magic, for example, and it should just give an error. I strongly suggest getting your installation of Management Studio looked at and/or re-installed.
In terms of an alternative, you could do this...
SELECT
*
FROM
(SELECT ECode FROM MyView GROUP BY ECode) AS base
CROSS APPLY
(SELECT TOP 1 * FROM MyView WHERE ECode = base.ECode ORDER BY TotalNrShare DESC) AS data
Ideally you would replace the base sub-query with a table that already has a distinct list of all the ECodes that you are interested in.

Try this:
with cte as(
SELECT tsh.ECode, tsh.SHCode, tsh.TotalNrShare, tsh.CountryCode, tsh.Country,
ROW_NUMBER() over (partition by tsh.ECode order by tsh.TotalNrShare desc) as row_num
FROM dbo.MyView AS tsh)
select * from cte where row_num=1
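A quick way to sanity-check the ROW_NUMBER() approach is to run it on the sample rows from the question. The sketch below does that on SQLite (3.25 or newer) via Python's sqlite3 module instead of SQL Server. It assumes the codes are stored as text (to keep leading zeros and the + sign) and adds SHCode as a tie-breaker, which is this sketch's choice; the question says either tied row is acceptable.

```python
import sqlite3

# Rows copied from the question's view; codes are kept as text.
MY_VIEW = [
    ("000001", "+00010", 100, "UKI", "United Kingdom"),
    ("000001", "ABENSO", 900, "USA", "United States"),
    ("000355", "+00012", 1000, "ESP", "Spain"),
    ("000355", "000010", 50, "FRA", "France"),
    ("000042", "009999", 10, "GER", "Germany"),
    ("000042", "+00012", 999, "ESP", "Spain"),
    ("000787", "ABENSO", 500, "USA", "United States"),
    ("000787", "000150", 500, "ITA", "Italy"),
    ("001010", "009999", 100, "GER", "Germany"),
]


def top_share_per_ecode():
    """Return exactly one row per ECode: the one with the highest TotalNrShare."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyView (ECode TEXT, SHCode TEXT, TotalNrShare INTEGER, "
        "CountryCode TEXT, Country TEXT)"
    )
    conn.executemany("INSERT INTO MyView VALUES (?, ?, ?, ?, ?)", MY_VIEW)
    rows = conn.execute(
        """
        WITH sequenced AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY ECode
                                      ORDER BY TotalNrShare DESC, SHCode) AS rn
            FROM MyView
        )
        SELECT ECode, SHCode, TotalNrShare, CountryCode, Country
        FROM sequenced
        WHERE rn = 1
        ORDER BY ECode
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_share_per_ecode():
        print(row)
```

Unlike the MAX() join in the question, ROW_NUMBER() never returns two rows for a tie, which is why ECode 000787 appears only once here.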

SQL command for PROGRESS database

Please bear with me, I am new to SQL. I am trying to write an SQL command with a join in a PROGRESS db, and then select only the first matching record from the join. I thought to use LIMIT, but PROGRESS does not support that. MIN or TOP would also work, I think, but I am having trouble with the syntax. Here is the current syntax:
SELECT esthead_0."k-est-code", estdie_0."estd-size2", estdie_0."k-cmp-no", estdie_0."estd-cal"
FROM VISION.PUB.estdie estdie_0
INNER JOIN VISION.PUB.esthead esthead_0 ON estdie_0."k-est-code" = esthead_0."k-est-code"
WHERE estdie_0."k-cmp-no" = (SELECT MIN("k-cmp-no")
FROM VISION.PUB.estdie estdie_0 )
This will select the MIN from the whole table but I would like the MIN of the records the join returns for each "k-est-code".

To accomplish what you're trying to do, you need to use aggregate functions and GROUP BY.
Here is the corrected query (note that the alias contains hyphens, so it must be quoted):
SELECT esthead_0."k-est-code", estdie_0."estd-size2", MIN(estdie_0."k-cmp-no") AS "k-cmp-no-minimum", estdie_0."estd-cal"
FROM VISION.PUB.estdie estdie_0
INNER JOIN VISION.PUB.esthead esthead_0 ON estdie_0."k-est-code" = esthead_0."k-est-code"
GROUP BY esthead_0."k-est-code", estdie_0."estd-size2", estdie_0."estd-cal"
The general syntax for adding a GROUP BY / Aggregate query is:
use an aggregate function like MIN(), MAX(), AVG(), SUM() to select which column you want ... (choose the function depending on whether you want minimum, maximum etc). There are those I listed which are standard, and then often your database will give you some additional ones as well.
Add every other column you're selecting EXCEPT the ones in the function to a GROUP BY at the end of your query.
Your GROUP BY has to occur after your WHERE, but before your ORDER BY.
If you want to do WHERE-like filtering on the function (say you wanted only k-cmp-no over 100), you use HAVING after the group by, e.g.:
HAVING MIN(estdie_0."k-cmp-no") > 100
Google for Group By and Aggregate functions for more info on this SQL concept. It works the same in all databases as it's standard ANSI SQL. See this page for a more thorough introduction with examples: http://www.w3schools.com/sql/sql_groupby.asp
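The GROUP BY and HAVING steps above can be sketched on a toy table. The example uses SQLite through Python's sqlite3 module rather than Progress, and the table contents and simplified column names (est_code, cmp_no) are hypothetical stand-ins for "k-est-code" and "k-cmp-no".

```python
import sqlite3

# Hypothetical rows standing in for VISION.PUB.estdie.
ESTDIE = [
    ("E1", 150),
    ("E1", 120),
    ("E2", 90),
    ("E2", 300),
    ("E3", 101),
]


def min_cmp_per_estimate(threshold=None):
    """Return the minimum cmp_no per est_code.

    If threshold is given, keep only groups whose minimum exceeds it,
    which is the HAVING step described in the text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE estdie (est_code TEXT, cmp_no INTEGER)")
    conn.executemany("INSERT INTO estdie VALUES (?, ?)", ESTDIE)
    sql = """
        SELECT est_code, MIN(cmp_no) AS min_cmp_no
        FROM estdie
        GROUP BY est_code
    """
    params = ()
    if threshold is not None:
        # HAVING filters on the aggregate, after the groups are formed.
        sql += " HAVING MIN(cmp_no) > ?"
        params = (threshold,)
    sql += " ORDER BY est_code"
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(min_cmp_per_estimate())
    print(min_cmp_per_estimate(threshold=100))
```

Grouping only by the estimate code is what yields one row per estimate; every extra column added to the GROUP BY list can split a code into several groups.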

Progress (OE 11.2) supports OFFSET ... FETCH, which is the same as LIMIT ... OFFSET in MySQL.
Example:
SQLExplorer>select FirstName , LastName , EmpNum from pub.employee order by empnum offset 10 rows fetch next 10 rows only;
FirstName LastName EmpNum
------------------------------ -------------------------------------------------- -----------
Frank Garsen 11
Jenny Morris 12
Luke Sanders 13
Marcy Adams 14
Alex Simons 15
Holly Atkins 16
Larry Barry 17
Jean Brady 18
Larry Dawsen 19
Dan Flanagan 20
Hope this helps
