How to Select and Order By columns not in Groupy By SQL statement - Oracle - sql

I have the following statement:
SELECT
IMPORTID,Region,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
GROUP BY
IMPORTID, Region,RefObligor
Order BY
IMPORTID, Region,RefObligor
There exists some extra columns in table Positions that I want as output for "display data" but I don't want in the group by statement.
These are Site, Desk
Final output would have the following columns:
IMPORTID,Region,Site,Desk,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
Ideally I'd want the data sorted like:
Order BY
IMPORTID,Region,Site,Desk,RefObligor
How to achieve this?

It does not make sense to include columns that are not part of the GROUP BY clause. Consider if you have a MIN(X), MAX(Y) in the SELECT clause, which row should other columns (not grouped) come from?
If your Oracle version is recent enough, you can use SUM - OVER() to show the SUM (grouped) against every data row.
SELECT
IMPORTID,Site,Desk,Region,RefObligor,
SUM(NOTIONAL) OVER(PARTITION BY IMPORTID, Region,RefObligor) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
Order BY
IMPORTID,Region,Site,Desk,RefObligor
Alternatively, you need to make an aggregate out of the Site, Desk columns
SELECT
IMPORTID,Region,Min(Site) Site, Min(Desk) Desk,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
GROUP BY
IMPORTID, Region,RefObligor
Order BY
IMPORTID, Region,Min(Site),Min(Desk),RefObligor

I believe this is
select
IMPORTID,
Region,
Site,
Desk,
RefObligor,
Sum(Sum(Notional)) over (partition by IMPORTID, Region, RefObligor)
from
Positions
group by
IMPORTID, Region, Site, Desk, RefObligor
order by
IMPORTID, Region, RefObligor, Site, Desk;
... but it's hard to tell without further information and/or test data.

A great blog post that covers this dilemma in detail is here:
http://bernardoamc.github.io/sql/2015/05/04/group-by-non-aggregate-columns/
Here are some snippets of it:
Given:
CREATE TABLE games (
game_id serial PRIMARY KEY,
name VARCHAR,
price BIGINT,
released_at DATE,
publisher TEXT
);
INSERT INTO games (name, price, released_at, publisher) VALUES
('Metal Slug Defense', 30, '2015-05-01', 'SNK Playmore'),
('Project Druid', 20, '2015-05-01', 'shortcircuit'),
('Chroma Squad', 40, '2015-04-30', 'Behold Studios'),
('Soul Locus', 30, '2015-04-30', 'Fat Loot Games'),
('Subterrain', 40, '2015-04-30', 'Pixellore');
SELECT * FROM games;
game_id | name | price | released_at | publisher
---------+--------------------+-------+-------------+----------------
1 | Metal Slug Defense | 30 | 2015-05-01 | SNK Playmore
2 | Project Druid | 20 | 2015-05-01 | shortcircuit
3 | Chroma Squad | 40 | 2015-04-30 | Behold Studios
4 | Soul Locus | 30 | 2015-04-30 | Fat Loot Games
5 | Subterrain | 40 | 2015-04-30 | Pixellore
(5 rows)
Trying to get something like this:
SELECT released_at, name, publisher, MAX(price) as most_expensive
FROM games
GROUP BY released_at;
But name and publisher are not added due to being ambiguous when aggregating...
Let’s make this clear:
Selecting the MAX(price) does not select the entire row.
The database can’t know and when it can’t give the right answer every
time for a given query it should give us an error, and that’s what it
does!
Ok… Ok… It’s not so simple, what can we do?
Use an inner join to get the additional columns
SELECT g1.name, g1.publisher, g1.price, g1.released_at
FROM games AS g1
INNER JOIN (
SELECT released_at, MAX(price) as price
FROM games
GROUP BY released_at
) AS g2
ON g2.released_at = g1.released_at AND g2.price = g1.price;
Or Use a left outer join to get the additional columns, and then filter by the NULL of a duplicate column...
SELECT g1.name, g1.publisher, g1.price, g2.price, g1.released_at
FROM games AS g1
LEFT OUTER JOIN games AS g2
ON g1.released_at = g2.released_at AND g1.price < g2.price
WHERE g2.price IS NULL;
Hope that helps.

Related

Trying to display Using Sum() with JOIN doesn't provide correct output

I'm trying to create a query that displays a user's Id, the sum of total steps, and sum of total calories burnt.
The data for steps and calories are within two datasets, so I used JOIN. However, when I write out the query, the joined data does not look correct. However when I do them separately, it appears to show the correct data
Below are my queries...I am fairly new to SQL, so I am somewhat confused on what I did wrong. How do I correct this? Thank you in advanced for the help!
For the Steps table, "Id" and "StepTotal" are Integers. For the Calories table, "Id" and "Calories" are also Integers.
SELECT steps.Id,Sum(StepTotal) AS Total_steps,Sum(cal.Calories) as Total_calories
FROM fitbit.Daily_steps AS steps
JOIN fitbit.Daily_calories AS cal ON steps.Id=cal.Id
GROUP BY Id
Given Output(Picture)
Expected Output(Picture)
For Steps
SELECT Id,Sum(StepTotal) AS Total_steps
FROM fitbit.Daily_steps
group by Id
Id
Total_steps
1503960366
375619
1624580081
178061
1644430081
218489
For Calories
SELECT Id,Sum(Calories) AS Total_calories
FROM fitbit.Daily_calories
group by Id
Id
Total_calories
1503960366
56309
1624580081
45984
1644430081
84339
I believe your current solution is returning additional rows as the result of the JOIN.
Let's look at an example data set
Steps
id | total
a | 5
a | 7
b | 3
Calories
id | total
a | 100
a | 300
b | 400
Now, if we SELECT * FROM Calories, we'd get 3 rows. If we SELECT * FROM Calories GROUP BY id, we'd get two rows.
But if we use a JOIN:
SELECT Steps.id, Steps.total AS steps, Calories.total AS cals FROM Steps
JOIN Calories
ON Steps.id = Calories.id
WHERE id = 'a'
This would return the following:
Steps_Calories
id | steps | cals
a | 5 | 100
a | 5 | 300
a | 7 | 100
a | 7 | 300
So now if we GROUP BY & SUM(steps), we get 24, instead of the expected 12, because the JOIN returns each pairing of steps & calories.
To mitigate this, we can use sub-queries & group & sum within the sub-queries
SELECT Steps.id, Steps.total AS steps, Calories.total AS cals
FROM (SELECT id, SUM(total) FROM Steps GROUP BY id) as step_totals
JOIN (Select id, SUM(total) FROM Cals GROUP BY id) as cal_totals
JOIN Calories
ON cal_totals.id = step_totals.id
Now each subquery only returns a single row for each id, so the join only returns a single row as well.
Of course, you'll have to adapt this for your schema.

Finding the most recent date in SQL for a range of rows

I have a table of course work marks, with the table headings:
Module code, coursework numbers, student, date submitted, mark
Sample data in order of table headings:
Maths, 1, Parry, 12-JUN-92, 20
Maths, 2, Parry, 13-JUN-92, 20
Maths, 2, Parry, 15-JUN-92, 25
Expected data after query
Maths, 1, Parry, 12-JUN-92, 20
Maths, 2, Parry, 15-JUN-92, 25
Sometimes a student retakes an exam and they have an additional row for a piece of coursework.
I need to try get only the latest coursework’s in a table. The following works when I isolate a particular student:
SELECT *
FROM TABLE
WHERE NAME = ‘NAME’
AND DATE IN (SELECT MAX(DATE)
FROM TABLE
WHERE NAME = ‘NAME’
GROUP BY MODULE_CODE, COURSEWORK_NUMBER, STUDENT)
This provides the correct solution for that person, giving me the most recent dates for each row (each coursework) in the table. However, this:
SELECT *
FROM TABLE
AND DATE IN (SELECT MAX(DATE)
FROM TABLE
GROUP BY MODULE_CODE, COURSEWORK_NUMBER, STUDENT)
Does not provide me with the same table but for every person who has attempted the coursework. Where am I going wrong? Sorry if the details are a bit sparse, but I’m worried about plagiarism.
Working with SQL plus
This is a good spot to use Oracle keep syntax:
select
module_code,
course_work_number,
student,
max(date_submitted) date_submitted,
max(mark) keep(dense_rank first order by date_submitted desc) mark
from mytable
group by module_code, course_work_number, student
Demo on DB Fiddle:
MODULE_CODE | COURSE_WORK_NUMBER | STUDENT | DATE_SUBMITTED | MARK
:---------- | -----------------: | :------ | :------------- | ---:
Maths | 1 | Parry | 12-JUN-92 | 20
Maths | 2 | Parry | 15-JUN-92 | 25
You are looking for a groupwise maximum. See this article from MySQL:
https://dev.mysql.com/doc/refman/8.0/en/example-maximum-column-group-row.html
I'm not sure about the correct syntax for Oracle, but it should be similar. At least the query structure should put you on the right path.
You could use the row_number function to solve this:
select x.*
(SELECT a.*,row_number() over(partition by name order by date desc) as row1
FROM TABLE a)x
where x.row1=1
The idea is to assign a row number based on the date and then select the cases where row number is 1. Hope this helps.

How to query partitions that you get from using window functions?

I have a table that has the following structure
------------------------------------
|Company_ID| Company_Name| Join_Key|
------------------------------------
| 1 | ACompany | AC |
| 2 | BCompany | BC |
While this table doesn't have many column, there are somewhere around 4 million rows.
I want to calculate some string distance calculations on these company names. I have the following query
select a.Company_Name as Name1,
b.Company_Name as Name2,
Fuzzy_Match(a.Company_Name, b.Company_Name, 'JaccardDistance') as Jaccard --this is a custom function
from [Companies] a, [Companies] b
While something like this would work on a smaller database, since my database is so large, there is no way for me to be able to get through all of the combinations in a reasonable amount of time. So I thought about partitioning the database with a window function.
select Company_Name,
ROW_NUMBER() over(partition by Join_Key order by Join_Key asc) as row_num
Join_Key
from [Companies]
This gives me a list of the companies numbered and partitioned by their join_key, but the thing that I'm not sure of is how to do both things.
How can I perform a cross join and calculate the string similarity measures for each partition so that I'm only comparing companies that both have 'AC' as their join key?

How can I SELECT the max row in a table SQL?

I have a little problem.
My table is:
Bill Product ID Units Sold
----|-----------|------------
1 | 10 | 25
1 | 20 | 30
2 | 30 | 11
3 | 40 | 40
3 | 20 | 20
I want to SELECT the product which has sold the most units; in this sample case, it should be the product with ID 20, showing 50 units.
I have tried this:
SELECT
SUM(pv."Units sold")
FROM
"Products" pv
GROUP BY
pv.Product ID;
But this shows all the products, how can I select only the product with the most units sold?
Leaving aside for the moment the possibility of having multiple products with the same number of units sold, you can always sort your results by the sum, highest first, and take the first row:
SELECT pv."Product ID", SUM(pv."Units sold")
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY SUM(pv."Units sold") DESC
LIMIT 1
I'm not quite sure whether the double-quote syntax for column and table names will work - exact syntax will depend on your specific RDBMS.
Now, if you do want to get multiple rows when more than one product has the same sum, then the SQL will become a bit more complicated:
SELECT pv.`Product ID`, SUM(pv.`Units sold`)
FROM `Products` pv
GROUP BY pv.`Product ID`
HAVING SUM(pv.`Units sold`) = (
select max(sums)
from (
SELECT SUM(pv2.`Units sold`) as "sums"
FROM `Products` pv2
GROUP BY pv2.`Product ID`
) as subq
)
Here's the sqlfiddle
SELECT SUM(pv."Units sold") as `sum`
FROM "Products" pv
group by pv.Product ID
ORDER BY sum DESC
LIMIT 1
limit 1 + order by
The Best and effective way to this is Max function
Here's The General Syntax of Max function
SELECT MAX(ID) AS id
FROM Products;
and in your Case
SELECT MAX(Units Sold) from products
Here is the Complete Reference to MIN and MAX functions in Query
Click Here

Select a row used for GROUP BY

I have this table:
id | owner | asset | rate
-------------------------
1 | 1 | 3 | 1
2 | 1 | 4 | 2
3 | 2 | 3 | 3
4 | 2 | 5 | 4
And i'm using
SELECT asset, max(rate)
FROM test
WHERE owner IN (1, 2)
GROUP BY asset
HAVING count(asset) > 1
ORDER BY max(rate) DESC
to get intersection of assets for specified owners with best rate.
I also need id of row used for max(rate), but i can't find a way to include it to SELECT. Any ideas?
Edit:
I need
Find all assets that belongs to both owners (1 and 2)
From the same asset i need only one with the best rate (3)
I also need other columns (owner) that belongs to the specific asset with best rate
I expect the following output:
id | asset | rate
-------------------------
3 | 3 | 3
Oops, all 3s, but basically i need id of 3rd row to query the same table again, so resulting output (after second query) will be:
id | owner | asset | rate
-------------------------
3 | 2 | 3 | 3
Let's say it's Postgres, but i'd prefer reasonably cross-DBMS solution.
Edit 2:
Guys, i know how to do this with JOINs. Sorry for misleading question, but i need to know how to get extra from existing query. I already have needed assets and rates selected, i just need one extra field among with max(rate) and given conditions if it's possible.
Another solution that might or might not be faster than a self join (depending on the DBMS' optimizer)
SELECT id,
asset,
rate,
asset_count
FROM (
SELECT id,
asset,
rate,
rank() over (partition by asset order by rate desc) as rank_rate,
count(asset) over (partition by null) as asset_count
FROM test
WHERE owner IN (1, 2)
) t
WHERE rank_rate = 1
ORDER BY rate DESC
You are dealing with two questions and trying to solve them as if they are one. With a subquery, you can better refine by filtering the list in the proper order first (max(rate)), but as soon as you group, you lose this. As such, i would set up two queries (same procedure, if you are using procedures, but two queries) and ask the questions separately. Unless ... you need some of the information in a single grid when output.
I guess the better direction to head is to have you show how you want the output to look. Once you bake the input and the output, the middle of the oreo is easier to fill.
SELECT b.id, b.asset, b.rate
from
(
SELECT asset, max(rate) maxrate
FROM test
WHERE owner IN (1, 2)
GROUP BY asset
HAVING count(asset) > 1
) a, test b
WHERE a.asset = b.asset
AND a.maxrate = b.rate
ORDER BY b.rate DESC
You don't specify what type of database you're running on, but if you have analytical functions available you can do this:
select id, asset, max_rate
from (
select ID, asset, max(rate) over (partition by asset) max_rate,
row_number() over (partition by asset order by rate desc) row_num
from test
where owner in (1,2)
) q
where row_num = 1
I'm not sure how to add in the "having count(asset) > 1" in this way though.
This first searches for rows with the maximum rate per asset. Then it takes the highest id per asset, and selects that:
select *
from test
inner join
(
select max(id) as MaxIdWithMaxRate
from test
inner join
(
select asset
, max(rate) as MaxRate
from test
group by
asset
) filter
on filter.asset = test.asset
and filter.MaxRate = test.rate
group by
asset
) filter2
on filter.MaxIdWithMaxRate = test.id
If multiple assets share the maximum rate, this will display the one with the highest id.