Oracle SQL query to GROUP BY and subtract? - sql

If I have four columns: A, B, C and D in a table how would an Oracle SQL query group by column D, then amongst each grouping select the rows where C = 'c' and for those selected rows, returns the value of B minus A?

SELECT Aggfunction(B - A), D FROM TABLENAME WHERE C='c' GROUP BY D
Replace Aggfunction with aggregate function you want e.g. SUM or AVG. You can only include ungrouped columns from a grouped sql query result set in an aggregate function (which makes sense because you only get one record out per group so have to accumulate the ungrouped columns in some way in order to represent a value per group)

Related

Filter SQL by Aggregate Not in SELECT Statement

Can you filter a SQL table based on an aggregated value, but still show column values that weren't in the aggregate statement?
My table has only 3 columns: "Composer_Tune", "_Year", and "_Rank".
I want to use SQL to find which "Composer_Tune" values are repeated in each annual list, as well as which ranks the duplicated items had.
Since I am grouping by "Composer_Tune" & "Year", I can't list "_Rank" with my current code.
The image shows the results of my original "find the duplicates" query vs what I want:
Current vs Desired Results
I tried applying the concepts in this Aggregate Subquery StackOverflow post but am still getting "_Rank is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" from this code:
WITH DUPE_DB AS (SELECT * FROM DB.dbo.[NAME] GROUP BY Composer_Tune, _Year HAVING COUNT(*)>1)
SELECT Composer_Tune, _Year, _Rank
FROM DUPE_DB
You need to explicitly declare the columns used in the Group By expression in the select columns.
You can use the following documentation if you are using transact sql for the proper use of Group By.
Simply join the aggregated resultset to original unit level table:
WITH DUPE_DB AS (
SELECT Composer_Tune, _Year
FROM DB.dbo.[NAME]
GROUP BY Composer_Tune, _Year
HAVING COUNT(*) > 1
)
SELECT n.Composer_Tune, n._Year, n._Rank
FROM DB.dbo.[NAME] n
INNER JOIN DUPE_DB
ON n.Compuser_Tune = DUPE_DB.Composer_Tune
AND n._Year = DUPE_DB._Year
ORDER n.Composer_Tune, n._Year

Combine multiple rows with N-1 identical columns and 1 different column into one row, preserving the first N-1 columns and summing the last column

I have a query that produces a table with 26 columns, A-Z. For some rows, columns A-Y are identical, and column Z is the only one that differs. Is there an easy and clean way to combine duplicate rows, such that columns A-Y are the same and column Z is summed over? My solution is to do something like
SELECT A, B, C,...,Y,SUM(Z)
-- lots of work
FROM [table produced by multiple joins]
GROUP BY A, B, C,...,Y
The last GROUP BY clause ends up being very long. It's also prone to making mistakes if columns are ever added or removed from the SELECT statement. Is this the only way to go about what I want to do?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
ANY_VALUE((SELECT AS STRUCT t.* EXCEPT(z))).*,
SUM(z) AS z
FROM `project.dataset.table_produced_by_multiple_joins` t
GROUP BY FORMAT('%t', (SELECT AS STRUCT t.* EXCEPT(z)))

SQL Basic Syntax

I have the following problem:
What happens if the query didn't ask for B in the select?. I think it would give an error because the aggregate is computed based on the values in the select clause.
I have the following relation schema and queries:
Suppose R(A,B) is a relation with a single tuple (NULL, NULL).
SELECT A, COUNT(B)
FROM R
GROUP BY A;
SELECT A, COUNT(*)
FROM R
GROUP BY A;
SELECT A, SUM(B)
FROM R
GROUP BY A;
The first query returns NULL and 0. I am not sure about what the second query returns. The aggregate COUNT(*) count the number of tuples in one table; however, I don't know what it does to a group. The third returns NULL,NULL
The only rule about SELECT and GROUP BY is that the unaggregated columns in the SELECT must be in the GROUP BY (with very specific exceptions).
You can have columns in the GROUP BY that never appear in the SELECT. That is fine. It doesn't affect the definition of a group, but multiple rows may seem to have the same values in the GROUP BY columns.

Postgres select distinct based on timestamp

I have a table with two columns a and b where a is an ID and b is a timestamp.
I need to select all of the a's which are distinct but I only care about the most up to date row per ID.
I.e. I need a way of selecting distinct a's conditional on the b values.
Is there a way to do this using DISTINCT ON in postgres?
Cheers
Like #a_horse_with_no_name suggests, the solution is
SELECT DISTINCT ON (a) a, b FROM the_table ORDER BY a, b DESC
As the manual says,
Note that the "first row" of a set is unpredictable unless the query
is sorted on enough columns to guarantee a unique ordering of the rows
arriving at the DISTINCT filter. (DISTINCT ON processing occurs after
ORDER BY sorting.)
As posted by the upvoted answers, SELECT DISTINCT ON (a) a, b FROM the_table ORDER BY a, b DESC works on Postgre 12. However, I am posting this answer to highlight few important points:
The results will be sorted based on column a; not column b.
Within each result row, the most recent (highest value) for column b would be picked.
In case, someone wants to get the most recent value for column b on the entire result set, in sql, we can run : SELECT MAX(b) from (SELECT DISTINCT ON (a) a, b FROM the_table ORDER BY a, b DESC).

SQL two criteria from one group-by

I have a table with some "functionally duplicate" records - different IDs, but the 4 columns of "user data" (of even more columns) are identical. I've got a query working that will select all records that have such duplicates.
Now I want to select, from each group of duplicates, first any of them that have column A not null - and I've verified from the data that there are at most 1 such rows per group - and if there are none in this particular group, then the minimum of column ID.
How do I select that? I can't exactly use a non-aggregate in the THEN of a CASE and an aggregate in the ELSE. E.g. this doesn't work:
SELECT CASE
WHEN d.A IS NULL THEN d.ID
ELSE MIN(d.ID) END,
d.B,
d.C,
d.E,
d.F
FROM TABLE T
JOIN (my duplicate query here) D ON T.B=D.B
AND T.C=D.C
AND T.E=D.E
AND T.F=D.F
GROUP BY T.B,
T.C,
T.E,
T.F
Error being:
column A must appear in the GROUP BY clause or be used in an aggregate function.
This can be radically simpler:
SELECT DISTINCT ON (b, c, e, f)
b, c, e, f, id -- add more columns freely
FROM (<duplicate query here>) sub
ORDER BY b, c, e, f, (a IS NOT NULL), id
Your duplicate query has all columns. No need to JOIN to the base table again.
Use the Postgres extension of the standard SQL DISTINCT: DISTINCT ON:
Select first row in each GROUP BY group?
Postgres has a proper boolean type. You can ORDER BY boolean expression directly. The sequence is FALSE (0), TRUE (1), NULL (NULL). If a is NULL, this expression is FALSE and sorts first: (a IS NOT NULL). The rest is ordered by id. Voilá.
Selection of ID happens automatically. According to your description you want the ID of the row selected in this query. Nothing more to do.
You can probably integrate this into your duplicate query directly.