Display max value of a sum using group by - sql

Cheers everybody,
I've been trying endlessly to display only the max value of COBRANCATOTAL for each year and display the name and nif of the client,
for instance the query result is :
Current Result
Therefore the result should be
227518698 | Rui | G | 2015 | 100
227518699 | Sara | G | 2016 | 100
227518693 | Paulo Pereira | G | 2014 | 43
227518691 | Diogo Batista | G | 2017 | 2
I can't seem to remove the other values, just to appear the maximum for each year.

You can do this using window functions. Here is one method:
with t as (
<your query here>
)
select t.*
from (select t.*, row_number() over (partition by year order by cobrancatotal desc) as seqnum
from t
) t
where seqnum = 1;
You should also learn to use proper, explicit JOIN syntax. Commas in the FROM clause are from archaic versions of SQL.

Related

Self join to create a new column with updated records

I am trying to write a SQL query to get the start date for employees in a store. As seen in the first screenshot, employee number 5041 had the number A0EH but as the number got updated, it updated the start date for the employee as well. This effects the metric of total duration in the store.
I am trying to get to the output below but haven't been able to figure out how to get this view.
This is the code I was trying but I am not getting the correct output.
select
esd.employee_number,
(case when esd.old_employee_number is null then es.employee_number else es.old_employee_number end) as old_employee_number,
esd.entity_id,
esd.original_start_date
from earliest_start_date as esd
left join earliest_start_date as es
on (es.employee_number = esd.old_employee_number)
How do I solve this on SQL?
Redshift reportedly supports recursion via WITH clause. Here's an example:
MariaDB 10.5 has similar support. Test case is here:
Fully working test case (via MariaDB 10.5) (Updated)
Link to Amazon Redshift detail for WITH clause and window functions:
Amazon Redshift - WITH clause
Amazon redshift - Window functions
WITH RECURSIVE cte (employee_number, original_no, entity_id, original_start_date, n) AS (
SELECT employee_number, employee_number, entity_id, original_start_date, 1 FROM earliest_start_date WHERE old_employee_number IS NULL UNION ALL
SELECT new_tbl.employee_number, cte.original_no, cte.entity_id, cte.original_start_date, n+1
FROM earliest_start_date new_tbl
JOIN cte
ON cte.employee_number = new_tbl.old_employee_number
)
, xrows AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY n DESC) AS rn
FROM cte
)
SELECT * FROM xrows WHERE rn = 1
;
Result:
+-----------------+-------------+-----------+---------------------+------+----+
| employee_number | original_no | entity_id | original_start_date | n | rn |
+-----------------+-------------+-----------+---------------------+------+----+
| XXXX | XXXX | 88 | 2021-09-02 | 1 | 1 |
| 5041 | A0EH | 96 | 2021-09-05 | 2 | 1 |
+-----------------+-------------+-----------+---------------------+------+----+
2 rows in set
Raw test data:
SELECT * FROM earliest_start_date;
+-----------------+---------------------+-----------+---------------------+
| employee_number | old_employee_number | entity_id | original_start_date |
+-----------------+---------------------+-----------+---------------------+
| 5041 | A0EH | 96 | 2021-09-10 |
| A0EH | NULL | 96 | 2021-09-05 |
| XXXX | NULL | 88 | 2021-09-02 |
+-----------------+---------------------+-----------+---------------------+
Note that the logic makes assumption about uniqueness of the employee_number and, in the current form, can't handle cases where the employee_number is reused by the same employee or used again with a different employee without adjusting prior data. There may not be enough detail in the current structure to handle those cases.

POSTGRESQL : How to select the first row of each group?

With this query :
WITH responsesNew AS
(
SELECT DISTINCT responses."studentId", notation, responses."givenHeart",
SUM(notation + responses."givenHeart") OVER (partition BY responses."studentId"
ORDER BY responses."createdAt") AS total, responses."createdAt",
FROM responses
)
SELECT responsesNew."studentId", notation, responsesNew."givenHeart", total,
responsesNew."createdAt"
FROM responsesNew
WHERE total = 3
GROUP BY responsesNew."studentId", notation, responsesNew."givenHeart", total,
responsesNew."createdAt"
ORDER BY responsesNew."studentId" ASC
I get this data table :
studentId | notation | givenHeart | total | createdAt |
----------+----------+------------+-------+--------------------+
374 | 1 | 0 | 3 | 2017-02-13 12:43:03
374 | null | 0 | 3 | 2017-02-15 22:22:17
639 | 1 | 2 | 3 | 2017-04-03 17:21:30
790 | 1 | 0 | 3 | 2017-02-12 21:12:23
...
My goal is to keep only in my data table the early row of each group like shown below :
studentId | notation | givenHeart | total | createdAt |
----------+----------+------------+-------+--------------------+
374 | 1 | 0 | 3 | 2017-02-13 12:43:03
639 | 1 | 2 | 3 | 2017-04-03 17:21:30
790 | 1 | 0 | 3 | 2017-02-12 21:12:23
...
How can I get there?
I've read many topics over here but nothing I've tried with DISTINCT, DISTINCT ON, subqueries in WHERE, LIMIT, etc have worked for me (surely due to my poor understanding). I've met errors related to window function, missing column in ORDER BY and a few others I can't remember.
You can do this with distinct on. The query would look like this:
WITH responsesNew AS (
SELECT DISTINCT r."studentId", notation, r."givenHeart",
SUM(notation + r."givenHeart") OVER (partition BY r."studentId"
ORDER BY r."createdAt") AS total,
r."createdAt"
FROM responses r
)
SELECT DISTINCT ON (r."studentId") r."studentId", notation, r."givenHeart", total,
r."createdAt"
FROM responsesNew r
WHERE total = 3
ORDER BY r."studentId" ASC, r."createdAt";
I'm pretty sure this can be simplified. I just don't understand the purpose of the CTE. Using SELECT DISTINCT in this way is very curious.
If you want a simplified query, ask another question with sample data, desired results, and explanation of what you are doing and include the query or a link to this question.
use Row_number() window function to add a row number to each partition and then only show row 1.
no need to fully qualify names if only one table is involved. and use aliases when qualifying to simplify readability.
WITH responsesNew AS
(
SELECT "studentId"
, notation
, "givenHeart"
, SUM(notation + "givenHeart") OVER (partition BY "studentId" ORDER BY "createdAt") AS total
, "createdAt"
, Row_number() OVER ("studentId" ORDER BY "createdAt") As RNum
FROM responses r
)
SELECT RN."studentId"
, notation, RN."givenHeart"
, total
, RN."createdAt"
FROM responsesNew RN
WHERE total = 3
AND RNum = 1
GROUP BY RN."studentId"
, notation
, RN."givenHeart", total
, RN."createdAt"
ORDER BY RN."studentId" ASC

Show last update date

I am new in this forum and also new in SQL my question is
I have an Excel sheet link to database with "From Microsoft query" I have 3 tables link together pd_ln,pdcflbrt,pdlbr
By using the following query I am getting this data
SELECT pdcflbrt.lbrcod, pdcflbrt.lbrrat, pd_ln.prdnum, pdcflbrt.begeffdat
FROM velocity.dbo.pd_ln pd_ln, velocity.dbo.pdcflbrt pdcflbrt, velocity.dbo.pdlbr pdlbr
WHERE pdlbr.lbrrattky = pdcflbrt.lbrrattky AND pd_ln.pd_ln_tky = pdlbr.pd_ln_tky
+--------------+--------------+-----------+------------------+
| lbrcod | lbrrat | prdnum | begeffdat |
+--------------+--------------+-----------+------------------+
| FC Braselton | 0.11 | 00236 | 7/15/2012 0:00 |
| FC Braselton | 0.11 | 00236 | 7/15/2012 0:00 |
| FC Braselton | 0.1 | 00236 | 12/10/2012 0:00 |
| Sizing | 0.21 | 03103 | 8/28/2015 0:00 |
| Sizing | 0.2 | 03103 | 10/13/2011 0:00 |
+--------------+--------------+-----------+------------------+
How do I query to get the last begeffdat of each prdnum.
Magood's answer may work in this situation. However, if there was a unique identifier for each edit that you were selecting, it wouldn't work. As far as I know, you would have to get involved with row_number() like so:
SELECT s2.lbrcod, s2.lbrrat, s2.prdnum, s2.begeffdat from
(SELECT pdcflbrt.lbrcod
, pdcflbrt.lbrrat
, pd_ln.prdnum
, pdcflbrt.begeffdat
, row_number() over (partition by pd_ln.prdnum order by pdcflbrt.begeffdat desc) as RN
FROM velocity.dbo.pd_ln pd_ln, velocity.dbo.pdcflbrt pdcflbrt, velocity.dbo.pdlbr pdlbr
WHERE pdlbr.lbrrattky = pdcflbrt.lbrrattky AND pd_ln.pd_ln_tky = pdlbr.pd_ln_tky) s2
where s2.rn = 1
This will return only the top date (it is the same query on the inner portion, but with the row_number() function added, with each different prdnum starting the numbers over, and ordering the rows by date, with the newest date first. The outer portion selects only row 1 (that's the last where) which is the newest date.
EDIT: Alternatively, if you only want the OLDEST update, you could change the desc in the main query's select statement to say asc.
-- Only for name and latest date
select lbrcod, max(begeffdate) begeffdat from #table
group by lbrcod
-- For all columns
select * from (
select *, row_number() over (partition by prdnum order by begeffdate desc) rowNum from #table
) data
where rowNum = 1

CTE to represent a logical table for the rows in a table which have the max value in one column

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record. In the example below,
This is the physical representation of the table:
seq id name | CRUD |
----|-----|--------|------|
1 | 10 | john | C |
2 | 10 | joe | U |
3 | 11 | kent | C |
4 | 12 | katie | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
This is the logical representation of the table, considering the "most recent" records:
seq id name | CRUD |
----|-----|--------|------|
2 | 10 | joe | U |
3 | 11 | kent | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P.ID = 12
)
...and I would receive this row:
seq id name | CRUD |
----|-----|--------|------|
5 | 12 | sue | U |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the the subquery already works for us.
Question: How can I leverage a CTE to simplify our predicates when needing to leverage the "most recent" logical view of the table. In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and leverage that in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.
This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
SELECT id, MAX(seq) AS latestseq
FROM people
GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, or moving the CTE into the "from" clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
Following up from #Glenn's answer, here is an updated query which meets my original goal and is on par with #mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach vs the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PERSON
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PERSON P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the date and Score column I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. Trouble starts when I also want name column in the output. I tried a few variation of SQL (i.e min with correlated sub query) but I have no luck getting the output as shown above. Can anyone help please:)
Query is as follows
SELECT DISTINCT
A.USername, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);
Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
SQL-Fiddle Demo
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score, selecting whichever name: MIN(name).
If we want to display the name column, we must use an aggregate function on name to facilitate the GROUP BY on date and score columns, or else it will not work (We could also use MAX() on that column as well).
Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This makes the assumption that EACH NAME and EACH date appears only once, and this will only work for MySQL.
To make it work on other RDBMSs, you need to apply another group function on the Date column, like MAX. MIN. etc
SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_
Edit: This answer is not corrected as pointed out by JNK in comments
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(NAME), it will pick one name if two names were found with the same goal numbers.
This will find Min score for each day (no duplicates), scored by any player. The name that starts with Z will be picked first than the name that starts with A.
Edit: Fixed by removing group by name