LEFT JOINing the max/top - sql

I have two tables from which I'm trying to run a query to return the maximum (or top) transaction for each person. I should note that I cannot change the table structure. Rather, I can only pull data.
People
+-----------+
| id | name |
+-----------+
| 42 | Bob |
| 65 | Ted |
| 99 | Stu |
+-----------+
Transactions (there is no primary key)
+---------------------------------+
| person | amount | date |
+---------------------------------+
| 42 | 3 | 9/14/2030 |
| 42 | 4 | 7/02/2015 |
| 42 | *NULL* | 2/04/2020 |
| 65 | 7 | 1/03/2010 |
| 65 | 7 | 5/20/2020 |
+---------------------------------+
Ultimately, for each person I want to return the highest amount. If the highest amount is tied, I'd like to fall back to the date and return the most recent one.
So, I'd like my query to return:
+----------------------------------------+
| person_id | name | amount | date |
+----------------------------------------+
| 42 | Bob | 4 | 7/02/2015 | (<- highest amount)
| 65 | Ted | 7 | 5/20/2020 | (<- most recent date)
| 99 | Stu | *NULL* | *NULL* | (<- no records in Transactions table)
+----------------------------------------+
SELECT People.id, name, amount, date
FROM People
LEFT JOIN (
SELECT TOP 1 person_id
FROM Transactions
WHERE person_id = People.id
ORDER BY amount DESC, date ASC
)
ON People.id = person_id
I can't figure out what I am doing wrong, but I know it's wrong. Any help would be much appreciated.

You are almost there, but since the Transactions table has several rows per person, you need to keep only one of them per person. The ROW_NUMBER() function can do that.
Try this:
With cte as
(Select person, amount, date, row_number() over (partition by person
order by amount desc, date desc) as row_num
from Transactions)
Select a.id as person_id, a.name, b.amount, b.date
from People as a
left join cte as b
on a.id = b.person
and b.row_num = 1
The result is in Sql Fiddle
Edit: ROW_NUMBER(), from MSDN:
Returns the sequential number of a row within a partition of a result set,
starting at 1 for the first row in each partition.
PARTITION BY divides the result set into groups, and the OVER clause
determines the partitioning and ordering of the rowset before the
associated window function is applied.
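As a minimal runnable sketch of the ROW_NUMBER() approach above, the snippet below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions), the ISO date strings, and the `ranked` CTE name are assumptions made for illustration; the question itself appears to target SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id INTEGER, name TEXT);
    CREATE TABLE Transactions (person INTEGER, amount INTEGER, date TEXT);
    INSERT INTO People VALUES (42, 'Bob'), (65, 'Ted'), (99, 'Stu');
    -- Dates are stored as ISO strings (an assumption) so they sort chronologically.
    INSERT INTO Transactions VALUES
        (42, 3, '2030-09-14'), (42, 4, '2015-07-02'), (42, NULL, '2020-02-04'),
        (65, 7, '2010-01-03'), (65, 7, '2020-05-20');
""")

query = """
    WITH ranked AS (
        -- Number each person's rows: highest amount first, newest date breaks ties.
        SELECT person, amount, date,
               ROW_NUMBER() OVER (PARTITION BY person
                                  ORDER BY amount DESC, date DESC) AS row_num
        FROM Transactions
    )
    SELECT p.id, p.name, r.amount, r.date
    FROM People AS p
    LEFT JOIN ranked AS r ON r.person = p.id AND r.row_num = 1
    ORDER BY p.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Keeping `row_num = 1` in the ON clause rather than in a WHERE clause is what lets Stu, who has no transactions, stay in the result with NULLs.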

Related

How to trace back a record all the way to origin using SQL

We have a table called ticketing that tracks all the service tickets. One ticket can lead to another ticket, which leads to another, as indicated by the replaced_by_ticket_id field below:
| ticket_id | is_current | replaced_by_ticket_id |
|-----------|------------|-----------------------|
| 134 | 0 | 240 |
| 240 | 0 | 321 |
| 321 | 1 | Null |
| 34 | 0 | 93 |
| 25 | 0 | 16 |
| 16 | 0 | 25 |
| 93 | 1 | Null |
How do I write a query to get the number of tickets leading to the current ones (321 & 93)? I mean I could join the table by itself, but there is no way of knowing how many times to join. Plus different tickets have different number of levels.
Here is the expected result of the query
| ticket_id | total_tickets |
|-----------|---------------|
| 321 | 3 |
| 93 | 4 |
What is the best way to do it?
You can use a recursive query; the trick is to keep track of the original "current" ticket, so you can aggregate by that in the outer query.
So:
with cte as (
select ticket_id, ticket_id as parent_id from ticketing where is_current = 1
union all
select c.ticket_id, t.ticket_id
from ticketing t
inner join cte c on c.parent_id = t.replaced_by_ticket_id
)
select ticket_id, count(*) total_tickets
from cte
group by ticket_id
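To see the recursion at work, here is a small sketch that runs the same idea against the sample rows using Python's sqlite3 module. SQLite and the `WITH RECURSIVE` spelling are assumptions for illustration (some engines, such as SQL Server, use plain `WITH`). On the sample rows exactly as listed, ticket 93 only has ticket 34 behind it, so this sketch reports 2 for that chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticketing (
        ticket_id INTEGER, is_current INTEGER, replaced_by_ticket_id INTEGER
    );
    INSERT INTO ticketing VALUES
        (134, 0, 240), (240, 0, 321), (321, 1, NULL),
        (34, 0, 93), (25, 0, 16), (16, 0, 25), (93, 1, NULL);
""")

query = """
    WITH RECURSIVE cte AS (
        -- Anchor: each current ticket starts its own chain.
        SELECT ticket_id, ticket_id AS parent_id
        FROM ticketing
        WHERE is_current = 1
        UNION ALL
        -- Step: add every ticket that was replaced by the last ticket found.
        SELECT c.ticket_id, t.ticket_id
        FROM ticketing AS t
        INNER JOIN cte AS c ON c.parent_id = t.replaced_by_ticket_id
    )
    SELECT ticket_id, COUNT(*) AS total_tickets
    FROM cte
    GROUP BY ticket_id
    ORDER BY ticket_id
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Because the walk starts only from current tickets and moves backwards, the 25/16 pair that point at each other is never visited, so the recursion terminates on this data.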

SQL SELECT most recently created row WHERE something is true

I am trying to SELECT the most recently created row, WHERE the ID field in the row is a certain number, so I don't want the most recently created row in the WHOLE table, but the most recently created one WHERE the ID field is a specific number.
My Table:
Table:
| name | value | num |SecondName| Date |
| James | HEX124 | 1 | Carl | 11022020 |
| Jack | JEU836 | 4 | Smith | 19042020 |
| Mandy | GER234 | 33 | Jones | 09042020 |
| Mandy | HER575 | 7 | Jones | 10052020 |
| Jack | JEU836 | 4 | Smith | 14022020 |
| Ryan | GER631 | 33 | Jacque | 12042020 |
| Sarah | HER575 | 7 | Barlow | 01022019 |
| Jack | JEU836 | 4 | Smith | 14042020 |
| Ryan | HUH233 | 33 | Jacque | 15042020 |
| Sarah | HER575 | 7 | Barlow | 02022019 |
My SQL:
SELECT name, value, num, SecondName, Date
FROM MyTable
INNER JOIN (SELECT NAME, MAX(DATE) AS MaxTime FROM MyTable GROUP BY NAME) grouped ON grouped.NAME = NAME
WHERE NUM = 33
AND grouped.MaxTime = Date
What I'm doing here, is selecting the table, and creating an INNER JOIN where I'm taking the MAX Date value (the biggest/newest value), and grouping by the Name, so this will return the newest created row, for each person (Name), WHERE the NUM field is equal to 33.
Results:
| Ryan | HUH233 | 33 | Jacque | 15042020 |
As you can see, it is returning one row, as there are 3 rows with the NUM value of 33, two of which are with the Name 'Ryan', so it is grouping by the Name, and returning the latest entry for Ryan (This works fine).
But, Mandy is missing, as you can see in my first table, she has two entries, one under the NUM value of 33, and the other with the NUM value of 7. Because the entry with the NUM value of 7 was created most recently, my query where I say 'grouped.MaxTime = Date' is taking that row, and it is not being displayed, as the NUM value is not 33.
What I want to do, is read every row WHERE the NUM field is 33, THEN select the Maximum Time inside of the rows with the value of 33.
I believe what it is doing is prioritising the Maximum Date value first, then filtering the selected rows by the NUM value of 33.
Desired Results:
| Ryan | HUH233 | 33 | Jacque | 15042020 |
| Mandy | GER234 | 33 | Jones | 09042020 |
Any help would be appreciated, thank you.
If I follow you correctly, you can filter with a subquery:
select t.*
from mytable t
where t.num = 33 and t.date = (
select max(t1.date) from mytable t1 where t1.name = t.name and t1.num = t.num
)
Look at your subquery. You want the maximum dates for num 33, but you are selecting the maximum dates independent from num.
I think you want:
select *
from mytable
where num = 33
and (name, date) in
(
select name, max(date)
from mytable
where num = 33
group by name
);
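The correlated-subquery form can be checked with a short sketch using Python's sqlite3 module. It loads only a subset of the question's rows, and it stores the dates as ISO strings so that MAX() compares them chronologically; both choices, and the use of SQLite, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (name TEXT, value TEXT, num INTEGER, SecondName TEXT, Date TEXT);
    INSERT INTO MyTable VALUES
        ('Mandy', 'GER234', 33, 'Jones',  '2020-04-09'),
        ('Mandy', 'HER575', 7,  'Jones',  '2020-05-10'),
        ('Ryan',  'GER631', 33, 'Jacque', '2020-04-12'),
        ('Ryan',  'HUH233', 33, 'Jacque', '2020-04-15'),
        ('Jack',  'JEU836', 4,  'Smith',  '2020-04-19');
""")

query = """
    SELECT t.name, t.value, t.num, t.SecondName, t.Date
    FROM MyTable AS t
    WHERE t.num = 33
      AND t.Date = (
          -- Latest date for the same person, looking only at rows with the same num.
          SELECT MAX(t1.Date)
          FROM MyTable AS t1
          WHERE t1.name = t.name AND t1.num = t.num
      )
    ORDER BY t.name
"""
latest_33 = conn.execute(query).fetchall()
for row in latest_33:
    print(row)
```

Mandy's newer row with num 7 no longer hides her num 33 row, because the maximum is computed per name and num rather than per name alone.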

Select latest values for group of related records

I have a table that accommodates data that is logically groupable by multiple properties (foreign key for example). Data is sequential over continuous time interval; i.e. it is a time series data. What I am trying to achieve is to select only latest values for each group of groups.
Here is example data:
+-----------------------------------------+
| code | value | date | relation_id |
+-----------------------------------------+
| A | 1 | 01.01.2016 | 1 |
| A | 2 | 02.01.2016 | 1 |
| A | 3 | 03.01.2016 | 1 |
| A | 4 | 01.01.2016 | 2 |
| A | 5 | 02.01.2016 | 2 |
| A | 6 | 03.01.2016 | 2 |
| B | 1 | 01.01.2016 | 1 |
| B | 2 | 02.01.2016 | 1 |
| B | 3 | 03.01.2016 | 1 |
| B | 4 | 01.01.2016 | 2 |
| B | 5 | 02.01.2016 | 2 |
| B | 6 | 03.01.2016 | 2 |
+-----------------------------------------+
And here is example of desired output:
+-----------------------------------------+
| code | value | date | relation_id |
+-----------------------------------------+
| A | 3 | 03.01.2016 | 1 |
| A | 6 | 03.01.2016 | 2 |
| B | 3 | 03.01.2016 | 1 |
| B | 6 | 03.01.2016 | 2 |
+-----------------------------------------+
To put this in perspective — for every related object I want to select each code with latest date.
Here is a select I came up with. I've used the ROW_NUMBER() OVER (PARTITION BY...) approach:
SELECT indicators.code, indicators.dimension, indicators.unit, x.value, x.date, x.ticker, x.name
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
companies.ticker, companies.name
FROM fundamentals t
INNER JOIN companies on companies.id = t.company_id
WHERE companies.sic_id = 89
) x
INNER JOIN indicators on indicators.id = x.indicator_id
WHERE x.r <= (SELECT count(*) FROM companies where sic_id = 89)
It works, but the problem is that it is painfully slow: when working with about 5% of production data, which is roughly 3 million fundamentals records, this select takes about 10 seconds to finish. My guess is that this happens because the subselect selects huge amounts of records first.
Is there any way to speed this query up, or am I digging in the wrong direction by doing it this way?
Postgres offers the convenient distinct on for this purpose:
select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
Your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except date. Assuming each group has a single most recent date, something like this should work. Basically, replace the window function with a plain GROUP BY, which your engine may be able to optimize better.
SELECT mytable.code,
mytable.value,
mytable.date,
mytable.relation_id
FROM mytable
JOIN (
SELECT code,
max(date) as date,
relation_id
FROM mytable
GROUP BY code, relation_id
) Q1
ON Q1.code = mytable.code
AND Q1.date = mytable.date
AND Q1.relation_id = mytable.relation_id
Other option:
SELECT DISTINCT Code,
Relation_ID,
FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Value,
FIRST_VALUE(Date) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Date
FROM mytable
This will return the top value for whatever you partition by, according to whatever you order by.
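I believe we can try something like this, as long as value never decreases over time within a group (as in the sample data):
SELECT code, relation_id, MAX(date) AS date, MAX(value) AS value FROM mytable
GROUP BY code, relation_id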
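The GROUP BY join suggested above can be tried on the sample data with the following sketch, written against SQLite through Python's sqlite3 module. The engine, the ISO date strings, and the `m` and `q1` aliases are assumptions for illustration, and the sketch says nothing about performance on millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (code TEXT, value INTEGER, date TEXT, relation_id INTEGER)"
)
# Rebuild the sample rows: three consecutive days per (code, relation_id) group.
sample = [
    (code, day + 3 * (rel - 1), f"2016-01-{day:02d}", rel)
    for code in ("A", "B")
    for rel in (1, 2)
    for day in (1, 2, 3)
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", sample)

query = """
    SELECT m.code, m.value, m.date, m.relation_id
    FROM mytable AS m
    JOIN (
        -- Latest date per (code, relation_id) group.
        SELECT code, relation_id, MAX(date) AS date
        FROM mytable
        GROUP BY code, relation_id
    ) AS q1
      ON q1.code = m.code
     AND q1.relation_id = m.relation_id
     AND q1.date = m.date
    ORDER BY m.code, m.relation_id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

If two rows in one group share the latest date, this form returns both of them, which is the caveat the answer mentions.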

SQL Group by one column and decide which column to choose

Let's say I have data like this :
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 2 | 20 | B | 20 |
| 3 | 10 | C | 30 |
| 4 | 10 | D | 80 |
I would like to group rows by code value, but get real rows back (not some aggregate function).
I know that just
select *
from table
group by code
won't work because the database doesn't know which row to return when several rows share the same code.
So my question is how to tell the database to pick (for example) the row with the lowest number, so in my case:
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 3 | 10 | C | 30 |
P.S.
I know how to do this with PARTITION, but this is only allowed in Oracle databases and can't be created in the JPA criteria builder (which is my ultimate goal).
Why don't you use code like this?
SELECT
id,
code,
name,
number
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY code ORDER BY number ASC) AS RowNo
FROM table
) s
WHERE s.RowNo = 1
You can look at this site;
Data Partitioning
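Since the question rules out PARTITION, here is a sketch of a different technique, a correlated subquery on MIN(number), that returns the same rows without any window function. It is not the answer's method. It runs on SQLite through Python's sqlite3 module, and the table name `items` is hypothetical (the question simply calls it `table`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "items" is a hypothetical table name; the question calls it "table".
    CREATE TABLE items (id INTEGER, code INTEGER, name TEXT, number INTEGER);
    INSERT INTO items VALUES
        (1, 20, 'A', 10), (2, 20, 'B', 20),
        (3, 10, 'C', 30), (4, 10, 'D', 80);
""")

query = """
    SELECT i.id, i.code, i.name, i.number
    FROM items AS i
    WHERE i.number = (
        -- Lowest number among the rows that share this row's code.
        SELECT MIN(i2.number) FROM items AS i2 WHERE i2.code = i.code
    )
    ORDER BY i.id
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

Unlike ROW_NUMBER(), this form returns every row that ties for the lowest number within a code, so it only yields one row per code when the minimum is unique.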

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
Here is SQLFiddle demo
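The same query can be reproduced locally with a short sketch using Python's sqlite3 module. SQLite (3.25 or later, for window functions) stands in for PostgreSQL here, and the dates are stored as ISO strings; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER, client_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER, client_id INTEGER, fulfill_by_date TEXT, created_at TEXT
    );
    INSERT INTO clients VALUES
        (1, 'FirstClient'), (2, 'SecondClient'), (3, 'ThirdClient');
    INSERT INTO orders VALUES
        (1, 1, '3000-01-01', '2013-01-01'),
        (2, 1, '1999-01-01', '2013-01-02'),
        (3, 2, '1999-01-01', '2013-01-01'),
        (4, 2, '3000-01-01', '2013-01-02');
""")

query = """
    SELECT c.client_id, c.client_name
    FROM clients AS c
    LEFT JOIN (
        -- rnum = 1 marks each client's most recently created order.
        SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id
                                     ORDER BY created_at DESC) AS rnum
        FROM orders
    ) AS o
      ON o.client_id = c.client_id AND o.rnum = 1
    WHERE o.fulfill_by_date < CURRENT_DATE
       OR o.order_id IS NULL
    ORDER BY c.client_id
"""
matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```

The `o.order_id IS NULL` test works because the LEFT JOIN leaves all order columns NULL for a client with no orders, which is how ThirdClient gets through the filter.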