Selecting records from subquery found set (postgres) - sql

I have a query on 2 tables (part, price). The simplified version of this query is:
SELECT price.*
FROM price
INNER JOIN part ON (price.code = part.code)
WHERE price.type = '01'
ORDER BY date DESC
That returns several records:
code | type | date | price | file
-------------+----------+------------------------------------------------------
00065064705 | 01 | 2008-01-07 00:00:00 | 16.400000 | 28SEP2011.zip
00065064705 | 01 | 2007-02-05 00:00:00 | 15.200000 | 20JUL2011.zip
54868278900 | 01 | 2006-02-24 00:00:00 | 16.642000 | 28SEP2011.zip
As you can see, code 00065064705 is listed twice. I just need the max-date record (2008-01-07) along with the code, type, date and price for each unique code. So basically the top record for each unique code. This is Postgres, so I can't use SELECT TOP or something like that.
I think I should be using this as a subquery inside a main query, but I'm not sure how. Something like:
SELECT *
FROM price
JOIN (insert my original query here) AS price2 ON price.code = price2.code
Any help would be greatly appreciated.

You can use the row_number() window function to do that.
select *
from (SELECT price.*,
row_number() over (partition by price.code order by price.date desc) as rn
FROM price
INNER JOIN part ON (price.code = part.code)
WHERE price.type='01') x
where rn = 1
ORDER BY date DESC
(*) Note: I may have prefixed some of the columns incorrectly, as I'm not sure which column is in which table. I'm sure you can fix that.
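To see the row_number() approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module in place of Postgres. The table layout is an assumption reconstructed from the sample output in the question, the join to the part table is left out for brevity, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory stand-in for the price table; the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price (code TEXT, type TEXT, date TEXT, price REAL, file TEXT);
    INSERT INTO price VALUES
        ('00065064705', '01', '2008-01-07 00:00:00', 16.4,   '28SEP2011.zip'),
        ('00065064705', '01', '2007-02-05 00:00:00', 15.2,   '20JUL2011.zip'),
        ('54868278900', '01', '2006-02-24 00:00:00', 16.642, '28SEP2011.zip');
""")

# Number the rows of each code from newest to oldest, then keep row 1.
latest = conn.execute("""
    SELECT code, type, date, price
    FROM (SELECT price.*,
                 row_number() OVER (PARTITION BY code ORDER BY date DESC) AS rn
          FROM price
          WHERE type = '01') x
    WHERE rn = 1
    ORDER BY date DESC
""").fetchall()

for row in latest:
    print(row)
```

Each code appears once, carrying the values of its newest row.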

In Postgres you can use DISTINCT ON:
SELECT DISTINCT ON (price.code) price.*
FROM price
INNER JOIN part ON price.code = part.code
WHERE price.type='01'
ORDER BY price.code, price."date" DESC

select distinct on (code)
code, p.type, p.date, p.price, p.file
from
price p
inner join
parts using (code)
where p.type='01'
order by code, p.date desc
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
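DISTINCT ON is Postgres-specific, so it may help to see what it does in plain terms: sort the rows by the ORDER BY list, then keep the first row of each group of equal DISTINCT ON values. The sketch below emulates that in pure Python. The function name distinct_on_code is hypothetical, and the rows are a trimmed copy of the question's sample output.

```python
from itertools import groupby
from operator import itemgetter

# (code, date, price) rows taken from the question's sample output.
rows = [
    ("00065064705", "2007-02-05", 15.2),
    ("54868278900", "2006-02-24", 16.642),
    ("00065064705", "2008-01-07", 16.4),
]

def distinct_on_code(rows):
    """Emulate SELECT DISTINCT ON (code) ... ORDER BY code, date DESC."""
    # Python's sort is stable, so sorting by date DESC and then by code
    # gives the same order as ORDER BY code, date DESC.
    ordered = sorted(rows, key=itemgetter(1), reverse=True)
    ordered = sorted(ordered, key=itemgetter(0))
    # DISTINCT ON keeps only the first row of each run of equal codes.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]

print(distinct_on_code(rows))
```

This is why the ORDER BY must start with the DISTINCT ON expression: the remaining sort keys decide which row of each group comes first and is kept.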

Related

SQL subqueries using SELECTs only

I need to write a query with subqueries using SELECT and aggregation functions only, e.g.:
select distinct m_name
from MANUFACT
where m_id in (select TOP 1 m_id
from PRODUCT
where p_id = (select p_id
from PRODUCT
where p_desc = 'Bronze Sculpture'));
The question is about query similar to this one, but using SUM(). The data I have:
Table SPERSON:
sp_id | sp_name
---------------
10 | Jones
39 | Matsu
23 | Atsuma
Table SALE:
sp_id | qty
-----------
10 | 20
23 | 30
10 | 10
39 | 20
etc.
The task is to return the sp_names whose total number of products is <= 75.
The teacher says we're not allowed to use a join, but I doubt there is any way to avoid it.
This is what I have so far:
select sp_name
from SPERSON
where sp_id in (select sp_id from SALE
where qty in (select sum(qty) group by sp_id));
However, all I get is the 'Each GROUP BY expression must contain at least one column that is not an outer reference' error, and I can't work out what is wrong.
You can use a correlated subquery:
SELECT q.sp_name
FROM( SELECT sp_name,
(SELECT SUM(qty) FROM sale s WHERE s.sp_id = p.sp_id ) AS qty
FROM SPERSON p
) q
GROUP BY q.sp_name
HAVING SUM(q.qty) <= 75
Correlated subqueries, which may reference the outer query and so are evaluated for each row of it, are generally not recommended. I suggest one here only as an alternative because you are not permitted to use JOIN. By the way, it is more straightforward to use a JOIN.
You can approach the problem from a different direction.
Create a query that calculates the total quantity grouped by sp_id:
SELECT s.sp_id, SUM(s.qty)
FROM SALE s
GROUP BY s.sp_id
Keep only the person ids whose total quantity is less than or equal to 75:
SELECT s.sp_id, SUM(s.qty)
FROM SALE s
GROUP BY s.sp_id
HAVING SUM(s.qty) <= 75
Because joins are not allowed, "inject" the name as a subquery:
SELECT
(SELECT p.sp_name FROM SPERSON p WHERE p.sp_id = s.sp_id) AS name
FROM SALE s
GROUP BY s.sp_id
HAVING SUM(s.qty) <= 75
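The grouped HAVING step can also be combined with the IN filter from the question's own attempt, which likewise avoids JOIN. Below is a minimal runnable sketch using Python's sqlite3 module as a stand-in for the real database. Only the four SALE rows shown in the question are loaded (the "etc." rows are unknown), and the helper name names_with_total_at_most is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SPERSON (sp_id INTEGER, sp_name TEXT);
    CREATE TABLE SALE (sp_id INTEGER, qty INTEGER);
    INSERT INTO SPERSON VALUES (10, 'Jones'), (39, 'Matsu'), (23, 'Atsuma');
    INSERT INTO SALE VALUES (10, 20), (23, 30), (10, 10), (39, 20);
""")

def names_with_total_at_most(limit):
    """Names of salespeople whose summed qty is <= limit, without any JOIN."""
    rows = conn.execute("""
        SELECT sp_name
        FROM SPERSON
        WHERE sp_id IN (SELECT sp_id
                        FROM SALE
                        GROUP BY sp_id
                        HAVING SUM(qty) <= ?)
        ORDER BY sp_name
    """, (limit,)).fetchall()
    return [name for (name,) in rows]

print(names_with_total_at_most(75))
print(names_with_total_at_most(25))
```

With these rows the totals are 30 for Jones, 30 for Atsuma and 20 for Matsu, so a limit of 75 keeps everyone and a limit of 25 keeps only Matsu.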

SQL join to and from dates - return most recent if no match found

I have two tables that I need to join. I have:
LEFT JOIN AutoBAF on (GETDATE() BETWEEN AutoBAF.FromDate and AutoBAF.ToDate)
and I get the expected result. Now, if no matching record is found between the two dates (AutoBAF.FromDate and AutoBAF.ToDate), I would like to join the most recent matching record instead.
Can anyone point me in the right direction?
I am using an MS SQL database hosted in Azure.
A small example of what I am trying to achieve:
Table Product:
Product | Description
A | Product A
Table Price
Product | FromDate | ToDate | Price
A | 01-01-20 | 31-01-20 | 100
A | 01-02-20 | 28-02-20 | 110
I need a query that will return the price according to the date returned by GETDATE().
If I run the query 15-01-20 I should get:
Product | Description | Price
A | Product A | 100
If I run the query 15-02-20 I should get:
Product | Description | Price
A | Product A | 110
Finally, if I run the query on 15-03-20, there is no matching price in the Price table. Instead of returning null I would like to "fall back" to the most recent known price, which in this example is 110.
This is not the fastest query, because it joins each product with all of its price records that have future dates. But if your tables are small, it works.
SELECT product.product, product.description, isnull(pr_curr.price, pr_fut.price) as price
FROM product
left join PRICE pr_curr on product.product=pr_curr.product
and GETDATE() BETWEEN pr_curr.FromDate and pr_curr.ToDate
left join PRICE pr_fut on product.product=pr_fut.product
and GETDATE() < pr_fut.FromDate
where pr_fut.FromDate = (
select min(FromDate) from PRICE dates
where dates.product=pr_fut.product and dates.FromDate>GETDATE()
) or pr_fut.FromDate is null
This looks like SQL Server code, which supports lateral joins via the apply keyword. Assuming you want only one match:
from product p outer apply
(select top (1) ab.*
from autobaf ab
where ab.product = p.product and
ab.fromdate <= getdate()
order by ab.fromdate desc
) ab
Note that this correlates on the product, which is not part of your question.
If that is not necessary, then you can use:
from t left join
(select top (1) ab.*
from autobaf ab
where ab.fromdate <= getdate()
order by ab.fromdate desc
) ab
on 1 = 1
If you know that there is some record in the past, then you can use cross join instead of left join and dispense with the on clause.
SELECT product.product, product.description, isnull(pr_curr.price, pr_fut.price) as price
FROM product
left join PRICE pr_curr on product.product=pr_curr.product
and GETDATE() BETWEEN pr_curr.FromDate and pr_curr.ToDate
left join PRICE pr_fut on product.product=pr_fut.product
and GETDATE() > pr_fut.FromDate
where pr_fut.FromDate = (
select max(FromDate) from PRICE dates
where dates.product=pr_fut.product and dates.FromDate<GETDATE()
) or pr_fut.FromDate is null
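The fallback rule from the question (use the price whose period contains today, otherwise the most recent earlier one) reduces to "take the newest price row that has already started". Here is a minimal sketch of that idea using Python's sqlite3 module. It is an assumption-laden stand-in for SQL Server: a :today parameter replaces GETDATE(), a correlated subquery with LIMIT 1 replaces OUTER APPLY with TOP (1), dates are stored as ISO strings, and the helper name price_on is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product TEXT, description TEXT);
    CREATE TABLE price (product TEXT, from_date TEXT, to_date TEXT, price INTEGER);
    INSERT INTO product VALUES ('A', 'Product A');
    INSERT INTO price VALUES
        ('A', '2020-01-01', '2020-01-31', 100),
        ('A', '2020-02-01', '2020-02-28', 110);
""")

def price_on(today):
    """Price per product as of `today`, falling back to the latest past price."""
    return conn.execute("""
        SELECT p.product, p.description,
               (SELECT pr.price
                FROM price pr
                WHERE pr.product = p.product
                  AND pr.from_date <= :today      -- period has already started
                ORDER BY pr.from_date DESC        -- newest such period first
                LIMIT 1) AS price
        FROM product p
    """, {"today": today}).fetchall()

for day in ("2020-01-15", "2020-02-15", "2020-03-15"):
    print(day, price_on(day))
```

This assumes price periods for a product do not overlap. A date before the first period yields a NULL price, which matches the behaviour of a left join.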

How to filter out conditions based on a group by in JPA?

I have a table like
| customer | profile | status | date |
| 1 | 1 | DONE | mmddyy |
| 1 | 1 | DONE | mmddyy |
In this case, I want to group by on the profile ID having max date. Profiles can be repeated. I've ruled out Java 8 streams as I have many conditions here.
I want to convert the following SQL into JPQL:
select customer, profile, status, max(date)
from tbl
group by profile, customer,status, date, column-k
having count(profile)>0 and status='DONE';
Can someone tell me how to write this query in JPQL, assuming it is correct in SQL? If I list columns in the SELECT, they are needed in the GROUP BY as well, and then the query results are different.
I am guessing that you want the most recent customer/profile combination that is done.
If so, the correct SQL is:
select t.*
from t
where t.date = (select max(t2.date)
from t t2
where t2.customer = t.customer and t2.profile = t.profile
) and
t.status = 'DONE';
I don't know how to convert this to JPQL, but you might as well start with working SQL code.
In your query, the date column is not needed in the GROUP BY, and status='DONE' belongs in the WHERE clause:
select customer, profile, status, max(date)
from tbl
where status='DONE'
group by profile, customer, status
having count(profile)>0
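The two SQL answers above are not equivalent, which matters before translating either to JPQL. The sketch below runs both against hypothetical sample rows (the question only shows placeholder dates) using Python's sqlite3 module. The first query returns a customer/profile pair only if its newest row is DONE, while the second returns the newest DONE row of every pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (customer INTEGER, profile INTEGER, status TEXT, date TEXT);
    -- Hypothetical data: profile 2 has a newer row that is not DONE.
    INSERT INTO tbl VALUES
        (1, 1, 'DONE',    '2020-01-01'),
        (1, 1, 'DONE',    '2020-02-01'),
        (1, 2, 'DONE',    '2020-01-15'),
        (1, 2, 'PENDING', '2020-03-01');
""")

# Correlated subquery: newest row per customer/profile, kept only if DONE.
newest_if_done = conn.execute("""
    SELECT t.customer, t.profile, t.status, t.date
    FROM tbl t
    WHERE t.date = (SELECT max(t2.date)
                    FROM tbl t2
                    WHERE t2.customer = t.customer AND t2.profile = t.profile)
      AND t.status = 'DONE'
    ORDER BY t.profile
""").fetchall()

# Filter first, then aggregate: newest DONE row per customer/profile.
newest_done = conn.execute("""
    SELECT customer, profile, status, max(date)
    FROM tbl
    WHERE status = 'DONE'
    GROUP BY profile, customer, status
    ORDER BY profile
""").fetchall()

print(newest_if_done)
print(newest_done)
```

Which of the two is wanted depends on whether a later non-DONE row should hide an earlier DONE one.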

Get specific row from each group

My question is very similar to this, except I want to be able to filter by some criteria.
I have a table "DOCUMENT" which looks something like this:
|ID|CONFIG_ID|STATE |MAJOR_REV|MODIFIED_ON|ELEMENT_ID|
+--+---------+----------+---------+-----------+----------+
| 1|1234 |Published | 2 |2019-04-03 | 98762 |
| 2|1234 |Draft | 1 |2019-01-02 | 98762 |
| 3|5678 |Draft | 3 |2019-01-02 | 24244 |
| 4|5678 |Published | 2 |2017-10-04 | 24244 |
| 5|5678 |Draft | 1 |2015-05-04 | 24244 |
It's actually a few more columns, but I'm trying to keep this simple.
For each CONFIG_ID, I would like to select the latest (MAX(MAJOR_REV) or MAX(MODIFIED_ON)) - but I might want to filter by additional criteria, such as state (e.g., the latest published revision of a document) and/or date (the latest revision, published or not, as of a specific date; or: all documents where a revision was published/modified within a specific date interval).
To make things more interesting, there are some other tables I want to join in.
Here's what I have so far:
SELECT
allDocs.ID,
d.CONFIG_ID,
d.[STATE],
d.MAJOR_REV,
d.MODIFIED_ON,
d.ELEMENT_ID,
f.ID FILE_ID,
f.[FILENAME],
et.COLUMN1,
e.COLUMN2
FROM DOCUMENT allDocs -- Get all document revisions
CROSS APPLY ( -- Then for each config ID, only look at the latest revision
SELECT TOP 1
ID,
MODIFIED_ON,
CONFIG_ID,
MAJOR_REV,
ELEMENT_ID,
[STATE]
FROM DOCUMENT
WHERE CONFIG_ID=allDocs.CONFIG_ID
ORDER BY MAJOR_REV desc
) as d
LEFT OUTER JOIN ELEMENT e ON e.ID = d.ELEMENT_ID
LEFT OUTER JOIN ELEMENT_TYPE et ON e.ELEMENT_TYPE_ID=et.ID
LEFT OUTER JOIN TREE t ON t.NODE_ID = d.ELEMENT_ID
OUTER APPLY ( -- This is another optional 1:1 relation, but it's wrongfully implemented as m:n
SELECT TOP 1
FILE_ID
FROM DOCUMENT_FILE_RELATION
WHERE DOCUMENT_ID=d.ID
ORDER BY MODIFIED_ON DESC
) as df -- There should never be more than 1, but we're using TOP 1 just in case, to avoid duplicates
LEFT OUTER JOIN [FILE] f on f.ID=df.FILE_ID
WHERE
allDocs.CONFIG_ID = '5678' -- Just for testing purposes
and d.state ='Released' -- One possible filter criterion, there may be others
It looks like the results are correct, but multiple identical rows are returned.
My guess is that for documents with 4 revisions, the same values are found 4 times and returned.
A simple SELECT DISTINCT would solve this, but I'd prefer to fix my query.
This would be a classic row_number & partition by question I think.
;with rows as
(
select <your-columns>,
row_number() over (partition by config_id order by <whatever you want>) as rn
from document
join <anything else>
where <whatever>
)
select * from rows where rn=1
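Where the filter goes changes the meaning of this template. The sketch below fills it in for the DOCUMENT sample from the question, using Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or newer). The CTE name ranked and the helper name latest are hypothetical. Filtering inside the CTE gives "the latest published revision", while filtering after rn = 1 gives "the latest revision, if it happens to be published".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DOCUMENT (ID INTEGER, CONFIG_ID TEXT, STATE TEXT, MAJOR_REV INTEGER);
    INSERT INTO DOCUMENT VALUES
        (1, '1234', 'Published', 2),
        (2, '1234', 'Draft',     1),
        (3, '5678', 'Draft',     3),
        (4, '5678', 'Published', 2),
        (5, '5678', 'Draft',     1);
""")

def latest(inner_filter, outer_filter):
    """Top revision per CONFIG_ID; filters are trusted SQL fragments."""
    return conn.execute(f"""
        WITH ranked AS (
            SELECT ID, CONFIG_ID, STATE, MAJOR_REV,
                   row_number() OVER (PARTITION BY CONFIG_ID
                                      ORDER BY MAJOR_REV DESC) AS rn
            FROM DOCUMENT
            WHERE {inner_filter}
        )
        SELECT ID, CONFIG_ID, MAJOR_REV
        FROM ranked
        WHERE rn = 1 AND {outer_filter}
        ORDER BY CONFIG_ID
    """).fetchall()

# Latest published revision of each document.
print(latest("STATE = 'Published'", "1 = 1"))
# Latest revision of each document, only when that revision is published.
print(latest("1 = 1", "STATE = 'Published'"))
```

Because each CONFIG_ID yields at most one row with rn = 1, the duplicate rows described in the question do not appear.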

Getting the latest entry per day / SQL Optimizing

Given the following database table, which records events (status) for different objects (id) with its timestamp:
ID | Date | Time | Status
-------------------------------
7 | 2016-10-10 | 8:23 | Passed
7 | 2016-10-10 | 8:29 | Failed
7 | 2016-10-13 | 5:23 | Passed
8 | 2016-10-09 | 5:43 | Passed
I want to get a result table using plain SQL (MS SQL) like this:
ID | Date | Status
------------------------
7 | 2016-10-10 | Failed
7 | 2016-10-13 | Passed
8 | 2016-10-09 | Passed
where the "status" is the latest entry on a day, given that at least one event for this object has been recorded.
My current solution is using "Outer Apply" and "TOP(1)" like this:
SELECT DISTINCT rn.id,
tmp.date,
tmp.status
FROM run rn OUTER apply
(SELECT rn2.date, tmp2.status AS 'status'
FROM run rn2 OUTER apply
(SELECT top(1) rn3.id, rn3.date, rn3.time, rn3.status
FROM run rn3
WHERE rn3.id = rn.id
AND rn3.date = rn2.date
ORDER BY rn3.id ASC, rn3.date + rn3.time DESC) tmp2
WHERE tmp2.status <> '' ) tmp
As far as I understand this outer apply command works like:
For every id
For every recorded day for this id
Select the newest status for this day and this id
But I'm facing performance issues, therefore I think that this solution is not adequate. Any suggestions how to solve this problem or how to optimize the sql?
Your code seems too complicated. Why not just do this?
SELECT r.id, r.date, r2.status
FROM run r OUTER APPLY
(SELECT TOP 1 r2.*
FROM run r2
WHERE r2.id = r.id AND r2.date = r.date AND r2.status <> ''
ORDER BY r2.time DESC
) r2;
For performance, I would suggest an index on run(id, date, status, time).
Using a CTE will probably be the fastest:
with cte as
(
select ID, Date, Status, row_number() over (partition by ID, Date order by Time desc) rn
from run
)
select ID, Date, Status
from cte
where rn = 1
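As a quick check of the CTE approach against the sample data from the question, here is a minimal sketch using Python's sqlite3 module in place of MS SQL (window functions need SQLite 3.25 or newer). One assumption to note: the times are stored zero-padded ('08:23' rather than '8:23') so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run (ID INTEGER, Date TEXT, Time TEXT, Status TEXT);
    INSERT INTO run VALUES
        (7, '2016-10-10', '08:23', 'Passed'),
        (7, '2016-10-10', '08:29', 'Failed'),
        (7, '2016-10-13', '05:23', 'Passed'),
        (8, '2016-10-09', '05:43', 'Passed');
""")

# Rank the events of each (ID, Date) pair from latest to earliest, keep rank 1.
latest_per_day = conn.execute("""
    WITH cte AS (
        SELECT ID, Date, Status,
               row_number() OVER (PARTITION BY ID, Date ORDER BY Time DESC) AS rn
        FROM run
    )
    SELECT ID, Date, Status
    FROM cte
    WHERE rn = 1
    ORDER BY ID, Date
""").fetchall()

for row in latest_per_day:
    print(row)
```

The output matches the result table requested in the question, with a single pass over run instead of nested applies.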
Do not SELECT from a log table. Instead, write a trigger that maintains a latest_run table, like:
CREATE TRIGGER tr_run_insert ON run FOR INSERT AS
BEGIN
-- Update the row for this ID and Date if it already exists
UPDATE l SET Status = i.Status
FROM latest_run l INNER JOIN INSERTED i ON l.ID = i.ID AND l.Date = i.Date
-- Otherwise add it
IF @@ROWCOUNT = 0
INSERT INTO latest_run (ID, Date, Status) SELECT ID, Date, Status FROM INSERTED
END
Then perform reads from the much shorter latest_run table.
This adds a performance penalty on writes, because each insert now needs two writes instead of one, but it gives you much more stable response times on reads. And if you do not need to SELECT from the "run" table you can avoid indexing it, so the penalty of two writes is partly offset by less index maintenance.
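The same idea can be tried out locally with SQLite, whose trigger syntax differs from T-SQL. The sketch below is an analogue under stated assumptions rather than a port: it uses INSERT OR REPLACE keyed on a primary key of (ID, Date), which is an assumption about latest_run, and like the trigger above it relies on events being inserted in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run (ID INTEGER, Date TEXT, Time TEXT, Status TEXT);
    CREATE TABLE latest_run (
        ID INTEGER, Date TEXT, Status TEXT,
        PRIMARY KEY (ID, Date)          -- one row per object and day
    );
    -- On every insert into run, overwrite that day's row in latest_run.
    CREATE TRIGGER tr_run_insert AFTER INSERT ON run
    BEGIN
        INSERT OR REPLACE INTO latest_run (ID, Date, Status)
        VALUES (NEW.ID, NEW.Date, NEW.Status);
    END;
""")

# Events arrive in chronological order, as they would in a log table.
conn.executemany(
    "INSERT INTO run (ID, Date, Time, Status) VALUES (?, ?, ?, ?)",
    [
        (8, "2016-10-09", "05:43", "Passed"),
        (7, "2016-10-10", "08:23", "Passed"),
        (7, "2016-10-10", "08:29", "Failed"),
        (7, "2016-10-13", "05:23", "Passed"),
    ],
)

summary = conn.execute(
    "SELECT ID, Date, Status FROM latest_run ORDER BY ID, Date"
).fetchall()
print(summary)
```

Reads then touch only latest_run, which holds one row per object and day regardless of how many events were logged.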