How to filter out conditions based on a group by in JPA? - sql

I have a table like
| customer | profile | status | date |
| 1 | 1 | DONE | mmddyy |
| 1 | 1 | DONE | mmddyy |
In this case, I want to group by on the profile ID having max date. Profiles can be repeated. I've ruled out Java 8 streams as I have many conditions here.
I want to convert the following SQL into JPQL:
select customer, profile, status, max(date)
from tbl
group by profile, customer,status, date, column-k
having count(profile)>0 and status='DONE';
Can someone tell how can I write this query in JPQL if it is correct in SQL? If I declare columns in select it is needed in group by as well and the query results are different.

I am guessing that you want the most recent customer/profile combination that is done.
If so, the correct SQL is:
select t.*
from t
where t.date = (select max(t2.date)
from t t2
where t2.customer = t.customer and t2.profile = t.profile
) and
t.status = 'DONE';
I don't know how to convert this to JPQL, but you might as well start with working SQL code.

In your query date column not needed in group by and status='DONE' should be added with where clause
select customer, profile, status, max(date)
from tbl
where status='DONE'
group by profile, customer,status,
having count(profile)>0

Related

SQL: Selecting record where values in one field are unique based off of most recent date

I'm attempting to write an SQL statement to select records such that each record has a unique PartNo, and I want that record to be based off of the most recent ReceiveDate. I got an answer when I asked this question:
SELECT t.*
FROM Table as t
WHERE t.ReceiveDate = (SELECT MAX(t2.ReceiveDate)
FROM Table as t2
WHERE t2.PartNo = t.PartNo
);
However, this answer assumes that for each ReceiveDate, you would not have the same PartNo twice. In situations where there are multiple records with the same PartNo and ReceiveDate, it does not matter which is selected, but I only want one to be selected (PartNo must be unique)
Example:
PartNo | Vendor | Qty | ReceiveDate
100 | Bob | 2 | 2020/07/30
100 | Bob | 3 | 2020/07/30
Should only return one of these records.
I'm using Microsoft Access which uses Jet SQL which is very similar to T-SQL.
Use NOT EXISTS:
select distinct t.*
from tablename as t
where not exists (
select 1 from tablename
where partno = t.partno
and (
receivedate > t.receivedate
or (receivedate = t.receivedate and qty > t.qty)
or (receivedate = t.receivedate and qty = t.qty and vendor > t.vendor)
)
)
manually set up a standard Aggregate query (sigma icon in ribbon) where grouped on Part No and Date field is set to MAX...
run the query to check to see it returns the values you seek... then while in design view - - select SQL view and this will give you the sql statement...

Slightly different greatest-n-per-group

I have read this comment which explains the greatest-n-per-group problem and its solution. Unfortunately, I am facing a slightly different approach, and I am failing to find a solution for it.
Let's suppose I have a table with some basic info regarding users. Due to implementation, this info may or may not repeat itself:
+----+-------------------+----------------+---------------+
| id | user_name | user_name_hash | address |
+----+-------------------+----------------+---------------+
| 1 | peter_jhones | 0xFF321345 | Some Av |
| 2 | sally_whiterspoon | 0x98AB5454 | Certain St |
| 3 | mark_jackobson | 0x0102AB32 | Some Av |
| 4 | mark_jackobson | 0x0102AB32 | Particular St |
+----+-------------------+----------------+---------------+
As you can see, mark_jackobson appears twice, although its address is different in each appearance.
Every now and then, an ETL process queries new user_names and fetches the most recent records of each. Aftewards, it stores the user_name_hash in a table to sign it has already imported that certain user_name
+----------------+
| user_name_hash |
+----------------+
| 0xFF321345 |
| 0x98AB5454 |
+----------------+
Everything begins with the following query:
SELECT DISTINCT user_name_hash
FROM my_table
EXCEPT
SELECT user_name_hash
FROM my_hash_table
This way, I am able to select the new hashes from my table. Since I need to query the most recent occurrence of a hash, I wrap it as a sub-query:
SELECT MAX(id)
FROM my_table
WHERE user_name_hash IN (
SELECT DISTINCT user_name_hash
FROM my_table
EXCEPT
SELECT user_name_hash
FROM my_hash_table)
GROUP BY user_name_hash
Perfect! With the ids of my new users, I can query the addresses as follows:
SELECT
address,
user_name_hash
FROM my_table
WHERE Id IN (
SELECT MAX(id)
FROM my_table
WHERE user_name_hash IN (
SELECT DISTINCT user_name_hash
FROM my_table
EXCEPT
SELECT user_name_hash
FROM my_hash_table)
GROUP BY user_name_hash)
From my perspective, the above query works, but it does not seem optimal. Reading this comment, I noticed I could query the same data, using joins. Since I am failing to write the desired query, could anyone help me out and point me to a direction?
This is the query I have attempted, without success.
SELECT
tb1.address,
tb1.user_name_hash
FROM my_table tb1
INNER JOIN my_table tb2
ON tb1.user_name_hash = tb2.user_name_hash
LEFT JOIN my_hash_table ht
ON tb1.user_name_hash = ht.user_name_hash AND tb1.id > tb2.id
WHERE ht.user_name_hash IS NULL;
Thanks in advance.
EDIT > I am working with PostgreSQL
I believe you are looking for something like this:
SELECT
address,
user_name_hash
FROM my_table t1
JOIN (
SELECT MAX(id) maxid
FROM my_table t2
WHERE NOT EXISTS (
SELECT 1
FROM my_hash_table t3
WHERE t2.user_name_hash = t3.user_name_hash
)
GROUP BY user_name_hash
) t ON t1.ID = t.maxid
I'm using NOT EXISTS instead of EXCEPT since it is more clear to the optimizer.
You can get a better performance using a left outer join (to get the newest records not already imported) and then compute the max id for these records (subquery in the HAVING clause).
SELECT t1.address,
t1.user_name_hash,
MAX(id) AS maxid
FROM my_table t1
LEFT JOIN my_hash_table th ON t1.user_name_hash = th.user_name_hash
WHERE th.user_name_hash IS NULL
GROUP BY t1.address,
t1.user_name_hash
HAVING MAX(id) = (SELECT MAX(id)
FROM my_table t1)

Getting the latest entry per day / SQL Optimizing

Given the following database table, which records events (status) for different objects (id) with its timestamp:
ID | Date | Time | Status
-------------------------------
7 | 2016-10-10 | 8:23 | Passed
7 | 2016-10-10 | 8:29 | Failed
7 | 2016-10-13 | 5:23 | Passed
8 | 2016-10-09 | 5:43 | Passed
I want to get a result table using plain SQL (MS SQL) like this:
ID | Date | Status
------------------------
7 | 2016-10-10 | Failed
7 | 2016-10-13 | Passed
8 | 2016-10-09 | Passed
where the "status" is the latest entry on a day, given that at least one event for this object has been recorded.
My current solution is using "Outer Apply" and "TOP(1)" like this:
SELECT DISTINCT rn.id,
tmp.date,
tmp.status
FROM run rn OUTER apply
(SELECT rn2.date, tmp2.status AS 'status'
FROM run rn2 OUTER apply
(SELECT top(1) rn3.id, rn3.date, rn3.time, rn3.status
FROM run rn3
WHERE rn3.id = rn.id
AND rn3.date = rn2.date
ORDER BY rn3.id ASC, rn3.date + rn3.time DESC) tmp2
WHERE tmp2.status <> '' ) tmp
As far as I understand this outer apply command works like:
For every id
For every recorded day for this id
Select the newest status for this day and this id
But I'm facing performance issues, therefore I think that this solution is not adequate. Any suggestions how to solve this problem or how to optimize the sql?
Your code seems too complicated. Why not just do this?
SELECT r.id, r.date, r2.status
FROM run r OUTER APPLY
(SELECT TOP 1 r2.*
FROM run r2
WHERE r2.id = r.id AND r2.date = r.date AND r2.status <> ''
ORDER BY r2.time DESC
) r2;
For performance, I would suggest an index on run(id, date, status, time).
Using a CTE will probably be the fastest:
with cte as
(
select ID, Date, Status, row_number() over (partition by ID, Date order by Time desc) rn
from run
)
select ID, Date, Status
from cte
where rn = 1
Do not SELECT from a log table, instead, write a trigger that updates a latest_run table like:
CREATE TRIGGER tr_run_insert ON run FOR INSERT AS
BEGIN
UPDATE latest_run SET Status=INSERTED.Status WHERE ID=INSERTED.ID AND Date=INSERTED.Date
IF ##ROWCOUNT = 0
INSERT INTO latest_run (ID,Date,Status) SELECT (ID,Date,Status) FROM INSERTED
END
Then perform reads from the much shorter lastest_run table.
This will add a performance penalty on writes because you'll need two writes instead of one. But will give you much more stable response times on read. And if you do not need to SELECT from "run" table you can avoid indexing it, therefore the performance penalty of two writes is partly compensated by less indexes maintenance.

Self-referencing Query and Not Equals

Trying to pull data from a single table called tblTooling where two TlPartNo numbers are equal to different values and the TlToolNo are not equal for these TlPartNo . This is an Access DB and the following statement gets me close, but still gives too much data.
SELECT DISTINCT
tblTooling.TlToolNo,
tblTooling.TlPartNo,
tblTooling.TlOP,
tblTooling.TlQuantity
FROM tblTooling, tblTooling AS tblTooling_1
WHERE (((tblTooling.TlToolNo)<>tblTooling_1.TlToolNo)
AND ((tblTooling.TlPartNo)="10290722")
AND ((tblTooling_1.TlPartNo)="10295379"));
The included image has the tblTooling structure and Data. Plus the expected results from the query.
You seem to want exclude a ToolNo value when it occurs with both PartNo values. In that case you could group intermediate results by ToolNo, and see whether in such a group there is only one PartNo present (with having). In that case keep that record, and in the outer query, get the two other columns added to it:
SELECT DISTINCT
tblTooling.TlToolNo,
tblTooling.TlPartNo,
tblTooling.TlOP,
tblTooling.TlQuantity
FROM tblTooling
INNER JOIN (
SELECT TlToolNo,
Min(TlPartNo) AS MinTlPartNo,
Max(TlPartNo) AS MaxTlPartNo
FROM tblTooling
WHERE TlPartNo IN ("10290722", "10295379")
GROUP BY TlToolNo
HAVING Min(TlPartNo) = Max(TlPartNo)
) AS grp
ON grp.TlToolNo = tblTooling.TlToolNo
AND grp.MinTlPartNo = tblTooling.TlPartNo
Note that for your sample data this will return 4 rows:
TlToolNo | TlPartNo | TlOP | TlQuantity
----------+----------+------+-----------
T00012362 | 10290722 | OP10 | 2
T00012456 | 10290722 | OP10 | 1
T00013456 | 10290722 | OP20 | 1
T00014348 | 10295379 | OP20 | 1
I think you can do this with not exists:
select t.*
from tblTooling as t
where not exists (select 1
from tblTooling as t2
where t2.TlPartNo in ("10290722", "10295379") and
t2.TlToolNo = t.TlToolNo and
t2.tiid <> t.tiid
) and
t.TlPartNo in ("10290722", "10295379");
This saves on the select distinct, which should be a performance boost.

Selecting records from subquery found set (postgres)

I have a query on 2 tables (part, price). The simplified version of this query is:
SELECT price.*
FROM price
INNER JOIN parts ON (price.code = part.code )
WHERE price.type = '01'
ORDER BY date DESC
That returns several records:
code | type | date | price | file
-------------+----------+------------------------------------------------------
00065064705 | 01 | 2008-01-07 00:00:00 | 16.400000 | 28SEP2011.zip
00065064705 | 01 | 2007-02-05 00:00:00 | 15.200000 | 20JUL2011.zip
54868278900 | 01 | 2006-02-24 00:00:00 | 16.642000 | 28SEP2011.zip
As you can see, there is code 00065064705 listed twice. I just need the maxdate record (2008-01-07) along with the code, type, date and price for each unique code. So basically the top record for each unique code. This postgres so I can't use SELECT TOP or something like that.
I think I should be using this as subquery inside of a main query but I'm not sure how. something like
SELECT *
FROM price
JOIN (insert my original query here) AS price2 ON price.code = price2.code
Any help would be greatly appreciated.
You can use the row_number() window function to do that.
select *
from (SELECT price.*,
row_number() over (partition by price.code order by price.date desc) as rn
FROM price
INNER JOIN parts ON (price.code = part.code )
WHERE price.type='01') x
where rn = 1
ORDER BY date DESC
(*) Note: I may have prefixed some of the columns incorrectly, as I'm not sure which column is in which table. I'm sure you can fix that.
In Postgres you can use DISTINCT ON:
SELECT DISTINCT ON(code) *
FROM price
INNER JOIN parts ON price.code = part.code
WHERE price.type='01'
ORDER BY code, "date" DESC
select distinct on (code)
code, p.type, p.date, p.price, p.file
from
price p
inner join
parts using (code)
where p.type='01'
order by code, p.date desc
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT