Max & Distinct based on several field

Based on the raw data below and the expected result, I need help coming up with the correct query.
Basically, I need the row with max(ID) for each group. Note that the RATING and TYPE fields can differ between rows, so a plain GROUP BY wouldn't work.
Thank you.

You seem to want the highest id for each NumId. You can do this using row_number():
select t.*
from (select t.*, row_number() over (partition by NumId order by id desc) as seqnum
from t
) t
where t.seqnum = 1;

Use NOT EXISTS to return a row if no other row has the same NumTitle but a higher ID value:
select t1.*
from tablename t1
where not exists (select 1 from tablename t2
where t2.NumTitle = t1.NumTitle
and t2.ID > t1.ID)
Or, JOIN version:
select t1.*
from tablename t1
join (select NumTitle, MAX(ID) as ID from tablename group by NumTitle) t2
on t2.NumTitle = t1.NumTitle and t2.ID = t1.ID
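
To check that the three approaches agree, here is a minimal sketch that runs them against an in-memory SQLite database through Python's sqlite3 module. The sample rows are hypothetical, since the asker's raw data is not shown, and SQLite (3.25 or later, for window functions) is only standing in for whatever database the question targets.

```python
import sqlite3

# Hypothetical sample data: the table contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (ID INTEGER PRIMARY KEY, NumTitle TEXT, RATING TEXT, TYPE TEXT);
INSERT INTO tablename VALUES
  (1, 'A', 'low',  'x'),
  (2, 'A', 'high', 'y'),
  (3, 'B', 'mid',  'x'),
  (4, 'B', 'low',  'z'),
  (5, 'C', 'high', 'x');
""")

queries = {
    # Rank rows inside each NumTitle by ID, newest first, and keep rank 1.
    "row_number": """
        SELECT ID, NumTitle, RATING, TYPE
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY NumTitle ORDER BY ID DESC) AS seqnum
              FROM tablename t) AS ranked
        WHERE seqnum = 1
        ORDER BY ID""",
    # Keep a row only if no other row in the same group has a higher ID.
    "not_exists": """
        SELECT t1.* FROM tablename t1
        WHERE NOT EXISTS (SELECT 1 FROM tablename t2
                          WHERE t2.NumTitle = t1.NumTitle AND t2.ID > t1.ID)
        ORDER BY t1.ID""",
    # Join back to the per-group maximum; the aggregate needs an alias.
    "join_max": """
        SELECT t1.* FROM tablename t1
        JOIN (SELECT NumTitle, MAX(ID) AS ID FROM tablename GROUP BY NumTitle) t2
          ON t2.NumTitle = t1.NumTitle AND t2.ID = t1.ID
        ORDER BY t1.ID""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return one full row per NumTitle, including the RATING and TYPE values that belong to the highest ID.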

Related

How to implement a LEFT OUTER JOIN CLAUSE after WITH AS?

Currently trying to figure out how to implement a SQL LEFT OUTER JOIN while using the SQL WITH AS clause. My code breaks down into 3 SELECT statements while using the same table, then using LEFT OUTER JOIN to merge another table on the id.
I need 3 SELECT statements before joining because I need a SELECT statement to grab the needed columns, ROW RANK the time, and set WHERE clause for the ROW RANK.
SELECT *
(
WITH employee AS
(
SELECT id, name, department, code, time, reporttime, scheduled_time
FROM table1 AS a
WHERE department = "END"
),
employe_v2 as
(
SELECT address
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY time desc, reporttime desc, scheduled_time desc) AS row_rank
FROM table1 AS b
)
SELECT *
FROM employee, employee_v2
WHERE row_rank = 1
) t1
LEFT OUTER JOIN
(
SELECT b.id, b.new_code, b.date
FROM table2 AS b
WHERE b.newcode != "A"
) t2
ON t1.id = t2.id
Group BY t1.id, t1.name, t1.department, t1.code, t1.time, t1.reporttime,
t1.scheduled_time, t1.row_rank, t2.id, t2.new_code, t2.date
How I could fix my code?

I'm not sure GROUP BY is needed; I see no aggregation whatsoever.
If it is something you need, you can add it at the end of the final select, and of course you then have to take care of the columns and aggregation in the select list.
Nevertheless, you can simplify your query as below:
with employee as (
select * from (
select id, name, department, code, time, reporttime, scheduled_time, address
,row_number() over (partition by id order by time desc, reporttime desc, scheduled_time desc) AS row_rank
from table1
) t where row_rank =1
)
select t1.*, t2.id, t2.new_code, t2.date
from employee t1
left join table2 as t2
on t1.id = t2.id
and t2.new_code != 'A' -- filter in ON so employees without a match are kept
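
A filter on the right-hand table of a LEFT JOIN behaves differently depending on whether it sits in WHERE or in ON. The sketch below illustrates this with Python's sqlite3 module; the tiny tables and their values are hypothetical, and SQLite is only a stand-in for the asker's database.

```python
import sqlite3

# Hypothetical data: ids 1 and 2 have a table2 row, id 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
CREATE TABLE table2 (id INTEGER, new_code TEXT);
INSERT INTO table1 VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO table2 VALUES (1, 'A'), (2, 'B');
""")

# Filter in WHERE: rows where t2.new_code is NULL fail the comparison,
# so the LEFT JOIN effectively becomes an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT t1.id, t2.new_code
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    WHERE t2.new_code != 'A'
    ORDER BY t1.id
""").fetchall()

# Filter in ON: unmatched table1 rows survive with NULLs on the right.
filter_in_on = conn.execute("""
    SELECT t1.id, t2.new_code
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id AND t2.new_code != 'A'
    ORDER BY t1.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Which form is right depends on whether employees without a qualifying table2 row should appear in the result.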

where column in from another select results with limit (mysql/mariadb)

When I run this query, it returns all rows whose id exists in the select from table2:
SELECT * FROM table1 WHERE id in (
SELECT id FROM table2 where name ='aaa'
)
But when I add LIMIT or BETWEEN to the second select:
SELECT * FROM table1 WHERE id in (
SELECT id FROM table2 where name ='aaa' limit 4
)
it returns this error:
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

You are using LIMIT without an ORDER BY. This is generally not recommended because that returns an arbitrary set of rows -- and those can change from one execution to another.
You can convert this to a JOIN -- fortunately. If id is not duplicated in table2:
SELECT t1.*
FROM table1 t1 JOIN
(SELECT t2.id
FROM table2 t2
WHERE t2.name = 'aaa'
LIMIT 4
) t2
USING (id);
If id can be duplicated in table2, then:
SELECT t1.*
FROM table1 t1 JOIN
(SELECT DISTINCT t2.id
FROM table2 t2
WHERE t2.name = 'aaa'
LIMIT 4
) t2
USING (id);
Another fun way uses LIMIT:
SELECT t1.*
FROM table1 t1
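WHERE t1.id <= (SELECT t2.id
FROM table2 t2
WHERE t2.name = 'aaa'
ORDER BY t2.id
LIMIT 1 OFFSET 3
);
LIMIT is allowed in a scalar subquery, so this compares against a single value (the fourth-smallest matching id) instead of using ANY. Note that it keeps every table1 row at or below that id, so it only equals the IN version when all of those table1 ids also match in table2.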
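
The derived-table JOIN rewrite can be tried out with a small sketch using Python's sqlite3 module. The rows are hypothetical, an ORDER BY is added so the LIMIT picks a deterministic set, and the alias `picked` is my own naming. SQLite itself accepts LIMIT inside an IN subquery, so this only demonstrates the rewrite, not the MariaDB error.

```python
import sqlite3

# Hypothetical data: id 2 appears twice in table2, id 3 has a different name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE table2 (id INTEGER, name TEXT);
INSERT INTO table1 VALUES (1,'one'),(2,'two'),(3,'three'),(4,'four'),(5,'five'),(6,'six');
INSERT INTO table2 VALUES (1,'aaa'),(2,'aaa'),(2,'aaa'),(3,'bbb'),(4,'aaa'),(5,'aaa'),(6,'aaa');
""")

# The LIMIT lives in a derived table, which is then joined to table1.
# DISTINCT prevents the duplicate id 2 from using up two of the four slots.
rows = conn.execute("""
    SELECT t1.id, t1.label
    FROM table1 t1
    JOIN (SELECT DISTINCT t2.id
          FROM table2 t2
          WHERE t2.name = 'aaa'
          ORDER BY t2.id
          LIMIT 4) picked USING (id)
    ORDER BY t1.id
""").fetchall()

print(rows)
```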

You can use an analytic function such as ROW_NUMBER() to return one row from the subquery. This way, I suppose, no "too many rows" issue would occur:
SELECT * FROM
(
SELECT t1.*,
ROW_NUMBER() OVER (ORDER BY t2.id DESC) AS rn
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id
WHERE t2.name ='aaa'
) t
WHERE rn = 1
P.S.: By the way, the id columns are expected to be the primary keys of your tables, aren't they?
Update (depending on the need you described in the comment): consider using:
SELECT * FROM
(
SELECT j.*,
ROW_NUMBER() OVER (ORDER BY j.id DESC) AS rn2
FROM job_forum j
JOIN
( SELECT t2.id,
ROW_NUMBER() OVER (PARTITION BY t2.id ORDER BY t2.id DESC) AS rn1
FROM table2 t2
WHERE t2.name ='aaa' ) t2
ON t2.id = j.id
WHERE t2.rn1 = 1
) jj
WHERE rn2 <= 10

#1093 Table 'table' is specified twice, both as a target for 'DELETE' and as a separate source for data

I am not getting the desired result from this MySQL query.
I have searched a lot but didn't find a solution.
DELETE FROM table1 WHERE username NOT IN (select t1.id from table1 as t1
inner join table1 as t2 on t1.username = t2.username and t1.id <= t2.id
group by t1.username , t1.id having count(*) <= 5 order by t1.username , t1.id desc);
Output is as follows:-

This should work for you:
delete from
table1
where
id in ( select
id
from
( select
*,
row_number() over( partition by
username
order by
id desc) as rn
from
table1) t
where
rn > 5)

You seem to want to keep the most recent five ids for each user name. I think the simplest method uses window functions:
delete t1
from table1 t1 join
(select t1.*, row_number() over (partition by username order by id desc) as seqnum
from table1 t1
) tt1
on t1.username = tt1.username and
t1.id = tt1.id
where tt1.seqnum > 5;
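
As a quick check of the "keep the five most recent ids per username" logic, here is a sketch using Python's sqlite3 module. SQLite does not support MySQL's multi-table `delete t1 from ... join` syntax, so the sketch uses the `id in (...)` form with an aliased derived table; the usernames and ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, username TEXT)")
# Hypothetical data: alice owns ids 1..7, bob owns ids 8..10.
conn.executemany(
    "INSERT INTO table1 (id, username) VALUES (?, ?)",
    [(i, "alice") for i in range(1, 8)] + [(i, "bob") for i in range(8, 11)],
)

# Number each user's rows from newest id to oldest and delete rank 6 and beyond.
conn.execute("""
    DELETE FROM table1
    WHERE id IN (SELECT id
                 FROM (SELECT id,
                              ROW_NUMBER() OVER (PARTITION BY username
                                                 ORDER BY id DESC) AS rn
                       FROM table1) AS ranked
                 WHERE rn > 5)
""")

remaining = conn.execute(
    "SELECT username, id FROM table1 ORDER BY username, id"
).fetchall()
print(remaining)
```

Only alice's two oldest rows (ids 1 and 2) are removed; bob has fewer than five rows, so all of his are kept.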

How to find difference in rows for two tables in DB2 or SQL Server

In the table1 I have 1421144 rows and table2 has 1421134 rows.
I tried this query, but I don't get any rows returned.
select table1.ID
from table1
where ID not in (select ID from table2)
I have also used this query:
select ID from table1
except
select ID from table2
But I don't get any rows. Please help: if table1 has duplicates, how can I find them?

Assuming ids are unique, you can use full outer join in either database:
select coalesce(t1.id, t2.id) as id,
(case when t1.id is null then 'T2 only' else 'T1 only' end)
from t1 full outer join
t2
on t1.id = t2.id
where t1.id is null or t2.id is null;
It is quite possible that the two tables have the same sets of ids, but there are duplicates. Try this:
select t1.id, count(*)
from t1
group by t1.id
having count(*) > 1;
and
select t2.id, count(*)
from t2
group by t2.id
having count(*) > 1;
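
The sketch below shows why both of the asker's queries can come back empty even though the row counts differ. It uses Python's sqlite3 module with hypothetical ids; the table names t1 and t2 follow the queries above.

```python
import sqlite3

# Hypothetical data: both tables hold the same set of ids (1, 2, 3),
# but t1 repeats some of them, so its row count is higher.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER);
CREATE TABLE t2 (id INTEGER);
INSERT INTO t1 VALUES (1), (2), (2), (3), (3), (3);
INSERT INTO t2 VALUES (1), (2), (3);
""")

# Set-based comparisons ignore how often an id occurs, so both are empty.
not_in_rows = conn.execute(
    "SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2)"
).fetchall()
except_rows = conn.execute(
    "SELECT id FROM t1 EXCEPT SELECT id FROM t2"
).fetchall()

# Grouping by id exposes the duplicates that explain the count difference.
dups = conn.execute("""
    SELECT id, COUNT(*) FROM t1
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(not_in_rows, except_rows, dups)
```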

If you have duplicates, try:
WITH Dups AS(
SELECT ID, COUNT(ID) OVER (PARTITION BY ID) AS DupCount
FROM Table1)
SELECT *
FROM Dups
WHERE DupCount > 1;
If you need to delete the dups, you can use the following syntax:
WITH Dups AS(
SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS DupCount
FROM Table1)
DELETE FROM Dups
WHERE DupCount > 1;
Obviously, however, check the data before you run a DELETE statement you got from a random on the internet. ;)

I guess you have a data type mismatch between the two tables. Cast both IDs to integers and try your first query again:
select table1.ID from table1
where cast(ID as int) not in (select cast(ID as int) from table2)
If the IDs are stored in a format other than int, cast them to varchar and
try with that data type instead.
NOT IN can take longer to execute; a left join is an alternative:
select t1Id from
(
select t1.id t1Id, t2.id t2Id from table1 t1 left join table2 t2
on cast(t1.id as int) = cast(t2.id as int)
) x where t2Id is null

tsql: alternative to select subquery in join

this is my table layout simplified:
table1: pID (pkey), data
table2: rowID (pkey), pID (fkey), data, date
I want to select some rows from table1 joining one row from table2 per pID for the most recent date for that pID.
I currently do this with the following query:
SELECT * FROM table1 as a
LEFT JOIN table2 AS b ON b.rowID = (SELECT TOP(1) rowID FROM table2 WHERE pID = a.pID ORDER BY date DESC)
This way of working is slow, probably because it has to run a subquery for each row of table1. Is there a way to improve performance on this or do it another way?

You can try something along these lines: use the subquery to get the latest date per pID (grouping by pID), then join that with the first table. This way the subquery does not have to be executed for each row of Table1, which should result in better performance:
Select *
FROM Table1 a
INNER JOIN
(
SELECT pID, Max(Date) AS MaxDate FROM Table2
GROUP BY pID
) b
ON a.pID = b.pID
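I have provided the sample SQL for one column using GROUP BY. If you need additional columns from Table2, join Table2 back on pID and the maximum date rather than adding them to the GROUP BY clause, since extra grouping columns would change the groups. Hope this helps.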

Use the code below; note that I added ORDER BY Date DESC to get the most recent data:
select *
from table1 a
inner join table2 b on a.pID=b.pID
where b.rowID in(select top(1) t.rowID from table2 t where t.pID=a.pID order by t.Date desc)

I am using the code below in a similar scenario (I adapted it to your example):
SELECT b.*
FROM table1 AS a
left outer join (
SELECT t.*
FROM table2 t
inner join (
SELECT pID, max(date) as date
FROM table2
WHERE date <= <max_date>
group by pID
) m ON t.pID = m.pID AND t.date = m.date
) b ON a.pID = b.pID
The only problem with this approach is that you have to make sure the dates don't repeat for a given pID.

You can do this with the row_number() function and a subquery:
SELECT t1.*
FROM table1 t1 LEFT JOIN
(select t2.*, row_number() over (partition by pId order by date desc, rowId desc) as seqnum
from table2 t2
) t2
on t1.pId = t2.pId and t2.seqnum = 1;

Use the ROW_NUMBER() function to get a column saying which id of each row in table 2 is the first (As partitioned by the pID, and ordered by the rowDate descending)
Example:
WITH cte AS
(
SELECT
rowID AS t2RowId,
ROW_NUMBER() OVER (PARTITION BY pID ORDER BY rowDate DESC) AS rowNum
FROM table2 t2
) -- gets the t2RowIds + a column which says which is the latest for each pID
SELECT t1.*, t2.*
FROM table1 t1
LEFT JOIN
(
table2 t2
JOIN cte ON t2.rowID = cte.t2RowId AND cte.rowNum = 1
) ON t1.pID = t2.pID
This is guaranteed to only return 1 item from table2 per pID, even if multiple items have the same date. You should of course ensure that the date column is indexed in table 2 for quick performance (ideally an index that also covers the PrimaryID of table2)
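
To see the one-row-per-pID behaviour with tied dates, here is a sketch using Python's sqlite3 module in place of SQL Server. The column names follow the question's layout (rowID, pID, data, date), the rows are hypothetical, and the rowID tie-breaker in the ORDER BY is my own assumption about how ties should be resolved.

```python
import sqlite3

# Hypothetical data: pID 1 has two dates, pID 2 has two rows on the same
# date, and pID 3 has no rows in table2 at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (pID INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE table2 (rowID INTEGER PRIMARY KEY, pID INTEGER, data TEXT, date TEXT);
INSERT INTO table1 VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
INSERT INTO table2 VALUES
  (10, 1, 'old',  '2020-01-01'),
  (11, 1, 'new',  '2020-02-01'),
  (12, 2, 'tieA', '2020-03-01'),
  (13, 2, 'tieB', '2020-03-01');
""")

# Rank table2 rows per pID by date, newest first; rowID breaks ties so
# exactly one row per pID gets seqnum = 1.
latest = conn.execute("""
    SELECT t1.pID, t2.rowID, t2.data
    FROM table1 t1
    LEFT JOIN (SELECT x.*,
                      ROW_NUMBER() OVER (PARTITION BY pID
                                         ORDER BY date DESC, rowID DESC) AS seqnum
               FROM table2 x) t2
      ON t1.pID = t2.pID AND t2.seqnum = 1
    ORDER BY t1.pID
""").fetchall()

print(latest)
```

Because the ranked subquery is evaluated once rather than per outer row, this avoids the correlated TOP(1) lookup from the question.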
