Compare table and find elements with not matching count - sql

I've two tables
Table1
And Table2
Now I want the RequestIds from Table1, together with their counts, whose count differs from the count in Table2. For example, the output should be
I can get individual count and RequestId from both the tables by the following query:
select RequestId, Count(RequestId) AS [Count] from Table1 group by RequestId
But how do I compare both tables in a single query? Any help would be appreciated. I would prefer to avoid looping unless it is the only way, as there are many records in both tables; the images shared here are just for illustration.

Use your queries as subqueries in the FROM clause. Outer join table2, because it does not have rows for every request ID:
select t1.requestid, t1.cnt - coalesce(t2.cnt, 0) as diff
from (select requestid, count(*) as cnt from table1 group by requestid) t1
left join (select requestid, count(*) as cnt from table2 group by requestid) t2
on t2.requestid = t1.requestid
where (t1.cnt - coalesce(t2.cnt, 0)) > 0;
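To see how the outer join and COALESCE behave, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows are made up for illustration (the question's images are not available), and the ORDER BY is only there to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (requestid INTEGER);
    CREATE TABLE table2 (requestid INTEGER);
    -- hypothetical sample rows, not the data from the question
    INSERT INTO table1 VALUES (1), (1), (1), (2), (2), (3);
    INSERT INTO table2 VALUES (1), (2), (2);
""")

query = """
    SELECT t1.requestid, t1.cnt - COALESCE(t2.cnt, 0) AS diff
    FROM (SELECT requestid, COUNT(*) AS cnt FROM table1 GROUP BY requestid) t1
    LEFT JOIN (SELECT requestid, COUNT(*) AS cnt FROM table2 GROUP BY requestid) t2
      ON t2.requestid = t1.requestid
    WHERE t1.cnt - COALESCE(t2.cnt, 0) > 0
    ORDER BY t1.requestid
"""
count_diffs = conn.execute(query).fetchall()
print(count_diffs)  # [(1, 2), (3, 1)]
```

Request 3 has no rows in table2, so its count there is NULL; COALESCE turns that into 0 and the row is kept. Request 2 has the same count in both tables and is filtered out.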

If I understood you correctly, you can use your query to get the count per RequestId in each table, join the two results, keep the rows where the counts differ, and do the simple math. Because of the left join, a RequestId missing from Table2 has a NULL count, so treat it as 0:
select t1.RequestId, (t1.[Count] - coalesce(t2.[Count], 0)) as [count]
from
(select RequestId, Count(RequestId) AS [Count] from Table1 group by RequestId) t1
left join (select RequestId, Count(RequestId) AS [Count] from Table2 group by RequestId) t2
on t1.RequestId = t2.RequestId
where t1.[Count] <> coalesce(t2.[Count], 0)

Related

Fastest Query to find records which have equal values and specific different value

I have a table transactions with the following columns:
transactionId, systemId, subId and type
I need to find all transactionIds that have
subId and type as equals but different systemId
I tried the following query but I am not sure that it is the fastest query to use:
SELECT DISTINCT transactionId, T1.systemId system1,
T2.systemId system2, T1.subId
FROM transactions T1
INNER JOIN transactions T2
WHERE T1.subId = T2.subId
AND T1.type = T2.type
AND T1.systemId != T2.systemId
Simply use a GROUP BY with a HAVING clause:
SELECT DISTINCT transactionId
FROM transactions
GROUP BY transactionId, subId, type
HAVING COUNT(DISTINCT systemId) > 1
Many factors affect execution efficiency, so it is worth writing several variants of the query and comparing them, such as:
select *
from (
select t1.*,
count(distinct t1.systemId) over(partition by t1.type, t1.subId) cot
from transactions t1
) t1
where t1.cot > 1
;
select *
from transactions t1
where exists(
select *
from (
select v1.type, v1.subId
from transactions v1
group by v1.type, v1.subId
having count(distinct v1.systemId) > 1
) vv1
where vv1.type = t1.type
and vv1.subId = t1.subId
)
;
Depending on the DBMS you are using, there may be other ways to write the query.
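As a rough illustration of the GROUP BY / HAVING idea, the sketch below finds the (subId, type) groups that span more than one systemId and joins them back to get the transactionIds. It runs on in-memory SQLite from Python; the rows are invented, and it assumes transactionId is unique per row, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transactionId INTEGER PRIMARY KEY,
        systemId INTEGER, subId INTEGER, type TEXT
    );
    -- hypothetical sample rows
    INSERT INTO transactions VALUES
        (1, 10, 100, 'A'),
        (2, 20, 100, 'A'),  -- same subId/type as 1, different systemId
        (3, 10, 200, 'A'),
        (4, 10, 200, 'A'),  -- same subId/type as 3, same systemId
        (5, 30, 100, 'B');
""")

query = """
    SELECT t.transactionId
    FROM transactions t
    JOIN (SELECT subId, type
          FROM transactions
          GROUP BY subId, type
          HAVING COUNT(DISTINCT systemId) > 1) g
      ON g.subId = t.subId AND g.type = t.type
    ORDER BY t.transactionId
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1, 2]
```

The grouped subquery scans the table once, instead of pairing every row with every other row as the self-join in the question does.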

combine queries into one select statement

T1:
T2:
Expected result
Query 1:
select name,
sum(qty) as total,
count(distinct ids) as head
from t1
group by 1
Query 2:
select
count(distinct ids) as staff
from t2
join t1
on t1.id = t2.id
and files>0
where
t2.type='Mutual'
Query 1 gives name,total,head,files
Query 2 gives staff
But how do I combine both into a single query? Redshift doesn't seem to support "staff=(select......)" in a select statement.
Example that doesn't work:
select name,
sum(qty) as total,
count(distinct ids) as head,
staff = (select
count(distinct ids)
from t2
join t1
on t1.id = t2.id
and files>0
where
t2.type='Mutual')
from t1
group by 1
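One possible direction, sketched under heavy assumptions: "staff = (...)" is SQL Server's alias syntax, whereas the standard form puts the alias after the expression, "(...) AS staff". The example below shows that form on in-memory SQLite from Python. The table layouts (t1 with id, name, ids, qty, files; t2 with id, type) and the rows are guesses, since the question's tables are not shown, and this has not been checked against Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schemas and rows; the real T1/T2 are not shown in the question
    CREATE TABLE t1 (id INTEGER, name TEXT, ids INTEGER, qty INTEGER, files INTEGER);
    CREATE TABLE t2 (id INTEGER, type TEXT);
    INSERT INTO t1 VALUES (1, 'x', 101, 5, 1), (2, 'x', 102, 3, 0), (3, 'y', 103, 4, 2);
    INSERT INTO t2 VALUES (1, 'Mutual'), (2, 'Mutual'), (3, 'Other');
""")

query = """
    SELECT name,
           SUM(qty) AS total,
           COUNT(DISTINCT ids) AS head,
           -- uncorrelated scalar subquery, aliased with AS instead of "staff = (...)"
           (SELECT COUNT(DISTINCT a.ids)
            FROM t2 b
            JOIN t1 a ON a.id = b.id AND a.files > 0
            WHERE b.type = 'Mutual') AS staff
    FROM t1
    GROUP BY name
    ORDER BY name
"""
combined = conn.execute(query).fetchall()
print(combined)  # [('x', 8, 2, 1), ('y', 4, 1, 1)]
```

Because the subquery does not reference the outer row, the same staff value is repeated on every output row.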

tsql: alternative to select subquery in join

this is my table layout simplified:
table1: pID (pkey), data
table2: rowID (pkey), pID (fkey), data, date
I want to select some rows from table1 joining one row from table2 per pID for the most recent date for that pID.
I currently do this with the following query:
SELECT * FROM table1 as a
LEFT JOIN table2 AS b ON b.rowID = (SELECT TOP(1) rowID FROM table2 WHERE pID = a.pID ORDER BY date DESC)
This way of working is slow, probably because it has to run a subquery for each row of table1. Is there a way to improve performance on this, or to do it another way?
You can try something along these lines: use a subquery to get the latest date per pID (grouping by pID), then join that with the first table. This way the subquery does not have to be executed for each row of Table1, which should result in better performance:
Select *
FROM Table1 a
INNER JOIN
(
SELECT pID, Max(Date) AS MaxDate FROM Table2
GROUP BY pID
) b
ON a.pID = b.pID
I have provided sample SQL for one column using GROUP BY; in case you need additional columns, add them to the GROUP BY clause. Hope this helps.
Use the code below, and note that I added ORDER BY Date DESC to get the most recent data:
select *
from table1 a
inner join table2 b on a.pID=b.pID
where b.rowID in(select top(1) t.rowID from table2 t where t.pID=a.pID order by Date desc)
I am using the code below in a similar scenario (I adapted it to your example):
SELECT b.*
FROM table1 AS a
left outer join (
SELECT t.*
FROM table2 t
inner join (
SELECT pID, max(date) as date
FROM table2
WHERE date <= <max_date>
group by pID
) m ON t.pID = m.pID AND t.date = m.date
) b ON a.pID = b.pID
The only problem with this approach is that you have to make sure the dates don't repeat within a pID.
You can do this with the row_number() function and a subquery:
SELECT t1.*
FROM table1 t1 LEFT JOIN
(select t2.*, row_number() over (partition by pId order by rowId desc) as seqnum
from table2 t2
) t2
on t1.pId = t2.pId and t2.seqnum = 1;
Use the ROW_NUMBER() function to get a column saying which row in table2 is the first for each pID (partitioned by pID and ordered by date descending).
Example:
WITH cte AS
(
SELECT
rowID AS t2RowId,
ROW_NUMBER() OVER (PARTITION BY pID ORDER BY date DESC) AS rowNum
FROM table2 t2
) -- gets the t2RowIds + a column which says which is the latest for each pID
SELECT t1.*, t2.*
FROM table1 t1
LEFT JOIN
(
table2 t2
JOIN cte ON t2.rowID = cte.t2RowId AND cte.rowNum = 1
) ON t1.pID = t2.pID
This is guaranteed to only return 1 item from table2 per pID, even if multiple items have the same date. You should of course ensure that the date column is indexed in table 2 for quick performance (ideally an index that also covers the PrimaryID of table2)
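The ROW_NUMBER() approach can be tried out with the small sketch below, which uses in-memory SQLite from Python (window functions need SQLite 3.25 or later). The rows are invented, and the extra "rowID DESC" tie-breaker is an addition of this sketch so that the result is deterministic when two rows share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pID INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE table2 (rowID INTEGER PRIMARY KEY, pID INTEGER, data TEXT, date TEXT);
    -- hypothetical sample rows
    INSERT INTO table1 VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO table2 VALUES
        (10, 1, 'old',  '2020-01-01'),
        (11, 1, 'new',  '2020-02-01'),
        (12, 2, 'tieA', '2020-03-01'),
        (13, 2, 'tieB', '2020-03-01');  -- same date as row 12
""")

query = """
    SELECT t1.pID, t2.rowID
    FROM table1 t1
    LEFT JOIN (
        SELECT rowID, pID,
               ROW_NUMBER() OVER (PARTITION BY pID
                                  ORDER BY date DESC, rowID DESC) AS rn
        FROM table2
    ) t2 ON t2.pID = t1.pID AND t2.rn = 1
    ORDER BY t1.pID
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)  # [(1, 11), (2, 13), (3, None)]
```

pID 2 has two rows with the same date but still yields exactly one match, and pID 3 has no rows in table2 and is kept with a NULL thanks to the left join.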

Optimizing a SELECT with sub SELECT query in Oracle

Select id,
(Select sum(totalpay)
from Table2 t
where t.id = a.id
and t.transamt > 0
and t.paydt BETWEEN TRUNC(sysdate-0-7) and TRUNC(sysdate-0-1)) As Pay
from Table1 a
In spite of having indexes on transamt, paydt and id, the cost of the sub-query on Table2 is very expensive and requires a FULL TABLE scan.
Can this sub-query be optimized in any other way?
Please help.
Select t.id,
sum(totalpay) as Pay
from Table2 t join Table1 a
on t.id = a.id
where t.transamt > 0
and t.paydt BETWEEN TRUNC(sysdate-0-7) and TRUNC(sysdate-0-1)
group by t.id
Try this:
Select a.id,
pay.totalpay
from Table1 a,
(Select t.id, sum(totalpay) totalpay
from Table2 t
where t.transamt > 0
and t.paydt BETWEEN TRUNC(sysdate-0-7) and TRUNC(sysdate-0-1)
group by t.id
) pay
where a.id = pay.id
Push the GROUP BY on the joining column (id in this example) into a subquery, so that the results for all values in Table2 are calculated once, and then join that subquery with Table1.
In the original query, the result is calculated separately for every row of Table1, reading the full Table2 each time.
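The pre-aggregation idea can be compared against the original correlated subquery with the sketch below, run on in-memory SQLite from Python. It uses made-up rows and fixed date literals in place of the TRUNC(sysdate ...) expressions, and it uses a LEFT JOIN (a choice of this sketch) so that ids without qualifying payments still appear with NULL, as they do in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (id INTEGER, totalpay INTEGER, transamt INTEGER, paydt TEXT);
    -- hypothetical sample rows
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES
        (1, 100, 5, '2024-01-03'),
        (1,  50, 5, '2024-01-05'),
        (1,  70, 0, '2024-01-04'),  -- excluded: transamt is not > 0
        (2,  30, 1, '2023-12-01');  -- excluded: outside the date range
""")

# Original shape: one correlated subquery per row of Table1.
correlated = conn.execute("""
    SELECT a.id,
           (SELECT SUM(t.totalpay) FROM Table2 t
            WHERE t.id = a.id AND t.transamt > 0
              AND t.paydt BETWEEN '2024-01-01' AND '2024-01-07') AS pay
    FROM Table1 a ORDER BY a.id
""").fetchall()

# Rewritten shape: aggregate Table2 once, then join.
preaggregated = conn.execute("""
    SELECT a.id, p.totalpay AS pay
    FROM Table1 a
    LEFT JOIN (SELECT t.id, SUM(t.totalpay) AS totalpay
               FROM Table2 t
               WHERE t.transamt > 0
                 AND t.paydt BETWEEN '2024-01-01' AND '2024-01-07'
               GROUP BY t.id) p ON p.id = a.id
    ORDER BY a.id
""").fetchall()
print(preaggregated)  # [(1, 150), (2, None), (3, None)]
```

With an inner join instead, ids 2 and 3 would disappear from the result, which may or may not be what you want.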

Select from table not in another table

I have two tables, each with the following fields: IDnumber, SectionNumber, Date. There is overlapping information in the two tables.
How do I select only rows that do NOT overlap (ie. in one table but not the other)?
You can use a NOT IN in your WHERE clause.
SELECT IDnumber, SectionNumber, Date
FROM table1
WHERE IDnumber NOT IN (SELECT IDnumber FROM table2)
Or you can use NOT EXISTS:
SELECT IDnumber, SectionNumber, Date
FROM table1 t1
WHERE NOT EXISTS (SELECT IDnumber FROM table2 t2 WHERE t1.IDnumber = t2.IDnumber)
Which DBMS?
If SQL Server, then it's almost what you wrote in the title...
SELECT *
FROM Table1
WHERE IDnumber NOT IN (SELECT IDnumber FROM Table2)
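One difference between the two forms is worth knowing: if the subquery can return a NULL, NOT IN matches nothing, while NOT EXISTS is unaffected. The sketch below shows this on in-memory SQLite from Python with invented rows; it assumes IDnumber is nullable in table2, which the question does not say.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (IDnumber INTEGER);
    CREATE TABLE table2 (IDnumber INTEGER);
    -- hypothetical sample rows; note the NULL in table2
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (NULL);
""")

not_in_rows = conn.execute("""
    SELECT IDnumber FROM table1
    WHERE IDnumber NOT IN (SELECT IDnumber FROM table2)
    ORDER BY IDnumber
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT IDnumber FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.IDnumber = t1.IDnumber)
    ORDER BY IDnumber
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

If the column is declared NOT NULL, or the subquery filters out NULLs, both forms return the same rows.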
If you want to compare multiple columns, you need an outer join:
select table1.*
from table1 left outer join
table2
on table1.id = table2.id and
table1.SectionNumber = table2.SectionNumber and
table1.date = table2.date
where table2.id is null
In the case where you might have many matches between the tables, then the join can be inefficient. Assuming you only want those three fields, you can use a trick that avoids the join:
select id, SectionNumber, date
from ((select 0 as IsTable2, id, SectionNumber, date
from table1
) union all
(select 1 as IsTable2, id, SectionNumber, date
from table2
)
) t
group by id, SectionNumber, date
having max(isTable2) = 0
SELECT *
FROM Table1 t1 left join Table2 t2
on t1.id=t2.id
where t2.id is null