Inner join takes too much time, how can I speed it up? - sql

I have the following situation:
I had a table with a unique constraint on the serie_id and user_id fields. I dropped the constraint to try something, and now I have duplicated rows (i.e., two or more rows where the pair user_id and serie_id is equal).
To see the duplicated rows, I use this:
SELECT t1.id
FROM table_A t1
INNER JOIN table_A t2
ON t1.serie_id = t2.serie_id AND t1.user_id = t2.user_id
WHERE t1.id < t2.id
but the table has A LOT of data, so it takes too long. Is there a way to optimize it or speed it up?
Edit: I'm now using this query to get all the ids of the duplicated rows:
SELECT id
FROM table_A a
WHERE EXISTS (SELECT 1
FROM table_A b
WHERE a.user_id = b.user_id AND a.serie_id = b.serie_id
HAVING Count(*) > 1)
Order by id desc
It also takes a lot of time, more than half an hour.
Also, for each set of duplicated records I want to keep the original one. How can I exclude it from the results of this query?
I cannot use OVER or ROW_NUMBER as I saw in other comments; my version doesn't allow it.
Sample Data:
id serie_id user_id
1 100 111
2 100 222
3 100 222
4 58 222
5 100 115
6 100 222
I want to delete the first two rows corresponding to the pair serie_id = 100, user_id = 222,
so the output would be:
id serie_id user_id
1 100 111
4 58 222
5 100 115
6 100 222

You should define an index on the fields that you use in the inner join,
and also on the fields that you use in WHERE clauses.
You can enable "Include Actual Execution Plan" in SQL Server Management Studio. It suggests tips, such as missing indexes, for improving query performance.

To see the duplicate pair, you could use a query like this:
SELECT t1.serie_id, t1.user_id, COUNT(*) CNT
FROM table_A t1
GROUP BY t1.serie_id, t1.user_id
HAVING COUNT(*) > 1
And to return the actual rows, store the result in a temporary table and join it to the source table, like:
IF OBJECT_ID('tempdb.dbo.#tmp') IS NOT NULL DROP TABLE #tmp
CREATE TABLE #tmp ( serie_id INT, user_id INT, CNT INT)
INSERT INTO #tmp( serie_id, user_id, CNT )
SELECT t1.serie_id, t1.user_id, COUNT(*) CNT
FROM table_A t1
GROUP BY t1.serie_id, t1.user_id
HAVING COUNT(*) > 1
SELECT t1.*
FROM table_A t1
INNER JOIN #tmp tmp ON tmp.serie_id = t1.serie_id AND tmp.user_id = t1.user_id
Anyway, an index on the serie_id, user_id columns should help.
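The GROUP BY idea above can also drive the delete itself without OVER or ROW_NUMBER. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real server, the index name ix_table_A_pair is made up, and it keeps the highest id of each pair because that is what the expected output in the question shows.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_A (id INTEGER PRIMARY KEY, serie_id INT, user_id INT)"
)
conn.executemany(
    "INSERT INTO table_A (id, serie_id, user_id) VALUES (?, ?, ?)",
    [(1, 100, 111), (2, 100, 222), (3, 100, 222),
     (4, 58, 222), (5, 100, 115), (6, 100, 222)],
)

# Composite index on the pair, as suggested above (the index name is hypothetical).
conn.execute("CREATE INDEX ix_table_A_pair ON table_A (serie_id, user_id)")

# Delete every row that is not the highest id of its (serie_id, user_id) pair.
conn.execute(
    """
    DELETE FROM table_A
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM table_A
        GROUP BY serie_id, user_id
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM table_A ORDER BY id")]
print(remaining_ids)  # [1, 4, 5, 6]
```

Swapping MAX(id) for MIN(id) would keep the oldest row of each pair instead. On a large table it is probably worth running the inner SELECT on its own first to check how many rows would survive.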

Related

SQL Joining two tables and removing the duplicates from the two tables but without losing any duplicates from the tables itself

I want to join two tables and remove duplicates from both the tables but keeping any duplicate value found in the first table.
T1
Name
-----
A
A
B
C
T2
Name
----
A
D
E
Expected result
A - > FROM T1
A - > FROM T1
B
C
D
E
I tried UNION, but it removes all duplicates of 'A' from both tables.
How can I achieve this?
Filter T2 before UNION ALL
select col
from T1
union all
select col
from T2
where not exists (select 1 from T1 where T1.col = T2.col)
Assuming you want the number of duplicates from the table with the most repetitions for each value, you can do it with the ROW_NUMBER() windowing function, to eliminate duplicates by their sequence within the set of repetitions in each table.
SELECT Name FROM (
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
UNION
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
) x
ORDER BY Name
To see how this works out, we add two B rows to T2 then do this:
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
Name Row
A 1
A 2
B 1
C 1
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
Name Row
A 1
B 1
B 2
D 1
E 1
Now UNION them without ALL to combine and eliminate duplicates:
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
UNION
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
Name Row
A 1
A 2
B 1
B 2
C 1
D 1
E 1
The final query up top is then just eliminating the Row column and sorting the result, to ensure ascending order.
See SQL Fiddle for demo.
select * from T1
union all
select * from T2 where name not in (select distinct name from T1)
Sql Fiddle Demo
You should use "union all" instead of "union".
"union" removes duplicated records, while "union all" returns all of them.
For your result, because the "where" clause already filters out the table 2 rows that also exist in table 1, "union all" adds no unwanted duplicates and keeps the repeated rows of table 1:
select col1 from t1
union all
select col1 from t2 where t2.col1 not in (select t1.col1 from t1)
I don't know whether the following code is good practice, but it works:
select name from T1
UNION ALL
select name from T2 where name not in (select name from T1)
The above query filters T2 based on the values in T1, then combines the values of the two tables and shows the result.
I hope it helps, thanks.
Note: this may not be the best way to get the result, as it can affect performance.
I will update with a better solution after some research.
You want all names from T1 and all names from T2 except the names that are in T1.
So you can use UNION ALL for the 2 cases and the operator EXCEPT to filter the rows of T2:
SELECT Name FROM T1
UNION ALL
(
SELECT Name FROM T2
EXCEPT
SELECT Name FROM T1
)
See the demo.
Results:
> | Name |
> | :--- |
> | A |
> | A |
> | B |
> | C |
> | D |
> | E |
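To see why the answers above insist on UNION ALL, here is a small sketch that runs both variants on the sample data. It assumes Python's built-in sqlite3 module as a stand-in for the real database; the variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (Name TEXT)")
conn.execute("CREATE TABLE T2 (Name TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?)", [("A",), ("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO T2 VALUES (?)", [("A",), ("D",), ("E",)])

# UNION ALL keeps T1 untouched; the NOT EXISTS filter drops T2 rows already in T1.
keep_t1_duplicates = [r[0] for r in conn.execute(
    """
    SELECT Name FROM T1
    UNION ALL
    SELECT Name FROM T2
    WHERE NOT EXISTS (SELECT 1 FROM T1 WHERE T1.Name = T2.Name)
    ORDER BY Name
    """
)]

# Plain UNION de-duplicates the whole result, so the second 'A' from T1 is lost.
plain_union = [r[0] for r in conn.execute(
    "SELECT Name FROM T1 UNION SELECT Name FROM T2 ORDER BY Name"
)]

print(keep_t1_duplicates)  # ['A', 'A', 'B', 'C', 'D', 'E']
print(plain_union)         # ['A', 'B', 'C', 'D', 'E']
```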

Combining access sql tables in a query side by side

I have 2 tables containing different data, linked by a column "id", except the id is repeated multiple times
For example,
Table 1:
id grade
1 A
1 C
Table 2:
Id company
1 Alpha
1 Beta
1 Charlie
The number of rows would be inconsistent, table 1 may sometimes have more/less/equal rows compared to table 2. How am I able to combine/merge them into this outcome:
id grade company
1 A Alpha
1 C Beta
1 Charlie
I am using Microsoft access' query.
This is a real pain in MS Access. But you can do it by using a subquery to generate sequence numbers. Here is one method assuming that the rows are unique:
select id, max(grade) as grade, max(company) as company
from ((select id, grade, null as company,
(select count(*)
from table1 as tt1
where tt1.id = t1.id and tt1.grade <= t1.grade
) as seqnum
from table1 as t1
) union all
(select id, null as grade, company,
(select count(*)
from table2 as tt2
where tt2.id = t2.id and tt2.company <= t2.company
) as seqnum
from table2 as t2
)
) t12
group by id, seqnum;
This would be much simpler in almost any other database.
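The sequence-number trick can be tried outside Access. This is a sketch only, assuming Python's built-in sqlite3 module rather than Access itself, so the syntax differs slightly (no parentheses around the UNION ALL branches). The correlated COUNT(*) numbers each row within its id, and the GROUP BY on that number lines the two tables up side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, grade TEXT)")
conn.execute("CREATE TABLE table2 (id INT, company TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "A"), (1, "C")])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, "Alpha"), (1, "Beta"), (1, "Charlie")],
)

side_by_side = conn.execute(
    """
    SELECT id, MAX(grade) AS grade, MAX(company) AS company
    FROM (
        -- Number each grade within its id by counting grades that sort at or before it.
        SELECT t1.id AS id, t1.grade AS grade, NULL AS company,
               (SELECT COUNT(*) FROM table1 AS tt1
                WHERE tt1.id = t1.id AND tt1.grade <= t1.grade) AS seqnum
        FROM table1 AS t1
        UNION ALL
        -- Same numbering for companies.
        SELECT t2.id, NULL, t2.company,
               (SELECT COUNT(*) FROM table2 AS tt2
                WHERE tt2.id = t2.id AND tt2.company <= t2.company)
        FROM table2 AS t2
    ) AS t12
    GROUP BY id, seqnum
    ORDER BY id, seqnum
    """
).fetchall()

print(side_by_side)
# [(1, 'A', 'Alpha'), (1, 'C', 'Beta'), (1, None, 'Charlie')]
```

As the answer says, this relies on the values being unique within each id; repeated grades or companies would get the same sequence number and collapse into one row.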

PL/SQL pseudo Sequencing

I have the following scenario
ID SEQ
-- ---
123 2
123 4
What I want to be able to do is produce a list of these values and fill in the missing numbers up to a maximum (say 6, which I get from another source) where those numbers do not exist with the ID in the table.
ID NEW_SEQ
-- ---
123 1
123 2
123 3
123 4
123 5
123 6
Thanks
C
This generates a sequence of numbers from 1 through 6, cross joins with all the ids of the table to associate each of the sequence numbers with each id, then removes the already existing combinations.
SELECT t.id, s.seq
FROM (SELECT DISTINCT id FROM myTable) t
,(SELECT rownum AS seq
FROM dual
CONNECT BY LEVEL <= 6) s
MINUS
SELECT id, seq
FROM myTable
ORDER BY 1, 2
If you have a list of the numbers you want to use in OTHER_TABLE then I suggest you use an outer join, as in:
SELECT o.ID, o.NEW_SEQ
FROM OTHER_TABLE o
LEFT OUTER JOIN (SELECT ID, SEQ FROM MY_TABLE) t
ON (o.ID = t.ID AND o.NEW_SEQ = t.SEQ)
WHERE t.SEQ IS NULL
ORDER BY o.ID, o.NEW_SEQ
The outer join will include all rows from the first table (OTHER_TABLE, in this case) joined with the rows which exist from the second table (here, MY_TABLE). If there is a row in OTHER_TABLE which does not have a matching row in MY_TABLE, the fields from MY_TABLE will be NULL - thus, by checking for t.SEQ being NULL you're able to find the rows which exist in OTHER_TABLE but which are not in MY_TABLE.
SQLFiddle here.
Share and enjoy.
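Both answers boil down to "generate 1..N per id, then compare against what exists". A rough sketch of that idea follows, assuming Python's built-in sqlite3 module instead of Oracle, so a recursive CTE replaces CONNECT BY LEVEL; MAX_SEQ and the CTE name numbers are illustrative names, not part of the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INT, seq INT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(123, 2), (123, 4)])

MAX_SEQ = 6  # the upper bound, which the question says comes from another source

# numbers(seq) yields 1..MAX_SEQ; the cross join pairs every id with every number.
GENERATE = """
    WITH RECURSIVE numbers(seq) AS (
        SELECT 1
        UNION ALL
        SELECT seq + 1 FROM numbers WHERE seq < ?
    )
    SELECT ids.id, numbers.seq
    FROM (SELECT DISTINCT id FROM myTable) AS ids
    CROSS JOIN numbers
"""

# Every (id, seq) combination, whether or not it already exists.
full_list = conn.execute(
    GENERATE + " ORDER BY ids.id, numbers.seq", (MAX_SEQ,)
).fetchall()

# Only the combinations missing from myTable (outer join, keep the unmatched rows).
missing = conn.execute(
    GENERATE
    + """
    LEFT JOIN myTable AS t ON t.id = ids.id AND t.seq = numbers.seq
    WHERE t.seq IS NULL
    ORDER BY ids.id, numbers.seq
    """,
    (MAX_SEQ,),
).fetchall()

print(full_list)  # [(123, 1), (123, 2), (123, 3), (123, 4), (123, 5), (123, 6)]
print(missing)    # [(123, 1), (123, 3), (123, 5), (123, 6)]
```

The first result matches the NEW_SEQ listing in the question; the second is what the MINUS and outer join queries above return.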

Is there something equivalent to putting an order by clause in a derived table?

This is sybase 15.
Here's my problem.
I have 2 tables.
t1.jobid t1.date
------------------------------
1 1/1/2012
2 4/1/2012
3 2/1/2012
4 3/1/2012
t2.jobid t2.userid t2.status
-----------------------------------------------
1 100 1
1 110 1
1 120 2
1 130 1
2 100 1
2 130 2
3 100 1
3 110 1
3 120 1
3 130 1
4 110 2
4 120 2
I want to find all the people whose status for THEIR two most recent jobs is 2.
My plan was to take the top 2 of a derived table that joined t1 and t2 and was ordered by date backwards for a given user. So the top two would be the most recent for a given user.
So that would give me that individuals most recent job numbers. Not everybody is in every job.
Then I was going to make an outer query that joined against the derived table searching for status 2's with a having a sum(status) = 4 or something like that. That would find the people with 2 status 2s.
But sybase won't let me use an order by clause in the derived table.
Any suggestions on how to go about this?
I can always write a little program to loop through all the users, but I was gonna try to make one horrendous sql out of it.
Juicy one, no?
You could rank the rows in the subquery by adding an extra column using a window function. Then select the rows that have the appropriate ranks within their groups.
I've never used Sybase, but the documentation seems to indicate that this is possible.
With Table1 As
(
Select 1 As jobid, '1/1/2012' As [date]
Union All Select 2, '4/1/2012'
Union All Select 3, '2/1/2012'
Union All Select 4, '3/1/2012'
)
, Table2 As
(
Select 1 jobid, 100 As userid, 1 as status
Union All Select 1,110,1
Union All Select 1,120,2
Union All Select 1,130,1
Union All Select 2,100,1
Union All Select 2,130,2
Union All Select 3,100,1
Union All Select 3,110,1
Union All Select 3,120,1
Union All Select 3,130,1
Union All Select 4,110,2
Union All Select 4,120,2
)
, MostRecentJobs As
(
Select T1.jobid, T1.date, T2.userid, T2.status
, Row_Number() Over ( Partition By T2.userid Order By T1.date Desc ) As JobCnt
From Table1 As T1
Join Table2 As T2
On T2.jobid = T1.jobid
)
Select *
From MostRecentJobs As M2
Where Not Exists (
Select 1
From MostRecentJobs As M1
Where M1.userid = M2.userid
And M1.JobCnt <= 2
And M1.status <> 2
)
And M2.JobCnt <= 2
I'm using a number of features here which do exist in Sybase 15. First, I'm using common-table expressions both for my sample data and to clump my queries together. Second, I'm using the ranking function Row_Number to order the jobs by date.
It should be noted that in the example data you gave, no user satisfies the requirement of having their two most recent jobs both be of status "2".
__
Edit
If you are using a version of Sybase that does not support ranking functions (e.g. Sybase 15 prior to 15.2), then you need to simulate the ranking function using counts.
Create Table #JobRnks
(
jobid int not null
, userid int not null
, status int not null
, [date] datetime not null
, JobCnt int not null
, Primary Key ( jobid, userid, [date] )
)
Insert #JobRnks( jobid, userid, status, [date], JobCnt )
Select T1.jobid, T1.userid, T1.status, T1.[date], Count(T2.jobid)+ 1 As JobCnt
From (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T1
Left Join (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T2
On T2.userid = T1.userid
And T2.[date] > T1.[date] -- count the user's more recent jobs, so JobCnt = 1 is the newest
Group By T1.jobid, T1.userid, T1.status, T1.[date]
Select *
From #JobRnks As J1
Where Not Exists (
Select 1
From #JobRnks As J2
Where J2.userid = J1.userid
And J2.JobCnt <= 2
And J2.status <> 2
)
And J1.JobCnt <= 2
The reason for using the temp table here is for performance and ease of reading. Technically, you could plug in the query for the temp table into the two places used as a derived table and achieve the same result.
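The count-based ranking can be checked on the sample data without Sybase. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, stores the dates as ISO strings (reading the question's dates as month/day/year) so that string comparison orders them correctly, and the function name users_with_last_two_status_2 is made up. Unlike the NOT EXISTS form above, it also requires that the user really has two ranked jobs.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (an assumption).
jobs = [(1, "2012-01-01"), (2, "2012-04-01"), (3, "2012-02-01"), (4, "2012-03-01")]
assignments = [
    (1, 100, 1), (1, 110, 1), (1, 120, 2), (1, 130, 1),
    (2, 100, 1), (2, 130, 2),
    (3, 100, 1), (3, 110, 1), (3, 120, 1), (3, 130, 1),
    (4, 110, 2), (4, 120, 2),
]


def users_with_last_two_status_2(jobs, assignments):
    """Return the user ids whose two most recent jobs both have status 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (jobid INT, job_date TEXT)")
    conn.execute("CREATE TABLE t2 (jobid INT, userid INT, status INT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", jobs)
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)", assignments)
    rows = conn.execute(
        """
        SELECT r.userid
        FROM (
            -- rnk = 1 + number of the same user's jobs with a later date,
            -- which simulates Row_Number() ... Order By date Desc.
            SELECT a.userid, a.status,
                   (SELECT COUNT(*)
                    FROM t2 AS b
                    JOIN t1 AS bd ON bd.jobid = b.jobid
                    WHERE b.userid = a.userid
                      AND bd.job_date > ad.job_date) + 1 AS rnk
            FROM t2 AS a
            JOIN t1 AS ad ON ad.jobid = a.jobid
        ) AS r
        WHERE r.rnk <= 2
        GROUP BY r.userid
        HAVING COUNT(*) = 2 AND MIN(r.status) = 2 AND MAX(r.status) = 2
        ORDER BY r.userid
        """
    ).fetchall()
    conn.close()
    return [userid for (userid,) in rows]


print(users_with_last_two_status_2(jobs, assignments))  # []
```

The empty result agrees with the remark above that no user in the sample data qualifies.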

SQL question: Getting records based on datediff from record to record

Ok, got a tricky one here... If my data looks like this:
Table1
ID Date_Created
1 1/1/2009
2 1/3/2009
3 1/5/2009
4 1/10/2009
5 1/15/2009
6 1/16/2009
How do I get the records that are 2 days apart from each other? My end result set should be rows 1-3, and 5-6. Thanks!
SELECT l.*
FROM Table1 l
INNER JOIN Table1 r ON DATEDIFF(d, l.Date_Created, r.Date_Created) = 2
AND r.Date_Created = (SELECT TOP 1 Date_Created FROM Table1 WHERE Date_Created > l.Date_Created ORDER BY Date_Created)
select distinct t1.*
from Table1 t1
inner join Table1 t2
on abs(cast(t1.Date_Created - t2.Date_Created as float)) between 1 and 2
-- what does this give you?
select DISTINCT t1.id, t1.date_created, t2.id, t2.date_created from table1 t1, table1 t2 where datediff(dd,t1.date_created,t2.date_created) = 2 AND t1.id != t2.id ORDER BY t1.id;
Would this work?
select t1.id, t2.id
from table1 t1
join table1 t2
on t2.date_created - t1.date_created <= 2
I might suggest using programming code to do it. You want to collect groups of rows (separate groups). I don't think you can solve this using a single query (which would give you just one set of rows back).
If you want to get the rows which are WITHIN 'N' days apart, you can try this:
select t1.date_created, t2.date_created
from table1 t1, table1 t2
where t1.id <> t2.id and
t2.date_created-t1.date_created between 0 and N;
For example, as you said, if you want to get the rows which are WITHIN 2 days apart,
you can use the below:
select t1.date_created,t2.date_created
from table1 t1, table1 t2
where t1.id <> t2.id and
t2.date_created-t1.date_created between 0 and 2;
I hope this helps....
Regards,
Srikrishna.
A cursor will be fastest, but here is a SELECT query that will do it. Note that for "up to N" days apart instead of 2 you'll have to replace the table Two with a table of integers from 0 to N-1 (and the efficiency will get worse).
I'll admit it's not entirely clear what you want, but I'm guessing you want the ranges of rows that contain at least two rows in all and within which the successive rows are at most 2 days apart. If dates increase along with IDs, this should work.
with Two as (
select 0 as offset union all select 1
), r2(ID, Date_Created_o, dr) as (
select
ID, Date_Created+offset,
Date_Created + offset - dense_rank() over (
order by Date_Created+offset
) from Table1 cross join Two
)
select
min(ID) as start, max(ID) as finish
from r2
group by dr
having min(ID) < max(ID)
order by dr;
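For comparison, the cursor-style alternative mentioned above is easy to sketch procedurally. This is only an illustration in plain Python under the same reading of the question (runs of at least two rows whose successive dates are at most 2 days apart); the function name date_ranges and the max_gap_days parameter are made up, and the dates are read as month/day/year.

```python
from datetime import date


def date_ranges(rows, max_gap_days=2):
    """rows: iterable of (id, date) pairs.

    Returns (first_id, last_id) for each run of two or more rows in which
    successive dates are at most max_gap_days apart.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    ranges = []
    run = []
    for row in ordered:
        # A gap larger than the limit closes the current run.
        if run and (row[1] - run[-1][1]).days > max_gap_days:
            if len(run) > 1:
                ranges.append((run[0][0], run[-1][0]))
            run = []
        run.append(row)
    # Close the final run as well.
    if len(run) > 1:
        ranges.append((run[0][0], run[-1][0]))
    return ranges


rows = [
    (1, date(2009, 1, 1)), (2, date(2009, 1, 3)), (3, date(2009, 1, 5)),
    (4, date(2009, 1, 10)), (5, date(2009, 1, 15)), (6, date(2009, 1, 16)),
]
print(date_ranges(rows))  # [(1, 3), (5, 6)]
```

This gives rows 1-3 and 5-6, the result the question asks for.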