Better SQL query in SQL Server for best performance

I have two tables with the same structure: one with 80 million records and the other with 60 million records.
I want to delete the records in the 80M table that have a match in the 60M table.
I use the following SQL query:
DELETE FROM tbl_80M
FROM tbl_80M INNER JOIN
tbl_60M ON tbl_80M.MobileNumber = tbl_60M.MobileNumber
Both tables have an index on the MobileNumber column.
The query above takes a long time to run.
Is there a faster way to get the same result?
Note: tbl_80M contains all of the records that are in tbl_60M. I want to find and delete all records that are common to tbl_80M and tbl_60M.
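One common alternative, not proposed in this thread, is to delete the matching rows in fixed-size batches so that no single transaction has to touch tens of millions of rows at once. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server. The table and column names come from the question; the function name and batch size are illustrative assumptions. In T-SQL the same idea is usually written as a WHILE loop around DELETE TOP (n) that stops once @@ROWCOUNT is 0. Whether batching is actually faster on your server is something to measure.

```python
import sqlite3


def delete_matches_in_batches(conn: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Delete tbl_80M rows whose MobileNumber also appears in tbl_60M,
    at most batch_size rows per statement, committing after each batch.
    Returns the total number of rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM tbl_80M
            WHERE rowid IN (
                SELECT a.rowid
                FROM tbl_80M AS a
                WHERE EXISTS (SELECT 1 FROM tbl_60M AS b
                              WHERE b.MobileNumber = a.MobileNumber)
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # end the transaction so each batch stays small
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_80M (MobileNumber TEXT)")
    conn.execute("CREATE TABLE tbl_60M (MobileNumber TEXT)")
    conn.execute("CREATE INDEX ix_60M_mobile ON tbl_60M (MobileNumber)")
    conn.executemany("INSERT INTO tbl_80M VALUES (?)", [(str(n),) for n in range(1000)])
    conn.executemany("INSERT INTO tbl_60M VALUES (?)", [(str(n),) for n in range(600)])
    print(delete_matches_in_batches(conn, batch_size=100))  # 600
    print(conn.execute("SELECT COUNT(*) FROM tbl_80M").fetchone()[0])  # 400
```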

Have you tried writing a query that inserts the N million records you want to keep into a new table, and then dropping the old tables?
Finally, you can rename the new table to tbl_80M.
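-- Copy only the tbl_80M rows that have no match in tbl_60M
SELECT a.*
INTO tbl_NM
FROM tbl_80M a
WHERE NOT EXISTS (SELECT 1
                  FROM tbl_60M b
                  WHERE b.MobileNumber = a.MobileNumber)
DROP TABLE tbl_80M
EXEC sp_rename 'tbl_NM', 'tbl_80M'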
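The copy-and-rename idea can be sketched as follows, keeping only the tbl_80M rows that have no match in tbl_60M. SQLite (via Python's sqlite3 module) stands in for SQL Server here. CREATE TABLE ... AS plays the role of SELECT ... INTO, and ALTER TABLE ... RENAME plays the role of sp_rename. The function and index names are hypothetical. Neither form copies indexes, so the MobileNumber index has to be recreated on the new table.

```python
import sqlite3


def rebuild_without_matches(conn: sqlite3.Connection) -> None:
    """Copy the tbl_80M rows with no match in tbl_60M into tbl_NM,
    drop the old tbl_80M, and rename tbl_NM to tbl_80M."""
    conn.execute(
        """
        CREATE TABLE tbl_NM AS
        SELECT a.*
        FROM tbl_80M AS a
        WHERE NOT EXISTS (SELECT 1 FROM tbl_60M AS b
                          WHERE b.MobileNumber = a.MobileNumber)
        """
    )
    conn.execute("DROP TABLE tbl_80M")
    conn.execute("ALTER TABLE tbl_NM RENAME TO tbl_80M")
    # CREATE TABLE ... AS copies no indexes, so recreate the one the joins need.
    conn.execute("CREATE INDEX ix_80M_mobile ON tbl_80M (MobileNumber)")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_80M (MobileNumber TEXT, Name TEXT)")
    conn.execute("CREATE TABLE tbl_60M (MobileNumber TEXT, Name TEXT)")
    conn.executemany("INSERT INTO tbl_80M VALUES (?, ?)", [("1", "a"), ("2", "b"), ("3", "c")])
    conn.executemany("INSERT INTO tbl_60M VALUES (?, ?)", [("2", "b")])
    rebuild_without_matches(conn)
    print(conn.execute("SELECT * FROM tbl_80M ORDER BY MobileNumber").fetchall())
```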

Related

Insert into third table where 2 tables have same Id

I have a huge database and want to process it in smaller chunks, so I'm writing scripts that copy rows into a temporary table, process them, and then copy them back.
I've now copied around 1000 rows from the old database into PersonMeta and want to insert the corresponding rows into the People table.
So basically I want to insert data from olddb.People into newdb.People for the rows whose PersonId appears in newdb.PersonMeta.
I've created this script, but for some reason it doesn't copy all the rows: it copies 960 rows when it should copy 1000.
INSERT INTO [newdb].[dbo].[People] ([Id]
,[Name]
,[PersonId])
SELECT fp.[Id]
,fp.[Name]
,fp.[PersonId]
FROM [olddb].[dbo].[People] fp
INNER JOIN [newdb].[dbo].[PersonMeta] pm on
pm.PersonId = fp.PersonId
Edit: I originally wrote 100 rows when it was actually 1000 rows. So the query selects 960 rows (40 fewer).
Edit 2: The People table has some duplicate values in the PersonId column. After removing them, the query copies 956 rows (4 fewer than before).
Edit 3: I created this fiddle, and it seems to work fine.
However, I ran some queries against the database, and it turns out that with a RIGHT JOIN, the values for the records that are not copied are all NULL. For example, I ran the following query:
Select fp.*, fp.personid, pm.personid
From [olddb].[dbo].[People] fp
right join [newdb].[dbo].[PersonMeta] pm on
fp.personid = pm.personid
It returns NULL in the fp columns for the PersonMeta rows that were not copied.
Is there another approach I could try to copy the data?
There may be a NULL value in the PersonId field in either table. If so, remove or update the NULL records and try again.
First, check separately what your query returns.
Check the output of this query. It should return the number of rows you expect.
SELECT fp.[Id]
,fp.[Name]
,fp.[PersonId]
FROM [olddb].[dbo].[People] fp
INNER JOIN [newdb].[dbo].[PersonMeta] pm on
pm.PersonId = fp.PersonId
If it still returns more rows than you expect, you can test the result with filters such as isnull(pm.PersonId,0)<>0 and isnull(fp.PersonId,0)<>0.
These filter out records whose PersonId is NULL or 0.
So the final test query is:
SELECT fp.[Id]
,fp.[Name]
,fp.[PersonId]
FROM [olddb].[dbo].[People] fp
INNER JOIN [newdb].[dbo].[PersonMeta] pm on
pm.PersonId = fp.PersonId and isnull(pm.PersonId,0)<>0 and isnull(fp.PersonId,0)<>0
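An equality join condition never matches NULL keys, because NULL = NULL is not true in SQL. Rows with a NULL PersonId are therefore already excluded from the plain INNER JOIN. The sketch below checks this with SQLite through Python's sqlite3 module (IFNULL is SQLite's counterpart of SQL Server's ISNULL). The sample rows are made up. With this data both queries return the same two rows; the extra filters only change the result when some PersonId is 0.

```python
import sqlite3


def load_sample_data(conn: sqlite3.Connection) -> None:
    # Hypothetical sample data: both tables contain a NULL PersonId.
    conn.execute("CREATE TABLE People (Id INTEGER, Name TEXT, PersonId INTEGER)")
    conn.execute("CREATE TABLE PersonMeta (PersonId INTEGER)")
    conn.executemany("INSERT INTO People VALUES (?, ?, ?)",
                     [(1, "Ann", 10), (2, "Bob", None), (3, "Cy", 30)])
    conn.executemany("INSERT INTO PersonMeta VALUES (?)", [(10,), (None,), (30,)])


PLAIN = """
SELECT fp.Id FROM People fp
INNER JOIN PersonMeta pm ON pm.PersonId = fp.PersonId
ORDER BY fp.Id
"""

FILTERED = """
SELECT fp.Id FROM People fp
INNER JOIN PersonMeta pm ON pm.PersonId = fp.PersonId
    AND IFNULL(pm.PersonId, 0) <> 0 AND IFNULL(fp.PersonId, 0) <> 0
ORDER BY fp.Id
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample_data(conn)
    print(conn.execute(PLAIN).fetchall())     # [(1,), (3,)]
    print(conn.execute(FILTERED).fetchall())  # [(1,), (3,)]
```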
If you still can't figure out the issue, please share the structure of your tables; that would help pin it down.
OK, now I feel silly: the problem was simply that not all of the rows in PersonMeta had a corresponding row in the old People table. I thought they did because I had checked using Id rather than PersonId.
In short, the posted query was in fact correct.
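To see exactly which PersonMeta rows have no counterpart in People (the cause identified above), an anti-join written as LEFT JOIN ... WHERE ... IS NULL lists them. This is a minimal sketch using SQLite via Python's sqlite3 module. For simplicity both tables sit in one database, whereas in the question they live in olddb and newdb. The sample data and function name are hypothetical.

```python
import sqlite3

# PersonMeta rows that have no People row with the same PersonId (an anti-join).
UNMATCHED_SQL = """
SELECT pm.PersonId
FROM PersonMeta AS pm
LEFT JOIN People AS fp ON fp.PersonId = pm.PersonId
WHERE fp.PersonId IS NULL
ORDER BY pm.PersonId
"""


def unmatched_person_ids(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute(UNMATCHED_SQL)]


def make_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE People (Id INTEGER, Name TEXT, PersonId INTEGER)")
    conn.execute("CREATE TABLE PersonMeta (PersonId INTEGER)")
    conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [(1, "Ann", 10), (2, "Cy", 30)])
    conn.executemany("INSERT INTO PersonMeta VALUES (?)", [(10,), (20,), (30,), (40,)])
    return conn


if __name__ == "__main__":
    print(unmatched_person_ids(make_demo()))  # [20, 40]
```

The INNER JOIN in the question can copy at most the PersonMeta rows that are not in this list.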
Assuming you want to keep distinct, unique records in the new table:
The query below creates a new table with the same schema as the old table and copies all of the old table's data into it. (SELECT ... INTO creates the target table, so it fails if [newdb].[dbo].[People] already exists.)
select * into [newdb].[dbo].[People] from [olddb].[dbo].[People]
Then, to keep the data in the new table in sync with the records in [newdb].[dbo].[PersonMeta], you can run:
-- exclude NULLs: a single NULL in the subquery makes NOT IN match no rows
delete from [newdb].[dbo].[People] where personid not in (select personid from [newdb].[dbo].[PersonMeta] where personid is not null)
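A NOT IN subquery behaves surprisingly when the subquery returns a NULL. For example, 2 NOT IN (1, NULL) evaluates to UNKNOWN rather than true, so the predicate matches no rows at all. The sketch below shows this with SQLite through Python's sqlite3 module. It selects the rows each predicate would delete instead of deleting them, and the sample data is hypothetical. Filtering NULLs out of the subquery, or using NOT EXISTS, gives the intended result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (PersonId INTEGER)")
conn.execute("CREATE TABLE PersonMeta (PersonId INTEGER)")
conn.executemany("INSERT INTO People VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO PersonMeta VALUES (?)", [(1,), (None,)])

# Each query lists the People rows its predicate would delete.
NOT_IN = """SELECT PersonId FROM People
            WHERE PersonId NOT IN (SELECT PersonId FROM PersonMeta)
            ORDER BY PersonId"""
NOT_IN_NO_NULLS = """SELECT PersonId FROM People
                     WHERE PersonId NOT IN (SELECT PersonId FROM PersonMeta
                                            WHERE PersonId IS NOT NULL)
                     ORDER BY PersonId"""
NOT_EXISTS = """SELECT p.PersonId FROM People p
                WHERE NOT EXISTS (SELECT 1 FROM PersonMeta pm
                                  WHERE pm.PersonId = p.PersonId)
                ORDER BY p.PersonId"""

print(conn.execute(NOT_IN).fetchall())           # []
print(conn.execute(NOT_IN_NO_NULLS).fetchall())  # [(2,), (3,)]
print(conn.execute(NOT_EXISTS).fetchall())       # [(2,), (3,)]
```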

SQL Linked Server Join

I have two different SQL databases, "IM" and "TR", which have different schemas.
IM has the table "BAL" which has 2 columns "Account" and "Balance".
TR has the table "POS" which has 2 columns "AccountId" and "Position".
The common link is BAL.Account = POS.AccountId.
The POS table has more than 100k records. BAL has only a few records, since it contains only new accounts.
I want to run a select query on the IM database's BAL table as follows:
Database: IM
Select Account, Balance from BAL
However, "Balance" should instead return POS.Position from the TR database, matched on BAL.Account = POS.AccountId.
How can I do this as fast as possible without slowing down either database, given that many users will run this query frequently? Should I use OPENQUERY? I will use a WHERE clause to shorten the response time.
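The join described in the question can be sketched as below, returning POS.Position as Balance for each BAL account. SQLite via Python's sqlite3 module stands in for SQL Server, with ATTACH DATABASE playing the role of the second database. All sample values and function names are hypothetical. In SQL Server, a database on the same instance can be referenced with a three-part name such as TR.dbo.POS, while a linked server needs a four-part name or OPENQUERY. This sketch does not settle which of those performs better. A LEFT JOIN keeps new accounts that have no position yet. The index on POS.AccountId is intended to let the small BAL table drive lookups into the large POS table.

```python
import sqlite3


def make_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")                # plays the IM database
    conn.execute("ATTACH DATABASE ':memory:' AS tr")  # plays the TR database
    conn.execute("CREATE TABLE main.BAL (Account TEXT, Balance REAL)")
    conn.execute("CREATE TABLE tr.POS (AccountId TEXT, Position REAL)")
    conn.execute("CREATE INDEX tr.ix_pos_account ON POS (AccountId)")
    conn.executemany("INSERT INTO main.BAL (Account) VALUES (?)", [("A1",), ("A2",)])
    conn.executemany("INSERT INTO tr.POS VALUES (?, ?)", [("A1", 100.0), ("A3", 5.0)])
    return conn


def balances(conn: sqlite3.Connection) -> list:
    """Each BAL account with POS.Position reported as Balance (NULL if no position)."""
    return conn.execute(
        """
        SELECT b.Account, p.Position AS Balance
        FROM main.BAL AS b
        LEFT JOIN tr.POS AS p ON p.AccountId = b.Account
        ORDER BY b.Account
        """
    ).fetchall()


if __name__ == "__main__":
    print(balances(make_demo()))  # [('A1', 100.0), ('A2', None)]
```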

Show data difference in columns of two tables in same database

I am working with SQL Server 2008 and doing data analysis using various queries. I have two tables in the same schema, each with 70 columns, into which the same data was entered twice. I am now comparing the data column by column and showing the records that differ. Here is my query:
SELECT
    n.Student_Class4_15,
    o.Student_Class4_15
FROM
    [NEEF_Entry].[dbo].[tbl_TOF] AS n
INNER JOIN
    [NEEF_Entry].[dbo].[tbl_TOF_old] AS o ON n.FormID = o.FormID
WHERE
    n.Student_Class4_15 <> o.Student_Class4_15
The join is on FormID, which is the same in both tables. The column compared here is Student_Class4_15, which exists in both tbl_TOF and tbl_TOF_old.
The result shows how the data differs between the first and the second entry. The problem is that I have to manually substitute each of the 70 column names every time, which is time-consuming.
What I want is a query that picks up all columns, compares them, and returns the differences.
I would use EXCEPT to compare the two tables. If the query returns no rows, the data is the same.
SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2;
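The following is a minimal sketch of EXCEPT in both directions. It uses SQLite through Python's sqlite3 module in place of SQL Server, and the Remarks column and all values are hypothetical. EXCEPT treats two NULLs as equal, so a row that is NULL in the same column of both tables is not reported. By contrast, the <> comparison in the question silently skips any pair where either side is NULL.

```python
import sqlite3


def rows_only_in(conn: sqlite3.Connection, left: str, right: str) -> list:
    """Rows of `left` with no identical row in `right` (table names must be trusted)."""
    sql = f"SELECT * FROM {left} EXCEPT SELECT * FROM {right} ORDER BY 1"
    return conn.execute(sql).fetchall()


def make_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for table in ("tbl_TOF", "tbl_TOF_old"):
        conn.execute(f"CREATE TABLE {table} (FormID INTEGER, Student_Class4_15 INTEGER, Remarks TEXT)")
    conn.executemany("INSERT INTO tbl_TOF VALUES (?, ?, ?)",
                     [(1, 5, "x"), (2, 7, None), (3, 9, "z")])
    conn.executemany("INSERT INTO tbl_TOF_old VALUES (?, ?, ?)",
                     [(1, 5, "x"), (2, 7, None), (3, 8, "z"), (4, 1, "w")])
    return conn


if __name__ == "__main__":
    conn = make_demo()
    print(rows_only_in(conn, "tbl_TOF", "tbl_TOF_old"))  # [(3, 9, 'z')]
    print(rows_only_in(conn, "tbl_TOF_old", "tbl_TOF"))  # [(3, 8, 'z'), (4, 1, 'w')]
```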
In case table2 has extra rows, reverse the order:
SELECT *
FROM table2
EXCEPT
SELECT *
FROM table1;
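EXCEPT reports whole rows, not which of the 70 columns changed. To get the per-column output the question asks for without typing each name, the comparison can be generated from the table's column list. The sketch below does this with SQLite's PRAGMA table_info via Python's sqlite3 module, emitting one row per (FormID, column) pair that differs. The function name, the second column, and the sample data are assumptions. It relies on SQLite's NULL-safe IS NOT operator. On SQL Server 2008, the column names would come from INFORMATION_SCHEMA.COLUMNS or sys.columns, the query would be built as dynamic SQL, and the NULL-safe comparison would have to be spelled out with IS NULL checks.

```python
import sqlite3


def diff_columns(conn, new_table="tbl_TOF", old_table="tbl_TOF_old", key="FormID"):
    """Return (key, column_name, new_value, old_value) for every column that
    differs between rows with the same key. Names are interpolated into the
    SQL, so they must be trusted identifiers."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({new_table})")]
    selects = [
        f"SELECT n.{key} AS {key}, '{col}' AS column_name, "
        f"n.{col} AS new_value, o.{col} AS old_value "
        f"FROM {new_table} AS n JOIN {old_table} AS o ON o.{key} = n.{key} "
        f"WHERE n.{col} IS NOT o.{col}"  # NULL-safe "differs" in SQLite
        for col in columns
        if col != key
    ]
    if not selects:
        return []
    sql = " UNION ALL ".join(selects) + " ORDER BY 1, 2"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for table in ("tbl_TOF", "tbl_TOF_old"):
        conn.execute(f"CREATE TABLE {table} (FormID INTEGER, "
                     "Student_Class4_15 INTEGER, Student_Class5_15 INTEGER)")
    conn.executemany("INSERT INTO tbl_TOF VALUES (?, ?, ?)", [(1, 10, 20), (2, 11, None)])
    conn.executemany("INSERT INTO tbl_TOF_old VALUES (?, ?, ?)", [(1, 10, 21), (2, 12, 5)])
    for diff in diff_columns(conn):
        print(diff)
```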

Hive Multiple Joins Very Slow

I have Table A with 60 million records and Table B with 20 million records, and I am joining them on key columns (inner join on a.id = b.id).
That query returns results within 10 minutes. Next, to validate another column, I join the result of the Table A/Table B join with Table C (30 million records) on b.prfl_id = c.prfl_id (inner join).
However, this query is very slow: it runs for more than 30 minutes without producing a result. Any suggestions for getting the results faster?
Thanks in advance for your suggestions.
Regards,
Saravanan

Drop column in DBVisualizer-query

I'm looking for a way to drop single columns from an extended query in DBVisualizer. I have a couple of tables with more than one hundred columns each. Using a number of subqueries, I want to drop a single column from one of the tables, something like this:
select *
from table1 a
join table2 b
on a.key = b.key
drop b.key;
I definitely do not want to list every column individually, because the query is meant to be used for a long time and new columns are added to the tables too often to change the query each time.
Is there a way to do this?
Thanks
Felix
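For the specific case in the example, where the column to drop is the duplicated join key, a join written with USING instead of ON returns the shared column only once under SELECT *. The sketch below shows this with SQLite through Python's sqlite3 module; the columns a1, a2 and b1 are hypothetical. Support depends on the database DBVisualizer is connected to: PostgreSQL and MySQL accept USING, while SQL Server has no USING clause. This does not help with dropping an arbitrary column, which still needs an explicit or generated column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (key INTEGER, a1 TEXT, a2 TEXT)")
conn.execute("CREATE TABLE table2 (key INTEGER, b1 TEXT)")
conn.execute("INSERT INTO table1 VALUES (1, 'x', 'y')")
conn.execute("INSERT INTO table2 VALUES (1, 'z')")

# ON keeps both copies of the join column...
on_cols = [d[0] for d in conn.execute(
    "SELECT * FROM table1 a JOIN table2 b ON a.key = b.key").description]
# ...while USING returns the shared column only once.
using_cols = [d[0] for d in conn.execute(
    "SELECT * FROM table1 a JOIN table2 b USING (key)").description]

print(on_cols)     # ['key', 'a1', 'a2', 'key', 'b1']
print(using_cols)  # ['key', 'a1', 'a2', 'b1']
```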