Subqueries in Python - sql

I am trying to use subqueries to run matching against multiple tables and move the unmatched records to new table.
I have written SQL subqueries but the only issue i am facing is the performace, it is taking lot of time to process.
create table UnmatchedRecord
(select a.*
from HashedValues a
where a.Address_Hash not in(select b.Address_Hash
from HashAddress b)
and a.Person_Hash not in(select d.Person_Hash
from HashPerson d)
and a.HH_Hash not in(select f.HH_Hash
from HashHH f)
and a.VehicleRegistration not in(select VehicleRegistration
from MasterReference)
and a.EmailAddress not in (select EmailAddress
from MasterReference)
and a.PhoneNumber not in (select PhoneNumber
from MasterReference)
and a.NationalInsuranceNo not in (select NationalInsuranceNo
from MasterReference))

You can at least replace four subqueries with one:
select HashedValues.*
from HashedValues
where not exists (
select *
from MasterReference
where HashedValues.VehicleRegistration = MasterReference.VehicleRegistration
or HashedValues.EmailAddress = MasterReference.EmailAddress
or HashedValues.PhoneNumber = MasterReference.PhoneNumber
or HashedValues.NationalInsuranceNo = MasterReference.NationalInsuranceNo
)
and not exists (
select *
from HashAddress
where HashedValues.Address_Hash = HashAddress.Address_Hash
)
and not exists (
select *
from HashPerson
where HashedValues.Person_Hash = HashPerson.Person_Hash
)
and not exists (
select *
from HashHH
where HashedValues.HH_Hash = HashHH.HH_Hash
)

Related

Why is my query inserting the same values when I have added a 'not exists' parameter that should avoid this from happening?

My query should stop inserting values, as the not exists statement is satisfied (I have checked both tables) and matching incidents exist in both tables, any ideas why values are still being returned?
Here is the code:
INSERT INTO
odwh_system.ead_incident_credit_control_s
(
incident
)
SELECT DISTINCT
tp.incident
FROM
odwh_data.ead_incident_status_audit_s ei
INNER JOIN odwh_data.ead_incident_s tp ON ei.incident=tp.incident
WHERE
ei.status = 6
OR
ei.status = 7
AND NOT EXISTS
(
SELECT
true
FROM
odwh_system.ead_incident_credit_control_s ead
WHERE
ead.incident = tp.incident
)
AND EXISTS
(
SELECT
true
FROM
odwh_work.ead_incident_tp_s tp
WHERE
tp.incident = ei.incident
);
dont reuse table aliases
use sane aliases
avoid AND/OR conflicts; prefer IN()
INSERT INTO odwh_system.ead_incident_credit_control_s (incident)
SELECT -- DISTINCT
tp.incident
FROM odwh_data.ead_incident_s dtp
WHERE NOT EXISTS (
SELECT *
FROM odwh_system.ead_incident_credit_control_s sic
WHERE sic.incident = dtp.incident
)
AND EXISTS (
SELECT *
FROM odwh_work.ead_incident_tp_s wtp
JOIN odwh_data.ead_incident_status_audit_s dis ON wtp.incident = dis.incident AND dis.status IN (6 ,7)
WHERE wtp.incident = dtp.incident
);

Is there a way to make this query more efficient performance wise?

This query takes a long time to run on MS Sql 2008 DB with 70GB of data.
If i run the 2 where clauses seperately it takes a lot less time.
EDIT - I need to change the 'select *' to 'delete' afterwards, please keep it in mind when answering. thanks :)
select *
From computers
Where Name in
(
select T2.Name
from
(
select Name
from computers
group by Name
having COUNT(*) > 1
) T3
join computers T2 on T3.Name = T2.Name
left join policyassociations PA on T2.PK = PA.EntityId
where (T2.EncryptionStatus = 0 or T2.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL)
)
OR
ClientId in
(
select substring(ClientID,11,100)
from computers
)
Swapping IN for EXISTS will help.
Also, as per Gordon's answer: UNION can out-perform OR.
SELECT computers.*
FROM computers
LEFT
JOIN policyassociations
ON policyassociations.entityid = computers.pk
WHERE (
computers.encryptionstatus = 0
OR computers.encryptionstatus IS NULL
)
AND (
policyassociations.entitytype <> 1
OR policyassociations.entitytype IS NULL
)
AND EXISTS (
SELECT name
FROM (
SELECT name
FROM computers
GROUP
BY name
HAVING Count(*) > 1
) As duplicate_computers
WHERE name = computers.name
)
UNION
SELECT *
FROM computers As c
WHERE EXISTS (
SELECT SubString(clientid, 11, 100)
FROM computers
WHERE SubString(clientid, 11, 100) = c.clientid
)
You've now updated your question asking to make this a delete.
Well the good news is that instead of the "OR" you just make two DELETE statements:
DELETE
FROM computers
LEFT
JOIN policyassociations
ON policyassociations.entityid = computers.pk
WHERE (
computers.encryptionstatus = 0
OR computers.encryptionstatus IS NULL
)
AND (
policyassociations.entitytype <> 1
OR policyassociations.entitytype IS NULL
)
AND EXISTS (
SELECT name
FROM (
SELECT name
FROM computers
GROUP
BY name
HAVING Count(*) > 1
) As duplicate_computers
WHERE name = computers.name
)
;
DELETE
FROM computers As c
WHERE EXISTS (
SELECT SubString(clientid, 11, 100)
FROM computers
WHERE SubString(clientid, 11, 100) = c.clientid
)
;
Some things I would look at are
1. are indexes in place?
2. 'IN' will slow your query, try replacing it with joins,
3. you should use column name, I guess 'Name' in this case, while using count(*),
4. try selecting required data only, by selecting particular columns.
Hope this helps!
or can be poorly optimized sometimes. In this case, you can just split the query into two subqueries, and combine them using union:
select *
From computers
Where Name in
(
select T2.Name
from
(
select Name
from computers
group by Name
having COUNT(*) > 1
) T3
join computers T2 on T3.Name = T2.Name
left join policyassociations PA on T2.PK = PA.EntityId
where (T2.EncryptionStatus = 0 or T2.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL)
)
UNION
select *
From computers
WHERE ClientId in
(
select substring(ClientID,11,100)
from computers
);
You might also be able to improve performance by replacing the subqueries with explicit joins. However, this seems like the shortest route to better performance.
EDIT:
I think the version with join's is:
select c.*
From computers c left outer join
(select c.Name
from (select c.*, count(*) over (partition by Name) as cnt
from computers c
) c left join
policyassociations PA
on T2.PK = PA.EntityId and PA.EntityType <> 1
where (c.EncryptionStatus = 0 or c.EncryptionStatus is NULL) and
c.cnt > 1
) cpa
on c.Name = cpa.Name left outer join
(select substring(ClientID, 11, 100) as name
from computers
) csub
on c.Name = csub.name
Where cpa.Name is not null or csub.Name is not null;

Why does this NOT IN query work as intended, but not this NOT EXISTS query?

Working (NOT IN) retrieves 3 rows:
select DISTINCT d.* from Device d , Company c3
WHERE d.deviceid NOT IN
(
Select d1.deviceid from Device d1, Clone x1
WHERE d1.deviceid = x1.deviceID
AND
(
x1.XPath = 'hi'
OR x1.XPath = 'bye'
)
AND
(
EXISTS ( select * from (SELECT * FROM [dbo].[Split] ('T130SF0W2050', ',')) as s
WHERE x1.Value like '%' + s.items + '%' )
)
)
AND
d.companyid = c3.companyid and c3.companynumber in (SELECT * FROM [dbo].[Split] ('00223200', ','))
Not Working(not exists):
select DISTINCT d.* from Device d , Company c3
WHERE NOT EXISTS
(Select * from Device d1, Clone x1
WHERE d1.deviceid = x1.deviceID
AND
(
x1.XPath = 'hi'
OR x1.XPath = 'bye'
)
AND
(
EXISTS ( select * from (SELECT * FROM [dbo].[Split] ('T130SF0W2050', ',')) as s
WHERE x1.Value like '%' + s.items + '%' )
)
)
AND
d.companyid = c3.companyid and c3.companynumber in (SELECT * FROM [dbo].[Split] ('00223200', ','))
I'm unsure I'm using the exists syntax correct, what should I select from the subquery? I've tried a few different combinations. It won't run if I put WHERE d.deviceid NOT EXISTS
Solution (thanks to Nikola):
add AND d1.deviceid = d.deviceid inside the Exists subquery.
The difference is that the NOT IN query returns devices that match the company and don't match the inner query specification.
For the NOT EXIST query to work as written (where "work as written" refers to returning the same result as the top query), there can't be any devices that exist matching the inner query. If any devices match the inner query at all, the query won't return any results.

SQL show records that don't exist in my table variable

I have a table variable that holds orderID, UnitID and OrderServiceId (it is already populated via a query with insert statement).
I then have a query under this that returns 15 columns which also include the OrderId, UnitId, OrderServiceId
I need to only return the rows from this query where the same combination of OrderId, UnitId, and OrderServiceId are not in the table variable.
You can use NOT EXISTS. e.g.
FROM YourQuery q
WHERE NOT EXISTS
(
SELECT * FROM #TableVar t
WHERE t.OrderId = q.OrderId
and t.UnitId = q.UnitId
and t.OrderServiceId=q.OrderServiceId
)
select q.*
from (
MyQuery
) q
left outer join MyTableVariable t on q.ORDERID = t.ORDERID
and q.UNITID= t.UNITID
and q.ORDERSERVICESID = t.ORDERSERVICESID
where t.ORDERID is null
You can use EXCEPT | INTERSECT operators for this (link).
Example:
(select 3,4,1
union all
select 2,4,1)
intersect
(select 1,2,9
union all
select 3,4,1)

MySQL/SQL - When are the results of a sub-query avaliable?

Suppose I have this query
SELECT * FROM (
SELECT * FROM table_a
WHERE id > 10 )
AS a_results LEFT JOIN
(SELECT * from table_b
WHERE id IN
(SElECT id FROM a_results)
ON (a_results.id = b_results.id)
I would get the error "a_results is not a table". Anywhere I could use the re-use the results of the subquery?
Edit: It has been noted that this query doesn't make sense...it doesn't, yes. This is just to illustrate the question which I am asking; the 'real' query actually looks something like this:
SELECT SQL_CALC_FOUND_ROWS * FROM
( SELECT wp_pod_tbl_hotel . *
FROM wp_pod_tbl_hotel, wp_pod_rel, wp_pod
WHERE wp_pod_rel.field_id =12
AND wp_pod_rel.tbl_row_id =1
AND wp_pod.tbl_row_id = wp_pod_tbl_hotel.id
AND wp_pod_rel.pod_id = wp_pod.id
) as
found_hotel LEFT JOIN (
SELECT COUNT(*) as review_count, avg( (
location_rating + staff_performance_rating + condition_rating + room_comfort_rating + food_rating + value_rating
) /6 ) AS average_score, hotelid
FROM (
SELECT r. * , wp_pod_rel.tbl_row_id AS hotelid
FROM wp_pod_tbl_review r, wp_pod_rel, wp_pod
WHERE wp_pod_rel.field_id =11
AND wp_pod_rel.pod_id = wp_pod.id
AND r.id = wp_pod.tbl_row_id
AND wp_pod_rel.tbl_row_id
IN (
SELECT wp_pod_tbl_hotel .id
FROM wp_pod_tbl_hotel, wp_pod_rel, wp_pod
WHERE wp_pod_rel.field_id =12
AND wp_pod_rel.tbl_row_id =1
AND wp_pod.tbl_row_id = wp_pod_tbl_hotel.id
AND wp_pod_rel.pod_id = wp_pod.id
)
) AS hotel_reviews
GROUP BY hotel_reviews.hotelid
ORDER BY average_score DESC
AS sorted_hotel ON (id = sorted_hotel.hotelid)
As you can see, the sub-query which makes up the found_query table is repeated elsewhere downward as another sub-query, so I was hoping to re-use the results
You can not use a sub-query like this.
I'm not sure I understand your query, but wouldn't that be sufficient?
SELECT * FROM table_a a
LEFT JOIN table_b b ON ( b.id = a.id )
WHERE a.id > 10
It would return all rows from table_a where id > 10 and LEFT JOIN rows from table_b where id matches.