SQL Access query speed

I have the following query in my Access 2003 database:
SELECT
Projet.OTP AS OTP,
NumeroDA,
SUM(Quantite*PrixReelCommande) AS PrixTotal,
FIRST(Fournisseur1) AS Fournisseur,
FIRST(Projet.NumeroCommandeReservation) AS NumeroCommande,
FIRST(Projet.GestionContrat) AS GestionContrat,
FIRST(Projet.Acheteur) AS Acheteur,
MIN(DateLivraisonContractuelle) AS DateLivraisonContrat,
MAX(DateFournisseurLivraison) AS DateLivraisonFournisseur,
FIRST(InfoProjet.NomInstallation) AS NomInstallation,
FIRST(InfoProjet.TitreMandat) AS TitreMandat
FROM Projet LEFT JOIN InfoProjet ON Projet.OTP=InfoProjet.OTP
WHERE NumeroDA Like "#*" And NumeroDA IN (
SELECT NumeroDA FROM Projet
WHERE NumeroCommandeReservation="" Or NumeroCommandeReservation Is Null Or NumeroCommandeReservation="0"
)
GROUP BY Projet.OTP, Projet.NumeroDA
ORDER BY Projet.OTP, Projet.NumeroDA
The table Projet has ~2500 rows and InfoProjet has only 200 rows. Opening either of these tables in Access takes less than 1 second. However, executing the above query takes more than 5 seconds.
I would like to know if there is anything I can do to improve the performance of this query. Is there something in the query that I should avoid performance-wise? Or am I just hitting Access's limitations? I guess that using Like and the subquery doesn't help, but there must be something else that slows down the query.

Since you're not using any Distincts in the subquery, could you simplify it a little by taking that part out? (I can't test this right now though, so I'm not entirely sure it would give the same results)
SELECT
Projet.OTP AS OTP,
NumeroDA,
SUM(Quantite*PrixReelCommande) AS PrixTotal,
FIRST(Fournisseur1) AS Fournisseur,
FIRST(Projet.NumeroCommandeReservation) AS NumeroCommande,
FIRST(Projet.GestionContrat) AS GestionContrat,
FIRST(Projet.Acheteur) AS Acheteur,
MIN(DateLivraisonContractuelle) AS DateLivraisonContrat,
MAX(DateFournisseurLivraison) AS DateLivraisonFournisseur,
FIRST(InfoProjet.NomInstallation) AS NomInstallation,
FIRST(InfoProjet.TitreMandat) AS TitreMandat
FROM Projet LEFT JOIN InfoProjet ON Projet.OTP=InfoProjet.OTP
WHERE NumeroDA Like "#*" And (
NumeroCommandeReservation="" Or
NumeroCommandeReservation Is Null Or
NumeroCommandeReservation="0")
GROUP BY Projet.OTP, Projet.NumeroDA
ORDER BY Projet.OTP, Projet.NumeroDA
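Before swapping the subquery for a plain filter, it is worth checking that both forms give the same totals on your data. The sketch below uses Python's sqlite3 as a stand-in for Access, with a hypothetical Montant column in place of Quantite*PrixReelCommande and without the Like and FIRST() parts. It shows one case where the two forms differ: the IN form keeps every row of a NumeroDA that has at least one row without an order number, while the direct filter keeps only the rows without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Projet; Montant is a hypothetical column used
# here instead of Quantite*PrixReelCommande.
conn.execute(
    "CREATE TABLE Projet (OTP TEXT, NumeroDA TEXT, "
    "NumeroCommandeReservation TEXT, Montant REAL)"
)
conn.executemany(
    "INSERT INTO Projet VALUES (?, ?, ?, ?)",
    [
        ("P1", "1001", None, 10.0),    # no order number yet
        ("P1", "1001", "4500", 20.0),  # same NumeroDA, already ordered
        ("P2", "1002", "4501", 5.0),   # fully ordered, excluded by both forms
    ],
)

unordered = (
    "(NumeroCommandeReservation = '' "
    "OR NumeroCommandeReservation IS NULL "
    "OR NumeroCommandeReservation = '0')"
)

# Original form: keep all rows of any NumeroDA that has an unordered row.
in_form = conn.execute(
    "SELECT NumeroDA, SUM(Montant) FROM Projet "
    f"WHERE NumeroDA IN (SELECT NumeroDA FROM Projet WHERE {unordered}) "
    "GROUP BY OTP, NumeroDA"
).fetchall()

# Simplified form: keep only the unordered rows themselves.
direct_form = conn.execute(
    f"SELECT NumeroDA, SUM(Montant) FROM Projet WHERE {unordered} "
    "GROUP BY OTP, NumeroDA"
).fetchall()

print(in_form)      # total over both rows of NumeroDA 1001
print(direct_form)  # total over the unordered row only
```

If a NumeroDA never mixes ordered and unordered rows in your data, the two forms agree and the simpler one is safe to use.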

Try running this and see how many rows it returns:
SELECT COUNT(*)
FROM Projet LEFT JOIN InfoProjet ON Projet.OTP=InfoProjet.OTP
WHERE NumeroDA Like "#*" And NumeroDA IN (
SELECT NumeroDA FROM Projet
WHERE NumeroCommandeReservation=""
Or NumeroCommandeReservation Is Null
Or NumeroCommandeReservation="0"
)
Reason: the join may be returning more rows than you'd expect, but as you only use MAX/MIN/FIRST aggregates you may not notice.

Related

Query SQL Microsoft Access bug?

I've been working on an Access database with SQL. I was trying to perform the following query:
SELECT Produtos.produto,
[aux].[total]/[Produtos].[existencias] AS [peso consumos nas existencias]
FROM (SELECT Produtos.produto, SUM(Consumos.quantidade) AS total
FROM Consumos, Produtos, Fornecedores
WHERE Consumos.codproduto=Produtos.produto
AND Produtos.codfornecedor=9
GROUP BY Produtos.produto
ORDER BY Produtos.produto) AS aux
INNER JOIN Produtos
ON aux.produto = Produtos.produto
WHERE (((aux.produto)=[Produtos].[produto]));
A closer look at the results showed me that the column [peso consumos nas existencias] was multiplied by 10. While trying to fix this, I noticed that I was not using the table Fornecedores although I was listing it after the FROM keyword, so I removed it:
SELECT Produtos.produto,
[aux].[total]/[Produtos].[existencias] AS [peso consumos nas existencias]
FROM (SELECT Produtos.produto, SUM(Consumos.quantidade) AS total
FROM Consumos, Produtos
WHERE Consumos.codproduto=Produtos.produto
AND Produtos.codfornecedor=9
GROUP BY Produtos.produto
ORDER BY Produtos.produto) AS aux
INNER JOIN Produtos
ON aux.produto = Produtos.produto
WHERE (((aux.produto)=[Produtos].[produto]));
After running this, the results were right. Was this supposed to happen? If so, why?
Thanks!
Your Fornecedores table probably has 10 records.
FROM Consumos, Produtos, Fornecedores
WHERE Consumos.codproduto=Produtos.produto
was doing a Cartesian product of the Consumos-Produtos join with those 10 records, so the SUM() counted each quantity 10 times.
Note 1:
It is considered better style to use the explicit INNER JOIN syntax:
FROM Consumos INNER JOIN Produtos
ON Consumos.codproduto=Produtos.produto
WHERE Produtos.codfornecedor=9
instead of FROM Consumos, Produtos
Note 2:
If you think you have found a bug in the Access (or any database) query engine, chances are almost 100% that the bug is in your query. ;-)
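The effect is easy to reproduce. Here is a minimal sketch using Python's sqlite3 as a stand-in for Access, with made-up rows and only the columns the query needs; the 10 rows in Fornecedores are the assumption from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Produtos (produto INTEGER, codfornecedor INTEGER, existencias REAL);
    CREATE TABLE Consumos (codproduto INTEGER, quantidade REAL);
    CREATE TABLE Fornecedores (codfornecedor INTEGER);
    INSERT INTO Produtos VALUES (1, 9, 100);
    INSERT INTO Consumos VALUES (1, 5), (1, 7);
""")
# Ten suppliers, as guessed in the answer (made-up data).
conn.executemany(
    "INSERT INTO Fornecedores VALUES (?)", [(n,) for n in range(1, 11)]
)

# Fornecedores is listed in FROM but never joined: every matching
# Consumos/Produtos row is repeated once per supplier.
with_extra = conn.execute("""
    SELECT SUM(Consumos.quantidade)
    FROM Consumos, Produtos, Fornecedores
    WHERE Consumos.codproduto = Produtos.produto
      AND Produtos.codfornecedor = 9
""").fetchone()[0]

# Same query with the unused table removed and an explicit INNER JOIN.
without_extra = conn.execute("""
    SELECT SUM(Consumos.quantidade)
    FROM Consumos INNER JOIN Produtos
        ON Consumos.codproduto = Produtos.produto
    WHERE Produtos.codfornecedor = 9
""").fetchone()[0]

print(with_extra, without_extra)
```

The first total is ten times the second, which matches the factor of 10 seen in the question.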

SQL expression to replace saved queries with inline SQL in a "without matching" query

I have two queries: qryAvailability1 returns the dates blocked by reservations, while qryAvailability2 produces all the dates that are available before any reservation takes place.
I combine them in a final “without matching” query to find the dates available for reservation:
qryAvailability1:
SELECT tblReservations.PropertyID, tblDates.Date
FROM tblReservations, tblDates
WHERE (((tblDates.Date) Between [tblReservations]![CheckIn] And [tblReservations]![CheckOut]));
qryAvailability2:
SELECT tblProperties.PropertyID, tblDates.Date
FROM tblProperties, tblDates;
The final “without matching” query:
SELECT qryAvailability2.PropertyID, qryAvailability2.Date
FROM qryAvailability2 LEFT JOIN qryAvailability1 ON (qryAvailability2.Date=qryAvailability1.Date) AND (qryAvailability2.PropertyID=qryAvailability1.PropertyID)
WHERE (((qryAvailability1.Date) Is Null))
ORDER BY qryAvailability2.PropertyID, qryAvailability2.Date;
Is there any way to write this as a single query instead of three?
In other words, I need to replace the references to qryAvailability1 and qryAvailability2 with the SQL statements that produce them (whatever I tried didn’t work at all).
Assuming your final query works (I haven't checked it), then to combine all three:
SELECT qryAvailability2.PropertyID, qryAvailability2.Date
FROM (
SELECT tblProperties.PropertyID, tblDates.Date FROM tblProperties, tblDates
) qryAvailability2 LEFT JOIN (
SELECT tblReservations.PropertyID, tblDates.Date
FROM tblReservations, tblDates
WHERE (((tblDates.Date) Between [tblReservations]![CheckIn] And [tblReservations]![CheckOut]))
) qryAvailability1 ON (qryAvailability2.Date=qryAvailability1.Date) AND (qryAvailability2.PropertyID=qryAvailability1.PropertyID)
WHERE (((qryAvailability1.Date) Is Null))
ORDER BY qryAvailability2.PropertyID, qryAvailability2.Date;
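As a quick check that inlining the two saved queries as derived tables keeps the "without matching" logic intact, here is a small sketch in Python's sqlite3 (a stand-in for Access, so the bracket/bang syntax is replaced by plain dotted names). The rows and ISO-formatted text dates are made up, and the short aliases a and b are only there to keep the example compact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProperties (PropertyID INTEGER);
    CREATE TABLE tblDates (Date TEXT);
    CREATE TABLE tblReservations (PropertyID INTEGER, CheckIn TEXT, CheckOut TEXT);
    INSERT INTO tblProperties VALUES (1), (2);
    INSERT INTO tblDates VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
    INSERT INTO tblReservations VALUES (1, '2024-01-01', '2024-01-02');
""")

available = conn.execute("""
    SELECT a.PropertyID, a.Date
    FROM (
        -- former qryAvailability2: every property on every date
        SELECT tblProperties.PropertyID AS PropertyID, tblDates.Date AS Date
        FROM tblProperties, tblDates
    ) AS a
    LEFT JOIN (
        -- former qryAvailability1: dates blocked by a reservation
        SELECT tblReservations.PropertyID AS PropertyID, tblDates.Date AS Date
        FROM tblReservations, tblDates
        WHERE tblDates.Date BETWEEN tblReservations.CheckIn
                                AND tblReservations.CheckOut
    ) AS b
        ON a.Date = b.Date AND a.PropertyID = b.PropertyID
    WHERE b.Date IS NULL          -- keep only rows with no blocked match
    ORDER BY a.PropertyID, a.Date
""").fetchall()

for row in available:
    print(row)
```

Property 1 is reserved on the first two dates, so only its third date comes back, while property 2 keeps all three.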

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 minutes.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should really cause it to take that long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1 s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct values instead of just counting; see the question "Django annotate() multiple times causes wrong answers".
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 = 24 rows for that user, one for each combination of tickets, and every plain COUNT sees all 24. Counting distinct values removes the duplicates.
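The 2, 3, 4 example can be reproduced in a few lines. This is a sketch using Python's sqlite3 rather than SQL Server, with made-up rows and only the columns the joins need. In Django, the corresponding change is passing distinct=True to each Count, for example Count('tickets_captured', distinct=True).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
    INSERT INTO auth_user VALUES (1);
""")
# User 1 captured 2 tickets, is responsible for 3 others, and watches 4.
conn.executemany(
    "INSERT INTO tickets_ticket (capturer_id, responsible_id) VALUES (?, ?)",
    [(1, None), (1, None), (None, 1), (None, 1), (None, 1)],
)
conn.executemany(
    "INSERT INTO tickets_ticket_watchers (ticket_id, user_id) VALUES (?, 1)",
    [(t,) for t in range(1, 5)],
)

query = """
    SELECT COUNT({d} t1.id), COUNT({d} t3.id), COUNT({d} w.ticket_id)
    FROM auth_user u
    LEFT JOIN tickets_ticket t1 ON u.id = t1.capturer_id
    LEFT JOIN tickets_ticket t3 ON u.id = t3.responsible_id
    LEFT JOIN tickets_ticket_watchers w ON u.id = w.user_id
    GROUP BY u.id
"""
plain_counts = conn.execute(query.format(d="")).fetchone()
distinct_counts = conn.execute(query.format(d="DISTINCT")).fetchone()

print(plain_counts)     # every column counts all 2*3*4 combinations
print(distinct_counts)  # the per-relation counts
```

Note that COUNT(DISTINCT ...) fixes the numbers but the database may still build the multiplied intermediate rows, so it does not necessarily fix the run time on its own.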
What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)

Why do I have to use DISTINCT for this to work?

Here's my problem: I have an SQL query that makes 4 calls to a lookup table to return their values from a list of combinations in another table. I finally got this working, but for some reason, when I run the query without DISTINCT, I get a ton of data back, so I'm guessing that I'm either missing something or not doing this correctly. It would be really great if this would not only work, but also return the list sorted alphabetically by the first colour name.
I'm putting my SQL here; I hope I've explained this well enough:
SELECT DISTINCT
colour1.ColourID AS colour1_ColourID,
colour1.ColourName AS colour1_ColourName,
colour1.ColourHex AS colour1_ColourHex,
colour1.ManufacturerColourID AS colour1_ManufacturerColourID,
colour2.ColourID AS colour2_ColourID,
colour2.ColourName AS colour2_ColourName,
colour2.ColourHex AS colour2_ColourHex,
colour2.QEColourID2 AS colour2_QEColourID2,
colour3.ColourID AS colour3_ColourID,
colour3.ColourName AS colour3_ColourName,
colour3.ColourHex AS colour3_ColourHex,
colour3.QEColourID3 AS colour3_QEColourID3,
colour4.ColourID AS colour4_ColourID,
colour4.ColourName AS colour4_ColourName,
colour4.ColourHex AS colour4_ColourHex,
colour4.QEColourID4 AS colour4_QEColourID4,
Combinations.ID,
Combinations.ManufacturerColourID AS Combinations_ManufacturerColourID,
Combinations.QEColourID2 AS Combinations_QEColourID2,
Combinations.QEColourID3 AS Combinations_QEColourID3,
Combinations.QEColourID4 AS Combinations_QEColourID4,
Combinations.ColourSupplierID,
ColourSuppliers.ColourSupplier
FROM
ColourSuppliers INNER JOIN
(
colour4 INNER JOIN
(
colour3 INNER JOIN
(
colour2 INNER JOIN
(
colour1 INNER JOIN Combinations ON
colour1.ColourID=Combinations.ManufacturerColourID
) ON colour2.ColourID=Combinations.QEColourID2
) ON colour3.ColourID=Combinations.QEColourID3
) ON colour4.ColourID=Combinations.QEColourID4
) ON ColourSuppliers.ColourSupplierID=Combinations.ColourSupplierID
WHERE Combinations.ColourSupplierID = ?
Thanks
Steph
It looks as though you've probably got multiple records for each set of four colour combinations in the Combinations table - posting the structure of the table might help us to work it out.
Adding the clause ORDER BY colour1.ColourName to the end of the query should sort it alphabetically by the first colour name.
My guess (and it is a guess because your SQL query is very wide!) is that you're getting a Cartesian product.
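One way to narrow this down is to look for repeated key values in the tables being joined. Since Combinations.ID is in the select list, DISTINCT can only collapse rows that share the same ID, so if ID is unique the extra rows would have to come from a lookup table that holds the same ColourID more than once. That is an assumption about your data, not something visible in the question. The sketch below uses Python's sqlite3 with made-up rows and only one of the four colour joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colour1 (ColourID INTEGER, ColourName TEXT);
    CREATE TABLE Combinations (ID INTEGER PRIMARY KEY, ManufacturerColourID INTEGER);
    -- ColourID 10 appears twice in the lookup table (made-up data).
    INSERT INTO colour1 VALUES (10, 'Azure'), (10, 'Azure'), (11, 'Crimson');
    INSERT INTO Combinations VALUES (1, 10), (2, 11);
""")

join_sql = """
    SELECT {distinct} Combinations.ID, colour1.ColourName
    FROM colour1 INNER JOIN Combinations
        ON colour1.ColourID = Combinations.ManufacturerColourID
    ORDER BY colour1.ColourName
"""
plain_rows = conn.execute(join_sql.format(distinct="")).fetchall()
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

# Diagnostic: which key values occur more than once in the lookup table?
repeated_keys = conn.execute("""
    SELECT ColourID, COUNT(*) AS n
    FROM colour1
    GROUP BY ColourID
    HAVING COUNT(*) > 1
""").fetchall()

print(plain_rows)
print(distinct_rows)
print(repeated_keys)
```

Running the same GROUP BY ... HAVING check against each joined table (and against the four colour columns of Combinations) should show where the multiplication comes from.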

How do I best get the top 2 unique rows when a JOIN is involved?

I have this query:
select top(2)
property_id_ref
,image_file
,property_name
from property_master a
inner join image_master b
on a.property_id=b.property_id_ref
inner join customer_master c
on a.customer_id=c.customer_id
When I execute it, I get the following result:
512 ~/propertyimg/3954493 id_1.jpg Commercial Land
512 ~/propertyimg/3954493.jpg Commercial Land
But I need one row per distinct property_id_ref, each with an arbitrary image_file for that property_id_ref, like this:
512 ~/propertyimg/3954493 id_1.jpg Commercial Land
513 ~/propertyimg/3119918 Id.jpg Residential Plot
For that I made a query like:
select top(2)
max(pm.property_name) as property_name
,max(im.property_id_ref) as property_id_ref
,CONVERT(varchar(5000), max( CONVERT(binary, im.image_file))) as image_file
from property_master pm
inner join image_master im
on pm.property_id=im.property_id_ref
inner join customer_master cm
on pm.customer_id=cm.customer_id
group by im.property_id_ref
So I got the same output as the one I expected. I want to know whether this is the right way to do it, or is there any other better way of doing the same thing?
I am using SQL Server 2005.
If the query you posted in the example is really all you have, this will work fine:
SELECT TOP (2)
pm.property_id,
pm.property_name,
(SELECT TOP 1 image_file
FROM image_master
WHERE property_id_ref = pm.property_id) AS image_file
FROM
property_master pm
-- This is only needed if it's possible that [image_file] can be NULL and you
-- don't want to get those rows.
WHERE
EXISTS (SELECT * FROM image_master
WHERE property_id_ref = pm.property_id)
I assume your query is more complex than that though, but I can't give you a more specific query unless you post your real query.
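To see the correlated-subquery approach return one row per property, here is a sketch in Python's sqlite3 using the sample values from the question plus one made-up property without images. SQLite has no TOP, so LIMIT stands in for it, and the ORDER BY clauses are an addition to make the picked image and the picked properties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property_master (property_id INTEGER PRIMARY KEY, property_name TEXT);
    CREATE TABLE image_master (property_id_ref INTEGER, image_file TEXT);
    INSERT INTO property_master VALUES
        (512, 'Commercial Land'),
        (513, 'Residential Plot'),
        (514, 'No Images Yet');          -- made-up row without images
    INSERT INTO image_master VALUES
        (512, '~/propertyimg/3954493 id_1.jpg'),
        (512, '~/propertyimg/3954493.jpg'),
        (513, '~/propertyimg/3119918 Id.jpg');
""")

rows = conn.execute("""
    SELECT pm.property_id,
           pm.property_name,
           (SELECT im.image_file
            FROM image_master im
            WHERE im.property_id_ref = pm.property_id
            ORDER BY im.image_file
            LIMIT 1) AS image_file       -- one image per property
    FROM property_master pm
    WHERE EXISTS (SELECT 1 FROM image_master im
                  WHERE im.property_id_ref = pm.property_id)
    ORDER BY pm.property_id
    LIMIT 2                              -- SQL Server: SELECT TOP (2)
""").fetchall()

for row in rows:
    print(row)
```

Each property appears once no matter how many images it has, and the property without images is filtered out by the EXISTS clause.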
The way you do it is the right one:
a GROUP BY on property_id_ref, with MAX picking one member of each group (the largest value, so arbitrary rather than truly random).
It's completely OK and I see no reason to change it.