How Can I Optimize an SQL Query With Multiple Conditions and Inner Join - sql

How can I optimize the SQL query below? It takes too much time. This is used to select from a very large astronomical table (a few hundred million objects), and this query runs over 12 hours which is more than the maximum allow query time. It appears that adding more restrictions via additional conditions doesn't help at all.
Your help is much appreciated!
select AVG(o.zMeanPSFMagStd) into mydb.z1819
from MeanObject as o
join ObjectThin ot on ot.ObjID=o.ObjID
inner join StackObjectAttributes soa on soa.objID=o.objid
where o.zMeanPSFMag>=18
and o.zMeanPSFMag<19
and o.zQfPerfect > 0.85
and (ot.qualityFlag & 1) = 0
and (ot.qualityFlag & 2) = 0
and o.zMeanPSFMagErr <> -999
and o.zMeanPSFMagStd <> -999
and ot.nz > 10 and soa.zpsfMajorFWHM < 6
and soa.zpsfMinorFWHM/nullif(soa.zpsfMajorFWHM,0) > 0.65
and soa.zpsfFlux/nullif(soa.zpsfFluxErr,0)>5
and (ot.b>10 or ot.b<-10)
and (ot.raMean>0 and ot.raMean<180)

Related

Clickhouse join with condition

I found strange thing, the query:
SELECT *
FROM progress as pp
ALL LEFT JOIN links as ll USING (viewId)
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8'
Result: 0 rows in set. Elapsed: 5.267 sec. Processed 8.62 million rows, 484.94 MB (1.64 million rows/s., 92.08 MB/s.)
Here modified query:
SELECT *
FROM
(SELECT *
FROM progress
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') AS p ALL
LEFT JOIN
(SELECT *
FROM links
WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) AS l ON p.viewId = l.viewId;
Result : 0 rows in set. Elapsed: 0.076 sec. Processed 4.48 million rows, 161.35 MB (58.69 million rows/s., 2.12 GB/s.)
But it looks dirty.
Isn't it supposed to optimize the query considering where condition?
What is the right way to write the query here, and what if it will be where in?
Then I try to add another join:
SELECT *
FROM
(SELECT videoUuid AS contentUuid,
viewId
FROM
(SELECT *
FROM progress
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') p ALL
LEFT JOIN
(SELECT *
FROM links
WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) USING `viewId`) ALL
LEFT JOIN `metaInfo` USING `viewId`,
`contentUuid`;
The result again very slow, considering that I want just join 3 tables with condition selection one row:
0 rows in set. Elapsed: 1.747 sec. Processed 9.13 million rows, 726.55 MB (5.22 million rows/s., 415.85 MB/s.)
At this moment the CH not very good cope with multi-joins queries (DB star-schema) and the query optimizer not good enough to rely on it completely.
So it needs to explicitly say how to 'execute' a query by using subqueries instead of joins.
Consider the test query:
SELECT table_01.number AS r
FROM numbers(87654321) AS table_01
INNER JOIN numbers(7654321) AS table_02 ON (table_01.number = table_02.number)
INNER JOIN numbers(654321) AS table_03 ON (table_02.number = table_03.number)
INNER JOIN numbers(54321) AS table_04 ON (table_03.number = table_04.number)
WHERE r = 54320
/*
┌─────r─┐
│ 54320 │
└───────┘
1 rows in set. Elapsed: 6.261 sec. Processed 96.06 million rows, 768.52 MB (15.34 million rows/s., 122.74 MB/s.)
*/
Let's rewrite it using subqueries to significantly speed it up.
SELECT number AS r
FROM numbers(87654321)
WHERE r = 54320 AND number IN (
SELECT number AS r
FROM numbers(7654321)
WHERE r = 54320 AND number IN (
SELECT number AS r
FROM numbers(654321)
WHERE r = 54320 AND number IN (
SELECT number AS r
FROM numbers(54321)
WHERE r = 54320
)
)
)
/*
┌─────r─┐
│ 54320 │
└───────┘
1 rows in set. Elapsed: 0.481 sec. Processed 96.06 million rows, 768.52 MB (199.69 million rows/s., 1.60 GB/s.)
*/
There are other ways to optimize JOIN:
use External dictionary to get rid of join on 'small'-table
use Join table engine
use ANY-strictness
use specific settings like join_algorithm, partial_merge_join_optimizations etc
Some useful refs:
Altinity webinar: Tips and tricks every ClickHouse user should know
Altinity webinar: Secrets of ClickHouse Query Performance
Isn't it supposed to optimize the query concidering where condition?
Such optimization is not implemented yet
It is expected behavior.
According to CH doc https://clickhouse.tech/docs/en/sql-reference/statements/select/join/#performance "When running a JOIN, there is no optimization of the order of execution in relation to other stages of the query. The join (a search in the right table) is run before filtering in WHERE and before aggregation."

Select from view takes too long

I have a query against a table that contains like 2 million rows using linked server.
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id
WHERE
PV.col_id = ''80C53C9B-6272-11DA-BB34-000E0C7F3ED2''')
That query results into 365 rows and is executed within 0 second.
However when I make that query into a view it runs for about minimum of 20 seconds and sometimes it reaches to 40 seconds tops.
Here's my create view script
CREATE VIEW [dbo].[myview]
AS
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id')
then
Select * from myview where PV.col_id = '80C53C9B-6272-11DA-BB34-000E0C7F3ED2'
Any idea ? Thanks !
Your queries are quite different. In the first, the where clause is part of the SQL statement passed to OPENQUERY(). This has two important effects:
The amount of data returned is much smaller, only being the rows that match the condition.
The query can be optimized with the WHERE clause.
If you need to share the table, I might suggest that you make a copy on the local server -- either using replication or scheduling a job to copy it over.

How can I optimize this SQL query? (Solarwinds Orion)

I'm very new to SQL, and still learning. I'm using a reporting tool called Solarwinds Orion, and I'm honestly not sure how specific the query I have written is to the program, so if there's anything in the query that's confusing, let me know and I'll try to figure out if it's specific to the program or not.
The problem with the query I'm running is that it times out after a very long time (maybe an hour) of running. The database I'm using is huge. Unfortunately I don't really know how huge, but I've been told it's huge.
Is there anything I am doing wrong that would have a huge performance impact?
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM
((NetflowApplicationSummary
INNER JOIN Nodes ON (NetflowApplicationSummary.NodeID = Nodes.NodeID))
INNER JOIN InterfaceTraffic ON (Nodes.NodeID = InterfaceTraffic.InterfaceID))
INNER JOIN Interfaces ON (Nodes.NodeID = Interfaces.NodeID)
WHERE
( InterfaceTraffic.DateTime > (GetDate()-30) )
AND
(Nodes.WANCircuit = 1)
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
EDIT: I ran COUNT() on each of my tables with the below result.
SELECT COUNT(*) FROM NetflowApplicationSummary # 50671011
SELECT COUNT(*) FROM Nodes # 898
SELECT COUNT(*) FROM InterfaceTraffic # 18000166
SELECT COUNT(*) FROM Interfaces # 3938
# Total : 68,676,013
I really have no idea if 68 million items is a huge database to be honest.
A couple of notes:
The INNER JOIN operator is associative, so get rid of those parenthesis in the FROM clause and let the optimizer figure out the best join order.
You may have an implied cursor from the getdate() function being called for every row. Store the value in a local variable and compare to that.
The resulting SQL should look like this:
DECLARE #Date as datetime = getdate() - 30;
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM NetflowApplicationSummary
INNER JOIN Nodes ON NetflowApplicationSummary.NodeID = Nodes.NodeID
INNER JOIN InterfaceTraffic ON Nodes.NodeID = InterfaceTraffic.InterfaceID
INNER JOIN Interfaces ON Nodes.NodeID = Interfaces.NodeID
WHERE InterfaceTraffic.DateTime > #Date
AND Nodes.WANCircuit = 1
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
Also, make sure you have an index on table InterfaceTraffic with a leading field of DateTime. If this doesn't exist you may need to pay the penalty of a first time creation of it.
If this doesn't help, then you may need to post the execution plan where it can be inspected.
Out of interest, also perform a count() on all four tables and post that result, just so members here can make their own assessment of how big your database really is. It is amazing how many non-technical people still think a 1 or 10 GB database is huge, while I run that easily on my workstation!

Improvement in SQL Query - Complex Join / IN / Exists

Looking for suggestions on speeding up this query. Access 2007.
The inner query takes a few minutes , full query takes very long (40 - 80 minutes). Result is as expected. Everything is Indexed.
SELECT qtdetails.F5, qtdetails.F16, ExpectedResult.DLID, ExpectedResult.NumRows
FROM qtdetails
INNER JOIN (INVDL
INNER JOIN ExpectedResult
ON INVDL.DLID = ExpectedResult.DLID)
ON (qtdetails.F1 = INVDL.RegionCode)
AND (qtdetails.RoundTotal = ExpectedResult.RoundTotal)
WHERE
(qtdetails.F5 IN (SELECT qtdetails.F5
FROM (ExpectedResult
INNER JOIN INVDL
ON ExpectedResult.DLID = INVDL.DLID)
INNER JOIN qtdetails
ON (INVDL.RegionCode = qtdetails.F1)
AND (ExpectedResult.RoundTotal = qtdetails.RoundTotal)
GROUP BY qtdetails.F5
HAVING (((COUNT(ExpectedResult.DLID)) < 2));
)
);
INVDL - 80,000 records
ExpectedResult - Ten Million records
qtDetails - 12,000 records
The Inner Query will result in around 5000 - 8000 records.
Tried saving the results of the Inner Query in a table. and then using
Select F5 from qTempTable instead. But still taking a lot of time.
Any help would be very highly appreciated.
Data Type :
qtdetails.F5 = Number
qtdetails.F16 = Text
ExpectedResult.NumRows = Number
INVDL.DLID = Number
ExpectedResult.DLID = Number
INVDL.RegionCode = Text
qtdetails.F1 = Text
Rebuild indexes on all the tables involved in the query. Run the query again and check time. It will decrease the execution time. I will update you soon with tuned query if I can.
Keep Querying!

Slow ORDER BY in large table

I have problem with ORDER BY clause. When I remove ORDER BY in following query, query is finished in 0.004 seconds.
If I keep it, query is running very slow = 56 seconds.
It is because MySQL is first grabbing all 500 000 records, then sorting and finally returing only first 18 records.
How can I solve this big table ordering problem ?
Thanks.
Explain:
http://img444.imageshack.us/img444/9440/explain.png
SQL Query:
SELECT `_sd`.`sazbaDPHId`, `_sd`.`sazbaDPH`, `_sd`.`sazbaDPHProcent`, `_zk`.`zboziKategorieId`, `_zk`.`zboziId`,
`_zk`.`kategorieId`, `_zk`.`zboziKategoriePoradi`, `_k`.`kategorieId`, `_k`.`kategorieNazev`, `_k`.`kategorieCelyNazev`,
`_k`.`kategorieKod`, `_k`.`kategorieCesta`, `_k`.`kategoriePopis`, `_k`.`kategorieKeywords`, `_k`.`kategorieRodiceId`, `_k`.`kategoriePoradi`, `_k`.`kategorieSkryta`, `_k`.`kategorieVTopMenu`, `_v`.`vyrobceId`, `_v`.`vyrobceNazev`, `_v`.`vyrobceKod`,
`_v`.`vyrobceKoeficient`, `_tzvz`.`typZboziVlastnostZboziId`, `_tzvz`.`typZboziId`, `_tzvz`.`vlastnostZboziId`,
`_vzh`.`vlastnostZboziHodnotaId`, `_vzh`.`zboziId`, `_vzh`.`vlastnostZboziId`, `_vzh`.`vlastnostZboziHodnota`, `zvc`.`zboziVyslCenaId` AS`zvc_zboziVyslCenaId`, `zvc`.`zboziVyslCenaZboziId` AS`zvc_zboziVyslCenaZboziId`, `zvc`.`vyslCena` AS`zvc_vyslCena`,
`zvc`.`vyslCenaSDPH` AS`zvc_vyslCenaSDPH`, `this`.`zboziId`, `this`.`zboziNazev`, `this`.`zboziKod`, `this`.`zboziIdentifikator`,
`this`.`zboziPartNum`, `this`.`zboziEAN`, `this`.`zboziPopis`, `this`.`zboziOstatniParametry`, `this`.`zboziInterniInfo`,
`this`.`zboziProdejniCena`, `this`.`zboziAkcniCena`, `this`.`zboziSetovaCena`, `this`.`zboziMocCena`, `this`.`sazbaDPHId`,
`this`.`vyrobceId`, `this`.`typZboziId`, `this`.`stavZboziId`, `this`.`skladovaDostupnostId`, `this`.`zdrojCenId`, `this`.`zboziPHE`,
`this`.`zboziAutorskyPoplatek`, `this`.`zboziVahovyPoplatek`, `this`.`nemenitStavZbozi`
FROM `tbl_Zbozi`AS this
LEFT JOIN `reg_SazbaDPH`AS _sd ON this.sazbaDPHId = _sd.sazbaDPHId
LEFT JOIN `tbl_Zbozi_Kategorie`AS _zk ON this.zboziId = _zk.zboziId
LEFT JOIN `tbl_Kategorie`AS _k ON _zk.kategorieId = _k.kategorieId
LEFT JOIN `tbl_Vyrobce`AS _v ON this.vyrobceId = _v.vyrobceId
LEFT JOIN `tbl_TypZbozi_VlastnostZbozi`AS _tzvz ON this.typZboziId = _tzvz.typZboziId
LEFT JOIN `tbl_VlastnostZboziHodnota`AS _vzh ON this.zboziId = _vzh.zboziId AND _vzh.vlastnostZboziId = _tzvz.vlastnostZboziId
LEFT JOIN `tbl_Zbozi_VyslCena`AS zvc ON this.zboziId = zvc.zboziVyslCenaZboziId
WHERE _k.kategorieId IN (155317, 5570, 155445, 5706, 5707, 155429, 155430, 155431, 5708, 5709, 5710, 155427, 155426, 155428, 11413, 5713,
5714, 5715, 5716, 5717, 5718, 5719, 5720, 10245, 10253, 11253, 10834, 10269, 10249, 10246, 10247, 10248, 5723, 5725, 5726, 5727, 5728, 5729,
155319, 5815, 5816, 5817, 5818, 5819, 5822, 5824, 5832, 11406, 11411, 11410, 11409,
6069, 6070, 6072, 6073, 6075, 6078, 6086, 11414, 6185, 155433, 6186, 6187, 6188, 6190, 6191, 6193, 6198, 6199, 6200, 6201, 6202, 6203, 6207,
6209, 11442, 6210, 6211, 6212, 6215, 6216, 6217, 6218, 6219, 6220, 155366, 6221, 11339, 11340, 11341, 11359, 6222, 6223, 6224, 6225, 6226,
6227, 6228, 11099, 155376, 6231, 6232, 6233, 6234, 6235, 6236, 155391, 155392, 155437, 6237, 6238, 6241, 6243, 6244, 6245, 6246, 6247, 6248,
6249, 6250, 6251, 6252, 6253, 6254, 6256, 6257, 6258, 6259, 6260, 6261, 10839, 155362, 6262, 6263, 6264, 6265, 155361, 6267, 6269, 11390,
11346, 11112, 11394, 11397, 155393, 6270, 11436, 10292, 6271, 6272, 6275, 6277, 6278, 6279, 6280, 6281, 11348, 10288, 11113, 6283, 6284,
6285, 6287, 155494, 11114, 6292, 6293, 6294, 6295, 6296, 6297, 6298, 6300, 6301, 6302, 6303, 6304, 11116, 6305, 10781, 6306, 6307, 6308,
6309, 6310, 6311, 6313, 6314, 6315, 6316, 6317, 6318, 6327, 6328, 155451, 6333, 6334, 6335, 6337, 6340, 6342, 6343, 6344, 6345, 6346, 11344,
11389, 10289, 10291, 10302, 10303, 10304, 10294, 10306, 10300, 10305, 10293, 10299, 10298, 10290, 10296, 10297, 11454, 11100, 11101,
11117, 131475, 11402, 5680, 5684, 5685, 5686, 5687, 5688, 5689, 11383, 5702, 5703, 5704, 5705)
AND stavZboziId IN (2)
AND zvc.zboziVyslCenaSkupinaId = '8'
ORDER BY _k.kategoriePoradi ASC LIMIT 18
How else could it work? If it applied the order after selecting the 18 records you would get the top 18 records in the default order, which it would then sort.
You might get better performance by inserting all the values in your IN statement into a temp table and then joining to the temp table.
it still has to sort 500,000 records to find your 18 so it will be alot slower, you can speed it up by adding indexes to your kategoriePoradi column in your tables