Postgres view causing queries to take too long - sql

I'm trying to replace the entity.school view with the entity.current_facts_with_frg view.
There are a few differences between the two views, one of which is that entity.school only includes a portion of the data from entity.current_facts_with_frg, so to get the same data, I'd have to run:
select * from entity.current_facts_with_frg where type = 'School'
The above query takes about 30 seconds to run, while select * from entity.school takes less than a second. I am okay with that time difference. However, when I try to replace:
select * from entity.relationship r
join entity.school_network sn on r.related_id = sn.id
join entity.school s on r.main_id = s.id
with
select * from entity.relationship r
join entity.school_network sn on r.related_id = sn.id
join (select * from entity.current_facts_with_frg where type = 'School') s on r.main_id = s.id
there ends up being a huge difference in query speed: the first query takes about 4 minutes, and the second takes over 30 minutes.
I'm confused, as I would have presumed that it should only take the extra 30 seconds from the initial time discrepancy.

Related

How to optimize too slow SQL query

I need my query to run weekly, but it takes very long (about one hour per execution). The sheer volume of data makes it slow, but I wondered how I could optimize it. I'm an SQL novice. This is my query:
SELECT PRIME_MINISTER.PRIME_MINISTER_ID,
PRIME_MINISTER.PRIME_MINISTER_NAME,
CITY.CITY_ID,
CITY.CITY_POPULATION,
CITY.CITY_FOUNDATION_DATE,
STATE.STATE_ID,
STATE.STATE_NAME,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_ID,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_SOCIAL,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_NAME,
CITY_COUNCIL.CITY_COUNCIL_ID,
CITY_COUNCIL.CITY_COUNCIL_FREQUENCY,
CITY_DEBT.CITY_DEBT_ID,
CITY_DEBT.CITY_DEBT_NATURE,
CITY_DEBT.CITY_DEBT_AMOUNT,
HEAD_OF_STATE.HEAD_OF_STATE_ID,
HEAD_OF_STATE.HEAD_OF_STATE_SOCIAL,
DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID,
DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_SOCIAL,
DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_NAME
FROM CITY
LEFT JOIN CITY_COUNCIL ON CITY.CITY_ID = CITY_COUNCIL.CITY_ID
LEFT JOIN CITY_DEBT ON CITY_COUNCIL.CITY_COUNCIL_ID = CITY_DEBT.CITY_COUNCIL_ID
OR CITY.CITY_ID = CITY_DEBT.CITY_ID
INNER JOIN CITY_ACCOUNTANT ON CITY_ACCOUNTANT.CITY_ACCOUNTANT_ID = CITY.CITY_ACCOUNTANT_ID
INNER JOIN STATE ON STATE.STATE_ID = CITY.STATE_ID
INNER JOIN HEAD_OF_STATE ON HEAD_OF_STATE.HEAD_OF_STATE_ID = STATE.HEAD_OF_STATE_ID
INNER JOIN DEPUTY_HEAD_OF_STATE ON DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID = HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID
INNER JOIN PRIME_MINISTER ON STATE.STATE_ID = PRIME_MINISTER.STATE_ID
WHERE CITY.CITY_STATUS = 2
AND CITY.PRIME_MINISTER_STATUS = 2
AND CITY.JURISDICTION = '70'
AND CITY.CITY_ACCOUNTANT_NATURE = 'S'
ORDER BY DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID,
HEAD_OF_STATE.HEAD_OF_STATE_ID,
STATE.STATE_ID,
CITY.CITY_ID,
CITY_COUNCIL.CITY_COUNCIL_ID,
CITY_DEBT.CITY_DEBT_ID,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_ID;
I select all of this data in the reading part of a Spring Batch job so that I can write it to a file.
This is the database model:
Database Model
The database is not mine, so I can't modify the database model, but I can create indexes if needed.
There are between 1,000 and 7,000 rows selected per execution. All the columns are needed.
CITY_ACCOUTANT in SQL vs CITY_ACCOUNTANT in picture: is this the right query or a typo?
Is the Spring Batch process taking an hour, or just the query?

Clickhouse join with condition

I found something strange. This query:
SELECT *
FROM progress as pp
ALL LEFT JOIN links as ll USING (viewId)
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8'
Result: 0 rows in set. Elapsed: 5.267 sec. Processed 8.62 million rows, 484.94 MB (1.64 million rows/s., 92.08 MB/s.)
Here is the modified query:
SELECT *
FROM
(SELECT *
FROM progress
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') AS p ALL
LEFT JOIN
(SELECT *
FROM links
WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) AS l ON p.viewId = l.viewId;
Result : 0 rows in set. Elapsed: 0.076 sec. Processed 4.48 million rows, 161.35 MB (58.69 million rows/s., 2.12 GB/s.)
But it looks dirty.
Isn't it supposed to optimize the query considering the WHERE condition?
What is the right way to write the query here, and what if the condition is a WHERE ... IN?
Then I tried adding another join:
SELECT *
FROM
(SELECT videoUuid AS contentUuid,
viewId
FROM
(SELECT *
FROM progress
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') p ALL
LEFT JOIN
(SELECT *
FROM links
WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) USING `viewId`) ALL
LEFT JOIN `metaInfo` USING `viewId`,
`contentUuid`;
The result is again very slow, considering that I just want to join 3 tables with a condition that selects one row:
0 rows in set. Elapsed: 1.747 sec. Processed 9.13 million rows, 726.55 MB (5.22 million rows/s., 415.85 MB/s.)
At the moment, ClickHouse does not cope well with multi-join queries (star-schema databases), and the query optimizer is not good enough to rely on completely.
So you need to state explicitly how to 'execute' a query, by using subqueries instead of joins.
Consider the test query:
SELECT table_01.number AS r
FROM numbers(87654321) AS table_01
INNER JOIN numbers(7654321) AS table_02 ON (table_01.number = table_02.number)
INNER JOIN numbers(654321) AS table_03 ON (table_02.number = table_03.number)
INNER JOIN numbers(54321) AS table_04 ON (table_03.number = table_04.number)
WHERE r = 54320
/*
┌─────r─┐
│ 54320 │
└───────┘
1 rows in set. Elapsed: 6.261 sec. Processed 96.06 million rows, 768.52 MB (15.34 million rows/s., 122.74 MB/s.)
*/
Let's rewrite it using subqueries to significantly speed it up.
SELECT number AS r
FROM numbers(87654321)
WHERE r = 54320 AND number IN (
SELECT number AS r
FROM numbers(7654321)
WHERE r = 54320 AND number IN (
SELECT number AS r
FROM numbers(654321)
WHERE r = 54320 AND number IN (
SELECT number AS r
FROM numbers(54321)
WHERE r = 54320
)
)
)
/*
┌─────r─┐
│ 54320 │
└───────┘
1 rows in set. Elapsed: 0.481 sec. Processed 96.06 million rows, 768.52 MB (199.69 million rows/s., 1.60 GB/s.)
*/
There are other ways to optimize JOIN:
use External dictionary to get rid of join on 'small'-table
use Join table engine
use ANY-strictness
use specific settings like join_algorithm, partial_merge_join_optimizations etc
Some useful refs:
Altinity webinar: Tips and tricks every ClickHouse user should know
Altinity webinar: Secrets of ClickHouse Query Performance
Isn't it supposed to optimize the query considering the WHERE condition?
Such an optimization is not implemented yet.
It is expected behavior.
According to the ClickHouse documentation (https://clickhouse.tech/docs/en/sql-reference/statements/select/join/#performance): "When running a JOIN, there is no optimization of the order of execution in relation to other stages of the query. The join (a search in the right table) is run before filtering in WHERE and before aggregation."
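The rewrite discussed in this thread (filter each table first, then join) can be checked for equivalence on a toy dataset. The sketch below uses Python's built-in sqlite3 purely as a stand-in for ClickHouse; the table contents and the url column are made up for illustration, and it says nothing about ClickHouse timings. It only shows that, for a filter on the join key, both forms return the same rows.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE progress (viewId TEXT, videoUuid TEXT);
        CREATE TABLE links (viewId TEXT, url TEXT);
        INSERT INTO progress VALUES ('a', 'v1'), ('a', 'v2'), ('b', 'v3');
        INSERT INTO links VALUES ('a', 'u1'), ('c', 'u2');
    """)
    return conn


# Form 1: join everything, then filter in WHERE.
JOIN_THEN_FILTER = """
    SELECT pp.viewId, pp.videoUuid, ll.url
    FROM progress AS pp
    LEFT JOIN links AS ll ON pp.viewId = ll.viewId
    WHERE pp.viewId = ?
    ORDER BY pp.videoUuid, ll.url
"""

# Form 2: filter each side in a subquery, then join the small results.
FILTER_THEN_JOIN = """
    SELECT p.viewId, p.videoUuid, l.url
    FROM (SELECT * FROM progress WHERE viewId = ?) AS p
    LEFT JOIN (SELECT * FROM links WHERE viewId = ?) AS l
        ON p.viewId = l.viewId
    ORDER BY p.videoUuid, l.url
"""


def join_then_filter(conn, view_id):
    return conn.execute(JOIN_THEN_FILTER, (view_id,)).fetchall()


def filter_then_join(conn, view_id):
    # The same value is bound twice, once per filtered subquery.
    return conn.execute(FILTER_THEN_JOIN, (view_id, view_id)).fetchall()
```

Because the filter is on the join key and the left side of the LEFT JOIN, pushing it into both subqueries does not change the result; with a filter on a right-side column the two forms would not be interchangeable.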

Select from view takes too long

I have a query, run through a linked server, against a table that contains about 2 million rows.
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id
WHERE
PV.col_id = ''80C53C9B-6272-11DA-BB34-000E0C7F3ED2''')
That query returns 365 rows and executes in 0 seconds.
However, when I turn that query into a view, it runs for at least 20 seconds and sometimes as long as 40 seconds.
Here's my create view script:
CREATE VIEW [dbo].[myview]
AS
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id')
then
Select * from myview where PV.col_id = '80C53C9B-6272-11DA-BB34-000E0C7F3ED2'
Any ideas? Thanks!
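Your queries are quite different. In the first, the WHERE clause is part of the SQL statement passed to OPENQUERY(). This has two important effects:
The amount of data returned is much smaller, being only the rows that match the condition.
The query can be optimized with the WHERE clause.
In the view, the statement passed to OPENQUERY() has no filter, so the remote server returns every joined row and the condition is applied locally afterwards.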
If you need to share the table, I might suggest that you make a copy on the local server -- either using replication or scheduling a job to copy it over.
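To make the difference concrete: OPENQUERY does not accept variables for its arguments, so getting the filter into the pass-through text means building the statement as a string. Here is a minimal sketch in Python, assuming the col_id values are GUIDs as in the question; the helper names remote_select and wrap_in_openquery are hypothetical, and the remote query is shortened to one table for brevity.

```python
import uuid


def remote_select(col_id: str) -> str:
    """Build the SQL that should run on the linked server, filter included."""
    # Parsing as a UUID rejects anything that is not a GUID, so the value
    # is safe to inline into the statement text.
    guid = str(uuid.UUID(col_id)).upper()
    return (
        "SELECT PV.col1, PV.col2, PV.col3 "
        "FROM LinkedDbName.dbo.tbl1 PV "
        f"WHERE PV.col_id = '{guid}'"
    )


def wrap_in_openquery(linked_server: str, remote_sql: str) -> str:
    """Wrap remote SQL in OPENQUERY, doubling single quotes for T-SQL."""
    escaped = remote_sql.replace("'", "''")
    return f"SELECT * FROM OPENQUERY({linked_server}, '{escaped}')"
```

Calling wrap_in_openquery("LinkedServerName", remote_select(some_guid)) yields a statement in which the WHERE clause sits inside the quoted pass-through text, as in the fast version of the query, rather than outside it as in the view.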

Improvement in SQL Query - Complex Join / IN / Exists

Looking for suggestions on speeding up this query. Access 2007.
The inner query takes a few minutes; the full query takes very long (40 - 80 minutes). The result is as expected. Everything is indexed.
SELECT qtdetails.F5, qtdetails.F16, ExpectedResult.DLID, ExpectedResult.NumRows
FROM qtdetails
INNER JOIN (INVDL
INNER JOIN ExpectedResult
ON INVDL.DLID = ExpectedResult.DLID)
ON (qtdetails.F1 = INVDL.RegionCode)
AND (qtdetails.RoundTotal = ExpectedResult.RoundTotal)
WHERE
(qtdetails.F5 IN (SELECT qtdetails.F5
FROM (ExpectedResult
INNER JOIN INVDL
ON ExpectedResult.DLID = INVDL.DLID)
INNER JOIN qtdetails
ON (INVDL.RegionCode = qtdetails.F1)
AND (ExpectedResult.RoundTotal = qtdetails.RoundTotal)
GROUP BY qtdetails.F5
HAVING (((COUNT(ExpectedResult.DLID)) < 2))
)
);
INVDL - 80,000 records
ExpectedResult - Ten Million records
qtDetails - 12,000 records
The inner query returns around 5,000 - 8,000 records.
I tried saving the results of the inner query in a table and then using
Select F5 from qTempTable instead, but it still takes a lot of time.
Any help would be very highly appreciated.
Data types:
qtdetails.F5 = Number
qtdetails.F16 = Text
ExpectedResult.NumRows = Number
INVDL.DLID = Number
ExpectedResult.DLID = Number
INVDL.RegionCode = Text
qtdetails.F1 = Text
Rebuild the indexes on all the tables involved in the query, then run the query again and check the time. This should decrease the execution time. I will update you soon with a tuned query if I can.
Keep Querying!

Slow ORDER BY in large table

I have a problem with an ORDER BY clause. When I remove the ORDER BY from the following query, it finishes in 0.004 seconds.
If I keep it, the query runs very slowly: 56 seconds.
This is because MySQL first fetches all 500,000 records, then sorts them, and finally returns only the first 18.
How can I solve this ordering problem on a big table?
Thanks.
Explain:
http://img444.imageshack.us/img444/9440/explain.png
SQL Query:
SELECT `_sd`.`sazbaDPHId`, `_sd`.`sazbaDPH`, `_sd`.`sazbaDPHProcent`, `_zk`.`zboziKategorieId`, `_zk`.`zboziId`,
`_zk`.`kategorieId`, `_zk`.`zboziKategoriePoradi`, `_k`.`kategorieId`, `_k`.`kategorieNazev`, `_k`.`kategorieCelyNazev`,
`_k`.`kategorieKod`, `_k`.`kategorieCesta`, `_k`.`kategoriePopis`, `_k`.`kategorieKeywords`, `_k`.`kategorieRodiceId`, `_k`.`kategoriePoradi`, `_k`.`kategorieSkryta`, `_k`.`kategorieVTopMenu`, `_v`.`vyrobceId`, `_v`.`vyrobceNazev`, `_v`.`vyrobceKod`,
`_v`.`vyrobceKoeficient`, `_tzvz`.`typZboziVlastnostZboziId`, `_tzvz`.`typZboziId`, `_tzvz`.`vlastnostZboziId`,
`_vzh`.`vlastnostZboziHodnotaId`, `_vzh`.`zboziId`, `_vzh`.`vlastnostZboziId`, `_vzh`.`vlastnostZboziHodnota`, `zvc`.`zboziVyslCenaId` AS `zvc_zboziVyslCenaId`, `zvc`.`zboziVyslCenaZboziId` AS `zvc_zboziVyslCenaZboziId`, `zvc`.`vyslCena` AS `zvc_vyslCena`,
`zvc`.`vyslCenaSDPH` AS `zvc_vyslCenaSDPH`, `this`.`zboziId`, `this`.`zboziNazev`, `this`.`zboziKod`, `this`.`zboziIdentifikator`,
`this`.`zboziPartNum`, `this`.`zboziEAN`, `this`.`zboziPopis`, `this`.`zboziOstatniParametry`, `this`.`zboziInterniInfo`,
`this`.`zboziProdejniCena`, `this`.`zboziAkcniCena`, `this`.`zboziSetovaCena`, `this`.`zboziMocCena`, `this`.`sazbaDPHId`,
`this`.`vyrobceId`, `this`.`typZboziId`, `this`.`stavZboziId`, `this`.`skladovaDostupnostId`, `this`.`zdrojCenId`, `this`.`zboziPHE`,
`this`.`zboziAutorskyPoplatek`, `this`.`zboziVahovyPoplatek`, `this`.`nemenitStavZbozi`
FROM `tbl_Zbozi` AS this
LEFT JOIN `reg_SazbaDPH` AS _sd ON this.sazbaDPHId = _sd.sazbaDPHId
LEFT JOIN `tbl_Zbozi_Kategorie` AS _zk ON this.zboziId = _zk.zboziId
LEFT JOIN `tbl_Kategorie` AS _k ON _zk.kategorieId = _k.kategorieId
LEFT JOIN `tbl_Vyrobce` AS _v ON this.vyrobceId = _v.vyrobceId
LEFT JOIN `tbl_TypZbozi_VlastnostZbozi` AS _tzvz ON this.typZboziId = _tzvz.typZboziId
LEFT JOIN `tbl_VlastnostZboziHodnota` AS _vzh ON this.zboziId = _vzh.zboziId AND _vzh.vlastnostZboziId = _tzvz.vlastnostZboziId
LEFT JOIN `tbl_Zbozi_VyslCena` AS zvc ON this.zboziId = zvc.zboziVyslCenaZboziId
WHERE _k.kategorieId IN (155317, 5570, 155445, 5706, 5707, 155429, 155430, 155431, 5708, 5709, 5710, 155427, 155426, 155428, 11413, 5713,
5714, 5715, 5716, 5717, 5718, 5719, 5720, 10245, 10253, 11253, 10834, 10269, 10249, 10246, 10247, 10248, 5723, 5725, 5726, 5727, 5728, 5729,
155319, 5815, 5816, 5817, 5818, 5819, 5822, 5824, 5832, 11406, 11411, 11410, 11409,
6069, 6070, 6072, 6073, 6075, 6078, 6086, 11414, 6185, 155433, 6186, 6187, 6188, 6190, 6191, 6193, 6198, 6199, 6200, 6201, 6202, 6203, 6207,
6209, 11442, 6210, 6211, 6212, 6215, 6216, 6217, 6218, 6219, 6220, 155366, 6221, 11339, 11340, 11341, 11359, 6222, 6223, 6224, 6225, 6226,
6227, 6228, 11099, 155376, 6231, 6232, 6233, 6234, 6235, 6236, 155391, 155392, 155437, 6237, 6238, 6241, 6243, 6244, 6245, 6246, 6247, 6248,
6249, 6250, 6251, 6252, 6253, 6254, 6256, 6257, 6258, 6259, 6260, 6261, 10839, 155362, 6262, 6263, 6264, 6265, 155361, 6267, 6269, 11390,
11346, 11112, 11394, 11397, 155393, 6270, 11436, 10292, 6271, 6272, 6275, 6277, 6278, 6279, 6280, 6281, 11348, 10288, 11113, 6283, 6284,
6285, 6287, 155494, 11114, 6292, 6293, 6294, 6295, 6296, 6297, 6298, 6300, 6301, 6302, 6303, 6304, 11116, 6305, 10781, 6306, 6307, 6308,
6309, 6310, 6311, 6313, 6314, 6315, 6316, 6317, 6318, 6327, 6328, 155451, 6333, 6334, 6335, 6337, 6340, 6342, 6343, 6344, 6345, 6346, 11344,
11389, 10289, 10291, 10302, 10303, 10304, 10294, 10306, 10300, 10305, 10293, 10299, 10298, 10290, 10296, 10297, 11454, 11100, 11101,
11117, 131475, 11402, 5680, 5684, 5685, 5686, 5687, 5688, 5689, 11383, 5702, 5703, 5704, 5705)
AND stavZboziId IN (2)
AND zvc.zboziVyslCenaSkupinaId = '8'
ORDER BY _k.kategoriePoradi ASC LIMIT 18
How else could it work? If it applied the order after selecting the 18 records you would get the top 18 records in the default order, which it would then sort.
You might get better performance by inserting all the values in your IN statement into a temp table and then joining to the temp table.
It still has to sort 500,000 records to find your 18, so it will be a lot slower. You can speed it up by adding an index on the kategoriePoradi column.