Slow ORDER BY in large table

Slow ORDER BY in large table - sql

I have problem with ORDER BY clause. When I remove ORDER BY in following query, query is finished in 0.004 seconds.
If I keep it, query is running very slow = 56 seconds.
It is because MySQL is first grabbing all 500 000 records, then sorting and finally returing only first 18 records.
How can I solve this big table ordering problem ?
Thanks.
Explain:
http://img444.imageshack.us/img444/9440/explain.png
SQL Query:
SELECT `_sd`.`sazbaDPHId`, `_sd`.`sazbaDPH`, `_sd`.`sazbaDPHProcent`, `_zk`.`zboziKategorieId`, `_zk`.`zboziId`,
`_zk`.`kategorieId`, `_zk`.`zboziKategoriePoradi`, `_k`.`kategorieId`, `_k`.`kategorieNazev`, `_k`.`kategorieCelyNazev`,
`_k`.`kategorieKod`, `_k`.`kategorieCesta`, `_k`.`kategoriePopis`, `_k`.`kategorieKeywords`, `_k`.`kategorieRodiceId`, `_k`.`kategoriePoradi`, `_k`.`kategorieSkryta`, `_k`.`kategorieVTopMenu`, `_v`.`vyrobceId`, `_v`.`vyrobceNazev`, `_v`.`vyrobceKod`,
`_v`.`vyrobceKoeficient`, `_tzvz`.`typZboziVlastnostZboziId`, `_tzvz`.`typZboziId`, `_tzvz`.`vlastnostZboziId`,
`_vzh`.`vlastnostZboziHodnotaId`, `_vzh`.`zboziId`, `_vzh`.`vlastnostZboziId`, `_vzh`.`vlastnostZboziHodnota`, `zvc`.`zboziVyslCenaId` AS`zvc_zboziVyslCenaId`, `zvc`.`zboziVyslCenaZboziId` AS`zvc_zboziVyslCenaZboziId`, `zvc`.`vyslCena` AS`zvc_vyslCena`,
`zvc`.`vyslCenaSDPH` AS`zvc_vyslCenaSDPH`, `this`.`zboziId`, `this`.`zboziNazev`, `this`.`zboziKod`, `this`.`zboziIdentifikator`,
`this`.`zboziPartNum`, `this`.`zboziEAN`, `this`.`zboziPopis`, `this`.`zboziOstatniParametry`, `this`.`zboziInterniInfo`,
`this`.`zboziProdejniCena`, `this`.`zboziAkcniCena`, `this`.`zboziSetovaCena`, `this`.`zboziMocCena`, `this`.`sazbaDPHId`,
`this`.`vyrobceId`, `this`.`typZboziId`, `this`.`stavZboziId`, `this`.`skladovaDostupnostId`, `this`.`zdrojCenId`, `this`.`zboziPHE`,
`this`.`zboziAutorskyPoplatek`, `this`.`zboziVahovyPoplatek`, `this`.`nemenitStavZbozi`
FROM `tbl_Zbozi`AS this
LEFT JOIN `reg_SazbaDPH`AS _sd ON this.sazbaDPHId = _sd.sazbaDPHId
LEFT JOIN `tbl_Zbozi_Kategorie`AS _zk ON this.zboziId = _zk.zboziId
LEFT JOIN `tbl_Kategorie`AS _k ON _zk.kategorieId = _k.kategorieId
LEFT JOIN `tbl_Vyrobce`AS _v ON this.vyrobceId = _v.vyrobceId
LEFT JOIN `tbl_TypZbozi_VlastnostZbozi`AS _tzvz ON this.typZboziId = _tzvz.typZboziId
LEFT JOIN `tbl_VlastnostZboziHodnota`AS _vzh ON this.zboziId = _vzh.zboziId AND _vzh.vlastnostZboziId = _tzvz.vlastnostZboziId
LEFT JOIN `tbl_Zbozi_VyslCena`AS zvc ON this.zboziId = zvc.zboziVyslCenaZboziId
WHERE _k.kategorieId IN (155317, 5570, 155445, 5706, 5707, 155429, 155430, 155431, 5708, 5709, 5710, 155427, 155426, 155428, 11413, 5713,
5714, 5715, 5716, 5717, 5718, 5719, 5720, 10245, 10253, 11253, 10834, 10269, 10249, 10246, 10247, 10248, 5723, 5725, 5726, 5727, 5728, 5729,
155319, 5815, 5816, 5817, 5818, 5819, 5822, 5824, 5832, 11406, 11411, 11410, 11409,
6069, 6070, 6072, 6073, 6075, 6078, 6086, 11414, 6185, 155433, 6186, 6187, 6188, 6190, 6191, 6193, 6198, 6199, 6200, 6201, 6202, 6203, 6207,
6209, 11442, 6210, 6211, 6212, 6215, 6216, 6217, 6218, 6219, 6220, 155366, 6221, 11339, 11340, 11341, 11359, 6222, 6223, 6224, 6225, 6226,
6227, 6228, 11099, 155376, 6231, 6232, 6233, 6234, 6235, 6236, 155391, 155392, 155437, 6237, 6238, 6241, 6243, 6244, 6245, 6246, 6247, 6248,
6249, 6250, 6251, 6252, 6253, 6254, 6256, 6257, 6258, 6259, 6260, 6261, 10839, 155362, 6262, 6263, 6264, 6265, 155361, 6267, 6269, 11390,
11346, 11112, 11394, 11397, 155393, 6270, 11436, 10292, 6271, 6272, 6275, 6277, 6278, 6279, 6280, 6281, 11348, 10288, 11113, 6283, 6284,
6285, 6287, 155494, 11114, 6292, 6293, 6294, 6295, 6296, 6297, 6298, 6300, 6301, 6302, 6303, 6304, 11116, 6305, 10781, 6306, 6307, 6308,
6309, 6310, 6311, 6313, 6314, 6315, 6316, 6317, 6318, 6327, 6328, 155451, 6333, 6334, 6335, 6337, 6340, 6342, 6343, 6344, 6345, 6346, 11344,
11389, 10289, 10291, 10302, 10303, 10304, 10294, 10306, 10300, 10305, 10293, 10299, 10298, 10290, 10296, 10297, 11454, 11100, 11101,
11117, 131475, 11402, 5680, 5684, 5685, 5686, 5687, 5688, 5689, 11383, 5702, 5703, 5704, 5705)
AND stavZboziId IN (2)
AND zvc.zboziVyslCenaSkupinaId = '8'
ORDER BY _k.kategoriePoradi ASC LIMIT 18

How else could it work? If it applied the order after selecting the 18 records you would get the top 18 records in the default order, which it would then sort.
You might get better performance by inserting all the values in your IN statement into a temp table and then joining to the temp table.

it still has to sort 500,000 records to find your 18 so it will be alot slower, you can speed it up by adding indexes to your kategoriePoradi column in your tables

Related

Update Query that Lists Rows Not Updated

I have a 12 million row SQL Server table that when I run the following query it shows approximately 11.6 million rows updated:
UPDATE [HCRIS]
SET [HCRIS].[ST_ABB] = dbo.[PRVDR_CHANGE].[state_abbreviation]
FROM [HCRIS]
INNER JOIN dbo.[PRVDR_CHANGE]
ON [HCRIS].[PRVDR_NUM] = dbo.[PRVDR_CHANGE].[PRVDR_NUM]
WHERE [HCRIS].[PRVDR_NUM] = dbo.[PRVDR_CHANGE].[PRVDR_NUM];
I have checked for nulls and non-numeric values in a numeric column but would love to pinpoint those rows that were not updated. My intent is to update all rows.
Thank you.

Use EXCEPT to find rows that won't get updated - where you select all rows in the first query and rows you would have updated in the second.
SELECT H.*
FROM [HCRIS] H
EXCEPT
SELECT H.*
FROM dbo.[HCRIS] H
INNER JOIN dbo.[PRVDR_CHANGE] C
ON H.[PRVDR_NUM] = C.[PRVDR_NUM]
WHERE H.[PRVDR_NUM] = C.[PRVDR_NUM];

SQL query Optimisation JOIN multiple column

I have two tables on Microsoft Access: T_DATAS (about 200 000 rows) and T_REAF (about 1000 rows).
T_DATAS has a lot of columns (about 30 columns) and T_REAF has about 10 columns.
I have to tell you that I am not allowed to change those tables nor to create other tables. I have to work with it.
Both tables have 6 columns that are the same. I need to join the tables on these 6 columns, to select ALL the columns from T_DATAS AND the columns that are in T_REAF but not in T_DATAS.
My query is :
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A LEFT JOIN T_REAF B ON
A.REGION LIKE B.REGION AND
A.PAYS LIKE B.PAYS AND
A.MARQUE LIKE B.MARQUE AND
A.MODELE LIKE B.MODELE AND
A.CARROS LIKE B.CARROS AND
A.SEGT LIKE B.SEGT
I have the result I need but the problem is that this query is taking way too long to give the result (about 3 minutes).
I know that T_DATAS contains a lot of rows (200 000) but I think that 3 minutes is too long for this query.
Could you please tell me what is wrong with this query?
Thanks a lot for your help

Two steps for this. One is changing the query to use =. I'm not 100% sure if this is necessary, but it can't hurt. The second is to create an index.
So:
SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR
INTO FINALTABLE
FROM T_DATAS D LEFT JOIN
T_REAF R
ON D.REGION = R.REGION AND
D.PAYS = R.PAYS AND
D.MARQUE = R.MARQUE AND
D.MODELE = R.MODELE AND
D.CARROS = R.CARROS AND
D.SEGT = R.SEGT;
Second, you want an index on T_REAF:
CREATE INDEX IDX_REAF_6 ON T_REAF(REGION, PAYS, MARQUE, MODELE, CARROS, SEGT);
MS Access can then use the index for the JOIN, speeding the query.
Note that I changed the table aliases to be abbreviations for the table names. This makes it easier to follow the logic in the query.

I assume that those 6 columns are same may have same datatype also.
Note: Equals (=) operator is a comparison operator - that compares two values for equality. So in your query replace LIKE with = and see the result time.
SELECT A.*
,B.CARROS_NEW
,B.SEGT_NEW
,B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION = B.REGION
AND A.PAYS = B.PAYS
AND A.MARQUE = B.MARQUE
AND A.MODELE = B.MODELE
AND A.CARROS = B.CARROS
AND A.SEGT = B.SEGT

Optimize SQL query with many left join

I have a SQL query with many left joins
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID
WHERE 1 = 1
AND po.CUST_APP_NO LIKE '%100010233976%'
AND 1 = 1
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
The performance seems to be not good and I would like to optimize the query.
Any suggestions please?

Try this one -
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE PO.CUST_APP_NO LIKE '%100010233976%'
AND PO.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10

First, check your indexes. Are they old? Did they get fragmented? Do they need rebuilding?
Then, check your "execution plan" (varies depending on the SQL Engine): are all joins properly understood? Are some of them 'out of order'? Do some of them transfer too many data?
Then, check your plan and indexes: are all important columns covered? Are there any outstandingly lengthy table scans or joins? Are the columns in indexes IN ORDER with the query?
Then, revise your query:
- can you extract some parts that normally would quickly generate small rowset?
- can you add new columns to indexes so join/filter expressions will get covered?
- or reorder them so they match the query better?
And, supporting the solution from #Devart:
Can you eliminate some tables on the way? does the where touch the other tables at all? does the data in the other tables modify the count significantly? If neither SELECT nor WHERE never touches the other joined columns, and if the COUNT exact value is not that important (i.e. does that T_PROPOSAL_INFO exist?) then you might remove all the joins completely, as Devart suggested. LEFTJOINs never reduce the number of rows. They only copy/expand/multiply the rows.

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?

SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.

what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)

Improvement in SQL Query - Complex Join / IN / Exists

Looking for suggestions on speeding up this query. Access 2007.
The inner query takes a few minutes , full query takes very long (40 - 80 minutes). Result is as expected. Everything is Indexed.
SELECT qtdetails.F5, qtdetails.F16, ExpectedResult.DLID, ExpectedResult.NumRows
FROM qtdetails
INNER JOIN (INVDL
INNER JOIN ExpectedResult
ON INVDL.DLID = ExpectedResult.DLID)
ON (qtdetails.F1 = INVDL.RegionCode)
AND (qtdetails.RoundTotal = ExpectedResult.RoundTotal)
WHERE
(qtdetails.F5 IN (SELECT qtdetails.F5
FROM (ExpectedResult
INNER JOIN INVDL
ON ExpectedResult.DLID = INVDL.DLID)
INNER JOIN qtdetails
ON (INVDL.RegionCode = qtdetails.F1)
AND (ExpectedResult.RoundTotal = qtdetails.RoundTotal)
GROUP BY qtdetails.F5
HAVING (((COUNT(ExpectedResult.DLID)) < 2));
)
);
INVDL - 80,000 records
ExpectedResult - Ten Million records
qtDetails - 12,000 records
The Inner Query will result in around 5000 - 8000 records.
Tried saving the results of the Inner Query in a table. and then using
Select F5 from qTempTable instead. But still taking a lot of time.
Any help would be very highly appreciated.
Data Type :
qtdetails.F5 = Number
qtdetails.F16 = Text
ExpectedResult.NumRows = Number
INVDL.DLID = Number
ExpectedResult.DLID = Number
INVDL.RegionCode = Text
qtdetails.F1 = Text

Rebuild indexes on all the tables involved in the query. Run the query again and check time. It will decrease the execution time. I will update you soon with tuned query if I can.
Keep Querying!

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Slow ORDER BY in large table - sql

How else could it work? If it applied the order after selecting the 18 records you would get the top 18 records in the default order, which it would then sort. You might get better performance by inserting all the values in your IN statement into a temp table and then joining to the temp table.

it still has to sort 500,000 records to find your 18 so it will be alot slower, you can speed it up by adding indexes to your kategoriePoradi column in your tables

Related

Update Query that Lists Rows Not Updated

SQL query Optimisation JOIN multiple column

Optimize SQL query with many left join

Timeout running SQL query

Improvement in SQL Query - Complex Join / IN / Exists

Categories

Resources