Slow ORDER BY in a large table (SQL)
I have a problem with an ORDER BY clause. When I remove the ORDER BY from the following query, it finishes in 0.004 seconds.
If I keep it, the query runs very slowly: 56 seconds.
This is because MySQL first grabs all 500,000 records, then sorts them, and finally returns only the first 18 records.
How can I solve this ordering problem on a big table?
Thanks.
EXPLAIN output:
http://img444.imageshack.us/img444/9440/explain.png
SQL Query:
SELECT `_sd`.`sazbaDPHId`, `_sd`.`sazbaDPH`, `_sd`.`sazbaDPHProcent`, `_zk`.`zboziKategorieId`, `_zk`.`zboziId`,
`_zk`.`kategorieId`, `_zk`.`zboziKategoriePoradi`, `_k`.`kategorieId`, `_k`.`kategorieNazev`, `_k`.`kategorieCelyNazev`,
`_k`.`kategorieKod`, `_k`.`kategorieCesta`, `_k`.`kategoriePopis`, `_k`.`kategorieKeywords`, `_k`.`kategorieRodiceId`, `_k`.`kategoriePoradi`, `_k`.`kategorieSkryta`, `_k`.`kategorieVTopMenu`, `_v`.`vyrobceId`, `_v`.`vyrobceNazev`, `_v`.`vyrobceKod`,
`_v`.`vyrobceKoeficient`, `_tzvz`.`typZboziVlastnostZboziId`, `_tzvz`.`typZboziId`, `_tzvz`.`vlastnostZboziId`,
`_vzh`.`vlastnostZboziHodnotaId`, `_vzh`.`zboziId`, `_vzh`.`vlastnostZboziId`, `_vzh`.`vlastnostZboziHodnota`, `zvc`.`zboziVyslCenaId` AS `zvc_zboziVyslCenaId`, `zvc`.`zboziVyslCenaZboziId` AS `zvc_zboziVyslCenaZboziId`, `zvc`.`vyslCena` AS `zvc_vyslCena`,
`zvc`.`vyslCenaSDPH` AS `zvc_vyslCenaSDPH`, `this`.`zboziId`, `this`.`zboziNazev`, `this`.`zboziKod`, `this`.`zboziIdentifikator`,
`this`.`zboziPartNum`, `this`.`zboziEAN`, `this`.`zboziPopis`, `this`.`zboziOstatniParametry`, `this`.`zboziInterniInfo`,
`this`.`zboziProdejniCena`, `this`.`zboziAkcniCena`, `this`.`zboziSetovaCena`, `this`.`zboziMocCena`, `this`.`sazbaDPHId`,
`this`.`vyrobceId`, `this`.`typZboziId`, `this`.`stavZboziId`, `this`.`skladovaDostupnostId`, `this`.`zdrojCenId`, `this`.`zboziPHE`,
`this`.`zboziAutorskyPoplatek`, `this`.`zboziVahovyPoplatek`, `this`.`nemenitStavZbozi`
FROM `tbl_Zbozi` AS this
LEFT JOIN `reg_SazbaDPH` AS _sd ON this.sazbaDPHId = _sd.sazbaDPHId
LEFT JOIN `tbl_Zbozi_Kategorie` AS _zk ON this.zboziId = _zk.zboziId
LEFT JOIN `tbl_Kategorie` AS _k ON _zk.kategorieId = _k.kategorieId
LEFT JOIN `tbl_Vyrobce` AS _v ON this.vyrobceId = _v.vyrobceId
LEFT JOIN `tbl_TypZbozi_VlastnostZbozi` AS _tzvz ON this.typZboziId = _tzvz.typZboziId
LEFT JOIN `tbl_VlastnostZboziHodnota` AS _vzh ON this.zboziId = _vzh.zboziId AND _vzh.vlastnostZboziId = _tzvz.vlastnostZboziId
LEFT JOIN `tbl_Zbozi_VyslCena` AS zvc ON this.zboziId = zvc.zboziVyslCenaZboziId
WHERE _k.kategorieId IN (155317, 5570, 155445, 5706, 5707, 155429, 155430, 155431, 5708, 5709, 5710, 155427, 155426, 155428, 11413, 5713,
5714, 5715, 5716, 5717, 5718, 5719, 5720, 10245, 10253, 11253, 10834, 10269, 10249, 10246, 10247, 10248, 5723, 5725, 5726, 5727, 5728, 5729,
155319, 5815, 5816, 5817, 5818, 5819, 5822, 5824, 5832, 11406, 11411, 11410, 11409,
6069, 6070, 6072, 6073, 6075, 6078, 6086, 11414, 6185, 155433, 6186, 6187, 6188, 6190, 6191, 6193, 6198, 6199, 6200, 6201, 6202, 6203, 6207,
6209, 11442, 6210, 6211, 6212, 6215, 6216, 6217, 6218, 6219, 6220, 155366, 6221, 11339, 11340, 11341, 11359, 6222, 6223, 6224, 6225, 6226,
6227, 6228, 11099, 155376, 6231, 6232, 6233, 6234, 6235, 6236, 155391, 155392, 155437, 6237, 6238, 6241, 6243, 6244, 6245, 6246, 6247, 6248,
6249, 6250, 6251, 6252, 6253, 6254, 6256, 6257, 6258, 6259, 6260, 6261, 10839, 155362, 6262, 6263, 6264, 6265, 155361, 6267, 6269, 11390,
11346, 11112, 11394, 11397, 155393, 6270, 11436, 10292, 6271, 6272, 6275, 6277, 6278, 6279, 6280, 6281, 11348, 10288, 11113, 6283, 6284,
6285, 6287, 155494, 11114, 6292, 6293, 6294, 6295, 6296, 6297, 6298, 6300, 6301, 6302, 6303, 6304, 11116, 6305, 10781, 6306, 6307, 6308,
6309, 6310, 6311, 6313, 6314, 6315, 6316, 6317, 6318, 6327, 6328, 155451, 6333, 6334, 6335, 6337, 6340, 6342, 6343, 6344, 6345, 6346, 11344,
11389, 10289, 10291, 10302, 10303, 10304, 10294, 10306, 10300, 10305, 10293, 10299, 10298, 10290, 10296, 10297, 11454, 11100, 11101,
11117, 131475, 11402, 5680, 5684, 5685, 5686, 5687, 5688, 5689, 11383, 5702, 5703, 5704, 5705)
AND stavZboziId IN (2)
AND zvc.zboziVyslCenaSkupinaId = '8'
ORDER BY _k.kategoriePoradi ASC LIMIT 18
How else could it work? If it applied the ORDER BY after picking 18 records, you would get the first 18 records in whatever order the engine happened to read them, sorted among themselves, rather than the 18 records with the lowest kategoriePoradi.
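To make the point concrete, here is a small Python sketch (not from the thread) using hypothetical (kategorieId, kategoriePoradi) pairs. It shows that the 18 rows an ORDER BY ... LIMIT 18 must return depend on every row, that a bounded heap (`heapq.nsmallest`) can find them without fully sorting but still has to read all rows, and that limiting first and sorting afterwards gives a different answer.

```python
from heapq import nsmallest
from operator import itemgetter

# Hypothetical rows: (kategorieId, kategoriePoradi). 856 and 1009 are coprime,
# so every kategoriePoradi value below is distinct (no ties to worry about).
rows = [(i, (i * 856) % 1009) for i in range(1, 501)]
by_poradi = itemgetter(1)

# What ORDER BY kategoriePoradi LIMIT 18 must return: the 18 smallest keys of ALL rows.
correct = sorted(rows, key=by_poradi)[:18]

# A bounded heap avoids sorting everything, but it still reads every row.
top_18 = nsmallest(18, rows, key=by_poradi)

# Limiting first and sorting afterwards merely reorders the first 18 rows read.
limit_then_sort = sorted(rows[:18], key=by_poradi)

print(top_18 == correct)           # True
print(limit_then_sort == correct)  # False
```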
You might get better performance by inserting all the values in your IN list into a temp table and then joining to the temp table.
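A minimal sketch of the temp-table idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The table contents, the four ids, and the name `wanted_kategorie` are hypothetical. It only shows that both forms return the same rows; whether the join is actually faster has to be checked with EXPLAIN on the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Kategorie (kategorieId INTEGER PRIMARY KEY, kategoriePoradi INTEGER)")
# Hypothetical data: kategoriePoradi falls as kategorieId rises.
conn.executemany("INSERT INTO tbl_Kategorie VALUES (?, ?)", [(i, 100 - i) for i in range(1, 101)])

wanted_ids = [5, 17, 42, 99]  # stands in for the long IN (...) list

# Variant 1: the IN list, built from bound parameters.
placeholders = ", ".join("?" for _ in wanted_ids)
via_in = [row[0] for row in conn.execute(
    f"SELECT kategorieId FROM tbl_Kategorie WHERE kategorieId IN ({placeholders}) "
    "ORDER BY kategoriePoradi LIMIT 18",
    wanted_ids,
)]

# Variant 2: load the ids into a temporary table and join to it.
conn.execute("CREATE TEMP TABLE wanted_kategorie (kategorieId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_kategorie VALUES (?)", [(i,) for i in wanted_ids])
via_join = [row[0] for row in conn.execute(
    "SELECT k.kategorieId FROM tbl_Kategorie k "
    "JOIN wanted_kategorie w ON w.kategorieId = k.kategorieId "
    "ORDER BY k.kategoriePoradi LIMIT 18"
)]

print(via_in)    # [99, 42, 17, 5]
print(via_join)  # [99, 42, 17, 5]
```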
It still has to sort 500,000 records to find your 18, so it will be a lot slower. You can speed it up by adding an index on the kategoriePoradi column.
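A sketch, again with SQLite standing in for MySQL and hypothetical data, of how an index on kategoriePoradi changes the plan for an ORDER BY ... LIMIT query on a single table. Without the index the plan includes a temporary sort (`USE TEMP B-TREE FOR ORDER BY`); with it, rows can be read in index order. To keep the example simple, the selected columns are all contained in the index. With the multi-table joins and filters in the original query, the MySQL optimizer may still choose to sort, so check the real EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Kategorie (kategorieId INTEGER PRIMARY KEY, kategoriePoradi INTEGER)")
conn.executemany(
    "INSERT INTO tbl_Kategorie VALUES (?, ?)",
    [(i, (i * 37) % 1000) for i in range(1, 5001)],  # hypothetical sort keys
)

QUERY = "SELECT kategorieId, kategoriePoradi FROM tbl_Kategorie ORDER BY kategoriePoradi ASC LIMIT 18"


def plan(sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # full scan followed by a temporary sort
conn.execute("CREATE INDEX idx_kategorie_poradi ON tbl_Kategorie (kategoriePoradi)")
plan_after = plan(QUERY)   # rows come out of the index already in order

print(plan_before)
print(plan_after)
```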
Related
Update Query that Lists Rows Not Updated
I have a 12-million-row SQL Server table, and when I run the following query it shows approximately 11.6 million rows updated:
UPDATE [HCRIS]
SET [HCRIS].[ST_ABB] = dbo.[PRVDR_CHANGE].[state_abbreviation]
FROM [HCRIS]
INNER JOIN dbo.[PRVDR_CHANGE] ON [HCRIS].[PRVDR_NUM] = dbo.[PRVDR_CHANGE].[PRVDR_NUM]
WHERE [HCRIS].[PRVDR_NUM] = dbo.[PRVDR_CHANGE].[PRVDR_NUM];
I have checked for nulls and non-numeric values in a numeric column, but I would love to pinpoint the rows that were not updated. My intent is to update all rows. Thank you.
Use EXCEPT to find the rows that won't get updated: select all rows in the first query and the rows you would have updated in the second.
SELECT H.* FROM [HCRIS] H
EXCEPT
SELECT H.* FROM dbo.[HCRIS] H
INNER JOIN dbo.[PRVDR_CHANGE] C ON H.[PRVDR_NUM] = C.[PRVDR_NUM]
WHERE H.[PRVDR_NUM] = C.[PRVDR_NUM];
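A small sketch of the EXCEPT approach, run in SQLite (which also supports EXCEPT) instead of SQL Server, so the `dbo.` prefix and square brackets are dropped; the rows are hypothetical. It illustrates one plausible cause of unmatched rows: a NULL PRVDR_NUM can never satisfy the equality join, so that row is never updated. Note that EXCEPT also removes duplicate rows from its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HCRIS (PRVDR_NUM TEXT, ST_ABB TEXT);
CREATE TABLE PRVDR_CHANGE (PRVDR_NUM TEXT, state_abbreviation TEXT);
INSERT INTO HCRIS VALUES ('100', NULL), ('200', NULL), ('300', NULL), (NULL, NULL);
INSERT INTO PRVDR_CHANGE VALUES ('100', 'FL'), ('200', 'GA');
""")

# All rows, minus the rows the UPDATE's join would touch.
not_updated = conn.execute("""
    SELECT H.* FROM HCRIS H
    EXCEPT
    SELECT H.* FROM HCRIS H
    INNER JOIN PRVDR_CHANGE C ON H.PRVDR_NUM = C.PRVDR_NUM
    ORDER BY 1
""").fetchall()

print(not_updated)  # [(None, None), ('300', None)]
```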
SQL query Optimisation JOIN multiple column
I have two tables in Microsoft Access: T_DATAS (about 200,000 rows) and T_REAF (about 1,000 rows). T_DATAS has many columns (about 30) and T_REAF has about 10. I am not allowed to change those tables or to create other tables; I have to work with them as they are. Both tables have 6 columns in common. I need to join the tables on these 6 columns and select ALL the columns from T_DATAS plus the columns that are in T_REAF but not in T_DATAS. My query is:
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION LIKE B.REGION AND A.PAYS LIKE B.PAYS AND A.MARQUE LIKE B.MARQUE
AND A.MODELE LIKE B.MODELE AND A.CARROS LIKE B.CARROS AND A.SEGT LIKE B.SEGT
I get the result I need, but the query takes far too long (about 3 minutes). I know T_DATAS has a lot of rows (200,000), but I think 3 minutes is too long for this query. Could you please tell me what is wrong with it? Thanks a lot for your help.
Two steps for this. First, change the query to use = instead of LIKE. I'm not 100% sure this is necessary, but it can't hurt:
SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR
INTO FINALTABLE
FROM T_DATAS D
LEFT JOIN T_REAF R
ON D.REGION = R.REGION AND D.PAYS = R.PAYS AND D.MARQUE = R.MARQUE
AND D.MODELE = R.MODELE AND D.CARROS = R.CARROS AND D.SEGT = R.SEGT;
Second, create an index on T_REAF:
CREATE INDEX IDX_REAF_6 ON T_REAF(REGION, PAYS, MARQUE, MODELE, CARROS, SEGT);
MS Access can then use the index for the JOIN, speeding up the query. Note that I changed the table aliases to abbreviations of the table names, which makes the logic of the query easier to follow.
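A sketch of the indexed equality join, using SQLite's EXPLAIN QUERY PLAN as a stand-in for Access. The TEXT column types and the extra `VALEUR` column are assumptions made for the example. With `=` the planner can look up T_REAF rows through IDX_REAF_6; with `LIKE` against another column it cannot, because the right-hand side is a pattern rather than a plain value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
KEYS = ["REGION", "PAYS", "MARQUE", "MODELE", "CARROS", "SEGT"]
key_ddl = ", ".join(f"{c} TEXT" for c in KEYS)
conn.execute(f"CREATE TABLE T_DATAS ({key_ddl}, VALEUR REAL)")  # VALEUR is hypothetical
conn.execute(f"CREATE TABLE T_REAF ({key_ddl}, CARROS_NEW TEXT, SEGT_NEW TEXT, ATTR TEXT)")
conn.execute(f"CREATE INDEX IDX_REAF_6 ON T_REAF ({', '.join(KEYS)})")


def join_sql(op):
    """Build the six-column LEFT JOIN using the given comparison operator."""
    on = " AND ".join(f"D.{c} {op} R.{c}" for c in KEYS)
    return f"SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR FROM T_DATAS D LEFT JOIN T_REAF R ON {on}"


def plan(sql):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_eq = plan(join_sql("="))       # expected: SEARCH R USING INDEX IDX_REAF_6 (...)
plan_like = plan(join_sql("LIKE"))  # expected: a full scan of R for every D row
print(plan_eq)
print(plan_like)
```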
I assume those 6 columns also have the same data types. Note that the equals (=) operator is a comparison operator that compares two values for equality, while LIKE performs pattern matching. So replace LIKE with = in your query and check the run time:
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION = B.REGION AND A.PAYS = B.PAYS AND A.MARQUE = B.MARQUE
AND A.MODELE = B.MODELE AND A.CARROS = B.CARROS AND A.SEGT = B.SEGT
Optimize SQL query with many left join
I have a SQL query with many left joins:
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID
WHERE 1 = 1
AND po.CUST_APP_NO LIKE '%100010233976%'
AND 1 = 1
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
The performance is poor, and I would like to optimize the query. Any suggestions, please?
Try this one:
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE po.CUST_APP_NO LIKE '%100010233976%'
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
First, check your indexes. Are they old? Are they fragmented? Do they need rebuilding?
Then check your execution plan (how you view it depends on the SQL engine): are all joins properly understood? Are some of them 'out of order'? Do some of them transfer too much data?
Then check your plan and indexes together: are all important columns covered? Are there any outstandingly long table scans or joins? Are the columns in the indexes in the same order as they are used in the query?
Then revise your query:
- Can you extract parts that would quickly generate a small rowset?
- Can you add columns to indexes so that the join/filter expressions are covered?
- Or reorder index columns so they match the query better?
And, supporting the solution from @Devart: can you eliminate some tables along the way? Does the WHERE clause touch the other tables at all? Does the data in the other tables significantly change the count? If neither the SELECT nor the WHERE touches columns from the other joined tables, and if the exact COUNT value is not that important (i.e. you only care whether matching T_PROPOSAL_INFO rows exist), then you might remove all the joins completely, as Devart suggested. LEFT JOINs never reduce the number of rows; they only copy/expand/multiply them.
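A minimal SQLite sketch of the last point, with hypothetical rows and only two of the joined tables. A LEFT JOIN to a one-to-many table multiplies rows, but because no filter references the joined tables, COUNT(DISTINCT po.O_ID) comes out the same with or without the joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_PROPOSAL_INFO (O_ID INTEGER PRIMARY KEY, CUST_APP_NO TEXT,
                              IS_STP INTEGER, PIV_UW_STATUS_FK INTEGER, Plan_Type_Fk INTEGER);
CREATE TABLE T_PLAN_TYPE (plan_type_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE T_MEMBER_INFO (id INTEGER PRIMARY KEY, proposal_info_fk INTEGER);
INSERT INTO T_PROPOSAL_INFO VALUES
    (1, 'A100010233976', 0, 1, 1),
    (2, 'B100010233976', 0, 1, 99),  -- plan type 99 does not exist
    (3, 'C100010233976', 1, 1, 1),   -- excluded by IS_STP <> 1
    (4, 'unrelated', 0, 1, 1);       -- excluded by the LIKE filter
INSERT INTO T_PLAN_TYPE VALUES (1, 'basic');
INSERT INTO T_MEMBER_INFO (proposal_info_fk) VALUES (1), (1), (1), (2);
""")

FILTER = ("po.CUST_APP_NO LIKE '%100010233976%' "
          "AND po.IS_STP <> 1 AND po.PIV_UW_STATUS_FK != 10")
JOINED = """FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID"""

# Raw row count after the joins: proposal 1 fans out to 3 rows, proposal 2 to 1.
joined_rows = conn.execute(f"SELECT COUNT(*) {JOINED} WHERE {FILTER}").fetchone()[0]
with_joins = conn.execute(f"SELECT COUNT(DISTINCT po.O_ID) {JOINED} WHERE {FILTER}").fetchone()[0]
without_joins = conn.execute(
    f"SELECT COUNT(DISTINCT po.O_ID) FROM T_PROPOSAL_INFO po WHERE {FILTER}"
).fetchone()[0]

print(joined_rows, with_joins, without_joins)  # 4 2 2
```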
Timeout running SQL query
I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The query (generated by Django) that fails is below. I've tried running it directly in SQL Server Management Studio and it works, but it takes 3.5 minutes.
It does look like it's aggregating over a bunch of fields it doesn't need to, but I wouldn't have thought that should make it take that long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?
SELECT [auth_user].[id], [auth_user].[password], [auth_user].[last_login], [auth_user].[is_superuser], [auth_user].[username], [auth_user].[first_name], [auth_user].[last_name], [auth_user].[email], [auth_user].[is_staff], [auth_user].[is_active], [auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM [auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY [auth_user].[id], [auth_user].[password], [auth_user].[last_login], [auth_user].[is_superuser], [auth_user].[username], [auth_user].[first_name], [auth_user].[last_name], [auth_user].[email], [auth_user].[is_staff], [auth_user].[is_active], [auth_user].[date_joined]
HAVING (COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0)
EDIT: Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2: After a bit of experimentation, I've found that the following query is the smallest one that still runs slowly:
SELECT COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM [auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY [auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3: The Python code that generated this is:
User.objects.annotate(
    Count('tickets_captured'),
    Count('assigned_tickets'),
    Count('tickets_watched')
)
A look at the execution plan shows that SQL Server first does a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct values instead of just counting; see "Django annotate() multiple times causes wrong answers".
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 = 24 rows, one for each combination of tickets. The DISTINCT part removes all the duplicates.
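A small SQLite sketch reproducing the fan-out with hypothetical data: one user with 2 captured, 3 assigned, and 4 watched tickets. Plain COUNT sees the 2*3*4 = 24 combined rows in every column, while COUNT(DISTINCT ...) on each joined table's key recovers 2, 3 and 4. It counts the watchers' own `id` rather than `ticket_id`, so that two watch rows for the same ticket would still count separately; that choice is an assumption about the intended meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');
-- alice (id 1) captured tickets 1-2 and is responsible for tickets 3-5
INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
-- alice watches four tickets
INSERT INTO tickets_ticket_watchers (ticket_id, user_id) VALUES (1, 1), (2, 1), (3, 1), (4, 1);
""")

JOINED = """
FROM auth_user
LEFT JOIN tickets_ticket t1 ON auth_user.id = t1.capturer_id
LEFT JOIN tickets_ticket t3 ON auth_user.id = t3.responsible_id
LEFT JOIN tickets_ticket_watchers w ON auth_user.id = w.user_id
WHERE auth_user.id = 1
GROUP BY auth_user.id"""

# Same shape as the Django-generated query: every COUNT sees 2 * 3 * 4 = 24 rows.
naive = conn.execute("SELECT COUNT(t1.id), COUNT(t3.id), COUNT(w.id)" + JOINED).fetchone()

# Counting distinct keys per joined table undoes the multiplication.
deduplicated = conn.execute(
    "SELECT COUNT(DISTINCT t1.id), COUNT(DISTINCT t3.id), COUNT(DISTINCT w.id)" + JOINED
).fetchone()

print(naive)         # (24, 24, 24)
print(deduplicated)  # (2, 3, 4)
```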
What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM auth_user
LEFT JOIN (
  SELECT capturer_id, COUNT(*) AS tickets_captured__count
  FROM tickets_ticket
  GROUP BY capturer_id
) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN (
  SELECT responsible_id, COUNT(*) AS assigned_tickets__count
  FROM tickets_ticket
  GROUP BY responsible_id
) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN (
  SELECT user_id, COUNT(*) AS tickets_watched__count
  FROM tickets_ticket_watchers
  GROUP BY user_id
) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count IS NOT NULL OR C2.assigned_tickets__count IS NOT NULL -- also works (I think with better performance)
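The pre-aggregated version run against hypothetical data in SQLite (instead of SQL Server), selecting `username` instead of `auth_user.*` to keep the output short. Each derived table is grouped before the join, so every user joins to at most one row per derived table and no cross product forms; a user with no captured or assigned tickets is filtered out by the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
-- alice captured tickets 1-2; bob captured tickets 3-5 and is responsible for 1-2
INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
INSERT INTO tickets_ticket_watchers (ticket_id, user_id) VALUES (1, 1), (2, 1), (3, 1), (4, 1);
""")

rows = conn.execute("""
SELECT auth_user.username,
       C1.tickets_captured__count,
       C2.assigned_tickets__count,
       C3.tickets_watched__count
FROM auth_user
LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
           FROM tickets_ticket GROUP BY capturer_id) AS C1
       ON auth_user.id = C1.capturer_id
LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
           FROM tickets_ticket GROUP BY responsible_id) AS C2
       ON auth_user.id = C2.responsible_id
LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
           FROM tickets_ticket_watchers GROUP BY user_id) AS C3
       ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
ORDER BY auth_user.id
""").fetchall()

print(rows)  # [('alice', 2, 3, 4), ('bob', 3, 2, None)]
```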
Improvement in SQL Query - Complex Join / IN / Exists
Looking for suggestions on speeding up this query in Access 2007. The inner query takes a few minutes; the full query takes very long (40 to 80 minutes). The result is as expected. Everything is indexed.
SELECT qtdetails.F5, qtdetails.F16, ExpectedResult.DLID, ExpectedResult.NumRows
FROM qtdetails
INNER JOIN (INVDL INNER JOIN ExpectedResult ON INVDL.DLID = ExpectedResult.DLID)
ON (qtdetails.F1 = INVDL.RegionCode) AND (qtdetails.RoundTotal = ExpectedResult.RoundTotal)
WHERE (qtdetails.F5 IN (
  SELECT qtdetails.F5
  FROM (ExpectedResult INNER JOIN INVDL ON ExpectedResult.DLID = INVDL.DLID)
  INNER JOIN qtdetails ON (INVDL.RegionCode = qtdetails.F1) AND (ExpectedResult.RoundTotal = qtdetails.RoundTotal)
  GROUP BY qtdetails.F5
  HAVING (((COUNT(ExpectedResult.DLID)) < 2))
));
Table sizes:
INVDL - 80,000 records
ExpectedResult - ten million records
qtdetails - 12,000 records
The inner query returns around 5,000 to 8,000 records. I tried saving the results of the inner query in a table and then using SELECT F5 FROM qTempTable instead, but it still takes a lot of time. Any help would be very highly appreciated.
Data types:
qtdetails.F5 = Number
qtdetails.F16 = Text
ExpectedResult.NumRows = Number
INVDL.DLID = Number
ExpectedResult.DLID = Number
INVDL.RegionCode = Text
qtdetails.F1 = Text
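One commonly suggested alternative, not taken from this thread, is to join to the grouped F5 list instead of filtering with IN (...). A minimal SQLite sketch with hypothetical rows shows that the two forms return the same result, since F5 is unique in the grouped derived table and the join cannot duplicate rows; it says nothing about which form Access will run faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE INVDL (DLID INTEGER, RegionCode TEXT);
CREATE TABLE ExpectedResult (DLID INTEGER, RoundTotal INTEGER, NumRows INTEGER);
CREATE TABLE qtdetails (F1 TEXT, F5 INTEGER, F16 TEXT, RoundTotal INTEGER);
INSERT INTO INVDL VALUES (1, 'R1'), (2, 'R1'), (3, 'R2');
INSERT INTO ExpectedResult VALUES (1, 100, 5), (2, 100, 6), (3, 200, 7);
-- F5 = 10 matches two DLIDs, F5 = 20 matches one, F5 = 30 matches none
INSERT INTO qtdetails VALUES ('R1', 10, 'a', 100), ('R2', 20, 'b', 200), ('R2', 30, 'c', 999);
""")

# The three-way join used both in the outer query and in the subquery.
MATCHES = """qtdetails q
JOIN INVDL i ON q.F1 = i.RegionCode
JOIN ExpectedResult e ON e.DLID = i.DLID AND e.RoundTotal = q.RoundTotal"""

# F5 values that match fewer than two DLIDs (the HAVING COUNT(...) < 2 part).
SINGLE_F5 = f"SELECT q.F5 FROM {MATCHES} GROUP BY q.F5 HAVING COUNT(e.DLID) < 2"

# Shape of the original query: filter with IN (subquery).
with_in = conn.execute(
    f"SELECT q.F5, q.F16, e.DLID, e.NumRows FROM {MATCHES} "
    f"WHERE q.F5 IN ({SINGLE_F5}) ORDER BY q.F5, e.DLID"
).fetchall()

# Alternative: join to the grouped list instead of using IN.
with_join = conn.execute(
    f"SELECT q.F5, q.F16, e.DLID, e.NumRows FROM {MATCHES} "
    f"JOIN ({SINGLE_F5}) AS one_match ON one_match.F5 = q.F5 ORDER BY q.F5, e.DLID"
).fetchall()

print(with_in)    # [(20, 'b', 3, 7)]
print(with_join)  # [(20, 'b', 3, 7)]
```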
Rebuild the indexes on all the tables involved in the query, then run the query again and check the time. It may reduce the execution time. I will update you soon with a tuned query if I can. Keep querying!