So here's my schema (give or take):
cmds.Add(#"CREATE TABLE [Services] ([Id] INTEGER PRIMARY KEY, [AssetId] INTEGER NULL, [Name] TEXT NOT NULL)");
cmds.Add(#"CREATE INDEX [IX_Services_AssetId] ON [Services] ([AssetId])");
cmds.Add(#"CREATE INDEX [IX_Services_Name] ON [Services] ([Name])");
cmds.Add(#"CREATE TABLE [Telemetry] ([Id] INTEGER PRIMARY KEY, [ServiceId] INTEGER NULL, [Name] TEXT NOT NULL)");
cmds.Add(#"CREATE INDEX [IX_Telemetry_ServiceId] ON [Telemetry] ([ServiceId])");
cmds.Add(#"CREATE INDEX [IX_Telemetry_Name] ON [Telemetry] ([Name])");
cmds.Add(#"CREATE TABLE [Events] ([Id] INTEGER PRIMARY KEY, [TelemetryId] INTEGER NOT NULL, [TimestampTicks] INTEGER NOT NULL, [Value] TEXT NOT NULL)");
cmds.Add(#"CREATE INDEX [IX_Events_TelemetryId] ON [Events] ([TelemetryId])");
cmds.Add(#"CREATE INDEX [IX_Events_TimestampTicks] ON [Events] ([TimestampTicks])");
And here are my queries with their strange timer results:
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
634678974004420000
CPU Time: user 0.296402 sys 0.374402
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 2;
634691940264680000
CPU Time: user 0.062400 sys 0.124801
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = +e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
634678974004420000
CPU Time: user 0.000000 sys 0.000000
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = +e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 2;
634691940264680000
CPU Time: user 0.265202 sys 0.078001
Now I can understand why adding the '+' might change the time, but why is it so inconsistent with the AssetId change? Is there some other index I should create for these MIN queries? There are 900000 rows in the Events table.
Query plans (first with '+', then without):

0|0|0|SEARCH TABLE Events AS e USING INDEX IX_Events_TimestampTicks (~1 rows)
0|1|1|SEARCH TABLE Telemetry AS ss USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE Services AS s USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)

0|0|2|SEARCH TABLE Services AS s USING COVERING INDEX IX_Services_AssetId (AssetId=?) (~1 rows)
0|1|1|SEARCH TABLE Telemetry AS ss USING COVERING INDEX IX_Telemetry_ServiceId (ServiceId=?) (~1 rows)
0|2|0|SEARCH TABLE Events AS e USING INDEX IX_Events_TelemetryId (TelemetryId=?) (~1 rows)
EDIT: In summary, given the tables above, what indexes would you create if these were the only queries ever executed:
SELECT MIN/MAX(e.TimestampTicks) FROM Events e INNER JOIN Telemetry t ON t.ID = e.TelemetryID INNER JOIN Services s ON s.ID = t.ServiceID WHERE s.AssetID = @AssetId;
SELECT e1.* FROM Events e1 INNER JOIN Telemetry t1 ON t1.Id = e1.TelemetryId INNER JOIN Services s1 ON s1.Id = t1.ServiceId WHERE t1.Name = @TelemetryName AND s1.Name = @ServiceName;
SELECT * FROM Events e INNER JOIN Telemetry t ON t.Id = e.TelemetryId INNER JOIN Services s ON s.Id = t.ServiceId WHERE s.AssetId = @AssetId AND e.TimestampTicks >= @StartTimeTicks ORDER BY e.TimestampTicks LIMIT 1000;
SELECT e.Id, e.TelemetryId, e.TimestampTicks, e.Value FROM (
SELECT e2.Id AS [Id], MAX(e2.TimestampTicks) as [TimestampTicks]
FROM Events e2 INNER JOIN Telemetry t ON t.Id = e2.TelemetryId INNER JOIN Services s ON s.Id = t.ServiceId
WHERE s.AssetId = @AssetId AND e2.TimestampTicks <= @StartTimeTicks
GROUP BY e2.TelemetryId) AS grp
INNER JOIN Events e ON grp.Id = e.Id;
Brannon,
Regarding time differences with change of AssetID:
Perhaps you've already tried this, but have you run each query several times in succession? The memory caching of BOTH your operating system and sqlite will often make a second query much faster than the first run within a session. I would run a given query four times in a row, and see if the 2nd-4th runs are more consistent in timing.
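For example, in the sqlite3 shell (this is just your own query from above, repeated):

.timer on
SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
-- ...run the identical statement three more times and compare the CPU times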
Regarding use of the "+":
(For those who may not know: within a SELECT, preceding a field with "+" gives sqlite a hint NOT to use that field's index in the query. It may cause your query to miss results if sqlite has optimized the storage to keep the data ONLY in that index. I suspect this is deprecated.)
Have you run the ANALYZE command? It helps the sqlite optimizer quite a bit when making decisions.
http://sqlite.org/lang_analyze.html
Once your schema is stable and your tables are populated, you may only need to run it once -- no need to run it every day.
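For example, from the shell:

ANALYZE;
-- the gathered statistics land in sqlite_stat1, which the optimizer consults
SELECT * FROM sqlite_stat1;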
INDEXED BY
INDEXED BY is a feature the author discourages for typical use, but you might find it helpful in your evaluations.
http://www.sqlite.org/lang_indexedby.html
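Regarding indexes for the MIN/MAX queries themselves: one untested idea is a composite index pairing the join column with the timestamp, so sqlite can answer the MIN/MAX with index probes alone instead of visiting the Events table (the index name below is just a suggestion):

CREATE INDEX [IX_Events_TelemetryId_TimestampTicks]
    ON [Events] ([TelemetryId], [TimestampTicks]);

-- If the planner still picks another path, INDEXED BY forces the issue:
SELECT MIN(e.TimestampTicks)
FROM Events e INDEXED BY IX_Events_TelemetryId_TimestampTicks
INNER JOIN Telemetry t ON t.Id = e.TelemetryId
INNER JOIN Services s ON s.Id = t.ServiceId
WHERE s.AssetId = @AssetId;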
I'd be interested to know what you discover,
Donald Griggs, Columbia SC USA
Related
Running Windows Server 2012, Hyper-V, SQL Server 2012 Active/Passive failover cluster with two 8-processor, 60 GB nodes, single instance, 300 databases. This query produces inconsistent results, running anywhere between 10 and 30 seconds.
DECLARE @OrgID BigInt = 780246
DECLARE @ActiveOnly Bit = 0
DECLARE @RestrictToOrgID Bit = 0;
WITH og (OrgID, GroupID) AS
(
SELECT ID, ID FROM Common.com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, og.GroupID FROM Common.com.Organizations o JOIN og ON og.OrgID = o.ParentID
)
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
Replacing the LEFT JOIN Addresses with JOIN Addresses
JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
or reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
reduces the execution time to about 0.5 seconds.
In addition, using SELECT DISTINCT and joining books.Entities to Vendors
SELECT DISTINCT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN Vendors v
JOIN books.Entities e ON v.ID = e.ID
ON e.OrgID = bo.ID OR (e.OrgID = po.ID AND v.DistrictWide = 1)
reduces the time to about 0.75 seconds.
Summary
These conditions indicate there is some kind of resource limitation in the SQL Server instance that is causing these erratic results, and I don't know how to go about diagnosing it. If I copy the offending database to my laptop running SQL Server 2012, the problem does not present. I can continue to change the SQL around and hope for the best, but I would prefer to find a more definitive solution.
Any suggestions are appreciated.
Update 2/27/18
The execution plan for the unmodified query shows a Clustered Index Seek against the Addresses table as the problem.
Reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
replaced the Clustered Index Seek with a Clustered Index Scan to retrieve a.Fax and a Hash Match to join this value to the results.
The Addresses table primary key is created as follows:
ALTER TABLE dbo.Addresses
ADD CONSTRAINT PK_Addresses PRIMARY KEY CLUSTERED (ID ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
This index is defragged and optimized, as needed, every day.
So far, I can find nothing helpful as to why the Clustered Index Seek adds so much time to the query.
OK, as is so often the case, there was not one problem but two. This is an example of how complex problem analysis can lead to the wrong conclusions.
The primary problem turned out to be the recursive CTE og, which returns a pivot table giving the parent/child relationships between organizations. However, analysis of the execution plans appeared to indicate the problem was some kind of glitch in the optimizer related to the amount of data being returned from a left-joined table. This may be entirely the result of my inability to properly analyze an execution plan, but there does appear to be some issue in how SQL Server 2012 SP4 creates an execution plan under these circumstances.
While far more significant on our production server, the problem with SQL Server's optimization of recursive CTEs was apparent on both my localhost, running 2012 SP4, and our staging server, running SP2. But it took further analysis and some guesswork to see it.
The Solution
I replaced the recursive CTE with a pivot table and added a trigger to the Organizations table to maintain it.
USE Common
GO
CREATE VIEW com.OrganizationGroupsCTE
AS
WITH cte (OrgID, GroupID) AS
(
SELECT ID, ID FROM com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, cte.GroupID FROM com.Organizations o JOIN cte ON cte.OrgID = o.ParentID
)
SELECT OrgID, GroupID FROM cte
GO
CREATE TABLE com.OrganizationGroups
(
OrgID BIGINT,
GroupID BIGINT
)
INSERT com.OrganizationGroups
SELECT OrgID, GroupID
FROM com.OrganizationGroupsCTE
GO
CREATE TRIGGER TR_OrganizationGroups ON com.Organizations AFTER INSERT,UPDATE,DELETE
AS
DELETE og
FROM com.OrganizationGroups og
JOIN deleted d ON d.ID IN (og.groupID, og.orgID);
INSERT com.OrganizationGroups
SELECT orgID, groupID
FROM inserted i
JOIN com.OrganizationGroupsCTE cte ON i.ID IN (cte.orgID, cte.groupID)
GO
After modifying the query to use the pivot table,
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM Common.com.OrganizationGroups og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
SQL Server performance was improved, and consistent, in all three environments. Problems on the production server have now been eliminated.
We have hosted our database in Azure and are running stored procedures against it. The stored procedures had been running fine until last week, but suddenly started failing with connection timeouts.
Our database size is 14 GB, the stored procedures generally return 2k to 20k records, and we are using the S3 pricing tier (50 DTU) of Azure SQL DB.
What I found interesting is that the first time a stored procedure is executed it takes a long time, 2-3 minutes, and this is what causes the timeout. Later executions are fast (maybe it caches the execution plan).
Also, when I run it against the same DB with the same number of records on a machine with 8 GB of RAM running Win10, it completes in 15 seconds.
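One way to check the plan-caching theory, as a sketch (it assumes sufficient permissions to read the plan-cache DMVs, and uses the procedure name from below):

-- A plan with a low usecounts right after a slow first run is consistent
-- with compilation being the first-run cost
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%PRSP%';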
This is my stored procedure:
CREATE PROCEDURE [dbo].[PRSP]
@CompanyID INT,
@fromDate DATETIME,
@toDate DATETIME,
@ListMailboxId as MailboxIds Readonly,
@ListConversationType as ConversationTypes Readonly
AS
BEGIN
SET NOCOUNT ON;
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
[Conversation] as C
INNER JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER JOIN
[Message] AS M ON (M.ConversationID = C.ID)
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = C.MailboxID)
INNER JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
JOIN
@ListConversationType AS convType ON convType.ID = C.[ConversationType]
JOIN
@ListMailboxId AS mailboxIds ON mailboxIds.ID = Mb.ID
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
END
This is the execution plan:
Execution Plan
Please give your suggestions as to what improvements are needed. Also, do we need to upgrade our pricing tier?
Ideally, for a 14 GB DB, what should the Azure pricing tier be?
That query should take 1 to 3 seconds to complete on your Windows 10 8 GB RAM machine. It takes 15 seconds because SQL Server chose a poor execution plan. In this case, the root cause of the poor plan is bad estimates: several operators in the plan show a big difference between estimated and actual rows. For example, SQL Server estimated it would need to perform only one seek into the PK_Customer clustered index, but it performed 16,522 seeks. The same thing occurs with [ConversationHistory].[IX_ConversationID_CreatedOn_Activity_ByWhom] and with [Message].[IX_ConversationID_ID_ArrivalDt_From_RStatus_Type].
Here are some hints you could follow to improve the performance of the query:
Update statistics
Try OPTION (HASH JOIN) at the end of the query. It might improve performance or it might slow it down; it can even cause the query to error.
Store the table variable data in temporary tables and use those in the query (SELECT * INTO #temp_table FROM @table_variable). Table variables don't have statistics, which causes bad estimates; see the sketch after this list.
Identify the first operator where the difference between estimated and actual rows is large. Split the query. Query 1: SELECT * INTO #operator_result FROM (query equivalent to that operator). Query 2: rewrite the query using #operator_result. Because #operator_result is a temporary table, SQL Server is forced to reevaluate the estimates. In this case, the offending operator is the hash match (inner join).
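Here is a sketch of the table-variable hint, using the parameter names from the procedure above (the temp table names are just placeholders):

-- Copy the table-valued parameters into temp tables so the optimizer
-- gets real statistics instead of a fixed low-row guess
SELECT * INTO #MailboxIds FROM @ListMailboxId;
SELECT * INTO #ConversationTypes FROM @ListConversationType;
-- then JOIN #MailboxIds / #ConversationTypes in place of the variables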
There are other things you can do to improve the performance of this query:
Avoid key lookups. There are 16,522 key lookups into the Conversation.PK_dbo.Conversation clustered index. They can be avoided by creating an appropriate covering index. In this case, the covering index is the following:
DROP INDEX [IX_MailboxID] ON [dbo].[Conversation]
GO
CREATE INDEX IX_MailboxID ON [dbo].[Conversation](MailboxID)
INCLUDE (ArrivalDate, Status, ClosureDate, CustomerID, ConversationType)
Split OR predicate into UNION or UNION ALL. For example:
instead of:
SELECT *
FROM table
WHERE <predicate1> OR <predicate2>
use:
SELECT *
FROM table
WHERE <predicate1>
UNION
SELECT *
FROM table
WHERE <predicate2>
Sometimes it improves performance.
Apply each hint individually and measure performance.
EDIT: You can try the following and see if it improves performance:
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
@ListConversationType AS convType
INNER JOIN (
@ListMailboxId AS mailboxIds
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = mailboxIds.ID)
INNER JOIN
[Conversation] as C
ON C.MailboxID = Mb.ID
) ON convType.ID = C.[ConversationType]
INNER HASH JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
INNER HASH JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER HASH JOIN
[Message] AS M ON (M.ConversationID = C.ID)
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
And this:
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
@ListConversationType AS convType
INNER JOIN (
@ListMailboxId AS mailboxIds
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = mailboxIds.ID)
INNER JOIN
[Conversation] as C
ON C.MailboxID = Mb.ID
) ON convType.ID = C.[ConversationType]
INNER MERGE JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
INNER MERGE JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER MERGE JOIN
[Message] AS M ON (M.ConversationID = C.ID)
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
50 DTU is equivalent to 1/2 logical core.
See more: Using the Azure SQL Database DTU Calculator
I had the same issue this week: end users reported slowness in the application connected to the VM hosted in Azure. I also have almost the same setup (4 CPUs, 14 GB of RAM, and S3 but with 100 DTUs).
In my case, I had a lot of indexes with avg_fragmentation_in_percent greater than 30, and this caused poor performance when executing stored procedures.
Run this in SSMS; if the indexes of the tables your stored procedure runs against show up, you may want to take care of them:
SELECT dbschemas.[name] as 'Schema',
dbtables.[name] as 'Table',
dbindexes.[name] as 'Index',
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
WHERE indexstats.database_id = DB_ID()
AND indexstats.index_id = dbindexes.index_id
AND indexstats.avg_fragmentation_in_percent >30
--AND dbindexes.[name] like '%CLUSTER%'
ORDER BY indexstats.avg_fragmentation_in_percent DESC
More info here.
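If fragmented indexes do show up, the usual fix looks like this (the index and table names here are taken from elsewhere in this thread; substitute whatever the query above returns):

-- REORGANIZE is the lighter option, commonly used for 5-30% fragmentation
ALTER INDEX [IX_MailboxID] ON [dbo].[Conversation] REORGANIZE;
-- REBUILD is usually preferred above ~30%
ALTER INDEX [IX_MailboxID] ON [dbo].[Conversation] REBUILD;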
Edit:
Check also how old the statistics are:
SELECT
sys.objects.name AS table_name,
sys.indexes.name as index_name,
sys.indexes.type_desc as index_type,
stats_date(sys.indexes.object_id,sys.indexes.index_id)
as last_update_stats_date,
DATEDIFF(d,stats_date(sys.indexes.object_id,sys.indexes.index_id),getdate())
as stats_age_in_days
FROM
sys.indexes
INNER JOIN sys.objects on sys.indexes.object_id=sys.objects.object_id
WHERE
sys.objects.type = 'U'
AND
sys.indexes.index_id > 0
--AND sys.indexes.name Like '%CLUSTER%'
ORDER BY
stats_age_in_days DESC;
GO
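If the statistics turn out to be old, they can be refreshed for a single table or for the whole database (the table name is just an example from this thread):

UPDATE STATISTICS [dbo].[Conversation] WITH FULLSCAN; -- one table
EXEC sp_updatestats; -- all tables in the current database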
I have the following query. Its runtime is ~2 seconds until I join ProductStores on StoreID, which increases it to ~3 minutes; joining only on ProductID keeps it at ~2 seconds.
SELECT
Enabled = pp.PspEnabled
, StockStatusID = ss.ID
, WebSellable = pp.PspWebSellable
, CSSellable = pp.PspCsSellable
FROM
#ExternalProducts pp
JOIN
Product p ON p.ExternalCode = pp.code
JOIN
Stores s ON s.Name = pp.p_externalStore
JOIN
StockStatus ss ON ss.Name = pp.PspStockStatus
JOIN
ProductStores ps ON (/* Store join increases time only */ ps.StoreID = s.ID AND ps.ProductID = p.ID)
Rows:
Stores: 108
Product: 136'598
ProductStores: 609'963
Keys:
CONSTRAINT [PK_dbo.Stores]
PRIMARY KEY CLUSTERED ([ID] ASC)
CONSTRAINT [PK_dbo.Product]
PRIMARY KEY CLUSTERED ([ID] ASC)
CONSTRAINT [PK_dbo.ProductStores]
PRIMARY KEY CLUSTERED ([ProductID] ASC, [StoreID] ASC)
CONSTRAINT [FK_dbo.ProductStores_dbo.Stores_SiteID]
FOREIGN KEY([StoreID]) REFERENCES [dbo].[Stores] ([ID])
CONSTRAINT [FK_dbo.ProductStores_dbo.Product_ProductID]
FOREIGN KEY([ProductID]) REFERENCES [dbo].[Product] ([ID])
Execution Plan:
The execution plan shows the bulk of the cost coming from a Hash Match (Inner Join) with Hash Keys Probe [dbo].[Stores].Name and Hash Keys Build [#ExternalProducts].p_externalstore, which I assume is the problem, but I'm not sure how to interpret it.
Any help is greatly appreciated!
Denis Rubashkin noted that the estimated and actual row counts were very different. After UPDATE STATISTICS failed to change the execution plan, I reorganised the query to pull from ProductStores and filter by #ExternalProducts instead, which allowed me to force a Nested Loop Join (which I assumed would be preferable for the smaller result sets).
SELECT
Enabled = pp.PspEnabled
, StockStatusID = ss.ID
, WebSellable = pp.PspWebSellable
, CSSellable = pp.PspCsSellable
FROM ProductStores ps
JOIN Product p ON p.ID = ps.ProductID
JOIN Stores s ON s.ID = ps.StoreID
INNER LOOP JOIN #ExternalProducts pp ON (p.ExternalCode = pp.code AND s.Name = pp.p_externalstore)
JOIN StockStatus ss ON ss.Name = pp.PspStockStatus
This reduced the query time to ~7 seconds from ~3 minutes which is acceptable for this purpose!
I am running the following query:
SELECT fat.*
FROM Table1 fat
LEFT JOIN modo_captura mc ON mc.id = fat.modo_captura_id
INNER JOIN loja lj ON lj.id = fat.loja_id
INNER JOIN rede rd ON rd.id = fat.rede_id
INNER JOIN bandeira bd ON bd.id = fat.bandeira_id
INNER JOIN produto pd ON pd.id = fat.produto_id
INNER JOIN loja_extensao le ON le.id = fat.loja_extensao_id
INNER JOIN conta ct ON ct.id = fat.conta_id
INNER JOIN banco bc ON bc.id = ct.banco_id
LEFT JOIN conciliacao_vendas cv ON fat.empresa_id = cv.empresa_id AND cv.chavefato = fat.chavefato AND fat.rede_id = cv.rede_id
WHERE 1 = 1
AND cv.controle_upload_arquivo_id = 6906
AND fat.parcela = 1
ORDER BY fat.data_venda, fat.data_credito LIMIT 20
But it runs very slowly. Here is the EXPLAIN plan: http://explain.depesz.com/s/DnXH
Try this rewritten version:
SELECT fat.*
FROM Table1 fat
JOIN conciliacao_vendas cv USING (empresa_id, chavefato, rede_id)
JOIN loja lj ON lj.id = fat.loja_id
JOIN rede rd ON rd.id = fat.rede_id
JOIN bandeira bd ON bd.id = fat.bandeira_id
JOIN produto pd ON pd.id = fat.produto_id
JOIN loja_extensao le ON le.id = fat.loja_extensao_id
JOIN conta ct ON ct.id = fat.conta_id
JOIN banco bc ON bc.id = ct.banco_id
LEFT JOIN modo_captura mc ON mc.id = fat.modo_captura_id
WHERE cv.controle_upload_arquivo_id = 6906
AND fat.parcela = 1
ORDER BY fat.data_venda, fat.data_credito
LIMIT 20;
JOIN syntax and sequence of joins
In particular, I fixed the misleading LEFT JOIN to conciliacao_vendas, which is forced to act as a plain [INNER] JOIN by the later WHERE condition anyway. This should simplify query planning and allow rows to be eliminated earlier in the process, which should make everything a lot cheaper. Related answer with a detailed explanation:
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
USING is just a syntactical shorthand.
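That is, this:

JOIN conciliacao_vendas cv USING (empresa_id, chavefato, rede_id)

is equivalent to:

JOIN conciliacao_vendas cv ON cv.empresa_id = fat.empresa_id
                          AND cv.chavefato = fat.chavefato
                          AND cv.rede_id = fat.rede_id

except that USING folds each pair of join columns into a single output column.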
Since there are many tables involved in the query and the rewritten query joins the tables in what is now an optimal order, you can fine-tune this with SET LOCAL join_collapse_limit = 1 to save planning overhead and avoid inferior query plans. Run it in a single transaction:
BEGIN;
SET LOCAL join_collapse_limit = 1;
SELECT ...; -- read data here
COMMIT; -- or ROLLBACK;
More about that:
Sample Query to show Cardinality estimation error in PostgreSQL
The fine manual on Controlling the Planner with Explicit JOIN Clauses
Index
Add some indexes on lookup tables with lots of rows (not necessary for just a couple of dozen), in particular (taken from your query plan):
Seq Scan on public.conta ct ... rows=6771
Seq Scan on public.loja lj ... rows=1568
Seq Scan on public.loja_extensao le ... rows=16394
That's particularly odd, because those columns look like primary key columns and should already have an index ...
So:
CREATE INDEX conta_pkey_idx ON public.conta (id);
CREATE INDEX loja_pkey_idx ON public.loja (id);
CREATE INDEX loja_extensao_pkey_idx ON public.loja_extensao (id);
To make this really fast, a multicolumn index would be of great service:
CREATE INDEX foo ON Table1 (parcela, data_venda, data_credito);
I am stuck... A 'data' table with columns 'value' and 'datatype' is populated with engine load and vehicle speed, and each record is stamped with date, time, lat, and long. I want to query for engine load over 10% while the vehicle is moving (e.g. speed > 0). I can create a query to select the engine load, and I can create a query to select the vehicle speed, but how do I create a query that selects engine load when it is > 10% AND the vehicle is moving, where their date, time, lat, and long are equal?
The query below does not work, but it gives the gist of what I am trying to do. Can anyone help me create a query?
tables
TName: data
PK datakey
value
fk1 dataeventkey
fk2 datatypenamekey
TName: datatypename
PK datatypenamekey
datatypename
TName: dataevent
PK dataeventkey
datetime
lat
long
SELECT
d1.datetime
FROM
(data INNER JOIN datatypename ON data.datatypenamekey = datatypename.datatypenamekey
INNER JOIN dataevent ON dataevent.dataeventkey = data.dataeventkey) d1
WHERE
( d1.datatypename = "Engine Load [%]" AND d1.value > 10 )
INNER JOIN
SELECT
d2.datetime
FROM
(data INNER JOIN datatypename ON data.datatypenamekey = datatypename.datatypenamekey
INNER JOIN dataevent ON dataevent.dataeventkey = data.dataeventkey) d2
WHERE
( d2.datatypename = "Vehicle Speed [mph]" AND d2.value > 0 )
ON d1.datetime = d2.datetime
I'm not 100% sure I understand, but I think you just need to reference two instances of the same table. I'm making some assumptions based on your SQL, but here's a shot at it:
SELECT
engineLoad.dateTime
FROM
(
SELECT
d.datakey,
de.datetime
FROM
data d
INNER JOIN datatypename dt ON d.datatypenamekey = dt.datatypenamekey
INNER JOIN dataevent de ON de.dataeventkey = d.dataeventkey
WHERE
d.value > 10 AND
dt.datatypename = "Engine Load [%]"
) engineLoad
INNER JOIN
(
SELECT
d.datakey,
de.datetime
FROM
data d
INNER JOIN datatypename dt ON d.datatypenamekey = dt.datatypenamekey
INNER JOIN dataevent de ON de.dataeventkey = d.dataeventkey
WHERE
d.value > 0 AND
dt.datatypename = "Vehicle Speed [mph]"
) vehicleSpeed
ON engineLoad.dataKey = vehicleSpeed.dataKey -- might need to remove this line
AND engineLoad.datetime = vehicleSpeed.datetime
Edit
Looks like you need to reference datatypename twice as well? Edited the above, so try again.