Slow Join on Composite Primary Key - sql

I have the following query, its runtime is ~2 seconds until I join ProductStores on StoreID which increases it to ~3 minutes, joining only on ProductID keeps it at ~2 seconds.
SELECT
Enabled = pp.PspEnabled
, StockStatusID = ss.ID
, WebSellable = pp.PspWebSellable
, CSSellable = pp.PspCsSellable
FROM
#ExternalProducts pp
JOIN
Product p ON p.ExternalCode = pp.code
JOIN
Stores s ON s.Name = pp.p_externalStore
JOIN
StockStatus ss ON ss.Name = pp.PspStockStatus
JOIN
ProductStores ps ON (/* Store join increases time only */ ps.StoreID = s.ID AND ps.ProductID = p.ID)
Rows:
Stores: 108
Product: 136'598
ProductStores: 609'963
Keys:
CONSTRAINT [PK_dbo.Stores]
PRIMARY KEY CLUSTERED ([ID] ASC)
CONSTRAINT [PK_dbo.Product]
PRIMARY KEY CLUSTERED ([ID] ASC)
CONSTRAINT [PK_dbo.ProductStores]
PRIMARY KEY CLUSTERED ([ProductID] ASC, [StoreID] ASC)
CONSTRAINT [FK_dbo.ProductStores_dbo.Stores_SiteID]
FOREIGN KEY([StoreID]) REFERENCES [dbo].[Stores] ([ID])
CONSTRAINT [FK_dbo.ProductStores_dbo.Product_ProductID]
FOREIGN KEY([ProductID]) REFERENCES [dbo].[Product] ([ID])
Execution Plan:
The execution plan shows the bulk cost is coming from a Hash Match (Inner Join) with a Hash Keys Probe [dbo].[Stores].Name and Hash Keys Build [#ExternalProducts].p_externalstore which I assume is the problem but I'm not sure how to interpret this?
Any help is greatly appreciated!

Denis Rubashkin noted that the estimated and actual row numbers are actually very different. After the UPDATE STATISTICS failed to modify the execution plan I reorganised the query to pull from ProductStores and filter from #ExternalProducts instead which allowed me to force a Nested Loop Join (which I assumed would be preferable on smaller result sets).
SELECT
Enabled = pp.PspEnabled
, StockStatusID = ss.ID
, WebSellable = pp.PspWebSellable
, CSSellable = pp.PspCsSellable
FROM ProductStores ps
JOIN Product p ON p.ID = ps.ProductID
JOIN Stores s ON s.ID = ps.StoreID
INNER LOOP JOIN #ExternalProducts pp ON (p.Code = pp.Code AND s.Name = pp.p_externalstore)
JOIN StockStatus ss ON ss.Name = pp.PspStockStatus
This reduced the query time to ~7 seconds from ~3 minutes which is acceptable for this purpose!

Related

JOIN query optimization using indexes

The question is: How to increase the speed of this query?
SELECT
c.CategoryName,
sc.SubcategoryName,
pm.ProductModel,
COUNT(p.ProductID) AS ModelCount
FROM Marketing.ProductModel pm
JOIN Marketing.Product p
ON p.ProductModelID = pm.ProductModelID
JOIN Marketing.Subcategory sc
ON sc.SubcategoryID = p.SubcategoryID
JOIN Marketing.Category c
ON c.CategoryID = sc.CategoryID
GROUP BY c.CategoryName,
sc.SubcategoryName,
pm.ProductModel
HAVING COUNT(p.ProductID) > 1
Schema:
I tried creating some indexes and reorganizing the order of the JOINs. This did not increase productivity in the least. Maybe I need other indexes or a different query?
My solution:
CREATE INDEX idx_Marketing_Subcategory_IDandName ON Marketing.Subcategory (CategoryID)
CREATE INDEX idx_Marketing_Product_PMID ON Marketing.Product (ProductModelID)
CREATE INDEX idx_Marketing_Product_SCID ON Marketing.Product (SubcategoryID)
SELECT
c.CategoryName,
sc.SubcategoryName,
pm.ProductModel,
COUNT(p.ProductID) AS ModelCount
FROM Marketing.Category AS c
JOIN Marketing.Subcategory AS SC
ON c.CategoryID = SC.CategoryID
JOIN Marketing.Product AS P
ON SC.SubcategoryID = p.SubcategoryID
JOIN Marketing.ProductModel AS PM
ON P.ProductModelID = PM.ProductModelID
GROUP BY c.CategoryName,
sc.SubcategoryName,
pm.ProductModel
HAVING COUNT(p.ProductID) > 1
UPD:
Results:
Plan with my indexes:
Plan
Your query has a cost of 0.12 which is trivial, as is the number of rows, it executes in microseconds, row esitmates are also reasonably close so it's not clear what the problem is you are trying to solve.
Looking at the execution plan there is a key lookup for ProductModelId with an estimated cost of 44% of the query, so you could eliminate this with a covering index by including the column in the index Product.idx_Marketing_Product_SCID
Create index idx_Marketing_Product_SCID on Marketing.Product (SubcategoryID)
include (ProductModelId) with(drop_existing=on)

How to diagnose slow/inconsistent SQL Server query?

Running Windows Server 2012, Hyper-V, SQL Server 2012 Active/Passive failover cluster w/two 8-processor, 60GB nodes, single instance, 300 databases. This query produces inconsistent results, running anywhere between 10 and 30 seconds.
DECLARE #OrgID BigInt = 780246
DECLARE #ActiveOnly Bit = 0
DECLARE #RestrictToOrgID Bit = 0;
WITH og (OrgID, GroupID) AS
(
SELECT ID, ID FROM Common.com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, og.GroupID FROM Common.com.Organizations o JOIN og ON og.OrgID = o.ParentID
)
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = #OrgID
AND (#ActiveOnly = 0 OR e.Active = 1)
AND (#RestrictToOrgID = 0 OR e.OrgID = #OrgID)
ORDER BY e.EntityName
Replacing the LEFT JOIN Addresses with JOIN Addresses
JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = #OrgID
AND (#ActiveOnly = 0 OR e.Active = 1)
AND (#RestrictToOrgID = 0 OR e.OrgID = #OrgID)
ORDER BY e.EntityName
or reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
reduces the execution time to about .5 seconds.
In addition, using SELECT DISTINCT and joining books.Entities to Vendors
SELECT DISTINCT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN Vendors v
JOIN books.Entities e ON v.ID = e.ID
ON e.OrgID = bo.ID OR (e.OrgID = po.ID AND v.DistrictWide = 1)
Reduces the time to about .75 seconds.
Summary
These conditions indicate there is some kind of resource limitation in the SQL Server instance that is causing these erratic results and I don't know how to go about diagnosing it. If I copy the offending database to my laptop running SQL Server 2012, the problem does not present. I can continue to change the SQL around and hope for the best but I would prefer to find a more definitive solution.
Any suggestions are appreciated.
Update 2/27/18
The execution plan for the unmodified query shows a Clustered Index Seek against the Addresses table as the problem.
Reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
replaced the Clustered Index Seek with a Clustered Index Scan to retrieve a.Fax and a Hash Match to join this value to the results.
The Addresses table primary key is created as follows:
ALTER TABLE dbo.Addresses
ADD CONSTRAINT PK_Addresses PRIMARY KEY CLUSTERED (ID ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON PRIMARY
This index is defragged and optimized, as needed, every day.
So far, I can find nothing helpful as to why the Clustered Index Seek adds so much time to the query.
Ok, as is so often the case, there was not one problem, but two problems. This is an example of where complex problem analysis can lead to the wrong conclusions.
The primary problem turned out to be the recursive CTE og which returns a pivot table giving the parent/child relationships between organizations. However, analysis of the execution plans appeared to indicate the problem was some kind of glitch in the optimizer related to the amount of data being returned from a left-joined table. This may be entirely the result of my inability to properly analyze an execution plan but there does appear to be some issue in how SQL Server 2012 SP4 creates an execution plan under these circumstances.
While far more significant on our production server, the problem with SQL Server's optimization of recursive CTE was apparent on both my localhost, running 2012 SP4, and staging server, running SP2. But it took further analysis and some guesswork to see it.
The Solution
I replaced the recursive CTE with a pivot table and added a trigger to the Organizations table to maintain it.
USE Common
GO
CREATE VIEW com.OrganizationGroupsCTE
AS
WITH cte (OrgID, GroupID) AS
(
SELECT ID, ID FROM com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, cte.GroupID FROM com.Organizations o JOIN cte ON cte.OrgID = o.ParentID
)
SELECT OrgID, GroupID FROM cte
GO
CREATE TABLE com.OrganizationGroups
(
OrgID BIGINT,
GroupID BIGINT
)
INSERT com.OrganizationGroups
SELECT OrgID, GroupID
FROM com.OrganizationGroupsCTE
GO
CREATE TRIGGER TR_OrganizationGroups ON com.Organizations AFTER INSERT,UPDATE,DELETE
AS
DELETE og
FROM com.OrganizationGroups og
JOIN deleted d ON d.ID IN (og.groupID, og.orgID);
INSERT com.OrganizationGroups
SELECT orgID, groupID
FROM inserted i
JOIN OrganizationGroupsCTE cte ON i.ID IN (cte.orgID, cte.groupID)
GO
After modifying the query to use the pivot table,
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM Common.com.OrganizationGroups og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = #OrgID
AND (#ActiveOnly = 0 OR e.Active = 1)
AND (#RestrictToOrgID = 0 OR e.OrgID = #OrgID)
ORDER BY e.EntityName
SQL Server performance was improved, and consistent, in all three environments. Problems on the production server have now been eliminated.

Getting very slow execution time for a postgresql query

I did the explain analyze for this query, it was giving 30ms, but if the data is more I will get an execution expired; Using PostgreSQL 10
For normal execution: https://explain.depesz.com/s/gSPP
For slow execution: https://explain.depesz.com/s/bQN2
SELECT inventory_histories.*, order_items.order_id as order_id FROM
"inventory_histories" LEFT JOIN order_items ON (order_items.id =
inventory_histories.reference_id AND inventory_histories.reference_type = 4)
WHERE "inventory_histories"."inventory_id" = 1313 AND
(inventory_histories.location_id = 15) ORDER BY inventory_histories.id DESC
LIMIT 10 OFFSET 0;
Indexes:
"inventory_histories_pkey" PRIMARY KEY, btree (id)
"inventory_histories_created_at_index" btree (created_at)
"inventory_histories_inventory_id_index" btree (inventory_id)
"inventory_histories_location_id_index" btree (location_id)
For this query:
SELECT ih.*, oi.order_id as order_id
FROM "inventory_histories" ih LEFT JOIN
order_items oi
ON oi.id = ih.reference_id AND
ih.reference_type = 4
WHERE ih."inventory_id" = 1313 AND
ih.location_id = 15
ORDER BY ih.id DESC
LIMIT 10 OFFSET 0;
For this query, you want composite indexes on inventory_histories(inventory_id, location_id, id, reference_id) and on order_items(id, reference_type, order_id).

Deadlock - Lock of a column whereas there is no data

I have a deadlock when I execute this stored procedure :
-- Delete transactions
delete from ADVICESEQUENCETRANSACTION
where ADVICESEQUENCETRANSACTION.id in (
select TR.id from ADVICESEQUENCETRANSACTION TR
inner join ACCOUNTDESCRIPTIONITEM IT on TR.ACCOUNTDESCRIPTIONITEMID = IT.id
inner join ACCOUNTDESCRIPTION ACC on IT.ACCOUNTDESCRIPTIONID = ACC.id
inner join RECOMMENDATIONDESCRIPTION RD on ACC.RECOMMENDATIONDESCRIPTIONID = RD.id
inner join RECOMMENDATION REC on REC.id = RD.RECOMMENDATIONID
inner join ADVICESEQUENCE ADV on ADV.id = REC.ADVICESEQUENCEID
where adv.Id = #AdviceSequenceId AND (#RecommendationState is NULL OR #RecommendationState=REC.[State])
);
Here is the schema of the table :
Here is the deadlock graph :
you can see the detail of the deadlock graph here
So, when I retrieve the associatedobjid of the ressource node, I identify that it's the primary key and an index of the table AdviceSequenceTransaction :
SELECT OBJECT_SCHEMA_NAME([object_id]), * ,
OBJECT_NAME([object_id])
FROM sys.partitions
WHERE partition_id = 72057595553120256 OR partition_id = 72057595553316864;
SELECT name FROM sys.indexes WHERE object_id = 31339176 and (index_id = 1 or index_id = 4)
PK_AdviceSequenceTransaction
IX_ADVICESEQUENCEID_ADVICE
As there is a relation on the table AdviceSequenceTransaction on the key ParentTransactionId and the key Primary key, I have created an index on the column ParentTransactionId.
And I have no more Deadlock. But the problem is I don't know exactly why there is no more deadlock :-/
Moreover, on the set of data to test it, there is no data in ParentTransactionId. All are NULL.
So, Even is there no data (null) in the ParentTransactionId, is there an access to the Primary key by SQL Server ???
An other thing is that I want to remove a join in the delete statement :
delete from ADVICESEQUENCETRANSACTION
where ADVICESEQUENCETRANSACTION.id in (
select TR.id from ADVICESEQUENCETRANSACTION TR
inner join ACCOUNTDESCRIPTIONITEM IT on TR.ACCOUNTDESCRIPTIONITEMID = IT.id
inner join ACCOUNTDESCRIPTION ACC on IT.ACCOUNTDESCRIPTIONID = ACC.id
inner join RECOMMENDATIONDESCRIPTION RD on ACC.RECOMMENDATIONDESCRIPTIONID = RD.id
inner join RECOMMENDATION REC on REC.id = RD.RECOMMENDATIONID
inner join ADVICESEQUENCE ADV on ADV.id = REC.ADVICESEQUENCEID
where adv.Id = #AdviceSequenceId AND (#RecommendationState is NULL OR #RecommendationState=REC.[State])
);
into :
delete from ADVICESEQUENCETRANSACTION
where ADVICESEQUENCETRANSACTION.id in (
select TR.id from ADVICESEQUENCETRANSACTION TR
inner join ACCOUNTDESCRIPTIONITEM IT on TR.ACCOUNTDESCRIPTIONITEMID = IT.id
inner join ACCOUNTDESCRIPTION ACC on IT.ACCOUNTDESCRIPTIONID = ACC.id
inner join RECOMMENDATIONDESCRIPTION RD on ACC.RECOMMENDATIONDESCRIPTIONID = RD.id
inner join RECOMMENDATION REC on REC.id = RD.RECOMMENDATIONID
where TR.AdviceSequenceId = #AdviceSequenceId AND (#RecommendationState is NULL OR #RecommendationState=REC.[State])
);
I removed the last join. But if I do this, I have again the deadlock ! And here again, I don't know why...
Thank you for your enlightment :)
Using a complex, compound join in your WHERE clause can often cause problems. SQL Server processes the clauses using the following Logical Processing Order (view here):
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Using views, or derived tables or views greatly reduces the number of iterative (TABLE) scans required to obtain the desired result because the emphasis in your query is better aligned to use the logical execution order. The FROM clause for each of the derived tables (or views) is executed first, limiting the result set passed to the ON cause, then the JOIN clause and so on, because you're passing your parameters to an "inside FROM" rather than the "outermost WHERE".
So your code could look something like this:
delete from (SELECT ADVICESEQUENCETRANSACTION
FROM (SELECT tr.id
FROM ADVICESEQUENCETRANSACTION WITH NOLOCK
WHERE AdviceSequenceId = #AdviceSequenceId
)TR
INNER JOIN (SELECT [NEXT COLUMN]
FROM [NEXT TABLE] WITH NOLOCK
WHERE COLUMN = [PARAM]
)B
ON TR.COL = B.COL
)ALIAS
WHERE [COLUMN] = COL.PARAM
);
and so on...
(I know the code isn't cut and paste usable, but it should serve to convey the general idea)
In this way, you're passing your parameters to "inner queries" first, pre-loading your limited result set (particularly if you should use views), and then working outward. Using Locking Hints everywhere appropriate will also help to prevent some of the problems you might otherwise encounter. This technique can also help make the execution plan more effective in helping you diagnose where your blocks are coming from, should you still have any.
one approach is to use WITH (nolock) on table ADVICESEQUENCETRANSACTION TR if you dont mind dirty reads.

sqlite performance issue: one index per table is somewhat painful

So here's my schema (give or take):
cmds.Add(#"CREATE TABLE [Services] ([Id] INTEGER PRIMARY KEY, [AssetId] INTEGER NULL, [Name] TEXT NOT NULL)");
cmds.Add(#"CREATE INDEX [IX_Services_AssetId] ON [Services] ([AssetId])");
cmds.Add(#"CREATE INDEX [IX_Services_Name] ON [Services] ([Name])");
cmds.Add(#"CREATE TABLE [Telemetry] ([Id] INTEGER PRIMARY KEY, [ServiceId] INTEGER NULL, [Name] TEXT NOT NULL)");
cmds.Add(#"CREATE INDEX [IX_Telemetry_ServiceId] ON [Telemetry] ([ServiceId])");
cmds.Add(#"CREATE INDEX [IX_Telemetry_Name] ON [Telemetry] ([Name])");
cmds.Add(#"CREATE TABLE [Events] ([Id] INTEGER PRIMARY KEY, [TelemetryId] INTEGER NOT NULL, [TimestampTicks] INTEGER NOT NULL, [Value] TEXT NOT NULL)");
cmds.Add(#"CREATE INDEX [IX_Events_TelemetryId] ON [Events] ([TelemetryId])");
cmds.Add(#"CREATE INDEX [IX_Events_TimestampTicks] ON [Events] ([TimestampTicks])");
And here's my queries with their strange timer results:
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
634678974004420000
CPU Time: user 0.296402 sys 0.374402
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 2;
634691940264680000
CPU Time: user 0.062400 sys 0.124801
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = +e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
634678974004420000
CPU Time: user 0.000000 sys 0.000000
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = +e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 2;
634691940264680000
CPU Time: user 0.265202 sys 0.078001
Now I can understand why adding the '+' might change the time, but why is it so inconsistent with the AssetId change? Is there some other index I should create for these MIN queries? There are 900000 rows in the Events table.
Query Plans (first with '+'):
0|0|0|SEARCH TABLE Events AS e USING INDEX IX_Events_TimestampTicks (~1 rows)
0|1|1|SEARCH TABLE Telemetry AS ss USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE Services AS s USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|0|2|SEARCH TABLE Services AS s USING COVERING INDEX IX_Services_AssetId (AssetId=?) (~1 rows)
0|1|1|SEARCH TABLE Telemetry AS ss USING COVERING INDEX IX_Telemetry_ServiceId (ServiceId=?) (~1 rows)
0|2|0|SEARCH TABLE Events AS e USING INDEX IX_Events_TelemetryId (TelemetryId=?) (~1 rows)
EDIT: In summary, given the tables above what indexes would you create if these were the only queries to ever be executed:
SELECT MIN/MAX(e.TimestampTicks) FROM Events e INNER JOIN Telemetry t ON t.ID = e.TelemetryID INNER JOIN Services s ON s.ID = t.ServiceID WHERE s.AssetID = #AssetId;
SELECT e1.* FROM Events e1 INNER JOIN Telemetry t1 ON t1.Id = e1.TelemetryId INNER JOIN Services s1 ON s1.Id = t1.ServiceId WHERE t1.Name = #TelemetryName AND s1.Name = #ServiceName;
SELECT * FROM Events e INNER JOIN Telemetry t ON t.Id = e.TelemetryId INNER JOIN Services s ON s.Id = t.ServiceId WHERE s.AssetId = #AssetId AND e.TimestampTicks >= #StartTimeTicks ORDER BY e.TimestampTicks LIMIT 1000;
SELECT e.Id, e.TelemetryId, e.TimestampTicks, e.Value FROM (
SELECT e2.Id AS [Id], MAX(e2.TimestampTicks) as [TimestampTicks]
FROM Events e2 INNER JOIN Telemetry t ON t.Id = e2.TelemetryId INNER JOIN Services s ON s.Id = t.ServiceId
WHERE s.AssetId = #AssetId AND e2.TimestampTicks <= #StartTimeTicks
GROUP BY e2.TelemetryId) AS grp
INNER JOIN Events e ON grp.Id = e.Id;
Brannon,
Regarding time differences with change of AssetID:
Perhaps you've already tried this, but have you run each query several times in succession? The memory caching of BOTH your operating system and sqlite will often make a second query much faster than the first run within a session. I would run a given query four times in a row, and see if the 2nd-4th runs are more consistent in timing.
Regarding use of the "+"
(For those who may not know, within a SELECT preceding a field with "+" gives sqlite a hint NOT to use that field's index in the query. May cause your query to miss results if sqlite has optimized the storage to keep the data ONLY in that index. Suspect this is deprecated.)
Have you run the ANALYZE command? It helps the sqlite optimizer quite a bit when making decisions.
http://sqlite.org/lang_analyze.html
Once your schema is stable and your tables are populated, you may only need to run it once -- no need to run it every day.
INDEXED BY
INDEXED BY is a feature the author discourages for typical use, but you might find it helpful in your evaluations.
http://www.sqlite.org/lang_indexedby.html
I'd be interested to know what you discover,
Donald Griggs, Columbia SC USA