Insert sorted data into a table with a nonclustered index - sql

My db schema:
Table point (point_id int PK, name varchar);
Table point_log (point_log_id int PK, point_id int FK, timestamp datetime, value int)
point_log has an index:
point_log_idx1 (point_id asc, timestamp asc)
I need to insert point log samples into the point_log table. Each transaction inserts log samples for only one point_id, and the samples are already sorted ascending by timestamp. That means all the log sample data in a transaction is already in the order of the index (point_log_idx1). How can I make SQL Server take advantage of this and avoid the tree search cost?

The tree search cost is probably negligible compared to the cost of physical writing to disk and page splitting and logging.
1) You should definitely insert data in bulk, rather than row by row.
2) To reduce page splitting of the point_log_idx1 index you can try to use ORDER BY in the INSERT statement. It still doesn't guarantee the physical order on disk, but it does guarantee the order in which point_log_id IDENTITY values are generated, and hopefully it hints SQL Server to process the source data in that order. If the source data is processed in the requested order, the b-tree structure of the point_log_idx1 index may grow without unnecessary, costly page splits.
I'm using SQL Server 2008. I have a system that collects a lot of monitoring data in a central database 24/7. Originally I was inserting data as it arrived, row by row. Then I realized that each insert was a separate transaction and most of the time the system was spent writing to the transaction log.
Eventually I moved to inserting data in batches using a stored procedure that accepts a table-valued parameter. In my case a batch is a few hundred to a few thousand rows. In my system I keep data only for a given number of days, so I regularly delete obsolete data. To keep the system performance stable I rebuild my indexes regularly as well.
In your example, it may look like the following.
First, create a table type:
CREATE TYPE [dbo].[PointValuesTableType] AS TABLE(
    point_id int,
    timestamp datetime,
    value int
)
Then the procedure would look like this:
CREATE PROCEDURE [dbo].[InsertPointValues]
    -- Add the parameters for the stored procedure here
    @ParamRows dbo.PointValuesTableType READONLY
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        INSERT INTO dbo.point_log
            (point_id
            ,timestamp
            ,value)
        SELECT
            TT.point_id
            ,TT.timestamp
            ,TT.value
        FROM @ParamRows AS TT
        ORDER BY TT.point_id, TT.timestamp;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
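For completeness, calling the procedure from T-SQL could look like the following sketch (the sample rows are made up, and it assumes point_id 1 exists in the point table; application code would typically fill the table-valued parameter from a DataTable or equivalent):
DECLARE @rows dbo.PointValuesTableType;

INSERT INTO @rows (point_id, timestamp, value)
VALUES (1, '2013-01-01T00:00:00', 10),
       (1, '2013-01-01T00:01:00', 12);

EXEC dbo.InsertPointValues @ParamRows = @rows;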
In practice you should measure for your system what is more efficient, with ORDER BY, or without.
You really need to consider performance of the INSERT operation as well as performance of subsequent queries.
It may be that faster inserts lead to higher fragmentation of the index, which leads to slower queries.
So, you should check the fragmentation of the index after INSERT with ORDER BY or without.
You can use sys.dm_db_index_physical_stats to get index stats.
Returns size and fragmentation information for the data and indexes of
the specified table or view in SQL Server.
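For example, a check of the indexes on point_log after a load might look like this (a sketch; adjust object names to your schema):
SELECT i.name, ps.index_type_desc, ps.avg_fragmentation_in_percent, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.point_log'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id;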

This looks like a good opportunity for changing the clustered index on Point_Log to cluster by its parent point_id Foreign Key:
CREATE TABLE Point_log
(
point_log_id int PRIMARY KEY NONCLUSTERED,
point_id int,
timestamp datetime,
value int
);
And then:
CREATE CLUSTERED INDEX C_PointLog ON dbo.Point_log(point_id);
Rationale: This will reduce the read IO on point_log when fetching point_log records for a given point_id.
Moreover, given that Sql Server will add a 4 byte uniquifier to a non-unique clustered index, you may as well include the Surrogate PK on the Cluster as well, to make it unique, viz:
CREATE UNIQUE CLUSTERED INDEX C_PointLog ON dbo.Point_log(point_id, point_log_id);
The nonclustered index point_log_idx1 (point_id asc, timestamp asc) would need to be retained if you have a large number of point_logs per point, and assuming good selectivity of queries filtering on point_log.point_id & point_log.timestamp.
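Putting the whole suggestion together, the DDL might look like this (a sketch; the IDENTITY and foreign-key details are assumptions to adapt to your schema):
CREATE TABLE dbo.Point_log
(
    point_log_id int IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    point_id     int NOT NULL REFERENCES dbo.point (point_id),
    timestamp    datetime NOT NULL,
    value        int NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX C_PointLog
    ON dbo.Point_log (point_id, point_log_id);

-- Retained for queries filtering on point_id and timestamp:
CREATE NONCLUSTERED INDEX point_log_idx1
    ON dbo.Point_log (point_id ASC, timestamp ASC);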

Related

Improve insert performance when checking existing rows

I have this simple query that inserts rows from one table(sn_users_main) into another(sn_users_history).
To make sure sn_users_history only has unique rows, it checks whether the column query_time already exists, and if it does, the row is not inserted. query_time is a kind of session identifier that is the same for every row in sn_users_main.
This works fine, but since sn_users_history is reaching 50k rows, this query takes more than 2 minutes to run, which is too much. Is there anything I can do to improve performance and get the same result?
INSERT INTO sn_users_history(query_time,user_id,sn_name,sn_email,sn_manager,sn_active,sn_updated_on,sn_last_Login_time,sn_is_vip,sn_created_on,sn_is_team_lead,sn_company,sn_department,sn_division,sn_role,sn_employee_profile,sn_location,sn_employee_type,sn_workstation) --- Columns of history table
SELECT snm.query_time,
snm.user_id,
snm.sn_name,
snm.sn_email,
snm.sn_manager,
snm.sn_active,
snm.sn_updated_on,
snm.sn_last_Login_time,
snm.sn_is_vip,
snm.sn_created_on,
snm.sn_is_team_lead,
snm.sn_company,
snm.sn_department,
snm.sn_division,
snm.sn_role,
snm.sn_employee_profile,
snm.sn_location,
snm.sn_employee_type,
snm.sn_workstation
---Columns of main table
FROM sn_users_main snm
WHERE NOT EXISTS(SELECT snh.query_time
FROM sn_users_history snh
WHERE snh.query_time = snm.query_time) -- Don't insert items into the history table if they already exist
I think you are missing an extra condition on user_id when you are inserting into the history table. You have to check the combination of user_id and query_time.
For your question, I think you are trying to reinvent the wheel. SQL Server already has temporal tables to support this kind of historical data. Read about SQL Server Temporal Tables.
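If your SQL Server version allows it (2016 or later), a minimal system-versioned temporal table sketch could look like this; the table and column names here are hypothetical stand-ins for the real sn_users_main columns:
CREATE TABLE dbo.sn_users_main_versioned
(
    user_id   int           NOT NULL PRIMARY KEY,
    sn_name   nvarchar(100) NULL,
    -- ...remaining sn_* columns...
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.sn_users_main_history));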
If you still want to continue with this approach, I would suggest you do it in batches:
Create a configuration table to hold the last processed query_time:
CREATE TABLE HistoryConfig(HistoryConfigId int, HistoryTableName SYSNAME,
lastProcessedQueryTime DATETIME)
Then you can do incremental historical inserts:
DECLARE @lastProcessedQueryTime DATETIME = (SELECT MAX(lastProcessedQueryTime) FROM HistoryConfig)
INSERT INTO sn_users_history(query_time,user_id,sn_name,sn_email,sn_manager,sn_active,sn_updated_on,sn_last_Login_time,sn_is_vip,sn_created_on,sn_is_team_lead,sn_company,sn_department,sn_division,sn_role,sn_employee_profile,sn_location,sn_employee_type,sn_workstation) --- Columns of history table
SELECT snm.query_time,
snm.user_id,
snm.sn_name,
snm.sn_email,
snm.sn_manager,
snm.sn_active,
snm.sn_updated_on,
snm.sn_last_Login_time,
snm.sn_is_vip,
snm.sn_created_on,
snm.sn_is_team_lead,
snm.sn_company,
snm.sn_department,
snm.sn_division,
snm.sn_role,
snm.sn_employee_profile,
snm.sn_location,
snm.sn_employee_type,
snm.sn_workstation
---Columns of main table
FROM sn_users_main snm
WHERE query_time > @lastProcessedQueryTime
Now you can update the configuration table again:
UPDATE HistoryConfig
SET lastProcessedQueryTime = (SELECT MAX(query_time) FROM sn_users_history)
WHERE HistoryTableName = 'sn_users_history'
I would suggest you create a clustered index on (user_id, query_time) in the history table if possible (otherwise a non-clustered index), which will improve the performance.
Other approaches you can think of:
Create a clustered index on (user_id, query_time) in the historical table, have (user_id, query_time) as the clustered index on the main table as well, and perform a MERGE operation (see the sketch below).
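A rough sketch of that MERGE, with the column list abbreviated for brevity and assuming (user_id, query_time) identifies a row:
MERGE sn_users_history AS tgt
USING sn_users_main AS src
    ON tgt.user_id = src.user_id
   AND tgt.query_time = src.query_time
WHEN NOT MATCHED BY TARGET THEN
    INSERT (query_time, user_id, sn_name /* ...remaining columns... */)
    VALUES (src.query_time, src.user_id, src.sn_name /* ...remaining columns... */);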

sql azure time consuming query

I have a table in Sql Azure contains about 6M rows.
I want to create a new index on it. The command looks like this:
CREATE NONCLUSTERED INDEX [INDEX1] ON [dbo].Table1
(
[Column1] ASC,
[Column2] ASC,
[Column3] ASC,
[Column4] ASC
)
INCLUDE ( [Column5],[Column6])
After about 15 minutes, an error occurs:
"Msg 10054, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the
server. (provider: TCP Provider, error: 0 - An existing connection was
forcibly closed by the remote host.)"
I tried several times, got the same error.
But I have executed other time-consuming queries, like:
Insert into table1(Col1,Col2,Col3) select Col1,Col2,Col3 from table2
Which took 20 minutes and returned successfully.
The queries were executed in the same Sql Azure DB. I don't know what's going on here. Could anyone help? Thanks!
I had the same problem with a table containing 100M rows and contacted Microsoft Support. This is the reply I got:
The reason why you can’t create the index on your table is that you
are facing a limitation on the platform that prevents to have
transactions larger than 2GB.
The creation of an index is a transactional operation that relies on
the transaction log to execute the move of the table pages. More rows
in a table means more pages to put in the T-Log. Since your table
contains 100 million of records (which is quite a big number), it is
easy for you to hit this limit.
In order to create the index we need to change the approach.
Basically we are going to use a temporary(staging) table to store the
data while you create the index on the source table, that you would
have previously cleared from data.
Action Plan:
1. Create a staging table identical to the original table but without any index (this makes the staging table a heap).
2. Move the data from the original table to the staging table (the insert is faster because the staging table is a heap).
3. Empty the original table.
4. Create the index on the original table (this time the transaction should be almost empty).
5. Move the data back from the staging table to the original table (this would take some time, as the table contains indexes).
6. Delete the staging table.
They suggest using BCP to move data between the staging table and the original table.
When looking in the event_log table...
select * from sys.event_log
where database_name ='<DBName>'
and event_type <> 'connection_successful'
order by start_time desc
.. I found this error message:
The session has been terminated because of excessive transaction log
space usage. Try modifying fewer rows in a single transaction.
Thanks for answering! Actually, I found the root cause as well.
There's a solution to it: set ONLINE = ON. In online mode the index creation is broken into multiple small transactions, so the T-Log won't exceed 2GB.
But there's a limitation: the included columns of the index creation command cannot be of unlimited size, like nvarchar(max); if they are, the command fails immediately.
So in SQL Azure, for an index creation operation like the following:
CREATE NONCLUSTERED INDEX [INDEX1] ON [dbo].Table1
(
[Column1] ASC,
[Column2] ASC,
[Column3] ASC,
[Column4] ASC
)
INCLUDE ( [Column5],[Column6])
take the following actions if the previous attempt failed:
1. Create the index using ONLINE = ON (see the example below).
2. If #1 failed, it means either Column5 or Column6 is nvarchar(max). Query the table size; if it is < 2GB, create the index directly using ONLINE = OFF.
3. If #2 failed, it means the table size is > 2GB; then there's no simple way to create the index without a temporary table involved, and you have to take the action ahkvk described.
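For reference, step 1 would be the same statement with the ONLINE option added (it will still fail immediately if an included column is nvarchar(max)):
CREATE NONCLUSTERED INDEX [INDEX1] ON [dbo].Table1
(
    [Column1] ASC,
    [Column2] ASC,
    [Column3] ASC,
    [Column4] ASC
)
INCLUDE ([Column5], [Column6])
WITH (ONLINE = ON);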

Implement a ring buffer

We have a table logging data. It is logging at say 15K rows per second.
Question: How would we limit the table size to the 1bn newest rows?
i.e. once 1bn rows is reached, it becomes a ring buffer, deleting the oldest row when adding the newest.
Triggers might load the system too much. Here's a trigger example on SO.
We are already using a bunch of tweaks to keep the speed up (such as stored procedures, table-valued parameters, etc.).
Edit (8 years on) :
My recent question/answer here addresses a similar issue using a time series database.
Unless there is something magic about 1 billion, I think you should consider other approaches.
The first that comes to mind is partitioning the data. Say, put one hour's worth of data into each partition. This will result in about 15,000*60*60 = 54 million records per partition. Once you have about 20 partitions (roughly a billion rows), you can drop the oldest partition each hour.
One big advantage of partitioning is that the insert performance should work well and you don't have to delete individual records. There can be additional overheads depending on the query load, indexes, and other factors. But, with no additional indexes and a query load that is primarily inserts, it should solve your problem better than trying to delete 15,000 records each second along with the inserts.
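For illustration, a minimal hourly sliding-window sketch might look like the following; the table, column names, and boundary values are hypothetical:
CREATE PARTITION FUNCTION pf_LogHour (datetime2)
    AS RANGE RIGHT FOR VALUES ('2024-01-01T00:00', '2024-01-01T01:00', '2024-01-01T02:00');

CREATE PARTITION SCHEME ps_LogHour
    AS PARTITION pf_LogHour ALL TO ([PRIMARY]);

CREATE TABLE dbo.LogData
(
    logged_at datetime2 NOT NULL,
    value     int       NOT NULL
) ON ps_LogHour (logged_at);

-- Every hour: add a boundary ahead of the incoming data, then drop the oldest partition.
ALTER PARTITION SCHEME ps_LogHour NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_LogHour() SPLIT RANGE ('2024-01-01T03:00');

TRUNCATE TABLE dbo.LogData WITH (PARTITIONS (1));   -- SQL Server 2016+; on older versions SWITCH partition 1 out to a staging table instead
ALTER PARTITION FUNCTION pf_LogHour() MERGE RANGE ('2024-01-01T00:00');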
I don't have a complete answer but hopefully some ideas to help you get started.
I would add some sort of numeric column to the table. This value would increment by 1 until it reached the number of rows you wanted to keep. At that point the procedure would switch to update statements, overwriting the previous row instead of inserting new ones. You obviously won't be able to use this column to determine the order of the rows, so if you don't already I would also add a timestamp column so you can order them chronologically later.
In order to coordinate the counter value across transactions you could use a sequence, then perform a modulo division to get the counter value.
In order to handle any gaps in the table (e.g. someone deleted some of the rows) you may want to use a merge statement. This should perform an insert if the row is missing or an update if it exists.
Hope this helps.
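As a rough sketch of the idea (the table, column, and sequence names are made up, and SQL Server 2012+ is assumed for the sequence):
CREATE SEQUENCE dbo.RingCounter AS bigint START WITH 0 INCREMENT BY 1;

DECLARE @BufferSize bigint = 1000000000;   -- number of rows to keep
DECLARE @next bigint, @slot bigint;
SET @next = NEXT VALUE FOR dbo.RingCounter;
SET @slot = @next % @BufferSize;

-- Insert the row if its slot is empty, otherwise overwrite the oldest row in that slot.
MERGE dbo.RingLog AS tgt
USING (SELECT @slot AS slot) AS src
    ON tgt.slot = src.slot
WHEN MATCHED THEN
    UPDATE SET tgt.logged_at = SYSDATETIME(), tgt.message = 'payload'
WHEN NOT MATCHED THEN
    INSERT (slot, logged_at, message) VALUES (src.slot, SYSDATETIME(), 'payload');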
Here's my suggestion:
Pre-populate the table with 1,000,000,000 rows, including a row number as the primary key.
Instead of inserting new rows, have the logger keep a counter variable that increments each time, and update the appropriate row according to the row number.
This is actually what you would do with a ring buffer in other contexts. You wouldn't keep allocating memory and deleting; you'd just overwrite the same array over and over.
Update: the update doesn't actually change the data in place, as I thought it did. So this may not be efficient.
Just an idea that is too complicated to write in a comment.
Create a few log tables, 3 as an example, Log1, Log2, Log3
CREATE TABLE Log1 (
Id int NOT NULL
CHECK (Id BETWEEN 0 AND 9)
,Message varchar(10) NOT NULL
,CONSTRAINT [PK_Log1] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
)
CREATE TABLE Log2 (
Id int NOT NULL
CHECK (Id BETWEEN 10 AND 19)
,Message varchar(10) NOT NULL
,CONSTRAINT [PK_Log2] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
)
CREATE TABLE Log3 (
Id int NOT NULL
CHECK (Id BETWEEN 20 AND 29)
,Message varchar(10) NOT NULL
,CONSTRAINT [PK_Log3] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
)
Then create a partitioned view
CREATE VIEW LogView AS (
SELECT * FROM Log1
UNION ALL
SELECT * FROM Log2
UNION ALL
SELECT * FROM Log3
)
If you are on SQL2012 you can use a sequence
CREATE SEQUENCE LogSequence AS int
START WITH 0
INCREMENT BY 1
MINVALUE 0
MAXVALUE 29
CYCLE
;
And then start to insert values
INSERT INTO LogView (Id, Message)
SELECT NEXT VALUE FOR LogSequence
,'SomeMessage'
Now you just have to truncate the log tables on some kind of schedule.
If you don't have SQL 2012, you need to generate the sequence some other way.
I'm looking for something similar myself (using a table as a circular buffer) but it seems like a simpler approach (for me) will be just to periodically delete old entries (e.g. the lowest IDs or lowest create/lastmodified datetimes or entries over a certain age). It's not a circular buffer but perhaps it is a close enough approximation for some. ;)

Is it possible to add an index to a temp table? And what's the difference between create #t and declare @t

I need to do a very complex query.
At one point, this query must have a join to a view that cannot be indexed unfortunately.
This view is also a complex view joining big tables.
View's output can be simplified as this:
PID (int), Kind (int), Date (date), D1,D2..DN
where the PID, Date and Kind fields are not unique (there may be more than one row having the same combination of PID, Kind, Date), but they are the fields that will be used in joins like this:
left join ComplexView mkcs on mkcs.PID=q4.PersonID and mkcs.Date=q4.date and mkcs.Kind=1
left join ComplexView mkcl on mkcl.PID=q4.PersonID and mkcl.Date=q4.date and mkcl.Kind=2
left join ComplexView mkco on mkco.PID=q4.PersonID and mkco.Date=q4.date and mkco.Kind=3
Now, if I just do it like this, execution of the query takes significant time, because the complex view is run three times, I assume, and out of its huge number of rows only some are actually used (like, out of 40000 only 2000 are used).
What I did is declare @temptable, and insert into @temptable select * from ComplexView where Date... - once per query I select only the rows I am going to use from my ComplexView, and then I join this @temptable.
This reduced execution time significantly.
However, I noticed that if I make a table in my database, add a clustered index on PID, Kind, Date (non-unique clustered) and take data from this table, then deleting everything from this table and re-inserting into it from the complex view takes a few seconds (3 or 4), and using this table in my query (left joining it three times) takes the query time down to half, from 1 minute to 30 seconds!
So, my question is, first of all - is it possible to create indexes on declared @temptables?
And then - I've seen people talk about the "create #temptable" syntax. Maybe this is what I need? Where can I read about the difference between declare @temptable and create #temptable? What should I use for a query like mine? (This query is for an MS Reporting Services report, if it matters.)
#tablename is a physical table, stored in tempdb, that the server will drop automatically when the connection that created it is closed. @tablename is a table variable that lives for the lifetime of the batch/procedure that created it, just like a local variable.
You can only add a (non PK) index to a #temp table.
create table #blah (fld int)
create nonclustered index idx on #blah (fld)
It's not a complete answer, but #table will create a temporary table in tempdb that you need to drop, or that will persist until the connection that created it closes. @table is a table variable that will not persist longer than your script.
Also, I think this post will answer the other part of your question.
Creating an index on a table variable
Yes, you can create indexes on temp tables or table variables. http://sqlserverplanet.com/sql/create-index-on-table-variable/
The @tableName syntax is a table variable. They are rather limited. The syntax is described in the documentation for DECLARE @local_variable. You can kind of have indexes on table variables, but only indirectly by specifying PRIMARY KEY and UNIQUE constraints on columns. So, if the data in the columns that you need an index on happens to be unique, you can do this. See this answer. This may be “enough” for many use cases, but only for small numbers of rows. If you don’t have indexes on your table variable, the optimizer will generally treat table variables as if they contain one row (regardless of how many rows there actually are), which can result in terrible query plans if you have hundreds or thousands of rows in them instead.
The #tableName syntax is a locally-scoped temporary table. You can create these either using SELECT…INTO #tableName or CREATE TABLE #tableName syntax. The scope of these tables is a little bit more complex than that of variables. If you have CREATE TABLE #tableName in a stored procedure, all references to #tableName in that stored procedure will refer to that table. If you simply reference #tableName in the stored procedure (without creating it), it will look into the caller’s scope. So you can create #tableName in one procedure, call another procedure, and in that other procedure read/update #tableName. However, once the procedure that created #tableName runs to completion, that table will be automatically unreferenced and cleaned up by SQL Server. So, there is no reason to manually clean up these tables unless you have a procedure which is meant to loop/run indefinitely or for long periods of time.
You can define complex indexes on temporary tables, just as if they are permanent tables, for the most part. So if you need to index columns but have duplicate values which prevents you from using UNIQUE, this is the way to go. You do not even have to worry about name collisions on indexes. If you run something like CREATE INDEX my_index ON #tableName(MyColumn) in multiple sessions which have each created their own table called #tableName, SQL Server will do some magic so that the reuse of the global-looking identifier my_index does not explode.
Additionally, temporary tables will automatically build statistics, etc., like normal tables. The query optimizer will recognize that temporary tables can have more than just 1 row in them, which can in itself result in great performance gains over table variables. Of course, this also is a tiny amount of overhead. Though this overhead is likely worth it and not noticeable if your query’s runtime is longer than one second.
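As a sketch for the asker's PID/Kind/Date case (column types are guesses), the contrast looks like this: a table variable can only be "indexed" through an inline PRIMARY KEY or UNIQUE constraint, which requires the key to be unique, whereas a temp table accepts a non-unique clustered index:
-- Table variable: index only via an inline constraint, so the key must be unique
-- (which it is not for this view's output).
DECLARE @ComplexViewRows TABLE
(
    PID int NOT NULL, Kind int NOT NULL, [Date] date NOT NULL, D1 int NULL,
    PRIMARY KEY (PID, Kind, [Date])
);

-- Temp table: a non-unique clustered index is allowed, which fits this data.
CREATE TABLE #ComplexViewRows (PID int, Kind int, [Date] date, D1 int);
CREATE CLUSTERED INDEX ix_cv ON #ComplexViewRows (PID, Kind, [Date]);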
To extend Alex K.'s answer, you can create the PRIMARY KEY on a temp table
IF OBJECT_ID('tempdb..#tempTable') IS NOT NULL
DROP TABLE #tempTable
CREATE TABLE #tempTable
(
Id INT PRIMARY KEY
,Value NVARCHAR(128)
)
INSERT INTO #tempTable
VALUES
(1, 'first value')
,(3, 'second value')
-- will cause Violation of PRIMARY KEY constraint 'PK__#tempTab__3214EC071AE8C88D'. Cannot insert duplicate key in object 'dbo.#tempTable'. The duplicate key value is (1).
--,(1, 'first value one more time')
SELECT * FROM #tempTable

How to speed up a SQL Server query involving count(distinct())

I have a deceptively simple SQL Server query that's taking a lot longer than I would expect.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT COUNT(DISTINCT(guid)) FROM listens WHERE url='http://www.sample.com/'
'guid' is varchar(64) NULL
'url' is varchar(900) NULL
There is an index on guid and url.
There are over 7 million rows in the 'listens' table, of which 17,000 match the url in question, and the result of the query is 5,500.
It is taking over 1 minute to run this query on SQL Server 2008 on a fairly idle Dual-Core AMD Opteron 2GHz with 1GB RAM.
Any ideas how to get the execution time down? Ideally it should be under 1 second!
Create an index on url which would cover the GUID:
CREATE INDEX ix_listens_url__guid ON listens (url) INCLUDE (guid)
When dealing with urls as identifiers, it is much better to store and index the URL hash rather than the whole URL.
Scanning indexes that large will take a long time no matter what.
What you need to do is shorten the indexes.
What you can do is add an integer column where the checksum of the url is calculated and stored.
This way your index will be narrow and the count will be fast.
Note that CHECKSUM is not unique, but it's unique enough.
Here's a complete code example of how to do it. I've included checksums for both columns, but it probably needs only one. You could also calculate the checksum on insert or update yourself and remove the trigger.
CREATE TABLE MyTable
(
ID INT IDENTITY(1,1) PRIMARY KEY,
[Guid] varchar(64),
Url varchar(900),
GuidChecksum int,
UrlChecksum int
)
GO
CREATE TRIGGER trgMyTableCheckSumCalculation ON MyTable
FOR UPDATE, INSERT
as
UPDATE t1
SET GuidChecksum = checksum(I.[Guid]),
UrlChecksum = checksum(I.Url)
FROM MyTable t1
join inserted I on t1.ID = I.ID
GO
CREATE NONCLUSTERED INDEX NCI_MyTable_GuidChecksum ON MyTable(GuidChecksum)
CREATE NONCLUSTERED INDEX NCI_MyTable_UrlChecksum ON MyTable(UrlChecksum)
INSERT INTO MyTable([Guid], Url)
select NEWID(), 'my url 1' union all
select NEWID(), 'my url 2' union all
select null, 'my url 3' union all
select null, 'my url 4'
SELECT *
FROM MyTable
SELECT COUNT(DISTINCT GuidChecksum)       -- count distinct guids via their narrow checksums
FROM MyTable
WHERE UrlChecksum = CHECKSUM('my url 3')  -- seek on the narrow checksum index
  AND Url = 'my url 3'                    -- guard against checksum collisions
GO
DROP TABLE MyTable
I know this post is a bit late. I was searching on another optimization issue.
Noting that:
guid is VARCHAR(64) and not really a 16-byte uniqueidentifier
url is varchar(900) and you have 7 million rows of it.
My suggestion:
Create a new field on the table: URLHash AS UNIQUEIDENTIFIER.
On creation of a new record: URLHash = CONVERT( UNIQUEIDENTIFIER, HASHBYTES('MD5', url) )
Build an index on URLHash.
then in your query:
SELECT COUNT(DISTINCT(guid)) FROM listens WHERE URLHash = CONVERT( UNIQUEIDENTIFIER, HASHBYTES( 'MD5', 'http://www.sample.com/' ) )
This will give you a very fast method of uniquely seeking a specific url, while maintaining a very small index size.
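One way to implement this without touching the insert code is a persisted computed column (a sketch; HASHBYTES is deterministic, so the column can be persisted and indexed, and the index name here is made up):
ALTER TABLE listens
    ADD URLHash AS CONVERT(uniqueidentifier, HASHBYTES('MD5', url)) PERSISTED;

CREATE NONCLUSTERED INDEX ix_listens_urlhash ON listens (URLHash) INCLUDE (guid);

-- The query then seeks on the 16-byte hash instead of the 900-byte url:
SELECT COUNT(DISTINCT guid)
FROM listens
WHERE URLHash = CONVERT(uniqueidentifier, HASHBYTES('MD5', 'http://www.sample.com/'));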
If you need FURTHER optimization, you may want to do the same hash on guid. Performing a DISTINCT on a 16-byte uniqueidentifier is faster than on a varchar(64).
The above assumes that you are not adding a lot of new rows into the listens table; i.e., the new record rate is not that heavy. The reason is that the MD5 algorithm, although providing good dispersion, is notoriously slow. If you are adding new records in the thousands per second, then calculating the MD5 hash on record creation can slow down your server (unless you have a very fast server). The alternative approach is to implement your own version of the FNV1a hashing algorithm, which is not built in. FNV1a is a lot faster than MD5 and yet provides very good dispersion and a low collision rate.
Hope the above helps whoever run into these kind of problems in the future.
Your GUID column will, by nature, be a lot more labour-intensive than, say, a bigint as it takes up more space (16 bytes). Can you replace the GUID column with an auto-incremented numerical column, or failing that, introduce a new column of type bigint/int that is incremented for each new value of the GUID column (you could then use your GUID to ensure global uniqueness, and the bigint/int for indexing purposes)?
From the link above:
At 16 bytes, the uniqueidentifier data
type is relatively large compared to
other data types such as 4-byte
integers. This means indexes built
using uniqueidentifier keys may be
relatively slower than implementing
the indexes using an int key.
Is there any particular reason why you're using a varchar for your guid column rather than uniqueidentifier?
I bet if you have more than 1GB of memory in the machine it would perform better (all DBA's I've met expect at least 4GB in a production SQL server.)
I've no idea if this matters but if you do a
SELECT DISTINCT(guid) FROM listens WHERE url='http://www.sample.com/'
won't @@ROWCOUNT contain the result you want?
Your best possible plan is a range seek to obtain the 17k candidate URLs and the count distinct to rely on a guaranteed order of input so it does not have to sort. The proper data structure that can satisfy both of these requirements is an index on (url, guid):
CREATE INDEX idxListensURLGuid on listens(url, guid);
You already got plenty of feedback on the wideness of the key used and you can definitely seek to improve it, and also increase that puny 1GB of RAM if you can.
If it is possible to deploy on SQL 2008 EE, then make sure you turn on page compression for such a highly repetitive and wide index. It will do miracles on performance due to reduced IO.
Some hints ...
1) Refactor your query, e.g. use a WITH clause:
with url_entries as (
select guid
from listens
where url='http://www.sample.com/'
)
select count(distinct(entries.guid)) as distinct_guid_count
from url_entries entries
2) Tell SQL Server exactly which index must be scanned while performing the query (of course, the index on the url field). Another way is to simply drop the index on guid and leave the index on url alone. Look here for more information about hints, especially constructions like select ... from listens with (index(index_name_for_url_field)).
3) Verify the state of the indexes on the listens table and update index statistics.