Single SELECT with linked server makes multiple SELECT by ID - sql

This is my issue. I defined a linked server, let's call it LINKSERV, which has a database called LINKDB. In my server (MYSERV) I've got the MYDB database.
I want to perform the query below.
SELECT *
FROM LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE
INNER JOIN MYSERV.MYDB.MYSCHEMA.MYTABLE ON MYKEYFIELD = LINKKEYFIELD
The problem is that if I look at the profiler, I see that lots of SELECTs are executed on the LINKSERV server. They look similar to:
SELECT *
FROM LINKTABLE WHERE LINKKEYFIELD = #1
Where #1 is a parameter that is changed for every SELECT.
This is, of course, unwanted because it performs poorly. I could be wrong, but I suppose the problem is related to the use of different servers in the JOIN. In fact, if I avoid that, the problem disappears.
Am I right? Is there a solution? Thank you in advance.

What you see may well be the optimal solution, as you have no filter statements that could be used to limit the number of rows returned from the remote server.
When you execute a query that draws data from two or more servers, the query optimizer has to decide what to do: pull a lot of data to the requesting server and do the joins there, or somehow send parts of the query to the linked server for evaluation? Depending on the filters and the availability or quality of the statistics on both servers, the optimizer may pick different operations for the join (merge or nested loop).
In your case, it has decided that the local table has fewer rows than the target and requests the target row that corresponds to each of the local rows.
This behavior, and ways to improve performance, are described in "Linked Server behavior when used on JOIN clauses".
The obvious optimizations are to update your statistics and to add a WHERE clause that will filter the rows returned from the remote table.
Another optimization is to return only the columns you need from the remote server, instead of selecting *.
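Both suggestions can be combined into a sketch of the rewritten query. The filter and the extra column names here are purely illustrative, since the original question gives only the key fields:

```sql
-- Sketch: project only the needed columns and add a filter that the
-- optimizer can push to the remote server, so fewer rows cross the wire.
-- SomeRemoteColumn / SomeLocalColumn are hypothetical names.
SELECT r.LINKKEYFIELD, r.SomeRemoteColumn,
       l.MYKEYFIELD, l.SomeLocalColumn
FROM LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE AS r
INNER JOIN MYDB.MYSCHEMA.MYTABLE AS l
        ON l.MYKEYFIELD = r.LINKKEYFIELD
WHERE r.SomeRemoteColumn = 'some value';  -- candidate for remote evaluation
```

Whether the filter is actually evaluated remotely depends on the provider and on statistics, so it is worth checking the resulting plan in Profiler again.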

Related

Why is the JPA N+1 problem so bad?

I know that we must avoid that behavior by using join fetch instead of letting JPA manage it with multiple queries, but the question is: why is it so bad for performance, given that we issue all the queries in the same session?
Example:
Select * from person
Select * from accounts
Select * from person p left join fetch p.accounts
My question is just about performance: what justifies the last one being more performant?
Thanks
Because there's more than just retrieving the data when you run a query. The other phases can be quite expensive. To name a few:
Prepare the connection.
The query is sent through the wire to the database server.
The db engine parses the query.
The db engine rewrites/rephrases the query to suit its internal needs.
The plan cache is checked; if the plan isn't there yet, it is compiled, stored, and managed.
The db engine evaluates multiple execution plans for the query.
The db engine chooses the optimal execution plan somehow.
The query is run, the data is retrieved, and this has I/O consequences.
The result set is returned through the wire.
You may have assumed that a query consists only of the "the query is run" phase, while in reality there are many other tasks the db is performing.
Also, a single I/O operation typically retrieves many rows at once, so with separate queries you end up discarding many of those rows unnecessarily.
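The cost difference is easiest to see written out as the SQL the two approaches send. Table and column names below are illustrative, matching the person/accounts example in the question:

```sql
-- N+1 pattern: one round trip per parent row; every statement repeats
-- the parse/plan/network costs listed above.
SELECT * FROM person;
SELECT * FROM account WHERE person_id = 1;
SELECT * FROM account WHERE person_id = 2;
-- ...one more SELECT for each remaining person...

-- join fetch pattern: a single round trip; the fixed per-statement
-- costs are paid exactly once.
SELECT p.*, a.*
FROM person p
LEFT JOIN account a ON a.person_id = p.id;
```

With N parents, the first form pays the fixed per-statement overhead N+1 times, which is where the name comes from.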

How to eliminate multiple server calls | MS SQL Server

There is a stored procedure that needs to be modified to eliminate a call to another server.
What is the easiest feasible way to do this so that the final stored procedure executes faster, with preference for solutions that do not involve much change to the application?
Eg:
select *
from dbo.table1 a
inner join server2.dbo.table2 b on a.id = b.id
Cross server JOINs can be problematic as the optimiser doesn't always pick the most effective solution, which may even result in the entire remote table being dragged over your network to be queried for a single row.
Replication is by far the best option, if you can justify it. This will mean you need to have a primary key on the table you want to replicate, which seems a reasonable constraint (ha!), but might become an issue with a third-party system.
If the remote table is small then it might be better to take a temporary local copy, e.g. SELECT * INTO #temp FROM server2.<database>.dbo.table2;. Then you can change your query to something like this: select * from dbo.table1 a inner join #temp b on a.id = b.id;. The temporary table will be marked for garbage collection when your session ends, so there's no need to tidy up after yourself.
If the table is larger then you might want to do the above, but also add an index to your temporary table, e.g. CREATE INDEX ix$temp ON #temp (id);. Note that if you use a named index then you will have issues if you run the same procedure twice simultaneously, as the index name won't be unique. This isn't a problem if the execution is always in series.
If you have a small number of ids that you want to include then OPENQUERY might be the way to go, e.g. SELECT * FROM OPENQUERY(server2, 'SELECT * FROM table2 WHERE id IN (''1'', ''2'')');. The advantage here is that you are now running the query on the remote server, so it's more likely to use an efficient query plan.
The bottom line is that if you expect to be able to JOIN a remote and local table then you will always have some level of uncertainty; even if the query runs well one day, it might suddenly decide to run a LOT slower the following day. Small things, like adding a single row of data to the remote table, can completely change the way the query is executed.
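Putting the temp-table suggestions together, a sketch of the whole pattern (the database name and the extra columns are placeholders, since the question only shows the id join):

```sql
-- 1. Pull the remote table across once, keeping only needed columns.
--    somedb / col1 / col2 are hypothetical.
SELECT id, col1, col2
INTO #remote_copy
FROM server2.somedb.dbo.table2;

-- 2. For larger copies, index the join key before joining.
CREATE INDEX ix_remote_copy_id ON #remote_copy (id);

-- 3. Join locally; the optimizer now has full statistics on both sides.
SELECT a.*, b.col1, b.col2
FROM dbo.table1 AS a
INNER JOIN #remote_copy AS b ON a.id = b.id;
```

This trades one bulk transfer up front for a predictable local join, which is usually the better deal when the remote table is not huge.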

Same query - different queries in query store

Problem:
We use entity framework (6.21) as our ORM manager.
Our database is Azure Sql Database.
Some of the parametrized queries (frequently used in our app) are slow on certain inputs (for some inputs a query runs 60 seconds, for others 0.4 seconds), so we started investigating those queries using Query Store and the Query Store explorer in MS SQL Management Studio (MSSMS -> Object Explorer -> Query Store).
We found out that Query Store stores two identical queries (same SQL text but different params; the params are not even stored) as different queries (with different query_id).
By different queries I mean different rows in the sys.query_store_query table.
I checked this by looking into QueryStore tables:
SELECT
qStore.query_id,
qStore.query_text_id,
queryTextStore.query_sql_text,
ROW_NUMBER() OVER(PARTITION BY query_sql_text ORDER BY query_sql_text ASC) AS rn
FROM
sys.query_store_query qStore
INNER JOIN
sys.query_store_query_text queryTextStore
ON qStore.query_text_id = queryTextStore.query_text_id
I am not able to compare plans of those queries easily in MSSMS, because each query has its own associated plan.
Expected behaviour:
I would assume that each subsequent run of the same query with different parameters would result in either:
1/ re-use of existing plan
or
2/ in creation of another plan based on passed params values...
Example:
The query would look like this (in reality queries are much more complex as they are generated by EntityFramework):
SELECT * FROM tbl WHERE a = #__plinq__
and it's two subsequent runs (with different params) would result in two rows in sys.query_store_query.
Question:
How can I make Azure SQL save queries with the same text as the same query? Or am I missing something, or is this expected behaviour?
Or more generally how to tune database queries if they are generated by Entity Framework?
How does SQL Server Query Store decide whether two queries are the same or different?
Edit1: Update
Based on @PeterB's comment (Adding a query hint when calling Table-Valued Function) we were able to solve our problem with slow queries on some parameter values (we added the "recompile" hint to the problematic queries).
Based on @GrantFritchey's hint I checked context_settings, but there are still multiple rows in sys.query_store_query which have the same query_sql_text and the same context_settings_id but different query_id values.
So we still wonder how SQL Server Query Store decides whether two queries are the same or different.
As for the different query entries, the key that Query Store uses for a query consists of:
query_text_id,
context_settings_id,
object_id,
batch_sql_handle,
query_parameterization_type
If any of these is different for a query it will generate a new entry in the query table. Note that batch_sql_handle is only populated for queries referencing temp tables.
So you can check which of these values is different for the queries that you listed.
Currently there are no settings that control the way Query Store aggregates queries. The only way to make it treat them as the same is to change your workload so that the fields listed above match. Alternatively, a probably better approach is to write your own reporting queries that aggregate queries and their statistics according to your needs.
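To see which component of the key differs, the key columns listed above can be selected side by side for rows sharing the same text. This is a sketch; the LIKE filter is a placeholder for narrowing to the statement in question:

```sql
-- List the Query Store key columns for duplicated query texts so the
-- differing component (object_id, batch_sql_handle, ...) stands out.
SELECT q.query_id,
       q.query_text_id,
       q.context_settings_id,
       q.object_id,
       q.batch_sql_handle,
       q.query_parameterization_type,
       t.query_sql_text
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS t
        ON q.query_text_id = t.query_text_id
WHERE t.query_sql_text LIKE '%tbl%'   -- placeholder filter
ORDER BY t.query_sql_text, q.query_id;
```

Two rows with the same text but different query_id values will show a mismatch in at least one of these columns.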

Why is select count(*) faster than select * even if the table has no indexes?

Is there any difference between the time taken by Select * and Select count(*) for a table having no primary key and no other indexes, in SQL Server 2008 R2?
I have tried select count(*) from a view and it has taken 00:05:41 for 410063922 records.
Select * from the view has already taken 10 minutes for the first 600000 records and the query is still running, so it looks like it will take more than 1 hour.
Is there any way through which I can make this view faster without any change in the structure of the underlying tables?
Can I create indexed view for tables without indexes?
Can I use caching for the view inside sql server so if it is called again, it takes less time?
It's a view which contains 20 columns from one table only. The table does not have any indexes. The user is able to query the view. I am not sure whether the user does select * or select somecolumn from the view with some where conditions. The only thing I want to do is propose some changes through which their queries on the view will return results faster. I am thinking of indexing and caching, but I am not sure whether they are possible on a view over a table with no indexes. Indexing is not possible here, as mentioned in one of the answers.
Can anyone put some light on caching within sql server 2008 R2?
count(*) returns just a number, while select * returns all the data. Imagine having to move all that data, and the time it takes for your hundreds of thousands of records. Even if your table were properly indexed, running select * on hundreds of thousands of records would still take a lot of time, even if less than before, and should never be needed in the first place.
Can I create indexed view for tables without indexes?
No; an indexed view requires you to create an index (a unique clustered index) on the view itself.
Can I use caching for the view inside sql server so if it is called again, it takes less time?
Yes you can, but it's of no use for such a requirement. Why are you selecting so many records in the first place? You should never need to return millions or thousands of rows of complete data in any query.
Edit
In fact you are trying to get billions of rows without any where clause. This is bound to fail on any server that you can get hold of, so better stop there :)
TL;DR
Indexes do not matter for a SELECT * FROM myTABLE query because there is no condition and there are billions of rows. Unless you change your query, no optimization can help you.
The execution time difference is due to the fact that SELECT * returns the entire content of your table, while SELECT COUNT(*) only counts how many rows are present without returning them.
Answer about optimisation
In my opinion you're approaching the problem from the wrong angle. First of all it's important to define the real needs of your clients; once the requirements are defined you'll certainly be able to improve your view to get better performance and avoid returning billions of rows.
Optimisations can even be made on the table structure sometimes (we don't have any info about your current structure).
SQL Server will automatically use a system of caching in order to make the execution quicker but that will not solve your problem.
SQL Server apparently does very different work when its result set field list is different. I just did a test of a query joining several tables where many millions of rows were in play. I tested different queries, which were all the same except for the list of fields in the SELECT clause. Also, the base query (for all tests) returned zero rows.
The SELECT COUNT(*) took 6 seconds and the SELECT MyPrimaryKeyField took 6 seconds. But once I added any other column (even small ones) to the SELECT list, the time jumped to 20 minutes - even though there were no records to return.
When SQL Server thinks it needs to leave its indexes (e.g., to access table columns not included in an index) then its performance is very different - we all know this (which is why SQL Server supports including base columns when creating indexes).
Getting back to the original question, the SQL Server optimizer apparently chooses to access the base table data outside of the indexes before it knows that it has no rows to return. In the poster's original scenario, though, there were no indexes or PK (don't know why), but maybe SQL Server is still accessing table data differently with COUNT(*).
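The "leave its indexes" effect described above is exactly what INCLUDEd columns address: extra columns stored at the index leaf level so the query never has to touch the base table. A hedged sketch with hypothetical table and column names:

```sql
-- Hypothetical covering index: FilterColumn drives the seek, and the
-- INCLUDEd columns let the SELECT be answered from the index alone,
-- avoiding the expensive base-table lookups observed in the test above.
CREATE INDEX ix_myTable_covering
    ON dbo.myTable (FilterColumn)
    INCLUDE (SmallColumn1, SmallColumn2);
```

This only helps queries whose filter and select lists are covered; it does not change anything for an unfiltered SELECT * over the whole table.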

TSQL Join efficiency

I'm developing an ASP.NET/C#/SQL application. I've created a query for a specific grid-view that involves a lot of joins to get the data needed. On the hosted server, the query has randomly started taking up to 20 seconds to process. I'm sure it's partly an overloaded host-server (because sometimes the query takes <1s), but I don't think the query (which is actually a view reference via a stored procedure) is at all optimal regardless.
I'm unsure how to improve the efficiency of the below query:
(There are about 1500 matching records to those joins, currently)
SELECT dbo.ca_Connections.ID,
dbo.ca_Connections.Date,
dbo.ca_Connections.ElectricityID,
dbo.ca_Connections.NaturalGasID,
dbo.ca_Connections.LPGID,
dbo.ca_Connections.EndUserID,
dbo.ca_Addrs.LotNumber,
dbo.ca_Addrs.UnitNumber,
dbo.ca_Addrs.StreetNumber,
dbo.ca_Addrs.Street1,
dbo.ca_Addrs.Street2,
dbo.ca_Addrs.Suburb,
dbo.ca_Addrs.Postcode,
dbo.ca_Addrs.LevelNumber,
dbo.ca_CompanyConnectors.ConnectorID,
dbo.ca_CompanyConnectors.CompanyID,
dbo.ca_Connections.HandOverDate,
dbo.ca_Companies.Name,
dbo.ca_States.State,
CONVERT(nchar, dbo.ca_Connections.Date, 103) AS DateView,
CONVERT(nchar, dbo.ca_Connections.HandOverDate, 103) AS HandOverDateView
FROM dbo.ca_CompanyConnections
INNER JOIN dbo.ca_CompanyConnectors ON dbo.ca_CompanyConnections.CompanyID = dbo.ca_CompanyConnectors.CompanyID
INNER JOIN dbo.ca_Connections ON dbo.ca_CompanyConnections.ConnectionID = dbo.ca_Connections.ID
INNER JOIN dbo.ca_Addrs ON dbo.ca_Connections.AddressID = dbo.ca_Addrs.ID
INNER JOIN dbo.ca_Companies ON dbo.ca_CompanyConnectors.CompanyID = dbo.ca_Companies.ID
INNER JOIN dbo.ca_States ON dbo.ca_Addrs.StateID = dbo.ca_States.ID
It may have nothing to do with your query and everything to do with the data transfer.
How fast does the query run in query analyzer?
How does this compare to the web page?
If you are bringing back the entire data set you may want to introduce paging, say 100 records per page.
The first thing I normally suggest is to profile, looking for potential indexes that could help out. But when the problem is sporadic like this and the normal case is for the query to run in <1 sec, it's more likely due to lock contention than a missing index. That means the cause is something else in the system making this query take longer. Perhaps an insert or update. Perhaps another select query, one that you would normally expect to take a little longer, so the extra time on its end isn't noticed.
I would start with indexing, but I have a database that belongs to a third-party application, so creating my own indexes is not an option. I read an article (sorry, can't find the reference) recommending breaking the query up into table variables or temp tables (depending on the number of records) when you have multiple tables in your query (not sure what the magic number is).
Start with dbo.ca_CompanyConnections, dbo.ca_CompanyConnectors, and dbo.ca_Connections. Include the fields you need, and then substitute those three joined tables with just the temp table.
Not sure what the issue is (I'd like to hear recommendations), but it seems that once you get over five tables, performance starts to drop.
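The suggested decomposition can be sketched against the posted query. Which columns to carry into the temp table is a guess based on the SELECT list, so treat the projection as illustrative:

```sql
-- Step 1: materialise the first three joined tables into a temp table,
-- carrying the keys and columns the final SELECT needs.
SELECT c.ID, c.Date, c.ElectricityID, c.NaturalGasID, c.LPGID,
       c.EndUserID, c.AddressID, c.HandOverDate,
       cc.ConnectorID, cc.CompanyID
INTO #conn
FROM dbo.ca_CompanyConnections cn
INNER JOIN dbo.ca_CompanyConnectors cc ON cn.CompanyID = cc.CompanyID
INNER JOIN dbo.ca_Connections c ON cn.ConnectionID = c.ID;

-- Step 2: join the remaining tables against the much smaller temp table.
SELECT t.*, a.Suburb, a.Postcode, co.Name, s.State,
       CONVERT(nchar, t.Date, 103) AS DateView,
       CONVERT(nchar, t.HandOverDate, 103) AS HandOverDateView
FROM #conn t
INNER JOIN dbo.ca_Addrs a ON t.AddressID = a.ID
INNER JOIN dbo.ca_Companies co ON t.CompanyID = co.ID
INNER JOIN dbo.ca_States s ON a.StateID = s.ID;
```

With ~1500 matching rows the benefit may be modest, but the two-step form gives the optimizer accurate cardinality for the second join and makes each step easy to time separately.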