Same query - different queries in query store - sql

Problem:
We use entity framework (6.21) as our ORM manager.
Our database is Azure Sql Database.
Because some of the parametrized queries (frequently used in our app) are slow on some of the inputs (on some input it runs 60 seconds on other input it runs 0.4 seconds)
We started investigate those queries using QueryStore and QueryStore explorer in MS SQL Management Studio (MSSMS -> Object Explorer -> Query Store).
We found out, that QueryStore stores two same (same sql query but different params - params are not even stored) queries as different queries (with different query_id).
By different query I mean different row in table
sys.query_store_query).
I checked this by looking into QueryStore tables:
SELECT
qStore.query_id,
qStore.query_text_id,
queryTextStore.query_sql_text
ROW_NUMBER() OVER(PARTITION BY query_sql_text ORDER BY query_sql_text ASC) AS rn
FROM
sys.query_store_query qStore
INNER JOIN
sys.query_store_query_text queryTextStore
ON qStore.query_text_id = queryTextStore.query_text_id
I am not able to compare plans of those queries easily in MSSMS, because each query has its own associated plan.
Expected behaviour:
I would assume that each subsequent run of same query with different parametres would result in either:
1/ re-use of existing plan
or
2/ in creation of another plan based on passed params values...
Example:
The query would look like this (in reality queries are much more complex as they are generated by EntityFramework):
SELECT * FROM tbl WHERE a = #__plinq__
and it's two subsequent runs (with different params) would result in two rows in sys.query_store_query.
Question:
How can I make Azure to save queries with same text as same queries? Or am I missing something or is this expected behaviour?
Or more generally how to tune database queries if they are generated by Entity Framework?
How SQL Server Query Store considers two queries same or different?
Edit1: Update
Based on #PeterB comment (Adding a query hint when calling Table-Valued Function) we were able to solve our problem with slow queries on some params values (we added hint "recompile" on problematic queries).
Based on #GrantFritchey hint I checked context_settings, but there are still multiple rows in query_store table which have same query_sql_text and same context_settings_id but with different query_id .
So we still wonder how SQL Server Query Store consider two queries same or different?

As for the different query entries, the key that Query Store uses for a query consists of:
query_text_id,
context_settings_id,
object_id,
batch_sql_handle,
query_parameterization_type
If any of these is different for a query it will generate a new entry in the query table. Note that batch_sql_handle is only populated for queries referencing temp tables.
So you can check which of these values is different for the queries that you listed.
Currently there are no settings that control the way Query Store aggregates queries. The only way to make it treat them as same is to change your workload so that the fields listed above match. But alternatively probably better approach is that you write your own reporting queries that will aggregate queries and their statistics according to your needs.

Related

Wrong sql execution plan when I use same the query, but in different databse context

Can someone please me explain, why I have two different execution plan for the same t-sql query?
When I in the query used full table name with the name of the database and only I changed database context by command "Use". Query, but should execute, on the same database, because the query contains full name conventions.
It doesn't make sense to me!
Show two different exec-plan
they are basically the same query plan , except one went parallel.
It might be MAXDOP configuration is different for the databases you are running the query from ,
Query plans are based on statistics which are based on the distribution of your data at a point in time. If the table definitions are the same and the data in all three tables are exactly the same and the statistics have been updated then you may expect to see similar plans (although this would depend on database settings and any difference in hardware if querying on two different servers).
In your case, I suspect that there is a discrepancy in the statistics in the two databases.

Query with subquery is not updateable. Is there a work-around?

I'm working with a database in Microsoft Access, and this is the scenario I'm trying to find a solution for:
Companies send in applications to apply for more capacity. The application can contain multiple smaller capacities which add up to the total capacity of the application. The applications are stored in the table tApplication and the capacities associated with the application are stored in tCapacity.
I wish to present information about the application including the total capacity. Therefore I have a form with the following Data Source:
SELECT tApplication.ID,tApplication.CompanyID, tApplication.statusID,
(SELECT SUM(tCapacity.Capacity) FROM tCapacity WHERE tApplication.ID = tCapacity.ApplicationID) AS TotalCapacity
FROM tApplication;
This query gives me the information I need, but it hinders me from edit anything in the recordset (Using subqueries apparently makes your query not updateable in Microsoft Access). Is there a way around this? I've been playing around with the idea to store the applications and capacities in the same table, but I'm not sure it's a good design choice.
I used DSUM() which worked great. Thank you to June7.
SELECT tApplication.ID,tApplication.CompanyID, tApplication.statusID,
DSUM( "tCapacity.Capacity" , "tCapacity", "tCapacity.ApplicationID = " & tApplication.ID) AS TotalCapacity
FROM tApplication;
Try this two options:
Make Table Queries
Make-Table queries are just like Select queries except their results are put into a new table rather than a datasheet view. You specify the table name and it is created. If the table exists, it is replaced. To create a Make-Table query, open the query (qryCustomerSales) in design mode, and choose Make-Table Query from the Query menu
Use Append Queries Rather than Make-Table Queries
An alternative to Make-Table queries is an Append query. Append queries let you insert records from a query into an existing table. If you just have a one step process, there usually is not much difference. However, if you have multiple steps, Append queries have a clear advantage.
More information:
https://www.fmsinc.com/MicrosoftAccess/query/non-updateable/index.html#Example2

SQL query design with configurable variables

I have a web app that has a large number of tables and variables that the user can select (or not select) at run time. Something like this:
In the DB:
Table A
Table B
Table C
At run time the user can select any number of variables to return. Something like this:
Result Display = A.field1, A.Field3, B.field19
There are up to 100+ total fields spread across 15+ tables that can be returned in a single result set.
We have a query that currently works by creating a temp table to select and aggregate the desired fields then selecting the desired variables from that table. However, this query takes quite some time to execute (30 seconds). I would like to try and find a more efficient way to return the desired results while still allowing the ability for the user to configure the variables to see. I know this can be done as I have seen it done in other areas. Any suggestions?
Instead of using a temporary table, use a view and recompile the view each time your run the query (or just use a subquery or CTE instead of a view). SQL Server might be able to optimize the view based on the fields being selected.
The best reason to use a temporary table would be when intra-day updates are not needed. Then you could create the "temporary" table at night and just select from that table.
The query optimization approach (whether through a view, CTE, or subquery) may not be sufficient. This is a rather hard problem to solve in general. Typically, though, there are probably themes of variables that come from particular subqueries. If so, you can write a stored procedure to generate dynamic SQL that just has the requisite joins for the variables chosen for a given run. Then use that SQL for fetching from the database.
And finally, perhaps there are other ways to optimize the query regardless of the fields being chosen. If you think that might be the case, then simplify the query for human consumption and ask another question

Single SELECT with linked server makes multiple SELECT by ID

This is my issue. I defined a linked server, let's call it LINKSERV, which has a database called LINKDB. In my server (MYSERV) I've got the MYDB database.
I want to perform the query below.
SELECT *
FROM LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE
INNER JOIN MYSERV.MYDB.MYSCHEMA.MYTABLE ON MYKEYFIELD = LINKKEYFIELD
The problem is that if I take a look to the profiler, I see that in the LINKSERV server lots of SELECT are made. They looks similar to:
SELECT *
FROM LINKTABLE WHERE LINKKEYFIELD = #1
Where #1 is a parameter that is changed for every SELECT.
This is, of course, unwanted because it appears to be not performing. I could be wrong, but I suppose the problem is related to the use of different servers in the JOIN. In fact, if I avoid this, the problem disappear.
Am I right? Is there a solution? Thank you in advance.
What you see may well be the optimal solution, as you have no filter statements that could be used to limit the number of rows returned from the remote server.
When you execute a query that draws data from two or more servers, the query optimizer has to decide what to do: pull a lot of data to the requesting server and do the joins there, or somehow send parts of the query to the linked server for evaluation? Depending on the filters and the availability or quality of the statistics on both servers, the optimizer may pick different operations for the join (merge or nested loop).
In your case, it has decided that the local table has fewer rows than the target and requests the target row that correspons to each of the local rows.
This behavior and ways to improve performance are described in Linked Server behavior when used on JOIN clauses
The obvious optimizations are to update your statistics and add a WHERE statement that will filter the rows returned from the remote table.
Another optimization is to return only the columns you need from the remote server, instead of selecting *

Best way to compare contents of two tables in Teradata?

When you need to compare two tables to see what the differences are, are there any tools or shortcuts you use, or do you handcode the SQL to compare the two tables?
Basically the core features of a product like Red Gate SQL Data Compare (schemas for my tables typically always match).
Background: In my SQL Server environment, I created a stored procedure which inspects the metadata of the two tables/views, creates a query (as dynamic sql) which joins the two tables on the specified key columns, and compares data in the compare columns, reporting key differences and data differences. The query can either be printed and modified/copied or just excecuted as is. We are not allowed to create stored procedures in our Teradata environment, unfortunately.
Sounds like a data profiling tool such as Talend's Open Profiler would make the most sense at that point.
You could write a BTEQ statement that builds the query similar to your SQL Server stored procedure and then export the dynamically built SQL. You can then in turn run that inside of your BTEQ. It might get cumbersome, but with enough determination you could probably mock something up.
I dont know if this is the right answer you are searching for.
sel * from database_name1.table_name1
minus
sel * from database_name2.table_name2;
you can do the same by selecting specific columns. This will basically give the non existent rows from table2 which are in table1.
If you were not looking for this type of answer, please ignore this and continue.
Also you can select like below.
select
table1.keycol1,
table2.keycol2,
(table1.factcol1 - table2.factcol2) as diff
from table1
inner join
table2
on table1.keycol1 = table2.keycol1
and table1.keycol2 = table2.keycol2
where diff <> 0
This was just an analysis which can give an idea. Please ignore any syntactical and programmatical errors.
Hope this helps.