SQL Server Cross Database Query from On-Premises to Azure

I have two SQL Servers, one on-premises and one in Azure. When I run this T-SQL against the on-premises server only:
Select top 100 * from Orders
the result is very fast, as usual. Against Azure only:
Select top 100 * from Orders_2
the same: fast.
This is the point. No matter whether I use a linked server or OPENDATASOURCE:
Select top 100 * from Orders a LEFT OUTER JOIN
[AZUREDB].DB01.dbo.Orders_2 a2 ON a2.ID = a.ID
OR
Select top 100 * from Orders a LEFT OUTER JOIN
OPENDATASOURCE('SQLOLEDB', 'Data Source=AzureDB;User ID=XXX;Password=XXX').DB01.dbo.Orders_2 a2 ON a2.ID = a.ID
it takes a very long time, about 15 minutes.
What is happening, and how can I fix it?

Because it needs to pull the whole table from the remote server into the local server and only then do the TOP 100. If you look at the query plan, which you haven't shown us, you will see that this is what is happening.
Instead, filter the remote server's data first before joining.
Select *
from Orders a
LEFT OUTER JOIN (
    SELECT TOP (100) *
    FROM [AZUREDB].DB01.dbo.Orders_2
) a2 ON a2.ID = a.ID
Whether that gives the desired results, I don't know, as you haven't shown what you want. (Note that TOP (100) without an ORDER BY returns an arbitrary 100 rows.)
The other alternative is to ensure the remote server's table has an index (probably clustered) on ID. That way your query can hopefully pass just 100 rows from your own server to the remote one to join them up.
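If you want to guarantee that the remote part runs remotely, OPENQUERY is also worth a try: the quoted query text is sent to the linked server verbatim and executed there, so only its result rows cross the network. A minimal sketch, assuming the same [AZUREDB] linked server as above; the TOP (100) and ORDER BY ID inside are illustrative, and note that OPENQUERY only accepts a literal query string:
-- The quoted query executes entirely on the linked server, so only
-- the 100 rows it returns travel over the link, not the whole table.
Select top 100 *
from Orders a
LEFT OUTER JOIN OPENQUERY([AZUREDB],
    'SELECT TOP (100) *
     FROM DB01.dbo.Orders_2
     ORDER BY ID') a2 ON a2.ID = a.ID
Whether limiting both sides to 100 rows matches the result you actually want depends on your intent, as noted above.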

Very bad performance using 3 tables join on SQL Server

I have a serious performance issue when I execute a SQL statement which involves 3 tables, as follows:
TableA<----TableB---->TableC
In particular, these tables are in a data warehouse: the table in the middle is a dimension table while the others are fact tables. TableA has about 9 million records, TableC about 3 million, and the dimension table (TableB) only 74 records.
The syntax of the query is very simple. TableA is called _PG, TableB is _MDT, and TableC is called _FM:
SELECT _MDT.codiceMandato AS Customer,
       SUM(_FM.Totale) AS Revenue,
       SUM(_PG.ErogatoTotale) AS Paid
FROM _PG
INNER JOIN _MDT ON _PG.idMandato = _MDT.idMandato
INNER JOIN _FM ON _FM.idMandato = _MDT.idMandato
GROUP BY _MDT.codiceMandato
Actually, I have never seen this query finish :-(
_PG has a non-clustered index on idMandato, and so does _FM.
_MDT has a clustered index on idMandato.
The execution plan shows that the bottleneck is the Stream Aggregate (33% of the cost) and the Merge Join (66% of the cost). In particular, the Stream Aggregate estimates about 400 billion rows!!
I don't know the reason and I don't know how to proceed in order to solve this issue.
I use SQL Server 2016 SP1 installed on a virtual server running Windows Server 2012 Standard, with 4 CPU cores, 32 GB of RAM, and 1.5 TB on a dedicated volume made up of SAS disks with an SSD cache.
I hope somebody can help me understand.
Thanks in advance
The most likely cause is that you are getting a Cartesian product along two dimensions: within each mandato, every _PG row pairs with every _FM row. This multiplies the rows unnecessarily. The solution is to aggregate before doing the join.
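To get a feel for the scale (illustrative arithmetic, assuming rows are spread evenly across the 74 mandati): each mandato matches about 9,000,000 / 74 ≈ 122,000 rows in _PG and 3,000,000 / 74 ≈ 40,500 rows in _FM, so the three-way join produces roughly 122,000 × 40,500 ≈ 4.9 billion rows per mandato, or about 365 billion rows in total, right in line with the ~400 billion estimated rows in your plan.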
You haven't provided sample data, but this is the idea:
SELECT m.codiceMandato AS Customer, f.Revenue, p.Paid
FROM _MDT m
INNER JOIN (SELECT p.idMandato, SUM(p.ErogatoTotale) AS Paid
            FROM _PG p
            GROUP BY p.idMandato
           ) p ON p.idMandato = m.idMandato
INNER JOIN (SELECT f.idMandato, SUM(f.Totale) AS Revenue
            FROM _FM f
            GROUP BY f.idMandato
           ) f ON f.idMandato = m.idMandato;
I'm not 100% sure this will fix the problem, because your data structure is not clear.
You can try a subquery between TableA and TableC without aggregation, then join that subquery with TableB and apply the GROUP BY:
SELECT _MDT.codiceMandato, SUM(A.Totale) AS Revenue, SUM(A.ErogatoTotale) AS Paid
FROM ( SELECT _PG.idMandato, _FM.Totale, _PG.ErogatoTotale
       FROM _PG
       INNER JOIN _FM ON _FM.idMandato = _PG.idMandato ) A
INNER JOIN _MDT ON A.idMandato = _MDT.idMandato
GROUP BY _MDT.codiceMandato

join large tables - transactional log full due to active transaction

I have two large tables with 60 million and 10 million records respectively. I want to join the two tables, but the process runs for 3 hours and then fails with the error message:
the transaction log for database is full due to 'active_transaction'
Autogrowth is unlimited and I have set the DB recovery model to simple.
The size of the log drive is 50 GB.
I am using SQL Server 2008 R2.
The SQL query I am using is:
Select * into betdaq.[dbo].temp3 from
(Select XXXXX, XXXXX, XXXXX, XXXXX, XXXXX
from XXX.[dbo].temp1 inner join XXX.[dbo].temp2
on temp1.Date = temp2.[Date] and temp1.cloth = temp2.Cloth and temp1.Time = temp2.Time) a
A single statement is one transaction, and the transaction does not commit until the end.
So you are filling up the transaction log.
You are going to need to loop and insert something like 100,000 rows at a time.
Start with this just to test the first 100,000.
Then you will need to wrap it in a loop with a cursor (a sketch follows below).
create table betdaq.[dbo].temp3 ...
insert into betdaq.[dbo].temp3 (a,b,c,d,e)
Select top 100000 with ties XXXXX, XXXXX, XXXXX, XXXXX, XXXXX
from XXX.[dbo].temp1
join XXX.[dbo].temp2
  on temp1.Date = temp2.[Date]
 and temp1.Time = temp2.Time
 and temp1.cloth = temp2.Cloth
order by temp1.Date, temp1.Time
And why? That is a LOT of data. Could you use a view or a CTE instead?
If those join columns are indexed, a view would be very efficient.
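To make the cursor loop concrete, here is a minimal sketch that batches the insert one Date at a time, so each iteration commits on its own and the log can truncate between batches under the simple recovery model. It assumes Date splits the data into reasonably sized chunks; the XXXXX column placeholders are carried over from the question:
declare @d date
declare dates cursor fast_forward for
    select distinct [Date] from XXX.[dbo].temp1 order by [Date]
open dates
fetch next from dates into @d
while @@FETCH_STATUS = 0
begin
    -- each pass is its own transaction, so the log can clear in between
    insert into betdaq.[dbo].temp3 (a,b,c,d,e)
    select XXXXX, XXXXX, XXXXX, XXXXX, XXXXX
    from XXX.[dbo].temp1
    join XXX.[dbo].temp2
      on temp1.Date = temp2.[Date]
     and temp1.Time = temp2.Time
     and temp1.cloth = temp2.Cloth
    where temp1.[Date] = @d
    fetch next from dates into @d
end
close dates
deallocate dates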
The transaction log can fill up even though the database is in the simple recovery model, and even though SELECT ... INTO is a minimally logged operation; the log can also fill up because of other transactions running in parallel.
I would use the queries below to check transaction log space usage while the query is running:
select * from sys.dm_db_log_space_usage
select * from sys.dm_tran_database_transactions
select * from sys.dm_tran_active_transactions
select * from sys.dm_tran_current_transaction
Further, the script below can be used to check the SQL text as well:
https://gallery.technet.microsoft.com/scriptcenter/Transaction-Log-Usage-By-e62ba57d

Improving a search query

I have a query for a search function.
Basically, the search function lets a user define what they "have" and what they "want". The query then filters all the possible results created by other users.
For example, I have an apple (good quality) and I want an orange (poor quality), so the result will display all users that have an orange (poor quality) and want an apple (good quality).
The search query is a bit long, so I have tried to simplify it below.
The stored procedure receives user-defined tables (ItemID & QualityID) as parameters:
@WantUdt AS HaveItemUdt READONLY,
@HaveUdt AS HaveItemUdt READONLY
The search query (a user can define more than one item and quality, so I use IN):
SELECT * FROM tbl_Trade WHERE TradeID IN
(SELECT w.TradeID FROM tbl_Want w INNER JOIN
    (SELECT TradeID FROM tbl_Have
     WHERE HaveID IN (SELECT ItemID FROM @HaveUdt) AND
           Quality IN (SELECT QualityID FROM @HaveUdt)) AS h -- to filter [have]
    ON w.TradeID = h.TradeID
 WHERE WantID IN (SELECT ItemID FROM @WantUdt) AND
       Quality IN (SELECT QualityID FROM @WantUdt) -- to filter [want]
)
The query above works as expected. However, I am having performance issues. When I stress-test by executing this stored procedure many times within a few seconds, my DB (SQL Server 2008 Express) can't cope and generates a timeout error:
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
I guess it is because the query above uses too many IN clauses.
Is there any way to improve this query?
Try this. I hope it will help. (NOLOCK hints are applied only to the permanent tables here; table hints aren't valid on table-valued parameters.)
SELECT * -- select only those columns which are required
FROM tbl_Trade AS tt WITH (NOLOCK)
INNER JOIN tbl_Want w WITH (NOLOCK) ON tt.TradeID = w.TradeID
INNER JOIN tbl_Have h WITH (NOLOCK) ON h.TradeID = w.TradeID
INNER JOIN @HaveUdt hu ON hu.ItemID = h.HaveID AND hu.QualityID = h.Quality
INNER JOIN @WantUdt wu ON wu.ItemID = w.WantID AND wu.QualityID = w.Quality
Thanks
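Another angle, hedged since I can't test against your schema: an EXISTS-based rewrite keeps the original one-row-per-trade semantics (a plain join can duplicate tbl_Trade rows when a trade has several matching have/want rows). Like the join version above, it pairs each ItemID with its own QualityID rather than matching them independently, which may in fact be what you intend:
SELECT t.* -- select only the columns you need
FROM tbl_Trade t
WHERE EXISTS (SELECT 1
              FROM tbl_Have h
              JOIN @HaveUdt hu ON hu.ItemID = h.HaveID
                              AND hu.QualityID = h.Quality
              WHERE h.TradeID = t.TradeID)
  AND EXISTS (SELECT 1
              FROM tbl_Want w
              JOIN @WantUdt wu ON wu.ItemID = w.WantID
                              AND wu.QualityID = w.Quality
              WHERE w.TradeID = t.TradeID)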

Using analytics with left join and partition by

I have two different queries which produce the same results, and I wonder which one is more efficient. The second one uses one less SELECT, but it moves the WHERE to the outer SELECT. Which one is executed first, the LEFT JOIN or the WHERE clause?
Using 3 "selects":
select * from
(
select * from
(
select
max(t.PRICE_DATETIME) over (partition by t.PRODUCT_ID) as LATEST_SNAPSHOT,
t.*
from
PRICE_TABLE t
) a
where
a.PRICE_DATETIME = a.LATEST_SNAPSHOT
) r
left join
PRODUCT_TABLE l on (r.PRODUCT_ID = l.PRODUCT_ID and r.PRICE_DATETIME = l.PRICE_DATETIME)
Using 2 selects:
select * from
(
select
max(t.PRICE_DATETIME) over (partition by t.PRODUCT_ID) as LATEST_SNAPSHOT,
t.*
from
PRICE_TABLE t
) r
left join
PRODUCT_TABLE l on (r.PRODUCT_ID = l.PRODUCT_ID and r.PRICE_DATETIME = l.PRICE_DATETIME)
where
r.PRICE_DATETIME = r.LATEST_SNAPSHOT;
PS: I know, I know, "select star" is evil, but I'm writing it this way here only to keep the example small.
"I wonder which one is more efficent"
You can answer this question yourself pretty easily by turning on statistics.
set statistics io on
set statistics time on
-- query goes here
set statistics io off
set statistics time off
Do this for each of your two queries and compare the results. You'll get some useful output about how many reads SQL Server is doing, how many milliseconds each takes to complete, etc.
You can also see the execution plan SQL Server generates by viewing the estimated execution plan (Ctrl+L, or right-click and choose that option) or by enabling "Include Actual Execution Plan" (Ctrl+M) and running the queries. That could help answer the question about order of execution; I couldn't tell you off the top of my head.
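As for which is "executed first": logically, SQL evaluates the FROM clause (including joins) before the WHERE clause, so in your second query the filter is conceptually applied after the outer join. In practice the optimizer is free to reorder operations and push predicates around as long as the results are equivalent, so two formulations like these often compile to an identical plan, which is exactly what the statistics and plan comparison above will show.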

LEFT JOINing on additional criteria in MS Access

I have the following T-SQL query (a simple test case) running fine in MS SQL Server, but I cannot get the equivalent query working in MS Access (JET-SQL). The problem is the additional criterion in the LEFT JOIN. How can I do this in MS Access?
T-SQL:
SELECT * FROM A
LEFT OUTER JOIN B ON A.ID = B.A_ID
AND B.F_ID = 3
JET-SQL (what I have so far, but it crashes Access!):
SELECT * FROM dbo_A
LEFT JOIN dbo_B ON (dbo_A.ID = dbo_B.A_ID AND dbo_B.F_ID = 3)
You need to use a subselect to apply the condition:
SELECT *
FROM dbo_A LEFT JOIN
[SELECT dbo_B.* FROM dbo_B WHERE dbo_B.F_ID = 3]. AS dbo_B
ON dbo_A.ID = dbo_B.A_ID;
If you're running Access with "SQL 92" compatibility mode turned on, you can do the more standard:
SELECT *
FROM dbo_A LEFT JOIN
(SELECT dbo_B.* FROM dbo_B WHERE dbo_B.F_ID = 3) AS dbo_B
ON dbo_A.ID = dbo_B.A_ID;
Do you need this to be editable in Access? If not, just use a passthrough query with the native T-SQL. If so, I would likely create a server-side view for this, and I'd especially want to move it server-side if the literal value is something you would parameterize (i.e., the F_ID=3 is really F_ID=N where N is a value chosen at runtime).
BTW, I write these subselect derived table SQL statements every single day while working in Access. It's not that big a deal.
Do you get an error message when it crashes, or does it just lock up? Judging by the dbo_B name, I'm going to guess these are linked tables in Access. I believe that when you do a join like that, Access doesn't tell SQL Server that it needs the result of the join; it says "give me all of the rows of both tables" and then tries to join them itself. If the tables are very large, this can cause the application to lock up.
You're probably better off creating a view on SQL Server for what you need.
I think MS Access expects both table names in each part of the JOIN's ON clause. As a trick, this works for me:
SELECT * FROM A
LEFT OUTER JOIN B ON A.ID = B.A_ID
AND B.F_ID = IIF(True, 3, A.ID)
(A.ID, or any other field from table A, can be used as the third IIF argument.)
That last condition technically isn't a join condition but a comparison to a literal value. Put it in a WHERE clause:
SELECT *
FROM a LEFT OUTER JOIN b ON a.ID = b.a_id
WHERE b.f_id = 3;
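One caveat with this last suggestion: moving B.F_ID = 3 into the WHERE clause changes the meaning of the query. Rows from A with no matching B row come back with a NULL b.f_id and are then filtered out, so the LEFT JOIN effectively degrades to an INNER JOIN. If you need genuine outer-join behaviour, keep the condition inside the join, e.g. via the subselect shown in the first answer.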