I have a table with about 4 million rows. What I'd like to do is to add two more columns and then update the values of these two columns based on the third column in this same table. Basically I'm trying to set IsoWeek and IsoYear based on ReportDate.
I've added the columns and all the values are NULL, now I've started with simple update all script like below:
UPDATE Report
SET IsoWeek = DATEPART(ISO_WEEK, ReportDate), IsoYear = dbo.ISO_YEAR(ReportDate)
It took 5sec locally, but it was over 10min on Azure test DB so I cancelled and reimplemented the query with batches. It was around the same 5sec locally, but on Azure test DB it was still super slow. This time I've waited more and it completed in about 45 minutes.
I have to run a similar script on PROD Azure DB, so now I'm trying to find ways to optimize this update.
I've added WHERE Id <= 50000 to update only one chunk:
UPDATE Report
SET IsoWeek = DATEPART(ISO_WEEK, ReportDate), IsoYear = dbo.ISO_YEAR(ReportDate)
WHERE Id <= 50000
This query executed locally in 0sec and about 7sec on Azure TEST db. This seems like a good comparison test and I started comparing execution plans.
Locally:
Azure TEST db:
So I'm not sure why is it different on local and Azure Test DB and how can I make it faster on Azure.
Any ideas?
UPD:
When I removed dbo.ISO_YEAR, execution plan is now better but execution time went down from 7sec to 6sec only.
Looks like you have a scalar UDF in your query, causing a table spool, plus a lot of context switching. Azure will not inline these UDFs.
The table spool might be removed by changing the UDF to use SCHEMABINDING, but you're best off inlining it yourself, either direct in the query or as an iTVF.
Here is a request to add scalar UDF inlining to Azure:
https://feedback.azure.com/forums/217321-sql-database/suggestions/38955436-bring-scalar-udf-inlining-to-azure-sql-database
There are many things that could be different on Azure SQL vs SQL Server on-premises and that may affect performances. For example:
are you using Simple Recovery model on SQL Server? Azure SQL always run in Full Recovery
are you using ADR on SQL Server? Azure SQL always run with ADR on
are you using TDE on SQL Server? Azure SQL has TDE enabled by default
Also, you don't mention with Azure SQL Tier are you using. Standard/GeneralPurpose or Premium/BusinessCritical? Or Hyperscale? How many cores or DTUs?
Related
We have a linked server (OraOLEDB.Oracle) defined in the SQL Server environment. Oracle 12c, SQL Server 2016. There is also an Oracle client (64 bit) installed on SQL Server.
When retrieving data from Oracle (a simple query, getting all columns from a 3M row, fairly narrow table, with varchars, dates and integers), we are seeing the following performance numbers:
sqlplus: select from Oracle > OS File on the SQL Server itself
less than 2k rows/sec
SSMS: insert into a SQL Server table select from Oracle using OpenQuery (passthrough to Oracle, so remote execution)
less than 2k rows/sec
SQL Export/Import tool (in essence, SSIS): insert into a SQL Server table, using the OLEDB Oracle for source and OLEDB SQL Server for target
over 30k rows/second
Looking for ways to improve throughput using OpenQuery/OpenResultSet, to match SSIS throughput. There is probably some buffer/flag somewhere that allows to achieve the same?
Please advise...
Thank you!
--Alex
There is probably some buffer/flag somewhere that allows to achieve the same?
Probably looking for the FetchSize parameter
FetchSize - specifies the number of rows the provider will fetch at a
time (fetch array). It must be set on the basis of data size and the
response time of the network. If the value is set too high, then this
could result in more wait time during the execution of the query. If
the value is set too low, then this could result in many more round
trips to the database. Valid values are 1 to 429,496, and 296. The
default is 100.
eg
exec sp_addlinkedserver N'MyOracle', 'Oracle', 'ORAOLEDB.Oracle', N'//172.16.8.119/xe', N'FetchSize=2000', ''
See, eg https://blogs.msdn.microsoft.com/dbrowne/2013/10/02/creating-a-linked-server-for-oracle-in-64bit-sql-server/
I think there are many way to enhance the performance on the INSERT query, I suggest reading the following article to get more information about data loading performance.
The Data Loading Performance Guide
There are one method you can try which is minimizing the logging by using clustered index. check the link below for more information:
New update on minimal logging for SQL Server 2008
So I have a summary i need to return to the end user application.
It should accept 3 parameters DateType, StartDate, EndDate.
Date Type will determine the date field I use to filter the data.
The way i accomplished this was putting all the IDs of the records for a datetype into a TEMP table and then joining my summary to the list of IDs.
This worked fine when running on the query on the SQL server that houses the data.
However, that is a replicated server, so when I compiled to a stored proc that would be on the server with the rest of the application data, it slowed the query down. IE 2 seconds vs 50 seconds.
I think the cross join from the temp table that is created on the SQL server then joining to the tables on the replciation server, is causing the slow down.
Are there any methods or techniques that I can use to get around this and build this all in one stored procedure?
If I create 3 stored procedures with their own date range, then they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (which is most users who aren't sysadmins, see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible". If this option is off, SQL Server can't assume strings can be compared across the servers and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX), since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid the need for the the optimizer to figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
I just used the SQLAzureMW (SQL Azure Migration Wizard Tool) to migrate my SQL Server database to Azure SQL. It went off without a hitch - all my tables are there, the website is running fine off it, etc.
Here's what's odd: if I execute a simple SELECT statement against my tables, I get only a few of the rows. I assumed they were missing, but my website is using some of those records as if they're there. So I queried with a WHERE clause and BAM - they showed up. How the... what the... why isn't my select showing me everything? This applies to many of the tables I've tested.
SQL Azure
On-Premise
I gave up on MS SQL Management Studio and am instead using SQL Server Object Explorer from Visual Studio 2012/2013. It functions properly and allows inline editing of data.
Consider this SELECT statement:
SELECT
SvcTimeID,
LoginName,
MeanSeconds,
MedianSeconds,
RequestCount,
StdDevSeconds,
SvcDate,
CAST (TS AS INT) AS TS
FROM dbo.SvcTime
WHERE SvcDate >= #SvcDate
Where the parameter is set:
cmd.Parameters["#SvcDate"].Value = DateTime.UtcNow - new TimeSpan(31, 0, 0, 0);
Execute that statement in an Azure Web Role - brought back, say 24 rows.
Now, insert two new rows; wait at least one minute; execute the statement again. Do the recently inserted rows appear? In my case, they did not. Note: the default value of SvcDate in the database is getutcdate().
Move the SQL Azure database from the web edition to the standard (S2) edition. Rows magically appear.
Here is my theory. The issue you had was not with MS SQL Management Studio but with SQL Azure itself where, under certain circumstances, the same query will return the original rows from a cache someplace and will miss the new rows in the database.
This has blown any remaining confidence I had with Azure.
I was scared at first, but I think this has an explanation:
If you inserted some rows in connection "A" and can't find them in other sessions, maybe you have a uncommited transaction. By default, in SQL Server on premise, your second connections would hung until transaction is commited or rolled back. (Isolation level read committed)
Somehow, using the same isolation level, Azure acts differently. I seems to work in some cases as a snapshot isolation. Because of that, you can read from the table, but results are not updated. Or maybe the lock are set in a different way.
To solve this, check sysprocesses for sessions with open_tran > 0 or just be careful commmiting trans. In the example, running commit in your session "A" should do it.
Good luck!
I have 13 SQL databases some 2005 others 2008, on a VPN. I'd like to take all of the data from the "Employees" table on each database and make it a view at each location. I would then like to publish these views to 1 database on another server, all in one table marking where each came from within the origninal databases. For example the database where all the information goes to would look like this:
User Name Location
bik Bob K 1
JS John S 2
Etc.
Any help is appreciated.
I assume you want the data on the final server to be viewable, but not modifiable, and to reflect changes made to the source databases?
This would probably not perform all that well, but one do-it-yourself-way to do it would be the following (disclaimer: I haven't tried doing this myself):
Set up all the source servers as linked servers on the final server.
Create a view in this form:
SELECT *, 1 as Location
FROM [Linked Server 1].Database1.dbo.Table1
UNION ALL
SELECT *, 2 as Location
FROM [Linked Server 2].Database2.dbo.Table2
... etc ....
You might want to read this documentation on distributed queries, if you haven't already.
I believe it's also possible to use SSIS as the source of a distributed query, but a quick scan through the documentation didn't find anything about it. I mention that because SSIS would make pulling and transforming data from disaparate data sources very easy, and if you could use the final recordset as a data source, you could use an SSIS package as the backend to your view. However, again, performance would probably require considerable tuning.
If the queries don't have to be real time you could look into using SQL Server Integration Services (SSIS) to pull in the data to a local DB. you could schedule the job to run hourly/daily/weekly..
I want to set up some monitoring software that will generate an SMNP trap if a database log file goes beyond about 95% usage. It can only look at the first result in the first column of an SQL query, so what I'm looking for is an SQL Query which will just return the percentage figure ONLY in the result - eg, 95
I've found several different ways of doing similar things, but all return table heading etc, whereas I just want the figure. It'll be running this query every hour so nothing too intensive. I'm running SQL version 8.
Thanks, Mike
You could write a query against the OS DMVs to get just the single value you're looking for.
Not sure if this will work for SQL Server 2000, but I know it works as far back as SQL Server 2005. It also requires that performance counters are enabled on the host server (i.e. OS, not just SQL Server).
This query should do the trick:
SELECT cntr_value as PercentUsed
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Percent Log Used'
AND instance_name = 'your_database_name'