How to set DTU for Azure SQL Database via SQL when copying? - azure-sql-database

I know that you can create a new Azure SQL DB by copying an existing one by running the following SQL command in the [master] db of the destination server:
CREATE DATABASE [New_DB_Name] AS COPY OF [Azure_Server_Name].[Existing_DB_Name]
What I want to find out is whether it's possible to change the number of DTUs the copy will have at the time the copy is created.
As a real-life example, if we're copying a [prod] database to create a new [qa] database, the copy might only need enough resources to handle a small testing team hitting the QA DB, not a full production audience. Scaling down the assigned DTUs would result in a cheaper DB. At the moment we manually scale after the copy is complete, but this takes just as long as the initial copy (several hours for our larger DBs) because it copies the database yet again. In an ideal world we would like to skip that step and be able to fully automate the copy process.

According to the docs it is:
CREATE DATABASE database_name
AS COPY OF [source_server_name.] source_database_name
[ ( SERVICE_OBJECTIVE =
{ 'basic' | 'S0' | 'S1' | 'S2' | 'S3' | 'S4'| 'S6'| 'S7'| 'S9'| 'S12' |
| 'GP_GEN4_1' | 'GP_GEN4_2' | 'GP_GEN4_4' | 'GP_GEN4_8' | 'GP_GEN4_16' | 'GP_GEN4_24' |
| 'BC_GEN4_1' | 'BC_GEN4_2' | 'BC_GEN4_4' | 'BC_GEN4_8' | 'BC_GEN4_16' | 'BC_GEN4_24' |
| 'GP_GEN5_2' | 'GP_GEN5_4' | 'GP_GEN5_8' | 'GP_GEN5_16' | 'GP_GEN5_24' | 'GP_GEN5_32' | 'GP_GEN5_48' | 'GP_GEN5_80' |
| 'BC_GEN5_2' | 'BC_GEN5_4' | 'BC_GEN5_8' | 'BC_GEN5_16' | 'BC_GEN5_24' | 'BC_GEN5_32' | 'BC_GEN5_48' | 'BC_GEN5_80' |
| { ELASTIC_POOL(name = <elastic_pool_name>) } } )
]
[;]
CREATE DATABASE (Azure SQL Database)
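Putting the syntax above together, a copy can be scaled down at creation time. A minimal sketch (server and database names are hypothetical, and note the target objective must stay within the source's service tier):

```sql
-- Run in [master] on the destination server.
-- 'S0' must be in the same service tier (edition) as the source DB.
CREATE DATABASE [QA_DB]
    AS COPY OF [ProdServer].[Prod_DB]
    ( SERVICE_OBJECTIVE = 'S0' );
```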
You can also change the DTU level during a copy via the PowerShell API:
New-AzureRmSqlDatabaseCopy
But you can only choose "a different performance level within the same service tier (edition)" Copy an Azure SQL Database.
You can, however, copy the database into an elastic pool in the same service tier, so you wouldn't be allocating new DTU resources. You might have a single pool for all your dev/test/qa databases and drop the copy there.
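For instance, a minimal sketch (server, database, and pool names are hypothetical; the pool must already exist on the destination server, in the same service tier as the source):

```sql
-- Run in [master] on the destination server.
-- [DevTestPool] must already exist in the same service tier (edition).
CREATE DATABASE [QA_DB]
    AS COPY OF [ProdServer].[Prod_DB]
    ( SERVICE_OBJECTIVE = ELASTIC_POOL ( name = [DevTestPool] ) );
```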
If you want to change the service tier, you could use a Point-in-Time Restore instead of a Database Copy. The database can be restored to any service tier or performance level, using the Portal, PowerShell, or REST.
Recover an Azure SQL database using automated database backups


Get the name of every command issued to Redis

I need to add a local proxy like twemproxy or dynomite in front of a remote Redis server.
I want to compare the commands we use against the supported commands.
We use Redis indirectly, so I cannot scan our code to determine which commands need support with a high degree of certainty that I haven't missed something. So I would like to run the test suite
against a Redis instance and afterwards determine every command that was run.
For example, at https://github.com/twitter/twemproxy/blob/master/notes/redis.md, a list like
+---------+------------+--------------------+
| Command | Supported? | Format             |
+---------+------------+--------------------+
| DEL     | Yes        | DEL key [key ...]  |
| DUMP    | Yes        | DUMP key           |
| EXISTS  | Yes        | EXISTS key         |
| EXPIRE  | Yes        | EXPIRE key seconds |
+---------+------------+--------------------+
...
is provided to show which commands are supported.
How can I generate a list of Redis commands issued from a test suite?
You can just run:
redis-cli monitor
in your Terminal to see all commands run, along with their timestamps.

Drill - Parquet IO performance issues with Azure Blob or Azure Files

The problem:
Parquet read performance in Drill appears to be 5x - 10x worse when reading from Azure Storage, which renders it unusable for bigger data workloads.
It appears to be a problem only when reading Parquet. Reading CSV, on the other hand, runs normally.
Let's have:
Azure Blob storage account with ~1GB source.csv and Parquet files with the same data.
Azure Premium File Storage with the same files
Local disk folder containing the same files
Drill running on Azure VM in single mode
Drill configuration:
Azure blob storage plugin working as namespace blob
Azure files mounted with SMB to /data/dfs used as namespace dfs
Local disk folder used as namespace local
The VM
Standard E4s v3 (4 vcpus, 32 GiB memory)
256GB SSD
NIC 2Gbps
6400 IOPS / 96MBps
Azure Premium Files Share
1000GB
1000 IOPS base / 3000 IOPS Burst
120MB/s throughput
Storage benchmarks
Measured with dd, 1GB data, various block sizes, conv=fdatasync
FS cache dropped before each read test (sudo sh -c "echo 3 > /proc/sys/vm/drop_caches")
Local disk
+-------+------------+--------+
| Mode  | Block size | Speed  |
+-------+------------+--------+
| Write | 1024       | 37MB/s |
| Write | 64         | 16MB/s |
| Read  | 1024       | 70MB/s |
| Read  | 64         | 44MB/s |
+-------+------------+--------+
Azure Premium File Storage SMB mount
+-------+------------+---------+
| Mode  | Block size | Speed   |
+-------+------------+---------+
| Write | 1024       | 100MB/s |
| Write | 64         | 23MB/s  |
| Read  | 1024       | 88MB/s  |
| Read  | 64         | 40MB/s  |
+-------+------------+---------+
Azure Blob
The max known throughput of Azure Blobs is 60MB/s. Upload/download speeds are clamped to the target storage read/write speeds.
Drill benchmarks
The filesystem cache was purged before every read test.
IO performance observed with iotop
The queries were kept simple for demonstration purposes; execution time for more complex queries grows linearly.
Sample queries:
-- Query A: Reading parquet
select sum(`Price`) as test from namespace.`Parquet/**/*.parquet`;
-- Query B: Reading CSV
select sum(CAST(`Price` as DOUBLE)) as test from namespace.`sales.csv`;
Results
+-------------+--------------------+----------+-----------------+
| Query       | Source (namespace) | Duration | Disk read usage |
+-------------+--------------------+----------+-----------------+
| A (Parquet) | dfs(smb)           | 14.8s    | 2.8 - 3.5 MB/s  |
| A (Parquet) | blob               | 24.5s    | N/A             |
| A (Parquet) | local              | 1.7s     | 40 - 80 MB/s    |
+-------------+--------------------+----------+-----------------+
| B (CSV)     | dfs(smb)           | 22s      | 30 - 60 MB/s    |
| B (CSV)     | blob               | 29s      | N/A             |
| B (CSV)     | local              | 18s      | 68 MB/s         |
+-------------+--------------------+----------+-----------------+
Observations
When reading Parquet, more threads spawn, but only the cifsd process accounts for the IO activity.
I tried tuning the Parquet reader performance as described here, but without any significant results.
There is a big peak of egress data when querying Parquet from Azure Storage that exceeds the Parquet data size several times: the files total ~300MB, but the egress peak for one read query is about 2.5GB.
Conclusion
Reading Parquet from Azure Files is for some reason slowed down to ridiculous speeds.
Reading Parquet from Azure Blob is even a bit slower.
Reading Parquet from the local filesystem is nicely fast, but not suitable for real use.
Reading CSV from any source utilizes the storage throughput normally, so I assume some problem or misconfiguration of the Parquet reader.
The questions
What are the reasons that Parquet read performance from Azure Storage is so drastically reduced?
Is there a way to optimize it?
I assume that you have cross-checked the IO performance issue using Azure Monitor. If the issue still persists, I would like to work closely with you on it. This may require a deeper investigation, so if you have a support plan, I request that you file a support ticket; otherwise, please let us know and we will try to help you get one-time free technical support. In that case, could you send an email to AzCommunity[at]Microsoft[dot]com referencing this thread? Please mention "ATTN subm" in the subject field. Thank you for your cooperation on this matter; I look forward to your reply.

Load data from csv in google cloud storage as bigquery 'in' query

I want to compose a query like the following in BigQuery; my file is stored in Google Cloud Storage:
select * from my_table where id in ('gs://bucket_name/file_name.csv')
I get no results. Is this possible, or am I missing something?
Using the CLI or API you can run ad-hoc queries against GCS files without creating tables; a full example is covered here: Accessing external (federated) data sources with BigQuery's data access layer
A code snippet is here:
bq query --external_table_definition=healthwatch::date:DATETIME,bpm:INTEGER,sleep:STRING,type:STRING#CSV=gs://healthwatch2/healthwatchdetail*.csv 'SELECT date,bpm,type FROM healthwatch WHERE type = "elevated" and bpm > 150;'
Waiting on bqjob_r5770d3fba8d81732_00000162ad25a6b8_1 ... (0s)
Current status: DONE
+---------------------+-----+----------+
| date | bpm | type |
+---------------------+-----+----------+
| 2018-02-07T11:14:44 | 186 | elevated |
| 2018-02-07T11:14:49 | 184 | elevated |
+---------------------+-----+----------+
On the other hand, you can create a permanent EXTERNAL table with schema autodetection, which gives you web UI support and persistence; read more about that here: Querying Cloud Storage Data
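Such a permanent external table can also be created with standard-SQL DDL, for instance (a sketch; the dataset name `mydataset` is hypothetical, the bucket is the one from the snippet above, and the schema is autodetected because none is given):

```sql
-- Permanent table backed by the CSV files in GCS.
CREATE EXTERNAL TABLE mydataset.healthwatch
OPTIONS (
  format = 'CSV',
  uris = ['gs://healthwatch2/healthwatchdetail*.csv']
);
```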

How do I update a database that's in use?

I'm building a web application using ASP.NET MVC with SQL Server and my development process is going to be like
Make changes in SQL Server locally
Create LINQ-to-SQL classes as necessary
Before committing any change set that touches the database, script out the database so that I can regenerate it if I ever need to
What I'm confused about is how I'm going to update the production database, which will have live data in it.
For example, let's say I have a table like
People
========================================
Id | FirstName | LastName | FatherId
----------------------------------------
1 | 'Anakin' | 'Skywalker' | NULL
2 | 'Luke' | 'Skywalker' | 1
3 | 'Leah' | 'Skywalker' | 1
in production and locally and let's say I add an extra column locally
ALTER TABLE People ADD LightsaberColor VARCHAR(16)
and update my LINQ to SQL, script it out, test it with sample data and decide that I want to add that column to production.
As part of a deployment process, how would I do that? Does there exist some tool that could read my database generation file (call it GenerateDb.sql) and figure out that it needs to update the production People table to put default values in the new column, like
People
==========================================================
Id | FirstName | LastName | FatherId | LightsaberColor
----------------------------------------------------------
1 | 'Anakin' | 'Skywalker' | NULL | NULL
2 | 'Luke' | 'Skywalker' | 1 | NULL
3 | 'Leah' | 'Skywalker' | 1 | NULL
???
You should have a staging DB that is identical to the production database.
When you make any changes to the database, you should apply them to the staging DB first, and you can of course compare the dev and staging DBs to generate a script with the differences.
Visual Studio has a Schema Compare feature that generates a script with the differences between two databases.
There are some other tools as well that do the same.
So, you can generate the script, apply it to the staging DB, and if everything goes fine, apply the script to the production DB.
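The generated script for the LightsaberColor example above might look something like this (a sketch; the existence guard makes it safe to re-run):

```sql
-- Add the new column only if it is not already there.
IF COL_LENGTH('dbo.People', 'LightsaberColor') IS NULL
BEGIN
    ALTER TABLE dbo.People ADD LightsaberColor VARCHAR(16) NULL;
END
```

Existing rows then get NULL in the new column, as shown in the question's second table.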
That's right, you should have a staging process. Whenever we commit features we use TFS to move from Development to Production; that step is called staging. You can look up the history in TFS, whether for the database or the solution, and the same applies if you're not using TFS with Visual Studio and MS SQL Server.
I guess that you are committing your features directly to your production server. You could test them on your test server first to see the changes.
Another thing: if you use stored procedures, you can use temporary tables, if you're asking about the script.
I guess this is your first time committing to a live server.

LDAP user data caching on local database

I am integrating LDAP authentication in my enterprise web application. I would like to show listings of people's names and emails. Instead of querying the LDAP server for the name and email every time a listing containing several users is displayed, I thought about caching the data locally in the database.
Do you guys know about caching LDAP data best practices?
Should I cache LDAP user data?
When should I insert and refresh the data?
I did the same thing when developing web applications with LDAP authentication.
Each time a user logs in, I retrieve their LDAP uid and check whether it is in the database. If not, I get the user information from LDAP, in your case: name, surname (?) and email. Then I insert it into the user table of the database.
The user table schema should look like this:
 ______________
|     User     |
|--------------|
| - id         |
| - ldap_uid   |
| - name       |
| - first_name |
| - mail       |
 --------------
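In SQL that table might look like this (a sketch; the table name, types, and lengths are assumptions, adjust them to your directory's attribute sizes):

```sql
CREATE TABLE app_user (
    id         INT          NOT NULL PRIMARY KEY,
    ldap_uid   VARCHAR(64)  NOT NULL UNIQUE,  -- key used to match the LDAP entry
    name       VARCHAR(128),
    first_name VARCHAR(128),
    mail       VARCHAR(256)
);
```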