I wanted to broach the issue of Microsoft's Hadoop distribution, HDInsight.
Given that a connector to Hadoop is provided, does anyone have experience with HDInsight, and in particular a comparison between the Hadoop / SQL Server connector and HDInsight / SQL Server, from a real-life DTP scenario or a personal one-node installation?
http://sqlmag.com/blog/use-ssis-etl-hadoop
http://www.microsoft.com/en-us/download/details.aspx?id=27584
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
HDInsight is the distribution of Hadoop that Microsoft maintains for use in Azure. You could roughly compare it to Amazon Elastic MapReduce: both serve the purpose of being a hosted Hadoop service with almost no management overhead.
The Hortonworks Data Platform for Windows contains the open source changes that Hortonworks and Microsoft have collaborated on to make Hadoop run well on Windows. HDP isn't HDInsight.
In short - you don't need to use HDInsight if you want to run Hadoop in a Windows environment.
While I can't speak directly to using HDInsight and moving data back and forth to SQL Server, I've implemented a data processing solution using SQL Server, Hadoop, and Elastic MapReduce. Barring some data quality issues and BULK INSERT weirdness, the process was painless.
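For flavor, the SQL Server side of that kind of load is nothing exotic. Here's a minimal sketch of pulling a Hadoop job's delimited output into a table; the instance, database, table, and file names are all hypothetical:

```powershell
# Minimal sketch: bulk-load a Hadoop job's delimited output into SQL Server.
# Instance, database, table, and file path are hypothetical.
Import-Module SqlServer

$bulkLoad = @"
BULK INSERT dbo.HadoopOutput
FROM 'D:\hadoop\output\part-00000.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '0x0a',  -- Hadoop output ends lines with \n, not \r\n
    TABLOCK
);
"@

Invoke-Sqlcmd -ServerInstance 'SQLPROD01' -Database 'Staging' -Query $bulkLoad
```

The row terminator is a common source of the "weirdness" mentioned above: BULK INSERT defaults to expecting \r\n line endings, while Hadoop output files typically use bare \n.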
Finally, you ask "do we really want to run Hadoop-size datasets on Windows servers?" - Windows performs well and has solid tooling around it. My skepticism about running Hadoop and other Java platform software on Windows comes from legacy Java I/O issues and a lack of community support, not from any performance issues.
The largest issue that Windows shops will find when moving to Hadoop is limited support in community forums and channels once a problem becomes a Hadoop + Windows issue. It's very easy for people to throw their hands up and say "Nope, not helping out, don't have Windows." With time and adoption, this problem goes away. Besides, nothing says you have to finish on the same platform you start with. You could easily deploy with HDP on Windows and move to HDP on Linux at a later date.
I have put together some SQL Server and Hadoop basics for DBAs that should be helpful.
I'm new to IBM IIDR and am considering using it for data replication between Db2, Kafka, and PostgreSQL, but I can't find an easy way to test this software. I know that the Management Console and Access Server can be obtained from IBM Fix Central, but how can I get the CDC engines to test on my local machine?
Any help would be much appreciated.
You can find the replication engines for Db2, Kafka, and PostgreSQL on Fix Central as well.
For example, the IBM InfoSphere Data Replication CDC for all Linux agents 11.4.0.2 Build x installer has all the Linux x64 engines.
The installer will ask you which database type you would like to use. If you will be replicating from PostgreSQL, please select "PostgreSQL source". If you will replicate to PostgreSQL, select "FlexRep". For Kafka and Db2, simply select the matching entry.
To get started with CDC for Kafka, I recommend starting with this CDC Kafka Installation and Configuration guide. More resources are available on the IBM Data Replication wiki.
To get started with CDC for PostgreSQL as a target, see the JDBC configuration information in Knowledge Center. For PostgreSQL as a source, check here for the required database user privileges and settings.
CDC for Db2 has a number of deployment options to choose from, described here.
If you can't find the info you need, reach out to the IBM Data Replication support team.
Hope that helps,
Sarah
IBM Data Replication development
So recently I found a pretty neat feature where SQL Azure can schedule vulnerability assessment scans regularly.
To configure this you must go into each database on your server and configure the storage account and who should receive the reports.
Let's say I have 100 databases - this is going to take a very long time. Is there a way I can set up the vulnerability scan at the server level?
Or failing that, a script that can set this scan up (preferably a SQL script, if possible)?
Yes, you can enable Vulnerability Assessment at the server level.
However, periodic scans can only be configured at the database level. The Azure SQL Database team has said it will provide more options to run scans and analyze results at scale; the currently available option is to use ARM APIs via the Azure SQL Management SDK library.
The team also mentioned that PowerShell cmdlets will be released in the near future to enable automation via PowerShell scripts, and that as it makes more progress the ability to run at scale will be available via the portal as well.
Currently in SQL Vulnerability Assessment, the target storage and periodic scans can be configured only at the database level.
As the feature develops, we will provide more options to run scans and analyze results at scale.
The currently available option is to use ARM APIs via the Azure SQL management SDK: https://www.nuget.org/packages/Microsoft.Azure.Management.Sql
PowerShell cmdlets will be released in the near future to enable automation via PowerShell scripts.
And as we make more progress, the ability to run at scale will be available via the portal as well.
In the meantime, we’re very glad to get further feedback on the current functionality of VA and additional feature requests.
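For reference, once those cmdlets are available, the per-database loop the question asks for might look something like the sketch below. The cmdlet and parameter names follow what later shipped in the Az.Sql module, so treat them, and the resource names, as assumptions rather than a definitive script:

```powershell
# Sketch: apply a vulnerability assessment storage target and weekly
# recurring scan to every database on a server. Cmdlet and parameter names
# are assumptions based on the Az.Sql module; resource names are hypothetical.
Import-Module Az.Sql

$rg     = 'MyResourceGroup'
$server = 'myserver'

Get-AzSqlDatabase -ResourceGroupName $rg -ServerName $server |
    Where-Object { $_.DatabaseName -ne 'master' } |
    ForEach-Object {
        Update-AzSqlDatabaseVulnerabilityAssessmentSetting `
            -ResourceGroupName $rg `
            -ServerName $server `
            -DatabaseName $_.DatabaseName `
            -StorageAccountName 'mysecuritystorage' `
            -RecurringScansInterval Weekly `
            -EmailAdmins $true
    }
```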
Currently I am hosting my online application on an Azure VM. This is a pretty standard Umbraco website with around 300 visitors per day, nothing special here.
Details of Azure VM:
- Basic A3
- 4 cores
- 7 GB Memory
In the current situation SQL Server is installed on the VM itself, and this is working fine, but I am not a great expert in maintenance. A solution I found is migrating the database to SQL Azure.
Looking at my current website, I decided to do this and migrated the database to SQL Azure:
- S3 Standard
- 100 DTU
- 250GB
After the migration I switched the connection string with the connection string that was provided in the Azure portal. When I reloaded my website the loading time was suddenly three times slower.
For now I have switched back to the local SQL database, but I am wondering whether it is normal for the local SQL Server to be faster than SQL Azure in this case.
I hope someone can answer this; please let me know if more information is required.
Best regards, Martijn
EDIT
The issue is resolved! I found out that the SQL Azure server I created was located in a different region than the Azure VM. After I created a new SQL Azure server in the same region, the performance issues were fixed.
Good to hear your perf issues are fixed. In general, comparing the performance of a local database versus a PaaS database is not always an apples-to-apples comparison for a number of reasons:
- Azure SQL Database is a highly available service (99.99%) that requires synchronous commits to a secondary database. A local database typically is not configured for high availability.
- Azure SQL Database provides automatic backup. Depending on your setup, a local database may or may not be configured for backup.
- Network latency, which doesn't exist for a local database, affects every round trip to Azure SQL Database (a quick timing test, sketched below, makes this visible).
- The memory and CPU of an S3 Azure SQL Database and an A3 VM are likely not the same.
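To make the latency point concrete, you can time the same trivial query against both servers from the web VM. A minimal sketch, with hypothetical server names and credentials:

```powershell
# Sketch: compare round-trip time for a trivial query against the local
# instance and Azure SQL Database. Server names, database, and credentials
# are hypothetical; a chatty page issues many such round trips per load.
Import-Module SqlServer

$local = Measure-Command {
    Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'Umbraco' -Query 'SELECT 1;'
}
$azure = Measure-Command {
    Invoke-Sqlcmd -ServerInstance 'myserver.database.windows.net' -Database 'Umbraco' `
                  -Username 'sqladmin' -Password 'P@ssw0rd!' -Query 'SELECT 1;'
}
'Local: {0:N0} ms   Azure: {1:N0} ms' -f $local.TotalMilliseconds, $azure.TotalMilliseconds
```

Even a few extra milliseconds per round trip add up quickly when a single page load issues dozens of queries, which is why a cross-region connection string made the site noticeably slower.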
At my current project, SSAS is running on a standalone server and I'd like to know the hardware specs (CPU, Memory, etc) and OS version.
The catch is I don't have access to the OS (or even remote access to perfmon or eventvwr), and the DBAs have so far ignored my requests. In the meantime, I'm wondering if there's an XMLA command I can run or a SQL query against one of the DMVs that will provide this information.
Also, I have admin rights to the SSAS instance and can run Profiler traces against it, so if there's another way, I'm all ears!
You can't find out hardware or operating system specifications through SSAS.
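That said, the DMVs do expose the product version, edition, and server name, just not the underlying hardware or OS. A minimal sketch using Invoke-ASCmd from the SqlServer PowerShell module, with a hypothetical instance name:

```powershell
# Sketch: query an SSAS DMV for server properties (product version, edition,
# server name). Hardware and OS details are not exposed here. The instance
# name is hypothetical; Invoke-ASCmd ships in the SqlServer PowerShell module.
Import-Module SqlServer

Invoke-ASCmd -Server 'SSASPROD01' -Query 'SELECT * FROM $SYSTEM.DISCOVER_PROPERTIES'
```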
I am using the Azure VM role. I created a separate VHD (uploaded to a page blob) for storing SQL data files (to overcome the data persistence issue with the VM role). SharePoint 2010 has been configured on the VM. I want to run two instances of the Azure VM, but I am failing because mounting the data VHD in write mode on two instances is not possible. Can anyone help me out with this?
To add to what Joannes said:
A Cloud Drive may be mounted by exactly one writer, but you can make any number of read-only snapshots. This won't help with the scale-out scenario you're describing, but I just wanted to clarify.
SharePoint 2010 is not currently a supported configuration in a VM Role. There are licensing, SQL Azure compatibility, scale-out, and potentially other issues to consider. The same goes for installing SQL Server in a VM Role.
Support issues aside, you could look into Azure Connect as a way to reach an on-premise SQL Server instance. This alleviates your need to store SQL Server data files in a Cloud Drive. This will have bandwidth-related performance and cost implications, but it's certainly an option.
CloudDrive is not intended for scaling out. In other words, a blob can be mounted by no more than one VM at a time. This limitation is very unlikely to be lifted in the future, as a single blob is not intended to support scalable writes.