Azure Cosmos DB - How to know the unique document structures in a collection?

Background: We are working on a migration project in which we have to migrate data from Azure Cosmos DB to Azure SQL Database. On analyzing the Cosmos DB account, we found that the collection contains documents with different structures.
Requirement: We have been searching for a query that tells us how many unique/different document structures exist in the collection, but we have not been able to find one. Any links or pointers will be appreciated. Kindly let us know if any further details are required.

As @Mark Brown said, we can use Hackolade to analyze our structure.
Hackolade was specifically adapted to support data modeling of multiple object types within a single collection, while also supporting multiple collections, in order to fit the pricing model of Cosmos DB.
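If Hackolade is not an option, the distinct shapes can also be approximated with a small script that groups documents by their key signature. Below is a minimal sketch using the azure-cosmos Python SDK; the endpoint, key, database and container names are placeholders, and it only looks at top-level field names, not nested structure.

    from collections import Counter
    from azure.cosmos import CosmosClient  # pip install azure-cosmos

    # Placeholder connection details - replace with your own account values.
    ENDPOINT = "https://<account>.documents.azure.com:443/"
    KEY = "<primary-key>"

    client = CosmosClient(ENDPOINT, credential=KEY)
    container = client.get_database_client("<database>").get_container_client("<container>")

    signatures = Counter()
    for doc in container.query_items(query="SELECT * FROM c",
                                     enable_cross_partition_query=True):
        # Treat the sorted set of top-level, non-system field names as the "structure".
        keys = tuple(sorted(k for k in doc if not k.startswith("_")))
        signatures[keys] += 1

    print(f"{len(signatures)} distinct document structures found:")
    for keys, count in signatures.most_common():
        print(count, keys)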

Related

Best way to replicate MongoDB NoSQL into SQL tables

How can I replicate (incremental load) MongoDB (NoSQL) into SQL tables?
We have a web-based solution that loads data into MongoDB. The data size is almost 1 TB. We need to do BI reporting in the Looker BI tool, but Looker doesn't support MongoDB directly, so we have to replicate our data into SQL form; Redshift is the target database.
Main requirements for parsing NoSQL to SQL:
The parent node should become the main table.
Nested nodes/arrays should become separate tables with a parent key (foreign key).
Whenever a new field is introduced in the MongoDB source, it should automatically start replicating to the target database.
Incremental refresh from source to target.
I've seen Stitch Data ETL, which fits my requirements, but I'm looking for an open-source ETL/DB tool or library.
Please help.
Posting an answer to help out others with the same requirements.
I was not able to find any open-source ETL tool that can fulfil all four of the above requirements, so I tried writing Python code to do it myself. In the end, a paid tool named Precog helped me fulfil all of the requirements, and it is a little bit cheaper than Stitch Data ETL.
Thanks
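For anyone attempting the Python route mentioned above, a minimal sketch of the parent/child flattening (requirements 1 and 2) might look like the following. The collection, table and field names are hypothetical, and incremental refresh and schema drift (requirements 3 and 4) are not handled here.

    from pymongo import MongoClient  # pip install pymongo

    def flatten(doc):
        """Split one MongoDB document into a parent row and child rows.

        Nested dicts are flattened into the parent row with a dotted prefix;
        nested lists become rows in a child "table" keyed by the parent _id.
        """
        parent, children = {}, {}
        for key, value in doc.items():
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    parent[f"{key}.{sub_key}"] = sub_value
            elif isinstance(value, list):
                children[key] = [
                    {"parent_id": doc["_id"], **item} if isinstance(item, dict)
                    else {"parent_id": doc["_id"], "value": item}
                    for item in value
                ]
            else:
                parent[key] = value
        return parent, children

    # Hypothetical source collection.
    client = MongoClient("mongodb://localhost:27017")
    for doc in client["mydb"]["orders"].find():
        parent_row, child_rows = flatten(doc)
        # parent_row -> INSERT into the main "orders" table
        # child_rows -> INSERT into one child table per array field, with parent_id as the FK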

General question about ETL solutions for Azure for a small operation

We use data in two ways: either retrieving survey data from other organizations, or creating survey instruments ourselves and soliciting data from organizations under our organization.
We have a database where our largest table is perhaps 10 million records. We extract and upload most of our data on an annual basis, and occasionally need to ETL large numbers of tables from organizations such as the Census, the American Community Survey, etc. Our database is all on Azure, and currently the way I load Census flat files/.csv files is by re-saving them as Excel and using the Excel import wizard.
All of the 'T' in ETL is happening within programmed procedures within my staging database before moving those tables (using Visual Studio) to our reporting database.
Is there a more sophisticated technology I should be using, and if so, what is it? All of my education in this matter comes from perusing Google and watching YouTube, so my grasp on all of the different terminology is lacking and searching on the internet for ETL is making it difficult to get to what I believe should be a simple answer.
For a while I thought we would eventually graduate to SSIS, but I learned that SSIS is primarily used when you have an on-premises database. I've tried looking at dynamic SQL using BULK INSERT, only to find that BULK INSERT doesn't work with Azure SQL databases. And so on.
Recently I've been learning about Azure Data Factory and something called the Bulk Copy Program (bcp) using Windows PowerShell.
Does anybody have any suggestions as to what technology I should look at for a small-scale BI reporting solution?
I suggest you use Azure Data Factory; it has good performance for large data transfers.
Reference: Copy performance and scalability achievable using ADF.
The Copy activity lets you use table data, a query, or a stored procedure to filter data at the source.
The sink lets you select a destination table, a stored procedure, or an auto-created table (bulk insert) to receive the data.
Data Factory Mapping Data Flows provide more features for data conversion.
Ref: Copy and transform data in Azure SQL Database by using Azure Data Factory.
Hope this helps.
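Separately from Data Factory, if the goal is just to replace the re-save-as-Excel step for the Census .csv files, a short script can load a .csv straight into an Azure SQL table. Below is a minimal sketch using pyodbc; the server, database, table and file names are placeholders, and it assumes the destination table already exists with columns matching the CSV header.

    import csv
    import pyodbc  # pip install pyodbc

    # Placeholder connection details for an Azure SQL database.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<yourserver>.database.windows.net;"
        "DATABASE=<staging-db>;UID=<user>;PWD=<password>"
    )
    cursor = conn.cursor()
    cursor.fast_executemany = True  # send inserts in batches instead of row by row

    with open("census_extract.csv", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first row holds column names
        columns = ", ".join(header)
        placeholders = ", ".join("?" for _ in header)
        sql = f"INSERT INTO stg.CensusExtract ({columns}) VALUES ({placeholders})"
        cursor.executemany(sql, list(reader))

    conn.commit()
    conn.close()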

Convert an online JSON set of files to a relational DB (SQL Server, MySQL, SQLITE)

I'm using a tool called Teamwork to manage my team's projects.
They have an online API that consists of JSON files accessible with authorisation:
https://developer.teamwork.com/projects/introduction/welcome-to-the-teamwork-projects-api
I would like to be able to convert this online data to a SQL DB so I can create custom reports for my management.
I can't seem to find anything ready-made to do that.
I need a strategy to do this.
If you know how to program, this should be pretty straightforward.
In Python, for example, you could:
Come up with a SQL schema that maps to the JSON data objects you want to store. Create it in a database of your choice.
Use the Requests library to download the JSON resources, if you don't already have them on your system.
Convert each JSON resource to a Python data structure using json.loads.
Connect to your database server using the appropriate Python library for your database, e.g. PyMySQL.
Iterate over the Python data, inserting rows into the database as appropriate. This is essentially the JSON-to-tables mapping from step 1 made procedural (a small sketch follows below).
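As a concrete illustration of those steps, here is a minimal sketch that pulls one Teamwork resource and loads it into SQLite. The endpoint path, authentication scheme, field names and table layout are assumptions and would need to be checked against the actual Teamwork API response.

    import sqlite3
    import requests  # pip install requests

    # Step 1: a schema that maps to the JSON objects we want to store.
    db = sqlite3.connect("teamwork.db")
    db.execute("CREATE TABLE IF NOT EXISTS projects (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")

    # Step 2: download the JSON resource (URL and auth are assumptions; see the API docs).
    resp = requests.get(
        "https://<yourcompany>.teamwork.com/projects.json",
        auth=("<api-key>", "x"),
    )
    resp.raise_for_status()

    # Step 3: parse the JSON into Python data structures.
    payload = resp.json()

    # Steps 4-5: connect (done above for SQLite) and insert rows.
    for project in payload.get("projects", []):
        db.execute(
            "INSERT OR REPLACE INTO projects (id, name, status) VALUES (?, ?, ?)",
            (int(project["id"]), project.get("name"), project.get("status")),
        )
    db.commit()
    db.close()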
If you are not looking to do this in code, you should be able to use an open-source ETL tool to do this transformation. At LinkedIn a coworker of mine used to use Talend Data Integration for solid ETL work of a very similar nature (JSON to SQL). He was very fond of it and I respected his opinion, so I figured I should mention it, although I have zero experience of it myself.

Schema in MongoDB

I am using SQL Server 2008, and I have created a database called DemoDb, as you can see in the image below. I am now migrating this existing database to MongoDB, but I am not familiar with schemas in MongoDB. In the image below, in the red square, you can see the schemas like admin, login, User, etc.
My question is: how can I migrate or create this kind of schema in MongoDB?
Can you please advise or help me?
Any help or suggestions will be highly appreciated.
Thanks
Did you read the MongoDB manual at least?
There is a chapter, SQL to MongoDB Mapping Chart, which has useful information for you.
SQL Terms/Concepts - MongoDB Terms/Concepts
table - collection
row - document or BSON document
column - field
table joins - $lookup, embedded documents
...
There is no way around it: you must familiarize yourself with MongoDB basics at least.
P.S. MongoDB University has free courses.
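To make the mapping chart concrete, here is a small sketch using the pymongo driver, with hypothetical collection and field names, of what a typical table plus join turns into on the MongoDB side.

    from pymongo import MongoClient  # pip install pymongo

    db = MongoClient("mongodb://localhost:27017")["DemoDb"]  # database name from the question

    # SQL: INSERT INTO users (user_id, name) VALUES (1, 'Alice');
    # MongoDB: a "row" becomes a document in the "users" collection, and data
    # that would need a child table in SQL can simply be embedded.
    db.users.insert_one({
        "_id": 1,
        "name": "Alice",
        "logins": [{"at": "2023-01-01T10:00:00Z", "ip": "10.0.0.1"}],  # embedded, not joined
    })

    # SQL: SELECT * FROM users JOIN orders ON orders.user_id = users.user_id;
    # MongoDB: the equivalent of a join is an aggregation with $lookup.
    joined = db.users.aggregate([
        {"$lookup": {
            "from": "orders",
            "localField": "_id",
            "foreignField": "user_id",
            "as": "orders",
        }}
    ])
    for doc in joined:
        print(doc)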

How to insert data into Master Data Services programmatically

I'm trying out Microsoft Master Data Services and I would like to add data to the database programmatically. I'm starting to understand the model/entity/member structure, but I'm not yet sure about it. If you have a good explanation of this structure, please share.
Say somebody adds a new employee in an ERP system and I would like to send that to MDS. How would I do that? Is the data that I want to add a new member? Looking at the following information (http://technet.microsoft.com/en-us/library/hh230995), it seems the only way to import data is through entities?
Thanks in advance for any useful information about this!
Let's start with the basics.
Entities in Master Data Services (MDS) are roughly analogous to tables in a regular database.
Every entity must live in a model.
A model can contain any number of entities.
The Metadata* methods you see on that page can be used to create, read and update models and entities. Once you have modeled your ERP tables as an MDS model, you can use the EntityMembersCreate API (with the relevant model/entity information) to create a member (roughly analogous to a row in a table). You can use EntityMembersUpdate to update members and EntityMembersDelete to delete them.
Another way to get large amounts of data into MDS is by using Entity Based Staging. Entity Based Staging allows you to use tools like SSIS to get bulk data into MDS. A good primer here: http://msdn.microsoft.com/en-us/sqlserver/hh802433.aspx.
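As an illustration of the entity-based staging route outside of SSIS, the leaf staging table and stored procedure that MDS generates for an entity can also be driven from a short script. The sketch below assumes an entity named Employee whose generated objects are stg.Employee_Leaf and stg.udp_Employee_Leaf; the connection details are placeholders, and the column and parameter names follow the standard MDS 2012 staging layout, so they should be verified against your installation.

    import pyodbc  # pip install pyodbc

    # Placeholder connection string to the MDS database.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<mds-server>;DATABASE=<MDS_DB>;Trusted_Connection=yes"
    )
    cursor = conn.cursor()

    batch_tag = "erp-employees-2024-01"

    # 1. Stage the new member in the entity's generated leaf staging table.
    #    ImportType 0 = create new members / update changed attribute values.
    cursor.execute(
        """
        INSERT INTO stg.Employee_Leaf (ImportType, ImportStatus_ID, BatchTag, Code, Name)
        VALUES (0, 0, ?, ?, ?)
        """,
        batch_tag, "EMP-1001", "Jane Doe",
    )

    # 2. Kick off the staging process for that batch against a model version.
    cursor.execute(
        "EXEC stg.udp_Employee_Leaf @VersionName = ?, @LogFlag = 1, @BatchTag = ?",
        "VERSION_1", batch_tag,
    )

    conn.commit()
    conn.close()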
I hope this helps. Feel free to ask more questions.
I like using a generic data-access object that classes in my model inherit from. Each class has a one-to-one relationship with a table in the database.
We're using SSIS to replicate data from our CRM (as well as other data sources) into our MDS instance (for the time being). If you're not familiar with the tool, I'd recommend it for moving data around; it's relatively easy to pick up the basics. If you go this route, here's a great resource I followed to push data into our MDS system:
http://www.sqlchick.com/entries/2013/2/16/importing-data-into-master-data-services-2012-part-2.html