How to migrate data from MongoDB to SQL Server? [closed]

I searched around and found that there are ways to transfer/sync data from SQL Server to MongoDB.
I also know that MongoDB stores data in collections instead of tables, and that the data is stored differently.
I want to know whether it is possible to move data from MongoDB to SQL Server. If yes, then how, and what tools/topics should I use?

Of course it's possible, but you will need to find a way to force the flexibility of a document database like MongoDB into an RDBMS like SQL Server.
That means you need to define how you want to handle missing fields (will it be a NULL in the database column? or a default value?) and other things that usually don't fit well in a relational database.
That said, you can use an ETL tool that is able to connect to both databases. SSIS is one example if you want to stay in the Microsoft world (you can check Importing MongoDB Data Using SSIS 2012 to get an idea), or you can go for an open source tool like Talend Big Data Integration, which has a connector to MongoDB (and of course to SQL Server).
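As a rough illustration of what that transformation step can look like outside of a packaged ETL tool, here is a minimal Python sketch using pymongo and pyodbc. The collection, table, column list, and connection strings are invented for the example, and missing document fields simply become NULLs:

```python
# Minimal sketch: flatten MongoDB documents into a fixed relational shape.
# Database, collection, table and column names here are hypothetical examples.
from pymongo import MongoClient
import pyodbc

mongo = MongoClient("mongodb://localhost:27017")
docs = mongo["shop"]["customers"].find()

sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=Shop;Trusted_Connection=yes"
)
cursor = sql.cursor()

# Decide up front which fields map to which columns; any field a document
# lacks is inserted as NULL (the "missing field" decision mentioned above).
columns = ["name", "email", "city"]
for doc in docs:
    row = [doc.get(col) for col in columns]   # missing key -> None -> NULL
    cursor.execute(
        "INSERT INTO dbo.Customers (name, email, city) VALUES (?, ?, ?)", row
    )

sql.commit()
```

Nested documents and arrays would need to be flattened into child tables in the same pass, which is exactly the kind of mapping SSIS or Talend lets you model visually.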

There is no way to directly move data from MongoDB to SQL Server. Because MongoDB data is non-relational, any such movement must involve defining a target relational data model in SQL Server, and then developing a transformation that can take the data in MongoDB and transform it into the target data model.
Most ETL tools such as Kettle or Talend can help you with this process, or if you're a glutton for punishment, you can just write gobs of code.
Keep in mind that if you need this transformation process to be online, or applied more than once, you may need to tweak it for any small changes in the structure or types of the data stored in MongoDB. As an example, if a developer adds a new field to a document inside a collection, your ETL process will need rethinking (possibly new data model, new transformation process, etc.).
If you are not sold on SQL Server, I'd suggest you consider Postgres, because there is a widely-used open source tool called MoSQL that has been developed expressly for the purpose of syncing a Postgres database with a MongoDB database. It's primarily used for reporting purposes (getting data out of MongoDB and into an RDBMS so one can layer analytical or reporting tools on top).
MoSQL enjoys wide adoption and is well supported, and for badly tortured data, you always have the option of using the Postgres JSON data type, which is not supported by any analytics or reporting tools, but at least allows you to directly query the data in Postgres. Also, and now my own personal bias is showing through, Postgres is 100% open source, while SQL Server is 100% closed source. :-)
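To give a flavour of that JSON escape hatch: once documents land in a Postgres json/jsonb column, the ->> operator lets you query into them with ordinary SQL. A tiny sketch using psycopg2 (the events table and its fields are invented for the example):

```python
# Sketch only: query documents stored in a Postgres jsonb column.
# The "events" table and its JSON fields are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=reporting user=postgres")
cur = conn.cursor()

# ->> extracts a JSON field as text, so ordinary WHERE clauses work on it.
cur.execute(
    "SELECT data->>'userId', data->>'amount' "
    "FROM events WHERE data->>'type' = %s",
    ("purchase",),
)
for user_id, amount in cur.fetchall():
    print(user_id, amount)
```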
Finally, if you are only extracting the data from MongoDB to make analytics or reporting easier, you should consider SlamData, an open source project I started last year that makes it possible to execute ANSI SQL on MongoDB, using 100% in-database execution (it's basically a SQL-to-MongoDB API compiler). Most people using the project seem to be using it for analytics or reporting use cases. The advantage is that it works with the data as it is, so you don't have to perform ETL, and of course it's always up to date because it runs directly on MongoDB. A disadvantage is that no one has yet built an ODBC / JDBC driver for it, so you can't directly connect BI tools to SlamData.
Good luck!

There is a tool provided by MongoDB called mongoexport that is capable of exporting CSV files. These CSV files can be easily imported into MySQL. Good luck!
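For a rough idea of how that pipeline might look (collection, field, and table names below are invented, and the loading half works the same way whether the target is MySQL or SQL Server):

```python
# Sketch: after exporting with something like
#   mongoexport --db shop --collection customers --type=csv \
#               --fields name,email,city --out customers.csv
# the CSV can be bulk-loaded into a staging table.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=Shop;Trusted_Connection=yes"
)
cur = conn.cursor()

with open("customers.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)   # the first line of the export is the header
    rows = [(r["name"], r["email"], r["city"]) for r in reader]

cur.executemany(
    "INSERT INTO dbo.Customers_Staging (name, email, city) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
```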


How can I have multiple people editing the same SQL Server Database?

Some friends and I are working on a school project, and I've been looking for a way to let us all work on and edit the same database, just as we would on a VS project through GitHub.
I've tried importing the database into a SQL Database project in VS so we could work through GitHub, but I'm not sure whether VS is as effective as SSMS itself.
It doesn't matter if it's not through GitHub; I just want to know whether there is a way for us to work on a database without having to export it and then import it again.
Edit: By 'editing' I meant working on the database overall: making changes, getting data, editing tables, etc.
The short answer is no: as of February 2023 there is no established tooling (outside of experimental databases like Dolt) for distributed, collaborative work on both design and data in an RDBMS, especially not in the SQL Server-based Microsoft/VS ecosystem.
The reason is rooted in a reality of database-centric software development: the actual data within a database is irrelevant to working on the system that consumes and manipulates it (with exceptions[1]). This principle is what enables companies handling very sensitive data (such as medical records) to get any work done: the devs work with fake, generated data that only resembles real-world data, while the real data about real people lives in a separate database that almost no one can access, but which has exactly the same design/schema as the developers' database containing the fake data.
If you want to collaborate on both data and design, then the "best" approach with today's tooling, in my opinion, is to have a single RDBMS database in the cloud[2], such as Azure SQL or Amazon RDS. You should still keep your database design/schema in source control in an SSDT *.sqlproj project, never make design/schema changes to that database without going through SSDT, and make data changes only in this live/cloud database.
If you have collaborators that won't always be able to connect to this central single cloud-hosted database then you have a very hard problem to solve which is worthy of another question entirely (welcome to the CAP Theorem).
[1]: Exceptions like setup/config/"system" data, and seed data for bootstrapping, or data used in test-cases. Point is: designing a database for animal taxonomy doesn't require actual Latin animal species names, and designing a patient/medical database doesn't require having the real details of real people with real conditions stored in your git repository.
[2]: ugh, I hate that word

Where do I store the .sql file that creates my database? [closed]

I am trying to develop a web application that, like most applications, uses a database. To create the database I write a .sql file that contains all the changes I make to the database. I don't know exactly why, but in the past I always found it hard to empty my database or modify it later on when I decided a change would make sense. Since I am still learning all this database-related stuff, the first layout of my database will always end up being changed. To keep track of all those changes I got into the habit of creating this .sql file.
In this file I always drop all the tables that already exist and create all the tables anew. I do this partly to always have a reference to the actual state of my database on hand. Changes to a file are much easier for me than using the command-line database tool directly. The first question would really be: is this actually good practice, or is there another way of organizing things that I haven't heard of yet?
The real question is: where do I store this file? Is it good practice to store it in the same git repository as the actual code? Should I put it in git at all? I also think of git/GitHub as cloud storage: if my hard drive burns, all my projects will still be there since I have them on GitHub. If I don't have the .sql file in there, I would have to set up the database from scratch.
The general category for this sort of code is "database migrations." There are tools specialized for these tasks, and various web application frameworks may support different DB migration tools or have their own features.
Probably the most popular tool/suite in this category for Python (with models implemented in SQLAlchemy) would be Alembic. Some of the other options include Flyway, Liquibase, and Sqitch.
In all of these cases you manage some abstraction of your database schema (SQL, XML, YAML), and these tools generate the necessary SQL and other code to perform "migrations" from each version of the schema to the next throughout the history of your project. Usually these are generated as increments: they start with the initial database schema (possibly completely empty), build and initialize the schema into that version, then build and migrate through each step to arrive at the desired version.
This can become arbitrarily complex if you're making more radical changes to your schema. Adding an extra column to a table without a "NOT NULL" constraint is trivial. Adding new M:N relations through "NOT NULL" junction tables and migrating from some denormalized schema to a higher normal form, for example, can entail intermediary stages where you might need to drop some constraints and tolerate referential integrity violations through some transitional state.
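To make the trivial case above concrete, here is roughly what a single Alembic revision for adding a nullable column looks like; the revision ids, table, and column name are made up for the example:

```python
# Sketch of a single Alembic revision file; identifiers are illustrative only.
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4e5f6"
down_revision = "f6e5d4c3b2a1"

def upgrade():
    # The easy case: a new column without NOT NULL needs no data backfill.
    op.add_column("person", sa.Column("nickname", sa.String(100), nullable=True))

def downgrade():
    op.drop_column("person", "nickname")
```

The harder cases described above (junction tables, normalization changes) end up as a chain of such revisions, some of which move data as well as schema.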
I highly recommend reading these websites, their tutorials, HOWTOs and other documentation to gain a deeper understanding of why these tools exist and how they approach this problem space.
You are reinventing the wheel, my friend: your solution is versioning the DB schema. And yes, the changes should be added to the project files, as almost all frameworks do. I recommend that you read the following question:
How do you version your database schema?
Yes, you should absolutely keep this code in source control. However, the exact location of the code within the repository is up to you and/or your development team or management. Some good options would be an install, setup, sql, or ddl folder.
Yes, what you are doing is correct.
Compare with Ruby on Rails: a file db/schema.rb contains the whole schema. It is good to have such a complete file. This allows you, for example, to easily bootstrap a new environment (e.g. a new testing environment). This file is obviously never used in production, as it would wipe out all your data.
Then there are separate small files db/migrations/20171003_add_name_to_person_table.rb or whatever with incremental changes to the schema - called migrations. Those are used to change existing environments without losing data, with some mechanism to make sure each one is only run once per DB.
At your stage it is perfectly fine to be doing all of this manually. You can try to automate it as needed, later. It is good enough that you noticed that something is going on here.
That stuff must go into your code repository, wherever seems natural. /db, /schema, /etc might be some choices.

How is Database Migration done?

I remember that in my previous job I needed to do data migration. In that case I had to migrate to a new system that I was going to develop, so it had a different table schema. I think I should first know:
In general, how is data migrated (with the same schema) to a different DB engine, e.g. MySQL -> MSSQL? In my case, my destination DB was MySQL and I used MySQL Migration Toolkit.
I am also thinking that, in an enterprise app, there may be stored procedures and triggers that need to be imported as well.
If the table schema is different, how will I then go about doing this? In my previous job, what I did was import the data (in my case, from Access) into my destination (MySQL), keeping the table structures, and then use SQL to select the data and manipulate it as required into the final destination tables.
In my case, where I don't have documentation for the old DB and the columns were not named meaningfully (e.g. it uses, say, 'field1', 'field2', etc.), I needed to trace from the application code what the columns mean. Is there any better way? Also, sometimes columns contain multiple values in delimited data; is reading the code the only way?
It really depends, but from your question I assume you want to hear what other people do.
So here is what I do in my current project.
I have to migrate from Oracle to Oracle but to a completely different schema.
The old system was 2-tier (old client, old database) the new system is 3-tier (new client, business logic, new database). We have more than 600 tables in the new schema.
After much pondering we scrapped the idea of doing a migration from the old database to the new database in SQL. We decided that in our case it would be much easier to go:
old database -> old client -> business logic -> new database
In the old database much of the data is stored in strange ways and the old client mangles it in complex ways. We have access to the source code of the old client, but it is a very large system.
We wrote a migration tool that sits above the old client and the business logic.
We have some SQL before and some SQL after that, but the bulk of the data is migrated via the old client and the business logic.
The downside is that it is slow, a complete migration taking more than 190 hours in our case but otherwise it works well.
UPDATE
As far as stored procedures and triggers are concerned:
Even though we use the same DBMS in the old and new systems (both Oracle), the procedures and triggers are written from scratch for the new system.
When I've performed database migrations, I've used the application itself instead of a general tool to migrate the database. The application connects to the two databases and copies objects from one to the other. You don't have to worry about schema or permissions or whatnot, since all of that is handled in the application, just as it is when you set up the application in the first place.
Of course, this may not help you if your application doesn't support this. But if you're writing an application, I strongly recommend doing it this way.
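A stripped-down sketch of that approach: the migration code opens both databases and lets application-level knowledge decide how old fields become new columns. Everything below (DSNs, table and column names, the clean-up rules) is purely illustrative:

```python
# Sketch: application-level copy from an old database to a new one.
# Connection strings, tables and the field mapping are invented examples.
import pyodbc

old_db = pyodbc.connect("DSN=OldSystem")
new_db = pyodbc.connect("DSN=NewSystem")

read = old_db.cursor()
write = new_db.cursor()

# The application knows how its objects map between the two schemas,
# so the transformation happens here rather than in hand-written SQL.
read.execute("SELECT field1, field2 FROM legacy_orders")
for field1, field2 in read.fetchall():
    order_no, customer = field1.strip(), field2.upper()   # example clean-up
    write.execute(
        "INSERT INTO orders (order_number, customer_name) VALUES (?, ?)",
        (order_no, customer),
    )

new_db.commit()
```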
I recommend the Wikipedia article for a good overview and links to the main commercial tools (and some non-commercial ones). Stored procedures (and kin, e.g. user-defined functions), if abundant, are going to be the "hot spots" in the migration, requiring rare and costly human skills -- as soon as you get away from the "declarative" mood of mainstream SQL and into procedural code, you cannot expect automated tools to do a decent job (Turing's Theorem says that they actually can't, in a sufficiently general case;-). So, you need engineers with a good understanding of the procedural trappings of BOTH engines -- the one you're migrating from and the one you're migrating to. You can buy that -- it's one of the niches where consultants make REALLY good money!-)
If you are using MS SQL Server, you can use SSMS to script out the schema and all data in one go: SQL Server 2008: Script Data as Inserts.
If you are not using any/many non-standard SQL constructs, then you might be able to manually edit this script without too much effort.

Best way to create a DB-tablestructure [duplicate]

Possible Duplicates:
Generating SQL Server DB from XSD
Generating SQL Schema from XML
I have loads and loads of xml files with data, and a schema-file (.xsd) which describes the structure of the xml.
I want to store the data in a MSSQL-database so that I can query it later and display it on a web-site.
I must now create the db-structure, and have so far thought of 3 ways of creating the tables:
1. Using XMLSpy I could load the xsd and use its "create DB from xsd" feature. The "trouble" is that I have to manually add the relations between the tables, and also add the columns that are used for these relations.
2. Using Microsoft SQL Server Management Studio I could graphically create the tables and relations. The "trouble" here is that the xsd describes about 100 tables, and the thought of doing this manually in a GUI is scary. I would lose track of where I was somewhere in there.
3. Handwriting the SQL in Notepad or something. Boring, but then I could do it in small steps, something I could not do with the other two options.
Is there any other way I haven't thought of?
You could do something similar to option (1): import the xsd into a database design tool (e.g. ERwin or PowerDesigner), then do the editing steps in a "graphical" environment, and then have the tool generate the database.
I'm not sure how well these tools work with xml and xsd, and you may have to generate the db using XMLSpy and then reverse engineer the database. But a good tool will make this easier than doing it "just" with the database.
Hope that this is not too similar to the option (2) you mentioned ...
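As another angle on option (3), the hand-written DDL can be partly scripted. Below is a rough Python sketch that walks the .xsd with the standard library and prints skeleton CREATE TABLE statements; the type map and naming are simplified assumptions, and the relations and keys would still have to be added by hand:

```python
# Rough sketch: emit skeleton CREATE TABLE statements from an .xsd file.
# Real schemas need keys, relations and a fuller type map added manually.
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
TYPE_MAP = {                                   # simplified mapping, adjust to taste
    "xs:string": "NVARCHAR(255)",
    "xs:int": "INT",
    "xs:integer": "INT",
    "xs:decimal": "DECIMAL(18,4)",
    "xs:date": "DATE",
    "xs:boolean": "BIT",
}

tree = ET.parse("schema.xsd")
for complex_type in tree.getroot().iter(XS + "complexType"):
    table = complex_type.get("name")
    if not table:
        continue                               # skip anonymous types in this sketch
    cols = []
    for element in complex_type.iter(XS + "element"):
        sql_type = TYPE_MAP.get(element.get("type", ""), "NVARCHAR(MAX)")
        cols.append(f"    [{element.get('name')}] {sql_type} NULL")
    print(f"CREATE TABLE [{table}] (\n" + ",\n".join(cols) + "\n);\n")
```

With around 100 tables, even a crude script like this gives you a starting point you can then refine in small steps.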

Which tools or methods would you suggest for creating large amounts of SQL test data? [duplicate]

I'd like to stress test some of my SQL queries and find out about bad query plans and bottlenecks. I plan to fill some tables with random test data.
Are there tools or a set of scripts available for this purpose, preferably for SQL Server?
Thanks!
UPDATE: Sorry, I didn't know these two questions already existed:
Data generators for SQL server?
Creating test data in a database
This website will generate reams of customized data for you.
From that site:
Ever needed custom formatted sample / test data, like, bad? Well, that's the idea of the Data Generator. It's a free, open source script written in JavaScript, PHP and MySQL that lets you quickly generate large volumes of custom data in a variety of formats for use in testing software, populating databases, and scoring with girls.
This site offers an online demo where you're welcome to tinker around to get a sense of what the script does, what features it offers and how it works. Then, once you've whet your appetite, there's a free, fully functional, GNU-licensed version available for download.
I've used this data generator with success in the past - it may not be big enough for your needs, though.
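If a packaged generator turns out to be too small for the volumes you need, a few lines of scripting can also do the job. A minimal sketch that fills a SQL Server table with random rows via pyodbc (the table, columns, and row count are placeholders):

```python
# Minimal sketch: fill a table with random rows for query stress testing.
# Table name, columns and row counts are placeholders; adjust for real tests.
import random
import string
import pyodbc

def random_name(n=12):
    return "".join(random.choices(string.ascii_lowercase, k=n))

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=LoadTest;Trusted_Connection=yes"
)
cur = conn.cursor()
cur.fast_executemany = True   # speeds up large parameterized batches in pyodbc

rows = [
    (random_name(), random.randint(18, 90), round(random.random() * 10_000, 2))
    for _ in range(100_000)
]
cur.executemany(
    "INSERT INTO dbo.Customers (name, age, balance) VALUES (?, ?, ?)", rows
)
conn.commit()
```

Truly random data tends to produce overly uniform distributions, so for realistic query plans it is worth skewing the generated values toward what production data actually looks like.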