Extract ETL process of existing database - sql

I have a pretty complex database; I have both its source code and access to it.
My job is to answer the question ‘from what file is a given column loaded?’. Currently I just go through the code, tracing from the target column back to the source file. I wonder if there is any tool that would automate this or generate it from the SQL?
I'm using SQL Server.

Related

Excel sheet to SQL table upload automation

I am trying to find the easiest, simplest, and quickest way to upload a sheet from an Excel file in a folder to a table in SQL Server 2012, automatically every morning as a job.
SSIS is the ETL tool you could use, but if it’s a very simple job you can just write a BCP command.
https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017
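For example, a minimal sketch (server, database, table and file names here are made up, and bcp reads flat files rather than .xlsx, so this assumes the sheet is first saved out as a CSV):

rem hypothetical one-liner: character mode, comma-delimited, skip the header row
bcp TargetDb.dbo.DailySheet in "C:\Imports\daily_sheet.csv" -S MyServer -T -c -t, -F 2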
The way to schedule it is to add the task as a step in a SQL Server Agent job on the server (a sketch of that follows at the end of this answer). A few things to bear in mind with ETL:
Will your file be named the same each day?
Do you need to retain archived versions of the file?
How do you do error handling if it’s absent or malformed?
Does the DDL need to change periodically to accommodate new date ranges (i.e. a new day/month/year)?
Will this pattern be reused in the future?
Do you need to test logically (duplicates/logical fallacies/referential integrity etc)?
Under whose account will the job run (hint, don’t use your own - get a service account)?
The more complex the answers to these questions are, the more likely you'll need a real ETL tool like SSIS.
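For reference, a rough sketch of what the Agent job setup could look like in T-SQL (job, step and schedule names are invented; swap in your own bcp command):

USE msdb;
-- create the job, add the bcp call as a CmdExec step, and run it every morning at 06:00
EXEC dbo.sp_add_job @job_name = N'Daily Excel import';
EXEC dbo.sp_add_jobstep @job_name = N'Daily Excel import',
    @step_name = N'bcp load',
    @subsystem = N'CmdExec',
    @command = N'bcp TargetDb.dbo.DailySheet in "C:\Imports\daily_sheet.csv" -S MyServer -T -c -t, -F 2';
EXEC dbo.sp_add_schedule @schedule_name = N'Every morning', @freq_type = 4, @freq_interval = 1, @active_start_time = 060000;
EXEC dbo.sp_attach_schedule @job_name = N'Daily Excel import', @schedule_name = N'Every morning';
EXEC dbo.sp_add_jobserver @job_name = N'Daily Excel import';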

Using SSIS and SSMS - Attempting to create an ETL design document that details source and destination columns and their relations

I am attempting to create an ETL design document that details source and destination columns and their relations, by looking at metadata for the columns. The format should be similar to below:
Final document format
I have utilised something similar to the site below to get data for the output/destination database:
MS SQL TIPS - list-columns-and-attributes-for-every-table-in-a-sql-server-database
Now I am trying to get the same information for the source database, but I am not sure if I am allowed to run such a query directly on the source as it holds important data.
Is there a way I can use SSIS or look at the source in SSMS to see all relationships I need?
I have the SSIS packages that detail what transformations I will apply to the source via SQL queries. I've tried looking at the packages individually, but there are a lot of them and there should be an easier way that I am missing.
Depends on your sources. If they are all queries from tables, you can probably parse them from the .dtsx files. If any of them are stored procs or views, then there's probably nothing you can do without querying the source database.
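If you are allowed to run anything at all against the source, a read-only metadata query along these lines (standard catalog views only, it never touches the data) would give you the same column listing for the source side:

-- list every table, its columns, data types, lengths and nullability
SELECT t.name AS table_name,
       c.name AS column_name,
       ty.name AS data_type,
       c.max_length,
       c.is_nullable
FROM sys.tables AS t
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id
ORDER BY t.name, c.column_id;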

SSIS 2012 package to import excel data to database table

I want to make a SSIS package that will load data from an excel file to a database table.
I have already made a package that completes the task, but the table needs to be recreated every time the Excel data is loaded, because the Excel data and its column definition change every month. If the table is not recreated on every execution there will be errors, and my task will not be complete because the Excel data will be loaded under the wrong column definition.
Is there any way to dynamically drop and create the table every time?
SSIS generates a lot of metadata behind the scenes which describe your source and destination fields and mappings. If it feels it isn't talking to the same data source/destination it will often throw a validation error and refuse to start.
To an extent you can mitigate this by setting DelayValidation to True on the connection manager properties. However this is unlikely to help in your case as your data spec genuinely is changing.
A further option to get around the tight controls on data format is to write your own custom source/transformation/destination logic in script tasks. You can write a script task to read the format of the incoming Excel file, and save those details to variables and pass them into an Execute SQL Task to create the table. Then you can write a script task to dynamically map your data to some generic intermediate columns that exist within your package. Finally you can write a script task destination to load that data into the newly created table.
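As a rough illustration, the Execute SQL Task would end up running something along these lines, with the column list built by your script task from the workbook's header row (the table and column names here are invented):

-- drop last month's version of the table and recreate it to match the new sheet
IF OBJECT_ID(N'dbo.MonthlyImport', N'U') IS NOT NULL
    DROP TABLE dbo.MonthlyImport;
CREATE TABLE dbo.MonthlyImport (
    Region nvarchar(255) NULL,
    SalesMonth nvarchar(255) NULL,
    Amount nvarchar(255) NULL
    -- one column per header found in the incoming Excel file
);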
Basically you're bypassing all out of the box SSIS functionality and writing your own integration solution from the ground up, but it seems you don't have much choice.

How to create a SQL database from a strongly typed dataset

I'm looking for an easy way to transfer a database schema I have developed inside Visual Studio as a strongly typed dataset (.xsd file) into a corresponding SQL Server database. Silly me, I assumed the process would be straightforward, but I can't find out how to do it. I assume I could duplicate the tables column by column, but that seems error prone. Does anyone know of a way to perform a schema transfer like this? Maybe a tool to translate the .xsd file into a corresponding SQL Server DDL file?
Final thought: once I have the schema transferred, moving data around between the two data stores will be straightforward; it's just getting the schemas synced that has me stumped...
Thanks,
Keith
Why didn't you implement your data model directly in SQL Server? That is the more common, engineered approach, and I think this is why Microsoft has not provided any wizard or tool for this case. You can also keep your data model as scripts (.sql files) that can be managed via SVN, and whenever you need the model implemented you can just run them.
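For example (a made-up table, just to illustrate keeping the model as versioned .sql scripts):

-- Orders.sql, checked into SVN with the rest of the model; run it to (re)create the table
CREATE TABLE dbo.Orders (
    OrderId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerId int NOT NULL,
    OrderDate datetime NOT NULL
);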

Queries for migrating data in live database?

I am writing code to migrate data from our live Access database to a new Sql Server database which has a different schema with a reorganized structure. This Sql Server database will be used with a new version of our application in development.
I've been writing migration code in C# that calls SQL Server and Access and transforms the data as required. For the first time I migrated a table whose entries relate to new entries in another table that I had not updated recently, and that caused an error because the corresponding record in the SQL Server table could not be found.
So, my SQL Server production table has data only up to 1/14/09, and I'm continuing to migrate more tables from Access. So I want to write an update method that can figure out what the new stuff is in Access that hasn't been reflected in SQL Server.
My current idea is to write a query on the SQL side which does SELECT Max(RunDate) FROM ProductionRuns, to give me the latest date in that field in the table. On the Access side, I would write a query that does SELECT * FROM ProductionRuns WHERE RunDate > ?, where the parameter is that max date found in SQL Server, and perform my translation step in code, and then insert the new data in Sql Server.
What I'm wondering is, do I have the syntax right for getting the latest date in that Sql Server table? And is there a better way to do this kind of migration of a live database?
Edit: What I've done is make a copy of the current live database, which I can then migrate without worrying about changes and use for testing during development; then I can migrate the latest data whenever the new database and application go live.
I personally would divide the process into two steps.
I would create an exact copy of the Access DB in SQL Server and copy all the data into it
Then copy the data from this temporary SQL Server DB to your destination database
That way you can write a set of SQL statements to accomplish the second step (a sketch follows below)
Alternatively use SSIS
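A minimal sketch of that second step, assuming a staging copy called StagingDb and reusing the MAX(RunDate) idea from the question (which is indeed the right syntax for the latest date); all names here are hypothetical:

-- copy only the rows that arrived since the last migration, transforming as you go
INSERT INTO NewDb.dbo.ProductionRuns (RunDate, LineId, Quantity)
SELECT s.RunDate, l.LineId, s.Qty
FROM StagingDb.dbo.ProductionRuns AS s
JOIN NewDb.dbo.Lines AS l ON l.LegacyCode = s.LineCode
WHERE s.RunDate > (SELECT MAX(RunDate) FROM NewDb.dbo.ProductionRuns);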
Generally when you convert data to a new database that will take its place in production, you shut out all users of the database for a period of time, run the migration, and turn on the new database. This ensures no changes are made to the data while doing the conversion. Of course, I never would have done this using C# either. Data migration is a database task and should have been done in SSIS (or DTS if you have an older version of SQL Server).
If the database you are converting to is just in development, I would create a backup of the Access database and load the data from there, to test the data loading process and to get the data in so you can do the application development. Then when it is time to do the real load, you just close down the real database to users and load from it. If you are trying to keep both in sync while you develop, well, I wouldn't do that, but if you must, make a nightly backup of the file and load it first thing in the morning using your process.
You may want to look at investing in a tool like SQL Data Compare.
I believe it has support for access databases too, and you can download a trial.
If you are happy with your C# code but it fails because of the constraints in your destination database, you can temporarily disable them and then re-enable them after you copy the whole lot.
I am assuming that your destination database is a brand new DB with no data, and is not used by anyone while the transfer happens.
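In its simplest form that might look like the lines below (the table name is hypothetical; repeat per table or script it out for all of them):

-- switch the table's foreign key and check constraints off for the bulk copy...
ALTER TABLE dbo.ProductionRuns NOCHECK CONSTRAINT ALL;
-- ...run the C# copy here...
-- ...then turn them back on and have SQL Server re-validate the copied data
ALTER TABLE dbo.ProductionRuns WITH CHECK CHECK CONSTRAINT ALL;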
It sounds like you have two problems:
You're migrating data from one database to another.
You're changing your schema.
Doing either of these things is tricky if you are trying to migrate the data while people are using the data.
The simplest approach is to migrate the data based on a static copy of the data, and also to queue updates to that data from the moment you captured the static copy. I don't know how easy this is in Access, but in SQLServer or Oracle you can use the redo logs for this or a manual solution using triggers. The poor-man's way of doing this is to make triggers for all the relevant tables that log the primary key of the records that have changed. Then after the old database is shut off you can iterate over those keys and get those records from the old database and put them into the new database. Just copy the whole record; if the record was deleted then delete it from the new database.
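On the SQL Server side, that poor-man's version could look roughly like this (table, key and trigger names are made up; as noted above, Access itself has no triggers):

-- log the primary keys of changed rows so they can be re-migrated later
CREATE TABLE dbo.ProductionRuns_Changes (
    RunId int NOT NULL,
    ChangeType char(1) NOT NULL, -- 'I', 'U' or 'D'
    ChangedAt datetime NOT NULL DEFAULT GETDATE()
);
GO
CREATE TRIGGER dbo.trg_ProductionRuns_Log
ON dbo.ProductionRuns
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ProductionRuns_Changes (RunId, ChangeType)
    SELECT i.RunId, 'I' FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM deleted AS d WHERE d.RunId = i.RunId);
    INSERT INTO dbo.ProductionRuns_Changes (RunId, ChangeType)
    SELECT i.RunId, 'U' FROM inserted AS i
    WHERE EXISTS (SELECT 1 FROM deleted AS d WHERE d.RunId = i.RunId);
    INSERT INTO dbo.ProductionRuns_Changes (RunId, ChangeType)
    SELECT d.RunId, 'D' FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.RunId = d.RunId);
END;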
Your problem is compounded by the fact that you can't simply copy the data, you have to transform it. This means you probably have to shut down both databases and re-migrate the records based on the change list. It will take a lot of planning to ensure you get things right and I'd recommend writing a testing script that can validate that the resulting data is correct.
Also I'd ensure that the code for the migration runs inside one of the databases if possible. Otherwise you are copying the data twice and this will significantly harm the performance.