I was asked by the company I work for to create an SSIS package that will take data from a few tables in one data source, change a few things in the data, and then put it in a few tables in the destination.
The main entity is "Person". In the people table, each person has a PersonID.
I need to loop over these records and, for each person, take his orders from the orders table and other data from a few other tables.
I know how to take data from one table and just move it to a different table in the destination. What I don't know is how to manipulate the data before dumping it in the destination. Also, how can I get data from a few tables for each PersonID?
I need to be done with this very fast, so if someone can tell me which items in SSIS I need to use and how, that would be great.
Thanks
Microsoft has a few tutorials.
Typically it is easy to simply do your joins in SQL before extracting and use that query as the source for extraction. You can also do data modification in that query.
I would recommend using code in SSIS tasks only for things where SQL is problematic: custom scalar functions, which can be quicker in the scripting runtime, and handling disparate data sources.
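For example, a source query along these lines (all table and column names here are made up, so adjust them to your schema) joins each person to the related tables and does the manipulation inline, so the data flow only has to move the rows:

```sql
-- Hypothetical source query for the OLE DB Source.
-- Joins the person rows to their orders and does simple manipulation inline.
SELECT
    p.PersonID,
    UPPER(LTRIM(RTRIM(p.FirstName))) + ' ' + UPPER(LTRIM(RTRIM(p.LastName))) AS FullName,
    o.OrderNumber,
    o.OrderDate,
    a.City
FROM dbo.People         AS p
JOIN dbo.Orders         AS o ON o.PersonID = p.PersonID
LEFT JOIN dbo.Addresses AS a ON a.PersonID = p.PersonID;
```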
I would start with the Data Flow Task.
Use the OLE DB Source to execute a stored proc that will read, manipulate and return the data you need.
Then you can pass that to either an OLE DB Destination or an OLE DB Command to move it to the destination.
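A minimal sketch of what such a stored procedure could look like (object names are hypothetical):

```sql
-- Hypothetical proc: reads and manipulates the data, then returns the
-- result set that the OLE DB Source feeds into the data flow.
CREATE PROCEDURE dbo.GetPeopleForExport
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        p.PersonID,
        LTRIM(RTRIM(p.FirstName)) AS FirstName,   -- example manipulation
        o.OrderDate,
        o.TotalAmount
    FROM dbo.People AS p
    JOIN dbo.Orders AS o ON o.PersonID = p.PersonID;
END;
```

In the OLE DB Source you would then use the SQL command access mode with something like `EXEC dbo.GetPeopleForExport;`.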
Due to the way our database is stored, we have tables for each significant event that occurs within a product's life:
Acquired
Sold
Delivered
I need to go through and find the status of a product at any given time. In order to do so I'd need to query all of the tables within the schema and find the most up-to-date record. I know this is possible by UNIONing all the tables and then finding the MAX timestamp, but I wonder if there's a more elegant solution?
Is it possible to query all tables by just querying the root schema or database? Is there a way to loop through all tables within the schema and substitute that into the FROM clause?
Any help is appreciated.
Thanks
You could write a Stored Procedure but, IMO, that would only be worth the effort (and more elegant) if the list of tables changed regularly.
If the list of tables is relatively fixed then creating a UNION statement is probably the most elegant solution and relatively trivial to create - if you plan to use it regularly then just create it as a View.
The way I always approach this type of problem (creating the same SQL for multiple tables) is to dump the list of tables out into Excel, generate the SQL statement for the first table using functions, copy this function down for all the table names, and then concatenate all these statements in a final function. You can then just paste this text back into your SQL editor.
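For instance, assuming each event table carries a ProductID and a timestamp column (the column names below are guesses), the view and the "latest status" query could look roughly like this:

```sql
-- Hypothetical column names; adjust to your schema.
CREATE VIEW dbo.ProductEvents AS
SELECT ProductID, EventTimestamp, 'Acquired'  AS Status FROM dbo.Acquired
UNION ALL
SELECT ProductID, EventTimestamp, 'Sold'      AS Status FROM dbo.Sold
UNION ALL
SELECT ProductID, EventTimestamp, 'Delivered' AS Status FROM dbo.Delivered;
GO

-- Most recent status per product.
SELECT ProductID, Status, EventTimestamp
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ProductID
                              ORDER BY EventTimestamp DESC) AS rn
    FROM dbo.ProductEvents
) AS ranked
WHERE rn = 1;
```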
How do I build a general query based on a list of tables and run/export the results?
Here is my conceptual structure
Basically, the FROM clause of the query would be filled in from the table rows.
After that, each schema.table that returns a true result must be stored in a text file.
Could someone help me?
As Pentaho doesn't support loops, it's a bit complicated. Basically you read a list of tables and copy the rows to the result, and then you have a foreach-job that runs for every row in the result.
I use this setup for all jobs like this: How to process a Kettle transformation once per filename
In the linked example, it is used to process files, but it can be easily adapted to your use.
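If your source database exposes the standard metadata views, the Table input step of the first transformation can build that list of tables with something like this (a sketch, assuming INFORMATION_SCHEMA is available on your database):

```sql
-- One row per table; copy these rows to the result so the
-- foreach-job runs once per schema/table pair.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```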
My question is about what is more efficient when making queries and inserts, since the number of records in my table will grow a lot.
I would like to know which is more efficient: placing all the data in a single table, or partitioning it into several tables and reading and inserting records through a view and trigger.
As already mentioned, take a look at database normalization.
SQL is a way to work with relational databases and is built on the idea that we should have many tables that are linked with each other through relationships. Thus I recommend multiple tables, because you will be able to reuse data (for example a user's name and surname) through specific IDs rather than copying that data each time a user performs some action on your platform and you need to insert or update some information.
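A minimal sketch of that idea, with made-up table and column names (SQL Server syntax): user details are stored once, and each action row only carries the user's ID.

```sql
-- Hypothetical normalized layout.
CREATE TABLE Users (
    UserID  INT IDENTITY(1,1) PRIMARY KEY,
    Name    NVARCHAR(100) NOT NULL,
    Surname NVARCHAR(100) NOT NULL
);

CREATE TABLE UserActions (
    ActionID   INT IDENTITY(1,1) PRIMARY KEY,
    UserID     INT NOT NULL REFERENCES Users (UserID),  -- reuse the user by ID
    ActionType NVARCHAR(50) NOT NULL,
    ActionDate DATETIME2 NOT NULL DEFAULT SYSDATETIME()
);
```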
Hope this helps!
So I've got a few tables with a lot of data. Let's say they are tables A, B and C. I want to add auto-increment ID fields to each of those tables, normalize them by swapping some fields between the tables, and add an additional table D. Gulp. There are three goals: 1) redesign the database and reload the existing data, 2) enable a data load from a spreadsheet to add/edit/delete the four tables, and 3) enable a web front end to add/edit/delete the four tables.
My current approach:
I thought I would export all the data in the 3 existing tables into a flat CSV file (spreadsheet).
Then refactor the database design structure
Then use LINQ to Excel to read the CSV spreadsheet records back into DTO objects
Then use the Entity Framework to transform those DTO objects into entities to update the database with the appropriate relationships between tables
The spreadsheet would be re-used for future bulk data add/edit/deletes
What about the following tools?
SSIS
Bulk insert
Stored procedures
Am I over complicating this? Is there a better way?
What's your definition of "a lot of data"? For some people it's 10,000 rows, for others it's billions of rows.
I would think that a few stored procedures should be able to do the trick, mostly made up of simple INSERT..SELECT statements. Use sp_rename to rename the existing tables, create your new tables, then move the data over.
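A rough sketch of that pattern for one of the tables, with hypothetical column names:

```sql
-- Keep the existing data under a new name.
EXEC sp_rename 'dbo.A', 'A_old';

-- Redesigned table with an auto-increment ID.
CREATE TABLE dbo.A (
    AID      INT IDENTITY(1,1) PRIMARY KEY,
    SomeCol  NVARCHAR(100) NOT NULL,
    OtherCol INT NULL
);

-- Move the data over.
INSERT INTO dbo.A (SomeCol, OtherCol)
SELECT SomeCol, OtherCol
FROM dbo.A_old;
```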
If you already have to develop a bulk import process then it might make sense to get reuse out of that by doing an export, but I wouldn't create that whole process just for this scenario.
There might be cases where this approach isn't the best, but I don't see anything in your question that makes me think it would be a problem.
Make sure that you have a good back-up first of course.
I have a UDF in SQL Server 2005 that I need to schema-bind, as it is used in a view that I need an index on.
This UDF gets information from a table that is located in a different database (same server) than the one where the UDF is located.
Since it is invalid to specify the table as [DBName].dbo.[Tablename], is there a way I can get the info from the table in the other database?
Schema binding is supposed to guarantee consistency. However, consistency cannot be guaranteed across two different databases; therefore, schema binding cannot span two different databases. In other words, it's impossible to achieve.
Imagine, for example, that one database is restored to an earlier point in time: the index on the indexed view would become corrupt and queries would return wrong results.
If your UDF is in Database1, and it needs to access data from a table in Database2, all you have to do is create a view in Database1 that grabs the data you need from the table(s) in Database2. Then use this view in your UDF.
It works just fine; I have used this approach many times.
Hope it helps.