I'm very new to SSIS. Prior to this I have been writing ETL procedures manually, hence my lack of confidence and familiarity in this environment. I know the basics, like Execute SQL tasks and importing with a static query.
However, I'm looking for the best way to do the following:
I have a lookup table with a list of records containing two fields and a flag.
For each record where the flag is 0, I want to import records into one common table using a query whose WHERE clause uses the two fields from the above-mentioned lookup table.
Could someone please help this noob?
Thanks in advance; it will be much appreciated.
Use "Conditional split". It allows splitting row to many streams by a condition. In your case it would be "[Flag] == 0"
Due to the way our database is stored, we have tables for each significant event that occurs within a product's life:
Acquired
Sold
Delivered
I need to go through and find the status of a product at any given time. In order to do so I'd need to query all of the tables within the schema and find the most up-to-date record. I know this is possible by UNIONing all the tables and then finding the MAX timestamp, but I wonder if there's a more elegant solution?
Is it possible to query all tables by just querying the root schema or database? Is there a way to loop through all tables within the schema and substitute each one into the FROM clause?
Any help is appreciated.
Thanks
You could write a Stored Procedure but, IMO, that would only be worth the effort (and more elegant) if the list of tables changed regularly.
If the list of tables is relatively fixed, then creating a UNION statement is probably the most elegant solution and relatively trivial to create. If you plan to use it regularly, just create it as a View.
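For example, a rough sketch of such a view, assuming each event table has a product_id and an event timestamp column (the column names here are guesses):

CREATE VIEW product_events AS
SELECT product_id, event_ts, 'Acquired' AS event_type FROM Acquired
UNION ALL
SELECT product_id, event_ts, 'Sold' AS event_type FROM Sold
UNION ALL
SELECT product_id, event_ts, 'Delivered' AS event_type FROM Delivered;

-- Latest event (and therefore current status) per product:
SELECT e.product_id, e.event_type, e.event_ts
FROM product_events e
JOIN (SELECT product_id, MAX(event_ts) AS max_ts
      FROM product_events
      GROUP BY product_id) latest
  ON latest.product_id = e.product_id
 AND latest.max_ts = e.event_ts;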
The way I always approach this type of problem (creating the same SQL for multiple tables) is to dump the list of tables out into Excel, generate the SQL statement for the first table using functions, copy this function down for all the table names, and then concatenate all these statements with a final function. You can then paste the resulting text back into your SQL editor.
How do I build a general query based in a list of tables and run/export the results?
Here is my conceptual structure (diagram not shown):
Basically, the FROM clause of the query would be filled in from the rows of that table.
After that, each schema.table that returns a true result must be stored in a text file.
Could someone help me?
As Pentaho doesn't support loops, it's a bit complicated. Basically, you read a list of tables and copy the rows to the result, and then you have a foreach-job that runs once for every row in the result.
I use this setup for all jobs like this: How to process a Kettle transformation once per filename
In the linked example it is used to process files, but it can easily be adapted to your use case.
I am facing a problem comparing data from two huge tables.
Scenario:
Problem: I have to find the gaps between two sets of data stored in tables in an Oracle DB that has a live Siebel application running on it. I can't simply run a SELECT statement over the whole set of data (8,000,000 rows), as that affects the performance of the application.
What I have done so far:
I simply open a cursor on one set of data, compare it against the other set according to my logic, and insert the gaps into other tables accordingly. But this solution compares one row at a time, which is a very slow process, and it times out after a while.
Can anyone suggest a better solution so the process can be sped up? I'd really appreciate the help.
There can be several different ways to improve performance depending on the exact use case. Based on the information you have provided, below are a few things that might work.
Rewrite the queries using MINUS or NOT EXISTS (a sketch follows this list).
Create an index on the columns that are used in the WHERE clause. Note that index creation takes time and resources and impacts the system, so it is advisable to do it when the load on the server is low. If indexes are already there and not being used, try to use hints.
If the data in those tables is static, try duplicating the tables in a test environment and running appropriate tests there.
Using a cursor on 8M rows does not sound very efficient unless that is the only way to go.
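As a rough illustration of the first suggestion, with made-up table and column names (table_a, table_b, key_col):

-- Rows present in table_a but missing from table_b (full-row comparison):
SELECT key_col, col1, col2 FROM table_a
MINUS
SELECT key_col, col1, col2 FROM table_b;

-- A similar check keyed only on key_col; an index on table_b(key_col) helps here:
SELECT a.key_col, a.col1, a.col2
FROM table_a a
WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.key_col = a.key_col);

Both run as single set-based statements, so the optimizer can use hash or merge operations instead of row-by-row cursor logic.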
If you give more details, we might be able to give better suggestions.
I use an Oracle database. I have many tables with very large amounts of data (300-500 million records).
I use a query statement that joins many tables together. I have set indexes on the tables, but the report is still very slow.
Please help me with a solution for working with big data.
Thanks.
Do you really need to have all the data at once?
Try creating a table that stores just the information you need for the report, and run a query once a day (or every few hours) to update that table. You can also use SQL Server Integration Services, although I have not tried SSIS with Oracle myself.
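In Oracle, one way to do the "refresh once a day" part is a materialized view with a scheduled refresh - a minimal sketch, where the report columns and source tables (orders, order_lines) are only placeholders:

CREATE MATERIALIZED VIEW report_summary
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH SYSDATE NEXT SYSDATE + 1   -- rebuild once a day
AS
SELECT o.customer_id,
       COUNT(*)      AS order_count,
       SUM(l.amount) AS total_amount
FROM   orders o
JOIN   order_lines l ON l.order_id = o.order_id
GROUP  BY o.customer_id;

The report then queries report_summary instead of joining the 300-500 million row tables each time.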
I agree with the other users, you really need to give more info on the problem.
I have recently learned what dynamic SQL is, and one of its most interesting features to me is that we can use dynamic column and table names. But I cannot think of useful real-life examples. The only one that came to my mind is a statistics table.
Let's say that we have a table with name, type and created_data columns. Then we want a table whose columns are the years taken from the created_data column and whose rows show, for each type, the number of names created in each year.
What other useful real-life examples are there of using dynamic SQL with columns and tables as parameters? How do you use it?
Thanks for any suggestions and help :)
regards
Gabe
Edit:
Thanks for the replies. I am particularly interested in examples that do not involve administrative tasks, database conversion or the like; I am looking for examples where the equivalent code in, for example, Java would be more complicated than using dynamic SQL in, say, a stored procedure.
One example of using dynamic SQL is fixing a broken schema to make it more usable.
For example if you have hundreds of users and someone originally decided to create a new table for each user, you might want to redesign the database to have only one table. Then you'd need to migrate all the existing data to this new system.
You can query the information schema for table names matching a certain naming pattern or containing certain columns, then use dynamic SQL to select all the data from each of those tables and put it into a single table. The generated statement might look like this:
INSERT INTO users (name, col1, col2)
SELECT 'foo', col1, col2 FROM user_foo
UNION ALL
SELECT 'bar', col1, col2 FROM user_bar
UNION ALL
...
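A minimal sketch of how that statement could be generated rather than typed by hand - this assumes MySQL, a user_<name> naming pattern, and that every per-user table has the same col1/col2 columns; adjust to your actual schema:

-- Emit one INSERT ... SELECT per matching table; run the output (or feed it to PREPARE/EXECUTE).
SELECT CONCAT('INSERT INTO users (name, col1, col2) ',
              'SELECT ''', SUBSTRING(table_name, 6), ''', col1, col2 FROM ', table_name, ';')
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name LIKE 'user\_%';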
Then hopefully after doing this once you will never need to touch dynamic SQL again.
Long ago I worked with an application where users used their own tables in a common database.
Imagine that each user can create their own table in the database from the UI. To get access to the data in these tables, the developer needs to use dynamic SQL.
I once had to write an Excel import where the Excel sheet was not like a CSV file but laid out like a matrix. So I had to deal with an unknown number of columns for 3 temporary tables (columns, rows, "infield"). The rows were also a short form of a tree. Sounds weird, but it was fun to do.
In SQL Server there was no way to handle this without dynamic SQL.
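For illustration, a stripped-down sketch of that kind of dynamic SQL in SQL Server - the table name, column type and column count here are all invented:

-- Build a staging table whose column count is only known at runtime.
DECLARE @cols INT = 12;                -- number of matrix columns found in the sheet (hypothetical)
DECLARE @i INT = 1;
DECLARE @sql NVARCHAR(MAX) = N'CREATE TABLE dbo.matrix_stage (row_key INT';

WHILE @i <= @cols
BEGIN
    SET @sql += N', c' + CAST(@i AS NVARCHAR(10)) + N' NVARCHAR(255)';
    SET @i += 1;
END;

SET @sql += N');';
EXEC sp_executesql @sql;               -- the CREATE TABLE text is assembled and executed at runtime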
Another example comes from a situation I recently came up against: a MySQL database of about 250 tables, all using the MyISAM engine, with no database design schema, chart or other explanation at all - well, except the not-so-helpful table and column names.
To plan the conversion to InnoDB and find possible foreign keys, we either had to manually check all queries (and the conditions used in JOIN and WHERE clauses) generated by the web frontend code, or write a script that uses dynamic SQL to check all combinations of columns with compatible data types and compare the data stored in those column combinations (and then manually accept or reject these candidates).
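A rough sketch of the schema-level part of such a script, where the naming heuristic (<table>_id pointing at <table>.id) is an assumption about the conventions in use:

-- List column pairs with matching data types that might be foreign key candidates, for manual review.
SELECT c1.table_name  AS referencing_table,
       c1.column_name AS referencing_column,
       c2.table_name  AS referenced_table,
       c2.column_name AS referenced_column
FROM information_schema.columns c1
JOIN information_schema.columns c2
  ON  c2.table_schema = c1.table_schema
  AND c2.data_type    = c1.data_type
  AND c2.table_name  <> c1.table_name
  AND c2.column_name  = 'id'
  AND c1.column_name  = CONCAT(c2.table_name, '_id')
WHERE c1.table_schema = DATABASE();

The data-comparison step (checking whether every value in the referencing column actually exists in the referenced column) would then be generated as dynamic SQL from this candidate list.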