I'm hoping someone might be able to help me out with this one. I have 24 files in CSV format; they all have the same layout and need to be joined onto some pre-existing data. Each file has a single column that needs to be joined onto the rest of the data, but those columns all have the same name in the original files. I need the columns automatically renamed to the filename as part of the join.
The final name of each column needs to be of the form: Filename - data from another column.
My current approach is to use a Foreach Loop container and use the variable generated by the container to name the column, but there's nowhere I can input that value in the join, and even if I could, it would mess up the output mappings, because the column names would be different.
Does anyone have any thoughts about how to get around these issues? Whoever has an idea will be saving my neck!
EDIT: In case some more detail helps with this... The SSIS version is 2008 and there are only a few hundred rows per file. It's basically a one-time task to collect a full billing history from several bills which are issued monthly.
The source data has three columns: the product number, the product type, and the cost.
The destination needs to have 24*3 columns, each of which holds a monthly cost for a given product category. There are three product categories and 24 bills (in separate files), hence 24*3.
So hopefully I'm being a bit clearer - all I really need to know is how to change the name of a column using a variable passed in from the Foreach File container.
I think the easiest approach is to create a temp database (aka a staging DB), load the data from the source files into it, and define stored procedures where you can pass params (e.g. file names) to build your own logic...
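To sketch that idea (all table and column names below are assumptions, not from your post): have the Foreach loop bulk-load every CSV into one staging table, tagging each row with the loop's file-name variable, and then let a dynamic PIVOT generate the "Filename - ProductType" column names in one go:

-- Hypothetical staging table filled by the SSIS Foreach loop;
-- SourceFile is set from the loop's file-name variable.
CREATE TABLE dbo.BillStaging (
    ProductNumber INT           NOT NULL,
    ProductType   VARCHAR(50)   NOT NULL,
    Cost          DECIMAL(10,2) NOT NULL,
    SourceFile    VARCHAR(100)  NOT NULL
);

-- Build one output column per file/type pair, e.g. [Jan2012 - Fuel],
-- then pivot the costs under those generated names (SQL Server 2008).
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

SELECT @cols = STUFF((
    SELECT DISTINCT ', ' + QUOTENAME(SourceFile + ' - ' + ProductType)
    FROM dbo.BillStaging
    FOR XML PATH('')), 1, 2, '');

SET @sql = N'
SELECT ProductNumber, ' + @cols + N'
FROM (
    SELECT ProductNumber,
           SourceFile + '' - '' + ProductType AS ColName,
           Cost
    FROM dbo.BillStaging
) AS src
PIVOT (MAX(Cost) FOR ColName IN (' + @cols + N')) AS p;';

EXEC sp_executesql @sql;

Because the data flow only ever maps the fixed staging table, the changing column names never touch the SSIS output mappings.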
Cheers Mario
I have a CSV file having more than 700 columns. I just want 175 of those columns to be inserted into an RDBMS table or a flat file using Pentaho (PDI). Now, the source CSV file has variable columns, i.e. columns can be added or deleted, but they contain some specific keywords that remain constant throughout. I have the list of keywords present in the column names that have to be excluded, e.g. starts_with("avgbal_"), starts_with("emi_"), starts_with("delinq_prin_"), starts_with("total_utilization_"), starts_with("min_overdue_"), starts_with("payment_received_").
Any column which has the above keywords has to be excluded and should not pass through to my RDBMS table or flat file. Is there any way to remove the above columns by writing some SQL query in PDI? Selecting the specific 175 columns is not possible, as they are variable in nature.
I think your case is a good fit for metadata injection; you can refer to the example shared below:
https://help.pentaho.com/Documentation/7.1/0L0/0Y0/0K0/ETL_Metadata_Injection
Two things you need to be careful about:
1. Maintain a list of the columns you need to push in.
2. Since the column names change, you may face issues even with the valid columns you want to import or work with. To handle this, regenerate the metadata file every time, so you are sure about the column names you want to push out from the flat file.
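If you do want the SQL route you asked about, a hedged sketch (assuming SQL Server as the target RDBMS and that the file is first bulk-loaded into a staging table; csv_staging is an invented name): build the SELECT list from the catalog and skip the unwanted prefixes.

-- Collect every column of the staging table whose name does NOT start
-- with one of the excluded prefixes ([_] escapes the underscore wildcard).
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

SELECT @cols = STUFF((
    SELECT ', ' + QUOTENAME(COLUMN_NAME)
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'csv_staging'
      AND COLUMN_NAME NOT LIKE 'avgbal[_]%'
      AND COLUMN_NAME NOT LIKE 'emi[_]%'
      AND COLUMN_NAME NOT LIKE 'delinq[_]prin[_]%'
      AND COLUMN_NAME NOT LIKE 'total[_]utilization[_]%'
      AND COLUMN_NAME NOT LIKE 'min[_]overdue[_]%'
      AND COLUMN_NAME NOT LIKE 'payment[_]received[_]%'
    ORDER BY ORDINAL_POSITION
    FOR XML PATH('')), 1, 2, '');

-- The generated statement carries only the surviving ~175 columns.
SET @sql = N'SELECT ' + @cols + N' FROM dbo.csv_staging;';
EXEC sp_executesql @sql;

The metadata file mentioned above plays the same role: regenerated per run, so the surviving column list always matches the current file.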
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
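(For reference, the binding I'm trying to reproduce looks like this in plain T-SQL - a hypothetical sketch assuming SQL Server; in PDI the ? placeholders are instead filled, in order, from the fields of the step feeding the query:)

-- Named-parameter version of the same lookup; @Vendor stands in for ?.
DECLARE @sql NVARCHAR(200) =
    N'SELECT * FROM VENDORRATINGS WHERE VENDOR = @Vendor';

EXEC sp_executesql @sql,
     N'@Vendor VARCHAR(100)',    -- parameter declaration
     @Vendor = 'Acme Corp';      -- hypothetical value from the CSV field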
The best practice is a Stream lookup. For each record in the main flow (VendorRatings), look up the vendor details (lookup fields) in the reference file (the CSV), based on its identifier (possibly its number, or name, or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Given that:
- there is at most one VendorRating per vendor, and
- you have to do something if there is no match,
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e. the lookup table is the SQL table rather than the CSV file), and set a default value upon no match. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, a filter redirects the flow to the alternative action (here: inserting into the SQL table). Finally, the two flows are merged into the downstream flow.
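For comparison, the whole match-or-insert branch can also be written as one SQL statement - a hedged sketch, assuming the CSV rows are staged in a table first (VendorStaging and the Vendor column are invented names):

-- Insert only the staged vendors that have no rating row yet;
-- matched rows simply flow through untouched.
INSERT INTO VendorRatings (Vendor)
SELECT s.Vendor
FROM VendorStaging AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM VendorRatings AS vr
    WHERE vr.Vendor = s.Vendor
);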
I work in a sales based environment and our data consists of 'leads'.
Let's say we record CompanyName, PhoneNumber, Address1 & PostCode (ZIP). These rows are seeded with a unique ID in the schema.
The leads come in from various sources and are compiled onto a spreadsheet and then imported into SQL Server 2012 using SSIS.
After a validation check to see if a file exists we then use a simple data flow which consists of an Excel source, Derived Column, Data Conversion and finally an OLE DB Destination.
My requirement, I'm sure, has a relatively simple solution; understanding what I need to achieve is the first step. I need to take a sample of data from the last rolling two months: if 2 or more fields in the source Excel file match the corresponding fields in the destination SQL table, then I want to redirect those rows to another table.
I am unsure which combination of components I could use to achieve this. I believe that Fuzzy Lookup may not be what I am looking for, as I want to find exact field matches; I have looked at the Lookup component but am unsure if this is the way to go.
Could anyone please provide some advice on how I can best achieve this as simply as possible?
You can use the Lookup to check for matches in your existing table. However, it will be fairly complicated to implement the requirement of checking for any two or more fields matching. Your expression would be long and complex, basically consisting of (using pseudocode for readability):

IIF((a=a AND b=b) OR (a=a AND c=c) OR (b=b AND c=c) OR ... and so on

for every combination of two columns you want to test.
I would do this by importing the entire spreadsheet to a staging table, and doing the existing rows check in a SQL stored proc that moves the data to the desired destination table.
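A hedged sketch of that stored-proc check (every table and column name below is an assumption; LeadStaging holds the imported spreadsheet and CreatedDate stands in for whatever date column defines the rolling window):

-- Rows matching an existing lead on 2+ fields within the last two
-- months go to the duplicates table instead of the main table.
INSERT INTO dbo.DuplicateLeads (CompanyName, PhoneNumber, Address1, PostCode)
SELECT s.CompanyName, s.PhoneNumber, s.Address1, s.PostCode
FROM dbo.LeadStaging AS s
WHERE EXISTS (
    SELECT 1
    FROM dbo.Leads AS l
    WHERE l.CreatedDate >= DATEADD(MONTH, -2, GETDATE())
      AND (CASE WHEN l.CompanyName = s.CompanyName THEN 1 ELSE 0 END
         + CASE WHEN l.PhoneNumber = s.PhoneNumber THEN 1 ELSE 0 END
         + CASE WHEN l.Address1    = s.Address1    THEN 1 ELSE 0 END
         + CASE WHEN l.PostCode    = s.PostCode    THEN 1 ELSE 0 END) >= 2
);

Summing CASE expressions replaces the long OR-of-pairs expression: it grows linearly with the number of columns instead of with every two-column combination.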
I have a table in Access linked to a SharePoint list. The table is comprised of about 15 fields whose contents are originally pulled from another data source (in Excel format). There are an additional 10 or so fields after the original 15 that make up a questionnaire (added via SharePoint) that contain answers to questions about the first 15 fields.
The data in the first 15 fields needs to be updated periodically when new data from my external source is available to download. A lot of the information will remain the same, however some of the fields within each of the rows will change and need to be updated. It is also important that the 10 fields that contain the questionnaire are not modified at all during this process.
Is there a way for me to easily update the cells that have changed using an Update query or something similar? The data does have a unique identifier column (ID NUMBER) that is present on the current SharePoint list and the external data source.
I was thinking from a logical standpoint to put the new external data into a table, find the ID Number in the SP list and new external data, compare the values in the rest of the row on the SP list to the row of the external data, and if a value is different update the cell with the value from the external data. Not sure how to accomplish this using Access queries though.
I really appreciate any help at all! If you need more information, please let me know. If you think there's a more logical way to do this, please let me know your feedback!!
Here's how to get started:
http://workerthread.wordpress.com/2009/02/03/using-access-2007-to-update-sharepoint-lists/
After you get the connection set up, it's just a matter of writing the queries correctly. If you need to run multiple queries periodically, you can set up a form with buttons and attach some VBA code to the buttons that runs the queries.
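For the update itself, a minimal Access-SQL sketch (assuming the fresh download is imported into a local table called ExternalData, the linked SharePoint list is SPList, and Field1/Field2 stand in for the real 15 source fields - all invented names):

UPDATE SPList
INNER JOIN ExternalData
    ON SPList.[ID NUMBER] = ExternalData.[ID NUMBER]
SET SPList.Field1 = ExternalData.Field1,
    SPList.Field2 = ExternalData.Field2;

Because only the 15 source fields appear in the SET list, the 10 questionnaire fields are never touched, and re-running the query after each download simply overwrites the source fields with the latest values.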
See also: MS Access - execute a saved query by name in VBA
Ok, I've got a database table where data gets dumped by this horrid little program that I despise, but can't change at the moment. It has merchant data in there: names, addresses, and a set of categories that are pipe-delimited. What I need is a clean way to split these out, so I have one row for each merchant/category pair. From there, I can easily get it into the new data structure. This will need to be a repeatable process for a short period of time. I realize the optimal solution is to rid myself of this structure, but I've wracked my brain trying to figure out how to do this cleanly in SQL.
I already have a function in the database that will split a delimited string and return a table.
This is in SQL Server 2008, btw.
Edit (for clarity):
Basically, the following might be a merchant (with the categories attached; other fields redacted for simplicity, and commas used as field delimiters here):
Jimbo's Bait Shoppe, Bait|Sports Gear|Sandwiches
What I need is:
Jimbo's Bait Shoppe, Bait
Jimbo's Bait Shoppe, Sports Gear
Jimbo's Bait Shoppe, Sandwiches
If you have already written a function that splits the string and returns a table, you can use a trigger.
Create a trigger on INSERT on the table where the "horrid" program spits the data. The trigger will then take the unformatted data and populate two clean tables (I think in your case you should have two tables: one for merchants and another for their categories, linked in a one-to-many relationship via MerchantID).
The table with unformatted data then acts as a "dirty" staging table: you can cleanse it straight after the "horrid" program imports a file.
Please comment if you need help with the triggers.
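Here is a hedged sketch of that trigger (RawMerchants, MerchantName, Categories, and the split function's name and its Value column are all assumptions; substitute your real ones):

-- Fires after the "horrid" program inserts its rows, fanning each
-- pipe-delimited category list out into one row per merchant/category.
CREATE TRIGGER trg_RawMerchants_Insert
ON dbo.RawMerchants
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One clean merchant row per dumped row.
    INSERT INTO dbo.Merchants (MerchantName)
    SELECT i.MerchantName
    FROM inserted AS i;

    -- One row per merchant/category pair, using the existing
    -- split function via CROSS APPLY.
    -- (Joining back by name assumes merchant names are unique.)
    INSERT INTO dbo.MerchantCategories (MerchantID, Category)
    SELECT m.MerchantID, s.Value
    FROM inserted AS i
    JOIN dbo.Merchants AS m
        ON m.MerchantName = i.MerchantName
    CROSS APPLY dbo.SplitString(i.Categories, '|') AS s;
END;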