I have to check multiple tables available or not. please help me with the here ETL.PLC_ACCOUNT is a table and I have to check multiple tables screenshot
According to the screenshot you attached, you're trying to check the table name in a job entry. That only allows for either one static value or variable at the same time.
Instead of that, if you want to validate multiple tables use the transformation step "Table exists". Make sure to have a previous step to set a field containing the list of table names you want to validate.
Related
I am creating table using google drive as a source and google sheet as a format.
I have selected "Drive" as a value for create table from. For file Format, I selected Google Sheet.
Also I selected the Auto Detect Schema and input parameters.
Its creating the table but the first row of the sheet is also loaded as a data instead of table fields.
Kindly tell me what I need to do to get the first row of the sheet as a table column name not as a data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload at least to see the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar datatypes while schema auto detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
You have an option to skip header row in Advanced options. Simply put 1 as the number of rows to skip (your first row is where your header is). It will skip the first row and use it as the values for your header.
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Due to the fact that
There is at most one vendorrating per vendor.
You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
I work in a sales based environment and our data consists of 'leads'.
Let's say we record CompanyName, PhoneNumber, Address1 & PostCode(ZIP). These rows a seeded with a unique ID in the schema.
The leads come in from various sources and are compiled onto a spread sheet and then imported into SQL 2012 using SSIS.
After a validation check to see if a file exists we then use a simple data flow which consists of an Excel source, Derived Column, Data Conversion and finally an OLE DB Destination.
My requirement I'm sure has a relatively simple solution. I understand what I need to achieve is the first step. I need to take a sample of data from the last rolling two months, if 2 or more fields in the source excel file match the corresponding field in the destination sql table then I want to redirect to another table.
I am unsure of which combination of components I could use to achieve this. I believe that Fuzzy lookup may not be what I am looking for as I am looking to find exact field matches, I have looked at the lookup component but I am unsure if this is the way to go.
Could anyone please provide some advice on how I can best achieve this as simply as possible.
You can use the Lookup to check for matches in your existing table. However, it will be fairly complicated to implement the requirement of checking for any two or more fields matching. Your expression would be long and complex basically consisting of:
(using pseudo code for readability)
IIF((a=a AND b=b) OR (a=a AND c=c) OR (b=b AND c=c) OR ...and so on
for every combination of two columns you want to test
I would do this by importing the entire spreadsheet to a staging table, and doing the existing rows check in a SQL stored proc that moves the data to the desired destination table.
I have a requirement where a particular Kettle transformation (ktr) needs to be run multiple times.
This is the scenario:
A transformation has a table input which pulls user details belonging to a particular country.
I have almost 5 countries and this is saved in a separate table.
Can i make a variable and assign the country name to it and run the same transformation in a loop of five times, where every time the variable gets updated to the next country name.
I need the variable to be used in the table input query and in the column name also.
This is how i mentioned the variable in the table input.
When i am giving the variable as value, in the output i am getting '${COUNTRY
}' instead of the value of the variable
PDI allows you to do multiple iteration using a variable. You need to use "copy rows to result" in kettel step. I have a blog written on this topic.
Blog Link : https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Please check if it helps you. :)
I have problems with my records within my database, so I have a template with about 260,000 records and for each record they have 3 identification columns to determine what time period the record is from and location: one for year, one for month, and one for region. Then the information for identifying the specific item is TagName, and Description. The Problem I am having is when someone entered data into this database they entered different description for the same device, I know this because the tag name is the same. Can I write code that will go through the data base find the items with the same tag name and use one of the descriptions to replace the ones that are different to have a more uniform database. Also some devices do not have tag names so we would want to avoid the "" Case.
Also moving forward into the future I have added more columns to the database to allow for more information to be retrieved, is there a way that I can back fill the data to older records once I know that they have the same tag name and Description once the database is cleaned up? Thanks in advance for the information it is much appreciated.
I assume that this will have to be done with VBA of some sort to modify records by looking for the first record with that description and using a variable to assign that description to all the other items with the same tag name? I just am not sure of the correct VBA syntax to go about this. I assume a similar method would be used for the backfilling process?
Your question is rather broad and multifaceted, so I'll answer key parts in steps:
The Problem I am having is when someone entered data into this
database they entered different description for the same device, I
know this because the tag name is the same.
While you could fix up those inconsistencies easily enough with a bit of SQL code, it would be better to avoid those inconsistencies being possible in the first place:
Create a new table, let's call it 'Tags', with TagName and TagDescription fields, and with TagName set as the primary key. Ensure both fields have their Required setting to True and Allow Zero Length to False.
Populate this new table with all possible tags - you can do this with a one-off 'append query' in Access jargon (INSERT INTO statement in SQL).
Delete the tag description column from the main table.
Go into the Relationships view and add a one-to-many relation between the two tables, linking the TagName field in the main table to the TagName field in the Tags table.
As required, create a query that aggregates data from the two tables.
Also some devices do not have tag names so we would want to avoid the
"" Case.
In Access, the concept of an empty string ("") is different from the concept of a true blank or 'null'. As such, it would be a good idea to replace all empty strings (if there are any) with nulls -
UPDATE MyTable SET TagName = Null WHERE TagName = '';
You can then set the TagName field's Allow Zero Length property to False in the table designer.
Also moving forward into the future I have added more columns to the
database to allow for more information to be retrieved
Think less in terms of more columns than more tables.
I assume that this will have to be done with VBA of some sort to modify records
Either VBA, SQL, or the Access query designers (which create SQL code behind the scenes). In terms of being able to crunch through data the quickest, SQL is best, though pure VBA (and in particular, using the DAO object library) can be easier to understand and follow.