Set default value for column in Mosaic Decisions - mosaic-decisions

I'm using a data flow in Mosaic Decisions with a MySQL writer node. The result set that I'm going to write has a field inserted-time, but I want to skip the value in this column and use the default value set for that column in the DB table instead. How do I do that?

You can simply drag the column that you want to skip into the "skip-insert-column" section of the writer node.
For example, if the column "Target" is placed there, it will not be inserted into the target table, and whatever default value is set for that column in the DB table will be applied automatically.
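
At the SQL level, skipping a column on insert presumably amounts to leaving it out of the INSERT column list, so the database supplies the default. A minimal sketch of that behaviour, using Python's built-in sqlite3 as a stand-in for MySQL; the table and column names are made up for illustration and this is not how Mosaic Decisions is implemented:

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        inserted_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# inserted_time is left out of the column list, so the database
# fills in the column's default value on its own.
conn.execute("INSERT INTO events (name) VALUES (?)", ("order-created",))

row = conn.execute("SELECT name, inserted_time FROM events").fetchone()
print(row)
```

The same idea applies to a MySQL column declared with a DEFAULT clause: a column that is absent from the INSERT statement receives its default.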

Related

Create table schema and load data in bigquery table using source google drive

I am creating a table using Google Drive as the source and Google Sheets as the format.
I selected "Drive" as the value for "Create table from", and for the file format I selected Google Sheet.
I also selected Auto Detect Schema and input parameters.
It creates the table, but the first row of the sheet is also loaded as data instead of being used as the table's field names.
Please tell me what I need to do so that the first row of the sheet is used as the table's column names and not as data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload at least to see the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar datatypes while schema auto detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
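
To illustrate why an all-string sheet is ambiguous, here is a toy header heuristic in Python. It is only a sketch of the general idea (a first row that is text sitting above a column of non-text values looks like a header); it is not BigQuery's actual detection algorithm, and the function names are made up:

```python
def infer_type(value: str) -> str:
    """Classify a cell as 'int', 'float', or 'str' (toy type inference)."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "str"


def first_row_looks_like_header(rows: list[list[str]]) -> bool:
    """True only if some column has a text first cell above non-text data."""
    header, *data = rows
    if not data:
        return False
    for col, head_cell in enumerate(header):
        data_types = {infer_type(r[col]) for r in data}
        if infer_type(head_cell) == "str" and "str" not in data_types:
            return True
    return False


all_strings = [["headerA", "headerB"], ["row1a", "row1b"], ["row2a", "row2b"]]
mixed = [["name", "amount"], ["alice", "10"], ["bob", "12.5"]]

print(first_row_looks_like_header(all_strings))  # header cannot be told apart
print(first_row_looks_like_header(mixed))        # numeric column gives it away
```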
You have an option to skip the header row under Advanced options. Simply enter 1 as the number of header rows to skip (your first row is where your header is). It will skip the first row as data and use it as the column names instead.

Read for-each-loop container variable in OLEDB source variable windows

I created a table named QueryTable that stores 4 SQL queries, each with different metadata.
I want to store the results of these four queries in an Excel sheet.
1) First I added an Execute SQL Task and configured the connection, the query statement, and the Result Set as Full Result Set.
After that I opened the Result Set tab and created Query_variable with the Object type.
2) I dragged in a Foreach Loop container, set the Foreach ADO Enumerator in the Collection section, and assigned Query_variable.
In the Variable Mappings section I created a new String variable to hold each of the four queries in turn.
3) Finally I added a Data Flow Task with an OLE DB source configured with the same variable (the one I set in the Foreach Loop container).
Right now it shows the default value that I gave in User::Variable.
I can iterate over queries that have the same number of columns (metadata) and store them in the Excel destination.
But the problem comes when the variable moves to the next query, which has fewer or more columns. Here the package fails because it can't handle tables with different metadata.
Please assist me: can we iterate over queries with different metadata in the same run and get proper output?
I hope I have explained exactly the problem I am facing.
Set the default value of User::Variable to one of the queries, so that BIDS can validate it at design time.
You can also try setting "DelayValidation" to true, but that might not be enough in this case.
Set the delay validation to true for both the data flow and the for each loop container.

Add new column to existing table Pentaho

I have a table input and I need to add the calculation to it i.e. add a new column. I have tried:
to do the calculation and then, feed back. Obviously, it stuck the new data to the old data.
to do the calculation and then feed back but truncate the table. As the process got stuck at some point, I assume what happens is that I was truncating the table while the data was still getting extracted from it.
to use stream lookup and then, feed back. Of course, it also stuck the data on the top of the existing data.
to use stream lookup where I pull the data from the table input, do the calculation, at the same time, pull the data from the same table and do a lookup based on the unique combination of date and id. And use the 'Update' step.
As it has been running for a while, I am positive it is not the right option, but I have exhausted my options.
It seems that you need to update the table where your data came from with this new field. Use the Update step with fields A and B as keys.
Actually, once you connect the hop, the result of the first step is automatically carried forward to the next step. So let's say you have a Table input step and then you add a Calculator step where you create a third column. After writing the logic, right-click the Calculator step and click Preview, and you will get the result with all 3 columns.
I'd say your issue is not ONLY in the Pentaho implementation; there are some things you can do before reaching data staging in Pentaho.
'Workin Hard' is correct when he says you shouldn't use the same table, but should instead leave the input untouched and just insert the new values into a new table. It doesn't have to be a new table EVERY TIME; instead of truncating the original, you truncate the staging table (output table).
How many 'new columns' will you need? Will every iteration of this run create a new column in the output? Or will you always have a 'C' column which is always A+B or some other calculation? I'm sorry, but this isn't clear. If it is the latter, you don't need Pentaho for the transformation: updating the 'C' column with a math expression or function of A and B can be done directly in most relational DBMSs with a simple UPDATE statement. Yes, it can be done in Pentaho, but you're adding a lot of overhead and processing time.
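
A minimal sketch of doing the calculation directly in the database, assuming the new column is simply A+B. It uses Python's built-in sqlite3 so it can be run as-is; the table and column names are hypothetical, and the exact ALTER TABLE syntax may differ slightly in other DBMSs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, a REAL, b REAL)")
conn.executemany(
    "INSERT INTO measurements (id, a, b) VALUES (?, ?, ?)",
    [(1, 1.0, 2.0), (2, 3.5, 4.5)],
)

# Add the new column once, then fill it for every row with one statement.
conn.execute("ALTER TABLE measurements ADD COLUMN c REAL")
conn.execute("UPDATE measurements SET c = a + b")

rows = conn.execute("SELECT id, c FROM measurements ORDER BY id").fetchall()
print(rows)
```

Because the UPDATE runs inside the database, no rows are read out and written back, which avoids the append and truncate problems described in the question.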

PDI /Kettle - Passing data from previous hop to database query

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle": once the path of the CSV file is defined, press the Get field button.
It will take the first line as the header to learn the field names and scan the first 100 (customizable) records to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Due to the fact that
There is at most one vendorrating per vendor.
You have to do something if there is no match.
I suggest the following flow:
Read the CSV and, for each row, look up in the table (i.e., the lookup table is the SQL table rather than the CSV file). Put a default value when there is no match; I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirects the flow to the alternative action (here: insert into the SQL table). The two flows are then merged into the downstream flow.
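
Outside of PDI, the same flow can be sketched in plain Python to show how the "?" placeholder receives its value from each CSV row. This is only an illustration of the logic, not PDI itself; it uses the built-in csv and sqlite3 modules, and the sample data and the rating column are made up:

```python
import csv
import io
import sqlite3

NO_MATCH = "--- NO MATCH ---"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VendorRatings (vendor TEXT PRIMARY KEY, rating TEXT)")
conn.execute("INSERT INTO VendorRatings (vendor, rating) VALUES ('Acme', 'A')")

# Stand-in for the CSV file; only the vendor field is used for the lookup.
csv_text = "vendor,city\nAcme,Berlin\nGlobex,Paris\n"

results = []
for record in csv.DictReader(io.StringIO(csv_text)):
    vendor = record["vendor"]
    # The tuple after the SQL string binds the CSV value to the "?" placeholder.
    row = conn.execute(
        "SELECT rating FROM VendorRatings WHERE vendor = ?", (vendor,)
    ).fetchone()
    rating = row[0] if row else NO_MATCH
    if rating == NO_MATCH:
        # Alternative action on no match: insert the vendor into the SQL table.
        conn.execute("INSERT INTO VendorRatings (vendor) VALUES (?)", (vendor,))
    # Both branches continue into the same downstream list.
    results.append((vendor, rating))

print(results)
```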

How to specify a constant as the target value for a column in KTR

I am trying to set up a simple KTR to copy over data from one table to another. In the target table, I have a column called JobId which is not mapped to anything in the source table (the idea is to capture the KTR job id in this column).
I notice that if I do not include this column in the mapping, then the SQL generated drops the column from the table altogether (this happens irrespective of whether I set the update flag to Y or N), which is not what I want.
I would like to know how to set the target column to a constant or not alter it altogether.
Thanks
Use the Add constants step or the Get variables step. In Get variables you can fetch the Internal.Transformation.Name variable if you like.
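
In SQL terms, adding a constant to the stream before the table output is roughly equivalent to selecting a fixed value alongside the source columns. A small sketch using Python's built-in sqlite3; the table names and the JOB_ID value are hypothetical, and this only illustrates the idea rather than what the KTR generates:

```python
import sqlite3

JOB_ID = "copy_customers"  # hypothetical constant, e.g. the transformation name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT, JobId TEXT)")
conn.executemany("INSERT INTO source (id, name) VALUES (?, ?)", [(1, "a"), (2, "b")])

# Every copied row gets the same constant in the JobId column.
conn.execute(
    "INSERT INTO target (id, name, JobId) SELECT id, name, ? FROM source",
    (JOB_ID,),
)

copied = conn.execute("SELECT id, name, JobId FROM target ORDER BY id").fetchall()
print(copied)
```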