I have two questions about Pentaho Kettle, and I need some help, please!
I have a CSV file with some data. One of the columns contains dates (as years). The first problem is that some rows have "None" in that column, while other rows have the date in the right format.
This image should help to "see" the problem:
Problem One
To work around this, I changed the data type to String in both the input file and the database. That works, but I think that's not the correct way to do it. I also tried to use the "Filter rows" step, but it didn't work. Some help please? :)
The second problem is about a null value in the date field. The database expects to receive a date value, but some of the values are null. Once again, this image should help to "see" the problem:
Problem Two
What can I do to resolve both problems? What is the right way to not only resolve the problem, but also get good performance when querying the data later?
Thanks very much!
Best regards!
For the first problem, reading the column as a String in the input step is fine. After that, use a "Select values" step to change the String to a Date format.
For the second problem, use a "Filter rows" step to separate the rows that have "None", then replace "None" with null and link them to your next step.
For the "None" string value in the Year column, you can first read that column as String, then use the step called "Null if" and give "None" as the value to turn into NULL. Later you can change this Year column to Integer type in "Select values".
For the second problem, since your table design expects a non-null value for the date column, you could either change the NOT NULL constraint to nullable or, if you want a default value for such null values, use the step "If field value is null" and specify the default value there.
If you want to use the non-null date value from the previous row, you can set Repeat to Y in the Fields tab of the "Text file input" step.
Alternatively, for both cases, you can try to use a "Value Mapper" to map "None" to something your database can accept.
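To make the row-level logic of those steps concrete, here is a minimal sketch in plain Python (not Kettle code). The function names only mirror the step names, and the sample values and the default of 0 are assumptions for illustration.

```python
def null_if(value, marker="None"):
    """Mimic "Null if": turn the marker text into a real null."""
    return None if value == marker else value


def to_year(value):
    """Mimic the String -> Integer conversion done in "Select values"."""
    return None if value is None else int(value)


def if_null(value, default):
    """Mimic "If field value is null": substitute a default for nulls."""
    return default if value is None else value


# Hypothetical raw values as they would be read from the CSV as strings.
raw_years = ["1999", "None", "2004"]

years = [to_year(null_if(v)) for v in raw_years]   # nulls kept as None
filled = [if_null(y, 0) for y in years]            # nulls replaced by 0
```

Whether to keep the nulls or fill in a default depends on whether the database column is allowed to be nullable.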
I'm trying to find a simple way to create a new column that displays the difference between two existing columns (each with numbers), but I can't seem to find the proper GREL expression.
I'm trying to find the number of items sold from a column named "stock_before" and another named "stock_after".
I click on "Edit column" for the column "stock_before" and then "Add column based on this column".
The GREL expression I have entered is:
value-cells["Stock_after"]
It returns no syntax error, but all of the cells in the preview still say "null". I have transformed the values of the columns to numbers.
For Python I have tried:
substract(value,"Stock_after")
Same result: no syntax error, but everything is still null.
This seems so ridiculously simple but I couldn't find an answer... You can guess I'm fairly new to all this :) Hope someone out there can help me!
Thanks for having the patience to read this, and thanks for your time if you answer!
I'd like something similar to this (3 columns):
Stock_before, Stock_after, dif
1,1,0
3,1,2
4,4,0
2,1,1
In GREL, the expression cells["Stock_after"] returns a Cell object representing the corresponding cell, not the actual value of that cell. To get the value, you need to use cells["Stock_after"].value.
So your final GREL expression should be value - cells["Stock_after"].value.
You should also make sure your values are stored as numerals, not strings: they should appear in green in the table. If they do not, use a "To number" operation on both columns first.
You can find out more about GREL and Cell objects here:
https://github.com/OpenRefine/OpenRefine/wiki/Variables
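The difference between a cell and its value can be illustrated with a small stand-alone Python model. This is only a sketch: the Cell class below is a hypothetical stand-in for illustration, not OpenRefine's actual class, and the rows are the sample data from the question.

```python
class Cell:
    """Hypothetical stand-in for a cell object that wraps a value."""

    def __init__(self, value):
        self.value = value


# Sample rows from the question, each column holding a Cell wrapper.
rows = [
    {"Stock_before": Cell(1), "Stock_after": Cell(1)},
    {"Stock_before": Cell(3), "Stock_after": Cell(1)},
    {"Stock_before": Cell(4), "Stock_after": Cell(4)},
    {"Stock_before": Cell(2), "Stock_after": Cell(1)},
]


def dif(cells):
    # "value" plays the role of the current column's value (Stock_before).
    value = cells["Stock_before"].value
    # Subtract the wrapped value, not the Cell object itself.
    return value - cells["Stock_after"].value


difs = [dif(row) for row in rows]
```

Subtracting the Cell object directly would have no sensible meaning in this model either, which is the same kind of mistake as the original expression.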
I need to make a dimension for a data warehouse using Pentaho.
I need to compare a number in a table with the number I get from a REST call.
If the number is not in the table, I need to set it to a default (999). I was thinking of using a Table input step with a select statement, and a JavaScript step that sets the result to 999 if it is null. The problem is that if there is no result, nothing is passed through. How can this be done? Another idea was to get all values from that table and somehow convert them so I can read them as an array in JavaScript. I'm very new to Pentaho DI; I've done some research but couldn't find what I was looking for. Does anyone know how to solve this? If you need more information, or want to see my transformation, let me know!
Steps something like this:
Load number from api
Get Numbers from table
A) If number not in table -> set number to value 999
B) If number is in table -> do nothing
Continue with transformation with that number
I have this atm:
But the problem is that if the number is not in the table, it returns nothing. I was trying to check in JavaScript whether the number is null or 0 and then set it to 999.
Thanks in advance!
Replace the Input rain-type table with a Stream lookup step.
Read the main input with a REST step and the dimension table with a Table input step, then add a Stream lookup step in which you specify the dimension Table input as the lookup step. In this step you can also specify a default value of 999.
The Stream lookup works like this: for each row coming in from the main stream, the step checks whether it exists in the reference step and adds the reference fields to the row. So for every incoming row, exactly one row always passes through.
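The behaviour described above is roughly that of a dictionary lookup with a default. Here is a minimal Python sketch (not Kettle code); the field names and table contents are made up for illustration.

```python
DEFAULT = 999

# Hypothetical contents of the dimension table (the reference stream).
dimension_rows = [{"number": 10}, {"number": 20}]
lookup = {row["number"]: row["number"] for row in dimension_rows}


def stream_lookup(main_rows):
    """Emit exactly one output row per input row, with a default on a miss."""
    out = []
    for row in main_rows:
        found = lookup.get(row["number"], DEFAULT)
        out.append({**row, "dim_number": found})
    return out


# Hypothetical rows coming from the REST call (the main stream).
result = stream_lookup([{"number": 10}, {"number": 77}])
```

Because the main stream drives the output, a missing number still produces a row, which is what the Table input approach could not do.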
I have received help with splitting a column that contains a number and a letter.
The SQL script works perfectly: it runs to completion with no errors.
But the columns themselves don't get filled.
I have tried creating the columns in advance as text or as integer, but they don't get filled. The SQL query itself turns out OK, but in reality the columns stay empty. What is wrong?
Your question is not completely clear, but it sounds like what you are trying to do is take a value from one column of a table, split it and use the result to update two other columns in the same table.
If that is the case, you would want to be using the SQL UPDATE command instead of SELECT.
UPDATE d1_plz_whatever
SET nr = SUBSTRING(hn FROM '^[0-9]+'),
    zusatz = SUBSTRING(hn FROM '[a-zA-Z]+$');
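If you want to check what the two patterns pick out before running the UPDATE, here is a quick Python sketch using the same regular expressions. The function name and the sample values are assumptions for illustration.

```python
import re


def split_house_number(hn):
    """Return (leading digits, trailing letters); None where a part is missing."""
    nr = re.search(r"^[0-9]+", hn)
    zusatz = re.search(r"[a-zA-Z]+$", hn)
    return (
        nr.group(0) if nr else None,
        zusatz.group(0) if zusatz else None,
    )


# Hypothetical sample values for the hn column.
samples = ["12a", "7", "12 b"]
parts = [split_house_number(hn) for hn in samples]
```

A value without a letter suffix leaves the second part empty, so expect some rows to keep a null in zusatz after the update.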
I have an Access database where "Orders" is my table, with the column CusID set to AutoNumber with the format "CUS"0001.
I'm trying to read an AutoNumber with the custom format "CUS0001" from VB.NET, but I can't seem to read it.
I've tried to read it all as a string, but that doesn't work either.
cmdCustomer.CommandText = "Select * From Orders Where CusID = " & (txtCusID.Text) & ";"
Any help would be greatly appreciated! Thank you :)
As the name suggests, AutoNumber values are numbers. "CUS0001" is clearly not a number so clearly cannot be stored in that column. When you specify a format in Access, that relates ONLY to how the Access application displays that data. It says nothing about how the data is stored. If Access displays a value in that column as "CUS0001" then the column actually contains the number 1 and that is all that your VB app will see, so that's how you have to query it. Also, if you want the value displayed as "CUS0001" in your app then YOU are going to have to format it that way.
It's also worth noting that, if you really did want to search for "CUS0001" then you'd have to wrap that value in single quotes in your SQL code or else you're going to get a syntax error. That said, it shouldn't matter because you should be using a parameter to insert that value into your SQL code.
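Here is a minimal sketch of that idea: strip the display prefix, query by the stored number with a parameter, and re-apply the format only for display. It uses Python with an in-memory SQLite table as a stand-in for Access and VB, so the helper names, the Item column and the sample row are all assumptions.

```python
import sqlite3


def parse_cus_id(text):
    """Strip the display-only "CUS" prefix and return the stored number."""
    text = text.strip().upper()
    if text.startswith("CUS"):
        text = text[3:]
    return int(text)


def format_cus_id(number):
    """Re-apply the display format in the application, e.g. 1 -> "CUS0001"."""
    return f"CUS{number:04d}"


# In-memory stand-in for the Orders table; the column really stores 1, 2, 3...
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (CusID INTEGER PRIMARY KEY AUTOINCREMENT, Item TEXT)"
)
conn.execute("INSERT INTO Orders (Item) VALUES (?)", ("widget",))

# Parameterised query: the value is passed separately, not concatenated.
row = conn.execute(
    "SELECT CusID, Item FROM Orders WHERE CusID = ?",
    (parse_cus_id("CUS0001"),),
).fetchone()
display = format_cus_id(row[0])
```

The same pattern applies in VB with a parameterised command: pass the number as a parameter value and format it for display afterwards.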
In my Access database, I have a table called customers. In this table I have a column called DateEntered. The data type for the field is short text.
The values in this column are not consistent; they come in several variations:
MM-DD-YYYY,
MMDDYYYY and
MM/DD/YYYY.
There doesn't seem to be any standard set.
My goal is to select all customers from 2012. I tried
select *
from customers
where DateEntered <('%2013') AND >('%2012');
but it comes up blank when I run it.
Can anyone point out what I'm failing to do correctly & more importantly explain why exactly this query doesn't work in Access? From my understanding of SQL (not very advanced) this should work.
Another variant:
select * from customers where RIGHT(DateEntered, 4) = '2012'
If you have control over the database and application code, the best way to handle this is to use an actual Date field instead of text in the table.
One way to handle this would be to add a new field to the table, write a query or two to correctly convert the text values to actual date values, and populate the new field.
At this point, you would then need to hunt down the application code that refers to this field in any way and adjust it to treat the field as a date, not text. This includes your insert and update statements, report code, etc.
Finally, as a last step, I would rename the original text field (or remove it altogether) and rename the new date field to the original field name.
Once you fix the problem, querying against the field will be a piece of cake.
Alternatively, if you can't alter the table and source code, you can use the date conversion function CDATE() to convert the text value to an actual date. Note that you may need to guard against non-date entries (NULL or empty string values, as well as other text values that aren't really dates in the first place). The IsDate() function can be your friend here.
If you have the time and patience, fixing the data and code is the better approach to take, but this isn't always feasible.
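For the clean-up itself, the conversion boils down to trying each known format in turn and guarding against non-dates. Here is a minimal Python sketch of that logic (not Access code). The three formats come from the question; the function name and sample values are assumptions.

```python
from datetime import datetime

# The three layouts mentioned in the question: MM-DD-YYYY, MMDDYYYY, MM/DD/YYYY.
FORMATS = ("%m-%d-%Y", "%m%d%Y", "%m/%d/%Y")


def parse_date_entered(text):
    """Return a date for any of the known layouts, or None for non-dates."""
    if not text:
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # not this layout, try the next one
    return None


# Hypothetical sample values, including entries that are not dates.
raw = ["03-15-2012", "03152012", "11/02/2013", "", "n/a"]
parsed = [parse_date_entered(v) for v in raw]
from_2012 = [d for d in parsed if d is not None and d.year == 2012]
```

Once the values are real dates, selecting a year is a simple comparison instead of string matching.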
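Why don't you use the LIKE operator? It's appropriate when you have a pattern. Since you only want 2012, a single pattern is enough. Note that Access uses * and ? as wildcards in its default (ANSI-89) query mode; % and _ only work in ANSI-92 mode or with ALike:
select * from customers where DateEntered like '*2012'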