How do I create a column based on other columns in Pentaho? - pentaho

For example,
I'm trying to create a Color column based on a Feeling column. If Feeling == Angry, Color = Red. Otherwise, Color = Blue. At the same time, I'm trying to pull data from another column as well. Let's say, If Temperature == Hot, Color = Red; otherwise Color = Blue.
Is there a step in Pentaho that I can use to do this?

There's not one step that can do both of these, but it can be done with a combination of steps. Look at the Value mapper, the Calculator, and the Set field value steps. For more complex calculations, there's the Formula step, and if worse comes to worst, you can always fall back on the Modified Java Script Value step.
Those steps should get you there. If you want a more exacting description, you'll have to post more detail about how your calculation works.

Related

Color a specific range of cells in the same row

Similar to a Gantt Chart but not quite the same, I'm trying to color a variable range of cells on the same row.
As you can see I have a variable number of tasks and a variable start and end for each task
I've created a simple time series where the "Machine 1", which is in the row number 6, is making all the tasks. The time series is in the row number 3.
So basically, I need to color the ranges given by the tasks table in the Machine 1 row, using that time series as a reference to start coloring the cells. Each task has to have a different color.
As I'm a beginner in VBA, I've been trying to do this using a formula:
=IF(AND($C4<=AK$3;$D4>=AK$3);1;0)
Being C4 the start of the task, AK3 the place of the time series right now, and D4 the end of the task. Then I would fill the whole Machine 1 row. This would give a 1 in the range of the task and a 0 before and after the task, then I could format the row and color the cell by the given value in each cell. (1 color and 0 blank)
The problem is that this only works for one task and I really don't know how to change the formula to add the other tasks. I'm pretty sure this can be done in VBA but, like I said, I'm still a beginner. Please help me
The final answer should look like this. The color doesn't matter, it's how to color the variable ranges automatically the problem
If I understand what you want correctly, this can be done with conditional formatting:
Use four conditions, one for each of the following formulas:
=COLUMN(AK6)<=$D$7+37
=COLUMN(AK6)<=$D$6+37
=COLUMN(AK6)<=$D$5+37
=COLUMN(AK6)<=$D$4+37
and apply them in that order to $AK$6:$BZ$6

Conditional cell formatting on SSRS pivot table

I created a pivot table in SQL that has report names along the left side, and hours (00:00, 00:01, etc.) along the top. The values in the table are the number of times each report has been used during that hour over the past three months. I've imported the table into SSRS, and I'm trying to create a heat map of sorts. I want to color the cells darker or lighter across the row based on the number in each cell compared to the value of cells across the row (cell that has the highest value will be the darkest colored).
I've tried following this guide to color the cells, but here the entire row is one field, while I have separate fields for each column. Is there a way to achieve this?
EDIT: Added picture of table design, and preview where coloring is done incorrectly
I understand your problem better now...The function uses the min and max values of a column to determine the range from lightest to darkest, then it probably looks at what fraction of the range your actual value is. In your case where you have each column's data coming from a different cell it'll be a pain unless your columns are fixed and even then it's more trouble than it needs to be.
I would suggest the following.
DON'T PIVOT your data in SQL, we can do that really easily in SSRS, your dataset will be simpler too something like
ReportName Hour UsageCount
ReportA 0 8
ReportA 1 4
ReportC 22 18
and so on...
Create a new report and add a matrix with reportName as the row group and hour as the column group. The data values will be UsageCount.
That's it for the report design, then just set the cells back ground based on your function but this time you can pass in Max(Fields!UsageCount.Value) etc as per the sample.
I've rushed this a bit so it if not clear, let me know and I'll post a clearer solution.

OpenRefine - Fill between cells but not at the end of the list

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.
The gaps are not consistent. Some are two days and some are more than that.
I want to fill the gaps with the last known value but not at the end of the list.
I have tried in Excel to test a few cells below and if it's now empty, do the fill. The problem is that due to the inconsistency of the gaps, it's a tedious task to change the function for all the cases.
Is there a way to test for the end of a list?
UPDATE - added a screenshot.
See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.
I am looking for a way to detect the end of the list and stop the filling when the end is detected.
I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200
However, you can do this by forcing OpenRefine in Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:
row.record.cells["Column name"].value
This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.
So I think you could probably achieve what you want as follows:
For each column you are going to work with do a cell transform to put a dummy value in empty cells - e.g. if(isBlank(value),"null",value)
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode
At this point you should have a single 'Record' in your project - e.g.
You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.
I used the following formula to add a new column to the project:
with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
Then...
Change back to 'Row' mode
Remove the 'null' placeholder from the original column
Create a facet on the 'fill filter' column
In my case I filter to 'a'
Use the 'fill down' option
Remove the filter
And remove the 'record' column
Rather a long winded way of doing it to say the least, but so far I've not been able to find anything better while not going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or smaller number of steps.
If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:
row["record"]["cells"]["Column 1"]["value"]
instead of
row.record.cells["Column 1"].value
(step 5)
I am doing this on the top of my head, but I think your best chance my be using the fill down option in record mode:
first move your column to the first column and switch to record mode.
then use the following GREL: row.record.cells["data"].value[-1] where data is the name of your column
The [-1] will take the last value and fill the blank. For the case with the red dot, since there is no value it should remains empty. Let us know how it goes.
Unless there's something I am missing or not seeing...
I would have just sorted reverse (date ascending) on the Date column, then individually use Fill Down on each column, except for that last column where you could then use a Date facet on your column Date to specify the exact Date range you wanted to work with, then fill down on that last column, then remove the Date range facet.

Reporting Services - Two filters on the same chart Category Group?

I have sales data that I'd like to plot on my chart. However, at a specific point in time, we had a change taking place I'd like to ensure is clearly visible in the chart, preferably by dividing the sales data (which is stored in a single SQL Server column) into two different chunks, which would allow me to then treat them as different data series.
I used to solve this in Excel by storing the post-event data in a different column (by simply dragging them to a different column), and thus I was able to treat them as a different series (the blue and green line in the chart below. The red and orange line are pre-event and post-event averages):
I'd like to reproduce this effect in SSRS, but am not sure how to tackle it. I've tried using an approach where I added two category groups, both pointing to the date-time column, and applying filters to them (one <= the cutoff date, the other >=).
I then added my sales data twice, with the idea I could somehow connect them to the individual category groups, but that does not seem possible.
Has anyone tried anything like this before, or would have a different approach to achieve what I'm trying to get?
Thanks!
I managed to get this to work, and figured I'd share how to do it.
My dataset contains a field called DATEKEY, which stores the date in the format YYYYMMDD. It's possible to use this in an expression and evaluate the date for a specific row. In case the expression evaluates to true, we display the value. If not, we display a blank string.
In case we want to show the values prior to the date, the expression would be:
=IIF(Fields!DATEKEY.Value <= 20130601, Avg(Fields!My_NUMBER.Value), "")
The second series can then be made by reversing the symbol:
=IIF(Fields!DATEKEY.Value >= 20130601, Avg(Fields!My_NUMBER.Value), "")
The graph then looks like this:

How to make rows invisible while printing reports?

I am using an .rldc file to define the layout of the reports from my program. The problem is, it is to be used for incremental printing. That means the paper will be used over and over as newer rows need to be printed. I'm attempting to approach it this way:
List all corresponding data on the report view.
Make the older rows invisible and only show the latest row.
Print.
That way, the last row is already properly placed. The problem is, I don't know how to implement this. Can anyone help me out?
You could create an IIF(condition,true,false) statement in your report definition on the row visibility variable.
The best way i guess is to define in your data source something of a rank column.
example :
select col1,col2,col3,RANK() OVER (ORDER BY col3 DESC) AS 'rank' from table1
Then in your table or matrix, you click on the row or/and column that you want to make the borders and text white based on a expression.
Go to the properties and dropdown on bordercolor
choose expression and type in (based on my example query)
=IIf(rank.value <> max(rank.value),White,Black)
That will not remove the rows only make the borders white ( unvisible)
The same you can do with Font Color property.
I think this is your best shot at this issue.
Other solution I could think of is to just hide unneccesary rows (which also replaces the visible row)
Then to move the table down by using a expression with a formula like nr of rows hidden before the actual row * height of 1 row, only I m not sure if this is applicable without programming an RDL extension..
Good Luck !