pentaho process selected rows - pentaho

Objective
I have an Excel sheet that has 10 rows. I want to select only rows 5 and 6.
What I tried
I tried setting the limit in Excel Input -> Content, but with a limit I only get the rows up to that limit. Can anyone tell me how to select only rows 5 and 6?
Updated
specified start row = 5 in Sheets Tab
Without specified start row

First of all, you can specify the filter in the Content section, and it works perfectly fine; I have checked.
To get your output, you can simply use a Filter rows step and specify the values you want in its condition.

With the Excel input step, you can specify a range on the Sheets tab: set the start row to 5 and the start column to 1. On the Content tab, also specify that there is no header and set the limit to 2.
This should work. However, that answer is a bit academic, and I would suggest @Working Hard's answer instead: read everything, then use a Filter rows step. In this step you can combine more than one condition, such as row=5 OR row=6, or row>=5 AND row<=6. To do this, enter the first condition, then click the small green + at the top right and enter the second condition. Afterwards, you can click on the AND to change it to OR (among other operators).
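The row-number condition itself can be sketched outside Pentaho. This is plain Python rather than PDI code; `rows` and `select_rows` are hypothetical names, and it assumes rows are numbered from 1 as in the question.

```python
# Hypothetical stand-in for the 10 rows read from the Excel sheet.
rows = [f"row {n}" for n in range(1, 11)]


def select_rows(rows, first, last):
    """Keep rows whose 1-based row number n satisfies first <= n <= last."""
    return [row for n, row in enumerate(rows, start=1) if first <= n <= last]


selected = select_rows(rows, 5, 6)
print(selected)  # ['row 5', 'row 6']
```

The condition `first <= n <= last` corresponds to row>=5 AND row<=6 in the Filter rows step.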

Related

How to use VBA to filter a column using a value from a specific cell

I want to use a macro to filter columns in a table. I want to filter for values that are higher than a value I put in a cell, so that I can easily change the filter. Does someone have a trick for doing this with VBA?
Many thanks, Bram
Record a macro whilst filtering a table on a column value. You would right click on the table column header of interest whilst recording the code and select Number_Filters > Greater Than and enter your desired number. That would give you the outline code. You can then amend the code to pick up the desired value from a specified cell. If applying filter to multiple columns record macro whilst doing this process over several columns.
Thank you for your answer. I tried this already, but I could not get the macro to pick up the value from a specific cell. When I stored the value of that cell in a variable named 'value' and put that in the recorded code, it just filtered on Greater Than value. Do you have a shortcut for this?
Thanks!

How can I merge header cells in Excel writer in Pentaho?

I am trying to merge the header cells of several columns into one cell, but when I do that my data also ends up in one column. I want my output to look like the attached screenshot. Kindly help me with this.
Are your columns variable? Or you always have the same output schema?
If it's fixed then, I would use a template where the headers are fixed and I start populating from row 5.
Google Spreadsheet input
If you are using the Spreadsheet input, that is not possible in the step.
What I usually do in that kind of situation is create a row with my headers and hide it, so the user doesn't get confused by two header rows. Then the step will read the result correctly, using the column names provided in the first row. (You can use a formula like =b3 there so that it follows the real header. No problem.)
Excel input
If you are using the Excel input step, you can set the sheet to be read from row 2, column 0, and it should work fine. =)

OpenRefine - Fill between cells but not at the end of the list

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.
The gaps are not consistent. Some are two days and some are more than that.
I want to fill the gaps with the last known value but not at the end of the list.
I have tried in Excel to test a few cells below and, if they are not empty, do the fill. The problem is that, due to the inconsistency of the gaps, it's a tedious task to change the function for all the cases.
Is there a way to test for the end of a list?
UPDATE - added a screenshot.
See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.
I am looking for a way to detect the end of the list and stop the filling when the end is detected.
I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200
However, you can do this by forcing OpenRefine in Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:
row.record.cells["Column name"].value
This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.
So I think you could probably achieve what you want as follows:
For each column you are going to work with do a cell transform to put a dummy value in empty cells - e.g. if(isBlank(value),"null",value)
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode
At this point you should have a single 'Record' in your project.
You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.
I used the following formula to add a new column to the project:
with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
Then...
Change back to 'Row' mode
Remove the 'null' placeholder from the original column
Create a facet on the 'fill filter' column
In my case I filter to 'a'
Use the 'fill down' option
Remove the filter
And remove the 'record' column
Rather a long winded way of doing it to say the least, but so far I've not been able to find anything better while not going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or smaller number of steps.
If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:
row["record"]["cells"]["Column 1"]["value"]
instead of
row.record.cells["Column 1"].value
(step 5)
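For comparison, the underlying rule (fill a blank only if some real value still follows it) can be sketched in a few lines of Python. This is not OpenRefine code; `fill_interior_gaps` and the sample `prices` are hypothetical, and blanks are assumed to be represented as `None`.

```python
def fill_interior_gaps(values):
    """Fill None gaps with the last known value, leaving trailing gaps empty."""
    # Index of the last real value; everything after it is the end of the list.
    last_known = max((i for i, v in enumerate(values) if v is not None), default=-1)
    filled = []
    previous = None
    for i, v in enumerate(values):
        if v is None and i < last_known:
            v = previous  # gap inside the series: carry the last value forward
        filled.append(v)
        if v is not None:
            previous = v
    return filled


prices = [10.0, None, None, 10.5, None, 11.0, None, None]
print(fill_interior_gaps(prices))
# [10.0, 10.0, 10.0, 10.5, 10.5, 11.0, None, None]
```

The `i < last_known` check plays the same role as the "a"/"b" marker column above: a blank is filled only while at least one real value remains below it.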
I am doing this off the top of my head, but I think your best chance may be using the fill down option in record mode:
first move your column to the first column and switch to record mode.
then use the following GREL: row.record.cells["data"].value[-1] where data is the name of your column
The [-1] will take the last value and fill the blank. For the cases with the red dots, since there is no value, the cell should remain empty. Let us know how it goes.
Unless there's something I am missing or not seeing...
I would have just sorted reverse (date ascending) on the Date column, then individually use Fill Down on each column, except for that last column where you could then use a Date facet on your column Date to specify the exact Date range you wanted to work with, then fill down on that last column, then remove the Date range facet.

How to randomly distribute a known group of numbers into a column using Excel / VBA

I'm stuck with excel/vba:
I've got a 10 row x 30 column blank array in Excel. I am trying to distribute 10 integers from a known group of 10 (say 1,1,1,1,1,1,3,5,7,9) into each column randomly so that each row of the column contains one of the group (and all of the group members are used once), and I need the second column to contain another random distribution of the same group and so on.
So I'd end up with 30 columns of 10 rows each, with each column containing a different random distribution of the same 10 integers. I want to be able to change the distribution in each row by recalculating the spreadsheet too.
Is there a quick way to do this? Short of arranging 30 different rand() sorted lists and using lookups I couldn't see a way. I'm not savvy enough with VBA to have a go. If someone can point me in the right direction, I'd be eternally grateful!
Perhaps I'm missing something obvious, though this does not seem to be so straightforward using worksheet formulas alone.
If your original list of values is in A1:A10, then, in B1:
=INDEX($A$1:$A$10,RANDBETWEEN(1,10))
and in B2, array formula**:
=INDEX($A$1:$A$10,INDEX(MODE.MULT(IF(COUNTIF($A$1:$A$10,$A$1:$A$10)-COUNTIF(B$1:B1,$A$1:$A$10),{1,1}*ROW($A$1:$A$10))),RANDBETWEEN(1,10-ROWS($1:1))))
Copy the above down to B10.
You can then copy the formulas in B1:B10 to the right as desired.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
You could write a loop in which you build an array with your 10 numbers. Then loop through the 30 columns, each time adding a second column of 10 randomly drawn numbers to your array. See this website on how to draw random numbers. Then sort the array on the second column and output the first column.
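A minimal sketch of that sort-on-a-random-key idea, written in Python rather than VBA; `random_columns` is a hypothetical helper, and the group is the example from the question.

```python
import random


def random_columns(group, n_columns, rng=random):
    """Return n_columns independent random orderings of group, one list per column."""
    columns = []
    for _ in range(n_columns):
        # Pair each value with a random key, then sort on the key only.
        keyed = [(rng.random(), value) for value in group]
        keyed.sort(key=lambda pair: pair[0])
        columns.append([value for _, value in keyed])
    return columns


group = [1, 1, 1, 1, 1, 1, 3, 5, 7, 9]
columns = random_columns(group, 30)

# Print as 10 rows x 30 columns, like the worksheet layout.
for row in zip(*columns):
    print(" ".join(str(v) for v in row))
```

Every column uses each member of the group exactly once, and rerunning the script gives a new distribution, which is roughly what recalculating the sheet would do.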
Edit:
As I read in the comments on the other answer, the purist solution would be to:
Assign each unique option of values a random value
Sort these random values either from top to bottom or bottom to top, and select the top one.
Place it in the first row
Do the same thing again for the second row, but keep track of how often each unique option has been used, so as to rule out an option once it has reached its maximum number of occurrences.
Edit2:
Just after I clicked post, I thought this through a bit more and came to the conclusion that the last digit will almost always be 1 in this case....

Print When (last element reached) Expression in JasperReports

Is it possible to write a "Print When Expression" that detects the last element in an XML datasource file?
Basically I have a report with a column break inserted after a sub-report in a detail band so I can clearly define new pages for the beginning of a new record. But it always leaves me with a blank last page. So I am hoping that I can prevent this if I have a print when condition that prevents the column break if it is the last record element in the XML datasource.
Is this even possible?
The problem is that you don't know it's the last element until after you look for the next element. I don't think there is a simple way.
In principle it should be fine to do something like this:
Create a super-report around the entire report. Run the same query in the super-report. Count the rows. Then pass the number of rows to the original report (which is now a subreport) and re-run the query again. Clearly, running the query twice is another drawback.
If the data source were SQL, then I would suggest modifying the SQL to return the number of rows as part of the result set. But for non-SQL data sources, you need some way of knowing the number of rows (well... some way of identifying the last row) before you reach the last row.
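The count-first idea can be illustrated independently of JasperReports. The sketch below is plain Python, not the JasperReports API; `records_with_breaks` is a hypothetical name, and it assumes the records fit in memory so they can be counted before the second pass.

```python
def records_with_breaks(records):
    """Yield (record, needs_break); needs_break is False only for the last record."""
    records = list(records)  # first pass: materialise the data so the count is known
    total = len(records)
    for count, record in enumerate(records, start=1):
        # Second pass: a break is wanted after every record except the last one.
        yield record, count < total


for record, needs_break in records_with_breaks(["A", "B", "C"]):
    print(record, "break" if needs_break else "no break")
```

Comparing a running count with a known total is the same test as the expression based on $V{REPORT_COUNT}, only negated.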
Many years late...
If you are sure your datasource is a JRBeanCollectionDataSource, you could use:
$V{REPORT_COUNT} == ((net.sf.jasperreports.engine.data.JRBeanCollectionDataSource)ORIGINAL_DATA_SOURCE( )).getData().size()