How can I merge header cells in the Excel Writer step in Pentaho?

I am trying to merge the header cells into one cell, but when I do that my data also ends up in one column. I want the output to look like the attached screenshot. Kindly help me with this.

Are your columns variable, or do you always have the same output schema?
If it's fixed, I would use a template where the headers are already in place and start populating the data from row 5.

Google Spreadsheet input
If you are using the Spreadsheet input, that is not possible in the step itself.
What I usually do in that situation is create a row with my headers and hide it, so the user doesn't get confused by two header rows. Then the step will read the result correctly, using the column names provided in the first row. (You can use a formula like =B3 there so it follows the real header.)
Excel input
If you are using the Excel input step, you can set the sheet to be read from row 2, column 0, and it should work fine. =)

Related

Create table schema and load data in a BigQuery table using Google Drive as the source

I am creating a table using Google Drive as the source and Google Sheet as the format.
I have selected "Drive" as the value for "Create table from". For the file format, I selected Google Sheet.
I also selected Auto Detect Schema and input parameters.
It creates the table, but the first row of the sheet is also loaded as data instead of being used as the table's field names.
Kindly tell me what I need to do to get the first row of the sheet treated as the table's column names and not as data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload at least to see the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar datatypes while schema auto detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
You have an option to skip header row in Advanced options. Simply put 1 as the number of rows to skip (your first row is where your header is). It will skip the first row and use it as the values for your header.

Formulas to Switch Row and Column References in Excel

I have raw data that is arranged like this:
The top row is the trial number. Then the first entry in each row is an ID.
I'm trying to set up a template for this data. When this data is pasted in, I want it to be read by another sheet. This sheet will automatically transpose the data, so that it looks like this:
On this sheet, then, I've been trying to write a formula that will increment horizontally when dragged down vertically. When I copy the formula horizontally, I would need it to increase the row reference, not the column reference, so that it'll reproduce the end result in the screenshot above.
I've tried variations on a formula like
INDEX('Asset Returns'!$B$2:$Z$2,COLUMN()-1,ROW()-1)
but I haven't been able to get it working as described. Thanks in advance for any suggestions on what I'm doing wrong.
Not sure if I've understood, but if what you're doing is transposition and you want to do it via a common formula in all the destination cells, you can use this:
=INDIRECT(ADDRESS(COLUMN(A1),ROW(A1),1,1,"Asset Returns"))
This should be pasted into the A1 cell of your "transposed" sheet, or adjust the cell reference accordingly if that isn't where the data is.
Another option is:
=OFFSET('Asset Returns'!$A$1,COLUMN(A1)-1,ROW(A1)-1)
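Both formulas rely on the same index swap: the destination cell at (row, column) reads the source cell at (column, row). A minimal sketch of that mapping in Python, using made-up sample values in place of the 'Asset Returns' sheet (the data and the function name are assumptions for illustration only):

```python
# Hypothetical sample standing in for the 'Asset Returns' sheet:
# the top row holds trial numbers, the first column holds IDs.
source = [
    ["ID", 1, 2, 3],
    ["A", 0.1, 0.2, 0.3],
    ["B", 0.4, 0.5, 0.6],
]


def transposed_cell(grid, row, col):
    """Return the value for destination cell (row, col), 0-based.

    Mirrors OFFSET(origin, COLUMN()-1, ROW()-1): the row and column
    indices are swapped when reading from the source.
    """
    return grid[col][row]


n_rows, n_cols = len(source), len(source[0])

# The destination has n_cols rows and n_rows columns.
transposed = [
    [transposed_cell(source, r, c) for c in range(n_rows)]
    for r in range(n_cols)
]

for line in transposed:
    print(line)
```

Each destination cell is computed independently from its own position, which is why a single formula can be filled across the whole destination range.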

How can I use an additional condition by getting data from xls-file input in Pentaho spoon?

I have just started learning Pentaho Spoon steps and have run into a problem. I need to transform the data from an xls file and load it into a database. The problem is that my input file looks like this: table-description
And I can not find how to solve two problems:
For my next step I need to keep not only the table itself (range A8:D11) but also the date (cell A5). When I try to do this in Pentaho with the Microsoft Excel Input step, it works only when I select cell A8 as the start row, but then the date is not kept.
In the Microsoft Excel Input step I must always select a start row in order to generate a table and use it in the next steps, and I must do it manually, that is, I have to state that my table starts at cell A8. In my case I cannot always be sure that the table starts at A8. What I do know is that the start cell is the cell in column A whose value is "Date". The Microsoft Excel Input step will be the first step in my transformation because I must read the data before I can change it, which is why I think I cannot use a JavaScript step before it.
I have not found the solution to these two problems and I do not know if it is possible to make it. I will be grateful for any help.
I am not sure what you mean by converting an Excel file to a database, but if you can convert the xls into csv and read that file, then you know from which row you need to filter the data. Basically, you can use a simple filter step to filter the data when it matches the column name. I hope this helps.
Use two Microsoft Excel Input steps. One step reads the table (A8:D11). The other step reads the date (A5). Then merge the two streams, for example using a Join Rows (cartesian product) step.
Read everything. Then use a JavaScript step with two script tabs. For one of the tabs, right-click and choose Set start script, with the code: var start = 0; Keep the other tab as a transformation script, with something like: if (FieldA == "Date") { start = 1; } You will then have an additional field in the stream called start. If start equals 0, you know that the tabular data hasn't started yet, and you can filter out the row.
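The start-flag idea from the last answer can be sketched outside Pentaho as follows. This is only an illustration of the logic in Python, not Kettle code; the sample rows and the function name are made up for the example:

```python
def rows_from_header(rows, marker="Date"):
    """Yield rows starting at the first row whose column A equals `marker`.

    `start` plays the role of the variable initialised in the start
    script: it stays 0 until the header row is seen, then switches to 1.
    """
    start = 0
    for row in rows:
        if row and row[0] == marker:
            start = 1
        if start == 1:
            yield row


# Hypothetical sheet contents: some preamble, then the real table.
sheet = [
    ["Monthly report"],
    [],
    ["some note"],
    ["Date", "Item", "Qty", "Price"],
    ["2020-01-01", "apple", 3, 1.5],
    ["2020-01-02", "pear", 1, 2.0],
]

table = list(rows_from_header(sheet))
for row in table:
    print(row)
```

The header row itself passes the filter because the flag is set before the row is emitted, which matches the behaviour described above.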

How to copy selected columns and filter them before hand in excel VBA

I am just a beginner in VBA. I am trying to copy some data from a workbook that is updated daily to a master workbook and generate a report. I want it to first filter one of the columns for nonzero values and then copy the matching rows for three selected columns, for example columns T, C and N. I have looked everywhere for an answer but haven't succeeded yet. Please help.
You can check whether a given cell has the value 0 with something like this: If Sheets(sheetname).Cells(rownumber, columnnumber) = 0 Then
You haven't specified what you want to do in the other workbook with the cells that were empty.
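The overall logic the question asks for (keep only rows where one column is nonzero, then copy a few chosen columns in a chosen order) can be sketched like this. It is written in Python rather than VBA purely to show the steps; the helper names, the choice of T as the filter column, and the sample rows are all assumptions:

```python
def col_index(letter):
    """Convert an Excel column letter such as 'T' or 'AA' to a 0-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def filter_and_pick(rows, filter_col, keep_cols):
    """Keep rows whose `filter_col` is nonzero, returning only `keep_cols`."""
    f = col_index(filter_col)
    keep = [col_index(c) for c in keep_cols]
    return [[row[i] for i in keep] for row in rows if row[f] != 0]


def make_row(c, n, t):
    """Build a hypothetical 20-column row with values in C, N and T."""
    row = [0] * 20
    row[col_index("C")] = c
    row[col_index("N")] = n
    row[col_index("T")] = t
    return row


rows = [make_row("a", 10, 5), make_row("b", 20, 0), make_row("c", 30, 7)]
result = filter_and_pick(rows, "T", ["T", "C", "N"])
print(result)
```

In VBA the same two steps would typically be a loop over the used rows with the zero check shown above, writing the selected cells to the next free row of the master sheet.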

Excel formula or VBA script to group data

Here is a screenshot of a sample data set that I am trying to work with in Excel
I want to use either an Excel formula or a VBA script to populate the firm_anamoly column (it's manually populated right now).
The logic is that, for the set of rows with a given firm number, if there is more than one distinct "sector23code" in that set, the output in column "firm_anamoly" should be "firm_count"; otherwise "firm_anamoly" should be set to 0.
As you can see for firm_number = 5, since sector23codes are both 3 and 5, firm_anamoly is set to 3, i.e. firm_count.
I have around 500K rows of data that I am trying to work with.
Thank you.
There are 2 ways you can go about this. One way is to do it without converting your range to a table format.
Method 1:
You can enter this formula in cell D2:
{=IF(AND(IFNA(IF(A2=$A:$A,$B:$B,NA())=B2,TRUE)),0,C2)}
I believe this will get you the results you want, but it will probably overwhelm Excel on a less powerful system.
I would recommend the following instead.
Method 2:
Convert your range to an Excel table. Then enter this formula in the first row of the 'firm_anamoly' column:
{=IF(AND(IFNA(IF([@[firm_number]]=[firm_number],[sector23code],NA())=[@sector23code],TRUE)),0,[@[firm_count]])}
This version will run much more efficiently than Method 1.
Both of these are array formulas, so when you enter them press Ctrl + Shift + Enter to get the curly brackets to show up. Since you have so much data, you should definitely back up before entering the formula; array formulas on large data sets can sometimes crash Excel.
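For around 500K rows, it may be easier to compute the column outside Excel. Here is a minimal sketch of the same rule in Python, assuming the data has been exported as (firm_number, sector23code, firm_count) rows; the function name and the sample values are made up for illustration:

```python
from collections import defaultdict


def firm_anomaly(rows):
    """Return the firm_anamoly value for each row.

    rows: list of (firm_number, sector23code, firm_count) tuples.
    A row gets its firm_count if the firm has more than one distinct
    sector23code across all of its rows, otherwise 0.
    """
    sectors = defaultdict(set)
    for firm, sector, _count in rows:
        sectors[firm].add(sector)
    return [
        count if len(sectors[firm]) > 1 else 0
        for firm, _sector, count in rows
    ]


# Hypothetical sample: firm 5 has sector codes 3 and 5, so it is flagged.
rows = [
    (1, 2, 1),
    (5, 3, 3),
    (5, 5, 3),
    (5, 3, 3),
    (7, 4, 2),
    (7, 4, 2),
]

print(firm_anomaly(rows))
```

This makes two passes over the data, whereas the array formulas compare every row against the whole column, which is what makes them slow on large sheets.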