Pentaho: How to split a single Excel file into multiple sheets in one Excel output file

I have a list of employee details, and I want to put each employee's details on a separate Excel sheet. I have tried many approaches with Pentaho Kettle, but I only ever get separate Excel files, not all the sheets in a single file.
E.g.:
Raja 22 developer 25000
ravi 23 tester 2000
karthik 24 designer 4000
Mani 28 developer 45000
Each employee's details need to go on a separate sheet within a single Excel file. I have already tried the "Microsoft Excel Writer" step, but it did not work.
EDIT
Thanks for your valuable reply; it's really clear and useful. :-) But I need one more detail from you. When I added the age, skill, and salary columns to Get Variables and ran the job, I did not get the values of those three fields; only their column names appear in every sheet. I need each sheet to include the column values as well.
Example:
Sheet 1 (Raja):
Name age skill salary
Raja 22 developer 25000
Sheet 2 (ravi):
Name age skill salary
ravi 23 tester 2000
I need every sheet to be generated like that. I hope you get my point. Can you please help me with how to generate that?

You need to copy the rows (employee names) into memory and then loop over them, writing to the same Excel file each time, to generate one sheet per employee. I have uploaded sample code to GitHub that you can look at.
First, I took an Excel Input step and used the "Copy rows to result" step to load the data into memory.
In the second step, loop over all the data in memory and write it to the Excel file. You can loop by enabling 'copy previous results to parameter' together with 'execute for every row', which runs the transformation once for every single row.
Finally, when writing to the Excel file (using the Excel Writer step), make sure that each row coming from the previous step is inserted into the same file rather than a new one.
Hope it helps :)
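The loop described above can also be sketched outside Pentaho. The following Python snippet is only an illustration of the control flow, not Kettle code; `split_into_sheets` and the dictionary standing in for a workbook are hypothetical names. Each input row becomes one sheet, and every sheet carries the header plus that row's values, which is the layout the EDIT in the question asks for.

```python
# Header and rows taken from the example in the question.
HEADER = ["Name", "age", "skill", "salary"]
ROWS = [
    ["Raja", "22", "developer", "25000"],
    ["ravi", "23", "tester", "2000"],
    ["karthik", "24", "designer", "4000"],
    ["Mani", "28", "developer", "45000"],
]


def split_into_sheets(header, rows):
    """Return {sheet_name: [header, row]} with one sheet per input row.

    The dictionary stands in for a single workbook; the sheet name is the
    employee name from the first column.
    """
    workbook = {}
    for row in rows:  # corresponds to "execute for every row"
        sheet_name = row[0]
        # Pass the whole row along, not just the name, so that the
        # values end up on the sheet together with the column names.
        workbook[sheet_name] = [list(header), list(row)]
    return workbook


workbook = split_into_sheets(HEADER, ROWS)
for name, sheet in workbook.items():
    print(name, sheet)
```

If the same idea carries over to the job, it would presumably mean passing every field (name, age, skill, salary) to the inner transformation as a parameter, not only the name; that is an assumption about the setup, not something confirmed by the answer.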

Related

Automated formatting CSV files using an Add-In in Excel 2010

I have a C program that generates CSV files as output.
For example, I have 12 CSV files, one for each month. The first column is the employee name and the second column is that employee's salary. Assume there are 10 employees.
I want to format the 'salaries' column so that if a cell has a value over 10,000, the text becomes green; otherwise it should remain blue. I have been able to do this with VBA for one file by adding a Command Button to the file and writing a small script.
The issue, however, is that every time I run my C code it generates a new set of CSVs, overwriting the existing ones. Besides, I want to apply the same formatting to other CSV files I intend to generate.
I read that an .xla Add-In (which can exist on its own, unlike VBA macros embedded in an Excel sheet) might be the solution.
I have 2 questions -
1. Is it possible to do with Add-Ins?
2. If yes, I want to create a single batch file that first generates the CSVs and then runs the Add-In on them. Is that possible too?
Thank You.

Check if a Cell Value exists in column, return a value in the same row but different column

I have a large sheet of data in Excel that comes from a vendor daily, but I only need certain things from this worksheet. I don't want to delete any of the data I don't need, as it's used by others, and I don't want to filter the data for other reasons. I've tried some Excel formulas I found in other users' questions, but they don't quite do what I need.
All of the data from the vendor comes via a text file and through connections that I didn't set up. We use the "Refresh All" button on the Data tab of the Ribbon to import the new data from the text file into Sheet1 on a daily basis.
I have created Sheet2, where I plan to manipulate the data. What I am trying to do in Sheet2 is look for the value "A" (what we call an account tag) in Column B of Sheet1. If the value "A" is found, output the value of the cell in Column A (the account number) of the same row. I want to skip all the rows that don't have an "A" in Column B, as there are several other tags in Column B that are not useful to me.
I need to repeat pretty much the same process to pull the account balance from Column AA, but once I figure out how to do the above I am sure I can use the same method to pull the balance information.
I have attempted to use VLOOKUP, MATCH and a couple of other methods, but I can't figure out how to do this. I'm also afraid that even if I get the formula right, it will give me blank rows on Sheet2 for all of the rows on Sheet1 that don't have the "A" value, which I don't want. I only want Sheet2 to contain the information I need. I have a feeling I might need a macro, but I am not sure where to start.
Thanks,
EDIT as of 05/31/2016
Ok, so I will attempt to clarify this.
Sheet 1:
Data that is Input
Sheet 2:
Data that is Output
If you look at the image of Sheet 1, Column B has what I am going to call tags. I need Excel to look in Column B for any rows that have an A tag and copy the account number to Sheet 2 (the output sheet), then look in Column AA and copy the balance, so that my results on the second sheet look like the second image I posted (the output sheet). There is other data I'll be dealing with on Sheet 2, but for the purposes of explaining my problem I only need this information.
If you look at Sheet 1 you will notice that the account number is the same for many rows until a new account number appears, which is tagged with another A in Column B. So I need Excel to ignore all the other rows until it sees another A in Column B, and then repeat the process of collecting the account number and the balance.
I hope this clarified the problem.
Thanks,
You can use the macro recorder to generate starting code, but you need an operation to record. An Advanced Filter is one candidate.
Suppose I have some data (<space> marks an empty column):
Team Points <space> Team
Leicester 81 <space> Arsenal
Arsenal 71 <space>
Spurs 70 <space>
Then, from the Data ribbon, I can use "Advanced", which opens a dialog box where I can select the list range and a criteria range. For the action, I can also choose "Copy to another location".
Running this, I get
Team Points <space> Team <space> Team Points
Leicester 81 <space> Arsenal <space> Arsenal 71
Arsenal 71 <space>
Spurs 70 <space>
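The Advanced Filter above keeps only the rows that match the criteria and copies them elsewhere. Applied to the layout in the question (tag in Column B, account number in Column A, balance in Column AA), the same selection can be sketched in Python. This only illustrates the logic and is not a replacement for the recorded macro; the sample rows and the helper names `make_row` and `extract_tagged` are made up for the example.

```python
# Column letters from the question mapped to zero-based indexes.
ACCOUNT_COL = 0   # column A: account number
TAG_COL = 1       # column B: tag
BALANCE_COL = 26  # column AA: balance


def make_row(account, tag, balance):
    """Build a 27-column row (A..AA) with only the three relevant cells filled."""
    row = [""] * 27
    row[ACCOUNT_COL] = account
    row[TAG_COL] = tag
    row[BALANCE_COL] = balance
    return row


# Hypothetical input: the account number repeats until the next "A" row.
SHEET1 = [
    make_row("1001", "A", "150.00"),
    make_row("1001", "B", ""),
    make_row("1001", "C", ""),
    make_row("1002", "A", "75.50"),
    make_row("1002", "B", ""),
]


def extract_tagged(rows, tag="A"):
    """Return (account, balance) pairs for rows whose tag matches; skip the rest."""
    return [
        (row[ACCOUNT_COL], row[BALANCE_COL])
        for row in rows
        if len(row) > BALANCE_COL and row[TAG_COL] == tag
    ]


sheet2 = extract_tagged(SHEET1)
for account, balance in sheet2:
    print(account, balance)
```

Because non-matching rows are skipped rather than mapped to empty values, the output has no blank rows, which is the behaviour the question asks for on Sheet2.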

Transform and load a large CSV to multiple worksheets in one Excel file

Back Story:
NEW PROJECT FROM MANAGEMENT: I have been given a soft project from my boss to evaluate one of our current ETL plans to look for room for improvement in the process, and I am looking for guidance.
MOTIVE: Excel is currently being used and crashes quite often during the process due to file size.
TASK: Every month an analyst receives a large CSV file from a survey vendor containing up to 750 columns (not all with unique names) and over 15,000 rows. The job is simply to transform this CSV file into an Excel file with seven worksheets, broken up based on the column headings in the CSV. Details of how it is broken up are below.
My question: is transforming one large CSV into an edited Excel file with multiple worksheets any easier or quicker using VB.NET and VS2010 (or VBA, for that matter), or would continuing to use Excel be the simplest way to handle this process? I am an expert Excel user, but I am still a beginner to intermediate at coding in VBA, VB.NET or any other language.
Detailed Question:
I am open to using free or open source software, but I am most familiar with VB.NET, Excel and Excel VBA. I have played around a bit with coding a simple Windows Forms application that loads the CSV into a DataTable, using TextFieldParser code similar to an example I found. I have thought of loading it into an array, or even a 2D array, to make it easier to edit the column headings and find the duplicates. The DataTable option still leaves me with more questions than answers, because I need unique column headings, and I am not sure whether I should bother with a DataTable if I'm going to write an Excel file right away. I tried CSVReader from CodeProject, but it won't work on files with duplicate header names. I feel as though I have writer's block, as I am not sure which direction to take to handle such a process. Any input you can provide will be much appreciated, and I apologize if this question does not have a single, clear best answer. Thanks.
Current Analyst tasks using excel
The current plan has the analyst open the CSV in Excel, insert a row above row 1, and use a VLOOKUP to replace the 'New' column names with the 'Old' column names, based on a simple two-column lookup table on a separate worksheet. For example:
New becomes Old
"org-name" becomes "org_name" or
"item_1_Vendor" becomes "item_1" or
"date-created_Survey" becomes "date_created"
etc...checking all sent "New" columns against the list of all possible 750 columns.
Then they paste values of the first row and then delete the 2nd row which contained the New headings we want to change.
Then the analyst has to fix the primary key on the file which is called "sid".
The Survey ID field (sid) should have a number for each row of the data file. Sometimes the sid shows up under the sid_HCAHPS or the sid_CGCAHPS fields instead.
The analyst would insert a column next to the "sid" field and put a formula in it like this, for example:
=IF(BE2<>"",BE2,IF(RD2<>"",RD2,IF(UH2<>"",UH2,"")))
Actual cell references would change but in the example excel formula,
"sid"=Range("BE2")
"sid_HCAHPS"=Range("RD2")
"sid_CGCAHPS"=Range("UH2")
Once the newly created primary key column is made and filled without blanks, we can delete the original "sid" column.
The next step is to check the columns, because there may be a redundant HCAHPS section of columns (due to a second survey being sent and then returned, coded as Wave 2). If so, delete the second set of columns, "sid_HCAHPS" through "language".
Next is the largest alteration. We have set up a system where we send this information to our database admins in the form of a seven-worksheet Excel file, which is loaded by an MS Access query that creates a table from each sheet; those tables are then loaded into our proprietary business intelligence software. All done!!
Is your question "can VB.NET automate our current analyst tasks?" If so, then yes.
You could use the StreamReader class to get the data from your CSV
(http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx)
Then store it either in an array, as you mentioned, or use the List class
(http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx)
Once you've got all your data stored you'll need to automate Excel. This is quite straightforward, but here's a link to get you started with that as well: http://support.microsoft.com/kb/301982/en-gb
With the List class you can create a list of custom objects using either classes or structures, e.g.
We define a structure:
Structure rowOfData
    Public intPrimaryKey As Integer
    Public strIceCreamName As String
    Public decPrice As Decimal
End Structure
We can then create a rowOfData and set its fields:
Dim iceCream1 As rowOfData
iceCream1.intPrimaryKey = 1
iceCream1.strIceCreamName = "Mr Whippy"
iceCream1.decPrice = 0.99D ' the D suffix makes this a Decimal literal
We create a list with:
Dim listOfIceCreams As New List(Of rowOfData)
And add to it like this:
listOfIceCreams.Add(iceCream1)
listOfIceCreams.Add(iceCream2) ' iceCream2 would be built the same way as iceCream1
etc.
And access the members of the list like this:
listOfIceCreams(0).decPrice ' gives us the price of the ice cream that was added to the list first
Lists also have a lot of other useful methods that arrays don't. Have a look through the MSDN List class link above to see whether anything jumps out that you might need.
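The first analyst steps from the question (renaming the vendor headings, building one primary key from the three possible sid columns, and spotting duplicate headings) can be sketched compactly. The snippet below uses Python rather than VB.NET and is only an illustration under stated assumptions: the sample CSV is invented, `transform` and `sid_key` are hypothetical names, and rows are kept as lists indexed by position so that duplicate headings are not lost.

```python
import csv
import io

# Lookup table from the question: vendor ("New") heading -> expected ("Old") heading.
RENAME = {
    "org-name": "org_name",
    "item_1_Vendor": "item_1",
    "date-created_Survey": "date_created",
}
# Same priority order as the nested IF formula in the question.
SID_COLUMNS = ["sid", "sid_HCAHPS", "sid_CGCAHPS"]

# Invented sample input; real files have up to 750 columns.
SAMPLE = (
    "org-name,sid,sid_HCAHPS,sid_CGCAHPS,item_1_Vendor\n"
    "Clinic A,101,,,5\n"
    "Clinic B,,202,,4\n"
    "Clinic C,,,303,3\n"
)


def transform(csv_text):
    """Rename headings, prepend a coalesced key column, and report duplicate headings."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [RENAME.get(name, name) for name in next(reader)]
    duplicates = sorted({name for name in header if header.count(name) > 1})
    # Position of the first occurrence of each sid column that is present.
    sid_positions = [header.index(name) for name in SID_COLUMNS if name in header]
    rows = []
    for row in reader:
        # First non-empty sid value wins, otherwise an empty key.
        key = next((row[i] for i in sid_positions if i < len(row) and row[i] != ""), "")
        rows.append([key] + row)
    return ["sid_key"] + header, rows, duplicates


new_header, new_rows, duplicate_names = transform(SAMPLE)
print(new_header)
for row in new_rows:
    print(row)
```

Working by column position instead of by name is a deliberate choice here: a name-keyed structure would silently drop one of two columns that share a heading, which is the problem the question reports with name-based readers.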

Filtering to new sheets from master sheet in Excel

I'm wondering if this is possible with Excel. I have a list of employees with multiple columns counting up their production in my company. It is a list of about 40 names with about 10 columns of different tasks that are counted to determine productivity. These people are split up into different teams, and I have their names tagged with the team. For example, if Joe Smith is in customer service, his name says CSR-Joe Smith. I want to be able to reuse this Excel sheet, so that I can simply run the master report, put the data into one sheet, and have it populate a different sheet for each team. Is there a way to do this by looking at the team tag I add to each person's name and extracting from the list of employees the members of the same team? I'm working in Excel 2010 and have some knowledge of VB and C coding in Excel, but not a ton of experience with it. I also want to pull the values for each team member in each column.
Here is an example, in case my words didn't make sense. Each employee has a row like this:
name status1 status 2 status 3 status 4 status 5
CSR-Joe Smith 251 358 12 58 9
I should mention that this data comes from a SQL query that I run in MS SQL Server Management Studio and then copy into Excel.
To sum up my solution for this kind of problem and related ones:
Solve database problems in a database whenever you can ;)
It is far easier to do joins, selection and filtering inside a database than inside Excel.
If you have to do it in Excel, see whether subtotals, a PivotTable or VLOOKUP can help you, because one of those most likely can.
Be alarmed when you start trying to solve database problems with VBA; usually there are better ways, or at least there should be. When you have a good data source, like SQL, your first attempt should definitely be to create a better view or table in SQL.
Additionally, you can use .odc connection files with Excel directly. Opening one creates a new workbook with your connection opened as a worksheet.
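If the split does have to happen outside the database, the grouping itself is small. The Python sketch below illustrates it under the question's convention that the team tag precedes the first hyphen in the name (as in CSR-Joe Smith). Only the first row comes from the question; the other rows, the `UNASSIGNED` bucket and the name `split_by_team` are assumptions made for the example.

```python
from collections import defaultdict

# First row is from the question; the remaining rows are invented.
ROWS = [
    ["CSR-Joe Smith", 251, 358, 12, 58, 9],
    ["CSR-Ann Lee", 190, 300, 8, 40, 3],
    ["QA-Bob Ray", 75, 120, 30, 12, 1],
    ["Pat Doe", 10, 20, 0, 0, 0],  # no team tag
]


def split_by_team(rows):
    """Group rows by the team tag before the first '-' in the name column."""
    teams = defaultdict(list)
    for row in rows:
        team, separator, person = row[0].partition("-")
        if not separator:
            # No tag present: keep the row visible instead of dropping it.
            team, person = "UNASSIGNED", row[0]
        teams[team].append([person] + row[1:])
    return dict(teams)


sheets = split_by_team(ROWS)
for team, members in sheets.items():
    print(team, members)
```

Each key of the result would correspond to one team sheet, with every status column carried over unchanged for each member.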

Excel VB Script

I have two sets of data that contain some of the same information. The data is names, birthdays and other information about people. Each person occupies one row across multiple columns.
Any ideas on how to write a VB script that checks all the information for one person in one set against all of the data in the other set? I need to highlight any names that are not in both sets.
(If another way is available in Excel, that works too.) I cannot download any other software. E.g.:
Set 1
Somebody, Bob 9/2/2012 Male
Someonelse, Joe 8/16/1950 Male
Set 2
Somebody, Bob 9/2/2012 Male
In this case I would need to highlight Someonelse, Joe in Set 1.
The actual data contains a few thousand people. Efficiency of the script is not a huge deal, as long as it gets the job done.
Do you have an example? I'm not very familiar with how Excel works. Thank you! – Scape 27 secs ago
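If your intention is to find the values that are not in Set 2, you can use the COUNTIF function together with IF. For example, assuming the names are in column A of both sets and Set 2 lives on a sheet named Sheet2 (both are assumptions about your layout), =IF(COUNTIF(Sheet2!A:A,A2)=0,"Not in Set 2","") flags each name in Set 1 that is missing from Set 2.
And if you want to highlight the cell, use conditional formatting with a formula-based rule :)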
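The COUNTIF check amounts to asking, for each name in one set, whether it occurs zero times in the other. A minimal Python sketch of that logic, using the sample rows from the question, is shown below; `missing_from` and `normalize` are hypothetical helper names. Names are compared case-insensitively, as COUNTIF does, and surrounding whitespace is ignored, which is an extra assumption.

```python
# Sample rows from the question: (name, birthday, sex).
SET1 = [
    ("Somebody, Bob", "9/2/2012", "Male"),
    ("Someonelse, Joe", "8/16/1950", "Male"),
]
SET2 = [
    ("Somebody, Bob", "9/2/2012", "Male"),
]


def normalize(name):
    """Compare names case-insensitively and without surrounding whitespace."""
    return name.strip().casefold()


def missing_from(rows, other_rows):
    """Return the rows whose name occurs zero times in other_rows."""
    other_names = {normalize(row[0]) for row in other_rows}
    return [row for row in rows if normalize(row[0]) not in other_names]


to_highlight = missing_from(SET1, SET2)
for row in to_highlight:
    print(row)
```

Running the check in both directions (Set 1 against Set 2, then Set 2 against Set 1) gives every name that is not in both sets.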