Copying/Mapping data between Excel spreadsheets, VB.NET - vb.net

I need to copy data, using VB.NET if possible, from one Excel workbook to another and place it into the correct columns of an existing spreadsheet. The column titles of the spreadsheets match up, but I have several templates to fill and the order of the columns is different in each one, so I need a way of searching for a column header in the template and then copying the data into that column.
Would the best way of achieving this be to use ADO?
For example, move the data from Workbook1, which has columns "Test1" and "Test2" and this data:
Test1 Test2
1 2
12 23
123 234
Into workbook 2 which will have the same column names but could be in a different order:
Test0 Test1 Test1.1 Test2
I need to do this automatically as I have a lot of data to copy and 30-40 workbook templates to copy the data into. The templates' columns are in different orders and cannot be moved around.

There are different ways to interface with Excel using .NET. If you only need to target one version of Excel, then VSTO might be your easiest solution; otherwise use something else. I like to use Excel-DNA.
You can also use ADO to get the data out, but to put it into another workbook I would think you would need one of the ways listed above (since you would need to reference the Excel object). If you are using Excel 2007 or above, you can also directly access the XML files and manipulate them that way (minus xlb, of course).
You can also create a library from your execution file and copy it locally. See here.
As for headers, just use a Dictionary(Of String, Integer) or List(Of String) to work out the column index of each header in each file.
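The header lookup can be sketched without touching Excel at all. The example below is only an illustration working on plain string arrays: the module and function names (HeaderMapping, BuildHeaderIndex, MapRow) are made up for this sketch, and reading the rows from the workbooks and writing them back is assumed to happen elsewhere (VSTO, Excel-DNA, interop, and so on).

```vbnet
Imports System
Imports System.Collections.Generic

Module HeaderMapping
    ' Maps each header name in a header row to its zero-based column index.
    ' Names are compared case-insensitively (an assumption of this sketch).
    Function BuildHeaderIndex(headers As String()) As Dictionary(Of String, Integer)
        Dim index As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        For i As Integer = 0 To headers.Length - 1
            index(headers(i).Trim()) = i
        Next
        Return index
    End Function

    ' Rearranges one source row into the template's column order.
    ' Template columns with no matching source header stay as empty strings.
    Function MapRow(sourceHeaders As String(), sourceRow As String(),
                    templateHeaders As String()) As String()
        Dim templateIndex As Dictionary(Of String, Integer) = BuildHeaderIndex(templateHeaders)
        Dim result(templateHeaders.Length - 1) As String
        For i As Integer = 0 To result.Length - 1
            result(i) = String.Empty
        Next
        For i As Integer = 0 To sourceHeaders.Length - 1
            Dim target As Integer = 0
            If templateIndex.TryGetValue(sourceHeaders(i).Trim(), target) Then
                result(target) = sourceRow(i)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim source As String() = {"Test1", "Test2"}
        Dim template As String() = {"Test0", "Test1", "Test1.1", "Test2"}
        Dim mapped As String() = MapRow(source, New String() {"1", "2"}, template)
        Console.WriteLine(String.Join("|", mapped))   ' prints |1||2
    End Sub
End Module
```

Building the dictionary once per template and reusing it for every row keeps the lookup cheap even with 30-40 templates.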

Related

How to load multiple excel sheets into different tables using pentaho metadata injection

I have one Excel file with 5 different sheets, and I want to load all 5 sheets into different tables using Pentaho metadata injection.
Note: I have already implemented the normal approach of repeating the flow 5 times.
What I have tried:
1) I have created another Excel sheet with the metadata of all 5 sheets.
2) I am able to pass the sheet name as a run-time variable and substitute it into the sheets property.
3) I am stuck on how to read the corresponding metadata file and inject it into the template.
Any solution is appreciated.
Have you followed Pentaho's example on how to use Metadata Injection? It also uses excel sheets to store metadata, and the name of the sheet is passed at runtime and applied by the use of a join.
In my experience, I've stored the metadata in a MySQL table and selected the corresponding one by the use of a Variable set in a previous transformation, depending on the format of the input file.

Formatting Data in excel sheet with blue prism

I'm trying to run a duplicate check in which varying data is pulled from a website and compared to a master list, the master list being stored in Excel. The information from the website is read from a table that has line breaks in it. These breaks are carried over to the data collection the values are initially stored in. Some of the data from the website is eventually written to the master list in Excel. So when I read the master list back into Blue Prism to run a duplicate check, the rows that have line breaks are written into a collection as multiple rows (e.g. I should have only 7 rows in my collection but am getting 42). Since the rows are not EXACTLY the same between the 2 collections, the automation does not recognize the duplicates when it runs.
The easiest way to solve this would be if I could make the collection rows have no line breaks as soon as the data is read. I've attempted to use the calculation stage to do so with no luck. I'm not sure if it is actually possible to do this, but would appreciate any direction.
Record an Excel macro to do the data sorting/cleaning in Excel (possibly Text To Columns, etc.) and then run the macro as part of your Blue Prism process by using an action stage and the MS Excel VBO - Run Macro action. Get the process to create an Excel instance (and create a handle data item from that stage), then use Open Workbook (on whatever workbook you store your macro in), and then use MS Excel VBO - Run Macro (with the same handle created earlier, typing in the name of the macro).
It sounds like what is happening is that the MS Excel VBO is grabbing the data from the Excel Worksheet wholesale.
This is to say that it's accessing your Worksheet table, copying the cell values BUT not the cell formatting data, and then dumping the values into a BP collection.
Since it did not bring along any of the original cell formatting data to reference when it went to populate the collection, it's just breaking up the values based on carriage returns/line breaks. Thus, your collection is organized based on that, and not on the original Worksheet cells.
So, with that said, on to a solution!
Solution 1
Brute force the organization of the incoming Excel cell data to the collection by looping over the Excel Worksheet cell-by-cell.
Run a loop, and in that loop have BP go into the Excel Worksheet and grab the first populated cell it comes across. Run a formatting/cleanup Calculation stage over the data. Dump the cell value into a single collection field.
Repeat.
This is...inelegant, expensive at best, and not at all recommended for any medium to large dataset. But it's definitely the best way to do string manipulation and value comparisons before the data hits your collection. Since it sounds like you're using a Master template, you also know what the expected format of your data should be.
This method will enable you to implement Trim(), Concat(), or Split() in a Calculation stage to better organize your incoming data before you dump it into a collection.
This is also basically what I think you're already trying to do, but cell-by-cell instead of Worksheet row-by-row or table-by-table.
Solution 2
Clean up the table data you grab from the website before you dump it into the Excel Worksheet.
This is basically Solution 1, but in reverse. Simply format/clean up your data before it hits your Excel Worksheet.
I'm not sure this is any better than Solution 1, but, you know, it's something...
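Whichever side the cleanup happens on, the core of it is collapsing the line breaks inside each value. A rough VB.NET sketch of that step follows, assuming it runs somewhere VB.NET is available (for example a code stage); FlattenLineBreaks is a name made up for this illustration, not a Blue Prism action.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineBreakCleanup
    ' Replaces every run of carriage returns/line feeds (plus any whitespace
    ' around it) with a single space, then trims the ends.
    Function FlattenLineBreaks(value As String) As String
        If value Is Nothing Then Return String.Empty
        Return Regex.Replace(value, "\s*[\r\n]+\s*", " ").Trim()
    End Function

    Sub Main()
        ' Sample value invented for the example.
        Console.WriteLine(FlattenLineBreaks("Acme Ltd" & vbCrLf & "Leeds"))   ' prints Acme Ltd Leeds
    End Sub
End Module
```

Applying the same function to both the website values and the master list values means the two collections are compared in the same normalized form.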
Solution 3
Format the cell data IN the MS Excel Worksheet itself.
Basically rearrange the cells and cell data in the Excel Worksheet into a more predictable format by using the Split, Trim, Merge, or other actions included in the MS Excel VBO. You can also do this using the Data - OLEDB utility object, but that requires some pretty solid understanding of SQL syntax.
This would look like this using the MS Excel VBO:
1) Grab the Excel Worksheet data wholesale and dump it into a collection.
2) Count the rows/fields of the collection.
3) Is that number consistent with the desired/expected format of your data?
4) If not, have the bot go back into the Excel Worksheet and reformat the cells by removing any carriage returns/line breaks/whatever else.
5) Repeat.
However, I'm always reluctant to reformat any original source, as it's then hard to figure out what went wrong and where once you've changed the original structure of your data. So it's best to always make a copy of the Worksheet before you manipulate it.
Unfortunately I don't have access to my BP environment at the moment or I'd provide you with the exact object actions you'd need to do any of this, my bad. Once I do I'll update this answer.

Sort multiple column in notepad using vb.net

I am doing a project that requires me to transfer data from one Notepad file to another Notepad file (saved in Excel's tab-delimited format).
I have successfully done that; the only thing left is to sort the data after transferring it.
For your information, I am transferring 5 columns from the first file to the second. I saved that information in five arrays.
How am I supposed to sort them after pasting?
I tried using the VB.NET sort function, but that only sorts one array while the rest of the arrays don't follow.
I also tried lines.sort, but the result was not satisfying. Any other idea for sorting the data the way we normally do manually in Excel?
Any help will be very much appreciated.
One solution would be to create an object with 5 values in it. Then you would create a list of those objects (that way the values are all linked).
Then you would just call Sort on the list:
myList.Sort(Function(x, y) x.valueToSortBy.CompareTo(y.valueToSortBy))
This would give you a list of your objects sorted by the value you wanted.
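A fuller sketch of the same idea, with a second sort key as a tie-breaker. The Record class and its two properties are invented for the example; a real version would carry all five columns.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical record type: one object per line of the tab-delimited file.
Public Class Record
    Public Property Name As String
    Public Property Amount As Integer
End Class

Module SortRecords
    ' Sorts the list in place by Amount, then by Name when amounts are equal.
    Sub SortByAmountThenName(records As List(Of Record))
        records.Sort(Function(x, y)
                         Dim byAmount As Integer = x.Amount.CompareTo(y.Amount)
                         If byAmount <> 0 Then Return byAmount
                         Return String.CompareOrdinal(x.Name, y.Name)
                     End Function)
    End Sub

    Sub Main()
        Dim records As New List(Of Record) From {
            New Record With {.Name = "pear", .Amount = 3},
            New Record With {.Name = "apple", .Amount = 1},
            New Record With {.Name = "fig", .Amount = 3}
        }
        SortByAmountThenName(records)
        For Each r As Record In records
            Console.WriteLine("{0} {1}", r.Name, r.Amount)
        Next
    End Sub
End Module
```

Because each row travels as one object, every column follows the sort automatically, which is the behaviour the separate arrays were missing.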

Adding Hyperlinks with VBA

I have two columns in an Excel sheet containing Project names (Column A) and Fields (Column B), and I'm trying to add a hyperlink to a local file for each row. There's one specific local file for each row.
I don't know how to use the HYPERLINK function to get the corresponding URLs automatically, because there are around 10,000 Project names. I also don't know if it would be easier to use VBA.
The URLs are all identical except for the part after "projects":
\nas1\backup\dop4\jobdata\projects\34s\34038 - 10 Wharaora Tce\Structural
The Project name in Column A is something like 34038, 25794 etc. and the Field in Column B is something like Structural, Civil etc.
So my plan is to write a VBA program that adds hyperlinks using URLs constructed from:
\nas1\backup\dop4\jobdata\projects
Because the Project name in column A is 5 digits, I'm thinking of using the LEFT function to get the first two digits to find the files after "Project file".
Folders
Excel file
A data sample will certainly help to understand what you are trying to achieve. Please update your question and then leave a comment to notify people who are watching this thread.
As far as I understand your question, you want to concatenate several cells into a string that can then be used to construct a hyperlink. Consider this screenshot:
The formula in cell C2 is
="\nas1\backup\dop4\jobdata\projects\"&LEFT(TEXT(A2,"0"),2)&"s"
Copy down. In D2 you can use the Hyperlink() function to refer to the Address in C2, or wrap a hyperlink function around the formula in C2.
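If the path ends up being built in code rather than in a worksheet formula, the same concatenation can be sketched as a small function. This is a VB.NET illustration only: ProjectGroupFolder is a made-up name, the root folder is passed in rather than assumed, and it builds only the part the formula builds (the root plus the two-digit group folder).

```vbnet
Imports System

Module ProjectLinks
    ' Builds <root>\<first two digits>s, mirroring the worksheet formula.
    Function ProjectGroupFolder(root As String, projectNumber As Integer) As String
        Dim digits As String = projectNumber.ToString("0")
        If digits.Length < 2 Then
            Throw New ArgumentException("Expected a project number with at least two digits.")
        End If
        ' Trim any trailing backslash so the separator is not doubled.
        Return root.TrimEnd("\"c) & "\" & digits.Substring(0, 2) & "s"
    End Function

    Sub Main()
        ' Root copied as written in the question.
        Console.WriteLine(ProjectGroupFolder("\nas1\backup\dop4\jobdata\projects\", 34038))
    End Sub
End Module
```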

Transform and load a large CSV to multiple worksheets in one Excel file

Back Story:
NEW PROJECT FROM MANAGEMENT: I have been given a soft project from my boss to evaluate one of our current ETL plans to look for room for improvement in the process, and I am looking for guidance.
MOTIVE: Excel is currently being used and crashes quite often during the process due to file size.
TASK: Every month an analyst receives a large csv file from a survey vendor containing up to 750 columns (not all unique names) and over 15,000 rows. The task is simply to transform this large csv file into an Excel file with seven worksheets, broken up based on the column headings in the csv. Details of how it is broken up are below.
My question: is transforming one large csv into an edited Excel file with multiple worksheets any easier or quicker using VB.NET and VS2010, or VBA for that matter, or would using Excel be the simplest way to continue this process? I am an expert Excel user but I am still very much a beginner to intermediate at coding in VBA, VB.NET or any other language.
Detailed Question:
I am open to using free or open source software, but I am most familiar with VB.NET, Excel and Excel VBA. I have played around a bit coding a simple Windows Forms application to load the csv into a DataTable, using TextFieldParser code similar to that found here. I have thought of loading it into an array or even a 2D array to more easily edit the column headings and find the duplicate column headings. The DataTable option still leaves me with more questions than answers, because I need unique column headings and I'm not sure if I should bother with a DataTable if I'm going to just write an Excel file right away. I tried CSVReader from CodeProject, but it won't work on files with duplicate header names. I feel as though I am having writer's block, as I am not sure which direction I should take to handle such a process. Any input you can provide will be much appreciated, and I apologize if this question does not have a single, clear best answer. Thanks.
Current Analyst tasks using excel
The current analytical plan has the analyst open the csv in Excel, insert a row above row 1 and use a VLOOKUP to replace the 'New' column names with the 'Old' column names based on a simple two-column lookup table on a separate worksheet. For example
New becomes Old
"org-name" becomes "org_name" or
"item_1_Vendor" becomes "item_1" or
"date-created_Survey" becomes "date_created"
etc...checking all sent "New" columns against the list of all possible 750 columns.
Then they paste values of the first row and then delete the 2nd row which contained the New headings we want to change.
Then the analyst has to fix the primary key on the file which is called "sid".
The Survey ID field (sid) should have a number for each row of the data file. Sometimes the sid shows up under the sid_HCAHPS or the sid_CGCAHPS fields instead.
The analyst would insert a column next to the "sid" field and put a formula in it like this, for example:
=IF(BE2<>"",BE2,IF(RD2<>"",RD2,IF(UH2<>"",UH2,"")))
Actual cell references would change but in the example excel formula,
"sid"=Range("BE2")
"sid_HCAHPS"=Range("RD2")
"sid_CGCAHPS"=Range("UH2")
Once the newly created primary key column is made and filled without blanks, we can delete the original "sid" column.
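The nested IF above amounts to "take the first of sid, sid_HCAHPS and sid_CGCAHPS that is not blank". In VB.NET that could be sketched roughly as below. FirstNonBlank is a name invented for this illustration, and unlike the Excel formula it also treats whitespace-only values as blank.

```vbnet
Imports System

Module SurveyId
    ' Returns the first candidate that is not blank, trimmed,
    ' or an empty string when every candidate is blank.
    Function FirstNonBlank(ParamArray candidates As String()) As String
        For Each candidate As String In candidates
            If Not String.IsNullOrWhiteSpace(candidate) Then Return candidate.Trim()
        Next
        Return String.Empty
    End Function

    Sub Main()
        ' sid is blank in this made-up row, so the sid_HCAHPS value is used.
        Console.WriteLine(FirstNonBlank("", "1042", ""))   ' prints 1042
    End Sub
End Module
```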
The next step is to check the columns, because there may be a redundant HCAHPS section of columns (due to a second survey being sent and then returned, coded as Wave 2). If so, delete the second set of columns, "sid_HCAHPS" through "language".
Next is the largest alteration because we have setup a system where we send this information to our database admins in the form of a seven worksheet excel file to be loaded by an MS Access Query that creates a table from each sheet that gets loaded into our proprietary business intelligence software. All Done!!
Is your question, "can VB.NET automate our current analyst tasks?" If so, then yes.
You could use the StreamReader class to get data from your csv
(http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx)
Then store it either in an array as you mentioned or use the List class
(http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx)
Once you've got all your data stored you'll need to automate Excel. This is quite straightforward, but here's a link to get you started with that as well: http://support.microsoft.com/kb/301982/en-gb
With the List class you can create a list of custom objects using either classes or structures, e.g.
We define a structure:
Structure rowOfData
    Public intPrimaryKey As Integer
    Public strIceCreamName As String
    Public decPrice As Decimal
End Structure
We can then create a rowOfData and set its fields:
Dim iceCream1 As rowOfData
iceCream1.intPrimaryKey = 1
iceCream1.strIceCreamName = "Mr Whippy"
iceCream1.decPrice = 0.99D 'the D suffix makes this a Decimal literal (required under Option Strict On)
We create a list with:
Dim listOfIceCreams As New List(Of rowOfData)
And add to it like this:
listOfIceCreams.Add(iceCream1)
listOfIceCreams.Add(iceCream2)
etc.
And access the members of the list like this:
listOfIceCreams(0).decPrice 'gives us the price of the ice cream that was added to the list first.
There are also a lot of other useful methods that lists have which arrays don't. You could have a look through that MSDN List class link to see if anything jumps out at you that you might need.
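For the heading clean-up step in particular, a Dictionary can stand in for the VLOOKUP table. The sketch below is only an illustration on an in-memory header row: NormalizeHeaders is a made-up name, the lookup entries are taken from the examples in the question, and the "_2", "_3" suffix for repeated headings is an assumption (it would need adjusting if the file already contains such names).

```vbnet
Imports System
Imports System.Collections.Generic

Module HeaderCleanup
    ' Replaces each "New" heading with its "Old" name when the lookup has one,
    ' then appends _2, _3, ... to repeated headings so they get distinct names.
    Function NormalizeHeaders(headers As String(),
                              lookup As Dictionary(Of String, String)) As String()
        Dim seen As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        Dim result(headers.Length - 1) As String
        For i As Integer = 0 To headers.Length - 1
            Dim name As String = headers(i).Trim()
            Dim mapped As String = Nothing
            If lookup.TryGetValue(name, mapped) Then name = mapped
            ' Count how many times this (renamed) heading has appeared so far.
            Dim count As Integer = 0
            seen.TryGetValue(name, count)
            count += 1
            seen(name) = count
            result(i) = If(count = 1, name, name & "_" & count.ToString())
        Next
        Return result
    End Function

    Sub Main()
        Dim lookup As New Dictionary(Of String, String) From {
            {"org-name", "org_name"},
            {"item_1_Vendor", "item_1"}
        }
        Dim raw As String() = {"org-name", "item_1_Vendor", "language", "language"}
        Dim cleaned As String() = NormalizeHeaders(raw, lookup)
        Console.WriteLine(String.Join(",", cleaned))   ' prints org_name,item_1,language,language_2
    End Sub
End Module
```

With unique headings in hand, the rows can be stored in a List or a DataTable without tripping over duplicate column names.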