Advice on most conventional method of parsing data from an input file - vba

I'm a somewhat novice writer of Excel Macros for people in my company. I can tackle the issue at hand in a few different ways but would like some advice on the most efficient and conventional manner to do so.
The macro needs to retrieve data from an input file and compare this data to other data from a second input file. These files are consistent in their structure but obviously different in the data they contain. Each file may contain information for up to 96 samples. Each sample has ~20 categories of information and each category may have 10 pieces of data. So, each sample could have up to 200 pieces of data tied to it.
It seems to me that the best way to store this information is to either create a Class to define an Object and then have a Collection of those Objects like:
Dim Samples as Collection
Dim Smp as CSample
Set Samples = New Collection
For x = 1 to NumberOfSamplesInFile
Set Smp = New CSample
'Set the properties of Smp for each piece of data
Samples.Add Smp
Next x
I assume that I can have certain properties of Smp be arrays? I.e. can a property of the Class be defined as:
Private pSampleID as String
Private pAreaUnderCurve(1 to 10) as double
Private pRetentionTime(1 to 10) as double
Such that
Smp.SampleID = "XYZ"
but
Smp.AreaUnderCurve(1) = 1234
Smp.AreaUnderCurve(2) = 2345
Smp.AreaUnderCurve(3) = 123.78
and the same for retention time (different values obviously)?
The other way I imagined doing this is with the Type declaration:
Type Sample
SampleID as String
AreaUnderCurve(1 to 10) as double
RetentionTime(1 to 10) as double
End Type
My question is: which way is the most conventional/recommended?

You could also import the two files into Excel as two worksheets, then run a comparison on them, either by looping through the rows or by putting in formulae to match the records and highlight unmatched data.
Is the format of the data files suitable for being imported into Excel?

Related

I want to make new variables named shot1, shot2, shot3 and so on; how do I do this?

Here is what I have tried so far, but no new shot variable is being declared:
Module Module1
Dim shotlist As New List(Of Boolean)
Dim shono As Integer = 0
Dim shonos As String
Dim shotname As String
Dim fshot As Boolean
Dim shots As String
Sub Main()
For i As Integer = 0 To 1000
Dim ("shots" & i) as String = "shots" & i
fshot = Convert.ToBoolean(shots)
Next
End Sub
End Module
You can't do things this way. The variable names you see when you write a program are lost, turned into memory addresses by the compiler when it compiles your program. You cannot store any information using a variable's name - the name is just a reference for you, the programmer, during the time you write a program (design time)
Think of variables like buckets, with labels on the outside. Buckets only hold certain kinds of data inside
Dim shotname as String
This declares a bucket labelled shotname on the outside and it can hold a string inside. You can put a string inside it:
shotname = "Shot 1"
You can put any string you like inside this bucket, and thus anything you can reasonably represent as a string can also be put inside this bucket:
shotname = DateTime.Now.ToString()
This takes the current time and turns it into a string that looks like a date/time, and puts it in the bucket. The thing in the bucket is always a string, and lots of things (nearly anything actually) can be represented as a string, but we don't type all our buckets as strings because it isn't very useful - if you have two buckets that hold numbers for example, you can multiply them:
Dim a as Integer
a=2
Dim b as Integer
b=3
Dim c as Integer
c=a*b
But you can't multiply strings, even if they're strings that look like numbers; strings would have to be converted to numbers first. Storing everything as a string, converting it to some other type before working on it, then converting it back to a string to store it would be very wearisome.
So that's all well and good and solves the problem of storing varying information in the computer memory and giving it a name that you can reference it by, but it means the developer has to know all the info that will ever be entered into the program. shotname is just one bucket storing one bit of info. Sure you could have
Dim shotname1 as String
Dim shotname2 as String
Dim shotname3 as String
But it would be quite a tedious copy-paste exercise to do this for a thousand shotnames, and after you've done it you have to refer to all of them individually using their full name. There isn't a way to dynamically refer to bucket labels when you write a program, so this won't work:
For i = 0 To 1000
shotname&i = "This is shotname number " & i
Next i
Remember, shotname is lost when compiling, and i is too, so shotname&i just flat out cannot work. Both these things become, in essence, memory addresses that mean something to the compiler, and you can't join two memory addresses together to get a third memory address that stores some data. They were only ever nice names to help you understand how to write the program, pick good names and not get confused about what is what.
Instead you need the mechanism that was invented to allow varying amounts of data not known at the design-time of the program - arrays
At their heart, arrays are what you're trying to do. You're trying to have a load of buckets with a name you can vary, and to form that name programmatically.
Array naming is simple; an array has a name like any other variable (what I've been calling a bucket up to now - going to switch to calling them variables because everyone else does) and the name refers to the array as a whole, but the array is divided up into multiple slots all of which store the same type of data and have a unique index. Though the name of the array itself is fixed, the index can be varied; together they form a reference to a slot within the array and provide a mechanism for having names that can be generated dynamically
Dim shotnames(999) as String
This here is an array of 1000 strings, and it's called shotnames. It's good to use a plural name because it helps remind you that it's an array, holding multiple something. The 999 defines the last valid slot in the array - arrays start at 0, so with this one running 0 to 999 means there are 1000 entries in it
And critically, though the shotnames part of the variable name remains fixed and must be what you use to refer to the array itself (if you want to do something with the entire thing, like pass it to a function), referring to an individual element/slot in the array is done by tacking a number onto the end within brackets:
shotnames(587) = "The 588th shotname"
Keep in mind that "starts at 0" thing.
This 587 can come from anything that supplies a number; it doesn't have to be hard coded by you when you write the program:
For i = 0 to 999
shotnames(i) = "The " & (i+1) & "th shotname"
Next i
Or things that generate numbers:
shotnames(DateTime.Now.Minute) = "X"
If it's 12:59 when this code runs, then shotnames(59), the sixtieth slot in the array, will become filled with X
There are other kinds of varying storage; a list or a dictionary are commonly used examples, but they follow these same notions - you'll get a variable where part of the name is fixed and part of the name is an index you can vary.
At their heart they are just arrays too: if you looked inside a List you'd find an array with some extra code around it that, when the array gets full, makes a new array of twice the size and copies everything over. This way it provides "an array that expands" type functionality, something that arrays don't do natively.
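To make the "array that expands" idea concrete, here is a minimal sketch using List(Of String). The module and function names (ListGrowthSketch, BuildShotNames) are made up for illustration; Count and Capacity are standard List members, and the exact capacity you see depends on the runtime.

```vbnet
Imports System
Imports System.Collections.Generic

Module ListGrowthSketch
    ' Adds "count" shot names to a List and returns it.
    ' No size is declared up front; the List grows as items are added.
    Public Function BuildShotNames(count As Integer) As List(Of String)
        Dim names As New List(Of String)()
        For i As Integer = 0 To count - 1
            names.Add("Shot " & (i + 1))
        Next
        Return names
    End Function

    Sub Main()
        Dim names = BuildShotNames(10)
        Console.WriteLine(names.Count)    ' number of items stored
        Console.WriteLine(names.Capacity) ' size of the internal array, never less than Count
        Console.WriteLine(names(9))       ' indexed from 0, just like an array
    End Sub
End Module
```

The point is only that the fixed part of the name (names) and the index (9) work exactly as they do with an array; the resizing happens out of sight.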
Dictionaries are worth mentioning because they provide a way to index by other things than a number and can achieve storage of a great number of things not known at design time as a result:
Dim x as New Dictionary(Of String, Object)
This creates a dictionary indexed by a string that stores objects (I.e. anything - dates, strings, numbers, people..)
You could do like:
x("name") = "John"
x("age") = 32
You could even realize what you were trying to do earlier:
For i = 0 to 999
x("shotname"&i) = "The " & (i+1) & "th shotname"
Next i
(Though you'd probably do this in a Dictionary(Of string, string), ie a dictionary that is both indexed by string and stores strings.. and you probably wouldn't go to this level of effort to have a dictionary that stores x("shotname587") when it's simpler to declare an array shotname(587))
But the original premise remains true: your variable has a fixed part of the name (ie x) and a changeable part of the name (ie the string in the brackets) that is used as an index.
And there is no magic to the indexing by string either. If you opened up a dictionary and looked inside it, you'd find an array, together with some bits of code that take the string you passed in as an index, and turn it into a number like 587, and store the info you want in index 587. And there is a routine to deal with the case where two different strings both become 587 when converted, but other than that it's all just "if you want a variable where part of the name is changeable/formable programmatically rather than by the developer, it's an array"
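As a rough illustration of "an array plus some code that turns the string into a number", here is a toy string-keyed store. It is a sketch only and not how Dictionary is really implemented: the class name TinyStringMap, the 16-slot array and the Put/Find methods are all invented for this example. GetHashCode is the standard .NET method that turns a string into a number.

```vbnet
Imports System
Imports System.Collections.Generic

' Toy string-keyed store: an array of slots plus a hash that picks the slot.
Public Class TinyStringMap
    ' 16 slots (0 to 15); each slot holds the entries whose keys landed there
    Private buckets(15) As List(Of KeyValuePair(Of String, String))

    ' Turns a key into an array index between 0 and 15
    Private Function SlotFor(key As String) As Integer
        Return (key.GetHashCode() And Integer.MaxValue) Mod buckets.Length
    End Function

    Public Sub Put(key As String, value As String)
        Dim slot As Integer = SlotFor(key)
        If buckets(slot) Is Nothing Then
            buckets(slot) = New List(Of KeyValuePair(Of String, String))()
        End If
        ' Replace the entry if the key is already stored
        For i As Integer = 0 To buckets(slot).Count - 1
            If buckets(slot)(i).Key = key Then
                buckets(slot)(i) = New KeyValuePair(Of String, String)(key, value)
                Return
            End If
        Next
        ' Different keys that share a slot sit side by side in that slot's list
        buckets(slot).Add(New KeyValuePair(Of String, String)(key, value))
    End Sub

    ' Returns the stored value, or Nothing when the key is unknown
    Public Function Find(key As String) As String
        Dim bucket = buckets(SlotFor(key))
        If bucket IsNot Nothing Then
            For Each pair In bucket
                If pair.Key = key Then Return pair.Value
            Next
        End If
        Return Nothing
    End Function
End Class

Module TinyStringMapDemo
    Sub Main()
        Dim map As New TinyStringMap()
        For i As Integer = 0 To 999
            map.Put("shotname" & i, "The " & (i + 1) & "th shotname")
        Next
        Console.WriteLine(map.Find("shotname587"))
    End Sub
End Module
```

With 1000 keys and only 16 slots, many keys share a slot; the small list in each slot is the "routine to deal with two strings becoming the same number" mentioned above.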

Match Words and Add Quantities vb.net

I am trying to program a way to read a text file and match all the values and their quantities. For example, if the text file is like this:
Bread-10 Flour-2 Orange-2 Bread-3
I want to create a list with the total quantity of all the common words. I began my code, but I am having trouble understanding how to sum the values. I'm not asking for anyone to write the code for me, but I am having trouble finding resources. I have the following code:
Dim query = From data In IO.File.ReadAllLines("C:\User\Desktop\doc.txt")
Let name As String = data.Split("-")(0)
Let quantity As Integer = CInt(data.Split("-")(1))
Let sum As Integer = 0
For i As Integer = 0 To query.Count - 1
For j As Integer = i To
Next
Thanks
OK, let's break this down. I have not seen a Let keyword used for a long time (back in the GWBASIC days!), though in your code it is a LINQ query clause rather than the old assignment statement.
But, that's ok.
So, first up, we are going to assume your text file is like this:
Bread-10
Flour-2
Orange-2
Bread-3
As opposed to this:
Bread-10 Flour-2 Orange-2 Bread-3
Now, we could read one line at a time and process it. Or we can read all lines of text, and THEN process the data. If the file is not huge (say a few hundred lines), then performance is not much of an issue, so let's just read in the whole file in one shot (your code also had this idea).
Your starting code is good. So, let's keep it (well OK, very close).
A few things:
We don't need Let here. In your code it is a LINQ clause that defines a computed value inside a From query; it is not the old BASIC LET assignment statement, which VB.NET no longer supports. (You will still see Let in older VB6/VBA class-module code, as Property Let, but let's leave that for another day.)
Now the next part? We could start building up an array, look for the existing value, and then add to it. However, this would require a few extra arrays and a few extra loops.
However, in .net land, we have a cool thing called a dictionary.
And that's just a fancy term for a collection VERY much like an array, but with some extra "fancy" features. The fancy feature is that it lets you put things into the handy list under a "key" name, and then pull that "value" back out by the key.
This saves us a good amount of extra looping code.
And it also means we don't need an array for the results.
This key system is ALSO very fast (behind the scenes it uses a cool concept: hash coding).
Note that I could have saved a few lines here or there, but that would make the code harder to read.
Given that you look to have Fortran or older BASIC experience, let's try to keep the code style somewhat similar. VB.NET still accepts a good deal of older BASIC-style syntax, such as the Split() function used below.
Do note that arrays in VB.NET have some fancy "find" options, but the dictionary structure is even nicer. It also means we can traverse the results without needing a "for i = 1 to end of array" loop and pulling out values by index.
We can use For Each.
So this would work:
' Needs "Imports System.IO" at the top of the file for File.ReadAllLines
Dim MyData() As String ' an array() of strings - one line per array element
MyData = File.ReadAllLines("c:\test5\doc.txt") ' read each line into the array
Dim colSums As New Dictionary(Of String, Integer) ' to hold our values and sum them
Dim sKey As String
Dim sValue As Integer
For Each strLine As String In MyData
    Dim parts() As String = Split(strLine, "-") ' split "Bread-10" once into name and quantity
    sKey = parts(0)
    sValue = CInt(parts(1)) ' explicit conversion, so it also compiles with Option Strict On
    If colSums.ContainsKey(sKey) Then
        colSums(sKey) = colSums(sKey) + sValue
    Else
        colSums.Add(sKey, sValue)
    End If
Next
' display results
Dim KeyPair As KeyValuePair(Of String, Integer)
For Each KeyPair In colSums
    Debug.Print(KeyPair.Key & " = " & KeyPair.Value)
Next
The above results in this output in the debug window:
Bread = 13
Flour = 2
Orange = 2
I was tempted here to write this code using just plain arrays in VB.NET, as that would give you a good idea of the "older" types of coding and syntax we could use here, an approach that harks all the way back to those older PC BASIC systems.
While the dictionary feature is more advanced, it is worth the learning curve here, and it makes this problem a lot easier. If this were for a much longer list, then I would start to consider introducing some kind of database system.
However, without a database, the dictionary is a welcome approach thanks to that "key" lookup ability and not having to loop to find a match. It is also very fast, so the result is less looping code, and better yet we write less code overall.
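If the file really is laid out on one line, as in the question (Bread-10 Flour-2 Orange-2 Bread-3), the same dictionary idea still works once the text is split on whitespace first. A minimal sketch, assuming items are separated by spaces, tabs or line breaks and that names contain no "-"; the names QuantityTotals and SumQuantities are made up for this example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module QuantityTotals
    ' Totals the number after "-" for each item name.
    ' Works for one item per line and for several items on one line.
    Public Function SumQuantities(text As String) As Dictionary(Of String, Integer)
        Dim totals As New Dictionary(Of String, Integer)()
        Dim separators As Char() = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        For Each token As String In text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            Dim parts As String() = token.Split("-"c)
            Dim quantity As Integer
            ' Skip anything that is not in "Name-Number" form instead of throwing
            If parts.Length <> 2 OrElse Not Integer.TryParse(parts(1), quantity) Then Continue For
            Dim current As Integer
            totals.TryGetValue(parts(0), current) ' current is 0 when the name is new
            totals(parts(0)) = current + quantity
        Next
        Return totals
    End Function

    Sub Main()
        For Each pair In SumQuantities("Bread-10 Flour-2 Orange-2 Bread-3")
            Console.WriteLine(pair.Key & " = " & pair.Value)
        Next
    End Sub
End Module
```

To run it against a file, pass File.ReadAllText(path) to SumQuantities. TryGetValue is used instead of the ContainsKey/Add pair purely to keep the loop short; both approaches give the same totals.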

Create table with variable number of columns

I am hoping to get a few ideas for something to push me in the right direction. I have a custom class that stores data from a table based on criteria. The raw data (consisting of over 100 columns and varying between 10-1000 rows) is on a worksheet. My code does the following:
1 - Creates an object from the custom class
2 - Adds a value to the properties of the object
3 - Adds the object to a collection
4 - Returns the collection to the controller which sends it to the view to build the table
The following will build a collection of column ranges from the raw data, at least:
Private mcolColumnAddresses As Collection
Private Sub Class_Initialize()
Set mcolColumnAddresses = New Collection
Dim vHeader As Variant
For Each vHeader In mwksReport.Range(mwksReport.Cells(1, 1), mwksReport.Cells(1, mlLastColumn))
mcolColumnAddresses.Add vHeader.Offset(1, 0).Resize(mlLastRow - 1), vHeader.value
Next vHeader
End Sub
The end users want the ability to choose the columns they want for building the new table. But a typical class for a table would use a row as an object with the column headers as the properties. How would I build a table using class properties when the columns are not known until run-time? I hope that makes sense.
Note: I am not asking for code but for suggestions. Has anybody else had this requirement? If so, how did you approach it? An example is welcome, too.
"But a typical class for a table would use a row as an object with the column headers as the properties"
If your table really has more than 100 columns, or if the column names are only known at runtime, you should probably approach this differently. One object per row is fine, but your class could provide a method for accessing any column value by its name. In VBA syntax:
Function GetValue(byval columnName as string) as Variant
'...
As you see, you have to sacrifice some type safety here, but that is typically a small price to pay for getting this solved in a sensible manner.
Internally, your objects can store the values in some Dictionary (in VBA available through the MS Scripting Runtime), indexed by the column names. This leads to
Function GetValue(ByVal columnName As String) As Variant
    ' valueDict is a Scripting.Dictionary, which uses Exists (not ContainsKey)
    If valueDict.Exists(columnName) Then
        GetValue = valueDict(columnName)
    Else
        '... add some error handling here
    End If
End Function
For populating the dictionary, any database offers ways to determine the column names of a table; just google for "get column names programmatically" to find some example code.

Excel: Populate cells with data from file names that use common naming convention

I have 1000+ files (mostly PDFs) that all follow a common naming convention, e.g.:
CA0001.02 Tax Return A-333 650.5ca 20140729.pdf
Each file has different information in its filename (the "CA number" is different, the "A-number" is different, the date is different, etc.).
I want to create a spreadsheet so that I can manipulate the data that these file names contain; in other words, take the 5 pieces of info listed in the filename and turn it into 5 columns in Excel.
In my research I've come across ways to insert the Excel filename into the current sheet, but that's not what I want. I want to insert the filenames of thousands of other files located elsewhere on the computer. My ideal solution would ensure that:
Each file gets its own row
Each field in the filename goes into the appropriate Excel column
Any filenames that are missing data wouldn't break the operation (e.g., if the date "20140729" wasn't at the end of the file, then the whole thing wouldn't break, it would just leave that cell empty and move to the next file).
I imagine this will require VBA or Command Prompt (and maybe something else?) but my skill with VBA is pretty weak. I would really appreciate any suggestions to get me started. Thanks!
Your question is too generic for a simple answer. The following VBA procedure, with some help from Google, should get you started:
Sub Test()
    Dim FN As String, R As Long
    R = 1
    FN = Dir("C:\*") ' first file name in the folder; change the path as needed
    Do While FN <> ""
        Cells(R, 1) = FN ' column A: the file name
        If InStr(FN, "CA") > 0 Then Cells(R, 2) = "This contains CA!"
        FN = Dir ' next file name, or "" when there are no more
        R = R + 1
    Loop
End Sub

Defining objects from data in an excel table

So I have supplier data organised by country, where the supplier information changes by country.
There are 600 suppliers and there will be upwards of 35 countries (15 attributes per country per supplier). The Excel data sheet looks similar to this:
SupplierID  SupplierName  USsupplierCategory  USsupplierCategoryCode  UKSupplierCategory  ...
1           Sup1          Beverages           1                       Ropes
3           Sup5          Ladders             46                      Small Ladders
If I could figure out a simple way to get this Excel data into an array (even copying and pasting works if the formatting goes quickly, since I only have to do this once a month at the most), I could then loop through it and build the needed objects from the array. But I can't find a simple way to build an array from the Excel data without going through a lot of formatting for an array assignment.
I'm pretty new to VB.net and still an amateur programmer and I just can't seem to envision a simple solution to this.
Is an array the way to go? Should I instead loop through each row as a string and break the data by tabs and assign the data that way?
If I am too vague I apologize, let me know and I will provide more specific detail and some code if needed.
The high level approach I'd use is as follows:
1) Define your supplier object
2) Create a supplier object for each row of your datatable and add it to a collection, e.g. a List of supplier objects
Once you have done this, you can easily iterate through your collection of supplier objects to do whatever you like.
Detailed approach:
1) Define your supplier object:
Public Class Supplier
    Public Property SupplierID As Integer
    Public Property SupplierName As String
    Public Property USsupplierCategory As String
    Public Property USsupplierCategoryCode As Integer
    Public Property UKSupplierCategory As String
End Class
2) Read your excel object into a datatable
This gets a bit trickier. As Tim said, it is easier to save your Excel files as CSV and then read them in, because CSV is easy to read while Excel files are complicated to read: they can be in either xls or xlsx format, and deciding which one you have can be a problem. We can also run into COM object issues, etc.
If you are to stick with reading in Excel, I recommend using LinqToExcel, as this library handles the xls / xlsx issue for you. Here is some example code I knocked up based on your example spreadsheet that returns a list of supplier objects:
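Imports LinqToExcel
Imports Remotion.Data.Linq
Imports System.Data
Imports System.Linq
Public Class ReadInExcelData
    Public Shared Function GetSupplierListFromSupplierExcel() As List(Of Supplier)
        Dim excel As ExcelQueryFactory = New ExcelQueryFactory("C:\Users\YourUserName\Desktop\ExampleData.xls")
        ' Query every row of the worksheet
        Dim rows = From c In excel.Worksheet()
                   Select c
        Dim list As New List(Of Supplier)
        For Each row In rows
            ' Supplier has no parameterised constructor, so set its properties directly.
            ' Cells are addressed by column position and converted with Cast(Of T).
            list.Add(New Supplier With {
                .SupplierID = row(0).Cast(Of Integer)(),
                .SupplierName = row(1).Cast(Of String)(),
                .USsupplierCategory = row(2).Cast(Of String)(),
                .USsupplierCategoryCode = row(3).Cast(Of Integer)(),
                .UKSupplierCategory = row(4).Cast(Of String)()})
        Next
        Return list
    End Function
End Class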
You can get the linqtoexcel library here: http://code.google.com/p/linqtoexcel/
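For the CSV route mentioned above, a minimal sketch could look like the following. It assumes the sheet was saved as plain comma-separated text with a header row and no quoted commas inside fields; the module and function names (CsvSupplierSketch, ParseSuppliers) are invented for this example, and the Supplier class is repeated only so the snippet stands on its own.

```vbnet
Imports System
Imports System.Collections.Generic

Public Class Supplier
    Public Property SupplierID As Integer
    Public Property SupplierName As String
    Public Property USsupplierCategory As String
    Public Property USsupplierCategoryCode As Integer
    Public Property UKSupplierCategory As String
End Class

Module CsvSupplierSketch
    ' Converts CSV lines (header row first) into Supplier objects.
    ' Assumption: simple comma-separated fields, no quoted commas.
    Public Function ParseSuppliers(lines As IEnumerable(Of String)) As List(Of Supplier)
        Dim result As New List(Of Supplier)()
        Dim isHeader As Boolean = True
        For Each line As String In lines
            If isHeader Then
                isHeader = False ' skip the column-name row
                Continue For
            End If
            If String.IsNullOrWhiteSpace(line) Then Continue For
            Dim f As String() = line.Split(","c)
            If f.Length < 5 Then Continue For ' this sketch ignores incomplete rows
            result.Add(New Supplier With {
                .SupplierID = Integer.Parse(f(0)),
                .SupplierName = f(1),
                .USsupplierCategory = f(2),
                .USsupplierCategoryCode = Integer.Parse(f(3)),
                .UKSupplierCategory = f(4)})
        Next
        Return result
    End Function

    Sub Main()
        Dim lines = {
            "SupplierID,SupplierName,USsupplierCategory,USsupplierCategoryCode,UKSupplierCategory",
            "1,Sup1,Beverages,1,Ropes",
            "3,Sup5,Ladders,46,Small Ladders"}
        For Each s In ParseSuppliers(lines)
            Console.WriteLine(s.SupplierID & ": " & s.SupplierName & " / " & s.UKSupplierCategory)
        Next
    End Sub
End Module
```

In real use the lines would come from System.IO.File.ReadLines(path). If fields can contain commas or quotes, a proper CSV reader is the safer choice over a plain Split.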