Is it efficient to use LINQ to parse strings? - vb.net

I've been using LINQ so much in the last couple of weeks that when I had to write a one line function to remove < and > from a string, I found that I had written it as a LINQ query:
Public Function StripLTGT(text As String) As String
Return String.Join("", (From i As Char In text.ToCharArray Where i <> "<"c Where i <> ">"c).ToArray)
End Function
My question is, is it better to do it with LINQ as above or with StringBuilder as I've always done, as below:
Public Function StripLTGT(text As String) As String
Dim a As New StringBuilder(text)
a = a.Replace("<", "")
a = a.Replace(">", "")
Return a.ToString
End Function
Both work, the second one is easier to read, but the first one is designed for executing queries against arrays and other enumerables.

Regex.Replace("[<>]", "")
Is much more straightforward.
Or:
myString = myString.Replace("<", "").Replace(">", "")
Whether or not option A, B or C is faster than the others is hard to say because option A may be better on small strings while option B may be better on long strings, etc.

Either one should really be fine in terms of functionality. The first one is not efficient as is. The ToArray call is doing far more work than necessary (if you're on .NET 4.0, it is not needed anyway), and the ToCharArray call is not needed. Basically the characters in the input string are being iterated a lot more than they need to be, and extra arrays are allocated superfluously.
I wouldn't say this particularly matters; but you asked about efficiency, so that's why I mention it.
The second one seems fine to me. Note that if you wanted to go the one-line route, you could still do so with a StringBuilder and I think still have something more concise than the LINQ version (though I haven't counted characters). Whether or not this even outperforms the more direct String.Replace option is kind of unclear to me, though:
' StringBuilder.Replace version:
Return New StringBuilder(text).Replace("<", "").Replace(">", "").ToString()
' String.Replace version:
Return text.Replace("<", "").Replace(">", "")

Related

Import txt file data and store it in Multidimensioanl array in Vb.net

Sorry if the problem is so basic, I'm a bit used to python not VB.net
I'm trying to read text file data (numbers) and store it in array/list
# Sample of text
1.30e+03,1.30e+03,1.30e+03
5.4600e+02,2.7700e+02,2.8000e+02
# PS: I can control the output of the numbers to have delimiter = ',' or space between numbers, whatever is easier to import
I wrote the following code to read string data and store it. yet, I don't know how to have a multidimensional array (2D or 3D) instead of 1D string (e.g. for the text above, it would be 2x3 array)
' Import Data
Comp_path = FinalPath & "Components_colors.txt"
reader = New StreamReader(Comp_path)
Dim W As String = ""
Dim wArray(10) As String
Dim i As Integer = 0
Do Until reader.Peek = -1
W = reader.ReadLine()
wArray(i) = W
i += 1
Loop
Moreover, I don't know the length of the text file, so I can't determine the length of the array like I did in the code above for the string wArray
For a file like this, you should turn to NuGet for a dedicated CSV parser. There are parsers built into .NET you could also use, but pulling one off of NuGet will also let you parse the values directly into something other than a string.
But if you really don't want to do that you can start with this (assuming Option Infer):
Public Function ImportData(filePath As String) As IEnumerable(Of Double())
Dim lines = File.ReadLines(filePath)
Return lines.Select(Function(line) line.Split(",").Select(AddressOf Double.Parse).ToArray())
End Function
And use it like this:
Comp_path = FinalPath & "Components_colors.txt"
Dim result = ImportData(Comp_path)
Note this code doesn't actually do any meaningful work yet. It doesn't even read the file. What it does is give you an object (result) that you can use with a For Each loop or linq operations. It will read the file in a just-in-time way, parsing out the data for each line as it goes. If you want an array (or List, which you should use in .Net more often), you can append a ToList() call to the end:
Comp_path = FinalPath & "Components_colors.txt"
Dim result = ImportData(Comp_path).ToList()
But you should try to avoid doing that it. It's much less efficient in terms of memory use. The first sample will only ever need to keep one line of the file in memory at a time. Adding ToArray() or ToList() needs to load the entire file.
Some more notes:
Many newer dynamic platforms like Python don't actually use real arrays in the formal computer science sense (fixed block of contiguous memory). Rather, they use collections, and just call them arrays. .Net has collections, too, but when you declare an array, you get an array. This has nice benefits for performance, but if you don't know you want that or how to take advantage of it you're probably better off asking for a generic List most of the time instead.
Thanks to cultural/internationalization issues, parsing numeric (or date) values to string and back again is much slower and more error-prone than you've believed in the past, especially coming from a dynamic platform. It is slow on these other platforms, too, but they want you to pretend it isn't. The first introduction to a strongly-typed platform like .Net can feel stifling in this area, but once you understand the performance and and reliability benefits, you won't want to go back.
In strongly-typed platforms it is very important to understand the data types you are working with at every level of an expression. Otherwise, building and reading statements like the Return line in my answer will be way more difficult and frustrating than it needs to be.

Good programming etiquette/ good programming practices

I'm trying to develop some good programming practices and today I am working on a program that will take info from textboxes, radiobuttons and checkboxes. I know I could do it like this, for example:
TextBoxResult.Text = Val(TextBoxNumbA.Text) + Val(TextBoxNumbB.Text)
and it will work. Or I could also do it like this and it will work as well:
Dim result, numberA, numberB as Integer
numberA = Val(TextBoxNumbA.Text)
numberB = Val(TextBoxNumbB.Text)
result = numberA + numberB
TextBoxResult.Text = result
My question is: which is the proper way to do it? The short way without declaring variables or the long way declaring variables? How do software companies handle this?
Ignoring the horrible val part. Should you use intermediate variables?
Yes if it makes the code more comprehensible. In the case you posted I don't believe it does, but that's only my opinion.
Seeing as you are using VB.net, don't worry about the extra memory required, the compiler will deal with that. In an interpreted language it could be an issue, but code is not written for computers to "understand", it's written for programmers to. Always go for comprehensible first. You can make it incomprehensible once you are sure it's correct...
Val is a left over Legacy function. I would recommend switching to type specific converters. This also provides protection for values that are not convertible like letters. Remember to convert types. Turn Option Strict On and you will see.
Integer.TryParse
Dim result As integer = 0
Dim numberA, numberB as Integer
If Integer.TryParse(TextBoxNumbA.Text, numberA) Then
If Integer.TryParse(TextBoxNumbB.Text, numberB) Then
result = numberA + numberB
End If
End If
TextBoxResult.Text = result.ToString("n2")

Do you choose Linq over Forloops?

Given a datatable containing two columns like this:
Private Function CreateDataTable() As DataTable
Dim customerTable As New DataTable("Customers")
customerTable.Columns.Add(New DataColumn("Id", GetType(System.Int32)))
customerTable.Columns.Add(New DataColumn("Name", GetType(System.String)))
Dim row1 = customerTable.NewRow()
row1.Item("Id") = 1
row1.Item("Name") = "Customer 1"
customerTable.Rows.Add(row1)
Dim row2 = customerTable.NewRow()
row2.Item("Id") = 2
row2.Item("Name") = "Customer 2"
customerTable.Rows.Add(row2)
Dim row3 = customerTable.NewRow()
row3.Item("Id") = 3
row3.Item("Name") = "Customer 3"
customerTable.Rows.Add(row3)
Return customerTable
End Function
Would you use this snippet to retrieve a List(Of Integer) containing all Id's:
Dim table = CreateDataTable()
Dim list1 As New List(Of Integer)
For i As Integer = 0 To table.Rows.Count - 1
list1.Add(CType(table.Rows(i)("Id"), Integer))
Next
Or rather this one:
Dim list2 = (From r In table.AsEnumerable _
Select r.Field(Of Integer)("Id")).ToList()
This is not a question about whether to type cast the Id column to Integer by using .Field(Of Integer), CType, CInt, DirectCast or whatever but generally about whether or not you choose Linq over forloops as the subject implies.
For those who are interested: I ran some iterations with both versions which resulted in the following performance graph:
graph http://dnlmpq.blu.livefilestore.com/y1pOeqhqQ5neNRMs8YpLRlb_l8IS_sQYswJkg17q8i1K3SjTjgsE4O97Re_idshf2BxhpGdgHTD2aWNKjyVKWrQmB0J1FffQoWh/analysis.png?psid=1
The vertical axis shows the milliseconds it took the code to convert the rows' ids into a generic list with the number of rows shown on the horizontal axis. The blue line resulted from the imperative approach (forloop), the red line from the declarative code (linq).
Whatever way you generally choose: Why do you go that way and not the other?
Whenever possible I favor the declarative way of programming instead of imperative. When you use a declarative approach the CLR can optimize the code based on the characteristics of the machine. For example if it has multiple cores it could parallelize the execution while if you use an imperative for loop you are basically locking this possibility. Today maybe there's no big difference but I think that in the future more and more extensions like PLINQ will appear allowing better optimization.
I avoid linq unless it helps readability a lot, because it completely destroys edit-and-continue.
When they fix that, I will probably start using it more, because I do like the syntax a lot for some things.
For almost everything I've done I've come to the conclusion that LINQ is optimized enough. If I handcrafted a for loop it would have better performance, but in the grand scheme of things we are usually talking milliseconds. Since I rarely have a situation where those milliseconds will make any kind of impact, I find it's much more important to have readable code with clear intentions. I would much rather have a call that is 50ms slower than have someone come along and break it altogether!
Resharper has a cool feature that will flag and convert loops into Linq expressions. I will flip it to the Linq version and see if that hurts or helps readability. If the Linq expression more clearly communicates the intent of the code, I will go with that. If the Linq expression is unreadable, I will flip back to the foreach version.
Most of the performance issues don't really compare with readability for me.
Clarity trumps cleverness.
In the above example, I would go with the the Linq version since it clearly explains the intent and also locks out people accidently adding side effects in the loop.
I recently found myself wondering whether I've been totally spoiled by LINQ. Yes, I now use it all the time to pick all sort of things out from all sort of collections.
I started to, but found out in some cases, I saved time by using this approach:
for (var i = 0, len = list.Count; i < len; i++) { .. }
Not necessarily in all cases, but some. Most extension methods use the foreach approach of querying.
I try to follow these rules:
Whenever I'm just querying (filtering, projecting, ...) collections, use LINQ.
As soon as I'm actually 'doing' something with the result (i.e, introduce side effects), I'll use a for loop.
So in this example, I'll use LINQ.
Also, I always try to split up the 'query definition' from the 'query evaluation':
Dim query = From r In table.AsEnumerable()
Select r.Field(Of Integer)("Id")
Dim result = query.ToList()
This makes it clear when that (in this case in-memory) query will be evaluated.

When to use StringBuilder? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
String vs StringBuilder
I just revisited some of the books that I used to pick up VB.NET. I am not sure I've got this in my head, understand how/what StringBuilder is.
What is the guidance for using? Is it best to use it if you are are concatenating 2 strings or 50?
Or when the the total string length is greater than 128 characters?
Or will you see a performance benefit whenever you use it to add strings together?
In which case is it better to use a StringBuilder instance to build a SQL statement than string.format("Select * from x where y = {0}",1)?
It's always struck me that declaring another variable and including a name space is not beneficial for small string concatenations, but I am not sure now.
Sorry, lot of documentation tells you what to use, just not what's best.
I've got an article on this very topic. In summary (copied from the bottom of the page):
Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator is potentially performance suicide.
Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for other purpose (as we do in the example which creates a Person object).
If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations the sizes of string involved, and what order they're concatenated in. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
Here is my rule of thumb:
StringBuilder is best used when the exact number of concatenations is unknown at compile time.
Coding Horror has a good article concerning this question, The Sad Tragedy of Micro-Optimization Theater.
Personally I use StringBuilder when I have more than just one or two strings to concatenate. I'm not sure if there's a real performance hit to be gained, but I've always read and been told that doing a regular concatenation with multiple strings creates an extra copy of the string each time you do it, while using StringBuilder keeps one copy until you call the final ToString() method on it.
Someone's figured out experimentally that the critical number is 6. More than 6 concatenations in a row and you should use a StringBuilder. Can't remember where I found this.
However, note that if you just write this in a line:
"qwert" + "yuiop" + "asdf" + "gh" + "jkl;" + "zxcv" + "bnm" + ",."
That gets converted into one function call (I don't know how to write it in VB.net)
String.Concat("qwert", "yuiop", "asdf", "gh", "jkl;", "zxcv", "bnm", ",.");
So if you're doing all concatenations on one line, then don't bother with StringBuilder because String.Concat effectively will do all the concatenations in one go. It's only if you're doing them in a loop or successively concatenating.
My rule - when you're adding to a string in a For or Foreach loop, use the StringBuilder.

Performance difference between IIf() and If

In Visual Basic, is there a performance difference when using the IIf function instead of the If statement?
VB has the following If statement which the question refers to, I think:
' Usage 1
Dim result = If(a > 5, "World", "Hello")
' Usage 2
Dim foo = If(result, "Alternative")
The first is basically C#'s ternary conditional operator and the second is its coalesce operator (return result unless it’s Nothing, in which case return "Alternative"). If has thus replaced IIf and the latter is obsolete.
Like in C#, VB's conditional If operator short-circuits, so you can now safely write the following, which is not possible using the IIf function:
Dim len = If(text Is Nothing, 0, text.Length)
IIf() runs both the true and false code. For simple things like numeric assignment, this isn't a big deal. But for code that requires any sort of processing, you're wasting cycles running the condition that doesn't match, and possibly causing side effects.
Code illustration:
Module Module1
Sub Main()
Dim test As Boolean = False
Dim result As String = IIf(test, Foo(), Bar())
End Sub
Public Function Foo() As String
Console.WriteLine("Foo!")
Return "Foo"
End Function
Public Function Bar() As String
Console.WriteLine("Bar!")
Return "Bar"
End Function
End Module
Outputs:
Foo!
Bar!
Also, another big issue with the IIf is that it will actually call any functions that are in the arguments [1], so if you have a situation like the following:
string results = IIf(Not oraData.IsDBNull(ndx), oraData.GetString(ndx), string.Empty)
It will actually throw an exception, which is not how most people think the function works the first time that they see it. This can also lead to some very hard to fix bugs in an application as well.
[1] IIf Function - http://msdn.microsoft.com/en-us/library/27ydhh0d(VS.71).aspx
According to this guy, IIf can take up to 6x as long as If/Then. YMMV.
Better use If instead of IIf to use the type inference mechanism correctly (Option Infer On)
In this example, Keywords is recognized as a string when I use If :
Dim Keywords = If(String.IsNullOrEmpty(SelectedKeywords), "N/A", SelectedKeywords)
Otherwise, it is recognized as an Object :
Dim Keywords = IIf(String.IsNullOrEmpty(SelectedKeywords), "N/A", SelectedKeywords)
On top of that, readability should probably be more highly preferred than performance in this case. Even if IIF was more efficient, it's just plain less readable to the target audience (I assume if you're working in Visual Basic, you want other programmers to be able to read your code easily, which is VB's biggest boon... and which is lost with concepts like IIF in my opinion).
Also, "IIF is a function, versus IF being part of the languages' syntax"... which implies to me that, indeed, If would be faster... if for nothing else than that the If statement can be boiled down directly to a small set of opcodes rather than having to go to another space in memory to perform the logic found in said function. It's a trite difference, perhaps, but worth noting.
I believe that the main difference between If and IIf is:
If(test [boolean], statement1, statement2)
it means that according to the test value either satement1 or statement2 will executed
(just one statement will execute)
Dim obj = IIF(test [boolean] , statement1, statement2)
it means that the both statements will execute but according to test value one of them will return a value to (obj).
so if one of the statements will throw an exception it will throw it in (IIf) anyway but in (If) it will throw it just in case the condition will return its value.
...as to why it can take as long as 6x, quoth the wiki:
Because IIf is a library function, it
will always require the overhead of a
function call, whereas a conditional
operator will more likely produce
inline code.
Essentially IIf is the equivalent of a ternary operator in C++/C#, so it gives you some nice 1 line if/else type statements if you'd like it to. You can also give it a function to evaluate if you desire.
Those functions are different! Perhaps you only need to use IF statement.
IIF will always be slower, because it will do both functions plus it will do standard IF statement.
If you are wondering why there is IIF function, maybe this will be explanation:
Sub main()
counter = 0
bln = True
s = iif(bln, f1, f2)
End Sub
Function f1 As String
counter = counter + 1
Return "YES"
End Function
Function f2 As String
counter = counter + 1
Return "NO"
End Function
So the counter will be 2 after this, but s will be "YES" only. I know this counter stuff is useless, but sometimes there are functions that you will need both to run, doesn't matter if IF is true or false, and just assign value from one of them to your variable.