When to use StringBuilder? [duplicate] - vb.net

I just revisited some of the books that I used to pick up VB.NET, and I am not sure I fully understand what StringBuilder is or how it should be used.
What is the guidance for using it? Is it best to use it if you are concatenating 2 strings or 50?
Or when the total string length is greater than 128 characters?
Or will you see a performance benefit whenever you use it to add strings together?
In which case is it better to use a StringBuilder instance to build a SQL statement than string.format("Select * from x where y = {0}",1)?
It's always struck me that declaring another variable and importing a namespace is not beneficial for small string concatenations, but I am not sure now.
Sorry, a lot of documentation tells you what you can use, just not what's best.

I've got an article on this very topic. In summary (copied from the bottom of the page):
Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator is potentially performance suicide.
Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for any other purpose (as we do in the example which creates a Person object).
If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations, the sizes of the strings involved, and the order in which they're concatenated. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
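The first two rules above can be sketched as follows. The thread is about VB.NET, but this sketch uses Java, whose immutable String and StringBuilder behave analogously for this purpose. The class and method names (ConcatDemo, joinWithBuilder and so on) are made up for illustration, and nothing here is a benchmark.

```java
import java.util.List;

final class ConcatDemo {
    // Loop with an iteration count unknown at compile time:
    // append to one mutable buffer and build the string once at the end.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    // Same result with +=, but every iteration allocates a new string
    // and copies everything accumulated so far.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // When a delimiter is needed, the library join does the work in one call.
    static String joinWithDelimiter(List<String> parts) {
        return String.join(", ", parts);
    }
}
```

Both loop variants return the same string; they differ only in how much intermediate copying happens, which is why the difference shows up only when the loop is non-trivial.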

Here is my rule of thumb:
StringBuilder is best used when the exact number of concatenations is unknown at compile time.

Coding Horror has a good article concerning this question, The Sad Tragedy of Micro-Optimization Theater.

Personally I use StringBuilder when I have more than just one or two strings to concatenate. I'm not sure if there's a real performance gain to be had, but I've always read and been told that regular concatenation of multiple strings creates an extra copy of the string each time you do it, while StringBuilder keeps a single buffer until you call the final ToString() method on it.

Someone's figured out experimentally that the critical number is 6. More than 6 concatenations in a row and you should use a StringBuilder. Can't remember where I found this.
However, note that if you just write this in a line:
"qwert" + "yuiop" + "asdf" + "gh" + "jkl;" + "zxcv" + "bnm" + ",."
That gets converted into one function call, shown here in C# syntax (I don't know how to write it in VB.NET). Strictly speaking, when every operand is a literal as above, the compiler folds them into a single constant at compile time; the single String.Concat call is what you get when the operands are variables:
String.Concat("qwert", "yuiop", "asdf", "gh", "jkl;", "zxcv", "bnm", ",.");
So if you're doing all concatenations on one line, then don't bother with StringBuilder, because String.Concat effectively does all the concatenations in one go. StringBuilder only pays off if you're concatenating in a loop or across successive statements.

My rule - when you're adding to a string in a For or For Each loop, use a StringBuilder.

Related

VB.NET Decimal.Try parse for dot and comma values

I'm trying to parse a string to a decimal in VB.NET. The string may contain a dot or a comma as the decimal separator, e.g. '5000.00' or '5000,00' (the data comes from Belgium and the Netherlands).
Code for decimal with dot:
Decimal.TryParse(amountStr, amountVal)
Code for decimal with comma:
Decimal.TryParse(amountStr, NumberStyles.AllowDecimalPoint, CultureInfo.CreateSpecificCulture("nl-BE"), amountVal)
Is it possible to combine these into one code without replacing comma in string?
String-replacement is the "usual" solution to this problem. A slightly more elegant alternative would be to check if the string contains a . or a , and then provide the "correct" CultureInfo to TryParse:
Dim isBelgianFormat As Boolean = amountStr.Contains(",")
Dim ci As CultureInfo = If(isBelgianFormat,
                           CultureInfo.GetCultureInfo("nl-BE"),
                           CultureInfo.InvariantCulture)
...Decimal.TryParse(amountStr, NumberStyles.AllowDecimalPoint, ci, amountVal)...
This will also allow you to "fine-tune" your guessing logic by replacing the first line with a more complicated algorithm. (For example, this "simple" solution will fail if your users use thousands separators, i.e., if you want to correctly "guess" the value of both 500.000,00 and 500,000.00.)
That having been said, you can make your code more complicated to cover these cases as well, but how do you want to treat, for example, 500.000 or 500,000? Is it half a million or 500?
Thus, I urge you to reconsider your requirements. Especially when parsing monetary values, failing with a helpful error message is often preferable to guessing what the user might have meant.
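To illustrate the "fail rather than guess" advice, here is a minimal sketch in Java (not VB.NET) using java.text.DecimalFormat. The class name AmountParser and its behaviour are assumptions made for this example: it accepts either a dot or a comma as the decimal separator, never treats either as a thousands separator, and returns an empty result for anything else, such as 500.000,00.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.Optional;

final class AmountParser {
    // Parses "5000.00" or "5000,00"; anything ambiguous is rejected.
    static Optional<BigDecimal> parse(String text) {
        String s = text.trim();
        boolean hasComma = s.indexOf(',') >= 0;
        boolean hasDot = s.indexOf('.') >= 0;
        if (s.isEmpty() || (hasComma && hasDot)) {
            return Optional.empty(); // mixed separators: do not guess
        }
        // Pick the decimal separator based on what the input contains.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(hasComma ? ',' : '.');
        symbols.setGroupingSeparator(hasComma ? '.' : ',');
        DecimalFormat format = new DecimalFormat("0.#", symbols);
        format.setGroupingUsed(false);   // no thousands separators accepted
        format.setParseBigDecimal(true); // avoid binary floating point for money

        ParsePosition pos = new ParsePosition(0);
        Number parsed = format.parse(s, pos);
        // Reject partial parses such as "12abc" or "1,2,3".
        if (!(parsed instanceof BigDecimal) || pos.getIndex() != s.length()) {
            return Optional.empty();
        }
        return Optional.of((BigDecimal) parsed);
    }
}
```

With this sketch, AmountParser.parse("5000,00") and AmountParser.parse("5000.00") both yield a value equal to 5000, while "500.000,00" yields an empty Optional that the caller can turn into an error message.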

How do I concatenate Strings in Kotlin?

Obviously there are multiple ways to concatenate Strings in Kotlin:
1. processString(pojo.name + " " + pojo.value)
2. processString("${pojo.name} ${pojo.value}")
3. processString(pojo.name.plus(" ").plus(pojo.value))
Of course it can also be done with StringBuilder, the concat() method, etc.
Those will all work.
But my question is: why does Android Studio propose "convert concatenation to template" and convert 1. to 2.? Are there any speed advantages with 2.? What is the advantage of using 2.?
TL;DR: String Templates are the most idiomatic way to concatenate strings
The documentation states
Note that in most cases using string templates or raw strings is preferable to string concatenation.
String templates are basically the same as regular concatenation (using +) but more compact, idiomatic and equally efficient. Both variants are implemented using StringBuilders in the byte code.
That's because the first approach comes from Java. The compiler handles both forms, but the suggestion is to use the idiomatic Kotlin form shown in 2. The second approach is also better because the + (plus()) operator can be confused with the one used to add numbers.

Mid() doesn't extract the string at the correct position

I am using VBA in the MS Access environment to handle a long string (a memo field that originally stored HTML).
After locating positions with Instr(), I pass them to Mid(vStr,vStartPos,vEndPos-vStartPos+1) to extract the substring, but the output doesn't match. I have checked this carefully in the Immediate window as well as in Notepad++. Instr() and Notepad++ give the same character counts, while Mid() differs: its result starts earlier than Instr()'s position in some cases and later in others. I don't know the reason, and can only assume that Mid() uses a different mechanism or is defective (surprising!) when handling long strings that mix single-byte and double-byte characters (though this is common and has caused me no problems before), and possibly some special characters.
I believe I need to write a custom Mid() function. Any idea how to do it effectively and efficiently?
Thanks, all, for your replies. After I wrote a custom Mid() using RegEx and found that the problem persisted, I discovered the silly mistake I had made. Instr() and Mid() work fine; the string had been carelessly modified between the two calls. So this case should be closed now.

Regex match SQL values string with multiple rows and same number of columns

I tried to match the sql values string (0),(5),(12),... or (0,11),(122,33),(4,51),... or (0,121,12),(31,4,5),(26,227,38),... and so on with the regular expression
\(\s*\d+\s*(\s*,\s*\d+\s*)*\)(\s*,\s*\(\s*\d+\s*(\s*,\s*\d+\s*)*\))*
and it works. But...
How can I ensure that the regex does not match a values string like (0,12),(1,2,3),(56,7) with different number of columns?
Thanks in advance...
As I mentioned in a comment on the question, the best way to check whether the input string is valid (that is, every pair of brackets contains the same count of numbers) is to use a client-side program rather than pure SQL.
Implementation:
// Requires: using System; using System.Collections.Generic; using System.Linq;
List<string> s = new List<string>(){
    "(0),(5),(12)", "(0,11),(122,33),(4,51)",
    "(0,121,12),(31,4,5),(26,227,38)", "(0,12),(1,2,3),(56,7)"};
var qry = s.Select(a => new
    {
        orig = a,
        // Strip the brackets, leaving one "n,n,..." string per row
        newst = a.Split(new string[]{"),(", "(", ")"},
            StringSplitOptions.RemoveEmptyEntries)
    })
    .Select(a => new
    {
        orig = a.orig,
        // Valid only if every row has the same number of values
        isValid = a.newst
            .Select(b => b.Split(new char[]{','},
                StringSplitOptions.RemoveEmptyEntries).Length)
            .Distinct().Count() == 1
    });
Result:
orig isValid
(0),(5),(12) True
(0,11),(122,33),(4,51) True
(0,121,12),(31,4,5),(26,227,38) True
(0,12),(1,2,3),(56,7) False
Note: The second Select counts the values in each bracketed group and checks that all groups share the same count. (Taking the total number of values modulo the number of groups is not enough: (1),(2,3,4) has 4 values in 2 groups, yet its rows differ in length.) If more than one distinct count is found, the input string is invalid.
I strongly believe there's a simpler way to achieve that, but at this moment I don't know how ;)
:(
Unless you add some more constraints, I don't think you can solve this problem only with regular expressions.
Regular expressions can't solve every string problem. For example, they cannot be used to check whether opening and closing brackets are correctly balanced (as in "((())()(()(())))"). That's a more complicated issue.
That's what I learnt in class :P If someone knows a way then that'd be sweet!
I'm sorry, I spent a bit of time looking into how we could turn this string into an array and do more work on it with SQL, but the built-in functionality is lacking and the solution would end up being very hacky.
I'd recommend handling this situation differently, as large-scale string computation isn't the best way to go if your database is going to gradually fill up.
A combination of client-side and server-side validation can help prevent bad data (like the strings with mismatched column counts) from getting into the database.
If you need to keep those numbers then you could rework your schema to include some metadata which you can use in your queries, like how many numbers there are and whether it all matches nicely. This information can be computed inexpensively from your server and provided to the database.
Good luck!

Split SQL statements

I am writing a backend application which needs to be able to send multiple SQL commands to a MySQL server.
MySQL >= 5.x supports multiple statements, but unfortunately we are interfacing with MySQL 4.x.
I am trying to find a way (hint: regex) to split SQL statements on their semicolons while ignoring semicolons inside single- and double-quoted strings.
http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but doesn't support double quotes.
I'd be happy to hear your suggestions.
Can't be done with regex, it's insufficiently powerful to parse SQL. There may be an SQL parser available for your language — which is it? — but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone there are many SQL_MODE flags on a server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.
The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.
As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.
I think the quickest path to solving this is going to be with a string scanner function, that examines every character of the string in sequence, and reacts based on a bit of stored state. Rough pseudocode:
Read in a character
If the character is not special, CONTINUE
If the character is escaped (checking this probably requires examining the previous character), CONTINUE
If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.
Language parsing is not any of my areas of experience, so you'll want to consider that approach carefully; nonetheless, it's going to be fast (with C-style strings, none of those steps are at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context) and I think it should get the job done.
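The pseudocode above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name SqlSplitter is made up, backslash is treated as the escape character inside single and double quotes, backticks are treated as a third quoting style, and SQL comments and SQL_MODE variations are not handled at all.

```java
import java.util.ArrayList;
import java.util.List;

final class SqlSplitter {
    // Splits a script on semicolons that are outside quoted regions.
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means "not inside a quoted region"
        boolean escaped = false; // previous character was a backslash

        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (quote != 0) {
                current.append(c);
                if (escaped) {
                    escaped = false;            // escaped char is taken literally
                } else if (c == '\\' && quote != '`') {
                    escaped = true;
                } else if (c == quote) {
                    quote = 0;                  // closing quote
                }
            } else if (c == '\'' || c == '"' || c == '`') {
                quote = c;                      // opening quote
                current.append(c);
            } else if (c == ';') {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);     // trailing statement without ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

A doubled quote such as 'it''s' needs no special case here: the first quote closes the string and the second immediately reopens it, so a semicolon between them is impossible. Comment syntaxes would need additional state, which is where a real SQL parser becomes the better tool.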
Maybe with the following Java regexp? Check the test...
@Test
public void testRegexp() {
    String s = //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n" + //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n";
    String regexp = "([^;]*?('.*?')?)*?;\\s*";
    assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
I would suggest seeing if you can redefine the problem space so the need to send multiple queries separated only by their terminator is not required.
Try this. Just replaced the 1st ' with \" and it seems to work for both ' and "
;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)