I have this condition:
If (cmbStatusSearch.SelectedValue <> "-1") Then
How can I make it perform better? I read that String.CompareOrdinal is faster at comparing strings. So should I use:
If (String.CompareOrdinal(cmbStatusSearch.SelectedValue, "-1") <> 0) Then
Or is there any other way to make it faster?
I think you're being overly concerned about the wrong type of performance issues and prematurely optimizing a trivial piece of code.
The first example is much more readable than the second. If it serves your purpose then move on and be content with it. Your performance bottleneck will not be in that statement. If you feel some operation in your program is slow then use a profiler, such as the ANTS Performance Profiler (or similar), to discover where the bottleneck is. Until then, guessing about performance issues is futile.
To put this into perspective, consider that no one would use LINQ if they were so concerned over performance to the level presented in your question. Instead, they would stick to traditional code and for loops, which are known to be faster. However, for the sake of readability and expressiveness, LINQ is commonly used and acceptable.
Although String.CompareOrdinal might be more efficient, I would recommend using it when you need to benefit from its intended purpose, which is to perform a case-sensitive comparison using ordinal sort rules. Otherwise, for your posted example, a direct comparison is fine and more readable.
Let's think about this:
Any string equality check can be implemented via String.CompareOrdinal.
A string comparison check cannot be implemented via an equality check.
So if CompareOrdinal were faster, why wouldn't the framework just implement Equals in terms of it? In fact it's slower (exact numbers depend on the framework version), but this is not surprising, since it does strictly more work.
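To see this concretely, here is a minimal sketch (VB.NET, with a hypothetical value standing in for cmbStatusSearch.SelectedValue) showing that both forms answer the same yes/no question:
Module ComparisonDemo
    Sub Main()
        ' Hypothetical stand-in for cmbStatusSearch.SelectedValue.
        Dim selectedValue As String = "3"
        ' The readable form; under the default Option Compare Binary,
        ' the <> operator already performs a binary (ordinal) comparison.
        Dim viaOperator As Boolean = (selectedValue <> "-1")
        ' The same check routed through CompareOrdinal.
        Dim viaCompareOrdinal As Boolean = (String.CompareOrdinal(selectedValue, "-1") <> 0)
        Console.WriteLine(viaOperator = viaCompareOrdinal) ' Always True
    End Sub
End Module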
Related
So my question is rather simple: what's better and faster, doing calculations in code (let's say Java) or doing complex database queries (assuming the same action can be done both ways)? Which approach is better in general, and why?
I'd do it in the code.
Doing business calculations in queries in the DB spreads the logic of the app around and makes it harder to understand; it also often binds you to a specific storage engine (e.g. SQL Server/Oracle/MySQL), closing off the possibility of changing the storage paradigm later (e.g. to a NoSQL DB).
In code, you can also apply dependency injection to change the behavior easily, making it more manageable.
I generally find it faster (in development time) to write a query to do what I need. The first round focuses on logical correctness, and I'll complete the rest of the functionality using that query. I try to avoid doing queries in loops in this phase, but otherwise I don't worry too much about performance.
Then, I'll look at the performance of the whole feature. If the query is too slow, I'll EXPLAIN it and look at the server's statistics. My first focus is on indexes, if that doesn't work I'll try restructuring the query. Sometimes correlated subqueries are faster than joins, sometimes unions are faster than disjunctions in the WHERE clause, sometimes it's the other way around.
If I can't get satisfactory performance using a single query, I may break it up into multiple queries and/or do some of the work in code. Such code tends to be more complicated and longer than an equivalent query, so I try to avoid it unless necessary.
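As one hedged illustration of the restructuring mentioned above (all table and column names are invented), a disjunction in the WHERE clause rewritten as a UNION, which can let the optimizer use one index per branch:
-- Assumes a hypothetical orders table with separate indexes
-- on customer_id and status.
SELECT * FROM orders WHERE customer_id = 42 OR status = 'OPEN';
-- The same rows expressed as a UNION: each branch can seek on its own
-- index, and UNION removes the duplicates where both conditions match.
SELECT * FROM orders WHERE customer_id = 42
UNION
SELECT * FROM orders WHERE status = 'OPEN';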
Articles on the internet say user-defined functions can either hurt or help performance.
Now, I know that standard SQL is pretty limited; however, some of the same behavior can still be written with T-SQL's built-in functions.
For example, adddays() vs. dateadd(). Another point I've heard is that it's better to use coalesce(), the ANSI standard function, rather than isnull().
What is the performance difference between using the ANSI SQL standard functions vs T-SQL functions?
Does T-SQL add any performance burden whatsoever in trying to make the job easier, or not?
My research does not seem to indicate a trend.
You will need to approach this on a case-by-case basis and do actual testing. There is no general rule, other than that Microsoft tries to make the entire stack perform as well as possible. TESTING is what you need to do; we can't tell you that a certain construct will always be faster. That would be really bad advice.
It is important to do this testing on your actual production data, preferably a copy of it. Do not rely on tests done against data sets that aren't yours. When you're talking about performance differences between functions, some very subtle things can make a big difference. Things like the size of the table, the data types involved, the indexing, and the SQL Server version can change the result of these tests. That is why "no one has done this" for you. We can't.
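As a sketch of what such a case-by-case test might look like in T-SQL (the table and column are hypothetical), comparing the isnull/coalesce pair mentioned in the question:
SET STATISTICS TIME ON;
SELECT ISNULL(middle_name, '') FROM dbo.Customers;   -- T-SQL built-in
SELECT COALESCE(middle_name, '') FROM dbo.Customers; -- ANSI standard
SET STATISTICS TIME OFF;
-- Compare the CPU and elapsed times reported in the Messages tab;
-- repeat several runs against a copy of your production data before
-- drawing any conclusion.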
I'm currently working on my first project in .NET 4.0, and it requires several thousand string comparisons (I'm searching directories, and sometimes entire drives, for certain files). For the most part the strings are quite short, because I'm only looking at file paths, so I have just used String.Contains() to see if the file path string contains my needle string.
I was wondering though, would Regex be a better idea? At what point will the Regex be faster than a standard string comparison? Is it based on the length of the strings being compared or the number of strings being compared?
It's variable. Comparison performance is a complex function of the input data, the culture being used for comparing, case sensitivity, and CompareOptions. A Regex object is more expensive to instantiate (unless it's in the Regex cache), so if you're doing a lot of one-off comparisons it's not that great to use, and I've found it's typically slower than IndexOf(), but YMMV.
Keep in mind that when using Contains/IndexOf, the culture under which the user/thread is running decides how the comparison is done. That can have a significant impact on performance; not all cultures are equally fast.
The Invariant culture is a very fast culture. If you use a CompareInfo directly, rather than doing String.IndexOf(), it will be somewhat faster still.
CultureInfo.InvariantCulture.CompareInfo.IndexOf(..)
The only way to have some confidence in making the right choice is to benchmark. That said, unless you're sifting through many megabytes of strings, it won't make a difference that matters to anyone. As ChrisF said, focus on readable/maintainable code in that case.
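To make the CompareInfo call above concrete, here is a minimal sketch (the path and needle are made up), alongside the ordinal search that is usually the fastest choice for file paths:
Imports System.Globalization

Module SearchDemo
    Sub Main()
        ' Hypothetical haystack and needle.
        Dim path As String = "C:\temp\quarterly_report.txt"
        Dim needle As String = "report"
        ' Invariant-culture search via CompareInfo, as suggested above.
        Dim idx1 As Integer = CultureInfo.InvariantCulture.CompareInfo.IndexOf(path, needle)
        ' Ordinal search skips culture tables entirely.
        Dim idx2 As Integer = path.IndexOf(needle, StringComparison.Ordinal)
        Console.WriteLine(idx1 >= 0 AndAlso idx2 >= 0) ' True when both find it
    End Sub
End Module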
Here's a good article on getting the most out of regex:
Optimizing Regular Expression Performance
If your search expression is simple then I don't think it's worth moving to a Regex; no matter how good you are at writing and reading them, it will take you more time to understand the code when you (or, more importantly, someone else) look at it again in 6 months' time.
If the speed improvements are only marginal stay with the more readable, maintainable code.
I'm just guessing, but I suspect that for simple substring searches there will be little difference in performance between String.Contains(), String.IndexOf(), and regex (if anything, I'd guess that regex would never be faster, but might be slower by a minuscule amount).
You shouldn't give any thought to moving to regex unless your requirements are (or become) such that you need to match something more complex than a substring.
In .NET 4.0 there is an issue with the String.IndexOf call (see Hotfix 2467309); it may help you decide your answer.
I don't have much experience with SQL. Most of the queries I have written have been very small. Whenever I see a very large query, I always kinda assume it needs to be optimized. But is this true? Or are there situations where really large queries are just what's needed?
BTW, when I say large queries I mean queries over 1000 characters.
Yes, any statement, method, or even query can be "too big".
The problem is actually defining what "too big" really is.
If you can't sit down and figure out what the query does in a relatively short amount of time, it's probably best to break it up into smaller chunks.
I always like to look at things from a maintenance standpoint. If the query is hard to understand now, what if you have to debug something in it?
Just because you see a large query, doesn't mean it needs to be changed or optimized, but if it's too complicated for its own good, then you might want to consider refactoring.
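As a sketch of what breaking a query into smaller chunks can look like (all names are invented), a nested subquery pulled out into a named CTE:
-- Before: inline derived table, harder to scan.
SELECT c.name, t.order_count
FROM customers c
JOIN (SELECT customer_id, COUNT(*) AS order_count
      FROM orders
      GROUP BY customer_id) t ON t.customer_id = c.id;
-- After: the same query with the chunk named, so each piece can be
-- read (and debugged) on its own.
WITH order_counts AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
)
SELECT c.name, oc.order_count
FROM customers c
JOIN order_counts oc ON oc.customer_id = c.id;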
Just as in other languages, you can't determine the efficiency of a query based on a character count. Also, 1000 characters isn't what I would call "large", especially when you use good table/column names, aliases that make sense, etc.
If you're not comfortable enough with SQL to be able to eyeball the design merits of a particular query, run it through a profiler and examine the execution plan. That'll give you a good idea of the problems, if any, the code in question suffers from.
My rule of thumb is this: write the best, tightest, simplest code you can, and optimize where needed, i.e., where you see a performance bottleneck or where (as frequently happens) you slap yourself in the head and say "D'OH!" at about three in the morning on vacation.
Summary: Code well, and optimize where needed.
As Robert said, if you can't easily tell what the query is doing, it probably needs to be simplified.
If you are used to writing simple stuff, you may not realize how complex getting the information for a complex report might be. Yes, queries can get long and complicated and still perform well for what they are being asked to do. Often the techniques used to performance-tune something make the code look more complicated to those less familiar with advanced querying techniques. What counts is how long it takes to execute and whether it returns the correct data, not how many characters it has.
When I see a complex query, my first thought is does it return what the developer really intended to return (you'd be surprised at how often the answer to that is no) and then I look to see if it could be performance tuned. Yes there are many badly written long queries out there, but there are also as many or more that do what they are intended to do about as fast as it can be done without a major database redesign or faster hardware.
I'd suggest that it's not the character count that should measure the size/complexity of a query.
I'd boil it down to:
what's the goal of the query?
does it use set-based logic? (see the sketch after this list)
does it re-use any components?
does it JOIN improperly/poorly?
what are the performance implications?
maintainability concerns - is it written so that another developer can grok its intentions?
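For the set-based point above, a hedged before/after sketch (the products table and pricing rule are hypothetical) of row-by-row cursor logic versus a single set-based statement:
-- Row-by-row: one UPDATE per row through a cursor.
DECLARE @id INT;
DECLARE price_cursor CURSOR FOR
    SELECT id FROM products WHERE discontinued = 1;
OPEN price_cursor;
FETCH NEXT FROM price_cursor INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE products SET price = price * 0.9 WHERE id = @id;
    FETCH NEXT FROM price_cursor INTO @id;
END;
CLOSE price_cursor;
DEALLOCATE price_cursor;
-- Set-based: the whole set in one statement.
UPDATE products SET price = price * 0.9 WHERE discontinued = 1;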
Where I work, we've created stored procedures that exceed 1000 characters. I can't really say it was NECESSARY, but sometimes haste wins out over efficiency (most notably when a quick fix is needed for a client).
Having said that... if given the time, I would attempt to make a query as small and efficient as it can get without it being overly confusing. I've used nested stored procedures and/or functions to make things a little clearer as well.
The number of characters does not mean that a query needs to be optimized - it is what you see within those characters that does.
Things like subqueries on top of subqueries are something I would review. I'd review JOINs as well, but comparing against the ERD it shouldn't take long to know if there's an unnecessary JOIN; the first thing I'd look at would be tables that are joined but not used in the output, which is fine if they are link/corollary/etc. tables.
Maintainability of SQL statements over time is a constant concern in my head, especially when Scrum is used and the code has no single owner, that is, everyone must be able to repair and maintain each piece. Optimizing SQL procedures usually means converting them into set-based commands and using special operators. I need tips for keeping the code maintainable without forgetting the trade-off between optimization and readability.
Comments. If it's a newer command or an obscure one, make sure to leave a comment associated with the statement describing it in code/source. That way you don't have another developer down the road refactoring the statement to improve readability at the cost of performance. My general guideline with this is if someone of intermediate skill level or below would have to spend several minutes or more searching for what the statement is really doing, leave the comment to save them time.
I wouldn't worry so much about readability other than having the formatting conform to defined standards. Optimization is much more important than using only simple SQL that anyone can understand. That is where comments should come in... Explain what the SQL should be doing and why you chose a certain optimization technique. The added advantage to this is that it will help the next person who reads it to learn new SQL techniques.
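A sketch of the kind of comment meant here (the query and its history are invented for illustration):
-- WHY: NOT IN over the nullable orders.customer_id silently returned no
-- rows and forced a scan; rewritten as NOT EXISTS after profiling.
-- Please benchmark before "simplifying" this back to NOT IN.
SELECT c.id
FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);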
I've found the best solution to be to include, in your comments, a clearly qualified, duplicable optimization test for the query, using statistics from the optimizer. (This also works nicely for stored procedures, where the same issues appear.)
Include a statement about the nature of the testing context (hardware and data), data generation code if necessary, and a clear description of testing conditions (cache settings, repetitions, etc.) Better yet, agree on a team template for this spec.
Then enforcing comparisons needs to be built into your culture somewhere; the best solution would be for the culture to expect documented before-and-after optimization testing.
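As one possible shape for such a template (every field below is a placeholder, not a recommendation of specific values):
/* OPTIMIZATION TEST
   Context    : <SQL Server version, hardware>
   Data       : <copy of production from <date>, or generation script>
   Conditions : <cache settings, repetitions, warm vs. cold runs>
   Before     : <original form>  -> <elapsed/CPU from SET STATISTICS TIME>
   After      : <optimized form> -> <elapsed/CPU from SET STATISTICS TIME>
*/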