How do I convert a microsecond value into a CPU cycle count? Does anybody know how to do this conversion?
In multicore processors and modern architectures, it's not easy to come up with a definition of the CPU cycles 'spent' during a time interval consumed/elapsed in your program (I assume a SQL procedure/query, judging by your tags), the way you could on old pre-superscalar architectures.
I recommend finding other metrics, procedures, or special tools for measuring your performance. I'm not a user of SQL Server, but all professional-grade DBMS systems offer profiling tools for tuning DB/application performance.
Read this post and its user comments; it covers the Windows function QueryPerformanceCounter as an approach to some kind of 'cycle' accounting. The discussion sheds light on why, from my point of view, your cycle-based approach is not correct.
https://blogs.msdn.microsoft.com/oldnewthing/20080908-00/?p=20963/
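If you still need a ballpark number, the only honest conversion is to multiply the elapsed time by a nominal clock frequency; a minimal sketch (in Python, with a made-up 3 GHz figure) shows both the arithmetic and why the result is fuzzy at best:

```python
# Back-of-the-envelope only: converts an elapsed time to "nominal" cycles.
# The 3.0 GHz figure is an assumption; real CPUs change frequency dynamically
# (turbo, power saving), run multiple cores, and retire several instructions
# per cycle, so this is NOT the number of cycles your code actually consumed.
NOMINAL_CLOCK_HZ = 3.0e9  # hypothetical base clock of the machine

def microseconds_to_nominal_cycles(elapsed_us: float) -> float:
    """Elapsed wall-clock microseconds -> cycles at the nominal clock rate."""
    return elapsed_us * 1e-6 * NOMINAL_CLOCK_HZ

print(microseconds_to_nominal_cycles(250))  # 250 us ~= 750,000 nominal cycles
```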
Related
We are using a SQL database as single-node storage for roughly 1 hour of high-frequency metrics (several thousand inserts a second). We quickly ran into I/O issues that simple buffering would not handle, and we are willing to put time into solving the performance problem.
I suggested switching to a specialised database for handling time series, but my colleague remained pretty skeptical. His argument is that an "out of the box" gain is not guaranteed: he knows SQL well and has already spent time optimizing the storage, whereas we have no TSDB experience with which to optimize a new system properly.
My intuition is that a TSDB would be much more efficient even with an out-of-the-box configuration, but I don't have any data to measure this, and internet benchmarks such as InfluxDB's are nowhere near trustworthy. We should run our own, except that we can't afford to lose time on a dead end or a mediocre improvement.
Very roughly, in my use case, what would the performance gap be between relational storage and a TSDB when it comes to single-node throughput?
This question may be bordering on a software recommendation. I just want to point one important thing out: You have an existing code base so switching to another data store is expensive in terms of development costs and time. If you have someone experienced with the current technology, you are probably better off with a good-faith effort to make that technology work.
Whether you switch or not depends on the actual requirements of your application. For instance, if you don't need the data immediately, perhaps writing batches to a file is the most efficient mechanism.
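To illustrate the batching idea (a rough sketch only; the JSON-lines format and flush threshold are arbitrary choices, not a recommendation for your schema):

```python
import json

class BatchedMetricWriter:
    """Buffers metric rows in memory and appends them to a file in batches,
    trading a little durability for far fewer I/O operations."""

    def __init__(self, path: str, batch_size: int = 5000):
        self.path = path          # hypothetical output file
        self.batch_size = batch_size
        self.buffer = []

    def write(self, metric: dict) -> None:
        self.buffer.append(metric)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        with open(self.path, "a") as f:
            f.write("\n".join(json.dumps(m) for m in self.buffer) + "\n")
        self.buffer.clear()

writer = BatchedMetricWriter("metrics.jsonl")
writer.write({"ts": 1700000000.0, "sensor": "a", "value": 1.23})
writer.flush()
```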
Your infrastructure has ample opportunity for in-place growth -- more memory, more processors, solid-state disk (for example). These might meet your performance needs with a minimal amount of effort.
If you cannot make the current solution work (and 10k inserts per second should be quite feasible), then there are numerous alternatives. Some NoSQL databases relax some of the strict ACID requirements of traditional RDBMSs, providing faster throughput.
We have a process that was originally written for Sybase that looks at the @@cpu_busy variables to detect the CPU load on the RDBMS. The application will be running against Microsoft SQL Server.
curCPULoadPercent = 100 * (@@cpu_busy - prevCpuBusy) / ((@@cpu_busy - prevCpuBusy) + (@@io_busy - prevIOBusy) + (@@idle - prevIdle))
If any of the above deltas are negative, the ratio is ignored and the previous one is used.
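Expressed in Python (the snapshot numbers below are made up; in the real process they would come from periodically polling @@cpu_busy, @@io_busy, and @@idle), the calculation and the negative-delta guard look like this:

```python
def cpu_load_percent(cur, prev, fallback):
    """cur/prev are (cpu_busy, io_busy, idle) tick snapshots; fallback is the
    last good percentage, returned when any delta goes negative (rollover/cap)."""
    d_cpu = cur[0] - prev[0]
    d_io = cur[1] - prev[1]
    d_idle = cur[2] - prev[2]
    if d_cpu < 0 or d_io < 0 or d_idle < 0 or (d_cpu + d_io + d_idle) == 0:
        return fallback
    return 100.0 * d_cpu / (d_cpu + d_io + d_idle)

# hypothetical snapshots taken a few seconds apart
prev_snapshot = (120_000, 45_000, 900_000)
cur_snapshot = (121_500, 45_200, 903_300)
print(cpu_load_percent(cur_snapshot, prev_snapshot, fallback=0.0))  # ~30%
```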
Unfortunately, instead of letting the counter roll over, the Microsoft engineers stop it from increasing past 134217727, so it fails to reveal CPU load after about 29 days (from what I have been reading).
(The Sybase engineers knew that a negative number subtracted from a negative number gives a positive difference.)
The following provides a hint on how to monitor CPU load, but we are unable to determine which wait types to look at.
https://msdn.microsoft.com/en-us/library/ms179984.aspx
We have been comparing the @@cpu_busy deltas against the wait types, and no particular wait type matches.
Our plan is to build a predictive model to predict the old ratio from correlated wait types but I am wondering if someone has already figured this out.
Any suggestions?
Starting with SQL Server 2005, the runtime detail you can get from SQL Server for things like wait types, locks, load, and memory pressure is extremely detailed and varied.
Do not use the @@ variables for what you are doing. If you can define exactly what you are trying to detect, I can all but guarantee that the newer SQL dynamic management views have exactly what you are looking for; in particular, the values can be scoped to a single SPID or to the server as a whole.
The SQL Profiler tools can probably graph the values you are looking for.
Benjamin Nevarez has the answer for CPU utilization:
http://sqlblog.com/blogs/ben_nevarez/archive/2009/07/26/getting-cpu-utilization-data-from-sql-server.aspx
It uses dynamic management view data from sys.dm_os_ring_buffers where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'.
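As a rough sketch of pulling that data from a monitoring script (the T-SQL is the commonly published query for this ring buffer; the pyodbc connection string is an assumption, not part of the linked article):

```python
import pyodbc  # assumption: pyodbc with a working SQL Server ODBC driver

# Each scheduler-monitor record carries the SQL Server process CPU % and the
# system idle % for a roughly one-minute sample.
QUERY = """
SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,
       record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
       record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle_pct
FROM (SELECT CONVERT(xml, record) AS record
      FROM sys.dm_os_ring_buffers
      WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
        AND record LIKE '%<SystemHealth>%') AS rb
ORDER BY record_id DESC;
"""

conn = pyodbc.connect("DSN=monitoring;Trusted_Connection=yes")  # hypothetical DSN
for record_id, sql_cpu, system_idle in conn.cursor().execute(QUERY).fetchall():
    other = 100 - sql_cpu - system_idle
    print(f"record {record_id}: sql={sql_cpu}% other={other}% idle={system_idle}%")
```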
I'm writing some code for a piece of network middleware. Right now, our code is running too slowly.
We've already done one round of rewrites and optimizations, but we seem to be running into hard limitations on what can be done in software.
The slowness of the code stems from one subroutine - an esoteric sampling algorithm from computational statistics. Because the math involved is somewhat similar to the stuff done in DSP, I'm wondering if we can use an FPGA to accelerate the computation.
My question is basically in the title - how do I tell whether an FPGA (or even ASIC) will give a useful speedup in my use case?
EDIT: A 'useful' speedup is one that is significant enough to justify the cost and dev time required to build the FPGA.
The short answer is to have an experienced FPGA engineer look at the algorithm and tell you what it would take in development time and bill of material cost for a solution.
Without knowing the details of your algorithm it is harder to guess. How parallelizable is the problem? Can it be heavily pipelined? How many multiply/accumulate/DSP operations are required? Can you approximate some calculations with a big look-up table or other FPGA DSP techniques (such as CORDIC)? FPGAs can do many, many more of these operations in parallel every clock cycle, hundreds or even thousands, depending on how much you are willing to spend on the FPGA. Without knowing the details, and without an experienced FPGA/DSP engineer looking at the problem, it is going to be hard to get a real feel, though.
Some other options are:
Look into DSP chips (TI DSP chips are one example).
Does your processor have SIMD instructions available that are not being used?
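One cheap first step before engaging an FPGA engineer is an Amdahl-style back-of-the-envelope estimate. The numbers below (fraction of runtime in the sampler, expected kernel speedup, transfer overhead) are placeholders to replace with your own measurements:

```python
def offload_speedup(time_fraction, kernel_speedup, overhead_fraction=0.0):
    """Overall speedup if a routine taking `time_fraction` of runtime is made
    `kernel_speedup`x faster, plus `overhead_fraction` of runtime added for
    moving data to/from the accelerator (all fractions relative to current runtime)."""
    new_runtime = (1.0 - time_fraction) + time_fraction / kernel_speedup + overhead_fraction
    return 1.0 / new_runtime

# Placeholder numbers: sampler is 60% of runtime, FPGA makes it 20x faster,
# bus transfers cost an extra 5% of the original runtime.
print(offload_speedup(0.60, 20.0, 0.05))  # ~2.1x end to end
```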
I'm working on a mathematical model written in VB6. The amount of CPU time this model is consuming is becoming a concern to some of our customers and the notion has been floated that porting it to VB.NET will improve its performance.
The model is performing a lot of single-precision arithmetic (a finite-difference scheme over a large grid) with small bursts of database access every five seconds or so (not enough to be important). Only basic arithmetic functions with occasional use of the ^ 4 operator are involved.
Does anyone think porting to VB.NET is likely to improve matters (or not)? Does anyone know any reliable articles or papers I can check over to help with this decision?
My opinion is that VB.NET by itself won't improve performance by much. The improvement comes from your ability to write an optimized algorithm.
Probably the best performance boost you can get is eliminating the DB access (even if it doesn't look important, I/O is usually the bottleneck, not the language itself). If possible, load the data up front and save it at the end instead of accessing the database every 5 seconds.
Also, as others pointed out, change the algorithm if possible, since porting the code to .NET will probably only get you small performance benefits.
But if you move to .NET 4.0, you may be able to use the Parallel Extensions (http://msdn.microsoft.com/en-us/library/dd460693.aspx) and get a real boost by using multiple cores, though that also means changing the algorithm.
Hope it helps. ;-)
I think that improvements in memory management will also improve performance in VB.NET.
To give you a definitive answer we would need to see your code...
But VB.NET should certainly be more performant in theory:
possibility to compile for 64-bit machines (without too much effort)
VB6 was interpreted, while VB.NET is (just-in-time) compiled
you can use threads (depending on your algorithm) and other "tricks", so you can use more CPU cores to do calculations in parallel
Best thing to try: port the most CPU-consuming part of your application to VB.NET and compare.
The same algorithm will perform faster in VB6, because it is compiled to native code.
If the program does extensive memory allocation, though, it may perform faster in .NET when running in a 64-bit environment.
I've been meaning to do a little bit of brushing up on my knowledge of statistics. One area where it seems like statistics would be helpful is in profiling code. I say this because it seems like profiling almost always involves me trying to pull some information from a large amount of data.
Are there any subjects in statistics that I could brush up on to get a better understanding of profiler output? Bonus points if you can point me to a book or other resource that will help me understand these subjects better.
I'm not sure books on statistics are that useful when it comes to profiling. Running a profiler should give you a list of functions and the percentage of time spent in each. You then look at the one that took the most percentage wise and see if you can optimise it in any way. Repeat until your code is fast enough. Not much scope for standard deviation or chi squared there, I feel.
All I know about profiling is what I just read in Wikipedia :-) but I do know a fair bit about statistics. The profiling article mentioned sampling and statistical analysis of sampled data. Clearly statistical analysis will be able to use those samples to develop some statistical statements on performance. Let's say you have some measure of performance, m, and you sample that measure 1000 times. Let's also say you know something about the underlying processes that created that value of m. For instance, if m is the SUM of a bunch of random variates, the distribution of m is probably normal. If m is the PRODUCT of a bunch of random variates, the distribution is probably lognormal. And so on...
If you don't know the underlying distribution and you want to make some statement about comparing performance, you may need what are called non-parametric statistics.
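For instance, a quick non-parametric comparison of two sets of timing samples might look like the sketch below (SciPy is just one convenient tool; the sample values are made up):

```python
from scipy.stats import mannwhitneyu  # non-parametric: no normality assumption

# Made-up timing samples (milliseconds) from two builds of the same routine.
before = [12.1, 11.8, 12.4, 13.0, 12.2, 11.9, 14.8, 12.0]
after = [10.9, 11.2, 10.8, 11.0, 11.4, 10.7, 11.1, 13.9]

# Tests whether one build tends to produce smaller timings than the other,
# without assuming the timings follow any particular distribution.
stat, p_value = mannwhitneyu(after, before, alternative="less")
print(f"U = {stat}, p = {p_value:.4f}")
```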
Overall, I'd suggest any standard text on statistical inference (DeGroot), a text that covers different probability distributions and where they're applicable (Hastings & Peacock), and a book on non-parametric statistics (Conover). Hope this helps.
Statistics is fun and interesting, but for performance tuning, you don't need it. Here's an explanation why, but a simple analogy might give the idea.
A performance problem is like an object (which may actually be multiple connected objects) buried under an acre of snow, and you are trying to find it by probing randomly with a stick. If your stick hits it a couple of times, you've found it - its exact size is not so important. (If you really want a better estimate of how big it is, take more probes, but that won't change its size.) The number of times you have to probe the snow before you find it depends on how much of the snow's area it lies under.
Once you find it, you can pull it out. Now there is less snow, but there might be more objects under the snow that remains. So with more probing, you can find and remove those as well. In this way, you can keep going until you can't find anything more that you can remove.
In software, the snow is time, and probing is taking random-time samples of the call stack. In this way, it is possible to find and remove multiple problems, resulting in large speedup factors.
And statistics has nothing to do with it.
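To make the "probing" concrete, here is a toy sketch of sampling a running thread's call stack at random-ish intervals (the workload function is invented; a real tool would record whole stacks, not just the innermost frame):

```python
import collections, random, sys, threading, time, traceback

samples = collections.Counter()

def sampler(target_thread_id, interval=0.01, duration=2.0):
    """Periodically grab the target thread's stack and count the innermost frames."""
    end = time.time() + duration
    while time.time() < end:
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            top = traceback.extract_stack(frame)[-1]
            samples[f"{top.name} ({top.filename}:{top.lineno})"] += 1
        time.sleep(interval * random.uniform(0.5, 1.5))  # jitter the probe times

def workload():  # invented stand-in for the program being profiled
    t_end = time.time() + 2.0
    while time.time() < t_end:
        sum(i * i for i in range(10_000))

main_id = threading.get_ident()
threading.Thread(target=sampler, args=(main_id,), daemon=True).start()
workload()
for where, hits in samples.most_common(5):
    print(hits, where)
```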
Zed Shaw, as usual, has some thoughts on the subject of statistics and programming, but he puts them much more eloquently than I could.
I think that the most important statistical concept to understand in this context is Amdahl's law. Although commonly referred to in contexts of parallelization, Amdahl's law has a more general interpretation. Here's an excerpt from the Wikipedia page:
More technically, the law is concerned with the speedup achievable from an improvement to a computation that affects a proportion P of that computation, where the improvement has a speedup of S. (For example, if an improvement can speed up 30% of the computation, P will be 0.3; if the improvement makes the portion affected twice as fast, S will be 2.) Amdahl's law states that the overall speedup of applying the improvement will be 1 / ((1 - P) + P / S).
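Plugging the excerpt's own example numbers into that formula: with P = 0.3 and S = 2, the overall speedup is 1 / ((1 - 0.3) + 0.3 / 2) = 1 / 0.85 ≈ 1.18, i.e. speeding up 30% of the computation by 2x buys only about an 18% improvement overall. That is why a profiler's first job is to find the code where P is large.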
One concept related to both statistics and profiling (your original question) that is very useful, and that you see advised from time to time, comes up when doing "micro-profiling". A lot of programmers will rally and yell, "You can't micro-profile; micro-profiling simply doesn't work, too many things can influence your measurement."
Yet you can simply run your measurement n times and keep only the x% of observations closest to the median, because the median is a "robust statistic" (contrary to the mean) that is not influenced by outliers, and outliers are precisely the values you want to leave out when doing this kind of profiling.
This is definitely a very useful statistical technique for programmers who want to micro-profile their code.
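A minimal sketch of that idea using only the standard library (the workload and the "keep the middle 50%" choice are just examples):

```python
import statistics, time

def micro_profile(fn, runs=101, keep_fraction=0.5):
    """Times fn() `runs` times and averages only the observations closest to the
    median, discarding outliers caused by GC pauses, cache warm-up, the OS, etc."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    timings.sort()
    keep = max(1, int(runs * keep_fraction))
    lo = (runs - keep) // 2
    kept = timings[lo:lo + keep]          # the slice centred on the median
    return statistics.mean(kept)

print(micro_profile(lambda: sum(range(100_000))))  # example workload
```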
If you apply the MVC programming pattern with PHP, this is what you would need to profile:
Application:
    Controller Setup time
    Model Setup time
    View Setup time
Database:
    Query - Time
Cookies:
    Name - Value
Sessions:
    Name - Value
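The question is about PHP, but timing those phases is language-agnostic; here is a rough sketch of a named-phase timer (in Python, with invented phase names mirroring the list above):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def phase(name):
    """Record the wall-clock time spent inside a named phase of the request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Invented stand-ins for the phases listed above.
with phase("controller setup"):
    time.sleep(0.01)
with phase("model setup"):
    time.sleep(0.02)
with phase("db query"):
    time.sleep(0.03)

for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds * 1000:.1f} ms")
```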