This is simplified pseudocode of how I'm trying to measure a GPU workload:
for (i = 0; i < N; ++i) vkCmdDrawIndexed(...);
vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, queryPool, 0);
vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, queryPool, 1);
submit();
vkDeviceWaitIdle(device);
vkGetQueryPoolResults(...);
Things to note:
N is 224 in my case
I have to wait for an idle device - without it I keep getting a validation error telling me the data is not ready, even though I have multiple query pools in flight
By putting the first timestamp there, I expect its query value to be written as soon as all previous commands have reached the preprocessing step. I was fairly sure that all 224 commands are preprocessed almost at the same time, but reality shows that this is not true.
By putting the second timestamp there, I expect its query value to be written after all previous commands have finished. In other words, the difference between the two query values should give me the time it takes the GPU to do all the work for a single frame.
I'm taking into account VkPhysicalDeviceLimits::timestampPeriod (1 on my machine) and VkQueueFamilyProperties::timestampValidBits (64 on my machine); a rough sketch of the readback and conversion follows below.
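Simplified, with cmd, device, queryPool and timestampPeriod being the obvious handles and values from the pseudocode above:

uint64_t ts[2] = {0};
/* two tightly packed 64-bit results; VK_QUERY_RESULT_WAIT_BIT would block
   until the values are available instead of relying on vkDeviceWaitIdle() */
vkGetQueryPoolResults(device, queryPool,
                      0, 2,              /* firstQuery, queryCount */
                      sizeof(ts), ts,
                      sizeof(uint64_t),  /* stride between results */
                      VK_QUERY_RESULT_64_BIT);
/* ticks -> nanoseconds -> milliseconds */
double ms = (double)(ts[1] - ts[0]) * timestampPeriod / 1.0e6;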
I created a big dataset that visibly takes about 2 seconds (~2000ms) to render a single frame. But the calculated time only ever has two distinct values - either 0.001024ms or 0.002048ms - so the frame-by-frame output can look like this:
0.001024ms
0.001024ms
0.002048ms
0.001024ms
0.002048ms
0.002048ms
...
I don't know about you, but I find these values VERY suspicious, and I have no explanation for them. Maybe by the time the last draw command reaches the command processor all the work is already done, but why exactly 1024 and 2048?
I tried to modify the code and move the first timestamp to the very beginning, i.e.:
vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, queryPool, 0);
for (i = 0; i < N; ++i) vkCmdDrawIndexed(...);
vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, queryPool, 1);
Now, when the preprocessor hits the first timestamp command, it writes the query value immediately, because there is no previous work to wait for (remember the idle device). This time I get different values, closer to the truth:
20.9336ms
20.9736ms
21.036ms
21.0196ms
20.9572ms
21.3586ms
...
which is better, but still far from the expected ~2000ms.
What's going on? What happens inside the device when I write these timestamps, and how do I get correct values?
While commands in Vulkan can be executed out of order (within certain restrictions), you should not broadly expect commands to be executed out of order. This is especially true of timer queries which, if they were executed out of order, would be unreliable in terms of their meaning.
Given that, your code is saying, "do a bunch of work. Then query the time it takes for the start of the pipe to be ready to execute new commands, then query the time it takes for the end of the pipe to be reached." Well, the start of the pipe might only be ready to execute new commands once most of the work is done.
Basically, what you think is happening is this:
time->
top     work | timer
stage1   work work work work work work
stage2     work work work work work work
bottom       work work work work work work | timer
But there's nothing that requires GPUs to execute this way. What is almost certainly actually happening is:
time->
top     work work work work work work | timer
stage1    work work work work work work
stage2      work work work work work work
bottom        work work work work work work | timer
So your two timers are only getting a fraction of the actual work.
What you want is this:
time->
top     timer | work work work work work work
stage1    work work work work work work
stage2      work work work work work work
bottom        work work work work work work | timer
This queries the time from start to finish for the entire set of work.
So put the first query before the work whose time you want to measure.
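In Vulkan terms, a minimal sketch of that ordering (cmd and queryPool are assumed to already exist; the draw parameters are placeholders):

vkCmdResetQueryPool(cmd, queryPool, 0, 2);
vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, queryPool, 0);    /* before the work */
for (uint32_t i = 0; i < N; ++i)
    vkCmdDrawIndexed(cmd, indexCount, 1, 0, 0, 0);
vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, queryPool, 1); /* after the work drains */

The readback and the timestampPeriod conversion are unchanged from the question; only the position of the first vkCmdWriteTimestamp moves.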
Related
I have some data in Prometheus that looks like this:
I have a job that runs every 2 minutes on a server and pushes values to Prometheus's Pushgateway, and that's how they reach Prometheus. Now I'm trying to query this data with the HTTP API and I'm noticing that it returns inconsistent results: it either returns the data I expect to see or it doesn't return anything at all.
My queries are range queries where, for example, start = now() - 1w and end = now(). The problem seems to show up when I use high values for the step/resolution. The only step that seems to work all of the time is 5m. When I try 10m it sometimes works, but usually it doesn't. I'm guessing this depends on the time I send the request (maybe using the current time breaks something).
Why is this happening?
Try setting the step to the size of the duration in the []. E.g. if your query looks like http_requests_total{job="prometheus"}[5m], use step = 5m.
If you want 10m, change both the duration and the step to 10m.
We are performance tuning some of our queries, both in terms of cost and speed, and the results we get are a little bit weird. First, we had one query that did an overwrite of an existing table; we stopped it after 4 hours. Running the same query into an entirely new table took only 5 minutes. I wonder if the 5-minute query used a cached result from the first run - is that possible to check? Is it possible to force BigQuery not to use the cache?
If you run the query in the UI - expand Options and make sure Use Cached Result is set the way you want.
Also, in the UI, you can check Job Details to see whether a cached result was used.
If you run your query programmatically, use the respective attributes - configuration.query.useQueryCache in the job configuration and statistics.query.cacheHit in the job statistics.
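For the REST API, the relevant part of a jobs.insert request body looks roughly like this (the query text is only a placeholder; the field names are the ones mentioned above):

{
  "configuration": {
    "query": {
      "query": "SELECT ...",
      "useQueryCache": false
    }
  }
}

Once the job has finished, the returned job resource reports whether the cache was hit under statistics.query.cacheHit.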
In a couple of weeks I will be joining a project that (currently) uses LabVIEW for development. To get myself at least somewhat familiar with it before then, I have been creating some simple projects in the trial version of the software. Someone challenged me to write a simple program that could perform division without using the division operator.
I've written the program successfully, but my main while loop seems to run one too many times. Here is my program:
The user inputs a dividend and a divisor, and the program continually subtracts the divisor from the dividend until the dividend becomes <= 0, at which point it should break. The program runs, but when the loop finally stops the dividend always equals x below 0 (where x is whatever number the divisor is). While debugging the application I found the problem: when the loop comparison happens for the final time, the dividend equals 0 and evaluates to 'false', yet the code inside the loop executes one last time before the loop breaks. This is what I would expect from a do-while loop, but not from a simple while.
Just to prove to myself that it wasn't a (hopefully obvious) logic error, I wrote what I consider to be an equivalent program in Python, and it does exactly what I expect.
I've spent a long time googling and staring at it; I even worked it through on paper, but I cannot figure out why it doesn't do what I expect. What gives?
LabVIEW executes its code according to the dataflow principle, which means that the loop cannot stop until it has finished executing all the code inside it. This is the NI document confirming the above (see the very first flowchart). Moreover, the subtraction and the comparison happen simultaneously.
The code you have is largely equivalent to this (except that the comparison with 0 happens on a temporary value on the wire):
dividend = YYY
divisor = XXX
dividend = dividend - divisor  # the first subtraction happens before any test
while dividend > 0:
    dividend = dividend - divisor
If you're really getting into LabVIEW, I strongly suggest you avoid using local variables. Much of the time (including this one) they do more harm than good. Do it like this instead:
This is a snippet, so if you drag this file from Explorer and drop it onto your block diagram it will appear as a piece of code (LV2014).
I believe that the evaluation of the condition and the subtraction happen in parallel, as opposed to one after the other; that's why you always get one more subtraction than you need.
Edit
As the dataflow tutorial explains (Figure 2), any operation can be expected to execute as soon as all of its inputs are available. You cannot know, and should not rely on, the order of execution of operations that are ready to be performed.
The Python code you've written is not equivalent. A while loop in LabVIEW is actually a do-while loop: the code it contains will always execute at least once, which also means that the condition isn't evaluated until after the code it contains has executed.
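If it helps to see that difference outside LabVIEW, here is a minimal C sketch of a post-test versus a pre-test loop (the values are only illustrative):

#include <stdio.h>

int main(void) {
    int divisor = 4;

    int a = 0;                            /* the condition is already false */
    do { a -= divisor; } while (a > 0);   /* post-test: body runs once anyway */
    printf("do-while result: %d\n", a);   /* prints -4 */

    int b = 0;
    while (b > 0) { b -= divisor; }       /* pre-test: body never runs */
    printf("while result:    %d\n", b);   /* prints 0 */

    return 0;
}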
Does Postgres include the time taken to render the output on screen in \timing or EXPLAIN ANALYZE? From what I understand it does not. Am I correct?
Actually, I am outputting a lot of rows on screen and I find that Postgres does not take much time to display them, whereas a simple C program I wrote to output the same results takes about 3000ms, while Postgres takes about 500ms to display the same data on screen.
"postgres" doesn't display anything at all. I think you mean the psql client.
If so: \timing displays time including the time to receive the data from the server. EXPLAIN ANALYZE doesn't, but adds the overhead of doing the detailed server-side timing. log_min_duration_statement just records statement timings server-side.
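If it helps to compare the two directly, something like this in psql (the table name is just a placeholder) shows the difference:

\timing on
-- EXPLAIN ANALYZE reports server-side execution time only (plus the timing instrumentation overhead)
EXPLAIN ANALYZE SELECT * FROM some_big_table;
-- with \timing on, the time shown for the plain query also includes receiving the rows from the server
SELECT * FROM some_big_table;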
I need some pointers on crafting an MS-DOS batch "function" to advance the system calendar N days from the current date. Is there a more machine-friendly command, or am I stuck with DATE and some Franken-hack parsing of its output?
On one of my favorite batch sites on the net, Rob van der Woude's Scripting Pages, you can find a script called DateAdd.bat in the examples section. It will calculate and print a new date based on a given date and offset.
It should be fairly easy to change the script to your needs or use it along with your own script. Get back to us if you need further help with that.
If you're truly using MS-DOS rather than the more capable cmd.exe, your options are very limited, since its variable manipulation was pretty bad.
I do remember needing something similar in a previous life. From memory, rather than trying to screw around with date calculations, we simply ran a loop (one iteration for each day) and, inside the loop, set the time to 23:59 and then waited for five seconds or so. Unfortunately, I think that pre-dated ping, so we couldn't even use the sleep trick - we had to run a lengthy goto loop to be certain.
That way, DOS itself figured out whether "tomorrow" was the 31st of September or 1st of October.
In the end it became too much trouble, so I would suggest you do what we finished up doing: grab yourself a copy of Turbo C from Borland's (or Inprise or Enchilada or whatever they're called nowadays - they'll always be Borland to me) museum site and write a quick little C program to do it for you.
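Such a program can be tiny. A sketch in portable C - it only computes and prints the new date; actually changing the DOS system date would still need a DOS-specific call, which I've left out:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int offset = (argc > 1) ? atoi(argv[1]) : 1;  /* days to advance */
    time_t now = time(NULL);
    struct tm t = *localtime(&now);

    t.tm_mday += offset;   /* may run past the end of the month... */
    mktime(&t);            /* ...but mktime() normalises the date for us */

    /* print as mm-dd-yyyy; adjust to whatever format your DATE expects */
    printf("%02d-%02d-%04d\n", t.tm_mon + 1, t.tm_mday, t.tm_year + 1900);
    return 0;
}

From there you can either have the batch file feed the printed date to DATE, or extend the program with a DOS-specific call to set the date itself; either way the month/year rollover logic stays out of the batch file.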