We're going to be using GNU Radio to stream in data from a radio peripheral. In addition, we have another peripheral that is part of the system and which we control programmatically; I have a basic C program that does that control.
I'd like to implement this control in GNU Radio, but I don't know the best way to do it. I've seen that you can make your own blocks, so I was thinking I could make a sink block, feed a constant source into it, and have the constant's value defined by some control like a WX slider.
It would remove a needless part if I could drop the constant block and have the variable assigned to the WX slider be assigned directly to the control block, but then there would be no input. Can you make a block with no inputs or outputs that just runs some program or subroutine?
Also, when doing a basic test to see whether this was feasible, I used a slider feeding a constant source feeding a WX scope plot. There seems to be a lag between changing an option and seeing the result show up on the plot. Is there a more efficient way to do this that will reduce that lag, or is the lag just because my computer is slow?
It would remove a needless part if I could drop the constant block and have the variable assigned to the WX slider be assigned directly to the control block, but then there would be no input. Can you make a block with no inputs or outputs that just runs some program or subroutine?
Yes, that will work. In fact, you can write any sort of Python code in a GRC XML block definition, and if you set up the properties and the setter code properly, it will do what you want. It doesn't have to actually create any GNU Radio blocks per se.
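As a rough sketch of that idea (the class name DeviceControl and the path /usr/local/bin/devctl below are made up; your existing C program would go where the subprocess call is): GRC only needs a Python object with a constructor and a setter, so the slider's variable can be wired straight into it.

import subprocess

class DeviceControl(object):
    """Plain Python 'block' with no ports; GRC instantiates it and calls
    set_level() whenever the WX slider's variable changes."""

    def __init__(self, level=0):
        self.set_level(level)

    def set_level(self, level):
        self.level = level
        # Hand the new value to the existing C control program.
        subprocess.call(['/usr/local/bin/devctl', str(level)])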
Also, when doing a basic test to see whether this was feasible, I used a slider feeding a constant source feeding a WX scope plot. There seems to be a lag between changing an option and seeing the result show up on the plot.
GNU Radio is not optimized for minimum latency, but for efficient bulk processing. You're seeing the buffering between the source and the sink. Whenever you have a source that computes values rather than being tied to some hardware clock, the buffers downstream of it will stay nearly full and you'll get this lag.
In the advanced options there are settings to tune the buffer size, but they will only help so much.
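For what it's worth, those advanced options map (as far as I know) to set_max_output_buffer() on the blocks. A minimal sketch of shrinking the buffers in Python (the block choices here are just placeholders, and GNU Radio may round the requested size to a granularity it prefers):

from gnuradio import analog, blocks, gr

class latency_demo(gr.top_block):
    def __init__(self, samp_rate=32000):
        gr.top_block.__init__(self)
        src = analog.sig_source_f(samp_rate, analog.GR_CONST_WAVE, 0, 1.0)
        thr = blocks.throttle(gr.sizeof_float, samp_rate)
        snk = blocks.null_sink(gr.sizeof_float)
        # Smaller output buffers mean fewer samples queued between the
        # source and the sink, so slider changes show up sooner.
        # Must be called before the flow graph is started.
        src.set_max_output_buffer(1024)
        thr.set_max_output_buffer(1024)
        self.connect(src, thr, snk)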
I would guess that you need a throttle in your flow graph, or that the sampling rate between blocks is incorrect.
It's almost impossible to help you unless you post your .grc file or an image of it.
In GNU Radio I am trying to use the frequency of one signal to generate another signal of a different frequency. Here is the flow graph that I am using:
I generate a 50 kHz signal with a Signal Source block and feed it into a Log Power FFT block. I use the Argmax block to find the FFT bin with the most power and multiply that by a constant. I want to use this result as the input to the Complex VCO block to generate another signal with a different frequency. All vectors have a length of 4096.
However, looking at the output of the complex QT GUI Time Sink block, the output of the VCO is always zero. This is strange to me because when I use a float QT GUI Time Sink to look at the output of the multiply block (which also goes to the input of the VCO block), the result is 50,000 as expected. Why am I only getting zero out of the VCO?
Also, my sample rate is set to 1M. I am assuming because of the vector length of 4096 that the sample rate out of the Argmax block will be 1M/4096 = 244. Is this correct?
I am running GNU Radio Companion on Windows 10.
The proposed solution is not a solution. Please don't abuse the signal probe, which is really just that: a probe for slow, debugging, or purely visual purposes. Every time I use it myself, I see how architecturally bad it is, and I personally think the project should remove it from the block library altogether.
Now, instead of just saying "the probe is bad, do something else", let's analyse where your flow graph falls short:
Your frequency estimation depends on the argmax of a block that was meant for pure visualization. No, the output rate is not (sampling rate / FFT length); the output rate is roughly the frame rate (and not even exactly that – the block is terrible and mixes "sample time" with "wall clock time"). Don't do that. If you need something like that, use the FFT block followed by Complex to Magnitude Squared. You don't even want the logarithm – you're just looking for a maximum.
Instead of looking for the maximum absolute value in an FFT, which is an inherently quantizing frequency estimator, use something that actually gives you an oscillation. There are multiple ways you can do that with a PLL!
Your VCO probably does exactly what it's programmed to do. You're just using an inadequate sensitivity!
The sampling rate you assume in your time sinks is totally off, which is probably why you have the impression of a constant output – it just changes so slowly that you won't notice.
So instead, I propose one of the following:
Use the PLL Freq Det block. Feed its output into the VCO. Don't scale with a constant; simply apply the proper sensitivity. Sensitivity is the factor between the input amplitude and the phase advance per sample, in radians, on the output. (A minimal code sketch of this option follows the list.)
Use the PLL Carrier recovery. Use a resampler, or some other mathematical method, to generate the new frequency. You haven't told us how that other frequency relates to the input frequency, so I can't give you concrete advice.
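Here is a minimal sketch of the first option, assuming the GNU Radio 3.7+ Python API and blocks.vco_f's convention that the per-sample phase increment is input × sensitivity / samp_rate. pll_freqdet_cf reports the instantaneous frequency in radians per sample, so a VCO sensitivity equal to the sample rate re-creates the detected frequency; scaling the sensitivity (not the signal) is how you would synthesize a related but different frequency. The loop bandwidth and frequency limits below are purely illustrative.

import math

from gnuradio import analog, blocks, gr

class pll_to_vco(gr.top_block):
    def __init__(self, samp_rate=1e6):
        gr.top_block.__init__(self)
        # Stand-in for the received signal: a 50 kHz complex tone.
        src = analog.sig_source_c(samp_rate, analog.GR_COS_WAVE, 50e3, 1.0)
        # Loop bandwidth and frequency limits are in radians per sample.
        pll = analog.pll_freqdet_cf(2 * math.pi / 100,
                                    2 * math.pi * 100e3 / samp_rate,
                                    -2 * math.pi * 100e3 / samp_rate)
        # sensitivity == samp_rate reproduces the detected frequency;
        # e.g. 2 * samp_rate would generate twice the frequency.
        vco = blocks.vco_f(samp_rate, samp_rate, 1.0)
        snk = blocks.null_sink(gr.sizeof_float)
        self.connect(src, pll, vco, snk)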
Also notice that this very much suggests a case of "I'm trying to recreate an analog approach digitally"; that might be a good approach, but in many cases it's not.
If I may be so brazen: describe why you need to generate that other frequency, and for which purpose, in a post on https://dsp.stackexchange.com or on the GNU Radio mailing list discuss-gnuradio@gnu.org (sign up here). This is only barely a programming problem; it's really a signal processing problem. And there are a lot of people out there eager to help you find an appropriate solution that actually tackles your problem!
It looks like a better solution is to probe the output of the multiplier using a Probe Signal block along with a Function Probe block to create a new variable. That variable can then be used as the frequency value of a separate Signal Source block that generates the new signal. This flow graph seems to satisfy the original intended purpose: new flow diagram
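For reference, what the Function Probe block generates is essentially a polling thread. A rough Python equivalent (block choices, names, and the 10 Hz poll rate below are made up for illustration):

import threading
import time

from gnuradio import analog, blocks, gr

class probe_controlled_source(gr.top_block):
    def __init__(self, samp_rate=1e6):
        gr.top_block.__init__(self)
        # Stand-in for the multiplier output carrying the detected frequency.
        freq_value = analog.sig_source_f(samp_rate, analog.GR_CONST_WAVE, 0, 50e3)
        self.probe = blocks.probe_signal_f()
        self.new_src = analog.sig_source_c(samp_rate, analog.GR_COS_WAVE, 0, 1.0)
        self.connect(freq_value, self.probe)
        self.connect(self.new_src, blocks.null_sink(gr.sizeof_gr_complex))

    def start_polling(self, rate_hz=10.0):
        # Copy the probed value into the new Signal Source's frequency
        # a few times per second, like GRC's Function Probe does.
        def _poll():
            while True:
                self.new_src.set_frequency(self.probe.level())
                time.sleep(1.0 / rate_hz)
        poller = threading.Thread(target=_poll)
        poller.daemon = True
        poller.start()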
I have a BPSK modulator/demodulator working, and I have added a few blocks around it to effectively get a DSSS system working. However, when I try to add a second user (i.e. a second spreading code) I can only lock on to one of the signals. I assume this is because I am using the standard root-raised-cosine filter in the PFB Clock Sync block, which has no direct knowledge of the spreading code used.
My question is whether there is a way to incorporate the spreading code into the root-raised-cosine filter, or perhaps into the PFB Clock Sync block some other way, so that I can perform symbol timing recovery on the correct set of symbols.
The RRC I am using now is:
firdes.root_raised_cosine(nfilts,nfilts,1.0,0.35,11*sps*nfilts)
where nfilts = 32 and sps = 2.
I am sorry I am not directly answering your question, but first we need to understand where the RRC is applied. If you are using the Constellation Modulator (CM) block to generate the BPSK and then spreading, the RRC is being applied before the spreading; i.e., it's performed by the CM. If this is true, then I think it may be just luck that it worked for one spreading code.
On the other hand, if you apply an RRC post-spreading, then the PFB Clock Sync should not care. I suggest changing sps to 4 and then looking at the time domain signal post-spreading. Do you see RRC-shaped symbols?
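To make that check concrete, here is a sketch (variable names are illustrative) of generating the pulse shaping separately so it can be applied after the spreading with sps = 4, alongside the matching taps for the PFB Clock Sync:

from gnuradio import filter
from gnuradio.filter import firdes

sps = 4
nfilts = 32

# Transmit-side RRC applied post-spreading (single filter arm, interpolating by sps).
tx_rrc = filter.interp_fir_filter_ccf(
    sps, firdes.root_raised_cosine(sps, sps, 1.0, 0.35, 11 * sps))

# Receive-side taps for the PFB Clock Sync, same form as before but with sps = 4.
rrc_taps = firdes.root_raised_cosine(nfilts, nfilts, 1.0, 0.35, 11 * sps * nfilts)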
It isn't clear to me when it's a good idea to use VK_IMAGE_LAYOUT_GENERAL as opposed to transitioning to the optimal layout for whatever action I'm about to perform. Currently, my policy is to always transition to the optimal layout.
But VK_IMAGE_LAYOUT_GENERAL exists. Maybe I should be using it when I'm only going to use a given layout for a short period of time.
For example, right now, I'm writing code to generate mipmaps using vkCmdBlitImage. As I loop through the sub-resources performing the vkCmdBlitImage commands, should I transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL as I scale down into a mip, then transition to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL when I'll be the source for the next mip before finally transitioning to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when I'm all done? It seems like a lot of transitioning, and maybe generating the mips in VK_IMAGE_LAYOUT_GENERAL is better.
I appreciate the answer might be to measure, but it's hard to measure on all my target GPUs (especially because I haven't got anything running on Android yet) so if anyone has any decent rule of thumb to apply it would be much appreciated.
FWIW, I'm writing Vulkan code that will run on desktop GPUs and Android, but I'm mainly concerned about performance on the latter.
You would use it when:
You are lazy
You need to map the memory to host (unless you can use PREINITIALIZED)
When you use the image as multiple incompatible attachments and you have no choice
For storage images
Other cases where you would otherwise switch layouts too often (and don't even need barriers) relative to the work done on the images. Measurement would be needed to confirm that GENERAL is better in such a case; most likely it is a premature optimization even then.
PS: You could transition all the mip levels to TRANSFER_DST with a single command beforehand, and then transition only the one you need to SRC. With a decent HDD, it may even be best to store the textures with their mipmaps already generated, if that's an option (and you could perhaps even get better quality using a more sophisticated algorithm).
PS2: Too bad there's no mipmap-creation command. vkCmdBlitImage most likely does it under the hood anyway for images smaller than half the resolution...
If you read from the mipmap[n] image to create the mipmap[n+1] image, then you should use the transfer layouts if you want your code to run on all Vulkan implementations and get the most performance across them, as those layouts may be used by the GPU to optimize the image for reads or writes.
So if you want to go cross-vendor, use VK_IMAGE_LAYOUT_GENERAL only for setting up the descriptor that uses the final image, not for the image reads or writes.
If you don't want to use that many transitions you may copy from a buffer instead of an image, though you obviously wouldn't get the format conversion, scaling and filtering that vkCmdBlitImage does for you for free.
Also don't forget to check if the target format actually supports the BLIT_SRC or BLIT_DST bits. This is independent of whether you use the transfer or general layout for copies.
Situation
I'm using the PsychoPy Coder to create a random-dot motion task in a speed-accuracy trade-off situation. I want the fixation point to be a letter that informs the subject, on every trial, whether they are in the "speed" or the "precision" condition, so I first thought of simply drawing a TextStim (like "S" or "P"). But I have heard that TextStim is pretty slow to draw, and because of the dynamic nature of the RDK task I'm afraid that if the TextStim needs too much time it will affect the display of the dots.
Question
Am I right?
And if so, what would be the best way to draw the "fixation letters"?
Well, it seems that I found the answer in the TextStim reference manual, so I'll put it here in case someone needs it:
Performance OBS: in general, TextStim is slower than many other visual stimuli, i.e. it takes longer to change some attributes. In general, it’s the attributes that affect the shapes of the letters: text, height, font, bold etc. These make the next .draw() slower because that sets the text again. You can make the draw() quick by calling re-setting the text (myTextStim.text = myTextStim.text) when you’ve changed the parameters.
So the slowing seems to concern the changing of the attributes which is not my situation.
If you've just got one letter being drawn, it shouldn't have a major impact on your RDK, but do check whether frames are being dropped. All these things depend on graphics card and CPU speed, so you need to test individually for each machine/experiment.
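For instance, a quick dropped-frame check along the lines of the PsychoPy documentation (the window settings and the 60 Hz refresh assumption below are placeholders for your own setup):

from psychopy import visual, core, logging

win = visual.Window([1024, 768], fullscr=False)
win.recordFrameIntervals = True
# Flag a "dropped frame" if an interval exceeds the nominal 60 Hz period by 4 ms.
win.refreshThreshold = 1.0 / 60 + 0.004
logging.console.setLevel(logging.WARNING)

fixation = visual.TextStim(win, text='S', height=0.1)
for frame in range(120):          # ~2 s at 60 Hz
    fixation.draw()               # the RDK stimulus would be drawn here too
    win.flip()

print('Dropped frames: %i' % win.nDroppedFrames)
core.quit()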
I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.
As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:
The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).
What are some tricks for speeding up readPixels in this kind of use case?
Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
Should I be using a different method to get the information back out of the textures?
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.
Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
This will get you immensely better performance.
If your graphics are all like you show above, you shouldn't need to do any "layout" at all (which is good, because it'd be very awkward to implement): everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes pointing at the simulation-state texel it should depend on).
The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
I don't know enough about your use case, but just guessing: why do you need to call readPixels at all?
First, you don't need to draw the text or the static parts of your diagram in WebGL. Put another canvas, svg, or img over the WebGL canvas and set the CSS so they overlap. Let the browser composite them; then you don't have to do it.
Second, let's assume you have a texture with your computed results in it. Can't you then just make some geometry that matches the places in your diagram that need color, and use texture coords to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
If you want to draw numbers based on the result, you can use a technique like this: make a shader that reads the result texture, looks up a result value, and then indexes glyphs from another texture based on that.
Am I making any sense?