OpenCL "Arguments mismatch for instruction 'mov'" error - error-handling

So one of my devices (an Nvidia GeForce GT 650M GPU) keeps giving me this weird ptxas application error, "Arguments mismatch for instruction 'mov'", when I try to build a cl_program on that device. It's the only one of my three devices that gives me this error; my CPU and my other GPU (Intel HD 4000) do not give me this error at all.
Here's an example of a function that causes this error to happen. It's a helper function I use inside one of my kernels:
//Calculate the dot product of two vectors
float Dot(Vector v1, Vector v2)
{
    return (v1.x*v2.x + v1.y*v2.y + v1.z*v2.z);
}
First I tried splitting up the work into something like this:
//Calculate the dot product of two vectors
float Dot(Vector v1, Vector v2)
{
    float a = v1.x*v2.x;
    float b = v1.y*v2.y;
    float c = v1.z*v2.z;
    float result = a + b + c;
    return result;
}
But that also gives me the same error. Interestingly enough, if I simply set result = 5.0f and return that, it magically compiles and runs:
//THIS WILL COMPILE AND RUN
float Dot(Vector v1, Vector v2)
{
    float a = v1.x*v2.x;
    float b = v1.y*v2.y;
    float c = v1.z*v2.z;
    float result = 5.0f; //IGNORE THE CALCULATION. JUST MAKE IT 5
    return result;
}
So I have no idea what's going on. My 'Dot' function isn't the only function affected; it's one of several. Is my Nvidia card defective?
EDIT: Here is the log I get from clGetProgramBuildInfo after the build fails:
ptxas application ptx input, line 703; error : Arguments mismatch for instruction 'mov'
ptxas application ptx input, line 703; error : Unknown symbol 'LIntersection_2E_n'
ptxas application ptx input, line 703; error : Label expected for forward reference of 'LIntersection_2E_n'
ptxas fatal : Ptx assembly aborted due to errors
Although more errors are printed than just the 'mov' one I described, they all go away when I make the above change to result = 5.0f;

According to the LLVM developers, this is a bug in the NVPTX back-end.
LLVMdev forum message discussing this error

Related

Unable to compile TensorFlow Lite examples on Adafruit Circuit Playground Bluefruit due to missing files in Adafruit_Tensorflow_Lite library

I am unable to compile the examples hello_world_arcada and micro_speech_arcada, shown below, from the Adafruit website (found here) on my Circuit Playground Bluefruit microcontroller.
I installed the Adafruit_Tensorflow_Lite library as mentioned on the site, but it turns out the examples cannot compile because they have numerous missing files. So I downloaded this TensorFlow GitHub repo and then transferred the missing files into the Adafruit_Tensorflow_Lite library.
I am now facing errors for these missing files: am_bsp.h, am_mcu_apollo.h, am_util.h. I cannot locate these files in the repo or on Google. [Note: I have found the am_bsp.h file in this repo,
but it still doesn't compile.]
Can anyone assist me in locating these files, or suggest a way to compile the example code mentioned on the Adafruit website?
The error about the missing file am_bsp.h when compiling with the Arduino IDE is shown in the pic below:
My code is shown below:
#include <TensorFlowLite.h>
#include "Adafruit_TFLite.h"
#include "Adafruit_Arcada.h"
#include "output_handler.h"
#include "sine_model_data.h"

// Create an area of memory to use for input, output, and intermediate arrays.
// Finding the minimum value for your model may require some trial and error.
const int kTensorAreaSize (2 * 1024);

// This constant represents the range of x values our model was trained on,
// which is from 0 to (2 * Pi). We approximate Pi to avoid requiring additional
// libraries.
const float kXrange = 2.f * 3.14159265359f;

// Will need tuning for your chipset
const int kInferencesPerCycle = 200;
int inference_count = 0;

Adafruit_Arcada arcada;
Adafruit_TFLite ada_tflite(kTensorAreaSize);

// The name of this function is important for Arduino compatibility.
void setup() {
  Serial.begin(115200);
  //while (!Serial) yield();
  arcada.arcadaBegin();
  // If we are using TinyUSB we will have the filesystem show up!
  arcada.filesysBeginMSD();
  arcada.filesysListFiles();
  // Set the display to be on!
  arcada.displayBegin();
  arcada.setBacklight(255);
  arcada.display->fillScreen(ARCADA_BLUE);
  if (! ada_tflite.begin()) {
    arcada.haltBox("Failed to initialize TFLite");
    while (1) yield();
  }
  if (arcada.exists("model.tflite")) {
    arcada.infoBox("Loading model.tflite from disk!");
    if (! ada_tflite.loadModel(arcada.open("model.tflite"))) {
      arcada.haltBox("Failed to load model file");
    }
  } else if (! ada_tflite.loadModel(g_sine_model_data)) {
    arcada.haltBox("Failed to load default model");
  }
  Serial.println("\nOK");
  // Keep track of how many inferences we have performed.
  inference_count = 0;
}

// The name of this function is important for Arduino compatibility.
void loop() {
  // Calculate an x value to feed into the model. We compare the current
  // inference_count to the number of inferences per cycle to determine
  // our position within the range of possible x values the model was
  // trained on, and use this to calculate a value.
  float position = static_cast<float>(inference_count) /
                   static_cast<float>(kInferencesPerCycle);
  float x_val = position * kXrange;
  // Place our calculated x value in the model's input tensor
  ada_tflite.input->data.f[0] = x_val;
  // Run inference, and report any error
  TfLiteStatus invoke_status = ada_tflite.interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    ada_tflite.error_reporter->Report("Invoke failed on x_val: %f\n",
                                      static_cast<double>(x_val));
    return;
  }
  // Read the predicted y value from the model's output tensor
  float y_val = ada_tflite.output->data.f[0];
  // Output the results. A custom HandleOutput function can be implemented
  // for each supported hardware target.
  HandleOutput(ada_tflite.error_reporter, x_val, y_val);
  // Increment the inference_counter, and reset it if we have reached
  // the total number per cycle
  inference_count += 1;
  if (inference_count >= kInferencesPerCycle) inference_count = 0;
}
Try installing the library from the link below; it should solve your problem:
https://github.com/tensorflow/tflite-micro-arduino-examples#how-to-install

RESHAPE failed to prepare when invoking tf.signal.stft in tflite

I am building a Flutter app that needs to record audio and predict a label using a tflite model I built. To link the audio recording and tflite, I use the Flutter plugin flutter_tflite_audio (https://github.com/Caldarie/flutter_tflite_audio).
The TensorFlow model works on Colab, but when I launch the app and inference happens, i.e. when it calls interpreter.invoke(), the following error occurs:
TensorFlow Lite Error: tensorflow/lite/kernels/reshape.cc:58 stretch_dim != -1 (0 != -1)
TensorFlow Lite Error: Node number 26 (RESHAPE) failed to prepare.
Failed to invoke the interpreter with error: Must call allocateTensors().
Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 290
* thread #2, queue = 'conversionQueue', stop reason = Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value
frame #0: 0x00000001a672ee08 libswiftCore.dylib`_swift_runtime_on_report
libswiftCore.dylib`_swift_runtime_on_report:
-> 0x1a672ee08 <+0>: ret
libswiftCore.dylib`_swift_reportToDebugger:
0x1a672ee0c <+0>: b 0x1a672ee08 ; _swift_runtime_on_report
libswiftCore.dylib`_swift_shouldReportFatalErrorsToDebugger:
0x1a672ee10 <+0>: adrp x8, 341475
0x1a672ee14 <+4>: ldrb w0, [x8, #0x7c8]
Target 0: (Runner) stopped.
Lost connection to device.
This error message appears even though I added allocateTensors in the SwiftTfliteAudioPlugin.swift file here:
var interval: TimeInterval!
var outputTensor: Tensor!
do {
    // Copy the `[Int16]` buffer data as an array of Floats to the audio buffer input Tensor.
    let audioBufferData = Data(copyingBufferOf: buffer.map { Float($0) / maxInt16AsFloat32 })
    try interpreter.copy(audioBufferData, toInputAt: 0)
    // I added this line
    try interpreter.allocateTensors()
    // Calculate inference time
    let startDate = Date()
    try interpreter.invoke() //required!!! Do not touch
    interval = Date().timeIntervalSince(startDate) * 1000
    // Get the output `Tensor` to process the inference results.
    outputTensor = try interpreter.output(at: 0)
    print(outputTensor as Any)
} catch let error {
    print("Failed to invoke the interpreter with error: \(error.localizedDescription)")
}
In the tflite model, here is the problematic node in Netron.
It looks like it is only squeezing the first dimension, so maybe it cannot because, as you can see in the summary of my model, the first dimension is None. I tried some tricks to avoid having this None, but I am not familiar enough with TensorFlow to be sure the operations I am doing are valid.
I have boiled my model down to a minimal size, and this node sits between these two lines of code, so I suspect the tf.signal.stft function is doing the reshaping, but I have no idea.
spectrograms = tf.signal.stft(waveforms,
                              frame_length=self.fft_size,
                              frame_step=self.hop_size,
                              pad_end=False)
magnitude_spectrograms = tf.abs(spectrograms)
Can anyone help on this issue?
Thanks!
As stated in the error message, you need to call allocateTensors() first, i.e. before copying the input data and invoking the interpreter, not between the copy and invoke().

Compile error when trying to access StructuredBuffer

I want to access a StructuredBuffer<int> in a compute shader, but I get the error:
Shader error in 'Particle.compute': array, matrix, vector, or indexable object type expected in index expression at Particle.compute(28) (on d3d11)
The code:
#pragma kernel CSMain
#include "Assets/Uplus/ZCommon/Resources/ImageProcessing/UplusDirectCompute.cginc"

struct Particle
{
    float3 Position;
    float Mass;
};

Texture2D<float2> _terTx;
ConsumeStructuredBuffer<Particle> currentBuffer;
AppendStructuredBuffer<Particle> nextBuffer;
StructuredBuffer<int> particleCount;
float3 _terPos;
float _terSize, _terPhysicalScale, _resolution;
SamplerState _LinearClamp;
SamplerState _LinearRepeat;

#define _gpSize 512
[numthreads(_gpSize, 1, 1)]
void CSMain(uint3 dispatchID : SV_DispatchThreadID)
{
    int flatID = dispatchID.x;
    int particleCount = particleCount[0];
    if (flatID >= particleCount) return;

    Particle particle = currentBuffer.Consume();
    //Commented the rest of code
    nextBuffer.Append(particle);
}
The error points to the line int particleCount = particleCount[0];. Why is that?
The whole idea behind the shader: we have two buffers. We fill one with some data (we call each element a Particle) from the CPU; in the shader we consume the data from that buffer, process it, and append it to the other buffer. Then we swap buffers and do another iteration. The particleCount buffer holds the current count of Particles, and the if clause prevents consuming more Particles than are available.
This is an old question, so I assume you solved it, but here is the answer anyway:
You are declaring particleCount as an int when that name already refers to a buffer, so the local variable shadows the buffer and the indexing is then applied to an int.
Either rename the local variable, int currentParticleCount = particleCount[0];, or just don't use a temporary variable:
if (flatID >= particleCount[0]) return;

microchip MPLAB X IDE v2.15 "can't generate code for this expression"

I'm trying to compile a simple piece of code, but I run into the error "can't generate code for this expression".
I adapted the code from http://www.barrysoft.it/blog/midi-with-pic-ausart.html.
Could someone enlighten me about this problem?
MPLAB X IDE v2.15
xc8 v1.32
midi.c:
void midi_init(void)
{
    /* MIDI uses 31250 baud/s serial speed */
    uart_init(19, 1, 0, 0); //<---
}
midi.c:31: error: (712) can't generate code for this expression
uart.c:
void uart_init(unsigned char spbrg, unsigned bit brgh, unsigned bit sync, unsigned bit parity)
{
    // Setup the baud rate
    SPBRG = spbrg;
    // High speed baud rate
    BRGH = brgh;  ////
    // Synch or Async
    SYNC = sync;  ////
    // 8bit transmission
    TX9 = parity; ////
    // Enable serial output
    SPEN = 1;
    // Enable UART out
    TXEN = 1;
}
uart.c:29: error: (712) can't generate code for this expression
uart.c:32: error: (712) can't generate code for this expression
uart.c:35: error: (712) can't generate code for this expression
uart.h:
void uart_init(unsigned char spbrg, unsigned bit brgh,unsigned bit sync,unsigned bit parity);
"Unable to resolve identifier bit" seems to be an MPLAB IDE warning, which can be turned off.
The error itself is probably an issue with how the compiler handles data widths smaller than the processor's native width.
One simple fix is to use a macro instead of a function. This works because you let the compiler handle the type conversions and literal data as it sees fit, instead of forcing it to commit bit variables to memory locations for the function call.
In uart.h:
#define uart_init( spbrg, brgh, sync, parity ) \
    SPBRG = spbrg;  \
    BRGH  = brgh;   \
    SYNC  = sync;   \
    TX9   = parity; \
    SPEN  = 1;      \
    TXEN  = 1
*Note that I intentionally left out the last line's ';' so that the macro can be called like a function.
In midi.c: No change...
uart_init(19, 1, 0, 0 );

Is g++ 4.5.3 broken when it comes to pointers to lambda functions?

I was trying out lambda functions and making a jump table to execute them, but I found g++ didn't recognize the type of a lambda function in a way that let me assign them to an array or tuple container.
One such attempt was this:
auto x = [](){};
decltype(x) fn = [](){};
decltype(x) jmpTable[] = { [](){}, [](){} };
On compilation I get these errors:
tst.cpp:53:27: error: conversion from ‘main()::<lambda()>’ to non-scalar type ‘main()::<lambda()>’ requested
tst.cpp:54:39: error: conversion from ‘main()::<lambda()>’ to non-scalar type ‘main()::<lambda()>’ requested
Hmmmm, can't convert from type A to non-scalar type A? What's that mean? o.O
I can use std::function to get this to work, but a problem with that is it doesn't seem to work with tuple:
function<void()> jmpTable[] = [](){}; // works
struct { int i; function<void()> fn; } myTuple = {1, [](){}}; // works
tuple<int, function<void()>> stdTuple1 = {1, [](){}}; // fails
tuple<int, function<void()>> stdTuple2 = make_tuple(1, [](){}); // works
tst.cpp:43:58: error: converting to ‘std::tuple<int, std::function<void()> >’ from initializer list would use explicit constructor ‘std::tuple<_T1, _T2>::tuple(_U1&&, _U2&&) [with _U1 = int, _U2 = main()::<lambda()>, _T1 = int, _T2 = std::function<void()>]’
Constructor marked explicit? Why?
So my question is if I am doing something invalid or is this version just not quite up to the task?
Hmmmm, can't convert from type A to non-scalar type A? What's that mean? o.O
No, that's not a conversion to the same type. Despite having identical bodies, the different lambdas have different types. Newer versions of GCC make this clearer, and give the error message:
error: conversion from '__lambda1' to non-scalar type '__lambda0' requested
clang does even better:
error: no viable conversion from '<lambda at test.cc:2:18>' to 'decltype(x)' (aka '<lambda at test.cc:1:10>')
I can use std::function to get this to work, but a problem with that is it doesn't seem to work with tuple:
It does (with 4.5.4, at least; I don't have 4.5.3 to test), but your initialisation isn't quite right.
tuple<int, function<void()>> stdTuple1 {1, [](){}}; // lose the = to initialise stdTuple1 directly
I'm not sure about the state of n3043 in 4.5.3, but you should be able to use the function-pointer conversion. If I'm not misunderstanding your intended usage, this may work for you:
void (*x)();
decltype(x) fn = [](){};
decltype(x) jmpTable[] = { [](){}, [](){} };