I want to run TensorFlow on an Android device. So far I have trained a model in Python and exported it to a Protobuf .pb file.
The .pb file was tested in Python and returns no errors:
......
graph = load_graph("./frozen_model.pb")
for op in graph.get_operations():
    print(op.name)

with tf.Session(graph=graph) as sess:
    tf_predik = graph.get_tensor_by_name("prefix/tf_pred:0")
    tf_data = graph.get_tensor_by_name("prefix/tf_data:0")
    img = np.invert(Image.open("7.png").convert('L')).ravel()
    image = array(img).reshape(1, 28, 28, 1)
    fd = {tf_data: image}
    test_pred = sess.run(tf_predik, feed_dict=fd)
    temp = np.argmax(test_pred, axis=1)
    print(temp)
My attempt in Xamarin.Android:
using Org.Tensorflow.Contrib.Android;
.....
var assets = Android.App.Application.Context.Assets;
var inferenceInterface = new TensorFlowInferenceInterface(assets, "frozen_model.pb");

using (Stream inputStream = this.Assets.Open("7.png"))
{
    byte[] bytes = inputStream.ReadAllBytes(); // convert to byte array???
    inferenceInterface.Feed("tf_data", bytes, bytes.Length);
    inferenceInterface.Run(new[] { "tf_pred:0" });
    inferenceInterface.Fetch("tf_pred:0", predictions);
    ....
}
I get an error:
Java.Lang.IllegalArgumentException: Expects arg[0] to be float but uint8 is provided
Thanks in advance.
Expects arg[0] to be float but uint8 is provided
TensorFlowInferenceInterface.Feed expects an array of floats, so you need to take that asset-based image, decode its file encoding (jpg|png|...) to a Bitmap, and obtain the float array from that.
Android Bitmap To Float Array
public float[] AndroidBitmapToFloatArray(Bitmap bitmap)
{
    // Assuming a square image to sample|process, adjust based upon your model requirements
    const int sizeX = 255;
    const int sizeY = 255;
    float[] floatArray;
    int[] intArray;
    using (var sampleImage = Bitmap.CreateScaledBitmap(bitmap, sizeX, sizeY, false).Copy(Bitmap.Config.Argb8888, false))
    {
        floatArray = new float[sizeX * sizeY * 3];
        intArray = new int[sizeX * sizeY];
        sampleImage.GetPixels(intArray, 0, sizeX, 0, 0, sizeX, sizeY);
        sampleImage.Recycle();
    }
    // Unpack each ARGB pixel into mean-subtracted B, G and R channel floats
    for (int i = 0; i < intArray.Length; ++i)
    {
        var intValue = intArray[i];
        floatArray[i * 3 + 0] = ((intValue & 0xFF) - 104);
        floatArray[i * 3 + 1] = (((intValue >> 8) & 0xFF) - 117);
        floatArray[i * 3 + 2] = (((intValue >> 16) & 0xFF) - 123);
    }
    return floatArray;
}
Example:
float[] feedArray;
using (var imageAsset = Assets.Open("someimage"))
using (var bitmapAsset = BitmapFactory.DecodeStream(imageAsset))
{
    feedArray = AndroidBitmapToFloatArray(bitmapAsset);
}
inferenceInterface.Feed("tf_data", feedArray, feedArray.Length);
I'm trying to apply the solution presented in this Question's answer to crop a bitmap using RenderScript. My approach is a little different, since I'm using a bitmap directly as the input to crop.
Having very limited knowledge of RenderScript, I developed the code below to crop my bitmap by modifying that answer.
crop.rs
#pragma version(1)
#pragma rs java_package_name(com.xxx.yyy)
#pragma rs_fp_relaxed

int32_t width;
int32_t height;
rs_allocation croppedImg;
uint xStart, yStart;

void __attribute__((kernel)) doCrop(uchar4 in, uint32_t x, uint32_t y) {
    rsSetElementAt_uchar4(croppedImg, in, x - xStart, y - yStart);
}
ImageCropper
import android.content.Context
import android.graphics.Bitmap
import androidx.renderscript.*
import com.xxx.yyy.ScriptC_crop
import kotlin.math.abs

class ImageCropper(context: Context?) {
    val rs = RenderScript.create(context)
    var dx = 0   // -width < dx < width
    var dy = 250 // -height < dy < height
    var xStart = 50
    var xEnd = 100
    var yStart = 50
    var yEnd = 100

    fun crop(sourceBitmap: Bitmap): Bitmap {
        val width = sourceBitmap.width
        val height = sourceBitmap.height
        if (dx < 0) {
            xStart = abs(dx)
            xEnd = width
        } else {
            xStart = 0
            xEnd = width - abs(dx)
        }
        if (dy < 0) {
            yStart = abs(dy)
            yEnd = height
        } else {
            yStart = 0
            yEnd = height - abs(dy)
        }
        val cropperScript = ScriptC_crop(rs)
        val inputType = Type.createXY(rs, Element.RGBA_8888(rs), width, height)
        val inputAllocation = Allocation.createTyped(rs, inputType, Allocation.USAGE_SCRIPT)
        inputAllocation.copyFrom(sourceBitmap)
        val outputType = Type.createXY(rs, Element.RGBA_8888(rs), xEnd - xStart, yEnd - yStart)
        val outputAllocation = Allocation.createTyped(rs, outputType, Allocation.USAGE_SCRIPT)
        cropperScript._croppedImg = outputAllocation
        cropperScript._width = width
        cropperScript._height = height
        cropperScript._xStart = xStart.toLong()
        cropperScript._yStart = yStart.toLong()
        val launchOptions: Script.LaunchOptions = Script.LaunchOptions()
        launchOptions.setX(xStart, xEnd)
        launchOptions.setY(yStart, yEnd)
        cropperScript.forEach_doCrop(inputAllocation, launchOptions)
        val resultBitmap = Bitmap.createBitmap(xEnd - xStart, yEnd - yStart, sourceBitmap.config)
        outputAllocation.copyTo(resultBitmap)
        rs.destroy()
        return resultBitmap
    }
}
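For context, I invoke the cropper roughly like this (a simplified call site; R.drawable.sample and imageView stand in for my real resources and views):

// Hypothetical call site for ImageCropper, from an Activity.
val source = BitmapFactory.decodeResource(resources, R.drawable.sample)
val cropped = ImageCropper(this).crop(source)
imageView.setImageBitmap(cropped)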
When I try to execute the code, I get the error below.
2021-04-11 20:55:41.639 18145-18202/com.chathuranga.shan.renderscriptexample E/RenderScript: Script::setVar unable to set allocation, invalid slot index
2021-04-11 20:55:41.645 18145-18202/com.chathuranga.shan.renderscriptexample A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x519 in tid 18202 (erscriptexample), pid 18145 (erscriptexample)
The error occurs on the line where I set the _xStart and _yStart values. What am I doing wrong? Is it some sort of data type issue?
I'm translating Obj-C to Swift. As you can see, I declared let buf = UnsafeMutablePointer<UInt8>(CVPixelBufferGetBaseAddress(cvimgRef)), so I'm getting the error in the for loop below it:
Binary operator '+=' cannot be applied to operands of type 'Int' and 'UInt8'
Also, as a little addendum, I don't know how to translate the remaining Obj-C code below the for loop. What does that slash mean, and how do I deal with the pointer? Do I have to say UnsafeMutableFloat somewhere?
// process the frame of video
func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, fromConnection connection: AVCaptureConnection) {
    // if we're paused don't do anything
    if currentState == CurrentState.statePaused {
        // reset our frame counter
        self.validFrameCounter = 0
        return
    }
    // this is the image buffer
    var cvimgRef: CVImageBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)
    // Lock the image buffer
    CVPixelBufferLockBaseAddress(cvimgRef, 0)
    // access the data
    var width: size_t = CVPixelBufferGetWidth(cvimgRef)
    var height: size_t = CVPixelBufferGetHeight(cvimgRef)
    // get the raw image bytes
    let buf = UnsafeMutablePointer<UInt8>(CVPixelBufferGetBaseAddress(cvimgRef))
    var bprow: size_t = CVPixelBufferGetBytesPerRow(cvimgRef)
    var r = 0
    var g = 0
    var b = 0
    for var y = 0; y < height; y++ {
        for var x = 0; x < width * 4; x += 4 {
            b += buf[x]; g += buf[x + 1]; r += buf[x + 2] // error
        }
        buf += bprow() // error
    }
Remaining Obj-C code.
r/=255*(float) (width*height);
g/=255*(float) (width*height);
b/=255*(float) (width*height);
You have a lot of type mismatch errors.
The type of x should not be UInt8, because x increases up to the value of width * 4.
for var x:UInt8 = 0; x < width * 4; x += 4 { // error: '<' cannot be applied to operands of type 'UInt8' and 'Int'
So fix it like below:
for var x = 0; x < width * 4; x += 4 {
To increment the pointer address, you can use the advancedBy() function.
buf += bprow(UnsafeMutablePointer(UInt8)) // error: '+=' cannot be applied to operands of type 'UnsafeMutablePointer<UInt8>' and 'size_t'
Like below:
var pixel = buf.advancedBy(y * bprow)
And this line,
RGBtoHSV(r, g, b) // error
Unfortunately, there are no implicit casts in Swift between CGFloat and Float, so you should cast explicitly to CGFloat.
RGBtoHSV(CGFloat(r), g: CGFloat(g), b: CGFloat(b))
The whole edited code is here:
func RGBtoHSV(r: CGFloat, g: CGFloat, b: CGFloat) -> (h: CGFloat, s: CGFloat, v: CGFloat) {
    var h: CGFloat = 0.0
    var s: CGFloat = 0.0
    var v: CGFloat = 0.0
    let col = UIColor(red: r, green: g, blue: b, alpha: 1.0)
    col.getHue(&h, saturation: &s, brightness: &v, alpha: nil)
    return (h, s, v)
}

// process the frame of video
func captureOutput(captureOutput: AVCaptureOutput, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, fromConnection connection: AVCaptureConnection) {
    // if we're paused don't do anything
    if currentState == CurrentState.statePaused {
        // reset our frame counter
        self.validFrameCounter = 0
        return
    }
    // this is the image buffer
    var cvimgRef = CMSampleBufferGetImageBuffer(sampleBuffer)
    // Lock the image buffer
    CVPixelBufferLockBaseAddress(cvimgRef, 0)
    // access the data
    var width = CVPixelBufferGetWidth(cvimgRef)
    var height = CVPixelBufferGetHeight(cvimgRef)
    // get the raw image bytes
    let buf = UnsafeMutablePointer<UInt8>(CVPixelBufferGetBaseAddress(cvimgRef))
    var bprow = CVPixelBufferGetBytesPerRow(cvimgRef)
    var r: Float = 0.0
    var g: Float = 0.0
    var b: Float = 0.0
    for var y = 0; y < height; y++ {
        // advance to the start of row y
        var pixel = buf.advancedBy(y * bprow)
        for var x = 0; x < width * 4; x += 4 {
            b += Float(pixel[x])
            g += Float(pixel[x + 1])
            r += Float(pixel[x + 2])
        }
    }
    r /= 255 * Float(width * height)
    g /= 255 * Float(width * height)
    b /= 255 * Float(width * height)
    // convert from rgb to hsv colourspace
    let (h, s, v) = RGBtoHSV(CGFloat(r), g: CGFloat(g), b: CGFloat(b))
    // unlock the image buffer to balance the earlier lock
    CVPixelBufferUnlockBaseAddress(cvimgRef, 0)
}
The Fit Image Palette is quite nice and powerful. Is there a script interface through which we can access it directly?
There is a script interface, and the example script below will get you started. However, the script interface is not officially supported, so it may be buggy or change in future GMS versions.
For GMS 2.3 the following script works:
// create the input image:
Image input := NewImage("formula test", 2, 100)
input = 500.5 - icol*11.1 + icol*icol*0.11

// add some random noise:
input += (random()-0.5)*sqrt(abs(input))

// create image with error data (not required)
Image errors := input.ImageClone()
errors = tert(input > 1, sqrt(input), 1)

// setup fit:
Image pars := NewImage("pars", 2, 3)
Image parsToFit := NewImage("pars to fit", 2, 3)
pars = 10; // starting values
parsToFit = 1;
Number chiSqr = 1e6
Number conv_cond = 0.00001

Result("\n starting pars = {")
Number xSize = pars.ImageGetDimensionSize(0)
Number i = 0
for (i = 0; i < xSize; i++)
{
    Result(GetPixel(pars, i, 0))
    if (i < (xSize-1)) Result(", ")
}
Result("}")

// fit:
String formulaStr = "p0 + p1*x + p2*x**2"
Number ok = FitFormula(formulaStr, input, errors, pars, parsToFit, chiSqr, conv_cond)

Result("\n results pars = {")
for (i = 0; i < xSize; i++)
{
    Result(GetPixel(pars, i, 0))
    if (i < (xSize-1)) Result(", ")
}
Result("}")
Result(", chiSqr = " + chiSqr)

// plot results of fit:
Image plot := PlotFormula(formulaStr, input, pars)

// compare the plot and original data:
Image compare := NewImage("Compare Fit", 2, 100, 3)
compare[icol, 0] = input        // original data
compare[icol, 1] = plot         // fit function
compare[icol, 2] = input - plot // residuals

ImageDocument linePlotDoc = CreateImageDocument("Test Fitting")
ImageDisplay linePlotDsp = linePlotDoc.ImageDocumentAddImageDisplay(compare, 3)
linePlotDoc.ImageDocumentShow()
I have an application that extracts text and rectangles from PDF files for further analysis. I use iTextSharp for extraction, and everything worked smoothly until I stumbled upon a document which has some strange table cell rectangles. The values in the drawing commands that I retrieve seem 10 times larger than the actual dimensions of the rendered rectangles.
Just an example :
2577 831.676 385.996 3.99609 re
At the same time, when viewing the document, all rectangles seem to fit correctly within the bounds of the document's pages. My guess is that there is some scaling command telling that these values should be scaled down. Is that assumption right? Or how else is it possible that such large rectangles are rendered so that they stay inside the bounds of a page?
The pdf document is behind this link : https://www.dropbox.com/s/gyvon0dwk6a9cj0/prEVS_ISO_11620_KOM_et.pdf?dl=0
The code that handles the extraction of dimensions from the PRStream is as follows:
private static List<PdfRect> GetRectsAndLinesFromStream(PRStream stream)
{
    var streamBytes = PdfReader.GetStreamBytes(stream);
    var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamBytes));
    List<string> newBuf = new List<string>();
    List<PdfRect> rects = new List<PdfRect>();
    List<string> allTokens = new List<string>();
    float[,] ctm = null;
    List<float[,]> ctms = new List<float[,]>();
    // if current ctm has not yet been added to list
    bool pendingCtm = false;
    // format definition for string -> float conversion
    var format = new System.Globalization.NumberFormatInfo();
    format.NegativeSign = "-";
    while (tokenizer.NextToken())
    {
        // Add them to our master buffer
        newBuf.Add(tokenizer.StringValue);
        if (tokenizer.TokenType == PRTokeniser.TokType.OTHER && newBuf[newBuf.Count - 1] == "re")
        {
            float startPointX = (float)double.Parse(newBuf[newBuf.Count - 5], format);
            float startPointY = (float)double.Parse(newBuf[newBuf.Count - 4], format);
            float width = (float)double.Parse(newBuf[newBuf.Count - 3], format);
            float height = (float)double.Parse(newBuf[newBuf.Count - 2], format);
            float endPointX = startPointX + width;
            float endPointY = startPointY + height;
            // if transformation is defined, correct coordinates
            if (ctm != null)
            {
                // extract parameters
                float a = ctm[0, 0];
                float b = ctm[0, 1];
                float c = ctm[1, 0];
                float d = ctm[1, 1];
                float e = ctm[2, 0];
                float f = ctm[2, 1];
                // reverse transformation to get x and y from x' and y';
                // keep the untransformed values so every formula works on the originals
                float sx = startPointX, sy = startPointY, ex = endPointX, ey = endPointY;
                startPointX = (sx - sy * c - e) / a;
                startPointY = (sy - sx * b - f) / d;
                endPointX = (ex - ey * c - e) / a;
                endPointY = (ey - ex * b - f) / d;
            }
            rects.Add(new PdfRect(startPointX, startPointY, endPointX, endPointY));
        }
        // store current ctm
        else if (tokenizer.TokenType == PRTokeniser.TokType.OTHER && newBuf[newBuf.Count - 1] == "q")
        {
            if (ctm != null)
            {
                ctms.Add(ctm);
                pendingCtm = false;
            }
        }
        // fetch last ctm and remove it from list
        else if (tokenizer.TokenType == PRTokeniser.TokType.OTHER && newBuf[newBuf.Count - 1] == "Q")
        {
            if (ctms.Count > 0)
            {
                ctm = ctms[ctms.Count - 1];
                ctms.RemoveAt(ctms.Count - 1);
            }
        }
        else if (tokenizer.TokenType == PRTokeniser.TokType.OTHER && newBuf[newBuf.Count - 1] == "cm")
        {
            // x' = x*a + y*c + e ; y' = x*b + y*d + f
            float a = (float)double.Parse(newBuf[newBuf.Count - 7], format);
            float b = (float)double.Parse(newBuf[newBuf.Count - 6], format);
            float c = (float)double.Parse(newBuf[newBuf.Count - 5], format);
            float d = (float)double.Parse(newBuf[newBuf.Count - 4], format);
            float e = (float)double.Parse(newBuf[newBuf.Count - 3], format);
            float f = (float)double.Parse(newBuf[newBuf.Count - 2], format);
            float[,] tempCtm = ctm;
            ctm = new float[3, 3] {
                { a, b, 0 },
                { c, d, 0 },
                { e, f, 1 }
            };
            // multiply matrices to form 1 transformation matrix
            if (pendingCtm && tempCtm != null)
            {
                float[,] resultantCtm;
                if (!TryMultiplyMatrix(tempCtm, ctm, out resultantCtm))
                {
                    throw new InvalidOperationException("Invalid transform matrix");
                }
                ctm = resultantCtm;
            }
            // current CTM has not yet been saved to stack
            pendingCtm = true;
        }
    }
    return rects;
}
The command you are looking for is cm. Did you read The ABC of PDF with iText? The book isn't finished yet, but you can already download the first five chapters.
This is a screen shot of the table that shows the cm operator:
This is an example of 5 shapes that are created in the exact same way, using identical syntax:
They are added at different positions, even at a different size and shape, because of changes in the graphics state: the coordinate system was changed, and the shapes are rendered in that altered coordinate system.
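To make that concrete, consider a hypothetical content-stream fragment (not taken from your linked file) that scales user space down by a factor of 10 before drawing the rectangle from your question:

q                                 % save graphics state
0.1 0 0 0.1 0 0 cm                % CTM: scale x and y by 0.1
2577 831.676 385.996 3.99609 re   % rectangle in the scaled user space
f                                 % fill
Q                                 % restore graphics state

Under that CTM a user-space point (x, y) maps to (0.1x, 0.1y), so the rectangle is actually rendered at about (257.7, 83.2) with a width of roughly 38.6 points, comfortably inside the page. That is why the re operands must be combined with the current transformation matrix, as your extraction code attempts to do.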
I have created and trained an autoencoder in Encog, and I am trying to rip it into parts: an encoder and a decoder. Unfortunately I cannot get it right, and I keep getting strangely improper data (comparing the result of applying the net once to data against applying the split parts: data -> enc -> dec).
I have tried to do it simply with GetWeight and SetWeight, but the result is incorrect. The solution found in the Encog documentation (initializing a flat network) is not clear to me (I cannot get it working).
public static BasicNetwork getEncoder(BasicNetwork net)
{
    var enc = new BasicNetwork();
    enc.AddLayer(new BasicLayer(null, true, net.GetLayerNeuronCount(0)));
    enc.AddLayer(new BasicLayer(new ActivationSigmoid(), true, net.GetLayerNeuronCount(1)));
    enc.AddLayer(new BasicLayer(new ActivationSigmoid(), false, net.GetLayerNeuronCount(2)));
    enc.Structure.FinalizeStructure();

    var weights1 = net.Structure.Flat.Weights;
    var weights2 = enc.Structure.Flat.Weights;
    var idx1 = net.Structure.Flat.WeightIndex;
    var idx2 = enc.Structure.Flat.WeightIndex;
    for (var i = 0; i < 1; i++)
    {
        int n = net.GetLayerNeuronCount(i);
        int m = net.GetLayerNeuronCount(i + 1);
        Console.WriteLine("Decoder: {0} - {1}", n, m);
        for (var j = 0; j < n; j++)
        {
            for (var k = 0; k < m; k++)
            {
                weights1[idx1[i] + j * m + k] = weights2[idx2[i] + j * m * k];
            }
        }
    }
    return enc;
}
The full code of the older (set/get weight) version of AutoEncoder:
using System;
using Encog.Engine.Network.Activation;
using Encog.ML.Data;
using Encog.ML.Data.Basic;
using Encog.ML.Train;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Networks.Training.Propagation.Resilient;

namespace engine
{
    public class AutoEncoder
    {
        private int k = 0;

        public IMLDataSet trainingSet
        {
            get;
            set;
        }

        public AutoEncoder(int k)
        {
            this.k = k;
        }

        public static BasicNetwork getDecoder(BasicNetwork net)
        {
            var dec = new BasicNetwork();
            dec.AddLayer(new BasicLayer(null, true, net.GetLayerNeuronCount(1)));
            dec.AddLayer(new BasicLayer(new ActivationSigmoid(), true, net.GetLayerNeuronCount(2)));
            dec.Structure.FinalizeStructure();
            for (var i = 1; i < 2; i++)
            {
                int n = net.GetLayerNeuronCount(i);
                int m = net.GetLayerNeuronCount(i + 1);
                Console.WriteLine("Decoder: {0} - {1}", n, m);
                for (var j = 0; j < n; j++)
                {
                    for (var k = 0; k < m; k++)
                    {
                        dec.SetWeight(i - 1, j, k, net.GetWeight(i, j, k));
                    }
                }
            }
            return dec;
        }

        public static BasicNetwork getEncoder(BasicNetwork net)
        {
            var enc = new BasicNetwork();
            enc.AddLayer(new BasicLayer(null, true, net.GetLayerNeuronCount(0)));
            enc.AddLayer(new BasicLayer(new ActivationSigmoid(), true, net.GetLayerNeuronCount(1)));
            enc.Structure.FinalizeStructure();
            for (var i = 0; i < 1; i++)
            {
                int n = net.GetLayerNeuronCount(i);
                int m = net.GetLayerNeuronCount(i + 1);
                Console.WriteLine("Encoder: {0} - {1}", n, m);
                for (var j = 0; j < n; j++)
                {
                    for (var k = 0; k < m; k++)
                    {
                        enc.SetWeight(i, j, k, net.GetWeight(i, j, k));
                    }
                }
            }
            return enc;
        }

        public BasicNetwork learn(double[][] data,
                                  double eps = 1e-6,
                                  long trainMaxIter = 10000)
        {
            int n = data.Length;
            int m = data[0].Length;
            double[][] output = new double[n][];
            for (var i = 0; i < n; i++)
            {
                output[i] = new double[m];
                data[i].CopyTo(output[i], 0);
            }
            var network = new BasicNetwork();
            network.AddLayer(new BasicLayer(null, true, m));
            network.AddLayer(new BasicLayer(new ActivationSigmoid(), true, k));
            network.AddLayer(new BasicLayer(new ActivationSigmoid(), true, m));
            network.Structure.FinalizeStructure();
            network.Reset();

            trainingSet = new BasicMLDataSet(data, output);
            IMLTrain train = new ResilientPropagation(network, trainingSet);
            int epoch = 1;
            do
            {
                train.Iteration();
                Console.WriteLine(@"Epoch #" + epoch + @" Error:" + train.Error);
                epoch++;
            } while (train.Error > eps && epoch < trainMaxIter);
            train.FinishTraining();
            return network;
        }
    }
}
How can I correctly rip the first two layers out of the autoencoder for the encoder, and the last two layers for the decoder?
If you need direct access to the weights, the best method is to use BasicNetwork.GetWeight(). Below is an example that shows how to use GetWeight to obtain all of the weights in the neural network. It comes from a unit test: to prove that GetWeight works, it calculates the output of a simple neural network twice, once using BasicNetwork.Compute and once manually, by summing the weighted inputs and applying TanH. Both result in the same output.
More info here too, if you want to access the weight array directly: http://www.heatonresearch.com/wiki/Weight
var network = new BasicNetwork();
network.AddLayer(new BasicLayer(null, true, 2));
network.AddLayer(new BasicLayer(new ActivationTANH(), true, 2));
network.AddLayer(new BasicLayer(new ActivationTANH(), false, 1));
network.Structure.FinalizeStructure();
network.Reset(100);

BasicMLData input = new BasicMLData(2);
input[0] = 0.1;
input[1] = 0.2;
Console.WriteLine("Using network: " + network.Compute(input));

// now manually: weighted sums into the two hidden neurons
// (fromNeuron index 2 is the bias neuron of the input layer)
double sum1 = (input[0] * network.GetWeight(0, 0, 0))
            + (input[1] * network.GetWeight(0, 1, 0))
            + (1.0 * network.GetWeight(0, 2, 0));
double sum2 = (input[0] * network.GetWeight(0, 0, 1))
            + (input[1] * network.GetWeight(0, 1, 1))
            + (1.0 * network.GetWeight(0, 2, 1));
double hidden1 = Math.Tanh(sum1);
double hidden2 = Math.Tanh(sum2);
double sum3 = (hidden1 * network.GetWeight(1, 0, 0))
            + (hidden2 * network.GetWeight(1, 1, 0))
            + (1.0 * network.GetWeight(1, 2, 0));
double output = Math.Tanh(sum3);
Console.WriteLine("Using manual: " + output);