I have a problem with ETC1 textures. To load ETC1 textures I use my own code that loads the raw data of the ETC1 image, then I upload the data into GPU memory with GLES20.glCompressedTexImage2D(GLES20.GL_TEXTURE_2D, 0, 0x8D64, textureWidth, textureHeight, 0, rawSize, data);
But on devices with a PowerVR SGX540 GPU, only textures with 512x512 dimensions draw correctly, and I don't understand why. The OpenGL ES 2.0 standard says that I can use textures with non-power-of-two dimensions. Please help me resolve this problem.
It is true that OpenGL ES 2.0 does not have the power of two restriction, however wrap modes and min filter are restricted. Please read the notes on http://www.khronos.org/opengles/sdk/docs/man/xhtml/glTexParameter.xml
which states:
Similarly, if the width or height of a texture image are not powers of two and either the GL_TEXTURE_MIN_FILTER is set to one of the functions that requires mipmaps or the GL_TEXTURE_WRAP_S or GL_TEXTURE_WRAP_T is not set to GL_CLAMP_TO_EDGE, then the texture image unit will return (R, G, B, A) = (0, 0, 0, 1).
I also recommend reading the answer and comments on this question: Can OpenGL ES render textures of non base 2 dimensions?
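In practice that means an NPOT ETC1 texture must use CLAMP_TO_EDGE wrapping and a filter that does not need mipmaps. Here is a minimal sketch of the required parameters, written in Python with PyOpenGL names for brevity (the same constants and glTexParameteri call exist on GLES20 in Android); it assumes a current GL context and an already-created texture id:
from OpenGL.GL import (glBindTexture, glTexParameteri, GL_TEXTURE_2D,
                       GL_TEXTURE_WRAP_S, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE,
                       GL_TEXTURE_MIN_FILTER, GL_TEXTURE_MAG_FILTER, GL_LINEAR)

def make_npot_safe(texture_id):
    # NPOT textures in ES 2.0: both wrap modes must be CLAMP_TO_EDGE and the
    # minification filter must not use mipmaps, otherwise sampling returns (0, 0, 0, 1)
    glBindTexture(GL_TEXTURE_2D, texture_id)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)  # no mipmap filter
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)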
I am working on a dog detection system using deep learning (Tensorflow object detection) and a RealSense D425 camera. I am using the Intel(R) RealSense(TM) ROS wrapper in order to get images from the camera.
I am executing "roslaunch rs_rgbd.launch" and my Python code is subscribed to the "/camera/color/image_raw" topic in order to get the RGB image. Using this image and the object detection library, I am able to infer (at 20 fps) the location of a dog in the image as a bounding box (xmin, xmax, ymin, ymax).
I would like to crop the point cloud information with the object detection output (xmin, xmax, ymin, ymax) and determine whether the dog is far away from or near the camera. I would like to use pixel-by-pixel aligned information between the RGB image and the point cloud.
How can I do it? Is there any topic for that?
Thanks in advance
Intel publishes a Python notebook about this same problem at: https://github.com/IntelRealSense/librealsense/blob/jupyter/notebooks/distance_to_object.ipynb
What they do is as follows:
get the color frame and the depth frame (point cloud in your case)
align the depth to the color frame
use an SSD detector to find the dog inside the color frame
get the average depth for the detected dog and convert it to meters
import numpy as np
import cv2

# aligned_depth_frame, profile and className come from earlier cells of the notebook
depth = np.asanyarray(aligned_depth_frame.get_data())
# Crop depth data to the detection box (numpy indexes rows (y) first, then columns (x)):
depth = depth[ymin_depth:ymax_depth, xmin_depth:xmax_depth].astype(float)
# Get the depth scale from the device and convert raw units to meters
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
depth = depth * depth_scale
# Average depth inside the box, in meters
dist, _, _, _ = cv2.mean(depth)
print("Detected a {0} {1:.3} meters away.".format(className, dist))
Hope this helps.
How can I do a basic face alignment on a 2-dimensional image, assuming that I have the positions/coordinates of the mouth and eyes?
Is there any algorithm that I could implement to correct the face alignment on images?
Face (or image) alignment refers to aligning one image (or face in your case) with respect to another (or a reference image/face). It is also referred to as image registration. You can do that using either appearance (intensity-based registration) or key-point locations (feature-based registration). The second category stems from image motion models where one image is considered a displaced version of the other.
In your case the landmark locations (3 points for the eyes and mouth?) provide a good reference set for straightforward feature-based registration. Assuming you have the locations of a set of points in both 2D images, x_1 and x_2, you can estimate a similarity transform (rotation, translation, scaling), i.e. a planar 2D transform S that maps x_2 onto x_1. You can additionally add reflection to that, though for faces this will most likely be unnecessary.
Estimation can be done by forming the normal equations and solving a linear least-squares (LS) problem for the x_1 = S x_2 system, i.e. linear regression. A 2D similarity has 4 unknown parameters (1 rotation angle, 2 translations, 1 scale, usually parameterized as a = s cos θ, b = s sin θ, t_x, t_y), so you need at least 2 point correspondences (4 equations); your 3 landmarks give a slightly overdetermined system. The solution to the LS problem can be obtained with a direct linear method (e.g. by applying SVD or a matrix pseudo-inverse). For cases with a sufficiently large number of reference points (e.g. automatically detected ones), a RANSAC-type method can be used for point filtering and outlier removal (though this is not your case here).
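As a small illustration of that least-squares estimate (plain numpy; the two point sets below are made-up landmark coordinates, with a = s cos θ and b = s sin θ):
import numpy as np

x2 = np.array([[120.0, 95.0], [180.0, 98.0], [150.0, 160.0]])    # landmarks in image 2
x1 = np.array([[100.0, 100.0], [160.0, 100.0], [130.0, 165.0]])  # same landmarks in image 1

# Build the linear system A p = r for p = [a, b, tx, ty] so that x1 ≈ S(x2)
rows, rhs = [], []
for (x, y), (xp, yp) in zip(x2, x1):
    rows.append([x, -y, 1.0, 0.0]); rhs.append(xp)
    rows.append([y,  x, 0.0, 1.0]); rhs.append(yp)
p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
a, b, tx, ty = p

S = np.array([[a, -b, tx],
              [b,  a, ty]])
print("scale:", np.hypot(a, b), "rotation (rad):", np.arctan2(b, a))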
After estimating S, apply image warping on the second image to get the transformed grid (pixel) coordinates of the entire image 2. The transform will change pixel locations but not their appearance. Unavoidably some of the transformed regions of image 2 will lie outside the grid of image 1, and you can decide on the values for those null locations (e.g. 0, NaN etc.).
For more details: R. Szeliski, "Image Alignment and Stitching: A Tutorial" (Section 4.3 "Geometric Registration")
In OpenCV see: Geometric Image Transformations, e.g. cv::getRotationMatrix2D, cv::getAffineTransform and cv::warpAffine. Note though that you should estimate and apply a similarity transform (a special case of an affine) in order to preserve angles and shapes.
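For completeness, here is a rough Python/OpenCV sketch of the whole answer (estimate a similarity transform from landmarks, then warp); the file names and landmark values are illustrative only, and estimateAffinePartial2D needs a reasonably recent OpenCV:
import cv2
import numpy as np

# Landmarks (eyes + mouth) in the input image and in the reference frame (illustrative values)
src_pts = np.float32([[120, 95], [180, 98], [150, 160]])
dst_pts = np.float32([[100, 100], [160, 100], [130, 165]])

# Similarity transform (rotation + uniform scale + translation), a special case of an affine
M, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)

img = cv2.imread("face.jpg")
aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]),
                         flags=cv2.INTER_LINEAR, borderValue=0)  # out-of-grid pixels set to 0
cv2.imwrite("face_aligned.jpg", aligned)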
For faces there is a lot of variability in the feature points, so it won't be possible to fit all feature points perfectly with a single affine transform. The only way to align all the points exactly is to warp the image given the points: triangulate the image using the points and apply an affine warp to each triangle to get a warped image in which all the points are aligned.
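A rough sketch of that idea using scikit-image's PiecewiseAffineTransform (the landmark values and file names are illustrative; the image corners are appended so the triangulation covers the whole image):
import numpy as np
from skimage import io, img_as_ubyte
from skimage.transform import PiecewiseAffineTransform, warp

img = io.imread("face.jpg")
h, w = img.shape[:2]
corners = np.float64([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])

# (x, y) landmarks in the input image and their target positions in the aligned output
input_pts = np.float64([[120, 95], [180, 98], [150, 160]])     # eyes + mouth, input image
target_pts = np.float64([[100, 100], [160, 100], [130, 165]])  # eyes + mouth, reference

# warp() expects a transform that maps output coordinates back to input coordinates
tform = PiecewiseAffineTransform()
tform.estimate(np.vstack([target_pts, corners]), np.vstack([input_pts, corners]))

aligned = warp(img, tform)  # per-triangle affine warp; landmarks land on target_pts
io.imsave("face_aligned.jpg", img_as_ubyte(aligned))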
Face alignment can be handled based on just the eye positions.
OpenCV, Dlib and MTCNN can all detect faces and eyes. The deepface framework for Python wraps those detectors and offers an out-of-the-box detection and alignment function.
The detectFace function applies detection and then alignment in the background.
#!pip install deepface
from deepface import DeepFace
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
DeepFace.detectFace("img.jpg", detector_backend = backends[0])
You can also apply detection and alignment manually:
import matplotlib.pyplot as plt
from deepface.commons import functions

img = functions.load_image("img.jpg")
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']

# detection only
detected_face = functions.detect_face(img = img, detector_backend = backends[3])
plt.imshow(detected_face)

# alignment only
aligned_face = functions.align_face(img = img, detector_backend = backends[3])
plt.imshow(aligned_face)

# detection applied to the already aligned face
processed_img = functions.detect_face(img = aligned_face, detector_backend = backends[3])
plt.imshow(processed_img)
There's a section Aligning Face Images in OpenCV's Face Recognition guide:
http://docs.opencv.org/trunk/modules/contrib/doc/facerec/facerec_tutorial.html#aligning-face-images
The script aligns given images at the eyes. It's written in Python, but should be easy to translate to other languages. I know of a C# implementation by Sorin Miron:
http://code.google.com/p/stereo-face-recognition/
I am developing an iOS game that will need to render 500-800 particles at a time. I have learned that it is a good idea to batch render many sprites in OpenGL ES instead of calling glDrawArrays(..) on every sprite in the game in order to be able to render more sprites w/out a drastic reduction in frame rate.
My question is: how do I batch render 500+ particles that all have different alphas, rotations, and scales, but share the same texture atlas? The emphasis of this question is on the different alphas, rotations, and scales for each particle.
I realize this question is very similar to How do I draw 1000+ particles (w/ unique rotation, scale, and alpha) in iPhone OpenGL ES particle system without slowing down the game?, however, that question does not address batch rendering. Before I take advantage of vertex buffer objects, I want to understand batch rendering in OpenGL ES w/ unique alphas, rotations, and scales (but with the same texture). Therefore, while I plan on using VBOs eventually, I want to take this approach first.
Code examples would greatly be appreciated, and if you use an indices array as some examples do, please explain the structure and purpose of the indices array.
EDIT I am using OpenGL ES 1.1.
EDIT Below is a code example of how I render each particle in the scene. Assume that they share the same texture and that texture is already bound in OpenGL ES 1.1 before this code executes.
- (void) render {
    glPushMatrix();
    glTranslatef(translation.x, translation.y, translation.z);
    glRotatef(rotation.x, 1, 0, 0);
    glRotatef(rotation.y, 0, 1, 0);
    glRotatef(rotation.z, 0, 0, 1);
    glScalef(scale.x, scale.y, scale.z);
    // change alpha
    glColor4f(1.0, 1.0, 1.0, alpha);
    // glBindTexture(GL_TEXTURE_2D, texture[0]);
    glVertexPointer(2, GL_FLOAT, 0, texturedQuad.vertices);
    glEnableClientState(GL_VERTEX_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, 0, texturedQuad.textureCoords);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glPopMatrix();
}
A code alternative to this method would be greatly appreciated!
One possibility would be to include those values in a vertex attrib array - I think this is the best option, but it requires OpenGL ES 2.0, so if you're stuck on 1.1 this method is out. Vertex attrib arrays let you store values at each vertex; in this case you could store the alphas and rotations each in their own attrib array, set them up with glVertexAttribPointer, and have the shader do the rotation transformation and the color processing with the alpha.
The other option would be to do the rotation transformation on the CPU and then batch particles with similar alpha values into several draw calls. This version requires a little more work and it would not be a single draw call, but it would still help to optimize if shaders are not an option.
NOTE: The question you linked to also recommends the array solution
EDIT: Given that your code is OpenGL ES 1.1, here's a solution using glColorPointer to carry the per-particle alpha (pseudocode, with the per-particle transforms applied on the CPU):
// allocate buffers big enough to hold the data of all particles (4 vertices each)
verticesBuffer = ...
texCoordBuffer = ...
colorBuffer = ...
for (particle in allParticles)
{
    // Create a matrix from the particle's rotation (and scale/translation)
    rotMatrix = matrix(particle.rotation.x, particle.rotation.y, particle.rotation.z)
    // Transform the particle's quad by the matrix on the CPU
    verticesBuffer[i] = particle.vertices * rotMatrix
    // copy other data
    texCoordBuffer[i] = particle.texCoords;
    // the alpha rides along in the per-vertex color (this replaces glColor4f)
    colorBuffer[i] = color(1.0, 1.0, 1.0, particle.alpha);
}
// remember to glEnableClientState(GL_COLOR_ARRAY) in addition to the other two
glVertexPointer(2, GL_FLOAT, 0, verticesBuffer);
glTexCoordPointer(2, GL_FLOAT, 0, texCoordBuffer);
glColorPointer(4, GL_FLOAT, 0, colorBuffer);
// One draw call for everything. A single GL_TRIANGLE_STRIP would stitch
// neighbouring quads together, so draw indexed triangles instead
// (two triangles, i.e. 6 indices, per quad - see the sketch below):
glDrawElements(GL_TRIANGLES, particleCount * 6, GL_UNSIGNED_SHORT, indexBuffer);
A good optimization for this solution would be to share the buffers for each render so you don't have to reallocate them every frame.
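Since the question specifically asks about the indices array: it simply lists, for every quad, the two triangles that make it up, referring to that quad's 4 vertices in the shared vertex buffer. A small sketch of how it is built (written in Python/numpy just to show the structure; in the app you would build the same array once in C at startup):
import numpy as np

def build_quad_indices(particle_count):
    # 4 vertices and 6 indices (two triangles) per particle quad
    indices = np.empty(particle_count * 6, dtype=np.uint16)
    for q in range(particle_count):
        v = q * 4  # index of this quad's first vertex in the shared vertex buffer
        indices[q * 6:q * 6 + 6] = [v, v + 1, v + 2,      # first triangle
                                    v + 2, v + 1, v + 3]  # second triangle
    return indices

# e.g. two particles -> [0 1 2 2 1 3 4 5 6 6 5 7]
print(build_quad_indices(2))
The index array never changes from frame to frame, so it can be built once at startup. With GL_UNSIGNED_SHORT indices you are capped at 16384 quads per draw call (65536 / 4 vertices), which is plenty for 500-800 particles.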
Using shader model 5/D3D11/HLSL.
I'd like to treat a 2D array of texels as a 2D matrix of Vectors.
(Picture a grid indexed by u along one axis and v along the other, where each texel is a 4-component vector such as (1, 4, 3, 9) or (7, 5.5, 4.9, 2.1).) I need to access specific ranges of the data in the texture for different shaders, so the ranges to access in the texture should naturally be indexed by u,v components.
How would I do that in HLSL? I'm thinking the following:
Create the texture as per normal
Load your vector values into the texture (1 vector per texel)
Turn off all linear interpolation for texture sampling ("nearest neighbour")
In the shader, look up vectors you need using texture coordinates
The only thing I feel is shaky is whether there will be strange errors introduced when I index the texture using floating point u's and v's.
If the texture is 1024x1024 texels, and I'm trying to index (3,2)->(3,7), that would be u=(3/1024,2/1024)->(3/1024,7/1024) which feels a bit shaky. Is there a way to index the texture by int components, perhaps? Or will it just work out fine?
Texture2DArray
I don't want to use a GPGPU framework just for this (so no CUDA suggestions please :).
You can do it with operator[] in HLSL (Shader Model 5.0): index the Texture2D directly with integer texel coordinates, e.g. float4 v = tex[uint2(3, 2)]; (or equivalently tex.Load(int3(3, 2, 0))). No sampler, no filtering, and no floating-point u,v round-off to worry about. See the Texture2D methods in the HLSL documentation.
Is there a way to get the screen coordinates of the objects in the 3d world?
Implement gluProject().
EDIT: Mesa/SGI has some code.
EDIT2: Or grab a copy of GLU for iOS.
Just implement the transformations of the OpenGL pipeline: modelview, projection, perspective divide, viewport. You can query the current matrices with glGetDoublev and the viewport with glGetIntegerv.
Then compute the projection matrix times the modelview matrix: MVP = P * MV.
Now, for each vertex v, compute MVP * v.
Then compute v /= v.w;
That gives you normalized device coordinates in the range [-1,1]x[-1,1]; the last step is scaling and translating them into [x,x+w]x[y,y+h] (the viewport values).
You can also take a look at the reference pages for glFrustum and glViewport to see how all these transformations are defined.
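A minimal numpy sketch of that math (the matrices and viewport below are placeholders for the values you would read back with glGetDoublev/glGetIntegerv; remember that GL returns matrices in column-major order):
import numpy as np

def project(v, modelview, projection, viewport):
    # Equivalent of gluProject: object coordinates -> window coordinates
    x, y, w, h = viewport
    clip = projection @ modelview @ np.append(v, 1.0)  # MVP * v
    ndc = clip[:3] / clip[3]                           # perspective divide -> [-1, 1]
    win_x = x + (ndc[0] + 1.0) * 0.5 * w               # viewport transform
    win_y = y + (ndc[1] + 1.0) * 0.5 * h
    win_z = (ndc[2] + 1.0) * 0.5                       # depth in [0, 1]
    return np.array([win_x, win_y, win_z])

# Placeholder inputs: identity matrices and a 640x480 viewport
modelview = np.eye(4)
projection = np.eye(4)
print(project(np.array([0.25, -0.5, 0.0]), modelview, projection, (0, 0, 640, 480)))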