Optimizing 3D models and reusing a single texture

If I use one texture for many objects, where each object picks the color (RGB) it needs from the 256 available (a 1x256 file), does this save resources, or does it create a texture load queue?
Secondly, I have an OBJ file with a slightly unoptimized triangle count (for example, 3420) that weighs 5 KB. After rebuilding the model in Blender, the triangle count drops to 2440, but the file size grows to roughly 8-9 KB. In this case, which matters more for optimization (in terms of GPU, RAM, or CPU work): the number of triangles or the file size?
And sorry for the dumb questions :)
Before optimizing in Blender:
3420 triangles, 5 KB file size
After remeshing in Blender:
2440 triangles, 8-9 KB file size
Environment:
Godot on Vulkan API

Related

Transpose an image2d_t in OpenCL

I work on an image processing code base that uses image2d_t objects everywhere. These have their shape (width and height) formally declared which enables programmers to use built-in boundary checking and so on.
To speed up a separable 2D convolution, I would like to transpose the image temporarily, so that both 1D convolutions access memory along lines. But since all the image2d_t buffers have the same shape, I need to reshape 2 of them without reallocating them (if I need to realloc + transpose, the speed-up adds up to almost nothing).
Is there a way to switch the width and height properties in the image2d_t object?
There is no point in transposing image2d_t objects.
image2d_t objects represent texture memory. Texture memory is a special kind of memory that is hardware-optimized for situations where threads of a warp / wavefront access elements in nearby 2D locations (x and y).
By 'nearby 2D locations' I mean not necessarily on the same horizontal line (x) and not necessarily in discrete pixel locations.
The GPU hardware has special support for 'texture sampling' - allowing you to 'sample' the texture in non-discrete locations and obtain interpolated pixel values.
The exact manner in which texture memory is implemented is vendor dependent, but the general idea is to have 2D regional tiles reside in the same physical line in memory.
Examples where using texture memory makes sense:
Texture mapping in computer graphics. Adjacent pixels in an object sample their color from adjacent 2D locations in an input image.
Image transformation in image processing - scaling, rotating, distorting and undistorting an image. Situations where you 'sample' an input image in an arbitrarily calculated location and write the sample to a target buffer / image.
For most cases in image processing applications, texture memory makes no sense.
Many image processing algorithms access memory in a known pattern, which can be better optimized using linear memory (OpenCL buffers) and has less overhead.
As for your specific question:
Is there a way to switch width and height properties in the image2d_t object?
No. image2d_t objects are 'immutable'. Their content can be changed, however, if you allocate them with the appropriate flags and pass them to a kernel as __write_only.
I suggest you switch to using buffer objects. They can be transposed efficiently, and there are some good examples online.
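To make the transpose idea concrete, here is a minimal CPU-side sketch in plain Python rather than OpenCL. It assumes a row-major image stored as a list of rows, a symmetric 1D kernel, and clamped borders; the function names are illustrative only. Both 1D passes run along rows, with a transpose in between, which is the access pattern the question is after.

```python
from typing import List

Image = List[List[float]]


def transpose(img: Image) -> Image:
    """Return the transpose of a row-major 2D list."""
    return [list(col) for col in zip(*img)]


def convolve_rows(img: Image, kernel: List[float]) -> Image:
    """Apply a 1D kernel along every row, clamping reads at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        out_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                # Clamp the source index so border pixels are repeated.
                sx = min(max(x + k - radius, 0), width - 1)
                acc += weight * row[sx]
            out_row.append(acc)
        out.append(out_row)
    return out


def separable_convolve(img: Image, kernel: List[float]) -> Image:
    """Separable 2D filter: row pass, transpose, row pass, transpose back."""
    horizontal = convolve_rows(img, kernel)
    vertical = convolve_rows(transpose(horizontal), kernel)
    return transpose(vertical)
```

On a GPU the same structure would apply to linear buffers, with the transpose done by a dedicated kernel; this sketch only shows the data flow.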

Should the size of the photos be the same for deep learning?

I have a lot of images (about 40 GB).
My images are small, but they are not all the same size.
My images are not of natural things: I generated them from a signal, so all pixels are important and I can't crop or delete any pixel.
Is it possible to use deep learning for this kind of image with different shapes?
All pixels are important, please take this into consideration.
I want a model which does not depend on a fixed size input image. Is it possible?
Without knowing what you're trying to learn from the data, it's tough to give a definitive answer:
You could pad all the data at the beginning (or end) of the signal so they're all the same size. This allows you to keep all the important pixels, but adds irrelevant information to the image that the network will most likely ignore.
I've also had good luck with activations, where you take a pretrained network and pull features from the image at a certain part of the network regardless of size (as long as it's larger than the network input size), then run them through a classifier.
https://www.mathworks.com/help/deeplearning/ref/activations.html#d117e95083
Or you could window your data and only process smaller chunks at one time.
https://www.mathworks.com/help/audio/examples/cocktail-party-source-separation-using-deep-learning-networks.html
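As a rough illustration of the padding option, the sketch below pads a batch of differently sized images (plain lists of rows) to the largest height and width in the batch. The validity mask is an extra assumption that is not part of the answer above: it simply records which pixels are real, so a later stage could ignore the padding explicitly. The name pad_batch is illustrative.

```python
from typing import List, Tuple

Image = List[List[float]]


def pad_batch(images: List[Image], fill: float = 0.0) -> Tuple[List[Image], List[Image]]:
    """Pad every image at the bottom/right to the largest height and width.

    Returns the padded images and matching masks (1.0 = original pixel,
    0.0 = padding). No original pixel is cropped or modified.
    """
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    padded, masks = [], []
    for img in images:
        h, w = len(img), len(img[0])
        padded.append([
            list(img[y]) + [fill] * (max_w - w) if y < h else [fill] * max_w
            for y in range(max_h)
        ])
        masks.append([
            [1.0] * w + [0.0] * (max_w - w) if y < h else [0.0] * max_w
            for y in range(max_h)
        ])
    return padded, masks
```

Padding per batch rather than over the whole 40 GB data set keeps the amount of filler small when images of similar size are grouped together.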

Voxel Engine and Optimization

Recently I've started developing a voxel engine. All I need is colored voxels without textures, but in very large numbers (and much smaller than Minecraft's). The question is how to draw the scene very fast. I'm using C#/XNA, but in my opinion that is not very important here, so let's talk about the general case. Look at these two games:
http://www.youtube.com/watch?v=EKdRri5jSMs
http://www.youtube.com/watch?v=in0bavLJ8KQ
Video number 2 in particular seems to show great optimization methods (my graphics card starts choking at just 192 x 192 x 64). How do they achieve this?
What I would like to have in the engine:
colored voxels without textures, but shaded
many, many voxels, say a minimum of 512 x 512 x 128, to achieve something like video #2
shadows (smooth shadows would be great, but this is not necessary)
optional: dynamic lighting (for example from flying fireballs that light up nearby voxel structures)
a frame rate of at least 40 FPS
the camera has three degrees of freedom (movement along the x, y, and z axes); no camera rotation is needed
finally, an optional feature could be depth of field (it would be sweet ^^ )
Optimizations I already know of:
remove unseen voxels that reside inside a voxel structure (covered from six directions by other voxels)
remove unseen faces of voxels - because the camera has no rotation and always looks obliquely forward as in third-person (TPP) games, if we divide the screen with a vertical cut, the voxels on the left and on the right will each show only 3 faces
keep voxels in a Dictionary instead of a 3-dimensional array - iterating over an array of size 512 x 512 x 128 takes milliseconds, which is unacceptable, but a dictionary int:color, where the int is the packed 3D position, is much, much faster
use instancing where applicable
occlusion culling? (how to do this?)
space partitioning / octree (is it a good idea?)
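The packed-position dictionary from the list above can be sketched as follows, in Python for brevity rather than C#. The bit widths (9, 9, and 7 bits for a 512 x 512 x 128 volume), the 0xRRGGBB color encoding, and the function names are assumptions made for this illustration.

```python
from typing import Dict, Optional, Tuple

# Assumed layout for a 512 x 512 x 128 volume: 9 + 9 + 7 = 25 bits per key.
X_BITS, Y_BITS, Z_BITS = 9, 9, 7
SIZE = (1 << X_BITS, 1 << Y_BITS, 1 << Z_BITS)


def pack(x: int, y: int, z: int) -> int:
    """Pack a voxel position into a single integer key."""
    if not (0 <= x < SIZE[0] and 0 <= y < SIZE[1] and 0 <= z < SIZE[2]):
        raise ValueError("voxel position out of range")
    return (z << (X_BITS + Y_BITS)) | (y << X_BITS) | x


def unpack(key: int) -> Tuple[int, int, int]:
    """Recover (x, y, z) from a packed key."""
    x = key & ((1 << X_BITS) - 1)
    y = (key >> X_BITS) & ((1 << Y_BITS) - 1)
    z = key >> (X_BITS + Y_BITS)
    return x, y, z


def color_at(voxels: Dict[int, int], x: int, y: int, z: int) -> Optional[int]:
    """Return the 0xRRGGBB color at a position, or None for empty space."""
    return voxels.get(pack(x, y, z))


# Only occupied voxels are stored, so iteration cost scales with the
# number of solid voxels rather than with the volume of the grid.
voxels: Dict[int, int] = {pack(10, 20, 5): 0xFF8800}
```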
I'd be very thankful if someone could give me a tip on how to improve the optimizations listed above, or share ideas for new ones. Thanks
1) Voxatron uses a software renderer rather than the GPU. You can read some details about it if you read the comments in this blog post:
http://www.lexaloffle.com/bbs/?tid=201
I haven't looked in detail myself so can't tell you much more than that.
2) I've never played 3D Dot Game Heroes but I don't have any reason to believe it uses voxels at all. I mean, I don't see any cubes being added or deleted. Most likely it is just a static polygon mesh with a nice texture applied.
As for implementing it yourself, do not try to draw the world by rendering cubes, as this is very slow. Instead you should process the volume and generate meshes lying on the boundary between solid voxels and empty ones. Break the volume into suitably sized regions (e.g. 32x32x32) and generate a mesh for each.
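A minimal sketch of that approach, in Python for brevity: it collects only the faces that separate a solid voxel from an empty one and groups them by 32x32x32 region. It assumes the solid voxels are given as a set of (x, y, z) tuples; the function names are illustrative, and turning each face into actual vertices and indices is left out.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
Face = Tuple[Voxel, Voxel]  # (voxel position, outward face normal)

CHUNK = 32  # region edge length, as suggested above
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def exposed_faces(solid: Set[Voxel]) -> List[Face]:
    """Return every face that lies between a solid voxel and empty space."""
    faces = []
    for (x, y, z) in solid:
        for dx, dy, dz in NEIGHBOURS:
            if (x + dx, y + dy, z + dz) not in solid:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces


def faces_by_chunk(solid: Set[Voxel]) -> Dict[Voxel, List[Face]]:
    """Group exposed faces by region so each region gets its own mesh."""
    chunks = defaultdict(list)
    for voxel, normal in exposed_faces(solid):
        key = tuple(c // CHUNK for c in voxel)
        chunks[key].append((voxel, normal))
    return dict(chunks)
```

Keeping one mesh per region means that editing a voxel only requires rebuilding the mesh of the region it belongs to (and possibly its neighbours), not the whole world.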
I have written a book article about this which you might find useful. It's actually about smooth voxel terrain, but a lot of the principles still apply.
You can read it on Google books here: http://books.google.com/books?id=WNfD2u8nIlIC&lpg=PR1&dq=game%20engine%20gems&pg=PA39#v=onepage&q&f=false
And you can find the associated source code here: http://www.thermite3d.org
Since you are using XNA, you can just use instancing to get the desired effect: http://www.float4x4.net/index.php/2010/06/hardware-instancing-in-xna/
http://roecode.wordpress.com/2008/03/17/xna-framework-gameengine-development-part-19-hardware-instancing-pc-only/
The underlying concept is instancing: this feature lets you specify some amount of repeating data and some amount of varying data in a single DrawIndexedPrimitive call. In your case, the repeated (geometry) stream would be a single solid box, and the per-instance stream would hold the transform and color information.

Efficient thumbnail generation of huge pdf file?

In a system I'm working on we're generating thumbnails as part of the workflow.
Sometimes the pdf files are quite large (print size 3m2) and can contain huge bitmap images.
Are there thumbnail generation capable programs that are optimized for memory footprint handling such large pdf files?
The resulting thumbnail can be png or jpg.
ImageMagick is what I use for all my CLI graphics, so maybe it can work for you:
convert foo.pdf foo-%d.png
For a three-page PDF, this produces three separate PNG files:
foo-0.png
foo-1.png
foo-2.png
To create only one thumbnail, treat the PDF as if it were an array ([0] is the first page, [1] is the second, etc.):
convert foo.pdf[0] foo-thumb.png
Since you're worried about memory, you can restrict memory usage with the -cache option:
-cache threshold: megabytes of memory available to the pixel cache. Image pixels are stored in memory until threshold megabytes of memory have been consumed. Subsequent pixel operations are cached on disk. Operations to memory are significantly faster, but if your computer does not have a sufficient amount of free memory you may want to adjust this threshold value.
So to thumbnail a PDF file and resize it, you could run this command, which should have a maximum memory usage of around 20 MB:
convert -cache 20 foo.pdf[0] -resize 10%x10% foo-thumb.png
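Or you could use -density to set the resolution at which the PDF is rasterized. It has to come before the input file, and lower values produce smaller images (the default is 72 dpi; 18 here is only an example value):
convert -cache 20 -density 18 foo.pdf[0] foo-thumb.png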
Should you care? Current affordable servers have 512 GB of RAM. That is enough to store a full-colour uncompressed bitmap roughly 350 inches (about 9 m) square at 1200 dpi, assuming 3 bytes per pixel. The performance hit you take from using disk is large.
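An estimate like this is easy to sanity-check. The sketch below assumes an uncompressed bitmap with 3 bytes per pixel (24-bit colour) and computes the largest square image, in inches, that fits in a given amount of RAM at a given resolution; the function name is illustrative.

```python
import math


def max_square_side_inches(ram_bytes: int, dpi: int, bytes_per_pixel: int = 3) -> float:
    """Side length (in inches) of the largest square bitmap that fits in RAM."""
    pixels = ram_bytes // bytes_per_pixel   # how many pixels fit in memory
    side_pixels = math.isqrt(pixels)        # side of the largest square
    return side_pixels / dpi


# Example: 512 GiB of RAM at 1200 dpi; multiply by 0.0254 for metres.
side_in = max_square_side_inches(512 * 1024**3, 1200)
side_m = side_in * 0.0254
```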

Streaming Jpeg Resizer

Does anyone know of any code that does streaming JPEG resizing? What I mean by this is reading a chunk of an image (how much would obviously vary with the source and destination sizes) and resizing it, allowing for lower memory consumption when resizing very large JPEGs. Obviously this wouldn't work for progressive JPEGs (or at least it would become much more complicated), but it should be possible for standard JPEGs.
The design of JPEG data allows simple resizing to 1/2, 1/4 or 1/8 size. Other variations are possible. These same size reductions are easy to do on progressive jpegs as well and the quantity of data to parse in a progressive file will be much less if you want a reduced size image. Beyond that, your question is not specific enough to know what you really want to do.
Another simple trick to reduce the data size by 33% is to render the image into an RGB565 bitmap instead of RGB24 (if you don't need the full color space).
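For reference, the RGB565 conversion can be sketched like this. It assumes tightly packed 8-bit R, G, B input bytes and little-endian 16-bit output; both are illustrative choices, not something the answer specifies.

```python
import struct


def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Keep the top 5 bits of red, 6 of green, and 5 of blue in one 16-bit value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def rgb24_buffer_to_rgb565(pixels: bytes) -> bytes:
    """Convert a packed RGB24 buffer to little-endian RGB565 (2 bytes per pixel)."""
    if len(pixels) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(pixels), 3):
        r, g, b = pixels[i:i + 3]
        out += struct.pack("<H", rgb888_to_rgb565(r, g, b))
    return bytes(out)
```

Each pixel shrinks from 3 bytes to 2, which is where the 33% saving comes from.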
I don't know of a library that can do this off the shelf, but it's certainly possible.
Let's say your JPEG is using 8x8 pixel MCUs (the units in which pixels are grouped). Let's also say you are reducing by a factor of 12 to 1. The first output pixel needs to be the average of the 12x12 block of pixels at the top left of the input image. To get to the input pixels with a y coordinate greater than 8, you need to have decoded the start of the second row of MCUs. You can't really decode those pixels before decoding the whole of the first row of MCUs. In practice, that probably means you'll need to store two rows of decoded MCUs. Still, for a 12000x12000 pixel image (roughly 150 megapixels) you'd reduce the memory requirements by a factor of 12000/16 = 750. That should be enough for a PC. If you're looking at embedded use, you could horizontally resize the rows of MCUs as you read them, reducing the memory requirements by another factor of 12, at the cost of a little more code complexity.
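The downscaling half of this scheme can be sketched independently of the JPEG decoder. The generator below assumes the decoder already delivers decoded grayscale scanlines one at a time (that part is not shown), reduces each row horizontally as it arrives, and keeps only one accumulator row of the output width. The name stream_downscale is hypothetical, and rows or columns that do not fill a complete block are simply dropped.

```python
from typing import Iterable, Iterator, List


def stream_downscale(rows: Iterable[List[int]], factor: int) -> Iterator[List[int]]:
    """Box-filter downscale by an integer factor, one input row at a time."""
    acc = None          # running column sums for the current band of rows
    rows_in_band = 0
    for row in rows:
        out_w = len(row) // factor
        # Horizontal reduction: sum each group of `factor` pixels right away,
        # so the full-width row never has to be kept.
        sums = [sum(row[x * factor:(x + 1) * factor]) for x in range(out_w)]
        acc = sums if acc is None else [a + s for a, s in zip(acc, sums)]
        rows_in_band += 1
        if rows_in_band == factor:
            # A full factor x factor block has been accumulated per output pixel.
            yield [a // (factor * factor) for a in acc]
            acc, rows_in_band = None, 0
```

Because the input is consumed lazily, peak memory is one input row plus one accumulator row, regardless of the image height.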
I'd find a simple JPEG decoder library like Tiny Jpeg Decoder and look at the main loop in the JPEG decode function. In the case of Tiny Jpeg Decoder, the main loop calls decode_MCU. Modify from there. :-)
You've got a bunch of fiddly work to do to make the code work for non-8x8 MCUs, and a load more if you want to reduce by a non-integer factor. Sounds like fun though. Good luck.