Can anybody explain how rendering differs from rasterization especially in the context of font rendering (why not font rasterization)?
Can rendering be called a special technique (like greyscale rendering and subpixel rendering) before the rasterizer rasterizes the image?
Rendering is a broad term that generally means transforming computer-readable information, for example objects in a 3D scene, into one or more images.
Rasterization is a more specific term that typically means the process of transforming a vector (curve based) image to a rasterized (pixel based) image.
Rendering involves performing the calculations for vectors and shape geometry for the elements to be drawn.
Rasterizing involves converting the rendered vectors and shapes into pixel bit maps for display.
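To make the rasterization step concrete, here is a minimal sketch that turns one vector shape into a pixel grid by testing each pixel center against the shape. It assumes a single triangle rather than real glyph outlines (which are curves), and the function names are made up for illustration; it is not how any particular font library works.

```python
def edge(a, b, p):
    """Signed area test: positive when p lies on one side of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(v0, v1, v2, width, height):
    """Return a height x width grid of 0/1 coverage values for one triangle."""
    grid = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v0, v1, p), edge(v1, v2, p), edge(v2, v0, p)
            # Inside when all three edge tests agree in sign (either winding).
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                     (w0 <= 0 and w1 <= 0 and w2 <= 0)
            grid[y][x] = 1 if inside else 0
    return grid


grid = rasterize_triangle((0, 0), (4, 0), (0, 4), 4, 4)
for row in grid:
    print("".join("#" if cell else "." for cell in row))
```

Greyscale or subpixel rendering would replace the 0/1 coverage with a fractional value per pixel (or per colour channel), but the vector-to-grid step is the same idea.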
A font rendering library (like say freetype) provides a function that will take an outline font file (like a .ttf) and a character code and produce a bitmap of the corresponding glyph in host memory.
For small text (like say up to 30x30 pixel glyphs) what's the most efficient way to render those glyphs to a Vulkan framebuffer?
Some options I've thought about might be:
1. Render the glyphs with the font rendering library every time on demand, blit them with host code to a single host-side image holding a whole "text box", transfer the host-side image of the text box to a device local image, and then render a quad (like a normal image) using fragment shader / image sampler from the text box to be drawn.
2. At program startup cycle through all the glyphs host side, render them to glyph bitmaps. Do the same as 1 but blit from the cached glyph bitmaps (takes about 1 MB host memory).
3. Cache the glyph bitmaps individually into device local images. Rather than blitting host-side, render a quad for each glyph device-side and set the image sampler to the corresponding glyph each time. (Not sure how the draw calls would work? One draw call per glyph with a different combined image sampler every time?)
4. Cache all the glyph bitmaps into one large device-side image (laid out in a big grid, say). Use a single device-side combined image sampler, and push params to describe the subregion that contains the glyph image. One draw call per glyph, updating push params each time.
5. Like 4 but use a single instanced draw call, and rather than push params use instance-varying input attributes.
6. Something else?
I mean like, how do common game engines like Unreal or Unity or Godot etc solve this problem? Is there a typical approach or best practice?
First, some considerations:
Rasterizing a glyph at around 30px with freetype might take on the order of 10μs. This is a very small one-time cost, but rendering e.g. 100 glyphs every frame would seriously eat into your frame budget (if we assume the math is as simple as 100 * 10μs == 1ms).
State changes (like descriptor updates) are relatively expensive. Changing the bound descriptor for each character you render has non-negligible cost. This could be limited by batching character draws (draw all the As, then the Bs, etc), but using push constants is typically the fastest.
Instanced drawing with small meshes (such as quads or single triangles) can be very slow on some GPUs, as they will not schedule multiple instances on a single wavefront/warp. If you're rendering a quad with 6 vertices, and a single execution unit can process 64 vertices, you may end up wasting 58/64 = 90.6% of available vertex shading capacity.
This suggests 4 is your best option (although 5 is likely comparable); you can further optimize that approach by caching the results of the draw calls. Imagine you have some menu text:
The first frame it is needed, render all the text to an intermediate image.
Each frame it is needed, make a single draw call textured with the intermediate image. (You could also blit the text if you don't need transparency.)
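For option 4, the per-glyph data you push is just the subregion of the atlas. A minimal sketch of that calculation, assuming a uniform grid of square cells (the function name and parameters are hypothetical, and a real atlas would likely store per-glyph metrics instead of a fixed cell size):

```python
def glyph_uv_rect(glyph_index, cell_px, atlas_cols, atlas_w_px, atlas_h_px):
    """Return (u0, v0, u1, v1) in normalized texture coordinates for one cell."""
    col = glyph_index % atlas_cols   # column of the cell in the grid
    row = glyph_index // atlas_cols  # row of the cell in the grid
    u0 = col * cell_px / atlas_w_px
    v0 = row * cell_px / atlas_h_px
    u1 = (col + 1) * cell_px / atlas_w_px
    v1 = (row + 1) * cell_px / atlas_h_px
    return (u0, v0, u1, v1)


# 512x512 atlas holding 32x32 cells, 16 cells per row.
print(glyph_uv_rect(17, 32, 16, 512, 512))
```

The four values would be what goes into the push constants (or the instance attributes for option 5); the fragment shader then samples inside that rectangle.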
I have a line art texture applied to an object in 3D space. The default behavior is for the object and the texture to receive perspective scaling based on the perspective model view projection matrix. Is there any established technique to keep the positioning and scaling of the 3D object, while keeping the line width constant relative to the screen? The desired effect is as though a pen (fixed screen width) were used to trace a path on the 3D object.
Would something like SDF-based font rendering help?
Or maybe some kind of projective texture mapping?
Or render the object and texture to a buffer and expand the lines using edge detection?
Unfortunately, I'm using OGL ES 2, so I can't use a geom shader or anything like that.
The solution I came up with is inspired by procedural SDF generation, like @Felipe suggested, combined with Chris Green's Improved Alpha-Tested Magnification for Vector Textures and Special Effects.
Basically I hand draw shapes into textures using pure red, green, and blue. Then I render the scene using those textures, and generate an SDF on the fly in a second render pass. The SDF generation uses Green's algorithm with a small spread to improve performance. The SDF is then passed to a final render pass that thresholds and antialiases the SDF per Green's approach, using fwidth to maintain a constant line weight regardless of the distance of the object to the camera.
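The final thresholding step can be sketched on the CPU like this. It is only an illustration of the idea, not the actual shader: `dist_fwidth` stands in for what GLSL `fwidth(dist)` would return (how much the distance value changes per screen pixel, assumed to be greater than zero), and the function names and the 2-pixel default width are my own assumptions. Note that on OpenGL ES 2, `fwidth` requires the `GL_OES_standard_derivatives` extension.

```python
def smoothstep(edge0, edge1, x):
    """Same shape as the GLSL smoothstep built-in."""
    t = max(0.0, min(1.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def line_alpha(dist, dist_fwidth, line_width_px=2.0):
    """Opacity of a line whose width is constant in screen pixels.

    dist        -- distance to the line centre, in SDF units
    dist_fwidth -- change in dist per screen pixel (must be > 0)
    """
    dist_px = abs(dist) / dist_fwidth  # convert SDF units to screen pixels
    half = line_width_px / 2.0
    # One-pixel-wide soft edge centred on the line boundary.
    return 1.0 - smoothstep(half - 0.5, half + 0.5, dist_px)


# The same on-screen distance gives the same alpha at any zoom level.
print(line_alpha(0.01, 0.01), line_alpha(0.05, 0.05))
```

Dividing by the derivative is what keeps the line weight independent of the object's distance from the camera.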
Since the original question was just for the approach/concept, I'm not posting an example at the moment. But I'll see if I can put together a shadertoy sometime soon.
You could create the texture procedurally in a fragment shader and use the size of a pixel for interpolations.
See:
FabriceNeyret's blog
This question primarily relates to the dimension parameters (width, height, and layers) in the structure VkFramebufferCreateInfo.
The actual question:
In the case that one or more of the VkImageViews, used in creating a VkFramebuffer, has dimensions that are larger than those specified in the VkFramebufferCreateInfo used to create the VkFramebuffer, how does one control which part of that VkImageView is used during a render pass instance?
Alternatively worded question:
I am basically asking in the case that the image is larger (not the same dimensions) than the framebuffer, what defines which part of the image is used (read/write)?
Some Details:
The specification states this is a valid situation (I have seen many people state the attachments used by a framebuffer must match the dimensions of the framebuffer itself, but I can't find support for this in the specification):
Each element of pAttachments must have dimensions at least as large as the corresponding framebuffer dimension.
I want to be clear, that I understand that if I just wanted to draw to part of an image I can use a framebuffer that has the same dimensions as the image, and use viewports and scissors. But scissors and viewports are defined relative to the framebuffer's (0,0) as far as I can tell from the spec, although it is not clear to me.
I'm asking this question to help my understanding of the framebuffer, as I am certain I have misunderstood something. I feel it may well be the case that (x,y) in framebuffer space is always (x,y) in image space (as in, there is no way of controlling which part of the VkImageView is used).
I have been stuck on this for quite some time (~4 days), and have tried both the Vulkan Cookbook and the Vulkan Programming Guide, read most of the specification, and searched online.
If the question needs clarification, please ask. I just didn't want to make it overly long.
Thank you for reading.
There isn't a way to control which part of the image is used by the framebuffer when the framebuffer is smaller than the image. The framebuffer origin always maps to the image origin.
Allowing attachments to be larger than the framebuffer is only meant to allow reusing memory/images/views for several purposes in a frame even when they don't all need the same dimensions. The typical example is reusing a depth buffer (but not its contents) for several different render passes. You could accomplish the same thing with memory aliasing, but engines that have to support multiple APIs might find it easier to do it this way.
The way to control where you render to is by controlling the viewport. That is, you specify a framebuffer size that's actually big enough to cover the total area of the target images that you may want to render to, and use the viewport transform/scissoring to render to a specific area of those images.
There is no post-viewport transformation that goes from framebuffer space to image space. That would be decidedly redundant, since we already have a post-NDC transform. There's no point in having two of them.
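As a rough illustration of that single transform, here is the viewport mapping from NDC to framebuffer coordinates, following the formula in the Vulkan specification (x_f = (width / 2) * x_ndc + (x + width / 2), and likewise for y). The parameter names mirror the `x`, `y`, `width`, and `height` fields of `VkViewport`; the function itself is just a sketch for checking numbers by hand.

```python
def ndc_to_framebuffer(x_ndc, y_ndc, vp_x, vp_y, vp_width, vp_height):
    """Map normalized device coordinates to framebuffer coordinates."""
    x_f = (vp_width / 2) * x_ndc + (vp_x + vp_width / 2)
    y_f = (vp_height / 2) * y_ndc + (vp_y + vp_height / 2)
    return (x_f, y_f)


# A 128x64 viewport placed at (256, 128) inside a larger framebuffer.
print(ndc_to_framebuffer(-1.0, -1.0, 256, 128, 128, 64))  # top-left corner
print(ndc_to_framebuffer(1.0, 1.0, 256, 128, 128, 64))    # bottom-right corner
```

Because framebuffer (x, y) is image (x, y), the values this returns are also the texel coordinates in the attachment, so choosing the viewport offset is how you pick the region of the image.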
Sure, VkRenderPassBeginInfo has the renderArea object, but that is more of a promise from the user rather than a guarantee for the system:
The application must ensure (using scissor if necessary) that all rendering is contained within the render area, otherwise the pixels outside of the render area become undefined and shader side effects may occur for fragments outside the render area.
So basically, the implementation doesn't do anything with renderArea. It doesn't set up a transformation or anything; you're just promising that no framebuffer pixels outside of that area will be impacted.
In any case, there's really little point to providing a framebuffer size that's smaller than the image sizes. That sort of thing is more the purview of the renderArea than the framebuffer specification.
When learning to program simple 2D games, each object would have a sprite sheet with little pictures of how a player would look in every frame/animation. 3D models don't seem to work this way or we would need one image for every possible view of the object!
For example, a rotating cube would need a lot of images depicting how it would look on every single side. So my question is, how are 3D model "images" represented and rendered by the engine when viewed from arbitrary perspectives?
Multiple methods
There are a number of methods for rendering and storing 3D graphics and models. There are even different methods for rendering 2D graphics! In addition to 2D bitmaps, you also have SVG. SVG uses numbers to define points in an image. These points make shapes. The points can also define curves. This allows you to make images without the need for pixels. The result can be smaller file sizes, in addition to the ability to transform the image (scale and rotate) without causing distortion. Most 3D graphics use a similar technique, except in 3D. What these methods have in common, however, is that they all ultimately render the data to a 2D grid of pixels.
Projection
The most common method for rendering 3D models is projection. All of the shapes to be rendered are broken down into triangles before rendering. Why triangles? Because triangles are guaranteed to be coplanar. That saves a lot of work for the renderer since it doesn't have to worry about "coloring outside of the lines". One drawback to this is that most 3D graphics projection technologies don't support perfect spheres or other round surfaces. You have to use approximations and other tricks to make round surfaces (although there are some renderers which support round surfaces). The next step is to convert or project all of the 3D points into 2D points on the screen (as seen below).
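The projection of a single point can be sketched in a few lines. This assumes a simple pinhole camera at the origin looking down +z, with the point already in camera space and z greater than zero; the function name and the `focal_length` parameter are illustrative, and real engines do this with 4x4 matrices instead.

```python
def project_point(point, focal_length, screen_w, screen_h):
    """Project a camera-space point (x, y, z), z > 0, to screen pixels."""
    x, y, z = point
    # Perspective divide: the farther away a point is, the closer it
    # lands to the centre of the screen.
    px = focal_length * x / z
    py = focal_length * y / z
    # Move the origin to the screen centre; screen y grows downward.
    return (screen_w / 2 + px, screen_h / 2 - py)


print(project_point((1, 1, 2), 100, 640, 480))
print(project_point((1, 1, 4), 100, 640, 480))  # same x, y but farther away
```

Projecting the three corners of a triangle this way gives the 2D triangle that then gets filled in.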
From there, you essentially "color in" the triangles to make everything look solid. While this is pretty fast, another downside is that you can't really have things like reflections and refractions. Anytime you see a refractive or reflective surface in a game, they are only using trickery to make it look like a reflective or refractive material. The same goes for lighting and shading.
Here is an example of special coloring being used to make a sphere approximation look smooth. Notice that you can still see straight lines around the smoothed version:
Ray tracing
You also can render polygons using ray tracing. With this method, you basically trace the paths that the light takes to reach the camera. This allows you to make realistic reflections and refractions. However, I won't go into detail since it is too slow to realistically use in games currently. It is mainly used for 3D animations (like what Pixar makes). Simple scenes with low quality settings can be ray traced pretty quickly. But with complicated, realistic scenes, rendering can take several hours for a single frame (as is the case with Pixar movies). However, it does produce ultra realistic images:
Ray casting
Ray casting is not to be confused with the above-mentioned ray tracing. Ray casting does not trace the light paths. That means that you only have flat surfaces; not reflective. It also does not produce realistic light. However, this can be done relatively quickly, since in most cases you don't even need to cast a ray for every pixel. This is the method that was used for early games such as Doom and Wolfenstein 3D. In early games, ray casting was used for the maps, and the characters and other items were rendered using 2D sprites that were always facing the camera. The sprites were drawn from a few different angles to make them look 3D. Here is an image of Wolfenstein 3D:
Castle Wolfenstein with JavaScript and HTML5 Canvas: Image by Martin Kliehm
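A very small sketch of the idea, assuming a grid map fully enclosed by walls and a starting point inside it. It marches along the ray in fixed small steps, which is simpler but less efficient than the grid-stepping used in real implementations; the map, names, and step size are all made up for illustration.

```python
import math

WORLD = [
    "#####",
    "#...#",
    "#...#",
    "#####",
]


def cast_ray(x, y, angle, step=0.01, max_dist=20.0):
    """March from (x, y) along angle until a wall cell '#' is hit.

    Returns the approximate distance travelled (accurate to about one step).
    """
    dx, dy = math.cos(angle), math.sin(angle)
    dist = 0.0
    while dist < max_dist:
        cell_x, cell_y = int(x + dx * dist), int(y + dy * dist)
        if WORLD[cell_y][cell_x] == "#":
            return dist
        dist += step
    return max_dist


def wall_column_height(dist, screen_h=200):
    """Closer walls are drawn as taller columns."""
    return screen_h if dist <= 0 else min(screen_h, int(screen_h / dist))


d = cast_ray(1.5, 1.5, 0.0)  # look along +x from the middle of a cell
print(round(d, 1), wall_column_height(d))
```

One ray per screen column is enough here, because each ray decides the height of a whole vertical strip of wall.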
Storing the data
3D data can be stored using multiple methods. It is not necessarily dependent on the rendering method that is used. The stored data doesn't mean anything by itself, so you have to render it using one of the methods that have already been mentioned.
Polygons
This is similar to SVG. It is also the most common method for storing model data. You define the geometry using 3D points. These points can have other properties, such as texture data (in the form of UV mapping), color data, and whatever else you might want.
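As a rough picture of what such data looks like in memory, here is a single textured quad stored as shared vertices plus triangles that index into them. The dictionary layout and key names are just one hypothetical arrangement, not any specific file format.

```python
# Four shared vertices, each with a 3D position and a UV texture coordinate.
mesh = {
    "positions": [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    "uvs":       [(0, 0),    (1, 0),    (1, 1),    (0, 1)],
    # Each triangle is three indices into the vertex lists above.
    "triangles": [(0, 1, 2), (0, 2, 3)],
}


def triangle_corners(mesh, tri_index):
    """Look up the 3D positions of one triangle's corners."""
    return [mesh["positions"][i] for i in mesh["triangles"][tri_index]]


print(triangle_corners(mesh, 1))
```

Indexing lets neighbouring triangles share vertices instead of repeating them.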
The data can be stored using a number of file formats. A common file format that is used is COLLADA, which is an XML file that stores the 3D data. There are a lot of other formats though. Fundamentally, however, all file formats are still storing the 3D data.
Here is an example of a polygon model:
Voxels
This method is pretty simple. You can think of voxel models like bitmaps, except they are a bunch of bitmaps layered together to make 3D bitmaps. So you have a 3D grid of pixels. One way of rendering voxels is converting the voxel points to 3D cubes. Note that voxels do not have to be rendered as cubes, however. Like pixels, they are only points that may have color data which can be interpreted in different ways. I won't go into much detail since this isn't too common and you generally render the voxels with polygon methods (like when you render them as cubes). Here is an example of a voxel model:
Image by Wikipedia user Vossman
In the 2D world with sprite sheets, you are drawing one of the sprites depending on the state of the actor (visual representation of your object). In the 3D world you are rendering a model for your actor that is a series of polygons with a texture mapped to it. There are standardized model files (I am mostly familiar with Autodesk 3DS Max), in which the model and the assigned textures can be packaged together (a .3DS or .MAX file), providing everything your graphics library needs to render the object and its textures.
In a nutshell, you don't use images for each view of a 3D object, you have a model with a texture rendered on it, creating a dynamic view as it is rendered by the graphics library.
I'm searching for methods of text recognition based on document borders,
or methods that can solve the problem of finding a new viewpoint.
For example, the camera is at point (x1,y1,z1) and the resulting picture has perspective distortion, but we can find a camera position (x2,y2,z2) that gives a corrected picture.
Thanks.
The usual approach, which assumes that the document's page is approximately flat in 3D space, is to warp the quadrangle encompassing the page into a rectangle. To do so you must estimate a homography, i.e. a (linear) projective transformation between the original image and its warped counterpart.
The estimation requires matching points (or lines) between the two images, and a common choice for documents is to map the page corners in the original images to the image corners of the warped image. This will in general produce a rectangle with an incorrect aspect ratio (i.e. the warped page will look "wider" or "taller" than the real one), but this can be easily corrected if you happen to know in advance what the real aspect ratio is (for example, because you know the type of paper used, whether letter, A4, etc.).
A simple algorithm to perform the estimation is the so-called Direct Linear Transformation.
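For exactly four corner correspondences the estimation reduces to an 8x8 linear system. The sketch below is a simplified variant that fixes the bottom-right entry of the homography to 1 and solves with plain Gaussian elimination, rather than the SVD-based formulation usually presented as the DLT. It assumes no three of the points are collinear, and the function names and sample coordinates are made up for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two equations per correspondence, from u = (h0 x + h1 y + h2) / w
        # and v = (h3 x + h4 y + h5) / w with w = h6 x + h7 y + 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the homography to one point, including the projective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Page corners found in the photo -> corners of an A4-proportioned rectangle.
corners = [(10, 20), (200, 30), (220, 300), (5, 280)]
target = [(0, 0), (210, 0), (210, 297), (0, 297)]
H = estimate_homography(corners, target)
print(warp_point(H, 10, 20))
```

Choosing the target rectangle with the known paper proportions is what fixes the aspect ratio problem mentioned above.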
The OpenCV library contains routines that help accomplish all of these tasks; look into it.