std::vector<int>: second resize causes a crash. When I resize and try to write to the new space, it fails after the second resize. Why?

I detect whether the index I want to write is outside the capacity; if so, I resize the vector so that it is
large enough to accommodate the new index. I then immediately call myVector.at( iIx ) = newValue.
Do I have to do something to reset the vector? It works after the first resize,
but not after the second. I expect a venerable standard to work off the shelf, instead of requiring days of research. Years ago I wrote my own resizable array; I guess I will go back to that code soon.
Here is the code, which works correctly for the first resize:
if ( iIx >= iCapacity ) { // need to resize
    iAddNeeded = miMallocSize;
    while ( iIx >= iCapacity + iAddNeeded ) // increases the add by miMallocSize until its enough
        iAddNeeded += miMallocSize;
    if ( iCapacity + iAddNeeded + miMallocSize >= iSysMax )
        iNewSize = iSysMax;
    else
        iNewSize = iCapacity + iAddNeeded + miMallocSize; // at least miMallocSize extra, no more than 2 miMallocSize extra
    resize( iNewSize, 0 ); // this reallocs, AND marks the space with zeros
    iCapacity = capacity(); // better be large enough now 221107
} // else { // if ( iIx < iCapacity ) {
if ( iIx >= iCapacity )
    Hcx( this, DL5, "T ERROR resize FAIL iCapacity %d index %d", iCapacity, iIx );
else {
    at( iIx ) = newElement;
    iTest = at( iIx );
} // else {
I expect the vector object to function properly after it is resized. It returns the expected capacity, but when I write to the new space it crashes.

I found out that the behavior of std::vector is not what I expected. Capacity is not what you might think. capacity() reports how much storage has been allocated, but allocation alone is not enough to make that space writable: only indices below size() refer to elements that exist, and at() throws std::out_of_range for any index at or beyond size().
If you are manually coding a dynamic array of ints, plain old allocation IS enough to make space writable. You malloc some space and write on it. With std::vector the new space also has to become part of the vector's size, which is analogous to construction. But ints do NOT need construction. I think the design is weak, at least for simple types like int. Assuming you are growing, resize( n ) reallocates the data if necessary and initializes every element up to index n - 1, so size() becomes exactly n. The capacity, however, may exceed what you asked for. The caller does not get to specify the resulting capacity; the vector decides how much to allocate. capacity() tells you how much is allocated, BUT YOU CAN'T NECESSARILY WRITE ON IT. In my code, the check should have compared the index against size(), not capacity().
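The size/capacity distinction described above can be illustrated with a short sketch. It assumes the goal is simply "write a value at an arbitrary index, growing when needed"; the helper name setAt is hypothetical and not part of the original code. The growth check uses size() rather than capacity(), because only indices below size() are valid for at() and operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: store `value` at `index`, growing the vector if needed.
// The decision to grow is based on size(), not capacity().
void setAt(std::vector<int>& v, std::size_t index, int value) {
    if (index >= v.size()) {
        // resize() makes size() exactly index + 1 and fills new elements with 0.
        // capacity() may end up larger; that extra storage holds no elements.
        v.resize(index + 1, 0);
    }
    assert(index < v.size());
    v[index] = value;
}
```

With a check against capacity() instead, an index that lies between size() and capacity() skips the resize and at() then throws std::out_of_range. That typically shows up on a later resize, once the vector has allocated more than was requested.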

Efficient way of Square of a Sorted Array

I am solving a LeetCode problem. The question is:
Given an integer array nums sorted in non-decreasing order, return an array of the squares of each number sorted in non-decreasing order.
Example 1:
Input: nums = [-4,-1,0,3,10]
Output: [0,1,9,16,100]
Explanation: After squaring, the array becomes [16,1,0,9,100].
After sorting, it becomes [0,1,9,16,100].
Example 2:
Input: nums = [-7,-3,2,3,11]
Output: [4,9,9,49,121]
I solved this with map, then used sorted() for sorting, and lastly converted the result with toIntArray().
My solution:
class Solution {
    fun sortedSquares(nums: IntArray): IntArray {
        return nums.map { it * it }.sorted().toIntArray()
    }
}
Afterwards, while looking through the Discuss section, I found this solution:
class Solution {
    fun sortedSquares(A: IntArray): IntArray {
        // Create markers to use to navigate inward since we know that
        // the polar ends are (possibly, but not always) the largest
        var leftMarker = 0
        var rightMarker = A.size - 1
        // Create a marker to track insertions into the new array
        var resultIndex = A.size - 1
        val result = IntArray(A.size)
        // Iterate over the items until the markers reach each other.
        // It's likely a little faster to consider the case where the left
        // marker is no longer producing elements that are less than zero.
        while (leftMarker <= rightMarker) {
            // Grab the absolute values of the elements at the respective
            // markers so they can be compared and inserted into the right
            // index.
            val left = Math.abs(A[leftMarker])
            val right = Math.abs(A[rightMarker])
            // Do checks to decide which item to insert next.
            result[resultIndex] = if (right > left) {
                rightMarker--
                right * right
            } else {
                leftMarker++
                left * left
            }
            // Once the item is inserted we can update the index we want
            // to insert at next.
            resultIndex--
        }
        return result
    }
}
The author also mentions in the title: Kotlin -- O(n), 95% time, 100% space
So is my solution equal to the other solution in time and space complexity? Or is there an even better solution?
"So is my solution equal to the other solution in time and space complexity?"
No, your solution runs in O(n log n) time, as it relies on sorted(), which likely runs in O(n log n). Since the alternative solution does not sort the items, it runs in O(n) time. Both solutions use O(n) space, although your solution uses about three times as much space (each of map, sorted and toIntArray creates a copy of the input).

Why am I getting such a large alignment memory requirement for an image?

I create an image in Vulkan and I get an alignment requirement in the memory requirements of 131072. This seems like an enormous alignment and I'm not sure why anything bigger than 128 or 256 may be needed. It's so big that my memory allocation algorithm can't even handle it, and will never be able to practically handle it given that each allocation of this strict an alignment will waste too much space. What's the deal behind this? Here is how I create the image:
VkImageCreateInfo image_create_info{};
image_create_info.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
image_create_info.imageType = VK_IMAGE_TYPE_2D;
image_create_info.pNext = nullptr;
image_create_info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
image_create_info.samples = VkSampleCountFlagBits::VK_SAMPLE_COUNT_1_BIT;
image_create_info.queueFamilyIndexCount = 0;
image_create_info.extent.width = 1716;
image_create_info.extent.height = 1731;
image_create_info.extent.depth = 1;
image_create_info.usage = VkImageUsageFlagBits::VK_IMAGE_USAGE_SAMPLED_BIT;
image_create_info.tiling = VkImageTiling::VK_IMAGE_TILING_OPTIMAL;
image_create_info.initialLayout = VkImageLayout::VK_IMAGE_LAYOUT_UNDEFINED;
image_create_info.flags = 0;
image_create_info.mipLevels = 1;
image_create_info.format = VK_FORMAT_R8G8B8A8_UINT;
image_create_info.arrayLayers = 1;
VkImage vk_image;
VkResult result = vkCreateImage((VkDevice)VK::logicalDevice, &image_create_info, nullptr, &vk_image);
VkMemoryRequirements requirements;
vkGetImageMemoryRequirements(VK::logicalDevice, vk_image, &requirements);
Another interesting thing about the requirements returned by the function is that the memory size requirement for format VK_FORMAT_R8G8B8A8_UINT is about 12 MB, which makes sense, but with a format of VK_FORMAT_R8G8B8_UINT (so without the alpha channel), it gives a size requirement of only 3 MB, about a quarter of the size. Have I run into some sort of bug?
I know the dimensions of the image I created aren't power of two, but surely this shouldn't lead to such strange behaviour, should it?
"It's so big that my memory allocation algorithm can't even handle it and will never be able to practically handle it given that each allocation of this strict an alignment will waste too much space."
Then fix that.
Implementations are allowed to require all kinds of alignments, especially for optimally-tiled images. 128KiB alignment is hardly unreasonable for images. So your sub-allocator needs to be able to account for this.
As for "waste too much space," perhaps you should take another look at those numbers. The example texture must take up at least 11'881'584 bytes. 128KiB is slightly more than 1% of that storage. That's not a lot of waste.
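The arithmetic above, and the rounding a sub-allocator has to do, can be sketched in C++ as follows. alignUp and paddingFor are hypothetical helper names, not Vulkan API. The sketch assumes the alignment is a nonzero power of two, which the Vulkan specification guarantees for the alignment member of VkMemoryRequirements.

```cpp
#include <cassert>
#include <cstdint>

// Round `offset` up to the next multiple of `alignment`.
// Assumption: `alignment` is a nonzero power of two.
std::uint64_t alignUp(std::uint64_t offset, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Bytes skipped when a sub-allocation that would start at `offset`
// is moved to the next aligned offset. Always less than `alignment`.
std::uint64_t paddingFor(std::uint64_t offset, std::uint64_t alignment) {
    return alignUp(offset, alignment) - offset;
}
```

For the image in the question, 1716 * 1731 * 4 = 11,881,584 bytes, and the worst-case padding for a 131072-byte alignment is 131071 bytes, which is about 1.1% of the image size.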

why are objects clipping behind each other?

I'm making a script that sorts the depth for my objects by prioritizing the y variable, but then afterwards checks to see if the objects that are touching each other have a higher depth the further to the right they are, but for some reason the last part isn't working.
Here's the code:
ds_grid_sort(_dg,1,true);
_yy = 0;
repeat _inst_num
{
    _inst = _dg[# 0, _yy];
    with _inst
    {
        with other
        {
            if (x > _inst.x and y = _inst.y)
            {
                _inst.depth = depth + building_space;
            }
        }
    }
    _yy++;
}
I've identified that the problem is that nothing comes out as true when the game checks the y = _inst.y part of the _inst statement, but that doesn't make any sense seeing how they're all at the same y coordinate. Could someone please tell me what I'm doing wrong?
As Steven mentioned, it's good practice to use double equal signs for comparisons (y == _inst.y) and a single equals sign for assignments (_yy = 0;), but GML doesn't care if you use a single equals sign for comparison, so it won't be causing your issue. Though it does matter in pretty much every other language besides GML.
From what I understand, the issue seems to be your use of other. When you use the code with other, it doesn't iterate through all other objects, it only grabs one instance. You can test this by running this code and seeing how many debug messages it shows:
...
with other
{
    show_debug_message("X: "+string(x)+"; Y: "+string(y));
...
You could use with all, which iterates through all instances, or with object, where object is either an object or a parent object; that iterates through all instances of that object. However, neither of these statements checks whether the objects overlap (it's just going to iterate over all of them), so you'll have to check for collisions. You could do something like this:
...
with all
{
    if place_meeting(x, y, other)
    {
        if (x > _inst.x and y = _inst.y)
        {
            _inst.depth = depth + building_space;
        }
    }
...
I don't know what the rest of your code looks like, but there might be an easier way to achieve your goal. Is it possible to initially set the depth based on both the x and y variables? Something such as depth = -x-y;? For people not as familiar with GameMaker, objects with a smaller depth value are drawn above objects with higher depth values; that is why I propose setting the depth to be -x-y. Below is what a view of that grid would look like (first row and column are x and y variables; the other numbers would be the depth of an object at that position):
Having one equation that everything operates on will also make it so that if you have anything moving (such as a player), you can easily and efficiently update their depth to be able to display them correctly relative to all the other objects.
I think it should be y == _inst.y.
But I'm not sure, as GML tends to accept such formatting.
It's better practice to use == to check for equality in conditions.

CUDA: while loop index correctness

This kernel does the right thing and gives me the correct result. My problem concerns the correctness of the while loop when I try to improve performance. I tried several configurations of blocks and threads, but as soon as I change them, the while loop no longer gives me the correct result.
When I change the kernel configuration, firstArray and secondArray are not filled completely (some cells contain 0). Both arrays must be filled with the curValue obtained from the if statement.
Any advice is welcome :)
Thank you in advance.
#define N 65536
__global__ void whileLoop(int* firstArray_device, int* secondArray_device)
{
    int curValue = 0;
    int curIndex = 1;
    int i = (threadIdx.x)+2;
    while(i < N) {
        if (i % curIndex == 0) {
            curValue = curValue + curIndex;
            curIndex *= 2;
        }
        firstArray_device[i] = curValue;
        secondArray_device[i] = curValue;
        i += blockDim.x * gridDim.x;
    }
}
int main(){
    firstArray_host[0] = 0;
    firstArray_host[1] = 1;
    secondArray_host[0] = 0;
    secondArray_host[1] = 1;
    // memory allocation + copy on GPU
    // definition number of blocks and threads
    dim3 dimBlock(1, 1);
    dim3 dimGrid(1, 1);
    whileLoop<<<dimGrid, dimBlock>>>(firstArray_device, secondArray_device);
    // copy back to CPU + free memory
}
You have a data dependency here that prevents meaningful optimization. The variables curValue and curIndex are changed within the while loop and are carried forward into the next iteration. As soon as you try to optimize the loop, you will find yourself in a situation where these variables have different states and the result changes.
I do not really know what you are trying to achieve, but try to make each iteration of the while loop independent of the values from a former iteration to avoid the dependencies. Try to separate the data into threads and data chunks in such a way that the indices and values are calculated from environment state such as threadIdx, blockDim, gridDim...
Also try to avoid conditional loops. It is better to use for loops with a constant number of iterations; this is also easier to optimize.
A few things:
1. You left out the code you used to declare your global arrays on the device. It would be helpful to have this info.
2. Your algorithm is not thread-safe when multiple blocks are used. In other words, if you are running multiple blocks, not only would they be doing redundant work (thus giving you no gains), but they would also likely at some point try to write to the same global memory locations, creating errors.
3. Your code is thus correct when only one block is used, but this makes it rather pointless: you're running a serial, or lightly threaded, operation on a parallel device. You cannot run on all your available resources (multiple blocks on multiple SMs) without memory conflicts (see below).
Currently there are two main issues with this code from a parallel standpoint:
1. int i = (threadIdx.x)+2; yields a starting index of 2 for a single thread; 2 and 3 for two threads in a single block, and so on. I doubt this is what you want, as the first two positions (0, 1) are never getting addressed. (Remember, arrays start at index 0 in C.)
2. Further, if you include multiple blocks (say 2 blocks each with one thread) then you would have duplicate indices (e.g. for 2 b x 1 t --> indices b1t1: 2, b2t1: 2), which, when you use the index to write to global memory, would create conflicts and errors. Doing something like int i = threadIdx.x + blockDim.x * blockIdx.x; would be the typical way to correctly calculate your indices so as to avoid this issue.
Your final expression i += blockDim.x * gridDim.x; is okay, because it adds a number equivalent to the total number of threads to i and thus does not create additional clashing or overlap.
Why use the GPU to shuffle memory and do a trivial computation? You may not see much speedup versus a fast CPU when you factor in the time to take your arrays onto and off of the device.
Work on problems 1 and 2 if you wish, but beyond that consider your overall goal and exactly what kind of algorithm you are trying to optimize, and come up with a more parallel-friendly solution -- or consider whether GPU computing really makes sense for your problem.
To parallelize this algorithm, you need to come up with a formula that can directly calculate the value for a given index in the array. So, pick a random index within the range of the array, then consider which factors determine the value at that location. After finding a formula, test it by comparing its output for random indexes with the values calculated by your serial algorithm. When that is correct, create a kernel that starts out by selecting a unique index based on its thread and block indexes. Then calculate the value for that index and store it in the corresponding index in the array.
A trivial example:
Serial:
__global__ void serial(int* array)
{
    int j(0);
    for (int i(0); i < 1024; ++i) {
        array[i] = j;
        j += 5;
    }
}
int main() {
    dim3 dimBlock(1);
    dim3 dimGrid(1);
    serial<<<dimGrid, dimBlock>>>(array); // array: device pointer, allocation omitted
}
Parallel:
__global__ void parallel(int* array)
{
    int i(threadIdx.x + blockDim.x * blockIdx.x);
    int j(i * 5);
    array[i] = j;
}
int main(){
    dim3 dimBlock(256);
    dim3 dimGrid(1024 / 256);
    parallel<<<dimGrid, dimBlock>>>(array);
}
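The "find a formula, then test it against the serial algorithm" step can be sketched on the host in plain C++ for the loop in the question. The closed form in valueAt was derived here by tracing the loop with a single thread and is not stated in the question; serialFill and valueAt are hypothetical names. Tracing shows that curIndex doubles exactly when i reaches the next power of two, so for i >= 2 the stored value is the largest power of two not exceeding i, minus 1.

```cpp
#include <cassert>
#include <vector>

// Reference: the loop from the question as it behaves with one thread.
// Elements 0 and 1 are preset to 0 and 1, as the question does on the host.
std::vector<int> serialFill(int n) {
    std::vector<int> out(n, 0);
    if (n > 1) out[1] = 1;
    int curValue = 0;
    int curIndex = 1;
    for (int i = 2; i < n; ++i) {
        if (i % curIndex == 0) {
            curValue += curIndex;
            curIndex *= 2;
        }
        out[i] = curValue;
    }
    return out;
}

// Closed form (an assumption derived from the trace, valid for i >= 2):
// (largest power of two <= i) - 1. It depends only on i, not on earlier
// iterations, so every index can be computed independently.
int valueAt(int i) {
    assert(i >= 2);
    int p = 1;
    while (p <= i / 2) p *= 2;  // p ends as the largest power of two <= i
    return p - 1;
}
```

In a kernel, each thread would compute its own index from threadIdx, blockDim and blockIdx and store valueAt(i) for that index, which removes the dependency on curValue and curIndex from earlier iterations.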

Getting any point along an NSBezier path

For a program I'm writing, I need to be able to trace a virtual line (that is not straight) that an object must travel along. I was thinking of using NSBezierPath to draw the line, but I cannot find a way to get an arbitrary point along the line, which I need in order to move the object along it.
Can anyone suggest a way to find a point along an NSBezierPath? If that's not possible, can anyone suggest another method to do the above?
EDIT: The below code is still accurate, but there are much faster ways to calculate it. See Introduction to Fast Bezier and Even Faster Bezier.
There are two ways to approach this. If you just need to move something along the line, use a CAKeyframeAnimation. This is pretty straightforward and you never need to calculate the points.
If on the other hand you actually need to know the point for some reason, you have to calculate the Bézier yourself. For an example, you can pull the sample code for Chapter 18 from iOS 5 Programming Pushing the Limits. (It is written for iOS, but it applies equally to Mac.) Look in CurvyTextView.m.
Given control points P0_ through P3_, and an offset between 0 and 1 (see below), pointForOffset: will give you the point along the path:
static double Bezier(double t, double P0, double P1, double P2,
                     double P3) {
    return
        pow(1-t, 3) * P0
        + 3 * pow(1-t, 2) * t * P1
        + 3 * (1-t) * pow(t, 2) * P2
        + pow(t, 3) * P3;
}
- (CGPoint)pointForOffset:(double)t {
    double x = Bezier(t, P0_.x, P1_.x, P2_.x, P3_.x);
    double y = Bezier(t, P0_.y, P1_.y, P2_.y, P3_.y);
    return CGPointMake(x, y);
}
NOTE: This code violates one of my cardinal rules of always using accessors rather than accessing ivars directly. That's because it's called many thousands of times, and eliminating the method call has a significant performance impact.
"Offset" is not a trivial thing to work out. It does not proceed linearly along the curve. If you need evenly spaced points along the curve, you'll need to calculate the correct offset for each point. This is done with this routine:
// Simplistic routine to find the offset along Bezier that is
// aDistance away from aPoint. anOffset is the offset used to
// generate aPoint, and saves us the trouble of recalculating it.
// This routine just walks forward until it finds a point at least
// aDistance away. Good optimizations here would reduce the number
// of guesses, but this is tricky since if we go too far out, the
// curve might loop back on itself, leading to incorrect results.
// Tuning kStep is a good start.
- (double)offsetAtDistance:(double)aDistance
                 fromPoint:(CGPoint)aPoint
                    offset:(double)anOffset {
    const double kStep = 0.001; // 0.0001 - 0.001 work well
    double newDistance = 0;
    double newOffset = anOffset + kStep;
    while (newDistance <= aDistance && newOffset < 1.0) {
        newOffset += kStep;
        newDistance = Distance(aPoint,
                               [self pointForOffset:newOffset]);
    }
    return newOffset;
}
I leave Distance() as an exercise for the reader, but it's in the example code of course.
The referenced code also provides BezierPrime() and angleForOffset: if you need those. Chapter 18 of iOS:PTL covers this in more detail as part of a discussion on how to draw text along an arbitrary path.
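For readers without the sample code at hand, the same math can be sketched in portable C++ rather than Objective-C. The Point struct is a hypothetical stand-in for CGPoint, and pointDistance is only a guess at what a Distance() helper would do (plain Euclidean distance); neither is taken from the referenced sample code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for CGPoint.
struct Point {
    double x;
    double y;
};

// Cubic Bezier polynomial for one coordinate, same formula as Bezier() above.
double bezier(double t, double p0, double p1, double p2, double p3) {
    const double u = 1.0 - t;
    return u * u * u * p0
         + 3.0 * u * u * t * p1
         + 3.0 * u * t * t * p2
         + t * t * t * p3;
}

// Point on the curve at offset t in [0, 1], given the four control points.
Point pointForOffset(double t, Point p0, Point p1, Point p2, Point p3) {
    assert(t >= 0.0 && t <= 1.0);
    return Point{bezier(t, p0.x, p1.x, p2.x, p3.x),
                 bezier(t, p0.y, p1.y, p2.y, p3.y)};
}

// Straight-line (Euclidean) distance between two points.
double pointDistance(Point a, Point b) {
    return std::hypot(b.x - a.x, b.y - a.y);
}
```

Offset 0 returns the first control point and offset 1 returns the last one. Offsets in between are not proportional to distance travelled, which is why a search such as offsetAtDistance:fromPoint:offset: is needed for evenly spaced points.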