I am currently facing a performance issue with YOLOv3 implemented in an Objective-C/C++ Xcode project for macOS: inference is too slow. I do not have much experience with macOS and Xcode, so I followed this tutorial. The execution time is around ~0.25 seconds per frame.
Setup:
I run it on a MacBook Pro with an Intel Core i5 3.1 GHz and an Intel Iris Plus Graphics 650 (1536 MB), and the performance is around 4 fps. That's understandable: the GPU is not a powerful one and the work runs mostly on the CPU. Actually, it is impressive, because it is faster than a PyTorch implementation running on the CPU. However, I also ran this example on a MacBook Pro with an Intel i7 2.7 GHz and an AMD Radeon Pro 460, and the performance is only 6 fps.
According to this website, the performance should be much better. Can you please let me know where I am making a mistake, or is this the best performance I can get with this setup? Please note that I've checked the system monitor and the GPU is fully used in both cases.
This is my initialisation:
// loading the model
MLModel *model_ml = [[[YOLOv3 alloc] init] model];
float confidenceThreshold = 0.8;
NSMutableArray<Prediction*> *predictions = [[NSMutableArray alloc] init];
VNCoreMLModel *model = [VNCoreMLModel modelForMLModel:model_ml error:nil];
VNCoreMLRequest *request = [[VNCoreMLRequest alloc] initWithModel:model completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error){
    for (VNRecognizedObjectObservation *observation in request.results) {
        if (observation.confidence > confidenceThreshold) {
            CGRect rect = observation.boundingBox;
            Prediction *prediction = [[Prediction alloc] initWithValues:0 Confidence:observation.confidence BBox:rect];
            [predictions addObject:prediction];
        }
    }
}];
request.imageCropAndScaleOption = VNImageCropAndScaleOptionScaleFill;
float ratio = height/CGFloat(width);
And my loop implementation:
cv::Mat frame;
int i = 0;
while (1) {
    cap >> frame;
    if (frame.empty()) {
        break;
    }
    image = CGImageFromCVMat(frame.clone());
    VNImageRequestHandler *imageHandler = [[VNImageRequestHandler alloc] initWithCGImage:image options:@{}];
    NSDate *methodStart = [NSDate date]; // measuring performance from here
    NSError *error = nil;
    [imageHandler performRequests:@[request] error:&error]; // run the request
    if (error) {
        NSLog(@"%@", error.localizedDescription);
    }
    NSDate *methodFinish = [NSDate date];
    NSTimeInterval executionTime = [methodFinish timeIntervalSinceDate:methodStart]; // get execution time

    // draw bounding boxes
    for (Prediction *prediction in predictions) {
        CGRect rect = [prediction getBBox];
        cv::rectangle(frame,
                      cv::Point(rect.origin.x * width, (1 - rect.origin.y) * height),
                      cv::Point((rect.origin.x + rect.size.width) * width, (1 - (rect.origin.y + rect.size.height)) * height),
                      cv::Scalar(0, 255, 0), 1, 8, 0);
    }
    std::cout << "Execution time " << executionTime << " sec" << " Frame id: " << i << " with size " << frame.size() << std::endl;
    i++;
    [predictions removeAllObjects];
}
cap.release();
Thank you.
Set a breakpoint on the line that calls [imageHandler performRequests] and run the app with optimizations disabled. Use the "Step Into" button in the debugger a number of times and look in the stack trace for "Espresso".
Does the stack trace show something like Espresso::BNNSEngine? Then the model runs on the CPU, not the GPU.
Does it show something like Espresso::MPSEngine? Then you're running on the GPU.
My guess is that Core ML runs your model on the CPU, not on the GPU.
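If the stack trace does point at the CPU engine, one thing worth trying is to ask Core ML explicitly for all compute units when loading the model. This is only a sketch, assuming macOS 10.14 or later and that the generated YOLOv3 class exposes the standard initWithConfiguration:error: initializer (it should for models compiled with a recent Xcode):

// Sketch: request CPU and GPU (and Neural Engine where available) explicitly.
// MLModelConfiguration and MLComputeUnitsAll require macOS 10.14+.
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsAll;
NSError *modelError = nil;
YOLOv3 *yolo = [[YOLOv3 alloc] initWithConfiguration:config error:&modelError];
MLModel *model_ml = yolo.model;

You can then pass model_ml to VNCoreMLModel exactly as in the code above; whether Vision actually picks the GPU still depends on the layers in the model.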
Related
I'm trying to build a rolling-marble type game. I've decided to convert from Cocos3D to SceneKit, so I probably have some primitive questions about code snippets.
Here is my CMMotionManager setup. The problem is that as I change my device orientation, the gravity direction also changes (it does not properly adjust to the device orientation). This code only works with the Landscape Left orientation.
-(void) setupMotionManager
{
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    motionManager = [[CMMotionManager alloc] init];
    [motionManager startAccelerometerUpdatesToQueue:queue withHandler:^(CMAccelerometerData *accelerometerData, NSError *error)
    {
        CMAcceleration acceleration = [accelerometerData acceleration];
        float accelX = 9.8 * acceleration.y;
        float accelY = -9.8 * acceleration.x;
        float accelZ = 9.8 * acceleration.z;
        scene.physicsWorld.gravity = SCNVector3Make(accelX, accelY, accelZ);
    }];
}
This code came from a marble demo from Apple; I translated it from Swift to Objective-C.
If I want it to work in Landscape Right, I need to change the last line to
scene.physicsWorld.gravity = SCNVector3Make(-accelX, -accelY, accelZ);
This brings up another question: if Y is up in SceneKit, why is it the accelZ variable that needs no change? So my question is: how do CMMotionManager coordinates relate to scene coordinates?
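Putting the two cases above together, the mapping the question describes can be written as a single orientation-aware helper. This is only a sketch: the two landscape mappings come straight from the code above, the portrait fallback is a placeholder to verify, and the raw accelerometer axes are fixed to the device (x to the right, y toward the top of the screen in portrait, z out of the screen), which is why the mapping changes when the interface rotates.

// Sketch (not from the Apple demo): choose the axis mapping per orientation.
static SCNVector3 GravityForOrientation(CMAcceleration a, UIInterfaceOrientation o)
{
    switch (o) {
        case UIInterfaceOrientationLandscapeLeft:
            return SCNVector3Make( 9.8 * a.y, -9.8 * a.x, 9.8 * a.z); // mapping used above
        case UIInterfaceOrientationLandscapeRight:
            return SCNVector3Make(-9.8 * a.y,  9.8 * a.x, 9.8 * a.z); // signs flipped, as above
        default:
            // Portrait cases are an assumption here; adjust to your scene setup.
            return SCNVector3Make( 9.8 * a.x,  9.8 * a.y, 9.8 * a.z);
    }
}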
When I call startDeviceMotionUpdatesUsingReferenceFrame, cache the first attitude I receive, and then call multiplyByInverseOfAttitude on every motion update after that, I don't get the change from the reference attitude that I am expecting. Here is a really simple demonstration of what I'm not understanding.
self.motionQueue = [[NSOperationQueue alloc] init];
self.motionManager = [[CMMotionManager alloc] init];
self.motionManager.deviceMotionUpdateInterval = 1.0/20.0;
[self.motionManager startDeviceMotionUpdatesUsingReferenceFrame:CMAttitudeReferenceFrameXArbitraryZVertical toQueue:self.motionQueue withHandler:^(CMDeviceMotion *motion, NSError *error){
    [[NSOperationQueue mainQueue] addOperationWithBlock:^{
        CMAttitude *att = motion.attitude;
        if (self.motionManagerAttitudeRef == nil) {
            self.motionManagerAttitudeRef = att;
            return;
        }
        [att multiplyByInverseOfAttitude:self.motionManagerAttitudeRef];
        NSLog(@"yaw:%+0.1f, pitch:%+0.1f, roll:%+0.1f", att.yaw, att.pitch, att.roll);
    }];
}];
First off, in my application I only really care about pitch and roll. But yaw is in there too to demonstrate my confusion.
Everything works as expected if I put the phone flat on my desk, launch the app, and look at the logs: the yaw, pitch, and roll values are all 0.0, and if I then spin the phone 90 degrees without lifting it off the surface, only the yaw changes. So all good there.
To demonstrate what I think is the problem: now put the phone inside (for example) an empty coffee mug, so that all of the angles are slightly tilted and the direction of gravity has some fractional value on every axis. Launch the app, and with the code above you would think everything is working, because yaw, pitch, and roll again start at 0.0. But now spin the coffee mug 90 degrees without lifting it from the table surface. Why do I see a significant change in all of yaw, pitch, and roll? Since I cached my initial attitude (which is now my new reference attitude) and called multiplyByInverseOfAttitude, shouldn't I just be getting a change in yaw only?
I don't really understand why multiplying the attitude by the inverse of a cached reference attitude doesn't work, and I don't think it is a gimbal-lock problem. But here is what gets me exactly what I need. If you try the coffee-mug experiment described above, this gives exactly the expected results (spinning the mug on a flat surface doesn't affect the pitch and roll values, and tilting it in any other direction only affects one axis at a time). Also, instead of saving a reference frame, I just save the reference pitch and roll, so when the app starts everything is zeroed out until there is some movement.
So all good now. But I still wish I understood why the other method did not work as expected.
self.motionQueue = [[NSOperationQueue alloc] init];
self.motionManager = [[CMMotionManager alloc] init];
self.motionManager.deviceMotionUpdateInterval = 1.0/20.0;
[self.motionManager startDeviceMotionUpdatesUsingReferenceFrame:CMAttitudeReferenceFrameXArbitraryZVertical toQueue:self.motionQueue withHandler:^(CMDeviceMotion *motion, NSError *error)
{
    [[NSOperationQueue mainQueue] addOperationWithBlock:^{
        if (self.motionManagerAttitude == nil) {
            CGFloat x = motion.gravity.x;
            CGFloat y = motion.gravity.y;
            CGFloat z = motion.gravity.z;
            refRollF = atan2(y, x) + M_PI_2;
            CGFloat r = sqrtf(x*x + y*y + z*z);
            refPitchF = acosf(z/r);
            self.motionManagerAttitude = motion.attitude;
            return;
        }
        CGFloat x = motion.gravity.x;
        CGFloat y = motion.gravity.y;
        CGFloat z = motion.gravity.z;
        CGFloat rollF = refRollF - (atan2(y, x) + M_PI_2);
        CGFloat r = sqrtf(x*x + y*y + z*z);
        CGFloat pitchF = refPitchF - acosf(z/r);
        // I don't care about yaw, so just printing whatever value is in the attitude
        NSLog(@"yaw: %+0.1f, pitch: %+0.1f, roll: %+0.1f", (180.0f/M_PI)*motion.attitude.yaw, (180.0f/M_PI)*pitchF, (180.0f/M_PI)*rollF);
    }];
}];
Is there a way in AppKit to measure the width of a large number of NSString objects (say, a million) really fast? I have tried three different ways to do this:
[NSString sizeWithAttributes:]
[NSAttributedString size]
NSLayoutManager (get text width instead of height)
Here are some performance metrics (times in seconds):
Count \ Mechanism    sizeWithAttributes    NSAttributedString    NSLayoutManager
1000                 0.057                 0.031                 0.007
10000                0.329                 0.325                 0.064
100000               3.06                  3.14                  0.689
1000000              29.5                  31.3                  7.06
NSLayoutManager is clearly the way to go, but there are two problems:
High memory footprint (more than 1 GB according to the profiler) because of the creation of heavyweight NSTextStorage objects.
High creation time. All of the time is taken during creation of the above strings, which is a dealbreaker in itself (subsequently measuring NSTextStorage objects that already have glyphs created and laid out only takes about 0.0002 seconds).
7 seconds is still too slow for what I am trying to do. Is there a faster way to measure a million strings in about a second?
In case you want to play around, here is the GitHub project.
Here are some ideas I haven't tried.
Use Core Text directly. The other APIs are built on top of it.
Parallelize. All modern Macs (and even all modern iOS devices) have multiple cores. Divide the string array into several subarrays. For each subarray, submit a block to a global GCD queue. In the block, create the necessary Core Text or NSLayoutManager objects and measure the strings in the subarray. Both APIs can be used safely this way, as long as each block uses its own objects (see the Core Text and NSLayoutManager threading documentation).
Regarding “High memory footprint”: Use Local Autorelease Pool Blocks to Reduce Peak Memory Footprint.
Regarding “All of the time taken is during creation of the above strings, which is a dealbreaker in itself”: Are you saying all the time is spent in these lines:
double random = (double)arc4random_uniform(1000) / 1000;
NSString *randomNumber = [NSString stringWithFormat:@"%f", random];
Formatting a floating-point number is expensive. Is this your real use case? If you just want to format a random rational of the form n/1000 for 0 ≤ n < 1000, there are faster ways (see the sketch below). Also, in many fonts, all digits have the same width, which makes it easy to typeset columns of numbers; if you pick such a font, you can avoid measuring the strings in the first place.
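To make the autorelease-pool and formatting points concrete, here is a minimal sketch (not from the linked project; makeTestStrings is an illustrative name) that builds the n/1000 test strings without going through %f formatting and drains an autorelease pool per chunk to keep peak memory down:

// Sketch: generate the "n/1000" strings without floating-point formatting.
// For 0 <= n < 1000, "%f" would print "0.nnn000", so the digits can be
// assembled directly, which is much cheaper than formatting a double.
static NSArray<NSString *> *makeTestStrings(NSUInteger count) {
    NSMutableArray<NSString *> *strings = [NSMutableArray arrayWithCapacity:count];
    const NSUInteger chunkSize = 10000;
    for (NSUInteger start = 0; start < count; start += chunkSize) {
        @autoreleasepool { // bound peak memory per chunk
            NSUInteger end = MIN(start + chunkSize, count);
            for (NSUInteger i = start; i < end; i++) {
                uint32_t n = arc4random_uniform(1000);
                char buf[16];
                snprintf(buf, sizeof(buf), "0.%03u000", n);
                [strings addObject:[NSString stringWithUTF8String:buf]];
            }
        }
    }
    return strings;
}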
UPDATE
Here's the fastest code I've come up with using Core Text. The dispatched version is almost twice as fast as the single-threaded version on my Core i7 MacBook Pro. My fork of your project is here.
static CGFloat maxWidthOfStringsUsingCTFramesetter(
    NSArray *strings, NSRange range) {
    NSString *bigString =
        [[strings subarrayWithRange:range] componentsJoinedByString:@"\n"];
    NSAttributedString *richText =
        [[NSAttributedString alloc]
            initWithString:bigString
            attributes:@{ NSFontAttributeName: (__bridge NSFont *)font }];
    CGPathRef path =
        CGPathCreateWithRect(CGRectMake(0, 0, CGFLOAT_MAX, CGFLOAT_MAX), NULL);
    CGFloat width = 0.0;
    CTFramesetterRef setter =
        CTFramesetterCreateWithAttributedString(
            (__bridge CFAttributedStringRef)richText);
    CTFrameRef frame =
        CTFramesetterCreateFrame(
            setter, CFRangeMake(0, bigString.length), path, NULL);
    NSArray *lines = (__bridge NSArray *)CTFrameGetLines(frame);
    for (id item in lines) {
        CTLineRef line = (__bridge CTLineRef)item;
        width = MAX(width, CTLineGetTypographicBounds(line, NULL, NULL, NULL));
    }
    CFRelease(frame);
    CFRelease(setter);
    CFRelease(path);
    return (CGFloat)width;
}
static void test_CTFramesetter() {
    runTest(__func__, ^{
        return maxWidthOfStringsUsingCTFramesetter(
            testStrings, NSMakeRange(0, testStrings.count));
    });
}

static void test_CTFramesetter_dispatched() {
    runTest(__func__, ^{
        dispatch_queue_t gatherQueue = dispatch_queue_create(
            "test_CTFramesetter_dispatched result-gathering queue", nil);
        dispatch_queue_t runQueue =
            dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
        dispatch_group_t group = dispatch_group_create();

        __block CGFloat gatheredWidth = 0.0;

        const size_t Parallelism = 16;
        const size_t totalCount = testStrings.count;
        // Force unsigned long to get 64-bit math to avoid overflow for
        // large totalCounts.
        for (unsigned long i = 0; i < Parallelism; ++i) {
            NSUInteger start = (totalCount * i) / Parallelism;
            NSUInteger end = (totalCount * (i + 1)) / Parallelism;
            NSRange range = NSMakeRange(start, end - start);
            dispatch_group_async(group, runQueue, ^{
                double width =
                    maxWidthOfStringsUsingCTFramesetter(testStrings, range);
                dispatch_sync(gatherQueue, ^{
                    gatheredWidth = MAX(gatheredWidth, width);
                });
            });
        }

        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);

        return gatheredWidth;
    });
}
I'm using CIDetector as follows multiple times:
-(NSArray *)detect:(UIImage *)inimage
{
    UIImage *inputimage = inimage;
    UIImageOrientation exifOrientation = inimage.imageOrientation;
    NSNumber *orientation = [NSNumber numberWithInt:exifOrientation];
    NSDictionary *imageOptions = [NSDictionary dictionaryWithObject:orientation forKey:CIDetectorImageOrientation];
    CIImage *ciimage = [CIImage imageWithCGImage:inputimage.CGImage options:imageOptions];

    NSDictionary *detectorOptions = [NSDictionary dictionaryWithObject:orientation forKey:CIDetectorImageOrientation];
    NSArray *features = [self.detector featuresInImage:ciimage options:detectorOptions];
    if (features.count == 0)
    {
        PXLog(@"no face found");
    }

    ciimage = nil;
    NSMutableArray *returnArray = [NSMutableArray new];

    for (CIFaceFeature *feature in features)
    {
        CGRect rect = feature.bounds;
        CGRect r = CGRectMake(rect.origin.x, inputimage.size.height - rect.origin.y - rect.size.height, rect.size.width, rect.size.height);
        FaceFeatures *ff = [[FaceFeatures new] initWithLeftEye:CGPointMake(feature.leftEyePosition.x, inputimage.size.height - feature.leftEyePosition.y)
                                                      rightEye:CGPointMake(feature.rightEyePosition.x, inputimage.size.height - feature.rightEyePosition.y)
                                                         mouth:CGPointMake(feature.mouthPosition.x, inputimage.size.height - feature.mouthPosition.y)];
        Face *ob = [[Face new] initFaceInRect:r withFaceFeatures:ff];
        [returnArray addObject:ob];
    }

    features = nil;
    return returnArray;
}
-(CIContext *)context
{
    if (!_context) {
        _context = [CIContext contextWithOptions:nil];
    }
    return _context;
}
-(CIDetector *)detector
{
    if (!_detector)
    {
        // 1 for high, 0 for low
#warning not checking for fast/slow detection operation
        NSString *str = @"fast"; //[SettingsFunctions retrieveFromUserDefaults:@"face_detection_accuracy"];
        if ([str isEqualToString:@"slow"])
        {
            //DDLogInfo(@"faceDetection: -I- Setting accuracy to high");
            _detector = [CIDetector detectorOfType:CIDetectorTypeFace context:nil
                                           options:[NSDictionary dictionaryWithObject:CIDetectorAccuracyHigh forKey:CIDetectorAccuracy]];
        } else {
            //DDLogInfo(@"faceDetection: -I- Setting accuracy to low");
            _detector = [CIDetector detectorOfType:CIDetectorTypeFace context:nil
                                           options:[NSDictionary dictionaryWithObject:CIDetectorAccuracyLow forKey:CIDetectorAccuracy]];
        }
    }
    return _detector;
}
but after having various memory issues, and according to Instruments, it looks like the NSArray *features = [self.detector featuresInImage:ciimage options:detectorOptions]; line isn't being released.
Is there a memory leak in my code?
I came across the same issue and it seems to be a bug (or maybe by design, for caching purposes) with reusing a CIDetector.
I was able to get around it by not reusing the CIDetector, instead instantiating one as needed and then releasing it (or, in ARC terms, just not keeping a reference around) when the detection is completed. There is some cost to doing this, but if you are doing the detection on a background thread as you said, that cost is probably worth it when compared to unbounded memory growth.
Perhaps a better solution, if you are detecting multiple images in a row, would be to create one detector and use it for all of them (or, if the memory growth is too large, release it and create a new one every N images; you'll have to experiment to see what N should be, as sketched below).
I've filed a Radar bug about this issue with Apple: http://openradar.appspot.com/radar?id=6645353252126720
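If you go the "release and recreate every N images" route, a minimal sketch of the idea looks like this (the batch size, the images array, and the low-accuracy option are placeholders to tune for your workload):

// Sketch: throw the detector away every N images so whatever it caches
// internally cannot grow without bound. N is a placeholder to tune.
static const NSUInteger kImagesPerDetector = 50;

CIDetector *detector = nil;
NSUInteger processed = 0;
for (UIImage *image in images) {
    if (detector == nil || processed % kImagesPerDetector == 0) {
        // Re-assigning under ARC releases the previous detector.
        detector = [CIDetector detectorOfType:CIDetectorTypeFace
                                      context:nil
                                      options:@{ CIDetectorAccuracy : CIDetectorAccuracyLow }];
    }
    @autoreleasepool {
        CIImage *ciimage = [CIImage imageWithCGImage:image.CGImage];
        NSArray *features = [detector featuresInImage:ciimage];
        // ... handle features ...
    }
    processed++;
}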
I have fixed this problem; you should use an autorelease pool where you invoke the detect method, like this in Swift:
autoreleasepool(invoking: {
let result = self.detect(image: image)
// do other things
})
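For the Objective-C side, the equivalent would be wrapping the call in an @autoreleasepool block (a sketch of the same idea):

// Objective-C equivalent of the Swift snippet above: drain the pool around
// each call so the intermediate Core Image objects are released promptly.
@autoreleasepool {
    NSArray *faces = [self detect:image];
    // do other things with faces
}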
I'm trying to use QTKit to convert a list of images to a QuickTime movie. I've figured out how to do everything except get the frame rate to 29.97. Through other forums and resources, the trick seems to be using something like this:
QTTime frameDuration = QTMakeTime(1001, 30000)
However, all my attempts using this method, or even (1000, 29970), still produce a movie at 30 fps. That is the frame rate QuickTime Player shows when playing the file.
Any ideas? Is there some other way to set the frame rate for the entire movie once it's created?
Here's some sample code:
NSDictionary *outputMovieAttribs = [NSDictionary dictionaryWithObjectsAndKeys:@"jpeg", QTAddImageCodecType, [NSNumber numberWithLong:codecHighQuality], QTAddImageCodecQuality, nil];
QTTime frameDuration = QTMakeTime(1001, 30000);

QTMovie *outputMovie = [[QTMovie alloc] initToWritableFile:@"/tmp/testing.mov" error:nil];
[outputMovie setAttribute:[NSNumber numberWithBool:YES] forKey:QTMovieEditableAttribute];
[outputMovie setAttribute:[NSNumber numberWithLong:30000] forKey:QTMovieTimeScaleAttribute];

if (!outputMovie) {
    printf("ERROR: Chunk: Could not create movie object:\n");
} else {
    int frameID = 0;
    while (frameID < [framePaths count]) {
        NSAutoreleasePool *readPool = [[NSAutoreleasePool alloc] init];
        NSData *currFrameData = [NSData dataWithContentsOfFile:[framePaths objectAtIndex:frameID]];
        NSImage *currFrame = [[NSImage alloc] initWithData:currFrameData];
        if (currFrame) {
            [outputMovie addImage:currFrame forDuration:frameDuration withAttributes:outputMovieAttribs];
            [outputMovie updateMovieFile];
            NSString *newDuration = QTStringFromTime([outputMovie duration]);
            printf("new Duration: %s\n", [newDuration UTF8String]);
            currFrame = nil;
        } else {
            printf("ERROR: Could not add image to movie");
        }
        frameID++;
        [readPool drain];
    }
}

NSString *outputDuration = QTStringFromTime([outputMovie duration]);
printf("output Duration: %s\n", [outputDuration UTF8String]);
OK, thanks to your code, I could solve the issue. Using the development tool called Atom Inspector, I saw that the data structure looked totally different from the movies I am currently working with. As I said, I have never created a movie from images as you do, but it seems that this is not the way to go if you want a normal movie afterwards. QuickTime recognizes the clip as "Photo-JPEG", so not a normal movie file. The reason for this seems to be that the added pictures are NOT added to a movie track but just somewhere in the movie. This can also be seen with Atom Inspector.
With the "movieTimeScaleAttribute", you set a timeScale that is not used!
To solve the issue I changed the code just a tiny bit.
NSDictionary *outputMovieAttribs = [NSDictionary dictionaryWithObjectsAndKeys:@"jpeg",
    QTAddImageCodecType, [NSNumber numberWithLong:codecHighQuality],
    QTAddImageCodecQuality, [NSNumber numberWithLong:2997], QTTrackTimeScaleAttribute, nil];
QTTime frameDuration = QTMakeTime(100, 2997);

QTMovie *outputMovie = [[QTMovie alloc] initToWritableFile:@"/Users/flo/Desktop/testing.mov" error:nil];
[outputMovie setAttribute:[NSNumber numberWithBool:YES] forKey:QTMovieEditableAttribute];
[outputMovie setAttribute:[NSNumber numberWithLong:2997] forKey:QTMovieTimeScaleAttribute];
Everything else is unaltered.
Oh, by the way, to print the timeValue and timeScale you could also do:
NSLog(@"new Duration timeScale : %ld timeValue : %lld \n",
      [outputMovie duration].timeScale, [outputMovie duration].timeValue);
This way you can see better if your code does as desired.
Hope that helps!
Best regards
I have never done what you're trying to do, but I can at least tell you how to get the desired frame rate.
If you "ask" a movie for its current timing information, you always get a QTTime structure, which contains the timeScale and the timeValue.
For a 29.97 fps video, you would get a timeScale of 2997 (for example; see below).
This is the number of "units" per second.
So if the playback position of the movie is currently at exactly 2 seconds, you would get a timeValue of 5994.
The frame duration is therefore 100, because 2997 / 100 = 29.97 fps.
QuickTime cannot handle float values, so you have to convert all the values to a long value by multiplication.
By the way, you don't have to use 100; you could also use a frame duration of 1000 with a timeScale of 29970, or a frame duration of 200 with a timeScale of 5994. That's all I can tell you from reading the timing information of already existing clips.
You wrote that this didn't work out for you, but this is how QuickTime works internally.
You should look into it again!
Best regards