Crash (fatal signal) when using Google Play Services Face Detection in a Picasso Transformation - google-play-services

I'm writing a face-crop Transformation for Picasso, so that faces in an image remain fully visible when Picasso crops it.
To do this I implemented a Picasso Transformation and use Face Detection inside public Bitmap transform(Bitmap source).
import com.squareup.picasso.Transformation;
....
public class FaceCrop implements Transformation {
    ...
    @Override
    public Bitmap transform(Bitmap source) {
        FaceDetector mFaceDetector =
                new FaceDetector.Builder(mContext).setTrackingEnabled(false).build();
        if (!mFaceDetector.isOperational()) {
            return source;
        }
        Frame frame = new Frame.Builder().setBitmap(source).build();
        SparseArray<Face> faces = mFaceDetector.detect(frame); // sometimes crashes here
        ...
        RectF rect = null;
        for (int i = 0; i < faces.size(); i++) {
            Face face = faces.valueAt(i);
            float x = face.getPosition().x;
            float y = face.getPosition().y;
            RectF faceRect = new RectF(x, y, x + face.getWidth(), y + face.getHeight());
            if (rect == null) {
                rect = faceRect;
                continue;
            }
            rect.union(faceRect);
        }
        mFaceDetector.release();
        ...
        Bitmap bitmap = Bitmap.createBitmap(mWidth, mHeight, source.getConfig());
        Canvas canvas = new Canvas(bitmap);
        canvas.drawBitmap(source, null, targetRect, null);
        source.recycle();
        return bitmap;
    }
}
For most images it works perfectly; nothing goes wrong.
But for some specific images it sometimes works fine and sometimes crashes at SparseArray<Face> faces = mFaceDetector.detect(frame); (it doesn't crash every time for a given image).
I tried wrapping the call in a try/catch block, but it still crashes, so I can't skip it.
The error messages are different every time, but they are similar: they all look like "Fatal signal x ...".
Here are four examples:
1.
invalid address or address of corrupt block 0xba362ba0 passed to dlfree
Fatal signal 11 (SIGSEGV), code 1, fault addr 0xdeadbaad in tid 10651 (Picasso-/extern)
2.
Fatal signal 11 (SIGSEGV), code 2, fault addr 0x93948000 in tid 16443 (Picasso-/extern)
3.
Fatal signal 11 (SIGSEGV), code 1, fault addr 0x605f5e6a in tid 17951 (GCDaemon)
4.
heap corruption detected by tmalloc_large
Fatal signal 6 (SIGSEGV), code -6 in tid 19020 (Picasso-/extern)
I am sure the frame parameter passed to detect() is not null.
I checked the images that sometimes cause the crash and can't see anything wrong with them.
They look normal, and they are not very large files (smaller than 1 MB).
Some contain faces, some don't.
They can all be decoded normally by Picasso with Picasso's center crop.
Can anyone give me some suggestions?
Thank you so much!!

There is a known issue with small images, which will be fixed in a future release. Fortunately, there is a patch available on GitHub now:
https://github.com/googlesamples/android-vision/issues/26
If you are still getting this issue even with that patch, please print out some of the image sizes and post here to help us to debug.
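In the meantime, one possible workaround is to guard against very small bitmaps before running detection. This is only a sketch; the 100-pixel threshold below is an illustrative guess, not a documented limit:

// Hedged workaround sketch: skip (or up-scale) very small bitmaps before face detection.
// MIN_DETECT_SIZE is an assumed threshold for illustration, not an official value.
private static final int MIN_DETECT_SIZE = 100;

private SparseArray<Face> detectSafely(FaceDetector detector, Bitmap source) {
    if (source.getWidth() < MIN_DETECT_SIZE || source.getHeight() < MIN_DETECT_SIZE) {
        // Option 1: skip detection and fall back to a plain center crop.
        return new SparseArray<>();
        // Option 2 (alternative): up-scale first with Bitmap.createScaledBitmap(...)
        // and map the detected face rectangles back to the original coordinates.
    }
    Frame frame = new Frame.Builder().setBitmap(source).build();
    return detector.detect(frame);
}

Inside transform() you would then call detectSafely(mFaceDetector, source) instead of calling mFaceDetector.detect(frame) directly.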


Android OpenGLES remaining() < size < needed

First of all, I'm not used to developing apps; until now I have only used LWJGL for computer games, so don't be too strict with me :)
I tried to implement a VAO class that takes care of all the VAO work (loading VAOs and VBOs, binding index buffers, etc.).
The problematic code snippet is in the bindIndicesBuffer(int[] indices) function.
Vao.java:
private void bindIndicesBuffer(int[] indices) {
    int vboID = genBuffers();
    vbos.add(vboID);
    GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, vboID);
    IntBuffer buffer = Tools.makeIntBuffer(indices);
    GLES20.glBufferData(GLES20.GL_ELEMENT_ARRAY_BUFFER, indices.length * 4, buffer, GLES20.GL_STATIC_DRAW);
}
Additional Code:
private int genBuffers() {
    int[] ids = new int[1];
    GLES20.glGenBuffers(1, ids, 0);
    return ids[0];
}
Tools.java:
public static IntBuffer makeIntBuffer(int[] arr) {
    ByteBuffer bb = ByteBuffer.allocateDirect(arr.length * 4).order(ByteOrder.nativeOrder());
    IntBuffer ib = bb.asIntBuffer();
    ib.put(arr);
    ib.position(0);
    ib.flip();
    return ib;
}
The main Activity class uses a subclass of GLSurfaceView, which creates the MasterRenderer; that renderer implements GLSurfaceView.Renderer and calls the render methods of the separate renderers in onDrawFrame().
I get the following error:
E/libEGL: call to OpenGL ES API with no current context (logged once per thread)
D/AndroidRuntime: Shutting down VM
E/AndroidRuntime: FATAL EXCEPTION: main
...
Caused by: java.lang.IllegalArgumentException: remaining() < size < needed
At the line "GLES20.glBufferData(GLES20.GL_ELEMENT_ARRAY_BUFFER, indices.length*4, buffer, GLES20.GL_STATIC_DRAW);"
And now the magic question: what did I do wrong? :D
Thank you for your time!

How to correctly use ImageReader with YUV_420_888 and MediaCodec to encode video to h264 format?

I'm implementing a camera application on Android devices. Currently I use the Camera2 API and an ImageReader to get image data in YUV_420_888 format, but I don't know exactly how to write this data to a MediaCodec.
Here are my questions:
What is YUV_420_888?
The format YUV_420_888 is ambiguous because it can be any format that belongs to the YUV420 family, such as YUV420P, YUV420PP, YUV420SP and YUV420PSP, right?
By accessing the image's three planes (#0, #1, #2), I can get the Y (#0), U (#1) and V (#2) values of the image. But the arrangement of these values may not be the same on different devices. For example, if YUV_420_888 really means YUV420P, each of Plane #1 and Plane #2 is a quarter of the size of Plane #0. If YUV_420_888 really means YUV420SP, each of Plane #1 and Plane #2 is half of the size of Plane #0 (each of them contains both U and V values).
If I want to write the data from the image's three planes to a MediaCodec, what format do I need to convert it to? YUV420, NV21, NV12, ...?
What is COLOR_FormatYUV420Flexible?
The format COLOR_FormatYUV420Flexible is also ambiguous because it can be any format that belongs to the YUV420 family, right? If I set the KEY_COLOR_FORMAT option of a MediaCodec object to COLOR_FormatYUV420Flexible, what format (YUV420P, YUV420SP, ...?) of data should I feed to the MediaCodec object?
How about using COLOR_FormatSurface?
I know MediaCodec has its own surface, which can be used if I set the KEY_COLOR_FORMAT option of a MediaCodec object to COLOR_FormatSurface. With the Camera2 API, I don't need to write any data to the MediaCodec object myself; I can just drain the output buffer.
However, I need to modify the image from the camera: for example, draw other pictures or some text on it, or insert another video as a PIP (picture-in-picture).
Can I use ImageReader to read the image from the camera, redraw it, write the new data to MediaCodec's surface, and then drain it out? How do I do that?
EDIT1
I implemented the function by using COLOR_FormatSurface and RenderScript. Here is my code:
onImageAvailable method:
public void onImageAvailable(ImageReader imageReader) {
    try {
        try (Image image = imageReader.acquireLatestImage()) {
            if (image == null) {
                return;
            }
            Image.Plane[] planes = image.getPlanes();
            if (planes.length >= 3) {
                ByteBuffer bufferY = planes[0].getBuffer();
                ByteBuffer bufferU = planes[1].getBuffer();
                ByteBuffer bufferV = planes[2].getBuffer();
                int lengthY = bufferY.remaining();
                int lengthU = bufferU.remaining();
                int lengthV = bufferV.remaining();
                byte[] dataYUV = new byte[lengthY + lengthU + lengthV];
                bufferY.get(dataYUV, 0, lengthY);
                bufferU.get(dataYUV, lengthY, lengthU);
                bufferV.get(dataYUV, lengthY + lengthU, lengthV);
                imageYUV = dataYUV;
            }
        }
    } catch (final Exception ex) {
    }
}
Convert YUV_420_888 to RGB:
public static Bitmap YUV_420_888_toRGBIntrinsics(Context context, int width, int height, byte[] yuv) {
    RenderScript rs = RenderScript.create(context);
    ScriptIntrinsicYuvToRGB yuvToRgbIntrinsic = ScriptIntrinsicYuvToRGB.create(rs, Element.U8_4(rs));
    Type.Builder yuvType = new Type.Builder(rs, Element.U8(rs)).setX(yuv.length);
    Allocation in = Allocation.createTyped(rs, yuvType.create(), Allocation.USAGE_SCRIPT);
    Type.Builder rgbaType = new Type.Builder(rs, Element.RGBA_8888(rs)).setX(width).setY(height);
    Allocation out = Allocation.createTyped(rs, rgbaType.create(), Allocation.USAGE_SCRIPT);
    Bitmap bmpOut = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
    in.copyFromUnchecked(yuv);
    yuvToRgbIntrinsic.setInput(in);
    yuvToRgbIntrinsic.forEach(out);
    out.copyTo(bmpOut);
    return bmpOut;
}
MediaCodec:
mediaFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
...
mediaCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
...
surface = mediaCodec.createInputSurface(); // This surface is not used in Camera APIv2. Camera APIv2 uses ImageReader's surface.
And in another thread:
while (!stop) {
    final byte[] image = imageYUV;
    // Do some yuv computation
    Bitmap bitmap = YUV_420_888_toRGBIntrinsics(getApplicationContext(), width, height, image);
    Canvas canvas = surface.lockHardwareCanvas();
    canvas.drawBitmap(bitmap, matrix, paint);
    surface.unlockCanvasAndPost(canvas);
}
This approach works, but the performance is not good: it can't produce 30 fps video files (only ~12 fps). Perhaps I should not use COLOR_FormatSurface and the surface's canvas for encoding. The computed YUV data should be written to the MediaCodec directly, without a surface doing any conversion, but I still don't know how to do that.
You are right: YUV_420_888 is a format that can wrap different YUV 4:2:0 layouts. The spec carefully explains that the arrangement of the U and V planes is not prescribed, but there are certain restrictions; e.g. if the U plane has a pixel stride of 2, the same applies to V (and then the underlying byte buffer can be NV21).
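For example, here is a minimal sketch that classifies the chroma layout of a YUV_420_888 Image by looking at plane #1 (the helper name isSemiPlanar is purely illustrative):

// Sketch: decide whether the chroma planes of a YUV_420_888 Image are interleaved.
static boolean isSemiPlanar(Image image) {
    Image.Plane uPlane = image.getPlanes()[1];
    // pixelStride == 2: U and V bytes are interleaved in one buffer (NV12/NV21-style, semi-planar).
    // pixelStride == 1: each chroma plane is tightly packed on its own (I420/YV12-style, planar).
    return uPlane.getPixelStride() == 2;
}

Also note that getRowStride() can be larger than the image width, so a copy has to skip the row padding; concatenating the plane buffers byte-for-byte only works when the strides happen to be tight.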
COLOR_FormatYUV420Flexible is a synonym of YUV_420_888, but they belong to different classes: MediaCodec and ImageFormat, respectively.
The spec explains:
All video codecs support flexible YUV 4:2:0 buffers since LOLLIPOP_MR1.
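Assuming the encoder supports the flexible format (which, per the quote above, all video codecs do since LOLLIPOP_MR1), a rough sketch of feeding YUV data without a Surface is to configure the encoder with COLOR_FormatYUV420Flexible and fill the writable Image returned by getInputImage(). Here copyYuv() stands in for a plane-by-plane, stride-aware copy, and passing the nominal 4:2:0 frame size to queueInputBuffer is an assumption of this sketch:

// Sketch: H.264 encoder taking flexible YUV input buffers instead of a Surface.
static MediaCodec createFlexibleYuvEncoder(int width, int height) throws IOException {
    MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
    format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
            MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible);
    format.setInteger(MediaFormat.KEY_BIT_RATE, 4000000);
    format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
    format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);
    MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
    encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
    encoder.start();
    return encoder;
}

// Sketch: submit one camera frame to the encoder.
static void submitFrame(MediaCodec encoder, Image cameraImage, int width, int height, long ptsUs) {
    int index = encoder.dequeueInputBuffer(10000);
    if (index >= 0) {
        // The encoder exposes its input buffer as a writable YUV_420_888 Image (API 21+),
        // so the copy can honor the encoder's own row/pixel strides instead of guessing a layout.
        Image input = encoder.getInputImage(index);
        copyYuv(cameraImage, input); // placeholder for a stride-aware, plane-by-plane copy
        encoder.queueInputBuffer(index, 0, width * height * 3 / 2, ptsUs, 0);
    }
}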
COLOR_FormatSurface is an opaque format that can deliver the best performance for MediaCodec, but this comes at a price: you cannot directly read or manipulate its content. If you need to manipulate the data that goes to the MediaCodec, then using ImageReader is an option; whether it will be more efficient than ByteBuffer depends on what you do and how you do it. Note that for API 24+ you can work with both camera2 and MediaCodec in C++.
An invaluable resource for MediaCodec details is http://www.bigflake.com/mediacodec. It references a full example of H.264 encoding.
Create a texture ID -> SurfaceTexture -> Surface -> Camera2 -> onFrameAvailable -> updateTexImage -> glBindTexture -> draw something -> swap buffers to MediaCodec's input Surface.
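In very compressed form, and assuming an EGL context has already been created on the encoder's input surface (shader and program setup elided), that pipeline looks roughly like this:

// Rough sketch only: EGL setup on mediaCodec.createInputSurface() and the GL program are assumed.
int[] tex = new int[1];
GLES20.glGenTextures(1, tex, 0);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, tex[0]);

SurfaceTexture cameraTexture = new SurfaceTexture(tex[0]);
Surface cameraSurface = new Surface(cameraTexture); // give this Surface to the camera2 capture session

cameraTexture.setOnFrameAvailableListener(st -> {
    // In real code, updateTexImage() must run on the thread that owns the EGL context,
    // so you would typically post to that thread instead of drawing here directly.
    st.updateTexImage(); // latch the newest camera frame into the external texture
    // ... draw the texture plus any overlays / PIP with your own shader ...
    // EGL14.eglSwapBuffers(eglDisplay, encoderEglSurface); // push the rendered frame to the encoder
});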

How to set up a "trigger method" for when a float variable reaches a certain value

I've been working with a BLE-042 Cypress module that sends voltage data (0V-3.3V) as a Float32, which I capture via delegate methods in my iOS application and print to the screen (in a rectangle showing the value, plus a dynamic line graph using Core Plot 2.2). For this particular project, if the Float32 voltage is > 2.0V, I want that rectangle to blink as a warning.
The Core Plot graph has to run on the main thread, and blinking cannot interrupt the flow of capturing data from the BLE delegate methods. I want the rectangle to blink without the data going out of sync with reality.
Here's the delegate method for capturing my data:
- (void)getTransducerData:(CBCharacteristic *)characteristic error:(NSError *)error {
    NSData *data = [characteristic value];
    const uint8_t *reportData = [data bytes];
    uint32_t v0 = (uint32_t)reportData[0];
    uint32_t v1 = (uint32_t)reportData[1] << 8;
    uint32_t v2 = (uint32_t)reportData[2] << 16;
    uint32_t v3 = (uint32_t)reportData[3] << 24;
    uint32_t voltage = v0 | v1 | v2 | v3;
    if ((characteristic.value) || !error) {
        self.transducerValue = voltage;
        _real = voltage / 1000000.0;
        self.transducerPosition.text = [NSString stringWithFormat:@"%0.4lf V", _real];
    }
}
I was thinking of applying an if condition right after _real has the new value:
_real = voltage / 1000000.0;
if (_real > 2.0) {
    // start blinking rectangle
}
The problem is that I don't want that if statement in there at all. I just want the // start blinking rectangle part to run based on an event condition, not because an if statement is telling it to blink.
Any suggestions on how I can get this done? If you think the blinking animation will not be "a heavy load" when triggered inside an if condition, let me know. Also, as a side note, I have an ADC sampling at 70,000 samples/sec, and getTransducerData:error: is really pumping out values.
I'm using Objective-C with Xcode 7.3.1

Microsoft Kinect and background/environmental noise

I am currently programming with the Microsoft Kinect for Windows SDK 2 on Windows 8.1. Things are going well, but in a home dev environment there is obviously much less background noise than in the 'real world'.
I would like to seek some advice from those with experience in 'real world' applications of the Kinect. How does the Kinect (especially v2) fare in a live environment with passers-by, onlookers and unexpected objects in the background? I expect that the space between the Kinect sensor and the user will usually be free of interference; what I am very mindful of right now is the background noise as such.
While I am aware that the Kinect does not track well under direct sunlight (either on the sensor or on the user), are there certain lighting conditions or other external factors I need to account for in the code?
The answers I am looking for are:
What kind of issues can arise in a live environment?
How did you code or work your way around it?
Outlaw Lemur has described in detail most of the issues you may encounter in real-world scenarios.
Using Kinect for Windows version 2, you do not need to adjust the motor, since there is no motor and the sensor has a larger field of view. This will make your life much easier.
I would like to add the following tips and advice:
1) Avoid direct light (physical or internal lighting)
Kinect has an infrared sensor that might be confused. This sensor should not have direct contact with any light sources. You can emulate such an environment at your home/office by playing with an ordinary laser pointer and torches.
2) If you are tracking only one person, select the closest tracked user
If your app only needs one player, that player needs to be a) fully tracked and b) closer to the sensor than the others. It's an easy way to make participants understand who is tracked without making your UI more complex.
public static Body Default(this IEnumerable<Body> bodies)
{
    Body result = null;
    double closestBodyDistance = double.MaxValue;

    foreach (var body in bodies)
    {
        if (body.IsTracked)
        {
            var position = body.Joints[JointType.SpineBase].Position;
            var distance = position.Length();

            if (result == null || distance < closestBodyDistance)
            {
                result = body;
                closestBodyDistance = distance;
            }
        }
    }

    return result;
}
3) Use the tracking IDs to distinguish different players
Each player has a TrackingId property. Use that property when players interfere with each other or move to random positions. Do not use that property as an alternative to face recognition, though.
ulong _trackingID1 = 0;
ulong _trackingID2 = 0;

void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            frame.GetAndRefreshBodyData(_bodies);

            var bodies = _bodies.Where(b => b.IsTracked).ToList();

            if (bodies != null && bodies.Count >= 2 && _trackingID1 == 0 && _trackingID2 == 0)
            {
                _trackingID1 = bodies[0].TrackingId;
                _trackingID2 = bodies[1].TrackingId;
                // Alternatively, specify body1 and body2 according to their distance from the sensor.
            }

            Body first = bodies.Where(b => b.TrackingId == _trackingID1).FirstOrDefault();
            Body second = bodies.Where(b => b.TrackingId == _trackingID2).FirstOrDefault();

            if (first != null)
            {
                // Do something...
            }

            if (second != null)
            {
                // Do something...
            }
        }
    }
}
4) Display warnings when a player is too far or too close to the sensor.
To achieve higher accuracy, players need to stand at a specific distance: not too far or too close to the sensor. Here's how to check this:
const double MIN_DISTANCE = 1.0; // in meters
const double MAX_DISTANCE = 4.0; // in meters

double distance = body.Joints[JointType.SpineBase].Position.Z; // in meters, too

if (distance > MAX_DISTANCE)
{
    // Prompt the player to move closer.
}
else if (distance < MIN_DISTANCE)
{
    // Prompt the player to move farther away.
}
else
{
    // Player is at the right distance.
}
5) Always know when a player entered or left the scene.
Vitruvius provides an easy way to understand when someone entered or left the scene.
Here is the source code and here is how to use it in your app:
UsersController userReporter = new UsersController();

userReporter.BodyEntered += UserReporter_BodyEntered;
userReporter.BodyLeft += UserReporter_BodyLeft;
userReporter.Start();

void UserReporter_BodyEntered(object sender, UsersControllerEventArgs e)
{
    // A new user has entered the scene. Get the ID from the e parameter.
}

void UserReporter_BodyLeft(object sender, UsersControllerEventArgs e)
{
    // A user has left the scene. Get the ID from the e parameter.
}
6) Have a visual cue of which player is tracked
If there are a lot of people surrounding the player, you may need to show on-screen who is tracked. You can highlight the depth frame bitmap or use Microsoft's Kinect Interactions.
This is an example of removing the background and keeping the player pixels only.
7) Avoid glossy floors
Some floors (bright, glossy) may mirror people and Kinect may confuse some of their joints (for example, Kinect may extend your legs to the reflected body). If you can't avoid glossy floors, use the FloorClipPlane property of your BodyFrame. However, the best solution would be to have a simple carpet where you expect people to stand. A carpet would also act as an indication of the proper distance, so you would provide a better user experience.
I created an application for home use like you have, and then presented that same application in a public setting. The result was embarrassing for me, because there were many errors that I would never have anticipated in a controlled environment. However, it did help me, because it led me to add some interesting adjustments to my code, which is centered around human detection only.
Have conditions for checking the validity of a "human".
When I showed my application in the middle of a presentation floor with many other objects and props, I found that even chairs could be mistaken for people for brief moments, which led to my application switching between the user and an inanimate object, causing it to lose track of the user and lose their progress. To counter this and other false-positive human detections, I added my own additional checks for a human. My most successful method was comparing the proportions of a human's body, measured in head units (head units picture). Below is the code for how I did this (SDK version 1.8, C#):
bool PersonDetected = false;
double[] humanRatios = { 1.0f, 4.0, 2.33, 3.0 };
/* Array indexes
 * 0 - Head (shoulder to head)
 * 1 - Leg length (foot to knee to hip)
 * 2 - Width (shoulder to shoulder center to shoulder)
 * 3 - Torso (hips to shoulder)
 */
....
double[] currentRatios = new double[4];
double headSize = Distance(skeletons[0].Joints[JointType.ShoulderCenter], skeletons[0].Joints[JointType.Head]);
currentRatios[0] = 1.0f;
currentRatios[1] = (Distance(skeletons[0].Joints[JointType.FootLeft], skeletons[0].Joints[JointType.KneeLeft]) + Distance(skeletons[0].Joints[JointType.KneeLeft], skeletons[0].Joints[JointType.HipLeft])) / headSize;
currentRatios[2] = (Distance(skeletons[0].Joints[JointType.ShoulderLeft], skeletons[0].Joints[JointType.ShoulderCenter]) + Distance(skeletons[0].Joints[JointType.ShoulderCenter], skeletons[0].Joints[JointType.ShoulderRight])) / headSize;
currentRatios[3] = Distance(skeletons[0].Joints[JointType.HipCenter], skeletons[0].Joints[JointType.ShoulderCenter]) / headSize;

int correctProportions = 0;
for (int i = 1; i < currentRatios.Length; i++)
{
    double diff = currentRatios[i] - humanRatios[i];
    if (Math.Abs(diff) <= MaximumDiff) // I used .2 for my MaximumDiff
        correctProportions++;
}

if (correctProportions >= 2)
    PersonDetected = true;
Another method I had success with was summing the squared distances between each joint and the previous one. I found that non-human detections had more variable summed distances, whereas humans were more consistent. I learned the threshold using a one-dimensional support vector machine (I found that users' summed distances were generally less than 9).
// in AllFramesReady or SkeletonFrameReady
Skeleton data;
...
float lastPosX = 0; // trying to detect false positives
float lastPosY = 0;
float lastPosZ = 0;
float diff = 0;

foreach (Joint joint in data.Joints)
{
    // add the distance squared
    diff += (joint.Position.X - lastPosX) * (joint.Position.X - lastPosX);
    diff += (joint.Position.Y - lastPosY) * (joint.Position.Y - lastPosY);
    diff += (joint.Position.Z - lastPosZ) * (joint.Position.Z - lastPosZ);
    lastPosX = joint.Position.X;
    lastPosY = joint.Position.Y;
    lastPosZ = joint.Position.Z;
}

if (diff < 9) // this is what my SVM learned
    PersonDetected = true;
Use player IDs and indexes to remember who is who
This ties in with the previous issue: if the Kinect switched the two users it was tracking to other people, my application would crash because of the sudden changes in data. To counter this, I kept track of each player's skeletal index and player ID. To learn more about how I did this, see Kinect user Detection.
Add adjustable parameters to adapt to varying situations
Where I was presenting, the same tilt angle and other basic Kinect parameters (like near mode) did not work in the new environment. Let the user adjust some of these parameters so they can get the best setup for the job.
Expect people to do stupid things
The next time I presented, I had an adjustable tilt, and you can guess what happened: someone burned out the Kinect's motor. Anything that can be broken on a Kinect, someone will break. Leaving a warning in your documentation will not be sufficient. You should add cautionary checks on the Kinect's hardware to make sure people don't get upset when they break something inadvertently. Here is some code that checks whether the user has used the motor more than 20 times in two minutes.
int motorAdjustments = 0;
DateTime firstAdjustment;
...
// in motor adjustment code
if (motorAdjustments == 0)
    firstAdjustment = DateTime.Now;
++motorAdjustments;

if (motorAdjustments < 20)
{
    // adjust the tilt
}
else
{
    DateTime timeCheck = firstAdjustment;
    if (DateTime.Now > timeCheck.AddMinutes(2))
    {
        // reset all variables
        motorAdjustments = 1;
        firstAdjustment = DateTime.Now;
        // adjust the tilt
    }
}
I would note that all of these were issues for me with the first version of the Kinect, and I don't know how many of them have been solved in the second version, as I sadly haven't gotten my hands on one yet. However, I would still implement some of these techniques, if only as back-up techniques, because there will be exceptions, especially in computer vision.

How do you set the input level (gain) on the built-in input (OSX Core Audio / Audio Unit)?

I've got an OSX app that records audio data using an Audio Unit. The Audio Unit's input can be set to any available source with inputs, including the built-in input. The problem is, the audio that I get from the built-in input is often clipped, whereas in a program such as Audacity (or even QuickTime) I can turn down the input level and I don't get clipping.
Multiplying the sample frames by a fraction doesn't work, of course, because I just get a lower volume; the samples themselves are still clipped at the time of input.
How do I set the input level or gain for that built-in input to avoid the clipping problem?
This works for me to set the input volume on my MacBook Pro (2011 model). It is a bit funky: I had to try setting the master channel volume, then each individual stereo channel volume, until I found one that worked. Look through the comments in my code; I suspect the best way to tell whether your code is working is to find a get/set-property combination that works, then do something like get/set (something else)/get to verify that your setter is working.
Oh, and I'll point out, of course, that I wouldn't rely on the values in address staying the same across getProperty calls as I'm doing here. It seems to work, but it's definitely bad practice to rely on struct values being the same when you pass one by reference to a function. This is of course sample code, so please forgive my laziness. ;)
//
//  main.c
//  testInputVolumeSetter
//
#include <CoreFoundation/CoreFoundation.h>
#include <CoreAudio/CoreAudio.h>

OSStatus setDefaultInputDeviceVolume( Float32 toVolume );

int main(int argc, const char * argv[]) {
    OSStatus err;

    // Load the Sound system preference, select a default
    // input device, set its volume to max. Now set
    // breakpoints at each of these lines. As you step over
    // them you'll see the input volume change in the Sound
    // preference panel.
    //
    // On my MacBook Pro setting the channel[ 1 ] volume
    // on the default microphone input device seems to do
    // the trick. channel[ 0 ] reports that it works but
    // seems to have no effect and the master channel is
    // unsettable.
    //
    // I do not know how to tell which one will work so
    // probably the best thing to do is write your code
    // to call getProperty after you call setProperty to
    // determine which channel(s) work.
    err = setDefaultInputDeviceVolume( 0.0 );
    err = setDefaultInputDeviceVolume( 0.5 );
    err = setDefaultInputDeviceVolume( 1.0 );
}

// 0.0 == no volume, 1.0 == max volume
OSStatus setDefaultInputDeviceVolume( Float32 toVolume ) {
    AudioObjectPropertyAddress address;
    AudioDeviceID deviceID;
    OSStatus err;
    UInt32 size;
    UInt32 channels[ 2 ];
    Float32 volume;

    // get the default input device id
    address.mSelector = kAudioHardwarePropertyDefaultInputDevice;
    address.mScope = kAudioObjectPropertyScopeGlobal;
    address.mElement = kAudioObjectPropertyElementMaster;
    size = sizeof(deviceID);
    err = AudioObjectGetPropertyData( kAudioObjectSystemObject, &address, 0, nil, &size, &deviceID );

    // get the input device stereo channels
    if ( ! err ) {
        address.mSelector = kAudioDevicePropertyPreferredChannelsForStereo;
        address.mScope = kAudioDevicePropertyScopeInput;
        address.mElement = kAudioObjectPropertyElementWildcard;
        size = sizeof(channels);
        err = AudioObjectGetPropertyData( deviceID, &address, 0, nil, &size, &channels );
    }

    // run some tests to see what channels might respond to volume changes
    if ( ! err ) {
        Boolean hasProperty;

        address.mSelector = kAudioDevicePropertyVolumeScalar;
        address.mScope = kAudioDevicePropertyScopeInput;

        // On my MacBook Pro using the default microphone input:
        address.mElement = kAudioObjectPropertyElementMaster;
        // returns false, no VolumeScalar property for the master channel
        hasProperty = AudioObjectHasProperty( deviceID, &address );

        address.mElement = channels[ 0 ];
        // returns true, channel 0 has a VolumeScalar property
        hasProperty = AudioObjectHasProperty( deviceID, &address );

        address.mElement = channels[ 1 ];
        // returns true, channel 1 has a VolumeScalar property
        hasProperty = AudioObjectHasProperty( deviceID, &address );
    }

    // try to get the input volume
    if ( ! err ) {
        address.mSelector = kAudioDevicePropertyVolumeScalar;
        address.mScope = kAudioDevicePropertyScopeInput;

        size = sizeof(volume);
        address.mElement = kAudioObjectPropertyElementMaster;
        // returns an error which we expect since it reported not having the property
        err = AudioObjectGetPropertyData( deviceID, &address, 0, nil, &size, &volume );

        size = sizeof(volume);
        address.mElement = channels[ 0 ];
        // returns noErr, but says the volume is always zero (weird)
        err = AudioObjectGetPropertyData( deviceID, &address, 0, nil, &size, &volume );

        size = sizeof(volume);
        address.mElement = channels[ 1 ];
        // returns noErr, but returns the correct volume!
        err = AudioObjectGetPropertyData( deviceID, &address, 0, nil, &size, &volume );
    }

    // try to set the input volume
    if ( ! err ) {
        address.mSelector = kAudioDevicePropertyVolumeScalar;
        address.mScope = kAudioDevicePropertyScopeInput;

        size = sizeof(volume);
        if ( toVolume < 0.0 ) volume = 0.0;
        else if ( toVolume > 1.0 ) volume = 1.0;
        else volume = toVolume;

        address.mElement = kAudioObjectPropertyElementMaster;
        // returns an error which we expect since it reported not having the property
        err = AudioObjectSetPropertyData( deviceID, &address, 0, nil, size, &volume );

        address.mElement = channels[ 0 ];
        // returns noErr, but doesn't affect my input gain
        err = AudioObjectSetPropertyData( deviceID, &address, 0, nil, size, &volume );

        address.mElement = channels[ 1 ];
        // success! correctly sets the input device volume.
        err = AudioObjectSetPropertyData( deviceID, &address, 0, nil, size, &volume );
    }

    return err;
}
EDIT in response to your question, "How'd [I] figure this out?"
I've spent a lot of time using Apple's audio code over the last five or so years and I've developed some intuition/process when it comes to where and how to look for solutions. My business partner and I co-wrote the original iHeartRadio apps for the first-generation iPhone and a few other devices and one of my responsibilities on that project was the audio portion, specifically writing an AAC Shoutcast stream decoder/player for iOS. There weren't any docs or open-source examples at the time so it involved a lot of trial-and-error and I learned a ton.
At any rate, when I read your question and saw the bounty I figured this was just low-hanging fruit (i.e. you hadn't RTFM ;-). I wrote a few lines of code to set the volume property and when that didn't work I genuinely got interested.
Process-wise maybe you'll find this useful:
Once I knew it wasn't a straightforward answer I started thinking about how to solve the problem. I knew the Sound System Preference lets you set the input gain so I started by disassembling it with otool to see whether Apple was making use of old or new Audio Toolbox routines (new as it happens):
Try using:
otool -tV /System/Library/PreferencePanes/Sound.prefPane/Contents/MacOS/Sound | bbedit
then search for Audio to see what methods are called (if you don't have bbedit, which every Mac developer should IMO, dump it to a file and open in some other text editor).
I'm most familiar with the older, deprecated Audio Toolbox routines (three years to obsolescence in this industry) so I looked at some Technotes from Apple. They have one that shows how to get the default input device and set its volume using the newest CoreAudio methods but as you undoubtedly saw their code doesn't work properly (at least on my MBP).
Once I got to that point I fell back on the tried-and-true: Start googling for keywords that were likely to be involved (e.g. AudioObjectSetPropertyData, kAudioDevicePropertyVolumeScalar, etc.) looking for example usage.
One interesting thing I've found about CoreAudio and using the Apple Toolbox in general is that there is a lot of open-source code out there where people try various things (tons of pastebins and GoogleCode projects etc.). If you're willing to dig through a bunch of this code you'll typically either find the answer outright or get some very good ideas.
In my search the most relevant things I found were the Apple technote showing how to get the default input device and set the master input gain using the new Toolbox routines (even though it didn't work on my hardware), and I found some code that showed setting the gain by channel on an output device. Since input devices can be multichannel I figured this was the next logical thing to try.
Your question is really good because at least right now there is no correct documentation from Apple that shows how to do what you asked. It's goofy too because both channels report that they set the volume but clearly only one of them does (the input mic is a mono source so this isn't surprising, but I consider having a no-op channel and no documentation about it a bit of a bug on Apple).
This happens pretty consistently when you start dealing with Apple's cutting-edge technologies. You can do amazing things with their toolbox and it blows every other OS I've worked on out of the water but it doesn't take too long to get ahead of their documentation, particularly if you're trying to do anything moderately sophisticated.
If you ever decide to write a kernel driver for example you'll find the documentation on IOKit to be woefully inadequate. Ultimately you've got to get online and dig through source code, either other people's projects or the OS X source or both, and pretty soon you'll conclude as I have that the source is really the best place for answers (even though StackOverflow is pretty awesome).
Thanks for the points and good luck with your project :)