How to track the location of a window belonging to another app - objective-c

When screen sharing a specific window on macOS with Zoom or Skype/Teams, they draw a red or green highlight border around that window (which belongs to a different application) to indicate it is being shared. The border is following the target window in real time, with resizing, z-order changes etc.
See example:
What macOS APIs and techniques might be used to achieve this effect?

You can find the location of windows using CGWindowListCopyWindowInfo and related API, which is available to Sandboxed apps.
This is a very fast and efficient API, fast enough to be polled. The SonOfGrab sample code is great platform to try out this stuff.
You can also install a global event tap using +[NSEvent addGlobalMonitorForEventsMatchingMask:handler:] (available in sandbox) to track mouse down, drag and mouse up events and then you can respond immediately whenever the user starts or releases a drag. This way your response will be snappy.
(Drawing a border would be done by creating your own transparent window, slightly larger than, and at the same window layer as, the window you are tracking. And then simply draw a pretty green box into it. I'm not exactly sure about setting the z-order. The details of this part would be best as a separate question.)

Related

Windows Desktop Manager Overlay

I like to make an overlay with the following properties:
should work at least on Windows 8.1
should be on top on everything, like a mouse cursor
should incorporate the pixels which are already on the background, like a blur filter
no flickering
Details to each of this points:
1) I assume that WDM is activated and DirectX 11.2 is used. Sure it would be nice to have it working on other Windows versions but this has no priority.
2) The problem is that with simply using the WS_EX_TOPMOST, menus from applications are over my overlay. In my case this really hurts as I like to display something with the same properties as a cursor. Imagine that a cursor suddenly is hidden if you open a menu -> unacceptable.
3) I like to read the pixels from the Windows desktop, including any effect Windows applies (like blur), and use this information for my filter. If I add my overlay, as described in 2, I should be able to get a fresh unobstructed copy of the background in the next frame and not read out my own overlay.
4) If I just write something into the Windows desktop directly, it gets overwritten immediately on the next frame by Windows itself. This is not acceptable.
One example of such an application is a magnifying glass, which exactly has all the properties I need. But for this case Windows 8.1 has an API. In contrast I like to write a program which displays a hand on the desktop (which is controlled by Leap Motion) which influences the Windows desktop, so you almost "feel" how you move your hand over the desktop.
If I write a tiny DirectX and/or OpenGL application for myself this is all very easy:
render all the regular stuff to a texture
use this texture for a post processing filter and add all my stuff on top of it
render just a quad to the back buffer
But I like to do that for the whole Windows desktop.
I found many different application, but they are to no use for me:
application which claim to be on top, are still behind menus. This normally doesn't really hurt, but is unacceptable for a cursor-alike thing
screen capturing programs which hook them self in all running programs are nice, but I want to hook myself into WDM
normally screen capturing programs do not draw anything into the back buffer, so they get every frame a new unobstructed back buffer
My questions can be boiled down to: How can I write my own magnifying glass for Windows 8.1.
I fear that my only serious option is to hook myself into WDM, what I try to avoid.
I'm happy to hear any idea how to achieve this, or hints to application which are doing what I describe.

General considerations for NUI/touch interface

For the past few months I've been looking into developing a Kinect based multitouch interface for a variety of software music synthesizers.
The overall strategy I've come up with is to create objects, either programatically or (if possible) algorithmically to represent various controls of the soft synth. These should have;
X position
Y position
Height
Width
MIDI output channel
MIDI data scaler (convert x-y coords to midi values)
2 strategies I've considered for agorithmic creation are XML description and somehow pulling stuff right off the screen (ie given a running program, find xycoords of all controls). I have no idea how to go about that second one, which is why I express it in such specific technical language ;). I could do some intermediate solution, like using mouse clicks on the corners of controls to generate an xml file. Another thing I could do, that I've seen frequently in flash apps, is to put the screen size into a variable and use math to build all interface objects in terms of screen size. Note that it isn't strictly necessary to make the objects the same size as onscreen controls, or to represent all onscreen objects (some are just indicators, not interactive controls)
Other considerations;
Given (for now) two sets of X/Y coords as input (left and right hands), what is my best option for using them? My first instinct is/was to create some kind of focus test, where if the x/y coords fall within the interface object's bounds that object becomes active, and then becomes inactive if they fall outside some other smaller bounds for some period of time. The cheap solution I found was to use the left hand as the pointer/selector and the right as a controller, but it seems like I can do more. I have a few gesture solutions (hidden markov chains) I could screw around with. Not that they'd be easy to get to work, exactly, but it's something I could see myself doing given sufficient incentive.
So, to summarize, the problem is
represent the interface (necessary because the default interface always expects mouse input)
select a control
manipulate it using two sets of x/y coords (rotary/continuous controller) or, in the case of switches, preferrably use a gesture to switch it without giving/taking focus.
Any comments, especially from people who have worked/are working in multitouch io/NUI, are greatly appreciated. Links to existing projects and/or some good reading material (books, sites, etc) would be a big help.
Woah lots of stuff here. I worked on lots of NUI stuff during my at Microsoft so let's see what we can do...
But first, I need to get this pet peeve out of the way: You say "Kinect based multitouch". That's just wrong. Kinect inherently has nothing to do with touch (which is why you have the "select a control" challenge). The types of UI consideration needed for touch, body tracking, and mouse are totally different. For example, in touch UI you have to be very careful about resizing things based on screen size/resolution/DPI... regardless of the screen, fingers are always the same physical size and people have the same degreee of physical accuracy so you want your buttons and similar controls to always be roughly the same physical size. Research has found 3/4 of an inch to be the sweet spot for touchscreen buttons. This isn't so much of a concern with Kinect though since you aren't directly touching anything - accuracy is dictated not by finger size but by sensor accuracy and users ability to precisely control finicky & lagging virtual cursors.
If you spend time playing with Kinect games, it quickly becomes clear that there are 4 interaction paradigms.
1) Pose-based commands. User strikes and holds a pose to invoke some application-wide or command (usually brining up a menu)
2) Hover buttons. User moves a virtual cursor over a button and holds still for a certain period of time to select the button
3) Swipe-based navigation and selection. User waves their hands in one direction to scroll and list and another direction to select from the list
4) Voice commands. User just speaks a command.
There are other mouse-like ideas that have been tried by hobbyists (havent seen these in an actual game) but frankly they suck: 1) using one hand for cursor and another hand to "click" where the cursor is or 2) using z-coordinate of the hand to determine whether to "click"
It's not clear to me whether you are asking about how to make some existing mouse widgets work with Kinect. If so, there are some projects on the web that will show you how to control the mouse with Kinect input but that's lame. It may sound super cool but you're really not at all taking advantage of what the device does best.
If I was building a music synthesizer, I would focus on approach #3 - swiping. Something like Dance Central. On the left side of the screen show a list of your MIDI controllers with some small visual indication of their status. Let the user swipe their left hand to scroll through and select a controller from this list. On the right side of the screen show how you are tracking the users right hand within some plane in front of their body. Now you're letting them use both hands at the same time, giving immediate visual feedback of how each hand is being interpretted, and not requiring them to be super precise.
ps... I'd also like to give a shout out to Josh Blake's upcomming NUI book. It's good stuff. If you really want to master this area, go order a copy :) http://www.manning.com/blake/

I want to animate the movement of a foreign OS X app's window

Background: I recently got two monitors and want a way to move the focused window to the other screen and vice versa. I've achieved this by using the Accessibility API. (Specifically, I get an AXUIElementRef that holds the AXUIElement associated with the focused window, then I set the NSAccessibilityPositionAttribute value to move the window.
I have this working almost exactly the way I want it to, except I want to animate the movement of windows. I thought that if I could get the NSWindow somehow, I could get its layer and use CoreAnimation to animate the window movement.
Unfortunately, I found out that this isn't possible. (Correct me I'm wrong though -- if there's a way to do it this way it'd be great!) So I'm asking you all for help. How should I go about animating the movement of the focused window, if I have access to the AXUIElementRef?
-R
--EDIT
I was able to get a crude animation going by creating a while loop and moving the position of the window by a small amount each time to make a successful animation. However, the results are pretty sub-par. As you can guess, it takes a lot of unnecessary processing power, and is still very choppy. There must be a better way.
The best possible way I can imagine would be to perform some hacky property comparison between the AXUIElement info values for the window and the info returned from the CGWindow api. Once you're able to ascertain what windows in the CGWindow API match AXUIElementRefs, you could grab bitmaps of the current window contents, overlay the screen with your own custom animation draw of the faux windows, then as you drop the overlay set the real AXUIElementRef's to the desired-end-animation positions.
Hacky, tho.

Extending Functionality of Magic Mouse: Do I Need a kext?

I recently purchased a Magic Mouse. It is fantastic and full of potential. Unfortunately, it is seriously hindered by the software support. I want to fix that. I have done quite a lot of research and these are my findings regarding the event chain thus far:
The Magic Mouse sends full multitouch events to the system.
Multitouch events are processed in the MultitouchSupport.framework (Carbon)
The events are interpreted in the framework and sent up to the system as normal events
When you scroll with one finger it sends actual scroll wheel events.
When you swipe with two fingers it sends a swipe event.
No NSTouch events are sent up to the system. You cannot use the NSTouch API to interact with the mouse.
After I discovered all of the above, I diassembled the MultitouchSupport.framework file and, with some googling, figured out how to insert a callback of my own into the chain so I would receive the raw touch event data. If you enumerate the list of devices, you can attach for each device (trackpad and mouse). This finding would enable us to create a framework for using multitouch on the mouse, but only in a single application. See my post here: Raw Multitouch Tracking.
I want to add new functionality to the mouse across the entire system, not just a single app.
In an attempt to do so, I figured out how to use Event Taps to see if the lowest level event tap would allow me to get the raw data, interpret it, and send up my own events in its place. Unfortunately, this is not the case. The event tap, even at the HID level, is still a step above where the input is being interpreted in MultitouchSupport.framework.
See my event tap attempt here: Event Tap - Attempt Raw Multitouch.
An interesting side note: when a multitouch event is received, such as a swipe, the default case is hit and prints out an event number of 29. The header shows 28 as being the max.
On to my question, now that you have all the information and have seen what I have tried: what would be the best approach to extending the functionality of the Magic Mouse? I know I need to insert something at a low enough level to get the input before it is processed and predefined events are dispatched. So, to boil it down to single sentence questions:
Is there some way to override the default callbacks used in MultitouchSupport.framework?
Do I need to write a kext and handle all the incoming data myself?
Is it possible to write a kext that sits on top of the kext that is handling the input now, and filters it after that kext has done all the hard work?
My first goal is to be able to dispatch a middle button click event if there are two fingers on the device when you click. Obviously there is far, far more that could be done, but this seems like a good thing to shoot for, for now.
Thanks in advance!
-Sastira
How does what is happening in MultitouchSupport.framework differ between the Magic Mouse and a glass trackpad? If it is based on IOKit device properties, I suspect you will need a KEXT that emulates a trackpad but actually communicates with the mouse. Apple have some documentation on Darwin kernel programming and kernel extensions specifically:
About Kernel Extensions
Introduction to I/O Kit Device Driver Design Guidelines
Kernel Programming Guide
(Personally, I'd love something that enabled pinch magnification and more swipe/button gestures; as it is, the Magic Mouse is a functional downgrade from the Mighty Mouse's four buttons and [albeit ever-clogging] 2D scroll wheel. Update: last year I wrote Sesamouse to do just that, and it does NOT need a kext (just a week or two staring at hex dumps :-) See my other answer for the deets and source code.)
Sorry I forgot to update this answer, but I ended up figuring out how to inject multitouch and gesture events into the system from userland via Quartz Event Services. I'm not sure how well it survived the Lion update, but you can check out the underlying source code at https://github.com/calftrail/Touch
It requires two hacks: using the private Multitouch framework to get the device input, and injecting undocumented CGEvent structures into Quartz Event Services. It was incredibly fun to figure out how to pull it off, but these days I recommend just buying a Magic Trackpad :-P
I've implemented a proof-of-concept of userspace customizable multi-touch events wrapper.
You can read about it here: http://aladino.dmi.unict.it/?a=multitouch (see in WaybackMachine)
--
all the best
If you get to that point, you may want to consider the middle click being three fingers on the mouse instead of two. I've thought about this middle click issue with the magic mouse and I notice that I often leave my 2nd finger on the mouse even though I am only pressing for a left click. So a "2 finger" click might be mistaken for a single left click, and it would also require the user more effort in always having to keep the 2nd finger off the mouse. Therefor if it's possible to detect, three fingers would cause less confusion and headaches. I wonder where the first "middle button click" solution will come from, as I am anxious for my middle click Expose feature to return :) Best of luck.

Mac OSX Overlay

How would I go about programming a HUD type overlay in OSX.
I want to be able to have an application that will display text at a certain point over a different application's window.
And thus if the (other applications) window moves the HUD part will stay at the same coordinate of the other window.
For the window itself, use a borderless, transparent window (plenty of examples) with your own custom view into which to draw your overlaid elements.
For the "other applications' windows" part, there's no public API that's going to let you do this smoothly. You use Universal Access and its window location/navigation API, but it requires your users to turn on "Enable access for assistive devices" (I think it still can't be done programmatically). I don't believe it "lets you know" when a window moves, but I could be wrong. If it does, it'd likely be a one-shot "here's where I am now", so your overlay would likely not keep up. I also don't think it gives you the "window level" to allow you to make sure you're "above" any given window/sheet/palette.
The only other option (to move with other apps' windows) is a system-wide, invasive hack a la Application Enhancer (which is quite controversial). It's easy to get this wrong and destabilize a user's system (hence the controversy).
You could use undocumented CoreGraphics Functions in order to track a window, see http://code.google.com/p/undocumented-goodness/source/browse/trunk/CoreGraphics/CGSPrivate.h