Simulating mouse-down event on another window from CGWindowListCreate - objective-c

You can list the currently open windows using CGWindowListCreate, e.g.:
CFArrayRef windowListArray = CGWindowListCreate(kCGWindowListOptionOnScreenOnly|kCGWindowListExcludeDesktopElements, kCGNullWindowID);
NSArray *windows = CFBridgingRelease(CGWindowListCreateDescriptionFromArray(windowListArray));
// stuff here with windows array
CFRelease(windowListArray);
With this you can get a specific window, e.g. a window from Chrome that's somewhere in the background of the current workspace, but not minimized. I also found that you can simulate mouse-clicks anywhere on-screen using CGEventCreateMouseEvent:
CGEventRef click_down = CGEventCreateMouseEvent(NULL, kCGEventLeftMouseDown, point, kCGMouseButtonLeft);
CGEventPost(kCGHIDEventTap, click_down);
Instead of sending this event to the frontmost window under this point on the screen, can I send it to a window in the background? Or is the only way to temporarily switch to that window (bring it to the front), click, and then switch back to the previously frontmost window?
This post suggests the latter is possible:
Cocoa switch focus to application and then switch it back
However, I'm very interested in whether this can be avoided: can we send mouse clicks directly to a specific window in the background without bringing it into view? I can consider any Objective-C, C, or C++ options for this.

Finally got this working after finding an alternative to the deprecated GetPSNForPID function:
NSEvent *customEvent = [NSEvent mouseEventWithType:NSEventTypeLeftMouseDown
    location:point
    modifierFlags:NSEventModifierFlagCommand
    timestamp:[NSDate timeIntervalSinceReferenceDate]
    windowNumber:[self.windowID intValue]
    context:nil
    eventNumber:0
    clickCount:1
    pressure:0];
// PID is the process ID (pid_t) of the application that owns the target window
CGEventRef cgEvent = [customEvent CGEvent];
CGEventPostToPid(PID, cgEvent);
Didn't realise there was a CGEventPostToPid function that can be used instead of CGEventPostToPSN. No need for the ProcessSerialNumber struct.
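As a rough sketch of how the pieces above might fit together, the snippet below looks up the process that owns a window with a given title and posts a mouse-down/mouse-up pair to it with CGEventPostToPid. It builds the events with CGEventCreateMouseEvent instead of NSEvent, which is a variation on the code above rather than a copy of it. The helper names (FindWindowOwnerPID, PostClickToPid), the window title, and the click point are all hypothetical. Matching by title is also an assumption: titles of other applications' windows may be unavailable without the screen recording permission on recent macOS versions, and whether a given application reacts to events delivered this way while in the background is not guaranteed.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper: returns the PID that owns the first on-screen window
// whose title equals `title`, or -1 if no such window is found.
static pid_t FindWindowOwnerPID(NSString *title) {
    NSArray<NSDictionary *> *infos = CFBridgingRelease(
        CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly |
                                   kCGWindowListExcludeDesktopElements,
                                   kCGNullWindowID));
    for (NSDictionary *info in infos) {
        NSString *name = info[(__bridge NSString *)kCGWindowName];
        if ([name isEqualToString:title]) {
            return [info[(__bridge NSString *)kCGWindowOwnerPID] intValue];
        }
    }
    return -1;
}

// Hypothetical helper: posts a left mouse-down/mouse-up pair to one process.
// `point` is in global display coordinates (origin at the top-left corner).
static BOOL PostClickToPid(pid_t pid, CGPoint point) {
    CGEventRef down = CGEventCreateMouseEvent(NULL, kCGEventLeftMouseDown,
                                              point, kCGMouseButtonLeft);
    CGEventRef up = CGEventCreateMouseEvent(NULL, kCGEventLeftMouseUp,
                                            point, kCGMouseButtonLeft);
    BOOL ok = (down != NULL && up != NULL);
    if (ok) {
        CGEventPostToPid(pid, down);
        CGEventPostToPid(pid, up);
    }
    // Events follow the Create rule, so release them after posting
    if (down) CFRelease(down);
    if (up) CFRelease(up);
    return ok;
}

int main(void) {
    @autoreleasepool {
        pid_t pid = FindWindowOwnerPID(@"Some Window...");  // placeholder title
        if (pid > 0) {
            PostClickToPid(pid, CGPointMake(400, 300));     // placeholder point
        } else {
            NSLog(@"No matching window found");
        }
    }
    return 0;
}
```

Posting both the down and the up event matters in practice, since many controls only act once the button is released.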

Related

Intercept keydown events to NSWindow given to a plugin

I am writing an audio plugin (VST) in Objective-C on OSX. My plugin gets loaded into an application and is given an NSWindow in which to add my own NSView. I need to be able to intercept keyboard events on the NSWindow which I can partly do, but not fully.
Here's what I have tried so far:
Make sure my view is the first responder and handle keyDown events. This works for most keyDown events, but not carriage return or special keys like cut/copy/paste.
Use addLocalMonitorForEventsMatchingMask. This doesn't provide anything more useful than keyDown.
The NSWindow I'm given has a menu with some key equivalents for cut/copy/paste. I occasionally need to intercept these if the user has something selected in my NSView. I also occasionally need to intercept carriage return if the user is entering some data.
My UI is rendered using OpenGL so I'm not using standard Cocoa UI components apart from NSView to host my OpenGL surface.
I don't want the user to have to enable anything special to do this, like accessibility.
In my keyDown handler I have something like this:
- (void)keyDown:(NSEvent*)event
{
    NSString* s = event.charactersIgnoringModifiers;
    unichar modified_key = 0;
    if (s && [s length] > 0)
    {
        modified_key = [s characterAtIndex:0];
    }
    if (modified_key == NSCarriageReturnCharacter)
    {
        // carriage return
    }
}
This works in a standalone application, but fails when it's hosted as an audio plugin. I think the problem is that the application hosting the plugin is intercepting events before they reach my event handlers.
You seem to be handling the event in a strange way. Process the NSEvent keyCode directly. You can use modifierFlags to get the modifiers.
This won't work for certain "system" key combinations (like Command + Space) for which you will need accessibility access.
- (void)keyDown:(NSEvent *)event
{
    if (event.keyCode == 36)  // 36 is the Return key
        NSLog(@"you pressed return!");
    if (event.modifierFlags & NSEventModifierFlagCommand)
    {
        if (event.keyCode == 8)  // 8 is the C key
            NSLog(@"you pressed command+c!");
    }
}
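The key-code approach above can be sketched with named constants instead of bare numbers. The kVK_* constants come from the Carbon HIToolbox headers; the helper name ActionForKey and the action strings are hypothetical and only for illustration. Note that letter key codes such as kVK_ANSI_C describe a key position on an ANSI (US) keyboard, so on other layouts that position may produce a different character.

```objc
#import <Cocoa/Cocoa.h>
#import <Carbon/Carbon.h>  // kVK_* virtual key code constants

// Hypothetical helper: maps a key code plus modifier flags to an action name,
// or returns nil when the key press is not one we want to handle.
static NSString *ActionForKey(unsigned short keyCode, NSEventModifierFlags flags) {
    if (keyCode == kVK_Return) {          // 36, same check as in the answer
        return @"return";
    }
    if (flags & NSEventModifierFlagCommand) {
        switch (keyCode) {
            case kVK_ANSI_X: return @"cut";    // 7
            case kVK_ANSI_C: return @"copy";   // 8
            case kVK_ANSI_V: return @"paste";  // 9
            default: break;
        }
    }
    return nil;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", ActionForKey(kVK_Return, 0));
        NSLog(@"%@", ActionForKey(kVK_ANSI_C, NSEventModifierFlagCommand));
        NSLog(@"%@", ActionForKey(kVK_ANSI_C, 0));  // nil: plain C is not handled
    }
    return 0;
}
```

Inside a view, keyDown: could call ActionForKey(event.keyCode, event.modifierFlags) and pass the event to [super keyDown:event] when the result is nil, so unhandled keys still travel up the responder chain.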

Capture screenshot of macOS window

Note: this question is intentionally very general (e.g. both Objective-C and Swift code examples are requested), as it is intended to document how to capture a window screenshot on macOS as accessibly as possible.
I want to capture a screenshot of a macOS window in Objective-C/Swift code. I know this is possible because of the multitude of ways to take a screenshot on macOS (⇧⌘4, the Grab utility, screencapture on the command line, …), but I’m not sure how to do it in my own code. Ideally, I’d be able to specify a window of a particular application, and then capture it in an NSImage or CGImage that I could then process and display to the user or store in a file.
Screen capture on macOS is possible through Quartz Window Services, a facility of the Core Graphics framework. Our key function here is CGWindowListCreateImage, which “returns a composite image based on a dynamically generated list of windows,” or, in other words, finds windows based on specified criteria and creates an image with the contents of each. Perfect! Its declaration is as follows:
CGImageRef CGWindowListCreateImage(CGRect screenBounds,
CGWindowListOption listOption,
CGWindowID windowID,
CGWindowImageOption imageOption);
So, in order to capture one specific window on the screen, we’ll need its window ID (CGWindowID). To go about retrieving that, we’ll first need a list of all of the windows available on the system. We get that through CGWindowListCopyWindowInfo, which takes CGWindowListOptions and a corresponding CGWindowID that, together, select which windows to include in the resulting list. To get ALL the windows, we specify kCGWindowListOptionAll, and kCGNullWindowID, respectively. Also, if you haven’t figured it out already, this is a C API, so we’ll use a bridging cast to work with the friendlier Objective-C containers rather than the Core Foundation ones.
Objective-C:
NSArray<NSDictionary*> *windowInfoList = (__bridge_transfer id)
CGWindowListCopyWindowInfo(kCGWindowListOptionAll, kCGNullWindowID);
Swift:
let windowInfoList = CGWindowListCopyWindowInfo(.optionAll, kCGNullWindowID)!
as NSArray
From here, we need to filter our windowInfoList down to the specific window that we want. Chances are we want to filter first by application. To do that, we’ll need the process ID of our application of choice. We can use NSRunningApplication to accomplish this:
Objective-C:
NSArray<NSRunningApplication*> *apps =
    [NSRunningApplication runningApplicationsWithBundleIdentifier:
        /* Bundle ID of the application, e.g.: */ @"com.apple.Safari"];
if (apps.count == 0) {
    // Application is not currently running
    puts("The application is not running");
    return; // Or whatever
}
pid_t appPID = apps[0].processIdentifier;
Swift:
let apps = NSRunningApplication.runningApplications(withBundleIdentifier:
    /* Bundle ID of the application, e.g.: */ "com.apple.Safari")
if apps.isEmpty {
    // Application is not currently running
    print("The application is not running")
    return // Or whatever
}
let appPID = apps[0].processIdentifier
With appPID in hand, we can now go ahead and filter down our window info list to only windows with a matching owner PID:
Objective-C:
NSMutableArray<NSDictionary*> *appWindowsInfoList = [NSMutableArray new];
for (NSDictionary *info in windowInfoList) {
    if ([info[(__bridge NSString *)kCGWindowOwnerPID] integerValue] == appPID) {
        [appWindowsInfoList addObject:info];
    }
}
Swift:
var appWindowsInfoList = [NSDictionary]()
for info_ in windowInfoList {
    let info = info_ as! NSDictionary
    if (info[kCGWindowOwnerPID as NSString] as! NSNumber).intValue == appPID {
        appWindowsInfoList.append(info)
    }
}
We could have done additional filtering above by testing other keys of the info dictionary—for example, by name (kCGWindowName), or by whether the window is on-screen (kCGWindowIsOnscreen)—but for now, we’ll just take the first window in the list:
Objective-C:
NSDictionary *appWindowInfo = appWindowsInfoList[0];
CGWindowID windowID = [appWindowInfo[(__bridge NSString *)kCGWindowNumber] unsignedIntValue];
Swift:
let appWindowInfo: NSDictionary = appWindowsInfoList[0];
let windowID: CGWindowID = (appWindowInfo[kCGWindowNumber as NSString] as! NSNumber).uint32Value
And we have our window ID! Now, what else did we need for that call again?
CGImageRef CGWindowListCreateImage(CGRect screenBounds,
CGWindowListOption listOption,
CGWindowID windowID,
CGWindowImageOption imageOption);
First, we need a screenBounds to capture. According to the documentation, we can specify CGRectNull for this parameter to enclose all specified windows as tightly as possible. Works for me.
Second, we have to specify how we want to select our windows with listOption. We actually used one of these earlier, in our call to CGWindowListCopyWindowInfo, but there we wanted all the windows on the system; here, we only want one, so we’ll specify kCGWindowListOptionIncludingWindow, which, contrary to its documentation page, is meaningful on its own for CGWindowListCreateImage in that it specifies the window we pass, and only the window we pass.
Third, we pass our windowID as the window we want captured.
Fourth and finally, we can specify CGWindowImageOptions with the imageOption parameter. These affect the appearance of the resulting image; you can combine them through bitwise OR. The full list is here, but common ones include either kCGWindowImageDefault, which captures the window's contents along with its frame and shadow, or kCGWindowImageBoundsIgnoreFraming, which captures only the content, and kCGWindowImageBestResolution, which captures the window's content at the best resolution available, regardless of actual size (and, depending on the window, may be considerably large), or kCGWindowImageNominalResolution, which captures the window at its current size on the screen. Here, I’ve gone with kCGWindowImageBoundsIgnoreFraming and kCGWindowImageNominalResolution to capture only the content at the same size as on the screen.
Aaand, drumroll please:
Objective-C:
CGImageRef windowImage =
CGWindowListCreateImage(CGRectNull, kCGWindowListOptionIncludingWindow,
windowID, kCGWindowImageBoundsIgnoreFraming|
kCGWindowImageNominalResolution);
// NOTE: windowImage may be NULL if the capture failed
Swift:
let windowImage: CGImage? =
CGWindowListCreateImage(.null, .optionIncludingWindow, windowID,
[.boundsIgnoreFraming, .nominalResolution])
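The question also asks about getting an NSImage or storing the result in a file, which the walkthrough stops short of. A minimal sketch of that last step, assuming the windowImage from above: wrap the CGImage in an NSImage for display, or encode it as PNG through NSBitmapImageRep. The helper name PNGDataFromCGImage and the output path are hypothetical, and the 2x2 bitmap below is only a stand-in so the snippet runs without a captured window. In Objective-C the image returned by CGWindowListCreateImage follows the Create rule, so it should be released with CGImageRelease when no longer needed.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper: encodes a CGImage as PNG data, or returns nil if the
// image is NULL or encoding fails.
static NSData *PNGDataFromCGImage(CGImageRef image) {
    if (image == NULL) {
        return nil;
    }
    NSBitmapImageRep *rep = [[NSBitmapImageRep alloc] initWithCGImage:image];
    return [rep representationUsingType:NSBitmapImageFileTypePNG properties:@{}];
}

int main(void) {
    @autoreleasepool {
        // Stand-in for the captured windowImage: a blank 2x2 RGBA bitmap
        CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
        CGContextRef ctx = CGBitmapContextCreate(
            NULL, 2, 2, 8, 0, space,
            (CGBitmapInfo)kCGImageAlphaPremultipliedLast);
        CGImageRef windowImage = CGBitmapContextCreateImage(ctx);

        // For display: NSZeroSize makes NSImage use the image's own pixel size
        NSImage *nsImage = [[NSImage alloc] initWithCGImage:windowImage
                                                       size:NSZeroSize];
        NSLog(@"NSImage size: %@", NSStringFromSize(nsImage.size));

        // For storage: write PNG data to a file in the temporary directory
        NSData *png = PNGDataFromCGImage(windowImage);
        NSString *path = [NSTemporaryDirectory()
            stringByAppendingPathComponent:@"window.png"];
        BOOL written = [png writeToFile:path atomically:YES];
        NSLog(@"Wrote %@: %@", path, written ? @"yes" : @"no");

        CGImageRelease(windowImage);
        CGContextRelease(ctx);
        CGColorSpaceRelease(space);
    }
    return 0;
}
```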
Here's the Objective-C code without all the exposition. It captures a window of the current process, so there is no need to know a bundle ID ahead of time:
pid_t processID = [[NSProcessInfo processInfo] processIdentifier];
NSArray<NSDictionary*>* windowInfoList = (__bridge_transfer id) CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly, kCGNullWindowID);
CGWindowID windowID = kCGNullWindowID;
for (NSDictionary* info in windowInfoList) {
    pid_t thisProcess = [info[(__bridge NSString *)kCGWindowOwnerPID] intValue];
    if (thisProcess == processID) {
        windowID = [info[(__bridge NSString *)kCGWindowNumber] unsignedIntValue];
        break;
    }
}
CGImageRef screenCG = NULL;
if (windowID != kCGNullWindowID)
    screenCG = CGWindowListCreateImage(CGRectNull, kCGWindowListOptionIncludingWindow, windowID, kCGWindowImageBoundsIgnoreFraming);

CGWindowListCreate generates a hugely long list of windows

When I use CGWindowListCreate from Quartz Window Services, it generates a very long array of window IDs. I tried to turn on the option to exclude desktop elements, but I get a list of 30-40 windows even if there are only 3 or 4 of what I would call windows open.
Here is how I am doing it:
CGWindowListOption opt = 1 << 4;
CFArrayRef windowids =CGWindowListCreate(opt,kCGNullWindowID);
I am wondering what I am doing wrong that is causing this problem, and what I can do to fix it. I simply want the program to list windows created by applications, such as finder windows or browser windows, and not whatever else it is including. Thank you in advance for your help.
This will return every window, whether it is on screen or off screen. You should combine it with the option kCGWindowListOptionOnScreenOnly (and also use the named constant instead of hardcoding the value of the one you are using). It will look like this:
CGWindowListOption opt = kCGWindowListOptionOnScreenOnly | kCGWindowListExcludeDesktopElements;
CFArrayRef windowids = CGWindowListCreate(opt, kCGNullWindowID);
That is what I gathered from the docs anyway.
I discovered that one solution is to filter the window list to only those windows "below" the Dock (in terms of window layering).
The code below worked well for me. It fetches all on screen windows (excluding desktop elements). It extracts the window ID for the "Dock" window out of the list. Then fetches on screen windows again, filtering to only those windows "below" the Dock window.
// Fetch all on screen windows
CFArrayRef windowListArray = CGWindowListCreate(kCGWindowListOptionOnScreenOnly|kCGWindowListExcludeDesktopElements, kCGNullWindowID);
NSArray *windows = CFBridgingRelease(CGWindowListCreateDescriptionFromArray(windowListArray));
NSLog(@"All on screen windows: %@", windows);
// Find window ID of "Dock" window
NSNumber *dockWindowNumber = nil;
for (NSDictionary *window in windows) {
    if ([(NSString *)window[(__bridge NSString *)kCGWindowName] isEqualToString:@"Dock"]) {
        dockWindowNumber = window[(__bridge NSString *)kCGWindowNumber];
        break;
    }
}
NSLog(@"dockWindowNumber: %@", dockWindowNumber);
CFRelease(windowListArray);
if (dockWindowNumber) {
    // Fetch on screen windows again, filtering to those "below" the Dock window
    // This filters out all but the "standard" application windows
    windowListArray = CGWindowListCreate(kCGWindowListOptionOnScreenBelowWindow|kCGWindowListExcludeDesktopElements, [dockWindowNumber unsignedIntValue]);
    NSArray *windows = CFBridgingRelease(CGWindowListCreateDescriptionFromArray(windowListArray));
    NSLog(@"On screen application windows: %@", windows);
    // Release the second list as well to avoid leaking it
    CFRelease(windowListArray);
}
else {
    NSLog(@"Could not find Dock window description");
}
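A different filter that is sometimes used for the same goal, offered here as a sketch and not as part of either answer above: each window description carries a kCGWindowLayer entry, and ordinary application windows normally sit at layer 0, while the menu bar, the Dock, and similar system elements use other layers. The helper name NormalLayerWindows is hypothetical, and treating "layer 0" as "what a user would call a window" is an assumption that may still let through a few helper windows.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper: keeps only the window descriptions whose
// kCGWindowLayer value is 0 (the normal window level).
static NSArray<NSDictionary *> *NormalLayerWindows(NSArray<NSDictionary *> *infos) {
    NSMutableArray<NSDictionary *> *result = [NSMutableArray array];
    for (NSDictionary *info in infos) {
        NSNumber *layer = info[(__bridge NSString *)kCGWindowLayer];
        if (layer != nil && layer.integerValue == 0) {
            [result addObject:info];
        }
    }
    return result;
}

int main(void) {
    @autoreleasepool {
        NSArray<NSDictionary *> *all = CFBridgingRelease(
            CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly |
                                       kCGWindowListExcludeDesktopElements,
                                       kCGNullWindowID));
        for (NSDictionary *info in NormalLayerWindows(all)) {
            NSLog(@"%@ (window %@)",
                  info[(__bridge NSString *)kCGWindowOwnerName],
                  info[(__bridge NSString *)kCGWindowNumber]);
        }
    }
    return 0;
}
```

Logging the layer and owner name of every entry in the unfiltered list is also a quick way to see what the 30-40 unexpected "windows" actually are.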

CGEvent NSKeyDown only working if app is front most?

I am trying to automate some tasks (there's no AppleScript support), so I have to create CGEvents and post them. Mouse clicking works fine, but NSKeyDown (Enter) only works if I click on the app in the Dock (which makes it the frontmost app)...
Here's my code so far:
for (NSDictionary *dict in windowList) {
    NSLog(@"%@", dict);
    if ([[dict objectForKey:@"kCGWindowName"] isEqualToString:@"Some Window..."]) {
        WIDK = [[dict objectForKey:@"kCGWindowNumber"] intValue];
        break;
    }
}
CGEventRef CGEvent;
NSEvent *customEvent;
customEvent = [NSEvent keyEventWithType:NSKeyDown
    location:NSZeroPoint
    modifierFlags:0
    timestamp:1
    windowNumber:WIDK
    context:nil
    characters:nil
    charactersIgnoringModifiers:nil
    isARepeat:NO
    keyCode:36];
CGEvent = [customEvent CGEvent];
for (int i=0; i <5; i++) {
    sleep(3);
    CGEventPostToPSN(&psn, CGEvent);
    NSLog(@"posted the event");
}
CFRelease(eOne);
The call to CGEventPostToPSN is in a loop only for testing purposes (I just need to send the event once). While the program is in the loop, if I bring the target app to the front, the event works fine.
What am I doing wrong? Is there a way to make it work while the app is in the background? Thanks.
Here is a better way to post keyboard events to the front-most application:
CGEventRef a = CGEventCreateKeyboardEvent(NULL, 124, true);
CGEventRef b = CGEventCreateKeyboardEvent(NULL, 124, false);
CGEventPost(kCGHIDEventTap, a);
CGEventPost(kCGHIDEventTap, b);
CGEventPost lets you determine where the event is posted. I used this code recently to allow someone to remotely control a PPT presentation on my laptop. (Key code 124 is the right arrow key.)
Note that you should release each event (with CFRelease) after posting it.
You can send to a specific app (e.g. one that is not the front app) by using CGEventPostToPSN.
I think it's because you create an NSEvent using WIDK for the windowNumber parameter, and WIDK is related to your app only.
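The advice above about releasing events and targeting a specific application can be sketched as follows. This version uses CGEventPostToPid in place of CGEventPostToPSN, which is a substitution on my part rather than what the answers used, and it creates the events with CGEventCreateKeyboardEvent. The helper name PostKeyPressToPid and the TextEdit bundle identifier are only illustrative. Depending on the macOS version, the posting process may need to be granted permission in the system privacy settings, and whether a background application acts on the key press is up to that application.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper: posts a key-down/key-up pair for `keyCode` to the
// process `pid`, then releases both events.
static BOOL PostKeyPressToPid(pid_t pid, CGKeyCode keyCode) {
    CGEventRef down = CGEventCreateKeyboardEvent(NULL, keyCode, true);
    CGEventRef up = CGEventCreateKeyboardEvent(NULL, keyCode, false);
    BOOL ok = (down != NULL && up != NULL);
    if (ok) {
        CGEventPostToPid(pid, down);
        CGEventPostToPid(pid, up);
    }
    // Created events are owned by the caller and must be released
    if (down) CFRelease(down);
    if (up) CFRelease(up);
    return ok;
}

int main(void) {
    @autoreleasepool {
        // Example target only; substitute the bundle ID of the real target app
        NSArray<NSRunningApplication *> *apps = [NSRunningApplication
            runningApplicationsWithBundleIdentifier:@"com.apple.TextEdit"];
        if (apps.count == 0) {
            NSLog(@"Target application is not running");
            return 1;
        }
        // 36 is the key code for Return, as used in the question
        BOOL posted = PostKeyPressToPid(apps[0].processIdentifier, 36);
        NSLog(@"Posted: %@", posted ? @"yes" : @"no");
    }
    return 0;
}
```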

addGlobalMonitorForEventsMatchingMask only returning mouse position

I'm trying to learn to code for the Mac. I've been a Java guy for a while, so I hope the problem I'm running into is a simple misunderstanding of Cocoa.
I've got the following code:
-(IBAction)beginEventMonitor:(id)sender {
    _eventMonitor = [NSEvent addGlobalMonitorForEventsMatchingMask:(NSLeftMouseUpMask)
        handler:^(NSEvent *incomingEvent) {
            //NSWindow *targetWindowForEvent = [incomingEvent window];
            NSLog(@"Got a mouse click event at %@", NSStringFromPoint([incomingEvent locationInWindow]));
        }];
}
-(IBAction)stopEventMonitor:(id)sender {
    if (_eventMonitor) {
        [NSEvent removeMonitor:_eventMonitor];
        _eventMonitor = nil;
    }
}
This is a simple hook to tell me when a mouse click happens at a global level. The handler is working, but the contents of the incomingEvent don't seem to be set to anything. The only useful information that I can find is the location of the mouse at the time of the click, and the windowId of the window that was clicked in.
Shouldn't I be able to get more information? Am I not setting up the monitor correctly? I'd really like to be able to know which window was clicked in, but I can't even find a way to turn the mouse location or windowId into something useful.
You can retrieve more information about the window using the CGWindow APIs (new in Leopard), for example:
CGWindowID windowID = (CGWindowID)[incomingEvent windowNumber];
// The array stores window IDs as pointer-sized values, so widen the ID first
const void *values[] = { (const void *)(uintptr_t)windowID };
CFArrayRef a = CFArrayCreate(NULL, values, 1, NULL);
NSArray *windowInfos = (NSArray *)CGWindowListCreateDescriptionFromArray(a);
CFRelease(a);
if ([windowInfos count] > 0) {
    NSDictionary *windowInfo = [windowInfos objectAtIndex:0];
    NSLog(@"Name: %@", [windowInfo objectForKey:(NSString *)kCGWindowName]);
    NSLog(@"Owner: %@", [windowInfo objectForKey:(NSString *)kCGWindowOwnerName]);
    //etc.
}
// Manual reference counting: balances the Create call above
[windowInfos release];
There's lots of information there (look in CGWindow.h or refer to the docs for available keys). There are also functions to create screenshots of just one window (which even works if it's partially covered by another window), cool stuff!