What does UMA refer to in Chrome?

I saw related words like 'UMA opt-in users' while browsing the Chromium code. Can anyone help me understand 'UMA' or 'safebrowsing'?

UMA (User Metrics Analysis) refers to user metrics that are reported to help make Chrome/Chromium better, e.g. latency metrics, HTML/CSS feature usage, etc. These are the bits that are sent back to Google when you check the "Help make Google Chrome better by automatically sending usage statistics..." box on installation.
Safe Browsing is the Chrome feature that checks page loads for phishing and malware.

Related

How to bypass Captcha while Web Scraping

I am trying to scrape the car details from this site using Selenium: https://www.autoscout24.ch/de/autos/alle-marken?vehtyp=10
Approximately every 30 pages I have to verify that I am not a robot, even though I have included the following in my code:
driver.implicitly_wait(20)
Is there any way to overcome this?
That is exactly what CAPTCHA is meant for. There is no correlation between the CAPTCHA going away and the use of waits in a Selenium script; the purpose of CAPTCHA is to detect bots/automated systems crawling the web page.
Unless you can disable it, I don't think automating your way past it is the right approach. You may find some tutorials on the web on how to overcome it, but they are very patchy and do not cover all the use cases.
Two options come to mind on how to solve your issue; which one you choose depends on what you need.
Option 1 is cheaper and probably easier: make your script pause when the CAPTCHA is detected and play a sound when it's shown, so you can solve the CAPTCHA manually yourself; after the CAPTCHA has been dealt with, let the script continue doing its thing (see the sketch below).
The second option would be to use a CAPTCHA-solving service; you would need to pay a little, but you would not need to do anything manually.
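As a rough sketch of the first option, assuming the Node selenium-webdriver bindings (the question uses Python, but the idea is identical with any binding) and assuming the CAPTCHA appears in an iframe whose src contains "captcha" (a selector you would need to verify on the real page):

const { Builder, By } = require('selenium-webdriver');

async function waitForHumanIfCaptcha(driver) {
  // Hypothetical selector: adjust it to whatever the real CAPTCHA element is.
  const captchaSelector = By.css('iframe[src*="captcha"]');
  const frames = await driver.findElements(captchaSelector);
  if (frames.length === 0) return;              // no CAPTCHA, keep scraping
  process.stdout.write('\x07');                 // terminal bell: "please solve it"
  // Block for up to 10 minutes until the CAPTCHA element is gone,
  // i.e. until a human has solved it in the visible browser window.
  await driver.wait(async () => {
    const stillThere = await driver.findElements(captchaSelector);
    return stillThere.length === 0;
  }, 10 * 60 * 1000);
}

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://www.autoscout24.ch/de/autos/alle-marken?vehtyp=10');
    // ...scrape a page, then before moving on to the next one:
    await waitForHumanIfCaptcha(driver);
  } finally {
    await driver.quit();
  }
})();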
I'm not a robot
The "I'm not a robot" checkbox, commonly known as reCAPTCHA v2, is one of the security measures in practice for implementing challenge-response authentication. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) mainly helps protect applications and systems from spam and password decryption by asking the visitor to complete a simple test that proves a human, and not a computer, is trying to access a password-protected account. In short, CAPTCHA is implemented to help prevent unauthorized account access.
So neither wait mechanism, implicit wait or explicit wait, will be of any help in avoiding the CAPTCHA.
Solution
An ideal approach would be to disable the CAPTCHA for the AUT (Application Under Test) within the testing/staging environment and enable it only in the production environment.
References
You can find a couple of relevant detailed discussions in:
How does reCAPTCHA 3 know I'm using Selenium/chromedriver?
How can I bypass the Google CAPTCHA with Selenium and Python?

Google Pay on the web with 3DS

I'm trying to implement Google Pay on the web following this example: https://developers.google.com/pay/api/web/guides/paymentrequest/tutorial.
When I remove "PAN_ONLY" from the example code, the button becomes invisible both on my PC and on my smartphone.
Under what conditions will payment be available with the "CRYPTOGRAM_3DS" authentication method?
I'm using the latest version of Google Chrome.
Tokenized cards are only available from your Android device today, where the means exist to securely authenticate the transaction and to store sensitive information related to your forms of payment.
There are initiatives in the industry aiming to add security measures and second factors to the web; these are expected to help bring tokenization to the web too.
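For reference, the relevant part of the tutorial's card payment method definition looks roughly like this (the card networks shown are just examples). With only CRYPTOGRAM_3DS left in allowedAuthMethods, the button can only be shown where a tokenized, device-bound card is available, which today mostly means Android devices:

const baseCardPaymentMethod = {
  type: 'CARD',
  parameters: {
    // PAN_ONLY: cards on file with the Google account, usable on desktop as well.
    // CRYPTOGRAM_3DS: device tokens, e.g. cards added to Google Pay on an Android device.
    allowedAuthMethods: ['PAN_ONLY', 'CRYPTOGRAM_3DS'],
    allowedCardNetworks: ['MASTERCARD', 'VISA']
  }
};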

Instagram Automation without API allowed?

My two partners and I are about to create software that automates liking, commenting and following on Instagram by means of browser simulation (that is, we log into the user's account through a browser such as Google Chrome).
Is that kind of automation allowed by Instagram? And if not, is there a possibility to get approved?
Yes, it's against their terms. I wouldn't bother, nor risk it. Instagram is actively suing bot services; look at the biggest bot service, Instagress, which mysteriously shut down entirely.
They're also penalizing accounts that use bots. I run an agency and have seen my clients' engagement mysteriously drop by 50-90% for a seemingly endless amount of time after using bots.
I imagine the purpose of doing it with "browser simulation" like Chrome is to try to avoid detection? Good luck. Instagram is smart and of course has some of the best programmers in the world who know how to combat this type of stuff.
I would say that such an operation goes against Instagram's terms of use. Under "General Description", section 10:
We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).
Since you will be accessing content (and performing actions) via automated means, I would interpret that as a violation of this section.

Tracking Interactive PDF Clicks

I came across a curious question today, asked by my boss: is it possible to track the clicks to pages inside an interactive PDF without it being embedded in a web page?
The client wants the user to download a PDF from his/her website and track what pages the user is clicking on inside the downloaded PDF.
After searching around on Google for a while, all I kept finding were pages explaining how to track PDF downloads.
Anyone who can shed some light on this or offer a definitive yes or no would be greatly appreciated.
The JavaScript for Acrobat API Reference makes note of this event (page 368 of the API reference):
Page/Open (available since version 4.05)
This event occurs whenever a new page is viewed by the user and after page drawing for the page has occurred.
The target for this event is the Doc.
This event does not listen to the rc return code.
This would imply to me that you can hook this event and (assuming the end user permits the communication) send info to your web server every time they change pages.
Obviously this is limited to when the user is reading in Acrobat (Reader or Professional); it will not work if they are reading directly in Chrome or Firefox. And to re-emphasize, Acrobat will prompt the user to ask if it is allowed to communicate with an external website. If the user denies it, no tracking.
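A rough sketch of what such a hook could look like, written as a document-level script added with Acrobat Pro. The tracking URL is hypothetical, and whether the request actually goes out depends on the viewer's JavaScript and security settings (as noted above, Acrobat prompts the user first):

// Register an "Open" action on every page that reports the current page number.
// this.submitForm() is the classic way for a PDF to contact a web server.
for (var p = 0; p < this.numPages; p++) {
  this.setPageAction(p, "Open",
    'this.submitForm({ cURL: "https://example.com/track?page=" + this.pageNum, cSubmitAs: "HTML" });');
}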
As has already been stated, the Page/Open event would be the hook for tracking pages. But since this only works in viewers with (Acrobat) JavaScript enabled, which at that moment also have an internet connection available, those stats would be suboptimal.
We also have to point out that this kind of tracking is highly questionable from a privacy point of view (in Europe, it may even be illegal).
A somewhat less questionable approach would be to split the document into single pages, add navigation links between them, and then use the server's access statistics.

Reliably detecting PhantomJS-based spam bots

Is there any way to consistently detect PhantomJS/CasperJS? I've been dealing with a spate of malicious spam bots built with it and have been able to mostly block them based on certain behaviours, but I'm curious whether there's a rock-solid way to know if CasperJS is in use, as dealing with constant adaptations gets slightly annoying.
I don't believe in using Captchas. They are a negative user experience and ReCaptcha has never worked to block spam on my MediaWiki installations. As our site has no user registrations (anonymous discussion board), we'd need to have a Captcha entry for every post. We get several thousand legitimate posts a day and a Captcha would see that number divebomb.
I very much share your take on CAPTCHA. I'll list what I have been able to detect so far for my own detection script, which has similar goals. It's only partial, as there are many more headless browsers.
It is fairly safe to use the following exposed window properties to detect/assume these particular headless browsers:
window._phantom (or window.callPhantom) //phantomjs
window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
window.Buffer //nodejs
window.emit //couchjs
window.spawn //rhino
The above is gathered from the JSLint documentation and from testing with PhantomJS.
Browser automation drivers (used by BrowserStack or other web capture services for snapshot):
window.webdriver //selenium
window.domAutomation (or window.domAutomationController) //chromium based automation driver
The properties are not always exposed, and I am looking into other, more robust ways to detect such bots, which I'll probably release as a full-blown script when done. But that mainly answers your question.
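Putting the properties above together, a minimal consolidation could look like the sketch below; any hit is a signal rather than proof, since a bot can simply delete these globals:

function detectAutomationGlobals() {
  var suspects = [
    '_phantom', 'callPhantom',                  // PhantomJS
    '__phantomas',                              // PhantomJS-based perf/monitoring tool
    'Buffer',                                   // Node.js
    'emit',                                     // CouchJS
    'spawn',                                    // Rhino
    'webdriver',                                // Selenium
    'domAutomation', 'domAutomationController'  // Chromium-based automation drivers
  ];
  return suspects.filter(function (name) { return name in window; });
}
// e.g. under PhantomJS this typically returns ["_phantom", "callPhantom"].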
Here is another fairly sound method to detect JS-capable headless browsers more broadly:
if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
This should work well because these properties default to 0 even if a virtual viewport size is set by the headless browser; by default it cannot report the size of a browser window that doesn't exist. In particular, PhantomJS doesn't support outerWidth or outerHeight.
ADDENDUM: There is, however, a Chrome/Blink bug with the outer/inner dimensions: Chromium does not report those dimensions when a page loads in a hidden tab, such as when restored from a previous session. Safari doesn't seem to have that issue.
Update: It turns out iOS Safari 8+ has a bug where outerWidth and outerHeight are 0, and a Sailfish WebView can report 0 too. So while it's a signal, it can't be used alone without being mindful of these bugs. Hence the warning: please don't use this raw snippet unless you really know what you are doing.
PS: If you know of other headless browser properties not listed here, please share in comments.
There is no rock-solid way: PhantomJS and Selenium are just software used to control browser software in place of a user.
With PhantomJS 1.x in particular, I believe there is some JavaScript you can use to crash the browser by exploiting a bug in the version of WebKit being used (it is equivalent to Chrome 13, so very few genuine users should be affected). (I remember this being mentioned on the Phantom mailing list a few months back, but I don't know if the exact JS to use was described.) More generally, you could use a combination of user-agent matching and feature detection: e.g. if a browser claims to be "Chrome 23" but does not have a feature that Chrome 23 has (and that Chrome 13 did not have), then get suspicious.
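A sketch of that user-agent/feature cross-check follows; the claimed version is read from the UA string, and the particular feature and version threshold used here are illustrative assumptions you would want to verify against real release notes:

function uaLooksInconsistent() {
  var match = navigator.userAgent.match(/Chrome\/(\d+)/);
  if (!match) return false;                        // not claiming to be Chrome at all
  var claimedMajor = parseInt(match[1], 10);
  // Old WebKit builds (like the one in PhantomJS 1.x) lack APIs that any
  // genuinely recent Chrome ships, e.g. unprefixed requestAnimationFrame.
  if (claimedMajor >= 30 && !('requestAnimationFrame' in window)) return true;
  return false;
}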
As a user, I hate CAPTCHAs too. But they are quite effective in that they increase the cost for the spammer: he has to write more software or hire humans to read them. (That is why I think easy CAPTCHAs are good enough: the ones that annoy users are those where you have no idea what it says and have to keep pressing reload to get something you recognize.)
One approach (which I believe Google uses) is to show the CAPTCHA conditionally. E.g. users who are logged in never get shown it. Users who have already made one post this session are not shown it again. Users from IP addresses on a whitelist (which could be built from previous legitimate posts) are not shown them. Or, conversely, only show them to users from a blacklist of IP ranges.
I know none of those approaches are perfect, sorry.
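A minimal sketch of that conditional gating, written framework-agnostically; the request/session shape and the whitelist source are assumptions to adapt to whatever your server exposes:

var trustedIps = new Set();   // e.g. filled from the IPs of previous legitimate posts

function needsCaptcha(request, session) {
  if (session.loggedIn) return false;             // logged-in users never see it
  if (session.captchaSolvedAt) return false;      // already solved one this session
  if (trustedIps.has(request.ip)) return false;   // whitelisted addresses skip it
  return true;                                    // everyone else gets the CAPTCHA
}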
You could detect Phantom on the client side by checking the window.callPhantom property. The minimal client-side script is:
var isPhantom = !!window.callPhantom;
Here is a gist with proof of concept that this works.
A spammer could try to delete this property with page.evaluate, and then it depends on who is faster. After you have run the detection, you reload the page with the post form, with or without a CAPTCHA depending on the detection result.
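To make that concrete, the detection result has to reach the server somehow, for instance as a simple image beacon (endpoint and parameter names here are hypothetical); the server can then decide whether the reloaded post form includes a CAPTCHA:

var suspected = !!window.callPhantom || !!window._phantom;
// Report the result; the server stores it in the session and serves the
// post form with or without a CAPTCHA on the next load.
var beacon = new Image();
beacon.src = '/bot-check?phantom=' + (suspected ? 1 : 0) + '&t=' + Date.now();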
The problem is that you incur a redirect that might annoy your users. This will be necessary with every client-side detection technique, and such a technique can itself be subverted and changed with onResourceRequested.
Generally, I don't think that this is possible in a reliable way, because you can only detect on the client and send the result to the server. Combining the CAPTCHA with the detection step in only one page load does not really add anything, as it could be removed just as easily with PhantomJS/CasperJS. Defense based on the user agent also doesn't make sense, since it can easily be changed in PhantomJS/CasperJS.