I am trying to scrape LinkedIn profiles and posts. I tried Selenium with WebDriver. It works perfectly, but after some attempts LinkedIn blocks the account or the IP address.
Then I tried tools like PhantomBuster and ScraperAPI; they scrape LinkedIn data without getting blocked. So how do these paid tools manage not to get blocked? Any ideas?
Commercial scrapers have many ways of scraping websites without getting blocked.
A few methods:
Change your user agent
Use rotating proxies
Use residential proxies
Captcha farms
Bypassing JS challenges
Throttle requests
Use API whenever possible
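Of the methods above, throttling is the one that is easy to show in a few lines and that also keeps a client polite toward the server. Below is a minimal sketch in Python, not a description of what any commercial tool actually does. The `Throttle` class, its parameter names, and the interval values are hypothetical choices made for illustration.

```python
import random
import time


class Throttle:
    """Enforce a minimum gap (plus optional random jitter) between calls.

    Hypothetical helper for illustration; not taken from any real scraper.
    """

    def __init__(self, min_interval, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.jitter = jitter              # extra random delay, 0..jitter seconds
        self._clock = clock               # injectable so it can be tested
        self._sleep = sleep
        self._next_allowed = None         # earliest time the next call may run

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
        # The request starts at now + delay; schedule the one after it.
        gap = self.min_interval + random.uniform(0.0, self.jitter)
        self._next_allowed = now + delay + gap
        return delay
```

To use it, create one instance, for example `Throttle(min_interval=2.0, jitter=1.0)`, and call `wait()` immediately before each request.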
P.S. I do not recommend breaking the ToS and bypassing LinkedIn's blocks.
My dev and I would like to implement the Google Picker on our website. It will allow visitors to upload files from their Google Drive to our website.
My dev is now trying to get API access for the Google Picker; however, Google is asking for a "demo video that showcases the process to request an OAuth token", and we are wondering how to make one when we don't yet have the API from Google.
We are doing all of this on the staging site, and we are wondering how we are supposed to make this demo video when the API has not been provided or installed.
Please enlighten us, thank you!
See the question "How can I make sure the verification process is as streamlined as possible?" in the FAQ. It explains what the verification team is looking for with the video. Mostly it's just about showing how your product uses OAuth and the various APIs: in your case, how it asks for access to Drive, how the picker is used, etc. You're showing the integration from the user perspective.
My app uses a Google API and it worked well for a long time, but recently Google blocked my server's IP address for about an hour, and every API response was as follows: "Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot..." The response was HTML with a CAPTCHA form field attached.
My app is itself an API, so the CAPTCHA cannot be solved: its responses are in JSON format, and HTML inside JSON cannot be rendered.
Only app users send requests, so I have some control over the traffic. Of course, I limit the number of requests per user, but that was not the issue here. The limits in the Google console are also fine.
How can I prevent this from happening in the future? Is there any way I can ask Google directly? Have you experienced this?
I am trying to authenticate a user inside a desktop application using the Web API. I am not using a browser; I am making plain GET and POST calls to the endpoints of the Spotify servers. I immediately ran into some problems. It appears that the response to the initial GET request to "accounts.spotify.com" includes HTML with a JavaScript function that runs and dynamically generates the HTML you see on the initial login page. If you look at the JavaScript function, it is clear that this is what is going on; however, you can also see that this code is obfuscated and not meant to be used by us, the developers! (Link to the JavaScript code for reference: JavaScript function)
So my question is, while I can probably reverse engineer the code to get this working, would this be against the Spotify developer TOS?
Thanks!
Spotify's authentication happens through OAuth, and a big part of user authentication as per the OAuth RFC is where the user delegates permissions to your app to carry out API calls that affect their account or return information about them. That's the web page you're seeing: it must be presented to your users so that they can delegate permissions and Spotify can give your app an access token. It doesn't necessarily need to happen in a browser (it can happen in a web view inside your desktop application), but it does need to be loaded over HTTPS, and your application must not alter or reverse engineer the Spotify permissions delegation page.
As you correctly guessed, reverse engineering any Spotify APIs is against terms of service.
For more information on authorization on the Spotify platform, I'd recommend having a look at this guide.
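As a rough illustration of the flow described above, a desktop app would build the authorization URL and open it unmodified in a browser or web view, rather than fetching and parsing the login page itself. The following is a minimal sketch in Python, assuming the authorization code flow; the function name `build_authorize_url` is hypothetical, and the client ID, redirect URI, and scope in the usage note are placeholders.

```python
import secrets
from urllib.parse import urlencode

# Spotify's authorization endpoint, which must be loaded over HTTPS.
AUTHORIZE_URL = "https://accounts.spotify.com/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, state=None):
    """Return (url, state) for the page where the user grants permissions.

    Hypothetical helper for illustration. The caller should keep `state`
    and compare it with the value returned to the redirect URI.
    """
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "response_type": "code",       # authorization code flow (assumed)
        "redirect_uri": redirect_uri,  # must match the app's registered URI
        "scope": " ".join(scopes),     # scopes are space-separated
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state
```

For example, `build_authorize_url("my-client-id", "http://127.0.0.1:8888/callback", ["user-read-email"])` returns a URL that can be handed to `webbrowser.open` or to the application's web view.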
Hope that helps! Please ping me if you have any more questions.
Hugh
Spotify Developer Support
I have a YouTube video set to private so nobody can watch it via YouTube or the embedded player. However, I do want people to be able to watch it on my website. The goal is to make the video available exclusively on my website for a while before I open it to the world. I was thinking of logging in to my YouTube account seamlessly using YouTube's API and logging out after the video has finished, but that doesn't make sense from a security standpoint. What's your take on that?
I agree with your intuition. Making the private sharing secure seems tricky at best. Although the Data API has procedural authentication options, I don't think the Player API has that facility. Furthermore, even if it did, it's hard to see how it might work without exposing your password.
Your best bet is probably to host the video directly on your website. You would use your website's authentication to restrict access to the limited-release video. Then, when you're ready for the public release, you can either switch to YouTube hosting or relax the authentication on your self-hosted copy. The Video for Everybody site has examples of several options for self-hosting videos.
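The access rule for a self-hosted video can be sketched as a small check that the site runs before serving the file. This is only an illustration in Python under assumed names: `may_stream`, `is_authenticated`, and `public_release_at` are hypothetical, and how a visitor is authenticated is left to whatever login system the website already has.

```python
from datetime import datetime, timezone


def may_stream(is_authenticated, public_release_at, now=None):
    """Return True if the visitor may be served the video file.

    Hypothetical helper: before the public release only logged-in
    visitors may watch; afterwards the restriction is relaxed.
    """
    now = now or datetime.now(timezone.utc)
    if now >= public_release_at:
        return True          # public release: no login required
    return is_authenticated  # exclusive window: logged-in visitors only
```

The request handler that serves the video would call this check first and respond with a login page or an error when it returns `False`.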
I'm working on redesigning an older site and I'm upgrading it to HTML5. The site currently has both Google and Yahoo site verification META tags.
I'm using HTML5 Reset as a starting template and it only has an area for Google site verification, not Yahoo. Also, the W3C Validator validates Google site verification in HTML5, but not Yahoo.
Does anyone have an opinion as to whether or not Yahoo site verification is important or useful? And does it hold any weight with SEO these days?
No.
Yahoo search is now powered by Bing.
You can remove both (Google and Bing) verification references and authenticate using other methods (DNS is best). Bing only offers meta tag and XML methods.