Should you add delays when scraping hidden APIs?

For most paid and official APIs, the common stance is that you can scrape as fast as you want as long as you pay for the amount of data you pull. Should the same apply to hidden APIs, or should you add delays to imitate human interaction (such as the wait for a page to load after clicking, say, a "More" button)?
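If you do decide to pace your requests, the usual approach is a randomized (jittered) delay between calls rather than a fixed interval, since a perfectly regular cadence is itself easy to flag. A minimal sketch, assuming a hypothetical paginated endpoint (the URL and 2-5 second range are placeholders, not anything from a real site):

```python
import json
import random
import time
from urllib.request import urlopen

# Hypothetical hidden endpoint; substitute the one you found in DevTools.
API_URL = "https://example.com/api/products?page={page}"

def jitter(lo: float = 2.0, hi: float = 5.0) -> float:
    """Pick a human-like randomized pause, mimicking the time a real
    user spends before clicking the next "More" button."""
    return random.uniform(lo, hi)

def fetch_page(page: int) -> dict:
    """Fetch one page of the hidden API, then pause before the next call."""
    with urlopen(API_URL.format(page=page), timeout=10) as resp:
        data = json.load(resp)
    time.sleep(jitter())  # jittered, not fixed, so the cadence looks organic
    return data
```

Whether this is necessary at all is exactly the open question above; the sketch only shows the mechanics.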

Related

Fastest way for webscraping a page implementing a shopping bot

My friend and I are trying to develop a shopping bot. It has to be as fast as possible, because the products might run out of stock in a matter of seconds. We have looked at different approaches, came up with Selenium, Scrapy and other Python libraries, and already have something working, but it seems very slow at the task at hand.
Instead of scraping the web page (selecting the product, adding to cart, etc.), we have thought of making a bot that just sends HTTP POST requests to the store's server with the product and the rest of the necessary information. We have read in other posts that this can be done with the requests library, but how can we find out what information an action requires and how many POST requests it involves? For example, clicking the "add to cart" button sends some POST request to the server, so how can we see what goes into that request in order to emulate it in our program?
We would also like the library to be able to scrape pages that rely on JavaScript, for example when clicking a button or selecting an item from a drop-down menu. Some libraries we tried (such as Scrapy) weren't able to do this.
Finally, we would appreciate suggestions for a different programming language that may have better libraries or simply execute faster. We both know Python and Java, but we are open to suggestions.
The fastest way would be through requests, using bs4 or regex to scrape the web page; this is what most "shopping bots" use. To make it even faster you could write the bot in Go or TypeScript, which are much faster than Python.

Shopify Trekkie loading extra tracking Pixels

This is by far the most frustrating issue I've run into with Shopify. I'm trying to optimize a client's site speed by wrapping up all their tracking codes into Google Tag Manager to reduce the total number of outgoing requests. I removed all hardcoded tracking pixels from theme.liquid and placed them in GTM, went through ALL the apps and sales channels and disconnected from accounts, but there are still extra codes being loaded by Trekkie.
I'm using the Shopify Facebook and Google Analytics integrations as recommended, so those are not represented in GTM. Even so, it's still somehow loading 2 Google Analytics, 2 Google Ads and 2 Facebook pixels.
As you can see in the source code, there are 2 Facebook pixel IDs contained within the Trekkie object, but how is this possible when there's only one place to add this information?
If I remove the Facebook pixel ID from this screen (Themes > Preferences), the first pixel no longer loads; only the second, unwanted pixel does. The same issue persists for Google Analytics and Google Ads, except I cannot see the duplicate account IDs in the source code, only in the Network tab of DevTools and in Google Tag Assistant.
I would typically assume that these codes must be in the theme code somewhere or an app or something, except I can actually see with DevTools that the code is being called by Trekkie.
This is driving me absolutely crazy and I've already spent lots of time trying to make what I thought should be a simple optimization. If anyone can help with this issue I'd be hugely appreciative.
Thanks!

Instagram Automation without API allowed?

My two partners and I are about to create a piece of software that automates liking, commenting and following on Instagram using browser simulation (meaning we log into the user's account through a browser such as Google Chrome).
Is that kind of automation allowed by Instagram? And if not, is there a possibility to get approved?
Yes it's against their terms. I wouldn't bother nor risk it. Instagram is actively suing bot services. Look at the biggest bot service, Instagress - mysteriously shut down entirely.
They're also penalizing accounts that use bots. I run an agency and have seen my clients' engagement mysteriously drop by 50-90% for a seemingly endless amount of time after using bots.
I imagine the purpose of doing it with "browser simulation" like Chrome is to try to avoid detection? Good luck. Instagram is smart and of course has some of the best programmers in the world who know how to combat this type of stuff.
I would say that such an operation goes against Instagram's terms of use. Under "General Description", section 10:
We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).
Since you will be accessing content (and performing actions) via automated means, I would interpret that as a violation of this section.

google home reading from website

I'm currently working on a project where my main focus is to create an Action for Google Home which can be invoked and asked to read out some articles (chosen previously from a list, also by voice) from a particular website.
I was wondering whether this is possible, or whether there are already similar projects.
What I'd like to do is something like the feature in Pocket or Instapaper, where you can make the device read the article for you.
I have also thought about building a database of all the articles I'm interested in that updates itself whenever a new article is posted, but my main concerns for now are separating the articles into various lists, parsing each article, and finally implementing text-to-speech in the Action.
Pointers to implementations using third-party services and apps would also be useful.
Please ask if anything isn't clear; English is not my first language.
Yes, this is possible. Not necessarily easy, but possible.
First - there is nothing in the Actions on Google library or in Google Home that will automatically scrape a website. That will be up to you.
Second - Responses from your Action are limited in how much they can send at a time.
If you're having it do text-to-speech, you're limited to two "text bubbles" of 640 characters each before the user has to reply. You should keep well below that and should probably stick to just one "text bubble".
If you're playing an audio cut, then you're limited to two minutes.
You can work around both of these limitations by using the Media Response. With TTS, you would play a portion of the text followed by a brief Media response; when it finishes, your server is triggered to send the next chunk of text. If the article is all recorded audio, you can just send the longer recording as the Media.
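The chunking step above is the part your webhook has to implement itself: split the scraped article into pieces that each fit within one simple response, and serve the next piece on each trigger. A minimal sketch of that splitter (the 640-character figure comes from the limit stated above; the function name is just illustrative):

```python
def chunk_article(text: str, limit: int = 640) -> list[str]:
    """Split article text on word boundaries into pieces no longer
    than `limit` characters, one piece per simple response."""
    words = text.split()
    chunks: list[str] = []
    current = ""
    for w in words:
        if current and len(current) + 1 + len(w) > limit:
            chunks.append(current)  # current chunk is full; start a new one
            current = w
        else:
            current = f"{current} {w}" if current else w
    if current:
        chunks.append(current)
    return chunks
```

Your fulfillment would then keep an index in conversation state and reply with `chunks[i]` plus a Media response each time the media-finished event fires.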
Be advised, however, that if you're using the inline editor or using Firebase Cloud Functions (which the inline editor uses), that by default you're not able to access most sites outside Google's network. You need to upgrade to a paid plan to do so. I suggest the Blaze plan which is pay-as-you-go, but includes a free tier which is typically good enough for development work and light production usage.

Sharing on Google+ with big image (photo sharing, perhaps)?

I'm currently working on a Google Hangouts app that, among other things, features posting links to certain pages on the user's Google+ page.
Because the shared content is mainly a visual thing (dynamically generated images, to be precise), I have been looking at ways of having the post
on user's stream display a big, full-width picture, essentially an effect similar to one visible here (disclaimer: I do not endorse the company linked in any way, it was simply one of the first examples I have found of the look).
Now, I've read through the Google+ documentation on the Share Button and Snippets for ways to enable sharing with a single click and to customise the content that accompanies the link, but visually the attached thumbnail is somewhat smaller than what I'd find ideal for the task.
The example of a big picture display was tied to the photo sharing functionality, so I've looked at Google+ API, to see if there's a way to automate it, but as stated on the API docs landing page, "The Google+ API currently provides read-only access to public data.". No ability to pursue the goal through the official channels then.
Next step, GitHub. There are some sites for which the wrappers around their internal communication have been written, thus creating sort of unofficial API, so I tried my luck there. Among various libraries, I have found one that was not a wrapper around an official API, google-plus-extension-jsapi, but being written for the context of Chrome extensions rather than webpages, I couldn't get it to work, mainly due to usage of WebSQL and cross-domain XMLHTTPRequests.
Without any further leads, I ask the community thus - is there any way for a webpage app to provide the user with the ability to share a full-width picture on their Google+ stream or am I limited to standard sharing thumbnails?
I can confirm that you cannot do what you're trying to do using the Google+ API: there is no stream-write API, and you have no control over how shares will render.
As you have already determined, you cannot write posts, such as a picture, directly to a user's stream without the user's interaction (e.g. a share). For branded pages there is the Pages API, but it is currently not public and would be restricted to Pages as opposed to People/Profiles.
You can generate a share link to an external image and then if the user clicks it, the image can appear in their stream. As you noticed, the image will be a small thumbnail as opposed to a full-bleed photo and will render as a share - undesirable if you want the image to fill the whole stream area.