Get app-details from Google Play - api

I am wondering how the various app statistics sites get app details from Google Play, given that Google Play does not have a public API. An example is - they have full details for Google Play apps.
One possible solution is scraping; however, it doesn't work because Google will block you once you start sending hundreds of requests.
Any ideas?
P.S. The "Google Play Developers API" is not an option, as it only lets you access app details for your own apps.

They use either the mobile API used by Android devices (i.e. with this library) or scrape the Google Play website. Both methods are subject to rate limiting, so they put pauses in between requests.
The mobile device API is completely undocumented and very difficult to program against. I would recommend scraping.
There is no official API or feed that you can use.
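To illustrate the "pauses in between requests" idea, here is a minimal Python sketch. The function name fetch_politely, the delay bounds, and the injected fetch callable are assumptions made for this example; none of them comes from Google, and suitable delays have to be found by experiment.

```python
import random
import time


def fetch_politely(app_ids, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(app_id) for each ID, pausing a random interval between calls.

    `fetch` is any callable you supply (for example a function that downloads
    and parses one app page). The delay bounds are illustrative guesses, not
    limits published by Google.
    """
    results = {}
    for index, app_id in enumerate(app_ids):
        if index > 0:
            # Randomised pause so requests are not sent in a tight burst.
            sleep(random.uniform(min_delay, max_delay))
        results[app_id] = fetch(app_id)
    return results
```

Passing the sleep function as a parameter keeps the pacing logic testable without actually waiting.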

The Android Marketing API can be used to get all app details from the Google store. You can check it out here:

Unfortunately Google Play (previously known as Android Market) does not expose an official API.
To get the data you need, you could develop your own HTML crawler, parse the page and extract the app meta-data you need. This topic has been covered in other questions, for instance here.
If you don't want to implement all that by yourself (as you mentioned it's a complex project to do), you could use a third-party service to access Android apps meta-data through a JSON-based API.
For instance, (the company I work for) offers an API for both Android and iOS, you can see more details here.
The endpoints range from "lookup" (to get one app's meta-data, probably what you need) to "search", but we also expose "rank history" and other stats from the leading app stores. We have extensive documentation for all supported features, you find them in the left panel: 42matters docs
I hope this helps, otherwise feel free to get in touch with me. I know this industry quite well and can point you in the right direction.

The request might be blocked when using the requests library, because its default user-agent is python-requests.
An additional step could be to rotate user-agent, for example, to switch between PC, mobile, and tablet, as well as between browsers e.g. Chrome, Firefox, Safari, Edge and so on. User-agent rotation can be used in combo with proxy rotation (ideally residential) + CAPTCHA solver.
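A minimal sketch of user-agent rotation could look like the following. The user-agent strings are made-up placeholders and rotating_headers is a hypothetical helper name; in practice you would substitute real, current browser strings.

```python
import itertools

# Placeholder user-agent strings for illustration only; replace with real ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleDesktopBrowser/1.0",
    "Mozilla/5.0 (Linux; Android 12) ExampleMobileBrowser/1.0",
    "Mozilla/5.0 (iPad; CPU OS 15_0 like Mac OS X) ExampleTabletBrowser/1.0",
]


def rotating_headers(user_agents=USER_AGENTS):
    """Yield request headers forever, cycling through the given user agents."""
    for user_agent in itertools.cycle(user_agents):
        yield {"user-agent": user_agent}
```

Each request would then take its headers from the generator, e.g. headers=next(header_source), where header_source = rotating_headers().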
The Google Play Store has been heavily redesigned and is now almost completely dynamic. However, all the data can be extracted from the inline JSON.
For scraping dynamic sites, selenium or playwright webdriver is great. However, in our case, using BeautifulSoup and regular expression is faster to extract data from the page source.
We must pick out a certain <script> element from all the <script> elements in the HTML using a regular expression, and transform it into a dict with json.loads():
basic_app_info = json.loads(re.findall(r"<script nonce=\"\w+\" type=\"application/ld\+json\">({.*?)</script>", str(soup.select("script")[11]), re.DOTALL)[0])
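To show the inline-JSON idea in isolation, here is a small self-contained sketch that runs against a hand-written HTML string rather than a live page. As an assumption of this example, it locates the script by its type attribute instead of by the positional index 11, since a fixed index may shift when the page layout changes; extract_ld_json and SAMPLE_HTML are hypothetical names.

```python
import json
import re

# A tiny stand-in for a downloaded page; the real markup is much larger.
SAMPLE_HTML = (
    '<script nonce="abc123">var x = 1;</script>'
    '<script nonce="abc123" type="application/ld+json">'
    '{"@type": "SoftwareApplication", "name": "Example App"}'
    '</script>'
)


def extract_ld_json(html):
    """Return the first application/ld+json <script> payload as a dict, or None."""
    match = re.search(
        r'<script[^>]*type="application/ld\+json"[^>]*>(\{.*?\})\s*</script>',
        html,
        re.DOTALL,
    )
    return json.loads(match.group(1)) if match else None
```

Calling extract_ld_json(SAMPLE_HTML) gives a dict whose "name" key holds "Example App".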
Check code in online IDE.
from bs4 import BeautifulSoup
import requests, re, json, lxml

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"
}

params = {
    "id": "",       # app name
    "gl": "US",     # country of the search
    "hl": "en_GB"   # language of the search
}

html = requests.get("", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

# where all app data will be stored
app_data = {
    "basic_info": {
        "developer": {},
        "downloads_info": {}
    }
}

# [11] index is a basic app information
basic_app_info = json.loads(re.findall(r"<script nonce=\"\w+\" type=\"application/ld\+json\">({.*?)</script>", str(soup.select("script")[11]), re.DOTALL)[0])

# the AF_initDataCallback script that mentions the app name holds the remaining details
additional_basic_info = re.search(
    fr"<script nonce=\"\w+\">AF_initDataCallback\(.*?(\"{basic_app_info.get('name')}\".*?)\);<\/script>",
    str(soup.select("script")), re.M|re.DOTALL).group(1)

app_data["basic_info"]["name"] = basic_app_info.get("name")
app_data["basic_info"]["type"] = basic_app_info.get("@type")
app_data["basic_info"]["url"] = basic_app_info.get("url")
app_data["basic_info"]["description"] = basic_app_info.get("description").replace("\n", "")  # remove newline characters
app_data["basic_info"]["application_category"] = basic_app_info.get("applicationCategory")
app_data["basic_info"]["operating_system"] = basic_app_info.get("operatingSystem")
app_data["basic_info"]["thumbnail"] = basic_app_info.get("image")
app_data["basic_info"]["content_rating"] = basic_app_info.get("contentRating")
app_data["basic_info"]["rating"] = round(float(basic_app_info.get("aggregateRating").get("ratingValue")), 1)  # 4.287856 -> 4.3
app_data["basic_info"]["reviews"] = basic_app_info.get("aggregateRating").get("ratingCount")
app_data["basic_info"]["price"] = basic_app_info["offers"][0]["price"]

app_data["basic_info"]["developer"]["name"] = basic_app_info.get("author").get("name")
app_data["basic_info"]["developer"]["url"] = basic_app_info.get("author").get("url")
# (there are a few matches, but re.search always returns the first occurrence)
app_data["basic_info"]["developer"]["email"] = re.search(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", additional_basic_info).group(0)

app_data["basic_info"]["release_date"] = re.search(r"\d{1,2}\s[A-Z-a-z]{3}\s\d{4}", additional_basic_info).group(0)

# one match, four groups: long form, formatted long form, actual downloads, short form
downloads = re.search(r"\"(\d+,?\d+,?\d+\+)\"\,(\d+),(\d+),\"(\d+M\+)\"", additional_basic_info)
app_data["basic_info"]["downloads_info"]["long_form_not_formatted"] = downloads.group(1)
app_data["basic_info"]["downloads_info"]["long_form_formatted"] = downloads.group(2)
app_data["basic_info"]["downloads_info"]["as_displayed_short_form"] = downloads.group(4)
app_data["basic_info"]["downloads_info"]["actual_downloads"] = downloads.group(3)

# [2:] skips 2 PEGI logo thumbnails and extracts only app images
app_data["basic_info"]["images"] = re.findall(r",\[\d{3,4},\d{3,4}\],.*?(https.*?)\"", additional_basic_info)[2:]

# not every app has a trailer
try:
    app_data["basic_info"]["video_trailer"] = "".join(re.findall(r"\"(https:\/\/play-games\.\w+\.com\/vp\/mp4\/\d+x\d+\/\S+\.mp4)\"", additional_basic_info)[0])
except IndexError:
    app_data["basic_info"]["video_trailer"] = None

print(json.dumps(app_data, indent=2, ensure_ascii=False))
Example output:
{
  "basic_info": {
    "developer": {
      "name": "Nintendo Co., Ltd.",
      "url": "",
      "email": ""
    },
    "downloads_info": {
      "long_form_not_formatted": "100,000,000+",
      "long_form_formatted": "100000000",
      "as_displayed_short_form": "100M+",
      "actual_downloads": "213064462"
    },
    "name": "Super Mario Run",
    "type": "SoftwareApplication",
    "url": "",
    "description": "Control Mario with just a tap!",
    "application_category": "GAME_ACTION",
    "operating_system": "ANDROID",
    "thumbnail": "",
    "content_rating": "Everyone",
    "rating": 4.0,
    "reviews": "1645926",
    "price": "0",
    "release_date": "22 Mar 2017",
    "images": [
      # ...
    ]
  }
}
A good alternative with shorter and simpler code could be the Google Play Store API from SerpApi. It's a paid API with a free plan.
The difference is that it bypasses blocks from Google (including CAPTCHA), so there is no need to create a parser and maintain it.
SerpApi simple code example:
from serpapi import GoogleSearch
import os, json

params = {
    "api_key": os.getenv("API_KEY"),    # your serpapi api key
    "engine": "google_play_product",    # parsing engine
    "store": "apps",                    # app page
    "gl": "us",                         # country of the search
    "product_id": "",                   # ID of the app to look up
    "all_reviews": "true"               # shows all reviews
}

search = GoogleSearch(params)           # where data extraction happens
results = search.get_dict()

print(json.dumps(results["product_info"], indent=2, ensure_ascii=False))
print(json.dumps(results["media"], indent=2, ensure_ascii=False))
# other data
The output is exactly the same as in the previous solution.
There's a Scrape Google Play Store App in Python blog post if you need a little bit more code explanation.
Disclaimer, I work for SerpApi.


Send request to wordnet

I need to send a request to WordNet, knowing the tar_id (taken from ImageNet), to get the lemma assigned to that tar (e.g., I have a tar with houses; I need to send the request and obtain the lemma written on WordNet, "living accommodation").
I used requests.get() first, with the URL, and then BeautifulSoup's parser.
I get the parsed HTML back, but there is no reference to the "body", meaning the part with the noun and its hypernyms / hyponyms.
Can you tell me how to get that part of WordNet parsed along with the rest of the page?
This is the URL I'm working on:
Just use the JSON endpoint.
For example:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:96.0) Gecko/20100101 Firefox/96.0",
}
url = ""
data = requests.get(url, headers=headers).json()

The returned data includes the definition:
structures collectively in which people are housed
And if you switch the endpoint to
url = ""
You'll get all the word relation data.

How do I ingest tide gauge data from the NOAA HTTP API into Thingsboard Professional?

NOAA provides tidal and weather data through their own http API, and I would like to be able to use their API to get data into ThingsBoard (Professional) every six minutes to overlay with my device data (their data are updated every 6 minutes). Can someone walk me through the details of using the correct Integrations or Rule chains to get the time series data added to the database? It would also be nice to only use the metadata once. Below you can see how to get the most recent tide gauge level (water level) using their API.
For example, to see the latest tide gauge water level for a tide gauge (in this case, tide gauge 8638610), the API allows for getting the most recent water level information --
That call produces the following JSON: {"metadata":{"id":"8638610","name":"Sewells Point","lat":"36.9467","lon":"-76.3300"},"data":[{"t":"2022-02-08 22:42", "v":"-0.134", "s":"0.003", "f":"1,0,0,0", "q":"p"}]}
The Data Converter was fairly easy to construct (except maybe the [0, 0] used in the code below):
//function Decoder(payload,metadata)
var noaa_data = decodeToJson(payload);
var deviceName = noaa_data.metadata.id;
var dataType = 'water_level';
var latitude = noaa_data.metadata.lat;
var longitude = noaa_data.metadata.lon;
// [0, 0] is a comma expression, so this is the same as noaa_data.data[0]
var waterLevelData = noaa_data.data[0, 0];

function decodeToString(payload) {
    return String.fromCharCode.apply(String, payload);
}

var result = {
    deviceName: deviceName,
    dataType: dataType,
    time: waterLevelData.t,
    waterLevel: waterLevelData.v,
    waterLevelStDev: waterLevelData.s,
    latitude: latitude,
    longitude: longitude
};

function decodeToJson(payload) {
    var str = decodeToString(payload);
    var data = JSON.parse(str);
    return data;
}

return result;
which produces the output:
{
    "deviceName": "8638610",
    "dataType": "water_level",
    "time": "2022-02-08 22:42",
    "waterLevel": "-0.134",
    "waterLevelStDev": "0.003",
    "latitude": "36.9467",
    "longitude": "-76.3300"
}
I am not sure what process to use to get the data into ThingsBoard to be displayed as a device alongside my other device data.
Thank you for your help.
If you have a specific (and small) number of stations to grab, then you can do the following:
Create the devices in ThingsBoard manually
Go into rule chains, create a water stations rule chain
For each water station place a 'Generator' node, selecting the originator as required.
Route these into an external "Rest API" node.
Route the result of the post into a blue script node and put your decoder in there
Route result to telemetry
Example rule chain
More complex solution but more scalable:
Use a single generator node
Route the message into a blue script. This will contain a list of the station IDs that you want to pull info for. By setting the output of the script to the following, you can make it send out multiple messages in sequence:
return [{msg: {}, metadata: {}, msgType: {}}, ...etc...]
Route the blue script into the rest api call and get the station data
Do some post processing with another blue script node if you need to. Don't decode the data here though.
Route all this into another rest api node and POST the data back to your HTTP integration endpoint (if you don't have one you will need to create it. Fairly simple)
Connect your data converter to this integration.
Finally, modify your converter's output so that it has the format the integration accepts:
{
    "deviceName": "8638610",
    "deviceType": "water-station",
    "telemetry": {
        "dataType": "water_level",
        "time": "2022-02-08 22:42",
        "waterLevel": "-0.134",
        "waterLevelStDev": "0.003",
        "latitude": "36.9467",
        "longitude": "-76.3300"
    }
}
Rough example
Above is how I would do it if I didn't want to use any external services. If you're AWS savvy I'd say set up a CRON job to trigger a lambda function every 6 minutes and post into your platform. Either will work.
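If you go the external route (a scheduled job that polls NOAA and posts to your platform), the reshaping step could be sketched in Python roughly as follows. This is only an illustration under assumptions: noaa_to_thingsboard is a hypothetical helper name, the sample payload is the one quoted in the question, and the actual HTTP calls to NOAA and to your integration endpoint are left out.

```python
import json


def noaa_to_thingsboard(payload, device_type="water-station"):
    """Reshape one NOAA water-level response into the converter-style dict."""
    noaa = json.loads(payload)
    meta = noaa["metadata"]
    latest = noaa["data"][0]  # the sample response holds a single reading
    return {
        "deviceName": meta["id"],
        "deviceType": device_type,
        "telemetry": {
            "dataType": "water_level",
            "time": latest["t"],
            "waterLevel": latest["v"],
            "waterLevelStDev": latest["s"],
            "latitude": meta["lat"],
            "longitude": meta["lon"],
        },
    }


# The JSON returned by the call shown in the question.
SAMPLE = (
    '{"metadata":{"id":"8638610","name":"Sewells Point","lat":"36.9467","lon":"-76.3300"},'
    '"data":[{"t":"2022-02-08 22:42", "v":"-0.134", "s":"0.003", "f":"1,0,0,0", "q":"p"}]}'
)
```

The scheduled function would then POST the resulting dict as JSON to whatever endpoint your platform exposes for that device.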

What data can I save from the spotify API?

I'm building a website and I'm using the Spotify API as a music library. I would like to add more filters and ordering options for searching tracks than the API allows, so I was wondering what track/song data I can save to my DB from the API, such as artist name or popularity.
I would like to save: Name, Artists, Album and some other stuff. Is that possible or is it against the terms and conditions?
Thanks in advance!
Yes, it is possible.
The Spotify API exposes its data through endpoints.
Spotify API endpoint reference here.
Each endpoint deals with the specific kind of data being requested by the client (you).
I'll give you one example. The same logic applies for all other endpoints.
import json
import requests  # library used to make the API calls

# Alternatively, you can use a wrapper like "Spotipy"
# instead of making the requests directly.

# hit desired endpoint
SEARCH_ENDPOINT = 'https://api.spotify.com/v1/search'

# define your call
def search_by_track_and_artist(artist, track):
    path = 'token.json'  # you need to get a token for this call
    # endpoint reference page will provide you with one
    # you can store it in a file
    with open(path) as t:
        token = json.load(t)

    # call API with authentication
    myparams = {'type': 'track'}
    myparams['q'] = "artist:{} track:{}".format(artist, track)
    resp = requests.get(SEARCH_ENDPOINT, params=myparams, headers={"Authorization": "Bearer {}".format(token)})
    return resp.json()

# try it:
search_by_track_and_artist('Radiohead', 'Karma Police')
Store the data and process it as you wish. But you must comply with Spotify terms in order to make it public.
sidenote: Spotipy docs.
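As a rough sketch of the "store the data" step, the following pulls out the fields mentioned in the question (name, artists, album, popularity) from a search response. The field names reflect my understanding of Spotify's track objects and should be checked against the endpoint reference; extract_track_rows and the sample response are hypothetical, and whether you may persist these fields is governed by Spotify's terms, not by this code.

```python
def extract_track_rows(search_response):
    """Flatten a track search response into rows with the fields to store."""
    rows = []
    for item in search_response.get("tracks", {}).get("items", []):
        rows.append({
            "id": item.get("id"),
            "name": item.get("name"),
            "artists": [artist.get("name") for artist in item.get("artists", [])],
            "album": item.get("album", {}).get("name"),
            "popularity": item.get("popularity"),
        })
    return rows


# Hand-written sample shaped like a search result; the values are illustrative.
SAMPLE_RESPONSE = {
    "tracks": {
        "items": [
            {
                "id": "example-track-id",
                "name": "Karma Police",
                "artists": [{"name": "Radiohead"}],
                "album": {"name": "OK Computer"},
                "popularity": 80,
            }
        ]
    }
}
```

Each row can then be inserted into your own table and filtered or sorted however you like.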

Wit AI response for API requests

I'm using Wit.ai for a bot and I think it's amazing. However, I must provide the customer with screens in my web app to train and manage the app, and here I found a big problem (or maybe I'm just lost). The documentation of the REST API is not enough to design a client that acts like the Wit console (not even close). It's like a tutorial of which endpoints you can hit and an overview of the parameters, with no clear explanation of the structure of the response.
For example, there is no endpoint to get the insights edge. Also, and most importantly, there is no clear documentation about the response structure when hitting the message endpoints (i.e. the structure of the returned entities: are they prebuilt or not, and if they are, is the value a string, an object, or an array, and what might the object contain [e.g. datetime]). There is also the problem of the deprecated guide versus the new guide (the new guide should be done and complete by now). I'm building parts of the code based on my testing. Sometimes when I test something new (like adding a range in the datetime entity instead of just a value), I get an error when I try to set the values for the user since I haven't parsed the response correctly, and the new information sometimes makes me modify the DB structure on my end.
So, the bottom line, is there a complete reference that I can implement a complete client in my web app (my web app is in Java by the way and I couldn't find a client library that handles the latest version of the API)? Again, the tool is AWESOME but the documentation is not enough, or maybe I'm missing something.
The documentation is not enough, of course, but I think it's pretty straightforward. From what I read, the response structure is described under "Return the meaning of a sentence".
The response is in JSON format, so you need to decode it first.
Example Request:
$ curl -XGET '' \
-H 'Authorization: Bearer $TOKEN'
Example Response:
{
  "msg_id": "387b8515-0c1d-42a9-aa80-e68b66b66c27",
  "_text": "how many people between Tuesday and Friday",
  "entities": {
    "metric": [ {
      "metadata": "{'code': 324}",
      "value": "metric_visitor",
      "confidence": 0.9231
    } ],
    "datetime": [ {
      "value": {
        "from": "2014-07-01T00:00:00.000-07:00",
        "to": "2014-07-02T00:00:00.000-07:00"
      },
      "confidence": 1
    }, {
      "value": {
        "from": "2014-07-04T00:00:00.000-07:00",
        "to": "2014-07-05T00:00:00.000-07:00"
      },
      "confidence": 1
    } ]
  }
}
You can read more about response structure under Return the meaning of a sentence
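Since the "value" of an entity can be either a plain string or an object (as with the datetime ranges in the example response), a client has to branch on the type. A minimal Python sketch of that idea is below; it only covers the two shapes that appear in the example, summarize_entities is a hypothetical helper, and other value shapes may exist that it does not handle. The question mentions Java, but the same branching applies there.

```python
def summarize_entities(response):
    """Describe each entity value as either a plain value or a from/to interval."""
    summary = []
    for name, matches in response.get("entities", {}).items():
        for match in matches:
            value = match.get("value")
            if isinstance(value, dict) and "from" in value and "to" in value:
                kind, parsed = "interval", (value["from"], value["to"])
            else:
                kind, parsed = "value", value
            summary.append({
                "entity": name,
                "kind": kind,
                "value": parsed,
                "confidence": match.get("confidence"),
            })
    return summary


# The example response, already decoded from JSON.
SAMPLE_RESPONSE = {
    "msg_id": "387b8515-0c1d-42a9-aa80-e68b66b66c27",
    "_text": "how many people between Tuesday and Friday",
    "entities": {
        "metric": [
            {"metadata": "{'code': 324}", "value": "metric_visitor", "confidence": 0.9231}
        ],
        "datetime": [
            {"value": {"from": "2014-07-01T00:00:00.000-07:00",
                       "to": "2014-07-02T00:00:00.000-07:00"}, "confidence": 1},
            {"value": {"from": "2014-07-04T00:00:00.000-07:00",
                       "to": "2014-07-05T00:00:00.000-07:00"}, "confidence": 1},
        ],
    },
}
```

Normalising every entity into the same small record like this keeps later changes in the response shape from rippling into the DB schema.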

How do I upload a video to Youtube directly from my server?

I'm setting up a (headless) web server that lets people build their own custom time-lapse movies.
Several people want to upload the time-lapse videos they make to YouTube.
Rather than downloading the video to that person's laptop
and then having that person manually upload it to YouTube,
is there a way I can write some software on my web server to take that video file and upload it directly to that user's account on YouTube?
I've been told that asking my users for their YouTube handle and password is the Wrong Thing To Do, and that I should be using the YouTube V3 API with OAuth.
I tried the techniques listed at
" I want to upload a video from my web page to youtube by using javascript youtube API ",
which seems to "work", but every time I had to download the video to that person's laptop and then upload it from the laptop to YouTube. Is there a way to tweak that system to upload directly from my server to YouTube?
I found some Python code that (after I set up my client_secrets.json) lets me upload videos directly from my server to someone's YouTube account after that person has done the OAuth authentication.
But the first time some new person tries to upload a video to some new YouTube account that my server has never dealt with before, it either
(a) pops open a web browser on my server, and then if I VNC to the server and type in a YouTube handle and password into that web browser, it gets authenticated -- but I'd rather not do that for every user.
(b) with the "--noauth_local_webserver" option, spits out a URL on the command line and waits. Then if I manually copy that URL and paste it into a web browser, log in to YouTube, copy-and-paste the token back into this application that is still waiting for input on the command line, that person gets authenticated. But I'd rather not do that for every user. I guess that would be OK if I could capture that URL in my cgi-bin script and stick it in a web page, and then later somehow get the authentication response and cram it back into this program, but how? I don't even see that print statement or the raw_input statement in this code.
# which is identical to the code sample at
import httplib
import httplib2
import os
import random
import sys
import time

from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow

# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1

# Maximum number of times to retry before giving up.
MAX_RETRIES = 10

# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, httplib.NotConnected,
  httplib.IncompleteRead, httplib.ImproperConnectionState,
  httplib.CannotSendRequest, httplib.CannotSendHeader,
  httplib.ResponseNotReady, httplib.BadStatusLine)

# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]

# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# For more information about the client_secrets.json file format, see:
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows an application to upload files to the
# authenticated user's YouTube channel, but doesn't allow other types of access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:

   %s

with information from the Developers Console

For more information about the client_secrets.json file format, please visit:
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

VALID_PRIVACY_STATUSES = ("public", "private", "unlisted")


def get_authenticated_service(args):
  flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
    scope=YOUTUBE_UPLOAD_SCOPE,
    message=MISSING_CLIENT_SECRETS_MESSAGE)

  storage = Storage("%s-oauth2.json" % sys.argv[0])
  credentials = storage.get()

  if credentials is None or credentials.invalid:
    credentials = run_flow(flow, storage, args)

  return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
    http=credentials.authorize(httplib2.Http()))

def initialize_upload(youtube, options):
  tags = None
  if options.keywords:
    tags = options.keywords.split(",")

  body = dict(
    snippet=dict(
      title=options.title,
      description=options.description,
      tags=tags,
      categoryId=options.category
    ),
    status=dict(
      privacyStatus=options.privacyStatus
    )
  )

  # Call the API's videos.insert method to create and upload the video.
  insert_request = youtube.videos().insert(
    part=",".join(body.keys()),
    body=body,
    # The chunksize parameter specifies the size of each chunk of data, in
    # bytes, that will be uploaded at a time. Set a higher value for
    # reliable connections as fewer chunks lead to faster uploads. Set a lower
    # value for better recovery on less reliable connections.
    #
    # Setting "chunksize" equal to -1 in the code below means that the entire
    # file will be uploaded in a single HTTP request. (If the upload fails,
    # it will still be retried where it left off.) This is usually a best
    # practice, but if you're using Python older than 2.6 or if you're
    # running on App Engine, you should set the chunksize to something like
    # 1024 * 1024 (1 megabyte).
    media_body=MediaFileUpload(options.file, chunksize=-1, resumable=True)
  )

  resumable_upload(insert_request)

# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request):
  response = None
  error = None
  retry = 0
  while response is None:
    try:
      print "Uploading file..."
      status, response = insert_request.next_chunk()
      if 'id' in response:
        print "Video id '%s' was successfully uploaded." % response['id']
      else:
        exit("The upload failed with an unexpected response: %s" % response)
    except HttpError, e:
      if e.resp.status in RETRIABLE_STATUS_CODES:
        error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                             e.content)
      else:
        raise
    except RETRIABLE_EXCEPTIONS, e:
      error = "A retriable error occurred: %s" % e

    if error is not None:
      print error
      retry += 1
      if retry > MAX_RETRIES:
        exit("No longer attempting to retry.")

      max_sleep = 2 ** retry
      sleep_seconds = random.random() * max_sleep
      print "Sleeping %f seconds and then retrying..." % sleep_seconds
      time.sleep(sleep_seconds)

if __name__ == '__main__':
  argparser.add_argument("--file", required=True, help="Video file to upload")
  argparser.add_argument("--title", help="Video title", default="Test Title")
  argparser.add_argument("--description", help="Video description",
    default="Test Description")
  argparser.add_argument("--category", default="22",
    help="Numeric video category.")
  argparser.add_argument("--keywords", help="Video keywords, comma separated",
    default="")
  argparser.add_argument("--privacyStatus", choices=VALID_PRIVACY_STATUSES,
    default=VALID_PRIVACY_STATUSES[0], help="Video privacy status.")
  args = argparser.parse_args()

  if not os.path.exists(args.file):
    exit("Please specify a valid file using the --file= parameter.")

  youtube = get_authenticated_service(args)
  try:
    initialize_upload(youtube, args)
  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
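For reference, the exponential backoff strategy mentioned in the sample's comments can be isolated into a small helper. This is a Python 3 sketch written for illustration (the sample above is Python 2); retry_with_backoff and its parameters are hypothetical names, not part of the Google client library.

```python
import random
import time


def retry_with_backoff(operation, retriable=(IOError,), max_retries=10,
                       sleep=time.sleep):
    """Run operation(), retrying retriable errors with randomised exponential backoff."""
    retry = 0
    while True:
        try:
            return operation()
        except retriable:
            retry += 1
            if retry > max_retries:
                raise
            # Sleep a random time in [0, 2**retry) seconds, as in the sample.
            sleep(random.random() * (2 ** retry))
```

The upper bound of the pause doubles with each failed attempt, and the random factor avoids retrying at the same instant as other clients.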
Use "client_secrets.json".
Configure credentials to generate it.
Very useful step-by-step guide about how to get access and fresh tokens and save them for future use using YouTube OAuth API v3. PHP server-side YouTube V3 OAuth API video upload guide.
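On the "how do I capture that URL in my cgi-bin script" part of the question: with the web-server OAuth flow, the server does not open a browser itself but redirects the user's own browser to Google's consent page, and Google later calls back a URL on your server with an authorization code. A minimal sketch of building that consent URL follows. The endpoint and parameter names reflect my understanding of Google's OAuth 2.0 documentation and should be verified there; build_consent_url is a hypothetical helper, and the later step of exchanging the returned code for tokens is not shown.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint and the YouTube upload scope.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"


def build_consent_url(client_id, redirect_uri, state):
    """Return the URL to which the user's own browser should be redirected."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must be registered for the client ID
        "response_type": "code",       # ask for an authorization code
        "scope": UPLOAD_SCOPE,
        "access_type": "offline",      # request a refresh token for later uploads
        "state": state,                # opaque value echoed back to the callback
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)
```

Your web page would link or redirect to this URL, and the script behind redirect_uri would receive the code as a query parameter once the user approves.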