How do you scrape JSON from an API across multiple pages? (scrapy)

I'm trying to get JSON user information from the Mastodon API at https://mastodon.online/api/v1/accounts/1 (the trailing number is the user id). The problem is that each page only holds one user's info at a time, but I want to collect all of the information at once. Is there a way to fetch the JSON in numeric order (https://mastodon.online/api/v1/accounts/{1,2,3,4,...}) and then store it all in one JSON file?
I've been looking around for answers, and every time I tried one that seemed similar to my question it wouldn't work. If anyone can help, that would be really great; I've been stuck all day trying things out.
Documentation: https://docs.joinmastodon.org/methods/accounts/#retrieve-information
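One way to do this with Scrapy (a minimal sketch; the id range, spider name, and output handling below are assumptions, not part of the question) is to generate one request per account id and let a feed export collect every yielded item into a single file:

```python
import json
import scrapy

class MastodonAccountsSpider(scrapy.Spider):
    """Fetch accounts 1..N from the Mastodon API, one request per id."""
    name = "mastodon_accounts"

    start_id = 1    # hypothetical range; adjust to the ids you actually want
    end_id = 100

    def start_requests(self):
        for account_id in range(self.start_id, self.end_id + 1):
            yield scrapy.Request(
                f"https://mastodon.online/api/v1/accounts/{account_id}",
                callback=self.parse,
            )

    def parse(self, response):
        # Each response body is a single JSON object describing one account.
        yield json.loads(response.text)
```

Running it with something like scrapy runspider mastodon_accounts.py -o accounts.json writes all the yielded account objects into one JSON file; missing or deleted ids will come back as 404s and the instance may rate-limit you, so error handling is left out of this sketch.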

Related

Get cryptocurrency Twitter, website, and markets from the coinmarketcap.com API

I am trying to get the Twitter name, website, and markets for any cryptocurrency listed on CoinMarketCap.
For example:
https://coinmarketcap.com/currencies/bitcoin shows all of the data I need, but how would I parse the data listed on that page to get the Twitter name and website associated with Bitcoin?
Don't try to parse the data on that page if you are trying to use it in an application. If you are doing it for a one-time data collection, then maybe that is OK. There is absolutely no guarantee that the format will remain consistent or that the information will even be there tomorrow.
You should try to find an API that gives you the information you are looking for. An API is a contract that is expected to be honored and is therefore reliable. From a quick look, CoinMarketCap's own API doesn't appear to have the info you are looking for, but maybe another one exists that does.
If you were to parse the HTML, you could write a regex for the specific thing you are looking for. For example, if you want the website, you could write a regex that matches the link labelled "Website" and captures its href with a group such as ([^"]+); that capture group is the website in this example. You could do something like that for every element you want to get.
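As a rough illustration of that approach in Python (the exact markup pattern here is an assumption and, as noted above, could change or disappear at any time):

```python
import re
import requests

# Hypothetical sketch: assumes the page renders a plain anchor labelled "Website".
html = requests.get("https://coinmarketcap.com/currencies/bitcoin").text

# Capture the href of a link whose visible label is "Website".
match = re.search(r'<a href="([^"]+)"[^>]*>\s*Website\s*</a>', html)
if match:
    print(match.group(1))  # the project's website URL
```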

Is there a way to get the survey count totals using the SurveyMonkey API?

I have been working with the SurveyMonkey API for a few days now.
My ultimate goal is to be able to gather the voting results for each question in a survey.
For example... if I have a 5-question survey and each question has 3 options/answers... I'd like to gather the results of each question/option.
From what I'm finding in the API documentation... this is not possible.
Can this really not be possible?
Is there a way to gather the results of each question/answer combo using the API?
I hope I'm simply overlooking something.
Thanks!
It is definitely possible to get this kind of information - you can get the metadata of the survey via the API and all response data. How you process and parse that is up to you.
The most common way to get a full list of survey results is the following (sketched in code below):
1. Get a list of respondent_ids via get_respondent_list.
2. Send these respondent_ids to get_responses to get the raw response data.
3. Match up the ids from this data with the ids described in the survey's metadata, which you get from get_survey_details.
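A hedged Python sketch of those three calls (the base URL, auth scheme, payload fields, and response layout below are assumptions in the style of the legacy v2 API; check the current docs before relying on them):

```python
import requests

BASE = "https://api.surveymonkey.net/v2/surveys"   # assumed legacy v2 base URL
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN", "Content-Type": "application/json"}
PARAMS = {"api_key": "YOUR_API_KEY"}
SURVEY_ID = "1234567890"                            # hypothetical survey id

def call(method, payload):
    """POST a JSON payload to one survey method and return the parsed data block."""
    r = requests.post(f"{BASE}/{method}", headers=HEADERS, params=PARAMS, json=payload)
    r.raise_for_status()
    return r.json()["data"]

# 1. Respondent ids for the survey
respondents = call("get_respondent_list", {"survey_id": SURVEY_ID})
respondent_ids = [r["respondent_id"] for r in respondents["respondents"]]

# 2. Raw answers for those respondents
responses = call("get_responses", {"survey_id": SURVEY_ID, "respondent_ids": respondent_ids})

# 3. Survey metadata: question/answer ids mapped to their human-readable text
details = call("get_survey_details", {"survey_id": SURVEY_ID})
```

Joining the answer ids in responses against the question/answer definitions in details gives the per-question, per-option counts the question asks about.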

How to get public data from Google plus

I have a project that involves downloading public data from Google+. Can you give me a reference on how I can download something like 1 GB of any type of public data from Google+?
The data can be posts or circles information. I've tried to work with the developer tools, but the furthest I got was downloading my own profile information, and what I need is public data.
Thanks!
There is no truly "public" data on Google+.
Every stream is unique to a user.
Try viewing the site without logging in, and you'll see what I mean.
Since users have the ability to block other users from viewing even their "public" posts, before Google shows you a post they check to see if you're on the blocked list. For them to be able to do that, you have to be logged in.
Your best bet would be to create a dummy account and only look at your nearby stream or What's Hot.
Otherwise you'd need to circle users, and that would create the stream. G+ is not like Twitter; there's no firehose to speak of.
To programmatically cull data, you would have to use their API, but even then their HTTP API limits you to 20 results per search and you have to provide a query.
You could get up to 100 results per user if you picked individuals and got their userids, but again there's not a programmatic way to get a bulk dump.
You could randomly select users by using an activity search for a dictionary entry, and then seed that into the activity listing API... something like (in pure pseudocode):
for random word in dictionary
    group = userids from GET https://www.googleapis.com/plus/v1/activities?query=[word]
    for userid in group
        GET https://www.googleapis.com/plus/v1/people/[userid]/activities/collection/public
Actual code would of course depend on the language.
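A rough Python version of that pseudocode might look like the following (the response field names, the key handling, and the word list are assumptions layered on the URLs above; the per-page result caps mentioned earlier still apply):

```python
import random
import requests

API_KEY = "YOUR_API_KEY"                    # hypothetical; a real Google API key is required
dictionary = ["apple", "banana", "cloud"]   # stand-in word list

for word in random.sample(dictionary, k=len(dictionary)):
    # Search public activities mentioning the word (results are capped per page).
    search = requests.get(
        "https://www.googleapis.com/plus/v1/activities",
        params={"query": word, "key": API_KEY},
    ).json()
    user_ids = {item["actor"]["id"] for item in search.get("items", [])}

    for user_id in user_ids:
        # Pull that user's public activity collection.
        activities = requests.get(
            f"https://www.googleapis.com/plus/v1/people/{user_id}/activities/public",
            params={"key": API_KEY, "maxResults": 100},
        ).json()
        # ... store activities.get("items", []) somewhere ...
```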

DISQUS: Is it possible to get a particular user's posts in a given forum?

I am running a website, whose community is powered by Disqus. I would like to create user profile pages, where the page would display the particular user's most recent activity, but only for my particular site (forum, in Disqus' terminology).
I ran through the entire API documentation, but I could not find a way to filter by both user and forum. I would be able to grab either the entire list of posts for a given forum, or the list for a particular user.
In every API call there is a mysterious query parameter, into which I tried to plug a series of filters, but none of them worked.
Is there something that I could be missing?
It's not that obvious, but you can use the query param as a filter for users. Try something like this:
https://disqus.com/api/3.0/forums/listPosts.json?forum={SHORTNAME}&query=user:{USERNAME}&api_key={YOUR_API_KEY}
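The same call as a small Python sketch (the shortname, username, and key are placeholders, and the response field names are assumptions about the usual Disqus envelope):

```python
import requests

# listPosts filtered to one forum and one user, as in the URL above.
resp = requests.get(
    "https://disqus.com/api/3.0/forums/listPosts.json",
    params={
        "forum": "myforum",        # your forum shortname
        "query": "user:someuser",  # restrict results to one user's posts
        "api_key": "YOUR_API_KEY",
    },
)
resp.raise_for_status()
for post in resp.json().get("response", []):
    print(post["id"], post.get("raw_message", "")[:60])
```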

Specify items per page

I'm trying to query Picasa Web Albums to pull album/photo data (obviously!); however, initially I only need to pull the first 4 photos. Is there any way to limit a particular field or specify the number of items per page?
I've already accomplished something similar with Facebook's Graph API, but I'm unable to find anything similar for Picasa. The only option I can find related to limiting the response is specifying which fields to return, but nothing related to the number of rows.
Use the max-results parameter. See the docs here:
https://developers.google.com/picasa-web/docs/2.0/reference#Parameters
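For example, a short Python sketch (the user id and album id are placeholders, and the JSON layout assumed here is the standard GData alt=json shape):

```python
import requests

# Request only the first 4 photos of an album via the max-results parameter.
feed = requests.get(
    "https://picasaweb.google.com/data/feed/api/user/USER_ID/albumid/ALBUM_ID",
    params={"alt": "json", "max-results": 4},
).json()

for entry in feed["feed"].get("entry", []):
    print(entry["title"]["$t"])  # photo title
```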