Accessing Metacritic API and/or Scraping

Accessing Metacritic API and/or Scraping - api

Does anybody know where documentation for the Metacritic api is/if it still works. There used to be a Metacritic API at https://market.mashape.com/byroredux/metacritic-v2#get-user-details which disappeared today.
Otherwise I'm trying to scrape the site myself but keeping getting a blocked by a 429 Slow down. I got data like 3 times this hour and haven't been able to get anymore in the last 20 minutes which is making testing difficult and application possibly useless. Please let me know if there's anything else I can be doing to scape I don't know about.

I was using that API as well for an app I wrote a while ago. Looks like the creator removed it from Mashape. I just sent him an email to ask whether it'll be back up. I did find this scraper online. It only has a few endpoints but following the examples given you could easily add more. Let me know if you make any progress!
Edit: Looks like CBS requested it to be taken down. The ToS prohibits scraping:
[…] you agree not to do the following, or assist others to do the following:
Engage in unauthorized spidering, “scraping,” data mining or harvesting of Content, or use any other unauthorized automated means to gather data from or about the Services;

Though I was hoping for a Javascript way of doing this, the creator of the API also told me some info.
He says I was getting blocked for not having a User agent in the header and should use a 429 handling procedure i.e. re-request with longer pauses in between.
A PHP plugin available as well: http://datalinx.io/shop/metacritic-api/

I had to add a user agent like JCDJulian said and now it allows me to scrape. So for Ruby:
agent = Mechanize.new
agent.user_agent_alias = "Mac Firefox"
Then it stopped giving me the 403 Forbidden error.

Related

How do I use/read an api documentation to send a simple request?

I know this is probably strictly case-specific, but I do feel like I encounter this problem a lot so I will make an effort to try and understand it better.
I am new to using APIs, but I have never succeeded in using one without copying someone's code. In this case, I can't even find any examples on forums, nor in the API documentation.
I'm trying to pull my balance value from my investment bank "NordNet" to scroll, amongst other things, on an Arduino display I've made. Right now I use python Selenium to automatically but "physically" login to NordNet and grab my balance from the DOM. As I'm afraid I might get "punished" for such botted behavior, and because the script is fairly high maintenance (as the HTML changes over time), I would obviously much rather get this information through NordNet's new API.
Link to NordNets API doc
Every time I try to utilize an API doc it's always the same, it looks easy, but I can never get it to work.
This time I tried to just play a little with the API before exploring further.
I use PostMan to send the simplest request:
https://www.nordnet.se/api/2
And I get a successful code 200 JSON response.
I then try to take it a step further to access my account data using this endpoint:
https://www.nordnet.se/api/2/accounts
For this one, I obviously need some authentication of some sort
The doc looks like this:
So I set my PostMan client up like this and get the response showcased:
I've put my NordNet login into the "Auth" tab as "basic auth" and I then see PostMan encrypts this info some way, in the "Headers" tab.
I'm getting an unauthorized response code and I have no idea why. Am I using PostMan wrong (probably)? Is the API faulty (probably not)? There is a mention of a session_id that should contain both password and username? Maybe something completely else...
I hope you can help

The documentation says to use session_id as username and password for that api ,
so try logging in and then get the session id (try with both sid and ssid) . from network tab and pass it as username and password for authorization .
sid- is for http and ssid for https i guess , try with both

SoundCloud Track List API Returns 504

For the past couple of years our app has been using SoundCloud's API without any issue. Recently we've started running into 504 errors when trying to request a user's track list. The request for the user's metadata are perfectly fine, but the track list will return a 504 about 80% of the time now.
Has anyone else experienced this? Any SoundCloud engineers able to give some support?
A sample URL is:
https://api.soundcloud.com/users/1887081/tracks.json?client_id=[OUR_APP_ID]
The docs for this call can be found here:
https://developers.soundcloud.com/docs/api/reference#tracks
Example response for the error:

That user ID, 1887081 has 78 tracks. The length of the search query and fetch is clearly longer than what their middleware/API is willing to wait for. I have two recommendations:
Write their support and ask them to optimize their backend, or query/index. In lieu of that, they could also increase the timeout.
You should use pagination. limit=10 and offset=0 to fetch the first 10. offset=10 to fetch the next page, etc.
Also, if this is a production level app on your end, I would recommend use an API monitoring tool like Runscope. You can do automated scheduled monitoring with simple assertions (no programming), such as checking for a status 200, or even specific content that you know should be there in the JSON, etc. That way, when things go south, or performance degrades in any way, you'll know ahead of time, rather than having to figure it out after your app breaks because of 403s.

Google+ .Net API - Getting Authenticated and retrieving profile

I'm trying to get a users profile information for google+ via the .NET API but am having trouble.
Does anyone know if they have changed how the special ID "me" works?
In the documentation it says this can be used as a special ID to get the currently authenticated users information however this throws a 404 from both the API in my code and on Google's own test page https://developers.google.com/+/api/latest/people/get. I was logged in when trying this.
Does anyone know how to get the user ID as I would happily use that instead of me but it isn't returned after the user authenticates as far as I can see (just an authcode)?
I also tried using user IDs returned when using the standard .net Oauth stuff but it isn't the correct ID, I assume it is for something else.
As for my method of getting to this stage, I first downloaded the example files here: http://code.google.com/p/google-api-dotnet-client/wiki/GettingStarted
They don't have a plus example so I took the Tasks.ASP.NET.SimpleOAuth2 example and swapped out tasks (which worked fine) for the plus equivalent.
I also tried rolling this into my own project.
Neither worked. I get the user forwarded to Google where they give me access fine and then when I return they are authenticated successfully as far as I can see, however when I call service.People.Get("me") it returns a 404.
If anyone could help with the above questions (using me, or gettign the user ID) I would appreciate it.
To the moderator who closed the initial version of this question, I have tried to make this as direct a question as possible so please don't close it. This is a legitimate question I would really like help getting to he bottom of.

This is now out of date given recent platform updates. Although the plus.me scope still exists and this code will work, you should be using the plus.login scope for retrieving profile data in C#. For a great way to get started with retrieving and rendering profile information, please start from the Google+ C# quick start available here:
https://developers.google.com/+/quickstart/csharp
First off, the 'me' id still works and is unchanged. The way that it works is:
You authenticate the user using a standard OAUTH2 flow
You use the library to perform a People.get with the special value 'me'
The 404 error code is a little troubling, this means that the client isn't finding the endpoint. To debug this, you might want to use a packet sniffer like fiddler to see what the actual URL it's querying is.
Anyways, how about some C# code. The following example shows how to use the plus service to get the currently authenticated user (assuming you have authenticated someone). What's different from your snippet is that you need to form a get request for the API call, then run fetch on the get request. I've included the following example, for getting 'me', and the following code works:
var auth = CreateAuthenticator();
plusService = new PlusService(auth);
if (plusService != null)
{
PeopleResource.GetRequest prgr = plusService.People.Get("me");
Person me = prgr.Fetch();
}
All of the configuration of the server and getting a client working is pretty hard and pasting all of the code here would be less helpful than just giving you a sample.
And so... I have written a sample application that demonstrates how to do this and also includes a wrapper that makes it easier to develop using the Google+ API in C#. Grab it here:
Google+ C# Server-Side demo and library

Seems you need to use:
Person test = service.People.Get("me").Fetch();
and not
req = service.People.Get("me");
Person test = req.Fetch();
Even though they seem to be identical the first works and the second doesn't.
Still not sure why google's own page doesn't work though. Now to find out how to add things to the scope like birthday.

Table blocked on YQL?

I'm trying to retrieve a user timeline from Twitter using YQL's community Twitter table. The full REST url is
https://query.yahooapis.com/v1/public/yql?q=SELECT%20*%20FROM%20twitter.status.timeline.home%20WHERE%20oauth_consumer_key%20%3D%20'kt9wDTrDREjXzRhBMpw'%20AND%20oauth_consumer_secret%20%3D%20'zNnA76G3NhZSeaJdRv7munbyutlcqK8k0hazf6JrEo'%20AND%20oauth_token%20%3D%20'195tuy9661-yJFEsgA0VPCwg6gsNHtuy2y2Kq2LwTdKe4BRYa4j'%20AND%20oauth_token_secret%20%3D%20'myWfyDTtOHscMmJy6tuyU1XDyiZJiIIRkK7sIPvT2ngI'&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
(keys have been mangled to protect the guilty)
The response I get is:
The current table
'twitter.status.timeline.user' has
been blocked. It exceeded the allotted
quotas of either time or instructions
As I seem to be doing the querying correctly, I'm at a bit of a loss as to why I should get this response, particularly since it works as it should through the YQL console. The only thing I can think of is that I need to authorize my query somehow with an API key, or oAuth credentials, but I haven't been able to find a comprehensible example of how to do this.
Can anyone possibly point me in the right direction on this? YQL's community tables seem to offer a marvelous way to do very complicated things with ease, so I'd hate to fall at the last hurdle so to speak.

According to the twitter docs the call to this API endpoint is supposed to return the last tweets from the authorized user, right? Not from any kind of user. Just checking that this is really what you want to achieve.
From: http://dev.twitter.com/doc/get/statuses/home_timeline
Returns the 20 most recent statuses,
including retweets if they exist,
posted by the authenticating user and
the user's they follow. This is the
same timeline seen by a user when they
login to twitter.com.
This is the definition of the datatable that you are using. I am a bit confused about the #id parameter in the example of that datatable because I don't see it being used anywhere.
www.datatables.org/twitter/twitter.status.timeline.home.xml
The error message you get sounds like an internal YQL error message and not like something that comes from Twitter, doesn't it?
Sorry for not being able to provide answer right now but maybe raising other related questions can help somebody else or you to figure it out. If I crack this later I will add to this again.

Do I need to send a 404?

We're in the middle of writing a lot of URL rewrite code that would basically take ourdomain.com/SomeTag and some something dynamic to figure out what to display.
Now if the Tag doesn't exist in our system, we're gonna display some information helping them finding what they were looking for.
And now the question came up, do we need to send a 404 header? Should we? Are there any reasons to do it or not to do it?
Thanks
Nathan

You aren't required to, but it can be useful for automated checkers to detect the response code instead of having to parse the page.

I certainly send proper response codes in my applications, especially when I have database errors or other fatal errors. Then the search engine knows to give up and retry in 5 mins instead of indexing the page. e.g. code 503 for "Service Unavailable" and I also send a Retry-After: 600 to tell it to try again...search engines won't take this badly.
404 codes are sent when the page should not be indexed or doesn't exist (e.g. non-existent tag)
So yes, do send status codes.

I say do it - if the user is actually an application acting on behalf of the user (i.e. cURL, wget, something custom, etc...) then a 404 would actually help quite a bit.

You have to keep in mind that the result code you return is not for the user; for the standard user, error codes are meaningless so don't display this info to the user.
However think about what could happen if the crawlers access your pages and consider them valid (with a 200 response); they will start indexing the content and your page will be added to the index. If you tell the search engine to index the same content for all your not found pages, it will certainly affect your ranking and if one page appears in the top search results, you will look like a fool.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas