I'm trying to build a crawler using CasperJS. Some requests require editing the raw headers: I need to get the raw POST data, cookies, and so on, and once I have them, I'd like to modify them (still raw) and send another request with the modified headers. But I can't find a way to do that.
I've found how to retrieve cookies using PhantomJS, but I did not find anything in the CasperJS/SlimerJS documentation.
Thank you for your help.
You can listen for the page.resource.requested event and access the headers property of the requestData:
var casper = require('casper').create();
var utils = require('utils');
casper.start('https://example.com/');
casper.on('page.resource.requested', function (requestData, networkRequest) {
    // Dump the headers of every outgoing request
    utils.dump(requestData.headers);
});
casper.run();
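The same handler can also rewrite headers before a request is sent. The following is a minimal sketch, assuming the PhantomJS engine, where requestData.headers is an array of {name, value} objects and the second argument exposes setHeader(name, value). The makeHeaderRewriter helper and the overrides map are hypothetical names, the stand-in objects exist only so the sketch runs outside CasperJS, and SlimerJS may behave differently.

```javascript
// Build a handler for 'page.resource.requested' that overrides selected headers.
// `overrides` is a hypothetical map of header name -> new value.
function makeHeaderRewriter(overrides) {
  return function (requestData, networkRequest) {
    // Index the outgoing headers by lower-cased name for easy lookup.
    var current = {};
    (requestData.headers || []).forEach(function (h) {
      current[h.name.toLowerCase()] = h.value;
    });
    Object.keys(overrides).forEach(function (name) {
      // setHeader is provided by PhantomJS on the networkRequest object.
      networkRequest.setHeader(name, overrides[name]);
      current[name.toLowerCase()] = overrides[name];
    });
    return current; // resulting view of the headers, handy for logging
  };
}

// Stand-in objects (assumptions) so the sketch can be run outside CasperJS.
var sent = {};
var fakeNetworkRequest = { setHeader: function (k, v) { sent[k] = v; } };
var fakeRequestData = { headers: [{ name: 'User-Agent', value: 'Mozilla/5.0' }] };
var result = makeHeaderRewriter({ Cookie: 'session=abc' })(fakeRequestData, fakeNetworkRequest);
console.log(result);
```

Inside a CasperJS script the handler would be registered with casper.on('page.resource.requested', makeHeaderRewriter({...})).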
I'm trying to write a React Native app which will stream some tracks from SoundCloud. As a test, I've been playing with the API using Python, and I'm able to make requests to resolve the URL, pull the playlists/tracks, and do everything else I need.
With that said, when making a request to the stream_url of any given track, I get a 401 error.
The current url in question is:
https://api.soundcloud.com/tracks/699691660/stream?client_id=PGBAyVqBYXvDBjeaz3kSsHAMnr1fndq1
I've tried it without the ?client_id..., I have tried replacing the ? with &, I've tried getting another client_id, I've tried it with allow_redirects as both true and false, but nothing seems to work. Any help would be greatly appreciated.
The streamable property of every track is True, so it shouldn't be a permissions issue.
Edit:
After doing a bit of research, I've found a semi-successful workaround. The /stream endpoint of the API is still not working, but if you change your destination endpoint to http://feeds.soundcloud.com/users/soundcloud:users:/sounds.rss, it'll give you an RSS feed that's (mostly) the same as what you'd get by using the tracks or playlists API endpoint.
The link contained therein can be streamed.
Okay, I think I have found a generalized solution that will work for most people. I wish it were easier, but it's the simplest thing I've found yet.
1. Use the API to pull the user's tracks. You can use linked_partitioning and the next_href property to gather everything, because there's a maximum limit of 200 tracks per call.
2. Using the data pulled down in the JSON, you can use the permalink_url key to get the same thing you would type into the browser.
3. Make a request to the permalink_url and access the HTML. You'll need to do some parsing, but the URL you want will be something to the effect of:
"https://api-v2.soundcloud.com/media/soundcloud:tracks:488625309/c0d9b93d-4a34-4ccf-8e16-7a87cfaa9f79/stream/progressive"
You could probably use a simple regex to parse this out.
4. Make a request to this URL, adding ?client_id=..., and it'll give you YET ANOTHER URL in its returned JSON.
5. Using the URL returned from the previous step, you can link directly to that in the browser, and it'll take you to your track content. I checked in VLC by inputting the link and it streams correctly.
Hopefully this helps some of you out with your developing.
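The regex idea from the parsing step can be sketched as follows. This is only an illustration: the findProgressiveUrl name is hypothetical, and it assumes the URL appears in the page unescaped and in exactly the shape quoted above. If the page embeds it with escaped slashes, the pattern would need adjusting.

```javascript
// Pull the first "progressive" media URL out of a track page's HTML.
// Assumes the URL shape quoted in the answer:
//   https://api-v2.soundcloud.com/media/soundcloud:tracks:<id>/<uuid>/stream/progressive
function findProgressiveUrl(html) {
  var pattern = /https:\/\/api-v2\.soundcloud\.com\/media\/soundcloud:tracks:\d+\/[0-9a-f-]+\/stream\/progressive/;
  var match = html.match(pattern);
  return match ? match[0] : null; // null when nothing matches
}

// Sample input built from the URL quoted in the answer.
var sampleHtml = '<script>{"url":"https://api-v2.soundcloud.com/media/soundcloud:tracks:488625309/c0d9b93d-4a34-4ccf-8e16-7a87cfaa9f79/stream/progressive"}</script>';
var found = findProgressiveUrl(sampleHtml);
console.log(found);
```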
Since I have the same problem, the answer from @Default motivated me to look for a solution. But I did not understand the workaround with the permalink_url in steps 2 and 3. An easier solution could be:
Fetch, for example, a user's track likes using the api-v2 endpoint like this:
https://api-v2.soundcloud.com/users/<user_id>/track_likes?client_id=<client_id>
In the response we can find the needed URL, as mentioned by @Default in his answer:
collection: [
  {
    track: {
      media: {
        transcodings: [
          ...
          {
            url: "https://api-v2.soundcloud.com/media/soundcloud:tracks:713339251/0ab1d60e-e417-4918-b10f-81d572b862dd/stream/progressive"
            ...
          }
        ]
      }
    }
  },
  ...
]
Make a request to this URL with client_id as a query param, and you get another URL that you can use to stream or download the track.
Note that api-v2 is still not public, and a request from your client will probably be blocked by CORS.
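Picking the URL out of that response and adding client_id could look roughly like this. It is a sketch under assumptions: the progressiveUrls and withClientId helpers are hypothetical names, the sample object is made up to match the shape shown above, and progressive entries are recognised simply by their URL ending in /stream/progressive.

```javascript
// Collect every progressive transcoding URL from a track_likes response.
function progressiveUrls(likesResponse) {
  const urls = [];
  for (const item of likesResponse.collection || []) {
    const media = item.track && item.track.media;
    for (const t of (media && media.transcodings) || []) {
      if (typeof t.url === 'string' && t.url.endsWith('/stream/progressive')) {
        urls.push(t.url);
      }
    }
  }
  return urls;
}

// Append client_id as a query parameter without hand-building the string.
function withClientId(url, clientId) {
  const u = new URL(url);
  u.searchParams.set('client_id', clientId);
  return u.toString();
}

// Made-up sample in the shape shown above (IDs are placeholders).
const sample = {
  collection: [{
    track: { media: { transcodings: [
      { url: 'https://api-v2.soundcloud.com/media/soundcloud:tracks:1/aaaa/stream/hls' },
      { url: 'https://api-v2.soundcloud.com/media/soundcloud:tracks:1/aaaa/stream/progressive' }
    ] } }
  }]
};
const urls = progressiveUrls(sample);
const requestUrl = withClientId(urls[0], 'YOUR_CLIENT_ID');
console.log(requestUrl);
```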
As mentioned by @user208685, the solution can be a bit simpler when using the SoundCloud API v2:
1. Obtain the track ID (e.g. using the public API at https://developers.soundcloud.com/docs)
2. Get JSON from https://api-v2.soundcloud.com/tracks/TRACK_ID?client_id=CLIENT_ID
3. From the JSON, parse the MP3 progressive stream URL
4. From the stream URL, get the MP3 file URL
5. Play media from the MP3 file URL
Note: This link is only valid for a limited amount of time and can be regenerated by repeating steps 3 to 5.
Example in node (with node-fetch):
// node-fetch v2 (v3 is ESM-only and must be loaded with import instead)
const fetch = require('node-fetch');

const clientId = 'YOUR_CLIENT_ID';
(async () => {
  // Resolve the public track URL to obtain the track ID
  let response = await fetch(`https://api.soundcloud.com/resolve?url=https://soundcloud.com/d-o-lestrade/gabriel-ananda-maceo-plex-solitary-daze-original-mix&client_id=${clientId}`);
  const track = await response.json();
  const trackId = track.id;
  // Fetch the track from api-v2 to get its transcodings
  response = await fetch(`https://api-v2.soundcloud.com/tracks/${trackId}?client_id=${clientId}`);
  const trackV2 = await response.json();
  const streamUrl = trackV2.media.transcodings.filter(
    transcoding => transcoding.format.protocol === 'progressive'
  )[0].url;
  // Exchange the stream URL for the temporary MP3 file URL
  response = await fetch(`${streamUrl}?client_id=${clientId}`);
  const stream = await response.json();
  const mp3Url = stream.url;
  console.log(mp3Url);
})();
For a similar solution in Python, check this GitHub issue: https://github.com/soundcloud/soundcloud-python/issues/87
I am trying to get JSON data from Sportradar using an API request. My trial URL is:
http://api.sportradar.us/nba/trial/v4/en/games/2018/03/03/schedule.json?api_key=4j9ge4a4rgsbq597f29p9rgb
When I paste this URL into my browser, the data I get back is as expected, but when I add the API request to my Meteor project, the request does not return any data. As a test, in my client/main.js file I have added:
HTTP.call('GET',Meteor.absoluteUrl("http://api.sportradar.us/nba/trial/v4/en/games/2018/03/03/schedule.json?api_key=4j9ge4a4rgsbq597f29p9rgb"),
function(err,result){
console.log(result.data);
});
The console log result comes back as null. Any guidance or thoughts will be appreciated - cfp
You need to pass your callback function to HTTP.call() correctly. Try this:
HTTP.call('GET', 'http://api.sportradar.us/nba/trial/v4/en/games/2018/03/03/schedule.json?api_key=4j9ge4a4rgsbq597f29p9rgb',
    function (err, result) {
        // The callback is the last argument of HTTP.call(), inside its parentheses
        if (result) {
            console.log(result.data);
        }
        console.log(err);
    });
Edit: The parameters of HTTP.call() have been corrected by removing the Meteor.absoluteUrl() call used in the question, following Derrick's comment below.
You can also refer to the official documentation here.
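As a small illustration of the error-first callback pattern used here, the handler can be written as a named function that checks err before touching result. This is only a sketch: handleSchedule is a hypothetical name, and the objects passed in at the bottom are stand-ins rather than real Meteor responses. In Meteor's HTTP package, result.data is null when the response body cannot be parsed as JSON, which is why that case is handled separately.

```javascript
// Error-first callback: check `err` before reading `result`.
function handleSchedule(err, result) {
  if (err) {
    console.log('Request failed:', err.message || err);
    return null;
  }
  if (!result || result.data == null) {
    // The request succeeded but the body was not parseable as JSON.
    console.log('Response was not parseable as JSON');
    return null;
  }
  return result.data;
}

// Stand-in calls (assumptions) so the sketch runs outside Meteor.
var ok = handleSchedule(null, { statusCode: 200, data: { games: [] } });
var failed = handleSchedule(new Error('network down'));
var notJson = handleSchedule(null, { statusCode: 200, data: null });
console.log(ok, failed, notJson);
```

In the Meteor project it would be passed as the last argument: HTTP.call('GET', url, handleSchedule).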
In a particular case I need to be able to disable compression in the request/response.
Using Firefox RestClient I am able to post some XML to a web service and get the response XML back successfully with a single header parameter, "Accept-Encoding" : " ".
If I do not set this header, the response body comes back compressed as binary data (that's why I want to disable gzip in the response).
Now using the same header value in my app (using RestSharp in C#), I still get the binary data (gzip) in response.
Can someone please shed some light? Is it supported in RestSharp?
RestSharp does not support disabling compression.
If you look at the source code in Http.Sync.cs line 267 (assuming a sync request; async has the same code duplicated in Http.Async.cs line 424), you will find:
webRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip | DecompressionMethods.None;
That is, the underlying WebRequest that RestSharp uses to make the HTTP call has the compression options hardcoded. There is an open issue that documents this.
The feature (only just) seems to have been added, but stealthily - without a note on the issue's status nor on the changelogs. Possibly as it hasn't been sufficiently tested?
Nevertheless I recently had a need for this functionality and tested it - and it works. Just set the RestClient instance's AutomaticDecompression property to false.
If you intend to keep your RestClient instance long-lived, remember to do this before its first use: the setting seems to be 'locked in' after use and cannot be changed afterwards. In my case I needed to make calls with and without AutomaticDecompression, so I simply created two different RestClient instances.
Using RestSharp v106.11.4, I was unable to turn off automatic decompression as Bo Ngoh suggested. I set the AutomaticDecompression on the RestClient instance at the moment it gets instantiated, but still the Accept-Encoding header was added.
The way to set this and disable the decompression is through the ConfigureWebRequest method, which is exposed on the RestClient. The snippet below allowed me to turn off this feature:
var client = new RestClient();
client.ConfigureWebRequest(wr =>
{
wr.AutomaticDecompression = DecompressionMethods.None;
});
Not sure if this is still relevant, but for future reference:
RestRequest has an IList<DecompressionMethods> AllowedDecompressionMethods property, and when a new RestRequest is created the list is empty. Only when the Execute method is called is it filled with the default values (None, Deflate, and GZip), and only if it is still empty at that point.
To choose the decompression method, simply call AddDecompressionMethod with the method you want.
Example:
var client = new RestClient();
var request = new RestRequest(URL, Method.GET, DataFormat.None);
request.AddDecompressionMethod(DecompressionMethods.GZip);
var response = client.Execute(request);
As of RestSharp version 107, AddDecompressionMethod has been removed and most of the client options have been moved to RestClientOptions. I'm posting the solution that worked for me here, in case anyone needs it.
var options = new RestClientOptions(url)
{
AutomaticDecompression = DecompressionMethods.None
};
_client = new RestClient(options);
I have a spreadsheet on my Google Drive and I want to download a CSV from another website and put it into my spreadsheet. The problem is that I have to login to the website first, so I need to use some HTTP request to do that.
I have found this site and this. If either of these sites has the answer on it, then I clearly don't understand them enough to figure it out. Could someone help me figure this out? I feel that the second site is especially close to what I need, but I don't understand what it is doing.
To clarify again, I want to login with an HTTP request and then make a call to the same website with a different URL that is the call to get the CSV file.
I have done a lot of this in the past month, so I should be able to help you. We are trying to emulate the browser's behaviour here, so first you need to use Chrome's developer tools (or something similar) and note down exactly what the browser does: the form values posted, the URL that is called, and so on. The following example shows the general technique to be used:
The first step is to login to the website and get the session cookie:
// The actual names of the POST variables (like user_session[email]) depend on the site,
// so you need to get them either from the HTML of the login page or from the developer tools.
var payload = {
    "user_session[email]" : "username",
    "user_session[password]" : "password"
};
var options = {
    "method" : "post",
    "payload" : payload,
    "followRedirects" : false
};
var login = UrlFetchApp.fetch("https://www.website.com/login", options);
var sessionDetails = login.getAllHeaders()['Set-Cookie'];
We have now logged in to the website (to confirm, just log sessionDetails and compare it with the cookies set in Chrome). The next step depends entirely on the website, so I will give you a general example:
// This is just an example; it may or may not be needed. If it is, trace the values with the developer tools.
var downloadPayload = {
    "__EVENTTARGET" : 'ctl00$ActionsPlaceHolder$exportDownloadLink1'
};
var downloadCsv = UrlFetchApp.fetch("https://www.website.com/", {
    "headers" : {"Cookie" : sessionDetails},
    "method" : "post",
    "payload" : downloadPayload
});
Logger.log(downloadCsv.getContentText());
The file should now be logged. You can then parse the CSV using the built-in GAS function (Utilities.parseCsv) and dump the data into the spreadsheet.
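One detail worth checking when reusing the cookie: the Set-Cookie value returned by getAllHeaders() can be a single string or an array of strings, and each one carries attributes (Path, HttpOnly, and so on) that do not belong in a Cookie request header. Below is a minimal plain-JavaScript sketch of that conversion. The toCookieHeader name is hypothetical and the sample values are made up. Some sites may accept the raw value as-is, so treat this as an optional cleanup.

```javascript
// Turn one or more Set-Cookie header values into a single Cookie header value.
// Keeps only the leading "name=value" pair of each cookie and drops attributes.
function toCookieHeader(setCookie) {
  var list = Array.isArray(setCookie) ? setCookie : (setCookie ? [setCookie] : []);
  return list
    .map(function (cookie) { return cookie.split(';')[0].trim(); })
    .join('; ');
}

// Made-up sample values in the two shapes the header can take.
var fromArray = toCookieHeader(['_session=abc123; Path=/; HttpOnly', 'lang=en; Secure']);
var fromString = toCookieHeader('sid=xyz; Path=/');
console.log(fromArray);
console.log(fromString);
```

In the script above, the result would be used as the Cookie header, for example "headers" : {"Cookie" : toCookieHeader(sessionDetails)}.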
A few points to note:
I have assumed that all form POST values are static and can be hardcoded. If this is not true, let me know and I will give you a function that can extract values from the HTML.
Some websites require the browser to send a token value (the value will be present in the HTML) along with the credentials. In this case you need to extract the value and then post it.
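Extracting such a value from the login page could look roughly like this. It is a sketch, not a full HTML parser: the extractHiddenField name and the authenticity_token field are hypothetical, and it assumes the input's name and value attributes are quoted and that the tag contains no ">" inside attribute values.

```javascript
// Find the value attribute of the <input> whose name attribute equals fieldName.
// Returns '' if the input has no value, or null if no such input exists.
function extractHiddenField(html, fieldName) {
  var tags = html.match(/<input\b[^>]*>/gi) || [];
  for (var i = 0; i < tags.length; i++) {
    var name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tags[i]);
    if (name && name[1] === fieldName) {
      var value = /\bvalue\s*=\s*["']([^"']*)["']/i.exec(tags[i]);
      return value ? value[1] : '';
    }
  }
  return null;
}

// Made-up login form for illustration.
var loginHtml = '<form><input type="hidden" name="authenticity_token" value="abc123" />' +
                '<input type="text" name="user_session[email]" value=""></form>';
var token = extractHiddenField(loginHtml, 'authenticity_token');
console.log(token);
```

The extracted value would then be added to the payload object alongside the credentials, under whatever field name the site uses.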
I'm trying to fetch the JSON output of a rest api in AngularJS. Here are the problems I'm facing:
The Rest api url has the port number in it which is being interpolated by AngularJS for a variable. I tried several resolutions for this in vain.
I'm having issues with JSONP method. Rest api isn't hosted on the same domain/server and hence a simple get isn't working.
The parameters to the REST API are slash-separated and not like an HTML query string. One of the parameters is an email address, and I'm thinking the '@' symbol is causing some problem as well. I wasn't able to fix this either.
My REST API looks something like: http://myserver.com:8888/dosomething/me@mydomain.com/arg2.
Sample code / documentation would be really helpful.
I struggled a lot with this problem, so hopefully this will help someone in the future :)
JSONP expects a function callback. A common mistake is to call a URL that returns plain JSON, in which case you get an Uncaught SyntaxError: Unexpected token : error. Instead, the JSONP response should look something like this (don't get hung up on the function name in the example):
angular.callbacks._0({"id":4,"name":"Joe"})
The documentation tells you to pass JSON_CALLBACK on the URL for a reason. That will get replaced with the callback function name to handle the return. Each JSONP request is assigned a callback function, so if you do multiple requests they may be handled by angular.callbacks._1, angular.callbacks._2 and so forth.
With that in mind, your request should be something like this:
var url = 'http://myserver.com:8888/dosomething/me@mydomain.com/arg2';
$http.jsonp(url + '?callback=JSON_CALLBACK')
.then(function (response) {
$scope.mydata = response.data;
...
Then AngularJS will actually request (replacing JSON_CALLBACK):
http://myserver.com:8888/dosomething/me@mydomain.com/arg2?callback=angular.callbacks._0
Some frameworks have support for JSONP, but if your API doesn't do it automatically, you can get the callback name from the query string and wrap the JSON in it.
Example in Node.js:
var express = require('express');
var app = express();
app.get('/', function (req, res) {
    // do something to get the json
    var json = '{"id":4,"name":"Joe"}';
    res.writeHead(200, {"Content-Type": "application/javascript"});
    // Wrap the JSON in a call to the callback named in the query string
    res.write(req.query.callback + '(' + json + ')');
    res.end();
});
app.listen(8888);
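Since the callback name comes straight from the query string, it may be worth validating it before echoing it back. The following is a minimal sketch of the wrapping step with such a check. The wrapJsonp name is hypothetical, and the allowed pattern (dot-separated identifiers such as angular.callbacks._0) is an assumption about what callers send.

```javascript
// Wrap a payload in a JSONP callback, rejecting names that are not
// plain dot-separated identifiers (e.g. angular.callbacks._0).
function wrapJsonp(callbackName, payload) {
  var identifierPath = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;
  if (!identifierPath.test(callbackName || '')) {
    throw new Error('Invalid JSONP callback name');
  }
  return callbackName + '(' + JSON.stringify(payload) + ')';
}

var body = wrapJsonp('angular.callbacks._0', { id: 4, name: 'Joe' });
console.log(body);
```

In the Express handler above, this would replace the manual string concatenation: res.write(wrapJsonp(req.query.callback, data)).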
The main issue I was facing here was related to CORS. I got the $http to retrieve the JSON data from the server by disabling the web security in Chrome - using the --disable-web-security flag while launching Chrome.
Regarding the 8888 port, see if this works:
$scope.url = 'http://myserver.com:port/dosomething/:email/:arg2';
$scope.data = $resource($scope.url, {port: ':8888', email: 'me@mydomain.com',
arg2: '...' /* other defaults here */}, …)
Try escaping the ':'
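// Inside a JavaScript string literal the backslash itself must be escaped ('\\:'); a single '\:' is just ':'
var url = 'http://myserver.com\\:8888/dosomething/me@mydomain.com/arg2';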
Pretty sure I read about this somewhere else
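Separately from the port issue, reserved characters in a slash-separated parameter (such as the symbol in an email address) can be percent-encoded before the URL is assembled. Here is a minimal sketch with a hypothetical buildUrl helper and placeholder values. It assumes the server decodes percent-encoded path segments, which is worth verifying against the actual API.

```javascript
// Join path segments onto a base URL, percent-encoding each segment.
function buildUrl(base, segments) {
  var encoded = segments.map(function (segment) {
    return encodeURIComponent(segment);
  });
  return base.replace(/\/+$/, '') + '/' + encoded.join('/');
}

// Placeholder values for illustration.
var apiUrl = buildUrl('http://myserver.com:8888/dosomething', ['user@example.com', 'arg2']);
console.log(apiUrl);
```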