Scraping with CasperJS/PhantomJs - phantomjs

I want to scrape some data from a popular site with CasperJS. I have already scraped some data successfully through a pool of proxies. Now I'm worried about the HTTP headers that are sent with my requests.
I know they carry a lot of information about me and my servers. Is there a way to delete or modify the outgoing HTTP headers?

You can add custom headers in CasperJS with the headers property, so you should be able to alter the headers you are concerned about.
Example: http://casperjs.org/api.html#casper
casper.open('http://some.testserver.com/post.php', {
    method: 'post',
    data: {
        'title': 'Plop',
        'body': 'Wow.'
    },
    headers: {
        'Accept-Language': 'fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3'
    }
});

Related

Axios request to AEM servlet redirecting to login.html

I have a working servlet that tests properly with Postman, but I can't get the request to execute from the front end. The fact that Postman can execute the servlet with either a Get or a Post tells me the problem is likely with the front-end code.
Does anyone see where the misconfiguration is in this block? The Basic key and cookie are copied from Postman, and there is no CORS problem.
const response = await axios.get(url, null, {
    headers: {
        'Access-Control-Allow-Origin': '*',
        'Accept': '*/*',
        'Content-type': 'application/json',
        'Access-Control-Allow-Methods': 'GET, POST, OPTIONS, PUT, PATCH, DELETE',
        'Access-Control-Allow-Headers': 'Origin, Content-Type, X-Auth-Token',
        'Authorization': 'Basic YWRtaW46YWRtaW4='
    },
    withCredentials: true,
    Cookie: "cq-authoring-mode=TOUCH;",
    params: {
        path: rootPath,
        maxCount: sourceMax
    }
}).catch(err => {
    console.log(err)
}, () => {
    console.log(response)
}).then(res => {
    console.log(res)
})
This is most likely the CSRF filter which rejects some requests that don’t contain a CSRF token. By default it checks only POST, PUT and DELETE requests.
It’s odd that it also checks your request, which seems to be a GET. Either your filter is configured differently, or the Content-type header you are sending (which describes the content type of the request body) makes axios switch the request from GET to POST, because GETs don’t have a request body and thus don’t need to declare a content type.
The CSRF filter can be configured in various ways and can exclude certain requests from filtering by path or user-agent.
You could also request a token from the /libs/granite/csrf/token.json endpoint and then send it along in your request. One way to do this is via the query, as the :cq_csrf_token param.
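A minimal sketch of that token approach, under a few assumptions: the token endpoint is assumed to answer with JSON of the form { "token": "..." }, the helper names withCsrfToken and callServlet are hypothetical, and origin and servletPath stand in for your own AEM host and servlet path. It uses the standard fetch and URL APIs rather than axios to stay self-contained.

```javascript
// Hypothetical helper: append the CSRF token as the :cq_csrf_token query parameter,
// keeping any query parameters the servlet URL already has.
function withCsrfToken(servletUrl, token) {
  const url = new URL(servletUrl);
  url.searchParams.set(":cq_csrf_token", token);
  return url.toString();
}

// Fetch a token first, then call the servlet with it.
// `origin` and `servletPath` are placeholders; `fetchImpl` can be swapped for testing.
async function callServlet(origin, servletPath, fetchImpl = fetch) {
  const tokenRes = await fetchImpl(origin + "/libs/granite/csrf/token.json", {
    credentials: "include", // send the login cookies along
  });
  const { token } = await tokenRes.json(); // assumed response shape: { "token": "..." }
  return fetchImpl(withCsrfToken(origin + servletPath, token), {
    credentials: "include",
  });
}
```

If the filter is indeed what rejects the request, the call should stop redirecting to login.html once the token is attached; if it still redirects, the cause is more likely authentication than CSRF.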

Vuejs Issue with the second request having 300ms overhead

I started learning Vue.js and I saw strange behavior with the HTTP requests to my web server. I'm using axios to make the requests.
I installed vue-cli version 3.11 without using webpack, and I added a vue.config.js file next to package.json with the following content to proxy requests to my web server:
module.exports = {
    devServer: {
        port: 8081,
        proxy: "http://127.0.0.1:8080"
    }
};
Then, using axios, I make a GET request when a button is clicked:
axios.get(API_URL, {
    headers: {
        Accept: "application/json",
        "Content-Type": "application/json",
        "Cache-Control": "no-cache"
    }
}).then(response => {});
My web server is based on Python and bottle, and the API request that fetches users takes between 5ms and 16ms to process.
In the browser inspector I saw that the second request has an overhead of around 300ms:
[Screenshot: request and response parameters]
Does anyone have an opinion about this behavior? I searched a bit for vue-cli 3 issues and CORS headers but did not find anything helpful. Thank you.

How to call Phishtank API to get JSON response?

It was really painful to find out how to call the Phishtank API.
After a lot of searching I was able to find how to call it. Below is a sample call:
https://checkurl.phishtank.com/checkurl/index.php?url=http://auto.smtpsystems.net/&format=json
But the problem with the above call is that it gives the response in XML format whereas I want the response in JSON format.
Any kind of help will be greatly appreciated.
The problem is that you are making an HTTP GET request, and this endpoint expects an HTTP POST request.
// Customize your request
var request = require('request');

var requestOptions = {
    url: 'https://checkurl.phishtank.com/checkurl/',
    method: 'POST',
    // "form" sends the fields as application/x-www-form-urlencoded
    form: {
        url: '<URL to check>', // placeholder: the URL you want to look up
        format: 'json',
        app_key: '<your application key>' // placeholder
    },
    json: true // parse the response body as JSON
};
// Do the request
request.post(requestOptions, function callback(err, httpResponse, json) {
    // Here is your JSON
});
Make sure to use https instead of http in the endpoint URL, although the documentation gives http (as of writing this).
Use an HTTP POST request, not HTTP GET.
Pass the format value as a quoted string (double quotes preferred).
# Python implementation
import requests

endpoint = "https://checkurl.phishtank.com/checkurl/"
url = "http://www.travelswitchfly.com/"
response = requests.post(endpoint, data={"url": url, "format": "json"})
You have to specify the url, format, and the app_key in the body of the POST request.
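As a rough sketch of what that body looks like, the following builds the three fields with the standard URLSearchParams class and posts them with fetch. The function names buildCheckUrlBody and checkUrl are hypothetical, the app key is treated as optional here only for illustration, and the response shape is not assumed beyond being JSON.

```javascript
// Hypothetical helper: build the form-encoded POST body with url, format and app_key.
function buildCheckUrlBody(urlToCheck, appKey) {
  const params = new URLSearchParams({ url: urlToCheck, format: "json" });
  if (appKey) {
    params.set("app_key", appKey);
  }
  return params;
}

// POST the body to the endpoint named above. Passing a URLSearchParams object as the
// body makes fetch send it as application/x-www-form-urlencoded.
async function checkUrl(urlToCheck, appKey, fetchImpl = fetch) {
  const res = await fetchImpl("https://checkurl.phishtank.com/checkurl/", {
    method: "POST",
    body: buildCheckUrlBody(urlToCheck, appKey),
  });
  return res.json();
}
```

URLSearchParams percent-encodes the values itself, so the URL to check can be passed in unencoded.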
I was trying to use their API in my Android application with the help of Retrofit. Their documentation is outdated. After spending 3 hours I learned a few things:
Use this URL: https://checkurl.phishtank.com/checkurl/ (do not use the URL with http://).
Use the interface below for the Retrofit request. It does not work with @Query; it requires @FormUrlEncoded, which Retrofit only accepts on methods that have a request body, so the call is declared as @POST.
@FormUrlEncoded
@POST("https://checkurl.phishtank.com/checkurl/")
fun findPhishing(
    @Field("format") format: String,
    @Field("url") url: String
): Single<Response>

Vue Resource Cross-site HTTP request

Normally, when I make a jQuery request to a non-local server, it applies Cross-site HTTP request rules and initially sends an OPTIONS request to verify the existence of an endpoint and then it sends the request, i.e.
GET to domain.tld/api/get/user/data/user_id
jQuery works fine, however I would like to use Vue Resource to deal with requests. In my network log, I see only the actual request being made (no OPTIONS request initially), and no data is being received.
Does anybody have an idea how to solve this?
Sample Code:
var options = {
    headers: {
        'Authorization': 'Bearer xxx'
    }
};
this.$http.get(config.api.base_url + 'open/cities',[options])
    .then(function(response){
        console.log('new request');
        vm.cities = response;
    }, function(error){
        console.log('error in .js:');
        console.log(error);
    });
Solution:
As @Anton mentioned, it's not necessary to have both requests (environment negligible). I'm not sure exactly what I changed to make it work, but the request was giving me an error, and the fix consisted in setting the headers correctly. Headers should not be passed as a separate options argument but as a property of the object given to $http:
this.$http({
    root: config.api.base_url + 'open/cities', // url, endpoint
    method: 'GET',
    headers: {
        'Authorization': 'Bearer xxx'
    }
}).then(function(response){
    console.log('new request');
    vm.cities = response;
}, function(error){
    console.log('error in .js:');
    console.log(error);
});
Thank you guys, it was a team effort :)
Is it a requirement that an additional OPTIONS request is being made? I have created a small (32 LOC) example which works fine and retrieves the data:
https://jsfiddle.net/ct372m7x/2/
As you can see, the data is being loaded from a non-local server. The example is located on jsfiddle.net and the request is made to httpbin.org, so CORS is applied (you can see the Access-Control-Allow-Origin header in the response).
What you also see is that only the GET request has been executed, no OPTIONS before that.
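Whether the browser sends an OPTIONS preflight depends on the request itself, not on the library. The sketch below is a simplified approximation of that rule (it is not the full algorithm from the Fetch standard, and the function name needsPreflight is hypothetical): GET, HEAD and POST with only a handful of safelisted headers go out directly, while other methods or custom headers such as Authorization trigger a preflight.

```javascript
// Simplified approximation of the CORS "simple request" rule; the real rule has
// further conditions (for example limits on header values) that are ignored here.
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set([
  "accept",
  "accept-language",
  "content-language",
  "content-type",
]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

// Returns true when a browser would be expected to send an OPTIONS preflight first.
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) {
    return true;
  }
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) {
      return true; // any non-safelisted header, e.g. Authorization
    }
    if (lower === "content-type") {
      const mime = String(value).split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) {
        return true; // e.g. application/json
      }
    }
  }
  return false;
}
```

Under this approximation, the plain GET in the fiddle needs no preflight, whereas a GET that actually carries an Authorization header would.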

Is it possible to post files to Slack using the incoming Webhook?

I am trying out Slack's API using the incoming webhook feature. Posting messages works flawlessly, but it doesn't seem to allow any file attachments.
Looking through the documentation, I understand I have to use a completely different OAuth-based API, but creating more tokens just to upload a file seems odd when posting messages works well. Is there no way to upload files to Slack with the incoming webhook?
No, it is not possible to upload files through an incoming webhook. But you can attach image URLs to your attachments with the image_url field.
To upload files you need to use the Slack Web API and the files.upload method. Yes, it requires different authentication, but it's not that complicated if you just use a test token for all API calls.
You can see in the Slack API document that it's easy to add an attachment to the POST message to your webhook. Here is a simple example of sending a text message with an attachment in NodeJS:
import fetch from "node-fetch";
const webhook_url = "https://hooks.slack.com/services/xxxx/xxxx/xxxxxxxx"
const url = "https://1.bp.blogspot.com/-ld1w-xCN0nA/UDB2HIY55WI/AAAAAAAAPdA/ho23L6J3TBA/s1600/Cute+Kitten+13.jpg"
await fetch(webhook_url, {
    method: "POST",
    body: JSON.stringify({
        type: "mrkdwn",
        text: "Example text",
        attachments: [
            {
                title_link: url,
                text: "Your document: <file name>"
            },
        ],
    }),
    headers: {
        "Content-Type": "application/json",
        Accept: "application/json",
    },
});