I use Twitter4J libraries to access Twitter through their Search API.
I pass the following query to Twitter4J:
Query{query='#hungergames', lang='null', locale='null', maxId=-1, rpp=100, page=-1, since='null', sinceId=241378725860618240, geocode='null', until='null', resultType='recent', nextPageQuery='null'}
and
result = twitter.search(query);
but I am not sure what URL it executes internally.
Any insights into how I can find that out?
I know the Twitter API documentation describes how to form a query URL, but I want to know which URL Twitter4J actually executed.
The easiest way would probably be to sniff network traffic with a tool like Wireshark.
I used the following code to replicate your query:
public static void main(String[] args) throws TwitterException {
    Twitter twitter = new TwitterFactory().getInstance();
    Query query = new Query("#hungergames");
    query.rpp(100);
    query.setSinceId(241378725860618240L);
    query.setResultType(Query.RECENT);
    System.out.println(query);
    QueryResult result = twitter.search(query);
    for (Tweet tweet : result.getTweets()) {
        System.out.println(tweet.getFromUser() + ":" + tweet.getText());
    }
}
The line that prints the query gives me:
Query{query='#hungergames', lang='null', locale='null', maxId=-1, rpp=100, page=-1, since='null', sinceId=241378725860618240, geocode='null', until='null', resultType='recent'}
By sniffing the network traffic I found that the code is requesting the following URL:
http://search.twitter.com/search.json?q=%23hungergames&rpp=100&since_id=241378725860618240&result_type=recent&with_twitter_user_id=true&include_entities=true
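As a rough illustration of how the printed Query fields line up with the sniffed URL, here is a small JavaScript sketch. It is not Twitter4J's code: buildSearchUrl is a hypothetical helper, and the parameter names and the two fixed flags are simply copied from the request captured above. The since ID is passed as a string because it is too large for a JavaScript number to hold exactly.

```javascript
// Hypothetical helper (not part of Twitter4J): rebuilds the search URL
// from the fields shown by Query.toString().
function buildSearchUrl(query) {
  const params = [["q", query.query]];
  if (query.rpp !== undefined) params.push(["rpp", String(query.rpp)]);
  if (query.sinceId !== undefined) params.push(["since_id", query.sinceId]);
  if (query.resultType !== undefined) params.push(["result_type", query.resultType]);
  // These two flags appear in the sniffed request; they are assumed to be fixed.
  params.push(["with_twitter_user_id", "true"], ["include_entities", "true"]);
  const queryString = params
    .map(([name, value]) => name + "=" + encodeURIComponent(value))
    .join("&");
  return "http://search.twitter.com/search.json?" + queryString;
}

const rebuiltUrl = buildSearchUrl({
  query: "#hungergames",
  rpp: 100,
  sinceId: "241378725860618240", // string: exceeds Number.MAX_SAFE_INTEGER
  resultType: "recent",
});
console.log(rebuiltUrl);
```

With these inputs the sketch reproduces the captured URL, including the %23 escape for the # character.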
I have written the code:
function getId(username) {
var infoUrl = "https://www.instagram.com/web/search/topsearch/?context=user&count=0&query=" + username
return parseInt(fetch(infoUrl)['users']);
}
function fetch(url) {
var ignoreError = {
"muteHttpExceptions": true
};
var source = UrlFetchApp.fetch(url, ignoreError).getContentText();
var data = console.log(source);
return data;
}
It is supposed to return the user ID for the username passed in.
The error points to this line:
return parseInt(fetch(infoUrl)['users']);
I have tried different things but I can't get it to work. The URL leads to a page that looks like this:
{"users": [{"position": 0, "user": {"pk": "44173477683", "username": "mykindofrock", "full_n........
Where the numbers 44173477683 after the "pk": are what I am trying to get as an output.
I hope someone can help as I am very out of my depth, but I guess this is how we learn! :)
I was surprised that the endpoint you provided actually led to a JSON file. I would have thought that to access the Instagram API you would need to register a developer account with Facebook, etc. Nevertheless, it does return JSON when visited in the browser. I suppose that it just shows the publicly available information on each user.
However, with Apps Script it seems like a different story. I visited:
https://www.instagram.com/web/search/topsearch/?context=user&count=0&query=user
In a browser and chose a random user id. Then I called it from Apps Script with UrlFetchApp:
function test(){
var username = "username7890543216"
var infoUrl = "https://www.instagram.com/web/search/topsearch/?context=user&count=0&query=" + username
var options = {
'muteHttpExceptions': true
}
var result = UrlFetchApp.fetch(infoUrl, options)
console.log(result.getResponseCode())
}
This returns a 429 response, which means "Too Many Requests". So if I had to guess, I would say that all requests to this unauthenticated endpoint from Apps Script have been blocked. This is why, when you replace console.log(result.getResponseCode()) with console.log(result.getContentText()), you get a load of HTML (not JSON), part of which says:
<title>
Page Not Found • Instagram
</title>
Though maybe it's IP-based. Try running this code from your end; unless you get a response code of 200, it is likely that you simply can't access this information from Apps Script.
You are setting data to the return value of console.log(source), which is undefined. So no matter what the response contains, your fetch function returns undefined.
Another thing to avoid is naming your own function fetch: in many JavaScript environments fetch is a built-in function for making HTTP requests, so redefining it invites confusion.
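To make the two points above concrete, here is a minimal sketch of the parsing step on its own. It assumes the response has the shape shown in the question; extractUserId is a hypothetical helper name, and the sample string is a shortened version of the JSON from the question.

```javascript
// Hypothetical helper: returns the first user's "pk" from the response text,
// or null when the body is not JSON (e.g. an HTML error page) or has no users.
function extractUserId(responseText) {
  let data;
  try {
    data = JSON.parse(responseText);
  } catch (err) {
    return null; // not JSON, for example the "Page Not Found" HTML
  }
  const users = data && Array.isArray(data.users) ? data.users : [];
  if (users.length === 0 || !users[0].user) {
    return null;
  }
  return String(users[0].user.pk);
}

const sample =
  '{"users": [{"position": 0, "user": {"pk": "44173477683", "username": "mykindofrock"}}]}';
console.log(extractUserId(sample));
console.log(extractUserId("<title>Page Not Found</title>"));
```

In Apps Script you would pass it the text from UrlFetchApp.fetch(url, options).getContentText(), ideally after checking that getResponseCode() returned 200.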
URLs on my site that contain the special '#!' parameter are not fetched or indexed by Google, and I get a crawl error for every URL that includes it:
mysite.com/products/دوربین/1187/view/#!/productgroup-1187/attributes-576644-2207/
Fetch as Google only handles this part:
/products/دوربین/1187/view/
In other words, Google's fetch does not show or recognize any characters after the '#!'.
This parameter is used for sorting and filtering.
In Google's URL parameters settings the parameter shows up as "_escaped_fragment_", and I changed its setting there.
Crawl error URL:
mysite.com/products/%DA%AF%D9%88%D8%B4%DB%8C/1145/view/?_escaped_fragment_=/productgroup-1145/attributes-100686-2305/
I think you might be able to encode those characters using a website like this: http://www.url-encode-decode.com For example:
mysite.com/products/دوربین/1187/view/#!/productgroup-1187/attributes-576644-2207
becomes:
mysite.com%2Fproducts%2F%D8%AF%D9%88%D8%B1%D8%A8%DB%8C%D9%86%2F1187%2Fview%2F%23%21%2Fproductgroup-1187%2Fattributes-576644-2207
Depending on the language you are using there are library functions that can help you achieve this translation programmatically.
As you are using C#, you could try the UriBuilder class. Here's some demo code:
using System;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var oURL = new UriBuilder("mysite.com/products/دوربین/1187/view/#!/productgroup-1187/attributes-576644-2207");
            string sEscapedURL = oURL.Uri.AbsoluteUri;
            Console.WriteLine("sEscapedURL = {0}", sEscapedURL);
            Console.ReadLine(); //Pause
        }
    }
}
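For comparison, here is a minimal JavaScript sketch of the same idea. encodePath percent-encodes only the path segments and keeps the slashes, and toEscapedFragmentUrl shows the '#!' to "_escaped_fragment_" rewrite that the crawl-error URL in the question suggests. Both function names are hypothetical, and the fragment is copied as-is without any extra escaping, which is a simplification.

```javascript
// Percent-encode each path segment as UTF-8 but keep the "/" separators.
function encodePath(path) {
  return path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}

// Rewrite "#!fragment" into "?_escaped_fragment_=fragment".
// Simplification: the fragment itself is not escaped any further.
function toEscapedFragmentUrl(url) {
  const marker = "#!";
  const index = url.indexOf(marker);
  if (index === -1) return url;
  const base = url.slice(0, index);
  const fragment = url.slice(index + marker.length);
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + fragment;
}

// The Persian path segment from the question, written with escapes for clarity.
const path = "/products/\u062F\u0648\u0631\u0628\u06CC\u0646/1187/view/";
const hashbangUrl =
  "mysite.com" + encodePath(path) + "#!/productgroup-1187/attributes-576644-2207/";
console.log(hashbangUrl);
console.log(toEscapedFragmentUrl(hashbangUrl));
```

Unlike the fully encoded string shown earlier, this keeps the slashes and the '#!' readable, so the result is still usable as a URL.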
I'm having no luck getting a response from v4 of the Google Sheets API when running against a public (i.e. "Published To The Web" AND shared with "Anyone On The Web") spreadsheet.
The relevant documentation states:
"If the request doesn't require authorization (such as a request for public data), then the application must provide either the API key or an OAuth 2.0 token, or both—whatever option is most convenient for you."
And to provide the API key, the documentation states:
"After you have an API key, your application can append the query parameter key=yourAPIKey to all request URLs."
So, I should be able to get a response listing the sheets in a public spreadsheet at the following URL:
https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}?key={myAPIkey}
(with, obviously, the id and key supplied in the path and query string respectively)
However, when I do this, I get an HTTP 401 response:
{
error: {
code: 401,
message: "The request does not have valid authentication credentials.",
status: "UNAUTHENTICATED"
}
}
Can anyone else get this to work against a public workbook? If not, can anyone monitoring this thread from the Google side either comment or provide a working sample?
I managed to get this working. I was frustrated at first too, and this is not a bug. Here's how I did it:
First, enable these APIs in your Google Developers Console (GDC) to get rid of authentication errors:
- Google Apps Script Execution API
- Google Sheets API
Note: Make sure the Google account you use in GDC is the same account you're using in the spreadsheet project; otherwise you might get a "The API Key and the authentication credential are from different projects" error message.
Go to https://developers.google.com/oauthplayground where you will acquire authorization tokens.
On Step 1, choose Google Sheets API v4 and select the https://www.googleapis.com/auth/spreadsheets scope so you have both read and write permissions.
Click the Authorize APIs button. Allow the authentication and you'll proceed to Step 2.
On Step 2, click Exchange authorization code for tokens button. After that, proceed to Step 3.
On Step 3, paste your request URL. Since the default HTTP method is GET, just click the Send the request button.
Note: Make sure your request URLs are the ones indicated in the Sheets API v4 docs.
Here's my sample URL request:
https://sheets.googleapis.com/v4/spreadsheets/SPREADSHEET_ID?includeGridData=false
I got an HTTP/1.1 200 OK and it displayed my requested data. The same goes for all Sheets API v4 server-side requests.
Hope this helps.
We recently fixed this and it should now be working. Sorry for the troubles, please try again.
The document must be shared to "Anyone with the link" or "Public on the web". (Note: the publishing settings from "File -> Publish to the web" are irrelevant, unlike in the v3 API.)
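For reference, a minimal sketch of building the request URL from the question with the API key as a query parameter. buildSpreadsheetUrl is a hypothetical helper, and SPREADSHEET_ID and API_KEY are placeholders rather than real values.

```javascript
// Builds the spreadsheet URL quoted in the question, with the key in the query string.
function buildSpreadsheetUrl(spreadsheetId, apiKey) {
  const base = "https://sheets.googleapis.com/v4/spreadsheets/";
  return base + encodeURIComponent(spreadsheetId) + "?key=" + encodeURIComponent(apiKey);
}

// Placeholders only; substitute your own spreadsheet ID and API key.
console.log(buildSpreadsheetUrl("SPREADSHEET_ID", "API_KEY"));
```

The resulting URL is meant to be requested with a plain GET against a spreadsheet that is shared as described above.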
This is not a solution to the problem, but I think it is a good way to achieve the goal. At http://embedded-lab.com/blog/post-data-google-sheets-using-esp8266/ I found out how to update a spreadsheet using Google Apps Script. That example uses the GET method; I will try to show you the POST method with JSON.
How to POST:
Create a Google Spreadsheet and, under Tools > Script Editor, paste the following script. Modify the script by entering the appropriate spreadsheet ID and sheet tab name (the openByUrl and getSheetByName calls in SaveData).
function doPost(e)
{
  var success = false;
  if (e != null)
  {
    // The request body arrives as a JSON string
    var JSON_RawContent = e.postData.contents;
    var PersonalData = JSON.parse(JSON_RawContent);
    success = SaveData(
      PersonalData.Name,
      PersonalData.Age,
      PersonalData.Phone
    );
  }
  // Return plain text output
  return ContentService.createTextOutput("Data saved: " + success);
}
function SaveData(Name, Age, Phone)
{
  try
  {
    var dateTime = new Date();
    // Paste the URL of the Google Sheet, starting from https through /edit
    // E.g.: https://docs.google.com/spreadsheets/d/---YOUR SPREADSHEET ID---/edit
    var MyPersonalMatrix = SpreadsheetApp.openByUrl("https://docs.google.com/spreadsheets/d/---YOUR SPREADSHEET ID---/edit");
    var MyBasicPersonalData = MyPersonalMatrix.getSheetByName("BasicPersonalData");
    // First empty row after the last row that has content
    var row = MyBasicPersonalData.getLastRow() + 1;
    MyBasicPersonalData.getRange("A" + row).setValue(Name);
    MyBasicPersonalData.getRange("B" + row).setValue(Age);
    MyBasicPersonalData.getRange("C" + row).setValue(Phone);
    return true;
  }
  catch(error)
  {
    return false;
  }
}
Now save the script and go to Publish > Deploy as Web App, with these settings:
Execute the app as: Me (xyz@gmail.com)
Who has access to the app: Anyone, even anonymous
Then, to test, you can use the Postman app.
Or using UWP:
// Requires: using System; using System.Net.Http; using System.Text; using Newtonsoft.Json;
private async void Button_Click(object sender, RoutedEventArgs e)
{
    using (HttpClient httpClient = new HttpClient())
    {
        httpClient.BaseAddress = new Uri(@"https://script.google.com/");
        httpClient.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
        httpClient.DefaultRequestHeaders.AcceptEncoding.Add(new System.Net.Http.Headers.StringWithQualityHeaderValue("utf-8"));
        string endpoint = @"/macros/s/---YOUR SCRIPT ID---/exec";
        try
        {
            PersonalData personalData = new PersonalData();
            personalData.Name = "Jarek";
            personalData.Age = "34";
            personalData.Phone = "111 222 333";
            // Serialize the object to JSON and send it as the POST body
            HttpContent httpContent = new StringContent(JsonConvert.SerializeObject(personalData), Encoding.UTF8, "application/json");
            HttpResponseMessage httpResponseMessage = await httpClient.PostAsync(endpoint, httpContent);
            if (httpResponseMessage.IsSuccessStatusCode)
            {
                string jsonResponse = await httpResponseMessage.Content.ReadAsStringAsync();
                //do something with json response here
            }
        }
        catch (Exception ex)
        {
            // handle or log the error here
        }
    }
}
public class PersonalData
{
    public string Name;
    public string Age;
    public string Phone;
}
The code above requires the Newtonsoft.Json NuGet package.
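One thing the doPost script does not do is guard against a missing or malformed body. Here is a minimal sketch of that parsing step with some validation added; parsePersonalData is a hypothetical helper, and the event shape (e.postData.contents) is taken from the script above.

```javascript
// Hypothetical stand-in for the parsing step of doPost(e): returns the three
// fields from the JSON body, or null if the body is missing or malformed.
function parsePersonalData(e) {
  if (!e || !e.postData || typeof e.postData.contents !== "string") {
    return null;
  }
  let data;
  try {
    data = JSON.parse(e.postData.contents);
  } catch (err) {
    return null; // body was not valid JSON
  }
  if (data === null || typeof data !== "object") {
    return null;
  }
  const { Name, Age, Phone } = data;
  if (Name === undefined || Age === undefined || Phone === undefined) {
    return null;
  }
  return { Name, Age, Phone };
}

// The same payload the UWP client sends.
const body = JSON.stringify({ Name: "Jarek", Age: "34", Phone: "111 222 333" });
console.log(parsePersonalData({ postData: { contents: body } }));
console.log(parsePersonalData({ postData: { contents: "not json" } }));
```

Inside doPost you could call SaveData only when this returns a non-null object, and report "Data saved: false" otherwise.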
If your spreadsheet is public and you are using an API key, make sure you are sending an HTTP GET request. With a POST request you will receive this error.
I ran into the same thing when getting data with spreadsheets.getByDataFilter, which uses a POST request.
I'm trying to make Steam login work for my website with DotNetOpenAuth.
But looking through the examples in the documentation doesn't give me any idea how to make it work.
Here's what I have done so far:
1) Added the DotNetOpenAuth reference files to \bin and to the configuration
2) Added a unique user field in the database for the response I get back from DotNetOpenAuth.
So here's my question:
How can I retrieve the Steam ID with DotNetOpenAuth?
I found some examples done in PHP:
http://forums.steampowered.com/forums/showthread.php?t=1430511
This should do it, once you have included the correct DotNetOpenAuth references.
This is code you would typically put in your login page. The first if checks whether we have a response from Steam and deals with it.
The else part sets up the request and redirects the user to Steam; Steam will then redirect back to this page once the user has logged in.
Unlike other OpenID providers, Steam does not provide other user information (email, etc.) in response to a claims request. It only provides response.ClaimedIdentifier, a URL with the user's Steam ID at the end.
You will have to do some string manipulation if you want only the ID.
protected void Page_Load(object sender, EventArgs e)
{
    var openid = new OpenIdRelyingParty();
    var response = openid.GetResponse();
    if (response != null)
    {
        switch (response.Status)
        {
            case AuthenticationStatus.Authenticated:
                // do success
                var responseURI = response.ClaimedIdentifier.ToString();
                //"http://steamcommunity.com/openid/id/76561197969877387"
                // last part is steam user id
                break;
            case AuthenticationStatus.Canceled:
            case AuthenticationStatus.Failed:
                // do fail
                break;
        }
    }
    else
    {
        using (OpenIdRelyingParty openidd = new OpenIdRelyingParty())
        {
            IAuthenticationRequest request = openidd.CreateRequest("http://steamcommunity.com/openid");
            request.RedirectToProvider();
        }
    }
}
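The string manipulation mentioned above could look roughly like this. It is a JavaScript sketch rather than C#, extractSteamId is a hypothetical helper, and it assumes the claimed identifier always has the form shown in the code comment.

```javascript
// Returns the numeric Steam ID at the end of a claimed identifier URL, or null
// if the URL does not match. The ID is kept as a string because it is too
// large for a JavaScript number to represent exactly.
function extractSteamId(claimedIdentifier) {
  const match = /\/openid\/id\/(\d+)\/?$/.exec(claimedIdentifier);
  return match ? match[1] : null;
}

console.log(extractSteamId("http://steamcommunity.com/openid/id/76561197969877387"));
```

In C# the equivalent would be taking the last path segment of response.ClaimedIdentifier and storing it in the unique user field.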
I am new to Web Crawling, and I am using HttpWebRequest to crawl data from sites.
So far I have been able to crawl and get data from my WordPress site. This was simple user profile data (name, email, AIM ID, etc.).
Now, as an exercise, I want to crawl Wikipedia: I will take the value entered into a textbox on my end, search Wikipedia with it, and get the matching title(s) from the search results.
I have the following doubts/difficulties.
Firstly, is this even possible? I have heard that Wikipedia has a robots.txt set up to block this, though I have heard this only from a friend and so I'm not sure.
I am using the same procedure I used earlier, but I am not getting the required results.
Thanks !
Update :
After some explanation and help from @svick, I tried the code below, but I am still not able to get any value (see the last line of code, where I am expecting the HTML markup of the search result page):
string searchUrl = "http://en.wikipedia.org/w/index.php?search=Wikipedia&title=Special%3ASearch";
var postData = new StringBuilder();
postData.Append("search=" + model.Query);
postData.Append("&");
postData.Append("title=" + "Special:Search");
byte[] data2 = Crawler.GetEncodedData(postData.ToString());
var webRequest = (HttpWebRequest)WebRequest.Create(searchUrl);
webRequest.Method = "POST";
webRequest.UserAgent = "Crawling HW (http://yassershaikh.com/contact-me/)";
webRequest.AllowAutoRedirect = false;
ServicePointManager.Expect100Continue = false;
Stream requestStream = webRequest.GetRequestStream();
requestStream.Write(data2, 0, data2.Length);
requestStream.Close();
var responseCsv = (HttpWebResponse)webRequest.GetResponse();
Stream response = responseCsv.GetResponseStream();
// Todo Parsing
var streamReader = new StreamReader(response);
string val = streamReader.ReadToEnd();
// val is empty !! <-- this is my problem !
and here is my GetEncodedData method definition:
public static byte[] GetEncodedData(string postData)
{
    var encoding = new ASCIIEncoding();
    byte[] data = encoding.GetBytes(postData);
    return data;
}
Please help me with this.
You probably don't need to use HttpWebRequest. Using WebClient (or HttpClient if you're on .NET 4.5) will be much easier for you.
robots.txt doesn't actually block anything. It is only honored by clients that choose to support it (and .NET doesn't), so a client that ignores it can access anything.
Wikipedia does block requests that don't have a User-Agent header set, and you should use an informative User-Agent string that includes your contact information.
A better way to access Wikipedia is to use its API, rather than scraping. This way, you will get an answer that's specifically meant to be read by custom applications, formatted as XML or JSON. There are also dumps containing all information from Wikipedia available for download.
EDIT: The problem with your newly posted code is that your query returns a 302 Moved Temporarily response redirecting to the searched article, if it exists. Either remove the line that sets AllowAutoRedirect to false, or add &fulltext=Search to your query, which means you won't get redirected.
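As a rough sketch of the API route, this is how a search URL for the MediaWiki API could be built. buildWikipediaSearchUrl is a hypothetical helper and the search term is only an example; the sketch builds the URL and does not send the request.

```javascript
// Builds a MediaWiki API search URL (action=query, list=search) that asks for JSON.
function buildWikipediaSearchUrl(searchTerm) {
  const params = new URLSearchParams({
    action: "query",
    list: "search",
    srsearch: searchTerm,
    format: "json",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

console.log(buildWikipediaSearchUrl("Hunger Games"));
```

Requesting that URL with a descriptive User-Agent should return JSON in which the matching titles are expected under query.search, so no HTML parsing is needed.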