Exporting links in google scholar results - google-scholar

I want to export data from Google Scholar. In particular, I want to export a list of articles that cite a particular paper. If I click the Cited By link I can get this page. One way to export these data is to add all of them to my library, which can then be exported in 4 different formats (BibTeX, RefMan, EndNote, CSV). However, none of these export formats includes the HTML link (URL) to each paper.
The other strategy would be to scrape the data, but I don't want to do that, as I know this can be very tricky with Google Scholar's CAPTCHAs.
Is there a way to export the results of a Google Scholar search that includes the URL of each paper?

For the page you're on, you mean? From the browser console (F12), run:
copy($$('li > a').map(a => a.href))
Now they're in your clipboard.

To extract Cited by data, you'll need the ID of the Google Scholar organic search result that the Cited by link belongs to. You can find the ID in the data-cid HTML attribute.
You can then query the following URL to retrieve the data: https://scholar.google.com/scholar?q=info:this_is_where_you_put_the_cite_id:scholar.google.com/&output=cite
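As a minimal sketch of the URL pattern above, the helper below fills a data-cid value into the query string. The function name build_cite_url is hypothetical, and the sample ID is the one used in the SerpApi example in this answer; fetching the URL is left out because Google Scholar may answer automated requests with a CAPTCHA.

```python
from urllib.parse import quote


def build_cite_url(cite_id: str) -> str:
    """Return the Google Scholar cite URL for a result's data-cid value."""
    # quote() guards against unexpected characters in the ID; typical IDs
    # made of letters, digits, "-" and "_" pass through unchanged.
    return (
        "https://scholar.google.com/scholar"
        f"?q=info:{quote(cite_id, safe='')}:scholar.google.com/&output=cite"
    )


print(build_cite_url("FDc6HiktlqEJ"))
```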
There are also third-party solutions, such as SerpApi, that do this for you. It's a paid API with a free trial.
Example Python code (libraries are available for other languages as well):
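from serpapi import GoogleSearch

params = {
    "engine": "google_scholar_cite",  # the Google Scholar Cite engine
    "q": "FDc6HiktlqEJ",              # the data-cid of the search result
    "api_key": "secret_api_key",      # replace with your own key
}
search = GoogleSearch(params)
results = search.get_dict()           # parsed JSON response as a dict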
Example JSON output:
"citations": [
{
"title": "MLA",
"snippet": "Schwertmann, U. T. R. M., and Reginald M. Taylor. \"Iron oxides.\" Minerals in soil environments 1 (1989): 379-438."
},
{
"title": "APA",
"snippet": "Schwertmann, U. T. R. M., & Taylor, R. M. (1989). Iron oxides. Minerals in soil environments, 1, 379-438."
},
{
"title": "Chicago",
"snippet": "Schwertmann, U. T. R. M., and Reginald M. Taylor. \"Iron oxides.\" Minerals in soil environments 1 (1989): 379-438."
},
{
"title": "Harvard",
"snippet": "Schwertmann, U.T.R.M. and Taylor, R.M., 1989. Iron oxides. Minerals in soil environments, 1, pp.379-438."
},
{
"title": "Vancouver",
"snippet": "Schwertmann UT, Taylor RM. Iron oxides. Minerals in soil environments. 1989 Jan 1;1:379-438."
}
],
"links": [
{
"name": "BibTeX",
"link": "https://scholar.googleusercontent.com/scholar.bib?q=info:FDc6HiktlqEJ:scholar.google.com/&output=citation&scisdr=CgXpniNQGAA:AAGBfm0AAAAAYMu3WkYJI4po_pgcUVKgwwFp1dl5uNYk&scisig=AAGBfm0AAAAAYMu3WlZR_joxo-i8FTZ1CphjzmW_d447&scisf=4&ct=citation&cd=-1&hl=en"
},
{
"name": "EndNote",
"link": "https://scholar.googleusercontent.com/scholar.enw?q=info:FDc6HiktlqEJ:scholar.google.com/&output=citation&scisdr=CgXpniNQGAA:AAGBfm0AAAAAYMu3WkYJI4po_pgcUVKgwwFp1dl5uNYk&scisig=AAGBfm0AAAAAYMu3WlZR_joxo-i8FTZ1CphjzmW_d447&scisf=3&ct=citation&cd=-1&hl=en"
},
{
"name": "RefMan",
"link": "https://scholar.googleusercontent.com/scholar.ris?q=info:FDc6HiktlqEJ:scholar.google.com/&output=citation&scisdr=CgXpniNQGAA:AAGBfm0AAAAAYMu3WkYJI4po_pgcUVKgwwFp1dl5uNYk&scisig=AAGBfm0AAAAAYMu3WlZR_joxo-i8FTZ1CphjzmW_d447&scisf=2&ct=citation&cd=-1&hl=en"
},
{
"name": "RefWorks",
"link": "https://scholar.googleusercontent.com/scholar.rfw?q=info:FDc6HiktlqEJ:scholar.google.com/&output=citation&scisdr=CgXpniNQGAA:AAGBfm0AAAAAYMu3WkYJI4po_pgcUVKgwwFp1dl5uNYk&scisig=AAGBfm0AAAAAYMu3WlZR_joxo-i8FTZ1CphjzmW_d447&scisf=1&ct=citation&cd=-1&hl=en"
}
]
Check out the documentation for more details.
Disclaimer: I work at SerpApi.
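For reading the JSON above, here is a small sketch that assumes the response has already been parsed into a dict shaped like the example output. The helper names export_links and citation_text are hypothetical, and the links in the sample are placeholders, not real Scholar URLs.

```python
from typing import Optional


def export_links(results: dict) -> dict:
    """Map each export format name (BibTeX, EndNote, ...) to its link."""
    return {entry["name"]: entry["link"] for entry in results.get("links", [])}


def citation_text(results: dict, style: str) -> Optional[str]:
    """Return the formatted citation for a style such as "APA", if present."""
    for entry in results.get("citations", []):
        if entry["title"] == style:
            return entry["snippet"]
    return None


# Trimmed sample in the same shape as the output above (placeholder links).
sample = {
    "citations": [
        {
            "title": "APA",
            "snippet": "Schwertmann, U. T. R. M., & Taylor, R. M. (1989). "
                       "Iron oxides. Minerals in soil environments, 1, 379-438.",
        },
    ],
    "links": [
        {"name": "BibTeX", "link": "https://example.com/placeholder.bib"},
        {"name": "RefMan", "link": "https://example.com/placeholder.ris"},
    ],
}

print(export_links(sample)["BibTeX"])
print(citation_text(sample, "APA"))
```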

Duplicated content with json:api related links

In the Resource Linkage section of the JSON:API specification, I found that you can fetch a related resource object with a URL like http://example.com/articles/1/author, which refers to "the author of the article with id 1".
In the complete example on the site, we can see that the author has id 9.
// ...
{
"type": "articles",
"id": "1",
"attributes": {
"title": "Rails is Omakase"
},
"relationships": {
"author": {
"links": {
"self": "http://example.com/articles/1/relationships/author",
"related": "http://example.com/articles/1/author"
},
"data": { "type": "people", "id": "9" }
}
},
"links": {
"self": "http://example.com/articles/1"
}
}
// ...
So, if I understood it correctly, I would be able to request the same resource with two different URLs:
http://example.com/articles/1/author
http://example.com/authors/9
Is this OK?
Wouldn't this be considered duplicate content?
The article you linked discusses duplicate content in the context of a website. The JSON:API specification is about an API. A website is typically meant to be read by humans; an API is meant to be consumed by programs. The SEO concerns raised by that article do not apply to an API, because search engines like Google do not care about API responses. They may care about the website built from the data fetched from that API. That website should have a unique URL or a rel="canonical" attribute.
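From a client's point of view, JSON:API identifies a resource by its type and id pair, so two URLs returning the same resource are not a problem for consumers. The sketch below uses the article object from the question; the helper name related_identity is hypothetical.

```python
article = {
    "type": "articles",
    "id": "1",
    "relationships": {
        "author": {
            "links": {
                "self": "http://example.com/articles/1/relationships/author",
                "related": "http://example.com/articles/1/author",
            },
            "data": {"type": "people", "id": "9"},
        }
    },
}


def related_identity(resource: dict, relationship: str):
    """Return (related URL, (type, id)) for a to-one relationship."""
    rel = resource["relationships"][relationship]
    linkage = rel["data"]
    return rel["links"]["related"], (linkage["type"], linkage["id"])


url, identity = related_identity(article, "author")
# Whichever URL was used to fetch it, the client can key its cache on identity.
print(url, identity)
```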

Accessing a Word(.docx) file's content with Microsoft Graph REST API?

Is there a way to obtain the content of a Word document stored in the cloud through the Microsoft Graph API without having to download the file locally?
The goal is to build an app that analyzes a Word document's inner content and produces some interesting data from it. However, after searching through Microsoft's Dev Center, Graph Explorer, and their API's documentation repository, I can't find any API endpoints that can serve me that data.
I can find some endpoints that deal with manipulating Excel's contents, but not one that deals with Word. Does Microsoft Graph not support retrieving a Word document's content?
EDIT: For example, I know I can read the contents of a "message" and even apply a search on it through query parameters, as demonstrated by one of Microsoft's samples. But I can't seem to find how to do this with Word documents.
Well, it's possible to download the content of the document.
See: Download the contents of a DriveItem.
For example:
GET /v1.0/me/drive/root:/some-folder/document.docx:/content
But you'll get the entire docx, with embedded images and all. Don't know if this is what you are looking for.
As an example, see the helix-word2md project that fetches a docx and converts it to markdown.
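Once the bytes returned by the /content request are in memory, the text can be pulled out without writing a file to disk, since a .docx is a ZIP archive whose main body lives in word/document.xml. The following is a rough sketch using only the standard library; extract_docx_text and make_sample_docx are hypothetical names, and the sample archive is a stand-in that contains only the one XML part, not a complete Word file. Headers, footnotes, and nested structures such as text boxes are not handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the text of an in-memory .docx, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; join their w:t nodes.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


def make_sample_docx() -> bytes:
    """Build a tiny stand-in archive holding only word/document.xml."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


print(extract_docx_text(make_sample_docx()))
```

In a real app, the bytes passed to extract_docx_text would be the body of the authenticated GET request shown above.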
I'm afraid you can't directly access Word content. What you can do is use the webUrl property of a DriveItem to open the document in Word Online, or in native Word if it is installed.
You can use the requests below to retrieve a specific item or all items:
GET /users/{userId}/drive/items/{itemId}
GET /me/drive/root/children
A sample result:
{
"@microsoft.graph.downloadUrl": "",
"createdDateTime": "2018-08-10T01:43:00Z",
"eTag": "\"{00000000-3E94-4161-9B82-0000000},2\"",
"id": "00000000IOJA4ONFB6MFAZXARX7L7RU4NV",
"lastModifiedDateTime": "2018-08-10T01:43:00Z",
"name": "daily check.docx",
"webUrl": "https://xxxxxxx",
"cTag": "\"c:{00000000-3E94-4161-9B82-37FAFF1A71B5},2\"",
"size": 26330,
"createdBy": {
"user": {
"email": "000000.onmicrosoft.com",
"id": "000000-93dc-41b7-b89b-760c4128455a",
"displayName": "Chris"
}
},
"lastModifiedBy": {
"user": {
"email": "0000@0000.onmicrosoft.com",
"id": "00000000-93dc-41b7-b89b-00000000",
"displayName": "Chris"
}
},
"parentReference": {
"driveId": "b!000000000gdQMtns72t31yqWMhnFCjmCqO3tR5ypOf17NKl2USqo1bNqhOzrZ",
"driveType": "business",
"id": "00000VN6Y2GOVW7725BZO354PWSELRRZ",
"path": "/drive/root:"
},
"file": {
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"hashes": {
"quickXorHash": "OSOK7r2hIVSeY1+FjaCnlOxn2p8="
}
},
"fileSystemInfo": {
"createdDateTime": "2018-08-10T01:43:00Z",
"lastModifiedDateTime": "2018-08-10T01:43:00Z"
}
}
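To pick Word documents out of a children listing, one option is to filter on the file.mimeType field shown in the result above. This sketch assumes the listing has been parsed into a dict with the items under a "value" key, as Graph collection responses are; the function name word_documents and the folder entry in the sample are illustrative only.

```python
DOCX_MIME = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def word_documents(children_response: dict) -> list:
    """Return name, id and webUrl for each .docx item in a children listing."""
    return [
        {"name": item["name"], "id": item["id"], "webUrl": item.get("webUrl")}
        for item in children_response.get("value", [])
        # Folders have no "file" facet, so .get() keeps them from raising.
        if item.get("file", {}).get("mimeType") == DOCX_MIME
    ]


# Sample listing: the document from the result above plus a made-up folder.
listing = {
    "value": [
        {
            "id": "00000000IOJA4ONFB6MFAZXARX7L7RU4NV",
            "name": "daily check.docx",
            "webUrl": "https://xxxxxxx",
            "file": {"mimeType": DOCX_MIME},
        },
        {"id": "FOLDER-ID-PLACEHOLDER", "name": "some-folder", "folder": {}},
    ]
}

print(word_documents(listing))
```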

How to grab geo-data location from gramfeed?

I'm fairly new to programming, and I want to grab geo-locations for academic research, for the purpose of data visualization.
Is there a simple way to do this, or a simple tutorial on how to do it? I need to extract the geo-locations from the map to a CSV, JSON, or XLS file.
The readme here (https://github.com/Instagram/python-instagram) is a tutorial.
For example to authenticate with the API use:
from instagram.client import InstagramAPI
access_token = "YOUR_ACCESS_TOKEN"
client_secret = "YOUR_CLIENT_SECRET"
api = InstagramAPI(access_token=access_token, client_secret=client_secret)
Then you could retrieve location information with these three queries:
api.location(location_id)
api.location_recent_media(count, max_id, location_id)
api.location_search(q, count, lat, lng, foursquare_id, foursquare_v2_id)
The docs for the locations endpoint of this API are here: https://www.instagram.com/developer/endpoints/locations/
Essentially, the above commands send requests like:
https://api.instagram.com/v1/locations/search?lat=48.858844&lng=2.294351&access_token=ACCESS-TOKEN
The Instagram API response would be:
{
"data": [{
"id": "788029",
"latitude": 48.858844300000001,
"longitude": 2.2943506,
"name": "Eiffel Tower, Paris"
},
{
"id": "545331",
"latitude": 48.858334059662262,
"longitude": 2.2943401336669909,
"name": "Restaurant 58 Tour Eiffel"
},
{
"id": "421930",
"latitude": 48.858325999999998,
"longitude": 2.294505,
"name": "American Library in Paris"
}]
}
Python can certainly export this data into one of the file types you mentioned. Note that there are also some really nice plotting capabilities; take a look at http://matplotlib.org/basemap/users/examples.html. The benefit of the wrapper is that you can interact with the response data directly, much like a Python dict object.
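As a sketch of the export step, the snippet below writes the locations from the response above to CSV text with the standard csv module. It assumes the raw JSON has been parsed into a plain dict (the wrapper may hand back its own objects instead); locations_to_csv is a hypothetical helper name.

```python
import csv
import io

response = {
    "data": [
        {"id": "788029", "latitude": 48.858844300000001,
         "longitude": 2.2943506, "name": "Eiffel Tower, Paris"},
        {"id": "545331", "latitude": 48.858334059662262,
         "longitude": 2.2943401336669909, "name": "Restaurant 58 Tour Eiffel"},
        {"id": "421930", "latitude": 48.858325999999998,
         "longitude": 2.294505, "name": "American Library in Paris"},
    ]
}

FIELDS = ["id", "name", "latitude", "longitude"]


def locations_to_csv(payload: dict) -> str:
    """Return the locations in payload["data"] as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for location in payload["data"]:
        # Keep only the columns we want, in a fixed order.
        writer.writerow({key: location[key] for key in FIELDS})
    return buffer.getvalue()


csv_text = locations_to_csv(response)
print(csv_text)
```

To save it to disk, the returned text can be written to a file opened with newline="" so the csv line endings are preserved.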

Sencha touch 2, printing nested JSON array using xtemplate

I am developing a simple movie listing app.
I am using the Rotten Tomatoes API. The following is the JSON:
"movies": [{
"id": "771310572",
"title": "Cloud Atlas",
"year": 2012,
"mpaa_rating": "R",
"runtime": 163,
"release_dates": {
"theater": "2012-10-26"
},
"ratings": {
"critics_rating": "Fresh",
"critics_score": 80,
"audience_score": 98
},
"synopsis": "Cloud Atlas explores how the actions and consequences of individual lives impact one another throughout the past, the present and the future. Action, mystery and romance weave dramatically through the story as one soul is shaped from a killer into a hero and a single act of kindness ripples across centuries to inspire a revolution in the distant future. Each member of the ensemble appears in multiple roles as the stories move through time. -- (C) Warner Bros.",
"posters": {
"thumbnail": "http://content6.flixster.com/movie/11/16/71/11167192_mob.jpg",
"profile": "http://content6.flixster.com/movie/11/16/71/11167192_pro.jpg",
"detailed": "http://content6.flixster.com/movie/11/16/71/11167192_det.jpg",
"original": "http://content6.flixster.com/movie/11/16/71/11167192_ori.jpg"
},
"abridged_cast": [{
"name": "Tom Hanks",
"id": "162655641",
"characters": ["Dermot 'Duster' Hoggins", "Dr. Henry Goose", "Isaac Sachs", "Valleysman Zachry"]
}, {
"name": "Halle Berry",
"id": "162652386",
"characters": ["Jocasta Ayrs", "Luisa Rey", "Meronym"]
}, {
"name": "Jim Broadbent",
"id": "162653369",
"characters": ["Vyvyan Ayrs"]
}, {
"name": "Hugo Weaving",
"id": "162709905",
"characters": ["Bill Smoke", "Nurse Noakes", "Old Georgie"]
}, {
"name": "Jim Sturgess",
"id": "563717190",
"characters": ["Adam Ewing", "Hae-Joo Im"]
}]
I am able to get the first list view showing the list of movies, and on tapping a movie list item I am able to load the next view to show the movie details.
I am stuck on displaying abridged_cast in the XTemplate. If I use {abridged_cast}, the page displays [object Object].
I am unable to find any function that will extract the values from this array and display them.
How can I display the array content in the template?
Thanks.

Pinterest board list

I am looking for a way to get the list of board names for a given username. I know Pinterest already provides RSS feeds for all the pins from a given user and for all the pins from a given pinboard.
All pins from a given user: pinterest.com/[user]/feed.rss
All pins from a given user and board: pinterest.com/[user]/[board-name]/rss
Now I need a way to get the list of boards for a given user, not the pins. I know there is a way to do it, because pinreach.com does it.
Thank you in advance :)
The RSS feed only has 25 pins. To get boards or all the pins, you have to crawl the site. There is no way around it.
Here is another unofficial API, with documentation, for Pinterest:
http://pinterestapi.co.uk/
You can check out this unofficial Pinterest API, which lets you search boards by username: https://www.mashape.com/ismaelc/pinterest-1#endpoint-Show-User-Boards
Sample result below:
{
"body": [
{
"name": "Books Worth Reading",
"href": "http://pinterest.com/ismael/books-worth-reading/",
"num_of_pins": 6,
"cover_src": "http://media-cache-ec7.pinterest.com/222x/0c/31/22/0c3122735319edbf9b8aae28c9b22f86.jpg",
"thumbs_src": [
"http://media-cache-ec6.pinterest.com/75x75/2a/2d/7b/2a2d7b6f20f7518269b310b25d876810.jpg",
"http://media-cache-ec4.pinterest.com/75x75/e6/05/05/e6050519c5686ae27ad649500965f39c.jpg",
"http://media-cache-ec5.pinterest.com/75x75/07/64/c3/0764c392bae2b073c4c862a6503f09d6.jpg",
"http://media-cache-ec4.pinterest.com/75x75/61/35/0a/61350ab6eb4bb0b0d09f7c191bf30d55.jpg"
]
},
{
"name": "My Style",
"href": "http://pinterest.com/ismael/my-style/",
"num_of_pins": 0,
"cover_src": false,
"thumbs_src": false
},
{
"name": "For the Home",
"href": "http://pinterest.com/ismael/for-the-home/",
"num_of_pins": 0,
"cover_src": false,
"thumbs_src": false
},
{
"name": "Favorite Places & Spaces",
"href": "http://pinterest.com/ismael/favorite-places-spaces/",
"num_of_pins": 0,
"cover_src": false,
"thumbs_src": false
}
],
"meta": {
"count": 4
}
}
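Pulling the board names out of a result like the one above takes only a few lines. This sketch assumes the response has been parsed into a dict with the same "body" and "meta" keys; board_names is a hypothetical helper, and the sample is trimmed to the fields it reads.

```python
def board_names(result: dict) -> list:
    """Return the board names from a response shaped like the sample result."""
    return [board["name"] for board in result.get("body", [])]


# Trimmed copy of the sample result above.
sample_result = {
    "body": [
        {"name": "Books Worth Reading", "num_of_pins": 6},
        {"name": "My Style", "num_of_pins": 0},
        {"name": "For the Home", "num_of_pins": 0},
        {"name": "Favorite Places & Spaces", "num_of_pins": 0},
    ],
    "meta": {"count": 4},
}

names = board_names(sample_result)
print(names)
```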
You can now do this with the official Pinterest API, using this endpoint:
https://api.pinterest.com/v1/me/boards/?access_token=********&fields=id%2Cname%2Curl
First, you will need to authenticate and get an access token.
The getting started docs do a good job of explaining how: https://developers.pinterest.com/docs/api/overview/
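The request URL above can be assembled with the standard library so that the comma-separated field list is percent-encoded correctly. This is only a sketch of the URL construction: boards_url is a hypothetical helper, "TOKEN" is a placeholder, and whether the endpoint accepts the request depends on your access token.

```python
from urllib.parse import urlencode


def boards_url(access_token: str, fields=("id", "name", "url")) -> str:
    """Build the boards request URL with an encoded, comma-separated field list."""
    query = urlencode({"access_token": access_token, "fields": ",".join(fields)})
    return f"https://api.pinterest.com/v1/me/boards/?{query}"


print(boards_url("TOKEN"))
```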