How does the archive.org API work?

As you surely know, web.archive.org lets you inspect the history of a domain, e.g.: http://web.archive.org/web/*/besttatoo.com
It also has an API: http://archive.org/help/json.php
I need to get data from the API, but I can't find much information on how to use it. Has anyone used it, and can you paste some examples?

This link provides details about the item LovingU on archive.org:
http://archive.org/details/LovingU&output=json
To create an API query to your liking, use this page:
https://archive.org/advancedsearch.php#raw
That page lets you choose the output format (JSON, XML, HTML, CSV or RSS) and the fields you want to see. You can limit the number of results, too.
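As a rough sketch of what the advanced search page generates, the following builds a JSON query URL and pulls item identifiers out of a response. It assumes the endpoint takes `q`, one `fl[]` per requested field, `rows`, `page` and `output=json`, and that results come back under `response` → `docs` (Solr style); the helper names are hypothetical and the code does not perform any network request.

```python
import json
from urllib.parse import urlencode

# The advanced search endpoint referenced in the answer.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(query, fields, rows=50, page=1):
    """Build an advancedsearch.php URL that asks for JSON output.

    Assumption: fl[] is repeated once per field, as the search form does.
    """
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return ADVANCED_SEARCH + "?" + urlencode(params)


def identifiers_from_response(body):
    """Extract item identifiers from a JSON response body (a str).

    Assumption: matching items are listed under response -> docs.
    """
    data = json.loads(body)
    return [doc["identifier"] for doc in data["response"]["docs"]]
```

For example, `build_search_url("identifier:LovingU", ["identifier", "title"], rows=10)` yields a URL you can open in a browser or fetch with `urllib.request.urlopen`; increase `page` to walk through larger result sets.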

Related

How to get specific results using the MarkLogic Search API

I am new to MarkLogic and am trying to get specific results from a search query.
More specifically, I want to search for a word through the Search API and get back the documents that contain that word.
No header information, no rank or any other metadata; I just want the documents themselves.
Is there any way to make just one request and get documents as the result?
Or do I need to write some code to get this specific result?
I'd appreciate any help.
Thanks
If you are accessing MarkLogic from outside, I'd have a look at a POST call to /v1/search with an Accept header of multipart/mixed. Details should be described here: https://docs.marklogic.com/REST/POST/v1/search
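A minimal sketch of such a call from Python, prepared but not sent. It assumes the REST API is reachable on port 8000 (MarkLogic's default App-Services port) and that a JSON combined query with a `qtext` string is an acceptable body; the host, port and helper name are illustrative assumptions. `pageLength` caps how many matches come back in one response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(word, host="localhost", port=8000, page_length=50):
    """Prepare (but do not send) a POST /v1/search request.

    Assumptions: REST API on port 8000 and a JSON combined query body.
    Accept: multipart/mixed asks for the matching documents themselves.
    """
    params = urlencode({"pageLength": page_length})
    url = f"http://{host}:{port}/v1/search?{params}"
    body = json.dumps({"search": {"qtext": word}}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "multipart/mixed",
        },
    )
```

To actually send it you would typically go through an opener with `urllib.request.HTTPDigestAuthHandler` (MarkLogic app servers commonly use digest authentication), then split the multipart response body into its parts; that parsing step is not shown here.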
If running inside MarkLogic, you could consider using the low-level cts:search, which indeed returns documents directly. Keep in mind though that it won't paginate results, and it is usually unwise to return more than about 50 to 100 documents at once. It would just hog memory, and not allow for parallel processing.
HTH!

Get wikitext from the Wikipedia API?

I'm looking at the API documentation here,
https://www.mediawiki.org/wiki/API:Query
Getting the wikitext for a page is mentioned at the beginning of the documentation:
The action=query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.
but I can't figure out which parameters to pass in the API request to return the wikitext for a given page. Does anyone know how to do this?
I've tried parameters like,
{'action':'query', 'titles':'Anarchism', 'prop':'wikitext', 'format':'json'}
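Wikitext is not a prop of its own; it is the content of the page's latest revision, so request prop=revisions with rvprop=content. Use this query: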
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Anarchism&rvslots=main
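The same query expressed in Python, as a sketch: one function rebuilds the URL above from a parameter dict, and another extracts the wikitext from the JSON reply. It assumes the default response layout (formatversion=1), where `pages` is keyed by page id and the main slot's text sits under the `"*"` key; a missing page has no `revisions` entry, so it would raise `KeyError` here. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def wikitext_url(title):
    """Build the revisions/content query for a single page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "titles": title,
    }
    return API + "?" + urlencode(params)


def wikitext_from_response(body):
    """Return the wikitext of the first page in a JSON response string.

    Assumes the default formatversion=1 layout described above.
    """
    pages = json.loads(body)["query"]["pages"]
    page = next(iter(pages.values()))
    return page["revisions"][0]["slots"]["main"]["*"]
```

Compared with the parameters tried in the question, the differences are `prop=revisions` plus `rvprop=content`; `rvslots=main` selects the main content slot.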

List of all companies on AngelList via API

https://angel.co/api/spec/startups
What would be the best approach to fetching every company listed on AngelList? My first guess would be to query every ID up to 250k (roughly the number of companies on AngelList) using this endpoint: https://api.angel.co/1/startups/45435
There surely has to be a better way of doing this though.
Yes, it is possible via their API, and the endpoint you mention in your question is the correct one. I have written a PHP component to do this. You can use this exporter application to download the start-up data for each country into a CSV file: AngelList Data Exporter
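The brute-force approach from the question, walking every numeric ID and keeping the ones that exist, can be sketched as below. `fetch` is a hypothetical callable (for example, a wrapper around the `/1/startups/{id}` endpoint, assuming it is still reachable, which another answer disputes) that returns a dict for an existing ID and `None` for a missing one. Keeping HTTP out of the loop makes the enumeration logic testable on its own.

```python
import time


def crawl_startups(fetch, max_id, delay=0.0):
    """Walk numeric IDs 1..max_id and collect the records that exist.

    `fetch` is hypothetical: it returns a dict for an ID, or None when
    that ID does not exist (for example, when the server answers 404).
    `delay` is a pause between calls to stay under a rate limit.
    """
    found = {}
    for startup_id in range(1, max_id + 1):
        record = fetch(startup_id)
        if record is not None:
            found[startup_id] = record
        if delay:
            time.sleep(delay)
    return found
```

Expect gaps in the ID space (deleted or never-assigned IDs), which is why missing records are skipped rather than treated as the end of the list.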
I hope this helps you.
Angel.co no longer exposes its API, so you have to parse the website to get any data.
Also, a quick Google search turns up a few websites that offer different datasets taken from the angel.co website.

Web Scraping through Excel VBA

I need to fetch company addresses (cim) from the site http://www.ceginfo.hu/
Example Company Name: AB-KONTÍR Szolgáltató Bt.
I know how to do it using the WinHttp.WinHttpRequest object and Firebug.
But I can't work out which URL I should send this request to.
When I analyse the requests/responses using Firebug, I get the following URL:
http://www.ceginfo.hu/company/search/4221638
I think 4221638 is the company ID here. But in my case I will only have the company name, and that is my problem.
So can anybody tell me how to find, using Firebug or any other tool, a URL that takes the company name as a parameter, which I can then use in my VBA code?
Thanks in advance!
No. Unless there is a publicly available database (I would suggest calling them, if you can) or an API that allows for programmatic access, the only way to arrive at this link slug is by executing the search.
Further, the numeric slug at the end of the URL is not as relevant as you think: it identifies the search, not the company. If you search for just "Kontir", this is the resulting page, with many results:
http://www.ceginfo.hu/company/search/4222407
You're going to have to automate the "search": pass the criteria to the web page, execute the button click and/or HTTP POST, and then parse the result(s). For the example company name there is only one result, but as my example above shows, some queries return multiple matches, and then you will need a way of dealing with them (or ignoring them).
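The "parse the result(s)" step, including the multiple-match case, might look like the Python sketch below (the same logic carries over to VBA). It assumes that each hit on the results page is an `<a>` link whose text is the company name; that markup, the example paths, and the helper names are hypothetical, and fetching the results page (submitting the search form) is not shown because the form's field names are unknown.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an HTML results page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def matching_companies(results_html, company_name):
    """Return hrefs of links whose text equals company_name, ignoring case.

    An empty list means no match; more than one entry means the search
    was ambiguous and the caller has to decide how to handle it.
    """
    parser = LinkCollector()
    parser.feed(results_html)
    wanted = company_name.casefold()
    return [href for href, text in parser.links if text.casefold() == wanted]
```

Matching on the full name rather than taking the first link avoids silently picking the wrong company when a query like "Kontir" returns many results.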

How can I get the full change history for an article on Wikipedia?

I'd like a way to download the content of every page in the history of a popular article on Wikipedia. In other words I want to get the full contents of every edit for a single article. How would I go about doing this?
Is there a simple way to do this using the Wikipedia API? I looked and didn't find anything that popped out as a simple solution. I've also looked into the scripts on the PyWikipedia Bot page (http://botwiki.sno.cc/w/index.php?title=Template:Script&oldid=3813) and didn't find anything useful. A simple way to do it in Python or Java would be best, but I'm open to any simple solution that will get me the data.
There are multiple options for this. You can use the Special:Export special page to fetch an XML stream of the page history. Or you can use the API, found under /w/api.php. Use action=query&titles=$TITLE&prop=revisions&rvprop=timestamp|user|content etc. to fetch the history.
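One request only returns a batch of revisions, so fetching the full history means following the API's continuation. A sketch, assuming `fetch_json` is a hypothetical callable that sends a dict of query parameters to /w/api.php and returns the decoded JSON: when a response contains a `continue` object, its keys (such as `rvcontinue`) are merged into the next request until no `continue` is returned.

```python
def iter_revisions(fetch_json, title, props="timestamp|user|content"):
    """Yield every revision of `title`, following API continuation.

    `fetch_json` is hypothetical: it takes a dict of query parameters
    and returns the decoded JSON response from /w/api.php.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": props,
        "rvslots": "main",
        "rvlimit": "max",
    }
    while True:
        data = fetch_json(params)
        for page in data["query"]["pages"].values():
            yield from page.get("revisions", [])
        if "continue" not in data:
            break
        # The API hands back the parameters needed for the next batch.
        params = {**params, **data["continue"]}
```

Revisions arrive newest first by default; collect them into a list and reverse it if you want chronological order.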
Pywikipedia provides an interface to this, but I do not know by heart how to call it. An alternative library for Python, mwclient, also provides this via site.pages[page_title].revisions().
Well, one solution is to parse the Wikipedia XML dump.
Just thought I'd put that out there.
If you're only getting one page, that's overkill. But if you don't need the very latest information, using the XML dump has the advantage of being a one-time download instead of repeated network hits.
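If you go the dump route (or save the output of Special:Export), a streaming parse keeps you from loading the whole file at once. This sketch assumes the MediaWiki export format, where each `<page>` holds a `<title>` and one `<revision>` per edit with `<timestamp>` and `<text>`; tags carry a versioned XML namespace, so the code compares local names only. Clearing each page after it is processed stops memory from growing with the size of the dump, although a single page's full history is still held in memory until its closing tag.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def revisions_from_dump(source, title):
    """Return (timestamp, wikitext) pairs for one page of an XML export.

    `source` is a filename or binary file object in the MediaWiki
    export format.
    """
    result = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title_elem = next((c for c in elem if _local(c.tag) == "title"), None)
        if title_elem is not None and title_elem.text == title:
            for rev in elem:
                if _local(rev.tag) != "revision":
                    continue
                fields = {_local(c.tag): c.text for c in rev}
                # Deleted revisions may have an empty <text/>; use "" then.
                result.append((fields.get("timestamp"), fields.get("text") or ""))
        elem.clear()  # free the finished page before reading the next one
    return result
```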