Getting Wikipedia links to company pages - wikipedia-api

I have a list of companies whose Wikipedia pages I would like to get. Does Wikipedia offer an API or some other way of doing this, so I don't have to search for each one individually?
Thanks for the help!

Wikipedia does have an API, which is documented at http://www.mediawiki.org/wiki/API. You can also export a list of pages (by title) using Special:Export. But if you are intending to download a large number of articles, you would be better served with a database download.
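As a minimal sketch of the API route, assuming English Wikipedia and that each company name is close to its article title, the standard `action=query` module with `prop=info&inprop=url` returns each page's full URL and follows redirects. The helper names and the User-Agent string are hypothetical placeholders. Ambiguous names (e.g. "Apple") can resolve to a disambiguation or unrelated page, so check the results.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks clients to send a descriptive one.
HEADERS = {"User-Agent": "company-lookup-example/0.1"}

def build_query_url(titles):
    """Build one action=query request asking for the full URL of each title."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # several titles per request, separated by "|"
        "prop": "info",
        "inprop": "url",             # adds "fullurl" to each page
        "redirects": "1",            # follow redirects to the real article
        "format": "json",
        "formatversion": "2",        # "pages" becomes a list instead of a dict
    }
    return API + "?" + urllib.parse.urlencode(params)

def extract_urls(data):
    """Map each returned (resolved) page title to its URL; missing pages map to None."""
    result = {}
    for page in data["query"]["pages"]:
        result[page["title"]] = None if page.get("missing") else page.get("fullurl")
    return result

def fetch_urls(titles, batch_size=50):
    """Query the live API in batches (ordinary clients may send up to 50 titles at once)."""
    urls = {}
    for i in range(0, len(titles), batch_size):
        req = urllib.request.Request(build_query_url(titles[i:i + batch_size]), headers=HEADERS)
        with urllib.request.urlopen(req, timeout=30) as resp:
            urls.update(extract_urls(json.load(resp)))
    return urls
```

Note that the keys of the result are the resolved article titles, not necessarily the names you sent, because of redirects and title normalization.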


Related

Wikipedia API for personal website

If I search for a keyword on Google, such as "Sesame oil", it shows content from Wikipedia on the right side. Those details are informative for users.
I wanted to know: is there an API provided by Wikipedia that I can use as well, so that when a user searches for a keyword, details from Wikipedia can be shown too?
You can use the Wikipedia search API to find the articles closest to the keyword. Once you have a title, there's a publicly available summary endpoint that returns the title, a short text extract, the Wikidata description and an image for the article, which you can present to the user.
As for your question about whether it's legal: yes, it is, as long as you follow Wikipedia's CC BY-SA license, which requires attribution.
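The two-step lookup described in this answer (search, then summary) can be sketched as follows. This assumes English Wikipedia, the `list=search` query module for step one, and the REST summary endpoint at `/api/rest_v1/page/summary/{title}` for step two; the helper names and User-Agent are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/"
# Hypothetical User-Agent; replace with your own site name and contact address.
HEADERS = {"User-Agent": "keyword-summary-example/0.1 (contact@example.com)"}

def search_url(keyword, limit=1):
    """Full-text search; the best match comes first in query.search."""
    params = {"action": "query", "list": "search", "srsearch": keyword,
              "srlimit": str(limit), "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)

def summary_url(title):
    # REST titles use underscores instead of spaces and must be percent-encoded.
    return SUMMARY + urllib.parse.quote(title.replace(" ", "_"), safe="")

def parse_summary(data):
    """Keep only the fields worth showing next to search results."""
    return {
        "title": data.get("title"),
        "extract": data.get("extract"),          # short plain-text intro
        "description": data.get("description"),  # short description
        "image": data.get("thumbnail", {}).get("source"),
    }

def get_json(url):
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def lookup(keyword):
    """Search for the keyword, then fetch the summary of the top hit (or None)."""
    hits = get_json(search_url(keyword))["query"]["search"]
    if not hits:
        return None
    return parse_summary(get_json(summary_url(hits[0]["title"])))
```

Calling `lookup("Sesame oil")` would return a small dict you can render in a sidebar; cache the results rather than hitting the API on every page view.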

Use of canonical tag in HTML

I read about canonical tags in HTML, and from what I understood, they are used to help search engines recognize which copy is the original content. I have articles on my recently created blog which I have also posted on certain other popular websites. On those websites, I linked back to my original blog post with the canonical tag. Yet my blog page is not visible in search engines (the other websites do show my article). Before I posted the articles on other websites, they were indexed on Google and could be seen on the first page. So I guess there is no problem with my SEO.
Can someone please suggest a way for my original blog to get higher preference for this content?
You can use cross-domain canonical tags.
If you have duplicated content on other domains, you can put a canonical tag on those pages pointing back to the original page on your site.
This is a great way to deal with syndicated content; of course, you need code-level access to those other websites so you can add the canonical tag.
More info below
http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html
Don't just copy and paste your articles all over the internet; that will not do you any good. After writing a good article, go to other sites and write something different about it, such as what the article covers and how it can help someone, so that people and websites come to your website to read it. For this you don't need "canonical".
If you copy and paste articles to other websites, it will only create duplicate content issues and harm your SEO efforts.
No, your blog section does not need a canonical tag.
A canonical tag is for the case where Google finds the same page at different URLs; it tells Google which URL is the preferred one.
First of all, submitting your article to different websites will not give you any benefit in your ranking. If you write good, quality content, you should post it on only one website; if you post it on different sites, Google will treat it as duplicate content. So it's better to share the link to your blog post on social media sites and also do social bookmarking and microblogging. Then you don't need a canonical tag.
As @moobot said, you can indeed use a cross-domain canonical tag to let Google know about the original source of the content. How exactly are you adding the canonical tag on the other domains?
The canonical link should be in the head section of the HTML code. If you're adding it yourself somewhere inside the body tag, that's not going to do you any good.
Check out this article for some other common mistakes with the canonical tag:
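To check the point about placement on a syndicated copy, a small parser can list every `<link rel="canonical">` on a page and whether it sits inside `<head>`. This is a minimal sketch using only the standard-library `html.parser`; the function name and the `myblog.example` URLs are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records every <link rel="canonical"> and whether it appeared inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, inside_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # a body start also means the head has ended
        elif tag == "link":
            attrs = dict(attrs)
            rels = (attrs.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attrs.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def canonical_links(html):
    """Return [(href, inside_head), ...] for all canonical links in the page."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Run it on the HTML of each copy you posted elsewhere: you want exactly one entry, pointing at your original post, with `inside_head` set to `True`.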
http://googlewebmastercentral.blogspot.nl/2013/04/5-common-mistakes-with-relcanonical.html
@metadice mentioned that copying your content all over the web isn't good for your SEO, and I agree completely. If you are doing this to get some extra backlinks, I would recommend that you stop.
I hope my answer helps someone who has the same question.

Should I put all my links in my sitemap?

Normally I put only important links in my sitemap; right now there are about 3985 of them, and Google has indexed 3501.
But the total number of links on my site is over 100,000, and each link has an image that I show to my users.
So, should I put all my links, including my images, in my sitemap?
You are on the right track. Only put important links in your sitemap file. For more information, check out the Google help page on the topic.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156184&from=40318&rd=1
I would also check out the following link, which describes sitemaps from the point of view of a Google employee:
https://webmasters.stackexchange.com/questions/30186/are-there-any-clear-indicators-that-my-sitemap-file-is-beneficial
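Put every link you want search engines to crawl and index in your sitemap. That's the whole purpose of XML sitemaps: to tell search engines about your pages, and your images as well. 100,000 links and images are not a lot at all, so don't worry that Google will ignore your sitemap or be overwhelmed by it. Just note that a single sitemap file may list at most 50,000 URLs (and be at most 50 MB uncompressed), so 100,000 links have to be split across several sitemap files referenced from a sitemap index file.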
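A sketch of generating such sitemaps with the standard library, assuming each page has at most one image and using the sitemaps.org namespace plus Google's image-sitemap extension (`image:image`/`image:loc`). The function names and `example.com` URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol

# Emit xmlns="..." and xmlns:image="..." instead of ns0/ns1 prefixes
# (note: this registration is global for the ElementTree module).
ET.register_namespace("", SM_NS)
ET.register_namespace("image", IMG_NS)

def build_sitemap(entries):
    """entries: iterable of (page_url, image_url or None) -> <urlset> XML string."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for page_url, image_url in entries:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = page_url
        if image_url:
            image = ET.SubElement(url, f"{{{IMG_NS}}}image")
            ET.SubElement(image, f"{{{IMG_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")

def build_index(sitemap_urls):
    """<sitemapindex> pointing at each individual sitemap file."""
    index = ET.Element(f"{{{SM_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SM_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SM_NS}}}loc").text = loc
    return ET.tostring(index, encoding="unicode")

def chunk(entries, size=MAX_URLS):
    """Split a list of entries into sitemap-sized pieces."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]
```

With 100,000 entries, `chunk` yields two lists; write each `build_sitemap(...)` result to its own file (e.g. `sitemap-1.xml`, `sitemap-2.xml`) and submit the index built from their URLs.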

How can I get the full change history for an article on Wikipedia?

I'd like a way to download the content of every page in the history of a popular article on Wikipedia. In other words I want to get the full contents of every edit for a single article. How would I go about doing this?
Is there a simple way to do this using the Wikipedia API? I looked and didn't find anything that popped out as a simple solution. I've also looked into the scripts on the PyWikipedia Bot page (http://botwiki.sno.cc/w/index.php?title=Template:Script&oldid=3813) and didn't find anything useful. Some simple way to do it in Python or Java would be best, but I'm open to any simple solution that will get me the data.
There are multiple options for this. You can use the Special:Export special page to fetch an XML stream of the page history. Or you can use the API, found under /w/api.php: use action=query&titles=$TITLE&prop=revisions&rvprop=timestamp|user|content etc. to fetch the history. Long histories come back in batches, so you have to follow the continuation parameters the API returns.
Pywikipedia provides an interface to this, but I don't remember offhand how to call it. An alternative Python library, mwclient, also provides this via site.pages[page_title].revisions().
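Without a client library, the raw-API approach looks roughly like this sketch: it requests revisions oldest first and merges the `continue` block from each response into the next request until none is returned. It assumes English Wikipedia and `formatversion=2`; `rvslots=main` is included because recent MediaWiki versions expect it when requesting content. The batch size, helper names and User-Agent are assumptions, and `fetch` is injected so the paging logic can be tested offline.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def iter_revisions(title, fetch):
    """Yield every revision of `title`, oldest first, following API continuation.

    `fetch` takes a dict of query parameters and returns the decoded JSON response.
    """
    params = {
        "action": "query", "titles": title, "prop": "revisions",
        "rvprop": "ids|timestamp|user|content", "rvslots": "main",
        "rvlimit": "50",      # modest batch size; content requests are capped low
        "rvdir": "newer",     # oldest revision first
        "format": "json", "formatversion": "2",
    }
    while True:
        data = fetch(params)
        for page in data["query"]["pages"]:
            for rev in page.get("revisions", []):
                yield rev
        if "continue" not in data:
            break
        # The "continue" block (e.g. rvcontinue) must be sent back unchanged.
        params = {**params, **data["continue"]}

def http_fetch(params):
    """Real network fetch; the User-Agent string is a hypothetical placeholder."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "revision-history-example/0.1"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

For example, `for rev in iter_revisions("Python (programming language)", http_fetch): ...`; with these parameters each revision's wikitext is at `rev["slots"]["main"]["content"]`.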
Well, one solution is to parse the Wikipedia XML dump.
Just thought I'd put that out there.
If you're only getting one page, that's overkill. But if you don't need the very latest information, using the XML dump has the advantage of being a one-time download instead of repeated network hits.
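If you go the dump route (Special:Export output uses the same XML format), the file can be streamed with `iterparse` instead of loaded whole. This is a minimal sketch: it strips the `{namespace}` prefix because the export schema version differs between dumps, and it assumes one page's revisions fit in memory, since they are collected until that page's closing tag. Full-history dumps are split across many large files, so you would run it over each file. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Drop the '{namespace}' prefix; the export schema version varies between dumps."""
    return tag.rsplit("}", 1)[-1]

def revisions_from_dump(source, wanted_title):
    """Stream a MediaWiki XML export/dump and yield (timestamp, text) for one page."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = next((c.text for c in elem if local(c.tag) == "title"), None)
        if title == wanted_title:
            for rev in elem:
                if local(rev.tag) == "revision":
                    fields = {local(c.tag): c.text for c in rev}
                    yield fields.get("timestamp"), fields.get("text")
        elem.clear()  # release the finished page's children to keep memory flat

def revisions_from_export_bytes(data, wanted_title):
    """Same as above for an export already in memory, e.g. a Special:Export response."""
    return revisions_from_dump(io.BytesIO(data), wanted_title)
```

`source` can be a file path or an open binary file; wrap a compressed dump in `bz2.open(path)` to avoid decompressing it to disk first.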

Search Engine Optimisation

My neighbour popped over last night to ask me for help with regard to his company's website. He said that it used to rank pretty high on Google but has since fallen off completely.
Now, I'm a Windows app programmer, hence my request for help. I took a look, and the meta tags seem OK. I recommended that he add a <h1>heading</h1> with a page title to each page to help reinforce the content.
I also suggested that finding related websites and getting them to link to his site was good for search ranking.
Are there any other general strategies / tools that could help?
His site is: http://www.colofinder.co.uk/
P.S. This isn't just an attempt to have Stack Overflow link to my neighbour's site; I'm aware that links from SO don't add to its ranking.
Go to http://ooyes.net/blog/a-step-by-step-15-minute-seo-audit-%28a-sample-from-seo-secrets%29 and read it. Then go to http://www.searchenginejournal.com/55-quick-seo-tips-even-your-mother-would-love/6760/ and read it. Then look at your friend's site with that information in mind. Off the top of my head, I would swap the order of the company name and the page title in the "title" tag. Look at the Google Analytics account and see how people are coming to the site. That will give you an idea of where to start your efforts to build a workable base.
First of all, he needs to make sure that his website content is well organized and to the point. The page title has to be precise. The meta keywords tag is obsolete, so focus on the meta description instead. The main heading should be in an h1 tag, subheadings in h2, and further subheadings in h3. Try to update the website once a month.
Use community websites like Facebook, Twitter and LinkedIn, and other related forums, to post updates about completed projects, and always include inbound links to the site. You can use your company name as the anchor text for links to your main website, and a project name as the anchor text for links to that project's subpage.
Keep posting at least once a week. Submitting the website URL to online directories will also be a great help. Do not use black-hat SEO techniques like cloaking, and do not use any invisible text or divs on your website. Whenever you post your website link anywhere, use the most specific and appropriate link.
The page you link to should match the topic of the content you are posting the link or sublink against. Add a tag cloud / Google tags section to your website; this will be a great attraction for search engines, and they will link your website to other popular websites.
Make sure these tags point to top-ranking websites with relevant material. I hope this helps. Feel free to ask if you have trouble understanding anything I have mentioned above. Best of luck!
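The on-page points raised in this thread (a descriptive title tag, a meta description, an h1 main heading with h2/h3 subheadings) can be checked with a small script before digging into links. This is a minimal sketch using the standard-library `html.parser`; the class and function names are hypothetical, and the "exactly one h1" rule is an assumed convention, not something the answers above require.

```python
from html.parser import HTMLParser

class OnPageAudit(HTMLParser):
    """Collects the <title>, the meta description and h1-h3 headings of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.headings = []    # (tag, text) in document order
        self._current = None  # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

def audit(html):
    """Return the parsed page and a list of on-page problems found."""
    page = OnPageAudit()
    page.feed(html)
    page.close()
    problems = []
    if not page.title:
        problems.append("missing <title>")
    if not page.meta_description:
        problems.append("missing meta description")
    h1_count = sum(1 for tag, _ in page.headings if tag == "h1")
    if h1_count != 1:  # assumed convention: one main heading per page
        problems.append(f"expected one <h1>, found {h1_count}")
    return page, problems
```

Feed it the HTML of each important page; an empty problem list only means the basics are present, not that the page will rank.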