Showing "See results about" on google knowledge graph - seo

My client has a Wikipedia page and a Wikidata entry, and I have started working on getting a Google knowledge graph panel for the client.
Now, after a week of adding various schema markup, a "See results about" box appears in the right-hand panel when someone searches for the entity name in Google.
Yet this is not the full graph: to obtain the full graph, a user still needs to click on the link above in order to obtain this:
So my point is that I need the full graph to appear as soon as the entity is searched for in Google, without the user first having to click on a disambiguation link.
For the avoidance of doubt, the entity has a unique name that no one else shares, so I am not sure why there is still any ambiguity.
Happy to hear your thoughts.
Cheers and thanks.

<!DOCTYPE html>
<html>
<head>
<title>"P#sha Award winner 2016 Multi-biz Services"</title>
<meta name="keyword" content="P#sha ICT Award Winner Abdul Haseeb. Bringing together local tech stars on a single platform, the Awards ceremony lauded innovation and excellence in the Pakistan." >
<meta name="description" content="Multi-Biz services provide e-solutions for the future needs of YOUR Business. At Multi Biz Services our dedicated team works in conjunction with our clients towards a common goal business growth. We aim to become trusted and well-known brand." >
<meta name="description" content="Multi-biz Services, Web developer in pakistan, web developer in Peshawar Multibiz Services, Achieved pasha award in 2016 | Abdul Haseeb" >
<meta name="keywords" content="http://www.multibizservices.com/,KPIT Board Top Company, Web developer in Pakistan, Web developer in Peshawar | Multibiz services, Online Shop Developer in Peshawar | Multibiz Services, TOP IT Company in pakistan| Multibiz Services, Pasha Award winner Abdul Haseeb" >


Web Scrape a specific tag using python BeautifulSoup

I am working on a personal project in which I am trying to analyze the cases that happened due to the unethical use of AI systems. I am trying to scrape this website.
URL - https://incidentdatabase.ai/apps/discover?display=details&page=1
I want the URLs of all 28 pages linked from page 1, so that I can scrape information from those URLs. But I am not able to access the particular element, and its contents, in which the URL of each incident appears under each grid item: I only get an empty list when I try to scrape it. I am guessing it's because the links sit inside a grid. Any help would be appreciated.
I have attached a screenshot of the page inspector where I have circled exactly what I want to scrape.
Thank you in advance for any help.
You don't need to scrape this! You can download all the content that is being displayed from the snapshot page:
https://incidentdatabase.ai/research/snapshots
I generated a new snapshot 2 minutes ago, which will be listed at the above URL shortly; until then it is available at the following link.
https://s3.amazonaws.com/aiid-backups-public/backup-20220912170854.tar.bz2
This will give you the entire database, which is rendered to HTML from MongoDB (JSON) collections.
Please reach out via the contact page (or comment on this solution) if something does not suit your needs.
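As a rough sketch of how the downloaded snapshot could be inspected: the archive is a .tar.bz2, so Python's standard tarfile module can list what is inside before anything is extracted. The local file name and the helper name list_snapshot_members are assumptions made for illustration; the layout inside the archive is not described here, so the code only lists member names and sizes.

```python
import tarfile


def list_snapshot_members(archive_path):
    """Return (name, size_in_bytes) for every regular file in a .tar.bz2 archive."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Hypothetical usage, assuming the snapshot was saved under this local name:
# for name, size in list_snapshot_members("backup-20220912170854.tar.bz2"):
#     print(f"{size:>12}  {name}")
```

Listing first is a cautious choice: it shows which collections are present without unpacking the whole database to disk.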
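from selenium.webdriver.common.by import By
urls = [x.get_attribute('href') for x in driver.find_elements(By.XPATH, "//div[@class='h-100 card']/a")]
If you want the hrefs of the 28 or so elements, you can grab them like so (assuming driver is a Selenium WebDriver that has already loaded the page). You can add WebDriverWait calls if the page is slow to load.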
This is a very interesting question because it is, by its very nature, an XY problem. Selenium is not the right tool for this job, imho. The page is (very) dynamic: besides being hydrated from external APIs, it also tracks user interaction and loads data as you scroll. Of course, it's possible to do it with Selenium as well, but there is a better way. There are 311 incidents, all of them extensively documented. The way forward here is to scrape the API endpoint for each one of them: the result will be a huge, very detailed JSON object.
For example, to scrape the first 19 incidents using requests and pandas:
import requests
import pandas as pd
from tqdm import tqdm
big_df = pd.DataFrame()
# range(1, 20) covers incident ids 1 through 19
for counter in tqdm(range(1, 20)):
    # fetch the page-data JSON for this incident
    r = requests.get(f'https://incidentdatabase.ai/page-data/cite/{counter}/page-data.json')
    # flatten the list of reports for this incident into rows
    df = pd.json_normalize(r.json()['result']['pageContext']['incidentReports'])
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
print(big_df)
This will result in:
19/19 [01:00<00:00, 3.25s/it]
submitters date_published report_number title url image_url cloudinary_id source_domain mongodb_id text authors epoch_date_submitted language
0 [Roman Yampolskiy] 2015-05-19 1 Google’s YouTube Kids App Criticized for ‘Inappropriate Content’ https://blogs.wsj.com/digits/2015/05/19/googles-youtube-kids-app-criticized-for-inappropriate-content/ http://si.wsj.net/public/resources/images/BN-IM269_YouTub_P_20150518174822.jpg reports/si.wsj.net/public/resources/images/BN-IM269_YouTub_P_20150518174822.jpg blogs.wsj.com 5d34b8c29ced494f010ed45a Child and consumer advocacy groups complained to the Federal Trade Commission Tuesday that Google’s new YouTube Kids app contains “inappropriate content,” including explicit sexual language and jokes about pedophilia.\n\nGoogle launched the app for young children in February, saying the available videos were “narrowed down to content appropriate for kids.” [Alistair Barr] 1559347200 en
1 [Roman Yampolskiy] 2018-02-07 2 YouTube Kids app is STILL showing disturbing videos https://www.dailymail.co.uk/sciencetech/article-5358365/YouTube-Kids-app-showing-disturbing-videos.html https://i.dailymail.co.uk/i/pix/2018/02/06/15/48EEE02F00000578-0-image-a-18_1517931140185.jpg reports/i.dailymail.co.uk/i/pix/2018/02/06/15/48EEE02F00000578-0-image-a-18_1517931140185.jpg dailymail.co.uk 5d34b8c29ced494f010ed45b Google-owned YouTube has apologised again after more disturbing videos surfaced on its YouTube Kids app.\n\nInvestigators found several unsuitable videos including one of a burning aeroplane from the cartoon Paw Patrol and footage explaining how to sharpen a knife.\n\nYouTube has been criticised for using algorithms to sieve through material rather than using human moderators to judge what might be appropriate.\n\nThere have been hundreds of disturbing videos found on YouTube Kids in recent months that are easily accessed by children.\n\nThese videos have featured horrible things happening to various characters, including ones from the Disney movie Frozen, the Minions franchise, Doc McStuffins and Thomas the Tank Engine.\n\nParents, regulators, advertisers and law enforcement have become increasingly concerned about the open nature of the service.\n\nScroll down for video\n\nYouTube has apologised again after more disturbing videos surfaced on its YouTube Kids app. 
Investigators found several unsuitable videos including one from the cartoon Paw Patrol on a burning aeroplane and footage showing how to sharpen a knife\n\nA YouTube spokesperson has admitted the company needs to 'do more' to tackle inappropriate videos on their kids platform.\n\nThis investigation is the latest to expose inappropriate content on the video-sharing site which has been subject to a slew of controversies since its creation in 2005.\n\nAs part of an in-depth investigation by BBC Newsround, Google's Public Policy Manager Katie O'Donovan met five children who told her about the distressing videos they had seen on the site.\n\nThey included videos showing clowns covered in blood and messages warning them there was someone at the door.\n\nMs O'Donovan said she was 'very, very sorry for any hurt or discomfort'.\n\n'We've actually built a whole new platform for kids, called YouTube Kids, where we take the best content, stuff that children are most interested in and put it on there in a packaged up place just for kids,' she said.\n\nIt normally takes five days for supposedly child-friendly content like cartoons to get from YouTube to YouTube Kids.\n\nWithin that window it is hoped users and a specially-trained team will flag disturbing content.\n\nOnce it has been flagged and reviewed, it won't appear on the YouTube Kids app and only people who are signed in and older than 18 years old will be able to view it.\n\nThe company say thousands of people will be working around the clock to flag content.\n\nHowever, as part of the investigation Newsround revealed there are still lots of inappropriate videos on the Kids section.\n\n'We have seen significant investment in building the right tools so people can flag that [content], and those flags are reviewed very, very quickly', Ms O'Donovan said.\n\n'We're also beginning to use machine learning to identify the most harmful content, which is then automatically reviewed.'\n\nThe problem was managing an open platform 
where content is uploaded straight onto the site, she added.\n\n'It is a difficult environment because things are moving so, so quickly', said Ms O'Donovan.\n\n'We have a responsibility to make sure the platform can survive and can thrive so that we have a collection that comes from around the world on there'.\n\nBy the end of last year YouTube said it had removed more than 50 user channels and had stopped running ads on more than 3.5 million videos since June.\n\n'Content that endangers children is unacceptable to us and we have clear policies against such videos on YouTube and YouTube Kids', a YouTube spokesperson told MailOnline.\n\n'When we discover any inappropriate content, we quickly take action to remove it from our platform.\n\n'Over the past few months, we've taken a series of steps to tackle many of the emerging challenges around family content on YouTube, including: tightening enforcement of our Community Guidelines, age-gating content that inappropriately targets families, and removing it from the YouTube Kids app.'\n\nYouTube has been criticised for using algorithms to sieve through material rather than using human moderators to judge what might be appropriate (stock image)\n\nIn March, a disturbing Peppa Pig fake, found by journalist Laura June, shows a dentist with a huge syringe pulling out the character's teeth as she screams in distress.\n\nMrs June only realised the violent nature of the video as her three-year-old daughter watched it beside her.\n\n'Peppa does a lot of screaming and crying and the dentist is just a bit sadistic and it's just way, way off what a three-year-old should watch,' she said.\n\n'But the animation is close enough to looking like Peppa - it's crude but it's close enough that my daughter was like 'This is Peppa Pig.''\n\nAnother video depicted Peppa Pig and a friend deliberately burning down a house with someone in it.\n\nAll of these videos are easily accessed by children through YouTube's search results or recommended 
videos.\n\nIn March, a disturbing Peppa Pig fake, found by journalist Laura June, shows a dentist with a huge syringe pulling ou [Phoebe Weston] 1559347200
[...]
The JSON responses can be further dissected and analysed, and more useful information can be pulled from them (including the Euclidean distance between incidents, and really a lot more).
Requests docs: https://requests.readthedocs.io/en/latest/
Pandas docs: https://pandas.pydata.org/pandas-docs/stable/index.html
And for tqdm: https://tqdm.github.io/
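To illustrate the kind of dissection mentioned above without pandas, here is a minimal sketch that walks the same result, pageContext, incidentReports path used in the loop and counts reports per source_domain. The sample dictionary is a made-up stand-in that reuses only field names visible in the output above; real responses carry many more fields, and the helper name is hypothetical.

```python
from collections import Counter


def reports_per_domain(page_data):
    """Count incident reports per source_domain in one page-data.json payload."""
    reports = page_data["result"]["pageContext"]["incidentReports"]
    return Counter(report.get("source_domain", "unknown") for report in reports)


# Made-up sample shaped like the payload used above (not real data).
sample = {
    "result": {
        "pageContext": {
            "incidentReports": [
                {"report_number": 1, "source_domain": "blogs.wsj.com"},
                {"report_number": 2, "source_domain": "dailymail.co.uk"},
                {"report_number": 3, "source_domain": "dailymail.co.uk"},
            ]
        }
    }
}

for domain, count in reports_per_domain(sample).most_common():
    print(domain, count)
```

The same function could be applied to each r.json() inside the loop, summing the counters to see which outlets report on incidents most often.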

Is the LocalBusiness schema restricted to businesses that you own?

I work for a company that lists small local businesses from a niche market on a website. Most of these companies have little presence on the internet, so to improve their visibility we are adding Schema.org LocalBusiness markup to their profiles.
We looked at the Schema.org specification and the Google documentation, but neither says anything explicit about local business ownership. So it is not clear whether this structured data may be used to list other local businesses instead of your own.
Is it okay to use this schema to create rich snippet cards if you don't own the company? Can this have a negative effect on search engine optimization?
Providing structured data about other businesses (or any other domain) is perfectly fine. Structured data is useful for the content on your pages, not just for the entities that publish this content.
A consumer (like Google) that offers features making use of this structured data (like rich results) has its own rules for this feature, of course. But even if the consumer would support this feature only for structured data about the author’s own business, this should not stop you from providing the structured data about other businesses.
As a general rule: You provide as much structured data as you can/want, and consumers pick out what they want to use.
To convey that it’s not your own business, you can provide your own LocalBusiness as publisher/author of the WebPage, which is about the other LocalBusiness.
<body typeof="schema:WebPage">
  <header property="schema:author schema:publisher" typeof="schema:LocalBusiness">
    <h1 property="schema:name">Your own business</h1>
  </header>
  <article property="schema:about" typeof="schema:LocalBusiness">
    <h2 property="schema:name">The other business</h2>
  </article>
</body>
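The same statement can also be expressed as JSON-LD, which some consumers, Google among them, recommend over inline markup. The following is only a sketch: it builds the JSON-LD with Python's json module, and the business names, the #publisher identifier and the helper name are placeholders rather than anything prescribed by Schema.org.

```python
import json


def build_listing_jsonld(own_name, listed_name):
    """Describe a WebPage published by one LocalBusiness about another one."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        # The site owner is both author and publisher of the page.
        "author": {"@type": "LocalBusiness", "@id": "#publisher", "name": own_name},
        "publisher": {"@id": "#publisher"},
        # The listed business is what the page is about.
        "about": {"@type": "LocalBusiness", "name": listed_name},
    }


markup = build_listing_jsonld("Your own business", "The other business")
# The serialized object would go inside <script type="application/ld+json">.
print(json.dumps(markup, indent=2))
```

Reusing one @id for author and publisher mirrors the RDFa version, where a single element carries both properties.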

How can I get the scores and schedules using the ESPN API?

How can I get score and schedule information for all sports matches around the world? I've seen the ESPN API, but it's available only to strategic partners.
Are there any other solutions, or APIs or RSS feeds for this kind of information?
If you want something that's free, it's going to be tough. There are a lot of companies, like Sports Data LLC, whose core business model is to provide this data.
That said, I've had some success pulling JSON score data from the NFL website, inspired by Matt Croydon's blog post here:
http://postneo.com/2007/09/09/accidental-apis-nfl-edition
For anyone else:
An example API endpoint for scores from ESPN is
http://site.api.espn.com/apis/site/v2/sports/soccer/eng.1/scoreboard?lang=en&region=gb&calendartype=whitelist&limit=100&dates=20180407&league=eng.1
The data from this endpoint covers fixtures in the Premier League on that day (i.e. 2018-04-07).
The endpoint was found by following these steps:
1. Open the Firefox browser.
2. Go to the 'Scores' section of the ESPN site (i.e. http://www.espn.co.uk/football/scoreboard/_/league/all/).
3. Press Ctrl + Shift + E to open the developer tools window.
4. Go to the 'Network Monitor' tab.
5. Filter by 'XHR'.
6. Click on 'English Premier League' in the dropdown box.
7. The endpoint address will appear in the table.
8. Use 'Open in new tab' to view the data.
Since the link in Sudhir Vadodariya's answer is down, here is another one:
www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
which covers a similar problem using Chrome.
ESPN has various unpublished API endpoints that will let you get both scoreboard and schedule information.
Here is another example of a soccer endpoint, showing FIFA games:
Soccer:
FIFA: https://site.api.espn.com/apis/site/v2/sports/soccer/fifa.world/scoreboard
Check the "events" key to find individual games.
You didn't specify a sport, but here are some scoreboards available for other sports/leagues:
Football
College Football: http://site.api.espn.com/apis/site/v2/sports/football/college-football/scoreboard
NFL: https://site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard
Baseball
MLB: https://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard
College Baseball: https://site.api.espn.com/apis/site/v2/sports/baseball/college-baseball/scoreboard
Hockey
NHL: http://site.api.espn.com/apis/site/v2/sports/hockey/nhl/scoreboard
Basketball
NBA: https://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard
WNBA: https://site.api.espn.com/apis/site/v2/sports/basketball/wnba/scoreboard
Women's College Basketball: https://site.api.espn.com/apis/site/v2/sports/basketball/womens-college-basketball/scoreboard
Men's College Basketball: https://site.api.espn.com/apis/site/v2/sports/basketball/mens-college-basketball/scoreboard
Note that the data shape may vary slightly between each endpoint, and you should be mindful of potential rate limits.
Source: ESPN's hidden API endpoints
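As a minimal sketch of how one of these scoreboards might be consumed from Python using only the standard library: the "events" key comes from the note above, while the nested field names (competitions, competitors, team.displayName, score) are assumptions about the response shape of an unpublished API and may differ per endpoint or change without notice. The helper names are made up for this example.

```python
import json
from urllib.request import urlopen

NFL_SCOREBOARD = "https://site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard"


def fetch_scoreboard(url=NFL_SCOREBOARD, timeout=10):
    """Download one scoreboard endpoint and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_events(scoreboard):
    """Return one 'Team score, Team score' line per entry under the 'events' key."""
    lines = []
    for event in scoreboard.get("events", []):
        parts = []
        # Assumed shape: event -> competitions -> competitors -> team / score.
        for competition in event.get("competitions", []):
            for competitor in competition.get("competitors", []):
                team = competitor.get("team", {}).get("displayName", "?")
                parts.append(f"{team} {competitor.get('score', '?')}")
        # Fall back to the event name when no competitors were found.
        lines.append(", ".join(parts) or event.get("name", "?"))
    return lines


# Hypothetical usage (performs a network request):
# for line in summarize_events(fetch_scoreboard()):
#     print(line)
```

Every lookup goes through .get with a default, so an endpoint with a slightly different shape degrades to "?" placeholders instead of raising.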

Reselling products from another site - should I worry about duplicate content?

I want to sell some products that are also present on another webshop. They provide a data feed with all the information about each product, and they have nothing against my posting the info on my webshop.
The question is: should I worry about duplicate content? The number of products is too high, and it's not worth rewriting their descriptions. Will Google think that I stole the content?
Depends.
Personally, I would prevent Google from indexing duplicate content (DC) pages by adding this to the <head>...</head>:
<meta name="robots" content="noindex,follow"/>
The URLs in question won't rank anyway. So it's (usually) OK to keep them completely out of Google's sight, and then you no longer have to worry about all the algorithm updates.
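One way to verify that the tag actually made it into the rendered pages is to parse the HTML and look for it. This is a minimal sketch using Python's built-in html.parser; the class and function names are made up for this example, and the page string stands in for HTML you would fetch from your own shop.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,follow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,follow"/></head><body></body></html>'
print(sorted(robots_directives(page)))
```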
Or, if I have a lot of pages and need more crawl budget, I would use the robots.txt file:
User-agent: *
Disallow: /path/to/affiliate/products/
In this case the link juice can no longer flow freely within my site, but all the important pages get indexed. Plus, it's incredibly easy to implement. (Just don't do this if you have a lot of deep links to your products from your homepage etc.)
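Before deploying rules like these, it can help to check which URLs they block. The sketch below feeds the two lines above to Python's standard urllib.robotparser; the example.com URLs and the helper name are placeholders, and Python's matching is only an approximation of how Google's own crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /path/to/affiliate/products/
"""


def is_crawlable(url, rules=RULES, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# A product page under the disallowed path, then a page outside it.
print(is_crawlable("https://example.com/path/to/affiliate/products/item-1"))
print(is_crawlable("https://example.com/about"))
```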
Matt Cutts in 2009:
"Can product descriptions be considered duplicate content?"
http://www.youtube.com/watch?v=z07IfCtYbLw
He doesn't say "it's bad", but he clearly shows that Google doesn't like it.
Matt Cutts in 2012:
"Is it useful to have a section of my site that re-posts articles from other sites?"
http://www.youtube.com/watch?v=o7sfUDr3w8I
He states that it's probably a good idea to remove DC pages (like content from RSS feeds, press releases or product-description feeds).
So, to make a long story short: I'm really not saying "start panicking" or whatever. I'm just saying "remove everything from your site which could send negative signals to Google, so you don't have to worry about it anymore", and then you can go on and build up your brand to sell as many products as possible ;o)
Don't worry about the content. The site comes under the category of affiliate sites, so the product descriptions would be the same. It won't affect your site.
If you want to do it properly, I would get all the content rewritten. There is an amazing service out there too, called wordai.com.
Their site will rewrite the content for you as if a human had written it, on the Turing plan.
You can then check the content with copyscape.com to see how unique it is!
Best of luck.

know the page rank for certain key words

I want to know the page rank for certain keywords against my page. For example, when I search for "best movies 2012", my page does appear, but somewhere between the 30th and 50th page of results. I want to query the result set Google returns for my keywords so that I can see the rank of my page and of my competitors for typical keywords.
I think you may be confusing PageRank with positions. PageRank is an algorithm that Google uses to determine the authority of your site. This doesn't always affect the positions of certain keywords.
There are plenty of good programs and web services around that you can use such as
http://raventools.com/
Most of the good free web services have been closed down due to Google now limiting the amount of searches performed and charging for this data.
You could check out:
http://www.semrush.com
It's free but you have to register to get data.
There are several web services providing this functionality: http://raventools.com/ or http://seomoz.org/
Or, you can perform the task manually. Here is an example on how to query google search using Java: How can you search Google Programmatically Java API
You need to compare your webpage PageRank and website PR against those of the competition. The best indication we have of website PR is the homepage PageRank.
Ensure that you do this for the appropriate Google domain: Google.com for the USA, Google.co.uk for the UK, etc.
The technique is described in more detail on http://www.keywordseopro.com
You can repeat the technique for each keyword.