Block specific domain (qaru.site) from google search results - plagiarism-detection

recently I started seeing results from qaru.site in the first 10 search results in google. This is kinda annoying as I hate plagiarism and it's just machine translated questions from StackOverflow, e.g.
java jaxb simple parsing is requiring #XmlAccessorType(XmlAccessType.FIELD) annotation
and
http://qaru.site/questions/2080946/java-jaxb-simple-parsing-is-requiring-xmlaccessortypexmlaccesstypefield-annotation
So, I have two questions:
Is that legal at all? If not can we do something against that?
Can I somehow ban this domain at least from my search results?
Sry, if it should be in some other part of StackExchange, I was guiding by domain used in questions.

Related

Find the dbpedia uri for a given place

I need your help with the following situation.
I have a local relational database that contains information about several places in a city. These places could be any kind of attraction: Museum, a cathedral, or even a square.
As an example I have information about "Square Victoria" (https://en.wikipedia.org/wiki/Victoria_Square,_Montreal)
A simple search in google gave me the wikipedia URL above. But I want to be able to do it programmatically.
For each place in the database I have also its category (square, museum, church, ....). These categories are local only and do not match any standardized categorization.
My goal is to improve this database by associating each place to its dbpedia URI.
My question is what is the best way to do that? I have some theoretical background about Semantic Web technologies but I don't have yet the practice skills to determine how to do that.
More specific questions:
Is it possible to determine the dbpedia URI using sparql only?
If it is not possible to do it with sparql only, what other technologies would I need to be able to accomplish that?
Thank you
First of all I would recommend, if you have not done it yet, to have a look at wikidata. This project is a semantic extension to wikipedia, but contrary to dbpedia, the data is not extracted from wikipedia, it is created by contributors, and therefore appears (or will appear as the project is still growing) to be more relevant.
The service offers many solutions to access data (including a Sparql endpoint), and it's main advantage is that the underlying software is mediawiki, same used for wikipedia and other Wikimedia foundation projects. The mediawiki API offers an Opensearch option that should allow you to search more efficiently that Sparql queries.
Putting everything together, I think it might be worth having a look at wikidata + wikipedia API to get pivot data to align you local database.
No direct answer but I hope that will help.

How to get Author data into Google search results without Microformats?

First, I apologize if this is not considered programming related enough for some peoples taste, however I feel it is appropriate as my question is related to what you put in a websites markup, I think so anyways.
Ok so I searched Google for the term dribbble invite and on page 2 of my results, or at this URL Google result the 5th result on page 2 (will probably be different for you based on your location and other factors) There is a result like the image below
Notice the author Photo and name. I am looking for how to do this with a website? From my research in the past it looks like it is done with Microformats however a search through the source code of the page HERE does not appear to be using any Microformats.
Any idea how this is happening for that website?
Typically, this is done through Google+.
There's a pretty good article on how-to here :
http://www.labnol.org/internet/author-profile-in-google/19775/

Twitter API search within following

wondering if anyone has heard of a way to filter Twitter search results to the users 'following' list? I'd like to do a search for pics that people I follow have posted. The pics part is fairly trivial (search for image URLs) but I'm guessing that a user-filtered search is beyond the API, even with oAuth.
I've seen a couple of services like snapbird.org that advertise this feature (even though they don't seem to work well), any guesses as to how they go about this?
Thanks!
You can implement this specific image search easily with the help of jetwick.com available as open source here: https://github.com/karussell/Jetwick
Currently searching in your friends is possible but adding yet another filter isn't that hard. Patches are welcome ;)

How does Google Know you are Cloaking?

I can't seem to find any information on how google determines if you are cloaking your content. How, from a technical standpoint, do you think they are determining this? Are they sending in things other than the googlebot and comparing it to the googlebot results? Do they have a team of human beings comparing? Or can they somehow tell that you have checked the user agent and executed a different code path because you saw "googlebot" in the name?
It's in relation to this question on legitimate url cloaking for seo. If textual content is exactly the same, but the rendering is different (1995-style html vs. ajax vs. flash), is there really a problem with cloaking?
Thanks for your put on this one.
As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They also might, in the case of Javascript, actually render partial or entire pages. "Do they have a team of human beings comparing?" This is doubtful. A lot has been written on Google's crawling strategies including this, but if humans are involved, they're only called in for specific cases. I even doubt this: any person-power spent is probably spent by tweaking the crawling engine.
Google looks at your site while presenting user-agent's other than googlebot.
See the Google Chrome comic book page 11 where it describes (even better than layman's terms) about how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection - at least that would be another good use for it.
Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.
In reality, many of Google's algos are trivially reversed and are far from rocket science. In the case of, so called, "cloaking detection" all of the previous guesses are on the money (apart from, somewhat ironically, John K lol) If you don't believe me set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing) and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, i made that up for entertainment value (and now i'm nesting parentheses to really mess with your mind :)) AKA "checking google resuts to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol
A very simple test would be to compare the file size of a webpage the Googlbot saw against the file size of the page scanned by an alias user of Google that looks like a normal user.
This would detect most suspect candidates for closeer examination.
They call your page using tools like curl and they construct a hash based on the page without the user agent, then they construct another hash with the googlebot user-agent. Both hashes must be similars, they have algorithms to check the hashes and know if its cloaking or not

Is listing all products on the homepage's footer making a real difference SEO-wise?

I'm working on a website on which I am asked to add to the homepage's footer a list of all the products that are sold on the website along with a link to the products' detail pages.
The problem is that there are about 900 items to display.
Not only that doesn't look good but that makes the page render a lot slower.
I've been told that such a technique would improve the website's visibility in Search Engine.
I've also heard that such techniques could lead to the opposite effect: google seeing it as "spam".
My question is: Is listing products of a website on its homepage really efficient when it comes to becoming more visible on search engines?
That technique is called keyword stuffing and Google says that it's not a good idea:
"Keyword stuffing" refers to the practice of loading a webpage with keywords in an attempt to manipulate a site's ranking in Google's search results. Filling pages with keywords results in a negative user experience, and can harm your site's ranking. Focus on creating useful, information-rich content that uses keywords appropriately and in context.
Now you might want to ask: Does their crawler really realize that the list at the bottom of the page is just keyword stuffing? Well, that's a question that only Google could answer (and I'm pretty sure that they don't want to). In any case: Even if you could make a keyword stuffing block that is not recognized, they will probably improve they algorithm and -- sooner or later -- discover the truth. My recommendation: Don't do it.
If you want to optimize your search engine page ranking, do it "the right way" and read the Search Engine Optimization Guide published by Google.
Google is likely to see a huge list of keywords at the bottom of each page as spam. I'd highly recommend not doing this.
When is it ever a good idea to specify 900 items to a user? good practice dictates that large lists are usually paginated to avoid giving the user a huge blob of stuff to look through at once.
That's a good rule of thumb, if you're doing it to help the user, then it's probably good ... if you're doing it purely to help a machine (ie. google/bing), then it might be a bad idea.
You can return different html to genuine users and google by inspecting the user agent of the web request.
That way you can provide the google bot with a lot more text than you'd give a human user.
Update: People have pointed out that you shouldn't do this. I'm leaving this answer up though so that people know it's possible but bad.