I wonder what the best way is to sample, say, 1000 questions completely at random from Yahoo! Answers.
I want true randomness: I will ignore the categories, posting dates, etc. entirely.
Doing this manually may introduce bias, so could anyone offer suggestions, such as using the Yahoo! Answers API or something similar?
Thanks a lot.
I do not know whether it is a correct solution from a formal point of view, but I would use Yahoo! BOSS search to retrieve 4000 questions and then randomly pick 1000. Using a search engine lets you retrieve the most important (highly ranked/linked) questions. You can play around with the search queries to get questions of all kinds, from the most popular to the worst ones. There is also the Yahoo! Answers API, which provides search functionality, but I have not used it, so I cannot say how good it is.
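The "retrieve more, then pick at random" step can be sketched in Python as below. This is a minimal sketch, assuming you have already collected candidate question IDs (e.g. from a BOSS search) into a list; the `sample_questions` name and the sample IDs are hypothetical. Note that it only makes the draw uniform over the retrieved pool: any ranking bias in the search results is still present.

```python
import random

def sample_questions(candidate_ids, k=1000, seed=None):
    """Draw k distinct question IDs uniformly at random from the retrieved pool.

    candidate_ids: hypothetical list of IDs collected beforehand (e.g. ~4000
    results from a search API). Duplicates are removed first so the same
    question cannot be picked twice.
    """
    pool = list(dict.fromkeys(candidate_ids))  # de-duplicate, keep order
    if k > len(pool):
        raise ValueError(f"need at least {k} unique candidates, got {len(pool)}")
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    return rng.sample(pool, k)

# Usage with made-up IDs standing in for 4000 retrieved questions.
retrieved = [f"q{i}" for i in range(4000)]
picked = sample_questions(retrieved, k=1000, seed=42)
print(len(picked), len(set(picked)))
```

Passing a fixed `seed` lets you reproduce the same sample later; leave it as `None` for a fresh draw each run.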
Has anyone heard of a way to filter Twitter search results to the user's 'following' list? I'd like to search for pics that people I follow have posted. The pics part is fairly trivial (search for image URLs), but I'm guessing that a user-filtered search is beyond the API, even with OAuth.
I've seen a couple of services like snapbird.org that advertise this feature (even though they don't seem to work well). Any guesses as to how they go about this?
Thanks!
You can implement this specific image search easily with the help of jetwick.com available as open source here: https://github.com/karussell/Jetwick
Searching within your friends' tweets is already possible, and adding another filter isn't that hard. Patches are welcome ;)
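The filter itself (tweets by people you follow that contain an image URL) can be sketched as below. This is a minimal sketch, not Jetwick's code: it assumes the tweets and the following list have already been fetched, represents tweets as dicts with hypothetical `user` and `text` keys, and treats any URL ending in a common image extension as a "pic".

```python
import re

# Assumption: a "pic" is any URL ending in a common image extension.
# Many image hosts use page URLs without one, so adjust this as needed.
IMAGE_URL = re.compile(r"https?://\S+\.(?:jpe?g|png|gif)\b", re.IGNORECASE)

def pics_from_followed(tweets, following):
    """Keep tweets written by someone in `following` that contain an image URL.

    tweets: list of dicts with hypothetical "user" and "text" keys.
    following: iterable of screen names the searching user follows.
    """
    followed = {name.lower() for name in following}  # compare names case-insensitively
    return [
        t for t in tweets
        if t["user"].lower() in followed and IMAGE_URL.search(t["text"])
    ]

tweets = [
    {"user": "alice", "text": "sunset http://example.com/sunset.JPG"},
    {"user": "bob", "text": "no picture here"},
    {"user": "mallory", "text": "http://example.com/cat.png"},
]
print([t["user"] for t in pics_from_followed(tweets, ["Alice", "bob"])])  # ['alice']
```

Filtering locally like this avoids needing a user-restricted search in the API itself, at the cost of fetching the candidate tweets first.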
I have to use the Google AdWords API for a project. I have large chunks working, but I am getting frustrated with the documentation. For example, I know that a TextAd headline field has a limit of 25 characters and can't contain characters like ! and ?, yet the documentation makes no mention of it:
http://code.google.com/apis/adwords/docs/reference/v200909/AdGroupAdService.TextAd.html
Does anyone know where I can find this kind of info? This is not a question about the TextAd service, but about the documentation so that I don't have to find the limits of all these fields by trial and error.
Many thanks,
b.
This post is a bit old, but since it showed up at the top of my Google search results, I thought it might be worth posting a link to the limits.
A list of the limits can be found here:
https://code.google.com/apis/adwords/docs/appendix/limits.html
And I found a list of disallowed symbols here (though there might be a better source):
https://support.google.com/adwords/bin/answer.py?hl=en&answer=53539
Phil
You won't find these limits in the AdWords API documentation anywhere, because these limitations are imposed by AdWords policies rather than any sort of character limit in the API.
Therefore, you must resort to AdWords documentation in order to find out things like the maximum length for an ad description or headline.
https://adwords.google.com/support/aw/bin/answer.py?hl=en-uk&query=characters&answer=6095&type=f should be a good starting point!
Cheers,
Sérgio
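Since these limits come from AdWords policy rather than the API schema, one option is to check ads client-side before submitting them. This is a minimal sketch; the 25-character limit and the ! and ? restriction are taken from the question above, not from an official source, so confirm them against the limits and policy pages linked in the answers. The `headline_problems` name is hypothetical.

```python
# Assumed values, taken from the question above rather than from an
# official source; confirm them against the AdWords limits/policy pages.
HEADLINE_MAX_LEN = 25
HEADLINE_DISALLOWED = set("!?")

def headline_problems(headline):
    """Return a list of human-readable reasons the headline might be rejected."""
    problems = []
    if len(headline) > HEADLINE_MAX_LEN:
        problems.append(f"too long: {len(headline)} > {HEADLINE_MAX_LEN} characters")
    bad = sorted(HEADLINE_DISALLOWED.intersection(headline))
    if bad:
        problems.append("disallowed characters: " + " ".join(bad))
    return problems

print(headline_problems("Cheap flights to Lisbon"))  # []
print(headline_problems("Cheapest flights anywhere in Europe!?"))
```

Keeping the limits in named constants means only one place needs updating when you find the authoritative values.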
I'm new to SEO, so please excuse what may be a very basic question.
I want to count (or estimate) the number of times that a given search phrase has been searched within a particular time period. Are there any APIs out there for this? Does Google (or any other relevant search engine) release this information?
Any helpful links are greatly appreciated.
I'll be using Java, though I doubt that makes much difference.
I'm not aware of any API for it, but you can use Google Insights for Search.
I use link text
It also shows the volume for every search.
It looks like the WordTracker API might be the best tool for programmatically finding search data.
http://www.wordtracker.com/api/
Check out this improvised API for Google Trends, which helps you export data.
I'm working on a website on which I am asked to add to the homepage's footer a list of all the products that are sold on the website along with a link to the products' detail pages.
The problem is that there are about 900 items to display.
Not only does that look bad, it also makes the page render a lot slower.
I've been told that such a technique would improve the website's visibility in search engines.
I've also heard that such techniques could have the opposite effect: Google seeing it as "spam".
My question is: Is listing a website's products on its homepage really effective for becoming more visible on search engines?
That technique is called keyword stuffing and Google says that it's not a good idea:
"Keyword stuffing" refers to the practice of loading a webpage with keywords in an attempt to manipulate a site's ranking in Google's search results. Filling pages with keywords results in a negative user experience, and can harm your site's ranking. Focus on creating useful, information-rich content that uses keywords appropriately and in context.
Now you might want to ask: Does their crawler really realize that the list at the bottom of the page is just keyword stuffing? Well, that's a question that only Google could answer (and I'm pretty sure that they don't want to). In any case, even if you could make a keyword stuffing block that is not recognized, they will probably improve their algorithm and, sooner or later, discover the truth. My recommendation: Don't do it.
If you want to optimize your search engine page ranking, do it "the right way" and read the Search Engine Optimization Guide published by Google.
Google is likely to see a huge list of keywords at the bottom of each page as spam. I'd highly recommend not doing this.
When is it ever a good idea to show 900 items to a user? Good practice dictates that large lists be paginated so the user isn't handed a huge blob of stuff to look through at once.
That's a good rule of thumb: if you're doing it to help the user, then it's probably good; if you're doing it purely to help a machine (i.e. Google/Bing), then it might be a bad idea.
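Pagination of a long product list can be sketched as below, e.g. for a dedicated "all products" page instead of the homepage footer. This is a minimal sketch; the 50-items-per-page value and the product names are arbitrary assumptions, and the `paginate` name is hypothetical.

```python
import math

def paginate(items, page, per_page=50):
    """Return (items on `page`, total page count); pages are numbered from 1."""
    total_pages = max(1, math.ceil(len(items) / per_page))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages

products = [f"product-{n}" for n in range(1, 901)]  # stand-in for ~900 products
first, pages = paginate(products, 1)
last, _ = paginate(products, pages)
print(pages, first[0], last[-1])  # 18 product-1 product-900
```

Each page then links to its product detail pages, so crawlers can still reach every product without a 900-link block on every page.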
You can return different HTML to genuine users and to Google by inspecting the user agent of the web request.
That way you can provide Googlebot with a lot more text than you'd give a human user.
Update: People have pointed out that you shouldn't do this. I'm leaving this answer up though so that people know it's possible but bad.
We are planning to put large number of Business Research Reports and Articles from our intranet on to the Internet. However, we don't want others to copy the content and host it on their own.
I read about protection by CAPTCHA and was wondering if this is possible. Readers should be able to read 50% of the article for FREE, after which a CAPTCHA should be entered to read the rest of the article. [In this way we are making life a little harder for those copycats.]
Any pointers on how to implement this? The content is in HTML, and we have programming experience in Perl and PHP. We can hire others if required.
Additionally, search engines will crawl only half of the article. Will the site be penalized because they can't crawl the rest, since they won't be able to crack the CAPTCHA?
Thanks.
There's a really good Captcha service provided by Recaptcha - http://recaptcha.net/
There is a PHP class that you can use to do all the hard work.
It's important to bear in mind that search engines aren't able to solve a Captcha and so they will only index the first half of the report. As long as this half contains largely the correct key words, it shouldn't cause a massive problem. Don't make the mistake of "detecting" a search engine and showing them different content to a normal user as the major search engines think that this is spamming.
An alternative solution would be to use a service like Copyscape (http://www.copyscape.com/) to protect your content.
I know this is not what you're asking, but please take into account that CAPTCHAs are universally broken, and will not protect your content. You said the first half is free, does that mean you intend to charge for the other half? CAPTCHA won't help you here at all...
But even if you're just trying to prevent automated scraping, CAPTCHA still won't do the trick. Check out my answer to another captcha question... Or you can go straight to the ppt I presented at OWASP last year.
Readers should be able to read 50% of the article for FREE after which a CAPTCHA should be entered to read the rest of the article
Have your PHP programmer output 50% of the article. On the bottom, add a captcha. If the user types in the correct captcha, output 100% of the article.
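That logic can be sketched as below, in Python rather than PHP. This is a minimal sketch, assuming the article is stored as a list of paragraphs and that `captcha_passed` comes from whichever CAPTCHA library you use (reCAPTCHA, phpcaptcha, Captcha::reCAPTCHA, ...); the `visible_paragraphs` name is hypothetical.

```python
def visible_paragraphs(paragraphs, captcha_passed):
    """Return the part of the article the current visitor may see.

    paragraphs: the article body split into paragraphs (an assumption;
    splitting on paragraph boundaries avoids cutting a sentence in half).
    captcha_passed: result of the CAPTCHA check, done elsewhere.
    """
    if captcha_passed:
        return paragraphs
    half = (len(paragraphs) + 1) // 2  # free part: first half, rounded up
    return paragraphs[:half]

article = ["Intro", "Method", "Findings", "Conclusion", "Appendix"]
print(visible_paragraphs(article, captcha_passed=False))  # ['Intro', 'Method', 'Findings']
print(len(visible_paragraphs(article, captcha_passed=True)))  # 5
```

The key design point is that the hidden half is never sent to the browser until the CAPTCHA is solved; hiding it with CSS or JavaScript would leave it in the HTML for any scraper to read.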
Any pointers on how to implement this? The content is in HTML, and we have programming experience in Perl and PHP. We can hire others if required.
As a PHP programmer, I use http://www.phpcaptcha.org to implement captcha.
Additionally, search engines will crawl only half of the article. Will the site be penalized because they can't crawl the rest, since they won't be able to crack the CAPTCHA?
No, it won't penalize you, but that particular section will not be shown in the search results.
As already mentioned reCAPTCHA is a good way to go.
Have a look at Captcha::reCAPTCHA on CPAN, which, according to the CPAN ratings reviews, "Works out of the box".
If you want CAPTCHA, then there are plenty of modules that do this on CPAN ;-)
Hope that helps.