Google query for a mass of related websites - google-custom-search

Is there a way to load a batch of URLs, say a hundred of them, and query Google to find other sites related to them?
To be more specific, the as_rq=www.example.com parameter in a Google query searches for sites related to that URL. If I want to do this for a large number of URLs, is there an option for that, or will I have to go through the URLs one by one?

Unfortunately, it is not possible to query multiple URLs at once. I tried to do this myself, with no luck, after searching multiple online forums.

Yes, it is possible via the Google CSE (Custom Search Engine) API: in the required parameter q=exampleQuery you insert q=as_rq=www.example.com, and by using annotations you can parametrize your search results.

Related

Search in multiple sites using Google Custom Search JSON

I'm trying to figure out how to search in multiple sites using the Google Custom Search JSON API, meaning that results should come only from a specific list of sites.
I was playing with the API explorer - https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list?apix_params=%7B%22cx%22%3A%22011602274690322925368%3Atkz2zvvpmk0%22%2C%22siteSearch%22%3A%22www.walla.co.il%22%7D
and noticed the siteSearch query key, but it accepts only a single string, not a list of sites.
What is the way to search only in specific sites?
Thanks
There are a couple of things you can do.
If you know the specific sites you want to search, you can add them as refinements to your engine. Then query for that refinement by adding 'more:<REFINEMENT_LABEL>' to the query.
Or add 'site:' operators to the query itself, for example: cats site:cnn.com OR site:bbc.com
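The 'site:' approach can be sketched in Python as below. This is only an illustration: the helper names build_site_query and build_request_url are made up here, and MY_KEY / MY_UNIQUE_ID are placeholders for your own API key and engine ID. It builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# JSON API endpoint; key and cx values passed in are placeholders.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_site_query(terms, sites):
    """Append 'site:' operators, joined by OR, to the search terms."""
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} {site_filter}" if site_filter else terms


def build_request_url(terms, sites, api_key, cx):
    """Build (but do not send) a request URL for the restricted query."""
    params = {"key": api_key, "cx": cx, "q": build_site_query(terms, sites)}
    return API_ENDPOINT + "?" + urlencode(params)


query = build_site_query("cats", ["cnn.com", "bbc.com"])
url = build_request_url("cats", ["cnn.com", "bbc.com"], "MY_KEY", "MY_UNIQUE_ID")
```

Keeping the site list in the q parameter means a single request covers all sites, at the cost of a longer query string as the list grows.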

Search Database via Google Custom Search? Attached Google CSE to (SQL/NoSQL) Database for website?

TOPIC - Google Search Engine / Custom Search - with Database
References
Search for "Google Search Engine" and "Google Custom Search"
(New to StackOverflow; just joined the other day. I'm limited to 2 links I can post right now.)
NOTE:
I have not YET decided/committed to any specific coding language, framework, etc. Not until I figure out how to accomplish my question (below).
BACKGROUND INFO
What I'm trying to do (for now) is add a search box / search engine to a simple website I'm building out. Before I get too far into it (planning ahead), I would like to use Google CSE if at all possible (it can do A LOT of things and works well). However, I will have a database of "items" that I want to be able to search quickly from the search box, i.e. like Amazon.com. (I'm not sure on the database type YET; it will depend on what my options are and what I can do with CSE.)
QUESTION:
Is there any way at all to use Google Custom Search and/or the Custom Search API to search or attach to a database (SQL, NoSQL, or others)? I would HIGHLY prefer being able to do all of this in Google Cloud Platform and use one of their storage/database products.
If I understand what you are trying to do, Google CSE is enough.
From the Google doc you linked:
#Defining a Custom Search Engine in Control Panel
In the Sites to search section, add the pages you want to include in
your search engine. You can include any sites you want, not just the
sites you own. You can include whole site URLs or individual pages
URLs. You can also use URL patterns.
#Enabling Autocomplete
[...]you can enable or disable autocomplete feature using
enableAutoComplete attribute.
As for "Is there any way at all [..] to search a database": I'd say not directly, but it's not a big problem.
Google CSE works on "indexable web pages", so it will not work against a raw DB, a restricted network, or a custom network not served over http(s)://.
But in your case, if you build a DB, I suppose you'll have to make web pages to display the stored data to your users (like product pages on Amazon)?
If so, you can run Google CSE against those pages by adding your http://[server ip] or http://[domain name] to the whitelist.
As far as I know, Custom Search does not guarantee that all your content will be indexed.
You probably want to try exporting a full sitemap.xml or an RSS feed, and if the custom search results from either of these don't satisfy you, you will probably want to look at the Google Search Appliance product.
There's also http://sphinxsearch.com/, by the way.

SEO: how can dynamic URL with query strings be searched by search engine bots?

I’m developing an ecommerce web site in ASP.NET using SQL server 2008 database.
Most of my pages are database driven and all the content is gathered from a SQL Server.
Every product page is created dynamically from data coming from the database, hence every product’s page URL has a unique query string, containing a “product_id” variable.
Example: http://www.myecommence.com/products.aspx?product_id=1
I'd like to improve my Search Engine Optimization.
Dealing with a small number of products could be fine but what if I
had more than 1000 products, how could every product be crawled?
How does the google spider/bot know that a product_id with a
hypothetical number of 767 exists?
I’ve been googling this, still I can’t understand how pages that
have absolutely no reference in the site or external sites can be
crawled? If this is possible the spider should know how to read the
website’s database tables, but I guess that this is not the case.
At this point since most of the pages and links are dynamic how
could they be indexed, the same thing applies to “user detail” pages
that are accessed via query string using a “user id=n”?
What I’m asking has probably been discussed already, but some points are still unclear to me.
I would advise using URL rewriting rules (mod_rewrite or similar) to make your URLs search engine friendly.
This is very important for Google,
as is a good category structure.
Eg:
domain.com/t-shirts/girls/star-wars-t-shirt/
is far better than
domain.com/products.aspx?product_id=1
Here is some info:
http://msdn.microsoft.com/en-us/library/ms972974.aspx
http://www.wrox.com/WileyCDA/Section/id-305997.html
To answer your questions:
Dealing with a small number of products could be fine but what if I had more than 1000 products, how could every product be crawled?
If you have a good sitemap / menu structure etc, it is likely that Google will crawl all your pages.
How does the google spider/bot know that a product_id with a hypothetical number of 767 exists?
Via crawling your site, via your sitemap, via the menu system on the site, etc. However, always remember: Google is not psychic. It cannot find a page unless you tell it where the page is or link to it.
I’ve been googling this, still I can’t understand how pages that have absolutely no reference in the site or external sites can be crawled? If this is possible the spider should know how to read the website’s database tables, but I guess that this is not the case.
If you have no reference - you are doing something wrong. Improve your site structure.
At this point since most of the pages and links are dynamic how could they be indexed, the same thing applies to “user detail” pages that are accessed via query string using a “user id=n”?
There is nothing wrong with a dynamic URL per se, but again I would recommend implementing search engine friendly URLs via Mod Rewrite or similar (see the above resources).
Good luck,
Colin
Modern systems optimize for SEO by allowing for either custom or automated URLs that remap to your id based url pattern. This URL style allows for a fully custom word for word product title or keyword/description, which carries more weight than a random id number in a URL.
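A minimal sketch of that remapping idea, in Python rather than ASP.NET and with a hypothetical in-memory catalogue standing in for the database: a slug is derived from each product title, and a friendly path is resolved back to the id-based URL. The names slugify, PRODUCTS and resolve are invented for this example.

```python
import re


def slugify(title):
    """Lower-case the title and turn runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Hypothetical catalogue; in a real site these rows would come from the database.
PRODUCTS = {1: "Star Wars T-Shirt", 767: "Girls' Striped Hoodie"}

# Lookup table from friendly slug to numeric product id.
SLUG_TO_ID = {slugify(name): pid for pid, name in PRODUCTS.items()}


def resolve(path):
    """Map a friendly path to the id-based URL, or None if the slug is unknown."""
    slug = path.strip("/").split("/")[-1]
    product_id = SLUG_TO_ID.get(slug)
    if product_id is None:
        return None
    return f"/products.aspx?product_id={product_id}"


friendly = resolve("/t-shirts/girls/star-wars-t-shirt/")
```

Only the last path segment is used for the lookup here, so the category segments are purely descriptive; a real rewrite rule might validate them as well.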
To help get all individual pages indexed, you generally benefit most from submitting or making available an XML sitemap. More info from Google on generating one here:
https://code.google.com/p/googlesitemapgenerator/
Hope that gets you going in the right direction!
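For a database-driven catalogue, a sitemap can also be generated from the product ids directly. The following Python sketch assumes a hypothetical list of ids (in practice read from the products table) and uses the standard sitemaps.org urlset format; it is an illustration, not the tool linked above.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical ids; a real site would select them from the products table.
product_ids = [1, 2, 767]
product_urls = [
    f"http://www.myecommence.com/products.aspx?product_id={pid}"
    for pid in product_ids
]
sitemap_xml = build_sitemap(product_urls)
```

Because every product id is listed explicitly, the crawler no longer has to discover product 767 by following links.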

Google autocomplete api for my site

Is it possible to make the Google autocomplete API return results only for my site rather than for all sites? I see that there is a ds param, but its only purpose is to search in YouTube. So how can I get autocomplete, or maybe related or suggested search words, for a single site only?
I needed the very same thing, and so far the only way I have found to get this working is to create a custom search engine and then add it as a parameter to the autocomplete call:
http://clients1.google.com/complete/search?client=partner&gs_ri=partner&partnerid={0}&ds=cse
where {0} is your custom search ID.
Certain features, such as returning the results as XML, don't work if you use the partner ID, but at least all the autocomplete results will be from your site.
You can also have multiple search engines and use different ones in different text boxes. The results are just a JSON string that you parse.
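As a rough sketch, the call above could be assembled in Python like this. Two things are assumptions rather than part of the URL shown: the q parameter carrying the typed prefix, and the placeholder ID YOUR_CSE_ID. The endpoint is not a documented API, so the response format is deliberately not parsed here.

```python
from urllib.parse import urlencode

AUTOCOMPLETE_ENDPOINT = "http://clients1.google.com/complete/search"


def autocomplete_url(cse_id, prefix):
    """Build the autocomplete URL for one custom search engine."""
    params = {
        "client": "partner",
        "gs_ri": "partner",
        "partnerid": cse_id,  # takes the place of {0} in the URL above
        "ds": "cse",
        "q": prefix,          # assumption: the typed text is sent as q
    }
    return AUTOCOMPLETE_ENDPOINT + "?" + urlencode(params)


suggest_url = autocomplete_url("YOUR_CSE_ID", "star wars")
```

Using one function per text box with a different cse_id gives the "different engines in different text boxes" setup described above.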
Good luck

Multiple file types search using Google Custom Search API

I need to get Google search results for particular file types.
For example, in a browser I would search Google directly for "hyperloop filetype:pdf" and it would list PDF files for "hyperloop".
For this, my Google Custom Search request URI will be https://www.googleapis.com/customsearch/v1?key=MY_KEY&cx=MY_UNIQUE_ID&q=hyperloop&fileType=pdf
However, I would now like to get search results for "hyperloop" with file types .ppt or .doc.
In a browser, I would achieve this by googling "hyperloop filetype:ppt OR filetype:doc".
What is the equivalent search request URI for this query?
I could not find anything about querying with multiple values for a single parameter in the Google Custom Search documentation.
Rather than doing
q=hyperloop&fileType=pdf
you can use
q=hyperloop%20filetype:pdf%20OR%20filetype:doc
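The same idea, sketched in Python: the file types are folded into q as 'filetype:' operators joined by OR, and the query string is percent-encoded. The helper names are invented for this example, and MY_KEY / MY_UNIQUE_ID are the placeholders from the question. Note that the encoder also escapes the colon as %3A, which is an equivalent encoding of the same query.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_filetype_query(terms, filetypes):
    """Append 'filetype:' operators, joined by OR, to the search terms."""
    operators = " OR ".join(f"filetype:{ft}" for ft in filetypes)
    return f"{terms} {operators}" if operators else terms


def build_search_uri(terms, filetypes, api_key, cx):
    """Build (but do not send) the request URI; spaces are encoded as %20."""
    params = {"key": api_key, "cx": cx, "q": build_filetype_query(terms, filetypes)}
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


uri = build_search_uri("hyperloop", ["ppt", "doc"], "MY_KEY", "MY_UNIQUE_ID")
```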
This works for me:
$url='https://www.googleapis.com/customsearch/v1?key=AIzaSyCJUGIb_tevRKD-Kxxi5f4&cx=010407088:onjj7gscy2g&q='. urlencode($keywords).'&filetype=doc&filetype=docx';