How does Google store the index? [closed] - indexing

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Lately I have been reading about web crawling, indexing and serving. I found some information in Google Webmaster Tools (Google Basics) about the process Google uses to crawl the web and serve search results.
What I am wondering is how they store all those indexes. That's a lot of data to store, right? How do they do it?
Thanks

I'm answering my own question because I found some interesting material about the Google index:
On the Google Webmasters YouTube channel, Matt Cutts gives some references about the architecture behind the Google index.
One of those references, and in my view well worth reading, is The Anatomy of a Large-Scale Hypertextual Web Search Engine.
It helped me understand the topic better, and I hope it helps you too!
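The central structure in that paper is an inverted index: for each word, a list of "hits" recording which documents contain it and where. The toy in-memory sketch below shows the idea only; the function names and sample documents are hypothetical, and a real engine shards these lists across many machines instead of keeping one dictionary.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[tuple[int, int]]]:
    """Map each word to a list of (doc_id, position) hits, ordered by doc_id."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(docs):
        for position, word in enumerate(docs[doc_id].lower().split()):
            index[word].append((doc_id, position))
    return dict(index)


def search(index: dict[str, list[tuple[int, int]]], query: str) -> list[int]:
    """Return ids of documents containing every query word (an AND query)."""
    postings = [{doc for doc, _ in index.get(word, [])} for word in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "large scale web search",
    2: "web crawling and indexing",
    3: "search engine index",
}
index = build_inverted_index(docs)
print(search(index, "web search"))  # [1]
```

Answering a query then means intersecting short per-word lists rather than scanning every document, which is why the index, not the raw pages, is what has to be stored and distributed.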

They use a variety of data stores depending on the type of information. Generally, they don't use SQL databases, because these carry too much overhead and don't distribute well across very large numbers of machines.
Google actually developed its own data store, which it uses for large, read-mostly applications such as Google Earth and the search engine's cache. It supports distributing information over a very large number of computers, with each piece of information stored on three or four different computers. This allows them to use cheap hardware: if one computer fails, the others immediately begin restoring all the data it held to the appropriate number of copies.
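The replication scheme described above can be modelled in a few lines. This is a toy, single-process sketch and not Google's implementation: the class `ReplicatedStore`, the machine names and the fixed replica count of 3 are assumptions for illustration, and the sketch only tracks placement rather than copying actual bytes.

```python
import random

REPLICAS = 3  # assumption: the answer says "three or four" copies


class ReplicatedStore:
    """Toy model: every chunk is placed on REPLICAS distinct machines."""

    def __init__(self, machines: list[str], seed: int = 0):
        self.machines = set(machines)
        self.placement: dict[str, set[str]] = {}  # chunk id -> machines holding a copy
        self.rng = random.Random(seed)

    def put(self, chunk_id: str) -> None:
        # Pick REPLICAS different machines at random for the new chunk.
        self.placement[chunk_id] = set(self.rng.sample(sorted(self.machines), REPLICAS))

    def fail(self, machine: str) -> None:
        """Remove a dead machine and re-replicate every chunk it held."""
        self.machines.discard(machine)
        for holders in self.placement.values():
            if machine in holders:
                holders.discard(machine)
                candidates = sorted(self.machines - holders)
                needed = REPLICAS - len(holders)
                holders.update(self.rng.sample(candidates, min(needed, len(candidates))))


store = ReplicatedStore([f"m{i}" for i in range(6)])
for chunk in ["a", "b", "c"]:
    store.put(chunk)
store.fail("m0")
```

After `fail("m0")`, every chunk is again held by three live machines, which is what lets the system run on cheap, failure-prone hardware.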

Related

Best database paradigm to use [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm just starting to get into web development, and I am planning a website.
This website will have users that can edit data. Think of it like a tree:
There's the organisation (company), and under the organisation there are users. Each user can have multiple "clients", and the user can edit data about each "client" and share that data. The data is mostly numbers and text, and possibly some images.
What database paradigm would be best suited to this? I was thinking documents or relational. I want low-cost, but also lots of room for horizontal (and possible vertical) scaling.
Thanks :)
Considering your requirements, Google Cloud SQL will be the best option for you. It is a relational database, which suits this kind of hierarchical, editable data, and it can scale reads horizontally with read replicas.
Google Cloud SQL is a fully-managed database service that offers high performance, scalability, and convenience. Hosted on Google Cloud Platform, Cloud SQL provides a database infrastructure for applications running anywhere.
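The organisation → users → clients tree, plus sharing, maps naturally onto a relational schema. The sketch below uses Python's built-in `sqlite3` as a local stand-in for a hosted relational database; the table and column names are hypothetical, and the same DDL would need minor adjustments for MySQL or PostgreSQL.

```python
import sqlite3

# In-memory SQLite stands in for a hosted relational database here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE organisations (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE users (
    id              INTEGER PRIMARY KEY,
    organisation_id INTEGER NOT NULL REFERENCES organisations(id),
    name            TEXT NOT NULL
);
CREATE TABLE clients (
    id       INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id),  -- the user who manages this client
    name     TEXT NOT NULL,
    notes    TEXT,   -- free-text data
    amount   REAL,   -- numeric data
    image    BLOB    -- optional; large images often live in object storage instead
);
-- Sharing: which other users can see a given client.
CREATE TABLE client_shares (
    client_id INTEGER NOT NULL REFERENCES clients(id),
    user_id   INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (client_id, user_id)
);
""")

conn.execute("INSERT INTO organisations (id, name) VALUES (1, 'Acme')")
conn.executemany("INSERT INTO users (id, organisation_id, name) VALUES (?, 1, ?)",
                 [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO clients (id, owner_id, name, notes) "
             "VALUES (1, 1, 'Client X', 'first contact')")
conn.execute("INSERT INTO client_shares (client_id, user_id) VALUES (1, 2)")

# Clients visible to a user: the ones they own plus the ones shared with them.
visible = conn.execute("""
    SELECT c.name FROM clients c
    WHERE c.owner_id = :uid
       OR c.id IN (SELECT client_id FROM client_shares WHERE user_id = :uid)
    ORDER BY c.name
""", {"uid": 2}).fetchall()
```

Here `bob` sees `Client X` only through the share, which is the kind of cross-cutting query that is awkward in a pure document store.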

New Tactics for Acquiring Link Backs [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
As all SEOs know, Google is trying its very best to kill SEO, and acquiring link backs is now quite difficult. Content is key, but my boss is still obsessed with link backs. I can't do directory posting, link exchanges, paid links, Web 2.0 or blog commenting, as these are considered spam now. I don't see what other choice I have except forum posting and article posting. Can someone suggest a new method to acquire link backs? I know almost all the traditional methods, so please don't suggest press releases and the like. If you have something out of the box or not very common, please share.
Google isn't killing SEO; they're trying to banish the practices your boss is so intent on.
If you want to build a quality reputation - you need to start creating genuine and unique content aimed at your target audience. Research your market, offer your visitors information they want to read and share. Make sure what you create is geared towards Google.
Make it relevant, current, accurate and engaging.
Of course, this all takes time and considerable effort. If you or your boss can't devote the time needed, or at least employ someone to do it for you, the business is going to suffer online.
Buy the links. The majority of online marketing agencies do this as the primary way to increase Google rank.
Or go the natural way and produce so much fine content people will naturally share it.

How to setup a donations page for a charity website? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I need to help a human rights organisation set up a donation page on their website. They have tried PayPal and GlobalGiving and ran into some problems with these services, such as ceilings and transaction fees. They want to set up their own mechanism. What are the possible options, and how much programming is needed? Are there any free, open-source e-commerce or charity modules available?
It sounds like you are looking for something very customizable. I would recommend either doing some custom coding or using a solution like Wufoo. You can build something as simple as a form with a bunch of fields that sends the results to PayPal or another payment gateway. A pre-built solution like Wufoo is often recommended for non-technical people and for simple, quick tasks like this.
(Alternatively) Most well-known applications like Drupal, Joomla and WordPress (you name the rest) have fairly good support and modules in this area; however, most of them require some degree of customization and often become an overkill solution (mainly because of the learning curve).
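The "simple form that sends the results to a payment gateway" approach can be sketched as a server-side handler that validates the submitted fields and builds a redirect URL. Everything gateway-specific here is hypothetical: `GATEWAY_URL` and the parameter names are placeholders, and a real integration must use the field names from the chosen provider's documentation.

```python
from decimal import Decimal, InvalidOperation
from urllib.parse import urlencode

# Hypothetical gateway endpoint; a real one comes from your provider's docs.
GATEWAY_URL = "https://payments.example.org/checkout"


def build_donation_redirect(form: dict[str, str]) -> str:
    """Validate a submitted donation form and build the gateway redirect URL."""
    try:
        amount = Decimal(form.get("amount", ""))
    except InvalidOperation:
        raise ValueError("amount must be a number") from None
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive number")
    email = form.get("email", "").strip()
    if "@" not in email:
        raise ValueError("a valid email address is required")
    params = {  # hypothetical parameter names
        "amount": f"{amount:.2f}",
        "currency": form.get("currency", "USD"),
        "email": email,
        "note": form.get("note", ""),
    }
    return f"{GATEWAY_URL}?{urlencode(params)}"


url = build_donation_redirect({"amount": "25", "email": "donor@example.com"})
```

Using `Decimal` rather than `float` avoids rounding surprises with money, and keeping card handling on the gateway's side means the charity never stores card numbers itself.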
You might look into Google Checkout. It's not free, but they do have an option tailored to non-profits (link).
The main benefit of going with them is that you won't need to set up a direct relationship with a CC merchant gateway, which can be a good sized hassle, especially for a smaller nonprofit. To me, the other benefit is that it keeps you far away from Raiser's Edge / Blackbaud, purveyors of some of the most awful donation pages I've ever had the misfortune to see or use.

Software Environment Documentation Checklist [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I work for an insurance company. We have our own development department made up of almost 150 people, plus some providers (mostly outsourcing and custom-made apps). In our company, my team has built what we call non-functional logic libraries: software libraries that handle concerns shared horizontally by all the development teams in our department, e.g. security, web services, logging, messaging and so on. Most of these tools are either built from scratch or adaptations of a de facto standard. For example, our logger is an appender based on Log4J that also saves the logging messages into a DB. We also define which libraries to use in applications, for example which web services framework. We use Java EE and Oracle AS pretty much throughout the organization (with some WebSphere Application Servers).
Most of these projects have their architecture documented (use cases, UML diagrams, etc.), and the generated documentation is generally available.
What we have seen, though, is that users sometimes find the libraries we provide difficult to use, and they are constantly asking questions or simply don't use them.
So we are planning to generate a more friendly documentation for them, so my question is:
What are the best practices or the checklist that software documentation should have?
Some things that come to mind:
API Reference guide
Quick start Tutorial
Generated API documentation
Must be searchable
Web Access
What else should it have? Also, based on your experience, what is the best way to maintain (keep up to date) and publish this type of documentation?
Keep your documentation in version control too.
Make sure every page has a version number, so you know which version your user has been reading.
Get a CI server going and push documentation to a LIVE documentation site upon updates.
Do documentation reviews like you would code reviews.
Dog-food it :)
Kindness,
Dan
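The version-number and CI tips can be combined into a small build step that stamps every page before publishing. This is a hypothetical sketch: `build_docs`, the Markdown source layout and the footer format are assumptions. A CI job could run it on each commit (for example passing the output of `git describe --tags` as the version) and then publish the output directory to the live documentation site.

```python
from pathlib import Path


def build_docs(src: Path, out: Path, version: str) -> int:
    """Copy every Markdown page from src to out, appending a version footer.

    Sources stay untouched in version control; only the build output is
    stamped. Returns the number of pages written.
    """
    count = 0
    for page in sorted(src.rglob("*.md")):
        target = out / page.relative_to(src)  # mirror the source directory layout
        target.parent.mkdir(parents=True, exist_ok=True)
        text = page.read_text(encoding="utf-8")
        target.write_text(
            f"{text.rstrip()}\n\n---\nDocumentation version {version}\n",
            encoding="utf-8",
        )
        count += 1
    return count
```

Stamping the build output rather than the sources keeps the versioned files clean and makes every published page traceable to the release it describes.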

What are the Alternatives to Google Analytics [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 2 years ago.
I need to track the unique visitor count in my web application. I would really like to use Google Analytics, but because of the load limits Google imposes I won't be able to. I am expecting WAY over 10,000 requests a day, which is the limit the Google Analytics API imposes. Is there another company, paid or free, that offers the same features as Google Analytics?
There definitely are.
Here are two open source and free solutions that are very polished:
Piwik - Designed as a direct competitor to Google Analytics (it looks just as nice) that you host on your own servers
Open Web Analytics
The 10,000-request limit applies to the Data API, not to the actual data collection.
You can have an unlimited number of users visiting your website; it's only when you use the API to extract data from their database that you are limited to 10,000 requests a day.
Check this link for more details.
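If you do query the Data API, a client-side guard keeps you under the daily cap. This is a hypothetical sketch: `DailyQuota` is not part of any Google library, and the reset boundary used here (midnight UTC) is an assumption, so check the API's own quota documentation for when its counter actually resets.

```python
import datetime
from typing import Optional


class DailyQuota:
    """Refuse calls once `limit` requests have been made in the current day."""

    def __init__(self, limit: int = 10_000):
        self.limit = limit
        self.day: Optional[datetime.date] = None
        self.used = 0

    def try_acquire(self, now: Optional[datetime.datetime] = None) -> bool:
        # Assumption: the quota resets at midnight UTC.
        now = now or datetime.datetime.now(datetime.timezone.utc)
        if now.date() != self.day:  # a new day started: reset the counter
            self.day = now.date()
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Call `try_acquire()` before each API request and queue or skip the request when it returns `False`; visitor tracking itself is unaffected, since the cap only applies to reading data back out.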
The biggest, most obvious, most common alternative is simply to do it yourself. Your web server needs to log requests for security etc. anyway, so it's not a big deal to run something like Webalizer on those logs. You won't get quick, easy access to advanced information like the paths users take through the site, but that can be determined if you care enough. You do gain one huge benefit, though: privacy of your own data.
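As a rough illustration of the do-it-yourself route, unique visitors per day can be counted straight from a web server access log. This sketch assumes the Common Log Format and treats each client IP as one visitor, which is only an approximation (NAT and proxies merge users, changing IPs split them); a tool like Webalizer does this more thoroughly.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+')


def unique_visitors_per_day(lines):
    """Count distinct client IPs per calendar day (in the log's own time zone)."""
    visitors = defaultdict(set)  # date -> set of client IPs
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines
        ip, stamp = m.groups()
        day = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").date()
        visitors[day].add(ip)
    return {day: len(ips) for day, ips in sorted(visitors.items())}


log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 -0700] "GET / HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:14:01:02 -0700] "GET /about HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:15:12:45 -0700] "GET /contact HTTP/1.1" 200 128',
    '10.0.0.1 - - [11/Oct/2023:09:00:00 -0700] "GET / HTTP/1.1" 304 -',
]
counts = unique_visitors_per_day(log)
```

Because the log is read in a single pass, this scales to far more than 10,000 requests a day, and the data never leaves your own servers.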
We use Omniture here but it'll cost you.
There is SpeedTrap, a java-based analytics package. Our company used it for years before they turned into cheap **ards and decided Google Analytics was more cost effective (because it was free). But that's a story for another night.