What's the general schema that Gmail uses with Bigtable - bigtable

Google now allows you to develop apps using Bigtable (hosted as a product called "Cloud Bigtable" in Google Cloud Platform). However, I can't find many examples of how to design a schema for it. They have a document, but it's very high level: https://cloud.google.com/bigtable/docs/schema-design
My question: what's the approximate schema for Gmail (just the email list component)? Is it a tall or wide design? What do they use for primary keys?
Any other production examples from big apps would be appreciated, but I think Gmail would be a great example.

Schema design is very specific to your application. Fortunately, HBase is modeled on Bigtable, and there is quite a bit of public HBase material that can help:
There is a Quora Thread on production schemas.
The Time Series material that comes after the doc you mentioned is good.
The HBase documentation has a chapter on schema design.
Lars George's HBase book has some good discussion. He also has talks on YouTube.
Ian Varley's HBaseCon talk covers this as well.
Amandeep Khurana has an article.
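To make the tall-versus-wide question concrete, here is a minimal sketch in plain Python with no Bigtable client involved. It is not Gmail's schema, which is not public. The key layout, the `#` separator, the `MAX_TS_MS` constant and the function names are all assumptions made up for illustration. A tall design stores one row per message, with a row key such as user ID plus reversed timestamp so that a prefix scan returns the newest mail first. A wide design stores one row per user and one column per message.

```python
from typing import List, Tuple

# Hypothetical illustration only: this is NOT Gmail's actual schema.
MAX_TS_MS = 9_999_999_999_999  # assumed upper bound on millisecond timestamps


def tall_row_key(user_id: str, ts_ms: int, message_id: str) -> str:
    """Tall design: one row per message.

    Subtracting the timestamp from a fixed maximum makes newer messages
    sort first under lexicographic row-key ordering.
    """
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{user_id}#{reversed_ts:013d}#{message_id}"


def wide_cell(user_id: str, ts_ms: int, message_id: str) -> Tuple[str, str]:
    """Wide design: one row per user, one column qualifier per message."""
    reversed_ts = MAX_TS_MS - ts_ms
    return user_id, f"{reversed_ts:013d}#{message_id}"


def newest_first(keys: List[str], user_id: str) -> List[str]:
    """Simulate a prefix scan over lexicographically sorted row keys."""
    prefix = f"{user_id}#"
    return [key for key in sorted(keys) if key.startswith(prefix)]
```

With the tall layout, listing a mailbox is a scan over the `user_id#` prefix. With the wide layout, it is a single-row read whose size grows with the mailbox, which is the usual trade-off to weigh.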

Related

Can I build my custom AI assistant using dialog flow?

I am so confused. I want to build a chatbot like Siri, but for my own tasks. It should be able to:
- search the internet and get answers to questions.
- give people specific information daily.
- discuss scientific phenomena with people.
I can't decide which platform to use to build this chatbot. I thought about using Dialogflow, but I can't figure out whether it will let me do that or not. I also thought about using TensorFlow, but I think that would take a very long time, so I was wondering whether I can achieve what I want with something like Dialogflow instead of building it from scratch.
In my opinion, Dialogflow is the best option for building an assistant. It is really easy to build a chatbot that saves reminders, checks the weather or holds a simple conversation. Dialogflow has a really powerful feature called a webhook, which can use Cloud Functions to do the real programming: for example, calling Google APIs such as the Translate API, or inserting data into your Cloud SQL database.
Dialogflow also uses machine learning algorithms to understand the customer. For example, if the client asks "What's the weather in Barcelona?", it will recognize the question and can answer it correctly.
Another great feature is that it integrates with multiple technologies, such as Google Assistant, Amazon Alexa, Cortana, Telegram, Line, Facebook Messenger, etc.
I recommend following this tutorial.
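As a rough idea of what the webhook side looks like, here is a minimal sketch in Python. It assumes the Dialogflow ES webhook JSON format, where the request carries `queryResult.intent.displayName` and `queryResult.parameters` and the reply is returned in `fulfillmentText`. The intent name `get_weather`, the parameter name `geo-city` and the `lookup_weather` stub are hypothetical and would have to match whatever you define in your own agent.

```python
import json


def lookup_weather(city: str) -> str:
    """Stub standing in for a real weather API call (hypothetical)."""
    return f"I would look up the forecast for {city} here."


def handle_webhook(request_body: str) -> str:
    """Turn a Dialogflow ES-style webhook request into a response body."""
    request = json.loads(request_body)
    query_result = request.get("queryResult", {})
    intent = query_result.get("intent", {}).get("displayName", "")
    params = query_result.get("parameters", {})

    # "get_weather" and "geo-city" are assumed names for this sketch.
    if intent == "get_weather" and params.get("geo-city"):
        reply = lookup_weather(params["geo-city"])
    else:
        reply = "Sorry, I can't help with that yet."

    # "fulfillmentText" is the plain-text reply field in the ES response.
    return json.dumps({"fulfillmentText": reply})
```

A function like this could sit behind a Cloud Function or any other HTTPS endpoint. The agent handles language understanding, and the webhook only has to map intents and parameters to actions.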
LUIS from Microsoft and Dialogflow from Google allow you to build models for natural language processing. These models need to be trained. So the answer is "no": out of the box, these tools do not "search the internet" to discover answers for your intents.
What you need to do is figure out how to train a natural language model and integrate search data into it. This is bleeding-edge AI. And this really is your question: "How do I integrate search with NLP and/or a chatbot?"
Both Google and Microsoft let you hook into search. You do not need the dialog tools to do this; you can just pass the query text to do the searching (and let the engine use both ML and heuristic methods to rank results). You mentioned IBM Watson and this is a tool that uses ML modelling to try and answer QnA questions. The Google competition is DeepMind. You can check out those yourself.
But I believe curated content is often the way to go. Tools like Microsoft's QnA Maker let you build these types of applications very easily with little programming required. You can also look into the Azure or Bing search APIs.
And if you are looking to start with a bot from template, there are tons of examples on GitHub for Azure Bot Service and Actions-on-Google. Some even integrate with search and QnA tools. :-)
(And here is the disclaimer. I work for Microsoft. My views do not represent that of my employer.)

Enterprise search platform vs General purpose search

I have a question about Solr. It is described as an enterprise search platform. Are there Enterprise oriented search platforms and general purpose search platforms? Can't you just use Solr for example to build a general purpose search engine? If there is such a distinction what are the major differences between them?
Enterprise is a vague term tacked on to things to say "Yes, you can totally use this in professional projects, it's super good". It's baloney, in short. When reading the front page of a software product (or any product really), I find it useful to ignore all adjectives and adverbs, which makes that first sentence on the Solr page read: "Solr is the search platform from the Apache Lucene project."
Don't know why I don't get hired to write ad copy.
I think it would be fair to say that Solr is a general purpose search server, sure (depending on what general purpose entails to you, of course). It indexes data, allows you to search it, and provides a lot of tools to do that in the way that best suits your data and users.
The term Search is overloaded with lots of semantics. It is often used to describe an action, a function or a technology. More important with respect to the question is the fact that there are two common kinds of "search projects": Web Search and Enterprise Search projects.
Web Search is typically about indexing content from one kind of content source (web servers) serving content in HTML format. Most often it's only about public content, and document-level security is not an issue. A typical example of this kind of solution is Google's Web Search, but most full-text site search solutions can also be seen as good examples of this category. For a basic solution, a crawler, an HTML markup removal tool, an indexing library and some "glue" are sufficient. Apache Nutch, or Apache Solr or ElasticSearch in combination with a web crawler, are good candidates for implementing these kinds of solutions.
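The basic pipeline described above (markup removal, indexing and a bit of glue) can be sketched with the Python standard library alone. This is a toy illustration, not how Nutch, Solr or ElasticSearch work internally. The crawler is left out and replaced by a dict of already-fetched pages, and the names `TextExtractor`, `build_index` and `search` are made up for this example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_index(pages: dict) -> dict:
    """Map each lowercase token to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for token in re.findall(r"[a-z0-9]+", strip_html(html).lower()):
            index[token].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return URLs containing every query term (simple AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

A real engine adds ranking, stemming, incremental updates and much more, but the shape of the pipeline is the same.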
Enterprise Search is typically about integrating content in various formats from multiple content sources. Typical examples of this kind of solution are corporate intranets, but Search Based Applications often also fall into this category. These solutions typically come with additional requirements such as support for document-level security, advanced linguistics, metadata extraction, data mappings and enrichments, synonyms, etc. The projects are more complex, and a more complex technology stack is needed. While Apache Solr and ElasticSearch can both be used, a lot of the required functionality is not part of the standard download and needs to be developed or integrated as part of the project. For both Apache Solr and ElasticSearch, there are also commercial distributions available that extend the functionality of the standard download in the direction of Enterprise Search. Other good alternatives are commercial search engines.
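Document-level security is the requirement that most clearly separates the two kinds of project, so here is a minimal sketch of the idea in Python. It assumes each indexed document carries the set of groups allowed to read it, and results are trimmed against the groups of the user who is searching. The `Document` fields, the source names and the `secure_search` function are hypothetical. This is not the API of Solr or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    source: str  # originating system, e.g. "wiki" or "fileshare" (made-up names)
    text: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def secure_search(docs, query: str, user_groups: set) -> list:
    """Return IDs of documents that match the query AND are visible to the user."""
    needle = query.lower()
    return [
        doc.doc_id
        for doc in docs
        # A document is visible if the user shares at least one group with it.
        if needle in doc.text.lower() and doc.allowed_groups & user_groups
    ]
```

In a real deployment, the access information has to be extracted from every source system at indexing time and kept in sync, which is a large part of why these projects are more complex.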
I agree with @femtoRgon that Solr:
is a good General Purpose Search Platform
and not an Enterprise Search Platform
but an Enterprise Search Platform can be built with Solr
Solr is a search platform that can be customized for either general purpose search or Enterprise Search solutions. As suggested by Daniel in the previous comments, an Enterprise Search (ESearch) application is used specifically by an enterprise or organization to search the organization's internal data. In some cases it can also search external content, but only content related to the organization. Enterprises generally use various systems, developed either internally or by a vendor, and the ESearch application should be able to connect to those internal systems and index their content, including the different file types, the metadata and, importantly, the security associated with each and every document in those systems.
To conclude, Solr is a search system that can be used to index and search content either as a general purpose application or as an ESearch application for an organization.

SQL installation on Amazon Web Services

Folks, I have a question this morning that hopefully one of you techies can answer. Over the past few months I have been working through several SQL certification study guides, as I want to earn the Microsoft Certified Solutions Associate (MCSA), or associate-level, certification. I have previous experience with this skill set and want to sharpen it while working toward the certification, but it has been quite challenging to set up a home lab that resembles what the big dogs use nowadays (Windows Server, several SQL instances, virtualization and all that) because of cost and a lack of proper hardware. So my question today is about other possible options, in particular whether this can be done on Amazon's AWS. I understand they offer a limited amount of capacity for free that can be used as a playground, and that a paid subscription is an option if you want more. If I were to subscribe to the paid version, would it be possible to install all the software needed to practice and experiment with the technologies covered in the training kit? Again, I'm already using my small home network and have all the proper software, but it doesn't feel like enough, as some areas require more computing power to test or run properly.
Short: Yes
You can create a micro instance for free and install whatever you want on it. If you're not familiar with using the CLI, it can be a bit daunting, but there are plenty of guides online.
They also offer an RDS service, where they will let you set up a database instance and will maintain it for you, but it's not free.
Edit
Link to their MS Server page:
http://aws.amazon.com/windows/
Azure is Microsoft's cloud service. I think the comment was asking whether you have considered looking at Azure instead of AWS.

New architecture concepts

I posted this community wiki in the hopes of creating a thread of expertise. My question is thus ... "Where do the experts go to learn about the newest coding techniques?".
I'm basically looking for the leading/bleeding edge of architecture, design, development and theory.
I know conferences and trade shows are probably the best venues to see the latest and greatest, but for those on a limited budget (of both time and money) such as myself, I'm looking for websites that I can read in the evenings that will keep me current on what's new in the world.
I program mostly in C# but the websites need not be geared towards C#.
Blogs
Martin Fowler, the best starting point I think. (http://martinfowler.com/)
articles like "Consumer-Driven Contracts: A Service Evolution Pattern", "Mocks Aren't Stubs", "Inversion of Control Containers and the Dependency Injection pattern" (http://martinfowler.com/articles.html)
David Hayden (http://www.davidhayden.com/)
Reflective Perspective, a good daily feed (http://blog.cwa.me.uk/tags/morning-brew/)
Ayende (http://ayende.com/Blog/)
Eric Lippert - works on the language. You can sometimes read about new C# features there before they're announced elsewhere.
Scott Hanselman
Journals
The Architecture Journal (http://msdn.microsoft.com/en-us/architecture/bb410935.aspx). A great option: you can order free, paper-based copies!
MSDN Magazine (http://msdn.microsoft.com/en-us/magazine/default.aspx)
Community
Codeproject.com, short and large articles
pnpguidance.com, tutorials, blogs and articles
Real applications and devteams
patterns & practices home: http://msdn.microsoft.com/en-us/practices/default.aspx, and P&P products
SCSF, the Smart Client Software Factory home. Learn about desktop enterprise systems. (http://msdn.microsoft.com/en-us/library/aa480482.aspx)
WCSF, the Web Client Software Factory home. Learn about business (process) oriented web architectures. (http://msdn.microsoft.com/en-us/library/bb264518.aspx)
Enterprise Library
For free, I would recommend MSDN; in particular, keep an eye on the C# and .NET technology pages. There are lots of blogs, and nearly every announcement about what's up and coming is posted there.
Serverside.net
The on-demand (previously recorded) webcasts from Microsoft are normally really good, but it takes a painful number of clicks to get to the point where you can download the file, and sometimes you find that it is not available.
Also, you can sometimes find a local .NET User Group that has speakers or sessions occasionally. These are also great ways to network and find out what kind of work is going on in your area.
Books, books, books! Good books are written by subject matter experts, involve input from many sources, are peer reviewed and well structured, and go orders of magnitude deeper than trade shows and most online material. When you buy a book, you get the experience of an expert for a very reasonable price.
The NDepend documentation comes with two white books, as well as online blog posts and articles, concerning the architecture of large .NET applications:
Partitioning code base through .NET assemblies and Visual Studio projects (8 pages)
Defining .NET Components with Namespaces (7 pages)
Control Components Dependencies to gain Clean Architecture
Re-factoring, Re-Structuring and the cost of Levelizing
Evolutionary Design and Acyclic componentization
Layering, the Level metric and the Discourse of Method
Fighting Fabricated Complexity
I never get to go to PDC, but I do love to watch the videos.
As a previous post mentioned, the MS PDC videos are online. The same goes for Mix, which has good MS web development content. Also, for general MS videos there is Channel 9. It's not all technical content, but it's worth searching if you are looking for something in particular.
Someone already mentioned blogs, here are a few more:
Scott Hanselman - lots of stuff on there, a lot of ASP, MVC stuff.
Phil Haack - another good MVC guy.
Rob Connery - again a lot of focus on MVC.
ScottGu - according to his blog he "builds a few products for Microsoft", which has to be the understatement of the year. He is in charge of ASP, IIS, Silverlight and much more besides at MS.
Check out Sharp Architecture, it's very promising.
I've collected several RSS feeds that I read regularly to stay up to date on .NET and Agile. If you like, I can share the list with you. It contains most of the stuff already mentioned here.

Example websites using db4o

I'm very impressed with my initial tests with db4o. However, I'm wondering just how many enterprise-class websites out there are powered by db4o. I couldn't see any on the main website.
I can't see any reason why db4o should not be used. There appears to be decent enough support for transactions and ways to handle concurrency, for example.
Has anyone got a list of websites I could look at?
See:
http://developer.db4o.com/Projects/html/projectspaces/gaabormarkt.html
A particular search engine used to be powered by db4o (I say "used to" because I haven't talked to the author about this in a long time).
http://www.rel8r.com/
The author is Travis Reeder.
Although I cannot see websites specifically, here is a list of Open Source Projects from the db4o website:
http://developer.db4o.com/ProjectSpaces/view.aspx/Open_Source_Products