Geocoding database provider (SQL, NoSQL) and schema

Companies like Yahoo, Google, and Microsoft provide geocoding services. I'd like to know the best way to organize the backend for such a service: what is the optimal choice of database provider (SQL vs. NoSQL) and database schema?
Some providers use the Extensible Address Language (xAL) to describe an entity in the geocoding response. xAL has more than 30 data elements; the Google Geocoding API has about 20.
So with a SQL database, would there be 20-30 tables, mostly with one-to-many relationships via foreign keys?
What about NoSQL databases such as MongoDB? How would one organize such a database: one collection per data element, similar to SQL, or one collection where each document completely describes a given entity in the address space?

It's hard to say. It depends on what you need to do with the data in terms of analysis and caching.
I have had to deal with geo coordinates, but our app is very simple: we don't need to manipulate the geolocations in the DB, only store and retrieve them. So I store the start and end points of each route in two columns and a polyline in a binary column, with a few milestones saved in a dedicated SQL table.
For more advanced use in our app, we considered using this: https://simplegeo.com/
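A minimal sketch of the store-and-retrieve layout described above, using Python's built-in sqlite3 module. The table names, column names, the "lat,lon" text format, and the packing of the polyline as little-endian doubles are all assumptions made for illustration, not the schema the answer actually used.

```python
import sqlite3
import struct

def pack_polyline(points):
    """Pack (lat, lon) pairs into bytes: two little-endian doubles per point."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<{len(flat)}d", *flat)

def unpack_polyline(blob):
    """Reverse pack_polyline: bytes back to a list of (lat, lon) tuples."""
    flat = struct.unpack(f"<{len(blob) // 8}d", blob)
    return list(zip(flat[0::2], flat[1::2]))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE route (
    id          INTEGER PRIMARY KEY,
    start_point TEXT NOT NULL,   -- hypothetical "lat,lon" format
    end_point   TEXT NOT NULL,
    polyline    BLOB NOT NULL    -- opaque to the database
);
CREATE TABLE milestone (
    id       INTEGER PRIMARY KEY,
    route_id INTEGER NOT NULL REFERENCES route(id),
    label    TEXT NOT NULL,
    point    TEXT NOT NULL
);
""")

path = [(48.8566, 2.3522), (47.3220, 5.0415), (45.7640, 4.8357)]
conn.execute(
    "INSERT INTO route (start_point, end_point, polyline) VALUES (?, ?, ?)",
    ("48.8566,2.3522", "45.7640,4.8357", pack_polyline(path)),
)

blob = conn.execute("SELECT polyline FROM route WHERE id = 1").fetchone()[0]
restored = unpack_polyline(blob)
```

The database never inspects the polyline, which matches the "simply store and retrieve" use case; any spatial query would need a different design.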

Related

How do I set up the relationship between tables using firebase?

I want to create a relational table in the Google console using Firebase. How can I set up the relationship between the user table and the user's posts table in the Google console?
Both databases offered by Firebase, Realtime Database and Cloud Firestore, are NoSQL databases. By definition, this means that they are non-relational. They do not have tables, and there are no joins. You can't set up relationships between collections or nodes.
With Cloud Firestore, you can use a DocumentReference type field to have one document point to another document, but that doesn't help you with constructing a query that joins those two documents together.
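To illustrate why a reference field does not give you a join, here is a sketch that uses plain Python dictionaries as stand-ins for two collections. It does not use the Firebase SDK; the collection paths, field names, and the `posts_with_authors` helper are hypothetical.

```python
# Plain dicts standing in for two collections (not the Firebase API).
users = {
    "users/alice": {"name": "Alice"},
    "users/bob": {"name": "Bob"},
}
posts = {
    "posts/p1": {"title": "Hello", "author": "users/alice"},
    "posts/p2": {"title": "Second post", "author": "users/alice"},
    "posts/p3": {"title": "Bob's post", "author": "users/bob"},
}

def posts_with_authors(posts, users):
    """Resolve each post's author reference with a separate lookup.

    The 'join' happens in application code: one read for the post,
    then another read for the document that the reference points to.
    """
    resolved = []
    for path, post in posts.items():
        author = users[post["author"]]  # second lookup, done by the client
        resolved.append({"path": path, "title": post["title"],
                         "author_name": author["name"]})
    return resolved

joined = posts_with_authors(posts, users)
```

A common alternative in NoSQL modeling is to copy the fields you need (for example the author's name) into each post document, trading duplication for a single read.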
If you want to use either of the databases that Firebase offers, please take some time to get acquainted with NoSQL data modeling techniques. There are plenty of tutorials out there, as well as the product documentation for each product.
NoSQL databases don't have relationships between tables. They also save documents in a JSON format.
Looking at a NoSQL e-commerce data model may help you.

Creating a blog service or a persistent chat with Table Storage

I'm trying Azure Storage and can't come up with real-life scenarios in which I would use it. As far as I understand, the only index Table Storage has is on Partition Key and Row Key. I can't sort or query on other columns without doing a full partition scan, right?
If I migrated my blog service from a traditional SQL Server or a richer NoSQL database like Mongo, I would probably be all right, considering users don't blog that much in one year (I would partition all blog posts per user per year, for example). Even if someone hit around a thousand blog posts a year, I would be OK with loading all their metadata into memory. I could do smarter partitioning if this doesn't work well.
If I migrated my persistent chat service to Table Storage, how would I do that? Users post thousands of messages a day and query history pretty often from desktop clients, mobile devices, the web site, etc. I don't want to lose this and only return one day of history with paging (which can be slow as well).
Any ideas or patterns, or what am I missing here?
By the way, I can always use a different database; however, considering Table Storage is so cheap, I don't want to.
PartitionKey and RowKey values are the only two indexed properties. To work around the lack of secondary indexes, you can store multiple copies of each entity, with each copy using a different RowKey value. For instance, one entity will have PartitionKey=DepartmentName and RowKey=EmployeeID, while the other entity will have PartitionKey=DepartmentName and RowKey=EmailAddress. That will allow you to look up an employee by either EmployeeID or EmailAddress. The Azure Storage Table Design Guide (http://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/) has a more detailed example and all the information that you need to design scalable and performant tables.
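The duplicate-entity workaround above can be sketched with an in-memory dictionary keyed by (PartitionKey, RowKey). This is a stand-in for the table, not the Azure SDK; the `id_` and `email_` RowKey prefixes and the function names are assumptions added here to keep the two kinds of keys from colliding.

```python
# In-memory stand-in for a table: (PartitionKey, RowKey) -> entity.
table = {}

def insert_employee(table, department, employee_id, email, name):
    """Store two copies of the same entity under different RowKeys."""
    entity = {"EmployeeID": employee_id, "Email": email, "Name": name}
    # Copy 1: lookup by employee ID (prefix is a hypothetical convention).
    table[(department, f"id_{employee_id}")] = dict(entity)
    # Copy 2: lookup by email address.
    table[(department, f"email_{email}")] = dict(entity)

def find_by_id(table, department, employee_id):
    # Point lookup on both indexed keys, no partition scan needed.
    return table.get((department, f"id_{employee_id}"))

def find_by_email(table, department, email):
    return table.get((department, f"email_{email}"))

insert_employee(table, "Sales", "000223", "jonesj@contoso.com", "Jones")
by_id = find_by_id(table, "Sales", "000223")
by_email = find_by_email(table, "Sales", "jonesj@contoso.com")
```

The cost of this design is that every update has to touch both copies, so the application is responsible for keeping them consistent.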
We will need more information to answer your second question about how you would migrate contents of your chat service to table storage. We need to understand the format and structure of the data that you currently store in your chat service.

Storing records of slightly dissimilar types in a RDBMS

I know SQL, but I'm not terribly experienced with it. I have a system in which I would like to log user logins, logouts and other security-related events in an indexed database to be able to pose manual queries to it, and I figure some SQL-based RDBMS should be the best fit for the job.
However, the records I'd like to store have similar, but not identical, data. All records would store a timestamp and a username, but the other data items would differ. For instance:
A login event would store the IP address the user logged in from, along with an ID for the created session.
A logout event would store the session ID but not the IP address (since I don't have access to the IP address at the point of logout).
An email-change event would store former and new e-mail address of the user.
How should one model something like this in a relational database? I can imagine at least three possibilities:
Use different tables for each kind of data item
Add columns for all the different data items and leave them as NULL for records that don't use them
Use one central table with the common data items, and auxiliary tables that store the rest of the data, linking to an event ID in the central table
Clearly, each one has its own advantages and disadvantages. I do realize that this is a somewhat subjective question and is also likely to depend on actual use-cases, but I imagine there ought to be standard/best practices for this kind of thing that I just haven't seen. Are any of my suggestions reasonable or standard? Is there some other option that I have missed that is better?
The solutions you mention appear in Martin Fowler's book Patterns of Enterprise Application Architecture. You might like to read that book to see what he says about using these patterns.
Use different tables for each kind of data item: Concrete Table Inheritance
Add columns for all the different data items and leave them as NULL for records that don't use them: Single Table Inheritance
Use one central table with the common data items, and auxiliary tables that store the rest of the data, linking to an event ID in the central table: Class Table Inheritance
Fowler also covers a fourth solution for this problem: Serialized LOB
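As an illustration of the third option (Class Table Inheritance) applied to the login/logout example from the question, here is a sketch using Python's built-in sqlite3 module. The table names, column names, and helper functions are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central table: the data items common to every event.
CREATE TABLE event (
    id       INTEGER PRIMARY KEY,
    ts       TEXT NOT NULL,
    username TEXT NOT NULL,
    kind     TEXT NOT NULL
);
-- Auxiliary tables: one per event type, keyed by the central event ID.
CREATE TABLE login_event (
    event_id   INTEGER PRIMARY KEY REFERENCES event(id),
    ip_address TEXT NOT NULL,
    session_id TEXT NOT NULL
);
CREATE TABLE logout_event (
    event_id   INTEGER PRIMARY KEY REFERENCES event(id),
    session_id TEXT NOT NULL
);
""")

def log_login(conn, ts, username, ip_address, session_id):
    cur = conn.execute(
        "INSERT INTO event (ts, username, kind) VALUES (?, ?, 'login')",
        (ts, username))
    conn.execute("INSERT INTO login_event VALUES (?, ?, ?)",
                 (cur.lastrowid, ip_address, session_id))
    return cur.lastrowid

def log_logout(conn, ts, username, session_id):
    cur = conn.execute(
        "INSERT INTO event (ts, username, kind) VALUES (?, ?, 'logout')",
        (ts, username))
    conn.execute("INSERT INTO logout_event VALUES (?, ?)",
                 (cur.lastrowid, session_id))
    return cur.lastrowid

log_login(conn, "2024-01-01T10:00:00", "alice", "203.0.113.7", "sess-1")
log_logout(conn, "2024-01-01T11:00:00", "alice", "sess-1")

# All events for a user, with login details where they exist.
rows = conn.execute("""
    SELECT e.ts, e.kind, l.ip_address
    FROM event e LEFT JOIN login_event l ON l.event_id = e.id
    WHERE e.username = ? ORDER BY e.ts
""", ("alice",)).fetchall()
```

Queries over the common columns only touch the central table, while type-specific columns stay NOT NULL in their own tables, at the cost of a join when you need them.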

Totally unstructured data

We currently have a solution where we increasingly need to store unstructured data of various kinds. For example, clients can define their own workflows, in which they specify what kind of data should be captured (of various types, some simple, some complex). This data then needs to be stored and displayed in a web application that offers some functionality to modify it.
Until now the workflows have been defined internally, and therefore an MS SQL database was designed to cater for these specific workflows and their data. However, now that clients can define workflows, we need to relax the structure of our DB. At first I thought that a key-value table in MS SQL might be a good idea, but obviously I would lose the type information of the data being captured and would then need to deserialize all the data in the website (MVC.NET). I am also considering something like RavenDB but am not sure whether it would be a good fit.
So my question is: what would be the best way to store this unstructured data, bearing in mind that users must be able to search, edit, and display it as well?
How about combining two types of databases? Use a NoSQL database for your unstructured data and the relational MS SQL database to store references to the data for each workflow, so you can retrieve it later on.
The data type will always be a problem, and you will always have to deserialize it. Searching can be done by taking the string representation of each value in your workflow and combining them into a searchable field in your MS SQL row.
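A rough sketch of the searchable-field idea: flatten whatever a workflow captured into one lowercase string that can be stored in a text column and searched. The function names and the sample data are hypothetical, and the choice to lowercase and join with spaces is an assumption.

```python
def iter_values(value):
    """Yield every leaf value from nested dicts, lists, and tuples."""
    if isinstance(value, dict):
        for item in value.values():
            yield from iter_values(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_values(item)
    else:
        yield value

def build_search_text(captured):
    """Combine the string representation of each value into one field."""
    return " ".join(str(v) for v in iter_values(captured)).lower()

# Hypothetical data captured by a client-defined workflow.
captured = {
    "customer": "Acme Ltd",
    "amount": 1250.5,
    "approved": True,
    "tags": ["urgent", "EU"],
}
search_text = build_search_text(captured)
```

The resulting string would be stored next to the reference to the original document, so a plain text search on the relational side can find the workflow instance without understanding its structure.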

Can I create domain schema only (without any data) in Amazon SimpleDB?

I am evaluating Amazon SimpleDB at this time. SimpleDB is very flexible in the sense that it does not have to have table (or domain) schemas. The schema evolves as the create / update commands flow in. All this is good but while I am using a modeling tool (evaluating MindScape LightSpeed) I require the schema upfront, in order for the tool to generate models based on the schema. I can handcraft domains in SimpleDB and that does help but for that I have to perform at least one create operation on the domain. I am looking for the ability to create domain schema only. Any clues?
There is no schema in SimpleDB.
This is why the NoSQL people suggest "unlearning" relational databases before shifting paradigms to these non-relational data stores.
So, you cannot do what you describe. Without the data, there will be nothing.
While it's true that SimpleDB has no schema support, keeping some type information turns out to be crucial if you run queries on numeric data or dates*. Most NoSQL products have both queries and types, or else neither queries nor types, but SimpleDB has chosen queries without types.
As a result, integrating with any tool outside of your main application will require you to either:
store duplicate type information in different places
create your own simple schema system to store the type information
Option 2 seems much better and choosing it, despite what some suggest, does not mean that you "don't have your mind right."
S3 can be a good option for this data: you can keep it in a file with the same name as your domain, and it will be accessible from anywhere with the same AWS credentials as your SimpleDB account.
Storing the data as a list of attributename=formatname is the extent of what I have needed to do. You can, in fact, store all this in an item in your domain. The only issue is that this special item could unintentionally come back from a domain query where you are expecting live data not type information.
I'm not familiar with MindScape LightSpeed, but this is a general strategy I have found beneficial when using SimpleDB, and if the product is able to load/store a file in S3 then all the better.
*Note: just to be clear, I'm not talking about reinventing the wheel or trying to use SimpleDB as a relational database. I'm talking about the fact that numeric data must be stored with both zero padding (to a length of your choosing) and an offset value (depending on whether it is signed or unsigned) in order to work with SimpleDB's string-based query language. Once you decide on a format, or a set of formats, to be used in your application, it would be folly to leave that information hidden in and scattered across your source files when it is also needed by source code tools, query tools, reporting tools, or any other code.
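The zero-padding and offset scheme from the note can be sketched as follows. The offset of 10**9 and the width of 10 digits are arbitrary values chosen for illustration; the point is only that the encoded strings compare in the same order as the numbers they represent.

```python
OFFSET = 10**9   # assumed: shifts signed values into the non-negative range
WIDTH = 10       # assumed: fixed number of digits after padding

def encode_number(value, offset=OFFSET, width=WIDTH):
    """Encode an integer so that string comparison matches numeric order."""
    shifted = value + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError(f"{value} does not fit offset={offset}, width={width}")
    return str(shifted).zfill(width)

def decode_number(text, offset=OFFSET):
    """Reverse encode_number."""
    return int(text) - offset

values = [42, -5, 0, 1000, -999]
encoded = [encode_number(v) for v in values]

# Sorting the strings lexicographically gives the numeric order back.
sorted_by_string = [decode_number(s) for s in sorted(encoded)]
```

Recording `OFFSET` and `WIDTH` per attribute is exactly the kind of type information the answer suggests keeping in one shared place rather than scattered through source files.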