What type of Google database for a deals-based website? - sql

I am trying to find out what makes the most sense for my type of database structure.
Here is a breakdown of what it is and what I intend to do.
It is a deals-based website that needs strong consistency and must be able to re-point existing foreign keys to new parents, for example when an alias such as 'Coke' is not yet linked to its actual record 'Coca-Cola'.
I will be recording price over time for these products, and the database should handle large amounts of data without performance degrading over time.
I initially began with Google's Bigtable but quickly realised that, without a relational layer, it would fail on any cascading updates.
I don't want to spend a lot of time researching and learning all of these different database types only to realise later that they aren't what I wanted. The most important aspects for me are the cascading update and the ability to handle a vertically large (many-row) data structure for the price-over-time trends.
Additionally, because this is from scratch, I would be more interested in price and scalability than existing compatibility.

Cloud SQL is a fully-managed database service that makes it easy to
set up, maintain, manage, and administer your relational PostgreSQL
BETA and MySQL databases in the cloud. Cloud SQL offers high
performance, scalability, and convenience. Hosted on Google Cloud
Platform, Cloud SQL provides a database infrastructure for
applications running anywhere.
Check this out, it might help: https://cloud.google.com/sql/

Google's Cloud SQL is a fully managed relational database service. It supports PostgreSQL and MySQL.
Google also provides Cloud Spanner, which is likewise a fully managed relational database service but is also distributed (horizontally scalable). It is better suited to mission-critical systems.
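
To illustrate why a relational model fits the alias requirement, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Cloud SQL. The table and column names (product, alias, price) and the sample data are assumptions made for illustration, not anything from the question. Prices reference the alias they were observed under, so re-pointing 'Coke' to 'Coca-Cola' is a single-row update and the price history follows without any cascading rewrite.

```python
import sqlite3

# Hypothetical schema: names are illustrative only.
SCHEMA = """
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE alias (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    product_id INTEGER NOT NULL REFERENCES product(id)
);
-- Prices hang off the alias they were observed under, so they follow it
-- automatically when the alias is re-pointed to another product.
CREATE TABLE price (
    alias_id    INTEGER NOT NULL REFERENCES alias(id),
    observed_on TEXT NOT NULL,
    amount      REAL NOT NULL,
    PRIMARY KEY (alias_id, observed_on)
);
"""


def build_demo_db():
    """Create an in-memory database with an unmatched 'Coke' alias."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO product (id, name) VALUES (?, ?)",
            [(1, "Coca-Cola"), (2, "Coke (unmatched)")],
        )
        conn.executemany(
            "INSERT INTO alias (id, name, product_id) VALUES (?, ?, ?)",
            [(10, "Coca-Cola", 1), (11, "Coke", 2)],
        )
        conn.executemany(
            "INSERT INTO price (alias_id, observed_on, amount) VALUES (?, ?, ?)",
            [(10, "2024-01-01", 1.50), (11, "2024-01-02", 1.40), (11, "2024-01-03", 1.45)],
        )
    return conn


def relink_alias(conn, alias_name, product_name):
    """Point an alias at a different parent product in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE alias SET product_id = (SELECT id FROM product WHERE name = ?) "
            "WHERE name = ?",
            (product_name, alias_name),
        )


def price_history(conn, product_name):
    """Return (date, amount) pairs for a product across all of its aliases."""
    return conn.execute(
        """
        SELECT price.observed_on, price.amount
        FROM price
        JOIN alias ON alias.id = price.alias_id
        JOIN product ON product.id = alias.product_id
        WHERE product.name = ?
        ORDER BY price.observed_on
        """,
        (product_name,),
    ).fetchall()
```

Calling relink_alias(conn, "Coke", "Coca-Cola") touches one row in alias; price_history(conn, "Coca-Cola") then includes the prices that were recorded under 'Coke'. If the target product does not exist, the NOT NULL constraint rejects the update and the transaction is rolled back.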

Related

Creating a Data Warehouse

Currently our team is having a major database management/data management issue: hundreds of databases are being built and used for minor, one-off applications where the app should really be pulling from an already existing database.
Since our security is so tight, the owners of these systems of authority will not allow others to pull data from them at the consistent rate an app needs. Instead, they allow a single app to do a weekly pull, and that data is then given to the org.
I am being asked to compile all of those publicly available weekly snapshots into a single data warehouse for end users to go to. We are realistically talking about 30-40 databases, each with hundreds of thousands of records.
What is the best way to turn this into a data warehouse? Set up a SQL Server instance and treat each one as its own DB on the server? I am less worried about the individual app connections; I really want to know what the best practice is for housing all of the data for consumption.
What you're describing is more of a simple data lake. If all you're being asked for is a single place for the existing data to live as-is, then sure, directly pulling all 30-40 databases to a new server will get that done. One thing to note is that if they're creating Database Snapshots, those wouldn't be helpful here. With actual database backups, it would be easy to build a process that would copy and restore those to your new server. This is assuming all of the sources are on SQL Server.
"Data warehouse" implies a certain level of organization beyond that, to facilitate reporting on an aggregate of the data across the multiple sources. Generally you'd identify any concepts that are shared between the databases and create a unified table for each concept, then create an ETL (extract, transform, load) process to standardize the data from each source and move it into those unified tables. This would be a large lift for one person to build. There are plenty of resources that you could read to get you started; Ralph Kimball's The Data Warehouse Toolkit is a comprehensive guide.
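
As a rough illustration of the "unified table plus ETL" idea, here is a minimal Python sketch using the built-in sqlite3 module in place of the real warehouse. The two source layouts (HR_ROWS, CRM_ROWS), the person table, and all column names are invented for the example; a real process would read the weekly snapshots instead of in-memory lists.

```python
import sqlite3

# Hypothetical weekly snapshots from two source systems that describe the
# same concept (a person) with different column names and formats.
HR_ROWS = [{"emp_id": 7, "full_name": "Ada Lovelace", "email": "ADA@EXAMPLE.COM"}]
CRM_ROWS = [{"id": "c-19", "first": "Alan", "last": "Turing", "mail": "alan@example.com "}]


def from_hr(row):
    """Transform one HR row into the unified (source, key, name, email) shape."""
    return ("hr", str(row["emp_id"]), row["full_name"].strip(), row["email"].strip().lower())


def from_crm(row):
    """Transform one CRM row into the unified (source, key, name, email) shape."""
    name = f'{row["first"]} {row["last"]}'.strip()
    return ("crm", str(row["id"]), name, row["mail"].strip().lower())


def load_people(conn, records):
    """Load standardized records into the unified table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS person (
            source     TEXT NOT NULL,
            source_key TEXT NOT NULL,
            name       TEXT NOT NULL,
            email      TEXT NOT NULL,
            PRIMARY KEY (source, source_key)
        )
        """
    )
    with conn:
        # Keyed on (source, source_key), so re-running the weekly load
        # replaces existing rows instead of duplicating them.
        conn.executemany("INSERT OR REPLACE INTO person VALUES (?, ?, ?, ?)", records)


def run_etl(conn):
    """Extract from each source, transform, and load into the unified table."""
    records = [from_hr(r) for r in HR_ROWS] + [from_crm(r) for r in CRM_ROWS]
    load_people(conn, records)
```

Keeping the source system and its original key on every row makes it possible to trace a warehouse record back to where it came from.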
In either case, a tool you might want to look into is SSIS. It's good for copying data across servers and has drivers for multiple different RDBMS platforms. You can schedule SSIS packages from SQL Agent. It has other features that could help for data warehousing as well.

Testing Databases on the AWS Cloud (Redshift)?

We have shifted from IBM DB2 databases to PostgreSQL databases on the AWS Cloud. Has anyone worked with AWS to test databases?
a) If so, what tools do you use?
b) What do you test when checking the databases in a Business Intelligence (BI) type of environment?
I mean anything other than just load or performance testing. I want to do functional testing, where I validate/verify that the data on the cloud servers and databases is equivalent to the data on the physical servers with DB2 as the database.
So, mainly a kind of data reconciliation, but with ETL also involved.
Our product Ajilius (http://ajilius.com) does 90% of what you're after. We specialise in cloud data warehouse automation. PostgreSQL is our primary DBMS for on-premise and SMP data warehouses; Redshift is one of our cloud platforms (as well as Snowflake and Azure SQL Data Warehouse); and DB2 is a supported data source.
I say "90%" because our data warehouse migration feature reconciles data that is migrated between warehouses, but only when both warehouses were created by Ajilius. I'd like to understand more about your need; if you email me through our web site, we can talk it over in detail.
Two competitors - Matillion and Treasure Data - also work in this space. Matillion is a full ETL tool, Treasure Data is more "EL" without the T. Definitely look at them, they're both good products with different approaches.
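
Independently of any product, the basic reconciliation asked about in the question (is the data on the cloud side equivalent to the data in DB2?) can be sketched as a row count plus an order-independent checksum per table. This is a minimal Python sketch under stated assumptions: rows arrive as tuples from a cursor on each side, and the normalise rules (stripping trailing blanks, a sentinel for NULL) are illustrative choices that would need adapting to the real column types.

```python
import hashlib


def normalise(value):
    """Map a column value to text so both sides encode it the same way.

    Assumption: trailing blanks are insignificant (fixed-length CHAR columns
    are typically blank-padded) and NULL gets its own sentinel.
    """
    if value is None:
        return "\x00NULL"
    if isinstance(value, str):
        return value.rstrip()
    return str(value)


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Per-row SHA-256 digests are summed modulo 2**256, so the result does not
    depend on row order and duplicate rows still count.
    """
    count = 0
    combined = 0
    for row in rows:
        encoded = "\x1f".join(normalise(v) for v in row).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, combined


def reconcile(source_rows, target_rows):
    """True when both sides have the same rows, regardless of order."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

In practice each side would be fed from the same column list in the same order, for example a cursor over one table in DB2 and the matching table in PostgreSQL. A mismatch only says that a table differs; finding the offending rows needs a second, key-by-key pass.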

Use SQL or NoSQL?

I'm designing a system that checks a given website for any security vulnerabilities. The system includes a client (a Firefox plugin) and a server. The server does all the scanning while the client just relays that info to the user. If a website is dangerous, it is blacklisted; otherwise it is whitelisted.
The system must hypothetically be able to handle several thousand requests and updates to the database simultaneously.
Although the database is expected to have a very simple structure, I am still considering using NoSQL because my understanding is that it can handle a greater number of queries. Is this true? Which DB technology is better suited to my system?
I suggest a NoSQL database.
In fact, I've been working with two databases over the last few weeks, and while searching the internet I read up on the differences between a NoSQL and a SQL database.
In practice, you should use a NoSQL DB if you have a lot of data to query. Keep in mind that data recovery is not guaranteed in the event of a DB disaster.
Use a SQL database instead if your data MUST be permanent and you can't afford to lose it. Query times will be longer, though, so it's not recommended if you have tons of data.
From what you wrote, I understand that you need a lot of queries and that you "can lose" the data (if you lose a website from the list, you'll just need to re-check it, right?).
So I suggest you go for a NoSQL DB (I have worked with MongoDB, which is the best known worldwide).
If you are considering NoSQL databases, you have to analyze your data to pick the right one.
For your use case I think you should look at document databases (like MongoDB) or, if you want really high performance, a key-value database like Redis or Riak.
With key-value databases you can only use the key to find the data you want.
With document databases you can still run some kinds of queries to find the data.
For further information look at: http://nosql-database.org/
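
The difference between the two access patterns can be sketched in plain Python, with no real database involved. The dict and list below are stand-ins, and the site names, fields, and function names are made up for the example; they are not the API of Redis, Riak, or MongoDB.

```python
# Key-value style: one opaque value per key, and lookup is by key only.
kv_store = {
    "example.com": "whitelisted",
    "bad.example": "blacklisted",
}


def kv_status(site):
    """Look up a site by its key; there is no way to search by value."""
    return kv_store.get(site, "unknown")


# Document style: each record has named fields that can be filtered on.
documents = [
    {"site": "example.com", "status": "whitelisted", "last_scan": "2024-01-02"},
    {"site": "bad.example", "status": "blacklisted", "last_scan": "2024-01-03"},
]


def find_documents(**criteria):
    """Return every document whose fields match all of the given criteria."""
    return [
        doc
        for doc in documents
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]
```

For a blacklist/whitelist check keyed on the site name, the key-value pattern is enough; the document pattern only pays off if you also need questions such as "all sites with status blacklisted".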

How to isolate SQL Data from different customers?

I'm currently developing a service for an app with WCF. I want to host the data on Windows Azure, and it should hold data from different users. I'm looking for the right design for my database. In my opinion there are only two different possibilities:
1. Create a new database for every customer
2. Add a customer ID to every table (or only to the main table, when every other table is connected to it via entities)
The first approach has very good speed and isolation, but it's very expensive on Windows Azure (or am I misunderstanding something about the Azure pricing?). Also, I don't know how to configure a WCF service so that it uses a different database for each customer.
The second approach is slower and the isolation is poor, but it's easy to implement and cheaper.
Now to my question:
Is there any other way to get high isolation of data and also easy integration in a WCF service using Azure?
What design should I use and why?
You have two additional options: build multiple schema containers within a database (see my blog post about this technique), or even better use SQL Database Federations (you can use my open-source project called Enzo SQL Shard to access federations). The links I am providing give you access to other options as well.
In the end it's a rather complex decision that involves a tradeoff between performance, security and manageability. I usually recommend Federations, even though it has its own set of limitations, because it is a flexible multitenant option for the cloud with the option to filter data automatically. Check out the open source project; you will see how to implement good separation of customer data independently of the physical storage.
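
For comparison, the question's second approach (a customer ID on every table) can be made less error-prone by routing all access through a small wrapper that always applies the customer filter. This is a minimal sketch in Python with the built-in sqlite3 module standing in for the Azure database; the note table, TenantStore class, and method names are hypothetical, and it is not how Federations or Enzo SQL Shard work.

```python
import sqlite3


def create_schema(conn):
    """Create one shared table in which every row carries its customer ID."""
    conn.execute(
        """
        CREATE TABLE note (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            body        TEXT NOT NULL
        )
        """
    )
    # Index the tenant column so per-customer queries do not scan all rows.
    conn.execute("CREATE INDEX note_by_customer ON note (customer_id)")


class TenantStore:
    """Scopes every read and write to a single customer."""

    def __init__(self, conn, customer_id):
        self._conn = conn
        self._customer_id = customer_id

    def add_note(self, body):
        with self._conn:  # commit on success, roll back on error
            self._conn.execute(
                "INSERT INTO note (customer_id, body) VALUES (?, ?)",
                (self._customer_id, body),
            )

    def notes(self):
        rows = self._conn.execute(
            "SELECT body FROM note WHERE customer_id = ? ORDER BY id",
            (self._customer_id,),
        )
        return [body for (body,) in rows]
```

The service would create one TenantStore per authenticated request. Isolation then depends on no code path bypassing the wrapper, which is the weakness of this design compared with separate databases or schemas.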

GAE Cloud SQL and High Replication Datastore

With HRD and BigTable, you are forced to deal with eventual consistency for all queries that are not ancestor queries. Your code has to be robust enough to cope with the fact that the results may be stale.
With Google's launching of Cloud SQL, they put in a disclaimer: ( https://developers.google.com/cloud-sql/faq#hrapps )
"We recommend that you use Google Cloud SQL with
High Replication App Engine applications. While you can use
Google Cloud SQL with applications that
do not use high replication, doing so might impact performance."
What does this mean? Does this mean that there are the same eventual consistency issues using SQL with HRD? There is no concept of entity groups in SQL, however could this mean that particular SQL queries in particular circumstances deliver stale results?
This would mean that Google's implementation of the SQL atomic transactional contract would be broken and SQL would not function as users of relational databases would expect.
If this is not the case, what are the concerns for having a master/slave or HRD model with SQL and why would Google give you the option of choosing a model with poorer performance?
(from forum)
The Cloud SQL and Data store systems are independent. You can use one or both as you see fit for your app.
We recommend using HRD apps because that type of app will be colocated with Cloud SQL. Master/slave apps are served from a different set of datacenters where Cloud SQL does not have a presence. It will work, but it will be slower.
Quotes from documentation:
"Google Cloud SQL is, simply put, a MySQL instance that lives in the cloud. It has all the capabilities and functionality of MySQL"
To answer your question: the high replication/master-slave options for a relational DB have nothing to do with consistency, but with other factors such as latency at peak load and availability for writes during planned maintenance. With the High Replication Datastore, latency stays low even when load spikes, and it remains available for writes even during planned maintenance. Check the comparison at http://code.google.com/appengine/docs/java/datastore/hr/
As for the second part of the question, why Google would offer a master/slave option that is not foolproof: it is there so that people who don't need complete uptime and want to try out GAE can use it.