GAE Datastore Large Amounts of Data - sql

Background:
I'm working on a project that's starting out with a large SQL dump that I have to import to a new database. This dump is about 1.5GB of just plain text, so quite a lot of information. My client right now wants me to use Google App Engine and its datastore, which I'm (a) not so fond of and (b) doesn't really play well with SQL dumps. Before I go through the trouble to make that happen...
Question:
What is a cloud-hosted database solution that can efficiently handle large quantities of data (and ideally is lower-cost)? In particular, which would be a database solution to which I could just import my SQL dump as-is?

Does your client has any reasons to use the datastore? If you already have the SQL dump, I think it would be easier to use Google Cloud Sql from GAE.

Related

Replicate data from cloud SQL postgres to bigQuery

I am looking for the recommended way of streaming database change from cloud SQL (postgres) to bigQuery ? I am seeing that CDC streaming does not seems available for postgres, does anyone know the timeline of this feature ?
Thanks a lot for you help.
Jonathan.
With Datastream for BigQuery, you can now replicate data and schema updates from operational databases directly into BigQuery.
Datastream reads and delivers every change—insert, update, and delete—from your MySQL, PostgreSQL, AlloyDB, and Oracle databases into BigQuery with minimal latency. The source database can be hosted on-premises, on Google Cloud services such as Cloud SQL or Bare Metal Solution for Oracle, or anywhere else on any cloud.
https://cloud.google.com/datastream-for-bigquery
You have to create an ETL process. That will allow you to automatically transform data from Postgres into BigQuery. You can do that using many ways, but I will point you to the two main approaches that I've already implemented:
Way 1:
Set Up the ETL Process manually:
Create your ETL using open source tools...
This method involves the use of the COPY command to migrate data from PostgreSQL tables and standard file-system files. It can be used as a normal SQL statement with SQL functions or PL/pgSQL procedures which gives a lot of flexibility to extract data as a full dump or incrementally. You need to know that it is a time-consuming process and would need you to invest in engineering bandwidth!
Also, you could try different tech stacks to implement the above, and I recommended this one Java Spring Data Flow
Way 2:
Using DataFlow
You can automate the ETL process using GCP's DataFlow without coding your own solution. It is faster and it cost, of course.
DataFlow is Unified stream and batch data processing that's
serverless, fast, and cost-effective.
Check more details and learn in a minute here
Also check this

what is the best way to copy a large sql database from azure managed instance to azure single database?

Hello folks first post in stack, btw wonderful community and helps out a lot.
like mentioned in the title what is the best way to copy such a large database? we got an ~ 500 GB Database and im currently moving this database from managed instance to a azure single database using smss:smss copy via deploy to microsoft azure sql database and it takes me right now 22 hours. i feel like im back in early 20s.
it's all in the same subscription and also in the same network configuration. afaik the process of that is that smss creates a bacpac file and then import it back to the single database. but 16 hours is just too long. so do you know any better option to do this quicker because i've a hell of more and partly larger databases to copy.
Did you think about using ETL tools, such as Azure Data Factory? It has good performance to migrate the big data. Ref this performance table:
It supports SQL database and Azure SQL MI. Ref these tutorial:
Copy and transform data in Azure SQL Database by using Azure Data Factory
Copy and transform data in Azure SQL Managed Instance by using Azure
Data Factory
It may takes some money but save much time. As we all know, time is money.
HTH.

Convert an online JSON set of files to a relational DB (SQL Server, MySQL, SQLITE)

I'm using a tool called Teamwork to manage my team's projects.
The have an online API that consists of JSON files that are accessible with authorisation
https://developer.teamwork.com/projects/introduction/welcome-to-the-teamwork-projects-api
I would like to be able to convert this online data to an sql db so i can create custom reports for my management.
I can't seem to find anything ready to do that.
I need a strategy to do this..
If you know how to program, this should be pretty straightforward.
In Python, for example, you could:
Come up with a SQL schema that maps to the JSON data objects you want to store. Create it in a database of your choice.
Use the Requests library to download the JSON resources, if you don't already have them on your system.
Convert each JSON resource to a python data structure using json.loads.
Connect to your database server using the appropriate Python library for your database. e.g., PyMySQL.
Iterate over the python data, inserting rows into the database as appropriate. This is essentially the JSON-to-Tables mapping from step 1 made procedural.
If you are not looking to do this in code, you should be able to use an open-source ETL tool to do this transformation. At LinkedIn a coworker of mine used to use Talend Data Integration for solid ETL work of a very similar nature (JSON to SQL). He was very fond of it and I respected his opinion, so I figured I should mention it, although I have zero experience of it myself.

How to migrate SQL Data into new Microsoft access Database

We have a 3gb file of data from our propriartary CRM system which is using SQL as a database.
The CRM is not meeting our needs and we are thinking about moving to Microsoft access and building our own system from the start.
We were wondering if it is possible to easily migrate the SQL database into access?
Thanks for your time.
First of all, it has been a long time since I've had to use MS-Access (thankfully) but I'm not sure Access is suitable for databases of that size. In my opinion, it's best suited to small, desktop-type applications with few concurrent users.
To answer your question, I believe Access offers a data import feature(see under the External Data ribbon in 2013) - though I'd suspect it might balk at the idea of 3GB of data. Edit: Actually this link suggests the max databsae size is 2GB
What might be more useful however, is its Linked Table feature. If I remember correctly this allows you to access data stored in SQL Server (or a similar RDBMS) which is more suited to large volumes of data through an Access front end - complete with pre-canned forms, queries, reports etc..
It is possible and fairly straight forward to move all of your data tables from SQL Server to Access; however, SQL Server is a much more robust database engine than Access. I would highly recommend against that. I have however had very good success using Access (ADP project files) as a front for the interface and using SQL Server as the database back-end for simple to moderate complexity interfaces. If you are not getting the performance you desire from your SQL Server, you might want to consider query performance tuning and looking into memory and hardware upgrades first. I think you will get better and faster results from doing that.
The simple solution would be to “link” Access to SQL server. That way you continue to use a robust data engine, but are free to use all the reporting and coding features of Access.
In this setup then Access simply becomes a “front end” to the existing SQL database.
And you do NOT want to use an ADP project in Access since they are depreciated.
The process is thus to create a blank standard database, and then use linked tables to SQL server. This will not only eliminate the need to import data (which is likely changing all the time).

Azure SqlExceptions

I have a program that uploads about 1gb of data to a SQL Azure Database.
I use a SqlBulkCopy to upload this data. I upload about 8,000,000 entities, on average 32,000 entities at a time, with a maximum of about 1,200,000 in one time.
I am receiving a lot of SqlExceptions, with error code 4815.
At first I thought this may be due to me uploading too many at a time and Azure throttling my connection or employing ddos defense, but I allowed mhy program to only submit 25,000 entities with each SqlBulkCopy, and I got even more errors! A lot more!
I have had good results using BCP to move large amounts of data into SQL Azure. The SQL Azure migration wizard uses this approach behind the scenes. This blog post is a bit dated, but the concepts are sound when it comes to importing a lot of data:
Brute Force Migration of Existing SQL Server Databases to SQL Azure
Question did not specify source of the data, so obviously this will not work for you if you are not importing from another database.
In my case, I got a 4815 when the data I was sending in one of the fields was larger than the field size in the table definition... sending 13 characters into a VARCHAR(11).