I am using Firestore as my main database, but I would like to export its data to SQL format. To do that, I know I'll need to write a script that creates and formats the dump file. What is the standard way to structure the file contents? Is it XML? What are the required fields? Unfortunately, I cannot find the answer to this.
Additional Info:
I will be exporting data from Firestore and importing it to Google Cloud SQL.
EDIT 1:
I'm using Postgres.
If you're looking for the easiest way to get your data from Cloud Firestore in a more query-friendly format, have a look at the new Firebase Extension that automatically exports specific collections from Firestore to BigQuery.
BigQuery is an analytics warehouse rather than a traditional relational database, but it has built-in support for structured querying through a SQL dialect.
Related
I have an existing Google BigQuery table with about 30 fields. I would like to start automating the addition of data to this table on a regular basis. I have installed the command line tools and they are working correctly.
I'm confused by the proper process to append data to a table. Do I need to specify the entire schema for the table every time I want to append data? It feels strange to be recreating the schema in an Avro file. The schema already exists on the table.
Can someone please clarify how to do this?
Do I need to specify the entire schema for the table every time
No, you don't need to. As described in the official BigQuery documentation:
Schema auto-detection is not used with Avro files, Parquet files, ORC files, Cloud Firestore export files, or Cloud Datastore export files. When you load these files into BigQuery, the table schema is automatically retrieved from the self-describing source data.
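Since an Avro file carries its own schema, an append is just a load job pointed at the existing table. Here is a minimal sketch with the google-cloud-bigquery Python client, assuming the file is already in Cloud Storage (the bucket, project, dataset, and table names are placeholders); the equivalent bq CLI invocation likewise needs no schema argument for Avro:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    # Append to the existing table; the schema is read from the Avro file.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://your-bucket/new-rows.avro",        # placeholder URI
    "your-project.your_dataset.your_table",  # placeholder table ID
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```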
I'm using a tool called Teamwork to manage my team's projects.
They have an online API that serves JSON resources, accessible with authorisation:
https://developer.teamwork.com/projects/introduction/welcome-to-the-teamwork-projects-api
I would like to convert this online data to a SQL database so I can create custom reports for my management.
I can't seem to find anything ready-made to do that, so I need a strategy.
If you know how to program, this should be pretty straightforward.
In Python, for example, you could:
Come up with a SQL schema that maps to the JSON data objects you want to store. Create it in a database of your choice.
Use the Requests library to download the JSON resources, if you don't already have them on your system.
Convert each JSON resource to a Python data structure using json.loads.
Connect to your database server using the appropriate Python library for your database, e.g., PyMySQL.
Iterate over the Python data, inserting rows into the database as appropriate. This is essentially the JSON-to-tables mapping from step 1 made procedural (see the sketch after this list).
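A minimal sketch of steps 2 through 5, assuming a hypothetical tasks table and the usual /tasks.json endpoint shape; the URL, credentials, response key, and column names are all placeholders to adapt:

```python
import json

import pymysql
import requests

# Step 2: download a JSON resource (Teamwork uses the API key as the
# basic-auth username; the URL and key here are placeholders).
resp = requests.get(
    "https://yourcompany.teamwork.com/tasks.json",
    auth=("YOUR_API_KEY", "x"),
)
resp.raise_for_status()

# Step 3: convert the JSON to Python data structures ("todo-items" is an
# assumed response key; check the actual payload).
tasks = json.loads(resp.text).get("todo-items", [])

# Step 4: connect to the database.
conn = pymysql.connect(host="localhost", user="user",
                       password="secret", database="teamwork")

# Step 5: apply the JSON-to-tables mapping from step 1, row by row.
with conn.cursor() as cur:
    for task in tasks:
        cur.execute(
            "REPLACE INTO tasks (id, content, status) VALUES (%s, %s, %s)",
            (task["id"], task["content"], task["status"]),
        )
conn.commit()
conn.close()
```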
If you are not looking to do this in code, you should be able to use an open-source ETL tool for the transformation. At LinkedIn, a coworker of mine used Talend Data Integration for solid ETL work of a very similar nature (JSON to SQL). He was very fond of it and I respected his opinion, so I figured I should mention it, although I have zero experience with it myself.
I need to setup a data pipeline from some source databases like Oracle, MySQL and load the data to BigQuery.
How can I use google-cloud-dataflow to read data from a database (JDBC connection) and write to BigQuery tables using Python?
Also, I have some Hive tables in an on-premise Hadoop cluster; how do I transfer this data to BigQuery?
I couldn't find the right documentation or examples to achieve this.
Can you please point me in the right direction?
I applied a solution in my project that provides this; you need to follow these steps:
Load data from Google Cloud SQL to Google Cloud Storage in CSV format by following this link.
Load the CSV data from Google Cloud Storage directly into BigQuery by following this link (a sketch of this step follows below).
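For the second step, a minimal sketch with the google-cloud-bigquery Python client, assuming the CSV export already sits in Cloud Storage; the bucket, project, dataset, and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row from the export
    autodetect=True,      # let BigQuery infer the schema from the CSV
)

load_job = client.load_table_from_uri(
    "gs://your-bucket/cloudsql-export.csv",  # placeholder URI
    "your-project.your_dataset.your_table",  # placeholder table ID
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```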
I've developed an app that uses Parse.com as the back end. I now need a dashboard analytics software package (such as iDashboards) that will enable me to pull data from my Parse.com database classes and present some of that data in a pretty dashboard fashion.
iDashboards looks to be the kind of tool I'm after, but it only supports certain data-source inputs such as JDBC, ODBC, SQL, MySQL, etc. Not being a database guru by any means, I'm not sure if Parse.com can be classed as any of the above, but from what I've read it doesn't come under any of these categories.
Can anybody recommend a way of either connecting Parse.com to iDashboard, or suggest another dashboard tool that will support Parse.com as a data source?
The main issue you are facing is that data coming out of Parse.com is going to be in JSON format, while most dashboards are going to prefer CSV files.
The best dashboard I am aware of is Tableau, and there is a discussion about getting JSON into Tableau here: http://community.tableau.com/ideas/1276
If your preference is iDashboards, then you need to convert the JSON coming out of Parse into a CSV format that iDashboards can consume. You can do that using RJSON as mentioned in the post above, but you'll probably have an easier time with a simple PHP or Python script that periodically connects to Parse, pulls out data updates, and pushes them to your dashboard of choice.
Converting JSON to CSV in PHP is addressed here: Converting JSON to CSV format using PHP
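If Python is more your speed, the same conversion takes a few lines of standard library. A hedged sketch: the input file, the "results" wrapper (which Parse's REST query responses use), and the column names are assumptions to adapt:

```python
import csv
import json

# Load a Parse export; REST query results are typically wrapped in "results".
with open("parse_export.json") as f:
    records = json.load(f)["results"]

# Write one flat CSV; the column list is a placeholder for your class's fields.
with open("dashboard_feed.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["objectId", "createdAt", "score"])
    writer.writeheader()
    for record in records:
        writer.writerow({key: record.get(key) for key in writer.fieldnames})
```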
The difference is much more fundamental than "unsupported file format". In fact, JSON data coming out of Parse is stored in a so-called denormalized form, which means that a single JSON data file may contain the equivalent of arbitrarily many tables in a relational database. Stated differently, one JSON file may translate into potentially many CSV files, and there's no unique choice of how to perform that translation.
This is a so-called ETL problem, where ETL stands for Extract-Transform-Load. As such, you may be interested in open source ETL tools such as Kettle. Kettle is supported by Pentaho and includes functionality that can help you develop a workflow to turn JSON data into multiple CSV files that can then be imported into iDashboards (or similar). Aside from Kettle, Talend is also widely used for this purpose and has the same ability.
Finally, note that Parse is powered by MongoDB, and exports JSON data that is easily stored and manipulated in MongoDB. As such, a natural fit for reporting on Parse data is any reporting tool built for MongoDB.
As of the time of this writing, there are two such options:
JSON Studio, which is a commercial solution built explicitly for MongoDB and offers the dashboard-building capability you describe.
SlamData, which is an open source solution, also built for MongoDB, which allows native SQL on the database. The current version does not have reporting capabilities (just CSV export), but the 2.09 version due out in June has reporting dashboards baked in.
An advantage of using a MongoDB reporting tool is that you will not have to wrangle your data into relational form. If it's heavily nested, uses arrays, and so forth, it can be quite painful to develop an ETL workflow and keep it in sync as the data changes. Instead, all you have to do is build a script to pipe the raw data from Parse into a MongoDB instance (perhaps hosted by MongoLab or equivalent, if you don't want to host it yourself) and connect the MongoDB reporting tool on top.
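Such a pipe script can stay very small. A hedged sketch using the Parse REST API and pymongo; the class name, keys, and connection string are placeholders, and pagination and error handling are left out:

```python
import requests
from pymongo import MongoClient

# Pull one page of objects from a Parse class via the REST API
# (class name and credentials are placeholders).
resp = requests.get(
    "https://api.parse.com/1/classes/GameScore",
    headers={
        "X-Parse-Application-Id": "YOUR_APP_ID",
        "X-Parse-REST-API-Key": "YOUR_REST_KEY",
    },
)
resp.raise_for_status()

# Upsert each document into a MongoDB mirror, keyed on Parse's objectId.
collection = MongoClient("mongodb://localhost:27017")["parse_mirror"]["GameScore"]
for doc in resp.json()["results"]:
    collection.replace_one({"objectId": doc["objectId"]}, doc, upsert=True)
```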
You might also contact Parse and see if they have a recommended solution for this. It occurs to me they should probably bake some sort of analytical / reporting functionality into their APIs as this is such a common use case.
You can use Axibase Time Series Database (ATSD) to ingest your data from parse.com; it has built-in dashboards and widgets for visualization, or you can just export data from ATSD to CSV and use iDashboards.
Is there any extension/tool/script available to import data from eXist database to PostgreSQL database automatically?
From the tag description it's pretty clear that you're going to need to use an ETL tool or some custom code. Which is easier depends on the nature of the data and how you want to migrate it.
I'd start by looking at Talend Studio and Pentaho Kettle. See if either of them can meet your needs.
If you can turn the eXist data into structured CSV exports, then you can probably just hand-define tables for it in PostgreSQL and then COPY the data in, or use pgloader.
If not, then I'd suggest picking the language you're most familiar with (Python, Java, whatever) and using that language's eXist data connector along with its PostgreSQL connector. Write a script that fetches data from eXist and feeds it to PostgreSQL. If using Python, I'd use the psycopg2 database connector, as it's fast and supports COPY for bulk data loading.
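For the PostgreSQL side of that script, a minimal sketch of the bulk-load step with psycopg2, assuming you've already flattened the eXist data to CSV; the table, columns, file name, and connection string are placeholders:

```python
import psycopg2

conn = psycopg2.connect("dbname=target user=postgres")
# The connection context manager commits the transaction on success.
with conn, conn.cursor() as cur, open("exist_export.csv") as f:
    # COPY streams the whole file in one round trip, which is far faster
    # than issuing row-by-row INSERT statements.
    cur.copy_expert(
        "COPY documents (id, title, body) FROM STDIN WITH (FORMAT csv, HEADER true)",
        f,
    )
conn.close()
```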