How to join data in a Google Sheet with Metabase data to create a dashboard? - sql

My company uses Metabase for data analysis. The data I need to build the dashboard in Metabase is split into two parts: one part is retrieved by SQL queries in Metabase, and the other part lives in Google Sheets as manually entered data. How can I join the Metabase data and the Google Sheets data to build the dashboard in Metabase?
For example:
The data I need to build the dashboard in Metabase:
Name   Age   Address       Salary
Smith  25    Evans Mills   $9000
The data retrieved by the SQL query in Metabase:
Name   Age   Address
Smith  25    Evans Mills
Manual data in Google Sheets:
Salary
$9000

As far as I understand Metabase, one of its limitations is that it cannot run queries across different databases.
However, I have helped a customer solve a similar problem. The software architecture looks like this:
Metabase -> Presto SQL/Trino -> different databases and data sources
In this design:
Metabase handles the dashboarding part of the work.
Trino handles the joins across the different data sources.
Note: in our customer's case the integration required a certain amount of programming work; it is not a trivial job.
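As an illustration of what the joining layer does, here is a minimal sketch of a federated Trino query, assuming a PostgreSQL catalog for the database side and Trino's Google Sheets connector for the manual data. Every catalog, schema, table, and column name below is a placeholder, not something from the original setup:

-- Federated join across two Trino catalogs (all names are assumptions).
-- postgresql.public.employees: the data Metabase currently queries with SQL.
-- gsheets.default.salaries:    the manual Google Sheets data exposed through
--                              Trino's Google Sheets connector.
SELECT
    e.name,
    e.age,
    e.address,
    s.salary
FROM postgresql.public.employees AS e
JOIN gsheets.default.salaries AS s
    ON e.name = s.name

Metabase then connects to Trino as a single database and the dashboard is built on the result of queries like this one.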

Related

Google AdWords Transfers in Big Query: Possible to Change Table Schema?

I started to test Google AdWords transfers for Big Query (https://cloud.google.com/bigquery/docs/adwords-transfer).
I have a few questions for which I cannot find answers anywhere.
Is it possible to edit which columns are downloaded from AdWords to Big Query? For example, the Keyword report has only the ad group ID column but not the ad group name.
Or is it possible to decide which tables (reports) are downloaded? The transfer creates around 60 tables and I need just 5...
According to here, the AdWords data transfer "stores your AdWords data into a Dataset". So the inputs are AdWords customer IDs (at minimum one customer ID) and the output is a collection of datasets.
I think you would need a modified pipeline (for example via Pub/Sub) to store only specific columns or tables in BigQuery.
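As a hedged workaround sketch: instead of changing what the transfer downloads, you can create views in BigQuery over the transferred tables that keep only the columns you care about and join the ad group name in. The table and column names below (p_Keyword_1234567890, p_AdGroup_1234567890, Criteria, AdGroupId, AdGroupName) are assumptions; check the actual names the transfer created in your dataset:

-- View keeping only the needed keyword columns and adding the ad group name.
-- All table and column names are placeholders for whatever the transfer created.
CREATE VIEW adwords.keyword_with_adgroup_name AS
SELECT
    k.Criteria    AS keyword_text,
    k.AdGroupId   AS ad_group_id,
    g.AdGroupName AS ad_group_name
FROM adwords.p_Keyword_1234567890 AS k
JOIN adwords.p_AdGroup_1234567890 AS g
    ON k.AdGroupId = g.AdGroupId

As far as I know, the transfer itself does not offer per-table or per-column selection, so the remaining tables can simply be ignored or cleaned up afterwards.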

Import database table (specific row) based on matching validator

I have a database running now that has had all data in the "leads column" / "phone number row" removed. I have created an updated csv file that has most of the phone numbers present in addition to the client name, email and address.
How can I import the phone numbers into the phone number row by matching the client name, email, or address data, without affecting any other columns or rows?
This sounds like a perfect fit for an SSIS package! (This assumes you are referring to SQL Server... since you didn't list an RDBMS, it is just a guess.)
Some SSIS package basics materials:
http://www.codeproject.com/Articles/155829/SQL-Server-Integration-Services-SSIS-Part-Basics
https://technet.microsoft.com/en-us/library/ms169917(v=sql.110).aspx
http://ssistutorial.blogspot.com/
SSIS is basically an ETL package development tool used with SQL Server that has countless options for moving data around. You would only need one data flow task inside SSIS to accomplish what you are after. I highly recommend reading up on some of the content above and giving it a shot!
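If a full SSIS package feels like overkill, the core of that single data flow can also be sketched in plain T-SQL (assuming SQL Server) once the CSV is loaded into a staging table; all table, column, and file names below are placeholders:

-- 1) Load the updated CSV into a staging table (BULK INSERT or an SSIS
--    flat-file source both work). Placeholder path and table names.
BULK INSERT dbo.LeadsStaging
FROM 'C:\data\leads_updated.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- 2) Update only the phone number column, matching rows on client name,
--    email, and address so nothing else in the table is touched.
UPDATE l
SET    l.PhoneNumber = s.PhoneNumber
FROM   dbo.Leads        AS l
JOIN   dbo.LeadsStaging AS s
  ON   l.ClientName = s.ClientName
 AND   l.Email      = s.Email
 AND   l.Address    = s.Address
WHERE  s.PhoneNumber IS NOT NULL;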

How to send data to only one Azure SQL DB Table from Azure Streaming Analytics?

Background
I have set up an IoT project using an Azure Event Hub and Azure Stream Analytics (ASA) based on tutorials from here and here. JSON-formatted messages are sent from a Wi-Fi-enabled device to the event hub using webhooks, then fed through an ASA query and stored in one of three Azure SQL databases based on the input stream they came from.
The device (Particle Photon) transmits 3 different messages with different payloads, for which there are 3 SQL tables defined for long term storage/analysis. The next step includes real-time alerts, and visualization through Power BI.
Here is a visual representation of the idea:
The ASA Query
SELECT
    ParticleId,
    TimePublished,
    PH,
    -- and other fields
INTO TpEnvStateOutputToSQL
FROM TpEnvStateInput

SELECT
    ParticleId,
    TimePublished,
    EventCode,
    -- and other fields
INTO TpEventsOutputToSQL
FROM TpEventsInput

SELECT
    ParticleId,
    TimePublished,
    FreshWater,
    -- and other fields
INTO TpConsLevelOutputToSQL
FROM TpConsLevelInput
Problem: For every message received, the data is pushed to all three tables in the database, and not only the output specified in the query. The table in which the data belongs gets populated with a new row as expected, while the two other tables get populated with NULLs for columns which no data existed for.
From the ASA documentation it was my understanding that the INTO keyword would direct the output to the specified sink. But that does not seem to be the case, as the output from all three inputs gets pushed to all sinks (all 3 SQL tables).
The test script I wrote for the Particle Photon will send one of each type of message with hardcoded fields, in the order: EnvState, Event, ConsLevels, each 15 seconds apart, repeating.
Here is an example of the output being sent to all tables, showing one column from each table:
Which was generated using this query (in Visual Studio):
SELECT
    t1.TimePublished as t1_t2_t3_TimePublished,
    t1.ParticleId as t1_t2_t3_ParticleID,
    t1.PH as t1_PH,
    t2.EventCode as t2_EventCode,
    t3.FreshWater as t3_FreshWater
FROM dbo.EnvironmentState as t1, dbo.Event as t2, dbo.ConsumableLevel as t3
WHERE t1.TimePublished = t2.TimePublished AND t2.TimePublished = t3.TimePublished
For an input event of type TpEnvStateInput where the key 'PH' would exist (and not keys 'EventCode' or 'FreshWater', which belong to TpEventInput and TpConsLevelInput, respectively), an entry into only the EnvironmentState table is desired.
Question:
Is there a bug somewhere in the ASA query, or a misunderstanding on my part on how ASA should be used/setup?
I was hoping I would not have to define three separate Stream Analytics containers, as they tend to be rather pricey. After running through this tutorial, and leaving 4 ASA containers running for one day, I used up nearly $5 in Azure credits. At a projected $150/mo cost, there's just no way I could justify sticking with Azure.
ASA is intended for Complex Event Processing. In your queries you are essentially using ASA just to pass data from the event hub to tables. It will be much cheaper to host a simple "worker web app" that processes the incoming events instead.
This blog post covers the best practices:
http://blogs.msdn.com/b/servicebus/archive/2015/01/16/event-processor-host-best-practices-part-1.aspx
ASA is great if you are doing transformations, filtering, or light analytics on your input data in real time. It also works well if you have Azure Machine Learning models exposed as functions (currently in preview).
In your example, all three "select into" statements are reading from the same input source, and don't have any filter clauses, so all rows would be selected.
If you only want to select specific rows for each of the outputs, you have to specify a filter condition. For example, assuming you only want records with a non-null value in the column "PH" for the output "TpEnvStateOutputToSQL", the ASA query would look like the one below:
SELECT
    ParticleId,
    TimePublished,
    PH
    -- and other fields
INTO TpEnvStateOutputToSQL
FROM TpEnvStateInput
WHERE PH IS NOT NULL
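The remaining two outputs can be filtered the same way, keying each on a field that only that message type carries. A sketch following the same pattern (field names taken from the original query):

SELECT
    ParticleId,
    TimePublished,
    EventCode
    -- and other fields
INTO TpEventsOutputToSQL
FROM TpEventsInput
WHERE EventCode IS NOT NULL

SELECT
    ParticleId,
    TimePublished,
    FreshWater
    -- and other fields
INTO TpConsLevelOutputToSQL
FROM TpConsLevelInput
WHERE FreshWater IS NOT NULL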

Combine multiple database with different queries in 1 report ( bar chart)

I need to create a report using the Pentaho User Console, and I want to view the report as a bar chart. In that report I need to include multiple queries from different databases and show the results in one chart. For example, I have 3 databases: Car, House, and Employee. I also have 3 queries: the quantity of cars of each type, the quantity of available houses, and the total number of employees in each department. 3 different databases and 3 different queries, but I want to show all 3 results in 1 chart. How can I do that?
Do you know about schema creation?
Read up on how to create a schema and what fact tables and dimension tables are.
You can use Pentaho Schema Workbench; it is made for exactly this kind of purpose.
After creating the schema in Pentaho Schema Workbench you can publish it to the Pentaho BI server, view it there as a bar chart, and perform analysis operations such as drill-up, drill-down, slicing, and dicing as well.
You can use a Kettle transformation as a data source for a Pentaho report. Within the transformation it's perfectly fine to query 3 different databases and prepare the combined result data set, as sketched below.
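A minimal sketch of what such a transformation could prepare, assuming one Table Input step per database and an Append streams step that merges the three outputs into a single category/value stream (all database, table, and column names are placeholders):

-- Table Input 1 (Car database): cars per type
SELECT type AS category, COUNT(*) AS value FROM cars GROUP BY type

-- Table Input 2 (House database): available houses
SELECT 'available houses' AS category, COUNT(*) AS value FROM houses WHERE available = 1

-- Table Input 3 (Employee database): employees per department
SELECT department AS category, COUNT(*) AS value FROM employees GROUP BY department

Because every step returns the same two columns (category, value), the appended stream can feed a single bar chart in the Pentaho report.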

querying google fusion table

I have a Google Fusion Table with 3 row layouts, as shown below:
We can query the fusion table as,
var query = new google.visualization.Query("https://www.google.com/fusiontables/gvizdata?tq=select * from *******************");
which selects the data from the first row layout (i.e. Rows 1) by default. Is there any way to query the second or third row layout of a Fusion Table?
API queries apply to the actual table data. The row layout tabs are just different views onto that data. You can get the actual query being executed for a tab with Tools > Publish; the HTML/JavaScript contains the FusionTablesLayer request.
I would recommend using the regular Fusion Tables API rather than the gvizdata API because it's much more flexible and not limited to 500 response rows.
The documentation for querying a Fusion Tables source has not been updated yet to account for the new structure, so this is just a guess. Try appending #rows:id=2 to the end of your table id:
select * from <table id>#rows:id=2
A couple of things:
Querying Fusion Tables with SQL is deprecated. Please see the porting guide.
Check out the Working With Rows part of the documentation. I believe this has your answers.