How do you "unpack" a query that queries views in Google BigQuery?

Suppose that I have a view in BigQuery, e.g. [views.myview] defined as follows:
SELECT
Id AS Id,
MAX(Time) AS MostRecentTime
FROM
[dataset.mytable]
GROUP BY
Id
And then another query that queries that view:
SELECT
*
FROM
[dataset.mytable] tbl
JOIN [views.myview] mview ON tbl.Time = mview.MostRecentTime
Is there a way to automatically generate a query where the [views.myview] in the second query is replaced with the query that generates it - basically "unpacking" the views so you have just one query that queries tables directly?
(The underlying problem: I have a query which queries many different views, including several layers of views-querying-other-views, and I want to put this query in my application. I don't want a user to be able to mess with the results of the query by changing the definition of one of the views, so I want to put the whole query in a fixed form in the application.)

This is not possible to do automatically; BigQuery has no built-in option for it. You could write a script that does the substitution through the BigQuery API, which lets you read each view's definition: https://cloud.google.com/bigquery/docs/managing-views
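The script route can be sketched in Python. This is a minimal sketch under several assumptions: you have already fetched each view's SQL text into a dict keyed by the bracketed name (in the google-cloud-bigquery Python client a view's definition is exposed as `Table.view_query`), queries use legacy-SQL `[dataset.table]` references as in the question, and no string literal contains square brackets. The `unpack` function and the hard-coded `views` dict are illustrative, not part of any BigQuery API. Each view reference is replaced by its parenthesised definition, recursively, so views of views are unpacked too.

```python
import re

# Legacy-SQL table references look like [dataset.table]; capture the name inside.
TABLE_REF = re.compile(r"\[([^\[\]]+)\]")


def unpack(query, views, _stack=()):
    """Inline every view referenced in `query`, recursively.

    `views` maps a bracketed name (e.g. "views.myview") to that view's SQL.
    Names not in `views` are treated as real tables and left untouched.
    """
    def substitute(match):
        name = match.group(1)
        if name not in views:
            return match.group(0)  # a base table: keep the reference as-is
        if name in _stack:
            raise ValueError(f"circular view reference: {name}")
        # Expand the view's own definition first, so views of views unpack too.
        inner = unpack(views[name], views, _stack + (name,))
        return f"({inner})"

    return TABLE_REF.sub(substitute, query)


# Hypothetical definitions, hard-coded here instead of fetched from the API.
views = {
    "views.myview": (
        "SELECT Id AS Id, MAX(Time) AS MostRecentTime "
        "FROM [dataset.mytable] GROUP BY Id"
    ),
}
query = (
    "SELECT * FROM [dataset.mytable] tbl "
    "JOIN [views.myview] mview ON tbl.Time = mview.MostRecentTime"
)
unpacked = unpack(query, views)
print(unpacked)
```

The result is a snapshot: it freezes the view definitions as they were when you ran the script, which is exactly what the question wants for embedding the query in an application. Standard-SQL backtick references would need a different pattern.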

Related

How do I correctly use report-level filters in Power BI?

I am fairly new to Power BI and working on my first project. I currently have two separate SQL views set up and connected to my project, as well as a simple visual for each one. They both share a common field that I want to filter on; call it customer_key. I tried adding a filter to the "Filters on all pages" section, but it only seems to filter the visual I dragged the field from, not both. I ultimately want to do the same thing with a date field so that this report can easily be re-run for any of our customers and any date range. Suggestions?
I would like to have a single filter that affects all data sets in my report, provided they have that column to filter on. Currently, I am using separate filters for each data set (even though I moved the filter to the "Filters on all pages" section).
This happens when the "common" customer_key columns are not related, so you have to set up a relationship between your two SQL views. However, if both are fact tables this is not straightforward, since you would end up with a many-to-many relationship.
Workaround: extract the unique customer_key values from both views and create a third dimension table from them. Now you can create two one-to-many relationships from it to your SQL views and use the third table's customer_key as a report-level filter.
You can create the connecting dim table with the following DAX table expression:
Table3 =
DISTINCT(
UNION(
SELECTCOLUMNS(
'Table1',
"customer_key", 'Table1'[customer_key]
),
SELECTCOLUMNS(
'Table2',
"customer_key", 'Table2'[customer_key]
)
)
)
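If you would rather build the bridge table in the database than in DAX, the same idea can be expressed in SQL. Below is a sketch using Python's built-in sqlite3 with made-up Table1/Table2 contents and hypothetical amount and visits columns. One difference to keep in mind: DAX UNION keeps duplicate rows (hence the DISTINCT wrapper above), whereas SQL UNION removes them. The joins at the end are only an analogy for how a filter on the dimension table reaches both fact tables through one-to-many relationships; Power BI does that propagation itself once the relationships exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (customer_key INTEGER, amount REAL);
    CREATE TABLE Table2 (customer_key INTEGER, visits INTEGER);
    INSERT INTO Table1 VALUES (1, 10.0), (2, 20.0), (2, 5.0);
    INSERT INTO Table2 VALUES (2, 3), (3, 7);

    -- SQL UNION drops duplicates, so it plays the role of DISTINCT(UNION(...)).
    CREATE VIEW Table3 AS
        SELECT customer_key FROM Table1
        UNION
        SELECT customer_key FROM Table2;
""")

keys = [k for (k,) in conn.execute("SELECT customer_key FROM Table3 ORDER BY customer_key")]

# Filtering the bridge table and joining out to each fact table mimics how a
# filter on Table3[customer_key] reaches both visuals through the relationships.
selected = 2
amount = conn.execute(
    "SELECT SUM(t.amount) FROM Table3 d JOIN Table1 t ON t.customer_key = d.customer_key"
    " WHERE d.customer_key = ?", (selected,)).fetchone()[0]
visits = conn.execute(
    "SELECT SUM(t.visits) FROM Table3 d JOIN Table2 t ON t.customer_key = d.customer_key"
    " WHERE d.customer_key = ?", (selected,)).fetchone()[0]
print(keys, amount, visits)
```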

How do I select attributes from another selection of elements

I am using Excel's Power Query to test a SQL query that I will eventually use to build a pivot table that stays in sync with the database. The database is accessed through ODBC.
The problem is not related to Power Query itself but simply the SQL request.
Here I am trying to select all bills from the "facturation" (French database) table that are from the current year (2021). I am naming this selected data FACTURES_ANNEE_COURANTE.
Then I want to select some attributes of those 2021 items to display in the pivot table, working only on the selection I just made so that only bills from the current year are selected (and shown).
select * as FACTURES_ANNEE_COURANTE
from facturation
where year(date_fact)=2021 limit 3, select date_fact from FACTURES_ANNEE_COURANTE
I only have very basic knowledge of SQL, and the second part of my request does not work (the first part does). I'm trying to do this so that I can show these specific attributes in the pivot table. What's the proper way to select attributes only from my first selection of rows from the facturation table?
Thank you for your help.
A major advantage of Power Query is that it can generate complex logic without you needing to write SQL. So I would abandon hand-coded SQL; there's no need.
Before PQ came out I had 2 decades of experience writing complex SQL. After PQ came out I've written almost none - the SQL code generated by PQ is good enough, you can easily add complex transformations that are hard/impossible in SQL, and overall developing and debugging is 10x easier.
For your scenario, I would build a PQ query just using the navigation to select your facturation table. Then I would use the PQ UI to Filter (instead of a SQL where clause) and Choose Columns (to restrict the columns returned).
Whatever other transformations you need are likely met using a button in the PQ UI.
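For completeness, if you do stay with hand-written SQL, the two-step selection the question attempts is usually written with a common table expression (CTE): name the first selection with WITH ... AS (...), then select from that name. This is a sketch using Python's built-in sqlite3 on a made-up facturation table (the id and montant columns are hypothetical). SQLite has no year() function, so strftime('%Y', date_fact) stands in for year(date_fact); whether your ODBC source accepts WITH and LIMIT depends on the database behind it, and a subquery in the FROM clause is the usual alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up facturation rows; id and montant are hypothetical columns.
conn.execute("CREATE TABLE facturation (id INTEGER, date_fact TEXT, montant REAL)")
conn.executemany("INSERT INTO facturation VALUES (?, ?, ?)", [
    (1, "2020-12-30", 10.0),
    (2, "2021-01-05", 20.0),
    (3, "2021-03-10", 30.0),
    (4, "2021-07-01", 40.0),
    (5, "2021-09-09", 50.0),
])

query = """
WITH FACTURES_ANNEE_COURANTE AS (
    -- first selection: bills from the current year (SQLite stand-in for year())
    SELECT *
    FROM facturation
    WHERE strftime('%Y', date_fact) = '2021'
    ORDER BY date_fact
    LIMIT 3
)
-- second selection: only the attributes you need, taken from the first one
SELECT date_fact
FROM FACTURES_ANNEE_COURANTE
ORDER BY date_fact
"""
dates = [d for (d,) in conn.execute(query)]
print(dates)
```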

Adding New Fields via LEFT JOIN in Running ETL from SQL to MongoDB

I need to run an ETL to get data from a Sybase/SQL back end into my MongoDB environment. We already pull data from Sybase, but now there are a couple of additional fields we want to bring in. Since I'm familiar with Mongo but not so much with Sybase, I'm trying to determine how I need to adjust our ETL to get this additional data.
The current SELECT statement looks like this:
`SELECT DISTINCT TOP 100 d.*, d10.code code10, d10.id_number as Code10ID FROM diagnosis d LEFT JOIN diagnosis_icd10 d10 on d.icd10_id = d10.id_number ORDER BY d.id_number`
Now, within the diagnosis_icd10 table that we're doing the LEFT JOIN on, there are now a couple of extra fields available.
So, my question is, do I need to explicitly include these additional fields here in the SELECT statement in order for them to be available in the ETL process? Or is this only the case if I want to rename the fields? What should this look like?
Yes, you need to request them explicitly. d.* returns every column of the diagnosis table (aliased as d), but from diagnosis_icd10 (aliased as d10) the query only takes the two columns it names, code and id_number.
This has nothing to do with Sybase, though; this is basic SQL, so it would be the same for most databases. Just add them to the SELECT list as d10.column_name, like the others.
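A sketch using Python's built-in sqlite3 shows the point: the new diagnosis_icd10 fields (here the hypothetical description and category columns, with made-up rows) only appear in the result once they are listed in the SELECT. SQLite has no TOP, so LIMIT 100 replaces TOP 100; otherwise the query mirrors the one in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE diagnosis (id_number INTEGER, name TEXT, icd10_id INTEGER);
    -- description and category stand in for the newly available fields
    CREATE TABLE diagnosis_icd10 (id_number INTEGER, code TEXT,
                                  description TEXT, category TEXT);
    INSERT INTO diagnosis VALUES (1, 'Asthma', 10), (2, 'Unknown', NULL);
    INSERT INTO diagnosis_icd10 VALUES (10, 'J45', 'Asthma', 'Respiratory');
""")

TAIL = """
FROM diagnosis d
LEFT JOIN diagnosis_icd10 d10 ON d.icd10_id = d10.id_number
ORDER BY d.id_number
LIMIT 100"""  # SQLite uses LIMIT where Sybase uses TOP


def column_names(sql):
    """Names of the columns a query returns, as the ETL would see them."""
    return [col[0] for col in conn.execute(sql).description]


original = column_names(
    "SELECT DISTINCT d.*, d10.code code10, d10.id_number AS Code10ID" + TAIL)
extended = column_names(
    "SELECT DISTINCT d.*, d10.code code10, d10.id_number AS Code10ID,"
    " d10.description AS icd10_description, d10.category AS icd10_category"
    + TAIL)
print(original)
print(extended)
```

Aliasing the new columns is optional; it just keeps their names distinct from same-named columns in diagnosis once they land in MongoDB.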

BigQuery: return nested results without flattening them and without writing to a table

It is possible to return nested results (RECORD type) if the noflatten_results flag is specified, but is it possible to just view them on screen without writing them to a table first?
For example, here is a simple user table (my actual table is very large: 400+ columns with multiple levels of nesting):
ID,
name: {first, last}
I want to view the record for a particular user and display it in my application, so my query is:
SELECT * FROM dataset.user WHERE id=423421 limit 1
Is it possible to return the result directly?
Write your output to a "temp" table with the noflatten_results option (and set an expiration on it so the table is purged after it is used), then serve your client from this temp table. All of this can be done "on the fly".
Keep in mind that no matter how small the "temp" table is, if you query it (in the second step above) you will be billed for at least 10 MB, so you are better off using the Tabledata.list API in this step (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list), which is free!
If you try to get repeated records directly in the interface/BQ console, the query fails with the error:
Error: Cannot output multiple independently repeated fields at the same time.
The way to get past this error is to FLATTEN your output.

SQLite view across multiple databases. Is this okay? Is there a better way?

Using SQLite, I have a large database split into years:
DB_2006_thru_2007.sq3
DB_2008_thru_2009.sq3
DB_current.sq3
They all have a single table called hist_tbl with two columns (key, data).
The requirements are:
1. to be able to access all the data at once.
2. inserts only go to the current version.
3. the data will continue to be split as time goes on.
4. access is through a single program that has exclusive access.
5. the program can accept some setup SQL but needs to run the same when accessing one database or multiple databases.
To view them cohesively I do the following (really in a program but command line shown here):
sqlite3 DB_current.sq3
attach database 'DB_2006_thru_2007.sq3' as hist1;
attach database 'DB_2008_thru_2009.sq3' as hist2;
create temp view hist_tbl as
select * from hist1.hist_tbl union
select * from hist2.hist_tbl union
select * from main.hist_tbl;
There is now a temp.hist_tbl (view) and a main.hist_tbl (table).
When I select without qualifying the table, I get the data through the view.
This is desirable since I can use my canned SQL queries against either the joined view or the individual databases, depending on how I set things up. Additionally, I can always insert into main.hist_tbl.
Question 1: What are the downsides?
Question 2: Is there a better way?
Thanks in advance.
Question 1: What are the downsides?
You have to update the view EVERY. FISCAL. year.
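One way to soften this is to build the view at startup from the list of archive files instead of hard-coding it, so a new split only means adding a path. This is a minimal sketch with Python's built-in sqlite3, using a hypothetical open_history helper and the file names from the question: it attaches each archive, creates the TEMP view only when there are archives (so the program behaves the same against a single database), and inserts go to main.hist_tbl. A TEMP view is used because SQLite does not let a view stored in one database refer to tables in another attached database.

```python
import os
import sqlite3
import tempfile


def open_history(current_path, archive_paths):
    """Hypothetical helper: open the current DB and expose all years as hist_tbl."""
    conn = sqlite3.connect(current_path)
    conn.execute("CREATE TABLE IF NOT EXISTS main.hist_tbl (key TEXT, data TEXT)")
    selects = ["SELECT * FROM main.hist_tbl"]
    for i, path in enumerate(archive_paths, start=1):
        alias = f"hist{i}"
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(f"SELECT * FROM {alias}.hist_tbl")
    if archive_paths:
        # Only a TEMP view may reference tables in other attached databases.
        conn.execute("CREATE TEMP VIEW hist_tbl AS " + " UNION ".join(selects))
    return conn


def make_archive(path, rows):
    """Create a stand-in archive file with some made-up rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE hist_tbl (key TEXT, data TEXT)")
    conn.executemany("INSERT INTO hist_tbl VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    archives = [os.path.join(tmp, "DB_2006_thru_2007.sq3"),
                os.path.join(tmp, "DB_2008_thru_2009.sq3")]
    make_archive(archives[0], [("k2006", "2006 data")])
    make_archive(archives[1], [("k2008", "2008 data")])
    conn = open_history(os.path.join(tmp, "DB_current.sq3"), archives)
    conn.execute("INSERT INTO main.hist_tbl VALUES (?, ?)", ("k2021", "current data"))
    # Unqualified hist_tbl resolves to the temp view (temp is searched first).
    all_keys = sorted(k for (k,) in conn.execute("SELECT key FROM hist_tbl"))
    current_keys = [k for (k,) in conn.execute("SELECT key FROM main.hist_tbl")]
    conn.close()
```

If the archives never hold identical rows, UNION ALL returns the same result without the duplicate-removal step that UNION performs; the sketch keeps UNION to match the question.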
Question 2: Is there a better way?
Add a date column so you can search for things within a given timespan, like a fiscal year.
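A sketch of that suggestion with Python's built-in sqlite3: a single hist_tbl with a hypothetical recorded_on column stored as ISO-8601 text (so plain string comparison orders dates correctly) and an index on it, queried by a half-open date range. The rows are made up, and calendar years stand in for fiscal years; substitute your own boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# recorded_on is a hypothetical ISO-8601 date column replacing the per-year files.
conn.execute("CREATE TABLE hist_tbl (key TEXT, data TEXT, recorded_on TEXT)")
conn.execute("CREATE INDEX hist_tbl_recorded_on ON hist_tbl (recorded_on)")
conn.executemany("INSERT INTO hist_tbl VALUES (?, ?, ?)", [
    ("a", "x", "2006-05-01"),
    ("b", "y", "2008-11-15"),
    ("c", "z", "2021-02-03"),
])


def keys_between(start, end):
    """Keys recorded in the half-open range [start, end), e.g. one year."""
    return [k for (k,) in conn.execute(
        "SELECT key FROM hist_tbl"
        " WHERE recorded_on >= ? AND recorded_on < ?"
        " ORDER BY recorded_on",
        (start, end))]


fy_2008 = keys_between("2008-01-01", "2009-01-01")
everything = keys_between("0000-01-01", "9999-12-31")
print(fy_2008, everything)
```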