Google BigQuery Legacy Syntax Help Needed

I'm having trouble converting a Google BigQuery statement from Standard SQL
to Legacy SQL. For context, I have posted the Standard SQL and the respective table schema.
In a nutshell, the code below selects the 'latest' (AS-IS) version of a
Product Hierarchy for reporting. This was done using STRUCTs in Standard SQL.
I'm not sure how to do this in Legacy SQL.
Any help would be greatly appreciated!
clbarrineau
Standard SQL Example
SELECT STR_NBR
     , SKU
     , SKU_CRT_DT
     , DS.*
     , (
         SELECT AS STRUCT
                X.*
         FROM (
           SELECT *
                , ROW_NUMBER() OVER(ORDER BY EFF_BGN_DT DESC) AS ROW_NUM
           FROM SLS.PROD_HIER
         ) AS X
         WHERE ROW_NUM = 1
       ) AS P_HIER
FROM `XXXX.YYYY.SKU_STR_SLS_20141201` SLS
   , UNNEST(DAILY_SALES) AS DS;
Schema Definition
STR_NBR--------------------------------STRING-----------NULLABLE
SKU------------------------------------INTEGER----------NULLABLE
SKU_CRT_DT-----------------------------DATE-------------NULLABLE
DAILY_SALES----------------------------RECORD-----------REPEATED
DAILY_SALES.SLS_DT---------------------DATE-------------NULLABLE
DAILY_SALES.*(many other attributes) --XXXX-------------XXXX
PROD_HIER------------------------------RECORD-----------REPEATED
PROD_HIER.eff_bgn_dt-------------------DATE-------------NULLABLE
PROD_HIER.*(many other attributes) ----XXXX-------------XXXX

A couple of suggestions, though you may also want to contact Tableau's support to ask about the status of standard SQL support. In some tools, it's possible to force standard SQL by putting #standardSQL at the top of the query.
For legacy SQL, instead of the comma operator with UNNEST, you'll need to use FLATTEN. Something like FLATTEN(XXXX.YYYY.SKU_STR_SLS_20141201, DAILY_SALES.SLS_DT), for example. Since you want to compute row numbers before flattening, though, you may need to apply FLATTEN to the subquery itself. My legacy SQL is a bit rusty, so I don't want to lead you astray with a non-functional query, but take a look at some of the other SO questions about FLATTEN to see how it's used.

Related

SQL Pivot: Can I make the pivot list dynamic without using stored proc?

Here is my SQL containing a pivot:
select * from (
    select
        [event_id]
        ,[attnum]
        ,PollId
        ,PollResponseDisplayText
    from
        [dbo].[v_PollReportDetails2]
) as tmp
pivot (max(tmp.[PollResponseDisplayText])
for tmp.PollId in ([703],[805],[806],[807],[808],[809])) as pivot_table
I want to change the pivot list to be something like this:
for tmp.PollId in (select PollId
from Polls
where event_id = 100100
and isVisible = 1)) as pivot_table
I can do this all in a stored proc and dynamically generate a SQL statement to feed into an execute() statement, but I need to be able to do this in a view.
Maybe I'm looking in the wrong places, but I can't seem to find official documentation that actually says this can't be done. After hours of experimenting and discussion, though, it appears that it cannot be done this way.
@aeoluseros said it best in his comment:
syntax of PIVOT clause requires these distinct values to be known at
query design time. So you cannot use that in view
Logically, this makes sense: as rows are added to the table over time, a dynamic pivot list could add columns to the view's result dynamically.
Thanks to @Damien_The_Unbeliever for the additional clarification; see here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15#syntax
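Since the PIVOT list has to be written out literally, one workaround outside a view is to generate the statement in client code, much like the stored-proc approach mentioned in the question. The following is only a minimal sketch: the function name build_pivot_sql is hypothetical, the PollId values are assumed to have been fetched beforehand (for example from the Polls table), and it does not make the pivot usable inside a view.

```python
def build_pivot_sql(poll_ids):
    """Return the pivot statement with a literal IN list built from poll_ids.

    Hypothetical helper: poll_ids is assumed to come from an earlier query
    such as "select PollId from Polls where event_id = 100100 and isVisible = 1".
    """
    if not poll_ids:
        raise ValueError("PIVOT needs at least one value in its IN list")
    # int() rejects anything that is not a plain number, so no arbitrary
    # text can end up inside the generated statement.
    columns = ",".join(f"[{int(poll_id)}]" for poll_id in poll_ids)
    return (
        "select * from (\n"
        "    select [event_id], [attnum], PollId, PollResponseDisplayText\n"
        "    from [dbo].[v_PollReportDetails2]\n"
        ") as tmp\n"
        "pivot (max(tmp.[PollResponseDisplayText])\n"
        f"for tmp.PollId in ({columns})) as pivot_table"
    )


if __name__ == "__main__":
    print(build_pivot_sql([703, 805, 806]))
```

The generated text still has to be executed as an ad hoc statement, so the limitation for views described above remains.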

BigQuery: Querying repeated fields

I'm trying to use the following query to get the rows for which the event names are equal to: EventGamePlayed, EventGetUserBasicInfos or EventGetUserCompleteInfos
select *
from [com_test_testapp_ANDROID.app_events_20170426]
where event_dim.name in ("EventGamePlayed", "EventGetUserBasicInfos", "EventGetUserCompleteInfos");
I'm getting the following error: Cannot query the cross product of repeated fields event_dim.name and user_dim.user_properties.value.index.
Is it possible to make it work without flattening the result?
Also, I'm not sure why the error is talking about the "user_dim.user_properties.value.index" field.
The error is due to the SELECT *, which includes all columns. Rather than using legacy SQL, try this using standard SQL, which doesn't have this problem with repeated field cross products:
#standardSQL
SELECT *
FROM com_test_testapp_ANDROID.app_events_20170426
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE event_dim.name IN ("EventGamePlayed", "EventGetUserBasicInfos", "EventGetUserCompleteInfos");
You can read more about working with repeated fields/arrays in the Working with Arrays topic. If you are used to using legacy SQL, you can read about differences between legacy and standard SQL in BigQuery in the migration guide.
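To make the effect of CROSS JOIN UNNEST(event_dim) more concrete, here is a plain Python analogy, not BigQuery code: each row carries a repeated event_dim field, and the join yields one output row per element, which the WHERE clause then filters. The sample rows and the user_id field are invented for illustration.

```python
EVENT_NAMES = {"EventGamePlayed", "EventGetUserBasicInfos", "EventGetUserCompleteInfos"}


def unnest_events(rows, wanted=EVENT_NAMES):
    """Yield one (row, event) pair per matching element of the repeated field."""
    for row in rows:
        for event in row["event_dim"]:      # CROSS JOIN UNNEST(event_dim)
            if event["name"] in wanted:     # WHERE event_dim.name IN (...)
                yield row, event


# Invented sample data: user_id is a hypothetical column.
rows = [
    {"user_id": "u1", "event_dim": [{"name": "EventGamePlayed"}, {"name": "EventOther"}]},
    {"user_id": "u2", "event_dim": [{"name": "EventOther"}]},
    {"user_id": "u3", "event_dim": [{"name": "EventGetUserBasicInfos"},
                                    {"name": "EventGamePlayed"}]},
]

matches = [(row["user_id"], event["name"]) for row, event in unnest_events(rows)]

if __name__ == "__main__":
    for user_id, name in matches:
        print(user_id, name)
```

A row with two matching events appears twice in the output, once per element, and a row with no matching events disappears entirely.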

SQL-92 (Filemaker): How can I UPDATE a list of sequential numbers?

I need to reassign all SortIDs, starting from 1 up to MAX(SortID), for a subset of records in table Beleg, using SQL-92, after one of the SortIDs has changed (for example from 444 to 444.1). I have tried several ways (for example SET @a:=0; UPDATE table SET field=@a:=@a+1 WHERE whatever='whatever' ORDER BY field2), but they didn't work, as these solutions all need a specific SQL dialect, such as SQL Server or Oracle.
The SQL that I use is SQL-92 as implemented in FileMaker (INSERT and UPDATE are available, but nothing fancy).
Thanks for any hint!
Gary
From what I know, SQL-92 is a standard and not a language. So you can say you are using T-SQL, which is mostly SQL-92 compliant, but you can't say "I program SQL Server in SQL-92". The same applies to FileMaker.
I suppose you are trying to update your table through ODBC? The UPDATE statement looks OK, but there are no variables in FileMaker SQL (and I am not sure that using a variable inside the query would give the result you expect; I think it would set SortID to 1 in every row). You are thinking of something like the window functions in T-SQL, with row(), but I do not think this functionality is available.
The easiest solution is to use FileMaker itself: resetting the numbering for a column is a trivial task that takes seconds. Do you need help with this?
Edit:
I was referring to the T-SQL functions rank() and row_number(); there is no row() function in T-SQL.
I finally got the answer from Ziggy Crueltyfree Zeitgeister on the Database Administrators copy of my question.
He suggested breaking this down into multiple steps, using a temporary table to store the results:
CREATE TABLE sorting (sid numeric(10,10), rn int);
INSERT INTO sorting (sid, rn)
SELECT SortID, RecordNumber FROM Beleg
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210
ORDER BY SortID;
UPDATE Beleg SET SortID = (SELECT rn FROM sorting WHERE sid=Beleg.SortID)
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210;
DROP TABLE sorting;
Of course! I just keep the table definition in FileMaker (letting FileMaker do the type coercion this way) and fill it and delete from it with my function RenumberSortID().
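The multi-step idea can be tried out against SQLite from Python. This is a sketch under several assumptions, not the FileMaker implementation: SQLite stands in for FileMaker's SQL engine, the row numbers are computed in Python instead of coming from RecordNumber, strftime replaces Year(), and the sample rows are invented. The function name renumber_sort_ids is also hypothetical.

```python
import sqlite3

# Same subset as in the statements above, written in SQLite's dialect (assumption).
SUBSET = "strftime('%Y', Valuta) = '2016' AND Ursprungskonto = 1210"


def renumber_sort_ids(conn):
    """Reassign SortID to 1..n within the subset, via a helper table."""
    ordered = conn.execute(
        f"SELECT SortID FROM Beleg WHERE {SUBSET} ORDER BY SortID"
    ).fetchall()
    conn.execute("CREATE TEMP TABLE sorting (sid REAL, rn INTEGER)")
    # Store each old SortID together with its new sequential number.
    conn.executemany(
        "INSERT INTO sorting (sid, rn) VALUES (?, ?)",
        [(sid, rn) for rn, (sid,) in enumerate(ordered, start=1)],
    )
    conn.execute(
        "UPDATE Beleg SET SortID = "
        "(SELECT rn FROM sorting WHERE sid = Beleg.SortID) "
        f"WHERE {SUBSET}"
    )
    conn.execute("DROP TABLE sorting")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beleg (SortID REAL, Valuta TEXT, Ursprungskonto INTEGER)")
conn.executemany("INSERT INTO Beleg VALUES (?, ?, ?)", [
    (1, "2016-01-05", 1210),
    (444, "2016-03-01", 1210),
    (444.1, "2016-03-02", 1210),   # the record whose SortID was changed
    (445, "2016-04-01", 1210),
    (7, "2015-12-31", 1210),       # outside the subset, must stay untouched
])
renumber_sort_ids(conn)
result = [row[0] for row in conn.execute(
    f"SELECT SortID FROM Beleg WHERE {SUBSET} ORDER BY SortID")]

if __name__ == "__main__":
    print(result)
```

The lookup by old SortID only works if the SortIDs within the subset are unique, which the question implies but does not state.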

Sub-Queries in Sybase SQL

We have an application which indexes data using user-written SQL statements. We place those statements within parentheses so we can limit the query to certain criteria. For example:
select * from (select F_Name from table_1)q where ID > 25
However, we have discovered that this format does not work with a Sybase database: it reports a syntax error around the parentheses. I've tried playing around on a test instance but haven't been able to find a way to achieve this result. I'm not directly involved in the development and my SQL knowledge is limited. I'm assuming the 'q' is there to give the subresult an alias for the application to use.
Does Sybase have a specific syntax? If so, how could this query be adapted for it?
Thanks in advance.
Sybase ASE is case sensitive with respect to all identifiers, so the query should work as long as the case matches.
As @HannoBinder's query shows, select id from ... is not the same as select ID from ..., so make sure the case is right.
Also make sure that the column ID is returned by the Q query so that it can be used in the where clause.
If the table and column names are in upper case, the following query should work:
select * from (select F_NAME, ID from TABLE_1) Q where ID > 25
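The second point, that the derived table has to return ID before the outer where clause can use it, can be checked quickly with SQLite from Python. This is only an illustration under assumptions: SQLite is not Sybase ASE and treats identifiers case-insensitively, so it reproduces the missing-column problem but not the case-sensitivity one, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_1 (ID INTEGER, F_NAME TEXT)")
conn.executemany("INSERT INTO TABLE_1 VALUES (?, ?)",
                 [(10, "Ann"), (30, "Bob"), (40, "Cy")])


def run(sql):
    """Return the rows, or the error message if the statement is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# The inner query does not return ID, so the outer filter cannot see it.
broken = run("select * from (select F_NAME from TABLE_1) Q where ID > 25")
# Returning ID from the inner query makes it visible to the outer where clause.
fixed = run("select * from (select F_NAME, ID from TABLE_1) Q where ID > 25")

if __name__ == "__main__":
    print(broken)
    print(fixed)
```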

BigQuery query creation without variables?

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web browser query tool.
There doesn't appear to be any way to create, use or Set/Declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? Java API?
It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data`.samples.shakespeare
);
There is currently no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.
It's not elegant, and it's a pain, but...
The way we handle it is with a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
I have opened a feature request asking for "Dynamic SQL" capabilities.
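A placeholder-replacement script of that kind might look roughly like the sketch below. It is an assumption about the shape of such a script, not the poster's actual code: it uses the standard library's string.Template, the names QUERY_TEMPLATE and render_query are hypothetical, and the step that sends the rendered text to BigQuery is left out.

```python
from string import Template

# $year and $limit are the "variable placeholders"; the table is the public
# dataset already used in the scripting example above.
QUERY_TEMPLATE = Template("""
SELECT name, SUM(number) AS total
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = $year
GROUP BY name
ORDER BY total DESC
LIMIT $limit
""")


def render_query(year, limit):
    """Fill in the placeholders; int() keeps arbitrary text out of the SQL."""
    return QUERY_TEMPLATE.substitute(year=int(year), limit=int(limit))


if __name__ == "__main__":
    # The rendered string would then be submitted through the API.
    print(render_query(2017, 100))
```

Template.substitute raises KeyError when a placeholder is left without a value, which catches a forgotten variable before the query is sent.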
If you want to avoid BQ scripting, you can sometimes use an idiom which utilizes WITH and CROSS JOIN.
In the example below:
- the events table contains some timestamped events
- the reports table contains occasional aggregate values of the events
- the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by:
- introducing a state temp table that looks at a target table for aggregate results to determine parameters (params) for the actual query
- CROSS JOINing the params with the actual query, allowing the param row's columns to be used to constrain the query
This query will repeatably return the same results until the results themselves are appended to the reports table.
WITH state AS (
  SELECT
    -- what was the newest report's ending time?
    COALESCE(
      (SELECT MAX(report_end_ts) FROM `x.y.reports`),
      TIMESTAMP("2019-01-01")
    ) AS latest_report_ts,
    ...
),
params AS (
  SELECT
    -- look for events since end of last report
    latest_report_ts AS event_after_ts,
    -- and go until now
    CURRENT_TIMESTAMP() AS event_before_ts
  FROM state
)
SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts
This approach is useful for BigQuery scheduled queries.
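The WITH and CROSS JOIN idiom can be exercised locally with SQLite from Python. The sketch rests on assumptions: SQLite stands in for BigQuery, timestamps are plain integers, "now" is passed in as a bound parameter instead of CURRENT_TIMESTAMP(), and the table contents and the function name next_report are invented.

```python
import sqlite3

INCREMENTAL_QUERY = """
WITH state AS (
    -- newest report's ending time, or 0 if there is no report yet
    SELECT COALESCE((SELECT MAX(report_end_ts) FROM reports), 0) AS latest_report_ts
),
params AS (
    SELECT latest_report_ts AS event_after_ts,
           :now             AS event_before_ts
    FROM state
)
SELECT MIN(event_ts) AS report_begin_ts,
       MAX(event_ts) AS report_end_ts,
       COUNT(1)      AS event_count,
       SUM(errors)   AS error_total
FROM events
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts
"""


def next_report(conn, now):
    """Aggregate only the events newer than the latest stored report."""
    return conn.execute(INCREMENTAL_QUERY, {"now": now}).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_ts INTEGER, errors INTEGER)")
conn.execute("CREATE TABLE reports (report_begin_ts INTEGER, report_end_ts INTEGER, "
             "event_count INTEGER, error_total INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 0), (2, 1), (3, 0), (5, 2)])
conn.execute("INSERT INTO reports VALUES (1, 2, 2, 1)")  # events 1 and 2 already reported

first = next_report(conn, now=10)
again = next_report(conn, now=10)                         # same result until appended
conn.execute("INSERT INTO reports VALUES (?, ?, ?, ?)", first)
after_append = next_report(conn, now=10)                  # nothing new left to report

if __name__ == "__main__":
    print(first, again, after_append)
```

Once the first result has been appended to reports, the state CTE moves the lower bound forward, so rerunning the query does not produce a duplicate aggregate row.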