BigQuery - Issue with TABLE_DATE_RANGE() function

Background:
I have two datasets on BigQuery.
Dataset 1 is named '12345678' with tables having the names 'ga_sessions_yyyymmdd'. For example, the table names are like ga_sessions_20140721, ga_sessions_20150413 etc.
Dataset 2 is named 'DestinationTables'. The table names are in the format yyyymmdd. For example, 20140721, 20150413 etc.
Problem:
Using TABLE_DATE_RANGE(), I ran the following query on Dataset 1:
SELECT
[fullVisitorId] AS [fullVisitorId]
FROM TABLE_DATE_RANGE([12345678.ga_sessions_],TIMESTAMP('2014-07-21'),TIMESTAMP('2014-07-25'));
This query runs successfully.
I now run a similar query on Dataset 2:
SELECT
[fullVisitorId] AS [fullVisitorId]
FROM TABLE_DATE_RANGE([DestinationTables.],TIMESTAMP('2014-07-21'),TIMESTAMP('2014-07-25'));
However, this errors out with the message:
Error: Can't parse table: DestinationTables
Why is this happening? Any insight on this would be greatly appreciated.
Thanks in advance!

The syntax for identifying a dataset and a table prefix is correct in your first example:
[12345678.ga_sessions_]
And as explained in the docs for this function, it will expand to cover tables (in dataset 12345678) of the format:
ga_sessions_yyyymmdd
However, in your second example, the identifier stops with just a dot where it should continue to identify a table prefix. I think the issue is that you have no prefix and so the naked dot on the end of the string is confusing the interpreter.
You may need to change your tables to have some kind of prefix, even if it's just an underscore, so that you can properly specify the prefix when calling TABLE_DATE_RANGE.
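To make the expansion concrete, here is a minimal Python sketch of the naming rule described above: prefix plus yyyymmdd for each day in the range. It is only a model of the documented behaviour, not BigQuery's implementation, and the function name expand_table_date_range is made up for illustration.

```python
from datetime import date, timedelta


def expand_table_date_range(prefix, start, end):
    """Hypothetical helper: return prefix + YYYYMMDD for each day from start to end, inclusive."""
    day_count = (end - start).days + 1
    return [
        prefix + (start + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(day_count)
    ]


# Dataset 1: the prefix is "ga_sessions_", so the names line up with the real tables.
names = expand_table_date_range("ga_sessions_", date(2014, 7, 21), date(2014, 7, 25))
print(names)

# Dataset 2: the tables are bare dates, so the prefix would have to be empty.
bare = expand_table_date_range("", date(2014, 7, 21), date(2014, 7, 22))
print(bare)
```

With an empty prefix the expanded names would match the tables in DestinationTables, but the query never gets that far, because the table reference ending in a bare dot fails to parse first. That is why adding a prefix to the table names is suggested.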

Related

Can't query one record in BigQuery table, but can query others

I export Google Workspace logs to BigQuery. There are a small number of top-level records and then many nested groups of records. I can query the top level of records and most sub-levels fine but I can't select the groups records. select group_id,admin.user_email,admin.group_email works fine, for example.
But when I try to run a very similar query on the Groups records it fails with Syntax error: Expected end of input but got keyword GROUPS
SELECT
group_id,
groups.group_email
FROM
`workspace-analytics.workspace_prod.activity`
WHERE
groups.group_email='group@domain.com'
LIMIT
100;
What am I doing wrong? Why does this record in particular refuse to work the way the others do?
Answer from @MatBailie, posting it as a wiki answer:
The error message tells you that GROUPS is a keyword. If you quote it, BigQuery will realise it's a reference and not a keyword: `groups`.group_email.
The other query works because admin isn't a keyword. Imagine you had a column named from: you couldn't do SELECT from FROM table without confusing the shit out of the parser, but SELECT `from` FROM table isn't ambiguous at all. You can CHOOSE to quote all references regardless, but if they're keywords then they MUST be quoted.
Make sure you're quoting using backticks, the same ones you use in dataset names.

problem with auto-created bigquery field name that contains "."

I used a simple ETL tool to import QuickBooks data into Google BigQuery. Great! The only notable limitation on this step is that I can't do any translation ... more like it's an EL tool.
That said, now I want to query the imported table. It's no problem at all for correctly named fields in BigQuery (like txndate). However, some of the fields are of the format abc.xyz (e.g., deposittoaccountref.value) and can't be queried. The "." in the name is apparently confusing BigQuery.
If I dump the whole table, I can see the "." name fields and the associated values.
However, I can't create a custom query against those fields. They don't show up in the auto-generated schema that allows one to drag and drop field names into the query.
Also, I tried to manually type the field name in and received the following error message: Missing column alias. Any expression in a SELECT statement that is not a column from the original data source must be followed by an alias, for example: AS my_alias.
I've tried quoting the field name and bracketing the field name but they still throw the same error.
I traced back to QB API documentation and this is indeed how Intuit labels the fields.
Finally, as long as I can query these fields at all, I can rename them to eliminate the "." problem.
Please advise and thank you!
OK, I solved this myself.
The way to fix this within the BigQuery query editor is to manually type in the field name (since it is not available in the auto-generated schema) and to wrap the field name in parentheses.
e.g., deposittoaccountref.value becomes (deposittoaccountref.value)
Now, this will label the column in the result set as "value", so you may want to relabel the data field to something without the ".". For example, I took the original
deposittoaccountref.value and modified it to
(deposittoaccountref.value) as deposittoaccountref_value
Hopefully, this will help someone else in the future!
The above answer works when there is a single dot in the name, as in the example.
However, if there are multiple dots, e.g., "line.value.amount", then the parentheses trick doesn't work.
I've tried nesting the parentheses in different ways, to no avail:
e.g., (line.value.amount) = error, ((line.value).amount) = error, (line.(value.amount)) = error

Add Column to SAS via Proc SQL Statement

I haven't been able to find this exact question - but it seems simple enough that it's likely been asked before. I apologize in advance if my search skills aren't up to par...
Anyhow, I am trying to create a 'source_flag' column, appended to several tables I'm creating. Basically, each year and payment type has its own table. I can query and manipulate each table individually, but I'm joining them all together (full join) at the end of the process. I want to create a column with each observation equal to the table the data came from.
For example, I want to join six tables:
2019_PD
2020_PD
2019_PB
2020_PB
2019_PN
2020_PN
All I want to do is, in the query for each table, create a column assigning the table name to the entire row, so that I know where each row came from.
proc sql;
create table 2020_PD as select
...,
...,
...,
"2020_PD" as source_flg,
.
.
.
;
quit;
Right now SAS is trying to find a field called 2020_PD - which obviously doesn't exist. Is there an easy way to do this within the proc statement? I'm not trying to add additional data steps since I'm doing this with too many tables to make that viable.
Thank you!!
SQL uses single quotes to delimit strings. So use:
'2020_PD' as source_flg,
The double quotes are interpreted as escape characters for an identifier, which is why you are getting an unknown column error.

Query with TABLE_QUERY is getting more tables than expected

I have some tables by day and by hour, called 2015_09_01_00, 2015_09_01_01..., 2015_09_02_00, 2015_09_02_01, etc.
I also created a virtual table for 2015_09_01, 2015_09_02, etc, aggregating them respectively by day.
So, in this context, when I want to query some virtual tables (some days) I have to execute this query for example:
SELECT fields FROM TABLE_QUERY(dataset, 'REGEXP_MATCH(table_id, r"(2015_09_01|2015_09_02)$")')
It gives a "network unreachable" error. I guess it is getting confused between the original tables and the virtual ones, since the names are related.
However, if I execute:
SELECT table_id FROM dataset.__TABLES_SUMMARY__ WHERE REGEXP_MATCH(table_id, r"(2015_09_01|2015_09_02)$")
2015_09_01
2015_09_02
it seems that the filter is created successfully.
So, what am I doing wrong here?
Thanks for your help in advance.
Above example worked for me.
Most likely in your dataset you have other tables with same pattern as you are trying to match.
Try to restrict your regex, for example as below:
SELECT fields FROM TABLE_QUERY(dataset, 'REGEXP_MATCH(table_id, r"^(2015_09_01|2015_09_02)$")')
Did you define the daily views to reference themselves? Your use of $ suggested you were thinking about this, but to make sure -- they should reference only the hourly tables.
For example, if you named a view 2015_09_14 when it was based on a TABLE_QUERY for
'REGEXP_MATCH(table_id, r"2015_09_14")'
then it would reference itself, which does not work. (This ought to be a clearer error though.)
If you define the view with a TABLE_QUERY that can't match itself
'REGEXP_MATCH(table_id, r"2015_09_14_\d\d")'
then it should work. If you select your view's "Details" what is the query that defined it?
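Both answers come down to which table names a pattern can match. Here is a small Python sketch that checks the patterns from this thread against a sample list of names. It assumes REGEXP_MATCH does a partial (search-style) match, which is what the advice to add ^ implies, and it uses Python's re module as a stand-in for BigQuery's regex engine. The backup_2015_09_01 table and the matching helper are made up for illustration.

```python
import re

# Sample table names: daily views, hourly tables, and one hypothetical extra table.
tables = [
    "2015_09_01",
    "2015_09_01_00",
    "2015_09_01_01",
    "2015_09_02",
    "backup_2015_09_01",
]


def matching(pattern, names):
    """Return the names that the pattern matches anywhere (search semantics)."""
    return [name for name in names if re.search(pattern, name)]


# Anchored only at the end: a table with extra leading text still matches.
end_only = matching(r"(2015_09_01|2015_09_02)$", tables)

# Anchored at both ends: only the two daily names match.
both_ends = matching(r"^(2015_09_01|2015_09_02)$", tables)

# Unanchored pattern in a view definition: the view's own name matches too.
self_matching = matching(r"2015_09_01", tables)

# Pattern that requires the hour suffix: only the hourly tables match.
hourly_only = matching(r"2015_09_01_\d\d", tables)

for label, result in [
    ("end_only", end_only),
    ("both_ends", both_ends),
    ("self_matching", self_matching),
    ("hourly_only", hourly_only),
]:
    print(label, result)
```

The third case is the self-reference problem: a view named 2015_09_01 defined with the unanchored pattern would select itself along with the hourly tables.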

TABLE_QUERY fails to handle dataset names starting with numbers

I'm using BigQuery with a dataset called '87891428' containing daily tables. I'm trying to query a date range using the TABLE_DATE_RANGE function:
SELECT avg(foo)
FROM (
TABLE_DATE_RANGE(87891428.a_abc_,
TIMESTAMP('2014-09-30'),
TIMESTAMP('2014-10-19'))
)
But this leads to a very explicit error message:
Error: Encountered "" at line 3, column 21. Was expecting one of:
I have the feeling that TABLE_DATE_RANGE doesn't like a dataset name starting with a number, because when I copy a few tables into a new dataset called 'test' the query runs properly. Has anyone already encountered this issue, and if so, what is the best workaround (as far as I know you can't rename a dataset)?
The fix for this is to use brackets around the dataset name and table prefix:
SELECT avg(foo)
FROM (
TABLE_DATE_RANGE([87891428.a_abc_],
TIMESTAMP('2014-09-30'),
TIMESTAMP('2014-10-19'))
)