Query with TABLE_QUERY is getting more tables than expected - google-bigquery

I have tables split by day and hour, named 2015_09_01_00, 2015_09_01_01, ..., 2015_09_02_00, 2015_09_02_01, etc.
I also created a virtual table (view) for each day (2015_09_01, 2015_09_02, etc.) that aggregates that day's hourly tables.
So when I want to query several of these daily views, I run a query like this:
SELECT fields FROM TABLE_QUERY(dataset, 'REGEXP_MATCH(table_id, r"(2015_09_01|2015_09_02)$")')
It fails with a "network unreachable" error. I guess it is mixing up the original tables and the virtual ones, since their names are related.
However, if I execute:
SELECT table_id FROM dataset.__TABLES_SUMMARY__ WHERE REGEXP_MATCH(table_id, r"(2015_09_01|2015_09_02)$")
2015_09_01
2015_09_02
These are exactly the tables I expect, so the filter itself seems to work.
So, what am I doing wrong here?
Thanks for your help in advance.

The example above worked for me.
Most likely your dataset contains other tables whose names match the pattern you are using.
Try restricting your regex, for example:
SELECT fields FROM TABLE_QUERY(dataset, 'REGEXP_MATCH(table_id, r"^(2015_09_01|2015_09_02)$")')
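To see what the extra ^ changes, here is a small Python sketch. It assumes REGEXP_MATCH does an unanchored, search-style match (so Python's re.search stands in for it), and the table ids, including backup_2015_09_01, are made up for illustration.

```python
import re

# Hypothetical table ids: hourly tables, daily views, and one stray name.
table_ids = [
    "2015_09_01_00",
    "2015_09_01_01",
    "2015_09_01",
    "2015_09_02",
    "backup_2015_09_01",
]


def matching(pattern, ids):
    """Return the ids in which the pattern matches anywhere in the string."""
    return [t for t in ids if re.search(pattern, t)]


# Anchored only at the end: any id that merely ends with the date still matches.
loose = matching(r"(2015_09_01|2015_09_02)$", table_ids)

# Anchored at both ends: only the exact daily names match.
strict = matching(r"^(2015_09_01|2015_09_02)$", table_ids)

print(loose)
print(strict)
```

With only $ the stray backup_2015_09_01 is still picked up; with ^ and $ together only the two daily names remain.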

Did you define the daily views so that they reference themselves? Your use of $ suggests you were thinking about this, but to make sure: they should reference only the hourly tables.
For example, if you named a view 2015_09_14 and based it on a TABLE_QUERY for
'REGEXP_MATCH(table_id, r"2015_09_14")'
then it would reference itself, which does not work. (This ought to produce a clearer error, though.)
If you define the view with a TABLE_QUERY that can't match itself
'REGEXP_MATCH(table_id, r"2015_09_14_\d\d")'
then it should work. If you select your view's "Details", what is the query that defined it?
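A quick way to check whether a view's pattern can match the view's own name is to try it against the name directly. This is only a sketch: re.search stands in for REGEXP_MATCH (an assumption), and the list of 24 hourly table names is made up.

```python
import re

# Hypothetical dataset contents: 24 hourly tables plus the daily view itself.
hourly = ["2015_09_14_%02d" % h for h in range(24)]
view_name = "2015_09_14"
candidates = hourly + [view_name]


def selected(pattern):
    """Return the table ids the pattern would pick up."""
    return [t for t in candidates if re.search(pattern, t)]


# The bare date also matches the view's own name, so the view would read itself.
self_referencing = view_name in selected(r"2015_09_14")

# Requiring the _HH suffix excludes the view and keeps only the hourly tables.
safe_selection = selected(r"2015_09_14_\d\d")

print(self_referencing)
print(view_name in safe_selection, len(safe_selection))
```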

Related

Why does the number of returned samples where name='keyword' not match the number of observed samples with 'keyword' in the table?

I have a Postgres table with the columns [id(uuid), name(str), arg_name(str), measurements(list), run_id(uuid), parent_id(uuid)] and a total of 237K entries.
When I want to filter for specific measurements I can use 'name', but for the majority of entries in the table 'name' == 'arg_name', so both map to the same sample.
In my particular case I am interested in retrieving samples whose 'name'='TimeM12nS' and whose 'arg_name'='Time'. These two attributes point to the same samples when I visually inspect the table in pgAdmin. That is to say, all entries that have arg_name='Time' also have name='TimeM12nS', and vice versa.
It's obvious there's a problem because the number of returned samples is not the same. I first noticed the problem using the Django ORM, but it is also present when I query the DB using pgAdmin.
SELECT *
FROM TableA
WHERE name='TimeM12nS'
returns 301 entries (name='TimeM12nS' and arg_name='Time' in all cases)
BUT the query:
SELECT *
FROM TableA
WHERE arg_name='Time'
returns 3945 (name='TimeM12nS' and arg_name='Time' in all cases)
I am completely stumped. Can anyone shed some light on what's happening here?
EDIT:
I should add that the entries returned when querying by 'arg_name' include the 301 entries returned when querying by 'name'.
First let me say thank you to everyone who pitched in ideas to solve this conundrum and especially to JGH for the solution (found in the comments of the original post).
Indeed, the problem was an indexing issue. After re-indexing, both queries return the same number of entries, 3945, as expected.
In Postgres, a table can be re-indexed through pgAdmin by navigating to Databases > 'database_name' > Schemas > Tables, right-clicking the table name, selecting Maintenance, and pressing the REINDEX button,
or more simply by running the following command:
REINDEX TABLE table_name
Postgres Re-Indexing Docs
Without access to the database, it's not possible to give a definitive answer. All I can provide is the next query that I would use in this case.
SELECT COUNT(*), LENGTH(name), name, arg_name
FROM TableA
WHERE arg_name='Time'
GROUP BY name, arg_name;
This should show you any differences in the name column that you aren't able to see. The length of that string could also be informative.
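As an illustration of what that diagnostic query surfaces, here is a sketch that runs it against an in-memory SQLite database instead of Postgres (an assumption made so it can run anywhere). The rows, including the one with a trailing space, are invented. Note that in this thread the actual cause turned out to be the index, not hidden characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (name TEXT, arg_name TEXT)")

# Invented sample rows; the third name carries a trailing space that is
# easy to miss in a grid view.
rows = [
    ("TimeM12nS", "Time"),
    ("TimeM12nS", "Time"),
    ("TimeM12nS ", "Time"),
]
conn.executemany("INSERT INTO TableA VALUES (?, ?)", rows)

result = conn.execute(
    """
    SELECT COUNT(*), LENGTH(name), name, arg_name
    FROM TableA
    WHERE arg_name = 'Time'
    GROUP BY name, arg_name
    ORDER BY LENGTH(name)
    """
).fetchall()

# repr() makes otherwise invisible differences in the name visible.
for count, length, name, arg_name in result:
    print(count, length, repr(name), arg_name)
```

Two groups come back instead of one, and the differing LENGTH(name) values point at the odd rows.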

HQL to SQL translation - defining meaningful aliases for columns

I have a tricky problem:
When I translate HQL to SQL (using org.hibernate.hql.spi.QueryTranslator) I get valid SQL.
Hibernate: Parse/Translate HQL FROM part to get pairs class alias, class name
It works as expected!
But my problem is the translation of the column aliases!
HQL to SQL
*) HQL for entity:Base
SELECT Base FROM Base Base
leads to:
*) SQL for entity:Base
select base0_.iD as id1_0_, base0_.comment as comment2_0_, base0_.creationDate as creation3_0_ from ...
You can see my problem:
The column aliases are not intuitive names:
base0_.creationDate --> creation3_0_
Expected:
base0_.creationDate --> creationDate
Use cases:
Creating views for each entity automatically
Better readability for our database administrators
I have spent hours and hours debugging to find a way to influence this mechanism.
I hope someone has an idea how to solve this problem (without hacking)!
I know this is not a conventional question, so I would be glad if someone has an idea ;-)
Thanks, in advance
Andy
The problem you are facing relates to Hibernate's naming strategy.
At https://www.baeldung.com/hibernate-naming-strategy you can find a deeper explanation of how to customize the generated column names by implementing the PhysicalNamingStrategy interface.
Thanks for your help, but that was not the solution: now I can rename every column, table, or schema, but not the aliases in a select query.
I still get queries like this:
select base0_.i_d as i_d1_0_, ....
from base base0_
left outer join campaign base0_1_ on base0_.i_d = base0_1_.i_d
but I would like to have a query without all the "1_0" suffixes:
select base.i_d as i_d, ....
from base base
left outer join campaign campaign on base.i_d = campaign.i_d ...
I want to use the translated queries to create views for all entities:
resulting view and columns
As you can see, the column names are not very useful ;-)
Does anyone have an idea that works without string substitution or modifying the SQL AST?

BigQuery _TABLE_SUFFIX is empty/missing tables

Thanks in advance for taking a look at this; I hope someone can help.
I am creating tables with a fixed prefix plus a dynamic suffix,
something like name123456, where name is fixed/static and 123456 is an incrementing numeric value.
I currently have multiple tables like:
name123456
name123457
name123458
And I am trying to dynamically query the most recent one (which is the one with the biggest suffix), in the given example it's "name123458".
When running the query below in the BigQuery UI:
#standardsql
select array_agg(distinct _TABLE_SUFFIX) from `project.dataset.name*`
I get no result, and (as far as I understand) I should get all the listed tables above.
I know to get the most recent one I need to use a WHERE clause with max(_TABLE_SUFFIX) but since I am getting an empty _TABLE_SUFFIX I can not get anything from it.
Let me know if more information is required and I'll update as needed.
I found the solution myself, so I'll share it here as an answer. But first, thanks to David and Martin Weitzmann for their time and help.
The reason _TABLE_SUFFIX was ignoring tables and returning nothing was that the tables in the dataset were all empty (schema only).
That's it: _TABLE_SUFFIX ignores empty tables. I hope it helps someone else.
You can't use _TABLE_SUFFIX in your SELECT statement - only in the WHERE clause. But you can instead use metatables to find the most recent one: https://cloud.google.com/bigquery/docs/information-schema-tables
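Once the table names have been listed (for example from the metatables linked above), picking the most recent one could look like the sketch below. The function name latest_table and the sample list are hypothetical. One detail worth noting: comparing suffixes as strings only works while they all have the same length, so the sketch compares them as integers.

```python
def latest_table(table_names, prefix):
    """Return the table whose numeric suffix after `prefix` is largest, or None."""
    suffixed = [
        (int(name[len(prefix):]), name)
        for name in table_names
        if name.startswith(prefix) and name[len(prefix):].isdecimal()
    ]
    return max(suffixed)[1] if suffixed else None


# Hypothetical listing; "name99999" has a shorter suffix on purpose.
tables = ["name123456", "name123457", "name123458", "name99999"]

newest = latest_table(tables, "name")

# A plain string comparison of the suffixes picks the wrong table here.
string_max = max(name[len("name"):] for name in tables)

print(newest)
print(string_max)
```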

BigQuery - Issue with TABLE_DATE_RANGE() function

Background:
I have two datasets on BigQuery.
Dataset 1 is named '12345678' with tables having the names 'ga_sessions_yyyymmdd'. For example, the table names are like ga_sessions_20140721, ga_sessions_20150413 etc.
Dataset 2 is named 'DestinationTables'. The tables names are in the format yyyymmdd. For example, 20140721, 20150413 etc.
Problem:
Using the TABLE_DATE_RANGE(), I ran the following query on Dataset 1:
SELECT
[fullVisitorId] AS [fullVisitorId]
FROM TABLE_DATE_RANGE([12345678.ga_sessions_],TIMESTAMP('2014-07-21'),TIMESTAMP('2014-07-25'));
This query successfully runs.
I now run a similar query on Dataset 2:
SELECT
[fullVisitorId] AS [fullVisitorId]
FROM TABLE_DATE_RANGE([DestinationTables.],TIMESTAMP('2014-07-21'),TIMESTAMP('2014-07-25'));
However, this errors out with the message:
Error: Can't parse table: DestinationTables
Why is this happening? Any insight on this would be greatly appreciated.
Thanks in advance!
The syntax for identifying a dataset and a table prefix is correct in your first example:
[12345678.ga_sessions_]
And as explained in the docs for this function, it will expand to cover tables (in dataset 12345678) of the format:
ga_sessions_yyyymmdd
However, in your second example, the identifier stops at the dot, where it should continue with a table prefix. I think the issue is that you have no prefix, so the bare dot at the end of the string is confusing the parser.
You may need to rename your tables to have some kind of prefix, even if it's just an underscore, so that you can properly specify the prefix when calling TABLE_DATE_RANGE.
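To make the naming convention concrete, here is a rough Python sketch of the table names a prefix plus date range stands for. It only illustrates the prefix + yyyymmdd pattern described above; it is not how BigQuery implements the function, and expand_date_range is a made-up helper name.

```python
from datetime import date, timedelta


def expand_date_range(prefix, start, end):
    """List prefix + YYYYMMDD for every day from start to end, inclusive."""
    days = (end - start).days
    return [
        prefix + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days + 1)
    ]


names = expand_date_range("ga_sessions_", date(2014, 7, 21), date(2014, 7, 25))
for name in names:
    print(name)
```

With a one-character prefix such as an underscore, the same call would produce _20140721 through _20140725, which is the kind of name the suggested rename would give you.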

Rails doesn't respect my select fields using includes

I want to write a query with Active Record, but it never seems to respect what I ask for. Here is my example:
Phonogram.preload(:bpms).includes(:bpms).select("phonograms.id", "bpms.bpm")
This query returns all the fields from phonograms and bpms. The problem is that I need to add 15 more relationships to this query.
I also tried using joins, but it didn't work properly: I have 10 phonograms and it returns just 3.
Has anyone experienced this? How did you solve it?
Cheers.
select with includes does not produce consistent behavior. It appears that if the included association returns no results, select works properly; if it returns results, the select statement has no effect. In fact, it is ignored completely, so your select statement could reference invalid table names and no error would be raised. select with joins produces consistent behavior.
That's why you are better off using joins, like this:
Phonogram.joins(:bpms).select("phonograms.id", "bpms.bpm")
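One thing to keep in mind: joins generates an INNER JOIN, so phonograms without any bpms drop out of the result, which may be why the question saw 3 rows out of 10. The sketch below shows the difference in plain SQL using Python's sqlite3 module. The table and column names follow the question, but the schema (including phonogram_id) and the data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE phonograms (id INTEGER PRIMARY KEY);
    CREATE TABLE bpms (id INTEGER PRIMARY KEY, phonogram_id INTEGER, bpm INTEGER);
    INSERT INTO phonograms (id) VALUES (1), (2), (3);
    INSERT INTO bpms (phonogram_id, bpm) VALUES (1, 120);
    """
)

# INNER JOIN: only phonograms that have at least one bpm row survive.
inner_rows = conn.execute(
    "SELECT phonograms.id, bpms.bpm FROM phonograms "
    "INNER JOIN bpms ON bpms.phonogram_id = phonograms.id "
    "ORDER BY phonograms.id"
).fetchall()

# LEFT OUTER JOIN: every phonogram is kept; missing bpm values come back as NULL.
left_rows = conn.execute(
    "SELECT phonograms.id, bpms.bpm FROM phonograms "
    "LEFT OUTER JOIN bpms ON bpms.phonogram_id = phonograms.id "
    "ORDER BY phonograms.id"
).fetchall()

print(inner_rows)
print(left_rows)
```

If you need every phonogram back, recent Rails versions also offer left_outer_joins, which should give the second shape while still honoring select.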