Thanks in advance for taking a look into this, hope someone can help.
I am creating tables with a fixed prefix plus a dynamic suffix,
something like name123456, in which "name" is fixed/static and 123456 is an incrementing numeric value.
I currently have multiple tables like:
name123456
name123457
name123458
And I am trying to dynamically query the most recent one (which is the one with the biggest suffix), in the given example it's "name123458".
When running the query below in the BigQuery UI:
#standardsql
select array_agg(distinct _TABLE_SUFFIX) from `project.dataset.name*`
I get no result, and (as far as I understand) I should get all the listed tables above.
I know to get the most recent one I need to use a WHERE clause with max(_TABLE_SUFFIX) but since I am getting an empty _TABLE_SUFFIX I can not get anything from it.
Let me know if more information is required and I'll update as needed.
I found the solution by myself so I'll share the solution here as an answer, but first, thanks to David and Martin Weitzmann for their time and help.
The problem with _TABLE_SUFFIX ignoring some tables/not returning something was that the tables I had in the dataset were all empty tables (just schema).
That's it, _TABLE_SUFFIX ignores empty tables, hope it helps someone else.
You can't use _TABLE_SUFFIX in your SELECT statement - only in the WHERE clause. But you can instead use metatables to find the most recent one: https://cloud.google.com/bigquery/docs/information-schema-tables
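A minimal sketch of the metatable idea, assuming the table names have already been fetched with a query like the one stored in LIST_TABLES_SQL (the `project.dataset` path and the `name` prefix are taken from the question; the helper `latest_table` and the extra table `name_backup` are hypothetical, for illustration only). The Python part just picks the name with the largest numeric suffix:

```python
import re

# Query one could run in BigQuery (standard SQL) to list candidate tables.
# It is only shown here as a string; nothing in this sketch contacts BigQuery.
LIST_TABLES_SQL = """
SELECT table_name
FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
WHERE STARTS_WITH(table_name, 'name')
"""


def latest_table(table_names, prefix):
    """Return the table whose numeric suffix is largest, or None if none match."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    best_name, best_suffix = None, -1
    for table_name in table_names:
        match = pattern.fullmatch(table_name)
        # Names without a purely numeric suffix (e.g. name_backup) are skipped.
        if match and int(match.group(1)) > best_suffix:
            best_name, best_suffix = table_name, int(match.group(1))
    return best_name


# Stand-in for the rows the query above would return.
tables = ["name123456", "name123457", "name123458", "name_backup"]
print(latest_table(tables, "name"))
```

Comparing the suffix as an integer rather than as text avoids surprises if the suffixes ever differ in length.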
I have a Postgres table whose header is [id(uuid), name(str), arg_name(str), measurements(list), run_id(uuid), parent_id(uuid)] with a total of 237K entries.
When I want to filter for specific measurements I can use 'name', but for the majority of entries in the table 'name' == 'arg_name' and thus map to the same sample.
In my peculiar case I am interested in retrieving samples whose 'name'='TimeM12nS' and whose 'arg_name'='Time'. These two attributes point to the same samples when visually inspecting the table through PgAdmin. That is to say all entries which have arg_name='Time' also have the name='TimeM12nS' and vice-versa.
It's obvious there's a problem because the number of returned samples is not the same. I first noticed the problem using the Django ORM, but it is also present when I query the DB using PgAdmin.
SELECT *
FROM TableA
WHERE name='TimeM12nS'
returns 301 entries (name='TimeM12nS' and arg_name='Time' in all cases)
BUT the query:
SELECT *
FROM TableA
WHERE arg_name='Time'
returns 3945 (name='TimeM12nS' and arg_name='Time' in all cases)
I am completely stumped. Can anyone shed some light on what's happening here?
EDIT:
I should add that the query by 'arg_name' returns the 301 entries that are returned when querying by 'name'
First let me say thank you to everyone who pitched in ideas to solve this conundrum and especially to JGH for the solution (found in the comments of the original post).
Indeed, the problem was an indexing issue. After re-indexing, both queries return the same number of entries (3945), as expected.
In Postgres, a table can be re-indexed through pgAdmin by navigating to Databases > 'database_name' > Schemas > Tables, right-clicking on the table_name, selecting Maintenance and pressing the REINDEX button,
or, more simply, by running the following command:
REINDEX TABLE table_name
Postgres Re-Indexing Docs
Without access to the database, it's not possible to give a definitive answer. All I can provide is the next query that I would run in this case.
SELECT COUNT(*), LENGTH(name), name, arg_name
FROM TableA
WHERE arg_name='Time'
GROUP BY name, arg_name;
This should show you any differences in the name column that you aren't able to see. The length of that string could also be informative.
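To illustrate what this diagnostic query can reveal, here is a small sketch using an in-memory SQLite database instead of Postgres. The three rows are invented for the example, and the trailing space in one name is an assumed kind of invisible difference, not something confirmed in the original table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (name TEXT, arg_name TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [
        ("TimeM12nS", "Time"),
        ("TimeM12nS", "Time"),
        ("TimeM12nS ", "Time"),  # trailing space: looks identical in a grid view
    ],
)

# Same query as above, with an ORDER BY added so the output order is stable.
rows = conn.execute(
    """
    SELECT COUNT(*), LENGTH(name), name, arg_name
    FROM TableA
    WHERE arg_name = 'Time'
    GROUP BY name, arg_name
    ORDER BY LENGTH(name)
    """
).fetchall()

for count, length, name, arg_name in rows:
    # repr() makes whitespace differences visible.
    print(count, length, repr(name), arg_name)
```

If every row in the real table falls into a single group with the expected length, the data itself is consistent and the cause lies elsewhere, such as the index.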
Hello,
First, thank you for helping me solve my earlier problem.
I'm really new to PostgreSQL.
Now I have a new problem.
I run a select statement like this one:
select * from company where id=10;
When I look at the query in pg_stat_statements, I only see it like this:
select * from company where id=?;
The value of id is missing from the result.
How can I get the complete query, including the value?
Thank you :)
Alternatively, you could set log_min_duration_statement to 0, which will make Postgres log every statement.
pg_stat_statements is meant for statistics, and these are aggregated; if every lookup value were kept, the stats would be useless because the statements would be hard to group.
If you want to understand a query just run it with explain analyze and you will get the query plan.
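The grouping argument can be sketched with a toy example. The `normalize` function below is a hypothetical stand-in that only replaces integer literals; it is not how pg_stat_statements works internally, but it shows why stripping the values lets identical statements share one statistics entry:

```python
import re
from collections import Counter


def normalize(query):
    """Toy normalizer: replace integer literals with '?'."""
    return re.sub(r"\b\d+\b", "?", query)


# Three executions that differ only in the lookup value.
executed = [
    "select * from company where id=10;",
    "select * from company where id=11;",
    "select * from company where id=42;",
]

# With normalization, all three calls are counted under a single key.
calls = Counter(normalize(q) for q in executed)
print(calls)
```

Without the normalization step, the counter would hold three separate entries with one call each, which is the "hard to group" situation described above.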
I have some tables by day and by hour, called 2015_09_01_00, 2015_09_01_01..., 2015_09_02_00, 2015_09_02_01, etc.
I also created a virtual table for 2015_09_01, 2015_09_02, etc, aggregating them respectively by day.
So, in this context, when I want to query some virtual tables (some days) I have to execute this query for example:
SELECT fields FROM TABLE_QUERY(dataset, 'REGEXP_MATCH(table_id, r"(2015_09_01|2015_09_02)$")')
It gives a "network unreachable" error. I guess it is getting confused between the original tables and the virtual ones, since the names are related.
However, if I execute:
SELECT table_id FROM dataset.__TABLES_SUMMARY__ WHERE REGEXP_MATCH(table_id, r"(2015_09_01|2015_09_02)$")
2015_09_01
2015_09_02
it seems that the filter is created successfully.
So, what am I doing wrong here?
Thanks for your help in advance.
The above example worked for me.
Most likely your dataset has other tables whose names match the same pattern you are trying to match.
Try to restrict your regex, for example as below:
SELECT fields FROM TABLE_QUERY(dataset, 'REGEXP_MATCH(table_id, r"^(2015_09_01|2015_09_02)$")')
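The effect of the added `^` can be checked locally. This sketch uses Python's `re.search`, on the assumption that REGEXP_MATCH also looks for a match anywhere in the string; the table id `copy_2015_09_01` is hypothetical and stands in for "other tables with the same pattern":

```python
import re

# Hypothetical table ids in the dataset.
table_ids = ["2015_09_01", "2015_09_01_00", "copy_2015_09_01", "2015_09_02"]

unanchored = re.compile(r"(2015_09_01|2015_09_02)$")
anchored = re.compile(r"^(2015_09_01|2015_09_02)$")

# Only the end of the id is pinned, so any prefix is accepted.
loose = [t for t in table_ids if unanchored.search(t)]
# Both ends are pinned, so only the exact daily names are accepted.
strict = [t for t in table_ids if anchored.search(t)]

print(loose)
print(strict)
```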
Did you define the daily views to reference themselves? Your use of $ suggested you were thinking about this, but to make sure -- they should reference only the hourly tables.
For example, if you named a view 2015_09_14 and it was based on a TABLE_QUERY for
'REGEXP_MATCH(table_id, r"2015_09_14")'
then it would reference itself, which does not work. (This ought to be a clearer error though.)
If you define the view with a TABLE_QUERY that can't match itself
'REGEXP_MATCH(table_id, r"2015_09_14_\d\d")'
then it should work. If you select your view's "Details" what is the query that defined it?
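A quick way to see the self-reference problem is to test both patterns against the view's own name. This is only a sketch using Python's `re.search` as a stand-in for REGEXP_MATCH (an assumption), with two hypothetical hourly table names:

```python
import re

view_name = "2015_09_14"
hourly_tables = ["2015_09_14_00", "2015_09_14_01"]
all_ids = [view_name] + hourly_tables


def matched(pattern, names):
    """Return the names in which the pattern is found anywhere."""
    return [name for name in names if re.search(pattern, name)]


# The short pattern also matches the view itself.
self_matching = matched(r"2015_09_14", all_ids)
# Requiring the two-digit hour suffix excludes the view.
hourly_only = matched(r"2015_09_14_\d\d", all_ids)

print(self_matching)
print(hourly_only)
```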
There exists in my database a page_history table; the idea is that whenever a record in the page table is changed, that record's old values are stored in the history table.
My job now is to find occasions in which a record was changed, and retrieve the pre- and post-conditions of that change. Specifically, I want to know when a page changed groups, and what groups were involved in the change. The query I have below can find these instances, but with the use of the min function, I can only get back the values that match between the two records:
select page_id,
original_group,
min(created2) change_date
from (select h.page_id,
h.group_id original_group,
i.group_id new_group,
h.created_dttm created1,
i.created_dttm created2
from page_history h,
page_history i
where h.page_id = i.page_id
and h.created_dttm < i.created_dttm
and h.group_id != i.group_id)
group by page_id, original_group, created1
order by page_id
When I try to get, say, any details of the second record, like new_group, I'm hit with an ORA-00979: not a GROUP BY expression error. I don't want to group by new_group, though, because that's going to destroy the logic (I think it would find records displaying times a page changed from a group to another group, regardless of any changes to other groups in between).
My question, then, is how can I modify this query, or go about writing a new one, that achieves a similar end, but with the added availability of columns that do not match between the two records? In essence, how can I find that min record without sacrificing all the other columns I'm not trying to compare? I don't exactly need a complete answer, any suggestions that point me in the right direction would be appreciated.
I use PL/SQL Developer, and it looks like version 11.2.0.2.0 of Oracle.
EDIT: I have found a solution. It's not pretty, and I'd still like to see some alternatives, but if helping me out would threaten to explode your brain, I would advise relocating to an easier question.
Without seeing your table structure it's hard to rewrite the query, but when a min function is used like that, it is almost always better to put it into a separate sub-select to get the value you want and then compare against that result.
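One way to read that suggestion, sketched against an in-memory SQLite database rather than Oracle: compute the minimum change date in a sub-select, then join back to page_history on that date to pick up the columns of the second record. The sample rows are invented, the column names are taken from the question, and the sketch assumes (page_id, created_dttm) identifies a single history row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_history (page_id INTEGER, group_id INTEGER, created_dttm TEXT)"
)
conn.executemany(
    "INSERT INTO page_history VALUES (?, ?, ?)",
    [
        (1, 10, "2020-01-01"),
        (1, 10, "2020-01-02"),
        (1, 20, "2020-01-03"),
        (1, 30, "2020-01-04"),
    ],
)

changes = conn.execute(
    """
    SELECT DISTINCT c.page_id, c.original_group, n.group_id AS new_group, c.change_date
    FROM (
        -- Sub-select: earliest later row with a different group, per history row.
        SELECT h.page_id,
               h.group_id AS original_group,
               h.created_dttm AS created1,
               MIN(i.created_dttm) AS change_date
        FROM page_history h
        JOIN page_history i
          ON h.page_id = i.page_id
         AND h.created_dttm < i.created_dttm
         AND h.group_id != i.group_id
        GROUP BY h.page_id, h.group_id, h.created_dttm
    ) c
    -- Join back on the change date to fetch the non-grouped columns.
    JOIN page_history n
      ON n.page_id = c.page_id
     AND n.created_dttm = c.change_date
    ORDER BY c.page_id, c.change_date
    """
).fetchall()

for row in changes:
    print(row)
```

Because new_group comes from the joined row and not from the grouped sub-select, it never has to appear in the GROUP BY.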
I have a BigQuery database where daily data is uploaded into its own table. So I have tables named "20131201", "20131202", etc. I can write a fixed query to "merge" those tables by doing:
SELECT * FROM db.20131201, db.20131202, ...
I'd like to have a single query that does not require me to update the Custom SQL every time a new table is added. Something like:
SELECT * FROM db.*
Which currently doesn't work. I would like to avoid making one giant table. Is there a work-around that I can do, or will this have to be a feature request?
End-goal is for a Tableau data connection to all the tables.
This isn't exactly what you've asked for, but I've managed to use the table wildcard functions (https://developers.google.com/bigquery/query-reference#tablewildcardfunctions), in particular
TABLE_DATE_RANGE(prefix, timestamp1, timestamp2)
to achieve a similar result for use in Tableau. You'll still need to provide two date parameters, but it's substantially better than dynamically generating the FROM clause.
Hope this helps.
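As a rough illustration of what the date-range approach covers, this sketch lists the daily table names that fall between two dates, assuming tables are named prefix + YYYYMMDD as in the question. The helper `daily_tables` and the `db.` prefix are hypothetical; nothing here calls BigQuery:

```python
from datetime import date, timedelta


def daily_tables(prefix, start, end):
    """List prefix + YYYYMMDD for every day from start to end, inclusive."""
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names


print(daily_tables("db.", date(2013, 12, 1), date(2013, 12, 3)))
```

Only the two dates change from run to run, so the FROM clause no longer has to be edited when a new daily table appears.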
As of now, dynamic SQL (like "EXECUTE SQL" in MS SQL Server) is not available in Google BigQuery... surely Google will look into this, I believe :)