Can't SELECT * FROM multiple tables with the same _TABLE_SUFFIX pattern

I am trying to select all rows from 3 tables that match a _TABLE_SUFFIX pattern, but I didn't receive the expected output.
The query I am using:
SELECT
*
FROM
`project-id.airbyte_google_ads.client_id_*`
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX, r"_campaign_performance_overview$")
The output received contains columns from other tables, not from the ones I want. But if I use:
SELECT
DISTINCT _TABLE_SUFFIX as tables
FROM
`project-id.airbyte_google_ads.client_id_*`
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX, r"_campaign_performance_overview$")
the table names I want to select rows from are correct.
My thought is that something is wrong in the wildcard line, and I wondered whether there is a way to write it like:
`project-id.airbyte_google_ads.client_id_*_campaign`
or something similar, because it looks like the query resolves the FROM clause first and applies the WHERE clause at a different point.
Let me know your thoughts on that.
Thank you for your time!

As per the documentation, when using wildcard tables, all the tables in the dataset whose names begin with the text before the * are scanned, even if _TABLE_SUFFIX is used in combination with REGEXP_CONTAINS. In our case the wildcard pattern is client_id_*, so tables such as client_id_1_campaigns are also matched, irrespective of the pattern in REGEXP_CONTAINS.
The reason for this behaviour is that the wildcard pattern is evaluated before the regex: all tables matching the wildcard pattern are scanned without taking the regex into account. Using wildcards together with REGEXP_CONTAINS amounts to applying a regex on top of a regex and is not recommended.
To get only the intended target tables, use the query below instead of a wildcard table.
SELECT *
FROM (
SELECT * FROM `project-id.dataset-id.client_id_2_campaign_performance_overview` UNION ALL
SELECT * FROM `project-id.dataset-id.client_id_7_campaign_performance_overview` UNION ALL
SELECT * FROM `project-id.dataset-id.client_id_10_campaign_performance_overview`);
Using the LIKE operator does not give the expected results either, for the same reason: the tables are scanned first and then filtered, which leaves extra columns in the result.
Also, BigQuery uses the schema for the most recently created table that matches the wildcard as the schema for the wildcard table. Even if you restrict the number of tables that you want to use from the wildcard table using the _TABLE_SUFFIX pseudo column in a WHERE clause, BigQuery uses the schema for the most recently created table that matches the wildcard. You will see the extra columns in the result if the most recently created table has them.
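To see which tables the wildcard prefix actually matches, and which of them was created most recently (and would therefore supply the wildcard schema, as described above), the dataset's INFORMATION_SCHEMA.TABLES view can be queried. This is a minimal sketch, assuming the project and dataset names from the question; the is_target column is a name made up here for illustration.

```sql
-- Assumption: project and dataset names are taken from the question.
-- Lists every table matched by the client_id_* prefix, newest first.
SELECT
  table_name,
  creation_time,
  -- Hypothetical helper column: TRUE for the tables the question wants.
  REGEXP_CONTAINS(table_name, r'_campaign_performance_overview$') AS is_target
FROM
  `project-id.airbyte_google_ads.INFORMATION_SCHEMA.TABLES`
WHERE
  STARTS_WITH(table_name, 'client_id_')
ORDER BY
  creation_time DESC;
```

The rows with is_target = TRUE are the names to list in the UNION ALL query. If the first row is not one of them, that would explain the unexpected columns.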

Related

Use wildcard query on dataset with models in BigQuery

I have a series of tables that are named {YYYYMM}_{id} and I have ML models that are named {groupid}_cost_model. I'm attempting to collate some data across all the tables using the following query:
SELECT * FROM `mydataset.20*`
The problem I'm having is that I have a model named 200_cost_model and it causes the following error:
Wildcard table over non partitioning tables and field based partitioning tables is not yet supported, first normal table myproject:mydataset.200_cost_model, first column table myproject:mydataset.202001_4544248676.
Is there a way to filter out the models from wildcard queries or am I stuck joining all the tables together?
When using wildcard tables, you can use the _TABLE_SUFFIX pseudo column to filter results:
Queries with wildcard tables support the _TABLE_SUFFIX pseudo column
in the WHERE clause. This column contains the values matched by the
wildcard character, so that queries can filter which tables are
accessed. For example, the following WHERE clauses use comparison
operators to filter the matched tables
I have tested this on my side, although only on standard, freshly created tables, and it should work, for example like this:
SELECT *
FROM
`mydataset.20*`
WHERE
_TABLE_SUFFIX NOT LIKE '%cost_model';
To check all the possible _TABLE_SUFFIX values, this works for me:
select DISTINCT _TABLE_SUFFIX as suffix from `mydataset.20*`
but I am not sure whether this will work in your situation.

Rename all tables in SELECT query

Can anyone tell me how to replace all the table names in the SELECT/FROM statements? I'm looking for a way that works well across vanilla queries as well as more complex ones with sub-queries and joins.
I.e.
New table name: new_table
Original query: SELECT * from table;
Result query: SELECT * FROM new_table;
Thanks a lot,
j
If your queries are as simple as what you're proposing, you should be able to start by parsing the query, which will give you a SqlSelect object. From there you can use getFrom to check whether it's the table you want to change and setFrom to change it.
If you want to handle more complex queries, you should be able to implement the SqlVisitor interface to find all occurrences of the table to replace.

How to do damage with SQL by adding to the end of a statement?

Perhaps I am not creative or knowledgeable enough with SQL... but it looks like there is no way to do a DROP TABLE or DELETE FROM within a SELECT without the ability to start a new statement.
Basically, we have a situation where our codebase has some gigantic, "less-than-robust" SQL generation component that never uses prepared statements and we now have an API that interacts with this legacy component.
Right now we can modify a query by appending to the end of it, but have been unable to insert any semicolons. Thus, we can do something like this:
/query?[...]&location_ids=loc1')%20or%20L1.ID%20in%20('loc2
which will result in this
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('loc1') or L1.ID in ('loc2');...
This is just one example.
Basically we can append pretty much anything to the end of any/most generated SQL queries, less adding a semicolon.
Any ideas on how this could potentially do some damage? Can you add something to the end of a SQL query that deletes from or drops tables? Or create a query so absurd that it takes up all CPU and never completes?
You said that this:
/query?[...]&location_ids=loc1')%20or%20L1.ID%20in%20('loc2
will result in this:
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('loc1') or L1.ID in ('loc2');
so it looks like this:
/query?[...]&location_ids=');DROP%20TABLE users;--
will result in this:
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('');DROP TABLE users;--');
which is a SELECT, a DROP and a comment.
If it’s not possible to inject another statement, you are limited to the existing statement and its abilities.
Like in this case, if you are limited to SELECT and you know where the injection happens, have a look at PostgreSQL’s SELECT syntax to see what your options are. Since you’re injecting into the WHERE clause, you can only inject additional conditions or other clauses that are allowed after the WHERE clause.
If the result of the SELECT is returned back to the user, you may want to add your own SELECT with a UNION operation. However, PostgreSQL requires compatible data types for corresponding columns:
The two SELECT statements that represent the direct operands of the UNION must produce the same number of columns, and corresponding columns must be of compatible data types.
So you would need to know the number and data types of the columns of the original SELECT first.
The number of columns can be detected with the ORDER BY clause by specifying the column number like ORDER BY 3, which would order the result by the values of the third column. If the specified column does not exist, the query will fail.
Now, after determining the number of columns, you can inject a UNION SELECT with the appropriate number of columns, using a null value for each column of your UNION SELECT:
loc1') UNION SELECT null,null,null,null,null --
Now you determine the type of each column by substituting a different value for each column, one at a time. If the types for a column are incompatible, you may get an error that hints at the expected data type, like:
ERROR: invalid input syntax for integer
ERROR: UNION types text and integer cannot be matched
After you have determined enough column types (one column may be sufficient when it’s one that is presented to the user), you can change your SELECT to select whatever you want.
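The column-count and type rules described above can be tried safely on a throwaway PostgreSQL session. This is a small sketch using literal values only; the two-column SELECT and its aliases are stand-ins invented here for the elided original query.

```sql
-- Stand-in for the original query: two columns, integer and text.
-- ORDER BY 2 succeeds because a second column exists.
SELECT 1 AS id, 'a'::text AS name ORDER BY 2;

-- Expected to fail: there is no third column to order by.
SELECT 1 AS id, 'a'::text AS name ORDER BY 3;

-- NULL is compatible with any column type, so this UNION is accepted.
SELECT 1 AS id, 'a'::text AS name
UNION
SELECT NULL, NULL;

-- Expected to fail with a type-mismatch error:
-- the second column is text, but the probe supplies an integer.
SELECT 1 AS id, 'a'::text AS name
UNION
SELECT NULL, 2;
```

Running these one at a time shows the difference between a position that does not exist and a column whose type does not match.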

SQL Selecting and Returning Data Based on Pattern

At my office, one of the tables we use keeps track of our Order Numbers. The problem is that the employees don't enter the number consistently into the database field.
Some of the examples are listed:
'7-26-13 543006-27031', '345009-27031', 'KWYD-863009-27031'.
I need to find a way to return just the 'nnnnnn-nnnnn' substring, no matter where in the field it is. Most of the time this pattern is at the end of the field, but that is not always the case. I've already limited the data records to just those with that pattern using a LIKE expression in my WHERE clause, but I have no idea how best to return just that pattern as a column.
Edit:
We are still using SQL Server 2000
What I'm looking to do is along the lines of:
SELECT SUBSTRING(VendorOrderNo, ??, 12) AS OrderNo
FROM Orders
WHERE VendorOrderNo LIKE '%[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]%'
select 'nnnnnn-nnnnn' as employeeid
from table
where employees like'%nnnnnn-nnnnn'
It would be better if you cleaned up your data first. You can give this a try (it's for MySQL):
SELECT * FROM your_table_name WHERE order_number REGEXP '[0-9]+-[0-9]+';
SQLFIDDLE
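For SQL Server, which the question specifies, PATINDEX can supply the start position that the question's SUBSTRING call leaves as ??. This is a sketch reusing the question's table and column names. It returns only the first occurrence of the pattern, so it assumes each field holds at most one order number.

```sql
-- PATINDEX returns the 1-based position where the pattern first matches.
-- The pattern is 6 digits, a hyphen, and 5 digits, so 12 characters are taken.
SELECT SUBSTRING(
           VendorOrderNo,
           PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]%', VendorOrderNo),
           12
       ) AS OrderNo
FROM Orders
-- Same filter as in the question; without it PATINDEX would return 0
-- for rows that do not contain the pattern.
WHERE VendorOrderNo LIKE '%[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]%'
```

Keeping the WHERE clause matters here: for rows without a match, PATINDEX returns 0 and SUBSTRING would return an unrelated prefix of the field.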

Use regex stored in a table as criteria for an SQL query

I have a table with regular expressions which I need to use to filter rows from another table.
Something like:
SELECT *
FROM a
WHERE foo SIMILAR TO '(SELECT regex FROM b)'
Obviously, that doesn't work because that isn't the syntax and there are multiple rows in b that I need to iterate through.
I'm using PostgreSQL 8.3.
Perhaps doing a join would work? E.g.
SELECT a.*, b.regex
FROM a JOIN b ON a.foo ~ b.regex
I'm afraid I'm not familiar enough with Postgres to say for certain, but this would be the ordinary way in SQL of iterating over rows. It should return a.foo multiple times if multiple regexes are matched.
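If each row of a should appear only once even when several regexes match, an EXISTS subquery is one possible alternative to the join. This is a sketch using the table and column names from the question. Note that ~ applies POSIX regular expressions, which differ from SIMILAR TO patterns, so it assumes the expressions stored in b are written in POSIX syntax.

```sql
-- Returns each row of a at most once, if at least one regex in b matches foo.
SELECT a.*
FROM a
WHERE EXISTS (
    SELECT 1
    FROM b
    WHERE a.foo ~ b.regex
);
```

The join version is still the better fit when you need to know which regex matched, since it returns b.regex alongside each row.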