Legacy SQL equivalent to TABLE_QUERY(dataset, expr) in standard SQL - google-bigquery

Can anyone help me understand the equivalent of TABLE_QUERY(dataset, expr) in standard SQL?
I found this in the Google docs for legacy SQL:
#legacySQL
SELECT
speed
FROM (TABLE_QUERY([myproject-1234:mydata],
'table_id CONTAINS "oo" AND length(table_id) >= 4'))
I did not find the equivalent of the above in standard SQL.

#standardSQL
SELECT speed
FROM `myproject-1234.mydata.*`
WHERE _TABLE_SUFFIX LIKE '%oo%'
AND LENGTH(_TABLE_SUFFIX) >= 4
Important: using just * as the wildcard for the whole table name, as in myproject-1234.mydata.*, is the worst case performance-wise.
Ideally the prefix before the * should be as long and specific as you can make it, for example myproject-1234.mydata.myprefix_*
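To see why the two queries select the same tables, here is a small Python simulation (not BigQuery code) that applies both filters to a hypothetical list of table names. With the wildcard myproject-1234.mydata.* the prefix is empty, so _TABLE_SUFFIX is the whole table_id and the legacy predicate carries over unchanged. With a longer prefix, the conditions apply only to the part after the prefix, so a LENGTH check may need adjusting.

```python
# Illustrative simulation only: TABLES is a hypothetical dataset listing,
# and these helper functions are not BigQuery APIs.
TABLES = ["foo", "food", "boots", "bar", "mybook", "myprefix_2020", "zoo1"]


def legacy_table_query(table_ids):
    """Mimic: table_id CONTAINS "oo" AND length(table_id) >= 4."""
    return [t for t in table_ids if "oo" in t and len(t) >= 4]


def wildcard_suffixes(table_ids, prefix):
    """Tables matched by `<prefix>*`, paired with their _TABLE_SUFFIX."""
    return [(t, t[len(prefix):]) for t in table_ids if t.startswith(prefix)]


def standard_sql_filter(table_ids, prefix=""):
    """Mimic: FROM `dataset.<prefix>*`
    WHERE _TABLE_SUFFIX LIKE '%oo%' AND LENGTH(_TABLE_SUFFIX) >= 4."""
    return [t for t, suffix in wildcard_suffixes(table_ids, prefix)
            if "oo" in suffix and len(suffix) >= 4]


print(legacy_table_query(TABLES))         # ['food', 'boots', 'mybook', 'zoo1']
print(standard_sql_filter(TABLES))        # same list: the prefix is empty
print(standard_sql_filter(TABLES, "my"))  # ['mybook']: tests apply to 'book' only
```

A longer prefix also leaves BigQuery fewer tables to match in the first place, which is the performance point made above.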
Read more about Wildcard Tables
You can also read more about Migrating legacy SQL table wildcard functions.

Related

Creating a view in PostgreSQL for a query that worked in SQL Server (COUNT between two DATETIME columns/TIMECODE)

Redshift throws a syntax error when I try to create a view similar to the following:
SELECT
*,
(SELECT COUNT(*)
FROM [Leads - leads]
WHERE Receive_Time BETWEEN dbo.[Leads - extended spot summary fla32813].[Plan Air]
AND.dbo.[Leads - extended spot summary fla32813].[Plan End]) AS Leads
INTO
[32813DA]
FROM
.dbo.[Leads - extended spot summary fla32813]
This code has numerous errors from a Redshift perspective:
It uses square brackets as identifier delimiters; Redshift (like the SQL standard) uses double quotes.
The three-part naming doesn't look right for Redshift.
INTO is not allowed in a view definition in any of the mentioned databases.
You may have other errors as well, but you need to get the names right before you start with anything related to logic.
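As a sketch of the fixes listed above (double-quoted identifiers, no INTO in the view), the following script uses SQLite, bundled with Python, as a stand-in because it also accepts standard double-quoted names. The table names "leads" and "spot summary" and the sample rows are hypothetical simplifications of the question's schema; the exact Redshift DDL may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified schemas; names with spaces are double-quoted.
cur.execute('CREATE TABLE "leads" ("Receive_Time" TEXT)')
cur.execute('CREATE TABLE "spot summary" ("Spot" TEXT, "Plan Air" TEXT, "Plan End" TEXT)')
cur.executemany('INSERT INTO "leads" VALUES (?)', [
    ("2020-01-01 10:00:30",), ("2020-01-01 10:01:10",), ("2020-01-01 12:00:00",)])
cur.executemany('INSERT INTO "spot summary" VALUES (?, ?, ?)', [
    ("A", "2020-01-01 10:00:00", "2020-01-01 10:02:00"),
    ("B", "2020-01-01 11:00:00", "2020-01-01 11:30:00")])

# A view is just a stored SELECT: no INTO clause. The correlated subquery
# counts the leads received between each spot's air and end times.
cur.execute('''
CREATE VIEW "spot leads" AS
SELECT s.*,
       (SELECT COUNT(*)
          FROM "leads" l
         WHERE l."Receive_Time" BETWEEN s."Plan Air" AND s."Plan End") AS "Leads"
FROM "spot summary" s
''')

rows = cur.execute('SELECT "Spot", "Leads" FROM "spot leads" ORDER BY "Spot"').fetchall()
print(rows)  # [('A', 2), ('B', 0)]
```

The timestamps are stored as ISO-8601 strings here, so BETWEEN compares them correctly as text; with real DATETIME columns the comparison is native.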
The SELECT ... INTO ... syntax is specific to SQL Server and creates a table from a SELECT. The ISO SQL syntax is CREATE TABLE ... AS (SELECT ...), but it does not work in SQL Server.
Use a simpler, more portable syntax such as:
CREATE TABLE ??? (..);
INSERT INTO ???
SELECT ...
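The two-step pattern above can be tried in SQLite from Python. The table names src, target and target_ctas are hypothetical stand-ins for the ??? placeholders; the sketch also shows CREATE TABLE ... AS SELECT, which SQLite accepts (as do PostgreSQL and Redshift) but SQL Server does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src (id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO src VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# Portable pattern: declare the target table first...
cur.execute("CREATE TABLE target (id INTEGER, amount INTEGER)")
# ...then fill it from a SELECT (instead of SQL Server's SELECT ... INTO target).
cur.execute("INSERT INTO target (id, amount) SELECT id, amount FROM src WHERE amount > 15")

# Alternative where supported: create and fill the table in one statement.
cur.execute("CREATE TABLE target_ctas AS SELECT id, amount FROM src WHERE amount > 15")

t1 = cur.execute("SELECT * FROM target ORDER BY id").fetchall()
t2 = cur.execute("SELECT * FROM target_ctas ORDER BY id").fetchall()
print(t1)  # [(2, 20), (3, 30)]
print(t2)  # [(2, 20), (3, 30)]
```

Declaring the table explicitly also lets you control column types and constraints, which CREATE TABLE AS infers for you.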

Equivalent of MINUS in Netezza

I want to compare data between two tables in different Netezza databases. In Oracle we can do that with the MINUS operator. How can the same operation be done in Netezza?
SELECT CUSTOMER_SRC_ID,CUSTOMER_SRC_DESC FROM CIDB_SIT..CUSTOMER_SRC
MINUS
SELECT CUSTOMER_SRC_ID,CUSTOMER_SRC_DESC FROM EDW_SIT..CUSTOMER_SRC
It seems MINUS doesn't work in Netezza. Can anyone help me find the equivalent query in Netezza?
The ANSI SQL standard calls this operator EXCEPT. Netezza implements it, as do PostgreSQL and MS SQL Server:
SELECT CUSTOMER_SRC_ID,CUSTOMER_SRC_DESC FROM CIDB_SIT..CUSTOMER_SRC
EXCEPT -- Here
SELECT CUSTOMER_SRC_ID,CUSTOMER_SRC_DESC FROM EDW_SIT..CUSTOMER_SRC
You could use EXCEPT, or NOT IN:
-- only if CUSTOMER_SRC_ID is unique; this compares the ID alone, not the description
SELECT CUSTOMER_SRC_ID,CUSTOMER_SRC_DESC
FROM CIDB_SIT..CUSTOMER_SRC
WHERE CUSTOMER_SRC_ID NOT IN (SELECT CUSTOMER_SRC_ID FROM EDW_SIT..CUSTOMER_SRC);
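The EXCEPT and NOT IN forms are not quite interchangeable. This SQLite sketch (SQLite has no Netezza-style DB..TABLE names, so two hypothetical local tables stand in for the two databases) shows that EXCEPT compares whole rows while NOT IN compares only the ID, and that a single NULL ID in the subquery makes NOT IN return nothing. NOT EXISTS is shown as a NULL-safe alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Stand-ins for CIDB_SIT..CUSTOMER_SRC and EDW_SIT..CUSTOMER_SRC.
cur.execute("CREATE TABLE cidb_src (customer_src_id INTEGER, customer_src_desc TEXT)")
cur.execute("CREATE TABLE edw_src (customer_src_id INTEGER, customer_src_desc TEXT)")
cur.executemany("INSERT INTO cidb_src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
cur.executemany("INSERT INTO edw_src VALUES (?, ?)", [(1, "a"), (2, "B")])

# EXCEPT compares whole rows: id 2 has a different description, so it is reported.
except_rows = cur.execute("""
    SELECT customer_src_id, customer_src_desc FROM cidb_src
    EXCEPT
    SELECT customer_src_id, customer_src_desc FROM edw_src
    ORDER BY customer_src_id""").fetchall()

# NOT IN compares only the id: id 2 exists on both sides, so it is not reported.
not_in_sql = """
    SELECT customer_src_id, customer_src_desc FROM cidb_src
    WHERE customer_src_id NOT IN (SELECT customer_src_id FROM edw_src)
    ORDER BY customer_src_id"""
not_in_rows = cur.execute(not_in_sql).fetchall()

# Pitfall: one NULL id in the subquery makes every NOT IN test unknown.
cur.execute("INSERT INTO edw_src VALUES (NULL, 'z')")
not_in_with_null = cur.execute(not_in_sql).fetchall()

# NOT EXISTS is unaffected by the NULL.
not_exists_rows = cur.execute("""
    SELECT c.customer_src_id, c.customer_src_desc FROM cidb_src c
    WHERE NOT EXISTS (SELECT 1 FROM edw_src e
                      WHERE e.customer_src_id = c.customer_src_id)
    ORDER BY c.customer_src_id""").fetchall()

print(except_rows)       # [(2, 'b'), (3, 'c')]
print(not_in_rows)       # [(3, 'c')]
print(not_in_with_null)  # []
print(not_exists_rows)   # [(3, 'c')]
```

Note also that EXCEPT removes duplicate rows from its result, while the NOT IN and NOT EXISTS forms keep them.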

Is there an equivalent of table wildcard functions in BigQuery with standard SQL?

In legacy SQL, users can use table wildcard functions like TABLE_DATE_RANGE, TABLE_QUERY and TABLE_DATE_RANGE_STRICT.
Is there a similar feature with standard SQL?
In legacy SQL, users can reference data from a subset of tables in a dataset using table wildcard functions. In standard SQL, users can achieve the same result using UNION ALL. However, this approach may not be convenient when users want to dynamically determine the set of tables using, for example, either a date range (supported using TABLE_DATE_RANGE and TABLE_DATE_RANGE_STRICT in legacy SQL) or other complex criteria (supported by TABLE_QUERY in legacy SQL). In standard SQL, BigQuery offers an equivalent, wildcard tables, described below.
The following legacy SQL query that uses the TABLE_QUERY wildcard function can be rewritten using standard SQL.
Legacy SQL query (using TABLE_QUERY):
SELECT SUM(value1)
FROM TABLE_QUERY([myproject:mydataset],"table_id = 'mydailytable_20150105' OR
table_id = 'mydailytable_20150106' OR table_id = 'mydailytable_20150110'")
GROUP BY value2;
Legacy SQL query (using TABLE_DATE_RANGE):
SELECT SUM(value1)
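FROM TABLE_DATE_RANGE([myproject:mydataset.mydailytable_], TIMESTAMP("2015-01-05"), TIMESTAMP("2015-01-10"))
GROUP BY value2;
Standard SQL query (equivalent to the TABLE_QUERY example; for the TABLE_DATE_RANGE example, use WHERE _TABLE_SUFFIX BETWEEN '20150105' AND '20150110' instead):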
SELECT SUM(value1)
FROM `myproject.mydataset.mydailytable_*`
WHERE _TABLE_SUFFIX = '20150105'
OR _TABLE_SUFFIX = '20150106'
OR _TABLE_SUFFIX = '20150110'
GROUP BY value2;
In the above query, the wildcard table myproject.mydataset.mydailytable_* matches all tables in the dataset myproject.mydataset whose table_id starts with mydailytable_. To match all tables in the dataset, use an empty prefix: myproject.mydataset.* matches every table.
Since * is a special character, wildcard table names must be enclosed in backticks (`) when used in a query.
The _TABLE_SUFFIX pseudo column:
The _TABLE_SUFFIX pseudo column has type STRING and can be used just like any other column. It is a reserved column name, so it needs to be aliased when using it as part of the SELECT list.
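Because _TABLE_SUFFIX is a STRING, a date range is expressed as a string comparison. The Python sketch below (a simulation over hypothetical table names, not BigQuery code) shows that the OR list and a BETWEEN range select the same tables here, and checks why BETWEEN works: fixed-width YYYYMMDD strings sort in the same order as the dates they encode.

```python
from datetime import date, timedelta

# Hypothetical contents of myproject.mydataset (illustrative names only).
TABLES = [
    "mydailytable_20150104", "mydailytable_20150105", "mydailytable_20150106",
    "mydailytable_20150110", "mydailytable_20150111", "othertable_20150105",
]
PREFIX = "mydailytable_"  # as in `myproject.mydataset.mydailytable_*`


def suffixes(tables, prefix):
    """Map each table matched by `<prefix>*` to its _TABLE_SUFFIX."""
    return {t: t[len(prefix):] for t in tables if t.startswith(prefix)}


def suffix_in(tables, prefix, wanted):
    """Mimic: WHERE _TABLE_SUFFIX = 'a' OR _TABLE_SUFFIX = 'b' OR ..."""
    return sorted(t for t, s in suffixes(tables, prefix).items() if s in wanted)


def suffix_between(tables, prefix, lo, hi):
    """Mimic: WHERE _TABLE_SUFFIX BETWEEN lo AND hi (a STRING comparison)."""
    return sorted(t for t, s in suffixes(tables, prefix).items() if lo <= s <= hi)


# String order of YYYYMMDD equals chronological order.
days = [date(2014, 12, 1) + timedelta(days=i) for i in range(500)]
encoded = [d.strftime("%Y%m%d") for d in days]
assert encoded == sorted(encoded)

by_list = suffix_in(TABLES, PREFIX, {"20150105", "20150106", "20150110"})
by_range = suffix_between(TABLES, PREFIX, "20150105", "20150110")
print(by_list)
print(by_range)
```

The range form simply skips days for which no table exists, so it behaves like TABLE_DATE_RANGE rather than TABLE_DATE_RANGE_STRICT, which fails when a daily table is missing.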
Official documentation for this feature is available here:
https://cloud.google.com/bigquery/docs/wildcard-tables
https://cloud.google.com/bigquery/docs/querying-wildcard-tables

concatenate string in sql

I have a database table "create table t (s varchar, i int)" with 100 records.
When I want to sum all 'i' fields, I invoke something like "select sum(i) from t". Is there a way to concatenate the 's' fields? (select concatenate(s) from t)
In any sql dialect?
There isn't a way to do this that works across all SQL dialects (SQL:2016 standardized LISTAGG, but support varies). Some DBMSs have a ready-made solution; others have generalized solutions that are more complicated.
e.g.
Oracle = LISTAGG (11g Release 2 and later; the older WM_CONCAT was undocumented and is gone in 12c)
MySQL = GROUP_CONCAT
SQL Server = STRING_AGG (2017 and later); older versions: UDF / FOR XML PATH('') / recursive CTE
You need a question for each RDBMS you need the solution for, but you will find duplicate questions for each case already on StackOverflow.
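For a quick runnable illustration, SQLite (bundled with Python) has a group_concat(X, separator) aggregate similar to MySQL's GROUP_CONCAT. This sketch uses the question's table t with three made-up rows and shows the concatenation alongside the SUM it is analogous to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (s VARCHAR, i INT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2), ("c", 3)])

# The numeric aggregate from the question.
total = cur.execute("SELECT SUM(i) FROM t").fetchone()[0]

# The string "aggregate": SQLite's group_concat with an explicit separator.
# Without an explicit ordering, the concatenation order is not guaranteed.
joined = cur.execute("SELECT group_concat(s, ',') FROM t").fetchone()[0]

print(total)   # 6
print(joined)  # e.g. a,b,c
```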

Make an SQL request more efficient and tidy?

I have the following SQL query:
SELECT Phrases.*
FROM Phrases
WHERE (((Phrases.phrase) Like "*ing aids*")
AND ((Phrases.phrase) Not Like "*getting*")
AND ((Phrases.phrase) Not Like "*contracting*"))
AND ((Phrases.phrase) Not Like "*preventing*"); //(etc.)
Now, if I were using RegEx, I might bunch all the Nots into one big (getting|contracting|preventing), but I'm not sure how to do this in SQL.
Is there a way to render this query more legibly/elegantly?
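Just by removing the redundant parentheses and table prefixes and using a consistent naming convention, your SQL looks much cleaner (% is the standard LIKE wildcard; Access uses * unless it is in ANSI-92 mode):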
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND phrase NOT LIKE '%getting%'
AND phrase NOT LIKE '%contracting%'
AND phrase NOT LIKE '%preventing%'
You mention regular expressions. Some DBMSs support them (MySQL, Oracle, ...). However, the choice of syntax should take the query's execution plan into account: "how fast it is" rather than "how nice it looks".
In MySQL, you can use a regular expression in the WHERE clause:
SELECT * FROM phrases WHERE phrase LIKE '%ing aids%' AND phrase NOT REGEXP 'getting|contracting|preventing'
So if that's what you're using, you could write a single regular expression that is a bit more compact than your four LIKE criteria. It may be harder for other people to see what the query is doing, however.
SQL Server has historically needed CLR functions for regular expressions; check whether your version offers built-in regex functions.
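SQLite has a REGEXP operator but no built-in implementation, so this Python sketch registers one with create_function and checks that the single NOT REGEXP gives the same rows as the chain of NOT LIKEs. The sample phrases are made up. Python's re is case-sensitive, whereas MySQL's REGEXP is usually case-insensitive for non-binary strings, so results can differ on mixed-case data.

```python
import re
import sqlite3

# Hypothetical sample phrases, for illustration only.
PHRASES = ["hearing aids", "cheap hearing aids", "getting hearing aids",
           "contracting aids", "preventing aids", "band aids"]


def regexp(pattern, value):
    """Backs SQLite's `value REGEXP pattern` (SQLite passes the pattern first)."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE phrases (phrase TEXT)")
conn.executemany("INSERT INTO phrases VALUES (?)", [(p,) for p in PHRASES])

like_rows = [r[0] for r in conn.execute("""
    SELECT phrase FROM phrases
    WHERE phrase LIKE '%ing aids%'
      AND phrase NOT LIKE '%getting%'
      AND phrase NOT LIKE '%contracting%'
      AND phrase NOT LIKE '%preventing%'
    ORDER BY phrase""")]

regex_rows = [r[0] for r in conn.execute("""
    SELECT phrase FROM phrases
    WHERE phrase LIKE '%ing aids%'
      AND phrase NOT REGEXP 'getting|contracting|preventing'
    ORDER BY phrase""")]

print(like_rows)   # ['cheap hearing aids', 'hearing aids']
print(regex_rows)  # same rows
```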
Since it sounds like you're building this as you go to mine your data, here's something you could consider:
CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)
CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)
INSERT INTO Includes VALUES ('%ing aids%')
INSERT INTO Excludes VALUES ('%getting%')
INSERT INTO Excludes VALUES ('%contracting%')
INSERT INTO Excludes VALUES ('%preventing%')
SELECT
*
FROM
Phrases P
WHERE
EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase) AND
NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
You are then always just running the same query and you can simply change what's in the Includes and Excludes tables to refine your searches.
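The Includes/Excludes idea runs as-is in SQLite. This sketch (sample phrases are made up) runs the same query twice and shows that adding a row to Excludes refines the result without touching the SQL.

```python
import sqlite3

# Hypothetical sample phrases, for illustration only.
PHRASES = ["hearing aids", "cheap hearing aids", "getting hearing aids",
           "contracting aids", "preventing aids", "band aids"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Phrases (phrase VARCHAR(50) NOT NULL)")
conn.execute("CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)")
conn.execute("CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)")
conn.executemany("INSERT INTO Phrases VALUES (?)", [(p,) for p in PHRASES])
conn.execute("INSERT INTO Includes VALUES ('%ing aids%')")
conn.executemany("INSERT INTO Excludes VALUES (?)",
                 [("%getting%",), ("%contracting%",), ("%preventing%",)])

# The query never changes; only the pattern tables do.
QUERY = """
    SELECT P.phrase FROM Phrases P
    WHERE EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase)
      AND NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
    ORDER BY P.phrase"""

before = [r[0] for r in conn.execute(QUERY)]
conn.execute("INSERT INTO Excludes VALUES ('%cheap%')")  # refine the search
after = [r[0] for r in conn.execute(QUERY)]

print(before)  # ['cheap hearing aids', 'hearing aids']
print(after)   # ['hearing aids']
```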
Depending on which database server you are using, it may support regular expressions natively; Oracle (REGEXP_LIKE) and MySQL (REGEXP) do, for example.
You could push all your negative criteria into a short-circuiting CASE expression (this works in SQL Server; MS Access does not support CASE, though its Switch() function is similar).
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND CASE
WHEN phrase LIKE '%getting%' THEN 2
WHEN phrase LIKE '%contracting%' THEN 2
WHEN phrase LIKE '%preventing%' THEN 2
ELSE 1
END = 1
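To check that the CASE form filters the same rows as the plain AND NOT LIKE chain, here is a SQLite sketch with made-up phrases. The WHEN branches are tested in order and the first match decides the value, so a phrase that hits an exclusion early need not be tested against the rest; whether that saves time in practice depends on the engine.

```python
import sqlite3

# Hypothetical sample phrases, for illustration only.
PHRASES = ["hearing aids", "cheap hearing aids", "getting hearing aids",
           "contracting aids", "preventing aids", "band aids"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (phrase TEXT)")
conn.executemany("INSERT INTO phrases VALUES (?)", [(p,) for p in PHRASES])

# Negative criteria folded into one CASE: any exclusion match yields 2 (reject).
case_rows = [r[0] for r in conn.execute("""
    SELECT phrase FROM phrases
    WHERE phrase LIKE '%ing aids%'
      AND CASE
            WHEN phrase LIKE '%getting%' THEN 2
            WHEN phrase LIKE '%contracting%' THEN 2
            WHEN phrase LIKE '%preventing%' THEN 2
            ELSE 1
          END = 1
    ORDER BY phrase""")]

# The plain AND chain, for comparison.
and_rows = [r[0] for r in conn.execute("""
    SELECT phrase FROM phrases
    WHERE phrase LIKE '%ing aids%'
      AND phrase NOT LIKE '%getting%'
      AND phrase NOT LIKE '%contracting%'
      AND phrase NOT LIKE '%preventing%'
    ORDER BY phrase""")]

print(case_rows)  # ['cheap hearing aids', 'hearing aids']
print(and_rows)   # same rows
```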
On the "more efficient" side, you need to find some criteria that let you avoid reading the entire Phrases column. Double-sided wildcard criteria (LIKE '%x%') are bad because they cannot use an index seek; right-sided wildcard criteria (LIKE 'x%') are good because they can.
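The index point can be observed with SQLite's EXPLAIN QUERY PLAN. This sketch assumes SQLite's LIKE optimization rules: because SQLite's LIKE is case-insensitive for ASCII by default, the indexed column is declared COLLATE NOCASE so that a trailing-wildcard pattern can become an index range search. Other databases have their own rules, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE collation lets SQLite's (case-insensitive) LIKE use this index.
conn.execute("CREATE TABLE phrases (phrase TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_phrase ON phrases (phrase)")


def plan(sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


leading = plan("SELECT phrase FROM phrases WHERE phrase LIKE '%ing aids%'")
trailing = plan("SELECT phrase FROM phrases WHERE phrase LIKE 'hearing%'")
print(leading)   # a SCAN: every entry must be checked
print(trailing)  # a SEARCH: a range seek on idx_phrase
```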