Using period "." in Standard SQL in BigQuery - google-bigquery

BigQuery Standard SQL does not seems to allow period "." in the select statement. Even a simple query (see below) seems to fail. This is a big problem for datasets with field names that contain "." Is there an easy way to avoid this issue?
select id, time_ts as time.ts
from `bigquery-public-data.hacker_news.comments`
LIMIT 10
Returns error...
Error: Syntax error: Unexpected "." at [1:27]
This also fails...
select * except(detected_circle.center_x )
from [bigquery-public-data:eclipse_megamovie.photos_v_0_2]
LIMIT 10

It depends on what you are trying to accomplish. One interpretation is that you want to return a STRUCT named time with a single field named ts inside of it. If that's the case, you can use the STRUCT operator to build the result:
SELECT
id,
STRUCT(time_ts AS ts) AS time
FROM `bigquery-public-data.hacker_news.comments`
LIMIT 10;
In the BigQuery UI, it will display the result as id and time.ts, where the latter indicates that ts is inside a STRUCT named time.
BigQuery disallows columns in the result whose names include periods, so you'll get an error if you run the following query:
SELECT
id,
time_ts AS `time.ts`
FROM `bigquery-public-data.hacker_news.comments`
LIMIT 10;
Invalid field name "time.ts". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.

Elliot's answer great and addresses first part of your question, so let me address second part of it (as it is quite different)
First, wanted to mention that select modifiers like SELECT * EXCEPT are supported for BigQuery Standard SQL so, instead of
SELECT * EXCEPT(detected_circle.center_x )
FROM [bigquery-public-data:eclipse_megamovie.photos_v_0_2]
LIMIT 10
you should rather tried
#standardSQL
SELECT * EXCEPT(detected_circle.center_x )
FROM `bigquery-public-data.eclipse_megamovie.photos_v_0_2`
LIMIT 10
and of course now we are back to issue with `using period in standard sql
So, above code can only be interpreted as you try to eliminate center_x field from detected_circle STRUCT (nullable record). Technically speaking, this makes sense and can be done using below code
SELECT *
REPLACE(STRUCT(detected_circle.radius, detected_circle.center_y ) AS detected_circle)
FROM `bigquery-public-data.eclipse_megamovie.photos_v_0_2`
LIMIT 10
... still not clear to me how to use your recommendation to remove the entire detected_circle.*
SELECT * EXCEPT(detected_circle)
FROM `bigquery-public-data.eclipse_megamovie.photos_v_0_2`
LIMIT 10

Related

SQL query ignores "not between"

I have a data source like following
If I ran the following sql query it removes all the records with "Seg Type" MOD and ignores the Fnn range given.
select * from NpsoQueue
where SegmentType not in ('MOD')
and Fnn not between 0888452158 and 0888452158
I want the query to consider both conditions. So, if I ran the query it should remove only the first record
The logic in your where clause is incorrect
Use
select * from NpsoQueue
where NOT (
SegmentType = 'MOD'
and Fnn between '0888452158' and '0888452158'
)
Also, a number with a leading zero is a string literal so you need to put single quotes around it to preserve the leading zero and stop implicit casts happening
As mentioned by #TriV you could also use OR. These are fundamental boolean logic concepts, i.e. not related to SQL Server or databases

Access SQL: like function on a number field

I have ,for example, this table in a Microsoft Access database:
id numeric
context text
numberfield numeric
I want to select every record that ends with 9 in the column"numberfield". This gives a problem because it is a numeric field and as a result I can not use the following SQL:
select * from table where numberfield like "%9"
A solution is that I change the numberfield to a text. But this gives a problem because there are several users and the change might give a problem in the future. Is there an option to select on the ending when it is a number field?
That sound a little fishy.. are you sure you can use that query? Don't know about Access but almost any other DBMS allows it.
If it really doesn't work, you can do this:
select * from table where STR(numberfield) like "*9"
EDIT: Maybe it didn't work because you used % which is used with * in Access :
select * from table where numberfield like "*9"
Numbers are numbers, so use Mod for this:
select * from table where numberfield mod 10 = 9
Instead of casting to string and comparing, just extract the rightmost digit with a MOD operation.
Edit your query as follows:
SELECT *
FROM table
WHERE ((([numberfield] Mod 10)=9));

BigQuery Wildcard using TABLE_DATE_RANGE()

Great news about the new table wildcard functions this morning! Is there a way to use TABLE_DATE_RANGE() on tables that include date but no prefix?
I have a dataset that contains tables named YYYYMMDD (no prefix). Normally I would query like so:
SELECT foo
FROM [mydata.20140319],[mydata.20140320],[mydata.20140321]
LIMIT 100
I tried the following but I'm getting an error:
SELECT foo
FROM
(TABLE_DATE_RANGE(mydata.,
TIMESTAMP('2014-03-19'),
TIMESTAMP('2015-03-21')))
LIMIT 100
as well as:
SELECT foo
FROM
(TABLE_DATE_RANGE(mydata,
TIMESTAMP('2014-03-19'),
TIMESTAMP('2015-03-21')))
LIMIT 100
The underlying bug here has been fixed as of 2015-05-14. You should be able to use TABLE_DATE_RANGE with a purely numeric table name. You'll need to end the dataset in a '.' and enclose the name in brackets, so that the parser doesn't complain. This should work:
SELECT foo
FROM
(TABLE_DATE_RANGE([mydata.],
TIMESTAMP('2014-03-19'),
TIMESTAMP('2015-03-21')))
LIMIT 100
Note: The underlying bug has been fixed, please see my other answer.
Original response left for posterity (since the workaround should still work, in case you need it for some reason)
Great question. That should work, but it doesn't currently. I've filed an internal bug. In the meantime, a workaround is to use the TABLE_QUERY function, as in:
SELECT foo
FROM (
TABLE_QUERY(mydata,
"TIMESTAMP(table_id) BETWEEN "
+ "TIMESTAMP('2014-03-19') "
+ "AND TIMESTAMP('2015-03-21')"))
Note that with standard SQL support in BigQuery, you can use _TABLE_SUFFIX, instead of TABLE_QUERY. For example:
SELECT foo
FROM `mydata_*`
WHERE _TABLE_SUFFIX BETWEEN '20140319' AND '20150321'
Also check this question for more about BigQuery standard SQL.

Return rows where first character is non-alpha

I'm trying to retrieve all columns that start with any non alpha characters in SQlite but can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames are not part of the alphabet?
Are you going first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (can also be called as left() in some SQL languages) will help isolate the first char in the string for you.
edit:
Maybe substr(TestNames,1,1) in sqllite, I don't have a ready instance to test the syntax there on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but functionally will work. Unsure what char commands there are to do a range of letters in SQLlite.
I used 'upper' to make it so you don't need to do lower case letters in the not in statement...kinda hope SQLlite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster as it is
A) First getting all of the data from the first column only, then scanning it.
B) Still a full-table scan unless you index this column.

"ORDER BY ... USING" clause in PostgreSQL

The ORDER BY clause is decribed in the PostgreSQLdocumentation as:
ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...]
Can someone give me some examples how to use the USING operator? Is it possible to get an alternating order of the resultset?
A very simple example would be:
> SELECT * FROM tab ORDER BY col USING <
But this is boring, because this is nothing you can't get with the traditional ORDER BY col ASC.
Also the standard catalog doesn't mention anything exciting about strange comparison functions/operators. You can get a list of them:
> SELECT amoplefttype::regtype, amoprighttype::regtype, amopopr::regoper
FROM pg_am JOIN pg_amop ON pg_am.oid = pg_amop.amopmethod
WHERE amname = 'btree' AND amopstrategy IN (1,5);
You will notice, that there are mostly < and > functions for primitive types like integer, date etc and some more for arrays and vectors and so on. None of these operators will help you to get a custom ordering.
In most cases where custom ordering is required you can get away using something like ... ORDER BY somefunc(tablecolumn) ... where somefunc maps the values appropriately. Because that works with every database this is also the most common way. For simple things you can even write an expression instead of a custom function.
Switching gears up
ORDER BY ... USING makes sense in several cases:
The ordering is so uncommon, that the somefunc trick doesn't work.
You work with a non-primitive type (like point, circle or imaginary numbers) and you don't want to repeat yourself in your queries with strange calculations.
The dataset you want to sort is so large, that support by an index is desired or even required.
I will focus on the complex datatypes: often there is more than one way to sort them in a reasonable way. A good example is point: You can "order" them by the distance to (0,0), or by x first, then by y or just by y or anything else you want.
Of course, PostgreSQL has predefined operators for point:
> CREATE TABLE p ( p point );
> SELECT p <-> point(0,0) FROM p;
But none of them is declared usable for ORDER BY by default (see above):
> SELECT * FROM p ORDER BY p;
ERROR: could not identify an ordering operator for type point
TIP: Use an explicit ordering operator or modify the query.
Simple operators for point are the "below" and "above" operators <^ and >^. They compare simply the y part of the point. But:
> SELECT * FROM p ORDER BY p USING >^;
ERROR: operator > is not a valid ordering operator
TIP: Ordering operators must be "<" or ">" members of __btree__ operator families.
ORDER BY USING requires an operator with defined semantics: Obviously it must be a binary operator, it must accept the same type as arguments and it must return boolean. I think it must also be transitive (if a < b and b < c then a < c). There may be more requirements. But all these requirements are also necessary for proper btree-index ordering. This explains the strange error messages containing the reference to btree.
ORDER BY USING also requires not just one operator to be defined but an operator class and an operator family. While one could implement sorting with only one operator, PostgreSQL tries to sort efficiently and minimize comparisons. Therefore, several operators are used even when you specify only one - the others must adhere to certain mathematical constraints - I've already mentioned transitivity, but there are more.
Switching Gears up
Let's define something suitable: An operator for points which compares only the y part.
The first step is to create a custom operator family which can be used by the btree index access method. see
> CREATE OPERATOR FAMILY xyzfam USING btree; -- superuser access required!
CREATE OPERATOR FAMILY
Next we must provide a comparator function which returns -1, 0, +1 when comparing two points. This function WILL be called internally!
> CREATE FUNCTION xyz_v_cmp(p1 point, p2 point) RETURNS int
AS $$BEGIN RETURN btfloat8cmp(p1[1],p2[1]); END $$ LANGUAGE plpgsql;
CREATE FUNCTION
Next we define the operator class for the family. See the manual for an explanation of the numbers.
> CREATE OPERATOR CLASS xyz_ops FOR TYPE point USING btree FAMILY xyzfam AS
OPERATOR 1 <^ ,
OPERATOR 3 ?- ,
OPERATOR 5 >^ ,
FUNCTION 1 xyz_v_cmp(point, point) ;
CREATE OPERATOR CLASS
This step combines several operators and functions and also defines their relationship and meaning. For example OPERATOR 1 means: This is the operator for less-than tests.
Now the operators <^ and >^ can be used in ORDER BY USING:
> INSERT INTO p SELECT point(floor(random()*100), floor(random()*100)) FROM generate_series(1, 5);
INSERT 0 5
> SELECT * FROM p ORDER BY p USING >^;
p
---------
(17,8)
(74,57)
(59,65)
(0,87)
(58,91)
Voila - sorted by y.
To sum it up: ORDER BY ... USING is an interesting look under the hood of PostgreSQL. But nothing you will require anytime soon unless you work in very specific areas of database technology.
Another example can be found in the Postgres docs. with source code for the example here and here. This example also shows how to create the operators.
Samples:
CREATE TABLE test
(
id serial NOT NULL,
"number" integer,
CONSTRAINT test_pkey PRIMARY KEY (id)
)
insert into test("number") values (1),(2),(3),(0),(-1);
select * from test order by number USING > //gives 3=>2=>1=>0=>-1
select * from test order by number USING < //gives -1=>0=>1=>2=>3
So, it is equivalent to desc and asc. But you may use your own operator, that's the essential feature of USING
Nice answers, but they didn't mentioned one real valuable case for ´USING´.
When you have created an index with non default operators family, for example varchar_pattern_ops ( ~>~, ~<~, ~>=~, ... ) instead of <, >, >= then if you search based on index and you want to use index in order by clause you need to specify USING with the appropriate operator.
This can be illustrated with such example:
CREATE INDEX index_words_word ON words(word text_pattern_ops);
Lets compare this two queries:
SELECT * FROM words WHERE word LIKE 'o%' LIMIT 10;
and
SELECT * FROM words WHERE word LIKE 'o%' ORDER BY word LIMIT 10;
The difference between their executions is nearly 100 times in a 500K words DB! And also results may not be correct within non-C locale.
How this could happend?
When you making search with LIKE and ORDER BY clause, you actually make this call:
SELECT * FROM words WHERE word ~>=~ 'o' AND word ~<~'p' ORDER BY word USING < LIMIT 10;
Your index created with ~<~ operator in mind, so PG cannot use given index in a given ORDER BY clause. To get things done right query must be rewritten to this form:
SELECT * FROM words WHERE word ~>=~ 'o' AND word ~<~'p' ORDER BY word USING ~<~ LIMIT 10;
or
SELECT * FROM words WHERE word LIKE 'o%' ORDER BY word USING ~<~ LIMIT 10;
Optionally one can add the key word ASC (ascending) or DESC
(descending) after any expression in the ORDER BY clause. If not
specified, ASC is assumed by default. Alternatively, a specific
ordering operator name can be specified in the USING clause. An
ordering operator must be a less-than or greater-than member of some
B-tree operator family. ASC is usually equivalent to USING < and DESC
is usually equivalent to USING >.
PostgreSQL 9.0
It may look something like this I think (I don't have postgres to verify this right now, but will verify later)
SELECT Name FROM Person
ORDER BY NameId USING >