BigQuery Wildcard using TABLE_DATE_RANGE() - google-bigquery

Great news about the new table wildcard functions this morning! Is there a way to use TABLE_DATE_RANGE() on tables that include date but no prefix?
I have a dataset that contains tables named YYYYMMDD (no prefix). Normally I would query like so:
SELECT foo
FROM [mydata.20140319],[mydata.20140320],[mydata.20140321]
LIMIT 100
I tried the following but I'm getting an error:
SELECT foo
FROM
(TABLE_DATE_RANGE(mydata.,
TIMESTAMP('2014-03-19'),
TIMESTAMP('2015-03-21')))
LIMIT 100
as well as:
SELECT foo
FROM
(TABLE_DATE_RANGE(mydata,
TIMESTAMP('2014-03-19'),
TIMESTAMP('2015-03-21')))
LIMIT 100

The underlying bug here has been fixed as of 2015-05-14. You should be able to use TABLE_DATE_RANGE with a purely numeric table name. You'll need to end the dataset in a '.' and enclose the name in brackets, so that the parser doesn't complain. This should work:
SELECT foo
FROM
(TABLE_DATE_RANGE([mydata.],
TIMESTAMP('2014-03-19'),
TIMESTAMP('2015-03-21')))
LIMIT 100

Note: The underlying bug has been fixed, please see my other answer.
Original response left for posterity (since the workaround should still work, in case you need it for some reason)
Great question. That should work, but it doesn't currently. I've filed an internal bug. In the meantime, a workaround is to use the TABLE_QUERY function, as in:
SELECT foo
FROM (
TABLE_QUERY(mydata,
"TIMESTAMP(table_id) BETWEEN "
+ "TIMESTAMP('2014-03-19') "
+ "AND TIMESTAMP('2015-03-21')"))

Note that with standard SQL support in BigQuery, you can use _TABLE_SUFFIX, instead of TABLE_QUERY. For example:
SELECT foo
FROM `mydata_*`
WHERE _TABLE_SUFFIX BETWEEN '20140319' AND '20150321'
Also check this question for more about BigQuery standard SQL.

Related

SQL function to transform number with a certain pattern

I need for a SQL query to transform an int with a value between 1 to 300000 to a number which has this pattern : always 8 number.
For example:
1 becomes 00000001,
123 becomes 00000123,
123456 becomes 00123456.
I have no idea how to do that... How can I do it?
In Standard SQL, you can use this trick:
select substring(cast( (num + 100000000) as varchar(255)) from 2)
Few databases actually support this syntax. Any given database can do what you want, but the method depends on the database you are using.
For MS SQL Server
You could use FORMAT function, like this:
SELECT FORMAT(123,'00000000')
https://database.guide/how-to-format-numbers-in-sql-server/#:~:text=Starting%20from%20SQL%20Server%202012,the%20output%20should%20be%20formatted.
Read at the link Leading Zeroes
For MySql/Oracle
You could use LPAD, like this:
SELECT LPAD('123',8,'0')
https://database.guide/how-to-add-leading-zeros-to-a-number-in-mysql/

Using period "." in Standard SQL in BigQuery

BigQuery Standard SQL does not seems to allow period "." in the select statement. Even a simple query (see below) seems to fail. This is a big problem for datasets with field names that contain "." Is there an easy way to avoid this issue?
select id, time_ts as time.ts
from `bigquery-public-data.hacker_news.comments`
LIMIT 10
Returns error...
Error: Syntax error: Unexpected "." at [1:27]
This also fails...
select * except(detected_circle.center_x )
from [bigquery-public-data:eclipse_megamovie.photos_v_0_2]
LIMIT 10
It depends on what you are trying to accomplish. One interpretation is that you want to return a STRUCT named time with a single field named ts inside of it. If that's the case, you can use the STRUCT operator to build the result:
SELECT
id,
STRUCT(time_ts AS ts) AS time
FROM `bigquery-public-data.hacker_news.comments`
LIMIT 10;
In the BigQuery UI, it will display the result as id and time.ts, where the latter indicates that ts is inside a STRUCT named time.
BigQuery disallows columns in the result whose names include periods, so you'll get an error if you run the following query:
SELECT
id,
time_ts AS `time.ts`
FROM `bigquery-public-data.hacker_news.comments`
LIMIT 10;
Invalid field name "time.ts". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.
Elliot's answer great and addresses first part of your question, so let me address second part of it (as it is quite different)
First, wanted to mention that select modifiers like SELECT * EXCEPT are supported for BigQuery Standard SQL so, instead of
SELECT * EXCEPT(detected_circle.center_x )
FROM [bigquery-public-data:eclipse_megamovie.photos_v_0_2]
LIMIT 10
you should rather tried
#standardSQL
SELECT * EXCEPT(detected_circle.center_x )
FROM `bigquery-public-data.eclipse_megamovie.photos_v_0_2`
LIMIT 10
and of course now we are back to issue with `using period in standard sql
So, above code can only be interpreted as you try to eliminate center_x field from detected_circle STRUCT (nullable record). Technically speaking, this makes sense and can be done using below code
SELECT *
REPLACE(STRUCT(detected_circle.radius, detected_circle.center_y ) AS detected_circle)
FROM `bigquery-public-data.eclipse_megamovie.photos_v_0_2`
LIMIT 10
... still not clear to me how to use your recommendation to remove the entire detected_circle.*
SELECT * EXCEPT(detected_circle)
FROM `bigquery-public-data.eclipse_megamovie.photos_v_0_2`
LIMIT 10

SQL Select LIKE, find similarities

Is there a way with the Select LIKE operator to find similarities?
For example I have a table with following content.
1. 34578
2. 34878
3. 12578
Now I want select all values with are similar with 34X78, where X can be any number from 0 to 9. Result should be then record 1 and 2.
Also X can be on various positions and something like 3XX79 or 3X5X8 should be possible.
It can be also a solution using SQLScript on SAP HANA
Try using '_' wild card:
SELECT * FROM YourTable
WHERE COLUMN LIKE '34_78'
_ Is a wild card that does what you asked for, can be replaced with any thing.
You can find an explanation about LIKE wildcards here.
According to the manuals HANA suports regular expressions:
WHERE column LIKE_REGEXPR '34[0-9]78'

Pentaho Dynamic SQL queries

I have a Pentaho CDE project in development and i wanted to display a chart wich depends on several parameters (like month, year, precise date, country, etc). But when i want to "add" another parameter to my query, it doesn't work anymore... So i'm sure i'm doing something wrong but what ? Please take a look for the parameter month for example :
Select_months_query : (this is for my checkbox)
SELECT
"All" AS MONTH(TransactionDate)
UNION
SELECT DISTINCT MONTH(TransactionDate) FROM order ORDER BY MONTH(TransactionDate);
Select_barchart_query : (this is for my chart, don't mind the other tables)
SELECT pginit.Family, SUM(order.AmountEUR) AS SALES
FROM pginit INNER JOIN statg ON pginit.PG = statg.PGInit INNER JOIN order ON statg.StatGroup = order.StatGroup
WHERE (MONTH(order.TransactionDate) IN (${month}) OR "All" IN (${month}) OR ${month} IS NULL) AND
/*/* Apply the same pattern for another parameter (like year for example) *\*\
GROUP BY pginit.Family
ORDER BY SALES;
(Here, ${month} is a parameter in CDE)
Any ideas on how to do it ?
I read something there that said to use CASE clauses... But how ?
http://forums.pentaho.com/showthread.php?136969-Parametrized-SQL-clause-in-CDE&highlight=dynamic
Thank you for your help !
Try simplifying that query until it runs and returns something and work from there.
Here are some things I would look into as possible causes:
I think you need single quotes around ${parameter} expressions if they're strings;
"All" should probably be 'All' (single quotes instead of double quotes);
Avoid multi-line comments. I don't think you can have multi-line comments in CDE SQL queries, although -- for single line comments usually works.
Be careful with multi-valued parameters; they are passed as arrays, which CDA will convert into comma separated lists. Try with a single valued parameter, using = instead of IN.

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX(column, '5') = 1
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2000+ supports regex, but the catch is you have to create the UDF function in CLR before you have the ability. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9][0-9]. I do this a lot for zip codes.
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'