BigQuery - Is it possible to use an IF statement within à FROM? - sql

I would like to do the following
FROM if(... = ...,
table_date_range(mytable, timestamp('2017-01-01'), timestamp('2017-01-17')),
table_date_range(mytable, timestamp('2016-01-01'), timestamp('2016-01-17'))
)
Is this kind of operation allowed on BigQuery ?

You can do this using a condition on _TABLE_SUFFIX in standard SQL. For example,
SELECT *
FROM `my-dataset.mytable`
WHERE IF(condition,
_TABLE_SUFFIX BETWEEN '20170101' AND '20170117',
_TABLE_SUFFIX BETWEEN '20160101' AND '20160117');
One thing to keep in mind is that since the matching table suffixes are probably determined dynamically (based on something in your table) you will be charged for a full table scan.

For BigQuery Legacy SQL (which code in your question looks more like) you can use TABLE_QUERY table wildcard function to achieve this.
See example below:
SELECT
...
FROM
TABLE_QUERY([mydataset],
"CASE WHEN ... = ...
THEN REPLACE(table_id, 'mytable_', '') BETWEEN '20170101' AND '20170117'
ELSE REPLACE(table_id, 'mytable_', '') BETWEEN '20160101' AND '20160117'
")
or , with IF()
SELECT
...
FROM
TABLE_QUERY([mydataset],
"IF(... = ..., REPLACE(table_id, 'mytable_', '') BETWEEN '20170101' AND '20170117',
REPLACE(table_id, 'mytable_', '') BETWEEN '20160101' AND '20160117')
")
Meantime, when possible, consider migrating to BigQuery Standard SQL

Related

AR/Arel - How can i compose a query to SELECT a conditional CONCAT of columns

I've got a model method that conditionally concatenates the user's username ("login") and real name, if they've saved a real name - otherwise it just shows the username. I'd like to rewrite the query in ActiveRecord or Arel.
It looks like I should use an Arel::Nodes::NamedFunction. But i don't understand how to do the conditional concatenation with a named function. (Does Arel know about "if"? I can't find any reference in the docs.)
def primer_values
connection.select_values(%(
SELECT CONCAT(users.login,
IF(users.name = "", "", CONCAT(" <", users.name, ">")))
FROM users
ORDER BY IF(last_login > CURRENT_TIMESTAMP - INTERVAL 1 MONTH,
last_login, NULL) DESC,
contribution DESC
LIMIT 1000
)).uniq.sort
end
There's also similarly a conditional in ORDER BY.
While generally I abhor Raw SQL in rails given this usage I'd leave it as is. Although I might change it to something a bit more idiomatic like.
User
.order(
Arel.sql("IF(last_login > CURRENT_TIMESTAMP - INTERVAL 1 MONTH,last_login, NULL)").desc,
User.arel_table[:contribution].desc)
.limit(1000)
.pluck(Arel.sql(
'CONCAT(users.login,
IF(users.name = "", "",
CONCAT(" <", users.name, ">")))'))
.uniq.sort
Converting this to Arel without abstracting it into an object of its own will damage the readability significantly.
That being said just to give you an idea; the first part would be 3 NamedFunctions
CONCAT
IF
CONCAT
Arel::Nodes::NamedFuction.new(
"CONCAT",
[User.arel_table[:name],
Arel::Nodes::NamedFuction.new(
"IF",
[User.arel_table[:name].eq(''),
Arel.sql("''"),
Arel::Nodes::NamedFuction.new(
"CONCAT",
[Arel.sql("' <'"),
User.arel_table[:name],
Arel.sql("'>'")]
)]
)]
)
A NamedFunction is a constructor for FUNCTION_NAME(ARG1,ARG2,ARG3) so any SQL that uses this syntax can be created using NamedFunction including empty functions like NOW() or other syntaxes like LATERAL(query).

Google BigQuery: TABLE_QUERY AND TABLE_DATE_RANGE

I have Big Query tables like below, and like to issue a query to the tables marked <=.
prefix_AAAAAAA_20170320
prefix_AAAAAAA_20170321
prefix_AAAAAAA_20170322 <=
prefix_AAAAAAA_20170323 <=
prefix_AAAAAAA_20170324 <=
prefix_AAAAAAA_20170325
prefix_BBBBBBB_20170320
prefix_BBBBBBB_20170321
prefix_BBBBBBB_20170322 <=
prefix_BBBBBBB_20170323 <=
prefix_BBBBBBB_20170324 <=
prefix_BBBBBBB_20170325
prefix_CCCCCCC_20170320
prefix_CCCCCCC_20170321
prefix_CCCCCCC_20170322
prefix_CCCCCCC_20170323
prefix_CCCCCCC_20170324
prefix_CCCCCCC_20170325
I made a query as this
SELECT * FROM
(TABLE_QUERY(mydataset,
'table_id CONTAINS "prefix" AND
(table_id CONTAINS "AAAAAA" OR table_id CONTAINS "BBBBBB")' )
AND
TABLE_DATE_RANGE(mydataset.prefix, TIMESTAMP('2017-03-22'), TIMESTAMP('2017-03-24')))
I got this error.
Error: Encountered " "AND" "AND "" at line 5, column 4. Was expecting: ")" ...
Does anybody has ideas?
You cannot mix TABLE_QUERY and TABLE_DATE_RANGE for exactly same FROM!
Try something like below
#legacySQL
SELECT *
FROM (TABLE_QUERY(mydataset, 'REGEXP_MATCH(table_id, "prefix_[AB]{7}_2017032[234]")'))
Consider Migrating to BigQuery Standard SQL
In this case you can Query Multiple Tables Using a Wildcard Table
See How to Migrate from TABLE_QUERY() to _TABLE_SUFFIX
I think, in this case your query can look like
#standardSQL
SELECT *
FROM `mydataset.prefix_*`
WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, '[AB]{7}_2017032[234]')
I can not migrate to Standard SQL because ...
If I would like to search for example between 2017-03-29 and 2017-04-02, do you have any smart SQL
Try below version
#legacySQL
SELECT *
FROM (TABLE_QUERY(mydataset,
'REGEXP_MATCH(table_id, r"prefix_[AB]{7}_(\d){8}") AND
RIGHT(table_id, 8) BETWEEN "20170329" AND "20170402"'))
Of course yo can adjust above to use whatever exactly logic yo need to apply!

OrientDB using LET values in subQuery

How can you use a LET temporary variable inside the Where clause in an OrientDB SQL subQuery.
Here is the context in wich I'm trying to use it.
select *, $t.d from Currency
let $t = (select createdDate.asLong() as d from 13:1)
where createdDate.asLong() >= $t.d and #rid <> #13:1
order by createdDate ASC
The validation in the where statement for the dates does not work. The subQuery actually works on its own. The Query works as well when replacing $t.d with the result from the subQuery.
The $t.d is an array so you are comparing something like createdDate.asLong() >= [1234599]
You have to do this: createdDate.asLong() >= $t[0].d

Dynamic operators in where clause in Informatica

Is it possible to create dynamic sql operator in Informatica using SQL Transformation. For eg.
SELECT p.id
FROM products p
WHERE p.weight ?operator? '30'
where ?operator? can have values: <, > , =
or even: in, not in
The SQL Editor window of SQL Transform allows to use parameter binding (?parameter?) and string substitution (~string~). You need the latter:
SELECT p.id
FROM products p
WHERE p.weight ~operator~ '30'
This topic is described well in SQL Transformation > Query Mode chapter of the Transformation Guide.
One idea is to use a parameter for the whole condition, e.g. With this sample paramFile:
[s_m_test_source_param]
$$sq_param = Id = 1
Use $$sq_param value for Source Filter property on Source Qualifier. In your case youd need to set the $$sq_parameter in this way:
$$sq_param = p.weight > '30'
Obviously, this is not perfect solution you've been looking for.

SQL regex and field

I want to change the query to return multiply values in extra_fields, how can I change the regex? Also I don't understand what extra_fields is - is it a field? If so why it is not called with the table prefix like i.extra_fields?
SELECT i.*,
CASE WHEN i.modified = 0 THEN i.created ELSE i.modified END AS lastChanged,
c.name AS categoryname,
c.id AS categoryid,
c.alias AS categoryalias,
c.params AS categoryparams
FROM #__k2_items AS i
LEFT JOIN #__k2_categories AS c ON c.id = i.catid
WHERE i.published = 1
AND i.access IN(1,1)
AND i.trash = 0
AND c.published = 1
AND c.access IN(1,1)
AND c.trash = 0
AND (i.publish_up = '0000-00-00 00:00:00'
OR i.publish_up <= '2013-06-12 22:45:19'
)
AND (i.publish_down = '0000-00-00 00:00:00'
OR i.publish_down >= '2013-06-12 22:45:19'
)
AND extra_fields REGEXP BINARY '(.*{"id":"2","value":\["[^\"]*1[^\"]*","[^\"]*2[^\"]*","[^\"]*3[^\"]*"\]}.*)'
ORDER BY i.id DESC
The extra_fields is a column of the #__k2_items table. The table qualifier can be omitted, because it is not ambiguous in this query. The column is JSON encoded. That is a serialization format used to store information which is not searchable by design. Applying a RegExp may work one day, but fail another day, since there is no guarantee for id preceeding value (as in your example).
The right way
The right way to filter this is to ignore the extra_fields condition in the SQL query an evaluate in the resultset instead. Example:
$rows = $db->loadObjectList('id');
foreach ($rows as $id => $row) {
$extra_fields = json_decode($row->extra_fields);
if ($extra_fields->id != 2) {
unset($rows[$id]);
}
}
The short way
If you can't change the database layout (which is true for extensions you want to keep updateable), you must split the condition into two, because there is no guarantee for a certain order of the subfields. For some reason, one day value may occur before id. So change your query to
...
AND extra_fields LIKE '%"id":"2"%'
AND extra_fields REGEXP BINARY '"value":\[("[^\"]*[123][^\"]*",?)+\]'
Prepare an intermediate table to hold the contents of extra_fields. Each extra_fields field will be converted into a series of records. Then do a join.
Create a trigger and cronjob to keep the temp table in sync.
Another way is to write UDF in Perl that will decode the field, but AFAIK it is not indexable in mysql.
Using an external search engine is out of scope.
Ok, i didnt want to change the db strucure, i gost some help and changed the regex intoAND extra_fields REGEXP BINARY '(.*{"id":"2","value":\[("[^\"]*[123][^\"]*",?)+\]}.*)'
and i got the right resaults
Thanks