BigQuery UDF in Python or only in JavaScript - google-bigquery

I have been looking into how to write a UDF in BigQuery and found this syntax:
CREATE { TEMPORARY | TEMP } FUNCTION
function_name ([named_parameter[, ...]])
[RETURNS data_type]
{ [LANGUAGE language AS """body"""] | [AS (function_definition)] };
In the documentation I found, there is no clear mention of which languages are supported. The examples on the page only use "js", and I can't find examples in any other language, so I presume it only supports JavaScript, but I am wondering whether anyone knows for sure.

From that same page:
Supported external UDF languages
External UDFs support code written in JavaScript, which you specify using js as the LANGUAGE.
You can't use languages other than JavaScript.

Related

Bigquery's UDFs: how to load a jQuery library with javascript UDFs?

I'm trying to use jQuery with UDFs written in JavaScript, to be used with BigQuery. I uploaded the jQuery library to my Cloud Storage bucket, but when I try to load it in my UDF I'm getting an error
TypeError: Cannot read property 'createElement' of undefined at gs://mybucket/jquery.min.js line 2, columns 7311-7312
Any help, please?
Thank you.
CREATE TEMP FUNCTION test()
RETURNS STRING
LANGUAGE js
OPTIONS (
library=["gs://mybucket/jquery.min.js"]
)
AS """
return "test";
""";
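As you might know, some limitations apply to temporary and persistent user-defined functions.
One of them: the DOM objects Window, Document, and Node, and functions that require them, are not supported.
This is most likely the reason: the error shows jQuery trying to call createElement on a document object that is undefined in the UDF environment.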
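To illustrate the limitation, here is a minimal sketch you can run in any JavaScript runtime without a DOM (Node.js, for example). The names hasDom, initDomLibrary and tryInit are hypothetical and only stand in for what a browser-oriented library does at load time; this is not jQuery's actual code.

```javascript
// Hypothetical helper: report whether DOM globals exist in this runtime.
function hasDom() {
  return typeof document !== 'undefined' && typeof window !== 'undefined';
}

// Hypothetical stand-in for a library's startup code that assumes a browser.
function initDomLibrary(doc) {
  // Throws a TypeError when doc is undefined, much like the error in the question.
  return doc.createElement('div');
}

// Try to initialise the library and report what happened.
function tryInit() {
  try {
    initDomLibrary(typeof document === 'undefined' ? undefined : document);
    return 'ok';
  } catch (e) {
    return e instanceof TypeError ? 'TypeError' : 'other error';
  }
}

console.log(hasDom(), tryInit());
```

A library that only needs the DOM for some features may still be usable if it guards those features with a check like hasDom(), but a library that touches document while loading cannot be loaded at all.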

Is there any way to call bigquery API functions from javascript?

Here I have a scalar UDF that checks if a URL is one of my domains. To check if 'to_site' is one of my domains, I'm using indexOf in JavaScript.
CREATE TEMPORARY FUNCTION our_domain(to_site STRING)
RETURNS BOOLEAN
LANGUAGE js AS """
var domains = ['abc.com', 'xyz.com'];
if (to_site == null || to_site == undefined) return false;
for (var i = 0; i < domains.length; i++){
// var q = DOMAIN('XYZ'); // the call I would like to make here; left commented out so the function runs
if (String.prototype.toLowerCase.call(to_site).indexOf(domains[i]) !== -1)
return true;
}
return false;
""";
SELECT our_domain('www.foobar.com'), our_domain('www.xyz.com');
This returns false, then true.
It would be much nicer if I could use the DOMAIN(url) function from JavaScript. indexOf is not very good because it will match www.example.com?from=www.abc.com, when actually example.com is not one of my domains. JavaScript also has (new URL('www.example.com/q/z')).hostname to parse the domain component, but it includes the subdomain like 'www.', which complicates the comparison. BigQuery's DOMAIN(url) function only gives the domain and, knowing Google, it's fast C++.
I know I can do this
our_domain(DOMAIN('www.xyz.com'))
But in general it would be nice to use some of the bigquery API functions in javascript. Is this possible?
I also tried this
CREATE TEMPORARY FUNCTION our_domain1(to_site STRING)
AS (our_domain(DOMAIN(to_site)));
but it fails saying DOMAIN does not exist.
The DOMAIN() function is supported in BigQuery Legacy SQL, whereas scalar UDFs are part of BigQuery Standard SQL.
So, unfortunately, no, you cannot use the DOMAIN() function with code that uses a scalar UDF, at least as of now.
And no, you cannot use SQL functions within JS [scalar] UDFs, but you can use them in SQL UDFs.
Finally, as I suggested in my answer to your previous question: in a simple scenario like this one, you are better off using SQL scalar UDFs rather than JS scalar UDFs, since they do not have the limits that JS UDFs have.
The DOMAIN function in legacy SQL is more or less just a regular expression. Have you seen this previous question about DOMAIN? As Mikhail points out, you should be able to define a SQL UDF that uses a regex to extract the domain and then checks if it's in your list.
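The answers above recommend a SQL UDF, but the regex idea itself is easy to show in plain JavaScript. This is only a rough sketch: extractHost, isOurDomain and OUR_DOMAINS are hypothetical names, the regex is a simplification rather than a full URL parser, and treating subdomains such as www.xyz.com as "ours" is an assumption based on the expected output in the question.

```javascript
// Hypothetical list of owned domains, taken from the question's example.
const OUR_DOMAINS = ['abc.com', 'xyz.com'];

// Extract the host part of a URL-like string with a regular expression.
// Returns null when no host can be found.
function extractHost(url) {
  if (url === null || url === undefined) return null;
  // Optional scheme ("https://"), then everything up to the first / ? # or :
  const match = /^(?:[a-z][a-z0-9+.-]*:\/\/)?([^\/?#:]+)/i.exec(String(url).trim());
  return match ? match[1].toLowerCase() : null;
}

// True when the host equals one of our domains or is a subdomain of one.
function isOurDomain(url) {
  const host = extractHost(url);
  if (host === null) return false;
  return OUR_DOMAINS.some((d) => host === d || host.endsWith('.' + d));
}

console.log(isOurDomain('www.foobar.com'), isOurDomain('www.xyz.com'));
```

Because only the host is compared, a URL like www.example.com?from=www.abc.com no longer matches, which was the problem with the indexOf version.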

What's the correct casing to use for jsDoc comments?

I've recently started using JSDoc comments for documenting our JavaScript code, however I'm finding conflicting examples of the usage of the @param tag.
See https://code.google.com/p/jsdoc-toolkit/wiki/TagParam (PascalCase)
and https://developers.google.com/closure/compiler/docs/js-for-compiler (camel/lowercase).
camelCase makes sense to me since:
var foo = 1;
console.log(typeof foo); // outputs "number"
What's the correct casing to use for JSDoc @param comments? Or does it not matter? I'm planning to use it for document generation as well as running the code through Google Closure to get type checking.
Thanks!
The conflicting examples for JSDoc type expressions involve the JavaScript primitive types string, number and boolean, which have corresponding wrapper types: String, Number, and Boolean.
From Closure: The Definitive Guide:
The use of wrapper types is prohibited in the Closure Library, as
some functions may not behave correctly if wrapper types are used
where primitive types are expected.
See MDN: Distinction between string primitives and String objects.
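A short sketch of the difference, in case it helps. The function repeatLabel is a made-up example; the point is only that the lowercase names in the annotations refer to primitives, while new String(...) creates a wrapper object that behaves differently.

```javascript
/**
 * Repeats a label a given number of times.
 * Lowercase type names refer to the primitive types.
 * @param {string} label Text to repeat.
 * @param {number} times How many copies to produce.
 * @return {string} The repeated text.
 */
function repeatLabel(label, times) {
  return label.repeat(times);
}

const primitive = 'abc';
const wrapper = new String('abc'); // wrapper object, normally avoided

const primitiveType = typeof primitive;     // 'string'
const wrapperType = typeof wrapper;         // 'object'
const sameValue = wrapper === primitive;    // false: an object is never === a primitive

console.log(repeatLabel(primitive, 2), primitiveType, wrapperType, sameValue);
```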

What's the naming convention for a function that returns a function?

function getPerformActionFunction(someParameter) {
return function() {
performAction(someParameter);
}
}
What would you call getPerformActionFunction to indicate that it doesn't perform the action, but rather returns a function which performs the action?
The example is JavaScript, and I'd like to know if there's a preferred JavaScript convention, but I'm also interested in other languages if the answer differs.
Not sure if it's in any style guides, but I quite like the -er suffix to suggest something that is able to do an action.
e.g. getActionPerformer or fooHandler or XMLTransformer
I've used this sort of style in C#, Java and Clojure and it seems to work OK.
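Applied to the example from the question, the -er naming could look like the sketch below. The log array and the body of performAction are placeholders I added so the snippet runs on its own.

```javascript
// Placeholder action: records what it was asked to do.
const log = [];
function performAction(someParameter) {
  log.push(someParameter);
}

// The "-er" name signals that the return value is the thing that
// performs the action, not the result of performing it.
function getActionPerformer(someParameter) {
  return function actionPerformer() {
    performAction(someParameter);
  };
}

const saveDraftPerformer = getActionPerformer('save-draft');
const callsBefore = log.length; // nothing has been performed yet
saveDraftPerformer();
const callsAfter = log.length;  // the action ran exactly once

console.log(callsBefore, callsAfter);
```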

Writing a TemplateLanguage/ViewEngine

Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those if I had time/do it to learn something new kind of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a gander at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just text-replace template syntax with C# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of csharp syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough csharp to adjust single-quotes around strings to double-quotes, match curly braces, etc.
The individual rules themselves are of type ParseAction<TValue> delegate which accept a Position and return a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on its own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
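For readers who have not seen parser combinators before, here is a loose JavaScript analogue of the idea. It is not Spark's API: a parse action here is just a function from a position ({ text, index }) to a result ({ value, rest }) or null, and the helper names charWhere, and, or, rep and build are my own for this sketch.

```javascript
// Return a new position advanced by n characters.
const advance = (pos, n) => ({ text: pos.text, index: pos.index + n });

// Parse action matching one character that satisfies a predicate.
const charWhere = (test) => (pos) => {
  const c = pos.text[pos.index];
  return c !== undefined && test(c) ? { value: c, rest: advance(pos, 1) } : null;
};

// Sequence: run a, then run b from where a stopped.
const and = (a, b) => (pos) => {
  const ra = a(pos);
  if (ra === null) return null;
  const rb = b(ra.rest);
  return rb === null ? null : { value: [ra.value, rb.value], rest: rb.rest };
};

// Ordered choice: try a, fall back to b.
const or = (a, b) => (pos) => a(pos) || b(pos);

// Zero or more repetitions of a (a must consume input when it succeeds).
const rep = (a) => (pos) => {
  const values = [];
  let current = pos;
  for (let r = a(current); r !== null; r = a(current)) {
    values.push(r.value);
    current = r.rest;
  }
  return { value: values, rest: current };
};

// Transform the parsed value into something more useful.
const build = (a, fn) => (pos) => {
  const r = a(pos);
  return r === null ? null : { value: fn(r.value), rest: r.rest };
};

const letter = charWhere((c) => /[A-Za-z_]/.test(c));
const digit = charWhere((c) => /[0-9]/.test(c));

// identifier := letter (letter | digit)*
const identifier = build(
  and(letter, rep(or(letter, digit))),
  ([first, more]) => first + more.join('')
);

const result = identifier({ text: 'foo42 bar', index: 0 });
console.log(result.value, result.rest.index);
```

Each rule stays tiny, and the operators do the work of combining them, which is the property the answer describes.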
Step 1. Use regular expressions (regexp substitution) to split your input template string to a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
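Step 1 could be sketched roughly like this in JavaScript. The token shape ({ type, text } or { type, arg }) and the function name tokenize are assumptions made for the illustration; note that the literal text before [bar] keeps its trailing space.

```javascript
// Split a template into tokens: literal text becomes "write" tokens,
// [if x], [else] and [end] become control tokens, and any other [name]
// becomes a "substitute" token.
function tokenize(template) {
  const tokens = [];
  const tagPattern = /\[([^\]]*)\]/g;
  let lastIndex = 0;
  let match;
  while ((match = tagPattern.exec(template)) !== null) {
    // Literal text between the previous tag and this one.
    if (match.index > lastIndex) {
      tokens.push({ type: 'write', text: template.slice(lastIndex, match.index) });
    }
    const tag = match[1].trim();
    if (tag.startsWith('if ')) {
      tokens.push({ type: 'if', arg: tag.slice(3).trim() });
    } else if (tag === 'else' || tag === 'end') {
      tokens.push({ type: tag });
    } else {
      tokens.push({ type: 'substitute', arg: tag });
    }
    lastIndex = tagPattern.lastIndex;
  }
  // Trailing literal text after the last tag.
  if (lastIndex < template.length) {
    tokens.push({ type: 'write', text: template.slice(lastIndex) });
  }
  return tokens;
}

const tokens = tokenize('hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!');
console.log(tokens.map((t) => t.type).join(' '));
```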
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
class Instruction {
}
class Write : Instruction {
string text;
}
class Substitute : Instruction {
string varname;
}
class Sequence : Instruction {
Instruction[] items;
}
class If : Instruction {
string condition;
Instruction thenBranch;
Instruction elseBranch; // "else" is a reserved word in C#, so it cannot be used as a field name
}
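To make Step 2 concrete, here is a rough JavaScript sketch of turning the flat token list into a tree. It uses plain objects instead of classes, and the token and node shapes (type, kind, thenBranch, elseBranch) are assumptions for the illustration. Unlike the tree drawn above, the else branch is always wrapped in a Sequence, which keeps the code uniform.

```javascript
// Build a tree from a flat token list. Node shapes:
// { kind: 'Sequence', items }, { kind: 'Write', text },
// { kind: 'Substitute', varname }, { kind: 'If', condition, thenBranch, elseBranch }.
function parse(tokens) {
  let i = 0;

  // Collect nodes until a token of one of the stop types (or end of input).
  function parseSequence(stopTypes) {
    const items = [];
    while (i < tokens.length && !stopTypes.includes(tokens[i].type)) {
      const token = tokens[i++];
      if (token.type === 'write') {
        items.push({ kind: 'Write', text: token.text });
      } else if (token.type === 'substitute') {
        items.push({ kind: 'Substitute', varname: token.arg });
      } else if (token.type === 'if') {
        items.push(parseIf(token.arg));
      } else {
        throw new Error('Unexpected token: ' + token.type);
      }
    }
    return { kind: 'Sequence', items };
  }

  // Called after an "if" token has been consumed.
  function parseIf(condition) {
    const thenBranch = parseSequence(['else', 'end']);
    let elseBranch = null;
    if (i < tokens.length && tokens[i].type === 'else') {
      i++;
      elseBranch = parseSequence(['end']);
    }
    if (i >= tokens.length || tokens[i].type !== 'end') {
      throw new Error('Missing [end] for [if ' + condition + ']');
    }
    i++;
    return { kind: 'If', condition, thenBranch, elseBranch };
  }

  // At top level nothing stops the sequence, so a stray else/end is an error.
  return parseSequence([]);
}

const tokens = [
  { type: 'write', text: 'hel<b>lo' },
  { type: 'if', arg: 'foo' },
  { type: 'write', text: 'bar is ' },
  { type: 'substitute', arg: 'bar' },
  { type: 'write', text: '.' },
  { type: 'else' },
  { type: 'write', text: 'baz' },
  { type: 'end' },
  { type: 'write', text: 'world</b>!' },
];
const tree = parse(tokens);
console.log(JSON.stringify(tree, null, 2));
```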
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
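A minimal sketch of such an interpreter in JavaScript, assuming tree nodes are plain objects with a kind field (Write, Substitute, Sequence, If) and that variables come from a simple vars object. The names render, thenBranch and elseBranch are hypothetical, and treating a condition as "the variable is truthy" is my simplification.

```javascript
// Walk the tree recursively and return the rendered string.
function render(node, vars) {
  switch (node.kind) {
    case 'Write':
      return node.text;
    case 'Substitute': {
      const value = vars[node.varname];
      // Missing variables render as an empty string.
      return value === undefined || value === null ? '' : String(value);
    }
    case 'Sequence':
      return node.items.map((item) => render(item, vars)).join('');
    case 'If':
      if (vars[node.condition]) return render(node.thenBranch, vars);
      return node.elseBranch ? render(node.elseBranch, vars) : '';
    default:
      throw new Error('Unknown node kind: ' + node.kind);
  }
}

// Hand-built tree for: hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
const tree = {
  kind: 'Sequence',
  items: [
    { kind: 'Write', text: 'hel<b>lo' },
    {
      kind: 'If',
      condition: 'foo',
      thenBranch: {
        kind: 'Sequence',
        items: [
          { kind: 'Write', text: 'bar is ' },
          { kind: 'Substitute', varname: 'bar' },
          { kind: 'Write', text: '.' },
        ],
      },
      elseBranch: { kind: 'Write', text: 'baz' },
    },
    { kind: 'Write', text: 'world</b>!' },
  ],
};

console.log(render(tree, { foo: true, bar: 'BAR' })); // hel<b>lobar is BAR.world</b>!
console.log(render(tree, { foo: false }));            // hel<b>lobazworld</b>!
```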
An alternative approach (instead of steps 1-3), if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
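JavaScript also has eval-like facilities, so the same trick can be sketched there. This version is a variant of the idea rather than a literal regexp substitution: it splits the template with a regex, generates JavaScript source, and compiles it with new Function. The name compileTemplate and the [if x]/[else]/[end] handling are assumptions carried over from the example template; only use this with templates you trust, since the generated code is executed.

```javascript
// Convert a template into the source of a JavaScript function and compile it.
function compileTemplate(template) {
  // With a capturing group, split() keeps tag contents at odd indexes
  // and literal text at even indexes.
  const parts = template.split(/\[([^\]]*)\]/);
  let body = 'let out = "";\n';
  parts.forEach((part, index) => {
    if (index % 2 === 0) {
      // Literal text: JSON.stringify gives a safely quoted string literal.
      if (part !== '') body += 'out += ' + JSON.stringify(part) + ';\n';
      return;
    }
    const tag = part.trim();
    if (tag.startsWith('if ')) {
      body += 'if (vars[' + JSON.stringify(tag.slice(3).trim()) + ']) {\n';
    } else if (tag === 'else') {
      body += '} else {\n';
    } else if (tag === 'end') {
      body += '}\n';
    } else {
      const key = JSON.stringify(tag);
      // Missing variables render as an empty string.
      body += 'out += (vars[' + key + '] == null ? "" : String(vars[' + key + ']));\n';
    }
  });
  body += 'return out;';
  return new Function('vars', body);
}

const renderTemplate = compileTemplate(
  'hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!'
);
console.log(renderTemplate({ foo: true, bar: 'BAR' })); // hel<b>lobar is BAR.world</b>!
console.log(renderTemplate({ foo: false }));            // hel<b>lobazworld</b>!
```

The trade-off compared with steps 1-3 is that syntax errors in the template (an unmatched [end], for example) surface as JavaScript syntax errors at compile time instead of as template errors you control.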
There are sooo many things to do. But it does work for one simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.