I was trying to create a BigQuery UDF which requires an external npm package.
CREATE TEMPORARY FUNCTION tempfn(message STRING)
RETURNS STRING
LANGUAGE js AS """
var tesfn = require('js-123');
return tesfn(message)
""";
SELECT tempfn("Hello") as test;
It gives me an error:
ReferenceError: require is not defined at tempfn(STRING) line 2,
columns 15-16
Is there a way that I can use these packages?
You can't load npm packages using require from JavaScript UDFs. You can, however, load external libraries from Cloud Storage (GCS), as outlined in the documentation. The example the documentation gives is:
CREATE TEMP FUNCTION myFunc(a FLOAT64, b STRING)
RETURNS STRING
LANGUAGE js AS
"""
// Assumes 'doInterestingStuff' is defined in one of the library files.
return doInterestingStuff(a, b);
"""
OPTIONS (
library="gs://my-bucket/path/to/lib1.js",
library=["gs://my-bucket/path/to/lib2.js", "gs://my-bucket/path/to/lib3.js"]
);
SELECT myFunc(3.14, 'foo');
Here the assumption is that you have files with these names in Cloud Storage, and that one of them defines doInterestingStuff.
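If you need to get the library files into Cloud Storage from Python first, here is a minimal sketch with the google-cloud-storage client; the bucket and paths are just the placeholders from the documentation example above.
from google.cloud import storage

# Upload a local JavaScript library so the UDF's OPTIONS(library=...) entry can reference it.
client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name from the example
bucket.blob("path/to/lib1.js").upload_from_filename("lib1.js")  # local file that defines doInterestingStuff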
As per the Python API documentation of BigQuery (version 3.3.2), there is a method insert_rows_from_dataframe(dataframe: pandas.DataFrame), but no similar method exists for PyArrow.
insert_rows_from_dataframe(
table: Union[
google.cloud.bigquery.table.Table,
google.cloud.bigquery.table.TableReference,
str,
],
dataframe,
selected_fields: Optional[
Sequence[google.cloud.bigquery.schema.SchemaField]
] = None,
chunk_size: int = 500,
**kwargs: Dict
)
So what is an equivalent way to load data from a pyarrow.Table into a BigQuery table?
This method ultimately writes a file (I believe Parquet by default) and then executes a load job. You can do that directly with the arrow table. It might pay to file a feature request for the BQ library to support this use case directly.
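A minimal sketch of that load-job approach, assuming the google-cloud-bigquery and pyarrow packages are installed; the destination table ID is a placeholder.
import io
import pyarrow as pa
import pyarrow.parquet as pq
from google.cloud import bigquery

# Serialize the pyarrow.Table to Parquet in memory, then load it with a load job.
arrow_table = pa.table({"name": ["a", "b"], "value": [1, 2]})  # your pyarrow.Table
buf = io.BytesIO()
pq.write_table(arrow_table, buf)
buf.seek(0)

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
# "project.dataset.table" is a placeholder for the destination table.
client.load_table_from_file(buf, "project.dataset.table", job_config=job_config).result()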
I am using a Spark UDF to read some data from a GET endpoint and write it as a CSV file to an Azure Blob Storage location.
My GET endpoint takes two query parameters, param1 and param2.
So initially, I have a dataframe paramDF that has two columns param1 and param2.
param1  param2
12      25
45      95
Schema: paramDF: pyspark.sql.dataframe.DataFrame
  param1: string
  param2: string
Now I write a UDF that accepts the two parameters, register it, and then invoke this UDF for each row in the dataframe.
The UDF is as below:
def executeRestApi(param1, param2):
    dlist = []
    try:
        print(DataUrl.format(token=TOKEN, q1=param1, q2=param2))
        response = requests.get(DataUrl.format(token=TOKEN, oid=param1, wid=param2))
        if response.status_code == 200:
            metrics = response.json()['data']['metrics']
            dic = {}
            dic['metric1'] = metrics['metric1']
            dic['metric2'] = metrics['metric2']
            dlist.append(dic)
        pandas.DataFrame(dlist).to_csv("../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv", header=True, index=False, mode='x')
        return "Success"
    except Exception as e:
        return "Failure"
Register the UDF:
udf_executeRestApi = udf(executeRestApi, StringType())
Finally, I call the UDF this way:
paramDf.withColumn("result", udf_executeRestApi(col("param1"), col("param2")))
I don't see any errors while calling the UDF; in fact, the UDF returns the value "Success" correctly.
The only thing is that the files are not written to Azure Blob Storage, no matter what I try.
UDFs are primarily meant for custom functionality (and return a value). However, in my case, I am trying to execute the GET API call and the write operation using the UDF (and that is my main intention here).
There is no issue with my pandas.DataFrame().to_csv(); the same line, when tried separately with a simple list, writes data to the Blob correctly.
What could be going wrong here?
Note: Env is Spark on Databricks.
There isn't any problem with the indentation, even though it looks untidy here.
Try calling display on the dataframe. withColumn is a lazy transformation, so the UDF (and the write it performs) does not actually execute until an action such as display, collect, or a write runs on the resulting dataframe.
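For example, using the dataframe and UDF names from the question (display is the Databricks notebook helper, since the question notes the environment is Spark on Databricks):
from pyspark.sql.functions import col

# withColumn only defines the transformation; an action forces the UDF to run.
resultDf = paramDf.withColumn("result", udf_executeRestApi(col("param1"), col("param2")))
display(resultDf)  # or resultDf.collect(), or write resultDf out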
How can I implement the fuzzball JavaScript library as a UDF inside BigQuery? Fuzzball has a good number of dependency libraries, which makes it challenging to include as part of a UDF in BigQuery.
It's unclear where you're running into trouble, so I will walk through the process of creating a JavaScript UDF using fuzzball.
Download the fuzzball package: npm i fuzzball
Upload the appropriate file(s) to a GCS bucket. What you want is likely a UMD or ESM build; at the time of writing, fuzzball.umd.min.js.
Write your SQL UDF, providing the bucket path and package file in OPTIONS.
For example:
CREATE OR REPLACE FUNCTION
project.dataset.func (str_1 STRING, str_2 STRING)
RETURNS INT64
LANGUAGE js AS '''
return fuzzball.distance(str_1, str_2);
'''
OPTIONS (library='gs://bucket_name/fuzzball.umd.min.js');
And now you should be able to call your UDF as needed.
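If you want to call it from Python rather than the console, here is a minimal sketch with the google-cloud-bigquery client; the function name mirrors the placeholder used above.
from google.cloud import bigquery

client = bigquery.Client()
# Query the persistent UDF created above; 'project.dataset.func' is a placeholder.
query = "SELECT `project.dataset.func`('kitten', 'sitting') AS dist"
for row in client.query(query).result():
    print(row.dist)  # distance computed by fuzzball.distance inside the UDF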
I would like to reference a UDF inside a view. According to the BigQuery documentation ('bq help mk') and to this post, How do I create a BigQuery view that uses a user-defined function?, it is possible to do it with the "--view_udf_resource" syntax.
However, when I try it I get the following error:
# gsutil cat gs://mybucket/bar.js
CREATE TEMP FUNCTION GetWord() AS ('fire');
# bq mk --nouse_legacy_sql --view_udf_resource="gs://mybucket/bar2.js" --view="SELECT 1 as one, GetWord() as myvalue" mydataset.myfoo
Error in query string: Function not found: GetWord at [1:18]
I have also tried it with the Java API and I get the same error:
public void foo(){
final String viewQuery = "#standardSQL\n SELECT 1 as one, GetWord() as myvalue";
UserDefinedFunction userDefinedFunction = UserDefinedFunction.inline("CREATE TEMP FUNCTION GetWord() AS ('fire');");
ViewDefinition tableDefinition = ViewDefinition.newBuilder(viewQuery)
.setUserDefinedFunctions(userDefinedFunction)
.build();
TableId viewTableId = TableId.of(projectName, dataSetName, "foobar");
final TableInfo tableInfo = TableInfo.newBuilder(viewTableId, tableDefinition).build();
bigQuery.create(tableInfo);
}
com.google.cloud.bigquery.BigQueryException: Function not found: GetWord at [2:19]
Am I doing something wrong? Or is Google's documentation misleading, and it is not possible to reference a custom UDF from a view?
You cannot (currently) create a view using standard SQL that uses UDFs. You need to have all of the logic be inline as part of the query itself, and the post that you are looking at is about JavaScript UDFs using legacy SQL. There is an open feature request to support permanent registration of UDFs, however, which would enable you to reference UDFs from views.
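For example, the view's query has to carry the value itself rather than call GetWord(). Here is a sketch with the Python client, using the placeholder names from the question; the same inlining applies to the Java ViewDefinition in the question.
from google.cloud import bigquery

# Create the view with the logic inlined in the query, with no UDF reference.
client = bigquery.Client()
view = bigquery.Table("projectName.dataSetName.foobar")  # placeholders from the question
view.view_query = "SELECT 1 AS one, 'fire' AS myvalue"
client.create_table(view)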
I am trying to transform an R/Shiny/SQL application to use data from SQL Server instead of Oracle. In the original code there are a lot of conditions of the following type: if the table exists, use it as a data set; otherwise, upload new data. I was looking for a counterpart of the dbExistsTable command from the DBI/ROracle packages, but odbcTableExists is unfortunately just an internal RODBC function that is not usable from the R environment. RODBCDBI, a wrapper for the RODBC package that allows DBI-style commands, does not seem to work either. Any ideas?
Here is a code example:
library(RODBC)
library(RODBCDBI)
con <- odbcDriverConnect('driver={SQL Server};server=xx.xx.xx.xxx;database=test;uid=user;pwd=pass123')
odbcTableExists(con, "table")
Error: could not find function "odbcTableExists"
dbExistsTable(con,"table")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbExistsTable’ for signature ‘"RODBC", "character"’
You could use
"table" %in% sqlTables(con)$TABLE_NAME
where "table" is a character string naming the table you are looking for, and con is the RODBC connection from your example.