to_json not working with selectExpr in Spark SQL

I am reading a Databricks blog link
and I found a problem with the built-in function to_json.
With the code below from that tutorial, it returns this error:
org.apache.spark.sql.AnalysisException: Undefined function: 'to_json'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.
Does this mean that the usage in the tutorial is wrong, and that no UDF can be used in selectExpr? Could I do something like register this to_json function in the default database?
val deviceAlertQuery = notifydevicesDS
  .selectExpr("CAST(dcId AS STRING) AS key", "to_json(struct(*)) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("topic", "device_alerts")
  .start()

You need to import the to_json function:
import org.apache.spark.sql.functions.to_json
Then this should work through the DataFrame API rather than selectExpr:
data.withColumn("key", $"dcId".cast("string"))
  .select(to_json(struct(data.columns.head, data.columns.tail: _*)).as("value"))
  .show()
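Wired into the original streaming write, that would look roughly like this (a sketch reusing the dataset, brokers, and topic names from the question; not tested against a live Kafka cluster):
import org.apache.spark.sql.functions.{col, struct, to_json}

val deviceAlertQuery = notifydevicesDS
  .withColumn("key", col("dcId").cast("string"))
  // serialize every original column into a single JSON string for Kafka's value
  .select(col("key"), to_json(struct(notifydevicesDS.columns.map(col): _*)).as("value"))
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("topic", "device_alerts")
  .start()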
You must also be on Spark 2.x.
I hope this helps to solve your problem.

Based on information I got from the mailing list: this function was not added to SQL until Spark 2.2.0. Here is the commit link: commit.
Hope this will help. Thanks to Hyukjin Kwon and Burak Yavuz.
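For reference, once you are on Spark 2.2.0 or later, the selectExpr form from the blog works as-is, because to_json is registered as a SQL function there. A minimal sketch with toy data (only the column name dcId is taken from the question):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("to_json-sql-demo").master("local[*]").getOrCreate()
import spark.implicits._

// toy stand-in for the question's notifydevicesDS
val notifydevicesDS = Seq((1, "overheat"), (2, "low battery")).toDF("dcId", "alert")

// on Spark >= 2.2.0, to_json can be used inside SQL expressions
notifydevicesDS
  .selectExpr("CAST(dcId AS STRING) AS key", "to_json(struct(*)) AS value")
  .show(false)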

Related

pandas to_csv function not writing to Blob Storage when called from Spark UDF

I am using a Spark UDF to read some data from a GET endpoint and write it as a CSV file to an Azure Blob location.
My GET endpoint takes two query parameters, param1 and param2.
So initially I have a dataframe paramDF that has two columns, param1 and param2.
param1  param2
12      25
45      95
Schema: paramDF:pyspark.sql.dataframe.DataFrame
param1:string
param2:string
Now I write a UDF that accepts the two parameters, register it, and then invoke the UDF for each row in the dataframe.
The UDF is as below:
def executeRestApi(param1, param2):
    dlist = []
    try:
        print(DataUrl.format(token=TOKEN, q1=param1, q2=param2))
        response = requests.get(DataUrl.format(token=TOKEN, oid=param1, wid=param2))
        if response.status_code == 200:
            metrics = response.json()['data']['metrics']
            dic = {}
            dic['metric1'] = metrics['metric1']
            dic['metric2'] = metrics['metric2']
            dlist.append(dic)
            pandas.DataFrame(dlist).to_csv("../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv", header=True, index=False, mode='x')
        return "Success"
    except Exception as e:
        return "Failure"
Register the UDF:
udf_executeRestApi = udf(executeRestApi, StringType())
Finally, I call the UDF this way:
paramDf.withColumn("result", udf_executeRestApi(col("param1"), col("param2")))
I don't see any errors while calling the UDF; in fact, the UDF returns the value "Success" correctly.
The only thing is that the files are not written to Azure Blob storage, no matter what I try.
UDFs are primarily meant for custom functionality (and return a value). However, in my case, I am trying to execute the GET API call and the write operation using the UDF (and that is my main intention here).
There is no issue with my pandas.DataFrame().to_csv(), as the same line, when tried separately with a simple list, writes data to the Blob correctly.
What could be going wrong here?
Note: Env is Spark on Databricks.
There isn't any problem with the indentation, even though it looks untidy here.
Try calling display on the dataframe. withColumn is a lazy transformation, so the UDF body does not actually run until an action is triggered on the result.
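A minimal sketch, reusing the names from the question (display is Databricks-specific; collect is the equivalent trigger elsewhere):
from pyspark.sql.functions import col

# withColumn is lazy: the UDF body only runs once an action fires on the result
resultDf = paramDf.withColumn("result", udf_executeRestApi(col("param1"), col("param2")))
display(resultDf)      # Databricks action that pushes every row through the UDF
# resultDf.collect()   # equivalent trigger outside Databricks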

How to pass a response value from the Background to another feature file in a JS function using Karate

I have got the response in the Background to one of the requests and am passing it to a function for polling purposes, which needs to run until a specific condition is met. In that function, I need to pass the value on to the called feature file:
while (true) {
  var result = karate.call('extractProgress.feature') packageid; // packageid is the response of another request
I followed a similar approach to the one below, but it does not pass any parameter:
https://github.com/intuit/karate/blob/933d3803987a736cc1a38893e7039c4b5e5132fc/karate-demo/src/test/java/demo/polling/polling.feature
But I am getting the below error:
feature(com.intuit.karate.testng.KarateTestngTest):
java.lang.RuntimeException: javascript evaluation failed: packageid,
ReferenceError: "packageid" is not defined in <eval> at line number 1
The input for call inside JS should be given as:
karate.call("<featureFile>", yourInputVariable);
Refer to this in the docs:
https://github.com/intuit/karate#the-karate-object
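Applied to the loop from the question, a minimal sketch might look like this (the exit condition on result is hypothetical and depends on what extractProgress.feature returns):
while (true) {
  // pass packageid as a JSON argument; the called feature sees a variable named packageid
  var result = karate.call('extractProgress.feature', { packageid: packageid });
  if (result.status == 'COMPLETED') break; // hypothetical field, adjust to your feature's response
}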
It sounds wrong to me, maybe you have a typo.
Also please read the docs carefully. Only JSON is supported as a call argument.
The best way for you to get support is to follow this process, else no one can help you with the limited info you seem to be providing in your questions.
https://github.com/intuit/karate/wiki/How-to-Submit-an-Issue

Is "--view_udf_resource" broken?

I would like to reference a UDF inside a view. According to the BigQuery documentation ('bq help mk') and to this post, How do I create a BigQuery view that uses a user-defined function?, it should be possible with the "--view_udf_resource" flag.
However, when I try it I get the following error:
# gsutil cat gs://mybucket/bar.js
CREATE TEMP FUNCTION GetWord() AS ('fire');
# bq mk --nouse_legacy_sql --view_udf_resource="gs://mybucket/bar2.js" --view="SELECT 1 as one, GetWord() as myvalue" mydataset.myfoo
Error in query string: Function not found: GetWord at [1:18]
I have also tried it with the Java API and I get the same error:
public void foo() {
    final String viewQuery = "#standardSQL\n SELECT 1 as one, GetWord() as myvalue";
    UserDefinedFunction userDefinedFunction =
        UserDefinedFunction.inline("CREATE TEMP FUNCTION GetWord() AS ('fire');");
    ViewDefinition tableDefinition = ViewDefinition.newBuilder(viewQuery)
        .setUserDefinedFunctions(userDefinedFunction)
        .build();
    TableId viewTableId = TableId.of(projectName, dataSetName, "foobar");
    final TableInfo tableInfo = TableInfo.newBuilder(viewTableId, tableDefinition).build();
    bigQuery.create(tableInfo);
}
com.google.cloud.bigquery.BigQueryException: Function not found: GetWord at [2:19]
Am I doing something wrong? Or is Google's documentation misleading, and it is not possible to reference a custom UDF from a view?
You cannot (currently) create a view using standard SQL that uses UDFs. You need to have all of the logic be inline as part of the query itself, and the post that you are looking at is about JavaScript UDFs using legacy SQL. There is an open feature request to support permanent registration of UDFs, however, which would enable you to reference UDFs from views.
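As a workaround, a minimal sketch of the same toy view with the logic inlined instead of calling the UDF:
# bq mk --nouse_legacy_sql --view="SELECT 1 AS one, 'fire' AS myvalue" mydataset.myfoo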

How to find the dbCreateStruct method in the Mule ESB 3.8.4 jar file?

Recently I wanted to use a UDT while inserting data into the DB through a procedure call. UDTs are supported as of Mule ESB 3.8.4, and there are some examples in the MULE-11138 Jira task. But when I use the MEL function defined in that task, I get:
java.lang.AbstractMethodError: org.enhydra.jdbc.standard.StandardXAConnectionHandle.createStruct(Ljava/lang/String;[Ljava/lang/Object;)Ljava/sql/Struct;
Can anyone help me with this?
Thanks.
Finally I was able to resolve this error. All I needed to do was use the type defined in the DB directly when creating the array or struct, like:
dbCreateArray('Oracle_Configuration', 'APPS.CONTACT_INFO_ASYNC_TAB_TYPE', payload);
Thanks,
Anurag
@Anurag
This is available only in the Enterprise Edition.

How do I utilize a named instance within the ObjectFactory.Initialize call for StructureMap?

I am trying to do the following bootstrapping:
x.For(Of IErrorLogger).Use(Of ErrorLogger.SQLErrorLogger)().
    Ctor(Of IErrorLogger)("backupErrorLogger").Is(ObjectFactory.GetNamedInstance(Of IErrorLogger)("Disk"))

x.For(Of IErrorLogger).Add(
    Function()
        Return New ErrorLogger.DiskErrorLogger(
            CreateErrorFileName(ServerMapPath(GetAppSetting("ErrorLogFolder"))))
    End Function).Named("Disk")
But it shows this error:
StructureMap Exception Code: 200
Could not find an Instance named "Disk" for PluginType Logging.IErrorLogger
I sort of understand why this is happening. The question is: how do I utilize a named instance within the registry? Maybe something like lazy initialization for the ctor argument of the SQLErrorLogger? I am not sure how to make it happen.
Thanks in advance for any help you can provide.
I found the correct way to do it in the latest version (2.6.1) of StructureMap:
x.For(Of IErrorLogger).Use(Of ErrorLogger.SQLErrorLogger)().
Ctor(Of IErrorLogger)("backupErrorLogger").Is(
Function(c) c.ConstructedBy(Function() ObjectFactory.GetNamedInstance(Of IErrorLogger)("Disk"))
)
x.For(Of IErrorLogger).Add(Function() _
New ErrorLogger.DiskErrorLogger(
CreateErrorFileName(ServerMapPath(GetAppSetting("ErrorLogFolder"))))
).Named("Disk")
Notice that for the Is method of Ctor, we need to provide a Function(Of IContext, ...) and use IContext.ConstructedBy(Function() ...) to call ObjectFactory.GetNamedInstance, which successfully registers the IErrorLogger in this case.
This is the only way to do it as far as I know. The other IContext methods, such as IsThis and Instance, will only work with already-registered types.
Your problem is that you are trying to access the Container before it's configured. In order to make StructureMap evaluate the object resolution after configuration, you need to provide a lambda to the Is function. The lambda will be evaluated when the registered type is resolved.
x.[For](Of ILogger)().Add(Of SqlLogger)().Ctor(Of ILogger)("backupErrorLogger")_
.[Is](Function(context) context.GetInstance(Of ILogger)("Disk"))
x.[For](Of ILogger)().Add(Of DiskLogger)().Ctor(Of String)("errorFileName")_
.[Is](CreateErrorFileName(ServerMapPath(GetAppSetting("ErrorLogFolder"))))_
.Named("Disk")
Disclaimer: I'm not completely up-to-date with the lambda syntax in VB.NET, but I hope I got it right.
Edit:
The working C# version of this I tried myself before posting was this:
ObjectFactory.Initialize(i =>
{
    i.For<ILogger>().Add<SqlLogger>()
        .Ctor<ILogger>("backup").Is(
            c => c.GetInstance<ILogger>("disk"))
        .Named("sql");
    i.For<ILogger>().Add<DiskLogger>().Named("disk");
});
var logger = ObjectFactory.GetNamedInstance<ILogger>("sql");