Spark 2.3.1 array_join and array_remove - sql

I wrote a PySpark script that executes a SQL file. It worked perfectly on the latest Spark version, but the target machine runs 2.3.1, where it throws this exception:
pyspark.sql.utils.AnalysisException: u"Undefined function: 'array_remove'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
It seems these functions are not present in older versions :( Can anyone suggest something? I have searched a lot, but in vain.
The part of my SQL that fails is:
SELECT NIEDC.*, array_join(array_remove(split(groupedNIEDC.appearedIn,'-'), StudyCode),'-') AS subjects_assigned_to_other_studies

The array_remove and array_join functions were added in Spark 2.4. On 2.3.1 you can write a UDF and register it so that it can be called from the query.
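A minimal sketch of such a UDF, assuming the column values are plain strings and the separator is a literal character (as with '-' in the question). The names remove_and_join and register_remove_and_join are made up for illustration; only spark.udf.register and StringType come from PySpark.

```python
def remove_and_join(value, to_remove, sep="-"):
    """Roughly emulate array_join(array_remove(split(value, sep), to_remove), sep).

    Assumption: sep is a literal string, not a regular expression.
    """
    if value is None or to_remove is None:
        return None
    return sep.join(part for part in value.split(sep) if part != to_remove)


def register_remove_and_join(spark):
    """Register the helper as a SQL function on an existing SparkSession."""
    # Imported lazily so the pure-Python helper can be tested without Spark.
    from pyspark.sql.types import StringType
    spark.udf.register("remove_and_join", remove_and_join, StringType())


print(remove_and_join("S1-S2-S3", "S2"))  # S1-S3
```

After registration, the failing expression could be written as remove_and_join(groupedNIEDC.appearedIn, StudyCode, '-') in the SQL file.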

Related

How can we use the multimap_agg function in Spark SQL, and is there an equivalent or alternative function?

Can anyone help with how the multimap_agg function from SQL can be used in Spark SQL?
The multimap_agg function doesn't exist in Spark SQL, at least as of version 3.2.1.
Reference:
https://spark.apache.org/docs/latest/sql-ref-functions.html
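As a rough illustration of one possible alternative: multimap_agg (as found in Presto/Trino) maps each key to the list of all values seen for it. The sketch below models that behaviour in plain Python and includes a Spark SQL query that should give a similar result with a two-level aggregation. The table events and its columns k and v are hypothetical, map_from_entries needs Spark 2.4 or later, and the order of values within each list is not guaranteed in Spark.

```python
from collections import defaultdict


def multimap_agg(pairs):
    """Pure-Python model of multimap_agg: key -> list of all values for that key."""
    result = defaultdict(list)
    for key, value in pairs:
        result[key].append(value)
    return dict(result)


# Hypothetical Spark SQL counterpart for a table events(k, v).
# Inner query: one row per key with its collected values.
# Outer query: fold those rows into a single map column.
SPARK_SQL_EQUIVALENT = """
SELECT map_from_entries(collect_list(struct(k, vs))) AS multimap
FROM (
    SELECT k, collect_list(v) AS vs
    FROM events
    GROUP BY k
) grouped
"""

print(multimap_agg([("a", 1), ("b", 2), ("a", 3)]))  # {'a': [1, 3], 'b': [2]}
```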

How to make Django Window expressions work with SQLite?

I am testing Django ORM 'Window' SQL-wrapper capabilities.
I have the following query in my code:
queryset = TripInterval.objects.annotate(
num=Window(RowNumber(), order_by=F('id').asc())
).values('id', 'num')
which results in the following SQL query string (from debugger):
SELECT "request_tripinterval"."id",
ROW_NUMBER() OVER (ORDER BY "request_tripinterval"."id" ASC) AS "num"
FROM "request_tripinterval"
and is pretty straightforward. It WORKS when I copy/paste it into a third-party DB client. But Django ORM gives me an error:
OperationalError
near "(": syntax error
What is wrong here?
System: Windows 10
RDBMS: SQLite
Django: 2.2.4
Python: 3.6.0
Sounds like your Python is using an outdated version of SQLite.
SQLite added support for window functions in version 3.25, released in August 2018. Prior to that version, the exact same syntax error you're seeing would be thrown when trying to use window functions.
You can check the SQLite version used by Python by running this in the interpreter:
import sqlite3
sqlite3.sqlite_version
If the version that is output is older than 3.25, you'll need to upgrade your SQLite library version.
On a Windows system, the easiest way to do that is by installing the sqlite package from Anaconda. Otherwise, the general approach is to upgrade your installed system SQLite libraries, then recompile/reinstall Python. Alternatively, you could try installing the pysqlite package from PyPI.
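The version check described above can also be done programmatically. This is a small sketch using the standard library's sqlite3.sqlite_version_info; the function name supports_window_functions is made up for illustration.

```python
import sqlite3

# Window functions were added to SQLite in version 3.25.
MIN_WINDOW_VERSION = (3, 25, 0)


def supports_window_functions(version_info=None):
    """Return True if the given (or the linked) SQLite version has window functions."""
    if version_info is None:
        # Version of the SQLite library that this Python build is linked against.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= MIN_WINDOW_VERSION


print(sqlite3.sqlite_version, supports_window_functions())
```

If this prints False, Window expressions in the Django ORM will fail with the syntax error shown in the question until the SQLite library is upgraded.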

Error SQL71501 on exporting Azure SQL Database

I'm getting a strange error when exporting an Azure SQL Database. Exports had been working fine until some recent schema changes, but it's now giving me Error SQL71501.
The database is V12, Compatibility Level 130 (although the master database is still Compatibility Level 120).
The problem seems to be caused by a new table-valued function, which uses the built in STRING_SPLIT function. There were already stored procedures using STRING_SPLIT and they don't seem to have affected the export, but the function (which compiles OK, and is working fine) seems to cause a problem with the export.
The function below is a simplified version of the real one, but causes the same problem.
CREATE FUNCTION [dbo].[TestFunction](
@CommaSeparatedValues VARCHAR(MAX)
)
RETURNS TABLE
AS
RETURN
SELECT c.ClientId,
c.FullName
FROM dbo.Client c
INNER JOIN STRING_SPLIT(@CommaSeparatedValues, ',') csv
ON c.ClientId = csv.value
The complete error message given in the Import/Export history blade is as follows:
Error encountered during the service operation.
One or more unsupported elements were found in the schema used as part of a data package.
Error SQL71501: Error validating element [dbo].[TestFunction]: Function: [dbo].[TestFunction] has an unresolved reference to object [dbo].[STRING_SPLIT].
Error SQL71501: Error validating element [dbo].[TestFunction]: Function: [dbo].[TestFunction] contains an unresolved reference to an object. Either the object does not exist or the reference is ambiguous because it could refer to any of the following objects: [dbo].[Client].[csv], [dbo].[STRING_SPLIT].[csv] or [dbo].[STRING_SPLIT].[value].
This is Xiaochen from the Microsoft SQL team. We are already working on a fix for this issue, which will be deployed to the export service in the next few weeks. In the meantime, the fix is already available in the latest DacFX 16.4 (https://blogs.msdn.microsoft.com/ssdt/2016/09/20/sql-server-data-tools-16-4-release/). Until the service is fixed, you can download DacFX 16.4 and use sqlpackage as a workaround.
SQL Azure validates the schema and the object references when you export a database. If any of the references fails, like the one below in your case,
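A sketch of what the sqlpackage workaround might look like when driven from a script. It only builds the command line. The helper name and all argument values are hypothetical placeholders, and it assumes sqlpackage is on the PATH.

```python
def build_export_command(server, database, user, password, target_file,
                         sqlpackage="sqlpackage"):
    """Build a sqlpackage Export command line as an argument list.

    All values are supplied by the caller; nothing is executed here.
    """
    return [
        sqlpackage,
        "/Action:Export",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/SourceUser:{user}",
        f"/SourcePassword:{password}",
        f"/TargetFile:{target_file}",
    ]


# Placeholder values for illustration only.
cmd = build_export_command(
    "myserver.database.windows.net", "MyDb", "admin_user", "secret", "MyDb.bacpac"
)
print(" ".join(cmd))
```

The resulting list could then be passed to subprocess.run(cmd, check=True) on a machine where DacFX 16.4 or later is installed.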
Error SQL71501: Error validating element [dbo].[TestFunction]: Function: [dbo].[TestFunction] has an unresolved reference to object [dbo].[STRING_SPLIT].
the export won't succeed. So you will need to resolve those errors prior to export.
From the docs, you will need to set the compatibility level to 130:
The STRING_SPLIT function is available only under compatibility level 130. If your database compatibility level is lower than 130, SQL Server will not be able to find and execute STRING_SPLIT function
Update:
I was able to reproduce the same issue. The only current workaround is to delete the table-valued function that references the system function, export the DACPAC, and recreate the table-valued function once the export is done :(
I have raised the issue here; please upvote:
https://feedback.azure.com/forums/217321-sql-database/suggestions/16722646-azure-database-export-fails-when-split-string-is-i
Data Migration Assistant did it for me. First run an assessment on the schema; on success, run your migration: https://www.microsoft.com/en-us/download/details.aspx?id=53595/

Hive on Spark execution engine failed

I am trying the Hive on Spark execution engine. I am using Hadoop 2.6.0, Hive 1.2.1, and Spark 1.6.0. Hive runs successfully with the MapReduce engine, and now I am trying the Spark engine. Individually, all components work properly. In Hive I set the following properties:
set hive.execution.engine=spark;
set spark.master=spark://INBBRDSSVM294:7077;
set spark.executor.memory=2g;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
I added the spark-assembly jar to the Hive lib directory
and then ran this command:
select count(*) from sample;
This is the output I get:
Starting Spark Job = b1410161-a414-41a9-a45a-cb7109028fff
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Am I missing any other required settings? Please guide me.
I think the problem may be that you are using incompatible versions. If you check the version compatibility table in Hive on Spark: Getting Started, you'll see that these two specific versions are not guaranteed to work together.
I advise you to switch to the version combination recommended there. I had the same problem and solved it by changing to compatible versions.

Timestamp out of range - PostgreSQL with OLEDB .NET

We have a .NET application that stores data in PostgreSQL. Our code works perfectly with PostgreSQL 8.3, but now that we are trying to upgrade our PostgreSQL server from version 8.3 to 9.3, the code breaks.
We connect to PostgreSQL using OLEDB.
The error we get is “Timestamp out of range”. Looking through the logs, we see that the server receives a weird timestamp: “152085-04-28 06:14:51.818821”.
Our application passes a value from .NET code to a PostgreSQL function parameter of type timestamp. Since we connect through OLEDB, we use OleDbType.DBTimeStamp as the parameter type and send a DateTime value from the .NET code. This works in PostgreSQL 8.3 but breaks in 9.3. According to the PostgreSQL 9.3 logs, the parameter value received is “152085-04-28 06:14:51.818821”.
We also tried executing the same function using the Npgsql provider from sample .NET code, passing a DateTime value with parameter type NpgsqlDbType.TimestampTZ, and with this we get correct results. According to the PostgreSQL logs, the parameter value received by the function is “E'2014-01-30 12:17:50.804220'::timestamptz”.
We tried other PostgreSQL versions as well (9.1, 9.2, 9.3), and it breaks in all of them.
Any idea why this breaks in other versions of PostgreSQL when it works perfectly in 8.3?
Thanks