How to convert this query for Spark SQL

I'm trying to convert a SQL Server query so I can run it in a notebook, but I can't figure out how to convert a "CROSS APPLY" into something that Spark can understand.
Here is my SQL Server query:
WITH Benef as (
SELECT DISTINCT
IdBeneficiaireSource
,Adress
FROM
UPExpBeneficiaryStaging
)
-------- Split Adress --------
,AdresseBenefTemp1 as (
SELECT
IdBeneficiaireSource
,REPLACE(REPLACE(Adress, char(10), '|'), char(13), '|') as AdresseV2
FROM
Benef
)
,AdresseBenefTemp2 as (
SELECT
IdBeneficiaireSource
,value as Adresse
,ROW_NUMBER() OVER(PARTITION BY IdBeneficiaireSource ORDER BY (SELECT NULL)) as LigneAdresse
FROM
AdresseBenefTemp1
CROSS APPLY string_split(AdresseV2, '|')
)
,AdresseBenefFinal as (
SELECT DISTINCT
a.IdBeneficiaireSource
,b.Adresse as Adresse_1
,c.Adresse as Adresse_2
,d.Adresse as Adresse_3
FROM
AdresseBenefTemp2 as a
LEFT JOIN AdresseBenefTemp2 as b on b.IdBeneficiaireSource = a.IdBeneficiaireSource AND b.LigneAdresse = 1
LEFT JOIN AdresseBenefTemp2 as c on c.IdBeneficiaireSource = a.IdBeneficiaireSource AND c.LigneAdresse = 2
LEFT JOIN AdresseBenefTemp2 as d on d.IdBeneficiaireSource = a.IdBeneficiaireSource AND d.LigneAdresse = 3
)
-------------------------------
SELECT
IdBeneficiaireSource
,Adresse_1
,Adresse_2
,Adresse_3
FROM
AdresseBenefFinal
(This query splits an address field into three address fields.)
When I run it in a notebook, it says that "CROSS APPLY" is not correct.
Thanks.

Correct me if I'm wrong, but CROSS APPLY string_split is basically a cross join of each row with the entries of its split result.
In Spark you can use explode for this (https://docs.databricks.com/sql/language-manual/functions/explode.html). So you should be able to add another CTE in between, where you explode the result of splitting AdresseV2 by '|' (https://docs.databricks.com/sql/language-manual/functions/split.html). Note that Spark's split takes a regular expression, so the pipe has to be escaped or written as a character class, e.g. '[|]'.
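To make the split-then-explode idea concrete without a Spark cluster, here is a minimal plain-Python model of what the query computes. It mirrors the REPLACE / split / numbering / pivot steps. The 0-based position imitates what Spark's posexplode returns, which could stand in for the ROW_NUMBER() column. The helper names and the sample addresses are made up for illustration.

```python
# Plain-Python model of the split / explode / pivot steps.
# Helper names and sample data are hypothetical, not part of the original query.

def explode_with_position(address):
    """Imitate posexplode(split(...)): return (pos, piece) pairs, pos starting at 0."""
    if address is None:
        return []  # a NULL address produces no pieces
    # Same order as the SQL: replace char(10) first, then char(13)
    normalized = address.replace("\n", "|").replace("\r", "|")
    return list(enumerate(normalized.split("|")))


def first_three_lines(address):
    """Return (Adresse_1, Adresse_2, Adresse_3), using None for missing lines."""
    by_position = dict(explode_with_position(address))
    return tuple(by_position.get(pos) for pos in range(3))


rows = [
    (1, "10 Main St\nSpringfield"),
    (2, "5 Rue A\nBat B\n75001 Paris\nFrance"),
]
result = {ben_id: first_three_lines(addr) for ben_id, addr in rows}
```

Note that a Windows line ending (CR followed by LF) becomes two pipes, so an empty piece appears between the two lines, exactly as it would with the original REPLACE calls.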

Related

AWS Athena (Presto) '+' cannot be applied to varchar, varchar error

I'm having a bit of trouble with some Presto SQL that I have written in Athena. I get the following error which I'm a bit confused about:
SYNTAX_ERROR: line 46:39: '+' cannot be applied to varchar, varchar
Here is my script:
SELECT Duplicate.AircraftTypeCode,
Duplicate.LineNumber,
Serial.SerialNumber
FROM (SELECT *
FROM (SELECT DISTINCT TypeCode AS AircraftTypeCode,
LineNumber
FROM (SELECT acl.aircraft_type AS Type,
achl.aircraft_type_code_internal AS TypeCode,
acl.aircraft_line_number AS LineNumber,
row_number()
OVER (
partition BY aahl.aircraft_id
ORDER BY aahl.aircraft_id, aahl.start_event_date
DESC,
aahl.event_sequence_number DESC) AS Rown
FROM fleets.aircraft_all_history_latest aahl
LEFT OUTER JOIN fleets.aircraft_latest acl
ON aahl.aircraft_id = acl.aircraft_id
LEFT OUTER JOIN
fleets.aircraft_configuration_history_latest achl
ON acl.aircraft_id = achl.aircraft_id)
AH
WHERE linenumber IS NOT NULL
GROUP BY TypeCode,
LineNumber)LineNumber
GROUP BY AircraftTypeCode,
LineNumber
HAVING Count(LineNumber) > 1)Duplicate
LEFT OUTER JOIN (SELECT *
FROM (SELECT achl.aircraft_type_code_internal AS
TypeCode,
acl.aircraft_serial_number AS
SerialNumber,
acl.aircraft_line_number AS
LineNumber
FROM fleets.aircraft_all_history_latest aahl
LEFT OUTER JOIN fleets.aircraft_latest acl
ON aahl.aircraft_id =
acl.aircraft_id
LEFT OUTER JOIN fleets.aircraft_configuration_history_latest achl
ON acl.aircraft_id = achl.aircraft_id) SerialNumber
WHERE LineNumber IS NOT NULL
GROUP BY TypeCode,
SerialNumber,
LineNumber) Serial
ON Serial.TypeCode + Serial.LineNumber =
Duplicate.AircraftTypeCode
+ Duplicate.LineNumber
Everything I am using is of type String. Is there something in Presto that I should be doing differently? My thinking is more along the lines of MSSQL.
I assume you want to concatenate TypeCode and LineNumber in your JOIN condition. In Presto / Athena you need to use the CONCAT function or the || operator for that.
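A minimal sketch of the || form of the join condition, run against SQLite as a stand-in because it also supports the standard || operator. This is an assumption for the sake of a runnable example, not a Presto session, and the two small tables and their rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Presto here; both accept || for string concatenation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE duplicate_lines (AircraftTypeCode TEXT, LineNumber TEXT);
    CREATE TABLE serials (TypeCode TEXT, LineNumber TEXT, SerialNumber TEXT);
    INSERT INTO duplicate_lines VALUES ('B737', '101');
    INSERT INTO serials VALUES ('B737', '101', 'SN-1'), ('B737', '102', 'SN-2');
""")

# Same shape as the original ON clause, with + replaced by ||
rows = conn.execute("""
    SELECT d.AircraftTypeCode, d.LineNumber, s.SerialNumber
    FROM duplicate_lines d
    LEFT OUTER JOIN serials s
        ON s.TypeCode || s.LineNumber = d.AircraftTypeCode || d.LineNumber
""").fetchall()
```

Comparing the two columns separately (TypeCode = AircraftTypeCode AND LineNumber = LineNumber) would avoid false matches such as 'AB' || 'C' equalling 'A' || 'BC'.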

Why Row_Number in a view gives a nullable column

I have a view using a CTE, and I want to use a row number to simulate a key for my edmx in Visual Studio.
ALTER VIEW [dbo].[ViewLstTypesArticle]
AS
WITH cte (IdTypeArticle, IdTypeArticleParent, Logo, Libelle, FullLibelle, Racine) AS
(
SELECT
f.Id AS IdTypeArticle, NULL AS IdParent,
f.Logo, f.Libelle,
CAST(f.Libelle AS varchar(MAX)) AS Expr1,
f.Id AS Racine
FROM
dbo.ArticleType AS f
LEFT OUTER JOIN
dbo.ArticleTypeParent AS p ON p.IdTypeArticle = f.Id
WHERE
(p.IdTypeArticleParent IS NULL)
AND (f.Affichable = 1)
UNION ALL
SELECT
f.Id AS IdTypeArticle, p.IdTypeArticleParent,
f.Logo, f.Libelle,
CAST(parent.Libelle + ' / ' + f.Libelle AS varchar(MAX)) AS Expr1,
parent.Racine
FROM
dbo.ArticleTypeParent AS p
INNER JOIN
cte AS parent ON p.IdTypeArticleParent = parent.IdTypeArticle
INNER JOIN
dbo.ArticleType AS f ON f.Id = p.IdTypeArticle
)
SELECT
*,
ROW_NUMBER() OVER (ORDER BY FullLibelle) AS Id
FROM
(SELECT
IdTypeArticle, IdTypeArticleParent, Logo, Libelle,
FullLibelle, Racine
FROM cte) AS CTE1
When I look at the column properties, I see Id bigint ... NULL.
My edmx excludes this view because it can't find a column that can be used as a key.
When I execute the view, Id is never null; every row has its row number.
If someone has encountered this problem and resolved it... Thanks.
SQL Server generally thinks that columns are NULL-able in views (and when using SELECT INTO).
You can convince SQL Server that this is not the case by using ISNULL():
select *,
ISNULL(ROW_NUMBER() over(ORDER BY FullLibelle), 0) as Id
from . . .
Note: This works with ISNULL() but not with COALESCE() which otherwise has very similar functionality.

How to convert list of comma separated Ids into their name?

I have a table that contains:
id task_ids
1 10,15
2 NULL
3 17
I have another table that holds the names of these tasks:
id task_name
10 a
15 b
17 c
I want to generate the following output
id task_ids task_names
1 10,15 a,b
2 null null
3 17 c
I know this structure isn't ideal, but this is a legacy table that I will not change now.
Is there an easy way to get this output?
I'm using Presto, but I think this can be solved with native SQL.
WITH data AS (
SELECT * FROM (VALUES (1, '10,15'), (2, NULL), (3, '17')) x(id, task_ids)
),
task AS (
SELECT * FROM (VALUES ('10', 'a'), ('15', 'b'), ('17', 'c')) x(id, task_name)
)
SELECT
d.id, d.task_ids,
-- array_agg also captures the NULL task_name coming from the LEFT JOIN, so we need to filter out such results
IF(array_agg(t.task_name) IS NOT DISTINCT FROM ARRAY[NULL], NULL, array_agg(t.task_name)) task_names
FROM data d
-- split task_ids by `,` and UNNEST into separate rows (ids are kept as strings here)
LEFT JOIN UNNEST (split(d.task_ids, ',')) AS e(task_id) ON true
-- LEFT JOIN with task to pull the task name
LEFT JOIN task t ON e.task_id = t.id
-- aggregate back
GROUP BY d.id, d.task_ids;
You have a horrible data model, but you can do what you want with a bit of effort. Arrays are better than strings, so I'll just use that:
select t.id, t.task_ids, array_agg(tt.task_name) as task_names
from t left join lateral
unnest(split(t.task_ids, ',')) u(task_id)
on 1=1 left join
tasks tt
on tt.id = u.task_id
group by t.id, t.task_ids;
I don't have Presto on hand to test this. But this or some minor variant should do what you want.
EDIT:
This version might work:
select t.id, t.task_ids,
(select array_agg(tt.task_name)
from unnest(split(t.task_ids, ',')) u(task_id) join
tasks tt
on tt.id = u.task_id
) as task_names
from t ;
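Since neither version of that query was tested, a small reference in plain Python can help check a query's output against what the question asks for. This is only a sketch of the expected mapping, not Presto code. The function name is hypothetical, and ids are treated as strings, as they are after split.

```python
# Reference model of the expected result (hypothetical helper, ids kept as strings).

def task_names_for(task_ids, names_by_id):
    """Map '10,15' to 'a,b'; a NULL (None) list stays None; unknown ids are skipped."""
    if task_ids is None:
        return None
    names = [names_by_id[i] for i in task_ids.split(",") if i in names_by_id]
    return ",".join(names) if names else None


names_by_id = {"10": "a", "15": "b", "17": "c"}
data = [(1, "10,15"), (2, None), (3, "17")]
output = [(row_id, ids, task_names_for(ids, names_by_id)) for row_id, ids in data]
```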

How to unpivot the table with column Names

I have two tables with similar records. I have the result as follows:
using the following query
Select
New.ParentId
,New.FatherFirstName
,New.FatherLastName
from ParentsUpdationDetails New
where New.parentId=15999
union all
select
Old.ParentId
,Old.FatherFirstName
,Old.FatherLastName
from parents Old
where Old.parentId=15999
I need to unpivot and want the following output:
You should be able to handle this using CROSS APPLY with a couple of table value constructors, combining the results with an INNER JOIN.
When you use CROSS APPLY with (VALUES (Field1), (Field2)), it acts much like UNPIVOT: you get one row for each field you list in the table value constructor (TVC).
SELECT ca.Field, ca.New, lj.Old
FROM ParentsUpdationDetails new
CROSS APPLY (
VALUES ('ParentID', CAST(ParentID AS VARCHAR)), -- All datatypes must match
('FatherFirstName', FatherFirstName),
('FatherLastName', FatherLastName)
) ca(Field, New)
INNER JOIN (
SELECT ParentID, Field, Old
FROM Parents old
CROSS APPLY (
VALUES ('ParentID', CAST(ParentID AS VARCHAR)), -- All datatypes must match
('FatherFirstName', FatherFirstName),
('FatherLastName', FatherLastName)
) ca(Field, Old)
) lj ON new.ParentID = lj.ParentID AND ca.Field = lj.Field
WHERE new.ParentID = 15999
Be aware that non-varchar columns must be converted to varchar for this to work.

trying to concatenate a column into a comma delimited list

I have 3 tables, including one for products and one for the categories the products are assigned to. What I'm trying to do is concatenate the column called stCategoryName into a single column as a comma-delimited list.
Basically, the products table contains the primary key for each product, and I'm trying to figure out how to concatenate all the stCategoryName values next to each product so I get a simplified result.
What I'm trying to get is the following:
stProductID stCategoryName
123 category1,category2,category3
SELECT
dbo.StoreItemTracking.StCategoryID,
dbo.StoreItemTracking.StProductID,
dbo.StoreItemTracking.viewOrder,
dbo.StoreCategories.StCategoryName,
dbo.Store_Products.PartNumber
FROM
dbo.StoreItemTracking
INNER JOIN dbo.StoreCategories
ON dbo.StoreItemTracking.StCategoryID = dbo.StoreCategories.StCategoryID
INNER JOIN dbo.Store_Products
ON dbo.StoreItemTracking.StProductID = dbo.Store_Products.ID
I'm stuck on how to concatenate a column when the query selects from 3 tables.
Any help is greatly appreciated.
Look at using COALESCE to turn the categories into a CSV. For example:
DECLARE @EmployeeList varchar(100)
SELECT @EmployeeList = COALESCE(@EmployeeList + ', ', '')
+ CAST(Emp_UniqueID AS varchar(5))
FROM SalesCallsEmployees
WHERE SalCal_UniqueID = 1
SELECT @EmployeeList
You can also use CTEs or subqueries. See:
http://archive.msdn.microsoft.com/SQLExamples/Wiki/View.aspx?title=createacommadelimitedlist
Another nice and easy example:
http://www.codeproject.com/Articles/21082/Concatenate-Field-Values-in-One-String-Using-CTE-i
This:
FId FName
--- ----
2 A
4 B
5 C
6 D
8 E
with:
;WITH ABC (FId, FName) AS
(
SELECT 1, CAST('' AS VARCHAR(8000))
UNION ALL
SELECT B.FId + 1, B.FName + A.FName + ', '
FROM (
SELECT Row_Number() OVER (ORDER BY FId) AS RN, FName FROM tblTest) A
INNER JOIN ABC B ON A.RN = B.FId
)
SELECT TOP 1 FName FROM ABC ORDER BY FId DESC
becomes:
FName
----------------------------
A, B, C, D, E,
I don't understand how your products and categories are connected, but in general this is how I create comma-separated lists:
SELECT table1.Id
,Csv
FROM table1
CROSS APPLY (
-- Double select so we can have an alias for the csv column
SELECT (SELECT ',' + table2.Name
FROM table2
WHERE table2.Id = table1.Id
FOR XML PATH('')
) AS RawCsv
) AS CA1
CROSS APPLY (
-- Trim the first comma
SELECT RIGHT(RawCsv, LEN(RawCsv) - 1) AS Csv
) AS CA2
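On SQL Server 2017 and later, STRING_AGG(expression, ',') with a GROUP BY should give the same kind of list without the FOR XML PATH workaround. The sketch below shows the grouping idea using SQLite's group_concat as a stand-in, since it can be run anywhere. That substitution is an assumption for illustration, and the single product_categories table is a hypothetical flattened version of the three-table join from the question.

```python
import sqlite3

# SQLite's group_concat stands in for SQL Server's STRING_AGG (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_categories (StProductID INTEGER, StCategoryName TEXT);
    INSERT INTO product_categories VALUES
        (123, 'category1'), (123, 'category2'), (123, 'category3'),
        (456, 'category1');
""")

# One row per product, with its category names joined by commas
rows = conn.execute("""
    SELECT StProductID, group_concat(StCategoryName, ',') AS StCategoryName
    FROM product_categories
    GROUP BY StProductID
    ORDER BY StProductID
""").fetchall()
csv_by_product = {product_id: csv for product_id, csv in rows}
```

SQLite does not guarantee the order of the concatenated values, so compare them as a set or sort them when checking the result.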