I'm trying to convert a SQL Server query so that I can run it in a notebook, but I can't figure out how to convert a "CROSS APPLY" into something that Spark can understand.
Here is my SQL Server query:
WITH Benef as (
SELECT DISTINCT
IdBeneficiaireSource
,Adress
FROM
UPExpBeneficiaryStaging
)
-------- Split Adress --------
,AdresseBenefTemp1 as (
SELECT
IdBeneficiaireSource
,REPLACE(REPLACE(Adress, char(10), '|'), char(13), '|') as AdresseV2
FROM
Benef
)
,AdresseBenefTemp2 as (
SELECT
IdBeneficiaireSource
,value as Adresse
,ROW_NUMBER() OVER(PARTITION BY IdBeneficiaireSource ORDER BY (SELECT NULL)) as LigneAdresse
FROM
AdresseBenefTemp1
CROSS APPLY string_split(AdresseV2, '|')
)
,AdresseBenefFinal as (
SELECT DISTINCT
a.IdBeneficiaireSource
,b.Adresse as Adresse_1
,c.Adresse as Adresse_2
,d.Adresse as Adresse_3
FROM
AdresseBenefTemp2 as a
LEFT JOIN AdresseBenefTemp2 as b on b.IdBeneficiaireSource = a.IdBeneficiaireSource AND b.LigneAdresse = 1
LEFT JOIN AdresseBenefTemp2 as c on c.IdBeneficiaireSource = a.IdBeneficiaireSource AND c.LigneAdresse = 2
LEFT JOIN AdresseBenefTemp2 as d on d.IdBeneficiaireSource = a.IdBeneficiaireSource AND d.LigneAdresse = 3
)
-------------------------------
SELECT
IdBeneficiaireSource
,Adresse_1
,Adresse_2
,Adresse_3
FROM
AdresseBenefFinal
(This query splits an address field into three address fields.)
When I run it in a notebook, it says that "CROSS APPLY" is not correct.
Thanks.
Correct me if I'm wrong, but the CROSS APPLY string_split is basically a cross join of each row with the entries that result from splitting it.
In Spark you can use explode for this (https://docs.databricks.com/sql/language-manual/functions/explode.html). So you should be able to add another CTE in between, where you explode the result of split (https://docs.databricks.com/sql/language-manual/functions/split.html) applied to AdresseV2 with '|' as the delimiter.
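A minimal sketch of that idea in Spark SQL, reusing the table and column names from the question. It uses posexplode rather than explode so that the position of each piece is available directly as the line number; that is a choice made for this sketch, not something taken from the question. Note that split takes a regular expression in Spark, so the pipe is written as '[|]'.

```sql
WITH AdresseBenefTemp1 AS (
  SELECT DISTINCT
    IdBeneficiaireSource,
    -- char(10) is a line feed, char(13) a carriage return
    replace(replace(Adress, char(10), '|'), char(13), '|') AS AdresseV2
  FROM UPExpBeneficiaryStaging
),
AdresseBenefTemp2 AS (
  SELECT
    IdBeneficiaireSource,
    Adresse,
    pos + 1 AS LigneAdresse  -- posexplode positions start at 0
  FROM AdresseBenefTemp1
  -- one output row per element of the split array
  LATERAL VIEW posexplode(split(AdresseV2, '[|]')) parts AS pos, Adresse
)
SELECT IdBeneficiaireSource, Adresse, LigneAdresse
FROM AdresseBenefTemp2
```

The remaining CTEs from the question (the three LEFT JOINs on LigneAdresse) should work unchanged on top of this AdresseBenefTemp2.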
Is it possible to transform the following view:
into this structure?
I've tried a cross join, but I have no idea how to write a condition based on the column name.
You need APPLY instead of JOIN to be able to access columns of the outer table:
SELECT t.Date, v.[1], v.[2], v.number
FROM Table t
CROSS APPLY (VALUES
(t.[1], CAST(NULL AS int), 1),
(NULL, t.[2], 2)
) v ([1], [2], number)
SQL Query :
SELECT id,mti,[24] AS nii,mid,tid,[63-D9] AS TxnType,[63-DB] AS batchStatus ,
[39] AS respondCode,[61] AS batchNumber, uploadStatus
FROM (SELECT e.id,m.capDateTime,mti,procCode,mid,tid,uploadStatus,txnDate,txnTime,fieldNumber,fieldData FROM dbo.iso_fields e
JOIN dbo.iso_main m ON e.id = m.id) a PIVOT (MAX(fieldData) FOR fieldNumber IN
([0],[1],[2],[3],[5],[6],[7],[8],[9],[10],[12],[13],[14],[15],[16],[17],[18],[19],[20], [21],[22],[23],[24],[25],[26],[27],[28],[29],[30],
[31],[32],[33],[34],[35],[36],[39],[40], [41],[42],[43],[44],[45],[46],[47],[48],[49],[50],[51],[52],[53],[54],[55],[56],[57],[58],[59],[60],
[61],[63],[64],[63-D9],[63-DB]))PIV
I'm trying to update the table, for example:
UPDATE PIV SET batchStatus = 'C'
SELECT id,mti,[24] AS nii,mid,tid,[63-D9] AS TxnType,[63-DB] AS batchStatus ,
[39] AS respondCode,[61] AS batchNumber, uploadStatus
FROM (SELECT e.id,m.capDateTime,mti,procCode,mid,tid,uploadStatus,txnDate,txnTime,fieldNumber,fieldData FROM dbo.iso_fields e
JOIN dbo.iso_main m ON e.id = m.id) a PIVOT (MAX(fieldData) FOR fieldNumber IN
([0],[1],[2],[3],[5],[6],[7],[8],[9],[10],[12],[13],[14],[15],[16],[17],[18],[19],[20], [21],[22],[23],[24],[25],[26],[27],[28],[29],[30],
[31],[32],[33],[34],[35],[36],[39],[40], [41],[42],[43],[44],[45],[46],[47],[48],[49],[50],[51],[52],[53],[54],[55],[56],[57],[58],[59],[60],
[61],[63],[64],[63-D9],[63-DB]))PIV
WHERE nii = '0000'
I'm pretty sure that is not the correct syntax, and I'm terrible with pivot tables.
After reviewing your query, it seems you are trying to update the PIVOT result "PIV", which is not possible. A PIVOT produces an aggregated result set, not a table, so you can't update it.
If I understand your problem correctly, you want the pivoted data to reflect the new value. If that assumption is correct, you need to update the underlying tables that feed the PIVOT, or use a temporary table to hold the updated data.
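A sketch of what updating the underlying table could look like. It assumes that fieldNumber and fieldData live in dbo.iso_fields, that fieldNumber is stored as a string, and that the goal is to set field 63-DB (aliased batchStatus) to 'C' for every id whose field 24 (aliased nii) is '0000'. These assumptions are read off the aliases in the question, which does not say which table holds which column, so adjust as needed.

```sql
-- Update the source row that the PIVOT exposes as batchStatus
UPDATE bs
SET    bs.fieldData = 'C'
FROM   dbo.iso_fields AS bs
WHERE  bs.fieldNumber = '63-DB'
  AND  EXISTS (
         -- only for ids whose field 24 (nii) is '0000'
         SELECT 1
         FROM   dbo.iso_fields AS nii
         WHERE  nii.id = bs.id
           AND  nii.fieldNumber = '24'
           AND  nii.fieldData = '0000'
       );
```

This only changes rows that already exist: an id without a 63-DB row is left untouched. Re-running the PIVOT query afterwards would show the new value.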
I have two tables with similar records. I have the result as follows:
using the following query
Select
New.ParentId
,New.FatherFirstName
,New.FatherLastName
from ParentsUpdationDetails New
where New.parentId=15999
union all
select
Old.ParentId
,Old.FatherFirstName
,Old.FatherLastName
from parents Old
where Old.parentId=15999
I need to unpivot and want the following output:
You should be able to handle this using CROSS APPLY with a couple of table value constructors (TVCs), combining them with an INNER JOIN.
When you use CROSS APPLY with (VALUES (Field1), (Field2)), it acts much like UNPIVOT in that you get a row for each field you list in the TVC:
SELECT ca.Field, ca.New, lj.Old
FROM ParentsUpdationDetails new
CROSS APPLY (
VALUES ('ParentID', CAST(ParentID AS VARCHAR)), -- All datatypes must match
('FatherFirstName', FatherFirstName),
('FatherLastName', FatherLastName)
) ca(Field, New)
INNER JOIN (
SELECT ParentID, Field, Old
FROM Parents old
CROSS APPLY (
VALUES ('ParentID', CAST(ParentID AS VARCHAR)), -- All datatypes must match
('FatherFirstName', FatherFirstName),
('FatherLastName', FatherLastName)
) ca(Field, Old)
) lj ON new.ParentID = lj.ParentID AND ca.Field = lj.Field
WHERE new.ParentID = 15999
Be aware that you will be converting non-varchar data types to varchar in order for this to work.
A quick background so that my problem makes sense: The system collects data from the user in the form of questionnaires. Users belong to Organisations, Organisations belong to Sectors, and Questions/Calculations (as found on the questionnaires) differ across the Sectors. (Questions are answered by users; Calculations are calculated by the system.)
The following tables exist:
Sectors (SectorID, Name)
Organisations (OrganisationID, Name, SectorID)
Years (YearID, Name)
Questions (QuestionID, DisplayText, CommonName, SectorID)
Answers (AnswerID, Answer, OrganisationID, YearID, QuestionID)
Calculations (CalculationID, DisplayText, CommonName, SectorID)
CalculationResults (CalculationResultID, Result, OrganisationID, YearID, CalculationID)
I need to display data in the following way:
The thing that makes this particularly complex (for me) is that questions are displayed (to the user) in different ways across the different sectors that they belong to, but some of them can still be common questions. E.g. "Manufacturing sales" would be the same thing as "Sales (manufacturing)". I need to be using the CommonName field to determine commonality.
I've managed to use SQL Pivot to get close to what I want - SQL Fiddle (if you run the SQL you'll notice the nulls and the "commonality" issue). However some things are missing from my attempt:
Commonality and column names - I need the column names to be the CommonName field, not the QuestionID field.
I've only selected from the Answers table - I need to also select from the CalculationResults table which is identically structured.
Edit: Desired result with the SQL Fiddle data is:
(The two blocks with the orange corners need to shift all the way to the left, so that there are a total of 3 columns for the Questions - the 3 unique CommonName values. The next 3 columns are for the 3 unique CommonName values for Calculations. I hope I've made sense, if not let me know.)
Edit2: Another edit just for fun. I've definitely thought about redesigning the db but it's not an option at this stage - too risky on this legacy system. In case anyone saw the design and thought that. I need a solution in the form of Pivot hopefully.
Sometimes, instead of PIVOT, you can use an aggregate over a CASE expression, for example MAX(CASE ...), to get the same data. And sometimes it's faster.
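As a minimal sketch of that pattern against the tables in the question: each output column is an aggregate over a CASE expression that returns a value only for the matching rows. The CommonName values 'Sales' and 'Staff' are hypothetical placeholders, since the real values are only in the linked fiddle.

```sql
SELECT
    o.Name AS Organisation,
    y.Name AS [Year],
    -- one MAX(CASE ...) per output column; non-matching rows yield NULL
    MAX(CASE WHEN q.CommonName = 'Sales' THEN a.Answer END) AS [Sales],
    MAX(CASE WHEN q.CommonName = 'Staff' THEN a.Answer END) AS [Staff]
FROM Answers a
JOIN Questions     q ON q.QuestionID     = a.QuestionID
JOIN Organisations o ON o.OrganisationID = a.OrganisationID
JOIN Years         y ON y.YearID         = a.YearID
GROUP BY o.Name, y.Name;
```

Because the CASE tests CommonName rather than QuestionID, questions from different sectors that share a CommonName end up in the same column.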
For your problem you can use OUTER APPLY with a dynamically built list of MAX(CASE) expressions:
DECLARE @Questions NVARCHAR(MAX),
        @Calculations NVARCHAR(MAX),
        @Sql NVARCHAR(MAX)
-- Build one MAX(CASE ...) column per distinct question CommonName
SELECT @Questions = COALESCE(@Questions + ', ', '')
    + 'MAX(CASE WHEN q.CommonName = ''' + CommonName + ''' THEN a.Answer END) AS ' + QUOTENAME(CommonName)
FROM Questions
GROUP BY CommonName
-- Build one MAX(CASE ...) column per distinct calculation CommonName
SELECT @Calculations = COALESCE(@Calculations + ', ', '')
    + 'MAX(CASE WHEN c.CommonName = ''' + CommonName + ''' THEN cr.Result END) AS ' + QUOTENAME(CommonName)
FROM Calculations
GROUP BY CommonName
SET @Sql = N'
SELECT
    o.Name As [Organisation],
    y.Name As [Year],
    q.*,
    c.*
FROM
    Organisations o
    CROSS JOIN Years y
    OUTER APPLY (
        SELECT ' + @Questions + '
        FROM Answers a
        JOIN Questions q ON a.QuestionID = q.QuestionID
        WHERE a.OrganisationID = o.OrganisationID
          AND a.YearID = y.YearID
    ) q
    OUTER APPLY (
        SELECT ' + @Calculations + '
        FROM CalculationResults cr
        JOIN Calculations c ON cr.CalculationID = c.CalculationID
        WHERE cr.OrganisationID = o.OrganisationID
          AND cr.YearID = y.YearID
    ) c
'
-- Run the generated statement
EXECUTE sp_executesql @Sql
Basically, we want to number the QuestionIDs within each group of SectorID and (year) Name.
You can do this using PARTITION BY with something like this:
ROW_NUMBER() OVER(PARTITION BY q.SectorID, y.Name ORDER BY a.QuestionID)
This should do it:
DECLARE @cols AS NVARCHAR(MAX)
      , @query AS NVARCHAR(MAX);
-- Build the column list [1],[2],... from the row numbers
SELECT @cols = STUFF(
    (SELECT DISTINCT
        ','+QUOTENAME(CAST(ROW_NUMBER() OVER(PARTITION BY q.SectorID
                                           , y.Name ORDER BY a.QuestionID) AS VARCHAR(10)))
     FROM Answers a
     LEFT JOIN Years y ON a.YearID = y.YearID
     LEFT JOIN Organisations o ON a.OrganisationID = o.OrganisationID
     LEFT JOIN Questions q ON a.QuestionID = q.QuestionID
     FOR XML PATH(''), TYPE).value
    ('.', 'NVARCHAR(MAX)'), 1, 1, '');
SET @query = '
SELECT Organisation, Year, '+@cols+' from
(
    SELECT QuestionID = ROW_NUMBER() OVER(PARTITION BY q.SectorID
                                        , y.Name ORDER BY a.QuestionID)
         , o.Name AS Organisation
         , y.Name AS Year
         , a.Answer
    FROM Answers a
    LEFT JOIN Years y ON a.YearID = y.YearID
    LEFT JOIN Organisations o ON a.OrganisationID = o.OrganisationID
    LEFT JOIN Questions q ON a.QuestionID = q.QuestionID
) src
pivot
(
    max(Answer)
    for QuestionID in ('+@cols+')
) piv
order by Organisation, Year
';
PRINT(@query);
EXECUTE (@query);
RESULT: