I'm trying to convert a piece of SQL code to HiveQL, and it's not working as expected.
Please find below the code snippet in SQL that I'm attempting to convert:
SQL Code:
UPDATE
C
SET
C.prod_l = P.prod_l, C.numprod = P.numprod, C.prod_cng = P.prod_cng
FROM
[cnc].dbo.[c_cnc_analysis] C
LEFT JOIN
(
SELECT
X.*,
Len(prod_l) - Len(Replace(prod_l, ' ~ ', ' ')) + 1 AS NumProd,
CASE
WHEN
Len(prod_l) - Len(Replace(prod_l, ' ~ ', ' ')) + 1 = 1
THEN
0
ELSE
1
END
AS PROD_CNG
FROM
(
SELECT DISTINCT
ST2.uitid,
Substring((
SELECT
' ~ ' + ST1.product_id AS [text()]
FROM
(
SELECT
[uitid],
[product_id]
FROM
dbo.[c_cnc_dedup_bse]
GROUP BY
[uitid],
[product_id]
)
ST1
WHERE
ST1.uitid = ST2.uitid
ORDER BY
ST1.uitid FOR xml path ('')), 4, 1000 ) [PROD_L]
FROM
(
SELECT
[uitid],
[product_id]
FROM
dbo.[c_cnc_dedup_bse]
GROUP BY
[uitid],
[product_id]
)
ST2
)
X
)
P
ON C.uitid = P.uitid;
Converted HIVE Query:
create
or replace view prd_temp as
SELECT
`UITID`,
`PRODUCT_ID`
FROM
`C_CNC_DEDUP_BSE`
GROUP BY
`UITID`,
`PRODUCT_ID`;
create
or replace view prd_temp2 as
SELECT
`UITID`,
`PRODUCT_ID`
FROM
`C_CNC_DEDUP_BSE`
GROUP BY
`UITID`,
`PRODUCT_ID`;
create
or replace view prd_temp3 as
SELECT
st1.`uitid`,
concat(' ~ ', st1.`PRODUCT_ID`) AS `text()`
FROM
prd_temp ST1
left join
prd_temp2 st2
on ST1.`UITID` = ST2.`UITID`
where
st1.`UITID` = st2.`UITID`
ORDER BY
ST1.`UITID`;
create
or replace view prd_temp4 as
SELECT
st1.`uitid`,
concat_ws('''', `text()`)
FROM
prd_temp3 ST1
ORDER BY
ST1.`UITID`;
create
or replace view st2 as
SELECT DISTINCT
`UITID`,
SUBSTRING(`_c1` , 4, 1000) as `PROD_L`
FROM
prd_temp4;
create
or replace view x as
SELECT
*,
LENGTH(PROD_L) - LENGTH(REPLACE(PROD_L, ' ~ ', ' ')) + 1 as NumProd,
CASE
WHEN
LENGTH(PROD_L) - LENGTH(REPLACE(PROD_L, ' ~ ', ' ')) + 1 = 1
then
0
ELSE
1
END
as PROD_CNG
from
ST2;
create table C_CNC_ANALYSIS1 as
select
c.*,
P.numprod as numprod,
p.prod_cng as prod_cng,
p.prod_l as prod_l
from
`C_CNC_ANALYSIS` C
LEFT JOIN
X P
ON C.UITID = P.UITID ;
SELECT
*
from
c_cnc_analysis1 limit 100;
I'd appreciate any help with this. I think my conversion of the FOR XML PATH part is what is not working in Hive: I get multiple rows per UITID (the key), with the information spread across separate rows, rather than a single record per UITID.
Thank You,
Viswanath Sitaraman
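A possible direction, sketched here rather than tested against the tables above: Hive has no FOR XML PATH, and the usual way to fold many rows into one string per key is GROUP BY with collect_set and concat_ws. The sketch assumes product_id is a STRING column (otherwise wrap it in CAST(product_id AS STRING)) and reuses the view name x so that the final CREATE TABLE with the LEFT JOIN can stay unchanged; it would stand in for the views prd_temp through st2.

```sql
-- Sketch only: one row per uitid, product IDs joined with ' ~ '.
-- Assumption: product_id is STRING; cast it first if it is numeric.
CREATE OR REPLACE VIEW x AS
SELECT
  uitid,
  -- collect_set gathers the distinct product IDs of the group into an array;
  -- sort_array makes the order repeatable; concat_ws joins the array elements
  concat_ws(' ~ ', sort_array(collect_set(product_id))) AS prod_l,
  -- number of distinct products for this uitid
  size(collect_set(product_id)) AS numprod,
  -- 0 when there is exactly one product, 1 otherwise
  CASE WHEN size(collect_set(product_id)) = 1 THEN 0 ELSE 1 END AS prod_cng
FROM c_cnc_dedup_bse
GROUP BY uitid;
```

collect_set drops duplicate product IDs, so the inner GROUP BY uitid, product_id is no longer needed. Note that size() returns the actual number of products, whereas the LENGTH/REPLACE arithmetic grows by 2 for every ' ~ ' separator (two products give 3); prod_cng comes out the same either way.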
I would like to transform this string:
A1+A2+A3.B1+B2.C1
into
A1.B1.C1
A1.B2.C1
A2.B1.C1
A2.B2.C1
A3.B1.C1
A3.B2.C1
How can I do that? Note that each dimension (a group delimited by .) can have any number of values, for example A1+A2.B1.C1 or A1+A2.B1+B2+B3+B4+B5.C1+C2.
Thanks
If you have only 3 groups, just use STRING_SPLIT: number the groups produced by the first split, then cross join the result to itself 3 times and pick one group in each copy.
with a as (
select s2.value as v, dense_rank() over(order by s1.value) as rn
from STRING_SPLIT('A1+A2+A3.B1+B2.C1', '.') as s1
cross apply STRING_SPLIT(s1.value, '+') as s2
)
select
a1.v + '.' + a2.v + '.' + a3.v as val
from a as a1
cross join a as a2
cross join a as a3
where a1.rn = 1
and a2.rn = 2
and a3.rn = 3
| val |
----------
|A1.B1.C1|
|A2.B1.C1|
|A3.B1.C1|
|A1.B2.C1|
|A2.B2.C1|
|A3.B2.C1|
If you have an arbitrary number of groups, it's better to use a recursive CTE than dynamic SQL. What you should do:
Start with all the values from the first group.
On each recursion step, cross join all the values of the next group (i.e. the step's group number is the current group number + 1).
Select the last recursion step, which holds the result.
Code is below:
with a as (
select s2.value as v, dense_rank() over(order by s1.value) as rn
from STRING_SPLIT('A1+A2+A3.B1+B2+B3+B4.C1+C2.D1+D2+D3', '.') as s1
cross apply STRING_SPLIT(s1.value, '+') as s2
)
, b (val, lvl) as (
/*Recursion base*/
select cast(v as nvarchar(1000)) as val, rn as lvl
from a
where rn = 1
union all
/*Increase concatenation on each iteration*/
select cast(concat(b.val, '.', a.v) as nvarchar(1000)) as val, b.lvl + 1 as lvl
from b
join a
on b.lvl + 1 = a.rn /*Recursion step*/
)
select *
from b
where lvl = (select max(rn) from a) /*You need the last step*/
order by val
I won't include the tabular result since it is quite large, but try it yourself.
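One caveat for both queries above: dense_rank() over(order by s1.value) numbers the groups by the alphabetical order of their text, not by their position in the string, so an input such as 'B+C+D.A+G' would come out with the groups swapped. A possible sketch, assuming SQL Server 2022 or Azure SQL Database (where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column), numbers the groups by position instead:

```sql
with a as (
    /* ordinal is the 1-based position of each group in the input string */
    select s2.value as v, s1.ordinal as rn
    from STRING_SPLIT('B+C+D.A+G.T+Y+R.E', '.', 1) as s1
    cross apply STRING_SPLIT(s1.value, '+') as s2
)
select v, rn
from a
order by rn
```

The rest of the recursive query should be able to stay as it is, since it only relies on rn running from 1 to the number of groups.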
Here is a SQL Server version:
with lst(s) as (select * from STRING_SPLIT('A1+A2.B1+B2+B3+B4+B5.C1+C2','.'))
select t1+'.'+t2+'.'+t3 as res from
(select * from STRING_SPLIT((select s from lst where s like 'A%'), '+')) s1(t1) cross join
(select * from STRING_SPLIT((select s from lst where s like 'B%'), '+')) s2(t2) cross join
(select * from STRING_SPLIT((select s from lst where s like 'C%'), '+')) s3(t3);
Of course, you can extend it in the same pattern as the number of dimensions grows.
Here is a PostgreSQL solution:
with x(s) as (select string_to_array('A1+A2.B1+B2+B3+B4+B5.C1+C2','.'))
select t1||'.'||t2||'.'||t3 as res from
unnest((select string_to_array(s[1],'+') from x)) t1 cross join
unnest((select string_to_array(s[2],'+') from x)) t2 cross join
unnest((select string_to_array(s[3],'+') from x)) t3;
result:
res |
--------|
A1.B1.C1|
A1.B2.C1|
A1.B3.C1|
A1.B4.C1|
A1.B5.C1|
A2.B1.C1|
A2.B2.C1|
A2.B3.C1|
A2.B4.C1|
A2.B5.C1|
A1.B1.C2|
A1.B2.C2|
A1.B3.C2|
A1.B4.C2|
A1.B5.C2|
A2.B1.C2|
A2.B2.C2|
A2.B3.C2|
A2.B4.C2|
A2.B5.C2|
Here is my code, written with your help. I didn't mention it, but I can also have more or fewer than 3 parts, so I'm using dynamic SQL for this:
declare @FILTER varchar(max)='B+C+D.A+G.T+Y+R.E'
-- Works also with A.B.C
-- Works also with A+B+C.D.E+F
-- Works also with A+B+C.D+E+F+G+H
declare @NB int
declare @SQL varchar(max)=''
select @NB=count(*) from STRING_SPLIT(@FILTER,'.')
set @SQL='
;with T(A,B) as
(select *, row_number() over (order by (select NULL))
from STRING_SPLIT(''' + @FILTER + ''',''.'')
)
select '
;with T(V,N) as (
select *, row_number() over (order by (select NULL))
from STRING_SPLIT(@FILTER,'.')
)
select @SQL=@SQL + 'T' + cast(N as varchar(max)) + ' + ''.'' + ' from T
set @SQL=left(@SQL,len(@SQL)-1) + ' as res from'
;with T(V,N) as (
select *, row_number() over (order by (select NULL))
from STRING_SPLIT(@FILTER,'.')
)
select @SQL=@SQL + '
(select * from STRING_SPLIT((select A from T where B=' + cast(N as varchar(max)) + '), ''+'')) s' + cast(N as varchar(max)) + '(t' + cast(N as varchar(max)) + ') cross join'
from T
set @SQL=left(@SQL,len(@SQL)-len('cross join'))
exec(@SQL)
This query gives me all the information I need, but I'd like to display it differently if possible. The current result: http://i.imgur.com/BFKGFSx.jpg
DECLARE @MainHospital varchar(50)='HOSPITAL 1';
SELECT MainEmail, chkOutpatient, chkPartB
FROM SurveyPicList
WHERE MainHospital = @MainHospital
GROUP BY MainHospital, MainEmail, chkOutpatient, chkPartB
I'm trying to return 2 comma-delimited lists of MainEmail: one for the rows where chkOutpatient = "on" and one for the rows where chkPartB = "on". So the result would be only 2 cells of data: 1 under a chkOutpatient header with the comma-delimited emails that are "on", and 1 under a chkPartB header with the same for that flag.
So for chkPartB, something like this? http://i.imgur.com/RFlV24Q.jpg
SELECT DISTINCT ', ' + MainEmail AS chkPartB
FROM SurveyPicList
WHERE MainHospital = @MainHospital
AND chkPartB = 'on'
Please let me know if my question is unclear or if I need to give more info.
WITH Outpatients AS (
SELECT DISTINCT MainEmail
FROM SurveyPicList
WHERE MainHospital = @MainHospital
AND chkOutpatient = 'on'
)
,OutpatientsRawCsv AS (
SELECT (
SELECT ',' + MainEmail
FROM Outpatients
FOR XML PATH('')
) AS Csv
)
,PartBs AS (
SELECT DISTINCT MainEmail
FROM SurveyPicList
WHERE MainHospital = @MainHospital
AND chkPartB = 'on'
)
,PartBRawCsv AS (
SELECT (
SELECT ',' + MainEmail
FROM PartBs
FOR XML PATH('')
) AS Csv
)
SELECT STUFF(OutpatientsRawCsv.Csv, 1, 1, '') AS OutpatientsCsv
,STUFF(PartBRawCsv.Csv, 1, 1, '') AS PartBCsv
FROM OutpatientsRawCsv
CROSS JOIN PartBRawCsv
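If the server is SQL Server 2017 or later (an assumption, since the question does not state a version), STRING_AGG might shorten this considerably. The following is only a sketch of that idea; the names IsOutpatient, IsPartB and the alias e are made up for illustration.

```sql
-- Sketch, assuming SQL Server 2017+ where STRING_AGG is available.
DECLARE @MainHospital varchar(50) = 'HOSPITAL 1';

SELECT
    -- STRING_AGG ignores NULLs, so only the flagged emails are concatenated
    STRING_AGG(CASE WHEN IsOutpatient = 1 THEN MainEmail END, ',') AS OutpatientsCsv,
    STRING_AGG(CASE WHEN IsPartB = 1 THEN MainEmail END, ',') AS PartBCsv
FROM (
    -- One row per email, so each address appears at most once in each list
    SELECT MainEmail,
           MAX(CASE WHEN chkOutpatient = 'on' THEN 1 ELSE 0 END) AS IsOutpatient,
           MAX(CASE WHEN chkPartB = 'on' THEN 1 ELSE 0 END) AS IsPartB
    FROM SurveyPicList
    WHERE MainHospital = @MainHospital
    GROUP BY MainEmail
) AS e;
```

Because the CASE expressions yield NULL for unflagged rows, no STUFF call is needed to trim a leading separator.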
I have these tables:
T:
D:
What I am trying to do is get, for each s_id, all its symbols (DBSymbol) in one cell (merged cells).
I have found this tutorial, and here is my code:
select T.s_id,
(select '; ' + D.symbol
from D
where T.D_b_id = D.id
FOR XML PATH('')) [DBSymbol]
from T
but here is what I am getting:
What is wrong??
Try this -
SELECT t1.s_id,
STUFF(
(SELECT '; ' + symbol AS [text()]
FROM (
SELECT t.s_id,
d.symbol
FROM T
INNER JOIN D ON T.d_b_id = D.id
WHERE t.s_id = t1.s_id
) x
FOR XML PATH('')
), 1, 2, '')
FROM T t1
GROUP BY t1.s_id
select DISTINCT T.s_id,
Stuff((SELECT DISTINCT '; ' + D.symbol
from D
join T T2 on T2.D_b_id = D.id
where T2.s_id = T.s_id -- correlate on s_id so each row gets only its own symbols
FOR XML PATH('')),1,2,'') [DBSymbol]
from T
I have a table like the following:
RequestNo  Facility  Status
1          BDC1      Active
1          BDC2      Active
1          BDC3      Active
2          BDC1      Active
2          BDC2      Active
I want output like this:
RequestNo  Facility    Count
1          BDC(1,2,3)  1
2          BDC(1,2)    1
The count should be based on Status together with Facility, and the facility name should be shown as BDC only.
Try this (assuming your facility is a fixed four-character code):
SELECT RequestNo, Fname + '(' + FnoList + ')' Facilty, count(*) cnt
FROM
(
SELECT distinct RequestNo,
SUBSTRING(Facility,1,3) Fname,
stuff((
select ',' + SUBSTRING(Facility,4,4)
from Dummy
where RequestNo = A.RequestNo AND
SUBSTRING(Facility,1,3) = SUBSTRING(A.Facility,1,3)
for xml path('')
) ,
1, 1, '') as FnoList
FROM Dummy A
) x
group by RequestNo, Fname, FnoList;
This doesn't put any constraints on the length of the Facility field. It takes the alphabetic characters from the beginning and the digits from the end:
SELECT RequestNo, FacNameNumbers, COUNT(Status) as StatusCount
FROM
(
SELECT DISTINCT
t1.RequestNo,
t1.Status,
substring(facility, 1, patindex('%[^a-zA-Z ]%',facility) - 1) +
'(' +
STUFF((
SELECT DISTINCT ', ' + t2.fac_number
FROM (
select distinct
requestno,
substring(facility, 2 + len(facility) - patindex('%[^0-9 ]%',reverse(facility)), 9999) as fac_number
from facility
) t2
WHERE t2.RequestNo = t1.RequestNo
FOR XML PATH (''))
,1,2,'') + ')' AS FacNameNumbers
FROM Facility t1
) final
GROUP BY RequestNo, FacNameNumbers
I have 2 tables. One is called "Tasks" and the other is called "TaskDescription".
In my "Tasks" table the setup looks like this:
"taskID" (primary key), "FileID", "TaskTypeID" and a bunch of other irrelevant columns.
Then in my "TaskDescription" table, the setup looks like:
"TaskTypeID", "TaskTypeDesc"
So for example, if TaskTypeID is 1, then TaskTypeDesc would be "admin",
or if TaskTypeID is 2, then TaskTypeDesc would be "Employee", etc.
The two tables are related through the primary/foreign key "TaskTypeID".
What I am trying to do is get a task ID and the TaskTypeDesc where the FileID matches @fileID (which I pass in as a parameter). However, my query returns multiple rows instead of a single row when I try to obtain the description.
This is my query:
SELECT taskid,
( 'Task ID: '
+ Cast(cf.taskid AS NVARCHAR(15)) + ' - '
+ Cast((SELECT DISTINCT td.tasktypedesc FROM casefiletaskdescriptions
td JOIN
casefiletasks cft ON td.tasktypeid=cft.tasktypeid WHERE cft.taskid =
1841 )AS
NVARCHAR(100))
+ ' - Investigator : ' + ( Cast(i.fname AS NVARCHAR(20)) + ' '
+ Cast(i.lname AS NVARCHAR(20)) ) ) AS
'Display'
FROM casefiletasks [cf]
JOIN investigators i
ON CF.taskasgnto = i.investigatorid
WHERE cf.fileid = 2011630988
AND cf.concluded = 0
AND cf.progressflag != 'Conclude'
I am trying to get the output to look like "Task ID: 1234 - Admin - Investigator : John Doe". However I am having trouble on this part:
CAST((select DISTINCT td.TaskTypeDesc from CaseFileTaskDescriptions td
JOIN CaseFileTasks cft ON td.TaskTypeID=cft.TaskTypeID
where cft.TaskID =1841 )as nvarchar(100))
This seems to work, but the problem is that I have to hard-code the value 1841 to make it work. Is there a way to assign a "taskID" variable the values returned by the TaskID select query, or will that not work because SQL runs everything at once rather than line by line?
EDIT: this is in Microsoft SQL Server Management Studio 2008.
You can reference any column that exists in your FROM set from inside the subquery (a correlated subquery). In this case, that means any column from casefiletasks or investigators. Replace 1841 with the table.column reference.
Update
Replacing your static integer with the column reference, your query would look like:
SELECT taskid,
( 'Task ID: '
+ Cast(cf.taskid AS NVARCHAR(15)) + ' - '
+ Cast((SELECT DISTINCT td.tasktypedesc FROM casefiletaskdescriptions
td JOIN
casefiletasks cft ON td.tasktypeid=cft.tasktypeid WHERE cft.taskid =
cf.taskid )AS
NVARCHAR(100))
+ ' - Investigator : ' + ( Cast(i.fname AS NVARCHAR(20)) + ' '
+ Cast(i.lname AS NVARCHAR(20)) ) ) AS
'Display'
FROM casefiletasks [cf]
JOIN investigators i
ON CF.taskasgnto = i.investigatorid
WHERE cf.fileid = 2011630988
AND cf.concluded = 0
AND cf.progressflag != 'Conclude'
Would this work as your inner query?
SELECT DISTINCT td.TaskTypeDesc FROM CaseFileTaskDescriptions td
JOIN CaseFileTasks cft ON td.TaskTypeID = cft.TaskTypeID
WHERE cft.TaskID = cf.TaskID
Why not just do another join instead of a subquery?
SELECT taskid,
( 'Task ID: '
+ Cast(cf.taskid AS NVARCHAR(15)) + ' - '
+ Cast(td.tasktypedesc AS NVARCHAR(100))
+ ' - Investigator : ' + ( Cast(i.fname AS NVARCHAR(20)) + ' '
+ Cast(i.lname AS NVARCHAR(20)) ) ) AS
'Display'
FROM casefiletasks [cf]
JOIN investigators i
ON CF.taskasgnto = i.investigatorid
JOIN casefiletaskdescriptions td
ON td.tasktypeid = cf.tasktypeid
WHERE cf.fileid = 2011630988
AND cf.concluded = 0
AND cf.progressflag != 'Conclude'