How to overcome subquery in Hive for CASE statement

How to overcome subquery in Hive for CASE statement - hive

I have to populate the column value to 'Y' if the condition is true, Or else 'N'. But as in hive it does not support subquery in case stetement , how can this be return in HIVE.
(case
when exists
(select 1
from fntable fs
join dfntable dfs
on fs.id = dfs.id
and dfs.datetime =
(select max (cd.datetime)
from dfntable cd group by id)
and fs.s_id = dfs.s_id) then 'Y'
else 'N')"

Using subquery with analytic function + left join. When joined then 'Y':
select case when cd.id is not null then 'Y' else 'N' end
from fntable fs
left join
( --group by cd.id, cd.s_id and filter
select cd.id, cd.s_id
from
( select max (cd.datetime) over (partition by dc.id) as max_id_datetime,
cd.id, cd.s_id, cd.datetime
from dfntable cd
) cd
where cd.datetime=max_id_datetime --filter only records with max date
group by cd.id, cd.s_id
) cd on fs.id = dfs.id and fs.s_id = dfs.s_id

Related

Re-writing EXISTS as JOIN or a subquery in Oracle

I have a query which is very costly and taking more than an hour to execute. I tried converting the EXISTS clause to join but I am stuck, can anyone help?
The purpose is to find duplicate product within a unique space id. FLAT_TABLE consists of 5 million records.
Query:
select
tbl1.product,
tbl1.status,
tbl1.reservation,
tbl1.unique_space_id
FROM
schema1.flat_table tbl1
WHERE
tbl1.status = 'Active'
AND tbl1.product = 'Cage'
AND EXISTS
(SELECT 1
FROM schema1.flat_table tbl2
WHERE tbl2.product = 'Cage'
AND tbl2.status = 'Active'
AND tbl2.reservation <> 'Space Reserved'
AND tbl1.unique_space_id = tbl2.unique_space_id
GROUP BY tbl2.unique_space_id
HAVING COUNT (1) > 1
);

You can use analytical function count as follows:
select * from
(select tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id,
count(case when tbl1.reservation <> 'Space Reserved' then 1 end)
over(partition by tbl1.unique_space_id) as cnt
FROM schema1.flat_table tbl1
WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage')
where cnt > 1

You could rewrite your query as an inner join to the current exists subquery. The join would have the effect of filtering in the same way the exists clause was behaving.
SELECT DISTINCT
tbl1.product,
tbl1.status,
tbl1.reservation,
tbl1.unique_space_id
FROM schema1.flat_table tbl1
INNER JOIN
(
SELECT unique_space_id
FROM schema1.flat_table
WHERE product = 'Cage' AND
status = 'Active' AND
reservation <> 'Space Reserved'
GROUP BY unique_space_id
HAVING COUNT(*) > 1
) tbl2
ON tbl2.unique_space_id = tbl1.unique_space_id
WHERE
tbl1.status = 'Active' AND
tbl1.product = 'Cage';
Here is a more concise version using COUNT as an analytic function, along with a QUALIFY clause;
SELECT DISTINCT product, status, reservation, unique_space_id
FROM schema1.flat_table
WHERE status = 'Active' AND product = 'Cage'
QUALIFY COUNT(CASE WHEN reservation <> 'Space Reserved' THEN 1 END)
OVER (PARTITION BY unique_space_id) > 1;

Combine two queries to get the data in two columns

SELECT
tblEmployeeMaster.TeamName, SUM(tblData.Quantity) AS 'TotalQuantity'
FROM
tblData
INNER JOIN
tblEmployeeMaster ON tblData.EntryByHQCode = tblEmployeeMaster.E_HQCode
INNER JOIN
tblPhotos ON tblEmployeeMaster.TeamNo = tblPhotos.TeamNo
WHERE
IsPSR = 'Y'
GROUP BY
tblPhotos.TeamSort, tblPhotos.TeamNo, tblPhotos.Data,
tblEmployeeMaster.TeamName
ORDER BY
tblPhotos.TeamSort DESC, TotalQuantity DESC
This returns
Using this statement
select TeamName, count(TeamName) AS 'Head Count'
from dbo.tblEmployeeMaster
where IsPSR = 'Y'
group by teamname
Which returns
I would like to combine these 2 queries in 1 to get the below result.
Tried union / union all but no success :(
Any help will be very much helpful.

You can simply use the sub-query as follows:
SELECT tblEmployeeMaster.TeamName, SUM(tblData.Quantity) AS 'TotalQuantity',
MAX(HEAD_COUNT) AS HEAD_COUNT, -- USE THIS VALUE FROM SUB-QUERY
CASE WHEN MAX(HEAD_COUNT) <> 0
THEN SUM(tblData.Quantity)/MAX(HEAD_COUNT)
END AS PER_MAN_CONTRIBUTION -- column asked in comment
FROM tblData INNER JOIN
tblEmployeeMaster ON tblData.EntryByHQCode = tblEmployeeMaster.E_HQCode INNER JOIN
tblPhotos ON tblEmployeeMaster.TeamNo = tblPhotos.TeamNo
-- FOLLOWING SUB-QUERY CAN BE USED
LEFT JOIN (select TeamName, count(TeamName) AS HEAD_COUNT
from dbo.tblEmployeeMaster
where IsPSR = 'Y' group by teamname) AS HC
ON HC.TeamName = tblEmployeeMaster.TeamName
where IsPSR = 'Y'
GROUP BY tblPhotos.TeamSort, tblPhotos.TeamNo, tblPhotos.Data,tblEmployeeMaster.TeamName
order by tblPhotos.TeamSort desc, TotalQuantity desc

How can I get value from above row with same ID if row is null?

I'm new with SQL and I encountered this.
Here's a screenshot
My question is how can I make all the NULL on primary_payer_id get the value above only if they have same ClientID and ActionCode IS NOT NULL. FYI CustomColumn is only a copy of primary_payer_id ALSO I attached my code here below.
Here's my code:
SELECT ci.client_id AS ClientID,
ci.primary_payer_id,
ci.effective_date AS EffectiveDate,
ci.action_code_id AS ActionCode,
cc.item_description AS ItemDesc,
ap.description AS IDescription,
ci.deleted
FROM census_item ci
LEFT JOIN common_code cc ON ci.adt_tofrom_id = cc.item_id
LEFT JOIN ar_lib_payers ap ON ci.primary_payer_id = ap.payer_id
WHERE ci.deleted = 'N'

There might be a more efficient method, but this will work:
with t as (
<your query here>
)
select t.*,
(case when t.actioncode is not null and t.clientid is null
then tprev.clientid
else t.clientid
end) as new_clientid
from t outer apply
(select top 1 tprev.*
from t tprev
where tprev.clientid = t.clientid and
tprev.effectivedate < t.effectivedate
order by tprev.effecctivedate desc
) tprev;

How can I optimize the SQL query?

I have a query an SQL query as follows, can anybody suggest any optimization for this; I think most of the effort is being done for the Union operation - is there anything else can be done to get the same result ?
Basically I wanna query first portion of the UNION and if for each record there is no result then the second portion need to be run. Please help.
:
SET dateformat dmy;
WITH incidentcategory
AS (
SELECT 1 ord, i.IncidentId, rl.Description Category FROM incident i
JOIN IncidentLikelihood l ON i.IncidentId = l.IncidentId
JOIN IncidentSeverity s ON i.IncidentId = s.IncidentId
JOIN LikelihoodSeverity ls ON l.LikelihoodId = ls.LikelihoodId AND s.SeverityId = ls.SeverityId
JOIN RiskLevel rl ON ls.RiskLevelId = rl.riskLevelId
UNION
SELECT 2 ord, i.incidentid,
rl.description Category
FROM incident i
JOIN incidentreportlikelihood l
ON i.incidentid = l.incidentid
JOIN incidentreportseverity s
ON i.incidentid = s.incidentid
JOIN likelihoodseverity ls
ON l.likelihoodid = ls.likelihoodid
AND s.severityid = ls.severityid
JOIN risklevel rl
ON ls.risklevelid = rl.risklevelid
) ,
ic AS (
SELECT ROW_NUMBER() OVER (PARTITION BY i.IncidentId ORDER BY (CASE WHEN incidentTime IS NULL THEN GETDATE() ELSE incidentTime END) DESC,ord ASC) rn,
i.incidentid,
dbo.Incidentdescription(i.incidentid, '',
'',
'', '')
IncidentDescription,
dbo.Dateconverttimezonecompanyid(closedtime,
i.companyid)
ClosedTime,
incidenttime,
incidentno,
Isnull(c.category, '')
Category,
opencorrectiveactions,
reportcompleted,
Isnull(classificationcompleted, 0)
ClassificationCompleted,
Cast (( CASE
WHEN closedtime IS NULL THEN 0
ELSE 1
END ) AS BIT)
IncidentClosed,
Cast (( CASE
WHEN investigatorfinishedtime IS NULL THEN 0
ELSE 1
END ) AS BIT)
InvestigationFinished,
Cast (( CASE
WHEN investigationcompletetime IS NULL THEN 0
ELSE 1
END ) AS BIT)
InvestigationComplete,
Cast (( CASE
WHEN investigatorassignedtime IS NULL THEN 0
ELSE 1
END ) AS BIT)
InvestigatorAssigned,
Cast (( CASE
WHEN (SELECT Count(*)
FROM incidentinvestigator
WHERE incidentid = i.incidentid
AND personid = 1588
AND tablename = 'AdminLevels') = 0
THEN 0
ELSE 1
END ) AS BIT)
IncidentInvestigator,
(SELECT dbo.Strconcat(osname)
FROM (SELECT TOP 10 osname
FROM incidentlocation l
JOIN organisationstructure o
ON l.locationid = o.osid
WHERE incidentid = i.incidentid
ORDER BY l.locorder) loc)
Location,
Isnull((SELECT TOP 1 teamleader
FROM incidentinvestigator
WHERE personid = 1588
AND tablename = 'AdminLevels'
AND incidentid = i.incidentid), 0)
TeamLeader,
incidentstatus,
incidentstatussearch
FROM incident i
LEFT OUTER JOIN incidentcategory c
ON i.incidentid = c.incidentid
WHERE i.isdeleted = 0
AND i.companyid = 158
AND incidentno <> 0
--AND reportcompleted = 1
--AND investigatorassignedtime IS NOT NULL
--AND investigatorfinishedtime IS NULL
--AND closedtime IS NULL
),
ic2 AS (
SELECT * FROM ic WHERE rn=1
)
SELECT * FROM ic2
--WHERE rownumber >= 0
-- AND rownumber < 0 + 10
--WHERE ic2.incidentid in(53327,53538)
--WHERE ic2.incidentid = 53338
ORDER BY incidentid DESC
Following is the execution plan I got:
https://www.dropbox.com/s/50dcpelr1ag4blp/Execution_Plan.sqlplan?dl=0

There are several issues:
1) use UNION ALL instead of UNION ALL to avoid the additional operation to aggregate the data.
2) try to modify the numerous function calls (e.g. dbo.Incidentdescription() ) to be an in-lie table valued function so you can reference it using CROSS APPLY or OUTER APPLY. Especially, if those functions referencing a table again.
3) move the subqueries from the SELECT part of the query to the FROM part using CROSS APPLY or OUTER APPLY again.
4) after the above is done, check the execution plan again for any missing indexes. Also, run the query with STATISTICS TIME, IO on to verify that the number of times a table
is referenced is correct (sometimes the execution plan put you in the wrong direction, especially if function calls are involved)...

Since the first inner query produces rows with ord=1 and the second produces rows with ord=2, you should use UNION ALL instead of UNION. UNION will filter out equal rows and since you will never get equal rows it is more efficient to use UNION ALL.
Also, rewrite your query to not use the WITH construct. I've had very bad experiences with this. Just use regular derived tables instead. In the case the query is still abnormally slow, try to serialize some derived tables to a temporary table and query the temporary table instead.

Try alternate approach by removing
(SELECT dbo.Strconcat(osname)
FROM (SELECT TOP 10 osname
FROM incidentlocation l
JOIN organisationstructure o
ON l.locationid = o.osid
WHERE incidentid = i.incidentid
ORDER BY l.locorder) loc)
Location,
Isnull((SELECT TOP 1 teamleader
FROM incidentinvestigator
WHERE personid = 1588
AND tablename = 'AdminLevels'
AND incidentid = i.incidentid), 0)
TeamLeader
from the SELECT. Avoid using complex functions/sub-queries in select.

How do I compare SUM and COUNT()s in SQL?

I am building a small query to find all CustomerNumbers where all of their policies are in a certain status (terminated).
Here is the query I am working on
select
a.cn
,p.pn
, tp = COUNT(p.pn)
, tp2 = SUM(case when p.status = 4 then 1 else 0 end)
from
(
select cn, cn2
from bc
union
select cn, cn2= fn
from ic
) as a
left join p as p
on a.cn = p.cn
group by
a.cn,
pn
My issue is when I add the clause:
WHERE cn = tp
It says the columns are invalid. Am I missing something incredibly obvious?

You can't use aliases at the same level of a query. The reason is that the where clause is logically evaluated before the select, so the aliases defined in the select are not available in the where.
A typical solution is to repeat the expression (other answers) or use a subquery or cte:
with cte as (
<your query here>
)
select cte.*
from cte
where TotalPolicies = TermedPolicies;
However, in your case, you have an easier solution, because you have an aggregation query. So just use:
having TotalPolicies = TermedPolicies

You cannot use the aliased aggregate column names in the where clause. You have to use the expression itself instead. Also you cannot use it as where cluase, but use it in the having clause
HAVING COUNT(p.PolicyNumber) = SUM(case when p.status = 4 then 1 else 0 end)

You can also make the whole query as a subquery then add your where statement:
select CustomerNumber
,PolicyNumber
,TotalPolicies
,TermedPolicies
from (
select
a.CustomerNumber
,p.PolicyNumber
, TotalPolicies = COUNT(p.PolicyNumber)
, TermedPolicies = SUM(case when p.status = 4 then 1 else 0 end)
from
(
select CustomerNumber, CompanyName
from BusinessClients
union
select CustomerNumber, CompanyName = FullName
from IndividualClients
) as a
left join Policies as p
on a.CustomerNumber = p.CustomerNumber
group by
a.CustomerNumber,
PolicyNumber
) tb
where TotalPolicies = TermedPolicies

select
a.CustomerNumber
,p.PolicyNumber
, COUNT(p.PolicyNumber) as TotalPolicies
, SUM(case when p.status = 4 then 1 else 0 end) as TermedPolicies
from
(
select CustomerNumber, CompanyName
from BusinessClients
union
select CustomerNumber, CompanyName = FullName
from IndividualClients
) as a
left join Policies as p
on a.CustomerNumber = p.CustomerNumber
WHERE COUNT(p.PolicyNumber)= SUM(case when p.status = 4 then 1 else 0 end)
group by
a.CustomerNumber,
PolicyNumber
This should work. But it is not tested.

In order to filter by an aggregate function, you must include it in the HAVING clause, rather than the WHERE clause.
select
a.CustomerNumber
,p.PolicyNumber
, TotalPolicies = COUNT(p.PolicyNumber)
, TermedPolicies = SUM(case when p.status = 4 then 1 else 0 end)
from
(
select CustomerNumber, CompanyName
from BusinessClients
union
select CustomerNumber, CompanyName = FullName
from IndividualClients
) as a
left join Policies as p
on a.CustomerNumber = p.CustomerNumber
having COUNT(p.PolicyNumber) = SUM(case when p.status = 4 then 1 else 0 end)
group by
a.CustomerNumber,
PolicyNumber
The reason for this has to do with the way SQL engines evaluate queries. The contents of the WHERE clause are used to filter out rows before the aggregate functions are applied. If you could reference aggregate functions there, the engine would have to have some way to determine which predicates to apply before aggregation and which to apply after. The HAVING clause allows the engine to have a clear demarcation between the two: WHERE applies before aggregation and HAVING applies after aggregation.

When dealing with aggregations in a query that has grouping, you will need to use HAVING. This should work:
select
a.CustomerNumber
,p.PolicyNumber
, TotalPolicies = COUNT(p.PolicyNumber)
, TermedPolicies = SUM(case when p.status = 4 then 1 else 0 end)
from
(
select CustomerNumber, CompanyName
from BusinessClients
union
select CustomerNumber, CompanyName = FullName
from IndividualClients
) as a
left join Policies as p
on a.CustomerNumber = p.CustomerNumber
group by
a.CustomerNumber,
PolicyNumber
HAVING TermedPolicies = SUM(case when p.status = 4 then 1 else 0 end)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to overcome subquery in Hive for CASE statement - hive

Related

Re-writing EXISTS as JOIN or a subquery in Oracle

Combine two queries to get the data in two columns

How can I get value from above row with same ID if row is null?

How can I optimize the SQL query?

How do I compare SUM and COUNT()s in SQL?

Categories

Resources