HIVESQL Query Where Statement Assistance - sql

I have a query that I am pulling transactions from. I want to be able to pull transactions where my commodity field = A1 and the newvalue field <> A1 for any of the transactions for each individual case number. In other words, I have 2 case numbers with 5 transactions each, one case number has a transaction record of commodity = A1 and newvalue = A1. The other case has a record where commodity = A1 and newvalue= B2, this is the case I would like returned in the query. Keep in mind that the previous case may have that same transaction but it should not be returned because there is a record of newvalue = A1. I have attached an image and the records highlighted in yellow are what I expect my output to be. Below is my current "Where" statement that I need help re-writing. I was also told that I may need a "Group BY" statement which I tried and still pulled the same results.
SELECT
Allcases.caseno as caseno, Allcases.division_desc as division_desc, Allcases.close_date as close_date, Allcases.week_of as week_of,
Allcases.case_type as case_type,
a.transactdate as transactdate, a.transacttypeid as transacttypeid, a.userid as userid,
concat(RTRIM(Usr.fullname), ' <', RTRIM(Usr.EmailAddress), '>') as CR1_CR2_FULLNAME,
Allcases.commodity as commodity,
b.oldvalue as oldvalue, b.newvalue as newvalue, changereason as changereason
FROM
(
select b.*, sum(case when b.newvalue = 'A1' then 1 else 0 end) over(partition by Allcases.caseno) cnt_new_value_A1
from
dataiku.qca_casedatachange_parquet b
INNER JOIN dataiku.qcatransact_parquet a
ON b.transactid = a.transactid
INNER JOIN dataiku.qca_validated_cases_consolidated_parquet Allcases
ON a.casedataid = Allcases.casedataid
INNER JOIN dataiku.set_qca_reclassification_head_parquet h
ON Allcases.caseno = h.caseno
INNER JOIN dataiku.qca_user_parquet Usr
ON a.UserID = Usr.UserID
where
b.FieldID = 6
AND a.transacttypeid IN (1, 2, 3)
AND Allcases.commodity = 'A1'
)s
where cnt_new_value_A1 = 0
ORDER BY Allcases.caseno, A.transactid

If you do not want to return cases for which does not exists record with b.newvalue = 'A1' then calculate analytic sum() and use it in the where
select ...
from
(
select t.*,
sum(case when newvalue ='A1' then 1 else 0 end) over(partition by case_number) cnt_new_value_A1
from ...
where
FieldID = 6 --COMMODITY
AND transacttypeid IN (1, 2, 3)
AND commodity = 'A1'
)s
where cnt_new_value_A1 = 0 --No A1 in newvalue per case_number

Related

How to update data with multi condition

I have a table with 5 columns like this table below:
I wanna fill the approval column based on conditions such as:
APPROVAL = "Y" IN CASE
(1) CUSTOMER_NEW <> CUSTOMER (as in the case CUSTOMER = 12467)
(2) CUSTOMER_NEW = CUSTOMER AND SALE_SHORT_ID COLUMN HAS DIFFERENT VALUE (as in the case CUSTOMER = 13579)
(3) CUSTOMER_NEW = CUSTOMER AND SALE_SHORT_ID HAS THE SAME VALUE THEN MAX(LEN(SALE_ID) (as in the case CUSTOMER = 65465)
ELSE "N"
My result expected to like this table:
Thanks for your support.
I am not sure about all your your conditions but this one could do it:
UPDATE a
SET approval =
CASE WHEN
a.customer <> a.customer_new OR
a.customer = a.customer_new AND b.sale_short_eq = 0 OR
a.customer = a.customer_new AND a.sale_id = b.max_len_sale_id
THEN 'Y'
ELSE 'N'
END
FROM
thetable a
INNER JOIN (
SELECT
customer,
CASE WHEN MIN(sale_short_id) = MAX(sale_short_id) THEN 1 ELSE 0 END AS sale_short_eq,
( SELECT TOP 1 sale_id
FROM thetable
WHERE customer = c.customer
ORDER BY LEN(sale_id) DESC
) AS max_len_sale_id
FROM thetable c
GROUP BY customer
) b
ON a.customer = b.customer
Note that the columns b.sale_short_eq and b.max_len_sale_id are returned by a sub-query joined to the main query.
Especially SALE_SHORT_ID HAS THE SAME VALUE THEN MAX(LEN(SALE_ID) is not clear. Did you mean that SALE_ID must be the same as the maximum length SALE_ID? Because SALE_SHORT_ID cannot be the same than any SALE_ID. And what happens if two different SALE_IDs have the same max length?
See: https://dbfiddle.uk/7Ct87-Mk

SQL nested grouping issue

Here's my query:
select
cast(ar.AudienceCreationDate as date) as AudienceDate,
Count(*) as [Count],
count(case when ar.Source = 'Contact' then ar.Id end) as PatientCount,
count(case when ar.Source = 'PatientContact' then ar.Id end) as PatientContactCount,
(
select
count(*)
from
_SMSMessageTracking sms
inner join
[CTT Preferences] pref on pref.ContactId = sms.SubscriberKey
where
sms.Name <> 'ky_ctt_join' and pref.Source = 'Patient'
) as PatientSMS,
(
select
count(*)
from
_SMSMessageTracking sms
inner join
[CTT Preferences] pref on pref.ContactId = sms.SubscriberKey
where
sms.Name <> 'ky_ctt_join' and pref.Source = 'PatientContact'
) as PatientContactSMS
from
Daily_Symptom_Check_Audience_Archive ar
group by
cast(ar.AudienceCreationDate as date)
And here's the result set it creates:
The issue I'm having is that the values in the rightmost two columns are the same across the board, for all records. This number represents the TOTAL, and not aggregated by day, as the other values indicate. I realize that I'm doing something wrong - what can I do to modify my query to effectively have a proper "grouping" on these last two columns just like all the other data in this table?
in the where clause of the nested query add sms.AudienceDate = ar.AudienceDate

How to replace null field values with an empty string?

I need to replace the null values in the Estimate Number field with a blank string. I have tried the code below and the values still appear as null. Any thoughts? thanks
Pro Number Estimate Number
10271943 NULL
10271944 NULL
10271945 NULL
10271946 NULL
10271948 94606
SELECT a.AAAREFNUMVALUE AS "Pro Number",
(SELECT TOP 1 isnull(a2.AAAREFNUMVALUE,'')
FROM dbo.AAATOREFNUMS a2
WHERE a2.AAATRANSPORTTABLE = a.AAATRANSPORTTABLE AND
a2.AAAREFNUMTYPE = 4
ORDER BY a2.AAAREFNUMVALUE
) AS "Estimate Number"
FROM dbo.AAATOREFNUMS a
INNER JOIN dbo.AAATODATES d ON a.AAATRANSPORTTABLE = d.AAATRANSPORTTABLE
WHERE a.AAAREFNUMTYPE = 1 AND d.AAADATETYPE = 1
GROUP BY a.AAAREFNUMVALUE,a.AAATRANSPORTTABLE,a.AAAREFNUMTYPE;
The problem occurs when the subquery returns no values. The isnull() needs to go outside the subquery:
ISNULL( (SELECT TOP 1 a2.AAAREFNUMVALUE
FROM dbo.AAATOREFNUMS a2
WHERE a2.AAATRANSPORTTABLE = a.AAATRANSPORTTABLE AND
a2.AAAREFNUMTYPE = 4
ORDER BY a2.AAAREFNUMVALUE
), ''
) AS "Estimate Number"
Note that this is a situation where ISNULL() is preferred over COALESCE(), because the SQL Server implementation of COALESCE() evaluates the first argument twice when it is not NULL.
However, you might find the query easier to express and faster to run if you use window functions instead:
SELECT DISTINCT a.AAAREFNUMVALUE AS "Pro Number",
COALESCE(a.AAAREFNUMVALUE_4, '') as "Estimate Number"
FROM (SELECT a.*,
MAX(CASE WHEN a.AAAREFNUMTYPE = 4 THEN a.AAAREFNUMVALUE END) OVER (PARTITION BY a.AAATRANSPORTTABLE) as AAAREFNUMVALUE_4
FROM dbo.AAATOREFNUMS a
) a INNER JOIN
dbo.AAATODATES d
ON a.AAATRANSPORTTABLE = d.AAATRANSPORTTABLE
WHERE a.AAAREFNUMTYPE = 1 AND d.AAADATETYPE = 1 ;
Try case, just make the adjusts:
CASE
(SELECT TOP 1 a2.AAAREFNUMVALUE
FROM dbo.AAATOREFNUMS a2
WHERE a2.AAATRANSPORTTABLE = a.AAATRANSPORTTABLE AND
a2.AAAREFNUMTYPE = 4
ORDER BY a2.AAAREFNUMVALUE) as Estimate Number
WHEN NULL THEN ''
ELSE
a2.AAAREFNUMVALUE
END as "Estimate Number"
You may try this.
SELECT a.AAAREFNUMVALUE AS "Pro Number",
ISNULL((SELECT TOP 1 isnull(a2.AAAREFNUMVALUE,'')
FROM dbo.AAATOREFNUMS a2
WHERE a2.AAATRANSPORTTABLE = a.AAATRANSPORTTABLE AND
a2.AAAREFNUMTYPE = 4
ORDER BY a2.AAAREFNUMVALUE
),'') AS "Estimate Number"
FROM dbo.AAATOREFNUMS a
INNER JOIN dbo.AAATODATES d ON a.AAATRANSPORTTABLE = d.AAATRANSPORTTABLE
WHERE a.AAAREFNUMTYPE = 1 AND d.AAADATETYPE = 1
GROUP BY a.AAAREFNUMVALUE,a.AAATRANSPORTTABLE,a.AAAREFNUMTYPE;

SQL Server / T-SQL : query optimization assistance

I have this QA logic that looks for errors into every AuditID within a RoomID to see if their AuditType were never marked Complete or if they have two complete statuses. Finally, it picks only the maximum AuditDate of the RoomIDs with errors to avoid showing multiple instances of the same RoomID, since there are many audits per room.
The issue is that the AUDIT table is very large and takes a long time to run. I was wondering if there is anyway to reach the same result faster.
Thank you in advance !
IF object_ID('tempdb..#AUDIT') is not null drop table #AUDIT
IF object_ID('tempdb..#ROOMS') is not null drop table #ROOMS
IF object_ID('tempdb..#COMPLETE') is not null drop table #COMPLETE
IF object_ID('tempdb..#FINALE') is not null drop table #FINALE
SELECT distinct
oc.HotelID, o.RoomID
INTO #ROOMS
FROM dbo.[rooms] o
LEFT OUTER JOIN dbo.[hotels] oc on o.HotelID = oc.HotelID
WHERE
o.[status] = '2'
AND o.orderType = '2'
SELECT
t.AuditID, t.RoomID, t.AuditDate, t.AuditType
INTO
#AUDIT
FROM
[dbo].[AUDIT] t
WHERE
t.RoomID IN (SELECT RoomID FROM #ROOMS)
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
SELECT
o.HotelID, o.RoomID,
a.AuditID, a.RoomID, a.AuditDate, a.AuditType, a.CompleteStatus,
c.ClientNum
INTO
#FINALE
FROM
#ROOMS O
LEFT OUTER JOIN
#COMPLETE a on o.RoomID = a.RoomID
LEFT OUTER JOIN
[dbo].[clients] c on o.clientNum = c.clientNum
SELECT
t.*,
Complete_Error_Status = CASE WHEN t.CompleteStatus = 0
THEN 'Not Complete'
WHEN t.CompleteStatus > 1
THEN 'Complete More Than Once'
END
FROM
#FINALE t
INNER JOIN
(SELECT
RoomID, MAX(AuditDate) AS MaxDate
FROM
#FINALE
GROUP BY
RoomID) tm ON t.RoomID = tm.RoomID AND t.AuditDate = tm.MaxDate
One section you could improve would be this one. See the inline comments.
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
COUNT(1) AS CompleteStatus
-- Use the above along with the WHERE clause below
-- so that you are aggregating fewer records and
-- avoiding a CASE statement. Remove this next line.
--SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
WHERE
AuditType = 'Complete'
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
Just a thought. Streamline your code and your solution. you are not effectively filtering your datasets smaller so you continue to query the entire tables which is taking a lot of your resources and your temp tables are becoming full copies of those columns without the indexes (PK, FK, ++??) on the original table to take advantage of. This by no means is a perfect solution but it is an idea of how you can consolidate your logic and reduce your overall data set. Give it a try and see if it performs better for you.
Note this will return the last audit record for any room that has either not had an audit completed or completed more than once.
;WITH cte AS (
SELECT
o.RoomId
,o.clientNum
,a.AuditId
,a.AuditDate
,a.AuditType
,NumOfAuditsComplete = SUM(CASE WHEN a.AuditType = 'Complete' THEN 1 ELSE 0 END) OVER (PARTITION BY o.RoomId)
,RowNum = ROW_NUMBER() OVER (PARTITION BY o.RoomId ORDER BY a.AuditDate DESC)
FROm
dbo.Rooms o
LEFT JOIN dbo.Audit a
ON o.RoomId = a.RoomId
WHERE
o.[Status] = 2
AND o.OrderType = 2
)
SELECT
oc.HotelId
,cte.RoomId
,cte.AuditId
,cte.AuditDate
,cte.AuditType
,cte.NumOfAuditsComplete
,cte.clientNum
,Complete_Error_Status = CASE WHEN cte.NumOfAuditsComplete > 1 THEN 'Complete More Than Once' ELSE 'Not Complete' END
FROM
cte
LEFT JOIN dbo.Hotels oc
ON cte.HotelId = oc.HotelId
LEFT JOIN dbo.clients c
ON cte.clientNum = c.clientNum
WHERE
cte.RowNum = 1
AND cte.NumOfAuditsComplete != 1
Also note I changed your
WHERE
o.[status] = '2'
AND o.orderType = '2'
TO
WHERE
o.[status] = 2
AND o.orderType = 2
to be numeric without the single quotes. If the data type is truely varchar add them back but when you query a numeric column as a varchar it will do data conversion and may not take advantage of indexes that you have built on the table.

SQL Condition for sum

I have a sql statement with many inner join tables, as you can see below I have many conditional SUM statements , these sums are giving me wrong (very large) numbers as the inner join is repeating the same values in my source select pool. I was wondering id there is a way to limit these sum conditions lets say to EMPLIDs. The code is :
SELECT
A.EMPL_CTG,
B.DESCR AS PrName,
SUM(A.CURRENT_COMPRATE) AS SALARY_COST_BUDGET,
SUM(A.BUDGET_AMT) AS BUDGET_AMT,
SUM(A.BUDGET_AMT)*100/SUM(A.CURRENT_COMPRATE) AS MERIT_GOAL,
SUM(C.FACTOR_XSALARY) AS X_Programp,
SUM(A.FACTOR_XSALARY) AS X_Program,
COUNT(A.EMPLID) AS EMPL_CNT,
COUNT(D.EMPLID),
SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END) AS PRMCNT,
SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END) AS EXPCNT,
(SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END)+SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END))*100/(COUNT(A.EMPLID)) AS PEpercent
FROM
EMP_DTL A INNER JOIN EMPL_CTG_L1 B ON A.EMPL_CTG = B.EMPL_CTG
INNER JOIN
ECM_PRYR_VW C ON A.EMPLID=C.EMPLID
INNER JOIN ECM_INELIG D on D.EMPL_CTG=A.EMPL_CTG and D.YEAR=YEAR(getdate())
WHERE
A.YEAR=YEAR(getdate())
AND B.EFF_STATUS='A'
GROUP BY
A.EMPL_CTG,
B.DESCR
ORDER BY B.DESCR
I already tried moving D.YEAR=YEAR(getdate()) to the where clause. Any help would be greatly appereciated
The probable reason of your very large numbers is probably due to the result of Cartesian product of joining A -> B, A -> C and A -> D where tables C and D appear to have multiple records. So, just example... if A has 10 records, and C has 10 for each of the A records, you now have 10 * 10 records... Finally, join that to D table with 10 records, you now have 10 * 10 * 10 for each "A", thus your bloated answers.
Now, how to resolve. I have taken your "C" and "D" tables and "Pre-Aggregated" those counts based on the join column basis. This way, they will each have only 1 record with the total already computed at that level, joined back to A table and you lose your Cartesian issue.
Now, for table B, it appears that is a lookup table only and would only be a single record result anyhow.
SELECT
A.EMPL_CTG,
B.DESCR AS PrName,
SUM(A.CURRENT_COMPRATE) AS SALARY_COST_BUDGET,
SUM(A.BUDGET_AMT) AS BUDGET_AMT,
SUM(A.BUDGET_AMT)*100/SUM(A.CURRENT_COMPRATE) AS MERIT_GOAL,
PreAggC.X_Programp,
SUM(A.FACTOR_XSALARY) AS X_Program,
COUNT(A.EMPLID) AS EMPL_CNT,
PreAggD.DCount,
SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END) AS PRMCNT,
SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END) AS EXPCNT,
( SUM( CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END
+ CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END ) *
100 / COUNT(A.EMPLID) AS PEpercent
FROM
EMP_DTL A
INNER JOIN EMPL_CTG_L1 B
ON A.EMPL_CTG = B.EMPL_CTG
AND B.EFF_STATUS='A'
INNER JOIN ( select
C.EMPLID,
SUM(C.FACTOR_XSALARY) AS X_Programp
from
ECM_PRYR_VW C
group by
C.EMPLID ) PreAggC
ON A.EMPLID = PreAggC.EMPLID
INNER JOIN ( select
D.EMPLID,
COUNT(*) AS DCount
from
ECM_INELIG D
where
D.Year = YEAR( getdate())
group by
D.EMPLID ) PreAggD
ON A.EMPLID = PreAggD.EMPLID
WHERE
A.YEAR=YEAR(getdate())
GROUP BY
A.EMPL_CTG,
B.DESCR
ORDER BY
B.DESCR