What can I do to update my query to avoid a Division by zero error?

I'm trying to update my query to pull a list of stores if it is marked as "third party" and integrated_images_via_api is set to "true".
When returning these results, I would like to use division to compute averages, but I keep running into a division by zero error.
Looks like something went wrong with your query.
net.snowflake.client.jdbc.SnowflakeSQLException: Division by zero
With
menu_data as (
SELECT DISTINCT
dht.date_stamp,
dm.BUSINESS_ID,
ps.provider_type,
dht.MENU_ID,
dht.ACTIVE_STORES_LINKED_TO_MENU,
dht.HAS_HEADER_IMAGE,
dht.HAS_LOGO_IMAGE,
dht.PHOTOS_TOTAL,
dht.NUM_ITEM_IDS,
dht.ITEMS_WITH_DESCRIPTIONS,
dht.PHOTOS_TOTAL*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_photos,
dht.NUM_ITEM_IDS*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_items,
dht.ITEMS_WITH_DESCRIPTIONS*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_desc,
dht.HAS_HEADER_IMAGE*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_headers,
dht.HAS_logo_IMAGE*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_logos,
case when dht.has_header_image AND dht.has_logo_image AND dht.photos_total/dht.NUM_ITEM_IDS >=0.1 --NS, >10% Photos
then 1
else 0 end as NS_Sat
FROM
PRODDB.PUBLIC.DIMENSION_MENU_HEALTH_TRACKING dht
Left Join PRODDB.PUBLIC.DIMENSION_MENU dm ON dm.MENU_ID = dht.MENU_ID
LEFT JOIN DOORDASH_MERCHANT.PUBLIC.MAINDB_STORE_POINT_OF_SALE_INFO ps on ps.store_id=dm.store_id
LEFT JOIN PRODDB.STATIC.POS_PROVIDER_CLASSIFICATION pc on pc.PROVIDER_TYPE=ps.PROVIDER_TYPE
LEFT JOIN PRODDB.STATIC.MENU_DETAILS pm on pm.PROVIDER_ID=pc.PROVIDER_TYPE
WHERE
1 = 1
AND dht.DATE_STAMP = (SELECT max(date_stamp) from PRODDB.PUBLIC.DIMENSION_MENU_HEALTH_TRACKING)
AND dht.ACTIVE_MENU
AND dht.NUM_ITEM_IDS >0
AND --dm.BUSINESS_ID in ('1026','57396','859','1037567','400712','554309')
pc.DIRECT_OR_3PT= 'Third Party'
AND pm.INTEGRATED_IMAGES_VIA_API= 'TRUE'
)
--Main Query
SELECT
md.DATE_STAMP,
business_id,
sum(ACTIVE_STORES_LINKED_TO_MENU) as total_store_menus,
sum(case when md.NS_SAT = 1 then ACTIVE_STORES_LINKED_TO_MENU else NULL end) as NS_store_menus,
total_store_menus - NS_store_menus as ns_opp,
round(NS_Store_menus / total_store_menus, 4) as NS_Perc,
sum(sum_photos) as total_photos,
sum(sum_items) as total_items,
sum(sum_desc) as total_descriptions,
sum(sum_headers) as total_headers,
round(total_photos / total_items,4) as item_perc,
round(total_descriptions / total_items,4) as desc_perc,
total_items - total_photos as item_opp,
round(total_headers / total_store_menus,4) as perc_headers
from menu_data md
where ns_perc >= 0.95
group by 1,2
order by 1,2 DESC

Related

How to sum the value of another sum from same select statement

I am trying to sum the value of another sum in the same select statement and then check that total in a case statement. When I do it, it does not work; instead it just returns the individual value.
I have to sum Billable_Trades and then apply a rate that depends on whether the total is above a threshold, so I need to know the total of billable_trades.
select t.Business_Unit_Description, -- case when Product_Type_Description = 'Fee Based' then 'Fee Based' else '' end as revenue_type,
billable_trades,
isnull(c.comm_adjustments, 0) as commission_adjustments,
rate,
billable_trades*rate as charges,
0.3 as commission_rate,
isnull(c.comm_adjustments, 0)*0.3 as credit,
(billable_trades*rate)- isnull(c.comm_adjustments, 0)*0.3 as total
from
(
select Business_Unit_Description,
sum(billable_trades) as billable_trades,
CASE WHEN SUM(billable_trades) > 0 and SUM(billable_trades) <= 150000 THEN 0.85667 ELSE 0.47104 END as rate
from cte_combined
group by Business_Unit_Description
) t
left outer join cte_comm_adj c on c.Business_Unit_Description = t.Business_Unit_Description
order by t.Business_Unit_Description
There is obviously more to the query than is shown - as you are using a derived table to reference a CTE and also outer joining to another CTE.
I would move the calculation of rate out of the derived table:
Select t.Business_Unit_Description -- case when Product_Type_Description = 'Fee Based' then 'Fee Based' else '' end as revenue_type,
, t.sum_billable_trades
, commission_adjustments = isnull(c.comm_adjustments, 0)
, r.rate
, charges = t.sum_billable_trades * r.rate
, commission_rate = 0.3
, credit = isnull(c.comm_adjustments, 0) * 0.3
, total = (t.sum_billable_trades * r.rate) - isnull(c.comm_adjustments, 0) * 0.3
From (Select Business_Unit_Description
, sum_billable_trades = sum(billable_trades)
From cte_combined
Group By Business_Unit_Description) t
Cross Apply (Values (iif(t.sum_billable_trades > 0 And t.sum_billable_trades <= 150000, 0.85667, 0.47104))) As r(rate)
Left Outer Join cte_comm_adj c On c.Business_Unit_Description = t.Business_Unit_Description
Order By t.Business_Unit_Description;
I also wouldn't use the same name for the sum just to make it clearer.
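The idea of summing first and deriving the rate from the sum afterwards can be tried on a toy dataset. The sketch below uses Python's built-in sqlite3 module; the `trades` table, its columns, and the sample rows are assumptions made for illustration, and a plain CASE in the outer query stands in for the SQL Server-specific Cross Apply / iif shown above. Only the thresholds and rates are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for cte_combined.
conn.execute("CREATE TABLE trades (business_unit TEXT, billable_trades INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("A", 100000), ("A", 40000), ("B", 120000), ("B", 90000)],
)

query = """
SELECT t.business_unit,
       t.sum_billable_trades,
       -- The rate is chosen from the per-unit total, not from individual rows.
       CASE WHEN t.sum_billable_trades > 0 AND t.sum_billable_trades <= 150000
            THEN 0.85667 ELSE 0.47104 END AS rate
FROM (SELECT business_unit, SUM(billable_trades) AS sum_billable_trades
      FROM trades
      GROUP BY business_unit) AS t
ORDER BY t.business_unit
"""
rates = conn.execute(query).fetchall()
for unit, total, rate in rates:
    print(unit, total, rate)
```

Unit A totals 140000 and falls in the lower tier, while unit B totals 210000 and gets the other rate, even though every individual row is below 150000.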

Out of range integer: infinity

I'm trying to work through a problem that's a bit hard to explain, and I can't expose any of the data I'm working with. What I'm trying to get my head around is the error below, which I get when running the query below. I've renamed some of the tables and columns for sensitivity reasons, but the structure should be the same.
"Error from Query Engine - Out of range for integer: Infinity"
WITH accounts AS (
SELECT t.user_id
FROM table_a t
WHERE t.type like '%Something%'
),
CTE AS (
SELECT
st.x_user_id,
ad.name as client_name,
sum(case when st.score_type = 'Agility' then st.score_value else 0 end) as score,
st.obs_date,
ROW_NUMBER() OVER (PARTITION BY st.x_user_id,ad.name ORDER BY st.obs_date) AS rn
FROM client_scores st
LEFT JOIN account_details ad on ad.client_id = st.x_user_id
INNER JOIN accounts on st.x_user_id = accounts.user_id
--WHERE st.x_user_id IN (101011115,101012219)
WHERE st.obs_date >= '2020-05-18'
group by 1,2,4
)
SELECT
c1.x_user_id,
c1.client_name,
c1.score,
c1.obs_date,
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
FROM CTE c1
LEFT JOIN CTE c2 on c1.x_user_id = c2.x_user_id and c1.client_name = c2.client_name and c1.rn = c2.rn +2
I know the query works, because when I get rid of the first CTE and hard-code 2 ids into the where clause I commented out, it returns the data I want. But I also need it to run based on the first CTE, which has ~5k unique ids.
Here is a sample output if I try with 2 ids:
Based on the above number of rows returned per id, I would expect it to return 5000 * 3 rows = 15000.
What could be causing the out of range for integer error?
This line is likely your problem:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
When the value of c2.score is 0, 1.0/c2.score will be infinity and will not fit into an integer type that you’re trying to cast it into.
The reason it’s working for the two users in your example is that they don’t have a 0 value for c2.score.
You might be able to fix this by changing to:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / NULLIF(c2.score, 0)) * 100, 0) AS INT) AS score_diff
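The guarded expression can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module; the `scores` table and the column names `prev_score` and `cur_score` are assumptions standing in for c2.score and c1.score. Note that SQLite returns NULL rather than raising on division by zero, so this only illustrates the shape of the fix: NULLIF turns a zero denominator into NULL, and COALESCE maps the resulting NULL to 0 before the cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per user with the previous and current score.
conn.execute("CREATE TABLE scores (user_id INTEGER, prev_score REAL, cur_score REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 50.0, 75.0), (2, 0.0, 40.0), (3, 80.0, 60.0)],
)

query = """
SELECT user_id,
       CAST(COALESCE(((cur_score - prev_score) * 1.0
                      / NULLIF(prev_score, 0)) * 100, 0) AS INT) AS score_diff
FROM scores
ORDER BY user_id
"""
score_diffs = conn.execute(query).fetchall()
print(score_diffs)  # [(1, 50), (2, 0), (3, -25)]
```

User 2 has a previous score of 0, so the percentage change is undefined and the expression falls back to 0. Whether 0 is the right fallback, or whether NULL should be kept to mark "no baseline", depends on how the result is consumed.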

Use sql variables in query results

I have some of the following code:
Select p.CLIENT_NO,
s.CLIENT_NAME,
s.CLIENT_TYPE,
p.GL_CODE,
p.BATCH_KEY
From RU_POST p,
RU_ACCT a,
Ru_Ru s
Where
a.INTERNAL_KEY(+) = p.INTERNAL_KEY
And p.Batch_Key in
(Select Distinct (p1.BATCH_KEY)
From RU_POST p1
Where Abs(p1.AMOUNT) <> 0
And p1.POST_DATE Between To_Date('01-01-2015', 'dd-mm-yyyy') And
To_Date('01-01-2015', 'dd-mm-yyyy')
And p1.INTERNAL_KEY In ('367', '356'))
Now I want the values matched by p1.INTERNAL_KEY to appear in the query results, as if I had written SELECT p1.INTERNAL_KEY.
However, I understand this won't work as written. The result would show '367' for 100 rows and '356' for the other 100.
Could someone help me put this condition value inside my result?
Like this:
CLIENT_NO  CLIENT_SHORT  CLIENT_NAME  GL_CODE  INTERNAL_KEY
399999000  399999        A            4568     367
599999000  599999        B            4879     356
You can try changing the in subquery to a join, like this:
select distinct
p.client_no
, s.client_name
, s.client_type
, p.gl_code
, p1.internal_key
from ru_post p
join ru_post p1 on p1.batch_key = p.batch_key
left join ru_acct a on a.internal_key = p.internal_key
cross join ru_ru s
where abs(p1.amount) <> 0
and p1.post_date between date '2015-01-01' and date '2015-01-01'
and p1.internal_key in ('367', '356');
(Edited to match the updated question: ru_post is now left-joined to ru_acct.)
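The difference between the IN subquery and the join can be seen on a small example. This is a minimal sketch using Python's built-in sqlite3 module; the single `post` table and its rows are assumptions that loosely mirror RU_POST, and the other tables from the question are left out. The IN form can only filter, while the self-join makes the matching row's internal_key available to the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (client_no INTEGER, batch_key INTEGER, internal_key TEXT, amount REAL);
INSERT INTO post VALUES
  (399999000, 1, '367', 10.0),
  (399999001, 1, '111', 3.0),
  (599999000, 2, '356', 5.0),
  (799999000, 3, '999', 7.0);
""")

# IN subquery: selects rows in matching batches, but p1's columns are not visible outside.
in_rows = conn.execute("""
SELECT p.client_no
FROM post p
WHERE p.batch_key IN (SELECT p1.batch_key
                      FROM post p1
                      WHERE p1.amount <> 0
                        AND p1.internal_key IN ('367', '356'))
ORDER BY p.client_no
""").fetchall()

# Self-join: same batches, and p1.internal_key can now be returned.
join_rows = conn.execute("""
SELECT DISTINCT p.client_no, p1.internal_key
FROM post p
JOIN post p1 ON p1.batch_key = p.batch_key
WHERE p1.amount <> 0
  AND p1.internal_key IN ('367', '356')
ORDER BY p.client_no
""").fetchall()

print(in_rows)
print(join_rows)
```

Both forms return the same three clients here. If a batch contained rows for both '367' and '356', the join would return each client in that batch twice, once per key, which may or may not be what is wanted.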

Hive - Is there a way to further optimize a HiveQL query?

I have written a query to find the 10 busiest airports in the USA from March to April. It produces the desired output; however, I want to try to optimize it further.
Are there any HiveQL specific optimizations that can be applied to the query? Is GROUPING SETS applicable here? I'm new to Hive and for now this is the shortest query that I've come up with.
SELECT airports.airport, COUNT(Flights.FlightsNum) AS Total_Flights
FROM (
SELECT Origin AS Airport, FlightsNum
FROM flights_stats
WHERE (Cancelled = 0 AND Month IN (3,4))
UNION ALL
SELECT Dest AS Airport, FlightsNum
FROM flights_stats
WHERE (Cancelled = 0 AND Month IN (3,4))
) Flights
INNER JOIN airports ON (Flights.Airport = airports.iata AND airports.country = 'USA')
GROUP BY airports.airport
ORDER BY Total_Flights DESC
LIMIT 10;
The table columns are as following:
Airports
|iata|airport|city|state|country|
Flights_stats
|originAirport|destAirport|FlightsNum|Cancelled|Month|
Filter by airport (inner join) and aggregate before the UNION ALL to reduce the dataset passed to the final aggregation reducer. UNION ALL subqueries with joins should run in parallel and be faster than joining a bigger dataset after the UNION ALL.
SELECT f.airport, SUM(cnt) AS Total_Flights
FROM (
SELECT a.airport, COUNT(*) as cnt
FROM flights_stats f
INNER JOIN airports a ON f.Origin=a.iata AND a.country='USA'
WHERE Cancelled = 0 AND Month IN (3,4)
GROUP BY a.airport
UNION ALL
SELECT a.airport, COUNT(*) as cnt
FROM flights_stats f
INNER JOIN airports a ON f.Dest=a.iata AND a.country='USA'
WHERE Cancelled = 0 AND Month IN (3,4)
GROUP BY a.airport
) f
GROUP BY f.airport
ORDER BY Total_Flights DESC
LIMIT 10
;
Tune mapjoins and enable parallel execution:
set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory
Use Tez and vectorizing, tune mappers and reducers parallelism: https://stackoverflow.com/a/48487306/2700344
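Before relying on the rewritten query, it is worth confirming that it returns the same counts as the original. The sketch below does that on a few made-up rows using Python's built-in sqlite3 module rather than Hive; the sample data is an assumption, the FlightsNum column is omitted, and COUNT(*) is used in both queries. It checks only that the results match and says nothing about performance, which has to be measured on the real cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airports (iata TEXT, airport TEXT, country TEXT);
CREATE TABLE flights_stats (Origin TEXT, Dest TEXT, Cancelled INTEGER, Month INTEGER);
INSERT INTO airports VALUES
  ('JFK', 'Kennedy', 'USA'), ('LAX', 'Los Angeles', 'USA'), ('YYZ', 'Pearson', 'Canada');
INSERT INTO flights_stats VALUES
  ('JFK', 'LAX', 0, 3), ('LAX', 'JFK', 0, 4), ('JFK', 'YYZ', 0, 3),
  ('YYZ', 'JFK', 1, 3), ('LAX', 'JFK', 0, 5);
""")

# Original shape: union the raw rows, then join and aggregate once.
union_then_aggregate = conn.execute("""
SELECT a.airport, COUNT(*) AS total_flights
FROM (SELECT Origin AS code FROM flights_stats WHERE Cancelled = 0 AND Month IN (3, 4)
      UNION ALL
      SELECT Dest AS code FROM flights_stats WHERE Cancelled = 0 AND Month IN (3, 4)) f
JOIN airports a ON f.code = a.iata AND a.country = 'USA'
GROUP BY a.airport
ORDER BY total_flights DESC, a.airport
LIMIT 10
""").fetchall()

# Rewritten shape: join and aggregate each branch, then sum the partial counts.
aggregate_then_union = conn.execute("""
SELECT f.airport, SUM(f.cnt) AS total_flights
FROM (SELECT a.airport, COUNT(*) AS cnt
      FROM flights_stats s
      JOIN airports a ON s.Origin = a.iata AND a.country = 'USA'
      WHERE s.Cancelled = 0 AND s.Month IN (3, 4)
      GROUP BY a.airport
      UNION ALL
      SELECT a.airport, COUNT(*) AS cnt
      FROM flights_stats s
      JOIN airports a ON s.Dest = a.iata AND a.country = 'USA'
      WHERE s.Cancelled = 0 AND s.Month IN (3, 4)
      GROUP BY a.airport) f
GROUP BY f.airport
ORDER BY total_flights DESC, f.airport
LIMIT 10
""").fetchall()

print(union_then_aggregate)
print(aggregate_then_union)
```

Both queries count each non-cancelled March or April flight once at its origin and once at its destination, restricted to USA airports, so the two result lists should be identical.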
It might help if you do the aggregation before the union all:
SELECT a.airport, SUM(cnt) AS Total_Flights
FROM ((SELECT Origin AS Airport, COUNT(*) as cnt
FROM flights_stats
WHERE (Cancelled = 0 AND Month IN (3,4))
GROUP BY Origin
) UNION ALL
(SELECT Dest AS Airport, COUNT(*) as cnt
FROM flights_stats
WHERE Cancelled = 0 AND Month IN (3,4)
GROUP BY Dest
)
) f INNER JOIN
airports a
ON f.Airport = a.iata AND a.country = 'USA'
GROUP BY a.airport
ORDER BY Total_Flights DESC
LIMIT 10;
I don't think GROUPING SETS are applicable here because you are only grouping by one field.
From Apache Wiki:
"The GROUPING SETS clause in GROUP BY allows us to specify more than one GROUP BY option in the same record set."
You can test the following, but this is a case where a UNION may be better, so you really need to test it and report back:
SELECT airports.airport,
SUM(
CASE
WHEN T1.FlightsNum IS NOT NULL THEN 1
WHEN T2.FlightsNum IS NOT NULL THEN 1
ELSE 0
END
) AS Total_Flights
FROM airports
LEFT JOIN (SELECT Origin AS Airport, FlightsNum
FROM flights_stats
WHERE (Cancelled = 0 AND Month IN (3,4))) t1
on t1.Airport = airports.iata
LEFT JOIN (SELECT Dest AS Airport, FlightsNum
FROM flights_stats
WHERE (Cancelled = 0 AND Month IN (3,4))) t2
on t2.Airport = airports.iata
GROUP BY airports.airport
ORDER BY Total_Flights DESC

Sql query to multiply two column value to third column

I want to multiply the values of two columns and show the result in a third column. Here is my query:
select distinct pr.PSProjectId,sfa.CodePattern, case when sfqd.NCR IS null then 'blank' else sfqd.NCR end as NCR
,
case when sfqd.NCR !='blank' then
(Select DATEDIFF(minute,starttime,EndTime) from ShopFloorStatusDetail where ShopFloorActivityId=sfa.ShopFloorActivityId
and StatusId=8
)
else
DATEDIFF(MINUTE,sfs.ShiftStarTime,sfs.shiftendtime)
end as timediff,
(select COUNT(1) from ShopFloorEmployeeTime where ShopFloorShiftId=sfs.ShopFloorShiftId) as totalemployee
from ShopFloor sf
inner join Project pr on pr.ProjectId=sf.ProjectId
inner join ShopFloorActivity sfa on sf.ShopFloorId=sfa.ShopFloorId
inner join ShopFloorShift sfs on sfs.ShopFloorActivityId=sfa.ShopFloorActivityId
left join ShopFloorStatusDetail sfsd on sfsd.ShopFloorActivityId=sfs.ShopFloorActivityId
left join ShopFloorQCDetail sfqd on sfqd.ShopFloorStatusDetailId=sfsd.ShopFloorStatusDetailId
and sfqd.NCR is not null
where CAST(sfs.ShiftStarTime as DATE) between '2014/01/06' and '2014/01/07'
and output from this query is
PSProjectId  CodePattern  NCR    timediff  totalemployee
0000129495   3TMEU        blank  8         1
0000130583   3UA1P        blank  1         1
0000130583   3UA1P        blank  2090      2
Now I want to multiply the timediff and totalemployee columns and show the result in a new column.
How do I do this? Please help.
Just add a new column, multiplying the existing expressions:
case when sfqd.NCR !='blank'
then (Select DATEDIFF(minute,starttime,EndTime)
from ShopFloorStatusDetail
where ShopFloorActivityId=sfa.ShopFloorActivityId
and StatusId=8
)
else DATEDIFF(MINUTE,sfs.ShiftStarTime,sfs.shiftendtime)
end
*
(select COUNT(1)
from ShopFloorEmployeeTime
where ShopFloorShiftId=sfs.ShopFloorShiftId)
Alternatively, wrap the whole existing query in another query, and multiply the calculated columns:
select
PSProjectId,
CodePattern,
NCR,
timediff,
totalemployee,
timediff * totalemployee
from
( ...original query here... ) as q -- SQL Server requires an alias on a derived table
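The wrapping approach can be sketched end to end on a toy table. The example below uses Python's built-in sqlite3 module; the `shifts` table, its columns, and the output name `total_minutes` are assumptions for illustration, with a simple subtraction standing in for the DATEDIFF and COUNT subqueries of the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shifts (project_id TEXT, start_min INTEGER, end_min INTEGER, employees INTEGER);
INSERT INTO shifts VALUES
  ('0000129495', 0, 8, 1),
  ('0000130583', 10, 2100, 2);
""")

rows = conn.execute("""
SELECT q.project_id,
       q.timediff,
       q.totalemployee,
       -- The outer query can refer to the inner aliases by name.
       q.timediff * q.totalemployee AS total_minutes
FROM (SELECT project_id,
             end_min - start_min AS timediff,
             employees           AS totalemployee
      FROM shifts) AS q
ORDER BY q.project_id
""").fetchall()
print(rows)  # [('0000129495', 8, 1, 8), ('0000130583', 2090, 2, 4180)]
```

The benefit of the wrapper is that the two computed expressions are written once and reused by alias, instead of being repeated inside the multiplication.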