Hive summary function inside case statement

Hive summary function inside case statement - sql

I am trying to write a simple Hive query:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) then 1 else 0)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 where p_wk_end_d = 2014-06-28;
Here pit_sls_q and pot_sls_q both are columns in the Hive table and I want proportion of records which have pot_sls_q more than 2 times average of pit_sls_q. However I get error:
FAILED: SemanticException [Error 10128]: Line 1:95 Not yet supported place for UDAF 'avg'
To fool around I even tried using some window function:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) over (partition by dept_i,class_i) then 1 else 0 end)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 and p_wk_end_d = '2014-06-28';
which is fine considering the fact filtering or partitioning the data on same condition is "same" data essentially but even with this I get error:
FAILED: SemanticException [Error 10002]: Line 1:36 Invalid column reference 'avg': (possible column names are: p_wk_end_d, dept_i, class_i, item_i, pit_sls_q, pot_sls_q)
please suggest right way of doing this.

You are using AVG inside SUM which won't work (along with other syntax errors).
Try analytic AVG OVER () this:
select sum(case when pot_sls_q > 2 * avg_pit_sls_q then 1 else 0 end) / count(*)
from (
select t.*,
avg(pit_sls_q) over () avg_pit_sls_q
from prd_inv_fnd.item_pot_sls t
where dept_i = 43
and class_i = 3
and p_wk_end_d = '2014-06-28'
) t;

Related

Presto Weighted Moving Average Syntax Error

I'm trying to run the weighted moving average Silota query with similar data in a Presto database but am encountering an error. The same query in the Redshift database has no issues, however in Presto I receive a syntax error:
Query failed (#20220505_230258_04927_5xpwi):
line 14:14: Column 't2.row_number' cannot be resolved io.prestosql.spi.PrestoException:
line 14:14: Column 't2.row_number' cannot be resolved.
The data is the same in both databases, why does the query run in Redshift while Presto throws the error?
WITH t AS
(select date_trunc('month',mql_date) date, avg(mqls) mqls, row_number() over ()
from marketing.campaign
WHERE date_trunc('month',mql_date) > date('2021-12-31')
GROUP BY 1)
select t.date, avg(t.mqls),
sum(case
when t.row_number - t2.row_number = 0 then 0.4 * t2.mqls
when t.row_number - t2.row_number = 1 then 0.3 * t2.mqls
when t.row_number - t2.row_number = 2 then 0.2 * t2.mqls
when t.row_number - t2.row_number = 3 then 0.1 * t2.mqls
end) weighted_avg
from t
join t t2 on t2.row_number between t.row_number - 3 and t.row_number
group by 1
order by 1

I suspect it is because your SQL assumes that the result of the row_number() window function will be called "row_number". This is true in Redshift but other databases may infer a different name onto it. You should alias this to some defined name such as "rn".
Also you have no "order by" clause in your row_number() function which will make the row numbers unpredictable and possibly varying between invocations.

Trouble counting null and missing values (and differentiating between the two) using RODBC package

I am working to create a a matrix of missingness for a SQL database consisting of 5 tables and nearly 10 years of data. I have established ODBC connectivity and am using the RODBC package in R as my working environment. I am trying to write a function that will output a count of rows for each year for each table, a count and percent of null values (values not present) in a given year for a given table, and a count and percent of missing (questions skipped/not answered) values for a given table. I am working with the code below, trying to get it to work on one variable then turning it into a function once it works. However, when I run this code(see below), it appears to not be working, and I believe the issue lies with assigning an integer value to the character for null, NA. I am getting this message when trying to list vars in the function:
Error in as.environment(pos) : no item called "22018 245 [Microsoft][ODBC SQL Server Driver][SQL Server]Conversion failed when converting the varchar value 'NA' to data type int." on the search list.
Also, when I try to find the environment for the function, R returns NULL. I do not necessarily want to assign a new value to the already existent variable, and I new to SQL, but I am trying to do something along these lines If X = 'NA' then Y = 1 else 0. I get the following error message when I try to run the final 2 lines creating the percent vars:
Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'character'
Any insight?
test1 <- sqlQuery(channel, "select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = 'NA' THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]")
test1$nullpct<-with(test1, NULL_VAL/TOTAL)
test1$misspct<-with(test1, MISS_VAL/TOTAL)

I believe the data type of your column MOTHER_EDUCATION_TRENDABLE is an integer, if so, try:
select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE IS NULL THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]

How to add a column on fly ?

I am facing different kind of problem. In select query I want to add a temporary column on fly based on other columns value.
I have 2 columns
IsOpeningClosingDateToo (tinyint),
HearingDate Date
Now I want to check that if IsOpeningClosingDate = 1 then
Select HearingDate, HearingDate as 'OpeningDate'
If IsOpeningClosingDate= 2
Select HearingDate, HearingDate as 'ClosingDate'
I have tried to do this but failed:
SELECT
,[HearingDate]
,CASE [IsOpeningClosingDate]
when 1 then [HearingDate] as OpeningDate
When 0 then [HearingDate] as ClosingDate
end as 'test'
]
FROM [LitMS_MCP].[dbo].[CaseHearings]

I would suggest returning three columns. Then you can fetch the values in on the application side:
SELECT HearingDate,
(CASE WHEN IsOpeningClosingDate = 1 THEN HearingDate END) as OpeningDate,
(CASE WHEN IsOpeningClosingDate = 0 THEN HearingDate END) as ClosingDate
FROM [LitMS_MCP].[dbo].[CaseHearings];
Alternatively, you could just fetch HearingDate and IsOpeningClosingDate and do the comparison in Python.
The important point is that the columns in a SQL query are fixed by the SELECT. You cannot vary the names or types of the columns conditionally within the query.

Cypher - Issue with Where + Aggregate + With

I am trying to execute the following Cypher query
START b=node:customer_idx(ID = 'ABCD')
MATCH p = b-[r1:LIKES]->stuff, someone_else_too-[r2:LIKES]->stuff
with b,someone_else_too, count(*) as matchingstuffcount
where matchingstuffcount > 1
//with b, someone_else_too, matchingstuffcount, CASE WHEN ...that has r1, r2... END as SortIndex
return someone_else_too, SortIndex
order by SortIndex
The above query works fine but moment I uncomment lower "with" I get following errors
Unknown identifier `b`.
Unknown identifier `someone_else_too`.
Unknown identifier `matchingstuffcount`.
Unknown identifier `r1`.
Unknown identifier `r2`.
To get around, I include r1 and r2 in the top with - "with b,someone_else_too, count(*) as matchingstuffcount". to "with b, r1, r2, someone_else_too, count(*) as matchingstuffcount". This messes my count(*) > 1 condition as count(*) does not aggregate properly.
Any workarounds / suggestions to filter out count(*) > 1 while making sure Case When can also be executed ?

Under neo4j 2.0 via console.neo4j.org I was able to get the following query to work. I tried to mimic the constructs you had, namely the WITH/WHERE/WITH/RETURN sequence. (If I missed something, please let me know!)
START n=node:node_auto_index(name='Neo')
MATCH n-[r:KNOWS|LOVES*]->m
WITH n,COUNT(r) AS cnt,m
WHERE cnt >1
WITH n, cnt, m, CASE WHEN m.name?='Cypher' THEN 1 ELSE 0 END AS isCypher
RETURN n AS Neo, cnt, m, isCypher
ORDER BY cnt
Update it or change it here.

What's wrong with this DB2 SQL Query?

SELECT
getOrgName(BC.ManageOrgID),
COUNT(CASE WHEN (EXISTS (SELECT FO.OBJECTNO FROM FLOW_OBJECT FO WHERE FO.ObjectNo=CR.SerialNo) AND NVL(CR.FinallyResult,'') IN ('01','02','03','04','05')) THEN BC.ManageOrgID ELSE NULL END)
FROM
BUSINESS_CONTRACT BC,
CLASSIFY_RECORD CR
WHERE
CR.ObjectType='BusinessContract'
AND CR.ObjectNo=BC.SerialNo
GROUP BY BC.ManageOrgID, CR.SerialNo, CR.FinallyResult
The error message I receive is:
11:01:32 [SELECT - 0 row(s), 0.000 secs] [Error Code: -112, SQL State: 42607] DB2 SQL Error: SQLCODE=-112, SQLSTATE=42607, SQLERRMC=SYSIBM.COUNT, DRIVER=3.57.82
... 1 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.000/0.000 sec [0 successful, 0 warnings, 1 errors]

"The operand of the column function name (in your case, count) includes a column function, a scalar fullselect, or a subquery." DB2 doesn't allow this. See the documentation on SQL112 for more.
I'm not really sure how to fix your query but perhaps you can try the HAVING clause after GROUP BY.

Here is one way to rework the query:
SELECT
getOrgName( BC.ManageOrgID ),
COUNT( FO.ObjectNo ) AS objectcount
FROM BUSINESS_CONTRACT BC
INNER JOIN CLASSIFY_RECORD CR
ON CR.ObjectNo = BC.SerialNo
AND CR.ObjectType = 'BusinessContract'
AND CR.FinallyResult IN ( '01','02','03','04','05' )
INNER JOIN FLOW_OBJECT FO
ON FO.ObjectNo = CR.SerialNo
GROUP BY BC.ManageOrgID
;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Hive summary function inside case statement - sql

Related

Presto Weighted Moving Average Syntax Error

Trouble counting null and missing values (and differentiating between the two) using RODBC package

How to add a column on fly ?

Cypher - Issue with Where + Aggregate + With

What's wrong with this DB2 SQL Query?

Categories

Resources