How can I avoid zero values in an aggregate? - sql

I have a query which is pulling back data correctly, but about a quarter of the lines of data are unnecessary because they have 0 in the primary data field. Here is the line I wrote to create that primary data field. How can I exclude any case where the output value for amort_FY2020 is 0?
sum(case when f.sales_revenue_period_calendar_sid in ('20200101',
'20200201',
'20200301',
'20200401',
'20200501',
'20200601',
'20200701',
'20200801',
'20200901',
'20201001',
'20201101',
'20201201')
then f.sales_revenue_before_tax_usd_amount
else 0 end) as amort_FY2020

Depending on your database, you can use:
having amort_FY2020 <> 0
Or may may need to repeat the expression:
sum(case when f.sales_revenue_period_calendar_sid in ('20200101','20200201','20200301','20200401','20200501','20200601','20200701','20200801','20200901','20201001','20201101','20201201')
then f.sales_revenue_before_tax_usd_amount else 0
end) <> 0

Related

Hive query with except conditions

I am trying to build a hive query that does only the below features or a combination of these features. For example, the features include
name = "summary"
name = "details"
name1 = "vehicle stats"
Basically, the query should exclude all the other features in name and name1.
I am quite new to hive. In sql, i know this can be done using except keyword. Just wondering whether there is some functions that can achieve the same.
Thanks very much !!
If I understand correctly, I approach this using group by and having:
select ?
from t
group by ?
having sum(case when name = 'summary' then 1 else 0 end) > 0 and
sum(case when name = 'details' then 1 else 0 end) > 0 and
sum(case when name1 = 'vehicle_stats' then 1 else 0 end) > 0;
The ? is for the column that you want the summary of.

Trouble counting null and missing values (and differentiating between the two) using RODBC package

I am working to create a a matrix of missingness for a SQL database consisting of 5 tables and nearly 10 years of data. I have established ODBC connectivity and am using the RODBC package in R as my working environment. I am trying to write a function that will output a count of rows for each year for each table, a count and percent of null values (values not present) in a given year for a given table, and a count and percent of missing (questions skipped/not answered) values for a given table. I am working with the code below, trying to get it to work on one variable then turning it into a function once it works. However, when I run this code(see below), it appears to not be working, and I believe the issue lies with assigning an integer value to the character for null, NA. I am getting this message when trying to list vars in the function:
Error in as.environment(pos) : no item called "22018 245 [Microsoft][ODBC SQL Server Driver][SQL Server]Conversion failed when converting the varchar value 'NA' to data type int." on the search list.
Also, when I try to find the environment for the function, R returns NULL. I do not necessarily want to assign a new value to the already existent variable, and I new to SQL, but I am trying to do something along these lines If X = 'NA' then Y = 1 else 0. I get the following error message when I try to run the final 2 lines creating the percent vars:
Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'character'
Any insight?
test1 <- sqlQuery(channel, "select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = 'NA' THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]")
test1$nullpct<-with(test1, NULL_VAL/TOTAL)
test1$misspct<-with(test1, MISS_VAL/TOTAL)
I believe the data type of your column MOTHER_EDUCATION_TRENDABLE is an integer, if so, try:
select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE IS NULL THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]

How to add a column on fly ?

I am facing different kind of problem. In select query I want to add a temporary column on fly based on other columns value.
I have 2 columns
IsOpeningClosingDateToo (tinyint),
HearingDate Date
Now I want to check that if IsOpeningClosingDate = 1 then
Select HearingDate, HearingDate as 'OpeningDate'
If IsOpeningClosingDate= 2
Select HearingDate, HearingDate as 'ClosingDate'
I have tried to do this but failed:
SELECT
,[HearingDate]
,CASE [IsOpeningClosingDate]
when 1 then [HearingDate] as OpeningDate
When 0 then [HearingDate] as ClosingDate
end as 'test'
]
FROM [LitMS_MCP].[dbo].[CaseHearings]
I would suggest returning three columns. Then you can fetch the values in on the application side:
SELECT HearingDate,
(CASE WHEN IsOpeningClosingDate = 1 THEN HearingDate END) as OpeningDate,
(CASE WHEN IsOpeningClosingDate = 0 THEN HearingDate END) as ClosingDate
FROM [LitMS_MCP].[dbo].[CaseHearings];
Alternatively, you could just fetch HearingDate and IsOpeningClosingDate and do the comparison in Python.
The important point is that the columns in a SQL query are fixed by the SELECT. You cannot vary the names or types of the columns conditionally within the query.

Multiple columns within a single CASE statement

I'm sure this has be covered many times so please pardon my repeat. I have a query that works but currently has 6 CASE statements within one select. Someone mentioned that it would be best optimized by putting all my WHEN conditions within a single CASE. However, I'm unable to achieve this
select right(RTRIM(region),5) as cell_id,
sum(CASE WHEN LEFT(cparty,3) in ('999','998','997') THEN chargeduration/60 else 0 END) AS OnNet_Minutes,
sum(CASE WHEN LEFT(cparty,3) in ('996','995') THEN chargeduration/60 else 0 END) AS OffNet_C_Minutes,
sum(CASE WHEN LEFT(cparty,3) in ('994','993','992') THEN chargeduration/60 else 0 END) AS OffNet_A_Minutes,
sum(CASE WHEN LEFT(cparty,3) in ('991','990') THEN chargeduration/60 else 0 END) AS OffNet_S_Minutes,
sum(CASE WHEN LEFT(cparty,2) = '00' THEN chargeduration/60 else 0 END) AS OffNet_T_Minutes,
sum(CASE WHEN len(cparty) < 6 and LEFT(cparty,1) <> 0 THEN chargeduration/60 else 0 END) AS SC_Minutes
from August.dbo.cdr20130818
where CHARGEDURATION > 0 and ISNULL(region,'''')<>'''' and LEN(region) > 5
group by right(RTRIM(region),5)
order by right(RTRIM(region),5) asc
In your case, you can't put them all into one CASE, since the results all go into different columns of the select.
BTW, you should remove your ISNULL(region, '''') <> '''' condition, as it's redundant when paired with the LEN(region) > 5 condition. (When region is null, then LEN(region) is also null, and NULL > 5 is false.)
I think you have it right, six different SUM()'s that each have meaning on their own.
If all of your criteria were in the same CASE statement you'd lose detail, you'd be returning the SUM() of your currently separate statements combined into one.
Combining redundant criteria in the WHERE clause can clean up a CASE statement, but you don't have anything completely redundant here.

SQL COUNT specific unit IF another field <> 0

I have a query that has a select statement that contains the following:
,COUNT(u.[Unit])
,up.[Number_Of_Stops]
I need to only count the units where number of stops <> 0. This has more details in the query so I can't just say WHERE number_of_stops <> 0. It has to be within the select statement.
Thanks
Try:
SUM(CASE WHEN up.[Number_Of_Stops] != 0 THEN 1 ELSE 0 END) AS countWhereNumStopsNotZero
(Edit: original answer said "COUNT" not "SUM")