Counting null and missing values using RODBC package - sql

I am working to create a matrix of missingness for a SQL database consisting of 5 tables and nearly 10 years of data. I have established ODBC connectivity and am using the RODBC package in R as my working environment. I am trying to write a function that will output, for each table and each year, a count of rows, a count and percent of null values (values not present), and a count and percent of missing values (questions skipped/not answered). I have written the code below to get it working on one variable first, and plan to turn it into a function once it works. However, when I run this code, the counts for total, missing and null values are all the same, so the percents are of course 1. I am not getting any error messages. I am not sure where the issue lies, and it is important to distinguish between missing and null for this project. Any insight is much appreciated.
test1 <- sqlQuery(channel, "
SELECT [event_year] AS 'YEAR',
Count(*) AS 'TOTAL',
Count(CASE
WHEN mother_education_trendable = 'NA' THEN 1
ELSE 0
END) AS 'NULL_VAL',
Count(CASE
WHEN mother_education_trendable = -1 THEN 1
ELSE 0
END) AS 'MISS_VAL'
FROM [GA_CMH].[dbo].[births]
GROUP BY [event_year]
ORDER BY [event_year]
")
test1$nullpct<-with(test1, NULL_VAL/TOTAL)
test1$misspct<-with(test1, MISS_VAL/TOTAL)

The CASE expression inside your COUNT aggregate returns either 1 or 0, and COUNT counts both, because it counts every non-NULL value. That is why you get the same count as the total.
Zero is a value like any other to COUNT, so remove the ELSE part of the CASE expression. Without an ELSE, rows that do not match the condition yield NULL, which COUNT does not count.
SELECT [event_year] AS 'YEAR',
Count(*) AS 'TOTAL',
Count(CASE
WHEN mother_education_trendable = 'NA' THEN 1
END) AS 'NULL_VAL',
Count(CASE
WHEN mother_education_trendable = -1 THEN 1
END) AS 'MISS_VAL'
FROM [GA_CMH].[dbo].[births]
GROUP BY [event_year]
ORDER BY [event_year]
Or use the SUM aggregate instead of COUNT and keep the ELSE 0:
SELECT [event_year] AS 'YEAR',
Count(*) AS 'TOTAL',
SUM(CASE
WHEN mother_education_trendable = 'NA' THEN 1 ELSE 0
END) AS 'NULL_VAL',
SUM(CASE
WHEN mother_education_trendable = -1 THEN 1 ELSE 0
END) AS 'MISS_VAL'
FROM [GA_CMH].[dbo].[births]
GROUP BY [event_year]
ORDER BY [event_year]
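To see the difference on a toy table, here is a minimal sketch using Python's built-in sqlite3 module rather than the asker's SQL Server database. The `births` rows are made up for illustration, and the column is assumed to be text with the missing code stored as '-1', so that it can hold both 'NA' and the missing marker.

```python
import sqlite3

# In-memory stand-in for the births table; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE births (event_year INTEGER, mother_education_trendable TEXT)"
)
conn.executemany(
    "INSERT INTO births VALUES (?, ?)",
    [
        (2010, "NA"), (2010, "-1"), (2010, "3"), (2010, "4"),
        (2011, "NA"), (2011, "NA"), (2011, "2"),
    ],
)

query = """
SELECT event_year,
       COUNT(*) AS total,
       -- ELSE 0 yields a non-NULL value for every row, so COUNT counts them all
       COUNT(CASE WHEN mother_education_trendable = 'NA' THEN 1 ELSE 0 END) AS with_else,
       -- no ELSE: non-matching rows yield NULL and are skipped by COUNT
       COUNT(CASE WHEN mother_education_trendable = 'NA' THEN 1 END) AS null_val,
       -- SUM adds the 0s and 1s, so ELSE 0 is harmless here
       SUM(CASE WHEN mother_education_trendable = '-1' THEN 1 ELSE 0 END) AS miss_val
FROM births
GROUP BY event_year
ORDER BY event_year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The `with_else` column always equals `total`, which is the behaviour described in the question, while `null_val` and `miss_val` count only the matching rows.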

Related

PostgreSQL WHERE clause to filter out results with 0 causes a syntax error

I'm trying to produce a SQL query where I have my IDs on the first column, then sum of page hits of accounts in column 2 and page hits of payments in column 3. The below code works, except for one point. I want to exclude any rows which have 0 in accounts column. When I add in the WHERE clause
SELECT
EVAR3 AS MYID
sum(case when web.Page.name = 'Accounts:Overview' then 1 else 0 end) as accounts
sum(case when web.Page.name = 'Payment:Overview' then 1 else 0 end) as payments
FROM mytable
WHERE
timestamp >= to_timestamp('2021-09-01') AND timestamp <= to_timestamp('2021-09-02')
AND
(sum(case when web.Page.name = 'Accounts:Overview' then 1 else 0 end) > 0)
GROUP BY EVAR3
ORDER BY EVAR3 ASC
This gives me an invalid expression error on my WHERE clause, saying aggregate expressions are not valid in the WHERE clause. When I change WHERE to HAVING I get a syntax error saying it expected EOF rather than GROUP.
How do I correctly implement a filter to remove results with 0 in accounts to otherwise working code?
Since the condition uses an aggregate function, it belongs in HAVING, and HAVING must appear after the GROUP BY. The clause order is SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY. The row-level timestamp filter stays in WHERE:
SELECT
EVAR3 AS MYID,
sum(case when web.Page.name = 'Accounts:Overview' then 1 else 0 end) as accounts,
sum(case when web.Page.name = 'Payment:Overview' then 1 else 0 end) as payments
FROM mytable
WHERE
timestamp >= to_timestamp('2021-09-01') AND timestamp <= to_timestamp('2021-09-02')
GROUP BY EVAR3
HAVING (sum(case when web.Page.name = 'Accounts:Overview' then 1 else 0 end) > 0)
ORDER BY EVAR3 ASC
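The clause order can be tried out with a small sketch in Python's built-in sqlite3 module (not PostgreSQL). The table `hits`, its columns `page_name` and `hit_day`, and all rows are hypothetical stand-ins for the asker's data.

```python
import sqlite3

# Hypothetical page-hit table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (evar3 TEXT, page_name TEXT, hit_day TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("a", "Accounts:Overview", "2021-09-01"),
        ("a", "Payment:Overview", "2021-09-01"),
        ("a", "Accounts:Overview", "2021-09-02"),
        ("b", "Payment:Overview", "2021-09-01"),
        ("b", "Accounts:Overview", "2021-09-05"),  # outside the date window
        ("c", "Accounts:Overview", "2021-09-02"),
    ],
)

query = """
SELECT evar3 AS myid,
       SUM(CASE WHEN page_name = 'Accounts:Overview' THEN 1 ELSE 0 END) AS accounts,
       SUM(CASE WHEN page_name = 'Payment:Overview' THEN 1 ELSE 0 END) AS payments
FROM hits
WHERE hit_day >= '2021-09-01' AND hit_day <= '2021-09-02'  -- filters rows before grouping
GROUP BY evar3
HAVING SUM(CASE WHEN page_name = 'Accounts:Overview' THEN 1 ELSE 0 END) > 0  -- filters groups
ORDER BY evar3 ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ID "b" drops out: its only Accounts hit falls outside the WHERE window, so its group sum is 0 and HAVING removes it.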

Cleaning "SUM" Query

I have a bit of SQL code that looks similar to this:
select sum(case when latitude = '0' then 1 else 0 end) as count_zero,
sum(case when latitude is NULL then 1 else 0 end) as count_null,
sum((case when latitude = '0' then 1 else 0 end) +
(case when latitude is NULL then 1 else 0 end)
) as total_zero,
count(latitude) as count_not_nulls,
count(*) as total
from sites_database
Is there a "cleaner" way to write this same query? I have tried using the "sum" expression with the column aliases, something like:
Sum(count_zero + count_null) as total_null
But this doesn't seem to work for some reason.
You could use COUNT instead of SUM:
SELECT
COUNT(CASE WHEN latitude = '0' THEN 1 END) As count_zero,
COUNT(CASE WHEN latitude IS NULL THEN 1 END) AS count_null,
COUNT(CASE WHEN COALESCE(latitude, '0') = '0' THEN 1 END) AS total_zero,
COUNT(latitude) As count_not_nulls,
COUNT(*) as total
FROM sites_database;
Using COUNT here saves a bit of coding, because we don't have to provide an explicit ELSE condition (the default ELSE is NULL, which just isn't counted at all). Also note that for the total_zero conditional sum, I used COALESCE to merge the two counts into just one.
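As a quick check of the COUNT version, here is a sketch run against Python's built-in sqlite3 module with a handful of invented latitude values (stored as text, as the comparison with '0' in the question suggests). Note also that in standard SQL a column alias is generally not visible to other expressions in the same SELECT list, which is the likely reason the Sum(count_zero + count_null) attempt fails.

```python
import sqlite3

# Invented sample rows: two zeros, two NULLs, two real latitudes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites_database (latitude TEXT)")
conn.executemany(
    "INSERT INTO sites_database VALUES (?)",
    [("0",), (None,), ("51.5",), ("0",), (None,), ("40.7",)],
)

query = """
SELECT COUNT(CASE WHEN latitude = '0' THEN 1 END) AS count_zero,
       COUNT(CASE WHEN latitude IS NULL THEN 1 END) AS count_null,
       -- COALESCE maps NULL to '0', so one test covers both cases
       COUNT(CASE WHEN COALESCE(latitude, '0') = '0' THEN 1 END) AS total_zero,
       COUNT(latitude) AS count_not_nulls,
       COUNT(*) AS total
FROM sites_database
"""
row = conn.execute(query).fetchone()
print(row)
```

Here `total_zero` equals `count_zero + count_null`, and `count_not_nulls + count_null` equals `total`.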

SQL select grouping and subtract

I have a table named source table with data like this:
And I want a query that subtracts the rows with status minus from the rows with status plus, grouped by product name, to get a result like this:
How can I do that in a SQL query? Thanks!
Group by the product and then use a conditional SUM():
select product,
sum(case when status = 'plus' then total else 0 end) -
sum(case when status = 'minus' then total else 0 end) as total,
sum(case when status = 'plus' then amount else 0 end) -
sum(case when status = 'minus' then amount else 0 end) as amount
from your_table
group by product
There is another method using join, which works for the particular data you have provided (which has one "plus" and one "minus" row per product):
select tplus.product, (tplus.total - tminus.total) as total,
(tplus.amount - tminus.amount) as amount
from t tplus join
t tminus
on tplus.product = tminus.product and
tplus.status = 'plus' and
tminus.status = 'minus';
Both this and the aggregation query work well for the data you have provided. In other words, there are multiple ways to solve this problem (each has its strengths).
You can also negate the minus rows inside a single SUM:
select product , sum (case when [status] = 'minus' then -Total else Total end) as Total
, sum (case when [status] = 'minus' then -Amount else Amount end) as SumAmount
from yourproduct
group by product
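The approaches above can be compared with a short sketch in Python's built-in sqlite3 module. The table name `source_table` and its rows are assumptions, since the original sample data is not shown; the sketch assumes exactly one "plus" and one "minus" row per product, which is what the join variant requires.

```python
import sqlite3

# Assumed layout: one 'plus' and one 'minus' row per product (invented values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_table (product TEXT, status TEXT, total INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?, ?)",
    [
        ("apple", "plus", 10, 100), ("apple", "minus", 3, 30),
        ("pear", "plus", 5, 50), ("pear", "minus", 1, 10),
    ],
)

# Conditional aggregation: negate the 'minus' rows, then sum per product.
signed_sum = conn.execute("""
SELECT product,
       SUM(CASE WHEN status = 'minus' THEN -total ELSE total END) AS total,
       SUM(CASE WHEN status = 'minus' THEN -amount ELSE amount END) AS amount
FROM source_table
GROUP BY product
ORDER BY product
""").fetchall()

# Self-join: pair each product's 'plus' row with its 'minus' row.
joined = conn.execute("""
SELECT tplus.product,
       tplus.total - tminus.total AS total,
       tplus.amount - tminus.amount AS amount
FROM source_table tplus
JOIN source_table tminus
  ON tplus.product = tminus.product
 AND tplus.status = 'plus'
 AND tminus.status = 'minus'
ORDER BY tplus.product
""").fetchall()

print(signed_sum)
print(joined)
```

Both queries agree on this data. If a product could have several rows per status, or only one of the two, the aggregation form is the safer choice because the join would duplicate or drop rows.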

SQL percentage with rows same table with different where condition

I want to do a query like:
select
count(asterisk) where acción='a'/count(asterisk) where acción='b' * 100
from
same_table
grouped by day
but I don't want use subquery, is it possible with joins?
I'm not sure the syntax is correct, but you can use something like this:
SELECT day,
SUM(CASE WHEN "acción" = 'a' THEN 1 ELSE 0 END) AS SUM_A,
SUM(CASE WHEN "acción" = 'b' THEN 1 ELSE 0 END) AS SUM_B,
100.0 * SUM(CASE WHEN "acción" = 'a' THEN 1 ELSE 0 END) / SUM(CASE WHEN "acción" = 'b' THEN 1 ELSE 0 END) AS result
FROM your_table
GROUP BY day
The concept is to sum the values that you need instead of counting them, so no subquery or join is required.
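A minimal sketch of this idea, using Python's built-in sqlite3 module: the table `your_table` is filled with invented rows, and the column is named `action` here as an ASCII stand-in for the asker's "acción". Multiplying by 100.0 before dividing is a choice made in this sketch to avoid integer division, which would otherwise truncate the ratio in many databases.

```python
import sqlite3

# Invented rows: day 1 has three 'a' and two 'b'; day 2 has one 'a' and four 'b'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (day TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("2021-01-01", "a")] * 3 + [("2021-01-01", "b")] * 2
    + [("2021-01-02", "a")] * 1 + [("2021-01-02", "b")] * 4,
)

query = """
SELECT day,
       SUM(CASE WHEN action = 'a' THEN 1 ELSE 0 END) AS sum_a,
       SUM(CASE WHEN action = 'b' THEN 1 ELSE 0 END) AS sum_b,
       -- 100.0 forces floating-point arithmetic before the division
       100.0 * SUM(CASE WHEN action = 'a' THEN 1 ELSE 0 END)
             / SUM(CASE WHEN action = 'b' THEN 1 ELSE 0 END) AS result
FROM your_table
GROUP BY day
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If a day can have no 'b' rows at all, wrapping the denominator in NULLIF(..., 0) is one way to get NULL instead of a division-by-zero error on engines that raise one.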

Query from same table to extract different data

In a single table I have 3 columns. The first defines a sector, the second a count and the third an amount. I need to extract 5 columns of data in the following manner. The first column is the sector. The second and third contain the summed count and amount for rows where amount is less than count, and the fourth and fifth contain the summed count and amount for rows where amount is more than count, within each sector. How should my query look?
Sample Data - 4 row data for sector one.
1,23,44
1,20,15
1,50,45
1,30,20
Result should be
1,100,80,23,44
You can do this with GROUP BY and the SUM() aggregate function combined with CASE expressions, like this:
SELECT sector,
SUM(case when count > amount then count else 0 end) as count1,
SUM(case when amount < count then amount else 0 end) as amount1,
SUM(case when count < amount then count else 0 end) as count2,
SUM(case when amount > count then amount else 0 end) as amount2
FROM mytable
GROUP BY sector;
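The query can be checked against the sample data from the question with a short sketch in Python's built-in sqlite3 module. The column name is quoted as "count" in this sketch only to keep it clearly separate from the COUNT function; the table name `mytable` comes from the answer.

```python
import sqlite3

# The four sample rows for sector 1 given in the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (sector INTEGER, "count" INTEGER, amount INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 23, 44), (1, 20, 15), (1, 50, 45), (1, 30, 20)],
)

query = """
SELECT sector,
       SUM(CASE WHEN "count" > amount THEN "count" ELSE 0 END) AS count1,
       SUM(CASE WHEN amount < "count" THEN amount ELSE 0 END) AS amount1,
       SUM(CASE WHEN "count" < amount THEN "count" ELSE 0 END) AS count2,
       SUM(CASE WHEN amount > "count" THEN amount ELSE 0 END) AS amount2
FROM mytable
GROUP BY sector
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This reproduces the expected row 1,100,80,23,44. Rows where count equals amount fall into neither pair of columns.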