Trouble counting null and missing values (and differentiating between the two) using RODBC package - sql

I am working to create a matrix of missingness for a SQL database consisting of 5 tables and nearly 10 years of data. I have established ODBC connectivity and am using the RODBC package in R as my working environment. I am trying to write a function that will output a count of rows for each year for each table, a count and percent of null values (values not present) in a given year for a given table, and a count and percent of missing values (questions skipped/not answered) for a given table. I am working with the code below, trying to get it to work on one variable and then turn it into a function once it works. However, when I run this code (see below), it does not appear to work, and I believe the issue lies in comparing an integer column with the character value used for null, 'NA'. I get this message when I try to list the variables in the result:
Error in as.environment(pos) : no item called "22018 245 [Microsoft][ODBC SQL Server Driver][SQL Server]Conversion failed when converting the varchar value 'NA' to data type int." on the search list.
Also, when I try to find the environment for the function, R returns NULL. I do not necessarily want to assign a new value to the existing variable, and I am new to SQL, but I am trying to do something along these lines: if X = 'NA' then Y = 1, else 0. I get the following error message when I try to run the final 2 lines creating the percent vars:
Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'character'
Any insight?
test1 <- sqlQuery(channel, "select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = 'NA' THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]")
test1$nullpct<-with(test1, NULL_VAL/TOTAL)
test1$misspct<-with(test1, MISS_VAL/TOTAL)

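I believe the data type of your column MOTHER_EDUCATION_TRENDABLE is an integer, so comparing it with the string 'NA' makes SQL Server try to convert 'NA' to int, which fails. If so, test for NULL with IS NULL instead: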
select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE IS NULL THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]
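To check the NULL versus -1 distinction on a small scale, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name `births` and all rows are made up for illustration; only the column names follow the question, and the percentages are computed in Python rather than in R.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE births (event_year INTEGER, mother_education_trendable INTEGER)"
)
# Made-up rows: None is stored as NULL (value not present),
# -1 marks a skipped question.
rows = [(2010, 3), (2010, None), (2010, -1), (2010, -1), (2011, None), (2011, 5)]
conn.executemany("INSERT INTO births VALUES (?, ?)", rows)

query = """
SELECT event_year AS year,
       COUNT(*) AS total,
       SUM(CASE WHEN mother_education_trendable IS NULL THEN 1 ELSE 0 END) AS null_val,
       SUM(CASE WHEN mother_education_trendable = -1 THEN 1 ELSE 0 END) AS miss_val
FROM births
GROUP BY event_year
ORDER BY event_year
"""

summary = []
for year, total, null_val, miss_val in conn.execute(query):
    summary.append({
        "year": year,
        "total": total,
        "null_val": null_val,
        "miss_val": miss_val,
        "nullpct": null_val / total,   # share of NULL values in that year
        "misspct": miss_val / total,   # share of skipped answers in that year
    })

for row in summary:
    print(row)
```

A comparison such as `= -1` never matches a NULL, so the two counts do not overlap.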

Related

searching for customers where they have two transactions of a certain value

I am trying to run a script in Athena that pulls back customers who have made purchases of two specified values (14.45 and 17.45). I thought I would make a column for each value and, once the result is downloaded into Excel, filter for both columns > 0, but my code isn't working. Any help is appreciated.
select order_customer_id,
sum(invoice_total_price= cast('14.45' as decimal(20,2))) > 0,
sum(invoice_total_price = cast('17.45' as decimal(20,2))) > 0
from orders
where year_month_day between '2022-01-10' and '2022-03-14'
group by order_customer_id
I get this error when I run it:
Unexpected parameters (boolean) for function sum. Expected: sum(double) , sum(real) , sum(bigint) , sum(interval day to second) , sum(interval year to month) , sum(decimal(p,s))
I did the cast within the two sum columns because invoice_total_price is stored as a decimal.
You can use count_if. The cast from a string is probably not needed either:
select order_customer_id,
count_if(invoice_total_price = 14.45) > 0 has_14,
count_if(invoice_total_price = 17.45) > 0 has_17
from orders
where year_month_day between '2022-01-10' and '2022-03-14'
group by order_customer_id
This will give you a table with 3 columns: the customer id and one boolean flag per price. If you don't need the flags in the output, consider moving the checks into a HAVING clause:
select order_customer_id
from orders
where year_month_day between '2022-01-10' and '2022-03-14'
group by order_customer_id
having count_if(invoice_total_price = 14.45) > 0
and count_if(invoice_total_price = 17.45) > 0
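The HAVING variant can be tried locally with a small sketch. SQLite has no count_if, so this assumes the portable SUM(CASE ...) form as a substitute; the rows are invented, and the prices are stored as REAL here rather than as decimal(20,2).

```python
import sqlite3

# In-memory SQLite table standing in for the Athena "orders" table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_customer_id INTEGER, invoice_total_price REAL, year_month_day TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 14.45, "2022-01-15"),
    (1, 17.45, "2022-02-01"),   # customer 1 has both prices in range
    (2, 14.45, "2022-02-10"),   # customer 2 only has 14.45
    (3, 17.45, "2022-03-01"),
    (3, 14.45, "2022-04-01"),   # outside the date range, so it does not count
])

# SUM(CASE ...) plays the role of count_if, which SQLite does not provide.
query = """
SELECT order_customer_id
FROM orders
WHERE year_month_day BETWEEN '2022-01-10' AND '2022-03-14'
GROUP BY order_customer_id
HAVING SUM(CASE WHEN invoice_total_price = 14.45 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN invoice_total_price = 17.45 THEN 1 ELSE 0 END) > 0
ORDER BY order_customer_id
"""

both_prices = [row[0] for row in conn.execute(query)]
print(both_prices)
```

Only customers with at least one in-range order at each price survive the HAVING filter.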

How can I update if value is different or empty

I want to update my column if the value is different from the last value or if it's empty. I came up with this SQL, but it gives this error:
missing FROM-clause entry for table "box_per_pallet"
SQL:
UPDATE products AS p
SET box_per_pallet[0] = (CASE WHEN p.box_per_pallet.length = 0 THEN 0 ELSE p.box_per_pallet[0] END)
WHERE sku = 'A' AND store_id = 1
This is what I came up with based on your input. ARRAY_LENGTH takes the array and the dimension whose length you want to check. It returns NULL rather than 0 for an empty array, so the check below wraps it in COALESCE. The "missing FROM-clause entry" error occurs because Postgres thinks that p.box_per_pallet is something other than an array and it can't find that anywhere in the query. You can't use the dot operator on arrays like p.box_per_pallet.length. It's like saying, "find the length field on table box_per_pallet in schema p".
UPDATE products
SET box_per_pallet[0] = CASE WHEN COALESCE(ARRAY_LENGTH(box_per_pallet, 1), 0) = 0 -- empty array
OR box_per_pallet IS NULL
OR box_per_pallet[0] <> 0 -- your new value?
THEN 0
ELSE box_per_pallet[0]
END
WHERE sku = 'A'
AND store_id = 1
;
Here is a link to a dbfiddle showing the idea.

Hive summary function inside case statement

I am trying to write a simple Hive query:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) then 1 else 0)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 where p_wk_end_d = 2014-06-28;
Here pit_sls_q and pot_sls_q are both columns in the Hive table, and I want the proportion of records whose pot_sls_q is more than 2 times the average of pit_sls_q. However, I get this error:
FAILED: SemanticException [Error 10128]: Line 1:95 Not yet supported place for UDAF 'avg'
To fool around I even tried using some window function:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) over (partition by dept_i,class_i) then 1 else 0 end)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 and p_wk_end_d = '2014-06-28';
which should be equivalent, since filtering and partitioning on the same conditions select essentially the same data, but even with this I get an error:
FAILED: SemanticException [Error 10002]: Line 1:36 Invalid column reference 'avg': (possible column names are: p_wk_end_d, dept_i, class_i, item_i, pit_sls_q, pot_sls_q)
Please suggest the right way of doing this.
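You are using AVG inside SUM, which won't work. The first query also has other syntax errors: a missing END, a second WHERE where AND is needed, and an unquoted date.
Try computing the analytic AVG() OVER () in a subquery, like this: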
select sum(case when pot_sls_q > 2 * avg_pit_sls_q then 1 else 0 end) / count(*)
from (
select t.*,
avg(pit_sls_q) over () avg_pit_sls_q
from prd_inv_fnd.item_pot_sls t
where dept_i = 43
and class_i = 3
and p_wk_end_d = '2014-06-28'
) t;
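The subquery pattern can be checked on toy data. This is a sketch only: it runs on SQLite (3.25 or newer, for window functions) instead of Hive, the rows are made up, and the filter columns are left out to keep it short.

```python
import sqlite3

# In-memory SQLite table standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_pot_sls (pit_sls_q REAL, pot_sls_q REAL)")
# Made-up rows: the average pit_sls_q is 20, so the threshold is 40.
conn.executemany(
    "INSERT INTO item_pot_sls VALUES (?, ?)",
    [(10, 5), (20, 50), (30, 45), (20, 10)],
)

# The inner query attaches the overall average to every row; the outer query
# can then compare each row against it without nesting aggregates.
# "* 1.0" forces floating-point division, which SQLite would otherwise
# perform as integer division on two integer operands.
query = """
SELECT SUM(CASE WHEN pot_sls_q > 2 * avg_pit_sls_q THEN 1 ELSE 0 END) * 1.0 / COUNT(*)
FROM (
    SELECT pot_sls_q,
           AVG(pit_sls_q) OVER () AS avg_pit_sls_q
    FROM item_pot_sls
) t
"""

proportion = conn.execute(query).fetchone()[0]
print(proportion)
```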

How to add a column on the fly?

I am facing a different kind of problem. In a select query, I want to add a temporary column on the fly based on another column's value.
I have 2 columns
IsOpeningClosingDateToo (tinyint),
HearingDate Date
Now I want to check that if IsOpeningClosingDate = 1 then
Select HearingDate, HearingDate as 'OpeningDate'
If IsOpeningClosingDate= 2
Select HearingDate, HearingDate as 'ClosingDate'
I have tried to do this but failed:
SELECT
,[HearingDate]
,CASE [IsOpeningClosingDate]
when 1 then [HearingDate] as OpeningDate
When 0 then [HearingDate] as ClosingDate
end as 'test'
]
FROM [LitMS_MCP].[dbo].[CaseHearings]
I would suggest returning three columns. Then you can pick the value you need on the application side:
SELECT HearingDate,
(CASE WHEN IsOpeningClosingDate = 1 THEN HearingDate END) as OpeningDate,
(CASE WHEN IsOpeningClosingDate = 0 THEN HearingDate END) as ClosingDate
FROM [LitMS_MCP].[dbo].[CaseHearings];
Alternatively, you could just fetch HearingDate and IsOpeningClosingDate and do the comparison in Python.
The important point is that the columns in a SQL query are fixed by the SELECT. You cannot vary the names or types of the columns conditionally within the query.
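As a rough illustration of doing the comparison in Python, the sketch below fetches only the two stored columns and labels each date afterwards. The helper `label_hearing` and the sample rows are hypothetical, SQLite stands in for SQL Server, and the flag values (1 for opening, 0 for closing) follow the query above.

```python
import sqlite3

# In-memory SQLite table standing in for dbo.CaseHearings (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CaseHearings (HearingDate TEXT, IsOpeningClosingDate INTEGER)"
)
conn.executemany("INSERT INTO CaseHearings VALUES (?, ?)", [
    ("2015-01-05", 1),     # opening date
    ("2015-02-10", 0),     # closing date
    ("2015-03-15", None),  # ordinary hearing, neither opening nor closing
])


def label_hearing(hearing_date, flag):
    """Hypothetical helper: fixed keys, values filled in conditionally."""
    return {
        "HearingDate": hearing_date,
        "OpeningDate": hearing_date if flag == 1 else None,
        "ClosingDate": hearing_date if flag == 0 else None,
    }


cursor = conn.execute(
    "SELECT HearingDate, IsOpeningClosingDate FROM CaseHearings ORDER BY HearingDate"
)
hearings = [label_hearing(hearing_date, flag) for hearing_date, flag in cursor]

for hearing in hearings:
    print(hearing)
```

The result has the same shape as the three-column query: every row carries all three keys, and the ones that do not apply are None.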

SQL - CountIf on a column

Trying to do some calculations via SQL on my iSeries and have the following conundrum: I need to count the number of times a certain value appears in a column. My select statement is as follows:
Select
MOTRAN.ORDNO, MOTRAN.OPSEQ, MOROUT.WKCTR, MOTRAN.TDATE,
MOTRAN.LBTIM, MOROUT.SRLHU, MOROUT.RLHTD, MOROUT.ACODT,
MOROUT.SCODT, MOROUT.ASTDT, MOMAST.SSTDT, MOMAST.FITWH,
MOMAST.FITEM,
CONCAT(MOTRAN.ORDNO, MOTRAN.OPSEQ) As CON,
count (Concat(MOTRAN.ORDNO, MOTRAN.OPSEQ) )As CountIF,
MOROUT.SRLHU / (count (Concat(MOTRAN.ORDNO, MOTRAN.OPSEQ))) as calc
*(snip)*
With this information, I'm trying to count the number of times a value in CON appears. I will need this to do some math with so it's kinda important. My count statement doesn't work properly as it reports a certain value as occurring once when I see it appears 8 times.
Try putting a CASE statement inside a SUM().
SUM(CASE WHEN value = 'something' THEN 1 ELSE 0 END)
This will count the number of rows where value = 'something'.
Similarly...
SUM(CASE WHEN t1.val = CONCAT(t2.val, t3.val) THEN 1 ELSE 0 END)
If you're on a supported version of the OS, ie 6.1 or higher...
You might be able to make use of "grouping set" functionality. Particularly the ROLLUP clause.
I can't say for sure without more understanding of your data.
Otherwise, you're going to need to do something like
with Cnt as (select ORDNO, OPSEQ, count(*) as NbrOccur
from MOTRAN
group by ORDNO, OPSEQ
)
Select
MOTRAN.ORDNO, MOTRAN.OPSEQ, MOROUT.WKCTR, MOTRAN.TDATE,
MOTRAN.LBTIM, MOROUT.SRLHU, MOROUT.RLHTD, MOROUT.ACODT,
MOROUT.SCODT, MOROUT.ASTDT, MOMAST.SSTDT, MOMAST.FITWH,
MOMAST.FITEM,
CONCAT(MOTRAN.ORDNO, MOTRAN.OPSEQ) As CON,
Cnt.NbrOccur,
MOROUT.SRLHU / Cnt.NbrOccur as calc
from
motran join Cnt on motran.ordno = cnt.ordno and motran.opseq = cnt.opseq
*(snip)*
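Here is a small runnable sketch of the count-then-join-back idea, using SQLite instead of DB2 for i. The rows are made up, the tables are trimmed to a few columns, and the join to MOROUT on ORDNO and OPSEQ is an assumption, since that part of the original query was snipped.

```python
import sqlite3

# In-memory SQLite tables standing in for the iSeries files (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE motran (ordno TEXT, opseq TEXT, lbtim REAL)")
conn.execute("CREATE TABLE morout (ordno TEXT, opseq TEXT, srlhu REAL)")
conn.executemany("INSERT INTO motran VALUES (?, ?, ?)", [
    ("M001", "0010", 1.0),
    ("M001", "0010", 2.0),   # same order/operation appears twice
    ("M001", "0020", 0.5),
])
conn.executemany("INSERT INTO morout VALUES (?, ?, ?)", [
    ("M001", "0010", 8.0),
    ("M001", "0020", 3.0),
])

# The CTE counts rows per (ordno, opseq); joining it back puts that count on
# every detail row, so it can be used in per-row arithmetic.
query = """
WITH cnt AS (
    SELECT ordno, opseq, COUNT(*) AS nbr_occur
    FROM motran
    GROUP BY ordno, opseq
)
SELECT t.ordno, t.opseq, cnt.nbr_occur, r.srlhu / cnt.nbr_occur AS calc
FROM motran t
JOIN cnt ON t.ordno = cnt.ordno AND t.opseq = cnt.opseq
JOIN morout r ON t.ordno = r.ordno AND t.opseq = r.opseq
ORDER BY t.ordno, t.opseq, t.lbtim
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Counting in a separate grouped query avoids the original problem, where the outer SELECT's grouping on every detail column made each count come out as 1.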