I am working to create a matrix of missingness for a SQL database consisting of 5 tables and nearly 10 years of data. I have established ODBC connectivity and am using the RODBC package in R as my working environment. I am trying to write a function that will output a count of rows for each year for each table, a count and percent of null values (values not present) in a given year for a given table, and a count and percent of missing values (questions skipped or not answered) for a given table. I am working with the code below, trying to get it to work on one variable before turning it into a function. However, when I run this code (see below), it does not work, and I believe the issue lies with assigning an integer value to the character for null, NA. I get this message when trying to list vars in the function:
Error in as.environment(pos) : no item called "22018 245 [Microsoft][ODBC SQL Server Driver][SQL Server]Conversion failed when converting the varchar value 'NA' to data type int." on the search list.
Also, when I try to find the environment for the function, R returns NULL. I do not necessarily want to assign a new value to the existing variable, and I am new to SQL, but I am trying to do something along these lines: if X = 'NA' then Y = 1, else 0. I get the following error message when I try to run the final 2 lines creating the percent vars:
Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'character'
Any insight?
test1 <- sqlQuery(channel, "select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = 'NA' THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]")
test1$nullpct<-with(test1, NULL_VAL/TOTAL)
test1$misspct<-with(test1, MISS_VAL/TOTAL)
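I believe the data type of your column MOTHER_EDUCATION_TRENDABLE is an integer. If so, comparing it to the string 'NA' makes SQL Server try to convert 'NA' to int, which is the conversion failure in your error message. Test for NULL with IS NULL instead: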
select
[EVENT_YEAR] AS 'YEAR',
COUNT(*) AS 'TOTAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE IS NULL THEN 1 ELSE 0 END) AS 'NULL_VAL',
SUM(CASE WHEN MOTHER_EDUCATION_TRENDABLE = -1 THEN 1 ELSE 0 END) AS 'MISS_VAL'
from [GA_CMH].[dbo].[BIRTHS]
GROUP BY [EVENT_YEAR]
ORDER BY [EVENT_YEAR]
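To see the corrected query end to end, here is a minimal sketch that runs the same conditional aggregation against an in-memory SQLite database from Python. SQLite stands in for SQL Server, and the lowercase `births` table, its sample rows, and the `missingness` helper are all hypothetical, invented only for illustration. The percent columns are plain division on the returned counts, which is what the two `with(test1, ...)` lines do in R once the query returns rows; as far as I know, RODBC hands back the error text as a character vector when a query fails, which would explain the "invalid 'envir' argument of type 'character'" message.

```python
import sqlite3

# Hypothetical miniature of the BIRTHS table; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE births (event_year INTEGER, mother_education_trendable INTEGER)"
)
conn.executemany(
    "INSERT INTO births VALUES (?, ?)",
    [
        (2010, 3), (2010, None), (2010, -1), (2010, -1),  # None is stored as NULL
        (2011, None), (2011, 5),
    ],
)

QUERY = """
SELECT event_year AS year,
       COUNT(*) AS total,
       SUM(CASE WHEN mother_education_trendable IS NULL THEN 1 ELSE 0 END) AS null_val,
       SUM(CASE WHEN mother_education_trendable = -1 THEN 1 ELSE 0 END) AS miss_val
FROM births
GROUP BY event_year
ORDER BY event_year
"""


def missingness(connection):
    """Return one dict per year with counts and fractions of NULL and -1 values."""
    result = []
    for year, total, null_val, miss_val in connection.execute(QUERY):
        result.append({
            "year": year,
            "total": total,
            "null_val": null_val,
            "miss_val": miss_val,
            "null_pct": null_val / total,  # same as test1$nullpct in the R code
            "miss_pct": miss_val / total,  # same as test1$misspct in the R code
        })
    return result


summary = missingness(conn)
for row in summary:
    print(row)
```

Wrapping the query in a function that takes the table and column names would be the next step toward the per-table matrix, but that part depends on your schema.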
I am trying to run a script in Athena that pulls back customers who have made purchases of two specified values (14.45 and 17.45). I thought I would make a column for each value and then filter for both columns > 0 once downloaded into Excel, but my code isn't working. Any help?
select order_customer_id,
sum(invoice_total_price= cast('14.45' as decimal(20,2))) > 0,
sum(invoice_total_price = cast('17.45' as decimal(20,2))) > 0
from orders
where year_month_day between '2022-01-10' and '2022-03-14'
group by order_customer_id
I get this error when I run it:
Unexpected parameters (boolean) for function sum. Expected: sum(double) , sum(real) , sum(bigint) , sum(interval day to second) , sum(interval year to month) , sum(decimal(p,s))
I did the cast within the two sum columns because invoice_total_price is stored as decimal.
You can use count_if; the cast from string is probably not needed either:
select order_customer_id,
count_if(invoice_total_price = 14.45) > 0 has_14,
count_if(invoice_total_price = 17.45) > 0 has_17
from orders
where year_month_day between '2022-01-10' and '2022-03-14'
group by order_customer_id
This gives you a table with 3 corresponding columns. If you don't need them in the output, consider moving the checks into a HAVING clause:
select order_customer_id
from orders
where year_month_day between '2022-01-10' and '2022-03-14'
group by order_customer_id
having count_if(invoice_total_price = 14.45) > 0
and count_if(invoice_total_price = 17.45) > 0
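The HAVING variant can be tried locally with the sketch below. It runs on an in-memory SQLite database from Python, and since SQLite has no count_if, the portable SUM(CASE ...) form is swapped in for it. The table contents are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Athena.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_customer_id INTEGER, invoice_total_price REAL, year_month_day TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 14.45, "2022-01-15"),
        (1, 17.45, "2022-02-01"),  # customer 1 has both values in range
        (2, 14.45, "2022-01-20"),  # customer 2 has only 14.45
        (3, 17.45, "2022-03-01"),
        (3, 14.45, "2022-04-01"),  # outside the date range, so it does not count
    ],
)

# SUM(CASE ...) > 0 plays the role of count_if(...) > 0 in the Athena query.
QUERY = """
SELECT order_customer_id
FROM orders
WHERE year_month_day BETWEEN '2022-01-10' AND '2022-03-14'
GROUP BY order_customer_id
HAVING SUM(CASE WHEN invoice_total_price = 14.45 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN invoice_total_price = 17.45 THEN 1 ELSE 0 END) > 0
"""

customers_with_both = [row[0] for row in conn.execute(QUERY)]
print(customers_with_both)
```

Only customer 1 qualifies here: customer 3's 14.45 order falls outside the WHERE range, which shows that the date filter is applied before the HAVING checks.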
I want to update my column if the value is different from the last value or if it's empty. I came up with this SQL, but it gives this error:
missing FROM-clause entry for table "box_per_pallet"
SQL:
UPDATE products AS p
SET box_per_pallet[0] = (CASE WHEN p.box_per_pallet.length = 0 THEN 0 ELSE p.box_per_pallet[0] END)
WHERE sku = 'A' AND store_id = 1
This is what I came up with based on your input. ARRAY_LENGTH takes the array and the dimension whose length you want to check as parameters. The missing FROM-clause error occurs because Postgres thinks that p.box_per_pallet is something other than an array and can't find it anywhere in the query. You can't use the dot operator on arrays, as in p.box_per_pallet.length. It's like saying, "find the length field on table box_per_pallet in schema p".
UPDATE products
SET box_per_pallet[0] = CASE WHEN COALESCE(ARRAY_LENGTH(box_per_pallet, 1), 0) = 0 -- ARRAY_LENGTH returns NULL, not 0, for an empty array
OR box_per_pallet IS NULL
OR box_per_pallet[0] <> 0 -- your new value?
THEN 0
ELSE box_per_pallet[0]
END
WHERE sku = 'A'
AND store_id = 1
;
Here is a link to a dbfiddle showing the idea.
I am trying to write a simple Hive query:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) then 1 else 0)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 where p_wk_end_d = 2014-06-28;
Here pit_sls_q and pot_sls_q both are columns in the Hive table and I want proportion of records which have pot_sls_q more than 2 times average of pit_sls_q. However I get error:
FAILED: SemanticException [Error 10128]: Line 1:95 Not yet supported place for UDAF 'avg'
To fool around I even tried using some window function:
select sum(case when pot_sls_q > 2* avg(pit_sls_q) over (partition by dept_i,class_i) then 1 else 0 end)/count(*) from prd_inv_fnd.item_pot_sls where dept_i=43 and class_i=3 and p_wk_end_d = '2014-06-28';
which is fine considering the fact filtering or partitioning the data on same condition is "same" data essentially but even with this I get error:
FAILED: SemanticException [Error 10002]: Line 1:36 Invalid column reference 'avg': (possible column names are: p_wk_end_d, dept_i, class_i, item_i, pit_sls_q, pot_sls_q)
please suggest right way of doing this.
You are using AVG inside SUM, which won't work (and there are other syntax errors as well).
Try the analytic AVG() OVER () in a subquery, like this:
select sum(case when pot_sls_q > 2 * avg_pit_sls_q then 1 else 0 end) / count(*)
from (
select t.*,
avg(pit_sls_q) over () avg_pit_sls_q
from prd_inv_fnd.item_pot_sls t
where dept_i = 43
and class_i = 3
and p_wk_end_d = '2014-06-28'
) t;
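Here is a small runnable sketch of the same pattern: compute the average once per row with a window function in a subquery, then aggregate over it in the outer query. It uses an in-memory SQLite database from Python as a stand-in for Hive (window functions need SQLite 3.25 or newer), and the sample rows are invented. One difference to be aware of: SQLite divides integers as integers, so the sketch multiplies by 1.0, whereas Hive's `/` already returns a double.

```python
import sqlite3

# Hypothetical sample rows; SQLite (3.25+) stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_pot_sls (item_i INTEGER, pit_sls_q REAL, pot_sls_q REAL)"
)
conn.executemany(
    "INSERT INTO item_pot_sls VALUES (?, ?, ?)",
    [
        (1, 10.0, 5.0),
        (2, 20.0, 50.0),  # 50 > 2 * 20, counts
        (3, 30.0, 41.0),  # 41 > 2 * 20, counts
        (4, 20.0, 40.0),  # 40 is not strictly greater than 40
    ],
)

# The inner query attaches the overall average (20.0 here) to every row;
# the outer query can then compare each row against it inside SUM.
QUERY = """
SELECT SUM(CASE WHEN pot_sls_q > 2 * avg_pit_sls_q THEN 1 ELSE 0 END) * 1.0 / COUNT(*)
FROM (
    SELECT t.*,
           AVG(pit_sls_q) OVER () AS avg_pit_sls_q
    FROM item_pot_sls t
) t
"""

proportion = conn.execute(QUERY).fetchone()[0]
print(proportion)
```

With this data, two of the four rows exceed twice the average, so the proportion is 0.5.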
I am facing a different kind of problem. In a SELECT query I want to add a temporary column on the fly based on another column's value.
I have 2 columns
IsOpeningClosingDateToo (tinyint),
HearingDate Date
Now I want to check that if IsOpeningClosingDate = 1 then
Select HearingDate, HearingDate as 'OpeningDate'
If IsOpeningClosingDate= 2
Select HearingDate, HearingDate as 'ClosingDate'
I have tried to do this but failed:
SELECT
,[HearingDate]
,CASE [IsOpeningClosingDate]
when 1 then [HearingDate] as OpeningDate
When 0 then [HearingDate] as ClosingDate
end as 'test'
]
FROM [LitMS_MCP].[dbo].[CaseHearings]
I would suggest returning three columns. Then you can fetch the values on the application side:
SELECT HearingDate,
(CASE WHEN IsOpeningClosingDate = 1 THEN HearingDate END) as OpeningDate,
(CASE WHEN IsOpeningClosingDate = 0 THEN HearingDate END) as ClosingDate
FROM [LitMS_MCP].[dbo].[CaseHearings];
Alternatively, you could just fetch HearingDate and IsOpeningClosingDate and do the comparison in Python.
The important point is that the columns in a SQL query are fixed by the SELECT. You cannot vary the names or types of the columns conditionally within the query.
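Both options can be sketched in a few lines of Python. The example below uses an in-memory SQLite database as a stand-in for SQL Server; the `case_hearings` table, its rows, and the snake_case column names are hypothetical. The first query returns the fixed three-column shape, with NULL (None in Python) where a date does not apply. The second part fetches only the date and the flag and picks the label in application code.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_hearings (hearing_date TEXT, is_opening_closing_date INTEGER)"
)
conn.executemany(
    "INSERT INTO case_hearings VALUES (?, ?)",
    [("2023-01-05", 1), ("2023-02-10", 0), ("2023-03-15", None)],
)

# Option 1: three fixed columns; a CASE without ELSE yields NULL when unmatched.
QUERY = """
SELECT hearing_date,
       CASE WHEN is_opening_closing_date = 1 THEN hearing_date END AS opening_date,
       CASE WHEN is_opening_closing_date = 0 THEN hearing_date END AS closing_date
FROM case_hearings
ORDER BY hearing_date
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

# Option 2: fetch the raw flag and choose the label on the application side.
LABELS = {1: "OpeningDate", 0: "ClosingDate"}
labelled = [
    (hearing_date, LABELS.get(flag))  # None when the flag is neither 1 nor 0
    for hearing_date, flag in conn.execute(
        "SELECT hearing_date, is_opening_closing_date "
        "FROM case_hearings ORDER BY hearing_date"
    )
]
print(labelled)
```

In both cases the result set has the same columns for every row; only the values vary.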
Trying to do some calculations via SQL on my iSeries and have the following conundrum: I need to count the number of times a certain value appears in a column. My select statement is as follows:
Select
MOTRAN.ORDNO, MOTRAN.OPSEQ, MOROUT.WKCTR, MOTRAN.TDATE,
MOTRAN.LBTIM, MOROUT.SRLHU, MOROUT.RLHTD, MOROUT.ACODT,
MOROUT.SCODT, MOROUT.ASTDT, MOMAST.SSTDT, MOMAST.FITWH,
MOMAST.FITEM,
CONCAT(MOTRAN.ORDNO, MOTRAN.OPSEQ) As CON,
count (Concat(MOTRAN.ORDNO, MOTRAN.OPSEQ) )As CountIF,
MOROUT.SRLHU / (count (Concat(MOTRAN.ORDNO, MOTRAN.OPSEQ))) as calc
*(snip)*
With this information, I'm trying to count the number of times a value in CON appears. I will need this to do some math with so it's kinda important. My count statement doesn't work properly as it reports a certain value as occurring once when I see it appears 8 times.
Try putting a CASE statement inside a SUM().
SUM(CASE WHEN value = 'something' THEN 1 ELSE 0 END)
This will count the number of rows where value = 'something'.
Similarly...
SUM(CASE WHEN t1.val = CONCAT(t2.val, t3.val) THEN 1 ELSE 0 END)
If you're on a supported version of the OS, ie 6.1 or higher...
You might be able to make use of "grouping set" functionality. Particularly the ROLLUP clause.
I can't say for sure without more understanding of your data.
Otherwise, you're going to need to do something like
with Cnt as (select ORDNO, OPSEQ, count(*) as NbrOccur
from MOTRAN
group by ORDNO, OPSEQ
)
Select
MOTRAN.ORDNO, MOTRAN.OPSEQ, MOROUT.WKCTR, MOTRAN.TDATE,
MOTRAN.LBTIM, MOROUT.SRLHU, MOROUT.RLHTD, MOROUT.ACODT,
MOROUT.SCODT, MOROUT.ASTDT, MOMAST.SSTDT, MOMAST.FITWH,
MOMAST.FITEM,
CONCAT(MOTRAN.ORDNO, MOTRAN.OPSEQ) As CON,
Cnt.NbrOccur,
MOROUT.SRLHU / Cnt.NbrOccur as calc
from
motran join Cnt on motran.ordno = cnt.ordno and motran.opseq = cnt.opseq
*(snip)*
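The CTE approach can be checked on a toy dataset. The sketch below runs on an in-memory SQLite database from Python rather than on DB2 for i, so `||` replaces CONCAT. The sample rows are made up, and because the original FROM clause is snipped, the join from MOTRAN to MOROUT on ORDNO and OPSEQ is an assumption.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for DB2 on the iSeries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE motran (ordno TEXT, opseq TEXT, lbtim REAL)")
conn.execute("CREATE TABLE morout (ordno TEXT, opseq TEXT, srlhu REAL)")
conn.executemany(
    "INSERT INTO motran VALUES (?, ?, ?)",
    [
        ("M001", "0010", 1.5),
        ("M001", "0010", 2.0),
        ("M001", "0010", 0.5),  # M001/0010 occurs three times
        ("M001", "0020", 1.0),  # M001/0020 occurs once
    ],
)
conn.executemany(
    "INSERT INTO morout VALUES (?, ?, ?)",
    [("M001", "0010", 6.0), ("M001", "0020", 4.0)],
)

# The CTE counts occurrences per (ordno, opseq); joining it back attaches
# that count to every detail row without collapsing the rows.
# The motran-to-morout join condition is assumed, not taken from the original.
QUERY = """
WITH cnt AS (
    SELECT ordno, opseq, COUNT(*) AS nbr_occur
    FROM motran
    GROUP BY ordno, opseq
)
SELECT motran.ordno || motran.opseq AS con,
       cnt.nbr_occur,
       morout.srlhu / cnt.nbr_occur AS calc
FROM motran
JOIN cnt ON motran.ordno = cnt.ordno AND motran.opseq = cnt.opseq
JOIN morout ON motran.ordno = morout.ordno AND motran.opseq = morout.opseq
ORDER BY con
"""

results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

Each of the three M001/0010 rows carries a count of 3 and a calc of 2.0, while the single M001/0020 row carries a count of 1 and a calc of 4.0.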