R function within SQL stored procedure issue

I'm attempting to call a simple R function from a SQL stored procedure, but I can't get it to run without errors.
Here is my script:
CREATE PROCEDURE sp_aggregate (@paramGroup nvarchar(40))
AS
EXECUTE sp_execute_external_script
@language =N'R',
@script=N'
library(dplyr)
data <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10), Country = c("A", "A","A","A","A",
"B", "B", "B", "B", "B"), Class = c("C", "C", "C", "C", "C", "D", "D", "D", "D", "D"))
agg_function <- function(df, column) {
enq_column <- enquo(column)
output <- df %>% group_by(!!enq_column) %>% summarize(total = sum(x))
return(output)
}
OutputDataSet <- as.data.frame(agg_function(df = data, column = paramgroup_var))
'
, @input_data_1 = N''
-- , @input_data_1_name = N'SQL_input'
, @output_data_1_name = N'OutputDataSet'
, @params = N'@paramgroup_var nvarchar(40)'
, @paramgroup_var = @paramGroup;
GO
Execute sp_aggregate @paramGroup = Country
This is the error I'm running into:
Error in grouped_df_impl(data, unname(vars), drop) :
Column `paramgroup_var` is unknown
Calls: source ... group_by.data.frame -> grouped_df -> grouped_df_impl
Error in execution. Check the output for more information.
Error in eval(ei, envir) :
Error in execution. Check the output for more information.
Calls: runScriptFile -> source -> withVisible -> eval -> eval -> .Call
Execution halted

The parentheses in data.frame() are balanced, so that is not the cause. The parameter reaches R as a character string, and enquo(column) captures the name paramgroup_var rather than the column name it holds, which is why group_by() reports that column `paramgroup_var` is unknown. Converting the string to a symbol with rlang::sym() before unquoting should fix it:
CREATE PROCEDURE sp_aggregate (@paramGroup nvarchar(40))
AS
EXECUTE sp_execute_external_script
@language =N'R',
@script=N'
library(dplyr)
data <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10),
Country = c("A", "A","A","A","A", "B", "B", "B", "B", "B"),
Class = c("C", "C", "C", "C", "C", "D", "D", "D", "D", "D"))
agg_function <- function(df, column) {
# column holds the column name as a string, e.g. "Country"
output <- df %>% group_by(!!rlang::sym(column)) %>% summarize(total = sum(x))
return(output)
}
OutputDataSet <- as.data.frame(agg_function(df = data, column = paramgroup_var))
'
, @input_data_1 = N''
-- , @input_data_1_name = N'SQL_input'
, @output_data_1_name = N'OutputDataSet'
, @params = N'@paramgroup_var nvarchar(40)'
, @paramgroup_var = @paramGroup;
GO

R : x comparison (1) is possible only for atomic and list types

I am using R. In a previous post (R: Loop Producing the Following Error: Argument 1 must have names), I learned how to make a function ("create_data") for my code.
Now, I am trying to modify this function.
First, I create some data to be used for this example:
#load library
library(dplyr)
set.seed(123)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
Here is the modified version of the function:
create_data <- function() {
#generate random numbers
random_1 = runif(1, 80, 120)
random_2 = runif(1, random_1, 120)
random_3 = runif(1, 85, 120)
random_4 = runif(1, random_3, 120)
#bin data according to random criteria
train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
train_data$cat = as.factor(train_data$cat)
#new splits
a_table = train_data %>%
select(a1, b1, c1) %>%
filter(cat == "a")
b_table = train_data %>%
select(a1, b1, c1) %>%
filter(cat == "b")
c_table = train_data %>%
select(a1, b1, c1) %>%
filter(cat == "c")
split_1 = runif(1,0, 1)
split_2 = runif(1, 0, 1)
split_3 = runif(1, 0, 1)
#calculate 60th quantile ("quant") for each bin
table_a = data.frame(a_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_1)))
table_b = data.frame(b_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_2)))
table_c = data.frame(c_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_3)))
#create a new variable ("diff") that measures if the quantile is bigger than the value of "c1"
table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
#group all tables
final_table = rbind(table_a, table_b, table_c)
#create a table: for each bin, calculate the average of "diff"
final_table_2 = data.frame(final_table %>%
group_by(cat) %>%
summarize(
mean = mean(diff)
))
#add "total mean" to this table
final_table_2 = data.frame(final_table_2 %>% add_row(cat = "total", mean = mean(final_table$diff)))
#format this table: add the random criteria to this table for reference
final_table_2$random_1 = random_1
final_table_2$random_2 = random_2
final_table_2$random_3 = random_3
final_table_2$random_4 = random_4
final_table_2$split_1 = split_1
final_table_2$split_2 = split_2
final_table_2$split_3 = split_3
final_table$iteration_number = i
}
The error results when I try to call the function:
Error: Problem with `filter()` input `..1`.
i Input `..1` is `cat == "a"`.
x comparison (1) is possible only for atomic and list types
I have a feeling that maybe the error is occurring over here:
a_table = train_data %>%
select(a1, b1, c1) %>%
filter(cat == "a")
I tried to replace this "select" with a non-dplyr version:
a_table <- train_data[cat == "a", ]
But this also produces an error:
Error in cat == "a" :
comparison (1) is possible only for atomic and list types
Can someone please show me what I am doing wrong?
Thanks
select(a1, b1, c1) keeps only those three columns and drops cat, so by the time filter() runs there is no cat column left. The name cat then resolves to the base R function cat(), and comparing a function with "a" raises this error.
a_table = train_data %>%
select(a1, b1, c1) %>%
filter(cat == "a")
Instead, filter first and then select:
a_table = train_data %>%
filter(cat == "a") %>%
select(a1, b1, c1)
The same applies to b_table and c_table.
The same can be done in base R:
a_table <- subset(train_data, cat == "a", select = c(a1, b1, c1))

How to set the number of y-axis values displayed when faceting

I have a faceted bar graph.
dat <- data.frame(ID = c("A", "A", "B", "B", "C", "C"),
A = c("Type 1", "Type 2", "Type 1", "Type 2", "Type 1", "Type 2"),
B = c(1, 2, 53, 87, 200, 250))
ggplot(data = dat, aes(x = A, y = B)) +
geom_bar(stat = "identity") +
facet_wrap(~ID, scales= "free_y")
How do I code to only have 3 y-axis values displayed per graph?
I've tried
+scale_y_continuous(breaks=3)
To get more control over the breaks you can write your own breaks function. The following code gives you exactly 3 breaks. However, this very basic approach does not necessarily result in "pretty" breaks:
library(ggplot2)
my_breaks <- function(x) {
seq(0, round(max(x), 0), length.out = 3)
}
my_limits <- function(x) {
c(x[1], ceiling(x[2]))
}
# Dataset 2
dat <- data.frame(
ID = c("A", "A", "A","B", "B", "B", "C", "C", "C"),
A = c("Type 1", "Type 2", "Type 3", "Type 1", "Type 2", "Type 3","Type 1", "Type 2", "Type 3"),
B = c(1.7388, 4.2059, .7751, .9489, 2.23405, .666589, 0.024459, 1.76190, 0.066678))
ggplot(data = dat, aes(x = A, y = B)) +
geom_bar(stat = "identity") +
facet_wrap(~ID, scales= "free_y") +
scale_y_continuous(breaks = my_breaks, limits = my_limits)
# Dataset 1
dat <- data.frame(ID = c("A", "A", "B", "B", "C", "C"),
A = c("Type 1", "Type 2", "Type 1", "Type 2", "Type 1", "Type 2"),
B = c(1, 2, 53, 87, 200, 250))
ggplot(data = dat, aes(x = A, y = B)) +
geom_bar(stat = "identity") +
facet_wrap(~ID, scales= "free_y") +
scale_y_continuous(breaks = my_breaks, limits = my_limits)
Created on 2020-04-10 by the reprex package (v0.3.0)
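As a rough illustration of the arithmetic inside my_breaks and my_limits, here is a minimal Python sketch (Python is used only for illustration; the answer's code is R). The tuples passed in are hypothetical axis ranges, and which range ggplot2 actually hands to these functions is not modelled here.

```python
import math

def my_limits(axis_range):
    """Keep the lower bound and round the upper bound up to a whole number."""
    lower, upper = axis_range
    return (lower, math.ceil(upper))

def my_breaks(axis_range):
    """Return exactly three evenly spaced breaks from 0 to the rounded maximum."""
    top = round(max(axis_range))
    return [top * i / 2 for i in range(3)]

# Hypothetical ranges, loosely based on the two datasets above.
print(my_breaks(my_limits((0, 4.2059))))
print(my_breaks(my_limits((0, 250))))
```

The middle break is always half of the top value, which is why the result is not guaranteed to land on a "pretty" number.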

Python - find cell reference within merged_cells collection in openpyxl

I wish to identify if a cell in a worksheet is found within the merged_cells collection returned by openpyxl.
The merged_cells range looks like this (VSCode debugger):
I have the cell reference J31 - which is found in this collection. How would I write a function that returns true if that cell is found in the merged_cells.ranges collection?
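def is_merged(ws, coord):
    # coord is a cell reference string such as "J31"
    for merged_range in ws.merged_cells.ranges:
        if coord in merged_range:
            return True
    return False
Each item in ws.merged_cells.ranges is a CellRange (from openpyxl.worksheet.cell_range), which supports the in test on a coordinate string.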
Further to D.Banakh's answer (+1), try something like this (building upon a previous example I wrote for someone else, since there is little context to your question):
for cell in ws.merged_cells.ranges:
    #print(cellRef +' ==> '+ str(cell.min_row) +'/'+ str(cell.max_row) +'/'+ str(cell.min_col) +'/'+ str(cell.max_col))
    if (int(cell.min_row) <= int(row) <= int(cell.max_row)) and (int(cell.min_col) <= int(col) <= int(cell.max_col)):
        print('Cell ' +cellRef+ ' is a merged cell')
Example within a context:
import re
def fnGetCellBorders(ws, cellRef):
    tmp = ws[cellRef].border
    brdrs = ''
    if tmp.top.style is not None: brdrs += 'T'
    if tmp.left.style is not None: brdrs += 'L'
    if tmp.right.style is not None: brdrs += 'R'
    if tmp.bottom.style is not None: brdrs += 'B'
    # refs is defined elsewhere in the original program (not shown here)
    if (brdrs == '') and ('condTableTopLeftCell' in refs):
        if fnIsCellWithinMergedRange(ws, cellRef): brdrs = 'M'
    return brdrs
def fnIsCellWithinMergedRange(ws, cellRef):
    ans = False
    col = fnAlphaToNum(re.sub('[^A-Z]', '', cellRef))
    row = re.sub('[^0-9]', '', cellRef)
    for cell in ws.merged_cells.ranges:
        if (int(cell.min_row) <= int(row) <= int(cell.max_row)) and (int(cell.min_col) <= int(col) <= int(cell.max_col)):
            ans = True
    return ans
def fnAlphaToNum(ltr):
    # Handles single-letter columns (A to Z) only
    ab = ["MT", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]
    return ab.index(ltr)
# ws and cellRef come from the calling code
cellBorders = fnGetCellBorders(ws, cellRef)
if ('T' in cellBorders) or ('L' in cellBorders) or ('R' in cellBorders) or ('B' in cellBorders) or ('M' in cellBorders):
    print('Cell has border *OR* is a merged cell and borders cannot be checked')
References:
OpenPyXL - How to query cell borders?
How to detect merged cells in excel with openpyxl
https://bitbucket.org/openpyxl/openpyxl/issues/911/borders-on-merged-cells-are-not-preserved
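The bounds check used in the answers above can be tried without a workbook. The following is a minimal sketch, not openpyxl's own API: Bounds is a hypothetical stand-in that carries the same four attributes (min_row, max_row, min_col, max_col) the answer reads from each merged range, and the helper names are made up for illustration. Unlike fnAlphaToNum, the column conversion here also handles multi-letter columns such as AA.

```python
import re
from collections import namedtuple

# Hypothetical stand-in exposing the same four bounds that the answer
# reads from each merged range.
Bounds = namedtuple("Bounds", "min_col min_row max_col max_row")

def column_letters_to_number(letters):
    """Convert Excel column letters to a 1-based index (A -> 1, AA -> 27)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def split_cell_ref(cell_ref):
    """Split a reference such as 'J31' into (row, column_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", cell_ref)
    if match is None:
        raise ValueError(f"not a cell reference: {cell_ref!r}")
    return int(match.group(2)), column_letters_to_number(match.group(1))

def is_in_merged_ranges(cell_ref, merged_ranges):
    """Return True if the cell lies inside any of the given ranges."""
    row, col = split_cell_ref(cell_ref)
    return any(
        r.min_row <= row <= r.max_row and r.min_col <= col <= r.max_col
        for r in merged_ranges
    )

# Example: one merged block covering J30:L32 (columns 10 to 12).
ranges = [Bounds(min_col=10, min_row=30, max_col=12, max_row=32)]
print(is_in_merged_ranges("J31", ranges))
print(is_in_merged_ranges("A1", ranges))
```

With a real worksheet, the list passed as merged_ranges would be ws.merged_cells.ranges, as in the answer.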

Referencing Matrix (VB)

I have a matrix (5x5) with values in them for example:
Matrix (1,1) Value: 'a'
Matrix (1, 2) Value: 'b'
Matrix (2, 1) Value: 'c'
how would I be able to find the letter 'a' in that matrix and have it output the coordinates?
ie
user inputs 'b'
[searches for 'b' in table]
output (1,2)
thanks in advance
It's as simple as:
For i As Integer = 0 To LengthOfMatrix - 1
For y As Integer = 0 To HeightOfMatrix - 1
If Matrix(i, y) = "a" Then Console.Write(i & " " & y & vbCrLf)
Next
Next
Assuming that you declared Matrix as:
Dim Matrix As Char(,) = {{"a"c, "b"c, "c"c, "d"c, "e"c}, {"a"c, "b"c, "c"c, "d"c, "e"c}, {"a"c, "b"c, "c"c, "d"c, "e"c}, {"a"c, "b"c, "c"c, "d"c, "e"c}, {"a"c, "b"c, "c"c, "d"c, "e"c}}
LengthOfMatrix and HeightOfMatrix should be the dimensions of your matrix. They can be replaced with something more dynamic:
For i As Integer = 0 To Matrix.GetLength(0) - 1 'Gets the length of the first dimension
For y As Integer = 0 To Matrix.GetLength(1) - 1 'Gets the length of the second dimension
If Matrix(i, y) = "a" Then Console.Write(i & " " & y & vbCrLf)
Next
Next
In short, this loop goes through every element of the matrix and outputs the coordinates of each element that matches a given criterion (in this case, being equal to 'a').
Note: In most programming languages, array indexes begin at 0, so the first element in your matrix is at coordinates (0,0).
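The same nested scan can be sketched in Python for comparison (the answer itself is VB). The function name find_all and the small sample matrix are made up for illustration, and coordinates are 0-based as in the note above.

```python
def find_all(matrix, target):
    """Return the (row, column) of every element equal to target."""
    matches = []
    for i, row in enumerate(matrix):
        for y, value in enumerate(row):
            if value == target:
                matches.append((i, y))
    return matches

# Hypothetical 2x2 sample rather than the full 5x5 matrix.
sample = [
    ["a", "b"],
    ["c", "a"],
]
print(find_all(sample, "b"))
print(find_all(sample, "a"))
```

Returning a list rather than printing inside the loop makes it easy to tell "not found" (an empty list) apart from one or several matches.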

An SQL statement into SSRS

I tried to incorporate the following into SSRS but failed.
If XXX = “A” then display “AT”
Else if XXX = “B” then display “BEE”
Else if XXX = “C” then display “CAR”
Else display “Other”
I tried
=Switch(
Fields!XXX.Value = "A", "AT",
Fields!XXX.Value = "B", "BEE",
Fields!XXX.Value = "C", "CAR", "Other")
You almost had it. Every output in the Switch function must be paired with a condition, so make your last condition evaluate to True:
=Switch(
Fields!XXX.Value = "A", "AT",
Fields!XXX.Value = "B", "BEE",
Fields!XXX.Value = "C", "CAR",
True, "Other"
)
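To see why the trailing True pair acts as the "else" branch, here is a minimal Python sketch of the pairing rule (an illustration only, not SSRS code). The switch and label functions are hypothetical, and rejecting an odd number of arguments is an assumption that mirrors why the unpaired "Other" in the question's attempt failed.

```python
def switch(*args):
    """Return the value paired with the first true condition, else None."""
    if len(args) % 2 != 0:
        # Assumption: an unpaired trailing value is treated as an error.
        raise ValueError("every value needs its own condition")
    for condition, value in zip(args[0::2], args[1::2]):
        if condition:
            return value
    return None

def label(xxx):
    return switch(
        xxx == "A", "AT",
        xxx == "B", "BEE",
        xxx == "C", "CAR",
        True, "Other",  # catch-all pair, reached only if nothing above matched
    )

print(label("B"))
print(label("Z"))
```

Because pairs are checked in order, the True pair has to come last; placed earlier, it would shadow every condition after it.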
You want something like this:
=iif(Fields!XXX.Value = "A", "AT", iif(Fields!XXX.Value = "B", "BEE", iif(Fields!XXX.Value = "C", "CAR", "Other")))
[check the parens in the expression builder]