Numeric Value out of Range when Inserting to SQL from R - sql

Edit: Here is the column file for you to try to insert to your database: https://easyupload.io/jls3mk
So I narrowed my problem down to 1 column in my dataframe. It's a numeric column from 0-260000 with NaNs in it.
When I try to insert pred_new_export[46] (only column 46) using this statement:
dbWriteTable(conn = con,
             name = SQL("ML.CreditLineApplicationOutputTemp"),
             value = pred_new_export[46],   # only column 46 of the data frame
             overwrite = TRUE)
I get the issue:
Error in result_insert_dataframe(rs@ptr, values, batch_rows) :
  nanodbc/nanodbc.cpp:1655: 22003: [Microsoft][SQL Server Native Client 11.0]Numeric value out of range
I've looked at this for 2 hours and it's been driving me insane. I can't figure out why it wouldn't insert into a fresh SQL table. The column only contains numbers.
The numbers are within the range of the column.
This is the SQL schema create statement.
USE [EDWAnalytics]
GO
/****** Object: Table [ML].[CreditLineApplicationOutputTemp] Script Date: 4/20/2022 9:26:22 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [ML].[CreditLineApplicationOutputTemp](
[MedianIncomeInAgeBracket] [float] NULL
) ON [PRIMARY]
GO

You said it has NaNs, which many DBMSes do not understand. I suggest you replace all NaN with NA.
Reprex:
# con <- DBI::dbConnect(..)
DBI::dbExecute(con, "create table quux (num float)")
# [1] 0
df <- data.frame(num=c(1,NA,NaN))
DBI::dbAppendTable(con, "quux", df)
# Error in result_insert_dataframe(rs@ptr, values, batch_rows) :
#   nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 1 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.
df$num[is.nan(df$num)] <- NA
DBI::dbAppendTable(con, "quux", df)
DBI::dbGetQuery(con, "select * from quux")
# num
# 1 1
# 2 NA
# 3 NA
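To apply the same fix to the whole data frame before writing, a minimal sketch (assuming pred_new_export is the data frame from the question; the table name comes from the schema above):
is_num <- vapply(pred_new_export, is.numeric, logical(1))
# replace NaN with NA in every numeric column so the ODBC driver receives proper NULLs
pred_new_export[is_num] <- lapply(pred_new_export[is_num],
                                  function(x) replace(x, is.nan(x), NA))
DBI::dbWriteTable(con,
                  name = DBI::SQL("ML.CreditLineApplicationOutputTemp"),
                  value = pred_new_export[46],
                  overwrite = TRUE)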
FYI, the version of the SQL Server ODBC driver you are using is rather antiquated: even the most recent release of Native Client 11 was in 2017. For many reasons, I suggest you upgrade to "ODBC Driver 17 for SQL Server" (the 17 has nothing to do with the version of SQL Server to which you are connecting).
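Once the newer driver is installed at the OS level, a sketch of reconnecting through it might look like this (the server name is a placeholder; the database name comes from the schema script above):
# hypothetical connection using the newer driver; adjust Server to your own host
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 17 for SQL Server",
  Server   = "your-server-name",
  Database = "EDWAnalytics",
  Trusted_Connection = "yes"
)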
FYI, my DBMS/version:
cat(DBI::dbGetQuery(con, "select @@version")[[1]], "\n")
# Microsoft SQL Server 2019 (RTM-CU14) (KB5007182) - 15.0.4188.2 (X64)
# Nov 3 2021 19:19:51
# Copyright (C) 2019 Microsoft Corporation
# Developer Edition (64-bit) on Linux (Ubuntu 20.04.3 LTS) <X64>
though this is also the case with SQL Server 2016 (and likely other versions).

Related

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries, so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
  bigrquery::bigquery(),
  project = 'myproject',
  dataset = 'dataset',
  billing = 'myproject'
)
Then I performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
  "SELECT
     `myproject.dataset.table_1`.Pokemon,
     coalesce(`myproject.dataset.table_1`.Type_1, `myproject.dataset.table_2`.Type_1) AS Type_1,
     coalesce(`myproject.dataset.table_1`.Type_2, `myproject.dataset.table_2`.Type_2) AS Type_2,
     `myproject.dataset.table_1`.Total,
     `myproject.dataset.table_1`.HP,
     `myproject.dataset.table_1`.Attack,
     `myproject.dataset.table_1`.Special_Attack,
     `myproject.dataset.table_1`.Defense,
     `myproject.dataset.table_1`.Special_Defense,
     `myproject.dataset.table_1`.Speed
   FROM `myproject.dataset.table_1`
   LEFT JOIN `myproject.dataset.table_2`
     ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
   ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended and now I'd like to query that table but like...where is it? How do I connect? Can I save it locally so that I can start working my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn@quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care, as there is no check on the size of the results, so it is easy to run out of memory when doing this.
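For instance, a sketch of the same pattern against the fake project from the question (with a simplified column list), assigning the result so it lands in local R memory:
local_df <- DBI::dbGetQuery(con, "
  SELECT t1.*
  FROM `myproject.dataset.table_1` AS t1
  LEFT JOIN `myproject.dataset.table_2` AS t2
    ON t1.Pokemon = t2.Pokemon
  ORDER BY t1.ID")
dim(local_df)   # quick size check; large results can exhaust local memory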
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to be writing R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
con <- dbConnect(...)   # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")

new_remote_tbl = remote_tbl1 %>%
  left_join(remote_tbl2, by = "Pokemon", suffix = c("", ".y")) %>%
  mutate(Type_1 = coalesce(Type_1, Type_1.y),
         Type_2 = coalesce(Type_2, Type_2.y)) %>%
  select(ID, Pokemon, Type_1, Type_2, ...) %>%   # list your return columns
  arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disk - but you can query it and interact with it as if it were, and the database will produce it for you on demand.)
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = remote_df %>%
collect()
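Continuing the sketch above, you can also filter on the remote table and collect only the rows you need, for example the NULL Type_1 rows that the follow-up query in the question was after:
missing_type1 <- new_remote_tbl %>%
  filter(is.na(Type_1)) %>%   # dbplyr translates this to WHERE Type_1 IS NULL
  collect()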

Insert new timestamp value to acc table in kamailio

I want to add a new column to the acc table. I created a new column in the acc table of type timestamp and named it ring_time. In every call I put the ring time into a $dlg_var like this:
$dlg_var(ringtime) = $Ts;
Then I add an extra column in the config like this:
modparam("acc", "log_extra", "src_user=$fU;src_domain=$fd;src_ip=$si;" "dst_ouser=$tU;dst_user=$rU;dst_domain=$rd;ring_time=$dlg_var(ringtime)")
but when I try to test it, I always get:
db_mysql [km_dbase.c:122]: db_mysql_submit_query(): driver error on query: Incorrect datetime value: '1591361996' for column kamailio.acc.ring_time at row 1 (1292)
Jun 5 17:29:59 kamailio /usr/sbin/kamailio[22901]: ERROR: {2 102 INVITE 105a0f4a3d99a0a5558355e54b43f4e1@192.168.1.121:5060} <core> [db_query.c:244]: db_do_insert_cmd(): error while submitting query
Jun 5 17:29:59 kamailio /usr/sbin/kamailio[22901]: ERROR: {2 102 INVITE 105a0f4a3d99a0a5558355e54b43f4e1@192.168.1.121:5060} acc [acc.c:477]: acc_db_request(): failed to insert into database
Sounds like an error with the SQL INSERT query. If I had to guess, I'd say you're being caught out by the date format in the SQL table not matching the date format you're pushing to it.
I don't know the structure of your database, but there's a simple trick I use for debugging SQL queries when I can't see the query being run:
Start up Wireshark/tcpdump on the machine, capture all SQL traffic (MySQL is port 3306), and replicate the error.
From the packet capture you'll be able to see the query Kamailio's database engine ran.
Given the error "db_mysql [km_dbase.c:122]: db_mysql_submit_query(): driver error on query: Incorrect datetime value: '1591361996' for column kamailio.acc.ring_time at row 1 (1292)", the '1591361996' looks like an epoch timestamp for $dlg_var(ringtime). The "Incorrect datetime value" part suggests the database is trying to store the value in a datetime column, so there is a data type mismatch. Double-check, and you may need to either convert the ringtime to a datetime or change the database column to a type that will accept an epoch.

result_fetch(res@ptr, n)': nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index

I have a problem with MSSQL in R, similar to the one in R DBI ODBC error: nanodbc/nanodbc.cpp:3110: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index, but a little different, or perhaps I'm misunderstanding something.
My connection to the DB works fine, and my SELECT works when I send something like this:
third <- DBI::dbGetQuery(con, "SELECT TOP 1
arr_delay_new,
fl_date,
carrier,
origin_city_name,
dest_city_name
FROM Flight_delays
ORDER BY arr_delay_new DESC")
The problem is the column order. I have to return the response in a different order, like this:
third <- DBI::dbGetQuery(con, "SELECT TOP 1
carrier,
arr_delay_new,
fl_date,
origin_city_name,
dest_city_name
FROM Flight_delays
ORDER BY arr_delay_new DESC")
and when I send this request, I get this error:
"result_fetch(res@ptr, n)': nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index"
How can I set this up, or what workaround would let me change the column order?
I'm new to R, so sorry if this is too easy.
EDIT: if you are on odbc-1.3.0 or older, then skip this portion and go to the original answer, below. (Or update and reap the benefits.)
Starting with odbc-1.3.1, the underlying code works around the fundamental ODBC "feature" (bug). With the update, this particular error no longer indicates a problem with column-order (if it occurs at all).
Edit 2: make sure you're using a recent-enough version of Microsoft's ODBC driver (OS-level), either "ODBC Driver 17 for SQL Server" or "ODBC Driver 18 for SQL Server". I don't think (but have not verified that) this is sensitive to the subversions within 17 or 18.
# con <- DBI::dbConnect(...)
DBI::dbExecute(con, "create table test (id int, longstr nvarchar(max), shortstr nvarchar(64))")
DBI::dbWriteTable(con, "test", data.frame(id=1, longstr="hello", shortstr="world"), create=FALSE, append=TRUE)
DBI::dbGetQuery(con, "select * from test")
# id longstr shortstr
# 1 1 hello world
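A quick way to check whether your setup has the fix (the workaround needs odbc 1.3.1 or newer, plus a recent OS-level driver):
packageVersion("odbc")                  # should be >= 1.3.1
unique(odbc::odbcListDrivers()$name)    # look for "ODBC Driver 17/18 for SQL Server"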
Huge accolades to @detule (author of PR !415), and to @Jim (@jimhester on GH) and @krlmlr (among several others) for updating and maintaining odbc.
(for odbc-1.3.0 and older)
Up front, the order of columns matters.
This is a long-standing error when using Microsoft's own ODBC drivers: in the ODBC standard, Microsoft says (arbitrarily, I think, since no other drivers or DBMSes feel this is necessary) that "long data" must all be at the end of the query. "Long data" is vague; even MS's page says "such as 255 characters", and it's not clear whether that's a firm number.
Unfortunately, as long as you're using MS's ODBC drivers for its own SQL Server, it doesn't matter whether this is R or Python or Access; it's still broken. (Actually, they don't think it's broken.)
So the fix is to determine which columns are "long" and ensure they are the last column(s) selected.
For example:
# con <- DBI::dbConnect(...)
DBI::dbExecute(con, "create table test (id int, longstr nvarchar(max), shortstr nvarchar(64))")
DBI::dbGetQuery(con, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='test'")
# column_name data_type character_maximum_length
# 1 id int NA
# 2 longstr nvarchar -1
# 3 shortstr nvarchar 64
In this case, longstr's length is -1 indicating "max"; even 255 would be too big.
DBI::dbWriteTable(con, "test", data.frame(id=1, longstr="hello", shortstr="world"), create=FALSE, append=TRUE)
DBI::dbGetQuery(con, "select * from test")
# Error in result_fetch(res@ptr, n) :
#   nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index
### must reconnect
# con <- DBI::dbConnect(...)
DBI::dbGetQuery(con, "select id, shortstr, longstr from test")
# id shortstr longstr
# 1 1 world hello
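Not part of the original answer, but a sketch of automating that workaround: query information_schema (as above) and build a SELECT that pushes the "long" columns to the end:
cols <- DBI::dbGetQuery(con, "
  select column_name, character_maximum_length
  from information_schema.columns
  where table_name = 'test'")
# treat max (-1) or anything over 255 characters as "long data"
long <- !is.na(cols$character_maximum_length) &
  (cols$character_maximum_length == -1 | cols$character_maximum_length > 255)
sel <- paste(c(cols$column_name[!long], cols$column_name[long]), collapse = ", ")
DBI::dbGetQuery(con, paste("select", sel, "from test"))
# the long columns now come last, so the fetch succeeds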
References:
https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data
https://github.com/r-dbi/odbc/issues/112
https://github.com/r-dbi/odbc/issues/86
https://github.com/nanodbc/nanodbc/issues/228 (discussion about workaround/resolution!)
perhaps many more.
Whether updating the odbc package is enough also depends on the Microsoft ODBC driver you're using. When I upgraded to odbc-1.3.3 the problem returned. Upgrading the Microsoft ODBC driver to "ODBC Driver 17 for SQL Server" (https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15) solved the problem. So, if the above solution does not work for you, try upgrading your driver.

pandas read_sql Imports NULLs as 0s

Edit:
Apparently my connection (PYODBC) is reading in the NULLs as 0s. I'd like to correct this on a "higher" level as I'll be reading in lots of SQL files. I can't find anything in the documentation to prevent this. Any suggestions?
I am importing a SQL file/query as a string and then using pd.read_sql to query my database. In SQL Server I can do calculations between NULL values, which results in a NULL value; in Python, however, this errors out my script.
Query ran in SQL Server:
SELECT PurchaseOrderID, POTotalQtyVouched, NewColumn = PurchaseOrderID/POTotalQtyVouched FROM #POQtys;
Here is my desired output after running the query (which works in SQL Server):
PurchaseOrderID  POTotalQtyVouched  NewColumn
NULL             NULL               NULL
007004           8                  875.5
008017           21                 381.761904761905
008478           NULL               NULL
Running the query in Python:
query = '''
...
[Other code defining #POQTYs]
...
SELECT PurchaseOrderID, POTotalQtyVouched, NewColumn = PurchaseOrderID/POTotalQtyVouched FROM #POQtys;
'''
conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                      'Server=MSS-SL-SQL;'
                      'Database=TRACE DB;'
                      'Trusted_Connection=yes;')
df = pd.read_sql(query, conn)
Error in Python:
DataError: ('22012', '[22012] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Divide by zero error encountered. (8134) (SQLFetch)')

saved data frame is not shown correctly in sql server

I have a data frame named distTest which has columns in UTF-8 encoding. I want to save distTest as a table in my SQL database. My code is as follows:
library(RODBC)
load("distTest.RData")
Sys.setlocale("LC_CTYPE", "persian")
dbhandle <- odbcDriverConnect('driver={SQL Server};server=****;database=TestDB;
trusted_connection=true',DBMSencoding="UTF-8" )
Encoding(distTest$regsub)<-"UTF-8"
Encoding(distTest$subgroup)<-"UTF-8"
sqlSave(dbhandle,distTest,
tablename = "DistBars", verbose = T, rownames = FALSE, append = TRUE)
I set DBMSencoding for my connection and the encodings Encoding(distTest$regsub) <- "UTF-8" and Encoding(distTest$subgroup) <- "UTF-8" for my columns. However, when I save it to SQL, the columns are not shown in the correct format; they come out as garbled characters.
When I set fast = FALSE in the sqlSave function, I got this error:
Error in sqlSave(dbhandle, Distbars, tablename = "DistBars", verbose = T, :
  22001 8152 [Microsoft][ODBC SQL Server Driver][SQL Server]String or binary data would be truncated.
  01000 3621 [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.
  [RODBC] ERROR: Could not SQLExecDirect 'INSERT INTO "DistBars" ( "regsub", "week", "S", "A", "F", "labeled_cluster", "subgroup", "windows" ) VALUES ( 'ظâ€', 5, 4, 2, 3, 'cl1', 'ط­ظ…ظ„ ط²ط¨ط§ظ„ظ‡', 1 )'
I also tried NVARCHAR(MAX) for the UTF-8 columns in the table design; with fast = FALSE the error goes away, but the encoding problem remains.
By the way, part of the data is exported as RData here.
Why is the data not shown correctly in SQL Server 2016?
UPDATE
I am now fully convinced that there is something wrong with the RODBC package.
I tried inserting into the table with
sqlQuery(channel = dbhandle, "insert into DistBars
         values(N'7من', NULL, NULL, NULL, NULL, NULL, NULL, NULL)")
as a test, and the format is still wrong. Unfortunately, adding CharSet=utf8; to the connection string does not work either.
I had the same issue in my code and I managed to fix it by removing rows_at_time = 1 from my connection configuration.
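For reference, a hypothetical version of the connection from the question along those lines, where rows_at_time is simply left at its default rather than being set to 1:
library(RODBC)
dbhandle <- odbcDriverConnect(
  'driver={SQL Server};server=****;database=TestDB;trusted_connection=true',
  DBMSencoding = "UTF-8")   # note: no rows_at_time = 1 argument here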