RODBC loses time values of datetime when result set is large - sql

So this is VERY strange. RODBC seems to drop the time portion of DateTime SQL columns if the result set is large enough. (The queries are running against an SQL Server 2012 machine, and, yes, when I run them on the SQL Server side they produce identical and proper results, regardless of how many rows are returned.)
For example, the following works perfectly:
myconn <- odbcConnect(dsnName, uid, pwd)
results <- sqlQuery(myconn, "SELECT TOP 100 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
In R, the following works as expected:
> results$MyDateTimeColumn[3]
[1] "2013-07-01 00:01:22 PDT"
which is a valid POSIXct date time. However, when somewhere between 10,000 and 100,000 rows are returned, suddenly the time portion disappears:
myconn <- odbcConnect(dsnName, uid, pwd)
bigResults <- sqlQuery(myconn, "SELECT TOP 100000 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
(Same code, simply a larger number of rows returned; note that the exact same row has now lost its time component.) R responds:
> bigResults$MyDateTimeColumn[3]
[1] "2013-07-01 PDT"
Note that the time is now missing (this is not a different row; it's the exact same row as previous), as the following shows:
> strptime(results$MyDateTimeColumn[3], "%Y-%m-%d %H:%M:%S")
[1] "2013-07-01 00:01:22"
> strptime(bigResults$MyDateTimeColumn[3], "%Y-%m-%d %H:%M:%S")
[1] NA
Obviously the work-around is either incremental query-with-append or export-to-CSV-and-import-to-R, but this seems very odd. Anyone ever seen anything like this?
Config: I'm using the latest version of RODBC (1.3-10) and can duplicate the behavior on both an R installation running on Windows x64 and an R installation running on Mac OS X 10.9 (Mavericks).
EDIT #2 Adding output of dput() to compare the objects, per request:
> dput(results[1:10,]$MyDateTimeColumn)
structure(c(1396909903.347, 1396909894.587, 1396909430.903, 1396907996.9, 1396907590.02, 1396906077.887, 1396906071.99, 1396905537.36, 1396905531.413, 1396905231.787), class = c("POSIXct", "POSIXt"), tzone = "")
> dput(bigResults[1:10,]$MyDateTimeColumn)
structure(c(1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000), class = c("POSIXct", "POSIXt"), tzone = "")
It would appear that the underlying data are actually changing as a result of the number of rows returned by the query, which is downright strange.

sqlQuery() has an option called as.is. Setting this to TRUE will pull everything as seen in, for example, Microsoft SQL Server Management Studio.
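A minimal sketch of what that looks like, reusing the connection and column names from the question (the conversion step at the end is an assumption about how you might turn the returned strings back into POSIXct):
myconn  <- odbcConnect(dsnName, uid, pwd)
# as.is = TRUE returns the columns as character, exactly as the driver sends them
results <- sqlQuery(myconn,
                    "SELECT TOP 100 MyID, MyDateTimeColumn FROM MyTable ORDER BY MyDateTimeColumn DESC",
                    as.is = TRUE)
close(myconn)
# Convert the character column yourself, with an explicit time zone
results$MyDateTimeColumn <- as.POSIXct(results$MyDateTimeColumn, tz = "UTC")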

I ran into the same problem as well. Even stranger, on a large dataset one column would import both date and time while the other column only imported the date.
My advice would be to split the date/time in SQL:
myconn <- odbcConnect(dsnName, uid, pwd)
results <- sqlQuery(myconn, "SELECT TOP 100 MyID, format(MyDateTimeColumn, 'HH:mm:ss') as MyTimeColumn, format(MyDateTimeColumn, 'yyyy-MM-dd') as MyDateColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
Then combine them in R afterwards. Hope it helps.
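For completeness, a minimal sketch of the recombination step in R; it assumes MyDateColumn and MyTimeColumn come back as character, and the time zone choice is yours:
# Paste the two character columns together and parse them as POSIXct
results$MyDateTimeColumn <- as.POSIXct(
  paste(results$MyDateColumn, results$MyTimeColumn),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC")   # an explicit time zone avoids DST ambiguity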

I had the same issue and concluded that it is due to DST:
This fails (because 02:00 does not exist on that date in a time zone that springs forward at the DST change):
as.POSIXct(c("2015-03-29 01:59:22", "2015-03-29 02:00:04"))
This works:
as.POSIXct(c("2015-03-29 01:59:22", "2015-03-29 02:00:04"), tz="UTC")
I could not find a way to force tz="UTC" in the default RODBC behavior; however, using as.is = TRUE and converting the columns myself does the job.
Note: at first I had the impression that it was due to the huge result, but in fact a huge result simply has more chances of crossing a DST transition.
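A minimal sketch of that workaround, assuming the same connection and column names as in the question:
myconn <- odbcConnect(dsnName, uid, pwd)
# as.is = TRUE returns the columns as character instead of converting to POSIXct
bigResults <- sqlQuery(myconn,
                       "SELECT TOP 100000 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC",
                       as.is = TRUE)
close(myconn)
# Convert manually with tz = "UTC" so nonexistent local times cannot become NA
bigResults$MyDateTimeColumn <- as.POSIXct(bigResults$MyDateTimeColumn, tz = "UTC")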

This is an older question, but I had similar issues when trying to programmatically read in data from 15 different .accdb files. All POSIXct fields were read in correctly for every database except those from the months of March, from which I inferred that it is some sort of daylight-saving-time issue.
The solution for me (because I didn't want to have to make multiple queries to a DB and then rbind() everything together) was to alter my function to include the lines
#Get initial tz
current_tz <- Sys.timezone()
Sys.setenv(TZ = "America/Phoenix")
[BODY OF FUNCTION]
Sys.setenv(TZ = current_tz)
After including these few lines, the day/time fields from the March databases were being read in correctly.

sqlQuery(ch, getSQL(sqlquerypath))
stripped the times off my datetime column.
sqlQuery(ch, getSQL(sqlquerypath), as.is = TRUE)
fixed the issue.

I think this is a case of times being stripped from the dates where the date range includes shifts to/from daylight saving time. If you are selecting periods that don't include daylight saving shifts, the times will be retained (e.g., from 1/1/2007 to 3/1/2007). This could possibly be avoided by changing the system time zone on your computer to one (e.g., Arizona) where there are no daylight saving shifts (sounds bizarre, but it has worked for me).
To overcome this issue, import the DateTimes as characters (using "as.is") and then convert them to POSIXct. You could alternatively use "strptime", which converts to POSIXlt and allows you to specify the format. Here's an example of a SQL query where the DateTimes are imported as.is (TRUE) but the associated DataValues are not (FALSE), and then the DateTime is converted to an R date format:
data <- sqlQuery(channel, paste("SELECT LocalDateTime, DataValue FROM DataValues WHERE LocalDateTime >= '1/1/2007 0:00' AND LocalDateTime < '1/1/2008 0:00' ORDER BY LocalDateTime ASC"),as.is=c(TRUE,FALSE))
data$LocalDateTime <- as.POSIXct(data$LocalDateTime, tz="MST")

It may be a daylight saving issue. If there is a time that doesn't exist in your time zone (because of daylight saving), it may cause something like this.

Why does this happen on large datasets returned from sqlQuery()? I don't know. But I was able to work around it by applying a SQL conversion in the SQL call:
data <- sqlQuery(channel, "SELECT CONVERT(nvarchar(24), DtTimeField, 21) AS HourDt, * FROM ...
This is your workaround.
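If you then want a proper POSIXct column back in R, a minimal follow-up sketch (it assumes HourDt comes back as character in the "yyyy-mm-dd hh:mm:ss.mmm" form that style 21 produces):
# Parse the character column returned by CONVERT(..., 21); %OS keeps fractional seconds
data$HourDt <- as.POSIXct(data$HourDt,
                          format = "%Y-%m-%d %H:%M:%OS",
                          tz = "UTC")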

Related

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
  bigrquery::bigquery(),
  project = 'myproject',
  dataset = 'dataset',
  billing = 'myproject'
)
Then performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
  "SELECT
     `myproject.dataset.table_1`.Pokemon,
     coalesce(`myproject.dataset.table_1`.Type_1, `myproject.dataset.table_2`.Type_1) AS Type_1,
     coalesce(`myproject.dataset.table_1`.Type_2, `myproject.dataset.table_2`.Type_2) AS Type_2,
     `myproject.dataset.table_1`.Total,
     `myproject.dataset.table_1`.HP,
     `myproject.dataset.table_1`.Attack,
     `myproject.dataset.table_1`.Special_Attack,
     `myproject.dataset.table_1`.Defense,
     `myproject.dataset.table_1`.Special_Defense,
     `myproject.dataset.table_1`.Speed
   FROM `myproject.dataset.table_1`
   LEFT JOIN `myproject.dataset.table_2`
     ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
   ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended, and now I'd like to query that table, but... where is it? How do I connect to it? Can I save it locally so that I can start working on my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn@quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care, as there is no check on the size of the results, so it is easy to run out of memory when doing this.
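For example (a minimal sketch; the query text is illustrative, reusing a table name from your question):
local_df = dbGetQuery(con, "SELECT * FROM `myproject.dataset.table_1` LIMIT 100")
head(local_df)  # the result is now an ordinary in-memory data frame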
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to write R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
con <- dbConnect(...) # your connection details here

remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")

new_remote_tbl = remote_tbl1 %>%
  left_join(remote_tbl2, by = "Pokemon", suffix = c("", ".y")) %>%
  mutate(Type_1 = coalesce(Type_1, Type_1.y),
         Type_2 = coalesce(Type_2, Type_2.y)) %>%
  select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
  arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disk - but you can query it and interact with it as if it were one, and the database will produce it for you on demand.)
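So, for example, the follow-up query you attempted against the temporary table can be expressed against new_remote_tbl directly (a sketch, assuming the column names from your question):
missing_type = new_remote_tbl %>%
  filter(is.na(Type_1))   # dbplyr translates is.na() to IS NULL in the generated SQL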
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = new_remote_tbl %>%
  collect()

Bad performance when filtering Azure logs - WCF Data Services filters

Azure Diagnostics is pushing Windows Events into a storage table "WADWindowsEventLogsTable".
I would like to query this storage table using Visual Studio 2015 and Cloud Explorer.
As this table has huge content, I'm waiting indefinitely for the results.
Here is a query sample:
EventId eq 4096 and Timestamp gt datetime'2016-06-24T08:20:00' and Timestamp lt datetime'2016-06-24T10:00:00'
I suppose that this query is correct?
Is there a way to improve performance?
filter the result columns?
return only the TOP X results?
any other useful tips?
I know that a better way would be to script this, for example using Python, but I would like to use the UI as much as possible.
(Edit) Following Gaurav Mantri's answer, I used this little C# program to build my query. The response is now so quick that it solves my initial performance issue:
static void Main(string[] args)
{
    string startDate = "24 June 2016 8:20:00 AM";
    string endDate = "24 June 2016 10:00:00 AM";
    string startPKey = convertDateToPKey(startDate);
    string endPKey = convertDateToPKey(endDate);

    Debug.WriteLine("(PartitionKey gt '" + startPKey + "'"
        + " and PartitionKey le '" + endPKey + "')"
        + " and (EventId eq 4096)"
    );
}

private static string convertDateToPKey(string myDate)
{
    System.DateTime dt = System.Convert.ToDateTime(myDate);
    long dt2ticks = dt.Ticks;
    string ticks = System.Convert.ToString(dt2ticks);
    return "0" + ticks;
}
NB: for those, like me, who spent a long time searching for how to export the results to a CSV file: the icon shown in the original screenshot is your answer (and no, it's not an 'undo' button ;) ).
In your query, you're filtering on the Timestamp attribute, which is not indexed (only the PartitionKey and RowKey attributes are indexed). Thus your query is doing a full table scan (i.e. going from the 1st record until it finds matching records) and hence is not optimized.
In order to avoid the full table scan, please use PartitionKey in your query. In the case of WADWindowsEventLogsTable, the PartitionKey essentially represents the date/time value in ticks. What you would need to do is convert the date/time range for which you want to get the data into ticks, prepend a 0 in front of it, and then use it in the query.
So your query would be something like:
(PartitionKey gt 'from date/time value in ticks prepended with 0' and PartitionKey le 'to date/time value in ticks prepended with 0') and (EventId eq 4096)
I wrote a blog post about it some time ago that you may find useful: http://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/

Getting more results when adding a condition to the query, on one specific server

I wrote an Access add-in (VBA) which works perfectly fine on my own and other test servers, but on one server I encountered a very strange problem. When I execute the query with just the data conditions, I get the correct results (which are none). When I add the date condition, I get more results than before. These results do not even match my other conditions. I use the exact same database on both servers.
Here is the Query:
select BE.* from dbo.KHKBuchungserfassung AS BE
left join KHKSachkonten AS SSK on BE.KtoSoll = SSK.SaKto and BE.Mandant = SSK.Mandant
left join KHKSachkonten AS HSK on BE.KtoHaben = HSK.SaKto and BE.Mandant = HSK.Mandant
where ((((BE.KtoSollTyp = 1) or (BE.KtoSollTyp = 2)) and (HSK.Kontenart = 'BF' or HSK.Kontenart = 'BG')) or
(((BE.KtoHabentyp = 1) or (BE.KtoHabentyp = 2)) and (SSK.Kontenart = 'BF' or SSK.Kontenart = 'BG')))
and BE.Mandant= 88 and BE.Buchungsdatum>={d '2016-01-01'} AND BE.Buchungsdatum<={d '2016-06-30'}
If I execute this on that specific server with a from-date earlier than a specific date ('2016-02-24'), it also works perfectly fine (e.g. '2016-01-01' & '2016-06-30'). Any from-date later than that one gives results that violate my first condition... (the one in the parentheses). I already checked the number of parentheses. I also rotated the places of the given conditions. In addition to that, I am pretty sure my DBMS would give me an error if there were a problem with my syntax (not sure though).
This is how I open the result set:
Dim rs As ADODB.Recordset
sQry = ...
Set rs = goMandant.oData.rsOpenRecordset(sQry, adOpenStatic)
Do Until rs.EOF
...
I am sorry that I cannot provide test data due to legal reasons, but if you also select SSK.Kontenart and HSK.Kontenart, there is 'NULL' on both sides, which should not be possible at all given my first condition...
I would also like to note that this is my first question and if I made any mistakes, I would appreciate it if you told me.
Regards
TK

How to fix error "Conversion failed when converting datetime from character string" in SQL Query?

The error that I found in the log is the one below.
'Illuminate\Database\QueryException' with message 'SQLSTATE[22007]:
[Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Conversion
failed when converting date and/or time from character string. (SQL:
SELECT COUNT(*) AS aggregate
FROM [mytable]
WHERE [mytable].[deleted_at] IS NULL
AND [created_at] BETWEEN '2015-09-30T00:00:00' AND '2015-09-30T23:59:59'
AND ((SELECT COUNT(*) FROM [mytable_translation]
WHERE [mytable_translation].[item_id] = [mytable].[id]) >= 1)
)'
in
wwwroot\myproject\vendor\laravel\framework\src\Illuminate\Database\Connection.php:625
In the database, the column's data type is datetime and it is NOT NULL.
Based on marc_s's answer I tried to change the format that I'm sending to the database. So I tried without the T, as in [created_at] BETWEEN '2015-09-30 00:00:00' AND '2015-09-30 23:59:59'.
Locally I'm using MySQL, and the code works just fine. If I test the query above in the SQL Server client, both versions (with and without the T) work too.
How can I fix this problem without making any changes to the database itself?
The PHP/Laravel code:
$items = $items->whereBetween($key, ["'".$value_aux."T00:00:00'", "'".$value_aux."T23:59:59'"]);
With @lad2025's help, I got it to work.
Based on his comments on my question, I changed the format that I was passing in the code (Laravel/PHP in this case). In reality, I "removed" the formatting itself and just built the values in variables before passing them to the query; this way, I let the database decide the format it wants.
Instead of
$itens = $itens->whereBetween($key, ["'".$value_aux."T00:00:00'", "'".$value_aux."T23:59:59'"]);
I changed the code to this:
$sta = $value_aux."T00:00:00";
$end = $value_aux."T23:59:59";
$itens = $itens->whereBetween($key, [$sta, $end]);

Limit number of characters imported from SQL in R

I am using the sqlQuery function in R to connect to the DB from R. I am using the following lines:
for (i in 1:length(Counter)) {
  if (Counter[i] %in% str_sub(dir(), 1, 29) == FALSE) {
    DT <- data.table(sqlQuery(con, paste0(
      "select a.* from edp_data.sme_loan a where a.edcode IN (",
      print(paste0("\'", EDCode, "\'"), quote = FALSE),
      ") and a.poolcutoffdate in (",
      print(paste0("\'", str_sub(PoolCutoffDate, 1, 4), "-",
                   str_sub(PoolCutoffDate, 5, 6), "-",
                   str_sub(PoolCutoffDate, 7, 8), "\'"), quote = FALSE),
      ")")))
  }
}
Thus I am importing subsets of the DB by EDCode and PoolCutoffDate. This works perfectly; however, there is one variable in edp_data.sme_loan for one particular EDCode which produces an undesired result.
If I take the unique values of this "as3" variable for that particular EDCode I get:
unique(DT$as3)
[1] 30003000000000019876240886000 30003000000000028672000424000
In reality there should be more unique IDs in this DB. The problem is that the actual string in as3 is much longer than the one which is imported.
nchar(unique(DT$as3))
[1] 29 29
How can I import more characters from this string? Ideally I do not want to list each variable instead of using select a.*, but only make sure that the full string of as3 is imported.
Any help is appreciated!