Spark time datatype equivalent to MySQL TIME - sql

I am importing data into Spark from MySQL through JDBC, and one of the columns has a time type (SQL type TIME and JDBC type java.sql.Time) with large hour values (e.g. 168:03:01). Spark converts them to timestamp format, which causes an error when reading a three-digit hour. How do I deal with the TIME type in Spark?

Probably your best shot at the moment is to cast the data before it is actually read by Spark and parse it directly in your application. The JDBC data source allows you to pass a valid subquery as the dbtable option or table argument. It means you can, for example, do something similar to this:
sqlContext.read.format("jdbc").options(Map(
  "url" -> "xxxx",
  "dbtable" -> "(SELECT some_field, CAST(time_field AS CHAR) AS time_field FROM table) tmp"
)).load()
and then use some combination of built-in functions to convert it in Spark to a type that is appropriate for your application.
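Alternatively, since the subquery is executed by MySQL anyway, you could push the whole conversion down and let Spark read a plain number. A minimal sketch, assuming a column of seconds (the alias time_field_seconds is made up here) is acceptable for your application; MySQL's TIME_TO_SEC handles hour values above 23:
-- pushed-down subquery for the dbtable option:
-- converts the TIME column to seconds on the MySQL side, so Spark just reads an integer
(SELECT some_field,
        TIME_TO_SEC(time_field) AS time_field_seconds
 FROM table) tmp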

Related

AsterixDB unable to import datetime when importing from CSV file (SQL)

I am attempting to load a database from a CSV file using AsterixDB. Currently, it works using only string, int, and double fields. However, I have a column in the CSV file that is in DateTime format. Currently I am importing them as strings, which works fine, but I would like to import them as the SQL DateTime data type. When I try changing my schema and reimporting I get the following error:
ERROR: Code: 1 "org.apache.hyracks.algebricks.common.exceptions.NotImplementedException: No value parser factory for fields of type datetime"
All entries are in this format 02/20/2010 12:00:00 AM.
I know this isn't exactly in line with the format specified by the Asterix Data Model; however, I tried a test line with the proper format and the error persisted.
Does this mean AsterixDB can't parse DateTime when doing mass imports? And if so, how can I get around this issue?
Any help is much appreciated.
Alright, after discussing with some colleagues, we believe that AsterixDB does not currently support DateTime parsing when mass importing. Our solution was to upsert every entry in the dataset with the parsing built into the query.
We used the following query:
upsert into csv_set (
SELECT parse_datetime(c.Date_Rptd, "M/D/Y h:m:s a") as Datetime_Rptd,
parse_datetime(c.Date_OCC, "M/D/Y h:m:s a") as Datetime_OCC,
c.*
FROM csv_set c
);
As you can see we parse the strings using the parse_datetime function from the AsterixDB Temporal Functions library. This query intentionally doesn't erase the column with the DateTimes in string format, although that would be very simple to do if your application requires it. If anyone has a better or more elegant solution please feel free to add to this thread!
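If you are unsure about the format string before rewriting the whole dataset, it is easy to sanity-check parse_datetime against a single literal first. A sketch, using the sample value from the question:
SELECT VALUE parse_datetime("02/20/2010 12:00:00 AM", "M/D/Y h:m:s a");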

Querying data with type not supported by Trino

I use Trino to consume data from a MariaDB table.
I have a specific column in this table with geographical data (Point data, https://mariadb.com/kb/en/geometry-types/). Querying the source, the data appears like this:
SELECT location FROM x.y.z
location
---------------------------
POINT (51.566682 83.32865)
POINT (46.77708 16.32856)
POINT (84.857691 4.295681)
But this kind of data isn't supported by Trino (https://trino.io/docs/current/connector/mariadb.html)
I just want the values (x, y) inside POINT(x,y).
The documentation mentions a flag unsupported-type-handling=CONVERT_TO_VARCHAR, but when I use it the data comes back like this:
location
-------------------------
�Q�GHJk ���*#
�{���GMg'���(#
0�Z¶nK#�B< / #
I tested a lot of conversions on this varchar, but none of them worked. So how can I get this kind of data using Trino?
The datatype is not text, so converting it will not help.
You can always use functions native to MariaDB, as long as they return datatypes that are allowed.
Table functions
The connector provides specific table functions to access MariaDB.
query(varchar) -> table
The query function allows you to query the underlying database directly. It requires syntax native to MariaDB, because the full query is pushed down and processed in MariaDB. This can be useful for accessing native features which are not available in Trino or for improving query performance in situations where running a query natively may be faster.
so a query like this will work:
SELECT
  X, Y
FROM
  TABLE(
    mariadb.system.query(
      query => 'SELECT
                  ST_X(location) AS X,
                  ST_Y(location) AS Y
                FROM
                  mytable'
    )
  );

Create non-quoted identifier column names when creating a SQL Developer table in Python

After connecting to the Oracle DB, I created a table via Python (which I then view in SQL Developer). I load my CSV into the dataframe and convert the datatypes to VARCHAR for a faster load into the table (because the default str type takes an unreasonable amount of time). The data loads fast and it is all present, but the issue is that when interrogating the data in SQL Developer, I am forced to put quotes around the column names for them to be recognised, and when trying to perform simple operations on the data such as SELECT * FROM new_table ORDER BY 'CATEGORY' asc, SQL cannot seem to sort my data at all. Does anyone have any suggestions please? Below is a snippet of the Python code I have used:
import os
import pandas as pd
from sqlalchemy import Table, Column, VARCHAR, types

os.chdir('C:\\Oracle\\instantclient_19_5')

# explicit table definition (mixed-case column names become quoted identifiers in Oracle)
dataload = Table('new_table', meta,
                 Column('Category', VARCHAR(80)),
                 Column('Code', VARCHAR(80)),
                 Column('Medium', VARCHAR(80)),
                 Column('Source', VARCHAR(80)),
                 Column('Start_date', VARCHAR(80)),
                 Column('End_date', VARCHAR(80)))
meta.create_all(engine)

df3 = pd.read_csv(fname)
dtyp = {c: types.VARCHAR(df3[c].str.len().max())
        for c in df3.columns[df3.dtypes == 'object'].tolist()}
df3.to_sql('new_table', engine, index=False, if_exists='replace', dtype=dtyp)
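(As an aside: one likely reason the ORDER BY does nothing is that 'CATEGORY' in single quotes is a string literal, not a column name, so every row sorts by the same constant. Assuming the columns really are created with mixed-case names as in the snippet above, Oracle treats them as case-sensitive quoted identifiers, so a sketch of a query that should sort is:
-- double quotes reference the case-sensitive identifier created above;
-- single quotes would make a string literal and sort nothing
SELECT * FROM new_table ORDER BY "Category" ASC;
Creating the columns with all-lowercase names instead would let them be emitted unquoted and folded to Oracle's default form, so no quoting would be needed afterwards.)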

Are there SQL datatypes that don't work with R?

I am trying to run an sqlQuery in RStudio which seems to crash the program. I want to use the RODBC package to import a column called package_name and the elapsed time from an Oracle database. When I try to run an sqlQuery such as the following:
dataframe <- sqlQuery(channel,
"select package_name, elapsed_time from fooSchema.barTable")
When I run this with just the package_name or other fields in the table, it works fine. If I try to run this with the elapsed_time, RStudio crashes. The datatype of elapsed_time is INTERVAL DAY (3) TO SECOND (6) so one record for example looks like this, "+000 00:00:00.22723"
Are there certain data types, such as Interval Day to Second, from Oracle that don't work in RStudio or R in general?
The problem isn't R, RStudio, or even RODBC. The problem is that Oracle doesn't support interval data types for ODBC connections.
It is documented under section E.1:
https://docs.oracle.com/cd/B28359_01/server.111/b32009/app_odbc.htm#CIHBFHCG
To get back to your question in a more general sense. Base R supports Date, POSIXct, and POSIXlt objects.
Dates and POSIXct objects are stored as the number of days/seconds respectively since 1/1/1970 whereas POSIXlt is a list of elements.
Whatever SQL connector you're using will need to coerce the SQL version of a date and time into one of the above. Sometimes it will just convert to a character string. For instance, with RPostgreSQL, columns stored as Postgres's Date type come back as character, but Postgres timestamp columns will be coerced into POSIXct directly.
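If you just need the values in R, a common workaround (a sketch, not tested against this schema) is to convert the interval to text on the Oracle side, so that the ODBC driver only ever sees a supported type:
-- TO_CHAR renders the interval in its canonical text form, which ODBC can return
SELECT package_name,
       TO_CHAR(elapsed_time) AS elapsed_time
FROM fooSchema.barTable
You can then parse or split the resulting string in R as needed.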

FileMaker TimeStamp field UPDATE using SQL

I need to update a FileMaker timestamp field with a timestamp taken from PHP and put into a script using the PHP API and the executeSQL API and plugin,
so
UPDATE table SET time ='2011-05-27 11:28:57'
My question is as follows: how do I utilise the available scripting functions within FileMaker Pro 11 to convert the string that is being supplied within the SQL statement to an acceptable timestamp format for FileMaker? Or is it possible to do the conversion within the ExecuteSQL() function of the Execute SQL plugin?
I haven't tried it out, but it should work using CAST:
CAST( expression AS type [ (length) ] )
so, it should read:
UPDATE table SET time = CAST ('2011-05-27 11:28:57' AS TIMESTAMP)
However, please be aware that FileMaker's own ExecuteSQL() function doesn't support UPDATE or INSERT INTO statements. You need to get a free extension from Dracoventions called epSQLExecute() in order to do this.
Hope this helps (someone).
Gary
You haven't given us much to go on, but my guess would be that you are updating a timestamp column with a string that does not match the required format.
You should convert your string to the appropriate object and then the update should work.
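For example (a sketch, assuming FileMaker's SQL dialect accepts typed timestamp literals, as its ODBC/JDBC SQL reference suggests), you could supply the value as a timestamp literal rather than a bare string:
UPDATE table SET time = TIMESTAMP '2011-05-27 11:28:57'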