datetime, string format in Python/Pandas

New to Python/Pandas. Just wondering if it is correct to assume the following are the same:
pd.to_datetime(str(20061231), format='%Y%m%d')
and
pd.Timestamp('2006-12-31')
Also, given
start_date = pd.to_datetime(str(20061231), format='%Y%m%d')
str(20061231) of course produces a string, and it is the same as start_date.strftime('%Y%m%d').
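A quick sanity check (a minimal sketch; it only assumes pandas is installed) confirms both assumptions:
import pandas as pd

start_date = pd.to_datetime(str(20061231), format='%Y%m%d')

# Both expressions build the same Timestamp...
print(start_date == pd.Timestamp('2006-12-31'))        # True

# ...and strftime round-trips back to the original string.
print(start_date.strftime('%Y%m%d') == str(20061231))  # True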

Related

I am getting "CREATE MODEL OPTIONS() format parameter is missing" error in BigQuery while creating a model?

I am getting the error "CREATE MODEL OPTIONS() format parameter is missing" while trying to create an ARIMA model. It seems to be telling me that I need to define a certain format parameter, but I don't understand exactly which one it is asking me to add.
I am using the following script:
CREATE MODEL forecast
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'day',
  time_series_data_col = 'cost',
  auto_arima = TRUE,
  data_frequency = 'AUTO_FREQUENCY',
  decompose_time_series = TRUE
) AS
SELECT
  FORMAT_DATE('%Y-%m-%d', date) AS day,
  SUM(net_cost) AS cost
FROM ads_mif.logs_actual_footprint_cost_daily_raw
GROUP BY 1
'time_series_timestamp_col' must have a type of Timestamp, Date or DateTime, but instead has STRING type in query statement.
Remove the formatting of the date column: FORMAT_DATE returns a STRING, so select the DATE column directly (e.g. date AS day) and 'time_series_timestamp_col' will receive a DATE as required.
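For reference, a minimal sketch of the corrected statement run through the google-cloud-bigquery Python client. The client usage and the dataset-qualified model name ads_mif.forecast are assumptions for illustration; the table and column names come from the question.
from google.cloud import bigquery

client = bigquery.Client()

# Select the DATE column unformatted so the model sees a DATE, not a STRING.
# NOTE: the model name ads_mif.forecast is a hypothetical example.
sql = """
CREATE OR REPLACE MODEL ads_mif.forecast
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'day',
  time_series_data_col = 'cost',
  auto_arima = TRUE,
  data_frequency = 'AUTO_FREQUENCY',
  decompose_time_series = TRUE
) AS
SELECT
  date AS day,
  SUM(net_cost) AS cost
FROM ads_mif.logs_actual_footprint_cost_daily_raw
GROUP BY 1
"""

client.query(sql).result()  # block until the CREATE MODEL job finishes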

Prisma queryRaw returning date as string

Issue: When I use prisma.$queryRaw, it returns my date as a string, even though I specify the query's return type. If I use prisma.find it returns the date correctly, but I have to use queryRaw because of the complexity of the query.
schema.prisma has the date defined like such:
effectiveDate DateTime? @map("effective_date") @db.Date
So, the model object has the field defined like effectiveDate: Date | null
The query looks something like this:
const catalogCourses: CatalogCourse[] = await prisma.$queryRaw<CatalogCourse[]>`
SELECT
id,
campus,
effective_date as "effectiveDate",
...rest of the query omitted here because it's not important
If I then do something like
console.log(`typeof date: ${typeof catalogCourses[0].effectiveDate}, value ${catalogCourses[0].effectiveDate}`)
The result shows typeof date: string, value 2000-12-31. Why isn't it a Date? I need to be able to work with it as a Date, but if I do effectiveDate.getTime(), for example, it errors at runtime with 'getTime is not a function', even though getTime is a documented Date method. If I try to do new Date(effectiveDate), that doesn't work either, because TypeScript sees the field as a Date object already. EDIT: I was incorrect about why the previous statement wasn't working; doing new Date(effectiveDate) does work.
I do see in the prisma docs that it says:
Type caveats when using raw SQL: When you type the results of $queryRaw, the raw data does not always match the suggested TypeScript type.
Is there a way for queryRaw to return my date as a Date object?

Cast to long datatype - BigQuery

BigQuery and SQL noob here. I was going through the possible data types BigQuery supports here. I have a column in Bigtable which is of type bytes; its original data type is a Scala Long. It was converted to bytes and stored in Bigtable by my application code. I am trying to do CAST(itemId AS integer) (where itemId is the column name) in the BigQuery UI, but the output of CAST(itemId AS integer) is 0 instead of the actual value. I have no idea how to do this. If someone could point me in the right direction, I would greatly appreciate it.
EDIT: Adding more details
Sample itemId is 190007788462
Following is the code which writes itemId to Bigtable. I have included the relevant method; it uses the HBase client to write to Bigtable.
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.util.Bytes

def toPut(key: String, itemId: Long): Put = {
  val TrxColumnFamily = Bytes.toBytes("trx")
  val ItemIdColumn = Bytes.toBytes("itemId")
  new Put(Bytes.toBytes(key))
    .addColumn(TrxColumnFamily, ItemIdColumn, Bytes.toBytes(itemId))
}
Following is the entry in Bigtable based on the above code:
ROW COLUMN+CELL
foo column=trx:itemId, value=\x00\x00\x00\xAFP]F\xAA
Following is the relevant code which reads the entry from Bigtable in Scala. This works correctly; Result is an org.apache.hadoop.hbase.client.Result.
private def getItemId(row: Result): Long = {
  val key = Bytes.toString(row.getRow)
  val TrxColumnFamily = Bytes.toBytes("trx")
  val ItemIdColumn = Bytes.toBytes("itemId")
  Bytes.toLong(row.getValue(TrxColumnFamily, ItemIdColumn))
}
The getItemId function above correctly returns itemId. That's because Bytes.toLong is part of org.apache.hadoop.hbase.util.Bytes, which correctly decodes the byte string to a Long.
I am using the BigQuery UI (similar to this one) with CAST(itemId AS integer) because BigQuery doesn't have a Long data type. This incorrectly casts the itemId byte string to an integer, and the resulting value is 0.
Is there any way I can have a Bytes.toLong equivalent from hbase-client in BigQuery UI? If not is there any other way I can go about this issue?
Try this:
SELECT CAST(CONCAT('0x', TO_HEX(itemId)) AS INT64) AS itemId
FROM YourTable;
It converts the bytes into a hex string, then casts that string into an INT64. Note that the query uses standard SQL, as opposed to legacy SQL. If you want to try it with some sample data, you can run this query:
WITH `YourTable` AS (
  SELECT b'\x00\x00\x00\xAFP]F\xAA' AS itemId UNION ALL
  SELECT b'\xFA\x45\x99\x61'
)
SELECT CAST(CONCAT('0x', TO_HEX(itemId)) AS INT64) AS itemId
FROM `YourTable`;
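For intuition, Bytes.toLong is just a big-endian 8-byte decode, which is exactly what the TO_HEX/CAST trick reproduces. A quick Python check on the first sample value from the query above:
# Big-endian decode of the 8 stored bytes, i.e. what HBase's Bytes.toLong does.
raw = b'\x00\x00\x00\xAFP]F\xAA'
print(int.from_bytes(raw, byteorder='big', signed=True))  # 752967567018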

SparkSQL errors when using SQL DATE function

In Spark I am trying to execute SQL queries on a temporary table derived from a DataFrame that I manually built by reading a CSV file and converting the columns to the right data types.
Specifically, the table I'm talking about is the LINEITEM table from the TPC-H specification [1]. Contrary to the specification, I am using TIMESTAMP rather than DATE because I've read that Spark does not support the DATE type.
In my single Scala source file, after creating the DataFrame and registering a temporary table called "lineitem", I am trying to execute the following query:
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01 00:00:00');")
When I submit the packaged jar using spark-submit, I get the following error:
Exception in thread "main" java.lang.RuntimeException: [1.75] failure: ``union'' expected but but `;' found
When I omit the semicolon and do the same thing, I get the following error:
Exception in thread "main" java.util.NoSuchElementException: key not found: date
Spark version is 1.4.0.
Does anyone have an idea what's the problem with these queries?
[1] http://www.tpc.org/TPC_Documents_Current_Versions/pdf/tpch2.17.1.pdf
SQL queries passed to SQLContext.sql shouldn't be delimited with a semicolon - this is the source of your first problem.
The DATE UDF expects a date in the yyyy-MM-dd form, so DATE('1998-12-01 00:00:00') evaluates to null. Since a timestamp can be cast to DATE, the correct query string looks like this:
"SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01')"
DATE is a Hive UDF. That means you have to use a HiveContext, not a standard SQLContext - this is the source of your second problem.
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc) // where sc is a SparkContext
In Spark >= 1.5 it is also possible to use to_date function:
import org.apache.spark.sql.functions.{lit, to_date}
df.where(to_date($"shipdate") <= to_date(lit("1998-12-01")))
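The PySpark equivalent, as a sketch (assuming the same lineitem DataFrame is bound to df):
from pyspark.sql.functions import lit, to_date

# Same filter as the Scala snippet above.
res = df.where(to_date(df["shipdate"]) <= to_date(lit("1998-12-01")))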
Please try the Hive function CAST(expression AS datatype).
It changes an expression from one datatype to another.
For example, CAST('2016-06-17 00.00.000' AS DATE) will convert a String to a Date.
In your case:
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE CAST(l.shipdate AS DATE) <= CAST('1998-12-01 00:00:00' AS DATE)")
Supported datatype conversions are listed in Hive Casting Dates.

ToDate function provides unexpected output

I used the ToDate(userinput, format) function to convert my chararray field. I used ToDate(userinput, 'MM/dd/yyyy') to convert the field from chararray to date, but it looks like I am not seeing the output I had expected.
Here is the code:
l_dat = load 'textfile' using PigStorage('|') as (first:chararray,last:chararray,dob:chararray);
c_dat = foreach l_dat generate ToDate(dob,'MM/dd/yyyy') as mydate;
describe c_dat;
dump c_dat;
data looks like this:
(firstname1,lastname1,02/02/1967)
(John,deloy,05/26/1967)
(frank,fun,05/18/1967)
Output looks like this:
c_dat: {mydate: datetime}
(1967-05-26T00:00:00.000-04:00)
(1967-05-18T00:00:00.000-04:00)
(1967-02-02T00:00:00.000-05:00)
The output i was expecting was dateObjects with data as shown below:
(05/26/1967)
(05/18/1967)
(02/02/1967)
Please advise if I am doing anything wrong.
Ref: http://pig.apache.org/docs/r0.12.0/func.html#to-date - the return type of the ToDate function is a DateTime object. You can observe that in the schema description shown in the output:
c_dat: {mydate: datetime}
If the date is already in the required format, you need not do any conversion:
c_dat = foreach l_dat generate dob as mydate;
If you are interested in converting the chararray date to any other format, then you have to use the ToString() function after getting the DateTime object.
Step 1: Convert the date chararray to a DateTime object using ToDate(datestring, inputformat).
Step 2: Use ToString(DateTime object, requiredformat) to get the string date in the required format.
This can be achieved in a single step as below.
ToString(ToDate(date, inputformat), requiredformat);
Ref: http://pig.apache.org/docs/r0.12.0/func.html#to-string for details.
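For comparison, the same parse-then-format round trip in Python (illustrative only; in the Pig script itself, the ToDate/ToString built-ins above are the way to go):
from datetime import datetime

# Parse with the input format, then render in the required format.
d = datetime.strptime('05/26/1967', '%m/%d/%Y')  # ToDate equivalent
print(d.strftime('%m/%d/%Y'))                    # ToString equivalent -> 05/26/1967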