Convert string date to date format in pyspark SQL

I have a date column in a table that is stored in string format.
I need to convert this string into a date type.
This is what my date column looks like:
+----------+
| date|
+----------+
|2018_07 |
+----------+
I need to convert it to the format below, as a date type rather than a string:
+----------+
| Date|
+----------+
|2018-07-01|
+----------+
I am trying to use this, but it's giving me null values in the date column:
%sql
SELECT Col1,Col2,Col3,Col4,
TO_DATE(
CAST(
UNIX_TIMESTAMP(date, 'yyyy-MM-01') AS TIMESTAMP
)
) as Date
,sales
FROM db.table
Any kind of help is appreciated

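I simplified your code; it works now (just tested it) :-) The nulls came from the pattern: 'yyyy-MM-01' describes the output you want, not the 'yyyy_MM' input, so the string fails to parse. TRIM guards against the trailing spaces your sample appears to contain.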
%sql
SELECT
Col1,
Col2,
Col3,
Col4,
TO_DATE(TRIM(date), 'yyyy_MM') AS DATE,
sales
FROM
db.table
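
The same idea (describe the input's shape in the pattern, then let the missing day default to 1) can be sketched outside Spark. This is only an illustration using Python's standard datetime module: the helper name parse_year_month is hypothetical, and strptime's "%Y_%m" stands in for Spark's 'yyyy_MM'.

```python
from datetime import date, datetime


def parse_year_month(raw: str) -> date:
    """Parse a 'yyyy_MM' string (possibly padded with spaces) into a date.

    The day defaults to 1 because the input carries no day component.
    """
    # strip() plays the role of TRIM() in the SQL answer.
    return datetime.strptime(raw.strip(), "%Y_%m").date()


print(parse_year_month("2018_07 "))  # 2018-07-01
```

Unlike Spark, which returns null for a string that does not match the pattern, strptime raises a ValueError.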

Related

How do I create a new column showing difference between maximum date in table and date in row?

I need two columns: 1 showing 'date' and the other showing 'maximum date in table - date in row'.
I kept getting a zero in the 'datediff' column, and thought a nested select would work.
SELECT date, DATEDIFF(max_date, date) AS datediff
(SELECT MAX(date) AS max_date
FROM mytable)
FROM mytable
GROUP BY date
Currently getting this error from the above code : mismatched input '(' expecting {, ';'}(line 2, pos 2)
Correct format in the end would be:
date | datediff
--------------------------
2021-08-28 | 0
2021-07-26 | 33
2021-07-23 | 36
2021-08-11 | 17
If you want the date difference, you can use:
SELECT date, DATEDIFF(MAX(date) OVER (), date) AS datediff
FROM mytable
GROUP BY date
You can do this using the analytic function MAX() Over()
SELECT date, MAX(date) OVER() - date FROM mytable;
Tried this here on sqlfiddle
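
The nested select from the question can also work, as long as the subquery sits inside an expression instead of between the select list and FROM (that placement is what triggers the "mismatched input" error). Below is a minimal sketch of that variant. It runs against an in-memory SQLite database through Python's sqlite3 module purely as a stand-in for your engine, and SQLite's julianday() replaces DATEDIFF, so treat the function names as assumptions to adapt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date TEXT)")
conn.executemany(
    "INSERT INTO mytable (date) VALUES (?)",
    [("2021-08-28",), ("2021-07-26",), ("2021-07-23",), ("2021-08-11",)],
)

# The scalar subquery yields the single latest date; julianday() turns
# ISO date strings into day numbers so the subtraction gives whole days.
rows = conn.execute(
    """
    SELECT date,
           CAST(julianday((SELECT MAX(date) FROM mytable))
                - julianday(date) AS INTEGER) AS datediff
    FROM mytable
    GROUP BY date
    ORDER BY date DESC
    """
).fetchall()

for row in rows:
    print(row)
```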

MSSQL Sum by EntryDate

I have a table called "Data". With columns: "Number" and "EntryDate".
The EntryDate is Datetime( Y-m-d h:i:s ).
I need to calculate the sum of all entries for each date, ignoring h:i:s, grouped by date.
Example:
Number | EntryDate
-------------------
23 | 2018-10-01 13:22:10.520
25 | 2018-10-01 11:16:09.533
So basically I need to SUM the Number from 2018-10-01.
I have tried several variations but nothing seems to work, for example:
SELECT
SUM(Number) as 'Sum',
EntryDate AS DATE
FROM Data
GROUP BY EntryDate
Use the cast() function to convert datetime to date:
SELECT
SUM(Number) as 'Sum', cast(EntryDate as date) AS [Date]
FROM Data
GROUP BY cast(EntryDate as date)
Your EntryDate is currently a datetime, so selecting it as-is returns full datetimes rather than dates, and each distinct timestamp forms its own group.
What you can do is convert EntryDate to a date:
Try:
select sum(number) as 'Sum', convert(date,EntryDate) as 'Date'
from Data
group by convert(date,EntryDate)
Should work.
Go seek more information from here https://www.w3schools.com/sql/func_sqlserver_convert.asp
Cheers
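
To see the grouping behave as described, here is a small sketch that runs the same shape of query against an in-memory SQLite database via Python's sqlite3 module. SQLite is only a stand-in for SQL Server here: its date() function plays the role of CAST(... AS date), and the third row is made up so there is a second day to group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Number INTEGER, EntryDate TEXT)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [
        (23, "2018-10-01 13:22:10.520"),
        (25, "2018-10-01 11:16:09.533"),
        (7, "2018-10-02 09:00:00.000"),  # extra row, made up for contrast
    ],
)

# date() drops the time portion, like CAST(EntryDate AS date) in SQL Server.
daily_totals = conn.execute(
    """
    SELECT date(EntryDate) AS EntryDay, SUM(Number) AS Total
    FROM Data
    GROUP BY date(EntryDate)
    ORDER BY EntryDay
    """
).fetchall()

print(daily_totals)  # [('2018-10-01', 48), ('2018-10-02', 7)]
```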

Sub-query is Not Working for Date_Part()

I want to pass the subquery as an argument to the EXTRACT() function of Postgres to get the number of the day of the week but it is not working.
Working Code:
SELECT EXTRACT(dow FROM DATE '2018-06-07');
It returns:
+-------------+
| date_part |
|-------------|
| 4.0 |
+-------------+
Not Working Code:
SELECT EXTRACT(DOW FROM DATE
(SELECT start_date from leaves where submitted_by=245 and type_id = 16)
);
It returns
syntax error at or near "SELECT"
LINE 1: SELECT EXTRACT(DAY FROM DATE (SELECT submitted_on FROM leave...
I don't know why EXTRACT() function is not accepting subquery result as the query:
SELECT start_date from leaves where submitted_by=245 and type_id = 16;
returns the following, which I think is identical to the date string I passed in the working example.
+--------------+
| start_date |
|--------------|
| 2018-06-07 |
+--------------+
Can somebody correct it or let me know some other way to get the number of the day of the week.
Just apply it to the column of the select:
SELECT EXTRACT(DOW from start_date)
from leaves
where submitted_by=245 and type_id = 16
If you really want to use a scalar sub-query, then you must get rid of the DATE keyword; it is only needed to specify date constants.
SELECT EXTRACT(DOW FROM
(SELECT start_date from leaves where submitted_by=245 and type_id = 16)
);
Put the function inside the select:
select (select extract(dow from start_date)
from leaves
where submitted_by = 245 and type_id = 16
)
I don't see the advantage of using a subquery in the select for this (as opposed to, say, moving the subquery to the from). But this should do what you want.
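
For anyone who wants to try the scalar sub-query form without a Postgres instance, here is a rough sketch using Python's sqlite3 module. SQLite is a stand-in only: strftime('%w', ...) is its day-of-week function (0 = Sunday, the same numbering as Postgres's DOW), and the single row is the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaves (submitted_by INTEGER, type_id INTEGER, start_date TEXT)"
)
conn.execute("INSERT INTO leaves VALUES (245, 16, '2018-06-07')")

# strftime('%w', ...) is SQLite's day-of-week (0 = Sunday), standing in for
# EXTRACT(DOW FROM ...). The parenthesised SELECT is a scalar subquery.
(dow,) = conn.execute(
    """
    SELECT CAST(strftime('%w', (
        SELECT start_date FROM leaves
        WHERE submitted_by = 245 AND type_id = 16
    )) AS INTEGER)
    """
).fetchone()

print(dow)  # 4, i.e. Thursday
```

Keep in mind that a scalar sub-query has to return at most one row; Postgres raises an error if it returns more, which is one more reason to prefer applying the function to the column directly.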

Select count for each specific date

I have the following need:
I need to count the number of times each id activated from all dates.
Let's say the table looks like this:
tbl_activates
PersonId int,
ActivatedDate datetime
The result set should look something like this:
counted_activation | ActivatedDate
5 | 2009-04-30
7 | 2009-04-29
5 | 2009-04-28
7 | 2009-04-27
... and so on
Anyone know how to do this the best possible way? The date comes in the following format '2011-09-06 15:47:52.110', I need to relate only to the date without the time. (summary for each date)
You can use count(distinct ...),
and if ActivatedDate is a datetime you can extract the date part:
select Cast(ActivatedDate AS date), count(distinct id)
from my_table
group by Cast(ActivatedDate AS date)
You can use the to_char function to remove the time from the date:
select count(*) counted_activation,
to_char(activatedDate,'yyyy-mm-dd') ActDate
from table1
group by to_char(activatedDate,'yyyy-mm-dd');
Use 'GROUP BY' and 'COUNT'. Use the CONVERT method to convert the datetime to a date only:
SELECT CONVERT(DATE,activatedate), COUNT(userId)
FROM [table]
GROUP BY CONVERT(DATE,activatedate)
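
The answers above differ on whether to count rows or distinct ids, so here is a small sketch that shows both side by side. It uses Python's sqlite3 module with an in-memory database as a stand-in for your server; date() replaces CAST/CONVERT, the table and column names follow the question, and the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_activates (PersonId INTEGER, ActivatedDate TEXT)")
conn.executemany(
    "INSERT INTO tbl_activates VALUES (?, ?)",
    [  # sample rows invented for this sketch
        (1, "2009-04-30 08:15:00.000"),
        (1, "2009-04-30 17:40:12.110"),
        (2, "2009-04-30 09:05:31.250"),
        (2, "2009-04-29 15:47:52.110"),
    ],
)

# COUNT(*) counts every activation; COUNT(DISTINCT PersonId) counts each
# person at most once per day.
counts = conn.execute(
    """
    SELECT date(ActivatedDate)      AS ActivatedDay,
           COUNT(*)                 AS activations,
           COUNT(DISTINCT PersonId) AS distinct_people
    FROM tbl_activates
    GROUP BY date(ActivatedDate)
    ORDER BY ActivatedDay DESC
    """
).fetchall()

print(counts)  # [('2009-04-30', 3, 2), ('2009-04-29', 1, 1)]
```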

Sum Column of Integers Based on Timestamp in PostgreSQL

I am using PostgreSQL version 8.1. I have a table as follows:
datetime | usage
-----------------------+----------
2015-12-16 02:01:45+00 | 71.615
2015-12-16 03:14:42+00 | 43.000
2015-12-16 01:51:43+00 | 25.111
2015-12-17 02:05:26+00 | 94.087
I would like to add the integer values in the usage column based on the date in the datetime column.
Simply, I would like the output to look as below:
datetime | usage
-----------------------+----------
2015-12-16 | 139.726
2015-12-17 | 94.087
I have tried SELECT dateTime::DATE, usage, SUM(usage) FROM tableName GROUP BY dateTime::DATE, lngusage; which does not perform as expected. Any assistance would be appreciated. Thanks in advance.
The query below should give you the desired result:
select to_char(datetime, 'YYYY-MM-DD') as time, sum(usage)
from tableName
group by time
This one is for postgreSQL, I see you added MySQL also.
SELECT
dt,
SUM(usage)
FROM (
SELECT
DATE_TRUNC('day', datetime) AS dt,
usage
FROM
tableName
) t
GROUP BY
dt
SELECT to_char(datetime, 'format'), sum(usage)
FROM table
group by to_char(datetime, 'format')
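In addition, you could use a window function (available from PostgreSQL 8.4 onward). Note that this keeps one row per reading and gives a running total within each day rather than one row per day.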
SELECT DATETIME
,SUM(USAGE) OVER(PARTITION BY CAST(datetime AS DATE) ORDER BY datetime) AS Usage
FROM TableName
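
The query in the question most likely misbehaves because usage is also in the GROUP BY, so every distinct (day, usage) pair becomes its own group. The sketch below contrasts the two groupings using Python's sqlite3 module as a stand-in for PostgreSQL; substr(datetime, 1, 10) is an assumption standing in for datetime::DATE, since the sample values are stored as text here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (datetime TEXT, usage REAL)")
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?)",
    [
        ("2015-12-16 02:01:45+00", 71.615),
        ("2015-12-16 03:14:42+00", 43.000),
        ("2015-12-16 01:51:43+00", 25.111),
        ("2015-12-17 02:05:26+00", 94.087),
    ],
)

# Grouping by the day only: one row per day.
per_day = conn.execute(
    """
    SELECT substr(datetime, 1, 10) AS day, SUM(usage)
    FROM tableName
    GROUP BY substr(datetime, 1, 10)
    ORDER BY day
    """
).fetchall()

# Grouping by day AND usage: one row per distinct reading, so nothing is summed.
per_day_and_value = conn.execute(
    """
    SELECT substr(datetime, 1, 10) AS day, usage, SUM(usage)
    FROM tableName
    GROUP BY substr(datetime, 1, 10), usage
    """
).fetchall()

for day, total in per_day:
    print(day, round(total, 3))
print(len(per_day_and_value), "rows when usage is also grouped")
```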