Pyspark-SQL Sum Integer to Date (with sql)


I want to add any number of days to a given date; for example, I want to add one day to today's date.
I have a DataFrame like this:
+----------+
|      date|
+----------+
|2020-10-01|
+----------+
I would like to get a DataFrame like this:
+----------+
|      date|
+----------+
|2020-10-02|
+----------+
The real code is embedded in a complex SQL query, so the solution must use SQL statements ONLY.
I tried the following code, which tries to get the day after today, but it fails because of a type mismatch between date and int. I think I am looking for something similar to Python's timedelta, but in PySpark SQL:
spark.sql("SELECT to_date(now()) + 1")
The error:
cannot resolve '(to_date(current_timestamp()) + 1)' due to data type mismatch: differing types in '(to_date(current_timestamp()) + 1)' (date and int)

After a while of searching I found a function that solves the problem:
spark.sql("SELECT date_add(to_date(now()),1)").show()
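Documentation (this is the DataFrame API signature; in SQL the function is called as date_add(start_date, num_days)):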
date_add(Column start, int days)
Returns the date that is days days after start
https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html
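Since the question compares this to Python's timedelta, here is a small pure-Python analogue of what date_add computes. It is only an illustration under the assumption that you want the same day arithmetic outside Spark; the function name date_add below is a hypothetical stand-in, not the Spark function itself. Inside the SQL query you would still call date_add(date, 1).

```python
from datetime import date, timedelta


def date_add(start: date, days: int) -> date:
    """Hypothetical pure-Python analogue of Spark SQL's date_add(start_date, num_days)."""
    # timedelta(days=n) shifts the date by n calendar days;
    # a negative n moves backwards, as date_add does with negative values.
    return start + timedelta(days=days)


print(date_add(date(2020, 10, 1), 1))   # 2020-10-02
print(date_add(date(2020, 3, 1), -1))   # 2020-02-29
```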

Related

Converting a string to date in SQL

I am working on a table with two columns named birth and death that store dates as strings (specifically character varying).
I want to calculate the age from those two columns. If either value is missing, i.e. birth or death is missing, it should return 'unknown' (in my table the missing values are stored as None).
When I try to convert them to dates, I either get an error or a wrong age.
For Example, let's say these are the birth and death dates respectively:
birth: 0133-01-30T00:53:28+00:53
death: 0193-07-01T00:53:28+00:53
I used the following expression:
CAST(death as date) - CAST(birth as date)
But this returns ages such as 2210, or other absurd values.
By the way, I am doing this in a Jupyter notebook using PostgreSQL.

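In PostgreSQL, subtracting one date from another returns the difference in days, not years, which is why the result looks like an absurd age. Use the PostgreSQL age function to calculate the age instead.
Example: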
SELECT coalesce(age('0193-07-01T00:53:28+00:53'::timestamp, '0133-01-30T00:53:28+00:53'::timestamp)::text, 'Unknown');
age
------------------------
60 years 5 mons 2 days
(1 row)
To get years:
SELECT coalesce(EXTRACT(years FROM age('0193-07-01T00:53:28+00:53'::timestamp, '0133-01-30T00:53:28+00:53'::timestamp))::text, 'Unknown');
extract
---------
60
(1 row)
If either argument is NULL, age() returns NULL and coalesce falls back to 'Unknown':
SELECT coalesce(EXTRACT(years FROM age('0193-07-01T00:53:28+00:53'::timestamp, null))::text, 'Unknown');
coalesce
----------
Unknown
(1 row)
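To see how age() can arrive at "60 years 5 mons 2 days", here is a rough pure-Python sketch of a field-by-field subtraction. It assumes (my reading of PostgreSQL's behavior, not something stated in the answer) that a negative day count borrows the length of the earlier date's month; it also assumes end >= start and ignores the time of day. The function name pg_style_age is hypothetical, and returning "Unknown" for a missing value mirrors the coalesce above.

```python
import calendar
from datetime import date


def pg_style_age(end, start):
    """Hypothetical approximation of PostgreSQL age(end, start) for dates.

    Returns (years, months, days), or "Unknown" if either date is missing.
    Assumes end >= start; time of day is ignored.
    """
    if end is None or start is None:
        return "Unknown"
    years = end.year - start.year
    months = end.month - start.month
    days = end.day - start.day
    if days < 0:
        # Borrow one month, counted as the length of the start date's month.
        days += calendar.monthrange(start.year, start.month)[1]
        months -= 1
    if months < 0:
        # Borrow one year.
        months += 12
        years -= 1
    return years, months, days


print(pg_style_age(date(193, 7, 1), date(133, 1, 30)))  # (60, 5, 2)
print(pg_style_age(date(193, 7, 1), None))              # Unknown
```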

How to iterate a string literal based on date using while loop in SQL

I have a table that I query to get a list of processes. The naming convention of these processes changes with the current month and year. So for July 2018, the name contains the month "07" and the year "18", and so on.
This is a test query:
SELECT SUBSTRING(process_ID , 6, 2) as Procnumber,[process_ID]
FROM [Hana].[dbo].[processActivityEvent] v
WHERE process_ID like 'AB18-07_shield%'
I used SUBSTRING to extract the month, which gives me:
Result
+------------+----------------------+
| Procnumber | Process_ID |
+------------+----------------------+
| 07 | AB18-07_shield123456 |
+------------+----------------------+
I would like to increment this number (the month) whenever we move into a new month, e.g. August. I want a while loop or a function that either adds 1 to the number or replaces the current number, so I won't have to hard-code it or modify my query whenever we move into a new month or year. I want to automate this process.
Please help.

WHERE process_ID like 'AB18-'+RIGHT('0' + RTRIM(MONTH(GETDATE())), 2)+'_shield%'
You can do the same for the year.

With the year calculation added, it would be:
WHERE process_ID like 'AB' + RIGHT(RTRIM(YEAR(GETDATE())), 2)
+'-'+RIGHT('0' + RTRIM(MONTH(GETDATE())), 2)+'_shield%'
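If the query is sent from application code rather than computed inside SQL Server, the same pattern can be built client-side and passed as a query parameter. A minimal Python sketch: the function name process_id_pattern is hypothetical, and the "AB" prefix and "_shield" suffix are taken from the question. Like the T-SQL expression, it uses a two-digit year and a zero-padded month.

```python
from datetime import date


def process_id_pattern(today: date) -> str:
    """Hypothetical helper: build the LIKE pattern 'ABYY-MM_shield%' for a date."""
    # %y is the two-digit year and %m the zero-padded month (07, 08, ...),
    # matching RIGHT(RTRIM(YEAR(...)), 2) and RIGHT('0' + RTRIM(MONTH(...)), 2).
    return f"AB{today:%y}-{today:%m}_shield%"


print(process_id_pattern(date(2018, 7, 15)))  # AB18-07_shield%
print(process_id_pattern(date(2018, 8, 1)))   # AB18-08_shield%
```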

SQL query date range

I have the following table:
ID      DATE_START  DATE_END
------- ----------  --------
11944   10.01.15    20.01.15
I would like to select rows based on a date range, e.g.
01.01.15 - 25.01.15
15.01.15 - 25.01.15
In both cases I would like to select the row shown above. Is this possible with SQL? I tried a few things, but I can't get the second query working. I use Oracle DB.
Example usage:
I want to query my table like this: show me all entries between 15.01.15 and 25.01.15. This should return the row with ID 11944.

You want to return a row if two periods overlap, assuming both columns are defined as DATE.
select *
from tab
where DATE_START <= DATE '2015-01-25' -- end of searched period
and DATE_END >= DATE '2015-01-15' -- begin of searched period
In Standard SQL there's an OVERLAPS predicate which is not (officially) supported by Oracle:
where (DATE_START, DATE_END) OVERLAPS (DATE '2015-01-15', DATE '2015-01-25')
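The overlap condition can be tried out with Python's built-in sqlite3 module, used here only as a stand-in for Oracle. Since SQLite has no DATE type, this sketch assumes the dates are stored as ISO 'YYYY-MM-DD' strings, which compare correctly as text. The function name overlapping_ids is hypothetical; the table name tab comes from the answer.

```python
import sqlite3


def overlapping_ids(conn, search_start, search_end):
    """Hypothetical helper: IDs of rows whose period overlaps [search_start, search_end]."""
    # A row overlaps the searched period when it starts on or before the
    # period's end and ends on or after the period's start.
    rows = conn.execute(
        "SELECT id FROM tab WHERE date_start <= ? AND date_end >= ? ORDER BY id",
        (search_end, search_start),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, date_start TEXT, date_end TEXT)")
conn.execute("INSERT INTO tab VALUES (11944, '2015-01-10', '2015-01-20')")

print(overlapping_ids(conn, "2015-01-01", "2015-01-25"))  # [11944]
print(overlapping_ids(conn, "2015-01-15", "2015-01-25"))  # [11944]
print(overlapping_ids(conn, "2015-01-21", "2015-01-31"))  # []
```

Both ranges from the question return row 11944, while a period starting after DATE_END does not.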

How to use substring of result in PostgreSQL/SQL

I don't know what this syntax is called. I have a table (say T) with a column A that stores entries in the format USER-YYYY-MMDD. I want to select all rows whose year part (the YYYY part) of column A is greater than 2010. E.g.
TABLE T
+----------------+
| A |
+----------------+
| USER-2011-1234*|
| USER-1992-1923 |
| USER-2014-1234*|
+----------------+
(*) marks the rows I want: their YYYY part is greater than 2010. The SQL should look something like this, but I don't know how to write it in PostgreSQL:
SELECT * FROM T WHERE A[5-8] > 2010
Thanks!

select *
from t
where cast(substr(a, 6, 4) as integer) > 2010;
Note that this will fail with an error if the string cannot be converted to a number.
More details in the manual: http://www.postgresql.org/docs/current/static/functions-string.html
By the way, storing more than one piece of information in a single column is bad design. You should store the username and the date in two separate columns. Storing dates as varchar is also a very bad idea: a date should be stored as date, not as varchar.

Try this:
select *
from t
where substring(a::text from 6 for 4)::integer > 2010;
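The same filter can be checked quickly with Python's built-in sqlite3 module as a stand-in for PostgreSQL, using the sample values from the question. One assumption to flag: SQLite's CAST turns non-numeric text into 0 instead of raising an error, so unlike PostgreSQL it will silently skip malformed rows rather than fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("USER-2011-1234",), ("USER-1992-1923",), ("USER-2014-1234",)],
)

# substr() is 1-based: characters 6..9 of 'USER-2011-1234' are '2011'.
rows = conn.execute(
    "SELECT a FROM t WHERE CAST(substr(a, 6, 4) AS INTEGER) > 2010 ORDER BY a"
).fetchall()
matches = [r[0] for r in rows]
print(matches)  # ['USER-2011-1234', 'USER-2014-1234']
```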

Oracle date column showing the wrong value

I'm trying to identify a problem in a date column in my table.
The database is Oracle 11g.
The situation is:
When I run the following query:
select to_char(data_val, 'DD/MM/YYYY'), a.data_val from material a order by a.data_val asc;
The first five lines of the result are:
00/00/0000 | 29/06/5585 00:00:00
00/00/0000 | 29/06/5585 00:00:00
00/00/0000 | 29/06/5585 00:00:00
11/11/1111 | 11/11/1111 00:00:00
01/01/1500 | 01/01/1500 00:00:00
The questions are:
Why does the to_char function return a different date (00/00/0000) for the first three lines?
And why is 29/06/5585 the first result of an ascending ORDER BY on the date? Shouldn't that only happen with ORDER BY data_val DESC?

We've encountered the same problem. I can confirm that the "date" column is indeed the DATE type.
The date in question is 01-May-2014, so it's most likely not related to the large year number in the original post. When you perform any calculation with the date, the problem goes away: TO_CHAR(datum) returns all zeros, but TO_CHAR(datum + 1) is as expected, and even TO_CHAR(datum + 1 - 1) is correct. (TO_CHAR(datum + 0) doesn't help.)
Based on the DUMP value it seems that the problem is that we've somehow managed to store 31-Apr-2014 rather than 01-May-2014 (investigating now how that was possible; Informatica + Oracle 11.2, I believe).
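To illustrate why a stored 31-Apr-2014 would be a problem, here is a hedged Python sketch that decodes the 7 bytes DUMP shows for an Oracle DATE and validates them. It assumes the usual AD-only layout (century+100, year-of-century+100, month, day, hour+1, minute+1, second+1); BC dates and other types are not handled. The function name decode_oracle_date is hypothetical, and Python's datetime does the calendar validation.

```python
from datetime import datetime


def decode_oracle_date(dump_bytes):
    """Hypothetical decoder for the 7 bytes of an Oracle DATE as shown by DUMP(col).

    Assumes the AD-only layout: century+100, year-of-century+100, month,
    day, hour+1, minute+1, second+1. Raises ValueError when the bytes do
    not form a real calendar date.
    """
    if len(dump_bytes) != 7:
        raise ValueError("an Oracle DATE is 7 bytes")
    cc, yy, month, day, hh, mi, ss = dump_bytes
    year = (cc - 100) * 100 + (yy - 100)
    # datetime() validates every component, so 31 April is rejected here.
    return datetime(year, month, day, hh - 1, mi - 1, ss - 1)


print(decode_oracle_date([120, 114, 5, 1, 1, 1, 1]))  # 2014-05-01 00:00:00
try:
    decode_oracle_date([120, 114, 4, 31, 1, 1, 1])
except ValueError as exc:
    print("invalid stored date:", exc)
```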
