How to select the day of the week in a PySpark DataFrame

I need to create a column for a day of the week where values will be Monday, Tuesday, Wednesday...
and then apply a filter only for Friday.
The code I'm using is the following:
df = (
    spark.table(f'nn_squad7_{country}.fact_table')
    .filter(f.col('date_key').between(start, end))
    .filter(f.col('is_client_plus') == 1)
    .filter(f.col('source') == 'tickets')
    .filter(f.col('subtype') == 'trx')
    .filter(f.col('is_trx_ok') == 1)
    .withColumn('week', f.date_format(f.date_sub(f.col('date_key'), 1), 'YYYY-ww'))
    .withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'M'))
    .withColumn('HP_client', f.col('customer_id').isNotNull())
    .withColumn('local_time', f.from_utc_timestamp(f.col('trx_begin_date_time'), 'Europe/Brussels'))
    .withColumn('Hour', f.hour(f.col('local_time')))
    .withColumn('Day', f.day(f.col('local_time')))
    .filter(f.col('Hour').between(4, 8))
)
Here is the error I get:
AttributeError: module 'pyspark.sql.functions' has no attribute 'day'
How can I create a column with the day of the week? Thanks

You can use F.dayofweek (F being pyspark.sql.functions, imported as f in the question), which returns an integer (1 = Sunday, 2 = Monday, ..., 7 = Saturday), so Friday is 6.
Alternatively, you can use F.date_format('local_time', 'E'), which returns an abbreviated name like 'Sun', 'Mon', etc.
The pattern 'EEEE' returns the full name, e.g. 'Sunday'.
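A minimal sketch of how this could be wired into the question's pipeline. The helper names keep_fridays and spark_dayofweek and the column name day_num are made up for illustration; only F.dayofweek, F.date_format, and the local_time column come from the thread. Filtering on the integer rather than on the name avoids depending on how the day name is spelled.

```python
from datetime import date

# Spark's dayofweek numbering: 1 = Sunday, 2 = Monday, ..., 7 = Saturday.
FRIDAY = 6


def spark_dayofweek(d):
    """Plain-Python equivalent of Spark's numbering, handy for sanity checks."""
    # isoweekday() is 1 = Monday ... 7 = Sunday, so shift Sunday to the front.
    return d.isoweekday() % 7 + 1


def keep_fridays(df, ts_col="local_time"):
    """Add weekday columns to a Spark DataFrame and keep only Fridays.

    Hypothetical helper: assumes `df` has a timestamp column named `ts_col`.
    """
    # Imported lazily so the pure-Python helper above works without Spark.
    from pyspark.sql import functions as F

    return (
        df.withColumn("day_num", F.dayofweek(F.col(ts_col)))
        .withColumn("Day", F.date_format(F.col(ts_col), "EEEE"))
        .filter(F.col("day_num") == FRIDAY)
    )
```

In the question's code, this would replace the failing .withColumn('Day', f.day(...)) line, e.g. by calling keep_fridays(df) on the result.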

Related

Change month and day only in BigQuery

Is there an equivalent to DATEFROMPARTS in BigQuery? I'm trying to change only the month and day in my timestamp, not the year.
Here's my table in DATETIME:
BirthYear
2014-12-12T00:00:00
2015-01-07T00:00:00
I want to change only the month and day but keep the year. For example change the bottom row to: 2015-04-01T00:00:00
The following query works in MS SQL and I'm trying to rewrite it in BigQuery:
UPDATE `table` SET BirthYear = DATEFROMPARTS(BirthYear, 04, 01) WHERE BirthYear IS NULL
The BigQuery equivalent of datefromparts(year(birthdate), 4, 1) is
date(extract(year from BirthYear), 4, 1)
If you also need to convert the result back to a DATETIME, you can wrap it as below:
datetime(date(extract(year from BirthYear), 4, 1))
For the following SQL Server expression:
datefromparts(year(birthdate), 4, 1)
In BigQuery, you could do this with datetime_trunc() and datetime_add(). Truncating to the year gives January 1, so adding 3 months lands on April 1:
datetime_add(datetime_trunc(birthdate, year), interval 3 month)
This gives you a datetime value. You can use date_trunc() and date_add() if you want to handle dates instead.
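For comparison, the same "keep the year, overwrite month and day" operation can be sketched in plain Python. This is only an illustration of the logic, not BigQuery code; the function name set_month_day is made up.

```python
from datetime import datetime


def set_month_day(value, month=4, day=1):
    """Return `value` with month and day replaced; year and time are kept."""
    return value.replace(month=month, day=day)


birth = datetime(2015, 1, 7)
print(set_month_day(birth).isoformat())  # 2015-04-01T00:00:00
```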

Format date information from string with SQL

Is there a way to convert a string like this '160806CD01' into a date like this '2016-08-06 00:00:00' with SQL where the year, month, and date are 16, 8, and 6 respectively?
The principle is to first extract the part of the string that contains the date, then use a conversion function to turn that portion into a date datatype.
The functions to use vary depending on the RDBMS; here are some examples:
Oracle :
TO_DATE(SUBSTR(col, 1, 6), 'YYMMDD')
MySQL/MariaDB :
STR_TO_DATE(SUBSTR(col, 1, 6), '%y%m%d')
SqlServer :
CAST(CONCAT('20', SUBSTRING(col, 1, 6)) as datetime)
Postgres :
TO_DATE(SUBSTRING(col, 1, 6), 'YYMMDD')
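The same extract-then-convert principle, sketched in Python for anyone doing the conversion outside the database. The function name parse_leading_date is made up, and the sketch assumes the first six characters are always YYMMDD.

```python
from datetime import datetime


def parse_leading_date(code):
    """Parse the leading YYMMDD characters of `code` into a datetime."""
    # Step 1: take the date portion. Step 2: convert it with a format string.
    return datetime.strptime(code[:6], "%y%m%d")


print(parse_leading_date("160806CD01"))  # 2016-08-06 00:00:00
```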

Find Birthday in upcoming month

I need help finding upcoming birthdays in a month.
My data is something like below; both columns are nvarchar.
Could anyone help me with the SQL query, please? How do I convert the DOB column into a date format and then find the birthdays with month 11 and day 24?
Thanks in advance
Assuming SQL Server, you can use month() to extract the month from a date, for example getdate(), which is the current point in time. With left() you can extract the first characters of a string. That leads to something like:
SELECT [Name],
[Dob(mmdd)]
FROM elbat
WHERE month(getdate()) = left([Dob(mmdd)], 2);
In Microsoft SQL Server you can create a date using the DATEFROMPARTS(int year, int month, int day) function. To get the month and day, split the string with the SUBSTRING function: the first two characters are the month and the third and fourth characters are the day. Cast each pair of characters to int and pass them to DATEFROMPARTS.
Then check whether the newly created date is BETWEEN today and one month from today. The lower bound is shifted back one day because GETDATE() includes the time of day, which would otherwise exclude a birthday that falls today. So you could do something like this:
SELECT *
FROM SomeTable
WHERE
DATEFROMPARTS(YEAR(GETDATE()), CAST(SUBSTRING([Dob(mmdd)], 1, 2) as INT), CAST(SUBSTRING([Dob(mmdd)], 3, 2) as INT))
BETWEEN
DATEADD(DAY, -1, GETDATE()) AND DATEADD(MONTH, 1, GETDATE())
Note: this assumes [Dob(mmdd)] is always 4 characters.
You don't need the DOB in a date format. I am unclear what "upcoming" month means, but I suspect that it means a calendar month. If the current month, then:
where month(getdate()) = cast(left(dob, 2) as int)
If the next month, then:
where month(dateadd(month, 1, getdate())) = cast(left(dob, 2) as int)
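The month comparison in this answer, including the December-to-January wrap that dateadd() handles in SQL, can be sketched in Python as follows. The function name birthday_in_month and its offset parameter are made up for illustration; the sketch assumes the DOB string is in mmdd form.

```python
from datetime import date


def birthday_in_month(dob_mmdd, today, offset=0):
    """True if the mmdd birthday falls in the month `offset` months after `today`.

    offset=0 checks the current calendar month, offset=1 the next one.
    """
    # Work with 0-based months so December + 1 wraps around to January.
    target_month = (today.month - 1 + offset) % 12 + 1
    return int(dob_mmdd[:2]) == target_month


print(birthday_in_month("1124", date(2023, 11, 3)))             # True
print(birthday_in_month("0105", date(2023, 12, 20), offset=1))  # True
```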
Thanks, everyone, for the help. I got this working; both of these work perfectly:
select [USER_ID],[EMP_FULL_NM],[Birthday_Date] from [dbo].[COE]
where month(getdate())=left([Birthday_Date],2)
select [USER_ID],[JOINING_DT],[EMP_FULL_NM] from [dbo].[COE]
where SUBSTRING(CONVERT(VARCHAR(10), [JOINING_DT], 101),1,2) = month(getdate())

SQL: How to get the date yyyy/mm/dd based on the year and day number?

I have the following string 2015089 or 2016075, for example.
I need to get the result in yyyy/mm/dd format based on the given input.
So, based on 2015089, I should get 2015/mm/dd, where dd and mm are the day of the month and the month that correspond to the 89th day of 2015.
How can I do something like that?
I think the simplest way is to convert to a date using dateadd():
select dateadd(day, right(str, 3) - 1, datefromparts(left(str, 4) + 0, 1, 1) )
That is, add one less than the number of days to the beginning of the year. This assumes that Jan 1 is represented as "1" and not "0".
You can then format the date however you like.
In versions before SQL Server 2012, you can do:
select dateadd(day, right(str, 3) - 1, cast(left(str, 4) + '0101' as date))
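To check the arithmetic, here is the same "January 1 plus (day number minus 1)" calculation as a small Python sketch. The function name from_year_day is made up; the two inputs are the ones from the question.

```python
from datetime import date, timedelta


def from_year_day(text):
    """Convert a 'YYYYDDD' string (year + day number, Jan 1 = 001) to a date."""
    year, day_of_year = int(text[:4]), int(text[4:])
    # Day 1 is January 1 itself, hence the "- 1".
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(from_year_day("2015089").strftime("%Y/%m/%d"))  # 2015/03/30
print(from_year_day("2016075").strftime("%Y/%m/%d"))  # 2016/03/15
```

The second result differs from what day 75 gives in 2015 because 2016 is a leap year, which the date arithmetic accounts for automatically.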

Django ORM: filter by hour range

I'm trying to implement a filter for an hour range. It should return records with a date between hourA and hourB (i.e. "give me the records saved between 16:00 and 18:00").
My attempts:
1) Using new 1.6 __hour filter and __in or __range:
MyModel.objects.filter(date__hour__in=(16, 17, 18))
MyModel.objects.filter(date__hour__range=(16, 18))
The code above generates exceptions
2) Using Q objects:
hList = [Q(date__hour=h) for h in (16, 17, 18)]
MyModel.objects.filter(reduce(operator.or_, hList))
This version works, but is very inefficient, since for each hour in the range it repeats the extract() call by generating something like:
where
extract(hour from date) = 16 or
extract(hour from date) = 17 or
extract(hour from date) = 18
when instead the right raw SQL should be:
where extract(hour from date) in (16, 17, 18)
…how can I filter by hour range in an effective manner, without relying on raw sql?
I managed to solve the issue in this way:
all_points = MyModel.objects.all()
all_points.extra(where=['extract(hour from MyDateField) in (16, 17, 18)'])
In Django 1.9+ you can chain hour lookups, so the examples from the question will work:
MyModel.objects.filter(date__hour__in=(16, 17, 18))
MyModel.objects.filter(date__hour__range=(16, 18))