Date Transformation using Hive or SQL

Premise: You have a table with one column, original_date, of datatype string:
ORIGINAL_DATE
20190825
20190826
20190827
20190828
20190829
20190830
20190831
20190901
Question: Write a SQL query that calculates two more columns:
end_of_week - the date of the next Sunday on or after original_date. If original_date is already a Sunday, this field should be the same value.
end_of_month - the date of the last day of original_date's month.
An acceptable solution is one that works for any valid date in the string format of original_date. With end_of_month and end_of_week computed, the result should be:
ORIGINAL_DATE END_OF_WEEK END_OF_MONTH
20190825 20190825 20190831
20190826 20190901 20190831
20190827 20190901 20190831
20190828 20190901 20190831
20190829 20190901 20190831
20190830 20190901 20190831
20190831 20190901 20190831
20190901 20190901 20190930
Additional Info:
20190825 is a Sunday, so the end_of_week for that value is still that same date.
20190827 is a Tuesday, and the next Sunday is 20190901.
CREATE TABLE random_dates ( original_date VARCHAR(8) NOT NULL );
INSERT INTO random_dates(original_date) values('20190825');
INSERT INTO random_dates(original_date) values('20190826');
INSERT INTO random_dates(original_date) values('20190827');
INSERT INTO random_dates(original_date) values('20190828');
INSERT INTO random_dates(original_date) values('20190829');
INSERT INTO random_dates(original_date) values('20190830');
INSERT INTO random_dates(original_date) values('20190831');
INSERT INTO random_dates(original_date) values('20190901');
EXPECTED OUTPUT:
20190825 2019-08-25 2019-08-31
20190826 2019-09-01 2019-08-31
20190827 2019-09-01 2019-08-31
20190828 2019-09-01 2019-08-31
20190829 2019-09-01 2019-08-31
20190830 2019-09-01 2019-08-31
20190831 2019-09-01 2019-08-31
20190901 2019-09-01 2019-09-30

Solution for Hive:
with random_dates as (--this is your example dataset
select stack(8,
'20190825', '20190826', '20190827', '20190828', '20190829', '20190830', '20190831', '20190901'
) as original_date
)
select original_date,
date_add(date_formatted, 6-days) end_of_week,
last_day(date_formatted) end_of_month
from
(
select original_date,
regexp_replace(original_date,'^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3') date_formatted,
-- 1900-01-08 was a Monday, so days is 0 for Monday ... 6 for Sunday
pmod(datediff(regexp_replace(original_date,'^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),'1900-01-08'),7) days
from random_dates
)s
;
Result:
original_date end_of_week end_of_month
20190825 2019-08-25 2019-08-31
20190826 2019-09-01 2019-08-31
20190827 2019-09-01 2019-08-31
20190828 2019-09-01 2019-08-31
20190829 2019-09-01 2019-08-31
20190830 2019-09-01 2019-08-31
20190831 2019-09-01 2019-08-31
20190901 2019-09-01 2019-09-30
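The Hive query relies on 1900-01-08 being a Monday. pmod(datediff(date, '1900-01-08'), 7) is therefore 0 for Monday through 6 for Sunday, and adding 6 - days lands on the following Sunday, or on the same day when the date is already a Sunday. Here is a minimal Python sketch of the same arithmetic that can serve as a cross-check outside Hive. The helper names are hypothetical, and calendar.monthrange stands in for Hive's last_day.

```python
from calendar import monthrange
from datetime import date, timedelta

# 1900-01-08 was a Monday, so (d - ANCHOR_MONDAY).days % 7 is 0 for Monday ... 6 for Sunday,
# the same value Hive's pmod(datediff(d, '1900-01-08'), 7) produces.
ANCHOR_MONDAY = date(1900, 1, 8)

SAMPLE = ["20190825", "20190826", "20190827", "20190828",
          "20190829", "20190830", "20190831", "20190901"]


def parse_yyyymmdd(s: str) -> date:
    """Parse a 'yyyyMMdd' string, like the regexp_replace step in the Hive query."""
    return date(int(s[0:4]), int(s[4:6]), int(s[6:8]))


def end_of_week(d: date) -> date:
    days = (d - ANCHOR_MONDAY).days % 7  # Python's % is non-negative, like pmod
    return d + timedelta(days=6 - days)  # Monday +6 ... Sunday +0


def end_of_month(d: date) -> date:
    # monthrange returns (weekday of the 1st, number of days in the month)
    return d.replace(day=monthrange(d.year, d.month)[1])


for s in SAMPLE:
    d = parse_yyyymmdd(s)
    print(s, end_of_week(d), end_of_month(d))
```

Each printed line should match the corresponding row of the Result table above.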

Solution for MySQL (DAYOFWEEK returns 1 for Sunday through 7 for Saturday):
SELECT original_date,
CASE DAYOFWEEK(STR_TO_DATE(original_date,'%Y%m%d'))
WHEN 1 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 0 DAY)
WHEN 2 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 6 DAY)
WHEN 3 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 5 DAY)
WHEN 4 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 4 DAY)
WHEN 5 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 3 DAY)
WHEN 6 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 2 DAY)
WHEN 7 THEN DATE_ADD(STR_TO_DATE(original_date,'%Y%m%d'),INTERVAL 1 DAY)
END AS END_OF_WEEK,
LAST_DAY(STR_TO_DATE(original_date,'%Y%m%d')) AS END_OF_MONTH
FROM random_dates;
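The seven WHEN branches appear to follow a single pattern: the offset to the next Sunday is (8 - DAYOFWEEK(d)) % 7. If that holds, the CASE could probably be shortened in MySQL to DATE_ADD(d, INTERVAL (8 - DAYOFWEEK(d)) % 7 DAY), although that rewrite has not been run here. The Python sketch below only checks that the formula reproduces each CASE offset. mysql_dayofweek is a hypothetical helper that mimics MySQL's 1 = Sunday numbering.

```python
from datetime import date, timedelta

# Offsets hard-coded in the MySQL CASE: DAYOFWEEK value -> days to add
CASE_OFFSETS = {1: 0, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2, 7: 1}


def mysql_dayofweek(d: date) -> int:
    """Mimic MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # Python's isoweekday() is 1 = Monday ... 7 = Sunday, so rotate by one position.
    return d.isoweekday() % 7 + 1


def days_to_sunday(dow: int) -> int:
    """Closed form for the CASE branches: (8 - DAYOFWEEK) % 7."""
    return (8 - dow) % 7


def end_of_week(d: date) -> date:
    return d + timedelta(days=days_to_sunday(mysql_dayofweek(d)))


# Print each branch next to the closed form; the last two columns should agree.
for dow, offset in CASE_OFFSETS.items():
    print(dow, offset, days_to_sunday(dow))
```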

Related

Postgres generate_series how to exclude last day when hour is 00:00

I need to generate a series of days in PostgreSQL that produces a different result depending on the hours in the timestamp.
My series generation works fine when the time is not midnight.
For time range 2023-01-06 10:00:00+00 - 2023-02-03 10:00:00+00 I get a list of days where the first element is 2023-01-06 and the last is 2023-02-03. This works as expected:
generate_series('2023-01-06 10:00:00+00'::date, '2023-02-03 10:00:00+00'::date, '1 day')
However, for time range 2023-01-06 00:00:00+00 - 2023-02-03 00:00:00+00 I would like to get a list of days where the first element is 2023-01-06 and the last is 2023-02-02 as effectively 2023-02-03 hasn't started. That series still gives me an output that includes 2023-02-03, which is not what I want:
generate_series('2023-01-06 00:00:00+00'::date, '2023-02-03 00:00:00+00'::date, '1 day')
Is it possible to achieve this in Postgres?
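You could check whether the end timestamp is exactly midnight and, if it is, end the series one day earlier. You have to check the time on the timestamp itself. Casting to ::date first discards the time of day, so a test such as to_char('2023-02-03 10:00:00+00'::date, 'HH24:MI:SS') always returns '00:00:00'. The version below assumes the session TimeZone is UTC, which matches the +00 offsets in the literals.
SELECT *
FROM generate_series('2023-01-06 00:00:00+00'::timestamptz::date,
(CASE WHEN '2023-02-03 00:00:00+00'::timestamptz::time = '00:00:00' THEN
'2023-02-03 00:00:00+00'::timestamptz::date - 1
ELSE '2023-02-03 00:00:00+00'::timestamptz::date END), '1 day')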
generate_series
2023-01-06 00:00:00
2023-01-07 00:00:00
2023-01-08 00:00:00
2023-01-09 00:00:00
2023-01-10 00:00:00
2023-01-11 00:00:00
2023-01-12 00:00:00
2023-01-13 00:00:00
2023-01-14 00:00:00
2023-01-15 00:00:00
2023-01-16 00:00:00
2023-01-17 00:00:00
2023-01-18 00:00:00
2023-01-19 00:00:00
2023-01-20 00:00:00
2023-01-21 00:00:00
2023-01-22 00:00:00
2023-01-23 00:00:00
2023-01-24 00:00:00
2023-01-25 00:00:00
2023-01-26 00:00:00
2023-01-27 00:00:00
2023-01-28 00:00:00
2023-01-29 00:00:00
2023-01-30 00:00:00
2023-01-31 00:00:00
2023-02-01 00:00:00
2023-02-02 00:00:00
SELECT 28
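The rule the question asks for can be stated without SQL: include every calendar day from the start date through the end date, but drop the end date when the end timestamp is exactly midnight. The Python sketch below applies that rule through a hypothetical day_series function. It reproduces the 29-day and 28-day results for the two ranges in the question.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import List


def day_series(start: datetime, end: datetime) -> List[date]:
    """Every day from start's date through end's date, except that the end
    date is dropped when end is exactly midnight (that day has not started)."""
    last = end.date()
    if end.time() == time(0, 0):  # time() drops tzinfo and compares the wall-clock time
        last -= timedelta(days=1)
    days = []
    current = start.date()
    while current <= last:
        days.append(current)
        current += timedelta(days=1)
    return days


utc = timezone.utc
with_hours = day_series(datetime(2023, 1, 6, 10, tzinfo=utc), datetime(2023, 2, 3, 10, tzinfo=utc))
at_midnight = day_series(datetime(2023, 1, 6, tzinfo=utc), datetime(2023, 2, 3, tzinfo=utc))
print(len(with_hours), with_hours[-1])    # 29 2023-02-03
print(len(at_midnight), at_midnight[-1])  # 28 2023-02-02
```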

get time series in 8 hours of interval

I am generating a time series using the query below.
SELECT * from (
select * from generate_series(
date_trunc('hour', '2021-11-13 10:01:38'::timestamp),
'2021-12-13 10:01:38'::timestamp,
concat(480, ' minutes')::interval) as t(time_ent)) as t
where t."time_ent" between '2021-11-13 10:01:38'::timestamp and '2021-12-13 10:01:38'::timestamp
It gives me output like this:
2021-11-13 18:00:00.000
2021-11-14 02:00:00.000
2021-11-14 10:00:00.000
2021-11-14 18:00:00.000
2021-11-15 02:00:00.000
But I need output like this:
2021-11-13 16:00:00.000
2021-11-14 00:00:00.000
2021-11-14 08:00:00.000
2021-11-14 16:00:00.000
2021-11-15 00:00:00.000
Currently, the hours in the series depend on the timestamp I pass in. Above, I get hours like 02, 10, 18, but I want hours like 00, 08, 16, regardless of the time passed in the query. I have tried many things without success.
Your generate_series starts at 10:00:00, so each 480-minute step lands on 18:00:00, 02:00:00, 10:00:00, and so on.
Start the series from 00:00:00 instead (cast the start value to date), e.g.:
SELECT
time_ent::timestamp without time zone
from (
select * from generate_series(
date_trunc('hour', '2021-11-13 10:01:38'::date),
'2021-12-13 10:01:38'::timestamp ,
concat(480, ' minutes')::interval) as t(time_ent)
) as t
where t."time_ent" between '2021-11-13 10:01:38'::timestamp and '2021-12-13 10:01:38'::timestamp
The result will be:
2021-11-13 16:00:00.000
2021-11-14 00:00:00.000
2021-11-14 08:00:00.000
2021-11-14 16:00:00.000
2021-11-15 00:00:00.000
2021-11-15 08:00:00.000
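The fix works because the slots are counted from midnight of the start day instead of from the start time. The BETWEEN filter then drops any slot earlier than the start. Below is a minimal Python sketch of the same idea. eight_hour_slots is a hypothetical name, and the sketch assumes naive (time-zone-free) datetimes.

```python
from datetime import datetime, timedelta
from typing import List


def eight_hour_slots(start: datetime, end: datetime, step_hours: int = 8) -> List[datetime]:
    """Slots aligned to midnight of start's day (00:00, 08:00, 16:00, ...),
    kept only when they fall between start and end, like the WHERE ... BETWEEN filter."""
    step = timedelta(hours=step_hours)
    current = datetime(start.year, start.month, start.day)  # like casting the start to date
    slots = []
    while current <= end:
        if current >= start:
            slots.append(current)
        current += step
    return slots


slots = eight_hour_slots(datetime(2021, 11, 13, 10, 1, 38), datetime(2021, 12, 13, 10, 1, 38))
for s in slots[:6]:
    print(s)
```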

Get the dates of two weeks from today from database

I have some dates in a PostgreSQL database. I want to find the dates from today through the next two weeks (14 days). How can I find the dates between the current date and 14 days from now? This query is not working.
The dates in the database are in the format 2019-12-26.
"SELECT work_date FROM USERS_SCHEDULE WHERE user_id = 11 AND data(now() +14)";
You can set the upper limit simply by adding the number of days to the current date (in PostgreSQL, date + integer yields a date).
Sample Data
CREATE TABLE users_schedule (work_date DATE);
INSERT INTO users_schedule
SELECT generate_series(CURRENT_DATE, DATE '2020-01-31', '1 day');
Query (dates between the current date and 3 days later)
SELECT work_date FROM users_schedule
WHERE work_date BETWEEN CURRENT_DATE AND CURRENT_DATE + 3;
work_date
------------
2019-12-26
2019-12-27
2019-12-28
2019-12-29
(4 rows)
If you mean you want to get all possible dates inside an interval, take a look at generate_series:
SELECT generate_series(DATE '2016-08-01', DATE '2016-08-14', '1 day');
generate_series
------------------------
2016-08-01 00:00:00+02
2016-08-02 00:00:00+02
2016-08-03 00:00:00+02
2016-08-04 00:00:00+02
2016-08-05 00:00:00+02
2016-08-06 00:00:00+02
2016-08-07 00:00:00+02
2016-08-08 00:00:00+02
2016-08-09 00:00:00+02
2016-08-10 00:00:00+02
2016-08-11 00:00:00+02
2016-08-12 00:00:00+02
2016-08-13 00:00:00+02
2016-08-14 00:00:00+02
(14 rows)
Using CURRENT_DATE
SELECT generate_series(CURRENT_DATE, DATE '2019-12-31', '1 day');
generate_series
------------------------
2019-12-26 00:00:00+01
2019-12-27 00:00:00+01
2019-12-28 00:00:00+01
2019-12-29 00:00:00+01
2019-12-30 00:00:00+01
2019-12-31 00:00:00+01
(6 rows)
SELECT work_date
FROM users_schedule
WHERE user_id = 11
AND work_date BETWEEN CURRENT_DATE
AND CURRENT_DATE + INTERVAL '14 days'
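Both queries rely on BETWEEN being inclusive at both ends. CURRENT_DATE + 14 therefore gives a 15-day window that includes today, and + 3 gives the 4 rows shown earlier. Here is a small Python sketch of that filter. dates_in_next_days is a hypothetical helper, and today is passed in explicitly so that the result is reproducible.

```python
from datetime import date, timedelta
from typing import List


def dates_in_next_days(work_dates: List[date], today: date, days: int = 14) -> List[date]:
    """Dates d with today <= d <= today + days, mirroring
    work_date BETWEEN CURRENT_DATE AND CURRENT_DATE + days (inclusive at both ends)."""
    limit = today + timedelta(days=days)
    return sorted(d for d in work_dates if today <= d <= limit)


today = date(2019, 12, 26)  # fixed "current date" so the output is reproducible
schedule = [today + timedelta(days=i) for i in range(-3, 40)]
print(dates_in_next_days(schedule, today, 3))
```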

SQL Server Query to Pivot using CASE Statement

I have the following data:
763b44e57b39-16e5bb772ac November Monthly Mailer MM_10 191201-1 2019-12-01 00:00:00.000 2020-01-01 00:00:00.000
763b44e57b39-16e5bb772ac November Monthly Mailer MM_10 191208-2 2019-12-01 00:00:00.000 2020-01-01 00:00:00.000
763b44e57b39-16e5bb772ac November Monthly Mailer MM_10 191215-3 2019-12-01 00:00:00.000 2020-01-01 00:00:00.000
763b44e57b39-16e5bb772ac November Monthly Mailer MM_10 191222-4 2019-12-01 00:00:00.000 2020-01-01 00:00:00.000
763b57fe9950-16dac7db279 October Monthly Mailer MM_10 191001-1 2019-10-01 00:00:00.000 2019-11-01 00:00:00.000
763b57fe9950-16dac7db279 October Monthly Mailer MM_10 191008-2 2019-10-01 00:00:00.000 2019-11-01 00:00:00.000
763b57fe9950-16dac7db279 October Monthly Mailer MM_10 191015-3 2019-10-01 00:00:00.000 2019-11-01 00:00:00.000
763b57fe9950-16dac7db279 October Monthly Mailer MM_10 191022-4 2019-10-01 00:00:00.000 2019-11-01 00:00:00.000
763b57ff55b7-16dad4ef4b8 November Monthly Mailer MM_10 191101-1 2019-11-01 00:00:00.000 2019-12-01 00:00:00.000
763b57ff55b7-16dad4ef4b8 November Monthly Mailer MM_10 191108-2 2019-11-01 00:00:00.000 2019-12-01 00:00:00.000
763b57ff55b7-16dad4ef4b8 November Monthly Mailer MM_10 191115-3 2019-11-01 00:00:00.000 2019-12-01 00:00:00.000
763b57ff55b7-16dad4ef4b8 November Monthly Mailer MM_10 191122-4 2019-11-01 00:00:00.000 2019-12-01 00:00:00.000
763b5803a370-16dcb7cfd7e 11th Anniversary Celebration SBR $15 Sky Ute Loot 2019-11-01 00:00:00.000 2019-11-02 00:00:00.000
I need to create a pivot using a CASE statement so that the results look like this:
763b44e57b39-16e5bb772ac MM_10 191201-1 MM_10 191208-2 MM_10 191215-3 MM_10 191222-4
763b57fe9950-16dac7db279 MM_10 191001-1 MM_10 191008-2 MM_10 191015-3 MM_10 191022-4
What's the best way to do this using a CASE statement?
If you use ROW_NUMBER to generate a sequential number within each Campaign_id, you can use that number in conditional aggregations.
SELECT Campaign_id,
MAX(CASE WHEN Rn = 1 THEN RIGHT(Campaign_name, 5) END) AS CampaignCode1,
MAX(CASE WHEN Rn = 1 THEN Offer_name END) AS OfferName1,
MAX(CASE WHEN Rn = 2 THEN RIGHT(Campaign_name, 5) END) AS CampaignCode2,
MAX(CASE WHEN Rn = 2 THEN Offer_name END) AS OfferName2,
MAX(CASE WHEN Rn = 3 THEN RIGHT(Campaign_name, 5) END) AS CampaignCode3,
MAX(CASE WHEN Rn = 3 THEN Offer_name END) AS OfferName3,
MAX(CASE WHEN Rn = 4 THEN RIGHT(Campaign_name, 5) END) AS CampaignCode4,
MAX(CASE WHEN Rn = 4 THEN Offer_name END) AS OfferName4
FROM
(
SELECT Campaign_id, Campaign_name, Offer_name,
ROW_NUMBER() OVER (PARTITION BY Campaign_id ORDER BY REVERSE(Offer_name)) AS Rn
FROM YourTable
) AS Src
GROUP BY Campaign_id
ORDER BY Campaign_id;
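The answer first numbers the rows within each campaign and then picks values by that number. The Python sketch below walks through the same data flow. It assumes each row is (Campaign_id, Campaign_name, Offer_name), with the campaign code (MM_10) as the last five characters of Campaign_name, which is what RIGHT(Campaign_name, 5) implies. The pivot function is hypothetical. Campaigns with fewer than four rows are padded with None, because the corresponding MAX(CASE ...) columns would be NULL.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed row shape: (Campaign_id, Campaign_name, Offer_name)
Row = Tuple[str, str, str]


def pivot(rows: List[Row], slots: int = 4) -> Dict[str, List[Optional[str]]]:
    """Flatten each campaign into [code1, offer1, code2, offer2, ...]."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # PARTITION BY Campaign_id

    result: Dict[str, List[Optional[str]]] = {}
    for campaign_id in sorted(groups):  # ORDER BY Campaign_id
        # Rn order: ORDER BY REVERSE(Offer_name), i.e. sort on the reversed string
        ordered = sorted(groups[campaign_id], key=lambda r: r[2][::-1])
        flat: List[Optional[str]] = []
        for _, name, offer in ordered[:slots]:  # only Rn 1..slots get columns
            flat.extend([name[-5:], offer])  # RIGHT(Campaign_name, 5), Offer_name
        flat.extend([None] * (2 * slots - len(flat)))  # missing Rn -> NULL
        result[campaign_id] = flat
    return result


rows = [
    ("763b57fe9950-16dac7db279", "October Monthly Mailer MM_10", "191015-3"),
    ("763b57fe9950-16dac7db279", "October Monthly Mailer MM_10", "191001-1"),
    ("763b57fe9950-16dac7db279", "October Monthly Mailer MM_10", "191022-4"),
    ("763b57fe9950-16dac7db279", "October Monthly Mailer MM_10", "191008-2"),
]
print(pivot(rows))
```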

Addition in SQL by month

I need to sum the column something by month:
date something
2010-01-02
2010-01-03
2010-01-04
2010-01-07
2010-01-10
2010-01-12
2010-01-13
2010-01-14
2010-01-15
2010-01-16
2010-01-17
2010-01-18 3
2010-01-19 1
2010-01-21
2010-01-22 11
2010-01-23 1
2010-01-24
2010-01-25
2010-01-26
2010-01-27
2010-01-28
2010-01-29
2010-01-30
2010-01-05 5
2010-01-06 8
2010-01-09
2010-01-08 3
2010-01-11
2010-01-01
2010-01-20 0
2010-01-31 13
The output should be, e.g. for January 2010, a sum of 45:
date something
2010-01 45
How to write SQL query for that?
This is a simple aggregation based on the month of the date column:
select to_char("date", 'yyyy-mm'), sum(something)
from the_table
group by to_char("date", 'yyyy-mm')
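This assumes the column date has the data type date (or timestamp). to_char() is available in PostgreSQL and Oracle; other databases use different date-formatting functions.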
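You can try the same GROUP BY without PostgreSQL by running it in SQLite through Python's built-in sqlite3 module. SQLite has no to_char, so this sketch substitutes strftime('%Y-%m', ...). The rows are the non-empty January 2010 values from the question plus two of the empty ones, stored as NULL, which SUM ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE the_table ("date" TEXT, something INTEGER)')
rows = [
    ("2010-01-01", None), ("2010-01-02", None),  # empty cells stored as NULL
    ("2010-01-05", 5), ("2010-01-06", 8), ("2010-01-08", 3),
    ("2010-01-18", 3), ("2010-01-19", 1), ("2010-01-20", 0),
    ("2010-01-22", 11), ("2010-01-23", 1), ("2010-01-31", 13),
]
conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)

# strftime('%Y-%m', ...) plays the role of to_char("date", 'yyyy-mm')
result = conn.execute(
    """SELECT strftime('%Y-%m', "date") AS month, SUM(something)
       FROM the_table
       GROUP BY strftime('%Y-%m', "date")"""
).fetchall()
print(result)  # [('2010-01', 45)]
```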