Greenplum SQL query to convert into Hive

We want to convert the Greenplum SQL query below to Hive SQL. Kindly help us.
Greenplum (GP) subquery:
select a.region , amount_fr , count_fr , amount_sr , count_sr from (
select region , sum(cast(amount as integer)) as amount_fr , count(transid) as count_fr from test.test_fr_imi
where cast(trans_date as date) between (select cast(add_months(trunc(date_sub(cast(current_date as date),1),'MM'),-1) as date) ) and
(select current_date::date - 1 - substring((select (date_trunc('month',current_date::date ))::date - 1)::character varying,9,10)::integer)
group by 1 ) a
JOIN
(select region , sum(amount::integer) as amount_sr , count(transid) as count_sr from test.test_sr_imi
where trans_date::date between (select (date_trunc('month',current_date::date - 1) - interval '1 month')::date) and
(select current_date::date - 1 - substring((select (date_trunc('month',current_date::date ))::date - 1)::character varying,9,10)::integer) group by 1 ) b on
a.region = b.region;
I need to convert the query above to Hive, but I am not able to convert its subquery. The particular code is:
(select current_date::date - 1 - substring((select
(date_trunc('month',current_date::date ))::date - 1)::character
varying,9,10)::integer)
select to_char(round((select sum(revenue)/1000000.00 from test.sampletable where trxn_date = current_date-1)/1, 2) ,'999,999')
select to_char(round((select sum(case when to_char(trxn_date, 'yyyymm') = to_char((current_date - 1) - '1 month'::interval, 'yyyymm') and extract(day from trxn_date) < extract(day from current_date) then revenue end)/1000000.00 from test.sample2table)/1, 2) ,'999,999')
select to_char(round ((select(( ( sum(case when to_char(trxn_date, 'yyyymm') = to_char((current_date - 1), 'yyyymm') then revenue end)) - (sum(case when to_char(trxn_date, 'yyyymm') = to_char((current_date - 1) - '1 month'::interval, 'yyyymm') and extract(day from trxn_date) < extract(day from current_date) then revenue end) )) / (sum(case when to_char(trxn_date, 'yyyymm') = to_char((current_date - 1) - '1 month'::interval, 'yyyymm') and extract(day from trxn_date) < extract(day from current_date) then revenue end) ))*100.00 from test.sample3table)/1, 2) ,'9990.99%')

Here is some help with a few of the functions:
round() - available in Hive.
to_char(INT,'999,999') - you can use format_number(12345,0), which returns 12,345.
CURRENT_DATE - available in Hive.
CURRENT_DATE-1 - this can be done using current_date - interval '1' day.
The syntax in Hive is a little different: you need to put your SQL in a subquery and then calculate on top of it.
So, here is the equivalent of the first SQL:
SELECT format_number(round(sum_rev/1, 2), 0)
FROM
(SELECT sum(revenue)/1000000.00 as sum_rev
FROM test.sampletable
WHERE trxn_date = CURRENT_DATE- interval '1' day) rs
Here is the equivalent SQL for the second query:
SELECT format_number(round(sum_rev/1, 2), 0)
FROM (SELECT sum(CASE
WHEN date_format(trxn_date, 'yyyyMM') = date_format((CURRENT_DATE - interval '1' day) - interval '1' month, 'yyyyMM')
AND extract(DAY
FROM trxn_date) < extract(DAY
FROM CURRENT_DATE) THEN revenue
END)/1000000.00 as sum_rev
FROM test.sample2table) rs
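The sub-expression singled out in the question can be read step by step: date_trunc('month', current_date)::date - 1 is the last day of the previous month, characters 9-10 of its text form are its day number (that is, the length of the previous month), and that many days are then subtracted from yesterday. The Python sketch below only checks this date arithmetic; the function names are hypothetical and it is not Hive code. In Hive, one possible (untested) rendering of the upper bound using the built-ins date_sub, day, last_day and add_months would be date_sub(current_date, 1 + day(last_day(add_months(current_date, -1)))).

```python
from datetime import date, timedelta


def lower_bound(today: date) -> date:
    """First day of the month before yesterday's month."""
    first_of_month = (today - timedelta(days=1)).replace(day=1)
    return (first_of_month - timedelta(days=1)).replace(day=1)


def upper_bound(today: date) -> date:
    """Yesterday minus the number of days in the previous calendar month."""
    last_of_prev_month = today.replace(day=1) - timedelta(days=1)
    return today - timedelta(days=1 + last_of_prev_month.day)


# Example: the range the Greenplum query would scan when run on 2022-05-18.
print(lower_bound(date(2022, 5, 18)), upper_bound(date(2022, 5, 18)))
```

Checking a few dates this way before porting makes it easier to confirm that the Hive version selects the same range as the Greenplum original.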

Related

Getting a period index from a date in PostgreSQL

Here is some Postgres code I wrote, and it works. Is there a more efficient way to write it? My goal is to find which period a given date falls in, counting from 2014-03-01. One period is a half-year starting in March or September.
I updated this code below on 2022-05-18 at 10:19 UTC+2
select date,
dense_rank() over (order by half_year_mar_sep) as period_index
from
(
select date as date,
case when extract(month from date) = 12 then (extract(year from date) || '-09-01')
when extract(month from date) in (1, 2) then (extract(year from date) - 1 || '-09-01')
when extract(month from date) in (3, 4, 5) then (extract(year from date) || '-03-01')
when extract(month from date) in (6, 7, 8) then (extract(year from date) || '-03-01')
else extract(year from date) || '-09-01'
end::date as half_year_mar_sep
from
(
select generate_series(date '2014-03-01', CURRENT_DATE, interval '1 day')::date as date
) s1
) s2
If I wrap the code above in select min(date), period_index from (<code above>) s3 group by 2 order by 1, I get the result I need.
WITH cte AS (
SELECT
date1::date,
rank() OVER (ORDER BY date1)
FROM generate_series(date '2014-03-01', CURRENT_DATE + interval '1' month, interval '6 month') g (date1)
),
cteall AS (
SELECT
all_date::date
FROM
generate_series(date '2014-03-01', CURRENT_DATE + interval '1' month, interval ' 1 day') s (all_date)
),
cte3 AS (
SELECT
*
FROM
cteall c1
LEFT JOIN cte c2 ON date1 = all_date
),
cte4 AS (
SELECT
*,
count(rank) OVER w AS ct_str
FROM
cte3
WINDOW w AS (ORDER BY all_date))
SELECT
*,
rank() OVER (PARTITION BY ct_str ORDER BY all_date) AS rank1,
dense_rank() OVER (ORDER BY all_date) AS dense_rank1
FROM
cte4;
Hope it's not intimidating. Personally, I find CTEs a good tool, since they make the logic clearer.
Useful link: How to do forward fill as a PL/PGSQL function
If you don't need some of the columns, you can simply replace * with the columns you want.
Based on @Mark's answer I wrote the code below, but it's not simpler than the original code.
select s.date,
m.period_index
from
(
select date::date as half_year_start,
rank() over (order by date) as period_index,
coalesce(lead(date::date, 1) over (), CURRENT_DATE) as following_half_year_start
from generate_series(date '2014-03-01', CURRENT_DATE + interval '1' month, interval '6 month') as date
) m
left join
(
select generate_series(date '2014-03-01', CURRENT_DATE, interval '1 day')::date as date
) s
on s.date between m.half_year_start and m.following_half_year_start
;
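Since each period is exactly six calendar months long, the index can also be computed arithmetically, without generating a series or ranking rows. This is a different technique from the queries above, shown as a minimal Python sketch with a hypothetical function name; the same arithmetic should be expressible in PostgreSQL with extract(year from ...) and extract(month from ...).

```python
from datetime import date


def period_index(d: date, start: date = date(2014, 3, 1)) -> int:
    """1-based index of the half-year period (starting in March or September) containing d."""
    # Whole calendar months elapsed since the start month.
    months = (d.year - start.year) * 12 + (d.month - start.month)
    # Six months per period; the first period has index 1.
    return months // 6 + 1


print(period_index(date(2014, 9, 1)))
```

Dates before the start date give an index of 0 or less, so a caller may want to filter those out.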

SQL Conditional Select Help (Amazon Redshift SQL)

I currently have 2 queries that I am trying to merge into 1. Basically, I want to pull the previous day's sales if it's Tuesday - Friday and the previous 3 days' sales if it's Monday. My queries are below. Is there a way to do a conditional select for those dates based on the day of the week?
Monday's version
SELECT *
FROM A
WHERE DATE_TRUNC('day', timestamp) IN (CURRENT_DATE - 1, CURRENT_DATE - 2, CURRENT_DATE - 3)
AND DATE_PART(weekday, current_date) = 1
Tuesday - Friday version
SELECT *
FROM A
WHERE DATE_TRUNC('day', timestamp) = CURRENT_DATE - 1
AND DATE_PART(weekday, current_date) = 1
One solution is:
SELECT *
FROM A
WHERE DATE_TRUNC('day', timestamp) < CURRENT_DATE
AND DATE_TRUNC('day', timestamp) >= CASE WHEN DATE_PART(weekday, CURRENT_DATE) = 1 THEN CURRENT_DATE - 3 ELSE CURRENT_DATE - 1 END
You can use OR:
SELECT *
FROM A
WHERE (DATE_TRUNC('day', timestamp) IN (CURRENT_DATE - 1, CURRENT_DATE - 2, CURRENT_DATE - 3) AND
DATE_PART(weekday, current_date) = 1
) OR
(DATE_TRUNC('day', timestamp) = CURRENT_DATE - 1 AND
DATE_PART(weekday, current_date) <> 1
)
This can be slightly simplified to:
WHERE timestamp > CURRENT_DATE -
(CASE WHEN DATE_PART(weekday, current_date) = 1 THEN INTERVAL '3 DAY'
ELSE INTERVAL '1 DAY'
END)
Note: This assumes that there are no future timestamps.
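The weekday rule itself is easy to check outside the database. The sketch below, in Python with a hypothetical function name, returns the half-open date range [start, end) the merged query is meant to cover. Note that Python's date.weekday() numbers Monday as 0, whereas the queries above test DATE_PART(weekday, current_date) = 1 for Monday.

```python
from datetime import date, timedelta
from typing import Tuple


def sales_window(today: date) -> Tuple[date, date]:
    """Return (start, end) so that start <= sale date < end."""
    # On Monday look back 3 days (Friday to Sunday), otherwise 1 day.
    lookback = 3 if today.weekday() == 0 else 1
    return today - timedelta(days=lookback), today


print(sales_window(date(2022, 5, 16)))  # 2022-05-16 was a Monday
```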

get List of counts from table based on dates in sql

I have to fetch a list of counts from a table by department. Here is my table structure:
empid empname department departmentId joinedon
I want to count the employees who joined today, yesterday, and more than 2 days ago, as a list like [12,25,89], i.e.
12* joined today
25 joined yesterday
81 joined prior to yesterday (2+ days)
* 0 if there aren't any entries for the given date range.
You would use aggregation on a case expression:
select (case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'older'
end) as grp,
count(*)
from t
group by grp;
In addition to @Gordon Linoff's answer:
SELECT
days.day,
coalesce(t.cnt, 0) count
FROM (
SELECT * FROM (VALUES ('today'), ('yesterday'), ('older')) AS days (day)
)days
LEFT JOIN (
SELECT (CASE WHEN joinedon::date = current_date THEN 'today'
WHEN joinedon::date = current_date - interval '1 day' THEN 'yesterday'
WHEN joinedon::date < current_date - interval '1 day' THEN 'older'
end) as day,
count(*) cnt
FROM t
GROUP BY day
) t on t.day = days.day;
You can use the group by as follows:
select department,
(case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'More than 2 days'
end) as grp,
Coalesce(count(*),0)
from t
group by grp, department;
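The point of joining against a fixed list of labels is that a bucket with no rows still shows up with a count of 0. The same idea in a minimal Python sketch (function names and sample dates are hypothetical):

```python
from collections import Counter
from datetime import date, timedelta


def bucket(joined_on: date, today: date):
    """Classify a join date the same way as the CASE expression."""
    if joined_on == today:
        return "today"
    if joined_on == today - timedelta(days=1):
        return "yesterday"
    if joined_on < today - timedelta(days=1):
        return "older"
    return None  # future dates fall outside every bucket


def join_counts(joined_dates, today: date):
    """Return [today, yesterday, older] counts, with 0 for empty buckets."""
    counts = Counter(bucket(d, today) for d in joined_dates)
    return [counts[label] for label in ("today", "yesterday", "older")]


sample = [date(2022, 5, 18), date(2022, 5, 18), date(2022, 5, 16), date(2022, 5, 1)]
print(join_counts(sample, date(2022, 5, 18)))
```

Iterating over the fixed label tuple plays the role of the VALUES list in the LEFT JOIN version: a missing key in the Counter simply reads as 0.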

oracle query to report data only for the 1st or 2nd half of the year

I have a report which should display enrollment data only within one of 2 date ranges, Jan-June or July-Dec, depending on the current date.
Scenarios:
If the current date is 042020 then I should display enrollment data in this range: 072019-122019
If the current date is 072020 then I should display enrollment data in this range: 012020-062020
If the current date is 022021 then I should display enrollment data in this range: 072020-122020
The current query reports everything from the past 6 months:
select * from enrollement where enrollement_dt > add_months(sysdate, -6);
Is there any function available in Oracle to do this, or how do I get the logic into a single statement?
Any help with this is highly appreciated.
You may try the query below:
select *
from enrollement
WHERE enrollement_dt >= CASE WHEN TO_CHAR(SYSDATE, 'mm') <= '06'
THEN TO_DATE('07' || (EXTRACT(YEAR FROM SYSDATE) - 1), 'MMYYYY')
ELSE TO_DATE('01' || EXTRACT(YEAR FROM SYSDATE), 'MMYYYY')
END
AND enrollement_dt < CASE WHEN TO_CHAR(SYSDATE, 'mm') <= '06'
THEN TO_DATE('01' || EXTRACT(YEAR FROM SYSDATE), 'MMYYYY')
ELSE TO_DATE('07' || EXTRACT(YEAR FROM SYSDATE), 'MMYYYY')
END
Basically you want to truncate to the half-year. But Oracle doesn't support this.
One method counts half-years and compares them. You want the previous half year from the current date. That would be:
select extract(year from sysdate) * 2 + floor((extract(month from sysdate) - 1) / 6) - 1
from dual
You can use this same formula:
where extract(year from enrollement_dt) * 2 + floor((extract(month from enrollement_dt) - 1) / 6) =
extract(year from sysdate) * 2 + floor((extract(month from sysdate) - 1) / 6) - 1
Unfortunately that can't use an index on the column. So, we can revisit this. You can get the first day of the current half using some date arithmetic:
select trunc(sysdate, 'Q') - mod(floor((extract(month from sysdate) - 1) / 3), 2) * interval '3' month
from dual
That just needs to be plugged into a where clause:
where enrollement_dt >= trunc(sysdate, 'Q') - mod(floor((extract(month from sysdate) - 1) / 3), 2) * interval '3' month - interval '6' month and
enrollement_dt < trunc(sysdate, 'Q') - mod(floor((extract(month from sysdate) - 1) / 3), 2) * interval '3' month
Voila! An expression that can even use an index.
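The half-year counting idea can be checked against the three scenarios in the question. The sketch below is Python rather than Oracle SQL, with hypothetical function names; it returns the previous half-year as a half-open range [start, end), which is the shape of the index-friendly where clause above.

```python
from datetime import date
from typing import Tuple


def half_start(index: int) -> date:
    """First day of the half-year with the given index (year * 2 + 0 or 1)."""
    return date(index // 2, 1 + 6 * (index % 2), 1)


def previous_half_year(today: date) -> Tuple[date, date]:
    """Return (start, end) of the previous half-year, with end exclusive."""
    current = today.year * 2 + (today.month - 1) // 6
    return half_start(current - 1), half_start(current)


for d in (date(2020, 4, 15), date(2020, 7, 1), date(2021, 2, 10)):
    print(d, previous_half_year(d))
```

An exclusive end bound avoids having to special-case time-of-day values on the last day of the range.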
You can use the below to get the start date and end date for enrollment
WITH data
AS (SELECT TRUNC(SYSDATE) curr_date from dual
),
d2
AS (SELECT curr_date,
To_date('0107'
||( Extract (year FROM curr_date) - 1 ), 'ddmmyyyy')
start_first_half,
To_date('3112'
||( Extract (year FROM curr_date) - 1 ), 'ddmmyyyy')
end_first_half,
To_date('0101'
||Extract (year FROM curr_date), 'ddmmyyyy')
start_second_half,
To_date('3006'
||Extract (year FROM curr_date), 'ddmmyyyy')
end_second_half
FROM data)
SELECT curr_date,
CASE
WHEN To_char(curr_date, 'MM') >= To_char(start_first_half, 'MM')
AND To_char(curr_date, 'MM') <= To_char(end_first_half, 'MM') THEN
start_second_half
ELSE start_first_half
END start_date1,
CASE
WHEN To_char(curr_date, 'MM') >= To_char(start_first_half, 'MM')
AND To_char(curr_date, 'MM') <= To_char(end_first_half, 'MM') THEN
end_second_half
ELSE end_first_half
END end_date1
FROM d2
You can use it in your query like below
Select * from enrollment_table a, (WITH data
AS (SELECT TRUNC(SYSDATE) curr_date from dual
),
d2
AS (SELECT curr_date,
To_date('0107'
||( Extract (year FROM curr_date) - 1 ), 'ddmmyyyy')
start_first_half,
To_date('3112'
||( Extract (year FROM curr_date) - 1 ), 'ddmmyyyy')
end_first_half,
To_date('0101'
||Extract (year FROM curr_date), 'ddmmyyyy')
start_second_half,
To_date('3006'
||Extract (year FROM curr_date), 'ddmmyyyy')
end_second_half
FROM data)
SELECT curr_date,
CASE
WHEN To_char(curr_date, 'MM') >= To_char(start_first_half, 'MM')
AND To_char(curr_date, 'MM') <= To_char(end_first_half, 'MM') THEN
start_second_half
ELSE start_first_half
END start_date1,
CASE
WHEN To_char(curr_date, 'MM') >= To_char(start_first_half, 'MM')
AND To_char(curr_date, 'MM') <= To_char(end_first_half, 'MM') THEN
end_second_half
ELSE end_first_half
END end_date1
FROM d2 ) b
where a.enrollment_date >=b.start_date1
and a.enrollment_date <=b.end_date1

Vertica - WITH clause is not recognized in actual query

Please take a look at the following Vertica SQL code:
WITH date_range AS
(SELECT YEAR(now() - interval '1' MONTH) ||MONTH(now() - interval '1' MONTH) ||'#'||DATE(TRUNC(now() - interval '1' MONTH, 'mm')) ||'#'||DATE(TRUNC(now(), 'mm') - interval '1' DAY) AS month1
, YEAR(now() - interval '2' MONTH) ||MONTH(now() - interval '2' MONTH) ||'#'||DATE(TRUNC(now() - interval '2' MONTH, 'mm')) ||'#'||DATE(TRUNC(now() - interval '1' MONTH, 'mm') - interval '1' DAY) AS month2
)
SELECT regexp_substr(
(SELECT month1
FROM date_range), '[^#]*', 1, 1)
I have a 420-row query, and I need to use "month1" and "month2" as variables many times in my code. Unfortunately, Vertica still doesn't support variables, so I tried to use a WITH clause instead.
Unfortunately, it doesn't work, as I keep getting the following error message:
(4566) ERROR: Relation "date_range" does not exist
So help me God (or Stack Overflow)
I think this is the query you want:
WITH date_range AS (
SELECT YEAR(now() - interval '1' MONTH) ||MONTH(now() - interval '1' MONTH) ||'#'||DATE(TRUNC(now() - interval '1' MONTH, 'mm')) ||'#'||DATE(TRUNC(now(), 'mm') - interval '1' DAY) AS month1,
YEAR(now() - interval '2' MONTH) ||MONTH(now() - interval '2' MONTH) ||'#'||DATE(TRUNC(now() - interval '2' MONTH, 'mm')) ||'#'||DATE(TRUNC(now() - interval '1' MONTH, 'mm') - interval '1' DAY) AS month2
)
SELECT regexp_substr(month1, '[^#]*', 1, 1)
FROM date_range;
In an actual query, you would do this as:
SELECT regexp_substr(dr.month1, '[^#]*', 1, 1)
FROM date_range dr CROSS JOIN
. . .;
I often call such CTEs params to highlight that they are providing parameters to the query.
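As a runnable illustration of the params pattern, the sketch below uses Python's built-in sqlite3 module as a stand-in for Vertica; the sales table, its columns and the hard-coded dates are hypothetical. The one-row CTE is cross joined to the table, so its columns can be referenced anywhere in that statement like variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-15", 10), ("2024-02-03", 20), ("2024-02-20", 5)],
)

query = """
WITH params AS (
    -- One row of "variables", defined once for the whole statement.
    SELECT '2024-02-01' AS month_start, '2024-03-01' AS month_end
)
SELECT SUM(s.amount)
FROM sales s CROSS JOIN params p
WHERE s.sold_on >= p.month_start AND s.sold_on < p.month_end
"""
total = conn.execute(query).fetchone()[0]
print(total)
```

Because params has exactly one row, the cross join does not change the number of rows coming from sales.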