Hive end of the last month

Hive end of the last month - variables

INSERT OVERWRITE TABLE test_month
PARTITION (dt= LAST_DAY('${CURRENT_DATE}'))
SELECT '${CURRENT_DATE}', LAST_DAY('${CURRENT_DATE}');
The current date is the first day of the month. I want to achieve something like the above, but it is not working. This HiveQL will be run from Oozie.

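There may be a couple of ways; here is one. It builds the first day of the following month and subtracts one day, which gives the last day of order_date's month:
select order_date, date_sub(concat(
(case
WHEN MONTH(order_date) = 12 THEN concat( (YEAR(order_date) +1) , '-01')
-- months 9-11: the next month (10-12) already has two digits
WHEN MONTH(order_date) >= 9 THEN concat( (YEAR(order_date)) , '-', (MONTH(order_date) +1))
-- months 1-8: zero-pad the next month (02-09)
WHEN MONTH(order_date) >= 1 THEN concat( (YEAR(order_date)) , '-0', (MONTH(order_date) +1))
ELSE 'XX' END) ,'-01' ) ,1)
from your_table; -- placeholder table name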
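To sanity-check the date arithmetic outside Hive, here is a minimal Python sketch of the same idea (go to a month boundary, then step by one day). The function names are made up for illustration and are not part of Hive or Oozie.

```python
import calendar
from datetime import date, timedelta


def last_day_of_month(d: date) -> date:
    """Last calendar day of the month that contains d."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def last_day_of_previous_month(d: date) -> date:
    """Go to the 1st of d's month, then step back one day."""
    return d.replace(day=1) - timedelta(days=1)


print(last_day_of_month(date(2021, 9, 15)))
print(last_day_of_previous_month(date(2021, 1, 1)))
```

When the current date is already the first day of the month, as in the question, the second function reduces to subtracting one day from it.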

Related

Concat/Union two tables in SQL

The sql query below creates two tables, tab1 with 3 columns (quarter, region and datasum) and tab2 with 2 columns (quarter and datasum).
I want to stack the values from tab1 and tab2 together (just like pd.concat([tab1, tab2]) in pandas/python). For that I need to create a new column in tab2 called region and put it in the same position as the corresponding column in tab1. After that, I think I need to use UNION ALL.
In tab2 I would like the value of the column region to be 'all' for every row.
How could I achieve this?
I tried to use ALTER TABLE and ADD, but I couldn't get that to work. Help would be much appreciated.
I am using Oracle SQL.
with base1 as(
select substr(municip,1,2) as region, data, age,
case when substr(time,6,2) in ('01','02','03') then substr(time, 1,4) || '_1'
when substr(time,6,2) in ('04','05','06') then substr(time, 1,4) || '_2'
when substr(time,6,2) in ('07','08','09') then substr(time, 1,4) || '_3'
else substr(time, 1,4) || '_4' end quarter
from sql_v1
where time >= '2021-01' and
),
base2 as(select data, age,
case when substr(time,6,2) in ('01','02','03') then substr(time, 1,4) || '_1'
when substr(time,6,2) in ('04','05','06') then substr(time, 1,4) || '_2'
when substr(time,6,2) in ('07','08','09') then substr(time, 1,4) || '_3'
else substr(time, 1,4) || '_4' end quarter
from sql_v1
where time >= '2021-01'),
tab1 as (select quarter, region,
sum (case when age between '16' and '64' then kvar else 0 end) datasum
from base
group by quarter, region
order by quarter, region),
tab2 as (select quarter,
sum (case when age between '16' and '64' then kvar else 0 end) datasum
from riket
group by quarter
order by quarter)
...select * from tab_union

It appears that you're using strings for storing dates??? Don't do that :(
If you use native datetime datatypes then you can extract the quarter using things like TO_CHAR(datetime_column, 'Q')
For now, however, using the horrendous datatypes, you can restructure your query to use ROLLUP in your GROUP BY...
WITH
formatted AS
(
SELECT
SUBSTR(municip,1,2) AS region,
SUBSTR(time, 1,4) || CASE WHEN SUBSTR(time,6,2) IN ('01','02','03') THEN '_1'
WHEN SUBSTR(time,6,2) IN ('04','05','06') THEN '_2'
WHEN SUBSTR(time,6,2) IN ('07','08','09') THEN '_3'
ELSE '_4' END AS quarter,
age,
kvar
FROM
sql_v1
WHERE
time >= '2021-01'
)
SELECT
region,
COALESCE(quarter, 'All'),
SUM(CASE WHEN age BETWEEN 16 AND 64 THEN kvar ELSE 0 END)
FROM
formatted
GROUP BY
region, ROLLUP(quarter)
ORDER BY
region,
GROUPING(quarter),
quarter
Demo : https://dbfiddle.uk/?rdbms=oracle_21&fiddle=ab27d71ac81bb9bc7e5e06e7f5a44ba9
Or, using DATE datatype : https://dbfiddle.uk/?rdbms=oracle_21&fiddle=1544406dc2b6669bc62ed02a6155b1c2
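For comparison, the literal request in the question (add a constant region column to tab2 and stack it under tab1) needs no ALTER TABLE: a string literal in the second SELECT is enough. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; the sample rows are invented, and the UNION ALL statement itself is plain SQL that should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (quarter TEXT, region TEXT, datasum INTEGER);
    CREATE TABLE tab2 (quarter TEXT, datasum INTEGER);
    -- invented sample rows
    INSERT INTO tab1 VALUES ('2021_1', '01', 10), ('2021_1', '02', 20);
    INSERT INTO tab2 VALUES ('2021_1', 30);
""")

# The literal 'all' fills the region column for every row taken from tab2.
stacked = conn.execute("""
    SELECT quarter, region, datasum FROM tab1
    UNION ALL
    SELECT quarter, 'all' AS region, datasum FROM tab2
    ORDER BY quarter, region
""").fetchall()

for row in stacked:
    print(row)
```

The columns of both SELECTs have to line up by position, which is why region is placed second in the tab2 branch.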

Count sequence selected columns

I have the query below. I want cumulative results: the value for 'feb' should be the sum of jan and feb, the value for 'mar' should be the sum of jan, feb and mar, and so on. Is there any way to get a result like that?
select A.location as location
, count(Case When SUBSTRING(A.base_date,5,2)='01' Then A.customer_no else null end) as "jan"
, count(Case When SUBSTRING(A.base_date,5,2)='02' Then A.customer_no else null end) as "feb"
....
, count(Case When SUBSTRING(A.base_date,5,2)='12' Then A.customer_no else null end) as "dec"
from table_income A group by A.location;

SQL is a much more effective language when you think in rows rather than columns (normalisation).
For example, having one row per month is much simpler...
SELECT
location,
SUBSTRING(base_date,5,2) AS base_month,
SUM(COUNT(customer_no))
OVER (
PARTITION BY location
ORDER BY SUBSTRING(base_date,5,2)
)
AS count_cust
FROM
table_income
GROUP BY
location,
SUBSTRING(base_date,5,2)
Side notes:
If your base_date is a string, it shouldn't be; use data types relevant to the data.
If your base_date is a date or timestamp, you should really use date/timestamp functions, such as EXTRACT(month FROM base_date).
You probably should also account for different years...
SELECT
location,
DATE_TRUNC('month', base_date) AS base_month,
SUM(COUNT(customer_no))
OVER (
PARTITION BY location, DATE_TRUNC('year', DATE_TRUNC('month', base_date))
ORDER BY DATE_TRUNC('month', base_date)
)
AS count_cust
FROM
table_income
GROUP BY
location,
DATE_TRUNC('month', base_date)
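The running-total idea can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch under assumptions: base_date is taken to be a 'YYYYMMDD' string (as the SUBSTRING call in the question suggests), the sample rows are invented, and the bundled SQLite must be 3.25 or newer for window functions. The monthly counts are computed in a subquery and then accumulated with SUM() OVER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_income (location TEXT, base_date TEXT, customer_no INTEGER);
    -- invented sample rows, base_date assumed to be 'YYYYMMDD'
    INSERT INTO table_income VALUES
        ('A', '20210105', 1), ('A', '20210120', 2),
        ('A', '20210203', 3), ('A', '20210311', 4),
        ('B', '20210207', 5);
""")

rows = conn.execute("""
    SELECT location,
           base_month,
           SUM(cnt) OVER (PARTITION BY location ORDER BY base_month) AS count_cust
    FROM (
        SELECT location,
               substr(base_date, 5, 2) AS base_month,
               COUNT(customer_no)      AS cnt
        FROM table_income
        GROUP BY location, substr(base_date, 5, 2)
    )
    ORDER BY location, base_month
""").fetchall()

for row in rows:
    print(row)
```

Each output row holds the count of customers for that location from January up to and including that month.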

Try this :
SELECT A.location as location
, count(Case When SUBSTRING(A.base_date,5,2) in ('01') Then A.customer_no else null end) as "jan"
, count(Case When SUBSTRING(A.base_date,5,2) in ('01','02') Then A.customer_no else null end) as "feb"
....
, count(Case When SUBSTRING(A.base_date,5,2) in ('01','02',...'12') Then A.customer_no else null end) as "dec"
from table_income A group by A.location;

Pivot without aggregate function

I am trying to turn some parts of my rows into columns. To my knowledge, I am only able to use a pivot with an aggregate function, but I would just be pivoting text. For each client I have up to 4 rows grouped by a DLSEQUENCE field. Instead of having the 4 rows, I would like everything to be on 1 row.
SELECT CASE
WHEN Sched_time BETWEEN TRUNC(SCHED_TIME) + INTERVAL '8' HOUR + INTERVAL '30' MINUTE
AND TRUNC(SCHED_TIME) + INTERVAL '14' HOUR + INTERVAL '45' MINUTE AND
TO_CHAR(SCHED_TIME, 'DY') IN ('MON', 'TUE', 'WED', 'THU', 'FRI')
THEN 'ABC'
ELSE 'DEF'
END AS Organization,
Client_Last_Name,
Client_First_Name,
Sched_Time,
Field_Name,
CASE
WHEN Recoded_Response = '1' THEN 'Yes'
WHEN Recoded_Response = '2' THEN 'No'
ELSE Recoded_Response
END AS Responses,
Dlsequence
FROM DAILY_LOG_CUSTOM_DATA
WHERE SERVICE_NAME = 'Medical'
AND FIELD_CATEGORY = 'Background Information'
AND Field_Name IN
(
'Restraint?',
'History',
'Findings',
'Treatment'
)
AND Sched_Time >= TO_DATE('2020-03-01 01:00:00', 'YYYY/MM/DD HH:MI:SS')
AND Sched_Time < TO_DATE('2020-03-31 12:59:00', 'YYYY/MM/DD HH:MI:SS')
Order BY Dlsequence
Here is my table:
I would like the response fields that go with ('Restraint?','History','Findings','Treatment') to have their own column for each DLSEQUENCE field.

The following should do what you had in mind:
SELECT DLSEQUENCE,
ORGANIZATION,
CLIENT_LAST_NAME,
CLIENT_FIRST_NAME,
SCHED_TIME,
LISTAGG("Restraint?", ',') WITHIN GROUP (ORDER BY DLSEQUENCE) AS "Restraint?",
LISTAGG("Findings", ',') WITHIN GROUP (ORDER BY DLSEQUENCE) AS "Findings",
LISTAGG("History", ',') WITHIN GROUP (ORDER BY DLSEQUENCE) AS "History",
LISTAGG("Treatment", ',') WITHIN GROUP (ORDER BY DLSEQUENCE) AS "Treatment"
FROM (SELECT DLSEQUENCE,
ORGANIZATION,
CLIENT_LAST_NAME,
CLIENT_FIRST_NAME,
SCHED_TIME,
CASE
WHEN FIELD_NAME = 'Restraint?' THEN RESPONSES
ELSE NULL
END AS "Restraint?",
CASE
WHEN FIELD_NAME = 'Findings' THEN RESPONSES
ELSE NULL
END AS "Findings",
CASE
WHEN FIELD_NAME = 'History' THEN RESPONSES
ELSE NULL
END AS "History",
CASE
WHEN FIELD_NAME = 'Treatment' THEN RESPONSES
ELSE NULL
END AS "Treatment"
FROM YOUR_TABLE)
GROUP BY DLSEQUENCE,
ORGANIZATION,
CLIENT_LAST_NAME,
CLIENT_FIRST_NAME,
SCHED_TIME
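The same conditional-aggregation pattern can be tested on a small scale with Python's built-in sqlite3 module. This is a sketch, not the Oracle query above: SQLite's group_concat stands in for LISTAGG, and the table name, column names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE responses (dlsequence INTEGER, field_name TEXT, response TEXT);
    -- invented sample rows; sequence 2 has no History or Treatment entry
    INSERT INTO responses VALUES
        (1, 'Restraint?', 'No'),  (1, 'History', 'Fall'),
        (1, 'Findings', 'Bruise'), (1, 'Treatment', 'Ice'),
        (2, 'Restraint?', 'Yes'), (2, 'Findings', 'Clear');
""")

# CASE yields NULL for non-matching rows; the aggregate skips those NULLs,
# so each output column keeps only the text of its own field.
rows = conn.execute("""
    SELECT dlsequence,
           group_concat(CASE WHEN field_name = 'Restraint?' THEN response END) AS restraint,
           group_concat(CASE WHEN field_name = 'History'    THEN response END) AS history,
           group_concat(CASE WHEN field_name = 'Findings'   THEN response END) AS findings,
           group_concat(CASE WHEN field_name = 'Treatment'  THEN response END) AS treatment
    FROM responses
    GROUP BY dlsequence
    ORDER BY dlsequence
""").fetchall()

for row in rows:
    print(row)
```

A field with no row for a given sequence comes back as NULL (None in Python) rather than an empty string.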

Condition in WHERE clause (Oracle)

I need a query which returns data based on what month/year it is. Below is a subquery I wrote which returns one row; start_date and enddate are the values I need to use in my main query.
WITH SUBQ AS (SELECT
dim.MONTH_NAME as current_month_name
,dim.year_period as current_month
,dim.PERIOD_YEAR as YEAR
,CASE WHEN dim.year_period NOT LIKE '%01' THEN to_number(CONCAT(to_char(dim.PERIOD_YEAR-1) , '01' ))
WHEN dim.year_period LIKE '%01'THEN to_number(CONCAT(to_char(dim.PERIOD_YEAR-2) , '01' ))
END AS START_DATE
,CASE WHEN dim.year_period NOT LIKE '%01' THEN to_number(CONCAT(to_char(dim.PERIOD_YEAR) , '01' ))
WHEN dim.year_period LIKE '%01'THEN to_number(CONCAT(to_char(dim.PERIOD_YEAR-1) , '01' )) END AS ENDDATE
from dim_periods dim WHERE dim.year_period=to_number(to_char(sysdate, 'YYYYMM')))
The question is: how do I use values from a one-row subquery in WHERE clauses?
I need to get something like this; I just don't understand how I should join my subquery with the rest of the tables I use:
select * from financial_data fd
where fd.year_period BETWEEN subq.start_date and subq.enddate

You could join the subquery, i.e. the CTE with the table, and then use the column name in the filter predicate. The result of the subquery in the WITH clause acts like a temporary table.
For example,
WITH SUBQ AS
(SELECT dim.MONTH_NAME AS current_month_name ,
dim.year_period AS current_month ,
dim.PERIOD_YEAR AS YEAR ,
CASE
WHEN dim.year_period NOT LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR-1) , '01' ))
WHEN dim.year_period LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR-2) , '01' ))
END AS START_DATE ,
CASE
WHEN dim.year_period NOT LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR) , '01' ))
WHEN dim.year_period LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR-1) , '01' ))
END AS ENDDATE
FROM dim_periods dim
WHERE dim.year_period=to_number(TO_CHAR(SYSDATE, 'YYYYMM'))
)
SELECT fd.COLUMNS,
q.COLUMNS
FROM financial_data fd
JOIN subq q
ON (fd.KEY = q.KEY) -- join key
WHERE fd.year_period BETWEEN q.start_date AND q.enddate;
So, SUBQ acts like a temporary table, which you join with financial_data table.
UPDATE: The OP doesn't want the ANSI join syntax, so here is the same query written with a comma join:
WITH SUBQ AS
(SELECT dim.MONTH_NAME AS current_month_name ,
dim.year_period AS current_month ,
dim.PERIOD_YEAR AS YEAR ,
CASE
WHEN dim.year_period NOT LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR-1) , '01' ))
WHEN dim.year_period LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR-2) , '01' ))
END AS START_DATE ,
CASE
WHEN dim.year_period NOT LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR) , '01' ))
WHEN dim.year_period LIKE '%01'
THEN to_number(CONCAT(TO_CHAR(dim.PERIOD_YEAR-1) , '01' ))
END AS ENDDATE
FROM dim_periods dim
WHERE dim.year_period=to_number(TO_CHAR(SYSDATE, 'YYYYMM'))
)
SELECT fd.COLUMNS,
q.COLUMNS
FROM financial_data fd,
subq q
WHERE fd.KEY = q.KEY -- join key
AND fd.year_period BETWEEN q.start_date AND q.enddate;
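Since SUBQ returns exactly one row, one possible variant (not taken from the answer above) is to skip the join key altogether and CROSS JOIN the CTE, so that every financial_data row sees the same start_date and enddate. A minimal sketch with Python's built-in sqlite3 module follows; the period values and the amount column are invented, and the one-row CTE is replaced by hard-coded constants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE financial_data (year_period INTEGER, amount INTEGER);
    -- invented sample rows
    INSERT INTO financial_data VALUES
        (201912, 5), (202001, 10), (202006, 20), (202101, 30), (202102, 40);
""")

rows = conn.execute("""
    WITH subq AS (
        -- stand-in for the one-row CTE from the question
        SELECT 202001 AS start_date, 202101 AS enddate
    )
    SELECT fd.year_period, fd.amount
    FROM financial_data fd
    CROSS JOIN subq q
    WHERE fd.year_period BETWEEN q.start_date AND q.enddate
    ORDER BY fd.year_period
""").fetchall()

for row in rows:
    print(row)
```

This only behaves as intended while the CTE is guaranteed to return a single row; with several rows, each matching financial_data row would be repeated.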

Week interval query starting on mondays

I need to build a JasperReport. What I need to display is the total number of accounts processed, broken down into weekly intervals, with the number of activated and declined accounts.
For the weekly interval query I got thus far:
SELECT *
FROM account_details
WHERE DATE date_opened = DATE_ADD(2014-01-01, INTERVAL(1-DAYOFWEEK(2014-01-01)) +1 DAY)
This seems to be correct, but it is not valid PostgreSQL: it keeps complaining about the 1-DAYOFWEEK. Here is what I hope to achieve:
UPDATE
It is pretty ugly, but I don't know of anything better. It does the job, though. I don't know whether it can be refactored to at least look better. I also don't know how to handle division by zero at the moment.
SELECT to_char(d.day, 'YYYY/MM/DD - ') || to_char(d.day + 6, 'YYYY/MM/DD') AS Month
, SUM(CASE WHEN LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END) AS Activated
, SUM(CASE WHEN LOWER(situation) LIKE '%declined%' THEN 1 ELSE 0 END) AS Declined
, SUM(CASE WHEN LOWER(situation) LIKE '%declined%' OR LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END) AS Total
, to_char( 100.0 *( (SUM(CASE WHEN LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END)) / (SUM(CASE WHEN LOWER(situation) LIKE '%declined%' OR LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END))::real) , '99.9') AS percent_activated
, to_char( 100.0 *( (SUM(CASE WHEN LOWER(situation) LIKE '%declined%' THEN 1 ELSE 0 END)) / (SUM(CASE WHEN LOWER(situation) LIKE '%declined%' OR LOWER(situation) LIKE '%active%' THEN 1 ELSE 0 END))::real) , '99.9') AS percent_declined
FROM (
SELECT day::date
FROM generate_series('2014-08-01'::date, '2014-09-14'::date, interval '1 week') day
) d
JOIN account_details a ON a.date_opened >= d.day
AND a.date_opened < d.day + 6
GROUP BY d.day;

SELECT to_char(d.day, 'YYYY/MM/DD" - "')
|| to_char(d.day + 6, 'YYYY/MM/DD') AS week
, count(situation ILIKE '%active%' OR NULL) AS activated
, ...
FROM (
SELECT day::date
FROM generate_series('2014-08-11'::date
, '2014-09-14'::date
, '1 week'::interval) day
) d
LEFT JOIN account_details a ON a.date_opened >= d.day
AND a.date_opened < d.day + 7 -- 7, not 6!
GROUP BY d.day;
Related answers:
Weekly total sums
Calculate working hours between 2 dates in PostgreSQL
Best way to count records by arbitrary time intervals in Rails+Postgres
More about counting specific values:
For absolute performance, is SUM faster or COUNT?
SQL Query to Transpose Column Counts to Row Counts
Aside: You would typically use an enum or a look-up table and just store an ID for situation, not a lengthy text redundantly.
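The question also asks about division by zero, which the percentage columns hit when a week has no matching accounts. As a language-neutral illustration, here is a small Python sketch of the same weekly bucketing: weeks start on Monday, each week covers the half-open range from Monday up to (but not including) the next Monday, and the percentage is None when the total is zero. The function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d (date.weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())


def percent(part: int, total: int):
    """Percentage rounded to one decimal, or None when total is zero."""
    return round(100.0 * part / total, 1) if total else None


def weekly_summary(accounts):
    """accounts: iterable of (date_opened, situation) pairs.

    Returns {monday: (activated, declined, percent_activated)}.
    """
    activated, declined = Counter(), Counter()
    for opened, situation in accounts:
        start = week_start(opened)
        text = situation.lower()
        if "active" in text:
            activated[start] += 1
        if "declined" in text:
            declined[start] += 1
    summary = {}
    for start in sorted(set(activated) | set(declined)):
        a, d = activated[start], declined[start]
        summary[start] = (a, d, percent(a, a + d))
    return summary


sample = [
    (date(2014, 8, 11), "Active"),    # Monday
    (date(2014, 8, 17), "Declined"),  # Sunday of the same week
    (date(2014, 8, 18), "Active"),    # Monday of the next week
]
for monday, counts in weekly_summary(sample).items():
    print(monday, counts)
```

Unlike the LEFT JOIN against generate_series above, this sketch only lists weeks that contain at least one matching account.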
