HiveQL to populate a sequence of numbers between limits

I'm not sure how to put this in a straightforward manner, but I'm trying to make something work in Hive SQL: I need to create a sequence of numbers from a lower limit to an upper limit.
Ex:
select min(year) from table
Let's assume it results in 2010
select max(year) from table
Let's assume it results in 2015
I need to output each year from 2010 to 2015 in a SELECT query.
I'm also trying to put the min and max calculations inside the same SQL statement that creates the sequential years in the output.
Any ideas?

Well, I have an idea, but to use it you will have to define the lowest and the largest possible values for the years that might be present in your table.
Let's say the smallest possible year is 1900 and the largest possible year is 2200.
Since the largest possible difference in this case is 2200-1900=300, you will have to use the following string: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... ... 298 299 300.
In the query, you split this string using a space as the delimiter, thus getting an array, and then you explode that array.
Have a look:
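SELECT
    -- split() returns an array of strings, so cast each offset to INT
    minval + CAST(delta AS INT)
FROM
(
    SELECT
        min(year) minval,
        max(year) maxval,
        split('0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... ... ... 298 299 300', ' ') delta_list
    FROM
        table
) t
LATERAL VIEW explode(delta_list) dlist AS delta
-- keep only the offsets that fit between the smallest and the largest year
WHERE (maxval - minval) >= CAST(delta AS INT)
;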
So you end up with 301 rows, but you only need the rows whose delta values do not exceed the difference between the max year and the min year, which is what the WHERE clause enforces.
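The WHERE clause is what turns the fixed offset list into exactly the years you need. Here is a minimal Python sketch of that logic (an illustration, not HiveQL): range(max_delta + 1) stands in for the exploded split('0 1 2 ... 300', ' ') array, and years_from_delta_list is a hypothetical name.

```python
def years_from_delta_list(years, max_delta=300):
    """Mimic the Hive query: explode a fixed list of offsets 0..max_delta,
    then keep only the offsets that do not exceed (maxval - minval)."""
    minval, maxval = min(years), max(years)
    # Stand-in for split('0 1 2 ... 300', ' ') followed by explode().
    delta_list = range(max_delta + 1)
    # Same filter as WHERE (maxval - minval) >= delta.
    return [minval + delta for delta in delta_list if (maxval - minval) >= delta]


print(years_from_delta_list([2012, 2010, 2015]))  # prints [2010, 2011, 2012, 2013, 2014, 2015]
```

The trade-off is visible here: any span wider than the hard-coded list is silently cut off, which is why the answer asks you to fix the smallest and largest possible years up front.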

set hivevar:end_year=2019;
set hivevar:start_year=2010;
select ${hivevar:start_year} + i as year
from
(
    select posexplode(split(space(${hivevar:end_year} - ${hivevar:start_year}), ' ')) as (i, x)
) s;
Result:
year
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
Also have a look at this answer about generating missing dates.
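The trick in the posexplode answer is that space(n) returns a string of n spaces, and splitting it on ' ' yields n + 1 empty strings, so posexplode produces the positions 0 to n. A minimal Python sketch of the same idea (illustration only; years_via_posexplode is a hypothetical name, and Python's str.split with an explicit separator likewise keeps the empty strings):

```python
def years_via_posexplode(start_year, end_year):
    """Mimic posexplode(split(space(end - start), ' ')) and add each position to start_year."""
    spaces = " " * (end_year - start_year)  # space(end_year - start_year)
    pieces = spaces.split(" ")              # n spaces -> n + 1 empty strings
    # posexplode yields (position, value) pairs; only the position i is used.
    return [start_year + i for i, _ in enumerate(pieces)]


print(years_via_posexplode(2010, 2019))
```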

Related

Find each position of same characters in a string through SQL query

Below is my data set
Year Month Holiday list
----------------------------------------------------------
2007 1 WWHHWWWWWHHWWWWWHHWWWWWHHWWWWWH
2008 4 HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW
I want to write an SQL query that produces the output below, showing the position of each "H".
Year Month Holiday list
-----------------------------------------------------
2007 1 3 4 10 11 17 18 24 25 31
2008 4 1 7 8 14 15 21 22 28 29
By default, Oracle's INSTR function returns the position of the first occurrence only, so I cannot use it directly.
I could write a function that takes the Holiday list column value, loops through the string, and finds the position of each "H".
But I want to achieve this with a plain SQL SELECT query.
How to find each position of same characters in a string through SQL query in Oracle database?
Something like this might do it. You can use the INSTR function and a hierarchical query to walk through the string, and then use the LISTAGG aggregate function to concatenate the results.
WITH data AS (
    SELECT 2007 year, 1 month, 'WWHHWWWWWHHWWWWWHHWWWWWHHWWWWWH' holiday_list FROM DUAL
    UNION ALL
    SELECT 2008 year, 4 month, 'HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW' holiday_list FROM DUAL)
SELECT year, month,
       (SELECT LISTAGG(INSTR(holiday_list, 'H', 1, LEVEL), ' ') WITHIN GROUP (ORDER BY LEVEL)
          FROM DUAL
        -- one level per "H": level n asks INSTR for the n-th occurrence
        CONNECT BY LEVEL <= REGEXP_COUNT(holiday_list, 'H')) holiday_positions
FROM data;
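The hierarchical query effectively calls INSTR(holiday_list, 'H', 1, n) for n = 1, 2, 3, ... and joins the results with spaces. A minimal Python sketch of that loop, with str.find standing in for INSTR (holiday_positions is an illustrative name, not part of the answer):

```python
def holiday_positions(holiday_list, ch="H"):
    """Space-separated 1-based positions of every ch in holiday_list,
    like LISTAGG(INSTR(holiday_list, 'H', 1, LEVEL), ' ') over LEVEL = 1, 2, ..."""
    positions = []
    start = 0
    while True:
        idx = holiday_list.find(ch, start)  # -1 plays the role of INSTR returning 0
        if idx == -1:
            break
        positions.append(idx + 1)           # INSTR positions are 1-based
        start = idx + 1                     # search for the next occurrence after this one
    return " ".join(str(p) for p in positions)


print(holiday_positions("HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW"))  # prints 1 7 8 14 15 21 22 28 29
```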
Well, you can get a list; see the dbfiddle here:
-- consecutive: a helper numbers table (presumably 1 to 31) with the numbers in column no
SELECT id, INSTR(some, 'H', 1, c.no) AS Pos
FROM tst CROSS JOIN consecutive c
WHERE INSTR(some, 'H', 1, c.no) > 0;
This gives you a list of the days with an H. You could then try to transpose this list, but be warned: the CROSS JOIN is intentional, so for every row in your data this statement produces up to 31 rows. As Tim Biegeleisen said, it may be better to use a UDF for this.
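The same CROSS JOIN idea can be tried end to end in SQLite from Python. This sketch is a stand-in under stated assumptions, not the dbfiddle: SQLite's instr() has no occurrence argument, so it tests each day with substr() instead; the numbers table is generated by a recursive CTE with its column named n; and the tst table with a holiday_list column is hypothetical (the original uses a column named some).

```python
import sqlite3


def h_positions_sql(rows, days=31):
    """rows: (id, holiday_list) pairs. Returns (id, pos) for every 'H', in order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tst (id INTEGER, holiday_list TEXT)")
    con.executemany("INSERT INTO tst VALUES (?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE consecutive(n) AS (        -- numbers 1..days, built on the fly
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM consecutive WHERE n < ?
        )
        SELECT t.id, c.n AS pos
        FROM tst t CROSS JOIN consecutive c       -- every row paired with every day
        WHERE substr(t.holiday_list, c.n, 1) = 'H'
        ORDER BY t.id, c.n
        """,
        (days,),
    ).fetchall()
    con.close()
    return result


print(h_positions_sql([(2008, "HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW")]))
```

Each row is compared against every number before the filter runs, which is exactly the row multiplication the answer warns about.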

How do I average the last 6 months of sales within SQL based on period AND year?

How do I average the last 6 months of sales within SQL?
Here are my tables and fields:
IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD,
IM_ItemWhseHistoryByPeriod.FISCALCALYEAR,
And I need to average these fields
IM_ItemWhseHistoryByPeriod.DOLLARSSOLD,
IM_ItemWhseHistoryByPeriod.QUANTITYSOLD,
The hard part for me is understanding how to average the last whole 6 months, i.e. FISCALCALPERIOD 2-6 (inside FISCALCALYEAR 2017).
I'm hoping for some help on what the SQL command text should look like since I'm very new to manipulating SQL outside of the UI.
My Existing SQL String:
SELECT IM_ItemWhseHistoryByPeriod.ITEMCODE,
IM_ItemWhseHistoryByPeriod.DOLLARSSOLD,
IM_ItemWhseHistoryByPeriod.QUANTITYSOLD,
IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD,
IM_ItemWhseHistoryByPeriod.FISCALCALYEAR
FROM MAS_AME.dbo.IM_ItemWhseHistoryByPeriod
IM_ItemWhseHistoryByPeriod
ScaisEdge Attempt #1
If FISCALCALYEAR and FISCALCALPERIOD are numeric, you could use:
select avg(IM_ItemWhseHistoryByPeriod.DOLLARSSOLD) as AvgDollarsSold,
       avg(IM_ItemWhseHistoryByPeriod.QUANTITYSOLD) as AvgQtySold
from MAS_AME.dbo.IM_ItemWhseHistoryByPeriod IM_ItemWhseHistoryByPeriod
where IM_ItemWhseHistoryByPeriod.FISCALCALYEAR = 2017
  and IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD between 2 and 6
or, for each item code:
select IM_ItemWhseHistoryByPeriod.ITEMCODE,
       avg(IM_ItemWhseHistoryByPeriod.DOLLARSSOLD) as AvgDollarsSold,
       avg(IM_ItemWhseHistoryByPeriod.QUANTITYSOLD) as AvgQtySold
from MAS_AME.dbo.IM_ItemWhseHistoryByPeriod IM_ItemWhseHistoryByPeriod
where IM_ItemWhseHistoryByPeriod.FISCALCALYEAR = 2017
  and IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD between 2 and 6
group by IM_ItemWhseHistoryByPeriod.ITEMCODE
Try the following solution and see if it works for you:
select avg(DOLLARSSOLD) as AvgDollarsSold,
       avg(QUANTITYSOLD) as AvgQtySold
from IM_ItemWhseHistoryByPeriod
where FISCALCALYEAR = '2017'
  and FISCALCALPERIOD between 2 and 6
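A quick way to sanity-check the filter is to run it against a throwaway SQLite table from Python. The table and column names follow the question, but the rows are made-up sample values, not the asker's data, and SQLite is only a stand-in for SQL Server here.

```python
import sqlite3


def average_periods(rows, year=2017, first=2, last=6):
    """Per-item averages for periods first..last (inclusive) of one fiscal year."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE IM_ItemWhseHistoryByPeriod (
        ITEMCODE TEXT, DOLLARSSOLD REAL, QUANTITYSOLD REAL,
        FISCALCALPERIOD INTEGER, FISCALCALYEAR INTEGER)""")
    con.executemany("INSERT INTO IM_ItemWhseHistoryByPeriod VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT ITEMCODE, AVG(DOLLARSSOLD), AVG(QUANTITYSOLD)
        FROM IM_ItemWhseHistoryByPeriod
        WHERE FISCALCALYEAR = ?
          AND FISCALCALPERIOD BETWEEN ? AND ?   -- BETWEEN includes both ends
        GROUP BY ITEMCODE
        ORDER BY ITEMCODE""", (year, first, last)).fetchall()
    con.close()
    return result


sample = [("A", 10.0 * p, float(p), p, 2017) for p in range(1, 8)]  # periods 1..7
sample.append(("A", 999.0, 99.0, 4, 2016))                         # other year: filtered out
sample.append(("B", 5.0, 1.0, 3, 2017))
print(average_periods(sample))  # prints [('A', 40.0, 4.0), ('B', 5.0, 1.0)]
```

Two things are worth checking on the real table: BETWEEN 2 AND 6 covers five periods, not six, and if a column such as QUANTITYSOLD has an integer type, SQL Server's AVG returns an integer, so something like AVG(CAST(QUANTITYSOLD AS decimal(18, 2))) may be needed.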

How to generate a custom sequential number with SQL Server 2012

Is there any way to generate a custom sequential number like the following?
I want the Number to increment within each Code and Year group, restarting at 1 for each new group.
Code Year Number
A 2016 1
A 2016 2
A 2016 3
B 2016 1
B 2016 2
C 2016 1
A 2017 1
A 2017 2
Any suggestion would be appreciated.
EDIT
Sorry, I was too ambiguous about what I want. I want to generate the unique number at query time: with the data above, if I ask for a new number with Code A and Year 2017, I want the Number to be 3. I guess that to get the Number right in the future, I need to save the Code and Year together with the Number.
Use ROW_NUMBER to assign Number per Code,Year grouping.
SELECT *,
Number = ROW_NUMBER() OVER(PARTITION BY Code, [Year] ORDER BY (SELECT NULL))
FROM tbl
Replace SELECT NULL with the column you want the order to be based on.
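To illustrate both the ROW_NUMBER numbering and the "next number" from the EDIT, here is a sketch using SQLite from Python (window functions need SQLite 3.25 or later). It is not SQL Server code: the id column is a hypothetical stand-in for whatever column defines the order, and computing the next number as the group's row count plus one is an assumption that holds only while no rows are deleted.

```python
import sqlite3

# Code/Year rows from the question; id is a hypothetical key recording insertion order.
ROWS = [("A", 2016), ("A", 2016), ("A", 2016), ("B", 2016), ("B", 2016),
        ("C", 2016), ("A", 2017), ("A", 2017)]


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, Code TEXT, Year INTEGER)")
    con.executemany("INSERT INTO tbl (Code, Year) VALUES (?, ?)", ROWS)
    return con


def numbered(con):
    """Number the rows within each (Code, Year) group, ordered by id."""
    return con.execute("""
        SELECT Code, Year,
               ROW_NUMBER() OVER (PARTITION BY Code, Year ORDER BY id) AS Number
        FROM tbl
        ORDER BY id""").fetchall()


def next_number(con, code, year):
    """Number a new row in (code, year) would get: rows already in the group + 1."""
    (count,) = con.execute(
        "SELECT COUNT(*) FROM tbl WHERE Code = ? AND Year = ?", (code, year)).fetchone()
    return count + 1


con = make_db()
for row in numbered(con):
    print(row)
print(next_number(con, "A", 2017))  # prints 3
```

With concurrent writers, reading the count and inserting the new row would need to happen inside one transaction; storing the number, as the EDIT suggests, also keeps it stable if rows are later removed.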

How to loop through a specified range

I have a database of movies where one field is the year which it was released.
I want to create a query which will loop through each decade and will calculate the sum of a particular field for that decade. I have no idea how I can get a loop for every decade. Can anyone help?
If you want the decades where you don't have any movies as well as those with movies, then you can use generate_series to build your list of decades and then do a left outer join to your table; generate_series is the standard way to build numeric and time lists on the fly in PostgreSQL. Something like this should get you started:
select decade.d, count(t.year)
from generate_series(1900, 2100, 10) as decade(d)
left outer join your_table t on decade.d = floor(t.year / 10) * 10
group by decade.d
order by decade.d
That will produce output like this:
d | count
------+-------
1900 | 1
1910 | 0
1920 | 1
1930 | 3
1940 | 0
1950 | 0
1960 | 1
1970 | 0
1980 | 3
-- ...
2100 | 0
You could adjust the first and last values for the generate_series call to match your data if desired.
The floor(t.year / 10) * 10 bit gives you the decade for a given year; it converts 1942 to 1940, 2000 to 2000, etc.
You can set up a decade table (a one column table with one entry for each decade) if you move to a database that doesn't have something like generate_series. The SQL would be pretty much the same, just replace the generate_series call with your decade table.
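If your database has neither generate_series nor a decade table, a recursive CTE can build the decade list on the fly. This sketch runs in SQLite from Python; the movies table and its gross column are hypothetical, and it returns a per-decade sum alongside the count, since a sum is what the question asked for.

```python
import sqlite3


def decade_totals(movies, first=1900, last=2100):
    """Count movies and sum a value per decade, keeping empty decades.
    movies: list of (year, gross) pairs; gross is a hypothetical column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE movies (year INTEGER, gross INTEGER)")
    con.executemany("INSERT INTO movies VALUES (?, ?)", movies)
    result = con.execute("""
        WITH RECURSIVE decade(d) AS (        -- stands in for generate_series(first, last, 10)
            SELECT ?
            UNION ALL
            SELECT d + 10 FROM decade WHERE d + 10 <= ?
        )
        SELECT decade.d, COUNT(m.year), COALESCE(SUM(m.gross), 0)
        FROM decade
        LEFT OUTER JOIN movies m
          ON decade.d = (m.year / 10) * 10   -- integer division: 1942 -> 1940
        GROUP BY decade.d
        ORDER BY decade.d""", (first, last)).fetchall()
    con.close()
    return result


print(decade_totals([(1942, 10), (1948, 5), (1965, 7)], 1940, 1970))
# prints [(1940, 2, 15), (1950, 0, 0), (1960, 1, 7), (1970, 0, 0)]
```

COUNT(m.year) counts only matched rows, so decades with no movies report 0 rather than 1.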
Try something like this (I don't know what your tables look like, so I'm guessing):
SELECT decade, sum(column_x)
FROM (
    SELECT movie_year
         , date_trunc('decade', movie_year)::date AS decade  -- assumes movie_year is a date or timestamp
         , column_x
    FROM movies) AS movies_with_decades
GROUP BY decade
ORDER BY decade;

SQL complex view for virtually showing data

I have a table with the following data:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
This table records the stock only for the hours in which the quantity changes.
Now my requirement is to create a view on this table that virtually fills in the data when there is no stock row for a particular hour. The data that should be shown is:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
----------------------------------
That means that if there is no data for a particular hour, we should show the stock of the most recent earlier hour that has stock. I also have another table with all the available hours, 1-23, in a single column.
I have tried the PARTITION BY approach given below, but I think I am missing something needed to meet my requirement.
SELECT
HOUR_NUMBER,
CASE WHEN TOTAL_STOCK IS NULL
THEN SUM(TOTAL_STOCK)
OVER (
PARTITION BY LOCATION
ORDER BY CURRENT_HOUR ROWS 1 PRECEDING
)
ELSE
TOTAL_STOCK
END AS FULL_STOCK
FROM
(
SELECT HOUR_NUMBER AS HOUR_NUMBER
FROM HOURS_TABLE -- REFERENCE TABLE WITH HOURS FROM 1-23
GROUP BY 1
) HOURS_REF
LEFT OUTER JOIN
(
SEL CURRENT_HOUR AS CURRENT_HOUR
, STOCK AS TOTAL_STOCK
,LOCATION AS LOCATION
FROM STOCK_TABLE
WHERE STOCK<>0
) STOCKS
ON HOURS_REF.HOUR_NUMBER = STOCKS.CURRENT_HOUR
This query returns all the hours, but the stock is NULL for the hours without data.
We are looking for an ANSI SQL solution so that it can be used on databases like Teradata.
I think I am using PARTITION BY wrongly; or is there another way? We tried CASE WHEN, but that needs some kind of looping to look back for an hour with stock.
I've run into similar problems before. It's often simpler to make sure that the data you need somehow gets into the database in the first place. You might be able to automate it with a stored procedure that runs periodically.
Having said that, did you consider trying COALESCE() with a scalar subquery? (Or whatever similar function your dbms supports.) I'd try it myself and post the SQL, but I'm leaving for work in two minutes.
Haven't tried, but along the lines of what Mike said:
SELECT a.hour
     , COALESCE( a.stock
               , ( SELECT b.stock
                     FROM tbl b
                    WHERE b.hour = a.hour - 1 )  -- looks back exactly one hour
               ) AS "stock"
FROM tbl a
Note: the correlated subquery runs once per row, so this will impact performance greatly on large tables.
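One limitation of the query above is that b.hour = a.hour - 1 looks back a single hour only, so a gap of two or more hours (hour 8 in the example) would still come out NULL even if every hour were present as a row. A variant of the same scalar-subquery idea looks up the latest hour at or before the current one instead. The sketch below runs in SQLite from Python; the generated hours range and the stock_table layout (hour, location, stock) are assumptions based on the question's sample.

```python
import sqlite3


def carried_forward_stock(rows, location, first_hour, last_hour):
    """For each hour, return the stock of the latest recorded hour <= that hour."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stock_table (hour INTEGER, location INTEGER, stock INTEGER)")
    con.executemany("INSERT INTO stock_table VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH RECURSIVE hours(h) AS (        -- stands in for the hours reference table
            SELECT ?
            UNION ALL
            SELECT h + 1 FROM hours WHERE h < ?
        )
        SELECT hours.h,
               (SELECT s.stock
                  FROM stock_table s
                 WHERE s.location = ?
                   AND s.hour = (SELECT MAX(s2.hour)      -- latest hour that has a row
                                   FROM stock_table s2
                                  WHERE s2.location = ?
                                    AND s2.hour <= hours.h)) AS stock
        FROM hours
        ORDER BY hours.h""", (first_hour, last_hour, location, location)).fetchall()
    con.close()
    return result


print(carried_forward_stock([(6, 2000, 20), (9, 2000, 24)], 2000, 6, 12))
# prints [(6, 20), (7, 20), (8, 20), (9, 24), (10, 24), (11, 24), (12, 24)]
```

Hours before the first recorded hour come back as NULL (None in Python), because the inner MAX finds nothing to match.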
Thanks for your responses. I have tried a RECURSIVE VIEW for this requirement, and it gives correct results (though I worry about CPU usage on big tables, since it is recursive). Here is the stock table:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
Then we create a view on this table that returns the data for all 12 hours using a LEFT OUTER JOIN:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 NULL
8 2000 NULL
9 2000 24
----------------------------------
Then we create a recursive view that repeatedly joins the left-outer-joined view to the recursive view itself, so that at each step every hour's stock is carried forward one hour and the level (LVL) is incremented.
REPLACE RECURSIVE VIEW HOURLY_STOCK_VIEW
(HOUR_NUMBER,LOCATION, STOCK, LVL)
AS
(
SELECT
HOUR_NUMBER,
LOCATION,
STOCK,
1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN
UNION ALL
SELECT
STK.HOUR_NUMBER,
THE_VIEW.LOCATION,
THE_VIEW.STOCK,
LVL+1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN STK
JOIN
HOURLY_STOCK_VIEW THE_VIEW
ON THE_VIEW.HOUR_NUMBER = STK.HOUR_NUMBER -1
WHERE LVL <=12
)
;
As you can see, the view first selects from the left-outer-joined view, then UNIONs that with the same left-outer-joined view joined to the recursive view being defined, recording in LVL the level at which each value arrives.
Finally, for each hour we select the row with the minimum level that has a non-NULL stock.
SEL * FROM HOURLY_STOCK_VIEW
WHERE
(
HOUR_NUMBER,
LVL
)
IN
(
SEL
HOUR_NUMBER,
MIN(LVL)
FROM HOURLY_STOCK_VIEW
WHERE STOCK IS NOT NULL
GROUP BY 1
)
;
This works and gives the following result:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
10 2000 24
11 2000 24
12 2000 24
----------------------------------
I know the recursion is going to take a lot of CPU on large tables (we limit the recursion to 12 levels, since only 12 hours of data are needed, to stop it from going into an infinite loop). But I thought somebody could use this for some kind of hierarchy building. I'd welcome more responses on any other available approaches. Thanks. You can read about recursive views in Teradata at the link below.
http://forums.teradata.com/forum/database/recursion-in-a-stored-procedure
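The level logic of the recursive view can be checked with a recursive CTE. This is a SQLite sketch driven from Python, not Teradata syntax: the hours CTE and the filled CTE stand in for the hours table and STOCK_VIEW_WITH_LEFT_OUTER_JOIN, hourly plays the role of HOURLY_STOCK_VIEW, and the stock_table layout is assumed from the sample.

```python
import sqlite3


def stock_by_min_level(rows, first_hour=6, last_hour=12, max_level=12):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stock_table (hour INTEGER, location INTEGER, stock INTEGER)")
    con.executemany("INSERT INTO stock_table VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH RECURSIVE
        hours(h) AS (
            SELECT ? UNION ALL SELECT h + 1 FROM hours WHERE h < ?
        ),
        filled(hour_number, location, stock) AS (        -- the LEFT OUTER JOIN view
            SELECT hours.h, s.location, s.stock
            FROM hours LEFT OUTER JOIN stock_table s ON s.hour = hours.h
        ),
        hourly(hour_number, location, stock, lvl) AS (   -- the recursive view
            SELECT hour_number, location, stock, 1 FROM filled
            UNION ALL
            SELECT stk.hour_number, v.location, v.stock, v.lvl + 1
            FROM filled stk
            JOIN hourly v ON v.hour_number = stk.hour_number - 1
            WHERE v.lvl <= ?                               -- recursion limit
        )
        SELECT hour_number, location, stock
        FROM hourly
        WHERE (hour_number, lvl) IN (SELECT hour_number, MIN(lvl)
                                     FROM hourly
                                     WHERE stock IS NOT NULL
                                     GROUP BY hour_number)
        ORDER BY hour_number""", (first_hour, last_hour, max_level)).fetchall()
    con.close()
    return result


print(stock_by_min_level([(6, 2000, 20), (9, 2000, 24)]))
# prints [(6, 2000, 20), (7, 2000, 20), (8, 2000, 20), (9, 2000, 24), (10, 2000, 24), (11, 2000, 24), (12, 2000, 24)]
```

Like the original, this matches and groups on the hour only, so it assumes a single location; with several locations, LOCATION would also have to appear in the join condition and the GROUP BY.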
The most common use of a view is to remove complexity.
For example:
CREATE VIEW FEESTUDENT
AS
SELECT S.NAME,F.AMOUNT FROM STUDENT AS S
INNER JOIN FEEPAID AS F ON S.TKNO=F.TKNO
Now do a SELECT:
SELECT * FROM FEESTUDENT
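The effect is easy to see with a throwaway SQLite database driven from Python. Only the view definition comes from the answer; the STUDENT and FEEPAID rows are invented sample values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STUDENT (TKNO INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE FEEPAID (TKNO INTEGER, AMOUNT INTEGER);
    INSERT INTO STUDENT VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO FEEPAID VALUES (1, 500), (2, 300), (1, 200);

    -- The view stores the join once; callers no longer repeat it.
    CREATE VIEW FEESTUDENT AS
    SELECT S.NAME, F.AMOUNT
    FROM STUDENT AS S
    INNER JOIN FEEPAID AS F ON S.TKNO = F.TKNO;
""")
rows = con.execute("SELECT * FROM FEESTUDENT ORDER BY NAME, AMOUNT").fetchall()
print(rows)  # prints [('Asha', 200), ('Asha', 500), ('Ben', 300)]
```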