Find each position of same characters in a string through SQL query - sql

Below is my data set:
Year  Month  Holiday list
----------------------------------------------------------
2007  1      WWHHWWWWWHHWWWWWHHWWWWWHHWWWWWH
2008  4      HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW
I want to write a SQL query that produces the output below, showing the position of each "H".
Year  Month  Holiday list
-----------------------------------------------------
2007  1      3 4 10 11 17 18 24 25 31
2008  4      1 7 8 14 15 21 22 28 29
The INSTR function in Oracle returns the position of the first occurrence only, so I cannot use it as is.
I could write a function, pass the Holiday List column value to it, loop through the string and
find the position of each "H". But I want to achieve this with a plain SQL SELECT query.
How can I find each position of the same character in a string through a SQL query in an Oracle database?

Something like this might do it. You can use the INSTR function (with its occurrence argument) and a hierarchical query to step through the string, and then use the LISTAGG aggregate function to concatenate the results.
WITH data AS (
  SELECT 2007 year, 1 month, 'WWHHWWWWWHHWWWWWHHWWWWWHHWWWWWH' holiday_list FROM DUAL
  UNION ALL
  SELECT 2008 year, 4 month, 'HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW' holiday_list FROM DUAL)
SELECT year, month,
       -- LEVEL counts occurrences; keep going while the LEVEL-th 'H' still exists
       (SELECT LISTAGG(INSTR(holiday_list,'H',1,LEVEL),' ') WITHIN GROUP (ORDER BY LEVEL)
        FROM DUAL
        CONNECT BY INSTR(holiday_list,'H',1,LEVEL) > 0) S
FROM data;
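To sanity-check the logic outside the database, here is a minimal Python sketch of what the hierarchical query is doing: it asks for the 1st, 2nd, 3rd, ... occurrence of 'H' until there is none left. The names `instr` and `holiday_positions` are made up for this illustration; `instr` only imitates Oracle's 1-based INSTR with an occurrence argument and is not part of any library.

```python
def instr(text, ch, occurrence):
    """Imitate Oracle INSTR(text, ch, 1, occurrence): 1-based position, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(ch, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def holiday_positions(holiday_list, ch="H"):
    """Collect the position of every occurrence, like LEVEL = 1, 2, 3, ... in CONNECT BY."""
    positions = []
    level = 1
    while True:
        pos = instr(holiday_list, ch, level)
        if pos == 0:  # no LEVEL-th occurrence, so stop
            break
        positions.append(pos)
        level += 1
    return positions


print(" ".join(map(str, holiday_positions("HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW"))))
# prints: 1 7 8 14 15 21 22 28 29
```

Stopping as soon as the requested occurrence no longer exists also copes with strings that begin with several 'H' characters in a row.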

Well, you can get a list:
SELECT id, INSTR(some, 'H', 1, c.no) AS Pos
FROM tst CROSS JOIN consecutive c
WHERE INSTR(some, 'H', 1, c.no) > 0
This gives you one row per day that has an H. You could then try to transpose this list into a single row. But be warned: the CROSS JOIN is intentional, so every row in your data is joined to every number in the helper table before the WHERE clause filters it, and you can end up with up to 31 result rows per input row. As Tim Biegeleisen said, it may be better to use a UDF for this.

Related

Parse dynamic substring SQL

Querying data in a Postgres SQL database..
I have a column job_name with data like so, where I want to parse the year out of the string (the first 4 numbers)..
AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS
In this case I want to pull 2017
This works fine and pulls the rows where the 4 digits are >= 2019...
select
job_id,
cast(substring(job_name, 4, 4) as integer) as year
from job_header
where
job_type = 'PAMA'
and cast(substring(job_name, 4, 4) as integer) >= 2019
The issue is we have some data where the string is formatted slightly differently, and there are 3 characters in the beginning, or 4 (instead of 2), followed by an underscore like this...
APR3_2018_Another_Test_SC_PAMA
I still want to pull the first 4 numbers (so 2018 in this case), cast to int etc...Is there a way to make my first query dynamic to handle this?
You can use a regular expression to find the first group of 4 digits, e.g.:
with job_header(job_id, job_name, job_type) as (
values
(1, 'AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS', 'PAMA'),
(2, 'APR3_2018_Another_Test_SC_PAMA', 'PAMA')
)
select
job_id,
cast(substring(job_name from '\d{4}') as int) as year
from job_header
where job_type = 'PAMA'
job_id | year
--------+------
1 | 2017
2 | 2018
(2 rows)
Read about POSIX Regular Expressions in the documentation.
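If it helps to try the pattern before running it against the table, the same idea can be sketched in Python. This is only an illustration of the regular expression; `first_year` is a hypothetical helper name, and the sample strings are the two from the question.

```python
import re


def first_year(job_name):
    """Return the first run of four digits as an int, or None if there is none."""
    match = re.search(r"\d{4}", job_name)
    return int(match.group()) if match else None


print(first_year("AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS"))  # prints: 2017
print(first_year("APR3_2018_Another_Test_SC_PAMA"))           # prints: 2018
```

Note that the pattern takes the first four consecutive digits wherever they occur, so a longer digit run earlier in the string would be matched by its first four digits.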

Hive QL to populate a sequence of numbers between limits

I'm not sure how to put this in a straightforward manner, but I'm trying to make something work in Hive SQL. I need to create a sequence of numbers from a lower limit to an upper limit.
Ex:
select min(year) from table
Let's assume it results in 2010
select max(year) from table
Let's assume it results in 2015
I need to publish each year from 2010 to 2015 in a select query.
And I'm trying to put the min calculation & max calculation inside the same SQL which will/should create sequential years in the output.
Any ideas?
Well I have an idea but in order to use it, you will have to define the lowest possible and the largest possible values for the years that might be present in your table.
Let's say the smallest possible year is 1900 and the largest possible year is 2200.
Since the largest possible difference in this case is 2200-1900=300, you will have to use the following string: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... ... 298 299 300.
In the query, you split this string using space as a delimiter thus getting an array, and then you explode that array.
Have a look:
SELECT
minval + delta
FROM
(
SELECT
min(year) minval,
max(year) maxval,
split('0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... ... ... 298 299 300', ' ') delta_list
FROM
table
) t
LATERAL VIEW explode(delta_list) dlist AS delta
WHERE (maxval-minval) >= delta
;
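So you end up with 301 rows, but you only need the rows whose delta does not exceed the difference between the max year and the min year, which is what the WHERE clause enforces.
Alternatively, you can generate the sequence with space() and posexplode() instead of a hard-coded list: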
set hivevar:end_year=2019;
set hivevar:start_year=2010;
select ${hivevar:start_year}+i as year
from
(
select posexplode(split(space((${hivevar:end_year}-${hivevar:start_year})),' ')) as (i,x)
)s;
Result:
year
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
Have a look also at this answer about generating missing dates.
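The space()/split()/posexplode() trick can look like magic, so here is a rough Python equivalent of the mechanics. It is a sketch only; `generate_years` is a made-up name, and the string handling merely mimics what the Hive functions do: a string of n spaces split on a space yields n + 1 empty pieces, and their indexes 0..n are the offsets added to the start year.

```python
def generate_years(start_year, end_year):
    """Mimic posexplode(split(space(end - start), ' ')): one index per piece."""
    spaces = " " * (end_year - start_year)   # like space(n)
    pieces = spaces.split(" ")               # n + 1 empty strings
    return [start_year + i for i, _ in enumerate(pieces)]  # like posexplode


print(generate_years(2010, 2019))
# prints: [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
```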

How do I average the last 6 months of sales within SQL based on period AND year?

How do I average the last 6 months of sales within SQL?
Here are my tables and fields:
IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD,
IM_ItemWhseHistoryByPeriod.FISCALCALYEAR,
And I need to average these fields
IM_ItemWhseHistoryByPeriod.DOLLARSSOLD,
IM_ItemWhseHistoryByPeriod.QUANTITYSOLD,
The hard part for me is understanding how to average the last 6 whole months, i.e. FISCALCALPERIOD 2-6 (inside FISCALCALYEAR 2017).
I'm hoping for some help on what the SQL command text should look like since I'm very new to manipulating SQL outside of the UI.
My Existing SQL String:
SELECT IM_ItemWhseHistoryByPeriod.ITEMCODE,
IM_ItemWhseHistoryByPeriod.DOLLARSSOLD,
IM_ItemWhseHistoryByPeriod.QUANTITYSOLD,
IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD,
IM_ItemWhseHistoryByPeriod.FISCALCALYEAR
FROM MAS_AME.dbo.IM_ItemWhseHistoryByPeriod
IM_ItemWhseHistoryByPeriod
ScaisEdge Attempt #1
If FISCALCALYEAR and FISCALCALPERIOD are numeric, you could use
select avg(IM_ItemWhseHistoryByPeriod.DOLLARSSOLD),
avg(IM_ItemWhseHistoryByPeriod.QUANTITYSOLD)
from IM_ItemWhseHistoryByPeriod
where IM_ItemWhseHistoryByPeriod.FISCALCALYEAR = 2017
and IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD between 2 and 6
or, for each item code,
select itemcode, avg(IM_ItemWhseHistoryByPeriod.DOLLARSSOLD),
avg(IM_ItemWhseHistoryByPeriod.QUANTITYSOLD)
from IM_ItemWhseHistoryByPeriod
where IM_ItemWhseHistoryByPeriod.FISCALCALYEAR = 2017
and IM_ItemWhseHistoryByPeriod.FISCALCALPERIOD between 2 and 6
group by itemcode
Try the following solution and see if it works for you:
select avg(DOLLARSSOLD) as AvgDollarSold,
avg(QUANTITYSOLD) as AvgQtySold
from IM_ItemWhseHistoryByPeriod
where FISCALCALYEAR = '2017'
and FISCALCALPERIOD between 2 and 6
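To see what the filtered, grouped average works out to, here is a small Python sketch of the same logic. The rows are invented purely for illustration (they are not the asker's data), and `average_by_item` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows: (itemcode, fiscal year, fiscal period, dollars sold, quantity sold)
rows = [
    ("A100", 2017, 1, 90.0, 9),     # period 1 falls outside 2..6
    ("A100", 2017, 2, 100.0, 10),
    ("A100", 2017, 6, 200.0, 30),
    ("B200", 2017, 3, 50.0, 5),
    ("B200", 2016, 4, 999.0, 99),   # wrong year, filtered out
]


def average_by_item(rows, year, first_period, last_period):
    """Average dollars and quantity per item for one year and a period range."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # dollars, quantity, row count
    for item, fiscal_year, period, dollars, quantity in rows:
        if fiscal_year == year and first_period <= period <= last_period:
            entry = totals[item]
            entry[0] += dollars
            entry[1] += quantity
            entry[2] += 1
    return {item: (d / n, q / n) for item, (d, q, n) in totals.items()}


print(average_by_item(rows, 2017, 2, 6))
# prints: {'A100': (150.0, 20.0), 'B200': (50.0, 5.0)}
```

As in the queries above, the average is taken over the rows that exist in the range, so a period with no row for an item is not counted as zero.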

group yearmonth field by quarter in sql server

I have an int field in my database which represents year and month, e.g. 201501 stands for Jan 2015.
I need to group by the reporting_date field and show the quarterly data. The table is in the following format. reporting_date is an int field rather than a datetime, and interest_payment is a float.
reporting_date interest_payment
200401 5
200402 10
200403 25
200404 15
200406 5
200407 20
200408 25
200410 10
The output of the query should look like this:
reporting_date interest_payment
Q1 -2004 40
Q2 -2004 20
Q3 -2004 45
Q4 -2004 10
I tried using a normal GROUP BY statement
select reporting_date , sum(interest_payment) as interest_payment from testTable
group by reporting_date
but got a different result. Any help would be appreciated.
Thanks
Before grouping, you need to calculate report_quarter, which is equal to
(reporting_date%100-1)/3
(integer division, so it yields 0 to 3). Then run the select:
select report_year, 'Q'+cast(report_quarter+1 as varchar(1)), SUM (interest_payment)
from
(
select
*,
(reporting_date%100 - 1)/3 as report_quarter,
reporting_date/100 as report_year
from #x
) T
group by report_year, report_quarter
order by report_year, report_quarter
I see two problems here:
1. You need to convert reporting_date into a quarter.
2. You need to SUM() the values in interest_payment for each quarter.
You seem to have the right idea for (2) already, so I'll just help with (1).
If the numbers are all 6 digits (see my comment above) you can just do some numeric manipulation to turn them into quarters.
First, extract the month by dividing by 100 and keeping the remainder: MOD(reporting_date, 100).
Then, convert that into a quarter: FLOOR((MOD(reporting_date, 100) - 1) / 3) + 1
Add a Q and the year if desired.
Finally, use that value in your GROUP BY.
You didn't specify which DBMS you are using, so you may have to convert the functions yourself.
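The arithmetic is easy to check by hand, or with a few lines of Python. The sketch below uses the sample rows from the question; the function names `quarter_label` and `sum_by_quarter` are invented for this illustration.

```python
from collections import defaultdict

# Sample rows from the question: (reporting_date as YYYYMM, interest_payment)
rows = [(200401, 5), (200402, 10), (200403, 25), (200404, 15),
        (200406, 5), (200407, 20), (200408, 25), (200410, 10)]


def quarter_label(reporting_date):
    """Split YYYYMM into year and month, then map months 1-12 to quarters 1-4."""
    year, month = divmod(reporting_date, 100)  # 200406 -> (2004, 6)
    quarter = (month - 1) // 3 + 1             # 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    return f"Q{quarter} -{year}"


def sum_by_quarter(rows):
    """Sum interest_payment per quarter label, like GROUP BY year, quarter."""
    totals = defaultdict(int)
    for reporting_date, interest_payment in rows:
        totals[quarter_label(reporting_date)] += interest_payment
    return dict(totals)


for label, total in sum_by_quarter(rows).items():
    print(label, total)
```

Subtracting 1 before dividing by 3 is what keeps March in Q1 and December in Q4.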

SQL Select Command Syntax Help

Hopefully this is pretty simple, but I'm having some trouble. I have a table with multiple fields, but the two that matter are ID and Year. A single ID can exist in many years. How do I set up a SELECT statement (which I'm ultimately using in a join in another statement) that retrieves each distinct ID once, with no duplicates, together with the latest year it exists for?
If there is a set of records like this:
ID - Year
55 - 2000
55 - 2001
56 - 2000
56 - 2002
So basically I want something like this returned:
55 - 2001
56 - 2002
Help?
SELECT ID, MAX(Year) FROM MyTable GROUP BY ID
References:
SELECT (Transact-SQL)
GROUP BY (Transact-SQL)
Aggregate Functions (Transact-SQL)
MAX (Transact-SQL)
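For anyone who wants to see what GROUP BY with MAX does row by row, here is a minimal Python sketch using the sample records from the question. `latest_year_per_id` is a made-up name for illustration.

```python
def latest_year_per_id(rows):
    """Keep the largest year seen for each ID, like MAX(Year) ... GROUP BY ID."""
    latest = {}
    for record_id, year in rows:
        if record_id not in latest or year > latest[record_id]:
            latest[record_id] = year
    return latest


rows = [(55, 2000), (55, 2001), (56, 2000), (56, 2002)]
print(latest_year_per_id(rows))
# prints: {55: 2001, 56: 2002}
```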