How to find month gaps in Oracle table? - sql

I have an Oracle table with an EmpName column (Char) and Month_From and Month_To columns (numeric). I need to find the missing months (month gaps). In the sample data below, the missing month is 6 (Jun).
Thanks in advance.
Sample Data:
|-------|-----------|---------|
|eName  |Month_From |Month_To |
|(Char) |(Int)      |(Int)    |
|-------|-----------|---------|
|John   |1          |2        | (Jan to Feb)
|John   |3          |5        | (Mar to May)
|John   |7          |8        | (Jul to Aug)
|-------|-----------|---------|
Need to Find (Jun to Jun).

Assuming no overlaps, you can find the missing months using lag():
select (prev_month_to + 1) as start_missing,
       (month_from - 1) as end_missing
from (select t.*,
             lag(month_to) over (partition by ename order by month_from) as prev_month_to
      from t
     ) t
where prev_month_to <> month_from - 1;
This returns a range for each gap, because a gap can span more than one month.
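A minimal, runnable sketch of this lag() approach, using SQLite (3.25+ for window functions) in place of Oracle; the table and column names (t, ename, month_from, month_to) are taken from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ename TEXT, month_from INT, month_to INT);
    INSERT INTO t VALUES ('John', 1, 2), ('John', 3, 5), ('John', 7, 8);
""")
rows = conn.execute("""
    SELECT (prev_month_to + 1) AS start_missing,
           (month_from - 1)    AS end_missing
    FROM (SELECT t.*,
                 LAG(month_to) OVER (PARTITION BY ename
                                     ORDER BY month_from) AS prev_month_to
          FROM t) t
    WHERE prev_month_to <> month_from - 1
""").fetchall()
print(rows)  # [(6, 6)] -> the missing range Jun to Jun
```

Note that the first row per employee has a NULL prev_month_to, so the comparison is NULL and the row is filtered out, which is exactly what we want.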

Just for converting the month numbers in the sample data to month names, you may consider:
select to_char(to_date(lpad(t.month_from, 2, '0'), 'mm'), 'Mon') || ' to ' ||
       to_char(to_date(lpad(t.month_to, 2, '0'), 'mm'), 'Mon')
  from my_table t
 where upper(t.eName) = upper('&i_eName');
For the question's expected result (Jun to Jun):
select to_char(to_date(lpad(a1.mon, 2, '0'), 'mm'), 'Mon')
  from (select level mon from dual connect by level <= 12) a1
 where not exists (select null
                     from my_table a2
                    where a1.mon between a2.month_from and a2.month_to
                      and upper(a2.eName) = upper('&i_eName'))
 order by mon;
But it also returns Sep, Oct, Nov and Dec besides Jun. For this, I agree with @mathguy's comment.
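One way to avoid the trailing months (an assumption about intent: only report gaps inside the employee's covered range) is to cap the candidate months at the employee's max(month_to). A sketch in SQLite, where a recursive CTE stands in for Oracle's connect by:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (ename TEXT, month_from INT, month_to INT);
    INSERT INTO my_table VALUES ('John', 1, 2), ('John', 3, 5), ('John', 7, 8);
""")
rows = conn.execute("""
    WITH RECURSIVE months(mon) AS (
        SELECT 1 UNION ALL SELECT mon + 1 FROM months WHERE mon < 12
    )
    SELECT m.mon
    FROM months m
    WHERE m.mon <= (SELECT MAX(month_to) FROM my_table
                    WHERE UPPER(ename) = UPPER('John'))
      AND NOT EXISTS (SELECT 1 FROM my_table a2
                      WHERE m.mon BETWEEN a2.month_from AND a2.month_to
                        AND UPPER(a2.ename) = UPPER('John'))
    ORDER BY m.mon
""").fetchall()
print(rows)  # [(6,)] -> only Jun, not Sep..Dec
```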

PostgreSQL - return months even where there are no records within that month

First of all, thank you for helping me; this is my first junior job and I don't want to screw it up.
I need to return all the records of a debts history (grouped by price and money_balance) by month, per property, for a specific year.
What I've been trying to do is
SELECT properties.name as property
, EXTRACT(month from priority_date) as month
, SUM(debts.money_balance) as money_balance
, SUM(debts.price) as price
FROM properties
JOIN debts on properties.id = debts.property_id
WHERE properties.community_id = 15
AND properties.active = TRUE
AND EXTRACT(year from priority_date) = 2021
GROUP BY month, properties.name
This is going to give me something like:
|id |property |month |money_balance |price |
|---|---------|------|--------------|------|
|1  |A1       |1     |1111          |3131  |
|2  |A1       |7     |0             |1111  |
|3  |A2       |7     |0             |1111  |
But I need to have even the months where there are no records, with money_balance and price at 0 or null. Is this achievable with SQL?
Thank you so much.
Edit:
Desired output:
|id |property |month |money_balance |price |
|---|---------|------|--------------|------|
|1  |A1       |1     |1111          |3131  |
|2  |A1       |2     |0             |0     |
|3  |A1       |3     |0             |0     |
Up to month 12; it can be 0 or null in the months where there are no records.
You can use generate_series() to generate all the months, and then bring the data in with a left join (an inner join would drop the empty months again):
SELECT p.name as property, gs.mon,
       SUM(d.money_balance) as money_balance,
       SUM(d.price) as price
FROM GENERATE_SERIES(1, 12, 1) gs(mon) LEFT JOIN
     (properties p JOIN
      debts d
      ON p.id = d.property_id AND
         p.community_id = 15 AND
         p.active = TRUE AND
         EXTRACT(year from priority_date) = 2021
     )
     ON EXTRACT(month from priority_date) = gs.mon
GROUP BY gs.mon, p.name
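The same idea can be sketched in SQLite (which lacks generate_series), with a recursive CTE generating months 1..12; the debts schema here is a simplified assumption, dropping the properties join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE debts (property_id INT, priority_date TEXT,
                        money_balance INT, price INT);
    INSERT INTO debts VALUES (1, '2021-01-15', 1111, 3131),
                             (1, '2021-07-20', 0, 1111);
""")
rows = conn.execute("""
    WITH RECURSIVE months(mon) AS (
        SELECT 1 UNION ALL SELECT mon + 1 FROM months WHERE mon < 12
    )
    SELECT months.mon,
           COALESCE(SUM(d.money_balance), 0) AS money_balance,
           COALESCE(SUM(d.price), 0) AS price
    FROM months
    LEFT JOIN debts d
           ON CAST(strftime('%m', d.priority_date) AS INT) = months.mon
          AND strftime('%Y', d.priority_date) = '2021'
    GROUP BY months.mon
    ORDER BY months.mon
""").fetchall()
print(rows[0], rows[1], rows[6])  # (1, 1111, 3131) (2, 0, 0) (7, 0, 1111)
```

The left join is what guarantees a row for every month; COALESCE turns the NULL sums of empty months into 0.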

How to count the number of NEW ids in hive SQL table by date?

I have a table with a bunch of months and ids. I want to count how many NEW ids I've gotten in each month. For example, say I have the following table:
Month | ID
------------
Jan | 123
Jan | 456
Jan | 789
Feb | 123
Feb | 101112
Mar | 456
Mar | 12345
Mar | 6789
I want the output to be:
Month | # New IDS
------------------
Jan | 3
Feb | 1
Mar | 2
I'm truly lost on the best way to do this and haven't been able to find anything that's similar to this problem.
One option uses two levels of aggregation. Assuming that month is of a date datatype (or at least something that can be consistently sorted as a date):
select month, count(*) new_ids
from (select min(month) month from mytable group by id) t
group by month
You can also use window functions:
select month, count(*) new_ids
from (
    select month, row_number() over(partition by id order by month) rn
    from mytable
) t
where rn = 1
group by month
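Hive itself is hard to test here, but the two-level aggregation can be sketched in SQLite, using month numbers (1 = Jan, 2 = Feb, 3 = Mar) so that min() sorts correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (month INT, id INT);
    INSERT INTO mytable VALUES (1,123),(1,456),(1,789),
                               (2,123),(2,101112),
                               (3,456),(3,12345),(3,6789);
""")
rows = conn.execute("""
    SELECT month, COUNT(*) AS new_ids
    FROM (SELECT id, MIN(month) AS month FROM mytable GROUP BY id) t
    GROUP BY month
    ORDER BY month
""").fetchall()
print(rows)  # [(1, 3), (2, 1), (3, 2)] -> Jan 3, Feb 1, Mar 2
```

The inner query finds each id's first month; the outer query counts how many ids debut in each month.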

Group by value, but only for continuous runs of that value

OK, the title is far from obvious; I could not explain it better.
Consider a table with columns (date, xvalue, and some other columns). I need to group rows by xvalue, but only when they are not interrupted in time (column date). For example, given:
Date  |xvalue |yvalue|
1 Mar |10     |1     |
2 Mar |10     |2     |
3 Mar |20     |6     |
4 Mar |20     |1     |
5 Mar |10     |4     |
6 Mar |10     |2     |
From the above data I would like to get three rows: one for the first run of xvalue = 10, one for xvalue = 20, and one for the second run of xvalue = 10, each with an aggregate of the other values, for example a sum:
1 Mar, 10, 3
3 Mar, 20, 7
5 Mar, 10, 6
It's like the query:
select min(date), xvalue, sum(yvalue) from t group by xvalue
except that the above will merge 1, 2, 5 and 6 March into one group, and I want them as separate groups.
This is an example of a gaps-and-islands problem. But you need an ordering column. With such a column, you can use the difference of row numbers:
select min(date), xvalue, sum(yvalue)
from (select t.*,
             row_number() over (partition by xvalue order by date) as seqnum_d,
             row_number() over (order by date) as seqnum
      from t
     ) t
group by xvalue, (seqnum - seqnum_d)
order by min(date)
Here is a db<>fiddle.
Data in a database is logically stored in mathematical sets, inside which there is absolutely no order and no default ordering; they are comparable to bags in which objects can move around while being used.
So there is no solution to your query until you add a specific column that provides the sort order the user needs...
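The difference-of-row-numbers trick can be verified against the sample data in SQLite (here date is stored as the day-of-March integer, an assumption to keep the sketch small):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (date INT, xvalue INT, yvalue INT);  -- date = day of March
    INSERT INTO t VALUES (1,10,1),(2,10,2),(3,20,6),(4,20,1),(5,10,4),(6,10,2);
""")
rows = conn.execute("""
    SELECT MIN(date), xvalue, SUM(yvalue)
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY xvalue ORDER BY date) AS seqnum_d,
                 ROW_NUMBER() OVER (ORDER BY date) AS seqnum
          FROM t) t
    GROUP BY xvalue, (seqnum - seqnum_d)
    ORDER BY MIN(date)
""").fetchall()
print(rows)  # [(1, 10, 3), (3, 20, 7), (5, 10, 6)]
```

Within an uninterrupted run of the same xvalue, both row numbers advance in lockstep, so their difference is constant; a different xvalue in between shifts seqnum but not seqnum_d, starting a new island.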

How To Increment Date By One Year, Based on Last Result (DateTime Banding)

Hopefully I'll be able to explain this better than the title.
I have an activity table that looks like this:
|ID |LicenseNumber |DateTime                |
|1  |123           |2017-11-17 11:19:04.420 |
|2  |123           |2017-11-26 10:16:52.790 |
|3  |123           |2018-02-06 11:13:21.480 |
|4  |123           |2018-02-19 10:12:32.493 |
|5  |123           |2018-05-16 09:33:05.440 |
|6  |123           |2019-01-02 10:05:25.193 |
What I need is a count of rows per License Number, grouped in essentially 12 month intervals. But, the year needs to start from when the previous entry ended.
For example, I need a count of all records for 12 months from 2017-11-17 11:19:04.420, and then I need a count of all records starting from (2017-11-17 11:19:04.420 + 12 months) for another 12 months, and so on.
I've considered using recursive CTEs, the LAG function etc. but can't quite figure it out. I could probably do something with a CASE statement and static values, but that would require updating the report code every year.
Any help pointing me in the right direction would be much appreciated!
I think the following code using a recursive CTE can help you, but I am not totally sure what you want to achieve:
WITH CTE AS
(
    SELECT TOP 1 DateTime
    FROM YourTable
    ORDER BY ID
    UNION ALL
    SELECT DATEADD(YEAR, 1, DateTime)
    FROM CTE
    WHERE DateTime <= DATEADD(YEAR, 1, GETDATE())
)
SELECT LicenseNumber, CTE.DateTime, COUNT(*) AS Rows
FROM CTE
INNER JOIN YourTable
    ON YourTable.DateTime BETWEEN CTE.DateTime AND DATEADD(YEAR, 1, CTE.DateTime)
GROUP BY LicenseNumber, CTE.DateTime;
Hmmm. Do you just need the number of records in 12-month intervals after the first record?
If so:
select dateadd(year, yr - 1, min_datetime),
       dateadd(year, yr, min_datetime),
       count(t.id)
from (values (1), (2), (3)) v(yr) left join
     (select t.*,
             min(datetime) over () as min_datetime
      from t
     ) t
     on t.datetime >= dateadd(year, yr - 1, min_datetime) and
        t.datetime < dateadd(year, yr, min_datetime)
group by dateadd(year, yr - 1, min_datetime),
         dateadd(year, yr, min_datetime)
order by dateadd(year, yr - 1, min_datetime);
This can easily be extended to more years, if it is what you want.
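The banding logic can be sketched in plain Python against the question's sample rows: each record is assigned to a 12-month window anchored at its license's first record. The window index is computed by calendar-month count (day-of-month ignored), which is an approximation of the SQL dateadd boundaries:

```python
from datetime import datetime
from collections import Counter

rows = [  # (id, license_number, datetime) from the question's sample
    (1, 123, "2017-11-17 11:19:04"), (2, 123, "2017-11-26 10:16:52"),
    (3, 123, "2018-02-06 11:13:21"), (4, 123, "2018-02-19 10:12:32"),
    (5, 123, "2018-05-16 09:33:05"), (6, 123, "2019-01-02 10:05:25"),
]
parsed = [(lic, datetime.strptime(s, "%Y-%m-%d %H:%M:%S")) for _, lic, s in rows]

first = {}                       # anchor = earliest record per license
for lic, dt in parsed:
    if lic not in first or dt < first[lic]:
        first[lic] = dt

counts = Counter()
for lic, dt in parsed:
    months = (dt.year - first[lic].year) * 12 + (dt.month - first[lic].month)
    counts[(lic, months // 12)] += 1   # 12-month band index: 0, 1, 2, ...

print(dict(counts))  # {(123, 0): 5, (123, 1): 1}
```

For the sample, five records fall in the first year from 2017-11-17 and one in the second, matching what the SQL approaches above should produce.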

Rolling counts based on rolling cohorts

Using Postgres 9.5. Test data:
create temp table rental (
customer_id smallint
,rental_date timestamp without time zone
,customer_name text
);
insert into rental values
(1, '2006-05-01', 'james'),
(1, '2006-06-01', 'james'),
(1, '2006-07-01', 'james'),
(1, '2006-07-02', 'james'),
(2, '2006-05-02', 'jacinta'),
(2, '2006-05-03', 'jacinta'),
(3, '2006-05-04', 'juliet'),
(3, '2006-07-01', 'juliet'),
(4, '2006-05-03', 'julia'),
(4, '2006-06-01', 'julia'),
(5, '2006-05-05', 'john'),
(5, '2006-06-01', 'john'),
(5, '2006-07-01', 'john'),
(6, '2006-07-01', 'jacob'),
(7, '2006-07-02', 'jasmine'),
(7, '2006-07-04', 'jasmine');
I am trying to understand the behaviour of existing customers. I am trying to answer this question:
What is the likelihood of a customer to order again based on when their last order was (current month, previous month (m-1)...to m-12)?
Likelihood is calculated as:
distinct count of people who ordered in current month /
distinct count of people in their cohort.
Thus, I need to generate a table that lists a count of the people who ordered in the current month, who belong in a given cohort.
Thus, what are the rules for being in a cohort?
- current month cohort: >1 order in month OR (1 order in month given no previous orders)
- m-1 cohort: <=1 order in current month and >=1 order in m-1
- m-2 cohort: <=1 order in current month and 0 orders in m-1 and >=1 order in m-2
- etc
I am using the DVD Store database as sample data to develop the query: http://linux.dell.com/dvdstore/
Here is an example of cohort rules and aggregations, based on July being the
"month's orders being analysed" (please notice: the "month's orders being analysed" column is the first column in the 'Desired output' table below):
customer_id |jul-16 |jun-16 |may-16 |
------------|-------|-------|-------|
james       |1 1    |1      |1      | <- member of jul cohort, made order in jul
jasmine     |1 1    |       |       | <- member of jul cohort, made order in jul
jacob       |1      |       |       | <- member of jul cohort, did NOT make order in jul
john        |1      |1      |1      | <- member of jun cohort, made order in jul
julia       |       |1      |1      | <- member of jun cohort, did NOT make order in jul
juliet      |1      |       |1      | <- member of may cohort, made order in jul
jacinta     |       |       |1 1    | <- member of may cohort, did NOT make order in jul
This data would output the following table:
--where m = month's orders being analysed
month's orders |how many people |how many people from |how many people |how many people from |how many people |how many people from |
being analysed |are in cohort m |cohort m ordered in m |are in cohort m-1 |cohort m-1 ordered in m |are in cohort m-2 |cohort m-2 ordered in m |...m-12
---------------|----------------|----------------------|------------------|------------------------|------------------|------------------------|
may-16 |5 |1 | | | | |
jun-16 | | |5 |3 | | |
jul-16 |3 |2 |2 |1 |2 |1 |
My attempts so far have been on variations of:
generate_series()
and
row_number() over (partition by customer_id order by rental_id desc)
I haven't been able to get everything to come together yet (I've tried for many hours and haven't yet solved it).
For readability, I think posting my work in parts is better (if anyone wants me to post the sql query in its entirety please comment - and I'll add it).
series query:
(select
    generate_series(date_trunc('month', min(rental_date)),
                    date_trunc('month', max(rental_date)),
                    '1 month') as month_being_analysed
from
    rental) as series
rank query:
(select
*,
row_number() over (partition by customer_id order by rental_id desc) as rnk
from
rental
where
date_trunc('month',rental_date) <= series.month_being_analysed) as orders_ranked
I want to do something like: run the orders_ranked query for every row returned by the series query, and then base aggregations on each return of orders_ranked.
Something like:
(--this query counts the customers in cohort m-1
select
count(distinct customer_id)
from
(--this query ranks the orders that have occured <= to the date in the row of the 'series' table
select
*,
row_number() over (partition by customer_id order by rental_id desc) as rnk
from
rental
where
date_trunc('month',rental_date)<=series.month_being_analysed) as orders_ranked
where
(rnk=1 between series.month_being_analysed - interval '2 months' and series.month_being_analysed - interval '1 month')
OR
(rnk=2 between series.month_being_analysed - interval '2 months' and series.month_being_analysed - interval '1 month')
) as people_2nd_last_booking_in_m_1,
(--this query counts the customers in cohort m-1 who ordered in month m
select
count(distinct customer_id)
from
(--this query returns the orders by customers in cohort m-1
select
count(distinct customer_id)
from
(--this query ranks the orders that have occured <= to the date in the row of the 'series' table
select
*,
row_number() over (partition by customer_id order by rental_id desc) as rnk
from
rental
where
date_trunc('month',rental_date)<=series.month_being_analysed) as orders_ranked
where
(rnk=1 between series.month_being_analysed - interval '2 months' and series.month_being_analysed - interval '1 month')
OR
(rnk=2 between series.month_being_analysed - interval '2 months' and series.month_being_analysed - interval '1 month')
where
rnk=1 in series.month_being_analysed
) as people_who_booked_in_m_whose_2nd_last_booking_was_in_m_1,
...
from
(select
    generate_series(date_trunc('month', min(rental_date)),
                    date_trunc('month', max(rental_date)),
                    '1 month') as month_being_analysed
from
    rental) as series
This query does everything. It operates on the whole table and works for any time range.
Based on some assumptions and assuming current Postgres version 9.5. Should work with pg 9.1 at least. Since your definition of "cohort" is unclear to me, I skipped the "how many people in cohort" columns.
I would expect it to be faster than anything you tried so far. By orders of magnitude.
SELECT *
FROM crosstab (
$$
SELECT mon
, sum(count(*)) OVER (PARTITION BY mon)::int AS m0
, gap -- count of months since last order
, count(*) AS gap_ct
FROM (
SELECT mon
, mon_int - lag(mon_int) OVER (PARTITION BY c_id ORDER BY mon_int) AS gap
FROM (
SELECT DISTINCT ON (1,2)
date_trunc('month', rental_date)::date AS mon
, customer_id AS c_id
, extract(YEAR FROM rental_date)::int * 12
+ extract(MONTH FROM rental_date)::int AS mon_int
FROM rental
) dist_customer
) gap_to_last_month
GROUP BY mon, gap
ORDER BY mon, gap
$$
, 'SELECT generate_series(1,12)'
) ct (mon date, m0 int
, m01 int, m02 int, m03 int, m04 int, m05 int, m06 int
, m07 int, m08 int, m09 int, m10 int, m11 int, m12 int);
Result:
mon | m0 | m01 | m02 | m03 | m04 | m05 | m06 | m07 | m08 | m09 | m10 | m11 | m12
------------+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
2015-01-01 | 63 | 36 | 15 | 5 | 3 | 3 | | | | | | |
2015-02-01 | 56 | 35 | 9 | 9 | 2 | | 1 | | | | | |
...
m0 .. customers with >= 1 order this month
m01 .. customers with >= 1 order this month and >= 1 order 1 month before (nothing in between)
m02 .. customers with >= 1 order this month and >= 1 order 2 months before, with no order in between
etc.
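The inner aggregation (everything except the crosstab() pivot and the m0 window sum) can be checked in SQLite against the sample rental data, using the same mon / c_id / mon_int / gap names as the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rental (customer_id INT, rental_date TEXT, customer_name TEXT);
    INSERT INTO rental VALUES
    (1,'2006-05-01','james'),(1,'2006-06-01','james'),
    (1,'2006-07-01','james'),(1,'2006-07-02','james'),
    (2,'2006-05-02','jacinta'),(2,'2006-05-03','jacinta'),
    (3,'2006-05-04','juliet'),(3,'2006-07-01','juliet'),
    (4,'2006-05-03','julia'),(4,'2006-06-01','julia'),
    (5,'2006-05-05','john'),(5,'2006-06-01','john'),(5,'2006-07-01','john'),
    (6,'2006-07-01','jacob'),(7,'2006-07-02','jasmine'),(7,'2006-07-04','jasmine');
""")
rows = conn.execute("""
    SELECT mon, gap, COUNT(*) AS gap_ct
    FROM (SELECT mon,
                 mon_int - LAG(mon_int) OVER (PARTITION BY c_id
                                              ORDER BY mon_int) AS gap
          FROM (SELECT DISTINCT strftime('%Y-%m', rental_date) AS mon,
                                customer_id AS c_id,
                                CAST(strftime('%Y', rental_date) AS INT) * 12
                              + CAST(strftime('%m', rental_date) AS INT) AS mon_int
                FROM rental) dist_customer) gap_to_last_month
    GROUP BY mon, gap
    ORDER BY mon, gap
""").fetchall()
print(rows)
```

A NULL gap marks a customer's first-ever month; for July, two customers last ordered one month ago and one customer two months ago, which is what the m01/m02 columns then pivot out.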
How?
In subquery dist_customer reduce to one row per month and customer_id (mon, c_id) with DISTINCT ON:
Select first row in each GROUP BY group?
To simplify later calculations add a count of months for the date (mon_int). Related:
How do you do date math that ignores the year?
If there are many orders per (month, customer), there are faster query techniques for the first step:
Optimize GROUP BY query to retrieve latest record per user
In subquery gap_to_last_month add the column gap indicating the time gap between this month and the last month with any orders of the same customer. Using the window function lag() for this. Related:
PostgreSQL window function: partition by comparison
In the outer SELECT aggregate per (mon, gap) to get the counts you are after. In addition, get the total count of distinct customers for this month m0.
Feed this query to crosstab() to pivot the result into the desired tabular form for the result. Basics:
PostgreSQL Crosstab Query
About the "extra" column m0:
Pivot on Multiple Columns using Tablefunc