Teradata get row counts for previous two days and compare - sql

I'm trying to setup a data check, where we get the row count from a table for today and prior date. Since it isn't loaded on weekends or holidays, I can't say DATE-1.
I came-up with the following, to get the previous date:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) RW_COUNT
,ROW_NUMBER() OVER (ORDER BY LOAD_DATE ) AS LOAD_ROWNUM
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
This produces the dates, counts and assigns a row number.
LOAD_DATE RW_COUNT LOAD_ROWNUM
2019-10-16 8259 1
2019-10-15 8253 2
2019-10-11 8256 3
2019-10-10 8243 4
I to take the two most current dates and compare them. Most current would be "current" and the 2nd most current would be "prior" . Then I would like to have something like this as the result set:
CURRENT_COUNT PRIOR_COUNT DIFF_PERCENT
8259 8253 .9927
My issue is, how do I reference the first two rows and compare them to each other? Unless I'm over-thinking this, I need two additional SELECT statements: 1 with the WHERE clause referencing row 1 and another with a WHERE referercing row 2.
How do I do that? Do I have two CTEs?
Eventually, I'll need a third SELECT dividing the two rows and checking for 10% tolerance. Help, I'm in analysis paralysis.

You can filter the result of an OLAP-function using QUALIFY:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) AS CURRENT_COUNT
-- previous day's count
,LEAD(RW_COUNT)
OVER (ORDER BY LOAD_DATE DESC) AS PRIOR_COUNT
-- if your TD version doesn't support LAG/LEAD (i.e. < 16.10)
--,MIN(RW_COUNT)
-- OVER (ORDER BY LOAD_DATE DESC
-- ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS PRIOR_COUNT
,CAST(CURRENT_COUNT AS DECIMAL(18,4)) / PRIOR_COUNT AS DIFF_PERCENT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
-- return the latest row only
QUALIFY ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) = 1
checking for 10% tolerance:
DIFF_PERCENT BETWEEN 0.9 and 1.1
Either ANDed to the QUALIFY or within a CASE

I don't know what you want for your result set. But you can use LAG() with aggregation to get the previous value.
SELECT LOAD_DATE, COUNT(*) as RW_COUNT,
LAG(COUNT(*)) OVER (ORDER BY LOAD_DATE) as PREV_RW_COUNT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1;
You may just want a difference of the two counts.

If your TD version (16.0+?) doesn't support LEAD/LAG, give this a try:
SELECT
load_date,
RW_COUNT,
MAX(RW_COUNT) OVER(
ORDER BY load_date DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING -- Get previous row's value
) AS RW_COUNT_prev
FROM (
SELECT load_date, COUNT(LOAD_DATE) RW_COUNT,
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
) src

Related

PostgreSQL subquery - calculating average of lagged values

I am looking at Sales Rates by month, and was able to query the 1st table. I am quite new to PostgreSQL and am trying to figure out how I can query the second (I had to do the 2nd one in Excel for now)
I have the current Sales Rate and I would like to compare it to the Sales Rate 1 and 2 months ago, as an averaged rate.
I am not asking for an answer how exactly to solve it because this is not the point of getting better, but just for hints for functions to use that are specific to PostgreSQL. What I am trying to calculate is the 2 month average in the 2nd table based on the lagged values of the 2nd table. Thanks!
Here is the query for the 1st table:
with t1 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_1M_before
from data
where date between '2019-07-01' and '2019-11-30'
group by 1),
t2 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_2M_before
from data
where date between '2019-07-01' and '2019-10-31'
group by 1)
select t0.date,
count(t0.sales)::numeric/count(t0.poss_sales) as Sales_Rate
t1.SR_1M_before,
t2.SR_2M_before
from data as t0
left join t1 on t0.date=t1.date
left join t2 on t0.date=t1.date
where date between '2019-07-01' and '2019-12-31'
group by 1,3,4
order by 1;
As commented by a_horse_with_no_name, you can use window functions to take the average of the two previous monthes with a range clause:
select
date,
count(sales)::numeric/count(poss_sales) as Sales_Rate,
avg(count(sales)::numeric/count(poss_sales)) over(
order by date
rows between '2 month' preceding and '1 month' preceding
) Sales_Rate,
count(sales)::numeric/count(poss_sales) as Sales_Rate
- avg(count(sales)::numeric/count(poss_sales)) over(
order by date
rows between '2 month' preceding and '1 month' preceding
) PercentDeviation
from data
where date between '2019-07-01' and '2019-12-31'
group by date
order by date;
Your data is a bit confusing -- it would be less confusing if you had decimal places (that is, 58% being the average of 57% and 58% is not obvious).
Because you want to have NULL values on the first two rows, I'm going to calculate the values using sum() and count():
with q as (
<whatever generates the data you have shown>
)
select q.*,
(sum(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
) /
nullif(count(*) over (order by date
rows between 2 preceding and 1 preceding
)
) as two_month_average
from q;
You could also express this using case and avg():
select q.*,
(case when row_number() over (order by date) > 2)
then avg(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
)
end) as two_month_average
from q;

Delete duplicated rows

I ve got duplicated rows in a temp table mainly because there are some date values which are seconds/miliseconds different to each other.
For example:
2018-08-30 12:30:19.000
2018-08-30 12:30:20.000
This is what causes the duplication.
How can I keep only one of those values? Let s say the higher one?
Thank you.
Well, one method is to use lead():
select t.*
from (select t.*, lead(ts) over (order by ts) as next_ts
from t
) t
where next_ts is null or
datediff(second, ts, next_ts) < 60; -- or whatever threshold you want
You could assign a Row_Number to each value, as follows:
Select *
, Row_Number() over
(partition by ObjectID, cast(date as date)... ---whichever criteria you want to consider duplicates
order by date desc) --assign the latest date to row 1, may want other order criteria if you might have ties on this field
as RN
from MyTable
Then retain only the rows where RN = 1 to remove duplicates. See this answer for examples of how to round your dates to the nearest hour, minute, etc. as needed; I used truncating to the day above as an example.

How to take only one entry from a table based on an offset to a date column value

I have a requirement to get values from a table based on an offset conditions on a date column.
Say for eg: for the below attached table, if there is any dates that comes close within 15 days based on effectivedate column I should return only the first one.
So my expected result would be as below:
Here for A1234 policy, it returns 6/18/16 entry and skipped 6/12/16 entry as the offset between these 2 dates is within 15 days and I took the latest one from the list.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
sum(case when prev_ed >= dateadd(day, -15, effectivedate)
then 1 else 0
end) over (partition by polno order by effectivedate) as grp
from (select t.*,
lag(expirationdate) over (partition by polno order by effectivedate) as prev_ed
from t
) t
) t
group by polno, grp;

Smoothing out a result set by date

Using SQL I need to return a smooth set of results (i.e. one per day) from a dataset that contains 0-N records per day.
The result per day should be the most recent previous value even if that is not from the same day. For example:
Starting data:
Date: Time: Value
19/3/2014 10:01 5
19/3/2014 11:08 3
19/3/2014 17:19 6
20/3/2014 09:11 4
22/3/2014 14:01 5
Required output:
Date: Value
19/3/2014 6
20/3/2014 4
21/3/2014 4
22/3/2014 5
First you need to complete the date range and fill in the missing dates (21/3/2014 in you example). This can be done by either joining a calendar table if you have one, or by using a recursive common table expression to generate the complete sequence on the fly.
When you have the complete sequence of dates finding the max value for the date, or from the latest previous non-null row becomes easy. In this query I use a correlated subquery to do it.
with cte as (
select min(date) date, max(date) max_date from your_table
union all
select dateadd(day, 1, date) date, max_date
from cte
where date < max_date
)
select
c.date,
(
select top 1 max(value) from your_table
where date <= c.date group by date order by date desc
) value
from cte c
order by c.date;
May be this works but try and let me know
select date, value from test where (time,date) in (select max(time),date from test group by date);

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
in an SQL SELECT statement I am trying to get the last day with data per month for the last year.
Example, I am easily able to get the last day of each month and join that to my data table, but the problem is, if the last day of the month does not have data, then there is no returned data. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have the functions in Oracle, then you do some sort of manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd,
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)