T-SQL Question - Counting and Average - sql

I have a set of data that consists of a filenbr, open date and close date.
I need to produce a summary table similar to the one below. I need to count how many files belong to each day period, but those greater than 20 days should be grouped together. I know how to get the DATEDIFF; what I'm stumbling on is how to get the 20+ bucket and the % column.
1 day - 30 files - 30%
3 days - 25 files - 25%
10 days - 5 files - 5%
13 days - 20 files - 20%
>= 20 days - 20 files - 20%

Suppose you have a table named DayFile with the following columns:
Table DayFile
days - files
1 - 10
1 - 5
1 - 15
3 - 20
3 - 5
10 - 5
13 - 20
20 - 5
22 - 5
28 - 10
Then you could do the following:
SELECT
    SummaryTable.Day,
    SUM(SummaryTable.files) AS SumFiles,
    -- multiply by 100.0 to avoid integer division
    SUM(SummaryTable.files) * 100.0 / SummaryTable.TotalFiles AS Percentage
FROM
    (SELECT
        CASE WHEN (DF.days >= 20) THEN 20
             ELSE DF.days END AS Day,
        DF.files,
        -- total number of files, not number of rows
        (SELECT SUM(files) FROM DayFile DFCount) AS TotalFiles
     FROM DayFile DF) SummaryTable
GROUP BY SummaryTable.Day, SummaryTable.TotalFiles
EDITED:
SELECT
    SummaryTable.Day,
    SUM(SummaryTable.files) AS SumFiles,
    -- multiply by 100.0 to avoid integer division; divide by total files, not row count
    SUM(SummaryTable.files) * 100.0 / (SELECT SUM(files) FROM DayFile DFCount) AS Percentage
FROM
    (SELECT
        CASE WHEN (DF.days >= 20) THEN 20
             ELSE DF.days END AS Day,
        DF.files
     FROM DayFile DF) SummaryTable
GROUP BY SummaryTable.Day
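The bucketing idea can be checked against the sample DayFile rows. This is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for SQL Server, it assumes the percentage is wanted over the total number of files, and the function name bucket_summary is made up for illustration.

```python
import sqlite3

# Sample rows from the DayFile table above: (days, files)
ROWS = [(1, 10), (1, 5), (1, 15), (3, 20), (3, 5),
        (10, 5), (13, 20), (20, 5), (22, 5), (28, 10)]


def bucket_summary(rows):
    """Return (day_bucket, files, percent) with 20+ days folded into one bucket."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE DayFile (days INTEGER, files INTEGER)")
    con.executemany("INSERT INTO DayFile VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT CASE WHEN days >= 20 THEN 20 ELSE days END AS day_bucket,
               SUM(files) AS sum_files,
               -- 100.0 forces floating-point division
               SUM(files) * 100.0 / (SELECT SUM(files) FROM DayFile) AS pct
        FROM DayFile
        GROUP BY CASE WHEN days >= 20 THEN 20 ELSE days END
        ORDER BY day_bucket
    """).fetchall()
    con.close()
    return result


for day_bucket, files, pct in bucket_summary(ROWS):
    print(day_bucket, files, pct)
```

With the sample rows the buckets hold 30, 25, 5, 20 and 20 files, which matches the summary in the question because the total happens to be 100.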

You are unclear as to how the ranges are determined (e.g. does "3 days" mean < 3 days, <= 3 days, = 3 days, > 3 days or >= 3 days?). If you are using SQL Server 2005 or higher, you can get your results like so:
With PeriodLength As
(
Select DateDiff(d, OpenDate, CloseDate) As DiffDays
From Table
)
, Ranges As
(
Select Case
When DiffDays < 3 Then 'Less than 3 Days'
When DiffDays >= 3 And DiffDays < 10 Then 'Less than 10 Days'
When DiffDays >= 10 And DiffDays < 13 Then 'Less than 13 Days'
When DiffDays >= 13 And DiffDays < 20 Then 'Less than 20 Days'
When DiffDays >= 20 Then '20 Days or More'
End As Range
From PeriodLength
)
Select Range
, Count(*) As FileCount
, Count(*) * 100.000 / (Select Count(*) From Ranges) As Percentage
From Ranges
Group By Range

Related

SQL - In a week get result count of records in that week and count of records ageing 7days from that week

This is redshift SQL
I'm trying to get 2 results for a week:
Total records in that week
Total records ageing greater than 7 days from that week.
Say there are 100 sample records in the format below; in this example there are 7 records per week:
day code week
1/1/2020 P001 1
1/2/2020 P002 1
1/3/2020 P003 1
1/4/2020 P004 1
1/5/2020 P005 2
1/6/2020 P006 2
1/7/2020 P007 2
1/8/2020 P008 2
1/9/2020 P009 2
1/10/2020 P010 2
1/11/2020 P011 2
.....................
4/8/2020 P099 15
Trying to get output like this:
Week count count>7 days
1 7 0
2 7 7
3 7 14
4 7 21
15 7 98
Basically, for the latest week, I'm trying to get the distinct number of records ageing more than 7 days. In the actual use case, the number of records per week will vary.
What I've tried:
select
calendar_week_number,
count(code) as count 1,
count(DISTINCT (case when datediff(day, trunc(completion_date-7), '2020-01-01') then code end)) as count 2,
count(case when completion_date between TO_DATE('20200101','YYYYMMDD') and TO_DATE(completion_date,'YYYYMMDD')-7 then code end) as count 3
from rbsrpt.RBS_DAILY_ASIN_PROC_SNPSHT ul
LEFT JOIN rbsrpt.dim_rbs_time t ON Trunc(ul.completion_date) = trunc(t.cal_date)
where
mp=1
and calendar_year=2020
group by
calendar_week_number
order by calendar_week_number desc
but my output is as below:
week count1 count 2 count 3
51 2866 2866 0
50 3211 3211 0
49 6377 6377 0
48 9013 9013 0
47 5950 5950 0
One option uses lateral joins. It is probably more efficient to aggregate the calendar table by week first, then search the dataset one week at a time.
Assuming Postgres (since there is no TO_DATE() in MySQL):
select t.cal_date, c1.*, c2.*
from (
    select calendar_week_number, min(cal_date) as cal_date
    from rbsrpt.dim_rbs_time
    group by calendar_week_number
) t
cross join lateral (
select count(*) as cnt
from rbsrpt.rbs_daily_asin_proc_snpsht r
where r.completion_date >= t.cal_date
and r.completion_date < t.cal_date + interval '7 day'
) c1
cross join lateral (
select count(*) as cnt_aged
from rbsrpt.rbs_daily_asin_proc_snpsht r
where r.completion_date >= t.cal_date - interval '7' day
and r.completion_date < t.cal_date
) c2
This ages out records after 7 days. If you wanted 30 days instead, you would change the where clause of the second subquery:
cross join lateral (
select count(*) as cnt_aged
from rbsrpt.rbs_daily_asin_proc_snpsht r
where r.completion_date >= t.cal_date - interval '30 day'
and r.completion_date < t.cal_date - interval '23 day'
) c2
Edit: if your database does not support lateral joins, you can use subqueries instead:
select t.cal_date,
    (
        -- records completed during the week starting at cal_date
        select count(*)
        from rbsrpt.rbs_daily_asin_proc_snpsht r
        where r.completion_date >= t.cal_date
            and r.completion_date < t.cal_date + interval '7 day'
    ) as cnt,
    (
        -- records completed during the 7 days before cal_date
        select count(*)
        from rbsrpt.rbs_daily_asin_proc_snpsht r
        where r.completion_date >= t.cal_date - interval '7 day'
            and r.completion_date < t.cal_date
    ) as cnt_aged
from (
    select calendar_week_number, min(cal_date) as cal_date
    from rbsrpt.dim_rbs_time
    group by calendar_week_number
) t
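The correlated-subquery version can be tried on a toy dataset. The following is a rough sketch, assuming Python's built-in sqlite3 module instead of Redshift or Postgres; the table names records and weeks and the function weekly_counts are invented for the example, and SQLite's date() modifiers replace the interval arithmetic.

```python
import sqlite3
from datetime import date, timedelta


def weekly_counts(completion_dates, week_starts):
    """Return (week_start, cnt, cnt_aged) for each week start date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE records (completion_date TEXT)")
    con.execute("CREATE TABLE weeks (cal_date TEXT)")
    # ISO-8601 strings compare in date order, so plain text columns are enough here
    con.executemany("INSERT INTO records VALUES (?)",
                    [(d.isoformat(),) for d in completion_dates])
    con.executemany("INSERT INTO weeks VALUES (?)",
                    [(d.isoformat(),) for d in week_starts])
    rows = con.execute("""
        SELECT w.cal_date,
               (SELECT COUNT(*) FROM records r
                 WHERE r.completion_date >= w.cal_date
                   AND r.completion_date < date(w.cal_date, '+7 days')) AS cnt,
               (SELECT COUNT(*) FROM records r
                 WHERE r.completion_date >= date(w.cal_date, '-7 days')
                   AND r.completion_date < w.cal_date) AS cnt_aged
        FROM weeks w
        ORDER BY w.cal_date
    """).fetchall()
    con.close()
    return rows


# One record per day for three weeks, starting 2020-01-01 (made-up data)
start = date(2020, 1, 1)
records = [start + timedelta(days=i) for i in range(21)]
weeks = [start + timedelta(days=7 * i) for i in range(3)]
for row in weekly_counts(records, weeks):
    print(row)
```

Note that cnt_aged here covers only the 7 days before each week start, as in the queries above; it is not a running total of everything older than the week.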

Redshift: Grouping rows by range and adding to output columns

I have data like this:
Table 1: (lots of items denoted by 1, 2, 3 etc. and with sales date in epochs and the number of sales on the given date as Number. The data only covers the last 12 weeks of sales)
Item | Sales_Date | Number
1 1587633401000 2
1 1587374201000 3
1 1585732601000 4
1 1583054201000 1
1 1582190201000 2
1 1580548601000 3
What I want as the output is a single line per item, with each column showing the total sales for each individual month:
Output:
Item | Month_1_Sales | Month_2_Sales | Month_3_Sales
1 3 3 9
Here the only sale that occurred in Month 1 happened at 1580548601000 (sales = 3), while 1583054201000 (sales = 1) and 1582190201000 (sales = 2) both occur in Month 2, etc.
So I need to split the sales dates into groups by month, sum their sales numbers, and then put these numbers in columns. I am very new to SQL so I don't know where to start. Would anyone be able to help?
You can extract the months from the timestamp using:
select extract(month from (timestamp 'epoch' + sales_date / 1000 * interval '1 second'))
However, I am guessing that you really want 4-week periods, because 12 weeks of data is not 3 complete months. That would make more sense to me. For the calculation, use the difference from the earliest date and then use arithmetic and conditional aggregation:
select item,
       -- period 0 is the first 4 weeks after the earliest sale
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 0
                then number
           end) as month_1_sales,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 1
                then number
           end) as month_2_sales,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 2
                then number
           end) as month_3_sales
from (select t1.*,
             min(sales_date) over () as min_sales_date
      from table1 t1
     ) t1
group by item;
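To see which 4-week bucket each sample row falls into, the same arithmetic can be run locally. This is a sketch only, assuming Python's built-in sqlite3 module rather than Redshift; a cross join on the minimum date replaces the window function, and the name four_week_pivot is hypothetical.

```python
import sqlite3

# Sample rows from Table 1: (item, sales_date in epoch milliseconds, number)
SALES = [
    (1, 1587633401000, 2),
    (1, 1587374201000, 3),
    (1, 1585732601000, 4),
    (1, 1583054201000, 1),
    (1, 1582190201000, 2),
    (1, 1580548601000, 3),
]

PERIOD_MS = 1000 * 60 * 60 * 24 * 7 * 4  # four weeks in milliseconds


def four_week_pivot(rows):
    """Return one row per item with sales summed per 4-week period."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (item INTEGER, sales_date INTEGER, number INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # SQLite integer division truncates, which equals floor for non-negative values
    result = con.execute(f"""
        SELECT item,
               SUM(CASE WHEN (sales_date - m.min_sales_date) / {PERIOD_MS} = 0
                        THEN number ELSE 0 END) AS month_1_sales,
               SUM(CASE WHEN (sales_date - m.min_sales_date) / {PERIOD_MS} = 1
                        THEN number ELSE 0 END) AS month_2_sales,
               SUM(CASE WHEN (sales_date - m.min_sales_date) / {PERIOD_MS} = 2
                        THEN number ELSE 0 END) AS month_3_sales
        FROM table1
        CROSS JOIN (SELECT MIN(sales_date) AS min_sales_date FROM table1) m
        GROUP BY item
        ORDER BY item
    """).fetchall()
    con.close()
    return result


print(four_week_pivot(SALES))
```

On the sample rows this gives 5, 1 and 9 for the three periods, which differs from the 3, 3, 9 in the question because the buckets are 28-day windows counted from the earliest sale.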

Force query to return all ranges even when there is no data

I have a query which calculates 3 sums and classifies them into ranges of payment late days.
However, sometimes there is no data for some of the ranges, so the query returns fewer than 3 rows.
How to force the query to always return 3 rows, even when there are no data for specific DaysLate range? In this case Suma and Docs shall contain zeros.
SELECT MIN(b.Range) Nazwa, SUM(b.Remaining) Suma, COUNT(*) Docs
FROM (
SELECT
CASE
WHEN a.DaysLate BETWEEN 1 AND 14 THEN 'Late 1 to 14 days'
WHEN a.DaysLate BETWEEN 15 AND 30 THEN 'Late 15 to 30 day'
ELSE 'Late over 30 days'
END AS Range,
CASE
WHEN a.DaysLate BETWEEN 1 AND 14 THEN 1
WHEN a.DaysLate BETWEEN 15 AND 30 THEN 2
ELSE 3
END AS Order,
Remaining, DaysLate
FROM (
SELECT DATEDIFF(dd, PaymentDate, GETDATE()) DaysLate, (WN - MA) Remaining
FROM dbo.MyData
WHERE DATEDIFF(dd, PaymentDate, GETDATE()) > 0 AND KOD_ID = @KOD_ID
) a
) b
GROUP BY b.Order
ORDER BY b.Order ASC
You need a LEFT JOIN, but you can also simplify the query using APPLY. The filters go in the ON clause rather than in WHERE, so that ranges without matching rows are not discarded:
SELECT ranges.range, COALESCE(SUM(v.Remaining), 0) as Suma,
       COUNT(v.DaysLate) as Docs  -- counts only matched rows, so empty ranges get 0
FROM (VALUES ('Late 1 to 14 days', 1),
             ('Late 15 to 30 day', 2),
             ('Late over 30 days', 3)
     ) ranges(range, ord) LEFT JOIN
     (dbo.MyData d CROSS APPLY
      (VALUES (DATEDIFF(day, d.PaymentDate, GETDATE()),
               d.WN - d.MA
              )
      ) v(DaysLate, Remaining) CROSS APPLY
      (VALUES (CASE WHEN v.DaysLate BETWEEN 1 AND 14
                    THEN 'Late 1 to 14 days'
                    WHEN v.DaysLate BETWEEN 15 AND 30
                    THEN 'Late 15 to 30 day'
                    ELSE 'Late over 30 days'
               END)
      ) v1(range)
     )
     ON v1.range = ranges.range AND
        v.DaysLate > 0 AND
        d.KOD_ID = @KOD_ID
GROUP BY ranges.range, ranges.ord
ORDER BY ranges.ord;
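The effect of left joining from a fixed list of ranges can be demonstrated on a small in-memory table. This is a simplified sketch, not the query above: it assumes Python's built-in sqlite3 module, a precomputed DaysLate column, and a variant in which the range bounds (lo, hi) are stored alongside the labels; late_summary is a made-up name.

```python
import sqlite3


def late_summary(rows):
    """rows: iterable of (days_late, remaining). Always returns three ranges."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyData (DaysLate INTEGER, Remaining REAL)")
    con.executemany("INSERT INTO MyData VALUES (?, ?)", rows)
    result = con.execute("""
        WITH ranges(label, ord, lo, hi) AS (
            VALUES ('Late 1 to 14 days', 1, 1, 14),
                   ('Late 15 to 30 day', 2, 15, 30),
                   ('Late over 30 days', 3, 31, NULL)
        )
        SELECT r.label,
               COALESCE(SUM(d.Remaining), 0) AS Suma,  -- 0 instead of NULL
               COUNT(d.DaysLate) AS Docs               -- counts matched rows only
        FROM ranges r
        LEFT JOIN MyData d
               ON d.DaysLate >= r.lo
              AND (r.hi IS NULL OR d.DaysLate <= r.hi)
        GROUP BY r.label, r.ord
        ORDER BY r.ord
    """).fetchall()
    con.close()
    return result


# No document falls in the 15 to 30 day range, yet the row is still returned
print(late_summary([(3, 10.0), (40, 2.5), (45, 1.5)]))
```

Because the join conditions live in the ON clause, the middle range survives with a sum and a count of zero instead of disappearing.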

Counting the number of days excluding sunday between two dates

I am trying to calculate the number of days between two dates, excluding Sundays. This is my query:
SELECT F_PLANHM_END_DT
- F_PLANHM_ST_DT
- 2
* (TO_CHAR (F_PLANHM_END_DT, 'WW') - TO_CHAR (F_PLANHM_ST_DT, 'WW'))
FROM VW_S_CURV_PROC
WHERE HEAD_MARK = 'IGG-BLH-BM 221';
SELECT COUNT (*)
FROM (SELECT SYSDATE + l trans_date
FROM ( SELECT LEVEL - 1 l
FROM VW_S_CURV_PROC
CONNECT BY LEVEL <= ( (SYSDATE + 7) - SYSDATE)))
WHERE TO_CHAR (trans_date, 'dy') NOT IN ('sun');
I am retrieving the dates from a view called VW_S_CURV_PROC, with start date F_PLANHM_ST_DT and end date F_PLANHM_END_DT. Somehow I can't make this work. Please help me...
You could use the ROW GENERATOR technique to first generate the dates for a given range, and then exclude the SUNDAYs.
For example, this query will give me the total count of days between 1st Jan 2014 and 31st Dec 2014, excluding the Sundays -
WITH data AS
  (SELECT to_date('01/01/2014', 'DD/MM/YYYY') date1,
          to_date('31/12/2014', 'DD/MM/YYYY') date2
   FROM dual
  )
SELECT SUM(holiday) holiday_count
FROM
  (SELECT
     CASE
       WHEN TO_CHAR(date1+LEVEL-1, 'DY','NLS_DATE_LANGUAGE=AMERICAN') <> 'SUN'
       THEN 1
       ELSE 0
     END holiday
   FROM data
   CONNECT BY LEVEL <= date2-date1+1
  );

HOLIDAY_COUNT
-------------
313
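The same count can be cross-checked outside the database. A minimal sketch in Python, mirroring the row-generator idea by walking every day in the range; the function name days_excluding_sundays is made up for illustration.

```python
from datetime import date, timedelta


def days_excluding_sundays(start, end):
    """Count the days from start to end (inclusive), skipping Sundays."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() != 6  # 6 is Sunday
    )


print(days_excluding_sundays(date(2014, 1, 1), date(2014, 12, 31)))
```

For 2014 this prints 313, the same figure the Oracle query returns: 365 days minus 52 Sundays.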

Extract quarters from time

Is it possible to break an hour down into quarters and extract them?
For example:
7:00-7:15 = 1
7:15-7:30 = 2
etc
I have time column with values such as 09:30
and I want to extract:
hour = 6
quarter = 2
I can take out the hour, but how do I take out the quarter.
Use the datepart function:
select
case
when TimeColumn is null then null
when datepart(minute, TimeColumn) < 15 then 1
when datepart(minute, TimeColumn) < 30 then 2
when datepart(minute, TimeColumn) < 45 then 3
else 4
end
from MyTable
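The CASE expression is equivalent to integer-dividing the minute by 15 and adding 1. A small sketch of that arithmetic in Python, assuming the time arrives as an "HH:MM" string; hour_and_quarter is a hypothetical helper name.

```python
def hour_and_quarter(hhmm):
    """Split an 'HH:MM' string into (hour, quarter), with quarter in 1..4."""
    hour, minute = (int(part) for part in hhmm.split(":"))
    # minutes 0-14 -> 1, 15-29 -> 2, 30-44 -> 3, 45-59 -> 4
    return hour, minute // 15 + 1


for value in ("07:00", "07:15", "09:30", "09:59"):
    print(value, hour_and_quarter(value))
```

In T-SQL the same shortcut would be datepart(minute, TimeColumn) / 15 + 1, since dividing two integers there also truncates.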