Moving average on top of another window function in Redshift - sql

I am stuck on a problem and need some help.
I have a table like this:
created_time_id | txn_src
1-1-2017 | A
1-1-2017 | A
1-1-2017 | B
1-1-2017 | A
1-1-2017 | C
2-1-2017 | A
2-1-2017 | C
2-1-2017 | B
2-1-2017 | A
3-1-2017 | A
3-1-2017 | A
3-1-2017 | C
In Redshift, I have to create a moving average column for the table above, along with the per-source count partitioned by date.
Currently I have written the query below:
select
txn_src,
created_time_id::char(8)::date as "time",
count_payment
from
(
select
txn_src,
created_time_id,
count(1) as count_payment,
row_number() over (partition by created_time_id
order by
count(1) desc) as seqnum
from
my_table
where
created_time_id >= '1-1-2017' and txn_src is not null
group by
1,
2
) x
where
seqnum <= 10
order by
"time" ,
count_payment desc
This gives me the correct output, like:
1-1-2017 | A | 3
1-1-2017 | B | 1
and so on
I need a moving average like this:
time | src | cnt | mvng_avg
1-1-2017 | A | 3 | 3
1-1-2017 | B | 1 | 1
1-1-2017 | C | 1 | 1
2-1-2017 | A | 2 | 2.5
and so on.
Can anybody suggest a good solution for this?

After some struggle, I was able to resolve this using the query below.
with txn_source_by_date as (
select
txn_source ,
created_time_id,
count(1) as count_payment,
row_number() over (partition by created_time_id
order by
count(1) desc) as seqnum
from
my_table
where
created_time_id >= 20220801
and txn_source is not null
group by
1,
2
)
select
txn_source,
created_time_id::char(8)::date as "time",
count_payment,
avg(count_payment) over (partition by txn_source
order by
created_time_id rows between 29 preceding and current row ) mvng_avg
from
txn_source_by_date
group by
txn_source,
created_time_id,
count_payment
order by
"time",
txn_source
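The approach above (aggregate to one row per date and source first, then average over a row-based window per source) can be tried on the question's sample data. This is only a minimal sketch, not the Redshift query itself: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions), stores the dates as ISO strings, and uses the txn_src column name from the sample table.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption)
ROWS = [
    ("2017-01-01", "A"), ("2017-01-01", "A"), ("2017-01-01", "B"),
    ("2017-01-01", "A"), ("2017-01-01", "C"),
    ("2017-01-02", "A"), ("2017-01-02", "C"), ("2017-01-02", "B"),
    ("2017-01-02", "A"),
    ("2017-01-03", "A"), ("2017-01-03", "A"), ("2017-01-03", "C"),
]

MOVING_AVG_SQL = """
WITH daily AS (
    -- one row per (date, source) with its payment count
    SELECT created_time_id, txn_src, COUNT(*) AS count_payment
    FROM my_table
    GROUP BY created_time_id, txn_src
)
SELECT created_time_id, txn_src, count_payment,
       -- average of the current row and up to 29 earlier rows of the same source
       AVG(count_payment) OVER (
           PARTITION BY txn_src
           ORDER BY created_time_id
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
       ) AS mvng_avg
FROM daily
ORDER BY created_time_id, txn_src
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (created_time_id TEXT, txn_src TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
moving_avg_rows = conn.execute(MOVING_AVG_SQL).fetchall()

for row in moving_avg_rows:
    print(row)
```

Note that the frame counts rows, not calendar days: if a source has no transactions on some date, the 30-row window reaches back further than 30 days for that source.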

Related

How to select multiple columns, sum one column and group by multiple columns

I have a table named example, with columns user_id, date_start, and activity.
I need to select the user_id and date_start columns, count the unique user_id values, and then group by user_id and date_start.
Table Data:
----------------------------------
| user_id | date_start | activity |
|---------|------------|-----------|
| 1 |2021-04-01 | CATIA |
| 1 |2021-04-05 | CATIA |
| 1 |2021-04-02 | CATIA |
| 1 |2021-05-01 | CATIA |
| 1 |2021-05-02 | CATIA |
| 3 |2021-05-02 | CATIA |
| 3 |2021-05-03 | CATIA |
| 4 |2021-05-05 | CATIA |
----------------------------------
This Query:
SELECT FORMAT(d.date_start, 'yyyy-MM'), d.user_id
from (select d.user_id, date_start,
count(*) over (partition by user_id) as cnt,
row_number() over (partition by FORMAT(date_start, 'yyyy-MM') order by FORMAT(date_start, 'yyyy-MM') desc) as seqnum
from planner d
) d
where seqnum = 1;
I need the output to look like this:
---------------------
| date_start | total |
|------------|--------|
| 2021-04 | 1 |
| 2021-05 | 3 |
---------------------
Are you looking for this?
select FORMAT(d.date_start, 'yyyy-MM') date_start
, count(distinct user_id) total
from planner d
group by FORMAT(date_start, 'yyyy-MM')
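The grouped count above can be checked against the sample data. This is a rough sketch under assumptions: it runs on Python's built-in sqlite3 module rather than SQL Server, so SQLite's strftime('%Y-%m', ...) stands in for FORMAT(date_start, 'yyyy-MM'), and the dates are stored as ISO strings.

```python
import sqlite3

# Sample rows from the question's table
PLANNER_ROWS = [
    (1, "2021-04-01", "CATIA"), (1, "2021-04-05", "CATIA"),
    (1, "2021-04-02", "CATIA"), (1, "2021-05-01", "CATIA"),
    (1, "2021-05-02", "CATIA"), (3, "2021-05-02", "CATIA"),
    (3, "2021-05-03", "CATIA"), (4, "2021-05-05", "CATIA"),
]

MONTHLY_SQL = """
SELECT strftime('%Y-%m', date_start) AS month,  -- stand-in for FORMAT(..., 'yyyy-MM')
       COUNT(DISTINCT user_id) AS total         -- each user counted once per month
FROM planner
GROUP BY strftime('%Y-%m', date_start)
ORDER BY month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planner (user_id INTEGER, date_start TEXT, activity TEXT)")
conn.executemany("INSERT INTO planner VALUES (?, ?, ?)", PLANNER_ROWS)
monthly_totals = conn.execute(MONTHLY_SQL).fetchall()

for month, total in monthly_totals:
    print(month, total)
```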

Find the Start and End Number

I am looking for the correct window function to use for my SQL problem.
I have the following table and I need to find the start and end numbers of the continuous ranges.
Logs table:
+------------+
| log_id |
+------------+
| 1 |
| 2 |
| 3 |
| 7 |
| 8 |
| 10 |
+------------+
Expected Result:
+------------+--------------+
| start_id | end_id |
+------------+--------------+
| 1 | 3 |
| 7 | 8 |
| 10 | 10 |
+------------+--------------+
The idea is just to subtract an increasing value and then aggregate:
select min(log_id), max(log_id)
from (
select t.*, row_number() over (order by log_id) as seqnum
from t
) t
group by (log_id - seqnum)
order by min(log_id);
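To see why subtracting an increasing value works: within a run of consecutive ids, log_id and the row number both grow by 1, so their difference stays constant, and it jumps at every gap. The sketch below prints that intermediate difference and then the aggregated ranges. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and a table named logs as in the question.

```python
import sqlite3

LOG_IDS = [1, 2, 3, 7, 8, 10]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (log_id INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?)", [(i,) for i in LOG_IDS])

# Intermediate step: the difference is the same for every id in one run
labelled = conn.execute("""
    SELECT log_id,
           ROW_NUMBER() OVER (ORDER BY log_id) AS seqnum,
           log_id - ROW_NUMBER() OVER (ORDER BY log_id) AS grp
    FROM logs
    ORDER BY log_id
""").fetchall()

# Final step: group by that difference and take min/max per group
ranges = conn.execute("""
    SELECT MIN(log_id) AS start_id, MAX(log_id) AS end_id
    FROM (
        SELECT log_id,
               log_id - ROW_NUMBER() OVER (ORDER BY log_id) AS grp
        FROM logs
    ) AS t
    GROUP BY grp
    ORDER BY start_id
""").fetchall()

print(labelled)
print(ranges)
```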
You can do this using row_number(); try the following.
select
min(log_id) as start_id,
max(log_id) as end_id
from
(
select
log_id,
log_id - row_number() over (order by log_id) as rnk
from logs
) t
group by
rnk
You can also use a CTE:
with cte as (
select
log_id,
log_id - row_number() over (order by log_id) as diff
from logs
)
select
min(log_id) as start_id,
max(log_id) as end_id
from cte
group by diff
order by start_id

SELECT based on multiple fields in MS-SQL

I have a table with 4 columns:
AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType
There are multiple records for each AcctNumb, with the date that each record was recorded.
What I want to do is grab the most recent date, consumption reading, and reading type for each account.
I have tried using MAX(PeriodEndingDate) and GROUP BY AcctNumb, but I would need to aggregate all the other values, and none of the aggregate functions help me for the WaterConsumption, etc.
Can anyone point me in the right direction?
Thanks
EDIT
Here is a sample table
+----------+------------------+------------------+-------------+
| AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType |
+----------+------------------+------------------+-------------+
| 1000 | 2018-03-31 | 122230 | A |
| 1001 | 2018-03-31 | 24850 | A |
| 1002 | 2018-03-31 | 88540 | A |
| 1000 | 2017-12-31 | 123800 | A |
| 1001 | 2017-12-31 | 3000 | E |
+----------+------------------+------------------+-------------+
The ReadingType is whether it's an actual (A) reading, or an estimate (E).
Try this
SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType
FROM (SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType,
ROW_NUMBER() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS MostrecentRecord
FROM <TableName>) dt
WHERE MostrecentRecord= 1
This can be done using ROW_NUMBER. It has been asked and answered thousands of times, but the query is easier to write than to find a duplicate.
select *
from
(
select *
, RowNum = ROW_NUMBER() over(partition by AcctNumb order by PeriodEndingDate desc)
from YourTable
) x
where x.RowNum = 1
SELECT DQ.* FROM
(SELECT *,
Row_Number() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS RN
FROM YourTable
) AS DQ
WHERE DQ.RN = 1
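The sort direction inside ROW_NUMBER() decides which row gets number 1: descending picks the most recent reading per account, ascending picks the oldest. Here is a small sketch on the question's sample data, assuming Python's built-in sqlite3 module (SQLite 3.25 or later), ISO date strings, and a hypothetical table name readings; the helper first_per_account is also made up for illustration.

```python
import sqlite3

# Sample rows from the question
READINGS = [
    (1000, "2018-03-31", 122230, "A"),
    (1001, "2018-03-31", 24850, "A"),
    (1002, "2018-03-31", 88540, "A"),
    (1000, "2017-12-31", 123800, "A"),
    (1001, "2017-12-31", 3000, "E"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (AcctNumb INTEGER, PeriodEndingDate TEXT, "
    "WaterConsumption INTEGER, ReadingType TEXT)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", READINGS)


def first_per_account(direction):
    """Return the row numbered 1 per account for the given sort direction."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    sql = f"""
        SELECT AcctNumb, PeriodEndingDate, WaterConsumption, ReadingType
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY AcctNumb
                       ORDER BY PeriodEndingDate {direction}
                   ) AS rn
            FROM readings
        ) AS dt
        WHERE rn = 1
        ORDER BY AcctNumb
    """
    return conn.execute(sql).fetchall()


latest = first_per_account("DESC")  # most recent reading per account
oldest = first_per_account("ASC")   # oldest reading per account
print(latest)
print(oldest)
```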

Select top 1 Student Fee From List In SQL Server

In my SQL Server table, I have this data:
+------+-----+------------+
| Name | Fee | Date_Time |
+------+-----+------------+
| AA | 50 | 2018-03-27 |
| AA | 30 | 2018-04-10 |
| BB | 40 | 2018-01-10 |
| BB | 10 | 2018-04-10 |
| CC | 10 | 2018-04-10 |
| DD | 10 | 2018-04-10 |
+------+-----+------------+
How can I get data using SQL query like TOP 1 for (AA, BB, CC, DD) ORDER BY Date_Time DESC into a list?
+------+-----+------------+
| Name | Fee | Date_Time |
+------+-----+------------+
| AA | 30 | 2018-04-10 |
| BB | 10 | 2018-04-10 |
| CC | 10 | 2018-04-10 |
| DD | 10 | 2018-04-10 |
+------+-----+------------+
Use the row_number() function to get the most recent Fee for each Name:
select top(1) with ties Name, Fee, Date_Time
from [TABLE_NAME] t
order by row_number() over (partition by Name order by Date_Time desc)
Another approach can be
SELECT Name,Fee,Date_Time FROM
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY DATE_TIME DESC) RN
FROM [TABLE_NAME]
) T
WHERE RN=1
If a name has multiple entries on the same latest date and you want all of them to appear, you can use DENSE_RANK() instead of ROW_NUMBER(), as follows.
SELECT Name,Fee,Date_Time FROM
(
SELECT *, DENSE_RANK() OVER(PARTITION BY NAME ORDER BY DATE_TIME DESC) RN
FROM [TABLE_NAME]
) T
WHERE RN=1
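The difference between the two functions only shows up when there is a tie on the latest date. The sketch below adds one hypothetical extra row (AA, 35, 2018-04-10) to the question's data to create such a tie. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and a made-up table name fees; top_rows is an illustrative helper, not part of any answer.

```python
import sqlite3

# Rows from the question plus one hypothetical tie for AA on the latest date
FEES = [
    ("AA", 50, "2018-03-27"), ("AA", 30, "2018-04-10"),
    ("BB", 40, "2018-01-10"), ("BB", 10, "2018-04-10"),
    ("CC", 10, "2018-04-10"), ("DD", 10, "2018-04-10"),
    ("AA", 35, "2018-04-10"),  # hypothetical extra row, not in the question
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fees (Name TEXT, Fee INTEGER, Date_Time TEXT)")
conn.executemany("INSERT INTO fees VALUES (?, ?, ?)", FEES)


def top_rows(ranking_function):
    """Keep rows ranked 1 per Name using ROW_NUMBER or DENSE_RANK."""
    if ranking_function not in ("ROW_NUMBER", "DENSE_RANK"):
        raise ValueError("unsupported ranking function")
    sql = f"""
        SELECT Name, Fee, Date_Time
        FROM (
            SELECT *,
                   {ranking_function}() OVER (
                       PARTITION BY Name ORDER BY Date_Time DESC
                   ) AS rn
            FROM fees
        ) AS t
        WHERE rn = 1
        ORDER BY Name, Fee
    """
    return conn.execute(sql).fetchall()


row_number_rows = top_rows("ROW_NUMBER")  # exactly one row per name
dense_rank_rows = top_rows("DENSE_RANK")  # all rows tied on the latest date
print(row_number_rows)
print(dense_rank_rows)
```

With ROW_NUMBER(), which of the two tied AA rows is kept is arbitrary unless the ORDER BY is extended with a tie-breaker.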
Assign a row_number partitioned by Name and ordered by Date_Time descending, and then select the rows whose row_number is 1.
Query
;with cte as (
select [rn] = row_number() over(
partition by [Name]
order by [Date_Time] desc
), *
from [your_table_name]
)
select [Name], [Fee], [Date_Time]
from cte
where [rn] = 1;

Partition By over Two Columns in Row_Number function

I am trying to RANK the records using the following query:
SELECT
ROW_NUMBER() over (partition by
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate
order by TW.EMPL_ID,TW.Effective_Bdate) RN,
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from
TT_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'
However the resultant Row_Number computed column only displays partition for the first column. Ideally I expected to have the same value for Row_Number where the partition by column data is identical.
Any clue where I might be going wrong?
Using RANK or DENSE_RANK isn't an option, as I want to identify all such rows for multiple employees where EMPL_ID, HR_DEPT_ID and Transfer_StartDate are the same (RN=1).
Sample data:
RN | AON_EMPL_ID | HR_DEPT_ID | Transfer_Startdate | Effective_BDate
1 | 0100690 | 69895 | 01/01/2017 | 2017-01-01
2 | 0100690 | 69895 | 01/01/2017 | 2017-01-03
3 | 0100690 | 69895 | 01/01/2017 | 2017-01-04
expanding sample data to:
create table t (
aon_empl_id varchar(16)
, hr_dept_id varchar(16)
, Transfer_Startdate date
, Effective_bdate date
);
insert into t values
('0100690','69895','01/01/2017','2017-01-01')
,('0100690','69895','01/01/2017','2017-01-03')
,('0100690','69895','01/01/2017','2017-01-04')
,('0200700','69895','01/01/2016','2016-01-01')
,('0200700','69895','01/01/2016','2016-01-03')
,('0200700','69896','01/01/2017','2017-01-04')
,('0200700','69896','01/01/2017','2017-01-04');
using top with ties
select top 1 with ties
aon_empl_id
, hr_dept_id
, Transfer_Startdate = convert(char(10),Transfer_Startdate,120)
, Effective_bdate = convert(char(10),Effective_bdate,120)
from t
order by row_number() over (
partition by aon_empl_id, hr_dept_id, Transfer_Startdate
order by Effective_bdate
)
rextester demo: http://rextester.com/KOIZ42069
returns:
+-------------+------------+--------------------+-----------------+
| aon_empl_id | hr_dept_id | Transfer_Startdate | Effective_bdate |
+-------------+------------+--------------------+-----------------+
| 0100690 | 69895 | 2017-01-01 | 2017-01-01 |
| 0200700 | 69895 | 2016-01-01 | 2016-01-01 |
| 0200700 | 69896 | 2017-01-01 | 2017-01-04 |
+-------------+------------+--------------------+-----------------+
Alternative using a common table expression with row_number():
;with cte as (
select
rn = row_number() over (
partition by aon_empl_id, hr_dept_id, Transfer_Startdate
order by Effective_bdate
)
, aon_empl_id
, hr_dept_id
, Transfer_Startdate = convert(char(10),Transfer_Startdate,120)
, Effective_bdate = convert(char(10),Effective_bdate,120)
from t tw
)
select *
from cte
where rn = 1
returns:
+----+-------------+------------+--------------------+-----------------+
| rn | aon_empl_id | hr_dept_id | Transfer_Startdate | Effective_bdate |
+----+-------------+------------+--------------------+-----------------+
| 1 | 0100690 | 69895 | 2017-01-01 | 2017-01-01 |
| 1 | 0200700 | 69895 | 2016-01-01 | 2016-01-01 |
| 1 | 0200700 | 69896 | 2017-01-01 | 2017-01-04 |
+----+-------------+------------+--------------------+-----------------+
SELECT
RANK() over (partition by --or DENSE_RANK()
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate
order by TW.EMPL_ID,TW.Effective_Bdate) RN,
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from
TT_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'
UPDATE
SELECT
RANK() over (partition by --or DENSE_RANK()
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate
order by TW.EMPL_ID) RN,
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from
TT_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'
Order by RN,TW.Effective_Bdate
This bit of code appears to be working:
SELECT
dense_rank() over (partition by AON_EMPL_ID
order by AON_EMPL_ID,HR_DEPT_ID,Transfer_StartDate) RN,
TW.AON_EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from
TT_AON_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'
Apparently, I just need to partition by AON_EMPL_ID, and everything else should go in the ORDER BY clause.
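The two behaviours can be compared side by side on the expanded sample data from the earlier answer. ROW_NUMBER() partitioned by all three columns restarts at 1 inside each group and counts up, while DENSE_RANK() partitioned by employee and ordered by the other two columns gives every row of the same group the same value. This is a sketch under assumptions: Python's built-in sqlite3 module (SQLite 3.25 or later), ISO date strings, and the table name t from that answer.

```python
import sqlite3

# Expanded sample data from the answer above, dates as ISO strings
EMPLOYEE_ROWS = [
    ("0100690", "69895", "2017-01-01", "2017-01-01"),
    ("0100690", "69895", "2017-01-01", "2017-01-03"),
    ("0100690", "69895", "2017-01-01", "2017-01-04"),
    ("0200700", "69895", "2016-01-01", "2016-01-01"),
    ("0200700", "69895", "2016-01-01", "2016-01-03"),
    ("0200700", "69896", "2017-01-01", "2017-01-04"),
    ("0200700", "69896", "2017-01-01", "2017-01-04"),
]

COMPARE_SQL = """
SELECT aon_empl_id, hr_dept_id, transfer_startdate, effective_bdate,
       -- restarts at 1 for every (employee, dept, transfer date) group
       ROW_NUMBER() OVER (
           PARTITION BY aon_empl_id, hr_dept_id, transfer_startdate
           ORDER BY effective_bdate
       ) AS rn,
       -- same value for every row that shares dept and transfer date
       DENSE_RANK() OVER (
           PARTITION BY aon_empl_id
           ORDER BY hr_dept_id, transfer_startdate
       ) AS grp
FROM t
ORDER BY aon_empl_id, hr_dept_id, transfer_startdate, effective_bdate, rn
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (aon_empl_id TEXT, hr_dept_id TEXT, "
    "transfer_startdate TEXT, effective_bdate TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", EMPLOYEE_ROWS)
rows = conn.execute(COMPARE_SQL).fetchall()

row_numbers = [r[4] for r in rows]
group_ranks = [r[5] for r in rows]
print(row_numbers)
print(group_ranks)
```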