Redshift: Grouping rows by range and adding to output columns

I have data like this:
Table 1: lots of items (denoted by 1, 2, 3, etc.), each with a sales date in epoch milliseconds and the number of sales on that date as Number. The data only covers the last 12 weeks of sales.
Item | Sales_Date    | Number
1    | 1587633401000 | 2
1    | 1587374201000 | 3
1    | 1585732601000 | 4
1    | 1583054201000 | 1
1    | 1582190201000 | 2
1    | 1580548601000 | 3
What I want as the output is a single line per item, with each column showing the total sales for each individual month:
Output:
Item | Month_1_Sales | Month_2_Sales | Month_3_Sales
1    | 3             | 3             | 9
The only Month 1 sale occurred at 1580548601000 (sales = 3), while 1583054201000 (sales = 1) and 1582190201000 (sales = 2) both fall in Month 2, and so on.
So I need to split the sales dates into groups by month, sum the sales numbers within each group, and then place those sums in columns. I am very new to SQL so I don't know where to start. Would anyone be able to help?

You can extract the months from the timestamp using:
select extract(month from (timestamp 'epoch' + sales_date / 1000 * interval '1 second'))
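For the sample epoch 1580548601000, for example, this yields 2, since that timestamp falls on 1 February 2020 (UTC).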
However, I am guessing that you really want 4-week periods, because 12 weeks of data is not 3 complete months. That would make more sense to me. For the calculation, use the difference from the earliest date and then use arithmetic and conditional aggregation:
select item,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 0
                then number
           end) as month_1_sales,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 1
                then number
           end) as month_2_sales,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 2
                then number
           end) as month_3_sales
from (select t1.*,
             min(sales_date) over () as min_sales_date
      from table1 t1
     ) t1
group by item;
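Since the same floor(...) expression is repeated in every branch, a slightly more readable variant (just a sketch, against the same assumed table1 layout) computes the 4-week period index once in the subquery and pivots on it:
select item,
       sum(case when period = 0 then number end) as month_1_sales,
       sum(case when period = 1 then number end) as month_2_sales,
       sum(case when period = 2 then number end) as month_3_sales
from (select t1.*,
             floor((sales_date - min(sales_date) over ())
                   / (1000 * 60 * 60 * 24 * 4 * 7)) as period
      from table1 t1
     ) t
group by item;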

Calculating compound interest with deposits/withdraws

I'm trying to calculate the total on an interest-bearing account accounting for deposits/withdraws with BigQuery.
Example scenario:
Daily interest rate = 10%
Value added/removed on every day: [100, 0, 29, 0, -100] (negative means amount removed)
The totals for each day are:
Day 1: 0*1.1 + 100 = 100
Day 2: 100*1.1 + 0 = 110
Day 3: 110*1.1 + 29 = 150
Day 4: 150*1.1 + 0 = 165
Day 5: 165*1.1 - 100 = 81.5
This would be trivial to implement in a language like Python:
daily_changes = [100, 0, 29, 0, -100]
interest_rate = 0.1
result = []
for day, change in enumerate(daily_changes):
    if day == 0:
        result.append(change)
    else:
        result.append(result[day - 1] * (1 + interest_rate) + change)
print(result)
# Result: [100, 110.00000000000001, 150.00000000000003, 165.00000000000006, 81.50000000000009]
My difficulty lies in calculating values for row N when they depend on row N-1 (the usual SUM(...) OVER (ORDER BY...) solution does not suffice here).
Here's a CTE to test with the mock data in this example.
with raw_data as (
select 1 as day, numeric '100' as change union all
select 2 as day, numeric '0' as change union all
select 3 as day, numeric '29' as change union all
select 4 as day, numeric '0' as change union all
select 5 as day, numeric '-100' as change
)
select * from raw_data
You can try the approach below:
SELECT day,
       ROUND((SELECT SUM(c * POW(1.1, day - o - 1))
              FROM t.changes c WITH OFFSET o), 2) AS totals
FROM (
  SELECT *, ARRAY_AGG(change) OVER (ORDER BY day) AS changes
  FROM raw_data
) t;
+-----+--------+
| day | totals |
+-----+--------+
| 1 | 100.0 |
| 2 | 110.0 |
| 3 | 150.0 |
| 4 | 165.0 |
| 5 | 81.5 |
+-----+--------+
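Why this works: unrolling the recurrence total(day) = 1.1 * total(day - 1) + change(day) gives the closed form total(n) = SUM over k = 1..n of change(k) * 1.1^(n - k). Because the array offsets o are 0-based (o = k - 1), the exponent appears in the query as day - o - 1.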
Another option is to use a recursive CTE:
with recursive raw_data as (
select 1 as day, numeric '100' as change union all
select 2 as day, numeric '0' as change union all
select 3 as day, numeric '29' as change union all
select 4 as day, numeric '0' as change union all
select 5 as day, numeric '-100' as change
), iterations as (
select *, change as total
from raw_data where day = 1
union all
select r.day, r.change, 1.1 * i.total + r.change
from iterations i join raw_data r
on r.day = i.day + 1
)
select *
from iterations
which produces the same output. The recursion mirrors the Python loop above: the anchor row seeds day 1 with its change, and each recursive step computes 1.1 * the previous total + the current change.

How to select data with an unusual grouping by date?

There is a table:
id      | direction_id | created_at
1       | 2            | 22 November 2021, 16:00:00
2       | 2            | 22 November 2021, 16:20:00
43      | 2            | 22 November 2021, 16:25:00
455     | 1            | 22 November 2021, 16:27:00
6567    | 2            | 22 November 2021, 17:36:00
674556  | 2            | 22 November 2021, 20:01:00
5243554 | 1            | 22 November 2021, 20:50:00
5243554 | 1            | 22 November 2021, 21:46:00
I need to get the following result:
1 | 2 | created_at_by_hour
1 | 3 | 22.11.21 17
1 | 4 | 22.11.21 18
1 | 4 | 22.11.21 19
1 | 4 | 22.11.21 20
2 | 5 | 22.11.21 21
3 | 5 | 22.11.21 22
1 and 2 in the header are all of the direction_id values that occur in the table.
created_at is truncated to hours, and for each hour I need to count how many records satisfy created_at <= created_at_by_hour. The grouping should be such that if no records were created during some hour, the previous hour's counts are simply carried forward.
The table consists of three fields: id (int), direction_id (int), created_at (timestamptz). I need an hourly breakdown (based on created_at) of the number of records created before each "grouped" hour, counted separately for each direction_id (there are only two: 1 and 2). If no records were created for a certain direction_id in a certain hour, the previous value is duplicated, and the result should end at the last created_at.
In my opinion, it is better to generate a series of hours between the min and max dates and then calculate the count for each direction.
Demo
with time_range as (
select
min(created_at) + interval '1 hour' as min,
max(created_at) + interval '1 hour' as max
from test
)
select
count(*) filter (where direction_id = 1) as "1",
count(*) filter (where direction_id = 2) as "2",
to_char(gs.hour, 'dd.mm.yy HH24') as created_at_by_hour
from
test t
cross join time_range tr
inner join generate_series(tr.min, tr.max, interval '1 hour') gs(hour)
on t.created_at <= gs.hour
group by gs.hour
order by gs.hour
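Note the + interval '1 hour' on both bounds of time_range: each generated hour counts everything created at or before it, so the series starts at 17 (the first hour label covering the initial records) and ends at 22, the first hour boundary after the last created_at (21:46).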
Truncate the date down to the hour, group by it and count. Then use SUM OVER to get a running total of the counts. In order to show missing hours in the table, you must generate a series of hours and outer join your data.
with hourly as
(
select date_trunc('hour', created_at) as hour, direction_id from mytable
)
, hours(hour) as
(
select *
from generate_series
(
(select min(hour) from hourly), (select max(hour) from hourly), interval '1 hour'
)
)
select
hours.hour,
sum(count(*) filter (where hourly.direction_id = 1)) over (order by hour) as "1",
sum(count(*) filter (where hourly.direction_id = 2)) over (order by hour) as "2"
from hours
left join hourly using (hour)
group by hour
order by hour;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=21d0c838452a09feac4ebc57906829f4
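One detail worth calling out here: sum(count(*) filter (...)) over (order by hour) nests an aggregate inside a window function. PostgreSQL evaluates the count(*) per GROUP BY group first (one row per hour), then runs the window sum over those grouped rows to produce the running total.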

SQL query for incoming and outgoing stocks, first and last

I need to make a query that shows sales and stocks (incoming and outgoing) for each model in October 2021.
The point is that to obtain incoming and outgoing stocks I need to get vt_stocks_cube_sz.qty for the first day of the month and for the last day of the month, respectively.
For now I have just written a sum of stocks (SUM(vt_stocks_cube_sz.qty) as stocks), but that isn't correct.
Could you help me split the stocks according to the rule above? I cannot understand how to write the query correctly.
SELECT vt_sales_cube_sz.modc_barc2 model,
SUM(vt_sales_cube_sz.qnt) sales,
SUM(vt_stocks_cube_sz.qty) as stocks
FROM vt_sales_cube_sz
LEFT JOIN vt_date_cube2
ON vt_sales_cube_sz.id_calendar_int = vt_date_cube2.id_calendar_int
LEFT JOIN vt_stocks_cube_sz ON
vt_stocks_cube_sz.parent_modc_barc = vt_sales_cube_sz.modc_barc AND
vt_stocks_cube_sz.id_stock = vt_sales_cube_sz.id_stock AND
vt_stocks_cube_sz.id_calendar_int = vt_sales_cube_sz.id_calendar_int AND
vt_stocks_cube_sz.vipusk_type = vt_sales_cube_sz.price_type
WHERE vt_date_cube2.wk_year_id = 2021
AND vt_date_cube2.wk_MoY_id = 10
AND vt_sales_cube_sz.id_stock IN
(SELECT id_stock
FROM vt_warehouse_cube
WHERE channel = 'OffLine')
GROUP BY vt_sales_cube_sz.modc_barc2
If you're looking for a robust and generalizable approach I'd suggest using analytic functions such as FIRST_VALUE, LAST_VALUE or something slightly different with RANK or ROW_NUMBER.
A simple example follows, so you can rerun it on your side and adjust it to the specific tables/fields you're using.
N.B.: You might need some tiebreakers in case you had multiple entries for the same first/last day.
with dummy_table as (
SELECT 1 as month, 1 as day, 10 as value UNION ALL
SELECT 1 as month, 2 as day, 20 as value UNION ALL
SELECT 1 as month, 3 as day, 30 as value UNION ALL
SELECT 2 as month, 1 as day, 5 as value UNION ALL
SELECT 2 as month, 3 as day, 15 as value UNION ALL
SELECT 2 as month, 5 as day, 25 as value
)
SELECT
month,
day,
case when day = first_day then 'first' else 'last' end as type,
value
FROM (
SELECT *
, FIRST_VALUE(day) over (partition by month order by day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as first_day
, LAST_VALUE(day) over (partition by month order by day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_day
FROM dummy_table
) tmp
WHERE day = first_day OR day=last_day
Dummy table:
Row | month | day | value
1   | 1     | 1   | 10
2   | 1     | 2   | 20
3   | 1     | 3   | 30
4   | 2     | 1   | 5
5   | 2     | 3   | 15
6   | 2     | 5   | 25
Result:
Row | month | day | type  | value
1   | 1     | 1   | first | 10
2   | 1     | 3   | last  | 30
3   | 2     | 1   | first | 5
4   | 2     | 5   | last  | 25
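If you need one row per group with incoming and outgoing as separate columns (as in the original question), the first/last flags can be folded together with conditional aggregation. A minimal sketch reusing the dummy_table CTE from above (the incoming/outgoing names are assumed, mapping to the first and last day's value):
SELECT
  month,
  MAX(CASE WHEN day = first_day THEN value END) AS incoming,
  MAX(CASE WHEN day = last_day THEN value END) AS outgoing
FROM (
  SELECT *
    , FIRST_VALUE(day) OVER (PARTITION BY month ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_day
    , LAST_VALUE(day) OVER (PARTITION BY month ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_day
  FROM dummy_table
) tmp
GROUP BY month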

Postgres query to get data datewise

I am using PostgreSQL.
I have a table like below:
ID product_id Date Qty
-----------------------------------
1 12 2008-06-02 50
2 3 2008-07-12 5
3 12 2009-02-10 25
4 10 2012-11-01 22
5 2 2011-03-25 7
Now I want the result like below (i.e. the product-wise sum of the qty field for the last 4 years):
product_id | QTY (current year) | QTY (current year + last year) | QTY (last 2 years) | QTY (> 2 years)
SELECT product_id
      ,sum(CASE WHEN mydate >= x.t THEN qty END) AS qty_current_year
      ,sum(CASE WHEN mydate >= (x.t - interval '1 y') THEN qty END) AS qty_since_last_year
      ,sum(CASE WHEN mydate >= (x.t - interval '2 y')
                AND mydate < x.t THEN qty END) AS qty_last_2_year
      ,sum(CASE WHEN mydate < (x.t - interval '2 y') THEN qty END) AS qty_older
FROM   tbl
CROSS  JOIN (SELECT date_trunc('year', now()) AS t) x -- calculate once
GROUP  BY 1;
To reuse the calculated beginning of the current year, I CROSS JOIN it as subquery x.
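Since PostgreSQL 9.4 the same conditional sums can also be written with the FILTER clause, which many find easier to read (a sketch using the same assumed names mydate, qty and tbl):
SELECT product_id
      ,sum(qty) FILTER (WHERE mydate >= x.t) AS qty_current_year
      ,sum(qty) FILTER (WHERE mydate >= x.t - interval '1 y') AS qty_since_last_year
      ,sum(qty) FILTER (WHERE mydate >= x.t - interval '2 y'
                          AND mydate < x.t) AS qty_last_2_year
      ,sum(qty) FILTER (WHERE mydate < x.t - interval '2 y') AS qty_older
FROM   tbl
CROSS  JOIN (SELECT date_trunc('year', now()) AS t) x
GROUP  BY 1;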

T-SQL Question - Counting and Average

I have a set of data that consists of a filenbr, an open date and a close date.
I need to produce a summary table similar to the one below: I need to count how many files belong to each day period, with those greater than 20 days grouped together. I know how to get the datediff; what I'm stumbling on is how to get the 20+ bucket and the % column.
1 day - 30 files - 30%
3 days - 25 files - 25%
10 days - 5 files - 5%
13 days - 20 files - 20%
>= 20 days - 20 files - 20%
Suppose you have a table named DayFile with the following columns:
Table DayFile
days - files
1 - 10
1 - 5
1 - 15
3 - 20
3 - 5
10 - 5
13 - 20
20 - 5
22 - 5
28 - 10
Then you could do the following:
SELECT
    SummaryTable.Day,
    SUM(SummaryTable.Files) as SumFiles,
    SUM(SummaryTable.Files) * 100.0 / SummaryTable.TotalFiles as Percentage
FROM
    (SELECT
        CASE WHEN (days >= 20) THEN 20
             ELSE DF.days END as Day,
        files,
        (SELECT SUM(files) FROM DayFile DFCount) as TotalFiles
     FROM DayFile DF) SummaryTable
GROUP BY SummaryTable.Day, SummaryTable.TotalFiles
EDITED:
SELECT
    SummaryTable.Day,
    SUM(SummaryTable.Files) as SumFiles,
    SUM(SummaryTable.Files) * 100.0 / (SELECT SUM(files) FROM DayFile DFCount) as Percentage
FROM
    (SELECT
        CASE WHEN (days >= 20) THEN 20
             ELSE DF.days END as Day,
        files
     FROM DayFile DF) SummaryTable
GROUP BY SummaryTable.Day
You are unclear as to how the ranges are determined (e.g. does "3 days" mean < 3 days, <= 3 days, = 3 days, > 3 days or >= 3 days?). If you are using SQL Server 2005 or higher, you can get your results like so:
With PeriodLength As
(
Select DateDiff(d, OpenDate, CloseDate) As DiffDays
From Table
)
, Ranges As
(
Select Case
When DiffDays < 3 Then 'Less than 3 Days'
When DiffDays >= 3 And DiffDays < 10 Then 'Less than 10 Days'
When DiffDays >= 10 And DiffDays < 13 Then 'Less than 13 Days'
When DiffDays >= 13 And DiffDays < 20 Then 'Less than 20 Days'
When DiffDays >= 20 Then 'Greater than 20 days'
End As Range
From PeriodLength
)
Select Range
, Count(*) As FileCount
, Count(*) * 100.000 / (Select Count(*) From Ranges) As Percentage
From Ranges
Group By Range
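One small refinement, if the buckets should come back in chronological rather than alphabetical order: since the query already groups by Range, you can sort by the smallest day difference in each bucket (a sketch of the final SELECT only):
Select Range
    , Count(*) As FileCount
    , Count(*) * 100.000 / (Select Count(*) From Ranges) As Percentage
From Ranges
Group By Range
Order By Min(DiffDays)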