PostgreSQL/plpython: how to compare two columns from different tables in a loop

I have a problem with a loop in which I must compare columns between different tables.
I have two tables, year2004 and year2005. Both contain month numbers and an amount for each month. I want to compare the amounts from both tables and produce a third table, year, with the month number and the greatest amount for that month.
For example, if a month has 100 in 2004 and 200 in 2005, I must return values (2005, number_of_month, 200). Do you have any ideas for solving this problem?
PS. Sorry for my writing errors, I learned English only a few years ago :)

I'm guessing that you're trying to find the greatest amount for each month across the two years.
This would be much, much easier if your data were all in one table, monthly_statistics, with a date column. Then it'd just be a simple aggregate or window function.
So let's turn the two tables into one.
Given sample data:
CREATE TABLE year2004 ( month int primary key, amount int);
INSERT INTO year2004 (month, amount)
VALUES (1, 50), (2, 40), (3, 60), (4, 80), (5, 100), (6, 800), (7, 20), (8, 40), (9, 30), (10, 40), (11, 50), (12, 99);
CREATE TABLE year2005 ( month int primary key, amount int);
INSERT INTO year2005 (month, amount)
VALUES (1, 88), (2, 44), (3, 11), (4, 123), (5, 12), (6, 88), (7, 21), (8, 19), (9, 44), (10, 89), (11, 4), (12, 42);
We could either join the tables, or convert them to a single table keyed by date and then filter it. Here's how we might generate a single table with the contents:
-- month is 1-based, so subtract 1 to keep month 1 in January
SELECT DATE '2004-01-01' + (month - 1) * INTERVAL '1' MONTH AS sampledate, amount
FROM year2004
UNION ALL
SELECT DATE '2005-01-01' + (month - 1) * INTERVAL '1' MONTH, amount
FROM year2005;
That's what you'd use if you were going to create a new table, but if you don't care about the actual dates, only the months, you can simply union all the two tables:
WITH samples AS (
SELECT month, amount
FROM year2004
UNION ALL
SELECT month, amount
FROM year2005
)
SELECT month, max(amount) AS amount
FROM samples
GROUP BY 1
ORDER BY month;
 month | amount
-------+--------
     1 |     88
     2 |     44
     3 |     60
     4 |    123
     5 |    100
     6 |    800
     7 |     21
     8 |     40
     9 |     44
    10 |     89
    11 |     50
    12 |     99
(12 rows)
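The question also asks for the year in which the larger amount occurred, which the aggregate above drops. A minimal sketch of one way to keep it, assuming SQLite (through Python's sqlite3) as a stand-in for PostgreSQL and only the first three months of the sample data; the helper name greatest_per_month is hypothetical, and ties are arbitrarily attributed to 2004:

```python
import sqlite3

def greatest_per_month(rows_2004, rows_2005):
    """Return (year, month, amount) rows holding the larger amount per month."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE year2004 (month INTEGER PRIMARY KEY, amount INTEGER);
        CREATE TABLE year2005 (month INTEGER PRIMARY KEY, amount INTEGER);
    """)
    con.executemany("INSERT INTO year2004 VALUES (?, ?)", rows_2004)
    con.executemany("INSERT INTO year2005 VALUES (?, ?)", rows_2005)
    # Join on month and pick the side with the larger amount (ties go to 2004).
    result = con.execute("""
        SELECT CASE WHEN b.amount > a.amount THEN 2005 ELSE 2004 END AS year,
               a.month,
               CASE WHEN b.amount > a.amount THEN b.amount ELSE a.amount END AS amount
        FROM year2004 AS a
        JOIN year2005 AS b ON b.month = a.month
        ORDER BY a.month
    """).fetchall()
    con.close()
    return result

# First three months of the sample data from the answer.
sample = greatest_per_month([(1, 50), (2, 40), (3, 60)],
                            [(1, 88), (2, 44), (3, 11)])
print(sample)
```

The join assumes every month appears in both tables, which holds for the sample data; no loop or plpython is needed for this.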

Related

Cumulative sum reset by change of year (SQL Server)

I'm using SQL Server 2017. I would like a running total of the monthly budget per factory within each year.
The running total must reset at the start of each new year.
Table schema:
CREATE TABLE [TABLE_1]
(
FACTORY varchar(50) Null,
DATE_YM int Null,
BUDGET int NULL,
);
INSERT INTO TABLE_1 (FACTORY, DATE_YM, BUDGET)
VALUES ('A', 202111, 1),
('A', 202112, 1),
('A', 202201, 10),
('A', 202202, 100),
('A', 202203, 1000),
('B', 202111, 2),
('B', 202112, 2),
('B', 202201, 20),
('B', 202202, 200),
('B', 202203, 2000),
('C', 202111, 3),
('C', 202112, 3),
('C', 202201, 30),
('C', 202202, 300),
('C', 202203, 3000);
Desired result:
FACTORY  DATE_YM  C_BUDGET_SUM
----------------------------------
A        202111   1
A        202112   2
A        202201   10
A        202202   110
A        202203   1110
B        202111   2
B        202112   4
B        202201   20
B        202202   220
B        202203   2220
C        202111   3
C        202112   6
C        202201   30
C        202202   330
C        202203   3330
My approach:
WITH data AS
(
SELECT
T1.FACTORY,
T1.DATE_YM,
T1.BUDGET
FROM
TABLE_1 AS T1
)
SELECT
FACTORY,
DATE_YM,
SUM(BUDGET) OVER (ORDER BY FACTORY, DATE_YM ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'C_BUDGET_SUM'
FROM
data
This query totals across year ends. How can the year break be implemented dynamically?
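The CTE is not necessary, but I'm assuming this is a simplified version.
To expand on my comment: partition by factory and by year (the first four characters of DATE_YM), so the running total restarts for every factory and every year.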
with data as (
select
T1.FACTORY,
T1.DATE_YM,
T1.BUDGET
from TABLE_1 as T1
)
select
FACTORY,
DATE_YM,
sum(BUDGET) over (partition by Factory,left(Date_YM,4) order by DATE_YM asc rows between unbounded preceding and current row) as 'C_BUDGET_SUM'
from data
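The effect of the extra partition key can be checked on factory A alone. This is only a sketch, assuming SQLite 3.25 or later (through Python's sqlite3) instead of SQL Server; because SQLite has no LEFT() function, the year is taken with integer division (date_ym / 100) rather than left(Date_YM, 4):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_1 (factory TEXT, date_ym INTEGER, budget INTEGER)")
con.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [("A", 202111, 1), ("A", 202112, 1), ("A", 202201, 10),
     ("A", 202202, 100), ("A", 202203, 1000)],
)

# date_ym / 100 is integer division here, so 202112 / 100 = 2021 (the year).
# Partitioning by factory and year restarts the running sum each January.
running = con.execute("""
    SELECT factory, date_ym,
           SUM(budget) OVER (
               PARTITION BY factory, date_ym / 100
               ORDER BY date_ym
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS c_budget_sum
    FROM table_1
    ORDER BY factory, date_ym
""").fetchall()

for row in running:
    print(row)
```

For factory A this yields 1, 2 for 2021 and 10, 110, 1110 for 2022, matching the desired result in the question.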

Assistance with join using a where clause and duplicates

I am having some difficulty with writing an accurate view.
I have 2 tables in different databases that I am looking to join.
Table 1 (in database 1) contains 3 columns:
Purchase_date
Item_id
Quantity_purchased
Table 2 (in database 2) contains 3 columns:
Item_id
Price_effective_date
Price
I am trying to determine the price of the item at the purchase date, which is a challenge since the item prices change on price effective dates. Accordingly, table 2 will have multiple instances of the same item_id, but with different prices and price effective dates.
My current code is:
select tb1.*,
tb2.price * tb1.quantity_purchased as total_price
from "Database2"."schema"."Table1" tb1
left join (select item_id,
price_effective_date,
price
from "Database2"."Schema"."Table2"
) tb2
on tb1.item_id = tb2.item_id
where tb2.price_effective_date <= tb1.purchase_date
I want to limit my results to the price at the most recent price_effective_date that is just before the purchase_date.
Any recommendations?
It's not really Snowflake specific, and luckily it can be addressed with a pretty common pattern in SQL queries.
Let's prepare some data (btw, for the future, it's best to provide the exact setup like this in your questions, it helps investigations tremendously):
create or replace table tb1(purchase_date date, item_id int, quantity int);
insert into tb1 values
('2020-01-01', 101, 1),
('2020-06-30', 101, 1),
('2020-07-01', 101, 1),
('2020-12-31', 101, 1),
('2021-01-01', 101, 1),
('2020-01-01', 102, 1),
('2020-06-30', 102, 1),
('2020-07-01', 102, 1),
('2020-12-31', 102, 1),
('2021-01-01', 102, 1);
create or replace table tb2(item_id int, effective_date date, price decimal);
insert into tb2 values
(101, '2020-01-01', 10),
(101, '2021-01-01', 11),
(102, '2020-01-01', 20),
(102, '2020-07-01', 18),
(102, '2021-01-01', 22);
Now, what you want is to join records from tb1 and tb2 on item_id, but only use the record from tb2 whose effective_date is the largest of all the effective_date values for that item that are on or before purchase_date. Correct? If you phrase it like this, the SQL almost writes itself:
select tb1.*, tb2.effective_date, tb2.price
from tb1 join tb2 on tb1.item_id = tb2.item_id
where tb2.effective_date = (
select max(effective_date)
from tb2 sub
where sub.effective_date <= tb1.purchase_date
and sub.item_id = tb1.item_id
)
order by tb1.item_id, purchase_date;
The result is hopefully what you want:
PURCHASE_DATE  ITEM_ID  QUANTITY  EFFECTIVE_DATE  PRICE
-------------------------------------------------------
2020-01-01     101      1         2020-01-01      10
2020-06-30     101      1         2020-01-01      10
2020-07-01     101      1         2020-01-01      10
2020-12-31     101      1         2020-01-01      10
2021-01-01     101      1         2021-01-01      11
2020-01-01     102      1         2020-01-01      20
2020-06-30     102      1         2020-01-01      20
2020-07-01     102      1         2020-07-01      18
2020-12-31     102      1         2020-07-01      18
2021-01-01     102      1         2021-01-01      22
Note, this query will not handle wrong data, e.g. purchases with no matching item or with no effective date on or before the purchase date; such purchases are silently dropped from the result.
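The "latest effective date on or before the purchase" pattern can be tried in isolation. A minimal sketch, assuming SQLite (through Python's sqlite3) rather than Snowflake, dates stored as ISO text so that string comparison matches date order, and a small subset of item 102's data with one quantity changed to 2 so that the total_price column from the question is visible:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tb1 (purchase_date TEXT, item_id INTEGER, quantity INTEGER);
    CREATE TABLE tb2 (item_id INTEGER, effective_date TEXT, price NUMERIC);
    INSERT INTO tb1 VALUES
        ('2020-06-30', 102, 1), ('2020-07-01', 102, 1), ('2021-01-01', 102, 2);
    INSERT INTO tb2 VALUES
        (102, '2020-01-01', 20), (102, '2020-07-01', 18), (102, '2021-01-01', 22);
""")

# For each purchase, keep only the price row whose effective_date is the
# latest one that is not after the purchase date.
priced = con.execute("""
    SELECT tb1.purchase_date, tb1.item_id,
           tb2.price * tb1.quantity AS total_price
    FROM tb1
    JOIN tb2 ON tb2.item_id = tb1.item_id
    WHERE tb2.effective_date = (
        SELECT MAX(sub.effective_date)
        FROM tb2 AS sub
        WHERE sub.item_id = tb1.item_id
          AND sub.effective_date <= tb1.purchase_date
    )
    ORDER BY tb1.purchase_date
""").fetchall()

for row in priced:
    print(row)
```

The purchase on 2020-06-30 still gets the January price (20), while the purchase on 2020-07-01 already gets the new price (18), because the comparison is inclusive.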
EDIT: Handling missing effective_dates
To handle cases where there are no effective dates matching the purchase date, you can identify the "missing" purchases, and then add the smallest existing effective_date for these items, e.g. (we add a new item, value 103 to the existing table to showcase this):
insert into tb1 values
('2020-06-01', 103, 11),
('2020-08-01', 103, 12);
insert into tb2 values
(103, '2020-07-01', 30);
with missing as (
select * from tb1 where not exists (
select * from tb2
where tb2.effective_date <= tb1.purchase_date
and tb2.item_id = tb1.item_id)
)
select m.item_id, m.purchase_date, m.quantity,
(select min(effective_date) from tb2 where tb2.item_id = m.item_id) best_date
from missing m;
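You can take this query and UNION ALL it with the original query, provided both halves return the same columns in the same order (so this half would also need to look up the price for best_date).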
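One possible shape of the combined query, sketched under the same assumptions as before (SQLite through Python's sqlite3, ISO text dates) and using only the item 103 rows. Joining the fallback half back to tb2 to fetch the price is an addition for illustration and not part of the original answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tb1 (purchase_date TEXT, item_id INTEGER, quantity INTEGER);
    CREATE TABLE tb2 (item_id INTEGER, effective_date TEXT, price NUMERIC);
    INSERT INTO tb1 VALUES ('2020-06-01', 103, 11), ('2020-08-01', 103, 12);
    INSERT INTO tb2 VALUES (103, '2020-07-01', 30);
""")

combined = con.execute("""
    -- Purchases that have an effective date on or before the purchase date.
    SELECT tb1.purchase_date, tb1.item_id, tb1.quantity,
           tb2.effective_date, tb2.price
    FROM tb1
    JOIN tb2 ON tb2.item_id = tb1.item_id
    WHERE tb2.effective_date = (
        SELECT MAX(sub.effective_date) FROM tb2 AS sub
        WHERE sub.item_id = tb1.item_id
          AND sub.effective_date <= tb1.purchase_date)
    UNION ALL
    -- "Missing" purchases: fall back to the earliest known price.
    SELECT m.purchase_date, m.item_id, m.quantity,
           f.effective_date, f.price
    FROM tb1 AS m
    JOIN tb2 AS f ON f.item_id = m.item_id
    WHERE NOT EXISTS (
        SELECT 1 FROM tb2 AS e
        WHERE e.item_id = m.item_id
          AND e.effective_date <= m.purchase_date)
      AND f.effective_date = (
        SELECT MIN(s.effective_date) FROM tb2 AS s
        WHERE s.item_id = m.item_id)
    ORDER BY 2, 1
""").fetchall()

for row in combined:
    print(row)
```

Both purchases of item 103 end up priced at 30: the August one through the regular branch, the June one through the fallback branch.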

SQL Server Query for average value over a date period

DECLARE @SampleOrderTable TABLE
(
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '12/10/2019', '762.84'),
(2, '12/10/2019', '886.32'),
(3, '11/9/2019', '10245.00')
How do I select the last 4 days up to and including OrderDate, with the Amount averaged (spread evenly) over that period?
So result data would be:
pkPersonID Date Amount
------------------------------------
1 '12/7/2019' 190.71
1 '12/8/2019' 190.71
1 '12/9/2019' 190.71
1 '12/10/2019' 190.71
2 '12/7/2019' 221.58
2 '12/8/2019' 221.58
2 '12/9/2019' 221.58
2 '12/10/2019' 221.58
3 '11/6/2019' 2561.25
3 '11/7/2019' 2561.25
3 '11/8/2019' 2561.25
3 '11/9/2019' 2561.25
You may try with the following approach, using DATEADD(), windowed COUNT() and VALUES() table value constructor:
Table:
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
)
INSERT INTO @SampleOrderTable (pkPersonID, OrderDate, Amount)
VALUES (1, '20191210', '762.84'),
(2, '20191210', '886.32'),
(3, '20191109', '10245.00')
Statement:
SELECT
t.pkPersonID,
DATEADD(day, -v.Day, t.OrderDate) AS [Date],
CONVERT(numeric(18, 6), Amount / COUNT(Amount) OVER (PARTITION BY t.pkPersonID)) AS Amount
FROM @SampleOrderTable t
CROSS APPLY (VALUES (0), (1), (2), (3)) v(Day)
ORDER BY t.pkPersonID, [Date]
Result:
pkPersonID Date Amount
1 07/12/2019 00:00:00 190.710000
1 08/12/2019 00:00:00 190.710000
1 09/12/2019 00:00:00 190.710000
1 10/12/2019 00:00:00 190.710000
2 07/12/2019 00:00:00 221.580000
2 08/12/2019 00:00:00 221.580000
2 09/12/2019 00:00:00 221.580000
2 10/12/2019 00:00:00 221.580000
3 06/11/2019 00:00:00 2561.250000
3 07/11/2019 00:00:00 2561.250000
3 08/11/2019 00:00:00 2561.250000
3 09/11/2019 00:00:00 2561.250000
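The same idea (one row per day offset, amount divided by the number of offsets) can be sketched outside SQL Server. This assumes SQLite through Python's sqlite3, ISO text dates, a hypothetical table name sample_orders, and a fixed divisor of 4 instead of the windowed COUNT():

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sample_orders (pk_person_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO sample_orders VALUES
        (1, '2019-12-10', 762.84),
        (3, '2019-11-09', 10245.00);
""")

# Cross join each order with the day offsets 0..3, shift the date back by
# the offset, and spread the amount evenly over the four resulting days.
spread = con.execute("""
    WITH offsets(d) AS (VALUES (0), (1), (2), (3))
    SELECT o.pk_person_id,
           date(o.order_date, '-' || offsets.d || ' days') AS day,
           ROUND(o.amount / 4, 2) AS amount
    FROM sample_orders AS o
    CROSS JOIN offsets
    ORDER BY o.pk_person_id, day
""").fetchall()

for row in spread:
    print(row)
```

Each order expands to four rows ending on its order date, for example 2019-12-07 through 2019-12-10 for person 1.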
You can use SQL functions like AVG, DATEADD and GETDATE.
SELECT AVG(Amount) AS AverageAmount
FROM @SampleOrderTable
WHERE OrderDate >= DATEADD(DAY, -4, GETDATE())
DECLARE @SampleOrderTable TABLE (
pkPersonID INT,
OrderDate DATETIME,
Amount NUMERIC(18, 6)
);
INSERT INTO @SampleOrderTable
(pkPersonID, OrderDate, Amount)
VALUES
(1, '12/20/2019', 762.84),
(2, '12/20/2019', 886.32),
(3, '12/20/2019', 10245.00),
(4, '12/19/2019', 50.00),
(5, '12/19/2019', 100.00),
(6, '09/01/2019', 200.00),
(7, '09/01/2019', 300.00),
(8, '12/15/2019', 400.00),
(9, '12/15/2019', 500.00),
(10, '09/02/2019', 150.00),
(11, '09/02/2019', 1100.00),
(12, '09/02/2019', 1200.00),
(13, '09/02/2019', 1300.00),
(14, '09/02/2019', 1400.00),
(15, '09/02/2019', 1500.00);
SELECT OrderDate,AVG(Amount) AS Average_Value
FROM @SampleOrderTable
WHERE DATEDIFF(DAY, CAST(OrderDate AS DATETIME), CAST(GETDATE() AS Datetime)) <= 4
GROUP BY OrderDate;

count trip in sql server

my table structure
id zoneid status
1 35 IN starting zone
2 35 OUT 1st trip has been started
3 36 IN
4 36 IN
5 36 OUT
6 38 IN last station zone 1 trip completed
7 38 OUT returning back 2nd trip has start
8 38 OUT
9 36 IN
10 36 OUT
11 35 IN when return back in start zone means 2nd trip complete
12 35 IN
13 35 IN
14 35 OUT 3rd trip has been started
15 36 IN
16 36 IN
17 36 OUT
18 38 IN 3rd trip has been completed
19 38 OUT 4th trip has been started
20 38 OUT
21 36 IN
22 36 OUT
23 35 IN 4th trip completed
24 35 IN
Now I want a SQL query that counts the number of trips. I do not want to use the status field for the count.
Edit:
I want the total number of trips as the result,
where 35 is the starting point and 38 is the ending point (this is 1 trip); when 35 occurs again after 38, that is the 2nd trip, and so on.
So you don't want to look at the status, but only look at the zoneid changes ordered by id. zoneid 36 is irrelevant, so we select 35 and 38 only, order them by id and count changes. We detect changes by comparing a record with the previous one. We can look into a previous record with LAG.
select sum(ischange) as trips_completed
from
(
select
case when zoneid <> lag(zoneid) over (order by id) then 1 else 0 end as ischange
from trips
where zoneid in (35,38)
) changes_detected;
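To check the LAG approach against the sample data, here is a minimal sketch assuming SQLite 3.25 or later (through Python's sqlite3) instead of SQL Server, with only the id and zoneid columns loaded into a table named trips as in the query above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, zoneid INTEGER)")

# zoneid values for ids 1..24, copied from the question's table.
zone_by_id = [35, 35, 36, 36, 36, 38, 38, 38, 36, 36, 35, 35,
              35, 35, 36, 36, 36, 38, 38, 38, 36, 36, 35, 35]
con.executemany("INSERT INTO trips VALUES (?, ?)",
                list(enumerate(zone_by_id, start=1)))

# Keep only the end zones, then count every switch between 35 and 38.
# The first row has no predecessor (LAG is NULL), so it counts as 0.
(trips_completed,) = con.execute("""
    SELECT SUM(ischange)
    FROM (
        SELECT CASE WHEN zoneid <> LAG(zoneid) OVER (ORDER BY id)
                    THEN 1 ELSE 0 END AS ischange
        FROM trips
        WHERE zoneid IN (35, 38)
    ) AS changes_detected
""").fetchone()

print(trips_completed)
```

On this data the switches happen at ids 6, 11, 18 and 23, so four completed trips are counted, in line with the annotations in the question.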
I am suggesting this without any testing. Does the following query produce the correct number of rows? Note if there is a date_created (datetime) column then I would suggest using that column to order by instead of id.
select
ca.in_id, t.id as out_id, ca.in_status, t.status as out_status
from table1 t
cross apply (
select top (1) id as in_id, status as in_status
from table1
where table1.id < t.id
and zoneid = 35
order by id DESC
) ca
where t.zoneid = 38
/* and conditions for selecting one day only */
If that logic is correct then just use COUNT(*) instead of the column list.
CREATE TABLE Table1
("id" int, "zoneid" int, "status" varchar(3), "other" varchar(54))
;
INSERT INTO Table1
("id", "zoneid", "status", "other")
VALUES
(1, 35, 'IN', 'starting zone'),
(2, 35, 'OUT', '1st trip has been started'),
(3, 36, 'IN', NULL),
(4, 36, 'IN', NULL),
(5, 36, 'OUT', NULL),
(6, 38, 'IN', 'last station zone 1 trip completed'),
(7, 38, 'OUT', 'returning back 2nd trip has start'),
(8, 38, 'OUT', NULL),
(9, 36, 'IN', NULL),
(10, 36, 'OUT', NULL),
(11, 35, 'IN', 'when return back in start zone means 2nd trip complete'),
(12, 35, 'IN', NULL),
(13, 35, 'IN', NULL),
(14, 35, 'OUT', '3rd trip has been started'),
(15, 36, 'IN', NULL),
(16, 36, 'IN', NULL),
(17, 36, 'OUT', 'other'),
(18, 38, 'IN', '3rd trip has been completed'),
(19, 38, 'OUT', '4th trip has been started'),
(20, 38, 'OUT', NULL),
(21, 36, 'IN', NULL),
(22, 36, 'OUT', NULL),
(23, 35, 'IN', '4th trip completed'),
(24, 35, 'IN', NULL)
;
For learning purposes, here is a self explanatory & detailed version http://sqlfiddle.com/#!15/d8bf4/1/0
The solution is based on calculating the 'running' count of trips from '35 to 38' and from '38 to 35'. It is very specific to the OP's query and could be shortened considerably...
with trip_38_to_35 as (
select * from zonecount
where (zoneid=38 and status='OUT') OR (zoneid=35 and status='IN')
order by id asc
)
, count_start_on_38 as (
select count(*) as start_on_38
from trip_38_to_35
where (zoneid=38 and status='OUT') AND
id <
( select max(id)
from trip_38_to_35
where (zoneid=35 and status='IN')
) /*do not count unfinished trips*/
)
, count_end_on_35 as (
select count(*) as end_on_35
from trip_38_to_35
where (zoneid=35 and status='IN')
) /*the other way of trip*/
, trip_35_to_38 as (
select * from zonecount
where (zoneid=35 and status='OUT') OR (zoneid=38 and status='IN')
order by id asc
)
,count_start_on_35 as (
select count(*) as start_on_35
from trip_35_to_38
where (zoneid=35 and status='OUT') AND
id <
( select max(id)
from trip_35_to_38
where (zoneid=38 and status='IN')
) /*do not count unfinished trips*/
)
,count_end_on_38 as (
select count(*) as end_on_38
from trip_35_to_38
where (zoneid=38 and status='IN')
)
/*sum the MIN of the two trips count*/
select
(case when end_on_35 > start_on_38 then start_on_38 else end_on_35 end) +
(case when end_on_38 > start_on_35 then start_on_35 else end_on_38 end)
from
count_start_on_38,
count_end_on_35,
count_start_on_35,
count_end_on_38
By the way, this query calculates 6 trips under that definition.

SQL Server - Selecting periods without changes in data

I am trying to select periods of time during which one column stayed constant, and to check whether a second column changed its value within each of those periods.
Table:
create table #stable_periods
(
[Date] date,
[Car_Reg] nvarchar(10),
[Internal_Damages] int,
[External_Damages] int
)
insert into #stable_periods
values ('2015-08-19', 'ABC123', 10, 10),
('2015-08-18', 'ABC123', 9, 10),
('2015-08-17', 'ABC123', 8, 9),
('2015-08-16', 'ABC123', 9, 9),
('2015-08-15', 'ABC123', 10, 10),
('2015-08-14', 'ABC123', 10, 10),
('2015-08-19', 'ABC456', 5, 3),
('2015-08-18', 'ABC456', 5, 4),
('2015-08-17', 'ABC456', 8, 4),
('2015-08-16', 'ABC456', 9, 4),
('2015-08-15', 'ABC456', 10, 10),
('2015-01-01', 'ABC123', 1, 1),
('2015-01-01', 'ABC456', NULL, NULL);
--select * from #stable_periods
-- Unfortunately I can’t post pictures yet but you get the point of how the table looks like
What I would like to receive is
Car_Reg FromDate ToDate External_Damages Have internal damages changed in this period?
ABC123 2015-08-18 2015-08-19 10 Yes
ABC123 2015-08-16 2015-08-17 9 Yes
ABC123 2015-08-14 2015-08-15 10 No
ABC123 2015-01-01 2015-01-01 1 No
ABC456 2015-08-19 2015-08-19 3 No
ABC456 2015-08-16 2015-08-18 4 Yes
ABC456 2015-08-15 2015-08-15 10 No
ABC456 2015-01-01 2015-01-01 NULL NULL
Basically, I want to build period frames where [External_Damages] was constant and check whether [Internal_Damages] changed within the same period (no matter how many times).
I have spent a lot of time trying, but I am afraid my level of abstract thinking is much too low...
It would be great to see any suggestions.
Thanks,
Bartosz
I believe this is a form of the gaps-and-islands problem.
Here is a solution using ROW_NUMBER and GROUP BY:
WITH CTE AS(
SELECT *,
RN = DATEADD(DAY, - ROW_NUMBER() OVER(PARTITION BY Car_reg, External_Damages ORDER BY [Date]), [Date])
FROM #stable_periods
)
SELECT
Car_Reg,
FromDate = MIN([Date]),
ToDate = MAX([Date]) ,
External_Damages,
Change =
CASE
WHEN MAX(External_Damages) IS NULL THEN NULL
WHEN COUNT(DISTINCT Internal_Damages) > 1 THEN 'Yes'
ELSE 'No'
END
FROM CTE c
GROUP BY Car_Reg, External_Damages, RN
ORDER BY Car_Reg, ToDate DESC
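The grouping trick is that, within one (Car_Reg, External_Damages) partition, consecutive dates minus their row number all collapse to the same anchor date. A minimal sketch of that idea, assuming SQLite 3.25 or later (through Python's sqlite3) in place of SQL Server, ISO text dates, lower-case column names, and only the ABC123 rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE stable_periods
               (day TEXT, car_reg TEXT, internal_damages INTEGER,
                external_damages INTEGER)""")
con.executemany("INSERT INTO stable_periods VALUES (?, ?, ?, ?)", [
    ("2015-08-19", "ABC123", 10, 10), ("2015-08-18", "ABC123", 9, 10),
    ("2015-08-17", "ABC123", 8, 9),   ("2015-08-16", "ABC123", 9, 9),
    ("2015-08-15", "ABC123", 10, 10), ("2015-08-14", "ABC123", 10, 10),
    ("2015-01-01", "ABC123", 1, 1),
])

# grp = date minus row number: identical for every run of consecutive days
# that share the same car and the same external_damages value.
periods = con.execute("""
    WITH cte AS (
        SELECT *,
               date(day, '-' || (ROW_NUMBER() OVER (
                   PARTITION BY car_reg, external_damages
                   ORDER BY day)) || ' days') AS grp
        FROM stable_periods
    )
    SELECT car_reg,
           MIN(day) AS from_date,
           MAX(day) AS to_date,
           external_damages,
           CASE WHEN MAX(external_damages) IS NULL THEN NULL
                WHEN COUNT(DISTINCT internal_damages) > 1 THEN 'Yes'
                ELSE 'No' END AS internal_changed
    FROM cte
    GROUP BY car_reg, external_damages, grp
    ORDER BY car_reg, to_date DESC
""").fetchall()

for row in periods:
    print(row)
```

For ABC123 this produces the four periods listed in the question, with 'Yes' for 2015-08-18 to 2015-08-19 and 2015-08-16 to 2015-08-17 and 'No' for the other two.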