Joining tables with recent date for each row then weighted averaging - sql

There are three tables in an Oracle DB: equip_type, output_history, and time_history.
Is there a way to join the three tables as shown in (1) below and then compute the weighted average shown in (2)?
--equip_type table and data
CREATE TABLE equip_type (
EQUIP_TYPE VARCHAR(60),
EQUIP VARCHAR(60)
);
INSERT INTO equip_type VALUES ('A','e1');
-- output_history and data
CREATE TABLE output_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data1 VARCHAR(60),
QUANTITY NUMBER(10)
);
INSERT INTO output_history VALUES ('e1','m1','20180103',10);
INSERT INTO output_history VALUES ('e1','m1','20180106',20);
--time_history table and data
CREATE TABLE time_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data2 VARCHAR(60),
time NUMBER(10)
);
INSERT INTO time_history VALUES ('e1','m1','20180101',6);
INSERT INTO time_history VALUES ('e1','m1','20180105',5);
(1) How to get joined table as below?
EQUIP MODEL DATE1    QUANTITY DATE2    TIME TYPE
----- ----- -------- -------- -------- ---- ----
e1    m1    20180103 10       20180101 6    A
e1    m1    20180106 20       20180105 5    A
For each row in OUTPUT_HISTORY, the most recent row in TIME_HISTORY as of DATE1 (that is, with DATE2 <= DATE1) is joined.
(2) Then, with the joined table above, how do I get the weighted average of TIME?
sum of (QUANTITY * TIME) / sum of QUANTITY, grouped by TYPE and MODEL
For example, (10×6 + 20×5)÷(10+20) for equip type A and model m1.

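One method uses an analytic function to pick, for each output_history row, the most recent time_history row on or before Data1, and then a simple aggregation:
select equip_type, model,
       sum(quantity * time) / sum(quantity) as weighted_avg_time
from (select et.equip_type, oh.model, oh.quantity, th.time,
             -- assumes (equip, model, data1) identifies one output_history row
             row_number() over (partition by oh.equip, oh.model, oh.data1
                                order by th.data2 desc) as seqnum
      from output_history oh join
           equip_type et
           on et.equip = oh.equip left join
           time_history th
           on th.equip = oh.equip and th.model = oh.model and
              th.data2 <= oh.data1
     ) x
where seqnum = 1
group by equip_type, model;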
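As a quick check of the "most recent row as of each output date, then weighted average" idea, here is a minimal sketch that runs on the question's sample data in SQLite through Python's sqlite3 module. SQLite stands in for Oracle here (an assumption, as is a SQLite build with window functions, 3.25 or later), and the `* 1.0` is only there because SQLite divides integers as integers.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE equip_type (equip_type TEXT, equip TEXT);
    INSERT INTO equip_type VALUES ('A', 'e1');

    CREATE TABLE output_history (equip TEXT, model TEXT, data1 TEXT, quantity INTEGER);
    INSERT INTO output_history VALUES ('e1', 'm1', '20180103', 10),
                                      ('e1', 'm1', '20180106', 20);

    CREATE TABLE time_history (equip TEXT, model TEXT, data2 TEXT, time INTEGER);
    INSERT INTO time_history VALUES ('e1', 'm1', '20180101', 6),
                                    ('e1', 'm1', '20180105', 5);
""")

weighted = conn.execute("""
    SELECT equip_type, model,
           SUM(quantity * time) * 1.0 / SUM(quantity) AS weighted_avg_time
    FROM (
        SELECT et.equip_type, oh.model, oh.quantity, th.time,
               -- rank candidate time rows per output row, newest first
               ROW_NUMBER() OVER (PARTITION BY oh.equip, oh.model, oh.data1
                                  ORDER BY th.data2 DESC) AS seqnum
        FROM output_history oh
        JOIN equip_type et ON et.equip = oh.equip
        LEFT JOIN time_history th
               ON th.equip = oh.equip
              AND th.model = oh.model
              AND th.data2 <= oh.data1
    ) AS ranked
    WHERE seqnum = 1
    GROUP BY equip_type, model
""").fetchall()

print(weighted)
```

The single result row should carry the value (10×6 + 20×5)÷(10+20) from the question for type A and model m1.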

Related

SUM multiple tables GROUP BY column

I have the following SQL Server tables:
create table Cars
(
CarID int,
CarType varchar(50),
PlateNo varchar(20)
);
create table Fuelings
(
CarID int,
FuelingDate date,
Odometer int,
Quantity decimal,
Cost money
);
create table Maintenances
(
CarID int,
MaintenanceDate date,
MaintenanceCost money
);
I'm having problems calculating the fuel consumption grouped by the column CarType. To get the fuel consumption I want to calculate the distance and fuel quantity for each car, then sum that up grouped by the column CarType.
What I have now:
SELECT DISTINCT C.CarType AS [Car type],
SUM(M.MaintenanceCost) AS [Maintenance],
SUM(F.Cost) AS [Fuel],
(MAX(Odometer)-MIN(Odometer)) AS [Distance],
(SUM(Quantity)*100)/(MAX(Odometer)-MIN(Odometer)) AS [L/100km]
FROM Cars AS C
LEFT JOIN Maintenances AS M ON M.CarID=C.CarID
AND M.MaintenanceDate BETWEEN '2021-01-01 00:00:00' AND '2021-01-31 23:59:29'
LEFT JOIN Fuelings AS F ON F.CarID=C.CarID
AND F.FuelingDate BETWEEN '2021-01-01 00:00:00' AND '2021-01-31 23:59:29'
GROUP BY C.CarType
Desired result for type 'SUV':
Total fuel quantity: 301
Total distance: 1600
Consumption: 18,8125
See here: http://sqlfiddle.com/#!18/2636c/18
I used a common table expression cte_car to make a first grouping and get all the details of a single car. Then a final grouping is done to get the totals across the car types.
Sample data
I left out the Maintenances table because it is not needed for the consumptions.
create table Cars
(
CarID int,
CarType varchar(50),
PlateNo varchar(20)
);
insert into Cars (CarID, CarType, PlateNo) values
(1,'Coupe','BC18341'),
(2,'Hatchback','AU14974'),
(3,'Hatchback','BC49207'),
(4,'SUV','AU10299'),
(5,'Coupe','AU32703'),
(6,'Coupe','BC51719'),
(7,'Hatchback','AU30325'),
(8,'SUV','BC52018');
create table Fuelings
(
CarID int,
FuelingDate date,
Odometer int,
Quantity decimal,
Cost money
);
insert into Fuelings (CarID, FuelingDate, Odometer, Quantity, Cost) values
(1,'2021-01-02', 124520, 53.28, 78.32),
(1,'2021-01-15', 124810, 49.17, 68.34),
(1,'2021-01-28', 125130, 51.74, 69.13),
(2,'2021-01-05', 344380, 49.10, 72.81),
(2,'2021-01-18', 344540, 54.98, 69.37),
(2,'2021-01-29', 344990, 52.76, 66.83),
(3,'2021-01-01', 874200, 45.27, 73.48),
(3,'2021-01-19', 874770, 46.75, 67.91),
(3,'2021-01-26', 874930, 52.15, 75.50),
(4,'2021-01-03', 414190, 50.88, 71.72),
(4,'2021-01-14', 414400, 51.94, 68.15),
(4,'2021-01-29', 415140, 48.30, 77.82),
(5,'2021-01-06', 294240, 48.15, 71.48),
(5,'2021-01-19', 294680, 53.86, 66.80),
(5,'2021-01-30', 294890, 51.54, 74.31),
(6,'2021-01-01', 934220, 49.26, 69.98),
(6,'2021-01-18', 934520, 51.35, 71.50),
(6,'2021-01-25', 934970, 54.63, 65.72),
(7,'2021-01-05', 584110, 51.42, 74.29),
(7,'2021-01-22', 584430, 49.36, 69.95),
(7,'2021-01-31', 584750, 49.84, 73.18),
(8,'2021-01-02', 654280, 53.87, 77.75),
(8,'2021-01-17', 654730, 45.32, 67.48),
(8,'2021-01-29', 654930, 50.75, 69.80);
Solution
with cte_car as
(
select c.CarId,
c.CarType,
max(f.Odometer) - min(f.Odometer) as CarDistance,
sum(f.Quantity) as CarQuantity
from Cars c
join Fuelings f
on f.CarId = c.CarId
group by c.CarId,
c.CarType
)
select cc.CarType,
sum(cc.CarDistance) as TotalDistance,
sum(cc.CarQuantity) as TotalQuantity,
sum(cc.CarQuantity) * 100.0 / sum(cc.CarDistance) as TotalConsumption
from cte_car cc
group by cc.CarType;
Result
CarType   TotalDistance TotalQuantity TotalConsumption
--------- ------------- ------------- ----------------
Coupe     2010          463           23.034825
Hatchback 1980          451           22.777777
SUV       1600          301           18.8125
Fiddle to see things in action.
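The two-level grouping (per car first, then per car type) can be sketched on the two SUV rows of the sample data. This is a minimal illustration in SQLite through Python's sqlite3 module, not SQL Server, and Quantity is stored as REAL here (an assumption). The summed quantity therefore keeps its decimals (301.06), whereas the result above shows whole numbers, which is consistent with SQL Server treating a bare `decimal` as decimal(18,0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars (CarID INTEGER, CarType TEXT);
    INSERT INTO Cars VALUES (4, 'SUV'), (8, 'SUV');

    -- Quantity is REAL in this sketch, so fractional litres are kept
    CREATE TABLE Fuelings (CarID INTEGER, Odometer INTEGER, Quantity REAL);
    INSERT INTO Fuelings VALUES
        (4, 414190, 50.88), (4, 414400, 51.94), (4, 415140, 48.30),
        (8, 654280, 53.87), (8, 654730, 45.32), (8, 654930, 50.75);
""")

suv = conn.execute("""
    WITH cte_car AS (
        -- first level: distance and quantity per individual car
        SELECT c.CarID, c.CarType,
               MAX(f.Odometer) - MIN(f.Odometer) AS CarDistance,
               SUM(f.Quantity) AS CarQuantity
        FROM Cars c
        JOIN Fuelings f ON f.CarID = c.CarID
        GROUP BY c.CarID, c.CarType
    )
    -- second level: totals per car type
    SELECT CarType,
           SUM(CarDistance) AS TotalDistance,
           SUM(CarQuantity) AS TotalQuantity,
           SUM(CarQuantity) * 100.0 / SUM(CarDistance) AS TotalConsumption
    FROM cte_car
    GROUP BY CarType
""").fetchone()

print(suv)
```

Taking MAX(Odometer) - MIN(Odometer) per car before summing matters: computed directly per car type, the odometer range would span different cars and give a meaningless distance.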

How to insert a column which sets unique id based on values in another column (SQL)?

I will create a table into which I will insert multiple values for different companies. Basically, I have all the values that are in the table below, but I want to add a column IndicatorID that is linked to IndicatorName so that every indicator has a unique id. This will obviously not be a primary key.
I will insert the data with multiple selects:
CREATE TABLE abc
INSERT INTO abc
SELECT company_id, 'roe', roevalue, metricdate
FROM TABLE1
INSERT INTO abc
SELECT company_id, 'd/e', devalue, metricdate
FROM TABLE1
So, I don't know how to add the IndicatorID I mentioned above.
EDIT:
Here is how I populate my new table:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT [the ID that I need], 'NI_3y' as 'Indicator', t.Company, avg(t.ni) over (partition by t.Company order by t.reportdate rows between 2 preceding and current row) as 'ni_3y',
t.reportdate
FROM table t
LEFT JOIN IndicatorIDs i
ON i.Indicator = roe3 -- the part that is not working if I have separate indicatorID table
I am going to insert different indicators for the same companies. And I want indicatorID.
Your "indicator" is a proper entity in its own right. Create a table with all indicators:
create table indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then, store only the id in your data table. You can look up the name in the reference table.
Your inserts are then a little more complicated. First, add any indicator names that are not in the reference table yet:
INSERT INTO indicators (indicator)
SELECT v.indicator
FROM (VALUES ('roe'), ('d/e')) v(indicator)
WHERE NOT EXISTS (SELECT 1 FROM indicators i2 WHERE i2.indicator = v.indicator);
Then:
INSERT INTO ABC (indicatorId, companyid, value, date)
SELECT i.indicator_id, t1.company_id, v.value, t1.metricdate
FROM table1 t1 CROSS APPLY
(VALUES ('roe', t1.roevalue), ('d/e', t1.devalue)
) v(indicator, value) JOIN
indicators i
ON i.indicator = v.indicator;
This process is called normalization and it is the typical way to store data in a database.
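A minimal sketch of the same normalization, run in SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so the unpivot is written with UNION ALL instead (a swapped-in technique). The rows in table1 and the column names of abc are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical source data: one row per company with both metrics
    CREATE TABLE table1 (company_id INTEGER, roevalue REAL, devalue REAL, metricdate TEXT);
    INSERT INTO table1 VALUES (1, 0.12, 1.5, '2020-12-31'),
                              (2, 0.08, 0.7, '2020-12-31');

    -- reference table: one row, and one id, per indicator name
    CREATE TABLE indicators (indicator_id INTEGER PRIMARY KEY, indicator TEXT UNIQUE);
    INSERT INTO indicators (indicator) VALUES ('roe'), ('d/e');

    CREATE TABLE abc (
        indicator_id INTEGER REFERENCES indicators (indicator_id),
        company_id   INTEGER,
        value        REAL,
        metricdate   TEXT
    );

    -- unpivot the two metric columns, then look up each indicator's id
    INSERT INTO abc (indicator_id, company_id, value, metricdate)
    SELECT i.indicator_id, v.company_id, v.value, v.metricdate
    FROM (
        SELECT company_id, 'roe' AS indicator, roevalue AS value, metricdate FROM table1
        UNION ALL
        SELECT company_id, 'd/e', devalue, metricdate FROM table1
    ) AS v
    JOIN indicators i ON i.indicator = v.indicator;
""")

rows = conn.execute("""
    SELECT i.indicator, a.indicator_id, a.company_id, a.value
    FROM abc a
    JOIN indicators i ON i.indicator_id = a.indicator_id
    ORDER BY a.indicator_id, a.company_id
""").fetchall()

for row in rows:
    print(row)
```

Every 'roe' row ends up with one id and every 'd/e' row with another, which is the IndicatorID behaviour the question asks for.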
DDL and INSERT statement to create an indicators table with a unique constraint on indicator. Because the ind_id is intended to be a foreign key in the abc table it's created as a non-decomposable surrogate integer primary key using the IDENTITY property.
drop table if exists test_indicators;
go
create table test_indicators (
ind_id int identity(1, 1) primary key not null,
indicator varchar(20) unique not null);
go
insert into test_indicators(indicator) values
('NI'),
('ROE'),
('D/E');
The abc table depends on the ind_id column from indicators table as a foreign key reference. To populate the abc table company_id's are associated with ind_id's.
drop table if exists test_abc
go
create table test_abc(
a_id int identity(1, 1) primary key not null,
ind_id int not null references test_indicators(ind_id),
company_id int not null,
val varchar(20) null);
go
insert into test_abc(ind_id, company_id)
select ind_id, 102 from test_indicators where indicator='NI'
union all
select ind_id, 103 from test_indicators where indicator='ROE'
union all
select ind_id, 104 from test_indicators where indicator='D/E'
union all
select ind_id, 103 from test_indicators where indicator='NI'
union all
select ind_id, 105 from test_indicators where indicator='ROE'
union all
select ind_id, 102 from test_indicators where indicator='NI';
Query to get result
select i.ind_id, a.company_id, i.indicator, a.val
from test_abc a
join test_indicators i on a.ind_id=i.ind_id;
Output
ind_id company_id indicator val
------ ---------- --------- ----
1      102        NI        NULL
2      103        ROE       NULL
3      104        D/E       NULL
1      103        NI        NULL
2      105        ROE       NULL
1      102        NI        NULL
I was finally able to find the solution for my problem which seems to me very simple, although it took time and asking different people about it.
First I create my indicators table where I assign primary key for all indicators I have:
CREATE TABLE indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then I populate the table without using any JOINs or CROSS APPLY. I don't know if this is optimal, but it seems to be the simplest choice:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT
(SELECT indicator_id from indicators i where i.indicator = 'NI_3y') as IndicatorID,
'NI_3y' as 'Indicator',
Company,
avg(ni) over (partition by Company order by reportdate rows between 2 preceding and current row) as ni_3y,
reportdate
FROM TABLE1

Join Sales table with Sales Region table based on Sales person and the date of the sale

I have a table of sales, but it does not include the region of the sale. I also have a table of the assignment of our sales people based on the region and dates they were assigned. I want to join the tables so I can grab the region and include it into my sales table.
I join on the sales person's initial (key), but I also want to compare the date of the sale to the region start and region stop to join the correct region. I tried using the sale date BETWEEN the start and stop, but that did not work because if they are still currently in the region, it provides a NULL value.
Thanks for any help, Brent
IF NOT EXISTS (
select * from sysobjects where name='sales' and xtype='U'
)CREATE TABLE sales (
[Sale_Date] DATETIME,
[Sales_Person] NVARCHAR(3),
[Sales_Amount] INT,
[Region] INT
);
INSERT INTO sales VALUES
('2016-07-01 00:00:00',N'MDD',152,NULL),
('2016-09-21 00:00:00',N'MDD',278,NULL),
('2018-03-01 00:00:00',N'STE',385,NULL),
('2018-04-01 00:00:00',N'MDD',426,NULL),
('2019-02-25 00:00:00',N'MDD',224,NULL),
('2020-02-15 00:00:00',N'STE',261,NULL),
('2020-03-01 00:00:00',N'STE',480,NULL),
('2020-06-05 00:00:00',N'BBB',245,NULL),
('2020-07-05 00:00:00',N'BBB',178,NULL);
IF NOT EXISTS (
select * from sysobjects where name='SalesPersonAssignment' and xtype='U'
) CREATE TABLE SalesPersonAssignment (
[sales_person] NVARCHAR(4),
[Region_ID] INT,
[Region_Name] NVARCHAR(6),
[Region_Start_Date] DATETIME,
[Region_Stop_Date] NVARCHAR(10)
);
INSERT INTO SalesPersonAssignment VALUES
(N'MDD',2,N'North ','2015-01-05 00:00:00',N'12/31/2017'),
(N'MDD',6,N'West','2018-01-01 00:00:00',N'NULL'),
(N'STE ',6,N'West','2018-10-02 00:00:00',N'12/31/2019'),
(N'STE',2,N'North ','2020-01-01 00:00:00',N'NULL'),
(N'BBB',1,N'South','2019-01-01 00:00:00',N'NULL');
Select s.Sale_Date, s.Sales_Amount, s.Sales_Person, spa.Region_Name
FROM sales s LEFT OUTER JOIN SalesPersonAssignment spa ON s.Sales_Person = spa.sales_person
--join based on the sales date and region's start/stop date of the sales person
You are storing a date as a string. And the time component is unnecessary. The better approach is:
CREATE TABLE SalesPersonAssignment (
[sales_person] NVARCHAR(4),
[Region_ID] INT,
[Region_Name] NVARCHAR(6),
[Region_Start_Date] DATE,
[Region_Stop_Date] DATE
);
INSERT INTO SalesPersonAssignment VALUES
(N'MDD',2,N'North ','2015-01-05','2017-12-31'),
(N'MDD',6,N'West','2018-01-01', NULL),
(N'STE ',6,N'West','2018-10-02', '2019-12-31'),
(N'STE',2,N'North ','2020-01-01', NULL),
(N'BBB',1,N'South','2019-01-01', NULL);
Just include those conditions in the ON clause:
SELECT s.Sale_Date, s.Sales_Amount, s.Sales_Person, spa.Region_Name
FROM sales s LEFT OUTER JOIN
SalesPersonAssignment spa
ON s.Sales_Person = spa.sales_person AND
s.Sale_Date >= spa.Region_Start_Date AND
(s.Sale_Date <= spa.Region_Stop_Date OR spa.Region_Stop_Date IS NULL);
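The open-ended range join can be tried out on a subset of the sample data. This is a minimal sketch in SQLite through Python's sqlite3 module, with two assumptions: dates are stored as ISO yyyy-mm-dd text (which compares correctly as strings), and the trailing spaces in the sample values are trimmed, because SQLite's default text comparison does not ignore them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (Sale_Date TEXT, Sales_Person TEXT, Sales_Amount INTEGER);
    INSERT INTO sales VALUES
        ('2016-07-01', 'MDD', 152),
        ('2018-03-01', 'STE', 385),
        ('2018-04-01', 'MDD', 426),
        ('2020-02-15', 'STE', 261),
        ('2020-06-05', 'BBB', 245);

    -- a NULL stop date means the assignment is still current
    CREATE TABLE SalesPersonAssignment (
        sales_person TEXT, Region_Name TEXT,
        Region_Start_Date TEXT, Region_Stop_Date TEXT
    );
    INSERT INTO SalesPersonAssignment VALUES
        ('MDD', 'North', '2015-01-05', '2017-12-31'),
        ('MDD', 'West',  '2018-01-01', NULL),
        ('STE', 'West',  '2018-10-02', '2019-12-31'),
        ('STE', 'North', '2020-01-01', NULL),
        ('BBB', 'South', '2019-01-01', NULL);
""")

regions = conn.execute("""
    SELECT s.Sale_Date, s.Sales_Person, spa.Region_Name
    FROM sales s
    LEFT JOIN SalesPersonAssignment spa
           ON s.Sales_Person = spa.sales_person
          AND s.Sale_Date >= spa.Region_Start_Date
          AND (s.Sale_Date <= spa.Region_Stop_Date OR spa.Region_Stop_Date IS NULL)
    ORDER BY s.Sale_Date
""").fetchall()

for row in regions:
    print(row)
```

The STE sale dated 2018-03-01 falls before that person's first assignment in the sample data, so the LEFT JOIN keeps the sale with no region rather than dropping it.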

Getting first one depending on the current row

There are three tables in an Oracle DB: equip_type, output_history, and time_history. How can I join the three tables as shown below?
(DBMS: Oracle)
EQUIP MODEL DATE1    QUANTITY DATE2    TIME EQUIP_TYPE
----- ----- -------- -------- -------- ---- ----------
e1    m1    20180103 10       20180101 6    A
e1    m1    20180106 20       20180105 5    A
Notice that as of DATE1 '20180103' in output_history, DATE2 '20180101' in time_history is the most recent one.
As of DATE1 '20180106' in output_history, DATE2 '20180105' in time_history is the most recent one.
--equip_type table and data
CREATE TABLE equip_type (
EQUIP_TYPE VARCHAR(60),
EQUIP VARCHAR(60)
);
INSERT INTO equip_type VALUES ('A','e1');
-- output_history and data
CREATE TABLE output_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data1 VARCHAR(60),
QUANTITY NUMBER(10)
);
INSERT INTO output_history VALUES ('e1','m1','20180103',10);
INSERT INTO output_history VALUES ('e1','m1','20180106',20);
--time_history table and data
CREATE TABLE time_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data2 VARCHAR(60),
time NUMBER(10)
);
INSERT INTO time_history VALUES ('e1','m1','20180101',6);
INSERT INTO time_history VALUES ('e1','m1','20180105',5);
You can use a correlated subquery with a NOT EXISTS condition to select the closest earlier record in time_history.
You did not tag the RDBMS you are using. I tested the query below on MySQL in this db fiddle, but it is standard SQL that should work on most RDBMS.
SELECT
o.equip,
o.model,
o.data1,
o.quantity,
t.data2,
t.time,
e.equip_type
FROM
output_history o
INNER JOIN equip_type e ON e.equip = o.equip
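INNER JOIN time_history t ON t.equip = o.equip AND t.model = o.model AND t.data2 <= o.data1
WHERE NOT EXISTS (
    -- no later time_history row for the same equip and model that is still on or before data1
    SELECT 1
    FROM time_history t2
    WHERE
        t2.equip = o.equip
        AND t2.model = o.model
        AND t2.data2 <= o.data1
        AND t2.data2 > t.data2
)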
Side note: the query always looks up the most recent time_history record on or before the current output_history record (even if there is a closer record in the future, it will not be selected).
Disclaimer: don't store dates as strings; this is a recipe for disaster. Use the relevant datatype for your RDBMS. In your use case, it works only because the dates are formatted in a way that sorts correctly as text.
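A minimal sketch of the NOT EXISTS approach on the question's sample data, run in SQLite through Python's sqlite3 module (SQLite is an assumption here; the question targets Oracle). The extra match on model is also an assumption about the intended join key, since the sample data contains only one model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE equip_type (equip_type TEXT, equip TEXT);
    INSERT INTO equip_type VALUES ('A', 'e1');

    CREATE TABLE output_history (equip TEXT, model TEXT, data1 TEXT, quantity INTEGER);
    INSERT INTO output_history VALUES ('e1', 'm1', '20180103', 10),
                                      ('e1', 'm1', '20180106', 20);

    CREATE TABLE time_history (equip TEXT, model TEXT, data2 TEXT, time INTEGER);
    INSERT INTO time_history VALUES ('e1', 'm1', '20180101', 6),
                                    ('e1', 'm1', '20180105', 5);
""")

joined = conn.execute("""
    SELECT o.equip, o.model, o.data1, o.quantity, t.data2, t.time, e.equip_type
    FROM output_history o
    INNER JOIN equip_type e ON e.equip = o.equip
    INNER JOIN time_history t
            ON t.equip = o.equip AND t.model = o.model AND t.data2 <= o.data1
    WHERE NOT EXISTS (
        -- reject t if a later row exists that is still on or before data1
        SELECT 1
        FROM time_history t2
        WHERE t2.equip = o.equip
          AND t2.model = o.model
          AND t2.data2 <= o.data1
          AND t2.data2 > t.data2
    )
    ORDER BY o.data1
""").fetchall()

for row in joined:
    print(row)
```

The yyyymmdd strings compare correctly only because of their fixed-width format, as the disclaimer above points out.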

How to distribute budget value to actual rows in Postgresql

Budget table contains jobs with loads:
create temp table budget (
job char(20) primary key,
load numeric(4,1) not null check (load>0 )
);
insert into budget values ( 'programmer', 3 );
insert into budget values ( 'analyst', 1.5 );
Actual table contains actual loads by employees:
create temp table actual (
job char(20),
employee char(20),
load numeric(4,1) not null check (load>0 ),
contractdate date,
primary key (job, employee)
);
insert into actual values ( 'programmer', 'John', 1, '2014-01-01' );
-- half time programmer:
insert into actual values ( 'programmer', 'Bill', 0.5, '2014-01-02' );
insert into actual values ( 'analyst', 'Aldo', 1, '2014-01-03' );
insert into actual values ( 'analyst', 'Margaret', 1, '2014-01-04' );
The result table should show the difference between budgeted and actual loads, with each job's budget load distributed to its employees in contract date order.
If the budget load is greater than the sum of the job's actual loads, a separate budget line with an empty employee should appear.
In the data above, 1.5 programmers are missing and there are 0.5 analysts too many.
Result should be
Job        Employee Budget Actual Difference
programmer John     1      1      0
programmer Bill     0.5    0.5    0
programmer          1.5    0      1.5
analyst    Aldo     1      1      0
analyst    Margaret 0.5    1      -0.5
How can such a table be created in modern PostgreSQL?
Can the rank function be used with a full join, or is there another approach?
I tried
select
coalesce(budget.job, actual.job ) as job,
employee,
budget.load as budget,
coalesce(actual.load,0) as actual,
coalesce(budget.load,0)-coalesce( actual.load,0) as difference
from budget full join actual using (job)
order by 1, contractdate
but this does not distribute budget load to employee rows.
I posted this also in pgsql-general mailing list.
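The following query gets close to what you want. It reports each job's full budget on every row, and the difference between that budget and the running total of actual load: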
select job, employee, budget, actual,
(budget - cumload) as diff, contractdate
from (select coalesce(b.job, a.job ) as job, a.contractdate,
a.employee,
b.load as budget,
coalesce(a.load,0) as actual,
sum(a.load) over (partition by a.job order by a.contractdate NULLS last) as cumload
from budget b join
(select a.*
from actual a
union all
select b.job, NULL, NULL, NULL
from budget b
) a
on b.job = a.job
) ab
where contractdate is not null or budget > cumload
order by job, contractdate
The SQL Fiddle is here.
Note that this uses union all to bring in the extra rows needed for the query. You wanted to do this with a full outer join, but that doesn't generate extra rows when the join conditions are met.
Also, the logic that you are looking for requires a cumulative sum, which Postgres happily provides.
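To reproduce the per-row Budget column from the question, each employee's share can be capped at whatever budget is left after the earlier contracts. The following is a minimal sketch of that idea, run in SQLite through Python's sqlite3 module rather than PostgreSQL (an assumption, along with a SQLite build that has window functions). The names used_before and budget_share are hypothetical. In PostgreSQL, LEAST and GREATEST would take the place of SQLite's two-argument MIN and MAX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget (job TEXT PRIMARY KEY, load REAL NOT NULL);
    INSERT INTO budget VALUES ('programmer', 3), ('analyst', 1.5);

    CREATE TABLE actual (job TEXT, employee TEXT, load REAL NOT NULL,
                         contractdate TEXT, PRIMARY KEY (job, employee));
    INSERT INTO actual VALUES
        ('programmer', 'John',     1,   '2014-01-01'),
        ('programmer', 'Bill',     0.5, '2014-01-02'),
        ('analyst',    'Aldo',     1,   '2014-01-03'),
        ('analyst',    'Margaret', 1,   '2014-01-04');
""")

rows = conn.execute("""
    WITH running AS (
        -- actual load already consumed by earlier contracts of the same job
        SELECT a.job, a.employee, a.contractdate,
               a.load AS actual_load,
               b.load AS job_budget,
               COALESCE(SUM(a.load) OVER (
                   PARTITION BY a.job ORDER BY a.contractdate
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS used_before
        FROM actual a
        JOIN budget b ON b.job = a.job
    ),
    allocated AS (
        -- each employee gets at most what is left of the job's budget
        SELECT job, employee, contractdate, actual_load,
               MIN(actual_load, MAX(job_budget - used_before, 0)) AS budget_share
        FROM running
    )
    SELECT job, employee, budget, actual, difference
    FROM (
        SELECT job, employee, contractdate,
               budget_share AS budget,
               actual_load AS actual,
               budget_share - actual_load AS difference
        FROM allocated
        UNION ALL
        -- one extra line per job whose budget exceeds its total actual load
        SELECT b.job, NULL, NULL,
               b.load - COALESCE(t.total, 0), 0, b.load - COALESCE(t.total, 0)
        FROM budget b
        LEFT JOIN (SELECT job, SUM(load) AS total FROM actual GROUP BY job) t
               ON t.job = b.job
        WHERE b.load > COALESCE(t.total, 0)
    ) AS combined
    ORDER BY job, contractdate IS NULL, contractdate
""").fetchall()

for row in rows:
    print(row)
```

Rows come back sorted by job name, so the analyst rows precede the programmer rows. Within each job the employees follow contract date order, and the unassigned budget line comes last.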