Since there are SUM(), MIN(), MAX(), AVG(), and COUNT() functions, can someone help me understand why there is no PRODUCT() built-in function? And what would be the most efficient user implementation of this aggregate function?
Thanks,
Trinity
If you have exponential and log functions available, then:
PRODUCT(TheColumn) = EXP(SUM(LN(TheColumn)))
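For example, a rough sketch in Oracle-style SQL (assuming a hypothetical table TheTable; LN() is only defined for positive values, so zeros and negative signs need separate handling):

SELECT
  CASE
    WHEN COUNT(CASE WHEN TheColumn = 0 THEN 1 END) > 0
      THEN 0                                             -- any zero factor makes the whole product zero
    ELSE EXP(SUM(LN(ABS(NULLIF(TheColumn, 0)))))         -- magnitude of the product (zeros excluded via NULLIF)
         * CASE WHEN MOD(COUNT(CASE WHEN TheColumn < 0 THEN 1 END), 2) = 1
                THEN -1 ELSE 1 END                       -- restore the sign from the count of negative factors
  END AS product_of_column
FROM TheTable;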
One can make a user-defined aggregate in SQL Server 2005 and up by using the CLR. In PostgreSQL you can do it in the database itself, and likewise with Oracle.
I'll focus on the question of why it's not a standard function.
Aggregate functions are basic statistical functions, and product is not one of them.
Applied to common numerical data, the result will in most cases be out of range (overflow), so it is of little general use.
It's probably left out because most people don't need it and it can be defined easily in most databases.
Solution for PostgreSQL:
CREATE OR REPLACE FUNCTION product_sfunc(state numeric, factor numeric)
RETURNS numeric AS $$
    SELECT $1 * $2
$$ LANGUAGE sql;

CREATE AGGREGATE product (
    sfunc = product_sfunc,
    basetype = numeric,
    stype = numeric,
    initcond = '1'
);
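Usage would then look like this (assuming a hypothetical table items with a numeric column price):

SELECT product(price) AS price_product
FROM items;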
You can simulate product() using cursors. If you let us know which database platform you're using, then we might be able to give you some sample code.
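For instance, a minimal cursor-based sketch for SQL Server (assuming a hypothetical table dbo.t with a numeric column x) might look like this:

DECLARE @product FLOAT, @x FLOAT;
SET @product = 1.0;

DECLARE c CURSOR FAST_FORWARD FOR
    SELECT x FROM dbo.t;              -- iterate over every value to be multiplied

OPEN c;
FETCH NEXT FROM c INTO @x;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @product = @product * @x;     -- accumulate the running product
    FETCH NEXT FROM c INTO @x;
END;
CLOSE c;
DEALLOCATE c;

SELECT @product AS product_of_x;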
I can confirm that it is indeed rare to need a product() aggregate function, but I have a quite valid example, especially when working with highly aggregated data that must be presented to users in a report.
It uses the exp(sum(ln( multiplyTheseColumnValues ))) "trick" mentioned in another post and in other internet sources.
The report (which should care about the display and contain as little data-calculation logic as possible, to provide better maintainability and flexibility) basically displays this data along with some graphics:
DESCR SUM
---------------------------------- ----------
money available in 2013 33233235.3
money spent in 2013 4253235.3
money bound to contracts in 2013 34333500
money spent 2013 in % of available 12
money bound 2013 in % of available 103
(In real life it's a bit more complex and is used in state budget scenarios.)
It aggregates some quite complex data found in the first 3 rows.
I do not want to calculate the percentage values of the following rows (4th and 5th) by either:
- doing it in the (intentionally) quite dumb report (which just takes any number of such rows with a description descr and a number sum) with some fancy logic (using JasperReports, BIRT Reports, or the like), or
- calculating the underlying data (money available, money spent, money bound) multiple times (since these are quite expensive operations) just to obtain the percentage values.
So I used another trick involving the product() functionality.
(If somebody knows a better way to achieve this given the above restrictions, I would be happy to hear it :-) )
The whole simplified example is available as one executable SQL statement below.
Maybe it could help convince some Oracle folks that this functionality is not as rarely needed, or as little worth providing, as it may seem at first thought.
with
-- we have some 10g database without pivot/unpivot functionality
-- what is interesting for various summary reports
sum_data_meta as (
select 'MA' as sum_id, 'money available in 2013' as descr, 1 as agg_lvl from dual
union all select 'MS', 'money spent in 2013', 1 from dual
union all select 'MB', 'money bound to contracts in 2013', 1 from dual
union all select 'MSP', 'money spent 2013 in % of available', 2 from dual
union all select 'MBP', 'money bound 2013 in % of available', 2 from dual
)
/* select * from sum_data_meta
SUM_ID DESCR AGG_LVL
------ ---------------------------------- -------
MA money available in 2013 1
MS money spent in 2013 1
MB money bound to contracts in 2013 1
MSP money spent 2013 in % of available 2
MBP money bound 2013 in % of available 2
*/
-- 1st level of aggregation with the base data (the data actually comes from complex (sub)SQLs)
,sum_data_lvl1_base as (
select 'MA' as sum_id, 33233235.3 as sum from dual
union all select 'MS', 4253235.3 from dual
union all select 'MB', 34333500 from dual
)
/* select * from sum_data_lvl1_base
SUM_ID SUM
------ ----------
MA 33233235.3
MS 4253235.3
MB 34333500.0
*/
-- 1st level of aggregation with enhanced meta data infos
,sum_data_lvl1 as (
select
m.descr,
b.sum,
m.agg_lvl,
m.sum_id
from sum_data_meta m
left outer join sum_data_lvl1_base b on (b.sum_id=m.sum_id)
)
/* select * from sum_data_lvl1
DESCR SUM AGG_LVL SUM_ID
---------------------------------- ---------- ------- ------
money available in 2013 33233235.3 1 MA
money spent in 2013 4253235.3 1 MS
money bound to contracts in 2013 34333500.0 1 MB
money spent 2013 in % of available - 2 MSP
money bound 2013 in % of available - 2 MBP
*/
select
descr,
case
when agg_lvl < 2 then sum
when agg_lvl = 2 then -- our level where we have to calculate some things based on the previous level calculations < 2
case
when sum_id = 'MSP' then
-- we want to calculate MS/MA by tricky aggregating the product of
-- (MA row:) 1/33233235.3 * (MS:) 4253235.3/1 * (MB:) 1/1 * (MSP:) 1/1 * (MBP:) 1/1
trunc( -- cut off fractions, e.g. 12.7981 => 12
exp(sum(ln( -- trick simulating product(...) as mentioned here: http://stackoverflow.com/a/404761/1915920
case when sum_id = 'MS' then sum else 1 end
/ case when sum_id = 'MA' then sum else 1 end
)) over ()) -- "over()" => look at all resulting rows like an aggregate function
* 100 -- % display style
)
when sum_id = 'MBP' then
-- we want to calculate MB/MA by tricky aggregating the product as shown above with MSP
trunc(
exp(sum(ln(
case when sum_id = 'MB' then sum else 1 end
/ case when sum_id = 'MA' then sum else 1 end
)) over ())
* 100
)
else -1 -- indicates problem
end
else null -- will be calculated in a further step later on
end as sum,
agg_lvl,
sum_id
from sum_data_lvl1
/*
DESCR SUM AGG_LVL SUM_ID
---------------------------------- ---------- ------- ------
money available in 2013 33233235.3 1 MA
money spent in 2013 4253235.3 1 MS
money bound to contracts in 2013 34333500 1 MB
money spent 2013 in % of available 12 2 MSP
money bound 2013 in % of available 103 2 MBP
*/
Since a product is nothing but repeated addition, SQL did not introduce a PRODUCT aggregate function.
For example, 6 * 4 can be achieved either by adding 6 to itself 4 times (6+6+6+6) or by adding 4 to itself 6 times (4+4+4+4+4+4), giving the same result.
Related
SQL newbie here, using Zoho Analytics to do some reporting, specifically prorated forecasting of lead generation. I successfully created some tables that contain lead goals and joined them onto matching leads based on the current month. The problem I am having is that I would like to be able to access my prorated goals even if I filter such that no leads have been created yet. This will make more sense with the pictures I attached, which show an RPM gauge that cannot pull the target or maximum because no leads match the filter criteria. How do I join the tables (maybe with an ifnull statement?) so that even if no lead IDs match, I can still output my goals? Thanks so much in advance.
(Screenshots: RPM gauge with prorated target and monthly goal; RPM gauge settings with a distinct count of lead IDs; base table with the goal used in the query table; the query table itself.)
Sorry for what I am sure is a fundamental misunderstanding of how this works, I have had to teach myself everything I know about SQL, and I am apparently not a terribly great teacher.
Thanks!
I have tried using a right join, and an ifnull statement but it did not improve matters.
Edit: Sorry for the first-post issues; here are the code and tables, not in image form.
Lead Table Example-
ID     Lead Created Time  Lead Type
-----  -----------------  ---------
12345  11/21/2022         Charge
12346  10/17/2020         Store
12347  08/22/2022         Enhance
I purposefully left out an entry that would match my filter criteria, as for the first few days of the month this often comes up. Ideally I would still like to get the prorated and total goals returned.
The table the query is pulling from to determine the prorated numbers-
Start Date   End Date     Prorating decimal  Charge  Enhance  Store  Service  Charge[PR]  Enhance[PR]  Store[PR]  Service[PR]  Total Leads  Total Leads[PR]
-----------  -----------  -----------------  ------  -------  -----  -------  ----------  -----------  ---------  -----------  -----------  ---------------
Jan 01 2022  Jan 31 2022  .1                 15      12       15     20       1.5         1.2          1.5        2.0          62           6.2
Feb 01 2022  Feb 28 2022  .1                 15      12       15     20       1.5         1.2          1.5        2.0          62           6.2
Mar 01 2022  Mar 31 2022  .1                 15      12       15     20       1.5         1.2          1.5        2.0          62           6.2
^For simplicity's sake I did not change the goals month to month, but they would in reality.
Idea for a successful data table, [PR] meaning prorated-
Sum of Lead Id's  Storage Goal  Storage Goal[PR]  Charge Goal  Charge Goal[PR]
----------------  ------------  ----------------  -----------  ---------------
14                10            1                 15           2
1                 10            1                 15           2
0                 10            1                 15           2
The SQL query that I have returns the blank gauge when no leads match my criteria (created this month and lead type = Store):
SELECT
"Leads"."Id",
"SSS - 2022 Leads Forecast [Job Type]".*
FROM "Leads"
RIGHT JOIN "SSS - 2022 Leads Forecast [Job Type]" ON ((GETDATE() >= "Start Date")
AND (GETDATE() <= "End Date"))
Thanks so much to everyone who helped me reformat, first time poster so still learning the ropes. Let me know if I can provide more context or better info.
Figured this out! I used subqueries, filtering manually in the query instead of through the analytics widget, and did a distinct count to return zero instead of null, as well as coalescing for the dollar amount to return zero. (Not applicable in the below example) Below I have an example of some of the queries I used, as well as the resulting data table that is giving me the result that I want.
SELECT
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Charge'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test1'
) AS 'Charge Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Store'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test2'
) AS 'Store Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Enhance'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test3'
) AS 'Enhance Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Service'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test4'
) AS 'Service Leads',
"SSS - 2022 Leads Forecast [Job Type]".*
FROM "SSS - 2022 Leads Forecast [Job Type]"
WHERE ((GETDATE() >= "Start Date")
AND (GETDATE() <= "End Date"))
I am 100% sure that there is a more efficient way to do this, but it works and that was the most pressing thing.
Here is the resulting data table, which is exactly what I needed-
Charge Leads:         7
Store Leads:          0
Enhance Leads:        5
Service Leads:        35
Start Date:           01 Dec 2022
End Date:             31 Dec 2022
[PR] Charge:          64
[PR] Enhance:         34
[PR] Store:           17
[PR] Service:         56
[PR] Total Leads:     171
[Total] Charge:       152
[Total] Enhance:      81
[Total] Store:        40
[Total] Service:      134
[Total] Total Leads:  407
Prorating Decimal:    .419
The [PR] are the prorated goals, so where we should be at this point in the month, and [Total] is the total goal for the month.
I'm trying to write a query in SSRS (using SQL) to calculate an income statement percentage of sales for each month (the year is a parameter chosen by the user at runtime). However, the table I have to use for the data lists all of the years, months, accounts, dollars, etc together and looks like this:
ACCT_YEAR  ACCT_PERIOD  ACCOUNT_ID  CREDIT_AMOUNT
---------  -----------  ----------  -------------
2021       1            4000        20000
2021       2            4000        25000
2021       1            5000        5000
2021       2            5000        7500
2021       1            6000        4000
2021       2            6000        8000
etc., etc. (ACCOUNT_ID = 4000 happens to be the sales account)
As an example, I need to calculate:

(CREDIT_AMOUNT when ACCT_YEAR = 2021, ACCT_PERIOD = 1, and ACCOUNT_ID = 5000)
/ (CREDIT_AMOUNT when ACCT_YEAR = 2021, ACCT_PERIOD = 1, and ACCOUNT_ID = 4000)
* 100

I would then do that for each ACCT_PERIOD in the ACCT_YEAR.
Hope that makes sense...What I want would look like this:
ACCT_YEAR  ACCT_PERIOD  ACCOUNT_ID  PERCENTAGE
---------  -----------  ----------  ----------
2021       1            5000        0.25
2021       2            5000        0.30
2021       1            6000        0.20
2021       2            6000        0.32
I'm trying to create a graph that shows the percentage of sales of roughly 10 different accounts (I know their specific account_ID's and will filter by those ID's) and use the line chart widget to show the trends by month.
I've tried CASE scenarios, OVER scenarios, and nested subqueries. I feel like this should be simple but I'm being hardheaded and not seeing the obvious solution.
Any suggestions?
Thank you!
One important behaviour to note is that window functions are applied after the where clause.
Because you need the window functions to be applied before any where clause (which would filter account 4000 out), they need to be used in one scope, and the where clause in another scope.
WITH
perc AS
(
SELECT
*,
credit_amount * 100.0
/
SUM(
CASE WHEN account_id = 4000 THEN credit_amount END
)
OVER (
PARTITION BY acct_year, acct_period
)
AS credit_percentage
FROM
your_table
)
SELECT
*
FROM
perc
WHERE
account_id IN (5000,6000)
You just need to use a matrix with a parent column group for ACCT_YEAR and a child column group for ACCT_PERIOD. Then you can use your calculation. If you format the textbox as a percentage, you won't need to multiply it by 100.
Textbox value: =IIF(IIF(ACCOUNT_ID=4000, Sum(CREDIT_AMOUNT), 0) = 0, 0, IIF(ACCOUNT_ID=5000, Sum(CREDIT_AMOUNT), 0) / IIF(ACCOUNT_ID=4000, Sum(CREDIT_AMOUNT), 0))
If the customer transaction type is not mentioned in the database:
- If the model year is less than or equal to 2 years old, the transaction type should be updated to Warranty.
- Of the remaining customer data, 60% should be updated to Customer Pay and 40% should be updated to Warranty, randomly per dealer.
I have a model year table with this structure:
SlNo VehicleNo ModelYear
---- --------- ---------
1 AAAD1234 2012
2 VVV023333 2008
3 CRT456 2011
4 MTER6666 2010
Is it possible to achieve this using SSIS?
I have tried a query. Please help me fix it:
select
vehicleNo, Modelyear,
case
when DATEDIFF(year, ModelYear, GETDATE()) <= 2 then 'Warranty' END,
case
when COUNT(modelyear) * 100 / (select COUNT(*) from VehicleModel) > 2 then '100%' end,
case
when COUNT(modelyear) * 40 / (select COUNT(*) from VehicleModel) > 2 then '40%' end
from
vehiclemodel
group by
vehicleNo, Modelyear
Output
vehicleNo Modelyear (No column name) (No column name) (No column name)
--------- --------- ---------------- ---------------- ----------------
AAAD1234 2008 NULL 100% 40%
VVV023333 2010 Warranty 100% 40%
CRT456 2011 Warranty 100% 40%
MTER6666 2012 Warranty 100% 40%
What exactly are you trying to do with SSIS? Where are you moving data from, and where are you inserting it?
If you only need to run this query, you don't need SSIS; you can do this logic in SQL.
If you need to insert this into another table or database, I would also do the calculation in SQL (as you just did) and use it as the source query of an OLE DB Source component, then insert it into your destination.
I think you have to provide more information so we can help you.
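For illustration, a rough T-SQL sketch of that update logic (assuming a hypothetical TransactionType column on VehicleModel, and ignoring the per-dealer grouping, which would need a dealer column added):

UPDATE v
SET TransactionType =
    CASE
        WHEN YEAR(GETDATE()) - v.ModelYear <= 2 THEN 'Warranty'     -- vehicles up to 2 years old
        WHEN ABS(CHECKSUM(NEWID())) % 100 < 40  THEN 'Warranty'     -- roughly 40% of the rest, chosen at random
        ELSE 'Customer Pay'                                         -- the remaining ~60%
    END
FROM VehicleModel v
WHERE v.TransactionType IS NULL;  -- only rows where the transaction type is not mentioned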
I have a table with the following data:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
So this shows the stock against some of the hours in which there is a change in the quantity.
Now my requirement is to create a view on this table which will virtually show the data even if stock is not there for a particular hour. So the data that should be shown is:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
----------------------------------
That means even if the data is not there for a particular hour, we should show the last hour's stock that has a value. And I have another table with all the available hours from 1-23 in a column.
I have tried the PARTITION BY / OVER approach as given below, but I think I am missing something to get my requirement done.
SELECT
HOUR_NUMBER,
CASE WHEN TOTAL_STOCK IS NULL
THEN SUM(TOTAL_STOCK)
OVER (
PARTITION BY LOCATION
ORDER BY CURRENT_HOUR ROWS 1 PRECEDING
)
ELSE
TOTAL_STOCK
END AS FULL_STOCK
FROM
(
SELECT HOUR_NUMBER AS HOUR_NUMBER
FROM HOURS_TABLE -- REFEERENCE TABLE WITH HOURS FROM 1-23
GROUP BY 1
) HOURS_REF
LEFT OUTER JOIN
(
SEL CURRENT_HOUR AS CURRENT_HOUR
, STOCK AS TOTAL_STOCK
,LOCATION AS LOCATION
FROM STOCK_TABLE
WHERE STOCK<>0
) STOCKS
ON HOURS_REF.HOUR_NUMBER = STOCKS.CURRENT_HOUR
This query is giving all the hours, with stock as null for the hours without data.
We are looking for an ANSI SQL solution so that it can be used on databases like Teradata.
I am thinking that I am using PARTITION BY / OVER wrongly, or is there another way? We tried CASE WHEN, but that needs some kind of looping to check back for an hour with some stock.
I've run into similar problems before. It's often simpler to make sure that the data you need somehow gets into the database in the first place. You might be able to automate it with a stored procedure that runs periodically.
Having said that, did you consider trying COALESCE() with a scalar subquery? (Or whatever similar function your dbms supports.) I'd try it myself and post the SQL, but I'm leaving for work in two minutes.
Haven't tried, but along the lines of what Mike said:
SELECT a.hour
, COALESCE( a.stock
, ( select b.stock
from tbl b
where b.hour=a.hour-1 )
) "stock"
FROM tbl a
Note: this will impact performance greatly.
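A variation of that idea which looks back further than one hour: the scalar subquery can fetch the latest earlier hour that actually has stock (a sketch against the question's HOURS_TABLE and STOCK_TABLE, ignoring LOCATION for brevity; it would have to be added to both correlations):

SELECT
    h.HOUR_NUMBER,
    ( SELECT s.STOCK
      FROM STOCK_TABLE s
      WHERE s.STOCK <> 0
        AND s.CURRENT_HOUR =
            ( SELECT MAX(s2.CURRENT_HOUR)
              FROM STOCK_TABLE s2
              WHERE s2.STOCK <> 0
                AND s2.CURRENT_HOUR <= h.HOUR_NUMBER )
    ) AS FULL_STOCK
FROM HOURS_TABLE h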
Thanks for your responses. I have tried a RECURSIVE VIEW for the above requirement and it is giving correct results (though I fear the CPU usage on big tables, as it is recursive). So here is the stock table:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
9 2000 24
----------------------------------
Then we will have a view on this table which will give all 12 hours' data using a left outer join:
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 NULL
8 2000 NULL
9 2000 24
----------------------------------
Then we will have a recursive view which joins the table recursively with the same view, so that each hour's stock is shifted one hour up, with a level counter that is incremented as the data is carried forward.
REPLACE RECURSIVE VIEW HOURLY_STOCK_VIEW
(HOUR_NUMBER,LOCATION, STOCK, LVL)
AS
(
SELECT
HOUR_NUMBER,
LOCATION,
STOCK,
1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN
UNION ALL
SELECT
STK.HOUR_NUMBER,
THE_VIEW.LOCATION,
THE_VIEW.STOCK,
LVL+1 AS LVL
FROM STOCK_VIEW_WITH_LEFT_OUTER_JOIN STK
JOIN
HOURLY_STOCK_VIEW THE_VIEW
ON THE_VIEW.HOUR_NUMBER = STK.HOUR_NUMBER -1
WHERE LVL <=12
)
;
You can observe that we first select from the left-outer-join view, and then union it with that same view joined onto the view we are creating, giving each row the level at which its data was produced.
Then we select the data from this view with the minimum level:
SEL * FROM HOURLY_STOCK_VIEW
WHERE
(
HOUR_NUMBER,
LVL
)
IN
(
SEL
HOUR_NUMBER,
MIN(LVL)
FROM HOURLY_STOCK_VIEW
WHERE STOCK IS NOT NULL
GROUP BY 1
)
;
This is working fine and giving the result as
----------------------------------
Hour Location Stock
----------------------------------
6 2000 20
7 2000 20 -- same as hour 6 stock
8 2000 20 -- same as hour 6 stock
9 2000 24
10 2000 24
11 2000 24
12 2000 24
----------------------------------
I know this is going to take huge CPU on large tables to make the recursion work (we are limiting the recursion to only 12 levels, as only 12 hours' data is needed, to stop it from going into an infinite loop). But I thought somebody could use this for some kind of hierarchy building. I will look for some more responses from you on any other approaches available. Thanks. You can have a look at recursive views for Teradata at the link below:
http://forums.teradata.com/forum/database/recursion-in-a-stored-procedure
The most common use of a view is the removal of complexity.
For example:
CREATE VIEW FEESTUDENT
AS
SELECT S.NAME,F.AMOUNT FROM STUDENT AS S
INNER JOIN FEEPAID AS F ON S.TKNO=F.TKNO
Now do a SELECT:
SELECT * FROM FEESTUDENT
If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, perhaps using some sort of CROSS JOIN?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
SchedID  Payment Num  Due Date     Running Expected Total
-------  -----------  -----------  ----------------------
1        1            05-Jan-2003  1000.00
1        2            05-Jan-2004  2000.00
1        3            05-Jan-2005  3000.00
1        4            05-Jan-2006  4000.00
1        5            05-Jan-2007  5000.00
2        1            20-Dec-2008  25.00
2        2            20-Jun-2009  50.00
2        3            20-Dec-2009  75.00
2        4            20-Jun-2010  100.00
2        5            20-Dec-2010  125.00
2        6            20-Jun-2011  150.00
2        7            20-Dec-2011  175.00
I'm using MS SQL Server 2005 (no hope for an upgrade soon) and I can already do this using a table variable and a while loop, but it seemed like some sort of CROSS JOIN would apply; I just don't know how that might work.
Your thoughts are appreciated.
EDIT: I'm actually using SQL Server 2005 though I initially said 2000. We aren't quite as backwards as I thought. Sorry.
I cannot test the code right now, so take it with a pinch of salt, but I think that something looking more or less like the following should answer the question:
with q(SchedId, PaymentNum, DueDate, RunningExpectedTotal) as
(select SchedId,
1 as PaymentNum,
StartDate as DueDate,
PaymentAmt as RunningExpectedTotal
from PaymentScheduleTable
union all
select q.SchedId,
1 + q.PaymentNum as PaymentNum,
DATEADD(month, s.Frequency, q.DueDate) as DueDate,
q.RunningExpectedTotal + s.PaymentAmt as RunningExpectedTotal
from q
inner join PaymentScheduleTable s
on s.SchedId = q.SchedId
where q.PaymentNum <= s.Term / s.Frequency)
select *
from q
order by SchedId, PaymentNum
Try using a table of integers (or better, this: http://www.sql-server-helper.com/functions/integer-table.aspx) and a little date math, e.g. start + int * freq.
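A sketch of that approach (assuming a numbers table dbo.Numbers with a column n holding 0, 1, 2, ..., and the PaymentScheduleTable name used above; whether the final period is included depends on how the end of the term should be treated):

SELECT
    s.SchedID,
    n.n + 1                                        AS PaymentNum,
    DATEADD(month, n.n * s.Frequency, s.StartDate) AS DueDate,
    (n.n + 1) * s.PaymentAmt                       AS RunningExpectedTotal  -- works because the amount is constant per schedule
FROM PaymentScheduleTable s
JOIN dbo.Numbers n
    ON n.n < s.Term / s.Frequency                  -- one row per scheduled payment
ORDER BY s.SchedID, PaymentNum;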
I've used table-valued functions to achieve a similar result. Basically the same as using a table variable I know, but I remember being really pleased with the design.
The usage ends up reading very well, in my opinion:
/* assumes @startdate and @enddate schedule limits */
SELECT
    p.paymentid,
    ps.paymentnum,
    ps.duedate,
    ps.ret
FROM
    payment p
    CROSS APPLY dbo.FUNC_get_payment_schedule(p.paymentid, @startdate, @enddate) ps
ORDER BY p.paymentid, ps.paymentnum
A typical solution is to use a Calendar table. You can expand it to fit your own needs, but it would look something like:
CREATE TABLE Calendar
(
calendar_date DATETIME NOT NULL,
is_holiday BIT NOT NULL DEFAULT(0),
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (calendar_date)
)
In addition to is_holiday, you can add other columns that are relevant for you. You can write a script to populate the table up through the next 10 or 100 or 1000 years and you should be all set. It makes queries like the one you're trying to do much simpler and can give you additional functionality.
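For example, a population sketch for SQL Server (assuming the Calendar table above; the date range is arbitrary, and OPTION (MAXRECURSION 0) lifts the default recursion limit of 100):

WITH dates(d) AS
(
    SELECT CAST('20000101' AS DATETIME)   -- start of the range
    UNION ALL
    SELECT DATEADD(day, 1, d)
    FROM dates
    WHERE d < '20301231'                  -- end of the range
)
INSERT INTO Calendar (calendar_date)
SELECT d FROM dates
OPTION (MAXRECURSION 0);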