Issue with Joining Tables in SQL - sql

SQL newbie here, using Zoho Analytics to do some reporting, specifically with prorated forecasting of lead generation. I successfully created some tables that contain lead goals, and joined them onto matching leads based off of the current month. The problem I am having is I would like to be able to access my prorated goals even if I filter so that there are no leads that have been created yet. This will make more sense in the picture I attached, with an RPM gauge that cannot pull the target or maximum because no leads match the filter criteria. How do I join the tables (with maybe an ifnull statement?) so that even if no lead ID's match, I can still output my goals? Thanks so much in advance.
RPM Gauge With prorated target and monthly goal
RPM gauge settings, distinct count of Lead Id's
Base table with goal used in Query table
Query table, forgive me I am new
Sorry for what I am sure is a fundamental misunderstanding of how this works, I have had to teach myself everything I know about SQL, and I am apparently not a terribly great teacher.
Thanks!
I have tried using a right join, and an ifnull statement but it did not improve matters.
Edit- Sorry for the first post issues- here is the code and tables not in image form
Lead Table Example-
ID
Lead Created Time
Lead Type
12345
11/21/2022
Charge
12346
10/17/2020
Store
12347
08/22/2022
Enhance
I purposefully left out an entry that would match my filter criteria, as for the first few days of the month this often comes up. Ideally I would still like to get the prorated and total goals returned.
The table the query is pulling from to determine the prorated numbers-
Start Date
End Date
Prorating decimal
Charge
Enhance
Store
Service
Charge[PR]
Enhance[PR]
Store[PR]
Service[PR]
Total Leads
Total Leads[PR]
Jan 01 2022
Jan 31 2022
.1
15
12
15
20
1.5
1.2
1.5
2.0
62
6.2
Feb 01 2022
Feb 28 2022
.1
15
12
15
20
1.5
1.2
1.5
2.0
62
6.2
Mar 01 2022
Mar 31 2022
.1
15
12
15
20
1.5
1.2
1.5
2.0
62
6.2
^For simplicity's sake I did not change the goals month to month, but they would in reality.
Idea for a successful data table, [PR] meaning prorated-
Sum of Lead Id's
Storage Goal
Storage Goal[PR]
Charge Goal
Charge Goal [PR]
14
10
1
15
2
1
10
1
15
2
0
10
1
15
2
The SQL Query that I have that returns the blank gauge when no leads match my criteria(Created this month, and lead type=Store)
SELECT
"Leads"."Id",
"SSS - 2022 Leads Forecast [Job Type]".*
FROM "Leads"
RIGHT JOIN "SSS - 2022 Leads Forecast [Job Type]" ON ((GETDATE() >= "Start Date")
AND (GETDATE() <= "End Date"))
Thanks so much to everyone who helped me reformat, first time poster so still learning the ropes. Let me know if I can provide more context or better info.

Figured this out! I used subqueries, filtering manually in the query instead of through the analytics widget, and did a distinct count to return zero instead of null, as well as coalescing for the dollar amount to return zero. (Not applicable in the below example) Below I have an example of some of the queries I used, as well as the resulting data table that is giving me the result that I want.
SELECT
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Charge'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test1'
) AS 'Charge Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Store'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test2'
) AS 'Store Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Enhance'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test3'
) AS 'Enhance Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Service'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test4'
) AS 'Service Leads',
"SSS - 2022 Leads Forecast [Job Type]".*
FROM "SSS - 2022 Leads Forecast [Job Type]"
WHERE ((GETDATE() >= "Start Date")
AND (GETDATE() <= "End Date"))
I am 100% sure that there is a more efficient way to do this, but it works and that was the most pressing thing.
Here is the resulting data table, which is exactly what I needed-
Charge Leads
Store Leads
Enhance Leads
Service Leads
Start Date
End Date
[PR] Charge
[PR] Enhance
[PR] Store
[PR] Service
[PR] Total Leads
[Total] Charge
[Total] Enhance
[Total] Store
[Total] Service
[Total] Total Leads
Prorating Decimal
7
0
5
35
01 Dec 2022
31 Dec 2022
64
34
17
56
171
152
81
40
134
407
.419
The [PR] are the prorated goals, so where we should be at this point in the month, and [Total] is the total goal for the month.

Related

SQL construct for particular output. Can one query give me this

Afternoon. I am using SQLServer 2008r2, I have this SQL:
SELECT dateName(mm,wfi.created) AS theMonth
, datePart(yyyy,wfi.created) AS theYear
, count(wf.WebFormsIndexID) AS numOfForms
FROM WebFormsInstances as wfi
LEFT OUTER JOIN WebFormsIndex as wf on wfi.webFormsIndexID = wf.WebFormsIndexID
where year(wfi.created) = year(getDate())
group by datePart(yyyy,wfi.created), datePart(mm,wfi.created), dateName(mm,wfi.created)
order by theYear,datePart(mm,wfi.created)
which gives me total number of all forms submitted by the month:
January 2015 799
February 2015 1282
March 2015 1450
...
There are around 50 different forms. The form name is wf.formName How can I restructure this so I can get total numbers for each individual form for each month. Something like:
myFormName1 January 2015 220
myFormName2 January 2015 179
I can figure out how to do this in two queries but would ideally like to do it in one? The objective is a report, form names down the page, months across the page with total number of forms in play for each month.
SELECT
wf.formName
dateName(mm,wfi.created) AS theMonth,
datePart(yyyy,wfi.created) AS theYear,
count(wf.WebFormsIndexID) AS numOfForms
FROM WebFormsInstances as wfi
LEFT OUTER JOIN WebFormsIndex as wf on wfi.webFormsIndexID = wf.WebFormsIndexID
where year(wfi.created) = year(getDate())
group by wf.formName, datePart(yyyy,wfi.created), datePart(mm,wfi.created)
order by wf.formName, datePart(yyyy,wfi.created), datePart(mm,wfi.created)

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns dates, account, amount, etc. eg.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount of each account each month. Since there may or may not be record for any account on a single day, and I have a seperate table of holidays from 2011~2014, I am summing up the amount of each account within a month and dividing it by the number of business days of that month. Notice that there is very likely to be record(s) on weekends/holidays, so I need to exclude them from calculation. Also, I want to have a record for each of the date available in the original table. eg.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
<!-- language: lang-sql -->
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another. Find out more.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that Day Number varies depending according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7 but SQL Fiddle is American so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
You have 40 month of data, this data should be very stable.
I will assume that you have a cold body (big and stable easily definable range of data) and hot tail (small and active part).
Next, I would like to define a minimal period. It is a data range that is a smallest interval interesting for Business.
It might be year, month, day, hour, etc. Do you expect to get questions like "what was averege for that account between 1900 and 12am yesterday?".
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and count() for every account for every DAY of cold body.
I will not create a dummy records, if particular account had no activity on some day.
and I will save day, account, total amount, count in a TABLE.
if there are modifications later to the cold body, you delete and reload affected day from that table.
For hot tail there might be multiple strategies:
Do the same as above (same process, clear to support)
always calculate on a fly
use materialized view as an averege between 1 and 2.
Cold body table totalc could also be implemented as materialized view, but if data never change - no need to rebuild it.
With this you go from (number of account) x (number of transactions per day) x (number of days) to (number of account)x(number of active days) number of records.
That should speed up all following calculations.

Access 2010: JOIN tables and return the most recent record

I'm trying to work out how to get the right results from the following.
I am trying to create an Access query that takes the relevant staff rate to calculate a cost for an employee from timesheet data. The following is an example of the time data:
ID EmpNo Period_Month Period_Year CostCode Workstage Line_Hours
14 11486 3 2014 C10798 000 20.00
15 11486 3 2014 C10657 000 21.50
16 11486 3 2014 C11112 000 10.00
For this employee, there may be rates set during different periods as so:
EmpNo Period_Month Period_Year Rate
11486 1 2014 10.00
11486 3 2014 12.00
11486 6 2014 15.00
I want to know how I can join the two tables to calculate a cost (hours * rate) and pick out only correct rate. A rate takes affect from the period it is stamped with and then-on until a new rate is entered. Normally in SQl I'd do this by taking the top item of an embedded select in the join but I can't seem to do the same in Access. I've also read that I could do a join on the rate table twice to pick up the item in the staff rate table that I require, but can't seem to apply the same logic to this.
UPDATE
As requested, the following is as far as i've got with the query. It gets me as far as all the rates for the current and previous periods, but I can't find a way to take the top one.
SELECT t.EmpNo, t.CostCode, t.Workstage, t.TimeCode_Desc, t.Line_Hours, t.Period_Month, t.Period_Year, srA.Rate
FROM (tblTime AS t LEFT JOIN qryTotalHours AS hrs ON (t.Period_Year = hrs.Period_Year) AND (t.Period_Month = hrs.Period_Month) AND (t.EmpNo = hrs.EmpNo)) LEFT JOIN tblStaffRates AS srA ON t.EmpNo = srA.EmpNo
WHERE (((t.Period_Month)>=[srA].[Period_Month]));

Oracle SQL Month Statement Generation

I am having performance issue on a set of SQLs to generate current month's statement in realtime.
Customers will purchase some goods using points from an online system, and the statement containing "open_balance", "point_earned", "point_used", "current_balance" should be generated.
The following shows the shortened schema :
//~200k records
customer: {account_id:string, create_date:timestamp, bill_day:int} //totally 14 fields
//~250k records per month, kept for 6 month
history_point: {point_id:long, account_id:string, point_date:timestamp, point:int} //totally 9 fields
//each customer have maximum of 12 past statements kept
history_statement: {account_id:string, open_date:date, close_date:date, open_balance:int, point_earned:int, point_used:int, close_balance:int} //totally 9 fields
On every bill day, the view should automatically create a new month statement.
i.e. If bill_day is 15, then transaction done on or after 16 Dec 2013 00:00:00 should belongs to new bill cycle of 16 Dec 2013 00:00:00 - 15 Jan 2014 23:59:59
I tried the approach described below,
Calculate the last close day for each account (in materialized view, so that it update only after there is new customer or past month statement inserted into history_statement)
Generate a record for each customer each month that I need to calculate (Also in materialized view)
Sieve the point record for only point records within the date that I will calculate (This takes ~0.1s only)
Join 2 with 3 to obtain point earned and used for each customer each month
Join 4 with 4 on date less than open date to sum for open and close balance
6a. Select from 5 where open date is less than 1 month old as current balance (these are not closed yet, and the point reflect the point each customer own now)
6b. All the statements are obtained by union of history_statement and 5
On a development server, the average response time (200K customer, 1.5M transactions in current month) is ~3s which is pretty slow for web application, and on the testing server, where resources are likely to be shared, the average response time (200K customer, ~200k transaction each month for 8 months) is 10-15s.
Does anyone have some idea on writing a query with better approach or to speed up the query?
Related SQL:
2: IV_STCLOSE_2_1_T(Materialized view)
3: IV_STCLOSE_2_2_T (~0.15s)
SELECT ACCOUNT_ID, POINT_DATE, POINT
FROM history_point
WHERE point_date >= (
SELECT MIN(open_date)
FROM IV_STCLOSE_2_1_t
)
4: IV_STCLOSE_3_T (~1.5s)
SELECT p0.account_id, p0.open_date, p0.close_date, COALESCE(SUM(DECODE(SIGN(p.point),-1,p.point)),0) AS point_used, COALESCE(SUM(DECODE(SIGN(p.point),1,p.point)),0) AS point_earned
FROM iv_stclose_2_1_t p0
LEFT JOIN iv_stclose_2_2_t p
ON p.account_id = p0.account_id
AND p.point_date >= p0.open_date
AND p.point_date < p0.close_date + INTERVAL '1' DAY
GROUP BY p0.account_id, p0.open_date, p0.close_date
5: IV_STCLOSE_4_T (~3s)
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
SELECT t1.account_id AS STAT_ACCOUNT_ID, t1.open_date, t1.close_date, t1.open_balance, t1.point_earned AS point_earn, t1.point_used , t1.open_balance + t1.point_earned + t1.point_used AS close_balance
FROM (
SELECT v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used, COALESCE(sum(v2.point_used + v2.point_earned),0) AS OPEN_BALANCE
FROM t v1
LEFT JOIN t v2
ON v1.account_id = v2.account_id
AND v1.OPEN_DATE > v2.OPEN_DATE
GROUP BY v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used
) t1
It turns out to be that in IV_STCLOSE_4_T
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
is problematic.
At first thought WITH t AS would be faster as IV_STCLOSE_3_T is only evaluated once, but it apparently forced materializing the whole IV_STCLOSE_3_T, generating over 200k records despite I only need at most 12 of them from a single customer at any time.
With the above statement removed and appropriately indexing account_id, the cost reduced from over 500k to less than 500.

Why is there no "product()" aggregate function in SQL? [duplicate]

This question already has answers here:
is there a PRODUCT function like there is a SUM function in Oracle SQL?
(7 answers)
Closed 7 years ago.
When there are Sum(), min(), max(), avg(), count() functions, can someone help understand why there is no product() built-in function. And what will be the most efficient user-implementation of this aggregate function ?
Thanks,
Trinity
If you have exponential and log functions available, then:
PRODUCT(TheColumn) = EXP(SUM(LN(TheColumn)))
One can make a user-defined aggregate in SQL 2005 and up by using CLR. In Postgresql, you can do it in Postgres itself, likewise with Oracle
I'll focus on the question why it's not a standard function.
Aggregate function are basic statistical functions and product is not
Applied to common numerical data, the result will be in most cases out of range (overflow) so it is of little general use
It's probably left out because most people don't need it and it can be defined easily in most databases.
Solution for PostgreSQL:
CREATE OR REPLACE FUNCTION product_sfunc(state numeric, factor numeric)
RETURNS numeric AS $$
SELECT $1 * $2
$$ LANGUAGE sql;
CREATE AGGREGATE product (
sfunc = product_sfunc,
basetype = numeric,
stype = numeric,
initcond = '1'
);
You can simulate product() using cursors. If you let us know which database platform you're using, then we might be able to give you some sample code.
I can confirm that it is indeed rare to use a product() aggregate function, but I have a quite valid example, especially working with highly aggregated data that must be presented to users in a report.
It utilizes the exp(sum(ln( multiplyTheseColumnValues ))) "trick" as mentioned in another post and other internet sources.
The report (which should care about the display, and contain as least data calculation logic as possible, to provide better maintainability and flexibility) is basically displaying this data along with some graphics:
DESCR SUM
---------------------------------- ----------
money available in 2013 33233235.3
money spent in 2013 4253235.3
money bound to contracts in 2013 34333500
money spent 2013 in % of available 12
money bound 2013 in % of available 103
(In real life its a bit more complex and used in state budget scenarios.)
It aggregates quite some complex data found in the first 3 rows.
I do not want to calculate the percentage values of the following rows (4th and 5th) by:
doing it in the quite dumb (as it should be) report (which just takes any number of such rows with a descripiton descr and a number sum) with some fancy logic (using JasperReports, BIRT Reports or alike)
neither do I want to calculate the underlying data (money available, money spent, money bound) multiple times (since these are quite expensive operations) just to calculate the percentage values
So I used another trick involving the use of the product()-functionality.
(If somebody does know of a better way to achive this considering the above mentioned restrictions, I would be happy to know :-) )
The whole simplified example is available as one executable SQL below.
Maybe it could help convice some Oracle guys that this functionality is not as rare, or not worth providing, as it may seem at first thoughts.
with
-- we have some 10g database without pivot/unpivot functionality
-- what is interesting for various summary reports
sum_data_meta as (
select 'MA' as sum_id, 'money available in 2013' as descr, 1 as agg_lvl from dual
union all select 'MS', 'money spent in 2013', 1 from dual
union all select 'MB', 'money bound to contracts in 2013', 1 from dual
union all select 'MSP', 'money spent 2013 in % of available', 2 from dual
union all select 'MBP', 'money bound 2013 in % of available', 2 from dual
)
/* select * from sum_data_meta
SUM_ID DESCR AGG_LVL
------ ---------------------------------- -------
MA money available in 2013 1
MS money spent in 2013 1
MB money bound to contracts in 2013 1
MSP money spent 2013 in % of available 2
MBP money bound 2013 in % of available 2
*/
-- 1st level of aggregation with the base data (the data actually comes from complex (sub)SQLs)
,sum_data_lvl1_base as (
select 'MA' as sum_id, 33233235.3 as sum from dual
union all select 'MS', 4253235.3 from dual
union all select 'MB', 34333500 from dual
)
/* select * from sum_data_lvl1_base
SUM_ID SUM
------ ----------
MA 33233235.3
MS 4253235.3
MB 34333500.0
*/
-- 1st level of aggregation with enhanced meta data infos
,sum_data_lvl1 as (
select
m.descr,
b.sum,
m.agg_lvl,
m.sum_id
from sum_data_meta m
left outer join sum_data_lvl1_base b on (b.sum_id=m.sum_id)
)
/* select * from sum_data_lvl1
DESCR SUM AGG_LVL SUM_ID
---------------------------------- ---------- ------- ------
money available in 2013 33233235.3 1 MA
money spent in 2013 4253235.3 1 MS
money bound to contracts in 2013 34333500.0 1 MB
money spent 2013 in % of available - 2 MSP
money bound 2013 in % of available - 2 MBP
*/
select
descr,
case
when agg_lvl < 2 then sum
when agg_lvl = 2 then -- our level where we have to calculate some things based on the previous level calculations < 2
case
when sum_id = 'MSP' then
-- we want to calculate MS/MA by tricky aggregating the product of
-- (MA row:) 1/33233235.3 * (MS:) 4253235.3/1 * (MB:) 1/1 * (MSP:) 1/1 * (MBP:) * 1/1
trunc( -- cut of fractions, e.g. 12.7981 => 12
exp(sum(ln( -- trick simulating product(...) as mentioned here: http://stackoverflow.com/a/404761/1915920
case when sum_id = 'MS' then sum else 1 end
/ case when sum_id = 'MA' then sum else 1 end
)) over ()) -- "over()" => look at all resulting rows like an aggregate function
* 100 -- % display style
)
when sum_id = 'MBP' then
-- we want to calculate MB/MA by tricky aggregating the product as shown above with MSP
trunc(
exp(sum(ln(
case when sum_id = 'MB' then sum else 1 end
/ case when sum_id = 'MA' then sum else 1 end
)) over ())
* 100
)
else -1 -- indicates problem
end
else null -- will be calculated in a further step later on
end as sum,
agg_lvl,
sum_id
from sum_data_lvl1
/*
DESCR SUM AGG_LVL SUM_ID
---------------------------------- ---------- ------- ------
money available in 2013 33233235.3 1 MA
money spent in 2013 4253235.3 1 MS
money bound to contracts in 2013 34333500 1 MB
money spent 2013 in % of available 12 2 MSP
money bound 2013 in % of available 103 2 MBP
*/
Since the Product is noting but the multiple of SUM, so in SQL they didnot introduce the Product aggregate function
For example: 6 * 4 can be acheived by
either adding 6, 4 times to itself like 6+6+6+6
or
adding 4, 6 times to itself like 4+4+4+4+4+4
thus giving the same result