Tableau Calculated Field Optimization (SQL)

I have a calculated field called FirstSale where I observe the first instance of a product selling more than 80% of its inventory.
I look at the product ID, the timestamp (converted to string), and the % of inventory sold.
How can this query be optimized to not depend on this many fields, or how can it be converted to a SQL query?
Calculated Field Logic:
IF STR([SaleDate]) =
{FIXED [ID], {FIXED [ID], STR([SaleDate]), [Inventory %]:
IF MIN([Inventory %]) > 0.8
THEN 1 ELSE 0 END}: MIN(STR([SaleDate]))}
THEN 1
END
The data looks like this: there is a product ID, Sale Date, Inventory %, and the last column (with 1s and 0s) is the calculated field.
Essentially, the goal is that the calculation should return 1 only for the first time an ID shows Inventory % > 80%. In all other cases, return 0.
For example, looking at the second ID, the only row that should have a 1 is October 28 (2020083008056, October 28 2020, 84.00%, 1); all other rows should return 0.
So the full return for the second ID would be
(2020083008056, October 28 2020, 84.00%, 1 )
(2020083008056, October 29 2020, 84.36%, 0 )
(2020083008056, October 30 2020, 84.67%, 0 )
(2020083008056, October 31 2020, 84.67%, 0 )

I have recreated some sample data to solve your problem. If I have not misunderstood, your LOD calculation can be much simpler. Let's have a look.
Sample data re-created
I added one calculated field just to check whether the inventory sold is greater than or equal to 0.80, and added it to the view to produce data like you have shown.
Now add your desired field with calculation as
{FIXED [Prod id] : MIN(
IF [Greater than 80]=1 then [Date] end)} = [Date]
Adding this field to the view/filter should serve your purpose.
Still, if you want the field to return 0 or 1, use this calculation instead:
IF {FIXED [Prod id] : MIN(
IF [Greater than 80]=1 then [Date] end)} = [Date]
then 1 else 0 end
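The question also asked how this could be converted to a SQL query. The same "flag only the first qualifying date per product" logic can be expressed with a window function. A minimal sketch using SQLite via Python; the table and column names here are invented stand-ins for the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (prod_id TEXT, sale_date TEXT, inventory_pct REAL);
INSERT INTO sales VALUES
  ('2020083008056', '2020-10-28', 0.8400),
  ('2020083008056', '2020-10-29', 0.8436),
  ('2020083008056', '2020-10-30', 0.8467),
  ('2020083008056', '2020-10-31', 0.8467);
""")

# MIN(CASE ...) OVER (PARTITION BY prod_id) finds each product's first
# date with more than 80% of inventory sold; the outer CASE flags only
# the row matching that date, mirroring the FIXED LOD answer above.
rows = conn.execute("""
SELECT prod_id, sale_date, inventory_pct,
       CASE WHEN sale_date = MIN(CASE WHEN inventory_pct > 0.8
                                      THEN sale_date END)
                             OVER (PARTITION BY prod_id)
            THEN 1 ELSE 0 END AS first_sale
FROM sales
ORDER BY prod_id, sale_date
""").fetchall()
for r in rows:
    print(r)
```

Only the October 28 row gets `first_sale = 1`, matching the expected output in the question.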

Related

Issue with Joining Tables in SQL

SQL newbie here, using Zoho Analytics to do some reporting, specifically prorated forecasting of lead generation. I successfully created some tables that contain lead goals, and joined them onto matching leads based on the current month. The problem I am having is that I would like to be able to access my prorated goals even if I filter so that there are no leads that have been created yet. This will make more sense in the picture I attached, with an RPM gauge that cannot pull the target or maximum because no leads match the filter criteria. How do I join the tables (with maybe an IFNULL statement?) so that even if no lead IDs match, I can still output my goals? Thanks so much in advance.
RPM Gauge With prorated target and monthly goal
RPM gauge settings, distinct count of Lead Id's
Base table with goal used in Query table
Query table, forgive me I am new
Sorry for what I am sure is a fundamental misunderstanding of how this works; I have had to teach myself everything I know about SQL, and I am apparently not a terribly great teacher.
Thanks!
I have tried using a right join and an IFNULL statement, but it did not improve matters.
Edit- Sorry for the first post issues- here is the code and tables not in image form
Lead Table Example:

ID    | Lead Created Time | Lead Type
------+-------------------+----------
12345 | 11/21/2022        | Charge
12346 | 10/17/2020        | Store
12347 | 08/22/2022        | Enhance
I purposefully left out an entry that would match my filter criteria, as for the first few days of the month this often comes up. Ideally I would still like to get the prorated and total goals returned.
The table the query is pulling from to determine the prorated numbers:

Start Date  | End Date    | Prorating decimal | Charge | Enhance | Store | Service | Charge[PR] | Enhance[PR] | Store[PR] | Service[PR] | Total Leads | Total Leads[PR]
------------+-------------+-------------------+--------+---------+-------+---------+------------+-------------+-----------+-------------+-------------+----------------
Jan 01 2022 | Jan 31 2022 | .1                | 15     | 12      | 15    | 20      | 1.5        | 1.2         | 1.5       | 2.0         | 62          | 6.2
Feb 01 2022 | Feb 28 2022 | .1                | 15     | 12      | 15    | 20      | 1.5        | 1.2         | 1.5       | 2.0         | 62          | 6.2
Mar 01 2022 | Mar 31 2022 | .1                | 15     | 12      | 15    | 20      | 1.5        | 1.2         | 1.5       | 2.0         | 62          | 6.2
^For simplicity's sake I did not change the goals month to month, but they would in reality.
Idea for a successful data table, [PR] meaning prorated:

Sum of Lead IDs | Storage Goal | Storage Goal[PR] | Charge Goal | Charge Goal[PR]
----------------+--------------+------------------+-------------+----------------
14              | 10           | 1                | 15          | 2
1               | 10           | 1                | 15          | 2
0               | 10           | 1                | 15          | 2
The SQL query that I have that returns the blank gauge when no leads match my criteria (created this month, and lead type = Store):
SELECT
"Leads"."Id",
"SSS - 2022 Leads Forecast [Job Type]".*
FROM "Leads"
RIGHT JOIN "SSS - 2022 Leads Forecast [Job Type]" ON ((GETDATE() >= "Start Date")
AND (GETDATE() <= "End Date"))
Thanks so much to everyone who helped me reformat, first time poster so still learning the ropes. Let me know if I can provide more context or better info.
Figured this out! I used subqueries, filtering manually in the query instead of through the analytics widget, and did a distinct count to return zero instead of null, as well as coalescing the dollar amount to return zero (not applicable in the example below). Below is an example of some of the queries I used, as well as the resulting data table that gives me the result I want.
SELECT
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Charge'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test1'
) AS 'Charge Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Store'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test2'
) AS 'Store Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Enhance'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test3'
) AS 'Enhance Leads',
( SELECT count(*)
FROM ( SELECT DISTINCT "Leads"."Id"
FROM "Leads"
WHERE "Lead Type" = 'Service'
AND month_name("Created Time") = month_name(GETDATE())
AND year("Created Time") = year(GETDATE())
) AS 'test4'
) AS 'Service Leads',
"SSS - 2022 Leads Forecast [Job Type]".*
FROM "SSS - 2022 Leads Forecast [Job Type]"
WHERE ((GETDATE() >= "Start Date")
AND (GETDATE() <= "End Date"))
I am 100% sure that there is a more efficient way to do this, but it works and that was the most pressing thing.
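One more efficient alternative is conditional aggregation: a single pass over the leads table, where `COUNT(DISTINCT CASE ...)` naturally returns 0 (not NULL) when nothing matches. A minimal sketch with simplified, made-up table and column names, using SQLite via Python (SQLite has no GETDATE(), so a fixed month stands in for "this month"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (id INTEGER, created_time TEXT, lead_type TEXT);
INSERT INTO leads VALUES
  (12345, '2022-11-21', 'Charge'),
  (12346, '2020-10-17', 'Store'),
  (12347, '2022-08-22', 'Enhance');
""")

# One query instead of four scalar subqueries: each CASE picks out one
# lead type, and COUNT(DISTINCT ...) ignores the NULLs for other types,
# yielding 0 when no rows match (so the gauge never goes blank).
row = conn.execute("""
SELECT COUNT(DISTINCT CASE WHEN lead_type = 'Charge'  THEN id END) AS charge_leads,
       COUNT(DISTINCT CASE WHEN lead_type = 'Store'   THEN id END) AS store_leads,
       COUNT(DISTINCT CASE WHEN lead_type = 'Enhance' THEN id END) AS enhance_leads,
       COUNT(DISTINCT CASE WHEN lead_type = 'Service' THEN id END) AS service_leads
FROM leads
WHERE strftime('%Y-%m', created_time) = '2022-11'
""").fetchone()
print(row)  # (1, 0, 0, 0)
```

The same shape works in Zoho's SQL dialect with its own date functions in place of `strftime`.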
Here is the resulting data table, which is exactly what I needed (a single row, shown here as column: value):

Charge Leads: 7
Store Leads: 0
Enhance Leads: 5
Service Leads: 35
Start Date: 01 Dec 2022
End Date: 31 Dec 2022
[PR] Charge: 64
[PR] Enhance: 34
[PR] Store: 17
[PR] Service: 56
[PR] Total Leads: 171
[Total] Charge: 152
[Total] Enhance: 81
[Total] Store: 40
[Total] Service: 134
[Total] Total Leads: 407
Prorating Decimal: .419
The [PR] are the prorated goals, so where we should be at this point in the month, and [Total] is the total goal for the month.

Finding a lagged cumulative total from a table containing both cumulative and delta entries

I have a SQL table with a schema where a value is either a cumulative value for a particular category, or a delta on top of the previous value. While I appreciate this is not a particularly great design, it comes from an external source and thus I can't change it in any way.
The table looks something like the following:
Date Category AmountSoldType AmountSold
-----------------------------------------------------
Jan 1 Apples Cumulative 100
Jan 1 Bananas Cumulative 50
Jan 2 Apples Delta 20
Jan 2 Bananas Delta 10
Jan 3 Apples Delta 25
Jan 3 Bananas Cumulative 75
For this example, I want to produce the total cumulative number of fruits sold by item at the beginning of each day:
Date Category AmountSold
--------------------------------
Jan 1 Apples 0
Jan 1 Bananas 0
Jan 2 Apples 100
Jan 2 Bananas 50
Jan 3 Apples 170
Jan 3 Bananas 60
Jan 4 Apples 195
Jan 4 Bananas 75
Intuitively, I want to take the most recent cumulative total, and add any deltas that have appeared since that entry.
I imagine something akin to
SELECT Date, Category
LEAD((subquery??), 1) OVER (PARTITION BY Category ORDER BY Date) AS Amt
FROM Fruits
GROUP BY Date, Category
ORDER BY Date ASC
is what I want, but I'm having trouble putting the right subquery together. Any suggestions?
You seem to want to add the deltas to the most recent cumulative -- all before the current date.
If so, I think this logic does what you want:
select f.*,
       (max(case when date = date_cumulative then amountsold else 0 end) over
            (partition by category) +
        sum(case when date > date_cumulative then amountsold else 0 end) over
            (partition by category order by date
             rows between unbounded preceding and 1 preceding)
       ) as amt
from (select f.*,
             max(case when AmountSoldType = 'Cumulative' then date end) over
                 (partition by category order by date
                  rows between unbounded preceding and current row) as date_cumulative
      from fruits f
     ) f
I'm a bit confused by this data set (notwithstanding the mistake in adding up the apples). I assume the raw data states end-of-day figures, so for example 20 apples were sold on Jan 2 (because there is a delta of 20 reported for that day).
In your example results, it does not appear valid to say that zero apples were sold on Jan 1. It isn't actually possible to say how many were sold on that day, because it is not clear whether the 100 cumulative apples were accrued during Jan 1 (and thus should be excluded from the start-of-day figure you seek) or whether they were accrued on previous days (and should be included), or some mix of the two. That day's data should thus be null.
It is also not clear whether all data sets must begin with a cumulative, or whether data sets can begin with a delta (which might require working backwards from a subsequent cumulative), and whether you potentially have access to multiple data sets from your external source which form a continuous consistent sequence, or whether "cumulatives" relate purely to a single data set received. I'm going to assume at least that all data sets begin with a cumulative.
All that said, this problem is a simple case of firstly converting all rows into either all deltas, or all cumulatives. Assuming we go for all cumulatives, then recursing through each row in order, it is a case of either selecting the AmountSold as-is (if the row is a cumulative), or adding the AmountSold to the result of the previous step (if it is a delta).
Once pre-processed like this, then for a start-of-day cumulative, it is all just a question of looking at the previous day's cumulative (which was an end-of-day cumulative, if my initial assumption was correct that all raw data relates to end-of-day figures).
Using the LAG function in this final step to get the previous day's cumulative, will also neatly produce a null for the first row.
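The approach sketched above (group each row with its most recent cumulative, build a running end-of-day total, then use LAG to get the start-of-day figure, which is neatly NULL for the first day) can be written out as follows. Table and column names are invented, and the figures use the corrected arithmetic noted earlier (Apples on day 3 start at 120, not 170); SQLite via Python stands in for the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruits (day INTEGER, category TEXT, amount_type TEXT, amount INTEGER);
INSERT INTO fruits VALUES
  (1, 'Apples',  'Cumulative', 100),
  (1, 'Bananas', 'Cumulative', 50),
  (2, 'Apples',  'Delta',      20),
  (2, 'Bananas', 'Delta',      10),
  (3, 'Apples',  'Delta',      25),
  (3, 'Bananas', 'Cumulative', 75);
""")

# Step 1 (grouped): a running count of 'Cumulative' rows labels each row
#   with the group started by its most recent cumulative.
# Step 2 (eod): summing within that group gives the end-of-day cumulative
#   (the base cumulative plus deltas since).
# Step 3: LAG shifts it to a start-of-day figure; the first row per
#   category has no prior day, so it is NULL as the answer recommends.
rows = conn.execute("""
WITH grouped AS (
  SELECT *,
         COUNT(CASE WHEN amount_type = 'Cumulative' THEN 1 END)
           OVER (PARTITION BY category ORDER BY day) AS grp
  FROM fruits
),
eod AS (
  SELECT day, category,
         SUM(amount) OVER (PARTITION BY category, grp ORDER BY day) AS cum
  FROM grouped
)
SELECT day, category,
       LAG(cum) OVER (PARTITION BY category ORDER BY day) AS start_of_day
FROM eod
ORDER BY day, category
""").fetchall()
for r in rows:
    print(r)
```

Note the Jan 3 Bananas cumulative (75) correctly resets the running total, so Jan 4 would start from 75 rather than 60 + delta.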

Identifying premature expiration

The dataset I have is a bit tricky. It’s a rolling calendar for a period of 24 months, and the data is published only once a month.
The relevant data points are as follows:
• CaseNumber (int)
• Start_date (date)
• Reporting_month (date)
• Months_old (int)
A ‘CaseNumber’ can potentially appear in as many as 24 ‘Reporting_month’s (ages 0-23); however, each ‘CaseNumber’ will appear only once in any given ‘Reporting_month’.
So if you list any number of months in chronological order (Jan, Feb, Mar, Apr, etc.), a single ‘CaseNumber’ will show up in each of those ‘Reporting_month’s as long as the ‘CaseNumber’ is < 23 ‘Months_old’.
However, once a ‘CaseNumber’ is 24 ‘Months_old’ it will no longer appear in this data set, so the oldest any particular ‘CaseNumber’ will ever be in this reporting cycle is 23 ‘Months_old’; any older and it will not appear on this report.
What I’m interested in doing is tracking these ‘CaseNumbers’ to see if any are dropping off of this report prematurely. To do so, I need to compare the current ‘Reporting_month’ to the previous ‘Reporting_month’ to determine whether any ‘CaseNumbers’ dropped off prematurely.
Example:
Case # | Previous Months_old | Current Months_old | Status
-------+---------------------+--------------------+-------------------
1234   | 22                  | 23                 | Correct age
5678   | 23                  | NULL               | Dropped due to age
9101   | 18                  | NULL               | Premature drop
The only means I've found to achieve this is via a VLOOKUP formula in Excel, done manually. I'd like to get away from having to do this by hand.
SELECT
a.[CaseNumber]
,CONVERT(DATE,MAX(a.[Month]),111) 'Month'
,CASE WHEN m2.[CaseNumber] IS NOT NULL
AND m1.[CaseNumber] IS NULL
THEN 'Yes'
ELSE 'No'
END as 'New Default'
FROM
[dbo].['v2-2yrTotalDefault$'] a
LEFT OUTER JOIN (
SELECT DISTINCT
[CaseNumber]
FROM
[dbo].['v2-2yrTotalDefault$']
WHERE
LEFT(CONVERT(varchar,[Month],112),6) = '201902') m1
ON m1.CaseNumber = a.CaseNumber --most current month
LEFT OUTER JOIN (
SELECT DISTINCT
[CaseNumber]
FROM
[dbo].['v2-2yrTotalDefault$']
WHERE
LEFT(CONVERT(varchar,[Month],112),6) = '201903') m2
ON m2.CaseNumber = a.CaseNumber --previous month
WHERE
a.[Month] > '12/01/2018'
GROUP BY
a.[CaseNumber]
ORDER BY
a.[CaseNumber]
This continually errors out with the following error:
Msg 8120, Level 16, State 1, Line 8
Column 'm2.CaseNumber' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Msg 8120, Level 16, State 1, Line 9
Column 'm1.CaseNumber' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Additionally, with the above I don't want to have to hard-code the months in the SELECT statement; I'd like to be able to control which month I'm viewing in the WHERE clause.
In the end I'd like the results to return two columns, one reflecting the previous month's age in months and the second showing the current month's age. If a CaseNumber dropped off prematurely, I'd like the current month to say 'premature_expiration'.
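One way to avoid both the GROUP BY error and the hard-coded months is to join the previous month's rows to the current month's rows directly and pass the two months as parameters. A minimal sketch with a simplified stand-in schema, using SQLite via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (CaseNumber INTEGER, Reporting_month TEXT, Months_old INTEGER);
INSERT INTO cases VALUES
  (1234, '2019-02', 22), (1234, '2019-03', 23),
  (5678, '2019-02', 23),
  (9101, '2019-02', 18);
""")

# Left-join previous month onto current month: a case present last month
# but absent this month either aged out (previous age = 23) or dropped
# prematurely. The two months are parameters, not hard-coded in SELECT.
sql = """
SELECT p.CaseNumber,
       p.Months_old AS prev_age,
       c.Months_old AS curr_age,
       CASE WHEN c.CaseNumber IS NOT NULL THEN 'Correct age'
            WHEN p.Months_old = 23        THEN 'Dropped due to age'
            ELSE 'premature_expiration' END AS status
FROM (SELECT * FROM cases WHERE Reporting_month = ?) p
LEFT JOIN (SELECT * FROM cases WHERE Reporting_month = ?) c
       ON c.CaseNumber = p.CaseNumber
ORDER BY p.CaseNumber
"""
rows = conn.execute(sql, ('2019-02', '2019-03')).fetchall()
for r in rows:
    print(r)
```

This reproduces the example table: 1234 ages correctly, 5678 drops due to age, and 9101 is flagged as a premature expiration.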

SQL Server : sum particular column for this year and last year

SELECT
a.ERSDataValues_ERSCommodity_ID,c.ersGeographyDimension_country,
b.ERSTimeDimension_Year,
SUM(a.ERSDataValues_AttributeValue) as Total
FROM
cosd.ERSDataValues a, cosd.ERSTimeDimension_LU b,
cosd.ERSGeographyDimension_LU c
WHERE
a.ERSDataValues_ERSCommodity_ID IN (SELECT ERSBusinessLogic_InputDataSeries
FROM [AnimalProductsCoSD].[CoSD].[ERSBusinessLogic]
WHERE ERSBusinessLogic_InputGeographyDimensionID = 7493
AND ERSBusinessLogic_InputTimeDimensionValue = 'all months'
AND ERSBusinessLogic_Type = 'time aggregate')
AND a.ERSDataValues_ERSTimeDimension_ID = b.ERSTimeDimension_ID
AND c.ersGeographyDimension_country != 'WORLD'
AND a.ERSDataValues_ERSGeography_ID = c.ERSGeographyDimension_ID
GROUP BY
b.ERSTimeDimension_Year, a.ERSDataValues_ERSCommodity_ID,
c.ersGeographyDimension_country
ORDER BY
b.ERSTimeDimension_Year, a.ERSDataValues_ERSCommodity_ID
All I want is for that SUM above to return the sum from Jan 2018 to June 2018, and I also want a sum from the previous year for the same time period. I do not want to hardcode the months; I'd rather derive them dynamically.
I thought of using conditional aggregate functions, but the output does not match my requirement. Any ideas?
This is the output I want: https://imgur.com/a/YtDgR8s
You can add a filter to the WHERE clause to limit the time dimension to the current and previous year, and then add something like this to both the SELECT list and the GROUP BY:
CASE
WHEN YEAR(b.date)=YEAR(GETDATE()) THEN 'Current Year'
WHEN YEAR(b.date)=YEAR(GETDATE())-1 THEN 'Previous Year'
ELSE NULL
END
It may look a little different depending on how you define current and previous year, but that's the idea.
In terms of limiting January to June, this has to be in the WHERE clause. If you do not want to hard-code specific months (Jan/June), then I think you have to reference something else from your time dimension, e.g. if you have a quarter attribute you can say quarter_number IN (1, 2).
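A minimal sketch of that idea (year labels via a CASE in SELECT/GROUP BY, month range limited in WHERE), with invented table and column names; SQLite via Python stands in for SQL Server, so `strftime` replaces `YEAR()`/`GETDATE()`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, value REAL);
INSERT INTO sales VALUES
  ('2018-03-15', 10), ('2018-07-01', 99),  -- July row falls outside Jan-June
  ('2017-02-10', 5),  ('2017-06-30', 7);
""")

# '2018' stands in for YEAR(GETDATE()); a live query would derive it.
# The CASE labels each row Current/Previous Year, the WHERE clause
# restricts both the years and the Jan-June month window.
rows = conn.execute("""
SELECT CASE WHEN strftime('%Y', sale_date) = '2018' THEN 'Current Year'
            ELSE 'Previous Year' END AS period,
       SUM(value) AS total
FROM sales
WHERE strftime('%Y', sale_date) IN ('2018', '2017')
  AND CAST(strftime('%m', sale_date) AS INTEGER) BETWEEN 1 AND 6
GROUP BY period
ORDER BY period
""").fetchall()
print(rows)  # [('Current Year', 10.0), ('Previous Year', 12.0)]
```

In SQL Server the same CASE would use `YEAR(b.date) = YEAR(GETDATE())` and `MONTH(b.date) BETWEEN 1 AND 6`, as the answer above describes.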

Power BI: Simple addition give wrong result

I'm counting the number of unique IDs per month in a given timeframe and I've encountered two strange things:
1. Looking for the same thing using two different approaches (a value for each month vs. a cumulative value month by month) gives different values. See the screenshot below.
2. When you add up the values in the first column (monthly value) by hand, the result is 868; when Power BI summarizes it, it's 864.
Any ideas?
DAX Formulas below:
Y-1 Kandydaci = CALCULATE(
distinctcount(getDataForTeb[ID_DANE_OSOBOWE]);
DATESBETWEEN(
getDataForTeb[Złożenie podania];
DATE(YEAR(now())-1;4;1);
IF(DATE(YEAR(NOW())-1;MONTH(NOW());DAY(NOW()))<=DATE(YEAR(NOW())-1;11;30);
DATE(YEAR(NOW())-1;MONTH(NOW());DAY(NOW()));DATE(YEAR(NOW())-1;11;30)));
ISBLANK(getDataForTeb[REZYGNACJA_DATA]))
Y-1 Kandydaci cumulative = CALCULATE(
DISTINCTCOUNT(getDataForTeb[ID_DANE_OSOBOWE]);
FILTER(
ALL (getDataForTeb);
AND (
getDataForTeb[Złożenie podania] <= MAX(getDataForTeb[Złożenie podania])-364;
AND (
getDataForTeb[Złożenie podania] <= DATE(YEAR(NOW())-1; 11; 30);
getDataForTeb[Złożenie podania] >= DATE(YEAR(NOW())-1; 4; 1)
)
)
);
ISBLANK(getDataForTeb[REZYGNACJA_DATA])
)
Another interesting example just from a while ago: different file, no DAX involved:
Yes! This is the magic of DISTINCTCOUNT(). It counts the number of distinct values in the [ID_DANE_OSOBOWE] column within each month, but when the measure is evaluated for all months, it does not double-count the values which appear in more than one month.
Simplified:
| ID | Month |
+----+-------+
| 1 | March |
| 1 | April |
When you have a measure My Measure = DISTINCTCOUNT(tbl[ID]), the value for each month will be 1, but when you do a distinct count for all months the value will still be 1, because there is only one distinct value.
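The non-additivity of distinct counts is easy to reproduce outside DAX; a tiny sketch with made-up data:

```python
# Per-month distinct counts can sum to more than the overall distinct
# count, because an ID appearing in two months is counted once per month
# but only once overall -- exactly the 868 vs 864 discrepancy above.
sales = [
    (1, "March"), (1, "April"),  # ID 1 appears in two months
    (2, "March"),
]

def distinct_ids(rows, month=None):
    """Count distinct IDs, optionally restricted to one month."""
    return len({i for i, m in rows if month is None or m == month})

per_month = distinct_ids(sales, "March") + distinct_ids(sales, "April")
overall = distinct_ids(sales)
print(per_month, overall)  # 3 2
```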
In general, when you get strange results where the automatically calculated grand total differs from the sum of the partial results, it is either the case explained by mendosi above (regarding DISTINCTCOUNT), caused by the filter context switching for each row of the calculation, or because some calculations count BLANK values as 1. In the March 2019 update of Power BI a new DAX function was introduced, DISTINCTCOUNTNOBLANK, which eliminates counting BLANK values.