SAP BO - how to get 1/0 distinct values per week in each row - sql

The problem I am trying to solve is having a SAP Business Objects query calculate a variable for me, because calculating it in a large Excel file crashes the process.
I have a number of columns with daily/weekly data. I would like to get a "1" for the first instance of a Name/Person/identifier within a single week, and a "0" for all the rest.
So, for example, if the item "Glass" was sold 5 times in week 4, the first sale will get a "1" in this variable/column and the next 4 sales will get a "0". This will allow me to count the number of distinct items sold in a particular week.
I am aware there are Count and Count Distinct functions in Business Objects; however, I would prefer to have this 1/0 system for the entire raw table of data, because I am using it as a source for a whole dashboard and there are lots of metrics where this distinct flag will serve as a part/slicer.
The way I was doing it previously is with an Excel formula: =IF(SUMPRODUCT(($A$2:$A5000=$A2)*($G$2:$G5000=$G2))>1,0,1)
This does the trick: it gives a "1" for the first instance of a value in column G appearing within a certain value range in column A (column A is the week), and gives a "0" when the same value reappears for the same week value in column A. It gives a "1" again when the week value changes.
Since it compares 2 cells in each row against entire columns of data, this tends to crash as the data gets bigger.
I was so far unable to emulate this in Business Objects and I think I exhausted my abilities and googling.
Could anyone share their two cents on this please?

Assuming you have an object in the query that uniquely identifies a row, you can do this in a couple of simple steps.
Let's assume your query contains the following objects:
Sale ID
Name
Person
Sale Date
Week #
Price
etc.
You want to show a 1 for the first occurrence of each Name/Week #.
Step 1: Create a variable with the following definition. Let's call it [FirstOne]
=Min([Sale ID]) In ([Name];[Week #])
Step 2: In the report block, add a column with the following formula:
=If [FirstOne] = [Sale ID] Then 1 Else 0
This should produce a 1 in the row that represents the first occurrence of Name within a Week #. If you then wanted to show a 1 on the first occurrence of Name/Person/Week #, you could just modify the [FirstOne] variable accordingly:
=Min([Sale ID]) In ([Name];[Person];[Week #])
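The same logic translates directly to SQL if the flag can be computed upstream: MIN(sale_id) OVER a (name, week) partition plays the role of [FirstOne]. A minimal sketch in SQLite, with made-up table and column names:

```python
import sqlite3

# Toy sales table; the minimum sale_id per (name, week) group marks the
# first occurrence, mirroring the [FirstOne] report variable.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (sale_id INTEGER, name TEXT, week INTEGER);
INSERT INTO sales VALUES
  (1, 'Glass', 4), (2, 'Glass', 4), (3, 'Plate', 4),
  (4, 'Glass', 5), (5, 'Glass', 5);
""")
rows = con.execute("""
SELECT sale_id, name, week,
       CASE WHEN sale_id = MIN(sale_id) OVER (PARTITION BY name, week)
            THEN 1 ELSE 0 END AS first_one
FROM sales
ORDER BY sale_id
""").fetchall()
for r in rows:
    print(r)
```

As with the report variable, only the lowest Sale ID in each Name/Week group gets the 1, so summing the flag gives the distinct count per week.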

I think you want logic around row_number():
select t.*,
       (case when 1 = row_number() over (partition by name, person, week, identifier
                                         order by ??)
             then 1 else 0
        end) as new_indicator
from t;
Note the ??. SQL tables represent unordered sets. There is no "first" row in a table or group of rows, unless a column specifies that ordering. The ?? is for such a column (perhaps a date/time column, perhaps an id).
If you only want one row to be marked, you can put anything there, such as order by (select null) or order by week.
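With the ?? filled in by a date column, the query can be exercised end-to-end; a sketch in SQLite (the table, a simplified partition, and the sale_date ordering column are assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (name TEXT, week INTEGER, sale_date TEXT);
INSERT INTO t VALUES
  ('Glass', 4, '2023-01-23'), ('Glass', 4, '2023-01-24'),
  ('Glass', 4, '2023-01-25'), ('Plate', 4, '2023-01-23'),
  ('Glass', 5, '2023-01-30');
""")
# row_number() restarts at 1 for each (name, week) group, so only the
# earliest sale_date in each group gets new_indicator = 1.
rows = con.execute("""
SELECT t.*,
       CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY name, week
                                        ORDER BY sale_date)
            THEN 1 ELSE 0 END AS new_indicator
FROM t
ORDER BY name, week, sale_date
""").fetchall()
print(rows)
```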

Getting another column from same row which has first non-null value in column

I have a SQL table like this, and I want to find the average adjusted amount for products, partitioned by store_id, that looks like this:
Here, I need to compute adj_amt, which is the product of the previous two columns.
For this, I need to fill the nulls in av_quantity with the first non-null value in the partition. The query I use is below.
select
    case when av_quantity is null then
        -- the boolean argument tells first_value to ignore nulls
        first_value(av_quantity, true) over (
            partition by store_no
            order by product_id
            range between current row and unbounded following
        )
    else av_quantity
    end as adj_av_quantity
I'm having trouble with the SQL required to get the adjusted cost, since it's not pulling the first non-null value for factor, but still fetches it based on the same row as adj_av_quantity. Any thoughts on how I could do this?
FYI, I've simplified the data here. The actual dataset is pretty huge (> 125 million rows with 800+ columns), so I won't be able to use joins and have to do this via window functions. I'm using Spark SQL.
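For reference, Spark's first_value(col, true) skips nulls within the window frame. Engines without an IGNORE NULLS option can emulate the same forward-looking fill with a correlated subquery; a sketch in SQLite with made-up data (column names mirror the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (store_no INTEGER, product_id INTEGER, av_quantity REAL);
INSERT INTO sales VALUES
  (1, 10, NULL), (1, 20, NULL), (1, 30, 5.0), (1, 40, 7.0),
  (2, 10, 2.0),  (2, 20, NULL);
""")
# For each null av_quantity, take the first non-null value at or after the
# current product_id within the same store (current row .. unbounded following).
rows = con.execute("""
SELECT store_no, product_id,
       COALESCE(av_quantity,
                (SELECT s2.av_quantity
                 FROM sales s2
                 WHERE s2.store_no = s1.store_no
                   AND s2.product_id >= s1.product_id
                   AND s2.av_quantity IS NOT NULL
                 ORDER BY s2.product_id
                 LIMIT 1)) AS adj_av_quantity
FROM sales s1
ORDER BY store_no, product_id
""").fetchall()
print(rows)
```

Note the last row stays NULL because no non-null value follows it, matching the frame in the question. On 125M+ rows a correlated subquery would be slow; it is shown here only to pin down the intended semantics.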

How to calculate a bank's deposit growth from one call report to the next, as a percentage?

I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note: The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth". Hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report, viewed as a percentage.
e.g. "On their last call report (19921231), First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331), First National Bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits?"
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
    SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp,
    SFRNLL.specgrp AS `Loan_Specialization`,
    SFRNLL.lnreres AS `_1_to_4_Residential_Loans`,
    AL.dep AS `Deposits`,
    AL.lnlsnet AS `loans_and_leases`,
    IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) AS SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases AS SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities AS AL
    ON SFRNLL.cert = AL.cert
WHERE SFRNLL.specgrp = 4 AND IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
    SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp,
    SFRNLL.specgrp AS `Loan_Specialization`,
    SFRNLL.lnreres AS `_1_to_4_Residential_Loans`,
    AL.dep AS `Deposits`,
    AL.lnlsnet AS `loans_and_leases`,
    IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) AS SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases AS SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities AS AL
    ON SFRNLL.cert = AL.cert
WHERE SFRNLL.specgrp = 4 AND IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoanRatio) as a percentage.
This code runs correctly; however, previously I was getting a "division by zero" error when attempting to run 50,000 rows (1992 to the present).
Addressing each of your questions individually.
First) Retrieving SFR2TotalLoanRatio as a percentage: I assume you want to see 9.88% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field into a STRING and then concatenating the % sign. Below is an example with sample data:
WITH data AS (
  SELECT 0.0123 AS percentage UNION ALL
  SELECT 0.0999 AS percentage UNION ALL
  SELECT 0.3456 AS percentage
)
SELECT CONCAT(CAST(percentage * 100 AS STRING), "%") AS formatted_percentage
FROM data
And the output,
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
Second) Regarding your question about the division-by-zero error: IEEE_DIVIDE(arg1, arg2) divides arg1 by arg2, so arg1 is the dividend and arg2 is the divisor. I would advise you to explore your data in order to figure out which records have a divisor (lnlsnet) equal to zero. After gathering these results, you can determine what to do with them. In case you decide to discard them, you can simply add AL.lnlsnet <> 0 to the WHERE clause of each part of your query. On the other hand, you can also modify the records where lnlsnet = 0 using CASE WHEN or IF statements.
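The replace-instead-of-discard option can also be written with NULLIF, which turns a zero divisor into NULL; a toy example in SQLite (column names follow the question, data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE al (cert INTEGER, lnreres REAL, lnlsnet REAL);
INSERT INTO al VALUES (1, 10.0, 100.0), (2, 5.0, 0.0);
""")
# NULLIF(lnlsnet, 0) yields NULL for a zero divisor, so the ratio becomes
# NULL instead of raising an error (BigQuery's IEEE_DIVIDE would instead
# return an IEEE infinity or NaN for such rows).
rows = con.execute("""
SELECT cert, lnreres / NULLIF(lnlsnet, 0) AS ratio FROM al ORDER BY cert
""").fetchall()
print(rows)
```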
UPDATE:
In order to add this piece of code to your query, you have to wrap your code within a temporary table. Then, I will make two adjustments: first, a temporary function to calculate the percentage and format it with the % sign; second, retrieving the previous number of deposits to calculate the desired percentage. I am also assuming that cert is the unique id for each bank. The modifications are as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
  CONCAT(CAST((dep - prev_dep) / prev_dep * 100 AS STRING), "%")
);
#followed by the query you have created so far as a temporary table; notice the comma after the closing parenthesis
WITH data AS (
  #your query
),
#within this second part you need to select all the columns from data; the LAG function retrieves the previous number of deposits for each bank
data_2 AS (
  SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio,
    CASE WHEN cert = LAG(cert) OVER (PARTITION BY cert ORDER BY repdte)
         THEN LAG(Deposits) OVER (PARTITION BY cert ORDER BY repdte)
         ELSE NULL END AS prev_dep
  FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio,
  percent(Deposits, prev_dep) AS dept_growth_rate
FROM data_2
Note that the built-in function LAG is used together with CASE WHEN in order to retrieve the previous amount of deposits per bank.
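The LAG pattern can be checked in miniature outside BigQuery; a sketch in SQLite, assuming cert identifies the bank and repdte orders its call reports:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE reports (cert INTEGER, repdte TEXT, deposits INTEGER);
INSERT INTO reports VALUES
  (1, '19921231', 456789), (1, '19930331', 567890),
  (2, '19921231', 100000), (2, '19930331', 90000);
""")
# LAG(deposits) within each bank (cert), ordered by reporting date, gives
# the prior quarter's deposits; the first report per bank gets NULL.
rows = con.execute("""
SELECT cert, repdte, deposits,
       ROUND((deposits - prev_dep) * 100.0 / prev_dep, 2) AS pct_growth
FROM (SELECT cert, repdte, deposits,
             LAG(deposits) OVER (PARTITION BY cert ORDER BY repdte) AS prev_dep
      FROM reports)
ORDER BY cert, repdte
""").fetchall()
print(rows)
```

First National Bank's example from the question works out to roughly a 24.32% increase, and the first call report of each bank correctly has no growth figure.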

Wrapping a range of data

How would I select a rolling/wrapping* set of rows from a table?
I am trying to select a number of records (per type, 2 or 3) for each day, wrapping when I 'run out'.
Eg.
2018-03-15: YyBiz, ZzCo, AaPlace
2018-03-16: BbLocation, CcStreet, DdInc
These are rendered within a SSRS report for Dynamics CRM, so I can do light post-query operations.
Currently I get to:
2018-03-15: YyBiz, ZzCo
2018-03-16: AaPlace, BbLocation, CcStreet
First, getting a number for each record with:
SELECT name, ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) as RN
FROM table
Within SSRS, I then adjust RN to reflect the number of each type I need:
OnPageNum = FLOOR((RN+num_of_type-1)/num_of_type)-1
--Shift RN to be 0-indexed.
Resulting in AaPlace, BbLocation and CcStreet having a PageNum of 0, DdInc of 1, ... YyBiz and ZzCo of 8.
Then using an SSRS Table/Matrix linked to the dataset, I set the row filter to something like:
RowFilter = MOD(DateNum, NumPages(type)) == OnPageNum
Where DateNum is essentially days since epoch, and each page has a separate table and day passed in.
At this point, it is showing only N records per type per page, but if the total number of records of a type isn't a multiple of the number of records per page for that type, there will be pages with fewer records than required.
Is there an easier way to approach this/what's the next step?
*Wrapping such as Wraparound found in videogames, seamless resetting to 0.
To achieve this effect, I found that offsetting the RowNumber by -DateNum*num_of_type (negative for positive ordering), then modulo COUNT(type) would provide the correct "wrap around" effect.
In order to achieve the desired pagination, it then just had to be divided by num_of_type and floor'd, as below:
RowFilter: FLOOR(((RN-DateNum*num_of_type) % count(type))/num_of_type) == 0
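The wrap-around filter can be sanity-checked outside SSRS; a Python sketch of the same arithmetic, assuming 0-indexed row numbers per type:

```python
def page_rows(names, day_num, per_page):
    """Select per_page names for a given day number, wrapping past the end
    of the list: FLOOR(((RN - DateNum*k) % n) / k) == 0 with 0-indexed RN."""
    n = len(names)
    return [name for rn, name in enumerate(names)
            if ((rn - day_num * per_page) % n) // per_page == 0]

names = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # 26 items of one type
print(page_rows(names, 0, 3))  # ['A', 'B', 'C']
print(page_rows(names, 8, 3))  # wraps around: ['A', 'Y', 'Z']
```

Python's % is non-negative for a positive modulus, which is what keeps the negative offset (RN - DateNum*k) wrapping cleanly; verify that your SSRS Mod behaves the same way for negative arguments.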

DAX: Is the value in one column the same this month as it was last?

I need to create a calculated column. I have a list of items with serial #s, and those items are assigned to someone each month. I need to know (0/1) whether the owner of that item this month is the same as the owner of that item last month. (So I can create a measure to average how many are changing owners month-to-month.)
Basically, I'm trying to achieve the last column:
Month ItemID Owner Same Owner as Prev Mth
2015/01/31 A1 Al
2015/01/31 A2 Bob
2015/01/31 A3 Carl
2015/02/28 A1 Al 1
2015/02/28 A2 Carl 0
2015/02/28 A3 Carl 1
2015/03/31 A1 Bob 0
2015/03/31 A2 Bob 0
2015/03/31 A3 Bob 0
2015/04/30 A1 Bob 1
2015/04/30 A2 Bob 1
2015/04/30 A3 Al 0
I tried CALCULATE(Max([Owner]), FILTER(tbl, DATEADD([Month],-1,MONTH)=EARLIER([Month])), FILTER(tbl, [ItemID] = EARLIER([ItemID])))
But Max doesn't work on text fields. So I am kind of stumped. I know this shouldn't be that hard...
Date logic is almost always an issue of modeling rather than clever functions.
You will need a date table with a monotonically incremented integer id for months. I typically refer to this as MonthSequential or MonthIndex depending on the intended audience for the model. This field simply increments by 1 for each month in the date table without wrapping at year boundaries. Thus if the first month in your model is January, 2014, that month will have MonthSequential=1. February, 2014 has MonthSequential=2, and so on to December, 2014 with MonthSequential=12. January, 2015 has MonthSequential=13.
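The MonthSequential arithmetic is simple enough to prototype outside the model; a sketch, where the base month is an arbitrary assumption:

```python
def month_sequential(year, month, base_year=2014, base_month=1):
    """Monotonic month index that increments by 1 per month and never
    wraps at year boundaries (base month gets index 1)."""
    return (year - base_year) * 12 + (month - base_month) + 1

print(month_sequential(2014, 1))   # 1
print(month_sequential(2014, 12))  # 12
print(month_sequential(2015, 1))   # 13, not a reset to 1
```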
This allows very simple arithmetic to identify any month or range of months an arbitrary amount of time from the current month. Once you have this field in your date dimension (and your Items[Month] field related to your DimDate[Date] field), life gets pretty easy:
SameOwnerPreviousMonth=
IF(
CALCULATE(
VALUES(Items[Owner])
,FILTER(
ALLEXCEPT(Items, Items[ItemID])
,RELATED(DimDate[MonthSequential]) =
EARLIER(RELATED(DimDate[MonthSequential])) - 1
)
) = Items[Owner]
,1
,0
)
There's some funkiness here with row context, which I will explain.
Any calculated column is defined by some formula. That formula is evaluated in the row context of the table. What happens is a row-by-row iteration through the table. The formula you provide is evaluated once per row and that creates the value for that calculated column.
This being said, the storage engine and formula engine behind DAX have no concept of a row ordering. This means that any formula we define for a calculated column must provide its own ordering or reference to another row if we need to do that.
So, what do we do to find the owner in the previous month? Well, we need to look through the entire Items table and find the row which has the same [ItemId] and falls in the month immediately prior to the month on the current row. Our [MonthSequential] makes finding a date in the previous month trivial, and DAX offers many context-manipulating functions to preserve or eliminate context.
Note: I will refer to function arguments positionally, with the first argument to a function indicated by (1).
Let's step through the solution. We'll ignore the IF() because that is trivial. The meat of the formula lies in the CALCULATE() which identifies the [Owner] in the previous month:
CALCULATE(
VALUES(Items[Owner])
,FILTER(
ALLEXCEPT(Items, Items[ItemID])
,RELATED(DimDate[MonthSequential]) =
EARLIER(RELATED(DimDate[MonthSequential])) - 1
)
)
CALCULATE() evaluates arguments (2)-(n) first, to create a new filter context. That filter context is then used to evaluate the expression in (1).
FILTER() iterates row-by-row through the table provided in (1) and evaluates the boolean expression in (2) for each row in (1). It returns a table made up of the subset of rows of (1) for which (2) evaluates to true. Since we are already iterating through the entire Items table in evaluating our calculated column, we end up with two sets of row context. The outer row context is the iteration through the whole table. The inner row context is the iteration through (1) of our filter. The outer row context affects the inner, and we must modify/remove select portions of the outer context as needed.
The table we iterate over is ALLEXCEPT(Items, Items[ItemId]). ALLEXCEPT() strips out all context, except for the fields named. On any given row in our outer context, we preserve the value of Items[ItemId] and strip all other context ([Month] and [Owner], along with any other fields you've not named in your sample data). This gives us a table for our FILTER() made up of every row in Items which shares the [ItemId] of the current row in the outer filter context. This subset table becomes the generator of our inner row context.
Now we're iterating over FILTER()'s (1), explained above. RELATED() allows us to call out to get a value from another table related to the current one. We grab the [MonthSequential] value that is tied to the current row in our inner row context. We want to find the month that is immediately prior to the current month in the outer row context. To refer to a value in the outer row context, we need to escape the inner.
EARLIER() allows us to escape the current (inner) row context and refer to the last valid (outer) row context. This can happen through arbitrary levels of nesting of contexts. Luckily, we only have two. EARLIER(RELATED(DimDate[MonthSequential])) finds the [MonthSequential] value of the current row in the outer context. We simply subtract 1 from that to get the prior month (and since we're using [MonthSequential], we have no need to implement any logic to handle wrapping around year barriers).
Thus the context in which we evaluate VALUES(Items[Owner]) is that subset of our Items table where [ItemId] is equal to the current row in our outer row context, and the value of [MonthSequential] is one less than the current row in the outer row context. VALUES() returns the list of values which make up the column reference inside. In this case, since every [ItemId] is associated with only a single [Owner] in any given month, that list is only a single value which can be implicitly cast to a scalar value and represented in our calculated column.
Our IF() simply tests this [Owner] value against that of the current row in the outer row context and returns a 1 or 0 as appropriate.
This will break if you have a single [ItemId] which has multiple distinct [Owner]s in a given month.
Model diagram:

Identifying parent records for many transactions

This is related to a question I asked previously for which lag/lead was suggested. However the data I'm working with are more complex than I first thought so I need a more robust solution. This screen shot shows an issue I need to tackle:
Within a single serial number, a shipment event defines a new reference window. So records 2,3,4 relate to 1. Record 6 relates to 5 and so forth. I need to mark the records for which the BillToId doesn't match the parent shipment.
I'm trying to understand if I could even use the LAG function to compare records 2,3,4 back to 1 when the number of post-shipment events varies (duplicates are allowed). I was thinking I might be better off with another fact table that identifies the parent rowid along each record first?
So then my question becomes how do I efficiently identify which shipment each row belongs to? Am I forced to run a subquery for each record? I'm working right now with over 2 million total rows. I would later make this query part of the ETL process so it would be processing smaller chunks of data.
Here is an approach that uses the cumulative sum functionality in SQL Server. The idea is to assign each "ship" activity a value of "1" and "0" for everything else. Then do a cumulative sum to identify each group that should have the same billtoid. After that, the ship information can be assigned to all records in the same group:
select rowid, dateid, billtoid, activitytypeid, serialnumber
from (select t.*,
             max(case when activitytypeid = 'Ship' then billtoid end) over
                 (partition by serialnumber, cumships) as ship_billtoid
      from (select t.*,
                   sum(case when activitytypeid = 'Ship' then 1 else 0 end) over
                       (partition by serialnumber order by rowid) as cumships
            from t
           ) t
     ) t
where billtoid <> ship_billtoid;
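A runnable miniature of the cumulative-sum grouping (SQLite, made-up rows; row 3 is billed to a different party than its parent shipment):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (row_id INTEGER, billtoid TEXT, activitytypeid TEXT, serialnumber TEXT);
INSERT INTO t VALUES
  (1, 'X', 'Ship',   'S1'),
  (2, 'X', 'Repair', 'S1'),
  (3, 'Y', 'Repair', 'S1'),  -- BillToId differs from the parent shipment
  (4, 'Y', 'Ship',   'S1'),
  (5, 'Y', 'Repair', 'S1');
""")
# cumships increments at each 'Ship' row, so every event shares a group
# number with its parent shipment; ship_billtoid then broadcasts the
# shipment's BillToId across the group for comparison.
rows = con.execute("""
SELECT row_id, billtoid, activitytypeid, serialnumber
FROM (SELECT t.*,
             MAX(CASE WHEN activitytypeid = 'Ship' THEN billtoid END) OVER
                 (PARTITION BY serialnumber, cumships) AS ship_billtoid
      FROM (SELECT t.*,
                   SUM(CASE WHEN activitytypeid = 'Ship' THEN 1 ELSE 0 END) OVER
                       (PARTITION BY serialnumber ORDER BY row_id) AS cumships
            FROM t) t) t
WHERE billtoid <> ship_billtoid
""").fetchall()
print(rows)  # only the mismatched row 3 is flagged
```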