MDX - group dimension by name

I have a dimension (page_type) whose member names are not unique, so two keys can have the same name. I would like to see the clicks by page_type name.
Unfortunately, the following query shows the dimension names, but with one line per key:
SELECT
{[Measures].[count_clicks]} ON COLUMNS,
[page_type].[page_type].members ON ROWS
FROM
[customer_journey]
The result:
category 150.000
product 100.000
category 80.000
...
How can I change this query to get only one line per page_type name?
category 230.000
product 100.000
...

This is slow, but does it work?
with set SetOfPagesWithSameName as
filter
(
[page_type].[page_type].members as p,
p.current.name = [page_type].[page_type].currentmember.name
)
member Measures.TotalCountOFClicks as
sum(
existing SetOfPagesWithSameName,
[Measures].[count_clicks]
)
member Measures.CountSimilarPagesGrt1 as
IIF(SetOfPagesWithSameName.count > 0 , 1, null)
select
NonEmpty([page_type].[page_type].members, Measures.CountSimilarPagesGrt1) on 1,
Measures.TotalCountOFClicks on 0
from [customer_journey]
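
To make the intended result concrete, here is a small Python sketch of the grouping itself (it is not MDX and not a replacement for the query above). The keys 1 to 3 and the `clicks_by_name` helper are assumptions for illustration; the names and click counts are the ones from the question.

```python
from collections import defaultdict

# Hypothetical (key, name, clicks) rows mirroring the result in the question;
# the keys 1..3 are made up for illustration.
rows = [
    (1, "category", 150_000),
    (2, "product", 100_000),
    (3, "category", 80_000),
]

def clicks_by_name(rows):
    """Sum clicks over every key that shares the same member name."""
    totals = defaultdict(int)
    for _key, name, clicks in rows:
        totals[name] += clicks
    return dict(totals)

print(clicks_by_name(rows))
```

The two "category" keys collapse into a single line, which is the behaviour the calculated member is trying to reproduce inside the cube.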

group by rollup duplicating results

Hi, I am trying to use group by rollup to get a grand total for a column, but when I do, it duplicates the rows. What am I doing wrong? Thanks.
Code:
WITH TOTS AS
(
select
swkl.prn_cln_n,
swkl.swkl_bgn_dt,
swkl.swkl_end_dt,
swkl.sec_id,
actbl.*
from actbl
join actblc on actblc.actbl_seq_id = actbl.actbl_seq_id
join swkl on swkl.swkl_id = actblc.swkl_id
where swkl.prn_cln_n = '242931'
and swkl.stlm_rcd_sta_cd = 'A'
and trunc(swkl.stlm_rcd_mntd_ts) = to_date('03/10/2021','mm/dd/yyyy') --'14-JUN-2021'
and actblc.actblc_wkfl_pcs_cd = 'COM'
and actblc.stlm_rcd_sta_cd = 'A'
)
,F2 AS
(
SELECT PRN_CLN_N AS CLIENT_NUMBER
,LINE_TYPE_HM
,PR_TYP_C
,SUM(BIIL_A) AS BILL_AMOUNT
FROM TOTS
GROUP BY PRN_CLN_N
,LINE_TYPE_HM
,PR_TYP_C
)
SELECT coalesce(CLIENT_NUMBER,'TOTAL') AS CLIENT_NUMBER
,LINE_TYPE_HM
,PR_TYP_C
,SUM(BILL_AMOUNT) BILL_AMOUNT
FROM F2
group by rollup (CLIENT_NUMBER
,LINE_TYPE_HM
,PR_TYP_C)
There's nothing wrong, as far as I can tell.
When using rollup, you specify columns in it: client_number, line_type_hm and pr_typ_c, which is 3 columns. rollup will then produce 3 + 1 = 4 grouping levels. The subtotal levels are visually identified by NULL values in the rollup columns. In your screenshot,
1st level are all "fully populated" lines (those that have all columns with some values, i.e. the 1st line, 3rd, 5th, ...)
2nd level are lines whose pr_typ_c is NULL (2nd line, 4th, 6th, ...)
3rd level is the penultimate row (has both line_type_hm and pr_typ_c empty)
4th level is "grand total", the last line with TOTAL in client_number column
I don't have your tables or data, so you are in a better position to decide what to do next. You can reduce the number of subtotals by performing a partial rollup. For example,
group by client_number, rollup(line_type_hm, pr_typ_c)
or any other combination of rollup columns.
Try it and see what happens.
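
If it helps to see where the extra rows come from, the levels described above can be emulated in a few lines of Python. This is only a sketch of the idea, not how the database implements it; the `rollup` helper and the sample rows are made up.

```python
from collections import defaultdict

def rollup(rows, n_cols):
    """Emulate GROUP BY ROLLUP over the first n_cols columns of each row.

    Each row is (col_1, ..., col_n, amount). For every prefix length
    n_cols, n_cols - 1, ..., 0 the amounts are summed, and the columns
    that were rolled up are reported as None (SQL would show NULL).
    """
    out = []
    for depth in range(n_cols, -1, -1):
        totals = defaultdict(int)
        for row in rows:
            totals[row[:depth]] += row[-1]
        for key, amount in totals.items():
            out.append(key + (None,) * (n_cols - depth) + (amount,))
    return out

# Made-up sample data: (client_number, line_type_hm, pr_typ_c, bill_amount).
rows = [
    ("242931", "A", "X", 10),
    ("242931", "B", "Y", 20),
]

for line in rollup(rows, 3):
    print(line)
```

In this made-up data each line_type_hm has a single pr_typ_c and there is only one client, so each subtotal row repeats the amount of the level below it. That kind of data may be why the output looks duplicated.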

BigQuery: Count consecutive string matches between two fields

I have two tables:
Master_Equipment_Index (alias mei) containing the columns serial_num & model_num
Customer Equipment Index (alias cei) containing the columns account_num, serial_num, & model_num
Originally, there were no guard rails requiring a model attribute in the mei data when new serial_num records were inserted. When such a serial_num is later associated with a customer account in the cei data, the model carries over as null.
What I want to do is backfill the missing model attributes in the cei data from the mei data based on the strongest sequential character match from other similar serial_nums in the mei data.
To further clarify, I don't have access to mass update the mei or cei datasets. I can formalize change requests, but I need to build the function out to prove its worth. So this has to be done outside of any mass action query updates.
cei.account_num | cei.serial_num | cei.model | mei.serial_num | mei.model     | serial_num_str_match | row_number
123123123       | B4I4SXT1708    | null      | B4I4SXT178A    | Model_Series1 | 8                    | 1
123123123       | B4I4SXT1708    | null      | B4I4SXTAS34    | Model_Series2 | 7                    | 2
In the table example above, row_number 1 has a higher consecutive string match count than row_number 2. I want to return only row_number 1 and populate cei.model with the value of mei.model.
cei.account_num | cei.serial_num | cei.model     | mei.serial_num | mei.model     | serial_num_str_match | row_number
123123123       | B4I4SXT1708    | Model_Series1 | B4I4SXT178A    | Model_Series1 | 8                    | 1
To give an idea as to scale:
The mei data contains 1 million records and the cei data contains 50,000 records. I would have to take and perform this string match for every single cei.account_num, cei.serial_num where the cei.model data is null.
With MAC addresses, the first 6 characters identify the vendor. I could treat serial numbers similarly, as in the sample SQL below, to reduce the number of 1:many lookups taking place:
/* need to define function */
create temp function string_match_function(x any type, y any type) as (
syntax to generate consecutive string count matches between x and y
);
select * from (
select
a.account_num,
a.serial_num,
a.model,
row_number() over(partition by a.account_num, a.serial_num order by a.serial_num_str_match desc) seq
from (
select
c.account_num,
c.serial_num,
m.model,
/* needed: */ string_match_function(c.serial_num, m.serial_num) as serial_num_str_match
from (
select * from cei where model is null
) c
join (
select * from mei where model is not null
) m on substr(c.serial_num,1,6) = substr(m.serial_num,1,6)
) as a
) as b
where seq = 1
I've looked at different options, some coming from https://hoffa.medium.com/new-in-bigquery-persistent-udfs-c9ea4100fd83, but I'm not finding what I need.
Any insight or direction would be greatly appreciated.
This UDF counts how many characters the two strings have in common, starting from the beginning:
CREATE TEMP FUNCTION string_match_function(x string, y string)
RETURNS int64
LANGUAGE js
AS r"""
var i=0;
var max_len= Math.min(x.length,y.length);
for(i=0;i<max_len;i++){
if(x[i]!=y[i]) {return i;}
}
return i;
""";
select string_match_function("12a345","1234")
returns 2, because both strings start with 12.
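
For checking the logic outside BigQuery, the same prefix count and the "keep the best match" step can be sketched in Python. This is only an illustration under assumptions: `common_prefix_len` and `best_match` are hypothetical helper names, and the dictionary shape for the mei data is made up; the serial numbers and models come from the example table in the question.

```python
def common_prefix_len(x: str, y: str) -> int:
    """Count how many leading characters x and y share."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n

def best_match(serial: str, candidates: dict) -> str:
    """Return the model of the candidate serial sharing the longest prefix.

    candidates maps a known serial number to its model (hypothetical shape).
    """
    best_serial = max(candidates, key=lambda s: common_prefix_len(serial, s))
    return candidates[best_serial]

# Serial numbers and models taken from the example table in the question.
mei = {"B4I4SXT178A": "Model_Series1", "B4I4SXTAS34": "Model_Series2"}
print(common_prefix_len("12a345", "1234"))
print(best_match("B4I4SXT1708", mei))
```

In the query itself, the row_number() ordering by serial_num_str_match plays the role of `max` here.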

SQL subqueries/ subqueries within subqueries

I'm trying to learn SQL (through a company guide).
I have a list of invoices in a table and a tax code associated with each one.
All of the invoices have one tax code, except number 1000001, which has two (and therefore two records in dbo.InvoiceTaxBreakDown).
The exercise wants me to create the variables TaxCode1 and TaxCode2. All except one record should have 'NULL' in TaxCode2.
This is my code:
SELECT
InvoiceNo,
TaxCode1 =
(
SELECT
TOP 1 T.TaxCode
FROM
dbo.InvoiceTaxBreakDown as T
WHERE
I.InvoiceGUID = T.InvoiceGUID
ORDER BY
T.TaxCode DESC
)
,
TaxCode2 =
(
SELECT
TOP 1 T.TaxCode
FROM
dbo.InvoiceTaxBreakDown as T
WHERE
I.InvoiceGUID = T.InvoiceGUID
ORDER BY
T.TaxCode ASC
)
FROM
dbo.InvoiceTaxBreakDown as I
I'm not sure if I even need to reference two data sources.
Only the first record is correct! Please help.
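
As a way to pin down what the exercise is asking for, here is a minimal Python sketch of the expected result (a model of the logic, not the SQL answer). The sample rows, the tax code values `T0` and `T1`, and the `tax_code_columns` helper are all made up for illustration.

```python
from collections import defaultdict

def tax_code_columns(breakdown):
    """Map each invoice number to a (TaxCode1, TaxCode2) pair.

    breakdown is a list of (invoice_no, tax_code) rows. TaxCode2 stays
    None (SQL NULL) unless the invoice has a second, different tax code.
    """
    codes = defaultdict(list)
    for invoice_no, tax_code in breakdown:
        if tax_code not in codes[invoice_no]:
            codes[invoice_no].append(tax_code)
    result = {}
    for invoice_no, found in codes.items():
        found = sorted(found, reverse=True)  # same order as ORDER BY ... DESC
        result[invoice_no] = (found[0], found[1] if len(found) > 1 else None)
    return result

# Made-up rows: only invoice 1000001 has two tax codes, as in the exercise.
rows = [(1000001, "T1"), (1000001, "T0"), (1000002, "T1"), (1000003, "T0")]
print(tax_code_columns(rows))
```

Each invoice appears once in the result, and TaxCode2 is only filled when a second, different code exists.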

MDX Filtering dimension members with result of other dimension

I would like to filter a dimension for cube security using information that is in another dimension.
I have one dimension that holds the account responsibles (account number and the initials of the person responsible) and another dimension with all accounts.
I would like to make sure that a person can only see movements on the accounts for which they are responsible.
I can make the filtering work like this:
SELECT
{} ON 0
,{
Exists
(
Filter
(
[Accounts].[Accounts].[AccountNo]
*
[AccountResponsible].[AccountResponsible].[AccountNo]
,
[Accounts].[Accounts].Properties("key")
=
[AccountResponsible].[AccountResponsible].Properties("key")
)
,[AccountResponsible].[Responsible].&[MSA]
)
} ON 1
FROM mycube;
The problem is that the result has two columns, and I can't use that in cube security. Is there a way to rewrite this so that I get only one column with the members that the user is allowed to see?
Try using the Extract function:
SELECT
{} ON 0
,
EXTRACT(
{
Exists
(
Filter
(
[Accounts].[Accounts].[AccountNo]
*
[AccountResponsible].[AccountResponsible].[AccountNo]
,
[Accounts].[Accounts].Properties("key")
=
[AccountResponsible].[AccountResponsible].Properties("key")
)
,[AccountResponsible].[Responsible].&[MSA]
)
}
,[Accounts].[Accounts] //<<HIERARCHY YOU WISH TO EXTRACT
) ON 1
FROM mycube;
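
Conceptually, Extract takes the two-column set of tuples and keeps only the members of the hierarchy you name, dropping duplicates. A rough Python analogue, with a hypothetical `extract` helper and made-up account numbers, looks like this:

```python
def extract(tuples, position):
    """Keep only the member at `position` from each tuple, dropping duplicates
    while preserving first-seen order (a rough analogue of MDX Extract)."""
    seen = []
    for t in tuples:
        member = t[position]
        if member not in seen:
            seen.append(member)
    return seen

# Hypothetical (Accounts member, AccountResponsible member) pairs that
# survived the Filter/Exists step; the account numbers are made up.
crossjoin_result = [("1000", "1000"), ("2000", "2000"), ("1000", "1000")]
print(extract(crossjoin_result, 0))
```

The result is a single-column set, which is the shape the question is asking for.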

MDX query - best salesmen who sold all of given products

Let's say I have two simple dimensions:
Products - with id and name
Salesmen - with id and name
My fact table is named SALES and contains the ids of the abovementioned.
I need to produce a query that will show the names of salesmen who sold all of the given products.
This code solves the problem for two items X and Y:
SELECT
{} on 0,
EXISTS(
EXISTS(
{[Salesmen].[Name].MEMBERS},
{[Products].[Name].&[X]}
)
,{[Products].[Name].&[Y]}
)
ON 1
FROM [Test];
The other version is:
SELECT
{} on 0,
INTERSECT(
NONEMPTY(
{[Salesmen].[Name].MEMBERS}
,([Products].[Name].&[X])
)
,NONEMPTY(
{[Salesmen].[Name].MEMBERS}
,([Products].[Name].&[Y])
)
)
ON 1
FROM [Test];
However, this method becomes troublesome if the list of given products is large, for example 100 random products.
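
To state the requirement independently of MDX: the INTERSECT of NONEMPTY sets above is a set intersection with one set of salesmen per product, and it generalizes to any number of products. A minimal Python sketch, assuming the fact table is available as (salesman, product) pairs; the `sold_all` helper and the sample data are made up:

```python
def sold_all(sales, products):
    """Return the salesmen who sold every product in `products`.

    sales is an iterable of (salesman, product) pairs from the fact table.
    An empty product list returns an empty set here (a design choice).
    """
    sellers_by_product = {}
    for salesman, product in sales:
        sellers_by_product.setdefault(product, set()).add(salesman)
    # One set of sellers per requested product, then intersect them all.
    seller_sets = [sellers_by_product.get(p, set()) for p in products]
    return set.intersection(*seller_sets) if seller_sets else set()

# Made-up sales rows for illustration.
sales = [("Ann", "X"), ("Ann", "Y"), ("Bob", "X"), ("Cid", "Y"), ("Ann", "Z")]
print(sorted(sold_all(sales, ["X", "Y"])))
```

The product list is just data here, so 100 products cost no more code than two.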
Do you have a member_key property for the hierarchy [Products].[Name]? We can test like this:
WITH
MEMBER [Measures].[Meas1] AS
[Products].[Name].CurrentMember.PROPERTIES("KEY ID")
MEMBER [Measures].[Meas2] AS
[Products].[Name].CurrentMember.MEMBER_Key
MEMBER [Measures].[Meas3] AS
[Products].[Name].CurrentMember.MEMBERvalue
select
{
[Measures].[Meas1]
,[Measures].[Meas2]
,[Measures].[Meas3]
} on COLUMNS,
[Products].[Name].MEMBERS on ROWS
FROM [Test];
Hopefully one of the custom measures gives you a value. I'll assume Meas2 is working (swap to a different one if Meas1 or Meas3 is the one returning numbers):
WITH
MEMBER [Measures].[Meas2] AS
[Products].[Name].CurrentMember.MEMBER_Key
SET [ProdsetA] AS
FILTER(
[Products].[Name].MEMBERS
,[Measures].[Meas2] <100
)
SET [ProdsetB] AS
FILTER(
[Products].[Name].MEMBERS
,[Measures].[Meas2] >500
)
SELECT
{} on 0,
INTERSECT(
NONEMPTY(
{[Salesmen].[Name].MEMBERS}
,[ProdsetA]
)
,NONEMPTY(
{[Salesmen].[Name].MEMBERS}
,[ProdsetB]
)
)
ON 1
FROM [Test];
The <100 and >500 are important: these are the criteria that the Filter function uses. The custom set [ProdsetA] will only contain products whose MEMBER_Key is <100, whereas the custom set [ProdsetB] will only contain products whose MEMBER_Key is >500. Use the member values returned by the first script to decide what the values 100 and 500 should be in your cube context (I don't know the key values in your cube, so I just used 100 and 500 as placeholders).