SSAS and calculated dimention from multiple other dims - ssas

Given 3 dimentions DimA and DimB and DimC with some DimSk, DimId and DimName as attributes the problem is defined as "add to cube new attribute which is calculated as":
NewAttr =
CASE
WHEN DimC.DimId IN (1, 2, 3) THEN 'A ' + DimA.DimName
WHEN DimC.DimId IN (4, 5, 6) THEN 'B ' + DimB.DimName
END
All dimentions are referenced directly from Fact by SKs.
How would You solve this in multidim SSAS cube?
!Warning! Spoilers below - try to think about solution before reading about my!
My current approach is to calculate CROSS JOIN (~100x100x100) beetween Dims IDs.
Then I can calculate composite NK for DimNew as ID ~ DimA.DimId+|+DimB.DimId+|+DimC.DimId.
Then I can add this ID to Fact ETLs too, and build new ETL for new dim with NewAttr as expected.
Then I can add in cube new dim and add new fact column and join them by ID/SK.
Should work, or is there 10x better solution?
Final Fact can be like:
FactId, DimASK, DimBSK, DimCSK, DimNewSK
or
FactId, DimASK, DimBSK, DimCSK, NewAttr
second one is fast and dirty without NewDim on db - but fact can be processed partially so distinct can produce different NewAttr for same composite NK when DimNamein dims will change in time...

Okey dokey!
Seems like if You have all SKs in Fact then add new dimention with all needed, then calculate NewAtr, and then make new Dim in cube with all SKs as composite key - its key step.
Its important to not use DimNewSK from new dim from identity or something as key in cube.
Composite Key allows to make simple regular relation to fact and works without additional dummy keys.
Pictures are 1000s of words - its as simple as looks after all:

Related

How to sort connection type into only 2 rows in Qlik sense

I have a column named Con_TYPE in which there are multiple types of connections such as fiberoptic, satellite, 3g etc.
And I want to sort them only into 2 rows:
fiberoptic
5
others
115
Can anybody help me?
Thanks in advance
You can use Calculated dimension or Mapping load
Lets imagine that the data, in its raw form, looks like this:
dimension: Con_TYPE
measure: Sum(value)
Calculated dimension
You can add expressions inside the dimension. If we have a simple if statement as an expression then the result is:
dimension: =if(Con_TYPE = 'fiberoptic', Con_TYPE, 'other')
measure: Sum(Value)
Mapping load
Mapping load is a script function so we'll have to change the script a bit:
// Define the mapping. In our case we want to map only one value:
// fiberoptic -> fiberoptic
// we just want "fiberoptic" to be shown the same "fiberoptic"
TypeMapping:
Mapping
Load * inline [
Old, New
fiberoptic, fiberoptic
];
RawData:
Load
Con_TYPE,
value,
// --> this is where the mapping is applied
// using the TypeMapping, defined above we are mapping the values
// in Con_TYPE field. The third parameter specifies what value
// should be given if the field value is not found in the
// mapping table. In our case we'll chose "Other"
ApplyMap('TypeMapping', Con_TYPE, 'Other') as Con_TYPE_Mapped
;
Load * inline [
Con_TYPE , value
fiberoptic, 10
satellite , 1
3g , 7
];
// No need to drop "TypeMapping" table since its defined with the
// "Mapping" prefix and Qlik will auto drop it at the end of the script
And we can use the new field Con_TYPE_Mapped in the ui. And the result is:
dimension: Con_TYPE_Mapped
measure: Sum(Value)
Pros/Cons
calculated dimension
+ easy to use
+ only UI change
- leads to performance issues on mid/large datasets
- have to be defined (copy/paste) per table/chart. Which might lead to complications if have to be changed across the whole app (it have to be changed in each object where defined)
mapping load
+ no performance issues (just another field)
+ the mapping table can be defined inline or loaded from an external source (excel, csv, db etc)
+ the new field can be used across the whole app and changing the values in the script will not require table/chart change
- requires reload if the mapping is changed
P.S. In both cases selecting Other in the tables will correctly filter the values and will show data only for 3g and satellite

Oracle spatial request working on one instance and not on another

I have this statement that is generated by Geoserver
SELECT
shape AS shape
FROM
(
SELECT
c.chantier_id id,
sdo_geom.sdo_buffer(c.shape, m.diminfo, 1) shape,
c.datedebut datedebut,
c.datefin datefin,
o.nom operation,
c.brouillon brouillon,
e.code etat,
u.utilisateur_id utilisateur,
u.groupe_id groupe
FROM
user_sdo_geom_metadata m, lyv_chantier c
JOIN lyv_utilisateur u ON c.createur_id = u.utilisateur_id
JOIN lyv_etat e ON c.etat_id = e.etat_id
JOIN lyv_operation o ON c.operation = o.id
WHERE
m.table_name = 'LYV_CHANTIER'
AND m.column_name = 'SHAPE'
) vtable
WHERE
( brouillon = 0
AND ( etat != 'archive'
OR etat IS NULL )
AND sdo_filter(shape, mdsys.sdo_geometry(2003, 4326, NULL, mdsys.sdo_elem_info_array(1, 1003, 1), mdsys.sdo_ordinate_array(
2.23365783691406, 48.665657043457, 2.23365783691406, 48.9341354370117, 2.76649475097656, 48.9341354370117, 2.76649475097656, 48.665657043457, 2.23365783691406, 48.665657043457)), 'mask=anyinteract querytype=WINDOW') = 'TRUE' );
On my local instance (dockerized if that can explain anything) it works fine, but on another instance I get an error :
ORA-13226: interface not supported without a spatial index
I guess that the SDO_FILTER is applied to the result of SDO_BUFFER which is therefore not indexed.
But why is it working on my local instance ?!
Is there some kind of weird configuration shenanigan that could explain the different behavior maybe ?
EDIT : The idea behind this is to get around a bug in Geoserver with Oracle databases where it renders only the first point of MultiPoint geometries, but works fine with MutltiPolygon.
I am using a SQL view as layer in Geoserver (hence the subselect I guess).
First, you need to do some debugging here.
Connect to each instance, on the same user as your Geoserver's datasource, and run the sql. From the same connections (in each instance) you must also verify that the user's metadata view (user_sdo_geom_metadata) have an entry for the table and the table has a spatial index - whose owner is the same user as the one you connect.
Also, your query ( select ... from 'vtable') has a column 'shape' which is a buffer of the column lyv_chantier.shape. The sdo_filter, in this sql, expects a spatial index on the vtable.shape - which cannot exist. You should try to use a different alias (e.g. buf_shape) and sdo_filter(buf_shape,...) - to see if the sql fails in both instances, as it should.
I'm in a bit of a hurry right now, so my instructions are summarized. If you want, do this debugging and post the results. We then can go into details.
EDIT: Judging from your efforts, I'd say that the simplest approach is: 1) add a second geometry column to lyv_chantier (e.g. buf_shp). 2) update lyv_chantier set buf_shp = sdo_geom.sdo_buffer(shape,...). 3) insert into user_sdo_geom_metadata the values (lyv_chantier, buf_shp, ...). 4) create a spatial index on column buf_shp. You may need to consider a trigger to update buf_shp whenever shape changes...
This is a very practical approach but you don't provide any info about your case (what is the oracle version, how many rows does the table have, how is it used, why do you want to use sdo_buffer, etc), so that's my recommendation for now.
Also, since you are, most likely, using an sql view as layer in Geoserver (you don't say anything about that, either), you could also consider using pure GS functionality to achieve your goal.
At the end, without describing your goal, it's difficult to provide anything more tailor-made.

ADO.NET - Accessing Each DataView in DataViewManager

Looks like a silly question, but I can't find a way to access the DataViews in my DataViewManager.
I can see it in the DataViewManager Visualizer window, so there must be a way.
What am I doing wrong?
dvm = New DataViewManager(MyDS) ''-- MyDS is a strongly typed dataset
dvm.CreateDataView(MyDS.Company)
dvm.CreateDataView(MyDS.Sites)
MsgBox(dvm.DataViewSettings.Count) ''-- shows 7, even though I added only 2.
For Each view As DataView In dvm ''-- Error!
MsgBox(view.Table.TableName)
Next
I also observed that irrespective of how many DataViews I create, data the DataViewManager Visualizer shows all DataViews in my dataset. Why?
how do I hide those rows in parent whose child data view returns 0 rows after applying RowFilter on child
I've done it like this, but it feels like a nasty hack; I've never read the source deeply enough to know if there is a better way:
Add a column to your child datatable: Name: IsShowing, Type: Int, Expression: 1, ReadOnly: True
Put the following code:
ChildBindingSource.RemoveFilter()
ParentBindingSource.RemoveFilter()
YourDataSet.ChildDataTable.IsShowingColumn.Expression = ""
YourDataSet.ChildDataTable.Expression = $"IIF([SomeColumn] Like '{SomeFilterText}',1,0)"
ChildBindingSource.Filter = "[IsShowing] > 0"
ParentBindingSource.Filter = "Sum(Child.IsShowing) > 0"
The removal and re-add triggers a re-evaluation of the expression and the filters. There is probably a way to do this without removing/re-adding but I haven't yet found it.. Expressions are normally only re-evaluated when row data changes; changing an expression doesn't seem to recalculate all the row values/trigger a refresh of the relations and BS filters
It would be great if the parent filter supported complex expressions like SUM(IIF(Child.SomeColumn = 'SomeFilter',1,0)>0 but the SUM operator expects only a column name in the parent or child. As such, the circuitous route of having a column with an Expression be the part inside the SUM is the only way i've found to leverage the built in filtering
Remember when you test that the search is case sensitive. If you want it not to be you might have to have another column of data that is the lowercase version of what you want to search and lowercase your query string

Is there a more efficient way to return all records in a "traversed" link'ed list?

I have a generic (non-V) class with a LINKMAP field that links a chain of time-series data. Each record links to the next record based on a key of the next month/day/hour/etc. and a value of the #rid of the next record. I could use edges (non-lightweight, with a single property the same as the LINKMAP key), but I don't want to because I only need a one-direction link and want to use the least amount of storage space (plus my assumption is that while using edges might offer slightly easier querying, it wouldn't offer any benefit in terms of performance or storage space).
I currently have the following batch query:
begin;
let r = select from data where key='AAA';
let y = select expand(links['2017']) from $r;
let m = select expand(links['07']) from $y;
let d = select expand(links['15']) from $m;
let h = select expand(links['10']) from $d;
return (select $r[0], $y[0], $m[0], $d[0], $h[0])
This works, in the sense that I get a result set with the #rids of each record in the link'ed list.
I have two questions:
Is there a more efficient way to do this query? I've tried various TRAVERSE options with $path and such, but nothing else I attempted worked at all. I did not try MATCH because I'm not using edges.
The current result set is flat, with each value prop ($r, $y, etc.) containing the corresponding #rid. That's OK for my purposes, but I'd like to know if there's a way to return the actual records instead of just the #rids. Nothing I've tried works.
PS - I'm using orientjs, but the batch runs identically from the console as well.

Django ORM Cross Product

I have three models:
class Customer(models.Model):
pass
class IssueType(models.Model):
pass
class IssueTypeConfigPerCustomer(models.Model):
customer=models.ForeignKey(Customer)
issue_type=models.ForeignKey(IssueType)
class Meta:
unique_together=[('customer', 'issue_type')]
How can I find all tuples of (custmer, issue_type) where there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
for issue_type in IssueType.objects.all():
for customer in Customer.objects.filter(
issuetypeconfigpercustomer__issue_type=None
):
yield customer, issue_type
missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
[update]
I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
In Django, every Queryset is either a list of Model instances or a dict (values querysets), so it is impossible to return the format you want (a list of tuples of Model) without some Python (and possibly multiple trips to the database).
The closest thing to a cross product would be using the "extra" method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, "issue_type_id", containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can have something close to what you want - this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
(customer, IssueType.objects.get(pk=customer.issue_type_id))
for customer in
Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
]
Not only this requires the same number of trips to the database as the "generator" based version but IMHO it is also less portable, less readable and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
issue_list = dict(
(issue.id, issue)
for issue in
IssueType.objects.filter(
pk__in=set(m.issue_type_id for m in missing)
)
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
This is all untested, sorry if some homework was left for you.