Binary Sankey Diagram in Tableau - Not All Activities Match The Corresponding Number of KPIs

How do I link my activities variable to only the corresponding KPIs variable?
Using guidance from a number of sources, but primarily Jeffrey Shaffer's approach as presented in the SuperDataScience video, I built a Sankey diagram for my work. For the most part it works; however, I have been trying to figure out how to adjust my Sankey diagram model so that each activity lines up with ONLY its corresponding KPIs, and am having no luck.
The data structure looks like this:
You'll note I changed the binary values from 0, 1 to "", 2, as that makes the visual calculations easier. For the "Viz" variable, I set "Activity" for the raw data set, then copy/paste the data to mirror it (required for the model), with "KPI" for the mirrored rows.
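As an aside, that mirroring step can be scripted. A minimal pandas sketch, assuming hypothetical column names (Activity, KPI, Link) rather than the actual workbook fields:

import pandas as pd

# Hypothetical raw rows: one per Activity/KPI pair, with 2 marking a link
# (per the ""/2 convention described above).
raw = pd.DataFrame({
    "Activity": ["A1", "A1", "A2"],
    "KPI":      ["K1", "K2", "K1"],
    "Link":     [2, 2, 2],
})

# Tag the original rows as the "Activity" side of the Sankey, then append
# an identical mirrored copy tagged as the "KPI" side.
sankey_input = pd.concat(
    [raw.assign(Viz="Activity"), raw.assign(Viz="KPI")],
    ignore_index=True,
)
print(sankey_input)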
In the following image, you'll see my main issue: the smallest represented activity still shows as corresponding to all KPIs, when in fact it does not. I want each activity to line up only with its corresponding KPIs, as some activities don't correspond with all, or even any, KPIs.
Finally, here is the model very similar to what the above video link shows:
Can someone help provide insight into how I can adjust the model to fit activities linking only to corresponding KPIs? I appreciate any insight. Thanks!

I have a solution to the issue, thanks to a helpful Tableau support member named Anthony. It was in the data structure. The data was not structured to associate each "Activities" value only with its own "KPI" values, as Tableau requires, but rather every "Activities" value with every "KPI" value. To achieve the desired result, the data needs to be restructured to contain a row for each valid "Activities" and "KPI" combination only. See the visual below, where the invalid rows are removed:
[image: the restructured table, with rows for invalid Activities/KPI combinations removed]
Once the table is restructured, the model produces the desired visual. It works like a charm!
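The restructuring itself is just a filter over the cross-joined data. A sketch in pandas, again with hypothetical column names:

import pandas as pd

# Hypothetical cross-joined structure: every Activity paired with every KPI,
# with Link == 2 marking a real correspondence and missing marking none.
crossed = pd.DataFrame({
    "Activity": ["A1", "A1", "A2", "A2"],
    "KPI":      ["K1", "K2", "K1", "K2"],
    "Link":     [2, None, None, 2],
})

# Keep only the valid Activity/KPI combinations; drop the rest entirely.
valid = crossed[crossed["Link"] == 2].reset_index(drop=True)
print(valid)  # only A1/K1 and A2/K2 remain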
Good luck out there!

Related

Insert zeros instead of interpolating with ARIMA_PLUS in BigQuery

I want to do ARIMA_PLUS forecasting on a series of sales records. The problem is that the records only contain actual sales. For the forecast, we need to insert the "non-sales" for every product, which are essentially rows with the import column set to zero for every day the product was not sold. There are two options:
Fill the database with those zero-rows (uses a lot of space)
When forecasting with ARIMA_PLUS in BigQuery, tell the model to fill with zeros instead of interpolating (the default and seemingly only option).
I want to follow the second option, yet I don't see how. Here is a screenshot of Google's documentation about interpolation.
The first option could be carried out with a MERGE; nevertheless, I would prefer to discard it, since it increases the size of the sales table.
I have scanned the documentation and haven't seen any solution.
You need to provide an input dataset covering the missing values with the right method for your use case.
In other words, the SQL query must solve the interpolation so that the input for the model already contains the expected data.
You can, for example, create a query that adds a linear interpolation (or, in your case, a zero fill) for your use case.
So, the first approach you mentioned can be solved with that input SQL (rather than adding the data to the source table), and the second approach is not valid in BigQuery, as far as I know.
Here you have an example: https://justrocketscience.com/post/interpolation_sql/
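To sketch the zero-filling variant of that idea: generate the dense product-by-day grid in the query itself and left-join the sparse sales onto it, so nothing extra is ever stored. The table and column names (my_dataset.sales, product_id, sale_date, import) and the date range are assumptions for illustration:

from google.cloud import bigquery

client = bigquery.Client()

# Densify the sparse sales table at query time instead of storing zero rows.
# Table, column names, and dates below are assumptions, not the real schema.
sql = """
WITH days AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY('2023-01-01', '2023-12-31')) AS day
),
grid AS (
  SELECT p.product_id, d.day
  FROM (SELECT DISTINCT product_id FROM my_dataset.sales) AS p
  CROSS JOIN days AS d
)
SELECT
  g.product_id,
  g.day,
  IFNULL(s.import, 0) AS import   -- zero instead of interpolation
FROM grid AS g
LEFT JOIN my_dataset.sales AS s
  ON s.product_id = g.product_id AND s.sale_date = g.day
"""

dense = client.query(sql).to_dataframe()

The same SELECT (or a view over it) can then serve as the training input for the ARIMA_PLUS model, so the zero rows never have to be materialized in the source table.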

BIRT Filter by Parameters Returns a Blank Report When It Shouldn't

Let me further elaborate on my concern:
I am working on some test reports on BIRT to be familiarized with it and came across an unsettling problem.
I created a data source that connects to a test SQL Server database, a data set that returns the building, floor, room, and number of employees for rooms containing more than one employee, and two String parameters that let the user choose the building and floor so the report filters on just those.
The problem happens when I test it with a building and floor that I know are in the result set. For some reason the report comes back blank, as if the building and floor were not present in the data set result.
I tried filtering by just the building first and then the floor, but the same thing happens. If I take the filter out, the report shows up without a problem.
Why does this happen? I am assuming it's the way I input the parameters, but I am not sure.
Can anyone help me? Thanks!

Labeled text dataset for evaluating information retrieval results

I have created a semantic text search engine. However, I cannot find a labeled data set with which to evaluate my system's information retrieval.
Is there any publicly available text document collection that is labeled? I need it to evaluate the retrieval results (recall, precision, F1 score, ...).
Thanks.
I do research in this direction. In all my research, I have used the AOL dataset, which consists of ~20M web queries collected from ~650k users over three months (March 1, 2006 to May 31, 2006). The data is sorted by anonymous user ID and sequentially arranged.
The data set includes {AnonID, Query, QueryTime, ItemRank, ClickURL}. More details can be found in the link mentioned above. I am interested to know how you have implemented it and, if possible, would like you to share your engine's code. I am also interested in your search engine's performance on the AOL dataset.
You can find the dataset in my git repository. Thanks!
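For the evaluation itself, once you have relevance labels, the metrics mentioned in the question are easy to compute per query. A sketch, assuming the AOL log is a tab-separated file with the five fields above (the file name and header handling are assumptions about the distributed format):

import pandas as pd

# Load an AOL-style query log; columns match the fields listed above.
# File name and header row are assumptions; adjust to the actual files.
log = pd.read_csv(
    "user-ct-test-collection-01.txt", sep="\t", header=0,
    names=["AnonID", "Query", "QueryTime", "ItemRank", "ClickURL"],
)

def precision_recall_f1(retrieved: set, relevant: set):
    """Standard set-based IR metrics for a single query."""
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: documents the engine returned vs. the labeled relevant ones.
print(precision_recall_f1({"d1", "d2", "d3"}, {"d2", "d3", "d4"}))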

Arranging dimensions for clustering with SSAS

I am having some trouble with SSAS and data mining - specifically the Microsoft Clustering package.
I intend to ultimately do my work in AMO and MDX, but for now, just happy to understand how it works in the BIDS via Visual Studio. One step at a time!
The whole problem is around clustering both "vertically" and "horizontally" (separately) from a table that is organized vertically. My main source data table in my OLTP database looks like this:
ID_NUM: numbers 1 - 20,000
TECK_ID: numbers 1 - 500, for each ID_NUM (though I just grabbed a few of these for playing around with the data in the screencaps)
TECK_VALUE: a double, the 'fact' bit
So: 10 million rows of two ints and a double.
Which looks like this- http://i.imgur.com/KG1LhaJ.jpg
So I create a new Analysis Services project in Visual Studio, set up a Data Source, and bring in the above table, as well as two "dimension tables" (identity of what id_num is, names of what each teck_id is) into a Data Source View and link it up, matching up the appropriate keys.
Which looks like this- http://i.imgur.com/Q0vgwIc.jpg
Next I want to manipulate how my data is represented, so I go to set up a cube from this Data Source View. I create dimensions based on my two "dimension" tables (the above "id_num" primary key one, and the "teck_id" primary key one), and create a single measure (as sum) of the teck_value column from my main table. This all seems to compile successfully.
Which looks like this- http://i.imgur.com/y5pUSjh.jpg
The reason I think everything has worked well is I can arrange my data how I want by browsing the cube. I am able to define my "rows" as both the id_num, or as the "teck_id", with the other one filling up the columns. The measure "Teck_value" always makes up the dataset of the table. This is exactly how I want it, the flexibility to arrange my data both ways.
Which looks like this- http://i.imgur.com/ugLUkgg.jpg
And this- http://i.imgur.com/RwQgj58.jpg
Beautiful! Now I wish to do some mining on this basis!
I wish to, quite simply, use Microsoft Clustering to (separately):
Assign each TECK_ID a cluster number based on how it varies on each ID_NUM
Assign each ID_NUM a cluster based on how it varies on each TECK_ID
Seemingly a simple requirement - just changing what is represented as "rows" and what as "columns" - which I already appear to be able to do through the cube browser (see the sketch below). This seems to be one of the main points of OLAP rather than OLTP, from my uneducated perspective!
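To make the two orientations concrete: outside of SSAS they amount to pivoting the long table both ways. A pandas sketch using the column names above:

import pandas as pd

# Long/vertical source table: one row per (ID_NUM, TECK_ID) with its value.
df = pd.DataFrame({
    "ID_NUM":     [1, 1, 2, 2],
    "TECK_ID":    [10, 11, 10, 11],
    "TECK_VALUE": [0.5, 1.2, 0.7, 0.9],
})

# Case table for clustering ID_NUMs: one row per ID_NUM, one column per TECK_ID.
by_id = df.pivot(index="ID_NUM", columns="TECK_ID", values="TECK_VALUE")

# Case table for clustering TECK_IDs: just swap the roles.
by_teck = df.pivot(index="TECK_ID", columns="ID_NUM", values="TECK_VALUE")

Each row of the pivoted frame is one "case" with one column per attribute, which is the shape a clustering algorithm expects.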
Yet when I try to set this up I fail utterly!
The Clustering Wizard leaves me confounded and I come up with nonsense results. I am given the option of selecting a key (for which I can choose either of the above), but no option to parse by the other dimension. Indeed, the only thing I can choose to mine on is TECK_VALUE, which isn't any good as that doesn't separate out the different fields!
My wizard looks like this- http://i.imgur.com/lHfasv0.jpg
So, I am left in a pickle. I really don't want to go back and arrange my OLTP tables horizontally, because 1) this would mean having 20k columns when I try to categorize my TECK_IDs, and 2) I was hoping SSAS and OLAP could give me the flexibility I need to mine the fields that I want - isn't part of the reason you set up a cube to "chop up the data how you like"?
Bonus points for helping me with the AMO / MDX side as well! :)

Accommodating Dynamic Hierarchies in a Data Warehouse Model

I am building a data warehouse for the core ERP application of the company I work for, for a particular client.
Most of the data in the source database that is related to hierarchies in the data warehouse is stored in columns, as shown below:
Traditionally, though, to my knowledge, the model to store dimension data looks like this:
I could pivot the data and fit it into the model shown above. But the issue comes when a user introduces a new hierarchy value. Say, for instance, the user decides in the future to define a new level called Product Sub Category. Then my entire data warehouse model will collapse, with no way to accommodate the newly defined hierarchy level.
Do let me know a way to overcome this situation.
I hope my question is clear enough. Just let me know if further details are needed.
Well, nothing should collapse -- the ETL should extract and load the data as always.
Here are a few options to consider:
Simply add one more column to dimProduct for the new hierarchy level.
Try using a hierarchy helper (bridge) table.
Consider adding a path string attribute to dimProduct (see the sketch below).
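To illustrate the path string option: each member's full lineage is stored as one delimited attribute, so a new level only makes the paths longer instead of forcing a schema change. A minimal sketch, assuming a hypothetical parent-child source for the hierarchy:

# Hypothetical parent-child hierarchy rows: member -> parent; None = top level.
rows = {
    "Bikes":      None,
    "Road Bikes": "Bikes",
    "Road-150":   "Road Bikes",
}

def path_string(member: str) -> str:
    """Walk up the parent pointers and join the lineage into one attribute."""
    parts = []
    while member is not None:
        parts.append(member)
        member = rows[member]
    return "/".join(reversed(parts))

print(path_string("Road-150"))  # Bikes/Road Bikes/Road-150
# A new level (e.g. Product Sub Category) only deepens the path; dimProduct
# keeps a single path column and its schema does not change.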