Redaction in Tableau - data-visualization

I am currently in the process of building some Tableau workbooks where we will need to redact visualizations or text tables if the results fall below a certain threshold (e.g. only ten data points are returned after filters are applied). Does anyone know how to create calculated fields or know of other methods to redact in Tableau?

You can create a threshold filter that compares the number of filtered responses to a threshold value set in a parameter.
First, create a parameter with integer data type and set it to the desired threshold. In this example, I called it Count Threshold.
Then create a calculated field for the filter with an equation like the following:
{FIXED: COUNTD([Respondent ID]) >= [Count Threshold]}
(I did this for survey results where we needed to hide results if the filtered number of respondents was fewer than 10.)
For the threshold filter to be applied after your other filters, choose "Add to Context" for your other filters.

I found a partial solution on the Tableau community forum/knowledgebase about redaction that might work for other implementations.
The basic idea is to create two different calculated fields, one which displays a integer value and the other that displays a string value. That way, when both are concatenated in the display you get the desired output without breaking any of the calculated field rules.
So create a calculated field that has a formula like:
IF sum([Datafield_to_Redact]) < 10 THEN "*" ELSE str(sum([Datafield_to_Redact])) END
And another that has a calculated field that has a formula like:
IF sum([Datafield_to_Redact]) < 10 THEN null ELSE sum([Datafield_to_Redact]) END
In the post the attached workbook and screenshot show how the two values are concatenated in the Text mark.
Workbook screen capture

Related

Spotfire calculated column based on filter

I want to create a calculated column that is equal to the percent of the total of the previous column, but only for the rows that are selected.
For example, the two columns below show rows where I've filtered for only the rows of interest. The sum is accurate, but I want the percent of the total to be only out of sum of the currently selected rows rather than the absolute total (which is how it's currently being calculated). I want the percent of the total to dynamically change depending on what is filtered in the data table. Is this possible?
Image of my 2 columns:
I know spotfire somehow calculates this becuase when I insert a bar graph using % of Total(SumofComponents) the ratio is only out of the current total.
Image of my bar chart:
much like in programming, Spotfire has a rough concept of scope. and unfortunately, calculated columns are above filters in terms of scope; they have no concept of what is and is not filtered.
visualizations themselves, however, do*!
what you can do in this case is to put your expression on the Y axis (it looks like you've done this), and it will respect your selected filters*.
*caveats: there are a few ways that filtering can be negated on a given visualization:
Properties>>Appearance>>Show shadows indicating the unfiltered data. this option shows a grey bar that represents the data hidden by whatever filter selection was made
Properties>>Data>>Limit data using filterings. these options allow the viz to use separate filtering sets ("Filter Schemes" as they're called in Spotfire) or none at all (to ignore filters completely)
Properties>>Subsets. by default there are three subsets: "All Data" which ignores filters, "Not in current filtering" which inverts the filter selection (e.g., if you filter a boolean column to show only TRUE, this chart would show only FALSE), and "Current filtering" which is the default behavior. you can check the online help for additional subsets that you can add.

SSRS Dynamic Graphs

Does anyone know if it is possible to create a graph(s) at run time based on the dataset?
To clarify, I have a count of patients suffering from a health condition, split by week. I need to make a graph per condition with weeks on the X axis and patient count on the Y. Nice and easy so far.
The problem is that the number of conditions displayed in the dataset will be different depending on the values entered for the start and end dates for the reporting date range.
With this in mind can I create a single graph then tell it to replicate once for each condition returned and only look at the data for that specific condition?
The graphs can't appear in rows as they must aggregate data from multiple rows (where the condition is the same) and plot the various count values over week numbers (the dataset returns a count, a week number and a condition with a group by on the week number and condition)
As an added challenge none of this can be hard coded as the single report has to work across multiple sites.
Thanks
P
Yes this is possible by first dropping a "Matrix" control into the report surface. With the Matrix control, you're able to display groups of data. In this case, your group will be the condition returned in your DataSet. Each group will have an embedded chart which will display data the same way, but only the data within the grouping you choose.
Step 1: Add a Matrix control to your report surface. Create a Row Group based on Condition (In my example, Year)
Step 2: Right Click the empty column on the right side of the Matrix control, and choose the option for Insert Column > Outside Group - Right. Then Delete the middle column.
Step 3: Right click the Right columns "Data" cell (which should be outside the grouping) and choose Insert > Chart. Select the desired chart type.
Step 4: Resize the column and row to view the chart in more detail. Edit the Chart Data to aggregate what you're wanting to show as the line, and pick your category groups.
Step 5: Test, and revist whichever step above needs adjusting.

Qlikview: how to create summary table to filter multiple associated tables

I have 4-5 tables of single and many rows per ID. I want to generate a summary table listing each ID along with various counts and max/mins, but I want to be able to filter on calculations. Example: "ID" is the identifier and there are two tables, TestA and TestB.
One desired selection criteria: Show only those IDs where at least one TestA score >5 and there is at least one TestB score.
In a straight table, this is simple to do with expressions, but the resulting table cannot be selected on the calculated true/false value.
I think I need to create a new table in the load script containing the ID, and then various conditions labeled as I wish. Then, these fields could be dimensions. This seems similar in concept to a master calendar. Am I on the right track?
If it helps to understand my example, this a medical application; the tables are lab results and other interventions that each require complex queries pulling data from various sources that are very "hard-coded" to produce a small data set from millions of rows of highly normalized source data. The desired dimensions would be combinations of the labs so as to allow identification of patients who meet certain criteria--then, once filtered, there would be many more graphs and charts to identify what tests and procedures were followed for that group of patients.
My current data model just loads many tables which then associate on ID. I had attempted to load all data into one big table using concatenates and calculations, but this did not seem to accomplish what I needed and was difficult to manage.
IIUC, I think what you want to do can be accomplished with a combination of sliders/input boxes, variables and calculated dimensions in your table. The process is definitely burdensome, but it should allow you filter the way you want.
Add a field to your table load statement in your script like rnum as RowNo().
Create a variable for your filter(s). Ex. vFilterTestAScore.
Add a slider or input box to your dashboard and point it to that variable.
a. For slider, the option is in the General tab -> Data header -> select the Variable radio button.
b. For input box, add the correct var from the list to the list of Displayed Variables.
Set sliders/input boxes to the criteria you want: vFilterTestAScore = 5 and vFilterTestBScore = 1
Create a straight table with ID as the dimension and expressions for TestAScore and TestBScore. The expression formulas would be sum(TestAScore) and sum(TestBScore) respectively (this won't make sense until the next step).
Now add a calculated dimension to you table. The idea here is that rather than just having the ID dimension, you will create a calculated dimension that only displays the ID of the records that meet the criteria you select in the slider or enter in the input box. The formula should be something like:
if(aggr(sum(TestA), rnum) >= vFilterTestAScore, ID, null()) or for multiple filters: if((aggr(sum(TestA), rnum) >= vFilterTestAScore) and (aggr(sum(TestB), rnum) >= vFilterTestBScore), ID, null()).
On your new calculated dimension, check the 'Suppress When Value is Null' box so only results that meet your criteria are displayed in your table.
To summarize, you are using the variables to store your selection criteria which you are entering via input box or slider. Then you are conditionally displaying only the ID's in your table that match those criteria via a calculated dimension and 'Suppress When Null' option.
I can send you a .qvw if you aren't using the free personal edition and are able to open other qvw's.

Reporting Services - Two filters on the same chart Category Group?

I have sales data that I'd like to plot on my chart. However, at a specific point in time, we had a change taking place I'd like to ensure is clearly visible in the chart, preferably by dividing the sales data (which is stored in a single SQL Server column) into two different chunks, which would allow me to then treat them as different data series.
I used to solve this in Excel by storing the post-event data in a different column (by simply dragging them to a different column), and thus I was able to treat them as a different series (the blue and green line in the chart below. The red and orange line are pre-event and post-event averages):
I'd like to reproduce this effect in SSRS, but am not sure how to tackle it. I've tried using an approach where I added two category groups, both pointing to the date-time column, and applying filters to them (one <= the cutoff date, the other >=).
I then added my sales data twice, with the idea I could somehow connect them to the individual category groups, but that does not seem possible.
Has anyone tried anything like this before, or would have a different approach to achieve what I'm trying to get?
Thanks!
I managed to get this to work, and figured I'd share how to do it.
My dataset contains a field called DATEKEY, which stores the date in the format YYYYMMDD. It's possible to use this in an expression and evaluate the date for a specific row. In case the expression evaluates to true, we display the value. If not, we display a blank string.
In case we want to show the values prior to the date, the expression would be:
=IIF(Fields!DATEKEY.Value <= 20130601, Avg(Fields!My_NUMBER.Value), "")
The second series can then be made by reversing the symbol:
=IIF(Fields!DATEKEY.Value >= 20130601, Avg(Fields!My_NUMBER.Value), "")
The graph then looks like this:

Qlikview Expression to get Average in comparison to selected criteria for Chart

I am trying to create a chart that displays the average of a parameter of an entire category. The problem is that in sheet 1,2,3,4, etc, the charts I have reflect one parameter, so when I go to compare this parameter to the average in my new sheet, I only get the average of the fields that have been selected in prior sheets. I want it so that I can create a new chart with one bar as an average of the entire population, and another bar in the same chart that depicts the average of the selected parameter(s). Ideally, I would have two dimensions/x axis categories, one for the selected criteria, and one for average of all categories. Can someone help?
Thanks.
I believe the way to handle this is with set analysis. You can create one expression with the normal avg(parameter) and another expression with some embedded set analysis. See example below.
Expression1: avg(parameter) // will agregate everything in the current selection
Expression2: avg({1<category='YourCategory'>}parameter) // will aggregate everything in a category that you set in the set analysis syntax.
With some more specific info I would be able to get you an actual expression, but hopefully this helps.
EDIT
Updated to reflect new information provided in the comments.
Expression1: avg(parameter)
Expression2: avg({1}parameter)
In set analysis syntax the {1} means the entire universe of data where as {$} is your current selection. In other words, 1 is your data with nothing selected. In my first example above, the {1<category='YourCategory'>} statement uses a modifier signified by the <> so that you would be performing the aggregation over the entire universe with only the value specified in YourCategory selected.
So if the scenario is that you have currently selected let's say, 5 zip codes, your current selection ({$}) would be those 5 zip codes and the entire universe ({1}) will be as if you selected no zip codes. Therefore comparing those two populations side-by-side in two Expressions should give you the comparison you want.
Caveat: Using the {1} syntax in your expression means that your average of "all zip codes" will always be the average with nothing selected unless you put something in the modifier as in my first example. See this more detailed explanation.