Issue displaying empty values of repeated columns in Google Data Studio - google-bigquery

I've got an issue when trying to visualize some information from a denormalized table in Google Data Studio.
Context: I want to gather all the contacts of a company and their related orders in a table in BigQuery. Contacts can have no orders or multiple orders. Following BigQuery best practices, this table is denormalized and all the orders for a client are stored in an array of structs. It looks like this:
Example fields:
+-------+------------+-------------+-----------+
| Row # | Contact_Id | Orders.date | Orders.id |
+-------+------------+-------------+-----------+
| 1     | 23         | 2019-02-05  | CB1       |
|       |            | 2020-03-02  | CB293     |
| 2     | 2321       | -           | -         |
| 3     | 77         | 2010-09-03  | AX3       |
+-------+------------+-------------+-----------+
The issue is when I want to use this table as a data source in Data Studio.
For instance, if I build a table with Contact_Id as the dimension, everything is fine and I can see all my contacts. However, as soon as I add any dimension from the Orders struct, contacts with no orders disappear. For instance, all info for Contact_Id 2321 is removed from the table.
Has anyone found a workaround to visualize these empty arrays (for instance as NULL values)?
The only solution I've found so far is to build an intermediary table with the orders unnested.

The way I've just discovered to work around this is to add an extra field in my Data Studio -> BigQuery connector:
ARRAY_LENGTH(fields.orders) AS numberoforders
This returns zero if the array is empty. You can then create calculated fields within Data Studio that use the numberoforders field to force values to NULL or zero.
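For instance, as a custom query in the connector (a minimal sketch assuming the table layout from the question; the table name is illustrative):
SELECT
  Contact_id,
  Orders,
  ARRAY_LENGTH(Orders) AS numberoforders
FROM myproject.mydataset.mytable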

You can fix this behaviour by slightly changing the query on your BigQuery connector.
Instead of doing this:
SELECT
  Contact_id,
  Orders
FROM myproject.mydataset.mytable
try this:
SELECT
  Contact_id,
  IF(ARRAY_LENGTH(Orders) > 0, Orders, [STRUCT(CAST(NULL AS DATE) AS date, CAST(NULL AS STRING) AS id)]) AS Orders
FROM myproject.mydataset.mytable
This way you force the repeated field to contain, at minimum, an array with NULL values, and hence Data Studio will represent those missing rows.
Also, if you want to create new calculated fields using one of the nested fields, you should first check whether the value is NULL; otherwise the NULL rows will be filled with a computed value. For example, if you have a repeated and nested field which can be 1 or 0, and you want to create a calculated field swapping the value, you should do:
IF(myfield.key IS NOT NULL, IF(myfield.key = 1, 0, 1), NULL)
Here you can see what happens if you check for NULL before swapping and if you don't:
| Original value | No check | Check |
|----------------|----------|-------|
| 1              | 0        | 0     |
| 0              | 1        | 1     |
| NULL           | 1        | NULL  |
| 1              | 0        | 0     |
| NULL           | 1        | NULL  |

Related

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
| KEY      | DATA   |
|----------|--------|
| V9864653 | 180288 |
| V9864653 | 22189  |
| V9864811 | 11464  |
| V9864811 | 12688  |
What I am having troubles with is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
| KEY      | DATA   |
|----------|--------|
| V9864811 | 11464  |
| V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not V9864653. If I inverse that and put MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numerical, or a VARCHAR?
If it is character data, MIN and MAX compare the values character by character, so '180288' sorts before '22189' (because '1' < '2'); that is why the results look inconsistent between the two keys.
If you can, it's better to change the column to a numeric type, maybe an integer if you aren't dealing with fractions and it's just round numbers.
If not, you could cast the values to an integer, but if there are lots of transactions or it's a big table it will be slow and not ideal. It's bad practice to do that if you could just change the datatype!
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
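Since you mentioned turning this into a DELETE, here is one possible sketch (assuming your DB2 version accepts a correlation name in DELETE; it keeps the highest mileage per KEY and removes the rest):
DELETE FROM dbo.MileageImport m
WHERE CAST(m.DATA AS INT) < (
    SELECT MAX(CAST(d.DATA AS INT))
    FROM dbo.MileageImport d
    WHERE d.KEY = m.KEY
);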

Collecting multiple values from different rows and creating columns from them in SQL

I want to import data from one sql database to another. The database containing the data is structured differently than the one I have now.
My database has the tables Person and Person_Data
Person columns:
| id (PK, int) | Person_Name (text) | Person_Data_id (FK, int) |
Person_Data columns:
| Person_Data_id (PK, int) | Date_Of_Birth (text) | City_Of_Birth (text) | Favorite_City (text) |
The other database has the necessary data to populate this, but is structured a bit differently. It has these tables:
ExternalPerson, ExternalProperty
ExternalPerson columns:
| PersonID (PK, int) | Name (string) |
|--------------------|---------------|
| 0                  | "John"        |
| 1                  | "Bob"         |
ExternalProperty columns:
| PersonId | PropertyName | PropertyAttribute | PropertyValue |
|----------|--------------|-------------------|---------------|
| 0        | "Birth"      | "City"            | "Rome"        |
| 1        | "Birth"      | "City"            | "Vienna"      |
| 0        | "Birth"      | "Date"            | "1982-02-01"  |
| 0        | "Favorite"   | "City"            | "New York"    |
As you can see, the external database contains information that could be inserted into the regular one; it's just that some of the columns are stored as rows instead. I want to merge it so that, for each PersonID, we pick up the value for Birth/City and put it in City_Of_Birth, etc. The external database is structured so that each combination of PersonID, PropertyName, and PropertyAttribute has only one row, so there is no risk of ambiguity. All combinations of PropertyName and PropertyAttribute present in the external database also have a corresponding column in the Person_Data table. There might be missing data, though: in our case, Bob does not have a value for date of birth or favorite city, in which case those entries should be NULL. That is, I want to transform the two tables ExternalPerson and ExternalProperty into:
| id (PK, int) | Name   | Date_Of_Birth | City_Of_Birth | Favorite_City |
|--------------|--------|---------------|---------------|---------------|
| auto         | "John" | "1982-02-01"  | "Rome"        | "New York"    |
| increment    | "Bob"  | NULL          | "Vienna"      | NULL          |
I have tried various combinations of JOIN, GROUP BY, SELECT CASE WHEN, and COALESCE to no avail. I feel like this should be possible, but I have not succeeded in finding the SQL commands to pivot the rows from the external database into columns. For example, the query
SELECT
    Name,
    PropertyValue AS City_Of_Birth
FROM
    ExternalProperty
WHERE PropertyName LIKE 'Birth' AND PropertyAttribute LIKE 'City'
will output the City_Of_Birth in a single column together with Name, but I don't know how to aggregate the result.
Does anybody have any idea on how to do this? Thanks in advance.
I am using Microsoft SQL Server Management Studio 2017 and Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
You can aggregate with MAX()
MAX(CASE WHEN PropertyName LIKE 'Birth' AND PropertyAttribute LIKE 'City' THEN PropertyValue ELSE NULL END) AS City_Of_Birth
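Putting it together for the example tables, a sketch using conditional aggregation (the LEFT JOIN keeps people like Bob who are missing some properties; you can then INSERT the result into your own tables):
SELECT
    ep.PersonID,
    ep.Name,
    MAX(CASE WHEN pr.PropertyName = 'Birth'    AND pr.PropertyAttribute = 'Date' THEN pr.PropertyValue END) AS Date_Of_Birth,
    MAX(CASE WHEN pr.PropertyName = 'Birth'    AND pr.PropertyAttribute = 'City' THEN pr.PropertyValue END) AS City_Of_Birth,
    MAX(CASE WHEN pr.PropertyName = 'Favorite' AND pr.PropertyAttribute = 'City' THEN pr.PropertyValue END) AS Favorite_City
FROM ExternalPerson ep
LEFT JOIN ExternalProperty pr ON pr.PersonId = ep.PersonID
GROUP BY ep.PersonID, ep.Name;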

SSRS report filter with no duplicates used in query

I am having an issue and I'm not sure how to solve it.
I have an SSRS report that pulls from a table. I want a parameter filter to show de-duplicated values based on available options in one of the columns.
So my dataset has a query like:
SELECT * FROM table1 WITH (NOLOCK) WHERE col1 IN (@param)
Then I want a parameter called param that gets its available and default values from col1 in the above data set and I want them to be de-duplicated.
From reading online I learned I have to create a dummy param and use VB custom code to de-duplicate that list.
So I have these params:
param_dummy, which gets its available and default values from col1 in the above dataset
param, which gets a de-duplicated list from param_dummy using Code.RemoveDuplicates
But I'm having an issue with circular logic: param gets its values from param_dummy, which gets its values from the dataset/query, which in turn uses param.
How can I solve this?
One thought is to remove the WHERE col1 IN (@param) and instead use a filter on the tablix in the SSRS report. This works, but I am wondering how efficient it is.
And/or if anyone has any other suggestions I am all ears.
Updated to add more details...
So let us say I have a table in my DB like so:
| id | col1 | col2 |
|----|------|--------|
| 1 | a | hello |
| 2 | b | how |
| 3 | a | are |
| 4 | c | you |
| 5 | d | on |
| 6 | a | this |
| 7 | b | lovely |
| 8 | c | day |
What I want is:
a Tablix to show all the fields from the table
a filter where the user can select from the available (de-duplicated) values in col1
a text filter that allows nulls where a user can filter on col2
the parameters will have default values so the table will load on page load
So I have a dataset with a query like so:
SELECT
    *
FROM dbo.table1
WHERE col1 IN (@col1options) AND (@col2value IS NULL OR col2 = @col2value)
Then for col1options I would set the available and default values to "Get values from a query", using the above dataset and col1.
But this won't work, since the query/dataset depends on col1options, which gets its default values from that same query/dataset.
I can use a second dataset but that means making multiple calls to the SQL server and I want to avoid that.
I'm not sure I understand your issue so this is a guess...
If you mean you want to be able to filter your data by choosing one or more entries from a specific column in the table, but this column has duplicates and you want your parameter list to not show duplicates, then this is what to do:
Create a new report
Add dataset dsMain as SELECT * FROM myTable WHERE myColumn IN (@myParam)
Add dataset dsParamValues as SELECT DISTINCT myColumn FROM myTable ORDER BY myColumn
Edit the @myParam parameter properties and set the available and default values to a query, then choose dsParamValues
Add your table/matrix control and set its dataset property to dsMain
Found an easier solution.
Follow this link to build the "dummy" hidden parameter, the visible parameter, and the de-dupe VB code.
Add a tablix properties filter where param is "In" the visible / non-hidden parameter from above (FYI: double-click to add the parameter).
Adding via double-click will append a (0) at the end; remove the (0).
It should work as expected at that point! You should be able to select one, some, or all parameter values and your report should update accordingly.

Create new dimension using values from another dimension in SQL?

I currently have a SQL table that looks something like this:
RuleName         | RuleGroup
-----------------|----------
Backdated task   | DRFHA
Incorrect Num    | FRCLSR
Incomplete close | CFPBDO
Appeal close     | CFPBDO
Needs letter     | CFPBCRE
Plan ND          | DO
B7IND            | CORE
I am currently writing a stored procedure in SSMS that pulls these dimensions from the existing table. However, I also want the procedure to create a new "SuperGroup" dimension for each rule based on the text in its RuleGroup (with an Other bucket for the rest). For example:
RuleName         | RuleGroup | SuperGroup
-----------------|-----------|-----------
Backdated task   | DRFHA     | Other
Incorrect Num    | FRCLSR    | Fore
Incomplete close | CFPBDO    | DefaultOp
Appeal close     | CFPBDO    | DefaultOp
Needs letter     | CFPBCRE   | Core
Plan ND          | DO        | DefaultOp
B7IND            | CORE      | Core
I have tried using GROUP BY, as well as SELECT with several LIKE conditions. However, the issue is that this needs to be scalable: although I only have 21 groups right now, I want new groups to be sorted automatically as they are added.
Here is the SSMS procedure as well:
CREATE PROCEDURE [Rules].[PullRulesSpecifics]
AS
BEGIN
    SELECT
        ru.RuleName,
        ru.RuleGroup
    FROM RuleData.groupings ru
    WHERE 1=1
        AND ru.ActiveRule = 1
        AND ru.RuleOpen >= '2015-01-01'
END
Option 1 (the normalized option):
Assuming that your database is well normalized, you should have a foreign-key constraint on your RuleGroup column that prevents users from entering whatever they like, so that only valid RuleGroup values can be entered into the table. If this is the case (which I suspect it is not), then you can add a column to the foreign-key table (the one that holds the list of valid RuleGroup values) indicating which SuperGroup each RuleGroup belongs to. (The SuperGroup column would ideally have an FK constraint on it as well, referencing another table that contains all of the valid SuperGroup values.) If you use this approach, there is no coding involved whenever a new SuperGroup is added; it maintains itself.
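As a sketch, with illustrative table and constraint names (not from your actual schema):
CREATE TABLE RuleData.RuleGroups (
    RuleGroup  VARCHAR(50) NOT NULL PRIMARY KEY,
    SuperGroup VARCHAR(50) NOT NULL
);

ALTER TABLE RuleData.groupings
    ADD CONSTRAINT FK_groupings_RuleGroups
    FOREIGN KEY (RuleGroup) REFERENCES RuleData.RuleGroups (RuleGroup);

-- With the lookup in place, the SuperGroup becomes a plain join:
SELECT ru.RuleName, ru.RuleGroup, rg.SuperGroup
FROM RuleData.groupings ru
JOIN RuleData.RuleGroups rg ON rg.RuleGroup = ru.RuleGroup;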
Option 2 (not a best practice; try option 1 if you can):
Create a new SuperGroups table with 2 columns: SuperGroup and MatchingCriteria. Then you can join on the new SuperGroups table. (Note that this assumes each MatchingCriteria is mutually exclusive; if not, a RuleGroup could match more than one SuperGroup and you would get results you might not have intended, or you will have to find some other way to limit the results to a single SuperGroup.) The query would look something like this:
SELECT
    ru.RuleName,
    ru.RuleGroup,
    sg.SuperGroup
FROM RuleData.groupings ru
JOIN RuleData.SuperGroups sg ON ru.RuleGroup LIKE sg.MatchingCriteria
WHERE ru.ActiveRule = 1
    AND ru.RuleOpen >= '2015-01-01'
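For illustration only, the SuperGroups rows might look like this (the LIKE patterns are hypothetical and would come from your actual naming rules; a SuperGroup can own several pattern rows):
INSERT INTO RuleData.SuperGroups (SuperGroup, MatchingCriteria) VALUES
    ('DefaultOp', '%DO'),
    ('Core',      '%CRE'),
    ('Core',      'CORE'),
    ('Fore',      'FRCLSR');
Also note that with an inner JOIN, rules whose RuleGroup matches no pattern (your Other bucket, e.g. DRFHA) drop out; a LEFT JOIN plus COALESCE(sg.SuperGroup, 'Other') would keep them.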
I removed the WHERE 1=1 code. It was unnecessary and was probably just there to help you debug your problem.

Storing a COUNT of values in a table

I have a table with data along the (massively simplified) lines of:
User | Value
-----|------
UsrA | 100
UsrA | 102
UsrB | 100
UsrA | 100
UsrB | 101
and, for reasons far too obscure to go into, I need to store the COUNT of each value in a table for future retrieval - ending up with something like
User | Value100Count | Value101Count | Value102Count
-----|---------------|---------------|--------------
UsrA | 2             | 0             | 1
UsrB | 1             | 1             | 0
However, there could be up to 255 different Values - meaning potentially 255 different ValueXCount columns. I know this is a horrible way to do things, but is there an easy way to get the data into a format that can be easily INSERTed into the destination table? Is there a better way to store the COUNT of values per user (unfortunately I do need to store this information; grabbing it from the source table each time isn't an option)?
The whole thing isn't very pretty, but you know that. Rather than your table with 255 columns, I'd consider setting up another table with:
User | Value | CountOfValue
and a primary key over User and Value.
You could then insert the counts for given user/value combinations into the CountOfValue field.
As I said, the design is horrible, and it feels like you would be better off starting from scratch, normalizing, and computing the counts live.
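A minimal sketch of that design in T-SQL (table and column names are illustrative; dbo.SourceTable stands in for your existing data):
CREATE TABLE dbo.UserValueCounts (
    [User]       VARCHAR(50) NOT NULL,
    [Value]      INT         NOT NULL,
    CountOfValue INT         NOT NULL,
    PRIMARY KEY ([User], [Value])
);

INSERT INTO dbo.UserValueCounts ([User], [Value], CountOfValue)
SELECT [User], [Value], COUNT(*)
FROM dbo.SourceTable
GROUP BY [User], [Value];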
Check out indexed views. SQL Server can then maintain the table automatically and with integrity, and as a bonus the view can be used by queries that already do COUNT(*) on that data.
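A minimal sketch of the indexed-view approach (SQL Server; names are illustrative, and indexed views require SCHEMABINDING, two-part table names, and COUNT_BIG):
CREATE VIEW dbo.UserValueCounts_v
WITH SCHEMABINDING
AS
SELECT [User], [Value], COUNT_BIG(*) AS CountOfValue
FROM dbo.SourceTable
GROUP BY [User], [Value];
GO

-- The unique clustered index is what materializes and auto-maintains the view
CREATE UNIQUE CLUSTERED INDEX IX_UserValueCounts_v
    ON dbo.UserValueCounts_v ([User], [Value]);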