Data Studio obtain a transposed table from BigQuery and hidden conditional formatting in text/strings - google-bigquery

Several shops should be monitored for their status in a Data Studio dashboard.
There are fewer than 20 shops and I show only two of them in the example. The BigQuery table has a shop column and the following columns: status, info, sold_today and update_time. The shop and update_time columns are always filled, but the others are filled only if there is a change.
Task: For each shop the last entries of all columns should be shown.
Here is the BigQuery code for the sample table:
create or replace table dsadds.eu.dummy as (
select "shop A" as shop, 1000 as sold_today, "sale ABC" as info, 0 as status, timestamp("2022-09-05") as update_time
union all select "shop A", null, null, 1, timestamp("2022-09-06")
union all select "shop A", 500, "open", 3, timestamp("2022-09-01")
union all select "shop B", 700, "open", 3, current_timestamp()
)
In Data Studio, with conditional formatting (status = 1 marked red), this table looks like this:
As you can see, "shop A" is shown several times and with null values.
With the following custom query in Data Studio I can obtain the last entry for each shop:
with tbl as (
  select
    shop,
    array_agg(sold_today ignore nulls order by update_time desc limit 1)[safe_offset(0)] as sold_today,
    array_agg(info ignore nulls order by update_time desc limit 1)[safe_offset(0)] as info,
    array_agg(status ignore nulls order by update_time desc limit 1)[safe_offset(0)] as status
  from dsadds.eu.dummy
  group by 1
)
select * from tbl
resulting in the following table, showing all needed information:
However, the users would like this table to be transposed, like this:
On the right-hand side it is shown with the final textbox for labeling the rows.
Of course, it is possible to build a Scorecard for each entry, but with 10 shops and three fields each, the limit of charts per page is reached.
Question
Is there a way to transpose a table and also apply conditional formatting?

The task is to return one column for each shop plus a column id to sort the results. A column has to have a single data type; we cannot return a string in one row and an integer in another. Thus all integer values have to be formatted as strings in BigQuery.
For the transpose, we build a tlb_array CTE. Grouping by shop generates an array for each shop. The array has the shop name shop as its first entry, the info column as its second, and the sold_today column, cast from integer to string, as its third. We also include an id with the entry number. By unnesting this array we flatten the data, then group it again by id in the next select statement. There, we create a column for each shop, and the if condition only considers data for that shop. Thus we end up with a table of three rows, with the row number in id and the needed data in the shop columns.
with tbl as (
  select
    shop,
    array_agg(sold_today ignore nulls order by update_time desc limit 1)[safe_offset(0)] as sold_today,
    array_agg(info ignore nulls order by update_time desc limit 1)[safe_offset(0)] as info,
    array_agg(status ignore nulls order by update_time desc limit 1)[safe_offset(0)] as status
  from dsadds.eu.dummy
  group by 1
),
tlb_array as (
  select shop, X.*
  from tbl,
  unnest([
    struct(1 as id, shop as value),
    struct(2, info),
    struct(3, cast(sold_today as string))]) X
)
select
  id,
  any_value(if(shop = "shop A", value, null)) as ShopA,
  any_value(if(shop = "shop B", value, null)) as ShopB
from tlb_array
group by 1
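To make the regrouping step easier to follow, here is a minimal Python sketch of the same transpose idea (the shop names and values are taken from the example above; this is only an illustration, not part of the Data Studio setup):

```python
# Minimal Python sketch of the transpose step: each shop contributes a
# list of (id, value) entries, and we regroup them so that ids become
# rows and shops become columns.
shops = {
    "shop A": {"info": "sale ABC", "sold_today": 1000},
    "shop B": {"info": "open", "sold_today": 700},
}

# Build the (id, value) entries per shop, mirroring the unnest([...]) struct list.
entries = []
for shop, fields in shops.items():
    entries.append((shop, 1, shop))                       # id 1: shop name
    entries.append((shop, 2, fields["info"]))             # id 2: info
    entries.append((shop, 3, str(fields["sold_today"])))  # id 3: sold_today as string

# Regroup: one row per id, one column per shop (the any_value(if(...)) step).
rows = {}
for shop, id_, value in entries:
    rows.setdefault(id_, {})[shop] = value

print(rows[3])  # {'shop A': '1000', 'shop B': '700'}
```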
Thus we return only text as strings. But we want to apply conditional formatting without adding further columns. The trick is to include special characters in the returned string. ´ or ' would be possible, but this would disturb the user. Therefore, space characters are a good way: there are several Unicode characters for different space widths, so a number can be encoded as space characters. The following UDF has to be hosted by you. It encodes each decimal digit of a number as a different Unicode space character.
CREATE OR REPLACE FUNCTION `dsadds.us.number_to_space`(x INT64) AS (
(
SELECT
CONCAT(" ",
string_agg(SUBSTRING(
CODE_POINTS_TO_STRING([0x2007, 0x2002, 0x2004, 0x2005, 0x2006, 0x2008, 0x2009, 0x200A, 0x202F, 0x205F]),
y-47,1),"")
,"- ")
FROM
UNNEST(TO_CODE_POINTS(CAST(x AS string))) y )
);
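To see what the UDF emits, here is a hedged Python re-implementation of the same mapping (the digit "0" has code point 48, so the SQL's y-47 as a 1-based SUBSTRING index corresponds to the 0-based list index of the digit itself):

```python
# Python sketch of dsadds.us.number_to_space: each decimal digit of x is
# mapped to a distinct, visually near-identical Unicode space character.
SPACES = [0x2007, 0x2002, 0x2004, 0x2005, 0x2006,
          0x2008, 0x2009, 0x200A, 0x202F, 0x205F]

def number_to_space(x: int) -> str:
    encoded = "".join(chr(SPACES[int(d)]) for d in str(x))
    return " " + encoded + "- "  # same leading/trailing characters as the UDF

print(repr(number_to_space(1)))  # ' \u2002- '
```

For example, status 1 is encoded as an EN SPACE (U+2002), which is invisible to the user but can be matched by a "text contains" conditional formatting rule.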
Then you can use this function in your Custom BigQuery in Data Studio:
with tbl as (
  select
    shop,
    array_agg(sold_today ignore nulls order by update_time desc limit 1)[safe_offset(0)] as sold_today,
    array_agg(info ignore nulls order by update_time desc limit 1)[safe_offset(0)] as info,
    array_agg(status ignore nulls order by update_time desc limit 1)[safe_offset(0)] as status
  from dsadds.us.dummy
  group by 1
),
tlb_array as (
  select shop, X.*
  from tbl,
  unnest([
    struct(1 as id, concat(shop, dsadds.us.number_to_space(status)) as value),
    struct(2, concat(info, dsadds.us.number_to_space(status))),
    struct(3, cast(sold_today as string))]) X
)
select
  id,
  any_value(if(shop = "shop A", value, null)) as ShopA,
  any_value(if(shop = "shop B", value, null)) as ShopB
from tlb_array
group by 1
This will result in the needed table below. The (hidden) space characters have to be copied from the table (possible only in view mode, not in edit mode of Data Studio) and conditional formatting rules added (text contains: ). Please also add a textbox over the first column to hide it and enter the labels for each row.

Related

How to get first row of 3 specific values of a column using Oracle SQL?

I have a table which has ID, FAMILY, ENV_XML_PATH and CREATED_DATE columns.
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     03-09-22 6:50:34AM
15826856  SCM     path3.xml     03-10-22 7:12:20AM
15826786  IC      path4.xml     02-10-22 12:50:52AM
15825965  CRM     path5.xml     02-10-22 1:50:52AM
15653951  null    path6.xml     04-10-22 12:50:52AM
15826840  FIN     path7.xml     03-10-22 2:34:09AM
15826841  SCM     path8.xml     02-10-22 8:40:52AM
15223450  IC      path9.xml     03-09-22 5:34:09AM
15026853  SCM     path10.xml    05-10-22 4:40:59AM
Now there are 18 distinct values in the FAMILY column and each value has multiple rows associated (as you can see above).
What I want is to get the first row of 3 specific values (CRM, SCM and IC) in FAMILY column.
Something like this:
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     date1
15826856  SCM     path3.xml     date2
15826786  IC      path4.xml     date3
I am new to this; I understand the logic but am not sure how to implement it. Kindly help. Thanks.
You can use RANK for that. Something like this:
WITH groupedData AS (
  SELECT id, family, env_xml_path, created_date,
         RANK() OVER (PARTITION BY family ORDER BY id) AS r_num
  FROM yourtable
  GROUP BY id, family, env_xml_path, created_date
)
SELECT id, family, env_xml_path, created_date
FROM groupedData
WHERE r_num = 1
ORDER BY id;
Thus, within the first query, your data will be grouped by family and sorted by the column you want (in my example, it will be sorted by id).
After that, you will use the second query to only take the first row of each family.
Add a WHERE clause to the first query if you need to apply further restrictions on the result set.
See here a working example: db<>fiddle
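The "first row per group" logic both SQL answers implement can be sketched in a few lines of Python (a toy subset of the question's data; the real query of course runs in Oracle, and this mirrors the RANK answer's ordering by id):

```python
# Python sketch of "first row per FAMILY": sort by the ranking key,
# then keep only the first row seen for each family of interest.
rows = [
    {"id": 15826841, "family": "CRM", "path": "path1.xml"},
    {"id": 15825965, "family": "CRM", "path": "path5.xml"},
    {"id": 15826856, "family": "SCM", "path": "path3.xml"},
    {"id": 15826786, "family": "IC",  "path": "path4.xml"},
]

wanted = {"CRM", "SCM", "IC"}
first_per_family = {}
for row in sorted(rows, key=lambda r: r["id"]):  # ORDER BY id
    fam = row["family"]
    if fam in wanted and fam not in first_per_family:
        first_per_family[fam] = row              # keep only r_num = 1

print(sorted(first_per_family))  # ['CRM', 'IC', 'SCM']
```

Note that ordering by id picks the smallest id per family; order by created_date instead (as in the second answer) to get the oldest row per family.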
You could use a window function to get the row number within each partition of family ordered by created_date, and then filter by the three families you are interested in:
with row_window as (
select
id,
family,
env_xml_path,
created_date,
row_number() over (partition by family order by created_date asc) as rn
from <your_table>
where family in ('CRM', 'SCM', 'IC')
)
select
id,
family,
env_xml_path,
created_date
from row_window
where rn = 1
Output:
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     03-09-22 6:50:34
15826856  SCM     path3.xml     03-10-22 7:12:20
15826786  IC      path4.xml     02-10-22 12:50:52
The question doesn't really specify what 'first' means, but I assume it means the first to be added to the table, i.e. the row whose date is the oldest. Try this code:
SELECT DISTINCT * FROM (yourTable)
WHERE Family = 'CRM' OR Family = 'SCM' OR Family = 'IC'
ORDER BY Created_Date ASC
FETCH FIRST (number) ROWS ONLY;
What it does:
DISTINCT - selects only distinct rows, which means you won't get the same row repeated at the top.
WHERE - checks whether a certain condition is true.
OR - means the select should choose rows that match any of those requirements. Combined with DISTINCT, the same rows won't repeat, so you won't get two identical 'CRM' rows; it will find the first 'CRM', then the first 'SCM', and so on.
ORDER BY - orders the result in the specified order. Here, if 'first' means the oldest, then ordering by date with ASC puts the oldest (i.e. smallest) date at the top.
FETCH FIRST (number) ROWS ONLY - returns only the first few rows you want. For example, if you need 3 different 'first' rows, use FETCH FIRST 3 ROWS ONLY. Combined with DISTINCT it will show only 3 different rows.

Inclusion of nulls with ANY_VALUE in BigQuery

I have a 'vendors' table that looks like this...
company   itemKey  itemPriceA  itemPriceB
companyA  203913   20          10
companyA  203914   20          20
companyA  203915   25          5
companyA  203916   10          10
It has potentially millions of rows per company and I want to query it to bring back a representative delta between itemPriceA and itemPriceB for each company. I don't care which delta I bring back as long as it isn't zero/null (like row 2 or 4), so I was using ANY_VALUE like this...
SELECT company
, ANY_VALUE(CASE WHEN (itemPriceA-itemPriceB)=0 THEN null ELSE (itemPriceA-itemPriceB) END)
FROM vendors
GROUP BY 1
It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...
"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."
If ANY_VALUE returns null "when expression is NULL for all rows in the group" it should NEVER return null for companyA right (since only 2 of 4 rows are null)? But the second sentence sounds like it will indeed include the null rows.
P.s. you may be wondering why I don't simply add a WHERE clause saying "WHERE itemPriceA-itemPriceB>0" but in the event that a company has ONLY matching prices, I still want the company to be returned in my results.
Clarification
I'm afraid the accepted answer will have to show stronger evidence that contradicts the docs.
#Raul Saucedo suggests that the following BigQuery documentation is referring to WHERE clauses:
rows for which expression is NULL are considered and may be selected
This is not the case. WHERE clauses are not mentioned anywhere in the ANY_VALUE docs. (Nowhere on the page. Try to ctrl+f for it.) And the docs are clear, as I'll explain.
#d3wannabe is correct to wonder about this:
It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...
"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."
But the docs are not contradictory. The 2 sentences coexist.
"Returns NULL when expression is NULL for all rows in the group." So if all rows in a column are NULL, it will return NULL.
"ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected." So if the column has rows mixed with NULLs and actual data, it will select anything from that column, including nulls.
How to create an ANY_VALUE without nulls in BigQuery
We can use ARRAY_AGG to turn a group of values into a list. This aggregate function has the option to IGNORE NULLS. We then select one item from the list after ignoring nulls.
If we have a table with 2 columns: id and mixed_data, where mixed_data has some rows with nulls:
SELECT
id,
ARRAY_AGG( -- turn the mixed_data values into a list
mixed_data -- we'll create an array of values from our mixed_data column
IGNORE NULLS -- there we go!
LIMIT 1 -- only fill the array with 1 thing
)[SAFE_OFFSET(0)] -- grab the first item in the array
AS any_mixed_data_without_nulls
FROM your_table
GROUP BY id
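In effect, the ARRAY_AGG(... IGNORE NULLS LIMIT 1)[SAFE_OFFSET(0)] pattern means "first non-null value per group, or NULL if there is none". A hedged Python sketch of that semantics, using the price deltas from the question's companyA rows:

```python
# Python sketch of ARRAY_AGG(x IGNORE NULLS LIMIT 1)[SAFE_OFFSET(0)]:
# return the first non-None value, or None if every value is None.
def any_value_ignore_nulls(values):
    non_nulls = [v for v in values if v is not None]  # IGNORE NULLS
    return non_nulls[0] if non_nulls else None        # LIMIT 1 + SAFE_OFFSET(0)

deltas = [20 - 10, 20 - 20, 25 - 5, 10 - 10]       # itemPriceA - itemPriceB
nonzero = [d if d != 0 else None for d in deltas]  # CASE WHEN delta=0 THEN NULL
print(any_value_ignore_nulls(nonzero))  # 10
```

Unlike plain ANY_VALUE, this can only return NULL when the whole group is NULL, which is exactly the behavior the asker wants.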
See similar answers here:
https://stackoverflow.com/a/53508606/6305196
https://stackoverflow.com/a/62089838/6305196
Update, 2022-08-12
There is evidence that the docs may be inconsistent with the actual behavior of the function. See Samuel's latest answer to explore his methodology.
However, we cannot know if the docs are incorrect and ANY_VALUE behaves as expected or if ANY_VALUE has a bug and the docs express the intended behavior. We don't know if Google will correct the docs or the function when they address this issue.
Therefore I would continue to use ARRAY_AGG to create a safe ANY_VALUE that ignores nulls until we see a fix from Google.
Please upvote the issue in Google's Issue Tracker to see this resolved.
This is an explanation of how any_value works with null values.
any_value always returns the first value, if there is a value different from null.
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST([null, "banana",null,null]) as fruit;
It returns null if all rows have null values. This refers to the sentence:
“Returns NULL when expression is NULL for all rows in the group”
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST([null, null, null]) as fruit
It returns null if a value is null and you specify that in the where clause. This refers to these sentences:
“ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which
expression is NULL are considered and may be selected.”
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST(["apple", "banana", null]) as fruit
where fruit is null
It always depends on which filter you are using and the field inside the any_value.
In this example, you can see that it returns rows that are different from 0:
SELECT ANY_VALUE(e).company, (itemPriceA-itemPriceB) as value
FROM `vendor` e
where (itemPriceA-itemPriceB)!=0
group by e.company
The documentation says that NULLs "are considered and may be" returned by an any_value statement. However, I am quite sure the documentation is wrong here. In the current implementation, tested on 13 August 2022, any_value returns the first value of that column. However, if the table does not have an ORDER BY specified, the sorting may be random due to processing of the data on several nodes.
For testing, a large table of nulls is needed. GENERATE_ARRAY comes in handy for that. The generated array has several entries, with the value zero standing in for null. The first 1 million entries with value zero are generated in the table tmp. Then the table tbl adds the 1 million zeros before and after [-100,0,-90,-80,3,4,5,6,7,8,9]. Finally, NULLIF(x,0) AS x replaces all zeros with null.
Several tests of any_value using the test table tbl are done. If the table is not sorted further, the first value of the column is returned: -100.
WITH
tmp AS (SELECT ARRAY_AGG(0) AS tmp0 FROM UNNEST(GENERATE_ARRAY(1,1000*1000))),
tbl AS (
SELECT
NULLIF(x,0) AS x,
IF(x!=0,x,NULL) AS y,
rand() AS rand
FROM
tmp,
UNNEST(ARRAY_CONCAT(tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0)) AS x )
SELECT "count rows", COUNT(1) FROM tbl
UNION ALL SELECT "count items not null", COUNT(x) FROM tbl
UNION ALL SELECT "any_value(x): (returns first non null element in list: -100)", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "2nd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "3rd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "any_value(y)", ANY_VALUE(y) FROM tbl
UNION ALL SELECT "order asc", ANY_VALUE(x) FROM (Select * from tbl order by x asc)
UNION ALL SELECT "order desc (returns largest element: 9)", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order desc", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order abs(x) desc", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) desc )
UNION ALL SELECT "order abs(x) asc (smallest number: 3)", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) asc )
UNION ALL SELECT "order rand asc", ANY_VALUE(x) FROM (Select * from tbl order by rand asc )
UNION ALL SELECT "order rand desc", ANY_VALUE(x) FROM (Select * from tbl order by rand desc )
This gives the following result:
The first non-null entry, -100, is returned.
Sorting the table by this column causes any_value to always return the first entry.
In the last two examples the table is ordered by random values, so any_value returns random entries.
If the dataset is larger than 2 million rows, the table may be split internally for processing; this results in an unordered table. Without an ORDER BY, the first entry of the table, and thus the result of any_value, cannot be predicted.
To test this, please replace the 10th line of the query with
UNNEST(ARRAY_CONCAT(tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0,tmp0)) AS x )

How to determine the order of the result from my postgres query?

I have the following query:
SELECT
time as "time",
case
when tag = 'KEB1.DB_BP.01.STATUS.SOC' THEN 'SOC'
when tag = 'KEB1.DB_BP.01.STATUS.SOH' THEN 'SOH'
end as "tag",
value as "value"
FROM metrics
WHERE
("time" BETWEEN '2021-07-02T10:39:47.266Z' AND '2021-07-09T10:39:47.266Z') AND
(container = '1234') AND
(tag = 'KEB1.DB_BP.01.STATUS.SOC' OR tag = 'KEB1.DB_BP.01.STATUS.SOH')
GROUP BY 1, 2, 3
ORDER BY time desc
LIMIT 2
This is giving me the result:
Sometimes the order of the result changes from SOH -> SOC or from SOC -> SOH. I'm trying to modify my query so I always get SOH first and then SOC. How can I achieve this?
You have two times that are identical, and the order by is only using time as a key. When the key values are identical, the resulting order for those keys is arbitrary and indeterminate. It can change from one execution to the next.
To prevent this, add an additional column to the order by so each row is unique. In this case that would seem to be tag:
order by "time", tag
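In Python terms, this is the difference between sorting on a single key with ties and sorting on a composite key (the rows below are hypothetical stand-ins for the query result):

```python
# With identical "time" values, a single-key sort leaves tied rows in
# arbitrary relative order; adding "tag" as a second key makes the
# result deterministic.
rows = [("2021-07-09 10:39", "SOC", 81.0),
        ("2021-07-09 10:39", "SOH", 99.0)]

ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # ORDER BY "time", tag
print([r[1] for r in ordered])  # ['SOC', 'SOH']
```

(With tag ascending, SOC sorts before SOH; use a descending tag key if SOH should come first.)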
You want to show the two most recent rows. In your example these have the same date/time, but they can probably also differ. To find the two most recent rows you had to apply an ORDER BY clause.
However, you want to show the two rows in a different order, so you must place an additional ORDER BY in your query. This is done by selecting from your query result (i.e. putting your query into a subquery):
select *
from ( <your query here> ) myquery
order by tag desc;
Try this:
order by 1 desc, 2
(order by first column descending and by the second column)

How to compare ordered datasets with the dataset before?

I have the following query:
select * from events order by Source, DateReceived
This gives me something like this:
I would like to get the results which I marked blue -> when there are two or more equal ErrorNr entries behind each other FROM THE SAME SOURCE.
So I have to compare every row with the row before. How can I achieve that?
This is what I want to get:
Apply the row number over partition by option on your table:
SELECT
ROW_NUMBER() OVER(PARTITION BY Source ORDER BY datereceived)
AS Row,
* FROM events
Either you can run a (max) having > 1 on the result set's row number, or, if you need the details, apply the same query with the row number reduced by 1.
Then you can join on the source and the row numbers, and if the ErrorNr is the same you have a hit.
You can use the partition by as below.
select *
from (
  select *,
         row_number() over (partition by source, errornr order by Source, DateReceived) as r
  from [yourtable]
) t
where r > 1
You can specify your column names in the outer select.
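The row-by-row comparison the question asks for can also be sketched directly (hypothetical data; in SQL this pairwise idea is commonly expressed with LAG). This sketch flags the second and later rows of each run of repeated errors:

```python
# Flag rows whose (source, errornr) equals the previous row's, after
# sorting by Source, DateReceived -- i.e. the same error repeated
# back to back from the same source.
events = [
    {"source": "A", "received": 1, "errornr": 500},
    {"source": "A", "received": 2, "errornr": 500},  # repeat -> flagged
    {"source": "A", "received": 3, "errornr": 404},
    {"source": "B", "received": 4, "errornr": 500},  # new source, not flagged
]

events.sort(key=lambda e: (e["source"], e["received"]))
repeats = [
    cur for prev, cur in zip(events, events[1:])
    if cur["source"] == prev["source"] and cur["errornr"] == prev["errornr"]
]
print([(r["source"], r["errornr"]) for r in repeats])  # [('A', 500)]
```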

Split the results of a query in half

I'm trying to export rows from one database to Excel and I'm limited to 65000 rows at a shot. That tells me I'm working with an Access database, but I'm not sure, since this is a 3rd-party application (MRI Netsource) with limited query ability. I've tried the options posted at this solution (Is there a way to split the results of a select query into two equal halfs?) but neither of them works -- in fact, they end up duplicating results rather than cutting them in half.
One possibly related issue is that this table does not have a unique ID field. Each record's unique ID can be dynamically formed by the concatenation of several text fields.
This produces 91934 results:
SELECT * from note
This produces 122731 results:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY notedate) AS rn FROM note
) T1
WHERE rn % 2 = 1
EDIT: Likewise, this produces 91934 results, half of them with a tile_nr value of 1, the other half with a value of 2:
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note
However this produces 122778 results, all of which have a tile_nr value of 1:
SELECT bldgid, leasid, notedate, ref1, ref2, tile_nr
FROM (
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note) x
WHERE x.tile_nr = 1
I know that I could just use a COUNT to get the exact number of records, run one query using TOP 65000 ORDER BY notedate, and then another with TOP 26934 ORDER BY notedate DESC, for example, but as this dataset changes a lot I'd prefer some way to automate this and save time.