Multiple unnest hits custom Dimensions BigQuery - sql

I have this table in BigQuery
+---------------------------+---------------------------+--------------------------+|
| date |hits.customDimensions.index|hits.customDimensions.value|
+---------------------------+---------------------------+---------------------------|
| 24/09/2021 | 1 | ARG |
| | 2 | production |
| | 3 | id1 |
| | 4 | label1|label2 |
| 24/09/2021 | 1 | GER |
| | 2 | production |
| | 3 | id2 |
| | 4 | label1|label4 |
+---------------------------+---------------------------+---------------------------+
I would like to get a table like this:
+-------------++----------+----------+
| date | Country | labels |
+-------------+----------+----------+
| 24/09/2021 | ARG | label1 |
| 24/09/2021 | ARG | label2 |
| 24/09/2021 | GER | label1 |
| 24/09/2021 | GER | label4 |
+-------------+----------+----------+
I tried with UNNEST hits.customdimensions individually but I could't get join the information of country and labels in one table.

Consider below approach
select date,
(
select value from t.hits hit,
hit.customDimensions customDimension
where index = 1
) country,
label
from data t,
unnest((
select split(value, '|') from t.hits hit,
hit.customDimensions customDimension
where index = 4
)) label
if applied to sample data in your question - output is

Related

SQL Query - Add column data from another table adding nulls

I have 2 tables, tableStock and tableParts:
tableStock
+----+----------+-------------+
| ID | Num_Part | Description |
+----+----------+-------------+
| 1 | sr37 | plate |
+----+----------+-------------+
| 2 | sr56 | punch |
+----+----------+-------------+
| 3 | sl30 | crimper |
+----+----------+-------------+
| 4 | mp11 | holder |
+----+----------+-------------+
tableParts
+----+----------+-------+
| ID | Location | Stock |
+----+----------+-------+
| 1 | A | 2 |
+----+----------+-------+
| 3 | B | 5 |
+----+----------+-------+
| 5 | C | 2 |
+----+----------+-------+
| 7 | A | 1 |
+----+----------+-------+
And I just want to do this:
+----+----------+-------------+----------+-------+
| ID | Num_Part | Description | Location | Stock |
+----+----------+-------------+----------+-------+
| 1 | sr37 | plate | A | 2 |
+----+----------+-------------+----------+-------+
| 2 | sr56 | punch | NULL | NULL |
+----+----------+-------------+----------+-------+
| 3 | sl30 | crimper | B | 5 |
+----+----------+-------------+----------+-------+
| 4 | mp11 | holder | NULL | NULL |
+----+----------+-------------+----------+-------+
List ALL the rows of the first table and if the second table has the info, in this case 'location' and 'stock', add to the column, if not, just null.
I have been using inner and left join but some rows of the first table disappear because the lack of data in the second one:
select tableStock.ID, tableStock.Num_Part, tableStock.Description, tableParts.Location, tableParts.Stock from tableStock inner join tableParts on tableStock.ID = tableParts.ID;
What can I do?
You can use left join. Here is the demo.
select
s.ID,
Num_Part,
Description,
Location,
Stock
from Stock s
left join Parts p
on s.ID = p.ID
order by
s.ID
output:
| id | num_part | description | location | stock |
| --- | -------- | ----------- | -------- | ----- |
| 1 | sr37 | plate | A | 2 |
| 2 | sr56 | punch | NULL | NULL |
| 3 | sl30 | crimper | B | 5 |
| 4 | mp11 | holder | NULL | NULL |

getting an average of a count of a group sql impala

Given a table like;
+----+----------+
| id | modality |
+----+----------+
| 1 | CT |
| 1 | SC |
| 1 | MR |
| 1 | CT |
| 2 | SC |
| 3 | CT |
| 3 | CT |
| 3 | MR |
| 4 | MR |
| 4 | CT |
+----+----------+
Something like;
SELECT AVG(modality_count)
FROM (
SELECT id, COUNT(modality) AS modality_count
FROM TABLE
GROUP BY id
) a
Would get the average amount of modality per patient, and output something like:
+---------------------+
| AVG(modality_count) |
+---------------------+
| 1.4 |
+---------------------+
the following query:
SELECT CONCAT(CAST(COUNT(*) AS string), ' ' , modality) AS count_m
FROM TABLE
GROUP BY modality
would get the count of the number of rows for each modality, like:
+---+---------+
| | count_m |
+---+---------+
| 1 | 560 CT |
| 2 | 2 SC |
| 3 | 473 MR |
+---+---------+
Let's say there are 1035 rows, 726 unique ids, and each id is likely to have 1.4 scans (either CT, SC, MR). How would you get the average number of CT/ MT/ SC?
desired output something like:
+----------+---------+
| modality | AVG(id) |
+----------+---------+
| CT | 0.77 |
| SC | 0.003 |
| MR | 0.65 |
+----------+---------+
Thanks in advance

How to concat rows based on ID,

Using standard SQL in bigquery:
Given a table such as: Where the values have been counted so only appear once
| id | key | value |
--------------------
| 1 | read | aa |
| 1 | read | bb |
| 1 | name | abc |
| 2 | read | bb |
| 2 | read | cc |
| 2 | name | def |
| 2 | value| some |
| 3 | read | aa |
How can I make it so each row is one user and their respective values? e.g. NEST
So the table would look like:
| id | key | value |
--------------------
| 1 | read | aa |
| | read | bb |
| | name | abc |
| 2 | read | bb |
| | read | cc |
| | name | def |
| | value| some |
| 3 | read | aa |
I've tried using ARRAY_AGG on the column, which ends up listing all the values of that column.
I just need to have each row as a single user with multiple values, as shown above.
Like BigQuery does here, this is what I want it to look like:
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, ARRAY_AGG(STRUCT(key AS key, value AS value)) params
FROM `project.dataset.table`
GROUP BY id
if to apply to your sample data - result is

sort a table while keeping the hierarchy of rows

I have a table which represents the hierarchy of departments:
+-----------+--------------+--------------+--------------+-----------+-------+
| Top Dept. | 2-tier Dept. | 3-tire Dept. | 4-tier Dept. | name | tier |
+-----------+--------------+--------------+--------------+-----------+-------+
| 00 | | | | abc | 0 |
| | 00-01 | | | bcd | 1 |
| | | 00-01-01 | | cde | 2 |
| | | 00-01-02 | | abc | 2 |
| | 00-02 | | | aef | 1 |
| | | 00-02-01 | | qwe | 2 |
| | | 00-02-03 | | abc | 2 |
| | | | 00-02-03-01 | abc | 3 |
+-----------+--------------+--------------+--------------+-----------+-------+
now I want to sort the rows which are in the same tier by their names while keeping the hierarchy overall, That's what I expect:
+-----------+--------------+--------------+--------------+-----------+-------+
| Top Dept. | 2-tier Dept. | 3-tire Dept. | 4-tier Dept. | name | tier |
+-----------+--------------+--------------+--------------+-----------+-------+
| 00 | | | | abc | 0 |
| | 00-02 | | | aef | 1 |
| | | 00-02-03 | | abc | 2 |
| | | 00-02-01 | | qwe | 2 |
| | 00-01 | | | def | 1 |
| | | 00-01-02 | | abc | 2 |
| | | 00-01-01 | | cde | 2 |
| | | | 00-02-03-01 | abc | 3 |
+-----------+--------------+--------------+--------------+-----------+-------+
the missing data means null, I'm using Oracle DB, can anyone help me?
EDIT: Actually, it's a simple version of this sql, I've tried to add a new column which concats the values of the first four columns and then order by it and by name, but it did't work.
Update: This appears to be working... SQL Fiddle
All that was really needed from my original comment was to amend name to department in that order in both selects. This allows the engine to sort by name first, while maintaining the hierarchy.
WITH cte(Dept, superiorDept, name, depth, sort)AS (
SELECT
Dept,
superiorDept,
name,
0,
name|| dept
FROM hierarchy h
WHERE superiorDept IS NULL
UNION ALL
SELECT
h2.Dept,
h2.superiorDept,
h2.name,
cte.depth + 1,
cte.sort || h2.name ||h2.dept
FROM hierarchy h2
INNER JOIN cte ON h2.superiorDept = cte.Dept
)
SELECT
CASE WHEN depth = 0 THEN Dept END AS 一级部门,
CASE WHEN depth = 1 THEN Dept END AS 二级部门,
CASE WHEN depth = 2 THEN Dept END AS 三级部门,
CASE WHEN depth = 3 THEN Dept END AS 四级部门,
name,
depth,
sort
FROM cte
ORDER BY sort, name

Joining two tables and calculating divide-SUM from the resulting table in SQL Server

I have one table that looks like this:
+---------------+---------------+-----------+-------+------+
| id_instrument | id_data_label | Date | Value | Note |
+---------------+---------------+-----------+-------+------+
| 1 | 57 | 1.10.2010 | 200 | NULL |
| 1 | 57 | 2.10.2010 | 190 | NULL |
| 1 | 57 | 3.10.2010 | 202 | NULL |
| | | | | |
+---------------+---------------+-----------+-------+------+
And the other that looks like this:
+----------------+---------------+---------------+--------------+-------+-----------+------+
| id_fundamental | id_instrument | id_data_label | quarter_code | value | AnnDate | Note |
+----------------+---------------+---------------+--------------+-------+-----------+------+
| 1 | 1 | 20 | 20101 | 3 | 28.2.2010 | NULL |
| 2 | 1 | 20 | 20102 | 4 | 1.8.2010 | NULL |
| 3 | 1 | 20 | 20103 | 5 | 2.11.2010 | NULL |
| | | | | | | |
+----------------+---------------+---------------+--------------+-------+-----------+------+
What I would like to do is to merge/join these two tables in one in a way that I get something like this:
+------------+--------------+--------------+----------+--------------+
| Date | Table1.Value | Table2.Value | AnnDate | quarter_code |
+------------+--------------+--------------+----------+--------------+
| 1.10.2010. | 200 | 3 | 1.8.2010 | 20102 |
| 2.10.2010. | 190 | 3 | 1.8.2010 | 20102 |
| 3.10.2010. | 202 | 3 | 1.8.2010 | 20102 |
| | | | | |
+------------+--------------+--------------+----------+--------------+
So the idea is to order them by Date from Table1 and since Table2 Values only change on the change of AnnDate we populate the Resulting table with same values from Table2.
After that I would like to go through the resulting table and create another (Final table) with the following.
On Date 1.10.2010. take last 4 AnnDates (so it would be 1.8.2010. and f.e. 20.3.2010. 30.1.2010. 15.11.2009) and Table2 values on those AnnDate. Make SUM of those 4 values and then divide the Table1 Value with that SUM.
So we would get something like:
+-----------+---------------------------------------------------------------+
| Date | FinalValue |
+-----------+---------------------------------------------------------------+
| 1.10.2010 | 200/(Table2.Value on 1.8.2010+Table2.Value on 20.3.2010 +...) |
| | |
+-----------+---------------------------------------------------------------+
Is there any way this can be done?
EDIT:
Hmm yes now I see that I really didn't do a good job explaining it.
What I wanted to say is
I try INNER JOIN like this:
SELECT TableOne.Date, TableOne.Value, TableTwo.Value, TableTwo.AnnDate, TableTwo.quarter_code
FROM TableOne
INNER JOIN TableTwo ON TableOne.id_intrument=TableTwo.id_instrument WHERE TableOne.id_data_label = somevalue AND TableTwo.id_data_label = somevalue AND date > xxx AND date < yyy
And this inner join returns 2620*40 rows which means for every AnnDate from table2 it returns all Date from table1.
What I want is to return 2620 values with Dates from Table1
Values from table1 on that date and Values from table2 that respond to that period of dates
f.e.
Table1:
+-------+-------+
| Date | Value |
+-------+-------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+-------+-------+
Table2
+-------+---------+
| Value | AnnDate |
+-------+---------+
| x | 1 |
| y | 4 |
+-------+---------+
Resulting table:
+-------+---------+---------+
| Date | ValueT1 | ValueT2 |
+-------+---------+---------+
| 1 | a | x |
| 2 | b | x |
| 3 | c | x |
| 4 | d | y |
+-------+---------+---------+
You need a JOIN statement for your first query. Try:
SELECT TableOne.Date, TableOne.Value, TableTwo.Value, TableTwo.AnnDate, TableTwo.quarter_code FROM TableOne
INNER JOIN TableTwo
ON TableOne.id_intrument=TableTwo.id_instrument;