SQL, BigQuery - completing missing values with other parts of rows

I'm using Firebase data exported to BigQuery (the data contains events coming from a mobile application). I've made an update to the application, and a new parameter is being reported. Unfortunately, not all users have the latest version of the app, which is why I have rows with that parameter as well as rows without it.
In event_params I have something like:
| No | contentId | contentName |
|----|-----------|---------------------|
| 1 | abc | (parameter missing) |
| 2 | abc | Name of ABC |
| 3 | cde | Name of CDE |
| 4 | efg | Name of EFG |
| 5 | abc | (parameter missing) |
| 6 | cde | Name of CDE |
Now, when I query that table and specify (using UNNEST) that I need the contentName parameter, I don't get the rows where that parameter is missing.
My query:
SELECT
ep.value.string_value as ContentID,
ep2.value.string_value as ContentName,
COUNT(1) as `Count`
FROM
`mydataset.mytable.events_*`,
UNNEST(event_params) as ep,
UNNEST(event_params) as ep2
WHERE
event_name="my_event_name" AND
ep.key="contentID" AND
ep2.key="contentName"
GROUP BY 1,2
and I get:
| No | contentId | contentName | Count |
|----|-----------|-------------|-------|
| 1 | abc | Name of ABC | 1 |
| 2 | cde | Name of CDE | 2 |
| 3 | efg | Name of EFG | 1 |
However, I would like to get:
| No | contentId | contentName | Count |
|----|-----------|-------------|-------|
| 1 | abc | Name of ABC | 3 |
| 2 | cde | Name of CDE | 2 |
| 3 | efg | Name of EFG | 1 |
I want to somehow fill in the missing contentName values using values from other rows with the same contentId (we can assume that each contentId always has the same contentName).
How can I achieve this? I thought about a self-join, but it's not really recommended by BigQuery.

The solution provided by Gordon can be slightly modified to achieve what you want:
SELECT contentId.value.string_value as ContentID,
MAX(contentName.value.string_value) as ContentName,
COUNT(1) as `Count`
FROM `mydataset.mytable.events_*` e LEFT JOIN
UNNEST(e.event_params) as contentId
ON contentId.key = 'contentID' LEFT JOIN
UNNEST(e.event_params) contentName
ON contentName.key = 'contentName'
WHERE e.event_name = 'my_event_name'
GROUP BY 1;
Note that I am grouping only by the ContentID and I am aggregating the ContentNames using MAX, which ignores null values.
I have recreated your example table and it works as expected.
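The following minimal sketch shows why grouping by ContentID alone and taking MAX(ContentName) gives the desired counts. It uses Python's built-in sqlite3 module and rests on two assumptions:

- SQLite stands in for BigQuery.
- SQLite has no repeated fields, so event_params is flattened into a hypothetical side table (event_id, param_key, string_value). That table is LEFT JOINed twice, mirroring the two UNNEST joins above.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery, and the repeated event_params field
# is modelled as a hypothetical flattened table (one row per event and key).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER, event_name TEXT);
CREATE TABLE event_params (event_id INTEGER, param_key TEXT, string_value TEXT);
""")
conn.executemany("INSERT INTO events VALUES (?, 'my_event_name')", [(i,) for i in range(1, 7)])
ids = {1: "abc", 2: "abc", 3: "cde", 4: "efg", 5: "abc", 6: "cde"}
names = {2: "Name of ABC", 3: "Name of CDE", 4: "Name of EFG", 6: "Name of CDE"}  # events 1 and 5 lack contentName
conn.executemany("INSERT INTO event_params VALUES (?, 'contentID', ?)", ids.items())
conn.executemany("INSERT INTO event_params VALUES (?, 'contentName', ?)", names.items())

rows = conn.execute("""
SELECT cid.string_value AS ContentID,
       MAX(cname.string_value) AS ContentName,  -- MAX ignores the NULLs produced by the LEFT JOIN
       COUNT(1) AS "Count"
FROM events e
LEFT JOIN event_params cid ON cid.event_id = e.event_id AND cid.param_key = 'contentID'
LEFT JOIN event_params cname ON cname.event_id = e.event_id AND cname.param_key = 'contentName'
WHERE e.event_name = 'my_event_name'
GROUP BY 1
ORDER BY 1
""").fetchall()
print(rows)
```

The rows for abc that have no name still count toward its total of 3. This relies on the question's assumption that each contentId maps to a single contentName.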

You can update the table to fill in the nulls, and then run your query:
[1]
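UPDATE `your_project.your_dataset.your_table` t_incomplete
SET t_incomplete.contentName = t_complete.contentName
FROM (SELECT contentId, MAX(contentName) AS contentName -- one source row per contentId, as BigQuery's UPDATE requires
      FROM `your_project.your_dataset.your_table`
      WHERE contentName IS NOT NULL
      GROUP BY contentId) t_complete
WHERE t_incomplete.contentId = t_complete.contentId
AND t_incomplete.contentName IS NULL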
I am not sure how this will work with nested tables, but you can always:
1. unnest the table,
2. run the UPDATE query [1],
3. nest it again.
You can see the idea behind it with this sample table:
CREATE TABLE `your_project.your_dataset.sample_table`
(
id INT64,
nullable STRING
);
INSERT INTO `your_project.your_dataset.sample_table`
VALUES (1, 'foo');
INSERT INTO `your_project.your_dataset.sample_table`
VALUES (1, null);
INSERT INTO `your_project.your_dataset.sample_table`
VALUES (2, 'lel');
INSERT INTO `your_project.your_dataset.sample_table`
VALUES (1, null);
INSERT INTO `your_project.your_dataset.sample_table`
VALUES (2, null);
and query [2]:
UPDATE `your_project.your_dataset.sample_table` t_incomplete
SET t_incomplete.nullable = t_complete.nullable
FROM `your_project.your_dataset.sample_table` t_complete
WHERE t_incomplete.id = t_complete.id
AND t_complete.nullable IS NOT NULL
This way the empty cells actually get the corresponding value, and you can run your query without worries. I hope this works!
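Here is a rough SQLite equivalent of query [2], runnable with Python's sqlite3 module, that shows the fill-the-nulls idea on the same sample rows. It makes two substitutions:

- It uses a correlated subquery in place of BigQuery's UPDATE ... FROM. This is a swapped-in technique that every SQLite version supports.
- MAX() picks exactly one value per id, so duplicate complete rows cannot make the update ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (id INTEGER, nullable TEXT)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)",
                 [(1, "foo"), (1, None), (2, "lel"), (1, None), (2, None)])

# Fill every NULL from a non-NULL row with the same id; MAX() reduces the
# candidate rows to a single value per id.
conn.execute("""
UPDATE sample_table
SET nullable = (SELECT MAX(c.nullable)
                FROM sample_table AS c
                WHERE c.id = sample_table.id AND c.nullable IS NOT NULL)
WHERE nullable IS NULL
""")
rows = conn.execute("SELECT id, nullable FROM sample_table ORDER BY rowid").fetchall()
print(rows)
```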

Do you just need an OR condition?
WHERE event_name = 'my_event_name' AND
ep.key = 'contentID' AND
(ep2.key = 'contentName' OR ep2.key IS NULL)
EDIT:
I think you need LEFT JOINs:
SELECT contentId.value.string_value as ContentID,
contentName.value.string_value as ContentName,
COUNT(1) as `Count`
FROM `mydataset.mytable.events_*` e LEFT JOIN
UNNEST(e.event_params) as contentId
ON contentId.key = 'contentID' LEFT JOIN
UNNEST(e.event_params) contentName
ON contentName.key = 'contentName'
WHERE e.event_name = 'my_event_name'
GROUP BY 1, 2;
Note: This should preserve the counts you want but might result in extra rows in the result set.
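The next sketch illustrates the "extra rows" caveat. It uses Python's sqlite3 module with SQLite standing in for BigQuery. As an assumption, the repeated event_params field is modelled as a hypothetical flattened table, and the query groups by both columns as above.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; event_params is a flattened side table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER, event_name TEXT);
CREATE TABLE event_params (event_id INTEGER, param_key TEXT, string_value TEXT);
""")
conn.executemany("INSERT INTO events VALUES (?, 'my_event_name')", [(i,) for i in range(1, 7)])
ids = {1: "abc", 2: "abc", 3: "cde", 4: "efg", 5: "abc", 6: "cde"}
names = {2: "Name of ABC", 3: "Name of CDE", 4: "Name of EFG", 6: "Name of CDE"}  # events 1 and 5 lack contentName
conn.executemany("INSERT INTO event_params VALUES (?, 'contentID', ?)", ids.items())
conn.executemany("INSERT INTO event_params VALUES (?, 'contentName', ?)", names.items())

rows = conn.execute("""
SELECT cid.string_value, cname.string_value, COUNT(1)
FROM events e
LEFT JOIN event_params cid ON cid.event_id = e.event_id AND cid.param_key = 'contentID'
LEFT JOIN event_params cname ON cname.event_id = e.event_id AND cname.param_key = 'contentName'
WHERE e.event_name = 'my_event_name'
GROUP BY 1, 2  -- grouping by the name keeps the NULL-name rows in their own group
ORDER BY 1, 2
""").fetchall()
print(rows)
```

No rows are lost, but abc is split into a NULL-name group and a named group. Aggregating the name with MAX and grouping only by ContentID merges them.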

Related

How do I update a column of a table with data from another column of the same table?

I have a table "table1" like this:
+------+-------------+------+
| id   | barcode     | lot  |
+------+-------------+------+
| 0    | ABC-123-456 |      |
| 1    | ABC-123-654 |      |
| 2    | ABC-789-EFG |      |
| 3    | ABC-456-EFG |      |
+------+-------------+------+
I need to extract the number in the middle of the "barcode" column, as with this query:
SELECT SUBSTR(barcode, 5, 3) AS ToExtract FROM table1;
The result:
+-----------+
| ToExtract |
+-----------+
| 123 |
| 123 |
| 789 |
| 456 |
+-----------+
Then I need to insert it into the "lot" column.
Follow this pattern:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
i.e. in your case:
UPDATE table_name
SET lot = SUBSTR(barcode, 5, 3)
WHERE condition; -- if any
UPDATE table1 SET Lot = SUBSTR(barcode, 5, 3)
-- WHERE ...;
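The following is a quick check of what SUBSTR(barcode, 5, 3) writes into lot, run against the sample rows with Python's sqlite3 module. SQLite is only a stand-in here. Its SUBSTR is 1-based, which the query above assumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, barcode TEXT, lot TEXT)")
conn.executemany("INSERT INTO table1 (id, barcode) VALUES (?, ?)",
                 [(0, "ABC-123-456"), (1, "ABC-123-654"), (2, "ABC-789-EFG"), (3, "ABC-456-EFG")])

# Start at character 5 (1-based) and take 3 characters; no WHERE, so every row is updated.
conn.execute("UPDATE table1 SET lot = SUBSTR(barcode, 5, 3)")
lots = [row[0] for row in conn.execute("SELECT lot FROM table1 ORDER BY id")]
print(lots)
```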
Many databases support generated columns (aka "virtual" or "computed" columns). These let you define a column as an expression. The syntax is something like this:
alter table table1 add column lot varchar(3) generated always as (SUBSTR(barcode, 5, 3))
Using a generated column has several advantages:
- It is always up-to-date.
- A virtual (non-stored) generated column generally does not occupy any space.
- There is no overhead when adding the column or writing rows (although there is overhead when querying the table).
I should note that the syntax varies a bit among databases. Some don't require the type specification; some use just AS instead of GENERATED ALWAYS AS.
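SQLite (3.31 and later) is one of the databases that support generated columns. It spells them GENERATED ALWAYS AS (...) VIRTUAL, and ALTER TABLE can only add the VIRTUAL kind. The small sketch below uses Python's sqlite3 and assumes the bundled SQLite is recent enough. It shows that the column also stays correct for rows inserted after it was added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, barcode TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(0, "ABC-123-456"), (1, "ABC-123-654"), (2, "ABC-789-EFG"), (3, "ABC-456-EFG")])

# SQLite spelling of a virtual generated column; its value is computed on read.
conn.execute("ALTER TABLE table1 ADD COLUMN lot TEXT GENERATED ALWAYS AS (SUBSTR(barcode, 5, 3)) VIRTUAL")
conn.execute("INSERT INTO table1 (id, barcode) VALUES (4, 'ABC-999-XYZ')")  # lot is never written explicitly
lots = [row[0] for row in conn.execute("SELECT lot FROM table1 ORDER BY id")]
print(lots)
```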
CREATE TABLE Table1(id INT,barcode varchar(255),lot varchar(255))
INSERT INTO Table1 VALUES (0,'ABC-123-456',NULL),(1,'ABC-123-654',NULL),(2,'ABC-789-EFG',NULL)
,(3,'ABC-456-EFG',NULL)
UPDATE a
SET a.lot = SUBSTRING(b.barcode, 5, 3)
FROM Table1 a
INNER JOIN Table1 b ON a.id=b.id
WHERE a.lot IS NULL
id | barcode | lot
-: | :---------- | :--
0 | ABC-123-456 | 123
1 | ABC-123-654 | 123
2 | ABC-789-EFG | 789
3 | ABC-456-EFG | 456

Join two tables returning all rows as single row from the second table

I want to get data in a single row from two tables that have a one-to-many relationship.
I know that for each record in the primary table, the secondary table can have at most 10 rows. Here is the structure of the tables:
Primary Table
-------------------------------------------------
| ImportRecordId | Summary |
--------------------------------------------------
| 1 | Imported Successfully |
| 2 | Failed |
| 3 | Imported Successfully |
-------------------------------------------------
Secondary table
------------------------------------------------------
| ImportRecordId | CodeName | CodeValue |
-------------------------------------------------------
| 1 | ABC | 123456A |
| 1 | DEF | 8766339 |
| 1 | GHI | 887790H |
------------------------------------------------------
I want to write a query with an inner join that gets data from both tables, in such a way that each row from the secondary table is treated as columns instead of being shown as a separate row.
I can hard-code 20 column names (since at most 10 records can exist in the secondary table and I want to display the values of two columns in a single row), so if there are fewer than 10 records in the secondary table, all the remaining columns will be shown as NULL.
Here is the expected output. For the first record in the primary table there are only three rows, so the two required columns from these three rows are converted into columns, and all other column values are NULL.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| ImportRecordId | Summary | CodeName1 | CodeValue1 | CodeName2 | CodeValue2 | CodeName3 | CodeValue3 | CodeName4 | CodeValue4| CodeName5 | CodeValue5| CodeName6 | CodeValue6| CodeName7 | CodeValue7 | CodeName8 | CodeValue8 | CodeName9 | CodeValue9 | CodeName10 | CodeValue10|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Imported Successfully | ABC | 123456A | DEF | 8766339 | GHI | 887790H | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here is my simple SQL query, which returns all data from both tables. Instead of multiple rows from the secondary table, I want to get them in a single row like the result set above.
Select p.ImportRecordId,p.Summary,s.*
from [dbo].[primary_table] p
inner join [dbo].[secondary_table] s on p.ImportRecordId = s.ImportRecordId
The following uses Row_Number(), a JOIN and a CROSS APPLY to create the source of the PIVOT.
You'll have to add CodeName/CodeValue 4 through 10 to the PIVOT column list.
Example
Select *
From (
Select A.[ImportRecordId]
,B.Summary
,C.*
From (
Select *
,RN = Row_Number() over (Partition by [ImportRecordId] Order by [CodeName])
From [dbo].[secondary_table] A
) A
Join [dbo].[primary_table] B on A.[ImportRecordId]=B.[ImportRecordId]
Cross Apply (values (concat('CodeName' ,RN),CodeName)
,(concat('CodeValue',RN),CodeValue)
) C(Item,Value)
) src
Pivot (max(value) for Item in (CodeName1,CodeValue1,CodeName2,CodeValue2,CodeName3,CodeValue3) ) pvt
Returns
ImportRecordId Summary CodeName1 CodeValue1 CodeName2 CodeValue2 CodeName3 CodeValue3
1 Imported Successfully ABC 123456A DEF 8766339 GHI 887790H
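SQLite has neither PIVOT nor CROSS APPLY, so the sketch below swaps in conditional aggregation (MAX(CASE ...)) over the same ROW_NUMBER() numbering, using the question's sample data. The SELECT list for all ten CodeName/CodeValue pairs is generated in Python rather than typed out by hand. Window functions need SQLite 3.25 or later, which is an assumption about the bundled version.

```python
import sqlite3

MAX_CODES = 10  # the question says a primary record has at most 10 secondary rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_table (ImportRecordId INTEGER, Summary TEXT);
CREATE TABLE secondary_table (ImportRecordId INTEGER, CodeName TEXT, CodeValue TEXT);
INSERT INTO primary_table VALUES (1, 'Imported Successfully'), (2, 'Failed'), (3, 'Imported Successfully');
INSERT INTO secondary_table VALUES (1, 'ABC', '123456A'), (1, 'DEF', '8766339'), (1, 'GHI', '887790H');
""")

# One CodeNameN/CodeValueN pair per slot; a slot with no matching row stays NULL.
slot_columns = ",\n".join(
    f"MAX(CASE WHEN n.rn = {i} THEN n.CodeName END) AS CodeName{i}, "
    f"MAX(CASE WHEN n.rn = {i} THEN n.CodeValue END) AS CodeValue{i}"
    for i in range(1, MAX_CODES + 1)
)
query = f"""
WITH numbered AS (
    SELECT ImportRecordId, CodeName, CodeValue,
           ROW_NUMBER() OVER (PARTITION BY ImportRecordId ORDER BY CodeName) AS rn
    FROM secondary_table
)
SELECT p.ImportRecordId, p.Summary, {slot_columns}
FROM primary_table p
JOIN numbered n ON n.ImportRecordId = p.ImportRecordId
GROUP BY p.ImportRecordId, p.Summary
"""
rows = conn.execute(query).fetchall()
print(rows[0][:8])
```

As in the PIVOT answer, the inner JOIN keeps only records that have secondary rows. A LEFT JOIN would also return records 2 and 3 with all slots NULL.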

SQL conditional join with default values

I'm struggling a bit with this query. I have the following tables:
setting
-------
id | name | value | type
--------------------------------
1 | title | Hi | string
2 | color | #ff0000 | string
user_setting
-------
id | userId | settingId | value
--------------------------------
1 | 1 | 1 | Hello
user
-------
id | email
1 | foo@test.com
I want to run a query that will select all settings for user 1, but also include the default value, so ideally I get this:
name | default | value
-----------------------
title | Hi | Hello
color | #ff0000 | null
My current query is
SELECT setting.id, setting.name, setting.value, user_setting.value, user.id
FROM setting
RIGHT JOIN user_setting
ON setting.id = "user_setting"."settingId"
LEFT OUTER JOIN user
ON "user_setting"."userId" = user.id
WHERE user.id = 1
But this only gives me the values that the user has defined.
EDIT: Updated setting table
I think you want a left join. But I think your setting table is missing a column for setting_id (or whatever it is called). So the table should really be:
setting
-------
id | name | value | type
--------------------------------
1 | title | Hi | string
2 | color | #ff0000 | string
Otherwise user_setting.settingId doesn't refer to anything. With this column, you want:
select s.name, s.value as "default", us.value
from setting s left join
user_setting us
on us."settingId" = s.id and us."userId" = 1
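The following is a runnable check of that query using Python's sqlite3 module, with SQLite standing in for the asker's database. It also shows why the userId filter belongs in the ON clause: moving it into WHERE turns the LEFT JOIN back into an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE setting (id INTEGER, name TEXT, value TEXT, type TEXT);
CREATE TABLE user_setting (id INTEGER, "userId" INTEGER, "settingId" INTEGER, value TEXT);
INSERT INTO setting VALUES (1, 'title', 'Hi', 'string'), (2, 'color', '#ff0000', 'string');
INSERT INTO user_setting VALUES (1, 1, 1, 'Hello');
""")

# Filter in ON: settings the user never overrode still appear, with a NULL value.
with_on = conn.execute("""
SELECT s.name, s.value AS "default", us.value
FROM setting s
LEFT JOIN user_setting us ON us."settingId" = s.id AND us."userId" = 1
ORDER BY s.id
""").fetchall()

# Same filter in WHERE: the NULL-extended row for 'color' is discarded.
with_where = conn.execute("""
SELECT s.name, s.value AS "default", us.value
FROM setting s
LEFT JOIN user_setting us ON us."settingId" = s.id
WHERE us."userId" = 1
ORDER BY s.id
""").fetchall()
print(with_on)
print(with_where)
```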

Add columns from table 2 to table 1 based on matching Ids (SQL)

I have a stored procedure that, after performing certain calculations, selects the columns of a temp table to display in the UI.
Here is the end of that stored procedure:
SELECT Id, Data, Value from #preopt
This SELECT statement returns the following data:
Id | Data | Value
1 | xyz | 232
2 | abc | 222
3 | 3232 | www
I also have another table, which is not a temporary table. It contains the following data:
SELECT Id, List1, List2 from dbo.IdLists
Id | List1 | List2
1 | g23 | h323
45 | g21 | h44
2 | g455 | g45
3 | g32 | h48
I want the final table from the stored procedure, the temp table #preopt, to look like this. It should match the Id column in #preopt against the Id column in dbo.IdLists and, for each match, pick up the List1 and List2 values for that Id and add them to #preopt:
Id | Data | Value | List1 | List2
1 | xyz | 232 | g23 | h323
2 | abc | 222 | g455 | g45
3 | 3232 | www | g32 | h48
Can someone please let me know if this is achievable?
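This query should do the trick: it updates List1 and List2 in the temp table using the values from the join on dbo.IdLists. It assumes #preopt already has List1 and List2 columns. If it does not, add them first with ALTER TABLE #preopt ADD, using the same types as in dbo.IdLists.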
UPDATE p
SET p.List1 = l.List1, p.List2 = l.List2
FROM #preopt p
INNER JOIN dbo.IdLists l
ON p.Id = l.Id
Looks like you want to do a join.
SELECT po.Id, po.Data, po.Value, il.List1, il.List2 from #preopt po
INNER JOIN dbo.IdLists il on po.Id = il.Id
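A minimal sketch of the join answer, run in SQLite through Python's sqlite3 module. It makes two assumptions:

- A SQLite TEMP table named preopt stands in for SQL Server's #preopt.
- Value is stored as text because the sample mixes numbers and strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TEMP TABLE preopt (Id INTEGER, Data TEXT, Value TEXT);
CREATE TABLE IdLists (Id INTEGER, List1 TEXT, List2 TEXT);
INSERT INTO preopt VALUES (1, 'xyz', '232'), (2, 'abc', '222'), (3, '3232', 'www');
INSERT INTO IdLists VALUES (1, 'g23', 'h323'), (45, 'g21', 'h44'), (2, 'g455', 'g45'), (3, 'g32', 'h48');
""")
rows = conn.execute("""
SELECT po.Id, po.Data, po.Value, il.List1, il.List2
FROM preopt po
INNER JOIN IdLists il ON po.Id = il.Id  -- Id 45 has no match in preopt, so it is dropped
ORDER BY po.Id
""").fetchall()
print(rows)
```

Use a LEFT JOIN instead if rows in #preopt without a match in dbo.IdLists should be kept, with NULL lists.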

View Table over Language/Client/Status Table

I would like to simplify my data with a view, MainView, but am having a hard time figuring it out.
I have a Fact table that is specific to clients, languages, and statuses. The ID in the Fact table comes from a FactLink table that just has a FactLinkID column. The Status table has an Order column that needs to be shown in the aggregate view instead of the StatusID. My Main table references the Fact table in multiple columns.
The end goal is to be able to query the view by the compound index of LanguageID, StatusOrder, ClientID more simply than I could before, grabbing the largest specified StatusOrder for the specified ClientID or for ClientID 1. That is what I was hoping to simplify with the view.
So,
Main
ID | DescriptionID | DisclaimerID | Other
----+---------------+--------------+-------------
50 | 1 | 2 | Blah
55 | 4 | 3 | Blah Blah
Fact
FactID | LanguageID | StatusID | ClientID | Description
-------+------------+----------+----------+------------
1 | 1 | 1 | 1 | Some text
1 | 2 | 1 | 1 | Otro texto
1 | 1 | 3 | 2 | Modified text
2 | 1 | 1 | 1 | Disclaimer1
3 | 1 | 1 | 1 | Disclaimer2
4 | 1 | 1 | 1 | Some text 2
FactLink
ID
--
1
2
3
4
Status
ID | Order
---+------
1 | 10
2 | 100
3 | 20
MainView
MainID | StatusOrder | LanguageID | ClientID | Description | Disclaimer | Other
-------+-------------+------------+----------+---------------+-------------+------
50 | 10 | 1 | 1 | Some text | Disclaimer1 | Blah
50 | 10 | 2 | 1 | Otro texto | NULL | Blah
50 | 20 | 1 | 2 | Modified text | NULL | Blah
55 | 10 | 1 | 1 | Some text 2 | Disclaimer2 | Blah Blah
Here's how I implemented it with just a single column that references the Fact table:
DROP VIEW IF EXISTS dbo.KeywordView
GO
CREATE VIEW dbo.KeywordView
WITH SCHEMABINDING
AS
SELECT t.KeywordID, f.ClientID, f.Description Keyword, f.LanguageID, s.[Order] StatusOrder
FROM dbo.Keyword t
JOIN dbo.Fact f
ON f.FactLinkID = t.KeywordID
JOIN dbo.Status s
ON f.StatusID = s.StatusID
GO
CREATE UNIQUE CLUSTERED INDEX KeywordIndex
ON dbo.KeywordView (KeywordID, ClientID, LanguageID, StatusOrder)
My previous query queried for everything except for that StatusOrder. But adding in the StatusOrder seems to complicate things. Here's my previous query without the StatusOrder. When I created a view on a table with just a single Fact linked column it greatly simplified things, but extending that to two or more columns has proven difficult!
SELECT
Main.ID,
COALESCE(fDescription.Description, dfDescription.Description) Description,
COALESCE(fDisclaimer.Description, dfDisclaimer.Description) Disclaimer,
Main.Other
FROM Main
LEFT OUTER JOIN Fact fDescription
ON fDescription.FactLinkID = Main.DescriptionID
AND fDescription.ClientID = @clientID
AND fDescription.LanguageID = @langID
AND fDescription.StatusID = @statusID -- This actually needs to get the largest `StatusOrder`, not the `StatusID`.
LEFT OUTER JOIN Fact dfDescription
ON dfDescription.FactLinkID = Main.DescriptionID
AND dfDescription.ClientID = 1
AND dfDescription.LanguageID = @langID
AND dfDescription.StatusID = @statusID
... -- Same for Disclaimer
WHERE Main.ID = 50
Not sure if this is the most performant or elegant way to solve this problem, but I finally thought of a way to do it. The problem with the solution below is that it can no longer be indexed, so now I need to figure out how to do that without wrapping it in a derived table.
SELECT
x.ID,
x.StatusOrder,
x.LanguageID,
x.ClientID,
x.Other,
MAX(x.Description) Description,
MAX(x.Disclaimer) Disclaimer
FROM (
SELECT
Main.ID,
s.[Order] StatusOrder,
f.LanguageID,
f.ClientID,
f.Description,
NULL Disclaimer,
Main.Other
FROM Main
JOIN Fact f
ON f.FactID = Main.DescriptionID
JOIN Status s ON s.StatusID = f.StatusID
UNION ALL
SELECT
Main.ID,
s.[Order] StatusOrder,
f.LanguageID,
f.ClientID,
NULL Description,
f.Description Disclaimer,
Main.Other
FROM Main
JOIN Fact f
ON f.FactID = Main.DisclaimerID
JOIN Status s ON s.StatusID = f.StatusID
) x
GROUP BY x.ID, x.StatusOrder, x.LanguageID, x.ClientID, x.Other
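The sketch below checks that the UNION ALL + MAX approach reproduces the MainView table. It runs in SQLite through Python's sqlite3 module with the sample data from the question. It assumes the following:

- Following the view definition, Status has StatusID and Order columns. SQLite quotes Order as "Order" instead of [Order].
- Fact rows are joined on FactID, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Main (ID INTEGER, DescriptionID INTEGER, DisclaimerID INTEGER, Other TEXT);
CREATE TABLE Fact (FactID INTEGER, LanguageID INTEGER, StatusID INTEGER, ClientID INTEGER, Description TEXT);
CREATE TABLE Status (StatusID INTEGER, "Order" INTEGER);
INSERT INTO Main VALUES (50, 1, 2, 'Blah'), (55, 4, 3, 'Blah Blah');
INSERT INTO Fact VALUES (1, 1, 1, 1, 'Some text'), (1, 2, 1, 1, 'Otro texto'), (1, 1, 3, 2, 'Modified text'),
                        (2, 1, 1, 1, 'Disclaimer1'), (3, 1, 1, 1, 'Disclaimer2'), (4, 1, 1, 1, 'Some text 2');
INSERT INTO Status VALUES (1, 10), (2, 100), (3, 20);
""")
# Each branch fills one of the two text columns; MAX collapses the pair per group.
rows = conn.execute("""
SELECT x.ID, x.StatusOrder, x.LanguageID, x.ClientID,
       MAX(x.Description) AS Description, MAX(x.Disclaimer) AS Disclaimer, x.Other
FROM (
    SELECT m.ID, s."Order" AS StatusOrder, f.LanguageID, f.ClientID,
           f.Description, NULL AS Disclaimer, m.Other
    FROM Main m JOIN Fact f ON f.FactID = m.DescriptionID
    JOIN Status s ON s.StatusID = f.StatusID
    UNION ALL
    SELECT m.ID, s."Order", f.LanguageID, f.ClientID,
           NULL, f.Description, m.Other
    FROM Main m JOIN Fact f ON f.FactID = m.DisclaimerID
    JOIN Status s ON s.StatusID = f.StatusID
) x
GROUP BY x.ID, x.StatusOrder, x.LanguageID, x.ClientID, x.Other
ORDER BY x.ID, x.StatusOrder, x.LanguageID
""").fetchall()
for row in rows:
    print(row)
```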