How to remove Repeated field in BigQuery schema?

I have a schema with a repeated field nested inside another repeated field, like so: person.children.toys. I want to make the inner field non-repeated (so each child can have only a single nullable toy). I know that for such a change I need to create a new table with the new schema and run a SQL query that inserts the modified results into it, but I don't know how to write the query. It needs to select the first toy (or null) for each child and insert the resulting objects into the new table. The source table is guaranteed to have no more than 1 toy per child.

Below is for BigQuery Standard SQL.
It may look over-complicated, but it fully preserves the original schema while dropping all but the first toy (or null) for each child. This is handy when your real schema has more than a few fields, because you don't have to list them.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, STRUCT([STRUCT('mike' AS name, ['woody'] AS toys)] AS children) AS person UNION ALL
SELECT 2 id, STRUCT([STRUCT('nik', ['buzz', 'bobeep']), ('john', ['car', 'buzz', 'bobeep'])] AS children) AS person UNION ALL
SELECT 3 id, STRUCT([STRUCT('vincent', IF(TRUE,[],['']))] AS children) AS person
)
SELECT *
REPLACE(
(SELECT AS STRUCT *
REPLACE (
(SELECT ARRAY_AGG(t) FROM
(SELECT * REPLACE((SELECT toy FROM UNNEST(toys) toy WITH OFFSET ORDER BY OFFSET LIMIT 1) AS toys) FROM UNNEST(children)) t)
AS children)
FROM UNNEST([person]))
AS person)
FROM `project.dataset.table`
Applied to data shaped like this:
Row  id  person.children.name  person.children.toys
1    1   mike                  toy1
2    2   nik                   toy2
                               toy3
         john                  toy4
                               toy5
                               toy6
3    3   vincent
the result will be:
Row  id  person.children.name  person.children.toys
1    1   mike                  toy1
2    2   nik                   toy2
         john                  toy4
3    3   vincent               null
Note: the toys field, originally REPEATED STRING, becomes a plain STRING.
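To make the intent of the nested REPLACE query easier to follow, here is a minimal Python sketch of the same transformation on plain dictionaries. It is not BigQuery code: the rows list and the keep_first_toy helper are hypothetical stand-ins that only illustrate "keep every other field, replace each child's toy list with its first toy or null".

```python
# Hypothetical in-memory stand-in for the nested rows; this is not BigQuery itself.
def keep_first_toy(row):
    """Return a copy of row where each child's toy list becomes its first toy or None."""
    children = [
        # Copy every field of the child, overriding only "toys".
        {**child, "toys": child["toys"][0] if child["toys"] else None}
        for child in row["person"]["children"]
    ]
    # Copy every field of the row and of person, overriding only "children".
    return {**row, "person": {**row["person"], "children": children}}


rows = [
    {"id": 1, "person": {"children": [{"name": "mike", "toys": ["woody"]}]}},
    {"id": 2, "person": {"children": [
        {"name": "nik", "toys": ["buzz", "bobeep"]},
        {"name": "john", "toys": ["car", "buzz", "bobeep"]},
    ]}},
    {"id": 3, "person": {"children": [{"name": "vincent", "toys": []}]}},
]

flattened = [keep_first_toy(r) for r in rows]
for r in flattened:
    print(r["id"], [(c["name"], c["toys"]) for c in r["person"]["children"]])
```

The dictionary unpacking plays the role of SELECT * REPLACE: all other fields pass through untouched, which is why the approach scales to wide schemas.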

I could give you a better answer if the schema were described in more detail, but with the data provided:
CREATE OR REPLACE TABLE `temp.flat` AS
WITH data AS (
SELECT 1 id, STRUCT([STRUCT(['woody'] AS toy)] AS children) AS person
UNION ALL
SELECT 2 id, STRUCT([STRUCT(['buzz', 'bobeep'])] AS children) AS person
UNION ALL
SELECT 3 id, STRUCT([STRUCT(IF(true,[],['']))] AS children) AS person
)
SELECT id, person.children[SAFE_OFFSET(0)].toy[SAFE_OFFSET(0)] first_toy
FROM `data`

Related

Creating a Materialized Path for a tree with SQL (Teradata)

I have an org tree given in a table, which is sorted "top-down" (from parent to child). The level of each instance is also given as an attribute. Data structure example:
Index  Employee_name  Employee_level
1      Michael        1
2      Pam            2
3      Jim            2
4      Dwight         3
5      Angela         1
In the above tree, Michael is the parent of Pam and Jim, while Jim is the parent of Dwight. Angela is parallel to Michael with no children.
I wish to create a column that would let me query all the employees in a selected branch. After some research, I think a materialized path could work. I would therefore probably need to create a column with the parent of each employee, and then have a recursion create another column with the desired key. Any ideas how to create this with Teradata SQL? Thanks

This gets you the parent:
select t.*,
(
select max(index)
from tab as t2
where t2.index < t.index
and t2.Employee_level < t.Employee_level
) as parent_idx
from tab as t
Once you materialize it in a table (mytab below, with the index stored as idx), you can do a simple recursion:
WITH RECURSIVE cte AS
( -- traverse the hierarchy and build the path
SELECT idx, parent_idx, Employee_level, Employee_name
,Cast(Employee_name AS VARCHAR(500)) AS Path -- must be large enough for concatenating all levels
FROM mytab
WHERE Employee_level = 1
UNION ALL
SELECT t.idx, t.parent_idx, t.Employee_level, t.Employee_name
,cte.Path || ',' || Trim(t.Employee_name)
FROM cte JOIN mytab AS t
ON cte.idx = t.parent_idx
)
select * from cte
order by idx
;
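The two steps above (derive the parent, then build the path recursively) can be tried end to end with a small sketch. This one runs on SQLite through Python's sqlite3 module rather than on Teradata, so the syntax is only an approximation of the answer; the table names tab and mytab follow the answer, and storing the index as idx is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (idx INTEGER PRIMARY KEY, employee_name TEXT, employee_level INTEGER)"
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", [
    (1, "Michael", 1), (2, "Pam", 2), (3, "Jim", 2), (4, "Dwight", 3), (5, "Angela", 1),
])

# Step 1: the parent is the closest preceding row with a lower level.
conn.execute("""
    CREATE TABLE mytab AS
    SELECT t.*,
           (SELECT MAX(t2.idx)
              FROM tab AS t2
             WHERE t2.idx < t.idx
               AND t2.employee_level < t.employee_level) AS parent_idx
    FROM tab AS t
""")

# Step 2: walk down from the level-1 rows, appending each name to the path.
paths = conn.execute("""
    WITH RECURSIVE cte(idx, employee_name, path) AS (
        SELECT idx, employee_name, employee_name
        FROM mytab
        WHERE employee_level = 1
        UNION ALL
        SELECT t.idx, t.employee_name, cte.path || ',' || t.employee_name
        FROM cte JOIN mytab AS t ON cte.idx = t.parent_idx
    )
    SELECT idx, path FROM cte ORDER BY idx
""").fetchall()

for idx, path in paths:
    print(idx, path)

# Everyone in Michael's branch: the path is "Michael" or starts with "Michael,".
branch = [idx for idx, path in paths if path == "Michael" or path.startswith("Michael,")]
print(branch)
```

Matching on the path prefix including the separator avoids accidentally selecting an employee whose name merely starts with the same letters.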

Select all related records

I have a table (in SQL Server) that stores records as shown below. The Old_Id column is there for change tracking: when I want to update a record, the original record has to stay unchanged, and a new record is inserted with a new Id, the updated values, and the modified record's Id in the Old_Id column.
Id Name Old_Id
---------------------
1 Paul null
2 Paul 1
3 Jim null
4 Paul 2
5 Tim null
My question is:
When I search for id = 1 or 2 or 4, I want to select all related records.
In this case I want to see the records with the following ids: 1, 2, 4
How can this be written in a stored procedure?
Even if this is bad practice, I can't change the logic because it's a legacy database and quite a large one.
Can anyone help with this?

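You can do that with a recursive common table expression (CTE). Note that the anchor below only matches the root of a chain (old_id IS NULL), so it returns the full chain only when the searched ids include the root (1 here):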
WITH cte_history AS (
SELECT
h.id,
h.name,
h.old_id
FROM
history h
WHERE old_id IS NULL
and id in (1,2,4)
UNION ALL
SELECT
e.id,
e.name,
e.old_id
FROM
history e
INNER JOIN cte_history o
ON o.id = e.old_id
)
SELECT * FROM cte_history;
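If the search has to work from any id in the chain, one possible approach is to first walk up to the root and then walk back down. The sketch below is an assumption-laden illustration, not the answerer's method: it runs on SQLite via Python's sqlite3 instead of SQL Server, and the CTE names ancestors and chain and the helper related_records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, name TEXT, old_id INTEGER)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", [
    (1, "Paul", None), (2, "Paul", 1), (3, "Jim", None), (4, "Paul", 2), (5, "Tim", None),
])

RELATED_SQL = """
WITH RECURSIVE
ancestors(id, old_id) AS (
    -- start at the searched record and follow old_id upwards
    SELECT id, old_id FROM history WHERE id = :search_id
    UNION ALL
    SELECT h.id, h.old_id
    FROM history AS h JOIN ancestors AS a ON h.id = a.old_id
),
chain(id, name, old_id) AS (
    -- the root is the ancestor without an old_id; from there follow the chain downwards
    SELECT h.id, h.name, h.old_id
    FROM history AS h JOIN ancestors AS a ON h.id = a.id
    WHERE a.old_id IS NULL
    UNION ALL
    SELECT h.id, h.name, h.old_id
    FROM history AS h JOIN chain AS c ON h.old_id = c.id
)
SELECT id, name, old_id FROM chain ORDER BY id
"""


def related_records(search_id):
    """Return every version in the same change chain as search_id."""
    return conn.execute(RELATED_SQL, {"search_id": search_id}).fetchall()


for search_id in (1, 2, 4, 3):
    print(search_id, related_records(search_id))
```

In SQL Server the same shape would be written without the RECURSIVE keyword and with a procedure parameter in place of :search_id.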

sql complex order by multiple fields

I have a table category with an int field that can reference the primary key of the same table, like this:
ID category isSubCategoryOf orderingNumber
3 "red t-shirts" 2 2
1 "clothes" NULL 1
4 "cars" NULL 1
6 "Baby toys" 5 1
5 "Toys" NULL 1
2 "t-shirt" 1 1
I want the table to be ordered so that each category is followed by all of its sub-categories, and each sub-category by all of its own sub-categories.
ID category isSubCategoryOf orderingNumber
1 "clothes" NULL 1
2 "t-shirt" 1 1
3 "red t-shirts" 2 2
4 "cars" NULL 1
5 "Toys" NULL 1
6 "Baby toys" 5 1
Is such a thing possible to do with SQL or do I have to order this later by hand?

If I understand your needs correctly, you'll need a recursive query to deal with your hierarchical data:
WITH recCTE AS
(
--recursive seed
SELECT
category,
ID,
isSubCategoryOf as ParentID,
orderingNumber,
CAST(ID as VARCHAR(100)) as Path,
1 as depth
FROM table
WHERE isSubCategoryOf IS NULL
UNION ALL
--recursive term
SELECT
table.category,
table.id,
table.isSubCategoryOf,
table.orderingNumber,
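CAST(recCTE.Path + '>' + CAST(table.id AS VARCHAR(10)) AS VARCHAR(100)), -- cast the int ID, and keep the type equal to the seed's Path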
recCTE.depth + 1
FROM recCTE
INNER JOIN table ON
recCTE.ID = table.isSubCategoryOf
)
SELECT * FROM recCTE ORDER BY path
Recursive queries are made up of three parts:
1. The recursive seed, which is the starting point of the recursive lookups. In your case it's any record with a NULL isSubCategoryOf. You can think of these as the roots of your hierarchy.
2. The recursive term, which is the part of the recursive CTE that refers back to itself. It iterates until it comes up with no more records for each leg of the hierarchy.
3. The final SELECT statement that selects from the recursive CTE.
Here I made a path field that stitches together each ID that is part of the hierarchy. This gives you the field you can sort on to get your hierarchical sort as asked.
It seems like, with your data, your orderingNumber is akin to the depth field that I added to the recursive CTE above. If that's the case, then you can remove that field from the CTE and save a bit of processing.
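As a runnable illustration of sorting by a path column, here is a sketch on SQLite through Python's sqlite3. It deviates from the query above in one deliberate way, which is an assumption of this sketch: each ID is zero-padded with printf('%05d', ...) so that the text sort of the path agrees with numeric order (otherwise '10' would sort before '2'). The CTE name rec is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE category (
        id INTEGER PRIMARY KEY,
        category TEXT,
        isSubCategoryOf INTEGER,
        orderingNumber INTEGER
    )
""")
conn.executemany("INSERT INTO category VALUES (?, ?, ?, ?)", [
    (3, "red t-shirts", 2, 2), (1, "clothes", None, 1), (4, "cars", None, 1),
    (6, "Baby toys", 5, 1), (5, "Toys", None, 1), (2, "t-shirt", 1, 1),
])

ordered = conn.execute("""
    WITH RECURSIVE rec(id, category, path, depth) AS (
        -- recursive seed: the root categories
        SELECT id, category, printf('%05d', id), 1
        FROM category
        WHERE isSubCategoryOf IS NULL
        UNION ALL
        -- recursive term: append each child's padded id to its parent's path
        SELECT c.id, c.category, rec.path || '>' || printf('%05d', c.id), rec.depth + 1
        FROM rec JOIN category AS c ON c.isSubCategoryOf = rec.id
    )
    SELECT id, category, depth FROM rec ORDER BY path
""").fetchall()

for row in ordered:
    print(row)
```

Because a parent's path is always a prefix of its children's paths, every category sorts immediately before its own subtree.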

If you just want to sort and print, you can add a calculated column:
select ID, category, isSubCategoryOf, orderingNumber, ID + '_' + isSubCategoryOf as SortOrder
from table
order by ID + '_' + isSubCategoryOf

SQL: Tree structure without parent key

Note: The Data schema can not be changed. I'm stuck with it.
Database: SQLite
I have a simple tree structure, without parent keys, that is only 1 level deep. I have simplified the data for clarity:
ID Content Title
1 Null Canada
2 25 Toronto
3 33 Vancouver
4 Null USA
5 45 New York
6 56 Dallas
The structure is ordinal as well so all Canadian Cities are > Canada's ID of 1 and less than the USA's ID of 4
Question: How do I select all of a nation's cities when I do not know how many there are?

My query returns the cities for every country at once, which is probably more than you want, but:
http://sqlfiddle.com/#!5/94d63/3
SELECT *
FROM (
SELECT
place.Title AS country_name,
place.ID AS id,
(SELECT MIN(ID)
FROM place AS next_place
WHERE next_place.ID > place.ID
AND next_place.Content IS NULL
) AS next_id
FROM place
WHERE place.Content IS NULL
) AS country
INNER JOIN place
ON place.ID > country.id
AND CASE WHEN country.next_id IS NOT NULL
THEN place.ID < country.next_id
ELSE 1 END

select * from tbl
where id > 1
and id < (select min(id) from tbl where content is null and id > 1)
EDIT
I just realized the above does not work if there is no country with a greater ID, because the subquery then returns NULL. Wrapping the subquery in COALESCE fixes it:
select * from tbl a
where a.id > 4
and a.id < coalesce((select min(b.id) from tbl b where b.content is null and b.id > 4), a.id + 1)
Edit 2 - The COALESCE has to wrap the whole subquery, and the subquery has to compare against the country id rather than a.id, so the country id (4 here) appears in two places.
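A small sketch makes it easy to check the range logic for both countries, including the last one that has no following country. It assumes the table is called tbl as in this answer, uses Python's sqlite3 since the question targets SQLite, and wraps the query in a hypothetical helper cities_of with a named parameter so the country id is passed only once from the caller's side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, content INTEGER, title TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, None, "Canada"), (2, 25, "Toronto"), (3, 33, "Vancouver"),
    (4, None, "USA"), (5, 45, "New York"), (6, 56, "Dallas"),
])

CITIES_SQL = """
SELECT c.title
FROM tbl AS c
WHERE c.id > :country_id
  AND c.id < COALESCE(
        -- id of the next country, if there is one
        (SELECT MIN(n.id) FROM tbl AS n
          WHERE n.content IS NULL AND n.id > :country_id),
        -- otherwise an upper bound that every remaining row satisfies
        c.id + 1)
ORDER BY c.id
"""


def cities_of(country_id):
    """Return the titles of the rows between this country and the next one."""
    return [title for (title,) in conn.execute(CITIES_SQL, {"country_id": country_id})]


print(cities_of(1))
print(cities_of(4))
```

MIN over an empty set yields NULL, which is what lets COALESCE fall back to the open-ended bound for the last country.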

There are a few things to consider here, chiefly whether your data is going to change. Two options:
If your data is always organized as shown in your example, you can skip a fixed number of leading rows, e.g.
SELECT * FROM CITIES WHERE ID NOT IN (SELECT ID FROM CITIES ORDER BY ID LIMIT 3)
(SQLite has no TOP clause, so LIMIT is used here.)
Alternatively, you can create another table that specifies which city belongs to which parent, and build the hierarchy yourself.
I recommend the second option.

Select values in SQL that do not have other corresponding values except those that i search for

I have a table in my database:
Name | Element
1 2
1 3
4 2
4 3
4 5
I need to make a query that for a number of arguments will select the value of Name that has on the right side these and only these values.
E.g.:
arguments are 2 and 3, the query should return only 1 and not 4 (because 4 also has 5). For arguments 2,3,5 it should return 4.
My query looks like this:
SELECT name FROM aggregations WHERE (element=2 and name in (select name from aggregations where element=3))
What do I have to add to this query to make it not return 4?

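A simple way to do it (note that this also matches names that have additional elements, so for 2 and 3 it returns 4 as well as 1):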
SELECT name
FROM aggregations
WHERE element IN (2,3)
GROUP BY name
HAVING COUNT(element) = 2
If you want to add more, you'll need to change both the IN (2,3) part and the HAVING part:
SELECT name
FROM aggregations
WHERE element IN (2,3,5)
GROUP BY name
HAVING COUNT(element) = 3
A more robust way is to also exclude every name that has an element outside your set:
SELECT name
FROM aggregations
WHERE NOT EXISTS (
SELECT DISTINCT a.element
FROM aggregations a
WHERE a.element NOT IN (2,3,5)
AND a.name = aggregations.name
)
GROUP BY name
HAVING COUNT(element) = 3
It's not very efficient, though.
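The NOT EXISTS variant can be checked against the sample data with a short sketch. It runs on SQLite through Python's sqlite3; the helper names_with_exactly is hypothetical, and it builds the placeholder list from the arguments so the IN list and the expected count cannot drift apart. COUNT(DISTINCT element) is used instead of COUNT(element) as a guard against duplicate rows, which is an assumption beyond the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aggregations (name INTEGER, element INTEGER)")
conn.executemany("INSERT INTO aggregations VALUES (?, ?)", [
    (1, 2), (1, 3), (4, 2), (4, 3), (4, 5),
])


def names_with_exactly(elements):
    """Return names whose set of elements equals the given non-empty set."""
    wanted = sorted(set(elements))
    placeholders = ",".join("?" * len(wanted))  # e.g. "?,?" for two elements
    sql = f"""
        SELECT o.name
        FROM aggregations AS o
        WHERE NOT EXISTS (
            -- any element outside the wanted set disqualifies the name
            SELECT 1
            FROM aggregations AS a
            WHERE a.element NOT IN ({placeholders})
              AND a.name = o.name
        )
        GROUP BY o.name
        -- and every wanted element must be present
        HAVING COUNT(DISTINCT o.element) = ?
        ORDER BY o.name
    """
    return [name for (name,) in conn.execute(sql, wanted + [len(wanted)])]


print(names_with_exactly([2, 3]))
print(names_with_exactly([2, 3, 5]))
```

Only the placeholder markers are interpolated into the SQL text; the element values themselves are still passed as bound parameters.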

Create a temporary table, fill it with your values and query like this:
SELECT name
FROM (
SELECT DISTINCT name
FROM aggregations
) n
WHERE NOT EXISTS
(
SELECT 1
FROM (
SELECT element
FROM aggregations aii
WHERE aii.name = n.name
) ai
FULL OUTER JOIN
temptable tt
ON tt.element = ai.element
WHERE ai.element IS NULL OR tt.element IS NULL
)
This is more efficient than using COUNT(*), since it will stop checking a name as soon as it finds the first row that doesn't have a match (either in aggregations or in temptable)

This isn't tested, but usually I would do this with a query in my where clause for a small amount of data. Note that this is not efficient for large record counts.
SELECT ag1.Name FROM aggregations ag1
WHERE ag1.Element IN (2,3)
AND 0 = (select COUNT(ag2.Name)
FROM aggregations ag2
WHERE ag1.Name = ag2.Name
AND ag2.Element NOT IN (2,3)
)
GROUP BY ag1.name;
This says "Give me all of the names that have the elements I want, but have no records with elements I don't want"
