How to delete a STRUCT from an ARRAY in the NESTED field - sql

Is there a simple way to delete a STRUCT from a nested and repeated field in BigQuery (BQ table column Type: RECORD, Mode: REPEATED)?
Let's say I have the following tables:
wishlist
name toy.id toy.priority
Alice 1 high
2 medium
3 low
Kazik 3 high
1 medium
toys
id name available
1 car 0
2 doll 1
3 bike 1
I'd like to DELETE from wishlist toys that are not available (toys.available==0). In this case, it's toy.id==1.
As a result, the wishlist would look like this:
name toy.id toy.priority
Alice 2 medium
3 low
Kazik 3 high
I know how to select it:
WITH `project.dataset.wishlist` AS
(
SELECT 'Alice' name, [STRUCT<id INT64, priority STRING>(1, 'high'), (2, 'medium'), (3, 'low')] toy UNION ALL
SELECT 'Kazik' name, [STRUCT<id INT64, priority STRING>(3, 'high'), (1, 'medium')]
), toys AS (
SELECT 1 id, 'car' name, 0 available UNION ALL
SELECT 2 id, 'doll' name, 1 available UNION ALL
SELECT 3 id, 'bike' name, 1 available
)
SELECT wl.name, ARRAY_AGG(STRUCT(unnested_toy.id, unnested_toy.priority)) as toy
FROM `project.dataset.wishlist` wl, UNNEST (toy) as unnested_toy
LEFT JOIN toys t ON unnested_toy.id=t.id
WHERE t.available != 0
GROUP BY name
But I don't know how to remove structs <toy.id, toy.priority> from wishlist when toys.available==0.
There are very similar questions, such as "How to delete/update nested data in bigquery" and "How to Delete rows from Structure in bigquery", but the answers are either unclear to me in terms of deletion or suggest copying the whole wishlist to a new table using a SELECT statement. My 'wishlist' is huge and 'toys.available' changes often, so copying it seems very inefficient to me.
Could you please suggest a solution aligned with BQ best practices?
Thank you!

... since row Deletion was implemented in BQ, I thought that STRUCT deletion inside a row is also possible.
You can use UPDATE DML for this (not DELETE as it is used for deletion of whole row(s), while UPDATE can be used to modify the row)
update `project.dataset.wishlist` wl
set toy = ((
    select array_agg(struct(unnested_toy.id, unnested_toy.priority))
    from unnest(toy) as unnested_toy
    left join `project.dataset.toys` t on unnested_toy.id = t.id
    where t.available != 0
))
where true;

You can UNNEST() and reaggregate:
SELECT wl.name,
       (SELECT ARRAY_AGG(t)
        FROM UNNEST(wl.toy) t JOIN
             toys
             ON toys.id = t.id
        WHERE toys.available <> 0
       ) as available_toys
FROM `project.dataset.wishlist` wl;

Related

Subset large table for use in multiple UNIONs

Suppose I have a table with the following structure:
id measure_1_actual measure_1_predicted measure_2_actual measure_2_predicted
1 1 0 0 0
2 1 1 1 1
3 . . 0 0
I want to create the following table, for each ID (shown is an example for id = 1):
measure actual predicted
1 1 0
2 0 0
Here's one way I could solve this problem (I haven't tested this, but you get the general idea, I hope):
SELECT 1 AS measure,
measure_1_actual AS actual,
measure_1_predicted AS predicted
FROM tb
WHERE id = 1
UNION
SELECT 2 AS measure,
measure_2_actual AS actual,
measure_2_predicted AS predicted
FROM tb WHERE id = 1
In reality, I have five of these "measures" and tens of millions of people - subsetting such a large table five times for each member does not seem the most efficient way of doing this. This is a real-time API, receiving tens of requests a minute, so I think I'll need a better way of doing this. My other thought was to perhaps create a temp table/view for each member once the request is received, and then UNION based off of that subsetted table.
Does anyone have a more efficient way of doing this?
You can use a lateral join:
select t.id, v.*
from t cross join lateral
(values (1, measure_1_actual, measure_1_predicted),
(2, measure_2_actual, measure_2_predicted)
) v(measure, actual, predicted);
Lateral joins were introduced in Postgres 9.3. You can read about them in the documentation.
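The same "read the row once, emit one output row per measure" idea can be tried out without a Postgres server. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and because SQLite does not accept LATERAL, it swaps in a cross join against a small list of measure numbers plus CASE expressions. The `measures` CTE and the `unpivot` helper are hypothetical names, not part of the original answer.

```python
import sqlite3

# In-memory copy of the example table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb (
        id INTEGER PRIMARY KEY,
        measure_1_actual INTEGER, measure_1_predicted INTEGER,
        measure_2_actual INTEGER, measure_2_predicted INTEGER
    );
    INSERT INTO tb VALUES (1, 1, 0, 0, 0), (2, 1, 1, 1, 1), (3, NULL, NULL, 0, 0);
""")

# The row for the requested id is looked up once; the cross join against
# the measure numbers then fans it out into one output row per measure.
UNPIVOT_SQL = """
    WITH measures(measure) AS (VALUES (1), (2))
    SELECT m.measure,
           CASE m.measure WHEN 1 THEN t.measure_1_actual
                          WHEN 2 THEN t.measure_2_actual END AS actual,
           CASE m.measure WHEN 1 THEN t.measure_1_predicted
                          WHEN 2 THEN t.measure_2_predicted END AS predicted
    FROM tb AS t
    CROSS JOIN measures AS m
    WHERE t.id = ?
    ORDER BY m.measure
"""

def unpivot(member_id):
    """Return (measure, actual, predicted) rows for a single id."""
    return conn.execute(UNPIVOT_SQL, (member_id,)).fetchall()

print(unpivot(1))
```

Adding the remaining measures means one more entry in the VALUES list and one more WHEN branch per CASE; the table is still filtered by id only once.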

How to select rows from a hierarchical query filtering by descendant value in Oracle?

Given the table
ID PARENT_ID STRVAL SUBTYPE SUBVAL
0 null Chicago location city
1 0 Best Buy building bestbuy
2 0 Walmart building walmart
3 0 Amazon building amazon
4 1 Macbook object macbook
5 2 Sausages object sausages
6 3 Macbook object macbook
7 3 Tupperware object tupperware
What I'm attempting to do is query this table and get all items from level 1 (the buildings), but what I need to do is filter this return set by returning those that have children containing a certain value. The following query is what I have so far which returns Best Buy, Walmart, and Amazon
SELECT * FROM (
SELECT strval, parent_id, id
FROM stores
where LEVEL = 1
CONNECT BY PRIOR id = parent_id
START WITH parent_id = 0
)
What I would like to do is get a return where one of the descendants has a subtype of object and a subval of macbook, thus returning only Best Buy and Amazon from my query. I'm not really sure where to go from here.
Try reversing your CONNECT BY condition and starting with (i.e., START WITH) what you know:
SELECT DISTINCT strval, parent_id, id
FROM stores
where subtype = 'building'
CONNECT BY id = prior parent_id
START WITH subtype = 'object' and subval = 'macbook';
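For readers without an Oracle instance, the bottom-up walk (start at the matching objects, climb to their parents) can be approximated with a recursive common table expression. This is a sketch, not the answer's own method: it assumes Python's sqlite3 module and standard WITH RECURSIVE syntax instead of CONNECT BY, and the helper name `buildings_with` is made up for illustration.

```python
import sqlite3

# In-memory copy of the stores table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (id INTEGER PRIMARY KEY, parent_id INTEGER,
                         strval TEXT, subtype TEXT, subval TEXT);
    INSERT INTO stores VALUES
        (0, NULL, 'Chicago',    'location', 'city'),
        (1, 0,    'Best Buy',   'building', 'bestbuy'),
        (2, 0,    'Walmart',    'building', 'walmart'),
        (3, 0,    'Amazon',     'building', 'amazon'),
        (4, 1,    'Macbook',    'object',   'macbook'),
        (5, 2,    'Sausages',   'object',   'sausages'),
        (6, 3,    'Macbook',    'object',   'macbook'),
        (7, 3,    'Tupperware', 'object',   'tupperware');
""")

# Anchor member: the objects we know about.
# Recursive member: climb one level to the parent on each step.
# UNION (not UNION ALL) drops ancestors reached through several objects.
BOTTOM_UP_SQL = """
    WITH RECURSIVE up(id, parent_id, strval, subtype) AS (
        SELECT id, parent_id, strval, subtype
        FROM stores
        WHERE subtype = 'object' AND subval = ?
        UNION
        SELECT s.id, s.parent_id, s.strval, s.subtype
        FROM stores AS s
        JOIN up ON s.id = up.parent_id
    )
    SELECT strval, parent_id, id
    FROM up
    WHERE subtype = 'building'
    ORDER BY id
"""

def buildings_with(subval):
    """Buildings that have an object with the given subval below them."""
    return conn.execute(BOTTOM_UP_SQL, (subval,)).fetchall()

print(buildings_with("macbook"))
```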
Update for more general question
In the comments, you asked what if the starting values aren't at the same level?
In that case, I'm afraid you'll have to look at the whole tree for each building and then filter.
I added this row to your test data:
insert into stores values (8, 4, 'Year','edition','2015');
Then, this query gives the answer:
WITH whole_tree AS
(SELECT strval,
parent_id,
id,
CONNECT_BY_ROOT(strval) building,
SYS_CONNECT_BY_PATH (subtype || ':' || subval, ',') PATH
FROM stores
CONNECT BY PRIOR id = parent_id
START WITH subtype = 'building')
SELECT distinct building
FROM whole_tree
WHERE PATH LIKE '%object:macbook%edition:2015%';
This join should give you the macbook objects whose parents are buildings. Feel free to select only the columns you need:
select *
from
(
select *
from stores
where subtype = 'object'
and strval = 'Macbook'
) macs
join
(
select *
from stores
where subtype = 'building'
) bld
on bld.id = macs.parent_id

Comparing list of values against table

I have been trying to find a solution to this problem for some time without success, so any help would be much appreciated. A list of IDs needs to be compared against a table to find out which records exist (along with one of their values) and which do not. The list of IDs is in text format:
100,
200,
300
a DB table:
ID(PK) value01 value02 value03 .....
--------------------------------------
100 Ann
102 Bob
300 John
304 Marry
400 Jane
and output I need is:
100 Ann
200 missing or empty or whatever indication
300 John
The obvious solution is to create a table and join, but I have only read access (the DB is a closed vendor product; I'm just a user). Writing a PL/SQL function also seems complicated because the table has 200+ columns and 100k+ records, and I had no luck creating a dynamic array of records. Also, the list of IDs to be checked contains hundreds of IDs and I need to do this periodically, so any solution where each ID has to be changed on a separate line of code wouldn't be very useful.
Database is Oracle 10g.
There are many built-in public collection types. You can leverage one of them like this:
with ids as (select /*+ cardinality(a, 1) */ column_value id
from table(UTL_NLA_ARRAY_INT(100, 200, 300)) a
)
select ids.id, case when m.id is null then '**NO MATCH**' else m.value end value
from ids
left outer join my_table m
on m.id = ids.id;
To see a list of public types on your DB, run:
select owner, type_name, coll_type, elem_type_name, upper_bound, precision, scale from all_coll_types
where elem_type_name in ('FLOAT', 'INTEGER', 'NUMBER', 'DOUBLE PRECISION')
The hint
/*+ cardinality(a, 1) */
just tells Oracle how many elements are in the array (if not specified, the default is an assumption of about 8k elements). Set it to a reasonably accurate number.
You can transform a variable into a query using CONNECT BY (tested on 11g, should work on 10g+):
SQL> WITH DATA AS (SELECT '100,200,300' txt FROM dual)
     SELECT regexp_substr(txt, '[^,]+', 1, LEVEL) item FROM DATA
     CONNECT BY LEVEL <= length(txt) - length(REPLACE(txt, ',', '')) + 1;
ITEM
--------------------------------------------
100
200
300
You can then join this result to the table as if it were a standard view:
SQL> WITH DATA AS (SELECT '100,200,300' txt FROM dual)
     SELECT v.id, dbt.value01
     FROM dbt
     RIGHT JOIN
     (SELECT to_number(regexp_substr(txt, '[^,]+', 1, LEVEL)) ID
      FROM DATA
      CONNECT BY LEVEL <= length(txt) - length(REPLACE(txt, ',', '')) + 1) v
     ON dbt.id = v.id;
        ID VALUE01
---------- ----------
       100 Ann
       300 John
       200
One way of tackling this is to dynamically create a common table expression that can then be included in the query. The final syntax you'd be aiming for is:
with list_of_values as (
select 100 val from dual union all
select 200 val from dual union all
select 300 val from dual union all
...)
select
lov.val,
...
from
list_of_values lov left outer join
other_data t on (lov.val = t.val)
It's not very elegant, particularly for large sets of values, but compatibility with a database on which you might have few privileges is very good.
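Generating that common table expression by hand for hundreds of IDs is tedious, so a small client-side script can build it from the text list. The following is a rough sketch under several assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module rather than Oracle (on Oracle each branch would need `FROM dual`, and the placeholder style depends on the driver), and `parse_ids`, `lookup` and the two-column `other_data` table are hypothetical stand-ins.

```python
import sqlite3

# Small stand-in for the vendor table (only two of its many columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE other_data (val INTEGER PRIMARY KEY, value01 TEXT);
    INSERT INTO other_data VALUES
        (100, 'Ann'), (102, 'Bob'), (300, 'John'), (304, 'Marry'), (400, 'Jane');
""")

def parse_ids(text):
    """Turn '100,\\n200,\\n300' style text into a list of integers."""
    return [int(part) for part in text.split(",") if part.strip()]

def lookup(ids):
    """Left-join a generated list_of_values CTE against other_data.

    One 'SELECT ? AS val' branch is generated per id and the ids are
    passed as bind parameters, so no value is pasted into the SQL text.
    An empty id list is not handled here.
    """
    branches = " UNION ALL ".join("SELECT ? AS val" for _ in ids)
    sql = f"""
        WITH list_of_values AS ({branches})
        SELECT lov.val, COALESCE(t.value01, '**NO MATCH**') AS value01
        FROM list_of_values AS lov
        LEFT OUTER JOIN other_data AS t ON lov.val = t.val
        ORDER BY lov.val
    """
    return conn.execute(sql, ids).fetchall()

print(lookup(parse_ids("100,\n200,\n300")))
```

Databases usually cap the number of UNION ALL branches or bind parameters in one statement, so a very long list may need to be sent in batches.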

List category/subcategory tree and display its sub-categories in the same row

I have a hierarchical table of regions and sub-regions, and I need to list a tree of regions and sub-regions (which is easy), but I also need a column that displays, for each region, all the ids of its sub-regions.
For example:
id name superiorId
-------------------------------
1 RJ NULL
2 Tijuca 1
3 Leblon 1
4 Gavea 2
5 Humaita 2
6 Barra 4
I need the result to be something like:
id name superiorId sub-regions
-----------------------------------------
1 RJ NULL 2,3,4,5,6
2 Tijuca 1 4,5,6
3 Leblon 1 null
4 Gavea 2 6
5 Humaita 2 null
6 Barra 4 null
I have done that by creating a function that builds the list with STUFF() for a region row, but when I select all regions of a country, for example, the query becomes really, really slow, since the function is executed once per region to collect its sub-regions.
Does anybody know how to get that in an optimized way?
The function returns all the sub-regions' ids as a single string, separated by commas:
CREATE FUNCTION getSubRegions (@RegionId int)
RETURNS TABLE
AS
RETURN(
    -- Builds a comma-separated id list; STUFF(..., 1, 1, '') strips the leading comma.
    -- Note: only children and grandchildren of @RegionId are covered, not deeper levels.
    select stuff((SELECT ',' + CAST(wine_reg.wine_reg_id as varchar)
        from (select wine_reg_id
                   , wine_reg_name
                   , wine_region_superior
              from wine_region as t1
              where wine_region_superior = @RegionId
                 or exists
                    ( select *
                      from wine_region as t2
                      where t2.wine_reg_id = t1.wine_region_superior
                        and t2.wine_region_superior = @RegionId
                    ) ) wine_reg
        ORDER BY wine_reg.wine_reg_name ASC for XML path('')), 1, 1, '') as Sons)
GO
When we used to build these concatenated lists in the database, we took an approach similar to yours at first. Then, when we needed more speed, we turned them into CLR functions (http://msdn.microsoft.com/en-US/library/a8s4s5dz(v=VS.90).aspx). Now our database is only responsible for storing and retrieving data, and this sort of thing lives in the data layer of our application.
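A possible middle ground between a per-row function and doing everything in the application is to let the database return every (ancestor, descendant) pair in one recursive query and only do the string joining in the data layer. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module, a hypothetical `region` table with the columns from the example (not the real `wine_region` schema), and WITH RECURSIVE syntax (SQL Server spells the same thing without the RECURSIVE keyword).

```python
import sqlite3
from collections import defaultdict

# In-memory copy of the example regions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT, superiorId INTEGER);
    INSERT INTO region VALUES
        (1, 'RJ', NULL), (2, 'Tijuca', 1), (3, 'Leblon', 1),
        (4, 'Gavea', 2), (5, 'Humaita', 2), (6, 'Barra', 4);
""")

# Anchor member: every direct parent/child pair.
# Recursive member: extend each pair by one more level of children.
PAIRS_SQL = """
    WITH RECURSIVE descendants(ancestor, descendant) AS (
        SELECT superiorId, id FROM region WHERE superiorId IS NOT NULL
        UNION ALL
        SELECT d.ancestor, r.id
        FROM descendants AS d
        JOIN region AS r ON r.superiorId = d.descendant
    )
    SELECT ancestor, descendant FROM descendants
    ORDER BY ancestor, descendant
"""

def regions_with_sub_regions():
    """Return (id, name, superiorId, 'id,id,...' or None) for every region."""
    subs = defaultdict(list)
    for ancestor, descendant in conn.execute(PAIRS_SQL):
        subs[ancestor].append(str(descendant))
    rows = conn.execute("SELECT id, name, superiorId FROM region ORDER BY id")
    return [(rid, name, sup, ",".join(subs[rid]) or None)
            for rid, name, sup in rows]

for row in regions_with_sub_regions():
    print(row)
```

The whole tree is resolved with two statements regardless of how many regions there are, instead of one function call per region.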

How to fetch categories and sub-categories in a single query in sql? (mysql)

I would like to know if it's possible to extract the categories and sub-categories in a single DB fetch.
My DB table is something similar to that shown below
table
cat_id parent_id
1 0
2 1
3 2
4 3
5 3
6 1
i.e. when the input is 3, then all the rows with parent_id as 3 AND the row 3 itself AND all the parents of row 3 should be fetched.
output
cat_id parent_id
3 2 -> The row 3 itself
4 3 -> Row with parent as 3
5 3 -> Row with parent as 3
2 1 -> 2 is the parent of row 3
1 0 -> 1 is the parent of row 2
Can this be done using stored procedures and loops? If so, will it be a single DB fetch or multiple? Or are there any other better methods?
Thanks!!!
If you are asking whether MySQL has recursive queries, the answer is no (at least before MySQL 8.0, which added recursive common table expressions).
But there is a very good approach to handle it.
Create a helper table (say, CatHierarchy):
CatHierarchy:
SuperId, ChildId, Distance
------------------------------
1 1 0
1 2 1
2 2 0
This redundant data makes it easy to select any hierarchy in 1 query and to maintain any hierarchy with 2 inserts (deletion is also done in 1 query with the help of ON DELETE CASCADE integrity constraints).
So what does this mean? You track every path in the hierarchy. Each Category node must add a reference to itself (distance 0), plus one redundant row for every ancestor it is linked to, recording the distance between them.
To select a category with its sub-categories, just write:
SELECT c.* from Category c inner join CatHierarchy ch ON ch.ChildId=c.cat_id
WHERE ch.SuperId = :someSpecifiedRootOfCat
someSpecifiedRootOfCat is a parameter that specifies the root of the category tree to fetch.
That's all!
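To make the helper-table idea concrete, here is a small sketch of how CatHierarchy could be filled with two inserts per category and then queried in both directions. It is an illustration only: it assumes Python's sqlite3 module instead of MySQL, assumes parents are inserted before their children, and the helper names `add_to_hierarchy`, `subtree` and `ancestors` are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (cat_id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO Category VALUES (1, 0), (2, 1), (3, 2), (4, 3), (5, 3), (6, 1);
    CREATE TABLE CatHierarchy (SuperId INTEGER, ChildId INTEGER, Distance INTEGER,
                               PRIMARY KEY (SuperId, ChildId));
""")

def add_to_hierarchy(cat_id, parent_id):
    """Register one category with two inserts (its parent must already be registered)."""
    # Insert 1: the node references itself at distance 0.
    conn.execute("INSERT INTO CatHierarchy VALUES (?, ?, 0)", (cat_id, cat_id))
    # Insert 2: copy every path that ends at the parent, one step longer.
    conn.execute(
        "INSERT INTO CatHierarchy (SuperId, ChildId, Distance) "
        "SELECT SuperId, ?, Distance + 1 FROM CatHierarchy WHERE ChildId = ?",
        (cat_id, parent_id),
    )

# Parents have smaller ids in this sample, so id order inserts them first.
for cat_id, parent_id in conn.execute(
        "SELECT cat_id, parent_id FROM Category ORDER BY cat_id").fetchall():
    add_to_hierarchy(cat_id, parent_id)

def subtree(root):
    """The root itself plus all of its descendants, nearest first."""
    return conn.execute(
        "SELECT c.cat_id, c.parent_id FROM Category c "
        "JOIN CatHierarchy ch ON ch.ChildId = c.cat_id "
        "WHERE ch.SuperId = ? ORDER BY ch.Distance, c.cat_id", (root,)).fetchall()

def ancestors(node):
    """All ancestors of the node, nearest first."""
    return conn.execute(
        "SELECT c.cat_id, c.parent_id FROM Category c "
        "JOIN CatHierarchy ch ON ch.SuperId = c.cat_id "
        "WHERE ch.ChildId = ? AND ch.Distance > 0 ORDER BY ch.Distance", (node,)).fetchall()

print(subtree(3) + ancestors(3))
```

Both lookups are plain joins, which is the trade the helper table makes: extra rows on write in exchange for non-recursive reads.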
There's a really good article about this on Sitepoint; look especially at Modified Preorder Tree Traversal.
It's tricky. I assume you want to display categories, kind of like a folder view? Three fields: MainID, ParentID, Name... Apply to your table, and it should work like a charm. I think it's called a recursive query?
WITH CATEGORYVIEW (catid, parentid, categoryname) AS
(
SELECT catid, ParentID, cast(categoryname as varchar(255))
FROM [CATEGORIES]
WHERE isnull(ParentID,0) = 0
UNION ALL
SELECT C.catid, C.ParentID, cast(CATEGORYVIEW.categoryname+'/'+C.categoryname as varchar(255))
FROM [CATEGORIES] C
JOIN CATEGORYVIEW ON CATEGORYVIEW.catID = C.ParentID
)
SELECT * FROM CATEGORYVIEW ORDER BY CATEGORYNAME
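The recursive query above is written in SQL Server syntax and builds paths from the roots down. A recursive query can also return exactly the rows the question lists (the row itself, its direct children, and all of its ancestors) in a single fetch. The sketch below is an assumption-based illustration using Python's sqlite3 module and WITH RECURSIVE, which MySQL also accepts from version 8.0; the `category` table name and the `fetch_family` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (cat_id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO category VALUES (1, 0), (2, 1), (3, 2), (4, 3), (5, 3), (6, 1);
""")

# 'ancestors' starts at the requested row (depth 0) and climbs to its parents.
# The outer query stitches together: the row itself, its direct children,
# and the ancestors, in that order.
FAMILY_SQL = """
    WITH RECURSIVE ancestors(cat_id, parent_id, depth) AS (
        SELECT cat_id, parent_id, 0 FROM category WHERE cat_id = :cat
        UNION ALL
        SELECT c.cat_id, c.parent_id, a.depth + 1
        FROM category AS c
        JOIN ancestors AS a ON c.cat_id = a.parent_id
    )
    SELECT cat_id, parent_id FROM (
        SELECT cat_id, parent_id, 0 AS grp, depth AS ord
        FROM ancestors WHERE depth = 0
        UNION ALL
        SELECT cat_id, parent_id, 1, cat_id
        FROM category WHERE parent_id = :cat
        UNION ALL
        SELECT cat_id, parent_id, 2, depth
        FROM ancestors WHERE depth > 0
    )
    ORDER BY grp, ord
"""

def fetch_family(cat_id):
    """Row itself, then direct children, then ancestors (nearest first)."""
    return conn.execute(FAMILY_SQL, {"cat": cat_id}).fetchall()

print(fetch_family(3))
```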