SQL equivalent of the MongoDB "$all" operator on a repeated string

Suppose the following Data structure:
MongoDB: {id: ObjectId, colors: String[]}
SQL: Column ID (Integer), Column COLORS (Repeated String)
Suppose the following MongoDB query:
collection.find({colors: {$all: ["blue", "orange", "yellow"]} })
What is the equivalent operator/notation for "$all" in SQL? Note that, unlike $in, $all matches documents (rows) whose field contains ALL of the values, not just some of them.

Assuming there are no duplicates in the repeated values, you can use:
select s.*
from sql s
where (select count(*)
from unnest(s.colors) color
where color in ('blue', 'orange', 'yellow')
) = 3;
The "3" is the size of the list. If there are duplicates, then use count(distinct color) instead.
If you don't want to "remember" 3, you can use:
with color_list as (
select color
from unnest(array['blue', 'orange', 'yellow']) color
)
select s.*
from sql s
where (select count(*)
from unnest(s.colors) color join
color_list cl
using (color)
) = (select count(*) from color_list);
Or even:
select s.*
from sql s
where not exists (select 1
from unnest(array['blue', 'orange', 'yellow']) my_color left join
unnest(s.colors) color
on my_color = color
where color is null
);
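The anti-join in the last query can be tried outside an engine that supports unnest. The following is a minimal sketch, not the answer's own code: it assumes a normalized child table in place of the repeated column, the names item_colors and wanted are hypothetical, and it runs in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized layout standing in for the repeated COLORS column.
    CREATE TABLE item_colors (item_id INTEGER, color TEXT);
    INSERT INTO item_colors VALUES
        (1, 'blue'), (1, 'orange'), (1, 'yellow'), (1, 'black'),
        (2, 'blue'), (2, 'pink'), (2, 'yellow'),
        (3, 'orange'), (3, 'orange'), (3, 'blue');
    -- The values that $all must find.
    CREATE TABLE wanted (color TEXT);
    INSERT INTO wanted VALUES ('blue'), ('orange'), ('yellow');
""")

# Keep an item only if no wanted color is left without a match for that item.
ALL_MATCH_SQL = """
    SELECT DISTINCT s.item_id
    FROM item_colors s
    WHERE NOT EXISTS (
        SELECT 1
        FROM wanted w
        LEFT JOIN item_colors c
               ON c.color = w.color AND c.item_id = s.item_id
        WHERE c.color IS NULL
    )
    ORDER BY s.item_id
"""

matching_ids = [row[0] for row in conn.execute(ALL_MATCH_SQL)]
print(matching_ids)
```

Item 3 has a duplicated 'orange' but no 'yellow', so it is excluded; this form does not depend on counting, which is why duplicates do not affect it.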

Below is for BigQuery Standard SQL
#standardSQL
create temp function check_all(arr ANY TYPE, match ANY TYPE) as (
array_length(array(
select distinct m from unnest(match) m
join unnest(arr) m using(m)
)) = array_length(array(
select distinct m from unnest(match) m
))
);
select *
from `project.dataset.table`
where check_all(colors, ['blue', 'orange', 'yellow'])
Applied to the dummy sample data below:
with `project.dataset.table` as (
select 1 id, ['blue', 'orange', 'yellow', 'black'] colors union all
select 2, ['blue', 'pink', 'yellow', 'green'] union all
select 3, ['red', 'orange', 'blue', 'pink', 'yellow', 'green']
)
the output is the rows with id 1 and 3, since only their colors arrays contain all three values.

Related

PostgreSQL: Select unique rows where distinct values are in list

Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that have only the tag A.
I can get close by doing:
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
However, this includes every name that has exactly one distinct tag, regardless of which tag it is.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note that max could just as well be min. SQL simply has no single() aggregate function, but that's a different story.)
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct name from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.
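To compare the three suggestions side by side, here is a small sketch that loads the question's sample rows into an in-memory SQLite database (an assumption; the question targets PostgreSQL, but these queries use only portable syntax) and runs each variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data ("name" TEXT, "tag" TEXT, "count" INTEGER);
    INSERT INTO data VALUES
        ('John', 'A', 10), ('John', 'B', 20), ('Jane', 'A', 30),
        ('Judith', 'A', 40), ('Judith', 'B', 50), ('Judith', 'C', 60),
        ('Jason', 'D', 70);
""")

QUERIES = {
    # Exactly one distinct tag, and that tag is 'A'.
    "having": """
        SELECT "name" FROM data
        GROUP BY "name"
        HAVING COUNT(DISTINCT "tag") = 1 AND MAX("tag") = 'A'
    """,
    # Has tag 'A' and no row with a different tag.
    "not_exists": """
        SELECT DISTINCT "name" FROM data d
        WHERE "tag" = 'A'
          AND NOT EXISTS (
              SELECT 1 FROM data d2
              WHERE d2."name" = d."name" AND d2."tag" != d."tag"
          )
    """,
    # Exclude every name that appears with any tag other than 'A'.
    "not_in": """
        SELECT DISTINCT "name" FROM data
        WHERE "name" NOT IN (SELECT "name" FROM data WHERE "tag" != 'A')
    """,
}

results = {
    label: sorted(row[0] for row in conn.execute(sql))
    for label, sql in QUERIES.items()
}
for label, names in results.items():
    print(label, names)
```

On this sample all three variants return only Jane. Which one is fastest depends on the engine, the indexes, and the data, so it is worth checking the query plan on real data.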

Django many to one reverse query on the same field

I have a one to many relationship between two tables ...
Item - id, title
ItemAttribute - id, item_id, attribute_code(string, indexed)
where attribute_code can have values for colors, sizes, qualities, dimensions, etc,
- like codes for 'Blue', 'Black', 'White', 'S', 'L', 'XL', '250g', '400g', etc
Question: How do I query for all Items that are either ('Blue' OR 'White') AND 'XL'?
I prefer using Django's ORM, but if someone could even help out with raw SQL it would do just fine. Thanks.
This subquery:
select item_id
from item_attribute
group by item_id
having sum(attribute_code in ('Blue', 'White')) > 0 and sum(attribute_code = 'XL') > 0
returns all the item_ids that you want.
So you can join it to the table item:
select i.*
from item i inner join (
select item_id
from item_attribute
group by item_id
having sum(attribute_code in ('Blue', 'White')) > 0 and sum(attribute_code = 'XL') > 0
) a on a.item_id = i.id
or use the operator IN:
select *
from item
where id in (
select item_id
from item_attribute
group by item_id
having sum(attribute_code in ('Blue', 'White')) > 0 and sum(attribute_code = 'XL') > 0
)
If you have really large data sets you could also try EXISTS:
select i.*
from item i
where
exists (select 1 from item_attribute where item_id = i.id and attribute_code in ('Blue', 'White'))
and
exists (select 1 from item_attribute where item_id = i.id and attribute_code = 'XL')
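The grouped and EXISTS variants can be checked against each other on a few rows. This sketch uses SQLite through Python's sqlite3 module; the sample items and titles are made up for illustration. Summing a comparison works here because SQLite (like MySQL) evaluates it to 0 or 1; PostgreSQL has no sum over booleans, so there the condition would need a cast or a different aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item_attribute (
        id INTEGER PRIMARY KEY, item_id INTEGER, attribute_code TEXT
    );
    -- Made-up sample rows.
    INSERT INTO item VALUES (1, 'shirt'), (2, 'cap'), (3, 'coat'), (4, 'hoodie');
    INSERT INTO item_attribute (item_id, attribute_code) VALUES
        (1, 'Blue'), (1, 'XL'),
        (2, 'White'), (2, 'S'),
        (3, 'Black'), (3, 'XL'),
        (4, 'White'), (4, 'XL'), (4, '400g');
""")

GROUPED_SQL = """
    SELECT id FROM item
    WHERE id IN (
        SELECT item_id FROM item_attribute
        GROUP BY item_id
        HAVING SUM(attribute_code IN ('Blue', 'White')) > 0
           AND SUM(attribute_code = 'XL') > 0
    )
    ORDER BY id
"""

EXISTS_SQL = """
    SELECT i.id FROM item i
    WHERE EXISTS (SELECT 1 FROM item_attribute
                  WHERE item_id = i.id AND attribute_code IN ('Blue', 'White'))
      AND EXISTS (SELECT 1 FROM item_attribute
                  WHERE item_id = i.id AND attribute_code = 'XL')
    ORDER BY i.id
"""

grouped_ids = [row[0] for row in conn.execute(GROUPED_SQL)]
exists_ids = [row[0] for row in conn.execute(EXISTS_SQL)]
print(grouped_ids, exists_ids)
```

Items 1 and 4 qualify: each has 'XL' plus one of the two colours. Item 3 has 'XL' but only 'Black', and item 2 has 'White' but not 'XL'.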

How do I change the values in the array of struct in BQ

I don't understand how to update the colour "yellow" to "blue" and its id to 5. A solution to this example would help me understand. Thanks.
--create or replace table mydataset.struct_4 (boxes string,colors array<struct<colour string,id int64>>)
--insert into mydataset.struct_4 (boxes,colors) values("box_1",[("brown",1),("green",3),("white",7)]),("box_2",[("yellow",2),("white",4)])
select * from `mydataset.struct_4`
#standardSQL
SELECT
boxes,
ARRAY(
SELECT AS STRUCT
IF(colour = 'yellow', 'blue', colour) colour,
IF(colour = 'yellow', 5, id) id
FROM UNNEST(colors)
) colors
FROM `mydataset.struct_4`
Variation of above would be
#standardSQL
SELECT
boxes,
ARRAY(
SELECT IF(colour = 'yellow',
STRUCT<colour STRING,id INT64>('blue', 5),
STRUCT(colour, id))
FROM UNNEST(colors)
) colors
FROM `mydataset.struct_4`
with the same output of course
Row  boxes  colors.colour  colors.id
1    box_1  brown          1
            green          3
            white          7
2    box_2  blue           5
            white          4
Update, in response to the follow-up "but how do I update the colour "yellow" to "blue" and its id to 5 using an UPDATE DML statement?" :)
#standardSQL
UPDATE `mydataset.struct_4` SET colors = ARRAY(
SELECT IF(colour = 'yellow',
STRUCT<colour STRING,id INT64>('blue', 5),
STRUCT(colour, id))
FROM UNNEST(colors)
) WHERE TRUE
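All three statements follow the same idea: the array is not modified in place but rebuilt element by element, swapping in a new struct wherever the condition holds. As an illustration only, here is that per-element rebuild in plain Python, with dicts standing in for structs; the helper name recolor is hypothetical.

```python
def recolor(colors, old="yellow", new=("blue", 5)):
    """Return a new list in which every struct whose colour is `old` is replaced."""
    new_colour, new_id = new
    return [
        {"colour": new_colour, "id": new_id} if c["colour"] == old else dict(c)
        for c in colors
    ]


# Same rows as the INSERT statement in the question.
table = [
    {"boxes": "box_1", "colors": [
        {"colour": "brown", "id": 1},
        {"colour": "green", "id": 3},
        {"colour": "white", "id": 7},
    ]},
    {"boxes": "box_2", "colors": [
        {"colour": "yellow", "id": 2},
        {"colour": "white", "id": 4},
    ]},
]

# Equivalent of "SET colors = ARRAY(SELECT ... FROM UNNEST(colors)) WHERE TRUE".
updated = [{"boxes": row["boxes"], "colors": recolor(row["colors"])} for row in table]
for row in updated:
    print(row)
```

Rows without a matching element come back unchanged, which mirrors why WHERE TRUE is harmless in the UPDATE statement.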

How do I find elements in an array in BigQuery

I am trying to search for a row that has certain key value pairs in an array. A row in my BigQuery table would look something like this.
{
  "ip": "192.168.1.1",
  "cookie": [
    {
      "key": "apple",
      "value": "red"
    },
    {
      "key": "orange",
      "value": "orange"
    },
    {
      "key": "grape",
      "value": "purple"
    }
  ]
}
I thought about using implicit UNNEST or CROSS JOIN like the following, but it didn't work because unnesting it would just create multiple different rows.
SELECT ip
FROM table t, t.cookie c
WHERE (c.key = "grape" AND c.value ="purple") AND (c.key = "orange" AND c.value ="orange")
This link is really close to what I want to do, except they are using legacy SQL and not standardSQL
#standardSQL
SELECT ip
FROM yourTable
WHERE (
SELECT COUNT(1)
FROM UNNEST(cookie) AS pair
WHERE pair IN (('grape', 'purple'), ('orange', 'orange'))
) >= 2
you can test it with below dummy data
#standardSQL
WITH yourTable AS (
SELECT '192.168.1.1' AS ip, [('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
SELECT '192.168.1.2', [('abc', 'xyz')]
)
SELECT ip
FROM yourTable
WHERE (
SELECT COUNT(1)
FROM UNNEST(cookie) AS pair
WHERE pair IN (('grape', 'purple'), ('orange', 'orange'))
) >= 2
If you need the ip returned when at least one pair is in the array, change >= 2 to >= 1 in the WHERE clause.
Mikhail's solution is good if it is guaranteed that there are no duplicate pairs in the cookie array. But if there could be duplicates, here is the alternative solution:
#standardSQL
WITH yourTable AS (
SELECT
'192.168.1.1' AS ip,
[('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
SELECT
'192.168.1.2',
[('abc', 'xyz'), ('orange', 'orange'), ('orange', 'orange')]
)
SELECT ip
FROM yourTable t
WHERE (
('grape', 'purple') IN UNNEST(t.cookie) AND
('orange', 'orange') IN UNNEST(t.cookie) )
Results in only
ip
-----------
192.168.1.1
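The difference between the question's attempt and the two answers can be seen in a small plain-Python model, with tuples standing in for the key/value structs. This is only an illustration of the logic, not BigQuery code; the function names are hypothetical and the data is the sample from the second answer.

```python
rows = [
    {"ip": "192.168.1.1",
     "cookie": [("apple", "red"), ("orange", "orange"), ("grape", "purple")]},
    {"ip": "192.168.1.2",
     "cookie": [("abc", "xyz"), ("orange", "orange"), ("orange", "orange")]},
]
wanted = [("grape", "purple"), ("orange", "orange")]


def cross_join_filter(rows):
    # Question's attempt: one flattened row per pair, so a single pair
    # would have to equal both wanted pairs at once, which never happens.
    return [r["ip"] for r in rows for pair in r["cookie"]
            if pair == wanted[0] and pair == wanted[1]]


def count_filter(rows):
    # First answer: count matching pairs per row; duplicates inflate the count.
    return [r["ip"] for r in rows
            if sum(1 for pair in r["cookie"] if pair in wanted) >= len(wanted)]


def membership_filter(rows):
    # Second answer: each wanted pair must be present, however often it repeats.
    return [r["ip"] for r in rows if all(w in r["cookie"] for w in wanted)]


print(cross_join_filter(rows))
print(count_filter(rows))
print(membership_filter(rows))
```

With the duplicated ('orange', 'orange') pair, the count-based filter also returns 192.168.1.2, while the membership-based filter returns only 192.168.1.1.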

Make a CTE from the result of other CTEs

I have several joined CTE. Something like:
;With CT1 AS(SELECT ..)
, CT2 AS(select)
SELECT T1.*, T2.* FROM CT1 T1 INNER JOIN CT2 T2 ON (some condition) GROUP BY (F1, F2, etc)
Now I need to join the result of this query to another CTE. What’s the best way? Can I make a CTE with the result of this Query? Any help would be greatly appreciated.
You can keep creating new CTEs based on previously defined ones. They may be joined or otherwise combined, subject to the rules for CTEs.
; with
ArabicRomanConversions as (
select *
from ( values
( 0, '', '', '', '' ), ( 1, 'I', 'X', 'C', 'M' ), ( 2, 'II', 'XX', 'CC', 'MM' ), ( 3, 'III', 'XXX', 'CCC', 'MMM' ), ( 4, 'IV', 'XL', 'CD', '?' ),
( 5, 'V', 'L', 'D', '?' ), ( 6, 'VI', 'LX', 'DC', '?' ), ( 7, 'VII', 'LXX', 'DCC', '?' ), ( 8, 'VIII', 'LXXX', 'DCCC', '?' ), ( 9, 'IX', 'XC', 'CM', '?' )
) as Placeholder ( Arabic, Ones, Tens, Hundreds, Thousands )
),
Numbers as (
select 1 as Number
union all
select Number + 1
from Numbers
where Number < 3999 ),
ArabicAndRoman as (
select Number as Arabic,
( select Thousands from ArabicRomanConversions where Arabic = Number / 1000 ) +
( select Hundreds from ArabicRomanConversions where Arabic = Number / 100 % 10 ) +
( select Tens from ArabicRomanConversions where Arabic = Number / 10 % 10 ) +
( select Ones from ArabicRomanConversions where Arabic = Number % 10 ) as Roman
from Numbers ),
Squares as (
select L.Arabic, L.Roman, R.Arabic as Square, R.Roman as RomanSquare
from ArabicAndRoman as L inner join
ArabicAndRoman as R on R.Arabic = L.Arabic * L.Arabic
where L.Arabic < 16 ),
Cubes as (
select S.Arabic, S.Roman, S.Square, S.RomanSquare, A.Arabic as Cube, A.Roman as RomanCube
from Squares as S inner join
ArabicAndRoman as A on A.Arabic = S.Square * S.Arabic )
select *
from Cubes
order by Arabic
option ( MaxRecursion 3998 )
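Reduced to the shape of the original question, the same chaining looks like the sketch below: the joined and grouped query becomes one more CTE, which is then joined to a further CTE. It runs in SQLite through Python's sqlite3 module rather than SQL Server, so the leading semicolon and the OPTION clause do not apply, and the names ct1, ct2, ct3, and joined as well as the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

CHAINED_SQL = """
    WITH ct1 AS (
        SELECT 1 AS id, 'a' AS f1 UNION ALL SELECT 2, 'b'
    ),
    ct2 AS (
        SELECT 1 AS id, 10 AS amount UNION ALL SELECT 1, 5 UNION ALL SELECT 2, 20
    ),
    -- The former final query, now a CTE built from ct1 and ct2.
    joined AS (
        SELECT t1.id, t1.f1, SUM(t2.amount) AS total
        FROM ct1 t1
        INNER JOIN ct2 t2 ON t2.id = t1.id
        GROUP BY t1.id, t1.f1
    ),
    ct3 AS (
        SELECT 1 AS id, 'x' AS label UNION ALL SELECT 2, 'y'
    )
    -- Join the result of the earlier CTEs to another CTE.
    SELECT j.id, j.f1, j.total, c.label
    FROM joined j
    INNER JOIN ct3 c ON c.id = j.id
    ORDER BY j.id
"""

chained_rows = conn.execute(CHAINED_SQL).fetchall()
print(chained_rows)
```

A CTE can reference any CTE defined before it in the same WITH clause, so the order of the definitions matters.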
This is a pattern I have used a few times: a temp table buffers the output of one complex CTE, and a second CTE then reads it back from the temp table.
It is useful if you need two result sets, or if the complete CTE written as one massive statement is slow (breaking it up can be a huge performance improvement in some cases).
-- I do this "DROP" because in some cases where query is executed over and
-- over sometimes the object is not cleared before next transaction.
BEGIN TRY DROP TABLE #T_A END TRY BEGIN CATCH END CATCH;
WITH A AS (
SELECT 'A' AS Name, 1 as Value
UNION ALL SELECT 'B', 2
)
SELECT *
INTO #T_A
FROM A;
SELECT *
FROM #T_A ; -- Generate First Output Table
WITH B AS (
SELECT 'A' AS Name, 234 as Other
UNION ALL SELECT 'B', 456
)
-- Generate second result set from Temp table.
SELECT B.*, A.Value
FROM B JOIN #T_A AS A ON A.Name=B.Name
This produces a result set with two tables, which is also handy in .NET when filling a DataSet.