In Google BigQuery, what is the benefit of ARRAY of STRUCT vs. STRUCT of ARRAY?

I'm just getting to grips with ARRAYs and STRUCTs in BigQuery, and I'm wondering why one would choose this format:
SELECT [STRUCT('Alice' AS col_1, 'Bob' AS col_2),STRUCT('Charlie' AS col_1, 'David' AS col_2)] AS names;
over this one:
SELECT STRUCT(['Alice','Charlie'] AS col_1, ['Bob','David'] AS col_2) AS names;
[Screenshot: output 1]
For both, the output looks the same. What would be an example of when you would use one over the other? To me, the first example makes more sense, because I'd want Alice and Bob to be in the same record, and that's clearer in the first example. However, I've seen Google's Vertex AI prediction output use the second format. For example, binary predictions are output as:
SELECT STRUCT([0, 1] AS class, [0.8, 0.2] AS score) AS prediction;
instead of
SELECT [STRUCT(0 AS class, 0.8 AS score), STRUCT(1 AS class, 0.2 AS score)] AS prediction;
[Screenshot: output 2]
When is the right time to use each?

It should be based on what a row/entity represents.
In your first example, each element of the array could be a pair or team, and you can add as many as you want: Alice and Bob are pair 1, Charlie and David are pair 2, and so on.
The schema would be:
names (repeated field - struct)
- col_1 (string)
- col_2 (string)
In your second example, you have a single entity with two kinds of names. For example, col_1 could be a list of names you like and col_2 a list of names you don't like: Alice and Charlie are liked, Bob and David are disliked, and you can add as many names as you want to either list independently.
The schema would be:
names.col_1 (repeated field - string)
names.col_2 (repeated field - string)
In JSON format:
Example 1: [{"col_1":"Alice", "col_2":"Bob"}, {"col_1":"Charlie", "col_2":"David"}]
Example 2: {"col_1":["Alice", "Charlie"], "col_2":["Bob", "David"]}
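To make the difference concrete, here is a small Python sketch that uses plain lists and dicts as stand-ins for the BigQuery types: a list of dicts plays the role of ARRAY<STRUCT>, and a dict of lists plays the role of STRUCT<ARRAY>. The helper names are illustrative only, not a BigQuery API. Converting between the two shapes only works when the arrays have the same length, because the STRUCT of ARRAY form pairs values purely by position and nothing enforces that alignment.

```python
from typing import Dict, List

# ARRAY<STRUCT<col_1 STRING, col_2 STRING>>: one record per pair.
array_of_struct: List[Dict[str, str]] = [
    {"col_1": "Alice", "col_2": "Bob"},
    {"col_1": "Charlie", "col_2": "David"},
]

# STRUCT<col_1 ARRAY<STRING>, col_2 ARRAY<STRING>>: one independent list per field.
struct_of_array: Dict[str, List[str]] = {
    "col_1": ["Alice", "Charlie"],
    "col_2": ["Bob", "David"],
}


def to_struct_of_array(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Pivot a list of records into one list per field (assumes all records share keys)."""
    fields = records[0].keys() if records else []
    return {f: [r[f] for r in records] for f in fields}


def to_array_of_struct(columns: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Zip per-field lists back into records; only meaningful if the lists are aligned."""
    lengths = {len(v) for v in columns.values()}
    if len(lengths) > 1:
        raise ValueError("arrays differ in length, so there is no row-wise pairing")
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Independent lists (e.g. liked vs. disliked names) need not line up at all.
liked_and_disliked = {"col_1": ["Alice", "Charlie", "Eve"], "col_2": ["Bob"]}
```

The ARRAY of STRUCT form makes each pair a single record, so it cannot drift out of alignment; the STRUCT of ARRAY form suits independent lists like the liked/disliked example, where zipping them back into pairs would be meaningless.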

Related

Columns into JSON array

I have the following table:
Name Pets
---------
John Bird
John Cow
John Dog
Nina Cow
Nina Fish
Nina Cat
I would like to output it like so:
Name Pets
---------
John ["Bird","Cow","Dog"]
Nina ["Cow","Fish","Cat"]
I have this starting point, which converts a single column to JSON:
SELECT JSON_ARRAY(GROUP_CONCAT(column_name SEPARATOR ',')) AS names
FROM table_name;
I'm new to working with arrays and JSON in SQL. Is this possible? What is the best solution?
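Your approach is close: add a GROUP BY clause and swap the aliases. Be aware, though, that JSON_ARRAY(GROUP_CONCAT(pets)) returns a one-element array holding the whole comma-separated string (["Bird,Cow,Dog"]), not ["Bird","Cow","Dog"]. On MySQL 5.7.22 or later (MariaDB also provides JSON_ARRAYAGG in recent versions), JSON_ARRAYAGG adds each value as its own element:
SELECT name, JSON_ARRAYAGG(pets) AS pets
FROM t
GROUP BY name
If you keep GROUP_CONCAT for another purpose, note that ',' is its default separator, so specifying SEPARATOR ',' is redundant.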
P.S. Your DBMS seems to be MySQL (5.7 or later) or MariaDB, since GROUP_CONCAT(... SEPARATOR ...) is MySQL/MariaDB syntax. Please tag the DBMS and version you're using.
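A minimal Python sketch (sqlite3, in memory, with the sample data from the question) of the difference between wrapping a concatenated string in a JSON array and making each value its own element. It groups in Python rather than with a SQL JSON aggregate, so it does not depend on SQLite's JSON functions being compiled in; the table name t follows the answer's query.

```python
import json
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, pets TEXT)")
conn.executemany(
    "INSERT INTO t (name, pets) VALUES (?, ?)",
    [("John", "Bird"), ("John", "Cow"), ("John", "Dog"),
     ("Nina", "Cow"), ("Nina", "Fish"), ("Nina", "Cat")],
)

# Collect each name's pets in insertion order (rowid), one list per name.
grouped = defaultdict(list)
for name, pet in conn.execute("SELECT name, pets FROM t ORDER BY name, rowid"):
    grouped[name].append(pet)

compact = (",", ":")  # match the compact JSON shown in the desired output

# Desired shape (what an aggregate like JSON_ARRAYAGG produces): one element per pet.
per_element = {n: json.dumps(p, separators=compact) for n, p in grouped.items()}

# Pitfall: concatenating first, then wrapping, yields a single-element array.
wrapped_string = {
    n: json.dumps([",".join(p)], separators=compact) for n, p in grouped.items()
}

for name in sorted(grouped):
    print(name, per_element[name], wrapped_string[name])
```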

How can I display a row with a certain value in its string selected?

I'm working in SSRS, and I am trying to figure out how to display a row if a certain value in its string is selected from the multiselect parameter. Let's say we have a table like this:
ID Animals
-------------
1 Cat, dog, bird
2 Dog
3 Dog, pig, Cat
Whenever I choose dog in my parameter, all 3 rows should display. If I select cat, only 2 rows should show. I've used the InStr function to show or hide a column before, but I'm not sure how to use it to select rows. I've tried InStr(Fields!Animals.Value,"Dog"), but it only returns that one row.
Any suggestions or advice would be greatly appreciated!
You can use LIKE and use % on both sides of the word to indicate that you want to find 'Dog' anywhere in the string.
SELECT ID, Animals
FROM your_table
WHERE animals LIKE '%Dog%'
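A quick way to check the LIKE approach against the sample data is an in-memory SQLite table driven from Python. SQLite's LIKE is case-insensitive for ASCII letters by default, which is why 'Dog' also matches 'dog' here; in SQL Server this depends on the column's collation. The animal is passed as a query parameter, roughly as a report parameter would be; the function name ids_containing is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ID INTEGER, Animals TEXT)")
conn.executemany(
    "INSERT INTO your_table (ID, Animals) VALUES (?, ?)",
    [(1, "Cat, dog, bird"), (2, "Dog"), (3, "Dog, pig, Cat")],
)


def ids_containing(animal: str) -> list:
    """Return IDs whose Animals string contains `animal` anywhere (substring match)."""
    rows = conn.execute(
        "SELECT ID FROM your_table WHERE Animals LIKE '%' || ? || '%' ORDER BY ID",
        (animal,),
    )
    return [row[0] for row in rows]


print(ids_containing("Dog"))  # [1, 2, 3]
print(ids_containing("Cat"))  # [1, 3]
```

Because this is a substring match, a value such as "Bobcat" would also match '%cat%'; if that matters, compare against the individual comma-separated names instead.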
Try:
InStr(LCase(Fields!Animals.Value), "dog") > 0
Because InStr compares case-sensitively by default, "Dog" and "dog" don't match, so lower-case the field first. InStr returns the 1-based position of the match (0 when there is none), so compare the result with 0 in a filter or visibility expression.
This works when your Animals field returns a string with the names of multiple animals. For example:
LCase("Dog, pig, Cat") returns "dog, pig, cat", so make sure the string you search for is lower case too.
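The answer's logic can be emulated in Python to see why lower-casing matters. The instr helper below is an illustrative stand-in for VB's InStr (1-based position, 0 when not found, binary and therefore case-sensitive comparison by default); it is not SSRS itself, and the row data is the sample table from the question.

```python
def instr(haystack: str, needle: str) -> int:
    """Mimic VB InStr with binary comparison: 1-based position, or 0 if absent."""
    return haystack.find(needle) + 1


rows = {1: "Cat, dog, bird", 2: "Dog", 3: "Dog, pig, Cat"}

# Case-sensitive search: row 1 spells it "dog", so "Dog" is not found there.
case_sensitive = [i for i, animals in rows.items() if instr(animals, "Dog") > 0]

# Lower-casing the field (as LCase does) and the search string finds every row.
case_insensitive = [
    i for i, animals in rows.items() if instr(animals.lower(), "dog") > 0
]

print(case_sensitive)    # [2, 3]
print(case_insensitive)  # [1, 2, 3]
```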

SQL Server Multiple Likes

I have an unusual question that seems simple but has me stumped in a SQL Server stored procedure.
I have 2 tables as described below.
tblMaster
ID, CommitDate, SubUser, OrigFileName
Sample data
ID CommitDate SubUser OrigFileName
----------------------------------------
1 2014-10-07 Test1 Test1.pdf
2 2014-10-08 Test2 Test2.pdf
3 2014-10-09 Test3 Test3.pdf
The above table is basically the header table that tracks the committed files. In addition to this, we have a details table with the following structure.
tblIndex
ID, FileID (Linking column to the header row described above), Word
Sample data:
ID FileID Word
--------------
1 1 Oil
2 1 oil
3 2 oil
4 2 tank
5 3 tank
The rows above are the words we want to search on; when the criteria match, we return the corresponding file name/header row ID. What I'd like to figure out is this:
- If one word is searched for (e.g. "oil"), the system should return all files that meet the criteria (the easiest case, already solved).
- If more than one word is searched for (e.g. "oil" and "tank"), we should only see the second file, since it is the only one with both oil and tank as keywords.
I tried Word LIKE '%oil%' AND Word LIKE '%tank%', but that returned no rows, since a single value can't be both oil and tank.
I tried Word LIKE '%oil%' OR Word LIKE '%tank%', but that returns files 1, 2, and 3, because OR matches every row containing either word.
One last thing: I recognize I could search for the first term, save the results into a temp table, and then search that table for the second term to get what I'm looking for. The problem is that I don't know in advance how many terms will be searched for, and I don't want a structure where I constantly store data into another temp table if someone searches for 6 keywords.
Any help/ideas will be much appreciated.
Try this, a slight variation on the other answer:
SELECT FileID, COUNT(DISTINCT t.Word) AS MatchedWords
FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(DISTINCT t.Word) > 1
One simple option would be something like this:
SELECT FileID
FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(*) > 1
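This assumes you do not have duplicates in tblIndex. With your sample data, file 1 has both "Oil" and "oil"; under a case-insensitive collation both rows match '%oil%' and would be counted twice.
I'm also not sure you really need LIKE. Based on your sample data you don't: a plain equality comparison (Word = 'oil') would be far more efficient and would avoid accidental matches, such as 'oil' inside 'boiler'.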
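A minimal sketch using Python's sqlite3 with the sample tblIndex data. It builds the search terms into a CTE with one placeholder per term, and keeps only files whose number of distinct matched terms equals the number of distinct search terms, so the same query shape works for 1, 2 or 6 keywords without chaining temp tables. This is SQLite syntax (|| for concatenation, case-insensitive ASCII LIKE); in T-SQL, concatenation uses + and case sensitivity depends on collation, and the terms might instead come from a table variable or table-valued parameter (an assumption about how your procedure receives them, not shown). The function name files_matching_all is hypothetical.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblIndex (ID INTEGER, FileID INTEGER, Word TEXT)")
conn.executemany(
    "INSERT INTO tblIndex (ID, FileID, Word) VALUES (?, ?, ?)",
    [(1, 1, "Oil"), (2, 1, "oil"), (3, 2, "oil"), (4, 2, "tank"), (5, 3, "tank")],
)


def files_matching_all(terms: List[str]) -> List[int]:
    """Return FileIDs whose words match every search term, for any number of terms."""
    if not terms:
        return []
    # One "(?)" placeholder per term; the values are still bound as parameters.
    values = ", ".join("(?)" for _ in terms)
    sql = f"""
        WITH terms(term) AS (VALUES {values})
        SELECT i.FileID
        FROM tblIndex AS i
        JOIN terms AS s ON i.Word LIKE '%' || s.term || '%'
        GROUP BY i.FileID
        -- count matched *terms*, so "Oil" and "oil" in file 1 count once
        HAVING COUNT(DISTINCT s.term) = (SELECT COUNT(DISTINCT term) FROM terms)
        ORDER BY i.FileID
    """
    return [row[0] for row in conn.execute(sql, terms)]


# For comparison: the COUNT(*) > 1 variant. SQLite's LIKE ignores ASCII case,
# so file 1's "Oil" and "oil" rows both match '%oil%' and are counted twice.
naive = [
    row[0]
    for row in conn.execute(
        "SELECT FileID FROM tblIndex t "
        "WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%' "
        "GROUP BY FileID HAVING COUNT(*) > 1 ORDER BY FileID"
    )
]

print(files_matching_all(["oil", "tank"]))  # [2]
print(naive)                                # [1, 2]
```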

Pig Latin: using field in one table as position value to access data in another table

Let's say we have two tables. The first table has the following schema:
animal_count: {zoo_name:chararray, counts:()}
The meaning of the "zoo_name" field is obvious. The "counts" field contains the count of each animal species. To know which species a given field in the "counts" tuple represents, we use another table:
species_position : {species:chararray, position:int}
Let's assume we have the following data in the "species_position" table:
"tiger", 0
"elephant", 1
"lion", 2
This data means the first field in animal_count.counts is the number of tigers in a given zoo, the second field is the number of elephants, and so on. So, to represent the fact that "san diego zoo" has 2 tigers, 4 elephants and no lions, we would have the following data in the "animal_count" table:
"san diego zoo", (2, 4, 0)
Given this setup, how can I write a query to extract the number of a given species in all zoos? I was hoping for something like:
FOREACH species_position GENERATE species, animal_count.counts.$position;
Of course, the "animal_count.counts.$position" won't work.
Is this possible without resorting to UDF?
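For reference, the intended result can be stated in a few lines of Python: pair every species with every zoo and index each counts tuple by that species' position. This only pins down the expected output; it is not a Pig Latin (or UDF-free) solution, and the lists below are assumed stand-ins for the two relations.

```python
from typing import Dict, List, Tuple

# Stand-ins for the two relations in the question.
animal_count: List[Tuple[str, Tuple[int, ...]]] = [("san diego zoo", (2, 4, 0))]
species_position: List[Tuple[str, int]] = [("tiger", 0), ("elephant", 1), ("lion", 2)]


def counts_by_species(
    zoos: List[Tuple[str, Tuple[int, ...]]], positions: List[Tuple[str, int]]
) -> Dict[str, List[Tuple[str, int]]]:
    """For each species, pair every zoo with the count stored at that species' position."""
    return {
        species: [(zoo, counts[pos]) for zoo, counts in zoos]
        for species, pos in positions
    }


print(counts_by_species(animal_count, species_position))
```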

Postgres text search on multiple rows

I have a table called 'exclude' that contains hashtags:
id | tag
---+---------
1  | #oxford
2  | #uk
3  | #england
I have another table called 'post':
id | tags              | text
---+-------------------+----------------------
1  | #oxford #funtimes | Sitting in the sun
2  | #oz               | Beach fun
3  | #england          | Milk was a bad choice
In order to do a text search on the posts tags I've been running a query like follows:
SELECT * FROM post WHERE to_tsvector(tags) @@ plainto_tsquery('mysearchterm')
However, I now want to be able to exclude all posts where some or all of the tags are in my exclude table. Is there any easy way to do this in SQL/Postgres?
I tried converting the exclude tags into a single string and using it in the plainto_tsquery function, but it doesn't work. (I also don't know how to express "does not match" in a text search, so the logic is actually wrong, although I think it's along the right lines.)
select * from post where to_tsvector(tags) @@ plainto_tsquery(
select array_to_string(array(select RTRIM(value) from exclude where key = 'tag'), ' ')
)
What version of PostgreSQL are you on? And how flexible is your schema design? In other words, can you change it at will? Or is this out of your control?
Two things came to mind when I read your question. First, you should be able to use arrays with the @> (contains) or <@ (is contained by) operators.
See the PostgreSQL documentation on array functions and operators.
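Which array operator fits depends on whether "some or all" tags should exclude a post. Python sets make a convenient stand-in for PostgreSQL's array operators here (an analogy, not PostgreSQL itself): <= mirrors <@ (is contained by), >= mirrors @> (contains), and a non-empty intersection mirrors && (overlap). With the sample data, only the overlap test flags post 1, which has one excluded tag (#oxford) and one allowed tag (#funtimes).

```python
# Tags from the question's sample data, as sets (stand-ins for text[] values).
exclude = {"#oxford", "#uk", "#england"}
posts = {
    1: {"#oxford", "#funtimes"},
    2: {"#oz"},
    3: {"#england"},
}

checks = {
    post_id: {
        "contained_by": tags <= exclude,          # like tags <@ exclude: ALL tags excluded
        "contains": tags >= exclude,              # like tags @> exclude: has EVERY excluded tag
        "overlaps": not tags.isdisjoint(exclude), # like tags && exclude: ANY tag excluded
    }
    for post_id, tags in posts.items()
}

for post_id in sorted(checks):
    print(post_id, checks[post_id])
```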
Second, you might be able to use an hstore and do a similar operation with its contains operator:
hstore @> hstore
It's not a true hstore use, because you don't have real key=>value pairs, but I guess you could store tagname=>true or tagname=>NULL. It might be a bit hackish.
See the hstore documentation (for PostgreSQL 9.1) for how to use it.
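As a swapped-in alternative to the array and hstore ideas, here is a sketch in Python's sqlite3 that filters with NOT EXISTS against the exclude table, using the sample data from the question. It treats tags as a space-separated string and pads both sides with spaces so whole tags are compared. SQLite has no to_tsvector, so the search-term condition is omitted; in PostgreSQL you would AND a filter like this with your existing to_tsvector(tags) @@ plainto_tsquery(...) condition, or (untested here) use an array overlap test along the lines of NOT (string_to_array(tags, ' ') && ARRAY(SELECT tag FROM exclude)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE exclude (id INTEGER, tag TEXT);
    CREATE TABLE post (id INTEGER, tags TEXT, text TEXT);
    INSERT INTO exclude VALUES (1, '#oxford'), (2, '#uk'), (3, '#england');
    INSERT INTO post VALUES
        (1, '#oxford #funtimes', 'Sitting in the sun'),
        (2, '#oz', 'Beach fun'),
        (3, '#england', 'Milk was a bad choice');
    """
)

# Keep a post only if none of its space-separated tags appears in `exclude`.
# Padding with spaces means an excluded '#ox' could not match inside '#oxford'.
# Caveat: '%' and '_' inside a tag would act as LIKE wildcards.
KEEP_SQL = """
    SELECT p.id
    FROM post AS p
    WHERE NOT EXISTS (
        SELECT 1
        FROM exclude AS e
        WHERE ' ' || p.tags || ' ' LIKE '% ' || e.tag || ' %'
    )
    ORDER BY p.id
"""
kept = [row[0] for row in conn.execute(KEEP_SQL)]
print(kept)  # [2]
```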