BigQuery - Get fields of nested Repeated Records - sql

I am working with BigQuery tables that can have many levels of nested repeated record fields, as shown in the example.
I need to make a select on the given table, extract only some fields and ignore others, and at the end have the same structure except for the ignored fields.
I think I have to work with array_agg and unnest to get only the required fields from the repeated record fields, but I don't know how to do it.
In the example, I want to keep only DatiRighe and DatiRigheDettaglio as Structs and for each of them I want to keep everything except DatiRighe.Nota and DatiRigheDettaglio.cod_iva.

Try the following query for your requirement:
SELECT
  * REPLACE((
      SELECT AS STRUCT * EXCEPT(Nota)
      FROM (
        SELECT AS STRUCT DatiRighe.* REPLACE((
            SELECT AS STRUCT DatiRighe.DatiRigheDettaglio.* EXCEPT(cod_iva)
          ) AS DatiRigheDettaglio)
      )
    ) AS DatiRighe)
FROM
  `my-project_id.database.table`
In the query, I used EXCEPT to drop the unwanted columns and REPLACE to swap each struct for a new, modified version. Let me know if it helps.
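Note that if DatiRighe is actually a REPEATED record (an array of structs) rather than a single struct, the inner SELECT AS STRUCT has to be wrapped in ARRAY(... FROM UNNEST(...)) so the array is rebuilt element by element. A hedged sketch under that assumption, reusing the field names from the question and a placeholder table name:
SELECT
  * REPLACE(
      ARRAY(
        SELECT AS STRUCT
          r.* EXCEPT(Nota)
              REPLACE((SELECT AS STRUCT r.DatiRigheDettaglio.* EXCEPT(cod_iva)) AS DatiRigheDettaglio)
        FROM UNNEST(DatiRighe) AS r  -- r is one element of the repeated DatiRighe field
      ) AS DatiRighe)
FROM
  `my-project_id.database.table`
It is the same EXCEPT/REPLACE idea as above, just applied per array element.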

Related

Insert data into an impala table with struct type

I have a table with detailed rows (many rows per id). For this reason I have created a table with struct types, in order to reduce the rows to one row per id.
How can I insert data into an Impala table with struct types? Also, how can I aggregate values from a struct type afterwards?
Instead of using a struct to store multiple values, you can use group_concat(col, separator).
For example, if a customer has 3 account numbers and you want to store them in 1 row separated by commas, you can use the code below -
select cust_id, name, group_concat(cust_acc, ',') as concat_account
from cust_details
group by 1, 2
You can use a pipe as the separator if your data contains commas.
Another benefit of the above solution is that you can use split_part(concat_account, ',', 1) to get the first account number.
Now, if your data is very complex and you can't use group_concat, you can use a struct. It's a little tricky because you have to prepare the data in the struct format first and then load it. Please refer to this link - How to insert Array<Struct> values in Impala?
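To make the split_part point concrete, here is a small hedged sketch built on the same example; the table and column names are the ones used above, and the derived-table alias t is mine:
-- Pull the first account number back out of the concatenated column.
select cust_id,
       name,
       split_part(concat_account, ',', 1) as first_account
from (
  select cust_id, name, group_concat(cust_acc, ',') as concat_account
  from cust_details
  group by 1, 2
) t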

Is there a way to express AND in SIMILAR TO ignoring order of matches?

I have a Redshift table column that contains 1 to many hashtags (e.g. #a, #b, etc.). I want to write a query that finds rows where all tags from a given set exist (e.g. #a and #b) while not picking up other rows that have some but not all of the tags (e.g. only #a or only #b).
I can see how to do this with multiple LIKE statements (e.g. LIKE '%#a %' AND LIKE '%#b%') but I would really like to do it with a single statement. I can see how to do this with SIMILAR TO but not in a way that ignores ordering. The following would work but only if I include all possible combinations of ordering.
SELECT * FROM table WHERE field SIMILAR TO '(%#a%)(%#b%)|(%#b%)(%#a%)'
This works but having to list all combinations of the tags I'm looking for would be a royal pain and prone to error. Is there a way to express 'AND' in SIMILAR TO (or another function) in Redshift that ignores order?
Make sure to capture the whole tag in any position and not match on incomplete tags:
SELECT *
FROM table
WHERE (field LIKE '%#a#%' OR field LIKE '%#a') AND
      (field LIKE '%#b#%' OR field LIKE '%#b')
This avoids matching data such as #ac#b
Use AND and LIKE:
SELECT t.*
FROM table t
WHERE field LIKE '%#a%' AND
field LIKE '%#b%';
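If you would rather keep one predicate per tag with the tag boundary made explicit, Redshift's REGEXP_COUNT can express "the tag followed by another # or by the end of the string". A hedged sketch, assuming the tags are concatenated as in the first answer (table and column names are the placeholders used above):
SELECT *
FROM table
WHERE regexp_count(field, '#a(#|$)') > 0   -- #a followed by another tag or at the end of the string
  AND regexp_count(field, '#b(#|$)') > 0;  -- same check for #b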

Creating a calculated field table based on data in separate tables

It is straightforward to create a calculated field in a table that uses data IN the table, since the expression builder is easy to use. However, it appears to me that the expression builder for a calculated field only works with data IN the table;
i.e. the expression builder in table MYTABLE works with fields FIELD1.MYTABLE, FIELD2.MYTABLE, etc.
Inventory Problem
My problem is that I have two 'count' fields that result from my queries INPUTQUERY and OUTPUTQUERY (one gives me a count of all input data added, the other a count of all output data added), and now I want to subtract the two to get a stock figure.
I can't link the table that was created from my query because it won't be able to continually update due to the relationship itself, and thus I'm stuck using the expression builder/SQL.
First question:
Is it possible to have the expression builder reference data from other tables?
i.e. the expression builder for:
MAINTABLE CALCULATEDFIELD.MAINTABLE = INPUTSUM.INPUTTABLE - OUTPUTSUM.OUTPUTTABLE
(which gives a difference of the two)?
Second question:
if the above isn't possible, can I do this through SQL code?
i.e
SELECT(data from INPUTSUM)
FROM(INPUTTABLE)
-
SELECT(data from OUTPUTSUM)
FROM(OUTPUTTABLE)
Try this:
SELECT SUM(T.INPUTSUM) - SUM(T.OUTPUTSUM) AS RESULTSUM
FROM
(
  SELECT INPUTSUM, 0 AS OUTPUTSUM
  FROM INPUTTABLE
  UNION ALL
  SELECT 0 AS INPUTSUM, OUTPUTSUM
  FROM OUTPUTTABLE
) AS T
Note the UNION ALL: a plain UNION would collapse duplicate rows before the SUM and understate the totals.
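If you need the stock per item rather than one grand total, the same UNION ALL pattern can be grouped by an item key. A hedged sketch: the ITEM_ID column is an assumption, the other names follow the answer above.
SELECT T.ITEM_ID, SUM(T.INPUTSUM) - SUM(T.OUTPUTSUM) AS STOCK
FROM
(
  SELECT ITEM_ID, INPUTSUM, 0 AS OUTPUTSUM FROM INPUTTABLE
  UNION ALL
  SELECT ITEM_ID, 0 AS INPUTSUM, OUTPUTSUM FROM OUTPUTTABLE
) AS T
GROUP BY T.ITEM_ID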

SqlQuery and SqlFieldsQuery

It looks like SqlQuery only supports SQL that starts with select *? Doesn't it support other SQL that only selects some columns, like
select id, name from person, and map the columns to the corresponding POJO?
If I use SqlFieldsQuery to run SQL, the result is a QueryCursor of Lists (each List contains one record of the result). But if the SQL starts with select *, then this List's contents differ from those of a fields query like:
select id, name, age from person
For select *, each List is constructed from 3 parts:
the first element is the key of the cache
the second element is the POJO object that contains the data
the trailing elements are the values for each column.
Why was it designed this way? If I don't know what SQL SqlFieldsQuery is running, then I need additional effort to figure out what the List contains.
SqlQuery returns key and value objects, while SqlFieldsQuery allows you to select specific fields. Which one to use depends on your use case.
Currently select * does indeed include the predefined _key and _val fields, and this will be improved in the future. However, it is generally good practice to list the fields you want to fetch when running SQL queries (this is true for any SQL database, not only Ignite). This way your code is protected from unexpected behavior if, for example, the schema changes.

SQL - just view the description for explanation

I would like to ask if it is possible to do this:
For example, the search string is '009' (consider the digits as a string).
Is it possible to have a query that will return any occurrence of this in the database, not considering the order?
For this example it would return
'009'
'090'
'900'
given these exist in the database. Thanks!
Use the LIKE operator.
For example:
SELECT Marks FROM Report WHERE Marks LIKE '%009%' OR Marks LIKE '%090%' OR Marks LIKE '%900%'
Split the string into individual characters, select all rows containing the first character and put them in a temporary table, then select all rows from the temporary table that contain the second character and put these in a temporary table, then select all rows from that temporary table that contain the third character.
Of course, there are probably many ways to optimize this, but I see no reason why it would not be possible to make a query like that work.
It cannot be achieved in a straightforward way, as there is no sort() function for a value the way there are lower() and upper() functions.
But there are some workarounds, like this:
Suppose you are querying COL A; maintain another column SORTED_A where, at the application level, you keep the sorted value of COL A.
Then when you execute the query, sort the search token and run the select matching the sorted search token against the SORTED_A column.
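A minimal sketch of that workaround, reusing the Marks/Report names from the first answer and adding a hypothetical Marks_sorted column that the application keeps in sync:
-- Marks_sorted holds the characters of Marks in sorted order,
-- e.g. the row with Marks = '900' is stored with Marks_sorted = '009'.
-- To search for any permutation of '900', sort the search token first ('009'):
SELECT Marks
FROM Report
WHERE Marks_sorted = '009'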