Is there a way to unnest BigQuery arrays independently? - sql

Let's say I have this table:
with tbl as (
select
['Unknown','Eletric','High Voltage'] AS product_category,
['Premium','New'] as client_cluster
) select * from tbl
And its output:
row | product_category | client_cluster
--------------------------------------------------------------
1 | [Unknown, Eletric, High Voltage] | [Premium, New]
I would like to unnest these columns independently, so that the result has N rows, where N is the size of the largest array I unnest, and the output looks like this:
row | product_category | client_cluster
---------------------------------------------
1 | Unknown | Premium
2 | Eletric | New
3 | High Voltage | Null
I do not need any particular order. Is there a way to do that? I tried the approach from another Stack Overflow answer, but in my case it did not work as expected because my arrays do not have the same size.

"it did not work as expected because my arrays do not have the same size."
For the specific sample in your question, you can LEFT JOIN the unnested arrays on their offsets.
WITH tbl AS (
SELECT ['Unknown','Eletric','High Voltage'] AS product_category, ['Premium','New'] as client_cluster
)
SELECT p AS product_category, c AS client_cluster
FROM tbl, UNNEST(product_category) p WITH offset
LEFT JOIN UNNEST(client_cluster) c WITH offset USING (offset);
But if product_category is shorter than client_cluster, it won't work as you wish, because the LEFT JOIN drops the extra client_cluster elements.
WITH tbl AS (
SELECT ['Eletric','High Voltage'] AS product_category, ['Supreme', 'Premium','New'] as client_cluster
)
SELECT p AS product_category, c AS client_cluster
FROM tbl, UNNEST(product_category) p WITH offset
LEFT JOIN UNNEST(client_cluster) c WITH offset USING (offset);
I might be wrong, but as far as I know you can't use FULL JOIN or RIGHT JOIN with flattened array to solve this issue. If you try to do so, you will get:
Query error: Array scan is not allowed with FULL JOIN: UNNEST expression at [31:13]
So you might consider the workaround below, which generates a helper array of offsets and uses it as the join key.
WITH tbl AS (
SELECT 1 id, ['Unknown','Eletric','High Voltage'] AS product_category, ['Premium','New'] as client_cluster,
UNION ALL
SELECT 2 id, ['Eletric','High Voltage'], ['Premium','New', 'Supreme']
)
SELECT id, p AS product_category, c AS client_cluster
FROM tbl, UNNEST(GENERATE_ARRAY(0, GREATEST(ARRAY_LENGTH(client_cluster), ARRAY_LENGTH(product_category)) - 1)) o0
LEFT JOIN UNNEST(product_category) p WITH offset o1 ON o0 = o1
LEFT JOIN UNNEST(client_cluster) c WITH offset o2 ON o0 = o2;
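To make the intended pairing concrete, here is a minimal Python sketch of the same positional logic, outside BigQuery. The function name unnest_independently is made up for illustration; it only mirrors what the offset join above is meant to produce, with None standing in for NULL.

```python
from itertools import zip_longest

def unnest_independently(*arrays):
    """Pair elements by position; shorter arrays are padded with None (NULL)."""
    return list(zip_longest(*arrays, fillvalue=None))

product_category = ["Unknown", "Eletric", "High Voltage"]
client_cluster = ["Premium", "New"]

rows = unnest_independently(product_category, client_cluster)
for row in rows:
    print(row)
```

Either array may be the longer one, which is the case the plain LEFT JOIN version cannot handle.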

Related

Problem optimizing sql query with cross apply sub query

So I have three tables:
MakerParts, that holds the primary information of a Vehicle Part:
Id | MakerId | PartNumber | Description
---------------------------------------
1 | 1 | ABC1234 | Tire
2 | 1 | XYZ1234 | Door
MakerPrices, that holds the price history variation for the parts (references MakerParts.Id on MakerPartNumberId, and the table MakerPriceUpdates on UpdateId):
Id | MakerPartNumberId | UpdateId | Price
-----------------------------------------
1 | 1 | 1 | 9.83
2 | 1 | 2 | 11.23
MakerPriceUpdates, that holds the date of prices updates. This update is basically a CSV file that is uploaded to our system. One file, one line on this table, multiple prices changes on the table MakerPrices.
Id | Date | FileName
---------------------------------------
1 | 2019-01-09 00:00:00.000 | temp.csv
2 | 2019-01-11 00:00:00.000 | temp2.csv
This means that one part (MakerParts) may have multiple prices (MakerPrices). The date of the price change is on the table MakerPricesUpdates.
I want to select all MakerParts where the most recent price is zero, filtering by the MakerId on table MakerParts.
What I've tried:
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId where
MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as p
where mp.MakerId = 1 and p.Price = 0
But that is absurdly slow (we have about 100 million rows in the MakerPrices table). I'm having a hard time optimizing this query: the result is only two rows for MakerId 1, and it took 2 minutes to run. I also tried:
select * from (
select
mp.*,
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId
where MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as Price
from MakerParts mp) as temp
where temp.Price = 0 and MakerId = 1
Same result, and same time. My query plan (for the first query) (no new indexes suggested by Management Studio):
I think you can avoid joining MakerPriceUpdates with MakerPrices, because the highest UpdateId identifies the latest price update
(assuming UpdateId values increase with the update date). That will save you some time.
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices where
MakerPrices.MakerPartNumberId = mp.Id order by MakerPrices.UpdateId desc) as p
where mp.MakerId = 1 and p.Price = 0
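As a rough way to check the "latest price by highest UpdateId" logic on a small data set, here is a sketch using Python's built-in sqlite3 module. It is not SQL Server: TOP 1 with CROSS APPLY is swapped for a correlated subquery with LIMIT 1, the price rows are made up for illustration, and the index name ix_prices is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MakerParts (Id INTEGER PRIMARY KEY, MakerId INTEGER, PartNumber TEXT, Description TEXT);
CREATE TABLE MakerPrices (Id INTEGER PRIMARY KEY, MakerPartNumberId INTEGER, UpdateId INTEGER, Price REAL);
INSERT INTO MakerParts VALUES (1, 1, 'ABC1234', 'Tire'), (2, 1, 'XYZ1234', 'Door');
-- Illustrative prices: part 1 ends at 0, part 2 ends at a non-zero price.
INSERT INTO MakerPrices VALUES
  (1, 1, 1, 9.83), (2, 1, 2, 0),
  (3, 2, 1, 0),    (4, 2, 2, 11.23);
-- Hypothetical covering index on (part, update, price).
CREATE INDEX ix_prices ON MakerPrices (MakerPartNumberId, UpdateId, Price);
""")

# For each part of maker 1, look up the price with the highest UpdateId
# and keep the part only if that latest price is zero.
latest_zero = conn.execute("""
SELECT mp.Id, mp.PartNumber
FROM MakerParts mp
WHERE mp.MakerId = 1
  AND (SELECT Price FROM MakerPrices
       WHERE MakerPartNumberId = mp.Id
       ORDER BY UpdateId DESC LIMIT 1) = 0
""").fetchall()
print(latest_zero)
```

Part 2 has an older zero price but a newer non-zero one, so it is excluded; only the latest row per part is considered.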
You may be able to reduce the time further by replacing the per-row TOP 1 ... ORDER BY with a CTE and row_number(), keeping only the row with rn = 1, as below:
;with LatestMakerPrices as
(
select *,row_number()over(partition by MakerPartNumberId order by updateid desc)rn from MakerPrices
)
select mp.* from MakerParts mp cross apply
(select price from LatestMakerPrices lmp where lmp.MakerPartNumberId=mp.Id and lmp.rn=1) as p
where mp.MakerId = 1 and p.Price = 0
Execution plan difference between query in question and my answer:
try:
WITH tab AS (
SELECT *, NULL as Price FROM MakerParts
WHERE not exists (
SELECT Id
FROM MakerPrices
WHERE MakerPrices.MakerPartNumberId = MakerParts.Id
)
)
SELECT * from tab WHERE MakerId = 2
UNION ALL
SELECT a.* , Price
FROM [dbo].[MakerParts] a
LEFT JOIN [dbo].[MakerPrices] b
ON b.MakerPartNumberId = a.Id
WHERE MakerId = 2 AND Price = 0
Try your query:
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId where
MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as p
where mp.MakerId = 1 and p.Price = 0
after creating the index below:
CREATE NONCLUSTERED INDEX [NCIdx_MakerPrices_MakerPartNumberId_UpdateId] ON [dbo].[MakerPrices]
(
[MakerPartNumberId] ASC,
[UpdateId] ASC
)
INCLUDE([Price])
Also make the Id column of the MakerPricesUpdates table its primary key.

Optimize query with a subquery With Group BY MAX and JOINED with another TABLE

I need help to optimize this SQL query, so that it would run much faster.
What I am trying to do is, get the latest values of DATA out of these tables:
TABLE: Quotes
ID | QuoteNumber | LastUpdated (in ticks) | PolicyId
----------------------------------------------------
1 | C1000 | 1000000000000 | 100
1 | D2000 | 1001111111110 | 200
2 | A1000 | 1000000000000 | 300
2 | B2000 | 1002222222222 | 400
TABLE: Policies
ID | CustomerName | Blah1 (dummy column)
----------------------------------------
100 | Mark | someData
200 | Lisa | someData2
300 | Brett | someData3
400 | Goku | someData4
DESIRED RESULT:
LastUpdated | Id (quoteId) | QuoteNumber | CustomerName
-------------------------------------------------------
1001111111110 | 1 | D2000 | Lisa
1002222222222 | 2 | B2000 | Goku
Select DISTINCT subquery1.LastUpdated,
q2.Id,
q2.QuoteNumber,
p.CustomerName
FROM
(Select q.id,
Max(q.LastUpdated) as LastUpdated from Quotes q
where q.LastUpdated > @someDateTimeParameter
and q.QuoteNumber is not null
and q.IsDiscarded = 0
GROUP BY q.id) as subquery1
LEFT JOIN Quotes q2
on q2.id = subquery1.id
and q2.LastUpdated = subquery1.LastUpdated
INNER JOIN Policies p
on p.id = q2.PolicyId
where p.blah1 = @someBlahParameter
ORDER BY subquery1.LastUpdated
Here is the actual execution plan:
https://www.brentozar.com/pastetheplan/?id=SkD3fPdwD
I think you're looking for something like this
with q_cte as (
select q.Id, q.QuoteNumber, q.LastUpdated, q.PolicyId,
row_number() over (partition by q.id order by q.LastUpdated desc) rn
from Quotes q
where q.LastUpdated>@someDateTimeParameter
and q.QuoteNumber is not null
and q.IsDiscarded=0)
select q.*, p.CustomerName
from q_cte q
join Policies p on q.PolicyId=p.id
where q.rn=1 /* Only the latest date */
and p.blah1=@someBlahParameter
order by q.LastUpdated;
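To illustrate what the rn = 1 filter selects, here is a small Python sketch that emulates "row_number() partitioned by id, newest first" on the sample rows from the question. It is an assumption-laden illustration, not the query itself: the parameter filters (LastUpdated, IsDiscarded, blah1) are left out, and the helper name latest_per_id is made up.

```python
from itertools import groupby
from operator import itemgetter

quotes = [
    {"Id": 1, "QuoteNumber": "C1000", "LastUpdated": 1000000000000, "PolicyId": 100},
    {"Id": 1, "QuoteNumber": "D2000", "LastUpdated": 1001111111110, "PolicyId": 200},
    {"Id": 2, "QuoteNumber": "A1000", "LastUpdated": 1000000000000, "PolicyId": 300},
    {"Id": 2, "QuoteNumber": "B2000", "LastUpdated": 1002222222222, "PolicyId": 400},
]
policies = {100: "Mark", 200: "Lisa", 300: "Brett", 400: "Goku"}

def latest_per_id(rows):
    # Sort by Id, then newest first, so the first row of each group is the rn = 1 row.
    ordered = sorted(rows, key=lambda r: (r["Id"], -r["LastUpdated"]))
    return [next(group) for _, group in groupby(ordered, key=itemgetter("Id"))]

# Join each latest quote to its policy and order by LastUpdated.
result = sorted(
    (q["LastUpdated"], q["Id"], q["QuoteNumber"], policies[q["PolicyId"]])
    for q in latest_per_id(quotes)
)
for row in result:
    print(row)
```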

How to convert list of comma separated Ids into their name?

I have a table that contains:
id | task_ids
-------------
1 | 10,15
2 | NULL
3 | 17
I have another table that holds the names of these tasks:
id | task_name
--------------
10 | a
15 | b
17 | c
I want to generate the following output:
id | task_ids | task_names
--------------------------
1 | 10,15 | a,b
2 | null | null
3 | 17 | c
I know this structure isn't ideal, but this is a legacy table which I will not change now.
Is there an easy way to get this output?
I'm using Presto, but I think this can be solved with standard SQL.
WITH data AS (
SELECT * FROM (VALUES (1, '10,15'), (2, NULL)) x(id, task_ids)
),
task AS (
SELECT * FROM (VALUES ('10', 'a'), ('15', 'b')) x(id, task_name)
)
SELECT
d.id, d.task_ids,
-- array_agg will capture the NULL task_name coming from the LEFT JOIN, so we need to filter out such results
IF(array_agg(t.task_name) IS NOT DISTINCT FROM ARRAY[NULL], NULL, array_agg(t.task_name)) task_names
FROM data d
-- split task_ids by `,` and UNNEST into separate rows (the ids stay strings here)
LEFT JOIN UNNEST (split(d.task_ids, ',')) AS e(task_id) ON true
-- LEFT JOIN with task to pull the task name
LEFT JOIN task t ON e.task_id = t.id
-- aggregate back
GROUP BY d.id, d.task_ids;
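The split, look up, and aggregate steps can also be sketched in plain Python, which may help when checking the expected output by hand. This is only an illustration of the logic, not Presto code; the helper name names_for is made up, and ids are kept as strings as in the query above.

```python
task_names = {"10": "a", "15": "b", "17": "c"}
rows = [(1, "10,15"), (2, None), (3, "17")]

def names_for(task_ids):
    """Split a comma-separated id string and map each id to its task name."""
    if task_ids is None:
        return None  # a NULL task_ids yields a NULL task_names
    names = [task_names[t] for t in task_ids.split(",") if t in task_names]
    return ",".join(names) if names else None

result = [(row_id, ids, names_for(ids)) for row_id, ids in rows]
for row in result:
    print(row)
```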
You have a horrible data model, but you can do what you want with a bit of effort. Arrays are better than strings, so I'll just use that:
select t.id, t.task_ids, array_agg(tt.task_name) as task_names
from t left join lateral
unnest(split(t.task_ids, ',')) u(task_id)
on 1=1 left join
tasks tt
on cast(tt.id as varchar) = u.task_id
group by t.id, t.task_ids;
I don't have Presto on hand to test this. But this or some minor variant should do what you want.
EDIT:
This version might work:
select t.id, t.task_ids,
(select array_agg(tt.task_name)
from unnest(split(t.task_ids, ',')) u(task_id) join
tasks tt
on cast(tt.id as varchar) = u.task_id
) as task_names
from t ;

SQL - Select records not present in another table (3 table relation)

I have 3 tables:
Table_Cars
-id_car
-description
Table_CarDocuments
-id_car
-id_documentType
-path_to_document
Table_DocumentTypes
-id_documentType
-description
I want to select all cars that do NOT have documents on the table Table_CarDocuments with 4 specific id_documentType.
Something like this:
Car1 | TaxDocument
Car1 | KeyDocument
Car2 | TaxDocument
With this I know that I'm missing 2 documents for Car1 and 1 document for Car2.
You are looking for missing car documents. So cross join cars and document types, and look for combinations that are NOT IN the car documents table.
select c.description as car, dt.description as doctype
from table_cars c
cross join table_documenttypes dt
where (c.id_car, dt.id_documenttype) not in
(
select cd.id_car, cd.id_documenttype
from table_cardocuments cd
);
UPDATE: It turns out that SQL Server's IN clause is limited and cannot handle multi-column (tuple) comparisons. But a NOT IN clause can easily be replaced by NOT EXISTS:
select c.description as car, dt.description as doctype
from table_cars c
cross join table_documenttypes dt
where not exists
(
select *
from table_cardocuments cd
where cd.id_car = c.id_car
and cd.id_documenttype = dt.id_documenttype
);
UPDATE: As you are only interested in particular id_documenttype (for which you'd have to add and dt.id_documenttype in (1, 2, 3, 4) to the query), you can generate records for them on-the-fly instead of having to read the table_documenttypes.
In order to do that replace
cross join table_documenttypes dt
with
cross join (values (1), (2), (3), (4)) as dt(id_documentType)
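The cross join plus NOT EXISTS idea from this answer can be tried on a tiny data set with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite rather than SQL Server, only two document types, and made-up rows (including the hypothetical file path) chosen so the result matches the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_Cars (id_car INTEGER, description TEXT);
CREATE TABLE Table_DocumentTypes (id_documentType INTEGER, description TEXT);
CREATE TABLE Table_CarDocuments (id_car INTEGER, id_documentType INTEGER, path_to_document TEXT);
INSERT INTO Table_Cars VALUES (1, 'Car1'), (2, 'Car2');
INSERT INTO Table_DocumentTypes VALUES (1, 'TaxDocument'), (2, 'KeyDocument');
-- Only Car2 has a document on file (its key document).
INSERT INTO Table_CarDocuments VALUES (2, 2, '/hypothetical/car2/key.pdf');
""")

# Build every car/document-type pair, then keep the pairs with no matching document.
missing = conn.execute("""
SELECT c.description, dt.description
FROM Table_Cars c
CROSS JOIN Table_DocumentTypes dt
WHERE NOT EXISTS (
    SELECT 1 FROM Table_CarDocuments cd
    WHERE cd.id_car = c.id_car
      AND cd.id_documentType = dt.id_documentType
)
ORDER BY c.id_car, dt.id_documentType
""").fetchall()
for row in missing:
    print(row)
```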
You can use the query below to get the result:
SELECT
c.description,
dt.description
FROM
Table_Cars c
JOIN Table_CarDocuments cd ON c.id_car = cd.id_car
JOIN Table_DocumentTypes dt ON cd.id_documentType = dt.id_documentType
WHERE
dt.id_documentType NOT IN (1, 2, 3, 4) --replace with your document type id
Thanks to @Thorsten Kettner's help:
select c.description as car, dt.description as doctype
from table_cars c
cross join table_documenttypes dt
where dt.id_documentType not in
(
select cd.id_documentType
from table_cardocuments cd
where cd.id_car = c.id_car AND cd.id_documentType = dt.id_documentType
)
AND dt.id_documentType IN (1, 2, 3, 4)
This can be a complicated query. The idea is to generate all combinations of cars and the four documents that you want (using cross join). Then use left join to determine if the document actually exists:
select c.id_car, dd.doctype
from cars c cross join
(select 'doc1' as doctype union all
select 'doc2' union all
select 'doc3' union all
select 'doc4'
) dd left join
(CarDocuments cd join
Documents d
on cd.id_document_type = d.id_document_type
) on c.id_car = cd.id_car and d.doctype = dd.doctype
where cd.id_car is null;
Finally, the where clause finds the car/doctype pairs that are not present in the data.

Get Incremental index for specific rows

I want to generate an incremental index only for rows where a note exists. I am trying to achieve this with ROW_NUMBER(), but there seems to be a problem with the way I am generating it.
SELECT * RowNo,
(SELECT CASE
WHEN LEN(Value) > 0 THEN ROW_NUMBER()
OVER (
ORDER BY ID)
ELSE ''
END
FROM Dictionary
WHERE ID = ABC.ID) Note
FROM ABCD AS ABC WITH(NOLOCK)
INNER JOIN XYZ AS XYZ WITH(NOLOCK)
ON ABC.Id = XYZ.ID
WHERE ABC.Id = 10
Expected output:
ID | Name | Note
----------------
1 | A | 1
2 | B |
3 | C | 2
4 | D |
5 | E |
6 | F | 3
The subquery isn't needed here, and you want to use the partition by argument to separate values having len(value)>0 from those having no value:
SELECT
ID,
Name,
CASE WHEN LEN(Value)>0 THEN CAST(ROW_NUMBER() OVER (
PARTITION BY CASE WHEN LEN(Value)>0 THEN 1 ELSE 0 END
ORDER BY ID) AS varchar(20)) ELSE '' END as Note
FROM ABCD AS ABC WITH(NOLOCK)
INNER JOIN XYZ AS XYZ WITH(NOLOCK)
ON ABC.Id = XYZ.ID
Where ABC.Id = 10
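The numbering rule itself (count only rows that have a value, leave the others blank) can be sketched in Python to check the expected output. The sample values "x", "y", "z" and the helper name number_notes are made up for illustration; the real Value column is not shown in the question.

```python
def number_notes(rows):
    """Assign 1, 2, 3, ... to rows with a non-empty value, in ID order; others get ''."""
    counter = 0
    out = []
    for row_id, name, value in sorted(rows, key=lambda r: r[0]):
        if value:  # same idea as LEN(Value) > 0; None and '' both count as empty
            counter += 1
            out.append((row_id, name, str(counter)))
        else:
            out.append((row_id, name, ""))
    return out

rows = [(1, "A", "x"), (2, "B", ""), (3, "C", "y"), (4, "D", None), (5, "E", ""), (6, "F", "z")]
numbered = number_notes(rows)
for row in numbered:
    print(row)
```

Partitioning by "has a value or not" in the SQL plays the role of the separate counter here: rows without a value never advance the numbering of rows that have one.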
I think maybe you need to change the approach to make the Dictionary query the "main" query. It's hard to say without knowing exactly what your tables look like. Which Table does the "Id" in your expected output come from?
Try like this:
WITH cte AS (
SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS Note
FROM Dictionary WHERE ID=10
AND LEN(Value)>0
)
SELECT ABC.ID, [Name], cte.Note
FROM ABCD AS ABC WITH(NOLOCK)
INNER JOIN XYZ AS XYZ WITH(NOLOCK) ON ABC.Id = XYZ.ID
LEFT OUTER JOIN cte ON ABC.Id=cte.ID