Inserting into Bigquery table with array of arrays - google-bigquery

How do I insert a record into a BigQuery table with arrays nested two levels deep?
The ORDER table has an array ORDER_DETAIL, which in turn has an array ORDER_DISCOUNTS.
The query below is not working:
INSERT INTO ORDER (ORDER_ID, OrderDetail)
SELECT OH.ORDER_ID, ARRAY_AGG(struct(OD.line_id, OD.item_id, ARRAY_AGG(struct(ODIS.discounttype)) )
FROM ORDER_HEADER OH LEFT JOIN ORDER_DETAIL OD, ORDER_DISCOUNTS ODIS
ON OH.ORDER_ID = OD.ORDER_ID AND ODIS.ORDER_ID = OD.ORDER_ID and ODIS.LINE_ID = OD.LINE_ID
WHERE OH.ORDER_ID = 'ABCD'

I can't see the GROUP BYs in the sample query. Here is a reproduction with public data showing how to build arrays of arrays in BigQuery:
WITH data AS (
SELECT *
FROM `fh-bigquery.wikipedia_v3.pageviews_2019`
JOIN UNNEST(['Andy_K%','Boys%','Catri%']) start
ON title LIKE start
WHERE DATE(datehour) = "2019-09-01"
AND wiki='en'
)
SELECT start, ARRAY_AGG(STRUCT(title, views) LIMIT 10) title_views
FROM (
SELECT start, title, ARRAY_AGG(STRUCT(datehour,views) LIMIT 3) views
FROM data
GROUP BY start, title
)
GROUP BY start
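The same two-level pattern (an inner GROUP BY that aggregates the deepest array, then an outer GROUP BY that aggregates those results again) can be sketched outside BigQuery. Here's a minimal stand-in using SQLite's JSON aggregates through Python's sqlite3, with table and column names taken from the question's schema; `json_group_array`/`json_object` play the role of `ARRAY_AGG(STRUCT(...))`:

```python
import json
import sqlite3

# Invented sample data mirroring the question's ORDER_DETAIL / ORDER_DISCOUNTS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (order_id TEXT, line_id INT, item_id TEXT);
CREATE TABLE order_discounts (order_id TEXT, line_id INT, discounttype TEXT);
INSERT INTO order_detail VALUES ('ABCD', 1, 'X'), ('ABCD', 2, 'Y');
INSERT INTO order_discounts VALUES
  ('ABCD', 1, 'PCT'), ('ABCD', 1, 'FLAT'), ('ABCD', 2, 'BOGO');
""")

# Inner query: aggregate discounts per (order, line).
# Outer query: aggregate lines per order.
# Each aggregation level needs its own GROUP BY level.
row = conn.execute("""
SELECT order_id,
       json_group_array(json_object('line_id', line_id,
                                    'item_id', item_id,
                                    'discounts', json(discounts))) AS detail
FROM (
    SELECT od.order_id, od.line_id, od.item_id,
           json_group_array(dis.discounttype) AS discounts
    FROM order_detail od
    JOIN order_discounts dis
      ON dis.order_id = od.order_id AND dis.line_id = od.line_id
    GROUP BY od.order_id, od.line_id, od.item_id
)
GROUP BY order_id
""").fetchone()

detail = json.loads(row[1])
print(detail)
```

The point carries over directly: the question's single-level query tries to nest `ARRAY_AGG` inside `ARRAY_AGG` in one pass, but the inner array has to be built in a subquery with its own GROUP BY first.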

Related

SQL query takes a long time to run; how can it be optimized?

This is the query:
SELECT
[Code]
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY [OrderNo], [ProductNo] ORDER BY [Quantity] DESC) AS [RowNumber],
SUBSTRING(P.[ProductNo], 1, 2) AS [Code]
FROM [LESMESPRD].[FlexNet_prd].[dbo].[ORDER_DETAIL] AS OD
INNER JOIN [LESMESPRD].[FlexNet_prd].[dbo].[WIP_COMPONENT] AS WC ON [WC].[WiporderNo] = OD.[OrderNo]
AND WC.[WipOrderType] = OD.[OrderType]
AND WC.[Active] = 1
INNER JOIN [LESMESPRD].[FlexNet_prd].[dbo].[COMPONENT] AS C ON C.[ID] = WC.[ComponentID]
INNER JOIN [LESMESPRD].[FlexNet_prd].[dbo].[PRODUCT] AS P ON P.[ID] = C.[ProductID]
WHERE SUBSTRING(P.[ProductNo], 1, 2) IN ('43', '72')
) AS OrderBrandComponents
WHERE [RowNumber] = 1
Execution time is 1 minute and 16 seconds; maybe you can help me optimize it somehow? This query is just a small piece of the code, but I found that exactly this part is slowing the process down.
I thought the problem might be in the sub-select where I get my row number, but selecting from these linked-server tables on their own executes in seconds, so I think the problem is with the functions. I hope this query can be optimized.
I believe the delay is because your query is not sargable, due to the SUBSTRING(P.[ProductNo], 1, 2). The engine cannot use an index on a function call. But by using the full column with LIKE on the first two characters plus a trailing wildcard, you get the same records while still being able to use an index.
Now, because you are looking for two specific product type codes (43 and 72), I reversed the query to start with that table, then find the orders in which those products were used. This may help speed, especially if you have 100 orders with these products but thousands of orders otherwise: you start with a smaller set.
Also, you don't need all the square brackets. Typically those are only needed when a column name collides with a reserved keyword, such as a column named "from", or with a known data type or function name.
Finally, indexes to help optimize this. I would ensure you have indexes on the following tables:
table          index
PRODUCT        ( ProductNo, ID )   -- specifically this order
COMPONENT      ( ProductID, ID )
WIP_COMPONENT  ( ComponentID, Active, WipOrderNo, WipOrderType )
ORDER_DETAIL   ( OrderNo, OrderType )
SELECT
Code
FROM
(SELECT
ROW_NUMBER() OVER
(PARTITION BY OrderNo, ProductNo
ORDER BY Quantity DESC) AS RowNumber,
SUBSTRING(P.ProductNo, 1, 2) Code
FROM
LESMESPRD.FlexNet_prd.dbo.PRODUCT P
JOIN LESMESPRD.FlexNet_prd.dbo.COMPONENT C
ON P.ID = C.ProductID
JOIN LESMESPRD.FlexNet_prd.dbo.WIP_COMPONENT WC
ON C.ID = WC.ComponentID
AND WC.Active = 1
JOIN LESMESPRD.FlexNet_prd.dbo.ORDER_DETAIL OD
ON WC.WiporderNo = OD.OrderNo
AND WC.WipOrderType = OD.OrderType
WHERE
P.ProductNo like '43%'
OR P.ProductNo like '72%' ) AS OrderBrandComponents
WHERE
OrderBrandComponents.RowNumber = 1
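As a quick sanity check on the rewrite, here's a small sketch (SQLite via Python's sqlite3, with an invented product table) confirming that the SUBSTRING filter and the prefix-LIKE filter select exactly the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_no TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO product VALUES (?)",
                 [("4301",), ("4355",), ("7299",), ("9943",)])

# Non-sargable: the function call hides the column from any index.
via_substr = conn.execute(
    "SELECT product_no FROM product"
    " WHERE substr(product_no, 1, 2) IN ('43', '72')"
    " ORDER BY product_no").fetchall()

# Sargable: a leading-prefix LIKE can be answered with an index range scan.
via_like = conn.execute(
    "SELECT product_no FROM product"
    " WHERE product_no LIKE '43%' OR product_no LIKE '72%'"
    " ORDER BY product_no").fetchall()

print(via_like)
```

Note that '9943' matches neither form: it contains "43" but does not start with it, which is exactly the prefix semantics both predicates share.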

Subtract value of a field from a count(*) query

I have three tables:
Plans(PlanID(key), Capacity)
CustomerInProject(VahedID, cinpid(key))
Vahed(VahedID(key), PlanID,...)
This query shows the number of houses with the same PlanID (floor plan) that people have rented:
select
count(*)
from
Vahed as v2,
CustomerInProject
where
CustomerInProject.VahedID = v2.VahedID
group by PlanID
Plans has an int field named Capacity. I want to subtract Capacity from the count in the above query. How can I do that?
Something like this should do it:
select p.PlanID, count(*) - p.Capacity
from Vahed as v2
join CustomerInProject c
  on c.VahedID = v2.VahedID
join Plans p
  on p.PlanID = v2.PlanID /* or c.PlanID, it's not clear from the question */
group by p.PlanID, p.Capacity
In the join on Plans you may want to use c.PlanID instead of v2.PlanID; I don't know the exact table schema, but in the schema as posted only Vahed carries a PlanID.
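Here's a runnable sketch of that join/group-by answer on SQLite through Python's sqlite3. The sample data is invented: plan 1 holds 3 houses with 2 rented, plan 2 holds 2 houses with both rented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Plans (PlanID INTEGER PRIMARY KEY, Capacity INTEGER);
CREATE TABLE Vahed (VahedID INTEGER PRIMARY KEY, PlanID INTEGER);
CREATE TABLE CustomerInProject (cinpid INTEGER PRIMARY KEY, VahedID INTEGER);
INSERT INTO Plans VALUES (1, 3), (2, 2);
INSERT INTO Vahed VALUES (10, 1), (11, 1), (12, 2), (13, 2);
INSERT INTO CustomerInProject VALUES (100, 10), (101, 11), (102, 12), (103, 13);
""")

# count(*) per plan minus that plan's capacity; Capacity comes in via the
# join to Plans, and it must appear in the GROUP BY alongside PlanID.
rows = conn.execute("""
SELECT p.PlanID, COUNT(*) - p.Capacity AS diff
FROM Vahed v
JOIN CustomerInProject c ON c.VahedID = v.VahedID
JOIN Plans p ON p.PlanID = v.PlanID
GROUP BY p.PlanID, p.Capacity
ORDER BY p.PlanID
""").fetchall()

print(rows)
```

Plan 1 gives 2 rented minus capacity 3, plan 2 gives 2 minus 2.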
select sum(cnt)
from
( select capacity * -1 as cnt
  from Plans
  union all
  select count(*) as cnt
  from Vahed as v2
  inner join CustomerInProject
    on CustomerInProject.VahedID = v2.VahedID
  group by PlanID ) as t
(UNION ALL is needed here, since plain UNION would collapse duplicate values, and most engines require an alias on the derived table. Also note this returns one overall total rather than one value per plan.)

How can I join on multiple columns within the same table that contain the same type of info?

I am currently joining two tables based on Claim_Number and Customer_Number.
SELECT
A.*,
B.*
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This returns exactly the data I'm looking for. The problem is that I now need to join to another table to get information based on a Product_ID. This would be easy if there were only one Product_ID in the Compound_Info table for each record. However, there are 10. So basically I need to SELECT 10 additional Product_Name columns based on each of those Product_ID's that are already being selected. How can I do that? This is what I was thinking in my head, but it is not working right.
SELECT
A.*,
B.*,
PD_Info_1.Product_Name,
PD_Info_2.Product_Name,
....etc {Up to 10 Product Names}
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
LEFT JOIN Company.dbo.Product_Info AS PD_Info_1 ON B.Product_ID_1 = PD_Info_1.Product_ID
LEFT JOIN Company.dbo.Product_Info AS PD_Info_2 ON B.Product_ID_2 = PD_Info_2.Product_ID
.... {Up to 10 LEFT JOIN's}
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This query not only doesn't return the correct results, it also takes forever to run. My actual SQL is a lot longer and I've changed table names, etc., but I hope you get the idea. If it matters, I will be creating a view based on this query.
Please advise on how to select multiple columns from the same table correctly and efficiently. Thanks!
I found that putting my extra lookup into a CTE and adding ROW_NUMBER ensures I get only the one row I care about. It would look something like this; I only did it for the first two product IDs.
WITH PD_Info
AS ( SELECT Product_ID
,Product_Name
,Effective_Date
,ROW_NUMBER() OVER ( PARTITION BY Product_ID, Product_Name ORDER BY Effective_Date DESC ) AS RowNum
FROM Company.dbo.Product_Info)
SELECT A.*
,B.*
,PD_Info_1.Product_Name
,PD_Info_2.Product_Name
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B
ON A.Claim_Number = B.Claim_Number
AND A.Customer_Number = B.Customer_Number
LEFT JOIN PD_Info AS PD_Info_1
ON B.Product_ID_1 = PD_Info_1.Product_ID
AND B.Fill_Date >= PD_Info_1.Effective_Date
AND PD_Info_1.RowNum = 1
LEFT JOIN PD_Info AS PD_Info_2
ON B.Product_ID_2 = PD_Info_2.Product_ID
AND B.Fill_Date >= PD_Info_2.Effective_Date
AND PD_Info_2.RowNum = 1
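The core of that answer is the ROW_NUMBER-in-a-CTE deduplication. Here's a minimal sketch of just that piece on SQLite via Python's sqlite3, with invented product data: product 1 has two Effective_Date rows, and the CTE keeps only the newest per (Product_ID, Product_Name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Info (Product_ID INT, Product_Name TEXT, Effective_Date TEXT);
INSERT INTO Product_Info VALUES
  (1, 'Aspirin',   '2013-01-01'),
  (1, 'Aspirin',   '2013-06-01'),
  (2, 'Ibuprofen', '2013-03-01');
""")

# ROW_NUMBER restarts at 1 within each partition; ordering by date DESC
# makes row 1 the most recent version, so WHERE RowNum = 1 deduplicates.
rows = conn.execute("""
WITH PD_Info AS (
    SELECT Product_ID, Product_Name, Effective_Date,
           ROW_NUMBER() OVER (PARTITION BY Product_ID, Product_Name
                              ORDER BY Effective_Date DESC) AS RowNum
    FROM Product_Info
)
SELECT Product_ID, Product_Name, Effective_Date
FROM PD_Info
WHERE RowNum = 1
ORDER BY Product_ID
""").fetchall()

print(rows)
```

Each of the ten LEFT JOINs then hits this deduplicated CTE instead of the raw table, which is what stops the row multiplication.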

How to get some records without cursor, without Cross Apply in T-SQL

I have a table called Objects which contains some records, say:
User
Teacher
There is another table (States) which holds the possible states of these objects, like:
Active
Idle
Teaching
Resting
Authoring
And there is a third table (junction table) which logs each state change of each object. In this third table (ObjectStates) records are like:
1, 1, DateTime1 (User was active on DateTime1)
2, 5, DateTime2 (Teacher was authoring on DateTime2)
etc.
Now, what I want is a query that returns each object with its latest state (not the state history). It's possible to get this result using cursors, or using CROSS APPLY. However, I'd like to know if there is any other way to get the latest state of each object from these three tables, because cursors are costly.
Using the row_number() windowing function...
select *
from
(
    select objects.*,
           states.state,
           objectstates.changedate,
           row_number() over (partition by objects.id order by changedate desc) rn
    from objects
    inner join objectstates
        on objects.id = objectstates.objectid
    inner join states
        on objectstates.stateid = states.stateid
) v
where rn = 1
If you can't use row_number because you're on SQL 2000, for example, you can use a join on a max/group by query.
select objects.*,
       states.state,
       objectstates.changedate
from objects
inner join objectstates
    on objects.id = objectstates.objectid
inner join states
    on objectstates.stateid = states.stateid
inner join
    (select objectid, max(changedate) as maxdate
     from objectstates
     group by objectid) maxstates
    on objectstates.objectid = maxstates.objectid
    and objectstates.changedate = maxstates.maxdate
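Both answers should agree on ties-free data. Here's a check on SQLite via Python's sqlite3, with a small invented dataset matching the question's description, confirming the row_number() approach and the max/group-by join return the same latest state per object:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INT PRIMARY KEY, name TEXT);
CREATE TABLE states (stateid INT PRIMARY KEY, state TEXT);
CREATE TABLE objectstates (objectid INT, stateid INT, changedate TEXT);
INSERT INTO objects VALUES (1, 'User'), (2, 'Teacher');
INSERT INTO states VALUES (1, 'Active'), (5, 'Authoring');
INSERT INTO objectstates VALUES
  (1, 5, '2012-01-01'), (1, 1, '2012-02-01'),
  (2, 5, '2012-01-15');
""")

# Approach 1: number rows per object, newest first, keep row 1.
rn = conn.execute("""
SELECT name, state, changedate FROM (
    SELECT o.name, s.state, os.changedate,
           ROW_NUMBER() OVER (PARTITION BY o.id
                              ORDER BY os.changedate DESC) AS rn
    FROM objects o
    JOIN objectstates os ON o.id = os.objectid
    JOIN states s ON os.stateid = s.stateid
) v WHERE rn = 1 ORDER BY name
""").fetchall()

# Approach 2: join back on (objectid, max(changedate)).
gb = conn.execute("""
SELECT o.name, s.state, os.changedate
FROM objects o
JOIN objectstates os ON o.id = os.objectid
JOIN states s ON os.stateid = s.stateid
JOIN (SELECT objectid, MAX(changedate) AS maxdate
      FROM objectstates GROUP BY objectid) m
  ON os.objectid = m.objectid AND os.changedate = m.maxdate
ORDER BY o.name
""").fetchall()

print(rn)
```

One caveat worth noting: if two state changes share the exact same changedate, the max/group-by join returns both rows, while row_number() arbitrarily keeps one.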
You can join on the ObjectStates table twice. The first join of the table will get the max(activedate) for each objectid. The second time, you will join on both the objectid and the value of the max(activedate) and this will get the state associated with that value:
select o.name o_name,
s.name s_name,
os1.activedate
from objects o
left join
(
select max(activeDate) activedate, objectid
from objectstates
group by objectid
) os1
on o.id = os1.objectid
left join ObjectStates os2
on os1.objectid = os2.objectid
and os1.activedate = os2.activedate
left join states s
on os2.stateid = s.id
You can use row_number() with partition by to find the latest row for each Object, like this:
create table #ObjectState
(
Object int NOT NULL,
State int NOT NULL,
TimeStamp datetime NOT NULL
)
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (1, 1, '2012-01-01')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (1, 2, '2012-01-02')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (1, 3, '2012-01-03')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (2, 4, '2012-01-01')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (2, 2, '2012-01-02')
select *, ROW_NUMBER() over (partition by Object order by TimeStamp desc) as RowNo from #ObjectState
select InnerSelect.Object, InnerSelect.State, InnerSelect.TimeStamp FROM
(
select *, ROW_NUMBER() over (partition by Object order by TimeStamp desc) as RowNo from #ObjectState
) InnerSelect
where InnerSelect.RowNo = 1
DROP TABLE #ObjectState
The last select gives this output:
Object  State  TimeStamp
1       3      2012-01-03 00:00:00.000
2       2      2012-01-02 00:00:00.000
In the good old days, we just used Scalar subqueries.
select o.*, (select top(1) s.description
from objectstates os
join states s on s.id = os.state_id
where os.object_id = o.id
order by os.recorded_time desc) last_state
from objects o;
This is what CROSS APPLY replaces. To return more than one field, it has to be extended to something like:
select *
from (
select o.*, (select top(1) os.id
from objectstates os
where os.object_id = o.id
order by os.recorded_time desc) last_state
from objects o
) x
join objectstates os on os.id = x.last_state
join states s on s.id = os.state_id;

How to rewrite long query?

I have the following 2 tables:
items:
id int primary key
bla text
events:
id_items int
num int
when timestamp without time zone
ble text
composite primary key: id_items, num
and I want to select, for each item, the most recent event (the newest 'when').
I wrote a query, but I don't know if it could be written more efficiently.
Also, on PostgreSQL I ran into an issue comparing timestamps:
2010-05-08T10:00:00.123 == 2010-05-08T10:00:00.321
so I also select with MAX(num).
Any thoughts on how to make it better? Thanks.
SELECT i.*, ea.*
FROM items AS i
JOIN ( SELECT t.s AS t_s, t.c AS t_c, MAX(e.num) AS o
       FROM events AS e
       JOIN ( SELECT id_items AS s, MAX("when") AS c
              FROM events GROUP BY s ) AS t
         ON t.s = e.id_items AND e."when" = t.c
       GROUP BY t.s, t.c ) AS tt
  ON tt.t_s = i.id
JOIN events AS ea
  ON ea.id_items = tt.t_s AND ea."when" = tt.t_c AND ea.num = tt.o;
EDIT: I had bad data; sorry, my bad. However, thanks for finding a better SQL query.
SELECT (i).*, (e).*
FROM (
    SELECT i,
           ( SELECT e
             FROM events e
             WHERE e.id_items = i.id
             ORDER BY e."when" DESC
             LIMIT 1 ) e
    FROM items i
) q
If you're using 8.4:
select * from (
select item.*, event.*,
row_number() over(partition by item.id order by event."when" desc) as row_number
from items item
join events event on event.id_items = item.id
) x where row_number = 1
For this kind of join, I prefer the DISTINCT ON syntax.
It's a PostgreSQL extension (not standard SQL syntax), but it comes in very handy:
SELECT DISTINCT ON (it.id)
       it.*, ev.*
FROM items it, events ev
WHERE ev.id_items = it.id
ORDER BY it.id, ev."when" DESC;
You can't beat that, on terms of simplicity and readability.
That query assumes that every item has at least one event. If not, and you still want all items in the result, you'll need an outer join:
SELECT DISTINCT ON (it.id)
       it.*, ev.*
FROM items it LEFT JOIN events ev
  ON ev.id_items = it.id
ORDER BY it.id, ev."when" DESC;
BTW: there is no "timestamp issue" in PostgreSQL; perhaps you should change the title.