IF clause in BigQuery

I have a (join) query that works as expected. But as soon as I add the following column, it neither returns any results nor completes (the query-running counter just keeps growing).
IF((d.network_type contains '_user' AND d.is_network=1),s.impressions,0) AS effimp
Is there any other way to optimize this?
The full query is as follows. It worked when I tried it last month.
SELECT s.date_time AS date_time
, s.requests AS requests, s.impressions AS impressions
, s.clicks AS clicks, s.conversions AS conversions
, IF((d.network_type contains '_user'
AND d.is_network=1),s.impressions,0) AS effimp
, s.total_revenue AS total_revenue
, s.total_basket_value AS total_basket_value
, s.total_num_items AS total_num_items
, s.zone_id as zone_id
FROM company.ox_data_summary s
INNER JOIN company.ox_banners1 AS d ON d.bannerid=s.ad_id
limit 100
Query Failed
Error: Unexpected. Please try again.
If I remove the IF clause, it does work.

Looks like you're hitting a query processing bug. We're investigating.

Related

My SQL query is saying my order by is invalid

I have a SQL query that is giving me this error.
Column INSTR.JOB_HDR.JOB_START_DTTM is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.
But... I am using an aggregate, here is my query
SELECT [JOB_NAME]
, [JOB_DESCR]
, [INFA_FOLDER_NAME]
, [WORKFLOW_NAME]
, [LOAD_GROUP]
, MAX(JOB_START_DTTM) AS 'JOB START'
, MAX(JOB_END_DTTM) AS 'JOB END'
, [JOB_STATUS]
FROM [INSTR].[JOB] JB
INNER JOIN [INSTR].[JOB_HDR] JOB_HDR
ON JB.JOB_ID = JOB_HDR.JOB_ID
WHERE (JB.JOB_NAME LIKE 'EDW%1D00%')
AND
(
(JOB_STATUS = 'COMPLETE')
OR (JOB_STATUS = 'RUNNING')
)
GROUP BY JOB_NAME
, JOB_DESCR
, INFA_FOLDER_NAME
, WORKFLOW_NAME
, LOAD_GROUP
, JOB_STATUS
ORDER BY JOB_HDR.JOB_START_DTTM DESC
, JOB_HDR.JOB_END_DTTM DESC;
Funny enough, just to see, I put those columns in the group by and I get results. But the results are giving me multiple records, when I just want the highest value for the job start and end times which is why I used max.
In case it helps, this is the format of the record for job start and end.
2023-02-08 16:12:09.
What is causing the order by clause to not know that I'm using the max function?? Why is the max function not working?
I'm doing this as a test run. I work in a baby IT job that allows you to learn SQL while working, and then you do a "Qual book" checkout. My signatory gave me the hint to use an aggregate function on those columns. I suggested MAX and they more or less confirmed without giving me the answer, because I'm supposed to write this query on my own.
He also mentioned that I would use a CASE statement for the job status, which confuses me because my WHERE clause already filters what I want. How would a CASE statement help? I'm not as familiar with CASE as I'd like to be and all the free resources don't seem to be making it click for me.
Thanks for any help! I mostly just want to know the why of the problem: why is my aggregate function not working?
The results are to pull those columns in the select, using aforementioned join for two tables.
The results need to show job status as "running" or "complete" with the latest run time (job start, end time). This should produce 1 unique record for each "job name" with %edw%1d00%, because I want only the latest run time and its status.
Hope this clarifies.
Try this:
SELECT [JOB_NAME]
, [JOB_DESCR]
, [INFA_FOLDER_NAME]
, [WORKFLOW_NAME]
, [LOAD_GROUP]
, MAX(JOB_START_DTTM) AS 'JOB START'
, MAX(JOB_END_DTTM) AS 'JOB END'
, [JOB_STATUS]
FROM [INSTR].[JOB] JB
INNER JOIN [INSTR].[JOB_HDR] JOB_HDR
ON JB.JOB_ID = JOB_HDR.JOB_ID
WHERE (JB.JOB_NAME LIKE 'EDW%1D00%')
AND
(
(JOB_STATUS = 'COMPLETE')
OR (JOB_STATUS = 'RUNNING')
)
GROUP BY [JOB_NAME]
, [JOB_DESCR]
, [INFA_FOLDER_NAME]
, [WORKFLOW_NAME]
, [LOAD_GROUP]
, [JOB_STATUS]
ORDER BY MAX(JOB_HDR.JOB_START_DTTM) DESC
, MAX(JOB_HDR.JOB_END_DTTM) DESC;
The issue is that your ORDER BY was trying to order by columns that are neither in the GROUP BY nor aggregated, as the error mentions. The way to fix that in this case is to aggregate them in the ORDER BY as well.
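As a rough, runnable illustration of that fix, here is a minimal sketch in Python using the built-in sqlite3 module instead of SQL Server. The table job_runs and its rows are made up for the example, and the two-table join is collapsed into a single table. SQLite is more permissive than SQL Server and would not raise the original error, so this only shows that ordering by the aggregate gives one row per group, newest first.

```python
import sqlite3

# Hypothetical, simplified stand-in for the JOB / JOB_HDR join: one table, made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_runs (job_name TEXT, job_status TEXT, job_start_dttm TEXT)"
)
conn.executemany(
    "INSERT INTO job_runs VALUES (?, ?, ?)",
    [
        ("EDW_A_1D00", "COMPLETE", "2023-02-07 16:12:09"),
        ("EDW_A_1D00", "COMPLETE", "2023-02-08 16:12:09"),
        ("EDW_B_1D00", "COMPLETE", "2023-02-06 09:00:00"),
    ],
)

# Every non-aggregated column is in GROUP BY, and ORDER BY uses the same
# aggregate as the SELECT list rather than the raw column.
latest_runs = conn.execute(
    """
    SELECT job_name, job_status, MAX(job_start_dttm) AS job_start
    FROM job_runs
    GROUP BY job_name, job_status
    ORDER BY MAX(job_start_dttm) DESC
    """
).fetchall()

for row in latest_runs:
    print(row)
```

Because job_status is still part of the GROUP BY, a job that has both a RUNNING and a COMPLETE row would still come back as two rows in this sketch.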

Most recent transaction date against a Works Order?

Apologies in advance for what will probably be a very stupid question but I've been using Google to teach myself SQL after making the move from years of using Crystal Reports.
We have Works Orders which can have numerous transactions against them. I want to find the most recent one and have it returned against the Works Order number (which is a unique ID)? I attempted to use MAX but that just returns whatever the Transaction Date for that record is.
I think my struggles may be caused by a lack of understanding of grouping in SQL. In Crystal it was just 'choose what to group by' but for some reason in SQL I seem to be forced to group by all selected fields.
My ultimate goal is to be able to compare the planned end date of the Works Order ("we need to finish this job by then") vs when the last transaction was booked against the Works Order, so that I can create an OTIF KPI.
I've attached an image of what I'm currently seeing in SQL Server 2014 Management Studio and below is my attempt at the query.
SELECT wip.WO.WO_No
, wip.WO.WO_Type
, stock.Stock_Trans_Log.Part_No
, stock.Stock_Trans_Types.Description
, stock.Stock_Trans_Log.Qty_Change
, stock.Stock_Trans_Log.Trans_Date
, wip.WO.End_Date
, wip.WO.Qty - wip.WO.Qty_Stored AS 'Qty remaining'
, MAX(stock.Stock_Trans_Log.Trans_Date) AS 'Last Production Receipt'
FROM stock.Stock_Trans_Log
INNER JOIN production.Part
ON stock.Stock_Trans_Log.Part_No = production.Part.Part_No
INNER JOIN wip.WO
ON stock.Stock_Trans_Log.WO_No = wip.WO.WO_No
INNER JOIN stock.Stock_Trans_Types
ON stock.Stock_Trans_Log.Tran_Type = stock.Stock_Trans_Types.Type
WHERE (stock.Stock_Trans_Types.Type = 10)
AND (stock.Stock_Trans_Log.Store_Code <> 'BI')
GROUP BY wip.WO.WO_No
, wip.WO.WO_Type
, stock.Stock_Trans_Log.Part_No
, stock.Stock_Trans_Types.Description
, stock.Stock_Trans_Log.Qty_Change
, stock.Stock_Trans_Log.Trans_Date
, wip.WO.End_Date
, wip.WO.Qty - wip.WO.Qty_Stored
HAVING (stock.Stock_Trans_Log.Part_No BETWEEN N'2Z' AND N'9A')
If my paraphrase is correct, you could use something along the following lines...
WITH
sequenced_filtered_stock_trans_log AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY WO_No
ORDER BY Trans_Date DESC) AS reversed_sequence_id
FROM
stock.Stock_Trans_Log
WHERE
Tran_Type = 10
AND Store_Code <> 'BI'
AND Part_No BETWEEN N'2Z' AND N'9A'
)
SELECT
<stuff>
FROM
sequenced_filtered_stock_trans_log AS stock_trans_log
INNER JOIN
<your joins>
WHERE
stock_trans_log.reversed_sequence_id = 1
First, this will apply the WHERE clause to filter the log table.
After the WHERE clause is applied, a sequence id is calculated, restarting from one for each partition (each WO_No) and starting from the highest Trans_Date.
Finally, that can be used in your outer query with a WHERE clause that specifies that you only want the records with sequence id one, which is the most recent row per WO_No. The rest of the joins on to that table would proceed as normal.
If there is any other filtering that should be done (through joins or any other means) that should all be done before the application of the ROW_NUMBER().
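To make the pattern concrete, here is a minimal sketch in Python with the built-in sqlite3 module standing in for SQL Server. The table layout and rows are invented for the example, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

# Hypothetical cut-down transaction log: two works orders, made-up dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_trans_log (wo_no INTEGER, trans_date TEXT, qty_change INTEGER)"
)
conn.executemany(
    "INSERT INTO stock_trans_log VALUES (?, ?, ?)",
    [
        (1, "2016-01-05", 10),
        (1, "2016-01-09", 4),
        (2, "2016-01-03", 7),
    ],
)

# Number the rows within each works order from newest to oldest,
# then keep only the row numbered 1 (the most recent transaction).
latest_per_wo = conn.execute(
    """
    WITH sequenced AS (
        SELECT wo_no, trans_date, qty_change,
               ROW_NUMBER() OVER (PARTITION BY wo_no
                                  ORDER BY trans_date DESC) AS reversed_sequence_id
        FROM stock_trans_log
    )
    SELECT wo_no, trans_date, qty_change
    FROM sequenced
    WHERE reversed_sequence_id = 1
    ORDER BY wo_no
    """
).fetchall()

for row in latest_per_wo:
    print(row)
```

Unlike MAX with GROUP BY, this keeps the other columns of the winning row (here qty_change) without having to group by them.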

Selected field is not included in GROUP BY clause, so how does this Access query run successfully?

I found this query in an MS Access database that was built by someone else:
SELECT
tblWorkOrder.WorkOrderNum
, tblWorkOrder.SprayTypes
, tblWorkOrder.Description
, tblWorkOrderMaterials.ChemicalName
, tblWorkOrderMaterials.RatePerAcre
, tblMaterials.ApplicationUnit
, tblMaterials.DryOrLiquid
, tblWorkOrderMaterials.ID
FROM (tblMaterials
INNER JOIN tblMaterialsDetails ON tblMaterials.ChemicalName = tblMaterialsDetails.ChemicalName)
INNER JOIN (tblWorkOrder
INNER JOIN tblWorkOrderMaterials ON tblWorkOrder.WorkOrderNum = tblWorkOrderMaterials.WorkOrderNum) ON tblMaterials.ChemicalName = tblWorkOrderMaterials.ChemicalName
WHERE (((tblMaterialsDetails.CropType) = "Apples"
OR (tblMaterialsDetails.CropType) = "All"))
GROUP BY
tblWorkOrder.WorkOrderNum
, tblWorkOrder.Description
, tblWorkOrderMaterials.ChemicalName
, tblWorkOrderMaterials.RatePerAcre
, tblMaterials.ApplicationUnit
, tblMaterials.DryOrLiquid
, tblWorkOrderMaterials.ID;
The query runs fine in Access, which is the problem. How does this query run when field "tblWorkOrder.SprayTypes" is included in the SELECT list but not in the GROUP BY clause? It should cause an error based on the field not being included in the aggregate function, right? When I migrated the back end to MySQL, it broke like I would expect it to so I want to make sure I wasn't missing something in the Access back end version.
Here is the relationship between tblWorkOrder and tblSprayTypes:
It runs successfully because there is nothing in the select clause that necessitates a group by clause. There is no min, max, sum, count, or avg.
The point of the group by clause is not clear, but that wasn't your question.

Google BigQuery Internal Error

Edit: Tidied up the query a bit. Checked running on one day (versus the 27 I need) and the query runs. With 27 days of data it's trying to process 5.67TB. Could this be the issue?
Latest ID of error run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and Bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
COUNT(UNIQUE(msisdn_token)) AS users,
(SUM(up_link_data_bytes) + SUM(down_link_data_bytes))/1000000 AS tot_data_mb
FROM (
SELECT
request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
WHERE SUBSTR(http_status_code,1,1) IN ('1',
'2',
'3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON
a.request_domain = d.request_domain
WHERE
DATE(timestamp) >= '2016-01-01'
AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY
1
Is there something I'm doing wrong?
The problem seems to be coming from UNIQUE(): it returns a repeated field with too many elements in it. The error message could be improved, but a workaround for you would be to use an explicit GROUP BY and then run COUNT on top of it.
If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or a higher approximation parameter than the default 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.
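The explicit GROUP BY workaround can be sketched as follows. This is a minimal illustration in Python with the built-in sqlite3 module, not BigQuery itself. The weblog table, its category column and its rows are made up, and the join to the lookup table is left out.

```python
import sqlite3

# Hypothetical miniature of the weblog data: (category, user token) pairs with repeats.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (category TEXT, msisdn_token TEXT)")
conn.executemany(
    "INSERT INTO weblog VALUES (?, ?)",
    [
        ("news", "u1"),
        ("news", "u1"),
        ("news", "u2"),
        ("video", "u1"),
        ("video", "u3"),
        ("video", "u3"),
    ],
)

# Inner query: collapse to one row per (category, token) with GROUP BY.
# Outer query: count those rows per category, which gives an exact distinct count.
users_per_category = conn.execute(
    """
    SELECT category, COUNT(*) AS users
    FROM (
        SELECT category, msisdn_token
        FROM weblog
        GROUP BY category, msisdn_token
    ) AS distinct_pairs
    GROUP BY category
    ORDER BY category
    """
).fetchall()

for row in users_per_category:
    print(row)
```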

Query including subquery and group by slower than expected

The whole query below runs incredibly slowly.
The subquery [alias Stage_1] takes only 1.37 minutes and returns 9514 records, whereas the whole query takes over 20 minutes and returns 2606 records.
I could use a #temp table to hold the subquery to improve the performance however I would prefer not to.
An overview of the query is that table WeeklySpace inner joins to Spaceblock_Name_to_PG table on SpaceblockName_SID, this cuts down the results in WeeklySpace and includes PG_Code with the results in WeeklySpace. WeeklySpace is then Full Outer Joined to Sales_PG_Wk across 3 fields. The where clause focuses the results, and may be changed. The results from the subquery are then sum'd. You cannot do the final sum'ing in the subquery due to the group by and sum over used.
I believe the issue is due to the subquery being recalculated repeatedly during the group by in the final sum'ing. The field SpaceblockName_SID also appears to be involved in causing the issue, as without it the run time with a group by in the subquery isn't affected.
I have read through loads of suggestions, trying them all to resolve the issue.
These include:
Adding TOP 2147483647 with ORDER BY to force intermediate materialization, both in the subquery and using a CTE.
Adding a join after stage_1.
Casting SpaceblockName_SID from an int to a varchar and back again.
The execution plan (cut in two parts, shown below the code) for both the subquery and the whole query appear similar. The cost is around the Full Outer Join (Hash Match), which I expected.
The query is running on T-SQL 2005.
Any help greatly appreciated!
select
Cost_centre
, Fin_week
, SpaceblockName_SID
, sum(Propor_rep_SRV) as Total_SpaceblockName_SID_SRV
from
(
select
coalesce(space_side.fin_week , sales_side.fin_week) as Fin_week
,coalesce(space_side.cost_centre , sales_side.cost_Centre) as Cost_centre
,space_side.SpaceblockName_SID
,case
when space_side.SpaceblockName_SID is null
then sales_side.SalesExVAT
else sum(space_side.TLM)
/nullif(sum (sum(space_side.TLM) ) over (partition by coalesce(space_side.fin_week , sales_side.fin_week)
, coalesce(space_side.cost_centre , sales_side.cost_Centre)
, coalesce( Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)) ,0)*sales_side.SalesExVAT
end as Propor_rep_SRV
from
WeeklySpace as space_side
INNER JOIN
Spaceblock_Name_to_PG
ON space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID
and Spaceblock_Name_to_PG.PG_Code < 10000
full outer join
sales_pg_wk as sales_side
on space_side.fin_week = sales_side.fin_week
and space_side.Cost_Centre = sales_side.Cost_Centre
and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
where
coalesce(space_side.fin_week, sales_side.fin_week) between 201538 and 201550
and
coalesce(space_side.cost_centre, sales_side.cost_Centre) in (3, 2800)
group by
coalesce(space_side.fin_week, sales_side.fin_week)
,coalesce(space_side.cost_centre, sales_side.cost_Centre)
,coalesce( Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)
,sales_side.SalesExVAT
,space_side.SpaceblockName_SID
) as stage_1
group by
Cost_centre
, Fin_week
, SpaceblockName_SID
Execution plan left hand side
Execution plan right hand side
You didn't mention whether indexes have been created on the columns used in your query. If not, create them and check the performance of the query again.
Looking at your logic, I think you could split this in two with a UNION:
one part with Spaceblock_Name_to_PG.PG_Code < 10000 and the other with Spaceblock_Name_to_PG.PG_Code >= 10000.
Also consider this change.
It may be doing a bunch of joins that you are going to throw out anyway:
full outer join sales_pg_wk as sales_side
on space_side.fin_week = sales_side.fin_week
and space_side.Cost_Centre = sales_side.Cost_Centre
and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
and space_side.fin_week between 201538 and 201550
and sales_side.fin_week between 201538 and 201550
and space_side.cost_centre in (3, 2800)
and sales_side.cost_Centre in (3, 2800)