Unable to convert this legacy SQL into Standard SQL in Google BigQuery

I am not able to convert this legacy SQL into standard BigQuery SQL, and I don't know what else needs to change. The query fails validation when I choose standard SQL as the BigQuery dialect:
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name,
salesperson.name,
rate_card.*
FROM (
SELECT
*
FROM
dfp_data.dfp_order_lineitem
WHERE
DATE(end_datetime) >= DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'))
OR end_datetime IS NULL ) lineitem
JOIN (
SELECT
*
FROM
dfp_data.dfp_order) porder
ON
lineitem.order_id = porder.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal_lineitem) proposal_lineitem
ON
lineitem.id = proposal_lineitem.dfp_lineitem_id
JOIN (
SELECT
*
FROM
dfp_data.dfp_company) company
ON
porder.advertiser_id = company.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_product) product
ON
proposal_lineitem.product_id=product.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal) proposal
ON
proposal_lineitem.proposal_id=proposal.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_rate_card) rate_card
ON
proposal_lineitem.ratecard_id=rate_card.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) trafficker
ON
porder.trafficker_id =trafficker.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) salesperson
ON
porder. salesperson_id =salesperson.id

Most likely the error you are getting is something like the one below:
Duplicate column names in the result are not supported. Found duplicate(s): name
Legacy SQL automatically renames trafficker.name and salesperson.name in your SELECT statement to trafficker_name and salesperson_name respectively, which eliminates the duplicate column names.
Standard SQL behaves differently: it treats both columns as being named name, which produces the duplicate. To avoid this, you just need to provide aliases, as in the example below:
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name,
rate_card.*
FROM ( ...
You can easily check the behavior explained above using the simplified dummy queries below:
#legacySQL
SELECT
porder.*,
trafficker.name,
salesperson.name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id =trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder. salesperson_id =salesperson.id
and
#standardSQL
SELECT
porder.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id =trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder. salesperson_id =salesperson.id
Note: if you have more duplicate names, you need to alias all of them too.
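The aliasing fix can be tried locally with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption made only so the example runs without a BigQuery project. SQLite tolerates duplicate output names where BigQuery standard SQL rejects them, so the sketch only shows how the output column labels change once aliases are added. The table layout is a hypothetical cut-down version of the tables in the question.

```python
import sqlite3

# Hypothetical stand-in tables; SQLite is used only because it runs locally.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE porder (order_id INTEGER, trafficker_id INTEGER, salesperson_id INTEGER);
    CREATE TABLE dfp_user (id INTEGER, name TEXT);
    INSERT INTO porder VALUES (1, 1, 2);
    INSERT INTO dfp_user VALUES (1, 'trafficker'), (2, 'salesperson');
""")

join_clause = """
    FROM porder
    LEFT JOIN dfp_user AS trafficker ON porder.trafficker_id = trafficker.id
    LEFT JOIN dfp_user AS salesperson ON porder.salesperson_id = salesperson.id
"""

# Without aliases, both output columns carry the same label.
cur = conn.execute("SELECT trafficker.name, salesperson.name" + join_clause)
unaliased_columns = [d[0] for d in cur.description]

# With aliases, every output column has a distinct label.
cur = conn.execute(
    "SELECT trafficker.name AS trafficker_name, "
    "salesperson.name AS salesperson_name" + join_clause
)
aliased_columns = [d[0] for d in cur.description]
row = cur.fetchone()

print(unaliased_columns)
print(aliased_columns, row)
```

The same idea applies to any other column that appears in more than one of the joined tables.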

Related

BQ SQL join with a table with a name that is derived from a query

I have some fields that contain a date. That date is then used to look up a table whose name corresponds to the date in that field. I'm doing a join to get other fields, but the question is how to treat the date field as a variable that can then be used to perform the join.
Here is the example query:
with tab1 as (
select
product_id,
start_date,
from `project.user.table`
)
select * from tab1 inner join `project2.table2.{start_date}` as B on tab1.product_id = B.p_id
After suggestions I have tried the following query to tighten things up, but it is sadly not working.
with tab1 as (
select
cast(product_id as INT64) as product_id_64,
cast(FORMAT_DATE('%Y%m%d', CAST(start_date AS DATE)) as STRING) as start_date_string
from `project.user.table`
)
select * from `user2.dataset.*` b
inner join tab1
on b._TABLE_SUFFIX = tab1.start_date_string
It led to the following error:
Error running query. Cannot read field of type STRING as INT64 Field: GTIN
If your tables are native BigQuery tables, you can try using a wildcard table:
WITH tab1 as (
select
product_id,
start_date,
from `project.user.table`
)
SELECT *
FROM `<yourproject>.<yourdataset>.*` b
INNER JOIN tab1
ON b._TABLE_SUFFIX = tab1.start_date
AND b.p_id = tab1.product_id
Unfortunately, I can't test this.

Oracle SQL Developer - error when referencing fields within nested from statements

For the below query I am getting an error with line 4 when referencing variables within "y". The query runs successfully when I use just " y.* " (line 5), however it generates an error when I try to also pull from the specified fields in line 4 (y.field1 as PRODUCT, y.field2 as PRODUCT_TYPE, y.entity, y.TYPE1). For the output, I want these fields listed first for visual reference.
I have this approach working for other queries (I'm reusing this logic for multiple variations of queries and various tables). However, I think the issue with this one lies in my attempt to reference fields from tables that are in my join statements.
(
select
-- categorization fields:
-- table2.field1 as PRODUCT, table2.field2 as PRODUCT_TYPE, table3.entity, table3.TYPE1
y.field1 as PRODUCT,
y.field2 as PRODUCT_TYPE,
y.entity,
y.TYPE1
,y.*
from (
select *
from (
-- table references:
select table1.*,
row_number() over (
partition by
-- categorization fields:
table2.field1,
table2.field2,
table3.entity,
table3.TYPE1
order by table3.entity
) as rn
-- table references
from table1
-- joins, links, and filtering:
inner join table6 on table1.field_1 = table6.code1
inner join table5 on (table6.code = table5.code1)
AND (table6.code = table5.code)
left join table3 on table6.ent1 = table3.ent_code
left join table2 on table1.extid = table2.extID
where table1.tdate between '01-APR-19' and '01-APR-21'
AND table1.refe NOT IN ('OFF')
) x
-- sample rows:
where rn <= 2
) y
);
Let me know if anyone has a way that I can maybe better specify which tables those fields come from. I wish I could just do something like this:
y.table2.field1 as PRODUCT,
y.table2.field2 as PRODUCT_TYPE,
y.table3.entity,
y.table3.TYPE1
Sorry that I don't have a fiddle available!
Let me know if anyone has a way that I can maybe better specify which tables those fields come from.
Don't use select *. Instead, use the column names and give them appropriate aliases so you know where they came from:
As an example:
SELECT small_value,
medium_value,
big_value
FROM (
SELECT small.value AS small_value,
medium.value AS medium_value,
big.value AS big_value
FROM big
CROSS JOIN medium
CROSS JOIN small
)
WHERE 1 = 1
In your query, instead of using SELECT * in y or using SELECT table1.* in x you can name the columns and give them descriptive aliases.
I am getting an error with line 4 when referencing variables within "y".
(
select
-- categorization fields:
-- table2.field1 as PRODUCT, table2.field2 as PRODUCT_TYPE, table3.entity, table3.TYPE1
That is because you cannot see TABLE2 or TABLE3 because the only "view" you are looking at is of the sub-query with the alias y.
If you want to see those columns then you need to SELECT them inside the x subquery and pass them to each subsequent outer-query.
(
select *
from (
-- table references:
select table1.field1 AS t1_product,
table1.field2 AS t1_product_type,
table1.entity AS t1_entity,
table1.type1 AS t1_type1,
table2.field1 AS t2_product,
table2.field2 AS t2_product_type,
table2.entity AS t2_entity,
table2.type1 AS t2_type1,
table3.field1 AS t3_product,
table3.field2 AS t3_product_type,
table3.entity AS t3_entity,
table3.type1 AS t3_type1,
row_number() over (
partition by
-- categorization fields:
table2.field1,
table2.field2,
table3.entity,
table3.TYPE1
order by table3.entity
) as rn
-- table references
from table1
-- joins, links, and filtering:
inner join table6 on table1.field_1 = table6.code1
inner join table5 on (table6.code = table5.code1)
AND (table6.code = table5.code)
left join table3 on table6.ent1 = table3.ent_code
left join table2 on table1.extid = table2.extID
where table1.tdate between '01-APR-19' and '01-APR-21'
AND table1.refe NOT IN ('OFF')
) x
-- sample rows:
where rn <= 2
);
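The visibility rule described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, purely as an assumption so it runs anywhere; the tables and the amount column are hypothetical and much smaller than those in the question. An outer query sees only the columns that the inline view exposes, so a column from a joined table has to be selected (and ideally aliased) inside the subquery first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (extid INTEGER, amount INTEGER);
    CREATE TABLE table2 (extid INTEGER, field1 TEXT);
    INSERT INTO table1 VALUES (1, 100);
    INSERT INTO table2 VALUES (1, 'widget');
""")

# The subquery y exposes only table1's columns, so y.field1 does not exist.
try:
    conn.execute("""
        SELECT y.field1 FROM (
            SELECT table1.* FROM table1
            LEFT JOIN table2 ON table1.extid = table2.extid
        ) y
    """)
    hidden_column_error = None
except sqlite3.OperationalError as exc:
    hidden_column_error = str(exc)

# Selecting table2.field1 under an alias inside the subquery makes it visible.
visible_rows = conn.execute("""
    SELECT y.product, y.amount FROM (
        SELECT table1.*, table2.field1 AS product FROM table1
        LEFT JOIN table2 ON table1.extid = table2.extid
    ) y
""").fetchall()

print(hidden_column_error)
print(visible_rows)
```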

DB2 Alternate to EXISTS Function

I have the below query in my application which was running on DB2:
SELECT COD.POST_CD,CLS.CLASS,COD2.STATUS_CD
FROM DC01.POSTAL_CODES COD
INNER JOIN DC02.STATUS_CODES COD2
ON COD.ORDER=COD2.ORDER
INNER JOIN DC02.VALID_ORDERS ORD
ON ORD.ORDER=COD.ORDER
WHERE
(
( EXISTS (SELECT 1 FROM DC00.PROCESS_ORDER PRD
WHERE PRD.ORDER=COD.ORDER
AND PRD.IDNUM=COD.IDNUM
)
) OR
( EXISTS (SELECT 1 FROM DC00.PENDING_ORDER PND
WHERE PND.ORDER=COD.ORDER
AND PND.IDNUM=COD.IDNUM
)
)
)
AND EXISTS (SELECT 1 FROM DC00.CUSTOM_ORDER CRD
WHERE CRD.ORDER=COD.ORDER
)
;
When we changed to UDB (LUW v9.5) we are getting the below warning:
IWAQ0003W SQL warnings were found
SQLState=01602 Performance of this complex query might be sub-optimal.
Reason code: "3".. SQLCODE=437, SQLSTATE=01602, DRIVER=4.13.111
I know this warning is due to the EXISTS () OR EXISTS () conditions, but I am not sure how else I can write this query. If the condition were AND, I could have used an INNER JOIN, but I cannot change it because it has to be OR. Can anyone suggest a better way to replace these EXISTS conditions?
You can combine the two tables with a UNION and test them with a single EXISTS:
SELECT COD.POST_CD,CLS.CLASS,COD2.STATUS_CD
FROM DC01.POSTAL_CODES COD
INNER JOIN DC02.STATUS_CODES COD2
ON COD.ORDER=COD2.ORDER
INNER JOIN DC02.VALID_ORDERS ORD
ON ORD.ORDER=COD.ORDER
WHERE
(
EXISTS (SELECT 1 FROM
(SELECT ORDER,IDNUM FROM DC00.PROCESS_ORDER PRD UNION
SELECT ORDER,IDNUM FROM DC00.PENDING_ORDER PND) PD
WHERE PD.ORDER=COD.ORDER
AND PD.IDNUM=COD.IDNUM)
)
AND EXISTS (SELECT 1 FROM DC00.CUSTOM_ORDER CRD
WHERE CRD.ORDER=COD.ORDER
)
;
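A quick way to check that the UNION rewrite returns the same rows as the original EXISTS ... OR EXISTS form is the sketch below. It runs on Python's built-in sqlite3 module rather than DB2, which is an assumption for portability, so it says nothing about whether the DB2 optimizer warning goes away. The tables are hypothetical simplifications, and the column is called order_no because ORDER is a reserved word in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postal_codes  (order_no INTEGER, idnum INTEGER, post_cd TEXT);
    CREATE TABLE process_order (order_no INTEGER, idnum INTEGER);
    CREATE TABLE pending_order (order_no INTEGER, idnum INTEGER);
    INSERT INTO postal_codes VALUES (1, 10, 'A'), (2, 20, 'B'), (3, 30, 'C');
    INSERT INTO process_order VALUES (1, 10);
    INSERT INTO pending_order VALUES (2, 20);
""")

# Original shape: two correlated EXISTS predicates joined by OR.
or_rows = conn.execute("""
    SELECT cod.post_cd FROM postal_codes cod
    WHERE EXISTS (SELECT 1 FROM process_order prd
                  WHERE prd.order_no = cod.order_no AND prd.idnum = cod.idnum)
       OR EXISTS (SELECT 1 FROM pending_order pnd
                  WHERE pnd.order_no = cod.order_no AND pnd.idnum = cod.idnum)
    ORDER BY cod.post_cd
""").fetchall()

# Rewritten shape: one EXISTS over the UNION of both tables.
union_rows = conn.execute("""
    SELECT cod.post_cd FROM postal_codes cod
    WHERE EXISTS (
        SELECT 1 FROM (
            SELECT order_no, idnum FROM process_order
            UNION
            SELECT order_no, idnum FROM pending_order
        ) pd
        WHERE pd.order_no = cod.order_no AND pd.idnum = cod.idnum
    )
    ORDER BY cod.post_cd
""").fetchall()

print(or_rows, union_rows)
```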

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This will select the latest row per "EsnId" from "EsnsSalesOrderItems" based on the "createdAt" column. As you didn't post the structure of your tables, the column name is a guess; you can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an ORDER BY.
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL Server, you would use CROSS APPLY (or rather OUTER APPLY, since we need a left join), which evaluates the right-hand table expression for each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...
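The subquery-in-ON idea can be sketched against the "latest entry per EsnId" requirement from the question. The example below uses Python's built-in sqlite3 module instead of PostgreSQL, as an assumption so it runs without a server, and the snake_case tables are hypothetical simplifications. Note that it adds an ORDER BY inside the subquery, since LIMIT 1 on its own does not define which row is picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE esns (id INTEGER PRIMARY KEY);
    CREATE TABLE esns_sales_order_items (
        id INTEGER PRIMARY KEY,
        esn_id INTEGER,
        sales_order_item_id INTEGER,
        created_at TEXT
    );
    INSERT INTO esns VALUES (6), (7);
    INSERT INTO esns_sales_order_items VALUES
        (1, 6, 100, '2012-06-19'),
        (2, 6, 200, '2012-07-19'),
        (3, 7, 300, '2012-05-01');
""")

# Join each esn only to its most recent link row: the correlated subquery
# picks the id of the newest row for the current esn.
latest_rows = conn.execute("""
    SELECT e.id, es.sales_order_item_id, es.created_at
    FROM esns e
    JOIN esns_sales_order_items es ON es.id = (
        SELECT latest.id
        FROM esns_sales_order_items latest
        WHERE latest.esn_id = e.id
        ORDER BY latest.created_at DESC
        LIMIT 1
    )
    ORDER BY e.id
""").fetchall()

print(latest_rows)
```

With the sample data, the esn with two link rows is joined only to the one created on 2012-07-19, which matches the behavior asked for in the question.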

SQL show records that don't exist in my table variable

I have a table variable that holds orderID, UnitID and OrderServiceId (it is already populated via a query with insert statement).
I then have a query under this that returns 15 columns which also include the OrderId, UnitId, OrderServiceId
I need to only return the rows from this query where the same combination of OrderId, UnitId, and OrderServiceId are not in the table variable.
You can use NOT EXISTS, e.g.:
SELECT q.*
FROM YourQuery q
WHERE NOT EXISTS
(
SELECT * FROM @TableVar t
WHERE t.OrderId = q.OrderId
and t.UnitId = q.UnitId
and t.OrderServiceId=q.OrderServiceId
)
select q.*
from (
MyQuery
) q
left outer join MyTableVariable t on q.ORDERID = t.ORDERID
and q.UNITID= t.UNITID
and q.ORDERSERVICESID = t.ORDERSERVICESID
where t.ORDERID is null
You can use the EXCEPT or INTERSECT operators for this.
Example:
(select 3,4,1
union all
select 2,4,1)
intersect
(select 1,2,9
union all
select 3,4,1)
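The three approaches suggested above (NOT EXISTS, LEFT JOIN with an IS NULL filter, and EXCEPT) can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module, where an ordinary table named table_var stands in for the SQL Server table variable; both that name and the query_result table are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE query_result (order_id INTEGER, unit_id INTEGER,
                               order_service_id INTEGER, note TEXT);
    CREATE TABLE table_var (order_id INTEGER, unit_id INTEGER,
                            order_service_id INTEGER);
    INSERT INTO query_result VALUES (1, 1, 1, 'a'), (1, 2, 1, 'b'), (2, 1, 3, 'c');
    INSERT INTO table_var VALUES (1, 1, 1);
""")

# Anti-join with NOT EXISTS: keep rows whose key triple has no match.
not_exists_rows = conn.execute("""
    SELECT q.order_id, q.unit_id, q.order_service_id
    FROM query_result q
    WHERE NOT EXISTS (
        SELECT 1 FROM table_var t
        WHERE t.order_id = q.order_id
          AND t.unit_id = q.unit_id
          AND t.order_service_id = q.order_service_id
    )
    ORDER BY 1, 2, 3
""").fetchall()

# Anti-join with LEFT JOIN: unmatched rows have NULLs on the right side.
left_join_rows = conn.execute("""
    SELECT q.order_id, q.unit_id, q.order_service_id
    FROM query_result q
    LEFT JOIN table_var t
      ON t.order_id = q.order_id
     AND t.unit_id = q.unit_id
     AND t.order_service_id = q.order_service_id
    WHERE t.order_id IS NULL
    ORDER BY 1, 2, 3
""").fetchall()

# EXCEPT: set difference over the key columns only.
except_rows = conn.execute("""
    SELECT order_id, unit_id, order_service_id FROM query_result
    EXCEPT
    SELECT order_id, unit_id, order_service_id FROM table_var
    ORDER BY 1, 2, 3
""").fetchall()

print(not_exists_rows, left_join_rows, except_rows)
```

One design difference worth keeping in mind: EXCEPT compares only the columns listed in both SELECTs and removes duplicates, so returning all 15 columns of the original query is more direct with NOT EXISTS or the LEFT JOIN form.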