AWS Athena (Presto) '+' cannot be applied to varchar, varchar error

I'm having a bit of trouble with some Presto SQL that I have written in Athena. I get the following error which I'm a bit confused about:
SYNTAX_ERROR: line 46:39: '+' cannot be applied to varchar, varchar
Here is my script:
SELECT Duplicate.AircraftTypeCode,
Duplicate.LineNumber,
Serial.SerialNumber
FROM (SELECT *
FROM (SELECT DISTINCT TypeCode AS AircraftTypeCode,
LineNumber
FROM (SELECT acl.aircraft_type AS Type,
achl.aircraft_type_code_internal AS TypeCode,
acl.aircraft_line_number AS LineNumber,
row_number()
OVER (
partition BY aahl.aircraft_id
ORDER BY aahl.aircraft_id, aahl.start_event_date
DESC,
aahl.event_sequence_number DESC) AS Rown
FROM fleets.aircraft_all_history_latest aahl
LEFT OUTER JOIN fleets.aircraft_latest acl
ON aahl.aircraft_id = acl.aircraft_id
LEFT OUTER JOIN
fleets.aircraft_configuration_history_latest achl
ON acl.aircraft_id = achl.aircraft_id)
AH
WHERE linenumber IS NOT NULL
GROUP BY TypeCode,
LineNumber)LineNumber
GROUP BY AircraftTypeCode,
LineNumber
HAVING Count(LineNumber) > 1)Duplicate
LEFT OUTER JOIN (SELECT *
FROM (SELECT achl.aircraft_type_code_internal AS
TypeCode,
acl.aircraft_serial_number AS
SerialNumber,
acl.aircraft_line_number AS
LineNumber
FROM fleets.aircraft_all_history_latest aahl
LEFT OUTER JOIN fleets.aircraft_latest acl
ON aahl.aircraft_id =
acl.aircraft_id
LEFT OUTER JOIN fleets.aircraft_configuration_history_latest achl
ON acl.aircraft_id = achl.aircraft_id) SerialNumber
WHERE LineNumber IS NOT NULL
GROUP BY TypeCode,
SerialNumber,
LineNumber) Serial
ON Serial.TypeCode + Serial.LineNumber =
Duplicate.AircraftTypeCode
+ Duplicate.LineNumber
Everything I am using is of type string. Is there something in Presto that I should be doing differently? My thinking is more along the lines of MSSQL.

I assume you want to concatenate TypeCode and LineNumber in your JOIN condition. Unlike in SQL Server, + is not defined for varchar in Presto / Athena (that is what the error message says), so you need to use the CONCAT function or the || operator for that.
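As a rough illustration of the || approach, here is a small sketch that runs against SQLite through Python's sqlite3 module rather than Athena. The table names, column names and sample rows are made up for the demo. It also shows a side effect worth knowing about: joining on concatenated strings can match rows whose individual columns differ, which comparing the columns separately avoids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE duplicate (type_code TEXT, line_number TEXT);
    CREATE TABLE serial (type_code TEXT, line_number TEXT, serial_number TEXT);
    INSERT INTO duplicate VALUES ('A32', '0101');
    -- 'A32' || '0101' and 'A3' || '20101' both give 'A320101'
    INSERT INTO serial VALUES ('A32', '0101', 'SN-1'), ('A3', '20101', 'SN-2');
""")

# Join on the concatenated key, using || instead of +
concat_rows = conn.execute("""
    SELECT s.serial_number
    FROM duplicate d
    LEFT JOIN serial s
      ON s.type_code || s.line_number = d.type_code || d.line_number
    ORDER BY s.serial_number
""").fetchall()

# Join on the two columns separately
column_rows = conn.execute("""
    SELECT s.serial_number
    FROM duplicate d
    LEFT JOIN serial s
      ON s.type_code = d.type_code
     AND s.line_number = d.line_number
    ORDER BY s.serial_number
""").fetchall()

print(concat_rows)  # [('SN-1',), ('SN-2',)]
print(column_rows)  # [('SN-1',)]
```

If the goal is simply "same type code and same line number", the column-wise condition is probably the safer choice.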

Related

How to convert this query for Spark SQL

I'm trying to convert an SQL Server query to execute it into a Notebook, but I can't figure out how to convert a "CROSS APPLY" into something that Spark can understand.
Here is my SQL Server query :
WITH Benef as (
SELECT DISTINCT
IdBeneficiaireSource
,Adress
FROM
UPExpBeneficiaryStaging
)
-------- Split Adress --------
,AdresseBenefTemp1 as (
SELECT
IdBeneficiaireSource
,REPLACE(REPLACE(Adress, char(10), '|'), char(13), '|') as AdresseV2
FROM
Benef
)
,AdresseBenefTemp2 as (
SELECT
IdBeneficiaireSource
,value as Adresse
,ROW_NUMBER() OVER(PARTITION BY IdBeneficiaireSource ORDER BY (SELECT NULL)) as LigneAdresse
FROM
AdresseBenefTemp1
CROSS APPLY string_split(AdresseV2, '|')
)
,AdresseBenefFinal as (
SELECT DISTINCT
a.IdBeneficiaireSource
,b.Adresse as Adresse_1
,c.Adresse as Adresse_2
,d.Adresse as Adresse_3
FROM
AdresseBenefTemp2 as a
LEFT JOIN AdresseBenefTemp2 as b on b.IdBeneficiaireSource = a.IdBeneficiaireSource AND b.LigneAdresse = 1
LEFT JOIN AdresseBenefTemp2 as c on c.IdBeneficiaireSource = a.IdBeneficiaireSource AND c.LigneAdresse = 2
LEFT JOIN AdresseBenefTemp2 as d on d.IdBeneficiaireSource = a.IdBeneficiaireSource AND d.LigneAdresse = 3
)
-------------------------------
SELECT
a.IdBeneficiaireSource
,Adresse_1
,Adresse_2
,Adresse_3
FROM
AdresseBenefFinal
(This query splits an address field into three address fields.)
When I run it in a notebook, it says that "CROSS APPLY" is not correct.
Thanks.
Correct me if I'm wrong, but the CROSS APPLY string_split is basically a cross join of each row with the entries of its split result.
In Spark you can use explode for this (https://docs.databricks.com/sql/language-manual/functions/explode.html). So you should be able to add another CTE in between, where you explode the result of splitting (https://docs.databricks.com/sql/language-manual/functions/split.html) AdresseV2 by '|'.
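To make the split-then-explode idea concrete, here is a plain-Python sketch of the semantics only. It is not Spark code, and the function names and sample addresses are made up for illustration. One thing it mirrors on purpose: Spark's split takes a regular expression, so a literal | has to be escaped there, just as with re.split below.

```python
import re

def split_and_explode(rows, pattern=r"\|"):
    """Turn (id, text) rows into one (id, position, part) row per split part."""
    exploded = []
    for row_id, text in rows:
        for position, part in enumerate(re.split(pattern, text), start=1):
            exploded.append((row_id, position, part))
    return exploded

def first_three_lines(exploded):
    """Pivot the first three parts of each id back into three columns."""
    result = {}
    for row_id, position, part in exploded:
        columns = result.setdefault(row_id, [None, None, None])
        if position <= 3:
            columns[position - 1] = part
    return result

# Hypothetical sample data: (IdBeneficiaireSource, AdresseV2)
rows = [(1, "12 Main St|Apt 4|Paris"), (2, "PO Box 9")]

exploded = split_and_explode(rows)
pivoted = first_three_lines(exploded)
print(exploded)
print(pivoted)
```

The position returned next to each part plays the role of LigneAdresse in the original query, and the pivot step corresponds to the three LEFT JOINs on LigneAdresse = 1, 2 and 3.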

Joining Onto Partition Statement HANA SQL

I am trying to join two tables to the following query:
SELECT "NUMBER",
"U_ANALYZED_DATE",
"DV_SALES_ACCOUNT",
"U_USD_TOTAL_POTENTIAL_NNACV"
FROM
(select *, row_number() over ( partition by "DV_SALES_ACCOUNT" order by "U_ANALYZED_DATE" desc ) rownum
from "SURF_RT"."SALES_REQUEST")
WHERE rownum = 1
AND "DV_SALES_CATEGORY" = 'Compliance'
AND "DV_STATE" NOT IN ('Closed Canceled')
AND (YEAR("U_ANALYZED_DATE") = '2019' AND MONTH("U_ANALYZED_DATE") IN ('10','11','12')
OR YEAR("U_ANALYZED_DATE") = '2020' AND MONTH("U_ANALYZED_DATE") IN ('1','2','3'))
AND "U_USD_TOTAL_POTENTIAL_NNACV" > 0
ORDER BY "U_ANALYZED_DATE" desc
The tables should be joined as follows:
JOIN "SURF_RT"."SALES_ACCOUNT" on "SURF_RT"."SALES_ACCOUNT"."NAME" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
JOIN "SURF_RT"."SALES_CONTRACT" on "SURF_RT"."SALES_CONTRACT"."DV_ACCOUNT" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
I am getting an error no matter what I try and it has to be because of the partition. Does anyone know the solution here?
I suspect that you just need to alias the derived table so that you can refer to it in the joins. In many databases this is mandatory anyway, but apparently not in HANA (otherwise your original query would not run).
But to join on a derived table (the result set that is generated by the subquery), an alias does help:
SELECT
...
FROM
(
select
*,
row_number() over ( partition by "DV_SALES_ACCOUNT" order by "U_ANALYZED_DATE" desc ) rownum
from "SURF_RT"."SALES_REQUEST"
) sr -- table alias
JOIN "SURF_RT"."SALES_ACCOUNT"
ON "SURF_RT"."SALES_ACCOUNT"."NAME" = sr."DV_SALES_ACCOUNT" --reference to the derived table
JOIN "SURF_RT"."SALES_CONTRACT"
ON "SURF_RT"."SALES_CONTRACT"."DV_ACCOUNT" = sr."DV_SALES_ACCOUNT" --reference to the derived table
WHERE
...
Side notes:
You should use table aliases for the other tables involved in the query too, in order to make the code more readable.
You should also prefix each column in the query with the identifier of the table it belongs to, so that the query is unambiguous (and easier to maintain).
The problem with this query is not "that the where rownum = 1 needs to be at the end" but that the OP got confused by the bracketing of the SQL expression.
More specifically, the OP tried to reference the sub-query data by specifying join conditions against the base table that is used inside the sub-query:
JOIN "SURF_RT"."SALES_ACCOUNT"
on "SURF_RT"."SALES_ACCOUNT"."NAME" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
JOIN "SURF_RT"."SALES_CONTRACT"
on "SURF_RT"."SALES_CONTRACT"."DV_ACCOUNT" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
Since the sub-query (derived table) is used in the query and should be used for the join, it needs to be referred to in the join condition instead.
So, yes, it needs a table alias here and the join conditions need to refer to it.
SELECT
...
FROM
(select *
, row_number() over
(partition by "DV_SALES_ACCOUNT"
order by "U_ANALYZED_DATE" desc) rownum
from
"SURF_RT"."SALES_REQUEST") sr
INNER JOIN "SURF_RT"."SALES_ACCOUNT" sa
on sa."NAME" = sr."DV_SALES_ACCOUNT"
INNER JOIN "SURF_RT"."SALES_CONTRACT" sc
on sc."DV_ACCOUNT" = sr."DV_SALES_ACCOUNT"
WHERE
sr.rownum = 1
AND "DV_SALES_CATEGORY" = 'Compliance'
AND "DV_STATE" NOT IN ('Closed Canceled')
AND (YEAR("U_ANALYZED_DATE") = '2019'
AND MONTH("U_ANALYZED_DATE") IN ('10','11','12')
OR YEAR("U_ANALYZED_DATE") = '2020'
AND MONTH("U_ANALYZED_DATE") IN ('1','2','3'))
AND "U_USD_TOTAL_POTENTIAL_NNACV" > 0
ORDER BY
"U_ANALYZED_DATE" desc;
With just this little bit of standard SQL syntax and formatting of code the query got a lot easier to understand.
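The alias-based join can be tried out on a small scale. The following is only a sketch: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of HANA, and the table names, columns and rows are simplified stand-ins, not the real SURF_RT schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_request (number TEXT, account TEXT, analyzed_date TEXT);
    CREATE TABLE sales_account (name TEXT, region TEXT);
    INSERT INTO sales_request VALUES
        ('R1', 'Acme',   '2019-10-01'),
        ('R2', 'Acme',   '2020-02-15'),
        ('R3', 'Globex', '2019-12-24');
    INSERT INTO sales_account VALUES ('Acme', 'EMEA'), ('Globex', 'APAC');
""")

rows = conn.execute("""
    SELECT sr.number, sr.account, sa.region
    FROM (
        SELECT *,
               row_number() OVER (PARTITION BY account
                                  ORDER BY analyzed_date DESC) AS rn
        FROM sales_request
    ) sr                          -- alias for the derived table
    JOIN sales_account sa
      ON sa.name = sr.account     -- refers to the alias, not to sales_request
    WHERE sr.rn = 1               -- keep only the latest request per account
    ORDER BY sr.account
""").fetchall()

print(rows)  # [('R2', 'Acme', 'EMEA'), ('R3', 'Globex', 'APAC')]
```

Replacing sr.account in the ON clause with sales_request.account makes the statement fail, because the base table is not visible outside the sub-query.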
In the formatted query it also becomes obvious that the values compared with MONTH and YEAR should in fact be integers, not strings, as these functions return integers.
SELECT
...
FROM
(SELECT *
, row_number() over
(partition by "DV_SALES_ACCOUNT"
order by "U_ANALYZED_DATE" desc) rownum
FROM
"SURF_RT"."SALES_REQUEST") sr
INNER JOIN "SURF_RT"."SALES_ACCOUNT" sa
on sa."NAME" = sr."DV_SALES_ACCOUNT"
INNER JOIN "SURF_RT"."SALES_CONTRACT" sc
on sc."DV_ACCOUNT" = sr."DV_SALES_ACCOUNT"
WHERE
sr.rownum = 1
AND "DV_SALES_CATEGORY" = 'Compliance'
AND "DV_STATE" NOT IN ('Closed Canceled')
AND ( YEAR("U_ANALYZED_DATE") = 2019
AND MONTH("U_ANALYZED_DATE") IN (10, 11, 12)
OR YEAR("U_ANALYZED_DATE") = 2020
AND MONTH("U_ANALYZED_DATE") IN (1 , 2 , 3 )
)
AND "U_USD_TOTAL_POTENTIAL_NNACV" > 0
ORDER BY
"U_ANALYZED_DATE" DESC

Unable to convert this legacy SQL into Standard SQL in Google BigQuery

I am not able to convert this legacy SQL into standard BigQuery SQL, as I don't know what else needs to change here. (This query fails validation if I choose standard SQL as the BigQuery dialect.)
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name,
salesperson.name,
rate_card.*
FROM (
SELECT
*
FROM
dfp_data.dfp_order_lineitem
WHERE
DATE(end_datetime) >= DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'))
OR end_datetime IS NULL ) lineitem
JOIN (
SELECT
*
FROM
dfp_data.dfp_order) porder
ON
lineitem.order_id = porder.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal_lineitem) proposal_lineitem
ON
lineitem.id = proposal_lineitem.dfp_lineitem_id
JOIN (
SELECT
*
FROM
dfp_data.dfp_company) company
ON
porder.advertiser_id = company.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_product) product
ON
proposal_lineitem.product_id=product.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal) proposal
ON
proposal_lineitem.proposal_id=proposal.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_rate_card) rate_card
ON
proposal_lineitem.ratecard_id=rate_card.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) trafficker
ON
porder.trafficker_id =trafficker.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) salesperson
ON
porder. salesperson_id =salesperson.id
Most likely the error you are getting is something like the one below:
Duplicate column names in the result are not supported. Found duplicate(s): name
Legacy SQL renames trafficker.name and salesperson.name in your SELECT statement to trafficker_name and salesperson_name respectively, thus eliminating the column name duplication.
Standard SQL behaves differently and treats both columns as named name, thus producing a duplicate. To avoid it, you just need to provide aliases, as in the example below:
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name,
rate_card.*
FROM ( ...
You can easily check the above using the simplified dummy queries below:
#legacySQL
SELECT
porder.*,
trafficker.name,
salesperson.name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id =trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder. salesperson_id =salesperson.id
and
#standardSQL
SELECT
porder.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id =trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder. salesperson_id =salesperson.id
Note: if you have more duplicate names - you need to alias all of them too
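If you want to spot all colliding names before running the query in BigQuery, something along these lines may help. It is only a sketch using SQLite through Python's sqlite3 module, which, unlike BigQuery standard SQL, does not reject duplicate output names, so the collision can be inspected rather than hit as an error. The tables and the duplicate_names helper are made up for the demo.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE porder (order_id INTEGER, trafficker_id INTEGER, salesperson_id INTEGER);
    CREATE TABLE dfp_user (id INTEGER, name TEXT);
    INSERT INTO porder VALUES (1, 1, 2);
    INSERT INTO dfp_user VALUES (1, 'trafficker'), (2, 'salesperson');
""")

def duplicate_names(cursor):
    """Return the output column names that occur more than once."""
    counts = Counter(column[0] for column in cursor.description)
    return sorted(name for name, count in counts.items() if count > 1)

# Without aliases, both user columns are exposed under the same name.
unaliased = conn.execute("""
    SELECT porder.order_id, trafficker.name, salesperson.name
    FROM porder
    LEFT JOIN dfp_user trafficker ON porder.trafficker_id = trafficker.id
    LEFT JOIN dfp_user salesperson ON porder.salesperson_id = salesperson.id
""")
print(duplicate_names(unaliased))

# With explicit aliases, every output column has its own name.
aliased = conn.execute("""
    SELECT porder.order_id,
           trafficker.name AS trafficker_name,
           salesperson.name AS salesperson_name
    FROM porder
    LEFT JOIN dfp_user trafficker ON porder.trafficker_id = trafficker.id
    LEFT JOIN dfp_user salesperson ON porder.salesperson_id = salesperson.id
""")
aliased_names = [column[0] for column in aliased.description]
aliased_dupes = duplicate_names(aliased)
aliased_row = aliased.fetchone()
print(aliased_names, aliased_dupes, aliased_row)
```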

How to use outer field name or alias in a nested subquery

I would like to use the selected field Referencia in the subquery.
I have tried using the field name, the alias, and the table name, but none of them works.
How can I achieve this?
Thanks
SELECT * FROM
(
SELECT
articulos.Codigo AS Referencia,
articulos.Nombre AS Descripcion,
barras.Codigo AS [Codigo de Barras],
ROW_NUMBER() OVER (PARTITION BY articulos.Codigo ORDER BY
articulos.Codigo ASC) as cantidad,
articulos.Familia,
articulos.Marca,
categorias.Codigo as Categoria,
articulos.ImpuestoEspecial AS Ecotasa,
articulos.Fase,
articulos.Iva,
--
-- Tarifa1
( SELECT [Codigo],[EuroPrecio]
FROM [GES16100].[dbo].[Tarifas]
WHERE [Codigo] = 1 AND [Articulo] = <------- Here, Referencia
)AS T1,
articulos.Proveedor,
articulos.GUID_Registro
FROM [GES16100].[dbo].[Articulos] as articulos
FULL JOIN [GES16100].[dbo].[Barras] as barras
ON articulos.Codigo = barras.Articulo
FULL JOIN [GES16100].[dbo].[Categorias_Asignaciones] catasignaciones
ON catasignaciones.GUID_RegistroFichero =articulos.GUID_Registro
FULL JOIN [GES16100].[dbo].[CategoriasFicheros] categorias
ON categorias.GUID_Registro = catasignaciones.GUID_Categoria
)AS supersub
WHERE supersub.cantidad = 1
First, use table aliases and qualified column names whenever you have more than one table in a query.
Second, your subquery will not work because it returns two columns where only one is expected.
For your example, I am guessing:
( SELECT t.EuroPrecio
FROM [GES16100].[dbo].[Tarifas] t
WHERE t.Codigo = 1 AND t.Articulo = articulos.Codigo
) AS T1,
You cannot use the column alias Referencia because it is defined in the same SELECT. Just use the column it refers to.
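Here is a small runnable sketch of such a correlated scalar subquery. It uses SQLite through Python's sqlite3 module instead of SQL Server, with simplified, made-up tables and prices. The point is only that the subquery returns a single column and refers to the outer table's column through its alias, not through the SELECT-list alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articulos (codigo TEXT, nombre TEXT);
    CREATE TABLE tarifas (codigo INTEGER, articulo TEXT, euro_precio REAL);
    INSERT INTO articulos VALUES ('A1', 'Widget'), ('A2', 'Gadget');
    INSERT INTO tarifas VALUES (1, 'A1', 9.5), (2, 'A1', 8.0), (1, 'A2', 20.0);
""")

rows = conn.execute("""
    SELECT a.codigo AS referencia,
           (SELECT t.euro_precio            -- exactly one column
            FROM tarifas t
            WHERE t.codigo = 1
              AND t.articulo = a.codigo     -- outer column, not the alias referencia
           ) AS t1
    FROM articulos a
    ORDER BY a.codigo
""").fetchall()

print(rows)  # [('A1', 9.5), ('A2', 20.0)]
```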

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
"SalesOrderItemId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and t.rn = 1
This will select the latest row per "EsnId" from "EsnsSalesOrderItems" based on the column "createdAt"; "SalesOrderItemId" is included so that the following join can still use it. You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an ORDER BY.
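To see why restricting the join to rn = 1 matters for the sum, here is a sketch against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The snake_case tables and the prices are invented for the demo; only the two "createdAt" dates come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE esns (id INTEGER);
    CREATE TABLE esns_sales_order_items (
        esn_id INTEGER, sales_order_item_id INTEGER, created_at TEXT);
    CREATE TABLE sales_order_items (id INTEGER, price REAL);
    INSERT INTO esns VALUES (6);
    INSERT INTO esns_sales_order_items VALUES
        (6, 100, '2012-06-19'),
        (6, 101, '2012-07-19');
    INSERT INTO sales_order_items VALUES (100, 10.0), (101, 25.0);
""")

# Plain join: both link rows for ESN 6 contribute to the sum.
naive_total = conn.execute("""
    SELECT sum(s.price)
    FROM esns e
    JOIN esns_sales_order_items es ON es.esn_id = e.id
    JOIN sales_order_items s ON s.id = es.sales_order_item_id
""").fetchone()[0]

# Join restricted to the latest link row per ESN.
latest_total = conn.execute("""
    SELECT sum(s.price)
    FROM esns e
    JOIN (
        SELECT esn_id,
               sales_order_item_id,
               row_number() OVER (PARTITION BY esn_id
                                  ORDER BY created_at DESC) AS rn
        FROM esns_sales_order_items
    ) t ON t.esn_id = e.id AND t.rn = 1
    JOIN sales_order_items s ON s.id = t.sales_order_item_id
""").fetchone()[0]

print(naive_total, latest_total)  # 35.0 25.0
```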
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL Server, you would use CROSS APPLY (or rather OUTER APPLY, since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...
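A concrete version of this pattern might look as follows. This is a sketch on SQLite through Python's sqlite3 module with made-up tables. Note the ORDER BY inside the subquery: without it, LIMIT 1 picks an arbitrary matching row rather than the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE esns (id INTEGER PRIMARY KEY);
    CREATE TABLE esn_items (id INTEGER PRIMARY KEY, esn_id INTEGER, created_at TEXT);
    INSERT INTO esns VALUES (6), (7);
    INSERT INTO esn_items VALUES
        (1, 6, '2012-06-19'),
        (2, 6, '2012-07-19'),
        (3, 7, '2012-01-01');
""")

rows = conn.execute("""
    SELECT e.id, i.id, i.created_at
    FROM esns e
    JOIN esn_items i ON i.id = (
        SELECT i2.id
        FROM esn_items i2
        WHERE i2.esn_id = e.id          -- correlated with the outer row
        ORDER BY i2.created_at DESC     -- defines which row counts as "last"
        LIMIT 1
    )
    ORDER BY e.id
""").fetchall()

print(rows)  # [(6, 2, '2012-07-19'), (7, 3, '2012-01-01')]
```

This needs a unique key on the joined table (id here), because the subquery hands back a single identifying value per outer row.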