pyspark.sql.utils.ParseException: u"\nextraneous input 'xxx' expecting {')', ','} - sql

I have two main tables: flights and holidays.
Flights is identified by outboundlegid, inboundlegid, agent, and querydatetime. The additional columns relevant to this question are out_date and in_date, which hold the departure date and the return date.
Holidays has the columns start, end, and type.
I want to determine whether a flight's out/in date falls within any period in the holidays table.
I followed a suggestion from "PySpark: How to add columns whose data come from a query (similar to subquery for each row)" to determine whether the out/in dates intersect any holiday.
However, I am getting: "pyspark.sql.utils.ParseException: u"\nextraneous input 'outboundlegid' expecting {')', ','}(line 35, pos 12)". What's wrong here?
File "script_2019-02-08-10-46-14.py", line 182, in """) File
"/mnt/yarn/usercache/root/appcache/application_1549622095592_0002/container_1549622095592_0002_01_000001/pyspark.zip/pyspark/sql/session.py",
line 603, in sql File
"/mnt/yarn/usercache/root/appcache/application_1549622095592_0002/container_1549622095592_0002_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py",
line 1133, in call File
"/mnt/yarn/usercache/root/appcache/application_1549622095592_0002/container_1549622095592_0002_01_000001/pyspark.zip/pyspark/sql/utils.py",
line 73, in deco pyspark.sql.utils.ParseException: u"\nextraneous
input 'outboundlegid' expecting {')', ','}(line 35, pos 12)\n\n== SQL
==\n\n WITH t (\n SELECT \n f.outboundlegid,\n f.inboundlegid,\n f.agent,\n f.querydatetime,\n CASE WHEN type = 'HOLIDAY' AND (out_date
BETWEEN start AND end)\n THEN true\n ELSE false\n END
out_is_holiday,\n CASE WHEN type = 'LONG_WEEKENDS' AND (out_date
BETWEEN start AND end)\n THEN true\n ELSE false\n END
out_is_longweekends,\n CASE WHEN type = 'HOLIDAY' AND (in_date BETWEEN
start AND end)\n THEN true\n ELSE false\n END in_is_holiday,\n CASE
WHEN type = 'LONG_WEEKENDS' AND (in_date BETWEEN start AND end)\n THEN
true\n ELSE false\n END in_is_longweekends\n FROM flights f\n CROSS
JOIN holidays h\n )\n SELECT \n f.*,\n t1.out_is_holiday,\n
t1.out_is_longweekends,\n t1.in_is_holiday,\n t1.in_is_longweekends,\n
FROM (\n SELECT \n outboundlegid,\n------------^^^\n inboundlegid,\n
agent,\n querydatetime,\n CASE WHEN
array_contains(collect_set(out_is_holiday), true)\n THEN true\n ELSE
false\n END out_is_holiday,\n CASE WHEN
array_contains(collect_set(out_is_longweekends), true)\n THEN true\n
ELSE false\n END out_is_longweekends,\n CASE WHEN
array_contains(collect_set(in_is_holiday), true)\n THEN true\n ELSE
false\n END in_is_holiday,\n CASE WHEN
array_contains(collect_set(in_is_longweekends), true)\n THEN true\n
ELSE false\n
What's the problem here?
resultDf = spark.sql("""
WITH t (
SELECT
f.outboundlegid,
f.inboundlegid,
f.agent,
f.querydatetime,
CASE WHEN type = 'HOLIDAY' AND (out_date BETWEEN start AND end)
THEN true
ELSE false
END out_is_holiday,
CASE WHEN type = 'LONG_WEEKENDS' AND (out_date BETWEEN start AND end)
THEN true
ELSE false
END out_is_longweekends,
CASE WHEN type = 'HOLIDAY' AND (in_date BETWEEN start AND end)
THEN true
ELSE false
END in_is_holiday,
CASE WHEN type = 'LONG_WEEKENDS' AND (in_date BETWEEN start AND end)
THEN true
ELSE false
END in_is_longweekends
FROM flights f
CROSS JOIN holidays h
)
SELECT
f.*,
t1.out_is_holiday,
t1.out_is_longweekends,
t1.in_is_holiday,
t1.in_is_longweekends,
FROM (
SELECT
outboundlegid, # <<< I am guessing something wrong with this? But Why?
inboundlegid,
agent,
querydatetime,
CASE WHEN array_contains(collect_set(out_is_holiday), true)
THEN true
ELSE false
END out_is_holiday,
CASE WHEN array_contains(collect_set(out_is_longweekends), true)
THEN true
ELSE false
END out_is_longweekends,
CASE WHEN array_contains(collect_set(in_is_holiday), true)
THEN true
ELSE false
END in_is_holiday,
CASE WHEN array_contains(collect_set(in_is_longweekends), true)
THEN true
ELSE false
END in_is_longweekends
FROM t
GROUP BY
querydatetime,
outboundlegid,
inboundlegid,
agent
LIMIT 100000
) t1
INNER JOIN flights f
ON t1.querydatetime = f.querydatetime
AND t1.outboundlegid = f.outboundlegid
AND t1.inboundlegid = f.inboundlegid
AND t1.agent = f.agent
INNER JOIN agents a
ON f.agent = a.id
INNER JOIN airports p
ON f.querydestinationplace = p.airportId
""")
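One likely culprit is the trailing comma after t1.in_is_longweekends, directly before FROM. With that comma in place the parser is still expecting another select-list item, which is probably why it complains a few tokens later, at outboundlegid, rather than at the comma itself. The sketch below reproduces only the trailing-comma failure. It uses Python's built-in sqlite3 as a stand-in for Spark SQL (an assumption, so the example runs without a cluster); the two-column flights table and its row are made up, and SQLite's error text differs from Spark's.

```python
import sqlite3

# In-memory stand-in for the real flights table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (outboundlegid TEXT, agent TEXT)")
conn.execute("INSERT INTO flights VALUES ('leg-1', 'agent-a')")

# Same shape as the failing query: a comma after the last select-list item.
broken_sql = """
SELECT
    f.outboundlegid,
    f.agent,
FROM flights f
"""

# Identical query with the trailing comma removed.
fixed_sql = broken_sql.replace("f.agent,", "f.agent")


def try_query(sql):
    """Return (rows, None) on success or (None, message) on a syntax error."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


broken_rows, broken_error = try_query(broken_sql)
fixed_rows, fixed_error = try_query(fixed_sql)

print("with trailing comma:   ", broken_error)
print("without trailing comma:", fixed_rows)
```

If removing the comma in the Spark query does not clear the error, the next thing to check would be the inline # comment, which is not part of the SQL text shown in the error message.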

Related

Using if statement in string_agg function - PostgreSQL

The query is as follows
WITH notes AS (
SELECT 891090 Order_ID, False customer_billing, false commander, true agent
UNION ALL
SELECT 891091, false, true, true
UNION ALL
SELECT 891091, true, false, false)
SELECT
n.order_id,
string_Agg(distinct CASE
WHEN n.customer_billing = TRUE THEN 'AR (Customer Billing)'
WHEN n.commander = TRUE THEN 'AP (Commander)'
WHEN n.agent = TRUE THEN 'AP (Agent)'
ELSE NULL
END,', ') AS finance
FROM notes n
WHERE
n.order_id = 891091 AND (n.customer_billing = TRUE or n.commander = TRUE or n.agent = TRUE)
GROUP BY ORDER_ID
As you can see, there are two records with order_id 891091.
The first 891091 record has commander and agent set to true.
The second 891091 record has customer_billing set to true.
Since a CASE expression is used, it returns only the first branch that is true for each row, so the first record yields commander and agent is never considered.
So the output becomes
order_id finance
891091 AP (Commander), AR (Customer Billing)
dbfiddle.uk Example
I need all the true values in the record to be considered so that the output becomes
order_id finance
891091 AP (Commander), AP (Agent), AR (Customer Billing)
My initial thought is that using an IF statement instead of the CASE expression may fix this, but I am not sure how to do that inside the string_agg function.
How can I achieve this?
EDIT 1:
The answer below almost works, but the comma-separated values are not distinct.
Here is the updated fiddle
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=9647d92870e3944516172eda83a8ac6e
You can split your CASE into separate expressions, one per flag, and collect them in an array. Then you can use array_to_string to format the result:
WITH notes AS (
SELECT 891090 Order_ID, False customer_billing, false commander, true agent UNION ALL
SELECT 891091, false, true, true UNION ALL
SELECT 891091, true, true, false),
tmp as (
SELECT
n.order_id id,
array_agg(
ARRAY[
CASE WHEN n.customer_billing = TRUE THEN 'AR (Customer Billing)' END,
CASE WHEN n.commander = TRUE THEN 'AP (Commander)' END,
CASE WHEN n.agent = TRUE THEN 'AP (Agent)' END
]) AS finance_array
FROM notes n
WHERE
n.order_id = 891091 AND (n.customer_billing = TRUE or n.commander = TRUE or n.agent = TRUE)
GROUP BY ORDER_ID )
select id, array_to_string(array(select distinct e from unnest(finance_array) as a(e)), ', ')
from tmp;
Here is db_fiddle.
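To make the intent of the query above easier to follow, here is a plain-Python model of the same idea: test each flag independently instead of stopping at the first true one, collect every matching label, drop duplicates, and join. This is only a sketch of the logic, not PostgreSQL code; the names LABELS and finance_labels are made up for illustration, and the labels are sorted so the output order is deterministic.

```python
# One (column, label) pair per independent CASE expression in the query.
LABELS = [
    ("customer_billing", "AR (Customer Billing)"),
    ("commander", "AP (Commander)"),
    ("agent", "AP (Agent)"),
]

# Same sample rows as the notes CTE in the answer.
notes = [
    {"order_id": 891090, "customer_billing": False, "commander": False, "agent": True},
    {"order_id": 891091, "customer_billing": False, "commander": True, "agent": True},
    {"order_id": 891091, "customer_billing": True, "commander": True, "agent": False},
]


def finance_labels(rows, order_id):
    """Collect the distinct labels of every true flag across all rows of one order."""
    found = set()
    for row in rows:
        if row["order_id"] != order_id:
            continue
        # Every flag is checked, so a row can contribute several labels.
        for column, label in LABELS:
            if row[column]:
                found.add(label)
    return ", ".join(sorted(found))


result = finance_labels(notes, 891091)
print(result)  # AP (Agent), AP (Commander), AR (Customer Billing)
```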

ORA-01476: Divisor is equal to zero

I have this query that I've been given to fix, and I cannot understand why it's causing an error. I reviewed similar topics involving the max function and the null function, but I can't seem to make them work without throwing another error.
select case when count(trlr_num) > 0 then 'Inbounds Need Closed'
else 'All Good'
end as "Inbounds To Be Closed"
from (select t.trlr_num,
r.invnum "Invoice",
to_char(nvl(max(rl.init_rcv_dte), sysdate), 'MM/DD/YY HH:MI AM') as "Begin receive date",
round(sum(rl.idnqty) / sum(rl.expqty) *100, 1) as "% Received%"
from trlr t
join rcvinv r
on t.trlr_num = r.trknum
join rcvlin rl
on r.trknum = rl.trknum
and r.supnum = rl.supnum
and r.invnum = rl.invnum
and r.wh_id = rl.wh_id
join rcvtrk rt
on r.trknum = rt.trknum
and t.trlr_id = rt.trlr_id
and r.wh_id = rt.wh_id
where t.yard_loc_wh_id = 'US_3218'
and rt.rcvtrk_stat <> 'C'
and r.invtyp not in ('PKG', 'XFR')
and (rl.rcvsts <> 'QI' or (rl.rcvsts = 'QI' and t.trlr_seal1 is not null and t.trlr_seal2 is not null))
group by t.trlr_num,
r.invnum
having sum(rl.idnqty) / sum(rl.expqty) = 1
and nvl(max(rl.init_rcv_dte), sysdate) < sysdate -2)
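A likely cause is that sum(rl.expqty) is 0 for at least one trlr_num/invnum group, so both the "% Received%" expression and the HAVING clause divide by zero. A common guard in Oracle is to wrap the divisor in NULLIF(sum(rl.expqty), 0), which makes the ratio NULL for such a group instead of raising ORA-01476. The Python sketch below only models that guard; the function name and the sample quantities are hypothetical.

```python
def percent_received(identified_qty, expected_qty):
    """Model of round(sum(idnqty) / NULLIF(sum(expqty), 0) * 100, 1).

    Returns None (the stand-in for SQL NULL) when the divisor is zero.
    """
    if expected_qty == 0:
        return None
    return round(identified_qty / expected_qty * 100, 1)


# Hypothetical per-group sums: (sum of idnqty, sum of expqty).
groups = {
    ("TRL1", "INV1"): (50, 50),
    ("TRL2", "INV2"): (0, 0),    # this group would trigger the error unguarded
    ("TRL3", "INV3"): (20, 40),
}

# Equivalent of the HAVING ratio = 1 filter: a None ratio never matches,
# just as a NULL comparison is not true in SQL.
fully_received = [
    key for key, (identified, expected) in groups.items()
    if percent_received(identified, expected) == 100.0
]

print(fully_received)
```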

In an array constant with psql

Here is my psql query:
WITH myconstants (nb_pieces,nb_room,is_fiber,codes_insee) as (
values (0,0,false,('95018','75018'))
)
SELECT
*
FROM
on_plan_buy pbuy
INNER JOIN
card_fiche fiche
ON pbuy.uuid = fiche.ad_uuid
INNER JOIN
myconstants const
ON true
WHERE pbuy.code_insee IN ('95018','75018')
AND pbuy.price <= 99999999 AND pbuy.price >= 0
AND CASE WHEN const.nb_pieces = 0 THEN pbuy.piece > 0 ELSE pbuy.piece = const.nb_pieces END
AND CASE WHEN const.nb_room = 0 THEN pbuy.chambre > 0 ELSE pbuy.chambre = const.nb_room END
AND CASE WHEN const.is_fiber = false THEN true ELSE fiche.fiber = true END
LIMIT 100;
It works fine.
But I would like to use my constant:
WITH myconstants (nb_pieces,nb_room,is_fiber,codes_insee) as (
values (0,0,false,('95018','75018'))
)
SELECT
*
FROM
on_plan_buy pbuy
INNER JOIN
card_fiche fiche
ON pbuy.uuid = fiche.ad_uuid
INNER JOIN
myconstants const
ON true
WHERE pbuy.code_insee IN const.codes_insee
AND pbuy.price <= 99999999 AND pbuy.price >= 0
AND CASE WHEN const.nb_pieces = 0 THEN pbuy.piece > 0 ELSE pbuy.piece = const.nb_pieces END
AND CASE WHEN const.nb_room = 0 THEN pbuy.chambre > 0 ELSE pbuy.chambre = const.nb_room END
AND CASE WHEN const.is_fiber = false THEN true ELSE fiche.fiber = true END
LIMIT 100;
And now it doesn't work.
Any idea how to use const.codes_insee correctly?
Regards
You have created a record with ('95018','75018') for codes_insee, where you would like to specify a set of values to search using the IN operator. However, this operator does not work with this type. Instead, you can change the data type to an array, which lets you add more filters in the future if you wish, and use the array functions. I've used array_position to determine whether the value is in the array; it returns NULL when the value is absent, so the > -1 comparison is true only for matches. See below:
WITH myconstants (nb_pieces,nb_room,is_fiber,codes_insee) as (
values (0,0,false,string_to_array('95018,75018',','))
)
SELECT
*
FROM
on_plan_buy pbuy
INNER JOIN
card_fiche fiche
ON pbuy.uuid = fiche.ad_uuid
INNER JOIN
myconstants const
ON true
WHERE
array_position(const.codes_insee,cast(pbuy.code_insee as text)) > -1
AND pbuy.price <= 99999999 AND pbuy.price >= 0
AND CASE WHEN const.nb_pieces = 0 THEN pbuy.piece > 0 ELSE pbuy.piece = const.nb_pieces END
AND CASE WHEN const.nb_room = 0 THEN pbuy.chambre > 0 ELSE pbuy.chambre = const.nb_room END
AND CASE WHEN const.is_fiber = false THEN true ELSE fiche.fiber = true END
LIMIT 100;
Reference
Postgresql Array Functions
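As a rough mental model of what the rewritten WHERE clause does, the filter can be written out in plain Python: the constant holds a list of codes and the first condition is a membership test (PostgreSQL can also express this as pbuy.code_insee = ANY(const.codes_insee)). The sample listings and the function name matches are made up for illustration; this is a sketch of the logic, not something to run against the database.

```python
# Counterpart of the myconstants CTE, with codes_insee as a list (array).
constants = {
    "nb_pieces": 0,
    "nb_room": 0,
    "is_fiber": False,
    "codes_insee": ["95018", "75018"],
}

# Hypothetical rows standing in for on_plan_buy joined with card_fiche.
listings = [
    {"code_insee": "95018", "price": 250000, "piece": 3, "chambre": 2, "fiber": False},
    {"code_insee": "69001", "price": 180000, "piece": 2, "chambre": 1, "fiber": True},
    {"code_insee": "75018", "price": 400000, "piece": 0, "chambre": 1, "fiber": True},
]


def matches(listing, const):
    """Apply the same conditions as the WHERE clause to one row."""
    # Membership in the array replaces IN ('95018', '75018').
    if listing["code_insee"] not in const["codes_insee"]:
        return False
    if not 0 <= listing["price"] <= 99999999:
        return False
    # A constant of 0 means "any positive value", otherwise an exact match.
    if const["nb_pieces"] == 0:
        pieces_ok = listing["piece"] > 0
    else:
        pieces_ok = listing["piece"] == const["nb_pieces"]
    if const["nb_room"] == 0:
        rooms_ok = listing["chambre"] > 0
    else:
        rooms_ok = listing["chambre"] == const["nb_room"]
    # Fiber is only required when the constant asks for it.
    fiber_ok = listing["fiber"] if const["is_fiber"] else True
    return pieces_ok and rooms_ok and fiber_ok


selected = [row for row in listings if matches(row, constants)]
print([row["code_insee"] for row in selected])
```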
Instead of IN please try exists:
WITH myconstants (nb_pieces,nb_room,is_fiber,codes_insee) as (
values (0,0,false,('95018','75018'))
)
SELECT
*
FROM
on_plan_buy pbuy
INNER JOIN
card_fiche fiche
ON pbuy.uuid = fiche.ad_uuid
INNER JOIN
myconstants const
ON true
WHERE
pbuy.price <= 99999999 AND pbuy.price >= 0
AND CASE WHEN const.nb_pieces = 0 THEN pbuy.piece > 0 ELSE pbuy.piece = const.nb_pieces END
AND CASE WHEN const.nb_room = 0 THEN pbuy.chambre > 0 ELSE pbuy.chambre = const.nb_room END
AND CASE WHEN const.is_fiber = false THEN true ELSE fiche.fiber = true END
and exists
(
select 1 from myconstants where pbuy.code_insee = myconstants.codes_insee
)
LIMIT 100;

Query Timeout Expired in SSAS Cube Processing

While migrating a cube from 2008 to 2014, we had a cube processing failure with the message "Query Timeout Expired : HYT00". I looked into the error information and found one query that had been executing for more than an hour, which is causing the issue. The query is:
SELECT [dbo_IndicatorFact].[PY] AS [dbo_IndicatorFactPY0_0],[dbo_IndicatorFact].[BP] AS [dbo_IndicatorFactBP0_1],[dbo_IndicatorFact].[RE] AS [dbo_IndicatorFactRE0_2],[dbo_IndicatorFact].[UCPY] AS [dbo_IndicatorFactUCPY0_3],[dbo_IndicatorFact].[UCBP] AS [dbo_IndicatorFactUCBP0_4],[dbo_IndicatorFact].[UCRE] AS [dbo_IndicatorFactUCRE0_5],[dbo_IndicatorFact].[GRPY] AS [dbo_IndicatorFactGRPY0_6],[dbo_IndicatorFact].[GRBP] AS [dbo_IndicatorFactGRBP0_7],[dbo_IndicatorFact].[GRRE] AS [dbo_IndicatorFactGRRE0_8],[dbo_IndicatorFact].[NRPY] AS [dbo_IndicatorFactNRPY0_9],[dbo_IndicatorFact].[NRBP] AS [dbo_IndicatorFactNRBP0_10],[dbo_IndicatorFact].[NRRE] AS [dbo_IndicatorFactNRRE0_11],[dbo_IndicatorFact].[GRC2] AS [dbo_IndicatorFactGRC20_12],[dbo_IndicatorFact].[AnalysisCategoryId] AS [dbo_IndicatorFactAnalysisCategoryId0_13],[dbo_IndicatorFact].[IndicatorTypeId] AS [dbo_IndicatorFactIndicatorTypeId0_14],[dbo_IndicatorFact].[IndicatorNameId] AS [dbo_IndicatorFactIndicatorNameId0_15],[dbo_IndicatorFact].[CategoryId] AS [dbo_IndicatorFactCategoryId0_16],[dbo_IndicatorFact].[CountryId] AS [dbo_IndicatorFactCountryId0_17],[dbo_IndicatorFact].[FiscalQuarterId] AS [dbo_IndicatorFactFiscalQuarterId0_18]
FROM
(
SELECT vwIndicatorFact.IndicatorId AS Id,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN vwIndicatorFact.PY ELSE CASE IndicatorFact_GRC2.PY WHEN 0 THEN 0 ELSE vwIndicatorFact.PY END END AS PY,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN vwIndicatorFact.BP ELSE CASE IndicatorFact_GRC2.BP WHEN 0 THEN 0 ELSE vwIndicatorFact.BP END END AS BP,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN vwIndicatorFact.RE ELSE CASE IndicatorFact_GRC2.RE WHEN 0 THEN 0 ELSE vwIndicatorFact.RE END END AS RE,
vwIndicatorFact.BPvsPY, vwIndicatorFact.BPvsPYpercent, vwIndicatorFact.REvsBP, vwIndicatorFact.REvsBPpercent, vwIndicatorFact.REvsPY,
vwIndicatorFact.REvsPYpercent,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN CASE vwIndicatorFact.IndicatorNameId WHEN 1 THEN 1000000 ELSE IndicatorFact_UC.PY END ELSE CASE IndicatorFact_GRC2.PY
WHEN 0 THEN 0 ELSE CASE vwIndicatorFact.IndicatorNameId WHEN 1 THEN 1000000 ELSE IndicatorFact_UC.PY END END END AS UCPY,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN CASE vwIndicatorFact.IndicatorNameId WHEN 1 THEN 1000000 ELSE IndicatorFact_UC.BP END ELSE CASE IndicatorFact_GRC2.BP
WHEN 0 THEN 0 ELSE CASE vwIndicatorFact.IndicatorNameId WHEN 1 THEN 1000000 ELSE IndicatorFact_UC.BP END END END AS UCBP,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN CASE vwIndicatorFact.IndicatorNameId WHEN 1 THEN 1000000 ELSE IndicatorFact_UC.RE END ELSE CASE IndicatorFact_GRC2.RE
WHEN 0 THEN 0 ELSE CASE vwIndicatorFact.IndicatorNameId WHEN 1 THEN 1000000 ELSE IndicatorFact_UC.RE END END END AS UCRE, vwIndicatorFact.IndicatorNameId,
vwIndicatorFact.CategoryId, vwIndicatorFact.AnalysisCategoryId, vwIndicatorFact.CountryId, vwIndicatorFact.FiscalQuarterId, vwIndicatorFact.IndicatorTypeId,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN IndicatorFact_GR.PY ELSE CASE IndicatorFact_GRC2.PY WHEN 0 THEN 0 ELSE IndicatorFact_GR.PY END END AS GRPY,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN IndicatorFact_GR.BP ELSE CASE IndicatorFact_GRC2.BP WHEN 0 THEN 0 ELSE IndicatorFact_GR.BP END END AS GRBP,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN IndicatorFact_GR.RE ELSE CASE IndicatorFact_GRC2.RE WHEN 0 THEN 0 ELSE IndicatorFact_GR.RE END END AS GRRE,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN IndicatorFact_NR.PY ELSE CASE IndicatorFact_GRC2.PY WHEN 0 THEN 0 ELSE IndicatorFact_NR.PY END END AS NRPY,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN IndicatorFact_NR.BP ELSE CASE IndicatorFact_GRC2.BP WHEN 0 THEN 0 ELSE IndicatorFact_NR.BP END END AS NRBP,
CASE vwIndicatorFact.IndicatorTypeId WHEN 1 THEN IndicatorFact_NR.RE ELSE CASE IndicatorFact_GRC2.RE WHEN 0 THEN 0 ELSE IndicatorFact_NR.RE END END AS NRRE,
IndicatorFact_GRC2.BP AS GRC2
FROM
dbo.vwIndicatorFact INNER JOIN
dbo.vwIndicatorFact AS IndicatorFact_UC ON vwIndicatorFact.IndicatorTypeId = IndicatorFact_UC.IndicatorTypeId AND
vwIndicatorFact.FiscalQuarterId = IndicatorFact_UC.FiscalQuarterId AND vwIndicatorFact.CountryId = IndicatorFact_UC.CountryId AND
vwIndicatorFact.CategoryId = IndicatorFact_UC.CategoryId AND IndicatorFact_UC.IndicatorNameId = 1 LEFT OUTER JOIN
dbo.vwIndicatorFact AS IndicatorFact_GR ON vwIndicatorFact.IndicatorTypeId = IndicatorFact_GR.IndicatorTypeId AND
vwIndicatorFact.FiscalQuarterId = IndicatorFact_GR.FiscalQuarterId AND vwIndicatorFact.CountryId = IndicatorFact_GR.CountryId AND
vwIndicatorFact.CategoryId = IndicatorFact_GR.CategoryId AND IndicatorFact_GR.IndicatorNameId = 3 AND
IndicatorFact_GR.AnalysisCategoryId = vwIndicatorFact.AnalysisCategoryId LEFT OUTER JOIN
dbo.vwIndicatorFact AS IndicatorFact_NR ON vwIndicatorFact.IndicatorTypeId = IndicatorFact_NR.IndicatorTypeId AND
vwIndicatorFact.FiscalQuarterId = IndicatorFact_NR.FiscalQuarterId AND vwIndicatorFact.CountryId = IndicatorFact_NR.CountryId AND
vwIndicatorFact.CategoryId = IndicatorFact_NR.CategoryId AND IndicatorFact_NR.IndicatorNameId = 5 AND
IndicatorFact_NR.AnalysisCategoryId = vwIndicatorFact.AnalysisCategoryId LEFT OUTER JOIN
dbo.vwIndicatorFact AS IndicatorFact_GRC2 ON IndicatorFact_GRC2.IndicatorTypeId = 2 AND vwIndicatorFact.FiscalQuarterId = IndicatorFact_GRC2.FiscalQuarterId AND
vwIndicatorFact.CountryId = IndicatorFact_GRC2.CountryId AND vwIndicatorFact.CategoryId = IndicatorFact_GRC2.CategoryId AND
IndicatorFact_GRC2.IndicatorNameId = 3 AND IndicatorFact_GRC2.AnalysisCategoryId = 2
)
AS [dbo_IndicatorFact]
This is basically multiple self joins on a particular view which contains 300k records. Our DBA rebuilt all the indexes and updated the statistics, but we are still not able to execute this query quickly. If this query executes faster, there's a chance that the cube will process faster. The data in the view also has no inconsistencies. I need some advice on what could be the issue here.
Some background on this: it is part of a migration project that we are working on. The old dev environment is able to execute the same query in about 20 seconds with a similar number of records in the view. The new dev environment takes forever to execute the same query with the same view.
With a statement like this, the cause could be anything: poor indexes, outdated statistics, a badly written statement, a huge data volume, and so on.
I had a similar problem and increased the external command timeout in SSAS (it is 1 hour by default). If you cannot manage to optimize your query, look here for how to do it:
http://www.msbiguide.com/2013/01/how-to-increase-externalcommandtimeout-in-ssas/

Combining Case Statements in a View

I have these two CASE statements and cannot for the life of me figure out how to combine them to show in an MSSQL view. Any help would be great.
CASE WHEN [ordertype] = '2' THEN [CommissionAmt1] * - 1 ELSE [CommissionAmt1] END
and
CASE WHEN (is_member('Buyer') = 1 OR is_member('CustomerService') = 1) THEN 0 ELSE CommissionAmt1 END
Just add the first CASE wherever CommissionAmt1 is referenced in the second statement.
CASE WHEN (is_member('Buyer') = 1 OR is_member('CustomerService') = 1) THEN
0
ELSE
CASE WHEN [ordertype] = '2' THEN
[CommissionAmt1] * - 1
ELSE
[CommissionAmt1]
END
END
Or going the other way. It was hard to understand which way the calculation needs to be performed. The only hint was []
CASE WHEN [ordertype] = '2' THEN
(
CASE WHEN (is_member('Buyer') = 1 OR is_member('CustomerService') = 1) THEN
0
ELSE
CommissionAmt1
END
) * - 1
ELSE
CASE WHEN (is_member('Buyer') = 1 OR is_member('CustomerService') = 1) THEN
0
ELSE
CommissionAmt1
END
END
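For what it's worth, the two nestings above should give the same result for every input, because negating 0 still gives 0. The Python sketch below models both orderings and checks that they agree; the function names are made up, and is_restricted_member stands in for the is_member('Buyer') = 1 OR is_member('CustomerService') = 1 test.

```python
from itertools import product


def member_check_outside(order_type, commission, is_restricted_member):
    """First variant: hide the amount first, otherwise apply the sign flip."""
    if is_restricted_member:
        return 0
    return commission * -1 if order_type == "2" else commission


def order_check_outside(order_type, commission, is_restricted_member):
    """Second variant: decide the visible amount, then apply the sign flip."""
    visible = 0 if is_restricted_member else commission
    return visible * -1 if order_type == "2" else visible


# Compare both variants over every combination of a few sample inputs.
cases = list(product(["1", "2"], [0, 125, 99.5], [True, False]))
results = [
    (member_check_outside(*case), order_check_outside(*case)) for case in cases
]
all_equal = all(first == second for first, second in results)
print(all_equal)  # True
```

So the choice between the two is mostly about readability; the first variant is shorter because the membership test appears only once.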
Either way, you would be able to save some calculations by sub querying the dependent value.
SELECT
*,
ValueWithDependant=CASE WHEN (Dependant>0) THEN (SomeValue / Dependant) ELSE NULL END
FROM
(
SELECT
X,Y,Z,
Dependant=CASE WHEN SomeValue=1 THEN 1 ELSE 0 END
FROM
SomeTable
)AS DETAIL