Multi-if else using question mark - apache-pig

I need to derive CD_MARCHE from the values of CD_AXE_MCH in Pig. I am only allowed to use the question mark (ternary) operator, as below:
(CD_AXE_MCH IN ('PLIB','ATPE','COMM') ? 'P': (CD_AXE_MCH == 'PME') ?
'E': (CD_AXE_MCH == 'AGRI') ? 'A': (CD_AXE_MCH == 'OBNL') ?
'O':(CD_AXE_MCH == 'COLL') ? 'C' :(CD_AXE_MCH == 'EFIN') ?
'B' :'X') AS CD_MARCHE,
But this returns the following error:
mismatched input '?' expecting RIGHT_PAREN
How can I resolve it, please?

In this scenario it's easier to use a CASE expression, which is available from Pig 0.12 onward.
CASE
WHEN CD_AXE_MCH MATCHES 'PLIB|ATPE|COMM' THEN 'P'
WHEN CD_AXE_MCH == 'PME' THEN 'E'
WHEN CD_AXE_MCH == 'AGRI' THEN 'A'
WHEN CD_AXE_MCH == 'OBNL' THEN 'O'
WHEN CD_AXE_MCH == 'COLL' THEN 'C'
WHEN CD_AXE_MCH == 'EFIN' THEN 'B'
ELSE 'X'
END AS CD_MARCHE
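If that's not feasible or not supported in your version, make sure the parentheses are placed correctly: each nested ternary has to be wrapped as a whole, condition and both branches together, rather than wrapping only the condition.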
(CD_AXE_MCH MATCHES 'PLIB|ATPE|COMM' ? 'P' : (CD_AXE_MCH == 'PME' ? 'E' : (CD_AXE_MCH == 'AGRI' ? 'A' : (etc.))))
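As a way to check the output of either form, the intended mapping can be written down as a small reference function. This is only a sketch in Python, not Pig; the function name `cd_marche` is made up for illustration, and the fallback 'X' is taken from the expression in the question.

```python
def cd_marche(cd_axe_mch):
    """Reference mapping from CD_AXE_MCH to CD_MARCHE (hypothetical helper)."""
    # First branch: any of the three codes maps to 'P'.
    if cd_axe_mch in ("PLIB", "ATPE", "COMM"):
        return "P"
    # Remaining branches are one-to-one; anything else falls through to 'X'.
    single = {"PME": "E", "AGRI": "A", "OBNL": "O", "COLL": "C", "EFIN": "B"}
    return single.get(cd_axe_mch, "X")


sample = ["PLIB", "PME", "EFIN", "UNKNOWN"]
mapped = [cd_marche(code) for code in sample]
print(mapped)
```

Running a handful of known codes through this and comparing with the Pig result is a quick sanity check that the ELSE branch and the parenthesization behave as intended.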

Related

NOT statement in where Clause in SQL Server Slow

I have this query, where the results are as expected but the query is really slow. The query below is just an example:
SELECT ispending, isviewable, iscomparable, ID
FROM tableA
WHERE
name = 'Karen'
AND NOT ((ispending = 'F' AND isviewable = '0') OR
(ispending = 'T' AND iscomparable = '0') OR
(ispending = 'T' AND iscomparable IS NULL AND isviewable = '0') OR
(ispending IS NULL AND iscomparable = '0'))
How to achieve the same result but not using the 'NOT' statement in the where clause?
I tried changing the not to be within the clause
WHERE (ispending != 'F' AND isviewable != '0') OR
(ispending != 'T' AND iscomparable != '0') OR
(ispending != 'T' AND iscomparable IS NOT NULL AND isviewable != '0') OR
(ispending IS NOT NULL AND iscomparable !='0')
but the expected results are different.
Your second code block is quite close. De Morgan's laws describe how to push a negation through boolean operations.
You were right to replace = with != (the negation of =), but you also need to swap the conjunctions and disjunctions. In essence: AND becomes OR and vice versa.
WHERE (ispending != 'F' OR isviewable != '0') AND
(ispending != 'T' OR iscomparable != '0') AND
(ispending != 'T' OR iscomparable IS NOT NULL OR isviewable != '0') AND
(ispending IS NOT NULL OR iscomparable != '0')
Now we have the logical equivalent.
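The equivalence, including the NULL cases, can be checked by brute force. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the column types and the sample values 'X' and '1' are made up), inserts every combination of values, and compares the rows returned by both forms of the WHERE clause.

```python
import itertools
import sqlite3

# Candidate values per column, including NULL (None) and values that
# match none of the predicates ('X' and '1' are made-up fillers).
ISPENDING = ["T", "F", "X", None]
FLAG = ["0", "1", None]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (ID INTEGER PRIMARY KEY, name TEXT, "
    "ispending TEXT, isviewable TEXT, iscomparable TEXT)"
)
rows = [("Karen", p, v, c) for p, v, c in itertools.product(ISPENDING, FLAG, FLAG)]
conn.executemany(
    "INSERT INTO tableA (name, ispending, isviewable, iscomparable) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

NOT_FORM = """
SELECT ID FROM tableA
WHERE name = 'Karen'
  AND NOT ((ispending = 'F' AND isviewable = '0') OR
           (ispending = 'T' AND iscomparable = '0') OR
           (ispending = 'T' AND iscomparable IS NULL AND isviewable = '0') OR
           (ispending IS NULL AND iscomparable = '0'))
ORDER BY ID
"""

DE_MORGAN_FORM = """
SELECT ID FROM tableA
WHERE name = 'Karen'
  AND (ispending != 'F' OR isviewable != '0')
  AND (ispending != 'T' OR iscomparable != '0')
  AND (ispending != 'T' OR iscomparable IS NOT NULL OR isviewable != '0')
  AND (ispending IS NOT NULL OR iscomparable != '0')
ORDER BY ID
"""

not_ids = [r[0] for r in conn.execute(NOT_FORM)]
de_morgan_ids = [r[0] for r in conn.execute(DE_MORGAN_FORM)]
same = not_ids == de_morgan_ids
print(len(rows), len(not_ids), same)
```

This only confirms that the two predicates select the same rows; whether the rewritten form is actually faster depends on the indexes and the query plan on the real server.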

Pandas .loc[] method is too slow, how can I speed it up

I have a dataframe with 40 million rows, and I want to change some columns with:
age = data[data['device_name'] == 12]['age'].apply(lambda x : x if x != -1 else max_age)
data.loc[data['device_name'] == 12,'age'] = age
But this method is too slow. How can I speed it up?
Thanks for all replies!
You might want to replace the apply in the first part with a vectorized assignment:
age = data[data['device_name'] == 12]['age']
age[age == -1] = max_age
data.loc[data['device_name'] == 12,'age'] = age
To be more concise, you could compute the condition once and reuse it (this could also gain you a little speed):
cond = data['device_name'] == 12
age = data.loc[cond, 'age']
data.loc[cond,'age'] = age.where(age != -1, max_age)
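A further variant, not taken from the answers above, folds both conditions into one boolean mask so that only the rows that actually change are written. This is a minimal sketch on a made-up four-row frame; `max_age = 99` is a placeholder value.

```python
import pandas as pd

# Tiny stand-in for the 40-million-row frame (values are made up).
data = pd.DataFrame({"device_name": [12, 12, 7, 12], "age": [30, -1, -1, 45]})
max_age = 99  # placeholder; the question does not say how max_age is computed

# Rows for device 12 whose age is the -1 sentinel.
mask = (data["device_name"] == 12) & (data["age"] == -1)

# Single vectorized assignment; no apply, no intermediate Series.
data.loc[mask, "age"] = max_age
print(data)
```

Only the second row is touched here: the -1 on device 7 is left alone because it fails the device condition.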

My .loc with multiple conditions keeps running...help me land the plane

When I try to run the code below, it just keeps running. Is it something obvious?
df.loc[(df['Target_Group'] == 'x') & (df['Period'].dt.year == df['Year_Performed'].dt.year), ['Target_P']] = df.loc[(df['Target_Group'] == 'x') & (df['Period'].dt.year == df['Year_Performed'].dt.year), ['y']]
I think you need to assign the condition to a variable and then reuse it:
m = (df['Target_Group'] == 'x') & (df['Period'].dt.year == df['Year_Performed'].dt.year)
df.loc[m, 'Target_P'] = df.loc[m, 'y']
To improve performance, it is possible to use numpy.where:
df['Target_P'] = np.where(m, df['y'], df['Target_P'])
pandas aligns on the index when assigning, so you do not need to repeat the condition on the right-hand side:
cond=(df['Target_Group'] == 'x') & (df['Period'].dt.year == df['Year_Performed'].dt.year)
df.loc[cond, 'Target_P'] = df.y
For more detail, here is an example:
df=pd.DataFrame({'cond':[1,2],'v1':[-110,-11],'v2':[9999,999999]})
df.loc[df.cond==1,'v1']=df.v2
df
Out[200]:
   cond    v1      v2
0     1  9999    9999
1     2   -11  999999
If the index contains duplicates, assign the raw values so that no alignment takes place:
df.loc[cond, 'Target_P'] = df.loc[cond,'y'].values
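To see that the masked .loc assignment and the numpy.where version agree, here is a small sketch on made-up data. The column names follow the question; the dates and numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Three made-up rows; only the first one satisfies both conditions.
df = pd.DataFrame({
    "Target_Group": ["x", "x", "z"],
    "Period": pd.to_datetime(["2020-01-31", "2021-01-31", "2020-06-30"]),
    "Year_Performed": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-01"]),
    "y": [1.0, 2.0, 3.0],
    "Target_P": [0.0, 0.0, 0.0],
})

# Build the mask once and reuse it.
m = (df["Target_Group"] == "x") & (df["Period"].dt.year == df["Year_Performed"].dt.year)

# Variant 1: masked assignment with .loc on both sides.
via_loc = df.copy()
via_loc.loc[m, "Target_P"] = via_loc.loc[m, "y"]

# Variant 2: rebuild the whole column with numpy.where.
via_where = df.copy()
via_where["Target_P"] = np.where(m, via_where["y"], via_where["Target_P"])

print(via_loc["Target_P"].tolist(), via_where["Target_P"].tolist())
```

Both variants evaluate the condition only once, which is the main difference from the one-liner in the question.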

Disregard component of a Triple in a comparison

I am attempting to compare Triples while disregarding certain values of the Triple. The value I wish to disregard below is signified by _. Note the below code is for example purposes and does not compile because _ is an Unresolved reference.
val coordinates = Triple(3, 2, 5)
when (coordinates) {
Triple(0, 0, 0) -> println("Origin")
Triple(_, 0, 0)-> println("On the x-axis.")
Triple(0, _, 0)-> println("On the y-axis.")
Triple(0, 0, _)-> println("On the z-axis.")
else-> println("Somewhere in space")
}
I know you can use _ when destructuring if you would like to ignore a value but that doesn't seem to help me with the above issue:
val (x4, y4, _) = coordinates
println(x4)
println(y4)
Any ideas how I can achieve this?
Thank you!
Underscore for unused variables was introduced in Kotlin 1.1, and it is designed for destructuring declarations in which some of the variables are not needed.
In the branch conditions of your when expression, Triple(0, 0, 0) creates a new instance rather than destructuring one, so the underscore is not permitted there.
Currently, destructuring in the branch conditions of a when expression is not possible in Kotlin. One solution for your case is to destructure first and compare the components explicitly in each branch condition:
val (x, y, z) = Triple(3, 2, 5)
when {
x == 0 && y == 0 && z == 0 -> println("Origin")
y == 0 && z == 0 -> println("On the x-axis.")
x == 0 && z == 0 -> println("On the y-axis.")
x == 0 && y == 0 -> println("On the z-axis.")
else -> println("Somewhere in space")
}
Here is a discussion on destructuring in when expression.

Using Multiple ANDs and ORs in ANSI SQL

I have a simple SQL query:
SELECT
w.fizz
FROM
widgets w
WHERE
w.special_id = 2394
AND w.buzz IS NOT NULL
AND w.foo = 12
In pseudo-code, this WHERE clause could be thought of as:
if(specialId == 2394 && buzz != null && foo == 12)
I now want to change this query so that it returns all widgets whose special_id is 2394, and whose buzz is not null, and whose foo is 12, OR whose special_id is 2394, and whose blah is 'YES', and whose num is 4. In pseudo-code:
if(specialId == 2394 && (buzz != null && foo == 12) || (blah == "YES" && num == 4))
I tried the following, only to get errors:
SELECT
w.fizz
FROM
widgets w
WHERE
w.special_id = 2394
AND
(
w.buzz IS NOT NULL
AND w.foo = 12
)
OR
(
w.blah = 'YES'
AND w.num = 4
)
Any ideas? Thanks in advance!
SELECT
w.fizz
FROM
widgets w
WHERE
w.special_id = 2394
AND
(
(
w.buzz IS NOT NULL
AND w.foo = 12
)
OR
(
w.blah = 'YES'
AND w.num = 4
)
)
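Add an extra pair of parentheses around the whole OR group, because OR has lower precedence than AND. Without them the condition is read as (special_id = 2394 AND first group) OR (second group), so the second group is no longer restricted to special_id 2394.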
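The effect of the grouping can be reproduced with a few rows. The sketch below runs both forms of the WHERE clause against an in-memory SQLite database through Python's built-in sqlite3 module; SQLite is only a stand-in for whatever database is actually used, and the column types and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets (fizz TEXT, special_id INTEGER, buzz TEXT, "
    "foo INTEGER, blah TEXT, num INTEGER)"
)
conn.executemany(
    "INSERT INTO widgets VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("a", 2394, "b", 12, "NO", 0),   # matches the first group
        ("b", 2394, None, 0, "YES", 4),  # matches the second group
        ("c", 9999, None, 0, "YES", 4),  # wrong special_id, should be excluded
    ],
)

# Without the outer parentheses, AND binds first and the OR branch
# escapes the special_id filter.
UNGROUPED = """
SELECT w.fizz FROM widgets w
WHERE w.special_id = 2394
  AND (w.buzz IS NOT NULL AND w.foo = 12)
   OR (w.blah = 'YES' AND w.num = 4)
ORDER BY w.fizz
"""

# With the outer parentheses, special_id applies to both groups.
GROUPED = """
SELECT w.fizz FROM widgets w
WHERE w.special_id = 2394
  AND ((w.buzz IS NOT NULL AND w.foo = 12)
    OR (w.blah = 'YES' AND w.num = 4))
ORDER BY w.fizz
"""

ungrouped = [r[0] for r in conn.execute(UNGROUPED)]
grouped = [r[0] for r in conn.execute(GROUPED)]
print(ungrouped, grouped)
```

The ungrouped query also returns row "c", which has the wrong special_id; the grouped query returns only "a" and "b".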