Pyspark: mismatched input ... expecting EOF - dataframe

I want to add a column to a data frame and depending on whether a certain value appears in the source json, the value of the column should be the value from the source or null.
My code looks like this:
withColumn("STATUS_BIT", expr("case when 'statusBit:' in jsonDF.schema.simpleString() then statusBit else None end"))
When I run this, I am getting "mismatched input ''statusBit:'' expecting {< EOF >, '-'} .
Am I doing something wrong with the quotation marks?
When I try
withColumn("STATUS_BIT", expr("case when \'statusBit:\' in jsonDF.schema.simpleString() then statusBit else None end"))
I get the exact same error.
Trying the whole thing without expr but as a simple when, triggers the error "condition should be a Column". Running 'statusBit:' in jsonDF.schema.simpleString() by itself returns True with the testdata I am using, but somehow I cant integrate it into the dataframe transformation.Thanks a lot for your help in advance.
edit: Applying the solution provided by PLTC has helped a lot, but I am still struggling to get this solution implemented in the when clause:
I try
.withColumn("STATUS_BIT", when(lit(df.schema.simplestring()).contains("statusBit") is True, col(statusBit)).otherwise(None))
but it tells me "condition should be a Column". So I added an extra colum called SCHEMA, which is equal to lit(df.schema.simpleString) and I used this column in the condition:
.withColumn("STATUS_BIT", when(col("SCHEMA").contains("statusBit"), col("StatusBit")).otherwise(None)
The problem is that if I run this with test data that does not contain "statusBit", I get the error "No such struct field statusBit in ...", which is obviously the opposite of what I wanted to achieve

jsonDF.schema.simpleString() is Python variable, you can use it in Python way
from pyspark.sql import functions as F
df.withColumn("STATUS_BIT", F.lit(df.schema.simpleString()).contains('statusBit:'))

Related

Array contains string from column in Athena

I'm trying to check if at least one element of my_array array contains a string from my_column column in athena. I want the end result to be a boolean.
I've tried :
contains(my_array, my_column)
It seems to be partially working and I actually can't understand why because sometimes I get false even though the element of the array and the string are matching, but I never get true if there's no match.
So I've also tried (as seen here : Presto array contains an element that likes some pattern )
cardinality(filter(my_array, x->x like my_column))>0
But same issue, ending with the exact same result as above.
I thought I had an issue with some values from my_column, but after checking all my problematic values it appears the strings from my_column are clean.
For instance I am able to find a match for the first value of my_array, but no match for the second, although both values are in my_column.
Could someone please help ?
Thank you so much !!

ERROR: function regexp_matches(jsonb, unknown) does not exist in Tableau but works elsewhere

I have a column called "Bakery Activity" whose values are all JSONs that look like this:
{"flavors": [
{"d4js95-1cc5-4asn-asb48-1a781aa83": "chocolate"},
{"dc45n-jnsa9i-83ysg-81d4d7fae": "peanutButter"}],
"degreesToCook": 375,
"ingredients": {
"d4js95-1cc5-4asn-asb48-1a781aa83": [
"1nemw49-b9s88e-4750-bty0-bei8smr1eb",
"98h9nd8-3mo3-baef-2fe682n48d29"]
},
"numOfPiesBaked": 1,
"numberOfSlicesCreated": 6
}
I'm trying to extract the number of pies baked with a regex function in Tableau. Specifically, this one:
REGEXP_EXTRACT([Bakery Activity], '"numOfPiesBaked":"?([^\n,}]*)')
However, when I try to throw this calculated field into my text table, I get an error saying:
ERROR: function regexp_matches(jsonb, unknown) does not exist;
Error while executing the query
Worth noting is that my data source is PostgreSQL, which Tableau regex functions support; not all of my entries have numOfPiesBaked in them; when I run this in a simulator I get the correct extraction (actually, I get "numOfPiesBaked": 1" but removing the field name is a problem for another time).
What might be causing this error?
In short: Wrong data type, wrong function, wrong approach.
REGEXP_EXTRACT is obviously an abstraction layer of your client (Tableau), which is translated to regexp_matches() for Postgres. But that function expects text input. Since there is no assignment cast for jsonb -> text (for good reasons) you have to add an explicit cast to make it work, like:
SELECT regexp_matches("Bakery Activity"::text, '"numOfPiesBaked":"?([^\n,}]*)')
(The second argument can be an untyped string literal, Postgres function type resolution can defer the suitable data type text.)
Modern versions of Postgres also have regexp_match() returning a single row (unlike regexp_matches), which would seem like the better translation.
But regular expressions are the wrong approach to begin with.
Use the simple json/jsonb operator ->>:
SELECT "Bakery Activity"->>'numOfPiesBaked';
Returns '1' in your example.
If you know the value to be a valid integer, you can cast it right away:
SELECT ("Bakery Activity"->>'numOfPiesBaked')::int;
I found an easier way to handle JSONB data in Tableau.
Firstly, make a calculated field from the JSONB field and convert the field to a string by using str([FIELD_name]) command.
Then, on the calculated field, make another calculated field and use function:
REGEXP_EXTRACT([String_Field_Name], '"Key_to_be_extracted":"?([^\n,}]*)')
The required key-value pair will form the second caluculated field.

Can't quote array - rails sql

Im trying to create a sql query dynamically with the following syntax:
Company.joins(:founder_persons)
.where("people.first_name like people[:first_name]", {people: {first_name: 'm%'}})
But running this on the rails console gives me TypeError: can't quote Array. Im guessing this is not how we use the where string? What's the right way to fix this error? Thanks.
One reason this error can occur is with a nested array used as SQL value.
Example:
Article.where(author: ['Jane', 'Bob'])
works, but:
Article.where(author: ['Jane', ['Bob']])
would give the error. A quick fix would be to run flatten on the array.
(Mentioning this since this page comes up when searching for the confusing error "Can't quote array".)
You could bind any value and then assign it, this way they should coincide in numbers, like:
Model.joins(:join_table)
.where('models.first_attribute LIKE ? AND models.second_attribute LIKE ?', value_for_first_attr, value_for_second_attr)
If using an array you should access each index you want to compare, or you can precede a splat *, and specify just one value, like:
Model.joins(:join_table)
.where('models.first_attribute LIKE ? AND models.second_attribute LIKE ?', *array_of_values)
Note although this way you're passing the "whole" array it should also coincide in size or numbers, otherwise it'd raise an ActiveRecord::PreparedStatementInvalid error depending if there are more or less elements than needed.

SSRS if field value in list

I've looked through a number of tutorials and asks, and haven't found a working solution to my problem.
Suppose my dataset has two columns: sort_order and field_value. sort_order is an integer and field_value is a numerical (10,2).
I want to format some rows as #,#0 and others as #,#0.00.
Normally I would just do
iif( fields!sort_order.value = 1 or fields!sort_order.value = 23 or .....
unfortunately, the list is fairly long.
I'd like to do the equivalent of if fields!sort_order.value in (1,2,21,63,78,...) then...)
As recommended in another post, I tried the following (if sort in list, then just output a 0, else a 1. this is just to test the functionality of the IN operator):
=iif( fields!sort_order.Value IN split("1,2,3,4,5,6,8,10,11,15,16,17,18,19,20,21,26,30,31,33,34,36,37,38,41,42,44,45,46,49,50,52,53,54,57,58,59,62,63,64,67,68,70,71,75,76,77,80,81,82,92,98,99,113,115,116,120,122,123,127,130,134,136,137,143,144,146,147,148,149,154,155,156,157,162,163,164,165,170,171,172,173,183,184,185,186,192,193,194,195,201,202,203,204,210,211,212,213,263",","),0,1)
However, it doesn't look like the SSRS expression editor wants to accept the "IN" operator. Which is strange, because all the examples I've found that solve this problem use the IN operator.
Any advice?
Try using IndexOf function:
=IIF(Array.IndexOf(split("1,2,3,4,...",","),fields!sort_order.Value)>-1,0,1)
Note all values must be inside quotations.
Consider the recommendation of #Jakub, I recommend this solution if
your are feeding your report via SP and you can't touch it.
Let me know if this helps.

How can I add conditions to my equation when using WorksheetFunction.MAX in VBA?

How can I add conditions to the WorkSheetFunction.Max() command in VBA.
I am working with two columns of data and would like to know the Max of column B, if column A equals 1.
I have tried a couple of different methods, and none seem to work.
Initially, I tried:
MaxTaskNumbers = WorksheetFunction.Max(IF(Worksheets(AnalysisPart1").Range("A:A") = 1, Worksheets("AnalysisPart1").Range("B:B")))
But I received the error
Compile error Expected: expression
When I replaced IF with IIF, I received the error
Argument not optional
Do you have any recommendations regarding how I can embed an IF statement in WorksheetFunction.Max?