Text Mining - Token Converstion - text-mining

I'm trying to convert my text as token, but I keep getting this error
## Selection only the "Body" column to analize the text
email_body <- as.tibble(data_email_pp$body)
df <- email_body %>% unnest_tokens(word, email_body)
Error:
! Must extract column with a single valid subscript.
✖ Subscript var has the wrong type tbl_df<value:character>.
ℹ It must be numeric or character.
Backtrace:
email_body %>% unnest_tokens(word, email_body)
tidytext::unnest_tokens(., word, email_body)
dplyr:::pull.data.frame(tbl, !!input)
tidyselect::vars_pull(names(.data), !!enquo(var))
tidyselect:::pull_as_location2(loc, n, vars)
vctrs::vec_as_subscript2(i, arg = "var", logical = "error")
vctrs:::result_get(...)
Error:
✖ Subscript var has the wrong type tbl_df<value:character>.
ℹ It must be numeric or character.
glimpse(email_body)
Rows: 182
Columns: 1
$ value "Hi Natsumi,\nThank you for raising a request to CDS. Please note, this ticket is n…
Thanks

Related

Cannot coerce String { class: java.lang.String }

I am trying to send email from the flow and stored all the email addresses in the yaml file like below
# Email
email:
toEmail: "abc.123#gg.org,def.456#gg.org"
fromEmail: "ms-dev#gg.org"
ccAddress: "abc123#gmail.com"
I am trying use the above values in to the send email connector like
<email:send doc:name="Send" doc:id="fd09c56f-eaed-44c4-ab06-aa0417f2fdbf" config-ref="Email_SMTP" subject="Error with SOW integration between D365 and Salesforce " fromAddress='#[p("email.fromEmail")]' toAddresses='#[p("email.toEmail") splitBy ","]' ccAddresses='#[p("email.ccAddress")]'>
<email:body contentType="text/html">
<email:content ><![CDATA[#[vars.emailBody]]]></email:content>
</email:body>
</email:send>
But when debugging Iam getting the error like below
org.mule.runtime.core.internal.exception.OnErrorPropagateHandler:
********************************************************************************
Message : "Cannot coerce String { class: java.lang.String } ("abc123#gmail.com" as String {class: "java.lang.String"}) to Array" evaluating expression: "p("email.ccAddress")".
Element : salesforce-proc-SendEmail_Flow/processors/1 # salesforce-proc:salesforce-proc-implementation.xml:566 (Send)
Element DSL : <email:send doc:name="Send" doc:id="fd09c56f-eaed-44c4-ab06-aa0417f2fdbf" config-ref="Email_SMTP" subject="Error with SOW integration between D365 and Salesforce " fromAddress="#[p("email.fromEmail")]" toAddresses="#[p("email.toEmail") splitBy ","]" ccAddresses="#[p("email.ccAddress")]">
<email:body contentType="text/html">
<email:content><![CDATA[
#[vars.emailBody]
]]></email:content>
</email:body>
</email:send>
Error type : MULE:EXPRESSION
FlowStack : at salesforce-proc-SendEmail_Flow(salesforce-proc-SendEmail_Flow/processors/1 # salesforce-proc:salesforce-proc-implementation.xml:566 (Send))
at listener-flow(listener-flow/errorHandler/0/processors/2 # salesforce-proc:salesforce-proc-implementation.xml:547 (Flow Reference))\
After fixing ccAddresses error I get below error
caf9-11ec-b461-025041000001] org.mule.runtime.core.internal.exception.OnErrorPropagateHandler:
********************************************************************************
Message : Error while sending email: Exception reading response
Element : salesforce-proc-SendEmail_Flow/processors/1 # salesforce-proc:salesforce-proc-implementation.xml:566 (Send)
Element DSL : <email:send doc:name="Send" doc:id="fd09c56f-eaed-44c4-ab06-aa0417f2fdbf" config-ref="Email_SMTP" subject="Error with SOW integration between D365 and Salesforce " fromAddress="#[p("email.fromEmail")]" toAddresses="#[p("email.toEmail") splitBy ","]" ccAddresses="#[p("email.ccAddress") splitBy ","]">
<email:body contentType="text/html">
<email:content><![CDATA[
#[vars.emailBody]
]]></email:content>
</email:body>
</email:send>
Error type : EMAIL:SEND
FlowStack : at salesforce-proc-SendEmail_Flow(salesforce-proc-SendEmail_Flow/processors/1 # salesforce-proc:salesforce-proc-implementation.xml:566 (Send))
at listener-flow(listener-flow/errorHandler/0/processors/2 # salesforce-proc:salesforce-proc-implementation.xml:547 (Flow Reference))
Can anyone please suggest what is that I am missing here.
The problem is that ccAddresses expects an array. Because you are using an expression to configure that attribute you need to convert the string value from the configuration file explicitly to an array, for example using the splitBy() function as you did in toAddresses.
Or if you will only use one address simply remove the expression and use a property placeholder (ccAddresses="${email.ccAddress}").

Parse a web activity error message into a synapse field

I have been trying to log an error from a web activity (POST method) into a field in a synapse table. The problem is, there are some special characters in the message key string like:
{
"value": [
{
"id": "",
"runId": "",
"debugRunId": ,
"runGroupId": "",
"pipelineName": "my_dynamic_pipeline_name",
"parameters": {
"region_code": "",
"data_start_date": "",
"data_end_date": "",
"etl_insert_batch_id": "",
"pipeline_subject_area": "",
"type_of_request": "",
"pipeline_name": "",
"pipeline_requested_by": "",
"debug": "",
"cdmloadtype": ""
},
"invokedBy": {
"id": "",
"name": "",
"invokedByType": ""
},
"runStart": "",
"runEnd": "",
"durationInMs": ,
"status": "",
"message": "Operation on target my_dynamic_pipeline_name failed: Operation on target my_dynamic_dataflow_name failed: {\"StatusCode\":\"DFExecutorUserError\",\"Message\":\"Job failed due to reason: at Sink 'SinkutilFailedDummy': java.sql.BatchUpdateException: Execution Status - FAILED ;Error number - 15165 ;Pipeline name - my_dynamic_pipeline_name; Stored procedure name - my_stored_proc_name ; Error step - Step 3: Key Hash and Util Type Hash Generation ; Insert batch ID - 1816 ; Error Message - Could not find object 'my object' or you do not have permission.\",\"Details\":\"java.sql.BatchUpdateException: Execution Status - FAILED ;Error number - 15165 ;Pipeline name - my_dynamic_pipeline_name; Stored procedure name - my_stored_proc_name ; Error step - Step 3: Key Hash and Util Type Hash Generation ; Insert batch ID - 1816 ; Error Message - Could not find object 'my_object' or you do not have permission.\\n\\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.executeBatch(SQLServerStatement.java:1845)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter.executeBatchSQLs(JDBCStore.scala:462)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter.executeSQL(JDBCStore.scala:440)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter$$anonfun$executeTableOpAndPostSQLAndDDL$2.apply$mcV$sp(JDBCStore.scala:494)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter$$anonfun$executeTableOpAndPostSQLAndDDL$2.apply(JDBCStore.scala:494)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter$$anonfun$executeTableOpAndPostS\"}",
...
}
So I can filter down the output with:
#activity('pingPL').output.value[0].message
but there are {} and $ special characters that the Data Flow expression is trying to evaluate.
I already try to use replace or string functions in the pipeline expression or in the dataflow expression without success.
Is there a way to parse this as a string or get to the Message key?, something like:
#activity('pingPL').output.value[0].message*.failed*.failed.Message
Update:
This seems to be working:
#json(split(activity('pingPL').output.value[0].message, 'failed: ')[2]).Message
I can split by failed: and the index 2 will give me the error logs within the {...}. I can parse that as a json and use the Message key. It is working but it is not the ideal dynamic solution since the error message wouldn't have always the same structure.
Got a solution using substring and indexof to extract the {...} info:
substring(activity('pingPL').output.value[0].message,indexof(activity('pingPL').output.value[0].message,'{'),sub(indexof(activity('pingPL').output.value[0].message,'}'),sub(indexof(activity('pingPL').output.value[0].message,'{'),1)))
Getting this string as the output:
{\"StatusCode\":\"DFExecutorUserError\",\"Message\":\"Job failed due to reason: at Sink 'SinkutilFailedDummy': java.sql.BatchUpdateException: Execution Status - FAILED ;Error number - 15165 ;Pipeline name - my_dynamic_pipeline_name; Stored procedure name - my_stored_proc_name ; Error step - Step 3: Key Hash and Util Type Hash Generation ; Insert batch ID - 1816 ; Error Message - Could not find object 'my object' or you do not have permission.\",\"Details\":\"java.sql.BatchUpdateException: Execution Status - FAILED ;Error number - 15165 ;Pipeline name - my_dynamic_pipeline_name; Stored procedure name - my_stored_proc_name ; Error step - Step 3: Key Hash and Util Type Hash Generation ; Insert batch ID - 1816 ; Error Message - Could not find object 'my_object' or you do not have permission.\\n\\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.executeBatch(SQLServerStatement.java:1845)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter.executeBatchSQLs(JDBCStore.scala:462)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter.executeSQL(JDBCStore.scala:440)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter$$anonfun$executeTableOpAndPostSQLAndDDL$2.apply$mcV$sp(JDBCStore.scala:494)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter$$anonfun$executeTableOpAndPostSQLAndDDL$2.apply(JDBCStore.scala:494)\\n\\tat com.microsoft.dataflow.transformers.store.JDBCWriter$$anonfun$executeTableOpAndPostS\"}
Then I used json expression to extract the key message:
json('extracted string').message
Then use replace to remove the single quotations ' to avoid a sql error.
This is the final expression I got to extract the error message:
#replace(json(substring(activity('pingPL').output.value[0].message,indexof(activity('pingPL').output.value[0].message,'{'),sub(indexof(activity('pingPL').output.value[0].message,'}'),sub(indexof(activity('pingPL').output.value[0].message,'{'),1)))).message,'''','-')

Error is coming while executing Apache pig script with macro option

I am learning pig myself, I am stuck with one error.
This is written in macro script to understand modular programming in Pig. the issue is that I am trying to use string parameters with $ by referencing relation name :: then immediately parameter string substitution with $. The error is:
unexpected character $
How to go ahead on this?
define dividend_analysis (daily, year, daily_symbol, daily_open, daily_close)
returns analyzed {
divs = load '/PigData1/NYSE_dividends.txt'as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
divisthisyear = filter divs by date matches '$year-.*';
dailythisyear = filter $daily by date matches '$year-.*';
jnd = join divisthisyear by symbol, dailythisyear by $daily_symbol;
$analyzed = foreach jnd generate dailythisyear.$daily_symbol,$daily_close - $daily_open;
};
daily = load '/PigData1/NYSE_daily.txt'as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');
the error message is from pig is ...
<line 7, column 53> Unexpected character '$'
2016-12-09 16:14:04,283 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 53> Unexpected character '$'
Details at logfile: /home/hadoop/pig_1481278051535.log
grunt> exec macro1.pig
<line 7, column 53> Unexpected character '$'
2016-12-09 17:00:37,723 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 53> Unexpected character '$'
Details at logfile: /home/hadoop/pig_1481278051535.log
Here is change made for your script , try this one
define dividend_analysis (daily, year, daily_symbol, daily_open, daily_close)
returns analyzed {
divs = load '/user/data/NYSE_dividends'as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
divisthisyear = filter divs by date matches '.*$year.*';
dailythisyear = filter $daily by date matches '.*$year.*';
jnd = join divisthisyear by symbol, dailythisyear by $daily_symbol;
$analyzed = foreach jnd generate $1 ,$daily_close - $daily_open; };
daily = load '/user/data/NYSE_daily'as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');

How can I signal parsing errors with LPeg?

I'm writing an LPeg-based parser. How can I make it so a parsing error returns nil, errmsg?
I know I can use error(), but as far as I know that creates a normal error, not nil, errmsg.
The code is pretty long, but the relevant part is this:
local eof = lpeg.P(-1)
local nl = (lpeg.P "\r")^-1 * lpeg.P "\n" + lpeg.P "\\n" + eof -- \r for winblows compat
local nlnoeof = (lpeg.P "\r")^-1 * lpeg.P "\n" + lpeg.P "\\n"
local ws = lpeg.S(" \t")
local inlineComment = lpeg.P("`") * (1 - (lpeg.S("`") + nl * nl)) ^ 0 * lpeg.P("`")
local wsc = ws + inlineComment -- comments count as whitespace
local backslashEscaped
= lpeg.P("\\ ") / " " -- escaped spaces
+ lpeg.P("\\\\") / "\\" -- escaped escape character
+ lpeg.P("\\#") / "#"
+ lpeg.P("\\>") / ">"
+ lpeg.P("\\`") / "`"
+ lpeg.P("\\n") -- \\n newlines count as backslash escaped
+ lpeg.P("\\") * lpeg.P(function(_, i)
error("Unknown backslash escape at position " .. i) -- this error() is what I wanna get rid of.
end)
local Line = lpeg.C((wsc + (backslashEscaped + 1 - nl))^0) / function(x) return x end * nl * lpeg.Cp()
I want Line:match(...) to return nil, errmsg when there's an invalid escape.
LPeg itself doesn't provide specific functions to help you with error reporting. A quick fix to your problem would be to make a protected call (pcall) to match like this:
local function parse(text)
local ok, result = pcall(function () return Line:match(text) end)
if ok then
return result
else
-- `result` will contain the error thrown. If it is a string
-- Lua will add additional information to it (filename and line number).
-- If you do not want this, throw a table instead like `{ msg = "error" }`
-- and access the message using `result.msg`
return nil, result
end
end
However, this will also catch any other error, which you probably don't want. A better solution would be to use LPegLabel instead. LPegLabel is an extension of LPeg that adds support for labeled failures. Just replace require"lpeg" with require"lpeglabel" and then use lpeg.T(L) to throw labels where L is an integer from 1-255 (0 is used for regular PEG failures).
local unknown_escape = 1
local backslashEscaped = ... + lpeg.P("\\") * lpeg.T(unknown_escape)
Now Line:match(...) will return nil, label, suffix if there is a label thrown (suffix is the remaining unprocessed input, which you can use to compute for the error position via its length). With this, you can print out the appropriate error message based on the label. For more complex grammars, you would probably want a more systematic way of mapping the error labels and messages. Please check the documentation found in the readme of the LPegLabel repository to see examples of how one may do so.
LPegLabel also allows you to catch the labels in the grammar by the way (via labeled choice); this is useful for implementing things like error recovery. For more information on labeled failures and examples, please check the documentation.

python3 - data type of input()

I need to assign a user-provided integer value to an object. My format is as follows:
object = input("Please enter an integer")
The following print tests...
print(type(object))
print(object)
...return <class 'str'> and '1'. Is there a way to set the data type of 'object' to the data type of the user's input value? IOW, such that if object=1, type(object)=int? I know I can set the data type of an input to int using the following:
object = int(input("Please enter an integer"))
In this case, if the user does not provide an int, the console throws a traceback error and the program crashes. I would prefer to test whether the object is in fact an int; if not, use my program to print an error statement and recursively throw the previous prompt.
while True:
try:
object = int(input("Please enter an integer"))
break
except ValueError:
print("Invalid input. Please try again")
print(type(object))
print(object)
You can always catch a "traceback error" and substitute your own error handling.
user_input = input("Please enter an integer")
try:
user_number = int(user_input)
except ValueError:
print("You didn't enter an integer!")