Debugging u-sql Jobs - azure-data-lake

I would like to know if there are any tips and tricks for finding errors in Data Lake Analytics jobs. Most of the time the error message is not very detailed.
When trying to extract from a CSV file I often get an error like this:
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0] with error: Vertex user code error.
Vertex failed with a fail-fast error
It seems that these errors occur when trying to convert the columns to the specified types.
The technique I found is to extract all columns as string and then do a SELECT that tries to convert the columns to the expected types. Doing that column by column can help find the specific column in error.
@data =
    EXTRACT ClientID string,
            SendID string,
            FromName string
    FROM "wasb://..."
    USING Extractors.Csv();
// Convert some columns to int; the WHERE condition skips the header row
@clean =
    SELECT Int32.Parse(ClientID) AS ClientID,
           Int32.Parse(SendID) AS SendID,
           FromName
    FROM @data
    WHERE !ClientID.StartsWith("ClientID");
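The same column-by-column check can be tried locally on a small sample of the file before submitting a job. What follows is only a minimal sketch in Python, not U-SQL: the function name `find_bad_cells` and the sample rows are hypothetical, and it assumes a header row and no line breaks inside quoted fields.

```python
import csv
import io

def find_bad_cells(text, int_columns):
    """Return (line_number, column, raw_value) for every cell that fails int()."""
    bad = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in int_columns:
            try:
                int(row[column])
            except (TypeError, ValueError):
                bad.append((line_number, column, row[column]))
    return bad

# Hypothetical sample: one malformed ClientID and one empty SendID.
sample = "ClientID,SendID,FromName\n1,10,Alice\nx2,11,Bob\n3,,Carol\n"
problems = find_bad_cells(sample, ["ClientID", "SendID"])
for line_number, column, value in problems:
    print(f"line {line_number}: column {column!r} has non-integer value {value!r}")
```

This reports every offending cell in one pass instead of failing on the first one, which is the information the job's error message does not give.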
Is it also possible to use something like a TryParse to return null or default values in case of a parsing error, instead of the whole job failing?
Thanks

Here is a solution without having to use code behind (although Codebehind will make your code a bit more readable):
SELECT ((Func<string, Int32?>)(v => { Int32 res; return Int32.TryParse(v, out res)? (Int32?) res : (Int32?) null; }))(ClientID) AS ClientID
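For readers less familiar with the TryParse pattern, here is the same idea as a minimal Python sketch: return the parsed number, or a null value when the text cannot be parsed, instead of raising. The name `try_parse_int` is hypothetical, and Python's `int()` is more lenient than `Int32.TryParse` (no 32-bit range check, for instance), so treat this as an illustration of the semantics only.

```python
from typing import Optional

def try_parse_int(value: Optional[str]) -> Optional[int]:
    """Return the parsed integer, or None when the text is not a valid integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # TypeError covers None input, ValueError covers malformed text.
        return None

print(try_parse_int("42"))
print(try_parse_int("4x2"))
print(try_parse_int(None))
```

Returning a null rather than a default such as 0 keeps bad rows distinguishable from rows that genuinely contain 0.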
Also, the cryptic error messages you see are caused by a bug in returning the so-called inner error messages, which should be fixed soon. The workaround today is the following:
In the ADL Tools for Visual Studio, open the Job View of the failed job.
In the lower left corner, click on the “resources” link in the job detail area.
Once the job resources are loaded, click on “Profile”.
Search for the string “jobError” at the beginning of a line. Copy the entire line of text and paste it into Notepad (or another text editor) to read the actual error.
That should give you the exact error message.

Yes, you can use TryParse through a U-SQL user-defined function. You can do it like this:
In the code-behind:
using System;
namespace TestNS
{
    public class TestClass
    {
        // Returns the parsed value, or 0 when s is not a valid integer.
        public static int TryConvertToInt(string s)
        {
            int i = 0;
            if (Int32.TryParse(s, out i))
                return i;
            return 0;
        }
    }
}
In U-SQL Script:
TestNS.TestClass.TryConvertToInt(ClientID) AS clientID
It looks like you have some other issue, as I always get an appropriate error when there is a conversion problem, something like:
"E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR","message":"Invalid character when attempting to convert column data."

Related

SSIS value of variable not changing

I am new to SSIS. I am trying to use a Script Task to get the last modified date and creation date of a file. I have declared two variables for the file path and file name (File_Path, Filename) in my Script Task, with package scope and data type String.
I want to store the creation date and modified date in two different output variables (Create_Date, Last_Updated) with data type DateTime.
My code for the script is as follows:
FileInfo fileInfo = new FileInfo(Path.Combine(Dts.Variables["File_Path"].Value.ToString(), Dts.Variables["Filename"].Value.ToString()));
if (fileInfo.Exists)
{
    // Get file creation date
    Dts.Variables["Create_Date"].Value = fileInfo.CreationTime;
    // Get last modified date
    Dts.Variables["Last_Updated"].Value = fileInfo.LastWriteTime;
}
else
{
    Dts.Events.FireWarning(1, Dts.Variables["System::TaskName"].Value.ToString()
        , string.Format("File '{0}' does not exist", fileInfo.FullName)
        , "", 0);
}
SSIS has a Design time and a Run time interface.
Variables are created in the Design time space. There you assign a data type and a value. There is an explicit Variables window where you do all of this.
During run time, the Variables window will still be visible, but the values there are not the run-time values. They are just a reference for what the package was initialized with. The actual values of SSIS variables are found in the debug windows. I favor the Locals window (Ctrl+Alt+V, L).
From there, expand the Variables node
You can also add explicit logging into your Script tasks. This little bit will enumerate through all the variables you selected for readonly or read/write access and pop off their name and value into the run log. If you're running in Visual Studio, it will show up in the Results tab or the Output window (great place to copy errors for further research or asking on forums). If you're running from the server, these will show in the SSISDB.catalog.operation_messages view (unless you picked an incompatible logging mode)
bool fireAgain = false;
string message = "{0}::{1} : {2}";
foreach (var item in Dts.Variables)
{
Dts.Events.FireInformation(0, "SCR Echo Back", string.Format(message, item.Namespace, item.Name, item.Value), string.Empty, 0, ref fireAgain);
}

Use antlr v4 for syntax check

Can I use ANTLR v4 to check syntax before I actually run the code?
Example:
I defined the syntax select * from table, and I want to know whether a statement is correct before actually executing it.
Here is my code:
val listener = new SQLListener()
val loadLexer = new SQLLexer(new ANTLRInputStream(input))
val tokens = new CommonTokenStream(loadLexer)
val parser = new SQLParser(tokens)
val stat = parser.statement()
I tried but DefaultErrorStrategy won't throw an Exception
I tried this:
parser.addErrorListener(new BaseErrorListener {
override def syntaxError(recognizer: Recognizer[_, _ <: ATNSimulator],
offendingSymbol: scala.Any,
line: Int,
charPositionInLine: Int,
msg: String, e: RecognitionException ): Unit = {
println("==========2============"+msg)
throw new AssertionError("line: " + line + ", offset: " + charPositionInLine +
", symbol:" + offendingSymbol + " " + msg)
}
})
but get this:
Error: Note: the super classes of contain the following, non final members named syntaxError:
If the input contains any syntax errors, this will call the visitErrorNode method on the listener. So if you define that method in your listener, you'll see any errors that occur.
If your listener is directly executing the code (rather than first building an AST or other form of IR), you probably won't want your listener to even start executing when there's a syntax error. One way to achieve that would be to set the BailErrorStrategy instead of the DefaultErrorStrategy as the error handling strategy of your parser (using setErrorHandler on the parser). This will throw an exception as soon as a syntax error occurs.
If you don't want to abort on the first error and/or you want some additional checks beyond just syntax errors (like checking for certain types of semantic errors), an alternative is to have a listener just to perform those checks. Then you'd run your code-executing listener only if the error-checking listener does not find any errors.
You are on the right track here. Use your error listener to store the errors in a list while parsing. Afterwards you can then check that list.
That requires, however, that you take no action during the parsing process (e.g. in a parse listener) other than what is related to parsing itself. Any follow-up action (e.g. error markup in an editor) should be done after the parse run.
If you would like to see an example of an application using this approach, take a look at the parser module implementation of MySQL Workbench. It also demonstrates the 2-stage parsing strategy for quicker parsing.
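The "collect errors in a list, then check the list after the parse" approach can be sketched without ANTLR at all. The Python below is a toy stand-in, not the ANTLR API: `CollectingErrorListener`, `check_statement` and `is_valid` are hypothetical names, and the "grammar" is just the fixed form `select * from <identifier>` from the question.

```python
import re

class CollectingErrorListener:
    """Stores syntax errors instead of printing or raising them."""

    def __init__(self):
        self.errors = []

    def syntax_error(self, position, message):
        self.errors.append(f"token {position}: {message}")

def check_statement(text, listener):
    """Check `select * from <identifier>` and report every mismatch to the listener."""
    tokens = text.split()
    expected = [
        ("select", lambda t: t.lower() == "select"),
        ("*", lambda t: t == "*"),
        ("from", lambda t: t.lower() == "from"),
        ("a table name", lambda t: re.fullmatch(r"[A-Za-z_]\w*", t) is not None),
    ]
    for position, (description, matches) in enumerate(expected):
        if position >= len(tokens):
            listener.syntax_error(position, f"missing {description}")
        elif not matches(tokens[position]):
            listener.syntax_error(position, f"expected {description}, got {tokens[position]!r}")
    if len(tokens) > len(expected):
        listener.syntax_error(len(expected), f"unexpected input {tokens[len(expected)]!r}")

def is_valid(text):
    listener = CollectingErrorListener()
    check_statement(text, listener)              # parse only, no side effects
    return not listener.errors, listener.errors  # inspect the list afterwards

print(is_valid("select * from users"))
print(is_valid("select from users"))
```

The point of the design is the separation: the checker never executes anything, and the caller decides what to do only once the full error list is available.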

how to get more info than generic "Failed to parse JSON: No active field found.; ParsedString returned false; Could not parse value" on BigQuery load?

We're trying BigQuery for the first time, with data extracted from mongo in json format. I kept getting this generic parse error upon loading the file. But then I tried a smaller subset of the file, 20 records, and it loaded fine. This tells me it's not the general structure of the file, which I had originally thought was the problem. Is there any way to get more info on the parse error, such as the string of the record that it's trying to parse when it has this error?
I also tried using the max errors field, but that didn't work either.
This was via the website. I also tried it via the Google Cloud SDK command line 'bq load...' and got the same error.
This error is most likely caused by some of the JSON records not complying with the table schema. It is not clear whether you used the schema autodetect feature or supplied a schema for the load. But here is one example where such an error could happen:
{ "a" : "1" }
{ "a" : { "b" : "2" } }
If you only have a few of these and they are for invalid records - you can automatically ignore them by using max_bad_records option for load job. More details at: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json
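To find which records are the odd ones out before loading, a local pre-check over the newline-delimited JSON file can help. The sketch below is plain Python and is not a BigQuery feature; the function names are hypothetical, and it only compares the coarse type of top-level fields against the first type seen for that field.

```python
import json

def json_kind(value):
    """Coarse JSON type name used to compare fields across records."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    return type(value).__name__

def find_inconsistent_fields(lines):
    """Return (line_number, field, first_kind, this_kind) for each type mismatch."""
    seen = {}
    issues = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            issues.append((line_number, None, "valid JSON", "unparseable"))
            continue
        if not isinstance(record, dict):
            issues.append((line_number, None, "object", json_kind(record)))
            continue
        for field, value in record.items():
            kind = json_kind(value)
            if kind == "null":
                continue  # nulls are compatible with any field type here
            first = seen.setdefault(field, kind)
            if kind != first:
                issues.append((line_number, field, first, kind))
    return issues

# The two records from the example above: "a" is a string, then an object.
records = ['{ "a" : "1" }', '{ "a" : { "b" : "2" } }']
issues = find_inconsistent_fields(records)
print(issues)
```

Run against the real export (for example by passing an open file object), this gives the line numbers of the records worth inspecting by hand.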

Knowledge & Connect PHP API, Found object(Account or Answer) but contains only null fields

I'm facing some strange issues when I try to fetch(Connect PHP API)/searchContent(Knowledge Foundation API) following the tutorials/documentations.
Behaviour and output
Following the documentation, we initialize the API. The function error_get_last() (called after the fetch) states that the core read-only file (we are not allowed to modify it) contains an error:
Array ( [type] => 8 [message] => Undefined index: REDIRECT_URL [file] => /cgi-bin/${interface_name}.cfg/scripts/cp/core/framework/3.2.4/init.php [line] => 246 )
After initialization, we call the fetch function to retrieve an account. If we give a wrong ID, it returns an error:
Invalid ID: No such Account with ID = 32
Otherwise, furnishing a correct ID returns an Account object with all fields populated as NULL:
object(RightNow\Connect\v1_2\Account)#22 (25) {
["ID"]=>
NULL
["LookupName"]=>
NULL
["CreatedTime"]=>
NULL
["UpdatedTime"]=>
NULL
["AccountHierarchy"]=>
NULL
["Attributes"]=>
NULL
["Country"]=>
NULL
["CustomFields"]=>
NULL
["DisplayName"]=>
NULL
["DisplayOrder"]=>
NULL
["EmailNotification"]=>
NULL
["Emails"]=>
NULL
["Login"]=>
NULL
/* [...] */
["StaffGroup"]=>
NULL
}
Attempts, workaround and troubleshooting information
Configuration: The account used by InitConnectAPI() has the required permissions
Initialization: The call to InitConnectAPI() does not throw any exception (I added a try-catch block)
Call to the fetch function: As said above, the call to RNCPHP\Account::fetch($act_id) finds the account (invalid_id => error) but doesn't manage to populate the fields
No exception is thrown on the RNCPHP::fetch($correct_id) call
The behaviour is the same when I try to retrieve an answer following a sample example from the Knowledge Foundation API : $token = \RNCK::StartInteraction(...) ; \RNCK::searchContent($token, 'lorem ipsum');
Using PHP's SoapClient, I manage to retrieve populated objects. However, it's not part of the standard, and having the application call its own local web service is not good practice.
Code reproducing the issue
error_reporting(E_ALL);
require_once(get_cfg_var('doc_root') . '/include/ConnectPHP/Connect_init.phph');
InitConnectAPI();
use RightNow\Connect\v1_2 as RNCPHP;
/* [...] */
try
{
$fetched_acct = RNCPHP\Account::fetch($correct_usr_id);
} catch ( \Exception $e)
{
echo ($e->getMessage());
}
// Dump part
echo ("<pre>");
var_dump($fetched_acct);
echo ("</pre>");
// The core's error on which I have no control
print_r(error_get_last());
Questions:
Have any of you faced the same issue? What workaround or fix would help me solve it?
Judging by the behaviour of RNCPHP\Account::fetch($correct_usr_id), we can surmise that the issue comes from the field-populating step, which might be part of the core (over which I have no control). How am I supposed to deal with this (fetch is static and Account doesn't seem to be abstract)?
I tried to use the debug_backtrace() function to get some visibility into what may be going wrong, but it doesn't output relevant information. Is there any way I can get more debug information?
Thanks in advance,
Oracle Service Cloud uses lazy loading to populate the object variables from queried data when you use the Connect for PHP APIs. When you output an object, it will appear as if each variable is empty, as in your example. However, once you access a field, its value becomes available. This is only an issue when you try to print the object, as in this example. Accessing the data itself works immediately.
To print your object, like in your example, you would need to iterate through the object variables and access each one first. You could build a helper class to do that through reflection. But, to illustrate with a single field, do the following:
$acct = RNCPHP\Account::fetch($correctId);
$acct->ID;
print_r($acct); // Will now "show" ID, but none of the other fields have been loaded.
In the real world, you probably just want to operate on the data. So, even though you cannot "see" the data in the object, it's there. In the example below, we're accessing the updated time of the account and then performing an action on the object if it meets a condition.
// Disable the account if it was last updated more than 90 days ago (7776000 seconds)
$acct = RNCPHP\Account::fetch($correctId);
$chkDate = time() - 7776000;
if($acct->UpdatedTime < $chkDate){
$acct->Attributes->PermanentlyDisabled = true;
$acct->save(RNCPHP\RNObject::SuppressAll);
}
If you were to print_r the object after the if condition, then you would see the UpdatedTime variable data because it was loaded at the condition check.
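The behaviour described above (fields look empty in a dump until they are first accessed) can be reproduced with a few lines of Python. This is only an illustration of lazy loading in general; `LazyRecord` and its data are hypothetical and say nothing about how Connect for PHP is implemented internally.

```python
class LazyRecord:
    """Toy record whose fields are loaded only when first accessed."""

    def __init__(self, source):
        self._source = source  # stands in for the stored row
        self._loaded = {}      # fields that have been accessed so far

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for the record's data fields.
        if name.startswith("_") or name not in self._source:
            raise AttributeError(name)
        self._loaded[name] = self._source[name]
        return self._loaded[name]

    def dump(self):
        """Mimic print_r/var_dump: fields not yet loaded show up as None."""
        return {field: self._loaded.get(field) for field in self._source}

acct = LazyRecord({"ID": 32, "LookupName": "Jane Doe"})
before = acct.dump()   # nothing accessed yet
acct.ID                # first access loads the field
after = acct.dump()    # ID is now visible, LookupName still is not
print(before)
print(after)
```

A dump helper that wants to show everything would therefore have to touch each field first, which is what the reflection-based helper mentioned above would do.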

Getting mapping error. After dragging table with xml fields into dbml file and then compiling

"Error 1 DBML1005: Mapping between DbType 'Xml' and Type 'System.Xml.Linq.XElement' in Column 'XML_LAYOUT' of Type 'QUEST_BLOCK' is not supported."
The above is the error I am getting. I am dragging a table that has XML columns from Server Explorer into a dbml file, and when I compile I get the above error. I then changed the Server Data Type to blank, and now the program compiles successfully. But at runtime, if I query the table directly using WCF in Silverlight, the function shows an error. After debugging I found that the select statement on the table returns the rows in the function; the error is produced in the reference file, in the following function.
Public Function EndGetQuestionListRecord1(ByVal result As System.IAsyncResult) As ServiceReference1.QUEST_BLOCK Implements ServiceReference1.Medex.EndGetQuestionListRecord1
Dim _args((0) - 1) As Object
Dim _result As ServiceReference1.QUEST_BLOCK = CType(MyBase.EndInvoke("GetQuestionListRecord1", _args, result),ServiceReference1.QUEST_BLOCK)
Return _result
End Function
Hope someone around here could resolve this error...
rideonscreen, recently I started getting the same type of error. In my case I get it when dragging a stored procedure with an XML input parameter.
I wonder whether you managed to resolve the issue, and how.
I googled and found some articles:
http://dev.techmachi.com/?p=319
http://www.west-wind.com/Weblog/posts/505990.aspx
http://www.jonathanjungman.com/blog/post/Visual-Studio-Build-failed-due-to-validation-errors-in-dbml-file.aspx
"devenv /resetskippkgs" helps, but next day the issue appears again.
What is also interesting is that I do not touch the LINQ2SQL model (dbml file) at all. The code there has been the same for a long time. The issue is definitely related exclusively to Visual Studio.
P.S. I am thinking of migrating to EF.