Format of dates in log files PDI / Kitchen 4.0.1 - pentaho

I inherited a set of jobs, and their logging to the filesystem begins with the format {SEV} MM-dd HH:mm:ss. I need the year to be part of the timestamp.
The only log4j configs I can find are part of an old Jasper install, and modifying them to use log4j.appender.fileout.layout.conversionPattern=%d{yyyy-MM-dd} instead of ISO8601 as a test seems to have no effect.
Where else could the log line format be defined?

In Data Integration 4.2.1:
Index: src/log4j.xml
===================================================================
--- src/log4j.xml (revision 16273)
+++ src/log4j.xml (working copy)
@@ -32,7 +32,7 @@
I imagine it gets cached and reused throughout the life of the application).
-->
-
+
Index: src-core/org/pentaho/di/core/logging/LogWriter.java
===================================================================
--- src-core/org/pentaho/di/core/logging/LogWriter.java (revision 16273)
+++ src-core/org/pentaho/di/core/logging/LogWriter.java (working copy)
@@ -101,7 +101,7 @@
     // Play it safe, if another console appender exists for org.pentaho, don't add another one...
     //
     if (!consoleAppenderFound) {
-      Layout patternLayout = new PatternLayout("%-5p %d{dd-MM HH:mm:ss,SSS} - %m%n");
+      Layout patternLayout = new PatternLayout("%-5p %d{yyyy-MM-dd HH:mm:ss,SSS} - %m%n");
       ConsoleAppender consoleAppender = new ConsoleAppender(patternLayout);
       consoleAppender.setName(STRING_PENTAHO_DI_CONSOLE_APPENDER);
       pentahoLogger.addAppender(consoleAppender);
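For reference, the %d conversion in log4j's PatternLayout takes a java.text.SimpleDateFormat pattern, so the effect of the change above can be previewed with a small standalone sketch (the class name is only illustrative):

import java.text.SimpleDateFormat;
import java.util.Date;

public class PatternCheck {
    public static void main(String[] args) {
        Date now = new Date();
        // Pattern used by the stock console appender (no year)
        System.out.println(new SimpleDateFormat("dd-MM HH:mm:ss,SSS").format(now));
        // Pattern applied by the patch above (year included)
        System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS").format(now));
    }
}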

Related

How to log in hibernate which part of the code caused a given SQL

We can turn on all of the SQL-related logging with the following settings in Spring:
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.use_sql_comments=true
spring.jpa.properties.hibernate.format_sql=true
logging.level.org.hibernate.type=trace
If we have a standalone Hibernate / Spring Data call like
myEntityRepository.save(myEntity);
or
entityManager.persist(myEntity);
then it is easy to debug what happened just by reading the generated SQL from the log.
But how would you debug it when there isn't any explicit ORM action, like here:
@Transactional
void doHundredOfTask(Long id) {
    MyEntity myEntity = myEntityRepository.findById(id);
    // here comes a ton of actions on the entity, like setting fields
    // and setting/adding to collections:
    // myEntity.setField1()
    // myEntity.setField2()
    // ...
    // myEntity.setField_N()
    // myEntity.getSomeList().get(0).setSomeField()
    // no ORM action
}
In the end we don't explicitly save anything, but after the transaction Hibernate will flush the changes, so a massive amount of SQL shows up in the log. If you have a ton of actions on the entity and on its associations, it is extremely hard to debug why a given SQL statement was triggered.
Is there a way to assign the generated SQL to the triggering code in the log?
Edit: right now all I can do is split the code into smaller chunks or comment out parts of it, but this process is slow.
p6spy can print a stack trace for each executed SQL statement. Here is the configuration setting to enable this: stacktrace=true.
How to configure p6spy for maven project:
Add p6spy dependency
<dependency>
    <groupId>p6spy</groupId>
    <artifactId>p6spy</artifactId>
    <version>3.9.1</version>
</dependency>
Wrap the jdbc connection with p6spy:
spring.datasource.url=jdbc:p6spy:mysql://localhost:3306/xxx
spring.datasource.driver-class-name=com.p6spy.engine.spy.P6SpyDriver
Add spy.properties config src/main/resources/spy.properties
stacktrace=true
appender=com.p6spy.engine.spy.appender.Slf4JLogger
logMessageFormat=com.p6spy.engine.spy.appender.MultiLineFormat
You can remove the properties below:
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.use_sql_comments=true
spring.jpa.properties.hibernate.format_sql=true
With this configuration, p6spy will output the SQL and the stack trace. For example:
select x0_.id as id1_7_ from X x0_
15:10:16.166 default [main] INFO c.p.e.spy.appender.Slf4JLogger[logException]-39 -
java.lang.Exception: null
at com.p6spy.engine.common.P6LogQuery.doLog(P6LogQuery.java:126)
...
at org.hibernate.loader.Loader.getResultSet(Loader.java:2341)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2094)
...
at com.springapp.Test.test(Test.java:36)
...
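If wrapping the JDBC driver is not an option, a similar effect can be achieved inside Hibernate itself with a StatementInspector, registered through the hibernate.session_factory.statement_inspector property (for Spring Boot: spring.jpa.properties.hibernate.session_factory.statement_inspector=com.example.TracingStatementInspector). The sketch below only illustrates that approach and is not part of the p6spy setup above; the package and class names are made up:

package com.example;

import org.hibernate.resource.jdbc.spi.StatementInspector;

// Logs a stack trace for every SQL statement Hibernate prepares, so the
// triggering code path appears next to the statement in the log.
public class TracingStatementInspector implements StatementInspector {

    @Override
    public String inspect(String sql) {
        // In real code, route this through your logging framework instead of stderr.
        new Exception("SQL triggered: " + sql).printStackTrace();
        return sql; // return the statement unchanged
    }
}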

Beam Job Creates BigQuery Table but Does Not Insert

I am writing a beam job that is a simple 1:1 ETL from a binary protobuf file stored in GCS into BigQuery. The table schema is quite large, and generated automatically from a representative protobuf.
I am encountering behavior where the BigQuery table is created successfully, but no records are inserted. I have confirmed that records are being generated by the earlier stage, and when I use a normal file sink I can confirm that records are written.
Does anyone know why this is happening?
Logs:
WARNING:root:Inferring Schema...
WARNING:root:Unable to find default credentials to use: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Connecting anonymously.
WARNING:root:Defining Beam Pipeline...
<PATH REDACTED>/venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py:1145: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
experiments = p.options.view_as(DebugOptions).experiments or []
WARNING:root:Running Beam Pipeline...
WARNING:root:extracted {'counters': [MetricResult(key=MetricKey(step=extract_games, metric=MetricName(namespace=__main__.ExtractGameProtobuf, name=extracted_games), labels={}), committed=8, attempted=8)], 'distributions': [], 'gauges': []} games
Pipeline Source:
def main(args):
    DEFAULT_REPLAY_IDS_PATH = "./replay_ids.txt"
    DEFAULT_BQ_TABLE_OUT = "<PROJECT REDACTED>:<DATASET REDACTED>.games"
    # configure logging
    logging.basicConfig(level=logging.WARNING)
    # set up replay source
    replay_source = ETLReplayRemoteSource.default()
    # TODO: load the example replay and parse schema
    logging.warning("Inferring Schema...")
    sample_replay = replay_source.load_replay(DEFAULT_REPLAY_IDS[0])
    game_schema = ProtobufToBigQuerySchemaGenerator(
        sample_replay.analysis.DESCRIPTOR).schema()
    # print("GAME SCHEMA:\n{}".format(game_schema))  # DEBUG
    # submit beam job that reads replays into bigquery
    def count_ones(word_ones):
        (word, ones) = word_ones
        return (word, sum(ones))
    with beam.Pipeline(options=PipelineOptions()) as p:
        logging.warning("Defining Beam Pipeline...")
        # replay_ids = p | "create_replay_ids" >> beam.Create(DEFAULT_REPLAY_IDS)
        (p | "read_replay_ids" >> beam.io.ReadFromText(DEFAULT_REPLAY_IDS_PATH)
           | "extract_games" >> beam.ParDo(ExtractGameProtobuf())
           | "write_out_bq" >> WriteToBigQuery(
               DEFAULT_BQ_TABLE_OUT,
               schema=game_schema,
               write_disposition=BigQueryDisposition.WRITE_APPEND,
               create_disposition=BigQueryDisposition.CREATE_IF_NEEDED)
        )
        logging.warning("Running Beam Pipeline...")
        result = p.run()
        result.wait_until_finish()
        n_extracted = result.metrics().query(
            MetricsFilter().with_name('extracted_games'))
        logging.warning("extracted {} games".format(n_extracted))

Pentaho Data Integration: Error Handling

I'm building out an ETL process with Pentaho Data Integration (CE) and I'm trying to operationalize my transformations and jobs so that they can be monitored. Specifically, I want to be able to catch any errors and then send them to an error reporting service like Honeybadger or New Relic. I understand how to do row-level error reporting, but I don't see a way to do job or transformation failure reporting.
Here is an example job.
The down path is where the transformation succeeds but has row errors. There we can just filter the results and log them.
The path to the right is the case where the transformation fails altogether (e.g. the DB credentials are wrong). This is where I'm having trouble: I can't figure out how to get the error info to be sent.
How do I capture transformation failures to be logged?
You cannot capture job-level error details inside the job itself.
However, there are other options for monitoring.
The first option is database logging for transformations or jobs (see the "Log" tab in the job/transformation settings dialog). This way you always have up-to-date information about the execution status, so you can, say, write a job that periodically scans the logging database and sends error reports wherever you need; a rough sketch of such a scanner follows below.
That said, this option is fairly heavyweight to develop and support, and not very flexible for further modifications. So in our company we ended up monitoring at the job-execution level: when you run a job with kitchen.bat and it fails for any reason, kitchen exits with an error status, which you can easily examine and act on with whatever tools you like: .bat commands, PowerShell or (in our case) Jenkins CI.
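As a rough illustration of such a scanner (not part of the answer above), the following JDBC sketch assumes the standard PDI transformation log table columns (TRANSNAME, STATUS, ERRORS, LOGDATE); the JDBC URL, credentials, the TRANS_LOG table name and the report() hook are all placeholders to be replaced with your own:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

public class PdiLogTableMonitor {

    public static void main(String[] args) throws Exception {
        // Connection details and table name are placeholders for your logging database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/etl_logs", "etl", "secret");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT TRANSNAME, STATUS, ERRORS FROM TRANS_LOG "
                + "WHERE ERRORS > 0 AND LOGDATE > ?")) {
            // Only look at executions from the last 15 minutes.
            ps.setTimestamp(1, Timestamp.from(Instant.now().minus(Duration.ofMinutes(15))));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    report(rs.getString("TRANSNAME"), rs.getString("STATUS"), rs.getLong("ERRORS"));
                }
            }
        }
    }

    // Placeholder: push the failure to Honeybadger, New Relic, email, etc.
    private static void report(String transName, String status, long errors) {
        System.err.printf("Transformation %s finished with status %s and %d error(s)%n",
                transName, status, errors);
    }
}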
You could use the writeToLog("e", "Message") function in the Modified Java Script step.
Documentation:
// Writes a string to the defined Kettle Log.
//
// Usage:
// writeToLog(var);
// 1: String - The Message which should be written to
// the Kettle Debug Log
//
// writeToLog(var,var);
// 1: String - The Type of the Log
// d - Debug
// l - Detailed
// e - Error
// m - Minimal
// r - RowLevel
//
// 2: String - The Message which should be written to
// the Kettle Log

Spark execution occasionally gets stuck at mapPartitions at Exchange.scala:44

I am running a Spark job on a two-node standalone cluster (v 1.0.1).
Spark execution often gets stuck at the task mapPartitions at Exchange.scala:44.
This happens at the final stage of my job in a call to saveAsTextFile (as I expect from Spark's lazy execution).
It is hard to diagnose the problem because I never experience it in local mode with local IO paths, and occasionally the job on the cluster does complete as expected with the correct output (same output as with local mode).
This seems possibly related to reading from s3 (of a ~170MB file) immediately prior, as I see the following logging in the console:
DEBUG NativeS3FileSystem - getFileStatus returning 'file' for key '[PATH_REMOVED].avro'
INFO FileInputFormat - Total input paths to process : 1
DEBUG FileInputFormat - Total # of splits: 3
...
INFO DAGScheduler - Submitting 3 missing tasks from Stage 32 (MapPartitionsRDD[96] at mapPartitions at Exchange.scala:44)
DEBUG DAGScheduler - New pending tasks: Set(ShuffleMapTask(32, 0), ShuffleMapTask(32, 1), ShuffleMapTask(32, 2))
The last logging I see before the task apparently hangs/gets stuck is:
INFO NativeS3FileSystem: Opening key '[PATH_REMOVED].avro' for reading at position '67108864'
Has anyone else experienced non-deterministic problems related to reading from S3 in Spark?

log4php - Change log file Name dynamically in log4php.properties

Hi, how can I change the log file name and path configured in log4php.properties dynamically?
log4php.appender.A8.File=../logs/logs.log
Thanks
Two useful pieces of information:
(1) The previous answer by user367134 is helpful; however, it has a bug: when setting the level, you should not pass the constant integer value denoted by LoggerLevel::DEBUG. You should instead use the LoggerLevel::toLevel() function to obtain a LoggerLevel object.
i.e.,
$rootlogger->setLevel(LoggerLevel::DEBUG);
Should instead be:
$rootlogger->setLevel(LoggerLevel::toLevel(LoggerLevel::DEBUG));
(2) Here is a similar example to the one above, with a few differences:
uses rolling log files (max size of each log file is 100MB and at most 10 are kept)
uses a custom pattern for the log lines
fixes the setLevel bug
sets the log level at INFO
The code:
$rootlogger = Logger::getRootLogger();
$rootlogger->setLevel(LoggerLevel::toLevel(LoggerLevel::INFO));
$appender = new LoggerAppenderRollingFile("MyAppender");
$appender->setFile("custom_name.log", true);
$appender->setMaxBackupIndex(10);
$appender->setMaxFileSize("100MB");
$appenderlayout = new LoggerLayoutPattern();
$pattern = '%d{Y-m-d H:i:s} [%p] %c: %m (at %F line %L)%n';
$appenderlayout->setConversionPattern($pattern);
$appender->setLayout($appenderlayout);
$appender->activateOptions();
$rootlogger->removeAllAppenders();
$rootlogger->addAppender($appender);
$rootlogger->info("info");
Well, it's not my code, but here is the sample code and a link to the site:
require_once('log4php/Logger.php');
$rootlogger = Logger::getRootLogger();
$rootlogger->setLevel(LoggerLevel::DEBUG);
$appender = new LoggerAppenderFile("MyAppender");
$appender->setFile("mylogfile.log", true);
$appenderlayout = new LoggerLayoutTTCC();
$appender->setLayout($appenderlayout);
$appender->activateOptions();
$rootlogger->removeAllAppenders();
$rootlogger->addAppender($appender);
$rootlogger->info("info");
$rootlogger->error("error");
$rootlogger->debug("debug");
Actual Site Link
Credit goes to "AKJOL"