Issue with executing spark sql job using oozie action - apache-spark-sql

Facing a weird issue, trying to execute a spark-sql(Spark2) job using oozie action but the behavior of execution is quite weird, at times it executes fine but sometimes it continues to be in "Running" state forever, on checking the logs got the below issue.
WARN org.apache.spark.scheduler.cluster.YarnClusterScheduler` - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The strange thing is that we have already provided sufficient resources, the same can be seen from spark environment variables as well and as well under the cluster resources(cluster has sufficient cores and RAM).
<spark-opts>--executor-memory 10G --num-executors 7 --executor-cores 3 --driver-memory 8G --driver-cores 2</spark-opts>
With the same configuration sometimes it is executing fine as well. Are we missing something?

The issue was related to jar conflict,following are the suggestions to identify the same.
a)Check the maven dependency tree to make sure there is no transitive dependency conflict.
b)While spark job is running check the environment variables being used using Spark UI.
c)Resolve the conflict and run a maven clean package.

Related

testcontainers-python hanging while showing "waiting to be ready...", then fails

I'm running my unit testing code for neo4j.
My environment:
Ubuntu 20.04LTS server
1Gb Memory
1CPU
Here is what is displayed in the console:
====================================== test session starts ======================================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: ~/morsvq, configfile: pytest.ini
plugins: mock-3.8.2
collected 2 items
---------------------------------------- live log setup -----------------------------------------
INFO testcontainers.core.container:container.py:52 Pulling image neo4j:latest
INFO testcontainers.core.container:container.py:63 Container started: ad7963ed01
INFO testcontainers.core.waiting_utils:waiting_utils.py:46 Waiting to be ready...
INFO testcontainers.core.waiting_utils:waiting_utils.py:46 Waiting to be ready...
ERROR neo4j:__init__.py:571 Failed to read from defunct connection IPv4Address(('localhost', 49153)) (IPv4Address(('127.0.0.1', 49153)))
The same code runs successfully on a faster virtual machine with 8Gb Memory. So the code itself shouldn't be faulty. My suspision is that there is something to do with my configuration, so that it now consumes to much memory?
I've checked the official websites' documentation, but it doesn't mention the memory problem. I wonder if someone has encountered similar problem? How to fix this?
Disclaimer: I am a maintainer of tc-java, so I have only some basic experience with tc-python. However, some facts and constraints are universal across Testcontainers language implementations.
As you already wrote, the code runs fine on a more powerful machine, while it fails on an extremely limited machine. 1GB of RAM is not much, I would expect it is generally not enough to successfully start a Neo4j Docker container without memory swapping. Swapping would make the startup and interactions very slow, hence the startup timeout triggers.
For further debugging, you can run the Neo4j container directly using Docker CLI on your environment and see how it behaves.

How to have dbt run with log error FileNotFound to trigger an exit code that isn't 0?

We're running dbt version 0.16.1. We've set up our data pipeline to run in Airflow, and have a library set up to map each dbt model run within it's own bash operator on Airflow.
The dbt run command executed is as follows:
cd /usr/local/airflow/models/[PACKAGE_NAME] && dbt --log-format json run --models [MODEL_NAME]--no-version-check --profiles-dir=/usr/local/airflow/dags/dags-enterprise-model/enterprise_model/include --target=[TARGET] --profile=[PROFILE]
Occasionally (likely when two models are being run at the same time), Airflow will show the following message from within the dbt run command:
INFO - FileNotFoundError: [Errno 2] No such file or directory: 'logs/dbt.log' -> 'logs/dbt.log.1'
This is problematic because the logfiles do not get updated, but the exit code of the task is listed a 0:
Command exited with return code 0
This causes Airflow to mark the task as a success; however, the log wasn't printed successfully.
My questions:
Is there a a way for these errors to be raised as an actual error?
Failing that, is there a way to specific a unique log file?
I'm not sure if this is a gap in my understand, a bug within dbt's logging, or maybe both?
It definitely sounds like this is the result of invoking dbt multiple times simultaneously, while having it write to the same files. It's not a dbt bug because we don't intend for dbt to be invoked simultaneously; a single invocation can handle concurrent model runs via threads. Log collisions are one risk of reimplementing dbt's model DAG as Airflow DAGs.
Those are both fair questions:
Historically, dbt only used two log levels: debug and info. See the comment on a related issue: dbt#2680. I totally appreciate that Airflow and other orchestration tools have well defined notification behaviors when presented with different log levels. A community member actually just opened a PR to add error-level logging (dbt#2723).
It is possible to set a custom log path for a dbt invocation using the log-path config in dbt_project.yml (docs)

I'm having trouble with extended entities

This question is related to I need help upgrading OroCommerce to 4.1.1.
I'm getting several errors related to extended entities... I believe there must be something wrong with cache building but I can't find the root cause (nor a solution :( ).
I checked the db structure in my production server against the VM where everything is working just fine and I can't see any significant difference (meaning the new fields such as digitalAsset_id for oro_attachment_file table or wysiwyg for oro_fallback_localization_val are there).
I just run an extra php bin/console oro:migration:load --force -e prod it didn't make a difference...
Edit:
Just checked the differences in the var/cache directory of both installations and in fact I see that the VM version has the methods that are missing from the prod one.
I uploaded the working code into the production server and re run the platform upgrade but I'm still running into issues.
In case oro:migration:load command (or oro:platform:update that actually triggers migration load) failed for the first time, you have to:
fix errors,
restore from the database dump
and run the command again.
Otherwise, there could be migrations that end up with errors,
but on the second run, they are not executed again, which could lead to the mess with the database schema, entity metadata, or entity config.
Also oro:migration:load command is not self-sufficient. There could be a need to warm up some entity configuration after the schema change. Please, try to run oro:platform:update, even if all the migrations are already executed, it would try to warm up all the caches and could fix an error.

Use of Enable blocking in PDI - Pig Script Executor

I am exploring Big data plugin in Pentaho 5.2. I was trying to run Pig Script executor. I am unable to understand the usage of
Enabling Blocking. The PDI documentation says that
If checked, the Pig Script Executor job entry will prevent downstream
entries from executing until the script has finished processing.
I am aware that running a pig script will convert the execution to Map reduce jobs. I am running the job with Start job -> Pig Script. If I disable the Enable blocking step I am unable to execute the script. I am getting permission denied errors. As per the documentation " ".
What does downstream mean here. I do not pass any hops from the pig script out. I am unable to understand the Enable blocking step. Any hints can be helpful and will be appreciated.
Enable blocking: the task is deployed to the Hadoop cluster; PDI will follow up on progress and only proceed with the rest of the job tasks AFTER the execution of the Hadoop job finishes;
Enable blocking is disabled: PDI deploys the task to the Hadoop cluster and forgets about it. The rest of the job tasks proceed immediately after the cluster accepts the task, but doesn't wait for it to complete.

Weblogic forces recompile of EJBs when migrating from 9.2.1 to 9.2.3

I have a few EJBs compiled with Weblogic's EJBC complient with Weblogic 9.2.1.
Our customer uses Weblogic 9.2.3.
During server start Weblogic gives the following message:
<BEA-010087> <The EJB deployment named: YYY.jar is being recompiled within the WebLogic Server. Please consult the server logs if there are any errors. It is also possible to run weblogic.appc as a stand-alone tool to generate the required classes. The generated source files will be placed in .....>
Consequently, server start takes 1.5 hours instead of 20 min. The next server start takes exactly the same time, meaning Weblogic does not cache the products of the recompilation. Needless to say, we cannot recompile all our EJBs to 9.2.3 just for this specific customer, so we need an on-site solution.
My questions are:
1. Is there any way of telling Weblogic to leave those EJB jars as they are and avoid the re-compilation during server start?
2. Can I tell Weblogic to cache the recompiled EJBs to avoid prolonged restarts?
Our current workaround was to write a script that does this recompilation manually before the EAR's creation and deployment (by simply running java weblogic.appc <jar-name>), but we would rather avoid this solution being used in production.
I FIXED this problem by spending a great deal of time researching
and decompiling some classes.I encountered this when migrating from weblogic8 to 10
by this time you might have understood the pain in dealing with oracle weblogic tech support.
unfortunately they did not have a server configuration setting to disable this
You need to do 2 things
Step 1.You if you open the EJB jar files you can see
ejb-jar.xml=3435671213
com.mycompany.myejbs.ejb.DummyEJBService=2691629828
weblogic-ejb-jar.xml=3309609440
WLS_RELEASE_BUILD_VERSION_24=10.0.0.0
you see these hascodes for each of your ejb names.Make these hadcodes zero.
pack the jar file and deploy it on server.
com.mycompany.myejbs.ejb.DummyEJBService=0
weblogic-ejb-jar.xml=0
This is just a Marker file that weblogic.appc keeps in each ejb jar to trigger the recompilation
during server boot up.i automated this process of making these hadcodes to zero.
This hashcodes remain the same for each ejb even if you execute appc for more than once
if you add a new EJB class or delete a class those entries are added to this marker file
Note 1:
how to get this file?
if you open domains/yourdomain/servers/yourServerName/cache/EJBCompilerCache/XXXXXXXXX
you will see this file for each ejb.weblogic makes the hashcodes to zero after it recompiles
Note 2:
When you generate EJB using appc.generate them to a exploded directory using -output C:\myejb
instead of C:\myejb.jar.This way you can play around with the marker file
Step2.
Also you need a PATCH from weblogic.When you install the patch you see some message like this
"PATH CRXXXXXX installed successfully.Eliminate EJB recomilation for appc".
i dont remember the patch number but you can request weblogic for that.
You need to use both steps to fix the problem.The patch fixes only part of the problem
Goodluck!!
cheers
raj
the Marker file in EJBs is WL_GENERATED
Just to update the solution we went with - eventually we opted to recompile the EJBs once at the Customer's site instead of messing with the EJBs' internal markers (we don't want Oracle saying they cannot support problems derived from this scenario).
We created two KSH scripts - the first iterates over all the EJB jars, copies them to a temp dir and then re-compiles them in parallel by running several instances of the 2nd script which does only one thing: java -Drecompiler=yes -cp $CLASSPATH weblogic.appc $1 (With error handling of course :))
This solution reduced compilation time from 70min to 15min. After this we re-create the EAR file and redeploy it with the new EJBs. We do this once per several UAT environment creations, so we save quite a lot of time here (55min X num of envs per drop X num of drops)