Cannot stop then start Geode server with a disk store - gemfire

I start Geode with a disk store using the example config. Then if I stop the server and start it again I get:
Exception in thread "main" java.lang.IllegalStateException: Detected multiple disk store initialization files named "BACKUPDeal.if". This disk store directories must only contain one initialization file.
If I remove the .if file, then the error is: Exception in thread "main" java.lang.IllegalStateException: The init file "D:\deal\BACKUPdeal.if" does not exist. If it no longer exists then delete the following files to be able to create this disk store. Existing oplogs are: [D:\deal\BACKUPdeal_1.drf, D:\deal\BACKUPdeal_1.crf]
What am I missing here?

The example config defines 2 disk stores, and I had renamed both to the same name, so Geode could not initialize 2 disk stores from one set of files. To solve it I removed the 2nd disk store from the config, like so:
<disk-store name="deal"
            compaction-threshold="40"
            auto-compact="false"
            allow-force-compaction="true"
            max-oplog-size="512"
            queue-size="10000"
            time-interval="15"
            write-buffer-size="65536"
            disk-usage-warning-percentage="80"
            disk-usage-critical-percentage="98">
  <disk-dirs>
    <disk-dir>d:\deal</disk-dir>
  </disk-dirs>
</disk-store>
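For completeness, a region can then reference that single disk store by name in cache.xml. A minimal sketch; the region name "deals" and the PARTITION_PERSISTENT shortcut are assumptions, not taken from the original config:

```xml
<region name="deals" refid="PARTITION_PERSISTENT">
  <!-- Point the region at the one remaining disk store defined above. -->
  <region-attributes disk-store-name="deal" disk-synchronous="true"/>
</region>
```

With only one disk store named "deal" left in the config, the directory contains a single set of BACKUPdeal files and the duplicate-initialization-file error goes away on restart.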

Related

using txm_module_manager_memory_load to load Module in ThreadX

I am trying to load 2 modules differently:
Module_1 using txm_module_manager_memory_load
Module_2 using txm_module_manager_in_place_load
Results: just after loading them and starting Module_1, it throws a UsageFault error (Module_2 hasn't started yet). Both modules share the same byte pool created by txm_module_manager_initialize.
I could not catch the error, since all the APIs return TX_SUCCESS.
Referring to the x-cube-azrtos-h7 TX-MPU example, what would I change in app_threadX to load Module_1 properly?
Or is it something to specify in the linker file STM32H7xx_FLASH.ld?
EDIT: the following is the latest thread status and the "_txm_module_manager_memory_fault_info" value:
ThreadX will allocate some memory from the byte pool created in txm_module_manager_initialize and copy Module_1 from wherever it is located into that allocated memory. I assume txm_module_manager_memory_load returns TX_SUCCESS, as does txm_module_manager_start when you start Module_1. Can you step through the scheduler, and when it schedules a thread from Module_1 (the first thread it will schedule is the "Module Start Thread" that gets created in txm_module_manager_start), see how far into the module execution the usage fault occurs?
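To make that flow concrete, here is a minimal sketch of the sequence the answer describes (the pool size, the module name string, and the module_1_binary location symbol are assumptions; error handling is reduced to return-code checks):

```c
#include "txm_module.h"

static UCHAR module_pool[65536];          /* memory handed to the module manager */
static TXM_MODULE_INSTANCE module_1;

extern UCHAR module_1_binary[];           /* assumed: Module_1's image in flash */

void load_and_start_module_1(void)
{
    /* Create the byte pool that both modules will share. */
    if (txm_module_manager_initialize(module_pool, sizeof(module_pool)) != TX_SUCCESS)
        return;

    /* Copy Module_1 from its storage location into pool memory. */
    if (txm_module_manager_memory_load(&module_1, "Module_1", module_1_binary) != TX_SUCCESS)
        return;

    /* Start it: this creates the "Module Start Thread" that the scheduler
       runs first; the usage fault occurs somewhere after this point. */
    if (txm_module_manager_start(&module_1) != TX_SUCCESS)
        return;
}
```

Since every call above returns TX_SUCCESS in the reported scenario, the fault must happen inside the module's own code, which is why stepping from the Module Start Thread onward is the suggested next step.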

Spark - Failed to load collect frame - "RetryingBlockFetcher - Exception while beginning fetch"

We have a Scala Spark application that reads roughly 70K records from the DB into a data frame; each record has 2 fields.
After reading the data from the DB, we do some minor mapping and load the result as a broadcast variable for later use.
Now, in the local environment, we get an exception, a timeout from the RetryingBlockFetcher, while running the following code:
dataframe.select("id", "mapping_id")
  .rdd.map(row => row.getString(0) -> row.getLong(1))
  .collectAsMap().toMap
The exception is:
2022-06-06 10:08:13.077 task-result-getter-2 ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /1.1.1.1:62788
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
In the local environment, I simply create the Spark session with "spark.master" set to local.
When I limit the maximum number of records to 20K, it works well.
Can you please help? Maybe I need to configure something in my local environment so that the original code works properly?
Update:
I tried changing a lot of Spark-related configuration in my local environment (memory, number of executors, timeout-related settings, and more), but nothing helped; I just hit the timeout after a longer wait.
I realized that the data frame I read from the DB has a single partition of 62K records; after repartitioning it into 2 or more partitions, the process worked correctly and I managed to map and collect as needed.
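For reference, the repartition workaround described above can be sketched like this (the partition count of 2, the local session setup, and the tiny stand-in data frame are assumptions; adjust to your data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")          // local environment, as in the question
  .appName("mapping-broadcast")
  .getOrCreate()

// Stand-in for the 62K-row frame read from the DB.
val dataframe = spark.createDataFrame(Seq(("a", 1L), ("b", 2L)))
  .toDF("id", "mapping_id")

val mapping: Map[String, Long] =
  dataframe
    .repartition(2)            // split the single large partition before collecting
    .select("id", "mapping_id")
    .rdd
    .map(row => row.getString(0) -> row.getLong(1))
    .collectAsMap()
    .toMap

val broadcastMapping = spark.sparkContext.broadcast(mapping)
```

Repartitioning means the collect fetches several smaller blocks instead of one block holding all 62K rows, which appears to be what was tripping the block fetcher locally.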
Any idea why this solves the issue? Is there a Spark configuration that can solve this instead of repartitioning?
Thanks!

Cache partition not replicated

I have 2 nodes with persistence enabled. I create a cache like so:
// all the queues across the frontier instances
CacheConfiguration cacheCfg2 = new CacheConfiguration("queues");
cacheCfg2.setBackups(backups);
cacheCfg2.setCacheMode(CacheMode.PARTITIONED);
globalQueueCache = ignite.getOrCreateCache(cacheCfg2);
where backups is a value > 1
When one of the nodes dies, I get:
Exception in thread "Thread-2" javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=queues, part=2]
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:597)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:519)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:517)
at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:3482)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:516)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:843)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:418)
at crawlercommons.urlfrontier.service.ignite.IgniteService$QueueCheck.run(IgniteService.java:270)
Caused by: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=queues, part=2]
... 9 more
I expected the content to have been replicated onto the other node. Why isn't that the case?
Most likely there is a misconfiguration somewhere. Check the following:
you are not working with an existing cache (replace getOrCreateCache with createCache)
you do not have more server nodes than the backup factor covers
inspect the logs for the "Detected lost partitions" message and what happened prior to it
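To illustrate the checks above, here is a sketch of creating the cache with an explicit partition-loss policy and resetting lost partitions afterwards (the backups value of 1, the choice of READ_WRITE_SAFE, and starting a node via Ignition.start() are assumptions, not taken from the original setup):

```java
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueuesCache {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        CacheConfiguration<String, Object> cfg = new CacheConfiguration<>("queues");
        cfg.setCacheMode(CacheMode.PARTITIONED);
        cfg.setBackups(1); // one backup tolerates the loss of one node

        // Fail reads and writes on lost partitions instead of silently
        // returning partial data.
        cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

        // createCache (not getOrCreateCache) guarantees a stale cache from a
        // previous run is not reused with its old configuration.
        ignite.createCache(cfg);

        // If partitions were genuinely lost (more nodes down than backups),
        // acknowledge it explicitly once the data is accepted as gone:
        ignite.resetLostPartitions(Collections.singleton("queues"));
    }
}
```

Note that with persistence enabled, a cache created in a previous run keeps its original configuration on disk, so a later change to the backups value in code has no effect on the existing cache; that is one way queries can fail with lost partitions even though backups looks correct.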

Copy data activity fails when sink is immutable

In my Azure data factory pipeline, I'm using a Copy data activity inside a ForEach activity to copy files from an input container to an archive container before processing the files in the input container. This normally works, but today I made the archive container immutable by adding a legal hold policy to it, and the next time the copy data activity ran, it failed with an error (see below). Is there any way around this, since you should be able to add new files to an immutable container?
Error code: 2200
Failure type: User configuration issue
Details:
Failure happened on 'Sink' side. ErrorCode=AdlsGen2OperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'Conflict'. Account: 'mydatalake'. FileSystem: 'raw'. Path: 'Source/ABC/File_2021_03_24.csv'. ErrorCode: 'PathImmutableDueToLegalHold'. Message: 'This operation is not permitted as the path is immutable due to one or more legal holds.'. RequestId: '37f75e88-501a-0026-2fa1-20d52e000000'. TimeStamp: 'Wed, 24 Mar 2021 11:30:54 GMT'..,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.Azure.Storage.Data.Models.ErrorSchemaException,Message=Operation returned an invalid status code 'Conflict',Source=Microsoft.DataTransfer.ClientLibrary,'
Source: Pipeline LoadMyData

how to run multiple coordinators in oozie bundle

I'm a fresher with Oozie bundles. I want to run multiple coordinators one after another in a bundle job. My requirement is: after one coordinator job completes, a _SUCCESS file is generated, and that _SUCCESS file should trigger the second coordinator. I don't know how to do that. I used the data-dependency technique, which keeps track of the output files generated by the previous coordinator. I'm sharing some code that I tried.
Let's say there are 2 coordinator jobs, A and B. I want to trigger only coordinator A, and only when the _SUCCESS file for coordinator A is generated should coordinator B start.
A - coordinator.xml
<workflow>
<app-path>${aDir}/aWorkflow</app-path>
</workflow>
This will call the respective workflow, and the _SUCCESS file is generated at ${aDir}/aWorkflow/final_data/${date}/aDim, so I included this location in coordinator B:
<dataset name="input1" frequency="${freq}" initial-instance="${START_TIME1}" timezone="UTC">
  <uri-template>${aDir}/aWorkflow/final_data/${date}/aDim</uri-template>
  <done-flag>_SUCCESS</done-flag>
</dataset>
<data-in name="coordInput1" dataset="input1">
  <instance>${START_TIME1}</instance>
</data-in>
<workflow>
  <app-path>${bDir}/bWorkflow</app-path>
</workflow>
But when I run it, the first coordinator gets KILLED by itself, although when I run them individually they run successfully. I don't understand why they are all getting KILLED.
Help me sort this out.
I found an easy way to do it. I'm sharing the solution: the coordinator.xml for coordinator B.
1) The dataset instance should be the start time of the second coordinator, not the time instance of the first coordinator; otherwise that coordinator will get KILLED.
2) If you want to run multiple coordinators one after another, you can also include controls in coordinator.xml, e.g. concurrency, timeout or throttle. Detailed information about these controls is in chapter 6 of the "Apache Oozie" book.
3) In <instance> I included latest(0); it will take the latest generated folder in the mentioned output path.
4) For "input-events" it is mandatory to include its name as an input to ${coord:dataIn('coordInput1')}; otherwise Oozie will not consider the dataset.
<controls>
  <timeout>30</timeout>
  <concurrency>1</concurrency>
</controls>
<datasets>
  <dataset name="input1" frequency="${freq}" initial-instance="${START_TIME1}" timezone="UTC">
    <uri-template>${aimDir}/aDimWorkflow/final_data/${date}/aDim</uri-template>
    <done-flag>_SUCCESS</done-flag>
  </dataset>
</datasets>
<input-events>
  <data-in name="coordInput1" dataset="input1">
    <instance>${coord:latest(0)}</instance>
  </data-in>
</input-events>
<action>
  <workflow>
    <app-path>${bDir}/bWorkflow</app-path>
    <configuration>
      <property>
        <name>input_files</name>
        <value>${coord:dataIn('coordInput1')}</value>
      </property>
    </configuration>
  </workflow>
</action>