When I call the Aerospike client's Write() I get this error:
22 AS_PROTO_RESULT_FAIL_FORBIDDEN
The error occurs only when the Write operation is called after a Truncate(), and only on specific keys.
I tried to:
change the key type (string, long, small numbers, big numbers)
change the Key type passed (Value, long, string)
change the number of retries on the WritePolicy
add a delay (200 ms, 500 ms) before every write
generate completely new keys (Guid.NewGuid().ToString())
None of these solved it, so I think the only remaining cause is the Truncate operation.
The error is systematic: for the same set of keys, it fails on exactly the same keys.
The error also occurs when, after calling Truncate, I wait several seconds and the Management Console shows the set's object count as 0.
I have to wait minutes (1 to 5) before running the process again to be sure the problem is gone.
The cluster has 3 nodes with a replication factor of 2 and SSD persistence.
I'm using the C# Aerospike.Client NuGet package, v3.4.4.
Running the process on a single local node (Docker, in-memory) does not produce any error.
How can I know when the Truncate() process (the delete operation behind it) has completely finished, so that I can safely use the set?
[Solution]
As suggested, our devops checked the time synchronization. He found that NTP was not enabled on the machine images (by mistake).
Enabled it. Tested again. No more errors.
Thanks,
Alex
Sounds like a potential issue with time synchronization across nodes; make sure you have NTP set up correctly. That would be my only guess at this point, especially as you mention it does work on a single node. The truncate command will capture the current time (if you don't specify a time) and will use that to prevent records written 'prior' to that time from being written. Check under /opt/aerospike/smd/truncate.smd (from the top of my head, sorry if not exactly this) to see on each node the timestamp of the truncate command, and check the time across the different nodes.
[Thanks #kporter for the comment. So the time would be the same in all truncate.smd files, but a time discrepancy between machines would then still cause writes to fail against some of the nodes.]
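For reference, truncate can also be given an explicit threshold time instead of letting each node default to "now". Below is a minimal sketch using the Aerospike Java client for illustration (the C# client exposes an equivalent Truncate overload; host, namespace, and set name are placeholders). Note that this only makes the threshold explicit; it does not compensate for skewed clocks, so NTP is still the real fix.
import java.util.Calendar;
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.policy.InfoPolicy;

public class TruncateExample {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        // Records whose last-update-time is before this instant are truncated;
        // writes that land after it survive.
        Calendar beforeLastUpdate = Calendar.getInstance();
        client.truncate(new InfoPolicy(), "test", "demoSet", beforeLastUpdate);
        client.close();
    }
}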
MarkLogic 9.0.8.2
We have around 20M records in our database in XML format.
To work with facets, we have created an element range index on the given element.
It is working fine, so no issue there.
The real problem is that we now want to deploy the same code to different environments: System Test (ST), UAT, and Production.
Before deploying code, we have to make sure the given index exists, so we create it one or two days in advance.
We noticed that until full indexing is completed, we can't deploy our code, or it starts showing errors like this:
<error:code>XDMP-ELEMRIDXNOTFOUND</error:code>
<error:name/>
<error:xquery-version>1.0-ml</error:xquery-version>
<error:message>No element range index</error:message>
<error:format-string>XDMP-ELEMRIDXNOTFOUND: cts:element-reference(fn:QName("","tc"), ("type=string", "collation=http://marklogic.com/collation/")) -- No string element range index for tc collation=http://marklogic.com/collation/ </error:format-string>
Once indexing finishes, the same code runs as expected.
Especially in ST/UAT, we are fine with getting partial data from unfinished indexing.
Is there any way we can achieve this? Otherwise we lose too much time just waiting for the index to finish.
This happens every time we come up with a new feature that depends on a new index.
You can only use a range index if it exists and is available. It is not available until all matching records have been indexed.
You should create your indexes earlier and allow enough time to finish reindexing before deploying code that uses them. Maybe make your code deployment depend upon the reindexing status and not allow for it to be deployed until it has completed.
If the new versions of your applications can function without the indexes (a value query instead of a range query), or you are fine with queries returning inaccurate results, then you could enable/disable the section of code utilizing them with feature flags, or wrap it with try/catch (see the sketch below), but you really should just create the indexes earlier in your deployment cycles.
Otherwise, if you are performing tests without a complete and functioning environment, what are you really testing?
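To illustrate the try/catch fallback mentioned above, here is a rough sketch using the MarkLogic Java Client API (host, credentials, the element name tc, and the search value are placeholders; exact API details may differ for your setup):
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.FailedRequestException;
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.StructuredQueryBuilder;

public class FacetFallback {
    public static void main(String[] args) {
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));
        QueryManager queryMgr = client.newQueryManager();
        StructuredQueryBuilder sqb = queryMgr.newStructuredQueryBuilder();

        SearchHandle results;
        try {
            // Fast path: range query backed by the element range index.
            results = queryMgr.search(
                    sqb.range(sqb.element("tc"), "xs:string",
                            StructuredQueryBuilder.Operator.EQ, "someValue"),
                    new SearchHandle());
        } catch (FailedRequestException e) {
            // XDMP-ELEMRIDXNOTFOUND: the index is absent or still building,
            // so fall back to a slower, index-free value query.
            results = queryMgr.search(
                    sqb.value(sqb.element("tc"), "someValue"),
                    new SearchHandle());
        }
        System.out.println("matches: " + results.getTotalResults());
        client.release();
    }
}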
I have a set of strings stored in Redis via Redisson. When I use Redisson to get the set and call size() on it (which returns a positive integer, indicating that there are items in the set), and then attempt to get an iterator and read from the set, the set now appears to contain zero elements. If I stop my application, restart it, skip getting the size, and only get the iterator, then I can iterate through the data. I have tried getting the set at application startup and keeping the reference to use throughout execution, and I have also tried using the client to get the set each time, but the behavior is the same with both approaches.
So how can I get the cardinality of the set, and then get an iterator to read through the set?
So, for some reason, I decided to configure my client in single-server mode, and the operations worked the way I expected them to: I could get the cardinality of the set, then iterate through it without any problem. So there must be some configuration problem with the slave nodes. But what I would like to know is why it consistently works the first time and consistently fails the next: I can always get the set size first, or I can always iterate over the set first, and then the next operation always implies that the set is empty. So I understand that improperly configured slave nodes would cause problems, but it is strange that the behavior is so consistent when things go wrong.
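For reference, here is the single-server configuration that behaved correctly for me, as a minimal sketch (address and key name are placeholders):
import org.redisson.Redisson;
import org.redisson.api.RSet;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class SizeThenIterate {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RSet<String> set = redisson.getSet("mySet");
        int size = set.size();        // cardinality first...
        for (String member : set) {   // ...then iterate, which now works
            System.out.println(member);
        }
        System.out.println("size=" + size);
        redisson.shutdown();
    }
}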
I'm attempting to migrate IngestionTime (_PARTITIONTIME) to TIMESTAMP partitioned tables in BQ. In doing so, I also need to add several required columns. However, when I flip the switch and redirect my dataflow to the new TIMESTAMP partitioned table, it breaks. Things to note:
Approximately two million rows (likely one batch) are successfully inserted. The job continues to run but doesn't insert anything after that.
The job runs in batches.
My project is entirely in Java.
When I run it as streaming, it appears to work as intended. Unfortunately, it's not practical for my use case and batch is required.
I've been investigating the issue for a couple of days and have tried to break the transition down into the smallest steps possible. It appears that the step responsible for the error is introducing REQUIRED columns (it works fine when the same columns are NULLABLE). To avoid any possible parsing errors, I've set default values for all of the REQUIRED columns.
At the moment, I get the following combination of errors and I'm not sure how to address any of them:
The first error repeats infrequently, but usually in groups:
Profiling Agent not found. Profiles will not be available from this worker
Occurs a lot and in large groups:
Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner
Appears to be one very large group of these:
Aborting Operations. java.lang.RuntimeException: Unable to read value from state
Towards the end, this error appears every 5 minutes, surrounded only by the mild parsing errors described below.
Processing stuck in step BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 20m00s without outputting or completing in state finish
Due to the sheer volume of data my project parses, there are several parsing errors, such as Unexpected character. They're rare but shouldn't break data insertion. If they do, I have a bigger problem: the data I collect changes frequently, and I can adjust the parser only after I see the error and, therefore, the new data format. Additionally, these errors don't break the ingestion-time table (or my other timestamp-partitioned tables). That being said, here's an example of a parsing error:
Error: Unexpected character (',' (code 44)): was expecting double-quote to start field name
EDIT:
Some relevant sample code:
public PipelineResult streamData() {
    try {
        GenericSection generic = new GenericSection(options.getBQProject(), options.getBQDataset(), options.getBQTable());
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply("Read PubSub Events", PubsubIO.readMessagesWithAttributes().fromSubscription(options.getInputSubscription()))
                .apply(options.getWindowDuration() + " Windowing", generic.getWindowDuration(options.getWindowDuration()))
                .apply(generic.getPubsubToString())
                .apply(ParDo.of(new CrowdStrikeFunctions.RowBuilder()))
                .apply(new BigQueryBuilder().setBQDest(generic.getBQDest())
                        .setStreaming(options.getStreamingUpload())
                        .setTriggeringFrequency(options.getTriggeringFrequency())
                        .build());
        return pipeline.run();
    }
    catch (Exception e) {
        LOG.error(e.getMessage(), e);
        return null;
    }
}
Writing to BQ. I did try to set the partitioning field here directly, but it didn't seem to affect anything:
BigQueryIO.writeTableRows()
    .to(BQDest)
    .withMethod(Method.FILE_LOADS)
    .withNumFileShards(1000)
    .withTriggeringFrequency(this.triggeringFrequency)
    .withTimePartitioning(new TimePartitioning().setType("DAY"))
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER);
After a lot of digging, I found the error. I had parsing logic (a try/catch) that returned nothing (essentially a null row) whenever there was a parsing error. This would break BigQuery, as my schema had several REQUIRED columns.
Since my job ran in batches, even one null row would cause the entire batch job to fail and insert nothing. This also explains why streaming inserted just fine. I'm surprised that BigQuery didn't throw an error saying that I was attempting to insert a null into a REQUIRED field.
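As a sketch of the fix (a Beam DoFn; parseToRow is a placeholder standing in for my real parsing logic), drop unparseable input instead of emitting an empty row:
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;

public class SafeRowBuilder extends DoFn<String, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        try {
            TableRow row = parseToRow(c.element());
            if (row != null) {
                c.output(row); // emit only fully populated rows
            }
            // on null: skip the element entirely; a null/empty row violates
            // the REQUIRED columns and fails the whole FILE_LOADS batch
        } catch (Exception e) {
            // log-and-drop: one malformed record must not sink the job
        }
    }

    private TableRow parseToRow(String raw) {
        // placeholder for the real parser
        return new TableRow().set("raw", raw);
    }
}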
In reaching this conclusion, I also realized that setting the partition field in my code was necessary, as opposed to just in the schema. It could be done using
.setField(partitionField)
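In the write step shown earlier, that means building the TimePartitioning like this (a sketch; "event_timestamp" is a placeholder for the actual TIMESTAMP column in the schema):
.withTimePartitioning(new TimePartitioning()
    .setType("DAY")
    .setField("event_timestamp"))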
I am currently working on a project with Infinispan 8.1.3. I want to make sure that the node that created an entry remains the owner of that entry at all times in distribution mode. Is there any option to meet this requirement? I have heard of the LOCAL_MODE flag, but it stores the entry locally only. I don't know whether, if that node goes down, the local cache entry will be shared with another node. Thanks.
Don't use flags unless you know exactly what you're doing. Flag.CACHE_MODE_LOCAL means that you won't execute any RPC when doing that operation, but if the key does not route to this node, a write will result in a noop and a read will return null.
It's not possible to tie the entry to the node exclusively - what would you do if this node crashes?
However, if the cluster is stable enough, there's the Key Affinity Service, which will give you a key that belongs to this node (sketched below). See the next chapter about grouping, too; it might fit your use case.
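A minimal sketch of the Key Affinity Service (the configuration file, cache name, and the RndKeyGenerator package are assumptions to check against your Infinispan version):
import java.util.concurrent.Executors;
import org.infinispan.Cache;
import org.infinispan.affinity.KeyAffinityService;
import org.infinispan.affinity.KeyAffinityServiceFactory;
import org.infinispan.affinity.RndKeyGenerator;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class LocalKeyExample {
    public static void main(String[] args) throws Exception {
        EmbeddedCacheManager cacheManager = new DefaultCacheManager("infinispan.xml");
        Cache<Object, String> cache = cacheManager.getCache("distCache");

        KeyAffinityService<Object> service =
                KeyAffinityServiceFactory.newLocalKeyAffinityService(
                        cache, new RndKeyGenerator(),
                        Executors.newSingleThreadExecutor(), 100);

        // A key that maps to the local node at this moment; ownership can
        // still move on a rebalance, so this is a hint, not a guarantee.
        Object localKey = service.getKeyForAddress(cacheManager.getAddress());
        cache.put(localKey, "value");
    }
}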
EDIT: Instead of moving the data to the executing node, you can move the execution towards the data. With the Grouping API you can find the data by its group, using:
Address owningNode = cache.getAdvancedCache().getDistributionManager()
        .getCacheTopology().getDistributionInfo(group).primary();
ClusterExecutor executor = cache.getCacheManager().executor()
        .filterTargets(Collections.singleton(owningNode));
executor.submit(...)
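For the group itself, the key class declares which part of it routes the entry (a sketch; grouping must also be enabled in the cache configuration, e.g. clustering().hash().groups().enabled()):
import org.infinispan.distribution.group.Group;

// All keys returning the same group value are stored on the same owners,
// so the executor snippet above runs where the whole group lives.
public class OrderKey {
    private final String region;
    private final long id;

    public OrderKey(String region, long id) {
        this.region = region;
        this.id = id;
    }

    @Group
    public String getRegion() {
        return region;
    }

    // equals() and hashCode() omitted for brevity; a real key needs both.
}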
I have the following scenario:
1. Fetch an array of numbers (from Redis) conditionally
2. For each number do some async stuff (fetch something from the DB based on the number)
3. For each thing in the result set from the DB do more async stuff
Periodically repeat 1, 2, 3, because new numbers will constantly be added to the Redis structure. Those numbers represent Unix timestamps in milliseconds, so out of the box they will always be sorted by time of addition.
"Conditionally" means: fetch those Unix timestamps from Redis that are less than or equal to the current Unix timestamp in milliseconds (Date.now()).
The question is which Redis data type fits this use case best, keeping in mind that this code will be scaled up to N instances, so N instances will share access to a single Redis instance. To share the load equally, each instance will read, for example, the first (oldest) 5 numbers from Redis. The numbers are unique (adding the same number should fail silently), so a Redis SET seems like a good choice, but reading the first M elements from a Redis set seems impossible.
To prevent two different instances of the code from reading the same numbers, the Redis read operation should be atomic: it should read the numbers and delete them. If any async operation fails for a specific number (steps 2 and 3), the number should be added back to Redis to be handled again. It should be re-added to the head, not the end, so it is handled again as soon as possible. As far as I know, SADD would push it to the tail.
SMEMBERS key would read everything; it looks like a hammer to me. I would need some application logic to get the first five, check which are less than or equal to Date.now(), delete those, and somehow wrap everything in a single transaction. Besides that, the set cardinality can be huge.
SSCAN sounds interesting, but I don't have any clue how it works in a "scaled" environment like the one described above. Besides that, per the Redis docs: "The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process." As described above, the collection will be changed frequently.
A more appropriate data structure would be the Sorted Set: members have a float score that is very suitable for storing a timestamp, and you can perform range searches (i.e. anything less than or equal to a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
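Since the numbers are themselves millisecond timestamps, each can serve as both score and member, and the NX flag makes re-adding an existing member a silent no-op (the key name is a placeholder):
ZADD <keyname> NX <timestamp> <timestamp>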
To ensure atomicity when reading and removing members, you can choose between the following options: Redis transactions, a Redis Lua script, and, in the next version (v4), a Redis module.
Transactions
Using transactions simply means running the following commands on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
-- fetch everything scored up to the given timestamp...
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
-- ...and remove it atomically within the same script
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
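If your instances run on the JVM, the load-and-call flow could look as follows (a sketch assuming the Jedis client; the key name is a placeholder):
import java.util.List;
import redis.clients.jedis.Jedis;

public class PopDue {
    private static final String SCRIPT =
            "local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1]) "
          + "redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1]) "
          + "return r";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String sha = jedis.scriptLoad(SCRIPT); // cache the script once
            @SuppressWarnings("unchecked")
            List<String> due = (List<String>) jedis.evalsha(sha, 1,
                    "timestamps", String.valueOf(System.currentTimeMillis()));
            due.forEach(System.out::println);      // members this instance now owns
        }
    }
}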
Redis modules
In the near future, once v4 is GA, you'll be able to write and use modules. Once that becomes a reality, you'll be able to use this module we've made, which provides the ZPOP command and could be extended to cover this use case as well.