SAP HANA Sequence reset by Max or Max +1 - hana

In SAP HANA we use sequences.
However, I am not sure what to define for RESET BY:
do I use select max(ID) from tbl or select max(ID) + 1 from tbl?
Recently we got a unique constraint violation on the ID field,
and the sequence is defined with RESET BY select max(ID) from tbl.
Also, would it be better to avoid the RESET BY option altogether?

The common logic for the RESET BY clause is to check the current value (max(ID)) and add an offset (e.g. +1) to avoid a double allocation of a key value.
Not using the option effectively disables the sequence's ability to reset itself, after a restart, to a value that will not collide with values already stored.
To provide some context: usually the sequence number generator uses a cache (even though it's not set up by default) to allow for high-speed consumption of sequence numbers.
In case of a system failure, the numbers in the cache that have not yet been consumed are "lost" in the sense that the database doesn't retain the information which numbers from the cache had been fetched in a recoverable fashion.
By using the RESET BY clause, the "loss" of numbers can be reduced as the sequence gets set back to the last actually used sequence number.
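A minimal sketch of such a definition (assuming the table tbl and key column ID from the question):

```sql
-- HANA sequence that resets safely after a server restart.
-- The RESET BY subquery is evaluated on restart; the +1 (plus IFNULL
-- for the empty-table case) ensures the first value handed out does
-- not collide with an ID that is already stored.
CREATE SEQUENCE seq_tbl_id
    START WITH 1
    INCREMENT BY 1
    RESET BY SELECT IFNULL(MAX(ID), 0) + 1 FROM tbl;
```

With RESET BY SELECT MAX(ID) FROM tbl (no +1), the first value issued after a restart equals the highest ID already in the table, which would explain the unique constraint violation described in the question.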

Related

Pentaho Data Integration Import large dataset from DB

I'm trying to import a large set of data from one DB to another (MSSQL to MySQL).
The transformation does this: it gets a subset of data, checks whether each row is an update or an insert by comparing hashes, maps the data, and inserts it into the MySQL DB with an API call.
For the moment, selecting the subset is strictly manual; is there a way to set up Pentaho to do it for me, as some kind of iteration?
The query I'm using to get the subset is
select t1.*
from (
select *, ROW_NUMBER() over (order by id) as RowNum
from mytable
) t1
where RowNum between #offset and #offset + #limit;
Is there a way for PDI to set the offset and repeat the whole process?
Thanks
You can (despite the warnings) create a loop in a parent job, incrementing the offset variable in a Javascript step on each iteration. I've used such a setup to consume webservices with an unknown number of results, shifting the offset each time after I get a full page and stopping when I get fewer results.
Setting up the variables
In the job properties, define parameters Offset and Limit, so you can (re)start at any offset, or even invoke the job from the command line with a specific offset and limit. It can be done with a variables step too, but parameters do all the same things, plus you can set defaults for testing.
Processing in the transformation
The main transformation(s) should have "pass parameter values to subtransformation" enabled, as it is by default.
Inside the transformation, you start with a Table Input step that uses variable substitution, putting ${Offset} and ${Limit} where you have #offset and #limit.
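For example, the query from the question rewritten for variable substitution (same placeholder table and column names; note that the OVER clause must precede the alias):

```sql
-- Table Input query; enable "Replace variables in script" in the step.
select t1.*
from (
    select *, ROW_NUMBER() over (order by id) as RowNum
    from mytable
) t1
where RowNum between ${Offset} and ${Offset} + ${Limit};
```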
The stream from Table Input then goes to processing, but also is copied to a Group By step for counting rows. Leave the group field empty and create a field that counts all rows. Check the box to always give back a result row.
Send the stream from Group By to a Set Variables step and set the NumRows variable in the scope of the parent job.
Looping back
In the main job, go from the transformations to a Simple Evaluation step to compare the NumRows variable to the Limit. If NumRows is smaller than ${Limit}, you've reached the last batch, success!
If not, proceed to a Javascript step to increment the Offset like this:
// Read the current values from the parent job; always pass radix 10.
var offset = parseInt(parent_job.getVariable("Offset"), 10);
var limit = parseInt(parent_job.getVariable("Limit"), 10);
// Advance the window by one page.
offset = offset + limit;
parent_job.setVariable("Offset", offset);
true;
The job flow then proceeds to the dummy step and then the transformation again, with the new offset value.
Notes
Unlike a transformation, you can set and use a variable within the same job.
The JS step needs "true;" as the last statement so it reports success to the job.

Prometheus: how to rate a sum of the same counter from different machines?

I have a Prometheus counter, for which I want to get its rate on a time range (the real target is to sum the rate, and sometimes use histogram_quantile on that for histogram metric).
However, I've got multiple machines running that kind of job, and each one sets its own instance label. As a result, inc operations on this counter on different machines create distinct time series of the counter, since each combination of label values is unique.
The problem is that rate() works separately on each such series.
The result is that series with unique label combinations are not properly accounted for by rate().
For example, if I've got:
mycounter{aaa="1",instance="1.2.3.4:6666",job="job1"} value: 1
mycounter{aaa="2",instance="1.2.3.4:6666",job="job1"} value: 1
mycounter{aaa="2",instance="1.2.3.4:7777",job="job1"} value: 1
mycounter{aaa="1",instance="5.5.5.5:6666",job="job1"} value: 1
All counter entities are unique, so they get values of 1.
If counter labels are always unique (come from different machines), rate(mycounter[5m]) would get values of 0 in this case,
and sum(rate(mycounter[5m])) would get 0, which is not what I need!
I want to ignore the instance label so that these mycounter inc operations are treated as if they were made on the same counter series.
In other words, I expect to have only 2 series (they can have a common instance value or no instance label at all):
mycounter{aaa="1", job="job1"} value: 2
mycounter{aaa="2", job="job1"} value: 2
In such a case, an inc operation on a new machine (with an existing aaa value) would increase an existing series instead of adding a new series with a value of 1, rate() would compute real rates for each, and we could sum() them.
How do I do that?
I made several tries to solve it but all failed:
Doing a rate() of the sum() - fails because of a type mismatch (rate() needs a range vector, but sum() returns an instant vector)...
Removing the automatic instance label using metric_relabel_configs with action: labeldrop - works, but then Prometheus assigns the default address value instead.
Changing all instance values to a common one using metric_relabel_configs with replacement - but it seems that one of the series overwrites all the others, so it doesn't help...
Any suggestions?
Prometheus version: 2.3.2
Thanks in Advance!
It is better to expose your counters at 0 on application start, if the other labels (aaa, etc.) have a limited set of possible combinations. This way the rate() function works correctly at the bottom level, and sum() will give you correct results.
If you have to do a rate() of the sum(), read this first:
Note that when combining rate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a rate() first, then aggregate. Otherwise rate() cannot detect counter resets when your target restarts.
If you can tolerate this (or the instances reset counters at the same time), there's a way to work around it. Define a recording rule as
record: job:mycounter:sum
expr: sum without(instance) (mycounter)
and then this expression works:
sum(rate(job:mycounter:sum[5m]))
The obvious query rate(sum(...)) won't work in most cases, since the resulting sum(...) may hide resets to zero in the individual time series passed to sum(). So usually the correct answer is to use sum(rate(...)) instead. See this article for details.
Unfortunately, Prometheus may miss some increases on slow-changing counters when calculating rate(), as shown in the original question above. The same applies to increase() calculations. See this issue, this comment and this article for details. Prometheus developers are going to fix these issues - see this design doc.
In the meantime, try VictoriaMetrics if you need exact values from rate() and increase() over slow-changing counters (and distributed counters).

oracle/sql Creating a sequence without triggering

I created a sequence with beginning value as 2.
create sequence seq_ben
start with 2
increment by 1
nocache
nocycle;
When I was asked to show the next two numbers of the sequence I wrote
select seq_ben.nextval from dual
and ran this code twice to give the next two values. Then I was asked to show the next value without triggering the sequence to move forward, and to use its next three values to add new rows to the above table. Is this possible? How can I get the next value without triggering the sequence?
You can use CURRVAL, if you have referenced NEXTVAL at least once in the current session.
However, I believe that if you really want to know the next number in the sequence, there is something fundamentally wrong with your design. Sequences are designed so that NEXTVAL is an atomic operation, and no two sessions may get the same number - an incrementing unique identifier, in other words. That's the only guarantee they give you. With this design, it is almost meaningless to ask for the next possible value of a sequence.
You may try to use MAX(), which is often used as a poor man's solution to sequences.
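A minimal sketch of that approach (assuming a table tbl with numeric key id; both names are placeholders):

```sql
-- Poor man's sequence: derive the next key from the current maximum.
-- NVL covers the empty-table case. Unlike a real sequence this is NOT
-- safe under concurrency: two sessions can read the same MAX(id).
SELECT NVL(MAX(id), 0) + 1 AS next_id
FROM tbl;
```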

How predictable is NEWSEQUENTIALID?

According to Microsoft's documentation on NEWSEQUENTIALID, the output of NEWSEQUENTIALID is predictable. But how predictable is predictable? Say I have a GUID that was generated by NEWSEQUENTIALID, how hard would it be to:
Calculate the next value?
Calculate the previous value?
Calculate the first value?
Calculate the first value, even without knowing any GUID's at all?
Calculate the number of rows? E.g. when using integers, /order?id=842 tells me that there are 842 orders in the application.
Below is some background information about what I am doing and what the various tradeoffs are.
One of the security benefits of using GUID's over integers as primary keys is that GUID's are hard to guess. E.g. if a hacker sees a URL like /user?id=845, he might try to access /user?id=0, since it is probable that the first user in the database is an administrative user. Moreover, a hacker can iterate over /user?id=0..1..2 to quickly gather all users.
Similarly, a privacy downside of integers is that they leak information. /order?id=482 tells me that the web shop has had 482 orders since its implementation.
Unfortunately, using GUID's as primary keys has well-known performance downsides. To this end, SQL Server introduced the NEWSEQUENTIALID function. In this question, I would like to learn how predictable the output of NEWSEQUENTIALID is.
The underlying OS function is UuidCreateSequential. The value is derived from one of your network cards' MAC addresses and a per-OS-boot incremental value. See RFC 4122. SQL Server does some byte-shuffling to make the result sort properly. So the value is highly predictable, in a sense. Specifically, if you know a value, you can immediately predict a range of similar values.
However, one cannot predict the equivalent of id=0, nor infer from 52DE358F-45F1-E311-93EA-00269E58F20D that the store sold at least 482 items.
The only 'approved' random generation is CRYPT_GEN_RANDOM (which wraps CryptGenRandom) but that is obviously a horrible key candidate.
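For illustration, a sketch of how CRYPT_GEN_RANDOM can produce a GUID-shaped value (which, being completely unordered, fragments a clustered index and thus makes a poor key):

```sql
-- 16 cryptographically random bytes, reinterpreted as a uniqueidentifier.
SELECT CAST(CRYPT_GEN_RANDOM(16) AS uniqueidentifier) AS random_guid;
```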
In most cases, the next newsequentialid can be predicted by taking the current value and adding one to the first hex pair.
In other words:
1E29E599-45F1-E311-80CA-00155D008B1C
is followed by
1F29E599-45F1-E311-80CA-00155D008B1C
is followed by
2029E599-45F1-E311-80CA-00155D008B1C
Occasionally, the sequence will restart from a new value.
So, it's very predictable.
NEWSEQUENTIALID is a wrapper around the Windows function UuidCreateSequential.
You can try this code:
DECLARE @tbl TABLE (
    PK uniqueidentifier DEFAULT NEWSEQUENTIALID(),
    Num int
);
INSERT INTO @tbl(Num) VALUES (1),(2),(3),(4),(5);
SELECT * FROM @tbl;
On my machine, the result at this time is:
PK Num
52DE358F-45F1-E311-93EA-00269E58F20D 1
53DE358F-45F1-E311-93EA-00269E58F20D 2
54DE358F-45F1-E311-93EA-00269E58F20D 3
55DE358F-45F1-E311-93EA-00269E58F20D 4
56DE358F-45F1-E311-93EA-00269E58F20D 5
You should try it several times at different times/dates to extrapolate the behaviour.
I ran it several times, and the first part changed every time (you can see it in the results: 52..., 53..., 54..., etc.). I waited a while and checked again, and after some time the second part was incremented too. I suppose the incrementation carries through all the parts. Basically it looks like a simple +=1 increment transformed into a GUID.
EDIT:
If you want a sequential GUID and you want to have control over the values, you can use Sequences.
Sample code:
select cast(cast(next value for [dbo].[MySequence] as varbinary(max)) as uniqueidentifier)
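That snippet assumes a plain integer sequence; a sketch of one (the name [dbo].[MySequence] is a placeholder):

```sql
-- An ordinary bigint sequence; the cast in the query above turns its
-- next value into a uniqueidentifier whose progression you control.
CREATE SEQUENCE [dbo].[MySequence]
    AS bigint
    START WITH 1
    INCREMENT BY 1;
```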
• Calculate the next value? Yes
Microsoft says:
If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.
So it is possible to guess the next value. I did not find information on whether it is possible to get the previous one.
from: http://msdn.microsoft.com/en-us/library/ms189786.aspx
edit: another few words about NEWSEQUENTIALID and security: http://vadivel.blogspot.com/2007/09/newid-vs-newsequentialid.html
Edit:
NewSequentialID contains the server's MAC address (or one of them), therefore knowing a sequential ID gives a potential attacker information that may be useful as part of a security or DoS attack.
from: Are there any downsides to using NewSequentialID?

Invalid numbers on sequence

When I created a sequence for a table article, it started from 17, not 1:
CREATE SEQUENCE seq_article START WITH 1 INCREMENT BY 1;
CREATE OR REPLACE TRIGGER auto_article BEFORE insert ON article
FOR EACH ROW
BEGIN
SELECT seq_article.NEXTVAL INTO :NEW.id_article FROM dual;
END;
/
I tried deleting all rows and creating other data; this time it started from 19. How can I fix that?
I'm not sure that I understand the problem.
A sequence generates unique values. Unless you set the sequence to CYCLE and you exceed the MAXVALUE (not realistically possible given the definition you posted) or you manually reset the sequence (say, by setting the INCREMENT BY to -16, fetching a nextval, and then setting the INCREMENT BY back to 1), it won't ever generate a value of 1 a second time. Deleting the data has no impact on the next id_article that will be generated.
A sequence-generated column will have gaps. Whether because the sequence cache gets aged out of the shared pool or because a transaction was rolled back, not every value will end up in the table. If you really need gap-free values, you cannot use a sequence. Of course, that means you would have to serialize INSERT operations, which will massively decrease the scalability of your application.
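For example, a rolled-back insert still consumes a sequence value and leaves a permanent gap (a sketch using the seq_article/article names from the question; the title column is a placeholder):

```sql
-- The trigger fetches seq_article.NEXTVAL for this row...
INSERT INTO article (title) VALUES ('draft');
ROLLBACK;  -- ...the row is gone, but the sequence does not roll back.

-- The next insert gets the following number, so the rolled-back
-- value is skipped forever.
INSERT INTO article (title) VALUES ('final');
COMMIT;
```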