ST05 shows FLAG=N"X" in S/4HANA system. Why? - abap

We are comparing performance between a classic R/3 and an S/4 system, and simple selects from standard function modules (e.g. selecting records from an IDoc table) look different in S/4.
The most interesting observations are these:
S/4 delivers worse performance than R/3 (with the same number of records stored in the DB table).
Where the R/3 code contains a condition like WHERE STATUS = 69 or FLAG = "X" (in this case in a FOR ALL ENTRIES, a.k.a. FAE), the S/4 trace shows the value prefixed with an N, like FLAG = N"X"....
I assume this stands for negation, BUT the code clearly says EQUALS.
And because the performance is so bad compared to R/3, I assumed that S/4 somehow sometimes cannot deal with FAE, and that one of the side effects is to negate the WHERE clause on the fields of the FAE-related source table...
What does this N stand for ?

FLAG=N"X"
It does not mean negation, it means the value is sent down to HANA as a Unicode hardcoded value (NCHAR).
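For illustration, here is a minimal sketch (hypothetical table and column names, generic/HANA-style SQL) of the difference the prefix expresses:

-- Plain character literal: 'X' is sent as an ordinary (single-byte) string value.
SELECT * FROM idoc_status WHERE flag = 'X';

-- National character literal: the N prefix marks 'X' as a Unicode (NCHAR/NVARCHAR)
-- value. This is what ST05 renders as FLAG = N'X' -- it has nothing to do with negation.
SELECT * FROM idoc_status WHERE flag = N'X';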
S4 performance
This was not directly asked, but I think it is important to also answer. There can be several reasons why R3 is sometimes faster:
In R3 you have Oracle or DB2 (DB6) and the SELECT is using the perfect index and the data is in the cache
You are comparing SELECTs on a table that was transparent in R3, but is a compatibility view in S4, like MARC or ANLC
Your S4 hardware is slower than R3. It is quite common that new CPUs with very high core counts run the individual cores slower than a decade ago. So the total throughput is much higher, but each individual report and transaction runs slower
In my experience these are the typical cases where S4 is slower than R3.

Related

SQL: how does parallelism of a join operation exactly work in a "shared nothing" architecture?

I'm currently reading a book about parallelism in a DBMS and I find it difficult to understand how parallelism works exactly in the join operation.
Suppose we have 10 systems; each system has its own disk space and main memory. There is a network with which the systems can communicate with each other, for example in order to share data.
Now suppose we have the following operation: A(X,Y) JOIN B(Y,Z)
The tables A and B are too big, so we want to use parallelism in order to gain better overall computing speed.
What we do is apply a hash function on the 'Y' attribute of each record of A and B, and send each record to a system based on that hash value. Each system can then use a local algorithm to join the records it received.
What I don't understand is, where exactly is the initial hash function being applied and where exactly are the initial tables A and B being stored?
While I was reading, I thought that we had another "main" system, which also had its own disk space, and in this space we had all the initial information, i.e. tables A and B with all their records. This system used its own main memory to apply the initial hash function, which determined, for each record, which of the 10 systems it would eventually be sent to and processed on.
However, while reading I got stuck on the following example (I am translating from Greek):
Let's say we have two tables R(X,Y) JOIN S(Y,Z) where R has 1000 pages and S 500 pages. Suppose that we have 10 systems that can be used in parallel. So we start by using a hash function to determine where we should send each record. The total amount of I/Os needed to read the tables R and S is 1500, which is 150 for each system. Each system will have 15 pages of data which are necessary for each of the remaining systems, so it sends 135 pages to the other nine systems. Hence the total communication is 1350 pages.
I don't really understand the part in bold (each system sending 135 pages to the other nine). Why would a system have to send any data to the other systems? Doesn't the "main" system I was talking about previously do this job?
I imagine something like this:
main_system
||
\/
apply_hash(record)
||
\/
send record to the appropriate system
/ / / / / / / / / /
s1 s2 s3 s4 s5 s6 s7 s8 s9 s10
Now all systems have their own records; they apply the local algorithm and give the result to the output. There is no communication between the systems. What am I missing here?? Does the book use a different approach, and if so, what kind of approach? I have read the same unit 3 times and I still don't get it (maybe it's a bad translation, not sure though).
thanks in advance
In a shared-nothing system, the data is typically partitioned across the processors when the data is created. Although databases can be shared-nothing, probably the best documentation is for Hadoop and HDFS, the Hadoop distributed file system.
A function assigns rows to a partition. Some examples are: round-robin, where new rows are assigned to the processors one after the other; range-based, where rows are assigned to a processor based on the value of a column; hash-based, where rows are assigned to a processor based on a hash of the value. The process of partitioning the data is very similar to "partitioning" in databases such as SQL Server and Oracle which are not in a shared-nothing environment.
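As a rough illustration of hash-based partitioning (PostgreSQL-style declarative syntax; the table and column names are made up), the assignment of rows to one of 10 nodes by a hash of Y can be declared directly in the DDL:

CREATE TABLE a (
    x integer,
    y integer
) PARTITION BY HASH (y);

-- One partition per processor/node: rows are routed by hash(y) modulo 10.
CREATE TABLE a_p0 PARTITION OF a FOR VALUES WITH (MODULUS 10, REMAINDER 0);
CREATE TABLE a_p1 PARTITION OF a FOR VALUES WITH (MODULUS 10, REMAINDER 1);
-- ... and so on, up to a_p9 with REMAINDER 9.

If B(Y,Z) is hash-partitioned on Y with the same modulus, each node can join its local slices of A and B with no redistribution at all.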
If your join uses the partition key for both tables, and the partitioning method is the same, then the data is already local. Otherwise, one or both tables need to be redistributed to continue the processing.
In the section that you quote, you are probably confused by the arithmetic. Remember that if you have 1500 pages across 10 processors, each will have on average 150 pages. These pages need to be redistributed. Say you are processor 3. About 15 pages will go to processor 1; another 15 to processor 2. And another 15 to processor 3. Wait! You don't have to send those; they are already in the right spot. You only have to send 9*15 = 135 pages to the other processors.
The key idea is that, in a shared-nothing environment, the same processors that store the data also do the processing.
I'd take a wild guess that your "main system" is your local client, since it has a connection to all machines.

libpq very slow for large (20 million record) database

I am new to SQL/RDBMS.
I have an application which adds rows with 10 columns in PostgreSQL server using the libpq library. Right now, my server is running on same machine as my visual c++ application.
I have added around 15-20 million records. The simple query of getting total count is taking 4-5 minutes using select count(*) from <tableName>;.
I have indexed my table with the time I am entering the data (timecode). Most of the time I need count with different WHERE / AND clauses added.
Is there any way to make things fast? I need to make it as fast as possible because once the server moves to network, things will become much slower.
Thanks
I don't think network latency will be a large factor in how long your query takes. All the processing is being done on the PostgreSQL server.
The PostgreSQL MVCC design means each row in the table - not just the index(es) - must be walked to calculate the count(*) which is an expensive operation. In your case there are a lot of rows involved.
There is a good wiki page on this topic here http://wiki.postgresql.org/wiki/Slow_Counting with suggestions.
Two suggestions from this link, one is to use an index column:
select count(index-col) from ...;
... though this only works under some circumstances.
If you have more than one index see which one has the least cost by using:
EXPLAIN ANALYZE select count(index-col) from ...;
If you can live with an approximate value, another is to use a Postgres specific function for an approximate value like:
select reltuples from pg_class where relname='mytable';
How good this approximation is depends on how often autovacuum is set to run, among many other factors.
Consider pg_relation_size('tablename') and divide it by the seconds spent in
select count(*) from tablename
That will give the throughput of your disk(s) when doing a full scan of this table. If it's too low, you want to focus on improving that in the first place.
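For example (using a hypothetical table name mytable):

-- Heap size on disk, without indexes:
SELECT pg_relation_size('mytable') AS bytes,
       pg_size_pretty(pg_relation_size('mytable')) AS pretty;

-- Time the full scan (e.g. with \timing in psql), then:
-- throughput in MB/s = bytes / 1024 / 1024 / seconds
SELECT count(*) FROM mytable;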
Having a good I/O subsystem and well performing operating system disk cache is crucial for databases.
The default postgres configuration is meant to not consume too much resources to play nice with other applications. Depending on your hardware and the overall utilization of the machine, you may want to adjust several performance parameters way up, like shared_buffers, effective_cache_size or work_mem. See the docs for your specific version and the wiki's performance optimization page.
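A hedged sketch of what such an adjustment might look like (the values are illustrative only; sensible settings depend on your RAM and workload, and ALTER SYSTEM requires PostgreSQL 9.4+, while older versions edit postgresql.conf directly):

ALTER SYSTEM SET shared_buffers = '2GB';        -- needs a server restart to take effect
ALTER SYSTEM SET effective_cache_size = '6GB';  -- planner hint, a reload is enough
ALTER SYSTEM SET work_mem = '64MB';             -- per sort/hash operation, a reload is enough
-- Then: SELECT pg_reload_conf(); and restart the server for shared_buffers.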
Also note that the speed of select count(*)-style queries has nothing to do with libpq or the network, since only one resulting row is retrieved. It happens entirely server-side.
You don't state what your data is, but normally the way to handle tables with a very large amount of data is to partition the table. http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
This will not speed up your select count(*) from <tableName>; query, and might even slow it down, but if you are normally only interested in a portion of the data in the table this can be helpful.
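As a hedged sketch of what partitioning on the insertion timestamp could look like (modern declarative syntax, PostgreSQL 10+; the 9.1 link above describes the older inheritance-based approach, and the column names here are assumptions based on the question):

CREATE TABLE readings (
    timecode timestamptz NOT NULL,
    value    double precision      -- stand-in for the other application columns
) PARTITION BY RANGE (timecode);

CREATE TABLE readings_2012_01 PARTITION OF readings
    FOR VALUES FROM ('2012-01-01') TO ('2012-02-01');

-- Queries with a WHERE clause on timecode only touch the relevant partitions;
-- an unqualified count(*) still has to scan them all, as noted above.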

Does the speed of the query depend on the number of rows in the table?

Let's say I have this query:
select * from table1 r where r.x = 5
Does the speed of this query depend on the number of rows that are present in table1?
There are many factors affecting the speed of a query, one of which can be the number of rows.
Others include:
index strategy (if you index column "x", you will see better performance than if it's not indexed)
server load
data caching - once you've executed a query, the data will be added to the data cache. So subsequent reruns will be much quicker as the data is coming from memory, not disk. Until such point where the data is removed from the cache
execution plan caching - to a lesser extent. Once a query is executed for the first time, the execution plan SQL Server comes up with will be cached for a period of time, for future executions to reuse.
server hardware
the way you've written the query (often one of the biggest contributors to poor performance!). e.g. writing something using a cursor instead of a set-based operation
For databases with a large number of rows in tables, partitioning is usually something to consider (with SQL Server 2005 onwards, Enterprise Edition there is built-in support). This is to split the data down into smaller units. Generally, smaller units = smaller tables = smaller indexes = better performance.
Yes, and it can be very significant.
If there are 100 million rows, SQL Server has to go through each of them and see if it matches.
That takes a lot more time than if there were only 10 rows.
You probably want an index on the 'x' column, in which case SQL Server might check the index rather than going through all the rows - which can be significantly faster, as it might not even need to check all the values in the index.
On the other hand, if there are 100 million rows matching x = 5, it will be slower than if only 10 rows match.
Almost always yes. The real question is: what is the rate at which the query slows down as the table size increases? And the answer is: by not much if r.x is indexed, and by a large amount if not.
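A minimal sketch of that difference (PostgreSQL-style syntax, using the table and column from the question):

-- Without an index on x the database scans the whole table, so cost grows
-- roughly linearly with the row count.
CREATE INDEX idx_table1_x ON table1 (x);

-- With the index, the lookup is typically O(log n) plus the cost of fetching
-- the matching rows; the execution plan shows which access path is chosen.
EXPLAIN SELECT * FROM table1 r WHERE r.x = 5;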
Not the rows per se (to a certain degree, of course), but the amount of data (columns) is what can make a query slow. The data also needs to be transferred from the backend to the frontend.
The answer is yes, but it is not the only factor.
If you apply appropriate optimizations and tuning, the performance drop will be negligible.
Main Performance factors
Indexing (clustered or non-clustered)
Data Caching
Table Partitioning
Execution Plan caching
Data Distribution
Hardware specs
There are some other factors, but these are the main ones to consider.
Even how you design your schema affects performance.
You should assume that your query always depends on the number of rows. In fact, you should assume the worst case (linear or O(N) for the example you provided) and exponential for more complex queries. There are database specific manuals filled with tricks to help you avoid the worst case but SQL itself is a language and doesn't specify how to execute your query. Instead, the database implementation decides how to execute any given query: if you have indexed a column or set of columns in your database then you will get O(log(N)) performance for a simple lookup; if the system has effective query caching you might get O(1) response. Here is a good introductory article: High scalability: SQL and computational complexity

SQL Server Int or BigInt database table Ids

I am writing a new program and it will require a database (SQL Server 2008). Everything I am running now for the system is 64-bit, which brings me to this question. For all of the Id columns in various tables, should I make them all INT or BIGINT? I doubt the system will ever surpass the INT range but it is a possibility within some of the larger financial tables I suppose. It seems like INT is the standard though...
OK, let's do a quick math recap:
INT is 32-bit and gives you basically 4 billion values - if you only count the values larger than zero, it's still 2 billion. Do you have this many employees? Customers? Products in stock? Orders in the lifetime of your company? REALLY?
BIGINT goes way way way beyond that. Do you REALLY need that?? REALLY?? If you're an astronomer, or into particle physics - maybe. An average Line of Business user? I strongly doubt it
Imagine you have a table with - say - 10 million rows (orders for your company). Let's say, you have an Orders table, and that OrderID which you made a BIGINT is referenced by 5 other tables, and used in 5 non-clustered indices on your Orders table - not overdone, I think, right?
10 million rows, by 5 tables plus 5 non-clustered indices, that's 100 million instances where you are using 8 bytes each instead of 4 bytes - 400 million bytes = 400 MB. A total waste... you'll need more data and index pages, your SQL Server will have to read more pages from disk and cache more pages.... that's not beneficial for your performance - plain and simple.
PLUS: What most programmers don't think about: yes, disk space is dirt cheap. But that wasted space is also relevant in your SQL Server RAM and your database cache - and that space is not dirt cheap!
So to make a very long post short: use the smallest type of INT that really suits your need; if you have 10-20 distinct values to handle - use TINYINT. If you need an order table, I believe INT should be PLENTY ENOUGH - BIGINT is only a waste of space.
Plus: should any of your tables really ever get close to reaching 2 or 4 billion rows, you'll still have plenty of time to upgrade your table to a BIGINT ID, if that's really needed.......
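A minimal SQL Server-style sketch of that advice (hypothetical Orders table):

CREATE TABLE Orders (
    OrderID    INT IDENTITY(1,1) PRIMARY KEY,  -- 4 bytes; ~2.1 billion positive values
    CustomerID INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);
-- Every foreign key and non-clustered index that carries OrderID now stores
-- 4 bytes per row instead of 8 - exactly the saving calculated above.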
Here is an article with some real answers on performance... I prefer to answer questions with hard numbers if possible... If you click the following link, you will find that at least up to a million records there is a negligible difference in disk usage:
http://www.sqlservercentral.com/articles/Performance+Tuning/2753/
Personally I do feel that using the appropriate ID size is important, but also consider the fact that you may have a table that has a ton of activity over time. It is not that you're storing a massive amount of data, but that the key value has grown due to the nature of being auto-incremented (deletes and inserts occurring over time).
Consider a file repository on a community site, or the id of the user comments on a community site multi-tenant application.
I understand that most developers are building systems that will never touch millions of records, but it is important to note that there are cases where a bigint is required. When you are designing a schema whose potential growth you do not know, you should attempt to anticipate the future and consider using a bigint if you feel there is potential for the id value to exceed the max value of int as it grows.
You should use the smallest data type that makes sense for the table in question. That includes using smallint or even tinyint if there are few enough rows.
You'll save space on both data and indexes and get better index performance. Using a bigint when all you need is a smallint is similar to using a varchar(4000) when all you need is a varchar(50).
Even if the machine's native word size is 64 bits, that only means that 64-bit CPU operations won't be any slower than 32-bit operations. Most of the time, they also won't be faster, they'll be the same. But most databases are not going to be CPU bound anyway, they'll be I/O bound and to a lesser extent memory-bound, so a 50%-90% smaller data size is a Very Good Thing when you need to perform an index scan over 200 million rows.
The alignment of 32 bit numbers with x86 architecture or 64 bit with x64 architecture is called data structure alignment
This has no meaning for data in a database, because here it's things like disk space, data cache and table/index architecture that affect performance (as mentioned in other answers).
Remember, it's not the CPU accessing the data as such. It's the DB engine code (which may be aligned, but who cares?) that runs on the CPU and manipulates your data. When/if your data goes through the CPU it certainly won't be in the same on-disk structures.
Other people already gave compelling answers for 32-bit IDs.
For some applications 64-bit IDs do make more sense.
If you want to guarantee that IDs are unique across a cluster of databases - 63-bits for IDs can be very convenient. With 32 bits it's very difficult to distribute generation of IDs across servers in a cluster; or across data centers. While with 64 bits you have enough room to play with that you can conveniently generate IDs across servers without locking and still guarantee uniqueness.
For example see Twitter Snowflake, and Instagram Engineering's blog post on "Sharding & IDs at Instagram". Both provide good reasons why 63 or 64 bits make more sense for their IDs than 32-bit counters.
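As a rough sketch of the idea (PostgreSQL-style bit arithmetic; the 41/10/12-bit split follows the Snowflake-style layout of timestamp + worker id + per-worker sequence, and the concrete worker id and sequence values below are made up):

-- 41 bits of millisecond timestamp, 10 bits of worker id, 12 bits of sequence.
-- Real implementations use a custom epoch so the timestamp stays within 41 bits longer.
SELECT ((extract(epoch FROM now()) * 1000)::bigint << 22)
     | (42 << 12)   -- worker / shard id, 0..1023 (hypothetical value)
     | 7            -- per-worker sequence number, 0..4095 (hypothetical value)
     AS generated_id;

Because each worker owns its own id and sequence bits, no coordination or locking between servers is needed to keep the resulting 63-bit ids unique.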
The first answer is the naive answer for anyone not working with TB-size databases or tables with constant, high-volume inserts. In any decent-sized database you will run into problems with INT at some stage in its lifetime. Use BIGINT if you have to, as it will save a lot of hassle further down the line. I have seen companies hit the INT limit after only a year of data, and where reseeding was not an option it caused massive downtime. It has also been hit in long-running systems (10 years+) that were not expected to still be in use, even with moderately sized databases that purge old data. It is much better to use GUID in most cases where large amounts of data are expected, but barring that, use BIGINT if required.
You should judge each table individually as to what datatype would meet the needs for each one. If an INTEGER would meet the needs of a particular table, use that. If a SMALLINT would be sufficient, use that. Use the datatype that will last, without being excessive.

Do relational databases provide a feasible backend for a process historian?

In the process industry, lots of data is read, often at a high frequency, from several different data sources, such as NIR instruments as well as common instruments for pH, temperature, and pressure measurements. This data is often stored in a process historian, usually for a long time.
Due to this, process historians have different requirements than relational databases. Most queries to a process historian require either time stamps or time ranges to operate on, as well as a set of variables of interest.
Frequent and many INSERT, many SELECT, few or no UPDATE, almost no DELETE.
Q1. Are relational databases a good backend for a process historian?
A very naive implementation of a process historian in SQL could be something like this.
+------------------------------------------------+
| Variable |
+------------------------------------------------+
| Id : integer primary key |
| Name : nvarchar(32) |
+------------------------------------------------+
+------------------------------------------------+
| Data |
+------------------------------------------------+
| Id : integer primary key |
| Time : datetime |
| VariableId : integer foreign key (Variable.Id) |
| Value : float |
+------------------------------------------------+
This structure is very simple, but probably slow for normal process historian operations, as it lacks "sufficient" indexes.
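For concreteness, a minimal DDL sketch of this naive schema (SQL Server-flavoured, matching the types in the diagram), together with the kind of composite index that "sufficient" would mean for the typical variable-plus-time-range query:

CREATE TABLE Variable (
    Id   INT PRIMARY KEY,
    Name NVARCHAR(32) NOT NULL
);

CREATE TABLE Data (
    Id         INT PRIMARY KEY,
    Time       DATETIME NOT NULL,
    VariableId INT NOT NULL REFERENCES Variable (Id),
    Value      FLOAT NOT NULL
);

-- Most historian queries ask for a set of variables over a time range:
CREATE INDEX IX_Data_VariableId_Time ON Data (VariableId, Time);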
But if, for example, the Variable table consisted of 1,000 rows (a rather optimistic number), and data for all these 1,000 variables were sampled once per minute (also an optimistic number), then the Data table would grow by 1,440,000 rows per day. Let's continue the example and estimate that each row takes about 16 bytes, which gives roughly 23 megabytes per day, not counting additional space for indexes and other overhead.
23 megabytes as such perhaps isn't that much, but keep in mind that the numbers of variables and samples in the example were optimistic and that the system will need to be operational 24/7/365.
Of course, archiving and compression comes to mind.
Q2. Is there a better way to accomplish this? Perhaps using some other table structure?
I work with a SQL Server 2008 database that has similar characteristics; heavy on insertion and selection, light on update/delete. About 100,000 "nodes" all sampling at least once per hour. And there's a twist; all of the incoming data for each "node" needs to be correlated against the history and used for validation, forecasting, etc. Oh, there's another twist; the data needs to be represented in 4 different ways, so there are essentially 4 different copies of this data, none of which can be derived from any of the other data with reasonable accuracy and within reasonable time. 23 megabytes would be a cakewalk; we're talking hundreds-of-gigabytes to terabytes here.
You'll learn a lot about scale in the process, about what techniques work and what don't, but modern SQL databases are definitely up to the task. This system that I just described? It's running on a 5-year-old IBM xSeries with 2 GB of RAM and a RAID 5 array, and it performs admirably; nobody has to wait more than a few seconds for even the most complex queries.
You'll need to optimize, of course. You'll need to denormalize frequently, and maintain pre-computed aggregates (or a data warehouse) if that's part of your reporting requirement. You might need to think outside the box a little: for example, we use a number of custom CLR types for raw data storage and CLR aggregates/functions for some of the more unusual transactional reports. SQL Server and other DB engines might not offer everything you need up-front, but you can work around their limitations.
You'll also want to cache - heavily. Maintain hourly, daily, weekly summaries. Invest in a front-end server with plenty of memory and cache as many reports as you can. This is in addition to whatever data warehousing solution you come up with if applicable.
One of the things you'll probably want to get rid of is that "Id" key in your hypothetical Data table. My guess is that Data is a leaf table - it usually is in these scenarios - and this makes it one of the few situations where I'd recommend a natural key over a surrogate. The same variable probably can't generate duplicate rows for the same timestamp, so all you really need is the variable and timestamp as your primary key. As the table gets larger and larger, having a separate index on variable and timestamp (which of course needs to be covering) is going to waste enormous amounts of space - 20, 50, 100 GB, easily. And of course every INSERT now needs to update two or more indexes.
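A sketch of what dropping the surrogate key looks like (same hypothetical Data table as in the question, SQL Server-flavoured):

CREATE TABLE Data (
    VariableId INT      NOT NULL REFERENCES Variable (Id),
    Time       DATETIME NOT NULL,
    Value      FLOAT    NOT NULL,
    PRIMARY KEY (VariableId, Time)   -- clustered by default in SQL Server
);
-- No surrogate Id and no second covering (VariableId, Time) index to maintain on
-- every INSERT; range scans by variable and time go straight to the data.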
I really believe that an RDBMS (or SQL database, if you prefer) is as capable for this task as any other if you exercise sufficient care and planning in your design. If you just start slinging tables together without any regard for performance or scale, then of course you will get into trouble later, and when the database is several hundred GB it will be difficult to dig yourself out of that hole.
But is it feasible? Absolutely. Monitor the performance constantly and over time you will learn what optimizations you need to make.
It sounds like you're talking about telemetry data (time stamps, data points).
We don't use SQL databases for this (although we do use SQL databases to organize it); instead, we use binary streaming files to capture the actual data. There are a number of binary file formats that are suitable for this, including HDF5 and CDF. The file format we use here is a proprietary compressible format. But then, we deal with hundreds of megabytes of telemetry data in one go.
You might find this article interesting (links directly to Microsoft Word document):
http://www.microsoft.com/caseStudies/ServeFileResource.aspx?4000003362
It is a case study from the McLaren group, describing how SQL Server 2008 is used to capture and process telemetry data from Formula One race cars. Note that they don't actually store the telemetry data in the database; instead, it is stored in the file system, and the FILESTREAM capability of SQL Server 2008 is used to access it.
I believe you're headed down the right path. We have a similar situation where we work. Data comes from various transport/automation systems across various industries such as manufacturing, automotive, etc. Mainly we deal with the Big 3: Ford, Chrysler, GM. But we've had a lot of data coming in from customers like CAT.
We ended up extracting data into a database and as long as you properly index your table, keep updates to a minimum and schedule maintenance (rebuild indexes, purge old data, update statistics) then I see no reason for this to be a bad solution; in fact I think it is a good solution.
Certainly a relational database is suitable for mining the data after the fact.
Various nuclear and particle physics experiments I have been involved with have explored several points on this spectrum: from not using an RDBMS at all, through storing just the run summaries (or the run summaries and the slowly varying environmental conditions) in the DB, all the way to cramming every bit collected into the DB (though it was staged to disk first).
When and where the data rate allows, more and more groups are moving towards putting as much data as possible into the database.
IBM Informix Dynamic Server (IDS) has a TimeSeries DataBlade and RealTime Loader which might provide relevant functionality.
Your naïve schema records each reading 100% independently, which makes it hard to correlate across readings - both for the same variable at different times and for different variables at (approximately) the same time. That may be necessary, but it makes life harder when dealing with subsequent processing. How much of an issue that is depends on how often you will need to run correlations across all 1000 variables (or even a significant percentage of them, where "significant" might be as small as 1% and would almost certainly start by 10%).
I would look to combine key variables into groups that can be recorded jointly. For example, if you have a monitor unit that records temperature, pressure and acidity (pH) at one location, and there are perhaps a hundred of these monitors in the plant that is being monitored, I would expect to group the three readings plus the location ID (or monitor ID) and time into a single row:
CREATE TABLE MonitorReading
(
MonitorID INTEGER NOT NULL REFERENCES MonitorUnit,
Time DATETIME NOT NULL,
PhReading FLOAT NOT NULL,
Pressure FLOAT NOT NULL,
Temperature FLOAT NOT NULL,
PRIMARY KEY (MonitorID, Time)
);
This saves having to do self-joins to see what the three readings were at a particular location at a particular time, and uses about 20 bytes instead of 3 * 16 = 48 bytes per row. If you are adamant that you need a unique ID integer for the record, that increases to 24 or 28 bytes (depending on whether you use a 4-byte or 8-byte integer for the ID column).
Yes, a DBMS is appropriate for this, although not the fastest option. You will need to invest in a reasonable system to handle the load though. I will address the rest of my answer to this problem.
It depends on how beefy a system you're willing to throw at the problem. There are two main limiters for how fast you can insert data into a DB: bulk I/O speed and seek time. A well-designed relational DB will perform at least 2 seeks per insertion: one to begin the transaction (in case the transaction can not be completed), and one when the transaction is committed. Add to this additional storage to seek to your index entries and update them.
If your data are large, then the limiting factor will be how fast you can write data. For a hard drive, this will be about 60-120 MB/s. For a solid state disk, you can expect upwards of 200 MB/s. You will (of course) want extra disks for a RAID array. The pertinent figure is storage bandwidth AKA sequential I/O speed.
If writing a lot of small transactions, the limitation will be how fast your disk can seek to a spot and write a small piece of data, measured in IO per second (IOPS). We can estimate that it will take 4-8 seeks per transaction (a reasonable case with transactions enabled and an index or two, plus some integrity checks). For a hard drive, the seek time will be several milliseconds, depending on disk RPM. This will limit you to several hundred writes per second. For a solid state disk, the seek time is under 1 ms, so you can write several THOUSAND transactions per second.
When updating indices, you will need to do about O(log n) small seeks to find where to update, so the DB will slow down as the record counts grow. Remember that a DB may not write in the most efficient format possible, so data size may be bigger than you expect.
So, in general, YES, you can do this with a DBMS, although you will want to invest in good storage to ensure it can keep up with your insertion rate. If you wish to cut on cost, you may want to roll data over a specific age (say 1 year) into a secondary, compressed archive format.
EDIT:
A DBMS is probably the easiest system to work with for storing recent data, but you should strongly consider the HDF5/CDF format someone else suggested for storing older, archived data. It is a flexible and widely supported format that provides compression and VERY efficient storage of large time series and multi-dimensional arrays. I believe it also provides some methods of indexing the data. You should be able to write a little code to fetch from these archive files if data is too old to be in the DB.
There is probably a data structure that would be more optimal for your given case than a relational database.
Having said that, there are many reasons to go with a relational DB including robust code support, backup & replication technology and a large community of experts.
Your use case is similar to high-volume financial applications and telco applications. Both are frequently inserting data and frequently doing queries that are both time-based and include other select factors.
I worked on a mid-sized billing project that handled cable bills for millions of subscribers. That meant an average of around 5 rows per subscriber times a few million subscribers per month in the financial transaction table alone. That was easily handled by a mid-size Oracle server using (now) 4 year old hardware and software. Large billing platforms can have 10x that many records per unit time.
Properly architected and with the right hardware, this case can be handled well by modern relational DB's.
Years ago, a customer of ours tried to load an RDBMS with real-time data collected from monitoring plant machinery. It didn't work in a simplistic way.
Is relational databases a good backend for a process historian?
Yes, but. It needs to store summary data, not details.
You'll need a front end based on in-memory storage and flat files. Periodic summaries and digests can be loaded into an RDBMS for further analysis.
You'll want to look at Data Warehousing techniques for this. Most of what you want to do is to split your data into two essential parts ---
Facts. The data that has units. Actual measurements.
Dimensions. The various attributes of the facts -- date, location, device, etc.
This leads you to a more sophisticated data model.
Fact: Key, Measure 1, Measure 2, ..., Measure n, Date, Geography, Device, Product Line, Customer, etc.
Dimension 1 (Date/Time): Year, Quarter, Month, Week, Day, Hour
Dimension 2 (Geography): location hierarchy of some kind
Dimension 3 (Device): attributes of the device
Dimension *n*: attributes of each dimension of the fact
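A minimal star-schema sketch along those lines (the names and the grain - one reading per device per hour - are illustrative only, not a prescription):

CREATE TABLE DimDate (
    DateKey         INT PRIMARY KEY,   -- e.g. 2012013114 = 2012-01-31, hour 14
    CalendarYear    INT NOT NULL,
    CalendarQuarter INT NOT NULL,
    CalendarMonth   INT NOT NULL,
    DayOfMonth      INT NOT NULL,
    HourOfDay       INT NOT NULL
);

CREATE TABLE DimDevice (
    DeviceKey INT PRIMARY KEY,
    Location  NVARCHAR(64) NOT NULL,
    Model     NVARCHAR(64) NOT NULL
);

CREATE TABLE FactReading (
    DateKey   INT NOT NULL REFERENCES DimDate (DateKey),
    DeviceKey INT NOT NULL REFERENCES DimDevice (DeviceKey),
    Measure1  FLOAT NOT NULL,   -- e.g. temperature
    Measure2  FLOAT NOT NULL,   -- e.g. pressure
    PRIMARY KEY (DateKey, DeviceKey)
);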
You may want to look at KDB. It is specifically optimized for this kind of usage: many inserts, few or no updates or deletes.
It isn't as easy to use as traditional RDBMS though.
The other aspect to consider is what kind of selects you're doing. Relational/SQL databases are great for doing complex joins dependent on multiple indexes, etc. They really can't be beaten for that. But if you're not doing that kind of thing, they're probably not such a great match.
If all you're doing is storing per-time records, I'd be tempted to roll your own file format ... even just output the stuff as CSV (groans from the audience, I know, but it's hard to beat for wide acceptance)
It really depends on your indexing/lookup requirements, and your willingness to write tools to do it.
You may want to take a look at a Stream Data Manager System (SDMS).
While they do not address all your needs (long-term persistence), sliding windows over time and rows and frequently changing data are their points of strength.
Some useful links:
Stanford Stream Data Manager
Stream Mill
Material about Continuous Queries
AFAIK major database makers all should have some kind of prototype version of an SDMS in the works, so I think it's a paradigm worth checking out.
I know you're asking about relational database systems, but those are unicorns. SQL DBMSs are probably a bad match for your needs because no current SQL system (that I know of) provides reasonable facilities to deal with temporal data. Depending on your needs, you might or might not have another option in specialized tools and formats; see e.g. rrdtool.