Data compression algorithm in SAP clusters

I'm interested in data decompression in SAP systems.
Which algorithm is used for compression/decompression in clustered tables, for example in the RFBLG table? I read something about an LZ algorithm, but I'm not sure exactly how it works. Does a detailed description of how it works in SAP exist?

The compression mechanism depends heavily on the DB backend behind NetWeaver.
You can view the compression method in the Database Utility (SE14) via Go To->Storage Parameters.
Generally, SAP uses three compression types, for clustered data as well:
NONE. No compression
ROW. It stores rows in a variable-length format and searches for repetitive patterns to compress.
PAGE. It is performed on top of row compression.
However, their implementation by DB vendors may differ significantly.
SAP and MS created the MSSCOMPRESS report for performing table compression. Look at these articles too, where MS discusses UCS-2 compression for SAP systems:
Choosing SQL Server compression type for SAP
UCS2 compression what is it – Impact on SAP systems
Oracle has its own compression mechanisms, which are described, for example, in note 1436352, Oracle Database 11g Advanced Compression for SAP Systems (S-LOGIN required).
DB2 uses the LZ2 (Lempel-Ziv) algorithm, as you correctly stated. Here is a detailed manual.
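To give a feel for how a Lempel-Ziv dictionary coder works, here is a minimal sketch of LZW, a textbook member of the LZ family. It is not SAP's or DB2's actual implementation; the function names and the 12-bit dictionary limit are assumptions chosen for illustration.

```python
def lzw_compress(data: bytes, max_bits: int = 12) -> list[int]:
    """Encode bytes as a list of dictionary codes (textbook LZW)."""
    max_entries = 1 << max_bits
    # The dictionary starts with every single-byte pattern.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while the pattern is known.
            current = candidate
        else:
            codes.append(table[current])
            if len(table) < max_entries:
                # Learn the new, longer pattern for later reuse.
                table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int], max_bits: int = 12) -> bytes:
    """Rebuild the original bytes, re-deriving the dictionary on the fly."""
    if not codes:
        return b""
    max_entries = 1 << max_bits
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the pattern that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        if len(table) < max_entries:
            table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

The decompressor never needs the dictionary to be stored with the data: it rebuilds the same entries in the same order as the compressor, one step behind it.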

Related

How can NoSQL databases achieve much better write throughput than some relational databases?

How is this possible? What is it about NoSQL that gives it a higher write throughput than some RDBMS? Does it boil down to scalability?

Some NoSQL systems are basically just persistent key/value stores (like Project Voldemort). If your queries are of the type "look up the value for a given key", such a system will be (or at least should be) faster than an RDBMS, because it only needs a much smaller feature set.
Another popular type of noSQL system is the document database (like CouchDB). These databases have no predefined data structure. Their speed advantage relies heavily on denormalization and creating a data layout that is tailored to the queries that you will run on it. For example, for a blog, you could save a blog post in a document together with its comments. This reduces the need for joins and lookups, making your queries faster, but it also could reduce your flexibility regarding queries.
There are many NoSQL solutions around, each one with its own strengths and weaknesses, so the following must be taken with a grain of salt.
But essentially, what many NoSQL databases do is rely on denormalization and try to optimize for the denormalized case. For instance, say you are reading a blog post together with its comments in a document-oriented database. Often, the comments will be saved together with the post itself. This means that it will be faster to retrieve all of them together, as they are stored in the same place and you do not have to perform a join.
Of course, you can do the same in SQL, and denormalizing is a common practice when one needs performance. It is just that many NoSQL solutions are engineered from the start to be always used this way. You then get the usual tradeoffs: for instance, adding a comment in the above example will be slower because you have to save the whole document with it. And once you have denormalized, you have to take care of preserving data integrity in your application.
Moreover, in many NoSQL solutions, it is impossible to do arbitrary joins, hence arbitrary queries. Some databases, like CouchDB, require you to think ahead of the queries you will need and prepare them inside the DB.
All in all, it boils down to expecting a denormalized schema and optimizing reads for that situation, and this works well for data that is not highly relational and that requires many more reads than writes.
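The blog example can be sketched with plain Python dictionaries standing in for both kinds of store. All names and data here are hypothetical; the point is only the shape of the read and write paths.

```python
# Normalized layout: posts and comments live in separate "tables".
posts = {1: {"title": "Hello", "body": "First post"}}
comments = [
    {"post_id": 1, "author": "ann", "text": "Nice"},
    {"post_id": 1, "author": "bob", "text": "Agreed"},
    {"post_id": 2, "author": "cat", "text": "Unrelated"},
]


def read_post_normalized(post_id):
    """Emulates a join: fetch the post, then find its comments."""
    post = dict(posts[post_id])
    post["comments"] = [c for c in comments if c["post_id"] == post_id]
    return post


# Denormalized layout: the comments are embedded in the post document.
documents = {
    1: {
        "title": "Hello",
        "body": "First post",
        "comments": [
            {"author": "ann", "text": "Nice"},
            {"author": "bob", "text": "Agreed"},
        ],
    }
}


def read_post_document(post_id):
    """A single key lookup returns the post together with its comments."""
    return documents[post_id]


def add_comment_document(post_id, author, text):
    """Adding one comment means writing the whole document back."""
    doc = dict(documents[post_id])
    doc["comments"] = doc["comments"] + [{"author": author, "text": text}]
    documents[post_id] = doc
```

The read is one lookup in the document layout, while the write rewrites the whole document, which is the tradeoff described above.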
This link explains a lot more. In short:
RDBMS -> data integrity is a key feature (which can slow down some operations, like writing)
NoSQL -> speed and horizontal scalability are imperative (so speed is really high with this imperative)
Also, NoSQL cannot be compared to SQL in any direct way. NoSQL is the name for all persistence technologies that are not SQL. Document DBs, key-value DBs, and event DBs are all NoSQL. They differ in almost all aspects, be it the structure of saved data, querying, performance, or available tools.
Hope this helps.

In summary, NoSQL databases are built to easily scale across a large number of servers (by sharding/horizontal partitioning of data items), and to be fault tolerant (through replication, write-ahead logging, and data repair mechanisms). Furthermore, NoSQL supports achieving high write throughput (by employing memory caches and append-only storage semantics), low read latencies (through caching and smart storage data models), and flexibility (with schema-less design and denormalization).
From:
Open Journal of Databases (OJDB)
Volume 1, Issue 2, 2014
www.ronpub.com/journals/ojdb
ISSN 2199-3459
https://estudogeral.sib.uc.pt/bitstream/10316/27748/1/Which%20NoSQL%20Database.pdf - page 19

A higher write throughput can also be credited to the internal data structures that power the database storage engine.
Even though B-tree implementations used by some RDBMS have stood the test of time, LSM-trees used in some key-value datastores are typically faster for writes:
1: When a write comes in, you add it to an in-memory balanced tree, called a memtable.
2: When the memtable grows too big, it is flushed to disk.
To understand this data structure better, please check this video and this answer.
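The two steps above can be sketched as a toy store. This is a simplification under stated assumptions: a dict that is sorted at flush time stands in for the balanced tree, in-memory lists stand in for on-disk segment files, and the class name and `memtable_limit` parameter are made up for the example.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: writes go to memory, full memtables are flushed."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.segments = []  # sorted (key, value) lists, newest last

    def put(self, key, value):
        # Step 1: a write only touches the in-memory structure.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Step 2: the memtable is written out as one sorted, immutable segment.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer segments shadow older ones, so search from newest to oldest.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None
```

Writes never modify an existing segment in place, which is why this design favors write throughput; the cost moves to reads, which may have to check several segments.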

Disadvantages of using table compression

Are there any disadvantages of using table compression such as Row compression and Page compression, for example:
ALTER TABLE A
REBUILD WITH (DATA_COMPRESSION = PAGE) --or ROW
If the above command could improve the performance of SQL queries, why don't we use it every time we create a new table, even though it may not affect a table with few data pages?
Any disadvantages of using this?
Thanks
Summary:
Check either @paulbarbin's answer or the conclusion of this post:
As we can see, row- and page-level compression can be powerful tools
to help you reduce space taken by your data and improve the execution
speed, but at the expense of CPU time. This is because each access of
a row or page requires a step to undo the compression (or calculate
and match hashes) and this translates directly into compute time. So,
when deploying row- or page-level compression, conduct some similar
testing (you are welcome to use my framework!) and see how it plays
out in your test environment. Your results should inform your decision
- if you're already CPU-bound, can you afford to deploy this? If your storage is on fire, can you afford NOT to?

Compression does come with an overhead. There is additional CPU required to complete the compression and based on the limitations of compression, you might find that the gain is less than the pain. However, it's my understanding that most people benefit from page compression for most scenarios and use row compression in specific circumstances. I'd say try it in your dev/test environment, determine your cost on CPU and savings in queries and implement if it makes sense.

When page-level compression is applied to a table, row-level compression is applied as well. The benefits of page compression depend on the type of data being compressed: data with many repeating values compresses better than data made up of mostly unique values. One more thing: data compression can change the query plan, because the compressed data occupies a different number of pages and rows. Retrieving compressed data also requires additional CPU.
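A rough way to see why repeating values compress better is a dictionary encoding, where each distinct value is stored once and rows hold only a small reference to it. The sketch below is illustrative only and is not SQL Server's actual page format; the function names and the 2-byte reference size are assumptions.

```python
def dictionary_encode(values):
    """Store each distinct value once; rows keep an index into the dictionary."""
    dictionary = []
    index = {}
    refs = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        refs.append(index[value])
    return dictionary, refs


def encoded_size(values, ref_bytes=2):
    """Approximate size: dictionary text plus one fixed-size reference per row."""
    dictionary, refs = dictionary_encode(values)
    return sum(len(v) for v in dictionary) + ref_bytes * len(refs)


def raw_size(values):
    """Size of the values stored as-is, one character per byte."""
    return sum(len(v) for v in values)
```

For a column of 90 rows of "GERMANY" and 10 rows of "FRANCE", the raw size is 690 bytes and the encoded size is 213. For 100 distinct 7-character IDs, the raw size is 700 bytes and the encoded size grows to 900, so the encoding only pays off when values repeat.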
I suggest using compression only when you have a big warehouse table that contains millions of records and you or your application don't need to query it frequently. You can also use partition-level compression when the table is partitioned.

Scan / Update operation statistics on DB2 and Oracle

I am looking for a way to get statistics from both Oracle and DB2 databases on the number of select/update/insert/delete operations performed on every table. In other words, I would like to know how many scan operations were performed on a given table versus how many modifying operations were executed.
I found that it is possible to do this in MS SQL Server, as described in http://msdn.microsoft.com/en-us/library/dd894051%28v=sql.100%29.aspx
The reason I need this is that it gives a reasonable indication of whether it is worthwhile to apply compression to a given table. The higher the scan/update ratio, the better a candidate the table is. I think this also holds true for other databases.
So is it possible to get these statistics in Oracle and/or DB2? Thanks in advance.
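As a rough illustration of the scan/update ratio idea, the sketch below ranks tables from per-table operation counters. The input format, the function name, and both thresholds are hypothetical; it also simplifies by treating scans plus updates as the total, and real counters would come from whatever the database exposes.

```python
def compression_candidates(stats, min_scan_pct=75.0, max_update_pct=20.0):
    """Return (table, scan_pct) pairs for read-mostly tables, best first.

    stats maps a table name to a (scan_count, update_count) tuple.
    """
    result = []
    for table, (scans, updates) in stats.items():
        total = scans + updates
        if total == 0:
            continue  # no activity recorded, nothing to judge
        scan_pct = 100.0 * scans / total
        update_pct = 100.0 * updates / total
        if scan_pct >= min_scan_pct and update_pct <= max_update_pct:
            result.append((table, round(scan_pct, 1)))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

A table with 900 scans and 100 updates passes with a 90% scan share, while a log table with 10 scans and 990 updates is filtered out.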

In Oracle you can see how many updates/deletes/inserts have been performed on a table in sys.dba_tab_modifications. The data is flushed to the table every 4 hours. For the reads you can use dba_hist_seg_stat, part of AWR. Using AWR requires a license.
sys.dba_tab_modifications is reset once a table gets new optimizer statistics.

My answer applies to the DB2 database engines for Linux, UNIX, and Windows (LUW) platforms, not DB2 for iSeries (AS/400) or DB2 for z/OS, which have significantly different engine internals than the LUW platforms. All of the documentation links I've included reference version 9.7 of DB2 for LUW.
DB2 for LUW provides extensive performance and utilization statistics in every version of the data engine, including the no-cost DB2 Express-C product. The collection of these statistics is governed by a series of database engine settings called system monitor switches. The statistics you seek involve the table monitor switch, and possibly also the statement and UOW (unit of work) monitor switches. When those system monitor switches are enabled, you can retrieve running totals of various performance gauges and counters from snapshot monitors or by selecting from administrative SQL views (in the SYSIBMADM schema) that present the same snapshot monitor output as SQL result sets. The snapshot monitors incur less system overhead than event monitors, which run in the background as a trace and store a stream of detailed information to special tables or files.
Compression is a licensed feature that alters the internal storage of tables and indexes all the way from the tablespace to the buffer pool (RAM cache) to the transaction log file. In most cases, the additional CPU overhead of compression and decompression is more than offset by the overall reduction in I/O. The deep row compression feature compresses rows in tables by building and using a 12-bit dictionary of multi-byte patterns that can even cross column boundaries. Enabling deep row compression for a table typically reduces its size by 40% or more before DBA intervention. Indexes are compressed through a shorthand algorithm that exploits their sorted nature by omitting common leading bytes between the current and previous index keys.
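The index shorthand described above, omitting the leading bytes a key shares with the previous key, can be sketched as follows. This is a generic prefix-compression illustration, not DB2's on-disk format, and the function names are made up for the example.

```python
def shared_prefix_len(a, b):
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_keys(sorted_keys):
    """Store each key as (length shared with previous key, remaining suffix)."""
    compressed = []
    previous = ""
    for key in sorted_keys:
        shared = shared_prefix_len(previous, key)
        compressed.append((shared, key[shared:]))
        previous = key
    return compressed


def decompress_keys(compressed):
    """Rebuild the full keys by reusing the prefix of the previous key."""
    keys = []
    previous = ""
    for shared, suffix in compressed:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys
```

Because index keys are sorted, neighbors tend to share long prefixes: "customer_0001" followed by "customer_0002" is stored as just `(12, "2")`.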

MongoDB and PostgreSQL thoughts

I've got an app fully working with PostgreSQL. After reading about MongoDB, I was interested to see how the app would work with it. After a few weeks, I migrated the whole system to MongoDB.
I like a few things with MongoDB. However, I found certain queries I was doing in PostgreSQL, I couldn't do efficiently in MongoDB. Especially, when I had to join several tables to calculate some logic. For example, this.
Moreover, I am using Ruby on Rails 3, and an ODM called Mongoid. Mongoid is still in beta release. Documentation was good, but again, at times I found the ODM to be very limiting compared to what Active Record offered with traditional (SQL) database systems.
Even to this date, I feel more comfortable working with PostgreSQL than MongoDB. Only because I can join tables and do anything with the data.
I've made two types of backups. One with PostgreSQL and the other with MongoDB. Some say, some apps are more suitable with one or the other type of db. Should I continue with MongoDB and eventually hope for its RoR ODM (Mongoid) to fully mature, or should I consider using PostgreSQL?
A few more questions:
1) Which one would be more suitable for developing a social networking site similar to Facebook.
2) Which one would be more suitable for 4-page standard layout type of website (Home, Products, About, Contact)

You dumped a decades-tested, fully featured RDBMS for a young, beta-quality, feature-thin document store with little community support. Unless you're already running tens of thousands of dollars a month in servers and think MongoDB was a better fit for the nature of your data, you probably wasted a lot of time for negative benefit. MongoDB is fun to toy with, and I've built a few apps using it myself for that reason, but it's almost never a better choice than Postgres/MySQL/SQL Server/etc. for production applications.

Let's quote what you wrote and see what it tells us:
"I like a few things with Mongodb. However, I found certain queries I was
doing in PostgreSql, I couldn't do efficiently in Mongodb. Especially,
when I had to join several tables to calculate some logic."
"I found the ODM to be very limiting compared to what Active Record offered
with traditional (SQL) database systems."
"I feel more comfortable working with PostgreSql than Mongodb. Only because
I can join tables and do anything with the data."
Based on what you've said it looks to me like you should stick with PostgreSQL. Keep an eye on MongoDB and use it if and when it's appropriate. But given what you've said it sounds like PG is a better fit for you at present.
Share and enjoy.

I haven't used MongoDB yet, and may never get round to it as I haven't found anything I can't do with Postgres, but just to quote the PostgreSQL 9.2 release notes:
With PostgreSQL 9.2, query results can be returned as JSON data types.
Combined with the new PL/V8 Javascript and PL/Coffee database
programming extensions, and the optional HStore key-value store, users
can now utilize PostgreSQL like a "NoSQL" document database, while
retaining PostgreSQL's reliability, flexibility and performance.
So looks like in new versions of Postgres you can have the best of both worlds. I haven't used this yet either but as a bit of a fan of PostgreSQL (excellent docs / mailing lists) I wouldn't hesitate using it for almost anything RDBMS related.

First of all, Postgres is an RDBMS and MongoDB is NoSQL.
Stand-alone NoSQL technologies do not meet ACID standards, because they sacrifice critical data protections in favor of high-throughput performance for unstructured applications.
Postgres 9.4 provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
So you get the advantages of both RDBMS and NoSQL.
For a detailed article, see http://www.aptuz.com/blog/is-postgres-nosql-database-better-than-mongodb/
To experience Postgres' NoSQL performance for yourself, download pg_nosql_benchmark from GitHub: https://github.com/EnterpriseDB/pg_nosql_benchmark

We also researched which is better, Postgres or MongoDB. With all the facts and figures in hand, we found that Postgres is far better to use than MongoDB. Besides eating up memory and CPU, MongoDB also occupies a large amount of disk space: its disk usage grows by 2x at certain intervals.

My experience with Postgres and Mongo, after working with both databases in my projects:
Postgres (RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins, if all the data is related, or if you have heavy writes. Postgres is open source, fast, ACID compliant, and uses less disk space. It is also a good all-round performer for JSON storage, and it supports full serializability of transactions with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we get the best of both worlds. We can store data in JSONB with constraints, consistency and speed. On the other hand, we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
General Constraints for Postgres
Scaling Postgres horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NoSQL Databases
MongoDB (WiredTiger)
MongoDB may beat Postgres in the dimension of "horizontal scale". Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSON, which is (roughly) just a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, for write-intensive applications the new engine (WiredTiger) gives users up to a 10x increase in write performance (I should try this), with an 80 percent reduction in storage utilization, helping to lower storage costs and achieve greater utilization of hardware.
General Constraints of MongoDB
The usage of a schemaless storage engine leads to the problem of implicit schemas. These schemas aren't defined by our storage engine but instead are defined based on application behavior and expectations.
Stand-alone NoSQL technologies do not meet ACID standards, because they sacrifice critical data protections in favor of high-throughput performance for unstructured applications. It's not hard to apply ACID to NoSQL databases, but it would make the database slower and less flexible to some extent.
"Most of the NoSQL limitations have been addressed in newer versions and releases, which have overcome the previous limitations to a great extent."
Which one would be more suitable for developing a social networking site similar to Facebook?
Facebook currently uses a combination of databases such as Hive and Cassandra.
Which one would be more suitable for a 4-page standard layout type of website (Home, Products, About, Contact)?
Again, it depends on how you want to store and process your data, but any SQL or NoSQL database would do the job.

Database Optimization: Need a really big database to test some of the features of SQL Server

I have done database optimization for DBs up to 3 GB in size. I need a really large database to test optimization.

Simply generating a lot of data and throwing it into a table proves nothing about the DBMS, the database itself, the queries being issued against it, or the applications interacting with them, all of which factor into the performance of a database-dependent system.
The phrase "I have done database optimization for [databases] up to 3 GB" is highly suspect. What databases? On what platform? Using what hardware? For what purposes? For what scale? What was the model? What were you optimizing? What was your budget?
These same questions apply to any database, regardless of size. I can tell you first-hand that "optimizing" a 250 GB database is not the same as optimizing a 25 GB database, which is certainly not the same as optimizing a 3 GB database. But that is not merely on account of the database size, it is because databases that contain 250 GB of data invariably deal with requirements that are vastly different from those addressed by a 3 GB database.
There is no magic size barrier at which you need to change your optimization strategy; every optimization requires in-depth knowledge of the specific data model and its usage requirements. Maybe you just need to add a few indexes. Maybe you need to remove a few indexes. Maybe you need to normalize, denormalize, rewrite a couple of bad queries, change locking semantics, create a data warehouse, implement caching at the application layer, or look into the various kinds of vertical scaling available for your particular database platform.
I submit that you are wasting your time attempting to create a "really big" database for the purposes of trying to "optimize" it with no specific requirements in mind. Various data-generation tools are available for when you need to generate data fitting specific patterns for testing against a specific set of scenarios, but until you have that information on hand, you won't accomplish very much with a database full of unorganized test data.

The best way to do this is to create your schema and write a script to populate it with lots of random(ish) dummy data. Random, meaning that your text-fields don't necessarily have to make sense. 'ish', meaning that the data distribution and patterns should generally reflect your real-world DB usage.
Edit: a quick Google search reveals a number of commercial tools that will do this for you if you don't want to write your own populate scripts: DB Data Generator, DTM Data Generator. Disclaimer: I've never used either of these and can't really speak to their quality or usefulness.
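A populate script along these lines might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for the real server, and the `orders` table, its columns, and the chosen distributions are hypothetical; the text field is pure noise, while the status and customer columns are deliberately skewed to mimic real-world usage.

```python
import random
import sqlite3
import string


def populate(conn, rows, seed=42):
    """Fill a hypothetical orders table with random(ish) dummy data."""
    rng = random.Random(seed)  # fixed seed keeps test runs repeatable
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, note TEXT)"
    )
    # The "ish" part: most orders are open, few are cancelled.
    statuses = ["open"] * 70 + ["shipped"] * 25 + ["cancelled"] * 5
    batch = []
    for _ in range(rows):
        # The "random" part: text that does not need to make sense.
        note = "".join(rng.choices(string.ascii_lowercase, k=20))
        # A heavy-tailed distribution, so a few customers place most orders.
        customer_id = min(int(rng.paretovariate(1.2)), 10_000)
        batch.append((customer_id, rng.choice(statuses), note))
    conn.executemany(
        "INSERT INTO orders (customer_id, status, note) VALUES (?, ?, ?)",
        batch,
    )
    conn.commit()
```

Calling `populate(sqlite3.connect(":memory:"), 1000)` fills an in-memory database; for a real test you would point the same idea at your own schema and tune the distributions to match production.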

Here is a free procedure I wrote to generate Random person names. Quick and dirty, but it works and might help.
http://www.joebooth-consulting.com/products/genRandNames.sql

I use Red-Gate's Data Generator regularly to test out problems as well as loads on real systems and it works quite well. That said, I would agree with Aaronnaught's sentiment in that the overall size of the database isn't nearly as important as the usage patterns and the business model. For example, generating 10 GB of data on a table that will eventually get no traffic will not provide any insight into optimization. The goal is to replicate the expected transaction and storage loads you anticipate to occur in order to identify bottlenecks before they occur.
