H2 database unusable when table contains 9 million records - sql

I am using H2 as an embedded database started with AUTO_SERVER=TRUE.
At the very start I had only a few records and performance wasn't an issue even with no indexes defined.
When the table had more records performance seriously degraded and this was resolved by adding an index.
The database then performed very well until recently, when the number of records passed 8 million. Since then I have been unable to get any acceptable performance out of it, and changing settings such as cache_size has brought no improvement.
I have seen posts from people using H2 with many millions and even billions of records, so is there something basic I am missing? Even basic queries such as select count(*) from HISTORICALDATA2 take so long that I end up cancelling the query.
Here is the table definition:
CREATE TABLE "PUBLIC"."HISTORICALDATA2"
(
REQUESTID integer,
SYMBOL varchar(50) NOT NULL,
EXCHANGE varchar(20),
SECTYPE varchar(10),
CURRENCYNAME varchar(5),
ENDDATETIME varchar(20),
DURATION varchar(20),
BARSIZE varchar(20),
WHATTOSHOW varchar(20),
USERTH integer,
FORMATDATE integer,
CHARTOPTIONS varchar(50),
DATETIMEDATA timestamp,
OPEN_PRICE decimal(20,2),
HIGH_PRICE decimal(20,2),
LOW_PRICE decimal(20,2),
CLOSE_PRICE decimal(20,2),
VOLUME integer,
COUNT_FIELD integer,
WAP integer,
HASGAPS boolean,
TSTAMP timestamp DEFAULT CURRENT_TIMESTAMP()
);
And the index:
CREATE INDEX HD_MAIN ON "PUBLIC"."HISTORICALDATA2"
(
SYMBOL,
EXCHANGE,
ENDDATETIME,
WHATTOSHOW,
DURATION,
BARSIZE
);

H2 and Production
H2 is not usually used as a production database. See "Are there any reasons why h2 database shouldn't be used in production?" for more details. Many of the answers there give valid reasons for not doing this.
Migrating to Another DB
You can migrate your records away from H2 and move them to Postgres, MySQL, or Oracle.
See: "How to convert H2Database database file to MySQL database .sql file?"
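As a first step, H2 can dump the schema and data to a plain SQL script with its SCRIPT statement, which you can then adapt for the target database. A minimal sketch (the file names are just examples):
SCRIPT TO 'historicaldata2_dump.sql';
-- schema only, without the row data
SCRIPT NODATA TO 'historicaldata2_schema.sql';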

Related

clickhouse table creation error in ubuntu

I installed ClickHouse on Ubuntu and connected to it with SQLyog. I can create a database, but I cannot create a table in it. It gives the following Code: 119 error.
ubuntu :) create table taxonomy_object_firewalls
CREATE TABLE taxonomy_object_firewalls
Query id: 37520dd5-44b9-436f-a5b6-96002f0a4ce7
0 rows in set. Elapsed: 0.001 sec.
Received exception from server (version 22.2.2):
Code: 119. DB::Exception: Received from localhost:9000. DB::Exception: Table engine is not specified in CREATE query. (ENGINE_REQUIRED)
Note that Rich's syntax has a small unnecessary () after the ENGINE definition, but the table was still created successfully in my testing. This syntax is closer to the documentation:
CREATE DATABASE IF NOT EXISTS helloworld;
CREATE TABLE helloworld.my_first_table
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree
PRIMARY KEY (user_id, timestamp);
Check out the "Getting Started" guide in the docs. ClickHouse is unique in that every table has an Engine. Table engines determine where and how the data is stored. When in doubt, use MergeTree:
CREATE DATABASE IF NOT EXISTS helloworld;
CREATE TABLE helloworld.my_first_table
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)
Also, make sure you read about the primary key and how it works in ClickHouse. Primary keys are not typically unique, but instead determine the sort order.
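For example, both of the following inserts are accepted even though they use the same user_id, because the primary key here only controls how rows are sorted, not uniqueness (the values are made up for illustration):
INSERT INTO helloworld.my_first_table VALUES (1, 'first message', now(), 1.0);
INSERT INTO helloworld.my_first_table VALUES (1, 'another row with the same user_id', now(), 2.0);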

Handle >10 million rows for one table (postgresql)

I imported 11 million location names from geonames.org into my PostgreSQL database. However, when I try to just view the data, for instance in TablePlus, it is extremely slow. Executing a simple select for one row takes about 2 minutes. What can I do with this much data so that it won't be too slow and I can select rows quickly?
I think I don't have any indexes; would that make a difference?
This is my table:
create table geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames text,
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(120),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);
You need to specify what the query looks like.
Indexes would definitely make a difference. But the type of index depends on the query you are using and the columns used for selecting one or more rows.
The place to start is by defining a primary key on the table. Presumably, geonameid is the primary key. You can do this:
alter table geoname add constraint pk_geoname_geonameid primary key (geonameid);
You should really do this when you create the table, but better late than never.
If you are searching by geonameid, then you will notice a significant speed-up.
If you want to search by other columns, such as name or asciiname, then add indexes for those:
create index idx_geoname_name on geoname(name);
create index idx_geoname_asciiname on geoname(asciiname);
This doesn't work for all searches. If your criterion is a LIKE with wildcards, you may need a different indexing strategy. Similarly, if you search by latitude and longitude, you'll want a GIS-style index.
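A minimal sketch of both ideas, assuming the pg_trgm extension is available and the plain latitude/longitude columns are wrapped in the built-in point type (the index names are just examples):
-- lets LIKE '%foo%' searches on name use an index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_geoname_name_trgm ON geoname USING gin (name gin_trgm_ops);
-- expression index for nearest-neighbour lookups on the coordinates
CREATE INDEX idx_geoname_location ON geoname USING gist (point(longitude, latitude));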

Optimizing GROUP BY in hsqldb

I have a table with 700K+ records on which a simple GROUP BY query takes in excess of 35 seconds to execute. I'm out of ideas on how to optimize this.
SELECT TOP 10 called_dn, COUNT(called_dn) FROM reportview.calls_out GROUP BY called_dn;
Here I add TOP 10 to limit network transfer induced delays.
I have an index on called_dn (hsqldb seems not to be using this).
called_dn is non nullable.
reportview.calls_out is a cached table.
Here's the table script:
CREATE TABLE calls_out (
pk_global_call_id INTEGER GENERATED BY DEFAULT AS SEQUENCE seq_global_call_id NOT NULL,
sys_global_call_id VARCHAR(65),
call_start TIMESTAMP WITH TIME ZONE NOT NULL,
call_end TIMESTAMP WITH TIME ZONE NOT NULL,
duration_interval INTERVAL HOUR TO SECOND(0),
duration_seconds INTEGER,
call_segments INTEGER,
calling_dn VARCHAR(25) NOT NULL,
called_dn VARCHAR(25) NOT NULL,
called_via_dn VARCHAR(25),
fk_end_status INTEGER NOT NULL,
fk_incoming_queue INTEGER,
call_start_year INTEGER,
call_start_month INTEGER,
call_start_week INTEGER,
call_start_day INTEGER,
call_start_hour INTEGER,
call_start_minute INTEGER,
call_start_second INTEGER,
utc_created TIMESTAMP WITH TIME ZONE,
created_by VARCHAR(25),
utc_modified TIMESTAMP WITH TIME ZONE,
modified_by VARCHAR(25),
PRIMARY KEY (pk_global_call_id),
FOREIGN KEY (fk_incoming_queue)
REFERENCES lookup_incoming_queue(pk_id),
FOREIGN KEY (fk_end_status)
REFERENCES lookup_end_status(pk_id));
Am I stuck with this kind of performance, or is there something I might try to speed up this query?
EDIT: Here's the query plan if it helps:
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[true]
columns=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN not nullable
COUNT arg=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN nullable]
[range variable 1
join type=INNER
table=CALLS_OUT
cardinality=771855
access=FULL SCAN
join condition = [index=SYS_IDX_SYS_PK_10173_10177]]]
groupColumns=[COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN]
offset=[VALUE = 0, TYPE = INTEGER]
limit=[VALUE = 10, TYPE = INTEGER]
PARAMETERS=[]
SUBQUERIES[]
Well, it seems there's no way to avoid a full scan in this situation.
Just for the reference of future souls reaching this question, here's what I resorted to in the end:
I created a summary table maintained by INSERT / DELETE triggers on the original table. This, in combination with suitable indexes and LIMIT USING INDEX clauses in my queries, yields very good performance (a sketch follows).
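A minimal sketch of that approach, assuming a summary table keyed by called_dn (all names below are made up for illustration, and the schema prefix is omitted):
CREATE TABLE called_dn_counts (
called_dn VARCHAR(25) PRIMARY KEY,
call_count INTEGER NOT NULL);

CREATE TRIGGER trg_calls_out_ins AFTER INSERT ON calls_out
REFERENCING NEW ROW AS newrow FOR EACH ROW
MERGE INTO called_dn_counts
USING (VALUES(newrow.called_dn)) AS v(dn) ON called_dn_counts.called_dn = v.dn
WHEN MATCHED THEN UPDATE SET called_dn_counts.call_count = called_dn_counts.call_count + 1
WHEN NOT MATCHED THEN INSERT VALUES (v.dn, 1);

CREATE TRIGGER trg_calls_out_del AFTER DELETE ON calls_out
REFERENCING OLD ROW AS oldrow FOR EACH ROW
UPDATE called_dn_counts SET call_count = call_count - 1
WHERE called_dn = oldrow.called_dn;
The top-10 query then reads the small summary table instead of scanning calls_out:
SELECT TOP 10 called_dn, call_count FROM called_dn_counts;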

On duplicate key update feature in H2

I have developed a Java desktop application using H2 (embedded). I only have basic knowledge about databases, so I simply installed H2, created a schema named RecordAutomation and then added tables to that schema. Now I am trying to use the ON DUPLICATE KEY UPDATE feature for a specific table, but it is not working and gives an SQL syntax error. I checked my query and it looks right to me; it is given below:
INSERT INTO RECORDAUTOMATION.MREPORT
(PRODUCTID ,DESCRIPTION ,QUANTITY ,SUBTOTAL ,PROFIT )
VALUES (22,olper,5,100,260)
ON DUPLICATE KEY UPDATE SET QUANTITY = QUANTITY+5;
I searched and tried to solve this problem; somewhere it is discussed that this feature does not work for non-default tables. I have no idea about default and non-default tables. Please help me.
You need to use the MySQL mode. To do that, append ;mode=MySQL to the database URL. (This feature is not properly documented yet).
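For example, with an embedded database file (the path here is just an illustration):
jdbc:h2:~/recordautomation;MODE=MySQL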
The table needs to have a primary key or at least a unique index. Complete example:
drop table MREPORT;
set mode MySQL;
create table MREPORT(PRODUCTID int primary key,
DESCRIPTION varchar, QUANTITY int, SUBTOTAL int, PROFIT int);
INSERT INTO MREPORT
(PRODUCTID ,DESCRIPTION ,QUANTITY ,SUBTOTAL ,PROFIT )
VALUES (22,'olper',5,100,260)
ON DUPLICATE KEY UPDATE QUANTITY = QUANTITY+5;

tuning search on combination of float and string

I have this schema
RESTAURANT
(id int not null,
name varchar(50),
place varchar(100),
distance float,
a varchar(50),
b varchar(50),
c varchar(50),
d varchar(50),
PRIMARY KEY (id))
and I'm tuning a search function for this table.
a, b, c, d are different fields used in the search, but what I need to focus on are place and distance, because most of the queries are actually performed on the combination of these two fields.
I'm using DB2, and I'm not really skilled; any suggestions on where to start?
What you need is to use indexes. You can run the Design Advisor to see what indexes DB2 proposes:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005144.html
For more information about indexes, you can take a look at:
http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2010_Issue4/DataArchitect/index.html
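Given that most queries combine place and distance, a composite index on those two columns is a reasonable starting point; a minimal sketch (the index name is just an example):
CREATE INDEX idx_restaurant_place_distance ON restaurant (place, distance);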