SQL - Selecting random rows and combining into a new table

Here's the creation of my tables...
CREATE TABLE questions(
id INTEGER PRIMARY KEY AUTOINCREMENT,
question VARCHAR(256) UNIQUE NOT NULL,
rangeMin INTEGER,
rangeMax INTEGER,
level INTEGER NOT NULL,
totalRatings INTEGER DEFAULT 0,
totalStars INTEGER DEFAULT 0
);
CREATE TABLE games(
id INTEGER PRIMARY KEY AUTOINCREMENT,
level INTEGER NOT NULL,
inclusive BOOL NOT NULL DEFAULT 0,
active BOOL NOT NULL DEFAULT 0,
questionCount INTEGER NOT NULL,
completedCount INTEGER DEFAULT 0,
startTime DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE gameQuestions(
gameId INTEGER,
questionId INTEGER,
asked BOOL DEFAULT 0,
FOREIGN KEY(gameId) REFERENCES games(id),
FOREIGN KEY(questionId) REFERENCES questions(id)
);
I'll explain the full steps that I'm doing, and then I'll ask for input.
I need to...
1. Using a games.id value, look up the games.questionCount and games.level for that game.
2. Now that I have games.questionCount and games.level, look at all of the rows in the questions table with questions.level = games.level and select games.questionCount of them at random.
3. With the rows (aka questions) I got from step 2, put them into the gameQuestions table using the games.id value and the questions.id value.
What's the best way to accomplish this? I could do it with several different SQL queries, but I feel like someone really skilled with SQL could make it happen a bit more efficiently. I am using sqlite3.

This does it in one statement. Let's assume :game_id is the game id you want to process.
insert into gameQuestions (gameId, questionId)
select :game_id, id
from questions
where level = (select level from games where id = :game_id)
order by random()
limit (select questionCount from games where id = :game_id);
@Tony: the SQLite docs say LIMIT takes an expression. The above statement works fine using SQLite 3.8.0.2 and produces the desired results; I have not tested an older version.
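As a quick sanity check (a sketch, assuming a game with id 1 already exists and the insert has been run once for it), the number of rows added should equal that game's questionCount:
select questionCount from games where id = 1;
select count(*) from gameQuestions where gameId = 1;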

Related

How to insert a new row into a table that has a unique autoincrement id?

After creating the table with a unique autoincrement ID, I realized my table lacks a row. But I don't know how to insert it without compromising the order of the other rows in the table!
TABLE flights
id INTEGER PRIMARY KEY AUTOINCREMENT,
origin TEXT NOT NULL,
destination TEXT NOT NULL,
duration INTEGER NOT NULL
I want to insert the row 2|Shanghai|Paris|760 into the table with id = 2. The current table:
1|New York|London|415
2|Istanbul|Tokyo|700
3|New York|Paris|435
4|Moscow|Paris|245
5|Lima|New York|455
The table I want:
1|New York|London|415
2|Shanghai|Paris|760
3|Istanbul|Tokyo|700
4|New York|Paris|435
5|Moscow|Paris|245
6|Lima|New York|455
Thanks for any advice!
There's no way you can do this with an auto-increment ID, because IDs are not meant to order rows; they identify a row and guarantee it's the only row with that ID. If you want an explicit ordering, add a separate column for that purpose; that way the IDs stay the same and you can sort on the new column.
CREATE TABLE flights (
id INTEGER PRIMARY KEY AUTOINCREMENT,
"index" INTEGER NOT NULL UNIQUE, -- "index" is a reserved word, so it has to be quoted
origin TEXT NOT NULL,
destination TEXT NOT NULL,
duration INTEGER NOT NULL
);
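A rough sketch of how that ordering column would then be used (assuming the schema above and that the new flight should take position 2):
-- shift the existing positions up by one; the two-step sign flip avoids
-- transient UNIQUE violations while rows are being renumbered
UPDATE flights SET "index" = -("index" + 1) WHERE "index" >= 2;
UPDATE flights SET "index" = -"index" WHERE "index" < 0;
-- insert the new flight at position 2
INSERT INTO flights ("index", origin, destination, duration)
VALUES (2, 'Shanghai', 'Paris', 760);
Reading the rows back with ORDER BY "index" then gives the row order from the question, while every id keeps its original value.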

GroupBy query for billion records - Vertica

I am working on an application where records number in the billions, and I need to run a query that requires a GROUP BY clause.
Table Schema:
CREATE TABLE event (
eventId INTEGER PRIMARY KEY,
eventTime INTEGER NOT NULL,
sourceId INTEGER NOT NULL,
plateNumber VARCHAR(10) NOT NULL,
plateCodeId INTEGER NOT NULL,
plateCountryId INTEGER NOT NULL,
plateStateId INTEGER NOT NULL
);
CREATE TABLE source (
sourceId INTEGER PRIMARY KEY,
sourceName VARCHAR(32) NOT NULL
);
Scenario:
The user will select sources, say source IDs (1,2,3).
We need to get all events which occurred more than once for those sources within an event time range.
Events count as the same when they share plateNumber, plateCodeId, plateStateId and plateCountryId.
I have prepared a query to perform the above operation, but it's taking a long time to execute.
select plateNumber, plateCodeId, plateStateId,
plateCountryId, sourceId,count(1) from event
where sourceId in (1,2,3)
group by sourceId, plateCodeId, plateStateId,
plateCountryId, plateNumber
having count(1) > 1 limit 10 offset 0
Can you recommend optimized query for it?
Since you didn't supply the projection DDL, I'll assume the projection is the default one created by the CREATE TABLE statement.
Your goal is to get the GROUPBY PIPELINED algorithm used instead of GROUPBY HASH, which is usually slower and consumes more memory.
To do so, you need the table's projection to be sorted by the columns in the GROUP BY clause.
More info here: GROUP BY Implementation Options
CREATE TABLE event (
eventId INTEGER PRIMARY KEY,
eventTime INTEGER NOT NULL,
sourceId INTEGER NOT NULL,
plateNumber VARCHAR(10) NOT NULL,
plateCodeId INTEGER NOT NULL,
plateCountryId INTEGER NOT NULL,
plateStateId INTEGER NOT NULL
)
ORDER BY sourceId,
plateCodeId,
plateStateId,
plateCountryId,
plateNumber;
You can see which algorithm is being used by adding EXPLAIN before your query.
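For example (using the query from the question; the exact plan text varies by Vertica version), look for GROUPBY PIPELINED rather than GROUPBY HASH in the output:
EXPLAIN
select plateNumber, plateCodeId, plateStateId,
plateCountryId, sourceId, count(1) from event
where sourceId in (1,2,3)
group by sourceId, plateCodeId, plateStateId,
plateCountryId, plateNumber
having count(1) > 1 limit 10 offset 0;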

SQLITE3: find IDs across multiple tables

I would like to do an analysis of which codes appear in multiple tables under certain conditions. However, I don't think the database schema suits the task very well, but maybe there's something I don't know about that can help. Here's a simplified schema:
CREATE TABLE "batchDescription" (
id INTEGER NOT NULL,
name TEXT NOT NULL UNIQUE,
PRIMARY KEY (id)
);
CREATE TABLE "simulationDetails" (
id INTEGER NOT NULL,
ko_index_id INTEGER NOT NULL,
batch_description_id INTEGER NOT NULL,
data1 REAL NOT NULL,
data2 INTEGER NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id),
FOREIGN KEY(batch_description_id) REFERENCES "batchDescription" (id)
);
CREATE TABLE "koIndex" (
id INTEGER NOT NULL,
number_of_kos INTEGER NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE "1kos" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id),
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
);
CREATE TABLE "2kos" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
ko2 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id),
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
);
CREATE TABLE "3kos" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
ko2 INTEGER NOT NULL,
ko3 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id),
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
);
This goes up to table "525kos" which has ko1 to ko525 in it - ko1 to ko525 are IDs that are primary keys in a table not shown here. I want to do an analysis of how often certain IDs are present under certain conditions. Here is a simple example to illustrate:
I would like to count the number of times a certain ID (let's say 127, in any koX column) occurs in the "13kos" table when simulationDetails.data1 is not equal to 0. I would do this on a database called ko.db from the bash command line like:
for ko_idx in {1..13}; do sqlite3 ko.db "select count(ko${ko_idx}) from '13kos' where ko${ko_idx} = 127 and ko_index_id in (select ko_index_id from simulationDetails where data1 != 0);"; done
This is already slow and inefficient, but it's simple compared to what I would like to do. What if I wanted to do an analysis of all the IDs in all possible columns in all "Xkos" tables and compare the cases where data1 is equal to zero against those where it is not?
Can anybody direct me to a better way of doing this, or is the schema design just not very good for this kind of analysis and I'll have to give up?
EDIT: Thought I'd add a bit of extra detail to avoid confusion. I suspect that a good way to achieve what I want would be to somehow combine all the "Xkos" tables into one temporary table and then search for certain IDs from that table. How would I combine all 525 ko tables without writing out each table name?
How would I combine all 525 ko tables without writing out each table name?
1. Create a table with the same number of columns as the largest table (the table into which you merge), allowing nulls.
2. Query the sqlite_master table using something like:
SELECT * from sqlite_master WHERE name LIKE '%kos%' AND type = 'table';
3. Loop through the extracted table names, building an INSERT INTO table SELECT ...; statement for each table that inserts its rows into the table created in step 1, especially in regard to handling missing columns (see the sketch below).
4. All done: the table created in step 1 will be populated accordingly.
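A rough sketch of what the generated statements could look like (assuming a merged table named allKos; only three ko columns are shown for brevity, the real table would need ko1 up to ko525):
CREATE TABLE allKos (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER,
ko2 INTEGER,
ko3 INTEGER
);
-- one INSERT ... SELECT per "Xkos" table; columns missing from a source table are simply left NULL
INSERT INTO allKos (ko_index_id, ko1) SELECT ko_index_id, ko1 FROM "1kos";
INSERT INTO allKos (ko_index_id, ko1, ko2) SELECT ko_index_id, ko1, ko2 FROM "2kos";
INSERT INTO allKos (ko_index_id, ko1, ko2, ko3) SELECT ko_index_id, ko1, ko2, ko3 FROM "3kos";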

Optimizing GROUP BY in hsqldb

I have a table with 700K+ records on which a simple GROUP BY query takes in excess of 35 seconds to execute. I'm out of ideas on how to optimize this.
SELECT TOP 10 called_dn, COUNT(called_dn) FROM reportview.calls_out GROUP BY called_dn;
Here I add TOP 10 to limit network transfer induced delays.
I have an index on called_dn (hsqldb seems not to be using this).
called_dn is non nullable.
reportview.calls_out is a cached table.
Here's the table script:
CREATE TABLE calls_out (
pk_global_call_id INTEGER GENERATED BY DEFAULT AS SEQUENCE seq_global_call_id NOT NULL,
sys_global_call_id VARCHAR(65),
call_start TIMESTAMP WITH TIME ZONE NOT NULL,
call_end TIMESTAMP WITH TIME ZONE NOT NULL,
duration_interval INTERVAL HOUR TO SECOND(0),
duration_seconds INTEGER,
call_segments INTEGER,
calling_dn VARCHAR(25) NOT NULL,
called_dn VARCHAR(25) NOT NULL,
called_via_dn VARCHAR(25),
fk_end_status INTEGER NOT NULL,
fk_incoming_queue INTEGER,
call_start_year INTEGER,
call_start_month INTEGER,
call_start_week INTEGER,
call_start_day INTEGER,
call_start_hour INTEGER,
call_start_minute INTEGER,
call_start_second INTEGER,
utc_created TIMESTAMP WITH TIME ZONE,
created_by VARCHAR(25),
utc_modified TIMESTAMP WITH TIME ZONE,
modified_by VARCHAR(25),
PRIMARY KEY (pk_global_call_id),
FOREIGN KEY (fk_incoming_queue)
REFERENCES lookup_incoming_queue(pk_id),
FOREIGN KEY (fk_end_status)
REFERENCES lookup_end_status(pk_id));
Am I stuck with this kind of performance, or is there something I might try to speed up this query?
EDIT: Here's the query plan if it helps:
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[true]
columns=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN not nullable
COUNT arg=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN nullable]
[range variable 1
join type=INNER
table=CALLS_OUT
cardinality=771855
access=FULL SCAN
join condition = [index=SYS_IDX_SYS_PK_10173_10177]]]
groupColumns=[COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN]
offset=[VALUE = 0, TYPE = INTEGER]
limit=[VALUE = 10, TYPE = INTEGER]
PARAMETERS=[]
SUBQUERIES[]
Well, it seems there's no way to avoid a full scan in this situation.
Just for reference of future souls reaching this question, here's what I resorted to in the end:
I created a summary table maintained by INSERT/DELETE triggers on the original table. This, in combination with suitable indexes and LIMIT USING INDEX clauses in my queries, yields very good performance.
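A rough sketch of that summary-table idea (the names are hypothetical, only the insert side is shown, and the MERGE-based trigger body is just one possible way to express it in HSQLDB):
CREATE TABLE called_dn_summary (
called_dn VARCHAR(25) PRIMARY KEY,
call_count INTEGER DEFAULT 0 NOT NULL
);
-- keep the per-number count up to date on every insert into calls_out
CREATE TRIGGER trg_calls_out_ins AFTER INSERT ON calls_out
REFERENCING NEW ROW AS newrow
FOR EACH ROW
MERGE INTO called_dn_summary s
USING (VALUES (newrow.called_dn)) AS v(dn) ON s.called_dn = v.dn
WHEN MATCHED THEN UPDATE SET call_count = s.call_count + 1
WHEN NOT MATCHED THEN INSERT VALUES (v.dn, 1);
-- the report then reads from the small summary table instead of scanning calls_out
SELECT TOP 10 called_dn, call_count FROM called_dn_summary;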

Extremely poor insert performance with SQLite3

I use SQLite3 (exactly 3.9.2, 2015-11-02) in my application. For test purposes I have a few tables.
One of them has a schema as follows:
CREATE TABLE "Recordings" (
`PartId` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
`CameraId` INTEGER NOT NULL,
`StartTime` INTEGER NOT NULL,
`EndTime` INTEGER NOT NULL,
`FilePath` TEXT NOT NULL UNIQUE,
`FileSize` INTEGER NOT NULL,
`DeleteLockManual` INTEGER NOT NULL DEFAULT 0,
`Event1` INTEGER NOT NULL,
`Event2` INTEGER NOT NULL,
`Event3` INTEGER NOT NULL,
`Event4` INTEGER NOT NULL,
`Event5` INTEGER NOT NULL,
`Event6` INTEGER NOT NULL,
FOREIGN KEY(`CameraId`) REFERENCES Devices ( CameraId )
);
CREATE INDEX `Recordings_Event2` ON `Recordings` (`Event2`);
CREATE INDEX `Recordings_Event3` ON `Recordings` (`Event3`);
CREATE INDEX `Recordings_Event4` ON `Recordings` (`Event4`);
CREATE INDEX `Recordings_Event5` ON `Recordings` (`Event5`);
CREATE INDEX `Recordings_Event6` ON `Recordings` (`Event6`);
CREATE INDEX `Recordings_Event1` ON `Recordings` (`Event1`);
CREATE INDEX `Recordings_DeleteLockManual` ON `Recordings` (`DeleteLockManual`);
CREATE INDEX `Recordings_EndTime` ON `Recordings` (`EndTime`);
The hardware I'm using is pretty old: a Pentium 4 2.4 GHz, 512 MB RAM, and an old 40 GB hard drive.
The Recordings table contains ~60k rows.
When I'm doing an INSERT on this table, from time to time (about one in 30) the query takes extremely long to finish. Last time it took 23 seconds (sic!) for a single 1-row INSERT. The rest of the time it takes ~120 ms.
I dumped the stack during this operation:
[<e0a245a8>] jbd2_log_wait_commit+0x88/0xbc [jbd2]
[<e0a258ce>] jbd2_complete_transaction+0x69/0x6d [jbd2]
[<e0ac1e8e>] ext4_sync_file+0x208/0x279 [ext4]
[<c020acb1>] vfs_fsync_range+0x64/0x76
[<c020acd7>] vfs_fsync+0x14/0x16
[<c020acfb>] do_fsync+0x22/0x3f
[<c020aed5>] SyS_fdatasync+0x10/0x12
[<c0499292>] syscall_after_call+0x0/0x4
[<ffffffff>] 0xffffffff
Or:
[<c02b3560>] submit_bio_wait+0x46/0x51
[<c02bb54b>] blkdev_issue_flush+0x41/0x67
[<e0ac1ea8>] ext4_sync_file+0x222/0x279 [ext4]
[<c020acb1>] vfs_fsync_range+0x64/0x76
[<c020acd7>] vfs_fsync+0x14/0x16
[<c020acfb>] do_fsync+0x22/0x3f
[<c020aed5>] SyS_fdatasync+0x10/0x12
[<c0499292>] syscall_after_call+0x0/0x4
[<ffffffff>] 0xffffffff
The application using this database is single-threaded.
What can cause such behavior?
Will switching to recent hardware (with an SSD) solve this issue?