So I know just enough SQL to be dangerous and not much else. I'm developing an application that needs to display sections of data from a SQLite database, and be able to iterate through some of the data based on certain criteria.
So I have a table that looks like this:
create table packets(
    id INTEGER PRIMARY KEY,
    timestamp varchar(32),
    type varchar(32),
    source varchar(32),
    destination varchar(32),
    channel varchar(16),
    first_sequence integer,
    last_sequence integer,
    missing integer
);
Then I have a view that will display some or all of the data based on certain criteria.
We're talking about a table with millions of rows, so I can't load it all into memory. So I have a view class that asks for each item individually based upon the index in the view.
So for instance, if I am displaying all of the packets that have a channel value of 'C', and the view is ready to draw the first item I would issue this query to the database:
SELECT * FROM packets WHERE channel='C' LIMIT 1 OFFSET 0
When it's ready to draw the second item I do
SELECT * FROM packets WHERE channel='C' LIMIT 1 OFFSET 1
etc...
This works fine. I realize it's probably not the most optimal way to do it, but I'm just trying to learn how to do it the simplest way with SQLite first and then worry about optimizing later.
Now the problem is that I need to be able to find items within this result set based on another search criterion, and get their index within the original result set.
For instance, I need to be able to iterate through the items within the original result set that have a "missing" value of 1, and figure out the view index of each of those items.
If I'm using the entire data set in the table I think I can do it with something similar to this:
SELECT rowid FROM packets WHERE rowid > [currentlySelectedRowID] AND missing=1 LIMIT 1
But when I need an index within a subset of data, the rowid doesn't really help me, and I have no idea how to find the index within the subset without iterating through all of the items individually myself.
Any ideas on how to go about doing this with a SQL query, or pointers to relevant documentation or tutorials would be greatly appreciated.
Thanks!
(note: the question may not make sense, as even in my own mind it's a little convoluted to explain, so if something is not clear I will try to elaborate.)
Edit: actually, I found that doing this in C code is sufficient from a performance perspective: I query the rest of the result set after the initial index with a LIMIT -1 OFFSET [startIndex], and then step through the rows until I find the next missing item. It would still be nice to know if there is a way to do it with just an SQL statement, though, for future reference.
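For reference, the query for that step looks roughly like this (the start index is bound from the C code):
SELECT * FROM packets WHERE channel='C' LIMIT -1 OFFSET [startIndex]
-- then step through the rows until one with missing = 1 comes up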
Note that if a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined.
The order appears unimportant to your initial query, but will become important when you need to create a subset of the rows (since you want to use the same initial set as the domain of the subset).
Use
SELECT rowid,* FROM packets WHERE channel='C' ORDER BY rowid LIMIT N? OFFSET M?
to impose an order on the results. Then you can do
SELECT rowid
FROM (SELECT rowid,* FROM packets WHERE channel='C' ORDER BY rowid LIMIT N? OFFSET M?)
WHERE missing=1 LIMIT 1
to find a subset within those.
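If you also need that row's position within the ordered channel='C' subset (i.e. its view index), one approach, assuming the rowid ordering above, is to count how many 'C' rows precede it:

SELECT COUNT(*) - 1 AS view_index
FROM packets
WHERE channel='C' AND rowid <= ?   -- ? = the rowid found by the query above

Because the view is ordered by rowid, that count lines up with the OFFSET you would use when drawing the item.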
Addendum, re: does the "rowid" returned from the main SELECT statement reflect the rowid in this temporary table?
Yes...
sqlite> create table packets (channel, missing);
sqlite> insert into packets values ('A',0);
sqlite> insert into packets values ('B',0);
sqlite> insert into packets values ('C',0);
sqlite> select * from packets;
A|0
B|0
C|0
sqlite> create temp table tt as SELECT rowid,* FROM packets WHERE channel='C';
sqlite> select rowid,* from tt;
3|3|C|0
sqlite> insert into packets values ('C',1);
sqlite> drop table tt;
sqlite> create temp table tt as SELECT rowid,* FROM packets WHERE channel='C';
sqlite> select rowid,* from tt;
3|3|C|0
4|4|C|1
sqlite>
I have a table in an Oracle database which is called my_table, for example. It is a kind of log table. It has an incremental column named "id" and a "registration_number" column which is unique per registered user. Now I want to get the latest changes for registered users, so I wrote the queries below to accomplish this task:
First version:
SELECT t.*
FROM my_table t
WHERE t.id =
(SELECT MAX(id) FROM my_table t_m WHERE t_m.registration_number = t.registration_number
);
Second version:
SELECT t.*
FROM my_table t
INNER JOIN
( SELECT MAX(id) m_id FROM my_table GROUP BY registration_number
) t_m
ON t.id = t_m.m_id;
My first question is: which of the above queries is recommended, and why? And my second one is: sometimes there are about 70,000 inserts into this table, but mostly the number of inserted rows is between 0 and 2,000; is it reasonable to add an index to this table?
An analytical query might be the fastest way to get the latest change for each registered user:
SELECT registration_number, id
FROM (
    SELECT
        registration_number,
        id,
        ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id DESC) AS IDRankByUser
    FROM my_table
)
WHERE IDRankByUser = 1
As for indexes, I'm assuming you already have an index by registration_number. An additional index on id will help the query, but maybe not by much and maybe not enough to justify the index. I say that because if you're inserting 70K rows at one time the additional index will slow down the INSERT. You'll have to experiment (and check the execution plans) to figure out if the index is worth it.
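If you do experiment, one thing worth testing (my suggestion, not something the query above requires) is a composite index covering both the partitioning and ordering columns; the index name here is made up:

CREATE INDEX my_table_regnum_id_ix ON my_table (registration_number, id);

Since the ranking query only touches registration_number and id, such an index can let Oracle satisfy it without visiting the table rows.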
To check which query is faster, look at the execution plan and cost; that will give you a fair idea. But I agree with Ed Gibbs's solution, as analytics make the query run much faster.
If you feel this table is going to grow very big, then I would suggest partitioning the table and using local indexes. They will definitely help you build faster queries.
In cases where you insert lots of rows, indexes slow down insertion, because each index also has to be updated on every insert (so I would not recommend an index on id). There are two solutions I can think of for this:
You can drop the index before the insertion and then recreate it afterwards (see the sketch after this list).
Use reverse key indexes. Check this link: http://oracletoday.blogspot.in/2006/09/there-is-option-to-create-index.html. A reverse key index can impact your queries a bit, so there is a trade-off.
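A rough sketch of the first option (the index name is made up for illustration):

DROP INDEX my_table_id_ix;
-- ... run the bulk INSERTs here ...
CREATE INDEX my_table_id_ix ON my_table (id);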
If you are looking for a faster solution and there is a real need to maintain a list of the last activity for each user, then the most robust solution is to maintain a separate table with unique registration_number values and the rowid of the last record created in the log table.
E.g. (only for demo, not checked for syntax validity, sequences and triggers omitted):
create table my_log(id number not null, registration_number number, action_id varchar2(100))
/
create table last_user_action(registration_number number not null, last_action rowid)
/
alter table last_user_action
add constraint pk_last_user_action primary key (registration_number) using index
/
create or replace procedure write_log(p_reg_num number, p_action_id varchar2)
is
v_row_id rowid;
begin
insert into my_log(registration_number, action_id)
values(p_reg_num, p_action_id)
returning rowid into v_row_id;
update last_user_action
set last_action = v_row_id
where registration_number = p_reg_num;
end;
/
With such a schema you can simply query the last action for every user with good performance:
select
  lua.registration_number,
  l.*
from
  last_user_action lua,
  my_log l
where
  l.rowid (+) = lua.last_action
A rowid is a physical storage identifier that directly addresses a storage block, so you can't rely on it after moving the data to another server, restoring from backups, etc. But if you need that kind of portability, it's simple to add the id column from the my_log table to last_user_action too, and use one or the other depending on your requirements.
I need to periodically update a local cache with new additions to some DB table. The table rows contain an auto-increment sequential number (SN) field. The cache keeps this number too, so basically I just need to fetch all rows with SN larger than the highest I already have.
SELECT * FROM table where SN > <max_cached_SN>
However, the majority of the attempts will bring no data (I just need to make sure that I have an absolutely up-to-date local copy). So I wonder if this will be more efficient:
count = SELECT count(*) from table;
if (count > <cache_size>)
// fetch new rows as above
I suppose that selecting by an indexed numeric field is quite efficient, so I wonder whether using the count has any benefit. On the other hand, this test/update will be done quite frequently and by many clients, so there is a motivation to optimize it.
this test/update will be done quite frequently and by many clients
This could lead to unexpected races over the cache generation.
I would suggest:
upon each new addition to your table, add the newest id to a queue table
use something like crontab to trigger the cache generation by checking the queue table
once the new cache is generated, delete the id from the queue table
Since you stress that the majority of the attempts will bring no data, the above only triggers the work when there actually is a new addition.
The queue-table concept can even be extended to cover updates and deletes.
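For illustration, the queue-table idea might look like this; the question doesn't name the DBMS or the table, so MySQL syntax and the names my_table, sn and cache_queue are assumptions:

CREATE TABLE cache_queue (sn BIGINT PRIMARY KEY);

-- every insert into the main table leaves a marker in the queue
CREATE TRIGGER queue_new_rows AFTER INSERT ON my_table
FOR EACH ROW INSERT INTO cache_queue (sn) VALUES (NEW.sn);

The cron job then just checks whether cache_queue has any rows, regenerates the cache if so, and deletes the rows it has processed.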
I believe that
SELECT * FROM table where SN > <max_cached_SN>
will be faster, because SELECT COUNT(*) may require a table scan. Just for clarification: do you ever delete rows from this table?
SELECT COUNT(*) may involve a scan (even a full scan), while SELECT ... WHERE SN > constant can effectively use an index on SN, and looking at very few index nodes may suffice. Don't count items if you don't need the exact total; it's expensive.
You don't need to use SELECT COUNT(*).
There are two solutions.
You can use a temp table that has one field containing the last count of your table, and create a new AFTER INSERT trigger on your table that increments that field.
Or you can use a temp table that has one field containing the last SN of your table that has been cached, and create a new AFTER INSERT trigger on your table that updates that field.
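A sketch of the second option, again assuming MySQL syntax and made-up names (the "temp table" is modelled here as an ordinary one-row helper table, since the trigger has to see it):

CREATE TABLE last_sn_cache (last_sn BIGINT NOT NULL);
INSERT INTO last_sn_cache VALUES (0);

-- keep the helper row pointed at the newest SN
CREATE TRIGGER track_last_sn AFTER INSERT ON my_table
FOR EACH ROW UPDATE last_sn_cache SET last_sn = NEW.sn;

Clients can then compare SELECT last_sn FROM last_sn_cache against their cached SN instead of counting rows.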
not much to this really
drop table if exists foo;
create table foo
(
foo_id int unsigned not null auto_increment primary key
)
engine=innodb;
insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);
select * from foo order by foo_id desc limit 10;
insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);
select * from foo order by foo_id desc limit 10;
I need to do a process on all the records in a table. The table could be very big, so I'd rather process the records page by page. I need to remember which records have already been processed so they are not included in my second SELECT result.
Like this:
For first run,
[SELECT 100 records FROM MyTable]
For second run,
[SELECT another 100 records FROM MyTable]
and so on..
I hope you get the picture. My question is: how do I write such a select statement?
I'm using Oracle, by the way, but it would be nice if this could run on any other DB too.
I also don't want to use stored procedures.
Thank you very much!
Any solution you come up with to break the table into smaller chunks will end up taking more time than just processing everything in one go. Unless the table is partitioned and you can process exactly one partition at a time.
If a full table scan takes 1 minute, it will take you 10 minutes to break up the table into 10 pieces. If the table rows are physically ordered by the values of an indexed column that you can use, this changes a bit because of the clustering factor. But it will still take longer than just processing it in one go.
This all depends on how long it takes to process one row from the table, of course. You could choose to reduce the load on the server by processing chunks of data, but from a performance perspective, you cannot beat a full table scan.
You are most likely going to want to take advantage of Oracle's stopkey optimization, so you don't end up with a full table scan when you don't want one. There are a couple of ways to do this. The first way is a little longer to write, but lets Oracle automatically figure out the number of rows involved:
select *
from
(
    select rownum rn, v1.*
    from (
        select *
        from table t
        where filter_columns = 'where clause'
        order by columns_to_order_by
    ) v1
    where rownum <= 200
)
where rn >= 101;
You could also achieve the same thing with the FIRST_ROWS hint:
select /*+ FIRST_ROWS(200) */ *
from (
    select rownum rn, v1.*
    from (
        -- the ordering has to happen in an inner view, because rownum is
        -- assigned before ORDER BY within a single query block
        select *
        from table t
        where filter_columns = 'where clause'
        order by columns_to_order_by
    ) v1
)
where rn between 101 and 200;
I much prefer the rownum method, so you don't have to keep changing the value in the hint (which would need to represent the end value and not the number of rows actually returned to the page to be accurate). You can set up the start and end values as bind variables that way, so you avoid hard parsing.
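For illustration, the rownum version with the page boundaries as bind variables would look something like this (:first_row and :last_row are placeholder bind names):

select *
from
(
    select rownum rn, v1.*
    from (
        select *
        from table t
        where filter_columns = 'where clause'
        order by columns_to_order_by
    ) v1
    where rownum <= :last_row
)
where rn >= :first_row;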
For more details, you can check out this post
I've got 10 tables that I'm joining together to create a view. I'm only selecting the ID from each table, but in the view each ID can show up more than once; however, the combination of all the IDs will always be unique. Is there a way to create another column in this view that will be a unique ID?
I'd like to be able to store the unique ID and use it to query against the view in order to get all the other ID's.
I had a similar issue where I needed to establish a hierarchy across multiple tables. If you are using an integer as the id in each of the tables, you could simply convert the ids of each table to a varchar and prefix them with a different letter for each table. For instance
CREATE VIEW LocationHierarchy as
SELECT 'C' + CONVERT(VARCHAR,[Id]) as Id
,[Name]
,'S' + CONVERT(VARCHAR,[State]) as parent
FROM [City]
UNION
SELECT 'S' + CONVERT(VARCHAR,[Id]) as Id
,[Name]
,'C' + CONVERT(VARCHAR,[Suburb]) as parent
FROM [Suburb]
etc
The effectiveness of this solution will depend on how large your dataset is.
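For what it's worth, looking a row back up by its composite id is then just a string comparison (the value here is made up):

SELECT * FROM LocationHierarchy WHERE Id = 'C42'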
I think you can do that using ROW_NUMBER(), at least if you can guarantee an ordering of the view. For example:
SELECT
ROW_NUMBER() OVER (ORDER BY col1, col2, col3) as UniqueId
FROM <lotsa joins>
As long as the order stays the same, and new fields are only added at the end, the id will be unique.
Yes, I recently had the same requirement.
When creating the view, keep the select statement as a derived table (TEMP_TABLE below) and then use the ROW_NUMBER() function over it.
Here's an example:
CREATE VIEW VIEW_NM
AS
SELECT ROW_NUMBER() OVER (ORDER BY COL1 DESC) AS RowNumber,  -- order by whichever column should drive the numbering
       COL1,
       COL2
FROM
    (SELECT COL1, COL2 FROM TABLE1
     UNION
     SELECT COL1, COL2 FROM TABLE2
    ) AS TEMP_TABLE;
No, you cannot do that in a view; what you can do is create a temporary table to hold all this information and create a key or unique id there for each row.
The fact that you're attempting to do this points to much bigger problems in the database design and/or application architecture.
Since you have 10 tables, and I'm going to guess that the DB designer(s) just slapped ID INT IDENTITY onto all of the tables, you will end up with about (2^31)^10 possible ID combinations.
The only data type I can think of that might cover that range would be a big CHAR built by translating all of the integers into zero-padded strings and putting them together.
My guess though is that your real problem isn't getting this ID for a view but some other thing that you're attempting to do, which is the question that you should be asking. Just a hunch though.
One possibility would be to call a function from your view that calculates a hash code on all of the ID columns. If you use a decent cryptographic hash algorithm, the odds of a collision are minuscule (probably more likely that the disk would deliver bad data). Even easier, you could of course just concatenate the different IDs into a single, much larger ID, perhaps as a binary or varbinary column.
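A minimal sketch of both ideas, assuming SQL Server syntax (the table and column names are placeholders, not the actual tables from the question):

SELECT
    HASHBYTES('SHA2_256', CONCAT(a.Id, '|', b.Id)) AS HashedId,        -- hash of all the IDs
    CONCAT(a.Id, '-', b.Id)                        AS ConcatenatedId,  -- or just concatenate them
    a.Id AS AId,
    b.Id AS BId
FROM TableA a
JOIN TableB b ON b.AId = a.Id;

With ten tables the concatenated form gets long, but unlike the hash it keeps the original IDs recoverable.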
Storing that ID and being able to query against it would be a bit more work. I can't think of a way to both compute it and store it from a view. You would probably need a stored procedure to create it first; the details depend heavily on the specifics of your app.
I'm using an Informix (version 7.32) DB. In one operation I create a temp table with the IDs from a regular table and a serial column (so I would have all the IDs from the regular table numbered consecutively). But I want to insert the info from the regular table ordered by ID, something like:
CREATE TEMP TABLE tempTable (id serial, folio int );
INSERT INTO tempTable(id,folio)
SELECT 0,folio FROM regularTable ORDER BY folio;
But this creates a syntax error (because of the ORDER BY)
Is there any way I can order the info then insert it to the tempTable?
UPDATE: The reason I want to do this is that the regular table has about 10,000 items and a JSP page has to show every record, which takes too long; so the real reason is to paginate the output. This version of Informix has neither LIMIT nor SKIP. I can't renumber the serial because it's used in a relationship, and this was the only way we could come up with to get a fixed number of results per page (for example 500 results per page). The regular table has gaps in its ids (called folio) because rows have been deleted. If I were to run
SELECT * FROM regularTable WHERE folio BETWEEN X AND Y
I might get 300 rows on one page and 500 on the next.
You can do this by breaking up the SQL into two temp tables:
CREATE TEMP TABLE tempTable1 (
id serial,
folio int);
SELECT folio FROM regularTable ORDER BY folio
INTO TEMP tempTable2;
INSERT INTO tempTable1(id,folio) SELECT 0,folio FROM tempTable2;
In Informix, when using a SELECT as a sub-clause in an INSERT statement, you are limited to a subset of the SELECT syntax.
The following SELECT clauses are not supported in this case:
INTO TEMP
ORDER BY
UNION.
Additionally, the FROM clause of the SELECT can not reference the same table as referenced by the INSERT (not that this matters in your case).
It's been years since I worked on Informix, but perhaps something like this will work:
INSERT INTO tempTable(id,folio)
SELECT 0, folio
FROM (
SELECT folio FROM regularTable ORDER BY folio
);
You might try iterating a cursor over the SELECT ... ORDER BY and doing the INSERTs within the loop.
It makes no sense to order the rows as you insert into a table. Relational databases do not allow you to specify the order of rows in a table.
Even if you could, SQL does not guarantee a query will return rows in any order, such as the order you inserted them. You must specify an ORDER BY clause to guarantee an order for a query result.
So it would do you no good to change the order in which you insert the rows.
As stated by Bill, there's not a lot of point ordering the input; you really need to order the output. In the simplistic example you've provided it just makes no sense, so I can only assume that the real problem you're trying to solve is more complex - deduplication perhaps?
The functionality you're after is CREATE SEQUENCE, but I'm pretty sure it's not available in such an old version of Informix.
If you really need to do what you're asking, you could look into UNLOADing the data in the required order, and then LOADing it again. That would ensure the SERIAL values get allocated sequentially.
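A rough sketch of that approach from dbaccess (the unload file name is made up; tempTable is the temp table from the question):

UNLOAD TO 'folios.unl'
    SELECT folio FROM regularTable ORDER BY folio;

LOAD FROM 'folios.unl'
    INSERT INTO tempTable (folio);
-- the serial id column is left out of the column list, so it gets assigned in insertion order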
Would something like this work?
SELECT folio
FROM (
    SELECT ROWNUM n, folio
    FROM (
        -- the ordering has to be done before ROWNUM is assigned
        SELECT folio
        FROM regularTable
        ORDER BY folio
    )
)
WHERE n BETWEEN 501 AND 1000
It may not be terribly efficient if the table grows larger or you're fetching later "pages", but 10K rows is pretty small.
I don't recall if Informix has a ROWNUM concept, I use Oracle.