I have two databases with the same structure but different data.
In both databases, all rows have auto-increment IDs in the tables 'Tiles' and 'TilesData', and these IDs are related keys. I have to move rows from the first database into the second one, but the IDs are the catch.
Is my description of the problem clear?
I tried this (but I'm a little worried about the reliability of this solution; there will be a few million rows):
INSERT INTO DataBase_2.Tiles (X,Y,Zoom,Type,CacheTime)
SELECT X, Y, Zoom, Type, CacheTime FROM DataBase_1.Tiles;
INSERT INTO DataBase_2.TilesData(Tile)
SELECT Tile FROM DataBase_1.TilesData;
Could you help me or give me some tips? Is plain SQL enough?
When the autoincrementing columns are declared as INTEGER PRIMARY KEY (without AUTOINCREMENT), then new IDs will get the next value after the largest value already in the table.
Check if both tables have the same maximum ID value:
SELECT MAX(id) FROM DataBase_2.Tiles;
SELECT MAX(id) FROM DataBase_2.TilesData;
If they are equal, then corresponding rows will indeed get the same new ID. However, you should use ORDER BY to ensure that the rows are read/inserted in the same order:
INSERT INTO DataBase_2.Tiles (...)
SELECT ... FROM DataBase_1.Tiles ORDER BY id;
INSERT INTO DataBase_2.TilesData(Tile)
SELECT Tile FROM DataBase_1.TilesData ORDER BY id;
If the id columns are declared with AUTOINCREMENT, then you have to check (and if needed, adjust) the next ID values in the sqlite_sequence table.
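A minimal sketch of that check (assuming both databases are attached under the names used in the question; the seq value below is purely illustrative):
SELECT name, seq FROM DataBase_2.sqlite_sequence WHERE name IN ('Tiles', 'TilesData');
-- If the two counters differ, align them so corresponding rows get the same new ID:
UPDATE DataBase_2.sqlite_sequence SET seq = 12345 WHERE name = 'Tiles';
UPDATE DataBase_2.sqlite_sequence SET seq = 12345 WHERE name = 'TilesData';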
I'm having an issue with sequences when inserting data into a Postgres table through SQLAlchemy.
All of the data is inserted fine, and the id BIGSERIAL PRIMARY KEY column has all unique values, which is great.
However, when I query the first 10/20 rows of the table, the id values are not in ascending numeric order. Gaps in the sequence are fine, that's to be expected, but the ids come back in what looks like a random, non-ascending order, like:
id
15
22
16
833
30
etc...
I've gone through plenty of SO and Postgres forum posts around this and have only found people talking about huge serial gaps in their sequences, not about the values not being assigned in ascending order.
The table itself was created through a standard DDL statement like so:
CREATE TABLE IF NOT EXISTS schema.table_name (
    id BIGSERIAL NOT NULL,
    col1 text NOT NULL,
    col2 JSONB[] NOT NULL,
    etc....
    PRIMARY KEY (id)
);
However when I query the first 10/20 rows etc. of the table
Your query has no order by clause, so you are not selecting the first rows of the table, just an undefined set of rows.
Use ORDER BY and you will find out that sequence numbers are indeed assigned in ascending order (potentially with gaps):
select id from ht_data order by id limit 30
To actually check the order in which rows were created, you would need another column that stores the timestamp when each row was inserted. You could then do:
select id from ht_data order by ts limit 30
In general, there is no defined "order" within a SQL table. If you want to view your data in a certain order, you need an ORDER BY clause:
SELECT *
FROM table_name
ORDER BY id;
As for gaps in the sequence, the contract of an auto-increment column generally only guarantees that each newly generated id value will be unique and, most of the time (but not necessarily always), increasing.
How could you possibly know if the values are "out of order"? SQL tables represent unordered sets. The only indication of ordering in your table is the serial value.
The query that you are running has no ORDER BY. The results are not guaranteed to be in any particular ordering. Period. That is a very simple fact about SQL. That you want the results of a SELECT to be ordered by the primary key or by insertion order is nice, but that is not how databases work.
The only way you could determine whether something were out of order would be if you had a column that separately specified the insert order -- you could have a creation timestamp, for instance.
All you have discovered is that SQL lives up to its promise of not guaranteeing ordering unless the query specifically asks for it.
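As a sketch of the creation-timestamp suggestion above (assuming Postgres and the ht_data table name from the earlier answer; the created_at column is an addition you would have to make, not something already in your schema):
ALTER TABLE ht_data ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();
-- Existing rows all receive the same timestamp; only rows inserted afterwards
-- record their real creation time.
SELECT id, created_at
FROM ht_data
ORDER BY created_at, id
LIMIT 30;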
I want to add a new row for each existing row. The new row will contain a new value for one of the columns and the same values for the remaining columns.
From the table below, I want to add a new 'ADUser' -> 'manju#apac.corpdir.net' for each 'Tag' value. Let's say the current number of rows is 4; after the operation, the final number of rows should be 8.
I have been experimenting with some INSERT queries, but I have been unsuccessful. Any help is highly appreciated.
Database: Azure SQL Database
Current Table Rows:
Expected Table Rows:
You can use insert:
insert into t (id, tag, aduser)
select id, t.tag, 'manju#apac.corpdir.net'
from t;
I am guessing that created is assigned automatically. If not, you could use:
insert into t (id, tag, aduser, created)
select t.id, t.tag, 'manju#apac.corpdir.net', current_timestamp
from t;
It seems odd that you want duplicate values for id, but that is what the question is asking for. This column should really identify each row uniquely and be assigned automatically.
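If id were declared as an identity column and created had a default (both of those are assumptions about your schema, not something stated in the question), you could leave them out of the insert entirely and let the database assign them:
insert into t (tag, aduser)
select t.tag, 'manju#apac.corpdir.net'
from t;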
I have this fairly large DB. It contains lots of columns. One of them will have a value that I need to select, but the DB has several rows with that value. How can I insert into a column in the row that's the newest in the DB, with a matching column?
Without knowing the ins and outs of your database, I think you would likely want to select the largest id you have in the auto-incrementing column. For instance:
SELECT MAX(UNIQUE_ID) FROM TABLE WHERE MATCHING_COLUMN = MATCHING_VALUE
From there you can take your unique ID and insert into that row.
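A minimal sketch of that two-step idea collapsed into one statement (every name here is a placeholder, in the same style as the query above):
UPDATE TABLE_NAME
SET TARGET_COLUMN = 'new value'
WHERE UNIQUE_ID = (
    SELECT MAX(UNIQUE_ID)
    FROM TABLE_NAME
    WHERE MATCHING_COLUMN = MATCHING_VALUE
);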
I have a table of millions of rows that is constantly changing (new rows are inserted, some are updated and some are deleted). I'd like to query 100 new rows (ones I haven't queried before) every minute. The table has about 2 dozen columns and a primary key.
Happy to answer any questions or provide clarification.
A simple solution is to have a separate table with just one row to store the last ID you fetched.
Let's say that's your "table of millions of rows":
-- That's your table with millions of rows
CREATE TABLE test_table (
    id serial unique,
    col1 text,
    col2 timestamp
);
-- Data sample
INSERT INTO test_table (col1, col2)
SELECT 'test', generate_series
FROM generate_series(now() - interval '1 year', now(), '1 day');
You can create the following table to store an ID:
-- Table to keep the last fetched id
CREATE TABLE last_query (
    last_query_id int references test_table (id)
);
-- Initial row
INSERT INTO last_query (last_query_id) VALUES (1);
Then with the following query, you will always fetch up to 100 rows that have never been fetched before from the original table and keep the pointer in last_query up to date:
WITH last_id as (
    SELECT last_query_id FROM last_query
), new_rows as (
    SELECT *
    FROM test_table
    WHERE id > (SELECT last_query_id FROM last_id)
    ORDER BY id
    LIMIT 100
), update_last_id as (
    -- COALESCE keeps the old pointer when no new rows were found
    UPDATE last_query SET last_query_id = COALESCE((SELECT MAX(id) FROM new_rows), last_query_id)
)
SELECT * FROM new_rows;
Rows will be fetched in ascending id order (oldest rows first).
You basically need a unique, sequential value that is assigned to each record in this table. That allows you to search for the next X records where the value of this field is greater than the last one you got from the previous page.
Easiest way would be to have an identity column as your PK, and simply start from the beginning and include a "where id > #last_id" filter on your query. This is a fairly straightforward way to page through data, regardless of underlying updates. However, if you already have millions of rows and you are constantly creating and updating, an ordinary integer identity is eventually going to run out of numbers (a bigint identity column is unlikely to run out of numbers in your great-grandchildren's lifetimes, but not all DBs support anything but a 32-bit identity).
You can do the same thing with a "CreatedDate" datetime column, but as these dates aren't 100% guaranteed to be unique, depending on how this date is set you might have more than one row with the same creation timestamp, and if those records cross a "page boundary", you'll miss any occurring beyond the end of your current page.
Some SQL systems' GUID generators are guaranteed to be not only unique but sequential. You'll have to look into whether PostgreSQL's GUIDs work this way; if they're true V4 GUIDs, they'll be totally random except for the version identifier and you're SOL. If you do have access to sequential GUIDs, you can filter just like with an integer identity column, only with many more possible key values.
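A minimal sketch of the "where id > last_id" approach described above (assuming Postgres to match the accepted answer; :last_id stands for whatever value your application remembered from the previous batch):
SELECT *
FROM test_table
WHERE id > :last_id   -- largest id returned by the previous batch
ORDER BY id
LIMIT 100;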
I have a SQL query where I am going to be transferring a fair amount of response data down the wire, but I want to get the total rowcount as quickly as possible to facilitate binding in the UI. Basically I need to get a snapshot of all of the rows that meet a certain criteria, and then be able to page through all of the resulting rows.
Here's what I currently have:
SELECT --primary key column
INTO #tempTable
FROM --some table
--some filter clause
ORDER BY --primary key column
SELECT @@ROWCOUNT
SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
DROP TABLE #tempTable
Every once in a while, the query results end up out of order (presumably because I am doing an unordered select from the temp table).
As I see it, I have a couple of options:
1. Add a second ORDER BY clause to the select from the temp table.
2. Move the ORDER BY clause to the second select and let the first select be unordered.
3. Create the temporary table with a primary key column to force the ordering of the temp table.
What is the best way to do this?
Use number 2. Just because you have a primary key on the table does not mean that the result set from a select statement will be ordered (even if the results you see happen to be).
There's no need to order the data when putting it in the temp table, so take that one out. You'll get the same @@ROWCOUNT value either way.
So do this:
SELECT --primary key column
INTO #tempTable
FROM --some table
--some filter clause
SELECT @@ROWCOUNT
SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
ORDER BY --primary key column
DROP TABLE #tempTable
Move the order by from the first select to the second select.
A database isn't a spreadsheet. You don't put the data into a table in a particular order.
Just make sure you order it properly when you get it back out.
Personally I would select out the data in the order you want to eventually have it. So in your first select, have your order by. That way it can take advantage of any existing indexes and access paths.
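For completeness, a minimal sketch of option 3 from the question, combined with ordering the first select as suggested here (this assumes SQL Server, matching the @@ROWCOUNT usage above; all table and column names are placeholders):
CREATE TABLE #tempTable (
    rn     int IDENTITY(1,1) PRIMARY KEY,  -- preserves the insert order
    pk_col int NOT NULL
);

INSERT INTO #tempTable (pk_col)
SELECT pk_col
FROM some_table
WHERE some_filter = 1     -- some filter clause
ORDER BY pk_col;          -- identity values are assigned in this order

SELECT @@ROWCOUNT;        -- total row count for the UI

SELECT t.pk_col, s.other_col
FROM #tempTable t
JOIN some_table s ON s.pk_col = t.pk_col
ORDER BY t.rn;

DROP TABLE #tempTable;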