SQLite Autoincrement Based on Another Column's Value - sql

I basically want to create a table like this
col1 | col2
-----+-----
   1 |    1
   1 |    2
   1 |    3
   2 |    1
   3 |    1
   2 |    2
   1 |    4
where col2 auto-increments, but its counter is scoped to col1's value rather than to the table as a whole. Is this possible?

I thought I found a duplicate question, but it was for PostgreSQL. Apologies for temporarily marking your question as a duplicate. I've reversed that.
I don't know for certain whether this is possible in SQLite in an automated way, but one solution would be to do it in steps:
1. BEGIN a transaction and INSERT one row into the table with NULL for col2. This should acquire a RESERVED lock and prevent other concurrent processes from doing the same thing and causing a race condition.
2. SELECT MAX(col2) FROM mytable WHERE col1 = ? to get the greatest value inserted for the given group so far.
3. UPDATE mytable SET col2 = ?+1 WHERE col1 = ? AND col2 IS NULL, using the MAX discovered in step 2.
4. COMMIT to write the changes to the file.
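The four steps above can be sketched with Python's sqlite3 module (a minimal sketch: BEGIN IMMEDIATE takes the reserved lock up front, and the table/column names follow the question):

```python
import sqlite3

# isolation_level=None disables implicit transactions so we control BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")

def insert_with_group_counter(conn, group):
    """Insert a row whose col2 auto-increments within its col1 group."""
    conn.execute("BEGIN IMMEDIATE")  # step 1: acquire a RESERVED lock up front
    try:
        conn.execute("INSERT INTO mytable (col1, col2) VALUES (?, NULL)", (group,))
        # step 2: greatest value inserted for this group so far
        (current_max,) = conn.execute(
            "SELECT MAX(col2) FROM mytable WHERE col1 = ?", (group,)
        ).fetchone()
        next_val = (current_max or 0) + 1  # MAX is NULL for the group's first row
        # step 3: fill in the placeholder row
        conn.execute(
            "UPDATE mytable SET col2 = ? WHERE col1 = ? AND col2 IS NULL",
            (next_val, group),
        )
        conn.execute("COMMIT")  # step 4
    except Exception:
        conn.execute("ROLLBACK")
        raise

for g in (1, 1, 1, 2, 3, 2, 1):
    insert_with_group_counter(conn, g)

print(conn.execute("SELECT col1, col2 FROM mytable").fetchall())
```

Each group keeps its own counter: after these inserts, col1 = 1 holds col2 values 1 through 4, and col1 = 2 holds 1 and 2.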

Related

Oracle Enforce Uniqueness

I need to enforce uniqueness on specific data in a table (~10 million rows). The example data below illustrates the rule:
For code=X the part# cannot be duplicated; for any other code, duplicate part#s are allowed. E.g. the ID 8 row can't be there, but the ID 6 row is fine. There are several different codes and part#s in the table, but uniqueness is required only for code=X.
ID  CODE  PART#
1   A     R0P98
2   X     R9P01
3   A     R0P98
4   A     R0P44
5   X     R0P44
6   A     R0P98
7   X     T0P66
8   X     T0P66
The only way I see is to create a trigger on the table and check for PART# for code=X before insert or update. However, I fear this solution may slow down inserts and updates on this table.
Appreciate your help!
In Oracle, you can create a unique index on an expression for this:
create unique index myidx
on mytable (case when code = 'X' then part# end);
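The same rule can also be expressed in SQLite (3.9+), which likewise supports unique indexes on expressions. A sketch via Python's sqlite3; note that part# is renamed to part, since # is not a valid identifier character there:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, code TEXT, part TEXT)")
# Unique only when code = 'X'; for other codes the expression is NULL,
# and NULLs never collide in a unique index.
conn.execute("CREATE UNIQUE INDEX myidx ON parts (CASE WHEN code = 'X' THEN part END)")

conn.execute("INSERT INTO parts (code, part) VALUES ('A', 'R0P98')")
conn.execute("INSERT INTO parts (code, part) VALUES ('A', 'R0P98')")  # duplicate, but code != 'X': allowed
conn.execute("INSERT INTO parts (code, part) VALUES ('X', 'T0P66')")
rejected = False
try:
    conn.execute("INSERT INTO parts (code, part) VALUES ('X', 'T0P66')")  # duplicate for code 'X': blocked
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Unlike a trigger, the index check is enforced by the engine itself and also speeds up lookups on the indexed expression.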

Selecting records from PostgreSQL after the primary key has cycled?

I have a PostgreSQL 9.5 table that is set to cycle when the primary key ID hits the maximum value. For argument's sake, let's say the maximum ID value is 999,999. (I'll add commas to make the numbers easier to read.)
We run a job that deletes data from the table that is older than 45 days. Let's assume that the table now only contains records with IDs of 999,998 and 999,999.
The primary key ID cycles back to 1 and 20 more records have been written. I need to keep it generic so I won't make any assumptions about how many were written. In my real world needs, I don't care how many were written.
How can I select the records without getting duplicates with an ID of 999,998 and 999,999?
For example:
SELECT * FROM my_table WHERE ID >0;
Would return (in no particular order):
999,998
999,999
1
2
...
20
My real world case is that I need to publish every record that was written to the table to a message broker. I maintain a separate table that tracks the row ID and timestamp of the last record that was published. The pseudo-query/pseudo-algorithm to determine what new records to write is something like this. The IF statement handles when the primary key ID cycles back to 1 as I need to read the new record written after the ID cycled:
SELECT * from my_table WHERE id > last_written_id
PUBLISH each record
if ID of last record published == MAX_TABLE_ID (e.g 999,999):
??? What to do here? I need to get the newest records where ID >= 1 but less than the oldest record I have
I realise that the "code" is rough, but it's really just an idea at the moment so there's no code.
Thanks
Hmm, you can use the current value of the sequence to do what you want:
select t.*
from my_table t
where t.id > #last_written_id or
(currval(pg_get_serial_sequence('my_table', 'id')) < #last_written_id and
t.id <= currval(pg_get_serial_sequence('my_table', 'id'))
);
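The predicate in that query can be mirrored in plain code to make the two cases visible (the function name rows_since and its parameters are illustrative, not part of any API):

```python
def rows_since(ids, last_written_id, current_seq_value):
    """Return the ids newer than last_written_id; when the sequence has
    wrapped (its current value is below last_written_id), also include
    ids from the start of the cycle up to the current sequence value."""
    wrapped = current_seq_value < last_written_id
    return [
        i for i in ids
        if i > last_written_id or (wrapped and i <= current_seq_value)
    ]

# Sequence wrapped: 999,998 and 999,999 are newer than the last publish,
# and ids 1..20 were written after the wrap; 500 is an old, already-seen id.
print(rows_since([999998, 999999, 1, 2, 20, 500], 999997, 20))
# -> [999998, 999999, 1, 2, 20]
```

The same caveats apply as for the SQL version: if more than a full cycle of inserts happens between reads, old and new ids become indistinguishable.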
This is not a 100% solution. After all, 2,000,000 records could have been added, so the numbers would all be repeated, or the records deleted. Problems can also arise if inserts happen while the query is running, particularly in a multithreaded environment.
Here is a completely different approach: You could completely fill the table, giving it a column for deletion time. So instead of deleting rows, you merely set this datetime. And instead of inserting a row you merely update the one that was deleted the longest time ago:
update my_table
set col1 = 123, col2 = 456, col3 = 'abc', deletion_datetime = null
where deletion_datetime =
(
select deletion_datetime
from my_table
where deletion_datetime is not null
order by deletion_datetime
limit 1
);
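Here is a runnable sketch of that reuse-instead-of-insert idea, using Python's sqlite3 (the column names follow the answer; the prefill rows and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        col1 INTEGER, col2 INTEGER, col3 TEXT,
        deletion_datetime TEXT       -- NULL means the row is "live"
    )
""")
# Pre-fill the table; rows 1 and 2 start out "deleted" at different times.
conn.executemany(
    "INSERT INTO my_table (id, col1, deletion_datetime) VALUES (?, ?, ?)",
    [(1, 0, "2020-01-01 00:00:00"), (2, 0, "2020-06-01 00:00:00"), (3, 999, None)],
)

# "Insert" by recycling the row that was deleted the longest time ago (id 1).
conn.execute("""
    UPDATE my_table
    SET col1 = 123, col2 = 456, col3 = 'abc', deletion_datetime = NULL
    WHERE deletion_datetime =
    (
        SELECT deletion_datetime
        FROM my_table
        WHERE deletion_datetime IS NOT NULL
        ORDER BY deletion_datetime
        LIMIT 1
    )
""")
row = conn.execute("SELECT id, col1, col2, col3 FROM my_table WHERE col2 = 456").fetchone()
print(row)  # the oldest-deleted row (id 1) was recycled; the table never grows
```

The row count stays constant, so ids never cycle; the trade-off is that the table must be sized for the peak number of live rows up front.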

How do I rearrange a non continuous seq number in an Oracle table

I have a table with some values and a seq column which orders the values; this gets displayed in the front end. Whenever I delete a record using a function, the seq column is no longer arranged properly.
i.e.
col1:
1
2
3
4
5
After deleting the 2nd and 4th records, it comes out as:
Col1:
1
3
5
I tried possible ways to re-order it as:
Col1:
1
2
3 ...
but it's not working. Can someone please help with this?
I hope you are not talking about updating a primary key, which would be a very bad idea.
Otherwise, you can do this (mytable stands in for your table name; TABLE itself is a reserved word):
update mytable set col1 = rownum;
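rownum is Oracle-specific; as a sketch of the same renumbering in another engine, here is the equivalent in SQLite (via Python's sqlite3), where a correlated COUNT over rowid stands in for rownum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq_demo (col1 INTEGER)")
conn.executemany("INSERT INTO seq_demo (col1) VALUES (?)", [(i,) for i in (1, 2, 3, 4, 5)])
conn.execute("DELETE FROM seq_demo WHERE col1 IN (2, 4)")   # leaves 1, 3, 5

# Renumber: each row gets its 1-based position in rowid order,
# mimicking Oracle's "update mytable set col1 = rownum".
conn.execute("""
    UPDATE seq_demo
    SET col1 = (SELECT COUNT(*) FROM seq_demo AS t2
                WHERE t2.rowid <= seq_demo.rowid)
""")
print([r[0] for r in conn.execute("SELECT col1 FROM seq_demo ORDER BY rowid")])
# -> [1, 2, 3]
```

The subquery counts rowids, which the UPDATE never changes, so the renumbering is stable even as col1 values are rewritten mid-statement.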

Return rows in the exact order they were inserted

I have a simple join table with two id columns in SQL Server.
Is there any way to select all rows in the exact order they were inserted?
If I try to make a SELECT *, even if I don't specify an ORDER BY clause, the rows are not being returned in the order they were inserted, but ordered by the first key column.
I know it's a weird question, but this table is very big and I need to check exactly when a strange behavior began, and unfortunately I don't have a timestamp column in my table.
UPDATE #1
I'll try to explain why I'm saying that the rows are not returned in 'natural' order when I SELECT * FROM table without an ORDER BY clause.
My table was something like this:
id1 id2
---------------
1 1
2 2
3 3
4 4
5 5
5 6
... and so on, with about 90,000+ rows
Now, I don't know why (probably a software bug inserted these rows), but my table now has 4.5 million rows and looks like this:
id1 id2
---------------
1 1
1 35986
1 44775
1 60816
1 62998
1 67514
1 67517
1 67701
1 67837
...
1 75657 (100+ "strange" rows)
2 2
2 35986
2 44775
2 60816
2 62998
2 67514
2 67517
2 67701
2 67837
...
2 75657 (100+ "strange" rows)
Crazy; my table now has millions of rows. I have to find out when this happened (when the rows were inserted) because I have to delete them, but I can't just delete using WHERE id2 IN (strange_ids), because there are "right" id1 values that belong to these id2 values and that I can't delete, so I'm trying to see exactly when these rows were inserted.
When I SELECT * FROM table, the rows come back ordered by id1, like the table above, and that is not the order they were inserted in. I don't think my table is corrupted, because this is the second time this strange behavior has happened in exactly the same way, but now I have too many rows to delete them manually like I did the first time. Why are the rows not being returned in the order they were inserted? These "strange" rows were definitely inserted yesterday, so shouldn't they be returned near the end of my table if I do a SELECT * without an ORDER BY?
A select query with no order by does not retrieve the rows in any particular order. You have to have an order by to get an order.
SQL Server does not have any default method for retrieving by insert order. You can do it, if you have the information in the row. The best way is a primary key identity column:
TableId int identity(1, 1) not null primary key
Such a column is incremented as each row is inserted.
You can also have a CreatedAt column:
CreatedAt datetime default getdate()
However, this could have duplicates for simultaneous inserts.
The key point, though, is that a select with no order by clause returns an unordered set of rows.
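The identity-column idea can be seen end to end in a small sketch. This uses SQLite via Python's sqlite3 rather than SQL Server, with AUTOINCREMENT standing in for identity(1, 1); the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link (
        TableId INTEGER PRIMARY KEY AUTOINCREMENT,  -- insert-order surrogate key
        id1 INTEGER NOT NULL,
        id2 INTEGER NOT NULL
    )
""")
# Insert in an order that differs from (id1, id2) order.
inserted = [(2, 2), (1, 1), (1, 35986), (2, 35986)]
conn.executemany("INSERT INTO link (id1, id2) VALUES (?, ?)", inserted)

# Ordering by the identity column recovers insertion order,
# regardless of how the engine chooses to store or return the rows.
rows = conn.execute("SELECT id1, id2 FROM link ORDER BY TableId").fetchall()
print(rows == inserted)  # True
```

Without the explicit ORDER BY TableId, no ordering is guaranteed, which is exactly the asker's problem.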
As others have already written, you will not be able to get the rows out of the link table in the order they were inserted.
If there is some sort of internal ordering of the rows in one or both of the tables that this link table joins, then you can use that to try to figure out when the link-table rows were created. Basically, they cannot have been created BEFORE both of the rows containing the PKs were created.
But on the other hand, you will not be able to find out how long after that they were created.
If you have decent backups, you could try restoring one or a few backups of varying age and then see whether those backups also contain this strange behaviour. That could give you at least some clue about when the strangeness started.
But the bottom line is that, using just a select, there is no way to get the rows out of a table like this in the order they were inserted.
If SELECT * doesn't return them in 'natural' order and you didn't insert them with a timestamp or auto-incrementing ID then I believe you're sunk. If you've got an IDENTITY field, order by that.
But the question I have is, how can you tell that SELECT * isn't returning them in the order they were inserted?
Update:
Based on your update, it looks like there is no method by which to return the records as you wish. I'd guess you've got a clustered index on ID1? You could try:
select *, %%physloc%% as pl from table
order by pl desc

Inserting a row in multiple tables, and maintaining a relationship separately

I am a bit lost trying to insert my data, in a specific scenario, from an Excel sheet into 4 tables using SSIS.
Each row of my Excel sheet needs to be split across 3 tables. The identity column values then need to be inserted into a 4th mapping table to hold the relationship. How do I achieve this efficiently using SSIS 2008?
Note that in the example below, it is fixed that both col4 and col5 go into the 3rd table.
Here is an example of the data:
Excel
col1  col2  col3  col4  col5
a     b     c     d     3
a     x     c     y     5

Table1
PK  col
1   a
2   a

Table2
PK  col1  col2
1   b     c
2   x     c

Table3
PK  Col
1   d
2   3
3   y
4   5

Map_table
PK  Table1_ID  Table2_ID  Table3_ID
1   1          1          1
2   1          1          2
3   2          2          3
4   2          2          4
I am fine even if just a SQL-based approach is suggested, as I do not have any mandate to use SSIS only. An additional challenge: in table 2, if the same data row already exists, I want to use that row's ID in the map table instead of inserting a duplicate row!
Multicast is the component you are looking for. This component takes an input source and duplicates it into as many outputs as you need. In this scenario, you can have an Excel source and duplicate the flow to insert the data into your Table1, Table2 and Table3.
Now, the tricky part is getting those identities back into your Map_Table. Either you don't use IDENTITY and use some other means (like a GUID, or an incremental counter of your own that you set up as a derived column before the multicast), or you use @@IDENTITY to retrieve the last inserted identity. Using @@IDENTITY sounds like a pain to me for your current scenario, but that's up to you. If the data is not that huge, I would go for a GUID.
@@IDENTITY doesn't work well with BULK operations: it will retrieve only the last identity created. Also, keep in mind that I talked about @@IDENTITY, but you may want to use IDENT_CURRENT('TableName') instead to retrieve the last identity for a specific table. @@IDENTITY retrieves the last identity created within your session, whatever the scope; you can use SCOPE_IDENTITY() to retrieve the last identity within your scope.
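The SQL-based approach can be sketched row by row. This uses Python's sqlite3, with cursor.lastrowid standing in for SCOPE_IDENTITY(); the table and column names follow the question, and the table2 lookup implements the "reuse an existing row" requirement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pk INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE table2 (pk INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table3 (pk INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE map_table (pk INTEGER PRIMARY KEY,
                            table1_id INTEGER, table2_id INTEGER, table3_id INTEGER);
""")

def load_row(conn, col1, col2, col3, col4, col5):
    """Split one Excel row across table1/table2/table3 and record the mapping."""
    cur = conn.cursor()
    cur.execute("INSERT INTO table1 (col) VALUES (?)", (col1,))
    t1_id = cur.lastrowid                      # the identity we just generated
    # Reuse an existing table2 row if the same data already exists.
    existing = cur.execute(
        "SELECT pk FROM table2 WHERE col1 = ? AND col2 = ?", (col2, col3)
    ).fetchone()
    if existing:
        t2_id = existing[0]
    else:
        cur.execute("INSERT INTO table2 (col1, col2) VALUES (?, ?)", (col2, col3))
        t2_id = cur.lastrowid
    # Both col4 and col5 go into table3, each with its own map row.
    for val in (col4, col5):
        cur.execute("INSERT INTO table3 (col) VALUES (?)", (val,))
        cur.execute(
            "INSERT INTO map_table (table1_id, table2_id, table3_id) VALUES (?, ?, ?)",
            (t1_id, t2_id, cur.lastrowid),
        )
    conn.commit()

load_row(conn, "a", "b", "c", "d", "3")
load_row(conn, "a", "x", "c", "y", "5")
load_row(conn, "b", "x", "c", "q", "9")   # ('x', 'c') already exists: its pk is reused
```

Because each row is handled in one session with per-statement identity capture, there is no ambiguity about which identity belongs to which insert, unlike @@IDENTITY across bulk operations.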