Writing values in serial fails to update immediately - cratedb

I am trying to write many values into the create-io DB using a python script.
Since crate does not support an auto-incrementer for fiels like ID, I query the last ID and use (+1) for the next one.
However, when I send a "insert into..." command, the data is not written immediately. So even if I close the connection and call
select count(id) from mytable
I still receive the old id counter.
For now, I am forced to use
time.sleep(0.5)
after each insert, which is "not good" .
Can someone guide me here into a direction?
Thanks alot!

CrateDB is eventual consistent (https://crate.io/docs/reference/en/0.54.4/storage_consistency.html) but provides e.g. read-after write consistency.
so if you query a document by its primary key - it will be immediately available.
if this is not enough or you can't query by primary key you can issue the REFRESH TABLEcommand - but this comes at a performance penalty.

Related

Why does the id serial primary key keep changing?

I have just started a full stack web developer course which includes PostgreSQL. I have been give some practice questions to do and when I clicked on run SQL it displays the id, first_name and last_name but when I entered in more lines of code to answer more questions and clicked on run SQL again, the id number changed to a completely different number and I don't understand why this is happening.
In the practice questions I was instructed to add more rows and then to update the entry with an id of 2 to something else but how can I update id 2 if the id numbers keep changing? id 2 wasn't even on the screen. What I understand of id serial primary key is that it auto increments the id when you add new rows but in this case the id keeps changing to random numbers, why does it do this? The screenshots are code from the course, not what I entered. http://sqlfiddle.com/#!17/a114f/2 this is the link but I am not sure if you anyone who has not signed up to the course can access it. Sorry if this is a really simple newbie question but I have spent a lot of time looking online and I really need to move forward.
As far as I can tell this is a bug in SQLFiddle.
Apparently the table definition (or something else) is shared with other users. If you do the same e.g. using db<>fiddle you always get the same ID after dropping and re-creating the tables:
db<>fiddle demo
SQLFiddle has never worked reliably for me anyway. Plus it seems to be stuck on a really old Postgres version. So you might use something different to practice your SQL skills or do your homework.
Like a_horse_with_no_name, I too prefer db<>fiddle for SQL code sharing. But if you're restricted to sqlfiddle for whatever reason, you can add a setval() command to your code to force the seeding value.
select setval('drivers_id_seq',1);
INSERT INTO drivers (first_name, last_name) VALUES ('Amy', 'Hua');
SELECT * from drivers;
See example here (link). Note that drivers_id_seq is a system-generated name that you can guess pretty easily (should you need to reseed the serial you create on another object).
In SQL Fiddle every time you click the button on the right its going to "rerun" your code.
However SQL Fiddle doesn't guarantee the primary keys are going to be the same, and isolates your code in such a way that definitely is causing the pk to increment.
http://sqlfiddle.com/#!17/a114f/2 Here's the original fiddle, and if you just jam on that submit button you can see the value changing each time.
Nothing in your code prevents duplication from concurring if you submitted it multiple times, but that would always have more than one row in the discover table.

Search and replace part of string in database - what are the pitfalls?

I need to replace all occurrences "google.com" that are met in the SQL db table Column1 with "newurl". It can be a full cell value, a part of it (substring of varchar()), can be met even several times in a cell.
Based on SO answer search-and-replace-part-of-string-in-database
this is what I need:
UPDATE
MyTable
SET
Column1 = Replace(Column, 'google.com', 'newurl')
WHERE
xxx
However, in that answer it is mentioned that
You will want to be extremely careful when doing this! I highly recommend doing a backup first.
What are the pitfalls of doing this query? Looks like it does the same what any texteditor would do by clicking on Replace All button. I don't think it is possible in my case to check the errors even with reserve copy as I would like to know possible errors in advance.
Any reasons to be careful with this query?
Again, I expect it replaces all occurences of google.com with 'newurl' in the Column1 of MyTable table in the SQL db.
Thank you.
Just create a test table, as a replica of your original source table, complete the update on there and check results.
You would want to do this as good SQL programming practice to ensure you don't mess up columns that should not be updated.
Another thing you can do is get a count of the records before hand that fit the criteria using a SELECT statement.
Run your update statement and if it's a 1-1 match on count, you should be good to go.
The only thing i can think of that would happen negatively in this respect is that additional columns get updated. Your WHERE clause is not specific for us to see, so there's no way to validate that what you're doing will do what you expect it to.
I think the person posting the answer is just being cautious - This will modify the value in Column1 for every row in MyTable, so make sure you mean it when you execute. Another way to be cautious would be to wrap it in a transaction so you could roll it back if you don't like the results.

Get last few query results in SQL

I frequently do a static analysis of SQL databases, during which I have the luxury of nobody being able to change the data except me.
However, I have not found a way to 'tell' this to SQL in order to prevent running the same query multiple times.
Here is what I would like to do, first I start with a complicated query that has a very small output.
SELECT * FROM MYTABLE WHERE MYPROPERTY = 1234
Then I run a simple query from the same window (Mostly using SQL server studio if that is relevant)
SELECT 1
Now I suddenly realize that I forgot to save the results from my first complicated (slow) query.
As I know the underlying data did not change (or even if it did) I would like to look one step back and simply get the result. However at the moment I don't know any trick to do this and I have to run the entire query again.
So the question summary is: How can I (automatically store/)get the results from recently executed queries.
I am particulary interested in simple select queries, and would be happy to allocate say 100MB memory for automated result storage. Would prefer a solution that works in SQL server studio with T-SQL, but other SQL solutions are also welcome.
EDIT: I am not looking for a way to manually prevent this from happening. In the cases where I can anticipate the problem it will not happen.
This can't be done in Microsoft SQL Server. SQL Server does not cache results, instead it caches data pages that were accessed by your query. This should make your query go a lot faster the second time around so it won't be as painful to re-run it.
In other databases, such as Oracle and MySQL, they do have a query caching mechanism that will allow you to retrieve the results directly the second time around.
I run into this frequently, I often just throw the results of longer-running queries into a temp table:
SELECT *
INTO #results1
FROM MYTABLE WHERE MYPROPERTY = 1234
SELECT *
FROM #results1
If the query is very long-running I might use a 'real' table. It's a good way to save on re-run time.
Downside is that it adds to your query.
You can also send query results to a file in SSMS, info on formatting the output is here: SSMS Results to File
The easiest way to do this is to run each query in its own SSMS window, the results will stay there until you close it, or run out of memory - besides that, I am not sure there is a way to accomplish what you want.
Once you close the SSMS window, I don't believe there is a way to get back 'cached' results.
This isn't a technical answer to your question. Having written queries and looking at results for many years, I am in the habit of saving the results in Excel, regardless of the database/query tool I'm using.
The format in Excel is rather methodical:
Each worksheet has the date. (Called something like "1 Jul".)
Each spreadsheet contains one month. (Typically with the month name like "work-201307".)
In the "B" column I copy and paste the query.
Underneath, in the "C" column, I copy and paste the results.
The next query starts a few lines after, one after the other.
I put the queries in the "B" column, so I can go to the "A" column and use to get to the first row. I put the results in the "C" column, so I can go to the "B" column and use to move between queries.
I mostly do this so I can go back and see the work I did many months ago. For instance, someone sends an email from February and says "do this again". I can go back to the February spreadsheet, go to the day it was created, and see what I was doing at that time.
In your question, though, I realize that I now instinctively solve this problem, because the "right click on the grid, copy with column headers, alt-tab to excel, alt-V" is a behavior that I comes quite naturally.
I was going to suggest you to run each query into a script with a counter (stored in a table) increased each time the query is executed (i.e. i++) and storing each query in a Temp Table called "tmpTable" + i, but it sounds very complicated to manage. Am I right?
Then I googled and I've found this Tool Pack: I didn't try it but you could take a look:
http://www.ssmstoolspack.com/Features
Hope it helps.
EDIT: added the folliwing link. There's the option to output as XML file and they mention SQL Server Integration Services as a possible solution too.
http://michaeljswart.com/2012/03/sending-query-results-to-others/#method5
SECOND EDIT: There's this DBMS-Independent tool too, it sounds interesting:
http://www.sql-workbench.net/
i am not sure this is what you want. Anyway check my answer
In sql server management studio you can open multiple tabs for executing queries. Open new tab for each query, then the result of executed queries will be available under that tab.
After executing one query in a tab dont use that tab for new query, open new tab for that job.
Have you considered using some kind of offline SQL client such as Excel? Specifically, Excel will retrieve the results into the spread sheet (using the Data ribbon/menus) where they are stored pretty much permanently as results. It will prompt you to refresh when necessary or you can do it on demand.
Your question as to whether it can be done in T/SQL or other databases depends on the database and results cache and even then they are options that the query processor can use not guarantees to the individual query.

Fill your tables with junk data?

I am lazy, sometimes excruciatingly lazy but hey (ironically) this is how we get stuff done right?
Had a simple idea that may or not be out there. If it is I would like to know and if not perhaps I will make it.
When working with my MSSQL database sometimes I want to test the performance of various transactions over tables and view and procedures etc... Does anyone know if there is a way to fill a table up with x rows of junk data mearly to experiment with.
One could simple enough..
INSERT INTO `[TABLE]`
SELECT `COLUMNS` FROM [`SOURCE_TABLE`]
Or do some kind of...
DECLARE count int
SET count = 0
WHILE count <= `x`
BEGIN
INSERT INTO `[TABLE]`
(...column list...)
VALUES
(...VALUES (could include the count here as a primary key))
SET count = count + 1
END
But it seems like there is or should already be something out there. Any ideas??
I use redgate
SQL Data generator
Use a Data Generation Plan (a feature of Visual Studio database projects).
WinSQL seems to have a data generator (which I did not test) and has a free version. But the Test data generation wizard seems to be reserved to the Pro version.
My personal favorite would be to generate a CSV file (using a 4.5 lines script) and load it into your SQL DB using BULK INSERT. This will also allow better customization of the data as sometimes is needed (e.g. when writing tests).

Can I maintain state between calls to a SQL Server UDF?

I have a SQL script that inserts data (via INSERT statements currently numbering in the thousands) One of the columns contains a unique identifier (though not an IDENTITY type, just a plain ol' int) that's actually unique across a few different tables.
I'd like to add a scalar function to my script that gets the next available ID (i.e. last used ID + 1) but I'm not sure this is possible because there doesn't seem to be a way to use a global or static variable from within a UDF, I can't use a temp table, and I can't update a permanent table from within a function.
Currently my script looks like this:
declare #v_baseID int
exec dbo.getNextID #v_baseID out --sproc to get the next available id
--Lots of these - where n is a hardcoded value
insert into tableOfStuff (someStuff, uniqueID) values ('stuff', #v_baseID + n )
exec dbo.UpdateNextID #v_baseID + lastUsedn --sproc to update the last used id
But I would like it to look like this:
--Lots of these
insert into tableOfStuff (someStuff, uniqueID) values ('stuff', getNextID() )
Hardcoding the offset is a pain in the arse, and is error prone. Packaging it up into a simple scalar function is very appealing, but I'm starting to think it can't be done that way since there doesn't seem to be a way to maintain the offset counter between calls. Is that right, or is there something I'm missing.
We're using SQL Server 2005 at the moment.
edits for clarification:
Two users hitting it won't happen. This is an upgrade script that will be run only once, and never concurrently.
The actual sproc isn't prefixed with sp_, fixed the example code.
In normal usage, we do use an id table and a sproc to get IDs as needed, I was just looking for a cleaner way to do it in this script, which essentially just dumps a bunch of data into the db.
I'm starting to think it can't be done that way since there doesn't seem to be a way to maintain the offset counter between calls. Is that right, or is there something I'm missing.
You aren't missing anything; SQL Server does not support global variables, and it doesn't support data modification within UDFs. And even if you wanted to do something as kludgy as using CONTEXT_INFO (see http://weblogs.sqlteam.com/mladenp/archive/2007/04/23/60185.aspx), you can't set that from within a UDF anyway.
Is there a way you can get around the "hardcoding" of the offset by making that a variable and looping over the iteration of it, doing the inserts within that loop?
If you have 2 users hitting it at the same time they will get the same id. Why didn't you use an id table with an identity instead, insert into that and use that as the unique (which is guaranteed) id, this will also perform much faster
sp_getNextID
never ever prefix procs with sp_, this has performance implication because the optimizer first checks the master DB to see if that proc exists there and then th local DB, also if MS decide to create a sp_getNextID in a service pack yours will never get executed
It would probably be more work than it's worth, but you can use static C#/VB variables in a SQL CLR UDF, so I think you'd be able to do what you want to do by simply incrementing this variable every time the UDF is called. The static variable would be lost whenever the appdomain unloaded, of course. So if you need continuity of your ID from one day to the next, you'd need a way, on first access of NextId, to poll all of tables that use this ID, to find the highest value.