How to append to a large varchar(max) column efficiently - sql

Consider having a table like this:
CREATE TABLE Product (
Id int PRIMARY KEY CLUSTERED,
InvoicesStr varchar(max)
)
in which InvoicesStr holds the comma-separated Ids of the invoices that contain this product.
I know it is not a proper design, but I just brought it up to demonstrate the problem I want to describe.
So the table data would be something like this:
Product
Id | InvoicesStr
----|-------------------------------------
1 | 4,5,6,7,34,6,78,967,3,534,
2 | 454,767,344,567,89676,4435,3,434,
After millions of sales, InvoicesStr would contain a very large string.
Consider a situation in which, for a row, this column contains a very big string, say a 1GB string.
I want to know about the performance of an update query such as:
UPDATE Product
SET InvoicesStr = InvoicesStr + '584,'
WHERE Id = 100
Is the performance of this query dependent on the size of InvoicesStr? Or is SQL Server smart enough to just append the new string and not replace the whole value?

You can use the little-known .WRITE syntax to append or modify text/data in a max column.
This does an efficient append or replace (minimally logged if possible), and is useful for modifying large values. Note that SQL Server modifies only whole 8k pages, so the minimum amount of modified data would be 8k (unless the existing data exactly filled a page).
For example:
UPDATE Product
SET InvoicesStr.WRITE('100,', NULL, NULL)
WHERE Id = 2;
In reality, there is usually little reason to actually use this syntax, because you would not normally have such a denormalized design. And if you were storing something like pictures or audio, you would just replace the whole value.
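A slightly fuller sketch, assuming SQL Server 2005 or later and the question's Product table (the sample values are made up). In .WRITE(expression, @Offset, @Length), a NULL @Offset appends; a zero-based @Offset together with a @Length overwrites that many characters. Because .WRITE cannot modify a NULL value, a NULL row is seeded with an ordinary assignment first.

```sql
-- SQL Server (T-SQL) sketch; sample data is illustrative only.
CREATE TABLE Product (
    Id          int PRIMARY KEY CLUSTERED,
    InvoicesStr varchar(max)
);

INSERT INTO Product (Id, InvoicesStr)
VALUES (1, '4,5,6,7,'),
       (2, NULL);

-- Append: a NULL @Offset writes at the end of the existing value.
UPDATE Product
SET InvoicesStr.WRITE('584,', NULL, NULL)
WHERE Id = 1;

-- .WRITE cannot modify a NULL value, so seed NULL rows with a plain assignment.
UPDATE Product
SET InvoicesStr = '584,'
WHERE Id = 2 AND InvoicesStr IS NULL;

-- Replace in place: overwrite 1 character at zero-based offset 0 with '40'.
UPDATE Product
SET InvoicesStr.WRITE('40', 0, 1)
WHERE Id = 1;

SELECT Id, InvoicesStr FROM Product ORDER BY Id;
-- Id 1: 40,5,6,7,584,
-- Id 2: 584,
```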

Related

In SQL, in a column with data type text, how can we show the rest of the entries in that column after a particular entry?

In SQL, in any given table, there is a column named "name" with data type text.
If there are ten entries, and one entry in the column is "rohit", I want to show all the entries in the name column after rohit. I do not know the row id or id. Can it be done?
select * from your_table where name > 'rohit' order by name
But in general, you should not treat text columns like that.
A database is more than a collection of tables.
Think about how to organize your data and what defines a data row.
Maybe, besides its name, there is something else by which you would classify such a row? Something like "shall be displayed?", "is modified", or "is active"?
So if you had a second column, say display of type int, and your table looked like
CREATE TABLE MYDATA (
NAME TEXT,
DISPLAY INT NOT NULL DEFAULT 1
);
you could flag every row with 1 or 0, depending on whether it should be displayed, and then your query could look like
SELECT * FROM MYDATA WHERE DISPLAY=1 ORDER BY NAME
to get your list of values.
It doesn't make much of a difference with ten rows (you don't even need indexes here), but if you build something bigger, say 10,000+ rows, you'd be surprised how slow that would become!
In general, TEXT columns are good to select and display, but should be avoided in WHERE conditions as much as you can. Use describing columns, preferably int fields, which can be indexed very efficiently, so an application doesn't get slower even when the number of records goes over 100k.
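A runnable sketch of both ideas, written for PostgreSQL; the question does not name a database, and the table and names below are hypothetical. It filters on an indexed integer flag, and it also shows the literal "names after rohit" query. What counts as "after" depends on the column's collation, though these lowercase ASCII names sort the same way under common collations.

```sql
-- PostgreSQL sketch; table and data are hypothetical.
CREATE TABLE mydata (
    name    text NOT NULL,
    display int  NOT NULL DEFAULT 1   -- 1 = show, 0 = hide
);

-- An index on the small integer flag column.
CREATE INDEX mydata_display_idx ON mydata (display);

INSERT INTO mydata (name) VALUES ('amit'), ('mohan'), ('rohit'), ('sara'), ('zoya');

-- Hide one row by flipping its flag instead of filtering on the text itself.
UPDATE mydata SET display = 0 WHERE name = 'zoya';

-- Rows to display, in name order: amit, mohan, rohit, sara
SELECT name FROM mydata WHERE display = 1 ORDER BY name;

-- The literal answer to the question: names that sort after 'rohit' (sara, zoya).
SELECT name FROM mydata WHERE name > 'rohit' ORDER BY name;
```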
You can use the DEFAULT keyword for it:
CREATE TABLE Persons (
ID int NOT NULL,
name varchar(255) DEFAULT 'rohit'
);

How do I increase 1 value over multiple rows?

I have an item tax table that records the different tax rates for the different counties in our state. Each row has an ID number (1-130). Our front-end software always orders the tax options by this number, but we want them alphabetical. Most of our rows were added that way, but I want to be able to insert rows.
Thus I need to add 1 to every entry from a certain number on (e.g. 37-130 all need to increase by one). Unfortunately, this is the primary key. Is it possible to increase this value on all of them easily? Or in a loop? I'll have to do this repeatedly, as we're moving about a dozen entries.
UPDATE ItemTax
SET ID = ID + 1
WHERE ID = Last ID number
Treating your question as academic, and not endorsing this as an actual solution, you can do this:
UPDATE ItemTax
SET ID = ID + 1
WHERE ID >= 37
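A small SQL Server sketch of the shift on sample data; the County column and its values are hypothetical. In SQL Server, a single set-based UPDATE like this succeeds even though the old and new key values overlap, because uniqueness is enforced for the statement as a whole. If other tables reference ID through foreign keys, the UPDATE can fail unless those keys were declared with ON UPDATE CASCADE.

```sql
-- SQL Server sketch; columns other than ID are hypothetical.
CREATE TABLE ItemTax (
    ID     int PRIMARY KEY,
    County varchar(50) NOT NULL
);

INSERT INTO ItemTax (ID, County)
VALUES (1, 'Adams'), (2, 'Brown'), (3, 'Dane'), (4, 'Grant');

-- Open a gap at ID 3 by shifting 3 and above up by one, in one statement.
UPDATE ItemTax
SET ID = ID + 1
WHERE ID >= 3;

-- Fill the gap with the new row.
INSERT INTO ItemTax (ID, County) VALUES (3, 'Clark');

SELECT ID, County FROM ItemTax ORDER BY ID;
-- 1 Adams, 2 Brown, 3 Clark, 4 Dane, 5 Grant
```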
Depending upon how you use this id, it might be better to leave the original ID column unchanged. For example:
alter table ItemTax add NewID int null
GO
update ItemTax set NewID =
case
when ID between 37 and 130 then ID + 1
else ID
end
Now you don't have to update foreign key relationships, etc.
You see, ID usually represents a surrogate key, and a surrogate key should never have its value changed in a good design. So your desire to change its value raises the suspicion that you do not understand your design as well as you should. We all start from ignorance; I have made some very poor decisions in the past.
If this is the only change there will ever be for NewID, you don't even need a physical column; a computed column would serve well. But if this is the first of many modifications, a physical column is likely a better choice.
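A SQL Server sketch of the computed-column variant, using the same 37 to 130 rule as the update above; the County column and sample rows are hypothetical. NewID is derived from ID when read, so ID and any foreign keys pointing at it stay untouched.

```sql
-- SQL Server sketch; County and the sample rows are hypothetical.
CREATE TABLE ItemTax (
    ID     int PRIMARY KEY,
    County varchar(50) NOT NULL
);

INSERT INTO ItemTax (ID, County) VALUES (36, 'Adams'), (37, 'Brown'), (130, 'Dane');

-- Computed column: no data is rewritten, the value is calculated from ID.
ALTER TABLE ItemTax
ADD NewID AS (CASE WHEN ID BETWEEN 37 AND 130 THEN ID + 1 ELSE ID END);

SELECT ID, NewID, County FROM ItemTax ORDER BY NewID;
-- 36 -> 36, 37 -> 38, 130 -> 131
```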
You also mention inserting rows. Build in some room between values, so you can insert or rearrange rows by tweaking a single value instead of renumbering entire blocks of rows just to insert one row, e.g.
update ItemTax set NewID = ID * 100
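A SQL Server sketch of the gap idea: after multiplying by 100, a new row can take a value between two neighbours without renumbering anything. The County column, the sample rows, and the value 150 are hypothetical.

```sql
-- SQL Server sketch; County and the sample rows are hypothetical.
CREATE TABLE ItemTax (
    ID     int PRIMARY KEY,
    County varchar(50) NOT NULL,
    NewID  int NULL
);

INSERT INTO ItemTax (ID, County) VALUES (1, 'Adams'), (2, 'Brown'), (3, 'Dane');

-- Spread the sort key out: 100, 200, 300 leaves room between neighbours.
UPDATE ItemTax SET NewID = ID * 100;

-- A county that sorts between Adams and Brown takes a value in the gap;
-- no existing row is renumbered.
INSERT INTO ItemTax (ID, County, NewID) VALUES (4, 'Ashland', 150);

SELECT County FROM ItemTax ORDER BY NewID;
-- Adams, Ashland, Brown, Dane
```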

Column Copy and Update vs. Column Create and Insert

I have a table with 32 Million rows and 31 columns in PostgreSQL 9.2.10. I am altering the table by adding columns with updated values.
For example, if the initial table is:
id initial_color
-- -------------
1 blue
2 red
3 yellow
I am modifying the table so that the result is:
id initial_color modified_color
-- ------------- --------------
1 blue blue_green
2 red red_orange
3 yellow yellow_brown
I have code that will read the initial_color column and update the value.
Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this? My present choices are:
Copy the column and update the rows in the new column
Create an empty column and insert new values
I could do either option with one column at a time or with all five at once. The column types are either character varying or character.
The column types are either character varying or character.
Don't use character; that's a misunderstanding. varchar is OK, but I would suggest just text for arbitrary character data.
Any downsides of using data type "text" for storing strings?
Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this?
If you don't have objects (views, foreign keys, functions) depending on the existing table, the most efficient way is to create a new table. Something like this (details depend on your installation):
BEGIN;
LOCK TABLE tbl_org IN SHARE MODE; -- to prevent concurrent writes
CREATE TABLE tbl_new (LIKE tbl_org INCLUDING STORAGE INCLUDING COMMENTS);
ALTER TABLE tbl_new ADD COLUMN modified_color text
, ADD COLUMN modified_something text;
-- , etc
INSERT INTO tbl_new (<all columns in order here>)
SELECT <all columns in order here>
, myfunction(initial_color) AS modified_color -- etc
FROM tbl_org;
-- ORDER BY tbl_id; -- optionally order rows while being at it.
-- Add constraints and indexes like in the original table here
DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;
COMMIT;
If you have depending objects, you need to do more.
Either way, be sure to add all five at once. If you update each in a separate query, you write another row version each time due to the MVCC model of Postgres.
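A concrete, runnable instance of the new-table approach in PostgreSQL, using the question's three sample rows. modify_color is a hypothetical stand-in for the asker's transformation code, and the primary key is recreated after the bulk load because LIKE without INCLUDING INDEXES does not copy it.

```sql
-- PostgreSQL sketch; modify_color is a hypothetical placeholder function.
CREATE TABLE tbl_org (id int PRIMARY KEY, initial_color text);
INSERT INTO tbl_org VALUES (1, 'blue'), (2, 'red'), (3, 'yellow');

CREATE FUNCTION modify_color(c text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE c
               WHEN 'blue'   THEN 'blue_green'
               WHEN 'red'    THEN 'red_orange'
               WHEN 'yellow' THEN 'yellow_brown'
               ELSE c
           END
$$;

BEGIN;
LOCK TABLE tbl_org IN SHARE MODE;  -- block concurrent writes during the copy
CREATE TABLE tbl_new (LIKE tbl_org INCLUDING STORAGE INCLUDING COMMENTS);
ALTER TABLE tbl_new ADD COLUMN modified_color text;

-- Every row is written exactly once, already carrying the new value.
INSERT INTO tbl_new (id, initial_color, modified_color)
SELECT id, initial_color, modify_color(initial_color)
FROM tbl_org
ORDER BY id;

ALTER TABLE tbl_new ADD PRIMARY KEY (id);  -- recreate constraints after loading
DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;
COMMIT;
```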
Related cases with more details, links and explanation:
Updating database rows without locking the table in PostgreSQL 9.2
Best way to populate a new column in a large table?
Optimizing bulk update performance in PostgreSQL
While creating a new table you might also order columns in an optimized fashion:
Calculating and saving space in PostgreSQL
Maybe I'm misreading the question, but as far as I know, you have 2 possibilities for creating a table with the extra columns:
CREATE TABLE
This would create a new table and filling could be done using
CREATE TABLE .. AS SELECT.. for filling with creation or
using a separate INSERT...SELECT... later on
Neither variant seems to be what you want, since you described a solution that doesn't list all the fields.
Also this would require all data (plus the new fields) to be copied.
ALTER TABLE...ADD ...
This creates the new columns. As I'm not aware of any way to reference existing column values here, you will need an additional UPDATE ... SET ... to fill in the values.
So I don't see any way to implement a procedure that follows your choice 1.
Nevertheless, copying the (column) data just to overwrite it in a second step would be suboptimal in any case. Adding new columns to a table does minimal I/O. So even if there were a way to execute your choice 1, following choice 2 promises considerably better performance.
Thus, use two statements: one ALTER TABLE adding all your new columns in one go, then one UPDATE providing the new values for these columns. This will achieve what you want.
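A PostgreSQL sketch of those two statements, using the question's sample colors; initial_size and modified_size are a hypothetical second column pair, included to show several columns filled by the same UPDATE so each row is rewritten only once.

```sql
-- PostgreSQL sketch; initial_size / modified_size are hypothetical.
CREATE TABLE tbl (
    id            int PRIMARY KEY,
    initial_color text,
    initial_size  text
);
INSERT INTO tbl VALUES (1, 'blue', 's'), (2, 'red', 'm'), (3, 'yellow', 'l');

-- 1) Add all new columns in one statement (nullable, no default).
ALTER TABLE tbl
    ADD COLUMN modified_color text,
    ADD COLUMN modified_size  text;

-- 2) Fill all new columns in a single UPDATE.
UPDATE tbl
SET modified_color = CASE initial_color
                         WHEN 'blue'   THEN 'blue_green'
                         WHEN 'red'    THEN 'red_orange'
                         WHEN 'yellow' THEN 'yellow_brown'
                     END,
    modified_size  = upper(initial_size);
```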
Create the new column (modified_color); it will be NULL on all records.
Then run an update statement, assuming your table is named my_table (table itself is a reserved word):
update my_table
set modified_color = 'blue_green'
where initial_color = 'blue';
If I am correct, you can repeat this for each color:
update my_table set modified_color = 'blue_green' where initial_color = 'blue';
update my_table set modified_color = 'red_orange' where initial_color = 'red';
update my_table set modified_color = 'yellow_brown' where initial_color = 'yellow';
Once you have done this, you can do another update (assuming you have another column, which I will call modified_color1):
update my_table set modified_color1 = modified_color;

SQL query: put results into tables named after the result values

I have a very large database I would like to split up into tables. I would like to make it so when I run a distinct, it will make a table for every distinct name. The name of the table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, one named A and another named B. Then the entire row of data would be copied into that table.
select distinct [name] from [maintable]
-make table for each name
-select [name] from [maintable]
-copy into table name
-drop row from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective, because each index value would still match a large share of the records.
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
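As one concrete example of partitioning, here is a sketch using PostgreSQL 10+ declarative list partitioning; the question does not name its database, and SQL Server, for instance, uses partition functions and schemes instead. maintable and its columns follow the question's example. There is still one logical table, and a filter on name lets the planner skip partitions that cannot match.

```sql
-- PostgreSQL 10+ sketch; names and data mirror the question's example.
CREATE TABLE maintable (
    name text NOT NULL,
    data text
) PARTITION BY LIST (name);

CREATE TABLE maintable_a PARTITION OF maintable FOR VALUES IN ('A');
CREATE TABLE maintable_b PARTITION OF maintable FOR VALUES IN ('B');

-- Rows are routed to the matching partition automatically.
INSERT INTO maintable (name, data)
VALUES ('A', 'Data 1'), ('A', 'Data 2'), ('B', 'Data 3'), ('B', 'Data 4');

-- Queries target the logical table; EXPLAIN shows which partitions are scanned.
SELECT name, data FROM maintable WHERE name = 'A';
```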
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done once per table instead of one time.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some simplifications (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
What you want is
CREATE TABLE new_table
AS (SELECT .... -- the data that you want in this table
);
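If you do go this way despite the advice above, this is a PostgreSQL sketch that runs one CREATE TABLE ... AS per distinct name with dynamic SQL. The question's database is not stated, and maintable follows its example. format's %I quotes the value as an identifier, so the tables are created with the case-sensitive names "A" and "B" and must be quoted when queried.

```sql
-- PostgreSQL sketch; maintable and its columns follow the question's example.
CREATE TABLE maintable (name text NOT NULL, data text);
INSERT INTO maintable (name, data)
VALUES ('A', 'Data 1'), ('A', 'Data 2'), ('B', 'Data 3'), ('B', 'Data 4');

DO $$
DECLARE
    n text;
BEGIN
    FOR n IN SELECT DISTINCT name FROM maintable LOOP
        -- %I quotes n as an identifier (table name), %L as a string literal.
        EXECUTE format('CREATE TABLE %I AS SELECT * FROM maintable WHERE name = %L', n, n);
    END LOOP;
END $$;

SELECT * FROM "A";
```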

can I insert a copy of a row from table T into table T without listing its columns and without primary key error?

I want to do something like this:
INSERT INTO T SELECT * FROM T WHERE Column1 = 'MagicValue' -- (multiple rows may be affected)
The problem is that T has a primary key column and so this causes an error as if trying to set the primary key. And frankly, I don't want to set the primary key either. I want to create entirely new rows with new primary keys but the rest of the fields being copied over from the original rows.
This is supposed to be generic code applicable to various tables. Well, so if there is no nice way of doing this, I will just write code to dynamically extract column names, construct the list etc. But maybe there is? Am I the first guy trying to create duplicate rows in a database or something?
I'm assuming by "Primary Key" you mean identity or guid data types that auto-assign or auto-increment.
Without some very fancy dynamic SQL, you can't do what you are after. If you want to insert everything but the identity field, you need to specify fields.
If you want to specify a value for that field, you need to specify all the fields in the SELECT and in the INSERT AND turn on IDENTITY_INSERT.
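A SQL Server sketch of both routes, assuming SQL Server 2017 or later for STRING_AGG; dbo.T and its Column2 are hypothetical stand-ins. The static INSERT ... SELECT names every column except the identity, so new Id values are generated. The generic version builds the same column list from sys.columns and runs it with sp_executesql. It skips only identity and computed columns; a rowversion column, for example, would also have to be excluded.

```sql
-- SQL Server 2017+ sketch; Column2 is a hypothetical extra column.
CREATE TABLE dbo.T (
    Id      int IDENTITY(1,1) PRIMARY KEY,
    Column1 varchar(50) NOT NULL,
    Column2 int NULL
);

INSERT INTO dbo.T (Column1, Column2)
VALUES ('MagicValue', 10), ('Other', 20), ('MagicValue', 30);

-- Static version: list every column except the identity.
INSERT INTO dbo.T (Column1, Column2)
SELECT Column1, Column2
FROM dbo.T
WHERE Column1 = 'MagicValue';

-- Generic version: build the same column list from the catalog at run time.
DECLARE @cols nvarchar(max) = (
    SELECT STRING_AGG(CAST(QUOTENAME(name) AS nvarchar(max)), N', ')
           WITHIN GROUP (ORDER BY column_id)
    FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.T')
      AND is_identity = 0     -- let IDENTITY assign new keys
      AND is_computed = 0     -- computed columns cannot be inserted
);

DECLARE @sql nvarchar(max) =
    N'INSERT INTO dbo.T (' + @cols + N') SELECT ' + @cols +
    N' FROM dbo.T WHERE Column1 = @v;';

EXEC sp_executesql @sql, N'@v varchar(50)', @v = 'MagicValue';
```

Each copy reads only the rows that existed when the statement started, so the two MagicValue rows become four after the static copy and eight after the generic one.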
You don't gain anything from duplicating a row in a database (assuming you didn't try to set the primary key). It would be wiser, and would avoid problems, to have another column called "amount" or something similar.
something like
UPDATE T SET Amount = Amount + 1 WHERE Column1 = 'MagicValue'
or, if it can increase by more than 1 (here, doubling it):
Update T SET Amount = Amount * 2 WHERE Column1 = 'MagicValue'
I'm not sure what you're trying to do exactly, but if the above doesn't work for you, I think your design requires a new table to insert into.
EDIT: Also, as mentioned in the comments under your question, a generic insert doesn't really make sense. Imagine: for this to work, you need the same number of fields, and they would hold the same values, suggesting that they should also have the same names (even if that isn't strictly required). It would basically be the same table structure twice.