Remove a particular column's data in SQL without deleting the entire row

I have a table in SQL Server where one column contains Excel files.
Now we need to remove those Excel files only, without deleting the entire row, because the size of this table is increasing day by day and we need to remove old data to decrease its size.
Id    file_name  Code
1001  abc.xlsx   A1
1002  das.xlsx   A2
1003  kap.xlsx   A3
I have done the below
UPDATE rec_table
SET file_name = NULL
WHERE id = '1001';
Will this help to reduce the size of the table?
Thanks

In SQL Server, the size of a table is calculated by adding up the size of every row in the table. I.e., if a table has 10 rows, then the total size of the table is the sum of the sizes of those 10 rows.
For a row, the total size is calculated by adding up the size of every column.
For example, in your case the size of the row with ID 1001 will be:
size of the value in column Id + size of the value in column file_name + size of the value in column Code
So if a column holds the value NULL for a particular row, that column contributes a data size of (almost) 0 for that row.
Updating the values of a particular column to NULL will therefore reduce the size of the data, but how much it shrinks depends on the type of the column and the data stored in it: variable-length types such as varchar, nvarchar, and varbinary give the space back, while fixed-length types such as int or char(n) still occupy their declared width even when NULL.
Which means: if your column file_name holds 100 bytes of data for row id 1001 in a variable-length column, then updating the value to NULL will reduce the table size by roughly 100 bytes.
You may use the following queries to find out the table / row size.
To get the size and fragmentation details for the whole table (dbcc showcontig is deprecated in newer versions but still works; Person.Person is the AdventureWorks sample table):
dbcc showcontig ('Person.Person') with tableresults
To get the data size of a particular column for each row in the table:
SELECT DATALENGTH(FirstName) FROM Person.Person
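As a sketch of how to see the effect for the table in the question (rec_table and file_name are the asker's names; sp_spaceused and ALTER TABLE ... REBUILD are standard SQL Server features):

-- Size before
EXEC sp_spaceused 'rec_table';

UPDATE rec_table SET file_name = NULL WHERE id = '1001';

-- Freed space stays inside the pages until the table is rebuilt
ALTER TABLE rec_table REBUILD;

-- Size after: the data column of the output should have shrunk
EXEC sp_spaceused 'rec_table';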

Will this reduce the size of the table? Probably.
Will this release free space to the database? Probably not -- until you compact the database.
Is this an expensive operation? Very.
Databases store tables on data pages. Each data page contains one or more rows. If you have wide columns, then these might be stored on their own data pages.
The number of rows that fit on a page depends on the size of the rows. A page is about 8k bytes. If a row is 100 bytes, then a table with 1 row occupies the same space as one with 50 rows.
When you remove a column from a table, the entire table needs to be rewritten. This is a very expensive operation. And it might take a long time. Often it is faster to select the columns that you do want and reload the original table.
Removing a column is -- to me -- a very curious way to reduce the size of a table. More typically, older data would be removed. The most efficient method is to partition the table by time, using whatever the appropriate date/time column might be. Then you can quickly recover space by dropping a partition.
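A minimal sketch of that partitioning approach, with assumed names (pf_by_year, ps_by_year, and a created_date partitioning column are hypothetical; TRUNCATE ... WITH (PARTITIONS ...) requires SQL Server 2016 or later):

CREATE PARTITION FUNCTION pf_by_year (date)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

CREATE PARTITION SCHEME ps_by_year
AS PARTITION pf_by_year ALL TO ([PRIMARY]);

-- After rebuilding the table's clustered index on ps_by_year(created_date),
-- the oldest data can be removed almost instantly:
TRUNCATE TABLE rec_table WITH (PARTITIONS (1));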

Related

Data space after join function is "huge"

My table, after using a join, has grown from two tables of 27 MB and 37 MB to 2930 MB. This table is too large to use further in my project.
I am running on Microsoft SQL Server, and after joining the two tables the data space for the result has grown considerably. There are about 40000 rows in the first table and 428 in the second. The number of rows after joining is 37000, which should be correct.
I suspect it might be because of how the data types for the columns were defined. I imported one of the tables from Excel (the one with 40000 rows), so the import only sampled the first rows to infer each data type. Some rows further down exceeded the inferred type (mostly nvarchar(255) when some values had about 400 characters). I therefore changed the first row so that almost every column has a field of about 300 characters in it; SQL then automatically switched to nvarchar(MAX) for these columns and the import worked.
The join I used is this one:
SELECT *
INTO NewTable
FROM Table1
JOIN Table2
    ON Table2.Row1 = Table1.Row2;
I would like to have the same table as the result, but not larger than 50 MB.
The screenshot is from after the join operation
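If the nvarchar(MAX) columns are the culprit, one way to check and work around it (SomeColumn, OtherColumn, and NewTableSized below are hypothetical placeholders) is to measure the widest value actually stored and then materialize the join with explicit, right-sized types instead of SELECT *:

-- How wide is the widest stored value, in bytes?
SELECT MAX(DATALENGTH(SomeColumn)) AS max_bytes FROM Table1;

-- Re-create the result with a right-sized type instead of nvarchar(MAX)
SELECT CAST(t1.SomeColumn AS nvarchar(450)) AS SomeColumn,
       t2.OtherColumn
INTO NewTableSized
FROM Table1 AS t1
JOIN Table2 AS t2
    ON t2.Row1 = t1.Row2;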

SQL: Altered columns resulting in invisible columns?

I recently ran into a problem with SQL tables where a mistake in my code dropped a column in a table and then recreated it. This process was repeated many times before I discovered it.
But as a result, it appears that the SQL table hasn't properly deleted the columns. For instance, I get this message when I want to add a new column:
"Warning: The table "Matches" has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit."
And I also have trouble updating column values because they exceed the maximum limits.
And to be clear, the table does not have more than 30 (visible) columns and I only made each column a maximum of 40 varchars, so it appears that the dropped columns still exist somewhere. But how do I delete them?
Thanks in advance.
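Assuming SQL Server (the 8060-byte message is its wording): dropping a column is a metadata-only change, and the dropped column's bytes stay reserved in every row until the table is rebuilt. A minimal sketch, using the Matches table named in the warning:

-- Reclaim the space still reserved by the repeatedly dropped columns
ALTER TABLE Matches REBUILD;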

How can I store records with 500 CLOB fields?

Oracle has a maximum column limit of 1000, and even with all columns defined as VARCHAR(4000) I was able to create the table and load huge amounts of data into all fields.
I was able to create a table in SQL Server with 500 varchar(max) columns; however, when I attempted to insert data, I got the following error:
Cannot create a row of size 13075 which is greater than the allowable
maximum row size of 8060.
When I made the table 200 columns I was able to insert huge amounts of data.
Is there a way to do this in SQL Server?
I ran some tests, and it seems there is an overhead of 26 bytes for each populated varchar(max) column.
I was able to populate 308 columns.
If you divide your columns between two tables you'll be fine (until you hit the next limitation, which will come).
P.S.
I seriously doubt the justification for this table structure. Any reason not to save the data as rows instead of columns?
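A minimal sketch of that rows-instead-of-columns layout (all names here are hypothetical):

CREATE TABLE RecordFields (
    record_id  int          NOT NULL,
    field_name varchar(128) NOT NULL,
    field_val  varchar(max) NULL,
    CONSTRAINT PK_RecordFields PRIMARY KEY (record_id, field_name)
);

-- 500 wide columns become up to 500 narrow rows per record,
-- so no single row ever approaches the 8060-byte limit:
INSERT INTO RecordFields (record_id, field_name, field_val)
VALUES (1, 'field_001', 'some long text');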

SQL to delete excess rows outside valid sub-range

I have a table named "A" which have 2 columns, "A1" and "A2".
I want each unique value in column "A1" to have at most 2 rows in the table; if a unique value in column "A1" has 5 rows, 3 rows should be deleted.
Which 3 rows to delete is determined by the lowest values in column "A2".
The table consists of 20+ million rows, 300,000+ unique values in column "A1", and up to 3,000 rows per unique value in column "A1".
I have solved this with the following query:
with excess as
(
    select
        id,
        row_number() over (partition by A1 order by A2 desc) as rownum
    from A
)
delete from excess
where rownum > 2;
I'm satisfied with this query since it took 8 minutes for the initial batch and ~20 seconds in recurring executions.
Is this the most efficient query to achieve the requirements?
Yes, this is the most efficient query without copying the data into another table, because it does the work in a single pass over the table instead of joining it back to itself. I would suggest that you use "delete top(N)" and keep the number under 5,000 if there are any other consumers of the table; this helps prevent lock escalation from escalating to a full table lock. It will also free up the transaction log on the server to be reused in between batches. If you do it all in one go, all of the deleted rows have to be accounted for in the transaction log, and that space can't be reused until the statement is complete. I would also suggest creating a composite index on (A1, A2).
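A minimal sketch of that batching pattern, reusing the question's table and column names (the 4,000-row batch size is an assumed value chosen to stay under the 5,000-lock escalation threshold):

-- Assumed supporting index: CREATE INDEX IX_A_A1_A2 ON A (A1, A2 DESC);
WHILE 1 = 1
BEGIN
    ;WITH excess AS
    (
        SELECT row_number() OVER (PARTITION BY A1 ORDER BY A2 DESC) AS rownum
        FROM A
    )
    DELETE TOP (4000) FROM excess
    WHERE rownum > 2;

    IF @@ROWCOUNT = 0 BREAK; -- stop once no group has more than 2 rows
END;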
If the number of rows that need to be deleted is a significant percentage of the table, it would be faster to copy the rows where rownum <= 2 into a new table, then drop the original table and rename the new table back to the original. If you have other consumers of the table and/or don't want to copy the data, this may not be a valid solution.

SQLite scanning table performance

My table has the following fields:
Date (integer)
State (integer)
ProductId (integer)
ProductName (integer)
Description (text) (maximum text length 3000 characters)
There will be more than 8 million rows. I need to decide whether I should put the product description in another table. My main goal is to have this statement very fast:
SELECT Date, State, ProductId, ProductName FROM tablename ORDER BY Date DESC LIMIT 100
The SQL result will not fetch the Description field value in the above statement. The user will see the description only when the row is selected in the application (new query).
I would really like to have the product Description in the same table, but I'm not sure how SQLite scans the rows. If the Date value doesn't match, I would assume that SQLite can quickly skip to the next row. Or maybe it needs to scan all fields of the row until it gets to the end of the Description value in order to know where the row ends? If it needs to scan all fields to get to the next row, will a value of 3000 characters in the Description field decrease the speed a lot?
EDIT: No indexing should be used since INSERT speed is important.
EDIT: The only reason for trying to have it all in one table is that I want to do INSERTs and UPDATEs in one transaction with hundreds of items. The same item could be inserted and later updated in the same transaction, so I cannot know the last insert id per item.
When you use that query and do not have an index on the Date column, SQLite will read all records from the table, and use a temporary table to sort the result.
When you have an index on the Date column, SQLite will look up the last 100 records in the index, then read all the data of those records from the table.
When you have a covering index, i.e., one index with the four columns Date, State, ProductId, and ProductName, SQLite will just read the last 100 entries from the index.
Whenever SQLite reads from the database file, it does not read values or records, but entire pages (typically, 1 KB or 4 KB).
In case 1, SQLite will read all pages of the table.
In case 2, SQLite will read the last page of the index (because the 100 dates will fit into one page), and 100 pages of the table (one for each record, assuming that no two of these records happen to be in the same page).
In case 3, SQLite will read the last few pages of the index.
Case 2 will be much faster than case 1; case 3 will be faster still, but probably not enough to be noticeable.
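A sketch of the indexes behind cases 2 and 3, using the table name from the question's query:

-- Case 2: plain index on the sort column
CREATE INDEX idx_tablename_date ON tablename (Date);

-- Case 3: covering index; SQLite answers the query from the index alone
CREATE INDEX idx_tablename_covering
ON tablename (Date, State, ProductId, ProductName);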
I would suggest relying on good old database normalization rules, in this case specifically 1NF. If that Description (and the same goes for ProductName) is going to be repeated, you have a database design issue, and whether it is in SQLite or another database has little to do with it. CL is right about the indexes, mind you; proper indexing will still matter.
Review your model: make a table for products and another for inventory.
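A minimal sketch of that split, reusing the question's field names (the table names are assumptions):

CREATE TABLE products (
    ProductId   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    Description TEXT -- the long text lives here exactly once
);

CREATE TABLE inventory (
    Date      INTEGER NOT NULL,
    State     INTEGER NOT NULL,
    ProductId INTEGER NOT NULL REFERENCES products(ProductId)
);

-- The hot query then scans only the narrow inventory table:
SELECT i.Date, i.State, i.ProductId, p.ProductName
FROM inventory AS i
JOIN products AS p ON p.ProductId = i.ProductId
ORDER BY i.Date DESC
LIMIT 100;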