Overwrite ONE column in select statement with a fixed value without listing all columns - sql

I have a number of tables with a large number of columns (> 100) in a SQL Server database. In some cases when selecting (using views) I need to replace exactly ONE of the columns with a fixed result value instead of the data from the row(s).
Is there a way to use something like
select table.*, 'value' as Column1 from table
if Column1 is a column name within the table?
Of course I can list all the columns expected in the result in the select statement, replacing the one with a value.
However, this is very inconvenient, and with 3 or 4 such views I would have to maintain them all whenever columns are added to or removed from the table.

Nope, you have to specify the columns in this case.
And you have much more serious problems if your tables change often; that may be a sign of larger architectural defects.
In any case, listing all columns instead of using * is good practice, because a change in the number of columns can otherwise cause cascading errors.
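If the views have to track the table anyway, one way to ease the maintenance is to generate the column list from the catalog instead of typing it. A minimal sketch for SQL Server 2017+ (dbo.MyTable is a hypothetical name standing in for your table, Column1 is the column from the question; older versions would need FOR XML PATH instead of string_agg):
select string_agg(
           case when c.name = 'Column1'
                then '''value'' as Column1'  -- the one overridden column
                else quotename(c.name)
           end, ', ')
       within group (order by c.column_id) as select_list
from sys.columns c
where c.object_id = object_id('dbo.MyTable')
The result is the select list to paste (or script) into the view definition, so regenerating the view after a schema change becomes a copy-and-paste rather than hand-editing.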

As other responses have noted, this can't be done in a single statement. There is a workaround, however, which is not perfect but does circumvent the need to list columns manually: save your initial, unmodified query to a temp table, update the column(s) you need to overwrite, then select the results:
--we're going to use a temp table; make sure it doesn't already exist
if (object_id('tempdb..#tmpTbl') is not null)
drop table #tmpTbl
--initial query to retrieve all the columns
select *
into #tmpTbl
from TblWithManyColumns
--update column(s) from another table or query
update #tmpTbl
set ColToBeReplaced = trv.ColWithReplacementValue
from #tmpTbl t
join TableWithReplacementValue trv
on trv.KeyCol = t.KeyCol
--where trv.FilterCol = @FilterVal -- if needed
--this select contains the final output data
select * from #tmpTbl
drop table #tmpTbl
This has plenty of drawbacks. Complexity, performance, etc. But it is very flexible and solves the major problem of preventing changes to the main table (TblWithManyColumns) from breaking the query or requiring manual changes. This is particularly important if you're trying to generate SQL.

Related

What is the most efficient way to implement a Redshift merge/upsert operation

I am in the process of writing a custom upsert function for a specific use case for a Redshift table. In their docs, AWS suggests two methods which I'm drawing inspiration from. Here is what I want to accomplish:
Insert any new rows to an existing table, but only if they don't already exist.
There is never a need to delete or modify an existing row (for my use case)
I have so far come up with two separate ways to do this, but I'm wondering what the tradeoffs of each could be:
using an EXCEPT query for insertion of only new rows from a temp table:
insert into persisted_table (
select *
from temp_table
except
select *
from persisted_table
);
store the results of a UNION query over the temp table and the persisted table, and use that as the new persisted table:
insert into new_table (
select *
from temp_table
union
select *
from persisted_table
);
alter table persisted_table rename to old_persisted_table_marked_for_deletion;
alter table new_table rename to persisted_table;
I'm aware that UNION (with its implicit de-duplication) is slow and generally not recommended for bulk/large-scale operations. Apart from that, though, are there any arguments that could influence this decision?
The first advice I'd give is to remember that Redshift is a cluster. Whatever process you select, if the data is large you will want the comparison that determines whether a row already exists to stay "on node"; you will want the tables in question to be distributed on the same key.
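For example, a minimal sketch of co-locating the two tables on the same distribution key (all names here are illustrative, not from the question):
create table persisted_table (
    key_col bigint,
    payload varchar(256)
)
diststyle key
distkey (key_col);

create temp table temp_table (like persisted_table);  -- inherits the distkey
With both tables distributed on key_col, the joins or EXCEPTs used to detect existing rows can be resolved on each node without redistributing data.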
Next I would think about what the keys into the data are. The processes you laid out compare all columns. Is this needed? If a subset of columns can serve as the key, this can make things more efficient:
insert into persisted_table (
select a.*
from temp_table a
left join persisted_table b on {keys}
where b.{key} is null );
Hopefully these aspects will help your decision process.

DB2/SQL equivalent of SAS's sum(of ) function

SAS has a sum(of col1 - coln ) function which finds the sum of all the values from col1, col2, col3...coln. (ie, you don't have to list out all the column names, as long as they are numbered consecutively). This is a handy shortcut to find a sum of several (suitably named) variables.
Question - Is there a DB2/SQL equivalent of this? I have 50 columns (they are named col1, col2, col3, ..., col50) and I need to find the sum of them.
ie:
select sum(col1, col2, col3,....,col50) AggregateSum
from foo.table
No, DB2 has no such beast, at least to my knowledge. However, you can dynamically create such a query by first querying the database metadata to extract the columns for a given table.
From memory, DB2 has a sysibm.syscolumns table which basically contains the column information that you could use to construct a query on the fly.
You would first use a query like:
select column from sysibm.syscolumns
where schema = 'foo' and tablename = 'table'
and column like 'col%'
(the column names may not match exactly but, since they're not the same on the differing variants of DB2 (DB2/z, DB2/LUW, iSeries DB2, etc) anyway, that hardly matters).
Then use the results of that query to construct your actual query:
select col1+col2+...+colN AggregateSum from foo.table
where the col1+col2+...+colN bit has been built from the previous query.
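On DB2/LUW specifically, the catalog view is SYSCAT.COLUMNS, and LISTAGG (9.7+) can assemble the expression in one pass; a sketch under those assumptions:
select listagg(colname, ' + ')
           within group (order by colno) as sum_expression
from syscat.columns
where tabschema = 'FOO'
  and tabname = 'TABLE'
  and colname like 'COL%'
The single row returned is the col1+col2+...+colN text to splice into the second query.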
If, as you mention in a comment, you only want the eighteen "highest" columns (e.g., if columns 1 thru 100 exist, you only want 83 thru 100), you can modify the first query to do that, with something like:
select column from sysibm.syscolumns
where schema = 'foo' and tablename = 'table'
and column like 'col%'
order by column desc
fetch first 18 rows only
but, in that case, you may want to name the columns col0001, col0002 and so on, or make the sorting able to handle variable-width numbers.
Although it may be easier (if you can't change the column names) to get all the colNNN columns, sort them yourself by the numeric (not string) value after the col, and throw away all but the last eighteen when constructing the second query.
Both these options will return only eighteen rows maximum.
But you may also want to think, in that case, about moving the variable data to another table, if that's possible in your situation. If you ever find yourself maintaining an array within a table, it's usually better to separate that out.
So your main table would then be something like:
main_id primary key
other_data
and your auxiliary table would be akin to:
main_id foreign key to main(main_id)
sequence_num
other_data
primary key (main_id, sequence_num)
That would allow you to have sparse data if needed, and also to add data without having to change the schema of the main table. The query to get the latest eighteen results would be a little more complicated but still a relatively simple join of the two tables.
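For what it's worth, a sketch of that join, using the illustrative names main and aux for the two tables above and row_number() to pick the latest eighteen entries per main_id:
select m.main_id, sum(a.other_data) as aggregate_sum
from main m
join (
    select main_id, other_data,
           row_number() over (partition by main_id
                              order by sequence_num desc) as rn
    from aux
) a
  on a.main_id = m.main_id
where a.rn <= 18
group by m.main_id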

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g and let's say I have a table with the following columns (more or less)
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 billion rows. I want to update the status column using a specific where clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying to do is use CTAS or INSERT INTO ... SELECT to copy the rows that I want to update into another table, using AS COLUMN_NAME so the values are already updated in the new/temporary table. It looks something like this:
INSERT INTO TABLE1_TEMPORARY (
ID,
STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS)
SELECT
ID,
3 AS STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal update statement. The problem now is that I also need the remaining data from the original table: the rows I do not need to update but which still have to be included in my updated table/list.
What I tried to do at first was use DELETE on the original table with the same where clause, so that in theory everything left in that table would be the data that I do not need to update, leaving me with two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the DELETE statement is itself too slow, or as slow as the original UPDATE statement, so doing without the delete statement brings me to this point:
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use in order to get the data that's the opposite of my WHERE clause? (Note that the WHERE clause in this example has been simplified, so I'm not looking for an answer of NOT EXISTS/NOT IN/NOT EQUALS; those clauses are also slower than positive ones.)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN statement I can use with my TABLE1 and TABLE1_TEMPORARY in order to filter out the data that does not need to be updated?
I would also like to achieve this with as little REDO/UNDO/LOGGING as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table in a way so the rows touched are not totally spread over all partitions but confined to a few partitions.
Ensure your transactions wouldn't use these partitions for now.
For each partition/subpartition you would normally UPDATE, perform a CTAS of all the rows (meaning even the rows which stay the same go to TABLE1_TEMPORARY), then EXCHANGE PARTITION and rebuild the index partitions.
At the end, rebuild the global indexes.
If you don't have Oracle Enterprise Edition, you would need to either CTAS the entire billion rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT, etc.) and a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
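To make the shape of it concrete, a hedged sketch of one per-partition pass (the partition name is hypothetical, somedatehere is the question's placeholder, and the "tons of other columns" must be listed in full and in the same order as TABLE1 for the exchange to succeed):
create table table1_temporary nologging as
select id,
       case when transaction_date = somedatehere
            then 3 else status end as status,
       transaction_date
       -- , tons_of_other_columns in TABLE1's column order
from table1 partition (p_example);

alter table table1
  exchange partition p_example
  with table table1_temporary
  without validation;

-- then rebuild any unusable local index partitions and, finally, the global indexes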
How about keeping the UPDATE on the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
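Since ID here is a varchar(64), numeric BETWEEN ranges may not apply directly; a hedged PL/SQL sketch of the same idea using ORA_HASH to form the chunks (the bucket count of 1000 is arbitrary, and somedatehere is the question's placeholder):
begin
  for i in 0 .. 999 loop
    update table1
       set status = 3
     where transaction_date = somedatehere
       and ora_hash(id, 999) = i;  -- deterministic bucket 0..999 per id
    commit;
  end loop;
end;
/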
This could help if the total workload is potentially manageable, but doing it all in one chunk is the problem. This approach breaks it into modest-sized pieces.
Doing it this way could, for example, enable other apps to keep running and give other workloads a look-in, and would avoid needing a single humongous transaction in the log file.

SQL - renumbering a sequential column to be sequential again after deletion

I've researched and realize I have a unique situation.
First off, I am not allowed to post images to the board yet since I'm a new user.
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table from which we randomly select a percentage of the rows and insert them into a 'Results' table. The "ID" column from 'Results' is then passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out an SQL query that will renumber the column I specify. I would prefer not to drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
  on ItemTicket.item_ticket_id = ItemVoid.item_ticket_id
where ItemTicket.ID in (select ID from results)
Example tables, before and after (images not available):
As you can see, two rows were deleted from both tables based on the ID column. So now I have to figure out how to renumber the item_ticket_id and item_void_id columns so that the highest number decreases to fill the missing value, the next highest decreases likewise, and so on. Problem #2: if an item_ticket_id changes in order to stay sequential in ItemTicket, that change has to be propagated to ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(answering an old question as it's the first search result when I was looking this up)
(MS T-SQL)
Resequencing an ID column (not an Identity one) that has gaps can be done with a simple CTE that uses row_number() to generate the new sequence.
The UPDATE works through the CTE 'virtual table' without any extra trouble, actually updating the underlying original table.
Don't worry about the ID values clashing during the update; if you wonder what happens when IDs are set to values that already exist, it doesn't suffer that problem: the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
SELECT
ID,
ROW_NUMBER() OVER (ORDER BY ID) as ID_New
FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
Since you are looking for advice on this, my advice is that you need to redesign, as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining ones, use a bit flag to mark the records as inactive. Then when you query the records, just include a WHERE clause so that only the active ones are returned:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them then you can perform this task the following way:
create a new table
insert your original data into the new table using the new numbers
drop your old table
rename the new table with the corrected numbers
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET Inactive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
The function that you are looking for is a window function. In standard SQL (SQL Server, MySQL), the function is row_number(). You use it as follows:
select row_number() over (order by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a with statement to recalculate the row numbers, and then assign them using an update. For transactional integrity, you might wrap the delete and update into a single transaction.
Oracle supports similar functionality, but the syntax is a bit different. Oracle calls these functions analytic functions and they support a richer set of operations on them.
I would strongly caution you against using cursors, since they have lousy performance. Of course, this will not work on an identity column, since such a column cannot be modified.
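A hedged T-SQL sketch putting those steps together (table and column names are borrowed from the question; note it sidesteps the foreign-key ripple into ItemVoid, which would need the same renumbering applied via its item_ticket_id):
begin transaction;

delete from ItemTicket where ID in (select ID from results);

with Renumbered as (
    select item_ticket_id,
           row_number() over (order by item_ticket_id) as new_id
    from ItemTicket
)
update Renumbered set item_ticket_id = new_id;

commit;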

Adding new columns to a column database having billions of rows

I want to add new columns to a table which already contains billions of rows. The new columns are derived from existing columns.
For example,
new_col1 = old_col1 + old_col2
new_col2 = old_col1 / old_col2
I am trying to do this in the following way:
Add new columns
ALTER TABLE table_name
ADD ( column_1 column_definition,
column_2 column_definition,
...
column_n column_definition )
Read the rows one by one from the table and fill in the values for the new columns.
There is no primary key in the table, so I cannot refer to an individual row. To read rows one by one, I would have to do a select *, which would give a huge result set (considering billions of records).
Is there any better way to do this?
Different DBMS have different SQL dialects, so it is useful to specify which one you are using in the question.
In SQL Server you could use a Computed Column, but this would calculate the result every time you select the data; you could flag it as persisted, but it may take a while to make the change. And you can't do that if you are going to remove the old columns.
Alternatively create the new column allowing nulls and then update it in batches
UPDATE TOP (1000) table_name SET new_col1 = old_col1 + old_col2 WHERE new_col1 IS NULL
Again the query is for SQL Server, but there will alternatives for your DBMS.
Also read Mr Hooper's comment about adding an index to the new column to make sure that the performance of the UPDATE doesn't get worse as more data is added. The update is a read and write operation; the index will speed up the reads and slightly delay the writes (maintaining the index), but it should be worthwhile.
I think Mr Diver's method would be fine if you also added an index on one of your new columns; otherwise, as the job progresses, it will have to do more and more scanning to find the rows it hasn't already updated. Adding an index means it doesn't have to do that. A possible drawback is that the index's selectivity will be frightful when the column is first created, but I don't think that would be a problem as you only care about NULL or NOT NULL. You could drop the index when the update is complete.
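Putting the two suggestions together, a minimal T-SQL sketch; the filtered index and the divide-by-zero guard are my additions, not part of the answers above:
create index ix_new_col1_null
    on table_name (new_col1)
    where new_col1 is null;  -- filtered index: small, and shrinks as the backfill progresses

while 1 = 1
begin
    update top (1000) table_name
       set new_col1 = old_col1 + old_col2,
           new_col2 = old_col1 / nullif(old_col2, 0)  -- avoid divide-by-zero
     where new_col1 is null;

    if @@rowcount = 0 break;  -- backfill complete
end

drop index ix_new_col1_null on table_name;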
Use stored procedures: do the update in batches of, say, 100 rows, and add the stored procedure as a job to run every 30 seconds or so.