DB2/SQL equivalent of SAS's sum(of ) function - sql

SAS has a sum(of col1 - coln ) function which finds the sum of all the values from col1, col2, col3...coln. (ie, you don't have to list out all the column names, as long as they are numbered consecutively). This is a handy shortcut to find a sum of several (suitably named) variables.
Question - Is there a DB2/SQL equivalent of this? I have 50 columns (they are named col1, col2, col3....col50 and I need to find the sum of them.
ie:
select sum(col1, col2, col3,....,col50) AggregateSum
from foo.table

No, DB2 has no such beast, at least to my knowledge. However, you can dynamically create such a query by first querying the database metadata to extract the columns for a given table.
From memory, DB2 has a sysibm.syscolumns table which basically contains the column information that you could use to construct a query on the fly.
You would first use a query like:
select column for sysibm.syscolumns
where schema = 'foo' and tablename = 'table'
and column like 'col%'
(the column names may not match exactly but, since they're not the same on the differing variants of DB2 (DB2/z, DB2/LUW, iSeries DB2, etc) anyway, that hardly matters).
Then use the results of that query to construct your actual query:
select col1+col2+...+colN AggregateSum from foo.table
where the col1+col2+...+colN bit has been built from the previous query.
If, as you mention in a comment, you only want the eighteen "highest" columns (e.g., if columns 1 thru 100 exist, you only want 83 thru 100), you can modify the first query to do that, with something like:
select column for sysibm.syscolumns
where schema = 'foo' and tablename = 'table'
and column like 'col%'
order by column desc
fetch first 18 rows only
but, in that case, you may want to call the columns col0001, col0145 and so on, or make the sorting able to handle variable width numbers.
Although it may be easier (if you can't change the column names) to get all the columns colNNN, sort them yourself by the numeric (not string) value after the col, and throw away all but the last eighteen when constructing the second query).
Both these options will return only eighteen rows maximum.
But you may also want to think, in that case, about moving the variable data to another table, if that's possible in your situation. If you ever find yourself maintaining an array within a table, it's usually better to separate that out.
So your main table would then be something like:
main_id primary key
other_data
and your auxiliary table would be akin to:
main_id foreign key to main(main_id)
sequence_nm
other_data
primary key (main_id, sequence_num)
That would allow you to have sparse data if needed, and also to add data without having to change the schema of the main table. The query to get the latest eighteen results would be a little more complicated but still a relatively simple join of the two tables.

Related

Overwrite ONE column in select statement with a fixed value without listing all columns

I have a number of tables with a large number of columns (> 100) in a SQL Server database. In some cases when selecting (using views) I need to replace exactly ONE of the columns with a fixed result value instead of the data from the row(s).
Is there a way to use something like
select table.*, 'value' as Column1 from table
if Column1 is a column name within the table?
Of course I can list all the columns which are expected as result in the select Statement, replacing the one with a value.
However, this is very inconvinient and having 3 or 4 those views I have to maintain them all if columns are added or removed from the table.
Nope, you have to specify columns in this case.
And you have much more serious problems if tables are being changed often. This may be a signal of large architectural defects.
Anyway, listing all columns instead of * is a good practice, because if columns number will change, it may cause cascade errors.
As other responses have noted, this can't be done in a single statement. There is a workaround, however, which is not perfect but does circumvent the need to list columns manually: save your initial, unmodified query to a temp table, update the column(s) you need to overwrite, then select the results:
--we're going to use a temp table; make sure it doesn't already exist
if (object_id('tempdb..#tmpTbl') is not null)
drop table #tmpTbl
--initial query to retrieve all the columns
select *
into #tmpTbl
from TblWithManycolumns
--update column(s) from another table or query
update #tmpTbl
set ColToBeReplaced = trv.ColWithReplacementValue
from #tmpTbl t
join TableWithReplacementValue trv
on trv.KeyCol = t.KeyCol
--where trv.FilterCol = #FilterVal -- if needed
--this select contains the final output data
select * from #tmpTbl
drop table #tmpTbl
This has plenty of drawbacks. Complexity, performance, etc. But it is very flexible and solves the major problem of preventing changes to the main table (TblWithManyColumns) from breaking the query or requiring manual changes. This is particularly important if you're trying to generate SQL.

How to do damage with SQL by adding to the end of a statement?

Perhaps I am not creative or knowledgeable enough with SQL... but it looks like there is no way to do a DROP TABLE or DELETE FROM within a SELECT without the ability to start a new statement.
Basically, we have a situation where our codebase has some gigantic, "less-than-robust" SQL generation component that never uses prepared statements and we now have an API that interacts with this legacy component.
Right now we can modify a query by appending to the end of it, but have been unable to insert any semicolons. Thus, we can do something like this:
/query?[...]&location_ids=loc1')%20or%20L1.ID%20in%20('loc2
which will result in this
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('loc1') or L1.ID in ('loc2');...
This is just one example.
Basically we can append pretty much anything to the end of any/most generated SQL queries, less adding a semicolon.
Any ideas on how this could potentially do some damage? Can you add something to the end of a SQL query that deletes from or drops tables? Or create a query so absurd that it takes up all CPU and never completes?
You said that this:
/query?[...]&location_ids=loc1')%20or%20L1.ID%20in%20('loc2
will result in this:
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('loc1') or L1.ID in ('loc2');
so it looks like this:
/query?[...]&location_ids=');DROP%20TABLE users;--
will result in this:
SELECT...WHERE L1.PARENT_ID='1' and L1.ID IN ('');DROP TABLE users;--');
which is a SELECT, a DROP and a comment.
If it’s not possible to inject another statement, you limited to the existing statement and its abilities.
Like in this case, if you are limited to SELECT and you know where the injection happens, have a look at PostgreSQL’s SELECT syntax to see what your options are. Since you’re injecting into the WHERE clause, you can only inject additional conditions or other clauses that are allowed after the WHERE clause.
If the result of the SELECT is returned back to the user, you may want to add your own SELECT with a UNION operation. However, PostgreSQL requires compatible data types for corresponding columns:
The two SELECT statements that represent the direct operands of the UNION must produce the same number of columns, and corresponding columns must be of compatible data types.
So you would need to know the number and data types of the columns of the original SELECT first.
The number of columns can be detected with the ORDER BY clause by specifying the column number like ORDER BY 3, which would order the result by the values of the third column. If the specified column does not exist, the query will fail.
Now after determining the number of columns, you can inject a UNION SELECT with the appropriate number of columns with an null value for each column of your UNION SELECT:
loc1') UNION SELECT null,null,null,null,null --
Now you determine the types of each column by using a different value for each column one by one. If the types of a column are incompatible, you may an error that hints the expected data type like:
ERROR: invalid input syntax for integer
ERROR: UNION types text and integer cannot be matched
After you have determined enough column types (one column may be sufficient when it’s one that is presented the user), you can change your SELECT to select whatever you want.

Select query to find all unique keys from a table where a any word from a list appears in a free text field?

Looking for a little bit of SQL-foo to help find the most efficient way to do this query.
I have a table with two columns, ID and a small character field (<300 chars). The ID field is not unique, and I would like the result to be a distinct list of ID numbers. I also have an input list of words that I want to query on, say 'foo', 'bar' as the base case. For a result to be valid, it also must have at least one matching row for each word that is input.
What is a clean and efficient way to write this as one query? I am also open to multiple queries if there is no single-query way to execute it efficiently.
Please note that in the specific environment I am working with I cannot use more than 10 subqueries, and I may have 10 or more words provided as input (although I may be able to limit the input to 10 as long as the user is aware of this). Also note that I cannot use the 'IN' clause if it is possible that the list of values in it grows to be larger than a few thousand. I am querying a table with potentially millions of ID-text pairs.
Thanks for any and all advice!
Use a UDF that returns a table:
Consider writing a user-defined function (UDF) that takes a string containing all values that you wish to search for, separated by a delimiter. The UDF would split the data in the string and return it as a table. Then, include the table that the UDF returns as a join on the table in question.
Here's an example: http://everysolution.wordpress.com/2011/07/28/udf-to-split-a-delimited-string-and-return-it-as-a-table/
If that small character field is always one word and you're looking for an exact match with a word in your list, I don't see why the below would not work. That is, if you're looking for IDs with 'foo', do you want only IDs that are 'foo', or might there be 'fooish', which should also be a match? In the latter case this won't work, in the former it should.
The query below assumes:
That your 2 column table is called "tbl"
That you can put the list of these 'input' words into a table; in my example below this other table is called "othertbl". It should contain however many words you're searching on, and it can be over 1,000 (the exists subquery doesn't have that limitation)
As stated before, I am assuming you are looking for exact matches on the 2nd column of "tbl", not partial or fuzzy matches
For performance reasons, you'll want to ensure that tbl.wordfield and othertbl.word are indexed (whatever the column names actually are)
-
select distinct id
from tbl
where exists
(select 'x' from othertbl where othertbl.word = tbl.wordfield)

is there a way to search within multiple tables in SQL

Right now i have 100 tables in SQL and i am looking for a specific string value in all tables, and i do not know which column it is in.
select * from table1, table2 where column1 = 'MyLostString' will not work because i do not know which column it has to be in.
Is there a SQL query for that, must i brute force search every table for every column for that 'MyLostString'
If I were to brute-force search across all tables, is there an efficient query for that?
For instance:
select * from table3 where allcolumns = MyLostString
It is the defining feature of a RDBMS (or at least one of them), that the meaning of a value depends on the column it is in. E.g.: The value 17 will have quite different meanings, if it stands in a customer_id column, than in the product_id of a fictional orders table.
This leads to the fact, that RDBMS are not well equipped to search for a value, no matter in which column of which tables it might be used.
My recommendation is to first study the data model to try and find out, which column of which table should be holding the value. If this really fails, you have a problem much worse than a "lost string".
The last ressort is to transform the DB into something better suited for fulltext search ... such as a flat file. You might want to try mydbexportcommand --options | grep -C10 'My lost string' or friends.

SQL: Automated column selection

I am doing an autojoin on a table (lets say with table aliases current, prev, next) but am only interested in the columns of current.
Is there a way (without explicitly enumerating them) to limit the columns of the result set to that of current?
Something less complex (no metatable querying or even something that is entirely DB agnostic) than this: sql select with column name like would be fantastic. As I impose some specific constraints on my requirement (select all colums of the original/current table), my hope is that there is something easy that I am missing. In the end, I'd be happy with a Postgres specific solution as well.
Using Postgres, a statment like
SELECT current FROM ...
indeed nearly provides what I want except that everything now is merged into a single comma separated text column.
Sure this is possible:
select current.*
from some_table current
join other_table prev on prev.fid = current.id
join third_table nxt on nxt.oid = prev.id