SQL-ish : how to change enormous code into an elegant one? - sql

I have just one abhorrend table, no index, no keys, no IDs, no order, 25 columns, 19 million rows.
I am using the SQL-ish language named TaQL ("Table Query Language").
I need to select-from-where ... It sounds no problem!
However the WHERE conditions are 1683 sets of simple conditions:
set#1: columnA>num1 and columnB>num2 and columnC<num3 and columnD>=num4 ...
or
set#2: columnA>num189 and columnB>num274 and columnC<num321 and columnD>=num457 ...
or
set#n: ...
or
set#1683: ....
My current code is working fine, but it has 1683 lines in the WHERE statement. I created it by awk and regular expressions.
Is there an elegant way to reduce such enormous code?

Have you ever tried the approach of creating a new data format the DID have some normalization to it so you could import into your own database, then you can add indexes, clean criteria keep, etc. Maybe even adding a new column to the end of your table (or columns) as "KeepThis".
Then, apply an update YourTable set KeepThis = 1 where your criteria.. maybe even set the value equal to the condition it qualified with. Then you could query based on those values, or even all those NOT assigned to a value and see if any merit with those records not previously realized.
Sounds like a task no matter what, but might be a nice approach to have things pre-stamped and in a database you can manage vs other source.

Related

SQL DB2 - How to SELECT or compare columns based on their name?

Thank you for checking my question out!
I'm trying to write a query for a very specific problem we're having at my workplace and I can't seem to get my head around it.
Short version: I need to be able to target columns by their name, and more specifically by a part of their name that will be consistent throughout all the columns I need to combine or compare.
More details:
We have (for example), 5 different surveys. They have many questions each, but SOME of the questions are part of the same metric, and we need to create a generic field that keeps it. There's more background to the "why" of that, but it's pretty important for us at this point.
We were able to kind of solve this with either COALESCE() or CASE statements but the challenge is that, as more surveys/survey versions continue to grow, our vendor inevitably generates new columns for each survey and its questions.
Take this example, which is what we do currently and works well enough:
CASE
WHEN SURVEY_NAME = 'Service1' THEN SERV1_REC
WHEN SURVEY_NAME = 'Notice1' THEN FNOL1_REC
WHEN SURVEY_NAME = 'Status1' THEN STAT1_REC
WHEN SURVEY_NAME = 'Sales1' THEN SALE1_REC
WHEN SURVEY_NAME = 'Transfer1' THEN Null
ELSE Null
END REC
And also this alternative which works well:
COALESCE(SERV1_REC, FNOL1_REC, STAT1_REC, SALE1_REC) as REC
But as I mentioned, eventually we will have a "SALE2_REC" for example, and we'll need them BOTH on this same statement. I want to create something where having to come into the SQL and make changes isn't needed. Given that the columns will ALWAYS be named "something#_REC" for this specific metric, is there any way to achieve something like:
COALESCE(all columns named LIKE '%_REC') as REC
Bonus! Related, might be another way around this same problem:
Would there also be a way to achieve this?
SELECT (columns named LIKE '%_REC') FROM ...
Thank you very much in advance for all your time and attention.
-Kendall
Table and column information in Db2 are managed in the system catalog. The relevant views are SYSCAT.TABLES and SYSCAT.COLUMNS. You could write:
select colname, tabname from syscat.tables
where colname like some_expression
and syscat.tabname='MYTABLE
Note that the LIKE predicate supports expressions based on a variable or the result of a scalar function. So you could match it against some dynamic input.
Have you considered storing the more complicated properties in JSON or XML values? Db2 supports both and you can query those values with regular SQL statements.

Deleting in SQL using multiple conditions

Some background, I have a code column that is char(6). In this field, I have the values of 0,00,000,0000,000000,000000. It seems illogical but that's how it is. What i need to do is delete all rows that possess these code values. I know how to do it individually as such
delete from [dbo.table] where code='0'
delete from [dbo.table] where code='00'
and so on.
How does one do this one section of code instead of 6
Try this:
delete from [dbo.table] where code='0'
or code='00'
or code='000'
etc. You get the idea.
There can be more efficient ways when the set of vales gets larger, but your 5 or 6 values is still quite a ways from that.
Update:
If your list grows long, or if your table is significantly larger than can reside in cache, you will likely see a significant performance gain by storing your selection values into an indexed temporary table and joining to it.
It strongly depends on your DBMS, but I suggest to use regular expressions. For example, with MySQL you just need simple query like this:
delete from dbo.table where code regexp '(0+)'
For most of popular DBMS you can do the same, but syntax may be various
I can't test it right now, but the following should work:
DELETE FROM dbo.table WHERE CONVERT(int, code) = 0
edit- Just thought of another way, that should be safer:
DELETE FROM dbo.table WHERE LEN(code) > 0 AND LEFT(code + '0000000000', 10) = '0000000000'

Search and replace part of string in database - what are the pitfalls?

I need to replace all occurrences "google.com" that are met in the SQL db table Column1 with "newurl". It can be a full cell value, a part of it (substring of varchar()), can be met even several times in a cell.
Based on SO answer search-and-replace-part-of-string-in-database
this is what I need:
UPDATE
MyTable
SET
Column1 = Replace(Column, 'google.com', 'newurl')
WHERE
xxx
However, in that answer it is mentioned that
You will want to be extremely careful when doing this! I highly recommend doing a backup first.
What are the pitfalls of doing this query? Looks like it does the same what any texteditor would do by clicking on Replace All button. I don't think it is possible in my case to check the errors even with reserve copy as I would like to know possible errors in advance.
Any reasons to be careful with this query?
Again, I expect it replaces all occurences of google.com with 'newurl' in the Column1 of MyTable table in the SQL db.
Thank you.
Just create a test table, as a replica of your original source table, complete the update on there and check results.
You would want to do this as good SQL programming practice to ensure you don't mess up columns that should not be updated.
Another thing you can do is get a count of the records before hand that fit the criteria using a SELECT statement.
Run your update statement and if it's a 1-1 match on count, you should be good to go.
The only thing i can think of that would happen negatively in this respect is that additional columns get updated. Your WHERE clause is not specific for us to see, so there's no way to validate that what you're doing will do what you expect it to.
I think the person posting the answer is just being cautious - This will modify the value in Column1 for every row in MyTable, so make sure you mean it when you execute. Another way to be cautious would be to wrap it in a transaction so you could roll it back if you don't like the results.

Excluding some columns in a SELECT statement

In the result for SELECT * from myTable WHERE some-condition;
I'm interested in 9 of all the 10 columns that exist. The only way out is to specify the 9 columns explicitly ?
I cannot somehow specify just the column I don't want to see?
The only way is to list all 9 columns.
Such as:
SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9 FROM myTable
No, you can not. An example definition of select list for Sybase can be found here, you can easily find others for other DBs
The reason for that is that the standard methods of selection - "*" (aka all columns) and a list of columns - are defined operations in relational Algebra whereas the exclusion of columns is not
Also, as mentioned in Joe's comment, it is usually considered good practice to explicitly specify column list as opposed to "*" even when selecting all columns.
The reason for that is that having * in a joined query may cause the query to break if a table schema change introduces identically-named fields in both of the joined tables.
However, when selecting without a join from a very wide and often-mutating table, the above rule may not apply, as having "*" makes for a good change management (your query is one less place to fix and release when adding new columns), especially if you have flexible DB retrieval code that can dynamically deal with a column set from table definition instead of something specified in the code. (e.g., 100% of our extractors and loaders are fully working whenever a new column is added to the DB).
If you had to (can't think of why), but you could dynamically create this select statement by querying the columns in this table and exclude the one column name in the where clause.
Not worth the performance hit, confusion, and maintenance issues that will come up.
You actually need to specify the columns explicitly (as said by Luke it is good practice), and here is the reason:
Let's say that you write some code / scripts around you sql queries. You now have a whooping 50 different selects in various places of your code.
Suddenly you realize that for this new feature you are working on, you need another column (symmetry, you are doing cleanup and realize a column is useless and wasting space, though it is harder).
Now you are in either of this 2 situations:
You explicitly stated the columns in each and every query: Adding a column is a backward compatible change, just code your new feature and be done with it.
You used the '*' operator for a few queries: you have to track them down and modify them all. Forget a single one and it will be your grave.
Oh, and did I specify that a query with a '' selector takes more time to be executed since the DB actually has to query the model and develop the '' selector ?
Moral: only use the '*' selector when you are checking manually that your columns are fine (at which point you actually need to check everything), in code, just bane them or they'll be your doom.
No, you can't (at least not in any SQL dialect that I'm aware of).
It's good practice to explicitly specify your column names anyway, rather than using SELECT *.
In the end, you need to specify all 9 out of 10 columns separately - but there's tooling help out there which helps you make this easier!
Check out Red-Gate's SQL Prompt which is an intellisense-add-on for SQL Server Management Studio and Visual Studio.
Amongst a lot of other things, it allows you to type
SELECT * FROM MyTable
and then go back, put the cursor after the " * ", and press TAB - it will then list out all the columns in that table and you can tweak that list (e.g. remove a few you don't need).
Absolutely invaluable - saves hours and hours of mindless typing! Well worth the price of a license, I'd say.
Highly recommended!
Marc

How best to sum multiple boolean values via SQL?

I have a table that contains, among other things, about 30 columns of boolean flags that denote particular attributes. I'd like to return them, sorted by frequency, as a recordset along with their column names, like so:
Attribute Count
attrib9 43
attrib13 27
attrib19 21
etc.
My efforts thus far can achieve something similar, but I can only get the attributes in columns using conditional SUMs, like this:
SELECT SUM(IIF(a.attribIndex=-1,1,0)), SUM(IIF(a.attribWorkflow =-1,1,0))...
Plus, the query is already getting a bit unwieldy with all 30 SUM/IIFs and won't handle any changes in the number of attributes without manual intervention.
The first six characters of the attribute columns are the same (attrib) and unique in the table, is it possible to use wildcards in column names to pick up all the applicable columns?
Also, can I pivot the results to give me a sorted two-column recordset?
I'm using Access 2003 and the query will eventually be via ADODB from Excel.
This depends on whether or not you have the attribute names anywhere in data. If you do, then birdlips' answer will do the trick. However, if the names are only column names, you've got a bit more work to do--and I'm afriad you can't do it with simple SQL.
No, you can't use wildcards to column names in SQL. You'll need procedural code to do this (i.e., a VB Module in Access--you could do it within a Stored Procedure if you were on SQL Server). Use this code build the SQL code.
It won't be pretty. I think you'll need to do it one attribute at a time: select a string whose value is that attribute name and the count-where-True, then either A) run that and store the result in a new row in a scratch table, or B) append all those selects together with "Union" between them before running the batch.
My Access VB is more than a bit rusty, so I don't trust myself to give you anything like executable code....
Just a simple count and group by should do it
Select attribute_name
, count(*)
from attribute_table
group by attribute_name
To answer your comment use Analytic Functions for that:
Select attribute_table.*
, count(*) over(partition by attribute_name) cnt
from attribute_table
In Access, Cross Tab queries (the traditional tool for transposing datasets) need at least 3 numeric/date fields to work. However since the output is to Excel, have you considered just outputting the data to a hidden sheet then using a pivot table?