I have a script to extract certain data from a much bigger table, with one field in particular changing regularly, e.g.
SELECT CASE #Flag WHEN 1 THEN t.field1 WHEN 2 THEN t.field2 WHEN 3
THEN t.field3 END as field,
...[A bunch of other fields]
FROM table t
However, the issue is now I want to do other processing on the data. I'm trying to figure out the most effective method. I need to have some way of getting the flag through, so I know I'm talking about data sliced by the right field.
One possible solution I was playing around with a bit (mostly to see what would happen) is to dump the contents of the script into a table function which has the flag passed to it, and then use a SELECT query on the results of the function. I've managed to get it to work, but it's significantly slower than...
The obvious solution, and probably the most efficient use of processor cycles: to create a series of cache tables, one for each of the three flag values. However, the problem then is to find some way of extracting the data from the right cache table to perform the calculation. The obvious, though incorrect, response would be something like
SELECT CASE #Flag WHEN 1 THEN table1.field WHEN 2 THEN table2.field WHEN 3
THEN table3.field END as field,
...[The various calculated fields]
FROM table1, table2, table3
Unfortunately, as is obvious, this creates a massive cross join - which is not my intended result at all.
Does anyone know how to turn that cross join into an "Only look at x table"? (Without use of Dynamic SQL, which makes things hard to deal with?) Or an alternative solution, that's still reasonably speedy?
EDIT: Whether it's a good reason or not, the idea I was trying to implement was to not have three largely identical queries, that differ only by table - which would then have to be edited identically whenever a change is made to the logic. Which is why I've avoided the "Have the flag entirely separate" thing thus far...
I think you need to pull #Flag out of the query altogether, and use it to decide which of three separate SELECT statements to run.
How about a UNION ALL for each value of FLAG.
In the where clause of the first bit include:
AND #flag = 1
Although the comment about running different select statements for different flag values also makes sense to me.
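To make the UNION ALL idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The three cache tables, their columns, and the sample rows are invented for illustration; only the shape of the query (one branch per cache table, each guarded by a test on the flag) comes from the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cache tables, one per flag value.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field TEXT);
    CREATE TABLE table2 (id INTEGER, field TEXT);
    CREATE TABLE table3 (id INTEGER, field TEXT);
    INSERT INTO table1 VALUES (1, 'a1'), (2, 'b1');
    INSERT INTO table2 VALUES (1, 'a2'), (2, 'b2');
    INSERT INTO table3 VALUES (1, 'a3'), (2, 'b3');
""")

# Each branch only returns rows when the flag matches its guard,
# so exactly one cache table contributes to the result.
QUERY = """
    SELECT id, field FROM table1 WHERE :flag = 1
    UNION ALL
    SELECT id, field FROM table2 WHERE :flag = 2
    UNION ALL
    SELECT id, field FROM table3 WHERE :flag = 3
    ORDER BY id
"""

def rows_for_flag(flag):
    """Return the rows of the cache table selected by the flag."""
    return conn.execute(QUERY, {"flag": flag}).fetchall()

print(rows_for_flag(2))
```

The calculation logic then lives in one query instead of three copies. Whether the optimizer skips the branches whose guard is false depends on the DBMS, so it is worth checking the plan on the real system.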
You seem to be focusing your attention on the technology rather than the problem to be solved. Think about one select from the main table for each case - which is how you describe it here, isn't it?
A simpler solution, and one suggested by a workmate:
SELECT CASE #Flag WHEN 1 THEN t.field1 WHEN 2 THEN t.field2 WHEN 3
THEN t.field3 END as field,
[A bunch of other fields],
#Flag as flag
FROM table t
Then base the decision making on the last field. A lot simpler, and probably should have occurred to me in the first place.
Some background: I have a code column that is char(6). In this field, I have the values 0, 00, 000, 0000, 00000, and 000000. It seems illogical, but that's how it is. What I need to do is delete all rows that have these code values. I know how to do it individually, like so:
delete from [dbo.table] where code='0'
delete from [dbo.table] where code='00'
and so on.
How does one do this in one section of code instead of 6?
Try this:
delete from [dbo.table] where code='0'
or code='00'
or code='000'
etc. You get the idea.
There can be more efficient ways when the set of values gets larger, but your 5 or 6 values is still quite a ways from that.
Update:
If your list grows long, or if your table is significantly larger than can reside in cache, you will likely see a significant performance gain by storing your selection values into an indexed temporary table and joining to it.
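A minimal sketch of that approach, using Python's sqlite3 module as a stand-in for the real DBMS. The table name `codes`, the temporary table `codes_to_delete`, and the sample values are all invented for illustration; the primary key on the temporary table plays the role of the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (code TEXT);
    INSERT INTO codes VALUES ('0'), ('00'), ('000000'), ('A10'), ('100'), ('123456');
    -- The PRIMARY KEY gives the temporary table an index on code.
    CREATE TEMP TABLE codes_to_delete (code TEXT PRIMARY KEY);
""")

# Store the selection values: '0', '00', ... up to six zeros.
conn.executemany(
    "INSERT INTO codes_to_delete VALUES (?)",
    [("0" * n,) for n in range(1, 7)],
)

# Delete every row whose code appears in the selection table.
conn.execute("DELETE FROM codes WHERE code IN (SELECT code FROM codes_to_delete)")

remaining = [row[0] for row in conn.execute("SELECT code FROM codes ORDER BY code")]
print(remaining)
```

Note that SQLite does not pad TEXT values the way a char(6) column does in some systems, so on the real table the stored values may need trimming before they compare equal.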
It strongly depends on your DBMS, but I suggest using regular expressions. For example, with MySQL you just need a simple query like this:
delete from dbo.table where code regexp '^0+$'
Most popular DBMSs can do the same, but the syntax varies. Note the ^ and $ anchors: without them the pattern matches any code that merely contains a zero.
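With a pattern-based delete, anchoring decides which rows are hit. The sketch below uses Python's `re` module rather than MySQL itself, and the sample codes are invented; it only illustrates the difference between a pattern that finds a zero somewhere in the value and one that requires the whole value to be zeros.

```python
import re

# Invented sample values for the code column.
codes = ["0", "00", "000000", "A10", "100", "123456"]

# Unanchored: matches any value that contains at least one zero.
unanchored = [c for c in codes if re.search(r"(0+)", c)]

# Anchored: matches only values made up entirely of zeros.
anchored = [c for c in codes if re.search(r"^0+$", c)]

print(unanchored)
print(anchored)
```

Running the same pattern in a SELECT before using it in a DELETE is a cheap way to confirm which rows would be removed.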
I can't test it right now, but the following should work:
DELETE FROM dbo.table WHERE CONVERT(int, code) = 0
edit- Just thought of another way, that should be safer:
DELETE FROM dbo.table WHERE LEN(code) > 0 AND LEFT(RTRIM(code) + '0000000000', 10) = '0000000000'
I have what I think is a strange issue. Normally, I would expect a query to take less time if I add a restriction (so that fewer rows are processed). But for some reason, this is not the case. Maybe I'm doing something wrong, but I don't get an error; the query just seems to run forever.
This is the query
SELECT
A.ENTITYID AS ORG_ID,
A.ID_VALUE AS LEI,
A.MODIFIED_BY,
A.AUDITDATETIME AS LAST_DATE_MOD
FROM (
SELECT
CASE WHEN IFE.NEWVALUE IS NOT NULL
then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE')
ELSE NULL
end as ID_TYPE,
case when IFE.NEWVALUE is not null
then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_VALUE')
ELSE NULL
END AS ID_VALUE,
(select u.username from admin.users u where u.userid = ife.analystuserid) as Modified_by,
ife.*
FROM ife.audittrail ife
WHERE
--IFE.AUDITDATETIME >= '01-JUN-2016' AND
attributeid = 499
AND ROWNUM <= 10000
AND (CASE WHEN IFE.NEWVALUE IS NOT NULL then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE') ELSE NULL end) = '38') A
--WHERE A.AUDITDATETIME >= '01-JUN-2016';
So I tried enabling each of the two commented-out clauses (one at a time, of course).
With either of them the same thing happens: the query runs for so long that I have to abort it.
Do you know why this could be happening? How could I apply the restriction, maybe in a different way?
The values of the field AUDITDATETIME are '06-MAY-2017', for example. In that format.
Thank you very much in advance
I think you may misunderstand how databases work.
Firstly, read up on EXPLAIN - you can find out exactly what is taking time, and why, by learning to read the EXPLAIN statement.
Secondly - the performance characteristics of any given query are determined by a whole range of things, but usually the biggest effort goes not in processing rows, but finding them.
Without an index, the database has to look at every row in the table and compare it to your where clause. It's the equivalent of searching in the phone book for a phone number, rather than a name (the phone book is indexed on "last name").
You can improve this by creating indexes - for instance, on columns "AUDITDATETIME" and "attributeid".
Unlike the phone book, a database server can support multiple indexes - and if those indexes match your where clause, your query will be (much) faster.
Finally, using an XML string extraction for a comparison in the where clause is likely to be extremely slow unless you've got an index on that XML data.
This is the equivalent of searching the phone book and translating the street address from one language to another - not only do you have to inspect every address, you have to execute an expensive translation step for each item.
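To see the effect of an index on the plan, here is a small sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module. It is only a stand-in: Oracle has its own plan tooling with different syntax and output, and the table layout, index name, and data below are invented and merely borrow the column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audittrail (entityid INTEGER, attributeid INTEGER, auditdatetime TEXT)"
)
conn.executemany(
    "INSERT INTO audittrail VALUES (?, ?, ?)",
    [(i, i % 1000, "2017-05-06") for i in range(10000)],
)

def plan(sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT entityid FROM audittrail WHERE attributeid = 499"

plan_before = plan(query)   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_audit_attr ON audittrail (attributeid)")
plan_after = plan(query)    # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the index.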
You probably need index(es)... We can all make guesses on what indexes you already have, and need to add, but most dbms have built in query optimizers.
If you are using MS SQL Server you can execute query with query plan, that will tell you what index you need to add to optimize this particular query. It will even let you copy /paste the command to create it.
This question might appear a bit strange to you, but I'll try to explain it.
In the production department of our company we are tracking machine data. This data is also used for evaluating the quality of the production process.
In the following I will refer to these attributes:
productId
componentOfProduct -> the component which is affected by the error
routeStepOfError
causeOfError
The problem is that the data the machine produces is not in the form management wants for evaluation.
So we have to do data matching. Most of the time it is a simple relationship, e.g. matching several productId numbers to one product name / group.
But in the case of routeStepOfError it's different. In some cases the routeStep that the production lines log can be matched to the routeStep for the management reports as described above for the productIds.
But for some routeSteps a far more complicated matching is done. So far it's implemented in a VBA app that matches the database output and writes the data into a spreadsheet. The matching is done via Select Case statements like this:
Select Case routeStep
    Case EOL
        Select Case productId
            Case 1111, 1112, 1113
                Select Case causeOfError
                    Case A1:
                        Select Case componentOfProduct
                            Case "be1": routeStepReport = "final optical test"
                            Case Else: routeStepReport = "end of line"
                        End Select
                    Case Else: routeStepReport = "end of line"
                End Select
            Case Else: routeStepReport = "end of line"
        End Select
    Case...
End Select
...I know that the syntax might not be correct, but I hope you get what I'm trying to say: sometimes the matching from routeStep to routeStepReport (i.e. the value we need for our management reports) depends on the routeStep, the productId, the componentOfProduct and the causeOfError.
...and these Select Case statements are really long, as there are many products and many routeSteps in our production process. So each time there is a change in the production program / process, it has to be maintained in the VBA code, which is far from perfect, as only one guy in our company really knows where in the code to look for this and how to maintain it.
So I proposed implementing the whole matching in an SQL database and just creating the right relationships between the values from the machines and the values management wants to have. Together with an interface in PHP or whatever, people could do the matching quite easily.
Well, for the simple matchings like productIds to product groups this works quite well, but for the routeSteps described above this might be a problem for me.
I would have created one table with the following attributes:
|-----------------|-----------------|-----------------|-----------------|-----------------|
|routeStepOfError |productId        | componentOfProd | causeOfError    | routeStepReport |
|-----------------|-----------------|-----------------|-----------------|-----------------|
But let's say we have about 20 routeSteps and 50 productIds, each with about 4 components and 10 causes of error: fully populated, that is up to 20 × 50 × 4 × 10 = 40,000 rows, so this table might be endless as well and really hard to maintain.
Maybe I should have mentioned earlier that for the majority of routeStepOfError values there is a simple matching from routeStepOfError to routeStepReport, regardless of productIds, components and causes. But if some matchings depend on all 4 criteria, I have to completely fill the table above, don't I?
Maybe there's an easier solution to achieve this, but I cannot see it yet.
So I would be really grateful for each and every hint you could give me for solving this problem (I cannot change the way of matching itself; they still want to have "their" well-known figures :-) ).
Thanks a lot in advance!
Regards
You might use two tables, tblRouteStepErrorMatch and tblRouteStepErrorException.
tblRouteStepErrorMatch
routeStepofError
routeStepReport
tblRouteStepErrorException
routeStepError
productID
componentOfProd
causeOfError
routeStepReport
Then in your code, check the Exception table first. If there's no match, fall back to the Match table.
ExcRecordset = SELECT * FROM tblRouteStepErrorException WHERE ...
If BOF(ExcRecordset) and EOF(ExcRecordset) Then 'No match in exception table
MatchRecordset = SELECT * FROM tblRouteStepErrorMatch WHERE ... 'go get from match table
get result from MatchRecordset
Else
get result from ExcRecordset
End if
Now your exceptions are a lot easier to maintain because there are far fewer of them and the match table becomes the fallback for when a special case isn't found.
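The lookup order described above can be sketched as follows, using Python's sqlite3 module. The table names follow this answer; the column types, the primary keys, the helper name `route_step_report`, and the sample rows (taken from the EOL example in the question) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Default mapping: one row per routeStep.
    CREATE TABLE tblRouteStepErrorMatch (
        routeStepOfError TEXT PRIMARY KEY,
        routeStepReport  TEXT NOT NULL
    );
    -- Special cases only: rows that deviate from the default.
    CREATE TABLE tblRouteStepErrorException (
        routeStepOfError TEXT,
        productId        INTEGER,
        componentOfProd  TEXT,
        causeOfError     TEXT,
        routeStepReport  TEXT NOT NULL,
        PRIMARY KEY (routeStepOfError, productId, componentOfProd, causeOfError)
    );
    INSERT INTO tblRouteStepErrorMatch VALUES ('EOL', 'end of line');
    INSERT INTO tblRouteStepErrorException VALUES
        ('EOL', 1111, 'be1', 'A1', 'final optical test'),
        ('EOL', 1112, 'be1', 'A1', 'final optical test'),
        ('EOL', 1113, 'be1', 'A1', 'final optical test');
""")

def route_step_report(route_step, product_id, component, cause):
    """Look in the exception table first, then fall back to the match table."""
    row = conn.execute(
        """SELECT routeStepReport FROM tblRouteStepErrorException
           WHERE routeStepOfError = ? AND productId = ?
             AND componentOfProd = ? AND causeOfError = ?""",
        (route_step, product_id, component, cause),
    ).fetchone()
    if row is None:  # no special case found
        row = conn.execute(
            "SELECT routeStepReport FROM tblRouteStepErrorMatch WHERE routeStepOfError = ?",
            (route_step,),
        ).fetchone()
    return row[0] if row else None

print(route_step_report("EOL", 1111, "be1", "A1"))
print(route_step_report("EOL", 1111, "xy9", "A1"))
```

With this layout the three-level nested Select Case from the question becomes three exception rows plus one default row.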
Apologies for the somewhat confusing Title, I've been struggling to find an answer to my question, partly because it's hard to concisely describe it in the title line or come up with a good search string for it. Anyhoooo, here's the problem I'm facing:
Short version of the question is:
How can I write the following (invalid but understandable SQL) in valid SQL understood by Oracle:
select B.REPLACER as COL, A.* except A.COL from A join B on a.COL = B.COL;
Here's the long version (if you already know what I want from reading the short version, you don't need to read this :P ):
My (simplified) task is to come up with a service that massages a table's data and provides it as a sub-query. The table has a lot of columns (a few dozen or more), and I am stuck with using "select *" rather than explicitly listing all columns one by one, because columns may be added to or removed from the table without me knowing, although my downstream systems will know and adjust accordingly.
Say this table (let's call it Table A from now on) has a column called "COL", and we need to replace the values in that COL with the value in the REPLACER column of table B where the two COL values match.
How do I do this? I cannot rename the column because the downstream systems expect "COL"; I cannot do without the "except A.COL" part because that would make the SQL ambiguous.
Appreciate your help, almighty StackOverflow
Ray
You can either use table.* or table.fieldName.
There is no syntax available for table.* (except field X).
This means that you can only get what you want by explicitly listing all of the fields...
select
A.field1,
A.field2,
B.field3,
A.field4,
etc
from
A join B on a.COL = B.COL;
This means that you may need to re-model your data so as to ensure you don't keep getting new fields. OR write dynamic sql. Interrogate the database to find out the column names, use code to write a query as above, and then run that dynamically generated query.
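A minimal sketch of the dynamic route, using Python's sqlite3 module as a stand-in. SQLite exposes column names through `PRAGMA table_info`; in Oracle the same information would come from a data dictionary view instead. The tables, columns, sample data, and the helper name `select_with_replacement` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COL TEXT, EXTRA TEXT);
    CREATE TABLE B (COL TEXT, REPLACER TEXT);
    INSERT INTO A VALUES (1, 'x', 'keep'), (2, 'y', 'keep too');
    INSERT INTO B VALUES ('x', 'X-replaced'), ('y', 'Y-replaced');
""")

def select_with_replacement(conn):
    """Build a SELECT that lists A's columns, substituting B.REPLACER for COL."""
    # Row layout of table_info: (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(A)")]
    select_list = ", ".join(
        'B.REPLACER AS "COL"' if name == "COL" else f'A."{name}" AS "{name}"'
        for name in columns
    )
    return f"SELECT {select_list} FROM A JOIN B ON A.COL = B.COL ORDER BY A.ID"

sql = select_with_replacement(conn)
cursor = conn.execute(sql)
names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(sql)
print(names)
print(rows)
```

Because the column list is read at run time, a column added to A shows up in the output without editing the query, and COL keeps its position and name.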
Try this: (not tested)
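select case
when B.COL is null then A.COL
else B.REPLACER
end as COLAB, A.*
from A left join B on A.COL = B.COL;
This returns B.REPLACER when a row with B.COL = A.COL exists and falls back to A.COL otherwise. You can add more columns to the select list (col1, col2, for example) or use A.*; the replaced column is named COLAB to distinguish it from the A.COL that A.* already returns. Note that the test has to be written as "when B.COL is null": a "case B.COL when null" compares with = and therefore never matches.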
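One detail worth checking in a query like this is how the NULL test is written. A simple CASE (`CASE expr WHEN NULL ...`) compares with `=`, and a comparison with NULL is never true, so that branch cannot fire; a searched CASE with `IS NULL` can. The sketch below shows the difference in SQLite through Python's sqlite3 module, with invented sample data; the same three-valued logic applies in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (COL TEXT);
    CREATE TABLE B (COL TEXT, REPLACER TEXT);
    INSERT INTO A VALUES ('x'), ('y');          -- 'y' has no partner in B
    INSERT INTO B VALUES ('x', 'X-replaced');
""")

# Simple CASE: WHEN NULL is evaluated as B.COL = NULL, which is never true.
simple_case = conn.execute("""
    SELECT CASE B.COL WHEN NULL THEN A.COL ELSE B.REPLACER END
    FROM A LEFT JOIN B ON A.COL = B.COL
    ORDER BY A.COL
""").fetchall()

# Searched CASE: IS NULL detects the unmatched row and falls back to A.COL.
searched_case = conn.execute("""
    SELECT CASE WHEN B.COL IS NULL THEN A.COL ELSE B.REPLACER END
    FROM A LEFT JOIN B ON A.COL = B.COL
    ORDER BY A.COL
""").fetchall()

print(simple_case)
print(searched_case)
```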
As said before, you cannot specify in regular SQL which column not to select. You could write a procedure for that, but it would be quite complex, because you would need to return a variable table type, probably with some ref cursor magic.
The closest I could come up with is joining with using. This will give you the column col in the first field once and for the rest all columns in a and b. So not what you want basically. :)
select *
from a
join b using (col)
Let's start from first principles. select * from .... is a bug waiting to happen and has no place in production code. Of course everybody uses it because it entails less typing but that doesn't make it a good practice.
Beyond that, the ANSI SQL standard doesn't support select * except col1 from .... syntax. I know a lot of people wish it would but it doesn't.
There are a couple of ways to avoid excessive typing. One is to generate the query by selecting from the data dictionary, using one of the views like USER_TAB_COLUMNS. It is worth writing the PL/SQL block to do this if you need lots of queries like this.
A simpler hack is to use the SQL*Plus describe to list out the structure of table A; cut'n'paste it into an IDE which supports regular expressions and edit the columns to give you the query's projection.
Both these options might strike you as laborious but frankly either workaround (and especially the second) would have taken less effort than asking StackOverflow. You'll know better next time.
In the result of SELECT * from myTable WHERE some-condition;
I'm interested in 9 of the 10 columns that exist. Is the only way to specify the 9 columns explicitly?
Can't I somehow specify just the column I don't want to see?
The only way is to list all 9 columns.
Such as:
SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9 FROM myTable
No, you can not. An example definition of select list for Sybase can be found here, you can easily find others for other DBs
The reason for that is that the standard methods of selection - "*" (aka all columns) and a list of columns - are defined operations in relational Algebra whereas the exclusion of columns is not
Also, as mentioned in Joe's comment, it is usually considered good practice to explicitly specify column list as opposed to "*" even when selecting all columns.
The reason for that is that having * in a joined query may cause the query to break if a table schema change introduces identically-named fields in both of the joined tables.
However, when selecting without a join from a very wide and often-changing table, the above rule may not apply, as having "*" makes for good change management (your query is one less place to fix and release when adding new columns), especially if you have flexible DB retrieval code that can dynamically deal with a column set from the table definition instead of something specified in the code (e.g., 100% of our extractors and loaders keep working whenever a new column is added to the DB).
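The join problem can be reproduced in a few lines. The sketch below uses Python's sqlite3 module with two invented tables: after a schema change adds a `status` column to the second table, `SELECT *` over the join returns two columns with the same name, and code that addresses columns by name silently picks one of them. How a given DBMS or driver reacts (an error, a renamed column, or a silent overwrite as here) varies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_ref INTEGER, status TEXT);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 7, 'shipped');
    INSERT INTO customers VALUES (7, 'Ann');
""")

JOIN_SQL = """
    SELECT * FROM orders
    JOIN customers ON orders.customer_ref = customers.customer_id
"""

names_before = [d[0] for d in conn.execute(JOIN_SQL).description]

# Schema change: customers gains a column whose name already exists in orders.
conn.execute("ALTER TABLE customers ADD COLUMN status TEXT")

cursor = conn.execute(JOIN_SQL)
names_after = [d[0] for d in cursor.description]
row = cursor.fetchone()

# Name-based access now collapses the two status columns into one entry;
# the later column (customers.status, NULL here) overwrites the order status.
row_as_dict = dict(zip(names_after, row))

print(names_before)
print(names_after)
print(row_as_dict)
```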
If you had to (I can't think of why), you could dynamically create this select statement by querying the columns in this table and excluding the one column name in the where clause.
Not worth the performance hit, confusion, and maintenance issues that will come up.
You actually need to specify the columns explicitly (as said by Luke it is good practice), and here is the reason:
Let's say that you write some code / scripts around your SQL queries. You now have a whopping 50 different selects in various places of your code.
Suddenly you realize that for this new feature you are working on, you need another column (or, symmetrically, you are doing cleanup and realize a column is useless and wasting space, though that case is harder).
Now you are in one of these 2 situations:
You explicitly stated the columns in each and every query: Adding a column is a backward compatible change, just code your new feature and be done with it.
You used the '*' operator for a few queries: you have to track them down and modify them all. Forget a single one and it will be your grave.
Oh, and did I mention that a query with a '*' selector takes more time to execute, since the DB actually has to query the model and expand the '*' selector?
Moral: only use the '*' selector when you are checking manually that your columns are fine (at which point you actually need to check everything); in code, just ban it or it will be your doom.
No, you can't (at least not in any SQL dialect that I'm aware of).
It's good practice to explicitly specify your column names anyway, rather than using SELECT *.
In the end, you need to specify all 9 out of 10 columns separately - but there's tooling out there which makes this easier!
Check out Red-Gate's SQL Prompt which is an intellisense-add-on for SQL Server Management Studio and Visual Studio.
Amongst a lot of other things, it allows you to type
SELECT * FROM MyTable
and then go back, put the cursor after the " * ", and press TAB - it will then list out all the columns in that table and you can tweak that list (e.g. remove a few you don't need).
Absolutely invaluable - saves hours and hours of mindless typing! Well worth the price of a license, I'd say.
Highly recommended!
Marc