Emulating TO_SECONDS() in older versions of MySQL (<5.5.0)

For performance and simplicity reasons I would like to get the contents of a DATETIME column in my MySQL 3.x server as seconds (or any numeric type, really). I just want to avoid all the apparent nastiness of timezones when using UNIX_TIMESTAMP(); the dates in my table are indeed from different locales, so I'd rather not have any doubt as to whether some timezone-compensation weirdness is going on behind my back.
The TO_SECONDS() function seemed ideal, until I found out that it only works on newer MySQL installations (upgrading is not an option)...
I thought about doing something like this:
SELECT (TO_DAYS(Timestamp)-730486)*86400+TIME_TO_SEC(Timestamp)
to manually calculate the number of seconds elapsed since 2000-01-01. But it seems like it might put unnecessary load on the server by forcing it to make a temporary table or something?
I could also just do:
SELECT TO_DAYS(Timestamp), TIME_TO_SEC(Timestamp)
and combine the results myself, but that makes it less simple on the code-side.
What's the best compromise? I'll be fetching a large number of rows (on the order of 10^6) per query, so neither client- nor server-side performance is entirely unimportant.
Thanks.
(post edited to reduce confusion)

Firstly, just to make sure, the new field will be a BIGINT... correct?
Can you use explicit casting to prevent the overflow?
SELECT CAST(TO_DAYS(Timestamp)*86400 + TIME_TO_SEC(Timestamp) AS UNSIGNED INTEGER)
Or perhaps use an intermediate string before populating the new BIGINT field?
SELECT CAST(TO_DAYS(Timestamp)*86400 + TIME_TO_SEC(Timestamp) AS CHAR(11))
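Putting the two pieces together, a minimal sketch of the emulation (assuming CAST is available on your server version; the table name mytable is made up):
-- seconds elapsed since 2000-01-01 00:00:00, letting the server supply
-- the day offset instead of a hard-coded constant
SELECT CAST(
         (TO_DAYS(Timestamp) - TO_DAYS('2000-01-01')) * 86400
         + TIME_TO_SEC(Timestamp)
       AS UNSIGNED)
FROM mytable;
Using TO_DAYS('2000-01-01') instead of the literal 730486 also removes any off-by-one doubt about the epoch day number.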


IGNORE CASE query problems saving to a table and using Allow large results

I need case insensitivity in my queries so I found IGNORE CASE which works superbly when used in queries that target the browser (I am talking about BQ web UI). If I choose a destination table (an absolute must for me) and select Allow Large Results (with unchecked Flatten Results) then I get a cryptic error like this:
Error: unexpected LIMIT clause at: 2.200 - 2.206
Even though this Official Google BigQuery issue and feature request tracker post seems to speak of the same issue, and even though the problem seems to have been acknowledged back in Jan 2015, the solution isn't apparent.
I could potentially use a bunch of temp tables with lowercased search columns as a workaround but that sounds awfully difficult with the number of tables and columns that I have and the complex queries that I intend to run.
Any other possible workarounds? Why isn't this working yet on BQ?
Yes, it is a known problem, and it has not been neglected. The code changes to fix it are (surprisingly) not trivial, but they are mostly done. The team is now carefully looking at how to enable and deploy them. I cannot give you a timeline, but the fix to this problem is coming.
The only workarounds in the meantime are to wrap all the string comparisons, string GROUP BYs and string ORDER BYs with a conversion of the operands to LOWER() (or UPPER()).
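For example, a sketch against a hypothetical legacy-SQL table (all names invented):
-- case-insensitive filtering, grouping and ordering without IGNORE CASE
SELECT LOWER(name) AS name_lower, COUNT(*) AS cnt
FROM [mydataset.mytable]
WHERE LOWER(name) = 'acme corp'
GROUP BY name_lower
ORDER BY name_lower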

cursor_sharing parameter in Oracle

I would like to know the tradeoff in setting the cursor_sharing parameter in Oracle to "FORCE".
Since this would make Oracle try to soft-parse any SQL statement, surely performance must improve.
But the default value is "EXACT", so I would like to know if there is any danger in setting it to FORCE or SIMILAR.
Unless you really know what you're doing, I'd recommend not changing this setting.
Usually, if you've got a high number of hard parses, it's an indication of bad application design.
A typical example for selecting all products for a given category (pseudocode):
stmt = 'select * from products where category = ' || my_category
results = stmt.execute
This is flawed for a number of reasons:
it creates a different SQL statement for each category, therefore increasing the number of hard parses dramatically
it is vulnerable to SQL injection attacks
A good application runs perfectly OK with cursor_sharing = exact. A good application can use literals for specific reasons, for example selecting orders with state = new. That use of literals is OK. If the application uses literals to identify an order by ID it would be different, since there will be many different order IDs.
Best is to clean up the app to use literals only in the correct way, or to start using prepared statements (bind variables) for best performance.
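As a sketch, the products query from the pseudocode above rewritten with a bind variable (the placeholder name :category is arbitrary; the client supplies its value at execute time):
-- one shared statement: hard-parsed once, soft-parsed on later executions
SELECT * FROM products WHERE category = :category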
If you happen to have an application that only uses literals, set cursor_sharing to FORCE. In 11g there are mechanisms, like cardinality feedback, that can adjust an execution plan based on unexpected row counts coming out of a query, so that the plan originally chosen for the query is corrected, based on the actual input and output, for the next time it is used.

Is it a bad idea to use SQLExecDirect with preformatted query string instead of SQLPrepare+SQLBindParameter+SQLExecute?

OK, at first glance it seems that it must be more efficient to use SQLPrepare+SQLBindParameter+SQLExecute than to format a string (e.g. with CString::Format) and pass the whole complete query string to SQLExecDirect. If not, why would the second method (SQLPrepare+SQLBindParameter+SQLExecute) exist at all?
BUT... here is what I think: the driver has to, sooner or later (I suspect later, but anyway...), convert the parameters (that I feed it with SQLBindParameter) into a string representation, right? (Or maybe not?) So if I do this formatting in my application (printf-like formatting), will I have any loss in performance?
One thing I suspect is that when the connection is over the network, passing parameters as raw data and then formatting them at the server end might decrease the network traffic compared to passing preformatted query strings, but let's ignore the network traffic for a moment. If not that, is there any performance gain in using SQLPrepare+SQLBindParameter+SQLExecute instead of formatting the full query string in the application and then using SQLExecDirect?
For me, using SQLExecDirect is simpler and more convenient, so I need a good answer on whether (and when) I should opt for the other approach.
Important: if you say that the SQLPrepare+SQLBindParameter+SQLExecute approach gives better performance, I'd like to know how much! I don't mind theoretical assumptions, but I'd like to know when it is worth it in practice. My current use-case is not very db-intensive; I won't have more than 100 inserts/updates per second. Is it OK to use SQLExecDirect? In what scenarios - if ever - do I have to use SQLPrepare+SQLBindParameter+SQLExecute?
If you are inserting or updating with the same SQL (excluding parameters) repeatedly then SQLPrepare, SQLBindParameter and SQLExecute will be faster than SQLExecDirect every time. Consider:
SQLPrepare("insert into mytable (cola, colb) values(?,?);");
for (n = 0; n < 10000; n++) {
SQLBindParameter(1, n);
SQLBindParameter(2, n);
SQLExecute;
}
and
for (n = 0; n < 10000; n++) {
    char sql[1000];
    /* build a different SQL string on every iteration */
    sprintf(sql, "insert into mytable (cola, colb) values(%d,%d)", n, n);
    SQLExecDirect(sql);
}
In the first example, the statement is prepared once, and hence the db engine only has to parse it once and work out an execution plan once. In the second example the SQL and parameters are passed every time, and the SQL looks different every time, so it is parsed each time.
In addition, in the first example you can use arrays of parameters to pass multiple rows of parameters in one go - see SQL_PARAMSET_SIZE.
See 3.1.2 Inserting data for a worked example and an indication of how much time you can save.
Ignore network traffic, you'll just be second guessing what happens under the hood in the driver.
ADDITION:
Regarding your description of what happens with parameters, where you seem to think the driver will convert them to strings: the other advantage of binding parameters is that you can provide them in one type and ask the driver to use them as another type. You may come across a parameter type which cannot easily be represented as a string without adding some sort of conversion function, which could be avoided with a bound parameter.
Yes, it's a bad idea, and for two reasons:
Performance
SQLPrepare causes the SQL statement to be parsed, and depending on the SQL statement this can be time consuming. If you're using a DB on another server, the statement might get sent to it for parsing. Even if the parsing takes only, say, 10% of the time of executing your whole query, you save that time whenever the statement is executed again. That may be the case when you're inserting multiple rows, or calling a "select" another time.
Of course, the SQL statement passed must always be a static string. Some SQL frameworks may even do prepared statement caching for you; I don't know if ODBC does this. If you want real performance numbers, you have to measure for yourself - every query is different (and might even depend on the table contents, too).
SQL Injection
No matter what you say about where the data you're formatting with CString::Format (or any other method) comes from, you might be at risk of SQL injection. Even if you're using strings from your own sources, sometime later you or someone else may change the code to accept data from outside, and then you're vulnerable to SQL injection. If you need more info about SQL injection, just search StackOverflow; I'm sure there are some good questions about it.

Strange SQL Server type conversion issue

I experienced a strange issue today. One of my projects is running .NET + SQL Server 2005 Express.
There is one query I use for some filtering.
SELECT *
FROM [myTable]
where UI = 2011040773395012950010370
GO
SELECT *
FROM [myTable]
where UI = '2011040773395012950010370'
GO
The UI column is nvarchar(256) and the UI value passed to the filter is always 25 digits.
In my DEV environment both queries return the same row and no errors. However, at my customer's site, after a few months of running fine, the first version started to return a type conversion error.
Any idea why?
I'm not looking for a solution - I'm looking for an explanation of why it works in one environment and not the other, and why all of a sudden it started to return errors instead of results. I'm using the same tools on both (SQL Server Management Studio Express and two different .NET clients).
The environments are more or less the same (W2k3 + SQL Server 2005 Express).
This is completely predictable and expected because of Datatype precedence
For this, the UI column will be changed to decimal(25,0)
where UI = 2011040773395012950010370
This one is almost correct. The right-hand side is varchar and is changed to nvarchar
where UI = '2011040773395012950010370'
This is the really correct version where both types are the same
where UI = N'2011040773395012950010370'
Errors will have started because the UI column now contains a value that won't CAST to decimal(25,0).
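To make that concrete, an illustrative repro (the 'bicycle' value echoes the chat further down; none of these names or values are from the real system):
CREATE TABLE myTable (UI nvarchar(256));
INSERT INTO myTable VALUES (N'2011040773395012950010370'); -- fine either way
INSERT INTO myTable VALUES (N'bicycle');                   -- added months later

-- still works: both sides are (n)varchar
SELECT * FROM myTable WHERE UI = N'2011040773395012950010370';

-- now fails: datatype precedence casts every UI value read to decimal(25,0),
-- and 'bicycle' won't convert
SELECT * FROM myTable WHERE UI = 2011040773395012950010370;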
Some unrelated notes:
if you have an index on the UI column it would be ignored in the first version because of the implicit CAST required
do you need unicode to store numeric digits? There is a serious overhead with unicode data types in storage and performance
why not use char(25) or nchar(25) if the values are always fixed length? Your queries use too much memory because the optimiser assumes an average length of 128 characters based on nvarchar(256) (see the sketch after these notes)
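A sketch of the schema change those notes suggest (illustrative only; check existing data, NULLs and dependent code first):
-- fixed-length, non-unicode storage for a 25-digit key
ALTER TABLE myTable ALTER COLUMN UI char(25);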
Edit, after comment
Don't ask "why does it work sometimes" when you don't know that it actually does work.
Examples:
The value could have been deleted then added later
A TOP clause or SET ROWCOUNT could mean the offending value is not reached
The query was never run so it couldn't fail
The error is silently ignored by some other code?
Edit 2 for hopefully more clarity
Chat
gbn:
When you run WHERE UI = 2011040773395012950010370, you do not know the order of row access. So if one row does have "bicycle" you may or may not hit that row.
Random:
So the problem may be not in the row which I was trying to access but in another one with a corrupted value?
gbn
different machines will have different order of reads based on service pack level, index and table fragmentation, number of CPUs, parallelism maybe
correct
and TOP even. That kind of stuff
As Tao mentions, it's important to understand that another unrelated row can break the query even if this one is OK.
data type precedence can cause ALL the data in that column to be converted before the WHERE clause is evaluated

Performance of SQL functions vs. code functions

We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right, but I was trying to see if there was any overhead in the mere act of calling the function; it still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
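For example, with an index on ord_no (a sketch using the question's table; the literal values are invented):
-- a function around the indexed column typically forces a scan
SELECT ord_no FROM oelinhst_sql WHERE LTRIM(ord_no) = '123456'

-- comparing the bare column lets an index on ord_no be used for a seek
SELECT ord_no FROM oelinhst_sql WHERE ord_no = '  123456'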
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason for this is you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with RTRIM and LTRIM, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but, for string trimming at least, I'd definitely let the database do it (there will be less network traffic as well). As for the indexes, they will still be used if your WHERE clause just uses the column itself (as opposed to a function of the column). Use of the indexes won't be affected whatsoever by using functions on the actual columns you're retrieving (only by how you're selecting the rows).
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said:
our order numbers are six-digits long so each has two blank characters leading
This makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type which will hold 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number over char(8); across the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
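A sketch of that change, assuming every ord_no really is numeric (verify the data before altering anything):
-- store the six-digit order number in 4 bytes instead of 8
ALTER TABLE oelinhst_sql ALTER COLUMN ord_no INT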
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do, serve data.
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
datetimeField > '1/1/2011'
and
datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
Month(datetimeField) = 1
and
year(datetimeField) = 2011
even though the rows that are returned are the same.