MySQL - Join Based on Date

Is it possible to join two tables based on the same date, not factoring in time?
Something like:
...FROM appointments LEFT JOIN sales ON
appointments.date = sales.date...
The only problem is that it is a datetime field, so I want to make sure it is only looking at the date and ignoring the time.

You can do it like this:
FROM appointments
LEFT JOIN sales ON DATE(appointments.date) = DATE(sales.date)
But I'm pretty sure it won't be able to use an index, so will be very slow.
You might be better off adding a date column to each table.
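For instance, a minimal sketch of that approach, assuming MySQL 5.7+ (for generated columns); the sale_date/appt_date column and index names are illustrative:
ALTER TABLE sales
    ADD COLUMN sale_date DATE GENERATED ALWAYS AS (DATE(date)) STORED;
ALTER TABLE appointments
    ADD COLUMN appt_date DATE GENERATED ALWAYS AS (DATE(date)) STORED;
CREATE INDEX idx_sale_date ON sales (sale_date);
CREATE INDEX idx_appt_date ON appointments (appt_date);

-- the join can then use the indexed date columns:
FROM appointments
LEFT JOIN sales ON appointments.appt_date = sales.sale_date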

The ON clause accepts an arbitrary conditional expression, so you can perform the date normalization on both sides before comparing, but expect a significant performance penalty.

Yes, but you have to join on a calculated expression that strips the time from the datetime column values. This will make the query non-SARGable (it cannot use an index), so it will generate a table scan...
I don't know the syntax in MySQL to strip the time from a datetime value, but whatever it is, just write:
FROM appointments a
LEFT JOIN sales s
ON StripTime(s.date) = StripTime(a.date)


Performance of JOINS in SAP HANA Calculation View

For Example:
I have 4 columns (A,B,C,D).
I thought that instead of connecting each and every column in the join, I should create a concatenated column in both projections (CA_CONCAT -> A+B+C+D) and join on that, just to check which method performs better.
It was working faster at first, but in a few CVs this method is sometimes slower, especially when filtering!
Can anyone suggest which method is more efficient?
I don't think JOIN conditions on concatenated fields will perform better.
Although we generally say there is no need for indexes on column tables in a HANA database, column tables have a structure that effectively works like an index on every column.
So if you concatenate 4 columns and produce a new calculated field, you first lose the option of using those per-column indexes on the 4 columns and the corresponding joining columns.
I did not check the execution plan, but it will probably do a full scan on these columns.
In fact, I'm surprised you mentioned that it worked faster and only caused problems in a few CVs.
Concatenation, or applying any function to a database column, is by itself additional workload in the SELECT process. It might involve an implicit type cast, which can add more overhead than expected.
First, I would suggest setting your table to column store and checking the new performance.
After that, I would suggest splitting the JOIN into multiple JOINs if you are using an OR condition in your join (a sketch follows this answer).
Third, an INNER JOIN will give you better performance compared to a LEFT JOIN or LEFT OUTER JOIN.
Another thing about JOINs and performance: you are better off using them on PRIMARY KEYs and not on arbitrary columns.
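A rough sketch of the OR-splitting suggestion above, with purely illustrative names: one join with an OR condition becomes two joins combined by UNION (note that UNION also removes duplicate rows, so verify the semantics fit your data):
SELECT t1.A, t2.X
FROM Tab1 t1
INNER JOIN Tab2 t2 ON t1.A = t2.A
UNION
SELECT t1.A, t2.X
FROM Tab1 t1
INNER JOIN Tab2 t2 ON t1.B = t2.B;
-- instead of: ... INNER JOIN Tab2 t2 ON t1.A = t2.A OR t1.B = t2.B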
For me, in both cases the join with multiple fields performs faster than the join with a concatenated field. For the filtering scenario, PlanViz shows that when I join with multiple fields, the filter gets pushed down to both tables. On the other hand, when I join with a concatenated field, only one table gets filtered.
However, if you put a filter on both fields (like PRODUCT from Tab1 and MATERIAL from Tab2), then you can push the filter down to both tables.
Like:
Select * from CalculationView where PRODUCT = 'A' and MATERIAL = 'A'
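In plain SQL terms, the two join styles being compared look roughly like this (table names are illustrative; the question's columns A, B, C, D are reused):
-- multi-field join: each predicate can use HANA's per-column structures,
-- and filters can be pushed down to both sides
SELECT *
FROM Tab1 t1
INNER JOIN Tab2 t2
    ON t1.A = t2.A AND t1.B = t2.B AND t1.C = t2.C AND t1.D = t2.D;

-- concatenated join: the calculated field defeats per-column pruning
SELECT *
FROM Tab1 t1
INNER JOIN Tab2 t2
    ON t1.A || t1.B || t1.C || t1.D = t2.A || t2.B || t2.C || t2.D;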

Simple query hangs forever on large table

I'm trying to crosswalk some code values from another developer's code using the BusinessObjects frontend (I know, it's sub-optimal, but they haven't given me back-end access).
What I need to do is just pull a record from the relevant table to compare code values to display values. I'm guessing the problem has something to do with the table containing millions of records. Even when I narrow my query to one value, try only records from today, and set Max rows retrieved to 1, it's hanging forever.
The code it generated for my query is:
SELECT
    CLINICAL_EVENT.EVENT_CD,
    CV_EVENT.DISPLAY
FROM
    CLINICAL_EVENT,
    CODE_VALUE CV_EVENT
WHERE
    ( CLINICAL_EVENT.EVENT_CD = CV_EVENT.CODE_VALUE )
    AND
    (
        CLINICAL_EVENT.EVENT_CD = 338743225
        AND CLINICAL_EVENT.EVENT_END_DT_TM > '16-02-2017 00:00:00'
    )
Can you by chance avoid the cross join in your query by using explicit JOIN syntax instead of the comma notation? Perhaps the engine is optimizing to avoid the cross join, perhaps not.
SELECT
    CLINICAL_EVENT.EVENT_CD,
    CV_EVENT.DISPLAY
FROM
    CLINICAL_EVENT
    INNER JOIN CODE_VALUE CV_EVENT
        ON CLINICAL_EVENT.EVENT_CD = CV_EVENT.CODE_VALUE
WHERE CLINICAL_EVENT.EVENT_CD = 338743225
    AND CLINICAL_EVENT.EVENT_END_DT_TM > '16-02-2017 00:00:00'
Additionally, what data type is EVENT_END_DT_TM? Explicitly casting your '16-02-2017 00:00:00' literal to a date or datetime, rather than relying on an implicit cast, might aid performance.
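For example, assuming the backend is Oracle (the 'DD-MM-YYYY' style literal suggests it, but that is my assumption), the explicit conversion could look like:
AND CLINICAL_EVENT.EVENT_END_DT_TM
    > TO_DATE('16-02-2017 00:00:00', 'DD-MM-YYYY HH24:MI:SS')  -- explicit, no implicit cast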
Expanding a bit on my comment:
The code values and corresponding display values you want to examine are effectively both coming from table CODE_VALUE. The only thing you're gaining from the join is duplication of those results according to the number of times the code value appears on the CLINICAL_EVENT rows satisfying the date criterion (in a sense that encompasses suppressing all appearances if there are no matching rows).
You seem to want simply to compare the code value and corresponding description, rather than to evaluate how many times that code appears. In that case, you are incurring a lot of unneeded work -- and possibly even some unwanted work -- by joining CODE_VALUE to CLINICAL_EVENT. Instead, just select the wanted row(s) directly from CODE_VALUE alone.
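In other words, a sketch of the direct lookup, using the same table and column names as the generated query:
SELECT CODE_VALUE, DISPLAY
FROM CODE_VALUE
WHERE CODE_VALUE = 338743225;  -- compare code value and display directly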

Slow Join on Varchar

I have a query in SQL Server with a join that is taking forever. I'm hoping someone might have a tip to speed it up.
I think the problem is I'm joining on a field called Reseller_CSN which has values like '_0070000050'.
I've tried using the SUBSTRING function in the join to return everything but the underscore, e.g. '0070000050', but I keep getting an error when I try to cast or convert the result to int or bigint.
Any tips would be greatly appreciated, the query is below:
SELECT
t1.RESELLER_CSN
,t1.FIRST_YEAR_RENEWAL
,t1.SEAT_SEGMENT
,t2.Target_End_Date_CY
,t2.Target_End_Date_PQ
,t2.Target_End_Date_CY_1
,t2.Target_End_Date_CY_2
,t1.ASSET_SUB_END_DATE
FROM dbo.new_all_renewals t1
LEFT JOIN dbo.new_all_renewals_vwTable t2
ON SUBSTRING(t1.RESELLER_CSN,2,11) = SUBSTRING(t2.RESELLER_CSN,2,11)
A join on processed columns invariably takes more effort than a join on raw columns. In this case, you can improve performance by using computed columns. For instance, on the first table, you can do:
alter table new_all_renewals add CSN_num as SUBSTRING(RESELLER_CSN, 2, 11);
create index ix_new_all_renewals_CSN_num on new_all_renewals(CSN_num);
This will generate an index on the column, which should speed the query. (Note: you'll need to reference the computed column rather than actually using the function.)
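Assuming the computed column and index are added to both tables (the steps above show only the first), the join would then reference the new columns, something like:
SELECT
t1.RESELLER_CSN
,t1.FIRST_YEAR_RENEWAL
,t2.Target_End_Date_CY
FROM dbo.new_all_renewals t1
LEFT JOIN dbo.new_all_renewals_vwTable t2
ON t1.CSN_num = t2.CSN_num  -- computed columns, so the indexes can be used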

Where clause affecting join

Dumb question time. Oracle 10g.
Is it possible for a where clause to affect a join?
I've got a query in the form of:
select * from
(select product, product_name from products p
join product_serial ps on p.id = ps.id
join product_data pd on pd.product_value = to_number(p.product_value)) product_result
where product_name like '%prototype%';
Obviously this is a contrived example. No real need to show the table structure as it's all imaginary. Unfortunately, I can't show the real table structure or query. In this case, p.product_value is a VARCHAR2 field which in certain rows has an ID stored inside it rather than text. (Yes, bad design - but something I inherited and am unable to change.)
The issue is in the join. If I leave out the where clause, the query works and rows are returned. However, if I add the where clause, I get "invalid number" error on the pd.product_value = to_number(p.product_value) join condition.
Obviously, the "invalid number" error happens when rows are joined which contain non-digits in the p.product_value field. However, my question is how are those rows being selected? If the join succeeds without the outer where clause, shouldn't the outer where clause just select rows from the result of the join? It appears what is happening is the where clause is affecting what rows are joined, despite the join being in an inner query.
Is my question making sense?
It affects the plan that's generated.
The actual order that tables are joined (and so filtered) is not dictated by the order you write your query, but by the statistics on the tables.
In one version, the generated plan coincidentally means that the 'bad' rows never get processed, because the preceding joins filter the result set down to the point where those rows are never joined on.
The introduction of the WHERE clause has meant that Oracle now believes a different join order is better (because filtering by the product name requires a certain index, or because it narrows the data down a lot, etc.).
This new order means that the 'bad' rows get processed before the join that filters them out.
I would endeavour to clean the data before querying it, possibly by creating a derived column where the value is already cast to a number, or left as NULL if that is not possible (a sketch follows below).
You can also use EXPLAIN PLAN to see the different plans being generated from your queries.
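A sketch of that derived-column idea: compute the numeric value once, leaving it NULL when the text is not a number, so to_number never sees bad data (REGEXP_LIKE is available in Oracle 10g; names follow the contrived example):
select p.product,
       p.product_name,
       case
           when regexp_like(p.product_value, '^[0-9]+$')
           then to_number(p.product_value)
       end as product_value_num  -- NULL instead of ORA-01722 for non-numeric text
from products p;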
Short answer: yes.
Long answer: the query engine is free to rewrite your query however it wants, as long as it returns the same results. All of the query is available to it to use for the purpose of producing the most efficient query it can.
In this case, I'd guess that there is an index that covers what you are wanting, but it doesn't cover product name. When you add that to the where clause, the index isn't used; instead there's a scan where both conditions are tested at the same time, hence your error.
Which is really an error in your join condition: you shouldn't be using to_number unless you are sure the value is a number.
I guess your to_number(p.product_value) only works for rows with a valid product_name.
What happens is that your join is applied before your where clause, resulting in the failure of the to_number function.
What you need to do is include your product_name like '%prototype%' condition in the JOIN clause, like this:
select * from
(select product, product_name from products p
join product_serial ps on p.id = ps.id
join product_data pd on product_name like '%prototype%' AND
pd.product_value = to_number(p.product_value));
For more background (and a really good read), I'd suggest reading Jonathan Gennick's Subquery Madness.
Basically, the problem is that Oracle is free to evaluate predicates in any order. So it is free to push (or not push) the product_name predicate into your subquery. It is free to evaluate the join conditions in any order. So if Oracle happens to pick a query plan where it filters out the non-numeric product_value rows before it applies the to_number, the query will succeed. If it happens to pick a plan where it applies the to_number before filtering out the non-numeric product_value rows, you'll get an error. Of course, it's also possible that it will return the first N rows successfully and then you'll get an error when you try to fetch row N+1, because row N+1 is the first time that it tries to apply the to_number predicate to non-numeric data.
Other than fixing the data model, you could potentially throw some hints into the query to force Oracle to evaluate the predicate that ensures that all the non-numeric data is filtered out before the to_number predicate is applied. But in general, it's a bit challenging to fully hint a query in a way that will force the optimizer to always evaluate things in the "proper" order.

Conversion to datetime fails only on WHERE clause?

I'm having a problem with some SQL Server queries. It turns out that I have a table with "Attribute_Name" and "Attribute_Value" fields, where the value can be of any type, stored in varchar. (Yeah... I know.)
All the dates for a particular attribute seem to be stored in the "YYYY-MM-DD hh:mm:ss" format (not 100% sure about that, there are millions of records here), so I can execute this code without problems:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
However, if I execute the following code:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
I will get the following error:
Conversion failed when converting date and/or time from character string.
How come it fails on the where clause and not on the select one?
Another clue:
If instead of filtering by the Attribute_Name I use the actual Attribute_ID stored in the database (PK), it works without problems.
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_ID = 15
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
Update
Thanks everyone for the answers. I found it hard to choose a correct answer because everyone pointed out something that was useful to understanding the issue. It definitely had to do with the order of execution.
Turns out that my first query worked correctly because the WHERE clause was executed first, then the SELECT.
My second query failed because of the same reason (as the Attributes were not filtered, the conversion failed while executing the same WHERE clause).
My third query worked because the ID was part of an index (PK), so it took precedence and it drilled down results on that condition first.
Thanks!
You seem to be assuming some sort of short-circuiting evaluation or guaranteed ordering of the predicates in the WHERE clause. This is not guaranteed. When you have mixed datatypes in a column like that, the only safe way of dealing with them is with a CASE expression.
Use (e.g.)
CONVERT(DATETIME,
CASE WHEN ISDATE(pa.Attribute_Value) = 1 THEN pa.Attribute_Value END)
Not
CONVERT(DATETIME, pa.Attribute_Value)
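For instance, a sketch of the guarded conversion applied to the failing WHERE clause:
and CONVERT(DATETIME,
        CASE WHEN ISDATE(pa.Attribute_Value) = 1
             THEN pa.Attribute_Value END) < GETDATE()
Rows that fail ISDATE yield NULL, and NULL < GETDATE() is not true, so they are filtered out without ever being converted.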
If the conversion is in the WHERE clause it may be evaluated for many more records (values) than it would be if it appeared in the projection list. I have talked about this before in a different context; see T-SQL functions do not imply a certain order of execution and On SQL Server boolean operator short-circuit. Your case is even simpler, but it is similar, and ultimately the root cause is the same: do not assume an imperative execution order when dealing with a declarative language like SQL.
Your best solution, by a far and a large margin, is to sanitize the data and change the column type to a DATETIME or DATETIME2 type. All other workarounds will have one shortcoming or another, so you may be better to just do the right thing.
Update
After a closer look (sorry, I'm at VLDB and only peeking at SO between sessions) I realize you have an EAV store with inherent type-free semantics (the attribute_value can be a string, a date, an int, etc.). My opinion is that your best bet is to use sql_variant in storage and all the way up to the client (i.e., project sql_variant). You can coerce the type in the client; all client APIs have methods to extract the inner type from a sql_variant, see Using sql_variant Data (well, almost all client APIs... Using the sql_variant datatype in CLR). With sql_variant you can store multiple types without the problems of going through a string representation, you can use SQL_VARIANT_PROPERTY to inspect things like the BaseType in the stored values, and you can even do things like check constraints to enforce data type correctness.
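A sketch of inspecting the stored base types, assuming the Attribute_Value column were migrated to sql_variant as suggested:
SELECT Attribute_Value,
       SQL_VARIANT_PROPERTY(Attribute_Value, 'BaseType') AS BaseType
FROM ProductAttributes;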
This has to do with the order in which a SELECT query is processed. The WHERE clause is processed long before the SELECT; it has to determine which rows to include/exclude. The clause that uses the name must use a scan that investigates all rows, some of which do not contain valid date/time data, whereas the key probably leads to a seek, and none of the invalid rows are included at that point. The CONVERT in the SELECT list is performed last, and clearly by that time it is not going to try to convert invalid rows. Since you're mixing date/time data with other data, you may consider storing date or numeric data in dedicated columns with correct data types. In the meantime, you can defer the check in the following way:
SELECT /* ... */
FROM
(
SELECT /* ... */
FROM ProductAttributes AS pa
INNER JOIN dbo.Attributes AS a
ON a.Attribute_ID = pa.Attribute_ID
WHERE a.Attribute_Name = 'SomeDate'
AND ISDATE (pa.Attribute_Value) = 1
) AS z
WHERE CONVERT(CHAR(10), Attribute_Value) < CONVERT(CHAR(10), GETDATE(), 120);
But the better answer is probably to use the Attribute_ID key instead of the name if possible.
Seems like a data issue to me. Take a look at the data when you select it using the two different methods: try looking for distinct lengths, and then select the items in the different sets and eyeball them. Also check for NULLs. (I'm not sure what happens if you try converting NULL to a datetime.)
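A sketch of that length check, grouping instead of eyeballing row by row:
SELECT LEN(pa.Attribute_Value) AS value_length, COUNT(*) AS cnt
FROM ProductAttributes pa
INNER JOIN Attributes a ON a.Attribute_ID = pa.Attribute_ID
WHERE a.Attribute_Name = 'SomeDate'
GROUP BY LEN(pa.Attribute_Value);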
I think the problem is you have a bad date in your database (obviously).
In your first example, where you aren't checking the date in the WHERE clause, all of the dates where a.Attribute_Name = 'SomeDate' are valid, so it never attempts to convert a bad date.
In your second example, the addition to the WHERE clause is causing the query plan to actually convert all those dates, finding the bad one, and then looking at the attribute name.
In your third example, changing to use Attribute_Id probably changes the query plan so that it only looks for those where id = 15 first, and then checks to see if those records have a valid date, which they do. (Perhaps Attribute_Id is indexed and Attribute_Name isn't.)
So, you have a bad date somewhere, but it's not on any records with Attribute_id = 15.
You can check the execution plans. It might be that with the first query the second criterion ( CONVERT(DATETIME, pa.Attribute_Value) < GETDATE() ) gets evaluated first, over all rows including the ones with invalid (non-date) data, while in the case of the second one, a.Attribute_ID = 15 gets evaluated first, thus excluding the rows with non-date values.
By the way, the second one might also be faster, and if you don't have anything from Attributes in the select list, you can get rid of the inner join Attributes a on a.Attribute_ID = pa.Attribute_ID entirely (see the sketch below).
On that note, it would be advisable to get rid of the EAV design while it's not too late :)
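A sketch of that join elimination, keeping the ISDATE guard from the accepted answer so no bad value is ever converted:
select CONVERT(DATETIME,
           CASE WHEN ISDATE(pa.Attribute_Value) = 1
                THEN pa.Attribute_Value END)
from ProductAttributes pa
where pa.Attribute_ID = 15
and ISDATE(pa.Attribute_Value) = 1;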