I have this query that is clearly producing duplicates, but I don't see why since I have the DISTINCT option in use here.
I just migrated SQL Servers from one running version 12.0.6329.1 to 13.0.6419.1 (2014 to 2016, I believe),
and I don't experience the same issue on the old server.
Any ideas why DISTINCT isn't working as I expected?
SELECT DISTINCT
[UWI_vn]
,[WI_PrdWellCnt]
,[AAV_GUID]
,[InResFlag]
FROM [AAV_WellStore].[dbo].[V_ResultsProdBdgtOpsUpLiveBaseV4.5]
WHERE [InResFlag] = 1
AND [WI_PrdWellCnt] > 0
AND [UWI_vn] = '102/16-25-069-05W6/0'
Thanks to dfundako for the checksum trick and ThorstenKettner and KeithL for bringing up the intrinsic properties of the float column.
The [WI_PrdWellCnt] column is a float and goes through a CTE that aggregates hundreds of rows to get down to one row. This average must be what's causing the issue: you'd expect the average to be the same if all the input values are the same, but they aren't. We've broken this value out and calculated it separately so we don't have to deal with this issue.
Casting [WI_PrdWellCnt] to real would also potentially solve the issue. (Casting to decimal, as suggested in this thread, loses precision and rounds up to 1 in my example.) The tables are produced by a proprietary application, so altering the base tables is not an option.
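For anyone who hits the same thing, here is a minimal sketch (the values are invented, not from our data) showing how two floats that agree to 14 decimal places still count as distinct, how casting to real collapses them, and the CHECKSUM trick that exposes the difference:
DECLARE @a float = 1.0;
DECLARE @b float = 1.0 + 1e-15; -- the kind of residue a float average can leave
SELECT DISTINCT v FROM (VALUES (@a), (@b)) AS t(v);                -- 2 rows
SELECT DISTINCT CAST(v AS real) FROM (VALUES (@a), (@b)) AS t(v);  -- 1 row
-- CHECKSUM typically differs for the two values, which is how you can spot them
SELECT v, CHECKSUM(v) FROM (VALUES (@a), (@b)) AS t(v);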
I am attempting to create a gauge panel in Grafana (version 6.6.2; presume that upgrading is a last resort, but possible if necessary) that represents the percentage of total available memory used by the Java Virtual Machine running a process of mine. The problem I am running into is the following:
I have used Spring Boot Actuator's metrics and imported them into an InfluxDB database with Micrometer, but in the process it has stored the two values I want to use in my calculation in two different measurements: jvm_memory_used and jvm_memory_max.
My initial idea was to simply call a SELECT on both measurements to get the values, then divide "used" by "max" and multiply by 100 to get the percentage to display. Unfortunately, I run into syntax errors when I try to do this manually, and I am unsure whether I can do it using Grafana's query builder.
I know that the syntax is incorrect, but I am not familiar enough with InfluxQL to know how to properly structure this query. Here is what I had tried:
(SELECT last("value")
FROM "jvm_memory_used"
WHERE ("area" = 'heap')
AND $timeFilter
GROUP BY time($__interval) fill(null)
) /
(SELECT last("value")
FROM "jvm_memory_max"
WHERE ("area" = 'heap')
AND $timeFilter
GROUP BY time($__interval) fill(null)
)
(The AND and GROUP BY are present as a result of the default values from Grafana's query builder; I am not sure whether they are necessary.)
I'm assuming that my parentheses and the division between queries are illegal, but I am not sure how to resolve this.
How can I divide these two values from separate measurements?
EDIT: I have gotten slightly further, but it seems there is a new issue. I now have the following query:
SELECT 100 * (last("used") / sum("max")) AS "percentUsed"
FROM(
SELECT last("value") AS "used"
FROM "jvm_memory_used"
WHERE ("area" = 'heap')
AND $timeFilter
),(
SELECT last("value") AS "max"
FROM "jvm_memory_max"
WHERE ("area" = 'heap')
AND $timeFilter
)
GROUP BY time($__interval) fill(null)
and the result I get is two gauges, both full of nulls.
How can I now get this query to return only one gauge with data, instead of two with nulls?
I've accepted an answer that works for versions of Grafana after 7. If any other answers arise that do not involve updating Grafana, please provide them as well!
I am not particularly experienced with Influx, but since your question is how to use/combine two measurements (query results) in a Grafana panel, I can tell you about one approach:
You can use a transformation. That way, you can keep the two separate queries. With the transformation mode binary operation you can simply divide one of your values by the other.
In your specific case, to display the result as a percentage, you can then use Percent (0.0-1.0) as the unit, and you should have accomplished your goal.
This is probably a little thing, but I am trying to use this SQL statement:
SELECT * FROM Colors
WHERE colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)
In the results I get the perfect colorHueWarmth and colorV, but I also get fk_subCategory values other than 4, 5 or 11.
I tried changing the values, but no results. Is it even possible to do such a statement?
Does anyone know what I am doing wrong?
Thanks in advance
You've actually got multiple options, although I'd point out that the query (in your question) actually works for me (see this SQL Fiddle):
SELECT
*
FROM
Colors
WHERE
colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)
As stated in one of the comments, I would guess that your original didn't have parentheses around the fk_subCategory clause (the third table in my previous fiddle). Parentheses are immensely important when working with logic and should always be used to group conditions together.
The easiest solution is as follows:
SELECT
*
FROM
Colors
WHERE
colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory IN(4,5,11));
You will find loads of documentation online regarding the IN clause; here are a few links you might find useful:
http://webcheatsheet.com/sql/interactive_sql_tutorial/sql_in.php
http://www.w3schools.com/sql/sql_in.asp (note W3Schools can't always be taken on face value and are often excluded from suggested links due to the errors/omissions they often contain)
http://msdn.microsoft.com/en-gb/library/ms177682.aspx
Given the small set of foreign key values (4, 5 or 11), the IN clause is a reasonable option. If you have other queries doing something similar with large collections, this can become quite inefficient, in which case you could create a temporary table containing the IDs and INNER JOIN onto that, as sketched below. (Here is a question regarding alternatives to IN.)
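A sketch of that temporary-table approach might look like this (I'm assuming SQL Server-style temp tables; adjust for your RDBMS):
-- Load the wanted IDs once, then join instead of using a huge IN list
CREATE TABLE #wantedCategories (id int PRIMARY KEY);
INSERT INTO #wantedCategories (id) VALUES (4), (5), (11);

SELECT c.*
FROM Colors AS c
INNER JOIN #wantedCategories AS w ON c.fk_subCategory = w.id
WHERE c.colorHueWarmth < 0
  AND c.colorV >= 0.7;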
I experienced a strange issue today. One of my projects is running .NET + SQL Server 2005 Express.
There is one query I use for some filtering.
SELECT *
FROM [myTable]
where UI = 2011040773395012950010370
GO
SELECT *
FROM [myTable]
where UI = '2011040773395012950010370'
GO
The UI column is nvarchar(256) and the UI value passed to the filter is always 25 digits.
On my DEV environment, both queries return the same row and no errors. However, at my customer's, after a few months of running fine, the first version started to return a type conversion error.
Any idea why?
I'm not looking for a solution; I'm looking for an explanation of why it works on one environment and not the other, and why it suddenly started to return errors instead of results. I'm using the same tools on both (SQL Server Management Studio Express and two different .NET clients).
The environments are more or less the same (W2k3 + SQL Server 2005 Express).
This is completely predictable and expected because of datatype precedence.
For this one, the UI column will be changed to decimal(25,0):
where UI = 2011040773395012950010370
This one is almost correct; the right-hand side is varchar and is implicitly changed to nvarchar:
where UI = '2011040773395012950010370'
This is the really correct version, where both sides are the same type:
where UI = N'2011040773395012950010370'
Errors will have started because the UI column now contains a value that won't CAST to decimal(25,0).
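A minimal repro sketch (table and data invented to match the question) of why the error only appears once a non-numeric value lands in the column:
CREATE TABLE myTable (UI nvarchar(256));
INSERT INTO myTable VALUES (N'2011040773395012950010370');

-- Fine so far: every UI value converts to decimal(25,0)
SELECT * FROM myTable WHERE UI = 2011040773395012950010370;

INSERT INTO myTable VALUES (N'bicycle');

-- May now fail with a conversion error ("Error converting data type
-- nvarchar to numeric"), because the column side is implicitly cast
SELECT * FROM myTable WHERE UI = 2011040773395012950010370;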
Some unrelated notes:
if you have an index on the UI column, it will be ignored in the first version because of the implicit CAST required
do you need Unicode to store numeric digits? There is a serious overhead with Unicode data types in storage and performance
why not use char(25) or nchar(25) if values are always fixed length? Your queries use too much memory, because the optimiser assumes an average length of 128 characters based on nvarchar(256)
Edit, after comment
Don't ask "why does it work sometimes" when you don't actually know that it works.
Examples:
The value could have been deleted then added later
A TOP clause or SET ROWCOUNT could mean the offending value is not reached
The query was never run so it couldn't fail
The error is silently ignored by some other code?
Edit 2 for hopefully more clarity
Chat
gbn:
When you run WHERE UI = 2011040773395012950010370, you do not know the order of row access. So if one row does have "bicycle" you may or may not hit that row.
Random:
So the problem may not be in the row I was trying to access, but in another one with a corrupted value?
gbn
different machines will have different order of reads based on service pack level, index and table fragmentation, number of CPUs, parallelism maybe
correct
and TOP even. That kind of stuff
As Tao mentions, it's important to understand that another, unrelated row can break the query even if this one is OK.
data type precedence can cause ALL the data in that column to be converted before the where clause is evaluated
We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right, but I was trying to see if there was any overhead in the mere act of calling the function. It still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
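To make that concrete with the trimming example from the question (a sketch, untested against your schema):
-- Likely non-sargable: wrapping the column in a function hides it
-- from any index on ord_no
SELECT ord_no FROM oelinhst_sql WHERE LTRIM(ord_no) = '123456';

-- Sargable alternative: put the transformation on the constant instead
SELECT ord_no FROM oelinhst_sql WHERE ord_no = '  123456';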
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason for this is you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with RTRIM and LTRIM, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but for string trimming at least, I'd definitely let the database do it (there will be less network traffic as well). As for the indexes, they will still be used if your WHERE clause refers to the column itself (as opposed to a function of the column). Using functions on the columns you're retrieving won't affect index use at all; only how you select the rows matters.
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said:
our order numbers are six-digits long so each has two blank characters leading
That makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type that will hold 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number. Over the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
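As a sketch (assuming SQL Server, and assuming no constraints or indexes get in the way):
-- Spot-check that every value really is numeric first
-- (ISNUMERIC has quirks, so eyeball anything it flags)
SELECT ord_no FROM oelinhst_sql WHERE ISNUMERIC(ord_no) = 0;

-- Then store order numbers as INT (4 bytes) instead of char(8) (8 bytes)
ALTER TABLE oelinhst_sql ALTER COLUMN ord_no int NOT NULL;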
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do: serve data.
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
datetimeField > '1/1/2011'
and
datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
Month(datetimeField) = 1
and
year(datetimeField) = 2011
even though the rows that are returned are the same.
Does anyone know how many values I can give in a WHERE IN clause? I have 25000 values in a WHERE IN clause and MySQL is unable to execute the query. Any thoughts?
Although this is old, it still shows up in search results so is worth answering.
There is no hard-coded maximum in MySQL for the length of a query. This includes all parts of the query such as the WHERE clause.
However, there is a value called max_allowed_packet which determines the largest query you can run on the MySQL server process. It isn't to do with the number of elements in the query, but the total length of the query. So
SELECT * FROM mytable WHERE mycol IN (1,2,3);
is less likely to hit the limit than
SELECT * FROM mytable WHERE mycol IN ('This string','That string','Tother string');
The value of max_allowed_packet is configurable from server to server. But almost certainly, if you find yourself hitting the limit because you're writing SQL statements of epic length (rather than dealing with binary data which is a legitimate reason to hit it), then you need to re-think your SQL.
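If you do need to inspect or raise the limit, it goes something like this (the 64 MB figure is just an example):
-- Check the current limit, in bytes
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it for the running server (needs the SUPER privilege); to make
-- it permanent, set it under [mysqld] in my.cnf
SET GLOBAL max_allowed_packet = 67108864; -- 64 MB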
I think that if this restriction is a problem then you're doing something wrong.
Perhaps you could store the data from your where clause in a table and then join with it. This would probably be more efficient.
I think it is something to do with execution time.
I think you are doing something like this (correct me if I am wrong):
SELECT * FROM table WHERE V1='A1' AND V2='A2' AND V3='A3' AND ... Vn='An'
There is always a more efficient way to do your SELECT in your database. When working with a database, it is important to remember that seconds matter.
If you can share what your query looks like, we can help you write an efficient SELECT statement.
I wish you success.