How do I find the closest number across columns? - sql

I have this table:
col_1 | col_2 | col_3 | compare
------+-------+-------+--------
1.1   | 2.1   | 3.1   | 2
10    | 9     | 1     | 15
I want to derive a new column choice indicating the column closest to the compare value:
col_1 | col_2 | col_3 | compare | choice
------+-------+-------+---------+-------
1.1   | 2.1   | 3.1   | 2       | col_2
10    | 9     | 1     | 15      | col_1
Choice refers to the column whose value is closest to the compare value.

I think the simplest method is apply:
select t.*, v.which as choice
from t cross apply
     (select top (1) v.*
      from (values ('col_1', col_1), ('col_2', col_2), ('col_3', col_3)
           ) v(which, val)
      order by abs(v.val - t.compare)
     ) v;
In the event of ties, this returns an arbitrary closest column.
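If you want ties to resolve deterministically instead, one small tweak (my addition, not part of the original answer) is a secondary sort key:
select t.*, v.which as choice
from t cross apply
     (select top (1) v.*
      from (values ('col_1', col_1), ('col_2', col_2), ('col_3', col_3)
           ) v(which, val)
      order by abs(v.val - t.compare), v.which  -- v.which breaks ties by column name
     ) v;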
You can also use case expressions, but that gets complicated. With no NULL values:
select t.*,
       (case when abs(compare - col_1) <= abs(compare - col_2) and
                  abs(compare - col_1) <= abs(compare - col_3)
             then 'col_1'
             when abs(compare - col_2) <= abs(compare - col_3)
             then 'col_2'
             else 'col_3'
        end) as choice
from t;
In the event of ties, this returns the first column.
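If you want to try either query, here is a minimal setup sketch, assuming SQL Server (TOP and CROSS APPLY above are SQL Server syntax) and a hypothetical table named t:
create table t (col_1 decimal(5, 2), col_2 decimal(5, 2), col_3 decimal(5, 2), compare decimal(5, 2));
insert into t (col_1, col_2, col_3, compare) values
    (1.1, 2.1, 3.1, 2),
    (10, 9, 1, 15);
-- expected choice: 'col_2' for the first row, 'col_1' for the second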

Related

How to partition and find the latest value in SQL

I have a table as follows:
ID | col1 | Date Time
---+------+----------
1  | WA   | 2/11/20
1  | CI   | 1/11/20
2  | CI   | 2/11/20
2  | WA   | 3/11/20
3  | WA   | 2/10/20
3  | WA   | 1/11/20
3  | WA   | 2/11/20
4  | WA   | 1/10/20
4  | CI   | 2/10/20
4  | SA   | 3/10/20
I want to find all ID values for which col1 had some other value in addition to WA, and whose latest value in col1 is 'WA'. I.e. from the sample data above, only ID values 1 & 2 should be returned, because both of those have an additional value (CI) in addition to WA, but their latest value is still WA.
How do I get that?
FYI, there could be some IDs that don't have the WA value at all; I want to eliminate them. I also want to eliminate those that only have the WA value.
Thanks for the help.
You can use window functions for this:
select distinct id
from (
    select
        t.*,
        last_value(col1) over(
            partition by id
            order by datetime
            -- the explicit frame is needed so last_value() sees the whole partition
            rows between unbounded preceding and unbounded following
        ) last_col1,
        min(col1) over(partition by id) min_col1,
        max(col1) over(partition by id) max_col1
    from mytable t
) t
where last_col1 = 'WA' and min_col1 <> max_col1
The inner query uses last_value() to recover the last value of col1 for the given id, and computes the min and max values in the same partition.
Then, the outer query filters on ids whose last value is 'WA' and that have at least two distinct values (which is phrased as the inequality of the min and max value).
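Roughly, for the sample data, the inner query yields these per-id values, so only ids 1 and 2 survive the outer filter:
ID | last_col1 | min_col1 | max_col1 | kept?
---+-----------+----------+----------+------------------------------
1  | WA        | CI       | WA       | yes
2  | WA        | CI       | WA       | yes
3  | WA        | WA       | WA       | no (only one distinct value)
4  | SA        | CI       | WA       | no (last value is not WA)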
You can do this with aggregation:
select id
from t
group by id
having min(col1) <> max(col1) and -- at least two different values
max(case when col1 = 'WA' then datetime end) = max(datetime) -- last is WA
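If you want to try both approaches, here is a minimal data setup sketch, assuming SQL Server, the column names used above, and a day/month/year reading of the dates:
create table mytable (id int, col1 varchar(2), datetime date);
insert into mytable (id, col1, datetime) values
    (1, 'WA', '2020-11-02'), (1, 'CI', '2020-11-01'),
    (2, 'CI', '2020-11-02'), (2, 'WA', '2020-11-03'),
    (3, 'WA', '2020-10-02'), (3, 'WA', '2020-11-01'), (3, 'WA', '2020-11-02'),
    (4, 'WA', '2020-10-01'), (4, 'CI', '2020-10-02'), (4, 'SA', '2020-10-03');
-- both queries should return ids 1 and 2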

Find and update non-duplicated records based on one of the columns

I want to find all the non-duplicated records and update one of the columns.
Ex.
Col_1 | Col_2 | Col_3 | Col_4 | Col_5
------+-------+-------+-------+------
A     | AA    | BB    | 1     |
A     | AB    | BC    | 2     |
A     | AC    | BD    | 3     |
B     | BB    | CC    | 1     |
B     | BB    | CC    | 2     |
C     | CC    | DD    | 1     |
My query has to group by Col_1, and I want to find the records that are not unique based on Col_2 and Col_3, and then update Col_5.
Basically output should be as below,
Col_1 | Col_2 | Col_3 | Col_4 | Col_5
------+-------+-------+-------+------
A     | AA    | BB    | 1     | 1
A     | AB    | BC    | 2     | 1
A     | AC    | BD    | 3     | 1
B     | BB    | CC    | 1     | 0
B     | BB    | CC    | 2     | 0
C     | CC    | DD    | 1     | 0
Does anyone have an idea how I can achieve this? This is a large database, so performance is also a key factor.
Thanks heaps,
There are plenty of ways to do it. This solution was written on Postgres, which is what I have access to, but I bet it will also work on T-SQL, as it uses common syntax.
;WITH
cte_1 AS (
    SELECT col_1 FROM some_table GROUP BY col_1 HAVING count(*) > 1
),
cte_2 AS (
    SELECT col_1 FROM some_table GROUP BY col_1, col_2, col_3 HAVING count(*) > 1
),
cte_3 AS (
    SELECT cte_1.col_1 FROM cte_1
    LEFT JOIN cte_2 ON cte_1.col_1 = cte_2.col_1
    WHERE cte_2.col_1 IS NULL
)
UPDATE some_table SET col_5 = 1
FROM cte_3 WHERE cte_3.col_1 = some_table.col_1;
So, what happens above?
First we build three CTE semi-tables, which allow us to split the logic into smaller parts:
cte_1 extracts the col_1 values that appear in more than one row (and so can have multiple col_2 and col_3 values)
cte_2 selects the col_1 values that have non-unique (col_2, col_3) combinations
cte_3 returns the col_1 values that have only unique (col_2, col_3) combinations, simply via the LEFT JOIN
Using the last structure, cte_3, we are able to update some_table correctly.
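For the sample data above, the CTEs resolve like this:
cte_1 returns A and B (both col_1 values appear more than once)
cte_2 returns B (the (B, BB, CC) combination repeats)
cte_3 returns A (in cte_1 but not in cte_2)
So only the col_1 = A rows get col_5 set to 1, which matches the expected output.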
I assume that your table is called some_table here. If you're worried about performance, you should have a primary key here, and it would also be good to have indexes on col_2 and col_3 (standalone, though it may help more if they were composite, e.g. on (col_1, col_2) and so on).
You may also want to move from CTEs to temporary tables (which could also be indexed to gain efficiency).
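For example, a composite index along these lines (a sketch with a hypothetical index name, valid in both PostgreSQL and T-SQL) could support the grouping in cte_2:
-- hypothetical index covering the GROUP BY col_1, col_2, col_3
CREATE INDEX idx_some_table_c123 ON some_table (col_1, col_2, col_3);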
Please also note that this query works fine with your example, but without real data it may be just guessing. I mean: what happens when, for col_1 = A, you have some unique and some non-unique col_2 values at the same time?
But I believe it's a good starting point. The revised version below also brings in the group counts to cover that mixed case:
;WITH
cte_1 AS (
    SELECT col_1, count(*) as items FROM some_table GROUP BY col_1 HAVING count(*) > 1
),
cte_2 AS (
    SELECT col_1, count(*) as items FROM some_table GROUP BY col_1, col_2, col_3 HAVING count(*) > 1
),
cte_3 AS (
    SELECT cte_1.col_1 FROM cte_1
    LEFT JOIN cte_2 ON cte_1.col_1 = cte_2.col_1
    WHERE cte_2.col_1 IS NULL OR cte_1.items > cte_2.items
    GROUP BY cte_1.col_1
)
UPDATE some_table SET col_5 = 1
FROM cte_3 WHERE cte_3.col_1 = some_table.col_1;

TSQL Number Rows Based on change in field value and sorted on date with incremented numbers on duplicates

Say I have data like the following:
X | 2/2/2000
X | 2/3/2000
B | 2/4/2000
B | 2/10/2000
B | 2/10/2000
J | 2/11/2000
X | 3/1/2000
I would like to get a dataset like this:
1 | X | 2/2/2000
1 | X | 2/3/2000
2 | B | 2/4/2000
2 | B | 2/10/2000
2 | B | 2/10/2000
3 | J | 2/11/2000
4 | X | 3/1/2000
So far, everything I have tried has either reset the count on each field value change, or, as in the example, left the last X numbered 1.
This is a gaps and islands problem. You can use a difference of row numbers:
select dense_rank() over (order by col1, seqnum_1 - seqnum_2) as col0,
       col1, col2
from (select t.*,
             row_number() over (order by col2) as seqnum_1,
             row_number() over (partition by col1 order by col2) as seqnum_2
      from t
     ) t;
Explaining why this works is a bit cumbersome. If you run the subquery, you will see how the sequence numbers are assigned and why the difference is what you want.
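For the sample rows (assuming the dates sort in the order shown, and with the tie on 2/10/2000 broken arbitrarily), the subquery assigns roughly:
col1 | col2      | seqnum_1 | seqnum_2 | seqnum_1 - seqnum_2
-----+-----------+----------+----------+--------------------
X    | 2/2/2000  | 1        | 1        | 0
X    | 2/3/2000  | 2        | 2        | 0
B    | 2/4/2000  | 3        | 1        | 2
B    | 2/10/2000 | 4        | 2        | 2
B    | 2/10/2000 | 5        | 3        | 2
J    | 2/11/2000 | 6        | 1        | 5
X    | 3/1/2000  | 7        | 3        | 4
Each island of consecutive rows with the same col1 gets a constant (col1, seqnum_1 - seqnum_2) pair, which is what dense_rank() then numbers.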
You can query like this:
SELECT dense_rank() over(order by yourcolumn1), * from yourtable

Cannot use group by and over(partition by) in the same query?

I have a table myTable with 3 columns. col_1 is an INTEGER and the other 2 columns are DOUBLE. For example, col_1 = {1, 2}, col_2 = {0.1, 0.2, 0.3}. Each value of col_1 appears with all the values of col_2, and col_2 has repeated values for a given col_1. The 3rd column can have any value, as shown below:
col_1 | col_2 | Value
------+-------+------
1     | 0.1   | 1.0
1     | 0.2   | 2.0
1     | 0.2   | 3.0
1     | 0.3   | 4.0
1     | 0.3   | 5.0
2     | 0.1   | 6.0
2     | 0.1   | 7.0
2     | 0.1   | 8.0
2     | 0.2   | 9.0
2     | 0.3   | 10.0
What I want is to apply the aggregate function SUM() to the Value column, partitioned by col_1 and grouped by col_2. The above table should then look like this:
col_1 | col_2 | sum_value
------+-------+----------
1     | 0.1   | 1.0
1     | 0.2   | 5.0
1     | 0.3   | 9.0
2     | 0.1   | 21.0
2     | 0.2   | 9.0
2     | 0.3   | 10.0
I tried the following SQL query:
SELECT col_1, col_2, sum(Value) over(partition by col_1) as sum_value
from myTable
GROUP BY col_1, col_2
But on DB2 v10.5 it gave the following error:
SQL0119N An expression starting with "Value" specified in a SELECT
clause, HAVING clause, or ORDER BY clause is not specified in the
GROUP BY clause or it is in a SELECT clause, HAVING clause, or ORDER
BY clause with a column function and no GROUP BY clause is specified.
Can you kindly point out what is wrong? I do not have much experience with SQL.
Thank you.
Yes, you can, but you should be consistent regarding the grouping levels. That is, if your query is a GROUP BY query, then in an analytic function you can only use "detail" columns from the "non-analytic" part of your selected columns.
Thus, you can use either the GROUP BY columns or the non-analytic aggregates, like this example:
select product_id, company,
       sum(members) as No_of_Members,
       sum(sum(members)) over(partition by company) as TotalMembership
from Product_Membership
group by Product_ID, Company
Hope that helps
SELECT col_1, col_2, sum(sum(Value)) over(partition by col_1) as sum_value
-- the inner SUM is the group aggregate; nesting it inside OVER avoids the SQL0119N error
-- also try changing "col_1" to "col_2" in OVER
from myTable
GROUP BY col_2, col_1
I found the solution.
I do not need to use OVER(PARTITION BY col_1) because it is already in the GROUP BY clause. Thus, the following query gives me the right answer:
SELECT col_1, col_2, sum(Value) as sum_value
from myTable GROUP BY col_1, col_2
since I am already grouping w.r.t col_1 and col_2.
Dave, thanks, I got the idea from your post.
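For reference, if you did need both the grouped sum and a per-col_1 total in the same query, the nested-aggregate pattern from the earlier answer should also work on DB2 (an untested sketch):
SELECT col_1, col_2,
       SUM(Value) AS sum_value,
       SUM(SUM(Value)) OVER (PARTITION BY col_1) AS col_1_total
FROM myTable
GROUP BY col_1, col_2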

SQL query showing just a few DISTINCT records

I have the following records:
Col_1 | Col_2 | Col_3
------+-------+------
A     | A     | XYZ01
A     | A     | XYZ02
A     | A     | XYZ03
A     | B     | XYZ04
B     | B     | XYZ05
B     | B     | XYZ06
B     | B     | XYZ07
B     | B     | XYZ08
I need a query that will return a maximum of 2 records for each distinct combination of Col_1 and Col_2 (regardless of Col_3), i.e. something like a 2-record sample of each distinct (Col_1, Col_2) combination.
So this query should return:
Col_1 | Col_2 | Col_3
------+-------+------
A     | A     | XYZ01
A     | A     | XYZ02
A     | B     | XYZ04
B     | B     | XYZ05
B     | B     | XYZ06
SELECT *
FROM (
    SELECT col_1,
           col_2,
           col_3,
           row_number() OVER (
               PARTITION BY col_1, col_2
               ORDER BY col_1
           ) AS foo
    FROM TABLENAME
) bar
WHERE foo < 3
TOP will not work here because you want to 'group by' multiple columns. What will help is partitioning the data and assigning a row number within each partition. By partitioning on col_1 and col_2 we create 3 different groupings:
1. All rows with 'A' in both Col_1 and Col_2
2. All rows with 'B' in both Col_1 and Col_2
3. All rows with 'A' in Col_1 and 'B' in Col_2
We order by col_1 (I picked this because your result set was ordered by A). Then, for each row in a grouping, we count the rows and display the row number, as shown below.
We use this information as a derived table, and select * from it where the row number is less than 3. This gets us at most the first two rows in each grouping.
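With the sample data, the derived table looks roughly like this (the order within each (Col_1, Col_2) group is arbitrary, since the ORDER BY column is constant within the partition):
Col_1 | Col_2 | Col_3 | foo
------+-------+-------+----
A     | A     | XYZ01 | 1
A     | A     | XYZ02 | 2
A     | A     | XYZ03 | 3
A     | B     | XYZ04 | 1
B     | B     | XYZ05 | 1
B     | B     | XYZ06 | 2
B     | B     | XYZ07 | 3
B     | B     | XYZ08 | 4
Filtering on foo < 3 then keeps at most two rows per combination.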
As far as Oracle goes, the use of RANK would work.
Select * From (SELECT Col_1,
                      Col_2,
                      Col_3,
                      -- order by Col_3 so the ranks within each partition are distinct
                      RANK() OVER (PARTITION BY Col_1, Col_2 ORDER BY Col_3) part
               FROM someTable) st where st.part < 3;
Since I was reminded that you can't use the alias in the original where clause, I made a change that will still work, though may not be the most elegant.