SQL code to get next variable in table with different value - sql

I need a way, in SQL Server 2014 (via Management Studio), to find the next unique value in a column among rows that share the same value in a different column.
So for example below I would want my results to be
Column 1 - A
Column 2 - 1
Column 3 - 4
As that is the first time that A has unique values in column 2 and 3
Column1 | Column2 | Column3
---------+---------+---------
| A | X | 1 |
| A | X | 2 |
| B | Y | 3 |
| A | Z | 4 |
Query:
SELECT
Column1,
LEAD(Column3) OVER (PARTITION BY Column2 ORDER BY Column3) AS FindValue
FROM
Table

If I understand it correctly I would try something like this:
-- first we find minimum values for column1, column2 variations
WITH min_values AS (
SELECT
column1,
column2,
min(column3) AS min_value
FROM
table
GROUP BY column1, column2 -- SQL Server does not accept ordinal positions in GROUP BY
)
-- then we find bottom 2 values for column1
,bottom_2 AS (
SELECT
column1,
min_value,
row_number() OVER (PARTITION BY column1 ORDER BY min_value ASC) AS rn
FROM
min_values
)
-- then we join the results into a single record
SELECT
b1.column1, b2.min_value, b1.min_value
FROM
bottom_2 b1
JOIN
bottom_2 b2 ON b1.column1 = b2.column1 AND b2.rn < b1.rn
WHERE b1.rn <= 2
After reading the comments above, I would like to add some notes.
If you want the next value ordered by column2, change the ORDER BY in the row_number() line from min_value to column2. If instead you are looking for the next inserted value, you need a timestamp or some kind of id column to order by, because table rows have no inherent insertion order.
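The CTE approach above can be checked against the sample data from the question. This is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption, and it needs SQLite 3.25 or later for window functions); the table name t is hypothetical, chosen because table is a reserved word.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (column1 TEXT, column2 TEXT, column3 INTEGER);
INSERT INTO t VALUES ('A','X',1), ('A','X',2), ('B','Y',3), ('A','Z',4);
""")

query = """
WITH min_values AS (
    -- smallest column3 for each (column1, column2) combination
    SELECT column1, column2, MIN(column3) AS min_value
    FROM t
    GROUP BY column1, column2
),
bottom_2 AS (
    -- rank those minimums within each column1
    SELECT column1, min_value,
           ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY min_value) AS rn
    FROM min_values
)
-- pair the first and second ranked minimum for each column1
SELECT b1.column1, b2.min_value, b1.min_value
FROM bottom_2 b1
JOIN bottom_2 b2 ON b1.column1 = b2.column1 AND b2.rn < b1.rn
WHERE b1.rn <= 2
"""
next_values = conn.execute(query).fetchall()
print(next_values)  # [('A', 1, 4)]
```

B does not appear in the result because it has only one distinct column2 value, so there is no second row to pair with.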

Related

How to get distinct count over multiple columns in Hive SQL?

I have a table that looks like this. And I want to get the distinct count horizontally across the three columns ignoring nulls.
ID | Column1 | Column 2 | Column 3
1 | A | B | C
2 | A | A | B
3 | A | A |
The desired output I'm looking for is:
ID | Column1 | Column 2 | Column 3 | unique_count
1 | A | B | C | 3
2 | A | A | B | 2
3 | A | A | | 1
One possible option would be
WITH sample AS (
SELECT 'A' Column1, 'B' Column2, 'C' Column3 UNION ALL
SELECT 'A', 'A', 'B' UNION ALL
SELECT 'A', 'A', NULL UNION ALL
SELECT '', 'A', NULL
)
SELECT Column1, Column2, Column3, COUNT(DISTINCT NULLIF(TRIM(c), '')) unique_count
FROM (SELECT *, ROW_NUMBER() OVER () rn FROM sample) t LATERAL VIEW EXPLODE(ARRAY(Column1, Column2, Column3)) tf AS c
GROUP BY Column1, Column2, Column3, rn;
output
+---------+---------+---------+--------------+
| column1 | column2 | column3 | unique_count |
+---------+---------+---------+--------------+
| | A | NULL | 1 |
| A | A | NULL | 1 |
| A | A | B | 2 |
| A | B | C | 3 |
+---------+---------+---------+--------------+
case when C1 not in (C2, C3) then 1 else 0 end +
case when C2 not in (C3) then 1 else 0 end + 1
This will not work if you intend to count nulls. The pattern would extend to more columns by successively comparing each one to all columns to its right. The order doesn't strictly matter. There's just no point in repeating the same test over and over.
If the values were alphabetically ordered then you could test only adjacent pairs to look for differences. While that applies to your limited sample it would not be the most general case.
Using a column pivot with a distinct count aggregate is likely to be a lot less efficient, less portable, and a lot less adaptable to a broad range of queries.
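To see how the CASE expression above behaves, here is a small sketch run on SQLite through Python's sqlite3 module (SQLite stands in for Hive here, which is an assumption). The table name sample and the extra row with id 4 are illustrative additions; that row has a NULL in the middle column to show where the expression undercounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (id INTEGER, c1 TEXT, c2 TEXT, c3 TEXT);
INSERT INTO sample VALUES
    (1, 'A', 'B', 'C'),
    (2, 'A', 'A', 'B'),
    (3, 'A', 'A', NULL),
    (4, 'A', NULL, 'B');   -- illustrative row, not in the question
""")

query = """
SELECT id,
       CASE WHEN c1 NOT IN (c2, c3) THEN 1 ELSE 0 END +
       CASE WHEN c2 NOT IN (c3)     THEN 1 ELSE 0 END + 1 AS unique_count
FROM sample
ORDER BY id
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(1, 3), (2, 2), (3, 1), (4, 1)]
```

Rows 1 to 3 match the desired output. Row 4 yields 1 although it holds two distinct non-null values, because a NOT IN comparison against a list containing NULL evaluates to unknown and falls through to the ELSE branch.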

sql - Only want rows with NULL in column if it isn't defined somewhere else as well

I have a table with possible NULL values in a column. I need to return the NULL values, but only if it isn't also defined somewhere else. Below, I want row F, but I do not want row B. We have some automation that attempts something but also has a fail over. We need to identify when both tries fail.
Column 1 | Column 2
A | 1
B | 1
B | null
C | 2
C | 1
D | 1
E | 2
F | null
F | null
G | 2
Simply use aggregation:
select col1, null as col2
from table t
group by col1
having max(col2) is null;
You can use not exists:
select t.*
from mytable t
where not exists (
select 1
from mytable t1
where t1.column1 = t.column1 and t1.column2 is not null
)
Or you can use window functions:
select column1, column2
from (
select t.*, max(column2) over(partition by column1) max_column2
from mytable t
) t
where max_column2 is null
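The aggregation and NOT EXISTS approaches can be compared on the sample data. This is a sketch using SQLite through Python's sqlite3 module, assuming the table is called mytable with columns column1 and column2 as in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (column1 TEXT, column2 INTEGER);
INSERT INTO mytable VALUES
    ('A', 1), ('B', 1), ('B', NULL), ('C', 2), ('C', 1),
    ('D', 1), ('E', 2), ('F', NULL), ('F', NULL), ('G', 2);
""")

# NOT EXISTS keeps every row whose group has no non-null column2.
not_exists_rows = conn.execute("""
SELECT t.column1, t.column2
FROM mytable t
WHERE NOT EXISTS (
    SELECT 1
    FROM mytable t1
    WHERE t1.column1 = t.column1 AND t1.column2 IS NOT NULL
)
""").fetchall()

# Aggregation returns one row per such group.
aggregated_rows = conn.execute("""
SELECT column1
FROM mytable
GROUP BY column1
HAVING MAX(column2) IS NULL
""").fetchall()

print(not_exists_rows)   # [('F', None), ('F', None)]
print(aggregated_rows)   # [('F',)]
```

The difference is in the shape of the result: NOT EXISTS and the window-function version return both F rows, while the GROUP BY version collapses them to one.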

SQL Partition By Function without aggregation

I have a table with data like the following:
Column1 | Column2 | Column3 | Value
SQ03 | D | 1000040 | 1000
SQ03 | | 1000040 | 1000
SQ03 | | 1000050 | 2000
SQ03 | | 1000060 | 3000
SQ03 | L | 1000060 | 3000
SQ03 | D | 1000060 | 3000
What I need to do is get a single row for each value of Column3. If a value in Column3 is unique, I need that row. If there are duplicates in Column3, I need the row where Column2 is not null. As the example above shows, some Column3 values have Column2 filled in more than once; in those cases I need only one of the rows, and it doesn't matter which.
So I thought on flagging which line I would need with the following solution:
select *, CASE
WHEN "Column2" != ' '
THEN 'X'
WHEN "Column2" = ' ' AND row_number() over(PARTITION BY "Column3" ORDER BY "Column2" DESC, "Column3") = 1
THEN 'X'
ELSE 'O'
END AS "FLAG" from DUMMY
WHERE "Column1" = 'SQ03'
But the problem with this solution is that it's aggregating the value from Column3. Like, it sums the values where Column3 has duplicates.
Can anyone help me with a solution where I don't get the values aggregated?
EDIT:
My expected output would be this:
Column1 | Column2 | Column3 | Value
SQ03 | D | 1000040 | 1000
SQ03 | | 1000050 | 2000
SQ03 | L | 1000060 | 3000
You can use a subquery to generate row numbers for each Column3 value (ordered by Column2 DESC to make NULL values come last), and then select the rows which have row_number = 1:
SELECT Column1, Column2, Column3, Value
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Column3 ORDER BY Column2 DESC) AS rn
FROM DUMMY
WHERE Column1 = 'SQ03'
) D
WHERE rn = 1
Alternatively you can use a CTE:
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Column3 ORDER BY Column2 DESC) AS rn
FROM DUMMY
WHERE Column1 = 'SQ03'
)
SELECT Column1, Column2, Column3, Value
FROM CTE
WHERE rn = 1
Output for both queries:
Column1 Column2 Column3 Value
SQ03 D 1000040 1000
SQ03 (null) 1000050 2000
SQ03 L 1000060 3000
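A quick way to verify the ROW_NUMBER approach is to run it on the sample rows. The sketch below uses SQLite through Python's sqlite3 module (an assumption; the question's database is not SQLite) and treats the blank Column2 cells as NULL. SQLite, like SQL Server, sorts NULL lowest, so ORDER BY Column2 DESC puts NULL last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DUMMY (Column1 TEXT, Column2 TEXT, Column3 INTEGER, Value INTEGER);
INSERT INTO DUMMY VALUES
    ('SQ03', 'D',  1000040, 1000),
    ('SQ03', NULL, 1000040, 1000),
    ('SQ03', NULL, 1000050, 2000),
    ('SQ03', NULL, 1000060, 3000),
    ('SQ03', 'L',  1000060, 3000),
    ('SQ03', 'D',  1000060, 3000);
""")

query = """
SELECT Column1, Column2, Column3, Value
FROM (
    -- number the rows inside each Column3 group, non-null Column2 first
    SELECT Column1, Column2, Column3, Value,
           ROW_NUMBER() OVER (PARTITION BY Column3 ORDER BY Column2 DESC) AS rn
    FROM DUMMY
    WHERE Column1 = 'SQ03'
) d
WHERE rn = 1
ORDER BY Column3
"""
picked = conn.execute(query).fetchall()
print(picked)
```

No aggregation happens here: each output row is one of the original rows, selected by its row number.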
I think an aggregation function (as a window function) does what you want:
select t.*,
max(column3) over (partition by column1)
from t;

SQL Server - updating distinct values

I have 3 columns of data:
Column1 has duplicate values eg a a b b c c
Column2 has all NULL values
Column3 has other data that is not really important
I want to update Column2 with a value, e.g. Hello, but only for one instance of each value in Column1. E.g. the first a gets Hello but the 2nd instance of a stays NULL, and the same for b, c and so on.
I can find the distinct value by using this:
select distinct Column1
from TABLENAME
But when I try to update a different column it breaks. What is wrong (probably a lot!!) with this:
update TABLENAME
set Column2 = 'Hello'
where (select distinct Column1 from TABLENAME)
You can use the ROW_NUMBER window function to number the rows within each Column1 value, then update only the rows whose row number is 1.
update t1
set Column2 = 'Hello'
FROM (
select *,ROW_NUMBER() OVER(PARTITION BY Column1 ORDER BY Column3) rn
from TABLENAME
) t1
where rn = 1
Results:
| Column1 | Column2 | Column3 |
|---------|---------|---------|
| a | Hello | 1 |
| a | (null) | 2 |
| b | Hello | 3 |
| b | (null) | 4 |
| c | Hello | 5 |
| c | (null) | 6 |
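The same idea can be tried outside SQL Server. The sketch below uses SQLite through Python's sqlite3 module; SQLite cannot update through a derived table the way T-SQL does, so it filters on SQLite's built-in rowid instead. That swap is an assumption of this example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLENAME (Column1 TEXT, Column2 TEXT, Column3 INTEGER);
INSERT INTO TABLENAME VALUES
    ('a', NULL, 1), ('a', NULL, 2),
    ('b', NULL, 3), ('b', NULL, 4),
    ('c', NULL, 5), ('c', NULL, 6);
""")

# Update only the first row (by Column3) within each Column1 group.
conn.execute("""
UPDATE TABLENAME
SET Column2 = 'Hello'
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY Column3) AS rn
        FROM TABLENAME
    )
    WHERE rn = 1
)
""")

updated = conn.execute(
    "SELECT Column1, Column2, Column3 FROM TABLENAME ORDER BY Column3"
).fetchall()
print(updated)
```

Each Column1 value ends up with exactly one Hello, on the row with the smallest Column3.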
Assuming Column3 is unique, you can use it to pick one row per Column1 value:
UPDATE tablename SET column2 = 'Hello' WHERE column3 IN
(SELECT MIN(column3) FROM tablename GROUP BY column1)
You could also update only the rows that have odd numbers in Column3, though this only works when the data alternates as in the sample:
UPDATE tablename SET column2 = 'Hello' WHERE column3 % 2 != 0
You can also use CROSS APPLY and CTE (Common Table Expression) to achieve this:
;with CTE AS
(SELECT t.Column1, t.Column2
FROM (SELECT DISTINCT Column1
FROM TABLENAME) x
CROSS APPLY(SELECT TOP 1 *
FROM TABLENAME
WHERE column1 = x.column1) t)
UPDATE CTE
SET Column2 = 'Hello'
SELECT * FROM TABLENAME
You could use a window function as
UPDATE TT
SET Col = B
FROM
(
SELECT Col, ROW_NUMBER() OVER(PARTITION BY Col ORDER BY Col) RN
FROM T
)TT INNER JOIN
(
VALUES (1, 'Hello'), (2, NULL)
) TVC (A, B)
ON TT.RN = TVC.A;
Results:
+-------+
| Col |
+-------+
| Hello |
| NULL |
| Hello |
| NULL |
| Hello |
| NULL |
+-------+
Or using a CASE expression as:
UPDATE TT
SET Col = CASE WHEN RN = 1 THEN 'Hello' END
FROM
(
SELECT Col, ROW_NUMBER() OVER(PARTITION BY Col ORDER BY Col) RN
FROM T
)TT;

SQL query to give distinct values from one column, and a count of distinct values from a second column

Say I have a table like this:
column1 | column2
---------------------
1 | a
1 | b
1 | c
2 | a
2 | b
I need an SQL query to show the distinct values from column 1, and a count of the related distinct values from column 2. The output would look like:
column1 | count
-------------------
1 | 3
2 | 2
You could do something like this:
SELECT column1, count(column2)
FROM table
GROUP BY column1
You should do a COUNT(DISTINCT ...) with a GROUP BY:
Select Column1,
Count(Distinct Column2) As Count
From Table
Group By Column1
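The two answers give the same result on the sample data only because it has no repeated (column1, column2) pairs. The sketch below, using SQLite through Python's sqlite3 module with a hypothetical table name t, adds one duplicate row (1, a) to show where they diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (column1 INTEGER, column2 TEXT);
INSERT INTO t VALUES
    (1, 'a'), (1, 'b'), (1, 'c'),
    (1, 'a'),              -- duplicate added for illustration
    (2, 'a'), (2, 'b');
""")

query = """
SELECT column1,
       COUNT(column2)          AS plain_count,
       COUNT(DISTINCT column2) AS distinct_count
FROM t
GROUP BY column1
ORDER BY column1
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(1, 4, 3), (2, 2, 2)]
```

COUNT(column2) counts the duplicate a twice, while COUNT(DISTINCT column2) returns the number of different values, which is what the question asks for.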