Selecting unique rows in groups with missing values

I have a table with two columns, ID and VALUE, where VALUE can be missing.
I want to select one row per unique ID such that, if there are multiple rows with the same ID and some of them have a missing value, one of the rows with an existing value is returned. If all rows for an ID have an empty value, any one of them can be returned.
In other words, rows with the same ID belong to the same group, and within each group the query should return a row that has a value, if there is one.
For example, given this input table:
+----+-------+
| ID | VALUE |
+----+-------+
| x  | 1     |
| x  | 1     |
| y  | 2     |
| y  |       |
| z  |       |
| z  |       |
+----+-------+
Should return:
+----+-------+
| ID | VALUE |
+----+-------+
| x  | 1     |
| y  | 2     |
| z  |       |
+----+-------+

From your description, you can just use max(), which ignores NULL values:
select id, max(value)
from t
group by id;
If you have additional columns that you want, then use row_number(), ordering each group so that rows with a value come first:
select t.*
from (select t.*,
             row_number() over (partition by id order by (case when value is not null then 1 else 0 end) desc) as seqnum
      from t
     ) t
where seqnum = 1;
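Both queries can be tried against the sample table from the question. The following is only a sketch: it assumes SQLite (3.25 or later, for window functions) driven from Python, and the table name t is taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 1), ("y", 2), ("y", None), ("z", None), ("z", None)],
)

# Aggregate form: MAX() skips NULLs, so a group yields NULL only when
# every value in that group is NULL.
max_rows = conn.execute(
    "SELECT id, MAX(value) FROM t GROUP BY id ORDER BY id"
).fetchall()

# Window form: sort rows that have a value first within each id,
# then keep only the first row of each group.
window_rows = conn.execute(
    """
    SELECT id, value
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY id
                     ORDER BY (CASE WHEN value IS NOT NULL THEN 1 ELSE 0 END) DESC
                 ) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

print(max_rows)
print(window_rows)
```

The window form is the one to reach for when the table has more columns than id and value, since it returns whole rows rather than per-column aggregates.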

You can use the DISTINCT keyword in Hive/SQL:
hive> select distinct id, value from <db_name>.<table_name>;
The query above returns the distinct combinations of the id and value columns. Note that a missing value counts as its own combination, so (y, 2) and (y, NULL) would both be returned.
hive> select distinct * from <db_name>.<table_name>;
The statement above returns only the rows that are distinct across all columns.

You can easily divide your query into two queries:
A: find the unique rows with DISTINCT on (ID, VALUE) where VALUE is not empty.
B: find the unique rows with DISTINCT on ID where VALUE is empty and ID is not in A.
The result is A U (B - A).
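One possible reading of this two-part approach, sketched with SQLite from Python. The table name t and the use of NULL for "empty" are assumptions; note that an ID with several different non-empty values would still produce several rows in part A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 1), ("y", 2), ("y", None), ("z", None), ("z", None)],
)

union_rows = conn.execute(
    """
    -- A: distinct (id, value) pairs that have a value
    SELECT DISTINCT id, value FROM t WHERE value IS NOT NULL
    UNION
    -- B: ids whose rows are all empty, i.e. ids that never appear in A
    SELECT DISTINCT id, NULL FROM t
    WHERE id NOT IN (SELECT id FROM t WHERE value IS NOT NULL)
    ORDER BY id
    """
).fetchall()

print(union_rows)
```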

Related

How to count/increment the current number of occurrences of a table column in a MS SQL select

I have a table which looks like this:
id | name| fk_something
----------------
0 | 25 | 3
1 | 25 | 2
2 | 23 | 1
and I want to add another column with a number that increments every time a given name occurs, e.g.:
id | name| fk_something| n
--------------------------
0 | 25 | 3 | 1
1 | 25 | 2 | 2
2 | 23 | 1 | 1
I'm not really sure how to achieve this. Using count() I only get the total number of occurrences of name, but I want n to increment so that each row within a name gets a distinct value.

You want row_number():
select t.*, row_number() over (partition by name order by id) as n
from table t;

You may try using COUNT as an analytic function:
SELECT
id,
name,
fk_something,
COUNT(*) OVER (PARTITION BY name ORDER BY id) n
FROM yourTable
ORDER BY
id;
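A small sketch comparing the two suggestions on the sample data, assuming SQLite 3.25 or later driven from Python and a hypothetical table name your_table. With a unique id as the sort key, ROW_NUMBER() and the running COUNT(*) produce the same numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, name INTEGER, fk_something INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(0, 25, 3), (1, 25, 2), (2, 23, 1)],
)

numbered = conn.execute(
    """
    SELECT id, name, fk_something,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS n_row_number,
           COUNT(*)     OVER (PARTITION BY name ORDER BY id) AS n_running_count
    FROM your_table
    ORDER BY id
    """
).fetchall()

for row in numbered:
    print(row)
```

If the sort key were not unique within a name, the running count would give tied rows the same number, while ROW_NUMBER() would still number them individually.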

How to use DISTINCT ON (of PostgreSQL) in Firebird?

I have a TempTable with the following data:
------------------------------------
| KEY_1 | KEY_2 | NAME | VALUE |
------------------------------------
| 1 | 0001 | NAME 2 | VALUE 1 |
| 1 | 0002 | NAME 1 | VALUE 3 |
| 1 | 0003 | NAME 3 | VALUE 2 |
| 2 | 0001 | NAME 1 | VALUE 2 |
| 2 | 0001 | NAME 2 | VALUE 1 |
------------------------------------
I want to get the following data:
------------------------------------
| KEY_1 | KEY_2 | NAME | VALUE |
------------------------------------
| 1 | 0001 | NAME 2 | VALUE 1 |
| 2 | 0001 | NAME 1 | VALUE 2 |
------------------------------------
In PostgreSQL, I use a query with DISTINCT ON:
SELECT DISTINCT ON (KEY_1) KEY_1, KEY_2, NAME, VALUE
FROM TempTable
ORDER BY KEY_1, KEY_2
In Firebird, how can I get the same result?

PostgreSQL's DISTINCT ON takes the first row per stated group key, considering the ORDER BY clause. In other DBMS (including later versions of Firebird), you'd use ROW_NUMBER for this: you number the rows per group key in the desired order and keep only those numbered 1.
select key_1, key_2, name, value
from
(
select key_1, key_2, name, value,
row_number() over (partition by key_1 order by key_2) as rn
from temptable
) numbered
where rn = 1
order by key_1, key_2;
In your example you have a tie (key_1 = 2 / key_2 = 0001 occurs twice) and the DBMS picks one of the rows arbitrarily. (You'd have to extend the sortkey both in DISTINCT ON and ROW_NUMBER to decide which to pick.) If you want two rows, i.e. showing all tied rows, you'd use RANK (or DENSE_RANK) instead of ROW_NUMBER, which is something DISTINCT ON is not capable of.
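The difference between breaking the tie and showing all tied rows can be sketched on the question's data. This assumes SQLite 3.25 or later driven from Python rather than Firebird, and the extra sort column name is just one arbitrary way to extend the sort key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temptable (key_1 INTEGER, key_2 TEXT, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO temptable VALUES (?, ?, ?, ?)",
    [
        (1, "0001", "NAME 2", "VALUE 1"),
        (1, "0002", "NAME 1", "VALUE 3"),
        (1, "0003", "NAME 3", "VALUE 2"),
        (2, "0001", "NAME 1", "VALUE 2"),
        (2, "0001", "NAME 2", "VALUE 1"),
    ],
)

# ROW_NUMBER with an extended sort key: exactly one row per key_1,
# and the tie on key_2 is broken deterministically by name.
one_per_group = conn.execute(
    """
    SELECT key_1, key_2, name, value
    FROM (SELECT key_1, key_2, name, value,
                 ROW_NUMBER() OVER (PARTITION BY key_1 ORDER BY key_2, name) AS rn
          FROM temptable) AS numbered
    WHERE rn = 1
    ORDER BY key_1
    """
).fetchall()

# RANK keeps every row tied for first place within its group.
with_ties = conn.execute(
    """
    SELECT key_1, key_2, name, value
    FROM (SELECT key_1, key_2, name, value,
                 RANK() OVER (PARTITION BY key_1 ORDER BY key_2) AS rk
          FROM temptable) AS ranked
    WHERE rk = 1
    ORDER BY key_1, name
    """
).fetchall()

print(one_per_group)
print(with_ties)
```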

Firebird 3.0 supports window functions, so you can use:
select . . .
from (select t.*,
row_number() over (partition by key_1 order by key_2) as seqnum
from temptable t
) t
where seqnum = 1;
In earlier versions, you can use several methods. Here is a correlated subquery that keeps the rows with the smallest key_2 per key_1, matching the ORDER BY of the DISTINCT ON query:
select t.*
from temptable t
where t.key_2 = (select min(t2.key_2)
                 from temptable t2
                 where t2.key_1 = t.key_1
                );
Note: This will still return duplicate rows for key_1 when the smallest key_2 is duplicated. Alas . . . getting just one row is tricky unless you have a unique identifier for each row.

SQL query to create ascending values within groups

I have the following table:
+----+--------+-----+
| id | fk_did | pos |
+----+--------+-----+
This table contains hundreds of rows, each of them referencing another table with fk_did. The value in pos is currently always zero which I want to change.
Basically, for each group of fk_did, the pos-column should start at zero and be ascending. It doesn't matter how the rows are ordered.
Example output (select * from table order by fk_did, pos) that I want to get:
+----+--------+-----+
| id | fk_did | pos |
+----+--------+-----+
| xx | 0 | 0 |
| xx | 0 | 1 |
| xx | 0 | 2 |
| xx | 1 | 0 |
| xx | 1 | 1 |
| xx | 1 | 2 |
| xx | 4 | 0 |
| xx | 8 | 0 |
| xx | 8 | 1 |
| xx | 8 | 2 |
+----+--------+-----+
There must be no two rows that have the same combination of fk_did and pos
pos must be ascending for each fk_did
If there is a row with pos > 0, there must also be a row with the same fk_did and a lower pos.
Can this be done with a single update query?

You can do this using a window function:
update the_table
  set pos = t.rn - 1
from (
  select id,
         row_number() over (partition by fk_did) as rn
  from the_table
) t
where t.id = the_table.id;
The ordering of pos will be more or less random, as there is no order by, but you said that doesn't matter.
This assumes that id is unique; if not, you can use the internal column ctid instead.
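A sketch of the same idea that can be run locally. It assumes SQLite 3.25 or later driven from Python instead of PostgreSQL, so the UPDATE ... FROM form is rewritten as a correlated subquery (an adaptation, not the answer's exact statement), and it adds ORDER BY id only to make the result reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (id INTEGER PRIMARY KEY, fk_did INTEGER, pos INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO the_table (id, fk_did) VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 4)],
)

# Number the rows within each fk_did group and store (row number - 1) in pos.
# The subquery reads only id and fk_did, which the update never changes.
conn.execute(
    """
    UPDATE the_table
    SET pos = (
        SELECT numbered.rn - 1
        FROM (SELECT id,
                     ROW_NUMBER() OVER (PARTITION BY fk_did ORDER BY id) AS rn
              FROM the_table) AS numbered
        WHERE numbered.id = the_table.id
    )
    """
)

positions = conn.execute(
    "SELECT fk_did, pos FROM the_table ORDER BY fk_did, pos"
).fetchall()
print(positions)
```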

If id is the PK of your table, then you can use the following query to update your table:
UPDATE mytable
SET pos = t.rn
FROM (
SELECT id, fk_did, pos,
ROW_NUMBER() OVER (PARTITION BY fk_did ORDER BY id) - 1 AS rn
FROM mytable) AS t
WHERE mytable.id = t.id
The ROW_NUMBER window function, used with a PARTITION BY clause, generates sequence numbers starting from 1 for each fk_did slice; subtracting 1 makes pos start at zero.

I'd suggest creating a temporary table (if the id column is not unique):
create temp table tmp_table as
select id, fk_did, row_number() over (partition by fk_did) - 1 pos
from table_name
And then truncate the current table and insert the records from the temp table.

ROW_NUMBER() for rows which consists of more rows

I have this table
ObjectId| Value
---------------------
1 | A
1 | A
1 | A
5 | B
5 | B
5 | B
ordered by Value, and I am trying to get a "row number" like this (each number is shared by multiple rows):
RowNumber | ObjectId | Value
------------------------------------
1 | 1 | A
1 | 1 | A
1 | 1 | A
2 | 5 | B
2 | 5 | B
2 | 5 | B
Any idea?
Thank you

You are looking for dense_rank:
select dense_rank() over (order by Value), ObjectId, Value
from thistable;
You can include two columns like this:
select dense_rank() over (order by ObjectId, Value), ObjectId, Value
from thistable;
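To see why dense_rank fits here, the sketch below runs it next to rank on the question's data. It assumes SQLite 3.25 or later driven from Python; the table name thistable comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thistable (ObjectId INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO thistable VALUES (?, ?)",
    [(1, "A"), (1, "A"), (1, "A"), (5, "B"), (5, "B"), (5, "B")],
)

ranked = conn.execute(
    """
    SELECT DENSE_RANK() OVER (ORDER BY Value) AS RowNumber,
           RANK()       OVER (ORDER BY Value) AS PlainRank,
           ObjectId, Value
    FROM thistable
    ORDER BY Value
    """
).fetchall()

for row in ranked:
    print(row)
```

DENSE_RANK numbers the two groups 1 and 2, while RANK leaves a gap and numbers them 1 and 4.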

Look at dense_rank(); it continues with the next number in sequence after a group of tied rows. It is described as follows:
Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.

MySQL's alternative to T-SQL's WITH TIES

I have a table from which I want to get the top N records. The records are ordered by values and some records have the same values. What I'd like to do here is to get a list of top N records, including the tied ones. This is what's in the table:
+-------+--------+
| Name | Value |
+-------+--------+
| A | 10 |
| B | 30 |
| C | 40 |
| D | 40 |
| E | 20 |
| F | 50 |
+-------+--------+
Now if I want to get the top 3 like so
SELECT * FROM table ORDER BY Value DESC LIMIT 3
I get this:
+-------+--------+
| Name | Value |
+-------+--------+
| F | 50 |
| C | 40 |
| D | 40 |
+-------+--------+
What I would like to get is this
+-------+--------+
| Name | Value |
+-------+--------+
| F | 50 |
| C | 40 |
| D | 40 |
| B | 30 |
+-------+--------+
I calculate the rank of each record so what I would really like is to get the first N ranked records instead of the first N records ordered by value. This is how I calculate the rank:
SELECT Value AS Val, (SELECT COUNT(DISTINCT(Value))+1 FROM table WHERE Value > Val) as Rank
In T-SQL something like this is achievable by doing this:
SELECT TOP 3 WITH TIES * FROM table ORDER BY Value DESC
Does anyone have an idea how to do this in MySQL? I understand it could be done with subqueries or temporary tables but I don't have enough knowledge to accomplish this. I'd prefer a solution without using temporary tables.

Does this work for you?
select Name, Value from table where Value in (
select distinct Value from table order by Value desc limit 3
) order by Value desc
Or perhaps:
select a.Name, a.Value
from table a
join (select distinct Value from table order by Value desc limit 3) b
on a.Value = b.Value

select distinct a.Name, a.Value
from table a
join (select Value from table order by Value desc limit 3) b
on a.Value = b.Value
This is like @Fosco's answer, but without DISTINCT in the subquery; the DISTINCT in the outer select removes the repeated rows that the join produces when a value appears more than once in the subquery. His version returns the players with the top N scores, not the top N players (plus ties). E.g. if the scores are 50, 50, 50, 40, 40, 30, 20, he'll return 6 players (3x50, 2x40, 1x30), but you presumably just want 3x50.

Starting with MySQL 8, you can use window functions to emulate the WITH TIES semantics, by filtering on RANK(). For example:
SELECT Name, Value
FROM (
SELECT Name, Value, RANK() OVER (ORDER BY Value DESC) AS rk
FROM table
) t
WHERE rk <= 3
Note that when reading your question more closely, this doesn't do exactly what you seem to want, but it does exactly what T-SQL can do through the TOP n WITH TIES clause.
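The distinction in the note above can be checked on the question's data. This is a sketch assuming SQLite 3.25 or later driven from Python rather than MySQL 8, with a hypothetical table name scores (table itself is a reserved word). RANK() reproduces the WITH TIES behaviour, while DENSE_RANK() returns the rows holding the top 3 distinct values, which is the output the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (Name TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 10), ("B", 30), ("C", 40), ("D", 40), ("E", 20), ("F", 50)],
)

query = """
    SELECT Name, Value
    FROM (SELECT Name, Value, {fn}() OVER (ORDER BY Value DESC) AS rk
          FROM scores) AS t
    WHERE rk <= 3
    ORDER BY Value DESC, Name
"""

# RANK: F=1, C=2, D=2, B=4 -> top 3 rows plus any rows tied with the last one.
rank_rows = conn.execute(query.format(fn="RANK")).fetchall()

# DENSE_RANK: F=1, C=2, D=2, B=3 -> every row among the top 3 distinct values.
dense_rows = conn.execute(query.format(fn="DENSE_RANK")).fetchall()

print(rank_rows)
print(dense_rows)
```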
