How to use DISTINCT ON (of PostgreSQL) in Firebird?

I have a TempTable with the following data:
------------------------------------
| KEY_1 | KEY_2 | NAME | VALUE |
------------------------------------
| 1 | 0001 | NAME 2 | VALUE 1 |
| 1 | 0002 | NAME 1 | VALUE 3 |
| 1 | 0003 | NAME 3 | VALUE 2 |
| 2 | 0001 | NAME 1 | VALUE 2 |
| 2 | 0001 | NAME 2 | VALUE 1 |
------------------------------------
I want to get the following data:
------------------------------------
| KEY_1 | KEY_2 | NAME | VALUE |
------------------------------------
| 1 | 0001 | NAME 2 | VALUE 1 |
| 2 | 0001 | NAME 1 | VALUE 2 |
------------------------------------
In PostgreSQL, I use a query with DISTINCT ON:
SELECT DISTINCT ON (KEY_1) KEY_1, KEY_2, NAME, VALUE
FROM TempTable
ORDER BY KEY_1, KEY_2
How can I get the same result in Firebird?

PostgreSQL's DISTINCT ON takes the first row per stated group key according to the ORDER BY clause. In other DBMS (including Firebird 3.0 and later), you'd use ROW_NUMBER for this: you number the rows per group key in the desired order and keep the rows numbered 1.
select key_1, key_2, name, value
from
(
select key_1, key_2, name, value,
row_number() over (partition by key_1 order by key_2) as rn
from temptable
) numbered
where rn = 1
order by key_1, key_2;
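To try the ROW_NUMBER query without a Firebird server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions); the rows are the ones from the question, and key_2 is stored as text so the leading zeros survive.

```python
import sqlite3

# In-memory database used only to try the query (assumes SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("create table temptable (key_1 integer, key_2 text, name text, value text)")
conn.executemany(
    "insert into temptable values (?, ?, ?, ?)",
    [
        (1, "0001", "NAME 2", "VALUE 1"),
        (1, "0002", "NAME 1", "VALUE 3"),
        (1, "0003", "NAME 3", "VALUE 2"),
        (2, "0001", "NAME 1", "VALUE 2"),
        (2, "0001", "NAME 2", "VALUE 1"),
    ],
)

# Number the rows per key_1 in key_2 order and keep row number 1,
# the row DISTINCT ON (key_1) ... ORDER BY key_1, key_2 would return.
rows = conn.execute(
    """
    select key_1, key_2, name, value
    from (
      select key_1, key_2, name, value,
             row_number() over (partition by key_1 order by key_2) as rn
      from temptable
    ) numbered
    where rn = 1
    order by key_1, key_2
    """
).fetchall()

for row in rows:
    print(row)
```

For key_1 = 2 the two candidate rows tie on key_2, so which one comes back is up to the database.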
In your example you have a tie (key_1 = 2 / key_2 = 0001 occurs twice), and the DBMS picks one of the rows arbitrarily. (You'd have to extend the sort key in both DISTINCT ON and ROW_NUMBER to decide which one to pick.) If you want both rows, i.e. all tied rows, you'd use RANK (or DENSE_RANK) instead of ROW_NUMBER, which is something DISTINCT ON cannot do.
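The difference between extending the sort key and switching to RANK can be seen on the same data. This is a sketch on in-memory SQLite via Python's sqlite3 (assumes SQLite 3.25+); using name as the tie-breaker is an arbitrary choice for illustration, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table temptable (key_1 integer, key_2 text, name text, value text)")
conn.executemany(
    "insert into temptable values (?, ?, ?, ?)",
    [
        (1, "0001", "NAME 2", "VALUE 1"),
        (1, "0002", "NAME 1", "VALUE 3"),
        (1, "0003", "NAME 3", "VALUE 2"),
        (2, "0001", "NAME 1", "VALUE 2"),
        (2, "0001", "NAME 2", "VALUE 1"),
    ],
)

# ROW_NUMBER with name added to the sort key: the tie on key_2 is broken
# deterministically, so exactly one row per key_1 comes back.
one_per_key = conn.execute(
    """
    select key_1, key_2, name, value
    from (
      select t.*, row_number() over (partition by key_1 order by key_2, name) as rn
      from temptable t
    ) numbered
    where rn = 1
    order by key_1
    """
).fetchall()

# RANK gives tied rows the same number, so every row tied for first is kept.
all_ties = conn.execute(
    """
    select key_1, key_2, name, value
    from (
      select t.*, rank() over (partition by key_1 order by key_2) as rnk
      from temptable t
    ) ranked
    where rnk = 1
    order by key_1, name
    """
).fetchall()

print(one_per_key)
print(all_ties)
```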

Firebird 3.0 supports window functions, so you can use:
select t.key_1, t.key_2, t.name, t.value
from (select t.*,
row_number() over (partition by key_1 order by key_2) as seqnum
from temptable t
) t
where seqnum = 1;
In earlier versions, you can use several methods. Here is a correlated subquery:
select t.*
from temptable t
where t.key_2 = (select min(t2.key_2)
from temptable t2
where t2.key_1 = t.key_1
);
Note: This will still return duplicate values for key_1 because of the duplicate key_2 values. Unfortunately, getting just one row is tricky unless you have a unique identifier for each row.
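For versions without window functions, here is a sketch of the correlated subquery (using min(key_2), which matches the ORDER BY KEY_1, KEY_2 of the DISTINCT ON query) and of one way to break the tie once a unique column exists. It runs on in-memory SQLite via Python's sqlite3; the id column is hypothetical, since the question's table has no unique identifier. Both queries avoid window functions, on the assumption that the same plain SQL shape carries over to older Firebird versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical unique column, added only for the tie-break query.
conn.execute(
    "create table temptable (id integer primary key, key_1 integer, key_2 text, name text, value text)"
)
conn.executemany(
    "insert into temptable values (?, ?, ?, ?, ?)",
    [
        (1, 1, "0001", "NAME 2", "VALUE 1"),
        (2, 1, "0002", "NAME 1", "VALUE 3"),
        (3, 1, "0003", "NAME 3", "VALUE 2"),
        (4, 2, "0001", "NAME 1", "VALUE 2"),
        (5, 2, "0001", "NAME 2", "VALUE 1"),
    ],
)

# Rows whose key_2 is the smallest for their key_1; the tie for key_1 = 2
# means both of its rows qualify.
with_ties = conn.execute(
    """
    select t.key_1, t.key_2, t.name, t.value
    from temptable t
    where t.key_2 = (select min(t2.key_2)
                     from temptable t2
                     where t2.key_1 = t.key_1)
    order by t.key_1, t.id
    """
).fetchall()

# With a unique id: among the rows with the smallest key_2, keep the lowest id.
one_per_key = conn.execute(
    """
    select t.key_1, t.key_2, t.name, t.value
    from temptable t
    where t.id = (select min(t2.id)
                  from temptable t2
                  where t2.key_1 = t.key_1
                    and t2.key_2 = (select min(t3.key_2)
                                    from temptable t3
                                    where t3.key_1 = t.key_1))
    order by t.key_1
    """
).fetchall()

print(with_ties)
print(one_per_key)
```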

Related

Grouping using row number function

I have been using the row_number() function to select only the observations I need.
In my scenario, whenever there are two different names for a particular <id, entity_id, period, element>, the National one should be left out. If there is only one, take it.
+----+-----------+--------+---------------+---------------------------+
| id | entity_id | period | element | name |
+----+-----------+--------+---------------+---------------------------+
| 12 | ABC123 | 2021 | Overall value | National Compatible - XYZ |
| 12 | ABC123 | 2021 | Overall value | Overall Estimation |
+----+-----------+--------+---------------+---------------------------+
With cases like above, the following did the trick:
SELECT *
FROM (SELECT *,
Row_number()
OVER (
partition BY id, entity_id, period, element
ORDER BY NAME DESC) AS rn
FROM mydata) t
WHERE t.rn = 1
Problem is that now there are other cases like the following:
+----+-----------+--------+---------------+---------------------------+
| id | entity_id | period | element | name |
+----+-----------+--------+---------------+---------------------------+
| 12 | ABC123 | 2021 | Overall value | National Based - ZYX |
| 12 | ABC123 | 2021 | Overall value | Base Estimation |
+----+-----------+--------+---------------+---------------------------+
With the current SQL this would not work, as I would have to change the ORDER BY from descending to ascending.
Is there a way to de-prioritize the "National..." record and take the other one when there are multiple?
I am running the query on Hive/Impala.
If you add another derived-table layer (or use a CTE) then you can add a CASE WHEN to check for "name" starting with 'National' and give it a simple integer "tag" value you can use to de-prioritize those rows.
...like so:
WITH q AS (
SELECT
id,
entity_id,
period,
element,
name,
-- 1 for 'National...' rows, 2 for all others
CASE WHEN name LIKE 'National%' THEN 1 ELSE 2 END AS tag
FROM
mydata
),
filtered AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY
id, entity_id, period, element
ORDER BY
tag DESC, -- non-National rows (tag 2) sort first
name DESC
) AS rn
FROM
q
)
SELECT
*
FROM
filtered
WHERE
rn = 1
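A small sketch of the tag idea, run on in-memory SQLite through Python's sqlite3 rather than Hive/Impala (assumes SQLite 3.25+). Column names are written unquoted: in HiveQL and Impala a double-quoted "name" is a string literal, not a column reference. The two groups from the question get the hypothetical ids 12 and 13 so they land in separate partitions, and id 14 is a hypothetical group with a single National row, to show it is still returned when it is the only candidate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mydata (id integer, entity_id text, period integer, element text, name text)"
)
conn.executemany(
    "insert into mydata values (?, ?, ?, ?, ?)",
    [
        (12, "ABC123", 2021, "Overall value", "National Compatible - XYZ"),
        (12, "ABC123", 2021, "Overall value", "Overall Estimation"),
        (13, "ABC123", 2021, "Overall value", "National Based - ZYX"),
        (13, "ABC123", 2021, "Overall value", "Base Estimation"),
        (14, "ABC123", 2021, "Overall value", "National Only - ABC"),  # hypothetical single-row group
    ],
)

rows = conn.execute(
    """
    with q as (
      select id, entity_id, period, element, name,
             -- 1 for 'National...' rows, 2 otherwise; tag desc puts them last
             case when name like 'National%' then 1 else 2 end as tag
      from mydata
    ),
    filtered as (
      select q.*,
             row_number() over (
               partition by id, entity_id, period, element
               order by tag desc, name desc
             ) as rn
      from q
    )
    select id, name
    from filtered
    where rn = 1
    order by id
    """
).fetchall()

print(rows)
```

Because the tag is compared first, the name ordering only matters between rows of the same kind, so neither example needs its ORDER BY direction flipped.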

Select the highest value of column 2 per column 1

Given the following table P_PROV
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 1 |19/06/2019 | 1 |
| 2 |18/07/2010 | 2 |
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
I want this output
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
In other words, for each person I want the row with the maximum date. I tried something like this:
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This returns only one row (of course, because the subquery's MAX returns the single maximum date of the whole table), but I really don't know how to approach this issue. Any kind of help would be appreciated.
ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM P_PROV t
) t
WHERE rn = 1;
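Here is a sketch of the ROW_NUMBER answer on the question's P_PROV rows, using in-memory SQLite via Python's sqlite3 (assumes SQLite 3.25+). The dates are stored as ISO 'YYYY-MM-DD' text, an assumption made so that sorting the text matches sorting by date; a real table would normally use a DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table p_prov (id integer, date text, person_id integer)")
conn.executemany(
    "insert into p_prov values (?, ?, ?)",
    [
        (1, "2019-06-19", 1),
        (2, "2010-07-18", 2),
        (3, "2020-06-19", 1),
        (4, "2020-06-17", 2),
        (5, "2020-06-28", 3),
    ],
)

# Newest row first within each person_id; keep only that row.
latest = conn.execute(
    """
    select id, date, person_id
    from (
      select t.*, row_number() over (partition by person_id order by date desc) as rn
      from p_prov t
    ) t
    where rn = 1
    order by person_id
    """
).fetchall()

print(latest)
```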
Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
Given that your ids increase along with the dates, this probably also does what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;
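The "ids are increasing" condition matters: max(id) and max(date) are computed independently, so they only describe the same row when ids grow with dates. A sketch on in-memory SQLite via Python's sqlite3, with one hypothetical extra row (id 6, an older date for person 1) added to show the two columns coming from different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table p_prov (id integer, date text, person_id integer)")
conn.executemany(
    "insert into p_prov values (?, ?, ?)",
    [
        (1, "2019-06-19", 1),
        (2, "2010-07-18", 2),
        (3, "2020-06-19", 1),
        (4, "2020-06-17", 2),
        (5, "2020-06-28", 3),
    ],
)

query = """
    select max(id) as id, max(date) as date, person_id
    from p_prov
    group by person_id
    order by person_id
"""

# With the question's data, ids increase with dates, so each result is a real row.
before = conn.execute(query).fetchall()

# Hypothetical row: a higher id with an older date for person 1.
conn.execute("insert into p_prov values (6, '2015-01-01', 1)")
after = conn.execute(query).fetchall()

print(before)
print(after)  # person 1 now pairs id 6 with the 2020-06-19 date of id 3
```

When that ordering cannot be guaranteed, the ROW_NUMBER or KEEP (DENSE_RANK FIRST ...) versions are the safer choice, since they take id and date from the same row.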

SQL query for selecting multiple records for one product for a single id

My table looks like this. What I'm trying to achieve is to pull out, for each user, all the records of the product that has the earliest date:
product |type_id| user | Date |Desired ROW_NUMBER as output |
-------+--------+------+-------+---------------------
1 | 1 | A | 0101 | 1
1 | 1 | A | 0102 | 1
2 | 3 | A | 0105 | 2
2 | 5 | A | 0105 | 2
3 | 7 | B | 0101 | 1
3 | 8 | B | 0104 | 1
So I want to pull all the records with "1" in the desired row_num column, but I haven't figured out how to get this without doing another GROUP BY. Any help would be appreciated.
You can use window functions:
select t.*
from (select t.*,
rank() over (partition by user order by min_date) as seqnum
from (select t.*,
min(date) over (partition by user, product) as min_date
from t
) t
) t
where seqnum = 1;
Or, with only one subquery:
select t.*
from (select t.*,
min(date) over (partition by user, product) as min_date_up,
min(date) over (partition by user) as min_date_u
from t
) t
where min_date_u = min_date_up;
You can interpret this as "return all rows where the product has the minimum date for the user".
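Both queries can be checked against the question's rows. A sketch on in-memory SQLite via Python's sqlite3 (assumes SQLite 3.25+); the table is named t as in the answer, the dates are kept as the question's four-character text values (which sort correctly as strings), and the nested subqueries get distinct aliases for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (product integer, type_id integer, user text, date text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, 1, "A", "0101"),
        (1, 1, "A", "0102"),
        (2, 3, "A", "0105"),
        (2, 5, "A", "0105"),
        (3, 7, "B", "0101"),
        (3, 8, "B", "0104"),
    ],
)

# Version 1: rank each user's products by their earliest date; tied rows share rank 1.
ranked = conn.execute(
    """
    select product, type_id, user, date
    from (select with_min.*,
                 rank() over (partition by user order by min_date) as seqnum
          from (select t.*,
                       min(date) over (partition by user, product) as min_date
                from t) with_min) ranked
    where seqnum = 1
    order by user, product, date
    """
).fetchall()

# Version 2: keep rows whose product's earliest date equals the user's earliest date.
compared = conn.execute(
    """
    select product, type_id, user, date
    from (select t.*,
                 min(date) over (partition by user, product) as min_date_up,
                 min(date) over (partition by user) as min_date_u
          from t) x
    where min_date_u = min_date_up
    order by user, product, date
    """
).fetchall()

print(ranked)
print(compared)
```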
SELECT * FROM [tableName] WHERE [Desired ROW_NUMBER] = 1 ORDER BY Date DESC -- or ASC
Pass the Desired ROW_NUMBER value dynamically as a parameter.

SQL query to create ascending values within groups

I have the following table:
+----+--------+-----+
| id | fk_did | pos |
+----+--------+-----+
This table contains hundreds of rows, each referencing another table via fk_did. The value in pos is currently always zero, which I want to change.
Basically, for each fk_did group, the pos column should start at zero and ascend. It doesn't matter how the rows are ordered.
Example of the output I want to get (select * from table order by fk_did, pos):
+----+--------+-----+
| id | fk_did | pos |
+----+--------+-----+
| xx | 0 | 0 |
| xx | 0 | 1 |
| xx | 0 | 2 |
| xx | 1 | 0 |
| xx | 1 | 1 |
| xx | 1 | 2 |
| xx | 4 | 0 |
| xx | 8 | 0 |
| xx | 8 | 1 |
| xx | 8 | 2 |
+----+--------+-----+
No two rows may have the same combination of fk_did and pos.
pos must be ascending within each fk_did.
If there is a row with pos > 0, there must also be a row with the same fk_did and a lower pos.
Can this be done with a single update query?
You can do this using a window function:
update the_table
set pos = t.rn - 1
from (
select id,
row_number() over (partition by fk_did) as rn
from the_table
) t
where t.id = the_table.id;
The ordering of pos will be more or less random, as there is no order by, but you said that doesn't matter.
This assumes that id is unique; if it is not, you can use PostgreSQL's internal ctid column instead.
If id is the PK of your table, then you can use the following query to update your table:
UPDATE mytable
SET pos = t.rn
FROM (
SELECT id, fk_did, pos,
ROW_NUMBER() OVER (PARTITION BY fk_did ORDER BY id) - 1 AS rn
FROM mytable) AS t
WHERE mytable.id = t.id
The ROW_NUMBER window function, used with a PARTITION BY clause, generates sequence numbers starting from 1 within each fk_did slice; subtracting 1 makes pos start at 0.
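Here is a sketch of the same renumbering on in-memory SQLite via Python's sqlite3, with hypothetical rows. SQLite only gained UPDATE ... FROM in version 3.33, so to run on older builds the sketch uses a correlated subquery in SET instead; that is a swap for portability, not the PostgreSQL syntax of the answer. It assumes id is unique and SQLite 3.25+ for ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key, fk_did integer, pos integer)")
# Hypothetical rows: every pos starts at 0, as in the question.
conn.executemany(
    "insert into mytable (id, fk_did, pos) values (?, ?, 0)",
    [(1, 8), (2, 0), (3, 1), (4, 0), (5, 4), (6, 8), (7, 1), (8, 0), (9, 1), (10, 8)],
)

# pos = (position of the row within its fk_did group, ordered by id) - 1
conn.execute(
    """
    update mytable
    set pos = (select t.rn - 1
               from (select id,
                            row_number() over (partition by fk_did order by id) as rn
                     from mytable) t
               where t.id = mytable.id)
    """
)

result = conn.execute("select fk_did, pos from mytable order by fk_did, pos").fetchall()
print(result)
```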
If the id column is not unique, I'd suggest creating a temporary table:
create temp table tmp_table as
select id, fk_did, row_number() over (partition by fk_did) - 1 as pos
from table_name;
Then truncate the current table and insert the records from the temp table.
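A sketch of the temp-table route on in-memory SQLite via Python's sqlite3, with hypothetical rows in which id repeats (assumes SQLite 3.25+). SQLite has no TRUNCATE, so DELETE without a WHERE clause stands in for it here; in PostgreSQL the TRUNCATE from the answer would be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (id integer, fk_did integer, pos integer)")
# Hypothetical rows; ids 1 and 3 are duplicated on purpose.
conn.executemany(
    "insert into table_name values (?, ?, 0)",
    [(1, 0), (1, 0), (2, 1), (3, 1), (3, 1), (4, 4)],
)

# Compute the new pos per fk_did group into a temp table.
conn.execute(
    """
    create temp table tmp_table as
    select id, fk_did, row_number() over (partition by fk_did) - 1 as pos
    from table_name
    """
)
conn.execute("delete from table_name")  # stands in for TRUNCATE
conn.execute("insert into table_name (id, fk_did, pos) select id, fk_did, pos from tmp_table")
conn.execute("drop table tmp_table")

# Collect the pos values per fk_did to check they run 0, 1, 2, ...
positions = {}
for fk_did, pos in conn.execute("select fk_did, pos from table_name"):
    positions.setdefault(fk_did, []).append(pos)
print(positions)
```

Without an ORDER BY inside OVER, which row in a group gets which pos is not defined, but each group still receives 0 to n-1 exactly once.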

Select entire partition where max row in partition is greater than 1

I'm partitioning by a non-unique identifier, but I'm only interested in the partitions with at least two rows. How can I filter out all the instances where there's exactly one row for the identifier?
Query I'm using:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
What I'm getting:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1235 | 2014-10-08...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
What I want:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
Thanks for any direction :)
Based on the syntax, I'm assuming this is SQL Server 2005 or higher, so my answer targets that.
You have a couple options.
One, use a CTE:
;WITH CTE AS (
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
)
SELECT *
FROM CTE t
WHERE EXISTS (SELECT 1 FROM CTE WHERE row = 2 and nonUniqueId = t.nonUniqueId);
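A sketch of the CTE answer on in-memory SQLite via Python's sqlite3, with the question's ids and made-up timestamps (assumes SQLite 3.25+). The leading semicolon, a T-SQL statement-terminator habit, is dropped, and the row-number column is called rn instead of row, since ROW is a keyword in several dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (nonUniqueId integer, aTimeStamp text)")
conn.executemany(
    "insert into myTable values (?, ?)",
    [
        (1234, "2014-10-08"),
        (1234, "2014-10-09"),
        (1235, "2014-10-08"),
        (1236, "2014-10-08"),
        (1236, "2014-10-09"),
    ],
)

rows = conn.execute(
    """
    with cte as (
      select row_number() over (partition by nonUniqueId order by aTimeStamp) as rn,
             nonUniqueId,
             aTimeStamp
      from myTable
    )
    select *
    from cte t
    -- keep an id only if it has a second row
    where exists (select 1 from cte where rn = 2 and nonUniqueId = t.nonUniqueId)
    order by nonUniqueId, rn
    """
).fetchall()

print(rows)
```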
Or, you can use subqueries:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable t
WHERE EXISTS (SELECT 1 FROM myTable
WHERE nonUniqueId = t.nonUniqueId GROUP BY nonUniqueId HAVING COUNT(*) >= 2);
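A sketch of the subquery form on in-memory SQLite via Python's sqlite3, grouping by nonUniqueId alone so that HAVING COUNT(*) counts all of an id's rows. Next to it is a different technique that is not part of the answer: COUNT(*) used as a window function, which counts each id's rows without a correlated subquery. Same made-up timestamps as before; assumes SQLite 3.25+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (nonUniqueId integer, aTimeStamp text)")
conn.executemany(
    "insert into myTable values (?, ?)",
    [
        (1234, "2014-10-08"),
        (1234, "2014-10-09"),
        (1235, "2014-10-08"),
        (1236, "2014-10-08"),
        (1236, "2014-10-09"),
    ],
)

# Subquery form: the id must have at least two rows in total.
via_exists = conn.execute(
    """
    select row_number() over (partition by nonUniqueId order by aTimeStamp) as rn,
           nonUniqueId,
           aTimeStamp
    from myTable t
    where exists (select 1 from myTable
                  where nonUniqueId = t.nonUniqueId
                  group by nonUniqueId
                  having count(*) >= 2)
    order by nonUniqueId, rn
    """
).fetchall()

# Alternative: count the rows per id with a window function, then filter.
via_window_count = conn.execute(
    """
    select rn, nonUniqueId, aTimeStamp
    from (select row_number() over (partition by nonUniqueId order by aTimeStamp) as rn,
                 count(*) over (partition by nonUniqueId) as cnt,
                 nonUniqueId,
                 aTimeStamp
          from myTable) counted
    where cnt >= 2
    order by nonUniqueId, rn
    """
).fetchall()

print(via_exists)
print(via_window_count)
```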