SQL distinct/groupby on combination of columns


I am trying to do a SQL select on a table based on two columns, but not in the usual way where the combination of values in both columns must be unique; I want to select where the value can only appear once in either column.
Given the dataset:
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 4 | 1 | will |
| 3 | 6 | be |
| 2 | 5 | other |
| 5 | 2 | data |
| 6 | 3 | columns |
I need to return either
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 3 | 6 | be |
| 2 | 5 | other |
or
|pkid | fkself | otherData |
|-----+--------+-----------|
| 4 | 1 | will |
| 5 | 2 | data |
| 6 | 3 | columns |
The only way I can think of to do this is to concatenate pkid and fkself in order, so that both row 1 and row 2 would concatenate to 1,4, but I'm not sure how to do that, or if it is even possible.
The rows will have other data columns, but it does not matter which row I get, only that I get each ID only once, whether the value is in pkid or fkself.

You can use least and greatest to get the smaller and the larger of the two values. Putting them in a fixed order gives both rows of a pair the same key. You could concatenate the values as you suggested, but this solution does not need it. With dense_rank you can number the rows within each of those fictional keys, and then keep only the row ranked 1 in each key.
select
    pkid,
    fkself,
    otherData
from
    (select
        pkid,
        fkself,
        otherData,
        dense_rank() over (partition by least(pkid, fkself), greatest(pkid, fkself) order by pkid) as rank
    from
        YourTable t)
where
    rank = 1
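To try the idea above without an Oracle instance, here is a minimal sketch that runs the same shape of query in SQLite from Python. It rests on a few assumptions that are not in the answer: the table name `pairs` is made up, SQLite's two-argument `min()`/`max()` stand in for `least`/`greatest` (which SQLite does not have), and the alias is `rnk` rather than `rank`. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory copy of the question's data; the table name "pairs" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (pkid INTEGER, fkself INTEGER, otherData TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [(1, 4, "there"), (4, 1, "will"), (3, 6, "be"),
     (2, 5, "other"), (5, 2, "data"), (6, 3, "columns")],
)

# Two-argument min()/max() are SQLite's scalar equivalents of least()/greatest().
# Both rows of a pair share the same (smaller, larger) key, so they land in
# the same partition and only the one with the lower pkid is ranked 1.
query = """
SELECT pkid, fkself, otherData
FROM (
    SELECT pkid, fkself, otherData,
           dense_rank() OVER (
               PARTITION BY min(pkid, fkself), max(pkid, fkself)
               ORDER BY pkid
           ) AS rnk
    FROM pairs
)
WHERE rnk = 1
ORDER BY pkid
"""
unique_pairs = conn.execute(query).fetchall()
print(unique_pairs)
```

Ordering each partition by `pkid DESC` instead would return the other acceptable result set from the question.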

Your idea is possible, and it should produce the results you want.
SELECT DISTINCT joinedID
FROM (
    SELECT min(id) & "," & max(id) as joinedID
    FROM (
        SELECT pkid as id, someUniqueValue
        FROM table
        UNION ALL
        SELECT fkself as id, someUniqueValue
        FROM table)
    GROUP BY someUniqueValue )
This will give you a unique list of IDs, concatenated as you like. You can easily include other fields by adding them to each SELECT statement. Also, someUniqueValue can be either an existing unique field, a new unique field, or the concatenated pkid and fkself, if that combination is unique.

"The only way I can think of to do this is to concatenate pkid and fkself in order so that both row 1 and row 2 would concatenate to 1,4, but I'm not sure how to do that, or if it is even possible."
You could do it using a CASE expression in Oracle:
SQL> SELECT * FROM sample;

      PKID     FKSELF
---------- ----------
         1          4
         4          1
         3          6
         2          5
         5          2
         7          7

6 rows selected.

SQL> SELECT DISTINCT *
  FROM (
    SELECT CASE WHEN pkid <= fkself THEN pkid||','||fkself
                ELSE fkself||','||pkid
           END "JOINED"
    FROM sample
  );

JOINED
------
1,4
2,5
3,6
7,7
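The same CASE-based key can be checked quickly in SQLite from Python. This is only a sketch under assumptions: it reuses the `sample` rows shown above, relies on SQLite's `||` operator converting integers to text, and adds an ORDER BY (not in the original query) so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (pkid INTEGER, fkself INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(1, 4), (4, 1), (3, 6), (2, 5), (5, 2), (7, 7)],
)

# Build "smaller,larger" for every row, then let DISTINCT collapse the
# two rows of each pair into a single key.
query = """
SELECT DISTINCT
    CASE WHEN pkid <= fkself THEN pkid || ',' || fkself
         ELSE fkself || ',' || pkid
    END AS joined
FROM sample
ORDER BY joined
"""
joined_keys = [row[0] for row in conn.execute(query)]
print(joined_keys)
```

Note that this returns only the combined key; to carry the other columns along, the key would have to be used for partitioning or grouping rather than selected on its own.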

Related

Count redundant for each value in a column in SQL server

If I have the following table in a SQL Server 2019 database as follows:
|id | name | count |
+-----+--------+--------+
| 1 | rose | 1 |
| 2 | peter | 1 |
| 3 | ann | 1 |
| 4 | rose | 2 |
| 5 | ann | 2 |
| 6 | ann | 3 |
| 7 | mike | 1 |
I would like to find out whether an inserted name already exists in the "name" column and how many times, and write that count next to it, as shown in the "count" column. For example, when ann was inserted the second time I put the count value 2 next to it, and when ann was inserted the third time I put 3 next to it.
How to do that using SQL?
Thank you

Here is one approach using the insert ... select syntax:
insert into mytable (name, cnt)
select v.name, (select count(*) + 1 from mytable t where t.name = v.name)
from (values (@name)) v(name)
The value to insert is given as @name in the derived table values(). A subquery then counts how many rows with that name already exist in the table.

I would suggest that you calculate this information when you query the table, rather than when you insert rows:
select t.*,
row_number() over (partition by name order by id) as count
from t;
The value will always be correct -- even when the data is updated or rows are deleted.
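As a rough check of the query-time approach, the sketch below loads the question's rows into SQLite from Python and computes the running count with row_number(). The table name `people` and the column alias `occurrence` are assumptions made for the example; the question targets SQL Server, where the window function behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(1, "rose"), (2, "peter"), (3, "ann"), (4, "rose"),
     (5, "ann"), (6, "ann"), (7, "mike")],
)

# Number each name's rows in insertion (id) order; nothing is stored,
# so the numbering stays consistent after updates or deletes.
query = """
SELECT id, name,
       row_number() OVER (PARTITION BY name ORDER BY id) AS occurrence
FROM people
ORDER BY id
"""
counted = conn.execute(query).fetchall()
print(counted)
```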

Select Name, Count(*)
From Table
Group By Name

Selecting the first row of group with additional group by columns

Say I have a table with the following results:
How can I select only distinct parent_ids, each with the row that has its minimum object0_behaviour?
Expected output:
parent_id | id | object0_behaviour | type
------------------------------------------
1 | 1 | 5 | IP
2 | 3 | 5 | IP
3 | 5 | 7 | ID
4 | 6 | 7 | ID
5 | 8 | 5 | IP
6 | 18 | 7 | ID
7 | 10 | 7 | ID
8 | 9 | 5 | IP
I have tried:
SELECT parent_id, min(object0_behaviour) FROM table GROUP BY parent_id
It works, but if I want the other two columns as well, I have to add them to the GROUP BY clause, and then I am back to square one.
I saw examples with R : Select the first row by group
Similar output from what I need, but I can't seem to convert it into SQL

You can try using row_number() window function
select *
from (
    select *, row_number() over (partition by parent_id order by object0_behaviour) as rn
    from tablename
) A
where rn = 1
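Here is a small sketch of the row_number() approach, run in SQLite from Python. The question's input table did not survive, so the rows below are hypothetical, chosen only to be consistent with the first three lines of the expected output; the table name `objects` and the `id` tie-breaker in the ORDER BY are assumptions too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects "
    "(parent_id INTEGER, id INTEGER, object0_behaviour INTEGER, type TEXT)"
)
# Hypothetical rows: several candidates per parent_id.
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(1, 1, 5, "IP"), (1, 2, 7, "ID"),
     (2, 3, 5, "IP"), (2, 4, 7, "ID"),
     (3, 5, 7, "ID")],
)

# Rank rows inside each parent_id by object0_behaviour (id breaks ties),
# then keep the top-ranked row together with all of its columns.
query = """
SELECT parent_id, id, object0_behaviour, type
FROM (
    SELECT *,
           row_number() OVER (
               PARTITION BY parent_id
               ORDER BY object0_behaviour, id
           ) AS rn
    FROM objects
)
WHERE rn = 1
ORDER BY parent_id
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

Unlike the GROUP BY attempt, the extra columns come along for free because the filter works on whole rows rather than on aggregates.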

select * from table
join (
SELECT parent_id, min(object0_behaviour) object0_behaviour
FROM table GROUP BY parent_id
) grouped
on grouped.parent_id = table.parent_id
and grouped.object0_behaviour = table.object0_behaviour

How to pivot a table in oracle SQL that has no feature to use as columns?

I have a query that looks like this
select
parentid,
id
from
table
order by
parentid;
The parentid is a reference to another type of object in a different table. The records in this table are additional information about the record in the parent table, and there can be anywhere from 1 to 10 ids associated with a parent id. The records don't have any particular order, either. So right now, the query above returns something like this:
parentid | id
---------------------------
1 10
1 20
1 30
1 40
2 50
2 60
3 70
4 80
4 90
4 100
I'd like to transform the results into a table like this
parentid | id1 | id2 | id3 | id4 ....
--------------------------------------------------------------------
1 10 20 30 40
2 50 60
3 70
4 80 90 100
I don't really care what column the ids end up in, since there's no order, but I do want each of them to be assigned to some column associated with the parent id. I thought about using pivot, but the examples I have seen make it look like you have to have an ordering or some other unique identifier associated with the ids to transform them into columns. There's no such field that could order or otherwise distinguish these records from one another. Is there a way to pivot without this, or to randomly assign some attribute that I could then use to pivot on?
Also, not sure if it will matter to the answer, but the table above is also a trivialization of the actual data for the sake of clarity - in reality there's tens of thousands of parent ids and records in this table.

Just create your column:
SELECT 'ID' || ROW_NUMBER() OVER (PARTITION BY "parentid" ORDER BY "id") AS rn,
"parentid",
"id"
FROM Table1
OUTPUT
| RN | parentid | id |
|-----|----------|-----|
| ID1 | 1 | 10 |
| ID2 | 1 | 20 |
| ID3 | 1 | 30 |
| ID4 | 1 | 40 |
| ID1 | 2 | 50 |
| ID2 | 2 | 60 |
| ID1 | 3 | 70 |
| ID1 | 4 | 80 |
| ID2 | 4 | 90 |
| ID3 | 4 | 100 |
Or use this version if you have more than 9 columns, so that the zero-padded labels (ID01, ID02, ...) sort correctly:
SELECT 'ID' || LPAD(rn, 2, '0') as rn,
"parentid",
"id"
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY "parentid" ORDER BY "id") AS rn,
"parentid",
"id"
FROM Table1
) T
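Once each id has a generated position, the rows still have to be spread into columns. One way to sketch that step is conditional aggregation, shown here in SQLite from Python. This is a swapped-in technique rather than Oracle's PIVOT clause, the table name `child` is an assumption, and only four id columns are produced to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (parentid INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO child VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (1, 40), (2, 50),
     (2, 60), (3, 70), (4, 80), (4, 90), (4, 100)],
)

# Step 1 (inner query): number the ids within each parent.
# Step 2 (outer query): route each numbered id into its own column;
# positions a parent does not have stay NULL.
query = """
SELECT parentid,
       MAX(CASE WHEN rn = 1 THEN id END) AS id1,
       MAX(CASE WHEN rn = 2 THEN id END) AS id2,
       MAX(CASE WHEN rn = 3 THEN id END) AS id3,
       MAX(CASE WHEN rn = 4 THEN id END) AS id4
FROM (
    SELECT parentid, id,
           row_number() OVER (PARTITION BY parentid ORDER BY id) AS rn
    FROM child
)
GROUP BY parentid
ORDER BY parentid
"""
pivoted = conn.execute(query).fetchall()
print(pivoted)
```

With up to 10 ids per parent, the same pattern would simply be extended to id10.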

ROW_NUMBER() for rows which consists of more rows

I have this table
ObjectId| Value
---------------------
1 | A
1 | A
1 | A
5 | B
5 | B
5 | B
It is ordered by Value, and I am trying to get a "row number" like this, where one logical row consists of multiple physical rows:
RowNumber | ObjectId | Value
------------------------------------
1 | 1 | A
1 | 1 | A
1 | 1 | A
2 | 5 | B
2 | 5 | B
2 | 5 | B
Any idea?
Thank you

You are looking for dense_rank:
select dense_rank() over (order by Value), ObjectId, Value
from thistable;
You can include two columns like this:
select dense_rank() over (order by ObjectId, Value), ObjectId, Value
from thistable;
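A quick way to see the effect is to run the first query against the question's six rows. The sketch below does so in SQLite from Python; the ORDER BY on the outer query and the `RowNumber` alias are additions for a stable, readable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thistable (ObjectId INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO thistable VALUES (?, ?)",
    [(1, "A"), (1, "A"), (1, "A"), (5, "B"), (5, "B"), (5, "B")],
)

# dense_rank() gives tied rows the same number and does not skip
# numbers afterwards, so the groups are numbered 1, 2, ...
query = """
SELECT dense_rank() OVER (ORDER BY Value) AS RowNumber, ObjectId, Value
FROM thistable
ORDER BY RowNumber
"""
ranked = conn.execute(query).fetchall()
print(ranked)
```

With rank() instead, the second group would be numbered 4, because rank() leaves gaps after ties.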

Look at dense_rank(): it continues with the next number in sequence. The documentation describes it as follows:
Returns the rank of rows within the partition of a result set, without
any gaps in the ranking. The rank of a row is one plus the number of
distinct ranks that come before the row in question.

Sqlite: Select last row group by 2 column

I'm trying to get the last row of my table, but grouped by two columns.
+----+-----+---------+
| id1| id2 | info |
+----+-----+---------+
| 1 | 2 | info |
| 2 | 1 | NULL |
| 2 | 3 | info |
| 2 | 1 | NULL |
+----+-----+---------+
I tried:
SELECT * FROM table GROUP BY id1
but I got:
1 2
2 3
2 1
What I need:
2 3
2 1
In other words, I need the last row of each pair of ids.
Any idea?

SELECT DISTINCT id1, id2 FROM table WHERE id1=2
This should do the trick. Unless you want to apply an aggregate function to other columns, SELECT DISTINCT is enough: it drops any duplicate rows.

If you want to get all items with the highest value dynamically, you can use:
SELECT DISTINCT id1, id2 FROM table WHERE id1=(SELECT MAX(id1) FROM table)
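The MAX(id1) filter can be tried on the question's four rows with the sketch below, run in SQLite from Python. Two assumptions: the table is called `pairs` here because `table` is a reserved word, and the subquery is given its own FROM clause so that MAX(id1) is computed over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (id1 INTEGER, id2 INTEGER, info TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [(1, 2, "info"), (2, 1, None), (2, 3, "info"), (2, 1, None)],
)

# Keep only the rows whose id1 equals the largest id1 in the table,
# and let DISTINCT remove the repeated (2, 1) pair.
query = """
SELECT DISTINCT id1, id2
FROM pairs
WHERE id1 = (SELECT MAX(id1) FROM pairs)
ORDER BY id2
"""
latest_pairs = conn.execute(query).fetchall()
print(latest_pairs)
```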
