How to select unique rows (comparing by a few columns)? - sql

I want to select unique rows from a table (without repeating the combination of 'f' and 'x' fields).
The table:
| f | x | z |
|---|---|---|
| 1 | 1 | a |
| 1 | 2 | b |
| 1 | 3 | c |
| 1 | 3 | d |
The result:
| f | x | z |
|---|---|---|
| 1 | 1 | a |
| 1 | 2 | b |

The following query groups the rows in "the_table" by "f" and "x", keeps only the groups that contain exactly one row (HAVING COUNT(*) = 1), and uses MIN(z) to return the single "z" value of each remaining group, so only unique combinations of "f" and "x" are returned.
SELECT f, x, MIN(z) AS z
FROM the_table
GROUP BY f, x
HAVING COUNT(*) = 1;
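For reference, a minimal sketch that reproduces the example above (the table name the_table is taken from the query, and the inserted rows are the sample data from the question):
-- hypothetical sample data matching the question
CREATE TABLE the_table (f int, x int, z text);
INSERT INTO the_table (f, x, z) VALUES
    (1, 1, 'a'),
    (1, 2, 'b'),
    (1, 3, 'c'),
    (1, 3, 'd');

-- keep only (f, x) combinations that occur exactly once
SELECT f, x, MIN(z) AS z
FROM the_table
GROUP BY f, x
HAVING COUNT(*) = 1;
-- expected result: (1, 1, 'a') and (1, 2, 'b')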

-- Alternative using a window function: COUNT(*) OVER (PARTITION BY f, x)
-- attaches the size of each (f, x) group to every row without collapsing
-- the rows, so the outer query can keep only the combinations that occur once.
WITH check_repetitions AS
(
    SELECT
        *,
        COUNT(*) OVER (PARTITION BY f, x) AS repetitions
    FROM
        your_table
)
SELECT
    f, x, z
FROM
    check_repetitions
WHERE
    repetitions = 1;

You can use the following query to select only rows where the combination of columns f and x does not repeat:
SELECT f, x, MIN(z) AS z
FROM table_name
GROUP BY f, x
HAVING COUNT(*) = 1
This query groups the rows by the values of f and x, and then returns only the groups where the combination of f and x occurs exactly once. MIN is used to produce a single z value per group (each surviving group contains exactly one row, so it is simply that row's z).

Related

How do I return two lines of unions and a count for each row?

I'm trying to stack 10 columns into one column, stack 10 other columns into a second column, and get a count for each unique pair in a third column.
So far I've got
UNION
(SELECT a_3 as x, a_4 as y FROM result)
....
UNION
(SELECT a_19 as x, a_20 as y FROM result)
This gives me a table as follows
| x | y |
---------
| a | 1 |
| a | 2 |
| b | 1 |
| b | 3 |
etc...
I want this, however I also want a third column counting how many times each row occurs, like below
| x | y |count|
---------------
| a | 1 | 10 |
| a | 2 | 3 |
| b | 1 | 6 |
| b | 3 | 2 |
etc...
I can also do:
select count(*) from (insert above union SQL)
but then I just get a total number for the table.
Thanks!
Looks like your original table design is not normalized at all.
Instead of multiple UNION queries, you can use CROSS APPLY with a table value constructor to simplify the query:
select x, y, count(*) as [count]
from [result] t
cross apply
(
values (a_1, a_2), (a_3, a_4), (a_5, a_6),
. . .
(a_19, a_20)
) v (x, y)
group by x, y
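As a hedged, self-contained sketch of the same idea, assuming only two column pairs (a_1, a_2) and (a_3, a_4) so the VALUES list can be written out in full:
-- unpivot the column pairs with a table value constructor, then count each pair
select v.x, v.y, count(*) as [count]
from [result] t
cross apply
(
    values (t.a_1, t.a_2),
           (t.a_3, t.a_4)
) v (x, y)
group by v.x, v.y;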
Note that UNION ALL (rather than UNION) is needed inside the derived table, otherwise duplicate (x, y) pairs are removed before they can be counted:
SELECT x, y, COUNT(1) AS [COUNT] FROM
(
SELECT a_3 AS x, a_4 AS y FROM result
....
UNION ALL
SELECT a_19 AS x, a_20 AS y FROM result
) x
GROUP BY x, y

Group observations with SQL and Specifying in same group

I have a table consisting of two columns (X, Y) that represent relations between observations, like below.
| X | Y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| A | B |
| B | C |
I want to create a new column that represents the relation between observations: 1 becomes 2, 2 becomes 3, 3 becomes 4, so I want to show these chained values in the same group (1, 2, 3, 4 all belong to the same group). The table should look like below.
| X | Y | Z       |
|---|---|---------|
| 1 | 2 | Group 1 |
| 2 | 3 | Group 1 |
| 3 | 4 | Group 1 |
| A | B | Group 2 |
| B | C | Group 2 |
I am using SAS Enterprise Guide. A solution with PROC SQL or any SQL dialect would be great; I mainly need the logic.
Note: I have no additional information except this table.
Try the following. The example below is in PostgreSQL, but you may be able to use the same logic.
with cte as
(
    select
        *,
        -- previous row's y (ordered by x); the chain continues when
        -- the current x equals this value
        lag(y) over (order by x) as rnk
    from myTable
)
select
    x,
    y,
    -- start a new group (add 1) whenever the chain breaks
    concat('Group ', sum(case when x = rnk then 0 else 1 end) over (order by x)) as z
from cte;
Output:
| x | y | z |
| --- | --- | ------- |
| 1 | 2 | Group 1 |
| 2 | 3 | Group 1 |
| 3 | 4 | Group 1 |
| A | B | Group 2 |
| B | C | Group 2 |
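The logic: lag(y) pulls the previous row's y (ordered by x); while each row's x equals the previous y, the chain continues, and whenever it breaks the running sum increments, which starts a new group number. A minimal sketch of the sample data to reproduce this in PostgreSQL (the table and column names myTable, x, y are assumptions taken from the query above):
-- hypothetical sample data from the question
create table myTable (x text, y text);
insert into myTable (x, y) values
    ('1', '2'),
    ('2', '3'),
    ('3', '4'),
    ('A', 'B'),
    ('B', 'C');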

How to select a random record for each group

I have a table like
| A | B | C | D |
|--------|---|---|---|
| Value1 | x | x | x |
| Value1 | y | x | y |
| Value1 | x | x | x |
| .... |
| Value2 | x | x | x |
| Value2 | x | x | x |
| Value2 | x | x | x |
| .... |
| Value3 | x | x | x |
| Value3 | x | x | x |
| Value3 | x | x | x |
where column A can hold one value from a fixed set. I want to get a random record for each unique value in column A.
You can use window functions:
select *
from (
select
t.*,
row_number() over(partition by a order by random()) rn
from mytable t
) t
where rn = 1
row_number() assigns a random rank to each record within groups having the same a; then, the outer query filters one record per group.
Actually, since you are running Postgres, you could as well use distinct on, which could give better performance (and a shorter syntax):
select distinct on (a) t.*
from mytable t
order by a, random();
You can do it with distinct on:
select distinct on (a) a, b, c, d
from test t;
Here is a Demo
With DISTINCT ON, you tell PostgreSQL to return a single row for each distinct group defined by the ON clause.
More about that subject here: https://www.geekytidbits.com/postgres-distinct-on/
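A quick illustrative sketch (the table name test matches the query above; the sample values are assumptions, just to show the shape of the result):
-- hypothetical sample data
create table test (a text, b text, c text, d text);
insert into test (a, b, c, d) values
    ('Value1', 'x', 'x', 'x'),
    ('Value1', 'y', 'x', 'y'),
    ('Value2', 'x', 'x', 'x'),
    ('Value2', 'z', 'y', 'x');

-- one randomly chosen row per distinct value of a
select distinct on (a) a, b, c, d
from test
order by a, random();
-- returns exactly two rows: one with a = 'Value1', one with a = 'Value2'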

Grouping by similar values in multiple columns

I have a table of entities with an id and a category (a few different values, with NULL allowed) for 3 different years (the category can change from one year to another), in 'wide' table format:
| ID  | CATEG_Y1 | CATEG_Y2 | CATEG_Y3 |
+-----+----------+----------+----------+
| 1   | NULL     | B        | C        |
| 2   | A        | A        | C        |
| 3   | B        | A        | NULL     |
| 4   | A        | C        | B        |
| ... | ...      | ...      | ...      |
I would like to simply count the number of entities by category, grouped by category, independently for each year:
+-------+----+----+----+
| CATEG | Y1 | Y2 | Y3 |
+-------+----+----+----+
| A     |  6 |  4 |  5 |  <- 6 entities with categ_y1 = A, 4 with categ_y2 = A, 5 with categ_y3 = A
| B     |  3 |  1 | 10 |
| C     |  8 |  4 |  5 |
| NULL  |  3 |  3 |  3 |
+-------+----+----+----+
I guess I could do it by grouping values one column after the other and UNION ALL the results, but I was wondering if there was a more rapid & convenient way, and if it can be generalized if I have more columns/years to manage (e.g. 20-30 different values)
A bit clumsy, but probably someone has a better idea. The query first collects all different categories (the union query in the FROM part), and then counts the occurrences with dedicated subqueries in the SELECT part. One could omit the union part if there is already a table defining the available categories (I suppose categ_y1 is a foreign key to such a primary category table). Hope there are not too many typos:
select categories.cat,
       (select count(categ_y1) from table ty1 where categories.cat = ty1.categ_y1) as y1,
       (select count(categ_y2) from table ty2 where categories.cat = ty2.categ_y2) as y2,
       (select count(categ_y3) from table ty3 where categories.cat = ty3.categ_y3) as y3
from ( select categ_y1 as cat from table t1
       union select categ_y2 as cat from table t2
       union select categ_y3 as cat from table t3) categories
Use jsonb functions to transpose the data (from the question) to this format:
select categ, jsonb_object_agg(key, count) as jdata
from (
select value as categ, key, count(*)
from my_table t,
jsonb_each_text(to_jsonb(t) - 'id')
group by 1, 2
) s
group by 1
order by 1;
categ | jdata
-------+-----------------------------------------------
A | {"categ_y1": 2, "categ_y2": 2}
B | {"categ_y1": 1, "categ_y2": 1, "categ_y3": 1}
C | {"categ_y2": 1, "categ_y3": 2}
| {"categ_y1": 1, "categ_y3": 1}
(4 rows)
For a known (static) number of years you can easily unpack the jsonb column:
select categ, jdata->'categ_y1' as y1, jdata->'categ_y2' as y2, jdata->'categ_y3' as y3
from (
select categ, jsonb_object_agg(key, count) as jdata
from (
select value as categ, key, count(*)
from my_table t,
jsonb_each_text(to_jsonb(t) - 'id')
group by 1, 2
) s
group by 1
) s
order by 1;
categ | y1 | y2 | y3
-------+----+----+----
A | 2 | 2 |
B | 1 | 1 | 1
C | | 1 | 2
| 1 | | 1
(4 rows)
To get a fully dynamic solution you can use the function create_jsonb_flat_view() described in Flatten aggregated key/value pairs from a JSONB field.
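For readers unfamiliar with the jsonb trick used above: to_jsonb(t) turns the whole row into a jsonb object, - 'id' removes the id key, and jsonb_each_text() explodes the remaining keys into (key, value) rows. A hedged sketch of that intermediate step on one sample row (my_table comes from the answer's query; the id = 1 row is the question's first sample row):
-- to_jsonb(t) on the row (1, NULL, 'B', 'C') gives
--   {"id": 1, "categ_y1": null, "categ_y2": "B", "categ_y3": "C"}
-- removing 'id' and applying jsonb_each_text() then yields one (key, value)
-- row per year column, with the JSON null coming back as SQL NULL
select key, value
from my_table t,
     jsonb_each_text(to_jsonb(t) - 'id')
where t.id = 1;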
I would do this using union all followed by aggregation:
select categ, sum(categ_y1) as y1, sum(categ_y2) as y2,
       sum(categ_y3) as y3
from ((select categ_y1 as categ, 1 as categ_y1, 0 as categ_y2, 0 as categ_y3
       from t
      ) union all
      (select categ_y2 as categ, 0 as categ_y1, 1 as categ_y2, 0 as categ_y3
       from t
      ) union all
      (select categ_y3 as categ, 0 as categ_y1, 0 as categ_y2, 1 as categ_y3
       from t
      )
     ) v
group by categ;
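A minimal, hedged sketch that reproduces this on the question's sample rows (the table name t and the data are assumptions taken from the example above):
-- hypothetical sample data from the question
create table t (id int, categ_y1 text, categ_y2 text, categ_y3 text);
insert into t (id, categ_y1, categ_y2, categ_y3) values
    (1, NULL, 'B', 'C'),
    (2, 'A',  'A', 'C'),
    (3, 'B',  'A', NULL),
    (4, 'A',  'C', 'B');

-- running the union all / aggregation query above against this data should give,
-- for example, categ = 'A': y1 = 2, y2 = 2, y3 = 0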

Sum of values in column by row and also selecting other columns in the result set

My current query is as follows:
select id, quantity, c, d
from table1;
This would give me example data as below:
id | quantity | c | d
---------------------
 1 |        1 | x | y
 1 |        3 | x | y
 2 |        1 | x | y
 2 |        1 | x | y
However I want to group by the ID and get the sum of the quantities to be as below:
id | quantity | c | d
---------------------
 1 |        4 | x | y
 2 |        2 | x | y
I tried to modify my first query to include a group by on the id and a sum on the quantities:
select id, sum(quantity), c, d
from table1
group by id;
But I got an error because the other 2 columns are not part of the group by clause. How can I include them?
Simply add them to the group by clause:
select id, sum(quantity), c, d from table1 group by id, c, d;
Using an OVER clause makes sense here:
select id, SUM(quantity) OVER (PARTITION BY id), c, d
from table1
Don't forget to select distinct results, otherwise you will get one identical row per original row.
If you have more than one combination of c and d per id, you will need to state explicitly which one you want to keep.
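A hedged sketch of what that distinct step could look like (column and table names taken from the question):
-- the window function is evaluated before DISTINCT, so the duplicate rows collapse afterwards
select distinct id, sum(quantity) over (partition by id) as quantity, c, d
from table1;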