Count distinct by link in SQL? - sql

I'm trying to count distinct by the link between two columns.
Here is the example.
rownum type id
--------------
1      A    a
2      A    b
3      B    b
4      B    c
5      C    c
6      C    d
If I count distinct values of the type column, it returns 3. However, I'd like rows 2 and 3, and rows 4 and 5, to be treated as not distinct, because they share the same value in the id column.
To rephrase,
type array of id
----------------
A    a, b
B    b, c
C    c, d
Since A and B share b, and B and C share c in their arrays, all three types are linked, so the query should return 1.
I have no idea where to start. I would appreciate any hint.

Consider the approach below, which uses STRING_AGG:
WITH TMP_TBL AS
(
SELECT 1 AS ROWNUM, 'A' AS TYPE, 'a' AS ID UNION ALL
SELECT 2,'A','b' UNION ALL
SELECT 3,'B','b' UNION ALL
SELECT 4,'B','c' UNION ALL
SELECT 5,'C','c' UNION ALL
SELECT 6,'C','d'
)
SELECT DISTINCT TYPE, N_ID
FROM
(
SELECT TYPE, STRING_AGG(ID) OVER(PARTITION BY TYPE) AS N_ID FROM TMP_TBL
)
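The query above lists the ids per type but does not by itself count the linked groups. As a rough illustration of the counting step the question asks for, here is a minimal Python sketch that treats types sharing an id as connected and counts the resulting groups with a union-find structure. This is a swapped-in technique, not the answerer's SQL, and the function name `count_linked_groups` is hypothetical.

```python
def count_linked_groups(rows):
    """Count groups of types that are connected through shared ids.

    rows is an iterable of (type, id) pairs; two types belong to the
    same group if they share an id, directly or through other types.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    first_type_for_id = {}
    for type_, id_ in rows:
        parent.setdefault(type_, type_)
        if id_ in first_type_for_id:
            # This id was seen before: link the two types.
            union(type_, first_type_for_id[id_])
        else:
            first_type_for_id[id_] = type_

    # Each distinct root is one linked group.
    return len({find(t) for t in parent})


# Sample data from the question.
rows = [("A", "a"), ("A", "b"), ("B", "b"), ("B", "c"), ("C", "c"), ("C", "d")]
print(count_linked_groups(rows))  # 1
```

With the question's data, A links to B through b and B links to C through c, so a single group remains.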

Related

How to get the record (by group) with max value? (Big Query)

Consider the following data
Column A  Column B  Column C
----------------------------
A         t         9
A         d         12
A         l         8
B         x         7
B         z         9
B         q         6
How do I extract the record with the max value in Col C for each value in Col A?
So the expected result would be...
Column A  Column B  Column C
----------------------------
A         d         12
B         z         9
Trying
select ColA, max(ColC) from table group by ColA
doesn't provide the value in ColB.
I'm sure there is a simple and elegant solution here, but it's escaping me....
Consider the approach below:
select * from your_table
qualify 1 = row_number() over(partition by colA order by colC desc)
If applied to the sample data in your question, the output matches the expected result (A, d, 12 and B, z, 9).
A window function can be used here:
with tbl as (
Select "A" as colA, "t" as colB, 9 as colC
union all select "A","d", 12
union all select "A","dd", 12
union all select "A","l", 8
union all select "B","x", 7
union all select "B","z", 9
union all select "B","q", 6
)
select
colA,
max(colC),
any_Value(colB_max)
from (select *, first_value(colB) over (partition by colA order by colC desc) as colB_max from tbl )
group by 1
I added an extra row with column A equal to "A", so that group now has two rows tied for the max value of column C. Which column B value gets selected for that group is more or less arbitrary.
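To make the tie case deterministic, one option is to add a second sort key to the window's ORDER BY. The sketch below runs that idea on SQLite through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). SQLite has no QUALIFY, so a subquery filter on the row number stands in for it; using colB as the tie-breaker and the helper name `max_row_per_group` are assumptions made for illustration.

```python
import sqlite3


def max_row_per_group(rows):
    """Return one (colA, colB, colC) row per colA with the largest colC."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (colA TEXT, colB TEXT, colC INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    query = """
        SELECT colA, colB, colC
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY colA
                       ORDER BY colC DESC, colB  -- colB breaks ties (assumption)
                   ) AS rn
            FROM tbl
        )
        WHERE rn = 1
        ORDER BY colA
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the answer above, including the tied row ("A", "dd", 12).
sample = [
    ("A", "t", 9), ("A", "d", 12), ("A", "dd", 12), ("A", "l", 8),
    ("B", "x", 7), ("B", "z", 9), ("B", "q", 6),
]
print(max_row_per_group(sample))  # [('A', 'd', 12), ('B', 'z', 9)]
```

Because "d" sorts before "dd", the tied group always yields the "d" row instead of an arbitrary one.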

How to select the total count?

I have the following two tables (postgresql)
tableA
a b
----------
1 A
2 B
table B
c b
----------
1 A
3 B
I want to count the rows for each value of column b across both tables, but if the values in column a and column c are the same, they should be counted once.
So the final result should be
b count
----------
A 1
B 2
How should I write the SQL?
You need union all for the 2 tables and then group by b to count distinct values of a:
select t.b, count(distinct t.a) counter
from (select * from tablea union all select * from tableb) t
group by t.b
Aggregate by column b and take the distinct count of column a:
SELECT b, COUNT(DISTINCT a) AS count
FROM yourTable
GROUP BY b
ORDER BY b;
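As a quick check of the UNION ALL plus COUNT(DISTINCT ...) idea, here is a small sketch that loads the question's two tables into an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for PostgreSQL here, and the helper name `count_per_b` is hypothetical.

```python
import sqlite3


def count_per_b():
    """Count distinct a/c values per b across both sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablea (a INTEGER, b TEXT)")
    conn.execute("CREATE TABLE tableb (c INTEGER, b TEXT)")
    conn.executemany("INSERT INTO tablea VALUES (?, ?)", [(1, "A"), (2, "B")])
    conn.executemany("INSERT INTO tableb VALUES (?, ?)", [(1, "A"), (3, "B")])
    query = """
        SELECT t.b, COUNT(DISTINCT t.a) AS counter
        FROM (
            -- Column names come from the first SELECT, so tableb.c lands in a.
            SELECT a, b FROM tablea
            UNION ALL
            SELECT c, b FROM tableb
        ) AS t
        GROUP BY t.b
        ORDER BY t.b
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(count_per_b())  # [('A', 1), ('B', 2)]
```

For b = A both tables contribute the value 1, which is counted once; for b = B the values 2 and 3 differ, so the count is 2.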

Sum col2 of two tables based on duplicate matching col1

I have 2 tables both structured as (id, views)
Table 1:
id views
A 1
B 2
B 3
C 3
C 4
D 4
Table 2:
id views
C 1
D 3
D 4
E 5
E 7
F 8
I'm looking to sum views of ids that are both in table 1 and 2 (id C and D) in this case so the output would be:
Table 3:
id views
C 8
D 11
You could use the following query in your case:
select a.id, sum(a.views) from (select * from table1 union all select * from table2) as a group by a.id;
select id, sum(views) from (select * from table1 union all select * from table2) a where a.id = 'C' or a.id = 'D' group by id;
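Rather than hard-coding the ids, the set of ids present in both tables can be derived with INTERSECT. The following sketch tries that on the question's sample data using SQLite through Python's sqlite3 module. It is one possible variant, not the answerers' exact query, and the helper name `sum_shared_ids` is hypothetical.

```python
import sqlite3


def sum_shared_ids():
    """Sum views over both tables, only for ids that appear in each of them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id TEXT, views INTEGER)")
    conn.execute("CREATE TABLE table2 (id TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("A", 1), ("B", 2), ("B", 3), ("C", 3), ("C", 4), ("D", 4)],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [("C", 1), ("D", 3), ("D", 4), ("E", 5), ("E", 7), ("F", 8)],
    )
    query = """
        SELECT id, SUM(views) AS views
        FROM (
            -- UNION ALL keeps duplicate rows such as (D, 4) from both tables.
            SELECT id, views FROM table1
            UNION ALL
            SELECT id, views FROM table2
        ) AS t
        WHERE id IN (SELECT id FROM table1 INTERSECT SELECT id FROM table2)
        GROUP BY id
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(sum_shared_ids())  # [('C', 8), ('D', 11)]
```

Note that a plain UNION would drop the duplicate (D, 4) row and give 7 for D instead of 11, which is why UNION ALL matters here.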

Remove multiple entries for a column

I need two columns, A and B. A has repeated values, while B has unique values. For each value of A, I need to fetch only the row that has the max(C) value, where C is another column.
You can use ROW_NUMBER.
ROW_NUMBER: Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
PARTITION BY value_expression: Divides the result set produced by the FROM clause into partitions to which the ROW_NUMBER function is applied. value_expression specifies the column by which the result set is partitioned. If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.
ORDER BY: The ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified partition. It is required.
Sample of ROW_NUMBER in your case:
SELECT A, B
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY A ORDER BY C DESC) AS RowNum, A, B, C
FROM TableName
)
WHERE RowNum = 1
Use Row_Number Analytic function to do this
select A,B
from
(
select row_number() over(partition by A order by C desc)rn,A,B,C
from yourtable
)
where RN=1
An alternative to @NoDisplayName's solution is to use keep dense_rank first/last:
with your_table as (select 1 a, 3 b, 10 c from dual union all
select 1 a, 2 b, 20 c from dual union all
select 1 a, 1 b, 30 c from dual union all
select 2 a, 4 b, 40 c from dual union all
select 2 a, 5 b, 60 c from dual union all
select 2 a, 3 b, 60 c from dual union all
select 3 a, 6 b, 70 c from dual union all
select 4 a, 2 b, 80 c from dual)
select a,
max(b) keep (dense_rank first order by c desc) b,
max(c) max_c
from your_table
group by a;
A B MAX_C
---------- ---------- ----------
1 1 30
2 5 60
3 6 70
4 2 80
Using the INTERSECT keyword, get those rows which have the maximum value of ColC for each value of ColA.
select ColA, ColB from
(
select ColA, ColB, max(colC) from Tabl
group by ColA, ColB
intersect
select ColA, ColB, ColC from Tabl
) as A
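If every row tied for the maximum should be kept, rather than one row per group, a join against the per-group maximum is another option. The sketch below is a swapped-in technique, not the INTERSECT query above. It uses the sample data from the dense_rank answer, runs on SQLite via Python's sqlite3 module, and the helper name `rows_at_group_max` is hypothetical.

```python
import sqlite3


def rows_at_group_max():
    """Return every (a, b, c) row whose c equals the maximum c for its a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?, ?)",
        [
            (1, 3, 10), (1, 2, 20), (1, 1, 30),
            (2, 4, 40), (2, 5, 60), (2, 3, 60),
            (3, 6, 70), (4, 2, 80),
        ],
    )
    query = """
        SELECT t.a, t.b, t.c
        FROM your_table AS t
        JOIN (
            -- One row per a, holding that group's maximum c.
            SELECT a, MAX(c) AS max_c
            FROM your_table
            GROUP BY a
        ) AS m
          ON m.a = t.a AND m.max_c = t.c
        ORDER BY t.a, t.b
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(rows_at_group_max())
# [(1, 1, 30), (2, 3, 60), (2, 5, 60), (3, 6, 70), (4, 2, 80)]
```

Group a = 2 has two rows with c = 60, so both are returned, whereas ROW_NUMBER or keep dense_rank would pick a single one.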

Use a calculated field in the where clause

Is there a way to use a calculated field in the where clause?
I want to do something like
SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL
)
WHERE TOTAL <> 0
;
but I get ORA-00904: "TOTAL": invalid identifier.
So I have to use
SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL
)
WHERE a+b <> 0
;
Logically, the select clause is one of the last parts of a query evaluated, so the aliases and derived columns are not available. (Except to order by, which logically happens last.)
Using a derived table is a way around this:
select *
from (SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL)
)
WHERE TOTAL <> 0
;
This will work...
select *
from (SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL)
) Temp
WHERE TOTAL <> 0;
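The same workaround can be written with common table expressions, which some readers find easier to follow than nested derived tables. The sketch below runs that shape on SQLite through Python's sqlite3 module. SQLite only stands in for Oracle here, so it shows the structure of the workaround and does not reproduce the ORA-00904 error. The CTE names `src` and `totals` and the helper name `nonzero_totals` are hypothetical.

```python
import sqlite3


def nonzero_totals():
    """Compute a + b once in a CTE, then filter on the computed column."""
    conn = sqlite3.connect(":memory:")
    query = """
        WITH src AS (
            SELECT 7 AS a, 8 AS b
            UNION ALL
            SELECT 8, 8
            UNION ALL
            SELECT 0, 0
        ),
        totals AS (
            -- The alias is defined here, one level below the filter.
            SELECT a, b, a + b AS total FROM src
        )
        SELECT a, b, total
        FROM totals
        WHERE total <> 0
        ORDER BY a
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(nonzero_totals())  # [(7, 8, 15), (8, 8, 16)]
```

The row with a = 0 and b = 0 is filtered out because its computed total is 0.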