postgres count distinct values from multiple columns

I am trying to display all distinct values from 3 columns and how many times each one occurs.
My table:
date | col1 | col2 | col3
-------------------------------
26...| a | a | b
25...| c | d | a
...
All 3 columns have the values a, b, c, d.
I would like to have something like this:
date | col | a | b | c | d
--------------------------------------
26.....| col1 | 1 | 0 | 0 | 0
26.....| col2 | 1 | 0 | 0 | 0
26.....| col3 | 0 | 1 | 0 | 0
25.....| col1 | 0 | 0 | 1 | 0
25.....| col2 | 0 | 0 | 0 | 1
Is there a way to do it?

Welcome to SO. Assuming that the possible values are fixed (a, b, c and d), one option is to turn each row into one row per column in a CTE, keeping both the column name and its value, and then count the values in the outer query with FILTER, e.g.:
WITH j (date, col, val) AS (
  -- one row per source column: the column name plus the value stored in it
  SELECT t.date, u.col, u.val
  FROM mytable t
  CROSS JOIN LATERAL unnest(array['col1','col2','col3'],
                            array[t.col1, t.col2, t.col3]) AS u(col, val)
)
SELECT date, col,
       count(*) FILTER (WHERE val = 'a') AS a,
       count(*) FILTER (WHERE val = 'b') AS b,
       count(*) FILTER (WHERE val = 'c') AS c,
       count(*) FILTER (WHERE val = 'd') AS d
FROM j
GROUP BY date, col
ORDER BY date, col;
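A minimal sketch for checking the approach locally, written in Python with the standard `sqlite3` module rather than PostgreSQL. SQLite has no `unnest`, so the unpivot is done with `UNION ALL`, and `SUM(CASE ...)` stands in for `count(*) FILTER (...)`. The dates in the question are elided, so `'26'` and `'25'` below are placeholder values, not real dates.

```python
import sqlite3

# Sample rows from the question; '26' and '25' are placeholders for the
# elided dates ("26...", "25..."), not real date values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date TEXT, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("26", "a", "a", "b"), ("25", "c", "d", "a")],
)

# Unpivot with UNION ALL (SQLite has no unnest), keeping the column name,
# then count each value per (date, column) with conditional aggregation.
query = """
WITH j (date, col, val) AS (
    SELECT date, 'col1', col1 FROM mytable
    UNION ALL SELECT date, 'col2', col2 FROM mytable
    UNION ALL SELECT date, 'col3', col3 FROM mytable
)
SELECT date, col,
       SUM(CASE WHEN val = 'a' THEN 1 ELSE 0 END) AS a,
       SUM(CASE WHEN val = 'b' THEN 1 ELSE 0 END) AS b,
       SUM(CASE WHEN val = 'c' THEN 1 ELSE 0 END) AS c,
       SUM(CASE WHEN val = 'd' THEN 1 ELSE 0 END) AS d
FROM j
GROUP BY date, col
ORDER BY date DESC, col
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The `UNION ALL` form of the CTE is also valid PostgreSQL, so it is an alternative if you prefer to avoid array functions.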

Related

Expanding information from one row to all similarly grouped rows in SQL

I am not sure of the logic required to accomplish this, but I want to take a table like this...
+----+------+
| Id | Type |
+----+------+
| 10 | A |
| 10 | B |
| 10 | C |
| 20 | A |
| 20 | C |
+----+------+
...and end up with a table like this...
+----+------+---+---+---+
| Id | Type | A | B | C |
+----+------+---+---+---+
| 10 | A | 1 | 1 | 1 |
| 10 | B | 1 | 1 | 1 |
| 10 | C | 1 | 1 | 1 |
| 20 | A | 1 | 0 | 1 |
| 20 | C | 1 | 0 | 1 |
+----+------+---+---+---+
...where each Id gets new columns that consolidate the Type information into every row of that Id. Since Id 10 has rows of types A, B and C, all rows with an Id of 10 should have a 1/true in the new columns A, B and C.
I know how to do this on a per-row basis, but I can't figure out how to consolidate the information from multiple rows into each row with the same Id.

Try the logic below:
SELECT *,
(SELECT COUNT(DISTINCT Type) FROM your_table B WHERE B.ID = A.Id and B.Type = 'A') A,
(SELECT COUNT(DISTINCT Type) FROM your_table C WHERE C.ID = A.Id and C.Type = 'B') B,
(SELECT COUNT(DISTINCT Type) FROM your_table D WHERE D.ID = A.Id and D.Type = 'C') C
FROM your_table A
Another option, using window functions:
SELECT *,
SUM(CASE WHEN Type= 'A' THEN 1 ELSE 0 END) OVER(PARTITION BY Id) A,
SUM(CASE WHEN Type= 'B' THEN 1 ELSE 0 END) OVER(PARTITION BY Id) B,
SUM(CASE WHEN Type= 'C' THEN 1 ELSE 0 END) OVER(PARTITION BY Id) C
FROM your_table
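A small sketch of the window-function approach, run with Python's built-in `sqlite3` module on the sample data from the question (it assumes a SQLite build with window-function support, 3.25 or later). It swaps `SUM` for `MAX`, a deliberate change: `SUM` would return 2 if the same Type appeared twice for one Id, while `MAX` keeps the flags at 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (Id INTEGER, Type TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(10, "A"), (10, "B"), (10, "C"), (20, "A"), (20, "C")],
)

# MAX(...) OVER (PARTITION BY Id) spreads a 0/1 flag to every row of the Id;
# MAX (instead of SUM) caps the flag at 1 even if a Type repeats for an Id.
query = """
SELECT Id, Type,
       MAX(CASE WHEN Type = 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY Id) AS A,
       MAX(CASE WHEN Type = 'B' THEN 1 ELSE 0 END) OVER (PARTITION BY Id) AS B,
       MAX(CASE WHEN Type = 'C' THEN 1 ELSE 0 END) OVER (PARTITION BY Id) AS C
FROM your_table
ORDER BY Id, Type
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```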

Select most recent rows - last 24 hours

I have a table that looks like this:
col1 | col2 | col3 | t_insert
---------------------------------
1 | z | |2018-04-25 17:23:46.686816+10
1 | zy | |2018-04-26 18:53:46.686816+10
2 | f | |2018-04-26 19:23:46.686816+10
3 | g | |2018-04-27 17:23:46.686816+10
2 | z | |2018-04-27 18:23:46.686816+10
4 | z | |2018-04-27 20:13:46.686816+10
Where there are duplicate values in col1, I want to select the row with the most recent timestamp and create a new column (col4) containing the string 'update'.
Where there are no duplicate values in col1, I want to select the row and put the string 'new' in col4.
Also, I only want to select rows that have a timestamp from the last 24 hours.
The expected result (this result does not apply the last-24-hours filter):
col1 | col2 | col3 | t_insert | col4 |
-------------------------------------------------------------
1 | zy | |2018-04-26 18:53:46.686816+10 |update |
3 | g | |2018-04-27 17:23:46.686816+10 |new |
2 | z | |2018-04-27 18:23:46.686816+10 |update |
4 | z | |2018-04-27 20:13:46.686816+10 |new |
Thanks in advance,

Hmmm, window functions can help here:
select col1, col2, col3, t_insert,
(case when cnt > 1 then 'update' else 'new' end) as col4
from (select t.*,
count(*) over (partition by col1) as cnt,
row_number() over (partition by col1 order by t_insert desc) as seqnum
from t
where t_insert >= now() - interval '24 hour'
) t
where seqnum = 1;
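A sketch of the same query in Python's `sqlite3` module (assuming SQLite 3.25 or later for window functions), using the sample rows from the question. SQLite has no `now() - interval '24 hour'`, so the hypothetical `latest_rows` helper takes the cutoff timestamp as a parameter; timestamps are stored as ISO-8601 text without the `+10` offset so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT, t_insert TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, NULL, ?)",
    [
        (1, "z", "2018-04-25 17:23:46.686816"),
        (1, "zy", "2018-04-26 18:53:46.686816"),
        (2, "f", "2018-04-26 19:23:46.686816"),
        (3, "g", "2018-04-27 17:23:46.686816"),
        (2, "z", "2018-04-27 18:23:46.686816"),
        (4, "z", "2018-04-27 20:13:46.686816"),
    ],
)


def latest_rows(conn, cutoff):
    """Hypothetical helper: latest row per col1 inserted at or after `cutoff`."""
    query = """
    SELECT col1, col2, col3, t_insert,
           CASE WHEN cnt > 1 THEN 'update' ELSE 'new' END AS col4
    FROM (
        SELECT t.*,
               COUNT(*) OVER (PARTITION BY col1) AS cnt,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY t_insert DESC) AS seqnum
        FROM t
        WHERE t_insert >= ?  -- stands in for now() - interval '24 hour'
    ) AS sub
    WHERE seqnum = 1
    ORDER BY t_insert
    """
    return conn.execute(query, (cutoff,)).fetchall()


# A cutoff before all sample rows reproduces the unfiltered expected result.
rows = latest_rows(conn, "2018-04-25 00:00:00")
for row in rows:
    print(row)
```

Note that `count(*)` is computed after the `WHERE` filter, so a col1 value is marked 'update' only if it has more than one row inside the window. With a cutoff of `2018-04-26 20:13:46`, for example, col1 = 2 has only one row left and comes back as 'new'. If older duplicates should also count, one option is to compute `cnt` over the unfiltered table and apply the time filter in the outer query.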

Get column with two rows having specific values

I have a table that looks like this:
| col1 | col2 |
|------|------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 1 |
| b | 3 |
| c | 1 |
| c | 2 |
I need to find the values of col1 for which there is a row with col2 = 1 and a row with col2 = 2.
The results would be:
| col1 |
|------|
| a |
| c |

You can filter the rows with the col2 values you want, then group by col1 and keep only the groups that contain two distinct col2 values:
select col1
from yourTable
where col2 in (1, 2)
group by col1
having count(distinct col2) = 2

Another solution would be:
select col1
from your_table
group by col1
having sum(case when col2 = 1 then 1 else 0 end) > 0
and sum(case when col2 = 2 then 1 else 0 end) > 0
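Both answers can be checked side by side with Python's `sqlite3` module on the sample data from the question; this is a sketch for local testing, not part of either answer. `ORDER BY col1` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 3), ("c", 1), ("c", 2)],
)

# Approach 1: keep only rows with col2 in (1, 2), then require both values per group.
filtered = conn.execute("""
    SELECT col1 FROM your_table
    WHERE col2 IN (1, 2)
    GROUP BY col1
    HAVING COUNT(DISTINCT col2) = 2
    ORDER BY col1
""").fetchall()

# Approach 2: no WHERE; each HAVING condition checks for one required value.
conditional = conn.execute("""
    SELECT col1 FROM your_table
    GROUP BY col1
    HAVING SUM(CASE WHEN col2 = 1 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN col2 = 2 THEN 1 ELSE 0 END) > 0
    ORDER BY col1
""").fetchall()

print(filtered, conditional)
```

`COUNT(DISTINCT col2)` matters in the first query: with plain `COUNT(*)`, a col1 value having two rows with col2 = 1 and none with col2 = 2 would also pass.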

Fetch the column which has the Max value for a row in Hive

I have a scenario where I need to pick the greatest value in each row from three columns. There is a function called greatest, but it doesn't work in my version of Hive (0.13).
Please suggest a better way to accomplish it.
Example table:
+---------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+---------+------+------+------+
| Group A | 1 | 2 | 3 |
+---------+------+------+------+
| Group B | 4 | 5 | 1 |
+---------+------+------+------+
| Group C | 4 | 2 | 1 |
+---------+------+------+------+
Expected result:
+---------+------------+------------+
| Col1 | output_max | max_column |
+---------+------------+------------+
| Group A | 3 | Col4 |
+---------+------------+------------+
| Group B | 5 | Col3 |
+---------+------------+------------+
| Group C | 4 | Col2 |
+---------+------------+------------+

select col1
,tuple.col1 as output_max
,concat('Col',tuple.col2) as max_column
from (select Col1
,sort_array(array(struct(Col2,2),struct(Col3,3),struct(Col4,4)))[2] as tuple
from t
) t
;
sort_array(Array): Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
hive> select col1
> ,tuple.col1 as output_max
> ,concat('Col',tuple.col2) as max_column
>
> from (select Col1
> ,sort_array(array(struct(Col2,2),struct(Col3,3),struct(Col4,4)))[2] as tuple
> from t
> ) t
> ;
OK
Group A 3 Col4
Group B 5 Col3
Group C 4 Col2
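The trick relies on structs being compared field by field, so the first field (the value) decides the order and the second field (the column number) only breaks ties. A rough Python analogue, using tuples, which Python also compares element by element, is sketched below; it illustrates the idea and is not Hive code, and `max_with_column` is a hypothetical helper.

```python
# Sample rows from the question: (Col1, Col2, Col3, Col4).
rows = [("Group A", 1, 2, 3), ("Group B", 4, 5, 1), ("Group C", 4, 2, 1)]


def max_with_column(row):
    """Return (label, max value, column name), mimicking the Hive struct trick."""
    label, *values = row
    # Pair each value with its column number, like struct(Col2,2), struct(Col3,3), ...
    pairs = [(value, number) for number, value in enumerate(values, start=2)]
    # Ascending sort, then take the last element, like sort_array(...)[2] in Hive.
    best_value, best_number = sorted(pairs)[-1]
    return label, best_value, f"Col{best_number}"


results = [max_with_column(row) for row in rows]
for result in results:
    print(*result)
```

Assuming that field-by-field ordering, one consequence in both Hive and this sketch is that when two columns share the maximum value, the one with the higher column number is reported.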

display records based on ranks and also delete duplicated data

I have a table like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 2 | C | Y | 3 |
| | A | X | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
(5 rows)
I need output like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
So I wrote the query below:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1;
but it does not give the right output.

Try this:
select a.col1, a.col2, a.col3, max(a.rank) as rank
from table1 a join table1 b
on a.col1 = b.col1 -- the self-join on col1 drops the row whose col1 is NULL
group by a.col1, a.col2, a.col3

Looks like you need aggregation with max():
select
col1,col2,col3,
max(rank) as rank
from table1
group by col1,col2,col3
If col1 can differ for the same combination of col2 and col3 (as with the row whose col1 is empty in your sample), then distinct on is what you need:
select distinct on (col2, col3)
col1,col2,col3,
rank
from table1
order by col2, col3, rank desc
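SQLite (like most databases other than PostgreSQL) has no `DISTINCT ON`, but the same "first row per (col2, col3)" selection can be sketched with `ROW_NUMBER()`. The Python snippet below uses the standard `sqlite3` module with the sample rows from the question (the blank col1 is taken to be NULL, and SQLite 3.25+ is assumed for window functions). It also runs the plain `GROUP BY` query to show why that alone is not enough here: the NULL-col1 row forms its own group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rank" is quoted to keep it clearly a column name.
conn.execute('CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT, "rank" INTEGER)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "A", "X", 4), (2, "C", "Y", 3), (2, "C", "Y", 3), (None, "A", "X", 3), (1, "B", "Z", 2)],
)

# GROUP BY col1, col2, col3 keeps the NULL-col1 row as a separate group.
grouped = conn.execute("""
    SELECT col1, col2, col3, MAX("rank") AS max_rank
    FROM table1
    GROUP BY col1, col2, col3
    ORDER BY max_rank DESC, col2
""").fetchall()

# Emulate DISTINCT ON (col2, col3): keep the highest-ranked row per (col2, col3).
first_per_group = conn.execute("""
    SELECT col1, col2, col3, "rank"
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY "rank" DESC) AS rn
        FROM table1 AS t
    ) AS sub
    WHERE rn = 1
    ORDER BY "rank" DESC
""").fetchall()

print(grouped)
print(first_per_group)
```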

The following should match what you are looking for:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1
WHERE col1 IS NOT NULL
GROUP BY 1, 2, 3, 4;
You can also use column positions (e.g. ORDER BY 4 DESC) in an ORDER BY clause if you want one.
