Get the most frequently occurring value in multiple columns of a table


I have a table with three columns: Work, Cost and Duration. I need the most frequently
occurring value in each of the three columns. If two values occur the same number of times,
return the larger of the two. Please see the sample data and result below.
Work Cost Duration
5 2 6
5 8 7
6 8 7
2 2 2
6 2 6
I need to get the result as
Work Cost Duration
6 2 7
I tried the following query, but it only handles one column, and it returns the count for every value instead of just the most frequent one:
select Duration, count(*) as "DurationCount" from SimulationResult
group by Duration
order by count(*) desc,Duration desc

You can do something like this (each derived table needs an alias):
select * from
(select top 1 Work from SimulationResult
group by Work
order by count(*) desc, Work desc) as w,
(select top 1 Cost from SimulationResult
group by Cost
order by count(*) desc, Cost desc) as c,
(select top 1 Duration from SimulationResult
group by Duration
order by count(*) desc, Duration desc) as d
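To check the idea against the sample data, here is a minimal sketch that runs the same query shape through Python's sqlite3 module. SQLite is only an assumption for the demo: it has no TOP, so LIMIT 1 stands in for top 1. The table and column names come from the question.

```python
import sqlite3

# Sample rows from the question: (Work, Cost, Duration).
ROWS = [(5, 2, 6), (5, 8, 7), (6, 8, 7), (2, 2, 2), (6, 2, 6)]

# One derived table per column: group, order by frequency, and break ties
# by taking the larger value. LIMIT 1 replaces SQL Server's TOP 1.
QUERY = """
SELECT * FROM
  (SELECT Work FROM SimulationResult
   GROUP BY Work ORDER BY COUNT(*) DESC, Work DESC LIMIT 1),
  (SELECT Cost FROM SimulationResult
   GROUP BY Cost ORDER BY COUNT(*) DESC, Cost DESC LIMIT 1),
  (SELECT Duration FROM SimulationResult
   GROUP BY Duration ORDER BY COUNT(*) DESC, Duration DESC LIMIT 1)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SimulationResult (Work INTEGER, Cost INTEGER, Duration INTEGER)"
)
conn.executemany("INSERT INTO SimulationResult VALUES (?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchone()
print(result)  # (6, 2, 7)
```

Work and Duration each have a two-way tie on frequency, so the second sort key is what picks 6 and 7.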

Try the following (a, b and c stand for Work, Cost and Duration, and #tab for your table):
select max(t1.a), max(t2.b), max(t3.c)
from
(select a from (
select a, count(a) counta, max(count(a)) over () maxa
from #tab
group by a) tempa
where counta = maxa) t1,
(select b from (
select b, count(b) countb, max(count(b)) over () maxb
from #tab
group by b) tempb
where countb = maxb) t2,
(select c from (
select c, count(c) countc, max(count(c)) over () maxc
from #tab
group by c) tempc
where countc = maxc) t3

Related

select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id area count function
1 100 20 living
1 200 30 industry
2 400 10 living
2 400 10 industry
2 400 20 education
3 150 1 industry
3 150 1 education
I want to group by id, summing area and count, and pick the dominant function for each id based on the maximum area. When areas are equal, it should be decided by the maximum count; when area and count are both equal, it should fall back to a fixed priority between functions (I still have to decide whether education takes priority over industry or vice versa). So the result should be:
id area count function
1 300 50 industry
2 1200 40 education
3 300 2 industry
I have tried a lot of things and maybe it's easy, but I can't get it to work. Can someone help me with the right SQL?

One method uses row_number() and conditional aggregation:
select id, sum(area), sum(count),
max(function) filter (where seqnum = 1) as function
from (select t.*,
row_number() over (partition by id order by area desc) as seqnum
from t
) t
group by id;
Another method uses distinct on:
select distinct on (id) id, sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
function
from t
order by id, t.area desc;
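As a rough check of the row_number() approach against the question's data, here is a sketch run through Python's sqlite3 module. SQLite is an assumption for the demo, so a CASE expression stands in for Postgres's filter clause. The extra sort keys ("count" desc, "function" desc) are also my addition, to cover the tie rules the question describes.

```python
import sqlite3

# Input rows from the question: (id, area, count, function).
ROWS = [
    (1, 100, 20, "living"),
    (1, 200, 30, "industry"),
    (2, 400, 10, "living"),
    (2, 400, 10, "industry"),
    (2, 400, 20, "education"),
    (3, 150, 1, "industry"),
    (3, 150, 1, "education"),
]

# Rank the rows of each id, then keep the function of the top-ranked row
# while summing area and count over the whole group.
QUERY = """
SELECT id,
       SUM(area)    AS area,
       SUM("count") AS "count",
       MAX(CASE WHEN seqnum = 1 THEN "function" END) AS "function"
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY area DESC, "count" DESC, "function" DESC
           ) AS seqnum
    FROM t
) AS ranked
GROUP BY id
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

dominant = conn.execute(QUERY).fetchall()
print(dominant)
# [(1, 300, 50, 'industry'), (2, 1200, 40, 'education'), (3, 300, 2, 'industry')]
```

Sorting "function" descending happens to put industry ahead of education for id 3; a different priority would need an explicit CASE in the ORDER BY.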

Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;

You can use sum as a window function:
select distinct on (t.id)
id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
( select function from tbl_test where tbl_test.id = t.id order by area desc, count desc limit 1 ) as function
from tbl_test t

This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(this ensures that no yt2 row exists with the same id but a larger area)
This works nicely, but several rows may share the maximum area while having different functions. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;
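The anti-join above compares only on area, so for id 2 in the question's data (three rows with area 400) it falls through to max(function). A possible extension, sketched here with Python's sqlite3 module (SQLite is an assumption for the demo), also treats a row as beaten when another row has the same area and a higher count. That extra join condition is my addition and not part of the answer above.

```python
import sqlite3

# Input rows from the question: (id, area, count, function).
ROWS = [
    (1, 100, 20, "living"),
    (1, 200, 30, "industry"),
    (2, 400, 10, "living"),
    (2, 400, 10, "industry"),
    (2, 400, 20, "education"),
    (3, 150, 1, "industry"),
    (3, 150, 1, "education"),
]

# A row survives the anti-join only if no other row of the same id beats it
# on area, or ties on area and beats it on count. MAX() settles full ties.
QUERY = """
SELECT y.id, SUM(y.area), SUM(y."count"), d."function"
FROM yourtable AS y
JOIN (
    SELECT yt1.id, MAX(yt1."function") AS "function"
    FROM yourtable AS yt1
    LEFT JOIN yourtable AS yt2
      ON yt1.id = yt2.id
     AND (yt1.area < yt2.area
          OR (yt1.area = yt2.area AND yt1."count" < yt2."count"))
    WHERE yt2.id IS NULL
    GROUP BY yt1.id
) AS d ON y.id = d.id
GROUP BY y.id, d."function"
ORDER BY y.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourtable (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)'
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", ROWS)

anti_join_result = conn.execute(QUERY).fetchall()
print(anti_join_result)
# [(1, 300, 50, 'industry'), (2, 1200, 40, 'education'), (3, 300, 2, 'industry')]
```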

SQL MIN(value) matching row in PostgreSQL

I have the following table:
TABLE A:
ID ID NAME PRICE CODE
00001 B 1000 1
00002 A 2000 1
00003 C 3000 1
Here is the SQL I use:
Select Min (ID),
Min (ID NAME),
Sum(PRICE)
From A
GROUP BY CODE
Here is what I get:
ID ID NAME PRICE
00001 A 6000
As you can see, ID NAME doesn't match the row with the minimum ID. I need them to match.
I would like the query to return the following
ID ID NAME PRICE
00001 B 6000
What SQL can I use to get that result?

If you want one row, use limit or fetch first 1 row only:
select a.*
from a
order by a.price asc
fetch first 1 row only;
If, for some reason, you want the sum() of all prices, then you can use window functions:
select a.*, sum(a.price) over () as sum_prices
from a
order by a.price asc
fetch first 1 row only;

You can use the row_number() function:
select min(id), max(case when seq = 1 then id_name end) as id_name, sum(price) as price, code
from (select t.*, row_number() over (partition by code order by id) seq
from a t
) t
group by code;
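Here is a small sketch of that row_number() query against the question's three rows, run through Python's sqlite3 module (SQLite is an assumption for the demo). The column names id_name, price and code are taken from the answer; storing id as text to keep the leading zeros is my assumption.

```python
import sqlite3

# Rows from TABLE A in the question: (id, id_name, price, code).
ROWS = [("00001", "B", 1000, 1), ("00002", "A", 2000, 1), ("00003", "C", 3000, 1)]

# seq = 1 marks the row with the smallest id in each code group, so the
# conditional MAX picks the name belonging to that same row.
QUERY = """
SELECT MIN(id) AS id,
       MAX(CASE WHEN seq = 1 THEN id_name END) AS id_name,
       SUM(price) AS price,
       code
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY code ORDER BY id) AS seq
    FROM a
) AS ranked
GROUP BY code
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id TEXT, id_name TEXT, price INTEGER, code INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", ROWS)

matched = conn.execute(QUERY).fetchall()
print(matched)  # [('00001', 'B', 6000, 1)]
```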

You can also use a subquery:
select t1.*,t2.* from
(select ID,Name from t where ID= (select min(ID) from t)
) as t1
cross join (select sum(Price) as total from t) as t2
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=a496232b552390a641c0e5c0fae791d1
id name total
1 B 6000

Get highest highscore entries between given dates

There is a scores_score table which contains following columns:
id, player_name, value, created_at
I have to fetch N (100) best scores where:
player_name must be unique across results
only best score for given player_name should be returned
results have to be filtered by date range
Let's say I have the following data:
id player_name value date
1 A 400 2016-09-10
2 B 200 2016-09-12
3 C 400 2016-09-15
4 C 500 2016-09-14
5 B 100 2016-09-20
6 A 6000 2015-01-01
7 B 1200 2016-09-29
And I want to get the best players with their scores between 2016-09-01 and 2016-09-20. I should get:
id player_name value date
4 C 500 2016-09-14
1 A 400 2016-09-10
2 B 200 2016-09-12
This is my approach to solve it, but there is an issue in nested SELECT as it fetches the best score of the player overall not within date ranges.
SELECT b.*, a.*
FROM (SELECT player_name, max(value) AS max_value
FROM scores_score
GROUP BY player_name
ORDER BY max(value) DESC) a
INNER JOIN scores_score b ON a.player_name = b.player_name AND a.max_value = b.value
WHERE CAST(b.created_at AS DATE) >= %(date_border)s
ORDER BY b.value DESC
LIMIT 100

Use distinct on:
select *
from (
select distinct on (player_name) *
from scores_score
where date between '2016-09-01' and '2016-09-20'
order by player_name, value desc
) s
order by value desc
limit 100

This is going to work and will provide you with expected output. Use row_number() window function to mark highest score for each player between dates (rn = 1) and then order the result set by value descending and finally limit the output to 100 highest.
select
id, player_name, value, created_at
from (
select
id, player_name, value, created_at,
row_number() over (partition by player_name order by value desc, id) as rn
from scores_score
where created_at between '2016-09-01' and '2016-09-20'
) ranks
where rn = 1
order by value desc
limit 100
Note that the additional id column in the ORDER BY of row_number() is there to resolve ties, i.e. the same player having two rows with equal values within the given date range. row_number() always assigns a distinct number to each row in the partition, but without a tiebreaker the order among tied rows is arbitrary. With id as the tiebreaker you get the row with the lower id (typically the older record), which matters if the tied rows differ in created_at :-)
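For anyone who wants to try the row_number() query on the question's sample rows, here is a sketch using Python's sqlite3 module. SQLite and ISO-formatted text dates are assumptions for the demo; the date bounds and the limit are passed as parameters.

```python
import sqlite3

# Sample rows from the question: (id, player_name, value, created_at).
ROWS = [
    (1, "A", 400, "2016-09-10"),
    (2, "B", 200, "2016-09-12"),
    (3, "C", 400, "2016-09-15"),
    (4, "C", 500, "2016-09-14"),
    (5, "B", 100, "2016-09-20"),
    (6, "A", 6000, "2015-01-01"),
    (7, "B", 1200, "2016-09-29"),
]

# Filter by date first, rank each player's scores, keep the best one per
# player, then order the survivors by score and cut off at the limit.
QUERY = """
SELECT id, player_name, value, created_at
FROM (
    SELECT id, player_name, value, created_at,
           ROW_NUMBER() OVER (
               PARTITION BY player_name ORDER BY value DESC, id
           ) AS rn
    FROM scores_score
    WHERE created_at BETWEEN ? AND ?
) AS ranks
WHERE rn = 1
ORDER BY value DESC
LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores_score "
    "(id INTEGER, player_name TEXT, value INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO scores_score VALUES (?, ?, ?, ?)", ROWS)

best = conn.execute(QUERY, ("2016-09-01", "2016-09-20", 100)).fetchall()
print(best)
# [(4, 'C', 500, '2016-09-14'), (1, 'A', 400, '2016-09-10'), (2, 'B', 200, '2016-09-12')]
```

Player A's 6000 and player B's 1200 fall outside the range, so they never reach the ranking step.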

This one is a little cumbersome but should work. First select just the players and values within your date range (a). Then select the max score by player (b). Then join back to get the id and date (c):
SELECT c.id, c.player_name, c.value, c.date
FROM
scores_score c
INNER JOIN
(SELECT player_name, max(value) AS value
FROM
(SELECT player_name, value
FROM scores_score
WHERE date BETWEEN '2016-09-01' AND '2016-09-20') a
GROUP BY player_name) b
ON c.player_name = b.player_name
AND c.value = b.value
ORDER BY c.value DESC
LIMIT 100
Tested here: http://sqlfiddle.com/#!9/10db42/6

Oracle SQL query : finding the last time a data was changed

I want to retrieve the number of days elapsed since the data in a specific column last changed, for example:
TABLE_X contains
ID PDATE DATA1 DATA2
A 10-Jan-2013 5 10
A 9-Jan-2013 5 10
A 8-Jan-2013 5 11
A 7-Jan-2013 5 11
A 6-Jan-2013 14 12
A 5-Jan-2013 14 12
B 10-Jan-2013 3 15
B 9-Jan-2013 3 15
B 8-Jan-2013 9 15
B 7-Jan-2013 9 15
B 6-Jan-2013 14 15
B 5-Jan-2013 14 8
I have simplified the table for the purposes of this example.
The result should be :
ID DATA1_LASTUPDATE DATA2_LASTUPDATE
A 4 2
B 2 5
which says,
- data1 of A last update is 4 days ago,
- data2 of A last update is 2 days ago,
- data1 of B last update is 2 days ago,
- data2 of B last update is 5 days ago.
The query below works, but it takes too long to complete on the real table, which has lots of records, and I need to add 2 more data columns to find their latest update days.
I use the LEAD function for this purpose.
Are there any alternatives to speed up the query?
with qdata1 as
(
select ID, pdate from
(
select a.*, row_number() over (partition by ID order by pdate desc) rnum from
(
select a.*,
lead(data1,1,0) over (partition by ID order by pdate desc) - data1 as data1_diff
from table_x a
) a
where data1_diff <> 0
)
where rnum=1
),
qdata2 as
(
select ID, pdate from
(
select a.*, row_number() over (partition by ID order by pdate desc) rnum from
(
select a.*,
lead(data2,1,0) over (partition by ID order by pdate desc) - data2 as data2_diff
from table_x a
) a
where data2_diff <> 0
)
where rnum=1
)
select a.ID,
trunc(sysdate) - b.pdate data1_lastupdate,
trunc(sysdate) - c.pdate data2_lastupdate
from table_master a, qdata1 b, qdata2 c
where a.ID=b.ID(+)
and a.ID=c.ID(+)
Thanks a lot.

You can avoid the multiple hits on the table and the joins by doing both lag (or lead) calculations together:
with t as (
select id, pdate, data1, data2,
lag(data1) over (partition by id order by pdate) as lag_data1,
lag(data2) over (partition by id order by pdate) as lag_data2
from table_x
),
u as (
select t.*,
case when lag_data1 is null or lag_data1 != data1 then pdate end as pdate1,
case when lag_data2 is null or lag_data2 != data2 then pdate end as pdate2
from t
),
v as (
select u.*,
rank() over (partition by id order by pdate1 desc nulls last) as rn1,
rank() over (partition by id order by pdate2 desc nulls last) as rn2
from u
)
select v.id,
max(trunc(sysdate) - (case when rn1 = 1 then pdate1 end))
as data1_last_update,
max(trunc(sysdate) - (case when rn2 = 1 then pdate2 end))
as data2_last_update
from v
group by v.id
order by v.id;
I'm assuming that you meant your data to be for Jun-2014, not Jan-2013; and that you're comparing the most recent change dates with the current date. With the data adjusted to use 10-Jun-2014 etc., this gives:
ID DATA1_LAST_UPDATE DATA2_LAST_UPDATE
-- ----------------- -----------------
A 4 2
B 2 5
The first CTE (t) gets the actual table data and adds two extra columns, one for each of the data columns, using lag (which is the same as lead ordered by descending dates).
The second CTE (u) adds two date columns that are only set when the data columns are changed (or when they are first set, just in case they have never changed). So if a row has data1 the same as the previous row, its pdate1 will be blank. You could combine the first two by repeating the lag calculation but I've left it split out to make it a bit clearer.
The third CTE (v) assigns a ranking to those pdate columns such that the most recent is ranked first.
And the final query works out the difference from the current date to the highest-ranked (i.e. most recent) change for each of the data columns.
SQL Fiddle, including all the CTEs run individually so you can see what they are doing.
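The single-pass lag idea can be tried on the question's data with the sketch below, run through Python's sqlite3 module. Several things are assumptions for the demo: SQLite instead of Oracle, ISO text dates, julianday() in place of date subtraction, and a fixed reference date of 2013-01-11 standing in for trunc(sysdate), chosen because it reproduces the expected output. It is also a condensed variant: it takes the max of the change dates directly instead of ranking them first.

```python
import sqlite3

# Rows from TABLE_X in the question: (id, pdate, data1, data2).
ROWS = [
    ("A", "2013-01-10", 5, 10), ("A", "2013-01-09", 5, 10),
    ("A", "2013-01-08", 5, 11), ("A", "2013-01-07", 5, 11),
    ("A", "2013-01-06", 14, 12), ("A", "2013-01-05", 14, 12),
    ("B", "2013-01-10", 3, 15), ("B", "2013-01-09", 3, 15),
    ("B", "2013-01-08", 9, 15), ("B", "2013-01-07", 9, 15),
    ("B", "2013-01-06", 14, 15), ("B", "2013-01-05", 14, 8),
]

# One scan computes both lags; a row counts as a change when the previous
# value differs or there is no previous row. The latest change date per id
# is then subtracted from the reference date.
QUERY = """
WITH t AS (
    SELECT id, pdate, data1, data2,
           LAG(data1) OVER (PARTITION BY id ORDER BY pdate) AS lag_data1,
           LAG(data2) OVER (PARTITION BY id ORDER BY pdate) AS lag_data2
    FROM table_x
)
SELECT id,
       CAST(julianday(:today) - julianday(MAX(
           CASE WHEN lag_data1 IS NULL OR lag_data1 <> data1 THEN pdate END
       )) AS INTEGER) AS data1_last_update,
       CAST(julianday(:today) - julianday(MAX(
           CASE WHEN lag_data2 IS NULL OR lag_data2 <> data2 THEN pdate END
       )) AS INTEGER) AS data2_last_update
FROM t
GROUP BY id
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id TEXT, pdate TEXT, data1 INTEGER, data2 INTEGER)")
conn.executemany("INSERT INTO table_x VALUES (?, ?, ?, ?)", ROWS)

last_updates = conn.execute(QUERY, {"today": "2013-01-11"}).fetchall()
print(last_updates)  # [('A', 4, 2), ('B', 2, 5)]
```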

Your query wasn't returning the right results for me. Maybe I missed something, but I got the correct results with the query below (you can check this SQLFiddle demo):
with ranked as (
select ID,
data1,
data2,
rank() over(partition by id order by pdate desc) r
from table_x
)
select id,
sum(DATA1_LASTUPDATE) DATA1_LASTUPDATE,
sum(DATA2_LASTUPDATE) DATA2_LASTUPDATE
from (
-- here I get when data1 was updated
select id,
count(1) DATA1_LASTUPDATE,
0 DATA2_LASTUPDATE
from ranked
start with r = 1
CONNECT BY (PRIOR data1 = data1)
and PRIOR r = r - 1
group by id
union
-- here I get when data2 was updated
select id,
0 DATA1_LASTUPDATE,
count(1) DATA2_LASTUPDATE
from ranked
start with r = 1
CONNECT BY (PRIOR data2 = data2)
and PRIOR r = r - 1
group by id
)
group by id

Create array in SELECT

I'm using PostgreSQL 9.1 and I have this data structure:
A B
-------
1 a
1 a
1 b
1 c
1 c
1 c
1 d
2 e
2 e
I need a query that produces this result:
1 4 {{c,3},{a,2},{b,1},{d,1}}
2 1 {{e,2}}
The first row reads: A=1, 4 rows total with A=1, then the partial counts (3 rows with c value, 2 rows with a value, ...). Each result row contains:
- the distinct value of column "A"
- the count of all rows related to the "A" value
- an array containing all the elements related to the "A" value, each with its own count
The array must be sorted by the count of each group (3,2,1,1 in the example).

This should do the trick:
SELECT a
, sum(ab_ct)::int AS ct_total
, count(*)::int AS ct_distinct_b
, array_agg(b || ', ' || ab_ct::text) AS b_arr
FROM (
SELECT a, b, count(*) AS ab_ct
FROM tbl
GROUP BY a, b
ORDER BY a, ab_ct DESC, b -- append "b" to break ties in the count
) t
GROUP BY a
ORDER BY ct_total DESC;
Returns:
ct_total: total count of b per a.
ct_distinct_b: count of distinct b per a.
b_arr: array of b plus frequency of b, sorted by frequency of b.
Ordered by total count of b per a.
Alternatively, you can use an ORDER BY clause within the aggregate call in PostgreSQL 9.0 or later. Like:
SELECT a
, sum(ab_ct)::int AS ct_total
, count(*)::int AS ct_distinct_b
, array_agg(b || ', ' || ab_ct::text ORDER BY a, ab_ct DESC, b) AS b_arr
FROM (
SELECT a, b, count(*) AS ab_ct
FROM tbl
GROUP BY a, b
) t
GROUP BY a
ORDER BY ct_total DESC;
May be clearer. But it's typically slower. And sorting rows in a subquery works for simple queries like this one. More explanation:
How to apply ORDER BY and LIMIT in combination with an aggregate function?
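To see the two-level aggregation on the question's data without relying on Postgres arrays, here is a sketch using Python's sqlite3 module. This is an assumption-laden stand-in: the inner GROUP BY runs in SQLite, and the outer array_agg step is imitated in Python with itertools.groupby, so it illustrates the shape of the result and not the Postgres syntax.

```python
import sqlite3
from itertools import groupby

# Rows from the question: (A, B).
ROWS = [
    (1, "a"), (1, "a"), (1, "b"), (1, "c"), (1, "c"), (1, "c"), (1, "d"),
    (2, "e"), (2, "e"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", ROWS)

# Inner level: one row per (a, b) with its count, sorted so that each a's
# pairs arrive by descending count, with b as the tiebreaker.
pairs = conn.execute(
    "SELECT a, b, COUNT(*) AS ab_ct FROM tbl "
    "GROUP BY a, b ORDER BY a, ab_ct DESC, b"
).fetchall()

# Outer level: collapse the pairs of each a into
# (a, ct_total, ct_distinct_b, b_arr).
summary = []
for a, group in groupby(pairs, key=lambda row: row[0]):
    b_arr = [(b, ct) for _, b, ct in group]
    summary.append((a, sum(ct for _, ct in b_arr), len(b_arr), b_arr))
summary.sort(key=lambda row: row[1], reverse=True)  # ORDER BY ct_total DESC

print(summary)
# [(1, 7, 4, [('c', 3), ('a', 2), ('b', 1), ('d', 1)]), (2, 2, 1, [('e', 2)])]
```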

Maybe I'm missing something, but this should do it:
SELECT a,
count(*) as cnt,
array_agg(b) as all_values
FROM your_table
GROUP BY a

This is what you need:
SELECT A, COUNT(*), array_agg(b)
FROM YourTable
GROUP BY A
