Merge and aggregate two columns SQL - sql

I have a table with id, name and score and I am trying to extract the top scoring users. Each user may have multiple entries, and so I wish to SUM the score, grouped by user.
I have looked into JOIN operations, but they seem to be used when there are two separate tables, not with two 'views' of a single table.
The issue is that if the id field is present, the user will not have a name, and vice-versa.
A minimal example can be found at the following link: http://sqlfiddle.com/#!9/ce0629/11
Essentially, I have the following data:
id name score
--- ----- ------
1 '' 15
4 '' 20
NULL 'paul' 8
NULL 'paul' 11
1 '' 13
4 '' 17
NULL 'simon' 9
NULL 'simon' 12
What I want to end up with is:
id/name score
-------- ------
4 37
1 28
'simon' 21
'paul' 19
I can group by id easily, but it treats the NULLs as a single group, when really they belong to two separate users.
SELECT id, SUM(score) AS total FROM posts GROUP BY id ORDER by total DESC;
id score
--- ------
NULL 40
4 37
1 28
Thanks in advance.
UPDATE
The target environment for this query is Hive. Below is the query and output looking only at the id field:
hive> SELECT SUM(score) as total, id FROM posts WHERE id is not NULL GROUP BY id ORDER BY total DESC LIMIT 10;
...
OK
29735 87234
20619 9951
20030 4883
19314 6068
17386 89904
13633 51816
13563 49153
13386 95592
12624 63051
12530 39677
Running the query below gives the exact same output:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by coalesce(id, name) order by total desc limit 10;
Running the following query using the new calculated column name idname gives an error:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by idname order by total desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:83 Invalid table alias or column reference 'idname': (possible column names are: score, id, name)

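Your id looks numeric. In some databases, using coalesce() on a numeric and a string can be a problem. In any case, I would suggest being explicit about the types. Repeat the expression in the GROUP BY, because Hive does not accept the column alias there (see the SemanticException above):
select coalesce(cast(id as varchar(255)), name) as id_name,
       sum(score) as total
from posts
group by coalesce(cast(id as varchar(255)), name)
order by total desc;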

SELECT new_id, SUM(score) AS total FROM
(SELECT coalesce(id,name) new_id, score FROM posts) o
GROUP BY new_id ORDER BY total DESC;

You could use a COALESCE to get the non-NULL value of either column:
SELECT
COALESCE(id, name) AS id
, SUM(score) AS total
FROM
posts
GROUP BY
COALESCE(id, name)
ORDER by total DESC;
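The COALESCE grouping can be checked against the question's sample rows. This is a minimal sketch that uses Python's built-in sqlite3 module. SQLite stands in for Hive here, so `CAST(id AS TEXT)` replaces `varchar(255)`. The table layout is assumed from the question:

```python
import sqlite3

# In-memory copy of the question's sample data (assumed schema:
# id INTEGER, name TEXT, score INTEGER; name is '' when id is present).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts (id, name, score) VALUES (?, ?, ?)",
    [
        (1, "", 15), (4, "", 20), (None, "paul", 8), (None, "paul", 11),
        (1, "", 13), (4, "", 17), (None, "simon", 9), (None, "simon", 12),
    ],
)

# Cast id to text so COALESCE always yields a string, and repeat the
# expression in GROUP BY rather than relying on the alias.
query = """
    SELECT COALESCE(CAST(id AS TEXT), name) AS id_name,
           SUM(score) AS total
    FROM posts
    GROUP BY COALESCE(CAST(id AS TEXT), name)
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for id_name, total in rows:
    print(id_name, total)
```

Each NULL-id row falls back to its name, so paul and simon become separate groups instead of one NULL bucket.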

Related

Use window functions to select the value from a column based on the sum of another column, in an aggregate query

Consider this data (View on DB Fiddle):
id  dept  value
--  ----  -----
1   A     5
1   A     5
1   B     7
1   C     5
2   A     5
2   A     5
2   B     15
2   A     2
The base query I am running is pretty simple. It just gets the total value by id and the most frequent dept.
SELECT
id,
MODE() WITHIN GROUP(ORDER BY dept) AS dept_freq,
SUM(value) AS value
FROM test
GROUP BY id
;
id  dept_freq  value
--  ---------  -----
1   A          22
2   A          27
But I also need to get, for each id, the dept that concentrates the greatest value (so the greatest sum of value by id and dept, not the highest individual value in the original table).
Is there any way to use window functions to achieve that and do it directly in the base query above?
The expected output for this particular example would be:
id  dept_freq  dept_value  value
--  ---------  ----------  -----
1   A          A           22
2   A          B           27
I could achieve that with the query below and then join it with the results of the base query above:
SELECT * FROM(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY value DESC) as row
FROM (
SELECT id, dept, SUM(value) AS value
FROM test
GROUP BY id, dept
) AS alias1
) AS alias2
WHERE alias2.row = 1
;
id  dept  value  row
--  ----  -----  ---
1   A     10     1
2   B     15     1
But it is not easy to read/maintain and also seems pretty inefficient. So I thought it should be possible to achieve this using window functions directly in the base query, and that this might also help Postgres come up with a better query plan that makes fewer passes over the data. But none of my attempts using OVER (PARTITION BY ...) and FILTER worked.
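You can fetch the dept with the highest total value using the first_value() window function. Because the question asks for the greatest sum per dept rather than the largest single row, order it by a per-(id, dept) windowed sum. Ordering by the raw value would return B for id 1, because the largest single row for id 1 (7) is in B. Adding this before your mode() grouping should do it:
SELECT
    id,
    highest_value_dept,
    MODE() WITHIN GROUP (ORDER BY dept) AS dept_freq,
    SUM(value) AS value
FROM (
    SELECT
        id,
        dept,
        value,
        -- rank depts by their total per id, not by individual rows
        FIRST_VALUE(dept) OVER (PARTITION BY id ORDER BY dept_total DESC) AS highest_value_dept
    FROM (
        SELECT
            id,
            dept,
            value,
            SUM(value) OVER (PARTITION BY id, dept) AS dept_total
        FROM test
    ) t
) s
GROUP BY 1, 2;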
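The key requirement in the question is that the dept must be ranked by its total per id, not by individual row values. The sketch below checks that against the sample data. It uses Python's built-in sqlite3 module, which is an assumption because the question targets Postgres. SQLite has no MODE(), so only the highest-total dept and the sum are computed. Window functions need SQLite 3.25 or later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, dept TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "A", 5), (1, "A", 5), (1, "B", 7), (1, "C", 5),
     (2, "A", 5), (2, "A", 5), (2, "B", 15), (2, "A", 2)],
)

query = """
    SELECT id, highest_value_dept, SUM(value) AS value
    FROM (
        SELECT id, value,
               -- first dept when rows are ordered by their dept's total
               FIRST_VALUE(dept) OVER (
                   PARTITION BY id ORDER BY dept_total DESC
               ) AS highest_value_dept
        FROM (
            SELECT id, dept, value,
                   -- total of each (id, dept) pair, repeated on every row
                   SUM(value) OVER (PARTITION BY id, dept) AS dept_total
            FROM test
        ) AS t
    ) AS s
    GROUP BY id, highest_value_dept
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Window functions cannot be nested, so the per-dept sum has to be computed one level below the FIRST_VALUE() call.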

histogram of duplicate id [sql]

I have a table in which each ID appears many times.
I want to create a table that shows how many IDs appear once, how many appear twice, and so on.
For example:
value count
1 123
2 513
3 215
.
.
.
Value 1 means that 123 IDs appear once, and value 2 means that 513 IDs appear twice (each of those 513 IDs has two rows).
SELECT count_of_id AS value, COUNT(*) AS count
FROM (
    SELECT id, COUNT(id) AS count_of_id
    FROM your_table
    GROUP BY id
) AS per_id
GROUP BY count_of_id
ORDER BY count_of_id
You can try this:
select count(id) as value
,id as count
from your_table_name
group by id
order by count(id)
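A histogram needs two levels of grouping: first the number of rows per id, then the number of ids per row count. This is a minimal sketch using Python's built-in sqlite3 module. The table name `your_table` comes from the answers, and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER)")
# Hypothetical sample: id 10 once, ids 20 and 30 twice, id 40 three times.
conn.executemany(
    "INSERT INTO your_table (id) VALUES (?)",
    [(10,), (20,), (20,), (30,), (30,), (40,), (40,), (40,)],
)

query = """
    SELECT count_of_id AS value, COUNT(*) AS count
    FROM (
        -- inner level: how many rows each id has
        SELECT id, COUNT(*) AS count_of_id
        FROM your_table
        GROUP BY id
    ) AS per_id
    -- outer level: how many ids share each row count
    GROUP BY count_of_id
    ORDER BY count_of_id
"""
histogram = conn.execute(query).fetchall()
print(histogram)
```

Grouping by id alone stops at the inner level. That gives one row per id, not the distribution the question asks for.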

How to find frequency in SQL

I am having some issues with SQL code, specifically in finding the frequency of an ID.
My table looks like this:
Num ID
136 23
1427 45
1415 67
1416 23
7426 45
4727 12
4278 67
...
I need to see the frequency of each ID that appears two or more times.
For example: 23, 45 and 67 in the table above.
I have tried as follows:
Select distinct *, count(*)
From table_1
Group by 1,2
Having count(*) >2
But it is wrong.
I need distinct, as I do not want any duplicates in Num.
I think I should use a counter that resets when the value in the next row differs from the previous one and reports the frequency (1, 2, 3, and so on), and then select the values greater than or equal to 2, but I do not know how to do it in SQL.
Could you help me please?
Thanks
Use only ID in the GROUP BY:
SELECT ID, COUNT(*) AS No_frequency
FROM table_1 t
GROUP BY ID
HAVING COUNT(*) >= 2;
Note: if you have duplicate Num values, use DISTINCT:
HAVING COUNT(DISTINCT num) >= 2;
If I understand your question, you can try this:
SELECT ID, COUNT(1)
FROM table_1
GROUP BY ID
HAVING COUNT(1) >= 2
This way you get the IDs with 2 or more occurrences, along with the number of occurrences.
EDIT
I suppose you are using MySQL (please add your DBMS to the question), so try this:
SELECT ID, COUNT(1) as FREQUENCY, GROUP_CONCAT(NUM)
FROM table_1
GROUP BY ID
HAVING COUNT(1) >= 2
This works for me
SELECT ID, COUNT(ID) AS Frq FROM MyTable
GROUP BY ID
HAVING COUNT(ID) >= 2
ORDER BY COUNT(ID) DESC
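A sketch of the GROUP BY / HAVING answer, using Python's built-in sqlite3 module on the rows visible in the question. One duplicate row `(136, 23)` has been added; it is hypothetical and only shows how COUNT(*) and COUNT(DISTINCT Num) differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (Num INTEGER, ID INTEGER)")
conn.executemany(
    "INSERT INTO table_1 (Num, ID) VALUES (?, ?)",
    [
        (136, 23), (1427, 45), (1415, 67), (1416, 23),
        (7426, 45), (4727, 12), (4278, 67),
        (136, 23),  # hypothetical duplicate Num for ID 23
    ],
)

query = """
    SELECT ID,
           COUNT(*) AS row_count,              -- counts the duplicate too
           COUNT(DISTINCT Num) AS frequency    -- ignores duplicate Num values
    FROM table_1
    GROUP BY ID
    HAVING COUNT(DISTINCT Num) >= 2
    ORDER BY ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

ID 12 appears only once, so HAVING filters it out. No row-by-row counter is needed.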

displaying distinct column value with corresponding column random 3 values

I have a table employee with columns (state_cd, emp_id, emp_name, ai_cd). How can I display each distinct state_cd with 3 different values from ai_cd?
The result should look like this:
state_cd ai_cd
------- --------
TX 1
2
5
CA 9
10
11
This type of operation is normally better done in the application. But, you can do it in the query, if you really want to:
select (case when row_number() over (partition by state_cd order by ai_cd) = 1
then state_cd
end) as state_cd,
ai_cd
from employee e
order by e.state_cd, e.ai_cd;
The ORDER BY is very important, because SQL result sets are unordered. Your result only makes sense in a specific order.
Just group by state_cd and then count using count(DISTINCT column_name):
select state_cd
from (select state_cd, count(distinct ai_cd) as cnt
      from employee
      group by state_cd) t
where cnt = 3
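To show at most three ai_cd values per state_cd, the row_number() can be moved into a derived table and filtered. This is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The employee rows are hypothetical. The sketch keeps the lowest three values by ai_cd; ordering the window by a random function instead (random() in both SQLite and PostgreSQL) would pick arbitrary ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (state_cd TEXT, emp_id INTEGER,"
    " emp_name TEXT, ai_cd INTEGER)"
)
# Hypothetical rows: TX has four ai_cd values, CA has three.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("TX", 1, "a", 1), ("TX", 2, "b", 2), ("TX", 3, "c", 5),
     ("TX", 4, "d", 7), ("CA", 5, "e", 9), ("CA", 6, "f", 10),
     ("CA", 7, "g", 11)],
)

query = """
    SELECT CASE WHEN seqnum = 1 THEN state_cd END AS state_label,
           ai_cd
    FROM (
        SELECT state_cd, ai_cd,
               ROW_NUMBER() OVER (PARTITION BY state_cd ORDER BY ai_cd) AS seqnum
        FROM employee
    ) AS e
    WHERE seqnum <= 3          -- keep three values per state
    ORDER BY e.state_cd, e.ai_cd
"""
rows = conn.execute(query).fetchall()
for state_label, ai_cd in rows:
    print(state_label or "", ai_cd)
```

The distinct label `state_label` avoids any ambiguity between the CASE result and the underlying column in ORDER BY.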

totalling rows for distinct values in SQL

I haven't had much experience with SQL and it strikes me as a simple question, but after an hour of searching I still can't find an answer...
I have a table whose quantities I want to total by ID, e.g.:
-------------
ID Quantity
1 30
2 11
1 4
1 3
2 17
3 16
.............
After summing the table should look something like this:
-------------
ID Quantity
1 37
2 28
3 16
I'm sure that I need to use the DISTINCT keyword and the SUM(..) function, but I can only get one total value for all unique value combinations in the table, and not separate ones like above. Help please :)
Select ID, Sum(Quantity) from YourTable
Group by ID
You can find here some resources to learn more about "Group by": http://www.w3schools.com/sql/sql_groupby.asp
SELECT ID, SUM(QUANTITY) FROM TABLE1 GROUP BY ID ORDER BY ID;
Select ID, Sum(Quantity) AS Quantity
from table1
Group by ID
Replace table1 with name of the table.
Just posting a complete answer that aliases the column and orders the results:
SELECT ID, SUM(Quantity) as [Quantity]
FROM TableName
GROUP BY ID
ORDER BY ID
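The GROUP BY answers can be checked against the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module. The table name `table1` follows one of the answers, and SQLite's standard `AS Quantity` replaces the SQL Server-style `[Quantity]` alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO table1 (ID, Quantity) VALUES (?, ?)",
    [(1, 30), (2, 11), (1, 4), (1, 3), (2, 17), (3, 16)],
)

# GROUP BY produces one output row per ID; SUM totals within each group.
rows = conn.execute(
    "SELECT ID, SUM(Quantity) AS Quantity FROM table1 GROUP BY ID ORDER BY ID"
).fetchall()
print(rows)
```

DISTINCT is not needed: GROUP BY already collapses each ID to a single row.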