How to find frequency in SQL - sql

I am having some issues with SQL code, specifically in finding the frequency of an ID.
My table looks like
Num ID
136 23
1427 45
1415 67
1416 23
7426 45
4727 12
4278 67
...
I need to see the frequency of each ID that appears two or more times.
For example: 23, 45 and 67 in the table above.
I have tried as follows:
Select distinct *, count(*)
From table_1
Group by 1,2
Having count(*) >2
But it is wrong.
I need distinct, as I do not want any duplicates in Num.
I think I should use a counter that resets when the value in the next row differs from the previous one and reports the frequency (1, 2, 3, and so on), then select the values greater than or equal to 2, but I do not know how to do that in SQL.
Could you help me please?
Thanks

Use only ID in the GROUP BY:
SELECT ID, COUNT(*) AS No_frequency
FROM table_1 t
GROUP BY id
HAVING COUNT(*) >= 2;
Note: if Num can contain duplicates, count distinct values instead:
HAVING COUNT(DISTINCT num) >= 2;
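To try the GROUP BY / HAVING approach without a database server, here is a minimal sketch that runs it against an in-memory SQLite table from Python. The function name id_frequencies and the use of SQLite are my assumptions for illustration; the table and column names follow the question.

```python
import sqlite3

def id_frequencies(rows, min_count=2):
    """Return (id, frequency) pairs for IDs that occur at least min_count times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (num INTEGER, id INTEGER)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, COUNT(DISTINCT num) AS frequency
        FROM table_1
        GROUP BY id
        HAVING COUNT(DISTINCT num) >= ?
        ORDER BY id
        """,
        (min_count,),
    ).fetchall()
    conn.close()
    return result

# Sample rows (Num, ID) copied from the question.
sample = [(136, 23), (1427, 45), (1415, 67), (1416, 23),
          (7426, 45), (4727, 12), (4278, 67)]
print(id_frequencies(sample))  # [(23, 2), (45, 2), (67, 2)]
```

COUNT(DISTINCT num) is used so that a repeated Num row does not inflate the frequency; with no duplicates it gives the same result as COUNT(*).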

If I understand your question, you can try this:
SELECT ID, COUNT(1)
FROM table_1
GROUP BY ID
HAVING COUNT(1) >= 2
This gives you the IDs with 2 or more occurrences, along with the number of occurrences.
EDIT
I assume you are using MySQL (please add your DBMS to the question). If so, try this:
SELECT ID, COUNT(1) as FREQUENCY, GROUP_CONCAT(NUM)
FROM table_1
GROUP BY ID
HAVING COUNT(1) >= 2

This works for me
SELECT ID, COUNT(ID) AS Frq FROM MyTable
GROUP BY ID
HAVING COUNT(ID) >= 2
ORDER BY COUNT(ID) DESC

Related

SQL need to compare rows count values

I have a query that returns the ID, Name and count of the number of times an ID has been entered to the table.
SELECT
ID,
NAME,
COUNT(*) count
FROM
TABLE
GROUP BY
NAME, ID, CASE_DETAIL_ID
HAVING
COUNT(*) > 1;
This returns the following data:
ID   NAME      COUNT
123  HAT       10
123  UMBRELLA  10
123  TOWEL     10
123  WATER     8
555  HAT       3
555  UMBRELLA  10
555  TOWEL     10
555  WATER     10
322  UMBRELLA  5
322  TOWEL     20
322  WATER     20
I want to be able to query the row with a count of less than what the other rows with the same ID have. How can I do this? So that the end result is:
ID   NAME      COUNT  FULL COUNT
123  WATER     8      10
555  HAT       3      10
322  UMBRELLA  5      20
There are multiple IDs that we store and I only want the rows/names that have a count less than the rows with the same IDs have.
I have also tried -
WITH x AS
(SELECT ID, NAME, COUNT(*) count
FROM FRT.CASE_DETAIL_HISTORY
GROUP BY
NAME,
ID,
CASE_DETAIL_ID)
SELECT x.ID, t.NAME, X.COUNT, MIN(x.count)
FROM x
JOIN FRT.CASE_DETAIL_HISTORY t
on t.ID= x.ID
GROUP BY x.ID, t.ID, X.COUNT
However, this doesn't give me what I am looking for. I only want rows returned if the name's count doesn't match the 'mode' count of the ID.
I also have tried the below but keep facing errors:
WITH COUNT_OF_ROWS AS
(SELECT ID, NAME, COUNT(*) count
FROM TABLE
GROUP BY NAME, ID, CASE_DETAIL_ID
HAVING COUNT(*) >= 1),
MINIMUM AS
(SELECT COUNT_OF_ROWS.ID, COUNT_OF_ROWS.NAME,
MIN(COUNT_OF_ROWS.COUNT) MINI
FROM COUNT_OF_ROWS
JOIN TABLE CD on CD.ID = COUNT_OF_ROWS.ID
GROUP BY COUNT_OF_ROWS.ID, COUNT_OF_ROWS.NAME
)
select distinct COUNT_OF_ROWS.*, MINIMUM.MINI
from minimum, count_of_rows
where minimum.mini != count_of_rows.count;
Some sample data would help, but you can use a CTE and select the lowest count per ID using MIN(), something like this:
WITH x AS(
SELECT t.id, t.nametext, COUNT(*) as count
FROM table t
GROUP BY id, t.nametext, CASE_DETAIL_ID
), y as(
SELECT x.id, MIN(x.[COUNT]) as mincount
FROM x
GROUP BY x.id
)
select y.id, x.nametext, y.mincount
from y
join x
on x.[COUNT] = y.mincount
and x.id = y.id
Or it can be done using top 1 and order by like this:
SELECT TOP 1 id, name, COUNT(*) as count
FROM TABLE
WHERE ID = 123
GROUP BY NAME, ID, CASE_DETAIL_ID
ORDER BY count ASC
But bear in mind that this selects only the first row, so it only works together with the WHERE clause: it always returns exactly one row.
The CTE option, on the other hand, also works per ID, without WHERE ID = 123.
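The desired output in the question also includes a FULL COUNT column, so here is a sketch of a CTE variant that returns every name whose count is below the highest count for its ID. It assumes the "full" count is the per-ID maximum (in the sample data that coincides with the mode), leaves out CASE_DETAIL_ID for brevity, and uses SQLite from Python purely so it can be run locally; the function name below_full_count is hypothetical.

```python
import sqlite3

def below_full_count(rows):
    """Return (id, name, count, full_count) for names counted less often than
    the most frequent name of the same ID. Assumes full_count = MAX per ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE case_detail_history (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO case_detail_history VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH counts AS (
            SELECT id, name, COUNT(*) AS cnt
            FROM case_detail_history
            GROUP BY id, name
        ),
        full_counts AS (
            SELECT id, MAX(cnt) AS full_cnt
            FROM counts
            GROUP BY id
        )
        SELECT c.id, c.name, c.cnt, f.full_cnt
        FROM counts c
        JOIN full_counts f ON f.id = c.id
        WHERE c.cnt < f.full_cnt
        ORDER BY c.id, c.name
        """
    ).fetchall()
    conn.close()
    return result

# Small made-up counts per (id, name); each pair is expanded into that many rows.
made_up_counts = {(123, "HAT"): 3, (123, "UMBRELLA"): 3, (123, "WATER"): 2,
                  (555, "HAT"): 1, (555, "TOWEL"): 4, (555, "WATER"): 4}
history_rows = [key for key, n in made_up_counts.items() for _ in range(n)]
print(below_full_count(history_rows))  # [(123, 'WATER', 2, 3), (555, 'HAT', 1, 4)]
```

If the mode and the maximum can differ in your data, the full_counts CTE would need to pick the most common count instead of MAX(cnt).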

Merge and aggregate two columns SQL

I have a table with id, name and score and I am trying to extract the top scoring users. Each user may have multiple entries, and so I wish to SUM the score, grouped by user.
I have looked into JOIN operations, but they seem to be used when there are two separate tables, not with two 'views' of a single table.
The issue is that if the id field is present, the user will not have a name, and vice-versa.
A minimal example can be found at the following link: http://sqlfiddle.com/#!9/ce0629/11
Essentially, I have the following data:
id name score
--- ----- ------
1 '' 15
4 '' 20
NULL 'paul' 8
NULL 'paul' 11
1 '' 13
4 '' 17
NULL 'simon' 9
NULL 'simon' 12
What I want to end up with is:
id/name score
-------- ------
4 37
1 28
'simon' 21
'paul' 19
I can group by id easily, but it treats the NULLs as a single field, when really they are two separate users.
SELECT id, SUM(score) AS total FROM posts GROUP BY id ORDER by total DESC;
id score
--- ------
NULL 40
4 37
1 28
Thanks in advance.
UPDATE
The target environment for this query is in Hive. Below is the query and output looking only at the id field:
hive> SELECT SUM(score) as total, id FROM posts WHERE id is not NULL GROUP BY id ORDER BY total DESC LIMIT 10;
...
OK
29735 87234
20619 9951
20030 4883
19314 6068
17386 89904
13633 51816
13563 49153
13386 95592
12624 63051
12530 39677
Running the query below gives the exact same output:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by coalesce(id, name) order by total desc limit 10;
Running the following query using the new calculated column name idname gives an error:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by idname order by total desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:83 Invalid table alias or column reference 'idname': (possible column names are: score, id, name)
Your id looks numeric. In some databases, using coalesce() on a numeric and a string can be a problem. In any case, I would suggest being explicit about the types:
select coalesce(cast(id as varchar(255)), name) as id_name,
sum(score) as total
from posts
group by coalesce(cast(id as varchar(255)), name)
order by total desc;
SELECT new_id, SUM(score) AS total FROM
(SELECT coalesce(id,name) new_id, score FROM posts) o
GROUP BY new_id ORDER by total DESC;
You could use a COALESCE to get the non-NULL value of either column:
SELECT
COALESCE(id, name) AS id
, SUM(score) AS total
FROM
posts
GROUP BY
COALESCE(id, name)
ORDER by total DESC;
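For anyone who wants to check the COALESCE grouping on the sample data, here is a small sketch using SQLite from Python. SQLite stands in for Hive here, which is an assumption on my part, and the function name totals_by_user is made up; the GROUP BY repeats the expression instead of the alias, since the Hive error in the question shows the alias is not accepted there.

```python
import sqlite3

def totals_by_user(rows):
    """Sum score per user, where a user is identified by id or, if id is NULL, by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER, name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT COALESCE(CAST(id AS TEXT), name) AS id_name,
               SUM(score) AS total
        FROM posts
        GROUP BY COALESCE(CAST(id AS TEXT), name)
        ORDER BY total DESC
        """
    ).fetchall()
    conn.close()
    return result

# Sample data from the question: rows with an id have an empty name, and vice versa.
posts = [(1, "", 15), (4, "", 20), (None, "paul", 8), (None, "paul", 11),
         (1, "", 13), (4, "", 17), (None, "simon", 9), (None, "simon", 12)]
print(totals_by_user(posts))  # [('4', 37), ('1', 28), ('simon', 21), ('paul', 19)]
```

Note that the ids come back as strings because of the explicit cast.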

totalling rows for distinct values in SQL

I haven't had much experience with SQL and it strikes me as a simple question, but after an hour of searching I still can't find an answer...
I have a table that I want to add up the totals for based on ID - e.g:
-------------
ID Quantity
1 30
2 11
1 4
1 3
2 17
3 16
.............
After summing the table should look something like this:
-------------
ID Quantity
1 37
2 28
3 16
I'm sure that I need to use the DISTINCT keyword and the SUM(..) function, but I can only get one total value for all unique value combinations in the table, and not separate ones like above. Help please :)
Select ID, Sum(Quantity) from YourTable
Group by ID
You can find here some resources to learn more about "Group by": http://www.w3schools.com/sql/sql_groupby.asp
SELECT ID, SUM(QUANTITY) FROM TABLE1 GROUP BY ID ORDER BY ID;
Select ID, Sum(Quantity) AS Quantity
from table1
Group by ID
Replace table1 with name of the table.
Just posting a complete answer that aliases the column and orders the results:
SELECT ID, SUM(Quantity) as [Quantity]
FROM TableName
GROUP BY ID
ORDER BY ID
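If you want to see the GROUP BY / SUM result on the sample data from the question, this is a minimal sketch with an in-memory SQLite database in Python. The table name quantities_table and the function name totals_by_id are placeholders I picked for the example.

```python
import sqlite3

def totals_by_id(rows):
    """Return (ID, total quantity) pairs, one per distinct ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quantities_table (ID INTEGER, Quantity INTEGER)")
    conn.executemany("INSERT INTO quantities_table VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT ID, SUM(Quantity) AS Quantity
        FROM quantities_table
        GROUP BY ID
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return result

# Rows copied from the question.
quantities = [(1, 30), (2, 11), (1, 4), (1, 3), (2, 17), (3, 16)]
print(totals_by_id(quantities))  # [(1, 37), (2, 28), (3, 16)]
```

No DISTINCT is needed: GROUP BY already produces one row per ID.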

SQL query to select most recent of duplicates

I have a table of values, with a date stored against each entry for example
Name
Age
PaymentAmount
Date
Can someone help me write a query that shows only the most recent payment of each person within a certain age range?
E.g. if I had 5 entries and wanted the most recent payment of all people aged 20-25:
Allan, 45, $1500, 1/1/2014
Tim, 22, $1500, 1/2/2001
John, 25, $2000, 2/3/2001
Tim, 22, $2500, 1/2/2010
John, 25, $3000, 2/3/2010
It would return the bottom 2 rows only.
You didn't state your DBMS, so this is ANSI SQL:
select *
from (
select name,
age,
PaymentAmount,
Date,
row_number() over (partition by name order by date desc) as rn
from the_table
where age between 20 and 25
) t
where rn = 1;
Another option is to use a correlated subquery:
select name,age,paymentamount,date
from the_table t1
where age between 20 and 25
and date = (select max(date)
from the_table t2
where t2.name = t1.name
and t2.age between 20 and 25)
order by name;
Usually the solution with a window function is faster than the correlated subquery, as only a single access to the table is needed.
SQLFiddle: http://sqlfiddle.com/#!15/17e37/4
Btw: having a column named age is a bit suspicious because you need to update that every year. You should rather store the date of birth and then calculate the age when retrieving the data.
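As a quick way to experiment with the correlated-subquery variant, here is a sketch that runs it in SQLite from Python. The table and column names (payments, payment_amount, paid_on) are my own placeholders, and the dates are stored as ISO strings so that MAX() compares them chronologically; I am also assuming the question's dates are day/month/year.

```python
import sqlite3

def latest_payments(rows, min_age=20, max_age=25):
    """Return the most recent payment row per person within the age range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (name TEXT, age INTEGER, payment_amount INTEGER, paid_on TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, age, payment_amount, paid_on
        FROM payments p1
        WHERE age BETWEEN :lo AND :hi
          AND paid_on = (SELECT MAX(paid_on)
                         FROM payments p2
                         WHERE p2.name = p1.name
                           AND p2.age BETWEEN :lo AND :hi)
        ORDER BY name
        """,
        {"lo": min_age, "hi": max_age},
    ).fetchall()
    conn.close()
    return result

# The five entries from the question, with dates rewritten as ISO strings (assumed d/m/yyyy).
payments = [("Allan", 45, 1500, "2014-01-01"), ("Tim", 22, 1500, "2001-02-01"),
            ("John", 25, 2000, "2001-03-02"), ("Tim", 22, 2500, "2010-02-01"),
            ("John", 25, 3000, "2010-03-02")]
print(latest_payments(payments))
# [('John', 25, 3000, '2010-03-02'), ('Tim', 22, 2500, '2010-02-01')]
```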
This query returns the rows whose date equals the most recent payment date among people aged 20 to 25. Limit the output with TOP 2, LIMIT 2, or rownum <= 2, depending on your database syntax.
SELECT NAME,AGE,PAYMENTAMOUNT,DATE FROM MY_TABLE
WHERE AGE BETWEEN 20 AND 25
AND DATE IN
(
SELECT MAX(DATE)
FROM MY_TABLE
WHERE
AGE BETWEEN 20 AND 25
);
EDIT as per horse_with_no_name:
SELECT NAME,AGE,PAYMENTAMOUNT,DATE
FROM the_table
WHERE AGE BETWEEN 20 AND 25
AND DATE IN
(
SELECT (DATE)
FROM the_table
WHERE
AGE BETWEEN 20 AND 25 order by date desc limit 2
)
limit 2;
Fiddle reference : http://sqlfiddle.com/#!15/17e37/10
Simplest of all, try the following query:
select name,age,paymentamount,date from yourtablename where date in (select max(date) from yourtablename where age between 20 and 25 group by name);
You should create the table with an identity column to make your life easier:
ColumnPrimaryKey IDENTITY (1,1)
Name
Age
PaymentAmount
Date
SELECT TOP 2 * FROM [TableName] Where Age BETWEEN 20 AND 25 ORDER BY [ColumnPrimaryKey] DESC
The above query returns the two most recently inserted rows in the table that match the age range.
You can use BETWEEN like this:
select * from meta where title='$title' and (date between '$start_date' and '$end_date');
Okay, I know you said SQL, but here is an approach for people who also have an application layer.
VIA SQL:
Order your SQL results by date descending (should be newest to oldest...).
VIA YOUR "BACK END":
Create an empty final set.
As you are iterating through your results, if your result row person is not in your final set, add the data to the final set.
Boom, your final set has the latest of each person.
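The two-layer steps above can be sketched in Python roughly as follows. The row layout (name, age, amount, ISO date string) and the function name latest_per_person are assumptions for the example, and the sort stands in for the ORDER BY date DESC that the SQL layer would normally do.

```python
def latest_per_person(rows):
    """Keep the first row seen per name after sorting newest-first.

    rows: iterable of (name, age, amount, iso_date) tuples.
    """
    # Stand-in for "ORDER BY date DESC" in the SQL layer.
    newest_first = sorted(rows, key=lambda row: row[3], reverse=True)
    final = {}
    for row in newest_first:
        name = row[0]
        if name not in final:  # first hit per person is the latest payment
            final[name] = row
    return list(final.values())

# Entries from the question, with dates written as ISO strings (assumed d/m/yyyy).
entries = [("Allan", 45, 1500, "2014-01-01"), ("Tim", 22, 1500, "2001-02-01"),
           ("John", 25, 2000, "2001-03-02"), ("Tim", 22, 2500, "2010-02-01"),
           ("John", 25, 3000, "2010-03-02")]
print(latest_per_person(entries))
# [('Allan', 45, 1500, '2014-01-01'), ('John', 25, 3000, '2010-03-02'), ('Tim', 22, 2500, '2010-02-01')]
```

Any age filter would still go in the SQL WHERE clause (or be applied to rows before calling the function).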

Grouping by number of occurrences of a repeatable value in Oracle SQL

Let's assume we have a table like this:
id name value
1 x 12
2 x 23
3 y 47
4 x 18
5 y 29
6 z 45
7 y 67
Doing a normal group by name would yield us
select name,count(*) from table group by name;
name count(*)
x 3
y 3
z 1
I want the reverse, i.e. to group by the number of times each name occurs and count how many names occur that many times. I want my output to be
count number of elements occurring count times
1 1
3 2
Is it possible to do this using just a single query? Another way is to use a temp table, but I don't want to do that.
Thanks
You need one more group by:
select cnt, count(*), min(name), max(name)
from (select name, count(*) as cnt
from table
group by name
) n
group by cnt
order by 1;
I do these types of histogram queries all the time. The min() and max() provide sample data. This is useful to understand outliers and unexpected values.
You can GROUP BY twice, e.g.
with
Names as (
select name as name,
count(1) as cnt
from MyTable
group by name)
select count(1),
cnt
from Names
group by cnt
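To see the double GROUP BY on the sample table, here is a sketch that runs it in SQLite from Python rather than Oracle, which is an assumption made only so the example is self-contained. The function name count_histogram and the table name my_table are placeholders.

```python
import sqlite3

def count_histogram(rows):
    """Return (cnt, number_of_names) pairs: how many names occur exactly cnt times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT cnt, COUNT(*) AS number_of_names
        FROM (SELECT name, COUNT(*) AS cnt
              FROM my_table
              GROUP BY name) per_name
        GROUP BY cnt
        ORDER BY cnt
        """
    ).fetchall()
    conn.close()
    return result

# Rows (id, name, value) copied from the question.
table_rows = [(1, "x", 12), (2, "x", 23), (3, "y", 47), (4, "x", 18),
              (5, "y", 29), (6, "z", 45), (7, "y", 67)]
print(count_histogram(table_rows))  # [(1, 1), (3, 2)]
```

The inner query counts rows per name; the outer query then groups those counts, so one name (z) occurs once and two names (x, y) occur three times.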