How to count distinct records - sql

Could anybody please help me with an SQL command?
I have a table (tbl_sActivity) that has the following data:
user_id | client_id | act_status
1 | 7 | cold
1 | 7 | dealed
22 | 5 | cold
1 | 6 | cold
1 | 6 | warm
1 | 6 | hot
1 | 6 | dealed
1 | 8 | warm
1 | 8 | dealed
21 | 4 | warm
21 | 4 | dealed
The output should be:
user_id | Count_C_id
1 | 3
21 | 1
22 | 1
I've searched online and learned that MS Access does not support the COUNT(DISTINCT) function, so I've been stuck at this stage for days.

Try this one. The "trick" is to have a subquery first to get all the distinct combinations of user and client IDs and then do the grouping per user:
SELECT
user_id
, COUNT(*) AS count_distinct_clients
FROM
( SELECT DISTINCT
user_id,
client_id
FROM tbl_sActivity
) AS tmp
GROUP BY
user_id ;
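The subquery approach can be tried outside Access with Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite is an acceptable stand-in for Access (it is not Access itself); the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_sActivity (user_id INTEGER, client_id INTEGER, act_status TEXT)")
conn.executemany(
    "INSERT INTO tbl_sActivity VALUES (?, ?, ?)",
    [
        (1, 7, "cold"), (1, 7, "dealed"), (22, 5, "cold"),
        (1, 6, "cold"), (1, 6, "warm"), (1, 6, "hot"), (1, 6, "dealed"),
        (1, 8, "warm"), (1, 8, "dealed"), (21, 4, "warm"), (21, 4, "dealed"),
    ],
)

# Step 1: the derived table keeps one row per (user_id, client_id) pair.
# Step 2: counting rows per user_id then counts each client once.
query = """
SELECT user_id, COUNT(*) AS count_distinct_clients
FROM (SELECT DISTINCT user_id, client_id FROM tbl_sActivity) AS tmp
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3), (21, 1), (22, 1)]
```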

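Another suggestion is to write the query without a subquery. The code below uses SQL Server (T-SQL) syntax and relies on COUNT(DISTINCT), which works in SQL Server but, as noted in the question, is not available in MS Access. A plain COUNT(c_id) would count every activity row instead of each client once.
-- Temp table
CREATE TABLE #TempStudent(userId int, c_id int, Name varchar(MAX))
SELECT userId, COUNT(DISTINCT c_id) AS C_ID FROM #TempStudent
GROUP BY userId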
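To see why a plain COUNT of the client column per user is not the same as counting distinct clients, here is a small sketch using SQLite (which, unlike Access, supports COUNT(DISTINCT)) on the question's data. SQLite is an assumption standing in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_sActivity (user_id INTEGER, client_id INTEGER, act_status TEXT)")
conn.executemany(
    "INSERT INTO tbl_sActivity VALUES (?, ?, ?)",
    [
        (1, 7, "cold"), (1, 7, "dealed"), (22, 5, "cold"),
        (1, 6, "cold"), (1, 6, "warm"), (1, 6, "hot"), (1, 6, "dealed"),
        (1, 8, "warm"), (1, 8, "dealed"), (21, 4, "warm"), (21, 4, "dealed"),
    ],
)

# COUNT(client_id) counts every activity row, duplicates included.
plain = conn.execute(
    "SELECT user_id, COUNT(client_id) FROM tbl_sActivity GROUP BY user_id ORDER BY user_id"
).fetchall()

# COUNT(DISTINCT client_id) counts each client only once per user.
distinct = conn.execute(
    "SELECT user_id, COUNT(DISTINCT client_id) FROM tbl_sActivity GROUP BY user_id ORDER BY user_id"
).fetchall()

print(plain)     # [(1, 8), (21, 2), (22, 1)]
print(distinct)  # [(1, 3), (21, 1), (22, 1)]
```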

Related

More efficient way to SELECT rows from PARTITION BY

Suppose I have the following table:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 5 |
+----+-------------+-------------+
My desired results are:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+----+-------------+-------------+
My current solution is:
SELECT
*
FROM
(SELECT
id,
step_number,
MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
employee_id
FROM
table_name) AS t
WHERE
t.step_number = t.min_step_number
Is there a more efficient way I could be doing this?
I'm currently using postgresql, version 12.
In Postgres, I would recommend using DISTINCT ON to address this greatest-n-per-group problem:
select distinct on (id) t.*
from table_name t
order by id, step_number
This PostgreSQL extension to the SQL standard usually performs better than the standard approach using window functions (and, as a bonus, the syntax is neater).
Note that this assumes (id, step_number) tuples are unique: otherwise the results might differ from those of your query, which keeps ties, while DISTINCT ON returns exactly one row per id.
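DISTINCT ON is PostgreSQL-specific, so it cannot be run in SQLite. As a portable stand-in (a different technique, plainly swapped in), ROW_NUMBER() also keeps exactly one row per id. This sketch compares it with the question's MIN() OVER filter, assuming Python's bundled SQLite is version 3.25 or newer (needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, step_number INTEGER, employee_id INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, 1, 3), (1, 2, 3), (1, 3, 4), (2, 2, 3), (2, 3, 4), (2, 4, 5)],
)

# The question's approach: keep rows whose step_number equals the per-id
# minimum (ties on the minimum would all be kept).
min_filter = conn.execute("""
    SELECT id, step_number, employee_id
    FROM (SELECT *, MIN(step_number) OVER (PARTITION BY id) AS min_step_number
          FROM table_name) AS t
    WHERE t.step_number = t.min_step_number
    ORDER BY id
""").fetchall()

# ROW_NUMBER() numbers rows per id by step_number, so rn = 1 is exactly one
# row per id, the same rows DISTINCT ON (id) ... ORDER BY id, step_number keeps.
one_per_id = conn.execute("""
    SELECT id, step_number, employee_id
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY step_number) AS rn
          FROM table_name) AS t
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(min_filter)  # [(1, 1, 3), (2, 2, 3)]
print(one_per_id)  # [(1, 1, 3), (2, 2, 3)]
```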

Select record with max value from each group with Query DSL

I have a score table with players' scores, and I want to select one record per player: the one with the highest score.
Here is the table:
id | player_id | score | ...
1 | 1 | 10 | ...
2 | 2 | 21 | ...
3 | 3 | 9 | ...
4 | 1 | 30 | ...
5 | 3 | 2 | ...
Expected result:
id | player_id | score | ...
2 | 2 | 21 | ...
3 | 3 | 9 | ...
4 | 1 | 30 | ...
I can achieve that with pure SQL like this:
SELECT *
FROM player_score ps
WHERE ps.score =
(
SELECT max(ps2.score)
FROM player_score ps2
WHERE ps2.player_id = ps.player_id
)
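The correlated subquery can be checked against the sample data with SQLite, a stand-in assumed here only for testing; only the id, player_id and score columns from the question are modelled. Note that if a player had two rows tied on the maximum score, both rows would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_score (id INTEGER, player_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO player_score VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 21), (3, 3, 9), (4, 1, 30), (5, 3, 2)],
)

# For each outer row, the inner query computes the maximum score of that
# row's player; only rows matching their player's maximum survive.
rows = conn.execute("""
    SELECT *
    FROM player_score ps
    WHERE ps.score = (
        SELECT MAX(ps2.score)
        FROM player_score ps2
        WHERE ps2.player_id = ps.player_id
    )
    ORDER BY ps.id
""").fetchall()

print(rows)  # [(2, 2, 21), (3, 3, 9), (4, 1, 30)]
```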
Can you tell me how to write the same query with Querydsl? I found some solutions with JPASubQuery, but my IDE cannot resolve that class. I am using Querydsl 4.x. Thanks in advance.
JPASubQuery was removed in Querydsl 4; use JPAExpressions.select instead. playerScore2 must be a second alias of the same entity (for example new QPlayerScore("playerScore2"), assuming the generated Q-class is called QPlayerScore) so that the subquery can be correlated with the outer query. Your WHERE clause should look something like this:
.where(playerScore.score.eq(
        JPAExpressions.select(playerScore2.score.max())
                .from(playerScore2)
                .where(playerScore2.playerId.eq(playerScore.playerId))))

Filtering using aggregation functions

I would like to filter my table using the MIN() function but still keep columns that can't be grouped.
I have table:
+----+----------+----------------------+
| ID | distance | geom |
+----+----------+----------------------+
| 1 | 2 | DSDGSAsd23423DSFF |
| 2 | 11.2 | SXSADVERG678BNDVS4 |
| 2 | 2 | XCZFETEFD567687SDF |
| 3 | 24 | SADASDSVG3423FD |
| 3 | 10 | SDFSDFSDF343DFDGF |
| 4 | 34 | SFDHGHJ546GHJHJHJ |
| 5 | 22 | SDFSGTHHGHGFHUKJYU45 |
| 6 | 78 | SDFDGDHKIKUI45 |
| 6 | 15 | DSGDHHJGHJKHGKHJKJ65 |
+----+----------+----------------------+
This is what I would like to achieve:
+----+----------+----------------------+
| ID | distance | geom |
+----+----------+----------------------+
| 1 | 2 | DSDGSAsd23423DSFF |
| 2 | 2 | XCZFETEFD567687SDF |
| 3 | 10 | SDFSDFSDF343DFDGF |
| 4 | 34 | SFDHGHJ546GHJHJHJ |
| 5 | 22 | SDFSGTHHGHGFHUKJYU45 |
| 6 | 15 | DSGDHHJGHJKHGKHJKJ65 |
+----+----------+----------------------+
This is possible when I use MIN() on the distance column and group by ID, but then I lose the geom column, which is essential.
The query looks like this:
SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID"
the result is:
+----+----------+
| ID | distance |
+----+----------+
| 1 | 2 |
| 2 | 2 |
| 3 | 10 |
| 4 | 34 |
| 5 | 22 |
| 6 | 15 |
+----+----------+
but this is not what I want.
Any suggestions?
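One common approach is to find the minimum values in a derived table and join back to it. Because the column was created as a quoted "ID", it has to be quoted everywhere; an unquoted ID is folded to lowercase id in PostgreSQL and would not match.
SELECT somefile."ID", somefile.distance, somefile.geom
FROM somefile
JOIN (
SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID"
) t ON t.distance = somefile.distance AND t."ID" = somefile."ID";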
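The join-back approach can be tried on the sample rows with SQLite, which is assumed here purely as a test stand-in for PostgreSQL; "ID" is quoted consistently, and an ORDER BY is added for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE somefile ("ID" INTEGER, distance REAL, geom TEXT)')
conn.executemany(
    "INSERT INTO somefile VALUES (?, ?, ?)",
    [
        (1, 2, "DSDGSAsd23423DSFF"), (2, 11.2, "SXSADVERG678BNDVS4"),
        (2, 2, "XCZFETEFD567687SDF"), (3, 24, "SADASDSVG3423FD"),
        (3, 10, "SDFSDFSDF343DFDGF"), (4, 34, "SFDHGHJ546GHJHJHJ"),
        (5, 22, "SDFSGTHHGHGFHUKJYU45"), (6, 78, "SDFDGDHKIKUI45"),
        (6, 15, "DSGDHHJGHJKHGKHJKJ65"),
    ],
)

# The derived table t holds the minimum distance per "ID"; joining on both
# "ID" and distance recovers the full row, including geom.
rows = conn.execute("""
    SELECT somefile."ID", somefile.distance, somefile.geom
    FROM somefile
    JOIN (SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID") t
      ON t.distance = somefile.distance AND t."ID" = somefile."ID"
    ORDER BY somefile."ID"
""").fetchall()

for row in rows:
    print(row)
```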
You can also use a window function for this:
SELECT "ID", distance, geom
FROM (
SELECT "ID", distance, geom, rank() OVER (PARTITION BY "ID" ORDER BY distance) AS rnk
FROM somefile) sub
WHERE rnk = 1;
This effectively orders the rows within each "ID" by distance and returns, for each "ID", the record where the distance is minimal, with no need for a GROUP BY. Because rank() gives tied rows the same rank, several rows per "ID" are returned if they share the minimum distance.
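Here is a quick check of the rank() version on the sample rows, again using SQLite as an assumed stand-in for PostgreSQL (it needs a bundled SQLite of version 3.25 or newer for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE somefile ("ID" INTEGER, distance REAL, geom TEXT)')
conn.executemany(
    "INSERT INTO somefile VALUES (?, ?, ?)",
    [
        (1, 2, "DSDGSAsd23423DSFF"), (2, 11.2, "SXSADVERG678BNDVS4"),
        (2, 2, "XCZFETEFD567687SDF"), (3, 24, "SADASDSVG3423FD"),
        (3, 10, "SDFSDFSDF343DFDGF"), (4, 34, "SFDHGHJ546GHJHJHJ"),
        (5, 22, "SDFSGTHHGHGFHUKJYU45"), (6, 78, "SDFDGDHKIKUI45"),
        (6, 15, "DSGDHHJGHJKHGKHJKJ65"),
    ],
)

# rank() restarts at 1 for each "ID" partition, ordered by distance, so
# rnk = 1 marks the row(s) with the smallest distance per "ID".
rows = conn.execute("""
    SELECT "ID", distance, geom
    FROM (SELECT "ID", distance, geom,
                 RANK() OVER (PARTITION BY "ID" ORDER BY distance) AS rnk
          FROM somefile) sub
    WHERE rnk = 1
    ORDER BY "ID"
""").fetchall()

for row in rows:
    print(row)
```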
The same idea as a join, with the grouped result on the left side:
select a.*, b.geom from
(SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID") as a
inner join somefile as b on a."ID" = b."ID" and a.distance = b.distance
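You can use the DISTINCT ON clause of PostgreSQL:
select distinct on ("ID") "ID", distance, geom
from somefile
order by "ID", distance;
I think this is exactly what you are looking for. The DISTINCT ON expression must match the leftmost ORDER BY expression, so rows are sorted by "ID" first and then by distance, and the first row of each "ID" group (the one with the smallest distance) is kept.
For more details on how DISTINCT ON works, refer to the PostgreSQL documentation for SELECT.
But remember that DISTINCT ON is not part of the SQL standard.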

selecting data with highest field value in a field

I have a table, and for each user I'd like to select the row with the highest index value. For example:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 4 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
Expected result:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
How can I do this? I assume it can be done with some Oracle function I'm not aware of?
Thanks in advance :-)
You can use the MAX() function for that, grouping by the user column like this:
SELECT "user"
,MAX("index") AS "index"
FROM Table1
GROUP BY "user"
ORDER BY "user";
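The grouped MAX() query can be checked on the sample data with SQLite, assumed here only as a test stand-in for Oracle; as in the answer, the reserved-looking column names are double-quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 ("user" INTEGER, "index" INTEGER)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 4), (3, 7), (4, 1), (5, 1)],
)

# One output row per user, carrying the largest "index" seen for that user.
rows = conn.execute("""
    SELECT "user", MAX("index") AS "index"
    FROM Table1
    GROUP BY "user"
    ORDER BY "user"
""").fetchall()

print(rows)  # [(1, 1), (2, 2), (3, 7), (4, 1), (5, 1)]
```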
Result:
----------------
| USER | INDEX |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
If you need to return more columns than just these two:
select "user", "index"
from (
select u.*, row_number() over (partition by "user" order by "index" desc) as rnk
from some_table u)
where rnk = 1
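The row_number() variant can be tried with SQLite (an assumed test stand-in for Oracle, version 3.25 or newer for window functions). Unlike the GROUP BY query, it keeps whole rows, so any extra columns in some_table would come along; only the two sample columns are modelled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE some_table ("user" INTEGER, "index" INTEGER)')
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 4), (3, 7), (4, 1), (5, 1)],
)

# Rows are numbered per user from the highest "index" down; rnk = 1 is the
# single top row per user, even if two rows tie on the maximum.
rows = conn.execute("""
    SELECT "user", "index"
    FROM (SELECT u.*,
                 ROW_NUMBER() OVER (PARTITION BY "user" ORDER BY "index" DESC) AS rnk
          FROM some_table u)
    WHERE rnk = 1
    ORDER BY "user"
""").fetchall()

print(rows)  # [(1, 1), (2, 2), (3, 7), (4, 1), (5, 1)]
```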
user is a reserved word in Oracle, so you should use a different name for the column. With the current names, quote them:
select "user", max("index") "index" from tbl
group by "user";
Alternatively, you can use an analytic function, which returns every row together with the per-user maximum in the highest column:
select "user", "index", max("index") over (partition by "user") highest from YOURTABLE
Note: try NOT to use words like user, index, date, etc. as column names, as they are reserved words in Oracle. If you do use them, enclose them in double quotes, e.g. "index", "date".
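The analytic MAX() does not collapse rows: every input row is returned with its user's maximum alongside. A small SQLite sketch (SQLite assumed as a test stand-in for Oracle, 3.25 or newer) makes that visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourtable ("user" INTEGER, "index" INTEGER)')
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 4), (3, 7), (4, 1), (5, 1)],
)

# MAX(...) OVER (PARTITION BY "user") is computed per user but attached to
# every row, so the result still has all seven input rows.
rows = conn.execute("""
    SELECT "user", "index", MAX("index") OVER (PARTITION BY "user") AS highest
    FROM yourtable
    ORDER BY "user", "index"
""").fetchall()

for row in rows:
    print(row)
```

To get only the top rows, filter on "index" = highest in an outer query.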

SQL query: please help with weird output (counting each client's latest status)

I asked this question in an earlier SO post and already got a solution; however, that SQL query produced only partially correct results. I have tried to figure it out, but it seems too complicated for me to understand, so I would appreciate your help. Thank you.
For table tbl_sActivity:
act_id| Client_id | act_status| user_id | act_date
1 | 7 | warm | 1 | 19/7/12
2 | 7 | dealed | 1 | 30/7/12 <- latest status of client(7)
3 | 8 | hot | 1 | 6/8/12 <- latest status of client(8)
4 | 5 | cold | 22 | 7/8/12 <- latest status of client(5)
5 | 6 | cold | 1 | 16/7/12
6 | 6 | warm | 1 | 18/7/12
7 | 6 | dealed | 1 | 7/8/12 <- latest status of client(6)
8 | 9 | warm | 26 | 2/8/12
9 | 10 | warm | 26 | 2/8/12
10 | 9 | hot | 26 | 4/8/12 <- latest status of client(9)
11 | 10 | hot | 26 | 4/8/12
12 | 10 | dealed | 26 | 10/8/12 <- latest status of client(10)
13 | 13 | dealed | 26 | 8/8/12 <- latest status of client(13)
14 | 11 | hot | 25 | 8/8/12
15 | 11 | dealed | 25 | 14/8/12 <- latest status of client(11)
I want to produce a User Progressive Report that shows how each user is following up on his/her clients.
The report must therefore show the number of each user's clients, grouped by each client's latest status.
The correct output should look like this:
user_id | act_status | Count(act_status)
1 | dealed | 2
1 | hot | 1
22 | cold | 1
25 | dealed | 1
26 | hot | 1
26 | dealed | 2
The SQL query that I had is below:
select user_id, act_status, count(act_Status)
from your_table
where act_date in (
select max(act_date)
from your_table
group by Client_id
)
group by user_id, act_status
When I added more transactions to the database, the output went wrong like this:
user_id | act_status | Count(act_status)
1 | dealed | 2
1 | hot | 1
22 | cold | 1
25 | hot | 1 ** (this one shouldn't show up)
25 | dealed | 1
26 | hot | 2 ** (should be only '1')
26 | dealed | 2
Tried and tested:
SELECT S.user_id, S.act_status, Count(S.act_status) AS CountOfact_status
FROM tbl_sActivity AS S INNER JOIN [SELECT A.client_id, Max(A.act_date) AS max_act
FROM tbl_sActivity AS A
GROUP BY A.client_id]. AS S2 ON (S.act_date = S2.max_act) AND (S.client_id = S2.client_id)
GROUP BY S.user_id, S.act_status
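The original query fails because act_date IN (...) compares each row's date against the latest dates of all clients, so a row survives whenever its date equals any client's latest date. In the sample data, act_id 11 (client 10, hot, 4/8/12) matches client 9's latest date, and act_id 14 (client 11, hot, 8/8/12) matches client 13's latest date, which produces exactly the two wrong rows. The join instead matches each row only against its own client's latest date. This SQLite sketch reproduces both; SQLite and the ISO-8601 date strings are assumptions (text dates in 'd/m/yy' form would not compare chronologically), and the Access bracket syntax is replaced by a standard parenthesised derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_sActivity (act_id INTEGER PRIMARY KEY, client_id INTEGER, "
    "act_status TEXT, user_id INTEGER, act_date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_sActivity VALUES (?, ?, ?, ?, ?)",
    [
        (1, 7, "warm", 1, "2012-07-19"), (2, 7, "dealed", 1, "2012-07-30"),
        (3, 8, "hot", 1, "2012-08-06"), (4, 5, "cold", 22, "2012-08-07"),
        (5, 6, "cold", 1, "2012-07-16"), (6, 6, "warm", 1, "2012-07-18"),
        (7, 6, "dealed", 1, "2012-08-07"), (8, 9, "warm", 26, "2012-08-02"),
        (9, 10, "warm", 26, "2012-08-02"), (10, 9, "hot", 26, "2012-08-04"),
        (11, 10, "hot", 26, "2012-08-04"), (12, 10, "dealed", 26, "2012-08-10"),
        (13, 13, "dealed", 26, "2012-08-08"), (14, 11, "hot", 25, "2012-08-08"),
        (15, 11, "dealed", 25, "2012-08-14"),
    ],
)

# Original query: a row survives if its date equals ANY client's latest date.
wrong = conn.execute("""
    SELECT user_id, act_status, COUNT(act_status)
    FROM tbl_sActivity
    WHERE act_date IN (SELECT MAX(act_date) FROM tbl_sActivity GROUP BY client_id)
    GROUP BY user_id, act_status
    ORDER BY user_id, act_status
""").fetchall()

# Join version: a row survives only if its date is the latest for ITS client.
correct = conn.execute("""
    SELECT S.user_id, S.act_status, COUNT(S.act_status) AS CountOfact_status
    FROM tbl_sActivity AS S
    INNER JOIN (SELECT client_id, MAX(act_date) AS max_act
                FROM tbl_sActivity GROUP BY client_id) AS S2
      ON S.act_date = S2.max_act AND S.client_id = S2.client_id
    GROUP BY S.user_id, S.act_status
    ORDER BY S.user_id, S.act_status
""").fetchall()

print(wrong)
print(correct)
```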
This is similar to an answer I gave yesterday. Please read through the query to understand how it works rather than just using it; that is the best way to learn :)
Would it not be better to use the primary key column act_id instead of the date column, i.e. to take the maximum primary key?
What I meant was, instead of this:
where act_date in (
select max(act_date)
from your_table
group by Client_id
)
something like this (assuming act_id increases over time for each client):
where act_id in (
select max(act_id) -- latest activity row for each client
from your_table
group by Client_id
)
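The max(act_id) idea works because each act_id identifies exactly one row, so one client's latest row can no longer match a different client's row by coincidence. A minimal SQLite sketch on the question's data, assuming act_id grows with act_date within each client (true for this sample); the date column is left out because this query does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_sActivity (act_id INTEGER PRIMARY KEY, client_id INTEGER, "
    "act_status TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_sActivity VALUES (?, ?, ?, ?)",
    [
        (1, 7, "warm", 1), (2, 7, "dealed", 1), (3, 8, "hot", 1),
        (4, 5, "cold", 22), (5, 6, "cold", 1), (6, 6, "warm", 1),
        (7, 6, "dealed", 1), (8, 9, "warm", 26), (9, 10, "warm", 26),
        (10, 9, "hot", 26), (11, 10, "hot", 26), (12, 10, "dealed", 26),
        (13, 13, "dealed", 26), (14, 11, "hot", 25), (15, 11, "dealed", 25),
    ],
)

# The subquery returns one act_id per client: the id of its latest activity.
rows = conn.execute("""
    SELECT user_id, act_status, COUNT(act_status)
    FROM tbl_sActivity
    WHERE act_id IN (SELECT MAX(act_id) FROM tbl_sActivity GROUP BY client_id)
    GROUP BY user_id, act_status
    ORDER BY user_id, act_status
""").fetchall()

print(rows)
```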
BugFinder's answer looks better.
select y.user_id, y.act_status, count(y.act_status)
from your_table y
inner join (
select client_id, max(act_date) as act_date
from your_table
group by client_id
) as t on t.client_id = y.client_id and t.act_date = y.act_date
group by y.user_id, y.act_status
This should give you what you want.
In short: build a table of the latest date per client, then join it back to your table on both fields (client_id and act_date) to be precise.