SQLite: How to fetch the ORDER index of COUNT - sql

I have an SQLite database that contains a list of user messages in a group.
I want to get a user's "rank" by counting the number of messages they have sent.
Currently I'm doing this:
SELECT user_id, COUNT(*) as count
FROM message
group by user_id
ORDER BY count DESC
It returns something like this:
- user_id count
1 2072040132 61877
2 1609505732 40514
3 1543287045 34735
4 203349203 30570
5 842634673 29651
6 1702633101 29185
7 1978947042 27728
8 1929648593 27025
9 1069841429 17944
10 1437208364 17344
11 ... ...
For example, user 1609505732 is ranked 2nd and user 1702633101 is ranked 6th.
But my database has more than 2 million rows, and fetching the whole list is too slow.
I was wondering if there is any way to fetch only the rank of a single user.
Like this:
- user_id order count
1 1702633101 6 61877
Here the user with id=1702633101 is ranked 6th. That'd be a lot faster.
Thanks for spending time on my question; I can't seem to find the answer anywhere on the internet.

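To improve query speed, I'd consider materialising the aggregate view as a physical table. Example below:
-- Id is assigned in insertion order, so it doubles as the rank.
CREATE TABLE tbl_aggregate (
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER,
    count INTEGER
);
INSERT INTO tbl_aggregate (user_id, count)
SELECT user_id, COUNT(*) AS count
FROM message
GROUP BY user_id
ORDER BY count DESC;
-- The user at rank 6:
SELECT * FROM tbl_aggregate
WHERE Id = 6;
-- The rank of one specific user:
SELECT Id, count FROM tbl_aggregate
WHERE user_id = 1702633101;
-- The top 10 users (SQLite uses LIMIT, not TOP):
SELECT * FROM tbl_aggregate
ORDER BY Id
LIMIT 10;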
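For what it's worth, here is a minimal sketch of the aggregate-table idea driven from Python's built-in sqlite3 module. The sample data, the index name idx_aggregate_user and the helper names are made up for illustration, and it assumes rows inserted by INSERT ... SELECT ... ORDER BY receive their Id in that order, which is how SQLite behaves in practice. The table has to be rebuilt whenever the message counts change.

```python
import sqlite3

# Hypothetical sample data: user_id -> number of messages to generate.
SAMPLE = {101: 5, 102: 3, 103: 8, 104: 1}


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, user_id INTEGER)")
    for user_id, n in SAMPLE.items():
        conn.executemany("INSERT INTO message (user_id) VALUES (?)", [(user_id,)] * n)
    return conn


def refresh_aggregate(conn):
    # Rebuild the ranking table. The autoincrement Id doubles as the rank
    # because rows are inserted in descending order of message count.
    conn.executescript("""
        DROP TABLE IF EXISTS tbl_aggregate;
        CREATE TABLE tbl_aggregate (
            Id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER,
            count INTEGER
        );
        INSERT INTO tbl_aggregate (user_id, count)
        SELECT user_id, COUNT(*) AS count
        FROM message
        GROUP BY user_id
        ORDER BY count DESC;
        -- Assumed index so the per-user lookup does not scan the table.
        CREATE INDEX idx_aggregate_user ON tbl_aggregate (user_id);
    """)


def rank_of(conn, user_id):
    # Returns (rank, count) for one user, or None if the user has no messages.
    return conn.execute(
        "SELECT Id, count FROM tbl_aggregate WHERE user_id = ?", (user_id,)
    ).fetchone()


conn = build_db()
refresh_aggregate(conn)
print(rank_of(conn, 101))
```

The lookup itself only touches one row of tbl_aggregate; the expensive GROUP BY runs once per refresh rather than once per rank request.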

Related

Selecting all rows that have a value in the top N rows

Consider this simple table of event counts:
event_name count
viewLoaded 20
viewUnloaded 17
buttonTapped 12
viewScrolled 12
networkSuccess 9
linkTapped 9
networkFailure 2
leapSecond 0
I would like to select the top N events by count, but with the additional requirement that if the result set includes any event with a particular count, then it should include all of the events with that count. In other words, I don’t want to break up any of the “groups” of rows that have the same count. Instead, I will potentially get more rows than I asked for.
For example, if I wanted the “top five” events in the table above, the query would actually return six rows so that both events with count 9 were included. The query for the top four would return four rows, and the query for the top three would also return four rows.
How can I accomplish this in SQLite?
You can use the RANK window function for this task. Its ranking value is the same for identical values, but the rank given to the next distinct value accounts for all the rows before it, so ranks skip numbers after a tie.
WITH cte AS (
SELECT *, RANK() OVER(ORDER BY count_ DESC) AS rn
FROM events
)
SELECT event_name, count_ FROM cte WHERE rn <= 5
One way is to use a common table expression to identify the counts corresponding to the “top five” events:
with top_five as (
select count from events order by count desc limit 5
)
select * from events where count in top_five order by count desc;
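A quick way to check the "don't split ties" behaviour is to run the same idea against the sample table through Python's sqlite3 module. This is only a sketch: the helper name top_n_with_ties is made up, and the CTE is written as an inline subquery so that N can be passed as a parameter.

```python
import sqlite3

# The event counts from the question.
EVENTS = [
    ("viewLoaded", 20), ("viewUnloaded", 17), ("buttonTapped", 12),
    ("viewScrolled", 12), ("networkSuccess", 9), ("linkTapped", 9),
    ("networkFailure", 2), ("leapSecond", 0),
]


def top_n_with_ties(conn, n):
    # Collect the counts of the top n rows, then return every row
    # whose count is one of those values.
    return conn.execute(
        """
        SELECT event_name, count
        FROM events
        WHERE count IN (SELECT count FROM events ORDER BY count DESC LIMIT ?)
        ORDER BY count DESC, event_name
        """,
        (n,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, count INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)

for n in (3, 4, 5):
    print(n, len(top_n_with_ties(conn, n)))
```

With the sample data this gives four rows for N = 3 and N = 4 and six rows for N = 5, matching the behaviour described in the question.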

How to get top 10 from one column and sort by another column in hive?

I want to find the top 10 titles with the highest number of user ids. So I used a query like
select title,count(userid) as users from combined_moviedata group by title order by users desc limit 10
But I need to sort them by title, so I tried this query
select title,count(userid) as users from combined_moviedata group by title order by users desc,title asc limit 10
But it does not sort them by title; it merely returns the same results, because title is only used to break ties between equal counts. How can I do this?
The answer from @KaushikNayak is very close to what I'd consider the "right" answer.
At one level, work out what your top 10 records are
At a different level, sort them by a different field
The only thing I'd say is that if the 10th and 11th most common titles are tied for the same count, they should generally both be included in the results. That is what RANK() gives you.
WITH
ranked_titles AS
(
SELECT
RANK() OVER (ORDER BY COUNT(*) DESC) frequency_rank,
title
FROM
combined_moviedata
GROUP BY
title
)
SELECT
*
FROM
ranked_titles
WHERE
frequency_rank <= 10
ORDER BY
title
;
http://sqlfiddle.com/#!6/7283c/1
Note that in the example linked, 12 rows are returned. That is because 4 titles are all tied for the 9th most frequent, and it is actually impossible to determine which two should be selected in preference over the others. In this case selecting 10 rows would normally be statistically incorrect.
title frequency frequency_rank
title06 2 9
title07 2 9
title08 2 9
title09 2 9
title10 3 6
title11 3 6
title12 3 6
title13 4 4
title14 4 4
title15 5 2
title16 5 2
title17 6 1
You could make use of a WITH clause
with t AS
(
select title,count(userid) as users from combined_moviedata
group by title
order by users desc limit 10
)
select * FROM t ORDER BY title ;
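To see the two-level idea in isolation (pick the top N in an inner query, re-sort in the outer one), here is a rough sketch that uses SQLite through Python's sqlite3 module as a stand-in for Hive. The titles and user ids are invented, and the WITH clause is written as a derived table; the shape of the query is otherwise the same.

```python
import sqlite3

# Hypothetical data: "delta" has 4 users, "alpha" 3, "charlie" 2, "bravo" 1.
ROWS = (
    [("delta", u) for u in range(4)]
    + [("alpha", u) for u in range(3)]
    + [("charlie", u) for u in range(2)]
    + [("bravo", 0)]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combined_moviedata (title TEXT, userid INTEGER)")
conn.executemany("INSERT INTO combined_moviedata VALUES (?, ?)", ROWS)


def top_titles_sorted(conn, n):
    # Inner query: pick the n titles with the most users.
    # Outer query: re-sort only those n rows by title.
    return conn.execute(
        """
        SELECT title, users
        FROM (
            SELECT title, COUNT(userid) AS users
            FROM combined_moviedata
            GROUP BY title
            ORDER BY users DESC
            LIMIT ?
        ) AS t
        ORDER BY title
        """,
        (n,),
    ).fetchall()


print(top_titles_sorted(conn, 3))
```

Note that a plain LIMIT cuts ties arbitrarily; the RANK() version is the one to use if tied titles must all be kept.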

Specific Column is not showing up, when it should?

I don't know what's going on with my code here. I am trying to return the highest number of calls received by particular phone numbers, and find the top 7 numbers by calls received, but I am only getting the count column in my results. The code is:
SELECT COUNT (call_id) FROM call_test
GROUP BY receiver_id
ORDER BY COUNT(call_id) DESC
LIMIT 7;
But all it is returning is:
COUNT(call_id)
3
2
2
2
2
1
1
I think my code is right, but how do you show the particular numbers that correspond to the respective counts? This is SQLPro for MAC.
Is it as simple as including the number in the select?
SELECT receiver_id, COUNT(call_id)
FROM call_test
GROUP BY receiver_id
ORDER BY COUNT(call_id) DESC
LIMIT 7;

Get MAX() on repeating IDs

This is what my query results currently look like. How can I get the MAX() value for each unique id?
i.e.,
for 5267139 it is 8,
for 5267145 it is 4.
5267136 5
5267137 8
5267137 2
5267139 8
5267139 5
5267139 3
5267141 4
5267141 3
5267145 4
5267145 3
5267146 1
5267147 2
5267152 3
5267153 3
5267155 8
SELECT DISTINCT st.ScoreID, st.ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st
ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
ORDER BY st.ScoreID, st.ScoreTrackingTypeID DESC
GROUP BY will partition your table into separate blocks based on the column(s) you specify. You can then apply an aggregate function (MAX in this case) against each of the blocks -- this behavior applies by default with the below syntax:
SELECT First_column, MAX(Second_column) AS Max_second_column
FROM Table
GROUP BY First_column
EDIT: Based on the query above, it looks like you don't really need the ScoreTrackingType table at all, but leaving it in place, you could use:
SELECT st.ScoreID, MAX(st.ScoreTrackingTypeID) AS ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
GROUP BY st.ScoreID
ORDER BY st.ScoreID
The GROUP BY will obviate the need for DISTINCT, MAX will give you the value you are looking for, and the ORDER BY will still apply, but since there will only be a single ScoreTrackingTypeID value for each ScoreID you can pull it out of the ordering.
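As a small sanity check, the GROUP BY / MAX pattern can be tried on a few of the rows from the question using Python's sqlite3 module. This sketch assumes a single simplified ScoreTracking table with just the two columns shown, and leaves the join out.

```python
import sqlite3

# A subset of the (ScoreID, ScoreTrackingTypeID) pairs from the question.
PAIRS = [
    (5267136, 5), (5267137, 8), (5267137, 2),
    (5267139, 8), (5267139, 5), (5267139, 3),
    (5267141, 4), (5267141, 3),
    (5267145, 4), (5267145, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ScoreTracking (ScoreID INTEGER, ScoreTrackingTypeID INTEGER)"
)
conn.executemany("INSERT INTO ScoreTracking VALUES (?, ?)", PAIRS)

# One output row per ScoreID, carrying the largest type id in its group.
rows = conn.execute(
    """
    SELECT ScoreID, MAX(ScoreTrackingTypeID) AS ScoreTrackingTypeID
    FROM ScoreTracking
    GROUP BY ScoreID
    ORDER BY ScoreID
    """
).fetchall()

max_by_id = dict(rows)
print(max_by_id[5267139], max_by_id[5267145])
```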

Progressive count using a query?

I use this query to
SELECT userId, submDate, COUNT(submId) AS nSubms
FROM submissions
GROUP BY userId, submDate
ORDER BY userId, submDate
obtain the total number of submissions per user per date.
However I need to have the progressive count for every user so I can see how their submissions accumulate over time.
Is this possible to implement in a query?
EDIT: The obtained table looks like this:
userId submDate nSubms
1 2-Feb 1
1 4-Feb 7
2 1-Jan 4
2 2-Jan 2
2 18-Jan 1
I want to produce this:
userId submDate nSubms progressive
1 2-Feb 1 1
1 4-Feb 7 8
2 1-Jan 4 4
2 2-Jan 2 6
2 18-Jan 1 7
EDIT 2: Sorry for not mentioning it earlier, but I am not allowed to use:
Stored procedure calls
Update/Delete/Insert/Create queries
Unions
DISTINCT keyword
as I am using a tool that doesn't allow those.
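You can use a self-join to grab, for each user and date, all the rows for the same user with a date on or before it. Aggregate per user and date first, though: joining the raw rows directly multiplies both counts whenever a date has more than one submission.
SELECT s0.userId, s0.submDate, s0.nSubms, SUM(s1.nSubms) AS progressive
FROM (SELECT userId, submDate, COUNT(submId) AS nSubms
      FROM submissions GROUP BY userId, submDate) AS s0
JOIN (SELECT userId, submDate, COUNT(submId) AS nSubms
      FROM submissions GROUP BY userId, submDate) AS s1
  ON s1.userId = s0.userId AND s1.submDate <= s0.submDate
GROUP BY s0.userId, s0.submDate, s0.nSubms
ORDER BY s0.userId, s0.submDate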
This is going to force the database to do a load of pointless work counting all the same rows again and again though. It would be better to just add up the nSubms as you go down in whatever script is calling the query, or in an SQL variable, if that's available in your environment.
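For the "add it up in the calling script" route, a minimal sketch in Python could look like the following. It assumes the rows arrive as (userId, submDate, nSubms) tuples already ordered by userId and then submDate, as the original query returns them; the function name add_progressive is made up.

```python
from itertools import groupby
from operator import itemgetter


def add_progressive(rows):
    # rows: (userId, submDate, nSubms) tuples ordered by userId, then submDate.
    # Returns the same rows with a per-user running total appended.
    result = []
    for _, user_rows in groupby(rows, key=itemgetter(0)):
        running = 0  # reset the total for each new user
        for user_id, subm_date, n_subms in user_rows:
            running += n_subms
            result.append((user_id, subm_date, n_subms, running))
    return result


# The per-date counts from the question.
rows = [
    (1, "2-Feb", 1),
    (1, "4-Feb", 7),
    (2, "1-Jan", 4),
    (2, "2-Jan", 2),
    (2, "18-Jan", 1),
]

for row in add_progressive(rows):
    print(row)
```

This is a single pass over the result set, so the database only has to do the GROUP BY once.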
The best solution for this is to do it at the client.
It's the right tool for the job; databases are not well suited to this kind of task. That said, it can be done in SQL with a correlated subquery:
Select S.userId, S.submDate, Count(*) As nSubms
, (Select Count(*)
From submissions As S1
Where S1.userid = S.userId
And S1.submDate <= S.submDate) As TotalSubms
From submissions As S
Group By S.userid, S.submDate
Order By S.userid, S.submDate