Aggregation with LIMIT in Hive is not working as expected

I am trying to count users by channel and need only 3 rows in the result.
But the counts I get back look much lower than the actual numbers.
Does anybody know why I cannot use GROUP BY and LIMIT at the same time?
select channel, count(users) as cnt
from user_table
group by channel
limit 3
;
/*
channel cnt
a 39
b 27
c 16
*/
select count(users) as cnt
from user_table
where channel = 'a'
;
/*
channel cnt
a 2057
*/
Why do those two queries have different results?

I'm not sure why it shows different counts for the same group; maybe your example is oversimplified. In any case, to make the result deterministic, LIMIT should be used together with ORDER BY.
For example:
select channel, count(users) as cnt
from user_table
group by channel
order by cnt desc --top counts first for example
limit 3
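To try the ORDER BY plus LIMIT pattern locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. The channel names and row counts are made up for illustration, and the extra tie-breaker on channel is my own addition, not part of the answer above.

```python
import sqlite3

# Hypothetical data: channel names and per-channel row counts are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (users TEXT, channel TEXT)")
sizes = [("a", 5), ("b", 3), ("c", 4), ("d", 1)]
rows = [(f"u{i}", ch) for ch, n in sizes for i in range(n)]
conn.executemany("INSERT INTO user_table VALUES (?, ?)", rows)

# Aggregate first, then sort the groups, then keep only the top 3.
top3 = conn.execute(
    """
    SELECT channel, COUNT(users) AS cnt
    FROM user_table
    GROUP BY channel
    ORDER BY cnt DESC, channel  -- tie-breaker keeps the order fully deterministic
    LIMIT 3
    """
).fetchall()
print(top3)  # [('a', 5), ('c', 4), ('b', 3)]
```

This only shows that LIMIT is applied to the already-aggregated, sorted groups; it does not explain the lower counts seen in Hive.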

Related

Getting result basis on count of another SQL query

I have a table with the following columns:
bkng_date
bkng_id (varchar)
villa_id (varchar)
This query
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date;
returns the number of records for each date as cnt.
Now I need to find the dates in the result set of this query where cnt = 2.
I tried a couple of subqueries, but I'm not getting the desired results.
The simplest, correct, and safe solution is to add a having count(*) = 2 clause, as Gordon said.
For completeness, in case you are curious how to solve it with subqueries (you didn't name your database vendor, though it very likely supports the having clause), it would be:
select x.bkng_date, x.cnt from (
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date
) x
where x.cnt = 2
or
with x as (
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date
)
select * from x where cnt = 2
The best option is to use the HAVING clause, as follows:
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date
having count(*) = 2
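As a quick check that the HAVING form and the derived-table form return the same rows, here is a small sketch using Python's sqlite3 module. The booking rows are invented for illustration, and the ORDER BY is added only so the two result lists can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab_bkng_det (bkng_date TEXT, bkng_id TEXT, villa_id TEXT)"
)
# Hypothetical bookings: two dates have exactly 2 rows, one has 1, one has 3.
conn.executemany(
    "INSERT INTO tab_bkng_det VALUES (?, ?, ?)",
    [
        ("2024-01-01", "b1", "v1"),
        ("2024-01-01", "b2", "v2"),
        ("2024-01-02", "b3", "v1"),
        ("2024-01-03", "b4", "v1"),
        ("2024-01-03", "b5", "v3"),
        ("2024-01-04", "b6", "v2"),
        ("2024-01-04", "b7", "v2"),
        ("2024-01-04", "b8", "v3"),
    ],
)

# Filter the groups directly with HAVING.
with_having = conn.execute(
    "SELECT bkng_date, COUNT(*) AS cnt FROM tab_bkng_det "
    "GROUP BY bkng_date HAVING COUNT(*) = 2 ORDER BY bkng_date"
).fetchall()

# Same filter expressed as a WHERE on a derived table.
with_subquery = conn.execute(
    "SELECT x.bkng_date, x.cnt FROM ("
    "  SELECT bkng_date, COUNT(*) AS cnt FROM tab_bkng_det GROUP BY bkng_date"
    ") x WHERE x.cnt = 2 ORDER BY x.bkng_date"
).fetchall()

print(with_having)  # [('2024-01-01', 2), ('2024-01-03', 2)]
```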

SQLite LIMIT OFFSET and WHERE clause

I have a table Test1 with columns ID (INT) and NAME (VARCHAR), holding
values like (1,'One'), (2,'two') ..... (51,'Fifty-one').
I want the sum of the IDs of the last 5 rows whose ID is divisible by 5. I tried the following query, but I'm not getting any output:
SELECT SUM(ID) FROM Test1 WHERE id%5 = 0 LIMIT 5 OFFSET (SELECT COUNT(*) FROM Test1)
So the answer should be 50+45+40+35+30=200.
You should never use LIMIT without ORDER BY. Only with ORDER BY is the order of your result set guaranteed, and only then does LIMIT make sense.
Moreover, you use SUM without a GROUP BY. That gives you a single result row. You then apply LIMIT to that result, which is still a single row.
And what is the offset supposed to do? You want to start after the last record in the table? That doesn't seem to make sense.
Here is the query with ORDER BY, and with SUM applied after LIMIT:
select sum(id)
from
(
select id
from test1
where id % 5 = 0
order by id desc
limit 5
) last5;
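The query above can be checked with Python's sqlite3 module. This is a minimal sketch: the NAME values are placeholders rather than the spelled-out numbers from the question, and the literal OFFSET 51 stands in for the COUNT(*) subquery of the original attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test1 (ID INTEGER, NAME TEXT)")
# IDs 1..51 as in the question; the names are placeholders.
conn.executemany(
    "INSERT INTO Test1 VALUES (?, ?)",
    [(i, f"name-{i}") for i in range(1, 52)],
)

# Pick the last 5 matching rows first, then sum them in the outer query.
(total,) = conn.execute(
    """
    SELECT SUM(id)
    FROM (
        SELECT id
        FROM Test1
        WHERE id % 5 = 0
        ORDER BY id DESC
        LIMIT 5
    ) last5
    """
).fetchone()
print(total)  # 200

# The original attempt: SUM yields one row, and the OFFSET skips past it.
skipped = conn.execute(
    "SELECT SUM(ID) FROM Test1 WHERE id % 5 = 0 LIMIT 5 OFFSET 51"
).fetchall()
print(skipped)  # []
```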

Replace nested query to single select query

Consider the table fields as follows.
Appid Client_name is_real RTT
100 C1 1 1
200 C1 1 6
200 C2 1 7
100 C1 1 9
200 C1 0 7
Now I need the total number of unique real Appids in the table. An Appid record counts as real if 'is_real' is 1.
In the table above, there are only 3 real Appids: (100,C1), (200,C1) and (200,C2).
PostgreSQL command:
Select sum(r)
from (select count(is_real) as r from table group by Appid, Client_name) as t;
I don't want any recursive query. If this can be fetched with a single select query, that would be helpful.
Since you seem to define a unique id by (Appid, Client_name) (which is confusing, since you are mixing terms):
SELECT COUNT(DISTINCT (Appid, Client_name)) AS ct
FROM tbl
WHERE is_real = 1;
(Appid, Client_name) is a row-type expression, short for ROW(Appid, Client_name). Only distinct combinations are counted.
Another trick to get this done without subquery is to use a window function:
SELECT DISTINCT count(*) OVER () AS ct
FROM tbl
WHERE is_real = 1
GROUP BY Appid, Client_name;
But neither is going to be faster than using a subquery (which is not a recursive query):
SELECT count(*) AS ct
FROM (
SELECT 1
FROM tbl
WHERE is_real = 1
GROUP BY Appid, Client_name
) sub;
That's what I would use.
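Here is a small sketch of the subquery version against the sample rows from the question, run through Python's sqlite3 module rather than PostgreSQL (SQLite does not accept the row-type COUNT(DISTINCT (Appid, Client_name)) form, so only the subquery is shown). The column types are my assumption. For contrast, it also counts distinct Appid values alone, which gives a different number on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (Appid INTEGER, Client_name TEXT, is_real INTEGER, RTT INTEGER)"
)
# The five sample rows from the question.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (100, "C1", 1, 1),
        (200, "C1", 1, 6),
        (200, "C2", 1, 7),
        (100, "C1", 1, 9),
        (200, "C1", 0, 7),
    ],
)

# One row per distinct (Appid, Client_name) pair among real rows, then count them.
(pair_count,) = conn.execute(
    """
    SELECT count(*) AS ct
    FROM (
        SELECT 1
        FROM tbl
        WHERE is_real = 1
        GROUP BY Appid, Client_name
    ) sub
    """
).fetchone()

# Counting distinct Appid values alone ignores Client_name.
(appid_count,) = conn.execute(
    "SELECT COUNT(DISTINCT Appid) FROM tbl WHERE is_real = 1"
).fetchone()

print(pair_count, appid_count)  # 3 2
```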
It's essential to understand the sequence of events in a SELECT query; see "Best way to get result count before LIMIT was applied".
Quoting the question: "total number of unique real Appid's in the table".
I assume is_real is 1 = true, 0 = false.
SELECT COUNT(DISTINCT Appid)
FROM table
WHERE is_real = 1;

SQL Function to count data

Say I have 100 records in a certain column of a certain table.
All the values in that column are random numbers from 1 to 10.
What SQL function can I use to find the number that appears most often within those 100 records and display that number alone?
How do I do this? Thanks
Assuming you're using mysql (because of question tags):
SELECT n
FROM tablename
GROUP BY n
ORDER BY COUNT(*) DESC
LIMIT 1
Try a query like this to get the count:
select count(*)
from t
group by col
order by count(*) desc
limit 1
This is MySQL syntax. The limit 1 is database-specific. In SQL Server, for instance, it would be select top 1.
And this to get the number in the column:
select col
from t
group by col
order by count(*) desc
limit 1
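To see both results at once (the most frequent value and how often it occurs), here is a sketch using Python's sqlite3 module, which accepts the same LIMIT syntax as MySQL. The table name t and column name col follow the answer above; the sample values are made up so that the mode is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
# Hypothetical sample: 7 appears most often (9 times).
values = [3] * 5 + [7] * 9 + [1] * 2 + [10] * 4
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])

# Group by value, sort groups by size, keep the largest one.
most_common = conn.execute(
    """
    SELECT col, COUNT(*) AS cnt
    FROM t
    GROUP BY col
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()
print(most_common)  # (7, 9)
```

If two values tie for the top count, which one comes back is not defined unless a second sort key is added.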

adding count( ) column on each row

I'm not sure whether this is even a good question.
I have a complex query with lots of unions that searches multiple tables for a certain keyword (user input). All the tables being searched are related to the table book.
The result set is paged using LIMIT, so at most 10 results are returned at a time.
However, I want an extra column in the result set that displays the total number of results found. I do not want to do this with a separate query. Is it possible to add a count() column to the result set that counts every result found?
the output would look like this:
ID Title Author Count(...)
1 book_1 auth_1 23
2 book_2 auth_2 23
4 book_4 auth_.. 23
...
Thanks!
This won't add the count to each row, but one way to get the total count without running a second query is to run your first query using the SQL_CALC_FOUND_ROWS option and then select FOUND_ROWS(). This is sometimes useful if you want to know how many total results there are so you can calculate the page count.
Example:
select SQL_CALC_FOUND_ROWS ID, Title, Author
from yourtable
limit 0, 10;
SELECT FOUND_ROWS();
From the manual:
http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_found-rows
The usual way of counting in a query is to group on the fields that are returned:
select ID, Title, Author, count(*) as Cnt
from ...
group by ID, Title, Author
order by Title
limit 0, 10
The Cnt column will contain the number of records in each group, i.e. for each title.
Regarding second query:
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl
cross join (select count(*) as cnt from tbl) as x
If you will not join to other table(s):
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl, (select count(*) as cnt from tbl) as x
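Here is a sketch of the cross-join approach combined with paging, run through Python's sqlite3 module instead of MySQL. The 23 rows and their titles are invented to match the sample output in the question, and the ORDER BY and LIMIT are my additions to show that the total is unaffected by the page size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
# Hypothetical data: 23 books, matching the total shown in the question.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(i, f"book_{i}", f"auth_{i}") for i in range(1, 24)],
)

# The one-row count subquery is joined to every row; LIMIT then trims the page.
page = conn.execute(
    """
    SELECT tbl.id, tbl.title, tbl.author, x.cnt
    FROM tbl
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM tbl) AS x
    ORDER BY tbl.id
    LIMIT 10 OFFSET 0
    """
).fetchall()

print(len(page))  # 10
print(page[0])    # (1, 'book_1', 'auth_1', 23)
```

With a real search query, the count subquery would need to repeat the same filter conditions as the outer query, otherwise the total would cover the whole table.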
My Solution:
SELECT COUNT(1) over(partition BY text) totalRecordNumber
FROM (SELECT 'a' text, id_consult_req
FROM consult_req cr);
If your problem is simply the speed or cost of running a second (complex) query, I would suggest selecting the result set into a hash table and counting the rows from there while returning them. Even more efficiently, use the row count of the previous result set, so you do not have to recount at all.
This will add the total count on each row:
select count(*) over (order by (select 1)) as Cnt,*
from yourtable
Here is your answer:
SELECT *, @cnt count_rows FROM (
SELECT *, (@cnt := @cnt + 1) row_number FROM your_table
CROSS JOIN (SELECT @cnt := 0 AS variable) t
) t;
You simply cannot do this, you'll have to use a second query.