SQL Count by equal columns Query - sql

I have this table:
subscriberID | date | segmentID | Counter
------------------------------------------
1 | 1.1 | 2 | 3
1 | 2.1 | 4 | 2
1 | 3.1 | 4 | 5
2 | 1.1 | 1 | 12
2 | 2.1 | 1 | 1
2 | 3.1 | 2 | 10
3 | 1.1 | 2 | 4
I have to write SQL Query that does:
Get the top 3 most common segmentID's (by counter) for a given subscriberID.
can anyone help me with that?
Thanks.

select segmentID
from your_table
where subscriberID = 123
group by segmentID
order by sum(counter) desc
To get only 3 records you have to limit your result. Depending on your DB engine that could be top 3 or limit 3 or rownum <= 3.

Related

SQL : Get the 3 first occurrences of a field

I have a PostgreSQL table with 2 fields like the following. Field A is the primary key.
A | B
------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 2
6 | 2
7 | 2
8 | 2
9 | 2
10 | 3
11 | 3
I'm looking for a request to get only the 3 first occurrences of B, like this:
A | B
1 | 1
2 | 1
3 | 1
5 | 2
6 | 2
7 | 2
10 | 3
11 | 3
Does somebody have a solution?
You want row_number() :
select t.*
from (select t.*, row_number() over (partition by b order by a) as seq
from table t
) t
where seq <= 3;

hive - split a row into multiple rows between the range of values

I have a table below and would like to split the rows by the range from start to end columns.
i.e id and value should repeat for each value between start & end(both inclusive)
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in java/python to get this result but would like to check if I can implement in Hive SQL using any existing hive UDFs
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id,t.value,n.num as current
from tbl t
join numbers n on n.num>=t.start and n.num<=t.end
You can do using posexplode() UDF.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is its Output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+

How to select if similar field count is the maximum in the table?

I want to select from a table if row counts of similar filed is maximum depends on other columns.
As example
| user_id | team_id | isOk |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 2 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
| 10 | 3 | 1 |
| 11 | 3 | 0 |
So i want to select team 1 and 2 because they all have 1 value at isOk Column,
i tried to use this query
SELECT Team
FROM _Table1
WHERE isOk= 1
GROUP BY Team
HAVING COUNT(*) > 3
But still i have to define a row count which can be maximum or not.
Thanks in advance.
Is this what you are looking for?
select team
from _table1
group by team
having min(isOk) = 1;

How to get the max row count grouped by the ID in sql

Say I have this table:
(column: Row is a count based on the column ID)
ID | Row | State |
1 | 1 | CA |
1 | 2 | AK |
2 | 1 | KY |
2 | 2 | GA |
2 | 3 | FL |
3 | 1 | WY |
3 | 2 | HI |
3 | 3 | NY |
3 | 4 | DC |
4 | 1 | RI |
I'd like to generate a new column that would have the highest number in the Row column grouped by the ID column for each row. How would I accomplish this? I've been messing around with MAX(), GROUP BY, and some partitioning but I'm getting different errors each time. It's difficult to finesse this correctly. Here's my target output:
ID | Row | State | MaxRow
1 | 1 | CA | 2
1 | 2 | AK | 2
2 | 1 | KY | 3
2 | 2 | GA | 3
2 | 3 | FL | 3
3 | 1 | WY | 4
3 | 2 | HI | 4
3 | 3 | NY | 4
3 | 4 | DC | 4
4 | 1 | RI | 1
Use window version of MAX:
SELECT ID, Row, State, MAX(Row) OVER (PARTITION BY ID) AS MaxRow
FROM mytable
Demo here
You could join between a query on the table and an aggregate table:
SELECT t.*, max_row
FROM t
JOIN (SELECT id, MAX([row]) AS max_row
FROM t
GROUP BY id) agg ON t.id = agg.id
You can create first a query using group by id and max to get the highest number. Then use this query as a sub query and use the id to inner join.
Then use the max column from the sub query to obtain your final result.

SQL - Select distinct on two column

I have this table 'words' with more information:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 1 | 1 | 1
| 2 | 1 | 1
| 3 | 1 | 1
| 4 | 1 | 2
| 5 | 1 | 2
| 6 | 1 | 2
| 7 | 2 | 3
| 8 | 2 | 3
| 9 | 2 | 3
| 10 | 2 | 4
| 11 | 2 | 4
| 12 | 3 | 5
| 13 | 3 | 5
| 14 | 3 | 6
| 15 | 3 | 6
| 16 | 3 | 6
And this query that gives to me 3 random ids from different categories, but not from different themes too:
SELECT Id
FROM words
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
What I want as result is:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 2 | 1 | 1
| 7 | 2 | 3
| 14 | 3 | 6
That is, repeat no category or theme.
When you use GROUP BY you cannot include in the select list a column which is not being ordered. So, in your query it's impossible to inlcude Id in the select list.
So you need to do something a bit more complex:
SELECT Id_Category, Id_Theme,
(SELECT Id FROM Words W
WHERE W.Id_Category = G.Id_Category AND W.Id_Theme = G.Id_Theme
ORDER BY RAND() LIMIT 1
) Id
FROM Words G
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
NOTE: the query groups by the required columns, and the subselect is used to take a random Id from all the possible Ids in the group. Then main query is filtered to take three random rows.