SQL: Get the first 3 occurrences of a field

I have a PostgreSQL table with 2 fields like the following. Field A is the primary key.
A | B
------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 2
6 | 2
7 | 2
8 | 2
9 | 2
10 | 3
11 | 3
I'm looking for a query that returns only the first 3 rows for each value of B, like this:
A | B
1 | 1
2 | 1
3 | 1
5 | 2
6 | 2
7 | 2
10 | 3
11 | 3
Does somebody have a solution?

You want row_number():
select t.*
from (select t.*, row_number() over (partition by b order by a) as seq
      from your_table t
     ) t
where seq <= 3;
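As a way to try the row_number() approach without a PostgreSQL server, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. The table name `t`, the helper `first_n_per_group`, and the use of SQLite (version 3.25 or later is needed for window functions) are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Sample rows (a, b) copied from the question.
SAMPLE = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2),
          (7, 2), (8, 2), (9, 2), (10, 3), (11, 3)]


def first_n_per_group(rows, n=3):
    """Return the first n rows (ordered by a) for each value of b."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER)")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    query = """
        SELECT a, b
        FROM (SELECT a, b,
                     ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) AS seq
              FROM t) AS numbered
        WHERE seq <= ?
        ORDER BY a
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


print(first_n_per_group(SAMPLE))
```

The numbering restarts at 1 for each value of b, so the outer filter keeps at most three rows per group; the group with b = 3 has only two rows and both are kept.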

Related

Generating Duplicate Data Series

In this example I am trying to generate and add a column that cycles through 1 to 5, repeated for as many rows as I want. Can I solve this with CONNECT BY or another function?
SELECT level
FROM dual
CONNECT BY level <=5;
ID | Name | Expected Outcome
----------------------------
1 | A | 1
2 | B | 2
3 | C | 3
4 | D | 4
5 | E | 5
6 | F | 1
7 | G | 2
8 | G | 3
9 | A | 4
10 | E | 5
11 | E | 1
12 | E | 2
Use the MOD function:
SELECT MOD(level - 1,5) + 1
FROM dual
CONNECT BY level <=20;
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
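The arithmetic behind MOD(level - 1, 5) + 1 can be checked outside the database. The sketch below is plain Python rather than Oracle SQL, and the helper name `cycle_value` is hypothetical. Subtracting 1 before taking the remainder and adding 1 afterwards keeps the result in 1..5; a bare MOD(level, 5) would produce 0 at every multiple of 5.

```python
def cycle_value(position, period=5):
    """Map a 1-based position onto the repeating sequence 1..period."""
    return (position - 1) % period + 1


# IDs 1..12 as in the question's table.
labels = [cycle_value(i) for i in range(1, 13)]
print(labels)
```

For IDs 1 through 12 this gives the same values as the "Expected Outcome" column in the question.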

hive - split a row into multiple rows between the range of values

I have the table below and would like to split each row by the range between the start and end columns.
That is, id and value should repeat for each value between start and end (both inclusive).
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in Java/Python to get this result, but I would like to check whether I can implement it in Hive SQL using existing Hive UDFs.
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id, t.value, n.num as `current`
from tbl t
join numbers n on n.num >= t.start and n.num <= t.`end`;
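The numbers-table join can be tried locally with SQLite instead of Hive. This is only a sketch under assumptions: the columns are renamed to `range_start`, `range_end` and `current_num` to avoid keyword quoting, the numbers table is filled from Python rather than from an existing table, and `expand_ranges` and `max_num` are hypothetical names.

```python
import sqlite3


def expand_ranges(rows, max_num=100):
    """Repeat (id, value) once for every number in [range_start, range_end]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER, value INTEGER, "
        "range_start INTEGER, range_end INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    # Helper table holding 1..max_num; it must cover the largest range_end.
    conn.execute("CREATE TABLE numbers (num INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, max_num + 1)]
    )
    query = """
        SELECT t.id, t.value, n.num AS current_num
        FROM tbl t
        JOIN numbers n ON n.num >= t.range_start AND n.num <= t.range_end
        ORDER BY t.id, n.num
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(expand_ranges([(1, 5, 1, 4), (2, 8, 5, 9)]))
```

The one thing to watch is the size of the numbers table: any range that extends past its largest value is silently truncated.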
You can do this using the posexplode() function.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is the output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+
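Why the space/split/posexplode combination yields one row per number may not be obvious, so here is a rough emulation of the idea in plain Python. It is not Hive code, and the function name `expand_with_positions` is hypothetical. A string of n spaces split on a space has n + 1 (empty) pieces, and pairing each piece with its 0-based position gives the offsets 0..n to add to start.

```python
def expand_with_positions(row_id, value, start, end):
    """Emulate posexplode(split(space(end - start), ' ')) for a single row."""
    # space(end - start): a string of (end - start) spaces.
    # Splitting it on ' ' produces (end - start + 1) empty strings.
    pieces = (" " * (end - start)).split(" ")
    # posexplode pairs each piece with its 0-based position.
    return [(row_id, value, start + pos) for pos, _ in enumerate(pieces)]


print(expand_with_positions(1, 5, 1, 4))
print(expand_with_positions(2, 8, 5, 9))
```

Only the position is used; the exploded value itself (the empty string, aliased as x in the query) is discarded.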

How to find ranges of sequential numbers without gaps in a table

I am trying to find ranges of numbers without a certain value within a table grouped by a different identifier.
If I have a table like this:
ID | Type | Bad Value | Bad Value 2
4 | a | 0 | 0
5 | a | 0 | 0
6 | a | 0 | 0
7 | a | 0 | 1
8 | a | 1 | 0
9 | a | 0 | 0
2 | b | 0 | 0
3 | b | 0 | 0
4 | b | 1 | 0
5 | b | 1 | 1
6 | b | 0 | 0
7 | b | 0 | 0
6 | c | 0 | 0
7 | c | 0 | 1
8 | c | 1 | 0
9 | c | 0 | 0
I would like to get an output like this:
FROM | TO | Group
4 | 6 | a
9 | 9 | a
2 | 3 | b
6 | 7 | b
6 | 6 | c
9 | 9 | c
I found similar solutions here, but none of them work in Oracle; I get a "missing expression" error.
Is there a way to go about doing this? The table in question will have several hundred thousand entries in it.
You need to identify groups that are the same. There is a trick to this, which is a difference of row numbers.
select min(id) as fromid, max(id) as toid, type
from (select t.*,
             (row_number() over (partition by type order by id) -
              row_number() over (partition by type, badvalue order by id)
             ) as grp
      from your_table t
     ) grp
where badvalue = 0
group by grp, type;
There is a nuance here, because you only seem to want rows where "bad value" is 0. Note that this condition goes in the outer select, so it doesn't interfere with the row_number() calculations.
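To see the difference-of-row-numbers trick produce the ranges from the question, here is a sketch that runs it on the sample data in an in-memory SQLite database (3.25 or later for window functions). It makes one assumption beyond the query above: a row counts as bad when either bad-value column is non-zero, which is what the expected output in the question implies. The table name `readings`, the derived column `bad`, and the function `clean_ranges` are hypothetical.

```python
import sqlite3

# (id, type, badvalue, badvalue2) rows copied from the question.
ROWS = [
    (4, "a", 0, 0), (5, "a", 0, 0), (6, "a", 0, 0),
    (7, "a", 0, 1), (8, "a", 1, 0), (9, "a", 0, 0),
    (2, "b", 0, 0), (3, "b", 0, 0), (4, "b", 1, 0),
    (5, "b", 1, 1), (6, "b", 0, 0), (7, "b", 0, 0),
    (6, "c", 0, 0), (7, "c", 0, 1), (8, "c", 1, 0), (9, "c", 0, 0),
]


def clean_ranges(rows):
    """Return (fromid, toid, type) for each run of rows with no bad value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER, type TEXT, "
        "badvalue INTEGER, badvalue2 INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH flagged AS (
            -- Assumption: a row is bad if either column is non-zero.
            SELECT id, type,
                   CASE WHEN badvalue = 0 AND badvalue2 = 0 THEN 0 ELSE 1 END AS bad
            FROM readings
        ),
        numbered AS (
            -- Consecutive rows with the same flag share the same difference.
            SELECT id, type, bad,
                   ROW_NUMBER() OVER (PARTITION BY type ORDER BY id)
                   - ROW_NUMBER() OVER (PARTITION BY type, bad ORDER BY id) AS grp
            FROM flagged
        )
        SELECT MIN(id) AS fromid, MAX(id) AS toid, type
        FROM numbered
        WHERE bad = 0
        GROUP BY type, grp
        ORDER BY type, fromid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(clean_ranges(ROWS))
```

Note that the trick groups rows that are adjacent in id order with the same flag; it does not itself check that the ids are numerically consecutive, so a table with missing ids would need an extra condition.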

SQL - Select distinct on two column

I have this table 'words' with more information:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 1 | 1 | 1
| 2 | 1 | 1
| 3 | 1 | 1
| 4 | 1 | 2
| 5 | 1 | 2
| 6 | 1 | 2
| 7 | 2 | 3
| 8 | 2 | 3
| 9 | 2 | 3
| 10 | 2 | 4
| 11 | 2 | 4
| 12 | 3 | 5
| 13 | 3 | 5
| 14 | 3 | 6
| 15 | 3 | 6
| 16 | 3 | 6
And this query, which gives me 3 random ids from different categories, but not from different themes as well:
SELECT Id
FROM words
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
What I want as result is:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 2 | 1 | 1
| 7 | 2 | 3
| 14 | 3 | 6
That is, no category or theme should be repeated.
When you use GROUP BY, you cannot reliably include in the select list a column that is neither grouped nor aggregated. So, in your query, you cannot simply include Id in the select list.
You need to do something a bit more complex:
SELECT Id_Category, Id_Theme,
(SELECT Id FROM Words W
WHERE W.Id_Category = G.Id_Category AND W.Id_Theme = G.Id_Theme
ORDER BY RAND() LIMIT 1
) Id
FROM Words G
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
NOTE: the query groups by the required columns, and the subselect takes a random Id from all the possible Ids in each group. The main query is then limited to three random rows.
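The grouped query with a correlated random subselect can be tried on the sample data using SQLite, as a sketch only. SQLite spells the function RANDOM() rather than MySQL's RAND(), and the function name `random_ids` is hypothetical.

```python
import sqlite3

# (id, id_category, id_theme) rows copied from the question.
WORDS = [
    (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 2), (5, 1, 2), (6, 1, 2),
    (7, 2, 3), (8, 2, 3), (9, 2, 3), (10, 2, 4), (11, 2, 4),
    (12, 3, 5), (13, 3, 5), (14, 3, 6), (15, 3, 6), (16, 3, 6),
]


def random_ids(rows, how_many=3):
    """Pick how_many random (category, theme) groups and one random id from each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, "
        "id_category INTEGER, id_theme INTEGER)"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
    query = """
        SELECT g.id_category, g.id_theme,
               (SELECT w.id FROM words w
                WHERE w.id_category = g.id_category
                  AND w.id_theme = g.id_theme
                ORDER BY RANDOM() LIMIT 1) AS id
        FROM words g
        GROUP BY g.id_category, g.id_theme
        ORDER BY RANDOM()
        LIMIT ?
    """
    result = conn.execute(query, (how_many,)).fetchall()
    conn.close()
    return result


picked = random_ids(WORDS)
print(picked)
```

Grouping by the pair guarantees that no (category, theme) combination repeats, but two of the chosen rows can still share a category, so an extra step would be needed if each category must also appear at most once.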

SQL Count by equal columns Query

I have this table:
subscriberID | date | segmentID | Counter
------------------------------------------
1 | 1.1 | 2 | 3
1 | 2.1 | 4 | 2
1 | 3.1 | 4 | 5
2 | 1.1 | 1 | 12
2 | 2.1 | 1 | 1
2 | 3.1 | 2 | 10
3 | 1.1 | 2 | 4
I have to write a SQL query that does the following:
Get the top 3 most common segmentIDs (by counter) for a given subscriberID.
Can anyone help me with that?
Thanks.
select segmentID
from your_table
where subscriberID = 123
group by segmentID
order by sum(counter) desc
To get only 3 records you have to limit your result. Depending on your DB engine that could be top 3 or limit 3 or rownum <= 3.
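For a concrete run of the grouped query, the sketch below uses SQLite, where the row limit is written as LIMIT. The table name `usage_log`, the column name `day` (standing in for the question's date column), and the function `top_segments` are assumptions for illustration.

```python
import sqlite3

# (subscriberID, day, segmentID, counter) rows copied from the question.
USAGE = [
    (1, "1.1", 2, 3), (1, "2.1", 4, 2), (1, "3.1", 4, 5),
    (2, "1.1", 1, 12), (2, "2.1", 1, 1), (2, "3.1", 2, 10),
    (3, "1.1", 2, 4),
]


def top_segments(rows, subscriber_id, limit=3):
    """Return up to `limit` segmentIDs for one subscriber, highest total counter first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage_log (subscriberID INTEGER, day TEXT, "
        "segmentID INTEGER, counter INTEGER)"
    )
    conn.executemany("INSERT INTO usage_log VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT segmentID
        FROM usage_log
        WHERE subscriberID = ?
        GROUP BY segmentID
        ORDER BY SUM(counter) DESC
        LIMIT ?
    """
    result = [row[0] for row in conn.execute(query, (subscriber_id, limit))]
    conn.close()
    return result


print(top_segments(USAGE, 1))
print(top_segments(USAGE, 2))
```

For subscriber 1, segment 4 totals 7 and segment 2 totals 3, so segment 4 comes first; a subscriber with fewer than three segments simply returns fewer rows.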