SQL - Select distinct on two column

SQL - Select distinct on two column - sql

I have this table 'words' with more information:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 1 | 1 | 1
| 2 | 1 | 1
| 3 | 1 | 1
| 4 | 1 | 2
| 5 | 1 | 2
| 6 | 1 | 2
| 7 | 2 | 3
| 8 | 2 | 3
| 9 | 2 | 3
| 10 | 2 | 4
| 11 | 2 | 4
| 12 | 3 | 5
| 13 | 3 | 5
| 14 | 3 | 6
| 15 | 3 | 6
| 16 | 3 | 6
And this query that gives to me 3 random ids from different categories, but not from different themes too:
SELECT Id
FROM words
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
What I want as result is:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 2 | 1 | 1
| 7 | 2 | 3
| 14 | 3 | 6
That is, repeat no category or theme.

When you use GROUP BY you cannot include in the select list a column which is not being ordered. So, in your query it's impossible to inlcude Id in the select list.
So you need to do something a bit more complex:
SELECT Id_Category, Id_Theme,
(SELECT Id FROM Words W
WHERE W.Id_Category = G.Id_Category AND W.Id_Theme = G.Id_Theme
ORDER BY RAND() LIMIT 1
) Id
FROM Words G
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
NOTE: the query groups by the required columns, and the subselect is used to take a random Id from all the possible Ids in the group. Then main query is filtered to take three random rows.

Related

SQL How to summarize integer/numeric values on different rows

I am trying to merge integer and numeric values from different SQL rows within the same table into one row so that they are summarized.
| ID | Count | Total Payment
1 | 1 | 5 | 10.99
2 | 1 | 3 | 4.86
3 | 2 | 8 | 19.88
4 | 2 | 2 | 15.99
5 | 2 | 5 | 8.45
6 | 3 | 4 | 12.98
7 | 3 | 10 | 40.42
As such I want to summarize the above rows into the below rows.
| ID | Count | Total Payment
1 | 1 | 8 | 15.85
2 | 2 | 15 | 44.32
3 | 3 | 14 | 53.40
How do I do this?

Thank you HonyBadger and Mathieu Guindon.
The correct code was:
SELECT [id], SUM([count]), SUM([total_payment])
FROM [table_name]
GROUP BY [id]
ORDER BY [count], [total_payment];

Add ID to duplicates in two columns

I have a table representing trade exchanges between cities and I'd like to add an id that would indicate groups of same origin/destination and destination/origin alike.
For example:
| origin | destination
|--------|------------
| 8 | 2
| 2 | 8
| 8 | 2
| 8 | 5
| 8 | 5
| 9 | 1
| 1 | 9
would become:
| id | origin | destination
|----|--------|------------
| 0 | 8 | 2
| 0 | 2 | 8
| 0 | 8 | 2
| 1 | 8 | 5
| 1 | 8 | 5
| 2 | 9 | 1
| 2 | 1 | 9
I can have same origin/destination but I can also have origin/destination = destination/origin and I want all of those groups identified.

One way: with the window function dense_rank() and GREATEST / LEAST:
SELECT dense_rank() OVER (ORDER BY GREATEST(origin, destination)
, LEAST (origin, destination)) - 1 AS id
, origin, destination
FROM trade;
db<>fiddle here
- 1 to start with 0 like your example.

hive - split a row into multiple rows between the range of values

I have a table below and would like to split the rows by the range from start to end columns.
i.e id and value should repeat for each value between start & end(both inclusive)
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in java/python to get this result but would like to check if I can implement in Hive SQL using any existing hive UDFs
Thanks in advance.

This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id,t.value,n.num as current
from tbl t
join numbers n on n.num>=t.start and n.num<=t.end

You can do using posexplode() UDF.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is its Output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+

SQL order by but repeat crescent numbers

I'm using SQL Server 2014 and i'm having a trouble with a query.
I have this scenario bellow:
| Number | Series | Name |
|--------|--------|---------|
| 9 | 1 | Name 1 |
| 5 | 3 | Name 2 |
| 8 | 2 | Name 3 |
| 7 | 3 | Name 4 |
| 0 | 1 | Name 5 |
| 1 | 2 | Name 6 |
| 9 | 2 | Name 7 |
| 3 | 3 | Name 8 |
| 4 | 1 | Name 9 |
| 0 | 1 | Name 10 |
and I need to get it ordered by series column like this:
| Number | Series | Name |
|--------|--------|---------|
| 9 | 1 | Name 1 |
| 8 | 2 | Name 3 |
| 5 | 3 | Name 2 |
| 7 | 1 | Name 5 |
| 1 | 2 | Name 6 |
| 0 | 3 | Name 4 |
| 4 | 1 | Name 9 |
| 9 | 2 | Name 7 |
| 3 | 3 | Name 8 |
| 0 | 1 | Name 10 |
Actually is more a sequency in "series" column than an ordenation.
1,2,3 again 1,2,3...
Somebody could help me?

You can do this using the ANSI standard function row_number():
select number, series, name
from (select t.*, row_number() over (partition by series order by number) as seqnum
from t
) t
order by seqnum, series;
This assigns "1" to the first record for each series, "2" to the second, and so on. The outer order by then puts all the "1"s together, all the "2" together. This has the effect of interleaving the values of the series.

How to select if similar field count is the maximum in the table?

I want to select from a table if row counts of similar filed is maximum depends on other columns.
As example
| user_id | team_id | isOk |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 2 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
| 10 | 3 | 1 |
| 11 | 3 | 0 |
So i want to select team 1 and 2 because they all have 1 value at isOk Column,
i tried to use this query
SELECT Team
FROM _Table1
WHERE isOk= 1
GROUP BY Team
HAVING COUNT(*) > 3
But still i have to define a row count which can be maximum or not.
Thanks in advance.

Is this what you are looking for?
select team
from _table1
group by team
having min(isOk) = 1;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL - Select distinct on two column - sql

Related

SQL How to summarize integer/numeric values on different rows

Add ID to duplicates in two columns

hive - split a row into multiple rows between the range of values

SQL order by but repeat crescent numbers

How to select if similar field count is the maximum in the table?

Categories

Resources