Add ID to duplicates in two columns - sql

I have a table representing trade exchanges between cities and I'd like to add an id that would indicate groups of same origin/destination and destination/origin alike.
For example:
| origin | destination
|--------|------------
| 8 | 2
| 2 | 8
| 8 | 2
| 8 | 5
| 8 | 5
| 9 | 1
| 1 | 9
would become:
| id | origin | destination
|----|--------|------------
| 0 | 8 | 2
| 0 | 2 | 8
| 0 | 8 | 2
| 1 | 8 | 5
| 1 | 8 | 5
| 2 | 9 | 1
| 2 | 1 | 9
I can have same origin/destination but I can also have origin/destination = destination/origin and I want all of those groups identified.

One way: with the window function dense_rank() and GREATEST / LEAST:
SELECT dense_rank() OVER (ORDER BY GREATEST(origin, destination)
, LEAST (origin, destination)) - 1 AS id
, origin, destination
FROM trade;
db<>fiddle here
- 1 to start with 0 like your example.

Related

Update where value pair matches in SQL

I need to update this table:
Centers:
+-----+------------+---------+--------+
| id | country | process | center |
+-----+------------+---------+--------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 3 | 1 |
| 4 | 2 | 1 | 1 |
| 5 | 2 | 2 | 1 |
| 6 | 2 | 3 | 1 |
| 7 | 3 | 1 | 1 |
| 8 | 3 | 2 | 1 |
| 9 | 3 | 3 | 1 |
+-----+------------+---------+--------+
During a selection process I retrieve two tempTables:
TempCountries:
+-----+------------+
| id | country |
+-----+------------+
| 1 | 1 |
| 2 | 3 |
+-----+------------+
And TempProcesses:
+-----+------------+
| id | process |
+-----+------------+
| 1 | 2 |
| 2 | 3 |
+-----+------------+
In a subquery I get all possible combinations of the values:
SELECT TempCountries.countryId, TempProcesses.processesId FROM TempCenterCountries,TempCenterProcesses
This returns:
+-----+------------+---------+
| id | country | process |
+-----+------------+---------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 3 | 2 |
| 4 | 3 | 3 |
+-----+------------+---------+
During the selection process the user chooses a center for these combinations. Let’s say center = 7.
Now I need to update the center value in the Centers table where the combinations of the subquery are present.
So,
UPDATE Centers SET center = 7 WHERE ?
So I get:
+-----+------------+---------+--------+
| id | country | process | center |
+-----+------------+---------+--------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 7 |
| 3 | 1 | 3 | 7 |
| 4 | 2 | 1 | 1 |
| 5 | 2 | 2 | 1 |
| 6 | 2 | 3 | 1 |
| 7 | 3 | 1 | 1 |
| 8 | 3 | 2 | 7 |
| 9 | 3 | 3 | 7 |
+-----+------------+---------+--------+
Not all sql implementations let you have a from clause when using update. Fortunately in your case since you're doing a Cartesian product to get all the combinations it implies that you don't have any constraints between the two values.
UPDATE Centers
SET center = 7
WHERE country IN (SELECT countryId FROM TempCountries)
AND process IN (SELECT processId FROM TempCenterProcesses)
Try if this standard sql,
Update Centers
set center = 7
where country in (select country from TempCenterCountries)
and process in (select process from TempCenterProcesses)
You need to have exact match of country as well as process before you run the update query. So, something like below query would help you achieve that. Basically update the column if there exists a record
WITH (SELECT TempCountries.countryId, TempProcesses.processesId
FROM TempCenterCountries,
TempCenterProcesses) AS TempTables,
UPDATE Centers
SET center = 7
WHERE EXISTS (SELECT 1
FROM TempTables tmp
WHERE country = tmp.countryId and process = tmp.processesId
);
The idea is to update the record if both country and process matches with the one you have already fetched in temporary table.
Use update join -
For Sql Server
update c set SET center = 7 from Centers c
join
(SELECT TempCountries.countryId, TempProcesses.processesId FROM TempCenterCountries join TempCenterProcesses
)A on c.countryid=A.countryid and c.processesId=A.processId
For Mysql -
update Centers c
join
(SELECT TempCountries.countryId, TempProcesses.processesId FROM TempCenterCountries join TempCenterProcesses
)A on c.countryid=A.countryid and c.processesId=A.processId
set SET center = 7

hive - split a row into multiple rows between the range of values

I have a table below and would like to split the rows by the range from start to end columns.
i.e id and value should repeat for each value between start & end(both inclusive)
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in java/python to get this result but would like to check if I can implement in Hive SQL using any existing hive UDFs
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id,t.value,n.num as current
from tbl t
join numbers n on n.num>=t.start and n.num<=t.end
You can do using posexplode() UDF.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is its Output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+

SQL order by but repeat crescent numbers

I'm using SQL Server 2014 and i'm having a trouble with a query.
I have this scenario bellow:
| Number | Series | Name |
|--------|--------|---------|
| 9 | 1 | Name 1 |
| 5 | 3 | Name 2 |
| 8 | 2 | Name 3 |
| 7 | 3 | Name 4 |
| 0 | 1 | Name 5 |
| 1 | 2 | Name 6 |
| 9 | 2 | Name 7 |
| 3 | 3 | Name 8 |
| 4 | 1 | Name 9 |
| 0 | 1 | Name 10 |
and I need to get it ordered by series column like this:
| Number | Series | Name |
|--------|--------|---------|
| 9 | 1 | Name 1 |
| 8 | 2 | Name 3 |
| 5 | 3 | Name 2 |
| 7 | 1 | Name 5 |
| 1 | 2 | Name 6 |
| 0 | 3 | Name 4 |
| 4 | 1 | Name 9 |
| 9 | 2 | Name 7 |
| 3 | 3 | Name 8 |
| 0 | 1 | Name 10 |
Actually is more a sequency in "series" column than an ordenation.
1,2,3 again 1,2,3...
Somebody could help me?
You can do this using the ANSI standard function row_number():
select number, series, name
from (select t.*, row_number() over (partition by series order by number) as seqnum
from t
) t
order by seqnum, series;
This assigns "1" to the first record for each series, "2" to the second, and so on. The outer order by then puts all the "1"s together, all the "2" together. This has the effect of interleaving the values of the series.

How to select if similar field count is the maximum in the table?

I want to select from a table if row counts of similar filed is maximum depends on other columns.
As example
| user_id | team_id | isOk |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 2 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
| 10 | 3 | 1 |
| 11 | 3 | 0 |
So i want to select team 1 and 2 because they all have 1 value at isOk Column,
i tried to use this query
SELECT Team
FROM _Table1
WHERE isOk= 1
GROUP BY Team
HAVING COUNT(*) > 3
But still i have to define a row count which can be maximum or not.
Thanks in advance.
Is this what you are looking for?
select team
from _table1
group by team
having min(isOk) = 1;

SQL - Select distinct on two column

I have this table 'words' with more information:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 1 | 1 | 1
| 2 | 1 | 1
| 3 | 1 | 1
| 4 | 1 | 2
| 5 | 1 | 2
| 6 | 1 | 2
| 7 | 2 | 3
| 8 | 2 | 3
| 9 | 2 | 3
| 10 | 2 | 4
| 11 | 2 | 4
| 12 | 3 | 5
| 13 | 3 | 5
| 14 | 3 | 6
| 15 | 3 | 6
| 16 | 3 | 6
And this query that gives to me 3 random ids from different categories, but not from different themes too:
SELECT Id
FROM words
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
What I want as result is:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 2 | 1 | 1
| 7 | 2 | 3
| 14 | 3 | 6
That is, repeat no category or theme.
When you use GROUP BY you cannot include in the select list a column which is not being ordered. So, in your query it's impossible to inlcude Id in the select list.
So you need to do something a bit more complex:
SELECT Id_Category, Id_Theme,
(SELECT Id FROM Words W
WHERE W.Id_Category = G.Id_Category AND W.Id_Theme = G.Id_Theme
ORDER BY RAND() LIMIT 1
) Id
FROM Words G
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
NOTE: the query groups by the required columns, and the subselect is used to take a random Id from all the possible Ids in the group. Then main query is filtered to take three random rows.