source destination table without duplication - sql

i m trying to create a new data table using the table below :
ID City Language
1 Paris French
2 New York English
3 Delhi English
4 Berlin German
5 Marseille French
6 Hamburg German
the ouput should be something like this:
City 1 City2 Language
Paris Marseille French
NY Delhi English
Berlin Hamburg German
the main idea here is to avoid 2 rows with same city like for example Paris-Marseille and Marseille-Paris.
please advise on how to achieve it.

Do you just want aggregation?
select min(city), max(city), language
from t
group by language;

If there are only two rows per language, then simple aggregation is enough:
select min(city) city1, max(city) city2, language
from mytable
group by language
If you want to possibly handle more cities, and/or control the order in which they appear in the columns based on the id of the intial row, then you can use window functions and conditional aggregation:
select
max(case when rn = 1 then city end) city1,
max(case when rn = 2 then city end) city2,
max(case when rn = 3 then city end) city3
from (
select t.*, row_number() over(partition by language order by id) rn
from mytable t
) t
group by language

Related

How to check how many times some values are duplicated?

I have table like below:
city | segment
------------------
London | A
London | B
New York | A
Berlin | B
Barcelona | C
Barcelona | H
Barcelona | E
Each city should have only one segment, but as you can see there are two cities (London and Barcelona) that have more than one segment.
It is essential that in result table I need only these cities which have > 1 segmnet
As a result I need somethig like below:
city - city based on table above
no_segments - number of segments which have defined city based on table above
segments - segments of defined city based on table above
city
no_segments
segments
London
2
A
B
Barcelona
3
C
H
E
How can I do that in Oracle?
You can use COUNT(*) OVER ()(in order to get number of segments) and ROW_NUMBER()(in order to prepare the results those will be conditionally displayed) analytic functions such as
WITH t1 AS
(
SELECT city,
segment,
COUNT(*) OVER (PARTITION BY city) AS no_segments,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY segment) rn
FROM t
)
SELECT DECODE(rn,1,city) AS city,
DECODE(rn,1,no_segments) AS no_segments,
segment
FROM t1
WHERE no_segments > 1
ORDER BY t1.city, segment
Demo
Another way to do this is:
SELECT NULLIF(CITY, PREV_CITY) AS CITY,
SEGMENT
FROM (SELECT CITY,
LAG(CITY) OVER (ORDER BY CITY DESC) AS PREV_CITY,
SEGMENT,
COUNT(SEGMENT) OVER (PARTITION BY CITY) AS CITY_SEGMENT_COUNT
FROM CITY_SEGMENTS)
WHERE CITY_SEGMENT_COUNT > 1
Using LAG() to determine the "previous" CITY allows us to directly compare the CITY values, which in my mind is clearer that using ROW_NUMBER = 1.
db<>fiddle here
;with cte as (
Select city, count(seg) as cntseg
From table1
Group by city having count(seg) > 1
)
Select a.city, b.cntseg, a.seg
From table1 as a join cte as b
On a.city = b.city

which command should i use on SQL for having?

I want your help.
I have 2 columns with data, customer and city.
I want to show customers a mix if this customer has more different cities, and I want to show the customer if it has only 1 city.
for example, I have this data:
customer city
ana London
Ella London
Sarah Paris
Haidi Greece
Chloe France
ana London
Ella france
I want to show it like this:
Ana London
Ella Mix
Sarah Paris
Haidi Greece
Chloe France
How could I do this? Which command should I use?
Here you go.
Select
Customer,
case when count(DISTINCT city) > 1 then 'MIX' Else max(City) End as City
from MyTable
Group by Customer
You could use a command like before:
Select
Customer,
case when count(city) > 1 then 'MIX' Else City End as City
from Mytable
Group by Customer, City
but Ana is present twice in the table, and you have to use the Max to solve
Select
Customer,
case when count(city) > 1 then 'MIX' Else MAX(City) End as City
from Mytable
Group by Customer
If you wanted NULL instead of 'Mix', you could use:
select customer, nullif(min(city), max(city)) as city
from t
group by customer;
You can actually extend this to get 'Mix':
select customer, coalesce(nullif(min(city), max(city)), 'Mix') as city
from t
group by customer;
But NULL makes sense to me.
Here's what I think you're looking for
select customer,
case when count(*)>1
then 'mix'
else max(city) end as city
from t
group by customer;

I'd like some help to write sql code to return a list of customer data items ranked by frequency (high to low)

The table I am querying has several thousand rows and numerous fields - I'd like the code to return the top 10 values for a handful of the fields, namely: Forename, Surname and City - I'd also like to see a count of the values returned.
For example
Ranking
Forename
FName Frequency
Surname
SName Frequency
City
City Frequency
1
Liam
830,091
Smith
2,353,709
New York
2,679,785
2
Mary
708,390
Johnson
1,562,990
Los Angeles
413,359
3
Noah
639,592
Williams
792,306
Chicago
393,511
4
Patricia
568,410
Brown
743,346
Houston
367,496
5
William
557,049
Jones
633,933
Phoenix
336,929
6
Linda
497,138
Miller
503,523
Philadelphia
304,638
7
James
490,665
Davis
503,115
San Antonio
255,142
8
Barbara
418,312
Garcia
468,683
San Diego
238,521
9
Logan
399,947
Rodriguez
461,816
Dallas
232,718
10
Elizabeth
399,737
Wilson
436,843
San Jose
213,483
The returned list should be interpreted thus:
The most frequently occurring forename in the table is Liam - with 830,091 instances,
The 5th most frequently occurring forename is William - with 557,049 instances,
The 8th most frequently occurring city is San Diego - with 238,521 instances
...and so on
(N.b. the table does not show there are 2.7m Liams in New York - just that there are 830,091 Liams in the entire table - and that there are 2,679,785 New York addresses in the entire table)
The following produces what I need - but just for the first field (Forename) - I'd like to be able to do the same for three fields
SELECT Forename, COUNT(Forename) AS FName_Frequency
FROM Customer_Table
GROUP BY Forename
ORDER BY FName_Frequency DESC
limit 10
Thanks in anticipation
I would just put this in separate rows:
select 'forename', forename, count(*) as freq
from customer_table
group by forename
order by freq desc
fetch first 10 rows only
union all
select 'surname', surname, count(*) as freq
from customer_table
group by surname
order by freq desc
fetch first 10 rows only
union all
select 'city', city, count(*) as freq
from customer_table
group by city
order by freq desc
fetch first 10 rows only;
Note that this uses Standard SQL syntax, because you have not tagged with the question with the database you are using. You can also put this in separate columns, using:
select max(case when which = 'forename' then col end),
max(case when which = 'forename' then freq end),
max(case when which = 'surname' then col end),
max(case when which = 'surname' then freq end),
max(case when which = 'city' then col end),
max(case when which = 'city' then freq end)
from ((select 'forename' as which, forename as col, count(*) as freq,
row_number() over (order by count(*) desc) as seqnum
from customer_table
group by forename
) union all
(select 'surname' as which, surname, count(*) as freq
row_number() over (order by count(*) desc) as seqnum
from customer_table
group by surname
) union all
(select 'city', city, count(*) as freq,
row_number() over (order by count(*) desc) as seqnum
from customer_table
group by city
)
) x
group by seqnum;

SELECT Top 1 ID, DISTINCT Field

I have a table sample table as follows:
ID | City
--------------
1 | New York
2 | San Francisco
3 | New York
4 | Los Angeles
5 | Atlanta
I would like to select the distinct City AND the TOP ID for each. E.g., conceptually I would like to do the following
SELECT TOP 1 ID, DISTINCT City
FROM Cities
Should give me:
ID | City
--------------
1 | New York
2 | San Francisco
4 | Los Angeles
5 | Atlanta
Because New York appears twice, it's taken the first ID 1 in this instance.
But I get the error:
Column 'Cities.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Try this way:
SELECT min(ID), City
FROM Cities
Group by City
MIN function is used for choose one of the ID from two New York cities.
You need to have your city in a GROUP BY
SELECT MIN(ID), City
FROM Cities
GROUP BY City
More general solution is to use row_number in order to get other details of table:
select * from
(select *, row_number() over(partition by City order by ID) as rn from Cities)
where rn = 1
But for this particular table just grouping will do the work:
select City, Min(ID) as ID
from Cities
group by City
If you have a complex scenario where Group By cannot use, You could use Row_Number() function with Common Table Expression.
;WITH CTE AS
(
SELECT ID, City, ROW_NUMBER() OVER (PARTITION BY City ORDER BY Id) rn
FROM YourTable
)
SELECT Id, City
FROM CTE
WHERE rn = 1

SQL Query - SELECT distinct IDs with 2 extra column

Im working in an SQL Query like this: (sorted by the station visits)
TRAIN_ID TYPE STATION
111 'KC' New York
111 'KC' Washington
111 'KC' Boston
111 'KC' Denver
222 'FC' London
222 'FC' Paris
I'd like to SELECT distinct trains, and actual row must include the first and the last station like:
TRAIN_ID TYPE FIRSTSTATION LASTSTATION
111 'KC' New York Denver
222 'FC' Denver Paris
Anyone can give a hand? Thank you in anticipation!
Assuming you find something to define an order on the stations so that you can identify the "last" and "first" one, the following should work:
WITH numbered_stations AS (
SELECT train_id,
type,
row_number() over (partition by train_id order by some_order_column) as rn,
count(*) over (partition by train_id) as total_stations
FROM the_unknown_table
)
SELECT f.train_id,
f.type,
f.station as first_station,
l.station as last_station
FROM (SELECT train_id,
type
station
FROM numbered_stations
WHERE rn = 1
) f
JOIN (SELECT train_id,
type,
station
FROM numbered_stations
WHERE rn = total_stations) l
ON f.train_id = l.train_id
ORDER BY train_id
This assumes that some_order_column can be used to identify the last and first station.
It also assumes that the type is always the same for all combinations of train_id and station.
The shown syntax is standard ANSI SQL and should work on most modern DBMS.