Concatenate distinct values in a group by [duplicate] - sql

This question already has answers here:
How to concatenate strings of a string field in a PostgreSQL 'group by' query?
(14 answers)
Closed last year.
I have data like this:
Group Provider
A ABC
A DEF
B DEF
B HIJ
And I want to transform the data like this:
Group ProviderList
A ABC, DEF
B DEF, HIJ
I was trying something like this, using concat(select distinct ...), but I'm not sure it is the best approach:
SELECT distinct
group,
CONCAT(select distinct provider from data)
FROM data
GROUP BY 1

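What Laurenz meant with string_agg() is the following (group is a reserved word in PostgreSQL, so it has to be quoted when used as a column name):
SELECT
"group",
STRING_AGG(provider, ', ') AS ProviderList
FROM data
GROUP BY 1
Optionally, to get a sorted list, you could also use:
STRING_AGG(provider, ', ' ORDER BY provider)
If the same provider can appear more than once within a group, add DISTINCT:
STRING_AGG(DISTINCT provider, ', ' ORDER BY provider)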
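To make the semantics concrete, here is a minimal plain-Python sketch of what a grouped, de-duplicated, sorted string aggregation computes. It is not how PostgreSQL implements string_agg(); the function name concat_distinct_by_group and the in-memory list of rows are assumptions made purely for illustration.

```python
from collections import defaultdict


def concat_distinct_by_group(rows, sep=", "):
    """Return {group: joined distinct providers}, sorted within each group.

    Hypothetical helper: `rows` is an iterable of (group, provider) tuples.
    """
    providers = defaultdict(set)  # a set silently drops duplicate providers
    for group, provider in rows:
        providers[group].add(provider)
    # sorted() plays the role of the optional ORDER BY inside the aggregate
    return {g: sep.join(sorted(p)) for g, p in sorted(providers.items())}


# Sample data from the question, plus one repeated row to exercise the de-duplication
rows = [("A", "ABC"), ("A", "DEF"), ("B", "DEF"), ("B", "HIJ"), ("A", "ABC")]
result = concat_distinct_by_group(rows)
print(result)
```

In the database itself the aggregate does all of this in one pass, so the sketch is only useful for reasoning about the expected output.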


It's possible to select distinct and no distinct in Pyspark? [duplicate]

This question already has answers here:
How to get distinct rows in dataframe using pyspark?
(2 answers)
Closed 2 years ago.
I need to select 2 columns from a fact table (attached below). The problem is that for one of the columns I need unique values, while for the other one I'm happy to have duplicates, as they belong to a specific ticket id.
Fact table used:
df = (
spark.table(f'nn_table_{country}.fact_table')
.filter(f.col('date_key').between(start_date,end_date))
.filter(f.col('is_client_plus')==1)
.filter(f.col('source')=='tickets')
.filter(f.col('subtype')=='item_pm')
.filter(f.col('external_id')=='DISC0000077144 | DISC0000076895')
.filter(f.col('external_id').isNotNull())
.select('customer_id','external_id').distinct()
#.join(dim_promotions, 'external_id', 'left')
)
display(df)
As you can see, the select statement contains a customer_id and an external_id column, but I'm only interested in getting the unique customer_id values.
.select('customer_id','external_id').distinct()
Desired output:
customer_id external_id
77000000505097070 DISC0000077144
77000002294023644 DISC0000077144
77000000385346302 DISC0000076895
77000000291101490 DISC0000076895
Any idea how to do that, or whether it's possible at all?
Thanks in advance!
Use dropDuplicates:
df.select('customer_id','external_id').dropDuplicates(['customer_id'])
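As a rough illustration of what de-duplicating on a subset of columns means, the following plain-Python sketch keeps one row per customer_id and leaves external_id untouched. It is only a model of the behaviour, not Spark code: the helper drop_duplicates and the sample rows are hypothetical, and it keeps the first row seen, whereas Spark does not promise which of the duplicate rows survives.

```python
def drop_duplicates(rows, subset):
    """Keep the first row seen for each distinct combination of `subset` keys.

    Hypothetical stand-in for de-duplication on a column subset;
    `rows` is a list of dicts, `subset` a list of column names.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample rows: customer 101 appears under two external ids
rows = [
    {"customer_id": 101, "external_id": "DISC-A"},
    {"customer_id": 101, "external_id": "DISC-B"},
    {"customer_id": 102, "external_id": "DISC-B"},
]
unique_customers = drop_duplicates(rows, ["customer_id"])
print(unique_customers)
```

If it matters which external_id is kept for each customer, an explicit ordering or aggregation is needed instead of relying on whichever row happens to come first.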

Is any way to use IN operator to specify pair of values in SQL? [duplicate]

This question already has answers here:
SQL multiple columns in IN clause
(6 answers)
Closed 3 years ago.
In Oracle SQL I need to filter records by multiple pairs of values using a SELECT query. There is a FIRST_ID and a SECOND_ID, and I want the data filtered by specific pairs only.
First I tried concatenating the values, then I wrote out all the pairs with the OR operator, but both approaches need a lot of manual work.
select *
from table_data
where to_char(first_id||';'||second_id) in ('123;354', '422;563', ... '353;536');
or
select *
from table_data
where (first_id = 123 and second_id = 354)
or (first_id = 422 and second_id = 563)
or (first_id = 353 and second_id = 536);
So you see that I can't use two IN operators (one for first_id, another for second_id), because that would return all crossing pairs like 123 - 254, 123 - 562 & 123 - 536 etc. Any ideas how to do it quickly and easily?
Oracle supports IN with tuples:
select *
from table_data
where (first_id, second_id) in ( (123, 354), (422, 563), (353, 536));
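The difference between matching on tuples and matching on two independent IN lists can be sketched in plain Python. This only models the logic of the two predicates and is not Oracle code; the sample rows are made up, while the wanted pairs are taken from the question.

```python
# Made-up table contents as (first_id, second_id) rows
rows = [(123, 354), (123, 563), (422, 563), (353, 536), (422, 354)]

# The exact pairs we want to keep
wanted = {(123, 354), (422, 563), (353, 536)}

# Tuple membership: only exact (first_id, second_id) pairs match
by_pair = [r for r in rows if r in wanted]

# Two independent IN lists: every cross combination matches as well
first_ids = {a for a, _ in wanted}
second_ids = {b for _, b in wanted}
by_columns = [r for r in rows if r[0] in first_ids and r[1] in second_ids]

print(by_pair)
print(by_columns)
```

Here by_columns also contains crossing pairs such as (123, 563), which is exactly the problem described in the question, while by_pair contains only the requested combinations.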

Alternative for GROUP BY and STUFF in SQL

I am writing some SQL queries in AWS Athena. I have 3 tables search, retrieval and intent. In search table I have 2 columns id and term i.e.
id term
1 abc
1 bcd
2 def
1 ghd
What I want is to write a query to get:
id term
1 abc, bcd, ghd
2 def
I know this can be done using STUFF and FOR XML PATH, but Athena does not yet support all SQL features. Is there any other way to achieve this? My current query is:
select search.id , STUFF(
(select ',' + search.term
from search
FOR XML PATH('')),1,1,'')
FROM search
group by search.id
Also, I have one more question. I have retrieval table that consist of 3 columns i.e.:
id time term
1 0 abc
1 20 bcd
1 100 gfh
2 40 hfg
2 60 lkf
What I want is:
id time term
1 100 gfh
2 60 lkf
I want to write a query to get the id and term on the basis of max value of time. Here is my current query:
select retrieval.id, max(retrieval.time), retrieval.term
from search
group by retrieval.id, retrieval.term
order by max(retrieval.time)
I am getting duplicate ids along with the term. I think that is because I am grouping by both id and term, but I am not sure how to achieve this without grouping by term.
The XML method is a workaround specific to SQL Server. There is no reason to attempt it in any other database.
One method uses arrays:
select s.id, array_agg(s.term)
from search s
group by s.id;
Because the database supports arrays, you should learn to use them. You can convert the array to a string:
select s.id, array_join(array_agg(s.term), ',') as terms
from search s
group by s.id;
GROUP BY is a grouping operation: it collapses rows into groups, so every selected column must either be a grouping column or be wrapped in an aggregate such as min, max or count.
I am answering only one question. Use it to find the answer to question 1.
For question 2, compute the maximum time per id in a subquery and join it back to the table to pick up the matching term:
select r.id, r.time, r.term
from retrieval r
join (select id, max(time) as max_time
from retrieval
group by id
) m
on r.id = m.id
and r.time = m.max_time
order by r.time
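The idea behind question 2 (for each id, keep the row with the largest time) can be sketched in plain Python. This is only an illustration of the logic, not Athena code; the function name latest_per_id is hypothetical and the rows are the sample data from the question.

```python
def latest_per_id(rows):
    """Return one (id, time, term) row per id: the one with the largest time.

    Hypothetical helper; on ties the first row encountered wins, whereas a
    SQL self-join on max(time) would return every tied row.
    """
    best = {}
    for id_, time, term in rows:
        if id_ not in best or time > best[id_][0]:
            best[id_] = (time, term)
    return [(id_, time, term) for id_, (time, term) in sorted(best.items())]


# Sample data from the retrieval table in the question
rows = [
    (1, 0, "abc"),
    (1, 20, "bcd"),
    (1, 100, "gfh"),
    (2, 40, "hfg"),
    (2, 60, "lkf"),
]
latest = latest_per_id(rows)
print(latest)
```

Grouping by id alone is what avoids the duplicate ids: term is not part of the grouping key, it is just carried along with the winning row.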

SQL SELECT group multiple rows together when LISTAGG and WM_CONCAT are not available [duplicate]

This question already has answers here:
SQL Query to concatenate column values from multiple rows in Oracle
(10 answers)
Closed 8 years ago.
I'm having trouble explaining this, so if someone can make adjustments to the title or question then please do.
I have a simple SQL query that I'm running
SELECT orders.customer_no, orders.order_no FROM orders WHERE orders.creation = '01-JAN-14';
resulting in
customer_no order_no
----------- ----------
0 8051729
2 2809137
2 3794827
3 1934678
3 9237192
6 3462890
6 3131378
6 6267190
6 2864952
6 1325645
but what I want is
customer_no order_no
----------- ----------
0 8051729
2 2809137 3794827
3 1934678 9237192
6 3462890 3131378 6267190 2864952 1325645
Is it possible to do something like this direct within SQL?
Edit: Using Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production.
I believe you want:
select orders.customer_no, listagg(orders.order_no, ' ') within group (order by orders.order_no) as order_nos
from orders
WHERE orders.creation = '01-JAN-14'
group by orders.customer_no;
In MySQL you would want the GROUP_CONCAT function, which is roughly LISTAGG in Oracle, according to:
Is there any function in oracle similar to group_concat in mysql?
Based on Oracle: Way to aggregate concatenate an ungrouped column in grouped results, you can try something like:
WITH j
AS (SELECT customer_no, order_no
FROM orders
WHERE creation = '01-JAN-14')
SELECT RTRIM (
EXTRACT (SYS_XMLAGG (XMLELEMENT ("X", order_no || ' ')), '/ROWSET/X/text()').getstringval (),
', ')
FROM j
GROUP BY customer_no;

Concatenate multiple result rows of one column into one, group by another column [duplicate]

This question already has answers here:
How to concatenate strings of a string field in a PostgreSQL 'group by' query?
(14 answers)
Closed 9 years ago.
I have a table like this:
Movie Actor
A 1
A 2
A 3
B 4
I want to get the name of a movie and all actors in that movie, and I want the result to be in a format like this:
Movie ActorList
A 1, 2, 3
How can I do it?
Simpler with the aggregate function string_agg() (Postgres 9.0 or later):
SELECT movie, string_agg(actor, ', ') AS actor_list
FROM tbl
GROUP BY 1;
The 1 in GROUP BY 1 is a positional reference and a shortcut for GROUP BY movie in this case.
string_agg() expects data type text as input. Other types need to be cast explicitly (actor::text) - unless an implicit cast to text is defined - which is the case for all other string types (varchar, character, name, ...) and some other types.
As isapir commented, you can add an ORDER BY clause in the aggregate call to get a sorted list - should you need that. Like:
SELECT movie, string_agg(actor, ', ' ORDER BY actor) AS actor_list
FROM tbl
GROUP BY 1;
But it's typically faster to sort rows in a subquery. See:
Create array in SELECT
You can use array_agg function for that:
SELECT "Movie",
array_to_string(array_agg(distinct "Actor"),',') AS Actor
FROM Table1
GROUP BY "Movie";
Result:
MOVIE ACTOR
A 1,2,3
B 4
See this SQLFiddle
For more, see 9.18. Aggregate Functions in the PostgreSQL manual.