merging multiple rows into one based on id - sql

i have the data in this format in an amazon redshift database:
id
answer
1
house
1
apple
1
moon
1
money
2
123
2
xyz
2
abc
and what i am looking for would be:
id
answer
1
house, apple, moon, money
2
123, xyz, abc
any idea? the thing is that i cannot hard code the answers as they will be variable, so preferably a solution that would simply scoop the answers for each id's row and put them together separated by a delimiter.

you can use aggregate function listagg:
select id , listagg(answer,',')
from table
group by id

You can use string_agg(concat(answer,''),',') with group by so it will be like that:
select id , string_agg(concat(answer,''),',') as answer
from table
group by id
tested here
Edit:
you don't need concatenate, you can just use string_agg(answer,',')

Related

Alternative for GROUP BY and STUFF in SQL

I am writing some SQL queries in AWS Athena. I have 3 tables search, retrieval and intent. In search table I have 2 columns id and term i.e.
id term
1 abc
1 bcd
2 def
1 ghd
What I want is to write a query to get:
id term
1 abc, bcd, ghd
2 def
I know this can be done using STUFF and FOR XML PATH but, in Athena all the features of SQL are yet not supported. Is there any other way to achieve this. My current query is:
select search.id , STUFF(
(select ',' + search.term
from search
FOR XML PATH('')),1,1,'')
FROM search
group by search.id
Also, I have one more question. I have retrieval table that consist of 3 columns i.e.:
id time term
1 0 abc
1 20 bcd
1 100 gfh
2 40 hfg
2 60 lkf
What I want is:
id time term
1 100 gfh
2 60 lkf
I want to write a query to get the id and term on the basis of max value of time. Here is my current query:
select retrieval.id, max(retrieval.time), retrieval.term
from search
group by retrieval.id, retrieval.term
order by max(retrieval.time)
I am getting duplicate id's along with the term. I think it is because, I am doing group by on id and term both. But, I am not sure how can I achieve it without using group by.
The XML method is brokenness in SQL Server. No reason to attempt it in any other database.
One method uses arrays:
select s.id, array_agg(s.term)
from search s
group by s.id;
Because the database supports arrays, you should learn to use them. You can convert the array to a string:
select s.id, array_join(array_agg(s.term), ',') as terms
from search s
group by s.id;
Group by is a group operation: think that you are clubbing the results and have to find min, max, count etc.
I am answering only one question. Use it to find the answer to question 1
For question 2:
select
from (select id, max(time) as time
from search
group by id, term
order by max(time)
) search_1, search as search_2
where search_1.id = search_2.id
and search_1.time = search_2.time

Select query to fetch required data from SQL table

I have some data like this as shown below:
Acc_Id || Row_No
1 1
2 1
2 2
2 3
3 1
3 2
3 3
3 4
and I need a query to get the results as shown below:
Acc_Id || Row_No
1 1
2 3
3 4
Please consider that I'm a beginner in SQL.
I assume you want the Count of the row
SELECT Acc_Id, COUNT(*)
FROM Table
GROUP BY Acc_Id
Try this:
select Acc_Id, MAX(Row_No)
from table
group by Acc_Id
As a beginner then this is your first exposure to aggregation and grouping. You may want to look at the documentation on group by now that this problem has motivated your interest in a solutions. Grouping operates by looking at rows with common column values, that you specify, and collapsing them into a single row which represents the group. In your case values in Acc_Id are the names for your groups.
The other answers are both correct in the the final two columns are going to be equivalent with your data.
select Acc_Id, count(*), max(Row_No)
from T
group by Acc_Id;
If you have gaps in the numbering then they won't be the same. You'll have to decide whether you're actually looking for a count of rows of a maximum of a value within a column. At this point you can also consider a number of other aggregate functions that will be useful to you in the future. (Note that the actual values here are pretty much meaningless in this context.)
select Acc_Id, min(Row_No), sum(Row_No), avg(Row_No)
from T
group by Acc_Id;

How to get the biggest column value between duplicated rows id?

I am working on an Oracle 11g database query that needs to retrieve a list of the highest NUM value between duplicated rows in a table.
Here is an example of my context:
ID | NUM
------------
1 | 1111
1 | 2222
2 | 3333
2 | 4444
3 | 5555
3 | 6666
And here is the result I am expecting after the query is executed:
NUM
----
2222
4444
6666
I know how to get the GREATEST value in a list of numbers, but I have absolutely no guess on how to group two lines, fetch the biggest column value between them IF they have the same ID.
Programmaticaly it is something quite easy to achieve, but using SQL it tends to be a litle bit less intuitive for me. Any suggestion or advise is welcomed as I don't even know which function could help me doing this in Oracle.
Thank you !
This is the typical use case for a GROUP BY. Assuming your Num field can be compared:
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
If you don't want to select the ID (as in the output you provided), you can run
SELECT Max
FROM (
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
) results
And here is the SQL fiddle
Edit : if NUM is, as you mentioned later, VARCHAR2, then you have to cast it to an Int. See this question.
The most efficient way I would suggest is
SELECT ids,
value
FROM (SELECT ids,
value,
max(value)
over (
PARTITION BY ids) max_value
FROM test)
WHERE value = max_value;
This requires that the query maintain a single value per id of the maximum value encountered so far. If a new maximum is found then the existing value is modified, otherwise the new value is discarded. The total number of elements that have to be held in memory is related to the number of ids, not the number of rows scanned.
See this SQLFIDDLE

Count particular substring text within column

I have a Hive table, titled 'UK.Choices' with a column, titled 'Fruit', with each row as follows:
AppleBananaAppleOrangeOrangePears
BananaKiwiPlumAppleAppleOrange
KiwiKiwiOrangeGrapesAppleKiwi
etc.
etc.
There are 2.5M rows and the rows are much longer than the above.
I want to count the number of instances that the word 'Apple' appears.
For example above, it is:
Number of 'Apple'= 5
My sql so far is:
select 'Fruit' from UK.Choices
Then in chunks of 300,000 I copy and paste into Excel, where I'm more proficient and able to do this using formulas. Problem is, it takes upto an hour and a half to generate each chunk of 300,000 rows.
Anyone know a quicker way to do this bypassing Excel? I can do simple things like counts using where clauses, but something like the above is a little beyond me right now. Please help.
Thank you.
I think I am 2 years too late. But since I was looking for the same answer and I finally managed to solve it, I thought it was a good idea to post it here.
Here is how I do it.
Solution 1:
+-----------------------------------+---------------------------+-------------+-------------+
| Fruits | Transform 1 | Transform 2 | Final Count |
+-----------------------------------+---------------------------+-------------+-------------+
| AppleBananaAppleOrangeOrangePears | #Banana#OrangeOrangePears | ## | 2 |
| BananaKiwiPlumAppleAppleOrange | BananaKiwiPlum##Orange | ## | 2 |
| KiwiKiwiOrangeGrapesAppleKiwi | KiwiKiwiOrangeGrapes#Kiwi | # | 1 |
+-----------------------------------+---------------------------+-------------+-------------+
Here is the code for it:
SELECT length(regexp_replace(regexp_replace(fruits, "Apple", "#"), "[A-Za-z]", "")) as number_of_apples
FROM fruits;
You may have numbers or other special characters in your fruits column and you can just modify the second regexp to incorporate that. Just remember that in hive to escape a character you may need to use \\ instead of just one \.
Solution 2:
SELECT size(split(fruits,"Apple"))-1 as number_of_apples
FROM fruits;
This just first split the string using "Apple" as a separator and makes an array. The size function just tells the size of that array. Note that the size of the array is one more than the number of separators.
This is straight-forward if you have any delimiter ( eg: comma ) between the fruit names. The idea is to split the column into an array, and explode the array into multiple rows using the 'explode' function.
SELECT fruit, count(1) as count FROM
( SELECT
explode(split(Fruit, ',')) as fruit
FROM UK.Choices ) X
GROUP BY fruit
From your example, it looks like fruits are delimited by Capital letters. One idea is to split the column based on capital letters, assuming there are no fruits with same suffix.
SELECT fruit_suffix, count(1) as count FROM
( SELECT
explode(split(Fruit, '[A-Z]')) as fruit_suffix
FROM UK.Choices ) X
WHERE fruit_suffix <> ''
GROUP BY fruit_suffix
The downside is that, the output will not have first letter of the fruit,
pple - 5
range - 4
I think you want to run in one select, and use the Hive if UDF to sum for the different cases. Something like the following...
select sum( if( fruit like '%Apple%' , 1, 0 ) ) as apple_count,
sum( if( fruit like '%Orange%', 1, 0 ) ) as orange_count
from UK.Choices
where ID > start and ID < end;
instead of a join in the above query.
No experience of Hive, I'm afraid, so this may or may not work. But on SQLServer, Oracle etc I'd do something like this:
Assuming that you have an int PK called ID on the row, something along the lines of:
select AppleCount, OrangeCount, AppleCount - OrangeCount score
from
(
select count(*) as AppleCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Apple%'
) a,
(
select count(*) as OrangeCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Orange%'
) o
I'd leave the division by the total count to the end, when you have all the rows in the spreadsheet and can count them there.
However, I'd urgently ask my boss to let me change the Fruit field to be a table with an FK to Choices and one fruit name per row. Unless this is something you can't do in Hive, this design is something that makes kittens cry.
PS I'd missed that you wanted the count of occurances of Apple which this won't do. I'm leaving my answer up, because I reckon that my However... para is actually a good answer. :(

Conditioning on multiple rows in a column in Teradata

Suppose I have a table that looks like this:
id attribute
1 football
1 NFL
1 ball
2 football
2 autograph
2 nfl
2 blah
2 NFL
I would like to get a list of distinct ids where the attribute column contains the terms "football", "NFL", and "ball". So 1 would be included, but 2 would not. What's the most elegant/efficient way to do this in Terdata?
The number of attributes can vary for each id, and terms can repeat. For example, NFL appears twice for id 2.
You can use the following:
select id
from yourtable
where attribute in ('football', 'NFL', 'ball')
group by id
having count(distinct attribute) = 3
See SQL Fiddle with Demo (fiddle is showing MySQL, but this should work in TeraData)