Building a string based on columns from a GROUP - sql

I have a table like this:
user 1 A
user 1 B
user 2 H
user 2 G
user 2 A
and I need a result like:
user 1 AB
user 2 HGA
Is there a way to obtain a result like this?

So here we create some test data
CREATE TABLE foo AS
SELECT * FROM (
VALUES (1,'A'),(1,'B'),(2,'H'),(2,'G'),(2,'A')
) AS f(id,col);
This should work,
SELECT id, array_to_string(array_agg(col), '')
FROM table
GROUP BY id;
Here is what we're doing,
GROUP BY id.
Build a PostgreSQL text[] (text array) of that column with array_agg
Convert the array back to text by joining on an empty string '' with array_to_string.
You can also use string_agg,
SELECT id, string_agg(col, '')
FROM foo
GROUP BY id;

The better soluction is using str_sum agregate function
select
user,
str_sum(column_name,'')
from table_name
group by user;

Related

Use a CASE expression without typing matched conditions manually using PostgreSQL

I have a long and wide list, the following table is just an example. Table structure might look a bit horrible using SQL, but I was wondering whether there's a way to extract IDs' price using CASE expression without typing column names in order to match in the expression
IDs
A_Price
B_Price
C_Price
...
A
23
...
B
65
82
...
C
...
A
10
...
..
...
...
...
...
Table I want to achieve:
IDs
price
A
23;10
B
65
C
82
..
...
I tried:
SELECT IDs, string_agg(CASE IDs WHEN 'A' THEN A_Price
WHEN 'B' THEN B_Price
WHEN 'C' THEN C_Price
end::text, ';') as price
FROM table
GROUP BY IDs
ORDER BY IDs
To avoid typing A, B, A_Price, B_Price etc, I tried to format their names and call them from a subquery, but it seems that SQL cannot recognise them as columns and cannot call the corresponding values.
WITH CTE AS (
SELECT IDs, IDs||'_Price' as t FROM ID_list
)
SELECT IDs, string_agg(CASE IDs WHEN CTE.IDs THEN CTE.t
end::text, ';') as price
FROM table
LEFT JOIN CTE cte.IDs=table.IDs
GROUP BY IDs
ORDER BY IDs
You can use a document type like json or hstore as stepping stone:
Basic query:
SELECT t.ids
, to_json(t.*) ->> (t.ids || '_price') AS price
FROM tbl t;
to_json() converts the whole row to a JSON object, which you can then pick a (dynamically concatenated) key from.
Your aggregation:
SELECT t.ids
, string_agg(to_json(t.*) ->> (t.ids || '_price'), ';') AS prices
FROM tbl t
GROUP BY 1
ORDER BY 1;
Converting the whole (big?) row adds some overhead, but you have to read the whole table for your query anyway.
A union would be one approach here:
SELECT IDs, A_Price FROM yourTable WHERE A_Price IS NOT NULL
UNION ALL
SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
UNION ALL
SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL;

Create view with JSON array based on many to many table

I have a typical table with users. I have also a many to many table where first column is UserId and second BusinessId. I want to create a view with users where their businessId will be as json.
I tried something like this:
SELECT
ISNULL(CAST(u.[Id] AS INT), 0) AS [Id]
,(SELECT BusinessId FROM [TableA].[dbo].[user_business_entities] WHERE UserId = u.Id FOR JSON AUTO) AS BusinessEntityIds
FROM
[TableA].[dbo].[core_users] u
But in view I get this:
Id
BusinessEntityIds
1
[{"BusinessId":1925},{"BusinessId":1926}]
2
[{"BusinessId":15}]
It's pretty good, but it would be best if json had only values, no key name i.e only ids without "BusinessId":
Id
BusinessEntityIds
1
[1925, 1926]
2
[15]
How can I do this?
Two quick options: First is for <=2016 and the 2nd is 2017+
Never understood why MS never provided this functionality of a simple ARRAY.
Option 1 <=2016
Select ID
,BusinessEntityIds = '['+stuff((Select concat(',',BusinessEntityIds)
From YourTable
Where ID=A.ID
For XML Path ('')),1,1,'')+']'
From YourTable A
Group By ID
Option 2 2017+
select ID,
BusinessEntityIds = '['+string_agg(BusinessEntityIds, ',') +']'
from YourTable
group by ID
Both Results Are
ID BusinessEntityIds
1 [1925,1926]
2 [15]

Hive Explode the Array of Struct key: value:

This is the below Hive Table
CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable
(
USER_ID string,
DETAIL_DATA array<struct<key:string,value:string>>
)
And this is the data in the above table-
11111 [{"key":"client_status","value":"ACTIVE"},{"key":"name","value":"Jane Doe"}]
Is there any way I can get the below output using HiveQL?
**client_status** | **name**
-------------------+----------------
ACTIVE Jane Doe
I tried use explode() but I get result like that:
SELECT details
FROM sample_table
lateral view explode(DETAIL_DATA) exploded_table as details;
**details**
-------------------------------------------+
{"key":"client_status","value":"ACTIVE"}
------------------------------------------+
{"key":"name","value":"Jane Doe"}
Use laterral view [outer] inline to get struct elements already etracted and use conditional aggregation to get values corresponting to some keys grouped in single row, use group_by user_id.
Demo:
with sample_table as (--This is your data example
select '11111' USER_ID,
array(named_struct('key','client_status','value','ACTIVE'),named_struct('key','name','value','Jane Doe')) DETAIL_DATA
)
SELECT max(case when e.key='name' then e.value end) as name,
max(case when e.key='client_status' then e.value end) as status
FROM sample_table
lateral view inline(DETAIL_DATA) e as key, value
group by USER_ID
Result:
name status
------------------------
Jane Doe ACTIVE
If you can guarantee the order of structs in array (one with status comes first always), you can address nested elements dirctly
SELECT detail_data[0].value as client_status,
detail_data[1].value as name
from sample_table
One more approach, if you do not know the order in array, but array is of size=2, CASE expressions without explode will give better performance:
SELECT case when DETAIL_DATA[0].key='name' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as name,
case when DETAIL_DATA[0].key='client_status' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as status
FROM sample_table

Loop Through a Table to concatenate Rows

I have a table of similar structure:
Name Movies_Watched
A Terminator
B Alien
A Batman
B Rambo
B Die Hard
....
I am trying to get this:
Name Movies_Watched
A Terminator;Batman
B Alien, Die Hard, Rambo
My initial guess was:
SELECT Name, Movies_Watched || Movies_Watched from TABLE
But obviously that's wrong. Can someone tell me how can I loop through the 2nd column and concatenate them? What's the logic like?
Got to know that group_concat is the right approach. But haven't been able to figure it out yet. When I've tried:
select name, group_concat(movies_watched) from table group by 1
But it throws an error saying User-defined transform function group_concat must have an over clause
You are looking for string_agg():
select name, string_agg(movie_watched, ';') as movies_watched
from t
group by name;
That said, you are using Postgres, so you should learn how to use arrays instead of strings for such things. For instance, there is no confusion with arrays when the movie name has a semicolon. That would be:
select name, array_agg(movie_watched) as movies_watched
from t
group by name;
use array_agg
SELECT Name, array_agg(Movies_Watched)
FROM data_table
GROUP BY Name
i think you need listagg or group_concat as you are using vertica upper is postgrey solution
SELECT Name, listagg(Movies_Watched)
FROM data_table
GROUP BY Name
or
select Name,
group_concat(Movies_Watched) over (partition by Name order by name) ag
from mytable
As already mentioned, in Vertica it's LISTAGG():
WITH
input(nm,movies_watched) AS (
SELECT 'A','Terminator'
UNION ALL SELECT 'B','Alien'
UNION ALL SELECT 'A','Batman'
UNION ALL SELECT 'B','Rambo'
UNION ALL SELECT 'B','Die Hard'
)
SELECT
nm AS "Name"
, LISTAGG(movies_watched) AS movies_watched
FROM input
GROUP BY nm;
-- out Name | movies_watched
-- out ------+----------------------
-- out A | Terminator,Batman
-- out B | Alien,Rambo,Die Hard
-- out (2 rows)
-- out
-- out Time: First fetch (2 rows): 12.735 ms. All rows formatted: 12.776 ms

SQL get rows matching ALL conditions

I would like to retrieve all rows matching a set of conditions on the same column. But I would like the rows only if ALL the conditions are good, and no row if only one condition fails.
For example, taking this table:
|id|name|
---------
|1 |toto|
|2 |tata|
I would like to be able to request if "tata" && "toto" are in this table. But when asking if "tata" and "tuto" are in, I would like an empty response if one of argument is in not in the table, for example asking if "toto" && "tutu" are included in the table.
How can I do that ?
Currently, I'am doing one query per argument, which is not very efficient. I tried several solutions including a subselect or a group+having, but no one is working like I want.
thanks for your support !
cheers
This isn't the most efficient way, but this query would work.
SELECT * FROM table_name
WHERE (name = 'toto' OR name = 'tata')
AND ( SELECT COUNT(*) FROM table_name WHERE name = 'toto') > 0
AND ( SELECT COUNT(*) FROM table_name WHERE name = 'tata') > 0
This is a little vague. If the names are unique, you could count the matching rows that match a where clause:
where name='toto' or name='tata'
If the count is 2, then you know both matched. If name is not unique you could potentially select the first ID (select top 1 id ...) that matches each in a union and count those with an outer select.
Even if you had an arbitrary number of names to match, you could create a stored procedure or code in whatever top-level language you are using to build the select statement.
SELECT 1 AS found FROM hehe
WHERE 1 IN (SELECT 1 FROM hehe WHERE name='tata')
AND 1 IN (SELECT 1 FROM hehe WHERE name='toto')
If name is unique you can simplify to:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND (SELECT count(*) FROM tbl WHERE name IN ('toto', 'tata')) > 1;
If it isn't:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND EXISTS (SELECT * FROM tbl WHERE name = 'toto')
AND EXISTS (SELECT * FROM tbl WHERE name = 'tata');
Or, in PostgreSQL, MySQL and possibly others:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND (SELECT count(DISTINCT name) FROM tbl WHERE name IN ('toto', 'tata')) > 1;