Dynamic transpose for unknown row value into column name on postgres - sql

I have a table like this:
customer_number | label   | value
----------------+---------+----------------
1               | address | St. John 1A
1               | phone   | 111111111
1               | email   | john@cena.com
2               | address | St. Marry 231A
2               | phone   | 222222222
2               | email   | please@marry.me
I want a new table or view so that it becomes:
customer_number | address        | phone     | email
----------------+----------------+-----------+----------------
1               | St. John 1A    | 111111111 | john@cena.com
2               | St. Marry 231A | 222222222 | please@marry.me
In the future, however, different labels may be added; for example, there might be a new label called occupation.
Important to note: I don't know the values of the label column in advance, so the solution should iterate over whatever values that column contains.
Is there any way to do this?

You can't have a "dynamic" pivot as the number, names and data types of all columns of a query must be known to the database before the query is actually executed (i.e. at parse time).
I find aggregating stuff into a JSON easier to deal with.
select customer_number,
jsonb_object_agg(label, value) as props
from the_table
group by customer_number
If your frontend can deal with JSON values directly, you can stop here.
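To make the shape of that result concrete, here is a minimal Python sketch of the same grouping done on the client side. It is not PostgreSQL code: the function name `aggregate_props` and the hard-coded rows are assumptions for illustration, meant to mirror one object of label/value pairs per customer.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the_table: (customer_number, label, value).
ROWS = [
    (1, "address", "St. John 1A"),
    (1, "phone", "111111111"),
    (2, "address", "St. Marry 231A"),
    (2, "phone", "222222222"),
]


def aggregate_props(rows):
    """Collect label/value pairs into one dict per customer_number."""
    props = defaultdict(dict)
    for customer_number, label, value in rows:
        # If a label repeats for a customer, the later value overwrites the earlier one.
        props[customer_number][label] = value
    return dict(props)


if __name__ == "__main__":
    for customer, props in aggregate_props(ROWS).items():
        print(customer, props)
```

A new label such as occupation would simply show up as another key, with no change to the code.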
If you really need a view with one column per attribute, you can extract them from the JSON value:
select customer_number,
props ->> 'address' as address,
props ->> 'phone' as phone,
props ->> 'email' as email
from (
select customer_number,
jsonb_object_agg(label, value) as props
from the_table
group by customer_number
) t
I find this a bit easier to manage when new attributes are added.
If you need a view with all labels, you can create a stored procedure to dynamically create it. If the number of different labels doesn't change too often, this might be a solution:
create procedure create_customer_view()
as
$$
declare
l_sql text;
l_columns text;
begin
select string_agg(distinct format('(props ->> %L) as %I', label, label), ', ')
into l_columns
from the_table;
l_sql :=
'create view customer_properties as
select customer_number, '||l_columns||'
from (
select customer_number, jsonb_object_agg(label, value) as props
from the_table
group by customer_number
) t';
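-- drop any earlier version first so the procedure can be re-run when labels change
execute 'drop view if exists customer_properties';
execute l_sql;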
end;
$$
language plpgsql;
Then create the view using:
call create_customer_view();
And in your code just use:
select *
from customer_properties;
You can schedule that procedure to run at regular intervals (e.g. through a cron job on Linux).
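The same two-step idea (first read the distinct labels, then generate the column list) can also be sketched from application code. The example below is only an illustration under stated assumptions: it runs against an in-memory SQLite database rather than PostgreSQL, uses conditional aggregation instead of JSON, and the helper names `quote_ident`, `pivot_customers` and `make_sample_db` as well as the `occupation` row are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_customers(conn):
    """Step 1: read the distinct labels. Step 2: build and run the pivot query."""
    labels = [row[0] for row in conn.execute(
        "SELECT DISTINCT label FROM the_table ORDER BY label")]
    # Label values are passed as bind parameters; only the aliases are interpolated.
    select_list = ["customer_number"] + [
        f"MAX(CASE WHEN label = ? THEN value END) AS {quote_ident(label)}"
        for label in labels
    ]
    sql = (f"SELECT {', '.join(select_list)} FROM the_table "
           "GROUP BY customer_number ORDER BY customer_number")
    cur = conn.execute(sql, labels)
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def make_sample_db():
    """Build an in-memory table shaped like the question's data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table (customer_number INTEGER, label TEXT, value TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
        (1, "address", "St. John 1A"),
        (1, "phone", "111111111"),
        (2, "address", "St. Marry 231A"),
        (2, "phone", "222222222"),
        (2, "occupation", "engineer"),  # hypothetical new label
    ])
    return conn


if __name__ == "__main__":
    header, rows = pivot_customers(make_sample_db())
    print(header)
    for row in rows:
        print(row)
```

Because the column list is rebuilt on every call, a newly added label appears as a new column without editing the query; customers lacking that label get NULL (None in Python).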

Generally speaking, SQL is not good at pivoting dynamically.
Here is a query that will pivot the data for you. However, it is not dynamic, i.e. if a future occupation label were added, you would have to change the query. Not sure whether that is acceptable or not:
select customer_number,
max(value) filter (where label='address') as address,
max(value) filter (where label='phone') as phone,
max(value) filter (where label='email') as email
from your_customer_table
group by customer_number
This assumes you are running Postgres 9.4 or later, so that the FILTER clause is supported. If not, it can be reworked using CASE expressions:
select customer_number,
max(case when label='address' then value else null end) as address,
max(case when label='phone' then value else null end) as phone,
max(case when label='email' then value else null end) as email
from your_customer_table
group by customer_number

I used cross apply to solve this problem. Note that this is SQL Server syntax; PostgreSQL offers lateral joins instead of cross apply.
Here is my query:
select distinct tb9.customer_number, tb9_2.*
from Table_9 tb9 cross apply
(select max(case when tb9_2.[label] like '%address%' then [value] end) as [address],
max(case when tb9_2.[label] like '%phone%' then [value] end) as [phone],
max(case when tb9_2.[label] like '%email%' then [value] end) as [email]
from Table_9 tb9_2
where tb9.customer_number = tb9_2.customer_number
) tb9_2;

Related

Use a CASE expression without typing matched conditions manually using PostgreSQL

I have a long and wide list, the following table is just an example. Table structure might look a bit horrible using SQL, but I was wondering whether there's a way to extract IDs' price using CASE expression without typing column names in order to match in the expression
IDs
A_Price
B_Price
C_Price
...
A
23
...
B
65
82
...
C
...
A
10
...
..
...
...
...
...
Table I want to achieve:
IDs | price
----+------
A   | 23;10
B   | 65
C   | 82
..  | ...
I tried:
SELECT IDs, string_agg(CASE IDs WHEN 'A' THEN A_Price
WHEN 'B' THEN B_Price
WHEN 'C' THEN C_Price
end::text, ';') as price
FROM table
GROUP BY IDs
ORDER BY IDs
To avoid typing A, B, A_Price, B_Price etc, I tried to format their names and call them from a subquery, but it seems that SQL cannot recognise them as columns and cannot call the corresponding values.
WITH CTE AS (
SELECT IDs, IDs||'_Price' as t FROM ID_list
)
SELECT IDs, string_agg(CASE IDs WHEN CTE.IDs THEN CTE.t
end::text, ';') as price
FROM table
LEFT JOIN CTE cte.IDs=table.IDs
GROUP BY IDs
ORDER BY IDs
You can use a document type like json or hstore as a stepping stone:
Basic query:
SELECT t.ids
, to_json(t.*) ->> (t.ids || '_price') AS price
FROM tbl t;
to_json() converts the whole row to a JSON object, which you can then pick a (dynamically concatenated) key from.
Your aggregation:
SELECT t.ids
, string_agg(to_json(t.*) ->> (t.ids || '_price'), ';') AS prices
FROM tbl t
GROUP BY 1
ORDER BY 1;
Converting the whole (big?) row adds some overhead, but you have to read the whole table for your query anyway.
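The "row as a document, then look up a dynamically built key" idea can be illustrated outside the database as well. This is a minimal Python sketch, not the answer's SQL: the row layout (one non-empty price per row), the lower-case key names and the function name `prices_by_id` are assumptions made for the example.

```python
# Hypothetical rows shaped like the question's wide table; None marks an empty cell.
ROWS = [
    {"ids": "A", "a_price": 23, "b_price": None, "c_price": None},
    {"ids": "B", "a_price": None, "b_price": 65, "c_price": None},
    {"ids": "C", "a_price": None, "b_price": None, "c_price": 82},
    {"ids": "A", "a_price": 10, "b_price": None, "c_price": None},
]


def prices_by_id(rows):
    """For each row, pick the column named '<id>_price' and join the prices per ID."""
    grouped = {}
    for row in rows:
        # Build the column name from the row's own ID (keys assumed lower-case).
        key = row["ids"].lower() + "_price"
        price = row.get(key)
        if price is not None:  # skip empty cells
            grouped.setdefault(row["ids"], []).append(str(price))
    return {ids: ";".join(prices) for ids, prices in sorted(grouped.items())}


if __name__ == "__main__":
    print(prices_by_id(ROWS))
```

Note that the key must match the stored column name exactly, including letter case, so it is worth checking how the columns are actually named before relying on the concatenated key.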
A union would be one approach here:
SELECT IDs, A_Price FROM yourTable WHERE A_Price IS NOT NULL
UNION ALL
SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
UNION ALL
SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL;

How can I write this using SQL?

I need to write SQL that puts "del_row" in the column "Adjustment_name" when an Id_number is duplicated (e.g. 234566), but only when one of its Phone_number values starts with "A" and another starts with "B". In that case, "del_row" should be written only in the row whose Phone_number starts with "B". If instead there are two duplicated Id_numbers where one Phone_number starts with "A" and the other starts with "C", I don't want to write anything.
Id_number | Phone_number | Adjustment_name
----------+--------------+----------------
234566    | A5258528564  |
675467    | A1147887422  |
675534    | P1554515315  |
234566    | B4141415882  | del_row
234566    | C5346656665  |
Many thanks!
One approach
SELECT t.id_number, t.Phone_number,
CASE WHEN a.id_number IS NOT NULL THEN 'del_row' ELSE '' END as Adjustment_name
FROM mytable t
LEFT JOIN
(SELECT id_number from mytable
WHERE SUBSTRING(Phone_number FROM 1 FOR 1)='A') a
/* List of IDs that have a phone number starting with A */
ON a.id_number = t.id_number
AND SUBSTRING(t.Phone_number FROM 1 FOR 1)='B'
/* Only check for matching ID with A if this number starts with B */
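The rule that join implements can be checked quickly against the sample data. The following Python sketch is only an illustration of the logic, not a replacement for the SQL; the function name `mark_rows` and the in-memory rows are assumptions.

```python
# Sample rows from the question: (Id_number, Phone_number).
ROWS = [
    ("234566", "A5258528564"),
    ("675467", "A1147887422"),
    ("675534", "P1554515315"),
    ("234566", "B4141415882"),
    ("234566", "C5346656665"),
]


def mark_rows(rows):
    """Flag a row with 'del_row' when its phone starts with B and the same ID
    also has a phone starting with A; leave every other row blank."""
    # IDs that have at least one phone number starting with A.
    ids_with_a = {id_number for id_number, phone in rows if phone.startswith("A")}
    return [
        (id_number, phone,
         "del_row" if phone.startswith("B") and id_number in ids_with_a else "")
        for id_number, phone in rows
    ]


if __name__ == "__main__":
    for row in mark_rows(ROWS):
        print(row)
```

Only the row with B4141415882 is flagged; the C row for the same ID stays blank, as the question requires.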
A rather crude approach would be as below
(assuming your phones rank Axxx, Bxxx, Cxxx, Dxxx). If your phone numbering logic is different, which is not very clear from your requirements, you can adjust accordingly.
create table temp_table_1 as (
select id_number, phone_number
, case
when dense_rank() over(partition by id_number order by phone_number)>1
and phone_number like 'B%'
then 'del_row'
end adjustment_name
from your_table_name
) with data;
drop table your_table_name;
rename table temp_table_1 to your_table_name;

Hive Explode the Array of Struct key: value:

This is the Hive table:
CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable
(
USER_ID string,
DETAIL_DATA array<struct<key:string,value:string>>
)
And this is the data in the above table-
11111 [{"key":"client_status","value":"ACTIVE"},{"key":"name","value":"Jane Doe"}]
Is there any way I can get the below output using HiveQL?
**client_status** | **name**
-------------------+----------------
ACTIVE Jane Doe
I tried to use explode(), but I get a result like this:
SELECT details
FROM sample_table
lateral view explode(DETAIL_DATA) exploded_table as details;
**details**
-------------------------------------------+
{"key":"client_status","value":"ACTIVE"}
------------------------------------------+
{"key":"name","value":"Jane Doe"}
Use lateral view [outer] inline to get the struct elements already extracted, and use conditional aggregation to collect the values corresponding to specific keys into a single row, grouping by user_id.
Demo:
with sample_table as (--This is your data example
select '11111' USER_ID,
array(named_struct('key','client_status','value','ACTIVE'),named_struct('key','name','value','Jane Doe')) DETAIL_DATA
)
SELECT max(case when e.key='name' then e.value end) as name,
max(case when e.key='client_status' then e.value end) as status
FROM sample_table
lateral view inline(DETAIL_DATA) e as key, value
group by USER_ID
Result:
name status
------------------------
Jane Doe ACTIVE
If you can guarantee the order of structs in the array (the one with status always comes first), you can address nested elements directly:
SELECT detail_data[0].value as client_status,
detail_data[1].value as name
from sample_table
One more approach: if you do not know the order in the array but the array always has size 2, CASE expressions without explode will give better performance:
SELECT case when DETAIL_DATA[0].key='name' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as name,
case when DETAIL_DATA[0].key='client_status' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as status
FROM sample_table

SQL group by into a pivot

I can do this on my table:
SELECT country, type, COUNT(*)
FROM table1
GROUP BY country, type
This query gives me
country type COUNT(*)
Canada first 22
Canada second 42
Canada third 15
Australia second 23
Australia third 18
but I need to get
country first second third
Canada 22 42 15
Australia 23 18 0
The complication is that the 3 columns here are just an example: I actually have about 20 different types and over 200 countries. I found something like this https://dba.stackexchange.com/questions/28406/group-by-two-columns?newreg=febbf51c648e4c17a2ebcb798bff1261, but the number of columns I'd end up with makes that approach infeasible.
Any thoughts?
You would use conditional aggregation:
select country,
sum(case when type = 'first' then 1 else 0 end) as cnt_first,
sum(case when type = 'second' then 1 else 0 end) as cnt_second,
. . .
from t
group by country;
You may find that generating the sum(case) expressions is more easily done in a spreadsheet than by typing them out (although copy and paste is really pretty simple for 20 rows).
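If the counting can happen in application code instead, the per-type columns do not need to be typed out at all. Here is a minimal Python sketch of that idea; the function name `pivot_counts` and the tiny sample rows are assumptions for illustration, not data from the question.

```python
from collections import Counter

# Hypothetical raw rows: one (country, type) pair per record.
ROWS = [
    ("Canada", "first"),
    ("Canada", "second"),
    ("Canada", "second"),
    ("Australia", "second"),
    ("Australia", "third"),
]


def pivot_counts(rows):
    """Count rows per (country, type) and lay them out with one column per type."""
    counts = Counter(rows)
    types = sorted({row_type for _, row_type in rows})
    countries = sorted({country for country, _ in rows})
    header = ["country"] + types
    # A Counter returns 0 for missing pairs, which fills the empty cells.
    table = [[country] + [counts[(country, t)] for t in types]
             for country in countries]
    return header, table


if __name__ == "__main__":
    header, table = pivot_counts(ROWS)
    print(header)
    for line in table:
        print(line)
```

The set of columns is derived from the data, so 20 types or 200 countries need no extra code.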
If you are using MySQL then you already know the answer from the link.
In SQL server you can do this.
SELECT country, [First],[Second],[Third]
FROM
(
SELECT country, [type],COUNT(*)cnt
FROM table1
GROUP BY country, [type]
) AS SourceTable
PIVOT
(
max(cnt)
FOR [type] IN ([First],[Second],[Third])
) AS PivotTable;
And please share your table structure with sample data for better understanding of your problem.
The pivoting syntax for a SELECT statement depends heavily on the DBMS being used. Presuming you are using MySQL, based on the link shared in the question, use
SELECT country, SUM(type='first') AS first,
SUM(type='second') AS second,
SUM(type='third') AS third
FROM table1
GROUP BY country
as a static query, and that could be converted to the following code block in order to make it dynamic
SET @sql = NULL;
SELECT GROUP_CONCAT(
CONCAT(
'SUM(type="',type,'" ) AS ',type
)
)
INTO @sql
FROM ( SELECT DISTINCT type FROM table1 ) AS t;
SET @sql = CONCAT('SELECT country,',@sql,' FROM table1 GROUP BY country');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

How to aggregate data in one column by values in another column using SQL

I have a table in PostgreSQL that contains demographic data for each province of my country.
Columns are: Province_name, professions, Number_of_people.
As you can see, Province_names are repeated for each profession.
How then can I get the province names not repeated and instead get the professions in separate columns?
It sounds like you want to pivot your table (Really: It is better to show data and expected output in your question!)
This is the PostgreSQL way (since 9.4) to do that using the FILTER clause
SELECT
province,
SUM(people) FILTER (WHERE profession = 'teacher') AS teacher,
SUM(people) FILTER (WHERE profession = 'banker') AS banker,
SUM(people) FILTER (WHERE profession = 'supervillian') AS supervillian
FROM mytable
GROUP BY province
If you want to go a more common way, you can use a CASE expression:
SELECT
province,
SUM(CASE WHEN profession = 'teacher' THEN people ELSE 0 END) AS teacher,
SUM(CASE WHEN profession = 'banker' THEN people ELSE 0 END) AS banker,
SUM(CASE WHEN profession = 'supervillian' THEN people ELSE 0 END) AS supervillian
FROM mytable
GROUP BY province
What you want to do is a pivot, which is a little more complicated in PostgreSQL than in other RDBMSs. You can use the crosstab function (provided by the tablefunc extension). Find an introduction here: https://www.vertabelo.com/blog/technical-articles/creating-pivot-tables-in-postgresql-using-the-crosstab-function
For you it would look something like this:
SELECT *
FROM crosstab( 'select Province_name, professions, Number_of_people from table1 order by 1,2')
AS final_result(Province_name TEXT, data_scientist NUMERIC,data_engineer NUMERIC,data_architect NUMERIC,student NUMERIC);