postgresql unnest and pivot int array column - sql

I have below table
create table test(id serial, key int,type text,words text[],numbers int[] );
insert into test(key,type,words) select 1,'Name',array['Table'];
insert into test(key,type,numbers) select 1,'product_id',array[2];
insert into test(key,type,numbers) select 1,'price',array[40];
insert into test(key,type,numbers) select 1,'Region',array[23,59];
insert into test(key,type,words) select 2,'Name',array['Table1'];
insert into test(key,type,numbers) select 2,'product_id',array[1];
insert into test(key,type,numbers) select 2,'price',array[34];
insert into test(key,type,numbers) select 2,'Region',array[23,59,61];
insert into test(key,type,words) select 3,'Name',array['Chair'];
insert into test(key,type,numbers) select 3,'product_id',array[5];
I was using the query below to pivot the table for users.
select key,
max(array_to_string(words,',')) filter(where type='Name') as "Name",
cast(max(array_to_string(numbers,',')) filter(where type='product_id') as int) as "product_id",
cast(max(array_to_string(numbers,',')) filter(where type='price') as int) as "price" ,
max(array_to_string(numbers,',')) filter(where type='Region') as "Region"
from test group by key
But I couldn't unnest the Region column during the pivot in order to join it with another table.
My expected output is below

Since we are using unnest("Region") to do the pivot, there must be a row with region data for each product.
Alternatively, the code below will do the trick by creating an array containing a single null.
unnest(CASE WHEN array_length("Region", 1) >= 1
THEN "Region"
ELSE '{null}'::int[] END)
Schema:
create table test(id serial, key int,type text,words text[],numbers int[] );
insert into test(key,type,words) select 1,'Name',array['Table'];
insert into test(key,type,numbers) select 1,'product_id',array[2];
insert into test(key,type,numbers) select 1,'price',array[40];
insert into test(key,type,numbers) select 1,'Region',array[23,59];
insert into test(key,type,words) select 2,'Name',array['Table1'];
insert into test(key,type,numbers) select 2,'product_id',array[1];
insert into test(key,type,numbers) select 2,'price',array[34];
insert into test(key,type,numbers) select 2,'Region',array[23,59,61];
insert into test(key,type,words) select 3,'Name',array['Chair'];
insert into test(key,type,numbers) select 3,'product_id',array[5];
select key,"Name",product_id,price,unnest(CASE WHEN array_length("Region", 1) >= 1
THEN "Region"
ELSE '{null}'::int[] END) from
(
select key,
max(array_to_string(words,',')) filter(where type='Name') as "Name",
cast(max(array_to_string(numbers,',')) filter(where type='product_id') as int) as "product_id",
cast(max(array_to_string(numbers,',')) filter(where type='price') as int) as "price" ,
max(numbers) filter(where type='Region') as "Region"
from test group by key
)t order by key
key  Name    product_id  price  unnest
1    Table   2           40     23
1    Table   2           40     59
2    Table1  1           34     23
2    Table1  1           34     59
2    Table1  1           34     61
3    Chair   5           null   null
db<>fiddle here

Very strange database design... I'm assuming you inherited it?
If none of the other array values will ever have a cardinality > 1, then you can simply unnest:
select
key,
(max (words) filter (where type = 'Name'))[1] as name,
(max (numbers) filter (where type = 'product_id'))[1] as product_id,
(max (numbers) filter (where type = 'price'))[1] as price,
unnest (max (numbers) filter (where type = 'Region')) as region
from test
group by key
If they can have multiple values, that can also be handled.
-- EDIT 3/15/2021 --
Short version: an unnest against a null won't produce a row, so if you coalesce the null value into an array of a single null element, that should take care of this part:
select
key,
(max (words) filter (where type = 'Name'))[1] as name,
(max (numbers) filter (where type = 'product_id'))[1] as product_id,
(max (numbers) filter (where type = 'price'))[1] as price,
unnest (coalesce (max (numbers) filter (where type = 'Region'), array[null]::integer[])) as region
from test
group by key
order by key
Now for the part you didn't ask... I and at least one other have been gently nudging you that your database design is going to cause multiple problems at every turn. The fact that it's in production doesn't mean you shouldn't fix it as soon as you can.
This design is what's known as EAV - Entity - Attribute - Value. It has its use cases, but like most good things it can also be applied when it shouldn't. The use case that comes to mind is if you want users to be able to dynamically add attributes to certain objects. Even then, there might be better/easier ways.
And as one example, if you have one million objects, five attributes means you have to store that as five million rows, and the majority of that space will be occupied with repeating the key and attribute names.
Just food for thought. We can continue to triage this with every new scenario you find, but it would be better to redo the design.
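For illustration only, here is a rough sketch of what a more conventional design for this particular sample data could look like (the table and column names below are my own invention, not anything from your schema). The pivoted, unnested result then becomes a plain join:
-- hypothetical normalized layout; table and column names are invented for this example
create table products (
    product_id int primary key,
    name       text not null,
    price      int
);
create table product_regions (
    product_id int references products(product_id),
    region     int not null,
    primary key (product_id, region)
);
insert into products values (2, 'Table', 40), (1, 'Table1', 34), (5, 'Chair', null);
insert into product_regions values (2, 23), (2, 59), (1, 23), (1, 59), (1, 61);
-- the pivot-and-unnest result from above is now just a left join
select p.product_id, p.name, p.price, r.region
from products p
left join product_regions r using (product_id)
order by p.product_id;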

Related

How to insert a column which sets unique id based on values in another column (SQL)?

I will create a table where I will insert multiple values for different companies. Basically I have all the values shown in the table below, but I want to add a column IndicatorID which is linked to IndicatorName so that every indicator has a unique id. This will obviously not be a primary key.
I will insert the data with multiple selects:
CREATE TABLE abc
INSERT INTO abc
SELECT company_id, 'roe', roevalue, metricdate
FROM TABLE1
INSERT INTO abc
SELECT company_id, 'd/e', devalue, metricdate
FROM TABLE1
So, I don't know how to add the IndicatorID I mentioned above.
EDIT:
Here is how I populate my new table:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT [the ID that I need],
       'NI_3y' as 'Indicator',
       t.Company,
       avg(t.ni) over (partition by t.Company order by t.reportdate rows between 2 preceding and current row) as 'ni_3y',
       t.reportdate
FROM table t
LEFT JOIN IndicatorIDs i
    ON i.Indicator = roe3 -- the part that is not working if I have a separate indicatorID table
I am going to insert different indicators for the same companies, and I want the indicatorID.
Your "indicator" is a proper entity in its own right. Create a table with all indicators:
create table indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then, use only the id in your other tables. You can look up the value in the reference table.
Your inserts are then a little more complicated:
INSERT INTO indicators (indicator)
SELECT DISTINCT v.indicator
FROM table1 t1 CROSS APPLY
     (VALUES ('roe'), ('d/e')) v(indicator)
WHERE NOT EXISTS (SELECT 1 FROM indicators i2 WHERE i2.indicator = v.indicator);
Then:
INSERT INTO ABC (indicatorId, companyid, value, date)
SELECT i.indicatorId, t1.company_id, v.value, t1.metricdate
FROM table1 t1 CROSS APPLY
(VALUES ('roe', t1.roevalue), ('d/e', t1.devalue)
) v(indicator, value) JOIN
indicators i
ON i.indicator = v.indicator;
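To see what that CROSS APPLY step does on its own, here is a sketch (assuming the table1 columns from the question: company_id, roevalue, devalue, metricdate) that unpivots each wide row into one narrow row per indicator before the join back to indicators:
-- Sketch only: the column names (company_id, roevalue, devalue, metricdate) are
-- taken from the question and may differ in the real table1.
SELECT t1.company_id, v.indicator, v.value, t1.metricdate
FROM table1 t1 CROSS APPLY
     (VALUES ('roe', t1.roevalue), ('d/e', t1.devalue)) v(indicator, value);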
This process is called normalization and it is the typical way to store data in a database.
DDL and INSERT statement to create an indicators table with a unique constraint on indicator. Because the ind_id is intended to be a foreign key in the abc table, it's created as a non-decomposable surrogate integer primary key using the IDENTITY property.
drop table if exists test_indicators;
go
create table test_indicators (
ind_id int identity(1, 1) primary key not null,
indicator varchar(20) unique not null);
go
insert into test_indicators(indicator) values
('NI'),
('ROE'),
('D/E');
The abc table depends on the ind_id column from the indicators table as a foreign key reference. To populate the abc table, company_ids are associated with ind_ids.
drop table if exists test_abc
go
create table test_abc(
a_id int identity(1, 1) primary key not null,
ind_id int not null references test_indicators(ind_id),
company_id int not null,
val varchar(20) null);
go
insert into test_abc(ind_id, company_id)
select ind_id, 102 from test_indicators where indicator='NI'
union all
select ind_id, 103 from test_indicators where indicator='ROE'
union all
select ind_id, 104 from test_indicators where indicator='D/E'
union all
select ind_id, 103 from test_indicators where indicator='NI'
union all
select ind_id, 105 from test_indicators where indicator='ROE'
union all
select ind_id, 102 from test_indicators where indicator='NI';
Query to get result
select i.ind_id, a.company_id, i.indicator, a.val
from test_abc a
join test_indicators i on a.ind_id=i.ind_id;
Output
ind_id company_id indicator val
1 102 NI NULL
2 103 ROE NULL
3 104 D/E NULL
1 103 NI NULL
2 105 ROE NULL
1 102 NI NULL
I was finally able to find the solution to my problem, which seems very simple to me, although it took time and asking different people about it.
First I create my indicators table where I assign primary key for all indicators I have:
CREATE TABLE indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then I populate it easily without using any JOINs or CROSS APPLY. I don't know if this is optimal, but it seems like the simplest choice:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT
(SELECT indicator_id from indicators i where i.indicator = 'NI_3y') as IndicatorID,
'NI_3y' as 'Indicator',
Company,
avg(ni) over (partition by Company order by reportdate rows between 2 preceding and current row) as ni_3y,
reportdate
FROM TABLE1

SQL 'GROUP BY' to filter an array of 'text' data type

I am new to SQL and I am trying to understand the GROUP BY statement.
I have inserted the following data in SQL:
CREATE TABLE table( id integer, type text);
INSERT INTO table VALUES (1,'start');
INSERT INTO table VALUES (2,'start');
INSERT INTO table VALUES (2,'complete');
INSERT INTO table VALUES (3,'complete');
INSERT INTO table VALUES (3,'start');
INSERT INTO table VALUES (4,'start');
I want to select those IDs that do not have a type 'complete'. For this example I should get IDs 1, 4.
I have tried multiple GROUP BY - HAVING combinations. My best approach is:
SELECT id from customers group by type having type!='complete';
but the resulted IDs are 4,3,2.
Could anyone give me a hint about what I am doing wrong?
You are close. The having clause needs an aggregation function and you need to aggregate by id:
select id
from table t
group by id
having sum(case when type = 'complete' then 1 else 0 end) = 0;
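If this is PostgreSQL (the text column type in the DDL suggests it), the same check can also be written with an aggregate FILTER clause. A minimal sketch, using events as a stand-in table name since the question's literal name table is a reserved word:
-- PostgreSQL 9.4+; "events" is a placeholder for your real table name
select id
from events
group by id
having count(*) filter (where type = 'complete') = 0;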
Normally, if you have something called an id, you would also have a table with that as primary key. If so, you can also do:
select it.id
from idtable it
where not exists (select 1
from table t
where t.type = 'complete' and it.id = t.id
);

SQL count one field two times in select with different parameters

I'd like to have my query count one column two times in my select, based on the value. For example:
input: table
id | type
-------------|-------------
1 | 1
2 | 1
3 | 2
4 | 2
5 | 2
output: query (in 1 row, not two):
countfirst = 2 (two times 1)
countsecond = 3 (three times 2)
A default count in a select counts all rows in the query. But I'd like to count rows based
on a value without limiting the query. When using, for example, WHERE type = '1', type 2
gets filtered out and cannot be counted anymore.
Is there a solution for this case in SQL?
--- EXAMPLE USE (the situation above is simplified but the case is the same) ---
With one query I get all cars grouped by type from a table. There are two types of signs: yellow (1 in the db) and grey (2 in the db). So in that query I have the following output:
Renault - ten times found - two yellow signs - eight grey signs
Create a table, script is given below.
CREATE TABLE [dbo].[temptbl](
[id] [int] NULL,
[type] [int] NULL
) ON [PRIMARY]
Execute the insert script as
insert into [temptbl] values(1,1)
insert into [temptbl] values(2,1)
insert into [temptbl] values(3,2)
insert into [temptbl] values(4,2)
insert into [temptbl] values(5,2)
Then execute the query.
;WITH cte as(
SELECT [type], Count([type]) cnt
FROM temptbl
GROUP BY [type]
)
SELECT * FROM cte
pivot (Sum([cnt]) for [type] in ([1],[2])) as AvgIncomePerDay
You can use the GROUP BY clause as Mureinik suggested, but with the addition of a WHERE clause to filter the results.
Below shows the results for type = 1 (assuming type is an INT):
SELECT type, COUNT(*) AS NoOfRecords
FROM table
WHERE type IN (1)
GROUP BY type
So if we wanted 1 and 2 we can use:
SELECT type, COUNT(*) AS NoOfRecords
FROM table
WHERE type IN (1, 2)
GROUP BY type
Lastly, that IN statement can pull type from another query:
SELECT type, COUNT(*) AS NoOfRecords
FROM table
WHERE type IN (SELECT type FROM someOtherTable)
GROUP BY type
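Since the desired output is a single row (countfirst and countsecond) rather than one row per type, conditional aggregation is another option. A minimal sketch against the temptbl table created in the earlier answer:
-- COUNT ignores NULLs, so each CASE only counts rows of the matching type
SELECT
    COUNT(CASE WHEN [type] = 1 THEN 1 END) AS countfirst,
    COUNT(CASE WHEN [type] = 2 THEN 1 END) AS countsecond
FROM temptbl;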

A thought experiment in SQL

I want to show the number of times each distinct element of a column in a SQL table appears, alongside that distinct element, in a new output table. Is this possible in a single statement, rather than ramming my head against it manually?
Without having actually tried, how about this:
SELECT tmp.Field, (SELECT COUNT(*) FROM [Table] t WHERE t.DesiredField = tmp.Field) AS Count
FROM
(
SELECT DISTINCT DesiredField FROM [Table]
) tmp
This would first select all distinct values from [Table] and in the outer select, take the values and the number of times they appear in the column.
You could also try
SELECT Field, SUM(1) AS Count FROM Table
GROUP BY Field
This should "flatten" the table so that it only contains distinct values in Field and the number of rows where Field has the same value.
I just tried the second - it seems to work nicely.
Turns out I was wrong all the time. The second example and the following actually return the same results:
SELECT Field, COUNT(*) AS Count FROM Table
GROUP BY Field
Simplest is just to use COUNT(). You'll see variations in what is passed as the count parameter, so here are the options.
DECLARE @tbl TABLE(id INT, data INT)
INSERT INTO @tbl VALUES (1,1),(2,1),(3,2),(4,NULL)
SELECT data
,COUNT(*) Count_star
,COUNT(id) Count_id
,COUNT(data) Count_data
,COUNT(1) Count_literal
FROM @tbl
GROUP BY data
data Count_star Count_id Count_data Count_literal
----------- ----------- ----------- ----------- -------------
NULL 1 1 0 1
1 2 2 2 2
2 1 1 1 1
Warning: Null value is eliminated by an aggregate or other SET operation.
You'll see the difference in how NULL is treated if you COUNT a field that contains NULLs.

sql multiple insert on one table, while looping/iterating over another table?

I have two tables, "TempStore" and "Store", with the same column called "Items".
There is data in "TempStore" table which I need to move over to "Store" table which requires few modifications.
I need to iterate over "TempStore" data (i.e. items) and insert into Store...
More specifically: how can I iterate over TempStore (in SQL) where, for each item in 'TempStore', I need to store 2 or maybe 3 items in 'Store' with little modification?
What I want to do is take each row from "[SELECT * FROM TempStore]" and insert three records into "Store", while being able to change "items".
try INSERT-SELECT:
INSERT INTO Store
(col1, col2, col3...)
SELECT
col1, col2, col3...
FROM TempStore
WHERE ...
Just make the SELECT return one row for every insert, and produce the values in the columns that you need. You might need CASE and a join to another table to make the extra rows.
EDIT based on comments, OP wanted to see the numbers table in action
Let's say the TempStore table has {Items, Cost, Price, ActualCost, ActualPrice}. But in the Store table I need to store {Items, Cost, Price}. The ActualCost and ActualPrice from the TempStore data row would need to be added as another row in Store.... (I hope this makes sense).... Anyway, is the solution using "WHILE-BEGIN-END"??
CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY)
INSERT INTO Numbers VALUES(1)
INSERT INTO Numbers VALUES(2)
INSERT INTO Numbers VALUES(3)
INSERT INTO Store
(Items, Cost, Price)
SELECT
t.Items, t.Cost
,CASE
WHEN n.Number=1 THEN t.Price
WHEN n.Number=2 THEN t.ActualCost
ELSE t.ActualPrice
END
FROM TempStore t
INNER JOIN Numbers N ON n.Number<=3
WHERE ...
you could even use a UNION:
INSERT INTO Store
(Items, Cost, Price)
SELECT
t.Items, t.Cost, t.Price
FROM TempStore t
UNION ALL
SELECT
t.Items, t.Cost, t.ActualCost
FROM TempStore t
UNION ALL
SELECT
t.Items, t.Cost, t.ActualPrice
FROM TempStore t
either the Numbers table or the UNION will be WAY better than a loop!
OK, I think KM has proposed an excellent solution involving a "numbers table". However, VoodooChild has requested in a comment that I post example code for my suggestion of using WHILE-BEGIN-END around an INSERT-SELECT.
I have created two tables like VoodooChild's Store and TempStore.
Store has columns StoreID, StoreName, StoreState, StoreNumber.
TempStore has columns TempStoreID, TempStoreName.
I prepopulated TempStoreName with the values First, Second, Third and Fourth.
Now, my SQL will insert three records into the Store table for every record in the TempStore table that meets the condition in the WHERE clause. That condition is the length of the TempStoreName, obviously not a real-world example.
DECLARE @counter int
SET @counter = 0;
WHILE @counter < 3
BEGIN
INSERT INTO Store (StoreName, StoreState, StoreNumber)
SELECT TempStoreName, 'AZ', @counter FROM TempStore WHERE LEN(TempStoreName) = 5
SET @counter = @counter + 1
END
The result of this when applied to an empty Store table is:
StoreID StoreName StoreState StoreNumber
1 First AZ 0
2 First AZ 1
3 First AZ 2
4 Third AZ 0
5 Third AZ 1
6 Third AZ 2
So, this approach works. It appears to meet VoodooChild's needs. It may or may not be the very best choice, but there are other factors involved in the decision that we don't know, such as how many times this operation is going to be repeated.
INSERT INTO Store SELECT * FROM TempStore UNION ALL SELECT * FROM TempStore
The above statement will insert two rows into Store for each row in TempStore. You can change the SELECT * to whatever modification you want to make to the items.
Given your latest comment, this should give you what you need. You should probably have some way of differentiating the values in your Stores table once they get there. Perhaps an "actual" BIT column or something similar:
INSERT INTO Stores (item, cost, price, actual)
SELECT item, cost, price, 0
FROM TempStores
UNION ALL
SELECT item, actual_cost, actual_price, 1
FROM TempStores
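If Stores doesn't already have that flag, it could be added before running the INSERT. A minimal sketch, assuming SQL Server syntax (as elsewhere in this thread) and that the column name actual is free:
-- hypothetical: add the differentiating flag column first
ALTER TABLE Stores ADD actual BIT NOT NULL DEFAULT 0;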
If you needed to adjust the columns (for example, increase actual_price by 10%) then you could do this:
INSERT INTO Stores (item, cost, price, actual)
SELECT item, cost, price, 0
FROM TempStores
UNION ALL
SELECT item, actual_cost, 1.1 * actual_price, 1
FROM TempStores
WHERE actual_cost IS NOT NULL
I also added a WHERE clause to the second SELECT statement to show that you can filter the rows. That WHERE clause will only affect the second SELECT. So, you could also do this:
INSERT INTO Stores (item, cost, price, actual)
SELECT item, cost, price, 0
FROM TempStores
WHERE cost IS NOT NULL
UNION ALL
SELECT item, actual_cost, 1.1 * actual_price, 1
FROM TempStores
WHERE actual_cost IS NOT NULL