Joining vertical and horizontal tables - SQL

How can I write a join that takes the two tables below and produces the result table? I'm having trouble thinking about it because one is a horizontal table and the other is a vertical table, I believe. The other answers on SO are not clear to me because I have to join a value in a row with a column name. How can I do that?
CREATE TABLE forecast (
year integer,
week integer,
model varchar(50),
category varchar(50),
subcategory varchar(50)
);
insert into forecast (year, week, model, category, subcategory) values (2021, 1, 'AAA', 'CategoryA', 'SubcategoryA');
insert into forecast (year, week, model, category, subcategory) values (2021, 1, 'BBB', 'CategoryA', 'SubcategoryA');
insert into forecast (year, week, model, category, subcategory) values (2021, 1, 'CCC', 'CategoryB', 'SubcategoryB');
insert into forecast (year, week, model, category, subcategory) values (2021, 1, 'DDD', 'CategoryA', 'SubcategoryC');
CREATE TABLE translation (
type varchar(50),
name varchar(50),
translated varchar(50)
);
insert into translation (type, name, translated) values ('category', 'CategoryA', 'TranslatedCategoryA');
insert into translation (type, name, translated) values ('category', 'CategoryB', 'TranslatedCategoryB');
insert into translation (type, name, translated) values ('subcategory', 'SubcategoryA', 'TranslatedSubcategoryA');
insert into translation (type, name, translated) values ('subcategory', 'SubcategoryB', 'TranslatedSubcategoryB');
insert into translation (type, name, translated) values ('subcategory', 'SubcategoryC', 'TranslatedSubcategoryC');
CREATE TABLE result (
year integer,
week integer,
model varchar(50),
category varchar(50),
subcategory varchar(50)
);
insert into result (year, week, model, category, subcategory) values (2021, 1, 'AAA', 'TranslatedCategoryA', 'TranslatedSubcategoryA');
insert into result (year, week, model, category, subcategory) values (2021, 1, 'BBB', 'TranslatedCategoryA', 'TranslatedSubcategoryA');
insert into result (year, week, model, category, subcategory) values (2021, 1, 'CCC', 'TranslatedCategoryB', 'TranslatedSubcategoryB');
insert into result (year, week, model, category, subcategory) values (2021, 1, 'DDD', 'TranslatedCategoryA', 'TranslatedSubcategoryC');
This query:
select * from forecast f
left join translation t
on t.name = f.category or t.name = f.subcategory
translates one column at a time, which makes sense, but I can't get two translated columns out of it, one for category and one for subcategory.

Use a double left join on the type and the name, and a coalesce to default to the original name if there's no translation.
select f.year, f.week, f.model
, coalesce(cat.translated, f.category) as category
, coalesce(subcat.translated, f.subcategory) as subcategory
from forecast f
left join translation cat
on cat.name = f.category
and cat.type = 'category'
left join translation subcat
on subcat.name = f.subcategory
and subcat.type = 'subcategory'
order by f.year, f.week, f.model;
year | week | model | category            | subcategory
-----+------+-------+---------------------+-----------------------
2021 |    1 | AAA   | TranslatedCategoryA | TranslatedSubcategoryA
2021 |    1 | BBB   | TranslatedCategoryA | TranslatedSubcategoryA
2021 |    1 | CCC   | TranslatedCategoryB | TranslatedSubcategoryB
2021 |    1 | DDD   | TranslatedCategoryA | TranslatedSubcategoryC

You have to do multiple joins, one for each column. (Without a filter on type this only works because the category and subcategory names don't overlap here.)
select f.year, f.week, f.model, t.translated as category, t2.translated as subcategory
from forecast f
left join translation t
on t.name = f.category
left join translation t2
on t2.name = f.subcategory

Related

sql add columns in group dynamically

It is necessary to build a summary table based on data about the customer and their payments, where the columns will be the sequential number of the contract (contact_number) and the year (year) grouped by gender. The main condition is that contact_number and year should be dynamically generated.
Test data:
CREATE TABLE loans
(
loan_id int,
client_id int,
loan_date date
);
CREATE TABLE clients
(
client_id int,
client_name varchar(20),
gender varchar(20)
);
INSERT INTO CLIENTS
VALUES (1, 'arnold', 'male'),
(2, 'lilly', 'female'),
(3, 'betty', 'female'),
(4, 'tom', 'male'),
(5, 'jim', 'male');
INSERT INTO loans
VALUES (1, 1, '20220522'),
(2, 2, '20220522'),
(3, 3, '20220525'),
(4, 4, '20220525'),
(5, 1, '20220527'),
(6, 2, '20220527'),
(7, 3, '20220601'),
(8, 1, '20220603'),
(9, 2, '20220603'),
(10, 1, '20220603');
The columns can be formed with a CASE WHEN construct, but that option is not suitable because new lines would have to be added to the query every time data is added.
My code:
with cte as
(
select
l.client_id,
loan_date,
extract(year from loan_date) as year,
client_name,
gender,
row_number() over (partition by l.client_id order by loan_date asc) as contract_number
from
loans l
inner join
clients c on l.client_id = c.client_id
)
select
gender
, year
, contract_number
, count(*)
from cte
group by gender, year, contract_number
order by year, contract_number
Expected output:
sex    | 1 contract, 2022 | 2 contract, 2022 | 3 contract, 2022
-------+------------------+------------------+-----------------
male   |                2 |                2 |                1
female |                4 |                1 |                1
RDBMS: Postgres
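For the static case, the pivot described above can be sketched with conditional aggregation in Postgres (the column aliases mirror the expected output; a truly dynamic column list would need the crosstab function from the tablefunc extension, or a PL/pgSQL function that builds this query as a string):

```sql
-- Static pivot via conditional aggregation (Postgres).
-- Dynamic column generation would require crosstab (tablefunc)
-- or dynamic SQL built in a function.
with cte as (
    select
        c.gender,
        extract(year from l.loan_date) as year,
        row_number() over (partition by l.client_id
                           order by l.loan_date) as contract_number
    from loans l
    join clients c on l.client_id = c.client_id
)
select
    gender,
    count(*) filter (where contract_number = 1 and year = 2022) as "1 contract, 2022",
    count(*) filter (where contract_number = 2 and year = 2022) as "2 contract, 2022",
    count(*) filter (where contract_number = 3 and year = 2022) as "3 contract, 2022"
from cte
group by gender;
```

Each new contract number or year still means a new FILTER column, which is the same limitation as CASE WHEN; only generated SQL removes it entirely.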

Select rows that contain a range of values while excluding values from other columns

Hitting a small wall with a query here. I'm trying to find transactions that contain type 01 while excluding transactions that contain item 23 or 25.
Here's a reprex.
In SQL fiddle
create table purchases (
transaction_id int,
item int,
type int,
customer char(1)
);
insert into purchases values (1, 23, 01, 'A');
insert into purchases values (1, 25, 01, 'A');
insert into purchases values (2, 23, 01, 'B');
insert into purchases values (2, 25, 01, 'B');
insert into purchases values (2, 1, 01, 'B');
insert into purchases values (3, 3, 01, 'A');
insert into purchases values (4, 23, 01, 'B');
insert into purchases values (4, 25, 01, 'B');
insert into purchases values (5, 23, 01, 'A');
insert into purchases values (6, 4, 02, 'C');
insert into purchases values (7, 9, 03, 'C');
Here's the query to identify transactions that only have items 23 and 25 but nothing else; it works (it returns transactions 1, 4 and 5).
select transaction_id from purchases where item in (23,25)
and transaction_id not in (select transaction_id from purchases where item not in (23,25));
However, I'm struggling to single out the transactions that have type 01 but no items 23 or 25.
I tried this, but it returns transactions 2 and 3, when it should only be 3, since 2 does contain items 23 and 25.
Here's the query I was going with, based on the first one.
select * from purchases where type = 1 and transaction_id not in (select transaction_id from purchases where item in (23,25)
and transaction_id not in (select transaction_id from purchases where item not in (23,25)));
Expected result:
transaction_id | item | type | customer
---------------+------+------+---------
             3 |    3 |   01 |        A
Based on your updated question, I'd suggest you use a NOT EXISTS clause, like below:
select * from purchases p1 where not exists
(
select 1 from purchases p2 where p1.transaction_id=p2.transaction_id
and p2.item in (23,25))
and type=1
I see that you have already changed the expected result in the question several times (while the query itself does not change), so I'm not sure what exactly you want to get. In any case, you can take this dbfiddle example, which uses arrays filtered by distinct sorted elements.
You want one row per transaction, so aggregate and GROUP BY transaction_id. Then use the HAVING clause and COUNT conditionally.
select transaction_id
from purchases
group by transaction_id
having count(*) filter (where item = 23) = 0
and count(*) filter (where item = 25) = 0
and count(*) filter (where type = 1) > 0
order by transaction_id;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=520755370f13d41ba35ca12e7eb5277e
If you want to show all rows matching above transaction IDs:
select * from purchases where transaction_id in ( <above query> );
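Spelled out, with the aggregate query inlined into the IN subquery, that might look like this (a sketch keeping the same FILTER conditions as the query above):

```sql
-- Return full rows for every transaction that has type 1
-- but contains neither item 23 nor item 25 (Postgres).
select *
from purchases
where transaction_id in (
    select transaction_id
    from purchases
    group by transaction_id
    having count(*) filter (where item = 23) = 0
       and count(*) filter (where item = 25) = 0
       and count(*) filter (where type = 1) > 0
)
order by transaction_id;
```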
Here is one option
select p.*
from purchases p
join (
select transaction_id
from purchases
group by transaction_id
having count(case when item in (25,23) then 1 end)=0
and count(case type when 1 then 1 end) > 0
) x
on p.transaction_id=x.transaction_id
For your sample data:
insert into purchases values (1, 23, 01, 'A');
insert into purchases values (1, 25, 01, 'A');
insert into purchases values (2, 23, 01, 'B');
insert into purchases values (2, 25, 01, 'B');
insert into purchases values (2, 1, 01, 'B');
insert into purchases values (3, 3, 01, 'A');
insert into purchases values (4, 23, 01,'B');
insert into purchases values (4, 25, 01,'B');
insert into purchases values (5, 23, 01,'A');
insert into purchases values (6, 4, 02,'C');
insert into purchases values (7, 9, 03,'C');
Result:
3 3 1 A

Re-format table, placing multiple column headers as rows

I have a table of fishing catches, showing the number of fish and total kg per species for each fishing day.
In the other reference table is a list of all the official fish species with codes and names.
How can I re-format the first table so that the rows are repeated for each day, with each row showing one species and its corresponding total catch and kg? So instead of each species having its own kg and n columns, the species would become rows, with a single n column and a single kg column. I am thinking of looping through the list of all species and duplicating the rows with the right values of n and kg for each species. This is the final format I need. My database is SQL Server.
You may use a union query here:
SELECT Day, 'Albacore' AS Species, ALB_n AS n, ALB_kg AS kg FROM yourTable
UNION ALL
SELECT Day, 'Big eye tuna', BET_n, BET_kg FROM yourTable
UNION ALL
SELECT Day, 'Sword fish', SWO_n, SWO_kg FROM yourTable
ORDER BY Day, Species;
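As an alternative sketch for SQL Server, the same unpivot can be written with CROSS APPLY over a VALUES table constructor, which reads yourTable only once instead of once per species (the table and column names yourTable, ALB_n, ALB_kg, etc. are assumed from the question):

```sql
-- One row per (day, species), built from the per-species columns.
SELECT t.Day, v.Species, v.n, v.kg
FROM yourTable t
CROSS APPLY (VALUES
    ('Albacore',     t.ALB_n, t.ALB_kg),
    ('Big eye tuna', t.BET_n, t.BET_kg),
    ('Sword fish',   t.SWO_n, t.SWO_kg)
) v (Species, n, kg)
ORDER BY t.Day, v.Species;
```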
You can also use a cross apply here, e.g.:
/*
* Data setup...
*/
create table dbo.Source (
Day int,
ALB_n int,
ALB_kg int,
BET_n int,
BET_kg int,
SWO_n int,
SWO_kg int
);
insert dbo.Source (Day, ALB_n, ALB_kg, BET_n, BET_kg, SWO_n, SWO_kg) values
(1, 10, 120, 4, 60, 2, 55),
(2, 15, 170, 8, 100, 1, 30);
create table dbo.Species (
Sp_id int,
Sp_name nvarchar(20)
);
insert dbo.Species (Sp_id, Sp_name) values
(1, N'Albacore'),
(2, N'Big eye tuna'),
(3, N'Sword fish');
/*
* Unpivot data using cross apply...
*/
select Day, Sp_name as Species, n, kg
from dbo.Source
cross apply dbo.Species
cross apply (
select
case
when Sp_name=N'Albacore' then ALB_n
when Sp_name=N'Big eye tuna' then BET_n
when Sp_name=N'Sword fish' then SWO_n
else null end as n,
case
when Sp_name=N'Albacore' then ALB_kg
when Sp_name=N'Big eye tuna' then BET_kg
when Sp_name=N'Sword fish' then SWO_kg
else null end as kg
) unpivotted (n, kg);

Rule based select with source tracking

Here is an example of the sample dataset that I am having trouble writing an efficient SQL:
There is a target table T1 with 5 columns ID (primary key), NAME, CATEGORY, HEIGHT, LINEAGE
T1 gets data from 3 sources - source1, source2, source3
A map table defines the rule as to which column has to be picked in what order from which source
If a source has NULL value for a column, then check the next source to get the value - that's the rule
So the values for target table columns based on the rules are as below for ID = 1:
Name: A12, CATEGORY: T1, HEIGHT: 4, Lineage: S3-S1-S1
The values for target table columns based on the rules are as below for ID = 2:
NAME: B, CATEGORY: T22, HEIGHT: 5, Lineage: S3-S2-S1
The logic to merge into target should look like this:
Merge into Target T
using (select statement with rank based rules from 3 source tables) on
when matched then
when not matched then
Question: any suggestions on writing this Merge in an efficient way which also should update the Lineage in the merge?
First, the MAP table must have a column that will give priority to the mapping.
Then you should PIVOT this table.
The next step is to combine UNION ALL of all source tables.
And finally, we can join all and select our values with the FIRST_VALUE function.
Having such a result, you can substitute it in MERGE.
Structure and sample data for testing:
CREATE OR REPLACE TABLE SOURCE1 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE SOURCE2 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE SOURCE3 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE MAP (
PRIORITY int,
SOURCE_COLUMN string,
SOURCE_TABLE string);
INSERT INTO SOURCE1 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A', 'T1', 4),
(2, 'B', 'T2', 5),
(3, 'C', 'T3', 6);
INSERT INTO SOURCE2 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A1', 'T1', 4.4),
(2, 'B1', 'T22', 6),
(3, NULL, 'T3', 7.2);
INSERT INTO SOURCE3 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A12', 'T21', NULL),
(2, 'B', NULL, 6),
(3, 'C3', 'T3', NULL);
INSERT INTO MAP (PRIORITY, SOURCE_COLUMN, SOURCE_TABLE)
VALUES (1, 'NAME', 'SOURCE3'),
(2, 'NAME', 'SOURCE1'),
(3, 'NAME', 'SOURCE2'),
(1, 'CATEGORY', 'SOURCE2'),
(2, 'CATEGORY', 'SOURCE3'),
(3, 'CATEGORY', 'SOURCE1'),
(1, 'HEIGHT', 'SOURCE1'),
(2, 'HEIGHT', 'SOURCE2'),
(3, 'HEIGHT', 'SOURCE3');
And my suggestion for a solution:
WITH _MAP AS (
SELECT *
FROM MAP
PIVOT (MAX(SOURCE_TABLE) FOR SOURCE_COLUMN IN ('NAME', 'CATEGORY', 'HEIGHT')) AS p(PRIORITY, NAME, CATEGORY, HEIGHT)
), _SRC AS (
SELECT 'SOURCE1' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE1
UNION ALL
SELECT 'SOURCE2' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE2
UNION ALL
SELECT 'SOURCE3' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE3
)
SELECT DISTINCT _SRC.ID,
FIRST_VALUE(_SRC.NAME) OVER(PARTITION BY _SRC.ID ORDER BY MN.PRIORITY) AS NAME,
FIRST_VALUE(_SRC.CATEGORY) OVER(PARTITION BY _SRC.ID ORDER BY MC.PRIORITY) AS CATEGORY,
FIRST_VALUE(_SRC.HEIGHT) OVER(PARTITION BY _SRC.ID ORDER BY MH.PRIORITY) AS HEIGHT,
REPLACE(FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MN.PRIORITY) || '-' ||
FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MC.PRIORITY) || '-' ||
FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MH.PRIORITY), 'SOURCE', 'S') AS LINEAGE
FROM _SRC
LEFT JOIN _MAP AS MN ON _SRC.SOURCE_TABLE = MN.NAME AND _SRC.NAME IS NOT NULL
LEFT JOIN _MAP AS MC ON _SRC.SOURCE_TABLE = MC.CATEGORY AND _SRC.CATEGORY IS NOT NULL
LEFT JOIN _MAP AS MH ON _SRC.SOURCE_TABLE = MH.HEIGHT AND _SRC.HEIGHT IS NOT NULL;
Result:
+----+------+----------+--------+----------+
| ID | NAME | CATEGORY | HEIGHT | LINEAGE |
+----+------+----------+--------+----------+
| 1 | A12 | T1 | 4 | S3-S2-S1 |
| 2 | B | T22 | 5 | S3-S2-S1 |
| 3 | C3 | T3 | 6 | S3-S2-S1 |
+----+------+----------+--------+----------+
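To answer the MERGE part of the question: the SELECT above can be dropped into the USING clause. A sketch in Snowflake-style syntax (TARGET is a placeholder for the real target table name; the subquery comment stands in for the full query above):

```sql
MERGE INTO TARGET T
USING (
    -- the SELECT DISTINCT ... FIRST_VALUE query above,
    -- producing ID, NAME, CATEGORY, HEIGHT, LINEAGE
    SELECT ...
) S
ON T.ID = S.ID
WHEN MATCHED THEN UPDATE SET
    T.NAME     = S.NAME,
    T.CATEGORY = S.CATEGORY,
    T.HEIGHT   = S.HEIGHT,
    T.LINEAGE  = S.LINEAGE
WHEN NOT MATCHED THEN INSERT (ID, NAME, CATEGORY, HEIGHT, LINEAGE)
    VALUES (S.ID, S.NAME, S.CATEGORY, S.HEIGHT, S.LINEAGE);
```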

How can I get the median value of each product in Postgresql?

I have one table named 'sales'.
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
insert into sales values ('Emily', 'Soap', 2, 4, 2002, 'CT', 2549);
insert into sales values ('Bloom', 'Eggs', 30, 11, 2000, 'NJ', 559);
.... There are 498 rows in total.
Now I want to get the median quant for each product.
I have tried this code, and it works:
CREATE OR REPLACE FUNCTION _final_median(NUMERIC[])
RETURNS NUMERIC AS
$$
SELECT AVG(val)
FROM (
SELECT val
FROM unnest($1) val
ORDER BY 1
LIMIT 2 - MOD(array_upper($1, 1), 2)
OFFSET CEIL(array_upper($1, 1) / 2.0) - 1
) sub;
$$
LANGUAGE 'sql' IMMUTABLE;
CREATE AGGREGATE median(NUMERIC) (
SFUNC=array_append,
STYPE=NUMERIC[],
FINALFUNC=_final_median,
INITCOND='{}'
);
SELECT prod,round(median(quant)) AS median_quant FROM sales
group by prod
order by prod;
But is there any way I can get the same result with a built-in aggregate function, without creating special functions?
The median is the 50th percentile (the value in the middle of the ordered set). You can use percentile_cont to calculate it per product:
select prod,
       percentile_cont(0.5) within group (order by quant) as median_quant
from sales
group by prod
order by prod;
It seems that your aggregate function finds the upper median.
In that case, PERCENTILE_DISC(0.5) with descending order can be used for the aggregation.
select prod,
count(*) as total_prod,
percentile_disc(0.5) within group (order by quant desc) as ceil_median_quant
from sales
group by prod;