Transpose columns having values into rows - SQL

I have a requirement that needs columns with values to be transposed into rows. For instance refer to the table below:
cust:
cust_id | cover1 | cover2 | cover3
1234    | 'PAG'  | Null   | 'TDE'
5678    | Null   | 'GAP'  | Null
Given the above table, we have to find out which columns have a value and if there is a value in that column then there should be a row created. For e.g.
cust_id | cover
1234 | 'PAG'
1234 | 'TDE'
5678 | 'GAP'
For customer 1234, only cover1 and cover3 are populated, hence two records are created. For 5678, cover1 and cover3 are Null, so only one record, for cover2, needs to be created.
I could apply a simple approach like below. But I was wondering if there is an elegant approach and a smarter solution to this.
select cust_id, cover1 AS cover from cust where cover1 IS NOT NULL
UNION ALL
select cust_id, cover2 AS cover from cust where cover2 IS NOT NULL
UNION ALL
select cust_id, cover3 AS cover from cust where cover3 IS NOT NULL
Please share your thoughts. We use Spark-SQL 2.4
Thanks

Maybe like this:
SELECT CUST_ID, COVER FROM (
    SELECT * FROM test
    UNPIVOT (
        COVER
        FOR COVER_C
        IN (COVER1, COVER2, COVER3)
    )
);

The requirement can be met by using Spark's stack() function, e.g. in %sql mode:
create temporary view cust as
select * from values
    (1234, 'PAG', Null, 'TDE'),
    (5678, Null, 'GAP', Null) as (cust_id, cover1, cover2, cover3);

select * from (
    select cust_id,
           stack(3, 'cover1', cover1,
                    'cover2', cover2,
                    'cover3', cover3) as (cover_nm, cover)
    from cust
)
where cover is not null;
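Not Spark, but the UNION ALL shape from the question is easy to sanity-check locally. A minimal sketch using Python's sqlite3 as a stand-in engine, with the sample data from the question:

```python
import sqlite3

# Build the sample cust table from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cust (cust_id INT, cover1 TEXT, cover2 TEXT, cover3 TEXT)")
con.executemany("INSERT INTO cust VALUES (?, ?, ?, ?)",
                [(1234, 'PAG', None, 'TDE'), (5678, None, 'GAP', None)])

# One SELECT per cover column, NULLs filtered out, glued with UNION ALL.
rows = con.execute("""
    SELECT cust_id, cover1 AS cover FROM cust WHERE cover1 IS NOT NULL
    UNION ALL
    SELECT cust_id, cover2 FROM cust WHERE cover2 IS NOT NULL
    UNION ALL
    SELECT cust_id, cover3 FROM cust WHERE cover3 IS NOT NULL
    ORDER BY cust_id, cover
""").fetchall()
print(rows)  # [(1234, 'PAG'), (1234, 'TDE'), (5678, 'GAP')]
```

The UNION ALL approach scans the table once per column, which is why stack() (one scan) is usually preferable on wide tables.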

Related

HQL, insert two rows if a condition is met

I have the following table called table_persons in Hive:
+--------+------+------------+
| people | type | date |
+--------+------+------------+
| lisa | bot | 19-04-2022 |
| wayne | per | 19-04-2022 |
+--------+------+------------+
If type is "bot", I have to add two rows to the table d1_info; if type is "per", I only have to add one row, so the result is the following:
+---------+------+------------+
| db_type | info | date |
+---------+------+------------+
| x_bot | x | 19-04-2022 |
| x_bnt | x | 19-04-2022 |
| x_per | b | 19-04-2022 |
+---------+------+------------+
How can I add two rows if this condition is met?
with a Case When maybe?
You may try using a union to duplicate the rows with type bot. The example below unions a first query that selects all records with a second query that selects only those with bot.
Edit
In response to the edited question, I have added a parity column named original (storing 1 or 0) to differentiate the duplicated entries.
SELECT
    p1.*,
    1 as original
FROM
    table_persons p1
UNION ALL
SELECT
    p1.*,
    0 as original
FROM
    table_persons p1
WHERE p1.type = 'bot'
You may then insert this into your other table d1_info, using the above query as a subquery or CTE and applying the desired CASE-expression transformations, e.g.
INSERT INTO d1_info
    (`db_type`, `info`, `date`)
WITH merged_data AS (
    SELECT
        p1.*,
        1 as original
    FROM
        table_persons p1
    UNION ALL
    SELECT
        p1.*,
        0 as original
    FROM
        table_persons p1
    WHERE p1.type = 'bot'
)
SELECT
    CONCAT('x_', CASE
        WHEN m1.type = 'per' THEN m1.type
        WHEN m1.original = 1 AND m1.type = 'bot' THEN m1.type
        ELSE 'bnt'
    END) as db_type,
    CASE
        WHEN m1.type = 'per' THEN 'b'
        ELSE 'x'
    END as info,
    m1.date
FROM
    merged_data m1
ORDER BY m1.people, m1.date;
I think what you want is to create a new table that captures your logic. This would simplify your query and make it so you could easily add new types without having to edit logic of a case statement. It may also make it cleaner to view your logic later.
CREATE TABLE table_persons (
    `people` VARCHAR(5),
    `type`   VARCHAR(3),
    `date`   VARCHAR(10)
);
INSERT INTO table_persons
VALUES
    ('lisa', 'bot', '19-04-2022'),
    ('wayne', 'per', '19-04-2022');

CREATE TABLE info (
    `type`    VARCHAR(5),
    `db_type` VARCHAR(5),
    `info`    VARCHAR(1)
);
INSERT INTO info
VALUES
    ('bot', 'x_bot', 'x'),
    ('bot', 'x_bnt', 'x'),
    ('per', 'x_per', 'b');
and then you can easily do a join:
select
    info.db_type,
    info.info,
    persons.date date
from
    table_persons persons
    inner join info on info.type = persons.type
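The lookup-table join above can be exercised end to end. A small sketch using Python's sqlite3 in place of Hive (same table and column names as the answer, VARCHAR swapped for TEXT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
    INSERT INTO table_persons VALUES
        ('lisa', 'bot', '19-04-2022'),
        ('wayne', 'per', '19-04-2022');

    -- Lookup table: 'bot' maps to two rows, 'per' to one.
    CREATE TABLE info (type TEXT, db_type TEXT, info TEXT);
    INSERT INTO info VALUES
        ('bot', 'x_bot', 'x'),
        ('bot', 'x_bnt', 'x'),
        ('per', 'x_per', 'b');
""")

# The join fans each person out to as many rows as its type has in 'info',
# so the one-row-vs-two-rows logic lives in data, not in a CASE expression.
result = con.execute("""
    SELECT i.db_type, i.info, p.date
    FROM table_persons p JOIN info i ON i.type = p.type
    ORDER BY i.db_type
""").fetchall()
print(result)
```

Adding a new type later is then a single INSERT into info, with no query changes.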

SQL Identify records which occur more than once in the same year

I have records in which a given set of procedure codes should occur only once per year per member. I'm trying to identify occurrences where this rule is broken.
I've tried the SQL below; is it correct?
Table
+---------------+--------+-------------+
| ProcedureCode | Member | ServiceDate |
+---------------+--------+-------------+
| G0443 | 1234 | 01-03-2017 |
+---------------+--------+-------------+
| G0443 | 1234 | 05-03-2018 |
+---------------+--------+-------------+
| G0443 | 1234 | 07-03-2018 |
+---------------+--------+-------------+
| G0444 | 3453 | 01-03-2017 |
+---------------+--------+-------------+
| G0443 | 5676 | 07-03-2018 |
+---------------+--------+-------------+
Expected results where rule is broken
+---------------+--------+
| ProcedureCode | Member |
+---------------+--------+
| G0443 | 1234 |
+---------------+--------+
SQL
Select ProcedureCD, Mbr_Id
From CLAIMS
Where ProcedureCD IN ('G0443', 'G0444')
GROUP BY ProcedureCD,Mbr_Id, YEAR(ServiceFromDate)
having count(YEAR(ServiceFromDate))>1
The query you've written will work, provided you correct the column names (your query uses different column names to the sample data you posted). It can be simplified visually by using COUNT(*) in the HAVING clause. COUNT accumulates 1 for every non-null value and 0 for nulls, and there is no significance to putting YEAR inside the count here: all the dates are non-null, and COUNT does not look at the value, so count(*), count(1), count(0) and count(Member) would all work equally well.
The only time count(column) behaves differently from count(*) is when the column contains null values. There is also the COUNT(DISTINCT ...) form, which makes the counting ignore repeated values.
COUNT(DISTINCT ...) on a column containing the 6 values 1, 1, 2, null, 3, 3 would return 3 (3 unique non-null values); COUNT of the same column would return 5 (5 non-null values); COUNT(*) would return 6.
You should understand that by putting the YEAR(...) in the group by but not the select, you might produce duplicate-looking rows in the output. For example if you had these rows also:
Member, Code, Date
1234, G0443, 1-1-19
1234, G0443, 2-1-19
And you're grouping on year (but not showing it) then you'll see:
1234, G0443 --it's for year 2018
1234, G0443 --it's for year 2019
Personally I think it'd be handy to show the year in the select list, so you can better pinpoint where the problem is; but if you want to squash these duplicate rows, do a SELECT DISTINCT. Alternatively, leverage the difference between COUNT and COUNT DISTINCT: remove the year from the GROUP BY and instead say HAVING COUNT(*) > COUNT(DISTINCT YEAR(ServiceDate)).
As discussed above a count(*) will be greater than a count distinct year if there are duplicated years
Select ProcedureCode, Member,YEAR(ServiceDate) [Year],Count(*) Occurences
From CLAIMS
Where ProcedureCode IN ('G0443', 'G0444')
GROUP BY ProcedureCode, Member,YEAR(ServiceDate)
HAVING Count(*) > 1
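Both HAVING variants discussed above can be verified on the sample data. A sketch using Python's sqlite3 in place of SQL Server, with YEAR(ServiceDate) emulated as substr(ServiceDate, 7, 4) since the sample dates are DD-MM-YYYY strings:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CLAIMS (ProcedureCode TEXT, Member TEXT, ServiceDate TEXT)")
con.executemany("INSERT INTO CLAIMS VALUES (?, ?, ?)", [
    ('G0443', '1234', '01-03-2017'),
    ('G0443', '1234', '05-03-2018'),
    ('G0443', '1234', '07-03-2018'),  # same member, same year: rule broken
    ('G0444', '3453', '01-03-2017'),
    ('G0443', '5676', '07-03-2018'),
])

# Variant 1: group by code, member and year; keep groups with more than one row.
broken = con.execute("""
    SELECT ProcedureCode, Member
    FROM CLAIMS
    WHERE ProcedureCode IN ('G0443', 'G0444')
    GROUP BY ProcedureCode, Member, substr(ServiceDate, 7, 4)
    HAVING COUNT(*) > 1
""").fetchall()

# Variant 2: compare a plain count with a distinct count of years.
also = con.execute("""
    SELECT ProcedureCode, Member
    FROM CLAIMS
    GROUP BY ProcedureCode, Member
    HAVING COUNT(*) > COUNT(DISTINCT substr(ServiceDate, 7, 4))
""").fetchall()

print(broken, also)  # [('G0443', '1234')] [('G0443', '1234')]
```

Both flag only G0443/1234, the single member-code pair with two claims in the same year.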
Hope this code helps:
create table #temp (ProcedureCode varchar(20), Member varchar(20), ServiceDate date)

insert into #temp (ProcedureCode, Member, ServiceDate) values ('G0443', '1234', '01-03-2017')
insert into #temp (ProcedureCode, Member, ServiceDate) values ('G0443', '1234', '05-03-2018')
insert into #temp (ProcedureCode, Member, ServiceDate) values ('G0443', '1234', '07-03-2018')
insert into #temp (ProcedureCode, Member, ServiceDate) values ('G0444', '3453', '01-03-2017')
insert into #temp (ProcedureCode, Member, ServiceDate) values ('G0443', '5676', '07-03-2018')

select ProcedureCode, Member from #temp
where YEAR(ServiceDate) in (select YEAR(ServiceDate) from #temp
                            group by YEAR(ServiceDate) having count(ServiceDate) > 1)
  and Member in (select Member from #temp group by Member having count(Member) > 1)
group by ProcedureCode, Member

drop table #temp

How to add a total row at the end of the table in t-sql?

I need to add a row of sums as the last row of the table. For example:
book_name | some_row1 | some_row2 | sum
---------------+---------------+---------------+----------
book1 | some_data11 | some_data12 | 100
book2 | some_data21 | some_data22 | 300
book3 | some_data31 | some_data32 | 500
total_books=3 | NULL | NULL | 900
How can I do this? (T-SQL)
You can use union all :
select book_name, some_row1, some_row2, sum
from table t
union all
select cast(count(*) as varchar(255)), null, null, sum(sum)
from table t;
However, COUNT(*) gives the number of rows in the table; if book_name can contain null values, you need COUNT(book_name) instead of COUNT(*).
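The UNION ALL total-row pattern can be sanity-checked quickly. A sketch in Python's sqlite3, using a hypothetical books table with the sum column renamed to total to sidestep quoting a reserved word:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical books table; only the name and amount columns matter here.
con.execute("CREATE TABLE books (book_name TEXT, total INT)")
con.executemany("INSERT INTO books VALUES (?, ?)",
                [('book1', 100), ('book2', 300), ('book3', 500)])

# Detail rows, then one appended row holding the count and the grand total.
rows = con.execute("""
    SELECT book_name, total FROM books
    UNION ALL
    SELECT 'total_books=' || COUNT(*), SUM(total) FROM books
""").fetchall()
print(rows)
```

Note that without an explicit ORDER BY the total row is not guaranteed to come last; in T-SQL you would typically add a sort key column to pin it to the bottom.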
Try with ROLLUP:
SELECT CASE
           WHEN GROUPING([book_name]) = 1 THEN 'total_books'
           ELSE [book_name]
       END AS [book_name],
       some_row1, some_row2,
       SUM([sum]) as Total_Sales
FROM Before
GROUP BY [book_name] WITH ROLLUP
I find that grouping sets is much more flexible than rollup. I would write this as:
select coalesce(book_name,
replace('total_books=#x', '#x', count(*))
) as book_name,
col2, col3, sum(whatever)
from t
group by grouping sets ( (book_name), () );
Strictly speaking, the GROUPING function with a CASE is better than COALESCE(); however, NULL values in the grouping keys are quite rare.

Redshift does not support rollup(), grouping() functions

Trying to convert Teradata BTEQ SQL scripts to Redshift SQL. My current Redshift Postgres version is 8.0.2, Redshift version 1.0.1499. The current version of Redshift does not support the rollup() and grouping() functions. How can I overcome this? What are the equivalent Redshift functions? Could anyone explain with some examples?
Sample Teradata SQL-
select
    PRODUCT_ID, CUST_ID,
    GROUPING(PRODUCT_ID),
    GROUPING(CUST_ID),
    row_number() over (order by PRODUCT_ID, CUST_ID) AS "ROW_OUTPUT_NUM"
from products
group by rollup(PRODUCT_ID, CUST_ID);
Need to convert above sql query to Redshift
Implement the ROLLUP by hand
Since Redshift does not currently recognize the ROLLUP clause, you must implement this grouping technique the hard way.
ROLLUP with 1 argument
With ROLLUP Ex. PostgreSQL
SELECT column1, aggregate_function(*)
FROM some_table
GROUP BY ROLLUP(column1)
The equivalent implementation
-- First, the same GROUP BY without the ROLLUP
-- For efficiency, we will reuse this table
DROP TABLE IF EXISTS tmp_totals;
CREATE TEMP TABLE tmp_totals AS
SELECT column1, aggregate_function(*) AS total1
FROM some_table
GROUP BY column1;
-- Show the table 'tmp_totals'
SELECT * FROM tmp_totals
UNION ALL
-- The aggregation of 'tmp_totals'
SELECT null, aggregate_function(total1) FROM tmp_totals
ORDER BY 1
Example output
Country | Sales
-------- | -----
Poland | 2
Portugal | 4
Ukraine | 3
null | 9
ROLLUP with 2 arguments
With ROLLUP Ex. PostgreSQL
SELECT column1, column2, aggregate_function(*)
FROM some_table
GROUP BY ROLLUP(column1, column2);
The equivalent implementation
-- First, the same GROUP BY without the ROLLUP
-- For efficiency, we will reuse this table
DROP TABLE IF EXISTS tmp_totals;
CREATE TEMP TABLE tmp_totals AS
SELECT column1, column2, aggregate_function(*) AS total1
FROM some_table
GROUP BY column1, column2;
-- Show the table 'tmp_totals'
SELECT * FROM tmp_totals
UNION ALL
-- The sub-totals of the first category
SELECT column1, null, sum(total1) FROM tmp_totals GROUP BY column1
UNION ALL
-- The full aggregation of 'tmp_totals'
SELECT null, null, sum(total1) FROM tmp_totals
ORDER BY 1, 2;
Example output
Country | Segment | Sales
-------- | -------- | -----
Poland | Premium | 0
Poland | Base | 2
Poland | null | 2 <- sub total
Portugal | Premium | 1
Portugal | Base | 3
Portugal | null | 4 <- sub total
Ukraine | Premium | 1
Ukraine | Base | 2
Ukraine | null | 3 <- sub total
null | null | 9 <- grand total
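The manual two-argument ROLLUP above translates almost verbatim to other engines. A sketch in Python's sqlite3 reproducing the Country/Segment example (note that SQLite sorts NULLs first in ascending order, so the totals appear before the detail rows, unlike the Postgres NULLS LAST default shown above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (country TEXT, segment TEXT, amount INT)")
con.executemany("INSERT INTO some_table VALUES (?, ?, ?)", [
    ('Poland', 'Premium', 0), ('Poland', 'Base', 2),
    ('Portugal', 'Premium', 1), ('Portugal', 'Base', 3),
    ('Ukraine', 'Premium', 1), ('Ukraine', 'Base', 2),
])

# Step 1: the same GROUP BY without the ROLLUP, materialized once and reused.
con.executescript("""
    DROP TABLE IF EXISTS tmp_totals;
    CREATE TEMP TABLE tmp_totals AS
        SELECT country, segment, SUM(amount) AS total1
        FROM some_table
        GROUP BY country, segment;
""")

# Step 2: detail rows + per-country sub-totals + grand total.
rows = con.execute("""
    SELECT * FROM tmp_totals
    UNION ALL
    SELECT country, NULL, SUM(total1) FROM tmp_totals GROUP BY country
    UNION ALL
    SELECT NULL, NULL, SUM(total1) FROM tmp_totals
    ORDER BY 1, 2
""").fetchall()
print(rows)
```

The sub-total and grand-total rows only re-aggregate the small tmp_totals table, not the base table, which is the point of materializing the first GROUP BY.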
If you use the UNION technique that others have pointed to, you'll be scanning the underlying table multiple times.
If the fine-level GROUPing actually results in a significant reduction in the data size, a better solution may be:
create temp table summ1
as
select PRODUCT_ID,CUST_ID, ...
from products
group by PRODUCT_ID,CUST_ID;
create temp table summ2
as
select PRODUCT_ID,cast(NULL as INT) AS CUST_ID, ...
from products
group by PRODUCT_ID;
select * from summ1
union all
select * from summ2
union all
select cast(NULL as INT) AS PRODUCT_ID, cast(NULL as INT) AS CUST_ID, ...
from summ2

Problems in oracle SQL using select in update

So, I have 2 tables. For simplicity's sake I'll show two columns for one and only one column for the other, but they have a lot more data.
The tables look like this:
_________________________________
| table 1 |
+---------------------------------+
| gas_deu est_pag |
+---------------------------------+
| 56857 (null) |
| 60857 (null) |
| 80857 (null) |
+---------------------------------+
______________________
| table 2 |
+----------------------+
| gas_pag |
+----------------------+
| 56857 |
| 21000 |
| 75857 |
+----------------------+
table 1 and table 2 can be joined using id_edi and nr_dep (same names in both tables).
What's happening here is basically the following:
In table 1, gas_deu is an amount owed by someone.
In table 2, gas_pag is how much has been paid, meaning gas_deu - gas_pag should give 0 (or a negative number) if gas_deu was paid in full, or a positive number if it was only partially paid.
Also, some rows from table 1 are not in table 2, meaning gas_deu has not been paid at all.
What i need to do is update est_pag in table 1 with the following:
if gas_deu-gas_pag<=0 (debt paid) then update est_pag value to 1
if gas_deu-gas_pag>0 (partially paid) then update est_pag value to 2
if a row is in table 1 but not in table 2, it has not been paid at all so est_pag value will be 3
I have tried a lot, and I mean A LOT, of code for this update. The tables have a lot of columns, so I won't post everything I tried; it would only get more confusing.
I mostly tried using SELECT sub-queries in the SET and in the WHERE of the UPDATE. They always give me the single-row error. I know this happens because the sub-query returns more than one value, but then how would I do this? I don't see a way to get only one value from the query and update only the rows that match the debt status.
Using CASE was the logical choice for me, but it always seems to return more than one row. I tried (case when gas_deu-gas_pag<=0 then 1 else est_pag end), since if I could get at least one value in there it would be a start, but I get the same more-than-one-row problem.
Any help or suggestions are highly appreciated. I've already tried everything I could think of, and a lot of answers from here on Stack Overflow, but still can't get it to work.
Edit: adding what table 1 should look like after updating
| table 1 |
+---------------------------------+
| gas_deu est_pag |
+---------------------------------+
| 56857 1 |
| 60857 2 |
| 80857 2 |
+---------------------------------+
Update 2:
update table1
set est_pag=(select (case when (select min((case when gasto.gas_deu-
pago.gas_pag<=0 then 0 else null end))
from table2 pago full outer join gasto_comun_pruebaestpago gasto on
pago.nr_dep=gasto.nr_dep
where gasto.nr_dep=pago.nr_dep and gasto.id_edi=pago.id_edi)=0 then 1 else
null end)
from table2 pago full outer join gasto_comun_pruebaestpago gasto on
pago.nr_dep=gasto.nr_dep
where gasto.nr_dep=pago.nr_dep and gasto.id_edi=pago.id_edi)
where est_pag is null;
This is one of many attempts. This one changes all values to 1 because of the min() in there, which outputs just one row containing 0; the CASE then matches 0=0, so everything is set to 1. Without the min(), the select does what I need, but the update throws the 'single-row subquery returns more than one row' error again.
You can update through a left outer join. It is possible the second table has more than one payment for the same account (customers may make partial payments), so I aggregate the second table by ID first. You didn't provide input data for testing, so in my small test I assume the tables are matched by id (primary key in the first table, foreign key in the second table - even though I didn't write the constraints into the table definitions).
The way I wrote the update, a 3 will be assigned if an account is not present in the second table, but also if it is present but with NULL in the gas_pag column. (It would be best if that column was declared NOT NULL from the outset!) If a different handling is desired in that case, you didn't say; it can be accommodated easily though.
So, here goes.
TABLE CREATION
create table t1 ( id number, gas_deu number, est_pag number );
insert into t1
select 101, 56857, null from dual union all
select 104, 60857, null from dual union all
select 108, 80857, null from dual
;
create table t2 ( id number, gas_pag number ) ;
insert into t2
select 101, 56857 from dual union all
select 104, 60000 from dual
;
select * from t1;
ID GAS_DEU EST_PAG
---------- ---------- ----------
101 56857
104 60857
108 80857
select * from t2;
ID GAS_PAG
---------- ----------
101 56857
104 60000
UPDATE STATEMENT
update
( select est_pag, gas_deu, gas_pag
from t1 left join
( select id, sum(gas_pag) as gas_pag from t2 group by id ) x
on t1.id = x.id
)
set est_pag = case when gas_deu <= gas_pag then 1
when gas_deu > gas_pag then 2
else 3
end
;
select * from t1;
ID GAS_DEU EST_PAG
---------- ---------- ----------
101 56857 1
104 60857 2
108 80857 3
CLEAN-UP
drop table t1 purge;
drop table t2 purge;
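For engines that don't allow updating through a join view, the same 1/2/3 classification can be written with a correlated subquery in the SET clause. A sketch in Python's sqlite3 using the t1/t2 data from the answer (this swaps the updatable-join-view technique for a correlated-subquery one; the logic is unchanged):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INT, gas_deu INT, est_pag INT);
    INSERT INTO t1 VALUES (101, 56857, NULL), (104, 60857, NULL), (108, 80857, NULL);
    CREATE TABLE t2 (id INT, gas_pag INT);
    INSERT INTO t2 VALUES (101, 56857), (104, 60000);
""")

# Correlated subquery in the SET clause: aggregate payments per id, then
# classify. A missing id makes SUM() return NULL, and NULL comparisons are
# not true, so those rows fall through to ELSE 3 (never paid at all).
con.execute("""
    UPDATE t1
    SET est_pag = CASE
        WHEN (SELECT SUM(gas_pag) FROM t2 WHERE t2.id = t1.id) >= gas_deu THEN 1
        WHEN (SELECT SUM(gas_pag) FROM t2 WHERE t2.id = t1.id) < gas_deu THEN 2
        ELSE 3
    END
""")

final = con.execute("SELECT * FROM t1 ORDER BY id").fetchall()
print(final)  # [(101, 56857, 1), (104, 60857, 2), (108, 80857, 3)]
```

Because the subquery aggregates with SUM, it always yields exactly one value per row, which is what avoids the 'single-row subquery returns more than one row' error from the question.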