How to transpose data grouped by two fields - sql

I am having trouble figuring out how I can transform my data from the example table into the desired results. The main idea is to group the rows by id_1 and id_2 and then transform the data into one row, ordered by sequence_id. Any help or tips would be appreciated, thanks!
Example data:
date       id_1 id_2 sequence_id data_1 data_2 data_3
2020-01-01 ABC  123  2           hi     nice   to
2020-01-01 ABC  123  3           meet   you    my
2020-01-01 ABC  123  4           name   is     bob
2020-02-01 DEF  456  1           good   day    sir
2020-02-01 DEF  456  3           how    are    you
Desired output:
date       id_1 id_2 sequence_id data_1 data_2 data_3 data_1 data_2 data_3 data_1 data_2 data_3
2020-01-01 ABC  123  2           hi     nice   to     meet   you    my     name   is     bob
2020-02-01 DEF  456  1           good   day    sir    how    are    you

Consider the approach below (it could be a good starting point for you to optimize further):
select * from (
  select date, id_1, id_2, min(sequence_id) over win as sequence_id,
    data, row_number() over win pos
  from your_table, unnest([data_1, data_2, data_3]) data with offset
  window win as (partition by date, id_1, id_2 order by sequence_id, offset)
)
pivot (any_value(data) as data_ for pos in (1,2,3,4,5,6,7,8,9))
Applied to the sample data in your question, this returns one row per (date, id_1, id_2) group, with the data values spread across nine pivoted columns in sequence order (positions a group does not fill are null).
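For readers who want to try the idea without BigQuery, here is a minimal sketch that runs the same logic on SQLite through Python's sqlite3 module. SQLite has neither UNNEST nor PIVOT, so as an assumption of this sketch the unnesting is replaced by a UNION ALL and the pivot by one MAX(CASE ...) column per position; the column names off and data_1 to data_9 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (date TEXT, id_1 TEXT, id_2 TEXT, sequence_id INTEGER,
                         data_1 TEXT, data_2 TEXT, data_3 TEXT);
INSERT INTO your_table VALUES
  ('2020-01-01', 'ABC', '123', 2, 'hi',   'nice', 'to'),
  ('2020-01-01', 'ABC', '123', 3, 'meet', 'you',  'my'),
  ('2020-01-01', 'ABC', '123', 4, 'name', 'is',   'bob'),
  ('2020-02-01', 'DEF', '456', 1, 'good', 'day',  'sir'),
  ('2020-02-01', 'DEF', '456', 3, 'how',  'are',  'you');
""")

# One MAX(CASE ...) column per position stands in for PIVOT ... FOR pos IN (1..9).
pivot_cols = ",\n       ".join(
    f"MAX(CASE WHEN pos = {i} THEN data END) AS data_{i}" for i in range(1, 10)
)

query = f"""
WITH unpivoted AS (
  -- stands in for UNNEST([data_1, data_2, data_3]) WITH OFFSET
  SELECT date, id_1, id_2, sequence_id, 0 AS off, data_1 AS data FROM your_table
  UNION ALL
  SELECT date, id_1, id_2, sequence_id, 1, data_2 FROM your_table
  UNION ALL
  SELECT date, id_1, id_2, sequence_id, 2, data_3 FROM your_table
),
numbered AS (
  -- number every value inside its (date, id_1, id_2) group
  SELECT date, id_1, id_2, sequence_id, data,
         ROW_NUMBER() OVER (PARTITION BY date, id_1, id_2
                            ORDER BY sequence_id, off) AS pos
  FROM unpivoted
)
SELECT date, id_1, id_2, MIN(sequence_id) AS sequence_id,
       {pivot_cols}
FROM numbered
GROUP BY date, id_1, id_2
ORDER BY date, id_1, id_2
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The number of output columns is fixed at nine here, just as the pivot list (1,...,9) is fixed in the query above; a group with more than three rows would need a longer list.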

We cannot use the same name for more than one column in a table. Hence your desired output is not feasible (i.e. data_1/2/3 as a column name more than once).
If your goal is to have the full sentence for each id in one row, then as an alternative you can consider the query below:
with cte as (
  select "2020-01-01" date, "ABC" id_1, "123" id_2, "2" sequence, "hi" data_1, "nice" data_2, "to" data_3 union all
  select "2020-01-01", "ABC", "123", "3", "meet", "you", "my" union all
  select "2020-01-01", "ABC", "123", "4", "name", "is", "bob" union all
  select "2020-02-01", "DEF", "456", "1", "good", "day", "sir" union all
  select "2020-02-01", "DEF", "456", "3", "how", "are", "you"
)
select date, id_1, id_2, sequence,
  string_agg(concat(data_1, " ", data_2, " ", data_3), " ") over (partition by date, id_1, id_2 order by sequence) str
from cte
qualify row_number() over (partition by date, id_1, id_2 order by sequence desc) = 1
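As a rough way to check the running-concatenation idea locally, the sketch below ports it to SQLite via Python's sqlite3 module. The port is an assumption of this sketch, not part of the answer: GROUP_CONCAT used as a window function replaces STRING_AGG, a subquery with rn = 1 replaces QUALIFY, the table name sample_rows is made up, and sequence is stored as an integer rather than a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample_rows (date TEXT, id_1 TEXT, id_2 TEXT, sequence INTEGER,
                          data_1 TEXT, data_2 TEXT, data_3 TEXT);
INSERT INTO sample_rows VALUES
  ('2020-01-01', 'ABC', '123', 2, 'hi',   'nice', 'to'),
  ('2020-01-01', 'ABC', '123', 3, 'meet', 'you',  'my'),
  ('2020-01-01', 'ABC', '123', 4, 'name', 'is',   'bob'),
  ('2020-02-01', 'DEF', '456', 1, 'good', 'day',  'sir'),
  ('2020-02-01', 'DEF', '456', 3, 'how',  'are',  'you');
""")

query = """
SELECT date, id_1, id_2, sequence, str
FROM (
  SELECT date, id_1, id_2, sequence,
         -- running concatenation in sequence order within each group
         GROUP_CONCAT(data_1 || ' ' || data_2 || ' ' || data_3, ' ')
           OVER (PARTITION BY date, id_1, id_2 ORDER BY sequence) AS str,
         -- rn = 1 marks the last row of the group, which holds the full sentence
         ROW_NUMBER()
           OVER (PARTITION BY date, id_1, id_2 ORDER BY sequence DESC) AS rn
  FROM sample_rows
) AS t
WHERE rn = 1
ORDER BY date
"""

sentences = conn.execute(query).fetchall()
for row in sentences:
    print(row)
```

Note that the sequence value returned is the last one in each group, because the row that is kept is the one where the running string is complete.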

Related

Select earliest date and count rows in table with duplicate IDs

I have a table called table1:
id created_date
1001 2020-06-01
1001 2020-01-01
1001 2020-07-01
1002 2020-02-01
1002 2020-04-01
1003 2020-09-01
I'm trying to write a query that provides me a list of distinct IDs with the earliest created_date they have, along with the count of rows each id has:
id created_date count
1001 2020-01-01 3
1002 2020-02-01 2
1003 2020-09-01 1
I managed to write a window function to grab the earliest date, but I'm having trouble figuring out where to fit the count statement in one:
SELECT
  id,
  created_date
FROM (
  SELECT
    id,
    created_date,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_date) AS row_num
  FROM table1
) AS a
WHERE row_num = 1
You would use aggregation:
select id, min(created_date), count(*)
from table1
group by id;
I find it amusing that you want to use window functions -- which are considered more advanced -- when lowly aggregation suffices.
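A small sketch to compare both routes on the question's data, run on SQLite through Python's sqlite3 module (the choice of SQLite is an assumption of this sketch; the question does not name a database). The second query shows where a COUNT(*) OVER would fit if you wanted to stay with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, created_date TEXT);
INSERT INTO table1 VALUES
  (1001, '2020-06-01'), (1001, '2020-01-01'), (1001, '2020-07-01'),
  (1002, '2020-02-01'), (1002, '2020-04-01'),
  (1003, '2020-09-01');
""")

# Plain aggregation: one row per id with its earliest date and row count.
agg_rows = conn.execute("""
    SELECT id, MIN(created_date) AS created_date, COUNT(*) AS count
    FROM table1
    GROUP BY id
    ORDER BY id
""").fetchall()

# Window-function variant: the count is added next to ROW_NUMBER().
win_rows = conn.execute("""
    SELECT id, created_date, cnt
    FROM (
      SELECT id, created_date,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_date) AS row_num,
             COUNT(*)     OVER (PARTITION BY id) AS cnt
      FROM table1
    ) AS a
    WHERE row_num = 1
    ORDER BY id
""").fetchall()

print(agg_rows)
print(win_rows)
```

Both queries return the same three rows here; the aggregation is shorter, while the window version is useful if you also need other columns from the earliest row.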

How can I return only one record per type?

I have a table with multiple records per one account type and I want to return only one by latest date. I have:
SELECT
id_nbr AS ID,
contact_type AS contype,
last_update AS date
FROM table
WHERE (contact_type = 'AAA' OR contact_type = 'BBB' OR contact_type = 'CCC');
It may return something like this:
ID contype date
111111111 AAA 2020-01-30
111111111 AAA 2019-05-05
111111111 BBB 2020-01-02
111111111 CCC 2020-02-17
Looking at this data, I only want 3 rows because contype has multiple AAA records but I only want the latest date. Something like:
ID contype date
111111111 AAA 2020-01-30
111111111 BBB 2020-01-02
111111111 CCC 2020-02-17
This is obviously very high-level but how can I achieve this? This would help me tremendously. Thanks in advance!!
Use qualify:
SELECT id_nbr AS ID, contact_type AS contype, last_update AS date
FROM table
WHERE contact_type IN ('AAA', 'BBB' , 'CCC')
QUALIFY ROW_NUMBER() OVER (PARTITION BY contact_type ORDER BY last_update DESC) = 1;
This returns the most recent row for each type.
Alternatively, compare each row's date with the maximum date of its group:
-- add a column (temp_tab) holding the max date per (ID, contype), then
-- keep only the rows whose date equals temp_tab
select ID, contype, date from
(select ID, contype, date, max(date) over (partition by ID, contype) as temp_tab from table) t
where date = temp_tab
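Not every database has QUALIFY. As a hedged sketch, the same "latest row per type" filter can be written with a subquery, shown here on SQLite via Python's sqlite3 module. The table name contacts is made up (table is a reserved word), and partitioning by id_nbr as well as contact_type is an assumption of this sketch so that it still works with more than one ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id_nbr INTEGER, contact_type TEXT, last_update TEXT);
INSERT INTO contacts VALUES
  (111111111, 'AAA', '2020-01-30'),
  (111111111, 'AAA', '2019-05-05'),
  (111111111, 'BBB', '2020-01-02'),
  (111111111, 'CCC', '2020-02-17');
""")

query = """
SELECT id, contype, date
FROM (
  SELECT id_nbr AS id, contact_type AS contype, last_update AS date,
         -- rn = 1 is the most recent row for each (id, type) pair
         ROW_NUMBER() OVER (PARTITION BY id_nbr, contact_type
                            ORDER BY last_update DESC) AS rn
  FROM contacts
  WHERE contact_type IN ('AAA', 'BBB', 'CCC')
) AS t
WHERE rn = 1
ORDER BY contype
"""

latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```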

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL before version 8.0 doesn't support it, but there you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
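Here is a minimal sketch of the conditional-aggregation version on the question's data, run on SQLite through Python's sqlite3 module rather than Teradata. As an assumption of this sketch, the dates are stored as ISO strings (YYYY-MM-DD) so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (id INTEGER, creation_date TEXT,
                        completion_date TEXT, difference INTEGER);
INSERT INTO yourtable VALUES
  (123, '2016-05-09', '2016-05-16', 7),
  (123, '2016-05-14', '2016-05-16', 2),
  (456, '2016-04-26', '2016-04-30', 4),
  (456, NULL,         '2016-04-30', NULL),
  (789, '2016-03-25', '2016-03-31', 6),
  (789, '2016-03-01', '2016-03-31', 30);
""")

query = """
SELECT t.*
FROM yourtable t
JOIN (
  SELECT id, MAX(creation_date) AS max_creation_date
  FROM yourtable
  GROUP BY id
  -- drop every id that has at least one missing creation_date
  HAVING COUNT(CASE WHEN creation_date IS NULL THEN 1 END) = 0
) t2 ON t.id = t2.id AND t.creation_date = t2.max_creation_date
ORDER BY t.id
"""

kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

ID 456 disappears entirely because the HAVING clause removes the whole group before the join, not just the row with the null.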

fill in a null cell with cell from previous record

Hi, I am using DB2 SQL to fill in some missing data in the following table:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL NULL NULL
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL NULL NULL
Person 2 has lived in 3 houses, but for the middle address neither the house nor the dates are known. I can't do anything about which house they were in, but I would like to use the To date of the previous address to replace the NULL From date, and the From date of the next address to replace the NULL To date, i.e.:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL 2012-09-27 2004-01-01
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL 2009-08-01 9999-01-01
I understand that if there is no address before a null address, its From date will have to stay null, but if a null address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This type of problem seems to me to be one where set theory is no longer a good solution; however, I am required to find a DB2 solution because that's what my boss uses!
Any pointers/suggestions welcome.
Thanks.
It might look something like this:
select
person,
house,
coalesce(from_date, prev_to_date) from_date,
case when rn = 1 then coalesce (to_date, '9999-01-01')
else coalesce(to_date, next_from_date) end to_date
from
(select person, house, from_date, to_date,
lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
row_number() over (partition by person order by from_date desc nulls last) rn
from temp
) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
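To illustrate the point about needing an ordering column, here is a minimal sketch of the LAG/LEAD idea on SQLite through Python's sqlite3 module (not DB2). The table name residence and the column seq are hypothetical: seq is an explicit per-person ordering added for this sketch, which the question's table does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- seq is a hypothetical column giving the order of addresses per person
CREATE TABLE residence (person INTEGER, seq INTEGER, house INTEGER,
                        from_date TEXT, to_date TEXT);
INSERT INTO residence VALUES
  (1, 1, 586,  '2000-04-16', '2010-12-03'),
  (2, 1, 123,  '2001-01-01', '2012-09-27'),
  (2, 2, NULL, NULL,         NULL),
  (2, 3, 104,  '2004-01-01', '2012-11-24'),
  (3, 1, 987,  '1999-12-31', '2009-08-01'),
  (3, 2, NULL, NULL,         NULL);
""")

query = """
SELECT person, house,
       COALESCE(from_date, prev_to_date) AS from_date,
       -- rn = 1 is the person's last address: an open end becomes 9999-01-01
       CASE WHEN rn = 1 THEN COALESCE(to_date, '9999-01-01')
            ELSE COALESCE(to_date, next_from_date) END AS to_date
FROM (
  SELECT person, seq, house, from_date, to_date,
         LAG(to_date)    OVER (PARTITION BY person ORDER BY seq) AS prev_to_date,
         LEAD(from_date) OVER (PARTITION BY person ORDER BY seq) AS next_from_date,
         ROW_NUMBER()    OVER (PARTITION BY person ORDER BY seq DESC) AS rn
  FROM residence
) AS t
ORDER BY person, seq
"""

filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Ordering by seq instead of from_date sidesteps the NULLS LAST question, since the rows with unknown dates keep their place in the sequence.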
-- Note: this uses SQL Server-style syntax (ISNULL, ORDER BY (SELECT 0)); COALESCE is the
-- standard equivalent of ISNULL. ROW_NUMBER() OVER (ORDER BY (SELECT 0)) has no guaranteed
-- order, so the query relies on the rows being returned in insertion order.
create table Temp
(
  person varchar(2),
  house int,
  from_date date,
  to_date date
);
insert into temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL);
-- A is the current row, BF the row before it, CT the row after it (same person)
select A.person,
  A.house,
  isnull(A.from_date, BF.to_date) From_date,
  isnull(A.to_date, isnull(CT.From_date, '9999-01-01')) To_date
from
((select *, ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
(select *, ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
on A.person = BF.person and
A.rownum = BF.rownum + 1) left join
(select *, ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
on A.person = CT.person and
A.rownum = CT.rownum - 1;

SQL Query for avoiding any repetition for a specific column terms

I am looking to design a query in which I need DISTINCT terms in a column without repetition. I am using the SQL Server 2008 R2 edition.
Here is my sample table:
id bank_code bank_name interest_rate
----------------------------------------------------------
1 123 abc 3.5
2 456 xyz 3.7
3 123 abc 3.4
4 789 pqr 3.3
5 123 abc 3.6
6 456 xyz 3.1
What I want is to sort the table in descending order on the 'interest_rate' column, but without any repetition of the terms in 'bank_code'.
Here is what I want:
id bank_code bank_name interest_rate
----------------------------------------------------------
2 456 xyz 3.7
5 123 abc 3.6
4 789 pqr 3.3
I have been trying the DISTINCT operator but it selects the unique combination of all the columns and not the single column for repetition.
Here is what I am doing, which clearly does not get me what I want:
SELECT DISTINCT TOP 5 [ID], [BANK_CODE]
,[BANK_NAME]
,[INTEREST_RATE]
FROM [SAMPLE]
ORDER BY [INTEREST_RATE] DESC
Is there a way to achieve this?
Any help is appreciated.
;WITH x AS
(
SELECT id,bank_code,bank_name,interest_rate,
rn = ROW_NUMBER() OVER (PARTITION BY bank_code ORDER BY interest_rate DESC)
FROM dbo.[SAMPLE]
)
SELECT id,bank_code,bank_name,interest_rate
FROM x WHERE rn = 1
ORDER BY interest_rate DESC;
Try using analytical functions:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY bank_code ORDER BY interest_rate DESC) Corr
FROM [Sample]
)
SELECT id, bank_code, bank_name, interest_rate
FROM CTE
WHERE Corr = 1
Not sure about the [] syntax, but you probably need something like this:
SELECT min([ID]), [BANK_CODE], [BANK_NAME], max([INTEREST_RATE])
FROM [SAMPLE]
GROUP BY [BANK_CODE], [BANK_NAME]
ORDER BY 4 DESC
How about something like this. It is simple, but will duplicate if you have interest rates that are the same.
select ID, #sample.Bank_code, bank_name, #sample.interest_Rate
from #sample
join
(
SELECT [BANK_CODE], MAX(interest_rate) as interest_Rate
FROM #sample
GROUP BY bank_code
) as groupingtable
on groupingtable.bank_code = #sample.bank_code
and groupingtable.interest_Rate = #sample.interest_rate
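To make the remark about duplicates concrete, here is a small sketch comparing the ROW_NUMBER approach with the join on MAX(interest_rate), run on SQLite through Python's sqlite3 module rather than SQL Server. Two things are assumptions of this sketch: row 7 is a hypothetical extra row (not in the question) added only to create a tie, and id is used as a tie-breaker in the ROW_NUMBER ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (id INTEGER, bank_code INTEGER, bank_name TEXT, interest_rate REAL);
INSERT INTO sample VALUES
  (1, 123, 'abc', 3.5), (2, 456, 'xyz', 3.7), (3, 123, 'abc', 3.4),
  (4, 789, 'pqr', 3.3), (5, 123, 'abc', 3.6), (6, 456, 'xyz', 3.1);
""")

ROW_NUMBER_QUERY = """
SELECT id, bank_code, bank_name, interest_rate
FROM (
  SELECT id, bank_code, bank_name, interest_rate,
         -- id breaks ties so exactly one row per bank_code gets rn = 1
         ROW_NUMBER() OVER (PARTITION BY bank_code
                            ORDER BY interest_rate DESC, id) AS rn
  FROM sample
) AS x
WHERE rn = 1
ORDER BY interest_rate DESC
"""

JOIN_MAX_QUERY = """
SELECT s.id, s.bank_code, s.bank_name, s.interest_rate
FROM sample s
JOIN (
  SELECT bank_code, MAX(interest_rate) AS interest_rate
  FROM sample
  GROUP BY bank_code
) AS g ON g.bank_code = s.bank_code AND g.interest_rate = s.interest_rate
ORDER BY s.interest_rate DESC, s.id
"""

top_per_bank = conn.execute(ROW_NUMBER_QUERY).fetchall()

# Hypothetical extra row that ties with id 4 on interest_rate.
conn.execute("INSERT INTO sample VALUES (7, 789, 'pqr', 3.3)")
with_tie_row_number = conn.execute(ROW_NUMBER_QUERY).fetchall()
with_tie_join = conn.execute(JOIN_MAX_QUERY).fetchall()

print(top_per_bank)
print(with_tie_row_number)
print(with_tie_join)
```

With the tied row present, the ROW_NUMBER query still returns one row per bank_code, while the join returns both tied rows for bank_code 789.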