SQL: Select distinct row values as column headers while keeping individual row values

I have a table that looks like this:
QuestionNum AnswerChoice
1 a
1 a
2 b
2 b
2 a
3 c
3 d
3 c
4 a
4 b
I would like to select the distinct values from the QuestionNum column as column headers and still list each answer choice underneath, so it should look like this:
1 2 3 4
a b c a
a b d b
  a c
I started looking at pivot tables, but the QuestionNum values are not known in advance. Also, I couldn't figure out a way to keep multiple answer rows per question from the original table.

You can do this with conditional aggregation. The challenge is that you need a key, and row_number() provides the key:
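-- examInstanceID is assumed to be a column that defines the order of the answers;
-- the sample table has no such column, so substitute whatever orders your rows.
select max(case when QuestionNum = 1 then AnswerChoice end) as q_1,
       max(case when QuestionNum = 2 then AnswerChoice end) as q_2,
       max(case when QuestionNum = 3 then AnswerChoice end) as q_3,
       max(case when QuestionNum = 4 then AnswerChoice end) as q_4
from (select t.*,
             -- number the answers 1, 2, 3, ... within each question
             row_number() over (partition by QuestionNum order by examInstanceID) as seqnum
      from your_table t   -- "table" is a reserved word; use the real table name
     ) t
group by seqnum   -- one output row per sequence number
order by seqnum;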
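The question says the QuestionNum values are not known in advance, so the column list has to be generated at run time. The sketch below uses Python's standard sqlite3 module as a stand-in for the asker's database. It reads the distinct question numbers first and then builds the same row_number()/conditional-aggregation query. The table name answers and the ordering column id are hypothetical, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

def pivot_answers(conn):
    """Pivot answers so that each distinct QuestionNum becomes a column."""
    # Discover the question numbers at run time, since they are not known in advance.
    questions = [row[0] for row in conn.execute(
        "SELECT DISTINCT QuestionNum FROM answers ORDER BY QuestionNum")]
    # One conditional-aggregation column per question. Values are cast to int
    # before being spliced into the SQL text, which keeps the interpolation safe.
    cols = ",\n               ".join(
        f"MAX(CASE WHEN QuestionNum = {int(q)} THEN AnswerChoice END) AS q_{int(q)}"
        for q in questions)
    sql = f"""
        SELECT {cols}
        FROM (SELECT QuestionNum, AnswerChoice,
                     -- number the answers 1, 2, 3, ... within each question
                     ROW_NUMBER() OVER (PARTITION BY QuestionNum ORDER BY id) AS seqnum
              FROM answers) AS t
        GROUP BY seqnum
        ORDER BY seqnum"""
    return questions, conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical column recording the order in which answers were stored.
conn.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, QuestionNum INTEGER, AnswerChoice TEXT)")
conn.executemany(
    "INSERT INTO answers (QuestionNum, AnswerChoice) VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "a"),
     (3, "c"), (3, "d"), (3, "c"), (4, "a"), (4, "b")])

questions, rows = pivot_answers(conn)
print(questions)  # [1, 2, 3, 4]
for row in rows:
    print(row)
# ('a', 'b', 'c', 'a')
# ('a', 'b', 'd', 'b')
# (None, 'a', 'c', None)
```

Missing answers come back as NULL (None in Python), which is where the blanks in the desired output appear.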

Related

How to append a count number in duplicate values in a column and update in SQL Server?

Currently my table looks like this. I want to append a count number to duplicate Name values within the same InstanceId.
Id InstanceId Name
1 1 DiscoveryInstance
2 1 DiscoveryInstance
3 2 ETLInstance
4 3 DiscoveryInstance
5 3 DiscoveryInstance
6 2 ETLInstance
7 2 ETLInstance
I want the output to be like this:
Id InstanceId Name
1 1 DiscoveryInstance
2 1 DiscoveryInstance_Backup_1
3 2 ETLInstance
4 3 DiscoveryInstance
5 3 DiscoveryInstance_Backup_1
6 2 ETLInstance_Backup_1
7 2 ETLInstance_Backup_2
I don't want to update the first occurrence; numbering should start with the next duplicate value in the column.
How can I update this table to produce this output with a SQL Server query?
EDIT: This solution addresses the ORIGINAL question and its original output. It is no longer valid because you changed your desired output.
You could use rank() and concat in this manner:
with cte as (select id, name, rank() over (partition by name order by id) as name_rank
from my_table
)
select t.id,
case
when c.name_rank = 1 then t.name
else concat(t.name, '_Backup_', c.name_rank - 1)
end name
from my_table t
join cte c
on t.id = c.id
Output:
id name
1 DiscoveryInstance
2 DiscoveryInstance_Backup_1
3 ETLInstance
4 DiscoveryInstance_Backup_2
5 DiscoveryInstance_Backup_3
6 ETLInstance_Backup_1
DB-fiddle found here. I see you updated the question after I posted this answer by adding another column, but that does not look important at the moment.
EDIT
This is an updated answer (thanks Guido) that would address your newly updated output:
with cte as (select id, name, rank() over (partition by name, instanceid order by id) as name_rank
from mytable
)
select t.id,
case
when c.name_rank = 1 then t.name
else concat(t.name, '_Backup_', c.name_rank - 1)
end name
from mytable t
join cte c
on t.id = c.id
Another option is to use row_number(), like this.
This solution uses your new instanceid column to get the correct data:
select t.id,
case when rownumber > 1 then t.Name + '_Backup_' + convert(varchar(10), t.rownumber - 1)
else t.Name
end
from ( select t.id,
t.name,
row_number() over (partition by t.Name, t.instanceid order by t.id) as rownumber
from mytable t
) t
order by t.id
See this DBFiddle
The output is:
id (No column name)
1 DiscoveryInstance
2 DiscoveryInstance_Backup_1
3 ETLInstance
4 DiscoveryInstance
5 DiscoveryInstance_Backup_1
6 ETLInstance_Backup_1
7 ETLInstance_Backup_2
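The answers above produce the new names with a SELECT, but the question asks for an UPDATE. In SQL Server, the row_number() CTE can itself be the target of the UPDATE. SQLite cannot update a CTE, so this sketch first freezes the new names in a temporary table and then applies them by Id. It uses Python's sqlite3 module, and the table name mytable follows the answers.

Computing every name before changing any row matters because Name is part of the partition key. Renaming rows one at a time could change the numbering of rows not yet processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Id INTEGER PRIMARY KEY, InstanceId INTEGER, Name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 1, "DiscoveryInstance"), (2, 1, "DiscoveryInstance"), (3, 2, "ETLInstance"),
    (4, 3, "DiscoveryInstance"), (5, 3, "DiscoveryInstance"),
    (6, 2, "ETLInstance"), (7, 2, "ETLInstance")])

conn.executescript("""
    -- Step 1: compute every new name from the unmodified table and freeze it.
    CREATE TEMP TABLE renames AS
    SELECT Id, Name || '_Backup_' || (rn - 1) AS new_name
    FROM (SELECT Id, Name,
                 ROW_NUMBER() OVER (PARTITION BY Name, InstanceId ORDER BY Id) AS rn
          FROM mytable) AS ranked
    WHERE rn > 1;  -- the first occurrence keeps its original name

    -- Step 2: apply the frozen names by key.
    UPDATE mytable
    SET Name = (SELECT new_name FROM renames WHERE renames.Id = mytable.Id)
    WHERE Id IN (SELECT Id FROM renames);
""")

result = conn.execute("SELECT Id, InstanceId, Name FROM mytable ORDER BY Id").fetchall()
for row in result:
    print(row)
```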

Update table column based on CTE result

I have the code shown below, and I want to update my original table to reflect the results of this query. I want each record's Route_Type column to be updated with the corresponding Route_Type value from the query, matched on the record's Code. For instance, all records with Code = 1 should have Route_Type set to "Other" based on the query.
With Route_Number_CTE (Code,Year_and_Week, Route_Count) As
(
Select
Code, Year_and_Week, Count(Route) AS Route_Count
From
Deliveries
Group by
Code, Year_and_Week
)
select
d.Code,
min(r.Route_Count) As Min_Count,
max(r.Route_Count) As Max_Count,
(case
When max(r.Route_Count) = 1 then 'One'
When max(r.Route_Count) <= 3 AND min(r.Route_Count) > 1 then 'Three or less'
When min(r.Route_Count) > 4 then 'Four or More'
Else 'Other'
End) As Route_Type
From
Deliveries as d
inner join
Route_Number_CTE as r on d.Code = r.Code
Group By
d.Code;
Query results:
Code Min_Count Max_Count Route_Type
----------------------------------------
1 1 4 Other
2 1 2 Three or less
3 3 3 Three or less
Deliveries:
Code Route Route_Type
-------------------------
1 A
1 C
1 D
2 A
2 C
2 B
3 A
3 C
3 D
I think you could use window functions and an updatable CTE. This is simpler, and should be more efficient, as it avoids the GROUP BY and the join back to Deliveries:
with cte as (
    select route_type, max(cnt) over(partition by code) max_cnt
    from (
        -- cnt = number of routes for this row's code and week
        select d.*, count(*) over(partition by code, year_and_week) cnt
        from deliveries d
    ) d
)
-- updating the CTE updates the underlying deliveries rows
update cte
set route_type = case
    when max_cnt = 1 then 'One'
    when max_cnt <= 3 then 'Three or less'
    when max_cnt > 4 then 'Four or more'
    else 'Other'  -- without this, rows matching no branch would be set to NULL
end;
I figured it out by just creating a second CTE from the second Select statement and using an update query with the result.
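The asker's own fix, a second CTE feeding an UPDATE, can be sketched in SQLite through Python's sqlite3 module. This version classifies each Code directly from Route_Number_CTE, because joining back to Deliveries before grouping does not change MIN or MAX. The CASE expression is copied from the question. The Year_and_Week values are made up for illustration, since the sample Deliveries table does not show that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: weeks and routes are invented for illustration.
conn.execute("""CREATE TABLE Deliveries
                (Code INTEGER, Year_and_Week TEXT, Route TEXT, Route_Type TEXT)""")
conn.executemany(
    "INSERT INTO Deliveries (Code, Year_and_Week, Route) VALUES (?, ?, ?)",
    [(1, "2020-01", "A"),
     (1, "2020-02", "A"), (1, "2020-02", "B"), (1, "2020-02", "C"), (1, "2020-02", "D"),
     (2, "2020-01", "A"), (2, "2020-01", "B"),
     (2, "2020-02", "A"), (2, "2020-02", "B"), (2, "2020-02", "C"),
     (3, "2020-01", "A")])

conn.execute("""
    WITH Route_Number_CTE AS (      -- routes per code and week
        SELECT Code, Year_and_Week, COUNT(Route) AS Route_Count
        FROM Deliveries
        GROUP BY Code, Year_and_Week
    ),
    Route_Type_CTE AS (             -- one classification per code
        SELECT Code,
               CASE
                   WHEN MAX(Route_Count) = 1 THEN 'One'
                   WHEN MAX(Route_Count) <= 3 AND MIN(Route_Count) > 1 THEN 'Three or less'
                   WHEN MIN(Route_Count) > 4 THEN 'Four or More'
                   ELSE 'Other'
               END AS Route_Type
        FROM Route_Number_CTE
        GROUP BY Code
    )
    UPDATE Deliveries
    SET Route_Type = (SELECT r.Route_Type FROM Route_Type_CTE AS r
                      WHERE r.Code = Deliveries.Code)
""")

types = conn.execute(
    "SELECT DISTINCT Code, Route_Type FROM Deliveries ORDER BY Code").fetchall()
print(types)  # [(1, 'Other'), (2, 'Three or less'), (3, 'One')]
```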

Easiest way to select distinct with least number of null

I want to create a view over a table that has 500k rows and 10 columns. The table contains duplicate IDs with different amounts of information, because some of the columns are NULL. When there are duplicates, my objective is to keep one row per ID: the one with the fewest NULL values.
Let me explain it with a quick example. I am working with a query similar to this.
CREATE TABLE test (ID INT, b char(1), c char (1), d char(1))
INSERT INTO test(ID,b,c,d) VALUES
(1,NULL,NULL,NULL),
(1,'B', NULL,NULL),
(1,'B','C',NULL),
(1,'B','C','D'),
(2,'E','F',NULL),
(2,'E',NULL,NULL),
(3,NULL,NULL,NULL),
(3,'G',NULL,NULL)
SELECT DISTINCT ID,b,c,d FROM test
DROP TABLE test
The result is
ID b c d
--------------------
1 NULL NULL NULL
1 B NULL NULL
1 B C NULL
1 B C D
2 E F NULL
2 E NULL NULL
3 NULL NULL NULL
3 G NULL NULL
However, the output I want to see is
ID b c d
--------------------
1 B C D
2 E F NULL
3 G NULL NULL
So, for each ID that has duplicates, I want the row with the fewest NULLs. How can I do this?
Thank you very much
If you want the row with the fewest NULLs, you basically count them and sort ascending (this returns a single row for the whole table):
select t.*
from test t
order by ( (case when b is null then 1 else 0 end) +
           (case when c is null then 1 else 0 end) +
           (case when d is null then 1 else 0 end)
         ) asc  -- fewest NULLs first
fetch first 1 row only;
However, if you want one row per ID with a non-NULL value in each column (where available), then @maSTAShuFu's answer is appropriate.
EDIT:
If you want one row per ID, then simply use row_number():
select t.*
from (select t.*,
             row_number() over (partition by ID
                                order by ( (case when b is null then 1 else 0 end) +
                                           (case when c is null then 1 else 0 end) +
                                           (case when d is null then 1 else 0 end)
                                         ) asc  -- fewest NULLs first
                               ) as seqnum
      from test t
     ) t
where seqnum = 1;
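This is a runnable check of the row_number() approach against the question's test data, using Python's sqlite3 module (SQLite 3.25+ assumed). Rows are ordered by NULL count ascending, so the most complete row per ID is kept. If two rows tie on NULL count, which one survives is arbitrary unless another column is added to the ORDER BY. For the view, the same SELECT could be wrapped in CREATE VIEW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INT, b CHAR(1), c CHAR(1), d CHAR(1))")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", [
    (1, None, None, None), (1, "B", None, None), (1, "B", "C", None), (1, "B", "C", "D"),
    (2, "E", "F", None), (2, "E", None, None),
    (3, None, None, None), (3, "G", None, None)])

rows = conn.execute("""
    SELECT ID, b, c, d
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY ID
                     -- fewest NULLs first, so the most complete row gets seqnum = 1
                     ORDER BY (CASE WHEN b IS NULL THEN 1 ELSE 0 END) +
                              (CASE WHEN c IS NULL THEN 1 ELSE 0 END) +
                              (CASE WHEN d IS NULL THEN 1 ELSE 0 END) ASC
                 ) AS seqnum
          FROM test AS t) AS ranked
    WHERE seqnum = 1
    ORDER BY ID
""").fetchall()
print(rows)  # [(1, 'B', 'C', 'D'), (2, 'E', 'F', None), (3, 'G', None, None)]
```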
Using MAX, grouped by ID:
SELECT
ID,
MAX(B) B,
MAX(C) C,
MAX(D) D
FROM test
GROUP BY ID

selecting the highest count for a categorical variable when grouping

I have the following table:
custID Cat
1 A
1 B
1 B
1 B
1 C
2 A
2 A
2 C
3 B
3 C
4 A
4 C
4 C
4 C
What I need is the most efficient way to aggregate by CustID in such a manner that I obtain the most frequent category (cat), the second most frequent and the third. The output of the above should be
custID most freq 2nd most freq 3rd most freq
1 B A C
2 A C Null
3 B C Null
4 C A Null
When there is a tie in the count, I do not really care which comes first. For example, for customer 1 the 2nd and 3rd most frequent categories could be swapped, because each occurs only once.
Any SQL would be fine, preferably Hive SQL.
Thank you
Try grouping twice and using row_number() to order the categories by their count. row_number() is used rather than dense_rank(), so that tied categories (such as A and C for customer 1) still get separate positions; ties are broken arbitrarily. I'm not 100% sure, but I expect it to work in Hive as well.
select custId,
max(case when t.rn = 1 then cat end) as [most freq],
max(case when t.rn = 2 then cat end) as [2nd most freq],
max(case when t.rn = 3 then cat end) as [3rd most freq]
from
(
select custId, cat, row_number() over (partition by custId order by count(*) desc) rn
from your_table
group by custId, cat
) t
group by custId
demo
Based on the comments, I have added a slightly modified solution that conforms to Hive SQL:
select custId,
max(case when t.rn = 1 then cat else null end) as most_freq,
max(case when t.rn = 2 then cat else null end) as 2nd_most_freq,
max(case when t.rn = 3 then cat else null end) as 3rd_most_freq
from
(
select custId, cat, row_number() over (partition by custId order by ct desc) rn
from (
select custId, cat, count(*) ct
from your_table
group by custId, cat
) your_table_with_counts
) t
group by custId
Hive SQL demo
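This sketch runs the two-level grouping against the sample data with Python's sqlite3 module (SQLite 3.25+ assumed). It also shows why row_number() matters. With dense_rank(), A and C for customer 1 would both get rank 2, so MAX would keep only C and the third column would come back NULL. The secondary sort on cat is an addition that only makes tie-breaking repeatable. The column names second_most_freq and third_most_freq are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (custID INTEGER, cat TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [
    (1, "A"), (1, "B"), (1, "B"), (1, "B"), (1, "C"),
    (2, "A"), (2, "A"), (2, "C"),
    (3, "B"), (3, "C"),
    (4, "A"), (4, "C"), (4, "C"), (4, "C")])

rows = conn.execute("""
    SELECT custID,
           MAX(CASE WHEN rn = 1 THEN cat END) AS most_freq,
           MAX(CASE WHEN rn = 2 THEN cat END) AS second_most_freq,
           MAX(CASE WHEN rn = 3 THEN cat END) AS third_most_freq
    FROM (SELECT custID, cat,
                 -- ROW_NUMBER gives tied categories distinct positions;
                 -- the secondary sort on cat only makes the tie order repeatable
                 ROW_NUMBER() OVER (PARTITION BY custID ORDER BY ct DESC, cat) AS rn
          FROM (SELECT custID, cat, COUNT(*) AS ct
                FROM your_table
                GROUP BY custID, cat) AS counts) AS ranked
    GROUP BY custID
    ORDER BY custID
""").fetchall()
for row in rows:
    print(row)
# (1, 'B', 'A', 'C')
# (2, 'A', 'C', None)
# (3, 'B', 'C', None)
# (4, 'C', 'A', None)
```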
SELECT journal, count(*) as frequency
FROM ${hiveconf:TNHIVE}
WHERE journal IS NOT NULL
GROUP BY journal
ORDER BY frequency DESC
LIMIT 5;

Exclude value of a record in a group if another is present

In the example table below, I'm trying to sum amount per id over all marks when mark 'C' doesn't exist within that id. When mark 'C' does exist in an id, I want the sum of amounts for that id, excluding the amount for mark 'A'. My desired output is shown at the bottom. I've considered using partitions and EXISTS, but I'm having trouble conceptualizing the solution. If any of you could take a look and point me in the right direction, it would be greatly appreciated :)
sample table:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 2
4 A 1
4 B 3
5 A 1
5 C 3
6 A 2
6 C 2
desired output:
id sum(amount)
-----------------
1 1
2 5
3 2
4 4
5 3
6 2
select
id,
case
when count(case mark when 'C' then 1 else null end) = 0
then
sum(amount)
else
sum(case when mark <> 'A' then amount else 0 end)
end
from sampletable
group by id
Here is my effort:
select id, sum(amount) from sampletable t where t.mark <> 'A' group by id
having id in (select id from sampletable where mark = 'C')
union
select id, sum(amount) from sampletable t group by id
having id not in (select id from sampletable where mark = 'C')
SELECT
id,
sum(amount) AS sum_amount
FROM atable t
WHERE mark <> 'A'
OR NOT EXISTS (
SELECT *
FROM atable
WHERE id = t.id
AND mark = 'C'
)
GROUP BY
id
;
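Both the NOT EXISTS answer and the conditional-aggregation answer can be checked against the sample data with Python's sqlite3 module, and they should return the same sums. The table name atable follows the last answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER, mark TEXT, amount INTEGER)")
conn.executemany("INSERT INTO atable VALUES (?, ?, ?)", [
    (1, "A", 1), (2, "A", 3), (2, "B", 2), (3, "A", 2), (4, "A", 1),
    (4, "B", 3), (5, "A", 1), (5, "C", 3), (6, "A", 2), (6, "C", 2)])

# NOT EXISTS version: keep a row unless it is an 'A' row in an id that also has a 'C'.
not_exists = conn.execute("""
    SELECT id, SUM(amount) AS sum_amount
    FROM atable AS t
    WHERE mark <> 'A'
       OR NOT EXISTS (SELECT * FROM atable WHERE id = t.id AND mark = 'C')
    GROUP BY id
    ORDER BY id
""").fetchall()

# Conditional-aggregation version: group once, then decide per id.
conditional = conn.execute("""
    SELECT id,
           CASE WHEN COUNT(CASE mark WHEN 'C' THEN 1 END) = 0
                THEN SUM(amount)
                ELSE SUM(CASE WHEN mark <> 'A' THEN amount ELSE 0 END)
           END AS sum_amount
    FROM atable
    GROUP BY id
    ORDER BY id
""").fetchall()

print(not_exists)   # [(1, 1), (2, 5), (3, 2), (4, 4), (5, 3), (6, 2)]
print(conditional)  # [(1, 1), (2, 5), (3, 2), (4, 4), (5, 3), (6, 2)]
```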