SQL Level the data conditionally - sql

I have the following puzzle to solve (an urgent business assignment to be exact)
SQL SERVER 2008
I have a table of this form
ID Market SubMarket Value
1 1 1 3
2 1 2 6
3 1 3 2
4 2 23 1
5 2 24 9
I have specific MarketIDs and every MarketID has specific SubMarketIDs (maximum 5; I know how many for each)
eg MarketID 1 has SubMarketIDs 1,2,3
MarketID 2 has SubMarketIDs 23,24 etc
and each SubMarketID has a variable value
I must transform my data into a fixed table of this type
MarketID SubMarketAvalue SubMarketBValue SubMarketCValue....SubMarketEValue
1 3 6 2 null
2 1 9 null null
SubMarketAValue must contain the value of the smallest SubMarketID
SubMarketBValue must contain the value of the next larger SubMarketID, and so on

You did not specify the RDBMS, but you can use the following in SQL Server 2005+, Oracle and PostgreSQL:
select market,
max(case when rn = 1 then value end) as SubMarketAvalue,
max(case when rn = 2 then value end) as SubMarketBvalue,
max(case when rn = 3 then value end) as SubMarketCvalue,
max(case when rn = 4 then value end) as SubMarketDvalue,
max(case when rn = 5 then value end) as SubMarketEvalue
from
(
select id, market, submarket, value,
row_number() over(partition by market
order by market, submarket) rn
from yourtable
) x
group by market
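To try the query above against the question's sample rows, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for the real server (window functions need SQLite 3.25 or later), and the column types are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for the real server; "yourtable" is the
# answer's placeholder name and the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (id int, market int, submarket int, value int)")
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?)",
    [(1, 1, 1, 3), (2, 1, 2, 6), (3, 1, 3, 2), (4, 2, 23, 1), (5, 2, 24, 9)],
)

query = """
select market,
       max(case when rn = 1 then value end) as SubMarketAvalue,
       max(case when rn = 2 then value end) as SubMarketBvalue,
       max(case when rn = 3 then value end) as SubMarketCvalue,
       max(case when rn = 4 then value end) as SubMarketDvalue,
       max(case when rn = 5 then value end) as SubMarketEvalue
from (
    -- number the submarkets of each market from smallest to largest
    select market, value,
           row_number() over (partition by market order by submarket) as rn
    from yourtable
) x
group by market
order by market
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Slots with no matching submarket come back as None (SQL NULL), because a CASE without an ELSE branch yields NULL and max() ignores NULLs.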

Related

how to convert a table to another in SQL (similar to pivot, but not exactly)

I have a database table that looks like the one below, in SQL Server 2016:
ProjectKey - Type - Percentage
----------------------------------------
40 8 100%
50 6 40%
50 9 60%
60 3 30%
60 8 30%
60 9 40%
(the max rows for the same ProjectKey is 3)
I want to write a query to be able to convert the above table to the following:
ProjectKey - Type1 - Percentage1 - Type2 - Percentage2 - Type3 - Percentage3
-------------------------------------------------------------------------------------
40 8 100% null null null null
50 6 40% 9 60% null null
60 3 30% 8 30% 9 40%
If this can be achieved with a SQL query, that would be great. Can anyone help? Thank you very much!
You can use row_number() and conditional aggregation:
select projectkey,
max(case when seqnum = 1 then type end) as type_1,
max(case when seqnum = 1 then percentage end) as percentage_1,
max(case when seqnum = 2 then type end) as type_2,
max(case when seqnum = 2 then percentage end) as percentage_2,
max(case when seqnum = 3 then type end) as type_3,
max(case when seqnum = 3 then percentage end) as percentage_3
from (select t.*,
row_number() over (partition by projectkey order by type) as seqnum
from t
) t
group by projectkey;
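Writing one pair of CASE expressions per slot gets tedious when the maximum number of rows per key changes. As a rough sketch, the select list can be generated instead. The helper name pivot_query is hypothetical, the query runs in SQLite (3.25+) as a stand-in for SQL Server, and percentages are stored as text here purely for illustration.

```python
import sqlite3

def pivot_query(slots):
    """Build the conditional-aggregation query for `slots` (type, percentage) pairs."""
    cols = []
    for n in range(1, slots + 1):
        cols.append(f"max(case when seqnum = {n} then type end) as type_{n}")
        cols.append(f"max(case when seqnum = {n} then percentage end) as percentage_{n}")
    select_list = ", ".join(cols)
    return f"""
        select projectkey, {select_list}
        from (select t.*,
                     row_number() over (partition by projectkey order by type) as seqnum
              from t) s
        group by projectkey
        order by projectkey
    """

conn = sqlite3.connect(":memory:")
conn.execute("create table t (projectkey int, type int, percentage text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(40, 8, "100%"), (50, 6, "40%"), (50, 9, "60%"),
     (60, 3, "30%"), (60, 8, "30%"), (60, 9, "40%")],
)

result = conn.execute(pivot_query(3)).fetchall()
for row in result:
    print(row)
```

The number of slots still has to be known before the query runs; the generated SQL has a fixed column list, just like the hand-written version.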

(SQL) Per ID, starting from the first row, return all successive rows with a value N greater than the prior returned row

I have the following example dataset:
ID Value Row index
a 4 1
a 7 2
a 12 3
a 12 4
a 13 5
b 1 6
b 2 7
b 3 8
b 4 9
b 5 10
(Row index is for reference purposes only and does not need to exist in the final output.)
I would like to write a SQL script that, starting from the first row per ID and ordering ascending by [Value], returns each next row whose Value is at least N greater than that of the previously returned row. An example of the final table for N = 3 should look like the following:
ID Value Row index
a 4 1
a 7 2
a 12 3
b 1 6
b 4 9
Can this script be written in a vectorised manner? Or must a loop be utilised? Any advice would be greatly appreciated. Thanks!
SQL tables represent unordered sets. There is no definition of "previous" value, unless you have a column that specifies the ordering. With such a column, you can use lag():
select t.*
from (select t.*,
lag(value) over (partition by id order by <ordering column>) as prev_value
from t
) t
where prev_value is null or prev_value <= value - 3;
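To see what this lag() query returns on the question's sample data, here is a small sketch run in SQLite (3.25+) from Python. Using row_index as the ordering column is an assumption, since the query leaves that column unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, value int, row_index int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("a", 4, 1), ("a", 7, 2), ("a", 12, 3), ("a", 12, 4), ("a", 13, 5),
     ("b", 1, 6), ("b", 2, 7), ("b", 3, 8), ("b", 4, 9), ("b", 5, 10)],
)

query = """
select id, value, row_index
from (select t.*,
             lag(value) over (partition by id order by row_index) as prev_value
      from t) s
where prev_value is null or prev_value <= value - 3
order by row_index
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

Because each row is compared with the row immediately before it, and not with the last row that was kept, the row (b, 4, 9) from the desired output is missing: its predecessor is 3, which is only 1 lower.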
EDIT:
I think I misunderstood what you want to do. You seem to want to start with the first row for each id. Then get the next row that is 3 or higher in value. Then hold onto that value and get the next that is 3 or higher than that. And so on.
You can do this in SQL using a recursive CTE:
with ts as (
select distinct t.id, t.value, dense_rank() over (partition by id order by value) as seqnum
from t
),
cte as (
select id, value, value as grp_value, 1 as within_seqnum, seqnum
from ts
where seqnum = 1
union all
select ts.id, ts.value,
(case when ts.value >= cte.grp_value + 3 then ts.value else cte.grp_value end),
(case when ts.value >= cte.grp_value + 3 then 1 else cte.within_seqnum + 1 end),
ts.seqnum
from cte join
ts
on ts.id = cte.id and ts.seqnum = cte.seqnum + 1
)
select *
from cte
where within_seqnum = 1
order by id, value;
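The recursive approach can be checked against the sample data with a sketch like the one below. It runs in SQLite (3.25+) as a stand-in, so the CTE is written with the RECURSIVE keyword, and the threshold is passed as a named parameter :n, which is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, value int)")
conn.executemany(
    "insert into t values (?, ?)",
    [("a", 4), ("a", 7), ("a", 12), ("a", 12), ("a", 13),
     ("b", 1), ("b", 2), ("b", 3), ("b", 4), ("b", 5)],
)

query = """
with recursive ts as (
    -- distinct values per id, numbered in ascending order
    select distinct t.id, t.value,
           dense_rank() over (partition by id order by value) as seqnum
    from t
),
cte as (
    select id, value, value as grp_value, 1 as within_seqnum, seqnum
    from ts
    where seqnum = 1
    union all
    -- carry the last kept value forward until a value at least :n higher appears
    select ts.id, ts.value,
           case when ts.value >= cte.grp_value + :n then ts.value else cte.grp_value end,
           case when ts.value >= cte.grp_value + :n then 1 else cte.within_seqnum + 1 end,
           ts.seqnum
    from cte join ts
      on ts.id = cte.id and ts.seqnum = cte.seqnum + 1
)
select id, value
from cte
where within_seqnum = 1
order by id, value
"""
kept = conn.execute(query, {"n": 3}).fetchall()
print(kept)
```

The recursion walks each id one distinct value at a time, so it behaves like a loop expressed in SQL and may be slow for ids with many distinct values.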

how to set auto increment column value with condition

I have table like this:
value nextValue
1 2
2 3
3 20
20 21
21 22
22 23
23 NULL
Value is ordered ascending; nextValue is the Value of the next row.
The requirement is to start a new group whenever nextValue-value>10, and to count how many values fall into each group.
For example, there should be two groups (1,2,3) and (20,21,22,23), first group count is 3, the second group count is 4.
I'm trying to mark each group with unique number, so I could group by these marked nums
value nextValue mark
1 2 1
2 3 1
3 20 1
20 21 2
21 22 2
22 23 2
23 NULL 2
But I don't know how to write the mark column; I need a variable that auto-increments when nextValue-value>10.
Can I make it happen in Hive? Or there's better solution for the requirement?
If I understand correctly, you can use a cumulative sum. The idea is to set a flag when nextValue - value > 10. This identifies the groups. So, this query adds a group number:
select t.*,
sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc) as mark
from t
order by value;
You might not find this solution satisfying, because the numbering is in descending order. So, a bit more arithmetic fixes that:
select t.*,
(sum(case when nextvalue > value + 10 then 1 else 0 end) over () + 1 -
sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc)
) as mark
from t
order by value;
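As a quick check of the arithmetic, here is a sketch that runs the second query on the question's rows. It uses SQLite (3.25+) from Python as a stand-in for Hive, so dialect details may differ, and the group counts are tallied in Python at the end.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("create table t (value int, nextvalue int)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, 2), (2, 3), (3, 20), (20, 21), (21, 22), (22, 23), (23, None)],
)

query = """
select value,
       (sum(case when nextvalue > value + 10 then 1 else 0 end) over () + 1 -
        sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc)
       ) as mark
from t
order by value
"""
marks = [mark for _, mark in conn.execute(query)]
print(marks)

# Count how many values landed in each group.
group_counts = dict(Counter(marks))
print(group_counts)
```

The row with value 3 carries the flag, and because the running sum is taken in descending order, the flag is counted for rows 3, 2 and 1 but not for 20 to 23. Subtracting from the total plus one turns that into ascending group numbers.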
Calculate the previous value, then set new_group_flag when value - prev_value > 10, then take the cumulative sum of new_group_flag to get the group number (mark). Finally, you can calculate the group count using an analytic function or GROUP BY (my example uses an analytic count to show the full dataset with all intermediate calculations). See the comments in the code.
Demo:
with your_data as (--use your table instead of this
select stack(10, --the number of tuples generated
1 ,
2 ,
3 ,
20 ,
21 ,
22 ,
23 ,
40 ,
41 ,
42
) as value
)
select --4. Calculate group count, etc, etc
value, prev_value, new_group_flag, group_number,
count(*) over(partition by group_number) as group_count
from
(
select --3. Calculate cumulative sum of new group flag to get group number
value, prev_value, new_group_flag,
sum(new_group_flag) over(order by value rows between unbounded preceding and current row)+1 as group_number
from
(
select --2. calculate new_group_flag
value, prev_value, case when value-prev_value >10 then 1 else 0 end as new_group_flag
from
(
select --1 Calculate previous value
value, lag(value) over(order by value) prev_value
from your_data
)s
)s
)s
Result:
value prev_value new_group_flag group_number group_count
1 \N 0 1 3
2 1 0 1 3
3 2 0 1 3
20 3 1 2 4
21 20 0 2 4
22 21 0 2 4
23 22 0 2 4
40 23 1 3 3
41 40 0 3 3
42 41 0 3 3
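If only the per-group counts are needed, the same steps can end in a GROUP BY. A condensed sketch, run in SQLite (3.25+) from Python and not in Hive, with the same ten demo values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_data (value int)")
conn.executemany(
    "insert into your_data values (?)",
    [(v,) for v in (1, 2, 3, 20, 21, 22, 23, 40, 41, 42)],
)

query = """
select group_number, count(*) as group_count
from (
    -- running sum of the flags gives the group number
    select value,
           1 + sum(new_group_flag) over (order by value) as group_number
    from (
        -- flag a row when it is more than 10 above the previous value
        select value,
               case when value - lag(value) over (order by value) > 10
                    then 1 else 0 end as new_group_flag
        from your_data
    ) a
) b
group by group_number
order by group_number
"""
counts = conn.execute(query).fetchall()
print(counts)
```

The first row has no previous value, so the comparison is NULL and the CASE falls through to 0, which keeps it in group 1.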
This works for me
It needs "rows between unbounded preceding and current row" in my case.
select t.*,
sum(case when nextvalue > value + 10 then 1 else 0 end) over (order by value desc rows between unbounded preceding and current row) as mark
from t
order by value;

Sequencing and re-setting in SQL Server 2008

I am actually new to SQL server 2008, and I am trying to sequence and re-set a number in a table. The source is something like:
Row Refrec FLAG
1 5 NULL
2 4 X
3 3 NULL
4 2 NULL
5 1 Y
6 5 A
7 4 B
8 3 NULL
9 2 NULL
10 1 NULL
The result should look like:
Row Refrec FLAG SEQUENCE
1 5 NULL NULL
2 4 X 0
3 3 NULL 1
4 2 NULL 2
5 1 Y 0
6 5 A 0
7 4 B 0
8 3 NULL 1
9 2 NULL 2
10 1 NULL 3
Thanks!
It looks like you want to enumerate the sequence values for NULL values, setting all the other values to 0. I'm not sure why the first value is NULL, but that is easily fixed.
The following may do what you want:
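select t.*,
(case when flag is not null then 0
-- partition by flag as well, so flagged rows do not share a group with NULL rows
else row_number() over (partition by flag, seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from yourtable t -- your table name
) t; -- SQL Server requires an alias on a derived table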
If you really care about the first value:
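select t.*,
(case when row = 1 then NULL
when flag is not null then 0
-- partition by flag as well, so flagged rows do not share a group with NULL rows
else row_number() over (partition by flag, seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from yourtable t -- your table name
) t; -- SQL Server requires an alias on a derived table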
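The grouping trick can be checked on the question's rows with a sketch like this one. It runs in SQLite (3.25+) and not in SQL Server, the table name src and column name row_no are assumptions chosen to avoid keywords, and the window partitions by flag as well as by seqnum - row_no so that flagged rows never share a partition with NULL rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table src (row_no int, refrec int, flag text)")
conn.executemany(
    "insert into src values (?, ?, ?)",
    [(1, 5, None), (2, 4, "X"), (3, 3, None), (4, 2, None), (5, 1, "Y"),
     (6, 5, "A"), (7, 4, "B"), (8, 3, None), (9, 2, None), (10, 1, None)],
)

query = """
select row_no, refrec, flag,
       case when row_no = 1 then null
            when flag is not null then 0
            -- consecutive NULL-flag rows share the same (seqnum - row_no) value
            else row_number() over (partition by flag, seqnum - row_no order by row_no)
       end as sequence
from (select s.*, row_number() over (partition by flag order by row_no) as seqnum
      from src s) t
order by row_no
"""
sequence = [r[3] for r in conn.execute(query)]
print(sequence)
```

Within the NULL-flag rows, seqnum increases by one per row, so seqnum - row_no stays constant across a consecutive run and changes whenever flagged rows interrupt it. That constant is what identifies each run.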

Exclude value of a record in a group if another is present

In the example table below, I'm trying to figure out a way to sum amount over id for all marks where mark 'C' doesn't exist within an id. When mark 'C' does exist in an id, I want the sum of amounts over that id, excluding the amount against mark 'A'. As illustration, my desired output is at the bottom. I've considered using partitions and the EXISTS command, but I'm having trouble conceptualizing the solution. If any of you could take a look and point me in the right direction, it would be greatly appreciated :)
sample table:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 2
4 A 1
4 B 3
5 A 1
5 C 3
6 A 2
6 C 2
desired output:
id sum(amount)
-----------------
1 1
2 5
3 2
4 4
5 3
6 2
select
id,
case
when count(case mark when 'C' then 1 else null end) = 0
then
sum(amount)
else
sum(case when mark <> 'A' then amount else 0 end)
end
from sampletable
group by id
Here is my effort:
select id, sum(amount) from sampletable t where t.mark <> 'A' group by id
having id in (select id from sampletable where mark = 'C')
union
select id, sum(amount) from sampletable t group by id
having id not in (select id from sampletable where mark = 'C')
SELECT
id,
sum(amount) AS sum_amount
FROM atable t
WHERE mark <> 'A'
OR NOT EXISTS (
SELECT *
FROM atable
WHERE id = t.id
AND mark = 'C'
)
GROUP BY
id
;
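To compare the CASE-based and the NOT EXISTS-based queries on the sample table, here is a small sketch using SQLite from Python. The table name atable follows the last answer, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table atable (id int, mark text, amount int)")
conn.executemany(
    "insert into atable values (?, ?, ?)",
    [(1, "A", 1), (2, "A", 3), (2, "B", 2), (3, "A", 2), (4, "A", 1),
     (4, "B", 3), (5, "A", 1), (5, "C", 3), (6, "A", 2), (6, "C", 2)],
)

# Variant 1: decide per group whether a 'C' row exists, then pick the sum.
case_query = """
select id,
       case when count(case mark when 'C' then 1 end) = 0
            then sum(amount)
            else sum(case when mark <> 'A' then amount else 0 end)
       end as sum_amount
from atable
group by id
order by id
"""

# Variant 2: filter out 'A' rows of ids that also have a 'C' row, then sum.
exists_query = """
select id, sum(amount) as sum_amount
from atable t
where mark <> 'A'
   or not exists (select 1 from atable where id = t.id and mark = 'C')
group by id
order by id
"""

case_result = conn.execute(case_query).fetchall()
exists_result = conn.execute(exists_query).fetchall()
print(case_result)
print(exists_result)
```

Both variants produce the desired output for this data. They would differ only for an id whose rows are all removed by the WHERE filter, since the second variant would then return no row for that id at all.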