SQL group table by "leading rows" without pl/sql - sql

I have this table (short example) with two columns
1 a
2 a
3 a3
4 a
5 a
6 a6
7 a
8 a8
9 a
and I would like to group/partition them into groups delimited by those leading "a" rows, ideally by adding another column like this so I can address those groups easily:
1 a 0
2 a 0
3 a3 3
4 a 3
5 a 3
6 a6 6
7 a 6
8 a8 8
9 a 8
The problem is that the setup of the table is dynamic, so I can't use lag or lead functions with static offsets. Any ideas how to do this without PL/SQL in Postgres 9.5?

Assuming the leading part is a single character, the expression right(data, -1) works to extract the group name. Adapt it to your actual prefix.
The solution uses two window functions, which can't be nested. So we need a subquery or a CTE.
SELECT id, data
     , COALESCE(first_value(grp) OVER (PARTITION BY grp_nr ORDER BY id), '0') AS grp
FROM  (
   SELECT *, NULLIF(right(data, -1), '') AS grp
        , count(NULLIF(right(data, -1), '')) OVER (ORDER BY id) AS grp_nr
   FROM   tbl
   ) sub;
Produces your desired result exactly.
NULLIF(right(data, -1), '') to get the effective group name or NULL if none.
count() only counts non-null values, so we get a higher count for every new group in the subquery.
In the outer query, we take the first grp value per grp_nr as group name and default to '0' with COALESCE for the first group without name (which has a NULL as group name so far).
We could use min() or max() as outer window function as well, since there is only one non-null value per partition anyway. first_value() is probably cheapest since the rows are sorted already.
Note that the group name grp is of data type text. You may want to cast it to integer if those are clean (and reliable) integer numbers.
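For example, a minimal sketch combining the min() alternative and the integer cast (assuming every non-empty suffix really is a plain integer):
-- Sketch only: min() replaces first_value(), and the group label is cast to integer.
-- Assumes every non-empty suffix is a clean integer.
SELECT id, data
     , COALESCE(min(grp) OVER (PARTITION BY grp_nr), '0')::int AS grp
FROM  (
   SELECT *, NULLIF(right(data, -1), '') AS grp
        , count(NULLIF(right(data, -1), '')) OVER (ORDER BY id) AS grp_nr
   FROM   tbl
   ) sub
ORDER  BY id;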

This can be achieved by marking rows containing a with one value and all other rows with a different value, then using a cumulative sum to derive the group number. The group number increments whenever a new value is encountered in the val column, and all following rows containing a keep that group number until the next new value appears.
I assume you only need a distinct number for each group and that the exact number doesn't matter.
select id, val, sum(ex) over (order by id) cm_sum
from (select t.*,
             case when val = 'a' then 0 else 1 end ex
      from t) x
The result of the query above, with the data in question, would be:
id val cm_sum
--------------
1 a 0
2 a 0
3 a3 1
4 a 1
5 a 1
6 a6 2
7 a 2
8 a8 3
9 a 3
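If you also want the question's exact labels (the id of each group's leading row, 0 for the rows before the first one), a hedged follow-up on top of cm_sum could look like this:
-- Sketch only: map cm_sum to the id of the group's leading row (0 for the first, unnamed group).
select id, val,
       case when cm_sum = 0 then 0
            else first_value(id) over (partition by cm_sum order by id)
       end as grp
from (select id, val, sum(ex) over (order by id) as cm_sum
      from (select t.*,
                   case when val = 'a' then 0 else 1 end as ex
            from t) x
     ) y
order by id;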

With the given data, you can use a cumulative max:
select . . .,
       coalesce(max(substr(col2, 2)) over (order by col1), '0')
If you don't strictly want the maximum, then it gets a bit more difficult. The ANSI solution is to use the IGNORE NULLS option on LAG(). However, Postgres does not (yet) support that. An alternative is:
select . . ., coalesce(substr(reft.col2, 2), '0')
from (select . . .,
             max(case when col2 like 'a_%' then col1 end) over (order by col1) as ref_col1
      from t
     ) tt join
     t reft
     on tt.ref_col1 = reft.col1
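A minimal runnable version of that sketch, written against the question's table tbl(id, data); the left join and the text default '0' are assumptions to keep the rows before the first prefixed row, not the exact original code:
-- Sketch only: emulate LAG(... IGNORE NULLS) with a cumulative max of the marker row's id.
select tt.id, tt.data,
       coalesce(substr(reft.data, 2), '0') as grp
from (select t.*,
             max(case when t.data like 'a_%' then t.id end) over (order by t.id) as ref_id
      from tbl t
     ) tt
left join tbl reft on reft.id = tt.ref_id
order by tt.id;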

You can also try this:
with mytable as (
    select split_part(t, ' ', 1)::integer id,
           split_part(t, ' ', 2) myvalue
    from (select unnest(string_to_array($$1 a;2 a;3 a3;4 a;5 a;6 a6;7 a;8 a8;9 a$$, ';')) t) a
)
select id, myvalue, myresult
from mytable
join (
    select COALESCE(NULLIF(substr(myvalue, 2), ''), '0') myresult,
           idmin id_down,
           COALESCE(lead(idmin) over (order by myvalue), 999999999999) id_up
    from (
        select myvalue, min(id) idmin
        from mytable
        group by 1
    ) a
) b
on id between id_down and id_up - 1

Related

Joining next Sequential Row

I am planning an SQL statement right now and would need someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
    id INT IDENTITY(1, 1) PRIMARY KEY,
    stat INT NOT NULL,
    period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2, so it should join to the row with an id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
select sum(case when seqnum = num then stat else - stat end) / (max(num) - 1)
from (select period, stat, row_number() over (order by period) as seqnum,
             count(*) over () as num
      from tbstats
     ) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
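A hedged sketch of that lead() variant (SQL Server 2012+ or Postgres); the ABS and the 1.0 multiplier are assumptions to match the question's positive gaps and to avoid integer division:
-- Sketch only: mean of the absolute gaps between consecutive stats.
select avg(abs(gap) * 1.0) as mean_gap
from (select stat - lead(stat) over (order by period) as gap
      from tbstats
     ) t
where gap is not null;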
You can also achieve this by using a join:
SELECT t1.period,
       t1.stat,
       t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
       ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select x.id thisStatId,
       LAG(x.id) OVER (ORDER BY x.id) lastStatId,
       x.stat thisStatValue,
       LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
       x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

How do I aggregate numbers from a string column in SQL

I am dealing with a poorly designed database column which has values like this
ID cid Score
1 1 3 out of 3
2 1 1 out of 5
3 2 3 out of 6
4 3 7 out of 10
I want the aggregate sum and percentage of the Score column, grouped on cid, like this:
cid sum percentage
1 4 out of 8 50
2 3 out of 6 50
3 7 out of 10 70
How do I do this?
You can try it this way:
select
    t.cid
  , cast(sum(s.a) as varchar(5)) +
    ' out of ' +
    cast(sum(s.b) as varchar(5)) as sum
  , ((cast(sum(s.a) as decimal)) / sum(s.b)) * 100 as percentage
from MyTable t
inner join
    (select
         id
       , cast(substring(score, 0, 2) as int) a
       , cast(substring(score, charindex('out of', score) + 7, len(score)) as int) b
     from MyTable
    ) s on s.id = t.id
group by t.cid
Redesign the table, but on-the-fly as a CTE. Here's a solution that's not as short as you could make it, but that takes advantage of the handy SQL Server function PARSENAME. You may need to tweak the percentage calculation if you want to truncate rather than round, or if you want it to be a decimal value, not an int.
In this or most any solution, you have to count on the column values for Score to be in the very specific format you show. If you have the slightest doubt, you should run some other checks so you don't miss or misinterpret anything.
with
P(ID, cid, Score2Parse) as (
    select
        ID,
        cid,
        replace(Score, space(1), '.')
    from scores
),
S(ID, cid, pts, tot) as (
    select
        ID,
        cid,
        cast(parsename(Score2Parse, 4) as int),
        cast(parsename(Score2Parse, 1) as int)
    from P
)
select
    cid, cast(round(100e0 * sum(pts) / sum(tot), 0) as int) as percentage
from S
group by cid;
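If you also want the "x out of y" sum column from the question, the final SELECT above could be extended like this (a hedged sketch; the string concatenation assumes SQL Server):
-- Sketch only: replaces the final SELECT of the statement above.
select
    cid,
    cast(sum(pts) as varchar(10)) + ' out of ' + cast(sum(tot) as varchar(10)) as [sum],
    cast(round(100e0 * sum(pts) / sum(tot), 0) as int) as percentage
from S
group by cid;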

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza, if it makes any difference. This one is a real head-scratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
    select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
    select 6 union all select 7 union all select 8 union all select 9
),
randoms as (
    select n, row_number() over (order by random()) as pos
    from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
      from (select distinct hh
            from t
           ) t cross join
           randoms
     ) hnum join
     t
     on t.hh = hnum.hh and
        t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
I am not an expert on SQL, but I would probably do something like this:
Initialize a counter CNT=1
Create a table such that you sample 1 row randomly from each group and a count of null RN, say C_NULL_RN.
With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN
Increment CNT and go to step 2
Well, I couldn't get a slick solution, so I did a hack:
Created a new integer field called rand_inst.
Assign a random number to each empty slot.
Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
Update the table to assign a different random number anywhere that rand_inst>1.
Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;

Grouping values based on sequence in SQL

Is there a way, using just select statements, to create a table that contains in one column the range of the repeating values, like in this example?
Example:
from the input table:
id value:
1 25
2 25
3 24
4 25
5 25
6 25
7 28
8 28
9 11
should result:
range value
1-2 25
3-3 24
4-6 25
7-8 28
9-9 11
Note: the id values in the input table are always in order, and the difference between two consecutive ids is always exactly 1.
You want to find sequences of consecutive values. Here is one approach, using window functions:
select min(id), max(id), value
from (select id, value,
             row_number() over (order by id) as rownum,
             row_number() over (partition by value order by id) as seqnum
      from t
     ) t
group by (rownum - seqnum), value;
This takes into account that a value might appear in different places among the rows. The idea is simple: rownum is a sequential number, and seqnum is a sequential number that increments within a given value. The difference between the two is constant for consecutive rows with the same value.
Let me add, if you actually want the expression as "1-2", we need more information. Assuming the id is a character string, one of the following would work, depending on the database:
select min(id)+'-'+max(id), . . .
select concat(min(id), '-', max(id)), . . .
select min(id)||'-'||max(id), . . .
If the id is an integer (as I suspect), then you need to replace the ids in the above expressions with cast(id as varchar(32)), except in Oracle, where you would use cast(id as varchar2(32)).
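For instance, a hedged Postgres-flavored combination of the pieces above, producing the question's range column:
-- Sketch only: gaps-and-islands query plus the concatenated "1-2" style range.
select cast(min(id) as varchar(32)) || '-' || cast(max(id) as varchar(32)) as id_range,
       value
from (select id, value,
             row_number() over (order by id) as rownum,
             row_number() over (partition by value order by id) as seqnum
      from t
     ) t
group by (rownum - seqnum), value
order by min(id);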
SELECT CONCAT(MIN(id), '-', MAX(id)) AS id_range, value
FROM input_table
GROUP BY value
Maybe this:
SELECT MIN(ID), MAX(ID), VALUE FROM TABLE GROUP BY VALUE

Sort data row in sql

Please help me. I have columns from more than one table, and the data type of all these columns is integer.
I want to sort the values within each data row (not the columns, and not with order by), except the primary key column.
For example:
column1(pk) column2 column3 column4 column5
1 6 5 3 1
2 10 2 3 1
3 1 2 4 3
How do I get this result?
column1(pk) column2 column3 column4 column5
1 1 3 5 6
2 1 2 3 10
3 1 2 3 4
Please help me quickly. Is it possible or impossible?
If it is impossible, how could I get a similar result regardless of sorting?
What database are you using? The capabilities of the database are important. Second, this suggests a data structure issue. Things that need to be sorted would normally be separate entities . . . that is, separate rows in a table. The rest of this post answers the question.
If the database supports pivot/unpivot you can do the following:
(1) Unpivot the data to get it into the format id, column name, value.
(2) Use row_number() to assign a new column, based on the ordering of the values.
(3) Use the row_number() to create a new varchar column name.
(4) Pivot the data again using the new column.
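If your database happens to be SQL Server, a hedged sketch of those four steps might look like this (the table name t and the columns column1 .. column5 are assumed from the example):
-- Sketch only: unpivot, number the values per row, then pivot back using the new name.
WITH unpvt AS (
    SELECT column1,
           val,
           'column' + CAST(ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY val) + 1 AS varchar(10)) AS new_col
    FROM t
    UNPIVOT (val FOR old_col IN (column2, column3, column4, column5)) AS u
)
SELECT column1, column2, column3, column4, column5
FROM unpvt
PIVOT (MAX(val) FOR new_col IN (column2, column3, column4, column5)) AS p;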
You can do something similar if this functionality is not available.
First, change the data to rows:
(select id, 'col1', col1 as val from t) union all
(select id, 'col2', col2 from t) union all
. . .
Call this byrow. The following query appends a row number:
select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
Put this into a subquery and pivot the data back. The final solution looks like:
with byrow as (<the big union all query>)
select id,
       max(case when seqnum = 1 then val end) as col1,
       max(case when seqnum = 2 then val end) as col2,
       ...
from (select br.*, row_number() over (partition by id order by val) as seqnum
      from byrow br
     ) br
group by id
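For the example's five columns, a hedged, fully written-out version of the same approach (again assuming a table t with columns column1 .. column5, where column1 is the key):
-- Sketch only: the union-all unpivot plus the conditional re-pivot, spelled out.
with byrow as (
    select column1 as id, 'col1' as col, column2 as val from t
    union all
    select column1, 'col2', column3 from t
    union all
    select column1, 'col3', column4 from t
    union all
    select column1, 'col4', column5 from t
)
select id as column1,
       max(case when seqnum = 1 then val end) as column2,
       max(case when seqnum = 2 then val end) as column3,
       max(case when seqnum = 3 then val end) as column4,
       max(case when seqnum = 4 then val end) as column5
from (select br.*, row_number() over (partition by id order by val) as seqnum
      from byrow br
     ) br
group by id;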
You can use the PIVOT function of SQL Server to convert rows into columns, apply the sorting there, and then convert the columns back into rows using UNPIVOT.
Here is a good example using PIVOT; you should be able to adapt it to meet your needs:
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx