determining histogram bin size - sql

I'm looking to create a histogram in SQL (which in itself isn't too tricky), but what I need is a way of choosing the bin boundaries so that each bin / band contains the same proportion of the data.
For example, given the sample data below (the Value column) and a target of 5 bins, I know that I can work out the bin width with something like
(MAX(Value) - MIN(Value)) / numberofsteps
which gives the groups shown in the band 1 column.
However, what I want is for the bands to be calculated so that each band accounts for (100 / n) % of the total, where n is the number of bands (so in this case each of the 5 bands would represent 20% of the total data). That is what the band 2 column shows.
Value | band 1 | band 2
1 | 1 to 2 | 0 to 1
1 | 1 to 2 | 0 to 1
1 | 1 to 2 | 0 to 1
1 | 1 to 2 | 0 to 1
2 | 1 to 2 | 2 to 3
2 | 1 to 2 | 2 to 3
3 | 1 to 2 | 2 to 3
3 | 1 to 2 | 2 to 3
4 | 3 to 4 | 4 to 6
4 | 3 to 4 | 4 to 6
5 | 5 to 6 | 4 to 6
6 | 5 to 6 | 4 to 6
7 | 7 to 8 | 7 to 8
8 | 7 to 8 | 7 to 8
8 | 7 to 8 | 7 to 8
8 | 7 to 8 | 7 to 8
9 | 9 to 10 | 9 to 10
10 | 9 to 10 | 9 to 10
10 | 9 to 10 | 9 to 10
10 | 9 to 10 | 9 to 10
Is there a way to do this in SQL (I'm using SQL Server 2005, if that helps)? Ideally it would not need a UDF, and it would let me easily change the number of bins (if that's not asking the impossible!).
Thanks

To divide into bins you can use the ntile function.
WITH Vals AS
(
    SELECT 1 AS value UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 2
    UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 4
    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 8
    UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 10 UNION ALL SELECT 10
), TiledVals AS
(
    SELECT value, NTILE(5) OVER (ORDER BY value) AS BinNumber
    FROM Vals
)
SELECT value, BinNumber,
    MIN(value) OVER (PARTITION BY BinNumber) AS StartBin,
    MAX(value) OVER (PARTITION BY BinNumber) AS EndBin
FROM TiledVals
Gives
value BinNumber StartBin EndBin
----------- -------------------- ----------- -----------
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
2 2 2 3
2 2 2 3
3 2 2 3
3 2 2 3
4 3 4 6
4 3 4 6
5 3 4 6
6 3 4 6
7 4 7 8
8 4 7 8
8 4 7 8
8 4 7 8
9 5 9 10
10 5 9 10
10 5 9 10
10 5 9 10
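To make the bucketing rule explicit, here is a minimal Python sketch of how NTILE distributes rows, under the assumption that the values fit in memory. The helper names `ntile` and `bin_ranges` are made up for this illustration and are not part of any library. Note that NTILE splits by row count, so with a different data set equal values can end up in two adjacent bins.

```python
def ntile(values, n):
    """Assign bucket numbers 1..n to the sorted values the way SQL NTILE(n)
    does: bucket sizes differ by at most one, larger buckets come first."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n)
    result = []
    index = 0
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        for value in ordered[index:index + size]:
            result.append((value, bucket))
        index += size
    return result


def bin_ranges(tiled):
    """Return {bucket: (min_value, max_value)} for (value, bucket) rows."""
    ranges = {}
    for value, bucket in tiled:
        low, high = ranges.get(bucket, (value, value))
        ranges[bucket] = (min(low, value), max(high, value))
    return ranges


values = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 8, 8, 9, 10, 10, 10]
ranges = bin_ranges(ntile(values, 5))
print(ranges)  # {1: (1, 1), 2: (2, 3), 3: (4, 6), 4: (7, 8), 5: (9, 10)}
```

Changing the second argument of `ntile` changes the number of bins, which mirrors changing the argument of NTILE in the query.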

Related

RANK data by value in the column

I'd like to divide the data into separate groups (chunks) based on the value in a column. Whenever the value rises above a certain threshold, the number in the "group" column should increase by 1.
This would be easy to achieve in MySQL with user variables, by doing CASE WHEN @val > 30 THEN @row_no + 1 ELSE @row_no END. However, I am using Amazon Redshift, where this is not allowed.
Sample fiddle: http://sqlfiddle.com/#!15/00b3aa/6
Suggested output:
ID | Value | Group
1 | 11 | 1
2 | 11 | 1
3 | 22 | 1
4 | 11 | 1
5 | 35 | 2
6 | 11 | 2
7 | 11 | 2
8 | 11 | 2
9 | 66 | 3
10 | 11 | 3
A cumulative sum should do what you want (add 1 to the result if the groups should start at 1):
SELECT *, sum((val>=30)::INTEGER) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM mydata ORDER BY id;
id | val | sum
----+-----+-----
1 | 11 | 0
2 | 11 | 0
3 | 22 | 0
4 | 11 | 0
5 | 35 | 1
6 | 11 | 1
7 | 11 | 1
8 | 11 | 1
9 | 66 | 2
10 | 11 | 2
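The same running-total idea can be sketched in Python. This is only an illustration of the logic, not Redshift code; the names `rows`, `THRESHOLD` and `groups_from_one` are assumptions made for the example.

```python
from itertools import accumulate

# (id, val) rows from the question, already ordered by id.
rows = [(1, 11), (2, 11), (3, 22), (4, 11), (5, 35),
        (6, 11), (7, 11), (8, 11), (9, 66), (10, 11)]
THRESHOLD = 30

# 1 for each row that reaches the threshold, 0 otherwise.
flags = [1 if val >= THRESHOLD else 0 for _, val in rows]

# Running total of the flags: each flagged row starts a new group.
groups = list(accumulate(flags))

# Shift by one so that numbering starts at 1, as in the suggested output.
groups_from_one = [g + 1 for g in groups]

print(groups)           # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(groups_from_one)  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
```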

get data from same table in sql using join

I have a table [dbo].[UserImages] where users upload their photos every 6 days. User 3 has 18 records in total: 9 for day 1 and 9 for day 6. The table has 4 columns:
[Id, UserId, Image, Day]
Id UserId Image Day
1 3 3_20200408_1.png 1
2 3 3_20200408_2.png 1
3 3 3_20200408_3.png 1
4 3 3_20200408_4.png 1
5 3 3_20200408_5.png 1
6 3 3_20200408_6.png 1
7 3 3_20200408_7.png 1
8 3 3_20200408_8.png 1
9 3 3_20200408_9.png 1
10 3 3_20200410_1.png 6
11 3 3_20200410_2.png 6
12 3 3_20200410_3.png 6
13 3 3_20200410_4.png 6
14 3 3_20200410_5.png 6
15 3 3_20200410_6.png 6
16 3 3_20200410_7.png 6
17 3 3_20200410_8.png 6
18 3 3_20200410_9.png 6
I need output like this:
ImgCount UserId ImageDay1 ImageDay6
1 3 3_20200408_1.png 3_20200408_1.png
2 3 3_20200408_2.png 3_20200408_2.png
3 3 3_20200408_3.png 3_20200408_3.png
4 3 3_20200408_4.png 3_20200408_4.png
5 3 3_20200408_5.png 3_20200408_5.png
6 3 3_20200408_6.png 3_20200408_6.png
7 3 3_20200408_7.png 3_20200408_7.png
8 3 3_20200408_8.png 3_20200408_8.png
9 3 3_20200408_9.png 3_20200408_9.png
How can I do this?
You can use row_number() and conditional aggregation:
select
    imgCount,
    userId,
    max(case when day = 1 then image end) ImageDay1,
    max(case when day = 6 then image end) ImageDay6
from (
    select t.*, row_number() over(partition by userId, day order by image) imgCount
    from mytable t
    where day in (1, 6)
) t
group by userId, imgCount
order by imgCount
Demo on DB Fiddle:
ImgCount | userId | ImageDay1 | ImageDay6
:------- | -----: | :--------------- | :---------------
1 | 3 | 3_20200408_1.png | 3_20200410_1.png
2 | 3 | 3_20200408_2.png | 3_20200410_2.png
3 | 3 | 3_20200408_3.png | 3_20200410_3.png
4 | 3 | 3_20200408_4.png | 3_20200410_4.png
5 | 3 | 3_20200408_5.png | 3_20200410_5.png
6 | 3 | 3_20200408_6.png | 3_20200410_6.png
7 | 3 | 3_20200408_7.png | 3_20200410_7.png
8 | 3 | 3_20200408_8.png | 3_20200410_8.png
9 | 3 | 3_20200408_9.png | 3_20200410_9.png
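As a rough illustration of what the query does (number the images within each user and day, then line up equal numbers side by side), here is a small Python sketch. The function name `pair_by_day` and the in-memory `rows` list are assumptions for the example; the day 6 file names follow the demo output above.

```python
from collections import defaultdict

# (UserId, Image, Day) rows; built programmatically to keep the example short.
rows = ([(3, "3_20200408_%d.png" % i, 1) for i in range(1, 10)] +
        [(3, "3_20200410_%d.png" % i, 6) for i in range(1, 10)])


def pair_by_day(rows, first_day=1, second_day=6):
    """Emulate row_number() per (user, day) followed by a pivot on day."""
    by_user_day = defaultdict(list)
    for user_id, image, day in rows:
        if day in (first_day, second_day):
            by_user_day[(user_id, day)].append(image)

    paired = defaultdict(dict)
    for (user_id, day), images in by_user_day.items():
        # enumerate over the sorted images plays the role of row_number().
        for img_count, image in enumerate(sorted(images), start=1):
            paired[(img_count, user_id)][day] = image

    return [(img_count, user_id,
             paired[(img_count, user_id)].get(first_day),
             paired[(img_count, user_id)].get(second_day))
            for img_count, user_id in sorted(paired)]


result = pair_by_day(rows)
print(result[0])  # (1, 3, '3_20200408_1.png', '3_20200410_1.png')
```

If one day has fewer images than the other, the missing side comes back as None, which corresponds to NULL in the SQL version.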

Oracle SQL use previous column value to lookup next row

What I currently have:
ID FROM_REF TO_REF
--- -------- ----
1 1 10
1 2 3
1 3 4
1 4 5
1 5 6
1 6 7
1 7 9
1 8 11
1 9 8
1 10 2
What I need is a SORT column that I can sort by later on:
ID FROM_REF TO_REF SORT
--- -------- ---- ----
1 1 10 1
1 10 2 2
1 2 3 3
1 3 4 4
1 4 5 5
1 5 6 6
1 6 7 7
1 7 9 8
1 9 8 9
1 8 11 10
NOTE: The TO_REF column indicates the next FROM_REF.
How do I write SQL that produces the SORT column?
Please help.
You can use a recursive CTE.
WITH X (ID, FROM_REF, TO_REF) AS
(
    SELECT ID, FROM_REF, TO_REF
    FROM tbl
    WHERE FROM_REF = 1
    UNION ALL
    SELECT tbl.ID, tbl.FROM_REF, tbl.TO_REF
    FROM tbl
    JOIN X
    ON tbl.ID = X.ID
    AND tbl.FROM_REF = X.TO_REF
)
SELECT ID, FROM_REF, TO_REF
FROM X
ID | FROM_REF | TO_REF
-: | -------: | -----:
1 | 1 | 10
1 | 10 | 2
1 | 2 | 3
1 | 3 | 4
1 | 4 | 5
1 | 5 | 6
1 | 6 | 7
1 | 7 | 9
1 | 9 | 8
1 | 8 | 11
A simple hierarchical query, I presume.
with test (from_ref, to_ref) as
  (select 1, 10 from dual union
   select 2, 3 from dual union
   select 3, 4 from dual union
   select 4, 5 from dual union
   select 5, 6 from dual union
   select 6, 7 from dual union
   select 7, 9 from dual union
   select 8, 11 from dual union
   select 9, 8 from dual union
   select 10, 2 from dual
  )
select from_ref, to_ref, level rn
from test
connect by from_ref = prior to_ref
start with from_ref = (select min(from_ref) from test);
FROM_REF TO_REF RN
---------- ---------- ----------
1 10 1
10 2 2
2 3 3
3 4 4
4 5 5
5 6 6
6 7 7
7 9 8
9 8 9
8 11 10
10 rows selected.
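Both answers walk the chain from the starting row by following TO_REF to the next FROM_REF. A minimal Python sketch of that walk is below, assuming each FROM_REF appears only once per ID; the function name `order_chain` is made up for this example.

```python
def order_chain(edges, start):
    """Follow FROM_REF -> TO_REF links from `start` and number the steps.

    Assumes every FROM_REF occurs at most once. The `seen` set stops the
    walk if the data contains a cycle.
    """
    next_ref = dict(edges)
    ordered = []
    seen = set()
    current = start
    while current in next_ref and current not in seen:
        seen.add(current)
        # The third element is the position in the chain (the SORT column).
        ordered.append((current, next_ref[current], len(ordered) + 1))
        current = next_ref[current]
    return ordered


edges = [(1, 10), (2, 3), (3, 4), (4, 5), (5, 6),
         (6, 7), (7, 9), (8, 11), (9, 8), (10, 2)]
chain = order_chain(edges, start=1)
for row in chain:
    print(row)
```

The third element of each tuple corresponds to the SORT column from the question and to LEVEL in the hierarchical query.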

SAS Create column of Re-indexed month values based on macro value 12 months back

I have some code that I update each month for work. It calculates a number of statistics over the last 12 months of shipment data for a number of different shipment groups, e.g. plane, train, truck, ship etc.
In short, I have a column named shipment date, which spans around 100 thousand rows and repeats the values 1-12 specifying the month, and another column named shipment count, which gives the number of shipments on that date. It looks something like this:
data shipments;
input Shipment_Month Shipment_Count;
datalines;
1 2
2 3
3 5
4 6
5 7
6 9
7 10
8 11
9 12
10 11
11 8
12 7
1 .
2 .
3 .
4 .
. .
. .
. .
. .
. .
Each time I run the code, two macro variables update to give me the end of last month and the beginning of the month 12 months ago (together spanning 12 months). What I'd like to do is create a column called Shipment_reindex that re-indexes the values 1-12 from the Shipment_Month column, starting from the month of the macro variable that gives me 12 months back. For example, if month(&Back_12_Months) = 5, then month 5 is re-indexed to 1 and so on: 5=1, 6=2, 7=3, 8=4, 9=5, 10=6, 11=7, 12=8, 1=9, 2=10, 3=11, 4=12.
Shipment_Month Shipment_Count Shipment_reindex
1 2 9
2 3 10
3 5 11
4 6 12
5 7 1
6 9 2
7 10 3
8 11 4
9 12 5
10 11 6
11 8 7
12 7 8
1 . 9
2 . 10
3 . 11
4 . 12
5 . 1
. .
I can think of the hard way to do it:
if Shipment_Month=5 then Shipment_reindex=1;
else if Shipment_Month=6 then Shipment_reindex=2;
....
But I'd like the column to update automatically when I run the code, instead of me constantly having to change the values. I've been messing around with arrays, but am getting nowhere fast.
Please help me out!
Thanks,
- Keith
The problem can be solved with simple arithmetic.
You can change the month from which the re-indexing starts by changing the value assigned to Back_12_Months (5 in this example).
data shipments;
input Shipment_Month Shipment_Count;
datalines;
1 2
2 3
3 5
4 6
5 7
6 9
7 10
8 11
9 12
10 11
11 8
12 7
1 .
2 .
3 .
4 .
5 .
6 .
7 .
;
run;
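/* Set the macro variable before the data step: macro references are   */
/* resolved when the step is compiled, so a call symputx inside the    */
/* same step would run too late for &Back_12_Months. to pick it up.    */
%let Back_12_Months = 5;
data shipments1;
set shipments;
if Shipment_Month >= &Back_12_Months.
then Shipment_reindex = Shipment_Month+1-&Back_12_Months.;
else Shipment_reindex = Shipment_Month+13-&Back_12_Months.;
run;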
My Output:
Shipment_Month | Shipment_Count | Shipment_reindex
1 | 2 | 9
2 | 3 | 10
3 | 5 | 11
4 | 6 | 12
5 | 7 | 1
6 | 9 | 2
7 | 10 | 3
8 | 11 | 4
9 | 12 | 5
10 | 11 | 6
11 | 8 | 7
12 | 7 | 8
1 | . | 9
2 | . | 10
3 | . | 11
4 | . | 12
5 | . | 1
6 | . | 2
7 | . | 3
Let me know in case of any queries.
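The two branches of the IF statement amount to a single modular-arithmetic formula. The Python sketch below shows that equivalence; it is an illustration only, and `reindex_month` is a name made up for the example.

```python
def reindex_month(month, start_month):
    """Map calendar month 1..12 to 1..12 so that start_month becomes 1."""
    return (month - start_month) % 12 + 1


# With start_month = 5: 5->1, 6->2, ..., 12->8, 1->9, ..., 4->12.
mapping = {month: reindex_month(month, 5) for month in range(1, 13)}
print(mapping)
```

In SAS the same expression could be written with the MOD function, adding 12 before taking the remainder so that the argument never goes negative.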

Select pair where value of other column is equal

How do I select every pair (number, number) where two different numbers share a tabid in the following table (e.g. numbers 7 and 11 both have tabid 2)?
tabid | number
---------+--------
1 | 6
1 | 6
2 | 7
3 | 8
4 | 8
5 | 10
5 | 11
6 | 12
6 | 11
5 | 6
4 | 7
3 | 8
2 | 11
The result of this should be:
number | number
---------+--------
7 | 11
7 | 8
10 | 11
11 | 12
6 | 10
6 | 11
Is this what you're looking for?
select
t1.number, t2.number
from t t1, t t2
where t1.tabid = t2.tabid
and t1.number < t2.number;
produces:
NUMBER NUMBER
---------- ----------
6 10
6 11
7 8
7 11
10 11
11 12
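The self join pairs up numbers that sit under the same tabid and keeps only the pairs where the first number is smaller. Here is a short Python sketch of the same idea, with the pairs collected in a set so that a pair shared by several tabids is reported once; the function name `pairs_sharing_tabid` is an assumption for this example.

```python
from collections import defaultdict
from itertools import combinations

# (tabid, number) rows from the question.
rows = [(1, 6), (1, 6), (2, 7), (3, 8), (4, 8), (5, 10), (5, 11),
        (6, 12), (6, 11), (5, 6), (4, 7), (3, 8), (2, 11)]


def pairs_sharing_tabid(rows):
    """Return sorted (smaller, larger) pairs of distinct numbers that share a tabid."""
    numbers_by_tab = defaultdict(set)
    for tabid, number in rows:
        numbers_by_tab[tabid].add(number)

    pairs = set()
    for numbers in numbers_by_tab.values():
        # combinations over sorted input yields each pair as (smaller, larger).
        pairs.update(combinations(sorted(numbers), 2))
    return sorted(pairs)


pairs = pairs_sharing_tabid(rows)
print(pairs)  # [(6, 10), (6, 11), (7, 8), (7, 11), (10, 11), (11, 12)]
```

In the SQL version, SELECT DISTINCT would play the role of the set if the same pair could occur under more than one tabid.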
Use array_agg to collect the tabids for each number into an array. Then self-join this CTE and check whether the two arrays overlap using the array operator &&.
with concatenated as (
select array_agg(tabid) as arr_tab, num
from t
group by num
)
select c1.num,c2.num
from concatenated c1
join concatenated c2 on c1.num < c2.num
where c2.arr_tab && c1.arr_tab
order by 1,2