Create range bins in hive for histograms

I have a data set which contains student IDs and their ages. I want the ages to be arranged in ranges (bins) with a bucket size of 10.
stud_id ages
101 11
102 13
103 21
104 25
I have similar data for many more records, and all of it has to be arranged into bins of size 10.
The expected output is:
stud_id ages_bin
101 11-20
102 11-20
103 21-30
104 21-30
I tried a simple CASE statement in Hive:
select stud_id,
case when ages between 0 and 10 then '0-10'
when ages between 11 and 20 then '11-20'
when ages between 21 and 30 then '21-30'
when ages between 31 and 40 then '31-40'
when ages between 41 and 50 then '41-50'
when ages between 51 and 60 then '51-60'
when ages between 61 and 70 then '61-70'
when ages between 71 and 80 then '71-80'
when ages between 81 and 90 then '81-90'
when ages between 91 and 100 then '91-100'
when ages between 101 and 110 then '101-110'
when ages between 111 and 120 then '111-120'
when ages between 121 and 130 then '121-130'
when ages between 131 and 140 then '131-140'
when ages between 141 and 150 then '141-150'
else NULL end as ages_bin
from students
Is there a simpler way to get the binned data with a bucket size of 10?
Can someone help me write simpler code?

There's a simple way to compute the range of each histogram bin. Here is the code:
select stud_id,floor((ages)/10)*10 as strt_range,
floor((ages)/10)*10+9 as end_range from students
This produces the following output:
stud_id strt_range end_range
101 10 19
102 10 19
103 20 29
104 20 29
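To check the arithmetic locally, here is a minimal sketch that runs the same FLOOR expressions in SQLite through Python's standard sqlite3 module. Assumptions: SQLite stands in for Hive, and Python's math.floor is registered as FLOOR because not every SQLite build ships one. It also shows a shifted variant, since the answer's bins (10-19, 20-29) differ from the 11-20, 21-30 labels in the question's expected output:

```python
import math
import sqlite3

# In-memory SQLite database standing in for the Hive "students" table (assumption).
conn = sqlite3.connect(":memory:")
# Register math.floor as FLOOR so the expressions match the Hive query.
conn.create_function("floor", 1, math.floor)
conn.execute("CREATE TABLE students (stud_id INTEGER, ages INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(101, 11), (102, 13), (103, 21), (104, 25)],
)

# The answer's query: bins start at multiples of 10 (10-19, 20-29, ...).
# Dividing by 10.0 avoids integer truncation; Hive's "/" does not truncate integers either.
rows = conn.execute(
    """
    SELECT stud_id,
           floor(ages / 10.0) * 10     AS strt_range,
           floor(ages / 10.0) * 10 + 9 AS end_range
    FROM students
    ORDER BY stud_id
    """
).fetchall()
print(rows)  # [(101, 10, 19), (102, 10, 19), (103, 20, 29), (104, 20, 29)]

# Variant matching the question's labels (11-20, 21-30, ...): subtract 1 before
# dividing so that 20 stays in 11-20 and 21 starts 21-30. Assumes ages >= 1.
labels = conn.execute(
    """
    SELECT stud_id,
           (floor((ages - 1) / 10.0) * 10 + 1) || '-' ||
           (floor((ages - 1) / 10.0) * 10 + 10) AS ages_bin
    FROM students
    ORDER BY stud_id
    """
).fetchall()
print(labels)  # [(101, '11-20'), (102, '11-20'), (103, '21-30'), (104, '21-30')]
```

With the shifted formula an age of 0 would land in a "-9-0" bin, so the question's 0-10 bucket would still need a CASE branch.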

Try this. It should return each bin as a formatted label:
select stud_id, concat(cast(floor((ages)/10)*10 as string),'-',
cast(floor((ages)/10)*10+9 as string)) as ages_bin from students
To get proper histogram output, it would be better to also group by the bin and order it appropriately.
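Following the suggestion to group and order the bins, this sketch counts students per bin, which is the histogram itself. It uses the same SQLite stand-in and registered FLOOR as an assumption; `bin_start` and `num_students` are hypothetical column names, and SQLite's `||` and `CAST(... AS TEXT)` replace Hive's `concat` and `cast(... as string)`:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("floor", 1, math.floor)  # FLOOR may be missing in some SQLite builds
conn.execute("CREATE TABLE students (stud_id INTEGER, ages INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(101, 11), (102, 13), (103, 21), (104, 25)],
)

# Compute the numeric bin start once in a subquery, then group and order on it.
histogram = conn.execute(
    """
    SELECT bin_start,
           CAST(bin_start AS TEXT) || '-' || CAST(bin_start + 9 AS TEXT) AS ages_bin,
           COUNT(*) AS num_students
    FROM (SELECT floor(ages / 10.0) * 10 AS bin_start FROM students) AS b
    GROUP BY bin_start
    ORDER BY bin_start
    """
).fetchall()
print(histogram)  # [(10, '10-19', 2), (20, '20-29', 2)]
```

Ordering by the numeric `bin_start` rather than the text label keeps 100-109 after 90-99, which a plain string sort would not.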

Related

How can I group and get MS Access query to show only rows with a maximum value in a specified field for a consecutive number of times?

I have a large Access table that I need to pull specific data from with a query.
I need a list of all the IDs that meet a specific criterion: 3 months in a row with a cage number less than 50.
The SQL code I'm currently working with is below, but it only tells me which of the past 3 months had a cage number below 50.
SELECT [AbBehWeeklyMonitor Database].AnimalID, [AbBehWeeklyMonitor Database].Date, [AbBehWeeklyMonitor Database].Cage
FROM [AbBehWeeklyMonitor Database]
WHERE ((([AbBehWeeklyMonitor Database].Date)>=DateAdd("m",-3,Date())) AND (([AbBehWeeklyMonitor Database].Cage)<50))
ORDER BY [AbBehWeeklyMonitor Database].AnimalID DESC;
I need it to look at the past 3 months for each ID and only return the ID if all 3 months meet the criterion, but I'm not sure where to go from here.
Any help would be appreciated.
Data sample:
Date AnimalID Cage
6/28/2022 12345 50
5/19/2021 12345 32
3/20/2008 12345 75
5/20/2022 23569 4
8/20/2022 23569 4
5/20/2022 44444 71
8/1/2012 44444 4
4/1/2022 78986 30
1/20/2022 78986 1
9/14/2022 65659 59
8/10/2022 65659 48
7/14/2022 65659 30
6/14/2022 95659 12
8/14/2022 91111 51
7/14/2022 91111 5
6/14/2022 91111 90
8/14/2022 88888 4
7/14/2022 88888 5
6/14/2022 88888 15

Consider:
Query1:
SELECT AnimalID, Count(*) AS Cnt
FROM Table1
WHERE (((Cage)<50) AND (([Date]) Between #6/1/2022# And #8/31/2022#))
GROUP BY AnimalID
HAVING (((Count(*))=3));
Query2:
SELECT Table1.*
FROM Query1 INNER JOIN Table1 ON Query1.AnimalID = Table1.AnimalID
WHERE ((([Date]) Between #6/1/2022# And #8/31/2022#));
Output:
Date AnimalID Cage
6/14/2022 65659 12
7/14/2022 65659 30
8/10/2022 65659 48
6/14/2022 88888 15
7/14/2022 88888 5
8/14/2022 88888 4
Date is a reserved word, and you really should not use reserved words as field names.
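A minimal sketch of the two-query approach, run in SQLite through Python's sqlite3. Assumptions: Query1 becomes a CTE, and the sample dates are rewritten as ISO strings (YYYY-MM-DD) because SQLite has no #date# literals. With the sample exactly as posted, only AnimalID 88888 comes back: the 6/14/2022 reading with cage 12 is listed under 95659, not under 65659 as in the answer's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (AnimalID INTEGER, [Date] TEXT, Cage INTEGER)")
# The question's sample data, dates converted to ISO strings so they compare correctly.
sample = [
    (12345, "2022-06-28", 50), (12345, "2021-05-19", 32), (12345, "2008-03-20", 75),
    (23569, "2022-05-20", 4), (23569, "2022-08-20", 4),
    (44444, "2022-05-20", 71), (44444, "2012-08-01", 4),
    (78986, "2022-04-01", 30), (78986, "2022-01-20", 1),
    (65659, "2022-09-14", 59), (65659, "2022-08-10", 48), (65659, "2022-07-14", 30),
    (95659, "2022-06-14", 12),
    (91111, "2022-08-14", 51), (91111, "2022-07-14", 5), (91111, "2022-06-14", 90),
    (88888, "2022-08-14", 4), (88888, "2022-07-14", 5), (88888, "2022-06-14", 15),
]
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", sample)

# Query1 (as a CTE): animals with 3 readings below 50 between June and August 2022.
# COUNT(*) = 3 assumes one reading per month, as in the answer.
rows = conn.execute(
    """
    WITH Query1 AS (
        SELECT AnimalID, COUNT(*) AS Cnt
        FROM Table1
        WHERE Cage < 50 AND [Date] BETWEEN '2022-06-01' AND '2022-08-31'
        GROUP BY AnimalID
        HAVING COUNT(*) = 3
    )
    SELECT t.[Date], t.AnimalID, t.Cage
    FROM Query1 INNER JOIN Table1 AS t ON Query1.AnimalID = t.AnimalID
    WHERE t.[Date] BETWEEN '2022-06-01' AND '2022-08-31'
    ORDER BY t.AnimalID, t.[Date]
    """
).fetchall()
print(rows)  # [('2022-06-14', 88888, 15), ('2022-07-14', 88888, 5), ('2022-08-14', 88888, 4)]
```

If an animal can have two readings in the same month, counting distinct months (for example `COUNT(DISTINCT strftime('%Y-%m', [Date]))` in SQLite) would be a safer test than `COUNT(*)`.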

SQL percentage calculation over the hour

I have a table covering thousands of devices, similar to the one below, and I want to use it to calculate the time each device spends in certain locations as a percentage, on an hourly basis.
(Values are given as an example.)
device geohash gridtype total_hour_count total_day_count avg_spent_hour
67a47cd76baff7e2 sxk9g3 Work 500 25 20.00
67a47cd76baff7e2 swy9g3 Home 590 27 18.00
67a47cd76baff7e2 szbvfd Other 420 18 9.28
02d171810d7ae1f5 swdvdf Home 274 30 18.54
02d171810d7ae1f5 sdefvx Work 184 22 17.51
02d171810d7ae1f5 dfvcxv Other 122 19 14.12
... ... ... ... ... ...
An example of the desired output:
deviceid home_percent work_percent other_percent
67a47cd76baff7e2 35 35 30
02d171810d7ae1f5 50 25 25
784faeff1c8b76c1 90 5 5
28fa9ca3dfff8a6f 80 10 10
f2f6324d5149e336 80 0 20
d84410d139981c19 25 50 25
... ... ... ...
Thanks for your help.
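A possible starting point, assuming each percentage is a gridtype's share of the device's total_hour_count (the desired output above looks illustrative, so this metric is a guess). The sketch uses conditional aggregation, run in SQLite through Python's sqlite3; the table name device_locations is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_locations (device TEXT, geohash TEXT, gridtype TEXT, "
    "total_hour_count INTEGER, total_day_count INTEGER, avg_spent_hour REAL)"
)
conn.executemany(
    "INSERT INTO device_locations VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("67a47cd76baff7e2", "sxk9g3", "Work", 500, 25, 20.00),
        ("67a47cd76baff7e2", "swy9g3", "Home", 590, 27, 18.00),
        ("67a47cd76baff7e2", "szbvfd", "Other", 420, 18, 9.28),
        ("02d171810d7ae1f5", "swdvdf", "Home", 274, 30, 18.54),
        ("02d171810d7ae1f5", "sdefvx", "Work", 184, 22, 17.51),
        ("02d171810d7ae1f5", "dfvcxv", "Other", 122, 19, 14.12),
    ],
)

# Each CASE keeps only one gridtype's hours; dividing by the device total gives its share.
# 100.0 forces floating-point division; a missing gridtype yields 0 via ELSE 0.
rows = conn.execute(
    """
    SELECT device AS deviceid,
           ROUND(100.0 * SUM(CASE WHEN gridtype = 'Home'  THEN total_hour_count ELSE 0 END)
                 / SUM(total_hour_count), 2) AS home_percent,
           ROUND(100.0 * SUM(CASE WHEN gridtype = 'Work'  THEN total_hour_count ELSE 0 END)
                 / SUM(total_hour_count), 2) AS work_percent,
           ROUND(100.0 * SUM(CASE WHEN gridtype = 'Other' THEN total_hour_count ELSE 0 END)
                 / SUM(total_hour_count), 2) AS other_percent
    FROM device_locations
    GROUP BY device
    ORDER BY device
    """
).fetchall()
print(rows)
```

For the two sample devices this gives 47.24/31.72/21.03 (02d171810d7ae1f5) and 39.07/33.11/27.81 (67a47cd76baff7e2) for home/work/other.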

transpose column to row oracle

My query returns values in this form (it returns more than 50 columns):
1-99transval 100-200transval 200-300transval ... 1-99nontransval 100...
50 90 80 67 58
That is a single row. I want these values transposed so that they take the following shape:
Range Transval NonTransval
1-99 50 67
100-200 90 58

In pure SQL this needs a lot of code, because you have to write each range by hand: there is no relation between the values and the ranges at all. Had there been a relationship, you could use a CASE expression and build the ranges dynamically.
WITH DATA AS
(SELECT 50 "1-99transval",
90 "100-200transval",
80 "200-300transval",
67 "1-99nontransval",
58 "100-200nontransval",
88 "200-300nontransval"
FROM dual
)
SELECT '1-99' range,
"1-99transval" transval,
"1-99nontransval" nontransval
FROM DATA
UNION
SELECT '100-200' range,
"100-200transval",
"100-200nontransval" nontransval
FROM DATA
UNION
SELECT '200-300' range,
"200-300transval",
"200-300nontransval" nontransval
FROM DATA;
RANGE TRANSVAL NONTRANSVAL
------- ---------- -----------
1-99 50 67
100-200 90 58
200-300 80 88
From Oracle Database 11g Release 1 onward, you can use UNPIVOT:
WITH DATA AS
(SELECT 50 "1-99transval",
90 "100-200transval",
80 "200-300transval",
67 "1-99nontransval",
58 "100-200nontransval",
88 "200-300nontransval"
FROM dual
)
SELECT *
FROM DATA
UNPIVOT( (transval,nontransval)
FOR RANGE IN ( ("1-99transval","1-99nontransval") AS '1-99'
,("100-200transval","100-200nontransval") AS '100-200'
,("200-300transval","200-300nontransval") AS '200-300'));
RANGE TRANSVAL NONTRANSVAL
------- ---------- -----------
1-99 50 67
100-200 90 58
200-300 80 88
In your case, replace the WITH clause with your existing query as a subquery. You also need to include the remaining columns in the UNION.
In PL/SQL, you could (ab)use EXECUTE IMMEDIATE and derive the "range" by extracting the column names in dynamic SQL.
However, it would be much better to modify or rewrite your existing query, which you have not shown yet.
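The idea of deriving the ranges from the column names can be sketched outside the database. This swaps the PL/SQL EXECUTE IMMEDIATE route for client-side Python string building, purely as an illustration; `build_unpivot_sql` is a hypothetical helper, and it assumes every range has exactly one "<range>transval" and one "<range>nontransval" column. The generated text follows the UNPIVOT syntax shown above:

```python
import re


def build_unpivot_sql(columns, source="DATA"):
    """Hypothetical helper: build an UNPIVOT query from column names such as
    '1-99transval' and '1-99nontransval'.

    Assumes each range label has exactly one transval and one nontransval
    column; columns that match neither pattern are ignored.
    """
    pattern = re.compile(r"^(.+?)(non)?transval$")
    ranges = {}  # range label -> {"trans": column, "nontrans": column}, in first-seen order
    for col in columns:
        m = pattern.match(col)
        if m is None:
            continue  # not a transval/nontransval column
        kind = "nontrans" if m.group(2) else "trans"
        ranges.setdefault(m.group(1), {})[kind] = col
    in_list = "\n   ,".join(
        f'("{c["trans"]}","{c["nontrans"]}") AS \'{label}\''
        for label, c in ranges.items()
    )
    return (
        f"SELECT *\nFROM {source}\n"
        f"UNPIVOT( (transval,nontransval)\n FOR RANGE IN ( {in_list}));"
    )


cols = ["1-99transval", "100-200transval", "200-300transval",
        "1-99nontransval", "100-200nontransval", "200-300nontransval"]
sql = build_unpivot_sql(cols)
print(sql)
```

With 50+ columns the IN list grows automatically, which is the main benefit over typing each range by hand.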

If you are using Oracle 11g, you can use the UNPIVOT feature:
CREATE TABLE DATA AS
SELECT 50 "1-99transval",
90 "100-200transval",
80 "200-300transval",
67 "1-99nontransval",
58 "100-200nontransval",
88 "200-300nontransval"
FROM dual;
SELECT *
FROM DATA
UNPIVOT( (Transval,NonTransval) FOR Range IN ( ("1-99transval","1-99nontransval") as '1-99'
,("100-200transval","100-200nontransval") as '100-200'
,("200-300transval","200-300nontransval") as '200-300'));
http://sqlfiddle.com/#!4/c9747/3/0

SQL access Group By and Sum

I'm having an issue with an SQL statement in MS Access.
I have companies and teams, and they each have a balance of money.
(Company1 can have teams 1, 2, 3 and 4, and Company2 can have teams 1 to 5. Note that Company1's team 1 is not the same as Company2's team 1!)
However, the table has a large number of entries, each corresponding to a seller.
I want to sum all balances for each company and each team, regardless of seller.
Currently I have:
SELECT Company, Team, Sum(Balance) AS tot_balance
FROM Retro2014
GROUP BY Company, Team
But the totals are 5 to 10 times bigger than they should be when I check them manually (and with around 1200 sellers, I can't check them all by hand).
EDIT: What I want is something like this:
Company Team tot_balance
------- ---- -----------
Company1 Team1 1000
Company1 Team2 1530
Company1 Team3 120
Company1 Team4 500
Company2 Team1 800
Company2 Team2 750
Company2 Team3 420
Company2 Team4 820
Company2 Team5 120
... ... ...
EDIT2:
These are the values I get now:
Company Team tot_balance REAL_Balance
10 90 2 534.60 269.06
10 92 813.30 120.89
10 95 1 384.75 210.89
10 96 950.72 142.43
10 97 3 957.03 789.92
10 98 4 822.34 1128.71
EDIT3: And these are the source values:
COMPANY TEAM SELLER BALANCE
10 50 123.65
10 90 L07630 245.06
10 90 L07630 4
10 90 L07630 8
10 90 L07630 4
10 90 L07630 8
10 92 L96420 32.93
10 92 L96420 87.96
10 95 35.74
10 95 16
10 95 4
10 95 12
10 95 12
10 95 131.15
10 96 L04771 65.5
10 96 L04771 12
10 96 L04771 8
10 96 L04771 8
10 96 L04771 48.93
10 97 L94605 61.93
10 97 L94605 4
10 97 L94605 8
10 97 L94605 233.76
10 97 L94605 344.97
10 97 L94605 90.33
10 97 L94605 38.93
10 97 L94605 4
10 97 L94605 4
10 98 L95652 42.51
10 98 L95652 34.75
10 98 L95652 549.26
10 98 L95652 320.36
10 98 L95652 20
10 98 L95652 112.58
10 98 L95652 41.25
10 98 L95652 8
Thanks,
Phil

As long as the table doesn't contain multiple entries with the same Company, Team and Balance,
your SQL should work just fine.
But given the issue you describe, I presume the table contains more data than what is shown, which can cause rows with the same information to appear more than once and result in an incorrect sum. Here is what I would suggest:
Select Company, Team, Sum(Balance) as tot_balance from (
SELECT Company, Team, Balance
FROM Retro2014
GROUP BY Company, Team, Balance ) as b
GROUP by Company, Team
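To see what the suggested subquery does to the EDIT3 data, here is a sketch in SQLite (through Python's sqlite3, as a stand-in for Access) using only the team 90 rows. The plain GROUP BY gives 269.06, matching REAL_Balance in EDIT2, while the de-duplicating subquery gives 257.06 because it also collapses the legitimately repeated balances of 4 and 8:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Retro2014 (Company INTEGER, Team INTEGER, Seller TEXT, Balance REAL)")
# Team 90 rows from EDIT3; note the repeated balances 4 and 8 for the same seller.
conn.executemany(
    "INSERT INTO Retro2014 VALUES (?, ?, ?, ?)",
    [(10, 90, "L07630", b) for b in (245.06, 4, 8, 4, 8)],
)

# The question's original query.
plain = conn.execute(
    "SELECT Company, Team, SUM(Balance) FROM Retro2014 GROUP BY Company, Team"
).fetchall()

# The suggested query: de-duplicate on (Company, Team, Balance) first, then sum.
dedup = conn.execute(
    """
    SELECT Company, Team, SUM(Balance) AS tot_balance FROM (
        SELECT Company, Team, Balance
        FROM Retro2014
        GROUP BY Company, Team, Balance
    ) AS b
    GROUP BY Company, Team
    """
).fetchall()
print(round(plain[0][2], 2), round(dedup[0][2], 2))  # 269.06 257.06
```

So it is worth confirming that the extra rows really are duplicates before grouping on Company, Team, Balance.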

Exclude the specific kind of record

I am using SQL Server 2008 R2. I have records like the following in a table:
Id Sys Dia Type UniqueId
1 156 20 first 12345
2 157 20 first 12345
3 150 15 last 12345
4 160 17 Average 12345
5 150 15 additional 12345
6 157 35 last 891011
7 156 25 Average 891011
8 163 35 last 789521
9 145 25 Average 789521
10 156 20 first 963215
11 150 15 last 963215
12 160 17 Average 963215
13 156 20 first 456878
14 157 20 first 456878
15 150 15 last 456878
16 160 17 Average 456878
17 150 15 last 246977
18 160 17 Average 246977
19 150 15 additional 246977
These records form groups that share a common UniqueId. A record's type can be "first", "last", "Average" or "additional". From these records, I want to select "Average" records only if their group also contains a "first" or "additional" reading; otherwise I want to exclude them.
The expected result is:
Id Sys Dia Type UniqueId
1 156 20 first 12345
2 157 20 first 12345
3 150 15 last 12345
4 160 17 Average 12345
5 150 15 additional 12345
6 157 35 last 891011
7 163 35 last 789521
8 156 20 first 963215
9 150 15 last 963215
10 160 17 Average 963215
11 156 20 first 456878
12 157 20 first 456878
13 150 15 last 456878
14 160 17 Average 456878
15 150 15 last 246977
16 160 17 Average 246977
17 150 15 additional 246977
In short, I don't want to select records with Type = "Average" when the only other records with the same UniqueId are of type "last". Any solution?

Using the EXISTS operator with a correlated subquery:
SELECT * FROM dbo.Table1 t1
WHERE [Type] != 'Average'
OR EXISTS (SELECT * FROM Table1 t2
WHERE t1.UniqueId = t2.UniqueId
AND t1.[Type] = 'Average'
AND t2.[Type] IN ('first','additional'))

Try something like this:
SELECT * FROM MyTable WHERE [Type] <> 'Average'
UNION ALL
SELECT * FROM MyTable T WHERE [Type] = 'Average'
AND EXISTS (SELECT * FROM MyTable
WHERE [Type] IN ('first', 'additional')
AND UniqueId = T.UniqueId)
The first SELECT statement returns all records except those with Type = 'Average'. The second SELECT statement returns only the Type = 'Average' records that have at least one record with the same UniqueId of type 'first' or 'additional'.
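To check that both answers return the same rows, this sketch runs them against the question's sample data in SQLite through Python's sqlite3, as a stand-in for SQL Server. Assumptions: the dbo schema prefix is dropped, and SQLite's `=` on text is case-sensitive, unlike SQL Server's default collation, which is harmless here because the literals match the stored casing. The excluded rows should be the Average readings with Id 7 and 9:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Sys INTEGER, Dia INTEGER, [Type] TEXT, UniqueId INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 156, 20, "first", 12345), (2, 157, 20, "first", 12345),
        (3, 150, 15, "last", 12345), (4, 160, 17, "Average", 12345),
        (5, 150, 15, "additional", 12345), (6, 157, 35, "last", 891011),
        (7, 156, 25, "Average", 891011), (8, 163, 35, "last", 789521),
        (9, 145, 25, "Average", 789521), (10, 156, 20, "first", 963215),
        (11, 150, 15, "last", 963215), (12, 160, 17, "Average", 963215),
        (13, 156, 20, "first", 456878), (14, 157, 20, "first", 456878),
        (15, 150, 15, "last", 456878), (16, 160, 17, "Average", 456878),
        (17, 150, 15, "last", 246977), (18, 160, 17, "Average", 246977),
        (19, 150, 15, "additional", 246977),
    ],
)

# First answer: single query with OR EXISTS (correlated on UniqueId).
exists_ids = [r[0] for r in conn.execute(
    """
    SELECT Id FROM Table1 t1
    WHERE [Type] != 'Average'
       OR EXISTS (SELECT * FROM Table1 t2
                  WHERE t1.UniqueId = t2.UniqueId
                    AND t1.[Type] = 'Average'
                    AND t2.[Type] IN ('first', 'additional'))
    ORDER BY Id
    """
)]

# Second answer: UNION ALL of non-Average rows and qualifying Average rows.
union_ids = sorted(r[0] for r in conn.execute(
    """
    SELECT Id FROM Table1 WHERE [Type] <> 'Average'
    UNION ALL
    SELECT Id FROM Table1 T WHERE [Type] = 'Average'
      AND EXISTS (SELECT * FROM Table1
                  WHERE [Type] IN ('first', 'additional')
                    AND UniqueId = T.UniqueId)
    """
))
print(exists_ids == union_ids, sorted(set(range(1, 20)) - set(exists_ids)))  # True [7, 9]
```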

We Keep Coding
