Transpose columns with repeated data

I have the data stored in columns that I need to change to rows. The transpose method is not working as expected.
reg_no st_name six seven eight nine ten
number
1 1200210606 DORIN 18 28 98 78 58
2 1001200049 PRAMA 79 69 59 19 29
3 1205210026 PILLA 47 57 67 27 17
4 1205210064 SUSAT 16 16 66 76 86
5 10002100113 PAVITH 15 85 75 65 15
The expected results will look something like this.
1 1200210606 DORIN six 18
1 1200210606 DORIN seven 28
1 1200210606 DORIN eight 98
1 1200210606 DORIN nine 78
1 1200210606 DORIN ten 58
2 1001200049 PRAMA six 79
2 1001200049 PRAMA seven 69
2 1001200049 PRAMA eight 59
2 1001200049 PRAMA nine 19
2 1001200049 PRAMA ten 29
3 1205210026 PILLA six 47
3 1205210026 PILLA seven 57
3 1205210026 PILLA eight 67
3 1205210026 PILLA nine 27
3 1205210026 PILLA ten 17
4 1205210064 SUSAT six 16
4 1205210064 SUSAT seven 16
4 1205210064 SUSAT eight 66
4 1205210064 SUSAT nine 76
4 1205210064 SUSAT ten 86
5 10002100113 PAVITH six 15
5 10002100113 PAVITH seven 85
5 10002100113 PAVITH eight 75
5 10002100113 PAVITH nine 65
5 10002100113 PAVITH ten 15

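You are trying to convert from wide format to long format.
Use the melt function for that.
ref : http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html
import pandas as pd
# df is your DataFrame object; number looks like the index in your printout,
# so reset_index() turns it into a regular column before melting
pd.melt(df.reset_index(), id_vars=['number', 'reg_no', 'st_name'])
You can call sort_values() on the melted result to get the exact order.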
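A minimal sketch of the full round trip, assuming that number is the index of the frame (as the printout in the question suggests). The frame below is rebuilt from the first two rows of the question, and the column names class and marks are hypothetical labels chosen for illustration.

```python
import pandas as pd

# Toy frame rebuilt from the first two rows of the question; "number" is the index.
df = pd.DataFrame(
    {
        "reg_no": [1200210606, 1001200049],
        "st_name": ["DORIN", "PRAMA"],
        "six": [18, 79],
        "seven": [28, 69],
        "eight": [98, 59],
        "nine": [78, 19],
        "ten": [58, 29],
    },
    index=pd.Index([1, 2], name="number"),
)

long_df = (
    df.reset_index()  # make "number" a regular column
      .melt(
          id_vars=["number", "reg_no", "st_name"],
          var_name="class",    # hypothetical name for the former column labels
          value_name="marks",  # hypothetical name for the cell values
      )
      # mergesort is stable, so six..ten keep their original order per student
      .sort_values("number", kind="mergesort")
      .reset_index(drop=True)
)
print(long_df)
```

The stable sort matters here: melt emits all six rows first, then all seven rows, and so on, so a stable sort on number alone regroups the rows per student without alphabetising the class labels.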

Related

How to visualize multi-indexed series into a heatmap in pandas?

I am trying to turn this kind of a series:
Animal Idol
50 60 15
55 14
81 14
80 13
56 11
53 10
58 9
57 9
50 9
59 6
52 6
61 1
52 52 64
58 28
55 21
81 17
60 16
50 16
56 15
80 12
61 10
59 10
53 9
57 4
53 53 27
56 14
58 10
50 9
80 8
52 6
55 6
61 5
81 5
60 4
57 4
59 3
Into something looking more like this:
Animal/Idol 60 55 81 80 ...
50 15 14 14 13
52 16 21 17 12
53 4 6 5 8
...
My base for the series is actually a data frame that looks like this (the unnamed values in the series are counts of how many times each animal and idol pair occurs, and there are many idols for each animal):
Animal Idol
1058 50 50
1061 50 50
1197 50 50
1357 50 50
1637 50 50
... ... ...
2780 81 81
2913 81 81
2915 81 81
3238 81 81
3324 81 81
Sadly, I have no clue how to convert either of these two into the desired form. I guess a good name for it is a pivot table, but I could not get a good result using one. How would you transform either of these into the form I need? I would also like to know how to visualize this kind of pivot table (if that's a good name) as a heat map, where the colour of each cell depends on the value in the cell (the higher the value, the deeper the colour). Thanks in advance!

I think you are looking for .unstack() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.unstack.html), which moves the inner index level into the columns.
To visualize the result you can use several tools. I like using holoviews (https://holoviews.org/):
hv.Image can plot a 2D array, so hv.Image(df.unstack().values) should do it.
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'data': np.random.randint(0, 100, 100)}, index=pd.MultiIndex.from_tuples([(i, j) for i in range(10) for j in range(10)]))
df
unstack:
df_unstacked = df.unstack()
df_unstacked
plot:
import holoviews as hv
hv.extension('bokeh')  # load a plotting backend, needed when running in a notebook
hv.Image(df_unstacked.values)
or to plot with matplotlib:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
im = ax.imshow(df_unstacked.values)
fig.colorbar(im)  # legend mapping colour to value
plt.show()
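The question also asks how to start from the raw Animal/Idol data frame rather than from the counted series. One possible sketch, using made-up toy rows in place of the real data: count each pair with groupby and size, then unstack, or build the same table in one step with pd.crosstab.

```python
import pandas as pd

# Toy stand-in for the raw Animal/Idol frame (values are made up for illustration).
raw = pd.DataFrame(
    {
        "Animal": [50, 50, 50, 52, 52, 53],
        "Idol":   [60, 60, 55, 52, 58, 53],
    }
)

# Count each (Animal, Idol) pair, then spread Idol across the columns.
counts = raw.groupby(["Animal", "Idol"]).size()
table = counts.unstack(fill_value=0)  # pairs that never occur become 0

# pd.crosstab builds the same Animal-by-Idol count table directly.
same = pd.crosstab(raw["Animal"], raw["Idol"])

print(table)
```

Either table can then be passed to a plotting call as a 2D array via its values attribute.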

SQL Server : create new column category price according to price column

I have a SQL Server table with a column price looking like this:
10
96
64
38
32
103
74
32
67
103
55
28
30
110
79
91
16
71
36
106
89
87
59
41
56
89
68
32
80
47
45
77
64
93
17
88
13
19
83
12
76
99
104
65
83
95
Now my aim is to create a new column that assigns a category from 1 to 10 to each of those values.
For instance, the max value in my column is 110 and the min is 10, so max - min = 100. If I want 10 categories, each one spans 100/10 = 10. That gives these ranges:
10-20 1
21-30 2
31-40 3
41-50 4
51-60 5
61-70 6
71-80 7
81-90 8
91-100 9
101-110 10
Desired output:
my new column called cat should look like this:
price cat
-----------------
10 1
96 9
64 6
38 3
32 3
103 10
74 7
32 3
67 6
103 10
55 5
28 2
30 3
110 10
79 7
91 9
16 1
71 7
36 3
106 10
89 8
87 8
59 5
41 4
56 5
89 8
68 6
32 3
80 7
47 4
45 4
77 7
64 6
93 9
17 1
88 8
13 1
19 1
83 8
12 1
76 7
99 9
104 10
65 6
83 8
95 9
Is there a way to do this with T-SQL? Sorry if this question is too easy. I searched the web for a long time, so either the problem is not as simple as I imagine, or I entered the wrong keywords.

Yes, almost exactly as you describe the calculation:
select price,
       1 + (price - min_price) * 10 / (max_price - min_price + 1) as decile
from (select price,
             min(price) over () as min_price,
             max(price) over () as max_price
      from t
     ) t;
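The 1 + is there because you want values from 1 to 10 rather than 0 to 9. The + 1 in the denominator keeps the maximum price in category 10 instead of spilling into an 11th, and the / is integer division provided price is an integer column.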
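To check the arithmetic outside the database, here is a small Python sketch of the same formula. The function name price_category is hypothetical, and // is used to mirror integer division on non-negative integers, which assumes price is an integer column.

```python
def price_category(price, min_price, max_price, buckets=10):
    """Mirror of the SQL expression above using integer arithmetic."""
    # // stands in for SQL integer division; all operands are non-negative here
    return 1 + (price - min_price) * buckets // (max_price - min_price + 1)


# Boundary prices taken from the range table in the question (min 10, max 110).
for p in (10, 20, 21, 30, 100, 101, 110):
    print(p, price_category(p, 10, 110))
```

With min 10 and max 110 the boundaries fall where the question's range table puts them: 20 stays in category 1, 21 starts category 2, and 110 lands in category 10.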

Yes - a case statement can do that.
select
    price
    ,case
        when price between 10 and 20 then 1
        when price between 21 and 30 then 2
        when price between 31 and 40 then 3
        when price between 41 and 50 then 4
        when price between 51 and 60 then 5
        when price between 61 and 70 then 6
        when price between 71 and 80 then 7
        when price between 81 and 90 then 8
        when price between 91 and 100 then 9
        when price between 101 and 110 then 10
        else null
    end as cat
from [<enter your table name here>]

How to divide a result set into equal parts?

I have a table new_table
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 ?
2 19 43 58 ?
3 29 43 58 ?
4 31 43 58 ?
5 35 43 58 ?
6 37 43 58 ?
7 38 43 58 ?
8 39 43 58 ?
9 58 43 58 ?
10 79 43 58 ?
How can I select all proc_ids and update new_staff so that the rows are divided among the staff, for example:
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 15
2 19 43 58 15
3 29 43 58 15
4 31 43 58 15
5 35 43 58 23
6 37 43 58 23
7 38 43 58 23
8 39 43 58 28
9 58 43 58 28
10 79 43 58 28
15 - 4 proc_ids
23 - 3 proc_ids
28 - 3 proc_ids
58 - is busy
where 15, 23, 28 and 58 are the staff in one department

"how to divide equal parts"
Oracle has a function, ntile() which splits a result set into equal buckets. For instance this query puts your posted data into four buckets:
SQL> select id
2 , proc_id
3 , ntile(4) over (order by id asc) as gen_staff
4 from new_table;
ID PROC_ID GEN_STAFF
---------- ---------- ----------
1 15 1
2 19 1
3 29 1
4 31 2
5 35 2
6 37 2
7 38 3
8 39 3
9 58 4
10 79 4
10 rows selected.
SQL>
This isn't quite the solution you want but you need to clarify your requirements before it's possible to provide a complete answer.
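How the bucket sizes come out can be sketched in plain Python. This is only an illustration of the splitting rule (rows are divided as evenly as possible and the leftover rows go to the first buckets), not Oracle code, and the function name ntile_labels is hypothetical.

```python
def ntile_labels(n_rows, buckets):
    """Return the bucket number for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, buckets)
    labels = []
    for bucket in range(1, buckets + 1):
        # the first `extra` buckets each take one additional row
        size = base + (1 if bucket <= extra else 0)
        labels.extend([bucket] * size)
    return labels


print(ntile_labels(10, 4))  # same split as the query output above
print(ntile_labels(10, 3))  # a 4/3/3 split, like the example in the question
```

With three buckets the ten rows split 4/3/3, which matches the row counts in the question's example, so mapping bucket numbers to the three available staff ids may be one way to get there.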

update new_table
set new_staff='15'
where ID in('1','2','3','4');
update new_table
set new_staff='28'
where ID in('8','9','10');
update new_table
set new_staff='23'
where ID in('5','6','7');
Not sure if this is what you mean.

Strange results with VAR and STDEV

This
SELECT
AVG(s.Amount/100)[Avg],
STDEV(s.Amount/100) [StDev],
VAR(s.Amount/100) [Var]
Returns this:
Avg StDev Var
133 550.82021581146 303402.910146583
Statistics aren't my strongest suit, but how is it possible that standard deviation and variance are larger than the average? Not only that, but variance is almost 100x larger than the largest sample in set.
Here is the entire sample set, with the above replaced with
SELECT s.Amount/100
while the rest of the query is identical
Amount
4645
3182
422
377
359
298
278
242
230
213
182
180
174
166
150
130
116
113
109
107
102
96
84
78
78
76
66
64
61
60
60
60
59
59
56
49
46
41
41
39
38
36
29
27
26
25
25
25
24
24
24
22
22
22
20
20
19
19
19
19
19
18
17
17
17
16
14
13
12
12
12
11
11
10
10
10
10
9
9
9
8
8
8
7
7
6
6
6
3
3
3
3
2
2
2
2
2
1
1
1
1
1
1

You need to read a book on statistics, or at least start with the Wikipedia pages that describe the concepts.
The standard deviation and variance are closely related: the variance is the square of the standard deviation. You can check that this is true of your numbers. Because the variance is in squared units, comparing it with the largest sample is not meaningful.
There is not really a relationship between the standard deviation and the average. The standard deviation measures the dispersion of the data around the average, and the data can be arbitrarily dispersed around an average. A few very large values, such as the 4645 and 3182 in your sample, are enough to push the standard deviation above the average.
You might be confused because there are estimates on standard deviation/standard error when you assume a particular distribution of the data. However, those estimates are about the distribution and not about the data.
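Both points can be checked with a few lines of Python. The first check uses the figures reported in the question; the second uses a small made-up sample (not the question's data) with one large outlier. Python's statistics.stdev and statistics.variance are sample statistics, which is assumed here to be comparable to what STDEV and VAR return.

```python
import math
import statistics

# Figures reported in the question: the variance is the squared standard deviation.
reported_stdev = 550.82021581146
reported_var = 303402.910146583
print(reported_stdev ** 2, reported_var)

# Made-up skewed sample: one large value dominates the spread.
sample = [1, 1, 2, 3, 100]
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)        # sample standard deviation (n - 1)
variance = statistics.variance(sample)  # sample variance (n - 1)

print(mean, stdev, variance)
print(stdev > mean)                         # spread exceeds the average
print(math.isclose(variance, stdev ** 2))   # variance == stdev squared
```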

Using SQL SMS How do I return a list of numbers using low and high columns

I have a table SUB_Inst with columns id, low and high. How would I query the low and high numbers returning a new column with a record for each number from low to high?
Current table SUB_Inst
id low High
1 55 63
2 232 234
3 4 7
etc.
Desired Results
id low High Num_list
1 55 63 55
1 55 63 56
1 55 63 57
1 55 63 58
1 55 63 59
1 55 63 60
1 55 63 61
1 55 63 62
1 55 63 63
2 232 234 232
2 232 234 233
2 232 234 234
3 4 7 4
3 4 7 5
3 4 7 6
3 4 7 7
etc.
I tried something like this:
SELECT Low, HIGH,
(SELECT CAST(number as varchar)+','
FROM NUMBERS
WHERE number >= Low and number <= High
FOR XML PATH(''))
FROM SUB_Inst
but it returned all the numbers in one field like this which won't work:
Low High Num_List
24 27 24,25,26,27,
34 36 34,35,36,
10 17 10,11,12,13,14,15,16,17,
34 36 34,35,36,
65 67 65,66,67,
502 504 502,503,504,
56 59 56,57,58,59,
Thank you.

I think you want this:
SELECT id, low, high, number AS Num_List
FROM SUB_Inst
JOIN NUMBERS ON number BETWEEN low AND high
ORDER BY id, number
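The join works because each SUB_Inst row pairs with every NUMBERS row that falls inside its low/high range. A rough Python sketch of that expansion, using the three sample rows from the question (the variable names are illustrative only):

```python
# (id, low, high) rows copied from the sample table in the question
sub_inst = [(1, 55, 63), (2, 232, 234), (3, 4, 7)]

# One output row per number in the inclusive range low..high
expanded = [
    (id_, low, high, num)
    for id_, low, high in sub_inst
    for num in range(low, high + 1)  # + 1 because range() excludes its end
]

for row in expanded:
    print(row)
```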
