I need to group a result set


I am using SQL Server 2014 and have a problem with the following query:
select *
from (
select [DataTime] datatime,[Temperature] temperature,[Humidity] humidity,
b.[serialnumber] serialnumber, Row_Number() OVER(ORDER BY a.datatime) rownum
from [dbo].[datalog] a,[report_devicelist] b where a.deviceno = b.deviceno and b.report_no='201906140013yEcD'
and a.datatime between '2019-04-09 15:05:00' and '2019-04-09 16:20:52'
) as t where rownum between 1 and 50
The query returns rows like this:
time temperature humidity serialnumber rownum
2019-04-09 15:05:01 268 0 ch4 1
2019-04-09 15:05:01 272 0 ch5 2
2019-04-09 15:05:01 266 0 ch6 3
2019-04-09 15:05:01 264 0 ch7 4
2019-04-09 15:05:01 263 0 ch8 5
2019-04-09 15:06:01 253 0 ch3 15
2019-04-09 15:06:01 245 0 ch2 16
2019-04-09 15:06:01 257 0 ch1 17
2019-04-09 15:06:01 272 0 ch14 18
2019-04-09 15:06:01 250 0 ch13 19
2019-04-09 15:06:01 254 0 ch12 20
2019-04-09 15:06:01 263 0 ch11 21
2019-04-09 15:06:01 256 0 ch10 22
The result I need looks like this:
time ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9
2019/03/05 11:41:01 16.9 15.3 17.2 17.1 15.2 16.9 17.4 16.1 17.1
2019/03/05 11:42:01 16.5 15.4 16.8 16.6 14.8 16.7 17.0 15.9 16.3
2019/03/05 11:43:01 16.3 15.5 16.6 16.2 14.5 16.5 16.6 15.9 15.9
2019/03/05 11:44:01 16.4 15.3 16.7 15.9 14.4 16.9 16.3 16.1 15.6
2019/03/05 11:45:01 16.8 15.2 16.7 15.7 14.3 16.7 16.0 16.6 15.4
2019/03/05 11:46:01 16.6 15.1 16.9 15.4 14.2 16.5 15.8 16.7 15.3
2019/03/05 11:47:01 16.6 15.2 17.4 15.4 14.3 17.7 15.9 16.6 15.3
2019/03/05 11:48:01 16.2 15.0 17.1 15.4 14.2 17.5 15.8 16.4 15.3
2019/03/05 11:49:01 15.8 14.5 16.8 15.2 14.1 17.1 15.5 16.1 15.1
2019/03/05 11:50:01 15.2 13.7 16.4 14.8 13.9 16.4 14.9 15.5 14.8
2019/03/05 11:51:01 14.6 12.7 15.8 14.3 13.5 15.6 14.2 14.8 14.2
I need to group the results above by time point.
That is, the temperatures of all serial numbers need to be listed on one row for each time point. I tried GROUP BY datatime, but the query is rejected because the other columns are not in an aggregate function.

It seems you need a PIVOT, like this:
select top 50 datatime,[ch1], [ch2], [ch3], [ch4], [ch5], [ch6], [ch7], [ch8], [ch9]
from
(select
[DataTime] datatime,[Temperature] temperature,
b.[serialnumber] serialnumber
from [dbo].[datalog] a,[report_devicelist] b where a.deviceno = b.deviceno and b.report_no='201906140013yEcD'
and a.datatime between '2019-04-09 15:05:00' and '2019-04-09 16:20:52'
) AS SourceTable
PIVOT
(
AVG(temperature)
FOR serialnumber IN ([ch1], [ch2], [ch3], [ch4], [ch5], [ch6], [ch7], [ch8], [ch9])
) AS PivotTable
Order by datatime;
TOP 50 together with ORDER BY datatime limits the result to the first 50 rows sorted by [datatime].
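To check the reshaping outside the database, the same long-to-wide pivot can be sketched in pandas. This is not part of the original answer. The sample rows are copied from the question's output, and the names `rows` and `wide` are made up for illustration.

```python
import pandas as pd

# Long-format sample rows copied from the question's query output.
rows = pd.DataFrame(
    {
        "datatime": ["2019-04-09 15:05:01"] * 2 + ["2019-04-09 15:06:01"] * 2,
        "serialnumber": ["ch4", "ch5", "ch3", "ch2"],
        "temperature": [268, 272, 253, 245],
    }
)

# One row per time point and one column per serial number,
# averaging duplicates the way PIVOT ... AVG(temperature) does.
wide = rows.pivot_table(
    index="datatime", columns="serialnumber", values="temperature", aggfunc="mean"
).sort_index()

print(wide)
```

Serial numbers with no reading at a given time point show up as NaN, which corresponds to NULL in the SQL result.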

Related

Slicing numpy with condition

I have a numpy array with the shape 178 rows x 14 columns, like this:
0 1 2 3 4 5 6 7 8 9 10 11 \
0 1.0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.64 1.04
1 1.0 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.38 1.05
2 1.0 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81 5.68 1.03
3 1.0 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.80 0.86
4 1.0 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.32 1.04
.. ... ... ... ... ... ... ... ... ... ... ... ...
173 3.0 13.71 5.65 2.45 20.5 95.0 1.68 0.61 0.52 1.06 7.70 0.64
174 3.0 13.40 3.91 2.48 23.0 102.0 1.80 0.75 0.43 1.41 7.30 0.70
175 3.0 13.27 4.28 2.26 20.0 120.0 1.59 0.69 0.43 1.35 10.20 0.59
176 3.0 13.17 2.59 2.37 20.0 120.0 1.65 0.68 0.53 1.46 9.30 0.60
177 3.0 14.13 4.10 2.74 24.5 96.0 2.05 0.76 0.56 1.35 9.20 0.61
12 13
0 3.92 1065.0
1 3.40 1050.0
2 3.17 1185.0
3 3.45 1480.0
4 2.93 735.0
.. ... ...
173 1.74 740.0
174 1.56 750.0
175 1.56 835.0
176 1.62 840.0
177 1.60 560.0
[178 rows x 14 columns]
Printing all the rows of only the first column (index 0) as a dataframe worked, and the output looks like this:
0
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
.. ...
173 3.0
174 3.0
175 3.0
176 3.0
177 3.0
[178 rows x 1 columns]
Using the same logic, I want to take all the rows where the value in the first column is 2 or below. I tried it like this and it doesn't work:
reduced = data[data[:,0:1]<=2]
I got an IndexError like this:
IndexError Traceback (most recent call last)
<ipython-input-159-7eab0abd8f99> in <module>()
----> 1 reduced = data[data[:,0:1]<=2]
IndexError: boolean index did not match indexed array along dimension 1; dimension is 14 but corresponding boolean dimension is 1.
Can anybody help me?
Thanks in advance.

Solved it.
It is as simple as converting the numpy array to a dataframe and then selecting rows based on a condition (here the first column is assumed to have been named 'class'):
reduced = data[data['class'] <= 2]
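The filter also works on the numpy array directly, without a dataframe. A minimal sketch, assuming the class label sits in column 0 as in the question; the three-row array is a made-up stand-in for the real 178 x 14 data.

```python
import numpy as np

# Small stand-in for the 178 x 14 array: column 0 holds the class label.
data = np.array(
    [
        [1.0, 14.23, 1.71],
        [2.0, 12.37, 0.94],
        [3.0, 13.71, 5.65],
    ]
)

# data[:, 0] is 1-D (shape (3,)), so the boolean mask selects whole rows.
# data[:, 0:1] is 2-D (shape (3, 1)), which is what raised the IndexError.
mask = data[:, 0] <= 2
reduced = data[mask]

print(reduced.shape)
```

The only difference from the failing attempt is the index `0` instead of the slice `0:1`, which keeps the mask one-dimensional.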

awk: print values of one field on another field

I have a csv file with 2 fields: date ($1) and daily temperature throughout the year ($2). I want to extract the temperatures from April to September, with each month in its own column, like this:
April   May
17 C    20 C
15 C    22 C
15 C    21 C
...
Using the following command I get a temp.csv file with all temperatures in a single column:
awk ' /2020-04/ {print $2}' year-temperatures.csv >> temp.csv
awk ' /2020-05/ {print $2}' year-temperatures.csv >> temp.csv
awk ' /2020-06/ {print $2}' year-temperatures.csv >> temp.csv
What should be done to put each month in its own column?

Look at the following script (temperature.awk):
BEGIN {
SUBSEP="#"
}
{
month=0+substr($1,6,2);
day=0+substr($1,9,2);
a[month,day]=$2;
}
END{
printf("%5s ","")
for (month=1; month<=12; month++) {
printf("%5s ", month);
}
printf("\n");
for (day=1; day<=31; day++) {
printf("%4s: ", day)
for (month=1;month<=12; month++) {
printf("%5s ", a[month,day])
}
printf("\n")
}
}
Run it with: gawk -F, -f temperature.awk year-temperatures.csv >> temp.csv
Your temp.csv should then look something like this (with my test data):
1 2 3 4 5 6 7 8 9 10 11 12
1: 17.5 19.9 21.5 19.6 18.7 14.2 18.5 18.9 15.9 14.3 21.4 21.4
2: 18.6 20.7 17.6 14 12.7 13.4 17.1 12.3 21.6 17.3 18.8 12.8
3: 18.3 21.8 21.8 19.1 15.6 12.5 18 12.8 18.5 21.7 17.6 17.8
4: 14 14.7 13.9 21.6 18 20.3 16.8 15 15.7 14.4 19.5 18.7
5: 12.7 16.3 12.3 18.7 20.9 12.1 18.1 14.5 21.1 15 12.6 18.1
6: 19.7 15.2 17.7 16.5 18.6 17.4 17.9 15.4 16.4 19.9 12.7 12.2
7: 18.3 15.1 19.7 14.6 18.2 18.7 13.2 21.8 16.5 12.4 13.8 15
8: 20.2 18.2 13.5 21.3 13.4 19.4 20.2 20.6 21.5 20.3 18.7 16.2
9: 14.4 13.4 16.4 20.8 20.3 18.8 19.5 15.7 15.7 12.4 20.3 14.1
10: 19.4 20.7 19.3 18.2 19.4 14 14.9 14.7 12.2 19.1 13.2 20
11: 21.8 21.2 15.2 16.7 14 21.4 14.1 14.5 12.1 16.3 13.4 15.8
12: 18.8 21.9 16.2 16.7 20 13.3 13.8 16.2 21.6 12.2 15.1 16.8
13: 16.5 14 13.4 21.5 16 20 14.7 15.5 19.7 20 13.4 14.7
14: 14.3 12.2 16.2 15.5 18 18.1 20 17 21.9 21.3 19.9 21.2
15: 20 16.9 19.1 21.1 19.7 18.4 14.1 16.3 18.5 14.6 17.2 19.7
16: 15.1 16.1 14.8 16.9 12.8 15.8 18.2 18.5 14.7 16.9 14.1 13.1
17: 13.3 17.7 14.7 19.2 12.9 21.6 16.8 21.6 16.2 19 17.1 14.1
18: 19.5 18.3 17.3 13.3 14.2 18.9 17.4 20.4 14.6 12.4 21.3 19.5
19: 15.4 16.3 20.1 16.8 20.2 17.6 14.4 15.4 12.6 12.8 13 13
20: 16.8 14.7 16.6 12.2 16.2 19.3 18 13.8 17 14.9 19 14.5
21: 15.4 12.4 20.6 18.6 18.7 21.8 14.7 20.6 15.1 13.9 14.1 21.8
22: 14.9 16.1 21.4 14.4 12.8 19.2 17.5 19.5 12.8 12.7 21.5 13.1
23: 16.3 21.1 12.9 14.3 16.1 18.6 21.3 13.9 16.6 20.2 13.2 18.5
24: 14.9 15.3 18.7 16.3 19.8 13.5 12.1 19 12.7 20.5 19.5 20.9
25: 13.3 21 12.5 16.5 18.9 19.4 14.8 21.3 21.5 20.2 15.9 17
26: 20 17.4 14.4 21.7 12.8 14.6 15.5 17.4 17.5 17.5 18.9 20.2
27: 18 12 12.5 17.1 15.7 12.9 21 21.2 20.8 15 14.8 18.3
28: 17.9 15.9 17.6 18.2 17.7 18.5 16.7 21.8 19.6 20.2 15.6 18.7
29: 13.8 18.2 17.9 19.7 21.7 18.6 13.4 13.7 14.1 21.2 16.7
30: 13.1 16.1 12.9 13.3 21.1 20.9 19.5 17.5 18 17.4 15.3
31: 12.3 14 15.2 16.7 15.3 15.5 14.4
The first couple of lines from my testdata look like this:
2022-01-01, 17.5
2022-01-02, 18.6
2022-01-03, 18.3
2022-01-04, 14
2022-01-05, 12.7
2022-01-06, 19.7
2022-01-07, 18.3
2022-01-08, 20.2
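For comparison, the same idea (index every value by month and day, then print one row per day) can be sketched in Python with only the standard library, restricted to April through September as the question asks. The sample rows and the names `sample`, `table` and `lines` are made up for illustration; the real file layout may differ.

```python
import csv
import io

# A few made-up rows in the question's layout: date, temperature.
sample = io.StringIO(
    "2020-04-01,17 C\n"
    "2020-04-02,15 C\n"
    "2020-05-01,20 C\n"
    "2020-05-02,22 C\n"
)

months = range(4, 10)  # April to September
table = {}             # (month, day) -> temperature, like a[month,day] in the awk script

for date, temp in csv.reader(sample):
    _, month, day = date.split("-")
    table[int(month), int(day)] = temp.strip()

lines = []
for day in range(1, 32):
    cells = [table.get((m, day), "") for m in months]
    if any(cells):  # skip days that have no data in any month
        lines.append(",".join(cells))

print("\n".join(lines))
```

To read the real file, replace `sample` with `open("year-temperatures.csv", newline="")`.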

Pandas Boxplot Highlight Specific Values in DF

I have a df called "YMp" with data for 1991 - 2019 and have made a boxplot of it. I now need to over-plot the current year's (2020) values on the boxplot as colored points, with a legend entry for the year 2020.
The data looks like this -
month 01 02 03 04 05 06 07 08 09 10 11 12
year
1991 -4.9 12.2 -11.1 -18.0 -27.5 1.7 7.4 22.7 38.3 4.2 -0.9 5.3
1992 -10.9 -17.1 -7.7 14.8 14.8 -9.6 17.0 24.7 32.3 0.3 -21.6 15.3
1993 -1.8 -2.3 -3.8 0.4 -4.8 -7.7 11.7 26.3 17.1 2.6 4.4 2.4
1994 2.6 2.5 -6.2 -3.2 2.2 -3.0 13.8 3.9 30.4 -25.7 -1.8 -2.2
1995 -8.6 -3.3 -18.4 -14.0 -19.3 13.2 9.8 -23.2 16.0 -15.2 0.6 -8.5
1996 -5.5 -10.4 -0.3 7.2 13.0 3.6 5.2 1.4 -10.3 -2.9 15.4 -0.6
1997 -11.1 8.9 -1.1 -12.7 3.0 -4.0 27.1 32.6 -4.5 -15.5 -5.5 -20.9
1998 -22.0 -16.6 3.2 0.7 15.4 16.0 18.5 2.7 -32.3 16.3 -5.4 12.9
1999 -1.0 0.3 -8.5 9.9 7.4 -2.1 10.9 -5.5 18.5 17.4 17.5 11.1
2000 5.4 12.9 -24.8 15.7 -9.3 20.7 18.2 23.2 16.6 26.8 -17.7 17.3
2001 -3.9 14.5 -4.7 18.6 5.6 22.4 -3.3 18.2 5.3 31.2 6.0 -4.0
2002 -9.0 19.5 12.5 24.5 27.6 -9.3 3.7 13.7 -32.7 -19.5 0.7 -6.1
2003 23.6 -11.7 -16.5 -2.1 6.5 -13.7 0.4 8.0 -13.7 -16.1 7.3 13.1
2004 6.6 4.7 36.8 12.8 29.5 6.4 -12.2 -0.6 -7.7 -15.2 -1.1 12.7
2005 6.3 1.1 -14.6 9.4 -7.5 6.1 -9.2 -1.3 36.1 -4.9 10.8 -11.7
2006 7.3 8.3 1.7 11.8 -14.7 33.3 9.1 -0.0 3.0 1.4 -2.8 8.8
2007 5.4 0.2 7.2 -3.9 6.6 -8.3 -28.2 -7.6 3.3 -7.4 25.0 -7.3
2008 5.0 -5.6 7.6 -0.4 -1.2 13.9 -11.3 -29.7 16.7 43.1 2.4 3.5
2009 -2.2 17.1 9.8 8.9 -9.2 -14.4 6.1 21.7 -0.2 -26.7 -9.1 -18.2
2010 -2.6 -12.1 0.8 -16.5 4.1 3.9 -21.5 -3.3 -18.9 22.8 -6.5 -5.3
2011 -12.4 -3.8 1.2 -14.9 -2.0 6.8 -12.6 -16.9 8.3 10.7 -0.7 4.6
2012 0.5 -3.0 -1.0 -6.5 7.5 -17.9 -4.3 -26.3 -2.6 3.0 12.3 -15.3
2013 -1.7 -15.1 18.8 -8.3 7.5 -4.5 -19.3 0.9 -33.9 -10.6 -0.4 4.4
2014 7.2 -20.0 -8.4 2.0 10.1 -20.2 7.8 -14.9 -11.4 -6.9 -0.3 6.4
2015 18.4 6.2 10.5 -16.5 -11.9 7.0 -7.3 -6.7 -20.8 -13.9 -3.3 -14.8
2016 -11.3 28.5 -9.2 -4.2 -9.7 1.0 -5.1 -18.9 -3.3 19.1 -1.1 10.1
2017 -8.6 -8.1 21.2 4.5 -21.2 -28.5 -6.8 -30.8 -19.7 13.3 7.2 9.9
2018 26.9 3.1 -7.1 -3.4 -8.7 -15.5 12.8 3.9 -16.4 -7.9 -25.7 -9.2
2019 2.1 -17.3 10.2 1.0 -13.5 -3.4 -14.0 -20.7 -1.5 -28.4 -5.4 -13.9
The current year 2020 data looks like this -
2020 0.3 6.2 2.0 -17.9 -0.4 6.0 -24.5 2.5 -12.1 4.6 NaN NaN
My boxplot currently does not show or highlight the 2020 data. Thank you for any ideas on how to do this.

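Capture the Axes instance that boxplot returns and plot on it again:
import numpy as np
ax = df.boxplot()
# Boxes sit at x = 1..n; df.loc[2000] stands in here for the row to highlight
ax.scatter(np.arange(df.shape[1])+1, df.loc[2000], color='r')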
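A slightly fuller sketch of the same approach, assuming the 2020 values live in a separate Series as in the question. The names `hist`, `current` and `points` are made up, and only three months of three years are included to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Stand-in data: a few rows and columns taken from the question's table.
hist = pd.DataFrame(
    {"01": [-4.9, -10.9, -1.8], "02": [12.2, -17.1, -2.3], "03": [-11.1, -7.7, -3.8]},
    index=[1991, 1992, 1993],
)
current = pd.Series({"01": 0.3, "02": 6.2, "03": 2.0}, name=2020)

ax = hist.boxplot()
# Boxes are drawn at x = 1..n, so the points use the same positions.
x = np.arange(1, len(hist.columns) + 1)
points = ax.scatter(x, current[hist.columns], color="r", zorder=3, label="2020")
ax.legend()
```

Passing `label="2020"` to `scatter` is what makes the year appear in the legend. Call `ax.figure.savefig(...)` or `plt.show()` afterwards to see the plot.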

how can i convert time-series data to numpy array keep time order in pandas?

I have the dataframe below and want to convert it to a numpy array.
When I tried, the time order was broken in the resulting array,
maybe because it is time-series data that runs past midnight (19:00:00 ~ 00:00:00 ~ 07:00:00).
How can I keep the time order when converting the dataframe to a numpy array?
aaa \
Date 2015-12-06 2015-12-13 2015-12-20 2015-12-23 2015-12-26 2016-01-03
Time
19:00:00 4.72 8.50 3.87 7.95 1.76 9.82
19:15:00 4.54 8.00 3.72 8.14 1.74 9.77
19:30:00 4.44 8.17 3.72 7.99 1.75 9.77
19:45:00 4.37 7.92 3.28 7.94 1.89 9.61
20:00:00 4.03 7.54 2.48 7.99 1.98 9.46
20:15:00 3.74 7.86 3.30 7.68 1.63 9.30
20:30:00 3.48 8.41 3.52 7.88 1.52 9.22
20:45:00 3.31 8.52 3.81 7.83 1.54 9.08
21:00:00 3.17 8.23 3.97 7.96 1.63 9.14
21:15:00 2.99 8.23 3.37 7.61 1.87 9.14
21:30:00 2.96 8.26 3.23 7.63 2.03 9.13
21:45:00 2.69 7.89 3.10 7.34 2.12 9.04
22:00:00 2.62 7.83 2.94 7.21 2.11 9.04
22:15:00 2.55 7.78 2.83 7.26 2.39 9.01
22:30:00 2.49 7.73 2.89 7.15 2.30 9.08
22:45:00 2.48 7.80 2.79 7.02 2.22 8.92
23:00:00 2.38 7.71 2.92 7.17 2.43 8.80
23:15:00 2.23 7.74 3.01 7.24 2.33 8.56
23:30:00 2.29 7.51 3.10 7.14 2.38 8.32
23:45:00 2.29 7.31 3.00 6.89 2.10 8.02
00:00:00 2.17 6.84 2.84 6.89 1.82 7.86
00:15:00 2.13 6.84 2.65 7.06 1.36 7.95
00:30:00 2.21 6.78 2.63 6.98 0.92 7.97
00:45:00 2.19 6.41 2.18 7.08 1.05 7.80
01:00:00 2.13 6.24 1.56 7.20 0.81 7.73
01:15:00 2.14 5.90 1.39 7.31 1.01 7.89
01:30:00 2.13 5.74 1.81 7.58 0.79 7.91
01:45:00 2.11 5.82 1.60 7.47 1.19 8.02
02:00:00 1.72 6.01 0.90 7.14 1.27 8.09
02:15:00 1.94 6.04 1.12 7.33 0.95 8.13
02:30:00 2.05 6.00 1.44 7.06 1.15 8.15
02:45:00 1.96 6.03 1.45 6.86 1.05 7.95
03:00:00 1.63 6.28 1.62 6.85 1.22 7.43
03:15:00 1.79 6.14 1.41 6.94 1.05 6.97
03:30:00 1.37 6.03 1.29 6.98 1.27 6.97
03:45:00 1.44 5.84 1.01 7.29 1.31 6.90
04:00:00 1.37 5.62 0.92 7.13 1.35 6.77
04:15:00 1.62 5.75 0.95 7.18 1.21 7.09
04:30:00 1.64 5.71 1.06 7.18 1.32 7.27
04:45:00 1.40 5.46 0.79 7.17 1.55 7.35
05:00:00 1.51 5.48 0.64 6.83 1.42 7.27
05:15:00 1.46 5.80 0.52 6.58 1.60 7.21
05:30:00 1.61 5.59 0.35 6.98 1.54 7.13
05:45:00 1.49 5.28 0.46 6.58 1.58 7.04
06:00:00 1.55 5.00 0.17 6.35 1.88 7.10
06:15:00 1.94 4.94 -0.18 6.12 1.94 7.11
06:30:00 1.45 5.01 -0.31 6.02 1.90 7.14
06:45:00 1.36 4.90 -0.17 5.83 2.06 7.17
07:00:00 1.25 4.75 0.20 5.70 2.35 7.18

You need to transpose the DataFrame with T and convert it to an array:
arr = df.T.values
Or first convert to array and then transpose:
arr = df.values.T
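A small sketch of what the transpose does, using a made-up three-row frame in the question's layout (times as the index, dates as columns). After transposing, each row of the array is one date and the values keep the index order, including the wrap past midnight.

```python
import pandas as pd

# Tiny stand-in for the question's frame: times as the index, dates as columns.
df = pd.DataFrame(
    {"2015-12-06": [4.72, 2.29, 2.17], "2015-12-13": [8.50, 7.31, 6.84]},
    index=["19:00:00", "23:45:00", "00:00:00"],
)

arr = df.T.to_numpy()  # same result as df.T.values

# Each row of arr is one date; values stay in index order 19:00 -> 23:45 -> 00:00.
print(arr)
```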

numpy savetxt different cols different format output

I want to use np.savetxt(file, array, fmt='%8.1f') to save data like this as a text file:
1958 6.4 1.8 7.7 70.1 41.4 38.5 65.4 25.7
1959 27.2 42.5 63.3 86.2 101.5 71.4 114.2 137.9
1960 22.9 18.3 28.7 106.5 159.1 50.4 203 121.6
1961 4.4 26.9 47.1 67.9 53.6 64.8 95 42
1962 20.9 31.2 60.6 38.8 66.2 37.9 67.9 62.3
1963 11.9 14.5 59 56 83.1 110.9 77.1 93.5
Each element should take up 8 characters, one after another (no separator between them).
The first column (year) should use the format %8d and the others %8.1f, all right-aligned.
How can I do this in numpy, or with pandas?

n = len(df.columns)
fmt = ('{:8.0f}' + '{:8.1f}' * (n - 1)).format
print(df.apply(lambda x: fmt(*x), 1).to_csv(index=None, header=None))
1958 6.4 1.8 7.7 70.1 41.4 38.5 65.4 25.7
1959 27.2 42.5 63.3 86.2 101.5 71.4 114.2 137.9
1960 22.9 18.3 28.7 106.5 159.1 50.4 203.0 121.6
1961 4.4 26.9 47.1 67.9 53.6 64.8 95.0 42.0
1962 20.9 31.2 60.6 38.8 66.2 37.9 67.9 62.3
1963 11.9 14.5 59.0 56.0 83.1 110.9 77.1 93.5
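The same output can also be produced with np.savetxt itself, since its fmt parameter accepts one format per column. A minimal sketch, using the first three columns of two rows from the question and an in-memory buffer in place of a file; the names `buf` and `text` are made up.

```python
import io
import numpy as np

# First three columns of two rows from the question.
array = np.array([[1958, 6.4, 1.8], [1959, 27.2, 42.5]])

# One format per column; delimiter="" keeps the 8-character fields adjacent.
fmt = ["%8d"] + ["%8.1f"] * (array.shape[1] - 1)

buf = io.StringIO()  # a filename such as "out.txt" works the same way
np.savetxt(buf, array, fmt=fmt, delimiter="")
text = buf.getvalue()
print(text)
```

The default delimiter is a single space, so it has to be set to the empty string to get fields with no separator.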
