How to plot using timestamp and coordinates? - matplotlib

I have logs of mouse movement consisting of coordinates and timestamps. I want to plot the mouse movement from this log, but I have no idea which API or library to use. How can I get started, if there is an existing way to do this?
My log is as follows:
Date hr:min:sec ms x y
13/6/2020 13:13:33 521 291 283
13/6/2020 13:13:33 638 273 234
13/6/2020 13:13:33 647 272 233
13/6/2020 13:13:33 657 271 231
13/6/2020 13:13:33 667 269 230
13/6/2020 13:13:33 677 268 229
13/6/2020 13:13:33 687 267 228
13/6/2020 13:13:33 697 264 226

You're looking for geom_path() from ggplot2. It connects your observations with a line in the order they appear in the data frame (geom_line(), by contrast, connects them in order of x). So, here's the x,y data from your log, expanded a bit:
df <- data.frame(
x=c(291,273,272,271,269,268,267,264,262,261,261,265,268,280,290),
y=c(283,234,233,231,230,229,228,226,230,235,237,248,252,246,235)
)
And some code to make a simple plot using geom_path():
library(ggplot2)
p <- ggplot(df, aes(x=x,y=y)) + theme_classic() +
geom_path(color='blue') + geom_point()
p
If you want, you can even save that as an animation based on your time points. See the code below using the gganimate package:
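library(gganimate)
df$time <- 1:15  # sample index; use your real timestamps here
# p still holds the copy of df made before the time column existed,
# so swap in the updated data with %+% before adding the transition
a <- (p %+% df) + transition_reveal(time)
animate(a, fps=20)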
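Since the question asks about matplotlib, here is a rough Python equivalent. It is a minimal sketch that assumes the log is plain whitespace-separated text in exactly the format shown (day/month/year dates, a separate millisecond column). parse_log and plot_path are hypothetical helper names; plot_path draws the path in log order like geom_path() and colours each point by elapsed time instead of animating it.

```python
from datetime import datetime, timedelta

# Sample copied from the question's log (header row included on purpose)
LOG = """\
Date hr:min:sec ms x y
13/6/2020 13:13:33 521 291 283
13/6/2020 13:13:33 638 273 234
13/6/2020 13:13:33 647 272 233
13/6/2020 13:13:33 657 271 231
13/6/2020 13:13:33 667 269 230
13/6/2020 13:13:33 677 268 229
13/6/2020 13:13:33 687 267 228
13/6/2020 13:13:33 697 264 226
"""


def parse_log(text):
    """Parse 'd/m/Y H:M:S ms x y' lines into (timestamps, xs, ys)."""
    times, xs, ys = [], [], []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the header row (data rows start with a digit)
        if len(parts) != 5 or not parts[0][0].isdigit():
            continue
        date, clock, ms, x, y = parts
        stamp = datetime.strptime(f"{date} {clock}", "%d/%m/%Y %H:%M:%S")
        times.append(stamp + timedelta(milliseconds=int(ms)))
        xs.append(int(x))
        ys.append(int(y))
    return times, xs, ys


def plot_path(times, xs, ys):
    """Draw the pointer path in log order, coloured by elapsed time."""
    import matplotlib.pyplot as plt  # imported here so parse_log needs only the stdlib

    elapsed = [(t - times[0]).total_seconds() for t in times]
    fig, ax = plt.subplots()
    ax.plot(xs, ys, color="lightgray", zorder=1)  # segments follow log order
    points = ax.scatter(xs, ys, c=elapsed, cmap="viridis", zorder=2)
    fig.colorbar(points, ax=ax, label="seconds since first sample")
    ax.invert_yaxis()  # assumption: screen coordinates, y grows downward
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    plt.show()
```

With your own file, something like plot_path(*parse_log(open("mouse.log").read())) should work; "mouse.log" is a placeholder name. For an animation similar to gganimate, matplotlib.animation.FuncAnimation is the usual starting point.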

How to solve an error, "module 'numpy' has no attribute 'float'"?

Environment
WSL2
Docker
Virtualenv
Python 3.8.16
jupyterlab 3.5.2
numpy 1.24.1
prophet 1.1.1
fbprophet 0.7.1
Cython 0.29.33
ipython 8.8.0
pmdarima 2.0.2
plotly 5.11.0
pip 22.3.1
pystan 2.19.1.1
scikit-learn 1.2.0
konlpy 0.6.0 (just in case)
nodejs 0.1.1 (just in case)
pandas 1.5.2 (just in case)
Error
Main error message
AttributeError: module 'numpy' has no attribute 'float'
Full error message
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[33], line 4
1 # Load the Prophet() model and
2 # train it with fit.
3 model_revenue = Prophet()
----> 4 model_revenue.fit(revenue_serial)
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:1115, in Prophet.fit(self, df, **kwargs)
1112 self.history = history
1113 self.set_auto_seasonalities()
1114 seasonal_features, prior_scales, component_cols, modes = (
-> 1115 self.make_all_seasonality_features(history))
1116 self.train_component_cols = component_cols
1117 self.component_modes = modes
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:765, in Prophet.make_all_seasonality_features(self, df)
763 # Seasonality features
764 for name, props in self.seasonalities.items():
--> 765 features = self.make_seasonality_features(
766 df['ds'],
767 props['period'],
768 props['fourier_order'],
769 name,
770 )
771 if props['condition_name'] is not None:
772 features[~df[props['condition_name']]] = 0
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:458, in Prophet.make_seasonality_features(cls, dates, period, series_order, prefix)
442 @classmethod
443 def make_seasonality_features(cls, dates, period, series_order, prefix):
444 """Data frame with seasonality features.
445
446 Parameters
(...)
456 pd.DataFrame with seasonality features.
457 """
--> 458 features = cls.fourier_series(dates, period, series_order)
459 columns = [
460 '{}_delim_{}'.format(prefix, i + 1)
461 for i in range(features.shape[1])
462 ]
463 return pd.DataFrame(features, columns=columns)
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:434, in Prophet.fourier_series(dates, period, series_order)
417 """Provides Fourier series components with the specified frequency
418 and order.
419
(...)
428 Matrix with seasonality features.
429 """
430 # convert to days since epoch
431 t = np.array(
432 (dates - datetime(1970, 1, 1))
433 .dt.total_seconds()
--> 434 .astype(np.float)
435 ) / (3600 * 24.)
436 return np.column_stack([
437 fun((2.0 * (i + 1) * np.pi * t / period))
438 for i in range(series_order)
439 for fun in (np.sin, np.cos)
440 ])
File /home/.venv/lib/python3.8/site-packages/numpy/__init__.py:284, in __getattr__(attr)
281 from .testing import Tester
282 return Tester
--> 284 raise AttributeError("module {!r} has no attribute "
285 "{!r}".format(__name__, attr))
AttributeError: module 'numpy' has no attribute 'float'
Example of dataset
ds y
0 2022-09-01 13:00:00 762
1 2022-09-01 15:00:00 746
2 2022-09-01 17:00:00 848
3 2022-09-01 19:00:00 866
4 2022-09-01 21:00:00 632
... ... ...
1881 2022-10-31 13:00:00 684
1882 2022-10-31 15:00:00 749
1883 2022-10-31 17:00:00 779
1884 2022-10-31 19:00:00 573
1885 2022-10-31 21:00:00 510
Type of variable
visitors_serial
ds datetime64[ns]
y int64
dtype: object
Short code
...
revenue_serial = pd.DataFrame(pd.to_datetime(df_active_time['START_DATE'], format="%Y%m%d %H:%M:%S"))
# '객단가(원)' = average spend per customer, in KRW
revenue_serial['객단가(원)']=df_active_time['객단가(원)']
revenue_serial = revenue_serial.reset_index(drop= True)
revenue_serial = revenue_serial.rename(columns={'START_DATE':'ds', '객단가(원)':'y'})
model_revenue = Prophet()
model_revenue.fit(revenue_serial)
I expected that upgrading numpy would solve it, but it didn't.
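You can see the cause in the traceback: numpy really has no attribute float. The failing line is t = np.array((dates - datetime(1970, 1, 1)).dt.total_seconds().astype(np.float)) / (3600 * 24.), and it should be:
t = np.array(
(dates - datetime(1970, 1, 1))
.dt.total_seconds()
.astype(np.float64)
) / (3600 * 24.)
np.float64 keeps the old behaviour, because np.float was just the builtin float. np.float32 would also run, but its 24-bit mantissa rounds seconds since 1970 to steps of about two minutes for current dates.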
The alias numpy.float was deprecated in NumPy 1.20 and was removed in NumPy 1.24.
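You can change it to numpy.float64 or numpy.double (or simply the builtin float); they all mean the same thing. numpy.float_ was another alias for the same type, but it was itself removed in NumPy 2.0.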
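A small sketch of the difference, using two timestamps from the example dataset in the question (it assumes pandas is installed and works on NumPy versions with or without np.float). It mirrors the fourier_series line from the traceback with np.float64 in place of np.float, and also shows why float32 is a poor substitute for seconds since the epoch.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Two timestamps from the question's dataset
dates = pd.Series(pd.to_datetime(["2022-09-01 13:00:00", "2022-09-01 15:00:00"]))
seconds = (dates - datetime(1970, 1, 1)).dt.total_seconds()

# np.float was only an alias for the builtin float (a 64-bit double),
# so np.float64 reproduces the old behaviour exactly: days since epoch.
t64 = np.array(seconds.astype(np.float64)) / (3600 * 24.)

# float32 keeps 24 bits of mantissa; around 1.66e9 seconds the spacing
# between representable values is 128 s, so the timestamps get rounded.
t32 = np.array(seconds.astype(np.float32))
```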
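For your dependency prophet, the actual issue was already fixed in #1850 (March 2021), and it does appear to be fixed in v1.1.1, so it looks like you're not running the version you think you are: the traceback shows forecaster.py being loaded from the old fbprophet package (0.7.1, also installed), not from prophet. Importing with from prophet import Prophet should pick up the fixed version.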

Removing duplicates based on matching column values with boolean indexing

After merging two DataFrames, I have the following dataset:
DB_ID x_val y_val
x01 405 407
x01 405 405
x02 308 306
x02 308 308
x03 658 658
x03 658 660
x04 None 658
x04 None 660
x05 658 660
x06 660 660
The y table contains multiple values for the left join variable (not included in table), resulting in multiple rows per unique DB_ID (string variable, not in df index).
The issue is that only one row is correct, where x_val and y_val match. I tried removing the duplicates with the following code:
df= df[~df['DB_ID'].duplicated() | combined['x_val'] != combined['y_val']]
However, this doesn't work. I am looking for a solution that produces the following result:
DB_ID x_val y_val
x01 405 405
x02 308 308
x03 658 658
x04 None 658
x05 658 660
x06 660 660
The idea is to compare both columns for inequality, sort so that the matching row (new == False) comes first within each DB_ID, and then remove duplicates by DB_ID:
df = (df.assign(new = df['x_val'].ne(df['y_val']))
.sort_values(['DB_ID','new'])
.drop_duplicates('DB_ID')
.drop('new', axis=1))
print (df)
DB_ID x_val y_val
1 x01 405 405
3 x02 308 308
4 x03 658 658
6 x04 None 658
8 x05 658 660
9 x06 660 660
If NaN/None values in both columns should also count as equal, use:
df = (df.assign(new = df['x_val'].fillna('same').ne(df['y_val'].fillna('same')))
.sort_values(['DB_ID','new'])
.drop_duplicates('DB_ID')
.drop('new', axis=1))
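For reference, here is the same sort-and-drop approach as a self-contained script, with the merged data from the question rebuilt by hand (None becomes NaN in a numeric column). keep_best_match is a hypothetical helper name; which row survives for an ID with no matching row at all (x04) depends on the sort order, so check that case against your own data.

```python
import pandas as pd

# Merged data from the question; None marks a missing x_val
df = pd.DataFrame({
    "DB_ID": ["x01", "x01", "x02", "x02", "x03", "x03", "x04", "x04", "x05", "x06"],
    "x_val": [405, 405, 308, 308, 658, 658, None, None, 658, 660],
    "y_val": [407, 405, 306, 308, 658, 660, 658, 660, 660, 660],
})


def keep_best_match(frame):
    """Keep one row per DB_ID, preferring the row where x_val == y_val."""
    mismatch = frame["x_val"].ne(frame["y_val"])  # False (a match) sorts first
    return (frame.assign(_mismatch=mismatch)
                 .sort_values(["DB_ID", "_mismatch"])
                 .drop_duplicates("DB_ID")        # keeps the first row per ID
                 .drop(columns="_mismatch"))


print(keep_best_match(df))
```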
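Alternatively, you may be able to simply filter on equality, although this also drops DB_IDs that have no matching row at all (x04 and x05 here):
df = df[df['x_val'] == df['y_val']]
print(df)
# Output
DB_ID x_val y_val
1 x01 405 405
3 x02 308 308
4 x03 658 658
9 x06 660 660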
I don't think you need drop_duplicates or duplicated, but if you want to ensure that only one instance of each DB_ID remains, you can append .drop_duplicates('DB_ID'):
df = df[df['x_val'] == df['y_val']].drop_duplicates('DB_ID')
print(df)
# Output
DB_ID x_val y_val
1 x01 405 405
3 x02 308 308
4 x03 658 658
9 x06 660 660

Add column for percentages

I have a df that looks like this:
Total Initial Follow Sched Supp Any
0 5525 3663 968 296 65 533
I transposed the df because I need to add a column with percentages based on the 'Total' column.
Now my df looks like this:
0
Total 5525
Initial 3663
Follow 968
Sched 296
Supp 65
Any 533
So, how can I add this percentage column?
The expected output looks like this:
0 Percentage
Total 5525 100
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
I'm working in JupyterLab with pandas and numpy.
Divide column 0 by the Total value with Series.div, then multiply by 100 with Series.mul, and finally round with Series.round:
df['Percentage'] = df[0].div(df.loc['Total', 0]).mul(100).round(1)
print (df)
0 Percentage
Total 5525 100.0
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
Consider the df below:
In [1328]: df
Out[1328]:
b
a
Total 5525
Initial 3663
Follow 968
Sched 296
Supp 65
Any 533
In [1327]: df['Perc'] = round(df.b.div(df.loc['Total', 'b']) * 100, 1)
In [1330]: df
Out[1330]:
b Perc
a
Total 5525 100.0
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
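If you are starting from the original one-row frame, the whole thing, transpose included, can be sketched as follows. The column names are copied from the question; wide is a hypothetical variable name.

```python
import pandas as pd

# One-row frame as shown in the question
wide = pd.DataFrame([[5525, 3663, 968, 296, 65, 533]],
                    columns=["Total", "Initial", "Follow", "Sched", "Supp", "Any"])

df = wide.T  # one row per category; the single column is named 0
# Share of Total in percent, rounded to one decimal place
df["Percentage"] = df[0].div(df.loc["Total", 0]).mul(100).round(1)
print(df)
```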

SQL JOIN with 2 aggregates returning incorrect results

I am trying to join 3 different tables to get how many home runs a player has hit in his career along with how many awards he has received. However, I'm getting incorrect results:
Peoples (PlayerId)
Battings (PlayerId, HomeRuns)
AwardsPlayers (PlayerId, AwardName)
Current Attempt
SELECT TOP 25 Peoples.PlayerId, SUM(Battings.HomeRuns) as HomeRuns, COUNT(AwardsPlayers.PlayerId)
FROM Peoples
JOIN Battings ON Battings.PlayerId = Peoples.PlayerId
JOIN AwardsPlayers ON AwardsPlayers.PlayerId = Battings.PlayerId
GROUP BY Peoples.PlayerId
ORDER BY SUM(HomeRuns) desc
Result
PlayerID HomeRuns AwardCount
bondsba01 35814 1034
ruthba01 23562 726
rodrial01 21576 682
mayswi01 21120 736
willite01 20319 741
griffke02 18270 667
schmimi01 18084 594
musiast01 16150 748
pujolal01 14559 414
dimagjo01 12996 468
ripkeca01 12499 609
gehrilo01 12325 425
aaronha01 12080 368
foxxji01 11748 462
ramirma02 10545 399
benchjo01 10114 442
sosasa01 9744 304
ortizda01 9738 360
piazzmi01 9394 396
winfida01 9300 460
rodriiv01 9019 667
robinfr02 8790 330
dawsoan01 8760 420
robinbr01 8576 736
hornsro01 8127 648
I am pretty confident the problem is my second join. Do I need some sort of subquery, or should this work? Barry Bonds definitely does not have 35,814 home runs, nor does he have 1,034 awards.
If I just do a single join, I get the correct output:
SELECT TOP 25 Peoples.PlayerId, SUM(Battings.HomeRuns) as HomeRuns
FROM Peoples
JOIN Battings ON Battings.PlayerId = Peoples.PlayerId
GROUP BY Peoples.PlayerId
ORDER BY SUM(HomeRuns) desc
bondsba01 762
aaronha01 755
ruthba01 714
rodrial01 696
mayswi01 660
pujolal01 633
griffke02 630
thomeji01 612
sosasa01 609
robinfr02 586
mcgwima01 583
killeha01 573
palmera01 569
jacksre01 563
ramirma02 555
schmimi01 548
ortizda01 541
mantlmi01 536
foxxji01 534
mccovwi01 521
thomafr04 521
willite01 521
bankser01 512
matheed01 512
ottme01 511
What am I doing wrong? I'm sure it's how I'm joining my second table (AwardsPlayers)
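I think you have two independent dimensions: each player's Battings rows are joined to every one of his AwardsPlayers rows, so both aggregates are inflated before grouping. For bondsba01, 762 home runs × 47 awards = 35,814, and 22 Battings rows × 47 awards = 1,034. The best approach is to aggregate before joining: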
SELECT TOP 25 p.PlayerId, b.HomeRuns, ap.cnt
FROM Peoples p LEFT JOIN
(SELECT b.PlayerId, SUM(b.HomeRuns) as HomeRuns
FROM Battings b
GROUP BY b.PlayerId
) b
ON b.PlayerId = p.PlayerId LEFT JOIN
(SELECT ap.PlayerId, COUNT(*) as cnt
FROM AwardsPlayers ap
GROUP BY ap.PlayerId
) ap
ON ap.PlayerId = p.PlayerId
ORDER BY b.HomeRuns desc;
Result
bondsba01 762 47
aaronha01 755 16
ruthba01 714 33
rodrial01 696 31
mayswi01 660 32
pujolal01 633 23
griffke02 630 29
thomeji01 612 6
sosasa01 609 16
robinfr02 586 15
mcgwima01 583 9
killeha01 573 8
palmera01 569 8
jacksre01 563 13
ramirma02 555 19
schmimi01 548 33
ortizda01 541 18
mantlmi01 536 15
foxxji01 534 22
mccovwi01 521 10
thomafr04 521 10
willite01 521 39
bankser01 512 10
matheed01 512 4
ottme01 511 11
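To see the fan-out effect in isolation, here is a sketch using Python's built-in sqlite3 with a toy dataset (invented for illustration, not real baseball data). SQLite has no TOP, so LIMIT is used instead, and COALESCE is added so players without awards show 0 rather than NULL; otherwise the two queries follow the question and the answer.

```python
import sqlite3

# The question's query: both one-to-many joins happen before aggregation
NAIVE_SQL = """
SELECT p.PlayerId, SUM(b.HomeRuns), COUNT(ap.PlayerId)
FROM Peoples p
JOIN Battings b ON b.PlayerId = p.PlayerId
JOIN AwardsPlayers ap ON ap.PlayerId = b.PlayerId
GROUP BY p.PlayerId
ORDER BY SUM(b.HomeRuns) DESC
LIMIT 25
"""

# The answer's query: aggregate each table first, then join one row per player
FIXED_SQL = """
SELECT p.PlayerId, b.HomeRuns, COALESCE(ap.cnt, 0)
FROM Peoples p
LEFT JOIN (SELECT PlayerId, SUM(HomeRuns) AS HomeRuns
           FROM Battings GROUP BY PlayerId) b ON b.PlayerId = p.PlayerId
LEFT JOIN (SELECT PlayerId, COUNT(*) AS cnt
           FROM AwardsPlayers GROUP BY PlayerId) ap ON ap.PlayerId = p.PlayerId
ORDER BY b.HomeRuns DESC
LIMIT 25
"""


def make_db():
    """In-memory toy database: 'a' has 2 seasons and 3 awards, 'b' 1 season, no awards."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Peoples (PlayerId TEXT PRIMARY KEY);
        CREATE TABLE Battings (PlayerId TEXT, HomeRuns INTEGER);
        CREATE TABLE AwardsPlayers (PlayerId TEXT, AwardName TEXT);
        INSERT INTO Peoples VALUES ('a'), ('b');
        INSERT INTO Battings VALUES ('a', 10), ('a', 20), ('b', 5);
        INSERT INTO AwardsPlayers VALUES ('a', 'MVP'), ('a', 'Gold Glove'), ('a', 'All-Star');
    """)
    return con


con = make_db()
print(con.execute(NAIVE_SQL).fetchall())  # 2 seasons x 3 awards = 6 joined rows for 'a'
print(con.execute(FIXED_SQL).fetchall())
```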

How to get multiresult with multicondition in Sql Server

I need help solving this.
Hopefully someone can give me some advice.
For example, I have data like this:
PROCLIB.MARCH 1
First 10 Rows Only
Flight Date Depart Orig Dest Miles Boarded Capacity
-----------------------------------------------------------------
114 01MAR94 7:10 LGA LAX 2475 172 210
202 01MAR94 10:43 LGA ORD 740 151 210
219 01MAR94 9:31 LGA LON 3442 198 250
622 01MAR94 12:19 LGA FRA 3857 207 250
132 01MAR94 15:35 LGA YYZ 366 115 178
271 01MAR94 13:17 LGA PAR 3635 138 250
302 01MAR94 20:22 LGA WAS 229 105 180
114 02MAR94 7:10 LGA LAX 2475 119 210
202 02MAR94 10:43 LGA ORD 740 120 210
219 02MAR94 9:31 LGA LON 3442 147 250
And I have conditions for ('LAX,ORD'), 'LAX', 'LON', 'YYZ', ('PAR,LON,FRA'), 'FRA', ... and so on.
How can I write the SQL so that the report shows only the rows matching the chosen condition?
The parameters I made are:
Dest like #dest -> (from table condition(('LAX, ORD'), 'LAX','LON',('PAR,LON,FRA'),'FRA',..etc)) +'%'
And Date like #date + '%'
And Depart like #depart + '%'
If I choose 'LAX' as #dest, then only 'LAX' will show
If I choose 'LAX,ORD' as #dest, then only 'LAX' and 'ORD' will show
I'd appreciate any help, advice, or suggestions.
Thanks
If your #dest value is 'LAX,ORD', one query that solves your problem is:
select *
from PROCLIB.MARCH
where dest in ('LAX','ORD')
To parameterise that, you need it to become a table.
Dest
====
LAX
ORD
and the query becomes
select *
from PROCLIB.MARCH
where dest in (select Dest from #DestTable)
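If you want to pass #dest as a string parameter, then you need to split it into a table somehow. On SQL Server 2016 and later, the built-in STRING_SPLIT(string, separator) function does this and returns a one-column table named value; on older versions, a search for "SQL split function" will give a number of options.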
If your query is encapsulated in a stored procedure, a better method is to pass the values as a table valued parameter. See http://blog.sqlauthority.com/2008/08/31/sql-server-table-valued-parameters-in-sql-server-2008/
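The "split the string into a table, then filter with IN (SELECT ...)" idea can be sketched client-side in Python with sqlite3. This is not SQL Server: SQLite has no STRING_SPLIT, so the split happens in Python and the codes go into a temporary table. flights_to and dest_filter are hypothetical names, and only a few columns and rows of PROCLIB.MARCH are copied.

```python
import sqlite3

# A few rows from PROCLIB.MARCH in the question
FLIGHTS = [
    (114, "01MAR94", "7:10", "LGA", "LAX"),
    (202, "01MAR94", "10:43", "LGA", "ORD"),
    (219, "01MAR94", "9:31", "LGA", "LON"),
    (622, "01MAR94", "12:19", "LGA", "FRA"),
    (132, "01MAR94", "15:35", "LGA", "YYZ"),
    (271, "01MAR94", "13:17", "LGA", "PAR"),
]


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE march (Flight INTEGER, Date TEXT, Depart TEXT, Orig TEXT, Dest TEXT)")
    con.executemany("INSERT INTO march VALUES (?, ?, ?, ?, ?)", FLIGHTS)
    return con


def flights_to(con, dest_param):
    """Return flight numbers whose Dest is in a comma-separated list such as 'LAX,ORD'."""
    # Turn the string parameter into a one-column table of destination codes
    codes = [c.strip() for c in dest_param.split(",") if c.strip()]
    con.execute("CREATE TEMP TABLE IF NOT EXISTS dest_filter (Dest TEXT)")
    con.execute("DELETE FROM dest_filter")
    con.executemany("INSERT INTO dest_filter VALUES (?)", [(c,) for c in codes])
    rows = con.execute(
        "SELECT Flight FROM march WHERE Dest IN (SELECT Dest FROM dest_filter) ORDER BY Flight"
    ).fetchall()
    return [r[0] for r in rows]
```

A single code such as 'LAX' and a list such as 'PAR,LON,FRA' go through the same path, which is the main benefit of treating the parameter as a table.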