SQL Convert Int to Time

I'm trying to import data from a text file into an SQL database. The file uses TAB rather than a comma to separate fields. My issue on import is that the time, which is given as an int, gets completely messed up.
Part of the text file:
North Felix 2011-07-01 0422 0.47 0012 0.69 2109 0.55 1311 1.44
North Felix 2011-07-02 0459 0.43 0048 0.72 2140 0.55 1342 1.47
North Felix 2011-07-03 0533 0.41 0123 0.75 2213 0.57 1412 1.46
North Felix 2011-07-04 0605 0.41 0158 0.79 2244 0.59 1441 1.41
My query result:
INSERT INTO `dbc`.`history_long` (`Region`,`Location`,`Date`,`LT1-time`,`LT1-height`,`HT1-time`,`HT1-height`,`LT2-time`,`LT2-height`,`HT2-time`,`HT2-height`)
values ('North','Felix','2011:00:00','422:00:00','0.47','12:00:00','0.69','2109:00:00','0.55','1311:00:00','1.44'),
('North','Felix','2011:00:00','459:00:00','0.43','48:00:00','0.72','2140:00:00','0.55','1342:00:00','1.47'),
('North','Felix','2011:00:00','533:00:00','0.41','123:00:00','0.75','2213:00:00','0.57','1412:00:00','1.46'),
('North','Felix','2011:00:00','605:00:00','0.41','158:00:00','0.79','2244:00:00','0.59','1441:00:00','1.41'),
The issue is that, for example, LT2-time becomes 2109:00:00 in the time column. Is there a way to convert this from int to time?

Here's how you could convert an int like 0422 to a time value (the digits are HHMM, so split on 100 rather than 60):
SELECT CAST('00:00' AS time)
+ INTERVAL (#IntValue DIV 100) HOUR   -- 0422 DIV 100 = 4 hours
+ INTERVAL (#IntValue % 100) MINUTE   -- 0422 % 100 = 22 minutes
Without seeing your import query, it's impossible to say how you could actually apply it.
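If adjusting the import query itself proves awkward, another option is to pre-process the file in Python so the HHMM integers become HH:MM:SS strings before they ever reach SQL. This is only a sketch: the file names and the time-column positions (indexes 3, 5, 7, 9, inferred from the sample rows) are assumptions.

import csv

def hhmm_to_time(value):
    # e.g. '0422' -> '04:22:00', '0012' -> '00:12:00'
    v = int(value)
    return f'{v // 100:02d}:{v % 100:02d}:00'

# 'tides.txt' / 'tides_fixed.txt' are hypothetical file names.
with open('tides.txt', newline='') as src, open('tides_fixed.txt', 'w', newline='') as dst:
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter='\t')
    for row in reader:
        for i in (3, 5, 7, 9):   # the four time columns, per the sample layout
            row[i] = hhmm_to_time(row[i])
        writer.writerow(row)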

Related

Can I do complex rollups or sums in Oracle View?

At my job, I need to take some granular data collected in twentieth-of-a-mile segments and roll it up to tenths of a mile. This task is currently done with Python scripts, but I was wondering if I can do it with a materialized view. Here is an example of what the data looks like in its simplest form, and what I would like the view to look like.
Simplest form:
Route Number  Beginning Mile Post  Ending Mile Post  Route Length
001           0                    0.02              105.6
001           0.02                 0.04              105.6
001           0.04                 0.06              105.6
001           0.06                 0.08              105.6
001           0.08                 0.10              105.6
001           0.10                 0.12              105.6
001           0.12                 0.14              105.6
This is what I want the view to produce:
Route Number  Beginning Mile Post  Ending Mile Post  Route Length
001           0                    0.1               528
001           0.1                  0.14              211.2
I have tried using ROLLUP, SUM, MOD, and REMAINDER, but I'm not sure how to use them correctly.
I'm not even sure if this is possible through a view or not.
I will accept all suggestions and ideas.
What you need is to use the TRUNC() function while creating a view, such as:
CREATE OR REPLACE VIEW v_Route AS
SELECT Route_Number,
MIN(TRUNC(Beginning_Mile_Post,1)) AS Beginning_Mile_Post,
MAX(Ending_Mile_Post) AS Ending_Mile_Post,
SUM(Route_Length) AS Route_Length
FROM t
GROUP BY Route_Number, TRUNC(Beginning_Mile_Post,1)
Demo
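Since the question mentions that this rollup is currently done with Python scripts, here is a rough pandas sketch of the same tenth-of-a-mile grouping for comparison; the column names are assumptions taken from the sample table, and it mirrors the view above rather than replacing it.

import pandas as pd

# Sample data shaped like the "simplest form" table above.
df = pd.DataFrame({
    'route_number': ['001'] * 7,
    'beginning_mile_post': [0.00, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12],
    'ending_mile_post':    [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14],
    'route_length': [105.6] * 7,
})

# Truncate the beginning mile post to one decimal (like TRUNC(x, 1)),
# then aggregate each tenth-of-a-mile bucket.
bucket = ((df['beginning_mile_post'] * 10).astype(int) / 10).rename('mile_bucket')
rolled = (df.groupby(['route_number', bucket])
            .agg(beginning_mile_post=('beginning_mile_post', 'min'),
                 ending_mile_post=('ending_mile_post', 'max'),
                 route_length=('route_length', 'sum'))
            .reset_index())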

How to subtract value with next row value

I want to calculate the difference between each row's energy field and the next row's energy value. I tried with my query, but I think the result is still wrong.
Here is my code:
SELECT
datapm.id,
datapm.tgl,
CONVERT ( CHAR ( 5 ), datapm.stamp , 108 ) stamp,
datapm.pmid,
datapm.vavg,
datapm.pf,
( CAST (datapm.energy AS FLOAT) - (select top 1 CAST (energy AS FLOAT) from datapm as dt2 where dt2.id > datapm.id and dt2.tgl=datapm.tgl)) as energy
FROM
datapm
GROUP BY
datapm.id,
datapm.tgl,
datapm.stamp,
datapm.pmid,
datapm.vavg,
datapm.pf,
datapm.energy
ORDER BY tgl desc
My sample data:
id   pmid      tgl         stamp             vavg    pf    energy
787  SDPEXT_2  2021-09-06  06:00:00.0000000  407.82  0.98  1408014.25
788  SDPEXT_2  2021-09-06  07:00:00.0000000  403.31  0.85  1408041.00
789  SDPEXT_2  2021-09-06  08:00:00.0000000  408.82  0.87  1408081.75
The result I want:
id   pmid      tgl         stamp             vavg    pf    energy
787  SDPEXT_2  2021-09-06  06:00:00.0000000  407.82  0.98  -2.675
788  SDPEXT_2  2021-09-06  07:00:00.0000000  403.31  0.85  -4.075
789  SDPEXT_2  2021-09-06  08:00:00.0000000  408.82  0.87  -11.012
Remove the GROUP BY in your query; you are not using any aggregate function.
If energy is already a numeric data type, don't convert it to float.
Use LEAD() to get the next row's value:
SELECT . . .
(d.energy - LEAD (d.energy) OVER (PARTITION BY d.tgl
ORDER BY d.id)) / 10
FROM datapm d
I'm not sure what the actual formula is, but looking at the result, you need to divide by 10 to obtain it.

Pandas: Delete Rows or Interpolate

I'm trying to learn IoT data using time series. The data comes from two different sources. In some measurements, the difference between the sources is very small: one source has 11 rows and the second source has 15 rows. In other measurements, one source has 30 rows and the second source has 240 rows.
I thought to interpolate using:
df.resample('20ms').interpolate()
but saw that it deletes some rows.
Is there any method to interpolate without deleting, or should I delete rows?
EDIT - data and code:
#!/usr/bin/env python3.6
import pandas as pd
import sklearn.preprocessing
from pandas import read_csv
from pandas import datetime
from matplotlib import pyplot
first_df_file_name='interpolate_test.in'
df = read_csv(first_df_file_name, header=0, squeeze=True, delimiter=' ')
print(df.head(5))
idx=0
new_col = pd.date_range('1/1/2011 00:00:00.000000', periods=len(df.index), freq='100ms')
df.insert(loc=idx, column='date', value=new_col)
df.set_index('date', inplace=True)
upsampled = df.resample('20ms').interpolate()
print('20 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_20ms.out')
upsampled = df.resample('60ms').interpolate()
print('60 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_60ms.out')
This is the test input file (interpolate_test.in):
a b
100 200
200 400
300 600
400 800
500 1000
600 1100
700 1200
800 1300
900 1400
1000 2000
Here is the output (parts of it):
// output of interpolating by 20 ms - this is fine
a b
date
2011-01-01 00:00:00.000 100.0 200.0
2011-01-01 00:00:00.020 120.0 240.0
2011-01-01 00:00:00.040 140.0 280.0
2011-01-01 00:00:00.060 160.0 320.0
2011-01-01 00:00:00.080 180.0 360.0
60 ms, num rows 16
// output when interpolating by 60 ms - data is lost
a b
date
2011-01-01 00:00:00.000 100.0 200.0
2011-01-01 00:00:00.060 160.0 320.0
2011-01-01 00:00:00.120 220.0 440.0
2011-01-01 00:00:00.180 280.0 560.0
2011-01-01 00:00:00.240 340.0 680.0
So, should I delete rows from the largest source instead of interpolating? If I'm interpolating, how can I avoid losing data?
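One way to keep every original sample while still adding the finer grid points, sketched below under the assumption that the 'a'/'b' columns from the test file are what matter, is to reindex onto the union of the old and new timestamps and interpolate over that, so no source row is dropped:

import pandas as pd

# Rebuild a small frame shaped like the test file above (100 ms samples).
df = pd.DataFrame({'a': [100, 200, 300, 400], 'b': [200, 400, 600, 800]},
                  index=pd.date_range('2011-01-01', periods=4, freq='100ms'))

# New 60 ms grid spanning the same range.
grid = pd.date_range(df.index[0], df.index[-1], freq='60ms')

# Union keeps every original timestamp; interpolate fills only the new points.
combined = df.reindex(df.index.union(grid)).interpolate(method='time')

# combined now holds both the original 100 ms rows and the 60 ms points;
# drop the originals afterwards if only the regular grid is wanted.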

How can I filter dataframe rows based on a quantile value of a column using groupby?

(There is probably a better way of asking the question, but hopefully this description will make it more clear)
A simplified view of my dataframe, showing 10 random rows, is:
Duration starting_station_id ending_station_id
5163 420 3077 3018
113379 240 3019 3056
9730 240 3047 3074
104058 900 3034 3042
93110 240 3055 3029
93144 240 3016 3014
48999 780 3005 3024
30905 360 3019 3025
88132 300 3022 3048
12673 240 3075 3031
What I want to do is group by starting_station_id and ending_station_id and then filter out the rows where the value in the Duration column for a group falls above the .99 quantile.
To do the groupby and quantile computation, I do:
df.groupby( ['starting_station_id', 'ending_station_id'] )[ 'Duration' ].quantile([.99])
and some partial output is:
3005 3006 0.99 3825.6
3007 0.99 1134.0
3008 0.99 5968.8
3009 0.99 9420.0
3010 0.99 1740.0
3011 0.99 41856.0
3014 0.99 22629.6
3016 0.99 1793.4
3018 0.99 37466.4
What I believe this is telling me is that for the group ( 3005, 3006 ), the values >= 3825.6 fall into the .99 quantile. So, I want to filter out the rows where the duration value for that group is >= 3825.6. (And then do the same for all of the other groups)
What is the best way to do this?
Try this:
thresholds = df.groupby(['starting_station_id', 'ending_station_id'])['Duration'].quantile(.99)
# keep only the rows below their group's 99th-percentile duration
mask = (df.Duration.values < thresholds[[(x, y) for x, y in
        zip(df.starting_station_id, df.ending_station_id)]]).values
out = df[mask]
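An equivalent way to express the same filter, sketched here assuming the column names shown in the question (df is the question's dataframe), is to broadcast each group's threshold back onto its rows with transform, which avoids building the tuple index by hand:

# per-row threshold, index-aligned with df by groupby().transform
q99 = df.groupby(['starting_station_id', 'ending_station_id'])['Duration'] \
        .transform(lambda s: s.quantile(0.99))
out = df[df['Duration'] < q99]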

Generate Pandas DF OHLC data with Numpy

I would like to generate the following test data in my dataframe in a way similar to this:
df = pd.DataFrame(data=np.linspace(1800, 100, 400), index=pd.date_range(end='2015-07-02', periods=400), columns=['close'])
df
close
2014-05-29 1800.000000
2014-05-30 1795.739348
2014-05-31 1791.478697
2014-06-01 1787.218045
But using the following criteria:
intervals of 1 minute
increments of .25
prices moving up and down around 1800.00
maximum 2100.00, minimum 1700.00
parse_dates= "Timestamp"
Volume column values range from a minimum of 50 to a maximum of 300
day start 09:30, day end 16:29:59
Please see desired output:
Open High Low Last Volume
Timestamp
2014-03-04 09:30:00 1783.50 1784.50 1783.50 1784.50 171
2014-03-04 09:31:00 1784.75 1785.75 1784.50 1785.25 28
2014-03-04 09:32:00 1785.00 1786.50 1785.00 1786.50 81
2014-03-04 09:33:00 1786.00
I have limited Python experience and find the NumPy examples hard to follow, as they look to be focused on academia. Is it possible to assist with this?
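A minimal sketch of one way to generate data like this, assuming a simple random walk in 0.25 increments and the column names from the desired output (illustrative test data, not a definitive recipe):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# One trading day of 1-second ticks between 09:30:00 and 16:29:59.
idx = pd.date_range('2014-03-04 09:30:00', '2014-03-04 16:29:59', freq='1s')

# Random walk in 0.25 steps around 1800.00, kept inside the 1700-2100 band.
steps = rng.choice([-0.25, 0.0, 0.25], size=len(idx))
prices = np.clip(1800 + steps.cumsum(), 1700, 2100)
ticks = pd.Series(prices, index=idx, name='price')

# Bucket the ticks into 1-minute OHLC bars and attach a random Volume column.
bars = ticks.resample('1min').ohlc()
bars.columns = ['Open', 'High', 'Low', 'Last']
bars['Volume'] = rng.integers(50, 301, size=len(bars))
bars.index.name = 'Timestamp'
print(bars.head())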