tensorflow Dataset consuming data becomes slower and slower with the increase of epochs - tensorflow

In TensorFlow 1.4, the tf.data.Dataset class provides a repeat() function for running multiple epochs, but consuming the data becomes slower and slower as the epochs go on!
Here is the code:
num_data = 1000
num_epoch = 50
batch_size = 32
dataset = tf.data.Dataset.range(num_data)
dataset = dataset.repeat(num_epoch).batch(batch_size)
iterator = dataset.make_one_shot_iterator()
with tf.Session() as sess:
    for epoch in xrange(num_epoch):
        t1 = time.time()
        for i in xrange(num_data / batch_size):
            a = sess.run(iterator.get_next())
        t2 = time.time()
        print 'epoch %d consuming_time %.4f' % (epoch, t2 - t1)
and its outputs:
epoch 0 consuming_time 0.1604
epoch 1 consuming_time 0.1725
epoch 2 consuming_time 0.1839
epoch 3 consuming_time 0.1942
epoch 4 consuming_time 0.2213
epoch 5 consuming_time 0.2430
epoch 6 consuming_time 0.2361
epoch 7 consuming_time 0.2512
epoch 8 consuming_time 0.2607
epoch 9 consuming_time 0.2936
epoch 10 consuming_time 0.3282
epoch 11 consuming_time 0.2990
epoch 12 consuming_time 0.3105
epoch 13 consuming_time 0.3239
epoch 14 consuming_time 0.3393
epoch 15 consuming_time 0.3518
epoch 16 consuming_time 0.3673
epoch 17 consuming_time 0.3859
epoch 18 consuming_time 0.3928
epoch 19 consuming_time 0.4090
epoch 20 consuming_time 0.4206
epoch 21 consuming_time 0.4333
epoch 22 consuming_time 0.4479
epoch 23 consuming_time 0.4631
epoch 24 consuming_time 0.4774
epoch 25 consuming_time 0.4923
epoch 26 consuming_time 0.5533
epoch 27 consuming_time 0.5187
epoch 28 consuming_time 0.5319
epoch 29 consuming_time 0.5470
epoch 30 consuming_time 0.5647
epoch 31 consuming_time 0.5796
epoch 32 consuming_time 0.6036

I think I have found the problem: it is sess.run(iterator.get_next()). Every call to iterator.get_next() adds a new node to the graph, so the graph keeps growing and each step gets slower. Predefining get_next_op = iterator.get_next() outside the loop and running sess.run(get_next_op) fixes it.
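For reference, a minimal sketch of the corrected loop under the TF 1.x graph API (same Python 2 style and sizes as the question; only the placement of get_next() changes):
import time
import tensorflow as tf

num_data = 1000
num_epoch = 50
batch_size = 32
dataset = tf.data.Dataset.range(num_data)
dataset = dataset.repeat(num_epoch).batch(batch_size)
iterator = dataset.make_one_shot_iterator()
get_next_op = iterator.get_next()   # build the op once, outside the loop

with tf.Session() as sess:
    for epoch in xrange(num_epoch):
        t1 = time.time()
        for i in xrange(num_data / batch_size):
            a = sess.run(get_next_op)   # no new graph nodes are added per step
        t2 = time.time()
        print 'epoch %d consuming_time %.4f' % (epoch, t2 - t1)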

Related

Decreasing loss for epochs but accuracy remains same for multiple epochs before changing

I am building a neural network model to identify blade sharpness based on the cutting force at a given distance after incision. My data is in CSV format and I am using a binary classification model with 2 hidden layers. I only have 45 input data points. When I run my neural network model, the loss decreases but the accuracy remains the same over multiple epochs before changing.
# Imports (not shown in the original question; standalone Keras assumed)
from keras.models import Sequential
from keras.layers import Dense

# Initialising the neural network
Classifier = Sequential()
# Adding the input layer and the first hidden layer
Classifier.add(Dense(units=2, kernel_initializer='he_uniform', activation='relu', input_dim=2))
Classifier.add(Dense(units=2, kernel_initializer='he_uniform', activation='relu'))
# Adding the output layer
Classifier.add(Dense(units=1, kernel_initializer='glorot_uniform', activation='sigmoid'))
Classifier.summary()
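The compile and fit calls are not included in the question; a minimal sketch of what presumably produced the log below, where the optimizer, loss, batch size and the X_train/y_train/X_val/y_val names are assumptions rather than details from the question:
# Hypothetical training step -- only the model definition above comes from the question
Classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = Classifier.fit(X_train, y_train,
                         validation_data=(X_val, y_val),
                         batch_size=len(X_train),  # one step per epoch, matching the "1/1" in the log
                         epochs=2000)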
Epoch 177/2000
1/1 [==============================] - 0s 98ms/step - loss: 0.5921 - accuracy: 0.7222 - val_loss: 0.6642 - val_accuracy: 0.5000
Epoch 178/2000
1/1 [==============================] - 0s 72ms/step - loss: 0.5915 - accuracy: 0.7222 - val_loss: 0.6627 - val_accuracy: 0.5000
Epoch 179/2000
1/1 [==============================] - 0s 83ms/step - loss: 0.5908 - accuracy: 0.7222 - val_loss: 0.6612 - val_accuracy: 0.5000
Epoch 180/2000
1/1 [==============================] - 0s 82ms/step - loss: 0.5902 - accuracy: 0.7222 - val_loss: 0.6597 - val_accuracy: 0.5000
Epoch 181/2000
1/1 [==============================] - 0s 123ms/step - loss: 0.5896 - accuracy: 0.7222 - val_loss: 0.6581 - val_accuracy: 0.5000
Epoch 182/2000
1/1 [==============================] - 0s 77ms/step - loss: 0.5889 - accuracy: 0.7222 - val_loss: 0.6566 - val_accuracy: 0.5000
Epoch 183/2000
1/1 [==============================] - 0s 75ms/step - loss: 0.5883 - accuracy: 0.7500 - val_loss: 0.6550 - val_accuracy: 0.5000
Epoch 184/2000
1/1 [==============================] - 0s 73ms/step - loss: 0.5877 - accuracy: 0.8056 - val_loss: 0.6533 - val_accuracy: 0.5000
Epoch 185/2000
1/1 [==============================] - 0s 83ms/step - loss: 0.5870 - accuracy: 0.8056 - val_loss: 0.6517 - val_accuracy: 0.5000
Epoch 186/2000
1/1 [==============================] - 0s 103ms/step - loss: 0.5864 - accuracy: 0.8056 - val_loss: 0.6500 - val_accuracy: 0.5000
Epoch 187/2000
1/1 [==============================] - 0s 95ms/step - loss: 0.5857 - accuracy: 0.8056 - val_loss: 0.6484 - val_accuracy: 0.5000
Epoch 188/2000
1/1 [==============================] - 0s 69ms/step - loss: 0.5851 - accuracy: 0.8056 - val_loss: 0.6467 - val_accuracy: 0.5000
Epoch 189/2000
1/1 [==============================] - 0s 84ms/step - loss: 0.5845 - accuracy: 0.8056 - val_loss: 0.6450 - val_accuracy: 0.5000
Epoch 190/2000
1/1 [==============================] - 0s 94ms/step - loss: 0.5838 - accuracy: 0.8056 - val_loss: 0.6433 - val_accuracy: 0.5000
Epoch 191/2000
1/1 [==============================] - 0s 86ms/step - loss: 0.5832 - accuracy: 0.8056 - val_loss: 0.6416 - val_accuracy: 0.5000
Epoch 192/2000
1/1 [==============================] - 0s 80ms/step - loss: 0.5825 - accuracy: 0.8056 - val_loss: 0.6399 - val_accuracy: 0.5000
Epoch 193/2000
1/1 [==============================] - 0s 63ms/step - loss: 0.5818 - accuracy: 0.8056 - val_loss: 0.6381 - val_accuracy: 0.5000
Epoch 194/2000
1/1 [==============================] - 0s 79ms/step - loss: 0.5812 - accuracy: 0.8056 - val_loss: 0.6364 - val_accuracy: 0.5000
Epoch 195/2000
1/1 [==============================] - 0s 87ms/step - loss: 0.5805 - accuracy: 0.8056 - val_loss: 0.6347 - val_accuracy: 0.5000
Epoch 196/2000
1/1 [==============================] - 0s 90ms/step - loss: 0.5799 - accuracy: 0.8056 - val_loss: 0.6330 - val_accuracy: 0.5000
Epoch 197/2000
1/1 [==============================] - 0s 83ms/step - loss: 0.5792 - accuracy: 0.8056 - val_loss: 0.6313 - val_accuracy: 0.7500
Epoch 198/2000
1/1 [==============================] - 0s 191ms/step - loss: 0.5785 - accuracy: 0.8333 - val_loss: 0.6296 - val_accuracy: 1.0000
Epoch 199/2000
1/1 [==============================] - 0s 77ms/step - loss: 0.5779 - accuracy: 0.8333 - val_loss: 0.6278 - val_accuracy: 1.0000
Epoch 200/2000
1/1 [==============================] - 0s 122ms/step - loss: 0.5772 - accuracy: 0.8333 - val_loss: 0.6261 - val_accuracy: 1.0000
Epoch 201/2000
1/1 [==============================] - 0s 98ms/step - loss: 0.5765 - accuracy: 0.8333 - val_loss: 0.6244 - val_accuracy: 1.0000
Epoch 202/2000
1/1 [==============================] - 0s 85ms/step - loss: 0.5758 - accuracy: 0.8333 - val_loss: 0.6226 - val_accuracy: 1.0000
Epoch 203/2000
1/1 [==============================] - 0s 107ms/step - loss: 0.5752 - accuracy: 0.8333 - val_loss: 0.6209 - val_accuracy: 1.0000
Epoch 204/2000
1/1 [==============================] - 0s 54ms/step - loss: 0.5745 - accuracy: 0.8333 - val_loss: 0.6192 - val_accuracy: 1.0000
Epoch 205/2000
1/1 [==============================] - 0s 67ms/step - loss: 0.5738 - accuracy: 0.8333 - val_loss: 0.6175 - val_accuracy: 1.0000
Epoch 206/2000
1/1 [==============================] - 0s 125ms/step - loss: 0.5731 - accuracy: 0.8333 - val_loss: 0.6158 - val_accuracy: 1.0000
Epoch 207/2000
1/1 [==============================] - 0s 101ms/step - loss: 0.5725 - accuracy: 0.8333 - val_loss: 0.6140 - val_accuracy: 1.0000
Epoch 208/2000
1/1 [==============================] - 0s 146ms/step - loss: 0.5718 - accuracy: 0.8333 - val_loss: 0.6123 - val_accuracy: 1.0000
Epoch 209/2000
1/1 [==============================] - 0s 218ms/step - loss: 0.5711 - accuracy: 0.8333 - val_loss: 0.6106 - val_accuracy: 1.0000
Epoch 210/2000
1/1 [==============================] - 0s 174ms/step - loss: 0.5704 - accuracy: 0.8333 - val_loss: 0.6088 - val_accuracy: 1.0000

How to select all future closing times?

We are trying to calculate store closing times based upon their opening hours.
We have a data model like the following for each store.
 id | weekday | opening_minute | shift_length
----+---------+----------------+--------------
  1 |       0 |            540 |          990
  2 |       1 |            540 |          990
  3 |       2 |            540 |          990
  4 |       3 |            540 |          990
  5 |       4 |            540 |          990
  6 |       5 |            540 |          990
  7 |       6 |            540 |          990
weekday is the day of the week, where 0 is Sunday.
opening_minute is the opening time where 540 is 09:00.
shift_length is the length of time that the store is open, where 990 is
16.5 hours.
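For reference, a minimal sketch of the schema and sample rows (the column names and values are from the question; the types and constraints are assumptions):
-- Hypothetical reproduction of the opening_hours table described above
CREATE TABLE opening_hours (
    id             integer PRIMARY KEY,
    weekday        integer NOT NULL,  -- 0 = Sunday .. 6 = Saturday
    opening_minute integer NOT NULL,  -- minutes after midnight, 540 = 09:00
    shift_length   integer NOT NULL   -- minutes the store stays open, 990 = 16.5 hours
);

INSERT INTO opening_hours (id, weekday, opening_minute, shift_length)
SELECT g + 1, g, 540, 990
FROM generate_series(0, 6) AS g;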
There are some complexities here when it comes to calculating future closing
times.
We have tried the following SQL statement.
SELECT
    "opening_hours".*,
    date_trunc('week', '2021-04-29'::date - 1)::timestamp +
    make_interval(
        days => weekday,
        mins => opening_minute + shift_length
    ) closes_at
FROM "opening_hours";
This statement gets us halfway there, and the output is as follows.
 id | weekday | opening_minute | shift_length | closes_at
----+---------+----------------+--------------+---------------------
  1 |       0 |            540 |          990 | 2021-04-28 01:30:00
  2 |       1 |            540 |          990 | 2021-04-29 01:30:00
  3 |       2 |            540 |          990 | 2021-04-30 01:30:00
  4 |       3 |            540 |          990 | 2021-05-01 01:30:00
  5 |       4 |            540 |          990 | 2021-05-02 01:30:00
  6 |       5 |            540 |          990 | 2021-05-03 01:30:00
  7 |       6 |            540 |          990 | 2021-05-04 01:30:00
We are making progress, but the output includes closing times for days in the
past. We want the output only to include future closing times.
We've been going round and round in circles with this. How can we output only
the future closing times? For example, if it is 08:30 on 29 April 2021, we would
expect the following output.
 id | weekday | opening_minute | shift_length | closes_at
----+---------+----------------+--------------+---------------------
  3 |       2 |            540 |          990 | 2021-04-30 01:30:00
  4 |       3 |            540 |          990 | 2021-05-01 01:30:00
  5 |       4 |            540 |          990 | 2021-05-02 01:30:00
  6 |       5 |            540 |          990 | 2021-05-03 01:30:00
  7 |       6 |            540 |          990 | 2021-05-04 01:30:00
  1 |       0 |            540 |          990 | 2021-05-05 01:30:00
  2 |       1 |            540 |          990 | 2021-05-06 01:30:00
Is there a problem with adding a WHERE clause along the lines of
WHERE (repeat the complex closing-time calculation) > LOCALTIMESTAMP
assuming the date-times in the calculation are not encumbered with time zone problems?
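A sketch of that suggestion, using a derived table so the calculation is not written twice (the closes_at alias cannot be referenced directly in WHERE). Note that this only filters out past rows from the current week; it does not roll earlier weekdays forward into next week, which the generate_series approach below handles:
SELECT *
FROM (
    SELECT
        "opening_hours".*,
        date_trunc('week', '2021-04-29'::date - 1)::timestamp +
        make_interval(
            days => weekday,
            mins => opening_minute + shift_length
        ) closes_at
    FROM "opening_hours"
) t
WHERE closes_at > LOCALTIMESTAMP
ORDER BY closes_at;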
You can use generate_series to generate the dates you want to calculate closing times for, and then simply JOIN opening_hours on their "day of week" number:
SELECT
    *
    , day.date + (opening_minute + shift_length) * ('1 minute')::interval AS closes_at
FROM
    (SELECT '2021-04-30'::date + generate_series(0, 6) AS date) AS day
    JOIN opening_hours ON date_part('dow', day.date) = opening_hours.weekday
ORDER BY
    closes_at
You can check it in db<>fiddle.
Instead of '2021-04-30' you can pass any date from which you want to generate the next 7 closing times.
If you want 7 closing times from now that are strictly in the future, you should generate 8 dates (as the closing time for current_date might already be in the past), then filter and display the first 7 results:
SELECT
    *
    , day.date + (opening_minute + shift_length) * ('1 minute')::interval AS closes_at
FROM
    (SELECT current_date + generate_series(0, 7) AS date) AS day
    JOIN opening_hours ON date_part('dow', day.date) = opening_hours.weekday
WHERE
    day.date + (opening_minute + shift_length) * ('1 minute')::interval > now()
ORDER BY
    closes_at
LIMIT 7

How to get data and then convert it into date-specific lines via awk?

I need to convert the first six values on the EPOCH OF CURRENT MAP lines into a date.
I still haven't understood the logic of awk.
The file looks as follows...
2018 1 2 0 0 0 EPOCH OF CURRENT MAP
Data
2018 1 2 1 0 0 EPOCH OF CURRENT MAP
Data
2018 1 2 2 0 0 EPOCH OF CURRENT MAP
Data
2018 1 2 3 0 0 EPOCH OF CURRENT MAP
Data
2018 1 2 4 0 0 EPOCH OF CURRENT MAP
Data
2018 1 2 5 0 0 EPOCH OF CURRENT MAP
Data
2018 1 2 6 0 0 EPOCH OF CURRENT MAP
...
I tried
awk '/EPOCH OF CURRENT MAP/ {printf "%s %s %s %s %s %s\n", $1,$2,$3,$4,$5,$6}' codg0010.18i
The output date format should be day, abbreviated month name, year, then hour:minute:second, excluding the "EPOCH OF CURRENT MAP" text:
02-Jan-2018 00:00:00
02-Jan-2018 01:00:00
02-Jan-2018 02:00:00
...
with GNU awk
$ awk '/EPOCH OF CURRENT MAP/{print strftime("%d-%b-%Y %H:%M:%S",mktime($1" "$2" "$3" "$4" "$5" "$6))}' file
02-Jan-2018 00:00:00
02-Jan-2018 01:00:00
02-Jan-2018 02:00:00
02-Jan-2018 03:00:00
02-Jan-2018 04:00:00
02-Jan-2018 05:00:00
02-Jan-2018 06:00:00
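If strftime and mktime are not available (they are GNU awk extensions), the same output can be produced by formatting the already-split fields with a month-name lookup; a sketch assuming the same file layout:
awk '/EPOCH OF CURRENT MAP/ {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    printf "%02d-%s-%04d %02d:%02d:%02d\n", $3, month[$2], $1, $4, $5, $6
}' codg0010.18i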

Hive rows generation from a column value

How can I generate rows in Hive based on a single column value? For example,
I have the table data below. I need to generate similar rows (all other fields the same), but with every rating value from the original rating down to 0.01.
item_id cost rating
23 1290 0.08
14 1498 0.06
I need the output as shown below.
item_id cost rating
23 1290 0.08
23 1290 0.07
23 1290 0.06
23 1290 0.05
23 1290 0.04
23 1290 0.03
23 1290 0.02
23 1290 0.01
14 1498 0.06
14 1498 0.05
14 1498 0.04
14 1498 0.03
14 1498 0.02
14 1498 0.01
For example, like this:
with initial_data as (
    select stack(2,
        23, 1290, 0.08,
        14, 1498, 0.06
    ) as (item_id, cost, rating)
)
select item_id, cost, (i+1)/100 as rating
from
(
    select d.*, cast(d.rating*100 as int)-1 as n  -- n+1 rows will be generated per item
    from initial_data d
) s lateral view posexplode(split(space(s.n), ' ')) e as i, x  -- generate rows with numbers (i)
order by item_id desc, rating desc;  -- remove the ordering for faster processing if you do not need ordered output
Result:
OK
23 1290 0.08
23 1290 0.07
23 1290 0.06
23 1290 0.05
23 1290 0.04
23 1290 0.03
23 1290 0.02
23 1290 0.01
14 1498 0.06
14 1498 0.05
14 1498 0.04
14 1498 0.03
14 1498 0.02
14 1498 0.01
Time taken: 74.993 seconds, Fetched: 14 row(s)
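The row generation relies on space(n) returning a string of n spaces, so split(..., ' ') yields n+1 elements and posexplode numbers them 0 through n. A minimal illustration of just that idiom (assuming a Hive version that allows SELECT without FROM, as the CTE above already does):
-- space(7) is a string of 7 spaces; split by ' ' gives 8 elements; posexplode yields i = 0..7
select i
from (select 7 as n) t
lateral view posexplode(split(space(t.n), ' ')) e as i, x;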

awk compare 2 files, find min/max and store it

I have 3 files:
File_1 is static, the content is not changing; values can be -160 to 0:
xdslcmd: ADSL driver and PHY status
Status: Idle
Retrain Reason: 0
Tone number QLN
0 0.0000
1 0.0000
2 0.0000
3 0.0000
4 0.0000
5 0.0000
6 0.0000
7 -160.0000
8 -119.2000
9 -128.6700
10 -113.1200
11 -93.1000
12 -130.0000
13 -120.0000
14 -110.0000
15 -100.0000
16 -90.0000
17 -100.0000
18 -110.0000
19 -120.0000
20 -130.0000
21 -140.0000
22 -110.0000
23 0.0000
24 0.0000
File_2 looks like File_1 but the values change every time (values can be -160 to 0):
xdslcmd: ADSL driver and PHY status
Status: Idle
Retrain Reason: 0
Tone number QLN
0 0.0000
1 0.0000
2 0.0000
3 0.0000
4 0.0000
5 0.0000
6 0.0000
7 -160.0000
8 -159.2000
9 -148.6700
10 -123.1200
11 -83.1000
12 -100.0000
13 -100.0000
14 -100.0000
15 -80.0000
16 -80.0000
17 -110.0000
18 -120.0000
19 -130.0000
20 -140.0000
21 -150.0000
22 -100.0000
23 0.0000
24 0.0000
I want to compare File_2 $2 to File_1 $2 and store the difference between them in File_3.
Example:
File_1 contains: 18 -120.0000
File_2 contains: 18 -140.0000
Expected output: 18 -20 0
File_1 contains the base values (considered as "0")
File_2 changes every time and holds the actual values.
The expected output is the min/max difference from the base values during the measurement.
It is possible that, for the same tone, the QLN can be both higher and lower during the measurement:
File_1 contains: 18 -120.0000
File_2 contains: 18 -140.0000
File_2 contains: 18 -100.0000 (in a later query)
Expected output: 18 -20 +20
File_1 and File_2 are sorted; the first 5 lines are not relevant.
awk 'FNR<6{next}NR==FNR{a[$1]=$2;next}{printf "%s\t%10f\n",$1,$2-a[$1]}' f1 f2
0  0.000000
1  0.000000
2  0.000000
3  0.000000
4  0.000000
5  0.000000
6  0.000000
7  0.000000
8 -40.000000
9 -20.000000
10 -10.000000
11 10.000000
12 30.000000
13 20.000000
14 10.000000
15 20.000000
16 10.000000
17 -10.000000
18 -10.000000
19 -10.000000
20 -10.000000
21 -10.000000
22 10.000000
23  0.000000
24  0.000000
Non-zero differences:
awk 'FNR<6{next}NR==FNR{a[$1]=$2;next}d=$2-a[$1]{printf "%s\t%10f\n",$1,d}' f1 f2
8 -40.000000
9 -20.000000
10 -10.000000
11 10.000000
12 30.000000
13 20.000000
14 10.000000
15 20.000000
16 10.000000
17 -10.000000
18 -10.000000
19 -10.000000
20 -10.000000
21 -10.000000
22 10.000000
The use of printf means you can change the format of the output, for instance to only two decimal places: printf "%s\t%10.2f\n", $1, d.
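The commands above only print the differences for the current snapshot of File_2. To also keep the running min/max per tone that the question wants stored in File_3, one option is to feed the previous File_3 back in and rewrite it on each measurement. A sketch, assuming File_3 holds "tone min max", starts out empty, and the file names f1, f2, f3 plus the two-decimal format are placeholders:
touch f3   # make sure File_3 exists before the first run
awk 'FNR < 6 && (FILENAME == ARGV[1] || FILENAME == ARGV[2]) { next }  # skip the 5 header lines of File_1/File_2
     FILENAME == ARGV[1] { base[$1] = $2; next }                       # File_1: base values
     FILENAME == ARGV[2] { d[$1] = $2 - base[$1]; next }               # File_2: current differences
     FILENAME == ARGV[3] { min[$1] = $2; max[$1] = $3 }                # File_3: previously stored min/max
     END {
         for (t in d) {
             if (!(t in min) || d[t] < min[t]) min[t] = d[t]
             if (!(t in max) || d[t] > max[t]) max[t] = d[t]
             printf "%s\t%.2f\t%.2f\n", t, min[t], max[t]
         }
     }' f1 f2 f3 | sort -n > f3.tmp && mv f3.tmp f3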