I am trying to rewrite a tensorflow script in pytorch. I have a problem finding the equivalent part in torch for the following line from this script:
import tensorflow_probability as tfp
tfd = tfp.distributions
a_distribution = tfd.TransformedDistribution(
distribution=tfd.Normal(loc=0.0, scale=1.0),
bijector=tfp.bijectors.Chain([
tfp.bijectors.AffineScalar(shift=self._means,
scale=self._mags),
tfp.bijectors.Tanh(),
tfp.bijectors.AffineScalar(shift=mean, scale=std),
]),
event_shape=[mean.shape[-1]],
batch_shape=[mean.shape[0]])
In particular, I am struggling to replace the tfp.bijectors.Chain component.
I wrote the following lines in torch, but I am wondering whether they are equivalent to the tensorflow code above, and whether I can specify the batch_shape somewhere?
base_distribution = torch.normal(0.0, 1.0)
transforms = torch.distributions.transforms.ComposeTransform([torch.distributions.transforms.AffineTransform(loc=self._action_means, scale=self._action_mag, event_dim=mean.shape[-1]), torch.nn.Tanh(),torch.distributions.transforms.AffineTransform(loc=mean, scale=std, event_dim=mean.shape[-1])])
a_distribution = torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms)
Any solution?
In PyTorch, the base distribution class Distribution expects both a batch_shape and an event_shape parameter. Now notice that the subclass TransformedDistribution does not take such parameters (see its source code). That's because they are inferred from the base distribution provided on initialization.
You already found AffineTransform and ComposeTransform. Keep in mind that you must stick with classes from torch.distributions:
torch.normal should be replaced with Normal. With this class, the shape is inferred from the provided loc and scale tensors.
nn.Tanh should be replaced with TanhTransform.
Here is a minimal example using your transformation pipeline:
Imports:
import torch
from torch.distributions.normal import Normal
from torch.distributions import transforms as tT
from torch.distributions.transformed_distribution import TransformedDistribution
Parameters:
mean = torch.rand(2,2)
std = 1
_action_means, _action_mag = 0, 1
event_dim = 1  # number of event dimensions (not their size): the last dim of mean is the event
Distribution definition:
a_distribution = TransformedDistribution(
    base_distribution=Normal(loc=torch.full_like(mean, 0),
                             scale=torch.full_like(mean, 1)),
    transforms=tT.ComposeTransform([
        tT.AffineTransform(loc=_action_means, scale=_action_mag, event_dim=event_dim),
        tT.TanhTransform(),
        tT.AffineTransform(loc=mean, scale=std, event_dim=event_dim)]))
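One thing worth double-checking when porting: tfp.bijectors.Chain applies its list from last to first (Chain([f, g, h]) computes f(g(h(x)))), whereas ComposeTransform applies its parts in list order. So keeping the same list order in both frameworks may give a different transformation. Below is a minimal plain-Python sketch of the two conventions on scalars. The helper names and the numeric values are made up for illustration; this is not either library's implementation.

```python
import math


def affine(shift, scale):
    """Return the scalar map x -> shift + scale * x."""
    return lambda x: shift + scale * x


def apply_in_list_order(fns, x):
    """ComposeTransform-style: the first element of the list is applied first."""
    for fn in fns:
        x = fn(x)
    return x


def apply_last_to_first(fns, x):
    """Chain-style: the last element of the list is applied first."""
    return apply_in_list_order(list(reversed(fns)), x)


# Hypothetical values, for illustration only.
action_mean, action_mag = 0.5, 2.0
mean, std = 0.3, 0.7
pipeline = [affine(action_mean, action_mag), math.tanh, affine(mean, std)]

x = 0.1
chain_style = apply_last_to_first(pipeline, x)             # action_mean + action_mag * tanh(mean + std * x)
same_list_compose = apply_in_list_order(pipeline, x)       # mean + std * tanh(action_mean + action_mag * x)
reversed_list_compose = apply_in_list_order(pipeline[::-1], x)

print(chain_style, same_list_compose, reversed_list_compose)
```

If the goal is to reproduce the TensorFlow distribution exactly, this suggests passing the transforms to ComposeTransform in the reverse order of the Chain list.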
I have a code using tensorflow v1 and I'd like to migrate it toward native tensorflow 2.
The code defines random objects (using numpy.random or random), a neural network (Keras weight initialization etc.) and uses other TensorFlow random functions. At the end, it makes predictions on a random test set and outputs the loss/accuracy of the model.
For this task, I have the original code and a copy of it, and I'm changing the copy part by part. I want to make sure that the behaviour stays the same, so I want to fix the randomness so that I can monitor whether the loss/accuracy change.
However, even after setting the seeds of the various random modules in my original file, launching it multiple times still gives different loss/accuracy values.
Here are my libraries:
import time
import random
import my_file as mf  # file in directory scope
import numpy as np
import copy
import os
from matplotlib import pyplot as plt
import tensorflow.compat.v1 as tf
and I'm setting the seeds at the beginning like that :
tf.set_random_seed(42)
random.seed(42)
np.random.seed(42)
My module my_file uses the random library and I'm also setting the seed there
I do understand from the docs that tf.set_random_seed only sets the global seed and that each random operation in tensorflow is also using its own seed, resulting in different behaviors for consecutive calls. For example if I call the training/testing cell 3 times I get the consecutive value of losses L1 -> L2 -> L3
However, this should still result in the same behavior if I restart the environment so why isn't it the case ? If I restart the kernel and execute 3 times I will get L1' =/= L1 -> L2' =/= L2 -> L3' =/= L3
What else should I verify to make sure the behaviour is the same every time I restart the notebook kernel?
Hi, I've been trying to get a TFX Pipeline going, just as an exercise really. I'm using ImportExampleGen to load TFRecords from disk. Each Example in the TFRecord contains a jpg in the form of a byte string, plus height, width, depth, steering and throttle labels.
I'm trying to use StatisticsGen, but I receive the following warning and my Colab notebook crashes:
WARNING:root:Feature "image_raw" has bytes value "None" which cannot be decoded as a UTF-8 string.
As far as I can tell, none of the byte-string images in the TFRecord are corrupt.
I cannot find concrete examples on StatisticsGen and handling image data. According to the docs Tensorflow Data Validation can deal with image data.
In addition to computing a default set of data statistics, TFDV can also compute statistics for semantic domains (e.g., images, text). To enable computation of semantic domain statistics, pass a tfdv.StatsOptions object with enable_semantic_domain_stats set to True to tfdv.generate_statistics_from_tfrecord.
But I'm not sure how this fits in with StatisticsGen.
Here is the code that instantiates an ImportExampleGen then the StatisticsGen
from tfx.utils.dsl_utils import tfrecord_input
from tfx.components.example_gen.import_example_gen.component import ImportExampleGen
from tfx.proto import example_gen_pb2
examples = tfrecord_input(_tf_record_dir)
# https://www.tensorflow.org/tfx/guide/examplegen#custom_inputoutput_split
# has a good explanation of splitting the data the 'output_config' param
# Input train split is _tf_record_dir/*'
# Output 2 splits: train:eval=8:2.
train_ratio = 8
eval_ratio = 10-train_ratio
output = example_gen_pb2.Output(
split_config=example_gen_pb2.SplitConfig(splits=[
example_gen_pb2.SplitConfig.Split(name='train',
hash_buckets=train_ratio),
example_gen_pb2.SplitConfig.Split(name='eval',
hash_buckets=eval_ratio)
]))
example_gen = ImportExampleGen(input=examples,
output_config=output)
context.run(example_gen)
statistics_gen = StatisticsGen(
examples=example_gen.outputs['examples'])
context.run(statistics_gen)
Thanks in advance.
From git issue response
Thanks Evan Rosen
Hi Folks,
The warnings you are seeing indicate that StatisticsGen is trying to treat your raw image features like a categorical string feature. The image bytes are being decoded just fine. The issue is that when the stats (including top K examples) are being written, the output proto is expecting a UTF-8 valid string, but instead gets the raw image bytes. Nothing is wrong with your setups from what I can tell, but this is just an unintended side-effect of a well-intentioned warning in the event that you have a categorical string feature which can't be serialized. We'll look into finding a better default that handles image data more elegantly.
In the meantime, to tell StatisticsGen that this feature is really an opaque blob, you can pass in a user-modified schema as described in the StatsGen docs. To generate this schema, you can run StatisticsGen and SchemaGen once (on a sample of data) and then modify the inferred schema to annotate the image features. Here is a modified version of the colab from #tall-josh:
Open In Colab
The additional steps are a bit verbose, but having a curated schema is often a good practice for other reasons. Here is the cell that I added to the notebook:
import os
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2
# Load autogenerated schema (using stats from small batch)
schema = tfx.utils.io_utils.SchemaReader().read(
    tfx.utils.io_utils.get_only_uri_in_dir(
        tfx.types.artifact_utils.get_single_uri(schema_gen.outputs['schema'].get())))
# Modify schema to indicate which string features are images.
# Ideally you would persist a golden version of this schema somewhere rather
# than regenerating it on every run.
for feature in schema.feature:
    if feature.name == 'image/raw':
        feature.image_domain.SetInParent()
# Write modified schema to local file
user_schema_dir = '/tmp/user-schema/'
tfx.utils.io_utils.write_pbtxt_file(
    os.path.join(user_schema_dir, 'schema.pbtxt'), schema)
# Create ImportNode to make modified schema available to other components
user_schema_importer = tfx.components.ImporterNode(
    instance_name='import_user_schema',
    source_uri=user_schema_dir,
    artifact_type=tfx.types.standard_artifacts.Schema)
# Run the user schema ImportNode
context.run(user_schema_importer)
Hopefully you find this workaround useful. In the meantime, we'll take a look at a better default experience for image-valued features.
Grokked this and found the solution to be dramatically simpler than I thought...
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
import logging
...
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
...
context = InteractiveContext(pipeline_name='my_pipe')
...
c = StatisticsGen(...)
...
context.run(c)
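Note that raising the root logger to CRITICAL also hides every other warning and error. A narrower alternative is to drop only the noisy record with a logging filter. The sketch below uses only the standard library. It assumes the message is emitted on the root logger (as the "WARNING:root:" prefix in the question suggests) and that its text contains the phrase shown; the class name is made up for illustration. It only silences the message and does not change what StatisticsGen computes.

```python
import logging


class DropUtf8DecodeWarnings(logging.Filter):
    """Drop only records that mention the UTF-8 decoding problem."""

    PHRASE = "cannot be decoded as a UTF-8 string"  # assumed message text, taken from the question

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; everything else passes through.
        return self.PHRASE not in record.getMessage()


root_logger = logging.getLogger()
root_logger.addFilter(DropUtf8DecodeWarnings())
```

A filter attached to a logger only sees records logged directly on that logger, so if the message turns out to come from a named logger, the filter would have to be attached there (or to a handler) instead.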
I want to compare two images for similarity. Since my purpose is to match a given image against a massive collection of images, I want to run the comparisons on GPU.
I came across the tf.image.ssim and tf.image.psnr functions, but I am unable to find any working examples. A solution in PyTorch would also be appreciated. Since I don't have a good understanding of CUDA and the C language, I am hesitant to try kernels in PyCuda.
Will it be helpful in terms of processing if I read the entire image collection and store as Tensorflow Records for future processing?
Any guidance or solution, greatly appreciated. Thank you.
Edit:- I am matching images of same size only. I don't want to do mere histogram match. I want to do SSIM or PSNR implementation for image similarity. So, I am assuming it would be similar in color, content etc
Check out the example on the tensorflow doc page (link):
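# decode_png expects the encoded file contents, not a path, so read the files first
im1 = tf.image.decode_png(tf.io.read_file('path/to/im1.png'))
im2 = tf.image.decode_png(tf.io.read_file('path/to/im2.png'))
print(tf.image.ssim(im1, im2, max_val=255))
This should work with recent versions of tensorflow (eager execution). In older, graph-mode versions tf.image.ssim returns a symbolic tensor (print will not give you a value), but you can evaluate it in a session, e.g. with sess.run(...) or .eval().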
There is no implementation of PSNR or SSIM in PyTorch. You can either implement them yourself or use a third-party package, like piqa which I have developed.
Assuming you already have torch and torchvision installed, you can get it with
pip install piqa
Then for the image comparison
import torch
from torchvision import transforms
from PIL import Image
im1 = Image.open('path/to/im1.png')
im2 = Image.open('path/to/im2.png')
transform = transforms.ToTensor()
x = transform(im1).unsqueeze(0).cuda() # .cuda() for GPU
y = transform(im2).unsqueeze(0).cuda()
from piqa import PSNR, SSIM
psnr = PSNR()
ssim = SSIM().cuda()
print('PSNR:', psnr(x, y))
print('SSIM:', ssim(x, y))
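For reference, PSNR on its own is simple enough to write by hand: it is 10 * log10(max_val^2 / MSE). The sketch below is plain Python on flat lists of pixel values, only to show the formula; the function name and the sample values are made up, and max_val=1.0 is assumed because ToTensor scales images to [0, 1]. It is not piqa's implementation and it does not run on the GPU.

```python
import math


def psnr(x, y, max_val=1.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Hypothetical flattened images with values in [0, 1]
flat_a = [0.0, 0.5, 1.0, 0.25]
flat_b = [0.0, 0.5, 0.0, 0.25]
print('PSNR:', psnr(flat_a, flat_b))
```

The same arithmetic carries over to tensors by replacing the sum with a mean over the squared difference, which is what lets a library evaluate it on the GPU for a whole batch at once.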
Background
I have been playing around with Deep Dream and Inceptionism, using the Caffe framework to visualize layers of GoogLeNet, an architecture built for the Imagenet project, a large visual database designed for use in visual object recognition.
You can find Imagenet here: Imagenet 1000 Classes.
To probe into the architecture and generate 'dreams', I am using three notebooks:
https://github.com/google/deepdream/blob/master/dream.ipynb
https://github.com/kylemcdonald/deepdream/blob/master/dream.ipynb
https://github.com/auduno/deepdraw/blob/master/deepdraw.ipynb
The basic idea here is to extract some features from each channel in a specified layer from the model or a 'guide' image.
Then we input an image we wish to modify into the model and extract the features in the same layer specified (for each octave),
enhancing the best matching features, i.e., the largest dot product of the two feature vectors.
So far I've managed to modify input images and control dreams using the following approaches:
(a) applying layers as 'end' objectives for the input image optimization. (see Feature Visualization)
(b) using a second image to guide the optimization objective on the input image.
(c) visualize Googlenet model classes generated from noise.
However, the effect I want to achieve sits in-between these techniques, of which I haven't found any documentation, paper, or code.
Desired result (not part of the question to be answered)
To have one single class or unit belonging to a given 'end' layer (a) guide the optimization objective (b) and have this class visualized (c) on the input image:
An example where class = 'face' and input_image = 'clouds.jpg':
please note: the image above was generated using a model for face recognition, which was not trained on the Imagenet dataset. For demonstration purposes only.
Working code
Approach (a)
from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import matplotlib as plt
import caffe
model_name = 'GoogLeNet'
model_path = 'models/dream/bvlc_googlenet/' # substitute your path here
net_fn = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('models/dream/bvlc_googlenet/tmp.prototxt', 'w').write(str(model))
net = caffe.Classifier('models/dream/bvlc_googlenet/tmp.prototxt', param_fn,
mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
channel_swap = (2,1,0)) # the reference model has channels in BGR order instead of RGB
def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = StringIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']
def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])
def objective_L2(dst):
    dst.diff[:] = dst.data
def make_step(net, step_size=1.5, end='inception_4c/output',
              jitter=32, clip=True, objective=objective_L2):
    '''Basic gradient ascent step.'''
    src = net.blobs['data'] # input image is stored in Net's 'data' blob
    dst = net.blobs[end]
    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift
    net.forward(end=end)
    objective(dst) # specify the optimization objective
    net.backward(start=end)
    g = src.diff[0]
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g
    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image
    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)
def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4,
              end='inception_4c/output', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    for i in xrange(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))
    src = net.blobs['data']
    detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details
    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)
        src.reshape(1,3,h,w) # resize the network's input image size
        src.data[0] = octave_base+detail
        for i in xrange(iter_n):
            make_step(net, end=end, clip=clip, **step_params)
            # visualization
            vis = deprocess(net, src.data[0])
            if not clip: # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)
            print octave, i, end, vis.shape
            clear_output(wait=True)
        # extract details produced on the current octave
        detail = src.data[0]-octave_base
    # returning the resulting image
    return deprocess(net, src.data[0])
I run the code above with:
end = 'inception_4c/output'
img = np.float32(PIL.Image.open('clouds.jpg'))
_=deepdream(net, img)
Approach (b)
"""
Use one single image to guide
the optimization process.
This affects the style of generated images
without using a different training set.
"""
def dream_control_by_image(optimization_objective, end):
    # this image will shape input img
    guide = np.float32(PIL.Image.open(optimization_objective))
    showarray(guide)
    h, w = guide.shape[:2]
    src, dst = net.blobs['data'], net.blobs[end]
    src.reshape(1,3,h,w)
    src.data[0] = preprocess(net, guide)
    net.forward(end=end)
    guide_features = dst.data[0].copy()
    def objective_guide(dst):
        x = dst.data[0].copy()
        y = guide_features
        ch = x.shape[0]
        x = x.reshape(ch,-1)
        y = y.reshape(ch,-1)
        A = x.T.dot(y) # compute the matrix of dot-products with guide features
        dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best
    _=deepdream(net, img, end=end, objective=objective_guide)
and I run the code above with:
end = 'inception_4c/output'
# image to be modified
img = np.float32(PIL.Image.open('img/clouds.jpg'))
guide_image = 'img/guide.jpg'
dream_control_by_image(guide_image, end)
Question
Now the failed approach: here is how I tried to access individual classes, one-hot encoding the matrix of classes and focusing on one (so far to no avail):
def objective_class(dst, class=50):
    # according to imagenet classes
    #50: 'American alligator, Alligator mississipiensis',
    one_hot = np.zeros_like(dst.data)
    one_hot.flat[class] = 1.
    dst.diff[:] = one_hot.flat[class]
To make this clear: the question is not about the dream code, which is the interesting background and which is already working code, but it is about this last paragraph's question only: Could someone please guide me on how to get images of a chosen class (take class #50: 'American alligator, Alligator mississipiensis') from ImageNet (so that I can use them as input - together with the cloud image - to create a dream image)?
The question is how to get images of the chosen class #50: 'American alligator, Alligator mississipiensis' from ImageNet.
Go to image-net.org.
Go to "Download".
Follow the instructions for "Download Image URLs":
How to download the URLs of a synset from your Brower?
1. Type a query in the Search box and click "Search" button
The alligator is not shown. ImageNet is under maintenance. Only ILSVRC synsets are included in the search results. No problem, we are fine with the similar animal "alligator lizard", since this search is about getting to the right branch of the WordNet treemap. I do not know whether you will get the direct ImageNet images here even if there were no maintenance.
2. Open a synset papge
Scrolling down:
Searching for the American alligator, which happens to be a saurian diapsid reptile as well, as a near neighbour:
3. You will find the "Download URLs" button under the left-bottom corner of the image browsing window.
You will get all of the URLs with the chosen class. A text file pops up in the browser:
http://image-net.org/api/text/imagenet.synset.geturls?wnid=n01698640
We see here that it is just about knowing the right WordNet id that needs to be put at the end of the URL.
Manual image download
The text file looks as follows:
http://farm1.static.flickr.com/136/326907154_d975d0c944.jpg
http://weeksbay.org/photo_gallery/reptiles/American20Alligator.jpg
...
till image number 1261.
As an example, the first URL links to:
And the second is a dead link:
The third link is dead, but the fourth is working.
The images of these URLs are publicly available, but many links are dead, and the pictures are of lower resolution.
Automated image download
From the ImageNet guide again:
How to download by HTTP protocol? To download a synset by HTTP
request, you need to obtain the "WordNet ID" (wnid) of a synset first.
When you use the explorer to browse a synset, you can find the WordNet
ID below the image window.(Click Here and search "Synset WordNet ID"
to find out the wnid of "Dog, domestic dog, Canis familiaris" synset).
To learn more about the "WordNet ID", please refer to
Mapping between ImageNet and WordNet
Given the wnid of a synset, the URLs of its images can be obtained at
http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=[wnid]
You can also get the hyponym synsets given wnid, please refer to API
documentation to learn more.
So what is in that API documentation?
There is everything needed to get all of the WordNet IDs (so called "synset IDs") and their words for all synsets, that is, it has any class name and its WordNet ID at hand, for free.
Obtain the words of a synset
Given the wnid of a synset, the words of
the synset can be obtained at
http://www.image-net.org/api/text/wordnet.synset.getwords?wnid=[wnid]
You can also Click Here to
download the mapping between WordNet ID and words for all synsets,
Click Here to download the
mapping between WordNet ID and glosses for all synsets.
If you know the WordNet ids of choice and their class names, you can use the nltk.corpus.wordnet of "nltk" (natural language toolkit), see the WordNet interface.
In our case, we just need the images of class #50: 'American alligator, Alligator mississipiensis', we already know what we need, thus we can leave the nltk.corpus.wordnet aside (see tutorials or Stack Exchange questions for more). We can automate the download of all alligator images by looping through the URLs that are still alive. We could also widen this to the full WordNet with a loop over all WordNet IDs, of course, though this would take far too much time for the whole treemap - and is also not recommended since the images will stop being there if 1000s of people download them daily.
I am afraid I will not take the time to write this Python code that accepts the ImageNet class number "#50" as the argument, though that should be possible as well, using mapping tables from WordNet to ImageNet. Class name and WordNet ID should be enough.
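As a rough starting point for the class name to WordNet ID step, here is a small sketch that searches a downloaded mapping file for a class name. It assumes the "WordNet ID and words" mapping is a plain text file with one "wnid<TAB>words" pair per line; that format, the function name and the two sample lines are assumptions for illustration, not something stated in the ImageNet guide quoted above.

```python
def find_wnids(mapping_lines, query):
    """Return (wnid, words) pairs whose words contain `query` (case-insensitive)."""
    matches = []
    for line in mapping_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: WordNet ID, a tab, then the comma-separated words
        wnid, _, words = line.partition("\t")
        if query.lower() in words.lower():
            matches.append((wnid, words))
    return matches


# Illustrative sample lines in the assumed format
sample_lines = [
    "n01698640\tAmerican alligator, Alligator mississipiensis\n",
    "n02084071\tdog, domestic dog, Canis familiaris\n",
]
print(find_wnids(sample_lines, "alligator"))
```

With a real mapping file, the lines of an opened file can be passed in place of sample_lines.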
For a single WordNet ID, the code could be as follows:
import urllib.request
import csv
wnid = "n01698640"
url = "http://image-net.org/api/text/imagenet.synset.geturls?wnid=" + str(wnid)
# From https://stackoverflow.com/a/45358832/6064933
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
# Save the list of image URLs for this synset
with open(wnid + ".csv", "wb") as f:
    with urllib.request.urlopen(req) as r:
        f.write(r.read())
failed = []  # collect dead links across the whole loop
with open(wnid + ".csv", "r") as f:
    counter = 1
    for line in f.readlines():
        image_url = line.strip()  # drop the trailing newline before requesting
        print(image_url)
        try:
            with urllib.request.urlopen(image_url) as r2:
                with open(f'''{wnid}_{counter:05}.jpg''', "wb") as f2:
                    f2.write(r2.read())
        except Exception:
            failed.append(f'''{counter:05}, {image_url}''')
        counter += 1
        if counter == 10:  # stop after the first 9 URLs for this demo
            break
with open(wnid + "_failed.csv", "w", newline="") as f3:
    writer = csv.writer(f3)
    writer.writerow(failed)
Result:
If you need the images even behind the dead links and in original quality, and if your project is non-commercial, you can sign in, see "How do I get a copy of the images?" at the Download FAQ.
In the URL above, you see the wnid=n01698640 at the end of the URL which is the WordNet id that is mapped to ImageNet.
Or in the "Images of the Synset" tab, just click on "Wordnet IDs".
To get to:
or right-click -- save as:
You can use the WordNet id to get the original images.
If your project is commercial, I would suggest contacting the ImageNet team.
Add-on
Taking up the idea from a comment: if you do not want many images, but just the "one single class image" that represents the class as well as possible, have a look at Visualizing GoogLeNet Classes, which also uses the deepdream code, and try to use this method with the images of ImageNet instead.
Visualizing GoogLeNet Classes
July 2015
Ever wondered what a deep neural network thinks a Dalmatian should
look like? Well, wonder no more.
Recently Google published a post describing how they managed to use
deep neural networks to generate class visualizations and modify
images through the so called “inceptionism” method. They later
published the code to modify images via the inceptionism method
yourself, however, they didn’t publish code to generate the class
visualizations they show in the same post.
While I never figured out exactly how Google generated their class
visualizations, after butchering the deepdream code and this ipython
notebook from Kyle McDonald, I managed to coach GoogLeNet into drawing
these:
... [with many other example images to follow]