Tensorflow default data format in GPU

I'm working with TensorFlow on a GPU, and I'm curious about the data format of tensors. I understand that data is stored on the GPU/CPU in row-major order.
However, if I want the data stored in column-major order for one operation (Op), can I change the data format only for that Op? (E.g., is there an option I can pass to the function that changes the data layout?)
For instance, the matmul operation has options related to transposition. Does transposing the matrix change the data format (column-major / row-major)?
Thanks.

Yes, the default data format is row-major, which is the opposite of Eigen's default (column-major).
If you are using Python, you will need to transpose your data to simulate a column-major layout. When using C++, nothing prevents you from choosing the storage order yourself via Eigen's template parameter (Eigen::RowMajor or Eigen::ColMajor).
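To make the Python case concrete, a minimal sketch (the values are just for intuition):

import tensorflow as tf

# A 2x3 tensor stored row-major: memory order is 1 2 3 4 5 6.
a = tf.constant([[1, 2, 3],
                 [4, 5, 6]])

# tf.transpose materializes a new (still row-major) tensor whose
# memory order, 1 4 2 5 3 6, matches the column-major order of `a`.
a_colmajor = tf.transpose(a)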
The matmul op has the options transpose_a and transpose_b because (cu)BLAS can handle both formats without an explicit transpose; see GEMM, for example. So it will not change your data format. It is only a trick to avoid launching additional CUDA kernels (or other functions) beforehand, minimizing run-time.
This is part of the BLAS specification; see LAPACK, for example.
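For concreteness, a minimal sketch of the transpose flags (shapes are arbitrary):

import tensorflow as tf

a = tf.random.normal([128, 64])
b = tf.random.normal([128, 32])

# Both lines compute transpose(a) @ b. The first materializes the
# transpose with an extra kernel launch; the second merely forwards
# a transpose flag to the underlying (cu)BLAS GEMM call, so the data
# layout of `a` is never touched.
c1 = tf.matmul(tf.transpose(a), b)
c2 = tf.matmul(a, b, transpose_a=True)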

Related

Label encoding in dask_cudf dataframe

I am trying to use dask_cudf to preprocess a very large dataset (150,000,000+ records) for multi-class XGBoost training, and I am having trouble encoding the class column (dtype is string). I have tried:
the 'replace' function, but the error message said the two dtypes must match;
dask_ml.LabelEncoder, but it said string arrays aren't supported in cudf;
compute() in various ways, but I kept running into out-of-memory errors (I'm assuming because operations on a cudf dataframe require a smaller dataset);
pulling the class column out, encoding it, and then merging it back into the dataframe, but the partitions do not line up. I tried lining them up manually, but dask_cudf seemingly does not support repartitioning with the 'divisions' parameter (I got an error saying something like 'old and new partitions do not match').
Any help on how to do this would be much appreciated.
Strings aren't supported by XGBoost. Not having seen your data, here are a few quick-and-dirty ways I've modified string columns for training, as generally the strings themselves may not matter:
If the strings are actually numeric (like dates), convert them to int (int8, int16, int32).
I did this by hash-mapping the strings (basically creating a reversible conversion between string and integer, as long as you don't change the integers) and then running XGBoost on the current column, now hashed as an integer.
If the strings are classes, manually assign class numbers (0, 1, 2, ..., n) in a new column and train on that one; a sketch of this approach follows below.
There are definitely other, better ways. As for the second part of your question, I left a comment.
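A minimal sketch of the class-number approach with dask_cudf (the file pattern and the column name 'label' are placeholders, and I'm assuming dask's categorize() works on dask_cudf dataframes as it does on plain dask ones):

import dask_cudf

ddf = dask_cudf.read_csv("data/*.csv")  # placeholder path

# categorize() scans the data once so the set of categories is
# consistent across partitions; cat.codes then maps each string
# class to an integer in 0..n-1, which XGBoost can consume.
ddf = ddf.categorize(columns=["label"])
ddf["label_encoded"] = ddf["label"].cat.codes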
Now, your XGBoost model and each per-GPU partition of your dask_cudf dataframe must fit on a single GPU, or you will get memory errors. If your model will be considering a large amount of data, please train on the cluster with the largest GPU memory you can get: A100s come with 40 GB or 80 GB; some older compute GPUs (V100, GV100) have 32 GB; the A6000 and RTX 8000 have 48 GB; from there it goes to 24 GB, 16 GB, and lower. Please size your GPUs accordingly.

Best way to convert TensorProto to TensorFlow tensor

As far as I can tell, there are at least two different ways to recover a Tensor from a TensorProto in Tensorflow 2.3. Say, for the sake of example, that we have
tensor = tf.range(10)
tproto = tf.make_tensor_proto(tensor)
Then:
You can use tf.make_ndarray like so
tf.constant(tf.make_ndarray(tproto))
Or you can use tf.io.parse_tensor like so
tf.io.parse_tensor(tproto.SerializeToString(), out_type=tf.int32)
I feel both of these are a bit artificial, since in the former you end up with an intermediate numpy array, and in the latter you have to serialize the TensorProto to a string and parse it back. Additionally, parse_tensor won't automatically recover the correct data type from the TensorProto. So:
Is there a function to do the conversion in a single step? I'd like to see something like tf.from_tensor_proto doing the conversion all at once optimizing for speed and memory allocation (or, if tf.constant(tf.make_ndarray(tproto)) is the best you can do, just wrapping this up).
Otherwise, which of the two options above should be preferred (in terms of efficiency, memory usage, etc.)?
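For concreteness, the wrapper I have in mind would look something like this (from_tensor_proto is a hypothetical name, not an existing TensorFlow API; it just packages option 1):

import tensorflow as tf

def from_tensor_proto(tproto):
    # Hypothetical one-step conversion wrapping option 1: dtype and
    # shape are recovered from the proto by make_ndarray, at the cost
    # of an intermediate numpy array.
    return tf.constant(tf.make_ndarray(tproto))

tensor = tf.range(10)
tproto = tf.make_tensor_proto(tensor)
restored = from_tensor_proto(tproto)  # dtype tf.int32 recovered automatically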

Does the sklearn.ensemble.GradientBoostingRegressor support sparse input samples?

I’m using sklearn.ensemble.GradientBoostingRegressor on data that is sometimes missing values. I can’t easily impute these values because they have a high variance and the estimate is very sensitive to them. They are also almost never 0.
The documentation of the fit method says about the first parameter X:
The input samples. Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csr_matrix.
This has led me to think that the GradientBoostingRegressor can work with sparse input data.
But internally it calls check_array with the implicit force_all_finite=True (the default), so I get the following error if I pass in a csr_matrix with NaN values:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
Does the GradientBoostingRegressor not actually support sparse data?
Update:
I’m lucky in that I don’t have any meaningful zeros. My calling code now looks like this:
import numpy as np
import scipy.sparse

predictors['foobar'] = predictors['foobar'].fillna(0)  # for columns that contain NaNs
predictor_matrix = scipy.sparse.csr_matrix(
    predictors.values.astype(np.float64)  # np.float is deprecated; use an explicit dtype
)
predictor_matrix.eliminate_zeros()  # drop the explicitly stored zeros
model.fit(predictor_matrix, regressands)
This avoids the exception above. Unfortunately there is no eliminate_nans() method. (When I print a sparse matrix with NaNs, it lists them explicitly, so sparseness must be something other than containing NaNs.)
But the prediction performance hasn’t (noticeably) changed.
Perhaps you could try using LightGBM. Here is a discussion on Kaggle about how it handles missing values:
https://www.kaggle.com/c/home-credit-default-risk/discussion/57918
Good luck
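A minimal sketch of the scikit-learn-style LightGBM API on toy data (LightGBM treats NaN as missing by default, so no zero-filling is needed):

import numpy as np
import lightgbm as lgb

# Toy data with the missing entries left as NaN.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# NaNs are routed down the trees as "missing" (use_missing=True by default).
model = lgb.LGBMRegressor(n_estimators=50)
model.fit(X, y)
preds = model.predict(X)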

Multiple outputs per input in Tensorflow

Is it possible to get the semantics of an unbounded arc in TensorFlow without directly enqueuing in the op itself?
For example, if I want to write an operation that takes a scalar string my_string and "emits" tuples of ("string", num_occurrences_in_my_string), I have to resort to one of the following output options (as far as I know):
return the values necessary to construct a sparse Tensor
take a queue reference (of the correct type) and directly enqueue the input myself (like the tf.TextLineReader does)
As far as I can tell from Google's paper on the TensorFlow "programming language", these are the only ways to accomplish it.
Is there a way in Tensorflow to emit an arbitrary number of output "rounds" per a given input besides the aforementioned workarounds?
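For concreteness, a sketch in the spirit of the first workaround, returning the variable-length output as two aligned tensors (written with TF 2.x string ops for brevity):

import tensorflow as tf

def emit_word_counts(s):
    # Emits a variable number of (word, count) pairs for one scalar
    # string input, as two aligned 1-D tensors rather than a stream.
    words, _, counts = tf.unique_with_counts(tf.strings.split(s))
    return words, counts

words, counts = emit_word_counts(tf.constant("a b a c b a"))
# words -> [b'a', b'b', b'c'], counts -> [3, 2, 1]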

Fit scikit-learn algorithms with data stored in SFrame

Is it possible to use data stored in an SFrame to train, e.g., scikit-learn's Random Forest implementation, without converting the whole dataset to numpy?
According to the Turi forum:
"If you use the most recent version of SFrame (which only became available via pip yesterday) you can use the tonumpy function to create an ndarray from an SFrame."
https://forum.turi.com/discussion/1642/
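For illustration, a minimal sketch of that route (the file and column names are placeholders; I'm assuming the method is exposed as to_numpy(), which the forum post spells "tonumpy"). Note that this still materializes the features as a numpy array, which is what scikit-learn estimators require:

import sframe
from sklearn.ensemble import RandomForestClassifier

sf = sframe.SFrame.read_csv('data.csv')  # placeholder dataset

X = sf[['feature_1', 'feature_2']].to_numpy()  # ndarray of the feature columns
y = sf['target'].to_numpy()

model = RandomForestClassifier()
model.fit(X, y)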