Use LightGBM to predict which person owns a row in a CSV file

I would like to use LightGBM to classify various things about invoices in a CSV file.
I have a CSV file of training invoices, and I would like to use LightGBM to determine which person "owns" each row, i.e. to predict and classify the person for each row of the CSV file. How can I do this? Are there any LightGBM examples of this kind of classification?
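In LightGBM terms this is ordinary multiclass classification in which each person is one class. Below is a minimal sketch using LightGBM's scikit-learn API; the file name invoices.csv, the "owner" column, and the way the feature columns are handled are all assumptions about your data, not part of the question.

import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Hypothetical training file: feature columns describing each invoice row
# plus an "owner" column naming the person who owns the row.
df = pd.read_csv("invoices.csv")

# Encode each person as an integer class; keep the mapping to decode predictions later.
owners = df["owner"].astype("category")
y = owners.cat.codes
X = df.drop(columns=["owner"])

# LightGBM can consume pandas categorical columns directly.
for col in X.select_dtypes(include="object").columns:
    X[col] = X[col].astype("category")

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# With more than two classes, the sklearn wrapper selects a multiclass objective automatically.
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train, eval_set=[(X_val, y_val)])

codes = clf.predict(X_val)                       # integer class per row
predicted_owner = owners.cat.categories[codes]   # map back to the person's name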

Related

How to prepare your own dataset for a CNN

I am building a CNN model to classify my own dataset in Python using TensorFlow 2. How should I prepare my dataset directory so it can be loaded into the model?
Create the train dataset and test dataset, and extract them into two different folders named “train” and “test”. The train folder should contain ‘n’ folders, each containing images of the respective class. For example, in the Dogs vs. Cats dataset, the train folder should have two folders, namely “Dogs” and “Cats”, containing the respective images inside them. The same should be done for the test folder.
Then, you should use tf.keras.preprocessing.image.ImageDataGenerator and its flow_from_directory function.
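For the layout described above, a minimal sketch might look like the following; the directory names "train" and "test", the image size, and the batch size are assumptions.

import tensorflow as tf

datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

# One sub-folder per class inside "train", e.g. train/Dogs and train/Cats.
train_gen = datagen.flow_from_directory(
    "train",
    target_size=(150, 150),
    batch_size=32,
    class_mode="binary")       # use "categorical" for more than two classes

test_gen = datagen.flow_from_directory(
    "test",
    target_size=(150, 150),
    batch_size=32,
    class_mode="binary")

# With a compiled model:
# model.fit(train_gen, validation_data=test_gen, epochs=10)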

How to load large datasets of NumPy arrays in order to train a CNN model in TensorFlow 2.1.0

I'm training a convolutional neural network (CNN) for a binary classification task in TensorFlow 2.1.0.
The feature of each instance is a 4-dimensional NumPy array with shape (50, 50, 50, 2), in which each element is a float32.
The label of each instance is 1 or 0.
My largest training dataset can contain up to ~100 million instances.
To train the model efficiently, is it best to serialize my training data and store it in a set of files in TFRecord format, then load them with tf.data.TFRecordDataset() and parse them with the dataset's map() method?
If so, could you show me an example of how to serialize the feature-label pairs and store them in TFRecord files, and then how to load and parse them?
I did not find an appropriate example on the TensorFlow website.
Or is there a better way to store and load such huge datasets? Thanks very much.
There are many ways to efficiently build a data pipeline without TFRecord; click this link, it was very useful.
To extract images from a directory efficiently, click this link.
Hope this helped you.
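If a TFRecord-based pipeline is still wanted, here is a sketch of one common way to serialize (feature, label) pairs and read them back. It follows the shapes given in the question (float32 features of shape (50, 50, 50, 2), integer labels); the file name and the placeholder variables features_array / labels_array are assumptions.

import numpy as np
import tensorflow as tf

def serialize_example(feature, label):
    # Store the raw bytes of the float32 array plus the integer label.
    example = tf.train.Example(features=tf.train.Features(feature={
        "feature": tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)])),
    }))
    return example.SerializeToString()

# Write one shard; with ~100 million instances you would write many shards.
with tf.io.TFRecordWriter("train-00000.tfrecord") as writer:
    for feature, label in zip(features_array, labels_array):   # placeholders for your data
        writer.write(serialize_example(feature.astype(np.float32), label))

# Load and parse with tf.data.
def parse_example(record):
    parsed = tf.io.parse_single_example(record, {
        "feature": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    feature = tf.reshape(tf.io.decode_raw(parsed["feature"], tf.float32), (50, 50, 50, 2))
    return feature, parsed["label"]

dataset = (tf.data.TFRecordDataset(["train-00000.tfrecord"])
           .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .shuffle(1024)
           .batch(32)
           .prefetch(tf.data.experimental.AUTOTUNE))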

How to get weights into a text file from a saved model in Keras/TensorFlow

I have a trained model in 'h5' format. It has some layers and their names. I want to read the weights and write them into a single text file as an array.
I am trying to use h5py, but it requires the layer names to be specified manually before the weights can be extracted and saved.
Is there another technique to write the weights to a text file automatically?
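One way to avoid naming layers by hand is to load the model with Keras and iterate over model.weights, which walks every layer's variables for you. The following is a sketch; the file names model.h5 and weights.txt are placeholders.

import tensorflow as tf

model = tf.keras.models.load_model("model.h5")        # your saved .h5 model

with open("weights.txt", "w") as f:
    for weight in model.weights:                      # every variable of every layer
        values = weight.numpy().ravel()               # flatten to 1-D
        f.write(f"# {weight.name} shape={tuple(weight.shape)}\n")
        f.write(" ".join(str(v) for v in values) + "\n")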

Convert textual documents to tf.data in TensorFlow for reading sequentially

In a textual corpus there are 50 documents, each of which is roughly 80 lines long.
I want to feed my corpus as input to TensorFlow, but I want to batch the lines of each document as the system reads it, similar to the way TFRecord is used for images: using tf.data, how can I batch each document in my corpus so that it can be read sequentially?
How can I solve this issue?
You can create a TextLineDataset that will contain the lines of your documents:
dataset = tf.data.TextLineDataset(['doc1.txt', 'doc2.txt', ...])
After you create the dataset, you can split the strings into batches using the batch method and other methods of the Dataset class.
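If the goal is one batch per document rather than fixed-size batches across all files, one possible sketch is to batch each file separately and flatten the results; the file names and the batch size of 128 (just something larger than the ~80 lines per document) are assumptions.

import tensorflow as tf

files = ["doc1.txt", "doc2.txt"]   # the 50 corpus files

dataset = tf.data.Dataset.from_tensor_slices(files).flat_map(
    # Each file becomes a single batch containing all of its lines,
    # because the batch size exceeds any document's length.
    lambda path: tf.data.TextLineDataset(path).batch(128))

for doc_lines in dataset:          # one element per document, read sequentially
    print(doc_lines.numpy()[:2])   # first two lines of this document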

How to use TensorFlow to predict on large CSV files in chunks and glue the results together

I've trained a prediction model with TensorFlow, and there's a large test.csv file that's too big to fit into memory. Is it possible to feed it in smaller chunks, one at a time, and then concatenate the results within one session?
Using tf.estimator.Estimator for your model and calling its predict method with a numpy_input_fn will give you all the pieces to build what you want.
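A rough sketch of that idea, assuming TensorFlow 2 with the v1 compatibility module (in TensorFlow 1.x the same function is tf.estimator.inputs.numpy_input_fn), a single numeric feature tensor named "x", and an already-trained estimator object; the chunk size is also an assumption:

import numpy as np
import pandas as pd
import tensorflow as tf

all_predictions = []
# Read test.csv a chunk at a time so the whole file never sits in memory.
for chunk in pd.read_csv("test.csv", chunksize=10000):
    input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
        x={"x": chunk.values.astype(np.float32)},
        shuffle=False,
        num_epochs=1,
        batch_size=128)
    # `estimator` is your trained tf.estimator.Estimator.
    all_predictions.extend(estimator.predict(input_fn=input_fn))

# all_predictions now holds one prediction per row of test.csv, in order.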