Can we import PDF/DOC files or their contents into DocFX all at once, without copying line by line?

I have a number of PDF documents, and I want to import them (or their contents) into DocFX all at once, without copying them line by line, so that the file structure remains the same.

Related

Trouble with UTF-8 with Julia and JupyterLab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and save it as XLSX. When I try to open the saved file in JupyterLab, I get an error saying that the file is not UTF-8 encoded, and the file is not opened.
This is my code:
using HTTP, XLSX, CSV, DataFrames
# Recent CSV.jl versions need the sink type (DataFrame) as the second argument
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body, DataFrame)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with JupyterLab, I get a pop-up saying the file is not UTF-8 encoded, and the file is not opened.
When I open the file in Ubuntu (with LibreOffice), I do not see anything suspicious.
As I'm new to Julia, I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the DataFrame in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.

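Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here, for example). An .xlsx file is a ZIP archive of XML parts rather than plain text, so the message most likely comes from JupyterLab trying to open it in its text editor, not from a problem with your data.
See also this similar question involving Python pandas (which says pretty much the same thing).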
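As a quick sanity check outside the JupyterLab file browser, here is a minimal Python sketch that tests whether a saved file is a workbook container at all. It relies only on the fact that an .xlsx file is a ZIP archive that carries a [Content_Types].xml part; the helper name looks_like_xlsx is made up for this illustration and is not part of any library.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive that contains the
    [Content_Types].xml part that .xlsx workbooks carry."""
    # A plain text or CSV file fails this first check.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "[Content_Types].xml" in archive.namelist()
```

If this returns True for data/nba/nba19-20.XLSX, the file is binary by design, which would explain why a text editor reports that it is not UTF-8.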

Pandas to_csv with ZIP compresses whole directory

df.to_csv("/path/to/destination.zip", compression="zip")
The above line will generate a file called destination.zip in the directory /path/to/.
Decompressing the ZIP file results in a directory structure path/to/destination.zip, where destination.zip is the CSV file.
Why is the path/to/ folder structure included in the compressed file? Is there any way to avoid this?
I was surprised by this. For now I am writing the ZIP locally (destination.zip) and using os.rename to move it to the desired location. Is this a bug?
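One possible workaround, sketched here under the assumption that the CSV fits in memory, is to build the archive with the standard zipfile module so the member name is chosen explicitly. The helper name write_csv_zip and its parameters are hypothetical, introduced only for this example.

```python
import zipfile


def write_csv_zip(csv_text, zip_path, member_name):
    """Write `csv_text` into a new ZIP at `zip_path` as a single member.

    `member_name` is the name stored inside the archive, so it can be a
    bare file name with no directory part.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, csv_text)
```

With pandas, df.to_csv(index=False) called without a path returns the CSV as a string, which can be passed as csv_text. Recent pandas versions also accept a dict for compression, for example compression={"method": "zip", "archive_name": "destination.csv"}; whether that is available depends on the installed version, so it is worth checking the to_csv documentation.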

Importing a random csv file from a folder into pandas

I have a folder with several CSV files whose names are numbers between 100 and 400 (e.g. 142.csv, 278.csv). Not every number between 100 and 400 has a file; for example, there is no 143.csv. I want to write a loop that imports 5 random files into separate DataFrames in pandas, instead of manually searching for and typing out the file names over and over. Any ideas to get me started with this?

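You can use glob to list all the CSV files in the directory and then pick five of them at random.
import glob
import numpy as np
import pandas as pd

files = glob.glob('*.csv')  # every CSV file in the current directory
# replace=False so that the same file is never picked twice
random_files = np.random.choice(files, 5, replace=False)
dataframes = []
for fp in random_files:
    dataframes.append(pd.read_csv(fp))
This chooses 5 random files from the directory and reads each one into its own DataFrame, collected in the dataframes list.
Hope this answers your question.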
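The same selection step can also be sketched with only the standard library, which makes the "no duplicates" requirement explicit. This is a minimal sketch: the helper name pick_random_csvs and the seed parameter are assumptions added for illustration.

```python
import random
from pathlib import Path


def pick_random_csvs(folder, k=5, seed=None):
    """Return k distinct CSV paths chosen at random from `folder`."""
    files = sorted(Path(folder).glob("*.csv"))
    if len(files) < k:
        raise ValueError(f"only {len(files)} CSV files found, need {k}")
    # random.sample never returns the same element twice
    return random.Random(seed).sample(files, k)
```

Each returned path can then be passed to pd.read_csv, for example in a list comprehension.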

Import Data From CSV Using Control File Is failing

I am importing a CSV file into a HANA server with hdbsql, using a control file that contains IMPORT FROM CSV statements. Importing the file through HANA Studio works fine, but when I run the import through hdbsql with the control file as input, it fails silently: no error is reported.
My CSV file uses {CR}{LF} as the record delimiter, so I specify '\r\n' as the record delimiter. The file is UTF-16LE encoded.

Just to add to @LarsBr.'s comment: you also need to be careful about where you load the file from.
It needs to be in a specific directory, or you will need to adjust the configuration to use a different one.
Here is a tutorial I wrote to explain that: https://developers.sap.com/tutorials/mlb-hxe-import-data-sql-import.html
There is also an "ERROR LOG" option, documented here: https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/latest/en-US/20f712e175191014907393741fadcb97.html

In Kettle, reading a CSV file from a tar.gz file with Text File Input did not work. What might be wrong?

I have a CSV file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the CSV file through the Text File Input step.
I tried the path tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But sometimes the read does not succeed.
It throws this exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I use Windows 8.1 and PDI 5.2.
What might be wrong?

When reading a CSV from a compressed file, the "Text File Input" step in Pentaho Kettle only reads the first file inside the compressed archive (Zip or GZip). Check the compression section of the Pentaho Wiki.
For your issue, try removing the wildcard entry, since only the first file inside the zip/gzip archive will be read (as explained above).
I have put together a sample that reads both zip and gzip files. Check it here.
Hope it helps :)
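To confirm what the archive really contains before adjusting the Kettle step, the layout can be checked outside Kettle. The following Python sketch is only an illustration and not part of PDI; it assumes the CSV files are UTF-8 encoded, and the helper name read_csvs_from_targz is made up for this example.

```python
import csv
import io
import tarfile


def read_csvs_from_targz(archive_path):
    """Return {member name: list of rows} for every .csv file in a .tar.gz."""
    tables = {}
    # "r:gz" undoes the gzip layer and the tar layer in one pass
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.lower().endswith(".csv"):
                raw = archive.extractfile(member)
                # Assumption: the CSV content is UTF-8 text
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[member.name] = list(csv.reader(text))
    return tables
```

The keys of the returned dictionary show the member names and any folder prefix inside the archive, which helps when deciding what path to give the step.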

We Keep Coding
