READ CSV from DROPBOX - pandas

I have a program that reads a CSV stored in my colleague's Dropbox at a certain URL and builds a dataframe from it. The program starts like this:
df=pd.read_csv('www.dropbox.com/s/certainurl=1')
I know that the URL has to end in dl=1, otherwise it doesn't work. When I run the program it does find the CSV, but the data is loaded in a confusing way and, as a result, the columns cannot be read. Of course, if I download the CSV and run the program on the local copy, everything works perfectly.
I don't have Dropbox myself, and none of the answers to similar questions on StackOverflow fit my case.
How can I solve this?
Thanks in advance!

FWIW, the problem was that my colleague had saved the file as a CSV variant that uses ";" instead of "," to separate values. So I fixed it with:
df=pd.read_csv('www.dropbox.com/s/certainurl=1', sep=";")
and the program worked.
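For anyone hitting the same symptom, here is a minimal sketch of what happens with and without the separator. The sample data is made up and an in-memory string stands in for the Dropbox download, so treat it as an illustration rather than the exact file from the question.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: a semicolon-separated CSV (made-up data).
raw = "name;qty\nbolt;10\nnut;25\n"

# With the default separator "," every row ends up in one single column.
wrong = pd.read_csv(io.StringIO(raw))

# Naming the separator explicitly gives the expected two columns.
right = pd.read_csv(io.StringIO(raw), sep=";")

# sep=None with the Python engine asks pandas to detect the delimiter itself.
sniffed = pd.read_csv(io.StringIO(raw), sep=None, engine="python")

print(list(wrong.columns))
print(list(right.columns))
print(list(sniffed.columns))
```

If the separator can differ from one file to the next, the sniffing variant may be handy, though it is slower and can guess wrong on unusual files.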

Related

Access removes spaces at the end of a string when exporting

Long story short, I am dealing with Excel files that need to be modified a little. As the files arrive on a weekly basis, I decided to write a simple program in Access to make the process fully automatic.
The first step was to upload the Excel file into an Access database. I managed that by creating a custom function that simply uses the "DoCmd.TransferSpreadsheet acImport" approach.
The second step was to create two queries and update the table that I had just uploaded. That was pretty straightforward too.
However, the third step is what I am struggling with. Now that the table is updated, I want to export it back to .xlsx format. Whether I do it manually via the "External Data" tab or use the "DoCmd.TransferText acExport" approach, a few columns whose strings end with a space are trimmed automatically. For example, the original "string " is changed to "string" after exporting.
I would be really grateful if someone could tell me how to let Access know that the space after the string is intended and not a mistake. Preferably with a VBA solution rather than having to do it manually. Thank you in advance for the help!
PS: I know that the .CSV format would be way better, but sadly I need it to be in XLSX format.

How can I upload from Google Sheets to BigQuery? Can I do it through a Pandas DF?

Okay. Let me tell you the long story here. I'm not really a programmer; I basically use Python to make some of my chores easier.
I have been saving RSS feed data in Google Sheets for months (yes, with IFTTT); it has been quite easy, but now the sheet is monstrous in size and hard to query. So, I'm trying to upload the data into BigQuery.
The easiest way would be to download it as a CSV and then upload that to GCS, but it breaks due to characters included in feed item titles. If it recognizes something as an unclosed comma, the CSV blobs together all subsequent values until it recognizes something as a closing comma. If I clean those characters in a text editor, I lose important information in idiosyncratically formatted URLs.
Then I tried uploading from Sheets to BigQuery by setting up a "create table" job in the graphical user interface. It should be easy, since the option for importing from Sheets is there. Nope, though. It recognizes the number of columns but imports nothing from them.
Then I had the idea of importing the data into a Pandas dataframe on Colab, doing some cleaning, and then uploading to BigQuery. The importing and the cleaning did work, but I didn't find documentation I was able to follow on uploading to BQ.
I tried uploading the dataframe into GCS as a CSV, so I could later upload it to BQ, but forget about it: the same CSV errors happen. I need to bypass CSV as a middleman.
Any ideas of what I could do?
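One possible way to bypass CSV is newline-delimited JSON, which BigQuery load jobs accept as a source format. The sketch below only shows the pandas side, with made-up rows standing in for the feed data; the upload step itself is left out because it needs credentials. The google-cloud-bigquery client also has a load_table_from_dataframe method and the pandas-gbq package provides to_gbq, which might remove the intermediate file altogether.

```python
import json

import pandas as pd

# Made-up rows standing in for the feed data; the values contain quotes and commas.
df = pd.DataFrame({
    "title": ['He said "hi", then left', "plain title"],
    "url": ["https://example.com/a?x=1,2", "https://example.com/b"],
})

# One JSON object per line. Quotes and commas inside values are escaped by
# the JSON encoder, so no delimiter cleanup is needed.
ndjson = df.to_json(orient="records", lines=True, force_ascii=False)

# Round-trip check: every line parses back into the original values.
rows = [json.loads(line) for line in ndjson.splitlines() if line]
print(rows[0]["title"])
```

Passing a file path as the first argument to to_json writes the same content to disk, and that file could then be loaded with the source format set to newline-delimited JSON.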

pyPDF2 error - PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly

I am a Python newbie and wrote a script about a year ago to retrieve PDF files and merge them into a single file/book. The script works and does what I need it to do, but lately, as I am no longer the only user, some files seem to be causing it to crash. My suspicion is that newer PDF files with higher-resolution images, forms, etc. may be more than pyPDF2 can handle, but I don't know that for sure. It would help to know which file(s) cause the issue, so that I could scan those files instead of using the native PDF, but I can't figure that out since the error only occurs when all the files are written. Below is the snippet of code that I use for merging them, along with the error. Is pyPDF2 still being updated? Is there a better-supported or commercial utility out there that can do this? Any suggestions and help would be greatly appreciated!
for f in Submittal_Files:
    h_path, f_name = os.path.split(f)
    outfile.append(open(f, 'rb'))
outfile.write(open(Final_Submittal_File_Name + ".pdf", 'wb'))
print("\nSUCCESSFULLY COMPLETED!")
print("PRESS ENTER TO END!")
program_holder = input()
break
Sorry about the long error log before. I won't be able to get the files until Monday 7/23 when I will have access to them. I will provide 2 examples.
LOG:
https://www.dropbox.com/s/ymbe1tnak7uuvs7/PDF-Merge-Crash-log.txt?dl=0
File Set #1 - Large Submittal - I had to delete some files to protect the innocent. They are mostly cover pages for sections and I know those are not an issue as they are used all the time.
https://www.dropbox.com/s/xavrpfxmo6dr7mb/Case%20%231%20-%20JUST%20PDFs.rar?dl=0
File Set #2 - Smaller Submittal - Same as above.
https://www.dropbox.com/s/2lxk8bts0w2qsx8/Case%20%232%20-%20JUST%20PDFS.rar?dl=0
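To find out which file triggers the error, one option is to run each file through the merge on its own and record the ones that fail. The sketch below is hypothetical: find_failing_files and merge_one_with_pypdf2 are names invented here, and the PyPDF2 part assumes a release that still provides PdfFileMerger (versions before 3.0). It has not been run against the files linked above.

```python
import io


def find_failing_files(paths, merge_one):
    """Call merge_one(path) for each file separately and collect the failures."""
    failures = []
    for path in paths:
        try:
            merge_one(path)
        except Exception as exc:  # deliberately broad: the goal is only to locate the bad file
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures


def merge_one_with_pypdf2(path):
    """Hypothetical per-file check: merge a single PDF into an in-memory buffer."""
    from PyPDF2 import PdfFileMerger  # lazy import; this class name assumes PyPDF2 < 3.0

    merger = PdfFileMerger()
    with open(path, "rb") as fh:
        merger.append(fh)
        # Writing forces the content to be read, which is where the
        # question reports the error appearing.
        merger.write(io.BytesIO())
```

Under those assumptions, something like `for path, err in find_failing_files(Submittal_Files, merge_one_with_pypdf2): print(path, err)` would list the suspect files before the real merge is attempted.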

VB.net: Is there a way to get the Printed File Path from Printform?

I'm working on my first big program. I will briefly explain how the important part of the program works, and then explain what my problem is.
My program is used by a few people who drive around Europe and repair our machines. After the work, they start my program and write a report. Until now, the program generated 3 files at the end: a PDF file generated by PrintForm, a text file containing the same information again, and an Excel file containing the data entered in the DataGridView.
These workers used email to send all 3 files separately. As you can imagine, that sometimes ends badly, because after work they are tired and sometimes send the wrong files. So I made an upgrade that lets the user send the files directly from the program, to be sure everything is fine. In the background I created a directory where 2 of the 3 files always get saved. The problem is that PrintForm opens a window where the user can select the path. And here the trouble starts: some of the workers select a different path, and then my program won't find the files again (it's very important that all 3 files are together). I searched for something that would look like
dim printformpath as string = printform.getpath
Is there something that works that way? I searched but didn't find anything helpful.
Thank you for your understanding and help.
Thanks, I added the path to the PrintFileName property and changed the PrintForm setting from print to preview to print to file :)
Have a nice day

Method to inspect first 4 bytes and rename file extension

I have a large batch of assorted files, all missing their file extension.
I'm currently using Windows 7 Pro. I am able to "open with" and experiment to determine what application opens these files, and rename manually to suit.
However I would like some method to identify the correct file type (typically PDF, others include JPG, HTML, DOC, XLS and PPT), and batch rename to add the appropriate file extension.
I am able to open some files with notepad and review the first four bytes, which in some cases shows "%PDF".
I figure a small script would be able to inspect these bytes, and rename as appropriate. However not all files give such an easy method. HTML, JPG, DOC etc do not appear to give such an easy identifier.
This PowerShell method appears to be close: https://superuser.com/questions/186942/renaming-multiple-file-extensions-based-on-a-condition
The difficulty here is restricting the method to files with no extension, and then deciding what to do with the files that don't have an identifier in the first four bytes.
Appreciate any help!!
EDIT: Solution using TriD seen here: http://mark0.net/soft-trid-e.html
And recursive method using Powershell to execute TriD here: http://mark0.net/forum/index.php?topic=550.0
You could probably save some time by getting a file utility for Windows (see What is the equivalent to the Linux File command for windows?) and then writing a simple script that maps from file type to extension.
EDIT: Looks like the TriD utility that's mentioned on that page can do what you want out of the box; see the -ae and -ce options.
Use python3.
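import os
fldrPth = "path/to/folder"  # folder containing the extensionless files
os.chdir(fldrPth)
for i in os.listdir():
    if not os.path.isfile(i) or os.path.splitext(i)[1]:
        continue  # skip folders and files that already have an extension
    with open(i, 'rb') as doc:  # binary mode, since most of these are not text files
        st = doc.read(4)
    if st == b'%PDF':  # only the PDF signature is handled here
        os.rename(i, i + '.pdf')  # rename once the file is closed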
Hopefully this would work.
I don't have test files to check the code. Take a backup and then run it and let me know if it works.
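A slightly fuller sketch of the same idea, covering more of the types named in the question. The signature table is illustrative rather than exhaustive, and the function names are made up for this example. Two caveats: legacy DOC, XLS and PPT files all share one header, so the ".ole" extension below is only a placeholder, and DOCX, XLSX and PPTX are ZIP containers that this check cannot tell apart from ordinary ZIP files. For anything beyond these cases, a dedicated tool such as TriD is probably the better choice.

```python
import os

# Leading-byte signatures for the file types mentioned in the question.
SIGNATURES = [
    (b"%PDF", ".pdf"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"PK\x03\x04", ".zip"),        # also DOCX/XLSX/PPTX, which are ZIP containers
    (b"\xd0\xcf\x11\xe0", ".ole"),  # placeholder: legacy DOC/XLS/PPT share this header
]


def guess_extension(path):
    """Return a guessed extension for the file at path, or None if nothing matches."""
    with open(path, "rb") as fh:
        head = fh.read(512)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    # HTML has no fixed signature, so look for a typical opening tag instead.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return ".html"
    return None


def add_extensions(folder, dry_run=True):
    """List (and, if dry_run is False, perform) renames for extensionless files."""
    plan = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or os.path.splitext(name)[1]:
            continue  # skip folders and files that already have an extension
        ext = guess_extension(path)
        if ext is None:
            continue  # unknown type: leave the file untouched
        plan.append((name, name + ext))
        if not dry_run:
            os.rename(path, path + ext)
    return plan
```

Running it with dry_run=True first returns the planned renames without touching any file, so the result can be reviewed before anything is changed.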