I am trying to append several pandas dataframes to a CSV file, but I cannot know ahead of time which dataframe will be appended first, as they are each generated on different worker machines. I can append each one to a pre-made empty CSV file with: df.to_csv('test.csv', mode='a', header=False), but then my CSV has no header and just has data.
If I set header=True, then I get a copy of the header every time I append, which is redundant since they are all the same. Is there any direct way to overcome this? I suppose I could check the file each time before appending to see if it already has a header, but that feels inefficient.
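Checking the file is actually cheap: you don't need to read it, only test whether it exists and is non-empty. A minimal sketch (append_df is a hypothetical helper name):

```python
import os
import tempfile

import pandas as pd

def append_df(df, path):
    """Append df to path, writing the header only when the file is new or still empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode='a', header=write_header, index=False)

# demo: two "workers" append in an arbitrary order
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
append_df(pd.DataFrame({'a': [1], 'b': [2]}), path)
append_df(pd.DataFrame({'a': [3], 'b': [4]}), path)
result = pd.read_csv(path)
```

Note that this only avoids duplicate headers; if the workers can write at the same instant, you still need a file lock or a single writer process to keep the appends from interleaving.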
I need to remove redundant data in an xlsx file in which data is entered using pandas. Any help on how to remove the redundant data?
Thanks!
You can use pandas.read_excel to load your sheet as a pandas dataframe, then use pandas.DataFrame.drop_duplicates to remove the redundant data. You can then write it back out however you want.
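A short sketch of that flow (the file names are placeholders; the read/write lines are commented out so the example stands alone with made-up data):

```python
import pandas as pd

# In practice: df = pd.read_excel('data.xlsx')   # 'data.xlsx' is a placeholder name
df = pd.DataFrame({'id': [1, 1, 2], 'value': ['a', 'a', 'b']})

# drop fully duplicated rows; pass subset=['id'] to dedupe on specific columns instead
deduped = df.drop_duplicates()

# In practice: deduped.to_excel('clean.xlsx', index=False)   # placeholder name
```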
I'm not sure why I am losing some values when I use a separator (any separator) to save a dataframe to CSV.
Thanks a lot for your help
The input dataframe is shown in the attached screenshot.
So when I use
df.to_csv(csv_file, index=False, date_format='%m-%d-%Y', sep='-')
the output is (note that I manually removed the '-' delimiter in the screenshot below to make it clearer). The main point is that I am missing some values from some rows seemingly at random, like '1M Libor' from the 4th and 6th rows, when the delimiter is on. Without the delimiter, the CSV is saved exactly as it appears in the dataframe.
VS
df.to_csv(csv_file, index=False, date_format='%m-%d-%Y')
output is
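I can't reproduce the exact data, but one likely cause, sketched below with made-up values: pandas quotes any field that happens to contain the separator (including dates formatted as %m-%d-%Y, which contain '-'), so a viewer that splits naively on '-' can misparse the row and appear to drop values:

```python
import io

import pandas as pd

df = pd.DataFrame({'name': ['1M Libor'], 'rate': [0.05]})
df['date'] = pd.to_datetime(['2021-03-01'])

buf = io.StringIO()
df.to_csv(buf, index=False, date_format='%m-%d-%Y', sep='-')
# the formatted date contains the separator, so pandas wraps it in
# quotes ("03-01-2021"); anything splitting blindly on '-' will shift fields
```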
I have imported a .csv file using df = pandas.read_csv(...). I counted the rows in this dataframe df using print(len(df)), and it had some 30 rows fewer than the originally imported file. BUT, when I exported df directly after import as .csv (without doing any operation on the dataframe) using df.to_csv(...), the exported file had the same number of rows as the originally imported .csv file.
It's very hard for me to debug and explain the difference between the length of the dataframe on one hand and both the imported and exported .csv files on the other, as there are more than half a million rows in the dataset. Can anyone provide some hints as to what could cause such bizarre behavior?
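One common cause, guessed since the data isn't shown: embedded newlines inside quoted fields. A line counter (or text editor) sees physical lines, while read_csv sees logical rows, and to_csv writes those embedded newlines back out, restoring the original physical line count. A small reproduction:

```python
import io

import pandas as pd

# A quoted field can legally contain a newline, so the raw file has more
# physical lines than the dataframe has rows.
raw = 'id,comment\n1,"line one\nline two"\n2,plain\n'

df = pd.read_csv(io.StringIO(raw))
physical_lines = len(raw.splitlines())  # 4 physical lines
logical_rows = len(df)                  # 2 logical rows
# Round-tripping with df.to_csv preserves the embedded newline, so the
# exported file again has 4 physical lines, matching the original count.
```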
I need to use VBA to import a large CSV file into an Access table. The delimiter is "" (two double quotes), except that for some reason the first value is followed by " (only one quote) instead of two like every other value. The first row contains the column headers and is delimited the same way. I have attached an example at the bottom.
The CSV files are generated automatically by an accounting system daily so I cannot change the format. They are also quite large (150,000+ lines, many columns). I'm fairly new to VBA, so as much detail as is possible would be much appreciated.
Thanks in advance!
Example of format
That doesn't sound like a CSV file. Can you open it in Excel, convert it to a true CSV, and then import that into Access? You will find many VBA-driven import options at the URL below.
http://www.accessmvp.com/KDSnell/EXCEL_Import.htm
Also, take a look at these URLs.
http://www.erlandsendata.no/english/index.php?d=envbadacimportado
http://www.erlandsendata.no/english/index.php?d=envbadacimportdao
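If converting by hand in Excel isn't practical at 150,000+ lines, a scripted normalization pass into a true CSV is another option; here is a rough Python sketch (the normalize_line helper and the sample line layout are assumed from the description above, so verify against the real file first):

```python
import csv
import io

def normalize_line(line):
    # Assumed layout: val1"val2""val3 - fields separated by "", except a
    # single " after the first field (hypothetical; check the real file).
    head, _, rest = line.partition('"')
    return ([head] + rest.split('""')) if rest else [head]

# stand-in for the real export; the actual file would be read line by line
sample = 'Date"Account""Amount\n01/02/2024"1000""52.75\n'

out = io.StringIO()
writer = csv.writer(out)
for line in sample.splitlines():
    writer.writerow(normalize_line(line))
# out now holds a true comma-separated file that Access can import normally
```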
I have a CSV file which contains millions of records/rows. The header row is like:
<NIC,Name,Address,Telephone,CardType,Payment>
In my scenario I want to load only the rows where "CardType" equals "VIP". How can I perform this operation without loading all the records in the file into a staging table?
I am not loading these records into a data warehouse. I only need to separate this data within the CSV file.
The question isn't super-clear, but it sounds like you want to do some processing of the rows before outputting them back into another CSV file. If that's the case, then you'll want to make use of the various transforms available, notably Conditional Split. In there, you can look for rows where CardType == VIP and send those down one output (call it "Valid Rows"), and send the others into the default output. Connect up your valid rows output to your CSV destination and that should be it.
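For reference, the same split can also be sketched outside SSIS with pandas chunked reads, so the whole file never sits in memory at once (column names are taken from the header above; the chunk size is arbitrary, and the in-memory buffers stand in for real file paths):

```python
import io

import pandas as pd

# stand-in for the real multi-million-row file
raw = io.StringIO(
    "NIC,Name,Address,Telephone,CardType,Payment\n"
    "1,A,X,111,VIP,10\n"
    "2,B,Y,222,Standard,20\n"
    "3,C,Z,333,VIP,30\n"
)

out = io.StringIO()
first = True
# chunksize keeps only a slice of the file in memory at a time
for chunk in pd.read_csv(raw, chunksize=2):
    vip = chunk[chunk['CardType'] == 'VIP']
    vip.to_csv(out, header=first, index=False)  # header only on the first chunk
    first = False
```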