Collectd with CSV output - collectd

I have installed collectd on Ubuntu Linux and enabled the plugin for CSV output in the collectd.conf file. I am getting two column values in the CSV, but since there are no column names, could someone let me know what values it is writing, and is it possible to format the CSV as we wish?

The first column is the timestamp in epoch seconds; the second column is the value collectd is reporting. What that value means depends on the CSV file you're looking at, e.g. cpu-user, memory-free, etc.

You can see the folder hierarchy collectd creates, like user > cpu-0 > cpu-idle-<date>; inside that cpu-idle-<date> file you can see the epoch timestamps and the values.
In other words, you are opening the CSV files for the cpu-idle data in the cpu folders created by collectd.
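If it helps to sanity-check the format, here is a minimal sketch (the file path is made up, and it assumes pandas is available) that reads one of those per-metric files and labels the two columns. Depending on your collectd version the files may already start with a header line such as epoch,value.

import pandas as pd

# Hypothetical path to one of the files collectd writes, e.g.
# /var/lib/collectd/csv/<hostname>/cpu-0/cpu-idle-<date>
path = "/var/lib/collectd/csv/myhost/cpu-0/cpu-idle-2024-01-01"

# If your files have a header line like "epoch,value", drop the
# header/names arguments and let pandas pick the names up itself.
df = pd.read_csv(path, header=None, names=["epoch", "value"])

# Turn the epoch timestamp into a readable datetime for inspection.
df["time"] = pd.to_datetime(df["epoch"], unit="s")
print(df.head())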

Related

python pandas removes leading 0 while writing to csv

I am facing an issue which might be related to this question and others similar to it. I decided to create a separate question because I feel my problem has some additional things that need to be considered. Here is what I am facing right now.
I have a dataframe in pandas that reads data from SQL and shows up something like the following:
The picture shows that the values have a leading '0' and the datatype of this column is 'object'.
When I run this SQL and export to CSV on my Windows machine (Python 3.7, pandas 1.0.3), it works exactly as required and shows the correct output.
The problem occurs when I try to run it on my Linux machine (Python 3.5.2, pandas 0.24.2): it always removes the leading zeros while writing to CSV, and the CSV looks like the following image:
I am not sure what I should change to get the desired result in both environments. I will appreciate any help.
Edit:
Confirmed that the dataframe read from SQL on Ubuntu also has the leading zeros:
If you can use xlsx files instead of csv, then replace df.to_csv with df.to_excel and change the file extension to xlsx.
With xlsx files you also get to store the types, so Excel will not assume the values are numbers.
csv vs excel
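A minimal sketch of that approach (the column name and values here are made up for illustration, and to_excel needs openpyxl or xlsxwriter installed):

import pandas as pd

# Small demo frame standing in for the SQL result; "account_code" is a made-up column.
df = pd.DataFrame({"account_code": ["00123", "04567"], "amount": [10.5, 20.0]})

# Keep the column as a string so nothing coerces it to a number.
df["account_code"] = df["account_code"].astype(str)

# Writing xlsx preserves the cell type, so Excel will not strip the zeros.
df.to_excel("output.xlsx", index=False)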

Pentaho | Issue with CSV file to Table output

I am working in Pentaho Spoon. I have a requirement to load CSV file data into a table.
I have used , as the delimiter in the CSV file. I can see the correct data in the preview of the CSV file input step, but when I try to insert the data with the Table output step, I get a data truncation error.
This is because I have values like the one below in one of my columns:
"2,ABC Squere"
As you can see, I have a "," inside the column value, so it gets truncated and throws an error. How can I solve this problem?
I want to load data into the table with this kind of value.
Here is one way of doing it
test.csv
--------
colA,colB,colC
ABC,"2,ABC Squere",test
See the settings below. The key is to use " as the enclosure and , as the delimiter.
You can also change the delimiter, say to a pipe, while keeping the data as quoted text like "1,Name"; it will then be treated as a single column.
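Outside of Spoon, you can see the same enclosure rule at work with Python's csv module; this is just an illustration of how the quoted field parses as a single column:

import csv
from io import StringIO

# Same sample as test.csv above: the middle field contains a comma
# but is wrapped in the double-quote enclosure.
sample = 'colA,colB,colC\nABC,"2,ABC Squere",test\n'

for row in csv.reader(StringIO(sample), delimiter=",", quotechar='"'):
    print(row)  # ['colA', 'colB', 'colC'] then ['ABC', '2,ABC Squere', 'test']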

My file gets truncated in Hive after uploading it completely to Cloudera Hue

I am using Cloudera's Hue. In the file browser, I upload a .csv file with about 3,000 rows (the file is small, under 400 KB).
After uploading the file I go to the Data Browser, create a table and import the data into it.
When I go to Hive and run a simple query (say SELECT * FROM table), I only see results for 99 rows. The original .csv has more rows than that.
When I run other queries, I notice that several rows of data are missing, although they show up in the preview in the Hue File Browser.
I have tried with other files and they also get truncated, sometimes at 65 rows or 165 rows.
I have also removed all the "," characters from the .csv data before uploading the file.
I finally solved this. There were several issues that appeared to cause a truncation.
The main one was that the column types assigned automatically after importing the data were inferred from the first lines. So when the data no longer fit the inferred type (e.g. TINYINT instead of INT), it got truncated or changed to NULL. To solve this, inspect the data first and set the data types before creating the table.
The other issues were that the memory I had assigned to the virtual machine slowed down the preview process, and that the csv contained commas. You can give the VM more memory, or convert the csv to tab-separated values.
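If you want to go the tab-separated route, a small sketch of the conversion (the file names are placeholders) could look like this:

import csv

# Rewrite a comma-separated file as tab-separated before uploading to Hue.
with open("input.csv", newline="") as src, open("output.tsv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="\t")
    for row in csv.reader(src):
        writer.writerow(row)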

Bash creating csv from values

I'm trying to create a csv in my bash script from some values I'm getting from another, non-csv file.
The problem is that the values have commas (,) in them.
The csv file comes out wrong because of that (values with commas in them are split into two or more fields).
Is there any way to get around that problem, or any other way to build a csv in a bash script? I can create other kinds of files too; it just needs to be compatible with a standard SQL import.
Thanks
I have now added quotation marks before and after every value and it works great. The csv looks like it should.
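For reference, the same quoting idea expressed with Python's csv module rather than the original bash script (the values are made up); QUOTE_ALL wraps every field in double quotes so embedded commas stop acting as separators:

import csv

# Example values pulled from another file; the embedded commas are the problem.
row = ["2,ABC Squere", "another, value", "plain"]

with open("out.csv", "w", newline="") as fh:
    csv.writer(fh, quoting=csv.QUOTE_ALL).writerow(row)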

pentaho ETL Tool data migration

I am migrating data with Pentaho. A problem occurs when the number of rows is more than 4 lakhs (400,000): the transaction fails partway through. How can we migrate large amounts of data with the Pentaho ETL tool?
As basic debugging, do the following:
If your output is a text file or Excel file, make sure that you check the size of the string/text columns. By default the Text file output step will take the maximum string length, and when you start writing it can throw heap errors. So reduce the sizes and re-run the .ktr files.
If the output is a Table output step, then again check the column data types and the maximum column sizes defined in your output table.
Kindly share the error logs if you think there is something else going on. :)