I just started, so this might be a stupid question, but I have the following problem:
I created a .csv file for some basic data description. Although all values are numerical and none are missing, df.dtypes reports most variables as object, with only some being int64 or float64. Do I have to manually convert all object variables to numerical ones with code?
Or is there anything I did wrong when creating my csv?
Also the date I have saved in the format yyyy-mm-dd is shown as object instead of date format.
The numbers of the data range from [0,2] for some variables and [0,2000000] for others.
Could the formatting in Excel be a problem?
Is there any "How to build your csv" documentation, so that I don't have to ask stupid beginner questions like this?
Additionally, I was told that for a model to work properly I need to do some scaling/normalization of my data, as the value ranges differ a lot. Where can I find more information on that?
I would suggest you do the data type conversion before saving the CSV file. You can also use the functions below for conversion:
astype()
to_numeric()
convert_dtypes()
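As a minimal sketch of the functions listed above, the following converts string columns to numeric dtypes. The frame and its column names are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical frame whose numeric columns are stored as strings.
df = pd.DataFrame({"count": ["0", "1", "2"], "amount": ["10.5", "2000000", "3.25"]})

# to_numeric infers a numeric dtype; errors="coerce" turns unparseable cells into NaN.
df["count"] = pd.to_numeric(df["count"], errors="coerce")

# astype converts to one explicit dtype and raises if a value cannot be converted.
df["amount"] = df["amount"].astype("float64")

print(df.dtypes)
```

As far as I know, convert_dtypes() only picks the best nullable dtype for the values a column already holds; it does not parse strings into numbers, so to_numeric() or astype() is usually the one you want here.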
You can use the attached link for scaling information: https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/
pd.read_csv already has an option to specify types: pass the dtype parameter to read_csv. Dates are not parsed automatically, so you have to convert that column to datetime, for example with pd.to_datetime or the parse_dates parameter of read_csv.
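A short sketch of setting the types at load time. The CSV content and column names are assumptions for illustration, not the asker's actual file.

```python
import io
import pandas as pd

# Hypothetical CSV content; the column names are made up.
csv_text = "date,flag,revenue\n2021-01-31,1,1500000\n2021-02-28,0,2000000\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"flag": "int64", "revenue": "float64"},  # force column types while reading
    parse_dates=["date"],  # parse the yyyy-mm-dd strings as datetime
)

print(df.dtypes)
```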
Whether to scale or normalize your data also depends on which machine learning model you are going to use.
For example, if you compare a random forest and a KNN: the KNN needs scaled features since it works with distances, while a tree-based model such as a random forest does not depend on the feature scale.
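To make the distance argument concrete, here is a small sketch in plain Python. The five rows are hypothetical; one feature lies in [0, 2] and the other in [0, 2000000], as in the question. It uses min-max scaling, which is only one of several possible scaling methods.

```python
import math

# Hypothetical rows: (small feature in [0, 2], large feature in [0, 2000000]).
rows = {
    "q": (0.0, 1_000_000.0),  # query point
    "a": (2.0, 1_000_500.0),  # far from q in the small feature
    "b": (0.0, 1_020_000.0),  # identical to q in the small feature
    "c": (1.0, 0.0),
    "d": (1.0, 2_000_000.0),
}

def min_max_scale(table):
    """Rescale every column to [0, 1] using that column's min and max."""
    columns = list(zip(*table.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for name, row in table.items()
    }

def nearest(table, query, candidates):
    """Return the candidate with the smallest Euclidean distance to the query."""
    return min(candidates, key=lambda name: math.dist(table[query], table[name]))

raw_nearest = nearest(rows, "q", ["a", "b"])
scaled_nearest = nearest(min_max_scale(rows), "q", ["a", "b"])
print(raw_nearest, scaled_nearest)
```

On the raw values the large feature dominates the distance, so the small feature is effectively ignored and "a" comes out nearest; after scaling both features contribute and "b" is nearest.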
Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems is a good book to start in my personal opinion
Thanks for the ideas.
In the end, pd.read_csv(title, decimal=',') helped to read them as floats, as I used German number formatting.
But conversion with to_numeric() also worked
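For readers with the same problem, a minimal sketch of the effect of the decimal parameter. The CSV content is hypothetical, and the ';' separator is an assumption (German-locale Excel exports commonly use it); the original file may differ.

```python
import io
import pandas as pd

# Hypothetical German-style export: ';' separates fields, ',' is the decimal mark.
csv_text = "id;share;revenue\n1;0,5;1500000,75\n2;1,25;2000000,00\n"

# Without decimal=",", values such as "0,5" stay as text.
as_text = pd.read_csv(io.StringIO(csv_text), sep=";")

# With decimal=",", the same columns are parsed as float64.
as_float = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

print(as_text.dtypes)
print(as_float.dtypes)
```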
I'm doing some database exercises from LeetCode. I want to test my code on my laptop using MySQL, and I hope there is an easy way to import the data.
Here is the input data from LeetCode:
{"headers":{"insurance":["PID","TIV_2015","TIV_2016","LAT","LON"]},"rows":
{"insurance":[[1,224.17,952.73,32.4,20.2],[2,224.17,900.66,52.4,32.7],
[3,824.61,645.13,72.4,45.2],[4,424.32,323.66,12.4,7.7],
[5,424.32,282.9,12.4,7.7],[6,625.05,243.53,52.5,32.8],
[7,424.32,968.94,72.5,45.3],[8,624.46,714.13,12.5,7.8],
[9,425.49,463.85,32.5,20.3],[10,624.46,776.85,12.4,7.7],
[11,624.46,692.71,72.5,45.3],[12,225.93,933,12.5,7.8],
[13,824.61,786.86,32.6,20.3],[14,824.61,935.34,52.6,32.8]]}}
What is the data type?
This is a JSON string. JSON is a common data interchange format.
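One possible way to get such a payload into MySQL is to parse it and generate INSERT statements. This is only a sketch: it uses a shortened copy of the data above (first two rows), assumes the table already exists with matching columns, and handles numeric values only.

```python
import json

# Shortened copy of the LeetCode payload above (first two rows only).
payload = (
    '{"headers":{"insurance":["PID","TIV_2015","TIV_2016","LAT","LON"]},'
    '"rows":{"insurance":[[1,224.17,952.73,32.4,20.2],[2,224.17,900.66,52.4,32.7]]}}'
)

data = json.loads(payload)

statements = []
for table, columns in data["headers"].items():
    column_list = ", ".join(columns)
    for row in data["rows"][table]:
        # Numeric values only; string values would need proper quoting and escaping.
        values = ", ".join(str(value) for value in row)
        statements.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")

print("\n".join(statements))
```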
A third-party program stores tracking data in the db, but I do not understand the format. I know that PostGIS is used there and that this column should contain GPS location(s) and maybe additional data.
Example (db dump as csv):
"Location","DateTime"
"010100000023E37C4023E33C40417F41EF407F4740","2020-05-24 15:33:53+00"
How can I decode Location column data?
This is the Well-Known Binary (WKB) format, written as a hex string.
See PostGIS methods for WKB: ST_AsBinary, ST_GeomFromWKB.
WKT methods: ST_AsText, ST_GeomFromText.
The example in WKT format: POINT(28.887256651392033 46.99416914651966).
For .NET, you can use Geo or NetTopologySuite.IO.TinyWKB.
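Outside the database, the example value can also be decoded by hand. The sketch below is a minimal Python illustration that handles only a plain 2D point; the function name is made up, and other geometry types or values carrying an SRID or Z/M flags are rejected.

```python
import struct

def decode_wkb_point(hex_string):
    """Decode a hex-encoded 2D WKB point into an (x, y) tuple."""
    raw = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if raw[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type; 1 means a plain 2D point.
    (geom_type,) = struct.unpack_from(byte_order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError(f"not a plain 2D point (type code {geom_type})")
    # Bytes 5-20 hold the two coordinates as 8-byte doubles.
    x, y = struct.unpack_from(byte_order + "dd", raw, 5)
    return x, y

x, y = decode_wkb_point("010100000023E37C4023E33C40417F41EF407F4740")
print(f"POINT({x} {y})")
```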
I copy a csv file with a json-string column to the data flow.
I want to flatten it by the json-string column, but the column is not recognized as a json format.
How do I convert it to a JSON-format column, or is there another way to deal with it? Thank you.
You can refer to my answer here: https://stackoverflow.com/a/65770042/10549281
If you have any other concerns, please feel free to let me know.
HTH.
What is the best way to handle datatype conversion between MySQL and PHP while using Phalcon models? When a datetime field is retrieved from MySQL, it is converted to a string, which I want to convert to datetime automatically. Similarly, for MySQL decimal fields, I want to convert the value to a custom Decimal field.
So, where exactly does this datatype conversion happen? Or, if it does not happen, what's the best way to achieve this kind of data conversion? I went through the documentation but couldn't find anything relevant to this.
Any help is highly appreciated.
There are two ways to handle this that I know of.
One is using model annotations to describe metadata:
http://docs.phalconphp.com/en/latest/reference/models.html#annotations-strategy
It sounds like this will solve your issue with decimals, but not with datetime.
The other is by using an afterFetch hook to mutate the model:
http://docs.phalconphp.com/en/latest/reference/models.html#initializing-preparing-fetched-records