Why does pandas.Series.values drop some scalar data that exists in the original Series? - pandas

I have a pandas.Series object docs whose scalar values are strings.
When I try to iterate over docs.values, for example by making list(docs), some of the scalar entries are dropped or become NoneType.
For instance, given a target_index where the problem shows up, docs[target_index] returns a string. However, list(docs)[target_index] returns None.
Since pandas.Series.values turns the data into a numpy.ndarray, I guess the issue has something to do with the numpy data types, but I cannot figure out what exactly is going wrong.
Here is the JSON file of the DataFrame that exhibits the bug:
https://gist.github.com/goodcheer/f9c990171a57ff053b4b0539396f63f6
docs is the Series of the profile column.
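For reference, here is a minimal sketch of the checks described above; the file name and the index label are placeholders, assuming the gist's JSON has been saved locally:
import pandas as pd
df = pd.read_json('df.json')       # hypothetical local copy of the gist file
docs = df['profile']
target_index = 0                   # placeholder: substitute the label that misbehaves
print(docs[target_index])          # label-based lookup on the Series
print(list(docs)[target_index])    # positional lookup into the iterated values
Note that docs[target_index] looks the entry up by index label, whereas list(docs)[target_index] is purely positional, so the two only refer to the same row when the Series has a default 0..n-1 integer index.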

Related

Alternative to case statements when changing a lot of numeric controls

I'm pretty new to LabVIEW, but I do have experience in other programming languages like Python and C++. The code I'm going to ask about works, but there was a lot of manual work involved when putting it together. Basically I read from a text file and change control values based on values in the text file; in this case it's 40 values.
I have set it up to pull from a text file and split the string by commas. Then I loop through all the values and set the indicator to read the corresponding value. I had to create 40 separate case statements to achieve this. I'm sure there is a better way of doing this. Does anyone have any suggestions?
The following improvements could be made (in addition to those suggested by sweber):
If the file contains just data, without a "label - value" format, then you could read it in CSV (comma-separated values) format and actually read just the first row.
Currently, you set values based on order. In this case, you could create references to all the indicators, build them into an array in the proper order, and assign values to the indicators in a For Loop via the Value property node.
Overall, I agree with sweber that if this is key-value data, it is better to use either the JSON format or the .ini file format, both of which support such a structure (a rough sketch of the difference follows below).
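LabVIEW block diagrams can't be shown in text, so here is a rough Python sketch of the file-format suggestion (the file names and keys are made up); the point is that a bare CSV row only carries order, while a key-value file also carries the labels:
import csv
import json
# Bare CSV: a single row of 40 numbers; only their order gives them meaning.
with open('controls.csv') as f:            # hypothetical file name
    values = [float(v) for v in next(csv.reader(f))]
# JSON: every value travels with its label, so the order in the file no longer matters.
with open('controls.json') as f:           # hypothetical file name
    labelled = json.load(f)                # e.g. {"motor_speed": 12.5, ...}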
Let's start with some optimization:
It seems your data file contains nothing more than 40 numbers. You can wire a 1D DBL array to the default input of the string-to-array VI, and you will get just a 1D array out. No need for a 2D array.
Second, there is no need to convert the FOR loop index to a string; the CASE accepts integers, too.
Now, about your question: the simplest solution is to display the values as an array, just as they come from the string-to-array VI.
But I guess each value has a special meaning, and you would like to display its name/description somehow. In this case, create a cluster with 40 values, edit their labels as you like, and make sure their order in the cluster is the same as the order of the values in the file.
Then, wire the 1D array of values to this cluster via an array-to-cluster VI.
If you plan to use the text file to store and load the values, converting the cluster data to JSON and vice versa might be something for you, as it transports the labels of the cluster into the file, too. (However, changing labels then becomes an issue.)

TensorFlow: does a function modify the array itself or a copy of it?

I have a question. For example, I have an array and want to change its boundary values. I want to write a function that changes these values and pass the array to this function.
The question is: does this function operate on a reference to the array, or on a copy of it?
Best regards,
Elijah.
Values of nodes are immutable. Only variables can be modified.
If you have an operation that changes some of the values, it will produce a different array.
Whether that happens by modifying the data in place or by copying it first and then modifying it is not specified; it is implementation-dependent.
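A small sketch of that distinction, assuming TensorFlow 2.x in eager mode (the values and indices are just an example):
import tensorflow as tf
x = tf.constant([1.0, 2.0, 3.0, 4.0])
# An op never changes x; it returns a new tensor with the updated boundary values.
y = tf.tensor_scatter_nd_update(x, indices=[[0], [3]], updates=[9.0, 9.0])
print(x.numpy())   # [1. 2. 3. 4.]  -- the original is unchanged
print(y.numpy())   # [9. 2. 3. 9.]
# Only a tf.Variable can be modified in place.
v = tf.Variable([1.0, 2.0, 3.0, 4.0])
v[0].assign(9.0)
v[3].assign(9.0)
print(v.numpy())   # [9. 2. 3. 9.]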

Dynamic type cast in select query

I have totally rewritten my question because of an inaccurate description of the problem!
We have to store a lot of different information about a specific region. For this we need a flexible data structure that does not limit the possibilities for the user.
So we've created a key-value table for this additional data, which is described by a meta table containing the data type of each value.
We already use this information for queries over our REST API: we automatically wrap the requested field in a cast.
SQL Fiddle
We return this data together with information from other tables as a JSON object. We convert the corresponding rows of the data table into a JSON object with array_agg and json_object:
...
CASE
WHEN count(prop.name) = 0 THEN '{}'::json
ELSE json_object(array_agg(prop.name), array_agg(prop.value))
END AS data
...
This works very well. The problem is that if we store something like a floating point number in this field, we get back a string representation of that number:
e.g. 5.231 is returned as "5.231"
Now we would like to CAST this number during our SELECT statement into the right data type so the JSON result is formatted correctly. We have all the information we need, so we tried the following:
SELECT
json_object(array_agg(data.name),
-- here I cast the value into the right datatype!
-- results in an error
array_agg(CAST(value AS datatype))) AS data
FROM data
JOIN (
SELECT name, datatype
FROM meta)
AS info
ON info.name = data.name
The error message is the following:
ERROR: type "datatype" does not exist
LINE 3: array_agg(CAST(value AS datatype))) AS data
^
Query failed
PostgreSQL said: type "datatype" does not exist
So is it possible to dynamically cast the text of the data_type column to a PostgreSQL type in order to return a well-formatted JSON object?
First, that's a terrible abuse of SQL, and ought to be avoided in practically all scenarios. If you have a scenario where this is legitimate, you probably already know your RDBMS so intimately that you're writing custom indexing plugins, and wouldn't even think of asking this question...
If you tell us what you're actually trying to do, there's about a 99.9% chance we can tell you a better way to do it.
Now with that disclaimer aside:
This is not possible without using dynamic SQL, i.e. building and executing the query text at runtime (in PostgreSQL that means an EXECUTE statement inside a PL/pgSQL function), which you can read about in the manual.
Note, however, that even with this method, the result for every row fetched in the same query must have the same data type. In other words, you can't expect row 1 to have a data type of VARCHAR and row 2 to have INT. That is completely impossible.
The problem you have is that json_object creates an object from a string array of keys and another string array of values. So if you feed your JSON values into it, it will always return an error.
So the first problem is that you have to use a JSON or JSONB column for the values, or convert the values from strings to JSON with to_json().
The second problem is that you then need a different function to create your JSON object, because you want to feed it string keys and JSON values. For this there is json_object_agg.
Then your output should be the one you expected! Here is the full query:
SELECT
json_object_agg(data.name, to_json(data.value)) AS data
FROM data

Do Apache Pig UDFs ever get called with null tuples?

https://wiki.apache.org/pig/UDFManual
The example UDF has a null-check on the input tuple in the exec method. The various built-in methods sometimes do and sometimes don't.
Are there actually any cases where a Pig script will cause a UDF to be called with a null input tuple? Certainly an empty input tuple is normal and expected, as is a tuple containing a single null value, but I've never seen the tuple itself be null.
A tuple may be null because a previous UDF returned null.
Think of a log-analysis system where you 1. parse the log, then 2. enrich it with external data:
LOG --(PARSER)--> PARSED_LOG --(ENRICHENER)--> ENRICHED_LOG
If one LOG has the wrong format and cannot be parsed, the PARSER UDF may return a null PARSED_LOG.
Therefore, if it is used directly, the ENRICHENER has to test its input.
You can also FILTER out those null values beforehand, especially if the tuples are used multiple times or STORED.
My best understanding after working with Pig for a while is that UDFs are never passed bare nulls; they always receive a non-null Tuple (which may itself contain nulls).

Pandas interprets 'timestamp without time zone' columns as different types

I read a table with pandas:
import pandas as pd
import numpy as np
import psycopg2
con = psycopg2.connect(...)
mframe = pd.read_sql('''select dt_A, dt_B from (...)''',con)
Both columns (dt_A and dt_B) are of type 'timestamp without time zone' in the database. However, they are read as different types by pandas:
mframe.dt_A.dtype, mframe.dt_B.dtype
Yields:
(dtype('O'), dtype('<M8[ns]'))
I was able to force both columns to be recognized as
"<M8[ns]"
using the 'parse_dates' parameter, but I'd like to understand what causes this. As far as I've checked, neither column contains any NA values (which was my first suspicion). What could cause them to be interpreted differently?
Update:
I'm using Pandas version 0.15.1; and I can reproduce the problem using both sqlalchemy and psycopg2 connections.
Update 2: running the original query with a small limit works as I expected - that is, both columns have the same dtype "M8[ns]". Still not sure what kind of entry (something ill-formatted?) is causing this, but I'm satisfied for now.
Update 3: joris got it. See the comments below.
As you noted, it works correctly when limiting to some of the data (by adding LIMIT 5 to your query), so it probably has to do with some 'incorrect' values in the dates.
To find out which value is causing the problem, you can read in all the data (resulting in the object dtype) and then do the conversion manually with:
pd.to_datetime(column, errors='raise')
The errors='raise' will ensure you get an error message indicating which date cannot be converted.
To ensure that the column is converted to datetime64 values, regardless of invalid values, you should specify the column in the parse_dates kwarg.
It seems that when using read_sql_table, the invalid date will be converted to NaT automatically, while read_sql_query will leave the values as datetime.datetime values. I opened an issue for this inconsistency: https://github.com/pydata/pandas/issues/9261
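For completeness, a short sketch of both suggestions, assuming the same connection and query as in the question (the column names passed to parse_dates are the part to adapt):
import pandas as pd
# Option 1: have pandas force both columns to datetime64[ns] while reading.
mframe = pd.read_sql('''select dt_A, dt_B from (...)''', con,
                     parse_dates=['dt_A', 'dt_B'])
# Option 2: read the data as-is and locate the offending value manually;
# errors='raise' stops at the first entry that cannot be converted.
raw = pd.read_sql('''select dt_A, dt_B from (...)''', con)
pd.to_datetime(raw['dt_A'], errors='raise')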