How to load a key with values of different types in tfrecord? - tensorflow

I have some third-party-generated tfrecord files. I just found that a specific key takes values of different types across these files, as shown below:
key: "similarity"
value {
float_list {
value: 0.3015786111354828
}
}
key: "similarity"
value {
bytes_list {
value: ""
}
}
When I try to decode this key-value pair from the tfrecord, I run into a problem: I cannot find a suitable type for the key similarity. When I use tf.string or tfds.features.Text() in tfds.features.FeaturesDict for decoding, it returns the error
Data types don't match. Data type: float but expected type: string
When I use tf.float64 in tfds.features.FeaturesDict for decoding, it returns the error
Data types don't match. Data type: string but expected type: float
I wonder if there is anything in tfds.features or tf.train.Example that allows me to decode both float and string?
Or is there something like tfds.decode.SkipDecoding() that would let me read the key similarity and decide how to decode it afterwards? I am aware that tfds.builder().as_dataset() has that option, but I cannot find one in tf.data.TFRecordDataset. I have tried simply removing the entry corresponding to the key similarity, but then the data read from the tfrecord dataset just drops the similarity entry.
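In other words, something along the lines of the following sketch is what I would like to end up with (untested; "data.tfrecord" is a placeholder for my files): read the raw serialized records and branch on whichever oneof field of the similarity feature is actually set.

import tensorflow as tf

# Sketch: bypass the FeaturesDict entirely and inspect each record in Python.
dataset = tf.data.TFRecordDataset("data.tfrecord")  # placeholder filename

for raw in dataset:
    example = tf.train.Example.FromString(raw.numpy())
    feature = example.features.feature["similarity"]
    # A tf.train.Feature is a oneof of bytes_list / float_list / int64_list,
    # so WhichOneof reveals which type this particular record used.
    kind = feature.WhichOneof("kind")
    if kind == "float_list":
        similarity = feature.float_list.value[0]  # e.g. 0.3015786111354828
    elif kind == "bytes_list":
        similarity = feature.bytes_list.value[0]  # e.g. b""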
Thanks a lot!

Related

Group lines of text to a map using Kotlin.Collections functions

Let's say I have a text file with contents like this:
[John]
likes: cats, dogs
dislikes: bananas
other info: is a man baby
[Amy]
likes: theater
dislikes: cosplay
[Ghost]
[Gary]
age: 42
Now, this file is read to a String. Is there a way to produce a Map<String, List<String>> that would have the following content:
Key: "John"
Value: ["likes: cats, dogs", "dislikes: bananas"., "other info: is a man baby"]
Key: "Amy"
Value: ["likes: theater", "dislikes: cosplay"]
Key: "Ghost"
Value: []
Key: "Gary"
Value: ["age: 42"]
that is to say, is there a sequence of Kotlin.Collections operators that would take the key from the brackets and collect all the following lines as the value for that key, gathering these key-value pairs into a map? The number of lines belonging to any given entry is unknown beforehand; there might be any number of property lines, including zero.
I'm aware this is trivial to implement without Kotlin.Collections; the question is, is there a (possibly elegant) way of doing it with the Collections operations?
You can do it like this:
text.split("\n\n")
.associate {
val lines = it.split("\n")
lines[0].drop(1).dropLast(1) to lines.drop(1)
}
Here, we first divide the entire text into a list (by splitting on consecutive newlines) where each element contains the data for one person.
Next, we use associate to convert the list into a map, mapping each list element to a map entry. To build each entry, we first split the person's data string into lines. The key is lines[0].drop(1).dropLast(1), i.e. the first line with its first ([) and last (]) characters removed. The value is the list of all lines except the first.
This might also work: it splits the content before each [ and then takes the remaining lines of each group.
text.split("\\s(?=\\[)".toRegex())
.map { it.split("\n").filter(String::isNotEmpty) }
.associate {
it.first().replace("[\\[\\]]".toRegex(), "") to it.takeLast(it.size-1)
}

How to chain filter expressions together

I have data in the following format
ArrayList<Map.Entry<String,ByteString>>
[
  {"a": [a-bytestring]},
  {"b": [b-bytestring]},
  {"a:model": [amodel-bytestring]},
  {"b:model": [bmodel-bytestring]},
]
I am looking for a clean way to transform this data into a List<Map.Entry<ByteString,ByteString>> where each key is the value of a and each value is the value of a:model.
Desired output
List<Map.Entry<ByteString,ByteString>>
[
  {[a-bytestring]: [amodel-bytestring]},
  {[b-bytestring]: [bmodel-bytestring]}
]
I assume this will involve the use of filters or other map operations, but I am not familiar enough with Kotlin yet to know how.
It's not possible to give an exact, tested answer without access to the ByteString class — but I don't think that's needed for an outline, as we don't need to manipulate byte strings, just pass them around. So here I'm going to substitute Int; it should be clear and avoid any dependencies, but still work in the same way.
I'm also going to use a more obvious input structure, which is simply a map:
val input = mapOf("a" to 1,
"b" to 2,
"a:model" to 11,
"b:model" to 12)
As I understand it, what we want is to link each key without :model with the corresponding one with :model, and return a map of their corresponding values.
That can be done like this:
val output = input.filterKeys { !it.endsWith(":model") }
                  .map { it.value to input["${it.key}:model"] }.toMap()
println(output) // Prints {1=11, 2=12}
The first line filters out all the entries whose keys end with :model, leaving only those without. Then the second creates a map from their values to the input values for the corresponding :model keys. (Unfortunately, there's no good general way to create one map directly from another; here map() creates a list of pairs, and then toMap() creates a map from that.)
I think if you replace Int with ByteString (or indeed any other type!), it should do what you ask.
The only thing to be aware of is that the output is a Map<Int, Int?> — i.e. the values are nullable. That's because there's no guarantee that each input key has a corresponding :model key; if it doesn't, the result will have a null value. If you want to omit those, you could call filterValues{ it != null } on the result.
However, if there's an ‘orphan’ :model key in the input, it will be ignored.

What is an efficient way to parse a query result into a struct that has two fields, a string and an array of structs, using pkg sqlx?

I have written the following query and am now trying to find an efficient way to parse its result:
SELECT modes.mode_name, confs.config_name, confs.field1, confs.field2, confs.field3
FROM modes
JOIN confs ON modes.config_name = confs.name
WHERE modes.mode_name = $1 ORDER BY confs.config_name ASC;
For each mode there are multiple corresponding configs; the table modes has two columns forming a primary key: mode_name and config_name.
Here are the structs I have to use:
type Mode struct {
    Name    string `db:"mode_name"`
    Configs []Config
}

type Config struct {
    Name   string  `db:"name" json:"-"`
    Mode   string  `db:"mode_name" json:"mode_name,omitempty"`
    Field1 float32 `db:"field1" json:"field1,omitempty"`
    Field2 float32 `db:"field2" json:"field2,omitempty"`
    Field3 float32 `db:"field3" json:"field3,omitempty"`
}
I expect to find a way to populate the Mode struct with the data from the query above:
Name from mode_name
then parse each corresponding config into a Config struct and append them to the Configs field
I have studied the docs for pkg sqlx, and picked and tried several options that looked promising:
sqlx.QueryxContext, iterating over the Rows with StructScan
sqlx.NamedExec, parsing directly into the struct (which fails yet again, as mine has an embedded struct inside)
Both of them failed, and I am beginning to think there might be no elegant way to solve this task in these circumstances with the aforementioned tools.

Get field names from a TFRecord

Given a .tfrecord file, we can define a record iterator
record_iterator = tf.python_io.tf_record_iterator(path=record)
Then parse it using
example = tf.train.SequenceExample()
for element in record_iterator:
    example.ParseFromString(element)
The question is: how do we infer all the field names in the context?
If we know the structure in advance, we can say example.context.feature["width"]. In addition, str(example.context) returns a string with the entire record structure. However, I was wondering if there is any built-in function to get the field names and avoid parsing this string (e.g. by searching for "key").
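One possibility (a small sketch building on the code above; record is assumed to be the path from the question): since example.context.feature is a protobuf map, its keys can be listed directly, with no string parsing at all.

import tensorflow as tf

record_iterator = tf.python_io.tf_record_iterator(path=record)
example = tf.train.SequenceExample()
for element in record_iterator:
    example.ParseFromString(element)
    # context.feature is a protobuf map field, so its keys are the field names
    print(list(example.context.feature.keys()))  # e.g. ['width', ...]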

How can I change column data type from float to string in Julia?

I am trying to convert a column in a DataFrame from float to string. I have tried
df = readtable("data.csv", coltypes = {String, String, String, String, String, Float64, Float64, String});
but it complained
syntax: { } vector syntax is discontinued
I also have tried
dfB[:serial] = string(dfB[:serial])
but it didn't work either. So I'd like to know the proper approach to changing a column's data type in Julia.
thx
On your first attempt, Julia tells you what the problem is - you can't make a vector with {}, you need to use []. Also, the name of the keyword argument should be eltypes rather than coltypes.
On the second try, you don't have a float, you have a Vector of floats. So to change the type you need to change the type of every element. In Julia, elementwise operations on vectors are generalized by the 'dot' syntax, e.g. string.(collect(dfB[:serial])). The collect is needed currently to cast the DataArray to a normal Array first; this will fail if the DataArray contains NAs. IMHO the DataFrames interface is still rather wonky, so expect a few headaches like this ATM.