Comparing numbers in JSON Schema - jsonschema

I have a number property in a JSON schema:
"years": {"type": "number", "pattern": "^([0-9]|10)$"}
I want to use this number in a condition where I need to check whether it is less than 3. Is there a way to do it? I tried
"if": {"properties": {"years": {"anyOf": [0,1,2]}}

You want exclusiveMaximum; see https://json-schema.org/understanding-json-schema/reference/numeric.html#range
Note that you may also want minimum: 0 to exclude negative numbers.
You may also want type: integer instead of type: number if you don't want to allow fractional numbers.
pattern is incorrect here, as it applies to strings, not numbers.
anyOf takes an array of schemas, not values, but you could use enum: [0, 1, 2] if those are the only allowed values.
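Putting those pieces together, a condition along these lines should express "years is an integer from 0 up to, but not including, 3" (a minimal sketch, assuming that is the intent of your if):

"if": {"properties": {"years": {"type": "integer", "minimum": 0, "exclusiveMaximum": 3}}}

(In draft-04, exclusiveMaximum is a boolean modifier of maximum, so there you would write "maximum": 3, "exclusiveMaximum": true instead.)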

Related

Can OpenRefine easily do One Hot Encoding?

I have a dataset like a multiple-choice quiz result. One of the fields is semicolon-delimited. I would like to break these into true/false columns.
Input

Student | Answers
Alice   | B;C
Bob     | A;B;D
Carol   | A;D

Desired Output

Student | A     | B     | C     | D
Alice   | False | True  | True  | False
Bob     | True  | True  | False | True
Carol   | True  | False | False | True
I've already tried "Split multi-valued cells" and "Split into several columns", but these don't give me what I would like.
I'm aware that I could do a custom grel/python/jython along the lines of "if value in string: return true" for each value, but I was hoping there would be a more elegant solution.
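Something along those lines would be one new column per answer letter, each with a hypothetical GREL expression like this (shown here for column "A"):

if(value.contains("A"), "True", "False")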
Can anyone suggest a starting point?
GREL in OpenRefine has a somewhat limited set of data structures, but you can still build simple algorithms with it.
For your encoding you need two data structures:
a list (technically an array) of all available categories.
a list of the categories in the current cell.
With these you can check, for each category, whether it is present in the current cell or not.
Assuming the set of available categories is small and known in advance,
I will use a hard-coded list ["A", "B", "C", "D"].
The list of categories in the current cell we get via value.split(/\s*;\s*/).
Note that I am using an array instead of string matching,
and that the split uses a regular expression that tolerates surrounding whitespace.
This is mainly defensive programming; hopefully the algorithm is still understandable.
So let's wrap this all together into a GREL expression and create a new column (or transform the current one):
with(
  value.split(/\s*;\s*/),
  cell_categories,
  forEach(
    ["A", "B", "C", "D"],
    category,
    if(cell_categories.inArray(category), 1, 0)))
.join(";")
You can then split the new column into several columns using ; as the separator.
You will have to assign the new column names manually (sorry!).
Update: here is a more elaborate version that automatically extracts the categories.
The idea is to create a single record for the whole dataset, so that all the entries in the column "Answers" are accessible at once and the available categories can be extracted from them.
Create a new column "Record" with content "Record".
Move the column "Record" to the beginning.
Blank down the column "Record".
Add a new column "Categories" based on the column "Answers" with the following GREL expression:
if(row.index > 0, "",
  row.record.cells["Answers"].value
    .join(";")
    .split(/\s*;\s*/)
    .uniques()
    .sort()
    .join(";"))
Fill down the column "Categories".
Add a new column "Encoding" based on the column "Answers" with the following GREL expression:
with(
  value.split(/\s*;\s*/),
  cell_categories,
  forEach(
    cells["Categories"].value.split(";"),
    category,
    if(cell_categories.inArray(category), 1, 0)))
.join(";")
Split the column "Encoding" on the character ;.
Delete the columns "Record" and "Categories".

Is there an equivalent of an f-string in Google Sheets?

I am making a portfolio tracker in Google Sheets and want to know if there is a way to link the "TICKER" column with the code in the "PRICE" column that is used to pull JSON data from CoinGecko. I was wondering whether there is something like an f-string in Python, where you can insert a variable into the string itself, so that every time the TICKER column is updated, the coin id is updated within the API request string. Essentially, string interpolation.
For example:
TICKER | PRICE
BTC    | =importJSON("https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd&ids={BTC}","0.current_price")
You could use CONCATENATE for this:
https://support.google.com/docs/answer/3094123?hl=en
CONCATENATE function
Appends strings to one another.
Sample Usage
CONCATENATE("Welcome", " ", "to", " ", "Sheets!")
CONCATENATE(A1,A2,A3)
CONCATENATE(A2:B7)
Syntax
CONCATENATE(string1, [string2, ...])
string1 - The initial string.
string2 ... - [ OPTIONAL ] - Additional strings to append in sequence.
Notes
When a range with both width and height greater than 1 is specified, cell values are appended across rows rather than down columns. That is, CONCATENATE(A2:B7) is equivalent to CONCATENATE(A2,B2,A3,B3, ... , A7,B7).
See Also
SPLIT: Divides text around a specified character or string, and puts each fragment into a separate cell in the row.
JOIN: Concatenates the elements of one or more one-dimensional arrays using a specified delimiter.
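Applied to your example, the PRICE formula can build the URL from the ticker cell, something like this (a sketch, assuming importJSON is the custom function you are already using and that A2 contains an id CoinGecko accepts):

=importJSON(CONCATENATE("https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd&ids=", A2), "0.current_price")

The & operator does the same thing more compactly:

=importJSON("https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd&ids=" & A2, "0.current_price")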

derive features from date string in TensorFlow

I am trying to parse a CSV file which contains a date string (format "2018-03-30 09:30:05").
It should be turned into one-hot encoded features in the form of day / hour / minute / second.
One obvious way to do this is to use pandas and store the result in a separate file or an HDF store.
But in order to simplify the workflow (and leverage the GPU), I would like to do this directly in TensorFlow.
Assuming the date string is at position -2, I thought something like tf.int32(tf.substr(row[-2],0,4)) should work to get the year, but it returns TypeError: 'DType' object is not callable.
with tf.python_io.TFRecordWriter("train_sample_sorted.tfrecords") as tf_writer:
    i = 0
    for row in myArray:
        i += 1
        if i % 10000 == 0:
            print(row[-2])
        #timefeatures = int(row[-2][0:4]) ## TypeError: Value must be iterable
        #timefeatures = tf.int32(tf.substr(row[-2],0,4)) ## TypeError: 'DType' object is not callable
        features, label = row[:-2], row[-1]
        example = tf.train.Example()
        example.features.feature["features"].float_list.value.extend(features)
        example.features.feature["timefeatures"].float_list.value.extend(timefeatures)
        example.features.feature["label"].int64_list.value.append(label)
        tf_writer.write(example.SerializeToString())
What is the best practice to handle date strings as input features? Is there a way around pre-processing?
Thanks
The first version, int( row[ -2 ][ 0 : 4 ] ), fails for two reasons: indexing cannot be used to slice the strings inside a string tensor, and even if it could, you cannot convert the result to int like that.
The second version, tf.int32( tf.substr( row[ -2 ], 0, 4 ) ), is almost there: the string slicing is fine, but to convert strings to numbers you have to use tf.string_to_number; you cannot simply cast a string tensor to a number like that.
Without access to the data you use I couldn't test it, but this should work:
tf.string_to_number( tf.substr( row[ -2 ], 0, 4 ), out_type = tf.int32 )
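Extending the same idea to the other components is straightforward (a sketch, assuming TF 1.x, tensorflow imported as tf, and that date_str is a string tensor in the fixed format "2018-03-30 09:30:05"):

date_str = row[-2]
year   = tf.string_to_number(tf.substr(date_str,  0, 4), out_type=tf.int32)
month  = tf.string_to_number(tf.substr(date_str,  5, 2), out_type=tf.int32)
day    = tf.string_to_number(tf.substr(date_str,  8, 2), out_type=tf.int32)
hour   = tf.string_to_number(tf.substr(date_str, 11, 2), out_type=tf.int32)
minute = tf.string_to_number(tf.substr(date_str, 14, 2), out_type=tf.int32)
second = tf.string_to_number(tf.substr(date_str, 17, 2), out_type=tf.int32)

These integer tensors can then be one-hot encoded, e.g. tf.one_hot(hour, depth=24).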

SSRS Format Custom Number

I have a report that I am running that will end up with values like this
100.00
99.98
98.80
100.00
I have a custom format set up where I want to remove the decimals after the 100s and keep all other decimal places. Here is my expression
=IIF(Fields!Field1.Value Is "100.00", "100", Fields!Field1.Value)
This works for the 100 but it removes the 0 after 98.80. Here is what I end up with.
100
99.98
98.8
100
Is there a way to not remove that single trailing 0?
I would leave the value expression alone; just let that be the field value.
I would do this in the format expression of the cell/textbox instead. So the format expression would be something like:
=IIF(Fields!Field1.Value = 100, "f0", "f2")
This assumes your data is numeric.
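With the sample values above, "f0" applies only to the 100s and "f2" to everything else, so the column renders as:

100
99.98
98.80
100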
Alternatively, keeping it in the value expression (using "0.00" instead of "##.##" so the trailing zero is not dropped):
=IIF(Fields!Field1.Value = "100.00", "100", Format(CDbl(Fields!Field1.Value), "0.00"))

How can we find regular expressions for the following strings?

Find regular expressions representing the following sets:
1. The set of all strings over {a,b} in which the number of occurrences of a is divisible by 3.
2. The set of all strings over {0,1} beginning with 00.
You can draw out a DFA and use that to find the regular expression. For 1., for example, the DFA would have three states tracking the number of a's modulo 3. You then convert the DFA into a regular expression; state elimination is one standard way to do this.
For 1, you need an expression that gives every possible way of having a string over {a,b} with the number of occurrences of a divisible by 3. There can be 0 a's, since 0 is divisible by 3. There can be 3 a's, 6 a's, 9 a's, and so on. An expression for this is (b*ab*ab*ab*)* + b* (where + denotes union, as in 2. below). The second term allows for the possibility of 0 a's and any amount of b's, since 0 a's is divisible by 3. The first term accounts for all other possibilities of strings with a number of a's divisible by 3.
For 2, the set of all strings over {0,1} is (0+1)* and if it must begin with 00, then the regex is simply 00(0+1)*
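If you want to sanity-check these, here is a quick translation into Python's re syntax (my own sketch; the formal-language union "+" becomes "|"):

import re

# 1. Strings over {a, b} where the number of a's is divisible by 3:
#    (b*ab*ab*ab*)* + b*
div3_a = re.compile(r"^((b*ab*ab*ab*)*|b*)$")

# 2. Strings over {0, 1} beginning with 00:
#    00(0+1)*
starts_00 = re.compile(r"^00[01]*$")

assert div3_a.match("")           # zero a's
assert div3_a.match("bbb")        # zero a's, only b's
assert div3_a.match("abababb")    # three a's
assert not div3_a.match("aab")    # two a's -> rejected
assert starts_00.match("0010")
assert not starts_00.match("010")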