CTF input file for CNTK. Entire sequence (dynamic axis) in one single line without Sequence ID

I have a model with multiple fixed-size inputs and one dynamic-axis input, and I am creating a CTF-format file for it. According to the documentation, a dynamic-size input should be written with a sequence ID at the start of every line, like this:
0 |word 234:1 |tag 12:1
0 |word 123:1 |tag 10:1
0 |word 123:1 |tag 13:1
1 |word 234:1 |tag 12:1
1 |word 123:1 |tag 10:1
...
I was wondering whether we can change this to a single-line input without explicit sequence IDs, something like this:
|word 234:1 |tag 12:1 |word 123:1 |tag 10:1 |word 123:1 |tag 13:1
|word 234:1 |tag 12:1 |word 123:1 |tag 10:1
Can this be done? Will the CTFDeserializer maintain the order of the sequence in this case?

This is allowed. If sequence IDs are omitted, every line will be interpreted as a new sequence.
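The rule stated above can be illustrated with a small Python sketch. This is only a toy model of the grouping rule, not CNTK's actual parser: the function name `split_sequences` is hypothetical, and it assumes that a line starting with `|` carries no sequence ID while any other line starts with one.

```python
from itertools import groupby

def split_sequences(lines):
    """Group CTF-style lines into sequences (toy model, not CNTK code).

    Assumption: a line that starts with '|' has no sequence ID and forms
    its own sequence; otherwise the first token is the sequence ID and
    consecutive lines sharing that ID belong to the same sequence.
    """
    keyed = []
    for n, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue
        if line.startswith("|"):
            # No ID: use the line number as a unique key.
            keyed.append((("line", n), line))
        else:
            seq_id, rest = line.split(None, 1)
            keyed.append((("id", seq_id), rest))
    # groupby keeps the original order of the lines.
    return [[sample for _, sample in group]
            for _, group in groupby(keyed, key=lambda kv: kv[0])]

with_ids = [
    "0 |word 234:1 |tag 12:1",
    "0 |word 123:1 |tag 10:1",
    "0 |word 123:1 |tag 13:1",
    "1 |word 234:1 |tag 12:1",
    "1 |word 123:1 |tag 10:1",
]
without_ids = [
    "|word 234:1 |tag 12:1 |word 123:1 |tag 10:1 |word 123:1 |tag 13:1",
    "|word 234:1 |tag 12:1 |word 123:1 |tag 10:1",
]
print([len(s) for s in split_sequences(with_ids)])     # sequence lengths in lines
print([len(s) for s in split_sequences(without_ids)])  # one line per sequence
```

Under this model the file with IDs yields two sequences of three and two lines, while the file without IDs yields one sequence per line, in file order.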

Related

Conditionally remove a field in Splunk

I have a table generated by chart that lists the results of a compliance scan.
These results are typically Pass, Fail, and Error, but sometimes "Unknown" also appears as a response.
I want to show the percentage of each (Pass, Fail, Error, Unknown), so I do the following:
| fillnull value=0 Pass Fail Error Unknown
| eval _total=Pass+Fail+Error+Unknown
<calculate percentages for each field>
<append "%" to each value (Pass, Fail, Error, Unknown)>
What I want to do is eliminate a "totally" empty column, and display it only if it actually exists somewhere in the source data (not merely because of the fillnull command).
Is this possible?
I was thinking something like this, but cannot figure out the second step:
| eventstats max(Unknown) as _unk
| <if _unk is 0, drop the field>
edit
This could just as easily be reworded to:
if every entry for a given field is identical, remove it
Logically, this would look something like:
if(mvcount(values(fieldname))<2), fields - fieldname
Except, of course, that's not valid SPL

Could you try this logic after the chart command?
``` fill with null values ```
| fillnull value=null()
``` rotate the table 90° twice, dropping empty/null fields ```
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
[edit:] It works when I run the following test, but I am not sure it is easy to make it work under all conditions:
| stats count | eval keep=split("1 2 3 4 5"," ") | mvexpand keep
| table keep nokeep
| fillnull value=null()
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
[edit2:] If you need to turn other values (here 0) into null() first, it could be done like this:
| stats count | eval keep=split("1 2 3 4 5"," "), nokeep=0 | mvexpand keep
| table keep nokeep
| foreach nokeep [ eval nokeep=if(nokeep==0,null(),nokeep) ]
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
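For readers who want to check the intended result outside Splunk, the question's rule ("if every entry for a given field is identical, remove it") can be sketched in plain Python. This is only an illustration of the logic, not SPL; the function name `drop_constant_columns` and the sample rows are made up for the example.

```python
def drop_constant_columns(rows):
    """Remove every column whose value is the same in all rows.

    `rows` is a list of dicts that share the same keys (one dict per
    table row). Note that with a single row every column is constant,
    so every column would be dropped.
    """
    if not rows:
        return rows
    columns = list(rows[0])
    # A column is kept only if it takes at least two distinct values.
    keep = [c for c in columns if len({row.get(c) for row in rows}) > 1]
    return [{c: row.get(c) for c in keep} for row in rows]

# Hypothetical scan results after fillnull value=0.
scan = [
    {"host": "a", "Pass": 5, "Fail": 1, "Error": 0, "Unknown": 0},
    {"host": "b", "Pass": 3, "Fail": 2, "Error": 1, "Unknown": 0},
]
print(drop_constant_columns(scan))
```

Here "Unknown" is 0 in every row and is dropped, while "Error" survives because one row has a non-zero value.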

How to convert print output to pyspark dataframe (no pandas allowed)

The usual code is:
print((sparkdf.count(), len(sparkdf.columns)))
Since I am working on a system that runs entirely on HDFS, pandas is not allowed. The output I need is:
|-------|-------|
|row |columns|
|-------|-------|
|1500 | 22 |
|-------|-------|

Just use spark.createDataFrame and pass the values as a list of tuples:
spark.createDataFrame([(sparkdf.count(), len(sparkdf.columns))], schema=['rows', 'columns'])

SQL - Extracting first 5 consecutive numbers from alphanumeric string

I am using AWS Athena, so the available functions are a bit limited. But essentially I want to extract the first 5 consecutive digits from an alphanumeric field.
From the first example, you can see it ignores the first 1 because there aren't 4 trailing numbers. I want to find and extract the first 5 numbers that are given together from this field. The output field is what I am hoping to achieve.

This will find a sequence of exactly 5 digits.
A run of fewer or more than 5 digits will be ignored.
(^|\D) = the start of the text OR a non-digit character
(\d{5}) = 5 digits (the captured group 2)
(\D|$) = a non-digit character OR the end of the text
with t (Example) as (values ('Ex/l/10345/Pl'), ('Ex/23453PlWL'), ('ID09456//'))
select Example, regexp_extract(Example, '(^|\D)(\d{5})(\D|$)', 2) as Output
from t
+---------------+--------+
| Example | Output |
+---------------+--------+
| Ex/l/10345/Pl | 10345 |
| Ex/23453PlWL | 23453 |
| ID09456// | 09456 |
+---------------+--------+
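The same pattern can be tried locally with Python's `re` module, which should treat this particular expression the same way as Athena's `regexp_extract`. The helper name `first_five_digit_run` is only an illustration, and the last two inputs are extra test strings that are not from the question.

```python
import re

# Same pattern as in the query: group 2 holds the 5 digits.
FIVE_DIGITS = re.compile(r"(^|\D)(\d{5})(\D|$)")

def first_five_digit_run(text):
    """Return the first run of exactly 5 digits, or None if there is none."""
    match = FIVE_DIGITS.search(text)
    return match.group(2) if match else None

for example in ["Ex/l/10345/Pl", "Ex/23453PlWL", "ID09456//", "1/10345", "A123456B"]:
    print(example, "->", first_five_digit_run(example))
```

The first three inputs reproduce the table above; "1/10345" skips the lone leading digit, and "A123456B" yields no match because its digit run is six long.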

VB.net | Save data from DataGridView to text file (Line per line)

The application I designed so far has a DataGridView which loads data from a text file line per line.
All I want is for the code to save the (first row, first column) on the first line as a string, then (first row, second column) on the second line, etc.
Here is an example of what my table looks like:
|-------------------------------------------------------|
| ID | Date | Height | Weight | BMI | Units |
|-------------------------------------------------------|
| 01 | 16/06 | 1.74 | 64 | 20.9 | Metric |
| 02 | 17/06 | 1.74 | 63 | 20.6 | Metric |
|-------------------------------------------------------|
So from this example, after the data has been saved to the text file it should look exactly like this:
01
16/06
1.74
64
20.9
Metric
02
17/06
1.74
63
20.6
Metric
I came across some excellent code which does this with tabs, instead of a next line, here it is:
dgvMeasures.ClipboardCopyMode = DataGridViewClipboardCopyMode.EnableWithoutHeaderText
dgvMeasures.SelectAll()
IO.File.WriteAllText(fileName, dgvMeasures.GetClipboardContent.GetText.TrimEnd)
dgvMeasures.ClearSelection()
NOTE: The DataGridView is called dgvMeasures
Also please note that I cannot provide anything that I have already tried since there is nothing I can do, I have no idea what to do.
So if there is anyone who could help, it would be greatly appreciated

To do this, you just need to use a StreamWriter and loop over the cells in the order you want.
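Using writer As New System.IO.StreamWriter(filePath)
    For Each row As DataGridViewRow In dgvMeasures.Rows
        ' Skip the empty placeholder row shown when AllowUserToAddRows is True
        If row.IsNewRow Then Continue For
        For Each cell As DataGridViewCell In row.Cells
            ' One cell per line; Convert.ToString turns an empty (Nothing) cell into a blank line
            writer.WriteLine(Convert.ToString(cell.Value))
        Next
    Next
End Using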
This will go through each column for each row (as you describe), and then go to the next row.
I am sure you have a reason for writing the text file like this, but if you want to read it back in at some point, I would really recommend using a tab-delimited (or similar) format.
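To make the trade-off concrete, here is a language-neutral sketch in Python of the same row-by-row, cell-by-cell layout and of what reading it back requires. The function names `write_cells` and `read_cells` are hypothetical, and an in-memory buffer stands in for the text file.

```python
import io

def write_cells(rows, stream):
    """Write every cell on its own line, row by row."""
    for row in rows:
        for cell in row:
            # An empty cell becomes a blank line so positions stay aligned.
            stream.write("" if cell is None else str(cell))
            stream.write("\n")

def read_cells(stream, columns):
    """Rebuild the rows; the column count must be known in advance."""
    values = stream.read().splitlines()
    return [values[i:i + columns] for i in range(0, len(values), columns)]

table = [
    ["01", "16/06", "1.74", "64", "20.9", "Metric"],
    ["02", "17/06", "1.74", "63", "20.6", "Metric"],
]
buffer = io.StringIO()
write_cells(table, buffer)
buffer.seek(0)
print(read_cells(buffer, columns=6) == table)
```

The one-value-per-line file carries no row boundaries, so the reader has to be told there are six columns; a tab-delimited file would keep that information in the file itself.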

Tabulate Command Stata

I don't know if Stata can do this, but I use the tabulate command a lot to find frequencies. For instance, I have a success variable which takes the values 0 or 1, and I would like to know the success rate for a certain group of observations, i.e. tab success if group==1. I was wondering if I can do a sort of inverse of this operation. That is, I would like to know if I can find the values of "group" for which the frequency is greater than or equal to 15%, for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.

According to @Nick:
ssc install groups
sysuse auto
count
74
* optional: run "return list" to see the stored results
local nobs = r(N)   // r(N) holds the total number of observations
groups rep78, sel(f >(0.15*`r(N)'))   // values whose frequency exceeds 15% of the observations
+---------------------------------+
| rep78 Freq. Percent % <= |
|---------------------------------|
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
+---------------------------------+
groups rep78, sel(f >(0.10*`nobs'))   // values whose frequency exceeds 10% of the observations
+----------------------------------+
| rep78 Freq. Percent % <= |
|----------------------------------|
| 2 8 11.59 14.49 |
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
| 5 11 15.94 100.00 |
+----------------------------------+
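The selection rule used above (keep a value when its frequency exceeds a share of all observations) can be checked with a short Python sketch. The function name `frequent_values` is hypothetical; the counts are those of rep78 in the auto dataset, with five missing values included so that the total is 74 as reported by count.

```python
from collections import Counter

def frequent_values(values, share):
    """Return {value: frequency} for values whose frequency exceeds
    share * (number of observations, missing ones included)."""
    counts = Counter(v for v in values if v is not None)
    threshold = share * len(values)
    return {v: f for v, f in sorted(counts.items()) if f > threshold}

# rep78 frequencies from the auto dataset; None marks a missing value.
rep78 = [1] * 2 + [2] * 8 + [3] * 30 + [4] * 18 + [5] * 11 + [None] * 5

print(frequent_values(rep78, 0.15))
print(frequent_values(rep78, 0.10))
```

As in the Stata output, the 15% cut-off keeps values 3 and 4, and the 10% cut-off keeps values 2 to 5.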

I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?
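The egen approach amounts to computing the mean of success within each group and keeping the groups that reach the threshold. A minimal Python sketch of that idea follows; the function name `groups_with_success_rate` and the sample records are made up for illustration.

```python
from collections import defaultdict

def groups_with_success_rate(records, threshold):
    """Return {group: success rate} for groups whose rate is >= threshold.

    `records` is an iterable of (group, success) pairs, success being 0 or 1.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, observations]
    for group, success in records:
        totals[group][0] += success
        totals[group][1] += 1
    return {g: s / n for g, (s, n) in totals.items() if s / n >= threshold}

# Hypothetical data: group 1 succeeds 1 time in 4, group 2 never, group 3 always.
records = [(1, 1), (1, 0), (1, 0), (1, 0), (2, 0), (2, 0), (3, 1), (3, 1)]
print(groups_with_success_rate(records, 0.15))
```

With a 0.15 threshold, groups 1 and 3 are returned and group 2 is left out.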
