CTF input file for CNTK. Entire sequence (dynamic axis) in one single line without Sequence ID

I have a model with multiple fixed-size inputs and one dynamic-axis input, and I am creating a CTF-format file for it. According to the documentation, a dynamic-size input should be written in the CTF file like this:
0 |word 234:1 |tag 12:1
0 |word 123:1 |tag 10:1
0 |word 123:1 |tag 13:1
1 |word 234:1 |tag 12:1
1 |word 123:1 |tag 10:1
...
I was wondering whether this can be changed to single-line input, without explicit sequence IDs, something like this:
|word 234:1 |tag 12:1 |word 123:1 |tag 10:1 |word 123:1 |tag 13:1
|word 234:1 |tag 12:1 |word 123:1 |tag 10:1
Can this be done? Will the CTFDeserializer maintain the order of the sequence in this case?

This is allowed. If sequence IDs are omitted, every line will be interpreted as a new sequence.
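If the sequences are already held in memory, the explicit-ID layout shown in the question can be generated mechanically. The following is a minimal Python sketch, assuming each sequence is a list of (word index, tag index) pairs; the helper name `to_ctf_lines` is hypothetical and not part of CNTK.

```python
def to_ctf_lines(sequences):
    """Render sequences as CTF text with an explicit sequence ID on every line.

    `sequences` is assumed to be a list of sequences, each a list of
    (word_index, tag_index) pairs. Hypothetical helper, not a CNTK API.
    """
    lines = []
    for seq_id, samples in enumerate(sequences):
        for word, tag in samples:
            # Sparse one-hot entries use the index:value form from the question.
            lines.append(f"{seq_id} |word {word}:1 |tag {tag}:1")
    return lines


ctf_lines = to_ctf_lines([
    [(234, 12), (123, 10), (123, 13)],  # sequence 0: three samples
    [(234, 12), (123, 10)],             # sequence 1: two samples
])
print("\n".join(ctf_lines))
```

Each sample gets its own line, and all lines that share an ID form one sequence, which reproduces the first layout in the question.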

Conditionally remove a field in Splunk

I have a table generated by chart that lists the results of a compliance scan
These results are typically Pass, Fail, and Error - but sometimes there is "Unknown" as a response
I want to show the percentage of each (Pass, Fail, Error, Unknown), so I do the following:
| fillnull value=0 Pass Fail Error Unknown
| eval _total=Pass+Fail+Error+Unknown
<calculate percentages for each field>
<append "%" to each value (Pass, Fail, Error, Unknown)>
What I want to do is eliminate a "totally" empty column, and only display it if it actually exists somewhere in the source data (not merely because of the fillnull command)
Is this possible?
I was thinking something like this, but cannot figure out the second step:
| eventstats max(Unknown) as _unk
| <if _unk is 0, drop the field>
edit
This could just as easily be reworded to:
if every entry for a given field is identical, remove it
Logically, this would look something like:
if(mvcount(values(fieldname))<2), fields - fieldname
Except, of course, that's not valid SPL
Could you try this logic after the chart:
``` fill with null values ```
| fillnull value=null()
``` rotate by 90° twice, dropping empty/null columns ```
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
[edit:] It works when I do the following, but I am not sure it is easy to make it work under all conditions:
| stats count | eval keep=split("1 2 3 4 5"," ") | mvexpand keep
| table keep nokeep
| fillnull value=null()
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
[edit2:] If you need to turn more values into null(), it could be done like this:
| stats count | eval keep=split("1 2 3 4 5"," "), nokeep=0 | mvexpand keep
| table keep nokeep
| foreach nokeep [ eval nokeep=if(nokeep==0,null(),nokeep) ]
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
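For readers who want to check the intended result outside Splunk, the rule "drop a field when it is empty in every row" can be written in a few lines of Python. This is only a sketch of the logic, not SPL; the helper name `drop_empty_fields` and the sample rows are made up for illustration, and it assumes both missing values and 0 count as empty.

```python
def drop_empty_fields(rows, empty_values=(None, 0)):
    """Remove every field whose value is empty in all rows.

    `rows` is assumed to be a list of dicts, one per result row.
    """
    fields = {name for row in rows for name in row}
    # A field survives if at least one row holds a non-empty value for it.
    keep = {
        name for name in fields
        if any(row.get(name) not in empty_values for row in rows)
    }
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


rows = [
    {"Pass": 8, "Fail": 2, "Error": 0, "Unknown": 0},
    {"Pass": 5, "Fail": 0, "Error": 1, "Unknown": 0},
]
cleaned = drop_empty_fields(rows)
print(cleaned)
```

Here `Unknown` is removed because it is 0 everywhere, while `Error` stays because one row has a non-zero value.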

How to convert print output to pyspark dataframe (no pandas allowed)

The usual code is:
print((sparkdf.count(), len(sparkdf.columns)))
Since I am working on a system that runs entirely on HDFS, pandas is not allowed. The output I need is:
|-------|-------|
|row |columns|
|-------|-------|
|1500 | 22 |
|-------|-------|
Just use spark.createDataFrame and pass the values as a list of tuples:
spark.createDataFrame([(sparkdf.count(), len(sparkdf.columns))], schema=['rows', 'columns'])

SQL - Extracting first 5 consecutive numbers from alphanumeric string

I am using AWS Athena, so the available functions are a bit limited. Essentially, I want to extract the first 5 consecutive digits from an alphanumeric field.
From the first example, you can see it ignores the first 1 because there aren't 4 trailing numbers. I want to find and extract the first 5 numbers that are given together from this field. The output field is what I am hoping to achieve.
This will find a sequence of exactly 5 digits.
A sequence of fewer or more than 5 digits will be ignored.
^|\D = Indication for the start of the text OR a non-digit character
\d{5} = 5 digits
\D|$ = A non-digit character OR indication for the end of the text
with t (Example) as (values ('Ex/l/10345/Pl'), ('Ex/23453PlWL'), ('ID09456//'))
select Example, regexp_extract(Example, '(^|\D)(\d{5})(\D|$)', 2) as Output
from t
+---------------+--------+
| Example | Output |
+---------------+--------+
| Ex/l/10345/Pl | 10345 |
| Ex/23453PlWL | 23453 |
| ID09456// | 09456 |
+---------------+--------+
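The same pattern can be tried outside Athena before running the query. The following is a small Python sketch using the standard `re` module with the pattern from the answer; the function name `first_five_digit_run` and the extra six-digit test string are assumptions added for illustration.

```python
import re

# Same pattern as in the query: group 2 holds the 5-digit run.
PATTERN = re.compile(r"(^|\D)(\d{5})(\D|$)")


def first_five_digit_run(text):
    """Return the first run of exactly 5 digits, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(2) if match else None


examples = ["Ex/l/10345/Pl", "Ex/23453PlWL", "ID09456//", "A123456B"]
results = {text: first_five_digit_run(text) for text in examples}
for text, value in results.items():
    print(text, "->", value)
```

The last string contains six digits in a row, so it yields no match, which is the "fewer or more than 5 digits will be ignored" behaviour described above.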

VB.net | Save data from DataGridView to text file (Line per line)

The application I designed so far has a DataGridView which loads data from a text file line per line.
All I want is for the code to save the (first row, first column) on the first line as a string, then (first row, second column) on the second line, etc.
Here is an example of what my table looks like:
|-------------------------------------------------------|
| ID | Date | Height | Weight | BMI | Units |
|-------------------------------------------------------|
| 01 | 16/06 | 1.74 | 64 | 20.9 | Metric |
| 02 | 17/06 | 1.74 | 63 | 20.6 | Metric |
|-------------------------------------------------------|
So from this example, after the data has been saved to the text file it should look exactly like this:
01
16/06
1.74
64
20.9
Metric
02
17/06
1.74
63
20.6
Metric
I came across some excellent code which does this with tabs, instead of a next line, here it is:
dgvMeasures.ClipboardCopyMode = DataGridViewClipboardCopyMode.EnableWithoutHeaderText
dgvMeasures.SelectAll()
IO.File.WriteAllText(fileName, dgvMeasures.GetClipboardContent.GetText.TrimEnd)
dgvMeasures.ClearSelection()
NOTE: The DataGridView is called dgvMeasures
Please also note that I have no attempt of my own to show, because I have no idea where to start.
So if there is anyone who could help, it would be greatly appreciated
To do this, you just need to use a writer, and go through it in the way you want.
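Using writer As New System.IO.StreamWriter(filePath)
    For row As Integer = 0 To dgvMeasures.RowCount - 1
        ' Skip the empty placeholder row shown when AllowUserToAddRows is True
        If dgvMeasures.Rows(row).IsNewRow Then Continue For
        For col As Integer = 0 To dgvMeasures.ColumnCount - 1
            ' Convert.ToString returns an empty string for Nothing instead of throwing
            writer.WriteLine(Convert.ToString(dgvMeasures.Rows(row).Cells(col).Value))
        Next
    Next
End Using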
This will go through each column for each row (as you describe), and then go to the next row.
I am sure you have a reason for writing the text file like this, but if you want to read it back in at some point, I would really recommend using a tab-delimited (or similar) format.
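To make the trade-off concrete, here is a short Python sketch (not VB.NET) that builds both layouts from the sample table in the question. All function names are hypothetical. It shows that the line-per-cell file can only be read back if the reader already knows the number of columns, whereas each tab-delimited line carries a whole row.

```python
def flatten_rows(rows):
    """One cell per line, row by row (the layout asked for in the question)."""
    return "\n".join(str(cell) for row in rows for cell in row)


def regroup(text, columns):
    """Rebuild rows from line-per-cell text; the column count must be known."""
    cells = text.split("\n")
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


def to_tab_delimited(rows):
    """One row per line, cells separated by tabs."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


rows = [
    ["01", "16/06", "1.74", "64", "20.9", "Metric"],
    ["02", "17/06", "1.74", "63", "20.6", "Metric"],
]
flat = flatten_rows(rows)
restored = regroup(flat, columns=6)
tabbed = to_tab_delimited(rows)
print(flat)
print(tabbed)
```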

Tabulate Command Stata

I don't know if Stata can do this, but I use the tabulate command a lot to find frequencies. For instance, I have a success variable that takes the values 0 or 1, and I would like to know the success rate for a certain group of observations, i.e. tab success if group==1. I was wondering whether I can do the inverse of this operation. That is, I would like to find the values of "group" for which the frequency is greater than or equal to 15%, for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.
According to @Nick:
ssc install groups
sysuse auto
count
74
* optional: run "return list" here to see the stored results
local nobs = r(N)   // r(N) holds the total number of observations
groups rep78, sel(f > (0.15*`r(N)'))   // groups whose frequency exceeds 15% of all observations
+---------------------------------+
| rep78 Freq. Percent % <= |
|---------------------------------|
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
+---------------------------------+
groups rep78, sel(f > (0.10*`nobs'))   // more than 10%
+----------------------------------+
| rep78 Freq. Percent % <= |
|----------------------------------|
| 2 8 11.59 14.49 |
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
| 5 11 15.94 100.00 |
+----------------------------------+
I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?
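The egen approach boils down to computing the mean of success within each group and keeping the groups whose mean reaches the threshold. A minimal Python sketch of that logic follows; the function name `groups_meeting_rate` and the sample records are made up for illustration and do not come from the auto dataset.

```python
from collections import defaultdict


def groups_meeting_rate(records, threshold):
    """Return the groups whose mean success is at least `threshold`.

    `records` is assumed to be an iterable of (group, success) pairs,
    with success coded as 0 or 1.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of success, count]
    for group, success in records:
        totals[group][0] += success
        totals[group][1] += 1
    return sorted(g for g, (s, n) in totals.items() if s / n >= threshold)


# Made-up data: group 1 succeeds 1 time in 4, group 2 always, group 3 never.
records = [(1, 1), (1, 0), (1, 0), (1, 0), (2, 1), (2, 1), (3, 0), (3, 0)]
selected = groups_meeting_rate(records, 0.15)
print(selected)
```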