Processing loading table data - file-upload

I have a text file "celldata.txt" containing a very simple table of data.
1 2 3 4
5 6 7 8
9 10 11 12
1 2 3 4
2 3 4 5
The problem is when it comes to accessing the data at a certain column and row.
My approach has been to load using loadTable.
Table table;
int numCols;
int numRows;
void setup() {
size(200,200);
table = loadTable("celldata.txt","tsv");
numRows=table.getRowCount();
numCols=table.getColumnCount();
}
void draw() {
background(255);
fill(0);
text(numRows +" "+ numCols,100,100); // Check num of cols and rows
println(table.getFloat(0,0));
}
Question 1: When I do this, it says the number of rows are 5 and the number of columns is just 1. Why is it not 5 x 4?
Question 2: Why is table.getFloat(0,0) "NaN" instead of the first element of the data?
I want to use a much bigger matrix later and access certain elements (of type double) with something like getFloat(i,j) and be able to loop through all elements.
Using the same example data as I, can someone please help me understand what is wrong with my code and how to access the textfile's data? Should I be using another method than loadTable?

You've told Processing that the file contains tab separated values (by using the "tsv" option), but your file contains space separated values.
Since your file does not contain any tabs, it reads the entire row as a single value. So the 0,0 position of your table is 1 2 3 4, which isn't a number- hence the NaN. This is also why it thinks your table only has one column.
You should modify your celldata.txt file to actually be separated by tabs instead of spaces:
1 2 3 4
5 6 7 8
9 10 11 12
1 2 3 4
2 3 4 5
You could also separate them by commas and then use the "csv" option.
If you're still having trouble, you can see what Processing is reading in by adding saveTable(table, "data/new.csv"); to the end of your setup() function and then looking at that file. It will be a list of values separated by commas, so you can see exactly where Processing thinks the cells of the table are.

Related

Is there a way to detect special chars such as '?' or any, in a column in huge dataframe with thousands of records?

INPUT
A B C
0 1 2 3
1 4 ? 6
2 7 8 ?
... ... ... ...
551 4 4 6
552 3 7 9
There might be '?' in between somewhere which is undetectable, I tried doing it with
pd.to_numeric, error='coerce'
but it only show first 5 and last 5 rows, and I cant check all rows/columns for special chars
So how to actually deal with this problem and make dataset clean
Once detected I know how to remove those and fill with their respective column mean values, so thats not an issue
Please I'm new to this stack overflow and switching from a non-IT field
The below is an easier way without using regex.
special = '[#_!#$%^&*()<>?/\|}{~:]'
df['B'].str.count(special)
Please refer to below link to do it using regex:
regex

Unable to identify strange whitespace character in MSSQL table

We have a process that reads an XML file into our database and inserts any rows that aren't currently in another table to that table.
This process also has a trigger to write to an audit table and a nightly snapshot is also held in another table.
In the XML holding table a field looks like 1234567890123456 but it exists on our live table as 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6. Those spaces will not be removed by any combination of REPLACE functions. We have tried all CHAR values and it does not recognise the character. The audit table and nightly snapshot, however, contain the correct values.
Similarly, if we run a comparison between SELECT CASE WHEN '1234567890123456' = '1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 ' THEN 1 ELSE 0 END, this returns 1, so they match. However LEN('1234567890123456') is 16 and LEN('1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 ') is 32.
We have ran some queries to loop through the characters in the field and output the ASCII and Unicode values for the characters. The digits return the correct ASCII/Unicode values, but this random whitespace character does not return a value.
An example of the incorrectly displayed one is 0x35000000320000003800000036000000380000003300000039000000370000003800000037000000330000003000000035000000340000003000000033000000 and a correct one is 0x3500320038003600380033003200300030003000360033003600380036003000. Both were added by the same means on the same day. One has the extra bytes, the other is fine.
How can we identify this character and get rid of it? Is there a reason this would have been inserted originally? How can we avoid this in future?
Data entry
It looks like some null (i.e. Char(0)) characters have got into the data.
If the data was supposed to be ASCII when it was entered but UTF-16 data got, then it could be:
Entered character codes: 48 00
Sent to database: 48 00 00 00
To avoid that, remove disallowed characters as the first step in processing the input, say by using a regex to replace [\x00-\x1F] with an empty string.
Data repair
Search for entries which a Char(0) in them to confirm that they can be found that way.
If so, replace the Char(0) with an empty string.
If that doesn't work, you could convert the data to the format '0x35000000320000003800000036000000380000003300000039000000370000003800000037000000330000003000000035000000340000003000000033000000', replace '000000' with '00', and then convert back.

SPSS: How can I copy values from a variable (column) and paste it below the other one using syntax?

[SPSS] How can I copy values from a variable (column) and paste it below the other one by syntax?
I need to merge 10 columns and I cant do this only by copy paste.
I have this: [1]: https://i.imgur.com/I5DFV.jpg "tooltip"
var1 var2
1 3 6
2 4 7
3 5 8
4
5
.
.
.
and I want this:
newvar
1 3
2 4
3 5
4 6
5 7
6 8
If you want to create new lines (so you get two lines with one variable instead of one line with two variables), You can use varstocases like this:
varstocases /make NewVar from Var1 Var2/index=originVar(NewVar).
this will get both the old variables into the new one, and create an additional variable called originVar which will contain the name of the original variable that each number in NewVar came from.
ADDITION:
if your file was originally sorted by a specific variable(s) you can now just sort again by your original variable and by originVar. If you don't have a variable that conserves the original order, just create one before rustructure:
compute OrigOrder=$casenum.
restructure....
sort cases by OrigOrder originVar./* or by originVar OrigOrder.
Your example may imply that you already have empty lined to which you want to copy values from previous lines. This is a different situation, you can do it this way:
compute NewVar=Var1.
if missing(NewVar) NewVar=lag(Var2).

gnuplot: Spurious data points in plots when using index

I'm trying to use gnuplot 4.6 patchlevel 6 to visualize some data from a file test.dat which looks like this:
#Pkg 1
type min max avg
small 1 10 5
medium 5 15 7
large 10 20 15
#Pkg 2
small 3 9 5
medium 5 13 6
large 11 17 13
(Note that the values are actually separated by tabs even though it shows as spaces here.)
My gnuplot commands are
reset
set datafile separator "\t"
plot 'test.dat' index 0 using 2:xticlabels(1) title col, '' using 3 title col, '' using 4 title col
This works fine as long as there is only a single data block in test.dat. When I add the second block spurious data points appear. Why is that and how can it be fixed?
YFTR: Using stat on the file yields only expected results. It reports two data blocks for the full file and correct values (for min, max and sum) when I specify one of the two using index
as mentioned in the comment to the question, one has to explicitly repeat the index 0 specification within all parts of the plot command as
plot 'test.dat' index 0 using 2, '' index 0 using 3, ...
otherwise '' refers to all blocks in the data file

SAS INPUT COLUMN

I have a problem in SAS, I would like to know how can I input several columns in only one column(put everything in a single variable)?
For example, I have 3 columns but I would like to put this 3 columns in only one column.
like this:
1 2 3
1 3 1
3 4 4
output:
1
1
3
2
3
4
3
1
4
I'm assuming you're reading from a file, so use the trailing ## to keep reading variables past the end of the line:
data want;
input a ##;
cards;
1 2 3
1 3 1
3 4 4
;
run;
If the dataset is not big just split it to several small data set with one variable each, then rename all variables to one name and concatenate vertiacally using simple set statement. I am sure there are more elegant solutions than this one and if your data set is big let me know, I will write the actual code needed to perform this action with optimal coding