How to handle question marks that appeared in a pandas DataFrame

I have question marks that appeared in my data frame right next to the numbers, and I don't know how to erase or replace them. I don't want to drop the whole rows, since that may lead to inaccurate results.
. Value
0 58
1 82
2 69
3 48
4 8

I agree with the comments above that you should look into how you imported the data. But here is the answer to your question of how to remove the non-numeric characters.
This keeps only the digits in each value:
df['Value'] = df['Value'].str.extract(r'(\d+)', expand=False)
Then, if you wish to change the datatype to int, you can use this:
df['Value'] = pd.to_numeric(df['Value'])
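For a self-contained check, the two steps above can be run on a small made-up frame. This is only a sketch: the sample values (a stray "?" next to each number) are an assumption about what the data looks like, since the original frame was not posted in full.

```python
import pandas as pd

# Hypothetical sample data: numbers with a stray "?" attached (assumed, not from the question)
df = pd.DataFrame({"Value": ["58?", "82?", "69", "?48", "8?"]})

# Keep only the first run of digits in each cell; expand=False returns a Series
df["Value"] = df["Value"].str.extract(r"(\d+)", expand=False)

# Convert the digit strings to numbers; cells without any digits would become NaN
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")

print(df["Value"].tolist())
```

Rows are kept rather than dropped, which matches the asker's requirement; a cell with no digits at all would end up as NaN and could be inspected separately.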

Related

how to add a character to every value in a dataframe without losing the 2d structure

Today my problem is this: I have a data frame of 300 x 41. It's encoded with numbers. I want to add an 'a' to each value in the data frame so that another downstream program will not fuss about these being 'continuous variables', which they aren't; they are factors. Simple, right?
Every way I can think of to do this, though, returns a data frame or object that is not 300 x 41 but just one long list of altered values, as the session below shows.
Please end this headache for me. How can I do this in a way that returns a 300 x 41 altered output?
> dim(x)
[1] 300 41
> x2 <- sub("^","a",x)
> dim(x2)
[1] 12300 1

groupby 2 columns and count into separate columns based on one columns cases

I'm trying to group by 2 columns, of which the first has 5 different values and the second has 2.
My data looks like this:
and using
df_counted = (
    df_analysis
    .groupby(['TYPE', 'RESULT'])
    .size()
    .sort_values(ascending=False)
    .reset_index(name='COUNT')
)
I was able to transform it into the cases I want:
However, I don't want a column for RESULT, just columns for the counts.
It's supposed to look like this:
          COUNT_TRUE  COUNT_FALSE
FORWARD           21          182
BACKWARD          34          170
RIGHT             24          298
LEFT              20          242
NEUTRAL           16           82
The best I could do there was this. How do I get there?
pandas can build a pivot table from a DataFrame, and a pivot table is enough for this task:
df_counted.pivot_table(index="TYPE", columns="RESULT", values="COUNT")
Result:
Solved it by going more or less full SQL. It's not elegant, but it works.
df_counted is the last df from the question, the one with the NaN values.
# drop duplicates, keeping the first row per TYPE
df_pos = df_counted.drop_duplicates(subset=['TYPE'], keep='first').drop(columns=['COUNT_POS'])
# drop duplicates, keeping the last row per TYPE
df_neg = df_counted.drop_duplicates(subset=['TYPE'], keep='last').drop(columns=['COUNT_NEG'])
# join on TYPE
df = df_pos.set_index('TYPE').join(df_neg.set_index('TYPE'))
If someone has a more elegant way of doing this, I'd be super interested to see it.
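One possibly more compact route, sketched here under assumptions: count the TYPE/RESULT pairs with groupby(...).size() and then move RESULT into the columns with unstack. The sample rows below are invented, and the assumption that RESULT holds booleans is mine; the COUNT_TRUE and COUNT_FALSE names follow the desired output in the question.

```python
import pandas as pd

# Hypothetical stand-in for df_analysis; RESULT is assumed to be boolean
df_analysis = pd.DataFrame({
    "TYPE": ["FORWARD", "FORWARD", "FORWARD", "LEFT", "LEFT", "RIGHT"],
    "RESULT": [True, False, False, True, False, False],
})

counts = (
    df_analysis
    .groupby(["TYPE", "RESULT"])
    .size()                      # one count per (TYPE, RESULT) pair
    .unstack(fill_value=0)       # RESULT values become columns; missing pairs count as 0
    .rename(columns={True: "COUNT_TRUE", False: "COUNT_FALSE"})
)
counts.columns.name = None       # drop the leftover "RESULT" label on the columns
print(counts)
```

This skips the intermediate COUNT column entirely, so there are no NaN rows to deduplicate and join afterwards.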

Pandas dataframe selection df['a'][50][:51]

I have a dataframe where one of the column names is 'a'.
I came across the following selection expression:
dataframe['a'][50][:50]
I understand dataframe['a'][50] selects the row 49 in column ['a'], but what does [:50] do?
Thank you
If dataframe['a'][50][:50] doesn't raise an error and actually returns something, it means the value at index label 50 in column 'a' is a sequence type such as a list, string, or tuple. (With a default integer index, label 50 is the 51st row, not row 49.)
dataframe['a'][50][:50] returns elements 0 to 49 of that value.
As I said above, if that value is not a sequence type, you will get an error. Try checking dataframe['a'][50] to see whether it is a sequence type.
Note: dataframe['a'][50] is chained indexing, which is not recommended. However, that is outside the scope of this question, so I won't go into detail.
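A small sketch of the behaviour described above, assuming column 'a' holds strings (the real column contents are not shown in the question). The frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: 60 rows whose 'a' column holds 80-character strings (assumed)
dataframe = pd.DataFrame({"a": [str(i % 10) * 80 for i in range(60)]})

cell = dataframe["a"][50]      # label-based lookup: the value at index label 50
head = cell[:50]               # ordinary Python slicing: elements 0 through 49

# The same lookup written without chained indexing
same_cell = dataframe.loc[50, "a"]

print(type(cell).__name__, len(cell), len(head))
```

The [:50] part is plain Python slicing applied to whatever object the cell contains; pandas is no longer involved at that point.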

How to display the numeric numbers

Here's the content of my DataGrid
id
1
2
3A
4
5
6A
..
...
10V1
I want to get the max number from the datagrid. Then, I want to display the next number (in this case, 11) in the textbox beside the grid.
Expected Output
id
1
2
3A
4
5
6A
..
...
10V1
11
I tried the following code:
textbox1.text = gridList.Rows(gridlist.RowCount() - 1).Cells(1).Value + 1
It works if the previous row's value is entirely numeric. However, if the value is alphanumeric, I get the following error:
Conversion from string "10V1" to type 'Double' is not valid.
Can someone help me solve this problem? I am looking for a solution in VB.Net
You may want to look into Regex to do that (based on what I understand from your question)
Here's a related question on this.
Regex.Match returns the part of the string that matches the expression. In your case, you want the first number in your string (try "^\d+" as your expression; it matches a run of digits at the beginning of the string). You can then convert the matched string to an integer and add 1 to it.
Hope this helps!
Edit: Here's more info on regex expressions.
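The question asks for VB.Net, but the regex idea carries over to other languages. Here is a rough Python sketch of it, assuming the ids are available as a list of strings and that "max number" means the largest leading integer; next_id is a hypothetical helper name, not something from the question.

```python
import re

def next_id(ids):
    """Return one more than the largest leading integer found in ids."""
    numbers = []
    for value in ids:
        match = re.match(r"\d+", value)   # digits at the start of the string only
        if match:                          # skip values with no leading digits
            numbers.append(int(match.group()))
    return max(numbers, default=0) + 1

# Sample ids modelled on the question's grid
print(next_id(["1", "2", "3A", "4", "5", "6A", "10V1"]))
```

Taking the maximum over all rows, rather than reading only the last row, avoids relying on the grid being sorted.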

How to split a really long mysql result set into two lines?

Suppose you have a result that is 100 chars long but you only have a 50-char width. How do you split a MySQL result into two rows of 50 chars each?
Could you clarify the question a bit? Are you looking to insert 100 chars of data into a 50 char column? Or do you have 100 chars in the database but only have space in your app to display 50 chars?
I have 100 chars in the database result set but I want the result set string to have a break after the 50th char and continue onto the next line.
Example
SELECT * FROM FOO
returns
1 2 3 4 5 6 7 8 9...50 51 52 53..98 99 100
but I want
1 2 3 4 5 6 7 8 9...50
51 52... 99 100
Is this possible?
SELECT substring(col, 1, 50) FROM foo
UNION ALL
SELECT substring(col, 51) FROM foo
You're asking a question about formatting data for viewing. SQL is a declarative data retrieval language, not a pretty-printing language. You should solve this problem in your non-SQL code.
Formatting data in a SQL query is not a good idea unless you have to write something that will run in a query analyzer. Your question isn't specific about whether that is the case.
Do you want to return the result set in PHP or MySQL? If the former, then it's easier.
Take the string, take the first 50 characters, put in a line break, and then add the rest of the string.
MySQL would work on the same principle, but you may have issues with line-break characters.
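If the splitting is done in application code, as several of the answers suggest, a minimal sketch could look like this. It assumes the value has already been fetched as a plain string; split_fixed is a hypothetical helper name, and the width of 50 comes from the question.

```python
def split_fixed(text, width=50):
    """Cut text into consecutive chunks of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]

# Stand-in for a 100-character value read from the database
value = "x" * 100
lines = split_fixed(value)
print("\n".join(lines))
```

Slicing by a fixed width also handles values that are longer than 100 characters or not an exact multiple of 50; the last chunk is simply shorter.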