How to remove rows in a dataframe whose column values are not in a list - pandas

I have a dataframe with several possible values in a particular column. I also have a list of the column values that I actually care about. I want to update the dataframe so that it drops every row whose value in that column is not in the list. How would I do this?

If I understand your question correctly, then for a given column col you could do something like this:
df = df.loc[df[col].isin(your_list)]
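As a minimal sketch of the line above on made-up data (the column name "color", the sample values, and the name `wanted` are assumptions for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical sample data; only the isin pattern comes from the answer.
df = pd.DataFrame({"color": ["red", "blue", "green", "red", "black"],
                   "qty": [1, 2, 3, 4, 5]})
wanted = {"red", "green"}  # values to keep; a list works the same way

col = "color"
# isin returns a boolean Series: True where the value is in `wanted`.
filtered = df.loc[df[col].isin(wanted)]
print(filtered)
```

The original row labels are kept on the surviving rows; calling `reset_index(drop=True)` afterwards renumbers them if that matters.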

Related

Pandas - Insert values at specific index and specific column but if there are multiple same index values

I am wondering whether there is a way to insert a scalar value into a dataframe at a specific index and a specific column when the same index value appears multiple times.
I have a dataframe in which the index value (TEVA US EQUITY) appears multiple times (6). I want to insert a value into the first empty row of a specific column, so that it looks like
and the next time I insert values, I want them to go into the second row (because the second row is then the first empty row).
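One possible approach, sketched under the assumption that the goal is to fill the first row under that label whose cell is still empty, and that "empty" means NaN. The column names and the helper `set_first_empty` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the same index label repeated, all cells empty (NaN).
df = pd.DataFrame({"price": [np.nan] * 3, "volume": [np.nan] * 3},
                  index=["TEVA US EQUITY"] * 3)

def set_first_empty(frame, label, column, value):
    """Write value into the first NaN cell of `column` among rows labelled `label`."""
    mask = (frame.index == label) & frame[column].isna().to_numpy()
    positions = np.flatnonzero(mask)  # integer positions of candidate rows
    if positions.size == 0:
        raise ValueError("no empty row left for this label")
    # Positional assignment avoids the ambiguity of the duplicated label.
    frame.iloc[positions[0], frame.columns.get_loc(column)] = value

set_first_empty(df, "TEVA US EQUITY", "price", 10.5)  # fills the first row
set_first_empty(df, "TEVA US EQUITY", "price", 11.0)  # fills the second row
print(df)
```

Assigning by position rather than by label is the design choice here: `df.loc["TEVA US EQUITY", "price"] = value` would write to every row that carries the label.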

How to delete a specific row in pandas dataframe

I want to delete 9 rows with duplicated IDs. I got their index values and tried to drop them, but the length after deletion shows that 714 rows were deleted. Is the pandas index not unique? How can I do this correctly?
I searched the dataframe and identified the rows I wanted to delete. I wrote down the IDs and then attempted the deletion, but something seems to have gone wrong.
I suspect the pandas index is not unique and more rows than intended were deleted.
How can I create a unique index, or how do I use the index correctly?
len(CrimeClean)  # result: 690130
CrimeCleanV1 = CrimeClean.drop([5650, 3725, 6373, 2469, 7751, 7058, 3859, 3640, 3141])
# Validation: row 7751 should not appear
CrimeCleanV1[CrimeCleanV1.Crime_ID == "56882eb6d444d5677ac90c06a0582fe70fe1fd932fd5bd902a5aa4a2aa363bf3"]
# Only one row appeared instead of two, as intended
len(CrimeCleanV1)  # result: 689416
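You can do:
CrimeCleanV1.reset_index(drop=True, inplace=True)
Then
CrimeCleanV1[~CrimeCleanV1.index.isin([...])]
substituting for [...] the list of your row numbers, which are now the index values (look them up again after the reset, because resetting renumbers the rows).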
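To see why dropping by label can remove more rows than expected, here is a small sketch on made-up data. The repeated index is produced with `pd.concat`, which is only an assumption about how such an index might arise:

```python
import pandas as pd

# Hypothetical frame whose index labels repeat (0,1,2,0,1,2,0,1,2).
part = pd.DataFrame({"Crime_ID": ["a", "b", "c"]})
crime = pd.concat([part, part, part])

print(crime.index.is_unique)             # False: labels are repeated
dropped_by_label = crime.drop([1])       # removes every row labelled 1
print(len(crime), len(dropped_by_label))

# Give every row its own label, then remove exactly one row.
crime_unique = crime.reset_index(drop=True)
dropped_one = crime_unique[~crime_unique.index.isin([1])]
print(len(dropped_one))
```

Checking `index.is_unique` before calling `drop` is a cheap way to confirm whether this is what happened.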
You wrote in your comment that the rows are not completely the same.
So I assume that the criterion for marking a row as a duplicate is a list of columns whose combined values should be unique.
You can then call drop_duplicates, passing just this list of columns as the subset parameter.
Another point to decide is which of the duplicated rows are actually to be deleted (the keep parameter): keep the first occurrence, keep the last occurrence, or drop all of them.
Another important check on your data is to run:
CrimeClean[CrimeClean.index.isin([...])]
substituting for [...] the same list that you passed to CrimeClean.drop.
Then you will see how many rows exist with the passed index values.
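A short sketch of the three keep options on made-up data, assuming Crime_ID alone decides whether two rows are duplicates (the real subset of columns is not stated in the question):

```python
import pandas as pd

# Hypothetical data; the Month column only makes the rows differ elsewhere.
crime = pd.DataFrame({
    "Crime_ID": ["x1", "x2", "x1", "x3", "x2"],
    "Month": ["2019-01", "2019-01", "2019-02", "2019-02", "2019-03"],
})

# Keep the first row of each duplicated Crime_ID.
keep_first = crime.drop_duplicates(subset=["Crime_ID"], keep="first")
# Keep the last row of each duplicated Crime_ID.
keep_last = crime.drop_duplicates(subset=["Crime_ID"], keep="last")
# Drop every row whose Crime_ID occurs more than once.
drop_all = crime.drop_duplicates(subset=["Crime_ID"], keep=False)

print(keep_first, keep_last, drop_all, sep="\n\n")
```

Because drop_duplicates works on column values rather than index labels, it is unaffected by a non-unique index.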

How to split single column (with unequal values) to multiple columns sorted according to values from the original single column?

How do I separate values from single columns to multiple columns with the new columns sorted by values (and sometimes blank cells) using Excel Macro code?
Try: =IF(ISNUMBER(SEARCH(B$1,$A2)),B$1,"")

I want to filter column on specific names in column, and delete these

I have values in my dataframe column that are bad for my data and I need to remove them. I only know how to do this for one name:
df = df[df.name != 'susan']
but I want to delete 4 other names from the column as well.
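One way this could be done is to negate an isin mask, sketched here on made-up data. Every name other than 'susan' is hypothetical, as is the score column:

```python
import pandas as pd

# Hypothetical data; only the column "name" and 'susan' come from the question.
df = pd.DataFrame({"name": ["susan", "bob", "amy", "carl", "dana", "eve"],
                   "score": [1, 2, 3, 4, 5, 6]})

bad_names = ["susan", "bob", "carl", "dana", "eve"]  # names to remove

# ~ inverts the mask, so only rows whose name is NOT in bad_names remain.
df = df[~df["name"].isin(bad_names)]
print(df)
```

Using `df["name"]` instead of `df.name` sidesteps any clash between a column called "name" and a DataFrame attribute.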

How would you total the values in one column and keep a distinct value in another?

I have a database table that has multiple codes in one column that correspond to certain values in another column. For example, a particular code in column A corresponds to a value in column B. There are thousands of duplicate entries in column A that correspond to different values in column B. I want to add up all of the values in column B that have the particular code in column A, while only keeping one copy of the code from column A. You may think of the columns as key-value pairs, where column A contains the key and column B contains the value.
Basically, I want to add up all the values in column B where column A is a specific value, and I want to do this for all of the unique "keys" in column A. I'm sure that this is a simple task; however, I am pretty new to SQL. Any help would be greatly appreciated.
Here is the result I'm looking for.
This should work:
SELECT
A,
SUM(B) AS sum_b
FROM [yourTable]
GROUP BY A
To return 0 instead of NULL for a code whose values are all NULL (ISNULL is SQL Server syntax):
SELECT COLUMNA, SUM(ISNULL(COLUMNB, 0)) AS TOTAL
FROM dbo.TableName
GROUP BY COLUMNA
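To try the two queries without a database server, here is a sketch using Python's built-in sqlite3 module. The table name `your_table` and the sample rows are assumptions. SQLite has no two-argument ISNULL, so COALESCE stands in for it, and ORDER BY is added only to make the printed order predictable:

```python
import sqlite3

# In-memory SQLite database with hypothetical key/value rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("x", 1), ("x", 4), ("y", 10), ("y", None), ("z", None)],
)

# Plain SUM skips NULLs, but a group holding only NULLs sums to NULL.
rows = conn.execute(
    "SELECT A, SUM(B) AS sum_b FROM your_table GROUP BY A ORDER BY A"
).fetchall()

# COALESCE(B, 0) turns NULLs into 0 first, so that group sums to 0 instead.
rows_zero = conn.execute(
    "SELECT A, SUM(COALESCE(B, 0)) AS total FROM your_table GROUP BY A ORDER BY A"
).fetchall()
conn.close()

print(rows)
print(rows_zero)
```

The two results differ only for the code "z", whose values are all NULL.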