This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 3 years ago.
I have a dataframe and a list as follows.
id title description
0 17810732 "nn nn." "nnnn nnnn"
1 17810731 "mm mm." "mmmm mmmm"
2 17810739 "ll ll." "llll llll"
3 17810738 "jj jj." "jjjj jjjj"
ids = [17810738, 17810731]
I want to get a dataframe that only has rows corresponding to the ids list.
So my output should be as follows.
id title description
0 17810738 "jj jj." "jjjj jjjj"
1 17810731 "mm mm." "mmmm mmmm"
I have been using this code.
for id in ids:
    print(df.loc[df["id"] == id])
However, it only returns a separate dataframe for each id, which is not what I need.
I am happy to provide more details if needed.
The solution is the isin method, so
df[df['id'].isin(ids)]
will do the trick.
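Note that isin keeps the dataframe's original row order, not the order of the ids list. If you want the rows ordered like the list (as in the desired output above), one option is to index on id first. A minimal sketch, assuming df and ids as defined in the question and that every id in the list exists in df:
result = df.set_index('id').loc[ids].reset_index()  # rows come back in the order of ids
print(result)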
This question already has answers here:
How to use count and group by at the same select statement
(11 answers)
Closed 6 months ago.
I want to count a certain value in a column and output it as a number.
Here is an example:
id  job
1   police
2   police
3   ambulance
Now I want to count the occurrences of the value "police" in the "job" column and output the result as a number: there are two entries with "police" in the column, so the output would be 2. For the value "ambulance" there is only one entry, so the result would be 1.
Can anyone tell me how to write this as code?
I have searched a lot on the Internet and tried things myself, but I have found nothing that worked.
You're saying you want to count how many of each type of job there is, right?
SELECT COUNT(*), job
FROM tablename
GROUP BY job
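If you only want the count for a single value such as "police", you can filter with a WHERE clause instead. A minimal sketch, reusing the tablename placeholder from above:
SELECT COUNT(*)
FROM tablename
WHERE job = 'police';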
This question already has answers here:
How to get distinct rows in dataframe using pyspark?
(2 answers)
Closed 2 years ago.
I need to select 2 columns from a fact table (attached below). The problem is that for one of the columns I need unique values, while for the other I'm happy to have duplicates, as they belong to a specific ticket id.
Fact table used:
from pyspark.sql import functions as f  # needed for f.col below

df = (
spark.table(f'nn_table_{country}.fact_table')
.filter(f.col('date_key').between(start_date,end_date))
.filter(f.col('is_client_plus')==1)
.filter(f.col('source')=='tickets')
.filter(f.col('subtype')=='item_pm')
.filter(f.col('external_id').isin('DISC0000077144', 'DISC0000076895'))
.filter(f.col('external_id').isNotNull())
.select('customer_id','external_id').distinct()
#.join(dim_promotions, 'external_id', 'left')
)
display(df)
As you can see, the select statement contains customer_id and external_id columns, but I'm only interested in getting the unique customer_id values.
.select('customer_id','external_id').distinct()
Desired output:
customer_id external_id
77000000505097070 DISC0000077144
77000002294023644 DISC0000077144
77000000385346302 DISC0000076895
77000000291101490 DISC0000076895
Any idea how to do that, or whether it's possible at all?
Thanks in advance!
Use dropDuplicates:
df.select('customer_id','external_id').dropDuplicates(['customer_id'])
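Note that dropDuplicates keeps an arbitrary row per customer_id, since Spark makes no ordering guarantee here. If you need control over which row survives, a window function works. A sketch, assuming the same df as above:
from pyspark.sql import functions as f
from pyspark.sql.window import Window

# Keep, for each customer_id, the row with the smallest external_id
w = Window.partitionBy('customer_id').orderBy(f.col('external_id'))
deduped = (
    df.withColumn('rn', f.row_number().over(w))
      .filter(f.col('rn') == 1)
      .drop('rn')
)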
This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 3 years ago.
I am trying to calculate the sum of the column "Email" grouped by the column "country". The error is in converting str to int.
Script:
import pandas as pd

df3 = pd.read_csv('hi.csv', sep=',')
df3['h1'] = df3['h1'].astype(int)
new_df3=df3.groupby(['country']).h1.sum().to_frame('count').reset_index()
print(new_df3)
Input:
cou h1
A hi
C watsup
G hi
Try this:
df3.groupby('country')['Email'].count()
There is no need to convert the email column to int; just counting the emails should do the job.
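A minimal runnable sketch of that approach, with made-up data mirroring the question (column names country and Email are assumed):
import pandas as pd

df3 = pd.DataFrame({
    'country': ['A', 'C', 'G'],
    'Email': ['hi', 'watsup', 'hi'],
})

# count() tallies non-null entries per group -- no int conversion needed
print(df3.groupby('country')['Email'].count())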
This question already has answers here:
SQL Query to concatenate column values from multiple rows in Oracle
(10 answers)
Closed 4 years ago.
I have a very simple query:
select date,route,employee
from information
where date=Trunc(Sysdate)
However, since more than one employee is assigned to some routes, the query returns multiple rows for those routes.
But I want one row per route, with the names in the same row combined with "|". How can I achieve this in PL/SQL?
You can use the LISTAGG function, but you have to group by date and route as well, so that you get one row per route:
SELECT date "Date",
       route "Route",
       LISTAGG(employee, ' | ')
         WITHIN GROUP (ORDER BY employee) "Employees"
FROM information
WHERE date = Trunc(Sysdate)
GROUP BY date, route;
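For illustration: with two hypothetical rows for today's date, route R1, and employees ALICE and BOB, the grouped query would return a single row with Employees = 'ALICE | BOB'.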
This question already has answers here:
Sliding certain records to the end of a run of the same date
(3 answers)
Closed 5 years ago.
For example, I have a table in Oracle db where the values in a column are:
A
B
C
D
I would like to fetch the following output from the table (here B, C, D are in alphabetical order and A is put last):
B
C
D
A
Note: the table has a unique key on the column, if that helps.
Have a case expression in the ORDER BY to put A rows at the end:
order by case when columnname = 'A' then 1 else 0 end, columnname
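Put together as a full query, a minimal sketch assuming the table is called tablename:
SELECT columnname
FROM tablename
ORDER BY CASE WHEN columnname = 'A' THEN 1 ELSE 0 END,
         columnname;
The CASE expression assigns 1 to the 'A' rows and 0 to everything else, so the non-'A' rows sort first in alphabetical order and the 'A' rows go to the end.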