Pandas/SQL join

I would like to add some data (event_date) from table B to table A, as described below. It looks like a join on event_id, but this column contains duplicate values in both tables. There are more columns in both tables, which I'm omitting for clarity.
How can I achieve the desired effect in Pandas and in SQL in the most direct way?
Table A:
id,event_id
1,123
2,123
3,456
4,456
5,456
Table B:
id,event_id,event_date
11,123,2017-02-06
12,456,2017-02-07
13,123,2017-02-06
14,456,2017-02-07
15,123,2017-02-06
16,123,2017-02-06
Desired outcome (table A + event_date):
id,event_id,event_date
1,123,2017-02-06
2,123,2017-02-06
3,456,2017-02-07
4,456,2017-02-07
5,456,2017-02-07

Use merge, but first drop the duplicate (event_id, event_date) pairs from B:
In [662]: A.merge(B[['event_id', 'event_date']].drop_duplicates())
Out[662]:
id event_id event_date
0 1 123 2017-02-06
1 2 123 2017-02-06
2 3 456 2017-02-07
3 4 456 2017-02-07
4 5 456 2017-02-07

SQL part:
select distinct a.*, b.event_date
from table_a a
join table_b b
on a.event_id = b.event_id;
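
To check the SQL above against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The table names table_a and table_b come from the query; the use of SQLite and the added ORDER BY are assumptions made only so the output is reproducible.

```python
import sqlite3

# In-memory database holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, event_id INTEGER);
CREATE TABLE table_b (id INTEGER, event_id INTEGER, event_date TEXT);
INSERT INTO table_a VALUES (1,123),(2,123),(3,456),(4,456),(5,456);
INSERT INTO table_b VALUES
  (11,123,'2017-02-06'),(12,456,'2017-02-07'),(13,123,'2017-02-06'),
  (14,456,'2017-02-07'),(15,123,'2017-02-06'),(16,123,'2017-02-06');
""")

# The join produces one row per matching (a, b) pair; DISTINCT collapses
# the repeats because b.id is not part of the select list.
joined = conn.execute("""
SELECT DISTINCT a.*, b.event_date
FROM table_a a
JOIN table_b b ON a.event_id = b.event_id
ORDER BY a.id
""").fetchall()

for row in joined:
    print(row)
```

DISTINCT only removes the repeats here because every row in B with the same event_id carries the same event_date; if the dates differed, each row of A would come back once per distinct date.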

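You can use pandas merge and then select only the columns you are interested in. Here df1 is table B and df2 is table A, so A's id column comes out as id_y. Because event_id repeats in both tables, the merge returns one row per matching pair, so the output below contains repeated rows; append .drop_duplicates() to the selection to get the desired five rows.
df_Final = pd.merge(df1, df2, on='event_id', how='left')
print(df_Final[['id_y', 'event_id', 'event_date']])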
output
id_y event_id event_date
0 1 123 2017-02-06
1 2 123 2017-02-06
2 3 456 2017-02-07
3 4 456 2017-02-07
4 5 456 2017-02-07
5 1 123 2017-02-06
6 2 123 2017-02-06
7 3 456 2017-02-07
8 4 456 2017-02-07
9 5 456 2017-02-07
10 1 123 2017-02-06
11 2 123 2017-02-06
12 1 123 2017-02-06
13 2 123 2017-02-06

Related

looking for values from another table where they do not exist in a given group

I have two tables:
SHOPPING
date id_customer id_shop id_fruit
28.03.2018 7423 123 1
13.02.2019 8408 354 1
28.03.2019 7767 123 9
13.02.2020 8543 472 7
28.03.2020 8640 346 9
13.02.2021 7375 323 9
28.03.2021 7474 323 8
13.02.2022 7476 499 1
28.03.2022 7299 123 4
13.02.2023 8879 281 2
28.03.2023 8353 452 1
13.02.2024 8608 499 6
28.03.2024 8867 318 1
13.02.2025 7997 499 6
28.03.2025 7715 499 4
13.02.2026 7673 441 7
FRUITS
id_fruit name
1 apple
2 pear
3 grape
4 banana
5 plum
6 melon
7 watermelon
8 orange
9 pineapple
I would like to find the fruits that have never been bought in a specific id_shop.
I tried this:
SELECT
s.idshop,
s.id_fruit ,
f.name
FROM
shopping s
LEFT JOIN fruit f ON f.id_fruit = s.id_fruit
WHERE NOT EXISTS (
SELECT *
FROM
fruit f1
WHERE f1.id_fruit = s.id_fruit
)
but it does not work...
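Yes, you need an OUTER JOIN, but given the table order in your current query it should be a RIGHT JOIN, keeping only the rows where the shopping columns are NULL after the join. Note that the query below returns the fruits that were never bought in any shop; to restrict it to one shop, add the id_shop condition to the ON clause.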
SELECT f.*
FROM shopping s
RIGHT JOIN fruit f
ON f.id_fruit = s.id_fruit
WHERE s.id_fruit IS NULL
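
As a sketch of the per-shop variant, the example below rewrites the outer join as a LEFT JOIN starting from the fruit table (equivalent to the RIGHT JOIN, and chosen because older SQLite versions lack RIGHT JOIN) and puts the shop filter in the ON clause. The shop id 123, the helper name never_bought_in, and the use of SQLite via Python's sqlite3 module are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shopping (date TEXT, id_customer INTEGER, id_shop INTEGER, id_fruit INTEGER);
CREATE TABLE fruit (id_fruit INTEGER, name TEXT);
INSERT INTO shopping VALUES
  ('28.03.2018',7423,123,1),('13.02.2019',8408,354,1),('28.03.2019',7767,123,9),
  ('13.02.2020',8543,472,7),('28.03.2020',8640,346,9),('13.02.2021',7375,323,9),
  ('28.03.2021',7474,323,8),('13.02.2022',7476,499,1),('28.03.2022',7299,123,4),
  ('13.02.2023',8879,281,2),('28.03.2023',8353,452,1),('13.02.2024',8608,499,6),
  ('28.03.2024',8867,318,1),('13.02.2025',7997,499,6),('28.03.2025',7715,499,4),
  ('13.02.2026',7673,441,7);
INSERT INTO fruit VALUES
  (1,'apple'),(2,'pear'),(3,'grape'),(4,'banana'),(5,'plum'),
  (6,'melon'),(7,'watermelon'),(8,'orange'),(9,'pineapple');
""")

def never_bought_in(shop_id):
    """Names of fruits with no purchase row for the given shop (hypothetical helper)."""
    # The shop filter lives in ON, not WHERE, so fruits without a match
    # survive the join with NULLs on the shopping side.
    rows = conn.execute("""
        SELECT f.name
        FROM fruit f
        LEFT JOIN shopping s
          ON s.id_fruit = f.id_fruit AND s.id_shop = ?
        WHERE s.id_fruit IS NULL
        ORDER BY f.id_fruit
    """, (shop_id,)).fetchall()
    return [name for (name,) in rows]

missing_in_123 = never_bought_in(123)
print(missing_in_123)
```

Moving the id_shop condition into the WHERE clause instead would discard the NULL rows and return nothing, which is why it belongs in ON.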

Creating a new calculated column in SQL

Is there a way to count how many UDs appear on each day? June 24 appears twice, so that day has 2 UDs, while each of the remaining days has a single one.
The data and the expected output are shown here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that 4 users used the application on 1 day and 2 users used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
from t
group by date
) d
group by cnt
order by cnt desc;
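
To see what the two levels of aggregation return, here is a minimal sketch that runs the query on the sample rows in an in-memory SQLite database. The table name t comes from the query; the column names pk and ud and the use of SQLite are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (pk INTEGER, ud INTEGER, date TEXT);
INSERT INTO t VALUES
  (1,123,'2015-06-24'),(6,456,'2015-06-24'),(2,123,'2015-06-25'),
  (3,658,'2015-06-26'),(4,598,'2015-06-27'),(5,156,'2015-06-28');
""")

# Inner query: number of rows per date.
# Outer query: how many dates share each per-date count.
histogram = conn.execute("""
SELECT cnt, COUNT(*)
FROM (SELECT date, COUNT(*) AS cnt
      FROM t
      GROUP BY date) d
GROUP BY cnt
ORDER BY cnt DESC
""").fetchall()

print(histogram)
```

On this data the result has two rows: one date carries 2 rows (June 24) and four dates carry 1 row each.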

How do I change my SQL SELECT GROUP BY query to show me which records are missing a value?

I have a list of codes by area and type. I need to get the unique codes for each type, which I can do with a simple SELECT query with a GROUP BY. I now need to know which areas do not have one of the codes. So how do I run a query that groups by unique values and tells me which records do not have one of the values?
ID Area Type Code
1 10 A 123
2 10 A 456
3 10 B 789
4 10 B 987
5 10 C 654
6 10 C 321
7 20 A 123
8 20 B 789
9 20 B 987
10 20 C 654
11 20 C 321
12 30 A 137
13 30 A 456
14 30 B 579
15 30 B 789
16 30 B 987
17 30 C 654
18 30 C 321
I can run this query to group them by type and get the unique codes:
SELECT tblExample.Type, tblExample.Code
FROM tblExample
GROUP BY tblExample.Type, tblExample.Code
This gives me this:
Type Code
A 123
A 137
A 456
B 579
B 789
B 987
C 321
C 654
Now I need to know which areas do not have a given code. For example, code 123 does not appear for area 30 and code 137 does not appear for areas 10 and 20. How do I write a query that tells me which areas are missing a code? The format of the output doesn't matter, I just need to get the results. I'm thinking the results could be in one column or spread out over multiple columns:
Type Code Missing Areas or Missing1 Missing2
A 123 30 30
A 137 10, 20 10 20
A 456 20 20
B 579 10, 20 10 20
B 789
B 987
C 321
C 654
You can get a list of the missing code/area pairs by first generating all combinations of the distinct type/code pairs with the distinct areas, and then filtering out the ones that exist:
select tc.type, tc.code, a.area
from (select distinct type, code from tblExample) tc cross join
(select distinct area from tblExample) a left join
tblExample e
on e.type = tc.type and e.code = tc.code and e.area = a.area
where e.area is null;
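
The "generate all combinations, then filter out the existing ones" idea can be sketched against the sample data as follows. This version pairs every distinct (type, code) with every distinct area and keeps the pairs that have no matching row; the aliases tc and a, the ORDER BY, and the use of SQLite through Python's sqlite3 module are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblExample (id INTEGER, area INTEGER, type TEXT, code INTEGER);
INSERT INTO tblExample VALUES
  (1,10,'A',123),(2,10,'A',456),(3,10,'B',789),(4,10,'B',987),
  (5,10,'C',654),(6,10,'C',321),(7,20,'A',123),(8,20,'B',789),
  (9,20,'B',987),(10,20,'C',654),(11,20,'C',321),(12,30,'A',137),
  (13,30,'A',456),(14,30,'B',579),(15,30,'B',789),(16,30,'B',987),
  (17,30,'C',654),(18,30,'C',321);
""")

# Every (type, code) x area combination, left-joined back to the data;
# combinations with no matching row are the missing ones.
missing = conn.execute("""
SELECT tc.type, tc.code, a.area
FROM (SELECT DISTINCT type, code FROM tblExample) tc
CROSS JOIN (SELECT DISTINCT area FROM tblExample) a
LEFT JOIN tblExample e
  ON e.type = tc.type AND e.code = tc.code AND e.area = a.area
WHERE e.area IS NULL
ORDER BY tc.type, tc.code, a.area
""").fetchall()

for row in missing:
    print(row)
```

Each output row is one (type, code, missing area) triple, which corresponds to the single-column "Missing Areas" layout in the question with one area per row.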

Sqlite query for fetching latest Exam date time with distinct patientID

In Sqlite db I have a table: Examination with columns ExamID, InternalPID, ExamDateTime
ExamID InternalPID ExamDateTime (from left to right)
1 2 2015-03-11
2 1 2015-11-11
3 4 2015-05-01
4 6 2015-08-10
5 2 2015-04-22
6 1 2014-12-11
7 2 2015-03-12
The query output should be the latest examination date of each patient, i.e. each InternalPID should appear once, with its latest ExamDateTime.
Expect output from query:
ExamID InternalPID ExamDateTime
5 2 2015-04-22
2 1 2015-11-11
3 4 2015-05-01
4 6 2015-08-10
Thank you in advance
You can do this using a join and aggregation or a clever where clause:
select e.*
from examination e
where e.ExamDateTime = (select max(e2.ExamDateTime)
from examination e2
where e2.InternalPID = e.InternalPID
);
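
Here is a minimal sketch of the correlated-subquery approach run on the sample rows with Python's sqlite3 module. It uses the InternalPID column from the question's table definition; the ORDER BY is an assumption added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Examination (ExamID INTEGER, InternalPID INTEGER, ExamDateTime TEXT);
INSERT INTO Examination VALUES
  (1,2,'2015-03-11'),(2,1,'2015-11-11'),(3,4,'2015-05-01'),(4,6,'2015-08-10'),
  (5,2,'2015-04-22'),(6,1,'2014-12-11'),(7,2,'2015-03-12');
""")

# Keep a row only if its date equals the latest date for the same patient.
# ISO-8601 date strings compare correctly as text, so MAX() works on them.
latest = conn.execute("""
SELECT e.*
FROM Examination e
WHERE e.ExamDateTime = (SELECT MAX(e2.ExamDateTime)
                        FROM Examination e2
                        WHERE e2.InternalPID = e.InternalPID)
ORDER BY e.ExamID
""").fetchall()

for row in latest:
    print(row)
```

If a patient had two exams with the same latest ExamDateTime, both rows would be returned.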

SQL Syntax to group this data in SQL Server 2012

I have a table (called StayDate) which looks like this:
ResaID Date RoomCategory RateAmount
1 2014-09-01 A 125
1 2014-09-02 A 125
1 2014-09-03 B 140
2 2014-09-04 A 125
2 2014-09-05 A 125
2 2014-09-06 A 125
2 2014-09-07 C 160
2 2014-09-08 C 160
The output from the SQL syntax I'm after needs to look like this:
ResaID Count RoomCategory RateAmount
1 2 A 125
1 1 B 140
2 3 A 125
2 2 C 160
Can anyone help with the SQL syntax needed to summarize the data as above?
A way to do this without a GROUP BY:
SELECT DISTINCT
ResaID,
COUNT(*) OVER (PARTITION BY ResaID, RoomCategory, RateAmount) Count,
RoomCategory,
RateAmount
FROM StayDate
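
For comparison, the same summary can be sketched with a plain GROUP BY on the three columns used in the PARTITION BY above. This is not the answer's query but the conventional equivalent, run here in SQLite through Python's sqlite3 module as an assumption; the ORDER BY is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StayDate (ResaID INTEGER, Date TEXT, RoomCategory TEXT, RateAmount INTEGER);
INSERT INTO StayDate VALUES
  (1,'2014-09-01','A',125),(1,'2014-09-02','A',125),(1,'2014-09-03','B',140),
  (2,'2014-09-04','A',125),(2,'2014-09-05','A',125),(2,'2014-09-06','A',125),
  (2,'2014-09-07','C',160),(2,'2014-09-08','C',160);
""")

# One output row per (ResaID, RoomCategory, RateAmount) group,
# with the number of stay dates in that group.
summary = conn.execute("""
SELECT ResaID, COUNT(*) AS "Count", RoomCategory, RateAmount
FROM StayDate
GROUP BY ResaID, RoomCategory, RateAmount
ORDER BY ResaID, RoomCategory
""").fetchall()

for row in summary:
    print(row)
```

Both forms count rows per (ResaID, RoomCategory, RateAmount); the window-function version computes the count on every row and relies on DISTINCT to collapse them, while GROUP BY collapses the rows directly.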