SQL: compare two tables and show only non-matching data

I know this has been asked quite a few times, but I didn't find an answer to my problem. I am trying to compare two tables and see which messages have been read by customers (cust_id).
The messages in message_details are split into 4 categories (all, none, single and trade).
It seems to work fine except when the condition in message_details is set to all: those rows are always displayed, even if the customer has read the message.
I hope someone can help.
I have 2 tables:
message_details
id  date        subject        message                       condition
1   2022-01-18  Testing        This is a test to all people  all
2   2022-01-19  To all single  This is a single test         single
3   2022-01-20  To all none    This is a None test           none
4   2022-01-21  To all trade   This is a Trade test          trade
5   2022-01-19  To all single  This is a single test 2       single
6   2022-01-19  To all single  This is a single test 3       single
message_read
id  date        cust_id  message_id  condition
1   2022-01-18  283      1           read
2   2022-01-21  283      2           read
3   2022-01-18  283      5           read
4   2022-01-21  283      6           read
5   2022-01-18  211      1           read
6   2022-01-21  211      2           read
7   2022-01-18  211      5           read
8   2022-01-21  211      6           read
9   2022-01-18  213      1           read
10  2022-01-21  213      2           read
11  2022-01-18  213      5           read
12  2022-01-21  213      6           read
I am using the following query to find which rows in message_details a given cust_id has not yet read:
SELECT
    id, date, subject, message, condition
FROM
    message_details
WHERE
    NOT EXISTS
    (
        SELECT message_id, cust_id
        FROM message_read
        WHERE message_read.cust_id = '283'
          AND message_read.message_id = message_details.id
    )
    AND message_details.condition = 'single'
    OR message_details.condition = 'all'
This seems to work (though I'm not sure it's the correct way of doing it), but it doesn't hide id 1 from message_details even though cust_id 283 has read it.
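The likely culprit is operator precedence: AND binds tighter than OR, so the query means "(unread AND single) OR all", and every 'all' row slips past the NOT EXISTS check. A minimal sqlite3 sketch of the bug and the fix, using cut-down tables with only the columns that matter (the condition column is renamed condition_ here to sidestep keyword clashes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE message_details (id INTEGER, condition_ TEXT)")
cur.execute("CREATE TABLE message_read (message_id INTEGER, cust_id TEXT)")
cur.executemany("INSERT INTO message_details VALUES (?, ?)",
                [(1, "all"), (2, "single"), (3, "none"),
                 (4, "trade"), (5, "single"), (6, "single")])
# cust_id 283 has read messages 1, 2, 5 and 6
cur.executemany("INSERT INTO message_read VALUES (?, ?)",
                [(1, "283"), (2, "283"), (5, "283"), (6, "283")])

# Original shape: parsed as (unread AND single) OR all,
# so the 'all' row shows even though 283 has read it.
buggy = cur.execute("""
    SELECT id FROM message_details
    WHERE NOT EXISTS (SELECT 1 FROM message_read
                      WHERE cust_id = '283'
                        AND message_id = message_details.id)
      AND condition_ = 'single'
       OR condition_ = 'all'
""").fetchall()

# Parenthesising the OR restores the intended logic:
# unread AND (single OR all).
fixed = cur.execute("""
    SELECT id FROM message_details
    WHERE NOT EXISTS (SELECT 1 FROM message_read
                      WHERE cust_id = '283'
                        AND message_id = message_details.id)
      AND (condition_ = 'single' OR condition_ = 'all')
""").fetchall()

print([r[0] for r in buggy])  # [1] - id 1 still shows up
print([r[0] for r in fixed])  # []  - id 1 is correctly hidden
```

With the parentheses in place, nothing is returned for cust_id 283 because they have read every 'single' and 'all' message in the sample.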


Count the number of records by month over the last five years where record date > selected month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month end date. The SQL below is the query to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#;
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query), are there other methods I can use that call for less manual input?
From this sample:
Id  CERT_EXP_DTE
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth  AllInspectors  ValidInspectors
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
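The grouped query above can be cross-checked in pandas (not Access SQL, but the same logic; DateSerial(Year(d), Month(d), 2) corresponds to the 2nd of the expiry month):

```python
import pandas as pd

dates = ["2022-01-15", "2022-01-23", "2022-02-01", "2022-02-03",
         "2022-05-01", "2022-06-06", "2022-06-07", "2022-07-21",
         "2022-02-20", "2021-11-05", "2021-12-01", "2021-12-24"]
df = pd.DataFrame({"CERT_EXP_DTE": pd.to_datetime(dates)})

# Valid when the expiry date is on or after the 2nd of its own month,
# mirroring DateSerial(Year(d), Month(d), 2) in the Access query.
df["valid"] = df["CERT_EXP_DTE"].dt.day >= 2

out = (df.groupby(df["CERT_EXP_DTE"].dt.strftime("%Y-%m"))
         .agg(AllInspectors=("valid", "size"),
              ValidInspectors=("valid", "sum"))
         .rename_axis("YearMonth")
         .reset_index())
print(out)
```

Running this reproduces the result table shown above, e.g. 3 inspectors in 2022-02 of which 2 are valid.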
ID  Cert_Iss_Dte  Cert_Exp_Dte
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of 50 months, but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count(), referencing a textbox on a form for the start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null)) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, the following is the output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in the criteria and it did not make a difference for this sample data.
Dt1  Dt2
10   8
Or a report with 60 textboxes, each calling a DCount() expression with the same criteria as used in the query.
Or a VBA procedure that writes data to a 'temp' table.
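To avoid hand-writing 60 fields at all, the per-month counts can also be generated in a loop; here is a pandas sketch (not Access) over the expiry dates in the fuller sample, with only the February figure checked against the Dt1 value above:

```python
import pandas as pd

# Expiry dates from the Cert_Exp_Dte column of the sample table.
exp = pd.Series(pd.to_datetime([
    "2022-01-15", "2022-01-23", "2022-02-01", "2022-02-03",
    "2022-05-01", "2022-06-06", "2022-06-07", "2022-07-21",
    "2022-02-20", "2023-11-05", "2023-12-01", "2023-12-24",
]))

# One count per month start instead of one query (or field) per month.
starts = pd.date_range("2022-02-01", periods=6, freq="MS")
counts = {d.strftime("%Y-%m"): int((exp >= d).sum()) for d in starts}
print(counts["2022-02"])  # 10, matching Dt1 above
```

Extending `periods` to 60 covers the full five-year span with no extra code.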

Pandas: get the mean for each data category daily

I am a somewhat beginner programmer learning Python (+pandas) and hope I can explain this well enough. I have a large time-series pandas DataFrame of over 3 million rows and initially 12 columns, spanning a number of years. It covers people taking a ticket from different locations denoted by Id numbers (350 of them). Each row is one instance (one ticket taken).
I have searched many questions, like counting records per hour per day and getting the average per hour over several years. However, I run into trouble including the 'Id' variable.
I'm looking to get the mean number of people taking a ticket for each hour, for each day of the week (Mon-Fri) and per station.
I have the following, setting datetime to index:
Id Start_date Count Day_name_no
149 2011-12-31 21:30:00 1 5
150 2011-12-31 20:51:00 1 0
259 2011-12-31 20:48:00 1 1
3015 2011-12-31 19:38:00 1 4
28 2011-12-31 19:37:00 1 4
Using groupby and Start_date.index.hour, I can't seem to include the 'Id'.
My alternative approach is to split the hour out of the date and have the following:
Id Count Day_name_no Trip_hour
149 1 2 5
150 1 4 10
153 1 2 15
1867 1 4 11
2387 1 2 7
I then get the count first with:
Count_Item = TestFreq.groupby([TestFreq['Id'], TestFreq['Day_name_no'], TestFreq['Hour']]).count().reset_index()
Id Day_name_no Trip_hour Count
1 0 7 24
1 0 8 48
1 0 9 31
1 0 10 28
1 0 11 26
1 0 12 25
Then use groupby and mean:
Mean_Count = Count_Item.groupby(['Id', 'Day_name_no', 'Hour']).mean().reset_index()
However, this does not give the desired result, as the mean values are incorrect.
I hope I have explained this issue in a clear way. I am looking for the mean per hour, per day, per Id, as I plan to do clustering to separate my dataset into groups before applying a predictive model on those groups.
Any help would be appreciated, and if possible an explanation of what I am doing wrong, either code-wise or in my approach.
Thanks in advance.
I have edited this to try to make it a little clearer. Writing a question with a lack of sleep is probably not advisable.
A toy dataset that I start with:
Date Id Dow Hour Count
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
04/01/2015 1234 1 11 1
I now realise I would have to use the date first and get something like:
Date Id Dow Hour Count
12/12/2014 1234 0 9 5
19/12/2014 1234 0 9 3
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 4
04/01/2015 1234 1 11 1
And then calculate the mean per Id, per Dow, per hour, to get this:
Id Dow Hour Mean
1234 0 9 4
1234 0 10 1
1234 1 11 2.5
I hope this makes it a bit clearer. My real dataset spans 3 years, with 3 million rows and 350 Id numbers.
Your question is not very clear, but I hope this helps:
df.reset_index(inplace=True)
# helper columns with date, hour and dow
df['date'] = df['Start_date'].dt.date
df['hour'] = df['Start_date'].dt.hour
df['dow'] = df['Start_date'].dt.dayofweek
# sum of counts for all combinations
df = df.groupby(['Id', 'date', 'dow', 'hour']).sum()
# take the mean over all dates
df = df.reset_index().groupby(['Id', 'dow', 'hour']).mean()
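Applied to the toy dataset above, the same two-step groupby (sum per date first, then mean over dates) reproduces the desired means:

```python
import pandas as pd

# The toy dataset from the question: 14 tickets across 5 dates.
toy = pd.DataFrame({
    "Date": ["12/12/2014"] * 5 + ["19/12/2014"] * 3 + ["26/12/2014"]
            + ["27/12/2014"] * 4 + ["04/01/2015"],
    "Id":   [1234] * 14,
    "Dow":  [0] * 9 + [1] * 5,
    "Hour": [9] * 8 + [10] + [11] * 5,
    "Count": [1] * 14,
})

# Step 1: total tickets per Id per calendar date per (dow, hour).
daily = toy.groupby(["Id", "Date", "Dow", "Hour"], as_index=False)["Count"].sum()
# Step 2: average those daily totals over all dates.
mean = daily.groupby(["Id", "Dow", "Hour"], as_index=False)["Count"].mean()
print(mean)
```

This yields 4.0 for (Dow 0, Hour 9), 1.0 for (Dow 0, Hour 10) and 2.5 for (Dow 1, Hour 11), exactly the table the asker wanted.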
You can also use groupby on the 'Id' column and then resample and sum; note that the old resample(how='sum') form has been removed from modern pandas in favour of .resample(...).sum().

SQL query where I have worked out the algorithm but cannot write the code

I could not find the right keywords to describe this in the title.
I have a problem that I can only explain with an example. I have a table like this:
user_id  transaction_id  bonus_id  created_at
1        1               4         2021-05-01
1        3               65        2021-05-01
1        4               4         2021-05-02
1        1               5         2021-05-02
1        3               76        2021-05-03
1        2               5         2021-05-03
Due to a mistake I made in PHP, the row with transaction_id 3 has bonus_id 65, but it should be bonus_id 4.
I need to give every transaction, from one transaction_id = 1 row up to the next transaction_id = 1 row, the bonus_id of that first transaction_id = 1 row.
Of course I have to do this for every user. How can I do that?
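One way to sketch the carry-forward logic (assuming rows can be ordered by created_at plus insertion order, e.g. an auto-increment key) is in pandas: keep bonus_id only on the transaction_id = 1 rows and forward-fill it within each user:

```python
import pandas as pd

# The sample rows from the question, in created_at / insertion order.
df = pd.DataFrame({
    "user_id":        [1, 1, 1, 1, 1, 1],
    "transaction_id": [1, 3, 4, 1, 3, 2],
    "bonus_id":       [4, 65, 4, 5, 76, 5],
    "created_at":     ["2021-05-01", "2021-05-01", "2021-05-02",
                       "2021-05-02", "2021-05-03", "2021-05-03"],
})

# Keep bonus_id only where transaction_id == 1, then carry it forward
# within each user until the next transaction_id == 1 row.
anchor = df["bonus_id"].where(df["transaction_id"] == 1)
df["bonus_id"] = anchor.groupby(df["user_id"]).ffill().astype(int)
print(df["bonus_id"].tolist())  # [4, 4, 4, 5, 5, 5]
```

The same idea translates to SQL with a correlated subquery or a window function, but the right form depends on the database engine, which the question does not name.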

Query to pull up only those records where there is no corresponding negative value in a column

I have a table with close to 500k records. This is a place where users add a particular record, in this case a coupon. When it's added, it has a positive '1' value; when the coupon gets used, it gets a '-1' value. There is also a scenario where, if a used coupon gets returned, it is voided in my system, which changes the value back to '1'.
here's a simplified structure:
CouponID CouponVendorID CouponValue CouponQty Action Barcode
1 117 25.00 1 Add 11112
2 117 -25.00 -1 Use 11112
3 117 25.00 1 Void 11112
4 117 17.00 1 Add 33331
5 117 90.00 1 Add 44441
6 117 5.00 1 Add 42424
7 117 -5.00 -1 Use 42424
So what I'm trying to do is find all cards (CouponID will do) for which I still have a valid coupon.
I am able to get the correct COUNT of valid coupons by taking Sum(CouponQty) with this query:
select 'Avail'= sum(couponQty) from tblA where CouponVendorID = 117
However, I'm having a hard time finding the detail records showing me which CouponIDs are actually included in the COUNT of valid coupons. Any ideas?
Desired end result:
CouponID CouponVendorID CouponValue CouponQty Action Barcode
1 117 25.00 1 Add 11112
4 117 17.00 1 Add 33331
5 117 90.00 1 Add 44441
CouponID = 1 was Added, then Used, then Voided, so it is still valid. CouponID = 4 was only added and never used; same with CouponID = 5.
Here's what I came up with; it seems to do the job:
SELECT CouponVendorId, Barcode, 'CpnQty' = Sum(CouponQty)
FROM tblA
GROUP BY CouponVendorId, Barcode
HAVING CouponVendorId = 117 AND SUM(CouponQty) > 0
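A quick sqlite3 check of that grouping logic against the sample rows (CouponValue omitted for brevity, and the vendor filter moved to WHERE, which is the more conventional spot for a non-aggregate condition):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tblA (CouponID INT, CouponVendorID INT, "
            "CouponQty INT, Action TEXT, Barcode TEXT)")
cur.executemany("INSERT INTO tblA VALUES (?, ?, ?, ?, ?)", [
    (1, 117,  1, "Add",  "11112"),
    (2, 117, -1, "Use",  "11112"),
    (3, 117,  1, "Void", "11112"),
    (4, 117,  1, "Add",  "33331"),
    (5, 117,  1, "Add",  "44441"),
    (6, 117,  1, "Add",  "42424"),
    (7, 117, -1, "Use",  "42424"),
])

# Barcodes whose quantities net to a positive total are still valid.
valid = cur.execute("""
    SELECT Barcode, SUM(CouponQty) AS CpnQty
    FROM tblA
    WHERE CouponVendorID = 117
    GROUP BY CouponVendorID, Barcode
    HAVING SUM(CouponQty) > 0
""").fetchall()
print(valid)  # 42424 nets to zero and drops out
```

Barcodes 11112 (add, use, void), 33331 and 44441 each net to 1 and survive; 42424 was added and used, nets to 0, and is excluded, matching the desired end result.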

Access SQL - Select only the last sequence

I have a table with an ID and multiple informative columns. Sometimes, however, I can have multiple rows for an ID, so I added a column called "Sequence". Here is a shortened example:
ID Sequence Name Tel Date Amount
124 1 Bob 873-4356 2001-02-03 10
124 2 Bob 873-4356 2002-03-12 7
124 3 Bob 873-4351 2006-07-08 24
125 1 John 983-4568 2007-02-01 3
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
So, I would like to obtain only these lines:
124 3 Bob 873-4351 2006-07-08 24
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
Could anyone give me a hand with building a SQL query to do this?
Thanks!
You can calculate the maximum sequence using group by. Then you can use join to get only the maximum in the original data.
Assuming your table is called t:
select t.*
from t
join (select id, MAX(sequence) as maxs
      from t
      group by id) tmax
  on t.id = tmax.id
 and t.sequence = tmax.maxs
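The same join, exercised against the sample rows with sqlite3 (not Access, but the logic is identical; the Tel and Date columns are dropped for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INT, sequence INT, name TEXT, amount INT)")
cur.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (124, 1, "Bob", 10), (124, 2, "Bob", 7), (124, 3, "Bob", 24),
    (125, 1, "John", 3), (125, 2, "John", 13),
    (126, 1, "Eric", 18),
])

# Join each row to its id's maximum sequence, keeping only the last one.
rows = cur.execute("""
    SELECT t.*
    FROM t
    JOIN (SELECT id, MAX(sequence) AS maxs
          FROM t
          GROUP BY id) AS tmax
      ON t.id = tmax.id AND t.sequence = tmax.maxs
    ORDER BY t.id
""").fetchall()
print(rows)  # one row per id, at its highest sequence
```

Only the (124, 3), (125, 2) and (126, 1) rows come back, matching the desired output.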