Left Join Not Joining with a Single Record - sql

I have the following query:
Insert into cet_database.dbo.termData
(
termID,
studentID,
course,
[current],
program,
StbyCurrentClassID,
class,
classCode,
cancelled
)
Select
fm_stg.classByStudent_termData_assessmentData.termID,
fm_stg.classByStudent_termData_assessmentData.studentID,
fm_stg.classByStudent_termData_assessmentData.class_code,
case when fm_stg.classByStudent_termData_assessmentData.[current] = 'Yes' then 1 else 0 end,
fm_stg.classByStudent_termData_assessmentData.program,
fm_stg.classByStudent_termData_assessmentData.classByStudentID,
fm_stg.classByStudent_termData_assessmentData.class,
fm_stg.classByStudent_termData_assessmentData.classID,
case when fm_stg.classByStudent_termData_assessmentData.cancelled_flag = 1 then 1 else 0 end
From fm_stg.classByStudent_termData_assessmentData left outer join termData
On fm_stg.classByStudent_termData_assessmentData.class_code = termData.course
and fm_stg.classByStudent_termData_assessmentData.termID = termData.termID
and fm_stg.classByStudent_termData_assessmentData.studentID = fm_stg.classByStudent_termData_assessmentData.studentID
Where termData.StbyCurrentClassID is null
I use the query to import data into a staging table from another database (fm_stg.classByStudent_termData_assessmentData) before importing it into my database's tables. This particular query is part of a larger stored procedure that imports data into multiple tables related to termData.
When I run the sproc, I get the record inserted into fm_stg.classByStudent_termData_assessmentData but not into termData. I am only inserting one record when having this problem, but it works for the 10,000 records I did previously. I use the left join to establish what already exists in my database's table and what doesn't, then take the relevant records from the staging table. However, with this record:
316a, 39520, DEC 10, Yes, DEC10, 105713, DEC 10 (18), 6078, NULL, 2
The select returns nothing - why is this? The record definitely doesn't exist in my termData table and records insert into all my other tables from the staging table. The sproc is running all of the inserts in a transaction so as to avoid precisely this scenario where records are inserted in some tables and not others, but it doesn't seem to be working.

You say the query worked for the previous 10,000 records, but doesn't for the current one. The only thing that looks strange in your query is the third line in your ON clause where you compare a field (the studentID) with itself.
On fm_stg.classByStudent_termData_assessmentData.class_code = termData.course
and fm_stg.classByStudent_termData_assessmentData.termID = termData.termID
and fm_stg.classByStudent_termData_assessmentData.studentID = fm_stg.classByStudent_termData_assessmentData.studentID
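I am just guessing here, but as this line is in the ON clause, you probably wanted to compare the student ID between the two tables. As written, the condition is true for every row with a non-null student ID, so the join matches any termData row with the same course and term, whichever student it belongs to. As soon as another student already has that course and term in termData, the staging row finds a match, StbyCurrentClassID is not null, and the WHERE clause filters the row out. So it may be you were just lucky the query worked so far. I suppose the ON clause should look like this: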
On fm_stg.classByStudent_termData_assessmentData.class_code = termData.course
and fm_stg.classByStudent_termData_assessmentData.termID = termData.termID
and fm_stg.classByStudent_termData_assessmentData.studentID = termData.studentID
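To see the effect of the self-comparison, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table `staging`, the reduced column set, and the second student ID 11111 are assumptions made up for the illustration; only the join logic mirrors the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging  (termID TEXT, studentID INTEGER, class_code TEXT);
    CREATE TABLE termData (termID TEXT, studentID INTEGER, course TEXT);
    -- Hypothetical: a different student already has this term and course.
    INSERT INTO termData VALUES ('316a', 11111, 'DEC 10');
    -- The new staging row that should be imported.
    INSERT INTO staging VALUES ('316a', 39520, 'DEC 10');
""")

# Anti-join: keep staging rows that find no partner in termData.
ANTI_JOIN = """
    SELECT s.termID, s.studentID, s.class_code
    FROM staging s
    LEFT JOIN termData td
      ON s.class_code = td.course
     AND s.termID = td.termID
     AND s.studentID = {rhs}
    WHERE td.studentID IS NULL
"""

# Buggy version: studentID compared with itself, so the other student's row matches.
buggy_rows = conn.execute(ANTI_JOIN.format(rhs="s.studentID")).fetchall()
# Fixed version: studentID compared across the two tables.
fixed_rows = conn.execute(ANTI_JOIN.format(rhs="td.studentID")).fetchall()

print(buggy_rows)  # []
print(fixed_rows)  # [('316a', 39520, 'DEC 10')]
```

With the self-comparison the select returns nothing, which matches the symptom described in the question.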
By the way, queries get more readable by using table aliases. In the following query I use ad for fm_stg.classByStudent_termData_assessmentData and td for termData:
Insert into cet_database.dbo.termData
(
termID,
studentID,
course,
[current],
program,
StbyCurrentClassID,
class,
classCode,
cancelled
)
Select
ad.termID,
ad.studentID,
ad.class_code,
case when ad.[current] = 'Yes' then 1 else 0 end,
ad.program,
ad.classByStudentID,
ad.class,
ad.classID,
case when ad.cancelled_flag = 1 then 1 else 0 end
From fm_stg.classByStudent_termData_assessmentData ad
Left Outer Join termData td On ad.class_code = td.course
And ad.termID = td.termID
And ad.studentID = td.studentID
Where td.StbyCurrentClassID is null;
Moreover, when checking for existence, why do you use the anti-join trick? Did you have issues with a straightforward NOT EXISTS? Use tricks only when really needed. The query reads better as follows:
Insert into cet_database.dbo.termData
(
termID,
studentID,
course,
[current],
program,
StbyCurrentClassID,
class,
classCode,
cancelled
)
Select
termID,
studentID,
class_code,
case when [current] = 'Yes' then 1 else 0 end,
program,
classByStudentID,
class,
classID,
case when cancelled_flag = 1 then 1 else 0 end
From fm_stg.classByStudent_termData_assessmentData ad
Where Not Exists
(
Select *
From termData td
Where ad.class_code = td.course
And ad.termID = td.termID
And ad.studentID = td.studentID
);
With another DBMS you could even have used NOT IN (i.e. Where (class_code, termID, studentID) Not In (Select ...)), which is not correlated, so a typo such as yours could not even have occurred. Unfortunately, SQL Server doesn't support tuples in the IN clause.

Related

Validate my interpretation of an SQL query

My question is definitely going to be a little different, so I hope I'm still adhering to the Stack Overflow question etiquette. With that in mind, I'll get straight to the point.
Essentially, since I am still learning SQL I was looking at examples of scheduled queries in GCP and came across something and I wanted to see if I understand what's going on. So I took the query and wrote some comments explaining what I think the lines in the query are doing. The context in the code itself is irrelevant, I'm more curious if I'm correctly understanding what each of the clauses is doing.
Would anyone be able to tell me if I am interpreting it correctly or if I misunderstood some stuff, based on my comments? The code and comments are below. Note that the comments come first and the queries I'm commenting on follow directly after.
-- Create temporary table with the subquery below via the WITH () clause
-- Table contains session date, which webpage, total sessions, total sessions with a logout, and total clicks
-- The data in this temporary table is coming from the `gcp-project-223467.web.top_level` table in BigQuery
-- The columns correspond to dates 01/01/2022 & onwards, and exclude the 'Home' and 'Team' pages
-- The resulting data in the temp table is grouped by date & page type (first and second columns of the resulting temp table)
WITH logins AS (
SELECT
session_date as date,
website_page as page,
SUM(sessions) AS sessions,
SUM(sessions_with_logout) AS logouts,
SUM(clicks) AS clicks
FROM `gcp-project-223467.web.top_level`
WHERE DATE_session >= "2022-01-01"
AND website_page NOT IN ('Home','Team')
AND clicks > 0
GROUP BY 1, 2
)
-- Select the data from the above subquery (via SELECT logins.*)
-- Left join another temp table with data coming from `ingka-web-analytics-prod.web_data.transactions` in BigQuery
-- Left join is being done according to the logins & login_days date_hit AND logins & login_days ´logins_web´ columns.
-- The specific data taken from the aforementioned BQ table is aggregated and filtered via CASE WHEN - THEN statements
-- Further conditions are specified via the WHERE statements
-- The resulting temporary table in the subquery under LEFT JOIN is named login_days.
-- The columns in the select statement before the left join (web logins, mobile logins etc)
-- are from the temporary table in the select statement under the left join statement
SELECT
logins.*,
logins_web,
mobile_logins,
logins_ios,
logins_android,
logins_final
FROM logins
LEFT JOIN (
SELECT
date_hit as date,
website_page as page,
SUM(CASE WHEN login_type = 'web' THEN SAFE_CAST(count_logins_final AS INT64) END ) AS logins_web,
COUNT(DISTINCT CASE WHEN login_type = 'mobile' THEN login_id END ) AS mobile_logins,
SUM(CASE WHEN login_type = 'ipad' THEN SAFE_CAST(count_logins_final AS INT64) END ) AS logins_ios,
COUNT(DISTINCT CASE WHEN login_type = 'android' THEN login_id END ) AS logins_android,
COUNT(DISTINCT login_id) AS logins_final,
FROM `gcp-project-223467.web.login_data`
WHERE date_hit >= "2022-01-01" AND website_page NOT IN ('Home','Team')
AND count_logins_final != 'NaN'
AND count_logins_final NOT LIKE '%,%'
AND count_logins_final > '0'
AND website_platform != 'ibes'
AND login_type = 'Successful'
GROUP BY 1, 2
)login_days
ON logins.date = login_days.date AND logins.page = login_days.page
WHERE sessions_with_logout > 0

SQL SSMS IF THEN with multiple criteria across same field

I need to pull a transaction record from a table if it is type 'C' and has a record post time greater than or equal to the post time for a record with type 'W' where the account numbers and post date are the same. I am struggling with creating an if/then where the posttime for type 'C' >= posttime for type 'W'... any help would be appreciated. I've done these types before but never for the same field where only one record item is different.
This would be the typical method using exists:
select * from transactions t
where t.actioncode = 'C' and exists (
select 1 from transactions t2
where t2.account_num = t.account_num and t2.postdate = t.postdate
and t2.actioncode = 'W'
and t2.posttime <= t.posttime
)
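The EXISTS approach can be tried on toy data. The sketch below uses Python's sqlite3 module instead of SQL Server; the `id` column and the sample rows are invented for the illustration, and post times are stored as 'HH:MM' text so that string comparison orders them correctly. The outer table is aliased `t` throughout, and `<=` is used to match "greater than or equal to" in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, account_num INTEGER, postdate TEXT,
        posttime TEXT, actioncode TEXT);
    INSERT INTO transactions VALUES
        (1, 100, '2022-01-01', '09:00', 'W'),
        (2, 100, '2022-01-01', '10:00', 'C'),  -- posted after the W: qualifies
        (3, 100, '2022-01-01', '08:00', 'C'),  -- posted before the W: excluded
        (4, 200, '2022-01-01', '10:00', 'C');  -- no W on this account: excluded
""")

rows = conn.execute("""
    SELECT t.id
    FROM transactions t
    WHERE t.actioncode = 'C'
      AND EXISTS (
          SELECT 1
          FROM transactions t2
          WHERE t2.account_num = t.account_num
            AND t2.postdate = t.postdate
            AND t2.actioncode = 'W'
            AND t2.posttime <= t.posttime)
    ORDER BY t.id
""").fetchall()

matching_ids = [row[0] for row in rows]
print(matching_ids)  # [2]
```

Because EXISTS only asks whether at least one matching 'W' row is present, each 'C' row is returned at most once, even if several 'W' rows qualify.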
If I understand you correctly, what you describe can be accomplished through JOINS.
Think relational data sets and SARGs.
While you still have not given us a table structure (which helps enormously), the solution can help steer you in the right direction. The following assumes a FACT table of TRANSACTIONS, where the cardinality to itself is M:M
SELECT TOP 1000 A.ACTIONCODE, A.TRAN_RECORD --, any other needed columns
FROM TRANSACTIONS A
INNER JOIN (SELECT ACTIONCODE, POSTTIME, ACCOUNT_NUM, POSTDATE
FROM TRANSACTIONS
WHERE ACTIONCODE = 'W') B ON A.ACCOUNT_NUM = B.ACCOUNT_NUM
AND A.POSTDATE = B.POSTDATE
WHERE A.ACTIONCODE = 'C'
AND A.POSTTIME >= B.POSTTIME
UPDATED: I accidentally forgot to include the correct number of columns. Always specify the same columns (or * if you do not care) that you will be using in your INNER JOIN.
Regardless, we optimize the query by only returning results that we will be using or seeing in our query.
This is what I had originally, but it just churned in SSMS without results. Essentially, I just need all 'C' type records returned where there is a 'W' type record with a posttime less than the 'C', but where the account numbers and postdate for the record are the same. posttime, postdate, type, and number are all fields in my table.
SELECT *
FROM TRANSACTIONS
WHERE ACTIONCODE = 'C' AND POSTTIME >= POSTTIME AND ACTIONCODE = 'W'

teradata case when issue

I have the following queries, which are supposed to give the same result but give drastically different ones:
1.
select count(*)
from qigq_sess_parse_2
where str_vendor = 'natural search' and str_category is null and destntn_url = 'http://XXXX.com';
create table qigq_test1 as
(
select case
when (str_vendor = 'natural search' and str_category is null and destntn_url = 'http://XXXX.com' ) then 1
else 0
end as m
from qigq_sess_parse_2
) with data;
select count(*) from qigq_test1 where m = 1;
The first block gives a count of 132868, while the second one gives only 1.
What are the subtle parts in the query that causes this difference?
Thanks
When you create a table in Teradata, you can specify it to be SET or MULTISET. If you don't specify, it defaults to SET. A set table cannot contain duplicates. So at most, your new table will contain two rows, a 0 and a 1, since that's all that can come from your case statement.
EDIT:
After a bit more digging, the defaults aren't quite that simple. But in any case, I suspect that if you add the MULTISET option to your create statement, you'll see the behavior you expect.
My guess would be that your Create Table statement is only pulling in one row of data that fits the parameters for the following Count statement. Try this instead:
CREATE TABLE qigq_test1 (m integer);
INSERT INTO qigq_test1
SELECT
CASE
WHEN (str_vendor = 'natural search' and str_category IS NULL AND destntn_url = 'http://XXXX.com' ) THEN 1
ELSE 0
END AS m
FROM qigq_sess_parse_2;
SELECT COUNT(*) FROM qigq_test1 WHERE m = 1;
This should pull ALL ROWS of data from qigq_sess_parse_2 into qigq_test1 as either a 0 or 1.

SELECT-CASE-IN-SELECT error: [SQL0115] Comparison operator IN not valid. In query db2

I have a problem with a DB2 query.
I tried to run this query:
SELECT t.* ,
CASE WHEN column in (SELECT data FROM otherTable WHERE conditions...)
then 5
else 0
end as 'My new data'
FROM table t
WHERE conditions....
But get error
[Error Code: -115, SQL State: 42601] [SQL0115] Comparison operator IN not valid.
When i change the sub-query to where statement like this
SELECT t.*
FROM table t
WHERE column in (SELECT data FROM otherTable WHERE conditions...)
Works fine
Why does it not work in the CASE expression? Is it a limitation of DB2?
And how could I get equivalent behavior?
One way to do this is to left join to the table and check if it is not null.
In most cases this will be the fastest way because SQL servers are optimized to perform joins very quickly (but will depend on a number of factors including data model, indexes, data size, etc).
Like this:
SELECT t.* ,
CASE WHEN othertable.data is not null
then 5
else 0
end as 'My new data'
FROM table t
left join otherTable ON otherTable.data = t.column
WHERE conditions....
Try using an EXISTS condition as below (put the column value in the WHERE clause of the subquery):
SELECT t.* ,
CASE WHEN exists (SELECT data FROM otherTable WHERE conditions... and column=val)
then 5
else 0
end as 'My new data'
FROM table t
WHERE conditions....
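As a rough illustration of the EXISTS form, here is a sketch run against SQLite through Python's sqlite3 module. SQLite does not reproduce the SQL0115 error, so this only shows the shape of the rewrite; the table names `main_table` and `other_table`, the column `code`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table  (id INTEGER, code TEXT);
    CREATE TABLE other_table (data TEXT);
    INSERT INTO main_table  VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO other_table VALUES ('A'), ('C'), ('C');  -- 'C' appears twice
""")

# Correlated EXISTS inside CASE: one output row per main_table row.
flags = conn.execute("""
    SELECT t.id,
           CASE WHEN EXISTS (SELECT 1 FROM other_table o WHERE o.data = t.code)
                THEN 5 ELSE 0 END AS my_new_data
    FROM main_table t
    ORDER BY t.id
""").fetchall()
print(flags)  # [(1, 5), (2, 0), (3, 5)]

# For comparison: a plain LEFT JOIN repeats id 3 because 'C' matches twice.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM main_table t
    LEFT JOIN other_table o ON o.data = t.code
""").fetchone()[0]
print(join_count)  # 4
```

The row count is the practical difference between the two suggestions: the join variant needs the joined column to be unique (or a DISTINCT) to keep one row per record, while EXISTS does not.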

Writing a single UPDATE statement that prevents duplicates

I've been trying for a few hours (probably more than I needed to) to figure out the best way to write an update SQL query that will disallow duplicates on the column I am updating.
Meaning, if TableA.ColA already has a name 'TEST1', then when I'm changing another record, then I simply can't pick a value for ColA to be 'TEST1'.
It's pretty easy to simply just separate the query into a select, and use a server layer code that would allow conditional logic:
SELECT ID, NAME FROM TABLEA WHERE NAME = 'TEST1'
IF TableA.recordcount = 0 then
UPDATE TABLEA SET NAME = 'TEST1' WHERE ID = 1234
END IF
But I'm more interested to see if these two queries can be combined into a single query.
I am using Oracle to figure things out, but I'd love to see a SQL Server query as well. I figured a MERGE statement can work, but for obvious reasons you can't have the clause:
..etc.. WHEN NOT MATCHED UPDATE SET ..etc.. WHERE ID = 1234
AND you can't update a column if it's mentioned in the join (oracle limitation but not limited to SQL Server)
ALSO, I know you can put a constraint on a column that prevents duplicate values, but I'd be interested to see if there is such a query that can do this without using constraint.
Here is an example of an initial attempt on my end, just to see what I can come up with (explanations of why it failed are not necessary):
ERROR: ORA-01732: data manipulation operation not legal on this view
UPDATE (
SELECT d.NAME, ch.NAME FROM (
SELECT 'test1' AS NAME, '2722' AS ID
FROM DUAL
) d
LEFT JOIN TABLEA a
ON UPPER(a.name) = UPPER(d.name)
)
SET a.name = 'test2'
WHERE a.name is null and a.id = d.id
I have tried merge, but just gave up thinking it's not possible. I've also considered not exists (but I'd have to be careful since I might accidentally update every other record that doesn't match a criteria)
It should be straightforward:
update personnel
set personnel_number = 'xyz'
where person_id = 1001
and not exists (select * from personnel where personnel_number = 'xyz');
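A minimal sketch of this guarded update, run against SQLite via Python's sqlite3 module rather than Oracle or SQL Server. The helper `rename` and the sample rows are made up for the illustration; the UPDATE itself follows the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (person_id INTEGER PRIMARY KEY, personnel_number TEXT);
    INSERT INTO personnel VALUES (1001, 'abc'), (1002, 'xyz');
""")

def rename(person_id, new_number):
    """Hypothetical helper: update only if no row already holds new_number.

    Returns the number of rows actually changed (0 or 1).
    """
    cur = conn.execute("""
        UPDATE personnel
        SET personnel_number = :num
        WHERE person_id = :pid
          AND NOT EXISTS (SELECT 1 FROM personnel WHERE personnel_number = :num)
    """, {"pid": person_id, "num": new_number})
    return cur.rowcount

blocked = rename(1001, "xyz")  # 'xyz' is taken by 1002, so nothing changes
applied = rename(1001, "new")  # 'new' is free, so the row is updated
print(blocked, applied)  # 0 1
```

The affected-row count tells the caller whether the rename went through. Under concurrent sessions two such updates could still both pass the check, which is the case a unique constraint guards against.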
If I understand correctly, you want to conditionally update a field, assuming the value is not found. The following query does this. It should work in both SQL Server and Oracle:
update table1
set name = 'Test1'
where (select count(*) from table1 where name = 'Test1') = 0 and
id = 1234