Validate my interpretation of an SQL query - sql

My question is a little different from the usual, so I hope I'm still adhering to Stack Overflow question etiquette. With that in mind, I'll get straight to the point.
Since I am still learning SQL, I was looking at examples of scheduled queries in GCP and came across one that I wanted to check my understanding of. I took the query and wrote comments explaining what I think each part is doing. The context of the code itself is irrelevant; I'm more curious whether I correctly understand what each clause is doing.
Could anyone tell me whether I am interpreting it correctly or have misunderstood something, based on my comments? The code and comments are below. Note that each block of comments comes first, and the query it describes follows directly after.
-- Create temporary table with the subquery below via the WITH () clause
-- Table contains session date, which webpage, total sessions, total sessions with a logout, and total clicks
-- The data in this temporary table is coming from the `gcp-project-223467.web.top_level` table in BigQuery
-- The columns correspond to dates 01/01/2022 & onwards, and exclude the 'Home' and 'Team' pages
-- The resulting data in the temp table is grouped by date & page type (first and second columns of the resulting temp table)
WITH logins AS (
  SELECT
    session_date AS date,
    website_page AS page,
    SUM(sessions) AS sessions,
    SUM(sessions_with_logout) AS logouts,
    SUM(clicks) AS clicks
  FROM `gcp-project-223467.web.top_level`
  WHERE session_date >= "2022-01-01"
    AND website_page NOT IN ('Home','Team')
    AND clicks > 0
  GROUP BY 1, 2
)
-- Select the data from the above subquery (via SELECT logins.*)
-- Left join another temp table with data coming from `gcp-project-223467.web.login_data` in BigQuery
-- Left join is being done according to the logins & login_days date_hit AND logins & login_days `logins_web` columns.
-- The specific data taken from the aforementioned BQ table is aggregated and filtered via CASE WHEN - THEN statements
-- Further conditions are specified via the WHERE statements
-- The resulting temporary table in the subquery under LEFT JOIN is named login_days.
-- The columns in the select statement before the left join (web logins, mobile logins etc)
-- are from the temporary table in the select statement under the left join statement
SELECT
  logins.*,
  logins_web,
  mobile_logins,
  logins_ios,
  logins_android,
  logins_final
FROM logins
LEFT JOIN (
  SELECT
    date_hit AS date,
    website_page AS page,
    SUM(CASE WHEN login_type = 'web' THEN SAFE_CAST(count_logins_final AS INT64) END) AS logins_web,
    COUNT(DISTINCT CASE WHEN login_type = 'mobile' THEN login_id END) AS mobile_logins,
    SUM(CASE WHEN login_type = 'ipad' THEN SAFE_CAST(count_logins_final AS INT64) END) AS logins_ios,
    COUNT(DISTINCT CASE WHEN login_type = 'android' THEN login_id END) AS logins_android,
    COUNT(DISTINCT login_id) AS logins_final,
  FROM `gcp-project-223467.web.login_data`
  WHERE date_hit >= "2022-01-01" AND website_page NOT IN ('Home','Team')
    AND count_logins_final != 'NaN'
    AND count_logins_final NOT LIKE '%,%'
    AND count_logins_final > '0'
    AND website_platform != 'ibes'
    AND login_type = 'Successful'
  GROUP BY 1, 2
) login_days
ON logins.date = login_days.date AND logins.page = login_days.page
WHERE logouts > 0
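For anyone who wants to poke at the same pattern on a small scale, here is a minimal sketch using Python's standard-library sqlite3 module instead of BigQuery. The table contents are made up, most of the WHERE filters are left out, and SAFE_CAST is swapped for a plain CAST because SQLite has no SAFE_CAST, so treat it as an illustration of the CTE, the conditional aggregation and the LEFT JOIN only, not as the original scheduled query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE top_level (session_date TEXT, website_page TEXT,
                            sessions INTEGER, sessions_with_logout INTEGER);
    CREATE TABLE login_data (date_hit TEXT, website_page TEXT, login_type TEXT,
                             login_id TEXT, count_logins_final TEXT);
    -- Hypothetical sample rows, invented for this sketch.
    INSERT INTO top_level VALUES
        ('2022-01-01', 'Shop', 10, 2),
        ('2022-01-01', 'Shop', 20, 3),
        ('2022-01-02', 'Cart', 4, 1);
    INSERT INTO login_data VALUES
        ('2022-01-01', 'Shop', 'web',    'a', '3'),
        ('2022-01-01', 'Shop', 'web',    'b', '4'),
        ('2022-01-01', 'Shop', 'mobile', 'c', '1'),
        ('2022-01-01', 'Shop', 'mobile', 'c', '1');
""")

rows = conn.execute("""
    WITH logins AS (
        -- One row per date and page (the original writes this as GROUP BY 1, 2).
        SELECT session_date AS date, website_page AS page,
               SUM(sessions) AS sessions, SUM(sessions_with_logout) AS logouts
        FROM top_level
        GROUP BY session_date, website_page
    )
    SELECT logins.date, logins.page, logins.sessions, logins.logouts,
           login_days.logins_web, login_days.mobile_logins
    FROM logins
    LEFT JOIN (
        SELECT date_hit AS date, website_page AS page,
               -- CASE without ELSE yields NULL, which SUM and COUNT ignore.
               SUM(CASE WHEN login_type = 'web'
                        THEN CAST(count_logins_final AS INTEGER) END) AS logins_web,
               COUNT(DISTINCT CASE WHEN login_type = 'mobile'
                                   THEN login_id END) AS mobile_logins
        FROM login_data
        GROUP BY date_hit, website_page
    ) AS login_days
      ON logins.date = login_days.date AND logins.page = login_days.page
    ORDER BY logins.date
""").fetchall()

for row in rows:
    print(row)
```

The 'Cart' row has no matching login rows, so the LEFT JOIN keeps it and fills the login columns with NULL (None in Python).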

Related

How to view different data depending on the value for SQL

How can I make the values show up differently?
Example:
I have a table 'Feelings' with a column called 'Happy'
If I select Happy from Feelings, it brings back the values 1, 2, 3 (with user IDs to show which user has which feeling)
1 stands for yes, 2 stands for no, 3 stands for maybe
I want the result to show yes, no, maybe instead of 1, 2, 3
How would I go about doing this?
CASE HAPPY
WHEN '1' THEN 'Yes'
WHEN '2' THEN 'No'
WHEN '3' THEN 'Maybe'
ELSE 'Other'
END AS 'Happy'
While you can use a CASE as in the other answers here, you can also use a join: to a table of lookup values, to a CTE, or to a VALUES clause. Even a temp table in a stored procedure would work.
That would look like this:
SELECT user_id, COALESCE(lookup.value,'Unknown') as happy
FROM feelings
LEFT JOIN (
VALUES
('1','Yes'),
('2','No'),
('3','Maybe')
) AS lookup(k,value) ON feelings.happy = lookup.k
Using a join and storing the related information in a table is a more idiomatic SQL approach and offers many benefits, including easier maintenance and often faster execution.
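As a rough way to try the lookup-join idea without a database server, here is a small sketch using Python's standard-library sqlite3 module. The sample rows are invented, and because SQLite does not accept the `AS lookup(k,value)` column-alias form on a derived table, the lookup values are declared in a CTE instead; the join itself is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feelings (user_id INTEGER, happy TEXT);
    -- Hypothetical sample data; '9' has no entry in the lookup list.
    INSERT INTO feelings VALUES (10, '1'), (11, '2'), (12, '3'), (13, '9');
""")

labelled = conn.execute("""
    WITH lookup(k, value) AS (
        VALUES ('1', 'Yes'), ('2', 'No'), ('3', 'Maybe')
    )
    SELECT f.user_id, COALESCE(lookup.value, 'Unknown') AS happy
    FROM feelings AS f
    LEFT JOIN lookup ON f.happy = lookup.k
    ORDER BY f.user_id
""").fetchall()

for user_id, happy in labelled:
    print(user_id, happy)
```

The LEFT JOIN keeps users whose code is missing from the lookup, and COALESCE turns the resulting NULL into 'Unknown'.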
If you just have a table called Feelings and a column called Happy with just IDs, you can try the following query:
SELECT
user_id,
CASE HAPPY
WHEN '1' THEN 'Yes'
WHEN '2' THEN 'No'
WHEN '3' THEN 'Maybe'
ELSE 'Other'
END AS 'feeling_happy'
FROM FEELINGS
Otherwise, if you have two tables, one called Feelings with a column Happy that holds the IDs and another called Happy with the IDs in one column and the defined statuses in another (which we'll call happy_x), then you can simply use the following query:
SELECT
f.user_id, h.happy_x
FROM Feelings f
INNER JOIN Happy AS h
ON h.id = f.happy

Oracle SQL not bringing back duplicates to an Oracle Form 10g

I've created an Oracle SQL query that joins about five tables, which I'm using in a FROM clause for an Oracle Form. The problem is that some records are duplicated, and I only want to show one line in the form, not the duplicates. I've tried GROUP BY and PARTITION BY, but the query becomes too slow once I add them.
I'm now thinking of doing this as a procedure and bringing back just one of the duplicates if any occur. Would it be best to bring back an Oracle table of records from the database into the form? How would it be best to look for a duplicate in an Oracle PL/SQL loop?
I've updated the question and added the full query below to explain it better. The surr_id, the first column in the select statement below, is unique, but what I want to show in the Oracle form is the production number along with the other columns, which are not unique. There can be duplicates of a production number, and sometimes even three records with the same production number. Hope this helps. I was thinking of putting this in a loop, grabbing the first production number, and then only bringing back a record when the production number changes.
select x.surr_id,
       x.supplier_name as supplier,
       x.broadcaster_name as broadcaster,
       ptle.title as production_title,
       x.production_number as production_number,
       stle.title as series_title,
       x.production_source as supplied_source_ind,
       x.third_party_group_id,
       x.bro_broadcast_by_tp_surr_id,
       x.station_id
from (select usage_headers.surr_id as surr_id,
             broad_supp.supplier_name as supplier_name,
             broad_supp.broadcaster_name as broadcaster_name,
             usage_headers.production_number as production_number,
             productions.production_source as production_source,
             broad_supp.station_id as station_id,
             usage_headers.prod_exploitation_cre_surr_id as prod_exploitation_cre_surr_id,
             usage_headers.bro_broadcast_by_tp_surr_id as bro_broadcast_by_tp_surr_id,
             productions.cre_surr_id as cre_surr_id,
             productions.prod_series_cre_surr_id as prod_series_cre_surr_id,
             broad_supp.third_party_group_id as third_party_group_id
      from usage_headers,
           productions,
           (SELECT /*+ index (bro bro_pk) */
                   third_party.surr_id AS THIRD_PARTY_SURR_ID,
                   third_party.supplier_group_id AS THIRD_PARTY_GROUP_ID,
                   third_party.dn_root_tp_surr_id AS THIRD_PARTY_ROOT_ID,
                   third_party.supplier_name,
                   bro.station_id AS STATION_ID,
                   bro.dn_tp_name AS BROADCASTER_NAME
            FROM (SELECT tp.surr_id,
                         tp.name AS supplier_name,
                         tp.tp_surr_id AS supplier_group_id,
                         tp.dn_root_tp_surr_id
                  FROM third_parties tp
                  CONNECT BY PRIOR tp.surr_id = tp.tp_surr_id
                  START WITH tp.surr_id IN (4251, 4247, 4237, 4034, 10157, 14362, 9834)) third_party
            JOIN broadcasters bro ON (third_party.surr_id = bro.tp_surr_id)) broad_supp
      where broad_supp.THIRD_PARTY_SURR_ID = usage_headers.bro_broadcast_by_tp_surr_id
        AND usage_headers.prod_exploitation_cre_surr_id = productions.cre_surr_id
        and usage_headers.prod_exploitation_cre_surr_id IS NOT NULL
        and usage_headers.right_type in ('M','B')
        AND usage_headers.udg_surr_id IS NOT NULL
        AND NVL(usage_headers.dn_uls_usage_status,'3') NOT IN ('9', '11')
        AND productions.production_source <> 'AP') x
LEFT OUTER JOIN titles ptle ON (ptle.cre_surr_id = x.cre_surr_id AND ptle.tt_code = 'R')
LEFT OUTER JOIN titles stle ON (stle.cre_surr_id = x.prod_series_cre_surr_id AND stle.tt_code = 'R')
Thanks in advance.
If you're getting records that are entirely duplicated, then adding DISTINCT, so that your SELECT becomes SELECT DISTINCT, will ensure that only one of each is returned. If even one column differs, though, this won't work.
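To see why a unique column such as surr_id defeats DISTINCT, here is a small sketch using Python's standard-library sqlite3 module rather than Oracle. The table usage_rows and its contents are made up for illustration. The last query shows one possible way, assumed here and not taken from the question, to keep a single row per production number by picking the lowest surr_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined result in the question.
    CREATE TABLE usage_rows (surr_id INTEGER, production_number TEXT, supplier TEXT);
    INSERT INTO usage_rows VALUES
        (1, 'P100', 'Acme'),
        (2, 'P100', 'Acme'),
        (3, 'P200', 'Acme');
""")

# surr_id differs on every row, so DISTINCT cannot collapse anything.
with_id = conn.execute(
    "SELECT DISTINCT surr_id, production_number, supplier FROM usage_rows"
).fetchall()

# Without the unique column the two 'P100' rows are identical and collapse.
without_id = conn.execute(
    "SELECT DISTINCT production_number, supplier FROM usage_rows "
    "ORDER BY production_number"
).fetchall()

# One row per production number, keeping the row with the lowest surr_id.
one_per_production = conn.execute("""
    SELECT u.surr_id, u.production_number, u.supplier
    FROM usage_rows AS u
    JOIN (SELECT production_number, MIN(surr_id) AS first_id
          FROM usage_rows
          GROUP BY production_number) AS firsts
      ON firsts.first_id = u.surr_id
    ORDER BY u.surr_id
""").fetchall()

print(len(with_id), without_id, one_per_production)
```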

SQL SSMS IF THEN with multiple criteria across same field

I need to pull a transaction record from a table if it is type 'C' and has a record post time greater than or equal to the post time for a record with type 'W' where the account numbers and post date are the same. I am struggling with creating an if/then where the posttime for type 'C' >= posttime for type 'W'... any help would be appreciated. I've done these types before but never for the same field where only one record item is different.
This would be the typical method using exists:
select * from transactions t
where t.actioncode = 'C' and exists (
  select 1 from transactions t2
  where t2.account_num = t.account_num and t2.postdate = t.postdate
    and t2.actioncode = 'W'
    and t2.posttime < t.posttime
)
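Here is a small sketch of the EXISTS approach that can be run locally with Python's standard-library sqlite3 module instead of SQL Server. The id column and the sample rows are invented, times are stored as zero-padded text so they compare correctly, and the comparison uses <= to match the "greater than or equal to" wording in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER, account_num TEXT, postdate TEXT,
                               posttime TEXT, actioncode TEXT);
    -- Hypothetical sample rows.
    INSERT INTO transactions VALUES
        (1, 'A1', '2020-01-01', '09:00', 'W'),
        (2, 'A1', '2020-01-01', '10:00', 'C'),  -- after the W: kept
        (3, 'A1', '2020-01-01', '08:00', 'C'),  -- before the W: dropped
        (4, 'A2', '2020-01-01', '11:00', 'C'),  -- no W for this account: dropped
        (5, 'A1', '2020-01-02', '12:00', 'C');  -- the W is on another date: dropped
""")

matches = conn.execute("""
    SELECT t.id
    FROM transactions AS t
    WHERE t.actioncode = 'C'
      AND EXISTS (
          SELECT 1
          FROM transactions AS t2
          WHERE t2.account_num = t.account_num
            AND t2.postdate = t.postdate
            AND t2.actioncode = 'W'
            AND t2.posttime <= t.posttime
      )
    ORDER BY t.id
""").fetchall()

print(matches)
```

Because EXISTS only checks whether a matching 'W' row is present, each 'C' row is returned at most once, even if several 'W' rows qualify.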
If I understand you correctly, what you describe can be accomplished through JOINS.
Think relational data sets and SARGs.
While you still have not given us a table structure (which helps enormously), the solution can help steer you in the right direction. The following assumes a FACT table of TRANSACTIONS, where the cardinality to itself is M:M
SELECT TOP 1000 A.ACTIONCODE, A.TRAN_RECORD --, any other needed columns
FROM TRANSACTIONS A
INNER JOIN (SELECT ACTIONCODE, POSTTIME, ACCOUNT_NUM, POSTDATE
FROM TRANSACTIONS
WHERE ACTIONCODE = 'W') B ON A.ACCOUNT_NUM = B.ACCOUNT_NUM
AND A.POSTDATE = B.POSTDATE
WHERE A.ACTIONCODE = 'C'
AND A.POSTTIME >= B.POSTTIME
UPDATED: I accidentally forgot to include the correct number of columns. Always specify the same columns (or * if you do not care) that you will be using in your INNER JOIN.
Regardless, we optimize the query by only returning results that we will be using or seeing in our query.
This is what I had originally, but it just churned in SSMS without results. Essentially, I just need all 'C' type records returned where there is a 'W' type record with a posttime less than the 'C', but where the account numbers and postdate for the record are the same. posttime, postdate, type, and number are all fields in my table.
SELECT *
FROM TRANSACTIONS
WHERE ACTIONCODE = 'C' AND POSTTIME >= POSTTIME AND ACTIONCODE = 'W'

Left Join Not Joining with a Single Record

I have the following query:
Insert into cet_database.dbo.termData
(
termID,
studentID,
course,
[current],
program,
StbyCurrentClassID,
class,
classCode,
cancelled
)
Select
fm_stg.classByStudent_termData_assessmentData.termID,
fm_stg.classByStudent_termData_assessmentData.studentID,
fm_stg.classByStudent_termData_assessmentData.class_code,
case when fm_stg.classByStudent_termData_assessmentData.[current] = 'Yes' then 1 else 0 end,
fm_stg.classByStudent_termData_assessmentData.program,
fm_stg.classByStudent_termData_assessmentData.classByStudentID,
fm_stg.classByStudent_termData_assessmentData.class,
fm_stg.classByStudent_termData_assessmentData.classID,
case when fm_stg.classByStudent_termData_assessmentData.cancelled_flag = 1 then 1 else 0 end
From fm_stg.classByStudent_termData_assessmentData left outer join termData
On fm_stg.classByStudent_termData_assessmentData.class_code = termData.course
and fm_stg.classByStudent_termData_assessmentData.termID = termData.termID
and fm_stg.classByStudent_termData_assessmentData.studentID = fm_stg.classByStudent_termData_assessmentData.studentID
Where termData.StbyCurrentClassID is null
I use the query to import data into a staging table from another database (fm_stg.classByStudent_termData_assessmentData) before importing it into my database's tables. This particular query is part of a larger stored procedure that imports data into multiple tables related to termData.
When I run the sproc, I get the record inserted into fm_stg.classByStudent_termData_assessmentData but not into termData. I am only inserting one record when having this problem, but it works for the 10,000 records I did previously. I use the left join to establish what already exists in my database's table and what doesn't, then take the relevant records from the staging table. However, with this record:
316a, 39520, DEC 10, Yes, DEC10, 105713, DEC 10 (18), 6078, NULL, 2
The select returns nothing - why is this? The record definitely doesn't exist in my termData table and records insert into all my other tables from the staging table. The sproc is running all of the inserts in a transaction so as to avoid precisely this scenario where records are inserted in some tables and not others, but it doesn't seem to be working.
You say the query worked for the previous 10,000 records, but doesn't for the current one. The only thing that looks strange in your query is the third line in your ON clause where you compare a field (the studentID) with itself.
On fm_stg.classByStudent_termData_assessmentData.class_code = termData.course
and fm_stg.classByStudent_termData_assessmentData.termID = termData.termID
and fm_stg.classByStudent_termData_assessmentData.studentID = fm_stg.classByStudent_termData_assessmentData.studentID
I am just guessing here, but as this line is in the ON clause, did you want to compare the student ID, too? So it may be you were just lucky the query worked so far and now you stumble upon the student ID. I suppose the ON clause should look like this:
On fm_stg.classByStudent_termData_assessmentData.class_code = termData.course
and fm_stg.classByStudent_termData_assessmentData.termID = termData.termID
and fm_stg.classByStudent_termData_assessmentData.studentID = termData.studentID
By the way, queries get more readable by using table aliases. In the following query I use ad for fm_stg.classByStudent_termData_assessmentData and td for termData:
Insert into cet_database.dbo.termData
(
termID,
studentID,
course,
[current],
program,
StbyCurrentClassID,
class,
classCode,
cancelled
)
Select
ad.termID,
ad.studentID,
ad.class_code,
case when ad.[current] = 'Yes' then 1 else 0 end,
ad.program,
ad.classByStudentID,
ad.class,
ad.classID,
case when ad.cancelled_flag = 1 then 1 else 0 end
From fm_stg.classByStudent_termData_assessmentData ad
Left Outer Join termData td On ad.class_code = td.course
And ad.termID = td.termID
And ad.studentID = td.studentID
Where td.StbyCurrentClassID is null;
Moreover, when checking for existence, why do you use the anti-join trick? Did you have issues with a straightforward NOT EXISTS? Use tricks only when really needed. The query reads better as follows:
Insert into cet_database.dbo.termData
(
termID,
studentID,
course,
[current],
program,
StbyCurrentClassID,
class,
classCode,
cancelled
)
Select
termID,
studentID,
class_code,
case when [current] = 'Yes' then 1 else 0 end,
program,
classByStudentID,
class,
classID,
case when cancelled_flag = 1 then 1 else 0 end
From fm_stg.classByStudent_termData_assessmentData ad
Where Not Exists
(
Select *
From termData td
Where ad.class_code = td.course
And ad.termID = td.termID
And ad.studentID = td.studentID
);
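To make the effect of the self-comparison concrete, here is a minimal sketch using Python's standard-library sqlite3 module instead of SQL Server. The tables are cut down to three columns, the staging table is simply called staging, the existing student ID 11111 is made up, and the anti-join tests td.termID for NULL because the simplified table has no StbyCurrentClassID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging  (termID TEXT, studentID INTEGER, class_code TEXT);
    CREATE TABLE termData (termID TEXT, studentID INTEGER, course TEXT);
    -- A different (hypothetical) student already has this term and course.
    INSERT INTO termData VALUES ('316a', 11111, 'DEC 10');
    -- The new record, which is not yet in termData.
    INSERT INTO staging VALUES ('316a', 39520, 'DEC 10');
""")

# Buggy ON clause: studentID is compared with itself, so any termData row
# with the same term and course counts as a match and hides the new record.
buggy = conn.execute("""
    SELECT ad.studentID
    FROM staging AS ad
    LEFT JOIN termData AS td
           ON ad.class_code = td.course
          AND ad.termID = td.termID
          AND ad.studentID = ad.studentID
    WHERE td.termID IS NULL
""").fetchall()

# Corrected ON clause: studentID is compared across the two tables.
fixed = conn.execute("""
    SELECT ad.studentID
    FROM staging AS ad
    LEFT JOIN termData AS td
           ON ad.class_code = td.course
          AND ad.termID = td.termID
          AND ad.studentID = td.studentID
    WHERE td.termID IS NULL
""").fetchall()

# The same check written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT ad.studentID
    FROM staging AS ad
    WHERE NOT EXISTS (
        SELECT 1
        FROM termData AS td
        WHERE ad.class_code = td.course
          AND ad.termID = td.termID
          AND ad.studentID = td.studentID
    )
""").fetchall()

print(buggy, fixed, not_exists)
```

The buggy version returns nothing as soon as any other student shares the term and course, which would be consistent with the query having worked for earlier imports and then failing for this one record.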
With another DBMS you could even have used NOT IN (i.e. Where (class_code, termID, studentID) Not In (Select ...)), which is not correlated, so a typo such as yours could not even have occurred. Unfortunately, SQL Server doesn't support tuples in the IN clause.

How to update a PostgreSQL table with a count of duplicate items

I found two bugs in a program that created a lot of duplicate values:
an 'index' was created instead of a 'unique index'
a duplicate check wasn't integrated in one of 4 twisted routines
So I need to go in and clean up my database.
Step one is to decorate the table with a count of all the duplicate values (next I'll look into finding the first value, and then migrating everything over).
The code below works, I just recall doing a similar "update from select count" on the same table years ago, and I did it in half as much code.
Is there a better way to write this?
UPDATE
shared_link
SET
is_duplicate_of_count = subquery.is_duplicate_of_count
FROM
(
SELECT
count(url) AS is_duplicate_of_count
, url
FROM
shared_link
WHERE
shared_link.url = url
GROUP BY
url
) AS subquery
WHERE
shared_link.url = subquery.url
;
Your query is fine, generally, except for the pointless (but also harmless) WHERE clause in the subquery:
UPDATE shared_link
SET is_duplicate_of_count = subquery.is_duplicate_of_count
FROM (
SELECT url
, count(url) AS is_duplicate_of_count
FROM shared_link
-- WHERE shared_link.url = url
GROUP BY url
) AS subquery
WHERE shared_link.url = subquery.url;
The commented clause is the same as
WHERE shared_link.url = shared_link.url
and therefore only eliminates NULL values (because NULL = NULL is not TRUE), which is most probably neither intended nor needed in your setup.
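The NULL behaviour can be checked quickly with a small sketch in Python's standard-library sqlite3 module; SQLite treats NULL = NULL the same way as PostgreSQL here. The id column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shared_link (id INTEGER, url TEXT);
    -- Hypothetical sample rows: one duplicated URL and one NULL.
    INSERT INTO shared_link VALUES (1, 'http://a'), (2, 'http://a'), (3, NULL);
""")

all_rows = conn.execute("SELECT count(*) FROM shared_link").fetchone()[0]

# Both sides refer to the same column, so only the NULL row is filtered out.
self_compared = conn.execute(
    "SELECT count(*) FROM shared_link WHERE shared_link.url = url"
).fetchone()[0]

# Without the filter the NULL group shows up, but count(url) is 0 for it.
grouped = conn.execute(
    "SELECT url, count(url) FROM shared_link GROUP BY url ORDER BY url"
).fetchall()

print(all_rows, self_compared, grouped)
```

The NULL group could never satisfy the outer `shared_link.url = subquery.url` condition anyway, which is why dropping the inner WHERE clause does not change which rows get updated.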
Other than that you can only shorten your code further with aliases and shorter names:
UPDATE shared_link s
SET ct = u.ct
FROM (
SELECT url, count(url) AS ct
FROM shared_link
GROUP BY 1
) AS u
WHERE s.url = u.url;
In PostgreSQL 9.1 or later you might be able to do the whole operation (identify dupes, consolidate data, remove dupes) in one SQL statement with aggregate and window functions and data-modifying CTEs - thereby eliminating the need for an additional column to begin with.