TSQL select clause includes more data than needed - sql

I have a query that is used to pull some data, but when I join another table, it duplicates my results for every record it has joined on in the other table.
I'm sure this is a simple issue I am overlooking, but I can't seem to get it.
My query is here:
SELECT A.[id],
A.[subject],
A.[description],
CONVERT(VARCHAR(17), A.[startTime], 100) as startTime,
CONVERT(VARCHAR(17), A.[endTime], 100) as endTime,
A.[whoCreated],
A.[center],
B.[FirstName],
B.[LastName],
B.[ntid] as empNTID,
C.[centerName],
D.[employee],
E.[segmentID]
FROM Focus_Meetings AS A
JOIN empTable as B
ON A.[whoCreated] = B.[empID]
JOIN Focus_Centers as C
ON A.[center] = C.[id]
JOIN Focus_Attendees as D
ON D.[meetingID] = A.[id]
JOIN Focus_Meetings_Segments as E
ON E.[meetingID] = A.[id]
WHERE
(CAST(A.startTime AS DATE) = CAST(COALESCE(@meetingDate, A.startTime) AS DATE) OR
CAST(A.endTime AS DATE) = CAST(COALESCE(@meetingDate, A.endTime) AS DATE) OR
(E.[segmentID] IN( SELECT ParamValues.x2.value('segment[1]', 'INT')
FROM @meetingSegment.nodes('/segments/theSegment') AS ParamValues(x2))
)
)
FOR XML PATH ('details'), TYPE, ELEMENTS, ROOT ('root');
There is 1 record in the Focus_Meetings table and 5 records in the Focus_Meetings_Segments table.
My result should only be the one meeting, but it's giving a record for every D.[employee] and E.[segmentID].
I assume that's how it's supposed to work with my query, but that's not my intent.
There are 5 segments attached to the meeting in Focus_Meetings_Segments, and when I search for one of them, it should only be showing me the meeting 1 time, not once for each segment.

You are correct that this is how your query is supposed to work. This is a common problem that many people new to JOINS run into.
Essentially, you are currently asking SQL Server to return every set of data based on your JOINS and that is what it is doing. It sounds like what you want is for it to arbitrarily drop records from the result set.
Consider the following simplified version of your result set:
Subject | Description | SegmentId
-----------------------------------------
Whatever | Some desc... | 1
Whatever | Some desc... | 2
Based on your description, you only want the Whatever | Some desc... portion of the results to display one time.
If that is what you want to do, you have a couple of options:
1) Stop selecting the data (SegmentId) that is causing the records to show twice and only select distinct records.
SELECT DISTINCT Subject, Description...
2) Specify an aggregate function on the data that is causing records to show twice and group by the rest.
SELECT Subject, Description, MAX(SegmentId)... GROUP BY Subject, Description
You should also evaluate exactly what you need to select vs. what you are selecting. If you are arbitrarily selecting the SegmentId then you probably don't need it in the first place.
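To make the duplication and both options concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The `meetings` and `segments` tables are hypothetical stand-ins for Focus_Meetings and Focus_Meetings_Segments, trimmed down to the columns that matter.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meetings (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE segments (meetingID INTEGER, segmentID INTEGER);
    INSERT INTO meetings VALUES (1, 'Whatever');
    INSERT INTO segments VALUES (1, 1), (1, 2), (1, 3);
""")

# Plain join: one output row per matching segment, so the meeting repeats.
joined = conn.execute("""
    SELECT m.subject, s.segmentID
    FROM meetings AS m
    JOIN segments AS s ON s.meetingID = m.id
""").fetchall()

# Option 1: drop the column that differs and select distinct rows.
distinct_rows = conn.execute("""
    SELECT DISTINCT m.subject
    FROM meetings AS m
    JOIN segments AS s ON s.meetingID = m.id
""").fetchall()

# Option 2: aggregate the column that differs and group by the rest.
grouped_rows = conn.execute("""
    SELECT m.subject, MAX(s.segmentID)
    FROM meetings AS m
    JOIN segments AS s ON s.meetingID = m.id
    GROUP BY m.subject
""").fetchall()

print(len(joined), distinct_rows, grouped_rows)
```

The plain join returns three rows for the single meeting, while both options collapse it back to one.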

when I search one of them it should only be showing me the meeting 1 time, not once for each segment
Then take Segment and Employee out of the main query and use a subquery:
SELECT A.[id],
A.[subject],
A.[description],
CONVERT(VARCHAR(17), A.[startTime], 100) as startTime,
CONVERT(VARCHAR(17), A.[endTime], 100) as endTime,
A.[whoCreated],
A.[center],
B.[FirstName],
B.[LastName],
B.[ntid] as empNTID,
C.[centerName]
FROM Focus_Meetings AS A
JOIN empTable as B
ON A.[whoCreated] = B.[empID]
JOIN Focus_Centers as C
ON A.[center] = C.[id]
WHERE A.[id] IN
(
SELECT E.[meetingID]
FROM Focus_Meetings_Segments as E
WHERE
(CAST(A.startTime AS DATE) = CAST(COALESCE(@meetingDate, A.startTime) AS DATE) OR
CAST(A.endTime AS DATE) = CAST(COALESCE(@meetingDate, A.endTime) AS DATE) OR
(E.[segmentID] IN( SELECT ParamValues.x2.value('segment[1]', 'INT')
FROM @meetingSegment.nodes('/segments/theSegment') AS ParamValues(x2))
)
)
)
FOR XML PATH ('details'), TYPE, ELEMENTS, ROOT ('root');
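The reason the IN (...) form does not multiply rows is that it only tests whether a match exists; it never adds the matching rows to the result. A small sketch of that behaviour with Python's sqlite3 follows. The tables, data, and the `meetings_with_segment` helper are made up for illustration and are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meetings (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE segments (meetingID INTEGER, segmentID INTEGER);
    INSERT INTO meetings VALUES (1, 'Kickoff'), (2, 'Review');
    -- Meeting 1 lists segment 10 twice on purpose.
    INSERT INTO segments VALUES (1, 10), (1, 10), (1, 11), (2, 11), (2, 20);
""")

def meetings_with_segment(conn, segment_id):
    """Return each meeting at most once if it has the given segment."""
    return conn.execute("""
        SELECT m.id, m.subject
        FROM meetings AS m
        WHERE m.id IN (SELECT s.meetingID
                       FROM segments AS s
                       WHERE s.segmentID = ?)
        ORDER BY m.id
    """, (segment_id,)).fetchall()

print(meetings_with_segment(conn, 10))
print(meetings_with_segment(conn, 11))
```

Even though meeting 1 matches segment 10 on two rows, it comes back once, because the subquery is only used as a filter.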

Related

use multiple LEFT JOINs from multiple datasets SQL

I need to perform multiple JOINs, I am grabbing the data from multiple tables and JOINing on id. The tricky part is that one table I need to join twice. Here is the code:
(
SELECT
content.brand_identifier AS brand_name,
CAST(timestamp(furniture.date) AS DATE) AS order_date,
total_hearst_commission
FROM
`furniture_table` AS furniture
LEFT JOIN `content_table` AS content ON furniture.site_content_id = content.site_content_id
WHERE
(
timestamp(furniture.date) >= TIMESTAMP('2020-06-01 00:00:00')
)
)
UNION
(
SELECT
flowers.a_merchant_name AS merchant_name
FROM
`flowers_table` AS flowers
LEFT JOIN `content` AS content ON flowers.site_content_id = content.site_content_id
)
GROUP BY
1,
2,
3,
4
ORDER BY
4 DESC
LIMIT
500
I thought I could use UNION but it gives me an error Syntax error: Expected keyword ALL or keyword DISTINCT but got "("
I'm not able to comment, but like GHB states, the queries do not have the same number of columns; therefore, UNION will not work here.
I think it would be helpful to know why sub-queries are needed in the first place. I'm guessing this query does not produce the results you want, so please elaborate on why that is.
select
f.a_merchant_name as merchant_name,
c.brand_identifier as brand_name,
CAST(timestamp(f.date) AS DATE) AS order_date,
total_hearst_commission
from furniture_table f
left join content_table c on c.site_content_id = f.site_content_id
where timestamp(f.date) >= TIMESTAMP('2020-06-01 00:00:00')
group by 1,2,3,4
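To illustrate the column-count point, here is a rough sketch in Python's sqlite3 (not the engine from the question, so the error text will differ). The `furniture` and `flowers` tables and the `try_query` helper are hypothetical. Note also that the error message quoted in the question suggests that engine wants UNION ALL or UNION DISTINCT spelled out rather than a bare UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE furniture (brand_name TEXT, order_date TEXT, commission REAL);
    CREATE TABLE flowers (merchant_name TEXT);
    INSERT INTO furniture VALUES ('BrandA', '2020-06-02', 1.5);
    INSERT INTO flowers VALUES ('MerchantB');
""")

def try_query(sql):
    """Run a query and return its rows, or an 'error: ...' string on failure."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"

# Three columns on the left, one on the right: the engine rejects this.
mismatched = try_query("""
    SELECT brand_name, order_date, commission FROM furniture
    UNION ALL
    SELECT merchant_name FROM flowers
""")

# Padding both sides with NULLs gives every branch the same four columns.
padded = try_query("""
    SELECT NULL AS merchant_name, brand_name, order_date, commission FROM furniture
    UNION ALL
    SELECT merchant_name, NULL, NULL, NULL FROM flowers
""")

print(mismatched)
print(padded)
```

Whether padding with NULLs is what you actually want depends on the report; if the rows are meant to line up side by side, a join is the better fit.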

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS organic_follower_gain,
null AS paid_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through rows in SQL because it's not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
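One set-based way to express the backfill itself is to start from each account's earliest known count and subtract every gain recorded after the day being filled in. The sketch below uses Python's sqlite3 instead of Snowflake, keeps only the columns needed, and invents its own sample data. It assumes a day's gain is already included in that day's count, so the count for day d is the earliest known count minus the gains dated after d up to and including that earliest date; if your gains are recorded differently, the date comparisons need adjusting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follower_counts (account_id INTEGER, date TEXT,
                                  total_followers_count INTEGER);
    CREATE TABLE follower_gains (account_id INTEGER, date TEXT,
                                 organic_follower_gain INTEGER,
                                 paid_follower_gain INTEGER);
    -- Made-up sample data for two accounts.
    INSERT INTO follower_counts VALUES
        (1, '2020-04-04', 1000), (1, '2020-04-05', 1010), (2, '2020-04-04', 500);
    INSERT INTO follower_gains VALUES
        (1, '2020-04-01', 10, 0), (1, '2020-04-02', 5, 5),
        (1, '2020-04-03', 20, 0), (1, '2020-04-04', 3, 2),
        (2, '2020-04-03', 1, 1),  (2, '2020-04-04', 4, 0);
""")

backfilled = conn.execute("""
    WITH first_count AS (
        -- Earliest known count per account: the anchor for the backfill.
        SELECT c.account_id, c.date, c.total_followers_count
        FROM follower_counts AS c
        WHERE c.date = (SELECT MIN(c2.date)
                        FROM follower_counts AS c2
                        WHERE c2.account_id = c.account_id)
    )
    SELECT g.account_id,
           g.date,
           -- Anchor count minus all gains after this day, up to the anchor day.
           f.total_followers_count - COALESCE((
               SELECT SUM(g2.organic_follower_gain + g2.paid_follower_gain)
               FROM follower_gains AS g2
               WHERE g2.account_id = g.account_id
                 AND g2.date > g.date
                 AND g2.date <= f.date
           ), 0) AS total_followers_count
    FROM follower_gains AS g
    JOIN first_count AS f ON f.account_id = g.account_id
    WHERE g.date < f.date
    ORDER BY g.account_id, g.date
""").fetchall()

for row in backfilled:
    print(row)
```

On an engine with window functions, the correlated SUM could be swapped for a running SUM(...) OVER (PARTITION BY account_id ORDER BY date DESC); the subquery form is used here only to keep the sketch portable.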
You could do something like this. Instead of using the variable, you can wrap the whole expression in another pair of parentheses and write ) AS FollowerGrowth at the end.
DECLARE @FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASC )

SQL to extract distinct values from an UNION query ordered by values that are not required

I am trying to get the TOP 20 DISTINCT address records from a query with a UNION...
The issue is that I want to order by date first - so the customer sees the most recent - but each date makes the row unique and leaves me with a ton of replica addresses (please see below).
SELECT
CB.CustomerGuid, CB.DisplayAddress, CB.LocatorId
FROM
(SELECT
B.CustomerGuid, CAST(B.PickupDateTime AS DATE) AS TravelDate,
B.PickupDisplayAddress AS DisplayAddress, B.PickupAddressId AS LocatorId
FROM
Bookings B
WHERE
CustomerGuid = '463a20f2-a874-4964-865d-70d71065a69b'
UNION
SELECT
B2.CustomerGuid, CAST(B2.PickupDateTime AS DATE) AS TravelDate,
B2.DestinationDisplayAddress AS DisplayAddress, B2.PickupAddressId AS LocatorId
FROM
Bookings B2
WHERE
CustomerGuid = '463a20f2-a874-4964-865d-70d71065a69b'
ORDER BY
TravelDate DESC) AS CB
I felt the approach would be to make the union query a subquery and query that (without the date) as DISTINCT, but I have since learned that the order of rows in the result set is ultimately controlled by the ORDER BY clause in the outer SELECT. This therefore gives me the error:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
How do I get around this please?
Something like this I think would work, assuming it doesn't matter which traveldate you sort by:
SELECT TOP 20 CB.CustomerGuid, CB.DisplayAddress, CB.LocatorId FROM
(SELECT B.CustomerGuid, CAST(B.PickupDateTime AS DATE) AS TravelDate,
B.PickupDisplayAddress AS DisplayAddress, B.PickupAddressId AS LocatorId
FROM Bookings B
WHERE CustomerGuid = '463a20f2-a874-4964-865d-70d71065a69b'
UNION
SELECT B2.CustomerGuid, CAST(B2.PickupDateTime AS DATE) AS TravelDate,
B2.DestinationDisplayAddress AS DisplayAddress, B2.PickupAddressId AS LocatorId
FROM Bookings B2
WHERE CustomerGuid = '463a20f2-a874-4964-865d-70d71065a69b'
) AS CB
GROUP BY CB.CustomerGuid, CB.DisplayAddress, CB.LocatorId
ORDER BY MAX(TravelDate) DESC
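Here is a small runnable sketch of the same idea in Python's sqlite3: group away the date and sort by each address's most recent date. SQLite uses LIMIT where SQL Server uses TOP, the Bookings columns are cut down to the ones used, and the sample rows are invented. The extra sort on DisplayAddress is my addition so that ties come out in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bookings (CustomerGuid TEXT, PickupDateTime TEXT,
                           PickupDisplayAddress TEXT,
                           DestinationDisplayAddress TEXT);
    -- Invented sample bookings for one customer.
    INSERT INTO Bookings VALUES
        ('c1', '2021-01-01 09:00:00', 'Home',   'Office'),
        ('c1', '2021-01-03 18:00:00', 'Office', 'Home'),
        ('c1', '2021-01-05 07:30:00', 'Home',   'Airport');
""")

recent_addresses = conn.execute("""
    SELECT CB.CustomerGuid, CB.DisplayAddress
    FROM (
        SELECT CustomerGuid, DATE(PickupDateTime) AS TravelDate,
               PickupDisplayAddress AS DisplayAddress
        FROM Bookings
        WHERE CustomerGuid = ?
        UNION ALL
        SELECT CustomerGuid, DATE(PickupDateTime) AS TravelDate,
               DestinationDisplayAddress AS DisplayAddress
        FROM Bookings
        WHERE CustomerGuid = ?
    ) AS CB
    -- Grouping removes the duplicates; MAX picks each address's latest date.
    GROUP BY CB.CustomerGuid, CB.DisplayAddress
    ORDER BY MAX(CB.TravelDate) DESC, CB.DisplayAddress
    LIMIT 20
""", ("c1", "c1")).fetchall()

print(recent_addresses)
```

Because the GROUP BY already removes duplicates, UNION ALL is enough inside the derived table.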

Unpivot date columns to a single column of a complex query in Oracle

Hi guys, I am stuck with a stubborn problem that I am unable to solve. I am trying to compile a report in which all the dates coming from different tables need to go into a single date field. Of course, it is the max (most recent) date across all these date columns that should go into the single date column for the report. The report would be generated for multiple users across multiple branches/courses.
There are multiple blogs and the latest date w.r.t to the blogtitle needs to be grouped, i.e. max(date_value) from the six date columns should give the greatest or latest date for that blogtitle.
Expected Result:
select u.batch_uid as ext_person_key, u.user_id, cm.batch_uid as ext_crs_key, cm.crs_id, ir.role_id as
insti_role, (CASE when b.JOURNAL_IND = 'N' then
'BLOG' else 'JOURNAL' end) as item_type, gm.title as item_name, gm.disp_title as ITEM_DISP_NAME, be.blog_pk1 as be_blogPk1, bc.blog_entry_pk1 as bc_blog_entry_pk1,bc.pk1,
b.ENTRY_mod_DATE as b_ENTRY_mod_DATE ,b.CMT_mod_DATE as BlogCmtModDate, be.CMT_mod_DATE as be_cmnt_mod_Date,
b.UPDATE_DATE as BlogUpDate, be.UPDATE_DATE as be_UPDATE_DATE,
bc.creation_date as bc_creation_date,
be.CREATOR_USER_ID as be_CREATOR_USER_ID , bc.creator_user_id as bc_creator_user_id,
b.TITLE as BlogTitle, be.TITLE as be_TITLE,
be.DESCRIPTION as be_DESCRIPTION, bc.DESCRIPTION as bc_DESCRIPTION
FROM users u
INNER JOIN insti_roles ir on u.insti_roles_pk1 = ir.pk1
INNER JOIN crs_users cu ON u.pk1 = cu.users_pk1
INNER JOIN crs_mast cm on cu.crsmast_pk1 = cm.pk1
INNER JOIN blogs b on b.crsmast_pk1 = cm.pk1
INNER JOIN blog_entry be on b.pk1=be.blog_pk1 AND be.creator_user_id = cu.pk1
LEFT JOIN blog_CMT bc on be.pk1=bc.blog_entry_pk1 and bc.CREATOR_USER_ID=cu.pk1
JOIN gradeledger_mast gm ON gm.crsmast_pk1 = cm.pk1 and b.grade_handler = gm.linkId
WHERE cu.ROLE='S' AND BE.STATUS='2' AND B.ALLOW_GRADING='Y' AND u.row_status='0'
AND u.available_ind ='Y' and cm.row_status='0' and u.batch_uid='userA_157'
I am getting a result set for the above query with multiple date columns, which I want to put into a single column. The dates have to be the most recent, i.e. the max of the dates in the date columns.
I have successfully done the unpivot by using a view to store the above result set and putting all the dates in one column. However, I do not want to use a view or a table to store the result set and then unpivot it, simply because I cannot keep creating views for every user one would query for.
The max(date_value) from the date columns needs to be put in one single column. They are as follows:
1) b.entry_mod_date, 2) b.cmt_mod_date, 3) be.cmt_mod_date, 4) b.update_date, 5) be.update_date, 6) bc.creation_date
Apologies that I could not provide the desc of all the tables and the fields being used.
Any help to get the above-mentioned max of the dates from these multiple date columns into a single column without using a view or a table would be greatly appreciated.
It is not clear what results you want, but the easiest solution is to use greatest().
with t as (
YOURQUERYHERE
)
select t.*,
greatest(b_ENTRY_mod_DATE, BlogCmtModDate, be_cmnt_mod_Date, BlogUpDate,
be_UPDATE_DATE, bc_creation_date
) as greatestdate
from t;
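One thing to watch with this approach: as far as I know, Oracle's GREATEST returns NULL as soon as any argument is NULL, and because blog_CMT is LEFT JOINed in the question, bc.creation_date can be NULL. The sketch below shows the effect and a COALESCE workaround using Python's sqlite3, whose multi-argument max() behaves the same way with NULLs. The `report` table, its columns, and the '0001-01-01' fallback date are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (blog_title TEXT, entry_mod_date TEXT,
                         update_date TEXT, comment_date TEXT);
    -- Blog B has no comments, so its comment_date is NULL.
    INSERT INTO report VALUES
        ('Blog A', '2020-01-05', '2020-02-01', '2020-01-20'),
        ('Blog B', '2020-03-01', '2020-02-15', NULL);
""")

rows = conn.execute("""
    SELECT blog_title,
           -- A single NULL argument makes the whole result NULL.
           max(entry_mod_date, update_date, comment_date) AS naive_latest,
           -- Replace NULL with a very old date so it can never win.
           max(entry_mod_date, update_date,
               COALESCE(comment_date, '0001-01-01')) AS latest
    FROM report
    ORDER BY blog_title
""").fetchall()

for row in rows:
    print(row)
```

In Oracle the equivalent would be wrapping each nullable column in COALESCE (or NVL) with a suitably early DATE value inside GREATEST.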
select <columns>,
case
when greatest (b_ENTRY_mod_DATE) >= greatest (BlogCmtModDate) and greatest(b_ENTRY_mod_DATE) >= greatest(BlogUpDate)
then greatest( b_ENTRY_mod_DATE )
--<same implementation to compare each time BlogCmtModDate and BlogUpDate separately to get the greatest then 'date'>
end as date_value
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
case
when greatest (be_cmnt_mod_Date) >= greatest (be_UPDATE_DATE)
then greatest( be_cmnt_mod_Date )
when greatest (be_UPDATE_DATE) >= greatest (be_cmnt_mod_Date)
then greatest( be_UPDATE_DATE )
end as date_value
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
GREATEST(bc_creation_date)
,<columns>
FROM table
<rest of the query>

SQL Server adjust each value in a column by another table

I have two tables, TblVal and TblAdj.
In TblVal I have a bunch of values that I need adjusted according to TblAdj for a given TblVal.PersonID and TblVal.Date and then returned in some ViewAdjustedValues. I must apply only those adjustments where TblAdj.Date >= TblVal.Date.
The trouble is that since all the adjustments are either a subtraction or a division, they need to be made in order. Here is the table structure:
TblVal: PersonID, Date, Value
TblAdj: PersonID, Date, SubtractAmount, DivideAmount
I want to return ViewAdjustedValues: PersonID, Date, AdjValue
Can I do this without iterating through TblAdj using a WHILE loop and an IF block to either subtract or divide as necessary? Is there some nested SELECT table magic I can perform that would be faster?
I think you can do it without a loop, but whether you want to or not is another question. A query that I think works is below (SQL Fiddle here). The key ideas are as follows:
Each SubtractAmount has the ultimate effect of subtracting SubtractAmount divided by the product of all later DivideAmounts for the same PersonID. The Date associated with the PersonID isn't relevant to this adjustment (fortunately). The CTE AdjustedAdjustments contains these adjusted SubtractAmount values.
The initial Value for a PersonID gets divided by the product of all DivideAmount values on or after that person's Date.
EXP(SUM(LOG(x))) works as an aggregate product if all values of x are positive. You should constrain your DivideAmount values to assure this, or adjust the code accordingly.
If there are no DivideAmounts, the associated product is NULL and changed to 1. Similarly, NULL sums of adjusted SubtractAmount values are changed to zero. A left join is used to preserve any values that are not subject to any adjustments.
SQL Server 2012 supports an OVER clause for aggregates, which was helpful here to aggregate "all later DivideAmounts."
WITH AdjustedAdjustments AS (
select
PersonID,
Date,
SubtractAmount/
EXP(
SUM(LOG(COALESCE(DivideAmount,1)))
OVER (
PARTITION BY PersonID
ORDER BY Date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
)
) AS AdjustedSubtract,
DivideAmount
FROM TblAdj
)
SELECT
p.PersonID,
p.Value/COALESCE(EXP(SUM(LOG(COALESCE(DivideAmount,1)))),1)
-COALESCE(SUM(a.AdjustedSubtract),0) AS AmountAdjusted
FROM TblVal AS p
LEFT OUTER JOIN AdjustedAdjustments AS a
ON a.PersonID = p.PersonID
AND a.Date >= p.Date
GROUP BY p.PersonID, p.Value, p.Date;
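To check the algebra behind the query, here is a small Python sketch (not SQL) that compares applying the adjustments one at a time against the closed form described above: the value divided by the product of all divisors, minus each subtraction divided by the product of the divisors at or after it. The function names and sample numbers are mine; each adjustment is a (subtract, divide) pair where the unused side is None, mirroring NULLs in TblAdj.

```python
import math

def apply_sequentially(value, adjustments):
    """Apply each (subtract, divide) adjustment in date order, loop style."""
    for subtract, divide in adjustments:
        if subtract is not None:
            value -= subtract
        if divide is not None:
            value /= divide
    return value

def product_via_logs(values):
    """Aggregate product as EXP(SUM(LOG(x))); only valid for positive x."""
    return math.exp(sum(math.log(v) for v in values))

def apply_closed_form(value, adjustments):
    """Same result without a running state, as in the set-based query."""
    # Treat a missing divisor as 1, like COALESCE(DivideAmount, 1).
    divides = [1.0 if d is None else d for _, d in adjustments]
    adjusted_subtracts = 0.0
    for i, (subtract, _) in enumerate(adjustments):
        if subtract is not None:
            # Divide by the product of divisors from this row onward.
            adjusted_subtracts += subtract / product_via_logs(divides[i:])
    return value / product_via_logs(divides) - adjusted_subtracts

# Sample: subtract 10, divide by 2, subtract 5, divide by 4.
sample = [(10, None), (None, 2), (5, None), (None, 4)]
print(apply_sequentially(100, sample), apply_closed_form(100, sample))
```

Both functions give 10 for the sample (up to floating-point rounding), which is why the window-function query can replace the WHILE loop as long as every divisor is positive.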
Try something like the following:
with CTE_TblVal (PersonID,Date,Value)
as
(
select A.PersonID, A.Date, A.Value
from TblVal A
inner join TblAdj B
on A.PersonID = B.PersonID
where B.Date >= A.Date
)
update CTE_TblVal
set Date = TblAdj.Date,
Value = TblAdj.Value
from CTE_TblVal
inner join TblAdj
on CTE_Tblval.PersonID = TblAdj.PersonID
output inserted.* into ViewAdjustedValues
select * from ViewAdjustedValues