Only include employees currently working for company - sql

Thanks to the input I've received earlier on this site, I use the below code to summarize how many months of experience our employees have in a specific market.
The issue, however, is that the end result shows the summarized data for all employees, even those who no longer work for our company. What I would actually like is the same output, but only for employees who are still present in the most recent snapshot (the latest SNAPSHOT_DATE).
I managed to get to a solution where I manually update the snapshot date every month, but I'd rather have an automated solution where the code itself determines what the most recent snapshot is.
Many thanks for your support :)
SELECT EMPLOYEE_ID,
ISNULL([Developing & Emerging],0) AS [Experience - D&E],
ISNULL([Developed],0) AS [Experience - D]
FROM
(
SELECT EMPLOYEE_ID,MARKET_TYPE_DESC,COUNT(SNAPSHOT_DATE) T
FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL]
WHERE ALLC_SER_NUM = '1'
GROUP BY EMPLOYEE_ID,MARKET_TYPE_DESC
)P
PIVOT(
SUM(T)
FOR MARKET_TYPE_DESC IN ([Developing & Emerging],[Developed])
)PVT

It sounds like you are trying to reduce your query down to only return results where [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL].SNAPSHOT_DATE == the latest snapshot date.
There are multiple ways you can achieve this, but probably the simplest and most easily readable would be something like the following:
DECLARE @SnapshotDate DATETIME;
SELECT @SnapshotDate = MAX(SNAPSHOT_DATE) FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL];
And then use a CTE to reduce the included list of employees to only those who have a record matching this snapshot date, by joining the CTE in your main query:
;WITH CTE AS
(
SELECT DISTINCT EMPLOYEE_ID
FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL]
WHERE SNAPSHOT_DATE = @SnapshotDate
)
)
SELECT EMPLOYEE_ID,
ISNULL([Developing & Emerging],0) AS [Experience - D&E],
ISNULL([Developed],0) AS [Experience - D]
FROM
(
SELECT ED.EMPLOYEE_ID,ED.MARKET_TYPE_DESC,COUNT(ED.SNAPSHOT_DATE) T
FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ED
JOIN CTE C ON C.EMPLOYEE_ID = ED.EMPLOYEE_ID
WHERE ED.ALLC_SER_NUM = '1'
GROUP BY ED.EMPLOYEE_ID,ED.MARKET_TYPE_DESC
)P
PIVOT(
SUM(T)
FOR MARKET_TYPE_DESC IN ([Developing & Emerging],[Developed])
)PVT
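The same filtering idea can be tried end to end on a toy dataset. The sketch below is only an illustration, not the original query: it assumes Python's built-in sqlite3 module, a hypothetical `employee_detail` table standing in for the view, and conditional aggregation in place of PIVOT (which SQLite does not have). The ALLC_SER_NUM filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_detail (      -- hypothetical stand-in for the view
    employee_id      INTEGER,
    market_type_desc TEXT,
    snapshot_date    TEXT           -- ISO dates compare correctly as text
);
INSERT INTO employee_detail VALUES
    (1, 'Developed',             '2020-01-31'),
    (1, 'Developed',             '2020-02-29'),
    (1, 'Developing & Emerging', '2020-03-31'),
    (2, 'Developed',             '2020-01-31'),   -- employee 2 left after February
    (2, 'Developed',             '2020-02-29'),
    (3, 'Developing & Emerging', '2020-03-31');
""")

query = """
SELECT ed.employee_id,
       SUM(CASE WHEN ed.market_type_desc = 'Developing & Emerging' THEN 1 ELSE 0 END) AS experience_de,
       SUM(CASE WHEN ed.market_type_desc = 'Developed' THEN 1 ELSE 0 END) AS experience_d
FROM employee_detail ed
WHERE ed.employee_id IN (           -- only employees present in the latest snapshot
        SELECT employee_id
        FROM employee_detail
        WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM employee_detail))
GROUP BY ed.employee_id
ORDER BY ed.employee_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Employee 2 has history but no row in the latest snapshot, so it drops out, while the full history of employees 1 and 3 is still counted.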

Thanks, @James S! Much appreciated :)
Perhaps a stupid question, but should I just combine both of your snippets into one query, like the one below?
DECLARE @SnapshotDate DATETIME
SELECT @SnapshotDate = MIN(SNAPSHOT_DATE ) FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL]
WITH CTE AS
(
SELECT EMPLOYEE_ID
FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL]
WHERE SNAPSHOT_DATE = @SnapshotDate
)
SELECT EMPLOYEE_ID,
ISNULL([Developing & Emerging],0) AS [Experience - D&E],
ISNULL([Developed],0) AS [Experience - D]
FROM
(
SELECT EMPLOYEE_ID,MARKET_TYPE_DESC,COUNT(SNAPSHOT_DATE) T
FROM [db_name].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ED
JOIN CTE C ON C.EMPLOYEE_ID = ED.EMPLOYEE_ID
WHERE ALLC_SER_NUM = '1'
AND SNAPSHOT_DATE = @SnapshotDate
GROUP BY EMPLOYEE_ID,MARKET_TYPE_DESC
)P
PIVOT(
SUM(T)
FOR MARKET_TYPE_DESC IN ([Developing & Emerging],[Developed])
)PVT
I tried this, but it doesn't work; it returns the following error:
Incorrect syntax near 'CTE'. If this is intended to be a common table expression, you need to explicitly terminate the previous statement with a semi-colon.

Related

AnalysisException: subqueries are not supported in the select list

I get the error shown in the title when running the following query. I'm trying to query two tables to find the total number of patients with hearing issues, and the total of those patients with hearing issues who have undergone some sort of scan (MR, SC, CT).
SELECT (
SELECT COUNT(*)
FROM hearing_evaluation
where severity_of_hearing_loss <> 'Normal'
AND severity_of_hearing_loss <> 'insufficient data'
) AS patients_with_hearing_loss
, AVG(number_of_scans) AS avg_number_of_scans
FROM (
SELECT patient_id, COUNT(*) AS number_of_scans
from imaging
where patient_id IN (
SELECT patient_id
from hearing_evaluation
where severity_of_hearing_loss <> 'Normal'
and severity_of_hearing_loss <> 'insufficient data'
)
AND modality IN ('CT','MR','SC')
GROUP BY patient_id
) AS scans
Any help would be appreciated.
Please refer to the SQL below; it should work in Impala. The only issue I can see is that if hearing_evaluation has multiple rows for a given patient ID, you need to de-duplicate the data first.
There can also be cases where a patient ID doesn't exist in the imaging table; in that case you need to use a RIGHT JOIN.
SELECT
COUNT(rs.patient_id) AS patients_with_hearing_loss
, AVG(rs.number_of_scans) AS avg_number_of_scans
FROM (
SELECT i.patient_id AS patient_id, COUNT(*) AS number_of_scans
FROM imaging i, hearing_evaluation h
WHERE i.patient_id = h.patient_id
AND h.severity_of_hearing_loss <> 'Normal'
AND h.severity_of_hearing_loss <> 'insufficient data'
AND i.modality IN ('CT','MR','SC')
GROUP BY i.patient_id ) rs
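One way to avoid a subquery in the select list while still handling duplicate evaluations and patients without scans is to compute each aggregate in its own derived table and cross join the two single-row results. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 module with made-up rows, and it has not been run on Impala, although CTEs and CROSS JOIN are standard SQL. The `patients_scanned` column is an extra added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hearing_evaluation (patient_id INTEGER, severity_of_hearing_loss TEXT);
CREATE TABLE imaging (patient_id INTEGER, modality TEXT);
INSERT INTO hearing_evaluation VALUES
    (1, 'Mild'), (1, 'Severe'),       -- patient 1 was evaluated twice
    (2, 'Moderate'),
    (3, 'Normal'),
    (4, 'Severe');                    -- patient 4 has no scans at all
INSERT INTO imaging VALUES
    (1, 'CT'), (1, 'MR'), (1, 'XR'),  -- XR is not a qualifying modality
    (2, 'SC'),
    (3, 'CT');
""")

query = """
WITH hl AS (                          -- one row per patient with hearing loss
    SELECT DISTINCT patient_id
    FROM hearing_evaluation
    WHERE severity_of_hearing_loss NOT IN ('Normal', 'insufficient data')
),
scans AS (                            -- qualifying scans per such patient
    SELECT i.patient_id, COUNT(*) AS number_of_scans
    FROM imaging i
    JOIN hl ON hl.patient_id = i.patient_id
    WHERE i.modality IN ('CT', 'MR', 'SC')
    GROUP BY i.patient_id
)
SELECT p.patients_with_hearing_loss, s.patients_scanned, s.avg_number_of_scans
FROM (SELECT COUNT(*) AS patients_with_hearing_loss FROM hl) p
CROSS JOIN (SELECT COUNT(*) AS patients_scanned,
                   AVG(number_of_scans) AS avg_number_of_scans
            FROM scans) s
"""
row = conn.execute(query).fetchone()
print(row)
```

Because `hl` is de-duplicated before the join, patient 1's two evaluations do not double the scan count, and patient 4 is still counted as having hearing loss despite having no imaging rows.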

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS organic_follower_gain,
null AS paid_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through rows in SQL because it's not a procedural language. The operation you define in the query is performed on all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
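The "no loops" point can be illustrated with a windowed running sum that walks backwards from each account's earliest known count. This is a sketch under stated assumptions, not a tested Snowflake query: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module, uses made-up rows, keeps only the columns named in the question's query, and assumes the count for a historical day equals the anchor count minus all gains from that day up to the day before the anchor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follower_counts (account_id INTEGER, date TEXT, total_followers_count INTEGER);
CREATE TABLE follower_gains  (account_id INTEGER, date TEXT,
                              organic_follower_gain INTEGER, paid_follower_gain INTEGER);
INSERT INTO follower_counts VALUES
    (1, '2020-04-04', 1000),
    (2, '2020-04-04', 500), (2, '2020-04-05', 510);
INSERT INTO follower_gains VALUES
    (1, '2020-04-01', 5, 1), (1, '2020-04-02', 3, 0), (1, '2020-04-03', 10, 2),
    (2, '2020-04-02', -1, 0), (2, '2020-04-03', 4, 0);
""")

query = """
WITH anchor AS (   -- earliest known follower count per account
    SELECT fc.account_id, fc.date AS anchor_date,
           fc.total_followers_count AS anchor_count
    FROM follower_counts fc
    WHERE fc.date = (SELECT MIN(m.date) FROM follower_counts m
                     WHERE m.account_id = fc.account_id)
)
SELECT g.account_id, g.date,
       a.anchor_count
         - SUM(g.organic_follower_gain + g.paid_follower_gain)
               OVER (PARTITION BY g.account_id ORDER BY g.date DESC)
         AS total_followers_count   -- anchor minus gains from this day forward
FROM follower_gains g
JOIN anchor a ON a.account_id = g.account_id AND g.date < a.anchor_date
ORDER BY g.account_id, g.date
"""
backfilled = conn.execute(query).fetchall()
for r in backfilled:
    print(r)
```

The back-filled rows could then be combined with follower_counts using UNION ALL, as in the queries above.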
You could do something like this. Instead of using the variable, you can wrap the whole expression in another pair of brackets and write ) AS FollowerGrowth at the end. Replace xx with the account ID.
DECLARE @FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASC )

How do I select all id's that are present in one table but not in another

I am trying to get a list of department IDs that are present in one table (PS_Y_FORM_HIRE) but don't exist in another table (PS_DEPARTMENT_VW).
Here are the basics of what I have, which isn't working:
SELECT h.DEPTID FROM PS_Y_FORM_HIRE h, PS_DEPARTMENT_VW d WHERE NOT EXISTS (
SELECT d1.DEPTID FROM PS_DEPARTMENT_VW d1 WHERE d1.DEPTID = h.DEPTID
and d1.SETID_GL_DEPT = 'IDBYU'
);
I'm trying to form this query in SQL Developer, but it just returns a long list of blanks (after spinning/running the query for a very long time).
In addition, I need this to be effective dated, so that it only grabs the correct effective-dated row, but I was unsure how and where to incorporate this into the query.
EDIT I neglected to mention that only the department table is effective dated. The form hire table is not. I need to get the current effectively dated row from that in this query (to make sure the data is accurate).
Also note that DEPTID isn't a key on PS_Y_FORM_HIRE, but is on PS_DEPARTMENT_VW. (Along with SETID_GL_DEPT and EFFDT).
So again, ideally, I will have a list of all the department ids that appear in PS_Y_FORM_HIRE, but which are not in PS_DEPARTMENT_VW.
SELECT DEPTID
FROM PS_Y_FORM_HIRE
MINUS
SELECT DEPTID
FROM PS_DEPARTMENT_VW
WHERE SETID_GL_DEPT = 'IDBYU';
or
SELECT DEPTID
FROM PS_Y_FORM_HIRE
WHERE DEPTID NOT IN (
SELECT DEPTID
FROM PS_DEPARTMENT_VW
WHERE SETID_GL_DEPT = 'IDBYU'
)
or
SELECT DEPTID
FROM PS_Y_FORM_HIRE h
WHERE NOT EXISTS (
SELECT 1
FROM PS_DEPARTMENT_VW d
WHERE SETID_GL_DEPT = 'IDBYU'
AND d.DEPTID = h.DEPTID
)
This seems like a job for the MINUS operation. Something like
select deptid from ps_y_form_hire where eff_date = <whatever>
minus
select deptid from ps_department_vw <where eff_date = ...>
You didn't provide information to determine what exactly you want done with the effective dates; adapt as needed.
SELECT h.DEPTID
FROM PS_Y_FORM_HIRE h
WHERE h.DEPTID NOT IN (SELECT p.DEPTID
FROM PS_DEPARTMENT_VW p
WHERE p.SETID_GL_DEPT = 'IDBYU')
Your question is a bit unclear about why you want effective-dated rows, as you are not checking the effective status or any other field that may have changed between effective-dated rows. If you want to know all DEPTIDs in PS_Y_FORM_HIRE that either don't exist or are inactive as of the current effective date, then the SQL below should help:
SELECT DEPTID
FROM PS_Y_FORM_HIRE h
WHERE
H.DEPTID NOT IN ( SELECT d.DEPTID
FROM PS_DEPARTMENT_VW d
WHERE d.EFF_STATUS = 'A'
AND d.EFFDT = (SELECT MAX(EFFDT)
FROM PS_DEPARTMENT_VW d2
WHERE d2.SETID_GL_DEPT = d.SETID_GL_DEPT
AND d2.DEPTID = d.DEPTID
AND d2.EFFDT <= CURRENT_DATE)
)
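The effective-dated anti-join can be checked on a handful of rows. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module, lowercase stand-in tables with only the columns mentioned in this thread, made-up rows, and a fixed as-of date passed as a parameter instead of CURRENT_DATE. It uses NOT EXISTS rather than NOT IN, because NOT IN returns no rows at all if the subquery ever yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form_hire (deptid TEXT);   -- stand-in for PS_Y_FORM_HIRE
CREATE TABLE department_vw (            -- stand-in for PS_DEPARTMENT_VW
    setid_gl_dept TEXT, deptid TEXT, effdt TEXT, eff_status TEXT);
INSERT INTO form_hire VALUES ('1000'), ('1000'), ('2000'), ('3000'), ('4000');
INSERT INTO department_vw VALUES
    ('IDBYU', '1000', '2010-01-01', 'A'),   -- currently active
    ('IDBYU', '2000', '2010-01-01', 'A'),
    ('IDBYU', '2000', '2015-01-01', 'I'),   -- later row inactivates 2000
    ('IDBYU', '4000', '2999-01-01', 'A');   -- future-dated, not yet effective
""")

query = """
SELECT DISTINCT h.deptid
FROM form_hire h
WHERE NOT EXISTS (
    SELECT 1
    FROM department_vw d
    WHERE d.deptid = h.deptid
      AND d.setid_gl_dept = 'IDBYU'
      AND d.eff_status = 'A'
      AND d.effdt = (SELECT MAX(d2.effdt)      -- current effective-dated row
                     FROM department_vw d2
                     WHERE d2.setid_gl_dept = d.setid_gl_dept
                       AND d2.deptid = d.deptid
                       AND d2.effdt <= :as_of)
)
ORDER BY h.deptid
"""
missing = [r[0] for r in conn.execute(query, {"as_of": "2020-01-01"})]
print(missing)
```

Department 3000 has no row at all, 2000 has been inactivated by its latest row, and 4000 only has a future-dated row, so all three are reported, while 1000 is not.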

Datediff between two tables

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one row per transaction in each table, but how do I calculate it when an item gets queued more than once?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered; you can then join on both the row number and TransID to stop any cross joining.
Example on SQL Fiddle
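For readers without a SQL Server instance, the row-number pairing can be reproduced with the question's sample rows. This is only a sketch: it assumes SQLite 3.25 or later (for ROW_NUMBER) driven through Python's sqlite3 module, dates rewritten in ISO format, julianday() arithmetic in place of DATEDIFF, and a fixed `as_of` parameter in place of CURRENT_TIMESTAMP so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AddTable    (TransID INTEGER, AddDate TEXT);
CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT);
INSERT INTO AddTable VALUES
    (10, '2012-10-10'), (11, '2012-10-14'), (11, '2012-11-18'),
    (11, '2012-12-25'), (12, '2013-01-01');
INSERT INTO RemoveTable VALUES
    (10, '2013-01-15'), (11, '2012-12-12'), (11, '2013-01-13'), (11, '2013-01-20');
""")

query = """
WITH OrderedAdds AS (
    SELECT TransID, AddDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY AddDate) AS RowNumber
    FROM AddTable
), OrderedRemoves AS (
    SELECT TransID, RemovedDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY RemovedDate) AS RowNumber
    FROM RemoveTable
)
SELECT a.TransID, a.AddDate, r.RemovedDate,
       -- whole days between the add and its paired remove (or the as-of date)
       CAST(julianday(COALESCE(r.RemovedDate, :as_of)) - julianday(a.AddDate) AS INTEGER)
           AS DaysInQueue
FROM OrderedAdds a
LEFT JOIN OrderedRemoves r
       ON r.TransID = a.TransID AND r.RowNumber = a.RowNumber
ORDER BY a.TransID, a.AddDate
"""
rows = conn.execute(query, {"as_of": "2013-02-01"}).fetchall()
for r in rows:
    print(r)
```

Transaction 12 has no matching remove, so its RemovedDate stays NULL and its days are counted up to the as-of date.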
DISCLAIMER: There are probably problems with this, but I hope to send you in one possible direction. Expect to run into issues.
You can try something along the following lines (which might work, depending on your system, version, etc.):
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
FROM remove_from_queue
GROUP BY transId
) q
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check whether UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum within each table, build a union by conveniently leaving the other column as zero, and subtract the sums in the outer query. This only gives a meaningful total when every add has a matching remove.
As for the division at the end of the first SELECT: in MySQL, UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days, or by whatever unit you want.
That said, I would probably solve this with a stored procedure or a client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates 15/1/2013,13/1/2013 do not represent proper date formats )
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
Assuming that a transaction won't be added again until its previous entry has been removed. Also note that the following query returns numberOfDays as zero for records that haven't been removed:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate

SQL query ...multiple max value selection. Help needed

Business World 1256987 monthly 10 2009-10-28
Business World 1256987 monthly 10 2009-09-23
Business World 1256987 monthly 10 2009-08-18
Linux 4 U 456734 monthly 25 2009-12-24
Linux 4 U 456734 monthly 25 2009-11-11
Linux 4 U 456734 monthly 25 2009-10-28
I get this result with the query:
SELECT DISTINCT ljm.journelname, ljm.subscription_id,
ljm.frequency, ljm.publisher, ljm.price, ljd.receipt_date
FROM lib_journals_master ljm,
lib_subscriptionhistory lsh,
lib_journal_details ljd
WHERE ljd.journal_id=ljm.id
ORDER BY ljm.publisher
What I need is the latest date in each journal?
I tried this query:
SELECT DISTINCT ljm.journelname, ljm.subscription_id,
ljm.frequency, ljm.publisher, ljm.price,ljd.receipt_date
FROM lib_journals_master ljm,
lib_subscriptionhistory lsh,
lib_journal_details ljd
WHERE ljd.journal_id=ljm.id
AND ljd.receipt_date = (
SELECT max(ljd.receipt_date)
from lib_journal_details ljd)
But it gives me the maximum from the entire column. My needed result will have two dates (maximum of each magazine), but this query gives me only one?
You could change the WHERE clause to look up the latest date for each journal:
AND ljd.receipt_date = (
SELECT max(subljd.receipt_date)
from lib_journal_details subljd
where subljd.journal_id = ljd.journal_id)
Make sure to give the table in the subquery a different alias from the table in the main query.
You should use Group By if you need the Max from date.
Should look something like this:
SELECT
ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
, MAX(ljd.receipt_date) AS receipt_date
FROM
lib_journals_master ljm
, lib_subscriptionhistory lsh
, lib_journal_details ljd
WHERE
ljd.journal_id=ljm.id
GROUP BY
ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
Something like this should work for you.
SELECT ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
,md.max_receipt_date
FROM lib_journals_master ljm
, ( SELECT journal_id
, max(receipt_date) as max_receipt_date
FROM lib_journal_details
GROUP BY journal_id) md
WHERE ljm.id = md.journal_id
/
Note that I have removed the tables from the FROM clause which don't contribute anything to the query. You may need to put them back if you simplified your scenario for our benefit.
Separate this into two queries. The first one gets each journal name and its latest date:
declare @table table (journalName varchar(255), saleDate datetime)
insert into @table
select journalName, max(saleDate) from JournalTable group by journalName
Then select all the fields you need from your table and join @table to it on journalName.
Sounds like top of group. You can use a CTE in SQL Server:
;WITH journeldata AS
(
SELECT
ljm.journelname
,ljm.subscription_id
,ljm.frequency
,ljm.publisher
,ljm.price
,ljd.receipt_date
,ROW_NUMBER() OVER (PARTITION BY ljm.journelname ORDER BY ljd.receipt_date DESC) AS RowNumber
FROM
lib_journals_master ljm
,lib_subscriptionhistory lsh
,lib_journal_details ljd
WHERE
ljd.journal_id=ljm.id
AND lsh.subscription_id = ljm.subscription_id
)
SELECT
journelname
,subscription_id
,frequency
,publisher
,price
,receipt_date
FROM journeldata
WHERE RowNumber = 1
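The grouped derived table and the ROW_NUMBER() filter can be compared side by side on the sample rows from the question. The sketch below is an illustration under assumptions: it runs on SQLite 3.25 or later through Python's sqlite3 module, the `id` values are invented, and the publisher column and the lib_subscriptionhistory table are left out because the question does not show their contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lib_journals_master (id INTEGER, journelname TEXT, subscription_id INTEGER,
                                  frequency TEXT, price INTEGER);
CREATE TABLE lib_journal_details (journal_id INTEGER, receipt_date TEXT);
INSERT INTO lib_journals_master VALUES
    (1, 'Business World', 1256987, 'monthly', 10),
    (2, 'Linux 4 U',      456734,  'monthly', 25);
INSERT INTO lib_journal_details VALUES
    (1, '2009-10-28'), (1, '2009-09-23'), (1, '2009-08-18'),
    (2, '2009-12-24'), (2, '2009-11-11'), (2, '2009-10-28');
""")

# Variant 1: aggregate the detail table first, then join one row per journal.
grouped = """
SELECT ljm.journelname, ljm.subscription_id, ljm.frequency, ljm.price, md.max_receipt_date
FROM lib_journals_master ljm
JOIN (SELECT journal_id, MAX(receipt_date) AS max_receipt_date
      FROM lib_journal_details
      GROUP BY journal_id) md ON md.journal_id = ljm.id
ORDER BY ljm.journelname
"""

# Variant 2: rank the receipts per journal and keep the newest one.
ranked = """
WITH journeldata AS (
    SELECT ljm.journelname, ljm.subscription_id, ljm.frequency, ljm.price, ljd.receipt_date,
           ROW_NUMBER() OVER (PARTITION BY ljm.id ORDER BY ljd.receipt_date DESC) AS rn
    FROM lib_journals_master ljm
    JOIN lib_journal_details ljd ON ljd.journal_id = ljm.id
)
SELECT journelname, subscription_id, frequency, price, receipt_date
FROM journeldata
WHERE rn = 1
ORDER BY journelname
"""
latest_grouped = conn.execute(grouped).fetchall()
latest_ranked = conn.execute(ranked).fetchall()
print(latest_grouped)
```

Both variants return one row per journal. The ranked form is handy when other columns from the newest detail row are needed, since it keeps the whole row instead of just the maximum date.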