GROUP BY with multiple fields (SQL Server 2000)

I have a table with 3 columns:
History:
ID | xDate | xUser
I would like to return the ID, xDate and xUser for the last xDate of each ID.
This is what I've got:
SELECT
ID, Last = Max(xDate)
FROM
History
GROUP BY
ID
ORDER BY
Last DESC
As soon as I add the xUser to the SELECT, it stops working.
Any help would be greatly appreciated.
ID | xDate | xUser
01 2014-1 Joe
01 2014-2 Bob
01 2014-3 Tom
02 2014-1 Joe
02 2014-2 Bob
02 2014-3 Tom
Desired results:
ID | xDate | xUser
01 2014-3 Tom
02 2014-3 Tom

You need to pre-query the last date for each ID, then join that result back to the History table on BOTH columns to get the user corresponding to that ID and date.
SELECT
H2.ID,
H2.xDate,
H2.xUser
FROM
( select ID, max(xDate) ThisDate
from History
Where xdate > '2014-09-01'
group by ID ) PreCheck
JOIN History H2
on PreCheck.ID = H2.ID
AND PreCheck.ThisDate = H2.xDate
ORDER BY
H2.xDate DESC
The catch is an ID with multiple entries on its latest date: the join returns all of them, unless xDate is really a full date/time that pins down the single most recent entry.
Also, it would be best to have an index on your table on (ID, xDate).
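The pre-query-and-rejoin approach can be checked against the sample data from the question. This is a minimal sketch, assuming an in-memory SQLite database (through Python's sqlite3) as a stand-in for SQL Server 2000. The function name latest_per_id is hypothetical, the date-range filter is left out because the sample data does not need it, and the result is ordered by ID only to make the output deterministic.

```python
import sqlite3

def latest_per_id(rows):
    """Return (ID, xDate, xUser) for the latest xDate of each ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE History (ID TEXT, xDate TEXT, xUser TEXT)")
    conn.executemany("INSERT INTO History VALUES (?, ?, ?)", rows)
    query = """
        SELECT H2.ID, H2.xDate, H2.xUser
        FROM (SELECT ID, MAX(xDate) AS ThisDate   -- last date per ID
              FROM History
              GROUP BY ID) PreCheck
        JOIN History H2                           -- re-join on BOTH columns
          ON PreCheck.ID = H2.ID
         AND PreCheck.ThisDate = H2.xDate
        ORDER BY H2.ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    ("01", "2014-1", "Joe"), ("01", "2014-2", "Bob"), ("01", "2014-3", "Tom"),
    ("02", "2014-1", "Joe"), ("02", "2014-2", "Bob"), ("02", "2014-3", "Tom"),
]
print(latest_per_id(sample))
```

If two rows for the same ID share the latest xDate, both come back, which is the tie caveat mentioned in the answer.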

Related

In PostgreSQL, in a table with multiple rows per unique ID, can you select one row by a condition per unique ID and other values?

Specifically, I have a table of test data in which I am trying to pick one row of data per student and session (i.e. fall, winter, spring). The problem is that there are some students who re-took the test within the same session, and I would like my query to handle these occurrences.
So let's say a student (with studentid = 12345) took a test twice in the fall--Once on September 23rd with a score of 85/100, and then again on October 3rd with a 75/100. I want to know two different queries, one to handle each of the following:
Return the row of their most recent test (i.e. the test from Oct 3rd)
Return the row of their highest scoring test (i.e. the test from Sept 23rd)
Here is an example of a table similar to the one I am working with:
| studentid | session | testdate | score | schoolyear |
-----------------------------------------------------------
| ...
| 42532 | Fall | '2020-10-01' | 68 | '2020-2021'
| 42532 | Winter | '2021-02-02' | 70 | '2020-2021'
| 12345 | Fall | '2020-09-23' | 85 | '2020-2021' <--- (this student has two records for the fall)
| 12345 | Fall | '2020-10-03' | 75 | '2020-2021' <---
| 12345 | Winter | '2021-01-10' | 79 | '2020-2021'
| 83456 | Fall | '2020-09-08' | 90 | '2019-2020'
| 83456 | Winter | '2021-01-18' | 83 | '2019-2020'
| ...
So I want to run a query similar to the following:
SELECT studentid, session, testdate, score
FROM exam_result
WHERE schoolyear = '2020-2021'
-- (something to filter out the multiples)
Where it returns 1 row per student AND session, for all students
Any help would be greatly appreciated!
If you want just one row, use fetch or limit:
select er.*
from exam_result er
where er.studentid = 12345
order by testdate desc
limit 1;
Just adjust the order by for the row you want.
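For all students, you would use distinct on, which keeps the first row of each (studentid, session) group in the order given by the order by:
select distinct on (er.studentid, er.session) er.*
from exam_result er
where . . . -- whatever other conditions you have
order by er.studentid, er.session, er.testdate desc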
Which score do you want: the best, the worst, or something else? This query gives you the best score. I dropped testdate because it isn't unique within a group; including it makes the query a bit more complicated. And which date do you want if a student gets the same score twice?
SELECT studentid, session, MAX(score)
FROM exam_result
WHERE schoolyear = '2020-2021'
GROUP BY studentid, session
If you need the exam date, this kind of query gives you the first date on which a student got their highest score. This isn't tested, but you get the idea: a subquery finds the highest score per student and session, it is joined back to the original table on that score, and MIN picks the earliest matching date.
SELECT
a.studentid,
a.session,
MIN(a.testdate) AS testdate,
a.score
FROM
exam_result a JOIN
(SELECT
studentid, session, MAX(score) AS score
FROM exam_result
WHERE schoolyear = '2020-2021'
GROUP BY studentid, session) b
ON a.studentid = b.studentid AND a.session = b.session AND a.score = b.score
WHERE a.schoolyear = '2020-2021'
GROUP BY a.studentid, a.session, a.score
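Both requested queries (most recent test and highest-scoring test per student and session) can be sketched with the same join-back pattern. The example below is a hedged illustration rather than anyone's posted answer: it assumes SQLite through Python's sqlite3 instead of PostgreSQL, so it avoids distinct on, and the names MOST_RECENT, HIGHEST_SCORE and run are hypothetical.

```python
import sqlite3

ROWS = [
    (42532, "Fall", "2020-10-01", 68, "2020-2021"),
    (42532, "Winter", "2021-02-02", 70, "2020-2021"),
    (12345, "Fall", "2020-09-23", 85, "2020-2021"),
    (12345, "Fall", "2020-10-03", 75, "2020-2021"),
    (12345, "Winter", "2021-01-10", 79, "2020-2021"),
    (83456, "Fall", "2020-09-08", 90, "2019-2020"),
    (83456, "Winter", "2021-01-18", 83, "2019-2020"),
]

# One row per student and session: the latest test date.
MOST_RECENT = """
    SELECT a.studentid, a.session, a.testdate, a.score
    FROM exam_result a
    JOIN (SELECT studentid, session, MAX(testdate) AS testdate
          FROM exam_result
          WHERE schoolyear = :year
          GROUP BY studentid, session) b
      ON a.studentid = b.studentid
     AND a.session = b.session
     AND a.testdate = b.testdate
    WHERE a.schoolyear = :year
    ORDER BY 1, 3
"""

# One row per student and session: the highest score, earliest date on ties.
HIGHEST_SCORE = """
    SELECT a.studentid, a.session, MIN(a.testdate) AS testdate, a.score
    FROM exam_result a
    JOIN (SELECT studentid, session, MAX(score) AS score
          FROM exam_result
          WHERE schoolyear = :year
          GROUP BY studentid, session) b
      ON a.studentid = b.studentid
     AND a.session = b.session
     AND a.score = b.score
    WHERE a.schoolyear = :year
    GROUP BY a.studentid, a.session, a.score
    ORDER BY 1, 3
"""

def run(query, year):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE exam_result (studentid INTEGER, session TEXT, "
        "testdate TEXT, score INTEGER, schoolyear TEXT)"
    )
    conn.executemany("INSERT INTO exam_result VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(query, {"year": year}).fetchall()
    conn.close()
    return result

print(run(MOST_RECENT, "2020-2021"))
print(run(HIGHEST_SCORE, "2020-2021"))
```

For student 12345 in the fall, the first query picks the October 3rd row and the second picks the September 23rd row, matching the two cases in the question.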

Using COUNT and GROUP BY in Spark SQL

I'm trying to get pretty basic output that pulls unique NDC Codes for medications and counts the number of unique patients that take each drug. My dataset basically looks like this:
patient_id | drug_ndc
---------------------
01 | 250
02 | 725
03 | 1075
04 | 1075
05 | 250
06 | 250
I want the output to look something like this:
NDC | Patients
--------------
250 | 3
1075 | 2
725 | 1
I tried using some queries like this:
select distinct drug_ndc as NDC, count patient_id as Patients
from table 1
group by 1
order by 1
But I keep getting errors. I've tried with and without using an alias, but to no avail.
The correct syntax should be:
select drug_ndc as NDC, count(*) as Patients
from table 1
group by drug_ndc
order by 1;
SELECT DISTINCT is almost never appropriate with GROUP BY. And you can use COUNT(*) unless the patient id can be NULL.
To get the number of unique patients, you should do:
select drug_ndc as NDC, count(distinct patient_id) as Patients
from table 1
group by drug_ndc;
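The count(distinct ...) version can be tried on the sample data from the question. This is a small sketch that assumes SQLite (through Python's sqlite3) rather than Spark SQL; the table name prescriptions and the function name patients_per_drug are hypothetical, and the ORDER BY is added only to match the desired output order.

```python
import sqlite3

def patients_per_drug(rows):
    """Count unique patients per NDC code, most-used drug first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prescriptions (patient_id TEXT, drug_ndc INTEGER)")
    conn.executemany("INSERT INTO prescriptions VALUES (?, ?)", rows)
    query = """
        SELECT drug_ndc AS NDC, COUNT(DISTINCT patient_id) AS Patients
        FROM prescriptions
        GROUP BY drug_ndc
        ORDER BY Patients DESC, NDC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    ("01", 250), ("02", 725), ("03", 1075),
    ("04", 1075), ("05", 250), ("06", 250),
]
print(patients_per_drug(sample))
```

A repeated (patient_id, drug_ndc) row does not change the counts here, which is the difference from count(*).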

get data for a record in the past

I have 2 tables, table1 and table2.
What I want to achieve:
I want to return the current month's info about an employee
AND
the info from table 2 as of when they created their account. They might have changed position since then, so I want to capture the point-in-time info and the current info on the same row.
College program table
Table 1
Name Acct_Cr_DT
a1 12/1/2018
b1 1/4/2018
c1 5/6/2018
Last Month (12/29) and current Month Data (1/29/2019). Assuming data refreshes on last day of every fiscal month.
Table 2
Name position gender Emp status FISCAL_MONTH_END_DATE
a1 Analyst M hourly 12/29/2018
b1 Intern F hourly 12/29/2018
c1 Director F hourly 12/29/2018
a1 Manager M hourly 1/29/2019
b1 Analyst F hourly 1/29/2019
c1 Director F hourly 1/29/2019
a1 was an analyst at the time of account creation.
b1 was an intern at the time of account creation.
Sample output: Need the info at the time of account creation before these got a promotion.
Name Acct_Cr_DT position gender Emp status FISCAL_MONTH_END_DATE
a1 12/1/2018 Analyst M hourly 1/29/2019
b1 1/4/2018 Intern F hourly 1/29/2019
c1 5/6/2018 Director F hourly 1/29/2019
If you want to return the current month info for table 1:
SELECT *
FROM YOUR_TABLE
WHERE MONTH(COLUMN_NAME) = MONTH(GETDATE())
AND YEAR(COLUMN_NAME) = YEAR(GETDATE())
However, judging from your explanation, you need a join to capture the point-in-time info and the current info on the same row. So you probably need this:
SELECT *
FROM TABLE_1 a inner join TABLE_2 b on a.id=b.id
WHERE MONTH(COLUMN_NAME) = MONTH(GETDATE())
AND YEAR(COLUMN_NAME) = YEAR(GETDATE())
Please provide sample output for further explanation.
Here you can try the following query.
SELECT tb1.name, Acct_Cr_DT, position, gender, [Emp status], FISCAL_MONTH_END_DATE
FROM Table1 tb1
INNER JOIN Table2 tb2 ON tb1.name=tb2.name
WHERE DATEPART(YEAR,Acct_Cr_DT) = DATEPART(YEAR,FISCAL_MONTH_END_DATE) AND DATEPART(MONTH,FISCAL_MONTH_END_DATE)='12' AND DATEPART(DAY,FISCAL_MONTH_END_DATE)='29'
This gets the record from the year the account was created, as of the last refresh of that year (12/29).
It should be possible to solve this using the wonderful DB2 OLAP functions (aka window functions).
The following code uses a subquery to pull out the first position of each employee and to rank the records by fiscal end of month. The outer query then joins the results with table1 to get the account creation date, and keeps only the records for the most recent fiscal end of month.
SELECT
tx.name, tx.first_position, tx.gender, tx.emp_status, tx.fiscal_month_end_date, t1.acct_cr_dt
FROM (
SELECT
t2.*,
FIRST_VALUE(t2.position) OVER(PARTITION BY t2.name ORDER BY fiscal_month_end_date) first_position,
DENSE_RANK() OVER (ORDER BY t2.fiscal_month_end_date desc) rnk
FROM table2 t2
) tx INNER JOIN table1 t1 ON t1.name = tx.name
WHERE tx.rnk = 1;
In this DB Fiddle demo, the query yields :
| name | first_position | gender | emp_status | fiscal_month_end_date | acct_cr_dt |
| ---- | -------------- | ------ | ---------- | --------------------- | ---------- |
| a1 | Analyst | M | hourly | 2019-01-29 | 2018-12-01 |
| b1 | Intern | F | hourly | 2019-01-29 | 2018-01-04 |
| c1 | Director | F | hourly | 2019-01-29 | 2018-05-06 |
NB: this is a MySQL 8.0 fiddle, since there is no DB2 fiddle available in the wild...
I found the answer to this on my own.
Basically you'll have to join on Name and make sure that "Acct_Cr_DT" is <= "FISCAL_MONTH_END_DATE" to get the point-in-time info for that employee.
After that, create a sub-query with a LEFT JOIN on Table 2 and extract the current "FISCAL_MONTH_END_DATE" to return the current month's data.
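One possible reading of that description is sketched below. It is an assumption-laden illustration, not the poster's actual query: it uses SQLite through Python's sqlite3, stores dates as ISO strings, renames "Emp status" to emp_status, and takes "point in time" to mean the earliest fiscal month end on or after the account creation date. The function name point_in_time_report is hypothetical.

```python
import sqlite3

TABLE1 = [("a1", "2018-12-01"), ("b1", "2018-01-04"), ("c1", "2018-05-06")]
TABLE2 = [
    ("a1", "Analyst", "M", "hourly", "2018-12-29"),
    ("b1", "Intern", "F", "hourly", "2018-12-29"),
    ("c1", "Director", "F", "hourly", "2018-12-29"),
    ("a1", "Manager", "M", "hourly", "2019-01-29"),
    ("b1", "Analyst", "F", "hourly", "2019-01-29"),
    ("c1", "Director", "F", "hourly", "2019-01-29"),
]

QUERY = """
    SELECT t1.name, t1.acct_cr_dt, pit.position,
           cur.gender, cur.emp_status, cur.fiscal_month_end_date
    FROM table1 t1
    JOIN table2 pit                       -- point-in-time row
      ON pit.name = t1.name
    LEFT JOIN table2 cur                  -- current-month row
      ON cur.name = t1.name
     AND cur.fiscal_month_end_date =
         (SELECT MAX(fiscal_month_end_date) FROM table2)
    WHERE pit.fiscal_month_end_date =
          (SELECT MIN(x.fiscal_month_end_date)
           FROM table2 x
           WHERE x.name = t1.name
             AND x.fiscal_month_end_date >= t1.acct_cr_dt)
    ORDER BY t1.name
"""

def point_in_time_report():
    """Position at account creation plus current-month details, one row each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, acct_cr_dt TEXT)")
    conn.execute(
        "CREATE TABLE table2 (name TEXT, position TEXT, gender TEXT, "
        "emp_status TEXT, fiscal_month_end_date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?, ?)", TABLE2)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in point_in_time_report():
    print(row)
```

On the sample data this keeps a1 as Analyst and b1 as Intern while reporting the current fiscal month end, which lines up with the sample output in the question.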

sql, getting parents and child from one table

I'm trying to write a simple SQL statement to select a parent and dependents from one table based on the parent's hire date. Because the hire date field in the dependents' rows is null, I'm only getting the parents. Can someone help?
PRIM KEY RECORD LAST FIRST HIRE DATE
12345 1 JONES MARY 1/1/2017
12345 2 JONES TIM
6789 1 SMITH CAROL 5/12/2014
23456 1 WHITAKE REGINA 5/14/2017
23456 2 WHITAKE JOE
The parent has a row in the table, and so does each dependent. The parent's record is 1 and all dependents have a 2. They share a primary key (the parent's SSN). I want to select all parents who were hired within a specific date range, along with their dependents' rows. The dependent hire date column is null, so when I write the following, I only get the parent rows:
SELECT PRIMARY_KEY_VALUE, RECORD_ID, LAST_NAME, FIRST_NAME, HIRE_DATE
FROM CIGNA_ELIGIBILITY
WHERE(HIRE_DATE BETWEEN '20171101' AND '20171131');
If I understand your problem correctly, you want to return the records whose hire date falls in the given range together with all their dependents (given that parent and children share the same prim_key). One way is to use IN:
select *
from table1 t1
where t1.prim_key in
(
select t2.prim_key
from table1 t2
where t2.hire_date between '2017-01-01' AND '2017-01-30'
);
The sub-query selects the PRIM_KEY of every row hired in the specified date range, and the main query then selects all records that share one of those keys.
Result:
+---+----------+--------+-------+-------+---------------------+
| | prim_key | record | last | first | hire_date |
+---+----------+--------+-------+-------+---------------------+
| 1 | 12345 | 1 | JONES | MARY | 01.01.2017 00:00:00 |
| 2 | 12345 | 2 | JONES | TIM | NULL |
+---+----------+--------+-------+-------+---------------------+
Update:
Another option could be to use exists:
select *
from table1 t1
where exists
(
select 1
from table1 t2
where t1.prim_key = t2.prim_key
and t2.hire_date between '2017-01-01' AND '2017-01-30'
)
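The IN and EXISTS variants can be compared side by side on the rows from the question. This sketch assumes SQLite through Python's sqlite3, ISO-formatted hire dates, and the column names from the question's own query; IN_QUERY, EXISTS_QUERY and family_rows are hypothetical names.

```python
import sqlite3

ROWS = [
    (12345, 1, "JONES", "MARY", "2017-01-01"),
    (12345, 2, "JONES", "TIM", None),
    (6789, 1, "SMITH", "CAROL", "2014-05-12"),
    (23456, 1, "WHITAKE", "REGINA", "2017-05-14"),
    (23456, 2, "WHITAKE", "JOE", None),
]

# Keys of parents hired in the range, then every row sharing those keys.
IN_QUERY = """
    SELECT PRIMARY_KEY_VALUE, RECORD_ID, LAST_NAME, FIRST_NAME, HIRE_DATE
    FROM CIGNA_ELIGIBILITY t1
    WHERE t1.PRIMARY_KEY_VALUE IN (
        SELECT t2.PRIMARY_KEY_VALUE
        FROM CIGNA_ELIGIBILITY t2
        WHERE t2.HIRE_DATE BETWEEN ? AND ?)
    ORDER BY PRIMARY_KEY_VALUE, RECORD_ID
"""

# Same result, written as a correlated EXISTS.
EXISTS_QUERY = """
    SELECT PRIMARY_KEY_VALUE, RECORD_ID, LAST_NAME, FIRST_NAME, HIRE_DATE
    FROM CIGNA_ELIGIBILITY t1
    WHERE EXISTS (
        SELECT 1
        FROM CIGNA_ELIGIBILITY t2
        WHERE t2.PRIMARY_KEY_VALUE = t1.PRIMARY_KEY_VALUE
          AND t2.HIRE_DATE BETWEEN ? AND ?)
    ORDER BY PRIMARY_KEY_VALUE, RECORD_ID
"""

def family_rows(query, start, end):
    """Return parent and dependent rows for parents hired in [start, end]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CIGNA_ELIGIBILITY (PRIMARY_KEY_VALUE INTEGER, "
        "RECORD_ID INTEGER, LAST_NAME TEXT, FIRST_NAME TEXT, HIRE_DATE TEXT)"
    )
    conn.executemany("INSERT INTO CIGNA_ELIGIBILITY VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(query, (start, end)).fetchall()
    conn.close()
    return result

print(family_rows(IN_QUERY, "2017-01-01", "2017-01-31"))
print(family_rows(EXISTS_QUERY, "2017-05-01", "2017-05-31"))
```

The dependent rows come back even though their HIRE_DATE is NULL, because the date filter is applied only inside the sub-query.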

Select Earliest Date and Time from List of Distinct User Sessions

I have a table of user access sessions which records website visitor activity:
accessid, userid, date, time, url
I'm trying to retrieve all distinct sessions for userid 1234, as well as the earliest date and time for each of those distinct sessions.
SELECT
DISTINCT accessid,
date,
time
FROM
accesslog
WHERE userid = '1234'
GROUP BY accessid
This gives me the date and time of a random row within each distinct accessid. I've read a number of posts recommending the use of min() and max(), so I tried:
SELECT DISTINCT accessid, MIN(DATE) AS date, MIN(TIME) AS time FROM accesslog WHERE userid = '1234' GROUP BY accessid ORDER BY date DESC, time DESC
... and even...
SELECT DISTINCT accessid, MIN(CONCAT(DATE, ' ', TIME)) AS datetime FROM accesslog WHERE userid = '1234' GROUP BY accessid ORDER BY date DESC, time DESC
... but I never get the correct result of the earliest date and time.
What is the trick to ordering this kind of query?
EDIT -
Something weird is happening....
The code posted below by Bill Karwin correctly retrieves the earliest date and time for sessions that started in 2009-09. But, for sessions that began on some day in 2009-08, the time and date for the first hit occurring in the current month is what is returned. In other words, the query does not appear to be spanning months!
Example data set:
accessid | userid | date | time
1 | 1234 | 2009-08-15 | 01:01:01
1 | 1234 | 2009-09-01 | 12:01:01
1 | 1234 | 2009-09-15 | 13:01:01
2 | 1234 | 2009-09-01 | 14:01:01
2 | 1234 | 2009-09-15 | 15:01:01
At least on my actual data table, the query posted below finds the following earliest date and time for each of the two accessids:
accessid | userid | date | time
1 | 1234 | 2009-09-01 | 12:01:01
2 | 1234 | 2009-09-01 | 14:01:01
... and I would guess that the only reason the result for accessid 2 appears correct is because it has no hits in a previous month.
Am I going crazy?
EDIT 2 -
The answer is yes, I am going crazy. The query works on the above sample data when placed in a table of duplicate structure.
Here is the (truncated) original data. I included the very first hit, another hit in the same month, the first hit of the next month, and then the last hit of the month. The original data set has many more hits in between these points, for a total of 462 rows.
accessid | date | time
cbb82c08d3103e721a1cf0c3f765a842 | 2009-08-18 | 04:01:42
cbb82c08d3103e721a1cf0c3f765a842 | 2009-08-23 | 23:18:52
cbb82c08d3103e721a1cf0c3f765a842 | 2009-09-17 | 05:12:16
cbb82c08d3103e721a1cf0c3f765a842 | 2009-09-18 | 06:29:59
... the query returns the 2009-09-17 value as the earliest value when the original table is queried. But, when I copy the ........ oh, balls.
It's because the hits from 2009-08% have an empty userid field.
This is a variation of the "greatest-n-per-group" problem that comes up on StackOverflow several times per week.
SELECT
a1.accessid,
a1.date,
a1.time
FROM
accesslog a1
LEFT OUTER JOIN
accesslog a2
ON (a1.accessid = a2.accessid AND a1.userid = a2.userid
AND (a1.date > a2.date OR a1.date = a2.date AND a1.time > a2.time))
WHERE a1.userid = '1234'
AND a2.accessid IS NULL;
The way this works is that we try to find a row (a2) that has the same accessid and userid, and an earlier date or time than the row a1. When we can't find an earlier row, then a1 must be the earliest row.
Re your comment, I just tried it with the sample data you provided. Here's what I get:
+----------+------------+----------+
| accessid | date | time |
+----------+------------+----------+
| 1 | 2009-08-15 | 01:01:01 |
| 2 | 2009-09-01 | 14:01:01 |
+----------+------------+----------+
I'm using MySQL 5.0.75 on Mac OS X.
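The same check can be reproduced without a MySQL server. The sketch below assumes SQLite through Python's sqlite3 as a stand-in and runs the self-join against the sample rows from the question; the function name earliest_hits and the added ORDER BY are assumptions made for a deterministic result.

```python
import sqlite3

def earliest_hits(rows, userid):
    """Return (accessid, date, time) of the earliest hit per session."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accesslog (accessid INTEGER, userid TEXT, "
        "date TEXT, time TEXT)"
    )
    conn.executemany("INSERT INTO accesslog VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT a1.accessid, a1.date, a1.time
        FROM accesslog a1
        LEFT OUTER JOIN accesslog a2
          ON (a1.accessid = a2.accessid AND a1.userid = a2.userid
              AND (a1.date > a2.date
                   OR (a1.date = a2.date AND a1.time > a2.time)))
        WHERE a1.userid = ?
          AND a2.accessid IS NULL      -- no earlier row exists for a1
        ORDER BY a1.accessid
    """
    result = conn.execute(query, (userid,)).fetchall()
    conn.close()
    return result

sample = [
    (1, "1234", "2009-08-15", "01:01:01"),
    (1, "1234", "2009-09-01", "12:01:01"),
    (1, "1234", "2009-09-15", "13:01:01"),
    (2, "1234", "2009-09-01", "14:01:01"),
    (2, "1234", "2009-09-15", "15:01:01"),
]
print(earliest_hits(sample, "1234"))
```

Rows whose userid is empty never match the userid filter, so they are ignored. That is the situation the question's second edit ran into with the 2009-08 hits.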
Try this
SELECT
accessid,
MIN(TIMESTAMP(date, time)) AS first_hit
FROM
accesslog
WHERE userid = '1234'
GROUP BY accessid
It will return each distinct accessid with the earliest date and time for userid = '1234'.