SQL Get n last unique entries by date

SQL Get n last unique entries by date - sql

I have an access database that I'm well aware is quite poorly designed, unfortunately it is what I must use. It looks a little like the following:
(Row# is not a column in the database, it's just there to help me describe what I'm after)
Row# ID Date Misc
1 001 01/8/2013 A
2 001 01/8/2013 B
3 001 01/8/2013 C
4 002 02/8/2013 D
5 002 02/8/2013 A
6 003 04/8/2013 B
7 003 04/8/2013 D
8 003 04/8/2013 D
What I'm trying to do is obtain all information entered for the last n (by date) 'entries' where an 'entry' is all rows with a unique ID.
So if I want the last 1 entry I will get rows 6, 7 and 8. The last two entries will get me rows 4-8 etc.
I've tried to get the SN's needed in a subselect and then select all entries where those SN's appear, but I couldn't get it to work. Any help appreciated.
Thanks.

The proper Access syntax:
select *
from t
where ID in (select top 10 ID
from t
group by ID
order by max([date]) desc
)

I think this will work:
select *
from table
where Date in (
select distinct(Date) as unique_date from table order by unique_date DESC limit <num>
)
The idea is to use the subselect with a limit to only identify dates you care about.
EDIT: Some databases do not allow a limit in a subquery (I'm looking at you, mysql). In that case, you'll have to make a temporary table out of the subquery then select * from it.

Related

How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This is Joining a few tables (obviously) in order to get a number of records. The problem is that if someone is duplicated on the table with a NULL in one of the date fields, and a date in another field, it pulls both the NULL and the DATE, or pulls multiple NULLS it might pull multiple dates but those are not present right at the moment.
I need the Nulls, they are actual data in this particular case, but if someone has a date and a NULL I need to pull only the newest record, I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
And it returned the same issue, Duplicated EmployeeInformation.ID with different DecShotDate values.
Currently it returns:
ID
Active
DecShotDate
etc. x a bunch
1
-1
date date
whatever goes
2
-1
in these
2
-1
date date
columns
These are being used in a report, that is to determine the total number of employees who fit the criteria of the report. The NULLs in DecShotDate are needed as they show people who did not refuse to get a flu vaccine in the current year, while the dates are people who did refuse.
Now I have come up with one simple solution, I could add a column to the CompletedTrainings Table that contains a date or other value, and add that to the HAVING statement. This might be the right solution as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled, or should I be able to do this by pulling recordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
ID
Last
First
empID
Active
Termdate
DoH
Title
PT/FT/PD
PI
1
Doe
Jane
982
-1
date
Sr
PD
X
2
Roe
John
278
0
date
date
Jr
PD
X
3
Moe
Larry
1232
-1
date
Sr
FT
X
4
Zoe
Debbie
1424
-1
date
Sr
PT
X
DeclinationReasons Table:
ReasonID
Reason
1
Allergy
2
Already got it
3
Illness
CompletedTrainings Table:
RecordID
Employee
Training
...
DecShotdate
DecShotLocation
DecShotReason
DecExp
1
1
4
date
location
2
text
2
1
4
3
2
4
4
3
4
date
location
3
text
5
3
4
date
location
1
text
6
4
4

After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed. Which is to get the information by latest entry into the database.
Previously I had tried to use MAX(), DISTINCT(), etc. but always had a problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on. Until I have all the required data for my report.
I write this in hopes someone else finds it useful. Or even better if someone tells me why this is wrong, so as to improve my own skills.

pagination and filtering on a very large table in postgresql (keyset pagination?)

I have a scientific database with currently 4,300,000 records. It's a scientific database, and an API is feeding it. In june 2020, I will probably have about 100,000,000 records.
This is de layout of the table 'output':
ID | sensor_ID | speed | velocity | direction
-----------------------------------------------------
1 | 1 | 10 | 1 | up
2 | 2 | 12 | 2 | up
3 | 2 | 11.5 | 1.5 | down
4 | 1 | 9.5 | 0.8 | down
5 | 3 | 11 | 0.75 | up
...
BTW, this is dummy data. But output is a table with 5 columns: ID, sensor_ID, speed, velocity and direction.
What I want to achieve is a decent pagination and filter method. I want to create a website (in nodejs) where this +4,000,000 records (for now) will be displayed, 10,000 records per page. I also want to be able to filter on sensor_ID, speed, velocity or direction.
For now, I have this query for selecting specific rows:
SELECT * FROM output ORDER BY ID DESC OFFSET 0 LIMIT 10000 // first 10,000 rows
SELECT * FROM output ORDER BY ID DESC OFFSET 10000 LIMIT 10000 // next 10,000 rows
...
I'm searching for some information/tips about creating a decent pagination method. For now, it's still quiet fast the way I do it, but I think it will be a lot slower when we hit +50,000,000 records.
First of all, I found this page: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/. I'm interested in the keyset pagination. But to be honest, I have no clue how to start.
What I think I must do:
Create an index on the ID-field:
CREATE UNIQUE INDEX index_id ON output USING btree (ID)
I also found this page: https://leopard.in.ua/2014/10/11/postgresql-paginattion. When you scroll down to "Improvement #2: The Seek Method", you can see that they dropped the OFFSET-clause, and are using a WHERE-clause. I also see that they are using the last insert ID in their query:
SELECT * FROM output WHERE ID < <last_insert_id_here> ORDER BY ID DESC LIMIT 10000
I do not fully understand this. For the first page, I need the very last insert ID. Then I fetch the 10,000 newest records. But after that, to get the second page, I don't need the very last insert ID, I need the 10,000th last insert ID (I guess).
Can someone give me a good explanation about pagination and filtering in a fast way.
The stuff I'm using:
- postgresql
- pgadmin (for database management)
- node.js (latest version)
Thanks everyone! And have a nice 2020!
EDIT 1: I have no clue, but could massiveJS (https://massivejs.org/) be something good to use? And should I use it on ALL queries, or only on the pagination queries?
EDIT 2: I THINK I got it figured out a little bit (correct me if I'm wrong).
Let's say I have 100,000 records:
1) Get the last inserted ID
2) Use this last inserted ID to fetch the last 10,000 records
SELECT * FROM output WHERE ID < 100000 ORDER BY ID DESC LIMIT 10000 // last insert ID is here 100,000 because I have 100,000 records
3) Show the 10,000 records but also save the insert ID of the 10,000the record to use in the next query
4) Get the next 10,000 records with the new last insert id
SELECT * FROM output WHERE ID < 90000 ORDER BY ID DESC LIMIT 10000 // 90,000 is the very last insert id - 10,000
5) ...
Is this correct?

Here's how I handle this. For the first page I fetch, I use
SELECT id, col, col, col
FROM output
ORDER BY id DESC
LIMIT 10000
Then, in my client program (node.js) I capture the id value from the last row of the result set. When I need the next page, I do this.
SELECT id, col, col, col
FROM output
WHERE id < my_captured_id_value
ORDER BY id DESC
This exploits the index. And it works correctly even if you have deleted some rows from the table.
By the way, you probably want a descending index if your first pagination page has the largest ids. CREATE UNIQUE INDEX index_id ON output USING btree (ID DESC).
Pro tip SELECT * is harmful to performance on large databases. Always list the columns you actually need.

In keyset pagination you should set WHERE clause on what you want to set in ORDER BY clause , and for DESC you should use < and vice versa.
For the first page you can use something like this:
SELECT Col1, Col2, Col3
FROM db.tbl
WHERE Col3 LIKE '%search_term%'
ORDER BY Col1 DESC , Col2 ASC
LIMIT 10000
and for next page , you should send value of Col1 and Col2 from last row of the result to the query like this:
SELECT Col1, Col2, Col3
FROM db.tbl
WHERE Col3 LIKE '%search_term%'
AND ( Col1 < Col1_last_row_value AND Col2 > Col2_last_row_value)
ORDER BY Col1 DESC , Col2 ASC
LIMIT 10000
and in the server or client side you should check if your query bring back any result or not , if not that means you are done and the loading-icon of "infinite scroll" has to be hidden

SELECT query for selecting all field and counting the contents of the fields

Hello can anyone help me with a query where you select all columns and then count the content of the columns. Not sure how this could be achieved can someone shed some light on that.
For example I have a table that returns number of tickets but those tickets can be more than one so I would like to select all the tickets depending on a variable and then count the contents of it like this:
Booking ID | Tickets | variable
1 2
2 1
3 5
How would I go about to count the contents of for example booking 1 and 3 so that the result will be 7
Thank you.

SELECT SUM(Tickets) FROM your_table WHERE Booking_ID IN (1,3)

Select distinct values for a particular column choosing arbitrarily from duplicates

I have health data relating to deaths. Individual should die once maximum. In the database they sometimes don't; probably because causes of death were changed but the original entry was not deleted. I don't really understand how this was allowed to happen, but it has. So, as a made up example, I have:
Row_number | Individual_ID | Cause_of_death | Date_of_death
------------+---------------+-----------------------+---------------
1 | 1 | Stroke | 3 march 2008
2 | 2 | Myocardial infarction | 1 jan 2009
3 | 2 | Pulmonary Embolus | 1 jan 2009
I want each individual to have only one cause of death.
In the example, I want a query that returns row 1 and either row 2 or row 3 (not both). I have to make an arbitrary choice between rows 2 and 3 because there is no timestamp in any of the fields that can be used to determine which is the revision; it's not ideal but is unavoidable.
I can't make the SQL work to do this. I've tried inner joining distinct Individual_ID to the other fields, but this still gives all the rows. I've tried adding a 'having count(Individual_ID) = 1' clause with it. This leaves out people with more than one cause of death completely. Suggestions on the internet seem to be based on using a timestamped field to choose the most recent, but I don't have that.
IBM DB2. Windows XP. Any thoughts gratefully received.

Have you tried using MIN (or MAX) against the cause of death. (and the date of death, if they died on two different dates)
SELECT IndividualID, MIN(Cause_Of_Death), MIN (Date_Of_Death)
from deaths
GROUP BY IndividualID

I don't know DB2 so I'll answer in general. There are two main approaches:
select *
from T
join (
select keys, min(ID) as MinID
from T
group by keys
) on T.ID = MinID
And
select *, row_number() over (partition by keys) as r
from T
where r = 1
Both return all rows, no matter if duplicate or not. But they returns only one duplicate per "key".
Notice, that both statements are pseudo-SQL.

The row_number() approach is probably preferable from a performance standpoint. Here is usr's example, in DB2 syntax:
select * from (
select T.*, row_number() over (partition by Individual_ID) as r
from T
)
where r=1;

How to group by a column

Hi I know how to use the group by clause for sql. I am not sure how to explain this so Ill draw some charts. Here is my original data:
Name Location
----------------------
user1 1
user1 9
user1 3
user2 1
user2 10
user3 97
Here is the output I need
Name Location
----------------------
user1 1
9
3
user2 1
10
user3 97
Is this even possible?

The normal method for this is to handle it in the presentation layer, not the database layer.
Reasons:
The Name field is a property of that data row
If you leave the Name out, how do you know what Location goes with which name?
You are implicitly relying on the order of the data, which in SQL is a very bad practice (since there is no inherent ordering to the returned data)
Any solution will need to involve a cursor or a loop, which is not what SQL is optimized for - it likes working in SETS not on individual rows

Hope this helps
SELECT A.FINAL_NAME, A.LOCATION
FROM (SELECT DISTINCT DECODE((LAG(YT.NAME, 1) OVER(ORDER BY YT.NAME)),
YT.NAME,
NULL,
YT.NAME) AS FINAL_NAME,
YT.NAME,
YT.LOCATION
FROM YOUR_TABLE_7 YT) A
As Jirka correctly pointed out, I was using the Outer select, distinct and raw Name unnecessarily. My mistake was that as I used DISTINCT , I got the resulted sorted like
1 1
2 user2 1
3 user3 97
4 user1 1
5 3
6 9
7 10
I wanted to avoid output like this.
Hence I added the raw id and outer select
However , removing the DISTINCT solves the problem.
Hence only this much is enough
SELECT DECODE((LAG(YT.NAME, 1) OVER(ORDER BY YT.NAME)),
YT.NAME,
NULL,
YT.NAME) AS FINAL_NAME,
YT.LOCATION
FROM SO_BUFFER_TABLE_7 YT
Thanks Jirka

If you're using straight SQL*Plus to make your report (don't laugh, you can do some pretty cool stuff with it), you can do this with the BREAK command:
SQL> break on name
SQL> WITH q AS (
SELECT 'user1' NAME, 1 LOCATION FROM dual
UNION ALL
SELECT 'user1', 9 FROM dual
UNION ALL
SELECT 'user1', 3 FROM dual
UNION ALL
SELECT 'user2', 1 FROM dual
UNION ALL
SELECT 'user2', 10 FROM dual
UNION ALL
SELECT 'user3', 97 FROM dual
)
SELECT NAME,LOCATION
FROM q
ORDER BY name;
NAME LOCATION
----- ----------
user1 1
9
3
user2 1
10
user3 97
6 rows selected.
SQL>

I cannot but agree with the other commenters that this kind of problem does not look like it should ever be solved using SQL, but let us face it anyway.
SELECT
CASE main.name WHERE preceding_id IS NULL THEN main.name ELSE null END,
main.location
FROM mytable main LEFT JOIN mytable preceding
ON main.name = preceding.name AND MIN(preceding.id) < main.id
GROUP BY main.id, main.name, main.location, preceding.name
ORDER BY main.id
The GROUP BY clause is not responsible for the grouping job, at least not directly. In the first approximation, an outer join to the same table (LEFT JOIN below) can be used to determine on which row a particular value occurs for the first time. This is what we are after. This assumes that there are some unique id values that make it possible to arbitrarily order all the records. (The ORDER BY clause does NOT do this; it orders the output, not the input of the whole computation, but it is still necessary to make sure that the output is presented correctly, because the remaining SQL does not imply any particular order of processing.)
As you can see, there is still a GROUP BY clause in the SQL, but with a perhaps unexpected purpose. Its job is to "undo" a side effect of the LEFT JOIN, which is duplication of all main records that have many "preceding" ( = successfully joined) records.
This is quite normal with GROUP BY. The typical effect of a GROUP BY clause is a reduction of the number of records; and impossibility to query or test columns NOT listed in the GROUP BY clause, except through aggregate functions like COUNT, MIN, MAX, or SUM. This is because these columns really represent "groups of values" due to the GROUP BY, not just specific values.

If you are using SQL*Plus, use the BREAK function. In this case, break on NAME.
If you are using another reporting tool, you may be able to compare the "name" field to the previous record and suppress printing when they are equal.

If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
Relying on implicit GROUP BY sorting in MySQL 5.6 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause. GROUP BY sorting is a MySQL extension that may change in a future release; for example, to make it possible for the optimizer to order groupings in whatever manner it deems most efficient and to avoid the sorting overhead.
For full information - http://academy.comingweek.com/sql-groupby-clause/

SQL GROUP BY STATEMENT
SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups.
Syntax:
1. SELECT column_nm, aggregate_function(column_nm) FROM table_nm WHERE column_nm operator value GROUP BY column_nm;
Example :
To understand the GROUP BY clauserefer the sample database.Below table showing fields from “order” table:
1. |EMPORD_ID|employee1ID|customerID|shippers_ID|
Below table showing fields from “shipper” table:
1. | shippers_ID| shippers_Name |
Below table showing fields from “table_emp1” table:
1. | employee1ID| first1_nm | last1_nm |
Example :
To find the number of orders sent by each shipper.
1. SELECT shipper.shippers_Name, COUNT (orders.EMPORD_ID) AS No_of_orders FROM orders LEFT JOIN shipper ON orders.shippers_ID = shipper.shippers_ID GROUP BY shippers_Name;
1. | shippers_Name | No_of_orders |
Example :
To use GROUP BY statement on more than one column.
1. SELECT shipper.shippers_Name, table_emp1.last1_nm, COUNT (orders.EMPORD_ID) AS No_of_orders FROM ((orders INNER JOIN shipper ON orders.shippers_ID=shipper.shippers_ID) INNER JOIN table_emp1 ON orders.employee1ID = table_emp1.employee1ID)
2. GROUP BY shippers_Name,last1_nm;
| shippers_Name | last1_nm |No_of_orders |
for more clarification refer my link
http://academy.comingweek.com/sql-groupby-clause/

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas