MySQL/MS SQL: latest record for each of multiple IDs

I'm no SQL expert, but I came across this problem:
I have to retrieve data from a Microsoft SQL Server 2008 database. It holds measurement data from different probes that don't record at any fixed interval: one probe may write data to the database once a week, another once a second. Probes are identified by IDs (which are not unique in the table), and the point is to retrieve only the last record for each ID (probe). The table looks like this (last 5 rows, ordered by SampleDateTime DESC):
TagID  SampleDateTime      SampleValue  QualityID
13     634720670797944946  112          192
23     634720670797944946  38.1         192
17     634720670797944946  107.5        192
14     634720670748012090  110.6        192
19     634720670748012090  99.7         192
I CAN'T modify the server or even its settings; I am only authorized to run queries. I need to retrieve the requested data at regular intervals (say, once every minute or so). There are over 100 probes (with different IDs), of which about 40 need to be read, so I'm guessing that doing this in a single query would be far more efficient than fetching each row with a separate query.
Using MySQL and a similar table, I got the desired result this way (suggestions for a better way are highly appreciated!):
SELECT TagID,SampleDateTime,SampleValue FROM
(
SELECT TagID,SampleDateTime,SampleValue FROM measurements
WHERE TagID IN(101,102,103) ORDER BY SampleDateTime DESC
)
AS table1 GROUP BY TagID;
I thought that would do the trick (I couldn't get it to work with MAX(), DISTINCT or anything else I tried), and it did, even returning the correct data. But naturally it doesn't work in MS SQL because of the GROUP BY:
Column 'table1.SampleValue' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I'm extremely stuck with this and so any insight would be more than welcome.

I am slightly confused, as you have tagged both MySQL and SQL Server. For SQL Server, I would use the ROW_NUMBER function:
SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY TagID ORDER BY SampleDateTime DESC) [RowNumber]
FROM Measurements
) m
WHERE RowNumber = 1
The ROW_NUMBER function does exactly what it says on the tin: it gives each row a number based on the criteria you provide. So in the example above, PARTITION BY TagID tells ROW_NUMBER to start again at 1 each time a new TagID is encountered, and ORDER BY SampleDateTime DESC tells ROW_NUMBER to number each TagID's rows starting at the latest entry and working back to the earliest one.
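To try the pattern without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module (SQLite 3.25+ supports ROW_NUMBER()). The sample rows and the latest_per_tag helper are hypothetical, and the TagID IN (...) filter from the question is added inside the subquery so all requested probes come back in one round trip.

```python
import sqlite3

# Minimal sketch: the ROW_NUMBER() "latest row per TagID" pattern, run on
# SQLite (3.25+ has window functions) instead of SQL Server.
# The table contents are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Measurements ("
    "TagID INTEGER, SampleDateTime INTEGER, SampleValue REAL, QualityID INTEGER)"
)
conn.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?, ?)",
    [
        (101, 1, 10.0, 192),
        (101, 3, 11.0, 192),
        (102, 2, 20.0, 192),
        (103, 1, 30.0, 192),
        (103, 5, 31.5, 192),
        (104, 9, 99.0, 192),  # not requested, so it must not appear
    ],
)


def latest_per_tag(conn, tag_ids):
    """Return the most recent row for each requested TagID in one query."""
    placeholders = ",".join("?" for _ in tag_ids)
    sql = f"""
        SELECT TagID, SampleDateTime, SampleValue, QualityID
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY TagID
                                      ORDER BY SampleDateTime DESC) AS RowNumber
            FROM Measurements
            -- Filtering on the partition column before numbering is safe:
            -- it removes whole partitions, never rows inside one.
            WHERE TagID IN ({placeholders})
        ) AS m
        WHERE RowNumber = 1
        ORDER BY TagID
    """
    return conn.execute(sql, list(tag_ids)).fetchall()


print(latest_per_tag(conn, [101, 102, 103]))
```

If several rows of one TagID share the latest SampleDateTime, ROW_NUMBER() still returns exactly one of them, chosen arbitrarily.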
Your query fails in SQL Server because MySQL, unlike SQL Server, allows columns in the select list that are neither in the GROUP BY clause nor inside an aggregate function. Because you only specified GROUP BY TagID, those fields get the values of an arbitrary row from each group. In your case that happened to be the latest row, because of the ORDER BY SampleDateTime DESC in the subquery, but MySQL does not guarantee this, and since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects such a query altogether.
Just in case it is needed, the following should work in most DBMSs and is a more reliable way of writing a query similar to the one you have been running in MySQL:
SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
FROM Measurements m
INNER JOIN
( SELECT TagID, MAX(SampleDateTime) AS SampleDateTime
FROM Measurements
GROUP BY TagID
) MaxTag
ON MaxTag.TagID = m.TagID
AND MaxTag.SampleDateTime = m.SampleDateTime
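A minimal sketch of the portable join-to-MAX() query, again on SQLite through Python's sqlite3 with made-up rows. TagID 13 deliberately has two rows sharing its latest SampleDateTime, to show one behavioural difference from the ROW_NUMBER() version.

```python
import sqlite3

# Minimal sketch: the portable "join to MAX() per group" pattern on SQLite.
# Sample data is made up; TagID 13 deliberately has two rows sharing its
# latest SampleDateTime.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Measurements ("
    "TagID INTEGER, SampleDateTime INTEGER, SampleValue REAL, QualityID INTEGER)"
)
conn.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?, ?)",
    [
        (13, 2, 100.0, 192),
        (13, 5, 112.0, 192),
        (13, 5, 113.0, 192),  # same timestamp as the row above
        (17, 4, 107.5, 192),
    ],
)

LATEST_VIA_JOIN = """
    SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
    FROM Measurements m
    INNER JOIN (
        SELECT TagID, MAX(SampleDateTime) AS SampleDateTime
        FROM Measurements
        GROUP BY TagID
    ) MaxTag
      ON MaxTag.TagID = m.TagID
     AND MaxTag.SampleDateTime = m.SampleDateTime
    ORDER BY m.TagID, m.SampleValue
"""

rows = conn.execute(LATEST_VIA_JOIN).fetchall()
print(rows)
```

Because the join matches on the timestamp value, every row tied at the maximum is returned (two rows for TagID 13 here), whereas ROW_NUMBER() = 1 keeps exactly one.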

Related

(Hive) SQL: retrieving data from a column that has a 1-to-N relationship with another column

How can I retrieve rows where a BID comes up under multiple AIDs?
You can see the sample below: the AID and BID columns are under PrimaryID, and BIDs are under AIDs. I want to produce an output that only includes records where a BID has a one-to-many relationship with records in the AID column. Example output below.
I provided only a small sample of the data; in reality I am retrieving 20+ columns and joining 4 tables. I have unique PrimaryIDs, and under those I have multiple unique AIDs; however, under these AIDs I can have multiple non-unique BIDs that can come up repeatedly under different AIDs.
Hive supports window functions. A window function can associate every row in a group with an attribute of the group; COUNT() is one of the supported functions. In your case, you can use it and select the rows for which that count is greater than 1.
In the PARTITION BY clause you specify which columns define the group, the same way you would in the more familiar GROUP BY clause.
Something like this:
select * from
(
Select *,
count(*) over (partition by primaryID,AID) counts
from mytable
) x
Where counts>1
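The COUNT(*) OVER technique can be sketched on SQLite through Python's sqlite3 (Hive itself is not involved). The table and column names are hypothetical stand-ins. Note one interpretation choice: this sketch partitions by (primary_id, bid), on the reading that the goal is BIDs that appear under more than one AID; partitioning by (primaryID, AID) as in the query above counts the rows under each AID instead.

```python
import sqlite3

# Minimal sketch of COUNT(*) OVER (PARTITION BY ...) on SQLite rather than Hive.
# Table and column names (mytable, primary_id, aid, bid) are hypothetical.
# Assumption: each (aid, bid) pair appears at most once, so the count per
# (primary_id, bid) equals the number of AIDs the BID appears under.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (primary_id INTEGER, aid TEXT, bid TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A1", "B1"),
        (1, "A1", "B2"),
        (1, "A2", "B1"),  # B1 appears under a second AID
        (1, "A3", "B3"),
        (2, "A4", "B1"),  # different primary_id, counted separately
    ],
)

REPEATED_BIDS = """
    SELECT primary_id, aid, bid, counts
    FROM (
        SELECT *,
               COUNT(*) OVER (PARTITION BY primary_id, bid) AS counts
        FROM mytable
    ) x
    WHERE counts > 1
    ORDER BY primary_id, aid
"""

rows = conn.execute(REPEATED_BIDS).fetchall()
print(rows)
```

Unlike GROUP BY, the window count keeps every original row (and all its other columns), which matters when the real query selects 20+ columns.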

SQL to find best row in group based on multiple columns?

Let's say I have an Oracle table with measurements in different categories:
CREATE TABLE measurements (
category CHAR(8),
value NUMBER,
error NUMBER,
created DATE
)
Now I want to find the "best" row in each category, where "best" is defined like this:
It has the lowest error.
If there are multiple measurements with the same error, the one that was created most recently is considered the best.
This is a variation of the greatest-n-per-group problem, but with two columns instead of one. How can I express this in SQL?
Use ROW_NUMBER:
WITH cte AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY category ORDER BY error, created DESC) rn
FROM measurements m
)
SELECT category, value, error, created
FROM cte
WHERE rn = 1;
For a brief explanation, the PARTITION BY clause instructs the DB to generate a separate row number for each group of records in the same category. The ORDER BY clause places those records with the smallest error first. Should two or more records in the same category be tied with the lowest error, then the next sorting level would place the record with the most recent creation date first.
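A minimal sketch of the same query run on SQLite via Python's sqlite3 (not Oracle), with made-up rows chosen to exercise both sorting levels. Dates are stored as ISO-8601 text, which is an assumption that makes them sort chronologically.

```python
import sqlite3

# Minimal sketch: "best row per category" with a two-level ORDER BY inside
# ROW_NUMBER(), run on SQLite instead of Oracle. Dates are ISO-8601 strings,
# which sort chronologically as text. Sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (category TEXT, value REAL, error REAL, created TEXT)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        ("temp", 1.0, 0.5, "2020-01-01"),
        ("temp", 2.0, 0.1, "2020-01-02"),
        ("temp", 3.0, 0.1, "2020-01-05"),  # ties on error, but newer: wins
        ("press", 4.0, 0.3, "2020-02-01"),
        ("press", 5.0, 0.2, "2020-01-01"),  # lowest error wins despite age
    ],
)

BEST_PER_CATEGORY = """
    WITH cte AS (
        SELECT m.*,
               ROW_NUMBER() OVER (PARTITION BY category
                                  ORDER BY error, created DESC) AS rn
        FROM measurements m
    )
    SELECT category, value, error, created
    FROM cte
    WHERE rn = 1
    ORDER BY category
"""

best = conn.execute(BEST_PER_CATEGORY).fetchall()
print(best)
```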

SQL Server: I have multiple records per day and I want to return only the first of the day

I have some records that track inquiries by DATETIME. There is a glitch in the system, and sometimes a record is entered multiple times on the same day. I have a query with a bunch of correlated subqueries attached to these records, but the numbers are off because, whenever the glitch occurred, these leads show up multiple times. I need the first entry of the day; I tried fooling around with MIN, but I couldn't quite get it to work.
I currently have this, though I am not sure I am on the right track:
SELECT SL.UserID, MIN(SL.Added) OVER (PARTITION BY SL.UserID)
FROM SourceLog AS SL
Here's one approach using row_number():
select *
from (
select *,
row_number() over (partition by userid, cast(added as date) order by added) rn
from sourcelog
) t
where rn = 1
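A minimal sketch of this approach on SQLite via Python's sqlite3, with made-up rows. SQLite has no DATE type, so date(added) is swapped in for SQL Server's cast(added as date); timestamps are assumed to be ISO-8601 text.

```python
import sqlite3

# Minimal sketch: first SourceLog entry per user per calendar day, on SQLite.
# SQLite has no DATE type, so date(added) stands in for SQL Server's
# cast(added as date); timestamps are ISO-8601 text. Sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcelog (userid INTEGER, added TEXT)")
conn.executemany(
    "INSERT INTO sourcelog VALUES (?, ?)",
    [
        (1, "2021-03-01 09:15:00"),  # duplicate later the same day
        (1, "2021-03-01 08:00:00"),
        (1, "2021-03-02 10:00:00"),
        (2, "2021-03-01 12:00:00"),
        (2, "2021-03-01 12:00:05"),  # glitch: re-entered five seconds later
    ],
)

FIRST_OF_DAY = """
    SELECT userid, added
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY userid, date(added)
                                  ORDER BY added) AS rn
        FROM sourcelog
    ) t
    WHERE rn = 1
    ORDER BY userid, added
"""

firsts = conn.execute(FIRST_OF_DAY).fetchall()
print(firsts)
```

Partitioning on the date expression rather than the raw timestamp is what makes "first of the day" work: every entry of one user on one day lands in the same partition.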
You could use GROUP BY along with MIN to accomplish this.
Depending on how your data is structured: if you are assigning a unique sequential number to each record created, you could just return the lowest number created per day. Otherwise, you would need to return the ID of the record with the earliest DATETIME value per day.
--Assumes sequential IDs
select
min(Id)
from
[YourTable]
group by
--the conversion is used to strip the time value out of the date/time
convert(date, [YourDateTime])
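A minimal sketch of the "lowest sequential ID per day" idea on SQLite via Python's sqlite3, with made-up rows. Two assumptions: IDs are assigned in chronological order, and userid is added to the grouping (the answer groups by day only) on the assumption that each user should keep one row per day. SQLite's date() stands in for convert(date, ...).

```python
import sqlite3

# Minimal sketch of the "lowest sequential ID per day" idea on SQLite.
# Assumptions: id grows in insertion (= chronological) order, and the grouping
# also includes userid (the answer groups by day only) so each user keeps one
# row per day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sourcelog (id INTEGER PRIMARY KEY, userid INTEGER, added TEXT)"
)
conn.executemany(
    "INSERT INTO sourcelog (userid, added) VALUES (?, ?)",
    [
        (1, "2021-03-01 08:00:00"),  # id 1
        (1, "2021-03-01 09:15:00"),  # id 2: same user, same day
        (2, "2021-03-01 12:00:00"),  # id 3
        (2, "2021-03-01 12:00:05"),  # id 4: glitch duplicate of id 3
        (1, "2021-03-02 10:00:00"),  # id 5: next day
    ],
)

FIRST_IDS = """
    SELECT MIN(id)
    FROM sourcelog
    -- date() strips the time part, like convert(date, ...) in SQL Server
    GROUP BY userid, date(added)
    ORDER BY MIN(id)
"""

first_ids = [row[0] for row in conn.execute(FIRST_IDS)]
print(first_ids)
```

The returned IDs can then be joined back to the table to fetch the full rows.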

How to group rows after another grouping in oracle?

I have a table called correctObjects. This table contains many groups (GRUP values), each with a different number of records. One example is given below: grup 544 has 5 rows in the table. First, I should group all records by the GRUP column; then I must do an inner matching by the CAP column. In grup 544 there are three different CAP values, so I must assign an inner group number to these records. How can I do this two-level grouping? The GRUP column is already filled in; the Inner Grup column is null in every record.
After the inner grouping, it must look like this:
I am using Oracle 11g R2 and PL/SQL Developer
Your question lacks certain details, so I'll just give you a starting point, and you can tweak it to suit your needs.
It's not entirely clear, but the way I understand it, you want to rank the different rows by cap. And I think the ranking is independent for every distinct grup value.
What's not clear to me is why 125 mm is ranked 1, and 62 mm is ranked 2. Is it based on the value? Is it based on which row is the first one, and if so, how are the rows ordered? Or maybe you don't really care which one is first or second, as long as they are grouped correctly. I'll have to assume the latter.
In any case, it sounds like you want to use the dense_rank() analytic function in some form:
select mip, startmi, cap, grup,
dense_rank() over (partition by grup order by cap) as inner_grup
from tbl
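A minimal sketch of the same dense_rank() query on SQLite via Python's sqlite3 (not Oracle). The column names follow the answer; the rows are made up, with cap stored as text.

```python
import sqlite3

# Minimal sketch: number the distinct CAP values within each GRUP with
# DENSE_RANK(), on SQLite instead of Oracle. Table/column names follow the
# answer; the rows are made up. cap is text here, so ORDER BY cap sorts
# lexically ('125 mm' < '62 mm' < '80 mm').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (mip INTEGER, startmi REAL, cap TEXT, grup INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, 0.0, "125 mm", 544),
        (2, 0.5, "62 mm", 544),
        (3, 1.0, "125 mm", 544),
        (4, 1.5, "80 mm", 544),
        (5, 2.0, "62 mm", 544),
        (6, 0.0, "90 mm", 545),  # another grup: ranking restarts at 1
    ],
)

INNER_GRUP = """
    SELECT mip, startmi, cap, grup,
           DENSE_RANK() OVER (PARTITION BY grup ORDER BY cap) AS inner_grup
    FROM tbl
"""

inner = {mip: rank for mip, _, _, _, rank in conn.execute(INNER_GRUP)}
print(inner)
```

DENSE_RANK() gives equal CAP values the same number and leaves no gaps, which is what an inner group number needs. If CAP holds sizes with units as text, the text ordering shown here may not match numeric size; ordering by the numeric part would be needed to rank by size.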

Find row number in a sort based on row id, then find its neighbours

Say that I have some SELECT statement:
SELECT id, name FROM people
ORDER BY name ASC;
I have a few million rows in the people table and the ORDER BY clause can be much more complex than what I have shown here (possibly operating on a dozen columns).
I retrieve only a small subset of the rows (say rows 1..11) in order to display them in the UI. Now, I would like to solve the following problems:
Find the number of a row with a given id.
Display the 5 items before and the 5 items after a row with a given id.
Problem 2 is easy to solve once I have solved problem 1, as I can then use something like this if I know that the item I was looking for has row number 1000 in the sorted result set (this is the Firebird SQL dialect):
SELECT id, name FROM people
ORDER BY name ASC
ROWS 995 TO 1005;
I also know that I can find the rank of a row by counting all of the rows which come before the one I am looking for, but this can lead to very long WHERE clauses with tons of OR and AND in the condition. And I have to do this repeatedly. With my test data, this takes hundreds of milliseconds, even when using properly indexed columns, which is way too slow.
Is there some means of achieving this by using SQL:2003 features (such as row_number, supported in Firebird 3.0)? I am by no means an SQL guru and I need some pointers here. Could I create a cached view where the result would include a rank/dense rank/row index?
Firebird appears to support window functions (called analytic functions in Oracle). So you can do the following:
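To find the "row" number of a row with a given id:
select id, rownum
from (select id, row_number() over (order by name, id) as rownum
from t
) x
where id = <id>
This assumes the ids are unique; including id in the ORDER BY also keeps the numbering deterministic when names tie. The filter on id has to go in the outer query: window functions are evaluated after WHERE, so filtering in the same query that computes row_number() would always return 1.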
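The evaluation-order point can be checked with a minimal sketch on SQLite via Python's sqlite3 (not Firebird). The rows and both helper functions are hypothetical; naive_row_number_of deliberately filters before numbering to show why it always yields 1.

```python
import sqlite3

# Minimal sketch on SQLite (not Firebird): find the 1-based position of a
# given id in the ORDER BY name, id ordering. Sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Dave"), (2, "Alice"), (3, "Carol"), (4, "Bob"), (5, "Alice")],
)


def row_number_of(conn, person_id):
    # Number every row first, then filter: window functions run after WHERE,
    # so putting "WHERE id = ?" in the same query would always yield 1.
    sql = """
        SELECT rownum
        FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY name, id) AS rownum
            FROM people
        ) x
        WHERE id = ?
    """
    row = conn.execute(sql, (person_id,)).fetchone()
    return row[0] if row else None


def naive_row_number_of(conn, person_id):
    # Filters before numbering: only one row is left, so the result is 1.
    sql = "SELECT ROW_NUMBER() OVER (ORDER BY name, id) FROM people WHERE id = ?"
    return conn.execute(sql, (person_id,)).fetchone()[0]


print(row_number_of(conn, 3), naive_row_number_of(conn, 3))
```

Note that this still numbers the whole table on every call, so on millions of rows it is worth measuring before relying on it.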
To solve the second problem:
with ranked as (
select id, name, row_number() over (order by name, id) as rownum
from t
)
select r.*
from ranked r join
ranked tid
on tid.id = <id>
where r.rownum between tid.rownum - 5 and tid.rownum + 5
order by r.rownum
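A minimal sketch of the neighbours query on SQLite via Python's sqlite3 (not Firebird), using a CTE that numbers the rows once and a self-join to locate the target row. The generated names, the neighbours helper and its window parameter are hypothetical.

```python
import sqlite3

# Minimal sketch on SQLite (not Firebird): return the rows within `window`
# positions of a given id, using a CTE that numbers every row once and a
# self-join to locate the target. Names are generated so that sorting by
# name reverses the id order; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(i, f"n{21 - i:02d}") for i in range(1, 21)],  # id 20 -> 'n01', id 1 -> 'n20'
)

NEIGHBOURS = """
    WITH ranked AS (
        SELECT id, name, ROW_NUMBER() OVER (ORDER BY name, id) AS rownum
        FROM people
    )
    SELECT r.id
    FROM ranked r
    JOIN ranked target ON target.id = :target_id
    WHERE r.rownum BETWEEN target.rownum - :window AND target.rownum + :window
    ORDER BY r.rownum
"""


def neighbours(conn, target_id, window=5):
    params = {"target_id": target_id, "window": window}
    return [row[0] for row in conn.execute(NEIGHBOURS, params)]


print(neighbours(conn, 10))
```

Near the start of the ordering the range simply runs out (for example, fewer than 5 rows before the second row), and an unknown id returns no rows because the join finds no target.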
I might suggest something else, though, if you can modify the table structure. Most databases can populate an auto-increment column when a row is inserted. If your records are never deleted and are inserted in the order you sort by, this can serve as your counter, simplifying your queries.