Selecting distinct values from database - sql

I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1 A1 T1
2 A2 T2
1 A1 T1
1 A1 T5
I want to select unique ParentActivityID's along with Timestamp. The time stamp can be the most recent one or the first one as is occurring in the table.
I tried to use DISTINCT but i came to realise that it dosen't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.

DISTINCT is a shorthand that works for a single column. When you have multiple columns, use GROUP BY:
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually i want only one one ParentActivityID. Your solution will give each pair of ParentActivityID and Timestamp. For e.g , if i have [1, T1], [2,T2], [1,T3], then i wanted the value as [1,T3] and [2,T2].
You need to decide what of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID

Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This will provide you the first timestamp and the most recent timestamp for each ParentActivityId that is present in your table. You can choose the ones you need as per your need.

"Group by" is what you need here. Just do "group by ParentActivityID" and tell that most recent timestamp along all rows with same ParentActivityID is needed for you:
SELECT ParentActivityID, MAX(Timestamp) FROM Table GROUP BY ParentActivityID
"Group by" operator is like taking rows from a table and putting them in a map with a key defined in group by clause (ParentActivityID in this example). You have to define how grouping by will handle rows with duplicate keys. For this you have various aggregate functions which you specify on columns you want to select but which are not part of the key (not listed in group by clause, think of them as a values in a map).
Some databases (like mysql) also allow you to select columns which are not part of the group by clause (not in a key) without applying aggregate function on them. In such case you will get some random value for this column (this is like blindly overwriting value in a map with new value every time). Still, SQL standard together with most databases out there will not allow you to do it. In such case you can use min(), max(), first() or last() aggregate function to work around it.

Use CTE for getting the latest row from your table based on parent id and you can choose the columns from the entire row of the output .
;With cte_parent
As
(SELECT ParentActivityId,ActivityId,TimeStamp
, ROW_NUMBER() OVER(PARTITION BY ParentActivityId ORDER BY TimeStamp desc) RNO
FROM YourTable )
SELECT *
FROM cte_parent
WHERE RNO =1

Related

SQL Query for multiple columns with one column distinct

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
You can use ROW_NUMBER(). For example
select *
from (
select *,
row_number() over(partition by case_number) as rn
) x
where rn = 1
The query above will pseudo-randomly pick one row for each case_number. If you want a better selection criteria you can add ORDER BY or window frames to the OVER clause.

Select columns from second subquery if first returns NULL

I have two queries that I'm running separately from a PHP script. The first is checking if an identifier (group) has a timestamp in a table.
SELECT
group, MAX(timestamp) AS timestamp, value
FROM table_schema.sample_table
GROUP BY group, value
If there is no timestamp, then it runs this second query that retrieves the minimum timestamp from a separate table:
SELECT
group, MIN(timestamp) as timestamp, value AS value
FROM table_schema.src_table
GROUP BY group, value
And goes on from there.
What I would like to do, for the sake of conciseness, is to have a single query that runs the first statement, but that defaults to the second if NULL. I've tried with coalesce() and CASE statements, but they require subqueries to return single columns (which I hadn't run into being an issue yet). I then decided I should try a JOIN on the table with the aggregate timestamp to get the whole row, but then quickly realized I can't variate the table being joined (not to my knowledge). I opted to try joining both results and getting the max, something like this:
Edit: I am so tired, this should be a UNION, not a JOIN
sorry for any possible confusion :(
SELECT smpl.group, smpl.value, MAX(smpl.timestamp) AS timestamp
FROM table_schema.sample_table as smpl
INNER JOIN
(SELECT src.group, src.value, MIN(src.timestamp) AS timestamp
FROM source_table src
GROUP BY src.group, src.value) AS history
ON
smpl.group = history.group
GROUP BY smpl.group, smpl.value
I don't have a SELECT MAX() on this because it's really slow as is, most likely because my SQL is a bit rusty.
If anyone knows a better approach, I'd appreciate it!
Please try this:
select mx.group,(case when mx.timestamp is null then mn.timestamp else mx.timestamp end)timestamp,
(case when mx.timestamp is null then mn.value else mx.value end)value
(
SELECT
group, MAX(timestamp) AS timestamp, value
FROM table_schema.sample_table
GROUP BY group, value
)mx
left join
(
SELECT
group, MIN(timestamp) as timestamp, value AS value
FROM table_schema.src_table
GROUP BY group, value
)mn
on mx.group = mn.group

SQL Server : verify that two columns are in same sort order

I have a table with an ID and a date column. It's possible (likely) that when a new record is created, it gets the next larger ID and the current datetime. So if I were to sort by date or I were to sort by ID, the resulting data set would be in the same order.
How do I write a SQL query to verify this?
It's also possible that an older record is modified and the date is updated. In that case, the records would not be in the same sort order. I don't think this happens.
I'm trying to move the data to another location, and if I know that there are no modified records, that makes it a lot simpler.
I'm pretty sure I only need to query those two columns: ID, RecordDate. Other links indicate I should be able to use LAG, but I'm getting an error that it isn't a built-in function name.
In other words, both https://dba.stackexchange.com/questions/42985/running-total-to-the-previous-row and Is there a way to access the "previous row" value in a SELECT statement? should help, but I'm still not able to make that work for what I want.
If you cannot use window functions, you can use a correlated subquery and EXISTS.
SELECT *
FROM elbat t1
WHERE EXISTS (SELECT *
FROM elbat t2
WHERE t2.id < t1.id
AND t2.recorddate > t1.recorddate);
It'll select all records where another record with a lower ID and a greater timestamp exists. If the result is empty you know that no such record exists and the data is like you want it to be.
Maybe you want to restrict it a bit more by using t2.recorddate >= t1.recorddate instead of t2.recorddate > t1.recorddate. I'm not sure how you want it.
Use this:
SELECT ID, RecordDate FROM tablename t
WHERE
(SELECT COUNT(*) FROM tablename WHERE tablename.ID < t.ID)
<>
(SELECT COUNT(*) FROM tablename WHERE tablename.RecordDate < t.RecordDate);
It counts for each row, how many rows have id less than the row's id and
how many rows have RecordDate less than the row's RecordDate.
If these counters are not equal then it outputs this row.
The result is all the rows that would not be in the same position after sorting by ID and RecordDate
One method uses window functions:
select count(*)
from (select t.*,
row_number() over (order by id) as seqnum_id,
row_number() over (order by date, id) as seqnum_date
from t
) t
where seqnum_id <> seqnum_date;
When the count is zero, then the two columns have the same ordering. Note that the second order by includes id. Two rows could have the same date. This makes the sort stable, so the comparison is valid even when date has duplicates.
the above solutions are all good but if both dates and ids are in increment then this should also work
select modifiedid=t2.id from
yourtable t1 join yourtable t2
on t1.id=t2.id+1 and t1.recordDate<t2.recordDate

Group by or Distinct - But several fields

How can I use a Distinct or Group by statement on 1 field with a SELECT of All or at least several ones?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works prfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
An SQLfiddle to test with.
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields to you want the values from a particular record if there are more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so it's no use grouping without counting, summing up etc. DISTINCT eleminates duplicates but how that matches with the other columns you want to extract, I can't imagine, because some rows will be removed from the result.

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
There are many ways to skin a cat.
If you have an Identity Column that has an auto-increment functionality, a faster query would result if you return the last record by ID, due to the indexing of the column, unless of course you wish to put an index on the timestamp column.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
you can simply do
SELECT *, max(timestamp) FROM table
Edit:
As aggregate function can't be used like this so it gives error. I guess what SquareCog had suggested was the best thing to do
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)