Selecting only one duplicate record - sql

I have a 3-column table.
ID (unique)
code
time (timestamp)
I want to show the code records within a specified date range, like this:
SELECT * FROM table
WHERE time >= '2018-12-13' AND time <= '2018-12-16 23:59:59.999' ORDER BY ID
This works; I have no problem with this query.
But I need only one row for each set of records that share the same code.
How can I do this with the above query?

Use a correlated subquery:
select t1.* from table t1
where t1.time = (select min(t2.time) from table t2
where t2.code = t1.code
and t2.time >= '2018-12-13' AND t2.time <= '2018-12-16 23:59:59.999')

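Add DISTINCT to your SELECT if you want entire rows to be unique, with exact duplicate rows eliminated. Otherwise GROUP BY can help: it collapses the rows that share a value into one group, and the remaining columns are aggregated so that they stay in sync with the grouped column. Note that DISTINCT * removes nothing while the ID column is unique, and that selecting non-aggregated columns with GROUP BY code is rejected by many databases (MySQL accepts it only when ONLY_FULL_GROUP_BY is disabled).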
SELECT DISTINCT * from table
WHERE time >= '2018-12-13'
AND time <= '2018-12-16 23:59:59.999'
ORDER BY ID
or
SELECT * from table
WHERE time >= '2018-12-13'
AND time <= '2018-12-16 23:59:59.999'
group by code
ORDER BY ID

An appropriate method would be:
SELECT t.*
FROM table t
WHERE t.id = (SELECT t2.id
FROM table t2
WHERE t2.code = t.code AND
t2.time >= '2018-12-13' AND
t2.time < '2018-12-17'
ORDER BY t2.id
LIMIT 1
)
ORDER BY ID;
In addition to being syntactically correct SQL that (apart from the LIMIT) should work in almost any database, it can also take advantage of an index on (code, time, id). It should be faster than an alternative using GROUP BY.
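As a rough check of the correlated subquery above, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The table name records (table itself is a reserved word), the index name, and the sample rows are assumptions made up for illustration; only the query shape comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "records" is a hypothetical stand-in for the question's table.
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, code TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO records (id, code, time) VALUES (?, ?, ?)",
    [
        (1, "A", "2018-12-12 10:00:00"),  # before the range
        (2, "A", "2018-12-13 09:00:00"),
        (3, "B", "2018-12-14 08:00:00"),
        (4, "A", "2018-12-15 07:00:00"),  # duplicate code inside the range
        (5, "B", "2018-12-17 00:00:00"),  # after the range
    ],
)
# Index in the column order suggested in the answer: (code, time, id).
conn.execute("CREATE INDEX idx_code_time_id ON records (code, time, id)")

query = """
    SELECT t.*
    FROM records t
    WHERE t.id = (SELECT t2.id
                  FROM records t2
                  WHERE t2.code = t.code
                    AND t2.time >= '2018-12-13'
                    AND t2.time <  '2018-12-17'
                  ORDER BY t2.id
                  LIMIT 1)
    ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data the query keeps ids 2 and 3, the lowest id per code inside the range. The half-open upper bound (< '2018-12-17') avoids having to spell out the last millisecond of the previous day.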

You can use a GROUP BY or DISTINCT clause. With GROUP BY, aggregate the columns that are not grouped:
SELECT MIN(ID) AS ID, code, MIN(time) AS time
FROM table
WHERE time >= '2018-12-13'
AND time <= '2018-12-16 23:59:59.999'
GROUP BY code
ORDER BY MIN(ID)

Related

SQL: Select from another table (t2) without joining but referencing a column from t1

I have a table with columns date and net_sales. For each day, I want to get the sum of the net_sales for the last 30 days.
This is my query:
thirty_days_net_sales AS (
SELECT
t1.date,
t1.net_sales AS net_sales_on_date,
(SELECT SUM(t2.net_sales) FROM total_net_sales_per_day t2 WHERE t2.date >= DATE_SUB(t1.date, INTERVAL 30 DAY) AND t2.date <= t1.date)
FROM
total_net_sales_per_day t1)
When I run this query I get the error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I am using Google BigQuery. Thanks in advance for your help!
Consider the approach below instead:
select *, sum(net_sales) over win last_30_days
from total_net_sales_per_day
window win as (order by unix_date(date) range between 29 preceding and current row )
You would use window functions, if you have data for every day (as the name of the table implies):
SELECT tnspd.*,
sum(net_sales) over (order by date
rows between 30 preceding and current row
)
FROM total_net_sales_per_day tnspd;
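The difference between a RANGE frame over day numbers and a ROWS frame only shows up when some days are missing. The sketch below illustrates this in SQLite through Python's sqlite3 module, not in BigQuery: julianday() stands in for unix_date(), the sample rows are invented, and RANGE with a numeric offset assumes SQLite 3.28 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE total_net_sales_per_day (date TEXT PRIMARY KEY, net_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO total_net_sales_per_day VALUES (?, ?)",
    [
        ("2024-01-01", 10),
        ("2024-01-02", 20),
        ("2024-01-31", 5),  # 30 days after 2024-01-01
        ("2024-03-15", 7),  # isolated day, no neighbours within 30 days
    ],
)

query = """
    SELECT date,
           -- frame defined by the day number: the current day and the 29 days before it
           SUM(net_sales) OVER (ORDER BY julianday(date)
                                RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS by_range,
           -- frame defined by row count: the current row and the 29 rows before it
           SUM(net_sales) OVER (ORDER BY date
                                ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS by_rows
    FROM total_net_sales_per_day
    ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With gaps in the data, by_rows keeps adding older rows (10, 30, 35, 42) while by_range respects the calendar window (10, 30, 25, 7). The ROWS form is only equivalent when every day has exactly one row.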
As the error says, you have to add an equality condition. When you use a join, the ON clause needs at least one equality condition.
Your query has no explicit join, so the correlated subquery must contain an equality condition, something like the one I added below:
thirty_days_net_sales AS (
SELECT
t1.date,
t1.net_sales AS net_sales_on_date,
(SELECT SUM(t2.net_sales) FROM total_net_sales_per_day t2 WHERE t2.date >= DATE_SUB(t1.date, INTERVAL 30 DAY) AND t2.date <= t1.date
AND t1.id = t2.id)
FROM
total_net_sales_per_day t1)
This link might help for more information:
https://sql.info/d/solved-bigquery-left-outer-join-cannot-be-used-without-a-condition-that-is-an-equality-of-fields-from-both-sides-of-the-join

Self-referencing a table for previous record matching user ID

I'm trying to find the easiest way to calculate cycle times from SQL data. In the data source I have unique station IDs, user IDs, and a date/time stamp, along with other data about the work they are performing.
What I want to do is join the table to itself so that for each date/time stamp I get:
- the date/time stamp of the most recent previous instance of that user ID within 3 minutes or null
- the difference between those two stamps (the cycle time = amount of time between records)
This should be simple but I can't wrap my brain around it. Any help?
Unfortunately SQL Server does not support date range specifications in window functions. I would recommend a lateral join here:
select
t.*,
t1.timestamp last_timestamp,
datediff(second, t1.timestamp, t.timestamp) diff_seconds
from mytable t
outer apply (
select top(1) t1.*
from mytable t1
where
t1.user_id = t.user_id
and t1.timestamp >= dateadd(minute, -3, t.timestamp)
and t1.timestamp < t.timestamp
order by t1.timestamp desc
) t1
The subquery brings the most recent row within 3 minutes for the same user_id (or an empty resultset, if there is no row within that timeframe). You can then use that information in the outer query to display the corresponding timestamp, and compute the difference with the current one.
Simply calculate the difference between the current timestamp and the LAG timestamp; if it is more than three minutes, return NULL instead:
with cte as
(
select
t.*
,datediff(second, lag(timestamp) over (partition by user_id order by timestamp), timestamp) as diff_seconds
from mytable as t
)
select cte.*
,case when diff_seconds <= 180 then diff_seconds end
from cte
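The LAG approach can be tried out without a SQL Server instance. This is a hedged sketch in SQLite via Python's sqlite3 module: strftime('%s', ...) replaces datediff(second, ...), and the sample rows and the cycle_seconds alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2024-01-01 08:00:00"),  # first row for user 1, nothing to compare with
        (1, "2024-01-01 08:01:30"),  # 90 s after the previous row
        (1, "2024-01-01 08:10:00"),  # 510 s later, outside the 3-minute window
        (2, "2024-01-01 08:00:10"),  # first row for user 2
    ],
)

query = """
    WITH cte AS (
        SELECT t.*,
               -- seconds between this row and the same user's previous row
               CAST(strftime('%s', timestamp) AS INTEGER)
               - CAST(strftime('%s', LAG(timestamp) OVER (
                          PARTITION BY user_id ORDER BY timestamp)) AS INTEGER)
                 AS diff_seconds
        FROM mytable AS t
    )
    SELECT user_id, timestamp,
           CASE WHEN diff_seconds <= 180 THEN diff_seconds END AS cycle_seconds
    FROM cte
    ORDER BY user_id, timestamp
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the second row gets a cycle time (90); the others are NULL, either because there is no previous row for that user or because the gap exceeds 180 seconds. Since the previous row is always the most recent one, this gives the same answer as looking up the latest row within the window.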

Grouping by ID and time interval in ms sql

I am trying to write a query that groups like ids within a timespan.
Real world scenario:
I want to see rows created by the same ID within 5 seconds of each other.
SELECT top 10
Id,
CreatedOn
FROM Logs
where ((DATEPART(SECOND, CreatedOn) + 5) - DATEPART(SECOND, CreatedOn)) < 10
GROUP BY
DATEPART(SECOND, CreatedOn),
Id,
CreatedOn
order by CreatedOn desc
This isn't quite right, but I feel like I am on the right track.
Thanks in advance.
You may try a query that keeps a row only if another row with the same ID was created within 5 seconds of it:
SELECT
t1.Id,
t1.CreatedOn
FROM logs t1
WHERE EXISTS (SELECT 1 FROM logs t2
WHERE t1.Id = t2.Id AND
t1.CreatedOn <> t2.CreatedOn AND
ABS(DATEDIFF(SECOND, t1.CreatedOn, t2.CreatedOn)) <= 5)
ORDER BY
t1.CreatedOn DESC;
Could be further optimized this way:
SELECT t1.Id,
t1.CreatedOn
FROM logs t1
WHERE EXISTS (
SELECT 1
FROM logs t2
WHERE t2.Id = t1.Id
AND t2.CreatedOn <> t1.CreatedOn
AND ABS(DATEDIFF(SECOND, t1.CreatedOn, t2.CreatedOn)) <= 5
)
ORDER BY
t1.CreatedOn DESC;
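To see what the EXISTS condition selects, here is a small sketch run in SQLite through Python's sqlite3 module rather than SQL Server. The epoch-second subtraction with strftime('%s', ...) is a substitute for DATEDIFF(SECOND, ...), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (Id INTEGER, CreatedOn TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        (1, "2024-01-01 12:00:00"),
        (1, "2024-01-01 12:00:03"),  # 3 s after the row above, same Id
        (1, "2024-01-01 12:05:00"),  # same Id, but no neighbour within 5 s
        (2, "2024-01-01 12:00:01"),  # close in time, but a different Id
    ],
)

query = """
    SELECT t1.Id, t1.CreatedOn
    FROM logs t1
    WHERE EXISTS (
        SELECT 1
        FROM logs t2
        WHERE t2.Id = t1.Id
          AND t2.CreatedOn <> t1.CreatedOn
          AND ABS(CAST(strftime('%s', t1.CreatedOn) AS INTEGER)
                - CAST(strftime('%s', t2.CreatedOn) AS INTEGER)) <= 5
    )
    ORDER BY t1.CreatedOn DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the two rows for Id 1 that are 3 seconds apart come back. Note that the CreatedOn <> comparison means two rows with the same Id and an identical timestamp would not match each other.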

How to show a row for the dates not in records of a table as zero

I am trying to show the records as zero for the dates not found.
Below is my basic query:
Select date_col, count(distinct file_col), count(*) from tab1
where date_col between 'date1' and 'date2'
group by date_col;
The output is for one date.
I want all the dates to be shown in result.
The general way to deal with this type of problem is to use something called a calendar table. This calendar table contains all the dates which you want to appear in your report. We can create a crude one by using a subquery:
SELECT
t1.date,
COUNT(DISTINCT t2.file_col) AS d_cnt,
COUNT(t2.file_col) AS cnt
FROM
(
SELECT '2018-06-01' AS date UNION ALL
SELECT '2018-06-02' UNION ALL
...
) t1
LEFT JOIN tab1 t2
ON t1.date = t2.date_col
WHERE
t1.date BETWEEN 'date1' and 'date2'
GROUP BY
t1.date;
Critical here is that we left join the calendar table to your table containing the actual data, but we count a column in your data table. This means that zero would be reported for any day not having matching data.
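A calendar table does not have to be typed out by hand. As one possible sketch, the example below builds it with a recursive CTE in SQLite through Python's sqlite3 module; the CTE name calendar, its day column, the parameter names, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (date_col TEXT, file_col TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [
        ("2018-06-01", "a.csv"),
        ("2018-06-01", "a.csv"),
        ("2018-06-01", "b.csv"),
        ("2018-06-03", "c.csv"),
    ],
)

query = """
    WITH RECURSIVE calendar(day) AS (
        SELECT :start
        UNION ALL
        SELECT date(day, '+1 day') FROM calendar WHERE day < :end
    )
    SELECT c.day,
           COUNT(DISTINCT t.file_col) AS d_cnt,
           COUNT(t.file_col) AS cnt   -- counts a data column, so unmatched days give 0
    FROM calendar c
    LEFT JOIN tab1 t ON t.date_col = c.day
    GROUP BY c.day
    ORDER BY c.day
"""
rows = conn.execute(query, {"start": "2018-06-01", "end": "2018-06-04"}).fetchall()
for row in rows:
    print(row)
```

Days without data (2018-06-02 and 2018-06-04 here) appear with both counts at zero. Using COUNT(*) instead of COUNT(t.file_col) would report 1 for those days, because the left join still produces one row per calendar day.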
If you are using PostgreSQL, you can use generate_series to produce the required range of dates.
SELECT
t1.date,
COUNT(DISTINCT t2.file_col) AS d_cnt,
COUNT(t2.file_col) AS cnt
FROM
(
select to_char('?'::DATE + (interval '1' month * generate_series(0,11)), 'yyyy-mm') as date
) t1
LEFT JOIN tab1 t2
ON t1.date = to_char(t2.date_col,'yyyy-mm')
WHERE
t1.date BETWEEN 'date1' and 'date2'
GROUP BY
t1.date;
This example shows how to generate a sequence of months.

Is it possible to convert this query to use a join instead of a subquery?

SELECT
number, count(id)
FROM
tracking
WHERE
id IN (SELECT max(id) FROM tracking WHERE splitnr = 'a11' AND number >0 AND timestamp >= '2009-04-08 00:00:00' AND timestamp <= '2009-04-08 12:55:57' GROUP BY ident)
GROUP BY
number
How about this:
SELECT number, count(tracking.id)
FROM tracking
INNER JOIN (SELECT max(id) ID FROM tracking
WHERE splitnr = 'a11' AND
number >0 AND timestamp >= '2009-04-08 00:00:00' AND
timestamp <= '2009-04-08 12:55:57'
GROUP BY ident
) MID ON (MID.ID=tracking.id)
GROUP BY number
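One way to gain confidence that the join form matches the original IN form is to run both on the same data. The sketch below does that in SQLite through Python's sqlite3 module instead of MySQL; the sample rows and the Python variable names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking (id INTEGER PRIMARY KEY, ident TEXT, splitnr TEXT,"
    " number INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO tracking VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "a11", 1, "2009-04-08 01:00:00"),
        (2, "x", "a11", 2, "2009-04-08 02:00:00"),  # latest row for ident x
        (3, "y", "a11", 2, "2009-04-08 03:00:00"),  # latest row for ident y
        (4, "z", "a11", 1, "2009-04-08 04:00:00"),  # latest row for ident z
        (5, "z", "b22", 3, "2009-04-08 05:00:00"),  # wrong splitnr
        (6, "y", "a11", 1, "2009-04-08 13:00:00"),  # after the time range
    ],
)

# The shared inner query: the highest id per ident inside the filter.
latest_per_ident = """
    SELECT max(id) AS id FROM tracking
    WHERE splitnr = 'a11' AND number > 0
      AND timestamp >= '2009-04-08 00:00:00'
      AND timestamp <= '2009-04-08 12:55:57'
    GROUP BY ident
"""
with_in = f"""
    SELECT number, count(id) FROM tracking
    WHERE id IN ({latest_per_ident})
    GROUP BY number ORDER BY number
"""
with_join = f"""
    SELECT number, count(tracking.id) FROM tracking
    INNER JOIN ({latest_per_ident}) mid ON mid.id = tracking.id
    GROUP BY number ORDER BY number
"""
in_rows = conn.execute(with_in).fetchall()
join_rows = conn.execute(with_join).fetchall()
print(in_rows, join_rows)
```

Both queries return the same counts per number for this data. The equivalence relies on the derived table producing each id at most once, which holds here because id is unique and each row has a single ident.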
Could you not do something like:
SELECT
number,
count(id)
FROM
tracking
WHERE
splitnr = 'a11' AND number > 0 AND timestamp >= '2009-04-08 00:00:00' AND timestamp <= '2009-04-08 12:55:57'
GROUP BY
number
ORDER BY
number DESC
LIMIT 0,1
(I don't really know MySQL by the way)
I'm assuming this would give you back the same result set. You order by number DESC because you want the maximum one, right? Then you can put in the WHERE clause and limit the result to one row, which gives you the first one and is essentially the same as MAX (I think), thus removing the JOIN altogether.
EDIT: I didn't think you'd need the GROUP BY ident either
It is slightly hard to be sure I've got it entirely right without seeing the data and knowing exactly what you're trying to achieve, but personally I'd turn the subquery into a view and then join on that:
create view vMaximumIDbyIdent
as
SELECT ident, max(id) maxid
FROM tracking
WHERE splitnr = 'a11' AND number >0
AND timestamp >= '2009-04-08 00:00:00'
AND timestamp <= '2009-04-08 12:55:57'
GROUP BY ident
then:
SELECT
number, count(id)
FROM
tracking,
vMaximumIDbyIdent
WHERE
tracking.id = vMaximumIDbyIdent.maxid
GROUP BY
number
More readable and maintainable.