How to select the first observation in a category in PostgreSQL

My table contains house IDs (dataid), the time of observation (readtime), and a meter reading.
The query is as follows:
select *
from university.gas_ert
where readtime between '01/01/2014' and '01/02/2014'
I am trying to get only the first observation of each day for every dataid within the time span. I have tried GROUP BY, but it doesn't seem to work.

DISTINCT ON could make your query much simpler. See the PostgreSQL documentation for more.
Definition:
Keeps only the first row of each set of rows where the given
expressions evaluate to equal. Note that the “first row” of each set
is unpredictable unless ORDER BY is used to ensure that the desired
row appears first.
SELECT
DISTINCT ON (meter_value) meter_value,
dataid,
readtime
FROM
university.gas_ert
WHERE
readtime between '2014-01-01' and '2014-01-02'
ORDER BY
meter_value,
readtime ASC;

If you want one row for each dataid per day within the time range, use the DISTINCT ON construction. The following query returns the first row for each dataid for each day in the range given in the WHERE clause; widening the range returns a row for every day x dataid combination. Ordering by readtime last ensures that the earliest reading of each day is the one kept.
select distinct on (dataid, date_trunc('day', readtime)) *
from university.gas_ert
where readtime between '2014-01-01' and '2014-01-02'
order by dataid, date_trunc('day', readtime), readtime

You can also use window functions here, specifically ROW_NUMBER.
Partition the records by dataid and by day using date_trunc (i.e. without the time component), then number them by readtime in ascending order and keep the first row of each partition:
select *
from (
select *
,row_number() over(partition by a.dataid, date_trunc('day',a.readtime) order by a.readtime asc ) as rnk
from university.gas_ert a
)x
where x.rnk=1
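
The "first reading per dataid per day" idea can be tried without a PostgreSQL server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or newer), SQLite's date() in place of date_trunc('day', ...), and made-up sample rows in a hypothetical gas_ert table.

```python
import sqlite3

# Hypothetical sample data standing in for university.gas_ert.
rows = [
    (1, "2014-01-01 00:15:00", 100.0),
    (1, "2014-01-01 08:00:00", 101.5),
    (1, "2014-01-02 00:05:00", 103.0),
    (2, "2014-01-01 03:30:00", 50.0),
    (2, "2014-01-01 03:45:00", 50.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gas_ert (dataid INTEGER, readtime TEXT, meter_value REAL)")
conn.executemany("INSERT INTO gas_ert VALUES (?, ?, ?)", rows)

# Number the readings inside each (dataid, calendar day) group by time
# and keep only the earliest one.
query = """
SELECT dataid, readtime, meter_value
FROM (
    SELECT g.*,
           ROW_NUMBER() OVER (
               PARTITION BY dataid, date(readtime)
               ORDER BY readtime
           ) AS rnk
    FROM gas_ert AS g
    WHERE readtime >= '2014-01-01' AND readtime < '2014-01-03'
) AS x
WHERE rnk = 1
ORDER BY dataid, readtime
"""
first_readings = conn.execute(query).fetchall()
for row in first_readings:
    print(row)
```

The half-open range (>= start, < end) is a choice made for this sketch so that every reading on the last day is included; BETWEEN with bare dates stops at midnight of the end date.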

Related

Trying to get the greatest value from a customer on a given day

What I need to do: if a customer makes more than one transaction in a day, I need to display the greatest value (and ignore any other values).
The query is pretty big, but the code I inserted below is the focus of the issue. I’m not getting the results I need. The subselect ideally should be reducing the number of rows the query generates since I don’t need all the transactions, just the greatest one, however my code isn’t cutting it. I’m getting the exact same number of rows with or without the subselect.
Note: I don’t actually have a t.* in the actual query; there are just a dozen or so other fields being pulled in. I added the t.* just to simplify the code example.
SELECT
t.*,
(SELECT TOP (1)
t1.CustomerGUID
t1.Value
t1.Date
FROM #temp t1
WHERE t1.CustomerGUID = t.CustomerGUID
AND t1.Date = t.Date
ORDER BY t1.Value DESC) AS “Value”
FROM #temp t
Is there an obvious flaw in my code or is there a better way to achieve the result of getting the greatest value transaction per day per customer?
Thanks

You may want to do the following:
SELECT
t1.CustomerGUID,
t1.Date,
MAX(t1.Value) AS Value
FROM #temp t1
GROUP BY
t1.CustomerGUID,
t1.Date

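You can use ROW_NUMBER() as shown below. Partitioning by customer and date and ordering by value in descending order puts the greatest transaction of each day first.
SELECT
*
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerGUID, Date ORDER BY Value DESC) AS SrNo FROM <YourTable>
)
<YourTable>
WHERE
SrNo = 1
Sample data would be more helpful.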

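Try this window function:
MAX(value) OVER (PARTITION BY date, customer)
It avoids the correlated subquery, but note that it attaches the daily maximum to every row rather than reducing the number of rows, so you still need DISTINCT or a filter to get one row per customer per day.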

Probably many other ways to do it, but this one is simple and works
select t.*
from (
select
convert(varchar(8), r.date,112) one_day
,max(r.Value) max_sale
from #temp r
group by convert(varchar(8), r.date,112)
) e
inner join #temp t on t.value = e.max_sale and convert(varchar(8), t.date,112) = e.one_day
If two people spend exactly the same amount and that amount is also the maximum, you'll get two records for that day.
The convert(varchar(8), r.date, 112) expression works as desired on the date, datetime and datetime2 data types. If your date is stored as varchar, char, nchar or nvarchar, examine the data to find out whether left(t.date, 10) or left(t.date, 8) is the right way to extract the day.

If I've understood your requirement correctly, you have asked for the "greatest value transaction per day per customer". That suggests to me you don't want one row per customer in the output but one row per day per customer.
To achieve this you can group on the calendar date. Cast to date rather than using datepart(day, ...), which returns only the day of the month and would merge the same day number from different months:
Select t.CustomerGUID, cast(t.date as date) as Daydate,
max(t.value) as value from #temp t group by
t.CustomerGUID, cast(t.date as date);
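
To see the per-customer, per-day grouping in action, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name temp_tx and the sample rows are hypothetical, and SQLite's date() plays the role of casting a datetime to a date. Grouping by the full calendar date keeps 1 March and 1 April apart, which grouping by the day-of-month number alone would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the #temp table in the question.
conn.execute("CREATE TABLE temp_tx (CustomerGUID TEXT, Date TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO temp_tx VALUES (?, ?, ?)",
    [
        ("A", "2020-03-01 09:00:00", 10.0),
        ("A", "2020-03-01 17:30:00", 25.0),
        ("A", "2020-04-01 12:00:00", 5.0),
        ("B", "2020-03-01 11:00:00", 7.5),
    ],
)

# One row per customer per calendar day, carrying that day's greatest value.
query = """
SELECT CustomerGUID, date(Date) AS day, MAX(Value) AS max_value
FROM temp_tx
GROUP BY CustomerGUID, date(Date)
ORDER BY CustomerGUID, day
"""
daily_max = conn.execute(query).fetchall()
for row in daily_max:
    print(row)
```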

how to get latest date column records when result should be filtered with unique column name in sql?

I have a table as below:
I want to write a SQL query to get the output below:
The query should select all the records from the table, but when multiple records have the same Id column value, it should take only the one record with the latest Date.
E.g., here Rudolf (Id 1211) is present three times in the input; in the output only the one Rudolf record with date 06-12-2010 is selected. The same goes for James.
I tried to write a query but it was not successful. Please help me form the query in SQL.
Thanks in advance

You can partition your data by Id, order each partition by Date DESC, and take the first row of each partition:
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1

You can use WITH TIES:
select top 1 WITH TIES * from t
order by (row_number() over(partition by id order by date desc))
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3

As far as I can see from your example, duplicate rows differ only in Date. If that is the case, a simple GROUP BY with the MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is working example: http://sqlfiddle.com/#!18/7025e/2
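
The ROW_NUMBER approach to "latest row per Id" can be checked with a quick sketch. It runs on Python's built-in sqlite3 module (SQLite 3.25+ for window functions) rather than SQL Server; the table name people, the places, and every row except the Rudolf Id mentioned in the question are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Id INTEGER, Name TEXT, Place TEXT, Date TEXT)")
# Hypothetical rows; dates are stored as ISO strings so they sort correctly.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1211, "Rudolf", "Berlin", "2010-12-04"),
        (1211, "Rudolf", "Berlin", "2010-12-06"),
        (1211, "Rudolf", "Berlin", "2010-12-05"),
        (1212, "James", "London", "2010-12-01"),
        (1212, "James", "London", "2010-12-03"),
        (1213, "Anna", "Paris", "2010-12-02"),
    ],
)

# Within each Id, the newest row gets rn = 1; keep only those rows.
query = """
SELECT Id, Name, Place, Date
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
    FROM people AS p
) AS ranked
WHERE rn = 1
ORDER BY Id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Unlike the GROUP BY variant, this keeps working when the duplicate rows differ in columns other than Date, because the whole newest row is returned.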

Closest Date in Oracle

I am trying to get the closest date to a given date in Oracle. I have been working from How to get the closest dates in Oracle sql, but the example in that question uses two different tables. I'm no PL/SQL guru and I'm struggling to get this to work. I have a single table that contains an ID field and a Date field. I need the ID that is closest to the date passed into the query.
select *
from ( select SEQ_ID, ENTERED_DATE, rank() over ( partition by ENTERED_DATE order by difference asc ) as rnk
from ( select SEQ_ID, ENTERED_DATE, abs(ENTERED_DATE - 2/9/1999) from DOWNTIME_DETAILS)) as difference
where rnk = 1
This gives me an error: "SQL command not properly ended"
How can I fix the query? What am I doing wrong?

The as difference is assigning a table alias. You can't use as for table aliases, only for column aliases (so as rnk is OK). Just remove the second as. As you are referring to difference in the outer query, it looks like you meant it to be a column alias and just had it in the wrong place:
select *
from (
select SEQ_ID, ENTERED_DATE,
rank() over ( order by difference ) as rnk
from (
select SEQ_ID, ENTERED_DATE,
abs(ENTERED_DATE - to_date('2/9/1999', 'MM/DD/YYYY')) as difference
from DOWNTIME_DETAILS
)
)
where rnk = 1
You also had a date without any quote marks, so that would have been interpreted as numbers in this case, and wouldn't have had the effect you were looking for. You should always use explicit conversion; I've guessed your date format. And you should not be partitioning by the original entered_date as that will make everything rank as 1. If you have two records that have the same difference they will still both rank as 1 so you'll see both. You could add a way to break ties by modifying the order by, e.g.
rank() over ( order by difference , entered_date, seq_id ) as rnk
... but you'll need to specify the criteria so it makes sense for your data and situation.
You could also do this:
select max(SEQ_ID) keep (dense_rank first
order by abs(ENTERED_DATE - to_date('2/9/1999', 'MM/DD/YYYY')))
as seq_id,
max(ENTERED_DATE) keep (dense_rank first
order by abs(ENTERED_DATE - to_date('2/9/1999', 'MM/DD/YYYY')))
as entered_date
from DOWNTIME_DETAILS;
... but then you have to supply the date twice.
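
As a rough, database-agnostic check of the ranking logic, the following sketch reproduces it with Python's built-in sqlite3 module (SQLite 3.25+). It is not Oracle: julianday() stands in for Oracle date arithmetic, and the rows in downtime_details are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downtime_details (seq_id INTEGER, entered_date TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO downtime_details VALUES (?, ?)",
    [
        (1, "1999-01-20"),
        (2, "1999-02-07"),
        (3, "1999-02-12"),
        (4, "1999-03-01"),
    ],
)

target = "1999-02-09"

# Inner query: distance in days from the target date.
# Outer query: rank by that distance, with tie-breakers, and keep rank 1.
query = """
SELECT seq_id, entered_date
FROM (
    SELECT seq_id, entered_date,
           RANK() OVER (ORDER BY difference, entered_date, seq_id) AS rnk
    FROM (
        SELECT seq_id, entered_date,
               abs(julianday(entered_date) - julianday(?)) AS difference
        FROM downtime_details
    ) AS d
) AS r
WHERE rnk = 1
"""
closest = conn.execute(query, (target,)).fetchall()
print(closest)
```

Passing the target date as a bound parameter means it only has to be supplied once and is never interpreted as arithmetic.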

PostgreSQL select daily max and corresponding hour of ocurrence

I have the following table structure, with daily-hourly data:
time_of_ocurrence(timestamp); particles(numeric)
"2012-11-01 00:30:00";191.3
"2012-11-01 01:30:00";46
...
"2013-01-01 02:30:00";319.6
How do I select the DAILY max and THE HOUR in which this max occurs?
I've tried
SELECT date_trunc('hour', time_of_ocurrence) as hora,
MAX(particles)
from my_table WHERE time_of_ocurrence > '2013-09-01'
GROUP BY hora ORDER BY hora
But it doesn't work:
"2013-09-01 00:00:00";34.35
"2013-09-01 01:00:00";33.13
"2013-09-01 02:00:00";33.09
"2013-09-01 03:00:00";28.08
My result should be in this format instead (one max per day, showing the hour):
"2013-09-01 05:00:00";100.35
"2013-09-02 03:30:00";80.13
How can i do that? Thanks!

This type of question has come up on StackOverflow frequently, and these questions are categorized with the greatest-n-per-group tag, if you want to see other solutions.
edit: I changed the following code to group by day instead of by hour.
Here's one solution:
SELECT t.*
FROM (
SELECT date_trunc('day', time_of_ocurrence) as hora, MAX(particles) AS particles
FROM my_table
GROUP BY hora
) AS _max
INNER JOIN my_table AS t
ON _max.hora = date_trunc('day', t.time_of_ocurrence)
AND _max.particles = t.particles
WHERE time_of_ocurrence > '2013-09-01'
ORDER BY time_of_ocurrence;
This might also show more than one result per day, if more than one row has the max value.
Another solution using window functions that does not show such duplicates:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY date_trunc('day', time_of_ocurrence)
ORDER BY particles DESC) AS _rn
FROM my_table
) AS _max
WHERE _rn = 1
ORDER BY time_of_ocurrence;
If multiple rows have the same max, one row will nevertheless be numbered row 1. If you need specific control over which row is numbered 1, add a unique column to the ORDER BY inside the OVER clause to break such ties.

Use window functions:
select distinct
date_trunc('day',time_of_ocurrence) as day,
max(particles) over (partition by date_trunc('day',time_of_ocurrence)) as particles_max_of_day,
first_value(date_trunc('hour',time_of_ocurrence)) over (partition by date_trunc('day',time_of_ocurrence) order by particles desc)
from my_table
order by 1
One edge case here is when the same MAX number of particles shows up more than once on the same day, but in different hours. This version would pick one of them arbitrarily. If you prefer one over the other (always the earlier one, for example) you can add that to the order by clause:
first_value(date_trunc('hour',time_of_ocurrence)) over (partition by date_trunc('day',time_of_ocurrence) order by particles desc, time_of_ocurrence)
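
The daily-max-with-timestamp pattern, including a deterministic tie-break, can be sketched as follows. This uses Python's built-in sqlite3 module (SQLite 3.25+) instead of PostgreSQL, with date() standing in for date_trunc('day', ...); the sample readings are hypothetical and include a deliberate tie on the first day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (time_of_ocurrence TEXT, particles REAL)")
# Hypothetical readings; two rows on 2013-09-01 share the daily maximum.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2013-09-01 03:30:00", 28.08),
        ("2013-09-01 05:30:00", 100.35),
        ("2013-09-01 07:30:00", 100.35),
        ("2013-09-02 03:30:00", 80.13),
        ("2013-09-02 04:30:00", 33.09),
    ],
)

# Highest particles first within each day; the earlier timestamp wins a tie.
query = """
SELECT time_of_ocurrence, particles
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (
               PARTITION BY date(time_of_ocurrence)
               ORDER BY particles DESC, time_of_ocurrence
           ) AS rn
    FROM my_table AS m
) AS ranked
WHERE rn = 1
ORDER BY time_of_ocurrence
"""
daily_peaks = conn.execute(query).fetchall()
for row in daily_peaks:
    print(row)
```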

SQL How to remove duplicates within select query?

I have a table which looks like that:
As you can see, there are some date duplicates, so how do I select only one row for each date in that table?
The column 'id_from_other_table' comes from an INNER JOIN with the table above.

There are multiple rows with the same date, but the time is different. Therefore, DISTINCT start_date will not work. What you need is: cast the start_date to a DATE (so the TIME part is gone), and then do a DISTINCT:
SELECT DISTINCT CAST(start_date AS DATE) FROM table;
Depending on what database you use, the type name for DATE is different.

Do you need any other information except the date? If not:
SELECT DISTINCT start_date FROM table;

You mention that there are date duplicates, but it appears they're quite unique down to the precision of seconds.
Can you clarify what precision of date you start considering dates duplicate - day, hour, minute?
In any case, you'll probably want to floor your datetime field. You didn't indicate which field is preferred when removing duplicates, so this query will prefer the last name in alphabetical order.
SELECT MAX(owner_name),
--floored to the second
dateadd(second,datediff(second,'2000-01-01',start_date),'2000-01-01') AS StartDate
From MyTable
GROUP BY dateadd(second,datediff(second,'2000-01-01',start_date),'2000-01-01')

Select Distinct CAST(FLOOR( CAST(start_date AS FLOAT ) )AS DATETIME) from Table

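If you want to select an arbitrary single row for each day (this relies on MySQL's permissive GROUP BY, i.e. ONLY_FULL_GROUP_BY disabled), then
SELECT * FROM table_name GROUP BY DATE(start_date)
If you want to select a single entry for each user per day, then
SELECT * FROM table_name GROUP BY DATE(start_date),owner_name
DATE() is used here rather than DAY(), because DAY() returns only the day of the month and would merge rows from different months.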

Here is a solution that returns only one row per owner for each date in that table.
With this query 'tony' will occur twice, as there are two different start dates for that owner.
SELECT * FROM
(
SELECT T1.*, ROW_NUMBER() OVER(PARTITION BY TRUNC(START_DATE),OWNER_NAME ORDER BY 1,2 DESC ) RNM
FROM TABLE T1
)
WHERE RNM=1

You have to convert the "DateTime" to a "Date". Then you can more easily select just one row for a given date, regardless of the time on that date.
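
A minimal sketch of the "convert to a date first" idea, using Python's built-in sqlite3 module so it runs anywhere. The table name events and its rows are hypothetical, and SQLite's date() stands in for CAST(start_date AS DATE) in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (owner_name TEXT, start_date TEXT)")
# Hypothetical rows: two events fall on the same calendar day.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("tony", "2010-08-05 10:00:00"),
        ("alice", "2010-08-05 14:30:00"),
        ("tony", "2010-08-06 09:15:00"),
    ],
)

# Distinct calendar days: the time part is dropped before comparing.
distinct_days = conn.execute(
    "SELECT DISTINCT date(start_date) AS day FROM events ORDER BY day"
).fetchall()

# One row per day, carrying the earliest start time seen on that day.
first_per_day = conn.execute(
    """
    SELECT date(start_date) AS day, MIN(start_date) AS first_start
    FROM events
    GROUP BY date(start_date)
    ORDER BY day
    """
).fetchall()

print(distinct_days)
print(first_per_day)
```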
