Efficient query for the first result in groups (postgresql 9) - sql

I have a table with 200,000 rows and two columns, name and date. Both names and dates may contain repeated values. I would like to get the first 300 unique names, sorted by date in ascending order, and the query needs to run fast because the table may grow to a million rows.
I am using PostgreSQL 9.

SELECT name, date
FROM
(
SELECT DISTINCT ON (name) name, date
FROM table
ORDER BY name, date
) AS id_date
ORDER BY date
LIMIT 300;
The last query by @jachguate will miss names that have two entries on the same date; this one does not.
The query takes about 100 ms on a non-optimized PostgreSQL 9.1 with about 100,000 entries, so it may not scale to millions of entries.
An upgrade to PostgreSQL 9.2 may help, since according to the release notes it includes many performance improvements.
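To make the intent of the query concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite has no DISTINCT ON, so the sketch swaps in GROUP BY name with MIN(date), which picks the same row per name for this two-column case. The table name entries, the helper first_unique_names and the sample rows are made up for illustration; this says nothing about PostgreSQL performance.

```python
import sqlite3

def first_unique_names(rows, limit):
    """Return up to `limit` (name, date) pairs: each name's earliest date, ordered by date."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name; the original post just calls it "table".
    conn.execute("CREATE TABLE entries (name TEXT, date TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, MIN(date) AS first_date   -- stand-in for DISTINCT ON (name)
        FROM entries
        GROUP BY name
        ORDER BY first_date, name              -- name breaks ties deterministically
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result

# Made-up sample data; ISO date strings sort correctly as text.
rows = [
    ("bob", "2012-01-03"),
    ("alice", "2012-01-02"),
    ("bob", "2012-01-01"),
    ("alice", "2012-01-02"),   # duplicate (name, date) pair is still reported once
    ("carol", "2012-01-05"),
]
print(first_unique_names(rows, 2))
```

Each name appears once with its earliest date, including names that have several rows on the same date.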

Use a CTE:
with unique_date_name as (
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
)
select name, date
from unique_date_name
order by date limit 300;
Edit: According to the comments, this results in poor performance, so try this instead:
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
order by date limit 300;
or, transforming the original query into a nested subquery in FROM instead of a CTE:
select name, date
from (
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
) unique_date_name
order by date limit 300;
Unfortunately I don't have a PostgreSQL instance at hand to check whether it works, but the optimizer should be able to do a better job with these forms.
An index on (date, name) is a must for optimal performance.

Related

how to get latest date column records when result should be filtered with unique column name in sql?

I have a table as below:
I want to write a SQL query to get output as below:
The query should select all the records from the table, but when multiple records have the same Id column value it should take only the one record with the latest Date.
E.g., here Rudolf (Id 1211) is present three times in the input; in the output only the one Rudolf record, with date 06-12-2010, is selected. The same goes for James.
I tried to write a query but it was not successful, so please help me form the query in SQL.
Thanks in advance.
You can partition your data by Id, order each partition by Date descending, and take the first row of each partition:
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
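As a quick way to try this pattern without a SQL Server instance, the sketch below runs the same ROW_NUMBER query against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (the first version with window functions). The table name people, the helper latest_per_id and all sample rows are invented for illustration; only the Id 1211 / Rudolf / James names echo the question.

```python
import sqlite3

def latest_per_id(rows):
    """Return one row per Id: the one with the greatest Date."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; column names follow the question.
    conn.execute("CREATE TABLE people (Id INTEGER, Name TEXT, Place TEXT, Date TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Id, Name, Place, Date
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
            FROM people
        ) AS A
        WHERE rn = 1          -- rn = 1 is the newest row within each Id
        ORDER BY Id
        """
    ).fetchall()
    conn.close()
    return result

# Made-up rows; dates are ISO strings so text ordering matches date ordering.
sample = [
    (1211, "Rudolf", "Berlin", "2010-06-01"),
    (1211, "Rudolf", "Berlin", "2010-12-06"),
    (1211, "Rudolf", "Berlin", "2010-03-15"),
    (1300, "James", "Oslo", "2010-02-01"),
    (1300, "James", "Oslo", "2010-09-09"),
    (1400, "Anna", "Rome", "2010-05-05"),
]
for row in latest_per_id(sample):
    print(row)
```

If two rows for the same Id share the latest Date, ROW_NUMBER still returns only one of them, chosen arbitrarily.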
You can use WITH TIES:
select top 1 WITH TIES * from t
order by (row_number() over(partition by id order by date desc))
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As I can see from your example, the duplicate rows differ only in Date. If that is the case, then a simple GROUP BY with the MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is working example: http://sqlfiddle.com/#!18/7025e/2

select multiple records based on order by

I have a table with a bunch of customer IDs. The customers table also contains these IDs, but each ID can appear on multiple records for the same customer. I want to select the most recently used record, which I can get by doing order by <my_field> desc.
Say I have 100 customer IDs in this table and the customers table has 120 records with these IDs (some are duplicates). How can I apply my order by condition to get only the most recent matching records?
The DBMS is SQL Server 2000.
The table is basically like this:
loc_nbr and cust_nbr are the primary key columns.
A customer shops at location 1. They get assigned loc_nbr = 1 and cust_nbr = 1, then a customer_id of 1.
They shop again, but this time at location 2, so they get assigned loc_nbr = 2 and cust_nbr = 1, then the same customer_id of 1 based on their other attributes like name and address.
Because they shopped at location 2 AFTER location 1, that record will have a more recent rec_alt_ts value, and it is the record I want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY rec_alt_ts DESC) AS ItemCount -- group by customer, newest record first
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 t3.LocationKey from location_table t3 where t3.cust_id = t1.cust_id order by t3.some_field desc)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
You can use the date and time column to pick out the most recent record for the customer.
Order by the column that holds the customer's date and time.
E.g.:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Oracle SQL to get the min timestamp when the count of results is larger than a number

In order to improve performance, I need a SQL query that implements the following requirement.
Suppose there is a table with the following columns:
id timestamp value
How can I get the min timestamp (e.g. :t1) for which the count of the result is > 100000?
That is, with that value the count(*) returned by the following SQL will be > 100000:
select count(*) from table where timestamp < :t1
My understanding of your question is: Find the earliest timestamp in the table for which there are at least 100,000 earlier rows.
There are probably many ways to do it; the main difficulty is trying to come up with an efficient one.
I think an analytic-function approach is most likely to work well. The most obvious choice is to use COUNT:
select min(timestamp) from (
select timestamp, count(*) over (order by timestamp rows between unbounded preceding and 1 preceding) earlier_rows
from table
)
where earlier_rows >= 100000
But I suspect using RANK or something similar will be faster:
select min(timestamp) from (
select timestamp, rank() over (order by timestamp) time_rank
from table
)
where time_rank > 100000
I'm not sure off the top of my head, but these may give slightly different results if there are duplicate timestamps.
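To see how the two variants can diverge on duplicate timestamps, here is a small sketch that runs both against an in-memory SQLite database from Python, with a tiny threshold instead of 100,000. It assumes SQLite 3.25 or later for window functions; the table name events, the column name ts and the helper earliest_with_n_earlier are made up for illustration, and Oracle itself is not involved.

```python
import sqlite3

def earliest_with_n_earlier(timestamps, n):
    """Return (result of the COUNT variant, result of the RANK variant) for threshold n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER)")  # hypothetical table and column
    conn.executemany("INSERT INTO events VALUES (?)", [(t,) for t in timestamps])

    # COUNT variant: counts the rows that precede the current row in sort order,
    # so a row with a tied timestamp can count its own duplicate as "earlier".
    by_count = conn.execute(
        """
        SELECT MIN(ts) FROM (
            SELECT ts,
                   COUNT(*) OVER (ORDER BY ts
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS earlier_rows
            FROM events
        )
        WHERE earlier_rows >= ?
        """,
        (n,),
    ).fetchone()[0]

    # RANK variant: rank is 1 + the number of rows with a strictly smaller timestamp.
    by_rank = conn.execute(
        """
        SELECT MIN(ts) FROM (
            SELECT ts, RANK() OVER (ORDER BY ts) AS time_rank
            FROM events
        )
        WHERE time_rank > ?
        """,
        (n,),
    ).fetchone()[0]
    conn.close()
    return by_count, by_rank

print(earliest_with_n_earlier([1, 2, 3, 4], 2))  # no duplicates: both variants agree
print(earliest_with_n_earlier([1, 2, 2, 3], 2))  # duplicate timestamp 2: the variants differ
```

With the duplicate, the COUNT variant returns 2 while the RANK variant returns 3. Only the RANK result satisfies "at least n rows with a strictly earlier timestamp", which is what the count(*) ... where timestamp < :t1 check in the question measures.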
This will give you the min and max value and the count
select
count(*),
min(t.timestamp),
max(t.timestamp)
from table t
where ( select count(*) from table t2 where t2.timestamp < :t1 ) > 100000

Get Last Records for Multiple Items in SQL Server

I have a simple table that stores daily high temperatures for various cities. The table contains 3 fields: city_id, date, high_temp. (The combination of city_id and date is unique.)
I'm looking for an efficient query for SQL Server that will allow me to pull the 3 most recent highs for each city. Thus, the result would include every city id listed 3 times.
What is extra challenging (for me at least) is that the high temps are not recorded every day for every city, so I can't filter by (getdate()-1, getdate()-2, etc.).
Is there a way to do this without a loop function? Much appreciated. Cheers.
For SQL 2005 and later use ROW_NUMBER:
WITH temps as
(
SELECT city_id, date, high_temp,
ROW_NUMBER() OVER (PARTITION BY city_id ORDER BY date DESC) RowNum
FROM TemperatureHistory -- use the real table name here
)
SELECT city_id, date, high_temp FROM temps WHERE RowNum <= 3
SELECT *
FROM table T1
WHERE T1.date IN
(SELECT TOP 3 date
FROM table T2
WHERE T1.city_id = T2.city_id
ORDER BY T2.date DESC)
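The two approaches above can be compared side by side in a small sketch that runs them against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later for ROW_NUMBER, and it uses LIMIT 3 where SQL Server would use TOP 3. The helper three_latest_highs and the sample readings are made up for illustration; the table name TemperatureHistory is the placeholder used in the first answer.

```python
import sqlite3

def three_latest_highs(rows):
    """Return (ROW_NUMBER result, correlated-subquery result) for the 3 newest rows per city."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TemperatureHistory ("
        "city_id INTEGER, date TEXT, high_temp INTEGER, UNIQUE (city_id, date))"
    )
    conn.executemany("INSERT INTO TemperatureHistory VALUES (?, ?, ?)", rows)

    windowed = conn.execute(
        """
        WITH temps AS (
            SELECT city_id, date, high_temp,
                   ROW_NUMBER() OVER (PARTITION BY city_id ORDER BY date DESC) AS RowNum
            FROM TemperatureHistory
        )
        SELECT city_id, date, high_temp
        FROM temps
        WHERE RowNum <= 3
        ORDER BY city_id, date DESC
        """
    ).fetchall()

    correlated = conn.execute(
        """
        SELECT city_id, date, high_temp
        FROM TemperatureHistory AS T1
        WHERE T1.date IN (SELECT T2.date
                          FROM TemperatureHistory AS T2
                          WHERE T2.city_id = T1.city_id
                          ORDER BY T2.date DESC
                          LIMIT 3)            -- SQLite spelling of TOP 3
        ORDER BY city_id, date DESC
        """
    ).fetchall()
    conn.close()
    return windowed, correlated

# Made-up readings with gaps between dates; city 2 has fewer than three rows.
readings = [
    (1, "2011-01-01", 30),
    (1, "2011-01-04", 33),
    (1, "2011-01-09", 31),
    (1, "2011-01-10", 35),
    (2, "2011-01-02", 20),
    (2, "2011-01-03", 22),
]
windowed, correlated = three_latest_highs(readings)
print(windowed)
print(windowed == correlated)
```

Neither query depends on the dates being consecutive, and a city with fewer than three readings simply contributes fewer rows.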

Return min date and corresponding amount to that distinct ID

Afternoon
I am trying to return the min and max values in SQL Server 2005 when I have multiple rows with the same date but different values in the Owed column. I've already filtered the table down with my select statement into a temp table for a different query; when I then tried to mirror that, I ended up with all the duplicated dates that you can see below.
I now have a table that looks like:
ID | Date     | Owes
---+----------+-----
1  | 20110901 |   89
1  | 20110901 |  179
1  | 20110901 |  101
1  | 20110901 |  197
1  | 20110901 |  510
2  | 20111001 |   10
2  | 20111001 |  211
2  | 20111001 |  214
2  | 20111001 |  669
My current query:
Drop Table #Temp
Select Distinct Convert(Varchar(8), DateAdd(dd, Datediff(DD,0,DateDue),0),112)as Date
,ID
,Paid
Into #Temp
From Table
Where Paid <> '0'
Select ,Id
,Date
,Max(Owed)
,Min(Owed)
From #Temp
Group by ID, Date, Paid
Order By ID, Date, Paid
This doesn't strip out any of my dates that are the same. I'm new to SQL, but I presume it's because my Owed column has different values. I basically want to be able to pull back the first record, as this will always be my minimum paid, and my last record, which will always be my maximum owed, to work out my total owed by ID.
I'd like to understand what I've done wrong, for my future knowledge of structuring queries.
Many Thanks
In your "select into" statement, you don't have an Owed column?
GROUP BY is the normal way you "strip out values that are the same". If you group by ID and Date, you will get one row in your result for each distinct pair of values in those two columns. Each row in the results represents ALL the rows in the underlying table, and aggregate functions like MIN, MAX, etc. can pull out values.
SELECT id, date, MAX(owes) as MaxOwes, MIN(owes) as minOwes
FROM myFavoriteTable
GROUP BY id, date
In SQL Server 2005 there are "windowing functions" that allow you to use aggregate functions on groups of records, without grouping. An example below. You will get one row for each row in the table:
SELECT id, date, owes,
MAX(Owes) over (PARTITION BY date, id) AS MaxOwes,
MIN(Owes) over (PARTITION BY date, id) AS MinOwes
FROM myfavoriteTable
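The difference between the two forms is easiest to see on the sample rows from the question. The sketch below loads them into an in-memory SQLite database from Python and runs both queries; it assumes SQLite 3.25 or later for the OVER clause and assumes the window is meant to be partitioned by (date, id). The helper min_max_owes is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question's table (ID, Date, Owes).
ROWS = [
    (1, "20110901", 89), (1, "20110901", 179), (1, "20110901", 101),
    (1, "20110901", 197), (1, "20110901", 510),
    (2, "20111001", 10), (2, "20111001", 211), (2, "20111001", 214),
    (2, "20111001", 669),
]

def min_max_owes(rows):
    """Return (GROUP BY result, window-function result) for the same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myFavoriteTable (id INTEGER, date TEXT, owes INTEGER)")
    conn.executemany("INSERT INTO myFavoriteTable VALUES (?, ?, ?)", rows)

    # GROUP BY collapses each (id, date) pair into a single result row.
    grouped = conn.execute(
        """
        SELECT id, date, MAX(owes) AS MaxOwes, MIN(owes) AS MinOwes
        FROM myFavoriteTable
        GROUP BY id, date
        ORDER BY id, date
        """
    ).fetchall()

    # The window form keeps every input row and repeats the aggregates on each one.
    windowed = conn.execute(
        """
        SELECT id, date, owes,
               MAX(owes) OVER (PARTITION BY date, id) AS MaxOwes,
               MIN(owes) OVER (PARTITION BY date, id) AS MinOwes
        FROM myFavoriteTable
        ORDER BY id, owes
        """
    ).fetchall()
    conn.close()
    return grouped, windowed

grouped, windowed = min_max_owes(ROWS)
print(grouped)        # one row per (id, date)
print(len(windowed))  # one row per input row
```

GROUP BY returns two rows here, one per ID and date, while the window version returns all nine rows with the same MaxOwes and MinOwes repeated within each group.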
If you name a column "MinOwes" it might sound like you're just fishing tho.
If you want to group by date you can't also group by ID, too, because ID is probably unique. Try:
Select Date
,Min(Owed) AS min_owed
,Max(Owed) AS max_owed
From #Temp
Group by Date
Order By Date
To get additional values from the row (your question is a bit vague there), you could use window functions. Note that first_value requires SQL Server 2012 or later. The maximum is taken with first_value over a descending order, because last_value with the default window frame only looks as far as the current row:
SELECT DISTINCT
Date
,first_value(ID) OVER (PARTITION BY Date ORDER BY Owed) AS min_owed_ID
,first_value(ID) OVER (PARTITION BY Date ORDER BY Owed DESC) AS max_owed_ID
,first_value(Owed) OVER (PARTITION BY Date ORDER BY Owed) AS min_owed
,first_value(Owed) OVER (PARTITION BY Date ORDER BY Owed DESC) AS max_owed
FROM #Temp
ORDER BY Date;