How to get the latest 3 records of each group from a DolphinDB database? - sql

My table name is trades, and its columns are permno, symbol, date, prc, shrout, ret, vol. I want to get the latest 3 records of each stock each date group. Does DolphinDB support such querying methods?

create table #trades
(
permno int,
symbol int,
groupdate date
)
insert into #trades(permno,symbol,groupdate)
values
(1,1,'2019-01-01'),
(2,2,'2019-01-01'),
(3,3,'2019-01-01'),
(4,4,'2019-01-01'),
(1,11,'2019-01-02'),
(2,22,'2019-01-02'),
(3,33,'2019-01-02'),
(4,44,'2019-01-02')
select * from(
select ROW_NUMBER() over(partition by groupdate order by groupdate)as rn,* from #trades)x
where rn <=3

In DolphinDB, you can use the context by clause to solve this kind of problem. For your question, use the code below:
select * from trades context by symbol, date limit -3
The negative value -3 in the limit clause tells the system to return the last 3 records for each symbol and date combination.
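For anyone who wants to try the idea without a DolphinDB instance, here is a rough emulation in standard SQL using ROW_NUMBER, run through Python's built-in sqlite3 module. It is only a sketch: it assumes the bundled SQLite is 3.25 or newer (needed for window functions), the sample rows are made up, and the `id` column is a hypothetical stand-in for "insertion order", which is what defines "latest" here. It is not how DolphinDB implements context by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: id stands in for the order in which rows arrived.
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, date TEXT, prc REAL)"
)
rows = [
    ("A", "2019-01-01", 10.0), ("A", "2019-01-01", 10.1),
    ("A", "2019-01-01", 10.2), ("A", "2019-01-01", 10.3),
    ("B", "2019-01-01", 20.0), ("B", "2019-01-01", 20.1),
]
conn.executemany("INSERT INTO trades (symbol, date, prc) VALUES (?, ?, ?)", rows)

# Number the rows of each (symbol, date) group from newest to oldest,
# keep the first three, then restore the original order.
last3 = conn.execute("""
    SELECT symbol, date, prc
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY id DESC) AS rn
          FROM trades t)
    WHERE rn <= 3
    ORDER BY id
""").fetchall()

for row in last3:
    print(row)
```

Group A has four rows, so its oldest row is dropped; group B has only two rows, so both are kept.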

Related

BigQuery: How to delete rows that have 2 columns with identical data?

I haven't been able to find any similar question but I am looking for a way to delete all but 1 of similar rows that have 2 specific columns that contain identical data. For example:
price  symbol  date
13     RT      2020-10-1
80.9   DX      2020-10-2
81     DX      2020-10-2
90     AP      2020-10-3
89.9   AP      2020-10-3
90     AP      2020-10-3
85     DX      2020-10-4
In this example, I'd like to be able to run a query in the BQ console that finds the rows where both the date AND the symbol are identical and deletes all but one of them (which one is kept doesn't matter much). The query would delete 1 of the DX rows on 2020-10-2 and 2 of the AP rows on 2020-10-3.
I appreciate the help!!
As you are using BigQuery, I would suggest using CREATE OR REPLACE TABLE as follows:
CREATE OR REPLACE TABLE your_table
AS SELECT DISTINCT price, symbol, date
FROM your_table;
You can use this example code.
DELETE FROM [SampleDB].[dbo].[Employee]
WHERE ID NOT IN
(
SELECT MAX(ID) AS MaxRecordID
FROM [SampleDB].[dbo].[Employee]
GROUP BY [FirstName],
[LastName],
[Country]
);
Check this link for more info: https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/
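You specifically say that you want to delete based on two columns, not all three. SELECT DISTINCT over all three columns only removes rows whose price matches as well, and in your example data the price differs between some rows that share a symbol and date (80.9 and 81 for DX on 2020-10-2).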
You can use create or replace table, but I would recommend:
CREATE OR REPLACE TABLE t AS
SELECT ARRAY_AGG(t LIMIT 1)[ORDINAL(1)].*
FROM `t` t
GROUP BY symbol, date;
You can also express this using window functions:
CREATE OR REPLACE TABLE t AS
SELECT t.* EXCEPT (seqnum)
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY price) as seqnum
FROM `t` t
) t
WHERE seqnum = 1;
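To see what the window-function version keeps, here is a small sketch that runs the same ROW_NUMBER query on the sample data from the question. It uses Python's built-in sqlite3 module rather than BigQuery (assuming SQLite 3.25 or newer for window functions), so it only selects the surviving rows instead of issuing CREATE OR REPLACE TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (price REAL, symbol TEXT, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (13, "RT", "2020-10-1"), (80.9, "DX", "2020-10-2"), (81, "DX", "2020-10-2"),
    (90, "AP", "2020-10-3"), (89.9, "AP", "2020-10-3"), (90, "AP", "2020-10-3"),
    (85, "DX", "2020-10-4"),
])

# Keep one row per (symbol, date); ORDER BY price makes the lowest price survive.
deduped = conn.execute("""
    SELECT price, symbol, date
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY price) AS seqnum
          FROM t)
    WHERE seqnum = 1
    ORDER BY date
""").fetchall()

for row in deduped:
    print(row)
```

One DX row on 2020-10-2 and two AP rows on 2020-10-3 are dropped, which matches what the question asks for.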
Below is for BigQuery Standard SQL
create or replace table your_table as
with temp as (
select as value array_agg(t order by price limit 1) [offset(0)]
from your_table t
group by symbol, date
)
select * from temp;
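Note: you can remove the order by price part if you don't care exactly which row survives among those with a duplicate symbol and date.
If applied to the sample data from your question, the resulting table is (row order is not guaranteed):
price  symbol  date
13     RT      2020-10-1
80.9   DX      2020-10-2
89.9   AP      2020-10-3
85     DX      2020-10-4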

MAX() for 2 Dates Separately

I am trying to find a way to get the max date from one field and then, to remove duplicates, get the max of a second date field among those rows.
So far I have managed to get the max of the Effective Dates, but I still need to get the max timestamp among those rows to remove the duplicates.
Here is what I have so far:
SELECT
a2.CUST_ID
, Address
, Effective_Date --DATE variable
, Timestamp_Entry --DATETIME variable
FROM
(SELECT
CUST_ID
, MAX (Effective_Date) as Most_Effective_Date
FROM Address_Table
GROUP BY CUST_ID) a1
JOIN Address_Table a2
ON a1.CUST_ID = a2.CUST_ID and a1.Most_Effective_Date = a2.Effective_Date
(Some timestamp entries may be newer entries with an older effective date, which is why the Effective Date takes priority; the timestamp should then remove the duplicates.)
I think this is what you want:
select a.*
from (select a.*,
row_number() over (partition by cust_id order by effective_date desc, timestamp_entry desc) as seqnum
from address_table a
) a
where seqnum = 1;
This returns the "most recent" address for each customer based on the two columns.
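A quick way to check the ordering logic is to run it on a few made-up rows. The sketch below uses Python's built-in sqlite3 module (assuming SQLite 3.25 or newer for window functions); the addresses and dates are invented, and dates are stored as ISO text so that string order equals date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE address_table
                (cust_id INTEGER, address TEXT,
                 effective_date TEXT, timestamp_entry TEXT)""")
conn.executemany("INSERT INTO address_table VALUES (?, ?, ?, ?)", [
    (1, "old street",        "2020-01-01", "2020-01-01 09:00:00"),
    (1, "new street",        "2020-06-01", "2020-05-20 10:00:00"),
    (1, "new street, apt 2", "2020-06-01", "2020-05-21 11:00:00"),
    # Newer timestamp but older effective date: must NOT win.
    (1, "stale entry",       "2019-12-01", "2020-07-01 12:00:00"),
    (2, "main road",         "2019-03-01", "2019-03-01 08:00:00"),
])

# effective_date decides first; timestamp_entry only breaks ties.
latest = conn.execute("""
    SELECT cust_id, address
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY cust_id
                                    ORDER BY effective_date DESC,
                                             timestamp_entry DESC) AS seqnum
          FROM address_table a)
    WHERE seqnum = 1
    ORDER BY cust_id
""").fetchall()

print(latest)
```

Customer 1 has two rows with the same latest effective date; the later timestamp picks between them, and the "stale entry" row is ignored despite having the newest timestamp overall.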

SQL: select next available date for multiple records

I have an oracle DB.
My table has ID and DATE columns (and more).
I would like to select for every ID the next available record after a certain date. For only one ID the query would be:
SELECT * FROM my_table
WHERE id = 1 AND date >= '01.01.2018'
(just ignoring the to_date() function)
What would that look like for multiple IDs? And I do want to SELECT *.
Thanks!
We can use ROW_NUMBER here:
SELECT ID, date -- and maybe other columns
FROM
(
SELECT m.*,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM my_table m
WHERE date >= date '2018-01-01'
) t
WHERE rn = 1
The idea here is to assign a row number to each ID partition, starting with the earliest date which occurs after the cutoff you specify. The first record from each partition would then be the immediate next date, assuming it exists.
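Here is a small runnable sketch of the same idea, using Python's built-in sqlite3 module instead of Oracle (assuming SQLite 3.25 or newer for window functions). The rows and the `val` column are made up, and dates are stored as ISO text so they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_table (id INTEGER, "date" TEXT, val TEXT)')
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "2017-12-15", "a"),  # before the cutoff, filtered out
    (1, "2018-01-10", "b"),  # first record on or after the cutoff for id 1
    (1, "2018-02-01", "c"),
    (2, "2018-03-05", "d"),  # only record for id 2
    (3, "2017-06-01", "e"),  # id 3 has nothing on or after the cutoff
])

# Filter first, then number the remaining rows per id by date.
next_rows = conn.execute("""
    SELECT id, "date", val
    FROM (SELECT m.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY "date") AS rn
          FROM my_table m
          WHERE "date" >= '2018-01-01')
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(next_rows)
```

An ID with no record on or after the cutoff (id 3 here) simply does not appear in the result.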

Creating a table using hourly readings from a current table

I have a table that is being used to log values every second
datetime float float
25/02/2013 08:18:56 6 147
I need to create a table that has a single row for each hour over the last month.
Any help appreciated.
Why not just query the table for what you want?
The following query will return the first reading in every hour:
select <columns you want here>
from (select t.*,
row_number() over (partition by year(datetime), month(datetime), day(datetime),
datepart(hour, datetime)
order by datetime
) as seqnum
from t
) t
where seqnum = 1
You can put this into a table by adding an into clause after the select list.
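The same "first reading per hour" pattern can be tried out with Python's built-in sqlite3 module (assuming SQLite 3.25 or newer for window functions). This is a sketch, not SQL Server code: the table and column names (`readings`, `ts`, `v1`, `v2`) are invented, and strftime is used to build the hour bucket in place of year/month/day/datepart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, v1 REAL, v2 REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("2013-02-25 08:18:56", 6, 147),
    ("2013-02-25 08:18:57", 7, 148),
    ("2013-02-25 09:00:00", 8, 149),
    ("2013-02-25 09:30:00", 9, 150),
])

# Bucket rows by calendar hour, order each bucket by time, keep the first row.
hourly = conn.execute("""
    SELECT ts, v1, v2
    FROM (SELECT r.*,
                 ROW_NUMBER() OVER (PARTITION BY strftime('%Y-%m-%d %H', ts)
                                    ORDER BY ts) AS seqnum
          FROM readings r)
    WHERE seqnum = 1
    ORDER BY ts
""").fetchall()

for row in hourly:
    print(row)
```

Without an explicit order inside the window, which row counts as "first" in each hour would be arbitrary.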

Return min date and corresponding amount to that distinct ID

Afternoon
I am trying to return the min value/ max values in SQL Server 2005 when I have multiple dates that are the same but the values in the Owed column are all different. I've already filtered the table down by my select statement into a temp table for a different query, when I've then tried to mirror I have all the duplicated dates that you can see below.
I now have a table that looks like:
ID| Date |Owes
-----------------
1 20110901 89
1 20110901 179
1 20110901 101
1 20110901 197
1 20110901 510
2 20111001 10
2 20111001 211
2 20111001 214
2 20111001 669
My current query:
Drop Table #Temp
Select Distinct Convert(Varchar(8), DateAdd(dd, Datediff(DD,0,DateDue),0),112)as Date
,ID
,Paid
Into #Temp
From Table
Where Paid <> '0'
Select ,Id
,Date
,Max(Owed)
,Min(Owed)
From #Temp
Group by ID, Date, Paid
Order By ID, Date, Paid
This doesn't strip out any of my duplicated dates. I'm presuming it's because my Owed column has different values. I basically want to be able to pull back the first record, as this will always be my minimum owed, and my last record, which will always be my maximum owed, to work out my total owed by ID.
I'm new to SQL, so I would like to understand what I've done wrong, for my future knowledge of structuring queries.
Many Thanks
In your "select into" statement, you don't have an Owed column?
GROUP BY is the normal way you "strip out values that are the same". If you group by ID and Date, you will get one row in your result for each distinct pair of values in those two columns. Each row in the results represents ALL the rows in the underlying table, and aggregate functions like MIN, MAX, etc. can pull out values.
SELECT id, date, MAX(owes) as MaxOwes, MIN(owes) as minOwes
FROM myFavoriteTable
GROUP BY id, date
In SQL Server 2005 there are "windowing functions" that allow you to use aggregate functions on groups of records, without grouping. An example below. You will get one row for each row in the table:
SELECT id, date, owes,
MAX(Owes) over (PARTITION BY date, id) AS MaxOwes,
MIN(Owes) over (PARTITION BY date, id) AS MinOwes
FROM myfavoriteTable
If you name a column "MinOwes" it might sound like you're just fishing tho.
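To make the difference between the two queries concrete, here is a sketch that runs both on the sample rows from the question, using Python's built-in sqlite3 module (assuming SQLite 3.25 or newer for window functions). The table name `owed` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owed (id INTEGER, date TEXT, owes INTEGER)")
conn.executemany("INSERT INTO owed VALUES (?, ?, ?)", [
    (1, "20110901", 89), (1, "20110901", 179), (1, "20110901", 101),
    (1, "20110901", 197), (1, "20110901", 510),
    (2, "20111001", 10), (2, "20111001", 211),
    (2, "20111001", 214), (2, "20111001", 669),
])

# GROUP BY collapses each (id, date) pair into a single row.
grouped = conn.execute("""
    SELECT id, date, MAX(owes) AS MaxOwes, MIN(owes) AS MinOwes
    FROM owed
    GROUP BY id, date
    ORDER BY id
""").fetchall()

# The window version keeps every row and repeats the group's max/min on each.
windowed = conn.execute("""
    SELECT id, date, owes,
           MAX(owes) OVER (PARTITION BY date, id) AS MaxOwes,
           MIN(owes) OVER (PARTITION BY date, id) AS MinOwes
    FROM owed
""").fetchall()

print(grouped)
print(len(windowed))
```

The grouped query returns two rows, one per ID and date, while the windowed query returns all nine input rows.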
If you want to group by date you can't also group by ID, too, because ID is probably unique. Try:
Select Date
,Min(Owed) AS min_owed
,Max(Owed) AS max_owed
From #Temp
Group by Date
Order By Date
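To get additional values from the row (your question is a bit vague there), you could use window functions. Note that first_value requires SQL Server 2012 or later, and that last_value with the default window frame only looks as far as the current row, so the maximum is taken here with first_value over a descending order:
SELECT DISTINCT
Date
,first_value(ID) OVER (PARTITION BY Date ORDER BY Owed) AS min_owed_ID
,first_value(ID) OVER (PARTITION BY Date ORDER BY Owed DESC) AS max_owed_ID
,first_value(Owed) OVER (PARTITION BY Date ORDER BY Owed) AS min_owed
,first_value(Owed) OVER (PARTITION BY Date ORDER BY Owed DESC) AS max_owed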
FROM #Temp
ORDER BY Date;