How can I convert 1 record with a start and end date into multiple records for each day in DolphinDB?

So I have a table with the following columns:
For each record in the above table (e.g., stock A with an ENTRY_DT of 2011.08.22 and a REMOVE_DT of 2011.09.03), I'd like to replicate it for each day between the start and end dates (excluding weekends). The converted records keep the same values of the fields S_INFO_WINDCODE and SW_IND_CODE as the original record.
Table after conversion should look like this:
(only records of stock A are shown)

As the data volume is not large, you can process each record with cj (cross join), then use the unionAll function to combine the per-record results into the output table.
The table:
t = table(`A`B`C as S_INFO_WINDCODE, `6112010200`6112010200`6112010200 as SW_IND_CODE, 2011.08.22 1998.11.11 1999.05.27 as ENTRY_DT, 2011.09.03 2010.10.08 2011.09.30 as REMOVE_DT)
Solution:
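def f(t, i) {
    // fields of the i-th record (rows are indexed from 0)
    windCode = t[i][`S_INFO_WINDCODE]
    code = t[i][`SW_IND_CODE]
    entryDate = t[i][`ENTRY_DT]
    removeDate = t[i][`REMOVE_DT]
    // every calendar day from ENTRY_DT to REMOVE_DT, inclusive
    days = entryDate..removeDate
    // keep Monday to Friday; weekday() returns 0 for Sunday and 6 for Saturday
    days = days[weekday(days) between 1:5]
    return cj(table(windCode as S_INFO_WINDCODE, code as SW_IND_CODE), table(days as DT))
}
// apply f to every row index, then stack the per-record tables
unionAll(each(f{t}, 0..(size(t) - 1)), false)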
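For readers who want to cross-check the expansion outside DolphinDB, here is a minimal Python sketch of the same idea: walk the days from ENTRY_DT to REMOVE_DT and keep only Monday to Friday. The function name `expand_weekdays` and the dict-based record are assumptions made for illustration, not part of the original answer.

```python
from datetime import date, timedelta

def expand_weekdays(record):
    """Return one row per weekday between ENTRY_DT and REMOVE_DT, inclusive."""
    rows = []
    day = record["ENTRY_DT"]
    while day <= record["REMOVE_DT"]:
        # date.weekday() returns 0 for Monday and 6 for Sunday, so 0-4 are weekdays
        if day.weekday() < 5:
            rows.append({
                "S_INFO_WINDCODE": record["S_INFO_WINDCODE"],
                "SW_IND_CODE": record["SW_IND_CODE"],
                "DT": day,
            })
        day += timedelta(days=1)
    return rows

# Stock A from the example table
record_a = {
    "S_INFO_WINDCODE": "A",
    "SW_IND_CODE": "6112010200",
    "ENTRY_DT": date(2011, 8, 22),
    "REMOVE_DT": date(2011, 9, 3),
}
rows_a = expand_weekdays(record_a)
```

Note that Python numbers weekdays from Monday = 0, whereas the DolphinDB filter above relies on Sunday = 0, so the two conditions look different but select the same days.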

Related

SQL Server: Selecting Specific Records From a Table with Duplicate Records (Excluding Stale Data from a Query)

I'm trying to put together a query (select preferably) in SQL server that works with a single table. Said table is derived from two sets of data. Records where SET = OLD represent old data, records where SET = NEW represent new data. My intention is as follows:
If record CODE = A, keep/include the record.
If record CODE = C, keep/include the record but delete/exclude the corresponding record from the old set under the same ACT value.
If record CODE = D, delete/exclude it along with its corresponding record from the old set under the same ACT value.
If CODE = '' (blank/null), keep the record, but only if it exists solely in the OLD set (meaning there isn't a corresponding record from the new set with the same ACT value).
What the table looks like before logic is applied:
ACT|STATUS |CODE|SET|VALUE
222| | |OLD|1
333| | |OLD|2
444| | |OLD|3
111|ADDED |A |NEW|4
222|CHANGED|C |NEW|5
333|DELETED|D |NEW|6
What the table should look like after logic is applied (end result)
ACT|STATUS |CODE|SET|VALUE
444| | |OLD|3
111|ADDED |A |NEW|4
222|CHANGED|C |NEW|5
While I can probably put together a select query to achieve the end result above, I doubt it will run efficiently, as the table in question has millions of records. What is the best way to do this without taking a long time to obtain the end result?
Something like this should work. You will have to split the query into two parts and combine them with UNION.
--Old Dataset
SELECT O.*
FROM MyTable O
LEFT JOIN MyTable N ON O.ACT = N.ACT AND N.[SET] = 'NEW'
WHERE O.[SET] ='OLD'
AND ISNULL(N.CODE,'A') = 'A'
UNION
-- New records
SELECT N.*
FROM MyTable N
WHERE N.[SET] ='NEW'
AND CODE <> 'D'
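To sanity-check the logic of the two branches against the sample rows, here is a small in-memory sketch in Python. It is not a replacement for the query; the function name `apply_rules` and the list-of-dicts layout are hypothetical, and it assumes at most one NEW row per ACT value.

```python
rows = [
    {"ACT": 222, "STATUS": "", "CODE": "", "SET": "OLD", "VALUE": 1},
    {"ACT": 333, "STATUS": "", "CODE": "", "SET": "OLD", "VALUE": 2},
    {"ACT": 444, "STATUS": "", "CODE": "", "SET": "OLD", "VALUE": 3},
    {"ACT": 111, "STATUS": "ADDED", "CODE": "A", "SET": "NEW", "VALUE": 4},
    {"ACT": 222, "STATUS": "CHANGED", "CODE": "C", "SET": "NEW", "VALUE": 5},
    {"ACT": 333, "STATUS": "DELETED", "CODE": "D", "SET": "NEW", "VALUE": 6},
]

def apply_rules(rows):
    # CODE of the NEW row for each ACT (assumes at most one NEW row per ACT)
    new_code = {r["ACT"]: r["CODE"] for r in rows if r["SET"] == "NEW"}
    kept = []
    for r in rows:
        if r["SET"] == "OLD":
            # Mirrors ISNULL(N.CODE, 'A') = 'A': keep an OLD row when no NEW row
            # shares its ACT, or when that NEW row has CODE 'A'
            if new_code.get(r["ACT"], "A") == "A":
                kept.append(r)
        elif r["CODE"] != "D":
            # Mirrors the second branch: every NEW row except deletions
            kept.append(r)
    return kept

result = apply_rules(rows)
```

On the sample data this keeps ACT 444 from the old set and ACT 111 and 222 from the new set, which matches the expected end result in the question.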

SQL - How to get rows within a date period that are within another date period?

I have the following table in the DDBB:
On the other side, I have an interface with start and end filter parameters.
I want to understand how to query the table to get only the rows whose period falls within the values entered by the user.
Below are the 3 possible scenarios. If I need to create one query per scenario, that is OK:
Scenario 1: If the user only defines start = 03/01/2021, then the expected output should be the rows with id 3, 5 and 6.
Scenario 2: If the user only defines end = 03/01/2021, then the expected output should be the rows with id 1 and 2.
Scenario 3: If the user defines start = 03/01/2021 and end = 05/01/2021, then the expected output should be the rows with id 3 and 5.
Hope that makes sense.
Thanks
I will assume that start_date and end_date here are DateFields [Django-doc], and that you have a dictionary with 'start' and 'end' as (optional) keys that map to date objects, so a possible dictionary could be:
# scenario 3
from datetime import date
data = {
'start': date(2021, 1, 3),
'end': date(2021, 1, 5)
}
If you do not want to filter on start and/or end, then either the key is not in the dictionary data, or it maps to None.
You can make a filter with:
filtr = {
    lu: data[ky]
    for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
    if data.get(ky)
}
result = MyModel.objects.filter(**filtr)
This will then filter the MyModel objects to only retrieve MyModels where the start_date and end_date are within bounds.
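The filter-building step can be tried without a Django project. The sketch below wraps the same dictionary comprehension in a helper (the name `build_filter` is an assumption for illustration) and shows what it produces for each of the three scenarios.

```python
from datetime import date

# (key in the user-supplied dict, Django field lookup it maps to)
LOOKUPS = (("start", "start_date__gte"), ("end", "end_date__lte"))

def build_filter(data):
    """Return keyword arguments for .filter(), skipping missing or None bounds."""
    return {lookup: data[key] for key, lookup in LOOKUPS if data.get(key)}

scenario_1 = build_filter({"start": date(2021, 1, 3)})
scenario_2 = build_filter({"start": None, "end": date(2021, 1, 3)})
scenario_3 = build_filter({"start": date(2021, 1, 3), "end": date(2021, 1, 5)})
```

In a Django view the result would then be passed on as `MyModel.objects.filter(**build_filter(data))`; an empty dictionary simply applies no date restriction.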

How to delete duplicates data that is in between two common value?

How can I delete duplicate data that lies between two common values (Start and End)?
(Time is unique key)
My table is:
Time    |Data
10:24:11|Start
10:24:12|Result
10:24:13|Result
10:24:14|End
10:24:15|Start
10:24:16|Result
10:24:17|End
When Result is duplicated between a Start and an End, I want to keep only the Result row with the MAX(Time).
The result that I want:
Time    |Data
10:24:11|Start
10:24:13|Result
10:24:14|End
10:24:15|Start
10:24:16|Result
10:24:17|End
I have tried rearranging the data, but couldn't get the result that I want. Could someone give their advice on this case?
Update
I ended up not using either of the approaches suggested by #fredt and #airliquide, as my version of HSQLDB doesn't support the function.
So what I did was add a Sequence column and an Indicator column, with Start = 1, Result = 2, and End = 3.
Sequence|Time    |Data  |Indicator
1       |10:24:11|Start |1
2       |10:24:12|Result|2
3       |10:24:13|Result|2
4       |10:24:14|End   |3
5       |10:24:15|Start |1
6       |10:24:16|Result|2
7       |10:24:17|End   |3
From there, I use the indicator and sequence to keep only the latest Result: if the row before a Result also has indicator 2 (Result), that earlier row is removed.
The guide that I follow:
From: Is there a way to access the "previous row" value in a SELECT statement?
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
Hi, a first approach would be to use the lead window function, as follows:
select hour, status
from (
    select *, lead(status, 1) over (order by hour) as lead
    from newtable
) compare
where compare.lead <> status
   or compare.lead is null
This gives the expected result on PostgreSQL.
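The comparison that the lead query performs can be illustrated with a short Python sketch: each row is kept only if the next row (in time order) holds a different value, or if there is no next row. The function name `drop_superseded` and the tuple layout are assumptions for illustration.

```python
rows = [
    ("10:24:11", "Start"),
    ("10:24:12", "Result"),
    ("10:24:13", "Result"),
    ("10:24:14", "End"),
    ("10:24:15", "Start"),
    ("10:24:16", "Result"),
    ("10:24:17", "End"),
]

def drop_superseded(rows):
    # Zero-padded HH:MM:SS strings sort chronologically
    ordered = sorted(rows)
    kept = []
    for i, (time, data) in enumerate(ordered):
        # Value of the following row, or None for the last row
        next_data = ordered[i + 1][1] if i + 1 < len(ordered) else None
        # Same test as "lead <> status OR lead IS NULL"
        if next_data != data:
            kept.append((time, data))
    return kept

deduplicated = drop_superseded(rows)
```

This removes any row that is immediately followed by a row with the same value, not only Result rows, which is equivalent for the sample data because only Result repeats.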
You can do this sort of thing with SQL procedures.
-- create the table with only two columns
CREATE TABLE actions (attime TIME UNIQUE, data VARCHAR(10));
-- drop the procedure if it exists
DROP PROCEDURE del_duplicates IF EXISTS;
create procedure del_duplicates() MODIFIES SQL DATA begin atomic
    DECLARE last_time time(0) default null;
    for_loop:
    -- loop over the rows in order
    FOR SELECT * FROM actions ORDER BY attime DO
        -- each time 'Start' is found, clear the last_time variable
        IF data = 'Start' THEN
            SET last_time = NULL;
            ITERATE for_loop;
        END IF;
        -- each time 'Result' is found, delete the row with previous time
        -- if last_time is null, no row is actually deleted
        IF data = 'Result' THEN
            DELETE FROM actions WHERE attime = last_time;
            -- then store the latest time
            SET last_time = attime;
            ITERATE for_loop;
        END IF;
    END FOR;
END
Your data must all belong to a single day, otherwise there will be strange overlaps that cannot be distinguished. It is better to use TIMESTAMP instead of TIME.

How to make operations on rows that resulted from a self join query?

I have a table containing many rows of financial data. The columns are as follows:
Unixtime,open,high,low,close,timeframe,sourceId.
Given two assets with same timeframe but different sourceId, how to show a table which has
unixtime, Asset1open/asset2open,Asset1close/asset2close as columns?
Every resulting row should be the result of prices that have the same unixtime, and should be ordered by unixtime asc order.
How to do it with a self join?
You don't mention the specific database, so I'll assume this is for Sybase.
You can do:
select
a.unixtime,
a.open / b.open,
a.close / b.close
from t a
join t b on a.unixtime = b.unixtime and a.timeframe = b.timeframe
where a.sourceid = 123
and b.sourceid = 456
order by a.unixtime
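As an illustration of what the self join computes, here is a Python sketch that pairs rows of two sources on (unixtime, timeframe) and divides their prices. The sample rows, the source ids 123 and 456, and the function name `price_ratios` are hypothetical. Python's `/` is true division; in some databases, dividing two integer columns truncates, so a cast may be needed there.

```python
def price_ratios(rows, source_a, source_b):
    # Index asset B's rows by the join key (unixtime, timeframe)
    b_rows = {
        (r["unixtime"], r["timeframe"]): r
        for r in rows
        if r["sourceId"] == source_b
    }
    result = []
    for a in rows:
        if a["sourceId"] != source_a:
            continue
        b = b_rows.get((a["unixtime"], a["timeframe"]))
        if b is not None:  # inner join: skip timestamps missing for asset B
            result.append((a["unixtime"], a["open"] / b["open"], a["close"] / b["close"]))
    # Order by unixtime ascending, as in the query
    return sorted(result)

sample = [
    {"unixtime": 200, "open": 30.0, "close": 8.0, "timeframe": "1h", "sourceId": 123},
    {"unixtime": 100, "open": 10.0, "close": 12.0, "timeframe": "1h", "sourceId": 123},
    {"unixtime": 100, "open": 5.0, "close": 4.0, "timeframe": "1h", "sourceId": 456},
    {"unixtime": 200, "open": 10.0, "close": 2.0, "timeframe": "1h", "sourceId": 456},
    {"unixtime": 300, "open": 7.0, "close": 7.0, "timeframe": "1h", "sourceId": 123},
]
ratios = price_ratios(sample, 123, 456)
```

The row at unixtime 300 has no counterpart for the second source, so it is left out, just as the inner join would leave it out.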

SQL query to return nil for dates not present in the table

I have a table 'my_table'. It has the following data :
ID --- Date
1 --- 01/30/2012
2 --- 01/30/2012
3 --- 05/30/2012
I can write a SQL query to return the count of ID's between certain dates, grouped by month, like this :
{"01/30/2012" => 2, "05/30/2012" => 1}
How can I get a result which has all the missing months between the requested dates with value '0', like this :
{"01/30/2012" => 2, "02/30/2012" => 0, "03/30/2012" => 0, "04/30/2012" => 0, "05/30/2012" => 1}
Thanks in advance.
The way I do it is to have a static table with a list of all the dates. In your case that's the 30th of each month (what about February?). Let's call this table REF_DATE. It has a single column DT that holds the date.
Assuming that my_table contains at most one distinct date (the 30th) in each month, what you need to do is:
select REF.DT, count(MT.ID)
from REF_DATE REF
left outer join my_table MT
on REF.DT = MT.DATE
group by REF.DT;
I came up with a somewhat hackish way to do this in Rails:
class Month < Date # for getting months in range
  def succ
    self >> 1
  end
end
range = Month.new(2010,1,1)..Month.new(2013,1,1) # range of dates to query
months = Hash.new
range.each do |month|
  months.merge!({month.to_s => 0}) # get all months as per range requirement of project
end
db_months = MyTable.find_all_by_date(range).group_by{ |u| u.date.beginning_of_month }.map{|m,v| [m.to_s , v.size]} # get all records grouped by months
all_months = months.merge(Hash[db_months]) # merge all missing months
Replace the range with the dates you want, and adjust the date format to suit your requirements.
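Both answers come down to generating every month in the requested range and defaulting the missing ones to zero. A minimal Python sketch of that idea follows; the function name `monthly_counts` and the "YYYY-MM" key format are assumptions for illustration, not what either answer returns.

```python
from collections import Counter
from datetime import date

def monthly_counts(dates, start, end):
    """Count dates per month from start's month to end's month, inclusive."""
    counts = Counter((d.year, d.month) for d in dates)
    result = {}
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Months with no rows fall back to 0
        result[f"{year:04d}-{month:02d}"] = counts.get((year, month), 0)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result

# Dates from the example table
table_dates = [date(2012, 1, 30), date(2012, 1, 30), date(2012, 5, 30)]
filled = monthly_counts(table_dates, date(2012, 1, 1), date(2012, 5, 31))
```

Keying by year and month avoids the question of which day represents a month, such as the 30th of February raised above.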