SQL - How to get rows within a date period that are within another date period?

I have the following table in the database:
On the other side, I have an interface with start and end filter parameters.
I want to understand how to query the table so that I only get the rows whose period falls within the values introduced by the user.
Below are the 3 possible scenarios. If I need to create one query per scenario, that is OK:
Scenario 1: If the user only defines start = 03/01/2021, then the expected output should be rows with id 3, 5 and 6.
Scenario 2: If the user only defines end = 03/01/2021, then the expected output should be rows with id 1 and 2.
Scenario 3: If the user defines start = 03/01/2021 and end = 05/01/2021, then the expected output should be rows with id 3 and 5.
Hope that makes sense.
Thanks

I will assume that start_date and end_date here are DateFields [Django-doc], and that you have a dictionary with 'start' and 'end' as (optional) keys, and that these map to date objects, so a possible dictionary could be:
# scenario 3
from datetime import date
data = {
    'start': date(2021, 1, 3),
    'end': date(2021, 1, 5)
}
If you do not want to filter on start and/or end, then either the key is not in the dictionary data, or it maps to None.
You can make a filter with:
filtr = {
    lu: data[ky]
    for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
    if data.get(ky)
}
result = MyModel.objects.filter(**filtr)
This will then filter the MyModel objects to only retrieve MyModels where the start_date and end_date are within bounds.
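For illustration, this is roughly what that dictionary comprehension produces for the scenarios above (a small sketch; the build_filter helper is just the comprehension wrapped in a function, and MyModel, start_date and end_date are the names assumed in this answer):
from datetime import date

def build_filter(data):
    # same dict comprehension as above, wrapped in a function for reuse
    return {
        lu: data[ky]
        for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
        if data.get(ky)
    }

# Scenario 1: only a start date is given
print(build_filter({'start': date(2021, 1, 3)}))
# {'start_date__gte': datetime.date(2021, 1, 3)}

# Scenario 3: both bounds are given
print(build_filter({'start': date(2021, 1, 3), 'end': date(2021, 1, 5)}))
# {'start_date__gte': datetime.date(2021, 1, 3), 'end_date__lte': datetime.date(2021, 1, 5)}
If neither key is usable, filtr is empty and MyModel.objects.filter() simply returns all objects, so the same query works for every scenario.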

Related

How can I compare dates from two dataframes with an if statement?

Table 1 contains the history of all the employee information but only captures the data every 90 days. Table 2 contains the current information of all employees and is updated weekly with a timestamp.
Table 1 gets appended with Table 2 every 90 days. I figured that by taking the timestamp in Table 1, adding 90 days to it, and comparing it to the timestamp in Table 2, I could use the logic below to execute the append, but I'm getting an error:
TypeError: '<' not supported between instances of 'DataFrame' and 'DataFrame'
Am I missing something?
# Let's say the max date in table 1 is 2023-01-15. Adding 90 days would put us on 2023-04-15
futr_date = spark.sql('SELECT date_add(MAX(tm_update), 90) AS future_date FROM tbl_one')
# Checking the date in the weekly refresh table, I have a timestamp of 2023-02-03
curr_date = spark.sql('SELECT DISTINCT tm_update AS current_date FROM tbl_two')
if curr_date > futr_date:
    print('execute block of code that transforms table 2 data and append to table 1')
else:
    print('ignore and check again next week')
The SELECT statement is not returning a value but a DataFrame, and that's why you are getting the error. If you want to get a value, you need to collect it:
futr_date = spark.sql('SELECT date_add(MAX(tm_update), 90) AS future_date FROM tbl_one').collect()[0]
In the second SQL you are using DISTINCT to get the date, which may return a list of values; I am not sure if that's what you want. Maybe you should use MIN here? With only one timestamp value it may not matter, but with more values it may cause issues.
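If you would rather compare plain date values than Row objects, a possible variant is to index into the first collected Row by column name (the cast to date mirrors the example below):
futr_date = spark.sql(
    'SELECT date_add(MAX(tm_update), 90) AS future_date FROM tbl_one'
).collect()[0]['future_date']
curr_date = spark.sql(
    'SELECT cast(MIN(tm_update) as date) AS current_date FROM tbl_two'
).collect()[0]['current_date']
# both are now plain datetime.date values, so curr_date > futr_date compares the dates directly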
As I said, I am not sure if your logic is correct, but here is a working example which you can use as a base for further changes:
import time
import pyspark.sql.functions as F

historicalData = [
    (1, time.mktime(time.strptime("24/10/2022", "%d/%m/%Y"))),
    (2, time.mktime(time.strptime("15/01/2023", "%d/%m/%Y"))),
    (3, time.mktime(time.strptime("04/11/2022", "%d/%m/%Y"))),
]
currentData = [
    (1, time.mktime(time.strptime("01/02/2023", "%d/%m/%Y"))),
    (2, time.mktime(time.strptime("02/02/2023", "%d/%m/%Y"))),
    (3, time.mktime(time.strptime("03/02/2023", "%d/%m/%Y"))),
]
oldDf = spark.createDataFrame(historicalData, schema=["id", "tm_update"]).withColumn(
    "tm_update", F.to_timestamp("tm_update")
)
oldDf.createOrReplaceTempView("tbl_one")
currentDf = spark.createDataFrame(currentData, schema=["id", "tm_update"]).withColumn(
    "tm_update", F.to_timestamp("tm_update")
)
currentDf.createOrReplaceTempView("tbl_two")
futr_date = spark.sql(
    "SELECT date_add(MAX(tm_update), 90) AS future_date FROM tbl_one"
).collect()[0]
curr_date = spark.sql(
    "SELECT cast(MIN(tm_update) as date) AS current_date FROM tbl_two"
).collect()[0]
print(futr_date)
print(curr_date)
if curr_date > futr_date:
    print("execute block of code that transforms table 2 data and append to table 1")
else:
    print("ignore and check again next week")
Output
Row(future_date=datetime.date(2023, 4, 15))
Row(current_date=datetime.date(2023, 2, 3))
ignore and check again next week

How can I convert 1 record with a start and end date into multiple records for each day in DolphinDB?

So I have a table with the following columns:
For each record in the above table (e.g., stock A with an ENTRY_DT of 2011.08.22 and a REMOVE_DT of 2011.09.03), I'd like to replicate it for each day between the start and end date (excluding weekends). The converted records keep the same values of the fields S_INFO_WINDCODE and SW_IND_CODE as the original record.
Table after conversion should look like this:
(only records of stock A are shown)
As the data volume is not large, you can process each record with cj (cross join), then use the function unionAll to combine all records into the output table.
The table:
t = table(`A`B`C as S_INFO_WINDCODE, `6112010200`6112010200`6112010200 as SW_IND_CODE, 2011.08.22 1998.11.11 1999.05.27 as ENTRY_DT, 2011.09.03 2010.10.08 2011.09.30 as REMOVE_DT)
Solution:
def f(t, i) {
    windCode = t[i][`S_INFO_WINDCODE]
    code = t[i][`SW_IND_CODE]
    entryDate = t[i][`ENTRY_DT]
    removeDate = t[i][`REMOVE_DT]
    days = entryDate..removeDate
    days = days[weekday(days) between 1:5]
    return cj(table(windCode as S_INFO_WINDCODE, code as SW_IND_CODE), table(days as DT))
}
unionAll(each(f{t}, 1..size(t) - 1), false)

Select last unique polymorphic objects ordered by created at in Rails

I'm trying to get unique polymorphic objects by the value of one of the columns. I'm using Postgres.
The object has the following properties: id, available_type, available_id, value, created_at, updated_at.
I'm looking to get the most recent object per available_id (recency determined by created_at) for records with the available_type of "User".
I've been trying ActiveRecord queries like this:
Service.where(available_type: "User").order(created_at: :desc).distinct(:available_id)
But it isn't limiting to one per available_id.
Try
Service.where(id: Service
  .where(available_type: "User")
  .group(:available_id)
  .maximum(:id).values)
Using a left join is probably going to be your most efficient way.
The following SQL selects only rows where there is no row with a larger created_at.
See this post for more info: https://stackoverflow.com/a/27802817/5301717
query = <<-SQL
  SELECT m.*                              -- get the row that contains the max value
  FROM services m                         -- "m" from "max"
  LEFT JOIN services b                    -- "b" from "bigger"
    ON m.available_id = b.available_id    -- match "max" row with "bigger" row by available_id
    AND m.available_type = b.available_type
    AND m.created_at < b.created_at       -- want "bigger" than "max"
  WHERE b.created_at IS NULL              -- keep only if there is no row bigger than "max"
    AND m.available_type = 'User'
SQL
Service.find_by_sql(query)
distinct doesn't take a column name as an argument, only true/false.
distinct is for returning only distinct records and has nothing to do with filtering for a specific value.
if you need a specific available_id, you need to use where
e.g.
Service.distinct.where(available_type: "User").where(available_id: YOUR_ID_HERE).order(created_at: :desc)
to only get the most recent add limit
Service.distinct.where(available_type: "User").where(available_id: YOUR_ID_HERE).order(created_at: :desc).limit(1)
If you need to get the most recent record for each distinct available_id, that will require a loop.
First get the distinct polymorphic values by only selecting the columns that need to be distinct with select:
available_ids = Service.distinct.select(:available_id).where(available_type: 'User')
then get the most recent of each id:
recents = []
available_ids.each do |record|
  recents << Service.where(available_id: record.available_id).where(available_type: 'User').order(created_at: :desc).limit(1)
end

Take MIN EFF_DT and MAX CANC_DT from data in PIG

Schema :
TYP|ID|RECORD|SEX|EFF_DT|CANC_DT
DMF|1234567|98765432|M|2011-08-30|9999-12-31
DMF|1234567|98765432|M|2011-04-30|9999-12-31
DMF|1234567|98765432|M|2011-04-30|9999-12-31
Suppose I have multiple records like this. I only want to display records that have the minimum eff_dt and maximum cancel date.
I want to display just this one record:
DMF|1234567|98765432|M|2011-04-30|9999-12-31
Thank you
Get the min eff_dt and max canc_dt and use them to filter the relation. Assuming you have a relation A:
B = GROUP A ALL;
X = FOREACH B GENERATE MIN(A.EFF_DT);
Y = FOREACH B GENERATE MAX(A.CANC_DT);
C = FILTER A BY ((EFF_DT == X.$0) AND (CANC_DT == Y.$0));
D = DISTINCT C;
DUMP D;
Let's say you have this data (sample here):
DMF|1234567|98765432|M|2011-08-30|9999-12-31
DMF|1234567|98765432|M|2011-04-30|9999-12-31
DMF|1234567|98765432|M|2011-04-30|9999-12-31
DMX|1234567|98765432|M|2011-12-30|9999-12-31
DMX|1234567|98765432|M|2011-04-30|9999-12-31
DMX|1234567|98765432|M|2011-04-01|9999-12-31
Perform these steps:
-- 1. Read data, if you have not
A = load 'data.txt' using PigStorage('|') as (typ: chararray, id:chararray, record:chararray, sex:chararray, eff_dt:datetime, canc_dt:datetime);
-- 2. Group data by the attribute you like to, in this case it is TYP
grouped = group A by typ;
-- 3. Now, generate MIN/MAX for each group. Also, only keep relevant fields
min_max = foreach grouped generate group, MIN(A.eff_dt) as min_eff_dt, MAX(A.canc_dt) as max_canc_dt;
--
dump min_max;
(DMF,2011-04-30T00:00:00.000Z,9999-12-31T00:00:00.000Z)
(DMX,2011-04-01T00:00:00.000Z,9999-12-31T00:00:00.000Z)
If you need to, change datetime to chararray.
Note: there are different ways of doing this; what I am showing, excluding the load step, produces the desired result in 2 steps: GROUP and FOREACH.

SQL query to return nil for dates not present in the table

I have a table 'my_table'. It has the following data :
ID --- Date
1 --- 01/30/2012
2 --- 01/30/2012
3 --- 05/30/2012
I can write a SQL query to return the count of IDs between certain dates, grouped by month, like this:
{"01/30/2012" => 2, "05/30/2012" => 1}
How can I get a result which has all the missing months between the requested dates with value '0', like this:
{"01/30/2012" => 2, "02/30/2012" => 0, "03/30/2012" => 0, "04/30/2012" => 0, "05/30/2012" => 1}
Thanks in advance.
The way I do it is to have a static table with a list of all the dates. In your case that's the 30th of each month (what about February?). Let's call this table REF_DATE. It has a single column DT that holds the date.
Assuming that my_table only contains 0 or at most 1 distinct date (30th) in each month, what you need to do is:
select REF.DT, count(MT.ID)
from REF_DATE REF
left outer join my_table MT
  on REF.DT = MT.DATE
group by REF.DT;
I came up with a somewhat hackish way through Rails:
class Month < Date # for getting months in range
  def succ
    self >> 1
  end
end

range = Month.new(2010, 1, 1)..Month.new(2013, 1, 1) # range of dates to query
months = Hash.new
range.each do |month|
  months.merge!({ month.to_s => 0 }) # get all months as per range requirement of project
end
db_months = MyTable.find_all_by_date(range).group_by { |u| u.date.beginning_of_month }.map { |m, v| [m.to_s, v.size] } # get all records grouped by months
all_months = months.merge(Hash[db_months]) # merge all missing months
Replace the range with the dates you want, and adjust the date format as per your requirements.