Due date estimating - SQL

My insurance_pay_dtl table (insurance payment detail) contains an ins_paid_dt (insurance paid date) column.
I need to select all members who have not paid the insurance amount before the due date.
The due date is 1 year (365 days).
How do I do this?

You need to join the insurance_pay_dtl table to insurance_farmer_hdr on its primary key and compare the payment date with the due date, for example:
Select member_id, member_name from insurance_farmer_hdr ifd, insurance_pay_dtl ipd
where ifd.insurance_rec_id = ipd.insurance_rec_id
and trunc(sysdate) > ifd.due_date
and (ipd.ins_paid_dt is null or ipd.ins_paid_dt > ifd.due_date)
Change the columns in the above query as per your table columns and try.
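To make the join-plus-due-date check concrete, here is a minimal runnable sketch using SQLite from Python. The table and column names follow the thread; the sample rows and the exact overdue rule (no payment recorded, or payment after due_date) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE insurance_farmer_hdr (
    insurance_rec_id INTEGER PRIMARY KEY,
    member_id        INTEGER,
    member_name      TEXT,
    due_date         TEXT   -- e.g. policy start date + 365 days
);
CREATE TABLE insurance_pay_dtl (
    insurance_rec_id INTEGER,
    ins_paid_dt      TEXT   -- NULL if no payment was recorded
);
""")
conn.executemany(
    "INSERT INTO insurance_farmer_hdr VALUES (?, ?, ?, ?)",
    [(1, 101, "Asha", "2023-06-30"),   # pays on time (see below)
     (2, 102, "Ravi", "2023-06-30")],  # pays late
)
conn.executemany(
    "INSERT INTO insurance_pay_dtl VALUES (?, ?)",
    [(1, "2023-05-15"), (2, "2023-08-01")],
)

# Members who never paid, or paid only after the due date.
rows = conn.execute("""
    SELECT h.member_id, h.member_name
    FROM insurance_farmer_hdr h
    LEFT JOIN insurance_pay_dtl p
           ON p.insurance_rec_id = h.insurance_rec_id
    WHERE p.ins_paid_dt IS NULL
       OR p.ins_paid_dt > h.due_date
""").fetchall()
print(rows)   # [(102, 'Ravi')]
```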

Related

Date Range Nested Loop Optimization

Redshift DB.
Table A is a date/calendar table.
Table B is a member table, structured as a slowly changing dimension (type 6). It has nearly 200 M records.
The goal is to write a performant query that gives the count of members for every day in the last 4 years. My first attempt resulted in a query like so:
select
date,
location,
sub_location,
race,
gender,
dob,
member_type,
count(distinct member_id)
from date_table d
join member_table m
on m.row_start <= d.full_date
and m.row_end >= d.full_date
and m.is_active = 'Y'
and m.row_end >= '2019-01-01'
where d.date_key >= 20190101
and d.date_key <= to_char(current_date, 'yyyymmdd')
group by
date,
location,
sub_location,
race,
gender,
dob,
member_type
The performance on this is god awful because of the join being a nested loop. I've been trying to think of a way to rework this to avoid that issue but have not had any success. Curious if there is a way to do so that would increase performance significantly.
For reference here are the table designs as well as the explain plan:
create table date_table
(
date_key integer not null encode delta
primary key,
full_date date encode delta
)
diststyle all
sortkey (date_key);
create table member_table
(
member_key bigint not null
primary key,
member_id integer,
location integer distkey,
sub_location integer encode zstd,
gender varchar(50) encode zstd,
race varchar(100) encode zstd,
date_of_birth date encode delta32k,
member_type char(10) encode zstd,
active char encode zstd,
row_start timestamp encode zstd,
row_end timestamp encode zstd
)
diststyle key
interleaved sortkey (location, member_id);
[execution plan screenshot omitted]
I've rewritten the query in various ways, none of which meaningfully impacted performance.
The output should be
Date, member attributes, count of records
You're in luck as I solved this exact issue a few years back and wrote up a description on the solution. You can find it here - http://wad-design.s3-website-us-east-1.amazonaws.com/sql_limits_wp.html
The basic issue you are facing is the need to massively grow the data before you can condense it down. These fat-in-the-middle queries can be expensive on Redshift and often spill to disk making them even more expensive. The solution is to not create a row for each account for each date but rather to look at it as counting account starts by date and account ends by date - the active accounts is the difference between the rolling sums of these values.
I was able to take a client's query run time down from 45 minutes to 17 seconds using this approach.
If the approach isn't clear let me know in a comment and I can help apply this approach to your situation. It can trip people up the first time.
This approach can be used to solve other problems efficiently like joining on the nearest date.
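To illustrate the starts/ends idea without Redshift, here is a small SQLite sketch (window functions require SQLite 3.25+). The table shape and sample rows are made up for the demo: instead of joining every member row to every calendar day, emit one +1 event at row_start and one -1 event the day after row_end, then take a running sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member_table (member_id INTEGER, row_start TEXT, row_end TEXT);
INSERT INTO member_table VALUES
    (1, '2023-01-01', '2023-01-03'),
    (2, '2023-01-02', '2023-01-05'),
    (3, '2023-01-03', '2023-01-04');
""")

# Instead of joining every member row to every calendar day (fat in the
# middle), emit one +1 event at row_start and one -1 event the day after
# row_end, then take a running sum of the per-day net change.
rows = conn.execute("""
    WITH events AS (
        SELECT row_start AS d, +1 AS delta FROM member_table
        UNION ALL
        SELECT date(row_end, '+1 day'), -1 FROM member_table
    ),
    daily AS (
        SELECT d, SUM(delta) AS net FROM events GROUP BY d
    )
    SELECT d, SUM(net) OVER (ORDER BY d) AS active_members
    FROM daily
    ORDER BY d
""").fetchall()
print(rows)
# [('2023-01-01', 1), ('2023-01-02', 2), ('2023-01-03', 3),
#  ('2023-01-04', 2), ('2023-01-05', 1), ('2023-01-06', 0)]
```

The intermediate result has one row per member row plus one per end date, not one row per member per day, which is what keeps the query from growing fat in the middle.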

Multiple query data in ms access

I have a table in an accdb that consists of several columns, including a social security number, several dates, and monetary values. I am trying to query data in it (there are over 600,000 records in the accdb).
A social security number can appear once or several times in the database. The dates and the monetary values on the same line (in different columns) may or may not differ.
So let's say my table looks like this:
Ssn Date1 Date2 moneyvalue PostDate
123455 12-01-20 03-04-20 5.21 (a datetime value)
I am trying to do several things:
First, I want to select only the SSNs that appear at least twice (or more) in the database.
From those results I want to keep only the ones where date1 is equal to date2.
From those results I want to keep only the ones where there are different values in moneyvalue per SSN.
I want to compare the moneyvalue of each row to the moneyvalue of the first time that SSN appears in the database (the one with the oldest datetime in postDate), and report the SSN if the moneyvalue is different.
Is this possible? How would I go about this? I have to do this from within the MS Access SQL window; I can't export the database to MSSQL as it is protected.
So to sum it up:
I want to retrieve all SSNs that appear twice or more in the database, where date1 is equal to date2, and where the monetary value in record x does not match the monetary value in the row with the oldest postDate for that SSN.
Your question suggests aggregation and multiple having clauses:
select ssn
from mytable
group by ssn
having
count(*) > 1
and sum(iif(date1 = date2, 1, 0)) > 1
and min(moneyvalue) <> max(moneyvalue)
(Note: Access SQL does not support count(distinct ...), so min(moneyvalue) <> max(moneyvalue) is used here to test for more than one distinct value.)
Another interpretation is a where clause on the condition date1 = date2:
select ssn
from mytable
where date1 = date2
group by ssn
having
count(*) > 1
and min(moneyvalue) <> max(moneyvalue)
However, the two queries are not equivalent, and my understanding is that the first one is what you asked for.
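The grouping logic can be sanity-checked outside Access. Below is a runnable SQLite version with made-up sample rows; CASE WHEN stands in for Access's IIF, and SQLite (unlike Access) supports COUNT(DISTINCT ...) directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    ssn INTEGER, date1 TEXT, date2 TEXT, moneyvalue REAL, postdate TEXT);
INSERT INTO mytable VALUES
    (123455, '2020-01-12', '2020-01-12', 5.21, '2020-01-12 09:00'),
    (123455, '2020-02-01', '2020-02-01', 7.00, '2020-02-01 09:00'),
    (999999, '2020-01-01', '2020-02-01', 3.00, '2020-01-01 09:00');
""")

# CASE WHEN replaces Access's IIF; COUNT(DISTINCT col) works in SQLite.
rows = conn.execute("""
    SELECT ssn
    FROM mytable
    GROUP BY ssn
    HAVING COUNT(*) > 1
       AND SUM(CASE WHEN date1 = date2 THEN 1 ELSE 0 END) > 1
       AND COUNT(DISTINCT moneyvalue) > 1
""").fetchall()
print(rows)   # [(123455,)]
```

SSN 123455 appears twice, both rows have date1 = date2, and the two moneyvalue figures differ, so it is the only SSN returned.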

How to design this table schema (and how to query it)

I have a table that stores a list of members - for the sake of simplicity, I will use a simple real-world case that models my use case.
Let's use the analogy of a sports club or gym.
The membership of the gym changes every three months (for example) - with some old members leaving, some new members joining and some members staying unchanged.
I want to run a query on the table, spanning multiple membership periods, that returns the average weight of all of the members in the club.
These are the tables I have come up with so far:
-- A table containing all members the gym has ever had
-- current members have their leave_date field left at NULL
-- departed members have their leave_date field set to the days they left the gym
CREATE TABLE IF NOT EXISTS member (
id PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
join_date DATE NOT NULL,
-- set to NULL if user has not left yet
leave_date DATE DEFAULT NULL
);
-- A table of members weights.
-- This table is populated DAILY,after the weights of CURRENT members
-- has been recorded
CREATE TABLE IF NOT EXISTS current_member_weight (
id PRIMARY KEY NOT NULL,
calendar_date DATE NOT NULL,
member_id INTEGER REFERENCES member(id) NOT NULL,
weight REAL NOT NULL
);
-- I want to write a query that returns the AVERAGE daily weight of
-- CURRENT members of the gym. The query should take a starting_date
-- and an ending_date between which to calculate the daily averages.
-- PSEUDO SQL BELOW!
SELECT w.calendar_date, AVG(w.weight)
FROM member m
JOIN current_member_weight w ON w.member_id = m.id
WHERE w.calendar_date BETWEEN starting_date AND ending_date
GROUP BY w.calendar_date;
I have two questions:
Can the schema above be improved? If yes, please illustrate.
How can I write an SQL* query to return the average weights calculated for all members in the gym during a specified period (t1, t2), where (t1,t2) spans a period that members have joined/left the gym?
[[Note about SQL]]
Preferably, any SQL shown would be database agnostic; however, if a particular flavour of SQL is to be used, I'd prefer PostgreSQL, since this is the database I'm using.
The SQL below will work as long as the data in the current_member_weight table is consistent with each member's join and leave dates (i.e. for any member, the table should not have rows with a calendar_date earlier than their join_date or later than their leave_date):
SELECT
cmw.calendar_date,
AVG(cmw.weight) avg_weight
FROM
member m,
current_member_weight cmw
WHERE
m.id = cmw.member_id
AND
cmw.calendar_date >= '2017-01-01'
AND
cmw.calendar_date <= '2017-12-31'
GROUP BY
cmw.calendar_date
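As a quick check of that query shape, here is a self-contained SQLite run. The sample members and weigh-ins are invented, and ISO date strings stand in for real DATE values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    id INTEGER PRIMARY KEY, name TEXT, join_date TEXT, leave_date TEXT);
CREATE TABLE current_member_weight (
    id INTEGER PRIMARY KEY, calendar_date TEXT, member_id INTEGER, weight REAL);
INSERT INTO member VALUES
    (1, 'Ann', '2017-01-01', NULL),
    (2, 'Bob', '2017-01-01', NULL);
INSERT INTO current_member_weight (calendar_date, member_id, weight) VALUES
    ('2017-06-01', 1, 60.0),
    ('2017-06-01', 2, 80.0),
    ('2017-06-02', 1, 61.0);
""")

# Average weight per day within the reporting window.
rows = conn.execute("""
    SELECT w.calendar_date, AVG(w.weight) AS avg_weight
    FROM member m
    JOIN current_member_weight w ON w.member_id = m.id
    WHERE w.calendar_date BETWEEN '2017-01-01' AND '2017-12-31'
    GROUP BY w.calendar_date
    ORDER BY w.calendar_date
""").fetchall()
print(rows)   # [('2017-06-01', 70.0), ('2017-06-02', 61.0)]
```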

SQL getting count in a date range

I'm looking for input on getting a COUNT of records that were 'active' in a certain date range.
CREATE TABLE member (
id int identity,
name varchar,
active bit
)
The scenario is one where the number of "members" fluctuates over time. So I could have linear growth where I have 10 members at the beginning of the month and 20 at the end. Currently we go off the number of CURRENTLY ACTIVE members (as marked by an 'active' flag in the DB) AT THE TIME OF THE REPORT - this is hardly accurate and, worse, 6 months from now my "members" figure may be substantially different than it is now. And since I'm doing averages per user, if I run a report now and again 6 months from now, the figures will probably be different.
I don't think a simple "dateActive" and "dateInactive" will do the trick... due to members coming and going and coming back etc. so:
JOE may be active 12-1 and deactivated 12-8 and activated 12-20
so JOE counts as being a 'member' for 8 days and then 11 days for a total of 19 days
but the revolving door status of members means keeping a separate table (presumably) of UserId, status, date
CREATE TABLE memberstatus (
member_id int,
status bit, -- 0 for inactive, 1 for active
date date
)
(Adding this table would make the 'active' field in member obsolete.)
In order to get a "good" average of members per month (or date range), it seems I'd need to get a daily average and then take an average of those averages over x days. Or is there some way in SQL to do this already?
This extra "status" table would allow an accurate count going back in time. So in a case where you have a revenue or cost figure that doesn't change and is not aggregated (it's fixed), when you want cost/members for last June, you certainly don't want to use your current member count; you want last June's.
Is this how it's done? I know it's one way, but is it the 'better' way?
@Gordon - I got ya, but I guess I was looking at records like this:
Members
1 Joe
2 Tom
3 Sue
MemberStatus
1 1 '12-01-2014'
1 0 '12-08-2014'
1 1 '12-20-2014'
In this way I only need the last record for a user to get their current status, but I can track back and "know" their status on any given day.
If I'm understanding your method, it might look like this:
CREATE TABLE memberstatus (
member_id int,
active_date date,
inactive_date date
)
So on the 1st through the 7th the record would look like this:
1 '12-01-2014' null
and on the 8th it would change to
1 '12-01-2014' '12-08-2014'
then on the 20th:
1 '12-01-2014' '12-08-2014'
1 '12-20-2014' null
Although I can get the same data out, it seems more difficult without any benefit - am I missing something?
You could also use a 2 table method to have a one-to-many relationship for working periods. For example you have a User table
User
UserID int, UserName varchar
and an Activity table that holds ranges
Activity
ActivityID int, UserID int, startDate date, (duration int or endDate date)
Then whenever you wanted information you could do something like (for example)...
SELECT User.UserName, count(*) from Activity
LEFT OUTER JOIN User ON User.UserID = Activity.UserID
WHERE startDate >= '2014-01-01' AND startDate < '2015-01-01'
GROUP BY User.UserID, User.UserName
...to get a count, grouped by user (and labeled by username), of the number of times they became active in 2014.
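A runnable version of that range query, using SQLite with invented sample data (the User/Activity names follow the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (UserID INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Activity (
    ActivityID INTEGER PRIMARY KEY, UserID INTEGER,
    startDate TEXT, endDate TEXT);
INSERT INTO User VALUES (1, 'Joe'), (2, 'Tom');
INSERT INTO Activity (UserID, startDate, endDate) VALUES
    (1, '2014-12-01', '2014-12-08'),
    (1, '2014-12-20', NULL),
    (2, '2013-06-01', '2013-07-01');
""")

# Count how many times each user became active during 2014.
rows = conn.execute("""
    SELECT u.UserName, COUNT(*) AS activations
    FROM Activity a
    JOIN User u ON u.UserID = a.UserID
    WHERE a.startDate >= '2014-01-01' AND a.startDate < '2015-01-01'
    GROUP BY u.UserID, u.UserName
""").fetchall()
print(rows)   # [('Joe', 2)]
```

Tom's only activity range starts in 2013, so he is filtered out entirely rather than appearing with a zero count.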
I have used two main ways to accomplish what you want. First would be something like this:
CREATE TABLE [MemberStatus](
[MemberID] [int] NOT NULL,
[ActiveBeginDate] [date] NOT NULL,
[ActiveEndDate] [date] NULL,
CONSTRAINT [PK_MemberStatus] PRIMARY KEY CLUSTERED
(
[MemberID] ASC,
[ActiveBeginDate] ASC
)
)
Every time a member becomes active, you add an entry, and when they become inactive you update their ActiveEndDate to the current date.
This is easy to maintain, but can be hard to query. Another option is to do basically what you are suggesting: create a scheduled job that runs at the end of each day to add entries to the table.
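The "hard to query" part is mostly date arithmetic over the ranges. As an illustration, total days active per member can be computed from the begin/end pairs like this (SQLite; the sample rows mirror the Joe example above, open-ended ranges are closed at a report date, and whether the endpoint days themselves count is a convention you would need to pin down):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberStatus (
    MemberID        INTEGER,
    ActiveBeginDate TEXT,
    ActiveEndDate   TEXT  -- NULL while the member is still active
);
-- Joe (id 1): active Dec 1-8, then active again from Dec 20 onward.
INSERT INTO MemberStatus VALUES
    (1, '2014-12-01', '2014-12-08'),
    (1, '2014-12-20', NULL);
""")

# Total days active per member up to a report date; open-ended ranges
# are closed at the report date for the calculation.
rows = conn.execute("""
    SELECT MemberID,
           SUM(julianday(COALESCE(ActiveEndDate, :report))
               - julianday(ActiveBeginDate)) AS days_active
    FROM MemberStatus
    GROUP BY MemberID
""", {"report": "2014-12-31"}).fetchall()
print(rows)   # [(1, 18.0)]
```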
I recommend setting up your tables so that you store more data, but in exchange the structure supports much simpler queries to achieve the reporting you require.
-- whenever a user's status changes, we update this table with the new "active"
-- bit, and we set "activeLastModified" to today.
CREATE TABLE member (
id int identity,
name varchar,
active bit,
activeLastModified date
)
-- whenever a user's status changes, we insert a new record here
-- with "startDate" set to the current "activeLastModified" field in member,
-- and "endDate" set to today (date of status change).
CREATE TABLE memberStatusHistory (
member_id int,
status bit, -- 0 for inactive, 1 for active
startDate date,
endDate date,
days int
)
As for the report you're trying to create (average # of actives in a given month), I think you need yet another table. Pure SQL can't calculate that based on these table definitions. Pulling that data from these tables is possible, but it requires programming.
If you ran something like this once-per-day and stored it in a table, then it would be easy to calculate weekly, monthly and yearly averages:
INSERT INTO myStatsTable (date, activeSum, inactiveSum)
SELECT
GETDATE(), -- depends on DBMS; e.g., "current_date" for Postgres
active.cnt,
inactive.cnt
FROM
(SELECT COUNT(id) AS cnt FROM member WHERE active = true) active
CROSS JOIN
(SELECT COUNT(id) AS cnt FROM member WHERE active = false) inactive
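Here is the snapshot idea end-to-end in SQLite with invented sample members; date('now') plays the role of GETDATE(), and the bit column is modeled as 0/1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
CREATE TABLE myStatsTable (date TEXT, activeSum INTEGER, inactiveSum INTEGER);
INSERT INTO member (name, active) VALUES ('Joe', 1), ('Tom', 1), ('Sue', 0);
""")

# Daily snapshot: record today's active/inactive head counts.
conn.execute("""
    INSERT INTO myStatsTable (date, activeSum, inactiveSum)
    SELECT date('now'),
           (SELECT COUNT(*) FROM member WHERE active = 1),
           (SELECT COUNT(*) FROM member WHERE active = 0)
""")
row = conn.execute(
    "SELECT activeSum, inactiveSum FROM myStatsTable"
).fetchone()
print(row)   # (2, 1)
```

With one such row per day, weekly, monthly, and yearly averages reduce to simple GROUP BY queries over myStatsTable.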

How to compare components of a date as a single date in Postgres?

I'm using Postgres 9.3. Given this table:
CREATE TABLE release_country
(
release integer NOT NULL,
country integer NOT NULL,
date_year smallint,
date_month smallint,
date_day smallint
)
I want a list of the earliest record for each release; in other words, there can be multiple records in the table for the same release but different countries. I want a list containing each release and its earliest date, but this will not work:
select distinct release, min(t1.date_year), min(t1.date_month), min(t1.date_day)
FROM release_country t1
GROUP BY release;
Because it considers each portion of the date separately. How do I treat the three portions as a single date, given that only the year portion is mandatory and the month and day portions may be null?
SELECT DISTINCT ON (release) *
FROM release_country
ORDER BY release, date_year, date_month, date_day;
I would consider storing a single date column instead of three smallint numbers. That's a lot cleaner and probably cheaper overall.
Explanation for DISTINCT ON:
Select first row in each GROUP BY group?
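DISTINCT ON is Postgres-specific; a portable stand-in is ROW_NUMBER() per release, shown below in SQLite with invented rows. One caveat worth noting for the nullable month/day columns: SQLite sorts NULLs first in ascending order, while Postgres sorts them last, so "earliest" rows with a NULL month or day can rank differently between the two engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE release_country (
    release INTEGER NOT NULL,
    country INTEGER NOT NULL,
    date_year INTEGER, date_month INTEGER, date_day INTEGER);
INSERT INTO release_country VALUES
    (1, 10, 1999, 5, 20),
    (1, 20, 1999, 2, NULL),
    (2, 10, 2005, NULL, NULL);
""")

# Rank each release's rows by date and keep the first one per release.
rows = conn.execute("""
    SELECT release, country, date_year, date_month, date_day
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY release
                   ORDER BY date_year, date_month, date_day
               ) AS rn
        FROM release_country
    )
    WHERE rn = 1
    ORDER BY release
""").fetchall()
print(rows)   # [(1, 20, 1999, 2, None), (2, 10, 2005, None, None)]
```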