I want to set up an RDBMS for structured time series data of limited size (about 6000 series, 50 MB of data) at various frequencies (daily, monthly, quarterly, annual CY and annual FY), and I want to run SQL queries on the database (mostly joining various tables by time). The database is updated once a month. The variable names of the tables in this database are rather technical and not very informative. The raw data is labeled as shown in the table below (example of a monthly table).
I started setting this up in MySQL and figured that just equipping tables with appropriate temporal identifiers gives me the join functionality I want. I could however not find out how to store the variable labels appropriately. Is it possible to somehow add attributes to the columns? Or can I link a table to the table mapping labels to the column names, such that it is carried along in joins? Or should I set this up using a different kind of database? (database must be easy to set up and host though, and SQL is strongly preferred). I am grateful for any advice.
Update:
I figured out that you can add comments to MySQL columns and tables, but it seems these cannot be queried in a standard way or carried along in joins. Is it possible to retrieve the information in the comments along with the queried data from a standard database connector (like this one for the R language: https://github.com/r-dbi/RMySQL)? Below is a DDL example for tables with variable labels as comments.
-- Annual FY Table
CREATE TABLE IF NOT EXISTS BOU_MMI_AF (
FY VARCHAR(7) COMMENT "Fiscal Year (July - June)",
NFA DOUBLE COMMENT "Net Foreign Assets (NFA) (Shs billion)",
NDA DOUBLE COMMENT "Net Domestic Assets (NDA) (Shs billion)",
PRIMARY KEY (FY)
) COMMENT = "Annual FY";
-- Quarterly Table
CREATE TABLE IF NOT EXISTS BOU_FS (
Year INT CHECK (Year >= 1800 AND Year < 2100) COMMENT "Year",
Quarter VARCHAR(2) CHECK (Quarter IN ('Q1', 'Q2', 'Q3', 'Q4')) COMMENT "Quarter",
FY VARCHAR(7) COMMENT "Fiscal Year (July - June)",
QFY VARCHAR(2) CHECK (QFY IN ('Q1', 'Q2', 'Q3', 'Q4')) COMMENT "Quarter of Fiscal Year",
KA_RC_RWA DOUBLE COMMENT "Capital Adequacy (%): Regulatory capital to risk-weighted assets",
AQ_NPL_GL DOUBLE COMMENT "Asset quality (%): NPLs to total gross loans",
EP_RA DOUBLE COMMENT "Earnings & profitability (%): Return on assets",
L_BFA_TD DOUBLE COMMENT "Liquidity (%): Bank-funded advances to total deposits",
MS_FX_T1CA DOUBLE COMMENT "Market Sensitivity (%): Forex exposure to regulatory tier 1 capital",
PRIMARY KEY (Year, Quarter)
) COMMENT = "Quarterly";
-- Daily Table
CREATE TABLE IF NOT EXISTS BOU_I (
Date DATE CHECK (Date >= '1800-01-01' AND Date < '2100-01-01') COMMENT "Date",
Year INT CHECK (Year >= 1800 AND Year < 2100) COMMENT "Year",
Quarter VARCHAR(2) CHECK (Quarter IN ('Q1', 'Q2', 'Q3', 'Q4')) COMMENT "Quarter",
FY VARCHAR(7) COMMENT "Fiscal Year (July - June)",
QFY VARCHAR(2) CHECK (QFY IN ('Q1', 'Q2', 'Q3', 'Q4')) COMMENT "Quarter of Fiscal Year",
Month VARCHAR(9) CHECK (Month IN ('January' , 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')) COMMENT "Month",
Day INT CHECK (Day > 0 AND Day < 32) COMMENT "Day",
I_Overnight DOUBLE COMMENT "Daily Interbank Money-Market Rates: Overnight (%)",
I_7day DOUBLE COMMENT "Daily Interbank Money-Market Rates: 7-day (%)",
I_Overall DOUBLE COMMENT "Daily Interbank Money-Market Rates: Overall (%)",
PRIMARY KEY (Date)
) COMMENT = "Daily";
So if I execute a query like
SELECT * FROM BOU_I NATURAL JOIN BOU_FS NATURAL JOIN BOU_MMI_AF;
using a statistical software environment like R or STATA connecting to the database using a MySQL connector, I'd like to see a table similar to the one shown in the figure, where I can retrieve both the names of the variables and the labels stored as comments in the DDL.
I would structure your data differently. I would put all your measures in a single table and have a single measure per row. I would then add a DATE table (so that you have the week/month/quarter/year values for each metric date) and a METRIC_TYPE table that holds the labels for each metric code.
By normalising the data like this I think you have a more flexible design and it'll allow you to do what you want.
This is only for illustration of what I mean - it is not meant to be a definitive design:
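The same idea can also be sketched in code. This is a minimal sketch only, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the table names measure, date_dim and metric_type, their columns, and the two sample values are hypothetical and made up for illustration. It shows one measure per row, with the label picked up through an ordinary join:

```python
import sqlite3

# Hypothetical names throughout; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metric_type (
    metric_code TEXT PRIMARY KEY,
    label       TEXT NOT NULL
);
CREATE TABLE date_dim (
    metric_date TEXT PRIMARY KEY,   -- ISO date string
    year        INTEGER NOT NULL,
    quarter     INTEGER NOT NULL
);
CREATE TABLE measure (
    metric_date TEXT NOT NULL REFERENCES date_dim (metric_date),
    metric_code TEXT NOT NULL REFERENCES metric_type (metric_code),
    value       REAL NOT NULL,
    PRIMARY KEY (metric_date, metric_code)
);
INSERT INTO metric_type VALUES
    ('NFA', 'Net Foreign Assets (NFA) (Shs billion)'),
    ('NDA', 'Net Domestic Assets (NDA) (Shs billion)');
INSERT INTO date_dim VALUES ('2020-03-31', 2020, 1);
-- Made-up values, only to have something to join.
INSERT INTO measure VALUES
    ('2020-03-31', 'NFA', 1.5),
    ('2020-03-31', 'NDA', 2.5);
""")

# The label travels with the data because it is just another joined column.
rows = conn.execute("""
    SELECT m.metric_date, d.year, d.quarter, m.metric_code, t.label, m.value
    FROM measure AS m
    JOIN date_dim AS d ON d.metric_date = m.metric_date
    JOIN metric_type AS t ON t.metric_code = m.metric_code
    ORDER BY m.metric_code
""").fetchall()
for row in rows:
    print(row)
```

Because the label is a regular column, any connector that can run a SELECT returns it alongside the values, which is what column comments could not do.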
So I am pretty happy with the suggestion of @NickW. For reference I am sharing my final schema below. I still have some questions regarding it. I mostly query the DATA table directly (which has some 700,000 obs), joining information from the TIME, SERIES and DATASET tables as needed. I noticed that retrieving larger amounts of data can take some time. So I wondered: am I indexing this optimally?
Then, there are a few computed columns: the NDatasets column in DATASOURCE counts the number of DSID by Source in the DATASET table, and the Updated column in DATASET shows when data was last added to a particular dataset. DS_From, DS_To and S_From, S_To give the maximum time range over which data is available for a given dataset and series. Currently, I am doing all these computations in R and inserting the results. I wonder if these computations could be done in MySQL, so as to have self-updating columns?
Grateful for any further comment on this.
DDL:
DROP SCHEMA IF EXISTS TSDB;
CREATE SCHEMA IF NOT EXISTS TSDB;
USE TSDB;
CREATE TABLE IF NOT EXISTS DATASOURCE (
Source VARCHAR(120),
Source_Url VARCHAR(200),
NDatasets INT NOT NULL,
Desription VARCHAR(3000) NOT NULL,
Access VARCHAR(3000) NOT NULL,
PRIMARY KEY (Source)
);
CREATE TABLE IF NOT EXISTS DATASET (
DSID VARCHAR(30), -- INT
Dataset VARCHAR(120) NOT NULL,
Frequency VARCHAR(9) NOT NULL CHECK (Frequency IN ('Daily' , 'Monthly', 'Quarterly', 'Annual CY', 'Annual FY')),
DS_From DATE CHECK (DS_From >= '1800-01-01' AND DS_From < '2100-01-01'),
DS_To DATE CHECK (DS_To >= '1800-01-01' AND DS_To < '2100-01-01'),
Updated DATE CHECK (Updated >= '1800-01-01' AND Updated < '2100-01-01'),
Desription VARCHAR(3000) NOT NULL,
Source VARCHAR(120), -- NOT NULL
DS_Url VARCHAR(200),
PRIMARY KEY (DSID),
FOREIGN KEY (Source) REFERENCES DATASOURCE (Source) ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE INDEX idx_dataset_source ON DATASET (Source);
CREATE TABLE IF NOT EXISTS SERIES (
DSID VARCHAR(30), -- INT
Series VARCHAR(30) NOT NULL,
Label VARCHAR(120) NOT NULL,
S_From DATE CHECK (S_From >= '1800-01-01' AND S_From < '2100-01-01'),
S_To DATE CHECK (S_To >= '1800-01-01' AND S_To < '2100-01-01'),
S_Source VARCHAR(120),
S_Url VARCHAR(200),
PRIMARY KEY (DSID, Series),
FOREIGN KEY (DSID) REFERENCES DATASET (DSID) ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE INDEX idx_series_DSID ON SERIES (DSID);
CREATE TABLE IF NOT EXISTS TIME (
Date DATE UNIQUE CHECK (Date >= '1800-01-01' AND Date < '2100-01-01'),
Year INT NOT NULL CHECK (Year >= 1800 AND Year < 2100),
Quarter INT NOT NULL CHECK (Quarter >= 1 AND Quarter <= 4),
FY CHAR(7) NOT NULL,
QFY INT NOT NULL CHECK (QFY >= 1 AND QFY <= 4),
Month INT NOT NULL CHECK (Month >= 1 AND Month <= 12),
Day INT NOT NULL CHECK (Day > 0 AND Day < 32),
PRIMARY KEY (Date)
);
CREATE TABLE IF NOT EXISTS DATA (
Date DATE,
DSID VARCHAR(30),
Series VARCHAR(30),
Value DOUBLE NOT NULL,
PRIMARY KEY (Date, DSID, Series),
FOREIGN KEY (DSID) REFERENCES DATASET (DSID) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (DSID, Series) REFERENCES SERIES (DSID, Series) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (Date) REFERENCES TIME (Date) ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE INDEX idx_data_DSID ON DATA (DSID);
CREATE INDEX idx_data_series ON DATA (DSID, Series);
CREATE INDEX idx_data_date ON DATA (Date);
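On the question of self-updating columns such as S_From and S_To: one possible approach, sketched here under assumptions, is to derive them in a view instead of storing them. The sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL, a cut-down DATA table, made-up values, and a hypothetical view name SERIES_RANGE. CREATE VIEW with MIN/MAX and GROUP BY is plain SQL, so the same statement should carry over to MySQL, but that has not been verified against this schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down version of the DATA table, without the foreign keys.
CREATE TABLE DATA (
    Date   TEXT NOT NULL,
    DSID   TEXT NOT NULL,
    Series TEXT NOT NULL,
    Value  REAL NOT NULL,
    PRIMARY KEY (Date, DSID, Series)
);
-- Made-up observations.
INSERT INTO DATA VALUES
    ('2020-01-31', 'BOU_I', 'I_Overnight', 1.0),
    ('2020-02-29', 'BOU_I', 'I_Overnight', 2.0),
    ('2020-02-29', 'BOU_I', 'I_7day', 3.0),
    ('2020-03-31', 'BOU_I', 'I_7day', 4.0);

-- Hypothetical view: the range is derived at query time, never stored.
CREATE VIEW SERIES_RANGE AS
SELECT DSID, Series, MIN(Date) AS S_From, MAX(Date) AS S_To
FROM DATA
GROUP BY DSID, Series;
""")

series_range = conn.execute(
    "SELECT DSID, Series, S_From, S_To FROM SERIES_RANGE ORDER BY Series"
).fetchall()
for row in series_range:
    print(row)
```

The trade-off is that the range is recomputed on every read; with monthly updates and a few hundred thousand rows that may well be acceptable, but it is worth measuring.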
EER Diagram:
I have the following two tables which track stock prices daily
CREATE TABLE "public"."security" (
"symbol" text NOT NULL,
"security" text NOT NULL,
CONSTRAINT "security_pk" PRIMARY KEY ("symbol")
)
and the following table which keeps track of the security prices daily
CREATE TABLE "public"."security_data" (
"ticker" text NOT NULL,
"time" date NOT NULL,
"low" numeric NOT NULL,
"high" numeric NOT NULL,
"open" numeric NOT NULL,
"close" numeric NOT NULL,
"volume" double precision,
CONSTRAINT "security_data_pkey" PRIMARY KEY ("ticker", "time"),
CONSTRAINT "security_data_ticker_fkey" FOREIGN KEY (ticker) REFERENCES security(symbol) ON UPDATE CASCADE ON DELETE CASCADE NOT DEFERRABLE
)
I want to come up with a SQL query that returns a list of "symbol" for stocks that had their average price for the past week higher by more than 5% than the average price for the past 6 months. I can't figure out the best way to write this query. Is it possible to write this in one query? If so how?
You can use aggregation. Assuming that your column close represents the price, you can do:
select ticker
from security_data
where time > current_date - '6 month'::interval
group by ticker
having avg(close) filter(where time > current_date - '1 week'::interval)
> 1.05 * avg(close)
It doesn't look like you need the "security" table to get the result you want.
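To see the conditional-average idea run end to end, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module instead of PostgreSQL, a fixed reference date TODAY so the result is reproducible, and two made-up tickers (UP and FLAT). The FILTER clause is written as the equivalent AVG(CASE WHEN ... THEN close END), which relies on AVG ignoring NULLs.

```python
import sqlite3
from datetime import date, timedelta

TODAY = date(2024, 6, 30)  # fixed reference date (assumption) instead of current_date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE security_data ("
    "ticker TEXT, time TEXT, close REAL, PRIMARY KEY (ticker, time))"
)

# Made-up prices: UP jumps from 100 to 120 in the last 7 days, FLAT stays at 100.
rows = []
for offset in range(60):
    day = (TODAY - timedelta(days=offset)).isoformat()
    rows.append(("UP", day, 120.0 if offset < 7 else 100.0))
    rows.append(("FLAT", day, 100.0))
conn.executemany("INSERT INTO security_data VALUES (?, ?, ?)", rows)

query = """
    SELECT ticker
    FROM security_data
    WHERE time > date(:today, '-6 months')
    GROUP BY ticker
    HAVING AVG(CASE WHEN time > date(:today, '-7 days') THEN close END)
           > 1.05 * AVG(close)
    ORDER BY ticker
"""
winners = [r[0] for r in conn.execute(query, {"today": TODAY.isoformat()})]
print(winners)
```

Note that the six-month average here includes the last week, exactly as in the query above; if the two windows should not overlap, the second AVG needs its own condition.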
How to create a date format yyyy-mm with PostgreSQL 11
CREATE TABLE public."ASSOL"
(
id integer NOT NULL,
"ind" character(50) ,
"s_R" character(50) ,
"R" character(50) ,
"th" character(50),
"C_O" character(50) ,
"ASSOL" numeric(11,3),
date date,
CONSTRAINT "ASSOL_pkey" PRIMARY KEY (id)
);
This is a variation of Kaushik's answer.
You should just use the date data type. There is no need to create another type for this. However, I would implement this using a check constraint:
CREATE TABLE public.ASSOL (
id serial primary key,
ind varchar(50) ,
s_R varchar(50) ,
R varchar(50) ,
th varchar(50),
C_O varchar(50) ,
ASSOL numeric(11,3),
yyyymm date,
constraint chk_assol_date check (yyyymm = date_trunc('month', yyyymm))
);
This only allows you to insert values that are the first day of the month. Other inserts will fail.
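The effect of such a constraint can be tried out quickly. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for Postgres; SQLite has no date_trunc, so date(yyyymm, 'start of month') plays that role here, and the table is cut down to the two relevant columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assol (
        id     INTEGER PRIMARY KEY,
        yyyymm TEXT NOT NULL,
        -- SQLite counterpart of: yyyymm = date_trunc('month', yyyymm)
        CHECK (yyyymm = date(yyyymm, 'start of month'))
    )
""")

# First day of the month: accepted.
conn.execute("INSERT INTO assol (yyyymm) VALUES ('2019-05-01')")

# Any other day violates the check constraint.
try:
    conn.execute("INSERT INTO assol (yyyymm) VALUES ('2019-05-15')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = [r[0] for r in conn.execute("SELECT yyyymm FROM assol")]
print(rejected, stored)
```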
Additional notes:
Don't use double quotes when creating tables. You then have to refer to the columns/tables using double quotes, which just clutters queries. Your identifiers should be case-insensitive.
An integer primary key would normally be a serial column.
NOT NULL is redundant for a PRIMARY KEY column.
Use reasonable names for columns. If you want a column to represent a month, then yyyymm is more informative than date.
Postgres stores varchar() and char() in the same way, but for most databases, varchar() is preferred because trailing spaces actually occupy bytes on the data pages.
for year and month you can try like below
SELECT to_char(now(),'YYYY-MM') as year_month
year_month
2019-05
You cannot create a date datatype that stores only the year and month component. There's no such option available at the data type level.
If you want to truncate the day component so that it defaults to the start of the month, you can do that. This is as good as having only the month and year component, as all the dates will have day = 1 and only the month and year will change depending on when the insert runs.
For example:
create table t ( id int, col1 text,
"date" date default date_trunc('month',current_date) );
insert into t(id,col1) values ( 1, 'TEXT1');
select * from t
id col1 date
1 TEXT1 2019-05-01
If you do not want to store a default date, simply use the date_trunc('month', date) expression wherever needed, either in a group by or in a select query.
I need to create a table to hold a schedule for meetings.
A meeting can be scheduled to be:
Daily
'Every X days', where X can be between 1 and 6.
Ending after X sessions. Where 'sessions' is basically the number of repeats.
Weekly
Which days during the week it can occur. Mon, Tue, etc. Can select more than one day per week.
The Date on which it ends.
Monthly
User can select the day of the month it can occur (1st, 2nd etc)
OR they can select from a lookup of '1st, 2nd, 3rd, 4th or Last' and a Day 'Mon, Tues', saying, for example "The 2nd Friday" of the month.
How can I store all these scenarios in a single table?
I was thinking:
CREATE TABLE schedule
(
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
StartDate DATETIME NOT NULL,
EndTime TIME NULL,
RepeatTypeID INT NOT NULL, -- Daily, Weekly, Monthly, None
-- For Daily
EveryDayCount INT NULL, -- to handle 'every 3 days'
RepeatCount INT NULL, -- How many occurrences. Can be shared with different RepeatTypes
-- Weekly
IsMonday BIT,
IsTuesday BIT,
-- etc.: a field per day selection. Is there a better way?
-- Monthly
MonthlyDayNumber INT NULL,
MonthlyRepeatIntervalID INT, -- Lookup table with '1st, 2nd, 3rd, 4th, Last'
MonthlyDayRepeatSelection INT -- Lookup on Monday, Tuesday etc
)
But this seems inefficient. Is there a better design pattern for these sorts of requirements?
I once implemented the same functionality, and I found that ease of retrieval and of edit/update mattered far more than ease of storage.
You don't want to calculate all the dates every time you query the DB for meeting dates. If you have a function like showAllMeetingsForADate(somedate date), you would not want to calculate the meeting dates at run time.
Overall, the best approach is to store the meeting's recurrence logic in one table and all the resulting meeting dates in another table, as below.
For the storage of the meeting information itself, however, you should go with a normalized form.
Schedule Detail Tables
CREATE TABLE DailyScheduleDetails
(
ScheduleDetailsID INT PRIMARY KEY IDENTITY(1,1),
RecurrenceCount INT NOT NULL
)
CREATE TABLE WeeklyScheduleDetails
(
ScheduleDetailsID INT PRIMARY KEY IDENTITY(1,1),
OnMonday bit,
OnTuesday bit,
OnWednesday bit,
-- ...
OnSunday bit,
EndByDate Date NOT NULL
)
CREATE TABLE MonthlyScheduleDetails
(
ScheduleDetailsID INT PRIMARY KEY IDENTITY(1,1),
MonthlyDayNumber INT NULL,
MonthlyRepeatIntervalID INT, -- Lookup table with '1st, 2nd, 3rd, 4th, Last'
-- Here I'd suggest using 0 for Last
MonthlyDayRepeatSelection INT -- Lookup on Monday, Tuesday etc
)
Schedule
CREATE TABLE schedule
(
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
StartDateTime DATETIME NOT NULL,
EndDateTime DATETIME NULL,
RepeatTypeID INT NOT NULL, -- Daily, Weekly, Monthly, None
ScheduleDetailsID INT
)
MeetingDates
CREATE TABLE MeetingDates
(
ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
MeetingID int,
MeetingStartDate datetime,
MeetingEndDate datetime -- because you can have meeting spanning days like 11:00 PM to 1:00 AM
--,user or guest information too
,CONSTRAINT FK_MeetingDates_Schedule FOREIGN KEY (MeetingID)
REFERENCES Schedule(ID)
)
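Storing the rule in one place and the concrete dates in MeetingDates means the dates are expanded once, when a schedule is saved or edited. How that expansion might look is sketched below in Python; the function names every_n_days and nth_weekday_of_month are hypothetical, weekdays are numbered 0 = Monday to 6 = Sunday (Python's convention, an assumption here), and n = 0 stands for 'Last' as suggested in the comment above.

```python
import calendar
from datetime import date, timedelta
from typing import List


def every_n_days(start: date, every: int, sessions: int) -> List[date]:
    """Daily rule: a meeting every `every` days, ending after `sessions` sessions."""
    return [start + timedelta(days=every * i) for i in range(sessions)]


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> date:
    """Monthly rule: the n-th given weekday of a month (n = 0 means the last one)."""
    # monthcalendar() returns weeks as lists indexed Monday..Sunday,
    # with 0 for days that belong to a neighbouring month.
    days = [
        week[weekday]
        for week in calendar.monthcalendar(year, month)
        if week[weekday] != 0
    ]
    return date(year, month, days[-1] if n == 0 else days[n - 1])


daily_dates = every_n_days(date(2019, 5, 1), every=3, sessions=4)
second_friday = nth_weekday_of_month(2019, 5, weekday=4, n=2)
last_friday = nth_weekday_of_month(2019, 5, weekday=4, n=0)
print(daily_dates, second_friday, last_friday)
```

Each resulting date would then become one row in MeetingDates, so a lookup for a given day is a plain indexed query instead of a calculation.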
Use an existing standard. That standard is iCalendar RRules and ExDates.
Just store the recurrence rule in the db as a varchar.
Use an existing library (C#) to calculate upcoming dates.
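What storing the rule as a varchar could look like is sketched below, assuming SQLite through Python's sqlite3 module and a hypothetical cut-down schedule table. The helper split_rrule is hypothetical too and only splits the rule into its parts; it is not an RRULE implementation, and working out the actual dates should still be left to an existing iCalendar library as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule ("
    "id INTEGER PRIMARY KEY, start_date TEXT NOT NULL, rrule TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO schedule (start_date, rrule) VALUES (?, ?)",
    [
        ("2019-05-01", "FREQ=DAILY;INTERVAL=3;COUNT=10"),  # every 3 days, 10 sessions
        ("2019-05-10", "FREQ=MONTHLY;BYDAY=2FR"),          # the 2nd Friday of each month
    ],
)


def split_rrule(rule: str) -> dict:
    """Split 'KEY=VALUE;KEY=VALUE' into a dict. Inspection only, no date logic."""
    return dict(part.split("=", 1) for part in rule.split(";"))


rules = [split_rrule(r[0]) for r in conn.execute("SELECT rrule FROM schedule ORDER BY id")]
print(rules)
```

One string column covers the daily, weekly and monthly cases from the question, at the cost of not being able to filter on individual rule parts in SQL.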
Even though you have daily, weekly, monthly etc... still means that a meeting will occur on some specific day ... right ?
Thus
CREATE TABLE schedule
(
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL,
RepeatTypeID INT NOT NULL, -- Daily, Weekly, Monthly, None
RepeatCount INT NOT NULL,
DayOn INT NOT NULL -- can be a calculated field based on start date using DAY function
)
I believe this can capture all your schedule options.
I have a problem with finding how to declare the following type of association :
Say I have a table "Weekly" as such :
Weekly {
Id : Int <= PK
Week : Int
Year : Int
}
And a table "Monthly" :
Monthly{
Id : Int <= PK
Month: Int
Year : Int
}
I also have a "WeekMonth" Table :
WeekMonth{
Week : Int <= PK
Month : Int <= PK
Year : Int <= PK
}
As you may have guessed, I want to be able to link Weekly with WeekMonth, and Monthly with WeekMonth too.
However, I can't seem to do this: a foreign key on part of the composite primary key. Nevertheless, in my WeekMonth table, both the year and week and the year and month fields are obviously unique, so it should be able to work.
I've tried multiple approaches to this problem, but as the custom mapping of weeks to months is a business need, I'm a bit stuck with it.
in my WeekMonth table, both the year and week and the year and month fields are obviously unique
That isn't true. 'Year and week' may be unique, but it depends what 'week' is here - if it's the week within the month (i.e. 1-5) then it is not unique. If it's the week within the year (1-53) then it is; but you don't have a unique or primary key on that combination. And 'year and month' is not unique, as you will have multiple entries - either 4 or 5 - for each combination.
If you have a composite primary (or unique) key then a foreign key has to refer to all of the columns in that PK - otherwise they would not necessarily be unique.
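This rule is easy to try out. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module with foreign key enforcement switched on; the helper try_insert is hypothetical. SQLite only reports the problem when rows are written, and other engines word the error differently or raise it at a different point, but the underlying requirement is the same: the referenced columns need a primary or unique key of their own.

```python
import sqlite3


def try_insert(with_unique_key: bool) -> bool:
    """Return True if a Weekly row can reference WeekMonth (Week, Year)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    unique_clause = ", UNIQUE (Week, Year)" if with_unique_key else ""
    conn.execute(f"""
        CREATE TABLE WeekMonth (
            Week INTEGER, Month INTEGER, Year INTEGER,
            PRIMARY KEY (Week, Month, Year){unique_clause}
        )""")
    conn.execute("""
        CREATE TABLE Weekly (
            Id INTEGER PRIMARY KEY, Week INTEGER, Year INTEGER,
            FOREIGN KEY (Week, Year) REFERENCES WeekMonth (Week, Year)
        )""")
    try:
        conn.execute("INSERT INTO WeekMonth VALUES (1, 1, 2020)")
        conn.execute("INSERT INTO Weekly VALUES (1, 1, 2020)")
        return True
    except sqlite3.DatabaseError:
        # Without a key on exactly (Week, Year) the foreign key cannot be used.
        return False


partial_key_works = try_insert(with_unique_key=False)
unique_key_works = try_insert(with_unique_key=True)
print(partial_key_works, unique_key_works)
```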
A natural key isn't really working for you here. As well as not allowing the relationships you want, you're duplicating data in the parent and child tables. It would be better to have a synthetic key, e.g. set from a sequence:
WeekMonth{
WeekMonth_Id : Int <= PK (synthetic, e.g. from sequence)
Week : Int <= }
Month : Int <= } UK
Year : Int <= }
}
Weekly {
Weekly_Id : Int <= PK
WeekMonth_Id : Int <= FK to WeekMonth
}
Monthly{
Monthly_Id : Int <= PK
WeekMonth_Id : Int <= FK to WeekMonth
}
You don't need to duplicate the year/month/week values in the child tables as you can get them from the parent. And you shouldn't duplicate them, as you can't easily guarantee that they match the related parent record, as well as for general normalisation reasons.
I'm assuming you have other data in the weekly and monthly tables, otherwise they would be a bit pointless; any other table that has an FK to one of those could use an FK to WeekMonth instead.
If you do want to have the individual year/month/week values duplicated in the child tables then you will need separate unique keys for those combinations, in addition to your current PK. So you'd modify WeekMonth to have a unique key on year and week (which may be possible, depending what 'week' represents), and another unique key on year and month - but as that is not a unique combination you can't create that key.
Assuming that the WeekMonth table has Week values 1 through 53 for the year then:
CREATE TABLE WeekMonth(
Week INT,
Month INT,
Year INT,
CONSTRAINT WeekMonth__W_M_Y__PK PRIMARY KEY ( Week, Month, Year ),
CONSTRAINT WeekMonth__W_Y__PK UNIQUE ( Week, Year )
);
CREATE TABLE Monthly(
ID INT PRIMARY KEY,
Month INT,
Year INT,
FirstWeek INT GENERATED ALWAYS
AS ( TO_NUMBER(
TO_CHAR(
NEXT_DAY(
TO_DATE( month||'-'||year, 'MM-YYYY' ) - 1,
'MONDAY'
),
'WW'
)
)
),
CONSTRAINT Monthly__M_Y__PK FOREIGN KEY ( FirstWeek, Month, Year )
REFERENCES WeekMonth( Week, Month, Year )
);
CREATE TABLE Weekly(
ID INT PRIMARY KEY,
Week INT,
Year INT,
CONSTRAINT Weekly__W_Y__PK FOREIGN KEY ( Week, Year )
REFERENCES WeekMonth( Week, Year )
);
I am making a database with postgresql 9.1
Given tables:
CREATE TABLE rooms(
room_number int,
property_id int,
type character varying,
PRIMARY KEY (room_number, property_id)
);
INSERT INTO rooms VALUES (1,1,'double'),(2,1,'double'),(3,1,'triple');
CREATE TABLE reservations(
reservation_ID int,
property_id int,
arrival date,
departure date,
room_num int,
PRIMARY KEY (reservation_ID, property_id),
FOREIGN KEY (room_num, property_id) REFERENCES rooms (room_number, property_id)
);
INSERT INTO reservations VALUES (1,1,'2013-9-27','2013-9-30',1),
(2,1,'2013-9-27','2013-9-28',2),
(3,1,'2013-9-29','2013-9-30',3);
I want to give 2 dates and check availability in between. The result should contain:
a first column with all the dates between the two given dates, and
one additional column for every type of room, displaying the availability.
So my result, given 2013-9-27 & 2013-9-30 as input, must be something like this:
I think the best solution would be use both generate_series() and crosstab() to create a dynamic table. Moreover you can use a left join from a CTE to your data tables so you get better information. Something like:
WITH daterange as (
SELECT s::date as day FROM generate_series(?, ?, '1 day') as s
)
SELECT dr.day, sum(case when r.type = 'double' then r.qty else 0 end) as room_double,
sum(case when r.type = 'triple' then r.qty else 0 end) as room_triple....
);
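A complete, runnable variant of the date-range-plus-conditional-sum idea is sketched below. It makes several assumptions: SQLite through Python's sqlite3 module stands in for PostgreSQL, a recursive CTE replaces generate_series(), the room types are hard-coded as columns instead of being pivoted with crosstab(), and a room counts as free again on its departure date. The data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rooms (room_number INTEGER, property_id INTEGER, type TEXT,
                    PRIMARY KEY (room_number, property_id));
CREATE TABLE reservations (reservation_id INTEGER, property_id INTEGER,
                           arrival TEXT, departure TEXT, room_num INTEGER,
                           PRIMARY KEY (reservation_id, property_id));
INSERT INTO rooms VALUES (1, 1, 'double'), (2, 1, 'double'), (3, 1, 'triple');
INSERT INTO reservations VALUES
    (1, 1, '2013-09-27', '2013-09-30', 1),
    (2, 1, '2013-09-27', '2013-09-28', 2),
    (3, 1, '2013-09-29', '2013-09-30', 3);
""")

query = """
WITH RECURSIVE daterange(day) AS (
    SELECT :first
    UNION ALL
    SELECT date(day, '+1 day') FROM daterange WHERE day < :last
)
SELECT dr.day,
       -- a room is available on a day when no reservation row matched it
       SUM(CASE WHEN r.type = 'double' AND res.room_num IS NULL THEN 1 ELSE 0 END) AS room_double,
       SUM(CASE WHEN r.type = 'triple' AND res.room_num IS NULL THEN 1 ELSE 0 END) AS room_triple
FROM daterange AS dr
CROSS JOIN rooms AS r
LEFT JOIN reservations AS res
       ON res.room_num = r.room_number
      AND res.property_id = r.property_id
      AND res.arrival <= dr.day
      AND dr.day < res.departure      -- assumption: free again on departure day
GROUP BY dr.day
ORDER BY dr.day
"""
availability = conn.execute(
    query, {"first": "2013-09-27", "last": "2013-09-30"}
).fetchall()
for row in availability:
    print(row)
```

With the room types hard-coded, a new type means editing the query; that is the part crosstab() would take over in PostgreSQL.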