I need to write a Postgres script to loop over all the rows I have in a particular table, get the date column in that row, find all rows in a different table that have that date and assign that row the ID of the first row in its foreign key column. Explaining further, my data (simplified) looks like:
first_table:
id INT, run_date DATE
second table:
id INT, target_date TIMESTAMP, first_table_id INT
pseudocode:
first_table_array = SELECT * FROM first_table;
i, val = range first_table_array {
UPDATE second_table SET first_table_id={val.id} WHERE date={val.run_date}
}
But it's not actually that simple, because I'm comparing a date against a timestamp (which I presume will cause some problems?), and I need to refine the time period as well.
So instead of doing where date={val.run_date} I actually need to check whether the date is run_date starting at 4am, and going into 4am the following day.
So say run_date is 01/01/2015, I need to convert that value into a timestamp of 01/01/2015 04:00:00, and then create another timestamp of 01/02/2015 04:00:00, and do:
where date > first_date AND date < second_date.
So, something like this:
pseudocode:
first_table_array = SELECT * FROM first_table;
i, val = range first_table_array {
first_date = timestamp(val.run_date) + 4 hrs
second_date = timestamp(val.run_date + 1 day) + 4 hrs
UPDATE second_table SET first_table_id={val.id} WHERE date > first_date AND date < second_date
}
Is postgres capable of this sort of thing, or am I better off writing a script to do this? Do I need to use loops, or could I accomplish this stuff just using joins and standard SQL queries?
I did the best I could. A data sample would be great.
UPDATE second_table ST
SET ST.id = FT.id
FROM first_table FT
WHERE ST.date BETWEEN FT.run_date + interval '4 hour'
AND FT.run_date + interval '4 hour' + interval '1 day'
How to do an update + join in PostgreSQL?
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
NOTE:
BETWEEN is equivalent to use <= and >=
Related
I have a table and I want to add some rows based on column 'days'. And for each of those rows, the 'Date' column will increment by 1 day.
For example:
Table X:
Date
Group
days
1/1/2023
A
3
5/1/2023
B
4
I want the result like this:
Date
Group
days
1/1/2023
A
3
2/1/2023
A
3
3/1/2023
A
3
5/1/2023
B
4
6/1/2023
B
4
7/1/2023
B
4
8/1/2023
B
4
How can I do it in postgresql ? Please help me
I have tried and I realy don't know how to do it. I just a newbie too
You can use generate_series() function with each incoming row to generate the dates appropriate for that row. (see demo)
select (generate_series( xdate
, xdate + (xdays-1) * interval '1 day'
, interval '1 day'
)
)::date "Date"
, xgroup "Group"
, xdays "Days"
from x;
Note: Each of your columns are poor choices for for names. They are all SQL Standard reserved works and/or Postgres reserved/restricted. They can be used safely if double quoted (i.e. "Date") but then must be double quoted everywhere they are used. Avoid double quoting if all ll possible.
When posting dates/timestamps to an international audience it is always best to use ISO 8601 Standard date format (yyyy-mm-dd). It is unambiguous regardless of your local format.
was: became:
You can call the function with the parameter of the number of rows you want to add:
CREATE OR REPLACE FUNCTION insertInfo(ZE int, need_date date default null) returns void AS
$$
BEGIN
if need_date is not null then
for z IN 1..ze LOOP
insert into x ( your_date,your_group,your_days)
select need_date + interval '1' day, your_group,your_days+z from x
where your_date = need_date;
end loop;
else
FOR i IN 1..ze LOOP
insert into x ( your_date,your_group,your_days)
select your_date + interval '1' day, your_group,your_days+1 from x
where your_date = (select max(your_date) from x);
end loop;
end if;
END
$$
LANGUAGE 'plpgsql';
if necessary date any count any quantity one parametr: select insertInfo(1,'2023-01-01');
if not necessary, just any quantity one parametr: select insertInfo(1);
I have a simple table, which is queried from my backend every minute.
id (int) | phone_number (string) | start (timedatestamp) | period (string) | occurances (int)
I make an sql query, which runs every minute, and returns the results. It's selects all phone_numbers which start this minute.
SELECT * FROM table
WHERE start >= date_trunc('minute', now()) and
start < date_trunc('minute', now()) + interval '1 minute'
as results
This runs fine, but I need to update the table as well, based on this select results.
There are two parts to this:
For each selected row, I need the occurrences to decrement by 1 and update the database with this
For each selected row, if the periodicity='MONTHLY", I need the start column to change to the date and time exactly a month from now.
Is it possible to do this in one SQL statement? Any help or examples are greatly appreciated :)
Yes and you can do so directly. The only 'twist' is when a column in mentioned in the SET clause Postgres always writes the Rvalue. When you desire to conditionally update a column you set the Rvalue to the existing value when the condition is not meet. See fiddle here.
update atable
set occurances = occurances-1
, start_tm = case when period_txt = 'Monthly'
then now()+interval '1 month'
else start_tm
end
where date_trunc('minute',start_tm) = date_trunc('minute',now());
i have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new ID's from today, compared with yesterday (the ID's from today that are not present yesterday)
This needs to be done with just one query, maximum efficiency because the table will have 4-5 millions records
As a java developer i am able to do this with 2 queries, but with just one is beyond my knowledge so any help would be so much appreciated
EDIT: date format is dd/mm/yyyy and every day each ID may come 0 or 1 times
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function index. Without it, the query would have to full scan the table, which would probably be quite slow.
create idx on mytable( trunc(sysdate) , id );
I need to list number of column1 that have been added to the database over the selected time period (since the day the list is requested)-daily, weekly (last 7 days), monthly (last 30 days) and quarterly (last 3 months). for example below is the table I created to perform this task.
Column | Type | Modifiers
------------------+-----------------------------+-----------------------------------------------------
column1 character varying (256) not null default nextval
date timestamp without time zone not null default now()
coloumn2 charater varying(256) ..........
Now, I need the total count of entries in column1 with respect the selected time period.
Like,
Column 1 | Date | Coloumn2
------------------+-----------------------------+-----------------------------------------------------
abcdef 2013-05-12 23:03:22.995562 122345rehr566
njhkepr 2013-04-10 21:03:22.337654 45hgjtron
ffb3a36dce315a7 2013-06-14 07:34:59.477735 jkkionmlopp
abcdefgggg 2013-05-12 23:03:22.788888 22345rehr566
From above data, for daily selected time period it should be count= 2
I have tried doing this query
select count(column1) from table1 where date='2012-05-12 23:03:22';
and have got the exact one record matching the time stamp. But I really needed to do it in proper way I believe this is not an efficient way of retrieving the count. Anyone who could help me know the right and efficient way of writing such query would be great. I am new to the database world, and I am trying to be efficient in writing any query.
Thanks!
[EDIT]
Each query currently is taking 175854ms to get process. What could be the efficient way to lessen the time to have it processed accordingly. Any help would be really great. I am using Postgresql to do the same.
To be efficient, conditions should compare values of the sane type as the columns being compared. In this case, the column being compared - Date - has type timestamp, so we need to use a range of tinestamp values.
In keeping with this, you should use current_timestamp for the "now" value, and as confirmed by the documentation, subtracting an interval from a timestamp yields a timestamp, so...
For the last 1 day:
select count(*) from table1
where "Date" > current_timestamp - interval '1 day'
For the last 7 days:
select count(*) from table1
where "Date" > current_timestamp - interval '7 days'
For the last 30 days:
select count(*) from table1
where "Date" > current_timestamp - interval '30 days'
For the last 3 months:
select count(*) from table1
where "Date" > current_timestamp - interval '3 months'
Make sure you have an index on the Date column.
If you find that the index is not being used, try converting the condition to a between, eg:
where "Date" between current_timestamp - interval '3 months' and current_timestamp
Logically the same, but may help the optimizer to choose the index.
Note that column1 is irrelevant to the question; being unique there is no possibility of the row count being different from the number of different values of column1 found by any given criteria.
Also, the choice of "Date" for the column name is poor, because a) it is a reserved word, and b) it is not in fact a date.
If you want to count number of records between two dates:
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-05-13'
-- count for one day, upper bound not included
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-06-13'
-- count for one month, upper bound not included
select count(*)
from Table1
where
"Date" >= current_date and
"Date" < current_date + interval '1 day'
-- current date
What I understand from your wording is
select date_trunc('day', "date"), count(*)
from t
where "date" >= '2013-01-01'
group by 1
order by 1
Replace 'day' for 'week', 'month', 'quarter' as needed.
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
Create an index on the "date" column.
select count(distinct column1) from table1 where date > '2012-05-12 23:03:22';
I assume "number of column1" means "number of distinct values in column1.
Edit:
Regarding your second question (speed of the query): I would assume that an index on the date column should speed up the runtime. Depending on the data content, this could even be declared unique.
To throw another option into the mix...
Add a column of type "date" and index that -- named "datecol" for this example:
create index on tbl_datecol_idx on tbl (datecol);
analyze tbl;
Then your query can use an equality operator:
select count(*) from tbl where datecol = current_date - 1; --yesterday
Or if you can't add the date datatype column, you could create a functional index on the existing column:
create index tbl_date_fbi on tbl ( ("date"::DATE) );
analyze tbl;
select count(*) from tbl where "date"::DATE = current_date - 1;
Note1: you do not need to query "column1" directly as every row has that attribute filled due to the NOT NULL.
Note2: Creating a column named "date" is poor form, and even worse that it is of type TIMESTAMP.
A series of dates with a specified interval can be generated using a variable and a static date as per the linked question that I asked earlier. However when there's a where clause to produce a start date, the dates generation seems to stop and only shows the first interval date. I also checked other posts, those that I found e.g. 1, e.g. 2, e.g. 3 are shown with a static date or using CTE.. I am looking for a solution without storedprocedures/functions...
This works:
SELECT DATE(DATE_ADD('2012-01-12',
INTERVAL #i:=#i+30 DAY) ) AS dateO
FROM members, (SELECT #i:=0) r
where #i < DATEDIFF(now(), date '2012-01-12')
;
These don't:
SELECT DATE_ADD(date '2012-01-12',
INTERVAL #j:=#j+30 DAY) AS dateO, #j
FROM `members`, (SELECT #j:=0) s
where #j <= DATEDIFF(now(), date '2012-01-12')
and mmid = 100
;
SELECT DATE_ADD(stdate,
INTERVAL #k:=#k+30 DAY) AS dateO, #k
FROM `members`, (SELECT #k:=0) t
where #k <= DATEDIFF(now(), stdate)
and mmid = 100
;
SQLFIDDLE REFERENCE
Expected Results:
Be the same as the first query results given it starts generating dates with stDate of mmid=100.
Preferably in ANSI SQL so it can be supported in MYSQL, SQL Server/MS Access SQL as Oracle has trunc and rownum given per this query with 14 votes and PostGres has generatge_Series function. I would like to know if this is a bug or a limitation in MYSQL?
PS: I have asked a similar quetion before. It was based on static date values where as this one is based on a date value from a table column based on a condition.
The simplest way to insure cross-platform compatibility is to use a calendar table. In its simplest form
create table calendar (
cal_date date primary key
);
insert into calendar values
('2013-01-01'),
('2013-01-02'); -- etc.
There are many ways to generate dates for insertion.
Instead of using a WHERE clause to generate rows, you use a WHERE clause to select rows. To select October of this year, just
select cal_date
from calendar
where cal_date between '2013-10-01' and '2013-10-31';
It's reasonably compact--365,000 rows to cover a period of 1000 years. That ought to cover most business scenarios.
If you need cross-platform date arithmetic, you can add a tally column.
drop table calendar;
create table calendar (
cal_date date primary key,
tally integer not null unique check (tally > 0)
);
insert into calendar values ('2012-01-01', 1); -- etc.
To select all the dates of 30-day intervals, starting on 2012-01-12 and ending at the end of the calendar year, use
select cal_date
from calendar
where ((tally - (select tally
from calendar
where cal_date = '2012-01-12')) % 30 ) = 0;
cal_date
--
2012-01-12
2012-02-11
2012-03-12
2012-04-11
2012-05-11
2012-06-10
2012-07-10
2012-08-09
2012-09-08
2012-10-08
2012-11-07
2012-12-07
If your "mmid" column is guaranteed to have no gaps--an unspoken requirement for a calendar table--you can use the "mmid" column in place of my "tally" column.