BigQuery Loop through Start and End date - sql

I have a table with start and end dates at an interval of 6 months. Below is an example:
Row  start_date  end_date
1    2018-09-18  2019-03-18
2    2019-03-18  2019-09-18
3    2019-09-18  2020-03-18
I have a master table (which is very big), so I have to loop through these start_date and end_date pairs and insert the selected records into a different table. Below is a sample query.
create table dataset.t1 (v1, v2, v3, create_dt);
LOOP
  insert into dataset.t1 (v1, v2, v3, create_dt)
  select v1, v2, v3, create_dt
  from dataset.t2
  where create_dt >= (select start_date from dataset.t1)
    and create_dt < (select end_date from dataset.t1)
END LOOP;
When I tried this with LOOP, I got the error below:
Query error: Scalar subquery produced more than one element
Could anyone please help me with how to implement this? My final goal is to improve performance by dividing the data into different date ranges.

On the error that you've got: the problem is that your (select start_date from dataset.t1) returns more than one element. I'm not sure what you want to achieve, but for the subquery to work it should be something like (select MIN(start_date) from dataset.t1).
I also don't understand your loop, because nothing changes between iterations (besides inserting rows into t1); you should think about when your loop should exit.
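If the goal is one INSERT per date range, BigQuery scripting's FOR...IN construct loops over a query result directly and exits when the rows are exhausted, which sidesteps both problems. A sketch, assuming the start/end pairs live in a table called dataset.date_ranges (a hypothetical name):
FOR rec IN (SELECT start_date, end_date FROM dataset.date_ranges ORDER BY start_date)
DO
  -- Copy the rows falling into the current range.
  INSERT INTO dataset.t1 (v1, v2, v3, create_dt)
  SELECT v1, v2, v3, create_dt
  FROM dataset.t2
  WHERE create_dt >= rec.start_date
    AND create_dt < rec.end_date;
END FOR;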

The below works in SQL Server; you will have to find the BigQuery equivalent of a cursor. You could also use an array to hold the start and end dates and loop through it.
You can create a temporary table for storing the start date and end date, and call it #temp_db. Open a cursor and fetch the first start date and end date from #temp_db into variables, say @start_date and @end_date.
Execute the SQL:
INSERT INTO #new_tbl
SELECT * FROM #src_tbl WHERE create_dt >= @start_date AND create_dt < @end_date
For every row fetched from #temp_db, keep inserting into a new table (or the same table, as per your requirement). When there are no more rows to fetch from #temp_db, you will have inserted all the records into #new_tbl.
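Put together, a minimal T-SQL sketch of that cursor loop could look like this (table and column names follow the fragments above and are otherwise illustrative):
DECLARE @start_date DATE, @end_date DATE;

DECLARE range_cursor CURSOR FOR
    SELECT start_date, end_date FROM #temp_db;

OPEN range_cursor;
FETCH NEXT FROM range_cursor INTO @start_date, @end_date;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Copy the rows falling into the current date range.
    INSERT INTO #new_tbl
    SELECT * FROM #src_tbl
    WHERE create_dt >= @start_date AND create_dt < @end_date;

    FETCH NEXT FROM range_cursor INTO @start_date, @end_date;
END

CLOSE range_cursor;
DEALLOCATE range_cursor;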

How to make a group query to select multiple rows?

I have a DateTime column (timestamp, e.g. 2022-05-22 10:10:12) with a batch of stamps for each day.
I need to filter the rows where the stamp is before 9 am (no problem here), and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of rows where the time in the timestamp is before 9 am (50 rows out of 2000).
date_in
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am, and here I'm stuck. Any idea how to select all the rows from the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * INTO temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);

SELECT * INTO temp2
FROM tickets, temp1
WHERE date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);

UPDATE temp2 SET distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
date_in
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1), feed it with the first query's result (just the rows before 9 am), and then make a cross-table query to find, in the source table public.tickets, all the days that match days in public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
SELECT *
FROM tickets
WHERE EXISTS (
SELECT 1 FROM temp1 WHERE TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
);
Ideally, I'd want to avoid using a temporary table and make a request just for one table.
After that, I need to create a view or update and add some remarks to the source table.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM tickets t
WHERE EXISTS (
   SELECT FROM tickets t1
   WHERE t1.date_in::date = t.date_in::date -- same day
   AND   t1.date_in::time < time '9:00'     -- time before 9:00
   AND   t1.id <> t.id                      -- exclude self
   )
ORDER BY date_in; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
... this query is slow for big tables, because it cannot use a plain index on (date_in) (the predicate is not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on information that hasn't been disclosed, as is typical for performance questions.
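As one example, assuming date_in is a plain timestamp as in the question, an expression index on the date part lets the same-day lookup in the EXISTS subquery use an index instead of scanning the whole table; a sketch (the index name is illustrative, and this cast would not be allowed for timestamptz, where it is not immutable):
CREATE INDEX tickets_date_in_date_idx ON tickets ((date_in::date));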

Limit result rows for minimal time intervals for PostgreSQL

Background: I am running TeslaMate/Grafana for monitoring my car status; one of the gauges plots the battery level fetched from the database. My server is located remotely and runs in Docker on an old NAS, so both query performance and network overhead matter.
I found the kiosk page frequently hangs, and on investigation it might be caused by the query: two of the plots return 10~100k rows of results from the database. I want to limit the number of rows returned by the SQL queries, as the plots certainly don't have that much precision for drawing such detailed intervals.
I tried to follow this answer and use row_number() to keep only every 100th row of the results, but a more complicated issue turned up: the time intervals between rows are not consistent.
The car has 4 statuses: driving / online / asleep / offline.
If the car is in the driving status, the time interval can be less than 200 ms, as the car pushes its status whenever it has new data.
If the car is in the online status, the time interval can be several minutes, as the system actively fetches the status from the car.
Even worse, if the system thinks the car is going to sleep and needs to stop fetching status (to avoid preventing the car from sleeping), the interval can be up to 40 minutes, depending on settings.
If the car is in the asleep/offline status, no data is recorded at all.
This obviously makes skipping every n-th row a bad idea: for cases 2-4 above, lots of data points might be missing, so Grafana cannot plot a correct graph representing the battery level at satisfactory precision.
I wonder if it's possible to skip rows by the time interval from a datetime field rather than by row_number(), without much query overhead, i.e., fetch only rows at least 1000 ms after the previously returned row.
E.g., with the following data in the table, I want rows 1, 4 and 5 returned:
row date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
The current (problematic) method I am using is as follows:
SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
  SELECT date, battery_level, row_number() OVER (ORDER BY date ASC) AS row
  FROM (
    SELECT battery_level, date
    FROM positions
    WHERE car_id = $car_id AND $__timeFilter(date)
    UNION ALL
    SELECT battery_level, date
    FROM charges c
    JOIN charging_processes p ON p.id = c.charging_process_id
    WHERE $__timeFilter(date) AND p.car_id = $car_id
  ) AS data
  ORDER BY date ASC
) AS t
WHERE t.row % 100 = 0;
This method clearly has the problem that it returns only every n-th row instead of what I want (with the last line reading t.row % 2 = 0, it returns alternate rows).
PS: please ignore the table structures and the UNION in the sample code; I haven't dug deep enough into the tables, which might allow other tweaks, but that's irrelevant to this question anyway.
Thanks in advance!
You can use a recursive CTE:
WITH RECURSIVE rec(cur_row, cur_date) AS (
  (
    SELECT row, date
    FROM t
    ORDER BY date
    LIMIT 1
  )
  UNION ALL
  (
    SELECT row, date
    FROM t
    JOIN rec ON t.date >= cur_date + 1000
    ORDER BY t.date
    LIMIT 1
  )
)
SELECT *
FROM rec;
cur_row  cur_date
1        1610000001000
4        1610000002000
5        1610000005000
View on DB Fiddle
Using a function instead would probably be faster:
CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
$$
DECLARE
  row t%ROWTYPE;
  cur_date BIGINT;
BEGIN
  FOR row IN
    SELECT *
    FROM t
    ORDER BY date
  LOOP
    IF row.date >= cur_date + 1000 OR cur_date IS NULL THEN
      cur_date := row.date;
      RETURN NEXT row;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT *
FROM f();
row  date
1    1610000001000
4    1610000002000
5    1610000005000

Last 10 days Values based on Last available data

I have to fetch data from a table in such a way that the start date for fetching is based on the date when the last data was inserted.
For example, I have data from 24/01/2011 to now, but for some specific id the last data was inserted on 24/01/2012. In this case, for that id I have to fetch the data from 14/01/2012 to 24/01/2012.
Because I don't know the last date when data was inserted, I first have to fetch the max date and derive the start date from it. Is there a fast way to do that, so everything is handled in a single, fast query?
If I understand your question, answer may look like this:
SELECT *
FROM table_1 t1
WHERE t1.start_date >= to_date('14/01/2012', 'DD/MM/YYYY')
  AND t1.start_date <=
      (SELECT MAX(t11.insert_date) FROM table_1 t11 WHERE t1.id = t11.id)
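If the start date should itself be derived from the last inserted date rather than hard-coded, a variant like the following might work; this is only a sketch, assuming Oracle-style date arithmetic where subtracting a number from a date subtracts days:
SELECT *
FROM table_1 t1
WHERE t1.start_date >= (SELECT MAX(t11.insert_date) - 10
                        FROM table_1 t11
                        WHERE t1.id = t11.id)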
The query below selects all rows where the start date is greater than or equal to the last start date minus 10 days. I believe it is correct for Firebird; note that the maximum has to come from a subquery, since an aggregate cannot appear directly in the WHERE clause:
SELECT *
FROM table_1 t1
WHERE t1.startdate >= dateadd(-10 DAY TO (SELECT MAX(t2.startdate) FROM table_1 t2))

T-SQL looping procedure

I have the following data:
ID     Date                 interval  interval_date            tot_activity  non_activity
22190  2011-09-27 00:00:00  1000      2011-09-27 10:00:00.000  265           15
I have another table with this data:
Date                   ID          Start                 END                   sched_non_activity  non_activity
10/3/2011 12:00:00 AM  HBLV-22267  10/3/2011 2:02:00 PM  10/3/2011 2:11:00 PM  540
Now, in the second table's non_activity field, I would like to store the value from the first table. However, I need to capture tot_activity - non_activity where the intervals (in 15-minute increments) from the first table fall in the same time frame as the start and end of the second table.
I have tried setting variables and a loop that verifies the start time against each interval row by row, but I have no idea how to return a variable with only one record, as I keep getting errors that my variable receives too many results.
I have tried looking everywhere for tutorials and I can't find anything to help me out. Does anyone have any pointers or tutorials on looping they could share?
You need to generate the interval end dates somehow; I'm assuming that the first table always has a record per 15-minute interval. In this case, an example would look like this:
;WITH Intervals AS
(
    SELECT
        Interval_date
        ,DATEADD(ms, 840997, Interval_date) AS interval_end
        ,nonactivity
    FROM A
)
--Select query for Validation
--SELECT
--    b.[Date]
--    ,b.ID
--    ,b.Start
--    ,b.sched_non_activity
--    ,i.nonactivity
--FROM B
--JOIN Intervals AS i
--    ON b.Start >= i.Interval_date
--    AND b.[END] <= i.interval_end
UPDATE B
SET non_activity = i.nonactivity
FROM B
JOIN Intervals AS i
    ON b.Start >= i.Interval_date
    AND b.[END] <= i.interval_end
Obviously, you might need to tweak this depending on the exact circumstances.

SQL Query data issues

I have the following data:
ID     Date                 interval  interval_date            tot_activity  non_activity
22190  2011-09-27 00:00:00  1000      2011-09-27 10:00:00.000  265           15
I have another table with this data:
Date                   ID          Start                 END                   sched_non_activity  non_activity
10/3/2011 12:00:00 AM  HBLV-22267  10/3/2011 2:02:00 PM  10/3/2011 2:11:00 PM  540
Now, in the second table's non_activity field, I would like to store the value from the first table. However, I need to capture tot_activity - non_activity where the intervals (in 15-minute increments) from the first table fall in the same time frame as the start and end of the second table.
I have the following so far:
SELECT t1.ID, t1.Date, t1.interval, t1.interval_date, t1.tot_activity, t1.non_activity,
       t1.tot_activity - t1.non_activity AS non_activity
FROM table1 AS t1
INNER JOIN LIST AS L ON t1.ID = L.ID
INNER JOIN table2 AS t2 ON t1.Date = t2.Date AND L.ID = RIGHT(t2.ID, 5)
WHERE t1.interval_date >= t2.Start AND t1.interval_date < t2.[END]
ORDER BY t1.ID, t1.interval_date
With this, I can already see that I will be unable to capture the case where a start from table 2 is at 15:50, which means that I need to capture the 15:45 interval.
Is there any way of doing this through queries, or should I be using variables and doing the check per interval? Any help at all would be greatly appreciated.
I think you are asking too much from a query here.
What I would do is treat the two tables as lists ordered by timestamps and solve the problem programmatically (i.e. not with a single query).
For example, create a function that traverses the first table in 15-minute increments and finds the best match in the second table (I am guessing this is what you are trying to do). Implement your function to return the same result set as your query above, or store it in a temporary table and select from the result set. T-SQL is your friend :)
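For the specific gap mentioned in the question (a start of 15:50 needing the 15:45 interval), a set-based alternative is to floor the start time down to its 15-minute boundary before joining. This is only a sketch, with table and column names taken from the question and otherwise illustrative:
-- Floor t2.Start to its 15-minute boundary (minutes since 1900-01-01,
-- integer-divided by 15 and scaled back up), then match the interval start.
SELECT t2.ID, t2.Start, t1.tot_activity - t1.non_activity AS non_activity
FROM table2 AS t2
JOIN table1 AS t1
    ON t1.interval_date = DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, t2.Start) / 15) * 15, 0);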
I'm having a tough time understanding your issue, but you might have better luck with the DATEDIFF function:
DATEDIFF(SECOND, t1.interval_date, t2.Start) >= 0 AND DATEDIFF(SECOND, t1.interval_date, t2.[END]) <= 0
I apologize if I'm not catching your drift. If I'm missing something, could you try to clarify a little bit?