How do I return columns in timeseries data as arrays - sql

I am using Postgres and TimescaleDB to record data that will be used for dashboards/charting.
I have no issues getting the data I need; I'm just not sure I'm doing it in the most efficient way.
Say I have this query:
SELECT time, queued_calls, active_calls
FROM call_data
ORDER BY time DESC LIMIT 100;
My front end currently receives this data for charting as an array of row objects, with the column name repeated for each value.
I feel like this is very inefficient.
Would it be better to send the data in a more efficient shape, with each column as an array of values, like so:
{
time: [...],
queued_calls: [...],
active_calls: [...]
}
I guess my question is: should I restructure my query so the column data comes back as arrays somehow, or is this something I should do on the server after the query, before sending it to the client?
-- Update -- Additional Information
I'm using Node.js with Express and Sequelize as the ORM; however, in this case I'm just executing a raw query via Sequelize.
The charting library I'm using on the front end also takes the series data as arrays, so I was trying to kill two birds with one stone.
Frontend chart data format:
xaxis: {
  categories: [...time]
},
series: [
  { name: "Queued Calls", data: [...queued_calls] },
  { name: "Active Calls", data: [...active_calls] }
]
Backend code:
const { QueryTypes } = require('sequelize'); // QueryTypes is used below

async function getLocationData(locationId) {
  return await db.sequelize.query(
    'SELECT time, queued_calls, active_calls FROM location_data WHERE location_id = :locationId ORDER BY time DESC LIMIT 100;',
    {
      replacements: { locationId },
      type: QueryTypes.SELECT,
    }
  );
}
...
app.get('/locationData/:locationId', async (req, res) => {
  try {
    const { locationId } = req.params;
    const results = await getLocationData(parseInt(locationId));
    res.send(results);
  } catch (e) {
    console.log('Error getting data', e);
    res.status(500).send('Error getting data'); // respond so the request does not hang on errors
  }
});

Maybe you're looking for array aggregation?
Let me see if I follow the idea:
create table call_data (time timestamp, queued_calls integer, active_calls integer);
CREATE TABLE
I skipped the location_id just to simplify here.
Inserting some data:
tsdb=> insert into call_data values ('2021-03-03 01:00', 10, 20);
INSERT 0 1
tsdb=> insert into call_data values ('2021-03-03 02:00', 11, 22);
INSERT 0 1
tsdb=> insert into call_data values ('2021-03-03 03:00', 12, 25);
INSERT 0 1
And now check the data:
SELECT time, queued_calls, active_calls FROM call_data;
time | queued_calls | active_calls
---------------------+--------------+--------------
2021-03-03 01:00:00 | 10 | 20
2021-03-03 02:00:00 | 11 | 22
2021-03-03 03:00:00 | 12 | 25
(3 rows)
And if you want only the date part:
SELECT time::date, queued_calls, active_calls FROM call_data;
time | queued_calls | active_calls
------------+--------------+--------------
2021-03-03 | 10 | 20
2021-03-03 | 11 | 22
2021-03-03 | 12 | 25
(3 rows)
but that still doesn't group anything, so you can combine GROUP BY with array_agg:
SELECT time::date, array_agg(queued_calls), array_agg(active_calls) FROM call_data group by time::date;
time | array_agg | array_agg
------------+------------+------------
2021-03-03 | {10,11,12} | {20,22,25}
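If you want one array per column for the whole result from the original question (rather than one per day), the same array_agg trick can be applied over a subquery that selects the last 100 rows. A sketch, reusing the simplified call_data table above (the Node.js pg driver will normally hand these back as JavaScript arrays):
-- aggregate each column of the newest 100 rows into a single array
SELECT array_agg(time ORDER BY time DESC)         AS time,
       array_agg(queued_calls ORDER BY time DESC) AS queued_calls,
       array_agg(active_calls ORDER BY time DESC) AS active_calls
FROM (
    SELECT time, queued_calls, active_calls
    FROM call_data
    ORDER BY time DESC
    LIMIT 100
) AS last_100;
This returns a single row whose columns match the {time: [...], queued_calls: [...], active_calls: [...]} shape you described.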

If the server compresses the response as it sends it, the array structure you're considering won't make much difference at the network layer.
If you do use that array structure, you're giving up one of the benefits of JSON, which is keeping structure together with data. You might gain some speed, but to see the active calls for a given time you'd have to look up the value at the correct index, which opens the possibility of index errors.
I recommend leaving the data as it is.

Related

How do I get all entries 5 minutes after a certain condition in SQL?

I am trying to solve the following problem using SQL:
I have a table (example shown below) with action items per user, the timestamp when the action happened and a unique identifier for each entry.
I want to find out what actions each user takes in the 5 minutes after a specific action occurs. For example, for all users with the action item "sit", I want to see what happens in the 5 minutes after that, i.e. all entries starting from the "sit" action item.
I hope someone can help!!
Thank you!
(table example image)
I started using ROW_NUMBER, partitioning by user and ordering by time, but after that I don't know how to continue.
Your question is not entirely clear; however, in my understanding, it is easier to use a JOIN:
create table log(UserName varchar(20),ActionTime datetime,ActionItem varchar(10),ActionId varchar(26));
insert into log values
('Anna' ,cast('2022-07-30 13:17:22' as datetime),'walk' ,'uid_1')
,('Peter' ,cast('2022-07-30 15:39:46' as datetime),'drive' ,'uid_2')
,('Sarah' ,cast('2022-07-30 09:07:53' as datetime),'stand' ,'uid_3')
,('Kurt' ,cast('2022-07-30 00:56:14' as datetime),'sit' ,'uid_4')
,('Deborah' ,cast('2022-07-30 15:26:02' as datetime),'lie' ,'uid_5')
,('Michelle',cast('2022-07-30 15:26:03' as datetime),'scratch','uid_6')
,('Sven' ,cast('2022-07-30 15:26:04' as datetime),'run' ,'uid_7')
,('Sarah' ,cast('2022-07-30 15:28:06' as datetime),'swim' ,'uid_8')
,('Peter' ,cast('2022-07-30 13:17:22' as datetime),'look' ,'uid_9')
;
select a.ActionId,a.UserName,a.ActionItem,a.ActionTime
,b.ActionTime,b.UserName,b.ActionItem,b.ActionId
from log a left join log b
on b.ActionId<>a.ActionId
and b.ActionTime>=a.ActionTime
and datediff(mi,a.ActionTime,b.ActionTime)<5
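If you only want windows that start with the specific action from the question (for example "sit"), a WHERE clause on the a alias can be added. A sketch based on the query above:
select a.ActionId, a.UserName, a.ActionItem, a.ActionTime
      ,b.ActionTime, b.UserName, b.ActionItem, b.ActionId
from log a left join log b
  on b.ActionId <> a.ActionId
  and b.ActionTime >= a.ActionTime
  and datediff(mi, a.ActionTime, b.ActionTime) < 5
where a.ActionItem = 'sit';   -- only look at the 5 minutes after a 'sit' entry
If only follow-up actions by the same user should count, and b.UserName = a.UserName could be added to the join condition as well.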
I guess this problem cannot be solved with a single query, but you can use a series of queries.
To answer your question I will use the MySQL dialect of SQL; I believe the choice of dialect doesn't matter much.
As a first step, let's assume we are only interested in the last "sit" action. In this case we can run this query:
SELECT * FROM user_actions WHERE ACTION_ITEM = "sit" ORDER BY TIMESTAMP DESC LIMIT 1;
So the result is
+------+---------------------+-------------+-------------------+
| USER | TIMESTAMP | ACTION_ITEM | UNIQUE_IDENTIFIER |
+------+---------------------+-------------+-------------------+
| Kurt | 2022-07-30 00:56:14 | sit | 4 |
+------+---------------------+-------------+-------------------+
Then save the timestamp value in a variable:
SELECT TIMESTAMP INTO @reason_ts FROM user_actions WHERE ACTION_ITEM = "sit" ORDER BY TIMESTAMP DESC LIMIT 1;
Now we need to get the further actions in the next 5 minutes (actually I took 12 hours, because 5 minutes is not enough for your example data). Let's do this:
SELECT csq.* FROM user_actions AS csq WHERE TIMESTAMP BETWEEN @reason_ts AND ADDTIME(@reason_ts, '12:00:00');
The result is:
+-------+---------------------+-------------+-------------------+
| USER | TIMESTAMP | ACTION_ITEM | UNIQUE_IDENTIFIER |
+-------+---------------------+-------------+-------------------+
| Sarah | 2022-07-30 09:07:53 | stand | 3 |
| Kurt | 2022-07-30 00:56:14 | sit | 4 |
+-------+---------------------+-------------+-------------------+
If you need all further actions, modify the query:
SELECT csq.* FROM user_actions AS csq WHERE TIMESTAMP >= @reason_ts;
If you need more than just the last "sit" action it gets more difficult; I think you would need to write some kind of script or SQL function. But it is still doable.

How to transform a JSON String into a new table in Bigquery

I have a lot of raw data in one of my tables in BigQuery, and from this raw data I need to create a new table.
The raw data table has a column named raw_output; this column contains a JSON object that has been stringified. It looks like this:
| raw_output |
| ----------------------------------------------------------------------|
| {"client":"A9310","c_integration":"889625","idntf":false,"nf_p":8.32} |
| {"client":"VB050","c_integration":"236590","idntf":true,"nf_p":4.36} |
| {"client":"XT5543","c_integration":"326957","idntf":true,"nf_p":2.33} |
From this table I would like to get something like:
| client | c_integration | idntf | nf_p |
| ------ | ------------- | ----- | ---- |
| A9310  | 889625        | false | 8.32 |
| VB050  | 236590        | true  | 4.36 |
| XT5543 | 326957        | true  | 2.33 |
This way I can perform JOINs and do other operations with the data. I have looked into Google's BQ docs (JSON functions), but I was not able to get the expected output. Any idea/solution is much appreciated.
Thank you all in advance.
This should help
with raw_data as (
select '{"client":"A9310","c_integration":"889625","idntf":false,"nf_p":8.32}' as raw_input union all
select '{"client":"VB050","c_integration":"236590","idntf":true,"nf_p":4.36}' as raw_input union all
select '{"client":"XT5543","c_integration":"326957","idntf":true,"nf_p":2.33}' as raw_input
)
select
json_extract_scalar(raw_input, '$.client' ) as client,
json_extract_scalar(raw_input, '$.c_integration' ) as c_integration,
json_extract_scalar(raw_input, '$.idntf' ) as idntf,
json_extract_scalar(raw_input, '$.nf_p' ) as nf_p
from raw_data
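To materialize that as a new table from the real raw data instead of the inline sample, the same extraction can be wrapped in a CREATE TABLE ... AS SELECT. A sketch, where my_dataset.raw_table and my_dataset.parsed_table are placeholder names for your actual tables:
CREATE TABLE my_dataset.parsed_table AS
SELECT
  json_extract_scalar(raw_output, '$.client')                 AS client,
  json_extract_scalar(raw_output, '$.c_integration')          AS c_integration,
  CAST(json_extract_scalar(raw_output, '$.idntf') AS BOOL)    AS idntf,   -- cast the "true"/"false" string to a boolean
  CAST(json_extract_scalar(raw_output, '$.nf_p') AS FLOAT64)  AS nf_p     -- cast the numeric string to FLOAT64
FROM my_dataset.raw_table;
The resulting table can then be joined like any other.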

django database design when you will have too many rows

I have a django web app with postgres db; the general operation is that every day I have an array of values that need to be stored in one of the tables.
There is no foreseeable need to query the individual values of the array, but I need to be able to plot the values for a specific day.
The problem is that this array is pretty big: if I were to store it in the db value by value, I'd have 60 million rows per year, but if I store each row as a blob object, I'd have 60 thousand rows per year.
Is it a good decision to use a blob object to reduce table size when you do not want to query on the values inside the row?
Here are the two options:
option1: keeping all
group(foreignkey) | parent(foreignkey) | pos(int) | length(int)
A                 | B                  | 232      | 45
A                 | B                  | 233      | 45
A                 | B                  | 234      | 45
A                 | B                  | 233      | 46
...
option2: collapsing the array into a blob:
group(fk) | parent(fk) | mean_len(float) | values(blob)
A         | B          | 45              | [(pos=232, len=45), ...]
...
so I do NOT want to query pos or length but I want to query group or parent.
An example of read query that I'm talking about is:
SELECT * FROM "mytable"
LEFT OUTER JOIN "group"
ON ( "group"."id" = "grouptable"."id" )
ORDER BY "pos" DESC LIMIT 100
which is a typical django admin list_view page main query.
I tried loading the data and displaying the table in the Django admin page without doing any complex query (just a read query).
Once I get past 1.5 million rows, the admin page freezes. All it takes is a count query on that table to crash the app, so I should definitely either keep the data as a blob or keep it out of the db entirely and use the filesystem instead.
I want to emphasize that I used Django 1.8 as my test bench, so this is not a Postgres evaluation but rather a system evaluation of Django admin plus Postgres.
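For reference, a sketch of what option 2 could look like at the Postgres level, with a jsonb column holding the collapsed array; all table and column names here are illustrative, not the actual schema:
CREATE TABLE collapsed_data (
    id         bigserial PRIMARY KEY,
    group_id   integer NOT NULL,   -- FK to the group table
    parent_id  integer NOT NULL,   -- FK to the parent table
    mean_len   double precision,
    value_list jsonb               -- e.g. [{"pos": 232, "len": 45}, ...]
);
-- group/parent stay cheap to filter on; pos/len live inside the blob
CREATE INDEX ON collapsed_data (group_id, parent_id);
A bytea column would work just as well if the values never need to be inspected from SQL.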

SQL dynamic column name

How do I declare a column name that changes?
I take some data from the DB and I am interested in the last 12 months, so I only take events that happened in, let's say, '2016-07', '2016-06' and so on...
Then, I want my table to look like this:
event type | 2016-07 | 2016-06
-------------------------------
A | 12 | 13
B | 21 | 44
C | 98 | 12
How can I achieve the effect that the columns are named using the previous YYYY-MM values, keeping in mind that the report built from this query can be executed at any time, so the names would change?
Simplified query for the previous month only:
select distinct
count(event),
date_year_month,
event_name
from
data_base
where date_year_month = TO_CHAR(add_months(current_date, -1),'YYYY-MM')
group by event_name, date_year_month
I don't think there is an automated way of pivoting the year-month columns and changing the number of columns in the result dynamically based on the data.
However, if you are looking for a pivoting solution, you can accomplish it using table functions in Netezza.
select event_name, year_month, event_count
from event_counts_groupby_year_month, table(inza.inza.nzlua('
local rows={}
function processRow(y2016m06, y2016m07)
rows[1] = { 201606, y2016m06 }
rows[2] = { 201607, y2016m07 }
return rows
end
function getShape()
columns={}
columns[1] = { "year_month", integer }
columns[2] = { "event_count", double }
return columns
end',
y2016m06, y2016m07));
You could probably build a wrapper around this to dynamically generate the query, based on the year-months present in the table, using a shell script.
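For what it's worth, if the wrapper only ever needs a small, known window of months, the query it generates could also be a plain conditional-aggregation pivot rather than a table function. A sketch for two months, based on the simplified query in the question:
select event_name,
       count(case when date_year_month = '2016-07' then event end) as "2016-07",
       count(case when date_year_month = '2016-06' then event end) as "2016-06"
from data_base
where date_year_month in ('2016-07', '2016-06')
group by event_name;
The month literals and column aliases would still have to be generated at run time by whatever builds the report, just as with the nzlua approach.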

SQL Populate table with random data

I have a table with two fields:
id(UUID) that is primary Key and
description (var255)
I want to insert random data with an SQL statement.
I would like the description to be something random.
PS: I am using PostgreSQL.
I don't know exactly if this fits the requirement for a "random description", and it's not clear if you want to generate the full data set, but, for example, this generates 10 records with consecutive ids and random texts:
test=# SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;
id | descr
----+----------------------------------
1 | 65c141ee1fdeb269d2e393cb1d3e1c09
2 | 269638b9061149e9228d1b2718cb035e
3 | 020bce01ba6a6623702c4da1bc6d556e
4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
5 | a7859b3bcf7ff11f921ceef58dc1e5b5
6 | 63691d4a20f7f23843503349c32aa08c
7 | ca317278d40f2f3ac81224f6996d1c57
8 | bb4a284e1c53775a02ebd6ec91bbb847
9 | b444b5ea7966cd76174a618ec0bb9901
10 | 800495c53976f60641fb4d486be61dc6
(10 rows)
The following worked for me:
create table t_random as select s, md5(random()::text) from generate_Series(1,5) s;
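For the exact table in the question (a UUID primary key plus a varchar description), the same generate_series/md5 trick can be combined with gen_random_uuid(). A sketch, assuming Postgres 13+ (or the pgcrypto extension) for gen_random_uuid(), and an illustrative table name:
CREATE TABLE demo_table (
    id          uuid PRIMARY KEY,
    description varchar(255)
);

INSERT INTO demo_table (id, description)
SELECT gen_random_uuid(), md5(random()::text)   -- random UUID plus a 32-character random hex string
FROM generate_series(1, 1000);                  -- 1000 rows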
Here is a more elegant way using the latest features. I will use the Unix dictionary (/usr/share/dict/words) and copy it into my PostgreSQL data directory:
cp /usr/share/dict/words data/pg95/words.list
Then you can easily create a ton of nonsense descriptions that are nonetheless searchable, since they are built from dictionary words, with the following steps:
1) Create the table and function. getNArrayS takes an array of elements and the number of random words it needs to concatenate.
CREATE TABLE randomTable(id serial PRIMARY KEY, description text);
CREATE OR REPLACE FUNCTION getNArrayS(el text[], count int) RETURNS text AS $$
SELECT string_agg(el[random()*(array_length(el,1)-1)+1], ' ') FROM generate_series(1,count) g(i)
$$
VOLATILE
LANGUAGE SQL;
Once you have everything in place, run the insert using a CTE:
WITH t(ray) AS(
SELECT (string_to_array(pg_read_file('words.list')::text,E'\n'))
)
INSERT INTO randomTable(description)
SELECT getNArrayS(T.ray, 3) FROM T, generate_series(1,10000);
And now, select as usual:
postgres=# select * from randomtable limit 3;
id | description
----+---------------------------------------------
1 | ultracentenarian splenodiagnosis manurially
2 | insequent monopolarity funipendulous
3 | ruminate geodic unconcludable
(3 rows)
I assume sentence == statement? You could use Perl or PL/Perl, as Perl has some good random data generators. Check out the CPAN module Data::Random to start.
Here's a sample Perl script, taken from CPAN, that generates some different random data.
use Data::Random qw(:all);
my @random_words = rand_words( size => 10 );
my @random_chars = rand_chars( set => 'all', min => 5, max => 8 );
my @random_set = rand_set( set => \@set, size => 5 );
my $random_enum = rand_enum( set => \@set );
my $random_date = rand_date();
my $random_time = rand_time();
my $random_datetime = rand_datetime();
open(FILE, ">rand_image.png") or die $!;
binmode(FILE);
print FILE rand_image( bgcolor => [0, 0, 0] );
close(FILE);