Groupby issue on multiple join using Grafana's (TimescaleDB) SQL plugin

I'm using Grafana's SQL plugin to query a TimescaleDB database.
The DB stores weather information as
| timestamp | location_id | type_id | value |
where location_id and type_id are foreign keys to the tables locations, which describes the locations, and weather_data_types, which defines the measurement types (temperature, relative_humidity, ...).
I'd like to query data on a time range, grouped by location and type.
I manage to group by one of them, but not both.
This works and groups by location:
SELECT
$__timeGroupAlias("timestamp", $__interval),
avg(value),
locations.name
FROM weather_data
JOIN locations ON weather_data.location_id = locations.id
GROUP BY 1, locations.name
ORDER BY 1
This works and groups by type:
SELECT
$__timeGroupAlias("timestamp", $__interval),
avg(value),
weather_data_types.name
FROM weather_data
JOIN weather_data_types ON weather_data.type_id = weather_data_types.id
GROUP BY 1, weather_data_types.name
ORDER BY 1
This does not work:
SELECT
$__timeGroupAlias("timestamp", $__interval),
avg(value),
locations.name,
weather_data_types.name
FROM weather_data
JOIN locations ON weather_data.location_id = locations.id
JOIN weather_data_types ON weather_data.type_id = weather_data_types.id
GROUP BY 1, locations.name, weather_data_types.name
ORDER BY 1
More specifically, I get the following error
Value column must have numeric datatype, column: name type: string value: relative_humidity
It seems the third GROUP BY term is (silently) not applied and weather_data_types.name is returned as a value column, which Grafana complains about because it can't plot strings.
Changing the query to return the (integer) id instead removes the error message:
SELECT
$__timeGroupAlias("timestamp", $__interval),
avg(value),
locations.name,
weather_data_types.id
FROM weather_data
JOIN locations ON weather_data.location_id = locations.id
JOIN weather_data_types ON weather_data.type_id = weather_data_types.id
GROUP BY 1, locations.name, weather_data_types.id
ORDER BY 1
but two series are plotted: avg and id, which shows the groupby type is not applied.
Is there anything wrong in my query? Is it an issue with the Grafana plugin?
I don't think it matters, but here's the model, defined with SQLAlchemy and hopefully self-explanatory.
class Location(Base):
__tablename__ = "locations"
id = sqla.Column(sqla.Integer, primary_key=True)
name = sqla.Column(sqla.String(80), unique=True, nullable=False)
country = sqla.Column(sqla.String(80), nullable=False)
latitude = sqla.Column(sqla.Float(), nullable=False)
longitude = sqla.Column(sqla.Float(), nullable=False)
class WeatherDataTypes(Base):
__tablename__ = "weather_data_types"
id = sqla.Column(sqla.Integer, primary_key=True)
name = sqla.Column(sqla.String(80), unique=True, nullable=False)
description = sqla.Column(sqla.String(500), nullable=False)
unit = sqla.Column(sqla.String(20), nullable=False)
min_value = sqla.Column(sqla.Float)
max_value = sqla.Column(sqla.Float)
class WeatherData(Base):
__tablename__ = "weather_data"
timestamp = sqla.Column(sqla.DateTime(timezone=True), primary_key=True)
location_id = sqla.Column(
sqla.Integer,
sqla.ForeignKey('locations.id'),
nullable=False,
primary_key=True
)
location = sqla.orm.relationship('Location')
type_id = sqla.Column(
sqla.Integer,
sqla.ForeignKey('weather_data_types.id'),
nullable=False,
primary_key=True
)
type = sqla.orm.relationship('WeatherDataTypes')
value = sqla.Column(sqla.Float)

Sending the requests directly to PostgreSQL helped me understand what is happening.
Apparently, when a query returns one numeric column and one string column, the Grafana plugin treats the numeric column as the values to plot and the string column as the labels for the series.
I had assumed the plugin used the GROUP BY to turn the grouped column into label information, but that does not extend to two string columns, because the plugin won't concatenate the values itself. It therefore complains that the second string column is not numeric, which is misleading, since it accepts the first string column without complaint.
I got it to work by concatenating the columns I group by into a single column:
SELECT
time_bucket('21600s',"timestamp") AS "time",
avg(value),
CONCAT(locations.name, ' ', weather_data_types.name) AS "name"
FROM weather_data
JOIN locations ON weather_data.location_id = locations.id
JOIN weather_data_types ON weather_data.type_id = weather_data_types.id
GROUP BY 1, locations.name, weather_data_types.name
ORDER BY 1
This returns
time | avg | name
------------------------+--------------------+---------------------------
which is correctly interpreted by the plugin.
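
The labelling behaviour described above can be reproduced outside Grafana. The following is a minimal sketch, not the plugin's actual code. It assumes an in-memory SQLite database standing in for TimescaleDB, made-up sample rows, a hypothetical integer column ts for the timestamp, integer division in place of time_bucket, the || operator in place of CONCAT, and the alias series_label for the concatenated column. It groups by both names, exposes them as one label column, and then builds one series per label, which is roughly what a plotting front end has to do with (time, value, label) rows.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE weather_data_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE weather_data (ts INTEGER, location_id INTEGER, type_id INTEGER, value REAL);
INSERT INTO locations VALUES (1, 'Paris'), (2, 'Oslo');
INSERT INTO weather_data_types VALUES (1, 'temperature'), (2, 'relative_humidity');
-- hypothetical sample measurements (ts is seconds since an arbitrary origin)
INSERT INTO weather_data VALUES
  (0,    1, 1, 10.0),
  (60,   1, 1, 12.0),
  (7200, 1, 1, 14.0),
  (0,    1, 2, 50.0),
  (0,    2, 1, 2.0);
""")

rows = conn.execute("""
SELECT (ts / 3600) * 3600 AS bucket,          -- one-hour buckets
       avg(value) AS avg,
       locations.name || ' ' || weather_data_types.name AS series_label
FROM weather_data
JOIN locations ON weather_data.location_id = locations.id
JOIN weather_data_types ON weather_data.type_id = weather_data_types.id
GROUP BY bucket, locations.name, weather_data_types.name
ORDER BY bucket, series_label
""").fetchall()

# One series per distinct label: the single string column acts as the series key.
series = defaultdict(list)
for bucket, avg, label in rows:
    series[label].append((bucket, avg))

for label, points in sorted(series.items()):
    print(label, points)
```

With a single label column, every (location, type) pair ends up as its own series, and the only numeric column left is the averaged value.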

Related

Why isn't my SQL WHERE clause being respected?

SELECT * FROM "transaction"
WHERE type = {transaction_type} AND
(from_member_id AND to_member_id) IN (SELECT member_id FROM org WHERE org_member = {org.id})
I'm finding that the clause that checks that both from_member_id and to_member_id are in the org I want isn't actually being respected.
For clarity: I want to select all transactions that are between members of the same organisation.
Further, I would expect the version with a join to work too (member is the same member referenced in org_member):
SELECT * FROM "transaction"
INNER JOIN member ON "transaction.member_id" = member.id
WHERE type = {transaction_type} AND
("transaction.from_member_id" AND "transaction.to_member_id") IN (SELECT member_id FROM org WHERE org_member = {org.id})
Any help would be greatly appreciated.
Also, note: the {} syntax is there because I interpolate the strings in Python, so please ignore it.

In the expression:
(from_member_id AND to_member_id) IN (...)
the first part is evaluated as a boolean expression with possible values 1, 0 and NULL, and it is that boolean, not the two ids, that gets compared against the subquery. So it won't produce the expected result.
You need to check them both separately:
AND transaction.from_member_id IN (SELECT member_id FROM org WHERE org_member = {org.id})
AND transaction.to_member_id IN (SELECT member_id FROM org WHERE org_member = {org.id})
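
To see the difference concretely, here is a small sketch using an in-memory SQLite database. The rows, the ids and the 'payment' type are made up for illustration, and SQLite is only a stand-in: other engines may evaluate (a AND b) differently or reject it for integer columns. Bound parameters are used instead of string interpolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org (member_id INTEGER, org_member INTEGER);
CREATE TABLE "transaction" (
    id INTEGER PRIMARY KEY,
    type TEXT,
    from_member_id INTEGER,
    to_member_id INTEGER
);
-- hypothetical data: members 1, 2, 3 belong to org 10; members 8, 9 to org 20
INSERT INTO org VALUES (1, 10), (2, 10), (3, 10), (8, 20), (9, 20);
INSERT INTO "transaction" VALUES
  (1, 'payment', 1, 2),   -- both members in org 10
  (2, 'payment', 1, 9),   -- only the sender in org 10
  (3, 'payment', 8, 9);   -- neither member in org 10
""")

params = {"type": "payment", "org": 10}

# The original form: (a AND b) collapses to a boolean before IN is applied.
buggy = [r[0] for r in conn.execute("""
SELECT id FROM "transaction"
WHERE type = :type
  AND (from_member_id AND to_member_id)
      IN (SELECT member_id FROM org WHERE org_member = :org)
ORDER BY id
""", params)]

# The corrected form: each column is checked against the subquery separately.
fixed = [r[0] for r in conn.execute("""
SELECT id FROM "transaction"
WHERE type = :type
  AND from_member_id IN (SELECT member_id FROM org WHERE org_member = :org)
  AND to_member_id   IN (SELECT member_id FROM org WHERE org_member = :org)
ORDER BY id
""", params)]

print(buggy)  # every row matches, because the boolean 1 happens to be a member id
print(fixed)  # only the transaction between two members of org 10
```

In this sample the buggy form matches all rows only because member id 1 exists in the org; with other ids it could just as well match nothing.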

Join Table A or Table B in a single query based on some condition

A table has trades info.
The values for grd_market_exchange_cd are mapped to different time_zone_name, which can be obtained by the following ways:
WITH time_zone AS (
SELECT DISTINCT
grd.grd_market_exchange_cd,
ex.time_zone_name
FROM mdm_code.exchange_grd_exchange_map mp
JOIN mdm_code.grd_market_exchange grd ON grd.grd_market_exchange_cd = mp.grd_market_exchange_cd
JOIN mdm.market_exchange ex ON ex.market_exchange_cd = mp.market_exchange_cd
WHERE ex.deactivation_dt IS NULL
AND EXISTS (
SELECT 'X'
FROM ae_os_market_exchange e
WHERE e.market_exchange_cd = ex.market_exchange_cd
AND e.jobs_load_id = ( SELECT MAX(jobs_load_id) FROM ae_os_market_exchange )
)
AND ex.time_zone_name IS NOT NULL
)
SELECT
at.trade_id,
at.trade_execution_tmstmp,
at.grd_market_exchange_cd,
tz.time_zone_name
FROM cd_data.allocated_trade at
JOIN time_zone tz ON tz.grd_market_exchange_cd = at.grd_market_exchange_cd
WHERE TRUNC(TRADE_EXECUTION_TMSTMP) = TRUNC(SYSDATE)
AND TRADE_STATUS_CD IN ('EX','TR')
AND trade_id IN (
308983463,
308983465,
308983929,
308983950,
308979467
);
Option 1:
This is the first way. But fetching time_zone_name this way has three problems:
grd_market_exchange_cd is NULL in some cases.
One grd_market_exchange_cd is mapped to 2 or more time zones in some cases.
The grd_market_exchange_cd to time_zone_name mapping is not there for some codes.
We have to pick time_zone_name this way every time, except in the three cases mentioned above. As it stands, I get two rows for the same trade if grd_market_exchange_cd is mapped to 2 time zones, or no row if it is NULL or not mapped.
Option 2:
If somehow I don't get time_zone_name by using grd_market_exchange_cd, then I would have to use different tables to fetch the value, as shown below:
SELECT
at.trade_id,
at.trade_execution_tmstmp,
at.grd_market_exchange_cd,
mme.time_zone_name
FROM cd_data.allocated_trade at
JOIN ae_os_listing_v l ON at.tradable_ent_id = l.cde_tradable_entity_id
JOIN ae_os_market_exchange me ON me.listing_key_id = l.listing_key_id
JOIN mdm.market_exchange mme ON me.market_exchange_cd = mme.market_exchange_cd
WHERE TRUNC(trade_execution_tmstmp) = TRUNC(SYSDATE)
AND at.trade_status_cd IN ('EX','TR')
AND me.jobs_load_id = (SELECT MAX(jobs_load_id) FROM ae_os_market_exchange)
AND me.global_exchange_designation_cd = 'P'
AND trade_id IN (
308983463,
308983465,
308983929,
308983950,
308979467
);
I have to develop a single query merging the above two queries so that I pick time_zone_name via Option 1, if grd_market_exchange_cd is mapped to exactly 1 time_zone_name or via Option 2, if I don't obtain a time_zone_name value via Option 1.
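
One possible shape for such a merged query is sketched below, under heavy simplification. The tables trade, tz_map and fallback_tz are hypothetical stand-ins: tz_map plays the role of the time_zone CTE from Option 1, and fallback_tz plays the role of the join chain from Option 2. The sketch runs on an in-memory SQLite database with made-up rows, but it only uses COUNT(DISTINCT ...), HAVING, LEFT JOIN and COALESCE, which are standard SQL. It does not handle the case where the fallback itself returns several rows per trade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical, simplified stand-ins for the real tables
CREATE TABLE trade (trade_id INTEGER PRIMARY KEY, exchange_cd TEXT, entity_id INTEGER);
CREATE TABLE tz_map (exchange_cd TEXT, time_zone_name TEXT);         -- Option 1 mapping
CREATE TABLE fallback_tz (entity_id INTEGER, time_zone_name TEXT);   -- Option 2 lookup

INSERT INTO trade VALUES
  (1, 'A',  100),   -- code mapped to exactly one time zone
  (2, 'B',  200),   -- code mapped to two time zones
  (3, NULL, 300),   -- code is NULL
  (4, 'C',  400);   -- code has no mapping
INSERT INTO tz_map VALUES
  ('A', 'Europe/London'),
  ('B', 'Asia/Tokyo'),
  ('B', 'Asia/Seoul');
INSERT INTO fallback_tz VALUES
  (100, 'Europe/Dublin'),
  (200, 'Asia/Tokyo'),
  (300, 'America/New_York'),
  (400, 'Europe/Paris');
""")

rows = conn.execute("""
WITH unique_tz AS (
    -- keep only codes that map to exactly one time zone
    SELECT exchange_cd, MIN(time_zone_name) AS time_zone_name
    FROM tz_map
    GROUP BY exchange_cd
    HAVING COUNT(DISTINCT time_zone_name) = 1
)
SELECT t.trade_id,
       -- prefer the Option 1 value, fall back to Option 2 when it is missing
       COALESCE(u.time_zone_name, f.time_zone_name) AS time_zone_name
FROM trade t
LEFT JOIN unique_tz u ON u.exchange_cd = t.exchange_cd
LEFT JOIN fallback_tz f ON f.entity_id = t.entity_id
ORDER BY t.trade_id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOINs keep one row per trade in all three problem cases, and the HAVING clause is what removes ambiguous codes before the join so they cannot duplicate trades.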

how to add OR constraints to exclude query in django?

I have the following exclude that brings back all the books that have had transactions in the past two minutes. I want to add another constraint that says: only if they come from a given series of stores.
In SQL it would be where location_id = 1 or location_id = 2 or location_id = 3
Books have a location_id
How can I apply that to the below query?
transaction_window = timezone.now() + datetime.timedelta(minutes=-2)
ts = Book.objects.exclude(book_id__in = Status.objects
.filter(transaction_time__gte=transaction_window)
.values_list('book_id', flat=True))

You probably just want location__in=(1, 2, 3)

Use a Q object. They're built-in to django and allow more complex lookups.

Going by your code, I assume the Book model has a book_id field, Status also has a book_id field, and both are plain IntegerFields or similar.
from django.db.models import Q
transaction_window = timezone.now() - datetime.timedelta(minutes=2)
# ids of the books that have a transaction inside the window
recent_book_ids = Status.objects.filter(
    transaction_time__gte=transaction_window
).values_list('book_id', flat=True)
# keeps a book unless it is both recent and in one of the stores 1, 2, 3
ts = Book.objects.filter(~Q(book_id__in=recent_book_ids)
                         | ~Q(location_id__in=(1, 2, 3)))
This only holds if my guess about your models is correct.
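
The filter with two negated Q objects relies on De Morgan's law: "not A or not B" selects the same rows as excluding "A and B". The sketch below checks that with plain Python lists instead of Django querysets. The book records, the set of recent ids and the helper names are all made up for illustration, and it says nothing about how the ORM handles NULLs.

```python
# Hypothetical stand-ins for Book rows and for the ids returned by the Status query.
books = [
    {"book_id": 1, "location_id": 1},
    {"book_id": 2, "location_id": 2},
    {"book_id": 3, "location_id": 7},
    {"book_id": 4, "location_id": 3},
]
recent_book_ids = {1, 3}   # books with a transaction inside the window
stores = {1, 2, 3}         # the locations of interest


def is_recent(book):
    return book["book_id"] in recent_book_ids


def in_stores(book):
    return book["location_id"] in stores


# exclude(A and B): drop books that are both recent and in one of the stores
exclude_form = [b["book_id"] for b in books
                if not (is_recent(b) and in_stores(b))]

# filter(~A | ~B): keep books that are not recent or not in one of the stores
or_form = [b["book_id"] for b in books
           if (not is_recent(b)) or (not in_stores(b))]

print(exclude_form)
print(or_form)
```

Both forms drop only book 1, the one that is recent and sits in one of the listed stores.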

Data Modeling of Entity with Attributes

I'm storing some very basic information about "data sources" coming into my application. These data sources can be in the form of a document (e.g. PDF, etc.), audio (e.g. MP3, etc.) or video (e.g. AVI, etc.). Say, for example, I am only interested in the filename of the data source. Thus, I have the following table:
DataSource
Id (PK)
Filename
For each data source, I also need to store some of its attributes. An example for a PDF would be "number of pages", for audio "bit rate", and for video "duration". Each DataSource will have different requirements for the attributes that need to be stored. So, I have modeled "data source attribute" this way:
DataSourceAttribute
Id (PK)
DataSourceId (FK)
Name
Value
Thus, I would have records like these:
DataSource->Id = 1
DataSource->Filename = 'mydoc.pdf'
DataSource->Id = 2
DataSource->Filename = 'mysong.mp3'
DataSource->Id = 3
DataSource->Filename = 'myvideo.avi'
DataSourceAttribute->Id = 1
DataSourceAttribute->DataSourceId = 1
DataSourceAttribute->Name = 'TotalPages'
DataSourceAttribute->Value = '10'
DataSourceAttribute->Id = 2
DataSourceAttribute->DataSourceId = 2
DataSourceAttribute->Name = 'BitRate'
DataSourceAttribute->Value = '16'
DataSourceAttribute->Id = 3
DataSourceAttribute->DataSourceId = 3
DataSourceAttribute->Name = 'Duration'
DataSourceAttribute->Value = '1:32'
My problem is that this doesn't seem to scale. For example, say I need to query for all the PDF documents along with their total number of pages:
Filename, TotalPages
'mydoc.pdf', '10'
'myotherdoc.pdf', '23'
...
The JOINs needed to produce the above result are just too costly. How should I address this problem?

Scaling is one of the most common problems with EAV (Entity-Attribute-Value) data structures. In short, you have to ask for the meta data (i.e. locate the attributes) to get to the data. However, here is a query that you can use to get the data you want:
Select DataSourceId
, Min( Case When Name = 'TotalPages' Then Value End ) As TotalPages
, Min( Case When Name = 'BitRate' Then Value End ) As BitRate
, Min( Case When Name = 'Duration' Then Value End ) As Duration
From DataSourceAttribute
Group By DataSourceId
In order to improve performance, you'll want an index on DataSourceId and perhaps Name as well. To get to the results you posted, you would do:
Select DataSource.FileName
, Min( Case When DataSourceAttribute.Name = 'TotalPages' Then Value End ) As TotalPages
, Min( Case When DataSourceAttribute.Name = 'BitRate' Then Value End ) As BitRate
, Min( Case When DataSourceAttribute.Name = 'Duration' Then Value End ) As Duration
From DataSourceAttribute
Join DataSource
On DataSource.Id = DataSourceAttribute.DataSourceId
Group By DataSource.FileName
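
As a quick check of the pivot above, here is a sketch that loads the three sample records from the question into an in-memory SQLite database and runs the same conditional aggregation. SQLite and the index name ix_attr_source_name are assumptions made for the demonstration; the composite index mirrors the suggestion to index DataSourceId and Name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataSource (Id INTEGER PRIMARY KEY, Filename TEXT);
CREATE TABLE DataSourceAttribute (
    Id INTEGER PRIMARY KEY,
    DataSourceId INTEGER REFERENCES DataSource (Id),
    Name TEXT,
    Value TEXT
);
-- index on the lookup columns, as suggested in the answer
CREATE INDEX ix_attr_source_name ON DataSourceAttribute (DataSourceId, Name);

INSERT INTO DataSource VALUES
  (1, 'mydoc.pdf'), (2, 'mysong.mp3'), (3, 'myvideo.avi');
INSERT INTO DataSourceAttribute VALUES
  (1, 1, 'TotalPages', '10'),
  (2, 2, 'BitRate', '16'),
  (3, 3, 'Duration', '1:32');
""")

# Each CASE yields the value for one attribute name and NULL otherwise;
# MIN ignores NULLs, so every output column holds that attribute or NULL.
rows = conn.execute("""
SELECT ds.Filename,
       MIN(CASE WHEN a.Name = 'TotalPages' THEN a.Value END) AS TotalPages,
       MIN(CASE WHEN a.Name = 'BitRate'    THEN a.Value END) AS BitRate,
       MIN(CASE WHEN a.Name = 'Duration'   THEN a.Value END) AS Duration
FROM DataSourceAttribute a
JOIN DataSource ds ON ds.Id = a.DataSourceId
GROUP BY ds.Filename
ORDER BY ds.Filename
""").fetchall()

for row in rows:
    print(row)
```

Attributes that a data source does not have simply come back as NULL (None in Python), so one query shape serves all three kinds of file.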

It seems like you want something a bit looser than a typical relational DB. This sounds like a good candidate for something like Lucene or MongoDB. Lucene is an index engine which allows any type of document to be stored and indexed. MongoDB sits in the middle between an RDBMS and free-form document storage. JSON in some form or other (MongoDB is a good example) should fit nicely.

This might work, but define "too costly"...
select
datasource.id,
datasource.filename,
d2.id as d2id,
d2.value as totalpages
from datasource
inner join datasourceattribute d2
on datasource.id = d2.datasourceid and d2.name = 'TotalPages'
where datasource.filename like '%.pdf'

Define Column Aliases?

I have a query like this:
SELECT `*`
FROM (`threads` t, `members` m)
WHERE `m`.`id` = t.author
AND `t`.`type` = '0'
AND `t`.`category` = '1'
Basically, there is an ID field in both tables (members and threads), so the results array gets messed up, i.e. there is only one ID field, and it is populated from the members table.
What I need is a prefix in front of each key name in the results so I can distinguish between the two,
i.e. add 't.' to all thread fields and 'm.' to all members fields.
So the results should look like: m.id = x, t.id = y
Instead, the results at the moment look like: id = x (the id field from the threads table is completely overwritten by the one from the members table)

You could do it by naming each column one by one:
SELECT m.`id` AS "m_id", `t`.`id` AS "t_id", t.`username` AS "t_username"
FROM (`threads` t, `members` m)
WHERE `m`.`id` = t.author
AND `t`.`type` = '0'
AND `t`.`category` = '1'

Use:
SELECT m.id AS x,
t.id 'y'
FROM MEMBERS m
JOIN THREADS t ON t.author = m.id
AND t.type = '0'
AND t.category = '1'
Column aliases are defined on a column by column basis - you can't use wildcards/etc.
You can use the AS keyword, or simply enclose the column alias within single quotes if the column name doesn't contain special characters - use double quotes if it does. You can combine single/double quote usage with the AS keyword.
Do I have to List all the Columns?
Yes, you have to list all the columns, unless you are happy with the duplicate column names you get from using m.* or t.*. There is no convention in SQL that supports what you ask. SELECT * is not an ideal practice in any case.
Addendum
I took the liberty of rewriting your query to use ANSI-92 JOIN syntax - your example used ANSI-89. There's no performance difference.

You can set an alias for columns that have the same name:
SELECT t.ID as threadsID, m.ID as membersID, <...> WHERE `m`.`id` = t.author AND `t`.`type` = '0' AND `t`.`category` = '1'

If you just want the ids to be separate, you can do
SELECT *, t.id as t_id, m.id as m_id
FROM (`threads` t, `members` m)
WHERE `m`.`id` = t.author
AND `t`.`type` = '0'
AND `t`.`category` = '1'
As the others said, it's a column alias.
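
The overwriting described in the question, and the effect of aliases, can be reproduced with a short sketch. It uses an in-memory SQLite database rather than MySQL, two made-up rows, and a hypothetical helper fetch_dicts that builds result dictionaries keyed by column name, which is where the collision happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE threads (id INTEGER PRIMARY KEY, author INTEGER, title TEXT);
-- hypothetical sample rows
INSERT INTO members VALUES (3, 'alice');
INSERT INTO threads VALUES (7, 3, 'hello');
""")


def fetch_dicts(sql):
    """Run a query and return each row as a dict keyed by column name."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# SELECT * yields two columns named "id"; the later one overwrites the earlier.
collided = fetch_dicts("""
SELECT * FROM threads t JOIN members m ON m.id = t.author
""")

# Explicit aliases give every column a unique key.
aliased = fetch_dicts("""
SELECT t.id AS t_id, m.id AS m_id, t.title AS t_title, m.username AS m_username
FROM threads t JOIN members m ON m.id = t.author
""")

print(collided[0])
print(aliased[0])
```

In the first result the thread id 7 is lost, because the members id is written under the same key; in the second both ids survive under t_id and m_id.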
