Problem
(This is for an open-source analytics library.)
Here are our query results from events_view:
  id  | visit_id |  name  |           prop0            | prop1 |    url
------+----------+--------+----------------------------+-------+------------
 2004 |        4 | Magnus | 2021-10-26 02:25:55.790999 |   142 | cnn.com
 2007 |        4 | Hartis | 2021-10-26 02:26:37.773999 |    25 | fox.com
Currently all columns except id are VARCHAR.
Column | Type | Collation | Nullable | Default
----------+-------------------+-----------+----------+---------
id | bigint | | |
visit_id | character varying | | |
name | character varying | | |
prop0 | character varying | | |
prop1 | character varying | | |
url | character varying | | |
They should be something like
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+---------
id | bigint | | |
visit_id | bigint | | |
name | character varying | | |
prop0 | time without time zone | | |
prop1 | bigint | | |
url | character varying | | |
Desired result
Hardcoding these castings as in SELECT visit_id::bigint, name::varchar, prop0::time, prop1::integer, url::varchar FROM tbl won't do, because the column names are known only at run time.
To simplify things we could cast each column into only three types: boolean, numeric, or varchar. Use regexps below for matching types:
boolean: ^(true|false|t|f)$
numeric: ^-?[0-9]+(\.[0-9]+)?$
varchar: every result that does not match boolean and numeric above
What should the SQL be that discovers what type each column is and dynamically casts them?
These are a few ideas rather than a complete solution for this tricky job. Instead of regular expressions, a slow but very reliable function can be used:
create or replace function can_cast(s text, vtype text)
returns boolean language plpgsql immutable as
$body$
begin
execute format('select %L::%s', s, vtype);
return true;
exception when others then
return false;
end;
$body$;
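For example (a quick check of the intended behaviour, using values from the question):

select can_cast('142', 'bigint');                        -- true
select can_cast('cnn.com', 'bigint');                     -- false
select can_cast('2021-10-26 02:25:55.790999', 'time');    -- true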
Data may be presented like this (a partial list of columns from your example):
create or replace temporary view tv(id, visit_id, prop0, prop1) as
values
(
2004::bigint,
4::bigint,
case when can_cast('2021-10-26 02:25:55.790999', 'time') then '2021-10-26 02:25:55.790999'::time end,
case when can_cast('142', 'bigint') then '142'::bigint end
), -- determine the types
(2007, 4, '2021-10-26 02:26:37.773999', 25)
-- ... the rest of the data here
;
I believe that it is possible to generate the temporary view DDL dynamically as a select from events_view too.
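A rough sketch of that idea, assuming the can_cast() function above and probing only bigint and time as non-varchar candidates (the view name events_view comes from the question; the rest is illustrative and untested against your data):

do $$
declare
    col    text;
    cand   text;
    picked text;
    ok     boolean;
    parts  text[] := '{}';
begin
    for col in
        select column_name
        from information_schema.columns
        where table_name = 'events_view'
        order by ordinal_position
    loop
        picked := 'varchar';
        -- try the candidate types in order, falling back to varchar
        foreach cand in array array['bigint', 'time'] loop
            execute format(
                'select bool_and(can_cast(%I::text, %L)) from events_view',
                col, cand)
            into ok;
            if ok then
                picked := cand;
                exit;
            end if;
        end loop;
        parts := parts || format('%I::%s as %I', col, picked, col);
    end loop;
    raise notice 'create or replace temp view tv as select % from events_view',
        array_to_string(parts, ', ');
end $$;

The notice only prints the generated DDL; executing the assembled string would create the view.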
Related
I have SQL, for example:
show tables from mydb;
It shows the list of tables:
|table1|
|table2|
|table3|
Then, I run a SQL statement for each table,
such as "show full columns from table1;"
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| Field    | Type   | Collation | Null | Key | Default | Extra          | Privileges                      | Comment |
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| id       | bigint | NULL      | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| user_id  | bigint | NULL      | NO   | MUL | NULL    |                | select,insert,update,references |         |
| group_id | int    | NULL      | NO   | MUL | NULL    |                | select,insert,update,references |         |
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
So in this case I can use a programming language, something like this (this is not correct code, just showing the flow):
tables = "show tables from mydb;"
for t in tables:
cmd.execute("show full columns from {t} ;")
However is it possible to do this in sql only?
If you are using MySQL you can use the INFORMATION_SCHEMA system views.
They contain the table name and column name (and other details). No loop is required, and you can easily filter by other information, too.
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
If you are using Microsoft SQL Server, you can use the same query.
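For example, restricted to one schema and a few useful columns (in MySQL the schema is the database name, e.g. mydb from the question; in SQL Server it would typically be dbo):

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
ORDER BY TABLE_NAME, ORDINAL_POSITION;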
I am trying to visualise data in Grafana from TimescaleDB with the following query:
SELECT $__timeGroup(timestamp,'30m'), sum(error) as Error
FROM userCounts
WHERE serviceid IN ($Service) AND ciclusterid IN ($CiClusterId)
AND environment IN ($environment) AND filterid IN ($filterId)
AND $__timeFilter("timestamp")
GROUP BY timestamp;
However, it gives an error and no data shows when I add the filterid IN ($filterId) part.
I have checked the variable names a thousand times but I am not sure what the error is. Logically, if the filters for the other variables are working, it should work here too. Not sure what is going wrong. Can anyone give input?
Edit:
The schema is like
timestamp   | timestamp without time zone | | not null |
measurement | character varying(150)      | |          |
filterid    | character varying(150)      | |          |
environment | character varying(150)      | |          |
iscanary    | boolean                     | |          |
servicename | character varying(150)      | |          |
serviceid   | character varying(150)      | |          |
ciclusterid | character varying(150)      | |          |
--more--
In Grafana, it is giving the error
pq: column "in_orgs_that_have_had_an_operational_connector" does not exist
This happens when filterId = IN_ORGS_THAT_HAVE_HAD_AN_OPERATIONAL_CONNECTOR is selected. It is a value and not a column, so I am not sure why the error mentions a column; also, the error shows it in lower case while the value is in upper case.
I have the following tables and want to create a table card where the column fee is calculated as product_price.price + product_sale.b_rate
and comm is computed as product_price.price - product_sale.s_rate.
Is there a way to auto-populate the table card when the tables product_sale and product_price are populated,
such as creating the table card with a computed column or a function?
I need help.
product_sale
+---------+-------------+----+
| column | type | pk |
+---------+-------------+----+
| s_date | date | Y |
| seller | varchar(50) | Y |
| country | varchar(50) | Y |
| b_rate | float | |
| s_rate | float | |
+---------+-------------+----+
product_price
+---------+-------------+----+
| column | type | pk |
+---------+-------------+----+
| p_date | date | Y |
| product | varchar(50) | Y |
| price | varchar(50) | |
+---------+-------------+----+
card
+---------+-------------+----+
| column | type | pk |
+---------+-------------+----+
| c_date | date | Y |
| seller | varchar(50) | Y |
| country | varchar(50) | Y |
| fee | float | |
| comm | float | |
+---------+-------------+----+
You can have "pseudo" virtual columns, not bona fide real ones. PostgreSQL can emulate virtual columns as functions on a table (not persisted), However, There are several limitations:
Virtual columns can only depend on values on the same row of the same table and/or generic database functions, not other rows of other tables.
They are not listed automatically when using * on a select. See Emulating Virtual Columns.
They cannot be indexed. Meaning, if you want to search by them the search is less than optimal.
Now, I guess the first item is a deal breaker for you. This is not what you need.
I've used this trick quite successfully several times already.
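For illustration only, here is what the trick looks like on a single table. The function name spread is made up and uses the product_sale table from the question; note that it cannot reach into product_price, which is exactly the first limitation above:

-- a function whose only parameter is the table's row type can be called
-- with column syntax, so it behaves like a read-only "virtual column"
create function spread(product_sale) returns float
    language sql immutable as
$$ select $1.b_rate - $1.s_rate $$;

-- not included in select *, but selectable explicitly
select ps.*, ps.spread from product_sale ps;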
I have a table with the following format.
mysql> describe unit_characteristics;
+----------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| uut_id | int(10) unsigned | NO | PRI | NULL | |
| uut_sn | varchar(45) | NO | | NULL | |
| characteristic_name | varchar(80) | NO | PRI | NULL | |
| characteristic_value | text | NO | | NULL | |
| creation_time | datetime | NO | | NULL | |
| last_modified_time | datetime | NO | | NULL | |
+----------------------+------------------+------+-----+---------+----------------+
Each uut_sn has multiple characteristic_name/value pairs. I want to use MySQL to generate a table like this:
+--------+-------------+-------------+-------------+-------------+-----+
| uut_sn | char_name_1 | char_name_2 | char_name_3 | char_name_4 | ... |
+--------+-------------+-------------+-------------+-------------+-----+
| 00000  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
| 00001  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
| 00002  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
| .....  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
+--------+-------------+-------------+-------------+-------------+-----+
Is this possible with just one query?
Thanks,
-peter
This is a standard pivot query:
SELECT uc.uut_sn,
MAX(CASE
WHEN uc.characteristic_name = 'char_name_1' THEN uc.characteristic_value
ELSE NULL
END) AS char_name_1,
MAX(CASE
WHEN uc.characteristic_name = 'char_name_2' THEN uc.characteristic_value
ELSE NULL
END) AS char_name_2,
MAX(CASE
WHEN uc.characteristic_name = 'char_name_3' THEN uc.characteristic_value
ELSE NULL
END) AS char_name_3
FROM unit_characteristics uc
GROUP BY uc.uut_sn
To make it dynamic, you need to use MySQL's dynamic SQL feature, prepared statements. It requires two queries: the first gets the list of characteristic_name values, which you then concatenate into CASE expressions like the ones in my example to build the final query.
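A sketch of that two-step approach (table and column names come from the question; the GROUP_CONCAT/PREPARE pattern is standard, but treat the quoting details as untested against your data):

-- the default group_concat_max_len of 1024 may truncate a long column list
SET SESSION group_concat_max_len = 100000;

-- step 1: build the list of MAX(CASE ...) expressions from the data
SELECT GROUP_CONCAT(DISTINCT
         CONCAT('MAX(CASE WHEN characteristic_name = ''', characteristic_name,
                ''' THEN characteristic_value END) AS `', characteristic_name, '`'))
  INTO @sql
  FROM unit_characteristics;

-- step 2: assemble and run the pivot query
SET @sql = CONCAT('SELECT uut_sn, ', @sql,
                  ' FROM unit_characteristics GROUP BY uut_sn');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;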
You're using the EAV antipattern. There's no way to automatically generate the pivot table you describe without hardcoding the characteristics you want to include. As @OMG Ponies mentions, you need to use dynamic SQL to generate the query in a custom fashion for the set of characteristics you want to include in the result.
Instead, I recommend you fetch the characteristics one per row, as they are stored in the database. If you want an application object to represent a single UUT with all its characteristics, write code that loops over the rows as you fetch them in your application, collecting them into objects.
For example in PHP:
$sql = "SELECT uut_sn, characteristic_name, characteristic_value
FROM unit_characteristics";
$stmt = $pdo->query($sql);
$objects = array();
while ($row = $stmt->fetch()) {
    if (!isset($objects[ $row["uut_sn"] ])) {
        $objects[ $row["uut_sn"] ] = new Uut();
    }
    $objects[ $row["uut_sn"] ]->{$row["characteristic_name"]}
        = $row["characteristic_value"];
}
This has a few advantages over the solution of hardcoding characteristic names in your query:
This solution takes only one SQL query instead of two.
No complex code is needed to build your dynamic SQL query.
If you forget one of the characteristics, this solution automatically finds it anyway.
GROUP BY in MySQL is often slow, and this avoids the GROUP BY.
I was wondering if there's a drawback (other than bad practice) to using something like this
SELECT * FROM my_table WHERE id LIKE '1';
where id is an integer. I know you're supposed to use id=1 but I am writing a java program and if everything can use LIKE it'll be a lot easier for me. Also, so far, everything works fine; I get the correct query results, so if there is no drawback I will continue doing it like this.
edit: I am using MySQL.
MySQL will allow it, but will ignore the index:
mysql> describe METADATA_44;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| AtextId | int(11)      | NO   | PRI | NULL    |       |
| num     | varchar(128) | YES  |     | NULL    |       |
| title   | varchar(128) | YES  |     | NULL    |       |
| file    | varchar(128) | YES  |     | NULL    |       |
| context | varchar(128) | YES  |     | NULL    |       |
| source  | varchar(128) | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
mysql> explain select * from METADATA_44 where Atextid like '7';
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | METADATA_44 | ALL  | PRIMARY       | NULL | NULL    | NULL |  591 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
mysql> explain select * from METADATA_44 where Atextid=7;
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | METADATA_44 | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)
You'd need to look at the query execution plan on your RDBMS to verify that LIKE with no wildcards is treated as efficiently as an = would be. A quick test in SQL Server shows that it gives you an index scan rather than a seek, so I guess it doesn't take this into account when generating the plan, and for SQL Server using = would be much more efficient. I don't have a MySQL install to test against.
Edit: Just to update this, SQL Server seems to handle it fine and does a seek when the data type is varchar. When it is run against an int column, though, you get the scan. This is because it does an implicit conversion of the int column to varchar, so it can't use the index.
You are better off writing your query as
SELECT * FROM my_table WHERE id = 1;
otherwise MySQL will have to typecast '1' to int, which is the type of the column id,
so obviously there is a small performance penalty. When you know the type of the column, supply the value according to that type.
Speed. [15-char filler as there's not much more to say]
Without using any wildcards with LIKE, it should be fine for your needs if speed/efficiency is not something you are concerned about.