Replacing a for loop with SQL

I have SQL, for example:
show tables from mydb;
It shows the list of tables:
|table1|
|table2|
|table3|
Then I run a SQL statement for each table, such as "show full columns from table1;":
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| Field    | Type   | Collation | Null | Key | Default | Extra          | Privileges                      | Comment |
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| id       | bigint | NULL      | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| user_id  | bigint | NULL      | NO   | MUL | NULL    |                | select,insert,update,references |         |
| group_id | int    | NULL      | NO   | MUL | NULL    |                | select,insert,update,references |         |
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
So in this case I could use a programming language, something like this (not correct code, just showing the flow):
tables = cmd.execute("show tables from mydb;")
for t in tables:
    cmd.execute(f"show full columns from {t};")
However, is it possible to do this in SQL only?

If you are using MySQL, you can use the system view INFORMATION_SCHEMA.COLUMNS.
It contains the table name and column name (and other details). No loop is required, and you can easily filter by other information, too.
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
If you are using Microsoft SQL Server, you can use the above query as well.
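For example, to reproduce the per-table column listing for every table in one statement, you could filter on the schema from the question (a sketch using MySQL's column names; SQL Server's INFORMATION_SCHEMA.COLUMNS exposes a similar but smaller set of columns):
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, COLLATION_NAME,
       IS_NULLABLE, COLUMN_KEY, COLUMN_DEFAULT, EXTRA
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
ORDER BY TABLE_NAME, ORDINAL_POSITION;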

Related

Hive - show a table's column details only

I have created a Hive partitioned table, and when I run describe on it I see other table properties as well as the table column details. If I want to see only the table column details, what command can I use?
create table t1 (x int, y int, s string) partitioned by (z date) stored as sequencefile;
describe t1;
+--------------------------+-----------+---------+
| col_name                 | data_type | comment |
+--------------------------+-----------+---------+
| x                        | int       |         |
| y                        | int       |         |
| s                        | string    |         |
| z                        | date      |         |
|                          | NULL      | NULL    |
| # Partition Information  | NULL      | NULL    |
| # col_name               | data_type | comment |
|                          | NULL      | NULL    |
| z                        | date      |         |
+--------------------------+-----------+---------+
Can the last 5 rows be avoided?
|                          | NULL      | NULL    |
| # Partition Information  | NULL      | NULL    |
| # col_name               | data_type | comment |
|                          | NULL      | NULL    |
| z                        | date      |         |
Also, what does the NULL | NULL row mean?
What you're looking for is this configuration parameter:
set hive.display.partition.cols.separately=false
From the Hive documentation:
In Hive 0.10.0 and earlier, no distinction is made between partition columns and non-partition columns while displaying columns for DESCRIBE TABLE. From Hive 0.12.0 onwards, they are displayed separately.
In Hive 0.13.0 and later, the configuration parameter hive.display.partition.cols.separately lets you use the old behavior, if desired (HIVE-6689). For an example, see the test case in the patch for HIVE-6689.
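For example (a sketch; the exact output formatting depends on your Hive/Beeline version), running describe again after setting the parameter should list only the column rows, including the partition column z, without the "# Partition Information" section:
set hive.display.partition.cols.separately=false;
describe t1;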

Are there problems with this 'Soft Delete' solution using EAV tables?

I've read some information about the ugly side of just setting a deleted_at field in your tables to signify a row has been deleted.
Namely
http://richarddingwall.name/2009/11/20/the-trouble-with-soft-delete/
Are there any potential problems with taking a row from a table you want to delete and pivoting it into some EAV tables?
For instance, let's say I have two tables, deleted and deleted_rows, respectively described as follows:
mysql> describe deleted;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| tablename  | varchar(255) | YES  |     | NULL    |                |
| deleted_at | timestamp    | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
mysql> describe deleted_rows;
+--------+--------------+------+-----+---------+----------------+
| Field  | Type         | Null | Key | Default | Extra          |
+--------+--------------+------+-----+---------+----------------+
| id     | int(11)      | NO   | PRI | NULL    | auto_increment |
| entity | int(11)      | YES  | MUL | NULL    |                |
| name   | varchar(255) | YES  |     | NULL    |                |
| value  | blob         | YES  |     | NULL    |                |
+--------+--------------+------+-----+---------+----------------+
Now, when you want to delete a row from any table, you would delete it from that table and then insert it into these tables, like so:
deleted
+----+-----------+---------------------+
| id | tablename | deleted_at          |
+----+-----------+---------------------+
| 1  | products  | 2011-03-23 00:00:00 |
+----+-----------+---------------------+
deleted_rows
+----+--------+-------------+-------------------------------+
| id | entity | name        | value                         |
+----+--------+-------------+-------------------------------+
| 1  | 1      | Title       | A Great Product               |
| 2  | 1      | Price       | 55.00                         |
| 3  | 1      | Description | You guessed it... it's great. |
+----+--------+-------------+-------------------------------+
A few things I see off the bat:
- You'll need to use application logic to do the pivot (Ruby, PHP, Python, etc.)
- The table could grow pretty big because I'm using blob to handle the unknown size of the row value
Do you see any other glaring problems with this type of soft delete?
Why not mirror your tables with archive tables?
create table mytable(
   col_1 int
  ,col_2 varchar(100)
  ,col_3 date
  ,primary key(col_1)
);

create table mytable_deleted(
   delete_id int not null auto_increment
  ,delete_dtm datetime not null
  -- All of the original columns
  ,col_1 int
  ,col_2 varchar(100)
  ,col_3 date
  ,index(col_1)
  ,primary key(delete_id)
);
Then simply add on-delete triggers to your tables that insert the current row into the mirrored table before the deletion. That would give you a dead-simple and very performant solution.
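A minimal sketch of such a trigger for the tables above (MySQL syntax; the trigger name is illustrative, and delete_id fills in via auto_increment):
create trigger mytable_before_delete
before delete on mytable
for each row
  insert into mytable_deleted (delete_dtm, col_1, col_2, col_3)
  values (now(), old.col_1, old.col_2, old.col_3);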
You could actually generate the tables and trigger code using the data dictionary.
Note that you might not want a unique index on the original primary key (col_1) in the archive table, because you may actually end up deleting the same row twice over time if you are using natural keys. Unless you plan to hook up the archive tables in your application (for undo purposes) you can drop the index entirely. Also, I added the time of deletion (delete_dtm) and a surrogate key that can be used to delete the deleted (hehe) rows.
You may also consider range partitioning the archive table on delete_dtm. This makes it pretty much effortless to purge data from the tables.
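A sketch of that idea (partition names are illustrative; note that MySQL requires the partitioning column to be part of every unique key, so the primary key has to be extended first):
alter table mytable_deleted
  drop primary key,
  add primary key (delete_id, delete_dtm);

alter table mytable_deleted
  partition by range (to_days(delete_dtm)) (
    partition p2011q1 values less than (to_days('2011-04-01')),
    partition pmax values less than maxvalue
  );

-- purging a whole range of archived rows is then just:
alter table mytable_deleted drop partition p2011q1;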

Rewriting this subquery?

I am trying to build a new table containing the values from the existing table that are NOT contained in another table (obviously the query below checks for containment instead). Following is my table structure:
mysql> explain t1;
+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| id        | int(11)             | YES  |     | NULL    |       |
| point     | bigint(20) unsigned | NO   | MUL | 0       |       |
+-----------+---------------------+------+-----+---------+-------+
mysql> explain whitelist;
+-------------+---------------------+------+-----+---------+----------------+
| Field       | Type                | Null | Key | Default | Extra          |
+-------------+---------------------+------+-----+---------+----------------+
| id          | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| x           | bigint(20) unsigned | YES  |     | NULL    |                |
| y           | bigint(20) unsigned | YES  |     | NULL    |                |
| geonetwork  | linestring          | NO   | MUL | NULL    |                |
+-------------+---------------------+------+-----+---------+----------------+
My query looks like this:
SELECT point
FROM t1
WHERE EXISTS(SELECT source
FROM whitelist
WHERE MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))));
Explain:
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
| id | select_type        | table              | type  | possible_keys     | key       | key_len | ref  | rows | Extra                    |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
| 1  | PRIMARY            | t1                 | index | NULL              | point     | 8       | NULL | 1001 | Using where; Using index |
| 2  | DEPENDENT SUBQUERY | whitelist          | ALL   | _geonetwork       | NULL      | NULL    | NULL | 3257 | Using where              |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
The query is taking 6 seconds to execute for 1000 records in t1 which is unacceptable for me. How can I rewrite this query using Joins (or perhaps a faster way if that exists) if I don't have a column to join on? Even a stored procedure is acceptable I guess in the worst case. My goal is to finally create a new table containing entries from t1. Any suggestions?
Unless the query optimizer is failing, a WHERE EXISTS construct should result in the same plan as a join with a GROUP BY clause. Look at optimizing MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))), as that's probably where your query is spending all its time. I don't have a suggestion for that, but here's your query written with a JOIN:
Select t1.point
from t1
join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
group by t1.point
;
or to get the points in t1 not in whitelist:
Select t1.point
from t1
left join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
where whitelist.id is null
;
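Since the stated goal is to build a new table from those rows, the anti-join version can feed a CREATE TABLE ... SELECT directly (a sketch; the new table name is illustrative):
create table t1_not_whitelisted as
select t1.point
from t1
left join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
where whitelist.id is null;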
This seems like a case where de-normalizing t1 might be beneficial. Adding a precomputed geometry column holding GeomFromText(CONCAT('POINT(', t1.point, ' 0)')) could speed up the query you already have.
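A minimal sketch of that denormalization (the column name point_geom is illustrative):
alter table t1 add column point_geom point;

update t1
set point_geom = GeomFromText(CONCAT('POINT(', point, ' 0)'));

The EXISTS or JOIN condition can then reference t1.point_geom directly instead of rebuilding the geometry for every row compared.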

Create a summary result with one query

I have a table with the following format.
mysql> describe unit_characteristics;
+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| id                   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| uut_id               | int(10) unsigned | NO   | PRI | NULL    |                |
| uut_sn               | varchar(45)      | NO   |     | NULL    |                |
| characteristic_name  | varchar(80)      | NO   | PRI | NULL    |                |
| characteristic_value | text             | NO   |     | NULL    |                |
| creation_time        | datetime         | NO   |     | NULL    |                |
| last_modified_time   | datetime         | NO   |     | NULL    |                |
+----------------------+------------------+------+-----+---------+----------------+
Each uut_sn has multiple characteristic_name/value pairs. I want to use MySQL to generate a table like this:
+--------+-------------+-------------+-------------+-------------+-----+
| uut_sn | char_name_1 | char_name_2 | char_name_3 | char_name_4 | ... |
+--------+-------------+-------------+-------------+-------------+-----+
| 00000  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
| 00001  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
| 00002  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
| .....  | char_val_1  | char_val_2  | char_val_3  | char_val_4  | ... |
+--------+-------------+-------------+-------------+-------------+-----+
Is this possible with just one query?
Thanks,
-peter
This is a standard pivot query:
SELECT uc.uut_sn,
       MAX(CASE
             WHEN uc.characteristic_name = 'char_name_1' THEN uc.characteristic_value
             ELSE NULL
           END) AS char_name_1,
       MAX(CASE
             WHEN uc.characteristic_name = 'char_name_2' THEN uc.characteristic_value
             ELSE NULL
           END) AS char_name_2,
       MAX(CASE
             WHEN uc.characteristic_name = 'char_name_3' THEN uc.characteristic_value
             ELSE NULL
           END) AS char_name_3
  FROM unit_characteristics uc
 GROUP BY uc.uut_sn
To make it dynamic, you need to use MySQL's dynamic SQL syntax, called prepared statements. It requires two queries: the first gets the list of characteristic_name values, so you can concatenate the appropriate strings into CASE expressions like the ones in my example and run the generated statement as the ultimate query.
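A minimal sketch of that two-step approach (assuming the characteristic names are safe to embed in SQL; group_concat_max_len may need to be raised if there are many of them):
SET @sql = NULL;

SELECT GROUP_CONCAT(DISTINCT
         CONCAT('MAX(CASE WHEN characteristic_name = ''', characteristic_name,
                ''' THEN characteristic_value END) AS `', characteristic_name, '`'))
  INTO @sql
  FROM unit_characteristics;

SET @sql = CONCAT('SELECT uut_sn, ', @sql, ' FROM unit_characteristics GROUP BY uut_sn');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;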
You're using the EAV antipattern. There's no way to automatically generate the pivot table you describe without hardcoding the characteristics you want to include. As @OMG Ponies mentions, you need to use dynamic SQL to generate the query in a custom fashion for the set of characteristics you want to include in the result.
Instead, I recommend you fetch the characteristics one per row, as they are stored in the database, and if you want an application object to represent a single UUT with all its characteristics, you write code to loop over the rows as you fetch them in your application, collecting them into objects.
For example in PHP:
$sql = "SELECT uut_sn, characteristic_name, characteristic_value
FROM unit_characteristics";
$stmt = $pdo->query($sql);
$objects = array();
while ($row = $stmt->fetch()) {
if (!isset($objects[ $row["uut_sn"] ])) {
$object[ $row["uut_sn"] ] = new Uut();
}
$objects[ $row["uut_sn"] ]->$row["characteristic_name"]
= $row["characterstic_value"];
}
This has a few advantages over the solution of hardcoding characteristic names in your query:
This solution takes only one SQL query instead of two.
No complex code is needed to build your dynamic SQL query.
If you forget one of the characteristics, this solution automatically finds it anyway.
GROUP BY in MySQL is often slow, and this avoids the GROUP BY.

SQL LIKE question

I was wondering if there's a drawback (other than bad practice) to using something like this
SELECT * FROM my_table WHERE id LIKE '1';
where id is an integer. I know you're supposed to use id=1, but I am writing a Java program, and if everything can use LIKE it'll be a lot easier for me. Also, so far, everything works fine; I get the correct query results, so if there is no drawback I will continue doing it like this.
edit: I am using MySQL.
MySQL will allow it, but will ignore the index:
mysql> describe METADATA_44;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| AtextId | int(11)      | NO   | PRI | NULL    |       |
| num     | varchar(128) | YES  |     | NULL    |       |
| title   | varchar(128) | YES  |     | NULL    |       |
| file    | varchar(128) | YES  |     | NULL    |       |
| context | varchar(128) | YES  |     | NULL    |       |
| source  | varchar(128) | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
mysql> explain select * from METADATA_44 where Atextid like '7';
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| 1  | SIMPLE      | METADATA_44 | ALL  | PRIMARY       | NULL | NULL    | NULL | 591  | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
mysql> explain select * from METADATA_44 where Atextid=7;
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
| 1  | SIMPLE      | METADATA_44 | const | PRIMARY       | PRIMARY | 4       | const | 1    |       |
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)
You'd need to look at the query execution plan on your RDBMS to verify that LIKE with no wildcards is treated as efficiently as an = would be. A quick test in SQL Server shows that it would give you an index scan rather than a seek, so I guess it doesn't take that into account when generating the plan; for SQL Server, using = would be much more efficient. I don't have a MySQL install to test against.
Edit: Just to update this, SQL Server seems to handle it fine and does a seek when the data type is varchar. When it is run against an int column, though, you get the scan. This is because it does an implicit conversion to varchar on the int column, so it can't use the index.
You are better off writing your query as
SELECT * FROM my_table WHERE id = 1;
otherwise MySQL has to do a type conversion between the string '1' and the integer column id, so obviously there is a small performance penalty.
When you know the type of the column, supply the value according to that type.
Speed. [15-char filler as there's not much more to say]
Without using any wildcards, LIKE should be fine for your needs if speed/efficiency is not something you're worried about.