Will data get deleted on dropping internal table using location clause during its creation from hive? - hive

In hive if I create an internal table using the loaction clause (mentioning loaction other than default location of hive) in table creation statement then on dropping that table will it delete the data from the specified location just like it does when the data is in default location of hive?

Yes, it will delete the location even it is not default location of hive also.
Let's assume i'm having test table in default database on /user/yashu/test5 directory.
hive> desc formatted test_tmp;
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | int | |
| name | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| Owner: | shu | NULL |
| CreateTime: | Fri Mar 23 03:42:15 EDT 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://nn1.com/user/yashu/test5 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | numFiles | 1 |
| | totalSize | 12 |
| | transient_lastDdlTime | 1521790935 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
hadoop directory having one .txt file in test 5 directory
bash$ hadoop fs -ls /user/yashu/test5/
Found 1 items
-rw-r--r-- 3 hdfs hdfs 12 2018-03-23 03:42 /user/yashu/test5/test.txt
Hive table data
select * from test_tmp;
+--------------+----------------+--+
| test_tmp.id | test_tmp.name |
+--------------+----------------+--+
| 1 | bar |
| 2 | foo |
+--------------+----------------+--+
once i drop the table in hive then the directory test5 also dropped from hdfs
hive> drop table test_tmp;
bash$ hadoop fs -ls /user/yashu/test5/
ls: `/user/yashu/test5/': No such file or directory
So once we delete the internal table in hive even the hive table is not on default location also drops the directory(location) that the table is pointing to.

Related

Presto datatype Mismatch Issue In Hive ORC Table

I am trying to query my hive orc table by presto ,In Hive its working Fine.In prestro I am able to access all the column except lowrange It's showing Below Erroe
error : Query 20220322_135856_00076_a33ec failed: Error opening Hive split hdfs://.....filename.orc
(offset=0, length=24216): Malformed ORC file. Cannot read SQL type varchar from ORC stream .lowrange
of type LONG [hdfs://.....filename.orc.orc]
I have set below property in presto before starting the query:
set hive1.orc.use-column-names=true
where hive1 is my catalog name.
I have also tried to change Hive tables datatype for this column as Double/BigInt,Int But Nothing Worked.
Can someone help me to resolve the error.
Table Description:
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| lowrange | string | |
| type | string | |
| processed_date | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| type | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | test | NULL |
| Owner: | hdfs | NULL |
| CreateTime: | Tue Mar 22 08:28:49 UTC 2022 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://......../user/hdfs/test/ | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | skip.header.line.count | 1 |
| | transient_lastDdlTime | 1647937729 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
Sample Data:
lowrange type processed_date
1234567890001212 01 20220323
1234567890001213 01 20220323
Table Create Statement:
CREATE EXTERNAL TABLE `table1`(
`lowrange` string,
`processed_date` string)
PARTITIONED BY (
`type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://......./user/hdfs/test'
TBLPROPERTIES (
'skip.header.line.count'='1')
Update: I dropped the existing Table and created new table with datatype from String to BigINt in hive table and able to select data from Table but When I am trying to perform lpad operation its again showing same issue.
Logic I want to apply on Field : lpad(lowrange ,13,'9')
Error: Unexpected parameters (bigint, integer, varchar(1)) for
function lpad. Expected: lpad(varchar(x), bigint, varchar(y))
then I tried to cast bigint to varchar using Below query:
Updated Logic : lpad(cast(lowrange as varchar),13,'9'))
Error:
Malformed ORC file. Cannot read SQL type bigint
from ORC stream .lowrange of type STRING

How to determine the name of an Impala object corresponds to a view

Is there a way in Impala to determine whether an object name returned by SHOW TABLES corresponds to a table or a view since:
this statement only return the object names, without their type
SHOW CREATE VIEW is just an alias for SHOW CREATE TABLE (same result, no view/table distinction)
DESCRIBE does not give any clue about the type of the item
Ideally I'd like to list all the tables + views and their types using a single operation, not one to retrieve the tables + views and then another call for each name to determine the type of the object.
(please note the question is about Impala, not Hive)
You can use describe formatted to know the type of an object
impala-shell> CREATE TABLE table2(
id INT,
name STRING
);
impala-shell> CREATE VIEW view2 AS SELECT * FROM table2;
impala-shell> DESCRIBE FORMATTED table2;
+------------------------------+--------------------------------------------------------------------+----------------------+
| name | type | comment |
+------------------------------+--------------------------------------------------------------------+----------------------+
| Retention: | 0 | NULL |
| Location: | hdfs://quickstart.cloudera:8020/user/hive/warehouse/test.db/table2 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
+------------------------------+--------------------------------------------------------------------+----------------------+
impala-shell> DESCRIBE FORMATTED view2;
+------------------------------+-------------------------------+----------------------+
| name | type | comment |
+------------------------------+-------------------------------+----------------------+
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Table Type: | VIRTUAL_VIEW | NULL |
| Table Parameters: | NULL | NULL |
| | transient_lastDdlTime | 1601632695 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
+------------------------------+-------------------------------+----------------------+
In the case of the table type is Table Type: MANAGED_TABLE and for the view is Table Type: VIRTUAL_VIEW
Other way is querying metastore database (if you can) to know about metadata in Impala(or Hive)
mysql> use metastore;
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-----------+-----------+-------+----------+---------------+------------------------------------------------------------+---------------------------+----------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
+--------+-------------+-------+------------------+-----------+-----------+-------+----------+---------------+------------------------------------------------------------+---------------------------+----------------+
| 9651 | 1601631971 | 9331 | 0 | anonymous | 0 | 27996 | table1 | MANAGED_TABLE | NULL | NULL | NULL |
| 9652 | 1601632121 | 9331 | 0 | anonymous | 0 | 27997 | view1 | VIRTUAL_VIEW | SELECT `table1`.`id`, `table1`.`name` FROM `test`.`table1` | SELECT * FROM table1 | NULL |
| 9653 | 1601632676 | 9331 | 0 | cloudera | 0 | 27998 | table2 | MANAGED_TABLE | NULL | NULL | NULL |
| 9654 | 1601632695 | 9331 | 0 | cloudera | 0 | 27999 | view2 | VIRTUAL_VIEW | SELECT * FROM test.table2 | SELECT * FROM test.table2 | NULL |
+--------+-------------+-------+------------------+-----------+-----------+-------+----------+---------------+------------------------------------------------------------+---------------------------+----------------+

changing default value for one column in SQL 5.7 gives warning about other column

Since SQL 5.7, my customers get more problems. One of the issues I found was following:
When I run a query, I get a message about another column, that is really strange?? who can explain this?
mysql> ALTER TABLE advertisement ALTER COLUMN local_name set default 'x';
ERROR 1067 (42000): Invalid default value for 'end_time'
The table which created this error is following:
mysql> show columns from advertisement;
+----------------+--------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| local_name | varchar(64) | NO | | | |
| chinese_name | varchar(64) | NO | | | |
| image | varchar(128) | NO | | | |
| top1 | varchar(128) | NO | | | |
| center1 | varchar(128) | NO | | | |
| bottom1 | varchar(128) | NO | | | |
| bottom2 | varchar(128) | NO | | | |
| bottom3 | varchar(128) | NO | | | |
| top_colour1 | varchar(16) | NO | | | |
| center_colour1 | varchar(16) | NO | | | |
| bottom_colour1 | varchar(16) | NO | | | |
| bottom_colour2 | varchar(16) | NO | | | |
| bottom_colour3 | varchar(16) | NO | | | |
| start_time | timestamp | NO | | CURRENT_TIMESTAMP | |
| end_time | timestamp | NO | | 0000-00-00 00:00:00 | |
| hour | varchar(64) | NO | | | |
| status | smallint(6) | NO | | 0 | |
+----------------+--------------+------+-----+---------------------+----------------+
That is because of server SQL Mode - NO_ZERO_DATE.
From the reference: NO_ZERO_DATE - In strict mode, don't allow '0000-00-00' as a valid date. You can still insert zero dates with the IGNORE option. When not in strict mode, the date is accepted but a warning is generated.
Documentation link

Why is this SQL generating a temporary table and running so slow?

I have the following SQL generated from my Rails app, it is trying to get a list of all auto models that have live adverts in a marketplace app & from mysql:
SELECT `models`.* FROM `models`
INNER JOIN `autos` ON autos.model_id = models.id
INNER JOIN `ads` ON `ads`.id = `autos`.ad_id
WHERE (ads.ad_status_id = 4 AND pub_start_date < NOW() AND pub_end_date > NOW() AND models.manufacturer_id = 50 )
GROUP BY models.id ORDER BY models.name;
When I run an explain, this is what I get:
Id 1 1 1
Select Type SIMPLE SIMPLE SIMPLE
Table models autos ads
Type ref ref eq_ref
Possible Keys PRIMARY,manufacturer_id model_id,ad_id PRIMARY,quick_search,ad_status_id
Key manufacturer_id model_id PRIMARY
Key Length 5 4 4
Ref const concept_development.models.id concept_development.autos.ad_id
Rows 70 205 1
Extra Using where; Using temporary; Using filesort Using where Using where
I cannot understand why the query is generating a temporary table / using file-sort - all of the referenced keys are indexes. Been trying to figure this out for a few days now and getting nowhere.
Any help is very much appreciated!
EXPLAIN models:
+---------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(32) | YES | | NULL | |
| manufacturer_id | int(11) | YES | MUL | NULL | |
| vehicle_category_id | int(11) | NO | MUL | 1 | |
| synonym_names | longtext | YES | | NULL | |
+---------------------+-------------+------+-----+---------+----------------+
SHOW INDEXES FROM models:
+--------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| models | 0 | PRIMARY | 1 | id | A | 2261 | NULL | NULL | | BTREE | |
| models | 1 | manufacturer_id | 1 | manufacturer_id | A | 205 | NULL | NULL | YES | BTREE | |
| models | 1 | vehicle_category_id | 1 | vehicle_category_id | A | 7 | NULL | NULL | | BTREE | |
+--------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
MODEL TABLE STATUS:
+--------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+--------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| models | MyISAM | 10 | Dynamic | 2261 | 26 | 61000 | 281474976710655 | 84992 | 0 | 2751 | 2010-09-28 18:42:45 | 2010-09-28 18:42:45 | 2010-09-28 18:44:00 | latin1_swedish_ci | NULL | | |
+--------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
EXPLAIN ADS
+------------------+--------------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------------------+------+-----+---------------------+----------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| fp_urn | int(10) | NO | MUL | 0 | |
| user_id | int(10) | NO | MUL | 0 | |
| ad_status_id | int(3) unsigned | NO | MUL | 1 | |
| style_id | int(10) | NO | | 3 | |
| search_tags | varchar(255) | YES | | NULL | |
| title | varchar(255) | NO | | | |
| description | text | YES | | NULL | |
| currency | enum('EUR','GBP') | NO | | EUR | |
| price | decimal(8,2) | NO | MUL | 0.00 | |
| proposal_type | enum('Offered','Wanted') | NO | | Offered | |
| category_id | int(10) | YES | | 0 | |
| contact | varchar(50) | NO | MUL | | |
| area_id | int(10) | NO | | 0 | |
| origin_id | int(10) | NO | | 0 | |
| reject_reason_id | int(3) | NO | | 0 | |
| date_created | timestamp | NO | | 0000-00-00 00:00:00 | |
| last_modified | timestamp | NO | | CURRENT_TIMESTAMP | |
| pub_start_date | datetime | YES | | 0000-00-00 00:00:00 | |
| pub_end_date | datetime | YES | | 0000-00-00 00:00:00 | |
| bumped_up_date | datetime | YES | | 0000-00-00 00:00:00 | |
| state | smallint(6) | YES | | NULL | |
| eproofed | tinyint(1) | NO | | 0 | |
| is_featured | int(1) | NO | | 0 | |
| num_featured_imp | int(10) | YES | | 0 | |
| num_direct_imp | int(10) | YES | | 0 | |
| is_top_listed | int(1) | NO | | 0 | |
| delta | tinyint(1) | NO | | 0 | |
| ext_ref_id | varchar(50) | YES | | NULL | |
| email_seller | tinyint(1) | YES | | 1 | |
| sort_by | int(10) | YES | | 8 | |
| permalink | varchar(500) | YES | | NULL | |
| external_url | varchar(255) | YES | | NULL | |
+------------------+--------------------------+------+-----+---------------------+----------------+
SHOW TABLE STATUS FROM concept_development WHERE NAME LIKE 'ads';
+------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+-------------------------------------------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+-------------------------------------------------+---------+
| ads | InnoDB | 10 | Compact | 656350 | 232 | 152748032 | 0 | 87736320 | 340787200 | 1148382 | 2010-09-29 09:55:46 | NULL | NULL | utf8_general_ci | NULL | checksum=1 delay_key_write=1 row_format=DYNAMIC | |
+------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+-------------------------------------------------+---------+
SHOW INDEXES FROM ADS
+-------+------------+-----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| ads | 0 | PRIMARY | 1 | id | A | 521391 | NULL | NULL | | BTREE | |
| ads | 1 | NewIndex1 | 1 | ad_status_id | A | 15 | NULL | NULL | | BTREE | |
| ads | 1 | NewIndex1 | 2 | pub_end_date | A | 260695 | NULL | NULL | YES | BTREE | |
| ads | 1 | NewIndex1 | 3 | category_id | A | 521391 | NULL | NULL | YES | BTREE | |
| ads | 1 | NewIndex1 | 4 | style_id | A | 521391 | NULL | NULL | | BTREE | |
| ads | 1 | NewIndex2 | 1 | user_id | A | 130347 | NULL | NULL | | BTREE | |
| ads | 1 | NewIndex3 | 1 | price | A | 7667 | NULL | NULL | | BTREE | |
| ads | 1 | contact | 1 | contact | A | 260695 | NULL | NULL | | BTREE | |
| ads | 1 | fp_urn | 1 | fp_urn | A | 521391 | NULL | NULL | | BTREE | |
+-------+------------+-----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
EXPLAIN autos
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----+-------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----+-------------+----------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| ad_id | int(10) | YES | MUL | NULL | |
| style_id | int(10) | YES | MUL | NULL | |
| manufacturer_id | int(10) | NO | MUL | NULL | |
| model_id | int(10) | NO | MUL | NULL | |
| registration | varchar(10) | YES | | NULL | |
| year | int(4) | YES | | NULL | |
| fuel_type | enum('Petrol','Diesel') | NO | | Petrol | |
| colour | varchar(75) | YES | | NULL | |
| mileage | varchar(25) | NO | | Not Entered | |
| mileage_units | enum('mls','kms') | NO | | mls | |
| num_doors | varchar(25) | NO | | Not Entered | |
| num_owners | int(2) | YES | | NULL | |
| engine_size | varchar(10) | YES | | NULL | |
| transmission_type | enum('Manual','Automatic') | NO | | Manual | |
| body_type | enum('Saloon','Hatchback') | NO | | Saloon | |
| condition | varchar(75) | NO | | NA | |
| extra_features | text | YES | | NULL | |
| tax_expiry | varchar(7) | YES | | NULL | |
| nct_expiry | varchar(7) | YES | | NULL | |
| variation | text | YES | | NULL | |
| tax_class | enum('Agricultural','Bus') | NO | | Private | |
| co2 | int(9) | YES | | NULL | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----+-------------+----------------+
SHOW TABLE STATUS FROM concept_development WHERE NAME LIKE 'autos'
+-------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+-------------------------------------------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+-------------------------------------------------+---------+
| autos | InnoDB | 10 | Compact | 196168 | 136 | 26804224 | 0 | 26279936 | 340787200 | 485405 | 2010-09-17 22:09:45 | NULL | NULL | utf8_general_ci | NULL | checksum=1 delay_key_write=1 row_format=DYNAMIC | |
+-------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+-------------------------------------------------+---------+
show indexes from autos;
+-------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| autos | 0 | PRIMARY | 1 | id | A | 294937 | NULL | NULL | | BTREE | |
| autos | 1 | ad_id | 1 | ad_id | A | 294937 | NULL | NULL | YES | BTREE | |
| autos | 1 | style_id | 1 | style_id | A | 10 | NULL | NULL | YES | BTREE | |
| autos | 1 | manufacturer_id | 1 | manufacturer_id | A | 194 | NULL | NULL | | BTREE | |
| autos | 1 | model_id | 1 | model_id | A | 830 | NULL | NULL | | BTREE | |
+-------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
From the MySQL documentation:
Temporary tables can be created under conditions such as these:
* If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
Change all the text columns to varchar. If you need to maintain them as "text", you'll need to snowflake the schema and exclude the description tables from this query.
If any of the columns in any of the tables are text or blob, MySQL automatically creates an on-disk temporary table, rather than an in-memory temporary table. The temporary table itself isn't killing you, it's the fact that it's writing it to the disk.
http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
Some conditions prevent the use of an in-memory temporary table, in which case the server uses an on-disk table instead:
Presence of a BLOB or TEXT column in the table
You have a index on pub_end_date, but not on pub_start_date and your WHERE clause references both.
It looks like it is not using the pub_end_date index, but this could be because it needs to check pub_start_date as well.
This isn't explaining why but how about you rewrite your query to not use a group by? I think you're just joining on those tables to ensure there exists an ad of interest. So how about:
SELECT `models`.*
FROM `models`
WHERE models.manufacturer_id = 50
AND EXISTS ( SELECT * FROM `autos`
INNER JOIN `ads` ON `ads`.id = `autos`.ad_id
WHERE autos.model_id = models.id
AND ads.ad_status_id = 4
AND ads.pub_start_date < NOW()
AND ads.pub_end_date > NOW()
)
ORDER BY models.name;
The performance issues might be related to the group by, in which case this would improve performance.
Maybe it'd look a bit nicer with and NOW() between ads.pub_start_date and ads.pub_end_date, if you're allowed to do that in mysql (and if it works how you want with the edge cases).

Why is my query so slow? Trying to find the null fields on a left join in mysql

EDIT: Updated with suggestions from Bill Karwin below. Still very slow.
I'm trying to write a query that will find all items on an order that are entered to a warehouse that doesn't have a record for that item in that warehouse. As an example, if item XYZ is entered for warehouse A, but warehouse A doesn't actually carry item XYZ, I want the order item to show up in my report.
I'm able to run the query just fine, but it seems to take forever (50 seconds). It seems to be hanging mainly on the "is null" condition I have in the where clause. If I remove the condition with the "is null" and run it, it executes in about 4.8s. Here's my query:
SELECT
saw_order.Wo,
saw_orderitem.Item,
saw_orderitem.Stock,
saw_order.`Status`,
saw_order.`Date`,
saw_orderitem.Warehouse,
saw_stockbalance.Balno,
saw_stockbalance.Stock
FROM
saw_order
Inner Join saw_orderitem ON saw_order.Wo = saw_orderitem.Wo
Inner Join saw_stock ON saw_orderitem.Stock = saw_stock.Stock
Left Join saw_stockbalance ON saw_orderitem.Stock = saw_stockbalance.Stock
AND saw_orderitem.Warehouse = saw_stockbalance.Warehouse
WHERE
saw_order.`Status` Between 3 and 81 and
saw_stockbalance.Stock Is Null
When I explain the query above, I see:
+----+-------------+------------------+--------+------------------------------+---------+---------+-------------------------------------------------------+-------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+------------------------------+---------+---------+-------------------------------------------------------+-------+-------------------------+
| 1 | SIMPLE | saw_stock | index | PRIMARY | PRIMARY | 17 | NULL | 32793 | Using index |
| 1 | SIMPLE | saw_orderitem | ref | PRIMARY,Stock,StockWarehouse | Stock | 17 | saws.saw_stock.Stock | 68 | |
| 1 | SIMPLE | saw_order | eq_ref | PRIMARY,Status | PRIMARY | 4 | saws.saw_orderitem.Wo | 1 | Using where |
| 1 | SIMPLE | saw_stockbalance | ref | Stock,Warehouse | Stock | 20 | saws.saw_orderitem.Stock,saws.saw_orderitem.Warehouse | 1 | Using where; Not exists |
+----+-------------+------------------+--------+------------------------------+---------+---------+-------------------------------------------------------+-------+-------------------------+
I'm pretty sure I have indexes for all of the fields of the respective tables in my joins, but can't figure out how to rewrite the query to make it go faster.
EDIT: Here are the indexes I have set up on my tables:
mysql> show index from saw_order;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| saw_order | 0 | PRIMARY | 1 | Wo | A | 553425 | NULL | NULL | | BTREE | |
| saw_order | 1 | Customer | 1 | Customer | A | 14957 | NULL | NULL | | BTREE | |
| saw_order | 1 | Other | 1 | Other | A | 218 | NULL | NULL | | BTREE | |
| saw_order | 1 | Site | 1 | Site | A | 8 | NULL | NULL | | BTREE | |
| saw_order | 1 | Date | 1 | Date | A | 1594 | NULL | NULL | | BTREE | |
| saw_order | 1 | Status | 1 | Status | A | 15 | NULL | NULL | | BTREE | |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set
mysql> show index from saw_orderitem;
+---------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| saw_orderitem | 0 | PRIMARY | 1 | Wo | A | NULL | NULL | NULL | | BTREE | |
| saw_orderitem | 0 | PRIMARY | 2 | Item | A | 1842359 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | Stock | 1 | Stock | A | 27093 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | Product | 1 | Product | A | 803 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | GGroup | 1 | GGroup | A | 114 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | ShipVia | 1 | ShipVia | A | 218 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | Warehouse | 1 | Warehouse | A | 9 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | StockWarehouse | 1 | Stock | A | 27093 | NULL | NULL | | BTREE | |
| saw_orderitem | 1 | StockWarehouse | 2 | Warehouse | A | 49793 | NULL | NULL | | BTREE | |
+---------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
9 rows in set
mysql> show index from saw_stock;
+-----------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| saw_stock | 0 | PRIMARY | 1 | Stock | A | 32793 | NULL | NULL | | BTREE | |
| saw_stock | 1 | Class | 1 | Class | A | 655 | NULL | NULL | YES | BTREE | |
| saw_stock | 1 | DateFirstReceived | 1 | DateFirstReceived | A | 2732 | NULL | NULL | | BTREE | |
+-----------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set
mysql> show index from saw_stockbalance;
+------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| saw_stockbalance | 0 | PRIMARY | 1 | Balno | A | 146315 | NULL | NULL | | BTREE | |
| saw_stockbalance | 1 | Stock | 1 | Stock | A | 36578 | NULL | NULL | | BTREE | |
| saw_stockbalance | 1 | Stock | 2 | Warehouse | A | 146315 | NULL | NULL | | BTREE | |
| saw_stockbalance | 1 | Warehouse | 1 | Warehouse | A | 11 | NULL | NULL | | BTREE | |
+------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
4 rows in set
Any ideas?
I'd try to make it use a covering index. That is, instead of testing if Balno is null, test if one of the columns in your left outer join conditions is null. E.g. Stock or Warehouse.
You should also define an index over the two columns (Stock, Warehouse) in both tables saw_orderitem and saw_stockbalance.
Try:
SELECT t.wo,
soi.item,
soi.stock,
t.Status,
t.Date,
soi.warehouse,
NULL 'balno', --ssb.Balno,
NULL 'stock', --ssb.Stock
FROM SAW_ORDER t
JOIN SAW_ORDERITEM soi ON soi.wo = t.wo
JOIN SAW_STOCK ss ON ss.stock = soi.stock
WHERE t.status BETWEEN 3 AND 81
AND NOT EXISTS (SELECT NULL
FROM SAW_STOCKBALANCE ssb
WHERE ssb.stock != soi.stock
AND ssb.warehouse = soi.warehouse)
The problem with the query is that you are checking for nulls on a LEFT JOIN...