Understanding the precise difference in how SQL treats temp tables vs inline views - sql

I know similar questions have been asked, but I will try to explain why they haven't answered my exact confusion.
To clarify, I am a complete beginner to SQL so bear with me if this is an obvious question.
Despite being a beginner, I have been fortunate enough to be given a role doing some data science, and I was recently doing some work where I wrote a query that self-joined a table, then used an inline view on the result, which I then selected from. I can include the exact code if necessary, but I don't think it is needed for the question; a rough sketch of its shape is below.
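Roughly (with made-up table and column names, not the real query):

SELECT t.customer_id, COUNT(*) AS repeat_orders
FROM (
    SELECT a.customer_id, a.order_id
    FROM orders AS a
    JOIN orders AS b
        ON b.customer_id = a.customer_id
        AND b.order_id <> a.order_id    -- the self-join
) AS t                                  -- the "inline view" (derived table)
GROUP BY t.customer_id;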
After running this, the admin emailed me and asked me to please stop, since it was creating very large temp tables. That was all sorted out and he helped me write it more efficiently, but it left me very confused.
My understanding was that temp tables are specifically created by a statement like
SELECT INTO #temp1
I was simply using a nested SELECT statement. Other questions on here seem to confirm that temp tables are something different, for example the question linked here, along with many others.
In fact I don't even have privileges to create new tables, so what am I misunderstanding? Was he using "temp tables" differently from the standard use, or do inline views create the same temp tables?
From what I can gather, the only explanation I can think of is that genuine temp tables are physical tables in the database, while inline views just store an array in RAM rather than in the actual database. Is my understanding correct?

There are two kinds of temporary tables in MariaDB/MySQL:
Temporary tables created via SQL
CREATE TEMPORARY TABLE t1 (a int)
This creates a temporary table t1 that is only available in the current session and is removed automatically when the session ends. A typical use case is tests in which you don't want to clean everything up at the end.
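For example (using the table t1 from above):

CREATE TEMPORARY TABLE t1 (a int);
INSERT INTO t1 VALUES (1), (2);
SELECT * FROM t1;   -- visible only within this session/connection
-- when the session ends (or after DROP TEMPORARY TABLE t1), the table is gone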
Temporary tables/files created by server
If memory is too low (or the data set is too large), if suitable indexes are not available, and so on, the database server needs to create internal temporary tables or files for sorting, for collecting results from subqueries, etc. Such temporary files are an indicator that your database design and/or your statements should be optimized: disk access is much slower than memory access and unnecessarily wastes resources.
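You can check how often the server had to create them with the status counters, for example:

SHOW GLOBAL STATUS LIKE 'Created_tmp%';
-- Created_tmp_tables      : internal temporary tables created in memory
-- Created_tmp_disk_tables : internal temporary tables that spilled to disk
-- Created_tmp_files       : temporary files created by the server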
A typical example that forces an internal temporary table is a simple GROUP BY on a column which is not indexed (see the information displayed in the "Extra" column):
MariaDB [test]> explain select first_name from test group by first_name;
+------+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                           |
+------+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
|    1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL | 4785970 | Using temporary; Using filesort |
+------+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
1 row in set (0.000 sec)
The same statement with an index doesn't need to create a temporary table:
MariaDB [test]> alter table test add index(first_name);
Query OK, 0 rows affected (7.571 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [test]> explain select first_name from test group by first_name;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | test  | range | NULL          | first_name | 58      | NULL | 2553 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+

Related

Is it more efficient to query a view from a stored procedure or include the table join in the stored procedure?

I have a database structure in MySQL similar to Instagram's, where I have a table containing paths to pictures in a file system and a table containing user information, as follows:
Users:
ID | userName  | age | gender
---|-----------|-----|-------
 1 | MrBanana  | 15  | 0
 2 | BobTheMan | 21  | 0
 3 | TheBest   | 19  | 1
 4 | MsTest    | 24  | 1
Pictures:
ID | Path      | userID
---|-----------|--------
 1 | www.test1 | 2
 2 | www.test2 | 4
 3 | www.test3 | 3
 4 | www.test4 | 2
Now the requirement is that whenever a picture is called up, it should include the userName and ID of the user. So the first idea I had was to create a view that joins the two tables, so that a picture also has the user name and ID attached to it, and then query the pictures out of that view. The query would be placed in a stored procedure. My question is whether this is efficient, or whether it would be more efficient to do the query and the join in one statement and put that into the stored procedure.
My concern is that if I use the view approach, every time it queries the view it will have to first join the entirety of the two tables, and if these tables become very big this would be a very time-consuming process. So my thinking is that a stored procedure that first finds all the needed pictures and then joins the users to them would be more efficient.
I am not sure whether I am understanding this correctly, and would like to ask which approach is better and would scale more effectively.
Not sure which RDBMS you are using, but from my experience with SQL Server (and I guess the other vendors do the same), an ordinary view will use the indexes of the tables included in the view query just as if you were running that query outside the view.
So if you are worried about whether your vwPicturesWithUser would use the index of the Pictures table when you query for the picture with ID=3, the answer is yes (well, I guess somebody could come up with some odd scenario where the query planner decides to ignore the index, but that could happen just as well when querying without the view).
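For illustration, a minimal sketch of that view (vwPicturesWithUser is just the name used above; column names follow the question's tables) and how you could check the plan yourself in MySQL:

CREATE VIEW vwPicturesWithUser AS
SELECT p.ID, p.Path, u.ID AS userID, u.userName
FROM Pictures AS p
JOIN Users AS u ON u.ID = p.userID;

-- Both plans should show the same primary-key lookup on Pictures:
EXPLAIN SELECT * FROM vwPicturesWithUser WHERE ID = 3;
EXPLAIN SELECT p.ID, p.Path, u.ID, u.userName
FROM Pictures AS p JOIN Users AS u ON u.ID = p.userID
WHERE p.ID = 3;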

Update a MS Access field with the column count of another, not JOINed table

What I'm trying to do is create an update query in MS Access 2013 for a table separate from the actual data tables (meaning that there is no relationship between the data table and the statistics table) to store some statistics (e.g. a count of records) that are needed for further calculations and later use.
I've looked up a bunch of tutorials on this in the past few days, with no luck finding a solution to my problem, as all of them involved joining the tables, which in my case is irrelevant: the table being calculated on is temporary, with constantly changing data, so I always want to count every record, find the max of the whole temp table, etc. on a given date (like logging).
The structure of statisticsTable:
| statDate (Date/time) | itemCount (integer) | ... |
|----------------------|---------------------|-----|
| 01/01/2017           | 50                  | ... |
| 02/01/2017           | 47                  | ... |
| 03/01/2017           | 43                  | ... |
| ...                  | ...                 | ... |
What I want to do, in semi-gibberish code:
UPDATE statisticsTable
SET itemCount = (SELECT Count(*) FROM tempTable)
WHERE statDate = 01/01/2017;
This should update the itemCount field of 01/01/2017 in the statisticsTable with the current row count of the temp table.
I know that this might not be the standard or the correct use of MS Access or any DBMS in general; however, my assignment is rather limited, meaning I can't (shouldn't) modify any table structures, relationships, or the database structure in general, only create the update query that works as described above.
Is it possible to update a table's field value with the output of a query calculating on another table, WITHOUT joining the two tables in MS Access?
EDIT 1:
After further research, the function DCount() might be able to give the results I'm looking for, I will test it.
EDIT: I originally wrote a way more complicated answer that might not even have worked in Access (it would work in MS SQL Server). Anyway.
What you need is a join criterion that is always true on which to base your update. You can just use IS NOT NULL:
SELECT s.*, a.itemCount
FROM statisticsTable AS s
INNER JOIN
(
    SELECT Count(*) AS itemCount
    FROM tempTable
) AS a
    ON s.[some field that is always populated] IS NOT NULL
    AND a.itemCount IS NOT NULL
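Alternatively, as the question's edit already suggests, Access's domain aggregate DCount() sidesteps the join entirely; a minimal sketch (untested, using the table and field names above and Access's # date-literal syntax):

UPDATE statisticsTable
SET itemCount = DCount("*", "tempTable")
WHERE statDate = #01/01/2017#;

Unlike a subquery in SET, a domain aggregate keeps the update query updatable in Access.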

How to delete duplicate rows from a table without unique key with only "plain" SQL and no temporary tables?

Similar questions have been asked and answered here multiple times. From what I could find, they were either specific to a particular SQL implementation (Oracle, SQL Server, etc.) or relied on a temporary table (into which the result would first be copied).
I wonder if there is a platform-independent, pure-DML solution (just a single DELETE statement).
Sample data: Table A with a single field.
---------
|account|
|-------|
| A22   |
| A33   |
| A44   |
| A22   |
| A55   |
| A44   |
---------
The following SQL Fiddle shows an Oracle-specific solution based on the ROWID pseudo-column. It wouldn't work in any other database and is shown here just as an example.
The only platform-independent way I can think of is to store the data in a secondary table, truncate the first, and load it back in:
create table _TableA (
    AccountId varchar(255)
);
insert into _TableA
select distinct AccountId from TableA;
truncate table TableA;
insert into TableA
select AccountId from _TableA;
drop table _TableA;
If you have a column that is unique for each row, or you relax the platform-independence requirement to a particular dialect of SQL, then you can possibly find a single-query solution, for example the sketch below.
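A sketch assuming TableA had a hypothetical unique id column (the extra derived table is needed because MySQL, for one, does not allow the DELETE target to be referenced directly in the subquery):

DELETE FROM TableA
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM TableA
        GROUP BY AccountId
    ) AS keepers
);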

MS Access SQL Insert data into table with relationship

I have been searching everywhere for a solution to this problem and I have concluded that I may be asking the wrong question. I am hoping that someone here can answer this question or tell me the right question to ask.
I have two tables Estimates and TakeoffSheet:
+----------------------+ +----------------------+
|      Estimates       | |     TakeoffSheet     |
+----------------------+ +----------------------+
| 1. Estimate_ID = PK  | | 1. Sheet_ID = PK     |
| 2. Number            | | 2. Estimate_ID = FK  |
| 3. Estimate_Name     | | 3. Sheet_Name        |
+----------------------+ +----------------------+
I am trying to insert a new row into TakeoffSheet where Estimates.Number = Something. Essentially there is a relationship defined between TakeoffSheet and Estimates; however, I don't know how to insert a new row into TakeoffSheet with an Estimate_ID from Estimates. At least not in one SQL statement.
I know I can do it in multiple steps (first get the Estimate_ID, then add a new sheet with that) but I would rather do it in a single statement if possible. Is there a "join" type of insert in MS Access?
Thanks for any help!
If you have defined the foreign keys and created a view, e.g. V_TakeOffSheet, you will be able to append records via SQL like
INSERT INTO V_TakeOffSheet (Sheet_Name, [Number], Estimate_Name) VALUES ('Sheet1', 1, 'Est1');
Access will set Estimate_ID to the record matching Number and Estimate_Name; if this record does not exist, it will be created and the new Estimate_ID will be used in TakeoffSheet.
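A sketch of what such a saved join query might look like (untested; field names as in the question, and in Access this would typically be saved through the query designer rather than with CREATE VIEW):

SELECT t.Sheet_ID, t.Sheet_Name, t.Estimate_ID, e.[Number], e.Estimate_Name
FROM TakeoffSheet AS t
INNER JOIN Estimates AS e ON t.Estimate_ID = e.Estimate_ID;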

How can I test that a SQL "CREATE TABLE" statement is equivalent to an existing table?

For each table in my MySQL database I have a mytablename.sql file which contains the CREATE TABLE statement for that table.
I want to add a test to check that no one has added/altered a column in the live database without updating this file, that is, I want to check that I can re-create an empty database using these scripts.
How can I do this reliably?
One option would be using SHOW COLUMNS FROM mytable (or DESCRIBE mytable), which, on the command line, produces tabular output like this:
+---------+------------------+------+-----+---------------------+----------------+
| Field   | Type             | Null | Key | Default             | Extra          |
+---------+------------------+------+-----+---------------------+----------------+
| pk_guid | int(10) unsigned | NO   | PRI | NULL                | auto_increment |
| d_start | datetime         | NO   | MUL | 0000-00-00 00:00:00 |                |
| d_end   | datetime         | NO   | MUL | 0000-00-00 00:00:00 |                |
+---------+------------------+------+-----+---------------------+----------------+
And then create a temporary table and compare the results.
This would be fine, except that if any columns have been added to the live database then the results might not be in the same row order.
Unfortunately, it doesn't seem to be possible to use ORDER BY with SHOW COLUMNS.
Another option would be to use SHOW CREATE TABLE, but that includes information such as the AUTO_INCREMENT counter value, which I don't care about.
Is there a better way of doing this?
I've done this in the past using a somewhat ghetto/duct-tape PHP shell script across database servers (dev/production setup). Given that you have two servers, and one of them is "right":
grab a list of columns for each table to check. SHOW CREATE TABLE works, or just SELECT * FROM x LIMIT 1 if you aren't interested in possibly altered data types/defaults.
array_sort() the field names alphabetically, and loop through each comparing whatever potential changes you're interested in.
Not recommending this though! I used this because the sysadmins didn't think anything of granting full db permissions to some outsourcing folks, so it was just a quick safety net, nothing as planned or permanent as unit tests.
A more elegant and cleaner way to do this (limited to MySQL 5+ though?) would be to use the information_schema system database, particularly its COLUMNS table. If you don't have two servers, a separate database to hold/copy the column info (e.g. a clone of information_schema with the latest "correct" schema) would work too. This would let you diff new/dropped tables, indices, triggers, procedures, users/permissions, etc. as well, for example with a query like the one below.
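A minimal sketch of such a check ('mydb' is a placeholder for your schema name):

SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_DEFAULT, EXTRA
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
ORDER BY TABLE_NAME, COLUMN_NAME;

-- Run the same query against a scratch schema rebuilt from the .sql files
-- and diff the two result sets.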
you wrote:
One option would be using SHOW COLUMNS ... on the command line ....
[but the columns] might not be in the same row order.
You might force this order, post hoc. On a UNIX-ish system, you can simply:
$ mysql -Bse 'SHOW COLUMNS FROM possibly_altered' | sort
and compare that to similarly pared-down and sorted SHOW output from a temp table created from your .sql file. (The -Bs suppresses some of that mysql(1) fancy formatting and headers, which are not needed and somewhat inconvenient for a more programmatic comparison.)