Does the number of columns affect read query performance with projection - SQL

Consider that I have a table USERS_1 with 2 columns: id and name,
and another table USERS_2 with 3 columns: id, name and age.
I have indexes on id on both tables, and both tables contain 20 rows with the same data for id and name. Let's consider Postgres as an example.
Will there be a performance difference between the following queries:
SELECT id, name FROM USERS_1 WHERE id < 10
SELECT id, name FROM USERS_2 WHERE id < 10
Let's say this WHERE clause matches 5 rows in both tables.
I have heard that since USERS_2 has more columns, more I/O operations may be needed, because the DB server has to read the entire row from disk before projecting; projection only helps by transferring less data to the client. Is that correct?
Ref: https://community.oracle.com/tech/developers/discussion/3764712/does-the-number-of-columns-in-a-table-can-affect-the-performance#:~:text=So%20yes%2C%20250%20columns%20typically,rows%20of%205%20cols%20each.
I know that the number of rows and columns here is too small to observe any performance difference; the intent is to understand how projection and I/O reads are related.
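A minimal way to observe this yourself in PostgreSQL (the table definitions follow the question; the data values are illustrative):

-- Set up the two tables from the question and index id on both.
CREATE TABLE users_1 (id int, name text);
CREATE TABLE users_2 (id int, name text, age int);
CREATE INDEX ON users_1 (id);
CREATE INDEX ON users_2 (id);
INSERT INTO users_1 SELECT g, 'user_' || g FROM generate_series(1, 20) g;
INSERT INTO users_2 SELECT g, 'user_' || g, 30 FROM generate_series(1, 20) g;
ANALYZE users_1; ANALYZE users_2;

-- "Buffers: shared hit/read" shows how many 8 kB pages each query touched.
EXPLAIN (ANALYZE, BUFFERS) SELECT id, name FROM users_1 WHERE id < 10;
EXPLAIN (ANALYZE, BUFFERS) SELECT id, name FROM users_2 WHERE id < 10;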

Related

Selecting a large number of rows by index using SQL

I am trying to select a number of rows by the value of a column called ID. I know you can do this pretty easily by:
SELECT col1, col2, col3 FROM mytable WHERE id IN (1,2,3,4,5...)
However, what if there are a few million IDs I want to select and the IDs don't always follow a pattern (which means I can't use something like BETWEEN x AND y)? Does this select statement still work, or is there a better way of doing this?
The actual application is this: filters are specified by users and compared against some attributes of the records. From those filters, we create a subset of the data that is of interest to a particular user. There are about 30 million records, each with roughly ~3000 attributes (stored across roughly 30 tables, but every table has ID as a primary key), so every time someone queries their desired subset of records, we'd have to join many tables, apply the filters, and figure out what the subset looks like. To avoid joining many tables all the time, I thought it might be better to join the tables once, figure out the IDs of the selected subset, and then, each time a new query is made, simply select the relevant columns of the rows that match the filtered IDs.
This depends on the database and the interface you are using. For a few hundred or a few thousand values, no problem. But your question specifies millions, and that can start to run into limits on the length of the query -- whether imposed by the database, the tool you are using, or intermediate libraries.
If you have so many ids, I would strongly recommend that you load them into a table in the database with the id as the primary key. Then use join or exists to identify the rows in your table that match.
Often, such a list would be generated in the database anyway. In that case, you can use a subquery or CTE and just include that code in your final query.
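A minimal sketch of both approaches (Postgres-flavored; the table and column names are hypothetical):

-- Stage the ids once, keyed on id, then join instead of a huge IN list.
CREATE TEMP TABLE wanted_ids (id bigint PRIMARY KEY);
-- Bulk-load from the client, e.g. COPY wanted_ids FROM STDIN; or batched INSERTs.

SELECT t.col1, t.col2, t.col3
FROM mytable t
JOIN wanted_ids w ON w.id = t.id;

-- If the id list is produced inside the database anyway, keep it in a CTE:
WITH wanted AS (
    SELECT id FROM mytable WHERE col1 = 'some_filter'  -- whatever produces the list
)
SELECT t.col1, t.col2, t.col3
FROM mytable t
JOIN wanted w ON w.id = t.id;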

Best way to compare two tables in SQL by matching string?

I have a program where the goal is to take data from an API and capture the differences in the data from minute to minute. It involves three tables: Table 1 (for new data), Table 2 (for the previous minute's data), and a Results table (for the results).
The sequence of the program is like this:
Update table 1 -> Calculate the differences from table 2 and update a "Results" table with the differences -> Copy table 1 to table 2.
Then it repeats! It's simple and it works.
Here is my SQL query:
INSERT INTO Results (symbol, bid, ask, description, Vol_Dif, Price_Dif, Time)
SELECT symbol, bid, ask, description, Vol_Dif, Price_Dif, '$now' AS Time
FROM (
    SELECT t1.symbol, t1.bid, t1.ask, t1.description,
           (t1.volume - t2.volume) AS Vol_Dif,
           (t1.totalPrice - t2.totalPrice) AS Price_Dif
    FROM `Table_1` t1
    INNER JOIN (
        SELECT id, volume, ask, totalPrice FROM Table_2
    ) t2 ON t2.id = t1.id
) AS test;
The tables are identical in structure, obviously. The primary key is the 'id' field that auto-increments. And as you can see, I am comparing both tables on the basis of these 'id' fields being equal.
The PROBLEM is that the API seems to be inconsistent. One API call will have 50,000 entries. The next one will have 51,000 entries. And the entries are not just added to the end or added to the beginning, they are mixed into the middle.
So comparing on equal IDs means I am comparing entries for DIFFERENT data if the API calls return a different number of entries.
The data I am trying to diff is the 'bid', 'ask', 'Vol_Dif' and 'Price_Dif' from minute to minute. There are many instances of the same 'symbol', so I couldn't compare on that. The ONLY other way to compare entries from table to table, besides matching IDs, would be matching the "description" fields.
I have tried this. The script is almost the same as above except the end of the query is
ON t2.description = t1.description
The problem is that looking for matching description fields takes 3 minutes for 50,000 entries, whereas looking for matching ID's takes 1 second.
Is there a better, faster way to do what I'm trying to do? Thanks in advance. Any help is appreciated.
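For what it's worth, the timing gap is what you'd expect if 'id' is an indexed primary key while 'description' has no index; a hedged sketch of adding one (MySQL syntax, matching the backticks above; the index names and prefix length are assumptions):

-- Index the join column so matching descriptions can use index lookups
-- instead of scanning the other table once per row.
CREATE INDEX idx_t1_description ON Table_1 (description(64));
CREATE INDEX idx_t2_description ON Table_2 (description(64));
-- The (64) prefix is only required if description is TEXT or a very long VARCHAR.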

How to Duplicate a Small Table To All AMPs?

I have a Small Table in a Teradata Database that consists of 30 rows and 9 columns.
How do I duplicate the Small Table across all AMPs?
Note: this is the opposite of what one usually wants to do with a Large Table, which is to distribute the rows evenly.
You cannot "duplicate" the same table content across all AMPs, but you can force all rows of the table onto one AMP by distributing the rows unevenly. So if I understand the request, you want all rows from your small table to be stored on a single AMP.
If so, create a column that has the same value for all rows (if you don't already have one). You can make it an INTEGER column in order to use less space. Then make this column the primary index of the table, and define your actual keys as secondary indexes.
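A minimal sketch of such a table (the column names and types are illustrative, not from the question):

-- A constant-valued primary index column: every row hashes to the same AMP.
CREATE TABLE small_table_one_amp (
    amp_bucket INTEGER NOT NULL DEFAULT 0,  -- same value in every row
    id         INTEGER NOT NULL,
    payload    VARCHAR(100)
)
PRIMARY INDEX (amp_bucket)
UNIQUE INDEX (id);  -- the former key becomes a secondary index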
You can check how the rows are distributed across the AMPs with either of the queries below.
SELECT
TABLENAME
,VPROC AS NUM_AMP
,CAST(SUM(CURRENTPERM)/(1024*1024*1024) AS DECIMAL(18,5)) AS USEDSPACE_IN_GB
FROM DBC.TABLESIZEV
WHERE UPPER(DATABASENAME) = UPPER('databasename') AND UPPER(TABLENAME) = UPPER('tablename')
GROUP BY 1, 2
ORDER BY 1;
or
SELECT
HASHAMP(HASHBUCKET(HASHROW(primary_index_columns))) AS "AMP"
,COUNT(*) AS CNT
FROM databasename.tablename
GROUP BY 1
ORDER BY 2 DESC;

Difficulties fetching data from a table

We have a table with 627 columns and approx. 850,000 records.
We are trying to retrieve only two columns and dump that data into a new table, but the query is taking endless time and we are unable to get the result into the new table.
create table test_sample
as
select roll_no, date_of_birth from sample_1;
We have a unique index on the roll_no column (a varchar), and the data type of date_of_birth is date.
Your query has no WHERE clause, so it scans the full table. It reads all the columns of every row into memory in order to extract the two columns it needs to satisfy your query. This will take a long time because your table has 627 columns, and I'll bet some of them are pretty wide.
Additionally, a table with that many columns may give you problems with migrated rows or row chaining (Oracle stores rows with more than 255 columns in multiple row pieces). The impact of that depends on the relative position of roll_no and date_of_birth in the table's column order.
In short, a table with 627 columns shows poor (non-existent) data modelling. That doesn't help you now; it's just a lesson to be learned.
If this is a one-off exercise you'll just need to let the query run. (Although you should check whether it is running at all: can you see active progress in V$SESSION_LONGOPS?)
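A sketch of such a check (V$SESSION_LONGOPS is a standard Oracle view; the filter simply hides finished operations):

-- Show progress of long-running operations, e.g. the full table scan.
SELECT sid, opname, target, sofar, totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done,
       time_remaining
FROM   v$session_longops
WHERE  totalwork > 0
  AND  sofar < totalwork;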

How does Oracle perform read operation?

Suppose we have a table which holds information about a person. Columns like NAME or SURNAME are small (their size isn't very large), but columns that hold a photo or a person's video (BLOB columns) may be very large. So when we perform a select operation:
select * from person
it will retrieve all this information. But in most cases we only need to retrieve the name or surname of a person, so we perform this query:
select name, surname from person
Question: will Oracle read the whole record (including the blob columns) and then simply filter out name and surname columns, or will it only read name and surname columns?
Also, if we create a separate table for such large data (a person's photo and video) and have a foreign key to that table in the person table, and we want to retrieve only the photo, we perform this query:
select photo
from person p
join largePesonData d on p.largeDataID = d.largeDataID
where p.id = 1
Will Oracle read the whole record in the person table and the whole record in largePesonData, or will it simply read the column with the photo in largePesonData?
Oracle reads the data in blocks.
Let's assume that your block size is 8192 bytes and your average row size is 100 bytes - that means each block holds 8192/100 ≈ 81 rows (this isn't exact, since there is some overhead from the block header, but I'm trying to keep things simple).
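(To check those two numbers on a real system - both dictionary views below are standard Oracle, and avg_row_len requires gathered statistics:)

-- Actual block size of the database.
SELECT value AS block_size FROM v$parameter WHERE name = 'db_block_size';
-- Average row length of the table, per the optimizer statistics.
SELECT avg_row_len FROM user_tables WHERE table_name = 'PERSON';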
So when you run
select name, surname from person;
you actually retrieve at least one block with all of its data (81 rows); the block is then screened and only the data you requested is returned.
Two exceptions to this are:
BLOB Column - "select name, surname from person" will not retrieve the BLOB contents themselves, because a BLOB column stores a reference (locator) to the actual BLOB, which sits somewhere else in the tablespace or even in another tablespace.
Indexed columns - in case you created an index on the table using the columns name and surname, it is possible that Oracle will scan only that index and retrieve just those two columns, without touching the table at all.
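A hedged sketch of that second exception (the index name is illustrative; note that if both columns are nullable, rows where both are NULL are absent from the index, so Oracle may not choose this plan):

-- A composite index that covers the query.
CREATE INDEX person_name_ix ON person (name, surname);

EXPLAIN PLAN FOR
SELECT name, surname FROM person;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- If the index is usable, expect INDEX FAST FULL SCAN on PERSON_NAME_IX
-- in the plan instead of TABLE ACCESS FULL on PERSON.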