I have a Small Table in a Teradata Database that consists of 30 rows and 9 columns.
How do I duplicate the Small Table across all AMPs?
Note: this is the opposite of what one usually wants to do with a Large Table, which is to distribute the rows evenly.
You cannot "duplicate" the same table content across all AMPs. What you can do is store all rows of the table on a single AMP by deliberately skewing the row distribution. So, if I understand the request, you want all rows from your small table to be stored on one AMP only.
If so, you can create a column that has the same value for all rows (if you don't already have one). You can make it an INTEGER column in order to use less space. Then make this column the (non-unique) primary index of the table, and define your actual keys as secondary indexes.
You can check how the rows are distributed across the AMPs with either of the queries below.
SELECT
TABLENAME
,VPROC AS NUM_AMP
,CAST(SUM(CURRENTPERM)/(1024*1024*1024) AS DECIMAL(18,5)) AS USEDSPACE_IN_GB
FROM DBC.TABLESIZEV
WHERE UPPER(DATABASENAME) = UPPER('databasename') AND UPPER(TABLENAME) = UPPER('tablename')
GROUP BY 1, 2
ORDER BY 1;
or
SELECT
HASHAMP(HASHBUCKET(HASHROW(primary_index_columns))) AS "AMP"
,COUNT(*) AS CNT
FROM databasename.tablename
GROUP BY 1
ORDER BY 2 DESC;
Consider that I have a table USERS_1 with 2 columns : id and name
and another table USERS_2 with 3 columns: id, name and age.
I have indexes on id on both tables, and both tables contain 20 rows with the same data for id and name. Let's consider a Postgres DB as an example.
Will there be a performance difference between the following queries:
SELECT id, name FROM USERS_1 WHERE id < 10
SELECT id, name FROM USERS_2 WHERE id < 10
Let's say this WHERE clause matches 5 rows in both tables.
I have heard that since USERS_2 has more columns, it might need more I/O operations, because the DB server has to read the entire row from disk before projecting. Projection only helps by transferring less data to the client. Is that correct?
Ref: https://community.oracle.com/tech/developers/discussion/3764712/does-the-number-of-columns-in-a-table-can-affect-the-performance#:~:text=So%20yes%2C%20250%20columns%20typically,rows%20of%205%20cols%20each.
I do know that the number of rows and columns is too small to observe any performance difference, but the intent is to understand how projection and I/O reads are related.
I have a table named "A" which has 2 columns, "A1" and "A2".
I want each unique value in column "A1" to have at most 2 rows in the table. If a unique value in column "A1" has 5 rows, 3 rows should be deleted.
Which 3 rows to delete is determined by the lowest values in column "A2".
The table consists of more than 20 million rows, more than 300,000 unique values in column "A1", and up to 3,000 rows per unique value in column "A1".
I have solved this with the following query:
with excess as
(
select
id,
row_number() over(partition by A1 order by A2 desc) as rownum
from
A
)
delete from excess
where rownum > 2
I'm satisfied with this query since it took 8 minutes for the initial batch and ~20 seconds in recurring executions.
Is this the most efficient query to achieve the requirements?
Yes, this is the most efficient query short of copying the data into another table, because it makes a single pass over the table instead of joining it back to itself. If there are any other consumers of the table, I would suggest that you use "delete top(N)" and keep N under 5,000. This tries to keep lock escalation from reaching a full table lock. It also lets the transaction log space on the server be reused between batches. If you do it all in one go, all of the deleted rows have to be accounted for in the transaction log, and the space can't be reused until the statement is complete. I would also suggest creating a composite index on (A1, A2).
If the rows that need to be deleted are a significant percentage of the table, it would be faster to copy the rows where rownum <= 2 into a new table, then drop the original table and rename the new table to the original name. If you have other consumers of the table and/or don't want to copy the data, then this may not be a valid solution.
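The batching idea above can be sketched as a loop that deletes a limited number of excess rows per statement and commits between batches. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (so "delete top(N)" becomes a LIMIT inside a subquery), the function name trim_in_batches is hypothetical, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3


def trim_in_batches(conn, keep=2, batch_size=5000):
    """Keep only the `keep` highest-A2 rows per A1, deleting in small batches.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM A
            WHERE rowid IN (
                SELECT rid FROM (
                    SELECT rowid AS rid,
                           ROW_NUMBER() OVER (
                               PARTITION BY A1 ORDER BY A2 DESC
                           ) AS rn
                    FROM A
                )
                WHERE rn > ?
                LIMIT ?   -- stand-in for SQL Server's DELETE TOP (N)
            )
            """,
            (keep, batch_size),
        )
        conn.commit()  # end the transaction so each batch stands alone
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Tiny demo with made-up data: A1 = 1 has five rows, A1 = 2 has one.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE A (A1 INTEGER, A2 INTEGER)")
demo.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 10)],
)
deleted_in_demo = trim_in_batches(demo, keep=2, batch_size=2)
```

Each batch recomputes the ranking, so the loop can be interrupted and restarted without deleting more than intended.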
I have a simple join table with two id columns in SQL Server.
Is there any way to select all rows in the exact order they were inserted?
If I try to make a SELECT *, even if I don't specify an ORDER BY clause, the rows are not being returned in the order they were inserted, but ordered by the first key column.
I know it's a weird question, but this table is very big and I need to check exactly when a strange behavior has begun, and unfortunately I don't have a timestamp column in my table.
UPDATE #1
I'll try to explain why I'm saying that the rows are not returned in 'natural' order when I SELECT * FROM table without an ORDER BY clause.
My table was something like this:
id1 id2
---------------
1 1
2 2
3 3
4 4
5 5
5 6
... and so on, with about 90.000+ rows
Now, I don't know why (probably a software bug inserted these rows), but my table has 4.5 million rows and looks like this:
id1 id2
---------------
1 1
1 35986
1 44775
1 60816
1 62998
1 67514
1 67517
1 67701
1 67837
...
1 75657 (100+ "strange" rows)
2 2
2 35986
2 44775
2 60816
2 62998
2 67514
2 67517
2 67701
2 67837
...
2 75657 (100+ "strange" rows)
Crazy, my table now has millions of rows. I have to find out when this happened (when the rows were inserted) because I have to delete them, but I can't just delete using *WHERE id2 IN (strange_ids)*, because there are "right" id1 values that belong to these id2 values and I can't delete those. So I'm trying to see exactly when these rows were inserted in order to delete them.
When I SELECT * FROM table, it returns the rows ordered by id1, like the table above, and the rows were not inserted in this order.
I think my table is not corrupted, because this is the second time this strange behavior has happened in the same way, but now I have so many rows that I can't delete them manually like I did the first time. Why are the rows not being returned in the order they were inserted? These "strange rows" were definitely inserted yesterday and should be returned near the end of my table if I do a SELECT * without an ORDER BY, shouldn't they?
A select query with no order by does not retrieve the rows in any particular order. You have to have an order by to get an order.
SQL Server does not have any default method for retrieving by insert order. You can do it, if you have the information in the row. The best way is a primary key identity column:
TableId int identity(1, 1) not null primary key
Such a column is incremented as each row is inserted.
You can also have a CreatedAt column:
CreatedAt datetime default getdate()
However, this could have duplicates for simultaneous inserts.
The key point, though, is that a select with no order by clause returns an unordered set of rows.
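As a rough illustration of the identity-column advice, the sketch below uses Python's built-in sqlite3 module, with INTEGER PRIMARY KEY AUTOINCREMENT standing in for SQL Server's identity(1, 1). The table name link and column name link_id are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE link (
        link_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for identity(1, 1)
        id1     INTEGER NOT NULL,
        id2     INTEGER NOT NULL
    )
    """
)

# Insert rows in an order that differs from their id1/id2 sort order.
inserted = [(5, 5), (1, 1), (1, 35986), (2, 2)]
conn.executemany("INSERT INTO link (id1, id2) VALUES (?, ?)", inserted)

# Only an explicit ORDER BY on the generated key guarantees insert order.
rows_in_insert_order = conn.execute(
    "SELECT id1, id2 FROM link ORDER BY link_id"
).fetchall()
```

This only helps for rows inserted after the column exists; it cannot recover the order of rows that are already in the table.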
As others have already written, you will not be able to get the rows out of the link table in the order they were inserted.
If there is some sort of internal ordering of the rows in one or both of the tables that this link table is joining, then you can use that to try to figure out when the link table rows were created. Basically, they cannot have been created BEFORE both of the rows containing the PKs were created.
On the other hand, you will not be able to find out how long after that they were created.
If you have decent backups, you could try to restore one or a few backups of varying age and see whether those backups also contain this strange behaviour. It could give you at least some clue about when the strangeness started.
But the bottom line is that, using just a select, there is no way to get the rows out of a table like this in the order they were inserted.
If SELECT * doesn't return them in 'natural' order and you didn't insert them with a timestamp or auto-incrementing ID then I believe you're sunk. If you've got an IDENTITY field, order by that.
But the question I have is, how can you tell that SELECT * isn't returning them in the order they were inserted?
Update:
Based on your update, it looks like there is no method by which to return the records as you wish. I'd guess you've got a clustered index on ID1?
Select *, %%physloc%% as pl from table
order by pl desc
I am trying to get the count of records from a table with around 40 million records. My query is as follows:
Select count(*) from Employee
where code = '000111' and status = 'A' and rank = 'B'
There are around 2-3 million records which satisfy the condition. Status has just two values (A and C) and rank also has only two values (A and B).
Indexes have been added for the columns 'code', 'status' and 'rank', and all three columns are VARCHAR.
In spite of this, the above query is taking a lot of time.
Is there a way to retrieve the count in quick time?
Note that I just want the count of records.
Edit: Column Details
EMPLOYEE
CODE NOT NULL VARCHAR2(6)
STATUS NOT NULL VARCHAR2(1)
RANK NOT NULL VARCHAR2(1)
Indexes:
CODE_IDX - Normal Index (For Code)
STATUS_IDX - BITMAP Index (For Status)
TARIFF_IDX - Normal Index (For Tariff)
If the table is continuously updated and you need to continuously get the count, I would recommend that you create a key/value table (if you don't have one already) that stores the count as an entry in the database, rather than computing the count each time. If you need your query to be faster, that would definitely speed it up. Make the key the primary key of that table and you won't have to worry about indexing. Update the key/value pair whenever an entry is inserted or removed by adding or subtracting 1 from the value. Then periodically make sure the stored value is still correct by getting the count the way you are already doing it, from a cron job or in some other fashion.
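A minimal sketch of that counter-table idea follows, assuming triggers are an acceptable way to keep the count in step. It uses Python's built-in sqlite3 module rather than Oracle, and the names employee_counts, make_db and fast_count are hypothetical. Updates that change code, status or rank are not handled here and would need a third trigger.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (
    code   TEXT NOT NULL,
    status TEXT NOT NULL,
    rank   TEXT NOT NULL
);
-- One row per (code, status, rank) combination holding its current count.
CREATE TABLE employee_counts (
    code TEXT, status TEXT, rank TEXT,
    cnt  INTEGER NOT NULL,
    PRIMARY KEY (code, status, rank)
);
CREATE TRIGGER employee_ins AFTER INSERT ON employee BEGIN
    INSERT OR IGNORE INTO employee_counts
        VALUES (NEW.code, NEW.status, NEW.rank, 0);
    UPDATE employee_counts SET cnt = cnt + 1
     WHERE code = NEW.code AND status = NEW.status AND rank = NEW.rank;
END;
CREATE TRIGGER employee_del AFTER DELETE ON employee BEGIN
    UPDATE employee_counts SET cnt = cnt - 1
     WHERE code = OLD.code AND status = OLD.status AND rank = OLD.rank;
END;
"""


def make_db():
    """Create an in-memory database with the employee and counter tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def fast_count(conn, code, status, rank):
    """Read the maintained count with a primary-key lookup instead of COUNT(*)."""
    row = conn.execute(
        "SELECT cnt FROM employee_counts WHERE code = ? AND status = ? AND rank = ?",
        (code, status, rank),
    ).fetchone()
    return row[0] if row else 0
```

The periodic check mentioned above would compare fast_count against a real COUNT(*) and correct the stored value if they ever drift apart.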
Have you tried:
Select count(1) from Employee where code = '000111' and status = 'A' and rank = 'B'
I have a problem comparing two selects in PostgreSQL. I execute these selects via JDBC, then create new tables by inserting the data from each result set into a new table. I do this because I want to avoid columns with the same name, like "count". Then I have to compare the data in these tables.
The problem is that these tables should count as the same if they hold the same data with a different order of columns. For example, if there are 3 columns (1, 2, 3) in tables t1 and t2, these tables are the same if t1.1 = t2.2 and t1.2 = t2.1 and t1.3 = t2.3.
The order of columns within a row is determined at the time of creation. If you do a
SELECT * FROM tbl;
or
TABLE tbl;
you get the column order you created the table with. If you name columns in your SELECT you get your columns in your explicit order.
You must always spell out the columns you use for an operation like yours. It could break if you alter the order of columns in one of your tables later. Do not rely on *.
The order of rows in a SELECT is indeterminate as long as you don't include an ORDER BY clause. If you want a specific order you have to ORDER BY a primary or unique column (or unique combination of columns). If you order by a non-unique set of columns, the rows within groups of the same key are again in indeterminate order.
SELECT col1, col2, col3 FROM tbl
ORDER BY <unique column or set of columns>;
Read the manual on the ORDER BY clause.
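To make both points concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL over JDBC. It assumes you already know which column in one table corresponds to which column in the other. The tables t1 and t2, their column names and the helper same_rows are invented for the example.

```python
import sqlite3


def same_rows(conn, query_a, query_b):
    """Return True when both queries yield identical rows in identical order."""
    return conn.execute(query_a).fetchall() == conn.execute(query_b).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Same data, but t2 was created with columns a and b swapped.
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER);
    CREATE TABLE t2 (b INTEGER, a INTEGER, c INTEGER);
    INSERT INTO t1 (a, b, c) VALUES (1, 10, 100), (2, 20, 200);
    INSERT INTO t2 (b, a, c) VALUES (20, 2, 200), (10, 1, 100);
    """
)

# Explicit column list plus ORDER BY: column and row order are both pinned down.
explicit = same_rows(
    conn,
    "SELECT a, b, c FROM t1 ORDER BY a, b, c",
    "SELECT a, b, c FROM t2 ORDER BY a, b, c",
)

# SELECT * follows each table's creation order, so the comparison breaks.
star = same_rows(
    conn,
    "SELECT * FROM t1 ORDER BY 1, 2, 3",
    "SELECT * FROM t2 ORDER BY 1, 2, 3",
)
```

Here explicit comes out true and star comes out false, even though both tables hold the same data.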