We have a table with 627 columns and approximately 850,000 records.
We are trying to retrieve only two columns and dump that data into a new table, but the query is taking forever and we are unable to get the result into the new table.
create table test_sample
as
select roll_no, date_of_birth from sample_1;
We have unique index on roll_no column (varchar) and data type for date_of_birth is date.
Your query has no WHERE clause, so it scans the full table. It reads all the columns of every row into memory to extract the columns it needs to satisfy your query. This will take a long time because your table has 627 columns, and I'll bet some of them are pretty wide.
Additionally, a table with that many columns may give you problems with migrated rows or chaining. The impact of that will depend on the relative position of roll_no and date_of_birth in the table's projection.
In short, a table with 627 columns shows poor (non-existent) data modelling. That doesn't help you now; it's just a lesson to be learned.
If this is a one-off exercise you'll just need to let the query run. (Although you should check whether it is running at all: can you see active progress in V$SESSION_LONGOPS?)
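For example, something along these lines, run from another session, should show whether the operation is still making progress (a sketch only; exactly how the scan is reported will vary):
select sid, opname, target, sofar, totalwork, time_remaining
from v$session_longops
where sofar < totalwork;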
I am trying to select a number of rows by the value of a column called ID. I know you can do this pretty easily by:
SELECT col1, col2, col3 FROM mytable WHERE id IN (1,2,3,4,5...)
However, what if there are a few million IDs I want to select and the IDs don't always have a pattern (which means I can't use something like BETWEEN x AND y)? Does this select statement still work, or are there better ways of doing so?
The actual application is this. Filters are specified by users and compared against some attributes of the records. From those filters, we create a subset of the data which is of interest to a particular user. There are about 30 million records, each with roughly ~3000 attributes (stored across roughly 30 tables, but every table has ID as a primary key), so every time someone queries their desired subset of records, we'd have to join many tables, apply those filters, and figure out what their subset looks like. In order to avoid joining many tables all the time, I thought maybe it's a better idea to join the tables once, figure out the IDs of the selected subset, and that way each time a new query is made, all we have to do is select the relevant columns of the rows that match the filtered IDs.
This depends on the database and the interface you are using. For a few hundred or thousand values, no problem. But your question specifies millions. And that could start to get into limits on the length of the query -- either specified by the database, the tool you are using, or intermediate libraries.
If you have so many ids, I would strongly recommend that you load them into a table in the database with the id as the primary key. Then use join or exists to identify the rows in your table that match.
Often, such a list would be generated in the database anyway. In that case, you can use a subquery or CTE and just include that code in your final query.
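As a rough sketch of the staging-table approach (wanted_ids is a made-up name; mytable and its columns come from your example):
create table wanted_ids (id int primary key);
-- bulk-load the millions of ids into wanted_ids with whatever loader your platform has
-- (COPY, bcp, SQL*Loader, LOAD DATA, ...)
select t.col1, t.col2, t.col3
from mytable t
join wanted_ids w on w.id = t.id;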
I have to fix a very poorly designed database.
The problem:
One Job Advertisement has one job title, but many qualifying degrees.
(e.g., JobTitle: Analyst, Qualifications: Accounting Degree, Finance Degree, or Business Degree)
The tables:
TableName: UniqueJobName Columns: jobName (char), uniqueJobUid (bigint)
TableName: UniqueDegree Columns: degreeName (char), degreeUid (bigint)
TableName: Jobs Columns: jobName (char), jobUid (bigint), uniqueJobUid (bigint)
TableName: Job_Degree Columns: jobUid (char), degreeName (char)
Relations
one-to-many: UniqueJobName.uniqueJobUid -> Jobs.uniqueJobUid
one-to-many: Jobs.jobUid -> Job_Degree.jobUid
There is NO relation between Jobs and UniqueDegree.
Technical Requirement
Rather than creating a column in Job_Degree for degreeUid, I want to create a new table: UniqueJob_UniqueDegree_Job (There are reasons for this that I won't explain here)
UniqueJob_UniqueDegree_Job will have three columns:
uniqueJobUid
jobId
degreeId
The trouble is that the Job table is already very big, 500,000 rows (and the Job_Degree table even bigger)
QUESTION:
What is the most efficient SQL statement for creating the UniqueJob_UniqueDegree_Job table given that part of the statement will be comparing the char column of UniqueDegree.degreeName and Job_Degree.degreeName?
Any hints would be most appreciated.
select j.uniqueJobUid, j.jobUid as jobId, ud.degreeUid as degreeId
into UniqueJob_UniqueDegree_Job
from Jobs j
join Job_Degree jd on jd.jobUid = j.jobUid
join UniqueDegree ud on ud.degreeName = jd.degreeName
This should do it. Note that in order to do SELECT ... INTO ... FROM, the target table cannot exist already (you can use CONVERT or CAST on each attribute in the select statement to get the data types correct with certainty).
If the table already exists, then alter the query into
insert into ...
select ...
from ...
500k rows is rather small as well. This shouldn't take more than a couple of seconds I'd estimate.
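For completeness, the insert version with explicit casts might look roughly like this (a sketch; the bigint types are assumptions, so adjust them to whatever the target columns actually are):
insert into UniqueJob_UniqueDegree_Job (uniqueJobUid, jobId, degreeId)
select cast(j.uniqueJobUid as bigint),
       cast(j.jobUid as bigint),
       cast(ud.degreeUid as bigint)
from Jobs j
join Job_Degree jd on jd.jobUid = j.jobUid
join UniqueDegree ud on ud.degreeName = jd.degreeName;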
I need your assistance to figure out how to achieve the following in MS access database.
I have a table with a lot of columns, but one of them has a numeric value that will be used as how many times the record will be repeated.
I need to make another table with repeated records based on the Count column.
Build a numbers (aka tally) table (you can google it). I'll call it tblNumbers. Then all you need to do is create a query:
SELECT <yourTable>.* FROM <yourTable>, tblNumbers WHERE tblNumbers.Number <= <yourTable>.<numberField>
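A minimal sketch of that in Access SQL (tblNumbers, SourceTable, RepeatedRecords, and the Count field are assumed names; populate tblNumbers with 1..N, where N is at least the largest count value):
CREATE TABLE tblNumbers (Number LONG);
INSERT INTO tblNumbers (Number) VALUES (1);
INSERT INTO tblNumbers (Number) VALUES (2);
-- ... and so on, up to the largest count value
SELECT SourceTable.* INTO RepeatedRecords
FROM SourceTable, tblNumbers
WHERE tblNumbers.Number <= SourceTable.[Count];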
My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a default column in the table "Insert_date" that gets a value of Getdate() and then partition on this somehow?
OR
I can create a column in the table "insert_date" and put a hard coded value of today's date.
What would the partition function look like?
Would separate tables and a partitioned view be better suited?
I have tried both, and even though I think partitioned tables are cooler, after trying to teach how to maintain the code afterwards it just wasn't justified. In that scenario we used a hard-coded date field that was set in the insert statement.
Now I use different tables (31 days / 31 tables) plus an aggregation table, and there is an ugly UNION ALL query that joins together the monthly data.
Advantage: super simple SQL, simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET / SQL gurus, I would choose the partitioning strategy.
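To answer the partition-function part of the question directly, a daily scheme on insert_date might look roughly like this (a sketch only; the boundary values, object names, and single-filegroup mapping are made up, and the nightly 50-day purge would be done with SWITCH plus MERGE RANGE):
CREATE PARTITION FUNCTION pfInsertDate (date)
AS RANGE RIGHT FOR VALUES ('20120101', '20120102', '20120103' /* ... one boundary per day ... */);
CREATE PARTITION SCHEME psInsertDate
AS PARTITION pfInsertDate ALL TO ([PRIMARY]);
CREATE TABLE dbo.LoadedData (
    insert_date date NOT NULL DEFAULT (CAST(GETDATE() AS date))
    -- , ... the rest of the columns ...
) ON psInsertDate (insert_date);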
I have a very large database I would like to split up into tables. I would like to make it so when I run a distinct, it will make a table for every distinct name. The name of the table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, 1 named A and another named B. Then the entire row of data would be copied into that table.
select distinct [name] from [maintable]
-make table for each name
-select [name] from [maintable]
-copy into table name
-drop row from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective because each index value would select almost all the records.
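For example (maintable and name are the assumed table and column names):
create index idx_maintable_name on maintable (name);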
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
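As a rough sketch in Postgres syntax, for instance (table, column, and partition names are made up):
create table maintable (
    name text not null,
    data text
) partition by list (name);
create table maintable_a partition of maintable for values in ('A');
create table maintable_b partition of maintable for values in ('B');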
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done once per table instead of just once.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some simplifications (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
What you want is:
CREATE TABLE new_table AS
SELECT ...;  -- the data that you want in this table