Row returned by a query with ROWNUM and PRIMARY KEY column - sql

I'm reading about one of the Oracle pseudocolumns, i.e.
ROWNUM: which returns a number indicating the order in which Oracle returns the row.
I have encountered some unexpected behavior here.
Example used:
1. Create script:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
2. Insert script:
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Max', 22, 'India', 4500.00 );
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Math', 22, 'US', 4500.00 );
Running select * from CUSTOMERS; returns the rows in the same order as they were inserted above:
ID NAME AGE ADDRESS SALARY
------ ---- --- ------- ------
2 Max 22 India 4500.00
1 Math 22 US 4500.00
Here if I run the select query,
select
rownum,
customers.ID
from customers;
I get the output below:
ROWNUM ID
------ --
1 1
2 2
Here ID = 2 was inserted first, but Oracle returns it as the 2nd row.
But if you include any other column from a table with ROWNUM and PK like
select
rownum,
customers.ID,customers.Name
from customers;
I get the correct output (correct insertion order):
ROWNUM ID NAME
------ -- ----
1 2 Max
2 1 Math
If I run the query without the Name column, i.e. with only ROWNUM (pseudocolumn) and the PK (table column),
I get this:
ROWNUM ID
------ --
1 1
2 2
My question is: why is ID=2 not the first returned row?
If I query any other table column with ROWNUM, I get the result back in insertion order.
Example below:
select
rownum,
NAME -- or AGE, ADDRESS, SALARY: any one of these columns
from CUSTOMERS;
But if I use ROWNUM with ID (the Primary Key column), the insertion order is not preserved. Why does this happen only with the Primary Key column?

ROWNUM values depend on how Oracle accesses the result set. ROWNUM is assigned to the rows as they are fetched.
As there is no guarantee in what order the data is returned, there is no guarantee in what order ROWNUM is assigned.
If you try running both of your queries without ROWNUM, you might get the rows swapped as you showed, or you might not.

ROWNUM will always run 1, 2, 3, 4, ... no matter which rows you get in the result set. It is assigned as the rows are returned by the query, before any ORDER BY is applied.
Now, your ID column is the PRIMARY KEY, so Oracle creates a unique index on it. By default this is an ordinary B-tree index with keys in ascending order; the table itself stays a heap table, it is not a clustered index. When you select only the PRIMARY KEY column, Oracle can answer the query from that index alone, without touching the table.
So if you select only the PRIMARY KEY column, i.e. "ID", the database reads it in sorted index order and ROWNUM numbers those rows 1 and 2.
But if you select some extra non-PRIMARY KEY column as well, Oracle has to read the table itself, and then it is totally up to Oracle which way it finds the data fastest. Sometimes it will return data from cache memory. If not, it will read the disk and fetch the data. While reading the datafile, there are multiple things which will differ from machine to machine, e.g. is your table partitioned? Is your table sharing a tablespace? What organization are you using for the table? And so on.
Bottom line: don't trust ROWNUM to reflect any particular order. It just hands out 1, 2, 3, ... to whatever rows the query produces, in whatever order they arrive, unless you specify the order yourself.
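A minimal sketch of the "order first, number afterwards" idea from the answers above, written in Python with the standard sqlite3 module standing in for Oracle. That stand-in is an assumption: SQLite has no ROWNUM, so enumerate() plays its role here. In Oracle itself the same idea is usually written by putting the ORDER BY in an inline view and selecting ROWNUM from that view.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    " ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, PRIMARY KEY (ID))"
)
# Same insertion order as in the question: ID 2 first, then ID 1.
conn.execute("INSERT INTO CUSTOMERS (ID, NAME) VALUES (2, 'Max')")
conn.execute("INSERT INTO CUSTOMERS (ID, NAME) VALUES (1, 'Math')")

# The ORDER BY is what makes the row order deterministic.
rows = conn.execute("SELECT ID, NAME FROM CUSTOMERS ORDER BY ID").fetchall()

# Number the rows only after they have been ordered (the role ROWNUM plays).
numbered = [(n, cid, name) for n, (cid, name) in enumerate(rows, start=1)]
for line in numbered:
    print(line)
```

The numbering is meaningful only because the ORDER BY fixes the sequence first; without it, the numbers would simply follow whatever order the engine happened to return.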

Related

Count and Group By Returning Different Sets - Halfway solved :/

Can anyone help? My trouble is that people might have the same id or different and have different name spellings. If I group by id (which is not the primary key) I get a different amount of rows than if I group by ID and name. How do I just group by ID, while still having the ID and Name in the select?
Create Table Client(ID Int, Name Varchar(15))
Insert Into Client VALUES(11,'Batman'),(22,'Batman'),(33,'Robin'),(44,'Joker'),(44,'The Joker'),(33,'Robin')
Select Count(ID) From Client
Select * From Client
--This returns 4 rows as it should
Select Count (ID)
From Client
Group By ID
--This returns 5 rows because Joker and The Joker have different names, but the same ID. I want to count by ID and not the name, since so many have typos.
Select Count (ID), [Name] , ID
From Client
Group By ID, [Name]
How do I do this and have it work?
Select Count (ID), [Name] , ID
From Client
Group By ID --<< Always throws an error unless I include Name, which
--returns too many rows.
It should return
Count Name ID
1 Batman 11
1 Batman 22
2 Joker 44 --<< Correct
2 Robin 33
And not
Count Name ID
1 Batman 11
1 Batman 22
2 Robin 33
1 Joker 44 --Wrong
1 The Joker 44 --Wrong
Using select count(*) from ClientLog will tell you exactly how many records there are in your table. If your ID field is the primary key, then select count(ID) from ClientLog should return the same number.
Your first query is a little confusing, because you're grouping by ID but not displaying the ID. So you're likely getting a row for each record, where the row value is 1.
Your 2nd query is also a bit confusing, because there's no aggregation happening (since your ID field is unique).
What specifically are you trying to obtain in your query (if anything besides just how many records you have in your table)?
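One possible way to get the result the question asks for, sketched in Python with the standard sqlite3 module (SQLite here is an assumption standing in for the asker's database). The idea is to group by ID only and wrap Name in an aggregate, so that it no longer has to appear in the GROUP BY. Choosing MIN(Name) as the spelling to display is an assumption; the question does not say which spelling should win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (ID INT, Name VARCHAR(15))")
# Same six rows as in the question.
conn.executemany(
    "INSERT INTO Client VALUES (?, ?)",
    [(11, "Batman"), (22, "Batman"), (33, "Robin"),
     (44, "Joker"), (44, "The Joker"), (33, "Robin")],
)

# Group by ID alone; MIN(Name) picks one spelling per ID (assumed choice).
counts = conn.execute(
    "SELECT COUNT(ID), MIN(Name), ID FROM Client GROUP BY ID ORDER BY ID"
).fetchall()
for row in counts:
    print(row)
```

With this data, ID 44 comes back as a single row with a count of 2, because 'Joker' and 'The Joker' now fall into the same group.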

How can i sort table records in SQL Server 2014 Management Studio by Alphabetical?

I have many records in one table:
1 dog
2 cat
3 lion
I want to recreate the table or sort the data in alphabetical order:
1 cat
2 dog
3 lion
Table 1 (column, data type, Allow Nulls):
Id int Unchecked
name nvarchar(50) Checked
To create another table from your table :
CREATE TABLE T1
( ID INT IDENTITY PRIMARY KEY NOT NULL,
NAME NVARCHAR(50) NOT NULL
)
GO
INSERT INTO T1 VALUES ('Dog'),('Cat'),('Lion');
SELECT ROW_NUMBER() OVER (ORDER BY NAME ASC) AS ID, NAME INTO T2 FROM T1 ORDER BY NAME ASC;
If you just want to sort the table data, use Order by
Select * from table_1 order by Name
If you want to change the Ids as well according to alphabetical order, create a new table and move the records into it in that order.
-- ROW_NUMBER() gives each row a distinct Id, even when names repeat
SELECT ROW_NUMBER() OVER (ORDER BY name) AS Id, name
INTO newTable
FROM table_1
In your database, the order of the records as they were inserted into the table does not necessarily dictate the order in which they're returned when queried. Nor does the ordering of a clustered key. There may be situations in which you appear to always get the same ordering of your results, but that is not guaranteed and may change at any time.
If the results of a query must be a specific order, then you must specify that ordering with an ORDER BY clause in your query (ORDER BY [Name] ASC in this particular case).
I understand, based upon your comments above, that you don't want this to be the answer. But this is how SQL Server (and any other relational database) works. If order matters, you specify that upon querying data from the system, not when inserting data into it.
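A small sketch of the point made above, in Python with the standard sqlite3 module (SQLite is an assumption standing in for SQL Server). The rows are stored in one order and the alphabetical order is requested at query time with ORDER BY; the renumbering step is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (Id INTEGER NOT NULL, name TEXT)")
# Stored in the order given in the question.
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [(1, "dog"), (2, "cat"), (3, "lion")],
)

# The order is specified when querying, not when inserting.
sorted_names = [
    name for (name,) in conn.execute("SELECT name FROM table_1 ORDER BY name ASC")
]

# Optional: new Ids that follow the alphabetical order.
renumbered = list(enumerate(sorted_names, start=1))
print(renumbered)
```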

how to increase character sequence in sql if given random number of table rows?

I'm currently studying SQL language in Oracle.
After making a very simple STUDENT table, I thought about how to generate a character sequence in the ID field.
For example, if the STUDENT table has 6 rows, I want the ID field to be filled with the characters 'a','b','c'...'f' respectively. Another condition is that the ID sequence should follow age in ascending order.
Below are the values currently in the STUDENT table (the ID field is currently empty), followed by the table description.
NAME AGE GRADE ID
hi 15 1
dui 12 2
giyu 16 3
hero 27 4
power 55 3
rai 37 4
///////////////////////////////////////////////////////////////////////////////////////////////////////
DESC STUDENT
NAME VARCHAR2(20)
AGE NUMBER(5)
GRADE NUMBER
ID VARCHAR2(12)
I hope many brilliant ideas come up here =)
So far, ordering the table by age is easy enough.
But I can't come up with a way to insert the character sequence in that order. This is not homework; I just want to practice SQL.
update tableX X
set ID=(
select ID from (
select rowid as rid,
chr(mod((row_number() over (order by age))-1,26)+97) as ID
from tableX T
)
where rid=X.rowid
)
The required order of the IDs is set in the over (order by ...) clause. row_number() returns the sequence number of each row in that order. mod() takes the remainder of the division by 26, so only 26 letters are available and they repeat after that. chr() returns the character for the given ASCII code (97 is 'a').
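The arithmetic in that expression can be checked outside the database. The sketch below redoes it in plain Python on the sample rows from the question; the helper name letter_for_rank is hypothetical, and sorting in Python stands in for row_number() over (order by age).

```python
def letter_for_rank(rank: int) -> str:
    """Map a 1-based rank to 'a'..'z', wrapping around after 26."""
    return chr((rank - 1) % 26 + 97)  # 97 is the ASCII code of 'a'


# (NAME, AGE) pairs from the question's sample data.
students = [("hi", 15), ("dui", 12), ("giyu", 16),
            ("hero", 27), ("power", 55), ("rai", 37)]

# Rank the students by age, ascending, then assign letters in that order.
by_age = sorted(students, key=lambda s: s[1])
ids = {name: letter_for_rank(rank)
       for rank, (name, _age) in enumerate(by_age, start=1)}
print(ids)
```

Because of the modulo, the 27th row would get 'a' again, so this scheme only yields unique IDs for up to 26 rows.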

Get distinct information across many fields some of which are NULL

I have a table with just over 65 million rows and 140 columns. The data comes from several sources and is submitted at least every month.
I'm looking for a quick way to grab specific fields from this data only where they are unique. I want to process all the information to link which invoice was sent with which identifying numbers and who sent it, but I don't want to iterate over 65 million records. If I can get distinct values, then I will only have to process, say, 5 million records as opposed to 65 million. See below for a description of the data and the SQL Fiddle for a sample.
If say a client submits an invoice_number linked to passport_number_1, national_identity_number_1 and driving_license_1 every month, I only want one row where this appears. i.e. the 4 fields have got to be unique
If they submit the above for 30 months then on the 31st month they send the invoice_number linked to passport_number_1, national_identity_number_2 and driving_license_1, I want to pick this row also since the national_identity field is new hence the whole row is unique
By linked to I mean they appear on the same row
For all fields, it's possible for NULL to occur at some point.
The 'pivot/composite' columns are the invoice_number and submitted_by. If either of those is missing, drop that row.
I also need to include the database_id with the above data, i.e. the primary_id which is auto generated by the postgresql database.
The only fields that don't need to be returned are the other_column and yet_another_column. Remember the table has 140 columns, so I don't need them.
With the results, create a new table that will hold these unique records.
See this SQL fiddle for an attempt to recreate the scenario.
From that fiddle, I'd expect a result like:
Row 1, 2 & Row 11: Only one of them shall be kept as they are exactly the
same. Preferably the row with the smallest id.
Row 4 and Row 9: One of them would be dropped as they are exactly the
same.
Row 5, 7, & 8: Would be dropped since they are missing either the
invoice_number or submitted_by.
The result would then have Row (1, 2 or 11), 3, (4 or 9), 6 and 10.
To get one representative row (with additional fields) from a group with the four distinct fields:
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
;
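Note that it is unpredictable which row exactly is returned unless you specify an ordering (see the documentation on distinct). Adding order by invoice_number, passport_number, national_id_number, driving_license_number, id to the query keeps the row with the smallest id from each group.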
Edit:
To order this result by id, simply adding order by id to the end doesn't work (the leading order by expressions must match the distinct on expressions), but it can be done by either using a CTE
with distinct_rows as (
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
-- ...
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
)
select *
from distinct_rows
order by id;
or making the original query a subquery
select *
from (
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
-- ...
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
) t
order by id;
"quick way to grab specific fields from this data only where they are unique"
I don't think so. I think you mean you want to select a distinct set of rows from a table in which they are not unique.
As far as I can tell from your description, you simply want
SELECT distinct invoice_number, passport_number,
driving_license_number, national_id_number
FROM my_table
where invoice_number is not null
and submitted_by is not null;
In your SQLFiddle example, that produces 5 rows.
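If the smallest id per distinct combination is also wanted, as the question prefers, one alternative is to group by the four fields and take MIN(id). The sketch below runs that in Python with the standard sqlite3 module standing in for PostgreSQL (an assumption; DISTINCT ON itself is PostgreSQL-specific and not available in SQLite). The sample rows are made up for illustration and are not the fiddle's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, invoice_number TEXT, passport_number TEXT,"
    " national_id_number TEXT, driving_license_number TEXT, submitted_by TEXT)"
)
# Hypothetical sample rows: duplicates, NULL fields, and missing pivot columns.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "INV1", "P1", "N1", "D1", "alice"),
        (2, "INV1", "P1", "N1", "D1", "alice"),   # duplicate of row 1
        (3, "INV1", "P1", "N2", "D1", "alice"),   # new national id
        (4, None,   "P2", "N3", "D2", "bob"),     # no invoice_number: dropped
        (5, "INV2", "P3", None, "D3", None),      # no submitted_by: dropped
        (6, "INV3", None, "N4", "D4", "carol"),
        (7, "INV3", None, "N4", "D4", "carol"),   # duplicate of row 6
    ],
)

# One row per distinct combination, keeping the smallest id of each group.
kept = conn.execute(
    "SELECT MIN(id) AS first_id, invoice_number, passport_number,"
    "       national_id_number, driving_license_number"
    "  FROM my_table"
    " WHERE invoice_number IS NOT NULL AND submitted_by IS NOT NULL"
    " GROUP BY invoice_number, passport_number,"
    "          national_id_number, driving_license_number"
    " ORDER BY first_id"
).fetchall()
kept_ids = [row[0] for row in kept]
print(kept_ids)
```

GROUP BY places NULLs in the same group, so rows 6 and 7 collapse into one even though passport_number is NULL in both.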

Removing duplicate records using another table's oid

Table 1              Table 2
--------             --------
oid                  oid (J)
sequence             trip_id
stop
trip_update_id (J)
(J) = join
Table 1 and Table 2 are updated simultaneously from an API every 30 seconds.
At the end of each day, Table 1 is about 98% duplicate data. This is because the data feed includes both new data generated in the last 30 seconds and all data generated in previous feeds from the same day. (The oid is automatically generated upon insertion, therefore all oids are unique.)
Table 2 has all unique records, so my question is: what is the SQL to reduce Table 1 to unique records for each trip_id in Table 2?
I'm not quite sure I understand what the problem is, but here are a few suggestions.
To remove rows from table1 with trip_update_id values not found in table2:
delete from table1
where trip_update_id not in (select trip_id from table2 where trip_id is not null)
(The is not null part is very important if trip_id is allowed to have NULL values!!!)
To remove duplicate trip_update_id rows from table1, keeping the one with the highest oid:
delete from table1
where oid not in (select max(oid) from table1
group by trip_update_id)
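The second delete can be tried out on a few rows. This is a minimal sketch in Python with the standard sqlite3 module (SQLite is an assumption standing in for the asker's database); the sample values are made up, and the sequence and stop columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# oid is declared explicitly as the key; the other columns are omitted here.
conn.execute(
    "CREATE TABLE table1 (oid INTEGER PRIMARY KEY, trip_update_id INTEGER)"
)
# Hypothetical feed data: trip 100 arrived three times, trip 200 twice.
conn.executemany(
    "INSERT INTO table1 (oid, trip_update_id) VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 200), (4, 100), (5, 200)],
)

# Keep only the row with the highest oid for each trip_update_id.
conn.execute(
    "DELETE FROM table1"
    " WHERE oid NOT IN (SELECT MAX(oid) FROM table1 GROUP BY trip_update_id)"
)

remaining = conn.execute(
    "SELECT oid, trip_update_id FROM table1 ORDER BY oid"
).fetchall()
print(remaining)
```

If a duplicate is really defined by more than trip_update_id (for example trip_update_id together with sequence and stop), those columns would need to be added to the GROUP BY so that distinct stops of the same trip are not deleted.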