Assign an ID Value for Every Set of Duplicates - sql

How can I generate an ID value for every set of duplicate records, as seen in the second table with the ID column? In other words, how can I make the first table look like the second table using a SQL query?
Assume that the same first name and last name can appear multiple times in the first table.
Each first name and last name can have one or many purchase year and cost values.
The given image is just a sample; the total number of records in table 1 can reach thousands.
I'm using Oracle SQL.
Note: I'm working with only one table, the first one. The second table is what I want to produce.

You can use the DENSE_RANK analytic function to assign IDs, as below:
EDIT:
Simplified the query that generates the IDs.
SELECT
    DENSE_RANK() OVER (ORDER BY First_Name, Last_Name) AS ID,
    t.*
FROM Table1 t;
Reference:
DENSE_RANK on Oracle Database SQL Reference
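If you want to materialize the result as a second table rather than just select it, a minimal sketch using CREATE TABLE ... AS SELECT (the name Table2 is an assumption):
CREATE TABLE Table2 AS
SELECT DENSE_RANK() OVER (ORDER BY First_Name, Last_Name) AS ID,
       t.*
FROM Table1 t;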

Related

SQL Assign Unique number to each unique value in a column

I have a table in Snowflake with people's names and other attributes. To simplify, it looks like the table below.
How can I add a new column that assigns a unique number to each person directly to the table using SQL?
The ideal result looks like the one below.
Use dense_rank():
select name, dense_rank() over (order by name) as uniquenum
from t;
You can use this logic in an update, but the exact syntax depends on the database.
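For example, in Snowflake (which supports UPDATE ... FROM) a sketch could look like the following; the column name uniquenum and table name t are assumptions carried over from the query above:
-- add the new column first (assumed not to exist yet)
ALTER TABLE t ADD COLUMN uniquenum INTEGER;

-- then populate it from the ranked subquery
UPDATE t
SET uniquenum = ranked.uniquenum
FROM (
    SELECT name, DENSE_RANK() OVER (ORDER BY name) AS uniquenum
    FROM t
) ranked
WHERE t.name = ranked.name;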

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row of the highest value for timestamp (an integer) for each person (that has multiple entries) in a table. Additionally, I am only interested in rows with the field containing ABCD, but this should be done after filtering to return the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the size of expectation. Is there some step here that I am not getting correct?
You can first return a table containing max(timestamp), and then use it as a subquery of another SELECT statement, as in the following query (the derived table needs an alias, and the extra columns must appear in the GROUP BY):
SELECT sub."person", sub."timestamp"
FROM (SELECT "person", MAX("timestamp") AS "timestamp", "type", "field"
      FROM table
      GROUP BY "person", "type", "field") sub
WHERE sub."type" = 1 AND sub."field" LIKE '%ABCD%'
Direct answer: as I understand your end goal, just move the HAVING conditions into the WHERE clause:
SELECT
    table."person", MAX(table."timestamp")
FROM table
WHERE
    table."type" = 1
    AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I'm surprised your query worked at all: your HAVING clause referenced a column that is not in your query. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
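To illustrate the distinction, a minimal sketch against a hypothetical sales(region, amount) table:
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0           -- applied per row, before grouping
GROUP BY region
HAVING SUM(amount) > 100;  -- applied per group, after aggregation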

Append two rows into one (that differ in one column)

I am trying to create a query that returns a single row for each unique ID in my Oracle table.
The problem is that I have one column, Description, that isn't unique in each row (the Description column is the only column that can differ between rows with the same ID, by the way). This is what my table looks like:
ID          Description  Customer
==================================================
5119450733  Cost         GOW_1
5119450733  Price        GOW_1
1543512377  Cost         GOW_2
Is there a way to query the table so that the results from Description are appended together, giving me unique ID rows? For example, like this:
ID          Description  Customer
==================================================
5119450733  Cost,Price   GOW_1
1543512377  Cost         GOW_2
Use the LISTAGG function if you are using Oracle 11g Release 2.
SELECT id,
       LISTAGG(description, ',') WITHIN GROUP (ORDER BY description) AS description,
       customer
FROM <table_name>
GROUP BY id, customer;
Refer to the link below to learn more about string aggregation techniques in different Oracle versions.
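On versions before 11g Release 2, a common workaround is XMLAGG; a sketch using the same placeholder table name:
SELECT id,
       RTRIM(XMLAGG(XMLELEMENT(e, description || ',') ORDER BY description)
               .EXTRACT('//text()').getStringVal(), ',') AS description,
       customer
FROM <table_name>
GROUP BY id, customer;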

How do I get the row ID of a row in SQL Server

I have one table, CSBCA1_5_FPCIC_2012_EES207201222743, with two columns: employee_id and employee_name.
I have used the following query:
SELECT ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) AS ID, EMPLOYEE_ID,EMPLOYEE_NAME
FROM CSBCA1_5_FPCIC_2012_EES207201222743
But it returns the rows in ascending order of employee_id, whereas I need the rows in the order they were inserted into the table.
SQL Server does not track the order of inserted rows, so there is no reliable way to get that information given your current table structure. Even if employee_id is an IDENTITY column, it is not 100% foolproof to rely on that for order of insertion (since you can fill gaps and even create duplicate ID values using SET IDENTITY_INSERT ON). If employee_id is an IDENTITY column and you are sure that rows aren't manually inserted out of order, you should be able to use this variation of your query to select the data in sequence, newest first:
SELECT
ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID DESC) AS ID,
EMPLOYEE_ID,
EMPLOYEE_NAME
FROM dbo.CSBCA1_5_FPCIC_2012_EES207201222743
ORDER BY ID;
You can make a change to your table to track this information for new rows, but you won't be able to derive it for your existing data (they will all be marked as inserted at the time you make this change).
ALTER TABLE dbo.CSBCA1_5_FPCIC_2012_EES207201222743
-- wow, who named this?
ADD CreatedDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP;
Note that this may break existing code that just does INSERT INTO dbo.whatever SELECT/VALUES() - e.g. you may have to revisit your code and define a proper, explicit column list.
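For instance, a sketch of an explicit column list (the values are hypothetical); CreatedDate is left out so it picks up its default:
INSERT INTO dbo.CSBCA1_5_FPCIC_2012_EES207201222743 (employee_id, employee_name)
VALUES (1001, 'Jane Doe');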
There is a pseudocolumn called %%physloc%% that shows the physical address of the row.
See Equivalent of Oracle's RowID in SQL Server
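A sketch of what that looks like; note that %%physloc%% reflects where a row is currently stored (file:page:slot), not the order it was inserted in, and can change as rows move:
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS phys_loc,
       employee_id,
       employee_name
FROM dbo.CSBCA1_5_FPCIC_2012_EES207201222743;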
SQL does not do that. The tuples in a table are not ordered by insertion date. Many people include a column that stores the date of insertion in order to get around this issue.

In SQL, why does group by make a difference when using having count()

I have a table that stores zone_id. Sometimes a zone_id appears twice in the table. I wrote a query to show only the entries whose zone_id occurs two or more times in the table.
The following query returns the correct result:
select *, count(zone_id)
from proxies.storage_used
group by zone_id desc
having count(zone_id) > 1;
However, if I group by last_updated or company_id, it returns random values. If I don't add a GROUP BY clause, it displays only one value, as per the screenshot below. The first output shows the above query; the second output shows the same query without the 'group by' line and returns only one value:
correction: I'm a new member and thus can't post pictures directly, so I added it on minus: http://min.us/m3yrlkSMu#1o
While my query works, I don't understand why. Can somebody help me understand why GROUP BY alters the actual output, instead of only the grouping of the output? I am using MySQL.
A GROUP BY divides the result rows into groups and performs the aggregate function on the records in each group. If you do a count(*) without a GROUP BY, you get a single count of all rows in the table: since you didn't specify a GROUP BY, there is only one group, containing every record in the table. If you do a count(*) with a GROUP BY on zone_id, you get a count of how many records there are for each zone_id. If you group by both zone_id and last_updated, you get a count of how many rows were updated on each date in each zone.
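A minimal sketch of the contrast, using the table from the question:
-- one group, one row: the count of every row in the table
SELECT count(*) FROM proxies.storage_used;

-- one group per zone_id: a count for each zone, kept only when it exceeds 1
SELECT zone_id, count(*) AS entries
FROM proxies.storage_used
GROUP BY zone_id
HAVING count(*) > 1;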
Without a GROUP BY clause, everything is put into the same group, so you get a single result. If there is more than one row in your table, the HAVING condition will succeed, so you end up counting all the rows in your table...
From what I understand, you can write a query with HAVING but without GROUP BY in only two situations (a sketch of the first follows the list):
You have a WHERE clause, and you want to test a condition on an aggregate of all the rows that satisfy that clause.
Same as above, but for all the rows in your table (in practice, that rarely makes sense, though).
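A sketch of the first situation, with a hypothetical zone_id value; the whole filtered set forms one group, so a row comes back only when that group has more than one entry:
SELECT count(*) AS entries
FROM proxies.storage_used
WHERE zone_id = 42
HAVING count(*) > 1;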