How to create a sequence field that is equal for duplicates and unique only for unique rows? - sql

I'm extracting data from several databases and want to keep track of duplicate records without purging them. My solution is to create a new sequence field, where rows are marked duplicate by having the same sequence number. Keep in mind that not all columns have to be equal to be considered a duplicate.
How do I do this? My goal is to keep this table with all duplicate records intact, and then build another table containing only unique records by merging the rows that share the same sequence ID.

Try this:
select t.*, Sequence_ID=DENSE_RANK() over (
order by <fields_you_want_to_test_for_uniqueness>
)
from <your_table> t
Note that DENSE_RANK() gives you identical values for a "tie", but also gives you consecutive numbers (e.g. 1, 2, 3, 3, 4), whereas RANK() gives you the same value for a "tie", but then skips numbers (e.g. 1, 2, 3, 3, 5). Choose whichever one suits your needs.
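As a rough illustration of the idea above, here is a minimal sketch using Python's built-in sqlite3 module (it assumes an SQLite build with window functions, 3.25 or later). The table `people`, its columns, and the choice of `name` and `email` as the fields tested for uniqueness are all made up for the example, and SQLite needs the `AS sequence_id` alias form instead of `Sequence_ID=...`. The second query shows one possible way to collapse each sequence ID into a single row; the `MIN(source)` merge rule is only a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: rows pulled from several source databases.
conn.execute("CREATE TABLE people (source TEXT, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("db1", "Ann", "ann@example.com"),
        ("db2", "Ann", "ann@example.com"),  # duplicate on (name, email)
        ("db1", "Bob", "bob@example.com"),
        ("db3", "Cid", "cid@example.com"),
    ],
)

# Rows that tie on (name, email) share a sequence_id; "source" is ignored.
tagged = conn.execute(
    """
    SELECT source, name, email,
           DENSE_RANK() OVER (ORDER BY name, email) AS sequence_id
    FROM people
    ORDER BY sequence_id, source
    """
).fetchall()

# One row per sequence_id; MIN(source) is an arbitrary merge rule.
unique_rows = conn.execute(
    """
    WITH tagged AS (
        SELECT source, name, email,
               DENSE_RANK() OVER (ORDER BY name, email) AS sequence_id
        FROM people
    )
    SELECT sequence_id, name, email, MIN(source) AS first_source
    FROM tagged
    GROUP BY sequence_id, name, email
    ORDER BY sequence_id
    """
).fetchall()

print(tagged)
print(unique_rows)
```

Only the columns listed in the `ORDER BY` of the window decide what counts as a duplicate, which matches the requirement that not all columns have to be equal.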

Related

Is there way to add a field in a parent query that will increment as the query goes through all values generated in a subquery?

I think I have a table that lacks a true primary key and I need to make one in the output. I cannot modify the table.
I need to run a select query to generate a list of values (list_A), then take those values and query them to show all the records related to them. From those records, I do another select to extract a now visible list called list_B. From list_B, I can search them to reveal all the records related to the original list (list_A), with many of those records missing the values from list_A but still need to be counted.
Here's my process so far:
I declared a sequence called 'temp_key', which starts from 1 and increments by 1.
I add a field called 'temp_key' to the parent query, so that it will hopefully show which element of the original list_A sub-query the resulting records are related to.
I run into trouble because I don't know how to make the temp_key increment as the list_A sub-query moves from the beginning to end of all the values in the list.
SELECT currval(temp_key) AS temp_key, list_A, list_B
FROM table
WHERE list_B IN (SELECT DISTINCT list_B
                 FROM table
                 WHERE list_A IN (SELECT DISTINCT list_A
                                  FROM table));
As it is now, the above query doesn't work because there seems to be no way to make the current value of temp_key increment upward as it goes through values from the list originally generated from the lowest level sub-query (list_A).
For example, there might be only 10 values in list_A. And the output could have 100s of records, all labeled 1 through 10, with many of those values missing values in the list_A field. But they still need to be labeled 1 through 10 because the values of list_B connect the two sets.
Maybe you can create a new primary key column first with the following code (concatenating row number with list_a):
WITH T AS (
SELECT currval(temp_key) AS temp_key, list_A, list_B,
CONCAT(ROW_NUMBER() OVER(PARTITION BY list_A ORDER BY list_B),list_A) AS Prim_Key
FROM table )
SELECT * FROM T;
Then you can specify in the WHERE clause which keys you want to select.

Renaming Row Count Column in SQL

I can’t find how to rename the row-counting column in a table in a SQL Server RDBMS. When you create a table with user-created columns, A and B for example, there is a Row Number column to the far right of those columns.
It does not have a title. It just sequentially counts all the rows in your table, by default. Is it possible to manipulate this column that denotes the row numbers? That is, can I rename it, put its contents in descending order, etc.? If so, how?
And if not, what are the alternatives to have a sequentially counting column counting all the rows in my table?
No. You can create your own column with sequential values using an identity column. This is usually a primary key.
Alternatively, when you query the table, you can assign a sequential number (with no gaps) using row_number(). In general, you want a column that specifies the ordering:
select t.*, row_number() over (order by <ordering column>) as my_sequential_column
from t;
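A small sketch of the row_number() approach, run through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The table `t` and its columns `a` and `b` are invented for the example; `b` stands in for the ordering column. It also numbers the rows in descending order, since the question asks about that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with two user-created columns.
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("x", 30), ("y", 10), ("z", 20)])

# The sequential column exists only in the query result, under any name you choose.
rows = conn.execute(
    """
    SELECT a, b,
           ROW_NUMBER() OVER (ORDER BY b)      AS seq_asc,
           ROW_NUMBER() OVER (ORDER BY b DESC) AS seq_desc
    FROM t
    ORDER BY seq_asc
    """
).fetchall()

print(rows)  # [('y', 10, 1, 3), ('z', 20, 2, 2), ('x', 30, 3, 1)]
```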

Reset Column auto-increment

I have a table where a certain column has auto-increment enabled. While testing, I created several records and deleted some of them. I would like to generate new numbers for the ones that remain, without having to delete ...
For example: there were 3 records, one with code 5, another with 16, and another with 18. I want the first to become 1, the second 2, and the third 3. Of course, new records should then continue the sequence.
Changing ids is generally a bad idea. The purpose of an id is to identify rows in a table, both for foreign key relationships and over time.
But you can change the numbers if you need to:
with toupdate as (
select t.*, row_number() over (order by id) as new_id
from t
)
update toupdate
set id = new_id;
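The updatable CTE above is not available in every engine, so here is a hedged sketch of the same renumbering done in two steps with Python's sqlite3 module (assuming SQLite 3.25 or later). The table `t` and the `name` column are hypothetical, and the ids are assumed to be distinct positive integers. The last step relies on SQLite-specific behavior: an `INTEGER PRIMARY KEY` column without `AUTOINCREMENT` gets the current maximum plus one; other systems usually need their counter reset separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(5, "a"), (16, "b"), (18, "c")],
)

# Build the old-id -> new-id mapping once, before changing anything.
mapping = conn.execute(
    "SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS new_id FROM t ORDER BY id"
).fetchall()

# Apply in ascending order of old id: each new id is <= its old id, and all
# smaller old ids have already been moved, so no primary-key collision occurs.
for old_id, new_id in mapping:
    conn.execute("UPDATE t SET id = ? WHERE id = ?", (new_id, old_id))

renumbered = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

# In SQLite, a new row now receives max(id) + 1.
next_id = conn.execute("INSERT INTO t (name) VALUES (?)", ("d",)).lastrowid

print(renumbered, next_id)
```

Any table that references the old ids through a foreign key would need the same mapping applied, which is one reason changing ids is discouraged.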

Get rows from column via sql injection

For a school project I need to hack a database (made by the school for practice) and I have to retrieve all the rows of a specific column via SQL-Injection. I have been provided the column and table name.
Here is my example query that needs help.
bla' union all select 2, DefaultCreditCardNumber,5,6 from buyer#
This will retrieve only 1 row each time I enter the query. Is it possible to get all the rows at once? If so how?
Unless the query's results are printed in a loop, you cannot get all rows at once.
You need to get the data row by row:
bla' union all select 2, DefaultCreditCardNumber,5,6 from buyer LIMIT n,1#
Where n is the index of the row (0, 1, 2, ...)

SELECT Top(1), aggregate function with Where clause

OK, I've googled and googled and still can't get this.
Effectively, in a table containing several hundred thousand rows, one column has a unique identifier (not a PK and not really unique, but hey) and another has numerical values.
The unique identifier (UI) is unique only within that table and is sort of incremental, in that the highest number signifies the most recent table entry.
Effectively, I need to break the rows down to relevant rows using a WHERE clause, then get the most recent UI of those rows together with the SUM of the values of those rows.
i.e. if UI are 1, 3, 5, 7, 10 and the corresponding values for the aggregate function are 100, 300, 500, 700 and 1000, what I need to have as query result is UI 10, Sum 2600.
The DB is SQL Server 2000.
How do I achieve this?
It sounds like all of the items in the table need to be summed and returned with the max identifier. Would this work for you?
Select Max(ID), Sum(Number) from TableName
ID would be your Unique Identifier column name.
Number would be your column name that holds the numbers.
TableName is the name of your table.
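To check the suggestion against the numbers in the question, here is a minimal sketch using Python's sqlite3 module. `TableName`, `ID`, and `Number` follow the answer; the `Category` column and its values are hypothetical, added only to show where the WHERE clause mentioned in the question would go.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Category is a made-up filter column for the WHERE clause.
conn.execute("CREATE TABLE TableName (ID INTEGER, Number INTEGER, Category TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        (1, 100, "keep"),
        (3, 300, "keep"),
        (5, 500, "keep"),
        (7, 700, "keep"),
        (10, 1000, "keep"),
        (12, 9999, "skip"),  # excluded by the filter below
    ],
)

# Both aggregates are computed over the filtered rows only.
result = conn.execute(
    "SELECT MAX(ID), SUM(Number) FROM TableName WHERE Category = ?",
    ("keep",),
).fetchone()

print(result)  # (10, 2600)
```

Because both columns are aggregates and there is no GROUP BY, the query returns a single row.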