SQL update statement to populate numerical series for each distinct subset of table records

I need an SQL UPDATE statement that assigns consecutive sequence numbers to subsets of records in a table. I'm using MS Access.
Let's say the current table has records like:
notebook,blue
notebook,yellow
pencil,yellow
chair,blue
desk,green
desk,blue
I would like to add another field to the table and populate it as follows:
notebook,blue,1
notebook,yellow,1
pencil,yellow,2
chair,blue,2
desk,green,1
desk,blue,3
You can see that I have assigned consecutive numbers based on a certain set of criteria. In this example, the criterion is a distinct value in the second field (in real life, the criteria will be a distinct combination of values from several fields, but all the relevant fields are within the same table, so no join is needed to get the criteria). Since there are three records with blue in field 2, these are numbered 1, 2, 3. And since there are two records with yellow, they are numbered 1, 2.
So I can't derive the numbering from the row number, since I have several numbering series in the same table all starting with 1.
Also, I need it to be a query where I don't have to explicitly specify the value in the second field. I just want each unique value in the second field to get its own numbering series. That is, I don't want to have to write one query to generate the numbers for "blue" and a separate query to generate the numbers for "yellow".
The maximum number of records in a series is under 1000, so I don't mind creating an auxiliary table with 1000 records, with a field containing the values 1 to 1000. The update statement on the primary table could then pull in the next value from the auxiliary table.
But I don't know the SQL syntax for this update statement, or for any other approach. So I need your advice.

I'm not sure how to do this with a single SQL statement, but here are two SQL statements that could be used to handle both cases. The first adds the first record of a new series; the second adds the next record of an existing series (numbered by the combination of field1 and field2):
INSERT INTO [table] (field1, field2, field3)
SELECT 'desk', 'blue', 1
WHERE NOT EXISTS (SELECT field3 FROM [table] WHERE field1 = 'desk' AND field2 = 'blue');
INSERT INTO [table] (field1, field2, field3)
SELECT field1, field2, COUNT(1) + 1
FROM [table]
WHERE field1 = 'desk'
AND field2 = 'blue'
GROUP BY field1, field2;
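A minimal sketch of this insert-time idea, run against SQLite through Python's standard sqlite3 module as a stand-in for Access. The table items and its columns item, color and seq are hypothetical. It numbers by color alone, as in the question, and shows that one statement can cover both cases: COUNT(*) without GROUP BY always returns exactly one row (0 for a new color), so the first record of a series gets 1 automatically.

```python
import sqlite3


def add_item(conn: sqlite3.Connection, item: str, color: str) -> None:
    """Insert a record carrying the next sequence number for its color."""
    # COUNT(*) without GROUP BY always yields one row (0 for a new color),
    # so the same statement handles the first and every later record.
    conn.execute(
        "INSERT INTO items (item, color, seq) "
        "SELECT ?, ?, COUNT(*) + 1 FROM items WHERE color = ?",
        (item, color, color),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item TEXT, color TEXT, seq INTEGER)")
for item, color in [("notebook", "blue"), ("notebook", "yellow"),
                    ("pencil", "yellow"), ("chair", "blue"),
                    ("desk", "green"), ("desk", "blue")]:
    add_item(conn, item, color)

rows = conn.execute("SELECT item, color, seq FROM items ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```

Two users inserting into the same series at the same moment could compute the same number, so this pattern suits single-user use.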

-- SQL Server (T-SQL) syntax. Note: IDENTITY numbers all rows in one series
-- starting at 1000; it does not restart for each COLOR.
CREATE TABLE #TableAutoIncrement (ID int IDENTITY(1000, 1), item varchar(20), COLOR varchar(20));
INSERT INTO #TableAutoIncrement (item, COLOR)
SELECT item, COLOR FROM YOURTABLE;
-- Get all the values from the temporary table
SELECT * FROM #TableAutoIncrement;

A colleague of mine worked out the necessary SQL. Here's the generalized solution. (Note that I really needed to number the multiple series in my data set based on a combination of two fields. In the simplified example in my original post I used only one field, color, but since I really need two fields, that's what this solution shows.)
SELECT *,
    (SELECT COUNT(T1.ID)
     FROM [TableName] AS T1
     WHERE T1.ID >= T2.ID
       AND T1.[NameCriteriaField1] = T2.[NameCriteriaField1]
       AND T1.[NameCriteriaField2] = T2.[NameCriteriaField2]) AS Sequence
INTO OutputResultsTableName
FROM [TableName] AS T2
ORDER BY [NameCriteriaField2], [NameCriteriaField1];
The source table must have a field named "ID" holding an integer value that is unique for every record. It does not matter whether there are gaps in the IDs or how the records are ordered relative to them (e.g., the typical MS Access AutoNumber primary key serves this purpose).
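To see how the correlated subquery produces the numbers, here is a sketch that runs the same pattern in SQLite via Python's sqlite3 module, using the single-criterion (color) data from the question. The table names items and numbered and the ID values are hypothetical, and CREATE TABLE ... AS SELECT stands in for Access's SELECT ... INTO, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID plays the role of the Access AutoNumber key; gaps are deliberate.
conn.execute("CREATE TABLE items (ID INTEGER PRIMARY KEY, item TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO items (ID, item, color) VALUES (?, ?, ?)",
    [(1, "notebook", "blue"), (2, "notebook", "yellow"), (5, "pencil", "yellow"),
     (7, "chair", "blue"), (8, "desk", "green"), (9, "desk", "blue")],
)

# For each row T2, count the rows of the same group whose ID is >= T2.ID.
conn.execute("""
    CREATE TABLE numbered AS
    SELECT T2.*,
           (SELECT COUNT(T1.ID)
              FROM items AS T1
             WHERE T1.ID >= T2.ID
               AND T1.color = T2.color) AS Sequence
      FROM items AS T2
""")
result = conn.execute(
    "SELECT ID, item, color, Sequence FROM numbered ORDER BY ID"
).fetchall()
for row in result:
    print(row)
```

Because the condition is T1.ID >= T2.ID, the record with the highest ID in each group gets 1; changing it to T1.ID <= T2.ID numbers each group in ascending ID order instead.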
This query assumes that there are two fields in your data set that you want to use to group your records and assign a numerical series to each record within each group. (Your table may thus contain multiple groups, each with its own numbering series starting at 1, but as the query is written, exactly two criteria define a group.) You cannot add WHERE clauses to further filter the records that get counted. Through experimentation, I found that adding WHERE clauses gives unreliable results, with records getting omitted. (A likely cause is that a WHERE clause on the outer query only hides rows from the output, while the correlated subquery still counts them.) So if some records should not be included in the numerical series for a particular group, do one of the following before running my query:
Run a query to delete the undesired records from the source table, or
First copy all records from the source table into a new table, delete the records from the new table that should not be numbered, and then run my query on the new table.
Deleting extraneous records before running this query is needed only if those records qualify as members of a group defined by criteria 1 and criteria 2. Extraneous records that don't match those criteria can stay in the table, because they will not affect the numbering of the records within the groups you care about. They will just get their own independent numbering, which you can ignore.
The numbering of each group starts at 1, and the query dynamically defines the groups based on the distinct combinations of criteria 1 and criteria 2. However, records that do not belong to any group are all numbered 0. In my testing, criteria 1 and criteria 2 were always non-null values. (In theory, at least in Microsoft Access, an empty string is different from Null, but I did not test empty strings either.) If a record has Null in the criteria 1 or criteria 2 field, MS Access considers it as not belonging to any group and numbers it 0: a comparison such as Null = Null does not evaluate to True, so the record does not even match itself in the subquery. In other words, the groups must be defined by non-null values of criteria 1 and criteria 2, which differs from the way SQL DISTINCT works (DISTINCT treats all Nulls as one value).
If you need Null to be a valid criterion for defining a group (so that groups defined by Null are numbered), the fix is simple. Before running my query, run an update statement that changes all Null values in criteria 1 or criteria 2 to the phrase "placeholder for null field". Then run my query. On the result set (after the numbering has been assigned), run another update to change all occurrences of the placeholder phrase back to Null.
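The Null behavior and the placeholder workaround can be checked with a small SQLite sketch (Python's sqlite3 module as a stand-in for Access; the table items and its rows are hypothetical). For brevity it numbers with a plain SELECT and restores the Nulls in the source table rather than in an output table.

```python
import sqlite3

PLACEHOLDER = "placeholder for null field"
NUMBER_SQL = """
    SELECT T2.ID,
           (SELECT COUNT(T1.ID) FROM items AS T1
             WHERE T1.ID >= T2.ID AND T1.color = T2.color) AS Sequence
      FROM items AS T2
     ORDER BY T2.ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (ID INTEGER PRIMARY KEY, item TEXT, color TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [(1, "notebook", None), (2, "pencil", None), (3, "desk", "blue")])

# NULL = NULL is not true, so a Null row never matches itself: Sequence is 0.
before = conn.execute(NUMBER_SQL).fetchall()

# Workaround: replace Null with a placeholder, number, then restore the Nulls.
conn.execute("UPDATE items SET color = ? WHERE color IS NULL", (PLACEHOLDER,))
after = conn.execute(NUMBER_SQL).fetchall()
conn.execute("UPDATE items SET color = NULL WHERE color = ?", (PLACEHOLDER,))

print(before)
print(after)
```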
Adjustment to the syntax if your group is defined by only one criteria field
SELECT *,
    (SELECT COUNT(T1.ID)
     FROM [TableName] AS T1
     WHERE T1.ID >= T2.ID
       AND T1.[NameCriteriaField1] = T2.[NameCriteriaField1]) AS Sequence
INTO OutputResultsTableName
FROM [TableName] AS T2
ORDER BY [NameCriteriaField1];
Adjustment to the syntax if your group is defined by a combination of three criteria fields
SELECT *,
    (SELECT COUNT(T1.ID)
     FROM [TableName] AS T1
     WHERE T1.ID >= T2.ID
       AND T1.[NameCriteriaField1] = T2.[NameCriteriaField1]
       AND T1.[NameCriteriaField2] = T2.[NameCriteriaField2]
       AND T1.[NameCriteriaField3] = T2.[NameCriteriaField3]) AS Sequence
INTO OutputResultsTableName
FROM [TableName] AS T2
ORDER BY [NameCriteriaField2], [NameCriteriaField1];

Related

Is there a way to add a field in a parent query that will increment as the query goes through all values generated in a subquery?

I think I have a table that lacks a true primary key and I need to make one in the output. I cannot modify the table.
I need to run a select query to generate a list of values (list_A), then query those values to show all the records related to them. From those records, I do another select to extract a now-visible list called list_B. Searching on list_B reveals all the records related to the original list (list_A); many of those records are missing the list_A values but still need to be counted.
Here's my process so far:
I declared a sequence called 'temp_key', which starts from 1 and increments by 1.
I add a field called 'temp_key' to the parent query, so that it will hopefully show which element of the original list_A sub-query the resulting records are related to.
I run into trouble because I don't know how to make the temp_key increment as the list_A sub-query moves from the beginning to end of all the values in the list.
SELECT currval('temp_key') AS temp_key, list_A, list_B
FROM table
WHERE list_B IN (SELECT DISTINCT list_B
                 FROM table
                 WHERE list_A IN (SELECT DISTINCT list_A
                                  FROM table));
As it stands, the query above doesn't work because there seems to be no way to make the current value of temp_key increment as the query goes through the values of the list generated by the innermost subquery (list_A).
For example, there might be only 10 values in list_A, while the output could have hundreds of records, all labeled 1 through 10, many of them missing a value in the list_A field. They still need to be labeled 1 through 10 because the values of list_B connect the two sets.
Maybe you can first create a new primary key column with the following code (concatenating the row number with list_A; the separator keeps, for example, row 1 of "12" distinct from row 11 of "2"):
WITH T AS (
    SELECT currval('temp_key') AS temp_key, list_A, list_B,
           CONCAT(ROW_NUMBER() OVER (PARTITION BY list_A ORDER BY list_B), '-', list_A) AS Prim_Key
    FROM table
)
SELECT * FROM T;
Then you can specify in the WHERE clause which keys you want to select.
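A sketch of the ROW_NUMBER() key in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). SQLite has no sequences, so currval('temp_key') is left out, and the || operator replaces CONCAT; the table tbl and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (list_A TEXT, list_B TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [("a1", "x"), ("a1", "y"), ("a2", "x"), ("a2", "z")])

rows = conn.execute("""
    WITH T AS (
        SELECT list_A, list_B,
               -- the row number restarts at 1 for every list_A value;
               -- the '-' separator keeps the number and list_A apart
               (ROW_NUMBER() OVER (PARTITION BY list_A ORDER BY list_B))
                   || '-' || list_A AS Prim_Key
          FROM tbl
    )
    SELECT * FROM T ORDER BY Prim_Key
""").fetchall()
for row in rows:
    print(row)
```

This key restarts at 1 within each list_A value, so it identifies a row inside its list_A group rather than labeling the groups themselves 1 through N.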

How to update numerical column of one table based on matching string column from another table in SQL

I want to update the numerical columns of one table based on matching string columns from another table, i.e.:
I have a table (let's say table1) with 100 records containing 5 string (text) columns and 10 numerical columns. I have another table (table2) with the same structure (columns) and 20 records. In it, a few records contain updated data for table1 (their numerical column values have changed), and the rest are new (both text and numerical columns).
I want to update the numerical columns of the records in table1 that have the same text columns, and insert the new data from table2 into table1 where the text columns are also new.
I thought of taking an intersect of these two tables and then updating, but couldn't figure out how to update the numerical columns.
Note: I don't have any primary or unique key columns.
The simplest solution would be to use two separate queries, such as:
UPDATE b
SET b.[NumericColumn] = a.[NumericColumn]
    -- , ...repeat for each remaining numeric column
FROM [dbo].[SourceTable] a
JOIN [dbo].[DestinationTable] b
    ON a.[StringColumn1] = b.[StringColumn1]
    -- AND ...the remaining string columns
    AND a.[StringColumn2] = b.[StringColumn2];
INSERT INTO [dbo].[DestinationTable] (
    [NumericColumn],
    [StringColumn1],
    [StringColumn2]
    -- , ...the remaining columns
)
SELECT a.[NumericColumn],
       a.[StringColumn1],
       a.[StringColumn2]
       -- , ...the remaining columns
FROM [dbo].[SourceTable] a
LEFT JOIN [dbo].[DestinationTable] b
    ON a.[StringColumn1] = b.[StringColumn1]
    -- AND ...the remaining string columns
    AND a.[StringColumn2] = b.[StringColumn2]
WHERE b.[NumericColumn] IS NULL;
-- Assumes that [NumericColumn] is non-nullable.
-- If there are no non-nullable columns, you
-- will have to structure your query differently.
This will be effective if you are working with a small dataset that does not change very frequently and you are not worried about high contention.
There are still a number of issues with this approach - most notably what happens if either the source or destination table is accessed and/or modified while the update statement is running. Some of these issues can be worked around other ways but so much depends on the context of how the tables are used that it is difficult to provide a more effective generically-applicable solution.
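The same update-then-insert pattern, sketched in SQLite through Python's sqlite3 module with hypothetical tables dest and src (two string key columns s1, s2 and one numeric column n1). It swaps in correlated subqueries for the T-SQL UPDATE ... FROM join, and NOT EXISTS for the LEFT JOIN ... IS NULL test, which removes the need for a non-nullable column. As in the original, keys containing Null never match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dest (s1 TEXT, s2 TEXT, n1 INTEGER);
    CREATE TABLE src  (s1 TEXT, s2 TEXT, n1 INTEGER);
    INSERT INTO dest VALUES ('a', 'x', 1), ('b', 'y', 2);
    INSERT INTO src  VALUES ('a', 'x', 10), ('c', 'z', 30);
""")

# Step 1: update rows whose string key already exists in dest.
conn.execute("""
    UPDATE dest
       SET n1 = (SELECT src.n1 FROM src
                  WHERE src.s1 = dest.s1 AND src.s2 = dest.s2)
     WHERE EXISTS (SELECT 1 FROM src
                    WHERE src.s1 = dest.s1 AND src.s2 = dest.s2)
""")

# Step 2: insert source rows whose string key is not in dest yet.
conn.execute("""
    INSERT INTO dest (s1, s2, n1)
    SELECT s1, s2, n1 FROM src
     WHERE NOT EXISTS (SELECT 1 FROM dest
                        WHERE dest.s1 = src.s1 AND dest.s2 = src.s2)
""")

rows = conn.execute("SELECT s1, s2, n1 FROM dest ORDER BY s1").fetchall()
print(rows)
```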

How to remove duplicate rows and keep one in an Access database?

I need to remove duplicate rows in my Access database. Does anyone have a generic query to do this? I have this problem with multiple tables.
There are two things you need to do:
Determine the criteria for a unique record, i.e. the list of columns for which two or more records with equal values would be considered duplicates, e.g. JobbID and HisGuid.
Decide what you want to do with the duplicate records: hard delete them, or set the IsDeleted flag that you have on the table.
Once you've determined the criteria for uniqueness, you need to pick one record from each group of duplicates to retain. A query along the lines of:
SELECT MAX(ID)
FROM MyTable
GROUP BY JobbID, HisGuid
will give you the highest ID for each group of records where JobbID and HisGuid are both the same (I've assumed that the ID column is an auto-increment/identity column that is unique across all records in the table). You could use MIN(ID) instead; it's up to you. You just need to pick ONE record from each group to keep.
Assuming that you want to set the IsDeleted flag on the records you don't want to keep, you can then incorporate this into an update query:
UPDATE MyTable
SET IsDeleted = 1
WHERE ID NOT IN
(
    SELECT MAX(ID)
    FROM MyTable
    GROUP BY JobbID, HisGuid
)
This takes the result of the query that retrieves the highest IDs and uses it to set IsDeleted to 1 for every record whose ID isn't the highest ID in its group of records with the same JobbID and HisGuid.
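Both variants (the IsDeleted flag and a hard delete) can be checked with this SQLite sketch via Python's sqlite3 module; the column names follow the answer, while the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, JobbID INTEGER, "
             "HisGuid TEXT, IsDeleted INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO MyTable (ID, JobbID, HisGuid) VALUES (?, ?, ?)",
    [(1, 10, "g1"), (2, 10, "g1"), (3, 10, "g2"), (4, 11, "g1"), (5, 10, "g1")],
)

KEEP = "SELECT MAX(ID) FROM MyTable GROUP BY JobbID, HisGuid"

# Soft delete: flag every row that is not the highest ID of its duplicate group.
conn.execute(f"UPDATE MyTable SET IsDeleted = 1 WHERE ID NOT IN ({KEEP})")
flagged = [r[0] for r in conn.execute(
    "SELECT ID FROM MyTable WHERE IsDeleted = 1 ORDER BY ID")]

# Hard delete: remove the same rows instead.
conn.execute(f"DELETE FROM MyTable WHERE ID NOT IN ({KEEP})")
remaining = [r[0] for r in conn.execute("SELECT ID FROM MyTable ORDER BY ID")]

print(flagged, remaining)
```

In Access, the same UPDATE or DELETE statement can be typed into a query's SQL View and run from there.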
The only part I can't help you with is running these queries in Access as I don't have it installed on the PC I'm using right now and my memory is a bit rusty regarding how/where to run arbitrary queries.

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row with the highest timestamp value (an integer) for each person (each has multiple entries) in a table. Additionally, I am only interested in rows whose field contains ABCD, but this filter should be applied after selecting the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
GROUP BY table."person"
HAVING table."field" LIKE '%ABCD%'
For some reason, I am not receiving the data I expect: the returned table is nearly twice the expected size. Is there a step here that I am not getting right?
You can first return a table with the max(timestamp) for each person and then join it back in another select statement:
SELECT t."person", t."timestamp"
FROM table AS t
JOIN (SELECT "person", MAX("timestamp") AS max_ts
      FROM table
      GROUP BY "person") AS m
  ON t."person" = m."person" AND t."timestamp" = m.max_ts
WHERE t."type" = 1 AND t."field" LIKE '%ABCD%';
Direct answer: as I understand your end goal, just move the HAVING condition into the WHERE clause:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I'm surprised your query worked at all: your HAVING clause referenced a column that is neither grouped nor aggregated. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
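Whether the ABCD filter runs before or after picking the latest row changes the result. This SQLite sketch (Python's sqlite3 module; the table events and its rows are hypothetical) compares the WHERE-clause version with a join back to each person's maximum timestamp. It keeps type = 1 as a pre-filter, as in the question's WHERE clause, and applies the ABCD test only to the latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events (person TEXT, "timestamp" INTEGER, type INTEGER, field TEXT)')
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("ann", 1, 1, "xxABCDxx"),
    ("ann", 2, 1, "nothing"),  # Ann's latest row does not contain ABCD
    ("bob", 1, 1, "nothing"),
    ("bob", 3, 1, "ABCD"),     # Bob's latest row does
])

# Filter first, then aggregate (the WHERE-clause version).
filter_first = conn.execute("""
    SELECT person, MAX("timestamp") FROM events
     WHERE type = 1 AND field LIKE '%ABCD%'
     GROUP BY person ORDER BY person
""").fetchall()

# Pick each person's latest type-1 row first, then test it for ABCD.
latest_first = conn.execute("""
    SELECT e.person, e."timestamp"
      FROM events AS e
      JOIN (SELECT person, MAX("timestamp") AS max_ts
              FROM events WHERE type = 1 GROUP BY person) AS m
        ON e.person = m.person AND e."timestamp" = m.max_ts
     WHERE e.type = 1 AND e.field LIKE '%ABCD%'
     ORDER BY e.person
""").fetchall()

print(filter_first, latest_first)
```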

Compare table columns based on their relative position

I have a problem comparing the results of two selects in PostgreSQL. I execute these selects via JDBC, then create new tables by inserting the data from each result set into a new table. I do this because I want to avoid columns with the same name, like "count". Then I have to compare the data in these tables.
The problem is that these tables should be considered the same if they contain the same data with a different column order. For example, if tables t1 and t2 each have 3 columns (1, 2, 3), they are the same if t1.1 = t2.2, t1.2 = t2.1 and t1.3 = t2.3.
The order of columns within a row is determined at the time of creation. If you do a
SELECT * FROM tbl;
or
TABLE tbl;
you get the column order you created the table with. If you name the columns in your SELECT, you get them in your explicit order.
You must always spell out the columns you use for an operation like yours. It could break if you alter the order of columns in one of your tables later. Do not rely on *.
The order of rows in a SELECT is indeterminate as long as you don't include an ORDER BY clause. If you want a specific order you have to ORDER BY a primary or unique column (or unique combination of columns). If you order by a non-unique set of columns, the rows within groups of the same key are again in indeterminate order.
SELECT col1, col2, col3 FROM tbl
ORDER BY <unique column or set of columns>;
Read the manual on the ORDER BY clause.
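To illustrate why the column list matters, this SQLite sketch (Python's sqlite3 module; tables t1 and t2 and their data are hypothetical) stores the same rows with two columns swapped. It assumes the columns share names across both tables. EXCEPT compares rows as sets, so duplicate rows are not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER);
    CREATE TABLE t2 (b INTEGER, a INTEGER, c INTEGER);  -- columns a and b swapped
    INSERT INTO t1 VALUES (1, 2, 3), (4, 5, 6);
    INSERT INTO t2 VALUES (2, 1, 3), (5, 4, 6);          -- same data as t1
""")

# With *, columns are paired by position, so t1.a is compared with t2.b.
star_diff = conn.execute("SELECT * FROM t1 EXCEPT SELECT * FROM t2").fetchall()

# Naming the columns pairs them by name, whatever the table's column order.
only_t1 = conn.execute("SELECT a, b, c FROM t1 EXCEPT SELECT a, b, c FROM t2").fetchall()
only_t2 = conn.execute("SELECT a, b, c FROM t2 EXCEPT SELECT a, b, c FROM t1").fetchall()

print(star_diff, only_t1, only_t2)
```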