Query that counts and increments the number of duplicate instances of a record - sql

Using Access 2010.
So if I had a table
COL1
A
B
A
C
A
and then ran the query, I would get the output in COL2, where 'A' is duplicated three times and its COL2 value is incremented with each occurrence.
COL1 | COL2
A | 1
B | 1
A | 2
C | 1
A | 3

Add a field to your table. Choose AutoNumber as its data type and make it the table's primary key. I named the field ID, so my version of your sample data looks like this ...
ID COL1
1 A
2 B
3 A
4 C
5 A
The SELECT statement below returns this result set ...
ID COL1 COL2a COL2b
1 A 1 1
2 B 1 1
3 A 2 2
4 C 1 1
5 A 3 3
COL2a and COL2b show 2 methods to achieve the same result. DCount is Access-specific, and requires quotes around the m.COL1 text values. The second approach, COL2b, uses a correlated subquery, so it could work in a different database if you choose. And with that approach, you don't need to bother about quoting text values.
Either approach basically requires the db engine run an additional query for each row of the result set. So, with a huge table, performance will be a concern. Indexing will help there. Add an index on COL1 if there isn't one already. ID already has an index since it's the primary key.
If you can't add a field, and the table doesn't already include another suitable field, then I think you're out of luck. You won't be able to get what you want with an Access query.
SELECT
m.ID,
m.COL1,
DCount(
"*",
"MyTable",
"COL1 = '" & m.COL1 & "' AND ID <= " & m.ID
) AS COL2a,
(
SELECT Count(*)
FROM MyTable AS m2
WHERE m2.COL1 = m.COL1 AND m2.ID <= m.ID
) AS COL2b
FROM MyTable AS m
ORDER BY m.ID;
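As a rough check of the correlated-subquery idea (COL2b) outside Access, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. Using SQLite and the index name idx_col1 are assumptions for illustration only; DCount is not available there, so only the subquery variant is shown.

```python
import sqlite3

# In-memory SQLite database standing in for the Access table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER PRIMARY KEY AUTOINCREMENT, COL1 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable (COL1) VALUES (?)",
    [("A",), ("B",), ("A",), ("C",), ("A",)],
)
# Index on COL1, as suggested above, to help the per-row subquery.
conn.execute("CREATE INDEX idx_col1 ON MyTable (COL1)")

# For each row, count the rows with the same COL1 and an ID up to this one.
running_counts = conn.execute(
    """
    SELECT m.ID, m.COL1,
           (SELECT COUNT(*)
              FROM MyTable AS m2
             WHERE m2.COL1 = m.COL1 AND m2.ID <= m.ID) AS COL2
      FROM MyTable AS m
     ORDER BY m.ID
    """
).fetchall()

for row in running_counts:
    print(row)
```

The third value in each printed row is the running occurrence count for that row's COL1 value.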

Need to find out if all columns in a SQL Server table have the same value

My task is to find out whether all columns in a SQL Server table have exactly the same value. The table content is created by a stored procedure and can vary in the number of columns. The first column is an ID; the second and following columns must be compared to check whether they all have exactly the same value.
At the moment I do not have a clue how to achieve this.
The best solution would be to display only the rows, which have different values in one or multiple columns except the first column with ID.
Thank you so much for your help!!
--> Edit: The table looks like this:
ID Instance1 Instance2 Instance3 Instance4 Instance5
=====================================================
A 1 1 1 1 1
B 1 1 0 1 1
C 55 55 55 55 55
D Driver Driver Driver Co-driver Driver
E 90 0 90 0 50
F On On On On On
The result should look like this: only the rows with one or more differing column values should be displayed.
ID Instance1 Instance2 Instance3 Instance4 Instance5
=====================================================
B 1 1 0 1 1
D Driver Driver Driver Co-driver Driver
E 90 0 90 0 50
My table has more than 1000 rows and 40 columns
You can achieve this by using row_number().
Try the following code:
With c as(
Select id
,field_1
,field_2
,field_3
,field_n
,row_number() over(partition by field_1,field_2,field_3,field_n order by id asc) as rn
From Table
)
Select *
From c
Where rn = 1
row_number() with PARTITION BY shows whether a combination of field values is repeated: it numbers the rows within each group of identical field_1, field_2, field_3, field_n values. For example, if two rows share the same field values, the inner query returns
rn field_1 field_2 field_3 field_n id
1 x y z a 5
2 x y z a 9
Then, in the outer part of the query, keep only rn = 1 to get a result without repetitions based on those fields.
If you also want to delete the repeated rows from your table, you can apply
With c as(
Select id
,field_1
,field_2
,field_3
,field_n
,row_number() over(partition by field_1,field_2,field_3,field_n order by id asc) as rn
From Table
)
delete
From c
Where rn > 1
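To see the numbering in action, here is a small sketch using Python's sqlite3 module. The table name records, the two-field layout and the sample rows are made up for illustration, and it assumes an SQLite build with window functions (3.25 or later). SQLite cannot delete through a CTE the way SQL Server can, so the delete step goes through an id subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with two compared fields; ids 5 and 9 are duplicates.
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(5, "x", "y"), (7, "p", "q"), (9, "x", "y")],
)

# Number the rows inside each group of identical field values.
numbered = conn.execute(
    """
    SELECT id, field_1, field_2,
           ROW_NUMBER() OVER (PARTITION BY field_1, field_2 ORDER BY id) AS rn
      FROM records
     ORDER BY id
    """
).fetchall()
print(numbered)

# Remove every row that is not the first of its group (rn > 1).
conn.execute(
    """
    DELETE FROM records
     WHERE id IN (
           SELECT id
             FROM (SELECT id,
                          ROW_NUMBER() OVER (
                              PARTITION BY field_1, field_2 ORDER BY id
                          ) AS rn
                     FROM records)
            WHERE rn > 1)
    """
)
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM records ORDER BY id")]
print(remaining_ids)
```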
The best solution would be to display only the rows, which have different values in one or multiple columns except the first column with ID.
You may be looking for the following simple query, whose WHERE clause filters out rows where all fields have the same value (I assumed 5 fields, id not included).
SELECT *
FROM mytable t
WHERE NOT (
field1 = field2
AND field1 = field3
AND field1 = field4
AND field1 = field5
);
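Since the number of columns varies, the WHERE clause could also be generated rather than typed by hand. The sketch below does that in Python against SQLite, purely as an illustration: the table name mytable, the TEXT column types and the use of PRAGMA table_info to list the columns are assumptions (on SQL Server the column names would come from the catalog views instead). It also assumes no NULLs, because any comparison with NULL is neither true nor false and such rows would be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (ID TEXT, Instance1 TEXT, Instance2 TEXT, "
    "Instance3 TEXT, Instance4 TEXT, Instance5 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", "1", "1", "1", "1", "1"),
        ("B", "1", "1", "0", "1", "1"),
        ("C", "55", "55", "55", "55", "55"),
        ("D", "Driver", "Driver", "Driver", "Co-driver", "Driver"),
        ("E", "90", "0", "90", "0", "50"),
        ("F", "On", "On", "On", "On", "On"),
    ],
)

# Read the column names at run time; everything after ID gets compared.
names = [info[1] for info in conn.execute("PRAGMA table_info(mytable)")]
first, rest = names[1], names[2:]

# Build "Instance1 = Instance2 AND Instance1 = Instance3 AND ..."
all_equal = " AND ".join(f"{first} = {col}" for col in rest)
query = f"SELECT * FROM mytable WHERE NOT ({all_equal}) ORDER BY ID"

differing = conn.execute(query).fetchall()
for row in differing:
    print(row)
```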

Assign a random order to each group

I want to expand each row in TableA into 4 rows. The result should hold all the columns from TableA plus two additional columns: SetID, ranging from 0 to 3 and unique within each TableA row's group, and Random, a random permutation of SetID within the same group.
I use SQLite and would prefer a pure SQL solution.
Table A:
Description
-----------
A
B
Desired output:
Description | SetID | Random
------------|-------|-------
A | 0 | 2
A | 1 | 0
A | 2 | 3
A | 3 | 1
B | 0 | 3
B | 1 | 2
B | 2 | 0
B | 3 | 1
My attempt so far solves creating 4 rows for each row in TableA but doesn't get the permutation correctly. wrong will contain a random number ranging from 0 to 3. I need exactly one 0, 1, 2 and 3 for each unique value in Description and their order should be random.
SELECT
Description,
SetID,
abs(random()) % 4 AS wrong
FROM
TableA
LEFT JOIN
TableB
ON
1 = 1
Table B:
SetID
-----
0
1
2
3
Use a cross join
SELECT Description,
SetID,
abs(random()) % 4 AS wrong
FROM TableA
CROSS JOIN TableB
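The cross join alone still leaves the random column with possible repeats. One way to get a real permutation in pure SQL, assuming an SQLite build with window functions (3.25 or later), is to rank the rows of each Description group in random order. This is a sketch of that idea, not something taken from the answers here, wrapped in Python only so it can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Description TEXT)")
conn.execute("CREATE TABLE TableB (SetID INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?)", [("A",), ("B",)])
conn.executemany("INSERT INTO TableB VALUES (?)", [(i,) for i in range(4)])

# Within each Description, ROW_NUMBER() over a random ordering hands out
# 1..4 exactly once; subtracting 1 maps that to a permutation of 0..3.
permuted = conn.execute(
    """
    SELECT Description,
           SetID,
           ROW_NUMBER() OVER (
               PARTITION BY Description ORDER BY random()
           ) - 1 AS Random
      FROM TableA
     CROSS JOIN TableB
     ORDER BY Description, SetID
    """
).fetchall()

for row in permuted:
    print(row)
```

The printed Random values differ from run to run, but each Description gets every value from 0 to 3 exactly once.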
Consider a solution in your specialty, R. As you know, R maintains excellent database packages, one of which is RSQLite. Additionally, R can run commands via the connection without the need to import very large datasets.
Your solution is essentially a random sampling without replacement. Simply have R run the sampling and concatenate list items into an SQL string.
Below creates a table in the SQLite database where R sends the CREATE TABLE command to the SQL engine. No import or export of data. Should you need to run every four rows, run an iterative loop in a defined function that outputs the sql string. For append queries change the CREATE TABLE AS to INSERT INTO ... SELECT statement.
library(RSQLite)
sqlite <- dbDriver("SQLite")
conn <- dbConnect(sqlite,"C:\\Path\\To\\Database\\File\\newexample.db")
# SAMPLE WITHOUT REPLACEMENT
randomnums <- as.list(sample(0:3, 4, replace=F))
# SQL CONCATENATION
sql <- sprintf("CREATE TABLE PermutationsTable AS
SELECT a.Description, b.SetID,
(select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=0
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=1
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=2
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=3)
As RandomNumber
from TableA a, TableB b;",
randomnums[[1]], randomnums[[2]],
randomnums[[3]], randomnums[[4]])
# RUN QUERY
dbSendQuery(conn, sql)
dbDisconnect(conn)
You will notice a nested union subquery. This is used to achieve the inline random numbers for each row. Also, to return all possible combinations from all tables, no join statements are needed, simply list tables in FROM clause.

How can I select unique rows in a database over two columns?

I have found similar solutions online but none that I've been able to apply to my specific problem.
I'm trying to "unique-ify" data from one table to another. In my original table, data looks like the following:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
1 3 EF
1 3 GH
The user IDs are composed of two parts, USERIDP1 and USERIDP2 concatenated. I want to transfer all the rows that correspond to a user who has QUALIFIER=TRUE in ANY row they own, but ignore users who do not have a TRUE QUALIFIER in any of their rows.
To clarify, all of User 12's rows would be transferred, but not User 13's. The output would then look like:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
So basically, I need to find rows with distinct user ID components (involving two unique fields) that also possess a row with QUALIFIER=TRUE and copy all and only all of those users' rows.
Although this nested query will be very slow for large tables, this could do it.
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
WHERE EXISTS (SELECT 1 FROM YOUR_TABLE_NAME AS Y WHERE Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE)
It could be written as an inner join with itself too:
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
INNER JOIN YOUR_TABLE_NAME AS Y ON Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE
For a large table, create a new auxiliary table containing only USERIDP1 and USERIDP2 columns for rows that have QUALIFIER = TRUE and then join this table with your original table using inner join similar to the second option above. Remember to create appropriate indexes.
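A minimal sketch of that auxiliary-table idea, run against SQLite from Python so it is easy to try. The table names source and qualified_users, the index names, and storing QUALIFIER as 1 or NULL are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source "
    "(USERIDP1 INTEGER, USERIDP2 INTEGER, QUALIFIER INTEGER, DATA TEXT)"
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?)",
    [(1, 2, 1, "AB"), (1, 2, None, "CD"), (1, 3, None, "EF"), (1, 3, None, "GH")],
)

# Auxiliary table: one row per user that has at least one qualifying row.
conn.execute(
    """
    CREATE TABLE qualified_users AS
    SELECT DISTINCT USERIDP1, USERIDP2
      FROM source
     WHERE QUALIFIER = 1
    """
)
# Indexes on the join columns of both tables.
conn.execute("CREATE INDEX idx_qualified ON qualified_users (USERIDP1, USERIDP2)")
conn.execute("CREATE INDEX idx_source_user ON source (USERIDP1, USERIDP2)")

# Join back to the original table to pick up all rows of those users.
transferred = conn.execute(
    """
    SELECT s.USERIDP1, s.USERIDP2, s.QUALIFIER, s.DATA
      FROM source AS s
     INNER JOIN qualified_users AS q
        ON q.USERIDP1 = s.USERIDP1 AND q.USERIDP2 = s.USERIDP2
     ORDER BY s.DATA
    """
).fetchall()

for row in transferred:
    print(row)
```

Because the auxiliary table holds each user only once, the join cannot multiply rows, so no DISTINCT is needed on the final select.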
This should do the trick. If the id fields are stored as integers, you will need to convert/cast them to varchar before concatenating them in the PARTITION BY clause.
SELECT 1 as id1,2 as id2,'TRUE' as qualifier,'AB' as data into #sampled
UNION ALL SELECT 1,2,NULL,'CD'
UNION ALL SELECT 1,3,NULL,'EF'
UNION ALL SELECT 1,3,NULL,'GH'
;WITH data as
(
SELECT
id1
,id2
,qualifier
,data
,SUM(CASE WHEN qualifier = 'TRUE' THEN 1 ELSE 0 END)
OVER (PARTITION BY id1 + '' + id2) as num_qualifier
from #sampled
)
SELECT
id1
,id2
,qualifier
,data
from data
where num_qualifier > 0
Select *
from yourTable
INNER JOIN (Select UserIDP1, UserIDP2 FROM yourTable WHERE Qualifier=TRUE) B
ON yourTable.UserIDP1 = B.UserIDP1 and YourTable.UserIDP2 = B.UserIDP2
How about a subquery as a where clause?
SELECT *
FROM theTable t1
WHERE CAST(t1.useridp1 AS VARCHAR) + CAST(t1.useridp2 AS VARCHAR) IN
(SELECT CAST(t2.useridp1 AS VARCHAR) + CAST(t2.useridp2 AS VARCHAR)
FROM theTable t2
WHERE t2.qualifier = 'TRUE'
);
This is a solution in mysql, but I believe it should transfer to sql server pretty easily. Use a subquery to pick out groups of (id1, id2) combinations with at least one True 'qualifier' row; then join that to the original table on (id1, id2).
mysql> SELECT u1.*
FROM users u1
JOIN (SELECT id1,id2
FROM users
WHERE qualifier
GROUP BY id1, id2) u2
USING(id1, id2);
+------+------+-----------+------+
| id1 | id2 | qualifier | data |
+------+------+-----------+------+
| 1 | 2 | 1 | aa |
| 1 | 2 | 0 | bb |
+------+------+-----------+------+
2 rows in set (0.00 sec)

Select data if conditions on two separate rows are met

Consider the following dataset:
id dataid data
1 3095 5
1 3096 9
1 3097 8
2 3095 4
2 3096 9
2 3097 15
Here the column dataid identifies what kind of data a row holds, so if I see 3095, I know what the data column represents (name, address, etc.). I need to check, per group of rows sharing an id (i.e. 1 and 2), that the row with dataid=3095 has data=5 AND the row with dataid=3096 has data=9. If both are true, that id group is selected and operations are performed on it.
Edit: Now I use the following SQL query to do the above:
SELECT *
FROM table s0
JOIN table s1 USING (id)
JOIN table s2 USING (id)
WHERE s1.dataid=3095 AND s1.data=5
AND s2.dataid=3096 AND s2.data=9;
But how can I turn the output from rows into columns? The property values I need are still key/value pairs in rows, and I would like them as columns.
So the output for the above would be:
id 3095 3096 3097
1 5 9 8
whereas currently it is returning from the above query:
id dataid data dataid_1 data_1 dataid_2 data_2
1 3095 5 Unnecessary stuff because of JOIN
1 3096 9
1 3097 8
Thanks and sorry if this is confusing.
SELECT id
FROM (SELECT DISTINCT id
FROM table) as ids
WHERE EXISTS (SELECT '1'
FROM table t2
WHERE t2.id = ids.id AND dataid = 3095 AND data = 5)
AND EXISTS (SELECT '1'
FROM table t2
WHERE t2.id = ids.id AND dataid = 3096 AND data = 9)
Also, if you're querying additional tables besides the given one (preferably one with id as a unique key), consider including that table instead, to remove the need for DISTINCT.
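For the rows-to-columns part of the question, one common option is conditional aggregation: group by id and pull each dataid into its own column with a CASE expression. The sketch below is only an illustration under assumptions (SQLite driven from Python, a table named properties, and column aliases d3095, d3096, d3097 that I made up), with the two conditions moved into HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (id INTEGER, dataid INTEGER, data INTEGER)")
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [
        (1, 3095, 5), (1, 3096, 9), (1, 3097, 8),
        (2, 3095, 4), (2, 3096, 9), (2, 3097, 15),
    ],
)

# One output row per id; each CASE picks the value for a single dataid,
# and MAX collapses the group to that one non-NULL value.
pivoted = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN dataid = 3095 THEN data END) AS d3095,
           MAX(CASE WHEN dataid = 3096 THEN data END) AS d3096,
           MAX(CASE WHEN dataid = 3097 THEN data END) AS d3097
      FROM properties
     GROUP BY id
    HAVING MAX(CASE WHEN dataid = 3095 THEN data END) = 5
       AND MAX(CASE WHEN dataid = 3096 THEN data END) = 9
    """
).fetchall()

print(pivoted)
```

This assumes each (id, dataid) pair occurs at most once; with duplicates, MAX would silently pick the largest value.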

Returning several rows from a single query, based on a value of a column

Let's say I have this table:
|Fld | Number|
1 5
2 2
And I want to make a select that retrieves as many Fld as the Number field has:
|Fld |
1
1
1
1
1
2
2
How can I achieve this? I was thinking about making a temporary table and inserting data based on the Number, but I was wondering if this could be done with a single SELECT statement.
PS: I'm new to SQL
You can join with a numbers table:
SELECT Fld
FROM yourtable
JOIN Numbers
ON Numbers.Number <= yourtable.Number
A numbers table is just a table with a list of numbers:
Number
1
2
3
etc...
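Here is a small runnable sketch of the numbers-table join, using SQLite from Python as a stand-in for whatever database you have. Filling Numbers with 1 to 10 is an arbitrary choice for the example; it only needs to reach the largest Number in your table. The join keeps every number up to each row's Number value, so each Fld comes back that many times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (Fld INTEGER, Number INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [(1, 5), (2, 2)])

# The numbers table: just the integers 1..10 (upper bound is an assumption).
conn.execute("CREATE TABLE Numbers (Number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(1, 11)])

# Each source row matches the numbers 1..Number, so it repeats Number times.
expanded = [
    row[0]
    for row in conn.execute(
        """
        SELECT yourtable.Fld
          FROM yourtable
          JOIN Numbers
            ON Numbers.Number <= yourtable.Number
         ORDER BY yourtable.Fld, Numbers.Number
        """
    )
]
print(expanded)
```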
Not a great solution (since you still query your table twice), but maybe you can work from it:
SELECT t1.fld, t1.number
FROM table t1, (
SELECT ROWNUM number FROM dual
CONNECT BY LEVEL <= (SELECT MAX(number) FROM table)) t2
WHERE t2.number<=t1.number
It generates maximum amount of rows needed and then filters it by each row.
I don't know if your RDBMS version supports it (although I rather suspect it does), but here is a recursive version:
WITH remaining (fld, times) as (SELECT fld, 1
FROM <table>
UNION ALL
SELECT a.fld, a.times + 1
FROM remaining as a
JOIN <table> as b
ON b.fld = a.fld
AND b.number > a.times)
SELECT fld
FROM remaining
ORDER BY fld
Given your source data table, it outputs this (count included for verification):
fld times
=============
1 1
1 2
1 3
1 4
1 5
2 1
2 2
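If you want to try the recursive version without a server at hand, the sketch below runs an equivalent query on SQLite through Python. The table name source is a placeholder I picked, SQLite wants the RECURSIVE keyword spelled out, and the extra WHERE number >= 1 in the anchor is my own guard so that a row with Number 0 would not be emitted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (fld INTEGER, number INTEGER)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, 5), (2, 2)])

# Anchor: one row per fld with times = 1.
# Recursive step: emit the next repetition while times is below number.
repeated = conn.execute(
    """
    WITH RECURSIVE remaining (fld, times) AS (
        SELECT fld, 1
          FROM source
         WHERE number >= 1
        UNION ALL
        SELECT a.fld, a.times + 1
          FROM remaining AS a
          JOIN source AS b
            ON b.fld = a.fld
           AND b.number > a.times
    )
    SELECT fld, times
      FROM remaining
     ORDER BY fld, times
    """
).fetchall()

for row in repeated:
    print(row)
```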