Searching for non unique values between two columns

Searching for non unique values between two columns - sql

I have a single table with three columns, 'id', 'number' and 'transaction.' Each id should only be tied to one number however may exist many times in the table under different values of transaction. I've been unable to develop a query that will return cases of a single id sharing multiple numbers (and show the id and number in the report). I don't wish to delete these values via the query, I just need to see the values. See example below:
Here's a screenshot example: http://i591.photobucket.com/albums/ss355/riggins_83/table2_zps5509f3cf.jpg I appreciate the assistance, I've tried all the code posted here and it hasn't given me the output I'm looking for. As seen in the screenshot it's possible for the same ID number and Number to appear in the table multiple times with a different transaction number, what shouldn't occur is what's on rows 1 and 2 (two different numbers with same ID number). The ID number is a value that should always be tied to the same Number which the transaction is only linked to that line. I'm trying to generate output of each number that's sharing an ID number (and the shared ID Number if possible).
Test IDNumber Number Transaction
1 31 1551 5
2 31 1553 7
3 32 1701 8
4 33 1701 9
5 33 1701 10
6 33 1701 11
7 39 1885 12
The result of output I would need:
IDNumber Number
31 1551
31 1553
This output is showing me the Number (and ID number) in cases where an ID number is being shared between two (or possibly more) numbers. I know there are cases in the table where an ID number is being shared among many numbers.
Any assistance is greatly appreciated!

SELECT *
FROM thetable t0
WHERE EXISTS (
SELECT *
FROM thetable t1
WHERE t1.id = t0.id
-- Not clear from the question if the OP wants the records
-- to differ by number
-- AND t1.number <> t1.number
-- ... or by "transaction"
AND t1."transaction" <> t0."transaction"
-- ... or by either ???
-- AND (t1.number <> t1.number OR t1."transaction" <> t0."transaction")
);

SELECT IDNumber, Number
FROM YourTable
WHERE IDNumber IN (
SELECT IDNumber
FROM YourTable
GROUP BY IDNumber
HAVING COUNT(DISTINCT number) > 1
)
The subquery returns all the IDNumbers with more than 1 Number. Then the main query returns all the numbers for each of those IDNumbers.
DEMO

Related

Given a table of numbers, can I get all the rows which add up to less than or equal to a number?

Say I have a table with an incrementing id column and a random positive non zero number.
id
rand
1
12
2
5
3
99
4
87
Write a query to return the rows which add up to a given number.
A couple rules:
Rows must be "consumed" in order, even if a later row makes it a a perfect match. For example, querying for 104 would be a perfect match for rows 1, 2, and 4 but rows 1-3 would still be returned.
You can use a row partially if there is more available than is necessary to add up to whatever is leftover on the number E.g. rows 1, 2, and 3 would be returned if your max number is 50 because 12 + 5 + 33 equals 50 and 90 is a partial result.
If there are not enough rows to satisfy the amount, then return ALL the rows. E.g. in the above example a query for 1,000 would return rows 1-4. In other words, the sum of the rows should be less than or equal to the queried number.
It's possible for the answer to be "no this is not possible with SQL alone" and that's fine but I was just curious. This would be a trivial problem with a programming language but I was wondering what SQL provides out of the box to do something as a thought experiment and learning exercise.

You didn't mention which RDBMS, but assuming SQL Server:
DROP TABLE #t;
CREATE TABLE #t (id int, rand int);
INSERT INTO #t (id,rand)
VALUES (1,12),(2,5),(3,99),(4,87);
DECLARE #target int = 104;
WITH dat
AS
(
SELECT id, rand, SUM(rand) OVER (ORDER BY id) as runsum
FROM #t
),
dat2
as
(
SELECT id, rand
, runsum
, COALESCE(LAG(runsum,1) OVER (ORDER BY id),0) as prev_runsum
from dat
)
SELECT id, rand
FROM dat2
WHERE #target >= runsum
OR #target BETWEEN prev_runsum AND runsum;

Seperating a Oracle Query with 1.8 million rows into 40,000 row blocks

I have a project where I am taking Documents from one system and importing them into another.
The first system has the documents and associated keywords stored. I have a query that will return the results which will then be used as the index file to import them into the new system. There are about 1.8 million documents involved so this means 1.8 million rows (One per document).
I need to divide the returned results into blocks of 40,000 to make importing them in batches of 40,000 at a time, rather than one long import.
I have the query to return the results I need. Just need to know how to take that and break it up for easier import. My apologies if I have included to little information. This is my first time here asking for help.

Use the built-in function ORA_HASH to divide the rows into 45 buckets of roughly the same number of rows. For example:
select * from some_table where ora_hash(id, 44) = 0;
select * from some_table where ora_hash(id, 44) = 1;
...
select * from some_table where ora_hash(id, 44) = 44;
The function is deterministic and will always return the same result for the same input. The resulting number starts with 0 - which is normal for a hash, but unusual for Oracle, so the query may look off-by-one at first. The hash works better with more distinct values, so pass in the primary key or another unique value if possible. Don't use a low-cardinality column, like a status column, or the buckets will be lopsided.
This process is in some ways inefficient, since you're re-reading the same table 45 times. But since you're dealing with documents, I assume the table scanning won't be the bottleneck here.

A prefered way to bucketing the ID is to use the NTILE analytic function.
I'll demonstrate this on a simplified example with a table with 18 rows that should be divided in four chunks.
select listagg(id,',') within group (order by id) from tab;
1,2,3,7,8,9,10,15,16,17,18,19,20,21,23,24,25,26
Note, that the IDs are not consecutive, so no arithmetic can be used - the NTILE gets the parameter of the requested number of buckets (4) and calculates the chunk_id
select id,
ntile(4) over (order by ID) as chunk_id
from tab
order by id;
ID CHUNK_ID
---------- ----------
1 1
2 1
3 1
7 1
8 1
9 2
10 2
15 2
16 2
17 2
18 3
19 3
20 3
21 3
23 4
24 4
25 4
26 4
18 rows selected.
All but the last bucket are of the same size, the last one can be smaller.
If you want to calculate the ranges - use simple aggregation
with chunk as (
select id,
ntile(4) over (order by ID) as chunk_id
from tab)
select chunk_id, min(id) ID_from, max(id) id_to
from chunk
group by chunk_id
order by 1;
CHUNK_ID ID_FROM ID_TO
---------- ---------- ----------
1 1 8
2 9 17
3 18 21
4 23 26

Find 'Most Similar' Items in Table by Foreign Key

I have a child table with a number of charact/value pairs for a given 'material' (MaterialID). Any material can have a number of charact values and may have several of the same name (see id's 2,3).
The table has a large number of records (8+ million). What I'm trying to do is find the materials that are the most similar to a supplied material. That is, when I supply a MaterialID, I would like an ordered list of the most similar other materials (those with the most matching charact/value pairs).
I've done some research but, I may be missing some key terms or just not conceptualizing the problem correctly.
Any hints as to how to go about this would be very much appreciated.
ID MaterialID Charact Value
1 1 ROT_DIR CCW
2 1 SPECIAL_FEATURE CATALOG_CP
3 1 SPECIAL_FEATURE CHROME
4 1 SCHEDULE 80
5 2 BEARING_TYPE SB
6 2 SCHEDULE 80
7 3 ROT_DIR CCW
8 3 SPECIAL_FEATURE CATALOG_HSB
9 3 BEARING_TYPE SP
10 4 NDE_STYLE W_FAN
11 4 BEARING_TYPE SB
12 4 ROT_DIR CW*

You can do this with a self join:
select t.materialid, count(*) as nummatches
from t join
t tmat
on t.Charact = tmat.Charact and t.value = tmat.value
where tmat.materialid = #MaterialId
group by t.materialid
order by nummatches desc;
Notes:
You might want to remove the specified material, by adding where t.MaterialId <> tmat.MaterialId to the where clause.
If you want all materials, then make the join a left join and move the where condition to the on clause.
If you want only one material with the most matches, use select top 1.
If you want all materials with the most matches when there are ties, use `select top (1) with ties.

SQL complex grouping "in column"

I have a table with 3 columns (sorted by the first two):
letter
number (sorted for each letter)
difference between current number and previous number of the same letter
I'd like to calculate (with vanlla SQL) a fourth new column RESULT to group these data when the third column (difference of number between contiguos record; i.e #2 --> 4 = 5-1) is greater than 30 marking all the records of this interval with letter-number of the first record (i.e A1 for #1,#2,#3).
Since the difference between contiguos numbers makes sense just for records with the same letter, for the first record of a new letter, the value of differnce is 31 (meaning that it's a new group; i.e. #6).
Here is what I'd like to get as result:
# Letter Number Difference RESULT (new column)
1 A 1 1 A1
2 A 5 4 A1
3 A 7 2 A1
4 A 40 33 A40 (*)
5 A 43 3 A40
6 B 1 31 B1 (*)
7 B 25 24 B1
8 B 27 2 B1
9 B 70 43 B70 (*)
10 B 75 5 B70
Now I can only find the "breaking values" (*) with this query where they get a value of 1:
select letter
,number
,cast(difference/30 as int) break
from table
where cast(difference/30 as int) = 1
Even though I'm able to find these breaking values I can't finish my task.
Can anyone help me finding a way to obtain the column RESULT?
Thanks in advance
FF

As I understand you need to construct the last result column. You can use concat to do that:
SELECT letter
,number
,concat(letter, cast(difference/30 as int)) result
FROM table
HAVING result = 'A1'

after some exercise and a little help from a friend of mine, I've found a possible solution to my sql prolblem.
The only requirment for the solution is that my first record must have a value of 31 in Difference field (since I need "breaks" when Difference > 30 than the previous record).
Here is the query to get the column RESULT I needed:
select alls.letter
,alls.number
,ints.letter||ints.number as result
from competition.lag alls
,(select letter
,number
,difference
,result
from (select letter
,number
,difference
,case when difference>30 then 1 else 2 end as result
from competition.lag
) temp
where result = 1
) ints
where ints.letter=alls.letter
and alls.number>=ints.number
and alls.number-30<=ints.number

Group rows into sets of 5

TableA
Col1
----------
1
2
3
4....all the way to 27
I want to add a second column that assigns a number to groups of 5.
Results
Col1 Col2
----- ------
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2...and so on
The 6th group should have 2 rows in it.
NTILE doesn't accomplish what I want because of the way NTILE handles the groups if they aren't divisible by the integer.
If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.
This is clearly demonstrated in this SQL Fiddle. Groups 4, 5, 6 each have 4 rows while the rest have 5. I have some started some solutions but they were getting lengthy and I feel like I'm missing something and that this could be done in a single line.

You can use this:
;WITH CTE AS
(
SELECT col1,
RN = ROW_NUMBER() OVER(ORDER BY col1)
FROM TableA
)
SELECT col1, (RN-1)/5+1 col2
FROM CTE;
In your sample data, col1 is a correlative without gaps, so you could use it directly (if it's an INT) without using ROW_NUMBER(). But in the case that it isn't, then this answer works too. Here is the modified sqlfiddle.

A bit of math can go a long way. subtracting 1 from all values puts the 5s (edge cases) into the previous group here, and 6's into the next. flooring the division by your group size and adding one give the result you're looking for. Also, the SQLFiddle example here fixes your iterative insert - the table only went up to 27.
SELECT col1,
floor((col1-1)/5)+1 as grpNum
FROM tableA

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas