How to give two strings the same ID if they have 80% similarity?

I have ID descriptions, some of which are similar to one another. When two descriptions are, say, 80% similar, both should be given the same ID.
Other descriptions that are only, say, 60% similar should keep their own IDs. Once a description has been considered and its ID modified, it must not be used as a reference for further comparisons.
Example:
id id description
1 pepsodent
2 pepsodent salt
3 pepsod
4 pepsodent and salt
5 peps
Now, pepsodent matches pepsodent salt, therefore both should be given ID 1.
Because the ID of pepsodent salt has already been modified, it cannot be used as a reference for further comparisons.

As I said in my comment above, you need to define exactly what the rules are for matching two records. In this example, I am giving a New ID to any record whose description starts with the string 'pepsodent' (that is what LIKE 'Pepsodent%' tests; whether the comparison is case-sensitive depends on your database's collation). The New ID for these records will be 999, but you can modify it as you see fit:
SELECT ID, ID_Description,
       CASE
           WHEN ID_Description LIKE 'Pepsodent%' THEN 999
           ELSE ID
       END AS New_ID
FROM YourTable  -- placeholder name; TABLE itself is a reserved word
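
If the rule really is "percentage similarity" rather than a fixed prefix, the pass described in the question could be sketched outside SQL roughly as follows. This is only a sketch under assumptions: the metric (difflib's ratio), the function names similarity and assign_ids, and the 0.75 cut-off are all illustrative choices, not something the question specifies.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is an assumed stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign_ids(rows, threshold=0.8):
    """Greedy pass over (id, description) pairs in input order.

    A row that is at least `threshold` similar to an earlier reference row
    takes that row's id. Rows that keep their own id become references;
    rows whose id was changed are never used as references.
    """
    references = []  # (id, description) pairs that kept their own id
    new_ids = {}
    for row_id, desc in rows:
        best_id, best_score = None, 0.0
        for ref_id, ref_desc in references:
            score = similarity(ref_desc, desc)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            new_ids[row_id] = best_id
        else:
            new_ids[row_id] = row_id
            references.append((row_id, desc))
    return new_ids


rows = [
    (1, "pepsodent"),
    (2, "pepsodent salt"),
    (3, "pepsod"),
    (4, "pepsodent and salt"),
    (5, "peps"),
]
# threshold=0.75 is chosen for illustration only
print(assign_ids(rows, threshold=0.75))
```

Under this particular metric 'pepsodent' and 'pepsodent salt' score about 0.78, so a strict 0.8 cut-off would not merge them. That is exactly why the metric and the threshold both have to be pinned down before the rule can be applied.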

Related

Multi-Row Per Record SQL Statement

I'm not sure this is possible but my manager wants me to do it...
Using the below picture as a reference, is it possible to retrieve a group of records, where each record has 2 rows of columns?
So columns: Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated would be part of the first row and column: Work Notes would be a new row that spans the width of the report. Each record would have two rows. Is this possible with a GROUP BY statement?
Record 1
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record 2
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record n
...

I don't think that's possible with the built-in report engine. You'll need to export the data and format it using something else.
You could get something similar to what you want on short description (list report, grouped by short description), but you can't group by work notes, so that's out.

One thing to note is that the work_notes field is not actually a field on the table, the work_notes field is of type journal_input, which means it's really just a gateway to the actual underlying data model. "Modifying" work_notes actually just inserts into sys_journal_field.
sys_journal_field is the table which stores the work notes you're looking for. Given a sys_id of an incident record, this URL will give you all journal field entries for that particular record:
/sys_journal_field_list.do?sysparm_query=name=task^element_id=<YOUR_SYS_ID>
You will notice this includes ALL journal fields (comments + work_notes + anything else), so if you just wanted work notes, you could simply add a query against element thusly:
/sys_journal_field_list.do?sysparm_query=name=task^element=work_notes^element_id=<YOUR_SYS_ID>
What this means for you!
While you can't separate a physical row into multiple logical rows in the UI, in the case of journal fields you can join your target table against the sys_journal_field table using a Database View. This deviates from your goal in that you wouldn't get a single row for all work notes, but rather an additional row for each matched work note.
Given an incident INC123 with 3 work notes, your report against the Database View would look kind of like this:
Row 1: INC123 | markmilly | This is a test incident |
Row 2: INC123 | | | Work note #1
Row 3: INC123 | | | Work note #2
Row 4: INC123 | | | Work note #3
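
To see why the join yields one row per work note, here is a small stand-alone emulation. It is not how a Database View is configured; it only mimics the join in an in-memory SQLite database. The incident columns (sys_id 'abc', assigned_to) and the extra 'comments' entry are assumptions made for the illustration, and the tables are heavily simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables; only the columns needed here.
    CREATE TABLE incident (sys_id TEXT PRIMARY KEY, number TEXT,
                           assigned_to TEXT, short_description TEXT);
    CREATE TABLE sys_journal_field (element_id TEXT, element TEXT, value TEXT);

    INSERT INTO incident VALUES
        ('abc', 'INC123', 'markmilly', 'This is a test incident');
    INSERT INTO sys_journal_field VALUES
        ('abc', 'work_notes', 'Work note #1'),
        ('abc', 'work_notes', 'Work note #2'),
        ('abc', 'comments',   'A customer-visible comment'),
        ('abc', 'work_notes', 'Work note #3');
""")

# One output row per matching journal entry; comments are filtered out.
joined = conn.execute("""
    SELECT i.number, i.assigned_to, i.short_description, j.value
    FROM incident AS i
    JOIN sys_journal_field AS j
      ON j.element_id = i.sys_id
     AND j.element = 'work_notes'
    ORDER BY j.rowid
""").fetchall()

for row in joined:
    print(" | ".join(row))
```

A plain join like this repeats the incident columns on every row; the blank cells in the layout above come from how the report groups the rows, not from the join itself.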

Complicated SQL query vs datatable iteration and processing

I have a three table structure in SQL Server 2012: people, connections and messages. The affected schema would be like this:
People: Id (pk bigint), name...
Connections: Id (pk bigint), IdPpl1 fk, IdPpl2 fk
Messages: Id (pk uniqueidentifier), Idconnection (fk), Messagetype (smallint)
On the Connections table, IdPpl1 and IdPpl2 are foreign keys to People.Id. The same two people can appear twice in this table with their columns swapped, e.g.:
Id IdPpl1 IdPpl2
.. ...... ......
3 101 105
8 105 101
9 101 106
10 106 101
The situation above is valid. In fact, a given pair of people appears at most twice in the table, once in each direction.
The Messages table holds the information of which "connection" sent a message.
Id IdConnection Messagetype
.. ............ ...........
24 3 1
25 8 1
26 3 2
27 8 2
28 9 3
29 10 2
(Note: the messages are one-way, that's why there can be two rows in the connections table affecting the same two people: on the first row, one person is the sender and the other the receiver, on the second row they swap)
Given a People Id, I need a SQL query to show "least connectiontype messages mutually sent by mutually connected people" and an extra column indicating whether the messagetype matches. The result should look like this for People Id 101:
Person_id Person_name IdConnection MatchingMsgType
......... ........... ............ ...............
105 John 3 1
106 Peter 9 0
The first row appears because of MsgIds 24 and 25. A potential row corresponding with messages 26 and 27 won't appear because a previous matching messagetype was found.
The second row appears because of MsgIds 28 and 29, marking the messagetype as non-matching.
Currently I fetch all the messages related to a person and iterate through the datatable, sorting, filtering and processing in memory.
Would you go with a full-SQL solution (I want to preserve full isolation between app tiers), or is the datatable iteration more suitable?
Thanks in advance!!

Obviously it depends on the size of the result set of your current DB query (the one returning all rows related to a user). It is not clear whether rows are ever removed from your tables. If not, your solution does not scale, since the number of matching rows will grow forever. If instead you can assert that the number of resulting rows has some bound (for example, the maximum number of connections a user can have open at the same time), then your solution might be good enough.
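
For comparison, one possible reading of the requirement fits in a single query: for every person connected to the given person in both directions, report the outgoing connection and whether any message type was sent both ways. The sketch below runs it against the sample data in an in-memory SQLite database. The interpretation, the aliases c_out/c_in, and the name 'Ann' for person 101 are assumptions. On SQL Server the EXISTS would need wrapping as CASE WHEN EXISTS (...) THEN 1 ELSE 0 END.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Connections (Id INTEGER PRIMARY KEY, IdPpl1 INTEGER, IdPpl2 INTEGER);
    CREATE TABLE Messages (Id INTEGER PRIMARY KEY, Idconnection INTEGER, Messagetype INTEGER);

    -- The name for 101 is made up; the question does not give it.
    INSERT INTO People VALUES (101, 'Ann'), (105, 'John'), (106, 'Peter');
    INSERT INTO Connections VALUES (3, 101, 105), (8, 105, 101),
                                   (9, 101, 106), (10, 106, 101);
    INSERT INTO Messages VALUES (24, 3, 1), (25, 8, 1), (26, 3, 2),
                                (27, 8, 2), (28, 9, 3), (29, 10, 2);
""")

QUERY = """
    SELECT p.Id AS Person_id, p.name AS Person_name, c_out.Id AS IdConnection,
           EXISTS (
               -- is there a message type that was sent in both directions?
               SELECT 1
               FROM Messages AS m_out
               JOIN Messages AS m_in ON m_in.Messagetype = m_out.Messagetype
               WHERE m_out.Idconnection = c_out.Id
                 AND m_in.Idconnection = c_in.Id
           ) AS MatchingMsgType
    FROM Connections AS c_out
    JOIN Connections AS c_in           -- the same pair, opposite direction
      ON c_in.IdPpl1 = c_out.IdPpl2 AND c_in.IdPpl2 = c_out.IdPpl1
    JOIN People AS p ON p.Id = c_out.IdPpl2
    WHERE c_out.IdPpl1 = ?
    ORDER BY p.Id
"""

result = conn.execute(QUERY, (101,)).fetchall()
for row in result:
    print(row)
```

Because the filtering happens in the database, only one row per connected person crosses the tier boundary, which sidesteps the growth concern above if the reading matches what you actually need.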

How to make SQL query that will combine rows of result from one table with rows of another table in specific conditions in SQLite

I have a SQLite3 database with three tables. Sample data looks like this:
Original
id aName code
------------------
1 dog DG
2 cat CT
3 bat BT
4 badger BDGR
... ... ...
Translated
id orgID isTranslated langID aName
----------------------------------------------
1 2 1 3 katze
2 1 1 3 hund
3 3 0 3 (NULL)
4 4 1 3 dachs
... ... ... ... ...
Lang
id Langcode
-----------
1 FR
2 CZ
3 DE
4 RU
... ...
I want to select all data from Original and Translated so that the result consists of every row in Original, but with aName replaced by aName from Translated for rows that have a translation. I could then apply an ORDER BY clause and sort the data in the desired way.
All data and table designs are examples just to show the problem. The schema does contain some elements like an isTranslated column or translation and original names in separate tables. These elements are required by application destination/design.
To be more specific this is an example rowset I would like to produce. It's all the data from table Original modified by data from Translated if translation is available for that certain id from Original.
Desired Result
id aName code isTranslated
---------------------------------
1 hund DG 1
2 katze CT 1
3 bat BT 0
4 dachs BDGR 1
... ... ... ...

This is a typical application for the CASE expression:
SELECT Original.id,
       CASE isTranslated
           WHEN 1 THEN Translated.aName
           ELSE Original.aName
       END AS aName,
       code,
       isTranslated
FROM Original
JOIN Translated ON Original.id = Translated.orgID
WHERE Translated.langID = (SELECT id FROM Lang WHERE Langcode = 'DE')
If not all records in Original have a corresponding record in Translated, use LEFT JOIN instead, and move the langID condition from the WHERE clause into the ON clause; otherwise the WHERE filter throws the unmatched rows away again.
If untranslated names are guaranteed to be NULL, you can just use IFNULL(Translated.aName, Original.aName) instead.
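
A quick way to check the LEFT JOIN variant against the sample data is an in-memory SQLite database. This is a sketch, not the only way to write it: row 5 ('fox') is made up to show a record with no Translated row at all, COALESCE is used in place of IFNULL, and the language filter sits in the ON clause so unmatched rows are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Original (id INTEGER PRIMARY KEY, aName TEXT, code TEXT);
    CREATE TABLE Translated (id INTEGER PRIMARY KEY, orgID INTEGER,
                             isTranslated INTEGER, langID INTEGER, aName TEXT);
    CREATE TABLE Lang (id INTEGER PRIMARY KEY, Langcode TEXT);

    -- Row 5 ('fox') is made up: it has no Translated row at all.
    INSERT INTO Original VALUES (1,'dog','DG'), (2,'cat','CT'), (3,'bat','BT'),
                                (4,'badger','BDGR'), (5,'fox','FX');
    INSERT INTO Translated VALUES (1,2,1,3,'katze'), (2,1,1,3,'hund'),
                                  (3,3,0,3,NULL), (4,4,1,3,'dachs');
    INSERT INTO Lang VALUES (1,'FR'), (2,'CZ'), (3,'DE'), (4,'RU');
""")

rows = conn.execute("""
    SELECT o.id,
           COALESCE(t.aName, o.aName)  AS aName,        -- fall back to the original name
           o.code,
           COALESCE(t.isTranslated, 0) AS isTranslated  -- no row counts as untranslated
    FROM Original AS o
    LEFT JOIN Translated AS t
           ON t.orgID = o.id
          AND t.langID = (SELECT id FROM Lang WHERE Langcode = ?)
    ORDER BY o.id
""", ("DE",)).fetchall()

for row in rows:
    print(row)
```

The first four rows reproduce the desired result from the question; the made-up fifth row comes back with its original name and isTranslated = 0.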

You should probably list the actual results you want, which would help people help you in the future.
In the current case, I'm guessing you want something along these lines:
SELECT Original.id, Original.code, Translated.aName
FROM Original
JOIN Lang
  ON Lang.Langcode = 'DE'
JOIN Translated
  ON Translated.orgID = Original.id
 AND Translated.langID = Lang.id
 AND Translated.aName IS NOT NULL;
(Check out my example to see if these are the results you want).
In any case, the table set you've got is heading towards a fairly standard 'translation table' setup. However, there are some basic changes I'd make.
Original
- Name the table something specific, like Animal.
- Don't include a 'default' translation in the table (you can use a view, if necessary).
- 'code' is fine, although in the case of animals, genus/species probably ought to be used.
Lang
- 'Language' is often a reserved word in RDBMSs, so the name is fine.
- Specifically name which 'language code' you're using (and don't abbreviate column names). There are actually (up to) three different ISO codes possible; just grab them all.
- (Also, remember that languages have language-specific names, so language also needs its own 'translation' table.)
Translated
- Name the table entity-specific, like AnimalNameTranslated, or some such.
- isTranslated is unnecessary: you can derive it from the existence of the row, so don't add a row if the term isn't translated yet.
- Put all 'translations' into the table, including the 'default' one. This means all your terms are in one place, so you don't have to go looking elsewhere.

Convert any string to an integer

Simply put, I'd like to be able to convert any string to an integer, preferably being able to restrict the size of the integer and ensure that the result is always identical. In other words is there a hashing function, supported by Oracle, that returns a numeric value and can that value have a maximum?
To provide some context if needed, I have two tables that have the following, simplified, format:
Table 1                    Table 2
id | sequence_number       id | sequence_number
---------------------      ---------------------
1  | 1                     1  | 2QD44561
1  | 2                     1  | 6HH00244
2  | 1                     2  | 5DH08133
3  | 1                     3  | 7RD03098
4  | 2                     4  | 8BF02466
The column sequence_number is number(3) in Table 1 and varchar2(11) in Table 2; it is part of the primary key in both tables.
The data is externally provided and cannot be changed; in Table 1 it is, I believe, created by a simple sequence but in Table 2 has a meaning. The data is made up but representative.
Someone has promised that we would output a number(3) field. While this is fine for the column in the first table, it causes problems for the second.
I would like to convert sequence_number to an integer (easy) that is less than 1000 (harder) and, if at all possible, is constant for a given input (seemingly impossible). This means that I would like '2QD44561' to always return the same number, say 586. It does not matter much if two strings return the same number.
Simply converting to an integer I can use utl_raw.cast_to_number():
select utl_raw.cast_to_number((utl_raw.cast_to_raw('2QD44561'))) from dual;
UTL_RAW.CAST_TO_NUMBER((UTL_RAW.CAST_TO_RAW('2QD44561')))
---------------------------------------------------------
-2.033E+25
But as you can see, this isn't less than 1000.
I've also been playing around with dbms_crypto and utl_encode to see if I could come up with something but I've not managed to get a small integer. Is there a way?

How about ora_hash?
select ora_hash(sequence_number, 999) from table_2;
... will produce a number between 0 and 999, so a maximum of 3 digits. You could also seed it with the id, I suppose, but I'm not sure that adds much with so few values, and I'm not sure you'd want that anyway.

You are talking about using a hash function. There are lots of them out there; SHA-1 is a very common one.
But just FYI, when you say "restrict the size of the integer", understand that you will then be mapping an unbounded set of strings onto a limited set of values. So while equal strings will always map to the same value, a given string will not be the only one that maps to that value.
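
The trade-off can be illustrated outside the database with a few lines of Python. This is only a sketch of the general idea (hash, then reduce modulo the number of buckets). It does not reproduce the values Oracle's ora_hash would return, and the helper name bucket is made up for the illustration.

```python
import hashlib


def bucket(value: str, buckets: int = 1000) -> int:
    """Map a string to a stable integer in the range 0 .. buckets - 1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets


# Same input, same output, on every run and every machine.
for seq in ["2QD44561", "6HH00244", "5DH08133", "7RD03098", "8BF02466"]:
    print(seq, bucket(seq))

# 1001 distinct inputs cannot all land in 1000 distinct buckets,
# so at least two of them must share a value.
distinct = {bucket(str(i)) for i in range(1001)}
print(len(distinct) < 1001)
```

If the number ends up as part of a primary key, those collisions matter, so the result only works as a key when it is combined with something else that keeps rows unique.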

Microsoft Access 2010 - Updating Multiple Rows with Different values in ONE query

I have a question about updating multiple rows with different values in MS Access 2010.
Table 1: Food
ID | Favourite Food
1 | Apple
2 | Orange
3 | Pear
Table 2: New
ID | Favourite Food
1 | Watermelon
3 | Cherries
Right now, it looks deceptively simple to execute them separately (because this is just an example). But how would I execute a whole lot of them at the same time if I had, say, 500 rows to update out of 1000 records.
So what I want to do is to update the "Food" table based on the new values from the "New" table.
Would appreciate if anyone could give me some direction / syntax so that I can test it out on MS Access 2010. If this requires VBA, do provide some samples of how I should carry this out programmatically, not manually statement-by-statement.
Thank you!
ADDENDUM (REAL DATA)
Table: Competitors
Columns: CompetitorNo (PK), FirstName, LastName, Score, Ranking
query: FinalScore
Columns: CompetitorNo, Score, Ranking
Note - this query is a query of another query, which in turn, is a query of another query (could there be a potential problem here? There are at least 4 queries before this FinalScore query is derived. Should I post them?)
In the competitors table, all the columns except "Score" and "Ranking" are filled. We would need to take the values from the FinalScore query and insert them into the relevant competitor columns.
Addendum (Brief Explanation of Query)
Table: Competitors
Columns: CompetitorNo (PK), FirstName, LastName, Score, Ranking
Sample Data: AX1234, Simpson, Danny, <blank initially>, <blank initially>
Table: CompetitionRecord
Columns: EventNo (PK composite), CompetitorNo (PK composite), Timing, Bonus
Sample Data1: E01, AX1234, 14.4, 1
Sample Data2: E01, AB1938, 12.5, 0
Sample Data3: E01, BB1919, 13.0, 2
Event No specifies unique event ID
Timing measures the time taken to run 200 metres. The lower, the better.
Bonus can be given in 3 values (0 - Disqualified, 1 - Normal, 2 - Exceptional). Competitors with Exceptional are given bonus points (5% off their timing).
Query: FinalScore
Columns: CompetitorNo (PK), Score, Ranking
Score is calculated by wins. For example, in the above event (E01), there are three competitors. The winner of the event is BB1919. Winners get 1 point. Losers don't get any points. Those that are disqualified do not receive any points either.
This query lists the competitors and their cumulative scores (from a list of many events: E01, E02, E03 etc.) and calculates their ranking in the ranking column every time the query is executed. (For example, a person who wins the most 200m events would be at the top of this list.)
Now, I am required to update the Competitors table with this information. The query is rather complex - with all the grouping, summations, rankings and whatnots. Thus, I had to create multiple queries to achieve the end result.

How about:
UPDATE Food
INNER JOIN [New]
    ON Food.ID = [New].ID
SET Food.[Favourite Food] = [New].[Favourite Food]
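
To sanity-check what the join update should do to the sample data, here is the same effect reproduced in an in-memory SQLite database. Note the assumption: SQLite is only a stand-in here and does not accept the Access UPDATE ... INNER JOIN syntax, so the sketch uses a correlated subquery instead. Only the resulting rows are meant to carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Food (ID INTEGER PRIMARY KEY, "Favourite Food" TEXT);
    CREATE TABLE "New" (ID INTEGER PRIMARY KEY, "Favourite Food" TEXT);
    INSERT INTO Food VALUES (1,'Apple'), (2,'Orange'), (3,'Pear');
    INSERT INTO "New" VALUES (1,'Watermelon'), (3,'Cherries');

    -- Only rows with a matching ID in "New" are touched.
    UPDATE Food
    SET "Favourite Food" = (SELECT n."Favourite Food"
                            FROM "New" AS n
                            WHERE n.ID = Food.ID)
    WHERE ID IN (SELECT ID FROM "New");
""")

food = conn.execute("SELECT * FROM Food ORDER BY ID").fetchall()
for row in food:
    print(row)
```

Rows 1 and 3 pick up the new values while row 2 is left alone, which is the behaviour to expect from the Access statement as well.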
