Matching bitmasks using bitstrings (instead of ints) in SQL

I found a great resource here (Comparing two bitmasks in SQL to see if any of the bits match) for doing searches in a SQL database where you're storing data with multiple properties using bit masks. In the example, though, all the data is stored as ints, and the WHERE clause seems to only work with ints.
Is there an easy way to convert a very similar test case to use full bitstrings instead? So instead of an example like:
with test (id, username, roles) AS
(
    SELECT 1, 'Dave', 1
    UNION SELECT 2, 'Charlie', 3
    UNION SELECT 3, 'Susan', 5
    UNION SELECT 4, 'Nick', 2
)
select * from test where (roles & 7) != 0
instead having something like:
with test (id, username, roles) AS
(
    SELECT 1, 'Dave', B'001'
    UNION SELECT 2, 'Charlie', B'011'
    UNION SELECT 3, 'Susan', B'101'
    UNION SELECT 4, 'Nick', B'110'
)
select * from test where (roles & B'001') != 0
I can convert back and forth, but it's easier to visualize with the actual bitstrings. For my simple conversion (above) I get an error that the operator doesn't work for bitstrings. Is there another way to set this up that would work?

One way would be to just use a bit string on the right side of the expression, too:
WITH test (id, username, roles) AS (
    VALUES
        (1, 'Dave',    B'001')
      , (2, 'Charlie', B'011')
      , (3, 'Susan',   B'101')
      , (4, 'Nick',    B'110')
)
SELECT *, (roles & B'001') AS intersection
FROM   test
WHERE  (roles & B'001') <> B'000';
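With the sample data, this returns Dave, Charlie, and Susan; Nick is filtered out because B'110' & B'001' = B'000'.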
Or you can cast an integer 0 to bit(3):
...
WHERE (roles & B'001') <> 0::bit(3);
You may be interested in this related answer that demonstrates a number of ways to convert between boolean, bit string and integer:
Can I convert a bunch of boolean columns to a single bitmap in PostgreSQL?
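As a minimal sketch of those conversions (PostgreSQL syntax; the literals are chosen just for illustration):
SELECT 5::bit(3);       -- 101
SELECT B'101'::integer; -- 5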
Be aware that storing the data as integer can save some space. integer needs 4 bytes for up to 32 bits of information, while - I quote the manual at said location:
A bit string value requires 1 byte for each group of 8 bits, plus 5 or
8 bytes overhead depending on the length of the string [...]
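So a bit(32) value takes roughly 9 to 12 bytes (4 bytes of payload plus the 5 or 8 bytes of overhead), versus a flat 4 bytes for an integer.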

Related

String split/chunks into chunked columns from one varchar column

Hopefully everyone is having a productive lockdown, wherever in the world you are. This is the second issue I wanted some assistance with today.
What I have is a chat from a telecom company signing up new customers.
I have successfully collapsed them into two rows per unique_id (a unique chat interaction captured between customer and company agent).
I would like to now take the text column in each row and separate it out into 5 equal varchar columns. The objective is to splice/chunk a conversation into 5 different stages within this table. I do not have access to delimiters, as customers and company staff use delimiting characters themselves, which makes this tricky.
Below I have 2 images showing what the data looks like now (BEFORE) and what I am looking for (AFTER).
I have looked at the following articles to try to crack it, but am stuck:
Split A Single Field Value Into Multiple Fixed-Length Column Values in T-SQL
How to Split String by Character into Separate Columns in SQL Server
How to split a comma-separated value to columns
How to split a single column values to multiple column values?
Split string in SQL Server to a maximum length, returning each as a row
Here is the SQL Fiddle page, but I am running this code in MS SQL Server: http://sqlfiddle.com/#!9/ddd08c
Here is the table creation code:
CREATE TABLE Table1
(`unique_id` double, `user` varchar(8), `text` varchar(144))
;
INSERT INTO Table1
(`unique_id`, `user`, `text`)
VALUES
(50314585222, 'customer', 'This is part 1 of long text. This is part 2 of long text. This is part 3 of long text. This is part 4 of long text. This is part 5 of long text.'),
(50314585222, 'company', 'This is part 1 of long text This is part 2 of long text This is part 3 of long text This is part 4 of long text This is part 5 of long text'),
(50319875222, 'customer', 'This is part 1 This is part 2 This is part 3 This is part 4 This is part 5'),
(50319875222, 'company', 'This is part 1 This is part 2 This is part 3 This is part 4 This is part 5')
;
I have previously requested a similar algorithm in R (see my question history), but this time I am trying to do it in SQL.
I have managed to solve this with the T-SQL statement below:
WITH DataSource AS
(
    SELECT *
          ,'\b.{1,' + CAST(CEILING(LEN([text]) * 1.0 / 5) AS VARCHAR(12)) + '}\b' AS [pattern]
    FROM Table1
), PreparedData AS
(
    SELECT unique_id
          ,[user]
          ,'text' + CAST(RM.matchID + 1 AS VARCHAR(12)) AS [column]
          ,RM.CaptureValue AS [value]
    FROM DataSource T
    CROSS APPLY [dbo].[fn_Utils_RegexMatches] ([text], [pattern]) RM
)
SELECT *
FROM PreparedData DS
PIVOT
(
    MAX([value]) FOR [column] IN ([text1], [text2], [text3], [text4], [text5])
) PVT;
In order to use this code, you need to implement SQL CLR function(s) for working with regular expressions in the context of T-SQL (you will need to invest some time in understanding how SQL CLR works); otherwise, you will not be able to use this solution.
So, having a RegexMatches function, the first part is to build a regular expression pattern for splitting the data:
SELECT *
      ,'\b.{1,' + CAST(CEILING(LEN([text]) * 1.0 / 5) AS VARCHAR(12)) + '}\b' AS [pattern]
FROM Table1;
The pattern has the form \b.{1,number}\b and matches chunks of up to number characters without cutting words in half (check whether the word boundary works for you, because in some cases it won't). For example, the first sample row is 144 characters long, so CEILING(144 * 1.0 / 5) = 29 and the pattern becomes \b.{1,29}\b.
Then, using our RegexMatches function, we get one row per chunk, with the column name ('text1' through 'text5') in [column] and the chunk itself in [value] (the second common table expression). That data is ready for pivoting, which is pretty easy.
So, the notes are:
you need to implement the Microsoft String Utility CLR functions
you need to ensure the regex pattern works for your data
you can split up the T-SQL I used, check the other columns returned by the regex function, and even make the pivoting dynamic; the code is an example, and you need to modify/check it before using it in production
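If SQL CLR is not an option, plain SUBSTRING arithmetic can produce the five fixed-width columns directly. A minimal sketch against the Table1 above (unlike the regex version, it may cut words in half):
WITH Chunked AS
(
    SELECT *, CAST(CEILING(LEN([text]) / 5.0) AS int) AS n
    FROM Table1
)
SELECT unique_id
      ,[user]
      ,SUBSTRING([text],         1, n) AS text1
      ,SUBSTRING([text],     n + 1, n) AS text2
      ,SUBSTRING([text], 2 * n + 1, n) AS text3
      ,SUBSTRING([text], 3 * n + 1, n) AS text4
      ,SUBSTRING([text], 4 * n + 1, n) AS text5
FROM Chunked;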

Hard request SQL Server 2008

I have a pretty hard request to do in SQL Server 2008, and I'm not able to do the whole thing...
I have two kinds of records:
16HENFC******** (8 numbers after the 'FC')
16HEN******* (7 numbers after the 'EN')
I have to select the * (which are in fact numbers), and add a 0 at the beginning of the second form of record so that all selected values are 8 characters long.
Then I have to insert the result into an empty table.
I think I did the first part, which is:
SUBSTRING(SELECT mycolumn1 FROM mytable1 WHERE mycolumn1 LIKE '16HENFC%', 5, 8) ;
In summary,
I have those records in my column :
'16HENFC071052'
'16HEN5130026'
I want to select them and transform them to insert them into another column:
'05130026'
'FC071052'
[EDIT]=>
CREATE TABLE nom_de_la_table
(
colonne1 VARCHAR(250),
colonne2 VARCHAR(250)
)
INSERT INTO nom_de_la_table (colonne1)
VALUES
('16HEN5138745'),
('16HENFC071052v2'),
('16HENFC78942878'),
('16HEN4830026'),
('16HEN7815934'),
('16HENFC74859422'),
('16HEN9687326'),
('16HENFC74889639'),
('16HEN9798556');
[etc...]
So, two different types of records, and I want to insert the result of what you did first with just two records into another column, but for all 956 records of my table. And this is the result with the two examples:
'05130026'
'FC071052'
Left-filling a string is a relatively easy request. Here's an example:
select right(replicate('0',8) + right(test, len(test) - len('16HEN')), 8)
from (
    select '16HENFC071052' as test
    union all
    select '16HEN5130026' as test
) z
Use replicate to left-fill your string to the number of characters you wish to end up with. Then append your target string: in this case, slice the prefix off by taking the rightmost X characters, where X = len(target) - len(prefix). Finally, take the rightmost characters of the whole string, equal to your desired length.
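Applied to the table from the [EDIT] above, a hedged sketch to fill the second column for all 956 rows (it assumes every value starts with '16HEN', as in the two record types described):
UPDATE nom_de_la_table
SET colonne2 = right(replicate('0',8) + right(colonne1, len(colonne1) - len('16HEN')), 8);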

How do I remove the first character of a string and treat the remaining values as an integer in BigQuery

I currently am working with a large data set that was pre-populated in BigQuery. I have a column of orderIDs with the following set-up: o377412876, o380940924, etc. These are stored as strings. I need to do the following and am running into problems:
1) Strip off the first character using the BigQuery query language
2) Convert (or treat) the remaining values as an integer
I will then run a join against the values. Now, I would be abundantly happier doing this operation in Python, R, or another language. That said, the challenge I have been given, based on client needs, is to write all the scripts in BigQuery's querying language.
SELECT 10 * INTEGER(REGEXP_REPLACE(x, '^.', ''))
FROM
(SELECT 'o1234' AS x)
Output: 12340
You can use the SUBSTR function and SAFE_CAST (which returns NULL instead of raising an error when a value cannot be cast). The INTEGER function from the answer above only works in legacy SQL, not in BigQuery standard SQL.
SELECT SAFE_CAST(SUBSTR(x, 2) AS INT64)
FROM (SELECT 'o1234' AS x)
Output: 1234
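For the follow-up join, a hedged sketch along these lines should work (all table and column names besides orderID are hypothetical, just to show where the conversion slots in):
SELECT o.*, j.*
FROM orders o
JOIN other_table j
  ON SAFE_CAST(SUBSTR(o.orderID, 2) AS INT64) = j.order_id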

SQL pattern matching

I have a question related to SQL.
I want to match two fields for similarity and return a percentage of how similar they are.
For example if I have a field called doc, which contains the following
This is my first assignment in SQL
and in another field I have something like
My first assignment in SQL
I want to know how I can check the similarity between the two and return a percentage. I did some research and wanted a second opinion; plus, I never asked for source code. I've looked at Soundex(), Difference(), and fuzzy string matching using the Levenshtein distance algorithm.
You didn't say which version of Oracle you are using. This example is based on version 11g.
You can use the edit_distance function of the utl_match package to determine how many characters you need to change in order to turn one string into another. The greatest function returns the greatest value in the list of passed-in parameters. Here is an example:
-- sample of data
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
-- the query
select trunc(((greatest(length(col1), length(col2)) -
(utl_match.edit_distance(col2, col1))) * 100) /
greatest(length(col1), length(col2)), 2) as "%"
from t1
result:
%
----------
70.58
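To see where 70.58 comes from: the longer string is 34 characters, and the edit distance here is 10, so the expression evaluates to trunc((34 - 10) * 100 / 34, 2) = 70.58.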
Addendum
As @jonearles correctly pointed out, it is much simpler to use the edit_distance_similarity function of the utl_match package.
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
select utl_match.edit_distance_similarity(col1, col2) as "%"
from t1
;
Result:
%
----------
71
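The small difference (71 versus 70.58) is because edit_distance_similarity returns a whole-number percentage.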

Preserving NULL values in a Double Variable

I'm working on a vb.net application which imports from an Excel spreadsheet.
If rdr.HasRows Then
    Do While rdr.Read()
        If rdr.GetValue(0).Equals(System.DBNull.Value) Then
            Return Nothing
        Else
            Return rdr.GetValue(0)
        End If
    Loop
Else
I was using a string variable to store the double values, and when preparing the database statement I'd use this code:
If (LastDayAverage = Nothing) Then
    command.Parameters.AddWithValue("@WF_LAST_DAY_TAG", System.DBNull.Value)
Else
    command.Parameters.AddWithValue("@WF_LAST_DAY_TAG", Convert.ToDecimal(LastDayAverage))
End If
I now have some data with quite a few decimal places, and the data was put into the string variable in scientific notation, so this seems to be the wrong approach. It didn't seem right using a string variable to begin with.
If I use a Double or Decimal variable, the blank Excel values come across as 0.0.
How can I preserve the blank values?
Note: I have tried
Variable As Nullable(Of Double)
But when passing the value to the SQL insert I get: "Nullable object must have a value."
Solution:
Fixed by changing the datatype of the parameter in the sub I was calling and then using Variable.HasValue to do the conditional DBNull insert.
I don't know which API you're using to do database inserts, but with many of them, including ADO.NET, the proper way to insert nulls is to use DBNull.Value. So my recommendation is that you use Nullable(Of Double) in your VB code, but then when it comes time to do the insert, you'd substitute any null values with DBNull.Value.
You need the question mark ? so that Double (or any value type) can store null (Nothing). E.g.:
Dim num as Double? = Nothing
Note the ? mark.
To store in the db:
If num Is Nothing Then
    ... System.DBNull.Value ...
Else
    ... num ...
End If
or better:
If num.HasValue Then
    ... num.Value ...
Else
    ... System.DBNull.Value ...
End If
I am posting an article HERE while I keep looking for how to achieve a solution to your situation, but the article might offer another solution, which is to disallow null values and add a default value instead. If I find anything else I will post it.
When you set up a database (at least in MS SQL Server) you can flag a field as allowing NULL values and choose which default values to take. If you look through people's DB structures, you'll see that a lot of people allow NULL values in their database. This is a very bad idea. I would recommend never allowing NULL values unless the field can logically have a NULL value (and even this, I find, only really happens in DATE/TIME fields).
NULL values cause several problems. For starters, NULL values are not the same as data values. A NULL value is basically an undefined value. On the ColdFusion end, this is not terrible, as NULL values come across as empty strings (for the most part). But in SQL, NULL and empty string are very different and act very differently. Take the following data table for example:
id name
---------------
1 Ben
2 Jim
3 Simon
4 <NULL>
5 <NULL>
6 Ye
7
8
9 Dave
10
This table has some empty strings (id: 7, 8, 10) and some NULL values (id: 4, 5). To see how these behave differently, look at the following query, where we are trying to find the number of fields that do not have values:
SELECT
    (
        SELECT COUNT(*)
        FROM test t
        WHERE LEN(t.name) = 0
    ) AS len_count,
    (
        SELECT COUNT(*)
        FROM test t
        WHERE t.name IS NULL
    ) AS null_count,
    (
        SELECT COUNT(*)
        FROM test t
        WHERE t.name NOT LIKE '_%'
    ) AS like_count,
    (
        SELECT COUNT(*)
        FROM test t
        WHERE t.name IS NULL
           OR t.name NOT LIKE '_%'
    ) AS combo_count
This returns the following record:
LEN Count: 3
NULL Count: 2
LIKE Count: 3
Combo Count: 5
We were looking for 5, since records 4, 5, 7, 8, and 10 do not have values in them. However, you can see that only one attempt returned 5. This is because a NULL value does NOT have a length; length is not a concept that makes sense for an undefined value. How can nothing have or not have a length? It's like asking "What does that math equation smell like?" You can't make comparisons like that.
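A quick way to see this behavior (SQL Server):
SELECT LEN(NULL) AS len_of_null -- returns NULL, not 0, so LEN(name) = 0 never matches the NULL rows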
So, allowing NULL values makes you work extra hard to get the kind of data you are looking for. From a related angle, allowing NULL values reduces your convictions about the data in your database. You can never quite be sure if a value exists or not. Does that make you feel safe and comfortable when programming?
Furthermore, while running LEN() on a NULL value doesn't act as you might expect, it also does NOT throw an error. This will make debugging your code even harder if you do not understand the difference between NULL values and data values.
Bottom line: DO NOT ALLOW NULL VALUES unless absolutely necessary. You will only be making things harder for yourself.