What does CAST(SUBSTRING()AS INTEGER)=x do? - sql

I have an exam on SQL, which I have rarely worked with. While going through the study material, I encountered this example:
DELETE FROM table_name WHERE
CAST (SUBSTRING (attribute_name from x for y) AS INTEGER) =z;
My guess is that this deletes a specific row, where an attribute name would be specified in the code, but I am unsure.

The substring() function extracts part of a string (from the column attribute_name), starting at position x and taking y characters (i.e., up to position x + y - 1). For instance, from 1 for 3 gives the first three characters.
This is then converted to an integer and compared to another value.
Rows where the comparison returns "true" are deleted.
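To see the statement in action, here is a small sketch using Python's built-in sqlite3 module on a made-up table. The rows and the values x = 3, y = 3 and z = 123 are hypothetical. SQLite spells SUBSTRING(s FROM x FOR y) as substr(s, x, y); note also that SQLite's CAST turns non-numeric text into 0, whereas PostgreSQL raises an error for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (attribute_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [("AB123XY",), ("AB456XY",), ("CD123ZZ",)],  # hypothetical sample rows
)

# Same idea as: DELETE FROM table_name
#   WHERE CAST(SUBSTRING(attribute_name FROM 3 FOR 3) AS INTEGER) = 123;
# Characters 3..5 of each value are cut out, turned into an integer and compared with 123.
conn.execute(
    "DELETE FROM table_name WHERE CAST(substr(attribute_name, 3, 3) AS INTEGER) = ?",
    (123,),
)

remaining = [row[0] for row in conn.execute("SELECT attribute_name FROM table_name")]
print(remaining)  # ['AB456XY']
```

Both rows whose characters 3 to 5 read "123" are deleted, regardless of what surrounds them; only the row with "456" survives.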

Related

How do I calculate the sum efficiently?

Given an integer n such that 1 <= n <= 10^18,
we need to calculate f(1) + f(2) + f(3) + f(4) + ... + f(n).
f(x) is defined as follows.
Say, x = 1112222333,
then f(x)=1002000300.
Whenever there is a run of identical consecutive digits, we keep the first digit of the run and replace the rest of the run with zeros.
Formally, f(x) = sum over all runs of (first digit of the run * 10^i), where i is the zero-based index, counted from the right, of that first digit.
f(x)=1*10^9 + 2*10^6 + 3*10^2 = 1002000300.
In x = 1112222333, the digit at index 9 is 1, and so on.
We use zero-based indexing, counted from the right.
For x = 1234, the digit at index 0 is 4, at index 1 is 3, at index 2 is 2, and at index 3 is 1.
How can f(1) + f(2) + f(3) + ... + f(n) be calculated?
I want to design an algorithm that calculates this sum efficiently.
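A direct translation of the definition may help pin it down. The helper names below are hypothetical; the brute-force sum is only a reference for checking faster methods against, since it is far too slow for n near 10^18.

```python
def f(x: int) -> int:
    """Keep the first digit of every run of equal digits and zero out the rest."""
    digits = str(x)
    kept = [
        d if i == 0 or d != digits[i - 1] else "0"  # only the first digit of a run survives
        for i, d in enumerate(digits)
    ]
    return int("".join(kept))


def brute_force_sum(n: int) -> int:
    """Direct O(n) evaluation of f(1) + ... + f(n): fine for small n only."""
    return sum(f(x) for x in range(1, n + 1))


print(f(1112222333))        # 1002000300
print(f(1234))              # 1234
print(brute_force_sum(11))  # 65
```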
There is nothing to calculate.
Multiplying each digit in the array of digits by 10^position and adding the results yields the same number.
So all you want to do is end up with 0s in place of a repeated digit.
I.e., let's populate some static values in an array in pseudocode:
```
$As[1]='0'
$As[2]='00'
$As[3]='000'
...etc
$As[18]='000000000000000000'
```
These are the "results" of 10^index.
Given a value n of `1234`, `1&000 + 2&00 + 3&0 + 4` (where & is string concatenation) results in `1234`.
So, if you are putting this on a chip, your most efficient method is probably a bitwise XOR between each register and the next one up the line, as a single operation.
Then you will have 0s in all the spots you care about, and you just retrieve the values in the registers with a 1.
In code, I think it would be most efficient to do the following:
```
$n = 11223334  // arbitrary value
$x = $n * 10
$zeros = ($x - $n) / 10
```
Okay yeah we can just do bit shifting to get a value like 100200300400 etc.
To approach this problem, it could help to begin with one-digit numbers and see what sum you get.
I mean like this:
Let's say we define F(k) as the sum of f(x) over all k-digit strings x, i.e. F(k) = f(0) + f(1) + ... + f(10^k - 1) (leading zeros are allowed and do not change f); then we have:
```
F(1) = 45                 # = 10*9/2 by Gauss's summation formula
F(2) = F(1)*9 + F(1)*100  # F(1)*9 is the part that comes from the last digit
                          # because for each of the 10 possible digits in the
                          # first position, we have 9 digits in the last
                          # because both can't be equal and so one out of ten
                          # becomes zero. F(1)*100 comes from the leading digit
                          # which is multiplied by 100 (10 because we add the
                          # second digit and another factor of 10 because we
                          # get the digit ten times in that position)
```
If you now continue with this scheme, for k>=1 in general you get
F(k+1)= F(k)*100+10^(k-1)*45*9
The rest is probably straightforward.
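The recurrence is easy to check numerically. The sketch below (all function names hypothetical) computes F(k) from it and compares against brute force. For the "rest", an arbitrary n rather than n = 10^k - 1, it adds prefix_sum, which is a digit DP: a standard technique swapped in here, not the scheme from this answer. It relies on the fact that appending a digit d to x gives a value of 10·f(x) + (d if d differs from the last digit of x, else 0), and it pads shorter numbers with leading zeros, which does not change f.

```python
from collections import defaultdict


def f(x: int) -> int:
    s = str(x)
    return int("".join(d if i == 0 or d != s[i - 1] else "0" for i, d in enumerate(s)))


def F(k: int) -> int:
    """Sum of f(x) for 0 <= x < 10**k via F(k+1) = 100*F(k) + 45*9*10**(k-1)."""
    total = 45  # F(1) = 0 + 1 + ... + 9
    for j in range(1, k):
        total = total * 100 + 45 * 9 * 10 ** (j - 1)
    return total


def prefix_sum(n: int) -> int:
    """Sum of f(x) for 0 <= x <= n, using a digit DP over the decimal digits of n."""
    # State: (previous digit, prefix still equal to n's prefix?) -> [count, sum of f-values]
    states = {(None, True): [1, 0]}
    for limit in map(int, str(n)):
        nxt = defaultdict(lambda: [0, 0])
        for (prev, tight), (cnt, tot) in states.items():
            top = limit if tight else 9
            for d in range(top + 1):
                kept = d if d != prev else 0  # a repeated digit contributes 0
                entry = nxt[(d, tight and d == top)]
                entry[0] += cnt
                entry[1] += tot * 10 + kept * cnt
        states = nxt
    return sum(tot for _, tot in states.values())


print(F(2), prefix_sum(99))  # 4905 4905
print(prefix_sum(10**18))    # at most ~20 states per digit, so this is instant
```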
Can you tell me which HackerRank task this is? I guess it's one of the Project Euler tasks, right?

Search for any word inside a string across a million rows

I have a set of 50k values, say X. I want to compare each value with a set of 10k values, say Y. If an X value is present anywhere in a Y string, it matches.
So I want to check each value in X against each value in Y and assign the X value when it matches.
What would be the best method for this task? It is required for a data mining project.
I loaded the data into an MS Access database.
Then, using a VBA program,
I take each X and update Y if it matches (Like '%X%'), but it is a never-ending process. The columns are indexed, but that has no effect.
Is there an algorithm, or a set of steps, to break this down and complete the mapping faster?
Please let me know if there are any other options besides the answers given below. I'll explain the scenario a bit more.
Table1.Data
Sentence1
Sentence2
Sentence3
Sentence4
Sentence5
Sentence6
...
Sentence100k
Table2.Phrase (each phrase consists of multiple words)
Phrase1
Phrase2
Phrase3
Phrase4
Phrase5
...
Phrase100k
I want to check whether Phrase1 has any match in Sentence1 to Sentence100k (an exact match of the phrase, a match of the phrase anywhere, the maximum number of words of Phrase1 matched in the sentence, etc.) and create a map based on the best match (ideally, the exact phrase appearing anywhere in the sentence).
Table3 (output)
Data      | Best Possible Phrase | Second Best Phrase (Optional)
Sentence1 | Phrase1000           | Phrase50k
Sentence2 | Phrase10             | Phrase70k
Please let me know of any tool or logic to perform this. Here is the logic I tried in SQL:
1.
Select A.Data, B.Phrase from Table1 A left join Table2 B on A.Data Like '%' + B.Phrase + '%'
2.
Check whether any word of the phrase is present in the sentence. For this I replaced all spaces in the phrase with %, giving word1%word2%word3, and then ran the query with
A.Data Like '%' + B.Phrase + '%', which is
A.Data Like '%word1%word2%word3%'
But it takes days to complete the task for this much data.
Any readily usable tools, indexing methods, or queries would really help. The answers given below seem too technical for me to adapt. Please advise.
You can build a suffix tree in linear time (you can look up suffix trees online), out of the concatenation of all strings in X and Y, with special unique symbols that end each string.
Then for each string Xi in X, you look it up in the suffix tree (linear time in the length of Xi) and assign Xi to each string in Y that has a suffix somewhere in the subtree rooted at the end of Xi's path.
This is linear time in the number of strings in Y that Xi is assigned to.
Thus you get an optimal O(N + k) time algorithm, where:
N is the total length of all the strings in X and Y,
and k is the total number of matches between query strings in X and target strings in Y.
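A linear-time suffix tree (e.g. built with Ukkonen's algorithm) is too long for a short example, so the sketch below uses an uncompressed suffix trie instead. The lookup idea is the same, but building it is quadratic in string length and memory-hungry, so it only illustrates the query step. Each node stores the set of target indices below it, standing in for walking the subtree. The function names and sample data are hypothetical.

```python
def build_suffix_trie(targets):
    """Insert every suffix of every target; each node records which targets pass through it.

    Quadratic in string length: an illustration, not a linear-time suffix tree.
    """
    root = {"children": {}, "ids": set()}
    for idx, text in enumerate(targets):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["children"].setdefault(ch, {"children": {}, "ids": set()})
                node["ids"].add(idx)  # target idx contains the path spelled so far
    return root


def find_containing(root, query):
    """Return the indices of targets that contain `query` as a substring."""
    node = root
    for ch in query:
        node = node["children"].get(ch)
        if node is None:
            return set()  # no target contains this query
    return set(node["ids"])


sentences = ["the quick brown fox", "lazy dog sleeps", "quick dog"]
trie = build_suffix_trie(sentences)
for phrase in ["quick", "dog", "cat"]:
    print(phrase, sorted(find_containing(trie, phrase)))
# quick [0, 2]
# dog [1, 2]
# cat []
```

Each query costs time proportional to its length plus the size of the returned set, which is the property the answer relies on; only the construction is slower here.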

Need Explanation of VB.Net Code

In the example code below, from a tutorial, I understand how most of it works. There are a few things I do not understand, though. Please help me understand the purpose of some of these elements. I will explain below what I don't understand.
Module paramByref
    Sub swap(ByRef x As Integer, ByRef y As Integer)
        Dim temp As Integer
        temp = x ' save the value of x
        x = y ' put y into x
        y = temp ' put temp into y
    End Sub
    Sub Main()
        ' local variable definition
        Dim a As Integer = 100
        Dim b As Integer = 200
        Console.WriteLine("Before swap, value of a : {0}", a)
        Console.WriteLine("Before swap, value of b : {0}", b)
        ' calling a function to swap the values
        swap(a, b)
        Console.WriteLine("After swap, value of a : {0}", a)
        Console.WriteLine("After swap, value of b : {0}", b)
        Console.ReadLine()
    End Sub
End Module
What I don't understand is why x, y, and temp are needed. Why not just keep declaration of a and b and swap the values?
Also, in Console.WriteLine, I understand that {0} is an nth reference or index position, but the comma suggests outputting something else. Is it saying to output the value of a at index 0? If so, there can only be one index 0, so why does the next line reference the value at index 0 again? Or am I all wrong on this? Any explanation would be greatly appreciated... please dumb your answer down for this newbie.
The reason for x, y, and temp is to demonstrate how to do swapping. Your question asks
why not just keep declaration of a and b and swap the values
In this situation, where you aren't doing any incrementing, sure, you could just say b = 100 and a = 200 afterward. But if you were doing this in, say, a loop and constantly swapping them, and you were to do
a = b
b = a
you would have a = 200, b = 200: a = b first copies 200 into a, and then b = a copies that same 200 back into b, so the original 100 is lost. That's why you need the temp value.
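A small Python sketch of both orderings makes the difference visible. Python has no ByRef, so these hypothetical helpers return the pair instead of modifying the caller's variables; they only show the value shuffling, not the by-reference part.

```python
def naive_swap(a, b):
    # Mirrors "a = b" followed by "b = a": a's original value is overwritten before it is copied.
    a = b
    b = a
    return a, b


def temp_swap(a, b):
    # Mirrors the tutorial's swap(): save a in temp before overwriting it.
    temp = a
    a = b
    b = temp
    return a, b


print(naive_swap(100, 200))  # (200, 200)
print(temp_swap(100, 200))   # (200, 100)
```

In Python specifically, `a, b = b, a` is the idiomatic swap, but the temp version is what the VB.NET tutorial is teaching.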
With regard to Console.WriteLine, you are absolutely correct that {0} refers to index 0, so in the first line it is a. However, index 0 can be reused in the next line because that is a completely separate call. Each Console.WriteLine call is independent, so the values are indexed on a call-by-call basis.
You need a temp to store the value from the first var so that you do not lose it when you overwrite it with the value from the second var. Once the second var's value has been transferred into the first var, the value stored in temp (from the original first var) can be transferred into the second var (overwriting the original value).
This is a common technique in custom sorting routines that shuffle values around between various positions in an array.
a and b are variables that are only visible in method Main(). They cannot be accessed from a different method. On the other hand, x, y, and temp are variables that are only visible in method swap(). More precisely, x and y are the method's parameters. The keyword ByRef means that the parameters are references to variables. I.e. when you call swap(a, b), x becomes a reference to variable a and y becomes a reference to variable b. You can work with references like with other variables. The difference is that when you change x in swap(), a in Main will change accordingly.
The Console.WriteLine() method takes a format string parameter and an arbitrary number of additional parameters. The {0} is an index into the list of those additional parameters. E.g. Console.WriteLine("{0} {1} {2}", 100, 200, 300) would output 100 200 300, and Console.WriteLine("{2} {1} {0}", 100, 200, 300) would output 300 200 100.
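Python's str.format uses the same positional-placeholder idea, so it is a quick way to experiment with the behaviour described above. This is Python, not .NET; the two formatting mini-languages differ in their finer details.

```python
# {n} refers to the n-th argument after the format string, counted from 0.
print("{0} {1} {2}".format(100, 200, 300))  # 100 200 300
print("{2} {1} {0}".format(100, 200, 300))  # 300 200 100

# Each call has its own argument list, so {0} can be reused on the next line.
a, b = 100, 200
print("Before swap, value of a : {0}".format(a))  # Before swap, value of a : 100
print("Before swap, value of b : {0}".format(b))  # Before swap, value of b : 200
```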

Number of possible binary strings of length k

One of my friends was asked this question recently:
You have to count how many binary strings are possible of length "K".
Constraint: every 0 has a 1 immediately to its left.
This question can be reworded:
How many binary sequences of length K are possible if there are no two consecutive 0s and the first element must be 1 (otherwise the constraint fails)? Let us set the first element aside (we can, because it is always fixed).
Then we got a very famous task that sounds like this: "What is the number of binary sequences of length K-1 that have no consecutive 0's." The explanation can be found, for example, here
Then the answer is F(K+1), where F(K) is the K-th Fibonacci number in the sequence starting 1, 1, 2, ....
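A brute-force check of the Fibonacci claim for small K (hypothetical helper names): enumerate every binary string, keep those in which each 0 is immediately preceded by a 1, and compare the count with F(K+1).

```python
from itertools import product


def brute_force_count(k):
    """Count length-k binary strings in which every 0 has a 1 immediately to its left."""
    return sum(
        1
        for bits in product("01", repeat=k)
        if all(bits[i] != "0" or (i > 0 and bits[i - 1] == "1") for i in range(k))
    )


def fib_count(k):
    """Return F(k + 1), with F(1) = F(2) = 1."""
    a, b = 1, 1  # F(1), F(2)
    for _ in range(k - 1):
        a, b = b, a + b
    return b


print([brute_force_count(k) for k in range(1, 8)])  # [1, 2, 3, 5, 8, 13, 21]
print([fib_count(k) for k in range(1, 8)])          # [1, 2, 3, 5, 8, 13, 21]
```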
∑ from n = 0 to ⌊K/2⌋ of C(K−n, n), where n is the number of zeros in the string
The idea is to group every 0 with the 1 to its left into a single "10" element and count the arrangements. With n zeros there are n such grouped elements, so the string becomes K − n elements long, and choosing which n of them are the "10" groups gives C(K−n, n). There can be no more than ⌊K/2⌋ zeros, as otherwise there would not be enough 1s to sit immediately to the left of each zero.
E.g. 111111[10][10]1[10] for K = 13, n = 3
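The binomial sum can be checked the same way against brute-force enumeration (hypothetical helper names; math.comb needs Python 3.8 or later). For K = 13 it agrees with the Fibonacci answer, F(14) = 377.

```python
from itertools import product
from math import comb


def binomial_count(k):
    # n zeros -> n "10" groups plus k - 2n single 1s, i.e. k - n elements in total;
    # choose which n of those elements are the "10" groups.
    return sum(comb(k - n, n) for n in range(k // 2 + 1))


def brute_force_count(k):
    """Count length-k binary strings in which every 0 has a 1 immediately to its left."""
    return sum(
        1
        for bits in product("01", repeat=k)
        if all(bits[i] != "0" or (i > 0 and bits[i - 1] == "1") for i in range(k))
    )


print(binomial_count(13), brute_force_count(13))  # 377 377
```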

How to choose a range for a loop based upon the answers of a previous loop?

I'm sorry the title is so confusingly worded, but it's hard to condense this problem down to a few words.
I'm trying to find the minimum value of a specific equation. At first I'm looping through the equation, which for our purposes here can be something like y = .245x^3-.67x^2+5x+12. I want to design a loop where the "steps" through the loop get smaller and smaller.
For example, the first time it loops through, it uses a step of 1. I will get about 30 values. What I need help with is how to use the three smallest values I get from this first loop.
Here's an example of the values I might get from the first loop: (I should note this isn't supposed to be actual code at all. It's just a brief description of what's happening)
loop from x = 1 to 8 with step 1
results:
x = 1 -> y = 30
x = 2 -> y = 28
x = 3 -> y = 25
x = 4 -> y = 21
x = 5 -> y = 18
x = 6 -> y = 22
x = 7 -> y = 27
x = 8 -> y = 33
I want something that can detect the lowest three values and create a loop. From these results, the values of x that give the smallest three results for y are x = 4, 5, and 6.
So my "guess" at this point would be x = 5. To get a better "guess" I'd like a loop that now does:
loop from x = 4 to x = 6 with step .5
I could keep this pattern going until I get an absurdly accurate guess for the minimum value of x.
Does anybody know of a way I can do this? I know the values I get can be modeled by an upward-opening parabola, so this approach will definitely work. I was thinking that the values could be put into a column. It wouldn't be hard to make something that returns the smallest value of y in that column, and the corresponding x-value.
If I'm being too vague, just let me know, and I can answer any questions you might have.
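The shrinking-step idea can be sketched as follows, assuming, as the question does, that the function has a single minimum in the range. For such a function the true minimum lies between the two neighbours of the best sample, which is the bracket the three smallest values point at. The function name, the halving factor and the test parabola are hypothetical choices; golden-section search is the standard, more refined method built on the same bracketing idea.

```python
def shrinking_grid_min(func, lo, hi, step, tol=1e-9):
    """Approximate the x that minimises func on [lo, hi] by zooming in on a grid.

    Assumes func has a single minimum on [lo, hi] (e.g. an upward-opening parabola).
    """
    while step > tol:
        count = int(round((hi - lo) / step))
        xs = [lo + k * step for k in range(count + 1)]
        best = min(range(len(xs)), key=lambda k: func(xs[k]))
        # The minimum lies between the neighbours of the best sample (clipped to the range),
        # so the next pass only covers that bracket.
        lo = xs[max(best - 1, 0)]
        hi = xs[min(best + 1, len(xs) - 1)]
        step /= 2  # halve the step each pass, as in the question's 1 -> 0.5
    return (lo + hi) / 2


def parabola(x):
    return (x - 5.3) ** 2 + 2  # hypothetical test function, minimum at x = 5.3


def cubic(x):
    return 0.245 * x**3 - 0.67 * x**2 + 5 * x + 12  # the question's example


print(shrinking_grid_min(parabola, 1, 8, 1))  # close to 5.3
print(shrinking_grid_min(cubic, 1, 8, 1))     # close to 1, the left end of the range
```

For the question's cubic the search simply walks to the left edge of the range, because that function keeps increasing; a true interior minimum only exists for functions like the parabola.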
Nice question. Here's at least a start for what I think you should do:
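Sub findMin()
    ' Track the x-values (1 to 8) that give the three smallest y-values.
    ' 999 is a sentinel x: retVal(999) is larger than any value in the range tested.
    Dim lowest As Integer
    Dim middle As Integer
    Dim highest As Integer
    lowest = 999
    middle = 999
    highest = 999   ' was misspelled "hightest", which left highest at 0
    Dim i As Integer
    i = 1
    Do While i < 9
        If retVal(i) < retVal(lowest) Then
            ' New minimum: shift the previous two down one place
            highest = middle
            middle = lowest
            lowest = i
        ElseIf retVal(i) < retVal(middle) Then
            highest = middle
            middle = i
        ElseIf retVal(i) < retVal(highest) Then
            highest = i
        End If
        i = i + 1
    Loop
    Debug.Print "x-values of the three smallest y:", lowest, middle, highest
End Sub
Function retVal(num As Integer) As Double
    ' y = 0.245x^3 - 0.67x^2 + 5x + 12 (Sqr is the square root in VBA, so use ^ for powers)
    retVal = 0.245 * num ^ 3 - 0.67 * num ^ 2 + 5 * num + 12
End Function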
What I've done here is set three Integers as your three min values: lowest, middle, and highest. You loop through the values you're plugging into the formula (here, the retVal function) and compare the return value of retVal (hence the name) to the values of retVal(lowest), retVal(middle), and retVal(highest), replacing them as necessary. I'm just beginning with VBA, so what I've done likely isn't very elegant, but it does at least identify the Integers that result in the lowest values of the function. You may have to play around with the values of lowest, middle, and highest a bit to make it work. I know this isn't EXACTLY what you're looking for, but it's something along the lines of what I think you should do.
There is no trivial way to approach this unless the problem domain is narrowed.
The example polynomial given in fact has no minimum, which is readily determined by observing y'>0 (hence, y is always increasing WRT x).
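A quick check of this claim for the example polynomial: the derivative is a quadratic with a positive leading coefficient and a negative discriminant, so it never reaches zero.

```latex
y' = \frac{dy}{dx} = 0.735x^2 - 1.34x + 5, \qquad
\Delta = (-1.34)^2 - 4(0.735)(5) = 1.7956 - 14.7 = -12.9044 < 0
```

Since 0.735 > 0 and Δ < 0, y' > 0 for every real x, so y has no local minimum.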
Given the wide interpretation of
[an] equation, which for our purposes here can be something like y = .245x^3-.67x^2+5x+12
many conditions need to be checked, even assuming the domain is limited to polynomials.
The polynomial's order matters: it determines which conditions must be checked to know how many solutions are possible, or whether any solution exists at all.
Without taking this complexity into account, an iterative approach could yield an incorrect solution due to underflow error, or an unfortunate choice of iteration steps or bounds.
I'm not trying to be hard here, I think your idea is neat. In practice it is more complicated than you think.