SQL Server Multiple Likes - sql

I have an unusual question that seems simple but has me stumped in a SQL Server stored procedure.
I have 2 tables as described below.
tblMaster
ID, CommitDate, SubUser, OrigFileName
Sample data
ID CommitDate SubUser OrigFileName
----------------------------------------
1 2014-10-07 Test1 Test1.pdf
2 2014-10-08 Test2 Test2.pdf
3 2014-10-09 Test3 Test3.pdf
The above table is basically the header table that tracks the committed files. In addition to this, we have a details table with the following structure.
tblIndex
ID, FileID (Linking column to the header row described above), Word
Sample data:
1, 1, Oil
2, 1, oil
3, 2, oil
4, 2, tank
5, 3, tank
The rows above are the words we want to search on; when a search matches, the query should return the corresponding filename/header row ID. What I would like to figure out is how to handle a search for:
One word (e.g. "oil"): the system should respond with all the files that meet the criteria (the easiest case, which I have figured out).
More than one word (e.g. "oil" and "tank"): we should only see the second file, since it is the only one that has both oil and tank as its keywords.
I tried LIKE '%oil%' AND LIKE '%tank%', which returned no rows, since one value can't be both oil and tank.
I tried LIKE '%oil%' OR LIKE '%tank%', but I get files 1, 2, and 3, since the OR also matches rows that contain only one of the words.
One last thing: I recognize I could search for the first term, save the results into a temp table, then search for the second term in that temp table and get what I am looking for. The problem is that I don't know in advance how many terms will be searched for. I don't want to build a structure where I keep storing data into yet another temp table if someone searches for 6 "keywords".
Any help/ideas will be much appreciated.

Try this; it differs slightly from the previous answer:
SELECT distinct FileID,COUNT(distinct t.word) FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(distinct t.word) > 1

One simple option would be to do something like this :
SELECT FileID
FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(*) > 1
This assumes you do not have duplicates in tblIndex.
I'm also unsure whether you really need the LIKE. According to your sample data you don't; a plain equality comparison would be far more efficient and would avoid accidental matches.
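A minimal sketch of the grouping idea generalized to any number of search terms, run against SQLite from Python rather than SQL Server (an assumption made only so the example is self-contained). It uses exact, case-insensitive matching instead of LIKE, and compares COUNT(DISTINCT ...) to the number of terms so that duplicates such as "Oil"/"oil" on file 1 do not count twice. The helper name files_matching_all is hypothetical.

```python
import sqlite3


def files_matching_all(conn, terms):
    """Return the FileIDs whose index rows cover every search term."""
    # De-duplicate and normalize the terms so the count comparison is exact.
    wanted = sorted({t.lower() for t in terms})
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT FileID
        FROM tblIndex
        WHERE lower(Word) IN ({placeholders})
        GROUP BY FileID
        HAVING COUNT(DISTINCT lower(Word)) = ?
        ORDER BY FileID
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblIndex (ID INTEGER, FileID INTEGER, Word TEXT)")
conn.executemany(
    "INSERT INTO tblIndex VALUES (?, ?, ?)",
    [(1, 1, "Oil"), (2, 1, "oil"), (3, 2, "oil"), (4, 2, "tank"), (5, 3, "tank")],
)

print(files_matching_all(conn, ["oil", "tank"]))  # [2]
print(files_matching_all(conn, ["oil"]))          # [1, 2]
```

The same WHERE ... IN / GROUP BY / HAVING shape should carry over to a stored procedure, with the term list passed in whatever form the procedure accepts.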


How can I extract comma delimited values from one column and put them each in separate column in Google Data Studio?

Update: 1,2,3 are just examples it can also be 4,24,53
I have the following setup:
I store Data in BigQuery and use BigQuery as data source for my Data Studio project.
I have a column called Alarms, and the data inside that column is as follows: it can be empty, or 1, or 1,2, or 1,2,3, or 5,43,60, and so on. If it's empty or has 1 value then there is nothing to worry about, but if there are 2 or more values I have to do something.
name    Alarm
Mark
John    1
Eddie   1,2
Peter   1,2,3
What I need is to be able to put every value in a separate column or create a dropdown or something.
For example something like the table below or two drop down menus one to select the name and the other shows the alarms. (I prefer the drop downs).
name    Alarm
Mark
John    1
Eddie   1   2
Peter   1   2   3
Here I select Peter and the alarm drop-down shows 3 alarms, or for Eddie it shows just 2 alarms, and so on.
I read something about regex but I don't really understand how to put it to the test.
I found this online: (.+?)(?:,|$) but I don't know how to capture the output.
What I need is to be able to put every value in a separate column
Consider below approach
select * from (
select * except(alarm)
from your_table,
unnest(split(alarm)) flag with offset
)
pivot (min(flag) as alarm for offset in (0,1,2,3,4))
If applied to sample data in your question -output is
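The query's output is not reproduced here. As a rough illustration of the same split-then-pivot idea outside BigQuery, the following plain-Python sketch splits each alarm string on commas and spreads the values over a fixed number of columns. The function name split_alarms and the column names alarm_0 to alarm_4 are assumptions made for the example, not the query's actual output.

```python
def split_alarms(rows, max_alarms=5):
    """Spread each comma-separated alarm string over max_alarms columns."""
    out = []
    for name, alarm in rows:
        # An empty or missing alarm string yields no values at all.
        values = [v.strip() for v in alarm.split(",") if v.strip()] if alarm else []
        record = {"name": name}
        for i in range(max_alarms):
            # Positions past the end of the list stay empty (None).
            record[f"alarm_{i}"] = values[i] if i < len(values) else None
        out.append(record)
    return out


rows = [("Mark", ""), ("John", "1"), ("Eddie", "1,2"), ("Peter", "1,2,3")]
result = split_alarms(rows)
for record in result:
    print(record)
```

As in the query, the number of output columns has to be fixed up front, so values beyond the fifth would be dropped.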

How to match entries in SQL based on their ending letter?

I'm trying to match entries from two tables so that each row of the result is made up of two words that end in the same letter. Each table has a single column named word. Table 1 contains, in order: Dog, High, It, Weeks. Table 2 contains: Bat, Is, Laugh, Sing. I need to select from both tables and match the words so that the rows are: Dog | Sing, High | Laugh, It | Bat, Weeks | Is
The screenshot is what I have so far for my SQL statement. I'm still early on in learning SQL so any info to help on this would be appreciated.
Recommend reading up on SUBSTR() for more information on why the below code works: https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2101.htm#OLADM679
SELECT
a.word
, b.word
FROM sec1313_words1 a
JOIN sec1313_words2 b
ON SUBSTR(b.word, -1) = SUBSTR(a.word, -1)
ORDER BY a.word
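A minimal sketch of the same join, run against SQLite from Python so it can be tried without an Oracle instance. SQLite's substr also accepts a negative start position, so substr(word, -1) returns the last character. The table names words1 and words2 are shortened stand-ins for the tables in the question, and the lower() calls are an added assumption to make the comparison case-insensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words1 (word TEXT)")
conn.execute("CREATE TABLE words2 (word TEXT)")
conn.executemany("INSERT INTO words1 VALUES (?)",
                 [("Dog",), ("High",), ("It",), ("Weeks",)])
conn.executemany("INSERT INTO words2 VALUES (?)",
                 [("Bat",), ("Is",), ("Laugh",), ("Sing",)])

# Join every word in words1 to the words in words2 with the same last letter.
pairs = conn.execute("""
    SELECT a.word, b.word
    FROM words1 AS a
    JOIN words2 AS b
      ON lower(substr(a.word, -1)) = lower(substr(b.word, -1))
    ORDER BY a.word
""").fetchall()

print(pairs)
# [('Dog', 'Sing'), ('High', 'Laugh'), ('It', 'Bat'), ('Weeks', 'Is')]
```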

<SQL> Extract data from a table that is in text format (contains 1 field only)

In our existing system we have a table that stores the reported user info (let's say UserInfoRaw). This table contains only 1 field (detail); the sample data looks something like this:
^StartNewUser
^UserName
Simon
^EnableFacebook
Y
^EnableTwitter
N
^EndNewUser
^StartNewUser
^UserName
Vicky
^EnableFacebook
N
^EndNewUser
We now need to convert this format into a queryable table, let's say "User-info", which contains the 3 fields below. The output should be:
UserName facebook twitter
==================================================
Simon Y N
Vicky N N
The constraints are:
1. I do know the tag fields I need to extract (for example, ^EnableFacebook is a tag name that I know and can use for selection).
2. We extract at user level; each user MUST have ^StartNewUser/^EndNewUser in the text. This is a precondition.
3. An attribute tag may be missing in some cases (e.g. Vicky's ^EnableTwitter tag); the field should then be treated as N during extraction.
4. We can only use pure SQL here, as this is an interim solution for our MI team. They can only run SQL, and we can't do a program release to automate this process at the moment.
So far we have come up with a solution based on RRN, but it needs many interim tables:
1: Produce OUT1, which holds the start/end row numbers for each user
SELECT RRN(A) ,detail from UserInfoRaw A where A.detail in ('^StartNewUser' , '^EndNewUser')
OUT1 :
1 ^StartNewUser
8 ^EndNewUser
9 ^StartNewUser
14 ^EndNewUser
2: Produce OUT2, which holds the row number of each user name
Select RRN(A) ,detail from UserInfoRaw A where RRN(A) IN
(select RRN(B)+1 from UserInfoRaw B where B.detail = '^UserName')
OUT2 :
3 Simon
11 Vicky
3: Produce JOIN12, which maps each user name row to its ^StartNewUser row
Select MAX(A.row) as startRow , B.row as nameRow from OUT1 A,OUT2 B
where A.detail = '^StartNewUser' AND A.row <B.row
GROUP BY B.row
order by startRow
JOIN12 :
1 3
9 11
4: Join the 3 tables on the ^StartNewUser row to get the mapping for 1 field
Select C.startRow ,A.detail , C.nameRow,B.detail from OUT1 A, OUT2 B, JOIN12 C where A.row=C.startRow and B.row=C.nameRow
Result :
1 ^startNewUser 3 Simon
9 ^startNewUser 11 Vicky
With this approach we can produce the mapping for 1 field, and by repeating similar steps we can build the result table with all the fields we want.
But we have 10+ fields to extract (maybe more if the business requests it), so we're asking here to see whether there is a better idea for this case. Thanks!
(PS: if you're an AS400 guy and you know how to produce the result with wrkqry, that would be best :) You know which MI team I'm referring to... a real mess.)
By your own admission, you have a text file, not a table.
SQL wasn't designed to deal with files (text or otherwise), it was designed to deal with databases, containing tables and relations.
Therefore, don't use SQL statements for this, process it as a text file. It's going to be faster and simpler. My default assumption is that a traditional RPGLE program doing READ is going to beat any attempt to do this sort of processing in SQL, simply because this is the type of workload it was designed for. Or use any other language that can process files.
(Within SQL, a stored procedure with a cursor would make this easier, but it would be unwieldy at best, because SQL lacks many of the features that make this so much easier in a 'normal' programming language, such as local, private functions.)
tl;dr
I have a hammer, what's the best way to use it to cut a board in half?
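To make the "process it as a text file" suggestion concrete, here is a minimal sketch in Python rather than RPGLE (a substitution made only for illustration). It assumes the rows arrive in file order as plain strings, that each tag line is followed by its value on the next line, and that missing tags default to N, as the question states. The function name parse_users is hypothetical.

```python
def parse_users(lines, tags=("^EnableFacebook", "^EnableTwitter")):
    """Turn the tag/value line stream into one dict per user block."""
    users = []
    current = None       # dict for the user block being read, if any
    pending_tag = None   # tag whose value is expected on the next line
    for raw in lines:
        line = raw.strip()
        if line == "^StartNewUser":
            # Every known tag starts as "N" so missing tags default correctly.
            current = {"^UserName": None, **{t: "N" for t in tags}}
            pending_tag = None
        elif line == "^EndNewUser":
            if current is not None:
                users.append(current)
            current = None
            pending_tag = None
        elif line.startswith("^"):
            pending_tag = line
        elif current is not None and pending_tag in current:
            # A value line for a tag we care about; unknown tags are ignored.
            current[pending_tag] = line
            pending_tag = None
    return users


sample = """^StartNewUser
^UserName
Simon
^EnableFacebook
Y
^EnableTwitter
N
^EndNewUser
^StartNewUser
^UserName
Vicky
^EnableFacebook
N
^EndNewUser""".splitlines()

users = parse_users(sample)
for u in users:
    print(u["^UserName"], u["^EnableFacebook"], u["^EnableTwitter"])
```

Adding another field to extract only means adding its tag name to the tags tuple.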

Storing data about objects with a variable number of ordered subparts in Access Database

The situation: I have a database storing biological specimen data. One table contains data about each specimen. Each specimen has between 1 and 8 parts, which are ordered.
I would like to enumerate each subpart in a query, using the specimen id and the number of parts. So if I have 2 specimens, A and B, and A has 2 parts and B has 3 parts, I want the result:
Parts:
A - 1
A - 2
B - 1
B - 2
B - 3
I realize that this is probably a trivial task, but I don't know the correct terminology to talk about it in a way that help pages and Google will understand. Thank you.
Edit to add thoughts: If I were dealing with something like this in a non-SQL context, I'd use a for loop to iterate the enumeration process over each specimen, but I don't understand how to implement anything remotely similar in SQL.
You mentioned "main table" which implies there's some other table for the sub parts. What you're after is likely a simple JOIN:
SELECT
*
FROM
maintable
INNER JOIN
subtable
ON
subtable.mainid = maintable.id
If you want an exact query, post a screenshot of your database tables and their column names and any relationships
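If there turns out to be no sub-part table and the specimen table only stores a part count, one common alternative is to join against a small numbers table. The sketch below is an assumption-laden illustration, not the asker's schema: the table and column names (specimens, part_count, numbers) are hypothetical, and it runs on SQLite from Python rather than Access. In Access the same condition can usually be written in a WHERE clause over both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (id TEXT, part_count INTEGER)")
conn.executemany("INSERT INTO specimens VALUES (?, ?)", [("A", 2), ("B", 3)])

# Helper table holding 1..8, the maximum number of parts per specimen.
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, 9)])

# Each specimen row is repeated once for every n up to its part count,
# which plays the role of the for loop mentioned in the question.
parts = conn.execute("""
    SELECT s.id, n.n
    FROM specimens AS s
    JOIN numbers AS n ON n.n <= s.part_count
    ORDER BY s.id, n.n
""").fetchall()

for specimen_id, part in parts:
    print(f"{specimen_id} - {part}")
```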

SQL Query with multiple values in one column

I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric: 01, 02, 03, etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance
You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find ( ',' || fld || ',', ',02,' ) > 0
assuming your SQL dialect has a string search function (find in this case, but I think it is charindex for SQL Server).
This ensures all sub-columns begin and end with a comma (comma plus field plus comma) and looks for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
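A minimal runnable sketch of the comma-wrapping search, using SQLite's instr function from Python as the string search function (an assumption; the original names find and charindex). Jobs 2 and 3 are hypothetical rows added to show that 102 does not match a search for 02.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobid (Job_Number INTEGER, User_Assigned TEXT, PendingInfo TEXT)"
)
conn.executemany(
    "INSERT INTO jobid VALUES (?, ?, ?)",
    [(1, "user1", "01,02"), (2, "user2", "102"), (3, "user3", "02")],
)

# Wrap both the column and the wanted value in commas so that only a whole
# sub-column can match: ',102,' does not contain ',02,'.
rows = conn.execute("""
    SELECT Job_Number
    FROM jobid
    WHERE instr(',' || PendingInfo || ',', ',' || ? || ',') > 0
    ORDER BY Job_Number
""", ("02",)).fetchall()

print(rows)  # [(1,), (3,)]
```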
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
This means that the cost of splitting is only on row insert/update, not on every single select, amortising that cost efficiently.
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.
I hope you are just maintaining the code and it's not a brand new implementation.
Please consider a different approach that uses a support table, like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query these tables using JOINs or subqueries.
If you need backward compatibility in your software, you can add a view to achieve it.
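A minimal sketch of the three-table layout above, built in SQLite from Python so it can be run as-is. The pending text for 03 is taken from the question; the view name jobid_legacy and its use of group_concat are assumptions about how a compatibility view might look, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (jobID INTEGER PRIMARY KEY, userID TEXT);
CREATE TABLE pending (pendingID TEXT PRIMARY KEY, pendingText TEXT);
CREATE TABLE job_pending (
    jobID INTEGER,
    pendingID TEXT,
    PRIMARY KEY (jobID, pendingID)
);
INSERT INTO jobs VALUES (1, 'user13'), (2, 'user32'), (3, 'user44');
INSERT INTO pending VALUES
    ('01', 'Not Enough Info'), ('02', 'Not Enough Time'), ('03', 'Waiting Review');
INSERT INTO job_pending VALUES (1, '01'), (1, '02'), (2, '01'), (3, '03'), (3, '01');

-- Hypothetical view that rebuilds the old comma-separated column.
CREATE VIEW jobid_legacy AS
SELECT jobID, group_concat(pendingID, ',') AS PendingInfo
FROM job_pending
GROUP BY jobID;
""")

# One row per job and pending reason, with the reason text resolved by a JOIN.
reasons = conn.execute("""
    SELECT j.jobID, j.userID, p.pendingID, p.pendingText
    FROM jobs AS j
    JOIN job_pending AS jp ON jp.jobID = j.jobID
    JOIN pending AS p ON p.pendingID = jp.pendingID
    ORDER BY j.jobID, p.pendingID
""").fetchall()

# The order of values inside each concatenated string is not guaranteed.
legacy = dict(conn.execute("SELECT jobID, PendingInfo FROM jobid_legacy"))

for row in reasons:
    print(row)
```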
I have tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each event can have multiple event types specified.
All I do is write 2 procedures in my site code, not in SQL code.
One procedure converts the table field (eventTypeIds) value, like "3,4,15,6", into a ViewState array, so I can use it anywhere in code.
The other procedure does the opposite: it collects any options you checked and converts it in
If changing the schema is an option (which it probably should be) shouldn't you implement a many-to-many relationship here so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...
Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs, then extract the appropriate part of the string.
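The row-multiplying idea can be tried out with the sketch below, which ports it to SQLite and runs it from Python. This is an adaptation under stated assumptions, not the answer's exact query: the numbers come from a recursive CTE capped at 50 (an arbitrary limit) instead of ROW_NUMBER over JobId, SQLite's substr/instr/length stand in for SUBSTRING/CHARINDEX/DATALENGTH, and the column names follow the question's sample tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JobId (Job_Number INTEGER, User_Assigned TEXT, PendingInfo TEXT);
CREATE TABLE Pending (Pending_Num TEXT, PendingWord TEXT);
INSERT INTO JobId VALUES (1, 'user1', '01,02');
INSERT INTO Pending VALUES ('01', 'Not Enough Info'), ('02', 'Not Enough Time');
""")

sql = """
WITH RECURSIVE Numbers(N) AS (
    -- 1..50: candidate start positions inside PendingInfo
    SELECT 1 UNION ALL SELECT N + 1 FROM Numbers WHERE N < 50
),
Split AS (
    -- Keep a position only when the character before it is a comma
    -- (a leading comma is prepended so position 1 qualifies), then cut
    -- from that position up to the next comma.
    SELECT j.Job_Number, j.User_Assigned,
           substr(j.PendingInfo, n.N,
                  instr(substr(j.PendingInfo || ',', n.N), ',') - 1) AS Pending_Num
    FROM JobId AS j
    JOIN Numbers AS n
      ON n.N <= length(j.PendingInfo)
     AND substr(',' || j.PendingInfo, n.N, 1) = ','
)
SELECT s.Job_Number, s.User_Assigned, s.Pending_Num, p.PendingWord
FROM Split AS s
JOIN Pending AS p ON p.Pending_Num = s.Pending_Num
ORDER BY s.Job_Number, s.Pending_Num
"""

split_rows = conn.execute(sql).fetchall()
for row in split_rows:
    print(row)
# (1, 'user1', '01', 'Not Enough Info')
# (1, 'user1', '02', 'Not Enough Time')
```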
While I agree with the DBA perspective of not storing multiple values in a single field, it is doable, as shown below, and can be practical for application logic and for some performance reasons. Let's say you have 10000 user groups, each having on average 1000 members. You may want a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,') with each number being a memberID, all separated by commas. Also add a comma at the start and end of the data. Then your select would use LIKE '%,someID,%'.
If you cannot change your data ('01,02,03' or similar) and, say, you want the rows containing 01, you can still use " select ... WHERE col = '01' OR col LIKE '01,%' OR col LIKE '%,01' OR col LIKE '%,01,%' " (col standing for the column name), which ensures a match whether the value is alone, at the start, at the end, or in the middle, while avoiding similar numbers (e.g. 101).