I have 3 tables:

Silks:
Silk_Skey Name
1 Black White Checks Yellow Arms
2 Black Crimson Stripes
3 Crimson Yellow Stripes

SubColour:
Sub Colour Major Colour
Black Black
White White
Yellow Yellow
Crimson Red

MajorColour:
MajorColour_Skey Major Colour
1 Black
2 White
3 Yellow
4 Red

And I want to achieve this:
ID Silk_Skey MajorColour_Skey
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 3
7 3 4
What I need to do is create a link table that matches all the colours from the 3 tables and breaks down the silk names (so I would show 4 lines in the new table); see the SQL below. My boss has advised me to use an 'IS IN' query, but I have no idea what that is. Can you help?
SELECT s.Silks_Skey, mc.MajorColour_Skey
FROM Silks s INNER JOIN SubColour sc on sc.SubColour **'IS IN HERE'** s.SilksName
INNER JOIN MajorColour mc
ON sc.MajorColour = mc.MajorColour
You can use IN
AND table.column IN ('a','b','c')
or
AND table.column IN (1,2,3)
or, if you're looking for a string that matches a pattern, you can use LIKE:
AND table.column LIKE '%word' -- table.column ends with 'word'
AND table.column LIKE 'word%' -- table.column starts with 'word'
AND table.column LIKE '%word%' -- table.column has 'word' anywhere in the column
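To see what each of these predicates selects, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is an assumption made only to keep the example self-contained; the Silks rows come from the question, and the helper name keys is hypothetical.

```python
import sqlite3

# In-memory SQLite database holding the Silks rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Silks (Silks_Skey INTEGER, SilksName TEXT);
INSERT INTO Silks VALUES
    (1, 'Black White Checks Yellow Arms'),
    (2, 'Black Crimson Stripes'),
    (3, 'Crimson Yellow Stripes');
""")

def keys(where_clause):
    """Hypothetical helper: return the keys of the rows matching a WHERE clause."""
    sql = f"SELECT Silks_Skey FROM Silks WHERE {where_clause} ORDER BY Silks_Skey"
    return [row[0] for row in conn.execute(sql)]

in_list = keys("Silks_Skey IN (1, 3)")          # value is one of a fixed list
ends_with = keys("SilksName LIKE '%Stripes'")   # name ends with 'Stripes'
starts_with = keys("SilksName LIKE 'Black%'")   # name starts with 'Black'
anywhere = keys("SilksName LIKE '%Yellow%'")    # 'Yellow' anywhere in the name

print(in_list, ends_with, starts_with, anywhere)
```

Note that IN compares against whole values, so on its own it cannot find a colour inside a longer name; that is what the LIKE patterns are for.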
This design is doomed to poor performance and to queries that are awkward and painful to write. If your database will never be large, it may be workable, but if it will be large, you cannot use this structure and hope for good performance, because you will not be able to use indexes properly. Personally, I would add a silk colors table related to the silks table and store the colors individually. One of the first rules of database design is never to store more than one piece of information in a field. You are storing a list, which always means you need a related table to make effective use of the database.
One clue to a bad (and, over time, usually unworkable) database design is needing to join using functions or calculations of any type, or needing wildcards at the start of a phrase in a LIKE clause. Fix this now and things will be much smoother: maintenance will take less time and performance will be better. There is no upside to your current structure at all.
You may need to take a bit of extra time to parse and store the silk names by individual color, but the time you save in querying the database will be significant because you can then use a plain join and take advantage of indexes. Search for fn_split and you will see a method of splitting the silk names into individual colors that you can use when you insert the records.
If you foolishly decide to retain the current structure, then look into using full-text search. It will be faster than using a LIKE clause with a wildcard as the first character.
For what you want to do, you need to do string manipulation because you are trying to compare one color to a list of colors in a string.
The like operator can do this. Try this on clause:
on ' '+ s.SilksName +' ' like '% '+sc.SubColour+' %'
This checks to see if a given color (sc.SubColour) is in the list (s.SilksName). For instance, if you have a list like 'RED GREEN', this will match either '%RED%' or '%GREEN%'.
The purpose of concatenating white space is to avoid partial-word matches. For instance, "blue-green" would match both "blue" and "green" without the delimiters.
The following query returns 7 rows, which seems to be correct (3 for the first row in silks and 2 for each of the other two):
with silks as (
select 1 as silks_skey, 'Black White Checks Yellow Arms' as silksname union all
select 2, 'Black Crimson Stripes' union all
select 3, 'Crimson Yellow Stripes'
),
subcolour as (
select 'black' as subcolour, 'black' as majorcolour union all
select 'white', 'white' union all
select 'yellow', 'yellow' union all
select 'crimson', 'red'
),
MajorColour as (
select 1 as MajorColour_skey, 'black' as MajorColour union all
select 2, 'white' union all
select 3, 'yellow' union all
select 4, 'red'
)
SELECT s.Silks_Skey, mc.MajorColour_Skey
FROM Silks s INNER JOIN SubColour sc on ' ' + s.SilksName + ' ' like '% ' + sc.SubColour + ' %'
INNER JOIN MajorColour mc
ON sc.MajorColour = mc.MajorColour
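For anyone who wants to try the padded LIKE join without a SQL Server instance, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is an assumption made only for portability: it concatenates with || where T-SQL uses +, and its LIKE is case-insensitive for ASCII text by default.

```python
import sqlite3

# In-memory SQLite database with the three tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Silks (Silks_Skey INTEGER, SilksName TEXT);
CREATE TABLE SubColour (SubColour TEXT, MajorColour TEXT);
CREATE TABLE MajorColour (MajorColour_Skey INTEGER, MajorColour TEXT);
INSERT INTO Silks VALUES
    (1, 'Black White Checks Yellow Arms'),
    (2, 'Black Crimson Stripes'),
    (3, 'Crimson Yellow Stripes');
INSERT INTO SubColour VALUES
    ('Black', 'Black'), ('White', 'White'),
    ('Yellow', 'Yellow'), ('Crimson', 'Red');
INSERT INTO MajorColour VALUES
    (1, 'Black'), (2, 'White'), (3, 'Yellow'), (4, 'Red');
""")

# Pad both sides with spaces so only whole words in the name can match.
rows = conn.execute("""
SELECT s.Silks_Skey, mc.MajorColour_Skey
FROM Silks s
JOIN SubColour sc
  ON ' ' || s.SilksName || ' ' LIKE '% ' || sc.SubColour || ' %'
JOIN MajorColour mc
  ON sc.MajorColour = mc.MajorColour
ORDER BY s.Silks_Skey, mc.MajorColour_Skey
""").fetchall()

for silk_key, colour_key in rows:
    print(silk_key, colour_key)
```

This yields the same 7 key pairs as the target table in the question; the ID column there would come from an identity or ROW_NUMBER in the real link table.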
Sounds like what you really want to do is split the Name field on spaces and then, for each of those values that is contained in the colours table (joined on the sub-colour, given that major colours are valid sub-colours too), create one entry in a new table. The problem is that there is no intrinsic T-SQL function for splitting strings. Your best bet is to visit Erland Sommarskog's definitive answer on how to do this.
An alternative, which is not very neat and may or may not work, is to use the CONTAINS keyword in your predicate. However, to do this you need to use full-text indexing, and I suspect using Erland's excellent guides on splitting strings and arrays in SQL will be more appropriate and faster.
This is the answer folks, thanks for all your ideas.
Select S.[Silks_Skey], MC.[MajorColour_Skey]
from [dbo].[Silks] S
inner join [dbo].[SubColour] SC on CHARINDEX(SC.[SubColour],S.[SilksName]) <> 0
inner join [dbo].[MajorColour] MC on SC.[MajorColour] = MC.[MajorColour]
UNION ALL
Select S.[Silks_Skey], MC.[MajorColour_Skey]
from [dbo].[Silks] S
inner join [dbo].[MajorColour] MC on CHARINDEX(MC.[MajorColour],S.[SilksName]) <> 0
ORDER BY S.[Silks_Skey]
I am trying to make a query from a table in Access that would give me totals for different types of product based off of 2 categories, all within one query. For example my Table looks as follows:
Type  Description 1  Description 2  Date
New   Shiny          Black          1/1/2022
New   Black          Dull           1/1/2022
Old   Shiny          Grey           1/1/2022
Old   Grey           Dull           1/1/2022
The query results that I want to receive are as follows:
Description  New  Old
Shiny        1    1
Black        2    0
Dull         1    1
Grey         0    2
The dataset that I am working with isn't as clean as my example shown here and is causing some of the issues. I never had an issue with the code running, but I just felt that there had to be an easier way that I was missing.
The way I was doing it originally just turned into a bunch of separate queries and was messy to work with. I essentially wrote a query to separate the table into new and old types. From there I used a bunch of
SUM(IIF([Description 1] = "x" OR [Description 2] = "x", 1, 0)) AS X
SUM(IIF([Description 1] = "y" OR [Description 2] = "y", 1, 0)) AS Y
expressions to count my totals for each of the objects. This would give me a query where all the totals were displayed in columns. Then I created a separate query to join these data sets together into a presentable manner, but it was turning into too much for how many different "types" I had.
I was just looking for a way to combine all of this into 1 query that would make pulling reports much easier.
I strongly advise against using spaces or reserved words in names. Date is a reserved word.
Consider:
Query1
SELECT Type, Description1 AS D, [Date], 1 AS Category FROM Table1
UNION SELECT Type, Description2, [Date], 2 FROM Table1;
UNION will not allow duplicate rows. Use UNION ALL to include all records, even if there are duplicates. There is no query designer or wizard for UNION; you must type or paste the SQL in the SQL View of the query builder.
Query2
TRANSFORM Nz(Count(Query1.Category),0) AS CountOfCategory
SELECT Query1.D
FROM Query1
GROUP BY Query1.D
PIVOT Query1.Type;
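The two-step idea (stack both description columns, then count per type) can be tried outside Access. The following is a minimal Python sketch using the standard sqlite3 module; SQLite has no TRANSFORM/PIVOT, so conditional sums stand in for the crosstab, and the names EntryDate, NewCount and OldCount are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Type TEXT, Description1 TEXT, Description2 TEXT, EntryDate TEXT);
INSERT INTO Table1 VALUES
    ('New', 'Shiny', 'Black', '1/1/2022'),
    ('New', 'Black', 'Dull',  '1/1/2022'),
    ('Old', 'Shiny', 'Grey',  '1/1/2022'),
    ('Old', 'Grey',  'Dull',  '1/1/2022');
""")

# Step 1 (inner query): stack both description columns into one column D.
# Step 2 (outer query): count rows per description, one column per Type.
pivot = conn.execute("""
SELECT D,
       SUM(CASE WHEN Type = 'New' THEN 1 ELSE 0 END) AS NewCount,
       SUM(CASE WHEN Type = 'Old' THEN 1 ELSE 0 END) AS OldCount
FROM (
    SELECT Type, Description1 AS D FROM Table1
    UNION ALL
    SELECT Type, Description2 FROM Table1
) AS unpivoted
GROUP BY D
ORDER BY D
""").fetchall()

for description, new_count, old_count in pivot:
    print(description, new_count, old_count)
```

Unlike TRANSFORM, this form needs one CASE expression per type, so it suits a small, known set of types.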
I need some help on this one. I have a query that I need to make work but I need to limit it by the results of another query.
SELECT ItemID, ItemNums
FROM dbo.Tables
ItemNums is a varchar field that is used to store the strings of the various item numbers.
This produces the following.
ItemID ItemNums
1 1, 4, 5
2 1, 3, 4, 5
3 2
4 4
5 1
I have another table that stores each item number as an INT, and I need to use it to pull all ItemIDs that have the associated ItemNums.
Something like this.
SELECT *
FROM dbo.Tables
WHERE ItemNums IN (4,5)
Any help would be appreciated.
If possible, you should change your database schema. In general, it's not good to store comma delimited lists in a relational database.
However, if that's not an option, here's one way using a join with like:
select *
from dbo.Tables t
join dbo.SecondTable st on ', '+t.ItemNums+',' like '%, '+cast(st.ItemNumId as varchar(10))+',%'
This concatenates commas to the beginning and end of the itemnums to ensure you only match on the specific ids.
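Here is a minimal sketch of the same delimiter trick in Python with the standard sqlite3 module. SQLite is an assumption chosen so the example runs anywhere: it concatenates with || and converts the integer to text implicitly. The SecondTable name follows the query above, and row 6 is a hypothetical extra row added to show that 14 and 45 are not mistaken for 4 or 5.

```python
import sqlite3

# In-memory SQLite copy of the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tables (ItemID INTEGER, ItemNums TEXT);
CREATE TABLE SecondTable (ItemNumId INTEGER);
INSERT INTO Tables VALUES
    (1, '1, 4, 5'),
    (2, '1, 3, 4, 5'),
    (3, '2'),
    (4, '4'),
    (5, '1'),
    (6, '14, 45');  -- hypothetical row, not in the question
INSERT INTO SecondTable VALUES (4), (5);
""")

# Wrap the list as ', 1, 4, 5,' so every item is surrounded by the
# same delimiters, then look for ', 4,' or ', 5,' inside it.
matching_ids = [row[0] for row in conn.execute("""
SELECT DISTINCT t.ItemID
FROM Tables t
JOIN SecondTable st
  ON ', ' || t.ItemNums || ',' LIKE '%, ' || st.ItemNumId || ',%'
ORDER BY t.ItemID
""")]

print(matching_ids)
```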
I personally would recommend normalizing your dbo.tables.
It would be better as:
ItemID ItemNums
1 1
1 4
1 5
2 1
etc.
Then you can use a join or a sub query to pull out the rows with ItemNums in some list.
Otherwise, it's going to be a mess and not very fast.
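To make the normalized layout concrete, here is a minimal Python sketch using the standard sqlite3 module. The table name ItemNumbers is hypothetical, and splitting the lists in Python is just one possible way to do the one-off conversion.

```python
import sqlite3

# The delimited rows exactly as shown in the question.
delimited = [(1, "1, 4, 5"), (2, "1, 3, 4, 5"), (3, "2"), (4, "4"), (5, "1")]

# One row per (ItemID, ItemNum) pair; int() tolerates the space after each comma.
normalized = [
    (item_id, int(num))
    for item_id, nums in delimited
    for num in nums.split(",")
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemNumbers (ItemID INTEGER, ItemNum INTEGER)")
conn.executemany("INSERT INTO ItemNumbers VALUES (?, ?)", normalized)

# With one value per row, a plain IN works and an index on ItemNum can be used.
matching_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT ItemID FROM ItemNumbers WHERE ItemNum IN (4, 5) ORDER BY ItemID"
)]

print(matching_ids)
```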
Let's say there is a table called ITEM that contains 3 attributes (name, id, price):
name id price
Apple 1 3
Orange 1 3
Banana 2 4
Cherry 3 5
Mango 1 3
How should I write a query that uses a constant selection operator to select those items that have the same price and the same id? The first thing that comes to mind is to use the rename operator to rename id to id' and price to price', then union it with the ITEM table. But since I need to select on 2 conditions (price = price' and id = id'), how can I select them without using the conjunction operator in relational algebra?
Thank you.
I'm not quite sure but for me, it would be something like this in relational calculus:
and then in SQL:
SELECT name FROM ITEM i WHERE
EXISTS (SELECT 1 FROM ITEM u
WHERE u.name <> i.name
AND u.price = i.price
AND u.id = i.id)
But still, I think your assumption is right, you can still do it by renaming. I do believe it is a bit longer than what I did above.
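The SQL form can be checked against the sample data with a minimal Python sketch using the standard sqlite3 module. SQLite and the ORDER BY are assumptions added only to make the run reproducible.

```python
import sqlite3

# In-memory SQLite copy of the ITEM relation from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ITEM (name TEXT, id INTEGER, price INTEGER);
INSERT INTO ITEM VALUES
    ('Apple', 1, 3), ('Orange', 1, 3), ('Banana', 2, 4),
    ('Cherry', 3, 5), ('Mango', 1, 3);
""")

# Keep an item only if some *other* item shares both its id and its price.
names = [row[0] for row in conn.execute("""
SELECT i.name
FROM ITEM i
WHERE EXISTS (
    SELECT 1 FROM ITEM u
    WHERE u.name <> i.name
      AND u.price = i.price
      AND u.id = i.id
)
ORDER BY i.name
""")]

print(names)
```

Banana and Cherry drop out because no other row shares their id and price.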
I am working in Teradata with some descriptive data that needs to be transformed from a generic varchar(60) into different field lengths based on the type of data element and the attribute value. So I need to take whatever is in the varchar(60) and, based on field 'ABCD', act on field 'XYZ'. In this case XYZ is a varchar(3). To do this I am using CASE logic within my select. What I want to do is:
eliminate all occurrences of non-alphanumeric data. All I want left are upper-case alpha characters and numbers.
In this case, where abcd = 'GROUP', xyz should come out as '000', '002', 'A', 'C'
eliminate extra padding
shift everything right
abcd xyz
1 GROUP NULL
2 GROUP $
3 GROUP 000000000000000000000000000000000000000000000000000000000000
4 GROUP 000000000000000000000000000000000000000000000000000000000002
5 GROUP A
6 GROUP C
7 GROUP r
To do this I have tried TRIM and SUBSTR, amongst several other things that did not work. I have pasted what I have working now, but I am not reliably working through the data within the select. I am really looking for some options on how to better work with strings in Teradata. I have been working out of the "SQL Functions, Operators, Expressions and Predicates" online PDF. Is there a better reference? We are on TD 13.
SELECT abcd
, CASE
-- xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
WHEN abcd= 'GROUP'
THEN(
CASE
WHEN SUBSTR(tx.abcd,60, 4) = 0
THEN (
SUBSTR(tx.abcd,60, 3)
)
ELSE
TRIM (TRAILING FROM tx.abcd)
END
)
END AS abcd
FROM db.descr tx
WHERE tx.abcd IS IN ( 'GROUP')
The end result should look like this
abcd xyz
1 GROUP 000
2 GROUP 002
3 GROUP A
4 GROUP C
I will have to deal with approx 60 different "abcd" types, but they should all conform to the type of data I am currently seeing.. ie.. mixed case, non numeric, non alphabet, padded, etc..
I know there is a better way, but I have come in several circles trying to figure this out over the weekend and need a little push in the right direction.
Thanks in advance,
Pat
The SQL below uses the CHARACTER_LENGTH function to first determine if there is a need to perform what amounts to a RIGHT(tx.xyz, 3) using the native functions in Teradata 13.x. I think this may accomplish what you are looking to do. I hope I have not misinterpreted your explanation:
SELECT CASE WHEN tx.abcd = 'GROUP'
AND CHARACTER_LENGTH(TRIM(BOTH FROM tx.xyz)) > 3
THEN SUBSTRING(TRIM(BOTH FROM tx.xyz) FROM (CHARACTER_LENGTH(TRIM(BOTH FROM tx.xyz)) - 2))
ELSE tx.xyz
END
FROM db.descr tx;
EDIT: Fixed parenthesis in SUBSTRING
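Since the rules are easy to get subtly wrong in nested string functions, it may help to pin them down outside the database first. The following Python sketch is not Teradata syntax; clean_group_value is a hypothetical helper that encodes my reading of the question: drop everything except upper-case letters and digits, keep the rightmost 3 characters, and treat a value with nothing left as missing.

```python
import re

def clean_group_value(raw, width=3):
    """Hypothetical reference for the cleanup rules described in the question."""
    if raw is None:
        return None
    # Keep only upper-case letters and digits; this also removes padding.
    kept = re.sub(r"[^A-Z0-9]", "", raw)
    # Rightmost `width` characters; an empty result counts as missing.
    return kept[-width:] or None

samples = [None, "$", "0" * 60, "0" * 59 + "2", "A", "C", "r"]
cleaned = [clean_group_value(value) for value in samples]

# Rows whose value cleans to nothing are dropped, as in the expected output.
result = [value for value in cleaned if value is not None]
print(result)
```

A reference like this gives you expected values to compare against when you test the CASE expression on each of the roughly 60 abcd types.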
I am trying to write some SQL that will accept a set of letters and return all of the possible words it can make. My first thought was to create a basic three table database like so:
Words -- contains 200k words in real life
------
1 | act
2 | cat
Letters -- contains the whole alphabet in real life
--------
1 | a
3 | c
20 | t
WordLetters --First column is the WordId and the second column is the LetterId
------------
1 | 1
1 | 3
1 | 20
2 | 3
2 | 1
2 | 20
But I'm a bit stuck on how I would write a query that returns words that have an entry in WordLetters for every letter passed in. It also needs to account for words that have two of the same letter. I started with this query, but it obviously does not work:
SELECT DISTINCT w.Word
FROM Words w
INNER JOIN WordLetters wl
ON wl.LetterId = 20 AND wl.LetterId = 3 AND wl.LetterId = 1
How would I write a query to return only words that contain all of the letters passed in and accounting for duplicate letters?
Other info:
My Word table contains close to 200,000 words which is why I am trying to do this on the database side rather than in code. I am using the enable1 word list if anyone cares.
Ignoring, for the moment, the SQL part of the problem, the algorithm I'd use is fairly simple: start by taking each word in your dictionary, and producing a version of it with the letters in sorted order, along with a pointer back to the original version of that word.
This would give a table with entries like:
sorted_text word_id
act 123 /* we'll assume `act` was word number 123 in the original list */
act 321 /* we'll assume 'cat' was word number 321 in the original list */
Then when we receive an input (say, "tac") we sort its letters, look it up in our table of sorted letters joined to the table of the original words, and that gives us a list of the words that can be created from that input.
If I were doing this, I'd have the tables for that in a SQL database, but probably use something else to pre-process the word list into the sorted form. Likewise, I'd probably leave sorting the letters of the user's input to whatever I was using to create the front-end, so SQL would be left to do what it's good at: relational database management.
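A minimal Python sketch of that pre-processing and lookup, assuming the word list fits in memory; the function names signature and build_index are hypothetical, and in practice the sorted form would be stored in an indexed database column rather than a dictionary.

```python
from collections import defaultdict

def signature(word):
    """Letters of the word in sorted order, e.g. 'cat' -> 'act'."""
    return "".join(sorted(word))

def build_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = defaultdict(list)
    for word in words:
        index[signature(word)].append(word)
    return index

index = build_index(["act", "cat", "tom", "moot"])

# Sort the user's input the same way, then do one exact lookup.
matches = index.get(signature("tac"), [])
print(matches)
```

Because repeated letters stay in the signature, "tom" and "moot" get different keys, so duplicate letters are accounted for without extra work.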
If you use the solution you propose, you'll need to add an order column to the WordLetters table. Without that, there's no guarantee that the rows you retrieve come back in the same order you inserted them.
However, I think I have a better solution. Based on your question, it appears that you want to find all words with the same component letters, independent of order or number of occurrences. This means that you have a limited number of possibilities. If you translate each letter of the alphabet into a different power of two, you can create a unique value for each combination of letters (aka a bitmask). You can then simply add together the values for each letter found in a word. This will make matching the words trivial, as all words with the same letters will map to the same value. Here's an example:
WITH letters
AS (SELECT Cast('a' AS VARCHAR) AS Letter,
1 AS LetterValue,
1 AS LetterNumber
UNION ALL
SELECT Cast(Char(97 + LetterNumber) AS VARCHAR),
Power(2, LetterNumber),
LetterNumber + 1
FROM letters
WHERE LetterNumber < 26),
words
AS (SELECT 1 AS wordid, 'act' AS word
UNION ALL SELECT 2, 'cat'
UNION ALL SELECT 3, 'tom'
UNION ALL SELECT 4, 'moot'
UNION ALL SELECT 5, 'mote')
SELECT wordid,
word,
Sum(distinct LetterValue) as WordValue
FROM letters
JOIN words
ON word LIKE '%' + letter + '%'
GROUP BY wordid, word
As you'll see if you run this query, "act" and "cat" have the same WordValue, as do "tom" and "moot", despite the difference in number of characters.
What makes this better than your solution? You don't have to build a lot of non-words to weed them out. This will constitute a massive savings of both storage and processing needed to perform the task.
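The same bitmask can be computed outside SQL. Here is a minimal Python sketch, assuming lower-case a-z words; letter_mask is a hypothetical name, and bitwise OR plays the role of Sum(distinct LetterValue) in the query above.

```python
def letter_mask(word):
    """Set bit n for the n-th letter of the alphabet if it occurs in the word."""
    mask = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            # OR-ing makes repeated letters count only once.
            mask |= 1 << (ord(ch) - ord("a"))
    return mask

words = ["act", "cat", "tom", "moot", "mote"]
masks = {word: letter_mask(word) for word in words}

for word, mask in masks.items():
    print(word, mask)
```

As with the SQL version, the mask ignores how many times a letter occurs, so it only fits if duplicate letters do not matter.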
There is a solution to this in SQL. It involves using a trick to count the number of times that each letter appears in a word. The following expression counts the number of times that 'a' appears:
select len(word) - len(replace(word, 'a', ''))
The idea is to count the total of all the letters in the word and see if that matches the overall length:
select wls.word, (LEN(wls.word) - SUM(wls.LettersInWord))
from
(
select w.word, (LEN(w.word) - LEN(replace(w.word, wl.letter, ''))) as LettersInWord
from word w
cross join wordletters wl
) wls
group by wls.word
having (LEN(wls.word) = SUM(wls.LettersInWord))
This particular solution allows multiple occurrences of a letter. I'm not sure if this was desired in the original question or not. If we want up to a certain number of occurrences, then we might do the following:
select wls.word, (LEN(wls.word) - SUM(wls.LettersInWord))
from
(
select w.word,
(case when (LEN(w.word) - LEN(replace(w.word, wl.letter, ''))) <= wl.maxcount
then (LEN(w.word) - LEN(replace(w.word, wl.letter, '')))
else wl.maxcount end) as LettersInWord
from word w
cross join
(
select letter, count(*) as maxcount
from wordletters
group by letter
) wl
) wls
group by wls.word
having (LEN(wls.word) = SUM(wls.LettersInWord))
If you want an exact match to the letters, then the case statement should use " = maxcount" instead of " <= maxcount".
In my experience, I have actually seen decent performance with small cross joins. This might actually work server-side. There are two big advantages to doing this work on the server. First, it takes advantage of the parallelism on the box. Second, a much smaller set of data needs to be transferred across the network.
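Here is a minimal sketch of the capped-count version in Python using the standard sqlite3 module. SQLite is an assumption made for portability: LENGTH stands in for LEN, and the two-argument MIN replaces the CASE expression. It also assumes, as the queries above do, that wordletters holds one row per letter passed in; the word 'tact' is a hypothetical addition that needs two t's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (word TEXT);
CREATE TABLE wordletters (letter TEXT);
INSERT INTO word VALUES ('act'), ('cat'), ('tom'), ('moot'), ('tact');
INSERT INTO wordletters VALUES ('t'), ('a'), ('c');  -- the letters passed in
""")

# For each word and each supplied letter, count occurrences with the
# LENGTH - LENGTH(REPLACE(...)) trick, capped at how many were supplied.
# A word qualifies when the capped counts add up to its full length.
found = [row[0] for row in conn.execute("""
SELECT wls.word
FROM (
    SELECT w.word,
           MIN(LENGTH(w.word) - LENGTH(REPLACE(w.word, wl.letter, '')),
               wl.maxcount) AS LettersInWord
    FROM word w
    CROSS JOIN (
        SELECT letter, COUNT(*) AS maxcount
        FROM wordletters
        GROUP BY letter
    ) wl
) wls
GROUP BY wls.word
HAVING LENGTH(wls.word) = SUM(wls.LettersInWord)
ORDER BY wls.word
""")]

print(found)
```

The cap is what rejects 'tact': only one 't' was supplied, so its capped counts sum to 3 rather than 4.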