I'm trying to create a new column in my BigQuery table based on the partial match values in another column. Here is an example of what I'm trying to do. Let's say this is my table:
id   statement
123  I like green
234  I love green
456  I like orange
123  I hate yellow
What I want to do is create a new table that looks like this:
id   statement      color
123  I like green   green
234  I love green   green
456  I like orange  orange
123  I hate yellow  None
I've tried using a query like this:
SELECT ID, Statement
CASE WHEN Statement IN ('green') THEN 'green' ELSE 'None' END AS color
CASE WHEN Statement IN ('orange') THEN 'orange' ELSE 'None' END AS color
FROM `data.Table1`
but this, of course, only works if it's an exact match. How can I get a new column with a value based on a partial match like described above? Thanks in advance.
You are close:
SELECT ID, Statement,
(CASE WHEN Statement LIKE '%green%' THEN 'green'
WHEN Statement LIKE '%orange%' THEN 'orange'
ELSE 'None'
END) as color
FROM `data.Table1`;
Note: This returns the first color that is matched, which is consistent with the way that you phrased the question.
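To see the first-match behaviour concretely, here is a minimal sketch that runs the same CASE expression in SQLite through Python's sqlite3 module instead of BigQuery. The in-memory table and the extra row containing two colors are assumptions added for illustration. Note that SQLite's LIKE ignores ASCII case by default, which may differ from BigQuery.

```python
import sqlite3

# In-memory stand-in for `data.Table1`; the last row is a hypothetical
# addition that mentions two colors at once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Statement TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [
        (123, "I like green"),
        (234, "I love green"),
        (456, "I like orange"),
        (123, "I hate yellow"),
        (789, "I like green and orange"),
    ],
)

rows = conn.execute(
    """
    SELECT ID, Statement,
           CASE WHEN Statement LIKE '%green%'  THEN 'green'
                WHEN Statement LIKE '%orange%' THEN 'orange'
                ELSE 'None'
           END AS color
    FROM Table1
    ORDER BY rowid
    """
).fetchall()

# The WHEN branches are tested top to bottom, so the two-color row gets 'green'.
colors = [color for _, _, color in rows]
print(colors)
```

If a row should report every color it mentions, a single CASE is not enough; one column (or one joined row) per color would be needed instead.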
I'm hoping someone can help me with a SQL select statement for the following problem:
I have the following data in a table:
date        color
01/23/2023  RED
01/23/2023  BLUE
01/23/2023  GREEN
01/23/2023  GREEN
01/23/2023  RED
01/23/2023  BLUE
01/23/2023  GREEN
01/24/2023  BLUE
01/24/2023  GREEN
01/24/2023  BLUE
01/24/2023  RED
01/25/2023  GREEN
01/25/2023  BLUE
01/25/2023  GREEN
01/25/2023  BLUE
and I would like to produce this:
date        RED  BLUE  GREEN
01/23/2023  2    2     3
01/24/2023  1    2     1
01/25/2023  0    2     2
I'll have the exact colors so I don't need that to be determined. Sorry, but I don't have any idea how to even begin writing this select statement.
Thanks.
SELECT date,
SUM(CASE WHEN color = 'RED' THEN 1 ELSE 0 END) AS RED,
SUM(CASE WHEN color = 'BLUE' THEN 1 ELSE 0 END) AS BLUE,
SUM(CASE WHEN color = 'GREEN' THEN 1 ELSE 0 END) AS GREEN
FROM table1
GROUP BY date
ORDER BY date;
You can use SUM with CASE WHEN:
SELECT
edate,
SUM (CASE WHEN color = 'RED' THEN 1 ELSE 0 END) AS RED,
SUM (CASE WHEN color = 'BLUE' THEN 1 ELSE 0 END) AS BLUE,
SUM (CASE WHEN color = 'GREEN' THEN 1 ELSE 0 END) AS GREEN
FROM tablename
GROUP BY edate
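As a quick check of the conditional-aggregation pattern, the sketch below loads the question's sample rows into an in-memory SQLite table (via Python's sqlite3, an assumption made only so the query can be run anywhere) and executes the same SUM(CASE ...) query. The names tablename and edate follow the answer above.

```python
import sqlite3

# Sample rows from the question, grouped by date for brevity.
data = {
    "01/23/2023": ["RED", "BLUE", "GREEN", "GREEN", "RED", "BLUE", "GREEN"],
    "01/24/2023": ["BLUE", "GREEN", "BLUE", "RED"],
    "01/25/2023": ["GREEN", "BLUE", "GREEN", "BLUE"],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (edate TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(d, c) for d, colors in data.items() for c in colors],
)

# Each SUM adds 1 for rows of its color and 0 otherwise, so a date with no
# RED rows still yields 0 rather than a missing value.
counts = conn.execute(
    """
    SELECT edate,
           SUM(CASE WHEN color = 'RED'   THEN 1 ELSE 0 END) AS RED,
           SUM(CASE WHEN color = 'BLUE'  THEN 1 ELSE 0 END) AS BLUE,
           SUM(CASE WHEN color = 'GREEN' THEN 1 ELSE 0 END) AS GREEN
    FROM tablename
    GROUP BY edate
    ORDER BY edate
    """
).fetchall()

for row in counts:
    print(row)
```

The dates are stored as text here, so ORDER BY sorts them as strings; that is only safe for this sample because all rows share the same month and year.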
First, it would be mighty helpful if you posted your example data as "Readily Consumable Data" on future posts. Like this for this post...
--===== Create "Readily Consumable Data" to help those
-- that would help you.
DROP TABLE IF EXISTS #TestTable;
GO
SELECT *
INTO #TestTable
FROM (VALUES
('01/23/2023','RED')
,('01/23/2023','BLUE')
,('01/23/2023','GREEN')
,('01/23/2023','GREEN')
,('01/23/2023','RED')
,('01/23/2023','BLUE')
,('01/23/2023','GREEN')
,('01/24/2023','BLUE')
,('01/24/2023','GREEN')
,('01/24/2023','BLUE')
,('01/24/2023','RED')
,('01/25/2023','GREEN')
,('01/25/2023','BLUE')
,('01/25/2023','GREEN')
,('01/25/2023','BLUE')
)d(date,color)
;
Giving credit where credit is certainly due, both @Learning and @Horaciux gave excellent CROSSTAB answers that do exactly what you ask.
In anticipation of what the people who originally asked you for this output will likely ask for next, you can also easily add a "DateTotal" and a "ColorTotal". CROSSTABs make this easy. PIVOTs? Not so much, and I'm not going to demo that.
Here's the code to include the additional DateTotal column and ColorTotal row. I also converted the CASE statements to IIF to make things less busy. To be sure, though, IIF resolves to CASE statements behind the scenes and that means there's no performance advantage. If you hate IIF, like some do, just change them back to CASE and Bob's your uncle. :D
SELECT Date = IIF(GROUPING(date) = 1, 'Total',CONVERT(CHAR(10),date,101))
,Red = SUM(IIF(color = 'RED' ,1,0))
,Blue = SUM(IIF(color = 'BLUE' ,1,0))
,Green = SUM(IIF(color = 'GREEN',1,0))
,DateTotal = COUNT(*)
FROM #TestTable
GROUP BY date WITH ROLLUP
ORDER BY GROUPING(date),Date
;
That produces one row per date with its DateTotal, followed by a final 'Total' row holding the per-color totals.
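The totals can be reproduced outside SQL Server as well. The sketch below is an emulation, not the query above: SQLite has no WITH ROLLUP, so it appends the grand-total row with UNION ALL. The table name tablename, the column edate, and the helper column is_total are assumptions made for the example.

```python
import sqlite3

data = {
    "01/23/2023": ["RED", "BLUE", "GREEN", "GREEN", "RED", "BLUE", "GREEN"],
    "01/24/2023": ["BLUE", "GREEN", "BLUE", "RED"],
    "01/25/2023": ["GREEN", "BLUE", "GREEN", "BLUE"],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (edate TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(d, c) for d, colors in data.items() for c in colors],
)

# First branch: one row per date. Second branch: a single grand-total row.
# is_total is a hypothetical helper column used only to sort 'Total' last.
rows = conn.execute(
    """
    SELECT edate AS label,
           SUM(CASE WHEN color = 'RED'   THEN 1 ELSE 0 END) AS red,
           SUM(CASE WHEN color = 'BLUE'  THEN 1 ELSE 0 END) AS blue,
           SUM(CASE WHEN color = 'GREEN' THEN 1 ELSE 0 END) AS green,
           COUNT(*) AS date_total,
           0 AS is_total
    FROM tablename
    GROUP BY edate
    UNION ALL
    SELECT 'Total',
           SUM(CASE WHEN color = 'RED'   THEN 1 ELSE 0 END),
           SUM(CASE WHEN color = 'BLUE'  THEN 1 ELSE 0 END),
           SUM(CASE WHEN color = 'GREEN' THEN 1 ELSE 0 END),
           COUNT(*),
           1
    FROM tablename
    ORDER BY is_total, label
    """
).fetchall()

# Drop the helper column before displaying the report.
report = [row[:-1] for row in rows]
for row in report:
    print(row)
```

On SQL Server the ROLLUP version is shorter and scans the table once; the UNION ALL form is only a portable stand-in.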
A simplified version of what I'm asking is: say I already have a column for pedestrian jacket colors, and it is very specific, including values like canary, gold, dandelion, yellow, lemonade, sky blue, baby blue, cerulean, etc. I want to create 2 new columns,
"Color Group" and "Light-Dark", so that I can label all of the yellow-ish jackets under Color Group yellow and classify all of them as light colors.
Please note that I'm working within a database product, and all of the data is stored in the database; I cannot create new data within the actual database. But in the reporting module, I can create columns based on existing columns. This data would only exist in the report and not in the database. Also, the way it's set up, I don't need the SELECT command (that just happens by default), but feel free to put it in your response; it won't confuse me.
I've tried:
if
customer_jacket_color = "dandelion"
then
"Yellow"
Also tried it with WHEN instead of "if"
when customer_jacket_color_group = 'Yellow'
then
'light'
My expected results would take the 1 column of colors and give me 3 columns that describe the colors based on how I map the original values to the new columns.
Dandelion, Yellow, Light
Navy Blue, Blue, Dark
Mint, Green, Light...
I think you want a case expression:
select t.*,
(case when customer_jacket_color = 'dandelion' then 'Yellow'
when customer_jacket_color = 'sky blue' then 'Blue'
. . .
end) as color_group,
(case when customer_jacket_color = 'dandelion' then 'light'
when customer_jacket_color = 'sky blue' then 'dark'
. . .
end) as light_dark
from t;
I have a table CategoryColours, if a category is not found, it should return the colors from the default category "*".
Example if the table contains these lines:
Category Color
* white
* black
1 red
1 blue
1 green
1 black
If I search the category "1", the query should get the 4 colors.
If I search the category "2", which has no records in the table, the query should get the 2 colors from the category "*".
Is it possible to use OpenSQL to get the exact list that I need in a single statement?
I tried with CASE and subqueries (EXIST) but I didn't manage.
It's not a stopper for my code, since I can just check if my category has records first or select my category + the default always and then remove the default if the other has records.
I think you can use UNION. I haven't tested this code, so it may contain errors.
SELECT
category,
color
FROM CategoryColours
WHERE category = @lv_category
UNION
SELECT
category,
color
FROM CategoryColours
WHERE category eq '*'
AND NOT EXISTS ( SELECT color
FROM CategoryColours
WHERE category = @lv_category
)
INTO TABLE @DATA(lt_itab).
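The fallback logic of this UNION can be checked independently of ABAP. The sketch below runs the equivalent query in SQLite through Python's sqlite3; the helper colours_for and the named parameter :cat are hypothetical names introduced for the example, standing in for lv_category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CategoryColours (category TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO CategoryColours VALUES (?, ?)",
    [
        ("*", "white"), ("*", "black"),
        ("1", "red"), ("1", "blue"), ("1", "green"), ("1", "black"),
    ],
)

# The second branch contributes rows only when the first branch is empty,
# because NOT EXISTS is false as soon as the category has any record.
QUERY = """
SELECT category, color FROM CategoryColours WHERE category = :cat
UNION
SELECT category, color FROM CategoryColours
WHERE category = '*'
  AND NOT EXISTS (SELECT 1 FROM CategoryColours WHERE category = :cat)
"""

def colours_for(cat):
    """Return the sorted colors for a category, or the '*' defaults."""
    rows = conn.execute(QUERY, {"cat": cat}).fetchall()
    return sorted(color for _, color in rows)

print(colours_for("1"))  # category with its own records
print(colours_for("2"))  # unknown category falls back to '*'
```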
I have a table similar to the following:
Date Description Value1 Value2
01/01/2012 shiny colour 2 0
01/01/2012 yellow colour 2 2
03/01/2012 matt colour 2 2
03/01/2012 matt 4 1
03/01/2012 shiny 2 2
I want to write a SELECT SQL query (T-SQL) that will output all of the above columns but also display an extra column as the output of the SELECT statement whose value depends on the presence of the word "colour" in the Description (if "colour" is present it would be one value, if not it would show a different value).
(I would also want to display another extra column on top of that whose value depends on the presence of the words "matt" or "shiny" in the Description column. But I assume the method of doing this would be similar).
I believe I should be able to do this using the COALESCE function, but I'm not familiar with it and am struggling to get anything working.
EXTENSION
Hey, thanks for your answers. They're really helpful. I have one more extension to the question. My second generated column relies on info in the first generated column. So something like:
SELECT *,
CASE
WHEN Description LIKE '%colour%' THEN 'SomeValue'
ELSE 'Unclassified'
END AS Category1,
CASE
WHEN AnotherColumn LIKE 'Something' THEN 'SomeValue'
WHEN Category1='Unclassified' THEN 'Unclassified'
ELSE 'Generic'
END AS Category2
FROM table_name
How do I get the output of Category2 to rely on output of Category1? I'm trying something like the above but it's not working.
My extension question was answered here: T-SQL CASE statement relies on another CASE statement in same SELECT query
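The linked answer is not reproduced here. One common way to let a second CASE read the result of the first is to compute Category1 in a derived table and reference it from the outer query, since a column alias cannot be used by another expression in the same SELECT list. The sketch below demonstrates that idea in SQLite via Python's sqlite3; the id column and the sample rows are assumptions, and AnotherColumn is the placeholder column from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "id INTEGER PRIMARY KEY, Description TEXT, AnotherColumn TEXT)"
)
conn.executemany(
    "INSERT INTO table_name (Description, AnotherColumn) VALUES (?, ?)",
    [
        ("shiny colour", "x"),
        ("matt", "x"),
        ("matt", "Something"),
    ],
)

# The inner query defines Category1; the outer query can then treat it
# like any ordinary column when building Category2.
rows = conn.execute(
    """
    SELECT Description, AnotherColumn, Category1,
           CASE
               WHEN AnotherColumn LIKE 'Something' THEN 'SomeValue'
               WHEN Category1 = 'Unclassified'     THEN 'Unclassified'
               ELSE 'Generic'
           END AS Category2
    FROM (
        SELECT *,
               CASE
                   WHEN Description LIKE '%colour%' THEN 'SomeValue'
                   ELSE 'Unclassified'
               END AS Category1
        FROM table_name
    ) AS t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

The same shape should carry over to T-SQL, where a CTE or CROSS APPLY is another frequently used option.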
SELECT *,
CASE WHEN Description LIKE '%colour%' THEN
1
ELSE
0
END AS HasColour,
CASE WHEN Description LIKE '%matt%' THEN
1
ELSE
0
END AS HasMatt,
CASE WHEN Description LIKE '%shiny%' THEN
1
ELSE
0
END AS HasShiny
FROM table_name
You would just add more columns for all the different words that you want to search for. Obviously you can change the return type of the columns to whatever you want, but I thought a boolean would be suitable in this situation.
Unless I misunderstand what you are asking, you could use a case statement:
SELECT Date,
Description,
Value1,
Value2,
Case when Description like '%colour%' then OTHERCOL else OTHERCOL2 end as Colourful,
Case when Description like '%matt%' then OTHERCOL else OTHERCOL2 end as Matt,
Case when Description like '%shiny%' then OTHERCOL else OTHERCOL2 end as Shiny
FROM yourTable
I have 3 tables:
Silks:
Silk_Skey  Name
1          Black White Checks Yellow Arms
2          Black Crimson Stripes
3          Crimson Yellow Stripes

SubColour:
Sub Colour  Major Colour
Black       Black
White       White
Yellow      Yellow
Crimson     Red

MajorColour:
MajorColour_Skey  Major Colour
1                 Black
2                 White
3                 Yellow
4                 Red
And I want to achieve this:
ID Silk_Skey MajorColour_Skey
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 3
7 3 4
What I need to do is create a linked table matching all the colours from the 3 tables and break down the silks names (so I would show 4 lines in the new table); see the SQL below. My boss has advised me to use an 'IS IN' query, but I have no idea what that is. Can you help?
SELECT s.Silks_Skey, mc.MajorColour_Skey
FROM Silks s INNER JOIN SubColour sc on sc.SubColour **'IS IN HERE'** s.SilksName
INNER JOIN MajorColour mc
ON sc.MajorColour = mc.MajorColour
You can use IN
AND table.column IN ('a','b','c')
or
AND table.column IN (1,2,3)
or if you're looking for a string like something you can do
AND table.column LIKE '%word' -- table.column ends with 'word'
AND table.column LIKE 'word%' -- table.column starts with 'word'
AND table.column LIKE '%word%' -- table.column has 'word' anywhere in the column
This design is doomed to poor performance, and queries against it will be awkward and painful to write. If your database will never be large, then it may be workable, but if it will be large, you cannot use this design structure and hope to have good performance because you will not be able to properly use indexes. Personally I would add a silk colors table related to the silks table and store the colors individually. One of the first rules of database design is never store more than one piece of information in a field. You are storing a list, which always means you need a related table to have effective use of the database.
One clue to a bad (and over time usually unworkable) database design is if you need to join using functions or calculations of any type, or if you need to use wildcards at the start of a phrase in a LIKE clause. Fix this now and things will be much smoother, maintenance will take less time, and performance will be better. There is no upside to your current structure at all.
You may need to take a bit of extra time to parse and store the silk names by individual color, but the time you save in querying the database will be significant because you can now make use of a join and then use indexes. Search for fn_split and you will see a method of splitting the silk names into individual colors that you can use when you insert the records.
If you foolishly decide to retain the current structure, then look into using full-text search. It will be faster than using a LIKE clause with a wildcard as the first character.
For what you want to do, you need to do string manipulation because you are trying to compare one color to a list of colors in a string.
The like operator can do this. Try this on clause:
on ' '+ s.SilksName +' ' like '% '+sc.SubColour+' %'
This checks to see if a given color (sc.SubColour) is in the list (s.SilksName). For instance, if you have a list like 'RED GREEN', the padded list ' RED GREEN ' will match either '% RED %' or '% GREEN %'.
The purpose of concatenating white space is to avoid partial-word matches. For instance, "blue-green" would match both "blue" and "green" without the delimiters.
The following query returns 7 rows, which seems to be correct (3 for the first row in silks and 2 for each of the other two):
with silks as (
select 1 as silks_skey, 'Black White Checks Yellow Arms' as silksname union all
select 2, 'Black Crimson Stripes' union all
select 3, 'Crimson Yellow Stripes'
),
subcolour as (
select 'black' as subcolour, 'black' as majorcolour union all
select 'white', 'white' union all
select 'yellow', 'yellow' union all
select 'crimson', 'red'
),
MajorColour as (
select 1 as MajorColour_skey, 'black' as MajorColour union all
select 2, 'white' union all
select 3, 'yellow' union all
select 4, 'red'
)
SELECT s.Silks_Skey, mc.MajorColour_Skey
FROM Silks s INNER JOIN SubColour sc on ' ' + s.SilksName + ' ' like '% ' + sc.SubColour + ' %'
INNER JOIN MajorColour mc
ON sc.MajorColour = mc.MajorColour
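To illustrate why the padding matters, the sketch below runs the same join in SQLite through Python's sqlite3 (using || for concatenation instead of T-SQL's +) and compares it with a plain substring test. The fourth silk, 'Blackberry Sash', is a hypothetical row added to show a partial-word match; SQLite's LIKE ignores ASCII case by default, which is why the lowercase colour names still match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE silks (silks_skey INTEGER, silksname TEXT);
    CREATE TABLE subcolour (subcolour TEXT, majorcolour TEXT);
    CREATE TABLE majorcolour (majorcolour_skey INTEGER, majorcolour TEXT);
    INSERT INTO silks VALUES
        (1, 'Black White Checks Yellow Arms'),
        (2, 'Black Crimson Stripes'),
        (3, 'Crimson Yellow Stripes'),
        (4, 'Blackberry Sash');  -- hypothetical row for the comparison
    INSERT INTO subcolour VALUES
        ('black', 'black'), ('white', 'white'),
        ('yellow', 'yellow'), ('crimson', 'red');
    INSERT INTO majorcolour VALUES
        (1, 'black'), (2, 'white'), (3, 'yellow'), (4, 'red');
    """
)

TEMPLATE = """
SELECT s.silks_skey, mc.majorcolour_skey
FROM silks s
JOIN subcolour sc ON {condition}
JOIN majorcolour mc ON sc.majorcolour = mc.majorcolour
ORDER BY 1, 2
"""

# Whole-word match: pad both sides with spaces before comparing.
padded_rows = conn.execute(
    TEMPLATE.format(
        condition="' ' || s.silksname || ' ' LIKE '% ' || sc.subcolour || ' %'"
    )
).fetchall()

# Plain substring match: also fires inside longer words.
substring_rows = conn.execute(
    TEMPLATE.format(condition="instr(lower(s.silksname), sc.subcolour) > 0")
).fetchall()

print(padded_rows)
print(substring_rows)
```

With the padded condition 'Blackberry Sash' produces no row, while the substring condition links it to black.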
Sounds like what you really want to do is split the Name field on spaces and then for each one of those values which is contained in the colours table (joined on the sub-colour given that major colours are valid sub-colours too) you want one entry in a new table. Problem is that there is no intrinsic T-SQL function for splitting strings. To do that your best bet is to visit Erland Sommarskog's definitive answer on how to do this.
An alternative, and one which is not very neat and may or may not work, is to use the CONTAINS keyword in your predicate. However, in order to achieve this you need to use full-text indexing, and I suspect using Erland's excellent guides on splitting strings and arrays in SQL will be more appropriate and faster.
This is the answer folks, thanks for all your ideas.
Select S.[Silks_Skey], MC.[MajorColour_Skey]
from [dbo].[Silks] S
inner join [dbo].[SubColour] SC on CHARINDEX(SC.[SubColour],S.[SilksName]) <> 0
inner join [dbo].[MajorColour] MC on SC.[MajorColour] = MC.[MajorColour]
UNION ALL
Select S.[Silks_Skey], MC.[MajorColour_Skey]
from [dbo].[Silks] S
inner join [dbo].[MajorColour] MC on CHARINDEX(MC.[MajorColour],S.[SilksName]) <> 0
ORDER BY [Silks_Skey]