How does Partitioning By a Substring in T-SQL Work? - sql

I found the perfect example while browsing through sites of what I'm looking for. In this code example, all country names that appear in long formatted rows are concatenated together into one result, with a comma and space between each country.
Select CountryName from Application.Countries;
Select SUBSTRING(
(
SELECT ',' + CountryName AS 'data()'
FROM Application.Countries FOR XML PATH('')
), 2 , 9999) As Countries
Source: https://www.mytecbits.com/microsoft/sql-server/concatenate-multiple-rows-into-single-string
My question is: how can you partition these results with a second column that would read as "Continent" in such a way that each country would appear within its respective continent? The theoretical "OVER (PARTITION BY Continent)" in this example would not work without an aggregate function before it. Perhaps there is a better way to accomplish this? Thanks.

Use a continents table (you seem not to have one, so derive one with distinct), and then use the same code in a cross apply using the where as a "join" condition:
select *
from
(
select distinct continent from Application.Countries
) t1
cross apply
(
Select SUBSTRING(
(
SELECT ',' + CountryName AS 'data()'
FROM Application.Countries as c
where c.continent=t1.continent
FOR XML PATH('')
), 2 , 9999) As Countries
) t2
Note that it is more usual, and arguably more elegant, to use stuff(x,1,1,'') instead of substring(x,2,9999) to remove the first comma.
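As a rough sketch of the STUFF variant mentioned above, the query below groups by continent and builds each list with a correlated FOR XML PATH subquery. The sample rows are hypothetical stand-ins for Application.Countries, and the Continent column is an assumption taken from the question. The TYPE directive with .value() is an optional extra that keeps characters such as & from being XML-escaped.

```sql
-- Hypothetical sample data standing in for Application.Countries
WITH Countries AS (
    SELECT *
    FROM (VALUES
        ('Europe', 'France'),
        ('Europe', 'Spain'),
        ('Asia',   'Japan')
    ) AS v(Continent, CountryName)
)
SELECT c.Continent,
       -- STUFF(x, 1, 2, '') strips the leading ', ' from the concatenated list
       STUFF((
           SELECT ', ' + c2.CountryName
           FROM Countries AS c2
           WHERE c2.Continent = c.Continent   -- correlate on the outer continent
           ORDER BY c2.CountryName
           FOR XML PATH(''), TYPE
       ).value('.', 'nvarchar(max)'), 1, 2, '') AS Countries
FROM Countries AS c
GROUP BY c.Continent;
```

With this sample data, Europe should come back as "France, Spain" and Asia as "Japan". On SQL Server 2017 or later, STRING_AGG with GROUP BY would likely be simpler.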

Related

How to retrieve single column separated by comma with multiples values, and apply join in sql

I have the following two tables in SQL. I want to get the calendarid values from calenderschedule and join them with the calendar table to get the calendarCode for each productid. The output format is described below.
I am on MS SQL Server 2012, so STRING_SPLIT does not work. Please help me get the desired output.
Table1: calenderschedule
productid, calendarid
100 1,2,3
200 1,2
Table2: calendar
calendarid, calendarCode
1 SIB
2 SIN
3 SIS
Output:
productId, calendarCode
100 SIB,SIN,SIS
200 SIB,SIN
You can normalize the data by converting to XML and then using CROSS APPLY to split it. Once it's normalized, use the STUFF function to combine the calendar codes into a comma-separated list. Try this:
;WITH normalized_data as (
SELECT to_xml.productid
,split.split_calendarid
FROM
(
SELECT *,
cast('<X>'+replace(cs.calendarid,',','</X><X>')+'</X>' as XML) as xmlfilter
FROM calendarschedule cs
) to_xml
CROSS APPLY
(
SELECT new.D.value('.','varchar(50)') as split_calendarid
FROM to_xml.xmlfilter.nodes('X') as new(D)
) split
) select distinct
n.productid
,STUFF(
(SELECT distinct ',' + c.calendarCode
FROM calendar c
JOIN normalized_data n2 on n2.split_calendarid = c.calendarid
WHERE n2.productid = n.productid
FOR XML PATH ('')), 1, 1, '') calendarCode
from normalized_data n
I feel like this solution is a bit overly complex, but it's the only way I got it to work. If anybody knows how to simplify it, I'd love to hear some feedback.
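One possible simplification, offered only as a sketch: skip the XML split entirely and match each calendar id against the delimited list with LIKE, wrapping both sides in commas so that 1 does not match 11. This should work on SQL Server 2012, since it does not need STRING_SPLIT (which requires SQL Server 2016 or later). The table variables and their column types are assumptions based on the question's sample data, and the list is assumed to contain no spaces.

```sql
-- Hypothetical copies of the question's tables
DECLARE @calendarschedule TABLE (productid INT, calendarid VARCHAR(100));
DECLARE @calendar TABLE (calendarid INT, calendarCode VARCHAR(10));

INSERT @calendarschedule (productid, calendarid)
VALUES (100, '1,2,3'), (200, '1,2');

INSERT @calendar (calendarid, calendarCode)
VALUES (1, 'SIB'), (2, 'SIN'), (3, 'SIS');

SELECT cs.productid,
       STUFF((
           SELECT ',' + c.calendarCode
           FROM @calendar AS c
           -- ',1,2,3,' LIKE '%,2,%' : delimiter-wrapped match avoids partial hits
           WHERE ',' + cs.calendarid + ',' LIKE '%,' + CAST(c.calendarid AS VARCHAR(10)) + ',%'
           ORDER BY c.calendarid
           FOR XML PATH('')
       ), 1, 1, '') AS calendarCode
FROM @calendarschedule AS cs;
```

The LIKE predicate cannot use an index, so this trades performance for brevity. It is probably fine for small lookup tables, less so for large ones.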

SQL Query - Copy Data into One Field

I have a SQL database and I am writing a query:
SELECT *
FROM Consignments
INNER JOIN OrderDetail
ON Consignments.consignment_id = OrderDetail.consignment_id
INNER JOIN UserReferences
ON OrderDetail.record_id = UserReferences.record_id
WHERE Consignments.despatch_date = '2020-04-23'
The first column is consignment_id (from the Consignments table). The final column is senders_reference (from the UserReferences table).
Now - the issue I have is - that when I am running the query to pick up all consignments for a particular date - it is displaying multiple rows (with duplicated consignment_id) when there are multiple senders references within the database.
If there is one senders reference number - then there is only 1 row.
This makes sense - because within the front-end for the database the user can enter 1 or more senders references.
Now - what I would like to do is to amend my query for the resulting data to only display 1 row for all consignments and if there are multiple senders reference numbers - to have them within the one field, separated by commas.
Is this doable from the query stage?
Or if not - after export, is it possible to develop a bat file to do the same thing?
For reference - this is what I mean - this is the result I am getting at the moment:
This is what I need:
You can use older style with the help of for xml :
select t.consignment_id,
stuff((select ', ' + convert(varchar(255), t1.senders_reference)
from table t1
where t1.consignment_id = t.consignment_id
for xml path('')
), 1, 2, ''
) as senders_reference
from (select distinct consignment_id from table t) t;
Edit : You can use CTE :
with cte as (
<your query>
)
select t.consignment_id,
stuff((select ', ' + convert(varchar(255), t1.senders_reference)
from cte t1
where t1.consignment_id = t.consignment_id
for xml path('')
), 1, 2, ''
) as senders_reference
from (select distinct consignment_id from cte t) t;
You seem to want to use the STRING_AGG function.
This answer covers it nicely: "ListAGG in SQLSERVER".
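For illustration, a minimal STRING_AGG sketch, assuming SQL Server 2017 or later. The rows in the joined CTE are hypothetical and stand in for the output of the three-table join in the question; in practice that join would go inside the CTE.

```sql
-- Hypothetical rows standing in for the joined query's output
WITH joined AS (
    SELECT *
    FROM (VALUES
        (1001, 'REF-A'),
        (1001, 'REF-B'),
        (1002, 'REF-C')
    ) AS v(consignment_id, senders_reference)
)
SELECT consignment_id,
       -- one row per consignment, references joined with ', '
       STRING_AGG(senders_reference, ', ')
           WITHIN GROUP (ORDER BY senders_reference) AS senders_reference
FROM joined
GROUP BY consignment_id;
```

With this sample data, consignment 1001 should show "REF-A, REF-B" and 1002 should show "REF-C". On versions before 2017, the FOR XML PATH approach above remains the usual fallback.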

Can I use string_split with enforcing combination of labels?

So I have the following table:
Id Name Label
---------------------------------------
1 FirstTicket bike|motorbike
2 SecondTicket bike
3 ThirdTicket e-bike|motorbike
4 FourthTicket car|truck
I want to use string_split function to identify rows that have both bike and motorbike labels.
So the desired output in my example will be just the first row:
Id Name Label
--------------------------------------
1 FirstTicket bike|motorbike
Currently, I am using the following query but it is returning row 1,2 and 3. I only want the first. Is it possible?
SELECT Id, Name, Label FROM tickets
WHERE EXISTS (
SELECT * FROM STRING_SPLIT(Label, '|')
WHERE value IN ('bike', 'motorbike')
)
You can use APPLY & do aggregation :
SELECT t.id, t.Name, t.Label
FROM tickets t CROSS APPLY
STRING_SPLIT(t.Label, '|') t1
WHERE t1.value IN ('bike', 'motorbike')
GROUP BY t.id, t.Name, t.Label
HAVING COUNT(DISTINCT t1.value) = 2;
However, storing delimited labels breaks normalization rules; you should have a separate table for the ticket labels.
You could just use string functions for this:
select t.*
from mytable t
where
'|' + label + '|' like '%|bike|%'
and '|' + label + '|' like '%|motorbike|%'
I would expect this to be more efficient than other methods that split and aggregate.
Please note, however, that you should really consider fixing your data model. Instead of storing delimited lists, you should have a separate table to represent the relation between tickets and labels, with one row per ticket/label tuple. Storing delimited lists in a database column is a well-known SQL antipattern that should be avoided at all costs (hard to maintain, hard to query, hard to enforce data integrity, inefficient, ...). You can have a look at this famous SO post for more on this topic.
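A minimal sketch of what that normalized model could look like. The @ticket_labels name and its columns are hypothetical, and the rows mirror the question's sample data. With one row per ticket/label pair, "has both labels" becomes a plain GROUP BY ... HAVING query with no string splitting.

```sql
-- Hypothetical junction table: one row per ticket/label pair
DECLARE @ticket_labels TABLE (
    ticket_id INT         NOT NULL,
    label     VARCHAR(50) NOT NULL,
    PRIMARY KEY (ticket_id, label)   -- also prevents duplicate labels per ticket
);

INSERT @ticket_labels (ticket_id, label)
VALUES (1, 'bike'), (1, 'motorbike'),
       (2, 'bike'),
       (3, 'e-bike'), (3, 'motorbike'),
       (4, 'car'), (4, 'truck');

-- Tickets that carry both labels
SELECT tl.ticket_id
FROM @ticket_labels AS tl
WHERE tl.label IN ('bike', 'motorbike')
GROUP BY tl.ticket_id
HAVING COUNT(*) = 2;
```

With this data only ticket 1 qualifies. Because the primary key rules out duplicate pairs, COUNT(*) is safe here without DISTINCT.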
Yogesh beat me to it; my solution is similar but with a HUGE performance improvement worth pointing out. We'll start with this sample data:
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#tickets','U') IS NOT NULL DROP TABLE #tickets;
CREATE TABLE #tickets (Id INT, [Name] VARCHAR(50), Label VARCHAR(1000));
INSERT #tickets (Id, [Name], Label)
VALUES
(1,'FirstTicket' , 'bike|motorbike'),
(2,'SecondTicket', 'bike'),
(3,'ThirdTicket' , 'e-bike|motorbike'),
(4,'FourthTicket', 'car|truck'),
(5,'FifthTicket', 'motorbike|bike');
Now the original and the much improved versions:
-- Original
SELECT t.id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY STRING_SPLIT(t.Label, '|') t1
WHERE t1.[value] IN ('bike', 'motorbike')
GROUP BY t.id, t.[Name], t.Label
HAVING COUNT(DISTINCT t1.[value]) = 2;
-- Improved Version Leveraging APPLY to avoid a sort
SELECT t.Id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY
(
SELECT 1
FROM STRING_SPLIT(t.Label,'|') AS split
WHERE split.[value] IN ('bike','motorbike')
HAVING COUNT(*) = 2
) AS isMatch(TF);
Now the execution plans:
If you compare the costs, the "sortless" version is 4.36 times faster than the original. In reality the gap is larger because the first version is not just sorting, it is sorting three columns: an int and two (n)varchars. Because sorting costs grow as N * LOG(N), which is faster than linear, the original query gets disproportionately slower the more rows you throw at it.

Google BigQuery: iterate CONTAINS function over subquery

Let's assume I have two tables:
girls prefixes
------ ----------
Le-na -na
Lo-ve -ve
Li-na -la
Lu-na -ta
Len-ka -ya
The girls' names and the prefixes have different lengths!
I want to select all girl names that contain any entry from the prefixes table, and to do it in a single query (imagine I have many names and many prefixes).
I tested that a single case can be handled like this:
SELECT girls,SOME(girls CONTAINS ("-na")) WITHIN RECORD FROM prefixes
But how do I implement iteration of CONTAINS function over subquery?
e.g.
SELECT girls,SOME(girls CONTAINS (SELECT * FROM prefixes))
WITHIN RECORD FROM prefixes
–– this doesn't work, cause Subselect not allowed in SELECT clause
I'd really appreciate any ideas, I've tried to search for this but couldn't find my case.
Have you tried just using join?
select *
from girls g join
prefixes p
on g.girls like concat('%', p.prefix);
This should work using standard SQL.
Assuming that the prefixes (well, suffixes) are always three characters, you can perform an efficient semi-join with the result of SUBSTR:
#standardSQL
WITH Girls AS (
SELECT name
FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name
),
Suffixes AS (
SELECT suffix
FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT
name
FROM Girls
WHERE EXISTS (
SELECT 1 FROM Suffixes WHERE suffix = SUBSTR(name, LENGTH(name) - 2)
);
Or you can use LIKE, but it is equivalent to performing a cross join with a filter, so it probably won't be as fast:
#standardSQL
WITH Girls AS (
SELECT name
FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name
),
Suffixes AS (
SELECT suffix
FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT
name
FROM Girls
WHERE EXISTS (
SELECT 1 FROM Suffixes WHERE name LIKE CONCAT('%', suffix)
);
Edit: another option that enumerates all name suffixes for use in the semi-join:
#standardSQL
WITH Girls AS (
SELECT name
FROM UNNEST(['Le-na', 'Lo-ve-lala', 'Li-na', 'Lu-eya', 'Len-ka']) AS name
),
Suffixes AS (
SELECT suffix
FROM UNNEST(['-na', '-ve', '-lala', '-ta', '-eya']) AS suffix
),
GirlNamePermutations AS (
SELECT name, SUBSTR(name, LENGTH(name) + 1 - len) AS name_suffix
FROM Girls
CROSS JOIN UNNEST(GENERATE_ARRAY(1, (SELECT MAX(LENGTH(suffix)) FROM Suffixes))) AS len
)
SELECT
name
FROM GirlNamePermutations
WHERE EXISTS (
SELECT 1
FROM Suffixes
WHERE suffix = name_suffix
);
If you know the range of suffix lengths, you could hard-code it instead, e.g. replace:
CROSS JOIN UNNEST(GENERATE_ARRAY(1, (SELECT MAX(LENGTH(suffix)) FROM Suffixes))) AS len
with:
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 5)) AS len
Below is for BigQuery Standard SQL
#standardSQL
WITH girls AS (
SELECT name
FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name
),
suffixes AS (
SELECT suffix
FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT name
FROM girls
JOIN suffixes
ON ENDS_WITH(name, suffix)
As an option, in case you need to extend this to find fragments inside the name, you can use REGEXP_CONTAINS (note that the suffix is then interpreted as a regular expression):
SELECT name
FROM girls
JOIN suffixes
ON REGEXP_CONTAINS(name, suffix)
or - STARTS_WITH to match by prefixes (vs. suffixes)
SELECT name
FROM girls
JOIN suffixes
ON STARTS_WITH(name, suffix)

Find duplicates in SQL table, list multiple instances

I'm trying to list the multiple instances of tbldoc.[docid] from tbldoc where tbldoc.[filename] occurs more than once. I'd like them separated by commas and grouped by [filename].
This code works great to find duplicates:
SELECT cast([filename] as varchar(max)),
COUNT(cast([filename] as varchar(max)))
FROM tbldoc
GROUP BY cast([filename] as varchar(max))
HAVING ( COUNT(cast([filename] as varchar(max))) > 1 )
but when i try adding [docid] i get an error:
Column 'tbldoc.DocID' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
this is what i am trying:
SELECT [docid], cast([filename] as varchar(max)),
COUNT(cast([filename] as varchar(max)))
FROM tbldoc
GROUP BY cast([filename] as varchar(max))
HAVING ( COUNT(cast([filename] as varchar(max))) > 1 )
I have no idea how to get all of the [docid]s listed, separated by commas. I'm a pretty new user when it comes to SQL.
This is the output I would like to see:
[docids]|[filename]|[instances]
12345,12346| excelfile.xls | 3
Thanks ahead of time for the help guys/gals! =)
Iyosha,
You need to join your first result set back to your full table to get the DocIDs. I'll take the CAST() as read to save some typing.
;with CountedFiles as
(
SELECT
filename,
COUNT(filename) as Total
FROM tbldoc
GROUP BY filename
HAVING COUNT(filename) > 1
)
select
cf.filename,
cf.Total,
td.DocID
from CountedFiles as cf
inner join tbldoc as td
on td.filename = cf.filename;
This will return one DocId, one filename and the count per row. You can then follow Adam's link to turn this into a comma list.
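The link referred to above is not included here, so as a stand-in, this is a rough sketch of one common way to build the comma list in T-SQL: a correlated FOR XML PATH subquery wrapped in STUFF. The @tbldoc table variable and its rows are hypothetical, and the filename column is assumed to be a plain varchar (the CAST from the question is again omitted).

```sql
-- Hypothetical sample standing in for tbldoc
DECLARE @tbldoc TABLE (DocID INT, [filename] VARCHAR(260));

INSERT @tbldoc (DocID, [filename])
VALUES (12345, 'excelfile.xls'),
       (12346, 'excelfile.xls'),
       (12347, 'other.doc');

SELECT STUFF((
           SELECT ',' + CAST(d2.DocID AS VARCHAR(20))
           FROM @tbldoc AS d2
           WHERE d2.[filename] = d.[filename]   -- all ids sharing this filename
           ORDER BY d2.DocID
           FOR XML PATH('')
       ), 1, 1, '') AS docids,
       d.[filename],
       COUNT(*) AS instances
FROM @tbldoc AS d
GROUP BY d.[filename]
HAVING COUNT(*) > 1;   -- keep only duplicated filenames
```

With this sample, the only row returned should be "12345,12346" for excelfile.xls with 2 instances.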