SQL Server [PATSTAT] query | Multiple charindex values & - sql

Hello Stack Overflow Community.
I am retrieving data with SQL from PATSTAT (patent data base from the European Patent Office). I have two issues (see below). For your info the PATSAT sql commands are quite limited.
I. Charindex with multiple values
I am looking for specific two specific patent groups ["Y02E" and "Y02C"] and want to retrieve data on these. I have found that using the charindex function works if I insert one group;
and charindex ('Y02E', cpc_class_symbol) > 0
But if I want to use another charindex function the query just times out;
and charindex ('Y02E', cpc_class_symbol) > 0 or charindex ('Y02C', cpc_class_symbol) >0
I am an absolute SQL rookie but would really appreciate your help!
II. List values from column in one cell with comma separation
Essentially I want to apply what I found as the "string_agg"-command, however, it does not work for this database. I have entries with a unique ID, which have multiple patent categories. For example:
appln_nr_epodoc | cpc_class_symbol
EP20110185794 | Y02E 10/125
EP20110185794 | Y02E 10/127
I would like to have it like this, however:
appln_nr_epodoc | cpc_class_symbol
EP20110185794 | Y02E 10/125, Y02E 10/127
Again, I am very new to sql, so any help is appreciated! Thank you!
I will also attach the full code here for transparency
SELECT a.appln_nr_epodoc, a.appln_nr_original, psn_name, person_ctry_code, person_name, person_address, appln_auth+appln_nr,
appln_filing_date, cpc_class_symbol
FROM
tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
join tls224_appln_cpc on a.appln_id = tls224_appln_cpc.appln_id
WHERE appln_auth = 'EP'
and appln_filing_year between 2005 and 2012
and eu_member = 'Y'
and granted = 'Y'
and psn_sector = 'company'
and charindex ('Y02E', cpc_class_symbol) > 0

For your part 2 here is a sample data i created
And here is the code. It gives me YOUR requested output.
create table #test_1 (
appln_nr_epodoc varchar(20) null
,cpc_class_symbol varchar(20) null
)
insert into #test_1 values
('EP20110185794','Y02E 10/125')
,('EP20110185794','Y02E 10/127')
,('EP20110185795','Y02E 10/130')
,('EP20110185796','Y02E 20/140')
,('EP20110185796','Y02E 21/142')
with CTE_1 as (select *
from (
select *
,R1_1 = Rank() over(partition by appln_nr_epodoc order by cpc_class_symbol )
from #test_1
) as a
where R1_1 = 1
)
,CTE_2 as (select *
from (
select *
,R1_1 = Rank() over(partition by appln_nr_epodoc order by cpc_class_symbol )
from #test_1
) as a
where R1_1 = 2 )
select a.appln_nr_epodoc
,a.cpc_class_symbol+','+c.cpc_class_symbol
from CTE_1 a
join CTE_2 c on c.appln_nr_epodoc = a.appln_nr_epodoc
Out put

Related

SQL - Get the sum of several groups of records

DESIRED RESULT
Get the hours SUM of all [Hours] including only a single result from each [DevelopmentID] where [Revision] is highest value
e.g SUM 1, 2, 3, 5, 6 (Result should be 22.00)
I'm stuck trying to get the appropriate grouping.
DECLARE #CompanyID INT = 1
SELECT
SUM([s].[Hours]) AS [Hours]
FROM
[dbo].[tblDev] [d] WITH (NOLOCK)
JOIN
[dbo].[tblSpec] [s] WITH (NOLOCK) ON [d].[DevID] = [s].[DevID]
WHERE
[s].[Revision] = (
SELECT MAX([s2].[Revision]) FROM [tblSpec] [s2]
)
GROUP BY
[s].[Hours]
use row_number() to identify the latest revision
SELECT SUM([Hours])
FROM (
SELECT *, R = ROW_NUMBER() OVER (PARTITION BY d.DevID
ORDER BY s.Revision)
FROM [dbo].[tblDev] d
JOIN [dbo].[tblSpec] s
ON d.[DevID] = s.[DevID]
) d
WHERE R = 1
If you want one row per DevId, then that should be in the GROUP BY (and presumably in the SELECT as well):
SELECT s.DevId, SUM(s.Hours) as hours
FROM [dbo].[tblDev] d JOIN
[dbo].[tblSpec] s
ON [d].[DevID] = [s].[DevID]
WHERE s.Revision = (SELECT MAX(s2.Revision) FROM tblSpec s2)
GROUP BY s.DevId;
Also, don't use WITH NOLOCK unless you really know what you are doing -- and I'm guessing you do not. It is basically a license that says: "You can get me data even if it is not 100% accurate."
I would also dispense with all the square braces. They just make the query harder to write and to read.

Count Similar Substrings SQL query

I've tried a few scenarios and googled a lot, but still can't find a solution.
I have a table of user names with entries something like the below:
UserName
Cakes420
18Jack01
18Jack04
16Jack22
22Jack16
Mapple7609
Chrom44
chrom22
chrom77
013Cake
016Cake
122Cake
123Cake87
So I need a query that checks for all records that share 4 or more (in sequence) characters in the table.
So I need to return something like :
Characters
Times Used
Names Sharing
Cake
5
Cakes420, 013Cake, 016Cake, 122Cake, 123Cake87
Chro
3
Chrom44, chrom22, chrom77
or anything similar as I'd prefer not to repeat patterns, but hey, at this stage if it returns the values properly, I don't mind.
The shared characters can naturally appear in any place in the string, which is what makes this so difficult.
Should you do this in T-SQL? Probably not.
Can you do this in T-SQL? Yes.
Sample data
create table Names
(
Name nvarchar(20)
);
insert into Names (Name) values
('Cakes420'),
('18Jack01'),
('18Jack04'),
('16Jack22'),
('22Jack16'),
('Mapple7609'),
('Chrom44'),
('chrom22'),
('chrom77'),
('013Cake'),
('016Cake'),
('122Cake'),
('123Cake87');
Solution
Using STRING_AGG() for easy concatenation. Available from SQL Server 2017. Alternatives available for older SQL versions (use the search box on this site, there are many examples).
with rcte as
(
select n.Name,
convert(nvarchar(4), substring(n.Name, 1, 4)) as Part,
1 as PartFrom
from Names n
where len(n.Name) >= 4
union all
select r.Name,
convert(nvarchar(4), substring(r.Name, r.PartFrom+1, r.PartFrom+4)),
r.PartFrom+1
from rcte r
where len(r.Name) >= r.PartFrom+4
),
cte_count as
(
select r.Part,
count(1) as PartCount
from rcte r
where r.Part not like '%[0-9]%' -- exclude parts with numbers in them
group by r.Part
having count(1) > 1
)
select c.Part,
c.PartCount,
string_agg(r.Name, ', ') as Names
from cte_count c
join rcte r
on r.Part = c.Part
group by c.Part,
c.PartCount
order by c.Part;
Result
Part PartCount Names
---- --------- ----------------------------------------------
Cake 5 Cakes420, 123Cake87, 122Cake, 016Cake, 013Cake
Chro 3 Chrom44, chrom22, chrom77
hrom 3 chrom77, chrom22, Chrom44
Jack 4 22Jack16, 16Jack22, 18Jack04, 18Jack01
Fiddle to see it in action with the intermediate CTE results.
Let's use Itzik Ben-Gan's Tally Function to break out a list of substrings, then group them. This is called N-Gram, after the more common Trigram which is 3-character substrings.
I've removed one extra cross-join from the function to speed it up slightly, it's now good for up to varchar(65536):
CREATE OR ALTER FUNCTION dbo.GetNums(#num AS BIGINT)
RETURNS TABLE
AS
RETURN
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L2 )
SELECT TOP(#num)
rownum AS rn
FROM Nums
ORDER BY rownum;
GO
DECLARE #substringLen int = 4;
SELECT
Characters,
[Times Used] = COUNT(*),
[Names Sharing] = STRING_AGG(Username, ', ')
FROM (
SELECT DISTINCT
-- remove DISTINCT if you want to know about multiple in a single username
t.Username,
Characters = SUBSTRING(t.Username, n.rn, #substringLen)
FROM myTable t
CROSS APPLY dbo.GetNums (LEN(t.UserName) - #substringLen + 1) n
) t
GROUP BY t.Characters
HAVING COUNT(*) > 1

Multi row to a row sql

I have a table as bellow:
I want query to print output as bellow:
Note: Please, do not downvote. I know the rules of posting answers, but for such of questions there's no chance to post short answer. I posted it only to provide help for those who want to find out how to achieve that, but does not expect ready-to-use solution.
I'd suggest to read these articles:
PIVOT on two or more fields in SQL Server
Pivoting on multiple columns - SQL Server
Pivot two or more columns in SQL Server 2005
At first UNPIVOT then PIVOT. If number of rows for each Pod_ID is not always equal 3 then you need to use dynamic SQL. The basic sample:
SELECT *
FROM (
SELECT Pod_ID,
Purs + CASE WHEN RN-1 = 0 THEN '' ELSE CAST(RN-1 as nvarchar(10)) END as Purs,
[Values]
FROM (
SELECT Pod_ID,
Pur_Qty, --All columns that will be UNPIVOTed must be same datatype
Pur_Price,
CAST(ETD_Date as int) ETD_Date, -- that is why I cast date to int
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as RN
FROM YourTable
) as p1
UNPIVOT (
[Values] FOR [Purs] IN(Pur_Qty, Pur_Price, ETD_Date)
) as unpvt
) as p2
PIVOT (
MAX([Values]) FOR Purs IN (Pur_Qty,Pur_Price,ETD_Date,Pur_Qty1,Pur_Price1,ETD_Date1,Pur_Qty2,Pur_Price2,ETD_Date2)
) as pvt
Will bring you:
Pod_ID Pur_Qty Pur_Price ETD_Date Pur_Qty1 Pur_Price1 ETD_Date1 Pur_Qty2 Pur_Price2 ETD_Date2
F8E2F614-75BC-4E46-B7F8-18C7FC4E5397 24 22 20160820 400 33 20160905 50 44 20160830

Comparing a list of values

For example, I have a head-table with one column id and a position-table with id, head-id (reference to head-table => 1 to N), and a value. Now I select one row in the head-table, say id 1. I look into the position-table and find 2 rows which referencing to the head-table and have the values 1337 and 1338. Now I wanna select all heads which have also 2 positions with these values 1337 and 1338. The position-ids are not the same, only the values, because it is not a M to N relation. Can anyone tell me a SQL-Statement? I have no idea to get it done :/
Assuming that the value is not repeated for a given headid in the position table, and that it is never NULL, then you can do this using the following logic. Do a full outer join on the position table to the specific head positions you care about. Then check whether there is a full match.
The following query does this:
select *
from (select p.headid,
sum(case when p.value is not null then 1 else 0 end) as pmatches,
sum(case when ref.value is not null then 1 else 0 end) as refmatches
from (select p.value
from position p
where p.headid = <whatever>
) ref full outer join
position p
on p.value = ref.value and
p.headid <> ref.headid
) t
where t.pmatches = t.refmatches
If you do have NULLs in the values, you can accommodate these using coalesce. If you have duplicates, you need to specify more clearly what to do in this case.
Assuming you have:
Create table head
(
id int
)
Create table pos
(
id int,
head_id int,
value int
)
and you need to find duplicates by value, then I'd use:
Select distinct p.head_id, p1.head_id
from pos p
join pos p1 on p.value = p1.value and p.head_id<>p1.head_id
where p.head_id = 1
for specific head_id, or without last where for every head_id

MySQL intersection in subquery

I'm trying to create a filter for a list (of apartments), with a many-to-many relationship with apartment features through the apartsments_features table.
I would like to include only apartments that have all of some features (marked 'Yes' on a form) excluding all the ones that have any of another set features (marked 'No'). I realized too late that I couldn't use INTERSECT or MINUS in MySQL.
I have a query that looks something like:
SELECT `apartments`.* FROM `apartments` WHERE `apartments`.`id` IN (
SELECT `apartments`.`id` FROM `apartments` INTERSECT (
SELECT `apartment_id` FROM `apartments_features` WHERE `feature_id` = 103
INTERSECT SELECT `apartment_id` FROM `apartments_features` WHERE `feature_id` = 106
) MINUS (
SELECT `apartment_id` FROM `apartments_features` WHERE `feature_id` = 105 UNION
SELECT `apartment_id` FROM `apartments_features` WHERE `feature_id` = 107)
)
ORDER BY `apartments`.`name` ASC
I'm pretty sure there's a way to do this, but at the moment my knowledge is restricted to little more than simple left and right joins.
A slightly different way of doing it:
select a.*
from apartments a
join apartments_features f1
on a.apartment_id = f1.apartment_id and f1.feature_id in (103,106) -- applicable features
where not exists
(select null from apartments_features f2
where a.apartment_id = f2.apartment_id and f2.feature_id in (105,107) ) -- excluded features
group by f1.apartment_id
having count(*) = 2 -- number of applicable features
You could try something like this:
SELECT apartment_id
FROM
(
SELECT apartment_id
FROM apartments_features
WHERE feature_id IN (103, 106)
GROUP BY apartment_id
HAVING COUNT(*) = 2
) T1
LEFT JOIN
(
SELECT apartment_id
FROM apartments_features
WHERE feature_id IN (105, 107)
) T2
ON T1.apartment_id = T2.apartment_id
WHERE T2.apartment_id IS NULL
Join the result of this query to the apartments table to get the name, etc.