Any option except cursor in this kind of group by? - sql

I have a sample data as:
Johnson; Michael, Surendir;Mishra, Mohan; Ram
Johnson; Michael R.
Mohan; Anaha
Jordan; Michael
Maru; Tushar
The output of the query should be:
Johnson; Michael 2
Mohan; Anaha 1
Jordan; Michael 1
Maru; Tushar 1
Surendir;Mishra 1
Mohan; Ram 1
As you can see, it prints the count of each name separated by , but with a twist. We cannot simply do a group by on the full name, because sometimes the name contains a middle initial and sometimes it does not. E.g. Johnson; Michael and Johnson; Michael R. are counted as a single name, and hence their count is 2. Further, either Johnson; Michael or Johnson; Michael R. should appear in the result set with a count of 2 (not both, because that would be a repeated record).
The table contains names separated by , and it is not possible to normalize it, as it is LIVE and given to us by someone else.
Is there any way to write a query for this without using a cursor? I have around 3 million records in my DB and I also have to support pagination etc. What do you think would be the best way to achieve this?

This is why your data should be normalised.
;with cte as
(
    -- anchor: locate the first comma in each row (a trailing comma is appended so the last name is found too)
    select 1 as Item, 1 as Start, CHARINDEX(',', People + ',', 1) as Split,
           People + ',' as People
    from YourHorribleTable
    union all
    -- recursive step: move on to the next comma; Split becomes NULL once none is left
    select cte.Item + 1, cte.Split + 1, nullif(CHARINDEX(',', People, cte.Split + 1), 0), People
    from cte
    where cte.Split <> 0
)
select Person, COUNT(*)
from
(
    -- keep only the part up to the first name, dropping a trailing middle initial
    select case when nullif(charindex(' ', person, 2 + nullif(CHARINDEX(';', person), 0)), 0) is null then person
                else substring(person, 1, charindex(' ', person, 2 + nullif(CHARINDEX(';', person), 0)))
           end as Person
    from
    (
        select LTRIM(RTRIM(SUBSTRING(People, Start, isnull(Split, len(People) + 1) - Start))) as person
        from cte
    ) v
    where person <> ''
) v
group by Person
order by COUNT(*) desc
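For readers who want to check the grouping rule outside the database, here is a minimal Python sketch of the same idea: split each row on commas, reduce every name to "Last; First", and count. The function names `normalise` and `count_names` are hypothetical, and unlike the query above this sketch also tidies the spacing after the semicolon, so treat it as an illustration of the logic rather than a port of the SQL.

```python
from collections import Counter

def normalise(person: str) -> str:
    """Reduce 'Last; First M.' to 'Last; First' (hypothetical helper)."""
    last, sep, rest = person.strip().partition(";")
    if not sep:
        return person.strip()
    parts = rest.split()
    first = parts[0] if parts else ""
    return f"{last.strip()}; {first}"

def count_names(rows):
    """Split each row on commas and count the normalised names."""
    counts = Counter()
    for row in rows:
        for person in row.split(","):
            if person.strip():
                counts[normalise(person)] += 1
    return counts

rows = [
    "Johnson; Michael, Surendir;Mishra, Mohan; Ram",
    "Johnson; Michael R.",
    "Mohan; Anaha",
    "Jordan; Michael",
    "Maru; Tushar",
]
counts = count_names(rows)
for name, n in counts.most_common():
    print(name, n)
```

With the sample data, Johnson; Michael is counted twice and every other name once.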

Related

T-SQL Count values in the entire PATH column

I am trying to count the number of times each name in NAME appears in BOSS_PATH.
The source
NAME   BOSS_PATH
---------------------
WIN    WIN
JOHN   WIN|JOHN
DANG   WIN|JOHN|DANG
JOSH   JOSH
The result I want
NAME   BOSS_PATH       COUNT_UNDER
--------------------------------------
WIN    WIN             2
JOHN   WIN|JOHN        1
DANG   WIN|JOHN|DANG   0
JOSH   JOSH            0
My thought is the query would be something like this
(SUM(the frequency of NAME appears in BOSS_PATH) - 1) AS COUNT_UNDER
But I am still having trouble writing this as an actual query.
This will probably do it (though no idea how it will perform if there is a large volume of data). You can probably combine some of the steps but I've written it out as multiple CTEs to show the logical progression of the approach I used:
-- Get a CTE of all the names
WITH NAMES AS (
SELECT NAME FROM HIER_DATA
),
-- Get a CTE of all the paths
PATHS AS (
SELECT BOSS_PATH FROM HIER_DATA
),
-- Get a CTE of every name/path combination
-- (a CROSS JOIN pairs each name with each path, so no join condition is needed)
CROSS_JOIN AS (
SELECT * FROM NAMES
CROSS JOIN PATHS
)
-- Someone reports to the NAME if the BOSS_PATH contains the NAME followed by a '|'
-- If this is true give the record a value of 1 and sum by the NAME
SELECT NAME, SUM(CASE WHEN POSITION(NAME||'|',boss_path) > 0 THEN 1 ELSE 0 END) IS_PARENT
FROM CROSS_JOIN
GROUP BY NAME
ORDER BY NAME
;
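The counting rule can be sanity-checked with a small Python sketch. Note one deliberate difference, which is an assumption on my part rather than something from the answer: instead of a substring test for NAME followed by '|', it compares whole path segments, so a name like JOHN would not be matched inside a longer name. The function name `count_under` is hypothetical.

```python
rows = [
    ("WIN", "WIN"),
    ("JOHN", "WIN|JOHN"),
    ("DANG", "WIN|JOHN|DANG"),
    ("JOSH", "JOSH"),
]

def count_under(rows):
    """For each name, count the paths in which it appears before the last segment."""
    result = {}
    for name, _ in rows:
        result[name] = sum(
            1 for _, path in rows
            # every segment except the last is a boss of the path's owner
            if name in path.split("|")[:-1]
        )
    return result

under = count_under(rows)
for name, path in rows:
    print(name, path, under[name])
```

This reproduces the COUNT_UNDER column from the question for the four sample rows.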

Get union and intersection of jsonb array in Postgresql

I have a DB of people with a jsonb column, interests. In my application a user can search for people by providing their hobbies, which are a set of predefined values. I want to offer the best match, and to do so I would like to score each match as the intersection/union of interests. This way the top results won't simply be the people who have plenty of hobbies in my DB.
Example:
DB records:
name   interests::jsonb
Mary   ["swimming","reading","jogging"]
John   ["climbing","reading"]
Ann    ["swimming","watching TV","programming"]
Carl   ["knitting"]
user input in app:
["reading", "swimming", "knitting", "cars"]
my script should output this:
Mary 0.4
John 0.2
Ann 0.16667
Carl 0.25
Now I'm using
SELECT name
FROM people
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
but this also returns records with many interests and gives me no way to order them.
Is there any way I can achieve it in a reasonable time - let's say up to 5 seconds in DB with around 400K records?
EDIT:
I added another example to clarify my calculations. My calculation needs to filter people with many hobbies. Therefore match should be calculated as Intersection(input, db_record)/Union(input, db_record).
Example:
input = ["reading"]
DB records:
name   interests::jsonb
Mary   ["swimming","reading","jogging"]
John   ["climbing","reading"]
Ann    ["swimming","watching TV","programming"]
Carl   ["reading"]
The match for Mary would be calculated as (LENGTH(["reading"]))/(LENGTH(["swimming","reading","jogging"])), which is 0.3333,
and for Carl it would be (LENGTH(["reading"]))/(LENGTH(["reading"])), which is 1.
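The intersection-over-union score described here is commonly called the Jaccard index. As a reference for checking query results, a minimal Python sketch follows; the function name `jaccard` and the in-memory dictionaries are assumptions for illustration, not part of the database setup.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

people = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}
query = ["reading", "swimming", "knitting", "cars"]

scores = {name: round(jaccard(query, interests), 5)
          for name, interests in people.items()}
for name, score in scores.items():
    print(name, score)
```

For the first example this gives Mary 0.4, John 0.2, Ann 0.16667 and Carl 0.25, and `jaccard(["reading"], ["swimming", "reading", "jogging"])` gives one third, matching the second example.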
UPDATE: I managed to do it with
SELECT result.id, result.name, result.overlap_count/(jsonb_array_length(persons.interests) + 4 - result.overlap_count)::decimal as score
FROM (SELECT t1.name as name, t1.id, COUNT(t1.name) as overlap_count
FROM (SELECT name, id, jsonb_array_elements(interests)
FROM persons) as t1
JOIN (SELECT unnest(ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"'])::jsonb as elements) as t2 ON t1.jsonb_array_elements = t2.elements
GROUP BY t1.name, t1.id) as result
JOIN persons ON result.id = persons.id ORDER BY score desc
Here's my fiddle https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b4b1760854b2d77a1c7e6011d074a1a3
However it's not fast enough and I would appreciate any improvements.
One option is to unnest the parameter and use the ? operator to check each element against the jsonb array:
select
t.name,
x.match_ratio
from mytable t
cross join lateral (
select avg( (t.interests ? a.val)::int ) match_ratio
from unnest(array['reading', 'swimming', 'knitting', 'cars']) a(val)
) x
It is not very clear what rules are behind the result that you are showing. This gives you a ratio that represents the percentage of values in the parameter array that can be found in the interests of each person (so Mary gets 0.5 since she has two interests in common with the search parameter, and all other names get 0.25).
Demo on DB Fiddle
One option would be using jsonb_array_elements() to unnest the jsonb column :
SELECT name, count / SUM(count) over () AS ratio
FROM(
SELECT name, COUNT(name) AS count
FROM people
JOIN jsonb_array_elements(interests) AS j(elm) ON TRUE
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
GROUP BY name ) q
Demo

Query to get number of % sign = length of string in the next row in Oracle 10g

Is there any SQL query in Oracle 10g that can give the output shown in the sample below?
The query should print the name first and, on the next row, as many "%" characters as the name has letters.
Could you please help?
Below is the sample of table column
JIM
JOHN
MICHAEL
and the output should come like below :
JIM
%%%
JOHN
%%%%
MICHAEL
%%%%%%%
This would normally be considered an issue for presentation logic, not database logic. However, one option would be to use union all, and then length and rpad to get the correct number of % signs. You'd also need to establish a row number to keep the order together.
Here's one approach:
select name
from (
select name, rownum rn
from yourtable
union all
select rpad('%', length(name), '%') name, rownum + 1 rn
from yourtable ) t
order by rn, name
SQL Fiddle Demo
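Since this is arguably presentation logic, the same output can also be produced in the application layer. A minimal Python sketch, assuming the names have already been fetched into a list (the function name `with_percent_lines` is hypothetical):

```python
def with_percent_lines(names):
    """Return each name followed by a line of '%' of the same length."""
    out = []
    for name in names:
        out.append(name)
        out.append("%" * len(name))
    return out

lines = with_percent_lines(["JIM", "JOHN", "MICHAEL"])
print("\n".join(lines))
```

This keeps each name and its "%" line together without needing a row number to hold the order.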
You can check this link:
http://www.club-oracle.com/articles/oracle-pivoting-row-to-column-conversion-techniques-sql-166/
Many options are discussed there; it should help.

sql statement to select previous rows to a search param

I'm after an SQL statement (if one exists), or a way to combine several SQL statements, to achieve the following.
I have a listbox and a search text box.
In the search box, the user enters a surname, e.g. smith.
I then want to query the database with something like this:
select * FROM customer where surname LIKE searchparam
This would give me all the customers whose surname contains SMITH. Simple, right?
What I need to do is limit the results returned. This statement could give me thousands of rows if the search param was just S.
What I want is the result limited to the first 20 matches AND the 10 rows prior to the first match.
For example, SMI search:
Sives
Skimmings
Skinner
Skipper
Slater
Sloan
Slow
Small
Smallwood
Smetain
Smith ----------- This is the first match of my query. But I want the previous 10 and the following 20.
Smith
Smith
Smith
Smith
Smoday
Smyth
Snedden
Snell
Snow
Sohn
Solis
Solomon
Solway
Sommer
Sommers
Soper
Sorace
Spears
Spedding
Is there any way to do this?
With as few SQL statements as possible.
Reason? I am creating an app for users with slow internet connections.
I am using PostgreSQL v9.
Thanks
Andrew
WITH ranked AS (
SELECT *, ROW_NUMBER() over (ORDER BY surname) AS rowNumber FROM customer
)
SELECT ranked.*
FROM ranked, (SELECT MIN(rowNumber) target FROM ranked WHERE surname LIKE searchparam) found
WHERE ranked.rowNumber BETWEEN found.target - 10 AND found.target + 20
ORDER BY ranked.rowNumber
SQL Fiddle here. Note that the fiddle uses the example data, and I modified the range to 3 entries before and 6 entries past.
I'm assuming that you're looking for a general algorithm ...
It sounds like you're looking for a combination of finding the matches "greater than or equal to smith", and "less than smith".
For the former you'd order by surname and limit the result to 20, and for the latter you'd order by surname descending and limit to 10.
The two result sets can then be added together as arrays and reordered.
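The two-query approach can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with an in-memory table instead of PostgreSQL, treats the search as a prefix search, and the function name `window_around` and the shortened sample list are hypothetical.

```python
import sqlite3

def window_around(conn, prefix, before=10, after=20):
    """Rows just before the first match plus the rows from the match onward."""
    # Closest rows sorting before the prefix (fetched in descending order).
    prev_rows = conn.execute(
        "SELECT surname FROM customer WHERE surname < ? "
        "ORDER BY surname DESC LIMIT ?", (prefix, before)).fetchall()
    # First rows sorting at or after the prefix.
    next_rows = conn.execute(
        "SELECT surname FROM customer WHERE surname >= ? "
        "ORDER BY surname LIMIT ?", (prefix, after)).fetchall()
    # Flip the "before" rows back into ascending order and append the rest.
    return [r[0] for r in reversed(prev_rows)] + [r[0] for r in next_rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (surname TEXT)")
names = ["Slow", "Small", "Smallwood", "Smetain", "Smith",
         "Smith", "Smoday", "Smyth", "Snedden", "Snell"]
conn.executemany("INSERT INTO customer VALUES (?)", [(n,) for n in names])

result = window_around(conn, "Smi", before=2, after=3)
print(result)  # ['Smallwood', 'Smetain', 'Smith', 'Smith', 'Smoday']
```

Both queries can use an index on surname, so neither has to number the whole table; the trade-off is two round trips instead of one, unless they are combined with UNION ALL.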
I think you need to use ROW_NUMBER().
WITH cust1 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY surname) as numRow FROM customer
)
SELECT c1.surname, c1.numRow, x.flag
FROM cust1 c1, (SELECT *,
case when numRow = (SELECT MIN(numRow) FROM cust1 WHERE surname='Smith') then 1 else 0 end as flag
FROM cust1) x
WHERE x.flag = 1 and c1.numRow BETWEEN x.numRow - 1 AND x.numRow + 1
ORDER BY c1.numRow
SQLFiddle here.
This works, but the flag turns out to be unnecessary, and the query ends up much like the one PinnyM posted.
A variation on @PinnyM's solution:
WITH ranked AS (
SELECT
*,
ROW_NUMBER() over (ORDER BY surname) AS rowNumber
FROM customer
),
minrank AS (
SELECT
*,
MIN(CASE WHEN surname LIKE searchparam THEN rowNumber END) OVER () AS target
FROM ranked
)
SELECT
surname
FROM minrank
WHERE rowNumber BETWEEN target - 10 AND target + 20
;
Instead of two separate calls to the ranked CTE, one to get the first match's row number and the other to read the results from, another CTE is introduced to serve both purposes. Can't speak for PostgreSQL but in SQL Server this might result in a better execution plan for the query, although in either case the real efficiency would still need to be verified by proper testing.

Need help in understanding a SELECT query

I have the following query. It uses only one table (Customers) from the Northwind database.
I have no idea how it works or what it is meant to do. I hope there are a lot of DBAs here, so I am asking for an explanation. In particular, I don't know what OVER and PARTITION do here.
WITH NumberedWomen AS
(
SELECT CustomerId ,ROW_NUMBER() OVER
(
PARTITION BY c.Country
ORDER BY LEN(c.CompanyName) ASC
)
women
FROM Customers c
)
SELECT * FROM NumberedWomen WHERE women > 3
If you needed the db schema, it is here
This function:
ROW_NUMBER() OVER (PARTITION BY c.Country ORDER BY LEN(c.CompanyName) ASC)
assigns continuous row numbers to records within each country, ordering the records by LEN(companyName).
If you have these data:
country   companyName
US        Apple
US        Google
UK        BAT
UK        BP
US        GM
then the query will assign numbers from 1 to 3 to the US companies and from 1 to 2 to the UK companies, ordering them by name length:
country   companyName   ROW_NUMBER()
US        GM            1
US        Apple         2
US        Google        3
UK        BP            1
UK        BAT           2
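The numbering in the table above can be reproduced with a short Python sketch that mimics PARTITION BY country ORDER BY LEN(companyName). The function name `row_numbers` is hypothetical. Also note an assumption about ties: where two names have the same length, SQL's ROW_NUMBER() may number them in either order, while this sketch keeps input order.

```python
from itertools import groupby

companies = [
    ("US", "Apple"),
    ("US", "Google"),
    ("UK", "BAT"),
    ("UK", "BP"),
    ("US", "GM"),
]

def row_numbers(rows):
    """Number rows from 1 within each country, shortest company name first."""
    # Sort by partition key first, then by the ORDER BY expression.
    ordered = sorted(rows, key=lambda r: (r[0], len(r[1])))
    numbered = []
    for country, group in groupby(ordered, key=lambda r: r[0]):
        # The counter restarts at 1 for every new country (partition).
        for n, (_, name) in enumerate(group, start=1):
            numbered.append((country, name, n))
    return numbered

numbered = row_numbers(companies)
for row in numbered:
    print(*row)
```

Filtering this list with `n > 3` corresponds to the WHERE women > 3 step of the original query; with this small sample it returns nothing, because no country has more than three companies.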
ROW_NUMBER() is a ranking function.
OVER tells it how to create rank numbers.
PARTITION BY [expression] tells the ROW_NUMBER function to restart ranking whenever [expression] contains a new value
In your case, for every country, a series of numbers starting with 1 is created. Within a country, the Companies are ordered by the length of their name (shorter name = lower rank).
The final query:
SELECT * FROM NumberedWomen WHERE women > 3
selects every customer except, in each country, the three whose company names are shortest.