I have a table that contains students' publications, like this:
id  student
1   john
2   anthony
3   steven
4   lucille
5   anthony
6   steven
7   john
8   lucille
9   john
10  anthony
11  steven
12  lucille
13  john
So the idea is to have a query that fetches all ordered occurrences of a given list of student names.
Context:
Answer the question: how many times does John publish just after Anthony (who publishes just after Steven ...), and get the id of each occurrence.
Example:
If I look for all occurrences of [john, anthony], I'll get (note that the ids must be consecutive within each occurrence):
id  student
1   john
2   anthony
9   john
10  anthony
Or:
id  -- comment
1   (id of first occurrence of john, anthony)
9   (id of second occurrence of john, anthony)
If I look for [anthony, steven, lucille], I'll get:
id  student
2   anthony
3   steven
4   lucille
10  anthony
11  steven
12  lucille
Or:
id  -- comment
2   (id of first occurrence of anthony, steven, lucille)
10  (id of second occurrence of anthony, steven, lucille)
Any ideas or leads to help me move forward?
That should do the trick, performance-wise.
The main idea is to split the data at occurrences of the first student in our search list, but not at every occurrence.
Since the same student can appear multiple times in our search list, we need to make sure that we're not breaking the pattern in the middle.
We do that by verifying that each occurrence of the first student is far enough from its previous occurrence, that is, the distance between the two occurrences is at least the search list length (the number of student names in the search list, counting repeats).
with
 prm(students) as (select 'anthony,steven,lucille,anthony')
,prm_ext(search_pattern, first_student, tokens_num) as
(
    select regexp_replace(students, '^|(,)','\1\d+;', 'g') as search_pattern
          ,split_part(students, ',', 1) as first_student
          ,array_length(string_to_array(students, ','), 1) as tokens_num
    from prm
)
,prev_student as
(
    select id
          ,student
          ,lag(id) over (partition by student order by id) as student_prev_id
    from t
)
,seq as
(
    select id
          ,student
          ,sum(case when student = p.first_student and coalesce(id - student_prev_id >= p.tokens_num, true) then 1 end) over (order by id) as seq_id
          ,id - max(case when student = p.first_student then id end) over (order by id) as distance_from_first_student
    from prev_student cross join prm_ext as p
    order by id
)
select split_part(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select search_pattern from prm_ext), 'g')), ';', 1)::int as id
from seq cross join prm_ext p
where seq_id is not null
  and distance_from_first_student < p.tokens_num
group by seq_id
This is the result for an extended data sample:
id
2
16
22
Start with this, and if it explodes we'll do some performance improvements, at the price of making the code a little more complicated.
with
prm(students) as (select 'anthony,steven,lucille')
,prm_ext(students_regex) as (select regexp_replace(students, '^|(,)','\1\d+;', 'g') from prm)
select split_part(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select students_regex from prm_ext), 'g')), ';', 1)::int as id
from t
id
2
10
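To make the mechanics of the regex approach easier to follow, here is a minimal Python sketch of the same idea outside the database: serialise the rows into one `id;student` string, build a pattern from the search list, and read the leading id of every match. The helper name `find_sequences` and the inlined sample rows are illustrative assumptions, not part of the query above. The sketch also adds a token-boundary check that the SQL pattern does not have.

```python
import re

# Sample rows from the question: (id, student)
ROWS = [
    (1, "john"), (2, "anthony"), (3, "steven"), (4, "lucille"),
    (5, "anthony"), (6, "steven"), (7, "john"), (8, "lucille"),
    (9, "john"), (10, "anthony"), (11, "steven"), (12, "lucille"),
    (13, "john"),
]


def find_sequences(rows, names):
    """Return the first id of every consecutive run of `names` (hypothetical helper)."""
    # Same serialisation as string_agg(id || ';' || student, ',' order by id)
    text = ",".join(f"{row_id};{student}" for row_id, student in sorted(rows))
    # Same shape as the SQL pattern: \d+;name1,\d+;name2,...
    body = ",".join(r"\d+;" + re.escape(name) for name in names)
    # Extra safety that is not in the SQL version: match whole tokens only
    pattern = r"(?:^|(?<=,))" + body + r"(?=,|$)"
    return [int(m.group(0).split(";", 1)[0]) for m in re.finditer(pattern, text)]


print(find_sequences(ROWS, ["anthony", "steven", "lucille"]))  # [2, 10]
print(find_sequences(ROWS, ["john", "anthony"]))               # [1, 9]
```

Like a global regex match in SQL, `re.finditer` returns non-overlapping matches only, which is why a search list that repeats a name needs extra care.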
with
 prm(students) as (select 'anthony,steven,lucille')
,prm_ext(students_regex) as (select regexp_replace(students, '^|(,)','\1\d+;', 'g') from prm)
select cols[1]::int as id
      ,cols[2]::text as student
from (select string_to_array(string_to_table(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select students_regex from prm_ext), 'g')), ','), ';') as cols
      from t
     ) t
id  student
2   anthony
3   steven
4   lucille
10  anthony
11  steven
12  lucille
I'm presenting a problem with DISTINCT values by a column condition that I have been dealing with, and I can't come up with
any idea how to solve it.
The problem is that Stephen appears twice here, but I don't want duplicates.
The problem:
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
11 1 1 employees Stephen
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
The desired output:
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
I have tried CASE statements, but without success. When I group by worker_id,
it removes other duplicates as well, so I figured it needs to be grouped by some special condition?
If anyone can give me a hint on how to solve this problem, I will be very grateful.
Thanks!
There are no duplicate rows in this table. Just because Stephen appears twice doesn't make them duplicates because the ID, VEHICLE_ID, and USER_TYPE are different.
What you need to do is decide how you want to identify the Stephen record you wish to see in the output. Is it the one with the highest VEHICLE_ID? The "latest" record, i.e. the one with the highest ID?
You will use that rule in a window function to order the rows within your criteria, and then use that row number to filter down to the results you want. Something like this:
select id, vehicle_id, worker_id, user_type, user_fullname
from (
    select id, vehicle_id, worker_id, user_type, user_fullname,
           row_number() over (partition by worker_id, user_fullname order by id desc) n
    from user_vehicle
) t
where t.n = 1
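As a quick check of the row-numbering rule, the sketch below runs the same query against the sample rows from the question in an in-memory SQLite database. SQLite stands in for the asker's database here (an assumption; window functions need SQLite 3.25 or newer), and the helper name `latest_per_worker` plus the final `order by id` are additions for the demo.

```python
import sqlite3

# Sample rows from the question
ROWS = [
    (9, 1, None, "external_users", "John Dalton"),
    (10, 1, 16, "employees", "Mike"),
    (11, 1, 1, "employees", "Stephen"),
    (12, 2, 173, "employee", "Nicholas"),
    (13, 2, 1, "employee", "Stephen"),
    (14, 1, None, "external_users", "Peter"),
]

QUERY = """
select id, vehicle_id, worker_id, user_type, user_fullname
from (
    select id, vehicle_id, worker_id, user_type, user_fullname,
           row_number() over (partition by worker_id, user_fullname
                              order by id desc) as n
    from user_vehicle
) t
where t.n = 1
order by id
"""


def latest_per_worker(rows):
    """Keep the highest-id row per (worker_id, user_fullname); hypothetical helper."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_vehicle (id integer, vehicle_id integer, "
        "worker_id integer, user_type text, user_fullname text)"
    )
    conn.executemany("insert into user_vehicle values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in latest_per_worker(ROWS):
    print(row)
```

With `order by id desc`, the Stephen row with id 13 is kept and id 11 is dropped; the two NULL-worker rows survive because their names differ, so they land in separate partitions.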
I am fetching some data from a view with some joined tables through Sqoop into an external table in Impala. However, I saw that the columns from one table multiply the rows.
For example:
id first_name surname step name value
1 ted kast 1 museum visitor
1 ted kast 1 shop buyer
1 ted kast 2 museum visitor
1 ted kast 2 shop buyer
But I want it to be something like this:
id first_name surname step name_value
1 ted kast 1 [(museum visitor), (shop buyer)]
1 ted kast 2 [(museum visitor), (shop buyer)]
How can I achieve that in Impala?
We can use aggregation here along with GROUP_CONCAT:
SELECT
    id,
    first_name,
    surname,
    step,
    CONCAT('[', GROUP_CONCAT(CONCAT('(', CONCAT_WS(' ', name, value), ')'), ', '), ']') AS name_value
FROM yourTable
GROUP BY
    id,
    first_name,
    surname,
    step
ORDER BY id;
Here is a demo for MySQL, where the syntax is almost the same as for Impala.
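The aggregation can also be tried locally. The sketch below is an approximation that uses an in-memory SQLite database in place of Impala (an assumption): SQLite's `group_concat(expr, separator)` plays the role of GROUP_CONCAT, and the `||` operator replaces CONCAT and CONCAT_WS. The helper name `collapse_rows` is hypothetical.

```python
import sqlite3

# Sample rows from the question
ROWS = [
    (1, "ted", "kast", 1, "museum", "visitor"),
    (1, "ted", "kast", 1, "shop", "buyer"),
    (1, "ted", "kast", 2, "museum", "visitor"),
    (1, "ted", "kast", 2, "shop", "buyer"),
]

QUERY = """
SELECT id, first_name, surname, step,
       '[' || GROUP_CONCAT('(' || name || ' ' || value || ')', ', ') || ']' AS name_value
FROM yourTable
GROUP BY id, first_name, surname, step
ORDER BY id, step
"""


def collapse_rows(rows):
    """Collapse the multiplied rows into one row per (id, first_name, surname, step)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (id INTEGER, first_name TEXT, surname TEXT, "
        "step INTEGER, name TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in collapse_rows(ROWS):
    print(row)
```

Each step ends up as a single row whose last column holds both `(name value)` pairs. The order of the pairs inside the brackets is not guaranteed by the aggregate, so code that consumes the string should not depend on it.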
Currently I have a table like this:
Roll no. Names
------------------
1 Sam
1 Sam
2 Sasha
2 Sasha
3 Joe
4 Jack
5 Jack
5 Julie
I want to write a query that gives me the count of each combination in another column.
Required output:
Combination distinct count
-----------------------------
2-Sasha 1
5-Jack 1
5-Julie 1
Basically, you could group by these columns and use a count function:
SELECT rollno, name, COUNT(*)
FROM mytable
GROUP BY rollno, name
You could also concat the two columns:
SELECT CONCAT(rollno, '-', name), COUNT(*)
FROM mytable
GROUP BY CONCAT(rollno, '-', name)
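To see what the grouped count returns for the sample table, here is a small sketch that runs an equivalent query in an in-memory SQLite database. SQLite is an assumption made for the demo, so `||` is used instead of CONCAT; the column names `rollno` and `name` follow the answer, and the helper name `count_combinations` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (roll no., name)
ROWS = [
    (1, "Sam"), (1, "Sam"), (2, "Sasha"), (2, "Sasha"),
    (3, "Joe"), (4, "Jack"), (5, "Jack"), (5, "Julie"),
]

QUERY = """
SELECT rollno || '-' || name AS combination, COUNT(*) AS cnt
FROM mytable
GROUP BY rollno, name
ORDER BY rollno, name
"""


def count_combinations(rows):
    """Count how often each (rollno, name) pair occurs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (rollno INTEGER, name TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for combination, cnt in count_combinations(ROWS):
    print(combination, cnt)
```

Note that COUNT(*) counts rows per combination, so `1-Sam` and `2-Sasha` come out as 2 here. If only some combinations should be listed, a HAVING clause or an outer filter would have to be added on top.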
I have the following problem.
I have this SQL Server 2008 query:
select code , name from customer
I want a cycle of 2 or more numbers to repeat in a separate column, like this:
code name repeating_numbers
x1 mike 1
x1500 George 2
x200 maria 1
x2098 john 2
a9876 mario 1
If I filter the query to show only the customers matching M%, I want to see:
code name repeating_numbers
x1 mike 1
x200 maria 2
a9876 mario 1
If I want to see the names matching %o%, I need to see:
code name repeating_numbers
x1500 George 1
a9876 mario 2
x2098 john 1
In other words, no matter the filters, I need to see the numbers 1, 2 (or maybe 3 in the future) repeat. Thank you in advance.
You can use the ROW_NUMBER with the modulus operator (%):
SELECT code, name, (ROW_NUMBER() OVER (ORDER BY code ASC) - 1) % 2 + 1 AS repeating_numbers
FROM customer
The same works with any other number too (like 3):
SELECT code, name, (ROW_NUMBER() OVER (ORDER BY code ASC) - 1) % 3 + 1 AS repeating_numbers
FROM customer
You can also use the following to avoid the ORDER BY:
SELECT code, name, (ROW_NUMBER() OVER (ORDER BY (SELECT 100)) - 1) % 2 + 1 AS repeating_numbers
FROM customer
demos: http://sqlfiddle.com/#!18/bdae7/6/1
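The cycle restarts correctly under any filter because the window function is evaluated after the WHERE clause. The sketch below illustrates this with an in-memory SQLite database standing in for SQL Server (an assumption; window functions need SQLite 3.25 or newer, and SQLite's LIKE is case-insensitive for ASCII by default). The helper name `cycle_numbers` and its parameters are hypothetical.

```python
import sqlite3

# Sample rows from the question: (code, name)
ROWS = [
    ("x1", "mike"), ("x1500", "George"), ("x200", "maria"),
    ("x2098", "john"), ("a9876", "mario"),
]

QUERY = """
SELECT code, name,
       (ROW_NUMBER() OVER (ORDER BY code ASC) - 1) % :k + 1 AS repeating_numbers
FROM customer
WHERE name LIKE :name_filter
ORDER BY code ASC
"""


def cycle_numbers(rows, name_filter="%", k=2):
    """Number the filtered rows 1..k, 1..k, ... in code order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (code TEXT, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    result = conn.execute(QUERY, {"k": k, "name_filter": name_filter}).fetchall()
    conn.close()
    return result


for row in cycle_numbers(ROWS, name_filter="m%"):
    print(row)
```

The rows come back in `code` order, which is the order the window uses, so the order differs from the listing in the question while the 1, 2, 1, ... cycle is preserved.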
I have got a table named student. I have written this query:
select * From student where sname in ('rajesh','rohit','rajesh')
The above query returns two records: one matching 'rajesh' and another matching 'rohit'.
But I want 3 records: 2 for 'rajesh' and 1 for 'rohit'.
Please suggest a solution or tell me what I am missing.
NOTE: the number of results of the subquery is not fixed; there can be many words, some distinct and some occurring multiple times.
Thanks
Your requirements are not clear, and I'll try to explain why.
Let's define table students
ID FirstName LastName
1 John Smith
2 Mike Smith
3 Ben Bray
4 John Bray
5 John Smith
6 Bill Lynch
7 Bill Smith
Query with WHERE clause:
FirstName in ('Mike', 'Ben', 'Mike')
will return 2 rows only, because it could be rewritten as:
FirstName='Mike' or FirstName='Ben' or FirstName='Mike'
WHERE is a filtering clause that just says whether an existing row satisfies the given conditions or not (for each of the rows produced by the FROM clause).
Let's say we have a subquery (SQ) that returns any number of non-distinct FirstNames.
If SQ contains 'Mike', 'Ben', 'Mike', an inner join gives you those 3 rows without a problem:
Select ST.* from Students ST
Inner Join (Select name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Result will be:
ID FirstName LastName
2 Mike Smith
2 Mike Smith
3 Ben Bray
Note that the data are not ordered by the order of the names returned by SQ. If you want that, SQ should return some ordering number, e.g.:
Ord Name
1. Mike
2. Ben
3. Mike
In that case query should be:
Select ST.* from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Order By SQ.ord
And result:
ID FirstName LastName
2 Mike Smith (1)
3 Ben Bray (2)
2 Mike Smith (3)
Now, let's see what happens if the subquery returns:
Ord Name
1. Mike
2. Bill
3. Mike
You will end up with
ID FirstName LastName
2 Mike Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
2 Mike Smith (3)
Even worse, if you have something like:
Ord Name
1. John
2. Bill
3. John
Result is:
ID FirstName LastName
1 John Smith (1)
4 John Bray (1)
5 John Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
1 John Smith (3)
4 John Bray (3)
5 John Smith (3)
This is a complex situation, and you have to clarify precisely what the requirement is.
If you need only one student per name for each row in SQ, you can use something like this (SQL Server 2005+):
;With st1 as
(
Select Row_Number() over (Partition by SQ.ord Order By ID) as rowNum,
ST.ID,
ST.FirstName,
ST.LastName,
SQ.ord
from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
)
Select ID, FirstName, LastName
From st1
Where rowNum=1 -- that was missing row, added later
Order By ord
It will return (for SQ values John, Bill, John)
ID FirstName LastName
1 John Smith (1)
6 Bill Lynch (2)
1 John Smith (3)
Note: the numbers (1), (2), (3) are shown to display the value of ord, although they are not returned by the query.
If you can split the where clause in your calling code, you could perform a UNION ALL on each clause.
SELECT * FROM Student WHERE sname = 'rajesh'
UNION ALL SELECT * FROM Student WHERE sname = 'rohit'
UNION ALL SELECT * FROM Student WHERE sname = 'rajesh'
Try using a JOIN:
SELECT ...
FROM Student s
INNER JOIN (
SELECT 'rajesh' AS sname
UNION ALL
SELECT 'rohit'
UNION ALL
SELECT 'rajesh') t ON s.sname = t.sname
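The difference between IN and the JOIN can be checked with a small sketch. It uses an in-memory SQLite database and a made-up Student table (three rows, one of them not in the search list), both of which are assumptions for illustration; the helper names `rows_with_in` and `rows_with_join` are hypothetical.

```python
import sqlite3

# Hypothetical table contents: (id, sname)
STUDENTS = [(1, "rajesh"), (2, "rohit"), (3, "amit")]

IN_QUERY = """
SELECT id, sname FROM Student
WHERE sname IN ('rajesh', 'rohit', 'rajesh')
ORDER BY id
"""

JOIN_QUERY = """
SELECT s.id, s.sname
FROM Student s
INNER JOIN (
    SELECT 'rajesh' AS sname
    UNION ALL
    SELECT 'rohit'
    UNION ALL
    SELECT 'rajesh'
) t ON s.sname = t.sname
ORDER BY s.id
"""


def _run(query):
    # Build a fresh in-memory table for each call so the helpers are independent
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (id INTEGER, sname TEXT)")
    conn.executemany("INSERT INTO Student VALUES (?, ?)", STUDENTS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def rows_with_in():
    """IN only filters: each table row is returned at most once."""
    return _run(IN_QUERY)


def rows_with_join():
    """JOIN pairs rows: a table row repeats once per matching list entry."""
    return _run(JOIN_QUERY)


print(rows_with_in())    # [(1, 'rajesh'), (2, 'rohit')]
print(rows_with_join())  # [(1, 'rajesh'), (1, 'rajesh'), (2, 'rohit')]
```

The derived table keeps its duplicates because of UNION ALL; a plain UNION would remove them and bring the result back to two rows.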
Just because you've got a criterion in there two times doesn't mean that it will return one result per criterion. SQL engines usually just use the unique criteria; thus, in your example, there will be two criteria in the IN clause: 'rajesh', 'rohit'.
Why do you need to return 2 results? Are there two rajesh in your table? They should both be returned then. You don't need to ask for rajesh twice for that to happen. What does your data look like? What do you want to see returned?
Hi, I ran a query just like the one you gave above, and it gives me all the data that matches the condition in the IN clause, just like in your post:
select * from person
where personid in (
'Carson','Kim','Carson'
)
order by FirstName
and it gives me all the records that fulfil this criterion.