Multiplied rows in impala - sql

I am fetching some data from a view with some joined tables through sqoop into an external table in impala. However I saw that the columns from one table multiply the rows.
For example
id first_name surname step name value
1 ted kast 1 museum visitor
1 ted kast 1 shop buyer
1 ted kast 2 museum visitor
1 ted kast 2 shop buyer
But I want to be something like that
id first_name surname step name_value
1 ted kast 1 [(museum visitor), (shop buyer)]
1 ted kast 2 [(museum visitor), (shop buyer)]
How can I achieve that in impala?

We can use aggregation here along with GROUP_CONCAT:
SELECT
id,
first_name,
surname,
step,
CONCAT('[', GROUP_CONCAT(CONCAT('(', CONCAT_WS(' ', name, value), ')'), ', '), ']') AS name_value
FROM yourTable
GROUP BY
id,
first_name,
surname,
step
ORDER BY id;
Here is a demo for MySQL, where the syntax is almost the same as for Impala.

Related

count different column values after grouping by

Consider this table:
id name department email
1 Alex IT blah#gmail.com
1 Alex IT blah#gmail.com
2 Jay HR jay#gmail.com
2 Jay Marketing zou#gmail.com
If I group byid,name and count I get:
id name count(*)
1 Alex 2
2 Jay 2
With this query:
select id,name,count(*) from tb group by id,name;
However I would like to count only records that diverge from department,email, so as to have:
id name count(*)
1 Alex 0
2 Jay 1
This time the count for the first group 1,Alex is 0 because department,email have the same values (duplicated) , on the other hand 2,Jay is one because department,email has one different value.
If you meant "two different values" for "Jay", you can use distinct:
select id,name,count(*) from (SELECT distinct * FROM tb) group by id,name;
You can use count(*) - 1 to get similar results in your question.

get occurence of multiple rows in postgresql

I have a table that contains students publications like this
id
student
1
john
2
anthony
3
steven
4
lucille
5
anthony
6
steven
7
john
8
lucille
9
john
10
anthony
11
steven
12
lucille
13
john
so the idea is about to have a query that fetchs all ordered occurences of a determinated student names
context :
answer to the question : how many times John is publishing just after Anthony (who is publishing just after Steven ...) and get id of each occurence
example :
If I look for all occurences of [john, anthony] I'll get (note that the ids must be successive for each occurence)
id
student
1
john
2
anthony
9
john
10
anthony
Or :
id
-- comment
1
(id of first occurence of john, anthony)
9
(id of second occurence of john, anthony)
If I look for [anthony, steven, lucille] i'll get
id
student
2
anthony
3
steven
4
lucille
10
anthony
11
steven
12
lucille
Or :
id
-- comment
2
(id of first occurence of anthony, steven, lucille)
10
(id of second occurence of anthony, steven, lucille)
Any ideas or leads to help me move forward?
That should do the trick, performance wise.
The main idea is to split the data by the first student that is in our search list, but not in all places -
Since the same student can appear multiple times in our search list, we need to make sure that we're not breaking the pattern in the middle.
We're doing that by verifying that each occurrence of the first student is far enough from its previous occurrence, that is, the distance between the two occurrences is bigger than the search list length (the number of non-unique students' names within the search list)
with
prm(students) as (select 'anthony,steven,lucille,anthony')
,prm_ext(search_pattern, first_student, tokens_num) as
(
select regexp_replace(students, '^|(,)','\1\d+;', 'g') as search_pattern
,split_part(students, ',', 1) as first_student
,array_length(string_to_array(students, ','), 1) as tokens_num
from prm
)
,prev_student as
(
select id
,student
,lag(id) over (partition by student order by id) as student_prev_id
from t
)
,seq as
(
select id
,student
,sum(case when student = p.first_student and coalesce(id - student_prev_id >= p.tokens_num, true) then 1 end) over (order by id) as seq_id
,id - max(case when student = p.first_student then id end) over (order by id) as distance_from_first_student
from prev_student cross join prm_ext as p
order by id
)
select split_part(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select search_pattern from prm_ext), 'g')), ';', 1)::int as id
from seq cross join prm_ext p
where seq_id is not null
and distance_from_first_student < p.tokens_num
group by seq_id
This is the result for an extended data sample:
id
2
16
22
Fiddle
Start with this and if it explodes we'll do some performance improvements, with the price of making the code a little bit more complicated.
with
prm(students) as (select 'anthony,steven,lucille')
,prm_ext(students_regex) as (select regexp_replace(students, '^|(,)','\1\d+;', 'g') from prm)
select split_part(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select students_regex from prm_ext), 'g')), ';', 1)::int as id
from t
id
2
10
with
prm(students) as (select 'anthony,steven,lucille')
,prm_ext(students_regex) as (select regexp_replace(students, '^|(,)','\1\d+;', 'g') from prm)
select cols[1]::int as id
,cols[2]::text as student
from (select string_to_array(string_to_table(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select students_regex from prm_ext), 'g')), ','), ';') as cols
from t
) t
id
student
2
anthony
3
steven
4
lucille
10
anthony
11
steven
12
lucille
Fiddle

How to return a count of duplicates and unique values into a column in Access

I currently have this table:
First_Name
Last_Name
Jane
Doe
John
Smith
Bob
Smith
Alice
Smith
And I'm looking to get the table to look for duplicates in the last name and return a value into a new column and exclude any null/unique values like the table below, or return a Yes/No into the third column.
First_Name
Last_Name
Duplicates
Jane
Doe
0
John
Smith
3
Bob
Smith
3
Alice
Smith
3
OR
First_Name
Last_Name
Duplicates
Jane
Doe
No
John
Smith
Yes
Bob
Smith
Yes
Alice
Smith
Yes
When I'm trying to enter the query into the Access Database, I keep getting the run-time 3141 error.
The code that I tried in order to get the first option is:
SELECT first_name, last_name, COUNT (last_name) AS Duplicates
FROM table
GROUP BY last_name, first_name
HAVING COUNT(last_name)=>0
You can use a subquery. But I would recommend 1 instead of 0:
select t.*,
(select count(*)
from t as t2
where t2.last_name = t.last_name
)
from t;
If you really want zero instead of 1, then one method is:
select t.*,
(select iif(count(*) = 1, 0, count(*))
from t as t2
where t2.last_name = t.last_name
)
from t;

Finding distinct count of combination of columns values in sql

Currently I have a table this :
Roll no. Names
------------------
1 Sam
1 Sam
2 Sasha
2 Sasha
3 Joe
4 Jack
5 Jack
5 Julie
I want to write a query in which I get count of the combination in another column
Required output
Combination distinct count
-----------------------------
2-Sasha 1
5-Jack 1
5-Julie 1
Basically, you could group by these columns and use a count function:
SELECT rollno, name, COUNT(*)
FROM mytable
GROUP BY rollno, name
You could also concat the two columns:
SELECT CONCAT(rollno, '-', name), COUNT(*)
FROM mytable
GROUP BY CONCAT(rollno, '-', name)

Select query which returns exect no of rows as compare in values of sub query

I have got a table named student. I have written this query:
select * From student where sname in ('rajesh','rohit','rajesh')
In the above query it's returning me two records; one matching 'rajesh' and another matching: 'rohit'.
But i want there to be 3 records: 2 for 'rajesh' and 1 for 'rohit'.
Please provide me some solution or tell me where i am missing.
NOTE: the count of result of sub query is not fix there can be many words there some distinct and some multiple occurrence .
Thanks
Your requirements are not clear, and I'll try to explain why.
Let's define table students
ID FirstName LastName
1 John Smith
2 Mike Smith
3 Ben Bray
4 John Bray
5 John Smith
6 Bill Lynch
7 Bill Smith
Query with WHERE clause:
FirstName in ('Mike', 'Ben', 'Mike')
will return 2 rows only, because it could be rewritten as:
FirstName='Mike' or FirstName='Ben' or FirstName='Mike'
WHERE is filtering clause that just says if existing row satisfy given conditions or not (for each of rows created by FROM clause.
Let's say we have subquery that returns any number of non distinct FirstNames
In case if SQ contains 'Mike', 'Ben', 'Mike' using inner join you can get those 3 rows without problem
Select ST.* from Students ST
Inner Join (Select name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Result will be:
ID FirstName LastName
2 Mike Smith
2 Mike Smith
3 Ben Bray
Note data are not ordered by order of names returning by SQ. If you want that, SQ should return some ordering number, eg.:
Ord Name
1. Mike
2. Ben
3. Mike
In that case query should be:
Select ST.* from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Order By SQ.ord
And result:
ID FirstName LastName
2 Mike Smith (1)
3 Ben Bray (2)
2 Mike Smith (3)
Now, let's se what will happen if subquery returns
Ord Name
1. Mike
2. Bill
3. Mike
You will end up with
ID FirstName LastName
2 Mike Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
2 Mike Smith (3)
Even worse, if you have something like:
Ord Name
1. John
2. Bill
3. John
Result is:
ID FirstName LastName
1 John Smith (1)
4 John Bray (1)
5 John Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
1 John Smith (3)
4 John Bray (3)
5 John Smith (3)
This is an complex situation, and you have to clarify precisely what requirement is.
If you need only one student with the same name, for each of rows in SQ, you can use something like SQL 2005+):
;With st1 as
(
Select Row_Number() over (Partition by SQ.ord Order By ID) as rowNum,
ST.ID,
ST.FirstName,
ST.LastName,
SQ.ord
from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
)
Select ID, FirstName, LastName
From st1
Where rowNum=1 -- that was missing row, added later
Order By ord
It will return (for SQ values John, Bill, John)
ID FirstName LastName
1 John Smith (1)
6 Bill Lynch (2)
1 John Smith (3)
Note, numbers (1),(2),(3) are shown to display value of ord although they are not returned by query.
If you can split the where clause in your calling code, you could perform a UNION ALL on each clause.
SELECT * FROM Student WHERE sname = 'rajesh'
UNION ALL SELECT * FROM Student WHERE sname = 'rohit'
UNION ALL SELECT * FROM Student WHERE sname = 'rajesh'
Try using a JOIN:
SELECT ...
FROM Student s
INNER JOIN (
SELECT 'rajesh' AS sname
UNION ALL
SELECT 'rohit'
UNION ALL
SELECT 'rajesh') t ON s.sname = t.sname
just because you've got a criteria in there two times doesn't mean that it will return 1 result per criteria. SQL engines usually just use the unique criteria - thus, from your example, there will be 2 criteria in IN clause: 'rajesh','rohit'
WHY do you need to return 2 results? are there two rajesh in your table? they should BOTH return then. You don't need to ask for rajesh twice for that to happen. What does your data look like? What do you want to see returned?
Hi i am query just as you give above and it give me all data that matches in the condition of in clause. just like your post
select * from person
where personid in (
'Carson','Kim','Carson'
)
order by FirstName
and its give me all records which fulfill this Criteria