How to pivot table and combine rows based on a condition - sql

I have the following table produced by the following SQL:
select userid, name, sirname, age from Users
I am wondering what the best way would be to convert this to something that looks like:
Would partition by + distinct be the most efficient way of doing this?

SELECT userid, [0] AS name, [1] AS sirname, age
FROM users
PIVOT
(MAX(name)
FOR sirname IN ([0],[1])) AS p

I would just suggest aggregation:
select userid, max(case when sirname = 0 then name end) as name,
max(case when sirname = 1 then name end) as sirname,
age
from t
group by userid, age;
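To see the aggregation approach end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are an assumption (the original tables were not preserved): each user is assumed to have one row with sirname = 0 holding the first name and one row with sirname = 1 holding the surname.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (userid INTEGER, name TEXT, sirname INTEGER, age INTEGER)"
)
# Hypothetical sample data: two rows per user, flagged by sirname (0 or 1).
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "John", 0, 30),
        (1, "Smith", 1, 30),
        (2, "Jane", 0, 25),
        (2, "Doe", 1, 25),
    ],
)

# Conditional aggregation: each CASE picks the name from the matching row,
# and MAX collapses the two rows of a user into one.
pivoted = conn.execute(
    """
    SELECT userid,
           MAX(CASE WHEN sirname = 0 THEN name END) AS name,
           MAX(CASE WHEN sirname = 1 THEN name END) AS sirname,
           age
    FROM users
    GROUP BY userid, age
    ORDER BY userid
    """
).fetchall()
print(pivoted)
```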

I'd suggest that the most natural solution is a self-join:
SELECT
u1.userid
,u1.name
,u2.name AS sirname
,u1.age
FROM users u1
INNER JOIN users u2 ON u1.userid = u2.userid AND u2.sirname = 1
WHERE
u1.sirname = 0
This is all the more attractive because the approach is easy to adapt to situations where one of the two names may be missing.
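As a sketch of how the self-join might be adapted when the surname row is missing, the INNER JOIN can be swapped for a LEFT JOIN. The example below uses Python's sqlite3 module with made-up rows and the question's userid column; user 2 deliberately has no sirname = 1 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (userid INTEGER, name TEXT, sirname INTEGER, age INTEGER)"
)
# Hypothetical sample data: user 2 has a first-name row but no surname row.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "John", 0, 30), (1, "Smith", 1, 30), (2, "Jane", 0, 25)],
)

# LEFT JOIN keeps every first-name row; the surname is NULL when absent.
joined = conn.execute(
    """
    SELECT u1.userid, u1.name, u2.name AS sirname, u1.age
    FROM users u1
    LEFT JOIN users u2 ON u1.userid = u2.userid AND u2.sirname = 1
    WHERE u1.sirname = 0
    ORDER BY u1.userid
    """
).fetchall()
print(joined)
```

Note that the sirname = 1 condition stays in the ON clause; moving it to WHERE would filter out the users without a surname row again.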

Related

Google BigQuery and SQL

I'm used to working with SQL Server databases and now I need to query data from BigQuery.
What is a better way to query data from a table like this, where one column contains several columns?
BigQuery supports unnest() for turning array elements into rows. So, you can convert all of this into rows as:
select t.user_id, t.user_pseudo_id, up.*
from t cross join
unnest(user_properties) up;
You want a field per property. There are several ways to do this. If you want exactly one value per row, you can use a subquery and aggregation:
select t.user_id, t.user_pseudo_id, p.*
from t cross join
(select max(case when up.key = 'age' then up.string_value end) as age,
max(case when up.key = 'gender' then up.string_value end) as gender
from unnest(user_properties) up
) p
Usually scalar subqueries over the unnested array are used for this:
SELECT
user_id,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(user_properties) WHERE key = "age") AS age,
(SELECT value.string_value FROM UNNEST(user_properties) WHERE key = "gender") AS gender,
FROM dataset.table
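For readers coming from SQL Server, it may help to see what these per-row subqueries compute. The following is only a plain-Python illustration, not BigQuery code: the rows and property values are invented, and the nesting (a key plus value.string_value) is assumed from the query above.

```python
# Hypothetical rows: each has a repeated user_properties field (a list of records).
rows = [
    {
        "user_id": "u1",
        "user_pseudo_id": "p1",
        "user_properties": [
            {"key": "age", "value": {"string_value": "34"}},
            {"key": "gender", "value": {"string_value": "f"}},
        ],
    },
    {
        "user_id": "u2",
        "user_pseudo_id": "p2",
        "user_properties": [{"key": "age", "value": {"string_value": "51"}}],
    },
]


def property_value(properties, key):
    """Mimic the scalar subquery: return the string_value of the first match."""
    for prop in properties:
        if prop["key"] == key:
            return prop["value"]["string_value"]
    return None  # no matching property, like a subquery that yields no row


# One output row per input row, one column per property of interest.
flattened = [
    (
        r["user_id"],
        r["user_pseudo_id"],
        property_value(r["user_properties"], "age"),
        property_value(r["user_properties"], "gender"),
    )
    for r in rows
]
print(flattened)
```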

Listagg + Count in Select duplicates

I'm writing up a query and cannot seem to get over this hurdle.
I am using both LISTAGG and COUNT side by side, and whenever I do, the LISTAGG output repeats values when the count is more than 1. Conversely, the count is inflated when the LISTAGG contains more than one value. The two interfere with each other. I want to keep them in the same query, but without duplicates in the LISTAGG and with the correct number of instances in the COUNT.
I've tried using DISTINCT and various groupings, but to no avail.
Here is my (simplified) SQL:
SELECT DISTINCT /*+PARALLEL */ ID, NAME, LISTAGG(USERID, ';'), COUNT(MAIN_DATA)
FROM MAIN m
JOIN USERS u on m.pk1 = u.main_pk1
WHERE MAIN_DATA like '%keyword%'
GROUP BY ID, NAME
which yields something similar to this:
ID|NAME|USERID|MAIN_DATA
1|Hello|Jim|1
2|Hi|Arthur;Arthur;Arthur|3
3|Bonjour|Jane;Jane;Jim;Jim|4
ID 2 should only have Arthur once, and there are only 2 instances of the keyword in ID 3, not 4. How can I achieve this?
Unfortunately, LISTAGG() doesn't support DISTINCT in Oracle versions before 19c.
To remove duplicates there, you need a subquery:
SELECT ID, NAME, LISTAGG(USERID, ';') WITHIN GROUP (ORDER BY USERID), SUM(cnt)
FROM (SELECT ID, NAME, USERID, COUNT(*) as cnt
FROM MAIN m JOIN
USERS u
ON m.pk1 = u.main_pk1
WHERE m.MAIN_DATA like '%keyword%'
GROUP BY ID, NAME, USERID
) mu
GROUP BY ID, NAME;
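The sketch below reproduces the problem on invented data and shows one way the two aggregates might be kept apart. It makes several assumptions: the table layouts are guessed from the simplified query, SQLite's group_concat stands in for LISTAGG, and, as a variation on the query above, the keyword rows are counted in a separate subquery over MAIN. Summing the per-user counts would still count each MAIN row once per joined user, which gives 4 rather than 2 for ID 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data shaped to reproduce the output in the question.
conn.executescript(
    """
    CREATE TABLE main (pk1 INTEGER, id INTEGER, name TEXT, main_data TEXT);
    CREATE TABLE users (main_pk1 INTEGER, userid TEXT);
    INSERT INTO main VALUES
      (10, 1, 'Hello',   'a keyword'),
      (20, 2, 'Hi',      'keyword x'),
      (21, 2, 'Hi',      'keyword y'),
      (22, 2, 'Hi',      'keyword z'),
      (30, 3, 'Bonjour', 'keyword p'),
      (31, 3, 'Bonjour', 'keyword q');
    INSERT INTO users VALUES
      (10, 'Jim'),
      (20, 'Arthur'), (21, 'Arthur'), (22, 'Arthur'),
      (30, 'Jane'), (30, 'Jim'), (31, 'Jane'), (31, 'Jim');
    """
)

rows = conn.execute(
    """
    SELECT d.id, d.name, group_concat(d.userid, ';') AS userids, c.cnt
    FROM (SELECT DISTINCT m.id, m.name, u.userid      -- de-duplicated users
          FROM main m JOIN users u ON m.pk1 = u.main_pk1
          WHERE m.main_data LIKE '%keyword%') d
    JOIN (SELECT id, COUNT(*) AS cnt                  -- keyword rows, no join
          FROM main
          WHERE main_data LIKE '%keyword%'
          GROUP BY id) c ON c.id = d.id
    GROUP BY d.id, d.name, c.cnt
    ORDER BY d.id
    """
).fetchall()

# Sort the user lists so the result does not depend on concatenation order.
summary = {r[0]: (sorted(r[2].split(";")), r[3]) for r in rows}
print(summary)
```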

using group by operators in sql

I have two columns, email id and customer id, where an email id can be associated with multiple customer ids. I need to list only those email ids (along with their corresponding customer ids) that have more than one customer id. I tried the grouping sets, rollup and cube operators, but I am not getting the desired result.
Any help or pointers would be appreciated.
SELECT emailid
FROM
( SELECT emailid, count(custid) AS cust_count
FROM table
GROUP BY emailid
HAVING count(custid) > 1
) t
I think this will get you what you want, if I am understanding your question correctly:
select emailid, customerid from tablename where emailid in
(
select emailid from tablename group by emailid having count(emailid) > 1
)
Sounds like you need to use HAVING, e.g.:
SELECT email_id, COUNT(customer_id)
From sometable
GROUP BY email_id
HAVING COUNT(customer_id) > 1
HAVING lets you filter on an aggregate after the rows have been grouped.
WITH email_ids AS (
SELECT email_id, COUNT(customer_id) customer_count
FROM Table
GROUP BY email_id
HAVING count(customer_id) > 1
)
SELECT t.email_id, t.customer_id
FROM Table t
INNER JOIN email_ids ei
ON ei.email_id = t.email_id
If you need a comma-separated list of all of their customer ids returned with the single email id, you could use GROUP_CONCAT for that.
This would find all email_ids with more than one customer_id, and give you a comma-separated list of all customer_ids for that email_id:
SELECT email_id, GROUP_CONCAT(customer_id)
FROM your_table
GROUP BY email_id
HAVING count(customer_id) > 1;
Assuming email_id #1 was assigned to customer_ids 1, 2, & 3, your output would look like:
email_id | customer_id
1 | 1,2,3
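A quick way to try this without a MySQL server is SQLite, whose group_concat behaves similarly for this purpose. The sketch below is only an illustration with invented rows: email_id 1 has customers 1, 2 and 3, and email_id 2 has a single customer and is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (email_id INTEGER, customer_id INTEGER)")
# Hypothetical data: email 1 has three customers, email 2 only one.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4)],
)

rows = conn.execute(
    """
    SELECT email_id, group_concat(customer_id) AS customer_ids
    FROM your_table
    GROUP BY email_id
    HAVING COUNT(customer_id) > 1
    """
).fetchall()

# Split and sort each list so the check does not rely on concatenation order.
customers_by_email = {email: sorted(ids.split(",")) for email, ids in rows}
print(customers_by_email)
```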
I didn't realize you were using MS SQL; there's a thread about simulating GROUP_CONCAT in MS SQL: Simulating group_concat MySQL function in Microsoft SQL Server 2005?
SELECT t1.email, t1.customer
FROM table t1
INNER JOIN (
SELECT email, COUNT(customer) AS customer_count
FROM table
GROUP BY email
HAVING COUNT(customer)>1
) t2 on t1.email = t2.email
This should get you what you're looking for.
Basically, as other people have stated, you can filter GROUP BY results with HAVING. But since you want the customer ids afterwards, join the grouped select back to your original table to get your results. It could probably be done more elegantly, but this is easy to understand.
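The join-back pattern described here can be tried out with a small sketch using Python's sqlite3 module. The table name cust_email and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_email (email_id TEXT, customer_id INTEGER)")
# Hypothetical data: a@ and c@ have several customers, b@ has one.
conn.executemany(
    "INSERT INTO cust_email VALUES (?, ?)",
    [
        ("a@example.com", 1),
        ("a@example.com", 2),
        ("a@example.com", 3),
        ("b@example.com", 4),
        ("c@example.com", 5),
        ("c@example.com", 6),
    ],
)

# Step 1 (subquery): find the emails with more than one customer via HAVING.
# Step 2 (join): return the original rows for just those emails.
shared = conn.execute(
    """
    SELECT t.email_id, t.customer_id
    FROM cust_email t
    INNER JOIN (SELECT email_id
                FROM cust_email
                GROUP BY email_id
                HAVING COUNT(customer_id) > 1) multi
      ON multi.email_id = t.email_id
    ORDER BY t.email_id, t.customer_id
    """
).fetchall()
print(shared)
```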
SELECT
email_id,
STUFF((SELECT ',' + CONVERT(VARCHAR,customer_id) FROM cust_email_table T1 WHERE T1.email_id = T2.email_id
FOR
XML PATH('')
),1,1,'') AS customer_ids
FROM
cust_email_table T2
GROUP BY email_id
HAVING COUNT(*) > 1
This would give you a single row per email id and a comma-separated list of customer ids.

Select entry of each group having exactly 1 entry

I am looking for an optimized query.
Let me show you a small example.
Let's suppose I have a table with three fields, studentId, teacherId and subject:
Now I want the rows in which a physics teacher is teaching only one student, i.e.
teacher 300 is only teaching student 3, and so on.
What I have tried so far:
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query works fine, but I want a different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number(), but without success.
FYI:
This is only an example, not the actual table I am working with. My table contains a huge number of rows and columns, and the where clause is also very complex (date comparisons etc.), so I don't want to repeat the same where clause in the subquery and the outer query.
You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by tid) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use a not exists clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)
Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1
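To illustrate how the window-function approach keeps the filter in one place, here is a minimal sketch using Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). The rows are invented, and the subject filter is placed once, inside the derived table, which is an assumption about how the real where clause would be applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabletesting (sid INTEGER, tid INTEGER, subject TEXT)")
# Hypothetical data: teacher 100 has two physics students, teachers 300 and
# 600 have one each, and teacher 400 only teaches maths.
conn.executemany(
    "INSERT INTO tabletesting VALUES (?, ?, ?)",
    [
        (1, 100, "physics"),
        (2, 100, "physics"),
        (3, 300, "physics"),
        (4, 400, "maths"),
        (7, 600, "physics"),
        (8, 600, "maths"),
    ],
)

# The WHERE clause appears only once; the window count is computed over the
# already filtered rows, so the table is read a single time.
single_student = conn.execute(
    """
    SELECT sid, tid
    FROM (SELECT sid, tid, COUNT(*) OVER (PARTITION BY tid) AS cnt
          FROM tabletesting
          WHERE subject = 'physics') x
    WHERE cnt = 1
    ORDER BY tid
    """
).fetchall()
print(single_student)
```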

DB2 Query - eliminate maxvalues

I have the following problem (simplified):
I have a table that contains animals, e.g:
ID Type Birthday
1 Dog 1.1.2011
2 Cat 2.1.2009
3 Horse 5.1.2009
4 Cat 10.6.1999
5 Horse 9.3.2006
I know that all the animals belong to one "family". From each family I now want to see all the offspring, but I do not want to see the entry for the "founder of the family".
So for the simple sample above I just want to see this:
ID Type Birthday
2 Cat 2.1.2009
3 Horse 5.1.2009
So far I haven't been able to find a way of grouping the entries and then removing the first entry from each group. I was only able to find how to remove specific lines.
Is it even possible to solve this problem?
Thank you very much for your help. It is much appreciated.
A simple query (not necessarily the most efficient) could be:
select
animals.id, animals.type, animals.birthday
from animals
left join
(select type, min(birthday) min_birthday
from animals
group by type) a
on a.type=animals.type and a.min_birthday = animals.birthday
where a.type is null;
For better efficiency you can use an analytic function:
select id, type, birthday
from(
select
id,
type,
birthday,
row_number() over (partition by type order by birthday) as rnk
from animals
) a
where rnk >=2
For more examples with analytical functions, you can read this article
In SQL Server you can do:
select
id, type, birthday
from (
select
id, type, birthday,
row_number() over (partition by type order by birthday asc) r
from
animals
) q
where r > 1
The row_number() function reportedly also works in DB2, but I don't know under which circumstances or versions.
The exists variant:
select id, type, birthday
from animals a
where exists (select null from animals e
where e.type = a.type and e.birthday < a.birthday)
(Edited, following comments.)
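The exists variant can be checked against the sample data from the question with a short sketch using Python's sqlite3 module. The only assumptions are that the birthdays are in D.M.YYYY format and that they are stored as ISO strings so that they compare correctly as text in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (id INTEGER, type TEXT, birthday TEXT)")
# Sample data from the question, with dates rewritten as YYYY-MM-DD.
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [
        (1, "Dog", "2011-01-01"),
        (2, "Cat", "2009-01-02"),
        (3, "Horse", "2009-01-05"),
        (4, "Cat", "1999-06-10"),
        (5, "Horse", "2006-03-09"),
    ],
)

# Keep an animal only if an older animal of the same type exists, which
# drops the oldest ("founder") row of every type.
offspring = conn.execute(
    """
    SELECT id, type, birthday
    FROM animals a
    WHERE EXISTS (SELECT NULL FROM animals e
                  WHERE e.type = a.type AND e.birthday < a.birthday)
    ORDER BY id
    """
).fetchall()
print(offspring)
```

The dog disappears entirely because it is the only animal of its type, which matches the expected output in the question.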