Let's suppose we have a table T1 and a table T2. There is a relation of 1:n between T1 and T2. I would like to select all T1 along with all their T2, every row corresponding to T1 records with T2 values concatenated, using only SQL-standard operations.
Example:
T1 = Person
T2 = Popularity (by year)
for each year a person has a certain popularity
I would like to write a selection using SQL-standard operations, resulting something like this:
Person.Name Popularity.Value
John Smith 1.2,5,4.2
John Doe NULL
Jane Smith 8
where there are 3 records in the popularity table for John Smith, none for John Doe and one for Jane Smith, their values being the values represented above. Is this possible? How?
I'm using Oracle but would like to do this using only standard SQL.
Here's one technique, using recursive Common Table Expressions. Unfortunately, I'm not confident on its performance.
I'm sure that there are ways to improve this code, but it shows that there doesn't seem to be an easy way to do something like this using just the SQL standard.
As far as I can see, there really should be some kind of STRINGJOIN aggregate function that would be used with GROUP BY. That would make things like this much easier...
This query assumes that there is some kind of PersonID that joins the two relations, but the Name would work too.
WITH cte (id, Name, Value, ValueCount) AS (
SELECT id,
Name,
CAST(Value AS VARCHAR(MAX)) AS Value,
1 AS ValueCount
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS id,
Name,
Value
FROM Person AS per
INNER JOIN Popularity AS pop
ON per.PersonID = pop.PersonID
) AS e
WHERE id = 1
UNION ALL
SELECT e.id,
e.Name,
cte.Value + ',' + CAST(e.Value AS VARCHAR(MAX)) AS Value,
cte.ValueCount + 1 AS ValueCount
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS id,
Name,
Value
FROM Person AS per
INNER JOIN Popularity AS pop
ON per.PersonID = pop.PersonID
) AS e
INNER JOIN cte
ON e.id = cte.id + 1
AND e.Name = cte.Name
)
SELECT p.Name, agg.Value
FROM Person p
LEFT JOIN (
SELECT Name, Value
FROM (
SELECT Name,
Value,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ValueCount DESC)AS id
FROM cte
) AS p
WHERE id = 1
) AS agg
ON p.Name = agg.Name
This is an example result:
--------------------------------
| Name | Value |
--------------------------------
| John Smith | 1.2,5,4.2 |
--------------------------------
| John Doe | NULL |
--------------------------------
| Jane Smith | 8 |
--------------------------------
As per in Oracle you can use listagg to achive this -
select t1.Person_Name, listagg(t2.Popularity_Value)
within group(order by t2.Popularity_Value)
from t1, t2
where t1.Person_Name = t2.Person_Name (+)
group by t1.Person_Name
I hope this will solve your problem.
But the comment you have given after #DavidJashi question .. well this is not sql standard and I think he is correct. I am also with David that you can not achieve this in pure sql statement.
I know that I'm SUPER late to the party, but for anyone else that might find this, I don't believe that this is possible using pure SQL92. As I discovered in the last few months fighting with NetSuite to try to figure out what Oracle methods I can and cannot use with their ODBC driver, I discovered that they only "support and guarantee" SQL92 standard.
I discovered this, because I had a need to perform a LISTAGG(). Once I found out I was restricted to SQL92, I did some digging through the historical records, and LISTAGG() and recursive queries (common table expressions) are NOT supported in SQL92, at all.
LISTAGG() was added in Oracle SQL version 11g Release 2 (2009 – 11 years ago: reference https://oracle-base.com/articles/misc/string-aggregation-techniques#listagg) , CTEs were added to Oracle SQL in version 9.2 (2007 – 13 years ago: reference https://www.databasestar.com/sql-cte-with/).
VERY frustrating that it's completely impossible to accomplish this kind of effect in pure SQL92, so I had to solve the problem in my C# code after I pulled a ton of extra unnecessary data. Very frustrating.
Related
I'm trying to get months of Employees' birthdays that are found in at least 2 rows
I've tried to unite birthday information table with itself supposing that I could iterate through them abd get months that appear multiple times
There's the question: how to get birthdays with months that repeat more than once?
SELECT DISTINCT e.EmployeeID, e.City, e.BirthDate
FROM Employees e
GROUP BY e.BirthDate, e.City, e.EmployeeID
HAVING COUNT(MONTH(b.BirthDate))=COUNT(MONTH(e.BirthDate))
UNION
SELECT DISTINCT b.EmployeeID, b.City, b.BirthDate
FROM Employees b
GROUP BY b.EmployeeID, b.BirthDate, b.City
HAVING ...
Given table:
| 1 | City1 | 1972-03-26|
| 2 | City2 | 1979-12-13|
| 3 | City3 | 1974-12-16|
| 4 | City3 | 1979-09-11|
Expected result :
| 2 | City2 |1979-12-13|
| 3 | City3 |1974-12-16|
Think of it in steps.
First, we'll find the months that have more than one birthday in them. That's the sub-query, below, which I'm aliasing as i for "inner query". (Substitute MONTH(i.Birthdate) into the SELECT list for the 1 if you want to see which months qualify.)
Then, in the outer query (o), you want all the fields, so I'm cheating and using SELECT *. Theoretically, a WHERE IN would work here, but IN can have unfortunate side effects if a NULL comes back, so I never use it. Instead, there's a correlated sub=query; which is to say we look for any results where the month from the outer query is equal to the months that make the cut in the inner (correlated sub-) query.
When using a correlated sub-query in the WHERE clause, the SELECT list doesn't matter. You could put 1/0 and it won't throw an error. But I always use SELECT 1 to show that the inner query isn't actually returning any results to the outer query. It's just there to look for, well, the correlation between the two data sets.
SELECT
*
FROM
#table AS o
WHERE
EXISTS
(
SELECT
1
FROM
#table AS i
WHERE
MONTH(i.Birthdate) = MONTH(o.Birthdate)
GROUP BY
MONTH(i.Birthdate)
HAVING
COUNT(*) > 1
);
Seems to be an odd requirement.
This might help with some tweaks. Works in Oracle.
SELECT DATE FROM TABLE WHERE EXTRACT(MONTH FROM DATE)=EXTRACT(MONTH FROM SOMEDATE);
Give this a try and you may be able to dispense with your UNION:
SELECT
EmployeeId
, City
, BirthDate
FROM Employees
GROUP BY
EmployeeId
, City
, BirthDate
HAVING COUNT(Month(BirthDate)) > 2
Here is another approach using GROUP_CONCAT. It's not exactly what you're looking for but it might do the job. Eric's approach is better though. (Note: This is for MySQL)
SELECT GROUP_CONCAT(EmployeeID) EmployeeID, BirthDate, COUNT(*) DupeCount
FROM Employees
GROUP BY MONTH(BirthDate)
HAVING DupeCount> 1;
I have data as follows:
ID Name Data
1 Joe ["Mary","Joe"]
2 Mary ["Sarah","Mary","Mary"]
3 Bill ["Bill","Joe"]
4 James ["James","James","James"]
I want to write a query that selects the LAST element from the array, which does not equal the Name field. For example, I want the query to return the following results:
ID Name Last
1 Joe Mary
2 Mary Sarah
3 Bill Joe
4 James (NULL)
I am getting close - I can select the last element with the following query:
SELECT ID, Name,
(Data::json->(json_array_length(Data::json)-1))::text AS Last
FROM table;
ID Name Last
1 Joe Joe
2 Mary Mary
3 Bill Joe
4 James James
However, I need one more level - to evaluate the last item, and if it is the same as the name field, to try the next to last field, and so on.
Any help or pointers would be most appreciated!
json in Postgres 9.3
This is hard in pg 9.3, because useful functionality is missing.
Method 1
Unnest in a LEFT JOIN LATERAL (clean and standard-conforming), trim double-quotes from json after casting to text. See links below.
SELECT DISTINCT ON (1)
t.id, t.name, d.last
FROM tbl t
LEFT JOIN LATERAL (
SELECT ('[' || d::text || ']')::json->>0 AS last
FROM json_array_elements(t.data) d
) d ON d.last <> t.name
ORDER BY 1, row_number() OVER () DESC;
While this works, and I have never seen it fail, the order of unnested elements depends on undocumented behavior. See links below!
Improved the conversion from json to text with the expression provided by #pozs in the comment. Still hackish, but should be safe.
Method 2
SELECT DISTINCT ON (1)
id, name, NULLIF(last, name) AS last
FROM (
SELECT t.id, t.name
,('[' || json_array_elements(t.data)::text || ']')::json->>0 AS last
, row_number() OVER () AS rn
FROM tbl t
) sub
ORDER BY 1, (last = name), rn DESC;
Unnest in the SELECT list (non-standard).
Attach row number (rn) in parallel (more reliable).
Convert to text like above.
The expression (last = name) in the ORDER BY clause sorts matching names last (but before NULL). So a matching name is only selected if no other name is available. Last link below.
In the SELECT list, NULLIF replaces a matching name with NULL, arriving at the same result as above.
SQL Fiddle.
json or jsonb in Postgres 9.4
pg 9.4 ships all the necessary improvements:
SELECT DISTINCT ON (1)
t.id, t.name, d.last
FROM tbl t
LEFT JOIN LATERAL json_array_elements_text(data) WITH ORDINALITY d(last, rn)
ON d.last <> t.name
ORDER BY d.rn DESC;
Use jsonb_array_elements_text() for jsonb. All else the same.
json / jsonb functions in the manual
Related answers with more explanation:
How to turn json array into postgres array?
PostgreSQL unnest() with element number
Index for finding an element in a JSON array
Time based priority in Active Record Query
Select first row in each GROUP BY group?
I'm looking for an sql answer on how to merge two tables without anything in common.
So let's say you have these two tables without anything in common:
Guys Girls
id name id name
--- ------ ---- ------
1 abraham 5 sarah
2 isaak 6 rachel
3 jacob 7 rebeka
8 leah
and you want to merge them side-by-side like this:
Couples
id name id name
--- ------ --- ------
1 abraham 5 sarah
2 isaak 6 rachel
3 jacob 7 rebeka
8 leah
How can this be done?
I'm looking for an sql answer on how to merge two tables without anything in common.
You can do this by creating a key, which is the row number, and joining on it.
Most dialects of SQL support the row_number() function. Here is an approach using it:
select gu.id, gu.name, gi.id, gi.name
from (select g.*, row_number() over (order by id) as seqnum
from guys g
) gu full outer join
(select g.*, row_number() over (order by id) as seqnum
from girls g
) gi
on gu.seqnum = gi.seqnum;
Just because I wrote it up anyway, an alternative using CTEs;
WITH guys2 AS ( SELECT id,name,ROW_NUMBER() OVER (ORDER BY id) rn FROM guys),
girls2 AS ( SELECT id,name,ROW_NUMBER() OVER (ORDER BY id) rn FROM girls)
SELECT guys2.id guyid, guys2.name guyname,
girls2.id girlid, girls2.name girlname
FROM guys2 FULL OUTER JOIN girls2 ON guys2.rn = girls2.rn
ORDER BY COALESCE(guys2.rn, girls2.rn);
An SQLfiddle to test with.
Assuming, you want to match guys up with girls in your example, and have some sort of meaningful relationship between the records (no pun intended)...
Typically you'd do this with a separate table to represent the association (relationship) between the two.
This wouldn't give you a physical table, but it would enable you to write an SQL query representing the final results:
SELECT Girls.ID AS GirlId, Girls.Name AS GirlName, Guys.ID AS GuyId, Guys.Name AS GuyName
FROM Couples INNER JOIN
Girls ON Couples.GirlId = Girls.ID INNER JOIN
Guys ON Couples.GuyId = Guys.ID
which you could then use to create a table on the fly using the Select Into syntax
SELECT Girls.ID AS GirlId, Girls.Name AS GirlName, Guys.ID AS GuyId, Guys.Name AS GuyName
INTO MyNewTable
FROM Couples INNER JOIN
Girls ON Couples.GirlId = Girls.ID INNER JOIN
Guys ON Couples.GuyId = Guys.ID
(But standard Normalization rules would say it's best to keep them in distinct tables rather than creating a temp table, unless there's a performance reason not to do so.)
I need this all the time, -- creating templates in Excel using input from my tables. This pulls from one table that has my regions, the other with the quarters in a year. the result gives me one region name for each quarter/period.
SELECT b.quarter_qty, a.mkt_name FROM TBL_MKTS a, TBL_PERIODS b
Background
I have a table which has six columns. The first three columns create the pk. I'm tasked with removing one of the pk columns.
I selected (using distinct) the data into a temp table (excluding the third column), and tried inserting all of that data back into the original table with the third column being '11' for every row as this is what I was instructed to do. (this column is going to be removed by a DBA after I do this)
However, when I went to insert this data back into the original table I get a pk constraint error. (shocking, I know)
The other three columns are just date columns, so the distinct select didn't create a unique pk for each record. What I'm trying to achieve is just calling a distinct on the first two columns, and then just arbitrarily selecting the three other columns as it doesn't matter which dates I choose (at least not on dev).
What I've tried
I found the following post which seems to achieve what I want:
How do I (or can I) SELECT DISTINCT on multiple columns?
I tried the answers from both Joel,and Erwin.
Attempt 1:
However, with Joels answer the set returned is too large - the inner join isn't doing what I thought it would do. Selecting distinct col1 and col2 there are 400 columns returned, however when I use his solution 600 rows are returned. I checked the data and in fact there were duplicate pk's. Here is my attempt at duplicating Joels answer:
select a.emp_no,
a.eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no, modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator b
inner join
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
) a
ON b.emp_no = a.emp_no AND b.eec_planning_unit_cde = a.eec_planning_unit_cde
Now, if I execute just the inner select statement 400 rows are returned. If I select the whole query 600 rows are returned? Isn't inner join supposed to only show the intersection of the two sets?
Attempt 2:
I also tried the answer from Erwin. This one has a syntax error and I'm having trouble googling the spec on the where clause (specifically, the trick he is using with (emp_no, eec_planning_unit_cde))
Here is the attempt:
select emp_no,
eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
where (emp_no, eec_planning_unit_cde) IN
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
)
Now, I realize that the post I referenced is for postgresql. Doesn't T-SQL have something similar? Trying to google parenthesis isn't working too well.
Overview of Questions:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
A select distinct will be based on all columns so it does not guarantee the first two to be distinct
select pk1, pk2, '11', max(c1), max(c2), max(c3)
from table
group by pk1, pk2
You could TRY this:
SELECT a.emp_no,
a.eec_planning_unit_cde,
b.'11' as area,
b.create_dte,
b.create_by_emp_no,
b.modify_dte,
b.modify_by_emp_no
FROM
(
SELECT emp_no, eec_planning_unit_cde
FROM tempdb.guest.temp_part_time_evaluator
GROUP BY emp_no, eec_planning_unit_cde
) a
JOIN tempdb.guest.temp_part_time_evaluator b
ON a.emp_no = b.emp_no AND a.eec_planning_unit_cde = b.eec_planning_unit_cde
That would give you a distinct on those fields but if there is differences in the data between columns you might have to try a more brute force approch.
SELECT a.emp_no,
a.eec_planning_unit_cde,
a.'11' as area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY emp_no, eec_planning_unit_cde) rownumber,
a.emp_no,
a.eec_planning_unit_cde,
a.'11' as area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM tempdb.guest.temp_part_time_evaluator
) a
WHERE rownumber = 1
I'll reply one by one:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Inner join don't do an intersection. Le'ts supose this tables:
T1 T2
n s n s
1 A 2 X
2 B 2 Y
2 C
3 D
If you join both tables by numeric column you don't get the intersection (2 rows). You get:
select *
from t1 inner join t2
on t1.n = t2.n;
| N | S |
---------
| 2 | B |
| 2 | B |
| 2 | C |
| 2 | C |
And, your second query approach:
select *
from t1
where t1.n in (select n from t2);
| N | S |
---------
| 2 | B |
| 2 | C |
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
Yes, this subquery:
select *
from t1
where not exists (
select 1
from t2
where t2.n = t1.n
);
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
yes, using #JTC second query.
I have one table named GUYS(ID,NAME,PHONE) and i need to add a count of how many guys have the same name and at the same time show all of them so i can't group them.
example:
ID NAME PHONE
1 John 335
2 Harry 444
3 James 367
4 John 742
5 John 654
the wanted output should be
ID NAME PHONE COUNT
1 John 335 3
2 Harry 444 1
3 James 367 1
4 John 742 3
5 John 654 3
how could i do that? i only manage to get lot of guys with different counts.
thanks
Update for 8.0+: This answer was written well before MySQL version 8, which introduced window functions with mostly the same syntax as the existing ones in Oracle.
In this new syntax, the solution would be
SELECT
t.name,
t.phone,
COUNT('x') OVER (PARTITION BY t.name) AS namecounter
FROM
Guys t
The answer below still works on newer versions as well, and in this particular case is just as simple, but depending on the circumstances, these window functions are way easier to use.
Older versions: Since MySQL, until version 8, didn't have analytical functions like Oracle, you'd have to resort to a sub-query.
Don't use GROUP BY, use a sub-select to count the number of guys with the same name:
SELECT
t.name,
t.phone,
(SELECT COUNT('x') FROM Guys ct
WHERE ct.name = t.name) as namecounter
FROM
Guys t
You'd think that running a sub-select for every row would be slow, but if you've got proper indexes, MySQL will optimize this query and you'll see that it runs just fine.
In this example, you should have an index on Guys.name. If you have multiple columns in the where clause of the subquery, the query would probably benefit from a single combined index on all of those columns.
Use an aggregate Query:
select g.ID, g.Name, g.Phone, count(*) over ( partition by g.name ) as Count
from
Guys g;
You can still use a GROUP BY for the count, you just need to JOIN it back to your original table to get all the records, like this:
select g.ID, g.Name, g.Phone, gc.Count
from Guys g
inner join (
select Name, count(*) as Count
from Guys
group by Name
) gc on g.Name = gc.Name
In Oracle DB you can use
SELECT ID,NAME,PHONE,(Select COUNT(ID)From GUYS GROUP BY Name)
FROM GUYS ;
DECLARE #tbl table
(ID int,NAME varchar(20), PHONE int)
insert into #tbl
select
1 ,'John', 335
union all
select
2 ,'Harry', 444
union all
select
3 ,'James', 367
union all
select
4 ,'John', 742
union all
select
5 ,'John', 654
SELECT
ID
, Name
, Phone
, count(*) over(partition by Name)
FROM #tbl
ORDER BY ID
select id, name, phone,(select count(name) from users u1 where u1.name=u2.name) count from users u2
try
select column1, count(1) over ()
it should help