Is SQL GROUP BY a design flaw? [closed]

Why does SQL require that I specify which attributes to group by? Why can't it just use all the non-aggregated ones?
If an attribute is not aggregated and is not in the GROUP BY clause, then a nondeterministic choice would be the only option, assuming tuples are unordered (MySQL kind of does this), and that is a huge gotcha. As far as I know, PostgreSQL requires that all attributes not appearing in the GROUP BY be aggregated, which reinforces that the clause is superfluous.
Am I missing something, or is this a language design flaw that promotes loose implementations and makes queries harder to write?
If I am missing something, what is an example query where the group attributes cannot be inferred?
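To make the gotcha concrete, consider a hypothetical orders table (order_id, customer_id, city). If the grouping were inferred from the non-aggregated columns, the following would silently behave differently across engines:
-- city is neither grouped nor aggregated:
SELECT customer_id, city, count(*)
FROM orders
GROUP BY customer_id;
-- MySQL (without ONLY_FULL_GROUP_BY) returns an arbitrary city per group;
-- PostgreSQL rejects the query unless city is functionally dependent on the
-- grouped columns (e.g. when grouping by the table's primary key).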

You don't have to group by exactly the same thing you're selecting, e.g.:
select priority, count(*) from rule_class
group by priority;

PRIORITY  COUNT(*)
70        1
50        4
30        1
90        2
10        4

select decode(priority, 50, 'Norm', 'Odd'), count(*) from rule_class
group by priority;

DECO  COUNT(*)
Odd   1
Norm  4
Odd   1
Odd   2
Odd   4

select decode(priority, 50, 'Norm', 'Odd'), count(*) from rule_class
group by decode(priority, 50, 'Norm', 'Odd');

DECO  COUNT(*)
Norm  4
Odd   8
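Note that decode() is Oracle-specific. A portable sketch of the last query uses CASE, with the expression repeated in the GROUP BY (assuming the same rule_class table):
select case when priority = 50 then 'Norm' else 'Odd' end as deco,
       count(*)
from rule_class
group by case when priority = 50 then 'Norm' else 'Odd' end;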

There is one more reason why SQL requires that you specify the attributes to group by.
Let's say we have two simple tables, friend and car, where we store info about our friends and their cars.
And let's say we want to show all our friends' data (from table friend) and, for every one of our friends, how many cars they own now, have sold, have crashed, and the total number. Oh, and we want the elders first, the younger last.
We'd do something like:
SELECT f.id
     , f.firstname
     , f.lastname
     , f.birthdate
     , COUNT(CASE WHEN NOT c.sold AND NOT c.crashed THEN 1 END) AS owned  -- COUNT skips NULLs, so only rows where the condition holds are counted
     , COUNT(CASE WHEN c.sold THEN 1 END) AS sold
     , COUNT(CASE WHEN c.crashed THEN 1 END) AS crashed
     , COUNT(c.friendid) AS totalcars
FROM friend f
LEFT JOIN car c    -- to catch (shame!) those friends who have never had a car
       ON f.id = c.friendid
GROUP BY f.id
       , f.firstname
       , f.lastname
       , f.birthdate
ORDER BY f.birthdate
But do we really need all those fields in the GROUP BY? Isn't every friend uniquely determined by his id? In other words, aren't firstname, lastname and birthdate functionally dependent on f.id? Why not just write (as we can in MySQL):
SELECT f.id
     , f.firstname
     , f.lastname
     , f.birthdate
     , COUNT(CASE WHEN NOT c.sold AND NOT c.crashed THEN 1 END) AS owned
     , COUNT(CASE WHEN c.sold THEN 1 END) AS sold
     , COUNT(CASE WHEN c.crashed THEN 1 END) AS crashed
     , COUNT(c.friendid) AS totalcars
FROM friend f
LEFT JOIN car c    -- to catch (shame!) those friends who have never had a car
       ON f.id = c.friendid
GROUP BY f.id
ORDER BY f.birthdate
And what if we had 20 fields in the SELECT (plus ORDER BY)? Isn't the second query shorter, clearer and probably faster (in the RDBMSs that accept it)?
I say yes. So do the SQL:1999 and SQL:2003 specs, if this article is correct: Debunking GROUP BY myths
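For what it's worth, PostgreSQL 9.1 and later apply exactly this functional-dependency reasoning for primary keys, so a sketch like the following is accepted there as well (assuming friend.id is the primary key):
SELECT f.id, f.firstname, f.lastname, f.birthdate,
       COUNT(c.friendid) AS totalcars
FROM friend f
LEFT JOIN car c ON f.id = c.friendid
GROUP BY f.id            -- firstname, lastname, birthdate depend on the PK
ORDER BY f.birthdate;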

I would say that if you have a large number of items in the GROUP BY clause, perhaps the core info should be pulled out into a tabular sub-query which you inner join to.
There is probably a performance hit, but it makes for neater code.
select id, count(a), b, c, d
from tbl
group by
    id, b, c, d
becomes
select t.id, myCount, b, c, d
from tbl t
inner join (
    select id, count(*) as myCount
    from tbl
    group by id
) as myCountTable on myCountTable.id = t.id
That said, I'm interested to hear counter-arguments for doing this as opposed to having a large group by clause.
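One counter-argument: in engines that support window functions, the same per-id count needs neither the long GROUP BY nor the self-join. A minimal sketch against the illustrative tbl above:
select t.id, count(*) over (partition by t.id) as myCount, b, c, d
from tbl t;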

I agree it's verbose that the GROUP BY list can't just implicitly be the non-aggregated select columns. In SAS there are data aggregation operations that are more succinct.
Also: it's hard to come up with an example where it would be useful to have a longer list of columns in the group list than in the select list. The best I can come up with is...
create table people
( Nam char(10)
, Adr char(10)
);
insert into people values ('Peter', 'Tibet');
insert into people values ('Peter', 'OZ');
insert into people values ('Peter', 'OZ');
insert into people values ('Joe', 'NY');
insert into people values ('Joe', 'Texas');
insert into people values ('Joe', 'France');
-- Give me people where there is a duplicate address record
select * from people where nam in
(
    select nam
    from people
    group by nam, adr -- group list different from select list
    having count(*) > 1
);

If your issue is just about an easier way to write such queries, here is one tip:
In SQL Server Management Studio, write your query as text, something like select * from my_table.
Then select the text, right-click, and choose "Design Query in Editor...".
Management Studio will open a new editor with all the fields filled in; right-click again and select "Add Group By".
Management Studio will add the code for you.
I found this method extremely useful for insert statements: when I need to write a script that inserts a lot of fields into a table, I just do select * from table_where_want_to_insert and then change the type of the select statement to insert.

I quite agree with the question; I asked the same one here.
I honestly think it's a language flaw.
I realise that there are arguments against that, but I have yet to use a GROUP BY clause containing anything other than all the non-aggregated fields from the SELECT clause in the real world.

This thread provides some useful explanations.
http://social.msdn.microsoft.com/Forums/en/transactsql/thread/52482614-bfc8-47db-b1b6-deec7363bd1a

I'd say it is more likely a language design choice that decisions be explicit, not implicit. For instance, what if I wish to group the data in a different order than that in which I output the columns? Or group by columns that aren't included in the selected columns? Or output grouped columns only and not use aggregate functions at all? Only by explicitly stating my preferences in the GROUP BY clause are my intentions clear.
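Both of the latter cases are legal and occasionally useful. Minimal sketches against a hypothetical orders table:
-- Group by a column that is not selected:
select count(*) from orders group by customer_id;
-- Output grouped columns only, with no aggregate (behaves like DISTINCT):
select customer_id from orders group by customer_id;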
You also have to remember that SQL is a very old language (1970). Look at how LINQ flipped everything around in order to make IntelliSense work - it looks obvious to us now, but SQL predates IDEs and so couldn't have taken such issues into account.

The "superflous" attributes influence the ordering of the result.
Consider:
create table gb (
a number,
b varchar(3),
c varchar(3)
);
insert into gb values ( 3, 'foo', 'foo');
insert into gb values ( 1, 'foo', 'foo');
insert into gb values ( 0, 'foo', 'foo');
insert into gb values ( 20, 'foo', 'bar');
insert into gb values ( 11, 'foo', 'bar');
insert into gb values ( 13, 'foo', 'bar');
insert into gb values ( 170, 'bar', 'foo');
insert into gb values ( 144, 'bar', 'foo');
insert into gb values ( 130, 'bar', 'foo');
insert into gb values (2002, 'bar', 'bar');
insert into gb values (1111, 'bar', 'bar');
insert into gb values (1331, 'bar', 'bar');
This statement
select sum(a), b, c
from gb
group by b, c;
results in
SUM(A)  B    C
    44  foo  bar
   444  bar  foo
     4  foo  foo
  4444  bar  bar
while this one
select sum(a), b, c
from gb
group by c, b;
results in
SUM(A)  B    C
   444  bar  foo
    44  foo  bar
     4  foo  foo
  4444  bar  bar
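Standard SQL, however, leaves the output order of a grouped query unspecified, so relying on this is risky. An explicit ORDER BY pins the order down regardless of how the engine evaluates the grouping:
select sum(a), b, c
from gb
group by b, c
order by b, c;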

Related

Clustering/similarity between text cells in a Postgres aggregate

I've got a table that has a text column and some other identifying features. I want to be able to group by one of the features and find out whether the text values in each group are similar or not. I want to use this to determine whether there are multiple groups in my data or a single group (with some possible bad spelling), so that I can provide a rough "confidence" value showing whether the aggregate represents a single group or not.
CREATE TABLE data_test (
Id serial primary key,
Name VARCHAR(70) NOT NULL,
Job VARCHAR(100) NOT NULL);
INSERT INTO data_test
(Name, Job)
VALUES
('John', 'Astronaut'),
('John', 'Astronaut'),
('Ann', 'Sales'),
('Jon', 'Astronaut'),
('Jason', 'Sales'),
('Pranav', 'Sales'),
('Todd', 'Sales'),
('John', 'Astronaut');
I'd like to run a query that was something like:
select
Job,
count(Name),
Similarity_Agg(Name)
from data_test
group by Job;
and receive
Job        count  Similarity
Sales      4      0.1
Astronaut  4      0.9
Basically showing that Astronaut names are very similar (or, more likely in my data, all the rows are referring to a single astronaut) and the Sales names aren't (more people working in sales than in space). I see there is a Postgres Module that can handle comparing two strings but it doesn't seem to have any aggregate functions in it.
Any ideas?
One option is a self-join:
select
    d.job,
    count(distinct d.id) as cnt,
    avg(similarity(d.name, d1.name)) as avg_similarity
from data_test d
inner join data_test d1 on d1.job = d.job
group by d.job;
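Two caveats: similarity() comes from the pg_trgm module (CREATE EXTENSION IF NOT EXISTS pg_trgm;), and the join above also compares each row with itself, which inflates the average for small groups. A sketch that excludes self-pairs (similarity is symmetric, so counting each ordered pair twice leaves the average unchanged):
select d.job,
       count(distinct d.id) as cnt,
       avg(similarity(d.name, d1.name)) as avg_similarity
from data_test d
inner join data_test d1 on d1.job = d.job and d1.id <> d.id
group by d.job;
-- caveat: jobs with only one row drop out of this inner join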

Group by fields in single row [duplicate]

Does anyone know how to create crosstab queries in PostgreSQL?
For example, I have the following table:
Section  Status    Count
A        Active    1
A        Inactive  2
B        Active    4
B        Inactive  5
I would like the query to return the following crosstab:
Section  Active  Inactive
A        1       2
B        4       5
Is this possible?
Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Improved test case
CREATE TABLE tbl (
section text
, status text
, ct integer -- "count" is a reserved word in standard SQL
);
INSERT INTO tbl VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7); -- ('C', 'Active') is missing
Simple form - not fit for missing attributes
crosstab(text) with 1 input parameter:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- needs to be "ORDER BY 1,2" here
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
Section | Active | Inactive
---------+--------+----------
A | 1 | 2
B | 4 | 5
C | 7 | -- !!
No need for casting and renaming.
Note the incorrect result for C: the value 7 is filled in for the first column. Sometimes, this behavior is desirable, but not for this use case.
The simple form is also limited to exactly three columns in the provided input query: row_name, category, value. There is no room for extra columns like in the 2-parameter alternative below.
Safe form
crosstab(text, text) with 2 input parameters:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- could also just be "ORDER BY 1" here
, $$VALUES ('Active'::text), ('Inactive')$$
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
Section | Active | Inactive
---------+--------+----------
A | 1 | 2
B | 4 | 5
C | | 7 -- !!
Note the correct result for C.
The second parameter can be any query that returns one row per attribute matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table like this:
'SELECT DISTINCT attribute FROM tbl ORDER BY 1'
That's in the manual.
Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN() variants), it is typically more efficient to provide a short list in a VALUES expression like demonstrated:
$$VALUES ('Active'::text), ('Inactive')$$)
Or (not in the manual):
$$SELECT unnest('{Active,Inactive}'::text[])$$ -- short syntax for long lists
I used dollar quoting to make quoting easier.
You can even output columns with different data types with crosstab(text, text) - as long as the text representation of the value column is valid input for the target type. This way you might have attributes of different kind and output text, date, numeric etc. for respective attributes. There is a code example at the end of the chapter crosstab(text, text) in the manual.
db<>fiddle here
Effect of excess input rows
Excess input rows are handled differently - duplicate rows for the same ("row_name", "category") combination - (section, status) in the above example.
The 1-parameter form fills in available value columns from left to right. Excess values are discarded.
Earlier input rows win.
The 2-parameter form assigns each input value to its dedicated column, overwriting any previous assignment.
Later input rows win.
Typically, you don't have duplicates to begin with. But if you do, carefully adjust the sort order to your requirements - and document what's happening.
Or get fast arbitrary results if you don't care. Just be aware of the effect.
Advanced examples
Pivot on Multiple Columns using Tablefunc - also demonstrating mentioned "extra columns"
Dynamic alternative to pivot with CASE and GROUP BY
\crosstabview in psql
Postgres 9.6 added this meta-command to its default interactive terminal psql. You can run the query you would use as first crosstab() parameter and feed it to \crosstabview (immediately or in the next step). Like:
db=> SELECT section, status, ct FROM tbl \crosstabview
Similar result as above, but it's a representation feature on the client side exclusively. Input rows are treated slightly differently, hence ORDER BY is not required. Details for \crosstabview in the manual. There are more code examples at the bottom of that page.
Related answer on dba.SE by Daniel Vérité (the author of the psql feature):
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
SELECT section,
       SUM(CASE status WHEN 'Active'   THEN count ELSE 0 END) AS active,   -- pivot each status value
       SUM(CASE status WHEN 'Inactive' THEN count ELSE 0 END) AS inactive  -- into a separate column explicitly
FROM t
GROUP BY section
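Since Postgres 9.4, the aggregate FILTER clause expresses the same conditional aggregation more readably. A sketch against the tbl test case above:
SELECT section,
       sum(ct) FILTER (WHERE status = 'Active')   AS active,
       sum(ct) FILTER (WHERE status = 'Inactive') AS inactive
FROM tbl
GROUP BY section
ORDER BY section;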
You can use the crosstab() function of the additional module tablefunc - which you have to install once per database. Since PostgreSQL 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION tablefunc;
In your case, I believe it would look something like this:
CREATE TABLE t (Section CHAR(1), Status VARCHAR(10), Count integer);
INSERT INTO t VALUES ('A', 'Active', 1);
INSERT INTO t VALUES ('A', 'Inactive', 2);
INSERT INTO t VALUES ('B', 'Active', 4);
INSERT INTO t VALUES ('B', 'Inactive', 5);
SELECT row_name AS Section,
category_1::integer AS Active,
category_2::integer AS Inactive
FROM crosstab('select section::text, status, count::text from t',2)
AS ct (row_name text, category_1 text, category_2 text);
DB Fiddle here:
Everything works: https://dbfiddle.uk/iKCW9Uhh
Without CREATE EXTENSION tablefunc; you get this error: https://dbfiddle.uk/j8W1CMvI
ERROR: function crosstab(unknown, integer) does not exist
LINE 4: FROM crosstab('select section::text, status, count::text fro...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Solution with JSON aggregation (json_object_agg() requires Postgres 9.4 or later):
CREATE TEMP TABLE t (
section text
, status text
, ct integer -- don't use "count" as column name.
);
INSERT INTO t VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7);
SELECT section,
       (obj ->> 'Active')::int   AS active,
       (obj ->> 'Inactive')::int AS inactive
FROM (
    SELECT section, json_object_agg(status, ct) AS obj
    FROM t
    GROUP BY section
) x;
Sorry this isn't complete because I can't test it here, but it may point you in the right direction. I'm translating from something I use that makes a similar query:
select mt.section, mt1.count as Active, mt2.count as Inactive
from mytable mt
left join (select section, count from mytable where status='Active')mt1
on mt.section = mt1.section
left join (select section, count from mytable where status='Inactive')mt2
on mt.section = mt2.section
group by mt.section,
mt1.count,
mt2.count
order by mt.section asc;
The code I'm working from is:
select m.typeID, m1.highBid, m2.lowAsk, m1.highBid - m2.lowAsk as diff, 100*(m1.highBid - m2.lowAsk)/m2.lowAsk as diffPercent
from mktTrades m
left join (select typeID,MAX(price) as highBid from mktTrades where bid=1 group by typeID)m1
on m.typeID = m1.typeID
left join (select typeID,MIN(price) as lowAsk from mktTrades where bid=0 group by typeID)m2
on m1.typeID = m2.typeID
group by m.typeID,
m1.highBid,
m2.lowAsk
order by diffPercent desc;
which will return a typeID, the highest price bid and the lowest price asked and the difference between the two (a positive difference would mean something could be bought for less than it can be sold).
There's a different dynamic method that I've devised, one that employs a dynamic record type (a temp table, built via an anonymous procedure) and JSON. This may be useful for an end user who can't install the tablefunc/crosstab extension but can still create temp tables or run anonymous procedures.
The example assumes all the crosstab columns are the same type (INTEGER), but the number of columns is data-driven and variadic. That said, JSON aggregate functions do allow for mixed data types, so there's potential for innovation via the use of embedded composite (mixed) types.
The real meat of it can be reduced to one step if you want to statically define the record type inside the JSON recordset function (via nested SELECTs that emit a composite type).
dbfiddle.uk
https://dbfiddle.uk/N1EzugHk
The crosstab function is available in the tablefunc extension. You'll have to create this extension once for the database:
CREATE EXTENSION tablefunc;
You can use the code below to create a pivot table using crosstab:
create table test_crosstab (
    section text,
    status text,
    count numeric
);
insert into test_crosstab values
  ('A', 'Active',   1)
, ('A', 'Inactive', 2)
, ('B', 'Active',   4)
, ('B', 'Inactive', 5);
select *
from crosstab(
    'select section, status, count
     from test_crosstab
     order by 1,2'   -- the 1-parameter form needs this ordering
) as ctab ("Section" text, "Active" numeric, "Inactive" numeric);


Placing different rows in succession

I started working with Access around 1 month ago, and I'm actually making a tool for preventive medicine so they can use a digital version of their actual paper form.
While the program is nearly finished, the med who requested it now wants to export to Excel (the easy part) all the data from a patient, their treatment, and all the medicines used during that treatment, in a single line (the problem).
I've been beating my head over that for two days, trying and researching on Google, but all I could find was how to put values from a column in a single cell, and that's not how it has to be displayed.
So far, my best attempt (which is far from a good one) has been something like this:
CREATE TABLE Patient
(`SIP` int, `name` varchar(10));
INSERT INTO Patient
(`SIP`, `name`)
VALUES
(70,'John');
-- A patient can have multiple treatments
CREATE TABLE Treatment
(`id` int, `SIPFK` int);
INSERT INTO Treatment
(`id`,`SIPFK`)
VALUES
(1,70);
-- A treatment can have multiple medicines used while it's open
CREATE TABLE Medicine
(`Id` int, `Name` varchar(8), `TreatFK` int);
INSERT INTO Medicine
(`Id`, `Name`, `TreatFK`)
VALUES
(7, 'Apples', 1),
(7, 'Tomatoes', 1),
(7, 'Potatoes', 1),
(8, 'Banana', 2),
(8, 'Peach', 2);
-- The query
select c.id, c.Name, p.id as id2, p.Name as name2, r.id as id3, r.Name as name3
from Medicine as c, Medicine as p, Medicine as r
where c.id = 7 and p.id=7 and r.id=7;
The output I was trying to get was:
7 | Apples | 7 | Tomatoes | 7 | Potatoes
The Medicine table will have more columns than that, and I need to show every row related to a treatment in a single row along with the treatment.
But the values keep repeating themselves on different rows, and the output in the subsequent columns besides the first ones is not as expected. Also, GROUP BY won't solve the problem, and DISTINCT doesn't work.
The output of the query is as follows: sqlfiddle.com
If anyone could give me a hint, I would be grateful.
EDIT: Since Access won't let me use any good SQL fix, nor will it recognize DISTINCT to keep the data from the queries from repeating, I will try to search for a way to organize the rows directly in the exported Excel.
Thank you all for your help. I'll save it because I'm sure it'll save me hours of holding my head in my hands.
This is a bit problematic, because MS Access does not support recursive CTEs, and I don't see a way of doing it without ranking.
Hence, I have tried to reproduce the results by using a subquery which ranks the medicines, and to store these into a temporary table:
create table newtable
select c.id
     , c.Name
     , (select count(t1.Name) from Medicine as t1
        where t1.id = c.id and t1.Name >= c.Name) as rank
from Medicine as c;
Afterwards, it is easy, because my query is mostly based on ranks and IDs:
select distinct id
     , (select Name from newtable t2 where t1.id = t2.id and rank = 1) as firstMed
     , (select Name from newtable t2 where t1.id = t2.id and rank = 2) as secMed
     , (select Name from newtable t2 where t1.id = t2.id and rank = 3) as thirdMed
from newtable t1;
In my view, the SELF JOIN concept and the notion of recursive CTEs are the most important points for this particular example, and it would be good practice to research these.
For reference: http://sqlfiddle.com/#!2/f80a9/2

Insert blank row between groups of rows and sorted by ID in sql

I have a table which has the following columns and values:
ID  TYPE   NAME
1   MAJOR  RAM
2   MAJOR  SHYAM
3   MAJOR  BHOLE
4   MAJOR  NATHA
5   MINOR  JOHN
6   MINOR  SMITH
My requirement is to write a stored procedure (or SQL query) which returns the same result set, except that there is a blank line where TYPE changes from one type to another (MAJOR, MINOR):
MAJOR  RAM
MAJOR  SHYAM
MAJOR  BHOLE
MAJOR  NATHA

MINOR  JOHN
MINOR  SMITH
While I use this query for adding the blank line, it is not sorted by ID:
select TYPE, NAME from (
    select
        TYPE as P1,
        1 as P2,
        ID,
        TYPE,
        NAME
    from EMP
    union all
    select distinct
        TYPE,
        2,
        '',
        '',
        N''
    from EMP
) Report
order by P1, P2
go
How do I sort the data by ID?
Thanks in advance.
Yes, yes, don't do this, but here's the query to do it, assuming SQL Server 2008 or later. On other versions/RDBMSs you can achieve the same functionality by writing two separate queries unioned together.
Query
; WITH DEMO (id, [type], [name]) AS
(
    SELECT 1, 'MAJOR', 'RAM'
    UNION ALL SELECT 2, 'MAJOR', 'SHYAM'
    UNION ALL SELECT 3, 'MAJOR', 'BHOLE'
    UNION ALL SELECT 4, 'MAJOR', 'NATHA'
    UNION ALL SELECT 5, 'MINOR', 'JOHN'
    UNION ALL SELECT 6, 'MINOR', 'SMITH'
)
, GROUPED AS
(
    SELECT
        D.[type]
    ,   D.[name]
    ,   ROW_NUMBER() OVER (ORDER BY D.[type] ASC, D.[name] DESC) AS order_key
    FROM DEMO D
    GROUP BY
        -- grouping sets were introduced with SQL Server 2008
        -- http://msdn.microsoft.com/en-us/library/bb510427.aspx
        GROUPING SETS
        (
            [type]
        ,   ([type], [name])
        )
)
SELECT
    CASE WHEN G.[name] IS NULL THEN NULL ELSE G.[type] END AS [type]
,   G.[name]
FROM GROUPED G
ORDER BY G.order_key
Results
If you don't like the nulls, use coalesce to make empty strings:
type   name
MAJOR  SHYAM
MAJOR  RAM
MAJOR  NATHA
MAJOR  BHOLE
NULL   NULL
MINOR  SMITH
MINOR  JOHN
NULL   NULL
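To honor the asker's sort-by-ID requirement, the order key in GROUPED can rank detail rows by their ID and push each type's blank row last. A sketch, replacing the order_key expression above (MIN(D.id) is legal there because id is aggregated within each grouping set):
, ROW_NUMBER() OVER (ORDER BY D.[type] ASC
                   , CASE WHEN D.[name] IS NULL THEN 1 ELSE 0 END  -- summary (blank) row last
                   , MIN(D.id) ASC                                 -- detail rows in ID order
                   ) AS order_key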
I agree with billinkc.
But to a sequential mind like mine, a different approach can occur.
That approach is to use a cursor and insert the records into a temp table. This table can have an INT column, let's say it is called POSITION, which increments with every insert.
Check for TYPE changes, and add the empty row every time it changes.
Finally, make the SELECT order by POSITION.
My context was:
An interface that dynamically adjusts to what the user needs; one of the screens shows a payment table, grouped by provider, using the approach mentioned earlier.
I decided to manage this from the database and skip maintenance of the screen on the client side, because every provider has different payment terms.
Hope this helps, and let's keep an open mind and avoid saying "don't do this" or "this is not what SQL was designed for".
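A sketch of that cursor-and-temp-table approach in T-SQL, with hypothetical names (EMP as in the question above):
-- Temp table; position increments with every insert.
CREATE TABLE #report
( position int IDENTITY(1,1)
, [type]   varchar(10)
, [name]   varchar(10)
);
DECLARE @type varchar(10), @name varchar(10), @prev varchar(10);
DECLARE emp_cur CURSOR FAST_FORWARD FOR
    SELECT [TYPE], [NAME] FROM EMP ORDER BY ID;   -- ID order is preserved via position
OPEN emp_cur;
FETCH NEXT FROM emp_cur INTO @type, @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- add the blank separator row whenever TYPE changes
    IF @prev IS NOT NULL AND @type <> @prev
        INSERT INTO #report ([type], [name]) VALUES ('', '');
    INSERT INTO #report ([type], [name]) VALUES (@type, @name);
    SET @prev = @type;
    FETCH NEXT FROM emp_cur INTO @type, @name;
END;
CLOSE emp_cur;
DEALLOCATE emp_cur;
SELECT [type], [name] FROM #report ORDER BY position;
DROP TABLE #report;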