How can I get a pivot table with concatenated values? - sql

I have the following data:
| ID | TYPE | USER_ID |
|----------|----------|----------|
| 1 | A | 7 |
| 1 | A | 8 |
| 1 | B | 6 |
| 2 | A | 9 |
| 2 | B | 5 |
I'm trying to create a query to return
| ID | RESULT |
|----------|----------|
| 1 | 7, 8, 6 |
| 2 | 9, 5 |
The USER_ID values must be ordered by the TYPE attribute.
Since I'm using MS ACCESS, I'm trying to pivot. What I've tried:
TRANSFORM first(user_id)
SELECT id, type
FROM mytable
GROUP BY id, type
ORDER BY type
PIVOT user_id
Error:
Too many crosstab column headers (4547).
I'm missing something in the syntax. In any case, the query seems wrong, since the first() aggregate would need to be replaced by something that concatenates the results.
PS: I'm using MS-ACCESS 2007. If you know a solution for SQL-Server or Oracle using only SQL (without vendor functions or stored procedures), I'll probably accept your answer since it will help me to find a solution for this problem.

You don't want to use PIVOT here. PIVOT creates one column for each distinct value of the pivoted field, so PIVOT user_id tries to build a column per user ID, which is what triggers the "Too many crosstab column headers" error. Your TYPE field doesn't contribute anything to that query either.
Unfortunately, doing this in SQL Server relies on a vendor-specific construct (FOR XML PATH) that's not available in Access.
Here's a link to an Access function that does something similar.
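As a rough illustration of what such a helper has to do, here is a minimal sketch in Python with SQLite standing in for Access (both are assumptions, not part of the original setup). The query only sorts the rows; the concatenation happens row by row on the client side, much as a VBA function would loop over a recordset. The table name mytable comes from the question; the tie-break on user_id is an added assumption to make the order within one TYPE deterministic.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, type TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 7), (1, "A", 8), (1, "B", 6), (2, "A", 9), (2, "B", 5)],
)

# Sort in SQL so that each id's rows arrive together, ordered by type.
# user_id is only a tie-breaker within one type (an assumption, see above).
rows = conn.execute(
    "SELECT id, user_id FROM mytable ORDER BY id, type, user_id"
).fetchall()

# Concatenate the user_id values of each id, preserving the sorted order.
result = [
    (key, ", ".join(str(user_id) for _, user_id in group))
    for key, group in groupby(rows, key=lambda row: row[0])
]

print(result)  # [(1, '7, 8, 6'), (2, '9, 5')]
```

The same loop translates directly to a VBA function that opens a recordset with the ORDER BY above and appends each value to a string.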

Related

Union two query result column-wise

Say if I have two queries returning two tables with the same number of rows.
For example, if query 1 returns
| a | b | c |
| 1 | 2 | 3 |
| 4 | 5 | 6 |
and query 2 returns
| d | e | f |
| 7 | 8 | 9 |
| 10 | 11 | 12 |
How can I obtain the following, assuming both queries are opaque?
| a | b | c | d | e | f |
| 1 | 2 | 3 | 7 | 8 | 9 |
| 4 | 5 | 6 | 10 | 11 | 12 |
My current solution is to add to each query a row number column and inner join them
on this column.
SELECT
    q1_with_rownum.*,
    q2_with_rownum.*
FROM (
    SELECT ROW_NUMBER() OVER () AS q1_rownum, q1.*
    FROM (.......) q1
) q1_with_rownum
INNER JOIN (
    SELECT ROW_NUMBER() OVER () AS q2_rownum, q2.*
    FROM (.......) q2
) q2_with_rownum
ON q1_rownum = q2_rownum
However, if there is a column named q1_rownum in either query,
the above will break. It is not possible for me to look into q1 or q2;
the only information available is that they are both valid SQL queries
and do not contain columns with the same names. Is there any SQL construct
similar to UNION but for columns instead of rows?
There is no such function. A row in a table is an entity.
If you are constructing generic code to run on any tables, you can try using an uncommon column name, such as "an unusual query rownum", or something more esoteric than that. I would suggest using the same name in both subqueries and then using a USING clause for the join.
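A minimal sketch of that suggestion, run through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). The two inner queries and the column name _query_join_row_number are placeholders chosen for illustration. The ORDER BY inside OVER (...) is an added assumption: with a bare OVER () the pairing of rows is not guaranteed, which is awkward when the queries are truly opaque.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-ins for the two opaque queries (hypothetical data from the question).
Q1 = "SELECT 1 AS a, 2 AS b, 3 AS c UNION ALL SELECT 4, 5, 6"
Q2 = "SELECT 7 AS d, 8 AS e, 9 AS f UNION ALL SELECT 10, 11, 12"

# One deliberately unusual name, shared by both sides so USING can merge it.
ROWNUM = "_query_join_row_number"

sql = f"""
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY a) AS {ROWNUM}, q1.*
      FROM ({Q1}) AS q1) AS left_side
JOIN (SELECT ROW_NUMBER() OVER (ORDER BY d) AS {ROWNUM}, q2.*
      FROM ({Q2}) AS q2) AS right_side
USING ({ROWNUM})
ORDER BY {ROWNUM}
"""

cursor = conn.execute(sql)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

for row in rows:
    print(row)
```

Because the join uses USING, the row-number column appears only once in the output, followed by the columns of both queries. A name clash is still possible if an inner query happens to use the same unusual name; this approach only makes it unlikely.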
Not sure if I understood your exact problem, but I think you mean both q1 and q2 are joined on a column with the same name?
You should add each table name before the column to distinguish which column is referenced:
"table1"."similarColumnName" = "table2"."similarColumnName"
EDIT:
So, problem is that if there is already a column with the same alias as your ROW_NUMBER(), the JOIN cannot be made because you have an ambiguous column name.
The easiest solution, if you cannot know your incoming table's columns, is to pick an alias that is unlikely to collide, for example _query_join_row_number
EDIT2:
You could look into prefixing all columns with their original table's name, thus removing any conflict (you get q1_with_rows.rows and the conflicting column is q1_with_rows.q1.rows)
A Stack Overflow question covering this: In a join, how to prefix all column names with the table it came from

Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is because your data does not follow database normalisation rules, and consequently, you are working against the natural operation of a RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result |
+------------+------------+-----------+
| 1 | 1 | correct |
| 1 | 2 | incorrect |
| ... | ... | ... |
| 1 | 15 | correct |
| 2 | 1 | correct |
| 2 | 2 | correct |
| ... | ... | ... |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1 | Joe | Bloggs | 01/01/1969 | M | ... |
| ... | ... | ... | ... | ... | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1 | What is the meaning of life, the universe, and everything? | 42 |
| ... | ... | ... |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1 | 1 | True |
| 1 | 2 | False |
| ... | ... | ... |
| 1 | 15 | True |
| 2 | 1 | True |
| 2 | 2 | True |
| ... | ... | ... |
+------------+------------+--------+
Not only does this consume less memory in your database, but this may be indexed far more efficiently (yielding faster queries), and removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
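For the figure the question actually asks for (incorrect answers per question), the same pattern applies with the filter inverted. The following is a minimal sketch using Python's sqlite3 module as a stand-in for Access, with made-up sample rows; 1 and 0 play the role of True and False.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (employeeid INTEGER, questionid INTEGER, result INTEGER)"
)
# Hypothetical sample data: three employees, two questions.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0), (2, 1, 1), (2, 2, 1), (3, 1, 0), (3, 2, 0)],
)

# Count the incorrect answers for each question.
incorrect_per_question = conn.execute(
    """
    SELECT questionid, COUNT(*)
    FROM results
    WHERE result = 0
    GROUP BY questionid
    ORDER BY questionid
    """
).fetchall()

print(incorrect_per_question)  # [(1, 1), (2, 2)]
```

Note that a question nobody answered incorrectly does not appear in this output at all, because the WHERE clause filters its rows out before grouping.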
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
    employeeid,
    iif(A1='incorrect',1,0)+
    iif(A2='incorrect',1,0)+
    iif(A3='incorrect',1,0)+
    iif(A4='incorrect',1,0)+
    iif(A5='incorrect',1,0)+
    iif(A6='incorrect',1,0)+
    iif(A7='incorrect',1,0)+
    iif(A8='incorrect',1,0)+
    iif(A9='incorrect',1,0)+
    iif(A10='incorrect',1,0)+
    iif(A11='incorrect',1,0)+
    iif(A12='incorrect',1,0)+
    iif(A13='incorrect',1,0)+
    iif(A14='incorrect',1,0)+
    iif(A15='incorrect',1,0) as IncorrectAnswers
from
    YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
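If the wide layout has to stay, the per-field totals from the original question need one conditional sum per column. The sketch below uses Python's sqlite3 module as a stand-in for Access and generates those expressions from a list of field names, so the list is the only thing to maintain. It uses three fields and invented rows to stay short. In Access itself the equivalent expression would be Sum(IIf(A1='incorrect',1,0)) for each field.

```python
import sqlite3

# A1..A3 here to keep the sample small; range(1, 16) would give A1..A15.
FIELDS = [f"A{i}" for i in range(1, 4)]

conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{field} TEXT" for field in FIELDS)
conn.execute(f"CREATE TABLE YourTable (employeeid INTEGER, {column_defs})")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "correct", "incorrect", "correct"),
        (2, "incorrect", "incorrect", "correct"),
    ],
)

# One conditional sum per field, generated rather than typed out by hand.
sums = ", ".join(
    f"SUM(CASE WHEN {field} = 'incorrect' THEN 1 ELSE 0 END) AS {field}_incorrect"
    for field in FIELDS
)
totals = conn.execute(f"SELECT {sums} FROM YourTable").fetchone()
incorrect_per_field = dict(zip(FIELDS, totals))

print(incorrect_per_field)  # {'A1': 1, 'A2': 2, 'A3': 0}
```

This returns a single row with one total per field, but the field list is still baked into the query, which is exactly the maintenance cost described above.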

Spark SQL: Aggregate column values within a Group

I need to aggregate the values of a column articleId into an array. This needs to be done within a group which I create with groupBy beforehand.
My table looks like the following:
| customerId | articleId | articleText | ...
| 1 | 1 | ... | ...
| 1 | 2 | ... | ...
| 2 | 1 | ... | ...
| 2 | 2 | ... | ...
| 2 | 3 | ... | ...
And I want to build something like
| customerId | articleIds |
| 1 | [1, 2] |
| 2 | [1, 2, 3] |
My code so far:
DataFrame test = dfFiltered.groupBy("CUSTOMERID").agg(dfFiltered.col("ARTICLEID"));
But here I get an AnalysisException:
Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'ARTICLEID' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
Can someone help to build a correct statement?
In SQL, every column selected from a grouped query must either appear in the GROUP BY clause or be wrapped in an aggregate function. Your Spark code passes the plain column ARTICLEID to agg(), which is neither, hence the exception.
There is a similar question which I think solves your problem: SPARK SQL replacement for mysql GROUP_CONCAT aggregate function
This can be achieved using the collect_list function, but (in Spark 1.x) it's available only if you're using HiveContext:
import org.apache.spark.sql.functions._
df.groupBy("customerId").agg(collect_list("articleId"))
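To make the intended result concrete, here is a small plain-Python sketch of what grouping by customerId and collecting articleId into a list produces. It does not use Spark at all; it only mirrors the semantics on the sample rows from the question.

```python
from collections import defaultdict

# (customerId, articleId) pairs from the question's sample table.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3)]

# Append each articleId to the list kept for its customerId.
article_ids = defaultdict(list)
for customer_id, article_id in rows:
    article_ids[customer_id].append(article_id)

result = dict(article_ids)
print(result)  # {1: [1, 2], 2: [1, 2, 3]}
```

Unlike this loop, Spark does not guarantee the order of elements inside the collected list after a shuffle, so sort the array afterwards if the order matters.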

Oracle view grouping elements [duplicate]

Possible Duplicate:
Oracle: Combine multiple results in a subquery into a single comma-separated value
Hi there,
this is my problem...
I have a table:
+------+------+------+
| CODE | NAME | TYPE |
+------+------+------+
| 1 | AAA | x |
+------+------+------+
| 2 | BBB | x |
+------+------+------+
| 3 | CCC | y |
+------+------+------+
| 4 | DDD | y |
+------+------+------+
I want to create a view in Oracle whose result is:
+---------+------+
| NAME | TYPE |
+---------+------+
| AAA;BBB | x |
+---------+------+
| CCC;DDD | y |
+---------+------+
Can I group AAA and BBB in a view because they have the same TYPE, so that NAME becomes "AAA;BBB", i.e. the names of each type joined with ";"?
Can anyone help me?
Regards,
Tommaso
Tim Hall has a page that covers the various string aggregation techniques available in Oracle depending on the Oracle version, what packages are installed in the database, and whether you can create new procedures to support this or whether you want it done in pure SQL.
If you are using 11.2, the simplest option would be to use the built-in LISTAGG function (the column alias is needed so that the expression can be used in a view):
SELECT listagg(name, ';') within group (order by code) AS name, type
FROM your_table
GROUP BY type
If you are using an earlier version, my preference would be to use the custom aggregate function (Tim's string_agg).

Is it possible to construct dynamic aggregate columns in an ARel query that uses a join?

Here's a bit of sample context for my question below to help clarify what I'm asking...
The Schema
Users
- id
- name
Answers
- id
- user_id
- topic_id
- was_correct
Topics
- id
- name
The Data
Users
id | name
1 | Gabe
2 | John
Topics
id | name
1 | Math
2 | English
Answers
id | user_id | topic_id | was_correct
1 | 1 | 1 | 0
2 | 1 | 1 | 1
3 | 1 | 2 | 1
4 | 2 | 1 | 0
5 | 2 | 2 | 0
What I'd like to have, in a result set, is a table with one row per user, and two columns per topic, one that shows the sum of correct answers for the topic, and one that shows the sum of the incorrect answers for that topic. For the sample data above, this result set would look like:
My desired result
users.id | users.name | topic_1_correct_sum | topic_1_incorrect_sum | topic_2_correct_sum | topic_2_incorrect_sum
1 | Gabe | 1 | 1 | 1 | 0
2 | John | 0 | 1 | 0 | 1
Obviously, if there were more topics in the Topics table, I'd like this query to include new correct_sum and incorrect_sums for each topic that exists, so I'm looking for a way to write this without hard-coding topic_ids into the sum functions of my select clause.
Is there a smart way to magic this sort of thing with ARel?
Gabe,
What you're looking for here is a crosstab query. There are many approaches to writing this, but unfortunately none that is generic across SQL dialects. AFAIK each database handles crosstabs differently. Another way of looking at this is as a "cube", something typically found in OLAP-type databases (as opposed to OLTP).
It's easy to write in SQL, but it will likely involve some functions native to the database you're using. What DB are you using?
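One fairly portable way to avoid hard-coded topic ids is to read the topics first and generate one pair of conditional sums per topic. The sketch below does this in Python with SQLite as a stand-in database. It is not ARel, and the generated column names simply follow the naming in the question. The same two-step idea (query the topics, then build the select list) could be carried over to ARel or raw SQL in Rails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        topic_id INTEGER, was_correct INTEGER
    );
    INSERT INTO users VALUES (1, 'Gabe'), (2, 'John');
    INSERT INTO topics VALUES (1, 'Math'), (2, 'English');
    INSERT INTO answers VALUES
        (1, 1, 1, 0), (2, 1, 1, 1), (3, 1, 2, 1), (4, 2, 1, 0), (5, 2, 2, 0);
    """
)

# Step 1: discover the topics at run time (int() keeps the generated SQL safe).
topic_ids = [int(row[0]) for row in conn.execute("SELECT id FROM topics ORDER BY id")]

# Step 2: build a correct/incorrect pair of conditional sums for every topic.
select_parts = []
for topic_id in topic_ids:
    for label, flag in (("correct", 1), ("incorrect", 0)):
        select_parts.append(
            f"SUM(CASE WHEN a.topic_id = {topic_id} AND a.was_correct = {flag} "
            f"THEN 1 ELSE 0 END) AS topic_{topic_id}_{label}_sum"
        )

sql = f"""
SELECT u.id, u.name, {", ".join(select_parts)}
FROM users u
LEFT JOIN answers a ON a.user_id = u.id
GROUP BY u.id, u.name
ORDER BY u.id
"""
crosstab = conn.execute(sql).fetchall()

for row in crosstab:
    print(row)
# (1, 'Gabe', 1, 1, 1, 0)
# (2, 'John', 0, 1, 0, 1)
```

The cost of this approach is two round trips and a result set whose shape changes whenever a topic is added, so the consuming code has to read the column names rather than assume them.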
Your answers table looks like it needs to have 1,2,3,4,5 and not 1,1,1,1,1 as ids...