MS ACCESS Group Query - sql

This one really has me scratching my head. It's sort of like a GROUP_CONCAT, but different. I'm pretty sure there is no way to do this with SQL only. I have a query that does a flip-table on a normalized table. The result looks like this:
|_Category_|_FieldA_|_FieldB_|_FieldC_|
|----------|--------|--------|--------|
| CAT1 | A | | |
|----------|--------|--------|--------|
| CAT1 | | B | |
|----------|--------|--------|--------|
| CAT1 | | | C |
|----------|--------|--------|--------|
| CAT1 | D | | |
|----------|--------|--------|--------|
| CAT1 | | | E |
|----------|--------|--------|--------|
| CAT1 | F | | |
|----------|--------|--------|--------|
My challenge is to compress it into as few rows as possible, but only have one value per cell.
|_Category_|_FieldA_|_FieldB_|_FieldC_|
|----------|--------|--------|--------|
| CAT1 | A | B | C |
|----------|--------|--------|--------|
| CAT1 | D | | E |
|----------|--------|--------|--------|
| CAT1 | F | | |
|----------|--------|--------|--------|
Any Ideas?
Thanks in advance.
Mark

As I mentioned in my comment to the question, the normalized table should look like:
|_Category_|_F_Name_|_F_Val__|
|----------|--------|--------|
| CAT1 | FieldA | A |
|----------|--------|--------|
| CAT1 | FieldB | B |
|----------|--------|--------|
| CAT1 | FieldC | C |
|----------|--------|--------|
| CAT1 | FieldA | D |
|----------|--------|--------|
| CAT1 | FieldC | E |
|----------|--------|--------|
| CAT1 | FieldA | F |
|----------|--------|--------|
How to achieve that?
SELECT A.Category, "FieldA" AS FieldName, A.FieldA AS FieldValue
FROM TableA AS A
WHERE NOT A.FieldA IS NULL
UNION ALL
SELECT A.Category, "FieldB", A.FieldB
FROM TableA AS A
WHERE NOT A.FieldB IS NULL
UNION ALL
SELECT A.Category, "FieldC", A.FieldC
FROM TableA AS A
WHERE NOT A.FieldC IS NULL;
To export the data into a new table, use this query:
SELECT B.* INTO TableB
FROM (
--above query
) AS B;
Do not forget to add an autonumber field (as primary key) to TableB so that each record can be identified.
As I understand it, you want to pivot the data. That is not so simple, because we need to simulate
ROW_NUMBER() OVER(PARTITION BY FieldName ORDER BY ID)
which is not supported in MS Access. How can we work around it?
SELECT B.ID, B.Category, B.FieldName, B.FieldValue,
(SELECT COUNT(A.FieldName)
FROM TableB AS A
WHERE A.FieldName=B.FieldName AND A.ID >=B.ID
GROUP BY A.FieldName ) AS TRank
FROM TableB AS B;
It should produce the record set below:
ID Category FieldName FieldValue TRank
1 CAT1 FieldA A 3
2 CAT1 FieldA D 2
3 CAT1 FieldA F 1
4 CAT1 FieldB B 1
5 CAT1 FieldC C 2
6 CAT1 FieldC E 1
But you can't use the above query as the source of the pivot, because it fails with the error "The Microsoft Access database engine does not recognize as a valid field name or expression. (Error 3070)". So, finally, you should export the ranked data into another table (let's say TableC).
SELECT C.* INTO TableC
FROM (
--above ranking query
) AS C;
Now, you can pivot data:
TRANSFORM First(A.FieldValue) AS FirstOfFieldValue
SELECT A.Category, A.TRank
FROM TableC AS A
GROUP BY A.Category, A.TRank
PIVOT A.FieldName;
Result:
Category TRank FieldA FieldB FieldC
CAT1 1 F B E
CAT1 2 D C
CAT1 3 A
Cheers,
Maciej
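The rank-then-pivot idea above can be tried outside Access. The following is a minimal sketch, assuming SQLite (through Python's sqlite3) as a stand-in: SQLite has no TRANSFORM ... PIVOT, so the pivot is written as conditional aggregation, and the table contents are typed in by hand from the question. Two deliberate differences from the queries above, both assumptions of this sketch: the rank counts rows with A.ID <= B.ID so that rank 1 is the earliest row (which gives the A/B/C row first, as in the question), and the subquery also matches on Category so that several categories would be ranked independently.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TableB (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for the Access autonumber
    Category TEXT,
    FieldName TEXT,
    FieldValue TEXT
);
INSERT INTO TableB (Category, FieldName, FieldValue) VALUES
    ('CAT1', 'FieldA', 'A'), ('CAT1', 'FieldA', 'D'), ('CAT1', 'FieldA', 'F'),
    ('CAT1', 'FieldB', 'B'),
    ('CAT1', 'FieldC', 'C'), ('CAT1', 'FieldC', 'E');
""")

rows = con.execute("""
SELECT Category,
       MAX(CASE WHEN FieldName = 'FieldA' THEN FieldValue END) AS FieldA,
       MAX(CASE WHEN FieldName = 'FieldB' THEN FieldValue END) AS FieldB,
       MAX(CASE WHEN FieldName = 'FieldC' THEN FieldValue END) AS FieldC
FROM (
    -- Simulated row number: count the rows of the same field up to this ID.
    SELECT B.Category, B.FieldName, B.FieldValue,
           (SELECT COUNT(*)
              FROM TableB AS A
             WHERE A.Category = B.Category
               AND A.FieldName = B.FieldName
               AND A.ID <= B.ID) AS TRank
    FROM TableB AS B
) AS R
GROUP BY Category, TRank
ORDER BY Category, TRank
""").fetchall()

for row in rows:
    print(row)
```

Each output row holds at most one value per field, and empty cells come back as None.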

I had the same problem. Take a look at: Microsoft Access condense multiple lines in a table
or get the cloud version here https://www.apponfly.com/en/application/microsoft-access-2013
Works pretty good

Related

SQL select all rows that are not equal to an id, and replace the id column with the value - without cross join

Say I have a table like this:
+----+-------+
| id | value |
+----+-------+
| 1 | a |
| 1 | b |
| 2 | c |
| 2 | d |
| 3 | e |
| 3 | f |
+----+-------+
I want to select all rows whose id is not 1 and change their id to 1; all rows whose id is not 2 and change their id to 2; and all rows whose id is not 3 and change their id to 3.
Here is the output I want:
+----+-------+
| id | value |
+----+-------+
| 1 | c |
| 1 | d |
| 1 | e |
| 1 | f |
| 2 | a |
| 2 | b |
| 2 | e |
| 2 | f |
| 3 | a |
| 3 | b |
| 3 | c |
| 3 | d |
+----+-------+
The only solution I can think of is through cross join and distinct:
select distinct a.id, b.value
from table a
cross join table b
where a.id != b.id
Is there any way to avoid such an expensive operation?
I think the typical way to write this is to generate all pairs of id and value and then remove the ones that exist:
select i.id, v.value
from (select distinct id from t) i cross join
(select distinct value from t) v left join
t
on t.id = i.id and t.value = v.value
where t.id is null;
First, I don't think this is what your query does. But this is what you seem to be describing.
From a performance perspective, you might have other sources for i and v that don't require subqueries. If so, use those for performance.
Finally, I don't think you can do much to improve the performance of this, apart from using explicit tables -- and perhaps having appropriate indexes on all the tables.
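To check the "generate all pairs, then remove the ones that exist" approach against the sample data, here is a small sketch using SQLite through Python's sqlite3. The table name t follows the answer; the choice of SQLite and the ORDER BY (added only to make the output stable) are assumptions of this sketch. The join compares t.value against the value taken from the value list v.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INTEGER, value TEXT);
INSERT INTO t VALUES (1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (3, 'e'), (3, 'f');
""")

# All (id, value) combinations, minus the combinations that already exist in t.
missing_pairs = con.execute("""
SELECT i.id, v.value
FROM (SELECT DISTINCT id FROM t) AS i
CROSS JOIN (SELECT DISTINCT value FROM t) AS v
LEFT JOIN t
       ON t.id = i.id AND t.value = v.value
WHERE t.id IS NULL
ORDER BY i.id, v.value
""").fetchall()

for pair in missing_pairs:
    print(pair)
```

On the six sample rows this returns the twelve rows listed in the question.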

How to order rows by highest average of another table with same id

I have table A and B
A
+----+------+
| id | data |
+----+------+
| 1 | abc |
+----+------+
| 2 | xxx |
+----+------+
| 3 | qwe |
+----+------+
B
+------+--------+
| a_id | rating |
+------+--------+
| 2 | 1.5 |
+------+--------+
| 2 | 5 |
+------+--------+
| 3 | 2.5 |
+------+--------+
| 1 | 3 |
+------+--------+
| 3 | 1 |
+------+--------+
Now I want to get all data from A ordered by the average of rating in B.
The result should be:
xxx // because the average in table B is 3.25
abc // because the average in table B is 3
qwe // because the average in table B is 1.75
I am sure I have to use stuff like AVG() and ORDER BY DESC and a subquery, but I don't know how to combine.
This should work if you are using SQL Server. Since you wanted all the data from A, I used a left join instead of an inner join.
select a.ID, a.data , avg(b.rating) Avgrating from tableA a
left join tableB b on a.ID = b.a_id
group by a.ID ,a.data
order by Avgrating desc
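The left join version can be checked against the sample data. This is a minimal sketch, assuming SQLite (via Python's sqlite3) rather than SQL Server; the table names tableA and tableB follow the query above, and the alias avg_rating is just a name chosen for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tableA (id INTEGER, data TEXT);
CREATE TABLE tableB (a_id INTEGER, rating REAL);
INSERT INTO tableA VALUES (1, 'abc'), (2, 'xxx'), (3, 'qwe');
INSERT INTO tableB VALUES (2, 1.5), (2, 5), (3, 2.5), (1, 3), (3, 1);
""")

# LEFT JOIN keeps rows of tableA even when they have no ratings in tableB.
ranked = con.execute("""
SELECT a.id, a.data, AVG(b.rating) AS avg_rating
FROM tableA AS a
LEFT JOIN tableB AS b ON b.a_id = a.id
GROUP BY a.id, a.data
ORDER BY avg_rating DESC
""").fetchall()

for row in ranked:
    print(row)
```

The rows come back in the order xxx, abc, qwe, matching the averages 3.25, 3 and 1.75 from the question.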
Yes, use avg() aggregation function :
select A.data as "Data", avg(rating) as "Rating"
from A
join B on B.a_id = A.id
group by A.data
order by avg(rating) desc;
Data Rating
---- ----
xxx 3.25
abc 3
qwe 1.75

Filter array depending on other table

I'm trying to filter values from an array. The information about which values should be kept is in another table.
table_a table_b
___________________ ___________
| id | values | | keyword |
------------------- -----------
| 1 | [a, b, c] | | b |
| 2 | [d, e, f] | | e |
| 3 | [a, g] | | f |
------------------- -----------
I expect the following output:
output
________________________
| id | filtered_values |
------------------------
| 1 | [b] |
| 2 | [e, f] |
| 3 | [] |
------------------------
At the moment, I am using following query:
SELECT
id,
array_intersect(ta.values, tb.filter_keywords) AS filtered_values -- brickhouse UDF
FROM
table_a ta
CROSS JOIN (
SELECT
collect_set(keyword) as filter_keywords
FROM (
SELECT
"dummy" as grouping_dummy,
keyword
FROM
table_b
) tmp
GROUP BY
grouping_dummy
) tb
table_a has a couple million rows, table_b contains less than 1000 rows.
I guess the cross join is the bottleneck, because it uses only one reducer.
Is there any way to optimize this query?
Thanks!
I have a different assumption.
The reducer is needed in order to generate filter_keywords, not for the CROSS JOIN which is a map side operation.
So no problem here.
My guess is that the performance penalty comes from using array_intersect with an array of 1000 elements, so the solution would be to avoid it.
P.s.
There is no need for grouping_dummy.
You don't need to use GROUP BY in order to use aggregate functions.
select a.id
,collect_list (case when b.keyword is not null then a.val end) as vals
from (select a.id
,e.val
from table_a a
lateral view outer
explode (a.vals) e as val
) a
left join table_b b
on b.keyword =
a.val
group by a.id
+----+-----------+
| id | vals |
+----+-----------+
| 1 | ["b"] |
| 2 | ["e","f"] |
| 3 | [] |
+----+-----------+
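To make the intended semantics of the explode / left join / collect_list query concrete outside Hive, here is a minimal sketch in plain Python. It is not a Hive implementation; the dictionary and function names are made up for this illustration. It mirrors the same idea: look each array element up in the keyword set individually (a hash lookup) instead of intersecting every array with the full keyword array.

```python
# Sample data from the question: id -> array of values, and the keyword list.
table_a = {1: ["a", "b", "c"], 2: ["d", "e", "f"], 3: ["a", "g"]}
table_b = ["b", "e", "f"]


def filter_values(rows, keywords):
    """Keep only the array elements that appear in keywords.

    Rows whose elements all get filtered out keep an empty list,
    like the LEFT JOIN + collect_list combination does for id 3.
    """
    keep = set(keywords)  # one hash lookup per element
    return {
        row_id: [value for value in values if value in keep]
        for row_id, values in rows.items()
    }


filtered = filter_values(table_a, table_b)
print(filtered)
```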

Access Queries comparing two tables

I have two tables in Access, Table A (MasterLockInsNew) and Table B (InitialPolData):
Table MasterLockInsNew:
+----+-------+----------+
| ID | Value | Date |
+----+-------+----------+
| 1 | 123 | 12/02/13 |
| 2 | 1231 | 11/02/13 |
| 4 | 1265 | 16/02/13 |
+----+-------+----------+
Table InitialPolData:
+----+-------+----------+------+
| ID | Value | Date | Type |
+----+-------+----------+------+
| 1 | 123 | 12/02/13 | x |
| 2 | 1231 | 11/02/13 | x |
| 3 | 1238 | 10/02/13 | y |
| 4 | 1265 | 16/02/13 | a |
| 7 | 7649 | 18/02/13 | z |
+----+-------+----------+------+
All I want are the rows from table B for IDs not contained in A. My current code looks like this:
SELECT Distinct InitialPolData.*
FROM InitialPolData
WHERE InitialPolData.ID NOT IN (SELECT Distinct InitialPolData.ID
from InitialPolData INNER JOIN
MasterLockInsNew
ON InitialPolData.ID=MasterLockInsNew.ID);
But whenever I run this in Access it crashes!! The tables are fairly large but I don't think this is the reason.
Can anyone help?
Thanks
Or try a left outer join:
SELECT b.*
FROM InitialPolData b left outer join
MasterLockInsNew a on
b.id = a.id
where
a.id is null
A simple subquery will do.
select * from InitialPolData
where id not in (
select id from MasterLockInsNew
);
Try using NOT EXISTS:
SELECT Distinct i.*
FROM InitialPolData AS i
WHERE NOT EXISTS (SELECT 1
FROM MasterLockInsNew AS m
WHERE m.ID = i.ID)
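The three suggestions (left outer join with IS NULL, NOT IN, NOT EXISTS) should return the same rows on this data. Here is a small sketch that runs all three side by side, assuming SQLite through Python's sqlite3 as a stand-in for Access; only the ID column is selected and an ORDER BY is added to keep the comparison short and stable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MasterLockInsNew (ID INTEGER, Value INTEGER, Date TEXT);
CREATE TABLE InitialPolData (ID INTEGER, Value INTEGER, Date TEXT, Type TEXT);
INSERT INTO MasterLockInsNew VALUES
    (1, 123, '12/02/13'), (2, 1231, '11/02/13'), (4, 1265, '16/02/13');
INSERT INTO InitialPolData VALUES
    (1, 123, '12/02/13', 'x'), (2, 1231, '11/02/13', 'x'),
    (3, 1238, '10/02/13', 'y'), (4, 1265, '16/02/13', 'a'),
    (7, 7649, '18/02/13', 'z');
""")

queries = {
    "left join": """
        SELECT b.ID
        FROM InitialPolData AS b
        LEFT OUTER JOIN MasterLockInsNew AS a ON b.ID = a.ID
        WHERE a.ID IS NULL
        ORDER BY b.ID""",
    "not in": """
        SELECT ID FROM InitialPolData
        WHERE ID NOT IN (SELECT ID FROM MasterLockInsNew)
        ORDER BY ID""",
    "not exists": """
        SELECT i.ID
        FROM InitialPolData AS i
        WHERE NOT EXISTS (SELECT 1 FROM MasterLockInsNew AS m WHERE m.ID = i.ID)
        ORDER BY i.ID""",
}

# Run every variant and collect the IDs it returns.
results = {name: [row[0] for row in con.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```

All three variants return IDs 3 and 7 here. One known difference to keep in mind: if MasterLockInsNew.ID could contain NULL, the NOT IN form would return no rows at all, while the other two would be unaffected.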

Converting Name-Value table to another table with names as column headers but one name type can have multiple values

I have the following table structure:
Structure-1
+------------+-------------+---------+
| SymbolCode | CategoryId | ItemId |
+------------+-------------+---------+
| 212374 | Cat1 | 1 |
| 212374 | Cat2 | 6 |
| 212374 | Cat3 | 5 |
| 212374 | Cat3 | 50 |
+------------+-------------+---------+
I would like to convert this structure into the following:
IntermediateStructure
+------------+------+------+------+
| SymbolCode | Cat1 | Cat2 | Cat3 |
+------------+------+------+------+
| 212374 | 1 | 6 | 5 |
| 212374 | 1 | 6 | 50 |
+------------+------+------+------+
I have tried using PIVOT/CrossTab, but I can't use aggregate functions because there is nothing to aggregate here. I have also tried a CASE expression, but I don't want 4 rows with NULLs appearing in the Cat1, Cat2 and Cat3 columns where they don't have any values. And if I use an aggregate function with CASE, then I only get one value for the Cat3 column.
The structure I am aiming for may not be the right one; it is only an intermediate result for the query I am actually trying to build.
I have another table, Structure-2 (shown below), which I need to join to Structure-1:
+-------+------------+--------+
| Rule | CategoryId | ItemId |
+-------+------------+--------+
| Rule1 | Cat1 | 1 |
| Rule1 | Cat2 | 6 |
| Rule2 | Cat1 | 1 |
| Rule2 | Cat2 | 6 |
| Rule2 | Cat3 | 5 |
| Rule2 | Cat3 | 50 |
+-------+------------+--------+
Looking at Rule1 and Rule2, only Rule2 should apply to SymbolCode 212374, because it matches the criteria exactly, nothing more and nothing less.
What sort of query can I build to do this?
You can still use an aggregate function to pivot the data, you'll just need something to be unique to allow for multiple rows to be returned. For your situation, I'd use a windowing function like row_number(). This will create a unique sequence for each SymbolCode, CategoryID - this number will then be used when grouping for the aggregation.
You'll start with a query similar to:
select
s1.SymbolCode,
s1.CategoryID,
s2.ItemId,
seq = row_number() over(partition by s1.symbolcode, s1.categoryid
order by s1.itemid)
from Structure1 s1
inner join Structure2 s2
on s1.categoryid = s2.categoryid
and s1.ItemId = s2.ItemId
See Demo. This gives a result of:
| SYMBOLCODE | CATEGORYID | ITEMID | SEQ |
|------------|------------|--------|-----|
| 212374 | Cat1 | 1 | 1 |
| 212374 | Cat1 | 1 | 2 |
| 212374 | Cat2 | 6 | 1 |
| 212374 | Cat2 | 6 | 2 |
| 212374 | Cat3 | 5 | 1 |
| 212374 | Cat3 | 50 | 2 |
Now you have a seq column that contains a unique number for each set of SymbolCode, CategoryId. Once you have this value, then you can pivot the data into columns:
select SymbolCode,
Cat1 = max(case when categoryid = 'Cat1' then itemid end),
Cat2 = max(case when categoryid = 'Cat2' then itemid end),
Cat3 = max(case when categoryid = 'Cat3' then itemid end)
from
(
select
s1.SymbolCode,
s1.CategoryID,
s2.ItemId,
seq = row_number() over(partition by s1.symbolcode, s1.categoryid
order by s1.itemid)
from Structure1 s1
inner join Structure2 s2
on s1.categoryid = s2.categoryid
and s1.ItemId = s2.ItemId
) d
group by symbolcode, seq;
See SQL Fiddle with Demo. This gives a final result of:
| SYMBOLCODE | CAT1 | CAT2 | CAT3 |
|------------|------|------|------|
| 212374 | 1 | 6 | 5 |
| 212374 | 1 | 6 | 50 |
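The row_number() plus conditional aggregation approach can be reproduced on the sample data. This is a sketch assuming SQLite 3.25 or later (needed for window functions) through Python's sqlite3, not SQL Server, so `seq = row_number() ...` is written as `ROW_NUMBER() ... AS seq`. Note that Cat1 and Cat2 appear in both output rows only because the join matches those items in both Rule1 and Rule2, which yields two joined rows per category.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Structure1 (SymbolCode INTEGER, CategoryId TEXT, ItemId INTEGER);
CREATE TABLE Structure2 (Rule TEXT, CategoryId TEXT, ItemId INTEGER);
INSERT INTO Structure1 VALUES
    (212374, 'Cat1', 1), (212374, 'Cat2', 6),
    (212374, 'Cat3', 5), (212374, 'Cat3', 50);
INSERT INTO Structure2 VALUES
    ('Rule1', 'Cat1', 1), ('Rule1', 'Cat2', 6),
    ('Rule2', 'Cat1', 1), ('Rule2', 'Cat2', 6),
    ('Rule2', 'Cat3', 5), ('Rule2', 'Cat3', 50);
""")

pivoted = con.execute("""
SELECT SymbolCode,
       MAX(CASE WHEN CategoryId = 'Cat1' THEN ItemId END) AS Cat1,
       MAX(CASE WHEN CategoryId = 'Cat2' THEN ItemId END) AS Cat2,
       MAX(CASE WHEN CategoryId = 'Cat3' THEN ItemId END) AS Cat3
FROM (
    -- seq numbers the joined rows within each (SymbolCode, CategoryId) group.
    SELECT s1.SymbolCode, s1.CategoryId, s2.ItemId,
           ROW_NUMBER() OVER (PARTITION BY s1.SymbolCode, s1.CategoryId
                              ORDER BY s1.ItemId) AS seq
    FROM Structure1 AS s1
    JOIN Structure2 AS s2
      ON s1.CategoryId = s2.CategoryId AND s1.ItemId = s2.ItemId
) AS d
GROUP BY SymbolCode, seq
ORDER BY seq
""").fetchall()

for row in pivoted:
    print(row)
```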
create table #t (symbolcode int,category varchar(10),itemid int)
insert into #t (symbolcode,category,itemid)values (212374,'cat1',1)
insert into #t (symbolcode,category,itemid)values (212374,'cat2',6)
insert into #t (symbolcode,category,itemid)values (212374,'cat3',5)
insert into #t (symbolcode,category,itemid)values (212374,'cat3',60)
;WITH CTE AS(
select symbolcode,cat1,cat2,cat3
from
(
select symbolcode, category,itemid
from #t
) d
pivot
(
max(itemid)
for category in (cat1,cat2,cat3)
) piv
)
,CTE2 AS
(select symbolcode,cat1,cat2,cat3
from
(
select symbolcode, category,itemid
from #t
) d
pivot
(
MIN(itemid)
for category in (cat1,cat2,cat3)
) piv)
select * from CTE
UNION
select * from CTE2
Well for this very specific case, you could do this:
SELECT
SymbolCode
, (SELECT TOP 1 ItemId FROM MyTable WHERE CategoryId='Cat1' AND SymbolCode=mt.SymbolCode) AS Cat1
, (SELECT TOP 1 ItemId FROM MyTable WHERE CategoryId='Cat2' AND SymbolCode=mt.SymbolCode) AS Cat2
, ItemId AS Cat3
FROM MyTable mt
WHERE CategoryId='Cat3'
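None of the queries above checks the "nothing more, nothing less" requirement from the question directly. One possible way to express it without pivoting at all is a two-sided NOT EXISTS comparison (exact relational division): a rule applies to a symbol when no item of the symbol is missing from the rule and no item of the rule is missing from the symbol. The sketch below is only an illustration of that idea, assuming SQLite through Python's sqlite3 and the table names Structure1 and Structure2; it is not taken from the answers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Structure1 (SymbolCode INTEGER, CategoryId TEXT, ItemId INTEGER);
CREATE TABLE Structure2 (Rule TEXT, CategoryId TEXT, ItemId INTEGER);
INSERT INTO Structure1 VALUES
    (212374, 'Cat1', 1), (212374, 'Cat2', 6),
    (212374, 'Cat3', 5), (212374, 'Cat3', 50);
INSERT INTO Structure2 VALUES
    ('Rule1', 'Cat1', 1), ('Rule1', 'Cat2', 6),
    ('Rule2', 'Cat1', 1), ('Rule2', 'Cat2', 6),
    ('Rule2', 'Cat3', 5), ('Rule2', 'Cat3', 50);
""")

matches = con.execute("""
SELECT s.SymbolCode, r.Rule
FROM (SELECT DISTINCT SymbolCode FROM Structure1) AS s
CROSS JOIN (SELECT DISTINCT Rule FROM Structure2) AS r
WHERE NOT EXISTS (          -- no item of the symbol is missing from the rule
        SELECT 1 FROM Structure1 AS s1
        WHERE s1.SymbolCode = s.SymbolCode
          AND NOT EXISTS (SELECT 1 FROM Structure2 AS s2
                          WHERE s2.Rule = r.Rule
                            AND s2.CategoryId = s1.CategoryId
                            AND s2.ItemId = s1.ItemId))
  AND NOT EXISTS (          -- no item of the rule is missing from the symbol
        SELECT 1 FROM Structure2 AS s2
        WHERE s2.Rule = r.Rule
          AND NOT EXISTS (SELECT 1 FROM Structure1 AS s1
                          WHERE s1.SymbolCode = s.SymbolCode
                            AND s1.CategoryId = s2.CategoryId
                            AND s1.ItemId = s2.ItemId))
""").fetchall()

print(matches)
```

On the sample data only Rule2 is returned for SymbolCode 212374; Rule1 is rejected because the symbol has Cat3 items that Rule1 does not list.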