I'm new to SQL and can't seem to phrase my question correctly: however I word the search, I end up with the same results, namely queries that group data by several properties (users that share more than one property).
I'm writing a JSP-based website on the Glassfish server, and I manage the database from MS Access.
What I'm interested in is basically grouping by several distinct properties. Say I have a table that shows which items a user has. Something like this:
+----+-------+-------+-------+-------+-------+
| id | name  | item1 | item2 | item3 | item4 |
+----+-------+-------+-------+-------+-------+
| 1  | name1 | yes   | yes   |       | yes   |
| 2  | name2 | yes   |       |       | yes   |
| 3  | name3 |       | yes   |       | yes   |
| 4  | name4 | yes   | yes   |       | yes   |
| 5  | name5 |       |       | yes   | yes   |
+----+-------+-------+-------+-------+-------+
.
.
.
The query that I need, would return the following:
ItemID | Number of users with this item
-----------------------------------------
item1 | 3
item2 | 3
item3 | 1
item4 | 5
I don't see how GROUP BY can be used here, as the result I'm looking for requires the columns of the original table to appear as values in the rows of the resulting table.
What is the right query, and what is such an operation called (it's not grouping by each property, it's something else...)?
I can't see how you could do this in Access if you need it to work on any table, but if you know the table structure you can design a query like:
select 'item1' as ItemID, count(*) as [Number of users with this item]
from myTable
where item1 = 'Yes'
union
select 'item2', count(*)
from myTable
where item2 = 'Yes'
union
....
and so on.
You could do an aggregate query that returns 1 record with the count under each Item field. Assuming these are Yes/No type fields:
SELECT Sum(IIf([Item1],1,0)) AS Count1, Sum(IIf([Item2],1,0)) AS Count2, Sum(IIf([Item3],1,0)) AS Count3, Sum(IIf([Item4],1,0)) AS Count4 FROM Table1;
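The single-row aggregate above can be tried outside Access. The following is a minimal sketch using Python's built-in sqlite3 module, not Access itself: IIf is replaced by a standard CASE expression, and the Yes/No fields are stored as 1/0, which is an assumption about the real table.

```python
import sqlite3

# Hypothetical stand-in for the question's table; Yes/No is stored as 1/0 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, "
    "item1 INTEGER, item2 INTEGER, item3 INTEGER, item4 INTEGER)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "name1", 1, 1, 0, 1),
        (2, "name2", 1, 0, 0, 1),
        (3, "name3", 0, 1, 0, 1),
        (4, "name4", 1, 1, 0, 1),
        (5, "name5", 0, 0, 1, 1),
    ],
)

# One result row with one conditional sum per item column.
counts = conn.execute(
    """
    SELECT SUM(CASE WHEN item1 = 1 THEN 1 ELSE 0 END) AS Count1,
           SUM(CASE WHEN item2 = 1 THEN 1 ELSE 0 END) AS Count2,
           SUM(CASE WHEN item3 = 1 THEN 1 ELSE 0 END) AS Count3,
           SUM(CASE WHEN item4 = 1 THEN 1 ELSE 0 END) AS Count4
    FROM Table1
    """
).fetchone()

print(counts)  # (3, 3, 1, 5)
```

This returns the counts side by side in one row rather than one row per item, so it suits a fixed, known set of item columns.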
Your data structure is not normalized. Instead of 4 Item fields there should be one Item field holding the item code. A normalized structure would allow the use of GROUP BY or CROSSTAB queries.
id | name | item |
----------------------+
1 | name1 | 1 |
----+-------+---------+
2 | name1 | 2 |
----+-------+---------+
3 | name1 | 4 |
----+-------+---------+
4 | name2 | 1 |
----+-------+---------+
5 | name2 | 4 |
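You can emulate the normalized structure with a UNION query. There is no designer for UNION queries; you must type the SQL into the SQL View of the query builder. Each SELECT needs a filter so that it returns only the users who actually have that item (again assuming Yes/No fields):
SELECT ID AS SourceRecID, [Name] AS User, 1 AS Item, "Item1" AS Source FROM tablename WHERE [Item1] = True
UNION SELECT ID, [Name], 2, "Item2" FROM tablename WHERE [Item2] = True
...;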
Now use that query in another query to do the aggregate GROUP BY calcs. Or use it as the source for a report. A report will allow display of raw detail data as well as summary calcs.
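The two-step idea (emulate the normalized rows with a UNION, then aggregate with GROUP BY) can be sketched as follows. This is an illustration in Python's sqlite3 rather than Access; the 1/0 storage of the Yes/No fields and the `UserName` alias are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (ID INTEGER PRIMARY KEY, Name TEXT, "
    "Item1 INTEGER, Item2 INTEGER, Item3 INTEGER, Item4 INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "name1", 1, 1, 0, 1),
        (2, "name2", 1, 0, 0, 1),
        (3, "name3", 0, 1, 0, 1),
        (4, "name4", 1, 1, 0, 1),
        (5, "name5", 0, 0, 1, 1),
    ],
)

# One SELECT per item column; the WHERE keeps only users who have that item.
branches = [
    f"SELECT ID AS SourceRecID, Name AS UserName, {n} AS Item "
    f"FROM tablename WHERE Item{n} = 1"
    for n in range(1, 5)
]
normalized = " UNION ALL ".join(branches)

# Aggregate over the emulated normalized rows.
rows = conn.execute(
    f"SELECT Item, COUNT(*) FROM ({normalized}) AS n GROUP BY Item ORDER BY Item"
).fetchall()

print(rows)  # [(1, 3), (2, 3), (3, 1), (4, 5)]
```

UNION ALL is used here because the branches cannot produce duplicate rows (each carries a different Item value), so the de-duplication that plain UNION performs is unnecessary.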
I have a dataset in S3 with a bigint array column, and I want to filter rows efficiently based on the array values. I know we can use a GIN index on a SQL table, but I need a solution that works on an S3 dataset. I am planning to assign a cluster id to each combination of array elements (their cardinality is not huge, at most 2500) and store it as a new column on which filters can later be applied.
Example,
Table A
+------+------+-----------+
| Col1 | Col2 | Col3 |
+------+------+-----------+
| 1 | 101 | [123,234] |
| 2 | 102 | [123] |
| 3 | 103 | [234,345] |
+------+------+-----------+
I am trying to add new column like,
Table B (column Col3 will be removed from actual schema)
+------+------+-----------+-----------+
| Col1 | Col2 | Col3 | Cid |
+------+------+-----------+-----------+
| 1 | 101 | [123,234] | 1 |
| 2 | 102 | [123] | 2 |
| 3 | 103 | [234,345] | 3 |
+------+------+-----------+-----------+
and there will be another table of mapping for col3 and Cid like,
Table C
+-----------+-----+
| Col3 | Cid |
+-----------+-----+
| [123,234] | 1 |
| [123] | 2 |
| [234,345] | 3 |
+-----------+-----+
A new entry will be added to Table C whenever a new combination is created, and Table B will be updated whenever an array element is added or removed. The goal is to filter records from Table A efficiently based on the values in the array column. Queries like
123 = Any(Col3) can be served as Cid in (1,2), and queries like [123, 345] = Any(Col3) (either value present) can be served as Cid in (1,2,3).
Is there a better way to solve this problem?
I am also thinking of creating only the required combinations at runtime to limit the number of combinations. Is it a good idea to create the minimum number of combinations?
In Postgres, you can create the table and use join to calculate the values:
create table array_dim as
select col3 as arr, row_number() over (order by min(col1)) as array_id
from a
group by col3;
You can then add the new column:
select a.*, ad.array_id
from a join
array_dim ad
on a.col3 = ad.arr
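Since the data lives in S3 rather than in Postgres, the same mapping can also be built outside the database. The sketch below is plain Python over the question's three sample rows; the names `combo_to_cid`, `element_to_cids`, `cids_for_any` and `filter_rows` are hypothetical, and it assumes element order inside an array does not matter. It adds an inverted index from each element to the Cids containing it, which is what turns an "element = ANY(array)" filter into a "Cid IN (...)" filter.

```python
from collections import defaultdict

# Rows of the question's Table A: (Col1, Col2, Col3).
table_a = [(1, 101, [123, 234]), (2, 102, [123]), (3, 103, [234, 345])]

combo_to_cid = {}                    # Table C: array combination -> Cid
element_to_cids = defaultdict(set)   # inverted index: element -> Cids containing it
table_b = []                         # Table B without Col3: (Col1, Col2, Cid)

for col1, col2, arr in table_a:
    key = tuple(sorted(arr))  # assumption: element order is irrelevant
    if key not in combo_to_cid:
        cid = len(combo_to_cid) + 1
        combo_to_cid[key] = cid
        for element in key:
            element_to_cids[element].add(cid)
    table_b.append((col1, col2, combo_to_cid[key]))


def cids_for_any(values):
    """Return the Cids whose array contains at least one of the given values."""
    result = set()
    for value in values:
        result |= element_to_cids.get(value, set())
    return result


def filter_rows(values):
    """Filter Table B by Cid instead of scanning the array column."""
    wanted = cids_for_any(values)
    return [row for row in table_b if row[2] in wanted]


print(sorted(cids_for_any([123])))  # [1, 2]
print(filter_rows([345]))           # [(3, 103, 3)]
```

Whether this beats scanning the arrays directly depends on how the S3 data is queried; the sketch only shows that the lookup reduces to a set of Cids.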
I have the following tables:
tbl_tasks
+---------+-------------+
| Task_ID | Assigned_ID |
+---------+-------------+
| 1 | 8 |
| 2 | 12 |
| 3 | 31 |
+---------+-------------+
tbl_resources
+---------+-----------+
| Task_ID | Source_ID |
+---------+-----------+
| 1 | 4 |
| 1 | 10 |
| 2 | 42 |
| 4 | 8 |
+---------+-----------+
A task is assigned to at least one person (denoted by the "assigned_ID") and then any number of people can be assigned as a source (denoted by "source_ID"). The ID numbers are all linked to names in another table. Though the ID numbers are named differently, they all return to the same table.
Would there be any way for me to combine the two tables based on ID so that I could search by someone's ID number? For example, if I search with WHERE User_ID = 8 to see which tasks user 8 is involved in, I would get back Task 1 and Task 4.
Right now, by joining all the tables together, I can easily filter on "Assigned" but not "Source" due to all the multiple entries in the table.
Use union all:
select distinct task_id
from ((select task_id, assigned_id as id
from tbl_tasks
) union all
(select task_id, source_id
from tbl_resources
)
) ti
where id = ?;
Note that this uses select distinct in case someone is assigned to the same task in both tables. If not, remove the distinct.
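To check the query against the sample data from the question, here is a small sketch using Python's sqlite3 module (the original database engine is not stated, so SQLite is an assumption; the helper name `tasks_for` is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_tasks (Task_ID INTEGER, Assigned_ID INTEGER);
    CREATE TABLE tbl_resources (Task_ID INTEGER, Source_ID INTEGER);
    INSERT INTO tbl_tasks VALUES (1, 8), (2, 12), (3, 31);
    INSERT INTO tbl_resources VALUES (1, 4), (1, 10), (2, 42), (4, 8);
    """
)


def tasks_for(user_id):
    """Return the tasks a person is involved in, as assignee or as source."""
    rows = conn.execute(
        """
        SELECT DISTINCT task_id
        FROM (SELECT Task_ID AS task_id, Assigned_ID AS id FROM tbl_tasks
              UNION ALL
              SELECT Task_ID, Source_ID FROM tbl_resources) AS ti
        WHERE id = ?
        ORDER BY task_id
        """,
        (user_id,),
    ).fetchall()
    return [task_id for (task_id,) in rows]


print(tasks_for(8))  # [1, 4]
```

The `?` placeholder is bound by the driver, which matches the parameterized `where id = ?` in the answer.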
Could you help me write a select query for this case?
I have been looking for a way to implement an expandable list view that is filled with data from a database, but I haven't found a proper example yet,
so I'm thinking about another approach.
I have 2 tables:
table1 :
+------------+----------+
| id_table1 | Item |
+------------+----------+
| 1 | Item1 |
| 2 | Item2 |
| 3 | Item3 |
| 4 | Item2.1 |
| 5 | Item2.2 |
| 6 | Item3.1 |
| 7 | Item3.2 |
+------------+----------+
table 2 : table2.id_table2 = table1.id_table1 and table2.id_table = table1.id_table1
+------------+----------+
| id_table2 | id_table |
+------------+----------+
| 2 | 4 |
| 2 | 5 |
| 3 | 6 |
| 3 | 7 |
+------------+----------+
and with some select query the result will be :
Item1
Item2
Item2.1 //with space
Item2.2 //with space
Item3
Item3.1 //with space
Item3.2 //with space
You can do what you want to do with these tables, a la the following:
select id_table1, Item from table1
where not exists (
select id_table2
from table2 where id_table1=id_table)
union
select id_table2, child.Item
from table1 parent, table2, table1 child
where table2.id_table2=parent.id_table1
and table2.id_table=child.id_table1;
The first query finds those items that are "parent" items. The second one finds those that are children. (You might have some issues ordering later on. And this assumes only two levels at the moment.) But it is not a very clear way to do it. At least I would suggest column names that indicate what you are doing, e.g:
table1: ViewItem. Columns: id, Item
table2: ItemChild. Columns: parentId, childId
You will find quite a few hits on this type of question, hierarchical menus being one such application.
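The ordering issue mentioned above can be handled by carrying the parent id and a depth value through both halves of the union and sorting on them. The following is a sketch in Python's sqlite3 under the same two-level assumption; the extra `depth` column and the two-space indent are illustrative choices, not part of the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id_table1 INTEGER PRIMARY KEY, Item TEXT);
    CREATE TABLE table2 (id_table2 INTEGER, id_table INTEGER);
    INSERT INTO table1 VALUES (1, 'Item1'), (2, 'Item2'), (3, 'Item3'),
        (4, 'Item2.1'), (5, 'Item2.2'), (6, 'Item3.1'), (7, 'Item3.2');
    INSERT INTO table2 VALUES (2, 4), (2, 5), (3, 6), (3, 7);
    """
)

# Columns: parent id, depth (0 = parent, 1 = child), own id, item text.
rows = conn.execute(
    """
    SELECT id_table1, 0, id_table1, Item
    FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2
                      WHERE table2.id_table = table1.id_table1)
    UNION ALL
    SELECT table2.id_table2, 1, child.id_table1, child.Item
    FROM table2
    JOIN table1 AS child ON child.id_table1 = table2.id_table
    ORDER BY 1, 2, 3
    """
).fetchall()

# Indent children by two spaces per level.
lines = ["  " * depth + item for _, depth, _, item in rows]
print("\n".join(lines))
```

Sorting by parent id first keeps each child directly under its parent, and the depth column puts the parent row ahead of its children.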
I have a single table that has thousands of rows of log statistics (number of views per page, per website) for multiple websites. For instance, in this table, multiple sites can have a separate stat for the same page name:
+----+---------+-------+------------+
| id | page | views | sitename |
+----+---------+-------+------------+
| 1 | page1 | 7 | name1 |
+----+---------+-------+------------+
| 2 | page1 | 5 | name2 |
+----+---------+-------+------------+
| 3 | page2 | 3 | name2 |
+----+---------+-------+------------+
| 4 | page1 | 7 | name3 |
+----+---------+-------+------------+
| 5 | page2 | 5 | name3 |
+----+---------+-------+------------+
| 6 | page3 | 3 | name3 |
+----+---------+-------+------------+
I am trying to formulate a query that will allow me to list each DISTINCT page name, for each DISTINCT sitename -- and display the number of views for that page per site:
+-------+----------+-------+---------+
| page | name1 | name2 | name3|
+-------+----------+-------+---------+
| page1 | 7 | 5 | 7 |
+-------+----------+-------+---------+
| page2 | NULL | 3 | 5 |
+-------+----------+-------+---------+
| page3 | NULL | NULL | 3 |
+-------+----------+-------+---------+
If a site does not have a stat for that particular page, the view value should be NULL. This leads me to think I'll need to do a JOIN/UNION some place, but I can't wrap my mind around it! How can I formulate this query when I need multiple distinct values? Thanks!
UPDATE (ANSWER WITH DYNAMIC SQL + PIVOT):
PIVOT was the correct method, which both potential answers included. Gave the official credit to 2nd responder since the SUM of the values in the actual SELECT statement was the key to getting the dynamic SQL query to work properly.
http://sqlfiddle.com/#!6/7b4cb/37/0
Also, used some TSQL from here:
SQL Server Pivot with Dynamic Fields
If the site list is static you can use a pivot like so
SELECT
page,
sum(name1) name1,
sum(name2) name2,
sum(name3) name3
FROM
(SELECT id, page, views, sitename
FROM Table1) p
PIVOT
(
SUM(VIEWS)
FOR SiteName IN (name1, name2, name3)
) as pvt
GROUP BY
page
If it needs to be dynamic you can use this technique that uses dynamic sql.
Bluefeet gives an example here
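For engines without a PIVOT operator, the dynamic variant can be approximated by generating one conditional aggregate per site name. The sketch below uses Python's sqlite3 and is a swapped-in technique (conditional aggregation), not the T-SQL dynamic PIVOT referred to above; `quote_ident` is a hypothetical helper for safely quoting the generated column aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, page TEXT,
                         views INTEGER, sitename TEXT);
    INSERT INTO Table1 VALUES
        (1, 'page1', 7, 'name1'), (2, 'page1', 5, 'name2'),
        (3, 'page2', 3, 'name2'), (4, 'page1', 7, 'name3'),
        (5, 'page2', 5, 'name3'), (6, 'page3', 3, 'name3');
    """
)


def quote_ident(name):
    """Quote a value for use as an SQL identifier (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'


# Discover the site names at runtime instead of hard-coding them.
sites = [s for (s,) in conn.execute(
    "SELECT DISTINCT sitename FROM Table1 ORDER BY sitename")]

# One SUM(CASE ...) per site; SUM over no matching rows yields NULL.
columns = ", ".join(
    f"SUM(CASE WHEN sitename = ? THEN views END) AS {quote_ident(s)}"
    for s in sites
)
sql = f"SELECT page, {columns} FROM Table1 GROUP BY page ORDER BY page"
rows = conn.execute(sql, sites).fetchall()

for row in rows:
    print(row)
```

The site names are passed as bound parameters for the comparisons and only appear in the SQL text as quoted aliases, which keeps the generated statement safe even if a name contains unusual characters.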
You can use Pivot functionality for the above mentioned scenario. Please find the implementation below:
http://msdn.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
http://blog.sqlauthority.com/2008/06/07/sql-server-pivot-and-unpivot-table-examples/
This is a basic question about SQL statements.
What is the difference between
SELECT * FROM "Users"
and
SELECT "Users".* FROM "Users"
[TableName].[column] is usually used to pinpoint the table you mean when two tables are present in a join or a complex statement and you need to say which of two same-named columns to use.
Its most common use is in a join, though; for a basic statement such as the one above there is no difference and the output will be the same.
In your case there is no difference. A difference emerges when you are selecting from multiple tables: * takes the columns from all the tables, while TABLE_NAME.* takes only the columns from that table. Suppose we have a database with 2 tables:
mysql> SELECT * FROM report;
+----+------------+
| id | date |
+----+------------+
| 1 | 2013-05-01 |
| 2 | 2013-06-02 |
+----+------------+
mysql> SELECT * FROM sites_to_report;
+---------+-----------+---------------------+------+
| site_id | report_id | last_run | rows |
+---------+-----------+---------------------+------+
| 1 | 1 | 2013-05-01 16:20:21 | 1 |
| 1 | 2 | 2013-05-03 16:20:21 | 1 |
| 2 | 2 | 2013-05-03 14:21:47 | 1 |
+---------+-----------+---------------------+------+
mysql> SELECT
-> *
-> FROM
-> report
-> INNER JOIN
-> sites_to_report
-> ON
-> sites_to_report.report_id=report.id;
+----+------------+---------+-----------+---------------------+------+
| id | date | site_id | report_id | last_run | rows |
+----+------------+---------+-----------+---------------------+------+
| 1 | 2013-05-01 | 1 | 1 | 2013-05-01 16:20:21 | 1 |
| 2 | 2013-06-02 | 1 | 2 | 2013-05-03 16:20:21 | 1 |
| 2 | 2013-06-02 | 2 | 2 | 2013-05-03 14:21:47 | 1 |
+----+------------+---------+-----------+---------------------+------+
mysql> SELECT
-> report.*
-> FROM
-> report
-> INNER JOIN
-> sites_to_report
-> ON
-> sites_to_report.report_id=report.id;
+----+------------+
| id | date |
+----+------------+
| 1 | 2013-05-01 |
| 2 | 2013-06-02 |
| 2 | 2013-06-02 |
+----+------------+
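The same comparison can be reproduced without a MySQL server. This sketch uses Python's sqlite3 (an assumption; the session above is MySQL) and inspects only the column names each form returns; `column_names` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE report (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE sites_to_report (site_id INTEGER, report_id INTEGER,
                                  last_run TEXT, "rows" INTEGER);
    """
)

JOIN = ("FROM report INNER JOIN sites_to_report "
        "ON sites_to_report.report_id = report.id")


def column_names(select_list):
    """Run the join with the given select list and return its column names."""
    cursor = conn.execute(f"SELECT {select_list} {JOIN}")
    return [description[0] for description in cursor.description]


print(column_names("*"))         # columns of both tables
print(column_names("report.*"))  # columns of report only
```

With `*` the result carries the columns of both joined tables, while `report.*` restricts it to the columns of `report`.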
For the example you gave, there is no semantic difference between them. Any performance difference would be negligible: just the parsing of two strings of different length.
But that is only true for the example you gave. In queries that involve multiple tables, tableName.* makes explicit which table we want to select all columns from.
Example:
Suppose you have two tables, TableA and TableB, and both have a column called Name. A table-name qualifier lets you specify which table's Name column you want:
`select TableA.Name, TableB.Name from TableA, TableB where TableA.age=TableB.age`
That's all I can say.
The particular examples specified would return the same result and have the same performance. There would be no difference in that respect, therefore.
However, in some SQL products, the difference in how * and alias.* are interpreted affects, in particular, what else you can add to the query. More specifically, in Oracle, you can mix alias.* with other expressions being returned as columns, i.e. this
SELECT "Users".*, SomeColumn * 2 AS DoubleValue FROM "Users"
would work. At the same time, * must stand on its own, meaning that the following
SELECT *, SomeColumn * 2 AS DoubleValue FROM "Users"
would be illegal.
For the examples you provided, the only difference is in syntax. What both of the queries share is that they are really bad. Select * is evil no matter how you write it and can get you into all kinds of trouble. Get into the habit of listing the columns you want to have included in your result set.