Sql: simultaneous aggregate from two tables - sql

I have two tables: a Files table, which includes the file type, and a File Properties table, which references the file table via a foreign key. Sample Files table:
| id | name | type |
---------------------
| 1 | file1 | zip |
| 2 | file2 | zip |
| 3 | file3 | zip |
| 4 | file4 | jpg |
And the Properties table:
| file_id | property |
-----------------------
| 1 | x |
| 2 | x |
I want a query that shows the count of each file type and how many files of that type have a property.
So in the example, the result would be
| type | filecount | prop count |
----------------------------------
| zip | 3 | 2 |
| jpg | 1 | 0 |
I could accomplish this by
select f.type, (select count(id) from files where type = f.type), count(fp.id) from
files as f, file_properties as fp where f.id = fp.file_id group by f.type;
But this seems very suboptimal and is very slow. Any better way to do this?

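-- count(pc.file_id) skips the NULLs produced by the left join,
-- so a type with no properties shows 0 instead of NULL
select f.type, count(*) as filecount, count(pc.file_id) as [prop count]
from Files f
left outer join (
select file_id, count(*) as count
from Properties p
group by file_id
) pc on f.id = pc.file_id
group by f.type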
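For anyone who wants to try the left-join approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table names `files` and `file_properties` are taken from the question's own query; the use of SQLite, the `prop_count` alias and the `type_counts` helper are assumptions made for illustration only.

```python
import sqlite3


def type_counts(conn):
    """Return (type, filecount, prop_count) rows, largest file count first."""
    return conn.execute(
        """
        SELECT f.type,
               COUNT(*)          AS filecount,
               COUNT(pc.file_id) AS prop_count  -- NULLs from the left join are not counted
        FROM files AS f
        LEFT JOIN (SELECT DISTINCT file_id FROM file_properties) AS pc
               ON pc.file_id = f.id
        GROUP BY f.type
        ORDER BY filecount DESC
        """
    ).fetchall()


# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE file_properties (file_id INTEGER REFERENCES files(id), property TEXT);
    INSERT INTO files VALUES
        (1, 'file1', 'zip'), (2, 'file2', 'zip'), (3, 'file3', 'zip'), (4, 'file4', 'jpg');
    INSERT INTO file_properties VALUES (1, 'x'), (2, 'x');
    """
)

print(type_counts(conn))  # [('zip', 3, 2), ('jpg', 1, 0)]
```

The derived table reduces the properties to one row per file before the join, so a file with several properties is still counted once.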

Related

Selecting the two most common attribute pairings from a Entity-Attribute Table?

I have a simple Entity-Attribute table in my database that records whether an Entity has some Attribute by the existence of a row consisting of (Entity, Attribute).
I want to find out, of all the Entities with two and only two Attributes, what the most common Attribute pairs are.
For example, if my table looked like:
+--------+-----------+
| Entity | Attribute |
+--------+-----------+
| Bob | A |
| Sally | B |
| Terry | C |
| Bob | B |
| Sally | A |
| Terry | D |
| Larry | C |
+--------+-----------+
I would want it to return
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| C | D | 1 |
+-------------+-------------+-------+
I currently have a short query that looks like:
WITH TwoAtts AS (
SELECT entity
FROM table
GROUP BY entity
HAVING COUNT(att) = 2
)
SELECT t1.att, t2.att, COUNT(t1.entity)
FROM table t1
JOIN table t2
ON t1.entity = t2.entity
WHERE t1.entity IN (SELECT * FROM TwoAtts)
AND t1.att != t2.att
GROUP BY t1.att, t2.att
ORDER BY COUNT(t1.entity) DESC
but it can only produce "duplicate" results like
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| B | A | 2 |
| D | C | 1 |
| C | D | 1 |
+-------------+-------------+-------+
In a sense I would like to be able to run an unordered DISTINCT / set operator over the two attribute columns, but I am not sure how to achieve this in SQL.
Hmmm, I think you want two levels of aggregation, with some filtering:
select attribute_1, attribute_2, count(*)
from (select min(ea.attribute) as attribute_1, max(ea.attribute) as attribute_2
from entity_attribute ea
group by entity
having count(*) = 2
) aa
group by attribute_1, attribute_2;
Here is a db<>fiddle
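As a quick check of the two-level aggregation, the sketch below runs the same idea against the question's sample rows using Python's built-in sqlite3 module. The `entity_attribute` table name comes from the answer; the choice of SQLite, the `pair_count` alias, the added ORDER BY and the `common_pairs` helper are assumptions for illustration.

```python
import sqlite3


def common_pairs(conn):
    """Return (attribute_1, attribute_2, pair_count) for entities with exactly two attributes."""
    return conn.execute(
        """
        SELECT attribute_1, attribute_2, COUNT(*) AS pair_count
        FROM (
            -- MIN/MAX put each pair in a fixed order, so (A, B) and (B, A) collapse
            SELECT MIN(attribute) AS attribute_1, MAX(attribute) AS attribute_2
            FROM entity_attribute
            GROUP BY entity
            HAVING COUNT(*) = 2
        ) AS pairs
        GROUP BY attribute_1, attribute_2
        ORDER BY pair_count DESC, attribute_1, attribute_2
        """
    ).fetchall()


# Sample rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_attribute (entity TEXT, attribute TEXT)")
conn.executemany(
    "INSERT INTO entity_attribute VALUES (?, ?)",
    [
        ("Bob", "A"), ("Sally", "B"), ("Terry", "C"), ("Bob", "B"),
        ("Sally", "A"), ("Terry", "D"), ("Larry", "C"),
    ],
)

print(common_pairs(conn))  # [('A', 'B', 2), ('C', 'D', 1)]
```

Larry has a single attribute and is dropped by the HAVING clause, which is why only two pairs remain.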

Joining table on two columns only joins it on a single

How do I correctly join a table on two columns? My issue is that the result is not correct, as it only joins on a single column.
This question started off in this other question: SQL query returns product of results instead of sum . I am creating a new question as there is another issue I am trying to solve.
I join a table of materials on a table which contains multiple supply and disposal movements. Each movement references a material id. I would like to join the material on each movement.
My query:
SELECT supply_material_refer, disposal_material_refer, material_id, material_name
FROM "construction_sites"
JOIN projects ON construction_sites.project_refer = projects.project_id
JOIN addresses ON construction_sites.address_refer = addresses.address_id
cross join lateral ( select *
from (select row_number() over () as rn, *
from supplies
where supplies.supply_project_refer = projects.project_id) as supplies
full join (select row_number() over () as rn, *
from disposals
where disposals.disposal_project_refer = projects.project_id
) as disposals
on (supplies.rn = disposals.rn)
) as combined
LEFT JOIN materials material ON combined.disposal_material_refer = material.material_id
OR combined.supply_material_refer = material.material_id
WHERE (projects.project_name = 'Project 15')
ORDER BY construction_site_id asc;
The result of the query:
+-----------------------+-------------------------+-------------+---------------+
| supply_material_refer | disposal_material_refer | material_id | material_name |
+-----------------------+-------------------------+-------------+---------------+
| 1 | 1 | 1 | Materialtest |
| 2 | 1 | 1 | Materialtest |
| 2 | 1 | 2 | Dirt |
| 1 | 1 | 1 | Materialtest |
| 2 | 1 | 1 | Materialtest |
| 2 | 1 | 2 | Dirt |
| 1 | (null) | 1 | Materialtest |
| 4 | (null) | 4 | Stones |
+-----------------------+-------------------------+-------------+---------------+
An example line I have issues with:
+------------------------+-------------------------+-------------+---------------+
| supply_material_refer | disposal_material_refer | material_id | material_name |
+------------------------+-------------------------+-------------+---------------+
| 2 | 1 | 1 | Materialtest |
+------------------------+-------------------------+-------------+---------------+
A preferred output would be like:
+------------------------+----------------------+-------------------------+------------------------+
| supply_material_refer | supply_material_name | disposal_material_refer | disposal_material_name |
+------------------------+----------------------+-------------------------+------------------------+
| 2 | Dirt | 1 | Materialtest |
+------------------------+----------------------+-------------------------+------------------------+
I have created a sqlfiddle with dummy data: http://www.sqlfiddle.com/#!17/863d78/2
To my understanding the solution would be to have a disposal_material column and a supply_material column for the material names. I do not know how I can achieve this goal though...
Thanks for any help!
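One common way to get a separate name column for each reference is to join the materials table twice under different aliases, once per reference column, instead of using a single join with OR. The sketch below illustrates this with Python's built-in sqlite3 module rather than PostgreSQL. The `combined` table is a hypothetical stand-in for the lateral subquery in the question, its `rn` column and rows are made up to mirror the result shown above, and the `material_names` helper is not from the original post.

```python
import sqlite3


def material_names(conn):
    """Return one row per movement pair with a name column for each material reference."""
    return conn.execute(
        """
        SELECT c.supply_material_refer,
               sm.material_name AS supply_material_name,
               c.disposal_material_refer,
               dm.material_name AS disposal_material_name
        FROM combined AS c
        LEFT JOIN materials AS sm ON sm.material_id = c.supply_material_refer
        LEFT JOIN materials AS dm ON dm.material_id = c.disposal_material_refer
        ORDER BY c.rn
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE materials (material_id INTEGER PRIMARY KEY, material_name TEXT);
    -- hypothetical stand-in for the supplies/disposals lateral subquery
    CREATE TABLE combined (rn INTEGER, supply_material_refer INTEGER, disposal_material_refer INTEGER);
    INSERT INTO materials VALUES (1, 'Materialtest'), (2, 'Dirt'), (4, 'Stones');
    INSERT INTO combined VALUES (1, 1, 1), (2, 2, 1), (3, 1, NULL), (4, 4, NULL);
    """
)

for row in material_names(conn):
    print(row)
```

Because each alias is matched against exactly one column, a movement pair yields a single output row, and a missing disposal simply leaves the disposal name NULL.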

How to pivot sql data, and squash results into non-null rows by Date per ID

Not a good title to the post, but hopefully it'll catch some eyes.
I have a very complex situation in T-SQL that I am unable to accomplish. I'm hoping someone with expertise knows an elegant and fast solution so that my performance is not impacted. I'm dealing with billions of rows.
PREFACE
I have a table called Customers with a unique ID. Those customers have Files, Files have Properties, and each Property Name corresponds to a single Value.
Tables:
Customers
Files -
Property - contains both Name and Value
The Customer ID is present in all of these tables, as are audit fields such as UpdatedDtm and CreationDtm.
USE CASE
I need to join all customers to their files (filtering for a few) and then tie every file to their properties (again filtering these). This is easy but results in lots of rows, one for each customer x file x property.
I know that the property names will never change, and I want to return just a select few, so I used a pivot, which produced a nice table, but it fell apart after I started doing more complex queries.
THE PROBLEM
First, the properties have a DateTime for when they were altered (UpdatedDtm), and I need to return everything altered within 1 hour of the creation date (CreationDtm) in the File table.
This trims down my list of potential properties, but now I have a table with a ROW_NUMBER() per ID and no good way to pivot and select the first value that isn't null while still preserving the number of columns for the table definition. This is important because I'm using dynamic SQL and placing the result in an indexed temp table with a composite key on CustomerID and FileName.
BEFORE PIVOT
| UpdatedDtm | CustomerID | FileName | Property | Value |
| ---------- | ---------- | ---------- | -------- | -------------- |
| 1/1/2015 | 1 | FileOne | Size | NULL |
| 1/1/2015 | 1 | FileOne | Format | JPG |
| 1/7/2015 | 1 | FileOne | Size | 88KB |
| 1/7/2015 | 1 | FileOne | Format | JPG |
| 1/7/2015 | 1 | FileOne | Comment | NULL |
| 1/11/2015 | 1 | FileOne | Comment | NULL |
| 1/1/2015 | 1 | FileTwo | Size | 91KB |
| 1/1/2015 | 1 | FileTwo | Format | PNG |
| 1/11/2015 | 1 | FileTwo | Comment | NULL |
| 1/2/2015 | 2 | FileThree | Size | 74KB |
| 1/2/2015 | 2 | FileThree | Format | XLS |
| 1/2/2015 | 2 | FileThree | State | Open |
| 1/7/2015 | 2 | FileThree | State | Closed |
| 1/10/2015 | 2 | FileThree | Comment | NULL |
| 1/1/2015 | 3 | FileFour | Size | 2KB |
| 1/2/2015 | 3 | FileFour | Size | 10KB |
| 1/3/2015 | 3 | FileFour | Size | 13KB |
| 1/4/2015 | 3 | FileFour | Size | 21KB |
| 1/5/2015 | 3 | FileFour | Size | 27KB |
| 1/6/2015 | 3 | FileFour | Size | 32KB |
| 1/7/2015 | 3 | FileFour | Size | 39KB |
| 1/8/2015 | 3 | FileFour | Size | 44KB |
| 1/1/2015 | 3 | FileFour | Format | TXT |
| 1/1/2015 | 3 | FileFour | Comment | NULL |
Please don't ask me why the database is setup this way or to change the schema. That is set in stone and out of my control. I need to be able to solve the use case as described.
AFTER PIVOT (Expectation)
| CustomerID | FileName | Size | Format | State | Comment |
| ---------- | ---------- | ---- | ------ | ------ | ------- |
| 1 | FileOne | 88KB | JPG | NULL | NULL |
| 1 | FileTwo | 91KB | PNG | NULL | NULL |
| 2 | FileThree | 74KB | XLS | Closed | NULL |
| 3 | FileFour | 44KB | TXT | NULL | NULL |
I have included some NULL values and missing values to showcase that I need to preserve the same columns regardless of whether they have data, but I also need to squash the data by the first non-null value within my date range.
CODE (My attempt)
IF Object_id('tempdb..#FilesQuery') IS NOT NULL DROP TABLE #FilesQuery;
CREATE TABLE #FilesQuery (
SeqNum int,
CustomerID numeric(16,0),
FileName varchar(64),
PropertyName varchar(64),
PropertyValue varchar(64)
)
INSERT INTO #FilesQuery
SELECT
CASE WHEN P.[Value] IS NOT NULL
THEN ROW_NUMBER() OVER (partition by C.CustomerID order by UpdatedDtm)
ELSE 0
END as SeqNum,
C.CustomerID
,F.Name as FileName
,P.Name as PropertyName
,P.Value as PropertyValue
FROM Customers C
INNER JOIN Files F ON F.CustomerID = C.CustomerID
LEFT JOIN Properties P
ON P.CustomerID = C.CustomerID
AND P.FileID = F.FileID
WHERE F.FileName IN ('FileOne','FileTwo','FileThree','FileFour')
AND P.Name IN ('Size','Format','State','Comment')
--PIVOT
DECLARE @cols AS nvarchar(MAX)
SELECT @cols = STUFF(
(SELECT DISTINCT ',' + QUOTENAME(PropertyName)
FROM #FilesQuery fq
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'')
DECLARE @dynSql AS nvarchar(MAX)
SET @dynSql = '
SELECT DISTINCT *
FROM (
SELECT
fq.CustomerID,
fq.FileName,
fq.PropertyName,
fq.PropertyValue
FROM #FilesQuery fq
) SRC
PIVOT (
Max([PropertyValue])
FOR PropertyName IN (' + @cols + ')
) PVT
'
IF Object_id('tempdb..#Results') IS NOT NULL DROP TABLE #Results;
CREATE TABLE #Results (
CustomerID varchar(16) NOT NULL,
FileName varchar(64) NOT NULL,
FileSize varchar(64) NULL,
FileFormat varchar(64) NULL,
FileState varchar(64) NULL,
FileComment varchar(64) NULL,
CONSTRAINT pk_CustDoc PRIMARY KEY (CustomerID,FileName)
)
INSERT INTO #Results EXEC sp_executesql @dynSql;
I'm sorry this code isn't complete, it is the working section I have. The other tries I made resulted in bad data pulls.
I tried using SeqNum and a combination of case statements to select the first non-null value for each row so that the data was all on one line, but it ended up being more like:
FileOne NULL NULL Open NULL
FileOne NULL JPG NULL NULL
and so on...
I've been struggling to solve this special case for a while and am about to scrap it and do something procedural with looping, but that would kill my query time and performance.
Anyone have a good solution? Am I over-thinking things?
You should filter your data before you PIVOT and you will get your desired results. Here is a CTE version to show you the steps of how to get what you want.
;WITH cteDefineRowPrecedence AS (
SELECT *
,ROW_NUMBER() OVER (PARTITION BY CustomerId, FileName, Property ORDER BY
CASE WHEN Value IS NOT NULL THEN 0 ELSE 1 END
,UpdatedDtm DESC) as RowNum
FROM
#Table
)
, cteDesiredRows AS (
SELECT
CustomerId
,FileName
,Property
,Value
FROM
cteDefineRowPrecedence t
WHERE
t.RowNum = 1
AND t.Value IS NOT NULL
)
SELECT *
FROM
cteDesiredRows t
PIVOT (
MAX(Value)
FOR Property IN (Size,[Format],[State],Comment)
) p
ORDER BY
CustomerId
,FileName
And here is a nested query version that will be easier to embed in your dynamic SQL:
SELECT *
FROM
(
SELECT CustomerId, FileName, Property, Value
FROM
(SELECT *
,ROW_NUMBER() OVER (PARTITION BY CustomerId, FileName, Property ORDER BY
CASE WHEN Value IS NOT NULL THEN 0 ELSE 1 END
,UpdatedDtm DESC) as RowNum
FROM
#Table) r
WHERE
r.RowNum = 1
AND r.Value IS NOT NULL
) t
PIVOT (
MAX(Value)
FOR Property IN (Size,[Format],[State],Comment)
) p
ORDER BY
CustomerId
,FileName
You might need to add a WHERE condition inside the CTE definition to restrict the date/time range to what you want.
WITH CTE AS (
SELECT DISTINCT
CustomerID
, FileName
, Property
, Value
FROM
<table_name>
)
SELECT *
FROM
CTE
PIVOT (MAX(Value) FOR Property IN ([Size], [Format], [State], [Comment])) p
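The rank-then-pivot idea from the first answer can be tried outside SQL Server with the sketch below, which uses Python's built-in sqlite3 module. Several things here are assumptions rather than part of the original posts: SQLite has no PIVOT operator, so conditional aggregation (MAX over CASE) stands in for it; the `file_props` table name and `squashed` helper are hypothetical; the dates are rewritten in ISO format so they sort correctly as text; only the FileOne and FileThree rows from the question are loaded; and window functions require SQLite 3.25 or later.

```python
import sqlite3


def squashed(conn):
    """Return one row per (CustomerID, FileName) with the latest non-null value per property."""
    return conn.execute(
        """
        SELECT CustomerID, FileName,
               MAX(CASE WHEN Property = 'Size'    THEN Value END) AS FileSize,
               MAX(CASE WHEN Property = 'Format'  THEN Value END) AS FileFormat,
               MAX(CASE WHEN Property = 'State'   THEN Value END) AS FileState,
               MAX(CASE WHEN Property = 'Comment' THEN Value END) AS FileComment
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY CustomerID, FileName, Property
                       -- non-null values rank first, then the most recent update
                       ORDER BY CASE WHEN Value IS NOT NULL THEN 0 ELSE 1 END,
                                UpdatedDtm DESC
                   ) AS RowNum
            FROM file_props
        ) AS ranked
        WHERE RowNum = 1
        GROUP BY CustomerID, FileName
        ORDER BY CustomerID, FileName
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_props "
    "(UpdatedDtm TEXT, CustomerID INTEGER, FileName TEXT, Property TEXT, Value TEXT)"
)
conn.executemany(
    "INSERT INTO file_props VALUES (?, ?, ?, ?, ?)",
    [
        ("2015-01-01", 1, "FileOne", "Size", None),
        ("2015-01-01", 1, "FileOne", "Format", "JPG"),
        ("2015-01-07", 1, "FileOne", "Size", "88KB"),
        ("2015-01-07", 1, "FileOne", "Format", "JPG"),
        ("2015-01-07", 1, "FileOne", "Comment", None),
        ("2015-01-11", 1, "FileOne", "Comment", None),
        ("2015-01-02", 2, "FileThree", "Size", "74KB"),
        ("2015-01-02", 2, "FileThree", "Format", "XLS"),
        ("2015-01-02", 2, "FileThree", "State", "Open"),
        ("2015-01-07", 2, "FileThree", "State", "Closed"),
        ("2015-01-10", 2, "FileThree", "Comment", None),
    ],
)

for row in squashed(conn):
    print(row)
```

The sketch filters on RowNum = 1 only, without the extra `Value IS NOT NULL` condition, so a file whose properties are all NULL would still produce a row. The column list is fixed, which keeps the shape of the result stable regardless of which properties have data.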

How to select from table A one row and table B multi rows?

I have two tables.
Pages
+-----------+----------+------------+
| ID | title | URL |
+-----------+----------+------------+
| 1 | test | test.html |
| 2 | test2 | test2.html |
+-----------+----------+------------+
Files
+-----------+----------+------------+
| ID | page_id | name |
+-----------+----------+------------+
| 10 | 1 | a.jpg |
| 11 | 1 | b.jpg |
| 12 | 2 | c.jpg |
+-----------+----------+------------+
How can I select one row from Pages together with the multiple matching rows from Files?
My query:
select * from pages,files WHERE (pages.id = page_id) AND (url='$url')
The output for above query:
test
a.jpg
The output I need:
test
a.jpg
b.jpg
This is more of a sql question than anything to do with specifically PHP.
I think this is what you want, but I'm not sure with your wording.
SELECT pages.title, files.name
FROM pages
INNER JOIN files ON pages.id = files.page_id
WHERE (pages.url='$url')
ORDER BY files.name;
Something like this
SELECT
Files.Name
FROM Files
INNER JOIN Pages ON Files.page_id = Pages.Id
WHERE Pages.url = '$url'
UNION ALL
SELECT
Pages.title
FROM Pages
WHERE Pages.url = '$url'
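The join itself returns one row per file, so the page title repeats on every row and the application has to read all of them. Here is a minimal sketch of that using Python's built-in sqlite3 module instead of PHP; the `files_for_url` helper is hypothetical, and the `?` placeholder replaces the interpolated '$url' so the value is passed as a bound parameter.

```python
import sqlite3


def files_for_url(conn, url):
    """Return (title, [file names]) for the page with this URL, or None if nothing matches."""
    rows = conn.execute(
        """
        SELECT p.title, f.name
        FROM pages AS p
        INNER JOIN files AS f ON f.page_id = p.id
        WHERE p.url = ?          -- bound parameter instead of string interpolation
        ORDER BY f.name
        """,
        (url,),
    ).fetchall()
    if not rows:
        return None
    # The title is identical on every row; collect the file names into a list.
    return rows[0][0], [name for _, name in rows]


# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
    CREATE TABLE files (id INTEGER PRIMARY KEY, page_id INTEGER, name TEXT);
    INSERT INTO pages VALUES (1, 'test', 'test.html'), (2, 'test2', 'test2.html');
    INSERT INTO files VALUES (10, 1, 'a.jpg'), (11, 1, 'b.jpg'), (12, 2, 'c.jpg');
    """
)

print(files_for_url(conn, "test.html"))  # ('test', ['a.jpg', 'b.jpg'])
```

With an inner join, a page that has no files returns no rows at all; a LEFT JOIN from pages would be needed if such pages should still appear.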

Count within the result set of a subquery

I have the following relations in my database:
Invoice InvoiceMeal
--------------------- ---------------------------
| InvoiceId | Total | | Id | InvoiceId | MealId |
--------------------- ---------------------------
| 1 | 22.32 | | 1 | 1 | 3 |
--------------------- ---------------------------
| 2 | 12.18 | | 2 | 1 | 2 |
--------------------- ---------------------------
| 3 | 27.76 | | 3 | 2 | 2 |
--------------------- ---------------------------
Meal Type
----------------------------------- -------------------
| Id | Name | TypeId | | Id | Name |
----------------------------------- -------------------
| 1 | Hamburger | 1 | | 1 | Meat |
----------------------------------- -------------------
| 2 | Soja Beans | 2 | | 2 | Vegetarian |
----------------------------------- -------------------
| 3 | Chicken | 2 |
-----------------------------------
What I want to query from the database is InvoiceId and Total of all Invoices which consist of at least two Meals where at least one of the Meals is of Type Vegetarian. I have the following SQL query and it works:
SELECT
i."InvoiceId", i."Total"
FROM
public."Invoice" i
WHERE
(SELECT COUNT(*)
FROM public."InvoiceMeal" im
WHERE im."InvoiceId" = i."InvoiceId" AND
(SELECT COUNT(*)
FROM public."Meal" m, public."Type" t
WHERE im."MealId" = m."Id" AND
m."TypeId" = t."Id" AND
t."Name" = 'Vegetarian') > 0
) >= 2;
My problem with this query is that I cannot easily modify the condition that there must be at least one vegetarian Meal. I want to be able, for example, to change it to at least two vegetarian meals. How can I achieve this with my query?
I would approach this by joining the tables together and using aggregation. The having clause can handle the conditions:
select i.InvoiceId, i.Total
from InvoiceMeal im join
Invoice i
on i.InvoiceId = im.InvoiceId join
Meal m
on im.MealId = m.Id join
Type t
on m.TypeId = t.Id
group by i.InvoiceId, i.Total
having count(distinct im.MealId) >= 2 and
sum(case when t.Name = 'Vegetarian' then 1 else 0 end) > 0;
I also see no reason to put double quotes around column names. That just makes the query harder to write and read.
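Since both conditions live in the HAVING clause, the thresholds can be turned into parameters. The sketch below shows this with Python's built-in sqlite3 module instead of PostgreSQL. The table and column names follow the tables drawn in the question, and the rows are copied from them as given; the `invoices_with` helper and its `min_meals` and `min_vegetarian` parameters are hypothetical names introduced for illustration.

```python
import sqlite3


def invoices_with(conn, min_meals=2, min_vegetarian=1):
    """Return (InvoiceId, Total) for invoices meeting both thresholds."""
    return conn.execute(
        """
        SELECT i.InvoiceId, i.Total
        FROM InvoiceMeal AS im
        JOIN Invoice AS i ON i.InvoiceId = im.InvoiceId
        JOIN Meal    AS m ON m.Id = im.MealId
        JOIN Type    AS t ON t.Id = m.TypeId
        GROUP BY i.InvoiceId, i.Total
        HAVING COUNT(DISTINCT im.MealId) >= ?
           AND SUM(CASE WHEN t.Name = 'Vegetarian' THEN 1 ELSE 0 END) >= ?
        ORDER BY i.InvoiceId
        """,
        (min_meals, min_vegetarian),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, Total REAL);
    CREATE TABLE InvoiceMeal (Id INTEGER PRIMARY KEY, InvoiceId INTEGER, MealId INTEGER);
    CREATE TABLE Meal (Id INTEGER PRIMARY KEY, Name TEXT, TypeId INTEGER);
    CREATE TABLE Type (Id INTEGER PRIMARY KEY, Name TEXT);
    """
)
# Rows copied from the tables in the question, unchanged.
conn.executemany("INSERT INTO Invoice VALUES (?, ?)", [(1, 22.32), (2, 12.18), (3, 27.76)])
conn.executemany("INSERT INTO InvoiceMeal VALUES (?, ?, ?)", [(1, 1, 3), (2, 1, 2), (3, 2, 2)])
conn.executemany(
    "INSERT INTO Meal VALUES (?, ?, ?)",
    [(1, "Hamburger", 1), (2, "Soja Beans", 2), (3, "Chicken", 2)],
)
conn.executemany("INSERT INTO Type VALUES (?, ?)", [(1, "Meat"), (2, "Vegetarian")])

print(invoices_with(conn))                    # at least 2 meals, at least 1 vegetarian
print(invoices_with(conn, min_vegetarian=2))  # at least 2 meals, at least 2 vegetarian
```

Changing "at least one vegetarian meal" to "at least two" is then only a matter of passing a different value, with no change to the query text.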