How do I find the size of a column in GB? - sql

I have two tables with attachments stored in my database. People from multiple buildings use my application to upload attachments to the database (the files themselves are stored as bytea). I need to figure out the total size (in GB) of the attachments for each building.
SELECT sum(octet_length(attachmentfile))
FROM schema.tattachments a, schema.trequests r, schema.tlocations l
WHERE r.location = l.locationid
AND a.reqid = r.reqid
AND r.sys_actntm > '2014-09-18'
GROUP BY l.building
The above SQL returns one bigint total per building:
sum bigint
--|-----------
1 | 15782159611407981
2 | 1140653769
3 | 710849157667
etc...
How can I format this SQL statement so it will give me the info in GB or MB, rather than these large numbers?

PostgreSQL has system administration functions that will take care of this for you: pg_size_pretty() converts a size in bytes into a human-readable string with units. Something along these lines should work.
SELECT l.building, pg_size_pretty(sum(octet_length(attachmentfile)::bigint))
FROM schema.tattachments a, schema.trequests r, schema.tlocations l
WHERE r.location = l.locationid
AND a.reqid = r.reqid
AND r.sys_actntm > '2014-09-18'
GROUP BY l.building;
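You might be better off replacing octet_length() with pg_column_size(): octet_length() returns the uncompressed length of the value in bytes, whereas pg_column_size() returns the number of bytes actually used to store it, which reflects any compression. I'm not sure how that would affect your query.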
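If you want a fixed unit (always GB, always MB) rather than the unit pg_size_pretty() picks for each row, you can divide the byte total yourself. A minimal sketch, assuming binary units (1 GB = 1024^3 bytes, the same convention pg_size_pretty() uses); the three totals are simply the values from the question's output, wrapped in a hypothetical view named building_sizes so the arithmetic can be run on its own:

```sql
-- building_sizes is a hypothetical view name; the byte totals are the three
-- values from the question's output, used here as sample data.
CREATE VIEW building_sizes AS
WITH totals (building, total_bytes) AS (
    VALUES (1, 15782159611407981),
           (2, 1140653769),
           (3, 710849157667)
)
SELECT building,
       total_bytes,
       round(total_bytes / 1048576.0, 2)    AS size_mb,  -- 1 MB = 1024^2 bytes
       round(total_bytes / 1073741824.0, 2) AS size_gb   -- 1 GB = 1024^3 bytes
FROM totals;

SELECT * FROM building_sizes ORDER BY building;
```

In the real query you would replace total_bytes with sum(octet_length(attachmentfile)) and keep GROUP BY l.building. Dividing by a literal with a decimal point (1073741824.0) avoids integer division, which would otherwise truncate the result.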


BigQuery - Adwords Data Transfer - AccountStats vs AccountBasicStats

Many of the tables come in two versions, such as AccountStats and AccountBasicStats.
The same SQL query can return different values from Stats and BasicStats, for example:
SELECT
cs.Date,
SUM(cs.Impressions) AS Sum_Impressions,
SUM(cs.Clicks) AS Sum_Clicks,
SUM(cs.Interactions) AS Sum_Interactions,
(SUM(cs.Cost) / 1000000) AS Sum_Cost,
SUM(cs.Conversions) AS Sum_Conversions
FROM
`{dataset_id}.Customer_{customer_id}` c
LEFT JOIN
`{dataset_id}.AccountBasicStats_{customer_id}` cs
-- or, for comparison:
-- `{dataset_id}.AccountStats_{customer_id}` cs
ON
c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
c._DATA_DATE = c._LATEST_DATE
AND c.ExternalCustomerId = {customer_id}
GROUP BY
1
ORDER BY
1
The main difference seems to be the ClickType column, which, according to the ClickType documentation, can cause double counting.
BasicStats seems the most accurate and matches the AdWords figures exactly, while Stats shows roughly 2x-3x more impressions.
Is there a way to transform the data so that both queries return the same results?
I ask because there is no BasicStats table for hourly data, which is what I'm interested in.
According to this thread:
https://groups.google.com/forum/#!topic/adwords-api/QiY_RT9aNlM
there seems to be no way to de-segment the data once ClickType is brought in.

Defaulting missing data

I have a complex schema that I am trying to pull data out of for a report. The query joins a bunch of tables together, and I am specifically trying to pull a subset of data where everything might be null. The relations between the tables are as follows:
Location.DeptFK
Dept.PK
Section.DeptFK
Subsection.SectionFK
Question.SubsectionFK
Answer.QuestionFK, SubmissionFK
Submission.PK, LocationFK
From here my problems begin to compound a little.
SELECT CONCAT(Section.StepNumber, '-', Question.QuestionNumber) AS QuestionNumberVar,
Question.Question,
Subsection.Name AS Subsection,
Section.Name AS Section,
SUM(CASE WHEN (Answer.Answer = 0) THEN 1 ELSE 0 END) AS NA,
SUM(CASE WHEN (Answer.Answer = 1) THEN 1 ELSE 0 END) AS AnsNo,
SUM(CASE WHEN (Answer.Answer = 2) THEN 1 ELSE 0 END) AS AnsYes,
(select count(distinct Location.Abbreviation) from Department inner join Location on Location.DepartmentFK = Department.PK WHERE (Department.Name = 'InsertParameter'))
as total
FROM Department inner join
Section on Department.PK = Section.DepartmentFK inner JOIN
Subsection on Subsection.SectionFK = Section.PK INNER JOIN
Question on Question.SubsectionFK = Subsection.PK INNER JOIN
Answer on Answer.QuestionFK = Question.PK inner JOIN
Submission on Submission.PK = Answer.SubmissionFK inner join
Location on Location.DepartmentFK = Department.PK AND Location.PK = Submission.LocationFK
WHERE (Department.Name = 'InsertParameter') AND (Submission.MonthTested = '1/1/2017')
GROUP BY Question.Question, Section.StepNumber, Question.QuestionNumber, Subsection.Name, Section.Name
ORDER BY QuestionNumberVar;
There are 15 locations in total, but this query returns only 12. If I remove the Location/Submission condition from the join, I get all 15 locations, but my answer data gets multiplied by 15. The issue is that not all locations are required to test at the same time, so their answers should default to NA; they don't get records placed in the database, so the relationship between Location and Submission is absent.
I have a workaround almost in place via the select count distinct subquery, but the second part of the report is a query that finds what each location answered instead of a sum, which brings the problem right back around. It also has to be dynamic, because the input parameter for a department won't return a fixed number of locations each time.
I am still learning SQL, so any additional material to look at for building this query would also be appreciated. So I guess the big question here is: how would I go about creating default data in this query whenever the Location/Submission relation is null?
Edit: Dummy Data
QuestionNumberVar | Section | Subsection | Question | AnsYes | AnsNo | NA (expected)
1-1.1 | Math | Algebra | Did you do your homework? | 10 | 1 | 1 (4)
1-1.2 | Math | Algebra | Did your dog eat it? | 9 | 3 | 0 (3)
2-1.1 | English | Greek | Did you do your homework? | 8 | 0 | 4 (7)
I have tried LEFT JOINs at various applicable points in the query, to no avail; none of them had any effect on the output. This query feeds the dataset for an SSRS report. There are a couple of workarounds for this particular section, such as an expression that takes the total number of locations and subtracts AnsYes and AnsNo to get the true NA value, but as explained above that doesn't help with my next query.
Edit: SQL Server 2012 for those who asked
Edit: My attempt at an ISNULL() on the missing data returns nothing, I suspect because the query has already eliminated the "null/missing" data. LEFT JOINing while doing this has also failed. The point of failure is on Submission: if we bind it to Location, there are locations missing, but if we don't bind it, there are multiplied duplicates, because Department has a one-to-many relationship with Location and not vice versa. I am unable to make any schema changes to improve this process.
There is a previous report that I am trying to emulate/update. It used C# logic to process the data and ran multiple queries to obtain the same data. I don't have this luxury (the previous report exports directly to Excel instead of SSRS). Here is the previous logic used:
select PK from Department where Name = 'InsertParameter';
select PK from Submission where LocationFK = 'Location.PK_var' and MonthTested = '1/1/2017'
It then loops over those results and converts nulls to NA in C#.
EDIT (mediocre solution): I ended up using the workaround of a calculated field that subtracts Yes and No from the total number of locations in that department. This is a mediocre solution because I didn't solve my original problem, and I made three datasets that should have been displayed as a single dataset: one for question info, one for each location's answers, and one for locations that didn't participate. If a true answer comes up I will check its validity, but for now the problem is pseudo-solved.

How to make a query to obtain only results that have N number within a range of values?

I'm trying to extract nutrient data in MS Access 2007 from the USDA food database, freely available at http://www.ars.usda.gov/Services/docs.htm?docid=24912
I need records that have ALL of the nutrients whose NUT_DATA.Nutr_No values are between '501' and '511', and I want to exclude incomplete records that have missing values.
For example, Baby food banana has all nutrients from 501 to 511, but Baby food Beverage has only 9 of them, and many other foods are like that.
As a last resort, it would be acceptable to return all records, showing null for missing values, as long as each FOOD_DES.Long_Desc has exactly 11 records, one for each NUT_DATA.Nutr_No (or, equivalently, each NUTR_DEF.NutrDesc).
SELECT
FOOD_DES.NDB_No, FOOD_DES.FdGrp_Cd, FOOD_DES.Long_Desc, NUT_DATA.Nutr_No, NUTR_DEF.NutrDesc, NUT_DATA.Nutr_Val, WEIGHT.Amount, WEIGHT.Msre_Desc, WEIGHT.Gm_Wgt, [WEIGHT]![Amount] & " " & [WEIGHT]![Msre_Desc] AS msre
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE
(NUT_DATA.Nutr_No between '501' and '511' ) and ((WEIGHT.Seq)="1") and NUT_DATA.Nutr_Val > '0' and
// this part is me out of ideas trying stuff, but didn't help
EXISTS (SELECT 1
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE count FOOD_DES.Long_Desc = "11" )
// end of wild experimentation
ORDER BY FOOD_DES.Long_Desc, NUTR_DEF.SR_Order;
This is a sample of the data; I copied only the most important columns. The rows in red are not what I'm looking for because they don't have all 11 nutrients. I can paste the whole table into the Google doc if someone thinks that would help.
https://docs.google.com/spreadsheets/d/1FghDD59wy2PYlpsqUlYVc3Ulwvy4MMLagpBUYtvLBfI/edit?usp=sharing
As your starting point, identify which food items have values > 0 for all 11 of those nutrients. Check whether this simpler GROUP BY query shows you the correct items:
SELECT ndat.NDB_No
FROM
NUT_DATA AS ndat
INNER JOIN WEIGHT AS wt
ON ndat.NDB_No = wt.NDB_No
WHERE
ndat.Nutr_Val>0
AND ndat.Nutr_No IN('501','502','503','504','505','506','507','508','509','510','511')
AND wt.Seq='1'
GROUP BY ndat.NDB_No
HAVING Count(ndat.Nutr_No)=11;
Note that you could use Val(ndat.Nutr_No) Between 501 And 511 as the Nutr_No restriction, which would give you a more concise statement. However, evaluating Val() for every row of the table means that approach would forgo the performance benefit of indexed retrieval, so that version of the query should be noticeably slower.
Save that query and create a new query which joins it to the base tables for the additional data you need from other columns. Or use it as a subquery instead of a named query if you prefer.
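Here is a sketch of that two-step approach. It assumes the tables are keyed as in the sample below (one NUT_DATA row per NDB_No and Nutr_No, one WEIGHT row per NDB_No and Seq), which is what makes Count(...) = 11 mean "all 11 nutrients present". CREATE VIEW stands in for saving a named query in Access; CompleteFoods and CompleteFoodNutrients are hypothetical names, and the sample rows are made up. The setup uses recursive CTEs, which Access does not support, so only the two SELECT bodies carry over to Access; their parenthesized joins follow Access's syntax and also run in SQLite:

```sql
-- Minimal stand-ins for the USDA tables (only the columns used here; rows are made up).
CREATE TABLE FOOD_DES (NDB_No TEXT PRIMARY KEY, Long_Desc TEXT);
CREATE TABLE NUTR_DEF (Nutr_No TEXT PRIMARY KEY, NutrDesc TEXT, SR_Order INTEGER);
CREATE TABLE NUT_DATA (NDB_No TEXT, Nutr_No TEXT, Nutr_Val REAL, PRIMARY KEY (NDB_No, Nutr_No));
CREATE TABLE WEIGHT   (NDB_No TEXT, Seq TEXT, Amount REAL, Msre_Desc TEXT, PRIMARY KEY (NDB_No, Seq));

INSERT INTO FOOD_DES VALUES ('90001', 'Sample food, complete'), ('90002', 'Sample food, incomplete');
INSERT INTO WEIGHT   VALUES ('90001', '1', 1, 'cup'), ('90002', '1', 1, 'cup');

-- Nutrients 501..511 (placeholder descriptions); food 90001 gets all 11, food 90002 only 9.
WITH RECURSIVE n(x) AS (SELECT 501 UNION ALL SELECT x + 1 FROM n WHERE x < 511)
INSERT INTO NUTR_DEF SELECT CAST(x AS TEXT), 'Nutrient ' || x, x FROM n;
WITH RECURSIVE n(x) AS (SELECT 501 UNION ALL SELECT x + 1 FROM n WHERE x < 511)
INSERT INTO NUT_DATA
SELECT '90001', CAST(x AS TEXT), 0.5 FROM n
UNION ALL
SELECT '90002', CAST(x AS TEXT), 0.5 FROM n WHERE x <= 509;

-- Step 1: the GROUP BY query from the answer (in Access, save it as a named query).
CREATE VIEW CompleteFoods AS
SELECT ndat.NDB_No
FROM NUT_DATA AS ndat
INNER JOIN WEIGHT AS wt ON ndat.NDB_No = wt.NDB_No
WHERE ndat.Nutr_Val > 0
  AND ndat.Nutr_No IN ('501','502','503','504','505','506','507','508','509','510','511')
  AND wt.Seq = '1'
GROUP BY ndat.NDB_No
HAVING Count(ndat.Nutr_No) = 11;

-- Step 2: join the saved query back to the base tables for the detail columns.
CREATE VIEW CompleteFoodNutrients AS
SELECT fd.NDB_No, fd.Long_Desc, nd.Nutr_No, ndef.NutrDesc, ndef.SR_Order, nd.Nutr_Val
FROM ((CompleteFoods AS cf
  INNER JOIN FOOD_DES AS fd ON fd.NDB_No = cf.NDB_No)
  INNER JOIN NUT_DATA AS nd ON nd.NDB_No = cf.NDB_No)
  INNER JOIN NUTR_DEF AS ndef ON ndef.Nutr_No = nd.Nutr_No
WHERE nd.Nutr_No IN ('501','502','503','504','505','506','507','508','509','510','511');

SELECT * FROM CompleteFoodNutrients ORDER BY Long_Desc, SR_Order;
```

WEIGHT can be joined into step 2 the same way (with wt.Seq = '1') if you also need the measure columns.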

I am trying to write a SQL nested query that finds/uses a max value to find the entry just before the max value

I am fairly new to SQL and am trying to write a query that finds the last time a water meter was read so I can see the value. There is a table of properties that have meters and another table of meters that stores the inputs from engineers. Every input is stored with a sequence number, a keyword giving the type of input, and an expression holding the engineer's entry. The max sequence will not always be the answer.
What I am looking for is the last time they read the water meter, and then also the electricity value from that reading, which is stored in the previous entry (sequence). To make it harder, engineers input the sequence number themselves, and some go by ones (1, 2, 3) while others go by twos (2, 4, 6), so the previous entry may be minus one or minus two.
I can write a query to find the max sequence and another one to find the entry one or two before it, but I cannot figure out how to combine them into one query.
To find the max sequence for site 12345, I have:
SELECT MAX(M.SEQUENCE) maxseq
FROM METERS M JOIN PROPERTY P ON M.PROPNUM = P.PROPNUM
WHERE (P.CORP_ID ='12345' AND M.KEYWORD = 'WTR')
I manually search for the entry before to get the electricity entry with the following query.
SELECT P.NAME, P.CORP_ID, M.KEYWORD, M.SEQUENCE, M.EXPRESSION
FROM METERS M JOIN PROPERTY P ON M.PROPNUM = P.PROPNUM
WHERE (P.CORP_ID ='12345')
ORDER BY M.SEQUENCE
I have tried different nested queries but have not been able to write anything that will work.
The data that I am interested in for the meters table looks like:
PROPNUM SEQUENCE KEYWORD EXPRESSION
10a124 95 ELC 9845
10a124 96 WTR 4521
10a124 97 SVC A105
10a124 98 HEALTH GOOD
10a124 99 DAY 150209
10a124 100 HEALTH GOOD
10a124 101 ELC 10283
10a124 102 WTR 4621
I use the property table to find the PROPNUM for the site I am interested in, as I have the site's ID (CORP_ID) but not its PROPNUM value.
The result I would like to get back would look like below.
NAME WTR_EXPRESSION ELC_EXPRESSION
SMITH 4621 10283
You can inner join the METERS table to the PROPERTY table once for each KEYWORD, and specify that the SEQUENCE for 'ELC' (guessing at the KEYWORD value) is less than the 'WTR' SEQUENCE. Because every ELC row is paired only with WTR rows that come after it, max(e.SEQUENCE) is then the latest ELC entry before the latest WTR entry. Since you are on SQL Server, you can do this in a CTE and inner join that data set to the METERS table to display the EXPRESSION values for each KEYWORD in a single row:
;with wtr_elc as (
select
p.PROPNUM,
p.NAME,
max(w.SEQUENCE) as max_wtr_seq,
max(e.SEQUENCE) as max_elc_seq
from PROPERTY as p
inner join METERS as w
on w.PROPNUM = p.PROPNUM
and w.KEYWORD = 'WTR'
inner join METERS as e
on e.PROPNUM = p.PROPNUM
and e.KEYWORD = 'ELC'
and e.SEQUENCE < w.SEQUENCE
where p.CORP_ID ='12345'
group by
p.PROPNUM,
p.NAME)
select
wtr_elc.NAME,
wtr.EXPRESSION as WTR_EXPRESSION,
elc.EXPRESSION as ELC_EXPRESSION
from METERS as wtr
inner join wtr_elc
on wtr_elc.PROPNUM = wtr.PROPNUM
and wtr_elc.max_wtr_seq = wtr.SEQUENCE
inner join METERS elc
on wtr_elc.PROPNUM = elc.PROPNUM
and wtr_elc.max_elc_seq = elc.SEQUENCE
and elc.KEYWORD = 'ELC'
where wtr.KEYWORD = 'WTR'
If you want to do this for more or all PROPERTY records, you can modify the where clause in the CTE.
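On SQL Server 2012 or later, window functions offer another route that copes with the 1-or-2 gap in SEQUENCE without any arithmetic: keep only the WTR and ELC rows, use LAG() to look at the previous one, and use ROW_NUMBER() to pick the latest WTR reading. A minimal sketch, assuming every water reading is immediately preceded (among WTR/ELC rows) by its electricity reading; a WTR reading where that is not the case is dropped, rather than matched to an older ELC value as the CTE above would do. LatestReadings is a hypothetical view name, the sample rows are copied from the question, and the script runs as-is in SQLite 3.25+ (the release that added window functions):

```sql
-- Sample data copied from the question; the PROPERTY row is hypothetical.
CREATE TABLE PROPERTY (PROPNUM VARCHAR(10), CORP_ID VARCHAR(10), NAME VARCHAR(50));
CREATE TABLE METERS (PROPNUM VARCHAR(10), SEQUENCE INTEGER, KEYWORD VARCHAR(10), EXPRESSION VARCHAR(20));

INSERT INTO PROPERTY VALUES ('10a124', '12345', 'SMITH');
INSERT INTO METERS VALUES
  ('10a124', 95, 'ELC', '9845'),     ('10a124', 96, 'WTR', '4521'),
  ('10a124', 97, 'SVC', 'A105'),     ('10a124', 98, 'HEALTH', 'GOOD'),
  ('10a124', 99, 'DAY', '150209'),   ('10a124', 100, 'HEALTH', 'GOOD'),
  ('10a124', 101, 'ELC', '10283'),   ('10a124', 102, 'WTR', '4621');

-- LatestReadings (hypothetical name): latest WTR reading per property plus the ELC just before it.
CREATE VIEW LatestReadings AS
WITH readings AS (
    SELECT m.PROPNUM, m.SEQUENCE, m.KEYWORD, m.EXPRESSION,
           -- previous WTR/ELC row for the same property, whatever the SEQUENCE gap
           LAG(m.KEYWORD)    OVER (PARTITION BY m.PROPNUM ORDER BY m.SEQUENCE) AS prev_keyword,
           LAG(m.EXPRESSION) OVER (PARTITION BY m.PROPNUM ORDER BY m.SEQUENCE) AS prev_expression,
           -- 1 = most recent row of each KEYWORD per property
           ROW_NUMBER() OVER (PARTITION BY m.PROPNUM, m.KEYWORD ORDER BY m.SEQUENCE DESC) AS rn
    FROM METERS m
    WHERE m.KEYWORD IN ('WTR', 'ELC')   -- applied before the window functions run
)
SELECT p.CORP_ID, p.NAME, r.EXPRESSION AS WTR_EXPRESSION, r.prev_expression AS ELC_EXPRESSION
FROM readings r
JOIN PROPERTY p ON p.PROPNUM = r.PROPNUM
WHERE r.KEYWORD = 'WTR'
  AND r.rn = 1
  AND r.prev_keyword = 'ELC';

SELECT NAME, WTR_EXPRESSION, ELC_EXPRESSION FROM LatestReadings WHERE CORP_ID = '12345';
```

Dropping the CORP_ID filter in the last statement returns one row per property.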

sql join within join?

I need your help building a SQL statement I can't wrap my head around.
In the database, I have four tables: files, folders, folder_files and links.
I have many files. One of them is called "myFile.txt".
I have many folders. "myFile.txt" is in some of them. The first folder it appears in is called "firstFolder".
I have many links to many folders. The first link to "firstFolder" is called "firstLink".
The data structure for the example would be:
// files
Id: 10
Name: "myFile.txt"
// folders
Id: 20
Name: "firstFolder"
// folder_files (join table)
Id: 30
Folder_Id: 20 (meaning "firstFolder")
File_Id: 10 (meaning "myFile.txt")
// links
Id: 40
Name: "firstLink"
Folder_Id: 20 (meaning "firstFolder")
FIRST QUESTION: How do I get the record for "myFile.txt" AND the Name and Id of "firstLink" (the first link), querying on file Id = 10, based on the lowest Id of the folder and the link?
SECOND QUESTION: How do I get the record for "myFile.txt" AND the Name and Id of "firstLink" (the first link), querying on all files, based on the lowest Id of the folder and the link?
Put another way: how do I get the first link to the first folder containing "myFile.txt"?
Resulting in a record that looks like:
Id: 10
Name: "myFile.txt"
LinkId: 40
LinkName: "firstLink"
Thanks!
You should try to think about how you want your result set to look. SQL is designed to describe result sets. If you can write out a hypothetical result set, you might have an easier time writing SQL that will render that result set.
I had a hard time understanding what you are looking for, but I'm sure it's a fairly straightforward problem. I could help you more easily if you described your results more clearly, although you might not need my help anymore!
For example (going with your original schema) Q1 & Q2:
files.Id, files.Name, links.Id, links.Name (4 columns)
Q1:
SELECT
files.Id, files.Name, links.Id AS LinkId, links.Name AS LinkName
FROM
files
INNER JOIN
folder_files
ON files.Id = folder_files.File_Id
INNER JOIN
links
ON links.Folder_Id = folder_files.Folder_Id
WHERE
files.Id = 10
ORDER BY
folder_files.Folder_Id ASC, links.Id ASC
LIMIT 1;
(JOIN with folders table not necessary)
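Q2:
Remove the WHERE clause and the LIMIT to list the links for every file in that order; keeping only the first row per file then needs a per-file "first row" technique, such as the user-variable query below.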
This selects all links for file id 10:
select links.id, links.name
from files
left join folder_files on files.id = folder_files.file_id
left join folders on folder_files.folder_id = folders.id
left join links on links.folder_id = folders.id
where files.id=10;
Change the where clause, add limit or whatever for other things you want. It should be simple to modify this.
I would try this:
select f.*
, l.Id as LinkId
, l.Name as LinkName
from links l
inner join folder_files ff on ff.Folder_Id = l.Folder_Id
inner join files f on f.Id = ff.File_Id
where f.Id = 10
Resulting in:
Id | Name | LinkId | LinkName
10 | myFile.txt | 40 | firstLink
Is this what you want?
Taking into account:
more folders per file
more links per folder
taking the lowest id folder for link, and lowest id link for folder
With help of: mysql: group by ID, get highest priority per each ID
The answer for ALL files in the files table (go for JohnB's solution for a single file; it would be faster):
SELECT file_id, file_name, link_id, link_name FROM (
SELECT file_id, file_name, link_id, link_name,
@r := CASE WHEN @prev_file_id = file_id
THEN @r + 1
ELSE 1
END AS r,
@prev_file_id := file_id
FROM (
SELECT
f.id as file_id, f.name as file_name, l.id as link_id, l.name as link_name
FROM files f
JOIN folder_files ff
ON ff.file_id = f.id
JOIN links l
ON l.folder_id = ff.folder_id
ORDER BY f.id, ff.folder_id, l.id -- rows grouped per file: first folder first, then first link to that folder
) derived1,
(SELECT @prev_file_id := NULL, @r := 0) vars
) derived2
WHERE r = 1;
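On MySQL 8.0 or later, ROW_NUMBER() does the same job without user variables; MySQL documents the order of evaluation of expressions involving user variables as undefined, so the variable trick is fragile. A minimal sketch; first_links is a hypothetical view name, and the sample rows extend the question's example with one extra file, folder and two extra links. It runs as-is in SQLite 3.25+ and should work in MySQL 8.0:

```sql
-- Sample data: the question's example plus a hypothetical second file, folder and links.
CREATE TABLE files        (Id INTEGER PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE folders      (Id INTEGER PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE folder_files (Id INTEGER PRIMARY KEY, Folder_Id INTEGER, File_Id INTEGER);
CREATE TABLE links        (Id INTEGER PRIMARY KEY, Name VARCHAR(100), Folder_Id INTEGER);

INSERT INTO files        VALUES (10, 'myFile.txt'), (11, 'otherFile.txt');
INSERT INTO folders      VALUES (20, 'firstFolder'), (21, 'secondFolder');
INSERT INTO folder_files VALUES (30, 20, 10), (31, 21, 10), (32, 21, 11);
INSERT INTO links        VALUES (40, 'firstLink', 20), (41, 'secondLink', 20), (42, 'thirdLink', 21);

-- first_links (hypothetical name): the first link to the first folder, per file.
CREATE VIEW first_links AS
SELECT Id, Name, LinkId, LinkName
FROM (
    SELECT f.Id AS Id, f.Name AS Name, l.Id AS LinkId, l.Name AS LinkName,
           -- 1 = lowest folder Id, then lowest link Id, numbered separately for each file
           ROW_NUMBER() OVER (PARTITION BY f.Id ORDER BY ff.Folder_Id, l.Id) AS rn
    FROM files f
    JOIN folder_files ff ON ff.File_Id = f.Id
    JOIN links l         ON l.Folder_Id = ff.Folder_Id
) ranked
WHERE rn = 1;

-- Q2: all files. For Q1, add WHERE Id = 10.
SELECT * FROM first_links ORDER BY Id;
```

Files with no folder, or whose folders have no links, drop out because of the inner joins; switch to LEFT JOIN if they should appear with NULL link columns.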