I need help creating some kind of function/method to count how many files are in a particular folder. Here is a sample data set I am using:
FullPath | Type | Age (Years)
Computer\User01\MyDocuments\ | Folder | 4
Computer\User01\MyDocuments\thisisafile.xlsx | File | 2.2
Computer\User01\MyDocuments\anotherfile.doc | File | 1
Computer\User01\MyDocuments\onemorefile.doc | File | 1.5
Computer\User01\MyDocuments\secondfile.pptx | File | 1.6
As you can see from the sample data set, the folder "Computer\User01\MyDocuments\" contains 4 files. I could write the following query to show how many files are in this folder:
SELECT COUNT(*) AS No_of_files
FROM SampleDataSet
WHERE Type = 'File'
AND FullPath LIKE 'Computer\User01\MyDocuments\%'
However, my data set contains hundreds of thousands of folders, each with a different number of files, so I can't specify the "FullPath" in the LIKE clause each time.
My desired output looks like this:
FullPath | Type | No_of_files
Computer\User01\MyDocuments\ | Folder | 1500
Computer\User01\Pictures\ | Folder | 20
Computer\User01\Desktop\ | Folder | 14
Computer\User01\Downloads\ | Folder | 10 000
Does anyone know if this is possible and if there's an efficient way of doing this?
Any help would be much appreciated, thanks!
It seems you could join the table to itself to achieve this. Something like this:
SELECT D.FullPath,
D.[Type],
COUNT(F.FullPath) AS Files
FROM dbo.YourTable D
LEFT JOIN dbo.YourTable F ON F.FullPath LIKE D.FullPath + '%'
AND F.[Type] = 'File'
WHERE D.[Type] = 'Folder'
GROUP BY D.FullPath,
D.[Type];
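To try the self-join outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module with the sample rows from the question. Assumptions: the table name SampleDataSet, the column name AgeYears, and the extra empty "Pictures" folder row are hypothetical; SQLite concatenates strings with || where T-SQL uses +. Note that LIKE D.FullPath || '%' also matches files in nested subfolders, so each folder's count includes everything beneath it, not only its direct children.

```python
import sqlite3

# In-memory stand-in for the question's table (hypothetical name SampleDataSet).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SampleDataSet (FullPath TEXT, Type TEXT, AgeYears REAL)")
rows = [
    ("Computer\\User01\\MyDocuments\\", "Folder", 4),
    ("Computer\\User01\\MyDocuments\\thisisafile.xlsx", "File", 2.2),
    ("Computer\\User01\\MyDocuments\\anotherfile.doc", "File", 1),
    ("Computer\\User01\\MyDocuments\\onemorefile.doc", "File", 1.5),
    ("Computer\\User01\\MyDocuments\\secondfile.pptx", "File", 1.6),
    # Hypothetical empty folder, to show that the LEFT JOIN keeps it with a count of 0.
    ("Computer\\User01\\Pictures\\", "Folder", 3),
]
conn.executemany("INSERT INTO SampleDataSet VALUES (?, ?, ?)", rows)

# Each folder row D is matched with every file row F whose path starts with D's path.
# COUNT(F.FullPath) ignores the NULLs produced for folders without files.
query = """
SELECT D.FullPath, D.Type, COUNT(F.FullPath) AS No_of_files
FROM SampleDataSet D
LEFT JOIN SampleDataSet F
  ON F.FullPath LIKE D.FullPath || '%'
 AND F.Type = 'File'
WHERE D.Type = 'Folder'
GROUP BY D.FullPath, D.Type
ORDER BY D.FullPath
"""
result = conn.execute(query).fetchall()
print(result)
```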
I have uploaded a file containing a list of cities as the table [Sandbox].[dbo].[Cities for Special Project]. I am trying to find out how much revenue we are collecting from the cities in my list.
I can find out how much revenue we collect from each city that's already in my DB, but I am unsure how to match that to the file that I uploaded.
Select individual.[vchCity] as City
, sum(staging.[Price_PerLicense]) as Total
From Engine1.DB1.[dbo].[Individual] individual
Join
Engine2.DB2.[dbo].[Daily_License_Report_Detail] staging on individual.[iIndividualId] = staging.[OwnerId]
Group by individual.[vchCity]
How would I match what I have to the cities that are in my uploaded file?
with CityTotal as(
Select individual.[vchCity] as City, sum(staging.[Price_PerLicense]) as Total
From Engine1.DB1.[dbo].[Individual] individual
Join Engine2.DB2.[dbo].[Daily_License_Report_Detail] staging on individual.[iIndividualId] = staging.[OwnerId]
Group by individual.[vchCity]
)
Select cs.city, ct.total
from [Sandbox].[dbo].[Cities for Special Project] cs
left join CityTotal ct on ct.city = cs.city
This is one way to do it.
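Here is a minimal sketch of the same CTE plus LEFT JOIN pattern in SQLite via Python's sqlite3 module, assuming simplified tables: the cross-database prefixes (Engine1.DB1, Engine2.DB2, Sandbox) are dropped, the uploaded table is renamed to a hypothetical CitiesForSpecialProject, and all the rows are made up. Cities in the uploaded list with no revenue come back with a NULL total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-ins for the question's tables.
conn.executescript("""
CREATE TABLE Individual (iIndividualId INTEGER, vchCity TEXT);
CREATE TABLE Daily_License_Report_Detail (OwnerId INTEGER, Price_PerLicense REAL);
CREATE TABLE CitiesForSpecialProject (city TEXT);
INSERT INTO Individual VALUES (1, 'Austin'), (2, 'Austin'), (3, 'Denver');
INSERT INTO Daily_License_Report_Detail VALUES (1, 10.0), (2, 5.0), (3, 7.5);
INSERT INTO CitiesForSpecialProject VALUES ('Austin'), ('Boston');
""")

query = """
WITH CityTotal AS (
    -- revenue per city from the existing tables
    SELECT individual.vchCity AS City, SUM(staging.Price_PerLicense) AS Total
    FROM Individual individual
    JOIN Daily_License_Report_Detail staging ON individual.iIndividualId = staging.OwnerId
    GROUP BY individual.vchCity
)
-- keep every uploaded city; cities without revenue get a NULL total
SELECT cs.city, ct.Total
FROM CitiesForSpecialProject cs
LEFT JOIN CityTotal ct ON ct.City = cs.city
ORDER BY cs.city
"""
result = conn.execute(query).fetchall()
print(result)
```

Denver is not in the uploaded list, so it does not appear. Wrapping the total as COALESCE(ct.Total, 0) would show 0 instead of NULL for cities like Boston.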
I have tables named files and folders, with ids and the usual relation between them: files.folder_id = folders.id
Additionally, some files/folders can be marked as ignored by the field ignored.
I'm trying to get a list of folders and the count of files in each folder.
My first approach worked fine but missed empty folders:
SELECT folders.id, folders.name, count(files.id) kount
FROM folders, files
WHERE folders.site_id=111
AND files.ignored=0
AND folders.ignored=0
AND files.site_id=111
AND files.folder_id=folders.id
GROUP BY folders.name
ORDER BY folders.name
So I tried a LEFT JOIN:
SELECT folders.id, folders.name, count(files.id) kount
FROM folders
LEFT JOIN files
ON files.folder_id=folders.id
WHERE folders.site_id=111
AND folders.ignored=0
AND files.ignored=0
AND files.site_id=111
GROUP BY folders.name
ORDER BY folders.name
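But again, empty folders are missing. What am I doing wrong?
You need to put the conditions that filter the joined table (files) directly into the LEFT JOIN's ON clause. In the WHERE clause they are evaluated after the join: for an empty folder the files columns are NULL, so files.ignored=0 is not true and the folder row is discarded, which effectively turns the LEFT JOIN back into an inner join.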
SELECT folders.id, folders.name, count(files.id) kount
FROM folders
LEFT JOIN files ON files.folder_id=folders.id
AND files.ignored=0
AND files.site_id=111
WHERE folders.site_id=111
AND folders.ignored=0
GROUP BY folders.id, folders.name
ORDER BY folders.name
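The difference between filtering in WHERE and filtering in ON can be checked with a small sketch in SQLite through Python's sqlite3 module. The rows are hypothetical: one folder "docs" with two counted files and one ignored file, and one folder "empty" with no files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folders (id INTEGER, name TEXT, site_id INTEGER, ignored INTEGER);
CREATE TABLE files (id INTEGER, folder_id INTEGER, site_id INTEGER, ignored INTEGER);
INSERT INTO folders VALUES (1, 'docs', 111, 0), (2, 'empty', 111, 0);
INSERT INTO files VALUES (10, 1, 111, 0), (11, 1, 111, 0), (12, 1, 111, 1);
""")

# File filters in WHERE: for 'empty' the files columns are NULL,
# so files.ignored = 0 is not true and that folder row is discarded.
in_where = conn.execute("""
SELECT folders.id, folders.name, COUNT(files.id) AS kount
FROM folders
LEFT JOIN files ON files.folder_id = folders.id
WHERE folders.site_id = 111 AND folders.ignored = 0
  AND files.ignored = 0 AND files.site_id = 111
GROUP BY folders.id, folders.name
ORDER BY folders.name
""").fetchall()

# File filters in ON: they only decide which files match; every folder row survives.
in_on = conn.execute("""
SELECT folders.id, folders.name, COUNT(files.id) AS kount
FROM folders
LEFT JOIN files ON files.folder_id = folders.id
 AND files.ignored = 0 AND files.site_id = 111
WHERE folders.site_id = 111 AND folders.ignored = 0
GROUP BY folders.id, folders.name
ORDER BY folders.name
""").fetchall()

print(in_where)  # [(1, 'docs', 2)]
print(in_on)     # [(1, 'docs', 2), (2, 'empty', 0)]
```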
The most common way of getting the content of specific files in bigquery-public-data:github_repos by name is like this:
SELECT *
FROM [bigquery-public-data:github_repos.sample_contents]
WHERE id IN (SELECT id FROM (
SELECT *
FROM [bigquery-public-data:github_repos.sample_files]
WHERE path = 'README.md'
))
This query gives me 14557 results.
I thought that running the query below would give me the same number of results:
SELECT contents.*
FROM [bigquery-public-data:github_repos.sample_contents] contents
INNER JOIN [bigquery-public-data:github_repos.sample_files] files
ON contents.id = files.id
WHERE files.path = 'README.md'
But it ends up with 14645 results.
Why is there a difference between these two results, and which one is the proper one for selecting the content of README.md files?
EDIT:
It looks like forked files without modification have the same id across other repos (forks).
The first query gives you every content row whose id belongs to a file with path = 'README.md', no matter how many times that file id is present in the files table.
The second query gives you the same content row as many times as the respective file id appears in the files table, because of the JOIN.
You can run the query below to validate this:
SELECT EXACT_COUNT_DISTINCT(contents.id)
FROM [bigquery-public-data:github_repos.sample_contents] contents
INNER JOIN [bigquery-public-data:github_repos.sample_files] files
ON contents.id = files.id
WHERE files.path = 'README.md'
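The duplication can be reproduced on a toy scale with SQLite via Python's sqlite3 module. The two tables here are hypothetical, heavily simplified versions of sample_contents and sample_files, and COUNT(DISTINCT ...) stands in for BigQuery legacy SQL's EXACT_COUNT_DISTINCT. One README blob ('abc') is shared by an original repo and an unmodified fork.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample_contents (id TEXT, content TEXT);
CREATE TABLE sample_files (id TEXT, repo_name TEXT, path TEXT);
-- 'abc' is the same README blob in an original repo and an unmodified fork
INSERT INTO sample_contents VALUES ('abc', '# Hello'), ('def', '# Other');
INSERT INTO sample_files VALUES
  ('abc', 'alice/proj', 'README.md'),
  ('abc', 'bob/proj-fork', 'README.md'),
  ('def', 'carol/tool', 'README.md');
""")

# IN: each content row is returned at most once.
with_in = conn.execute("""
SELECT * FROM sample_contents
WHERE id IN (SELECT id FROM sample_files WHERE path = 'README.md')
""").fetchall()

# JOIN: a content row is repeated once per matching files row.
with_join = conn.execute("""
SELECT contents.* FROM sample_contents contents
JOIN sample_files files ON contents.id = files.id
WHERE files.path = 'README.md'
""").fetchall()

# Counting distinct ids after the JOIN gives the IN query's row count back.
distinct_ids = conn.execute("""
SELECT COUNT(DISTINCT contents.id) FROM sample_contents contents
JOIN sample_files files ON contents.id = files.id
WHERE files.path = 'README.md'
""").fetchone()[0]

print(len(with_in), len(with_join), distinct_ids)  # 2 3 2
```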
I need your help building a SQL statement I can't wrap my head around.
In a database, I have four tables: files, folders, folder_files and links.
I have many files. One of them is called "myFile.txt".
I have many folders. "myFile.txt" is in some of them. The first folder it appears in is called "firstFolder".
I have many links to many folders. The first link to "firstFolder" is called "firstLink".
The data structure for the example would be:
// files
Id: 10
Name: "myFile.txt"
// folders
Id: 20
Name: "firstFolder"
// folder_files (join table)
Id: 30
Folder_Id: 20 (meaning "firstFolder")
File_Id: 10 (meaning "myFile.txt")
// links
Id: 40
Name: "firstLink"
Folder_Id: 20 (meaning "firstFolder")
FIRST QUESTION: How do I get the record for "myFile.txt" AND the Name and Id of "firstLink" (the first link), querying on file Id = 10, based on the lowest Id of the folder and the link?
SECOND QUESTION: How do I get the record for "myFile.txt" AND the Name and Id of "firstLink" (the first link), querying on all files, based on the lowest Id of the folder and the link?
Put another way: how do I get the first link to the first folder containing "myFile.txt"?
The result should be a record that looks like:
Id: 10
Name: "myFile.txt"
LinkId: 40
LinkName: "firstLink"
Thanks!
You should try to think about how you want your result set to look. SQL is designed to describe result sets. If you can write out a hypothetical result set, you might have an easier time writing SQL that will render that result set.
I had a hard time understanding what you are looking for, but I'm sure it's a fairly straightforward problem. I could help you more easily if you described your results more clearly, although you might not need my help anymore!
For example (going with your original schema), the result set for Q1 and Q2 would be:
files.Id, files.Name, links.Id, links.Name (4 columns)
Q1:
SELECT
files.Id, files.Name, links.Id AS LinkId, links.Name AS LinkName
FROM
files
INNER JOIN
folder_files
ON files.Id = folder_files.File_Id
INNER JOIN
links
ON links.Folder_Id = folder_files.Folder_Id
WHERE
files.Id = 10
ORDER BY
folder_files.Folder_Id ASC, links.Id ASC
LIMIT 1;
(JOIN with folders table not necessary)
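A quick way to check the Q1 query is to run it in SQLite through Python's sqlite3 module. The schema follows the question; the second folder (21) and the extra links (41, 42) are hypothetical rows added only so that the ORDER BY ... LIMIT 1 has something to choose between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (Id INTEGER, Name TEXT);
CREATE TABLE folders (Id INTEGER, Name TEXT);
CREATE TABLE folder_files (Id INTEGER, Folder_Id INTEGER, File_Id INTEGER);
CREATE TABLE links (Id INTEGER, Name TEXT, Folder_Id INTEGER);
INSERT INTO files VALUES (10, 'myFile.txt');
INSERT INTO folders VALUES (20, 'firstFolder'), (21, 'secondFolder');
INSERT INTO folder_files VALUES (30, 20, 10), (31, 21, 10);
INSERT INTO links VALUES (40, 'firstLink', 20), (41, 'secondLink', 20), (42, 'otherLink', 21);
""")

# Lowest folder id first, then lowest link id within that folder; LIMIT 1 keeps that row.
row = conn.execute("""
SELECT files.Id, files.Name, links.Id AS LinkId, links.Name AS LinkName
FROM files
JOIN folder_files ON folder_files.File_Id = files.Id
JOIN links ON links.Folder_Id = folder_files.Folder_Id
WHERE files.Id = ?
ORDER BY folder_files.Folder_Id, links.Id
LIMIT 1
""", (10,)).fetchone()
print(row)  # (10, 'myFile.txt', 40, 'firstLink')
```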
Q2:
Change both ASC to DESC
This selects all links for file id 10:
select links.id, links.name
from files
left join folder_files on files.id = folder_files.file_id
left join folders on folder_files.folder_id = folders.id
left join links on links.folder_id = folders.id
where files.id=10;
Change the WHERE clause, add a LIMIT, or whatever else you need for other cases. It should be simple to modify this.
I would try this:
select f.*
, l.Id as LinkId
, l.Name as LinkName
from links l
inner join folder_files ff on ff.Folder_Id = l.Folder_Id
inner join files f on f.Id = ff.File_Id
where f.Id = 10
Resulting in:
Id | Name | LinkId | LinkName
10 | myFile.txt | 40 | firstLink
Is this what you want?
Taking into account:
multiple folders per file
multiple links per folder
taking the lowest-id folder for each file, and the lowest-id link for that folder
With help from: "mysql: group by ID, get highest priority per each ID"
The answer for ALL files in the files table (go for JohnB's solution for a single file; it would be faster):
SELECT file_id, file_name, link_id, link_name FROM (
SELECT file_id, file_name, link_id, link_name,
@r := CASE WHEN @prev_file_id = file_id
THEN @r + 1
ELSE 1
END AS r,
@prev_file_id := file_id
FROM (
SELECT
f.id as file_id, f.name as file_name, l.id as link_id, l.name as link_name
FROM files f
JOIN folder_files ff
ON ff.file_id = f.id
JOIN links l
ON l.folder_id = ff.folder_id
ORDER BY f.id, ff.folder_id, l.id -- rows grouped per file; first folder first, first link to first folder second
) derived1,
(SELECT @prev_file_id := NULL, @r := 0) vars
) derived2
WHERE r = 1;
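On databases with window functions (MySQL 8.0+, or SQLite 3.25+), the same "first link to the first folder, per file" result can be written with ROW_NUMBER() instead of user variables. This is a swapped-in technique, not the answer's original method; the sketch below runs it in SQLite via Python's sqlite3 module, with hypothetical rows (a second file 11 and extra folders and links) added to show one result per file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER, name TEXT);
CREATE TABLE folder_files (id INTEGER, folder_id INTEGER, file_id INTEGER);
CREATE TABLE links (id INTEGER, name TEXT, folder_id INTEGER);
INSERT INTO files VALUES (10, 'myFile.txt'), (11, 'other.txt');
INSERT INTO folder_files VALUES (30, 20, 10), (31, 21, 10), (32, 21, 11);
INSERT INTO links VALUES (40, 'firstLink', 20), (41, 'secondLink', 20), (42, 'linkTo21', 21);
""")

# ROW_NUMBER() restarts at 1 for each file (PARTITION BY f.id) and numbers that
# file's folder/link pairs by lowest folder id, then lowest link id.
rows = conn.execute("""
SELECT file_id, file_name, link_id, link_name FROM (
    SELECT f.id AS file_id, f.name AS file_name, l.id AS link_id, l.name AS link_name,
           ROW_NUMBER() OVER (PARTITION BY f.id ORDER BY ff.folder_id, l.id) AS r
    FROM files f
    JOIN folder_files ff ON ff.file_id = f.id
    JOIN links l ON l.folder_id = ff.folder_id
) ranked
WHERE r = 1
ORDER BY file_id
""").fetchall()
print(rows)  # [(10, 'myFile.txt', 40, 'firstLink'), (11, 'other.txt', 42, 'linkTo21')]
```

Unlike the user-variable version, this does not depend on the derived table's ORDER BY being preserved, since the ordering lives inside the window definition.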