Select tables in database and group them by partial name - sql

I have a database that gets new tables automatically added to it via create table if not exists statements.
The table names are in the following format:
somenamebasedondatasource_YEAR_period_X
Where X is a financial period of the business year, being a number 1-13.
Is there a query I can run against the schema table to get all the tables in the database, then group them by the year contained in the table name?
So if my current tables list looks like this:
somenamebasedondatasource_2018_period_1
somenamebasedondatasource_2018_period_2
somenamebasedondatasource_2018_period_3
somenamebasedondatasource_2018_period_4
somenamebasedondatasource_2018_period_5
somenamebasedondatasource_2018_period_6
somenamebasedondatasource_2018_period_7
somenamebasedondatasource_2019_period_8
somenamebasedondatasource_2019_period_9
somenamebasedondatasource_2019_period_10
somenamebasedondatasource_2019_period_11
somenamebasedondatasource_2019_period_12
somenamebasedondatasource_2019_period_13
somenamebasedondatasource_2018_period_1
somenamebasedondatasource_2019_period_2
somenamebasedondatasource_2019_period_3
somenamebasedondatasource_2019_period_4
somenamebasedondatasource_2019_period_5
somenamebasedondatasource_2019_period_6
somenamebasedondatasource_2019_period_7
someothernamedatasourcesource_2018_period_1
someothernamedatasourcesource_2018_period_2
someothernamedatasourcesource_2018_period_3
someothernamedatasourcesource_2018_period_4
someothernamedatasourcesource_2018_period_5
someothernamedatasourcesource_2018_period_6
someothernamedatasourcesource_2018_period_7
someothernamedatasourcesource_2019_period_8
someothernamedatasourcesource_2019_period_9
someothernamedatasourcesource_2019_period_10
someothernamedatasourcesource_2019_period_11
someothernamedatasourcesource_2019_period_12
someothernamedatasourcesource_2019_period_13
someothernamedatasourcesource_2018_period_1
someothernamedatasourcesource_2019_period_2
someothernamedatasourcesource_2019_period_3
someothernamedatasourcesource_2019_period_4
someothernamedatasourcesource_2019_period_5
someothernamedatasourcesource_2019_period_6
someothernamedatasourcesource_2019_period_7
I would like an output that lists:
2018
2019
Then when the list of tables gets bigger into 2020 and beyond, it lists any years for those tables too like
2018
2019
2020
SELECT TABLE_NAME
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'database_name'
/* not sure what else goes here */
After that I also want to do the same thing again for the period_X but only for a certain year. (so after a user selects the year from the first query, I want to show them the periods for that year from the result of the second query.)
PS: I can change the naming convention for the tables if that makes this easier; it's all just test data at this point. Each table does contain the year and period in a column in each of its rows. I was only splitting them up to try to avoid long-running select queries when grabbing the data for later use. (The tables contain a row for each minute of the day during office hours, so they will end up very large if multiple periods and years are put together.)

You can use string operations:
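-- Takes the second '_'-separated part of the name, so it assumes the datasource prefix itself contains no underscores
SELECT DISTINCT substring_index(substring_index(TABLE_NAME, '_', 2), '_', -1) AS table_year
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'database_name'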
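For the follow-up (periods for a chosen year), the same splitting can also be done in whatever client shows the pickers. This is only a rough Python sketch, not the query itself: the sample list and the helper names list_years and list_periods are made up for illustration, and the names are assumed to always end in _YEAR_period_X.

```python
import re

# Hypothetical sample of names as they might come back from information_schema.TABLES.
TABLE_NAMES = [
    "somenamebasedondatasource_2018_period_1",
    "somenamebasedondatasource_2018_period_2",
    "somenamebasedondatasource_2019_period_8",
    "someothernamedatasourcesource_2019_period_13",
    "unrelated_table",
]

# Anchor on the trailing "_YEAR_period_X" so underscores in the prefix do not matter.
NAME_PATTERN = re.compile(r"_(\d{4})_period_(\d{1,2})$")


def list_years(table_names):
    """Return the distinct years found in the table names, sorted ascending."""
    years = set()
    for name in table_names:
        match = NAME_PATTERN.search(name)
        if match:
            years.add(int(match.group(1)))
    return sorted(years)


def list_periods(table_names, year):
    """Return the distinct periods that exist for one year, sorted numerically."""
    periods = set()
    for name in table_names:
        match = NAME_PATTERN.search(name)
        if match and int(match.group(1)) == year:
            periods.add(int(match.group(2)))
    return sorted(periods)


print(list_years(TABLE_NAMES))
print(list_periods(TABLE_NAMES, 2019))
```

Names that do not follow the convention (like unrelated_table above) are simply skipped, and periods sort numerically rather than as text, so 10 comes after 9.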


Attempting to use JOIN on CTEs but getting an error the table was not found

In BigQuery I'm using the Austin Bike Share database. I'm attempting to create a new table which contains the subscriber types for the bikeshare program, and the count of each subscriber type in 2021 and in 2022.
I'm creating two CTEs with counts for 2021 and then for 2022. Here is an example of my code and the resulting table for 2021:
--View subscriber counts for 2021
WITH subscriber_count2021 AS (
SELECT
count(*) AS subscriber_count_2021,
subscriber_type
FROM `adroit-petal-123456.bikeshare_trips.bikes_cleaned`
WHERE year = 2021
GROUP BY subscriber_type
ORDER BY subscriber_count_2021 desc
)
SELECT subscriber_count_2021, subscriber_type
FROM subscriber_count2021
Resulting table
Now I'm trying to JOIN my two tables so I can have a count of my subscription types for 2021 and 2022 in one place. However, my attempt is resulting in error message: Table "subscriber_count2021" must be qualified with a dataset (e.g. dataset.table).
--Compare subscriber counts for 2021 and 2022
SELECT
subscriber_counts2021.*,
subscriber_counts_2022.*
FROM subscriber_count2021 JOIN
subscriber_count2022 ON subscriber_count2021.subscriber_type
= subscriber_count2022.subscriber_type
Based on other Stack Overflow posts this seems to be an issue with my aliasing in BigQuery. I've attempted:
Using `` characters around my tables
Adding the table name as well as my CTE names, e.g. FROM adroit-petal-123456.bikeshare_trips.bikes_cleaned.subscriber_count2021
Updating the SELECT statement to have table.column for each field I'm trying to use for my new table.
None of these seem to resolve the issue I'm facing.
You have a lot of typos in the above: you're misnaming tables throughout.
You're interchanging the words count and counts all over the place. Also note that a CTE only exists within the single statement that defines it, so both CTEs and the JOIN that uses them have to be part of the same WITH statement. Referencing a CTE name from a separate statement is what produces the "must be qualified with a dataset" error.
Below, I standardized the table names to be plural subscriber_counts_202# and adjusted the column names to be singular subscriber_count_202#
It doesn't really matter which way you go with it. The important thing is that you name your columns consistently, name your tables consistently, and use the correct names in the correct places.
--Compare subscriber counts for 2021 and 2022
--Both CTEs are defined in the same statement as the JOIN that uses them
WITH subscriber_counts_2021 AS (
SELECT
count(*) AS subscriber_count_2021,
subscriber_type
FROM `adroit-petal-123456.bikeshare_trips.bikes_cleaned`
WHERE year = 2021
GROUP BY subscriber_type
),
subscriber_counts_2022 AS (
SELECT
count(*) AS subscriber_count_2022,
subscriber_type
FROM `adroit-petal-123456.bikeshare_trips.bikes_cleaned`
WHERE year = 2022
GROUP BY subscriber_type
)
SELECT
subscriber_counts_2021.subscriber_type,
subscriber_counts_2021.subscriber_count_2021,
subscriber_counts_2022.subscriber_count_2022
FROM subscriber_counts_2021 JOIN
subscriber_counts_2022 ON subscriber_counts_2021.subscriber_type
= subscriber_counts_2022.subscriber_type
ORDER BY subscriber_count_2021 desc
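If you want to try the CTE scoping behaviour without touching BigQuery, here is a small sketch using Python's built-in sqlite3 module as a stand-in. The table contents and the subscriber type values are invented for the demo, and SQLite is not BigQuery, but the rule that a CTE is visible only inside the statement that defines it applies in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bikes_cleaned (subscriber_type TEXT, year INTEGER)")
# Made-up sample rows, only to have something to count.
conn.executemany(
    "INSERT INTO bikes_cleaned VALUES (?, ?)",
    [
        ("Local365", 2021), ("Local365", 2021), ("Local365", 2022),
        ("Walk Up", 2021), ("Walk Up", 2022), ("Walk Up", 2022),
    ],
)

# Both CTEs and the JOIN live in one statement.
QUERY = """
WITH subscriber_counts_2021 AS (
    SELECT subscriber_type, COUNT(*) AS subscriber_count_2021
    FROM bikes_cleaned
    WHERE year = 2021
    GROUP BY subscriber_type
),
subscriber_counts_2022 AS (
    SELECT subscriber_type, COUNT(*) AS subscriber_count_2022
    FROM bikes_cleaned
    WHERE year = 2022
    GROUP BY subscriber_type
)
SELECT c21.subscriber_type, c21.subscriber_count_2021, c22.subscriber_count_2022
FROM subscriber_counts_2021 AS c21
JOIN subscriber_counts_2022 AS c22
  ON c21.subscriber_type = c22.subscriber_type
ORDER BY c21.subscriber_type
"""

rows = conn.execute(QUERY).fetchall()
print(rows)

# A later, separate statement can no longer see the CTE.
try:
    conn.execute("SELECT * FROM subscriber_counts_2021")
    cte_visible_later = True
except sqlite3.OperationalError:
    cte_visible_later = False
print(cte_visible_later)
```

Note that an inner JOIN drops any subscriber type that appears in only one of the two years; a FULL OUTER JOIN (or a single query with conditional counts) would be worth considering if that matters for your comparison.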

Perform calculation without having to do it manually for each column?

I have the following view set up in SQL Server:
VIEW
(left table: population data per year; middle table: municipalities; right table: municipality areas in km²)
Query
SELECT
dbo.T_GEMEINDE.GKZ, dbo.T_GEMEINDE.NAME,
dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.FLAECHE_KM2 / dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.DAUERSIEDLUNGSRAUM_KM2 AS [ges. Fläche / Dauersiedlungsr.],
dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN.J2017 / dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.FLAECHE_KM2 AS [ges. Bevölkerungsdichte],
dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN.J2017 / dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.DAUERSIEDLUNGSRAUM_KM2 AS [Bevölkerungsdichte Dauersiedlungsraum]
FROM
dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE
INNER JOIN
dbo.T_GEMEINDE ON dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.GKZ = dbo.T_GEMEINDE.GKZ
INNER JOIN
dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN ON dbo.T_GEMEINDE.GKZ = dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN.GKZ
The last column in the view contains a calculation (population density for 132 municipalities for a certain year) for the year 2017 and uses the column J2017 from the table seen on the left. This is the output (Bevölkerungsdichte Dauersiedlungsraum):
Current output:
OUTPUT
Desired output:
The rightmost column (Bevölkerungsdichte Dauersiedlungsraum) seen in the provided output screenshot has the output data of the calculation for the year 2017. The same output has to be generated for all the other years, but each as a separate column.
Question: how do I perform the calculation which you can see in the last column in the view for all years (J2017-J2050) without having to do it manually for each year column?
Thanks in advance.
If you want someone to provide you with a complete solution then you will need to supply:
CREATE TABLE statements for the 3 tables
INSERT INTO... statements to provide sample data for all 3 tables
However, if you just want a suggestion about how to approach this problem, then I would use an UNPIVOT statement to create a view/table that
holds all the columns in dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN
apart from the "year" columns (J2017, J2018, J2019, ...)
adds a single "year" column with values from 2017 to 2050
adds a single value column to hold the population for each year
By joining your existing tables to this new table/view and grouping by your new "year" column, you should achieve what you want.
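To make the reshaping idea concrete, here is a rough sketch of what the unpivot step does, written in plain Python rather than T-SQL. The sample rows, the area values and the helper names unpivot and density_per_year are all invented for illustration; only the GKZ and J<year> column naming comes from the question.

```python
# Hypothetical wide rows: one row per municipality, one population column per year.
population_wide = [
    {"GKZ": 10101, "J2017": 1000, "J2018": 1100},
    {"GKZ": 10102, "J2017": 500, "J2018": 450},
]

# Hypothetical permanent-settlement areas in km², keyed by municipality code.
settlement_area_km2 = {10101: 10.0, 10102: 2.5}


def unpivot(rows, id_column="GKZ", year_prefix="J"):
    """Turn every J<year> column into its own (GKZ, year, population) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            suffix = column[len(year_prefix):]
            if column.startswith(year_prefix) and suffix.isdigit():
                long_rows.append(
                    {id_column: row[id_column], "year": int(suffix), "population": value}
                )
    return long_rows


def density_per_year(long_rows, area_by_gkz):
    """Apply the same density formula to every (municipality, year) pair."""
    return {
        (row["GKZ"], row["year"]): row["population"] / area_by_gkz[row["GKZ"]]
        for row in long_rows
    }


long_rows = unpivot(population_wide)
densities = density_per_year(long_rows, settlement_area_km2)
print(densities)
```

Once the data is in the long shape, the calculation is written once and applies to every year. If the result really has to be one column per year again, it can be pivoted back at the end.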

BigQuery can use wildcard table names and table_suffix, but I am looking for a similar solution like wildcard datasets and dataset_suffix

If you process data daily and put the results into the same dataset, such as results, with each day's table sharing the same first part of the name and the date as the table suffix, such as result1_20190101, result1_20190102 etc., then you can query the result tables using wildcard table names and table_suffix.
So your dataset/tables looks like
results/result1_20190101
results/result1_20190102
results/result2_20190101
results/result2_20190102
So I can query all the result1 tables:
select * from `xxxx.results.result1*`
But I arrange the result tables differently. Because I have dozens of tables processed each day, I use the date as the dataset name so that I can easily check and manage each day's results.
so my dataset/tables look like
20190101/result1
20190101/result2
...
20190102/result1
20190102/result2
...
My daily data process usually does not query across dates (datasets); the daily results are pushed to the next steps of the data pipeline.
But once in a while I need to do a quick check, and then I need to query across the dates (in my case, across the datasets).
So when I try to query result1, I have to hard-code the dataset names.
select * from `xxxxxx.20190101/result1`
union all
select * from `xxxxxx.20190102/result1`
union all
...
1) First question: is there any way I could use wildcards and suffixes on datasets, like we can with tables?
2) Second question: how could I use a date function, such as DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY), to get the date value and use that value in the query below
select * from `xxxxxx.20190101/result1`
union all
select * from `xxxxxx.20190102/result1`
union all
...
to replace the hard coded value, 20190101, 20190102 etc.
There are no wildcards and/or suffixes available for BigQuery datasets (at least as of today).
In the meantime, you can check the feature request for INFORMATION_SCHEMA, which is in Alpha now. You can apply for it by submitting the form that is available there.
In short: you will be able to query the list of datasets in the project and then use it to construct your query. Please note that you still need to use some sort of client to script all this properly.
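As an illustration of the "script it in a client" part, here is a minimal Python sketch that builds the UNION ALL text for a date range instead of hard-coding each dataset. The function name build_union_query and the project id my-project are placeholders I made up, and the sketch assumes tables are addressed as project.dataset.table with a dataset for every day in the range; it only builds the string and does not call BigQuery.

```python
from datetime import date, timedelta


def build_union_query(project, table, end_date, days_back):
    """Build one SELECT per daily dataset and glue them together with UNION ALL."""
    selects = []
    # Walk from the oldest day up to end_date (inclusive).
    for offset in range(days_back, -1, -1):
        dataset = (end_date - timedelta(days=offset)).strftime("%Y%m%d")
        selects.append(f"select * from `{project}.{dataset}.{table}`")
    return "\nunion all\n".join(selects)


# Fixed end date so the output is reproducible; date.today() would be the usual choice.
query = build_union_query("my-project", "result1", date(2019, 1, 3), days_back=2)
print(query)
```

For the 90-day check from the question you would pass days_back=90 and date.today(). Days for which no dataset exists would need to be skipped, which is where a list of existing datasets would come in.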

How to repeat a pattern in an MS Access query

I have a MS Access database and in it are two tables called [Pattern] and [Element].
The following examples show the tables and their respective data types.
Table 1: [Pattern]
patternID - key
pStart - short date
pEnd - short date
Table 2: [Element]
elementID - key
patternID
text - text(2)
I want to create a query where it will repeat the pattern contained within the text field of the element table. For example
for patternID = 1 there are 4 elementID entries with the text values 1,2,3,4
How do I get a query to repeat 1,2,3,4,1,2,3,4 for as long as the difference between the two dates, pStart and pEnd in the pattern table?
Hopefully this makes sense, thanks in advance. I usually work in Excel so Access is new to me.
You will need a number or factor table with one field with integers from 0 to at least the maximum day count you will have.
Then you can create the first Cartesian query:
Select
patternID
From
Pattern,
Factors
Where
DateAdd("d", [Factor], [pStart]) <= pEnd
Save this as, say, Patterns and create a second Cartesian query:
Select
Element.patternID,
Element.elementID,
Element.Text
From
Patterns,
Element
Where
Patterns.patternID = Element.patternID
Order By
Element.patternID,
Element.elementID,
Element.Text
I'm unsure if you are doing this in the query designer or SQL view. If you're writing the SQL, this might help:
select distinct E.text
from Element E, Pattern P
where E.patternID = P.patternID
AND P.pStart <> P.pEnd
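To pin down what the repeated output should look like, here is a tiny Python sketch of the intended result. It assumes one element per day from pStart to pEnd inclusive, which is my reading of the question, and repeat_pattern is a made-up helper name. In query terms this corresponds to picking the element at position (Factor Mod number of elements) for each row of the number table.

```python
from datetime import date
from itertools import cycle, islice


def repeat_pattern(elements, p_start, p_end):
    """Return one element per day from p_start to p_end (inclusive), wrapping around the pattern."""
    # A reversed date range yields an empty result instead of an error.
    day_count = max((p_end - p_start).days + 1, 0)
    return list(islice(cycle(elements), day_count))


# Pattern 1 from the question has the text values 1, 2, 3, 4; the dates are invented.
sequence = repeat_pattern(["1", "2", "3", "4"], date(2024, 1, 1), date(2024, 1, 10))
print(sequence)
```

With ten days and four elements the pattern runs through twice and then stops partway through the third repetition.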

SQL query to produce a time x day grid from a list of timestamps?

The structures of my tables are as follows.
Table Name : timetable
timetable http://www.4shared.com/download/MYafV7-6ce/timetableTable.png
Table Name : slot_table
timetable http://www.4shared.com/download/9Lp_CBn2ba/slot_table.png
Table Name : instructor(this table is not required for this particular problem)
I want to show the resultant data in my android app in a timetable format somewhat like this:
random http://www.4shared.com/download/oAGiUXVAba/random.png
Question: What query should I write so that the result contains the subjects of each day with their respective slots?
1) The days should be in order, like Monday, Tuesday, Wednesday.
2) If Monday has 2 subjects in 2 different slots then it should display like this:
Day      7:30-9:10AM        9:20-11:00AM
Monday   Android Workshop   Operating System
This is just a sample.
P.S.: As a timetable format is required, all the subjects with slot ids for all the days (Monday to Saturday) must be in it.
Edit :
I tried this
select day,subject,slot from timetable,slot_table where timetable.slotid = slot_table.slotid
which gave a result :
a http://www.4shared.com/download/uMU7NA8Oce/random1.png
But I want it in a timetable format, and I have no idea how to do that.
Edit :
Timetable sample format is something like this :
Edit :
I wrote a query
select timetable.day,count(slot_table.subject) as no_of_classes from timetable,slot_table where timetable.slotid = slot_table.slotid group by timetable.day
which resulted in
a http://www.4shared.com/download/rZW20_g8ce/random2.png
So now it shows Monday has 2 classes in 2 slots, Tuesday has 1 class in 1 slot and so on.
Now, any help on a query which can show the two slots (timings) on Monday?
Solution :
select timetable.day,
max(case when (slot='7:30-9:10AM') then slot_table.subject END) as "7:30-9:10AM",
max(case when (slot='9:20-11:00AM') then slot_table.subject END) as "9:20-11:00AM",
max(case when (slot='11:10-12:50PM') then slot_table.subject END) as "11:10-12:50PM",
max(case when (slot='1:40-3:20PM') then slot_table.subject END) as "1:40-3:20PM",
max(case when (slot='3:30-5:00PM') then slot_table.subject END) as "3:30-5:00PM"
from timetable join slot_table on timetable.slotid = slot_table.slotid
group by timetable.day
Result :
a http://www.4shared.com/download/1w7Tyicfce/random3.png
What you want is called a PIVOT query. In one of these, you have a select which gives the data in rows, like your result just under the EDIT (Day, subject, slot). Then you need to specify the values of the column you want to 'pivot' to become columns (slot in this example). Because a pivot relies on the values of the column to be pivoted, it can be difficult to write a general query; the Postgres Wiki has an example that uses dynamic SQL and a lot of code to generate it at http://wiki.postgresql.org/wiki/Pivot_query
In your case, the slots look like they're fixed, so you might be able to hard-code them (that's a decision you'll have to make yourself).
NB I am not a Postgres user, but it looks like it can do it (and I would have been very surprised if it couldn't).
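Here is a small runnable sketch of the hard-coded approach, using Python's sqlite3 as a stand-in database. The table layout (slot and subject both stored in slot_table), the sample rows and the subject "Networks" are guesses for the demo, since the real structures were only shown as images. It also orders the days with a CASE expression, which is one way to get Monday before Tuesday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: slot_table holds the time slot and subject, timetable maps days to slot ids.
conn.executescript("""
CREATE TABLE slot_table (slotid INTEGER PRIMARY KEY, slot TEXT, subject TEXT);
CREATE TABLE timetable (day TEXT, slotid INTEGER);
INSERT INTO slot_table VALUES
    (1, '7:30-9:10AM', 'Android Workshop'),
    (2, '9:20-11:00AM', 'Operating System'),
    (3, '7:30-9:10AM', 'Networks');
INSERT INTO timetable VALUES ('Tuesday', 3), ('Monday', 1), ('Monday', 2);
""")

# One MAX(CASE ...) column per hard-coded slot; the CASE in ORDER BY fixes the weekday order.
PIVOT_QUERY = """
SELECT
    timetable.day,
    MAX(CASE WHEN slot_table.slot = '7:30-9:10AM' THEN slot_table.subject END) AS "7:30-9:10AM",
    MAX(CASE WHEN slot_table.slot = '9:20-11:00AM' THEN slot_table.subject END) AS "9:20-11:00AM"
FROM timetable
JOIN slot_table ON timetable.slotid = slot_table.slotid
GROUP BY timetable.day
ORDER BY CASE timetable.day
    WHEN 'Monday' THEN 1 WHEN 'Tuesday' THEN 2 WHEN 'Wednesday' THEN 3
    WHEN 'Thursday' THEN 4 WHEN 'Friday' THEN 5 WHEN 'Saturday' THEN 6
END
"""

rows = conn.execute(PIVOT_QUERY).fetchall()
for row in rows:
    print(row)
```

A day with no class in a given slot comes back as NULL (None in Python) in that column, which the app can render as an empty cell.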
This is a pivot or crosstab query. PostgreSQL has only limited support for these via the crosstab function in the tablefunc module.
It can sometimes be better to just deal with this in the application, accumulating the data into a table as you read each data point.
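A rough sketch of that application-side approach, in Python for brevity (an Android app would do the same thing in Java or Kotlin). The function name build_grid and the sample records are invented; the records stand for the plain (day, slot, subject) rows returned by the simple join query.

```python
def build_grid(records, days, slots):
    """Accumulate (day, slot, subject) rows into a day x slot grid, leaving gaps as None."""
    grid = {day: {slot: None for slot in slots} for day in days}
    for day, slot, subject in records:
        grid[day][slot] = subject
    return grid


DAYS = ["Monday", "Tuesday"]
SLOTS = ["7:30-9:10AM", "9:20-11:00AM"]

# Made-up rows, in whatever order the database happens to return them.
records = [
    ("Tuesday", "7:30-9:10AM", "Networks"),
    ("Monday", "9:20-11:00AM", "Operating System"),
    ("Monday", "7:30-9:10AM", "Android Workshop"),
]

grid = build_grid(records, DAYS, SLOTS)
for day in DAYS:
    print(day, [grid[day][slot] for slot in SLOTS])
```

Because the day and slot order come from the lists in the app, the query itself needs no ordering or pivot logic.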