DB restructure due to performance issue - SQL

Currently I have around 250 clients, each with 5 years of data, and the tables are split up by year. For example, for a client named XX:
T00_XX_2011, T00_XX_2012, T00_XX_2013, T00_XX_2014. Each table contains 220 columns and roughly 10 million records, and 12 of the columns already have indexes.
The issue is that a single SELECT query takes around 5 to 10 minutes. Can anyone help me tweak the performance?

Related

Generate dynamic date columns in a SELECT query SQL

First of all, I've got a table like this:
vID  bID  date        type  value
1    100  22.01.2021  o     250.00
1    110  25.01.2021  c     100.00
2    120  13.02.2021  o     400.00
3    130  20.02.2021  o     475.00
3    140  11.03.2022  c     75.00
1    150  15.03.2022  o     560.00
To show which values were ordered (o) and charged (c) per month, I have to 'generate' columns for each month, both ordered and charged, in an MSSQL SELECT query.
Here is an example table of what I want to get:
vID  JAN2021O  JAN2021C  FEB2021O  FEB2021C  …  MAR2022O  MAR2022C
1    250.00    100.00                           560.00
2                        400.00
3                        475.00                           75.00
I need a possibility to join it in an SQL SELECT in addition to some other columns I already have.
Does anyone have an idea and could help me, please?
The SQL language has a very strict requirement to know the number of columns in the results and the type of each column at query compile time, before looking at any data in the tables. This applies even to SELECT * and PIVOT queries, where the columns are still determined at query compile time via the table definition (not data) or SQL statement.
Therefore, what you want to do is only possible in a single query if you want to show a specific, known number of months from a base date. In that case, you can accomplish this by specifying each column in the SQL and using date math with conditional aggregation to figure the value for each of the months from your starting point. The PIVOT keyword can help reduce the code, but you're still specifying every column by hand, and the query will still be far from trivial.
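For illustration, a conditional-aggregation sketch over a fixed, known set of months might look like the following (the table name dbo.Orders is an assumption; the column names follow the sample data above):

SELECT vID,
       SUM(CASE WHEN YEAR([date]) = 2021 AND MONTH([date]) = 1 AND [type] = 'o' THEN [value] END) AS JAN2021O,
       SUM(CASE WHEN YEAR([date]) = 2021 AND MONTH([date]) = 1 AND [type] = 'c' THEN [value] END) AS JAN2021C,
       SUM(CASE WHEN YEAR([date]) = 2021 AND MONTH([date]) = 2 AND [type] = 'o' THEN [value] END) AS FEB2021O,
       SUM(CASE WHEN YEAR([date]) = 2021 AND MONTH([date]) = 2 AND [type] = 'c' THEN [value] END) AS FEB2021C
       -- ...and so on, one pair of columns per month, each written out by hand...
FROM dbo.Orders  -- assumed table name
GROUP BY vID;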
If you do not have a specific, known number of months to evaluate, you must do this over several steps:
Run a query to find out how many months you have.
Use the result from step 1 to dynamically construct a new statement
Run the statement constructed in step 2.
There is no other way.
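As a rough, untested sketch of those three steps in T-SQL (again assuming a table called dbo.Orders, and SQL Server 2017+ for STRING_AGG):

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Step 1: find the distinct months in the data and build one ordered/charged column pair per month
SELECT @cols = STRING_AGG(CAST(
          ', SUM(CASE WHEN YEAR([date]) = ' + CAST(y AS varchar(4))
        + ' AND MONTH([date]) = ' + CAST(m AS varchar(2))
        + ' AND [type] = ''o'' THEN [value] END) AS ' + QUOTENAME(UPPER(FORMAT(DATEFROMPARTS(y, m, 1), 'MMMyyyy')) + 'O')
        + ', SUM(CASE WHEN YEAR([date]) = ' + CAST(y AS varchar(4))
        + ' AND MONTH([date]) = ' + CAST(m AS varchar(2))
        + ' AND [type] = ''c'' THEN [value] END) AS ' + QUOTENAME(UPPER(FORMAT(DATEFROMPARTS(y, m, 1), 'MMMyyyy')) + 'C')
        AS nvarchar(max)), '') WITHIN GROUP (ORDER BY y, m)
FROM (SELECT DISTINCT YEAR([date]) AS y, MONTH([date]) AS m FROM dbo.Orders) AS months;

-- Step 2: dynamically assemble the full statement from the generated column list
SET @sql = N'SELECT vID' + @cols + N' FROM dbo.Orders GROUP BY vID;';

-- Step 3: run the constructed statement
EXEC sp_executesql @sql;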
Even then, this kind of pivot is usually better handled in the client code or reporting tool (at the presentation level) than via SQL itself.
It's not as likely to come up for this specific query, but you should also be aware that this kind of dynamic SQL can raise certain security issues: some of the normal mechanisms for protecting against injection aren't available as you build the new query in step 2, because you can't parameterize the names of the source columns, which are derived from data that might be user-generated.

Access SQL - Add Row Number to Query Result for a Multi-table Join

What I am trying to do is fairly simple. I just want to add a row number to a query. Since this is in Access, it is a bit more difficult than in other SQL dialects, but under normal circumstances it is still doable using solutions such as DCount or SELECT COUNT(*), for example here: How to show row number in Access query like ROW_NUMBER in SQL or Access SQL how to make an increment in SELECT query
My Issue
My issue is I'm trying to add this counter to a multi-join query that orders by fields from numerous tables.
Troubleshooting
My code is a bit ridiculous (19 fields, seven of which are long expressions, from 9 different joined tables, and ordered by fields from 5 of those tables). To keep things simple, I have a simplified example query below:
Example Query
SELECT DCount("*","Requests_T","[Requests_T].[RequestID]<=" & [Requests_T].[RequestID]) AS counter, Requests_T.RequestHardDeadline AS Deadline, Requests_T.RequestOverridePriority AS Priority, Requests_T.RequestUserGroup AS [User Group], Requests_T.RequestNbrUsers AS [Nbr of Users], Requests_T.RequestSubmissionDate AS [Submitted on], Requests_T.RequestID
FROM ((Requests_T
INNER JOIN ENUM_UserGroups_T ON ENUM_UserGroups_T.UserGroups = Requests_T.RequestUserGroup)
INNER JOIN ENUM_RequestNbrUsers_T ON ENUM_RequestNbrUsers_T.NbrUsers = Requests_T.RequestNbrUsers)
INNER JOIN ENUM_RequestPriority_T ON ENUM_RequestPriority_T.Priority = Requests_T.RequestOverridePriority
ORDER BY Requests_T.RequestHardDeadline, ENUM_RequestPriority_T.DisplayOrder DESC , ENUM_UserGroups_T.DisplayOrder, ENUM_RequestNbrUsers_T.DisplayOrder DESC , Requests_T.RequestSubmissionDate;
If the code above selects a field from a table not included, I apologize - just trust the field comes from somewhere (lol, i.e. one of the other joins I excluded to simplify the query). A great example of this is the .DisplayOrder fields used in the ORDER BY expression. These are fields from tables that simply determine the "priority" of an enum. Example: Requests_T.RequestOverridePriority displays to the user as a combobox option of "Low", "Med", or "High". So in a table, I assign numerical priorities of "1", "2", and "3" to these options, respectively. Thus, when ENUM_RequestPriority_T.DisplayOrder DESC is used in the ORDER BY, all "High" priority requests will display above "Medium" and "Low". The same holds true for ENUM_UserGroups_T.DisplayOrder and ENUM_RequestNbrUsers_T.DisplayOrder.
I'd also prefer NOT to use DCount, for efficiency, and would rather do something like:
(SELECT COUNT(*) FROM Requests_T WHERE Requests_T.RequestID >= RequestID) AS counter
Due to the "Order By" expression however, my 'counter' doesn't actually count my resulting rows sequentially since both of my examples are tied to the RequestID.
Example Results
Based on my actual query results, I've made an example result of the query above.
Counter  Deadline    Priority  User_Group  Nbr_of_Users  Submitted_on  RequestID
5        12/01/2016  High      IT          2-4           01/01/2016    5
7        01/01/2017  Low       IT          2-4           05/06/2016    8
10                   Med       IT          2-4           07/13/2016    11
15                   Low       IT          10+           01/01/2016    16
8                    Low       IT          2-4           01/01/2016    9
2                    Low       IT          2-4           05/05/2016    2
The query is displaying my results in the proper order (those with the nearest deadline at the top, then those with the highest priority, then user group, then # of users, and finally, if all else is equal, sorted by submission date). However, my "Counter" values are completely wrong! The counter field should simply increment by 1 for each new row. Thus, if displaying a single request on a form for a user, I could say
"You are number: Counter [associated to RequestID] in the
development queue."
Meanwhile my results:
Aren't sequential (notice the first four display sequentially, but then the final two rows don't)! Even though the final two rows are lower in priority than the records above them, they ended up with a lower Counter value simply because they had the lower RequestID.
They don't start at "1" and increment +1 for each new record.
Ideal Results
Thus my ideal result from above would be:
Counter  Deadline    Priority  User_Group  Nbr_of_Users  Submitted_on  RequestID
1        12/01/2016  High      IT          2-4           01/01/2016    5
2        01/01/2017  Low       IT          2-4           05/06/2016    8
3                    Med       IT          2-4           07/13/2016    11
4                    Low       IT          10+           01/01/2016    16
5                    Low       IT          2-4           01/01/2016    9
6                    Low       IT          2-4           05/05/2016    2
I'm spoiled by PL/SQL and other software where this would be automatic, lol. This is driving me crazy! Any help would be greatly appreciated.
FYI - I'd prefer an SQL option over VBA if possible. VBA is very much welcomed and will definitely get an up vote and my huge thanks if it works, but I'd like to mark an SQL option as the answer.
Unfortunately, MS Access doesn't have the very useful ROW_NUMBER() function that other database engines do. So we are left to improvise.
Because your query is so complicated and MS Access does not support common table expressions, I recommend you follow a two step process. First, name that query you already wrote IntermediateQuery. Then, write a second query called FinalQuery that does the following:
SELECT i1.field_primarykey, i1.field2, ... , i1.field_x,
       (SELECT COUNT(*) FROM IntermediateQuery i2
        WHERE i2.field_primarykey <= i1.field_primarykey) AS Counter
FROM IntermediateQuery i1
ORDER BY Counter
The unfortunate side effect is that the more data your query returns, the longer the inline subquery will take to calculate. However, this is the only way you'll get your row numbers. It does depend on having a primary key in the table. In this particular case, it doesn't have to be an explicitly defined primary key; it just needs to be a field or combination of fields that is completely unique for each record.

BigQuery Row Limits

Google says BigQuery can handle billions of rows.
For my application I estimate a usage of 200,000,000 * 1000 rows. Well over a few billion.
I can partition data into 200,000,000 rows per partition but the only support for this in BigQuery seems to be different tables. (please correct me if I am wrong)
The total data size will be around 2TB.
I saw in the examples some large data sizes, but the rows were all under a billion.
Can BigQuery support the number of rows I am dealing with in a single table?
If not, can I partition it in any way besides multiple tables?
The query below should answer your question.
I ran it against one of our datasets.
As you can see, table sizes are close to 10TB, with around 1.3-1.6 billion rows.
SELECT
ROUND(size_bytes/1024/1024/1024/1024) as TB,
row_count as ROWS
FROM [mydataset.__TABLES__]
ORDER BY row_count DESC
LIMIT 10
I think the largest table we have dealt with so far had at least 5-6 billion rows, and everything worked as expected.
Row TB ROWS
1 10.0 1582903965
2 11.0 1552433513
3 10.0 1526783717
4 9.0 1415777124
5 10.0 1412000551
6 10.0 1410253780
7 11.0 1398147645
8 11.0 1382021285
9 11.0 1378284566
10 11.0 1369109770
Short answer: Yes, BigQuery will handle this just fine, even if you put all the data in a single table.
If you do want to partition your data, the only way to do it right now is to explicitly store your data in multiple tables. You might consider doing so to reduce your bill if you frequently query only a subset of your data. Many users partition their data by date and use table wildcard functions to write queries across a subset of those partitioned tables.
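For example, with tables partitioned by date, a legacy SQL query over a subset of those tables might look like this (the dataset name, table prefix, and column names here are purely illustrative):

SELECT event_id, event_timestamp  -- illustrative column names
FROM (TABLE_DATE_RANGE([mydataset.events_],
                       TIMESTAMP('2016-01-01'),
                       TIMESTAMP('2016-03-31')))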

How to use count for each row PostgreSQL

I have a table in pgAdmin for Postgres which holds the primary keys from two other tables. It's about modeling a hospital database in PostgreSQL. I created two tables, "bed" and "room". I save the primary keys from these tables in a table called "bed_rooms", which will give information about how many beds are inside a room. So much for the theory. I added a picture of my table "betten_zimmer", which is German for "bed_rooms".
I want to make a query which tells me how many beds a room holds. I want to see every row with the number of beds.
My next idea is to add a trigger which will fire when a room holds more than 4 beds or fewer than 0.
How do I make this query? And how would the trigger look?
This is my code so far. I have 60 rooms and can't write 'OR zimmerid = ...' 60 times, all the way up to 60:
SELECT zimmerid, count(*)
FROM betten_zimmer
WHERE zimmerid = 1 OR zimmerid = 2 OR zimmerid = 3 OR zimmerid = 4
GROUP BY zimmerid
Thanks in advance. If anything is unclear, comment on this post.
If I understood correctly, you only need a BETWEEN construct:
WHERE zimmerid BETWEEN 1 AND 60
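Applied to the query from the question, it would look like this:

SELECT zimmerid, count(*)
FROM betten_zimmer
WHERE zimmerid BETWEEN 1 AND 60
GROUP BY zimmerid;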

Join More Than 2 Tables

I have three tables.
Table Data contains data for individual parts that come from a "data.txt" file.
Table Limits contains the limits for the Data table, from a "limits.txt" file.
Table Files is a listing for each individual .txt file above.
So the "Files" table looks like this. As you can see it is a listing of each file that exists. The LimitsA file will contain the limits for every Data file of type A.
ID File_Name Type Sub-Type
1 DataA_10 A 10
2 DataA_20 A 20
3 DataA_30 A 30
4 LimitsA A NONE
5 DataB_10 B 10
6 DataB_20 B 20
7 LimitsB B NONE
The "Data" table looks like this. The File_ID is the foreign key from the "Files" table. Specifically, this would be data for DataA_10 above:
ID File_ID Dat1 Dat2 Dat3... Dat20
1 1 50 52 53
2 1 12 43 52
3 1 32 42 62
The "Limits" table looks like this. The File_ID is the foreign key from the "Files" table. Specifically, this would be data for LimitsA above:
ID File_ID Sub-Type Lim1 Lim2
1 4 10 40 60
2 4 20 20 30
3 4 30 10 20
So what I want to do is JOIN the correct limits from the "Limit" table to the data from the corresponding "Data" table. Each row of DataA_10 would have the limits of "40" and "60" from the LimitsA table. Unfortunately there is no way to directly link the limits table to the data table. The only way to do this would be to look back to the files table and see that LimitsA and DataA_10 are of type A. Once I link those two together I then need to specifically only grab the Limits for Sub-Type 10.
In the end I would like to have a result that looks like this.
Result:
ID File_ID Dat1 Dat2 Dat3... Dat20 Lim1 Lim2
1 1 50 52 53 40 60
2 1 12 43 52 40 60
3 1 32 42 62 40 60
I hope this is clear enough to understand. It seems to me like an issue of joining more than 2 tables, but I have been unable to find a suitable solution online as of yet. If you have a solution or any advice it would be greatly appreciated.
Your 'Files' table is actually 2 separate (but related) concepts that have been merged. If you break them out using subqueries you'll have a much easier time making a join. Note that joining like this is not the most efficient method, but then again neither is the given schema...
SELECT Data.*, Limits.Lim1, Limits.Lim2
FROM (SELECT * FROM Files WHERE SubType IS NOT NULL) DataFiles
JOIN (SELECT * FROM Files WHERE SubType IS NULL) LimitFiles
ON LimitFiles.Type = DataFiles.Type
JOIN Data
ON DataFiles.ID = Data.File_ID
JOIN Limits
ON LimitFiles.ID = Limits.File_ID
AND DataFiles.SubType = Limits.SubType
ORDER BY Data.File_ID
UPDATE
To be more specific on how to improve the schema: Currently, the Files table doesn't have a clear way to differentiate between Data and Limit file entries. Aside from this, the Data entries don't have a clear link to a single Limit file entry. Although both of these can be figured out as in the SQL above, such logic might not play well with the query optimizer, and certainly can't guarantee the Data-Limit link that you require.
Consider these options:
Instead of linking to a 'Limit' file via Type, link directly to a Limit entry Id. Set a foreign key on that link to ensure the expected Limit entry is available.
Separate the 'Limit' entries from the 'Data' entries by putting them in a separate table.
Create an index on the foreign key. For that matter, add indices for all foreign keys - SQL Server doesn't do this by default.
Of these, I would consider having a foreign key as essential, and the others as modest improvements.
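A minimal sketch of the first and third suggestions, assuming SQL Server and an illustrative Limit_ID column name, might look like this:

-- Link each Data row directly to the Limit entry it belongs to
-- (Limit_ID is an illustrative column, not part of the original schema)
ALTER TABLE Data ADD Limit_ID int NULL;

ALTER TABLE Data
    ADD CONSTRAINT FK_Data_Limits FOREIGN KEY (Limit_ID) REFERENCES Limits (ID);

-- SQL Server doesn't index foreign keys automatically, so add one explicitly
CREATE INDEX IX_Data_Limit_ID ON Data (Limit_ID);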