SQL to sum data in Snowflake view - sql

I have a Snowflake table with the following fields:
Date
Transaction Type
Transaction Speed
Company
The table has millions of rows, so I want to summarize the data which will then feed into Power BI. I want to group by Date, then Transaction Type, then Company, and sum the values in Transaction Speed.
I'm very new to SQL and have created some basic views, but am having trouble creating the summarization. Can anyone give me some guidance?

It is usually helpful to provide an example of what you have tried, but assuming I understand your requirements, you're likely looking for something like this:
SELECT date, transaction_type, company, SUM(transaction_speed) AS total_transaction_speed
FROM my_table
GROUP BY date, transaction_type, company;
(Note: TABLE is a reserved word, so substitute your actual table name.)
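Since this will feed Power BI, you can wrap that aggregation in a view so the summarized result is what Power BI connects to. A minimal sketch (the view and table names here are assumptions; substitute your own):
```sql
-- Sketch: persist the aggregation as a view for Power BI to read.
CREATE OR REPLACE VIEW daily_speed_summary AS
SELECT
    date,
    transaction_type,
    company,
    SUM(transaction_speed) AS total_transaction_speed
FROM my_table            -- replace with your actual table name
GROUP BY date, transaction_type, company;
```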

Related

BigQuery: How do I schedule a query to append an existing table with a new table?

I am new to BigQuery and need to understand how to create automation within the system. My task is to build a near real-time dynamic dashboard of my company's sales data, flowing from Google Cloud Storage to BigQuery to Google Data Studio. I want a single source of truth that is continuously updated with current sales data, rather than having to generate a new table each time to do data analysis on.
I have my datasets exported from our POS, and uploaded files to GCS in a bucket. This will occur on a weekly basis pulling sales data from the previous week.
In BigQuery, I use that dataset from GCS to create a table named for its time period (for example, Oct_Wk2_2022).
With my limited background in SQL, I created a saved query that UNIONs the tables and saves the result as a new table. Here's a snippet of the SQL:
SELECT
Date, Time, Category, Item, Qty, Price_Point_Name, SKU, Modifiers_Applied, Gross_Sales, Discounts, Net_Sales, Tax, Location, Customer_Name
FROM
store-sales-daily-item-summary.Square_Sales_2022.Oct_Wk2_2022
UNION ALL
SELECT
Date, Time, Category, Item, Qty, Price_Point_Name, SKU, Modifiers_Applied, Gross_Sales, Discounts, Net_Sales, Tax, Location, Customer_Name
FROM
store-sales-daily-item-summary.Square_Sales_2022.Oct_Wk1_2022
When I export to a BigQuery table, it asks for a destination table - from here, I can't append to an existing table (one created from a previous BigQuery query).
How do I create a scheduled query in SQL to append an existing table when I add a new table into BigQuery? What should my query look like to make that happen?
A MERGE statement from BigQuery's data manipulation language (DML) will update your existing table with the least effort.
MERGE also lets you handle duplicates across the existing and new data.
You can then schedule the MERGE query.
https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_statement
https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement
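As a sketch, a scheduled MERGE into one consolidated table might look like this (the target table name `all_sales` and the key columns used for matching are assumptions; pick columns that uniquely identify a sale in your data):
```sql
-- Sketch: fold a new weekly table into one consolidated table,
-- skipping rows that already exist (simple de-duplication).
MERGE `store-sales-daily-item-summary.Square_Sales_2022.all_sales` T
USING `store-sales-daily-item-summary.Square_Sales_2022.Oct_Wk2_2022` S
ON T.Date = S.Date AND T.Time = S.Time AND T.SKU = S.SKU
WHEN NOT MATCHED THEN
  INSERT ROW;
```
`INSERT ROW` works here because the source and target share the same schema; otherwise list the columns explicitly in the INSERT clause.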

How to create an aggregate table (data mart) that will improve chart performance?

I created a table named user_preferences where user preferences have been grouped by user_id and month.
Each month I collect all user_ids and assign all preferences:
city
district
number of rooms
the maximum price they can spend
The plan is to display a graph showing users' shopping intentions:
The blue line is the number of interested users for the selected values in the filters.
The graph should enable filtering by parameters marked in red.
What you see above is a simplified form to clarify the subject. In fact, there are many more users. Every month, the table grows by several hundred thousand records. The SQL query that retrieves the data feeding the chart takes up to 50 seconds. That's far too long - I can't afford it.
So, I need to create a table (aggregate table / data mart) holding the precalculated number of interested users for all combinations. Thanks to this, the end user will not have to wait for the data to be counted.
Now the question is - how to create such a table in PostgreSQL?
I know how to write a SQL query that will calculate a specific example.
SELECT
month,
count(DISTINCT user_id) interested_users
FROM
user_preferences
WHERE
month BETWEEN '2020-01' AND '2020-03'
AND city = 'Madrid'
AND district = 'Latina'
AND rooms IN (1,2)
AND price_max BETWEEN 400001 AND 500000
GROUP BY
1
The question is - how to calculate all possible combinations? Can I write multiple nested loops in SQL?
The topic is extremely important to me, I think it will also be useful to others for the future.
I will be extremely grateful for any tips.
Well, based on your query, you have the following filters:
month
city
district
rooms
price_max
You can try creating a view with the following structure:
SELECT month
,city
,district
,rooms
,price_max
,count(DISTINCT user_id) AS interested_users
FROM user_preferences
GROUP BY month
,city
,district
,rooms
,price_max
You can make this view materialized, so the query behind the view is not executed each time it is queried. It will behave like a table.
When you add new records to the base table, you will need to refresh the view (unfortunately, PostgreSQL does not support auto-refresh like some other databases):
REFRESH MATERIALIZED VIEW my_view;
or you can schedule a task to do it.
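If the pg_cron extension happens to be installed (an assumption; plain cron calling psql works too), the refresh can be scheduled inside the database itself:
```sql
-- Sketch, assuming the pg_cron extension is available and the
-- materialized view is named my_view: refresh nightly at 03:00.
SELECT cron.schedule('refresh_my_view', '0 3 * * *',
                     'REFRESH MATERIALIZED VIEW my_view');
```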
If you are using only exact search for each field, this will work. But in your example, you have criteria like:
month BETWEEN '2020-01' AND '2020-03'
AND rooms IN (1,2)
AND price_max BETWEEN 400001 AND 500000
In such cases, I usually write the same query but SUM the data from the materialized view. In your case, you are using DISTINCT, and this may lead to counting a user multiple times.
If this is an issue, you would need to precalculate too many combinations, and I doubt that is the answer. Alternatively, you can try to normalize your data - this will improve the performance of the aggregations.
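As a sketch, the range query from the question could be rewritten against the materialized view (assuming the aggregated count is aliased interested_users in the view) - keeping in mind the double-counting caveat above:
```sql
-- Sketch: re-aggregate the precomputed view instead of the base table.
-- Caveat: summing per-group DISTINCT counts can count the same user
-- more than once when a user appears in several groups.
SELECT
    month,
    SUM(interested_users) AS interested_users
FROM my_view
WHERE month BETWEEN '2020-01' AND '2020-03'
  AND city = 'Madrid'
  AND district = 'Latina'
  AND rooms IN (1, 2)
  AND price_max BETWEEN 400001 AND 500000
GROUP BY month;
```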

SQL Server 2005 simple query to get all dates between two dates

I know there are a lot of solutions to this but I am looking for a simple query to get all the dates between two dates.
I cannot declare variables.
As per the comment above, it's just guesswork without your table structures and further detail. Also, are you using a 3NF database or star-schema structures, etc.? Is this a transactional system or a data warehouse?
As a general answer, I would recommend creating a Calendar table; that way you can create multiple columns for Working Day, Weekend Day, Business Day, etc., and add a date key value, starting at 1 and incrementing each day.
Your query then is a very simple sub-select or join to the table to do something like
SELECT date FROM Calendar WHERE date BETWEEN <x> AND <y>
How to create a Calender table for 100 years in Sql
There are other options like creating the calendar table using iterations (e.g., as a CTE) and linking to that.
SQL - Create a temp table or CTE of first day of the month and month names
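A minimal sketch of the CTE approach for SQL Server 2005, with no variables required (the two dates are placeholders; inline your own):
```sql
-- Sketch: generate every date between two fixed dates with a recursive CTE.
-- OPTION (MAXRECURSION 0) lifts the default 100-level recursion limit,
-- which would otherwise cap the range at about 100 days.
WITH dates AS (
    SELECT CAST('2020-01-01' AS datetime) AS d
    UNION ALL
    SELECT DATEADD(day, 1, d)
    FROM dates
    WHERE d < '2020-01-31'
)
SELECT d
FROM dates
OPTION (MAXRECURSION 0);
```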

SQL to JPA @NamedQuery Conversion

I'm looking for an intelligent way to convert this SQL statement into a @NamedQuery - if there is a way at all?
SELECT MONTH(dateField), sum(value) FROM mydb.records where status ='paid' group by MONTH(dateField) order by MONTH(dateField);
I have a JPA @Entity called Record (Hibernate). It details all invoices in the system, which are created on a daily basis. There will be many entries per month. Each record has a status of paid or overdue, a value, and lots of other info such as the name and address of the customer, and so on.
The above statement summarises all the data on a month-by-month basis and sums the value of all paid invoices per month, giving a summary of all invoices paid in January, all paid in February, and so on. The result looks something like:
month value
1 4500
2 5500
3 5669
The only way I can think of doing this with a JPA @NamedQuery is to select all records with status 'paid' and then use my Java code to do the sorting, ordering, and addition in a rather slow and ugly fashion! Is there a clever way I can do this with @NamedQuery?
Thanks
MONTH() is not a standard JPQL function defined in the JPA spec; however, there is an alternative.
Make a view in your database that leverages your database's month function.
create view MONTHLY_REVENUE as
SELECT MONTH(dateField) AS month,
sum(value) AS revenue
FROM mydb.records
where status ='paid'
group by MONTH(dateField)
order by MONTH(dateField);
(The column aliases are needed so the view's columns can be mapped to entity fields.)
Then just map this view as an entity, as you would any other table with JPA. You can select from the view but will not be able to insert into or update it; however, since you're using aggregates, it's not like you would anyway.

How to create a view for "yesterday"s data?

I'm partitioning my data on BigQuery by day, and I want a quick way to query "yesterday's data".
Is this possible? How can I write queries that automatically point to the latest data, without having to re-write the tables I want to query?
You can create a view with TABLE_QUERY to find yesterday's (or an arbitrary relative date's) data.
For example, GitHubArchive stores daily tables, and I created a view that points to yesterday's table:
SELECT *
FROM TABLE_QUERY(githubarchive:day, 'table_id CONTAINS "events_"
AND table_id CONTAINS STRFTIME_UTC_USEC(DATE_ADD(CURRENT_TIMESTAMP(), -1, "day"), "%Y%m%d")')
You can test and query this view:
SELECT COUNT(*)
FROM [fh-bigquery:public_dump.github_yesterday]
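TABLE_QUERY is legacy SQL. In BigQuery standard SQL, a similar view can be sketched with a wildcard table and _TABLE_SUFFIX (assuming daily sharded tables named events_YYYYMMDD, as above); for ingestion-time partitioned tables, filtering on _PARTITIONDATE works the same way:
```sql
-- Sketch: standard-SQL equivalent over daily sharded tables,
-- always pointing at yesterday's shard.
SELECT *
FROM `githubarchive.day.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
        DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));
```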