I am currently creating a webshop for my local parts store using PrestaShop 1.7.8.6.
I developed the scripts myself and have the website working correctly.
But with 2 million product rows, each with around 30 columns, and multiple joins, I can't get a decent loading time on the getProducts query.
Even with indexes and cache...
I use a simple query on the PrestaShop product table and join the product IDs from a car-filter table against ps_product.
I would like to know if it would be better to create a table for each vehicle (keyed by its ID), fill it with the matching ps_product data, and query that table alone instead of using multiple joins.
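For illustration, this is roughly what I mean by a per-vehicle table (the car-filter table and its column names below are placeholders for my actual schema):
-- Hypothetical per-vehicle copy of the matching ps_product rows,
-- one such table per vehicle (here vehicle 123):
CREATE TABLE product_vehicle_123 ENGINE=InnoDB
AS
SELECT p.*
FROM ps_product p
JOIN car_filter cf ON cf.id_product = p.id_product
WHERE cf.id_vehicle = 123;

-- getProducts would then read this table directly, with no joins:
SELECT * FROM product_vehicle_123;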
I'm using InnoDB as the engine.
Thanks
PrestaShop does not have great performance with such a huge amount of data/products, as you have seen using the native methods, so the best option is to strengthen your MySQL setup.
Consider using one or more dedicated machines for MySQL (with replication),
saving your data in external tables, or storing it in a distributed NoSQL/search system built to deal with large amounts of data (like Elasticsearch or similar), so you can scale it easily and write your own code/module to retrieve what you need.
Related
I have a Postgres question/challenge.
There is a table 'products' with a lot of rows (millions).
Each product is of a certain Class.
Each product has a number of features of different type:
A Feature could be 'color' and the value is a picklist of all colors.
A Feature could be 'voltage' with a numerical value from (low) 220 to (high) 240.
There can be up to 100 features for each product.
What is done is to put all features of a product in a many-side table (with the products table as the one-side).
So, this table is even bigger (much bigger).
Standard query (no Feature-filters)
A query comes along for all products of that Class. This can result in a lot of products, so pagination is implemented in the SQL query.
I solved this by querying the products table first, then running a separate query on the feature table to gather all features for the products in that batch, and adding them to the result (in the NodeJS API application).
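Roughly, that looks like the sketch below (table and column names are simplified stand-ins, not my real schema):
-- First query: one page of products for the class (pagination done in the database)
SELECT id, name
FROM products
WHERE class_id = $1
ORDER BY id
LIMIT 25 OFFSET 0;

-- Second query: all features for just the products on that page,
-- merged into the response by the NodeJS API
SELECT product_id, feature_name, feature_value
FROM product_features
WHERE product_id = ANY($2);  -- the ids returned by the first query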
Problem with using a Feature-filter
But now a new requirement comes along: return the products of a certain Class that also match a value for a certain feature.
It is not possible to use the same method as before and simply filter out all products not matching the value for the specific feature in the request,
because post-processing the database result and removing products (those not matching the feature value) messes up the pagination (which comes from the database).
Possible Solutions
The following solutions I have already thought of:
Go the MongoDB way
Just put everything about a product in one record, and use arrays in Postgres for the features.
The downside is that arrays can become quite large, and I don't know how Postgres performs on very large records.
(Maybe I should go with MongoDB, fed from Postgres, just to handle these requests.)
Any tips here?
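For what it's worth, one way I could picture this single-record variant is with a jsonb column instead of plain arrays (purely a sketch, column names made up):
-- Features folded into the product row as jsonb
ALTER TABLE products ADD COLUMN features jsonb;

-- GIN index so containment filters on features can use an index
CREATE INDEX products_features_gin ON products USING gin (features);

-- Products of a class whose 'color' feature is 'red', still paginated in SQL
SELECT id, name
FROM products
WHERE class_id = 42
  AND features @> '{"color": "red"}'
ORDER BY id
LIMIT 25 OFFSET 0;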
Forget pagination from the database
Just do not do the pagination in the database and handle it in NodeJS instead. Then I can do the post-processing in JavaScript.
But I need to use a WHERE clause for filtering (not LIMIT/OFFSET), which makes it quite complex and costs a lot of memory in the NodeJS application.
This is not the best solution.
Use another technique?
I'm not familiar with Data Warehousing techniques, but is there a solution lurking in that area?
Current stack is Python, Postgres, NodeJS for the API. Any other tools which can help me?
I'm working in SQL Workbench in Redshift. We have daily event tables for customer accounts, the same format each day just with updated info. There are currently 300+ tables. For a simple example, I would like to extract the top 10 rows from each table and place them in 1 table.
Table name format is Events_001, Events_002, etc. Typical values are Customer_ID and Balance.
Redshift does not appear to support declaring variables, so I'm a bit stuck.
You've effectively invented a kind of pseudo-partitioning, where you manually partition the data by day.
To manually recombine the tables, create a view to union everything together:
CREATE VIEW
events_combined
AS
SELECT 1 AS partition_id, * FROM events_001
UNION ALL
SELECT 2 AS partition_id, * FROM events_002
UNION ALL
SELECT 3 AS partition_id, * FROM events_003
-- ...and so on, one SELECT per daily events_ table
That's a hassle: you need to recreate the view every time you add a new table.
That's why most modern databases have partitioning schemes built into them, so all the boilerplate is taken care of for you.
But Redshift doesn't do that. So, why not?
In general, because Redshift has many alternative mechanisms for dividing and conquering data. It's columnar, so you can avoid reading columns you don't use. It's horizontally partitioned across multiple nodes (sharded) to share the load with large volumes of data. It's sorted and compressed in pages to avoid loading rows you don't want or need. It has dirty pages for newly arriving data, which can then be cleaned up with a VACUUM.
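For example, a single combined table sketched along these lines (column names assumed from the question) lets sort and distribution keys do the work the daily tables are doing now:
-- Hypothetical single events table replacing the per-day tables
CREATE TABLE events (
    event_date   DATE          NOT NULL,
    customer_id  BIGINT        NOT NULL,
    balance      DECIMAL(18,2)
)
DISTKEY (customer_id)
SORTKEY (event_date);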
So, I would agree with others that it's not normal practice. Yet, Amazon themselves do have a help page (briefly) describing your use case.
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html
So, I'd disagree with "never do this". Still, it is a strong indication that you've accidentally walked into an anti-pattern and should seriously reconsider your design.
As others have pointed out, many small tables in Redshift are really inefficient, even terrible if taken to the extreme. But that is not your question.
You want to know how to perform the same query on multiple tables from SQL Workbench. I'm assuming you are referring to SQL Workbench/J. If so, you can define variables in the bench and use these variables in queries. Then you just need to update the variable and rerun the query. Now, SQL Workbench/J doesn't offer any looping or scripting capabilities, so if you want to loop you will need to wrap the bench in a script (like a BAT file or a bash script).
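For example, something like this in the bench (from memory, so double-check the variable syntax in the SQL Workbench/J docs):
-- Define a variable and reference it in the query
WbVarDef current_table=events_001;

SELECT Customer_ID, Balance
FROM $[current_table]
LIMIT 10;

-- Point the variable at the next table and re-run
WbVarDef current_table=events_002;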
My preference is to write a jinja template with the SQL in it, along with any looping and variable substitution. Then apply a JSON file with the table names and, presto, you have all the SQL for all the tables in one file. I just need to run this, usually with the psql CLI, but at times I import it into my bench.
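Something along these lines (the template file name, the target table, and the tables variable fed in from the JSON are all just examples):
-- top10.sql.j2: the loop emits one INSERT per table name supplied in the JSON
{% for t in tables %}
INSERT INTO events_top10
SELECT * FROM (
    SELECT '{{ t }}' AS source_table, Customer_ID, Balance
    FROM {{ t }}
    LIMIT 10
) AS t10;
{% endfor %}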
My advice is to treat Redshift as a query execution engine and use an external environment (Lambda, EC2, etc) for the orchestration of what queries to run and when. Many other databases (try to) provide a full operating environment inside the database functionality. Applying this pattern to Redshift often leads to problems. Use Redshift for what it is great at and perform the other actions elsewhere. In the end you will find that the large AWS ecosystem provides extended capabilities as compared to other databases, it's just that these aren't all done inside of Redshift.
I'm working on a project where we need to display BigQuery results in a table within a web application.
We've built the feature by paging, sorting and searching directly in BigQuery, but the performance isn't what you would expect of a modern web application. It takes several seconds to apply a search term or change a page.
I can't really share much code, but this is a general question that applies to any large result set generated in BigQuery.
For a little bit of context: we create a view in BigQuery by joining a product catalog to orders.
-- name and sku are selected in both CTEs so the CONCAT join below can use them
WITH Catalog AS (
  SELECT
    productId,
    name,
    sku
  FROM `CatalogTable`
),
Orders AS (
  SELECT
    p.productId,
    p.name,
    p.sku,
    SUM(p.qty) AS qty
  FROM `OrdersView` AS o, o.products AS p
  GROUP BY p.productId, p.name, p.sku
)
SELECT
  c.productId,
  IF(o.qty IS NULL, 0, o.qty) AS qty,
  ROW_NUMBER() OVER (ORDER BY IF(o.qty IS NULL, 0, o.qty) DESC) AS salesRank
FROM Catalog AS c
LEFT JOIN Orders AS o
  ON CONCAT(c.name, c.sku) = CONCAT(o.name, o.sku)
And the view is queried like so:
SELECT ...
FROM `catalog` c
LEFT JOIN `catalogView` cv
  ON c.productId = cv.productId
WHERE c.name LIKE '%searchTerm%'
LIMIT 10
OFFSET 0
What are the options for making this grid-view perform as it would if it were built on a traditional SQL database (or close to the performance)?
I've considered clustering, but I don't believe this is an option since I'm not partitioning the table:
https://medium.com/google-cloud/bigquery-optimized-cluster-your-tables-65e2f684594b
NOTES:
It's acceptable for the results to be a little delayed, if streaming the results into another database is an option.
The query is called via a WebApi endpoint and displayed in an Angular grid-view.
New orders are imported every 15 minutes so the results from this query won't be entirely static, they can change periodically.
The data grid must support paging, sorting and searching, and the grid could contain 10,000+ results.
BigQuery should not be used if you expect OLTP behavior or performance.
In your case, if you want to keep your project on GCP and also keep your data model as similar as possible to the one you already have, I would suggest you take a look at Cloud SQL and Cloud Spanner.
Both are fully managed relational databases. The main difference is that Cloud Spanner is horizontally scalable whereas Cloud SQL is not, i.e. if you need only one node, use Cloud SQL; if you need to grow your cluster, use Cloud Spanner.
Furthermore, both of them have their respective Web APIs. You can find the Cloud Spanner Web API reference here. For Cloud SQL, the reference depends on which DBMS you choose: SQL Server, MySQL or PostgreSQL.
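As a rough illustration of the payoff, once the BigQuery results are synced into, say, a PostgreSQL Cloud SQL instance (table and column names below are assumptions), the grid queries become ordinary indexed queries:
-- Hypothetical serving table, refreshed from the BigQuery view every 15 minutes
CREATE TABLE catalog_sales (
    productId  TEXT PRIMARY KEY,
    name       TEXT,
    qty        INTEGER,
    salesRank  INTEGER
);
CREATE INDEX catalog_sales_rank_idx ON catalog_sales (salesRank);

-- Paging, sorting and searching for the grid
SELECT productId, name, qty, salesRank
FROM catalog_sales
WHERE name ILIKE '%searchTerm%'
ORDER BY salesRank
LIMIT 10 OFFSET 0;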
I hope this helps.
I am a newbie when it comes to the effect of database design on performance. I am creating an app which uses Hibernate to query the DB. I am trying to create a DB schema for the application and I'm torn between using CLOBs and separate tables for some of the data. I have the following tables:
Person(id, name)
Address(person_id,address(varchar2))
products_bought(person_id, product_name(varchar2))
places_visited(person_id, place_name(varchar2))
..few more tables
I am sure I need to write/read the data to/from all these tables every time. I'm thinking of designing them this way to reduce the number of tables, thereby reducing the number of joins I need to make, and to make it easy for Hibernate to fetch the info in one go:
Person(id, name, products_bought(CLOB), places_visited(CLOB))
Address(person_id,address(varchar2))
..few more tables
Now I have come across many posts online arguing that performance will take a hit when using CLOBs. The same goes for JOINs. How do I decide which is better, given that the Person table will not have more than 10K rows?
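To make the two options concrete, this is roughly what I am comparing (simplified Oracle-style DDL; the second table gets a different name here only so both sketches can exist side by side):
-- Option 1: separate child tables, joined at read time
CREATE TABLE person (
    id    NUMBER PRIMARY KEY,
    name  VARCHAR2(100)
);
CREATE TABLE products_bought (
    person_id     NUMBER REFERENCES person(id),
    product_name  VARCHAR2(200)
);
CREATE TABLE places_visited (
    person_id   NUMBER REFERENCES person(id),
    place_name  VARCHAR2(200)
);

-- Option 2: child data folded into the parent row as CLOBs
CREATE TABLE person_with_clobs (
    id               NUMBER PRIMARY KEY,
    name             VARCHAR2(100),
    products_bought  CLOB,
    places_visited   CLOB
);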
I will be starting a project soon using SQL Server 2012 where I will be required to provide real-time querying of database tables with in excess of 4 billion records in one of the tables alone. I am fairly familiar with SQL Server (I have indexes on the relevant columns), but I have never had to deal with databases this large before.
I have been looking into partitioning and am fairly confident at using it, however it is only available in the Enterprise version(?) for which the licenses are WAY too expensive. Column Store indexes also look promising, but as well as only being in Enterprise version, they also render your table read-only(??). Another option is to archive data as soon as it is not being used in live so that I keep as little data in the live tables as possible.
The main queries on the largest table will be on a NVARCHAR(50) column which contains an ID. Initial testing with 4 billion records using a query to pull a single record based on the ID is taking in excess of 5 mins even with indexing. So my question is (and sorry if it sounds naive!): can somebody please suggest a way to speed up the queries on this table that I haven't mentioned (and therefore don't know about)? Many thanks in advance.
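For reference, the shape of what I have today is roughly this (table and column names are anonymised placeholders):
-- Simplified version of the current table
CREATE TABLE dbo.BigTable (
    RowId      BIGINT IDENTITY(1,1) NOT NULL,
    ExternalId NVARCHAR(50) NOT NULL,
    -- ...other columns...
    CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (RowId)
);

-- Nonclustered index used by the single-record lookup
CREATE NONCLUSTERED INDEX IX_BigTable_ExternalId
    ON dbo.BigTable (ExternalId);

-- The lookup that currently takes over 5 minutes
SELECT *
FROM dbo.BigTable
WHERE ExternalId = N'ABC123';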