Create columns from rows in SQL [duplicate]

This question already has answers here:
PostgreSQL Crosstab Query
I have a table of historical prices of a product, that keeps track of product price changes over time, something like:
CREATE TABLE product (
    product_id int,
    price_date date,
    price numeric
);
I would like to display data in the format:
[product_id]  [2022-12-05]  [2022-12-19]  [2022-12-31]  [2023-01-03]
112           4.23          4.5           4.5           4.86
113           3.98          3.91          5.39          5.45
I.e., one row would contain the product id and its price on each date, for however many dates there are. I don't know the number of distinct dates in advance. Most products will have updates on the same dates; if a value is missing, it's OK to display null for it.
Is this even possible in SQL, or is my best bet to do this in the application logic? I'm using PostgreSQL, if that makes a difference.

The Postgres Crosstab feature has already been covered here. Since the columns you want are dates, I suspect you need those to be dynamic. In that case, you're going to have to write yourself a function that reads the distinct values of price_date and executes a query string that's constructed on the fly.
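
A minimal sketch of that two-step idea (read the distinct dates, then build and run a query string), written in Python with SQLite as a stand-in for PostgreSQL so that it runs anywhere. The table and column names come from the question; the function name `pivot_prices` and the sample rows are made up for illustration. It uses plain conditional aggregation rather than crosstab, which is an assumption on my part, not necessarily what the answer had in mind.

```python
import sqlite3

def pivot_prices(conn):
    """Return (headers, rows) with one column per distinct price_date."""
    # Step 1: read the distinct dates that will become columns.
    dates = [r[0] for r in conn.execute(
        "SELECT DISTINCT price_date FROM product ORDER BY price_date")]
    # Step 2: build one conditional-aggregate column per date.
    # Date values are bound as parameters; only the column aliases, which
    # cannot be parameterized, are interpolated (with double quotes escaped).
    cols = ", ".join(
        'MAX(CASE WHEN price_date = ? THEN price END) AS "{}"'.format(
            d.replace('"', '""'))
        for d in dates)
    sql = "SELECT product_id{} FROM product GROUP BY product_id ORDER BY product_id".format(
        ", " + cols if cols else "")
    cur = conn.execute(sql, dates)
    headers = [c[0] for c in cur.description]
    return headers, cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id int, price_date date, price numeric)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (112, "2022-12-05", 4.23), (112, "2022-12-19", 4.5),
    (113, "2022-12-05", 3.98), (113, "2023-01-03", 5.45),
])
headers, rows = pivot_prices(conn)
print(headers)
for row in rows:
    print(row)
```

A product with no price on a given date gets NULL (None in Python) in that column, which matches what the question asks for. In PostgreSQL the same string could be built in the application like this, or inside a PL/pgSQL function and run with EXECUTE.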


How to create an aggregate table (data mart) that will improve chart performance?

I created a table named user_preferences where user preferences have been grouped by user_id and month.
Table:
Each month I collect all user_ids and assign all preferences:
city
district
number of rooms
the maximum price they can spend
The plan assumes displaying a graph showing users' shopping intentions like this:
The blue line is the number of interested users for the selected values in the filters.
The graph should enable filtering by parameters marked in red.
What you see above is a simplified form to clarify the subject. In fact, there are many more users. Every month, the table grows by several hundred thousand records. The SQL query that retrieves the data for the chart takes up to 50 seconds. That's far too long - I can't afford it.
So, I need to create a table (table/aggregation/data mart) where I can insert the precalculated number of interested users for all combinations. Thanks to this, the end user will not have to wait for the data to be counted.
Now the question is - how to create such a table in PostgreSQL?
I know how to write a SQL query that will calculate a specific example.
SELECT
    month,
    count(DISTINCT user_id) AS interested_users
FROM
    user_preferences
WHERE
    month BETWEEN '2020-01' AND '2020-03'
    AND city = 'Madrid'
    AND district = 'Latina'
    AND rooms IN (1,2)
    AND price_max BETWEEN 400001 AND 500000
GROUP BY
    1
The question is - how to calculate all possible combinations? Can I write multiple nested loop in SQL?
The topic is extremely important to me, I think it will also be useful to others for the future.
I will be extremely grateful for any tips.
Well, based on your query, you have the following filters:
month
city
district
rooms
price_max
You can try creating a view with the following structure:
SELECT month
      ,city
      ,district
      ,rooms
      ,price_max
      ,count(DISTINCT user_id) AS interested_users
FROM user_preferences
GROUP BY month
        ,city
        ,district
        ,rooms
        ,price_max
You can make this view materialized, so the query behind the view is not executed each time the view is queried; it behaves like a table.
When you add new records to the base table you will need to refresh the view (unfortunately, PostgreSQL does not support auto-refresh like some other databases):
REFRESH MATERIALIZED VIEW my_view;
or you can schedule a task to do it.
This works if you only ever search on exact values for each field. But your example has range and set criteria such as:
month BETWEEN '2020-01' AND '2020-03'
AND rooms IN (1,2)
AND price_max BETWEEN 400001 AND 500000
In such cases, I usually write the same query but SUM the data from the materialized view. In your case, however, you are using COUNT(DISTINCT ...), so a user who matches several of the selected combinations (for example, both 1 and 2 rooms) would be counted once per combination.
If this is an issue, you would need to precalculate far too many combinations, and I doubt that is the answer. Alternatively, you can try to normalize your data; this will improve the performance of the aggregations.
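
To make the double-counting concrete, here is a small sketch in Python with SQLite. SQLite has no materialized views, so a table created with CREATE TABLE ... AS stands in for one; the name `user_counts` and the three sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_preferences (
    user_id INTEGER, month TEXT, city TEXT, district TEXT,
    rooms INTEGER, price_max INTEGER)""")
# User 1 is interested in both 1-room and 2-room flats in the same month.
conn.executemany("INSERT INTO user_preferences VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "2020-01", "Madrid", "Latina", 1, 450000),
    (1, "2020-01", "Madrid", "Latina", 2, 450000),
    (2, "2020-01", "Madrid", "Latina", 2, 450000),
])
# Stand-in for the materialized view: one row per filter combination.
conn.execute("""CREATE TABLE user_counts AS
    SELECT month, city, district, rooms, price_max,
           COUNT(DISTINCT user_id) AS interested_users
    FROM user_preferences
    GROUP BY month, city, district, rooms, price_max""")

# Summing the pre-aggregated counts across rooms IN (1, 2) ...
summed = conn.execute("""SELECT SUM(interested_users) FROM user_counts
    WHERE rooms IN (1, 2)""").fetchone()[0]
# ... versus counting distinct users directly on the base table.
exact = conn.execute("""SELECT COUNT(DISTINCT user_id) FROM user_preferences
    WHERE rooms IN (1, 2)""").fetchone()[0]
print(summed, exact)
```

Here the summed figure is 3 while only 2 distinct users exist, because user 1 appears in both the 1-room and the 2-room group.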

SQL data sorting by column [duplicate]

This question already has answers here:
How does sql server sort your data?
I am facing an issue I can't handle yet.
Here's the deal: I am working on a program which should monitor employees working hours. So far, I created a SQL Server table called TablicaSQL with 4 columns:
Id, Ime (Name), Datum (date), BrojSati (WorkingHours)
Rows are displayed in the order in which they were saved.
Example: if I enter that Kristijan (name) worked 4 hours on 2017-11-03, and tomorrow I save that Kristijan worked 4 hours on 2017-11-01, the table shows whichever row was saved first, which in this case is 2017-11-03.
So my question is: how can I sort my data by the column Datum (date), NOT by the date the data was saved?
Also, I am not looking for query which says something like this:
SELECT *
FROM..
ORDER BY...ASC/DESC
I need some kind of "permanently asc/desc query".
There is no permanent order on a database table; a table is an unordered set of rows. The data isn't ordered by the date of creation. It is simply returned in whatever order the engine reads it from storage, and that can change if the query optimizer finds a better way to read the data (multiple partitions, clusters, etc.).
If you want the data returned in a specific order, YOU MUST include ORDER BY.
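
One way to get something that feels "permanent" from the program's side is to keep the ORDER BY in a single place that every read goes through. A minimal sketch, using Python with SQLite as a stand-in for SQL Server; the helper name `fetch_hours` is hypothetical, and the table and columns are the ones from the question.

```python
import sqlite3

def fetch_hours(conn):
    """Return all rows sorted by Datum; the sort lives in the query, not the table."""
    return conn.execute(
        "SELECT Ime, Datum, BrojSati FROM TablicaSQL ORDER BY Datum ASC, Id ASC"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TablicaSQL (
    Id INTEGER PRIMARY KEY, Ime TEXT, Datum TEXT, BrojSati INTEGER)""")
# Rows are saved out of date order, as in the question.
conn.executemany(
    "INSERT INTO TablicaSQL (Ime, Datum, BrojSati) VALUES (?, ?, ?)", [
        ("Kristijan", "2017-11-03", 4),
        ("Kristijan", "2017-11-01", 4),
    ])
rows = fetch_hours(conn)
print(rows)
```

The 2017-11-01 row comes back first even though it was saved second, because the order is decided when the data is read, not when it is stored.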

How to "prefer" data from one table over another in SQL, help reformulating.

Using SQL Server Management Studio on SQL Server 2005.
I guess my title is not very clear, so although I'd appreciate a direct answer, even better would be help in reformulating the question so that I can search for an answer by myself and learn in the process.
Table A contains yearly total sales for various products. This table is updated once a year in Feb/March. There are two consequences to that: a product that started being sold in, say, July will obviously only have a six-month sales total for the year. Also, any new product will not appear in it before the following year.
Table B is an exception table that gives an estimate for all products that are not yet in table A or that have less than twelve months of sales history.
Table C joins the yearly sales with various info from other tables and also contains calculated columns.
In pseudo code what I need is:
Use sales of product x from Table A in table C unless sales for this product exist in table B.
Basically, data from table B should take precedence over table A whether the record exists in A or not.
If you do a LEFT JOIN and no records exist in B, you will get NULL for the column values from B. So to "prefer" values that exist in B, use COALESCE:
SELECT COALESCE(B.Sales, C.Sales) AS Sales
FROM ...
Which loosely translates to "If B has a Sales value, use it, otherwise use the value from C."
Technically it means "take the first non-null value from B.Sales and C.Sales (in that order)"
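
A small runnable sketch of the LEFT JOIN plus COALESCE pattern, using Python with SQLite as a stand-in for SQL Server 2005. The table names `table_a` and `table_b`, the `product` and `sales` columns, and the sample figures are all assumptions for illustration, since the question does not show its schema. The product list is built with a UNION so that products present only in B still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (product TEXT PRIMARY KEY, sales INTEGER)")
conn.execute("CREATE TABLE table_b (product TEXT PRIMARY KEY, sales INTEGER)")
# x: only in A; y: in both (B should win); z: only in B.
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [("x", 1200), ("y", 300)])
conn.executemany("INSERT INTO table_b VALUES (?, ?)", [("y", 650), ("z", 500)])

rows = conn.execute("""
    SELECT p.product, COALESCE(b.sales, a.sales) AS sales
    FROM (SELECT product FROM table_a
          UNION
          SELECT product FROM table_b) AS p
    LEFT JOIN table_a AS a ON a.product = p.product
    LEFT JOIN table_b AS b ON b.product = p.product
    ORDER BY p.product
""").fetchall()
print(rows)
```

Product y takes its value from B even though A also has one, and z appears although A has no row for it.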

SQL Database design question

I have a task to design a SQL database that will record values for the company's commodity products (sku numbers). The company has 7600 unique product items that need to be tracked, and each product will have approximately 200 values over the period of a year (one value per product per day, over the period of a year).
My first guess is that the sku numbers go top to bottom (each sku has a row) and each date is a column.
The data will be used to view in chart / graph format and additional calculations will be displayed against those (such as percentage profit margin etc)
My question is:
- is this layout advisable?
- do I have to be cautious of anything, if this type of data goes back about 15 yrs (each table will represent a year)
Any suggestions?
It is better to have only 3 columns, instead of the many you are suggesting:
sku   date         value
-------------------------
1     2011-01-01   10
1     2011-01-02   12
2     2011-01-01   5
This way you can easily add another column if you want to record something else about a given product per date.
I would suggest a table for your products, and a table for the historical values. Maybe create an index for the historical values based on date if you plan to select for specific time periods.
create table products (
    id integer primary key,
    sku integer,
    name text,
    description text  -- "desc" is a reserved word
);
create table product_values (  -- "values" is a reserved word
    id integer primary key,
    product_id integer,
    value_date date,  -- avoids the type name "timestamp"
    value numeric,
    foreign key (product_id) references products (id)
);
create index idx_value_date on product_values (value_date);
NOTE: a sketch only; adjust the types and names for your database
If you do it the way @fiver wrote, you don't need a table for each year either. Keep everything in one table, and add an index on sku/date for faster searching.
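
A minimal sketch of the single-table layout with a sku/date index, using Python with SQLite so it can be run as-is. The table name `sku_values`, the column names, and the sample rows (including the 2010 row) are invented for the example; here the composite primary key doubles as the sku/date index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One narrow table for every year; (sku, value_date) is both key and index.
conn.execute("""CREATE TABLE sku_values (
    sku INTEGER NOT NULL,
    value_date TEXT NOT NULL,
    value REAL NOT NULL,
    PRIMARY KEY (sku, value_date))""")
conn.executemany("INSERT INTO sku_values VALUES (?, ?, ?)", [
    (1, "2010-12-31", 9.0),
    (1, "2011-01-01", 10.0),
    (1, "2011-01-02", 12.0),
    (2, "2011-01-01", 5.0),
])
# Fetch one SKU for one year; other years simply fall outside the range.
rows = conn.execute("""SELECT value_date, value FROM sku_values
    WHERE sku = ? AND value_date BETWEEN ? AND ?
    ORDER BY value_date""", (1, "2011-01-01", "2011-12-31")).fetchall()
print(rows)
```

Selecting a different year is only a change of the date range, so no per-year tables are needed.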

What is a fast way of joining two tables and using the first table column to "filter" the second table?

I am trying to develop a SQL Server 2005 query, but I have been unsuccessful so far. I have tried every approach I know, like derived tables, sub-queries, CTEs, etc., but I couldn't solve the problem. I won't post the queries I tried here because they involve many other columns and tables, but I will try to explain the problem with a simpler example:
There are two tables: PARTS_SOLD and PARTS_PURCHASED. The first contains products that were sold to customers, and the second contains products that were purchased from suppliers. Both tables contain a foreign key to the movement itself, which holds the dates, etc.
Here is the simplified schema:
Table PARTS_SOLD:
part_id
date
other columns
Table PARTS_PURCHASED
part_id
date
other columns
What I need is to join every row in PARTS_SOLD with a unique row from PARTS_PURCHASED, chosen by part_id and the maximum "date" that is equal to or before the "date" column of PARTS_SOLD. In other words, for every sale of an item, I need to collect some information from the last purchase of that item.
The problem is that I haven't found a way of joining the PARTS_PURCHASED table with the PARTS_SOLD table that uses the column "date" from PARTS_SOLD to limit the MAX(date) of the PARTS_PURCHASED table.
I could have done this with a cursor to solve the problem with the tools I know, but every table has millions of rows, and perhaps using cursors or sub-queries that evaluate a query for every row would make the process very slow.
You aren't going to like my answer. Your database is designed incorrectly which is why you can't get the data back out the way you want. Even using a cursor, you would not get good data from this. Assume that you purchased 5 of part 1 on May 31, 2010. Assume on June 1, you sold ten of part 1. Matching just on date, you would match all ten to the May 31 purchase even though that is clearly not correct, some parts might have been purchased on May 23 and some may have been purchased on July 19, 2008.
If you want to know which purchased part relates to which sold part, your database design should include the PartPurchasedID as part of the PartsSold record and this should be populated at the time of the purchase, not later for reporting when you have 1,000,000 records to sort through.
Perhaps the following would help:
SELECT S.*
FROM PARTS_SOLD S
INNER JOIN (SELECT PART_ID, MAX(DATE) AS MAX_DATE
            FROM PARTS_PURCHASED
            GROUP BY PART_ID) D
ON (D.PART_ID = S.PART_ID)
WHERE D.MAX_DATE <= S.DATE
Share and enjoy.
I'll toss this out there, but it's likely to contain all kinds of mistakes... both because I'm not sure I understand your question and because my SQL is... weak at best. That being said, my thought would be to try something like:
SELECT * FROM PARTS_SOLD
INNER JOIN (SELECT part_id, max(date) AS max_date
FROM PARTS_PURCHASED
GROUP BY part_id) AS subtable
ON PARTS_SOLD.part_id = subtable.part_id
AND PARTS_SOLD.date < subtable.max_date
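
For comparison, here is a sketch of the "latest purchase on or before each sale" join that the question describes, using a correlated subquery so that the MAX(date) is limited per sale row rather than per part. It is written in Python with SQLite as a stand-in for SQL Server 2005; the `supplier` column and the sample rows are hypothetical placeholders for the "other columns", and this is one possible formulation, not the method of either answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PARTS_PURCHASED (part_id INTEGER, date TEXT, supplier TEXT)")
conn.execute("CREATE TABLE PARTS_SOLD (part_id INTEGER, date TEXT)")
conn.executemany("INSERT INTO PARTS_PURCHASED VALUES (?, ?, ?)", [
    (1, "2010-05-23", "north"),
    (1, "2010-05-31", "south"),
    (1, "2010-06-15", "east"),
])
conn.executemany("INSERT INTO PARTS_SOLD VALUES (?, ?)", [
    (1, "2010-05-25"),
    (1, "2010-06-01"),
])
# For each sale, pick the purchase row whose date is the latest one
# that is on or before the sale date for the same part.
rows = conn.execute("""
    SELECT s.part_id, s.date AS sold_on, p.date AS purchased_on, p.supplier
    FROM PARTS_SOLD AS s
    JOIN PARTS_PURCHASED AS p
      ON p.part_id = s.part_id
     AND p.date = (SELECT MAX(p2.date)
                   FROM PARTS_PURCHASED AS p2
                   WHERE p2.part_id = s.part_id
                     AND p2.date <= s.date)
    ORDER BY s.date
""").fetchall()
print(rows)
```

Sales with no earlier purchase drop out of the result because of the inner join, and two purchases of the same part on the same date would both match. With millions of rows, an index on PARTS_PURCHASED (part_id, date) is probably the first thing to try; on SQL Server 2005, CROSS APPLY with TOP 1 is another way to express the same lookup.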