Rails, find duplicate entries in postgres database

I have the following classes:
class Product < ActiveRecord::Base
has_one :total_purchase
end
class TotalPurchase < ActiveRecord::Base
belongs_to :product
end
My tables have the following columns (I've left out the majority for clarity):
Product
id | title | end_date | price | discount
TotalPurchase
id | product_id | amount
Numerous products share the same products.title, products.end_date and total_purchases.amount, and I'd like to identify all of those between certain dates. I'm currently using the following SQL query to identify them:
Product.find_by_sql("select products.title, products.end_date, total_purchases.amount from products left join total_purchases on products.id = total_purchases.product_id where products.end_date > '2012-08-26 23:00:00' and products.end_date < '2012-09-02 22:59:59' group by products.end_date, products.title, total_purchases.amount having count(*) > 1")
How would I write the above as a rails query rather than using find_by_sql?

Product.select('products.end_date, products.title, total_purchases.amount').
joins('left join total_purchases on products.id = total_purchases.product_id').
where('products.end_date > :start_date
and products.end_date < :end_date',
:start_date => '2012-08-26 23:00:00',
:end_date => '2012-09-02 22:59:59').
group('products.end_date, products.title, total_purchases.amount').
having('count(*) > 1')
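
To make the group/having step concrete, here is a minimal plain-Ruby sketch of the same logic on in-memory rows. The ProductRow struct, the PRODUCT_ROWS sample data and the duplicate_groups helper are made up for illustration; in the real application the database does this work.

```ruby
# Hypothetical stand-in for the joined products/total_purchases rows.
ProductRow = Struct.new(:title, :end_date, :amount)

PRODUCT_ROWS = [
  ProductRow.new("Widget", Time.utc(2012, 8, 28), 10),
  ProductRow.new("Widget", Time.utc(2012, 8, 28), 10), # duplicate of the row above
  ProductRow.new("Gadget", Time.utc(2012, 8, 29), 5),
  ProductRow.new("Widget", Time.utc(2012, 9, 5), 10)   # outside the date window
].freeze

# Mirrors: WHERE end_date > start AND end_date < finish
#          GROUP BY end_date, title, amount HAVING count(*) > 1
def duplicate_groups(rows, start, finish)
  rows
    .select { |r| r.end_date > start && r.end_date < finish }
    .group_by { |r| [r.end_date, r.title, r.amount] }
    .select { |_key, group| group.size > 1 }
    .keys
end

window_start = Time.utc(2012, 8, 26, 23, 0, 0)
window_end   = Time.utc(2012, 9, 2, 22, 59, 59)
p duplicate_groups(PRODUCT_ROWS, window_start, window_end)
```

Each returned key is one (end_date, title, amount) combination that occurs more than once inside the window, which is what the having('count(*) > 1') clause selects.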

Related

Joining 2 different tables where user_id equals to and created_at between 2 dates

In my application, I have 3 models: User, Visit and Post. What I want to achieve is to select the visits and posts that belong to the user where created_at is between 2 dates, group them by day (I am using groupdate for group_by_day) and return a count of visits and posts for each given day.
I have tried several approaches with no luck:
User.joins(:visits,:posts).where("visits.user_id = ?, posts.user_id = ?", current_user.id, current_user.id)
.where("visits.created_at >= ? AND visits.created_at <= ?", 15.days.ago.midnight.in_time_zone("Berlin"), 2.days.ago.midnight.in_time_zone("Berlin"))
.where("posts.created_at >= ? AND posts.created_at <= ?", 15.days.ago.midnight.in_time_zone("Berlin"), 2.days.ago.midnight.in_time_zone("Berlin"))
.group_by_day(:created_at, range: 15.days.ago.midnight.in_time_zone("Berlin")..Time.now.in_time_zone("Berlin"), format: "%d %b")
.count
This gave me: ActiveRecord::StatementInvalid
SQLite3::SQLException: row value misused: SELECT COUNT(*) AS count_all, strftime('%Y-%m-%d 00:00:00 UTC', "users"."created_at") AS strftime_y_m_d_00_00_00_utc_users_created_at FROM "users" INNER JOIN "visits" ON "visits"."user_id" = "users"."id" INNER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE (visits.user_id = 1, posts.user_id = 1) AND (visits.created_at >= '2020-01-20 00:00:00' AND visits.created_at <= '2020-02-02 00:00:00') AND (posts.created_at >= '2020-01-20 00:00:00' AND posts.created_at <= '2020-02-02 00:00:00') AND ("users"."created_at" >= '2020-01-20 00:00:00' AND "users"."created_at" <= '2020-02-04 19:03:28.383850') GROUP BY strftime('%Y-%m-%d 00:00:00 UTC', "users"."created_at")
My Models:
class Visit < ApplicationRecord
belongs_to :user
end
class Post < ApplicationRecord
belongs_to :user
end
class User < ApplicationRecord
has_many :visits
has_many :posts
end
I've been looking around and have tried a few different ways with no luck.
Any help, whether Rails code, plain SQL or just a tip, is much appreciated. Thanks in advance!

Try this one:
start_time = 15.days.ago.midnight.in_time_zone("Berlin") # start of the range
end_time = 2.days.ago.midnight.in_time_zone("Berlin") # end of the range
# Range conditions replace the hand-written SQL strings
User.joins(:visits, :posts)
.where(visits: { created_at: start_time..end_time })
.where(posts: { created_at: start_time..end_time })
.group_by_day(:created_at, range: start_time..Time.now.in_time_zone("Berlin"), format: "%d %b")
.count
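
One thing to keep in mind: joining both has_many associations in a single query pairs each visit with each post of the same user, so a single count may not give separate per-day totals. A possible alternative is to count each collection per day and merge the results. The sketch below shows that idea in plain Ruby with made-up timestamps; count_by_day and daily_summary are hypothetical helpers, and in Rails the two per-day counts would presumably come from two separate queries.

```ruby
require "date"

# Made-up creation timestamps for one user's visits and posts.
VISIT_TIMES = [Time.utc(2020, 1, 21, 9), Time.utc(2020, 1, 21, 17), Time.utc(2020, 1, 23, 8)].freeze
POST_TIMES  = [Time.utc(2020, 1, 21, 12), Time.utc(2020, 1, 30, 12)].freeze

# Count timestamps per calendar day, keeping only those inside the range.
def count_by_day(times, range)
  times
    .select { |t| range.cover?(t) }
    .group_by(&:to_date)
    .transform_values(&:size)
end

# Merge two per-day count hashes into { day => { visits: n, posts: m } }.
def daily_summary(visit_times, post_times, range)
  visits = count_by_day(visit_times, range)
  posts  = count_by_day(post_times, range)
  (visits.keys | posts.keys).sort.map do |day|
    [day, { visits: visits.fetch(day, 0), posts: posts.fetch(day, 0) }]
  end.to_h
end

range = Time.utc(2020, 1, 20)..Time.utc(2020, 1, 25)
daily_summary(VISIT_TIMES, POST_TIMES, range).each do |day, counts|
  puts "#{day}: #{counts[:visits]} visits, #{counts[:posts]} posts"
end
```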

Rails - Complex data modeling relationship between 3 tables

I'm facing the following problem. I'm still fairly new to Rails and need your help.
I have a table SALES with 4 columns: product_id, the quantity of the product sold, the seller and the day of the sale.
I have another table PRODUCT_PRICE with the list of products, the price and the month (the price changes from month to month).
Finally, I have another table PRODUCT with the product, its description and the manufactor_id of the product.
I need to make 2 queries:
- the top 10 manufacturers with the most products sold in the last month
- the top 10 sellers with the most revenue (using the PRODUCT_PRICE table to calculate it) in the last month.
How can I do this with ActiveRecord in Ruby on Rails? I would like to use Rails associations for this.
Thanks!

Not tested and quite difficult without any further information, but if you have the three models setup:
For the first, based on this query:
SELECT manufacturer_id FROM products
LEFT JOIN sales ON sales.product_id = products.id
WHERE YEAR(sales.created_at) = YEAR(CURRENT_DATE - INTERVAL 1 MONTH)
AND MONTH(sales.created_at) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
LIMIT 10
It would be something like this (you don't mention which version of Rails you are using):
@top_manufacturers = Product.find(:all,
:select => 'manufacturer_id',
:joins => 'LEFT JOIN sales ON sales.product_id = products.id',
:conditions => 'YEAR(sales.created_at) = YEAR(CURRENT_DATE - INTERVAL 1 MONTH) AND MONTH(sales.created_at) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)',
:limit => 10)
For the second
SELECT sales.seller_id, SUM(sales.quantity), SUM(product_prices.price) FROM sales
LEFT JOIN products ON products.id = sales.product_id
LEFT JOIN product_prices ON product_prices.product_id = products.id
WHERE YEAR(sales.created_at) = YEAR(CURRENT_DATE - INTERVAL 1 MONTH)
AND MONTH(sales.created_at) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
LIMIT 10
@top_sellers = Sale.find(:all,
:select => 'sales.seller_id, SUM(sales.quantity), SUM(product_prices.price)',
:joins => 'LEFT JOIN products ON products.id = sales.product_id LEFT JOIN product_prices ON product_prices.product_id = products.id',
:conditions => 'YEAR(sales.created_at) = YEAR(CURRENT_DATE - INTERVAL 1 MONTH) AND MONTH(sales.created_at) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)',
:limit => 10)
If you don't have created_at and updated_at columns in all your tables then you should add them - Rails populates them automagically although I guess I should be using your month column.
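
Because the price depends on the month, the revenue of a sale is its quantity multiplied by the price that was valid in the month of the sale. The following is a minimal plain-Ruby sketch of that calculation on in-memory data; SaleRecord, PriceRecord, the sample rows and top_sellers are all hypothetical names for illustration, and a real application would push the work into the database.

```ruby
require "date"

SaleRecord  = Struct.new(:product_id, :quantity, :seller, :sold_on)
PriceRecord = Struct.new(:product_id, :month, :price) # month as [year, month]

SALES_SAMPLE = [
  SaleRecord.new(1, 3, "ann", Date.new(2012, 5, 2)),
  SaleRecord.new(2, 1, "bob", Date.new(2012, 5, 9)),
  SaleRecord.new(1, 2, "bob", Date.new(2012, 5, 20)),
  SaleRecord.new(1, 9, "ann", Date.new(2012, 4, 30)) # different month, ignored below
].freeze

PRICES_SAMPLE = [
  PriceRecord.new(1, [2012, 5], 10),
  PriceRecord.new(2, [2012, 5], 50),
  PriceRecord.new(1, [2012, 4], 8)
].freeze

# Revenue per seller for one [year, month], highest first, limited to `limit`.
def top_sellers(sales, prices, year_month, limit = 10)
  price_for = prices.map { |p| [[p.product_id, p.month], p.price] }.to_h
  sales
    .select { |s| [s.sold_on.year, s.sold_on.month] == year_month }
    .group_by(&:seller)
    .map { |seller, rows| [seller, rows.sum { |s| s.quantity * price_for.fetch([s.product_id, year_month]) }] }
    .sort_by { |_seller, revenue| -revenue }
    .first(limit)
end

p top_sellers(SALES_SAMPLE, PRICES_SAMPLE, [2012, 5])
```

The lookup is keyed on product and month together, which is the part the SQL above leaves out when it joins product_prices on product_id alone.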

count() on where clause from a different table

I have a search function that queries MySQL based on user input. Now I want the WHERE clause to also take into account the count() of likes for each result, which is stored in a different table.
For example, I want to search for "Dog" where count(like_id) >= 1.
The likes table that I want to include is TABLE_GLOBAL_LIKES.
The structure of TABLE_GLOBAL_LIKES is:
Fields: like_id, user_id, products_id, blog_id.
Both TABLE_GLOBAL_PRODUCTS and TABLE_GLOBAL_LIKES have the fields products_id and blog_id in common, which can be used to associate them.
This is my working query, into which I want to include the likes table.
SELECT SQL_CALC_FOUND_ROWS p.*,
CASE
WHEN p.specials_new_products_price > 0.0000
AND (p.expires_date > Now()
OR p.expires_date IS NULL
OR p.expires_date ='0000-00-00 00:00:00')
AND p.status != 0 THEN p.specials_new_products_price
ELSE p.products_price
END price
FROM ".TABLE_GLOBAL_PRODUCTS." p
INNER JOIN ".TABLE_STORES." s ON s.blog_id = p.blog_id
WHERE MATCH (p.products_name,
p.products_description) AGAINST ('*".$search_key."*')
AND p.display_product = '1'
AND p.products_status = '1' HAVING price <= ".$priceto_key."
AND price >= ".$pricefrom_key."
ORDER BY p.products_date_added DESC, p.products_name
I'm a newbie with MySQL queries. Please help.

Try this:
SELECT SQL_CALC_FOUND_ROWS p.*, COUNT(l.like_id) AS like_count,
CASE
WHEN p.specials_new_products_price > 0.0000
AND (p.expires_date > Now()
OR p.expires_date IS NULL
OR p.expires_date ='0000-00-00 00:00:00')
AND p.status != 0 THEN p.specials_new_products_price
ELSE p.products_price
END price
FROM ".TABLE_GLOBAL_PRODUCTS." p
INNER JOIN ".TABLE_STORES." s ON s.blog_id = p.blog_id
INNER JOIN ".TABLE_GLOBAL_LIKES." l ON l.blog_id = p.blog_id AND l.products_id = p.products_id
WHERE MATCH (p.products_name,
p.products_description) AGAINST ('*".$search_key."*')
AND p.display_product = '1'
AND p.products_status = '1'
GROUP BY p.products_id
HAVING price <= ".$priceto_key."
AND price >= ".$pricefrom_key."
AND COUNT(l.like_id) > 0
ORDER BY p.products_date_added DESC, p.products_name
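
The join, group and HAVING steps can be sketched on in-memory data. This is plain Ruby, not the PHP/MySQL code from the question; ProductEntry, LikeEntry, the sample rows and products_with_likes are hypothetical names used only to illustrate counting likes per product and filtering on that count.

```ruby
ProductEntry = Struct.new(:products_id, :blog_id, :name)
LikeEntry    = Struct.new(:like_id, :user_id, :products_id, :blog_id)

PRODUCT_ENTRIES = [
  ProductEntry.new(1, 10, "Dog bed"),
  ProductEntry.new(2, 10, "Dog bowl"), # no likes
  ProductEntry.new(1, 20, "Dog toy")   # same products_id, different blog
].freeze

LIKE_ENTRIES = [
  LikeEntry.new(1, 7, 1, 10),
  LikeEntry.new(2, 8, 1, 10),
  LikeEntry.new(3, 7, 1, 20)
].freeze

# Returns [product, like_count] pairs for products with at least min_likes likes.
# Likes are matched on blog_id and products_id together, as in the join condition.
def products_with_likes(products, likes, min_likes = 1)
  counts = likes.group_by { |l| [l.blog_id, l.products_id] }.transform_values(&:size)
  products
    .map { |p| [p, counts.fetch([p.blog_id, p.products_id], 0)] }
    .select { |_product, count| count >= min_likes }
end

products_with_likes(PRODUCT_ENTRIES, LIKE_ENTRIES).each do |product, count|
  puts "#{product.name}: #{count} like(s)"
end
```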

Overlapping Data

I have a sql query to check overlapping of product records in table PRODUCTS.
In most cases query works fine except for the following.
select * from products where
product_reg_no = 'AL-NAPT'
and (to_date('14-Aug-2001') BETWEEN to_date('27-Aug-2001') AND to_date('30-Aug-2001')
or to_date('31-Aug-2001') BETWEEN to_date('27-Aug-2001') AND to_date('30-Aug-2001'))
How can I make this query catch all records that overlap, either partially or completely?
If required I can provide table structure with sample records.
Thanks
Update 1
I have added the table structure and sample records below:
create table products
(product_reg_no varchar2(32),
start_date date,
end_date date);
Insert into products
(product_reg_no, START_DATE, END_DATE)
Values
('AL-NAPT', TO_DATE('08/14/2012', 'MM/DD/YYYY'), TO_DATE('08/31/2012', 'MM/DD/YYYY'));
Insert into products
(product_reg_no, START_DATE, END_DATE)
Values
('AL-NAPT', TO_DATE('08/27/2012', 'MM/DD/YYYY'), TO_DATE('08/30/2012', 'MM/DD/YYYY'));
COMMIT;
The first record, which runs from August 14, 2012 to August 31, 2012, overlaps with the
second record, which runs from August 27, 2012 to August 30, 2012. So how can I modify my query to detect the overlap?

See Determine whether two date ranges overlap.
You need to evaluate the following, or a minor variant on it using <= instead of <, perhaps:
Start1 < End2 AND Start2 < End1
Since you're working with a single table, you need to have a self-join:
SELECT p1.*, p2.*
FROM products p1
JOIN products p2
ON p1.product_reg_no != p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date;
The not equal condition ensures that you don't get a record paired with itself; the < conditions alone do not, because any row whose start date precedes its end date overlaps itself.
This will generate two rows for each pair of products (one row with ProductA as p1 and ProductB as p2, the other with ProductB as p1 and ProductA as p2). To prevent that happening, change the != into either < or >.
And, looking more closely at the sample data, it might be that you're really interested in rows where the registration numbers match and the dates overlap. In which case, you can ignore my wittering about != and < or > and replace the condition with = after all.
SELECT p1.*, p2.*
FROM products p1
JOIN products p2
ON p1.product_reg_no = p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date;
SQL Fiddle (unsaved) shows that this works:
SELECT p1.product_reg_no p1_reg, p1.start_date p1_start, p1.end_date p1_end,
p2.product_reg_no p2_reg, p2.start_date p2_start, p2.end_date p2_end
FROM products p1
JOIN products p2
ON p1.product_reg_no = p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date
WHERE (p1.start_date != p2.start_date OR p1.end_date != p2.end_date);
The WHERE clause eliminates the rows that are joined to themselves. With the duplicate column names in the SELECT-list eliminated, you get to see all the data. I added a row:
INSERT INTO products (product_reg_no, start_date, end_date)
VALUES ('AL-NAPT', TO_DATE('08/27/2011', 'MM/DD/YYYY'), TO_DATE('08/30/2011', 'MM/DD/YYYY'));
This was not selected — demonstrating that it does reject non-overlapping entries.
If you want to eliminate the double rows, then you have to add another fancy criterion:
SELECT p1.product_reg_no p1_reg, p1.start_date p1_start, p1.end_date p1_end,
p2.product_reg_no p2_reg, p2.start_date p2_start, p2.end_date p2_end
FROM products p1
JOIN products p2
ON p1.product_reg_no = p2.product_reg_no
AND p1.start_date < p2.end_date
AND p2.start_date < p1.end_date
WHERE (p1.start_date != p2.start_date OR p1.end_date != p2.end_date)
AND (p1.start_date < p2.start_date OR
(p1.start_date = p2.start_date AND p1.end_date < p2.end_date));
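
The overlap test Start1 < End2 AND Start2 < End1 can be tried outside the database as well. Below is a minimal plain-Ruby sketch using the three sample rows from this thread; Registration, overlap? and overlapping_pairs are hypothetical names. It uses combination(2) so that each unordered pair is produced once, which plays the same role as the extra ordering criterion in the last query.

```ruby
require "date"

Registration = Struct.new(:reg_no, :start_date, :end_date)

# Strict overlap test: Start1 < End2 AND Start2 < End1.
def overlap?(a, b)
  a.start_date < b.end_date && b.start_date < a.end_date
end

# Each overlapping pair once, for rows sharing a registration number.
def overlapping_pairs(rows)
  rows.combination(2).select { |a, b| a.reg_no == b.reg_no && overlap?(a, b) }
end

REGISTRATIONS = [
  Registration.new("AL-NAPT", Date.new(2012, 8, 14), Date.new(2012, 8, 31)),
  Registration.new("AL-NAPT", Date.new(2012, 8, 27), Date.new(2012, 8, 30)),
  Registration.new("AL-NAPT", Date.new(2011, 8, 27), Date.new(2011, 8, 30)) # previous year, no overlap
].freeze

overlapping_pairs(REGISTRATIONS).each do |a, b|
  puts "#{a.start_date}..#{a.end_date} overlaps #{b.start_date}..#{b.end_date}"
end
```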

This is a strange query. You check if 14-Aug-2001 is between 27-Aug-2001 and 30-Aug-2001 which is always false OR 31-Aug-2001 is between 27-Aug-2001 and 30-Aug-2001 which also always is false. So your where clause will always be false.
Edit: Thanks for clarification
SQL Fiddle Demo
select p1.product_reg_no
, p1.start_date p1s
, p1.end_date p1e
, p2.start_date p2s
, p2.end_date p2e
from products p1, products p2
where p1.product_reg_no = p2.product_reg_no
and not ( p1.end_date < p2.start_date
or p1.start_date > p2.end_date );
What you want are the following scenarios (1 marks the start and end of the first row, 2 those of the second):
1     1
   2     2

   1     1
2     2

1        1
   2  2

   1  1
2        2

1     1
2     2
This can also be turned around to say that you do not want this:
1  1
      2  2

      1  1
2  2
I assumed you also want this:
1  1
   2  2

   1  1
2  2
The WHERE clause could also be written differently:
not ( p1.end_date < p2.start_date or p1.start_date > p2.end_date )
is the same as
p1.end_date >= p2.start_date and p1.start_date <= p2.end_date
I think it was called De Morgan's law when I had that in school eons ago.
You should probably also think about what happens if you have more than 2 rows.
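
De Morgan's law says that NOT (A OR B) is the same as (NOT A) AND (NOT B). Two ranges are disjoint when the first ends before the second starts or the first starts after the second ends, so the overlap test is the negation of that OR. The brute-force sketch below checks both forms against a direct definition. It is plain Ruby, uses small integer intervals in place of dates, and treats touching endpoints as overlapping (matching the >= and <= comparisons); all method names are hypothetical.

```ruby
# Ground truth for closed integer intervals [s, e]: do they share a point?
def share_point?(s1, e1, s2, e2)
  !((s1..e1).to_a & (s2..e2).to_a).empty?
end

# NOT (A OR B), where A and B are the two ways of being disjoint.
def not_disjoint?(s1, e1, s2, e2)
  !(e1 < s2 || s1 > e2)
end

# The same condition after applying De Morgan's law: (NOT A) AND (NOT B).
def de_morgan_form?(s1, e1, s2, e2)
  e1 >= s2 && s1 <= e2
end

# Every interval with endpoints in 0..max and start <= end.
def all_intervals(max = 5)
  (0..max).flat_map { |s| (s..max).map { |e| [s, e] } }
end

# True when both forms match the ground truth for every pair of intervals.
def forms_agree?
  intervals = all_intervals
  intervals.product(intervals).all? do |(s1, e1), (s2, e2)|
    truth = share_point?(s1, e1, s2, e2)
    not_disjoint?(s1, e1, s2, e2) == truth && de_morgan_form?(s1, e1, s2, e2) == truth
  end
end

puts forms_agree?
```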

SQL query that does two GROUP BYs?

I'm having trouble getting the SQL for a report I need to generate. I've got the (equivalent of the) following setup:
A table articles (fields such as name, manufacturer_id, etc).
A table sales, with:
- an FK to articles called article_id
- an integer called amount
- a date field
- a field called type. We can assume it is a string with 3 known values: 'morning', 'evening' and 'night'
I want to generate an aggregated sales report, given a start date and end date:
+--------------+---------------+--------------+-------------+
| article_name | morning_sales | evening_sales| night_sales |
+--------------+---------------+--------------+-------------+
| article 1 | 0 | 12 | 2 |
| article 2 | 80 | 3 | 0 |
... ... ... ... ...
| article n | 37 | 12 | 1 |
+--------------+---------------+--------------+-------------+
I'd like to do it as efficiently as possible. So far I've been able to generate a query that will give me one type of sale (morning, evening or night) but I'm not able to do it for multiple types simultaneously. Is it even possible?
This is what I have so far; it'll give me the article name and morning sales of all the articles in a given period - in other words, the first two columns of the report:
SELECT articles.name as article_name,
SUM(sales.amount) as morning_sales
FROM "sales"
INNER JOIN articles ON articles.id = sales.article_id
WHERE ( sales.date >= '2011-05-09'
AND sales.date <= '2011-05-16'
AND sales.type = 'morning'
)
GROUP BY sales.article_id
I guess I could do the same for evening and night, but the articles will be different; some articles might have sales in the morning but not in the evening, for example.
If I have to do 1 request per sale type, how do I "mix and match" the different article lists I'll get?
Is there a better way to do this (maybe with subqueries of some sort)?
Similarly, I'm able to build a query that gives me all the morning, evening and night sales, grouping by type. I guess my problem is that I need to do two GROUP BYs in order to get this report. I don't know how to do that, if it's possible at all.
I'm using PostgreSQL as my DB, but I'd like to stay as standard as possible. If it helps, the app using this is a Rails app.

Fortunately, you don't need to do multiple queries with your database format. This should work for you:
SELECT
articles.name AS article_name,
SUM(IF(sales.type = 'morning', sales.amount, 0.0)) AS morning_sales,
SUM(IF(sales.type = 'evening', sales.amount, 0.0)) AS evening_sales,
SUM(IF(sales.type = 'night', sales.amount, 0.0)) AS night_sales
FROM sales
JOIN articles ON sales.article_id = articles.id
WHERE
sales.date >= '2011-01-01 00:00:00'
AND sales.date < '2011-02-01 00:00:00'
GROUP BY sales.article_id
And if there are other types, you would have to add more columns there, OR simply sum up the other types by adding this to the SELECT clause:
SUM(
IF(sales.type IS NULL OR sales.type NOT IN ('morning', 'evening', 'night'),
sales.amount, 0.0
)
) AS other_sales
The above is compatible with MySQL. To use it in Postgres, I think you'd have to change the IF expressions to CASE expressions, which should look like this (untested):
SELECT
articles.name AS article_name,
SUM(CASE WHEN sales.type = 'morning' THEN sales.amount ELSE 0.0 END) AS morning_sales,
SUM(CASE WHEN sales.type = 'evening' THEN sales.amount ELSE 0.0 END) AS evening_sales,
SUM(CASE WHEN sales.type = 'night' THEN sales.amount ELSE 0.0 END) AS night_sales
FROM sales
JOIN articles ON sales.article_id = articles.id
WHERE
sales.date >= '2011-01-01 00:00:00'
AND sales.date < '2011-02-01 00:00:00'
GROUP BY articles.id, articles.name
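
The conditional-sum idea, one column per type with a zero default, can also be sketched in plain Ruby on in-memory rows. SaleLine, SHIFT_TYPES, the sample rows and sales_report are hypothetical names used only for illustration; the amounts are chosen to reproduce the first two rows of the example report in the question.

```ruby
SaleLine = Struct.new(:article_name, :type, :amount)

SHIFT_TYPES = %w[morning evening night].freeze

SALE_LINES = [
  SaleLine.new("article 1", "evening", 5),
  SaleLine.new("article 1", "evening", 7),
  SaleLine.new("article 1", "night", 2),
  SaleLine.new("article 2", "morning", 80),
  SaleLine.new("article 2", "evening", 3)
].freeze

# One pass over the rows; each article gets a zero-filled column per type,
# like SUM(CASE WHEN type = '...' THEN amount ELSE 0 END) ... GROUP BY article.
def sales_report(lines)
  report = Hash.new { |hash, name| hash[name] = SHIFT_TYPES.map { |t| [t, 0] }.to_h }
  lines.each do |line|
    next unless SHIFT_TYPES.include?(line.type) # other types are ignored here
    report[line.article_name][line.type] += line.amount
  end
  report
end

sales_report(SALE_LINES).each do |name, sums|
  puts "#{name}: #{sums.values_at(*SHIFT_TYPES).join(' | ')}"
end
```

A single pass with a default of zero is what lets articles with no sales of a given type still show up with 0 in that column.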

Two options.
Option 1. A single join with computed columns for aggregation
select article_name = a.name ,
morning_sales = sum( case when s.type = 'morning' then s.amount end ) ,
evening_sales = sum( case when s.type = 'evening' then s.amount end ) ,
night_sales = sum( case when s.type = 'night' then s.amount end ) ,
other_sales = sum( case when s.type in ( 'morning','evening','night') then null else s.amount end ) ,
total_sales = sum( s.amount )
from articles a
join sales s on s.article_id = a.id
where s.date between @dtFrom and @dtThru
group by a.name
Option 2. Multiple left joins
select article_name = a.name ,
morning_sales = sum( morning.amount ) ,
evening_sales = sum( evening.amount ) ,
night_sales = sum( night.amount ) ,
other_sales = sum( other.amount ) ,
total_sales = sum( total.amount )
from articles a
left join sales morning on morning.article_id = a.id
and morning.type = 'morning'
and morning.date between @dtFrom and @dtThru
left join sales evening on evening.article_id = a.id
and evening.type = 'evening'
and evening.date between @dtFrom and @dtThru
left join sales night on night.article_id = a.id
and night.type = 'night'
and night.date between @dtFrom and @dtThru
left join sales other on other.article_id = a.id
and ( other.type is null
OR other.type not in ('morning','evening','night')
)
and other.date between @dtFrom and @dtThru
left join sales total on total.article_id = a.id
and total.date between @dtFrom and @dtThru
group by a.name
