Joining column for multiple tables - sql

I am trying to extract two columns of the same data type from two different tables using one query. NOTE: the length of the account attribute differs between the two tables. UNION can't work here because the number of columns is (in reality) different in the two tables.
CREATE TABLE IF NOT EXISTS `mydb`.`TABLE_A` (
`ID_TABLE_A` INT NOT NULL AUTO_INCREMENT,
`ACCOUNT` VARCHAR(5) NULL,
`SALES` INT NULL,
PRIMARY KEY (`ID_TABLE_A`))
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `mydb`.`TABLE_B` (
`ID_TABLE_B` INT NOT NULL AUTO_INCREMENT,
`ACOUNT` VARCHAR(9) NULL,
`SALES` INT NULL,
PRIMARY KEY (`ID_TABLE_B`))
ENGINE = InnoDB;
Requirement (I know this can't be right; it is only meant to give a partial picture):
SELECT
ACCOUNTS,
SALES
FROM
TABLE_A, TABLE_B
Result:
---------------
|accounts|sales|
| 2854 |52500 |
| 6584 |54645 |
| 54782| 5624 |
| 58496|46259 |
| 56958| 6528 |
---------------

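If you want the union of two tables that are not union-compatible, make them union-compatible by selecting only the matching columns from each table:
-- Column names follow the CREATE TABLE statements in the question
-- (ACCOUNT in TABLE_A, ACOUNT in TABLE_B).
(SELECT
ACCOUNT,
SALES
FROM TABLE_A)
UNION ALL
(SELECT
ACOUNT,
SALES
FROM TABLE_B)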
I put the UNION ALL based on the assumption that you would like to keep duplicates. If you would like the output to be duplicate-free, replace it with UNION.
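The difference between UNION ALL and UNION can be sketched with a small, self-contained example. This is only an illustration: it uses Python's built-in sqlite3 module instead of MySQL, the table and column names are simplified, the extra region column is a hypothetical stand-in for the "different number of columns", and the duplicate row is invented to show the effect.

```python
import sqlite3

# In-memory database; schema and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id_table_a INTEGER PRIMARY KEY,
    account    TEXT,
    sales      INTEGER,
    region     TEXT            -- hypothetical extra column only table_a has
);
CREATE TABLE table_b (
    id_table_b INTEGER PRIMARY KEY,
    account    TEXT,
    sales      INTEGER
);
INSERT INTO table_a (account, sales, region) VALUES
    ('2854', 52500, 'N'),
    ('6584', 54645, 'S');
INSERT INTO table_b (account, sales) VALUES
    ('54782', 5624),
    ('2854', 52500);           -- same (account, sales) pair as in table_a
""")

# Selecting only the shared columns makes the two inputs union-compatible.
union_all_rows = conn.execute("""
    SELECT account, sales FROM table_a
    UNION ALL
    SELECT account, sales FROM table_b
""").fetchall()

union_rows = conn.execute("""
    SELECT account, sales FROM table_a
    UNION
    SELECT account, sales FROM table_b
""").fetchall()

print(len(union_all_rows))  # 4: the duplicate pair is kept
print(len(union_rows))      # 3: the duplicate pair is removed
```

The extra column in table_a does not matter, because each SELECT lists only the two columns that both tables share.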

Related

SQL comparison report on cartesian product using subquery

I'm a student building a comparison report query in MySQL on a database that tracks customers, products, and purchases in separate tables. I have to create a report, using a subquery, that shows how many products were sold each month in each province. I was told to use a cross join between product and customer. However, when I try to group the results, the records all collapse into each other, and I don't understand why. I'm also not sure this is the correct approach, since my customer and product tables have no values in common except through the purchase table.
These are my create table scripts
CREATE TABLE `customer` (
`CustomerID` INT NOT NULL,
`City` VARCHAR(100) NOT NULL,
`Province` CHAR(2) NOT NULL,
PRIMARY KEY (`CustomerID`));
CREATE TABLE `product` (
`ProductID` INT NOT NULL,
`ProductName` VARCHAR(100) NOT NULL,
`Price` DECIMAL(5,2) NOT NULL,
PRIMARY KEY (`ProductID`));
CREATE TABLE `purchase` (
`PurchaseID` INT NOT NULL,
`PurchaseDate` DATE NOT NULL,
`customer_CustomerID` INT NOT NULL,
`product_ProductID` INT NOT NULL,
PRIMARY KEY (`PurchaseID`),
CONSTRAINT `fk_purchase_customer`
FOREIGN KEY (`customer_CustomerID`)
REFERENCES `customer` (`CustomerID`),
CONSTRAINT `fk_purchase_product`
FOREIGN KEY (`product_ProductID`)
REFERENCES `product` (`ProductID`));
This is the query that I have written as I have understood the instructions.
SELECT DISTINCT province, productName AS Product, JanTotalSales
FROM PRODUCT cross join CUSTOMER
LEFT JOIN
(
SELECT purchaseID, product_productID, customer_customerID, COUNT(purchaseDate) AS JanTotalSales
FROM PURCHASE
WHERE MONTH(purchaseDate) = 01
)JAN ON PRODUCT.productID = JAN.product_productID
GROUP BY province, productID;
I should be getting results like this
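| Province | Product | JanTotalSales | FebTotalSales | ... | TotalSales |
| QC       | Paper   | 1             | NULL          | ... | 1          |
| ON       | Paper   | 1             | 2             | ... | 3          |
| AB       | Paper   | 1             | NULL          | ... | 1          |
| AB       | Wire    | 2             | 2             | ... | 4          |
| ON       | Wire    | 2             | 1             | ... | 3          |
| NULL     | Kit     | NULL          | NULL          | ... | NULL       |
| SK       | Gummy   | 1             | 1             | ... | 2          |
| NULL     | Bag     | NULL          | NULL          | ... | NULL       |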
However, I receive results like this when I do it on the January subquery.
| Province | Product | JanTotalSales |
| AB       | Paper   | NULL          |
| AB       | Wire    | NULL          |
| AB       | Kit     | NULL          |
| AB       | Kit     | 13            |
| ON       | Paper   | NULL          |
| ON       | Wire    | NULL          |
| ON       | Kit     | NULL          |
| ON       | Kit     | 13            |
I appreciate whatever help you can give to show me where I'm going wrong. From what I understand it's something to do with how the grouping occurs but I can't figure out why.
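One possible cause of the collapsed rows, offered as a guess and not a confirmed diagnosis: the JAN subquery calls COUNT without a GROUP BY, so it returns a single row covering every January purchase. That would be consistent with the lone 13 in the output above. The sketch below illustrates the effect with Python's built-in sqlite3 module instead of MySQL; the column names are simplified and the data is invented.

```python
import sqlite3

# In-memory database with a simplified, hypothetical purchase table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase (
    purchase_id   INTEGER PRIMARY KEY,
    purchase_date TEXT NOT NULL,
    customer_id   INTEGER NOT NULL,
    product_id    INTEGER NOT NULL
);
INSERT INTO purchase VALUES
    (1, '2021-01-05', 1, 10),
    (2, '2021-01-09', 2, 10),
    (3, '2021-01-20', 1, 20),
    (4, '2021-02-02', 2, 20);
""")

# An aggregate without GROUP BY collapses all January rows into one row.
collapsed = conn.execute("""
    SELECT COUNT(*)
    FROM purchase
    WHERE strftime('%m', purchase_date) = '01'
""").fetchall()

# Grouping by the join keys keeps one row per customer/product pair.
per_group = conn.execute("""
    SELECT customer_id, product_id, COUNT(*) AS jan_total
    FROM purchase
    WHERE strftime('%m', purchase_date) = '01'
    GROUP BY customer_id, product_id
    ORDER BY customer_id, product_id
""").fetchall()

print(collapsed)  # [(3,)]
print(per_group)  # [(1, 10, 1), (1, 20, 1), (2, 10, 1)]
```

If this is what is happening, grouping the subquery by the customer and product keys and joining on both of them would give the outer query one count per pair to work with.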

How to insert a column which sets unique id based on values in another column (SQL)?

I am going to create a table into which I will insert multiple values for different companies. I already have all the values that are in the table below, but I want to add a column IndicatorID that is linked to IndicatorName, so that every indicator has a unique id. This will obviously not be a primary key.
I will insert the data with multiple selects:
CREATE TABLE abc
INSERT INTO abc
SELECT company_id, 'roe', roevalue, metricdate
FROM TABLE1
INSERT INTO abc
SELECT company_id, 'd/e', devalue, metricdate
FROM TABLE1
So, I don't know how to add the IndicatorID I mentioned above.
EDIT:
Here is how I populate my new table:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT [the ID that I need], 'NI_3y' as 'Indicator', t.Company, avg(t.ni) over (partition by t.Company order by t.reportdate rows between 2 preceding and current row) as 'ni_3y',
t.reportdate
FROM table t
LEFT JOIN IndicatorIDs i
ON i.Indicator = roe3 -- the part that is not working if I have separate indicatorID table
I am going to insert different indicators for the same companies. And I want indicatorID.
Your "indicator" is a proper entity in its own right. Create a table with all indicators:
create table indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then, use the id only in this table. You can look up the value in the reference table.
Your inserts are then a little more complicated:
-- Register each indicator name once (the names, not the metric values).
INSERT INTO indicators (indicator)
SELECT v.indicator
FROM (VALUES ('roe'), ('d/e')) v(indicator)
WHERE NOT EXISTS (SELECT 1 FROM indicators i2 WHERE i2.indicator = v.indicator);
Then:
-- Unpivot the metric columns and look up each indicator's id.
INSERT INTO ABC (indicatorId, companyid, value, date)
SELECT i.indicator_id, t1.company_id, v.value, t1.metricdate
FROM table1 t1 CROSS APPLY
(VALUES ('roe', t1.roevalue), ('d/e', t1.devalue)
) v(indicator, value) JOIN
indicators i
ON i.indicator = v.indicator;
This process is called normalization and it is the typical way to store data in a database.
The DDL and INSERT statements below create an indicators table with a unique constraint on indicator. Because ind_id is intended to be a foreign key in the abc table, it is created as a non-decomposable surrogate integer primary key using the IDENTITY property.
drop table if exists test_indicators;
go
create table test_indicators (
ind_id int identity(1, 1) primary key not null,
indicator varchar(20) unique not null);
go
insert into test_indicators(indicator) values
('NI'),
('ROE'),
('D/E');
The abc table depends on the ind_id column from indicators table as a foreign key reference. To populate the abc table company_id's are associated with ind_id's.
drop table if exists test_abc
go
create table test_abc(
a_id int identity(1, 1) primary key not null,
ind_id int not null references test_indicators(ind_id),
company_id int not null,
val varchar(20) null);
go
insert into test_abc(ind_id, company_id)
select ind_id, 102 from test_indicators where indicator='NI'
union all
select ind_id, 103 from test_indicators where indicator='ROE'
union all
select ind_id, 104 from test_indicators where indicator='D/E'
union all
select ind_id, 103 from test_indicators where indicator='NI'
union all
select ind_id, 105 from test_indicators where indicator='ROE'
union all
select ind_id, 102 from test_indicators where indicator='NI';
Query to get result
select i.ind_id, a.company_id, i.indicator, a.val
from test_abc a
join test_indicators i on a.ind_id=i.ind_id;
Output
ind_id  company_id  indicator  val
1       102         NI         NULL
2       103         ROE        NULL
3       104         D/E        NULL
1       103         NI         NULL
2       105         ROE        NULL
1       102         NI         NULL
I was finally able to find the solution for my problem which seems to me very simple, although it took time and asking different people about it.
First I create my indicators table where I assign primary key for all indicators I have:
CREATE TABLE indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then I populate the table simply, without using any JOINs or CROSS APPLY. I don't know if this is optimal, but it seems to be the simplest choice:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT
(SELECT indicator_id from indicators i where i.indicator = 'NI_3y') as IndicatorID,
'NI_3y' as 'Indicator',
Company,
avg(ni) over (partition by Company order by reportdate rows between 2 preceding and current row) as ni_3y,
reportdate
FROM TABLE1
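The scalar-subquery lookup used above can be tried out in isolation. The following is a minimal sketch, not the poster's actual setup: it uses Python's built-in sqlite3 module instead of SQL Server, the metrics table and its column names are hypothetical, the data is invented, and the rolling average is left out so that only the id lookup is shown.

```python
import sqlite3

# In-memory database; all table contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indicators (
    indicator_id INTEGER PRIMARY KEY AUTOINCREMENT,
    indicator    TEXT UNIQUE NOT NULL
);
CREATE TABLE table1 (company TEXT, ni REAL, reportdate TEXT);
CREATE TABLE metrics (               -- hypothetical target table
    indicator_id INTEGER REFERENCES indicators(indicator_id),
    indicator    TEXT,
    company      TEXT,
    metric_value REAL,
    metric_date  TEXT
);
INSERT INTO indicators (indicator) VALUES ('ROE'), ('NI_3y');
INSERT INTO table1 VALUES
    ('A', 10.0, '2020-12-31'),
    ('B', 20.0, '2020-12-31');
""")

# The scalar subquery resolves the indicator name to its id once per row.
conn.execute("""
    INSERT INTO metrics (indicator_id, indicator, company, metric_value, metric_date)
    SELECT (SELECT indicator_id FROM indicators WHERE indicator = 'NI_3y'),
           'NI_3y', company, ni, reportdate
    FROM table1
""")

rows = conn.execute("""
    SELECT indicator_id, indicator, company, metric_value
    FROM metrics
    ORDER BY company
""").fetchall()
print(rows)  # [(2, 'NI_3y', 'A', 10.0), (2, 'NI_3y', 'B', 20.0)]
```

If the indicator name is missing from the lookup table, the subquery yields NULL instead of failing, so a NOT NULL constraint on the id column may be worth considering.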

Create primary key with two columns

I have two tables, bank_data and sec_data. Table bank_data has the columns id, date, asset, and liability. The date column is divided into quarters.
id | date       | asset  | liability
---+------------+--------+----------
1  | 6/30/2001  | 333860 | 308524
1  | 3/31/2001  | 336896 | 311865
1  | 9/30/2001  | 349343 | 308524
1  | 12/31/2002 | 353863 | 322659
2  | 6/30/2001  | 451297 | 425156
2  | 3/31/2001  | 411421 | 391846
2  | 9/30/2001  | 430178 | 41356
2  | 12/31/2002 | 481687 | 46589
3  | 6/30/2001  | 106506 | 104532
3  | 3/31/2001  | 104196 | 102983
3  | 9/30/2001  | 106383 | 104865
3  | 12/31/2002 | 107654 | 105867
Table sec_data has columns of id, date, and security. I combined the two tables into a new table named new_table in R using this code:
dbGetQuery(con, "CREATE TABLE new_table
AS (SELECT sec_data.id,
bank_data.date,
bank_data.asset,
bank_data.liability,
sec_data.security
FROM bank_data,bank_sec
WHERE (bank_data.id = sec_data.id) AND
(bank_data.date = sec_data.date)")
I would like to set two primary keys (id and date) in this R code without using pgAdmin. I want to use something like Constraint bankkey Primary Key (id, date) but the AS and SELECT functions are throwing me off.
First, your query is wrong: you describe a table named sec_data, but the FROM clause names bank_sec. I have rewritten your query:
CREATE TABLE new_table AS
SELECT
sec_data.id,
bank_data.date,
bank_data.asset,
bank_data.liability,
sec_data.security
FROM bank_data
INNER JOIN sec_data on bank_data.id = sec_data.id
and bank_data.date = sec_data.date
Avoid implicit joins and use explicit joins instead. And as stated by @a_horse_with_no_name, you can't define more than one primary key on a table. What you want is a composite primary key.
Definition:
A composite primary key is a combination of two or more columns in a table that can be used to
uniquely identify each row in the table.
Because your CREATE TABLE statement is based on another table, you need to add the key afterwards with ALTER TABLE:
ALTER TABLE new_table
ADD PRIMARY KEY (id, date);
Alternatively, you may run two separate statements (CREATE TABLE and INSERT INTO):
CREATE TABLE new_table (
id int, date date, asset int, liability int, security int,
CONSTRAINT bankkey PRIMARY KEY (id, date)
) ;
INSERT INTO new_table (id,date,asset,liability,security)
SELECT s.id,
b.date,
b.asset,
b.liability,
s.security
FROM bank_data b JOIN sec_data s
ON b.id = s.id AND b.date = s.date;
To create the primary key you desire, run the following SQL statement after your CREATE TABLE ... AS statement:
ALTER TABLE new_table
ADD CONSTRAINT bankkey PRIMARY KEY (id, date);
That has the advantage that the primary key index won't slow down the data insertion.
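What a composite primary key on (id, date) enforces can be checked with a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL, declares the key in CREATE TABLE because SQLite has no ALTER TABLE ... ADD PRIMARY KEY, stores dates as ISO text, and uses invented security values.

```python
import sqlite3

# In-memory database; the security values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE new_table (
    id        INTEGER,
    date      TEXT,
    asset     INTEGER,
    liability INTEGER,
    security  INTEGER,
    CONSTRAINT bankkey PRIMARY KEY (id, date)
);
-- Same id, different dates: allowed, because the key is the pair.
INSERT INTO new_table VALUES (1, '2001-03-31', 336896, 311865, 100);
INSERT INTO new_table VALUES (1, '2001-06-30', 333860, 308524, 110);
""")

# Repeating an existing (id, date) pair violates the composite key.
try:
    conn.execute("INSERT INTO new_table VALUES (1, '2001-03-31', 0, 0, 0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM new_table").fetchone()[0]
print(duplicate_rejected)  # True
print(row_count)           # 2
```

Neither id nor date has to be unique on its own; only the combination does.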

How to join two tables together and return all rows from both tables, and to merge some of their columns into a single column

I'm working with SQL Server 2012 and wish to query the following:
I've got 2 tables with mostly different columns (one table has 10 columns, the other has 6).
However, they both contain an ID number column and a category_name column.
The ID numbers may overlap between the tables (e.g. one table may have 200 distinct IDs and the other 900, but only 120 of the IDs are in both).
The category names are different and unique for each table.
Now I wish to have a single table that includes all the rows of both tables, with a single ID column and a single Category_name column (a total of 14 columns).
So if the same ID has 3 records in table 1 and another 5 records in table 2, I wish to have all 8 records (8 rows).
The complex part here, I believe, is having a single "Category_name" column.
I tried the following, but when neither table has a null for an ID, I get only one record instead of both:
SELECT isnull(t1.id, t2.id) AS [id]
,isnull(t1.[category], t2.[category_name]) AS [category name]
FROM t1
FULL JOIN t2
ON t1.id = t2.id;
Any suggestions on the correct way to have it done?
Make your FULL JOIN ON 1=0
This will prevent rows from combining and ensure that you always get 1 copy of each row from each table.
Further explanation:
A FULL JOIN gets rows from both tables, whether they have a match or not, but when they do match, it combines them on one row.
You wanted a full join where you never combine the rows, because you wanted every row in both tables to appear one time, no matter what. 1 can never equal 0, so doing a FULL JOIN on 1=0 will give you a full join where none of the rows match each other.
And of course you're already doing the ISNULL to make sure the ID and Name columns always have a value.
SELECT ID, Category_name, (then the other 8 columns), NULL, NULL, NULL, NULL
FROM t1
UNION ALL
SELECT ID, Category_name, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, (then the other 4 columns)
FROM t2
This demonstrates how you can use a UNION ALL to combine the row sets from two tables, TableA and TableB, and insert the set into TableC.
Create two source tables with some data:
CREATE TABLE dbo.TableA
(
id int NOT NULL,
category_name nvarchar(50) NOT NULL,
other_a nvarchar(20) NOT NULL
);
CREATE TABLE dbo.TableB
(
id int NOT NULL,
category_name nvarchar(50) NOT NULL,
other_b nvarchar(20) NOT NULL
);
INSERT INTO dbo.TableA (id, category_name, other_a)
VALUES (1, N'Alpha', N'ppp'),
(2, N'Bravo', N'qqq'),
(3, N'Charlie', N'rrr');
INSERT INTO dbo.TableB (id, category_name, other_b)
VALUES (4, N'Delta', N'sss'),
(5, N'Echo', N'ttt'),
(6, N'Foxtrot', N'uuu');
Create TableC to receive the result set. Note that columns other_a and other_b allow null values.
CREATE TABLE dbo.TableC
(
id int NOT NULL,
category_name nvarchar(50) NOT NULL,
other_a nvarchar(20) NULL,
other_b nvarchar(20) NULL
);
Insert the combined set of rows into TableC:
INSERT INTO dbo.TableC (id, category_name, other_a, other_b)
SELECT id, category_name, other_a, NULL AS 'other_b'
FROM dbo.TableA
UNION ALL
SELECT id, category_name, NULL, other_b
FROM dbo.TableB;
Display the results:
SELECT *
FROM dbo.TableC;

whats wrong with this query?

I'm trying to write a query that selects from four tables
campaignSentParent csp
campaignSentEmail cse
campaignSentFax csf
campaignSentSms css
Each of the cse, csf, and css tables is linked to the csp table by csp.id = (cse/csf/css).parentId.
The csp table has a column called campaignId.
What I want to do is end up with rows that look like:
| id | dateSent | emailsSent | faxsSent | smssSent |
| 1 | 2011-02-04 | 139 | 129 | 140 |
But instead I end up with a row that looks like:
| 1 | 2011-02-03 | 2510340 | 2510340 | 2510340 |
Here is the query I am trying
SELECT csp.id id, csp.dateSent dateSent,
COUNT(cse.parentId) emailsSent,
COUNT(csf.parentId) faxsSent,
COUNT(css.parentId) smsSent
FROM campaignSentParent csp,
campaignSentEmail cse,
campaignSentFax csf,
campaignSentSms css
WHERE csp.campaignId = 1
AND csf.parentId = csp.id
AND cse.parentId = csp.id
AND css.parentId = csp.id;
Adding GROUP BY did not help, so I am posting the create statements.
csp
CREATE TABLE `campaignsentparent` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`campaignId` int(11) NOT NULL,
`dateSent` datetime NOT NULL,
`account` int(11) NOT NULL,
`status` varchar(15) NOT NULL DEFAULT 'Creating',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
cse/csf (same structure, different names)
CREATE TABLE `campaignsentemail` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parentId` int(11) NOT NULL,
`contactId` int(11) NOT NULL,
`content` text,
`subject` text,
`status` varchar(15) DEFAULT 'Pending',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=140 DEFAULT CHARSET=latin1
css
CREATE TABLE `campaignsentsms` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parentId` int(11) NOT NULL,
`contactId` int(11) NOT NULL,
`content` text,
`status` varchar(15) DEFAULT 'Pending',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=141 DEFAULT CHARSET=latin1
You need to aggregate the counts separately, not as shown in the question.
SELECT csp.id, csp.dateSent dateSent,
e.email_count, f.fax_count, s.sms_count
FROM campaignSentParent AS csp
JOIN (SELECT cse.ParentId, COUNT(*) AS email_count
FROM campaignSentEmail cse
GROUP BY cse.ParentID) AS e ON e.parentID = csp.id
JOIN (SELECT csf.ParentId, COUNT(*) AS fax_count
FROM campaignSentFax csf
GROUP BY csf.ParentID) AS f ON f.ParentID = csp.id
JOIN (SELECT css.ParentID, COUNT(*) AS sms_count
FROM campaignSentSms css
GROUP BY css.ParentId) AS s ON s.ParentID = csp.id
WHERE csp.campaignId = 1
To do this, you pretty much have to use the JOIN notation as shown.
Depending on the quality of your optimizer, the cardinalities of the various tables, and the available indexes, you might find it effective to include a join with CampaignSentParent in each of the sub-queries with the csp.CampaignID = 1 condition, so as to limit the data aggregated by the sub-queries.
You might notice that the result count you get is 2510340. The prime factorization of 2510340 is 2 × 2 × 3 × 5 × 7 × 43 × 139, and your expected answer is 139, 129, and 140. You can get 3 × 43 = 129; 2 × 2 × 5 × 7 = 140; and 139 = 139. In other words, the original query is generating the Cartesian product of all the rows in the three dependent tables and counting the product, rather than counting the relevant rows from each dependent table separately.
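The multiplication effect described above can be reproduced on a small scale. This is a minimal sketch using Python's built-in sqlite3 module instead of MySQL, with shortened, hypothetical table names and invented row counts (3 emails, 2 faxes, 4 SMS messages for one parent).

```python
import sqlite3

# In-memory database; table names and row counts are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, campaign_id INTEGER);
CREATE TABLE email  (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE fax    (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE sms    (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO parent VALUES (1, 1);
INSERT INTO email (parent_id) VALUES (1), (1), (1);
INSERT INTO fax   (parent_id) VALUES (1), (1);
INSERT INTO sms   (parent_id) VALUES (1), (1), (1), (1);
""")

# Joining all three child tables at once pairs every email with every
# fax and every SMS, so each COUNT sees 3 * 2 * 4 rows.
naive = conn.execute("""
    SELECT COUNT(e.parent_id), COUNT(f.parent_id), COUNT(s.parent_id)
    FROM parent p, email e, fax f, sms s
    WHERE p.campaign_id = 1
      AND e.parent_id = p.id
      AND f.parent_id = p.id
      AND s.parent_id = p.id
""").fetchone()

# Counting each child table in its own subquery avoids the blow-up.
separate = conn.execute("""
    SELECT e.n, f.n, s.n
    FROM parent p
    JOIN (SELECT parent_id, COUNT(*) AS n FROM email GROUP BY parent_id) e
      ON e.parent_id = p.id
    JOIN (SELECT parent_id, COUNT(*) AS n FROM fax GROUP BY parent_id) f
      ON f.parent_id = p.id
    JOIN (SELECT parent_id, COUNT(*) AS n FROM sms GROUP BY parent_id) s
      ON s.parent_id = p.id
    WHERE p.campaign_id = 1
""").fetchone()

print(naive)            # (24, 24, 24)
print(separate)         # (3, 2, 4)
print(139 * 129 * 140)  # 2510340, the count reported in the question
```

The last line confirms the arithmetic in the answer: the three expected counts multiply to exactly the number the original query returned.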
You're missing a GROUP BY statement at the end. I can't tell from your example what you want them to be grouped by to actually give you the code.
Add GROUP BY dateSent to the end of your query.
Try adding a group by clause.
SELECT csp.id id, csp.dateSent dateSent,
COUNT(cse.parentId) emailsSent,
COUNT(csf.parentId) faxsSent,
COUNT(css.parentId) smsSent
FROM campaignSentParent csp,
campaignSentEmail cse,
campaignSentFax csf,
campaignSentSms css
WHERE csp.campaignId = 1
AND csf.parentId = csp.id
AND cse.parentId = csp.id
AND css.parentId = csp.id
GROUP BY csp.id, csp.dateSent
When you use an aggregate function, you normally need to include a group by.