Combining Columns into One Table in SQL

I have two tables (Table 1 and Table 2) that include information on a company's insurance policies. There are thousands of rows and around 30 columns in each table. I need to create a new table that combines certain columns from each.
From Table 1 I need:
InvestmentCode, IndexType, Amount, FundID, PolicyNumber
From Table 2 I need:
PolicyNumber, FundValue, DepositDate, FundID
I want to merge the tables by FundID and PolicyNumber.

Actually, creating one more table would introduce data redundancy (the data is already present and you would just be copying it).
You can always create a view for this instead. For your query, the view would be something like this:
CREATE OR REPLACE VIEW <view_name> AS
select T1.InvestmentCode, T1.IndexType, T1.Amount, T1.FundID, T1.PolicyNumber,
T2.FundValue, T2.DepositDate from Table1 T1, Table2 T2
where T1.FundID = T2.FundID
and T1.PolicyNumber = T2.PolicyNumber
WITH READ ONLY
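Once the view exists you can query it like any other table. For example (assuming you named the view policy_fund_vw; the name and the date are made up here):
select PolicyNumber, FundID, FundValue, DepositDate
from policy_fund_vw
where DepositDate >= DATE '2020-01-01';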

Find entryNo set from multiple sets of records

I have two SQL temp tables, #Temp1 and #Temp2.
I want to get each entryNo in #Temp1 whose rows contain a complete set of rows from #Temp2.
For example: #Temp2 has 8 records (two sets). I want to search #Temp1 for the entryNo values that contain a full set from #Temp2.
CREATE TABLE #Temp1 (entryNo INT, setid INT, measurid INT,measurvalueid int)
CREATE TABLE #Temp2(setid INT, measurid INT,measurvalueid int)
INSERT INTO #Temp1 (entryNo,setid,measurid,measurvalueid )
VALUES (1,400001,1,1),
(1,400001,2,110),
(1,400001,3,1001),
(1,400001,4,1100),
(2,400002,5,100),
(2,400002,6,102),
(2,400002,7,1003),
(2,400002,8,10004),
(3,400001,1,1),
(3,400001,2,110),
(3,400001,3,1001),
(3,400001,4,1200)
INSERT INTO #Temp2 (setid,measurid,measurvalueid )
VALUES (400001,1,1),
(400001,2,110),
(400001,3,1001),
(400001,4,1100),
(400002,5,100),
(400002,6,102),
(400002,7,1003),
(400002,8,10004)
The output I want is:
EntryNo
1
2
#Temp2 contains two sets.
One is:
(400001,1,1),
(400001,2,110),
(400001,3,1001),
(400001,4,1100)
The second is:
(400002,5,100),
(400002,6,102),
(400002,7,1003),
(400002,8,10004)
Try this:
WITH DataSourceInitialData AS
(
    SELECT *
          ,COUNT(*) OVER (PARTITION BY [entryNo], [setid]) AS [GroupCount]
    FROM #Temp1
), DataSourceFilteringData AS
(
    SELECT *
          ,COUNT(*) OVER (PARTITION BY [setid]) AS [GroupCount]
    FROM #Temp2
)
SELECT A.[entryNo]
FROM DataSourceInitialData A
INNER JOIN DataSourceFilteringData B
    ON A.[setid] = B.[setid]
    AND A.[measurid] = B.[measurid]
    AND A.[measurvalueid] = B.[measurvalueid]
    -- we are interested in groups which are matched completely by the filtering groups
    AND A.[GroupCount] = B.[GroupCount]
GROUP BY A.[entryNo]
-- after joining, the number of surviving rows must match the filtering group size
HAVING COUNT(A.[setid]) = MAX(B.[GroupCount]);
The algorithm is simple:
we count how many rows exist per data group
we count how many rows exist per filtering group
we join the initial data to the filtering data
after the join we count how many rows are left in the initial data and check whether that count equals the filtering count for the given group
and the result is the expected output: entryNo 1 and 2.
Note that I am checking for an exact match. For example, if your sample data had one more row for entryNo = 1, that entry would not be included in the result. In order to change this behavior, comment out this row:
-- we are interested in groups which are matched completely by the filtering groups
AND A.[GroupCount] = B.[GroupCount]
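To see the difference, add one extra (hypothetical) row for entryNo = 1 and run the query again:
INSERT INTO #Temp1 (entryNo, setid, measurid, measurvalueid)
VALUES (1, 400001, 5, 9999);
-- entryNo 1 now has 5 rows for setid 400001 while #Temp2 still has 4,
-- so with the GroupCount check in place entryNo 1 drops out of the result;
-- with that line commented out, entryNo 1 is returned again.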

Query Optimization with millions of rows in a table

I have a table which has 4 columns:
PKID, OutMailID, JobMailingDate, InsertDatetime
This is how the data is inserted into the table:
PKID is the primary key of the table.
For a single OutMailID with a JobMailingDate there are, on average, 3 records in the table with
different InsertDatetime values. The table has millions of records.
I have many other tables with the same structure, but each pertains to a different category.
Now I would like to:
1) Find all OutMailID whose InsertDatetime is between the parameter date range
2) Once I have the list of OutMailID, find the minimum InsertDatetime for all these OutMailID, where this minimum date falls between Param1 and Param2
The data for the table looks like this:
Select 1 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/01' as InsertDatetime
UNION ALL
Select 2 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/02' as InsertDatetime
UNION ALL
Select 3 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/03' as InsertDatetime
UNION ALL
Select 4 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/04' as InsertDatetime
I want to perform both of the above steps in a single query, so my query is something like this:
Select
    T.OutMailID, Min(T.InsertDatetime)
from
    Table T
INNER JOIN
(
    Select
        OutMailID
    from
        Table
    Where
        InsertDatetime Between #Param1 and #Param2
) as T1 On (T1.OutMailID = T.OutMailID)
Group by
    T.OutMailID
Having Min(T.InsertDatetime) Between #Param1 and #Param2
But this is not performing well. Can anyone please suggest a good way of doing this?
The second problem is that once I have the output of the first query, I use the same query on another category's table to find the minimum InsertDatetime in that category, and once I have all the minimum dates for all the categories, I have to find the minimum insert date among all of them.
Can you please help me with this?
Thanks
Atul
Does this query give you the desired results?
Select T.OutMailID, Min(T.InsertDatetime)
from Table T
INNER JOIN Table T1 On T1.OutMailID = T.OutMailID
    And T1.InsertDatetime Between #Param1 and #Param2
Group by T.OutMailID
How about using a WITH statement (a common table expression)? The WITH clause defines a named result set, rather like an inline view, that you can refer back to later in the statement. Here is an example:
with Table1 as (
    Select OutMailID from Table Where InsertDatetime Between #Param1 and #Param2
)
select T.OutMailID, Min(T.InsertDatetime)
from Table as T
inner join Table1 as T1 on T1.OutMailID = T.OutMailID
group by T.OutMailID
That way you can reference Table1 multiple times in the statement without writing its definition out again.
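For example, nothing stops you from referencing the same CTE twice in one statement (a sketch, reusing the filter from above):
with Table1 as (
    select OutMailID from Table where InsertDatetime between #Param1 and #Param2
)
select (select count(*) from Table1) as IdsInRange,  -- first reference
       min(T.InsertDatetime) as OverallMin
from Table T
where T.OutMailID in (select OutMailID from Table1); -- second reference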
I think a simpler way to express your requirement is that you want all OutMailId whose first InsertDateTime is in the period specified.
It turns out that the JOIN is not necessary at all for this. This is a simpler version of your query:
Select t.OutMailID, Min(InsertDatetime)
from Table T
Group by OutMailID
Having Min(InsertDatetime) Between #Param1 and #Param2;
Many databases could take advantage of an index on Table(OutMailId, InsertDateTime) for this query.
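For reference, creating that index looks like this (the index name here is made up):
create index ix_Table_OutMailId_InsertDatetime
    on Table (OutMailId, InsertDatetime);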
Now, this query might not be super efficient, particularly if the range is small relative to the entire data. So, sticking with the above index, the following might work better:
select t.*
from (select OutMailId, min(InsertDatetime) as min_InsertDatetime
from table t
where InsertDatetime Between #Param1 and #Param2
group by OutMailId
) t
where not exists (select 1
from table t2
where t2.OutMailId = t.OutMailId and
t2.InsertDateTime < #Param1
);
This should use the index for the first subquery, limiting the number of ids. It should use the same index for the not exists, on a reduced number of rows.

Creating a partitioned Hive table from a non-partitioned table

I have a Hive table which was created by joining data from multiple tables. The data for this resides in a folder which has multiple files ("0001_1" , "0001_2", ... and so on). I need to create a partitioned table based on a date field in this table called pt_dt (either by altering this table or creating a new one). Is there a way to do this?
I've tried creating a new table and inserting into it (below), which did not work:
create external table table2 (acct_id bigint, eval_dt string)
partitioned by (pt_dt string);
insert into table2
partition (pt_dt)
select acct_id, eval_dt, pt_dt
from jmx948_variable_summary;
This throws the error
"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 189 Cumulative CPU: 401.68 sec HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 6 minutes 41 seconds 680 msec"
Was able to figure it out after some trial & error.
Enable dynamic partitioning in Hive:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Create schema for partitioned table:
CREATE TABLE table1 (id STRING, info STRING)
PARTITIONED BY ( tdate STRING);
Insert into partitioned table :
FROM table2 t2
INSERT OVERWRITE TABLE table1 PARTITION(tdate)
SELECT t2.id, t2.info, t2.tdate
DISTRIBUTE BY tdate;
In the version I am working with (Hive 0.14.0.2.2.4.2-2), the following works:
INSERT INTO TABLE table1 PARTITION(tdate) SELECT t2.id, t2.info, t2.tdate
From the source table, select the column that the target is partitioned by last; in the above example, tdate is selected as the last column in the SELECT. Similarly, if one needs the table to be partitioned by the column "info", then:
INSERT INTO TABLE table1 PARTITION(info) SELECT t2.id, t2.tdate, t2.info
If you want to create the table with multiple partition columns, the select query needs to be in that order. If you want to partition the above table by "tdate" and then "info":
INSERT INTO TABLE table1 PARTITION(tdate, info) SELECT t2.id, t2.tdate, t2.info
With "info", then "tdate":
INSERT INTO TABLE table1 PARTITION(info, tdate) SELECT t2.id, t2.info, t2.tdate
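Note that these two-partition inserts assume table1 was created with two partition columns in the matching order, something like this (a sketch using the same names as above):
CREATE TABLE table1 (id STRING)
PARTITIONED BY (tdate STRING, info STRING);
-- or, for the second variant:
-- PARTITIONED BY (info STRING, tdate STRING);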

I am looking for a way for a trigger to insert into a second table only where the value in table 1 changes

I am looking for a way for a trigger to insert into a second table only when the value in table 1 changes. It is essentially an audit tool to trap any changes made. The field in table 1 is price, and we want to write some additional fields alongside it.
This is what I have so far.
CREATE TRIGGER zmerps_Item_costprice__update_history_tr ON [ITEM]
FOR UPDATE
AS
insert into zmerps_Item_costprice_history
select NEWID(), -- unique id
GETDATE(), -- CURRENT_date
'PRICE_CHANGE', -- reason code
a.ima_itemid, -- item id
a.ima_price -- item price
FROM Inserted b inner join item a
on b.ima_recordid = a.IMA_RecordID
The table only contains a unique identifier, a date, a reference (the item) and the field changed (price). The trigger writes a row for any change, not just a price change.
Is it as simple as this? I moved some of the code around because comments after the comma between columns are just painful to maintain. You should also ALWAYS specify the columns in an insert statement; if your table changes, this code will still work.
CREATE TRIGGER zmerps_Item_costprice__update_history_tr ON [ITEM]
FOR UPDATE
AS
insert into zmerps_Item_costprice_history
(
UniqueID
, CURRENT_date
, ReasonCode
, ItemID
, ItemPrice
)
select NEWID()
, GETDATE()
, 'PRICE_CHANGE'
, d.ima_itemid
, d.ima_price
FROM Inserted i
inner join deleted d on d.ima_recordid = i.IMA_RecordID
AND d.ima_price <> i.ima_price
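A quick sanity check (the record id 42 here is made up):
-- a real price change: expect one new row in zmerps_Item_costprice_history
UPDATE [ITEM] SET ima_price = ima_price * 1.05 WHERE ima_recordid = 42;
-- an update that leaves the price alone: the d.ima_price <> i.ima_price
-- condition filters it out, so no history row is written
UPDATE [ITEM] SET ima_itemid = ima_itemid WHERE ima_recordid = 42;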
Since you haven't provided any other column names, I have used Column2 and Column3 as the "other" column names in the example below.
You can expand the code below by adding more columns.
An overview of the query below:
Joined the deleted and inserted tables (targeting only the rows that have changed); joining with the table itself would result in unnecessary processing of rows that haven't changed at all.
I have used the NULLIF function to yield a NULL value if the value of the column hasn't changed.
Converted all the columns to the same data type (required for UNPIVOT).
Used UNPIVOT to eliminate all the NULLs from the result set.
UNPIVOT will also give you the name of the column it has unpivoted.
CREATE TRIGGER zmerps_Item_costprice__update_history_tr
ON [ITEM]
FOR UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    WITH CTE AS (
        SELECT CAST(NULLIF(i.Price, d.Price) AS NVARCHAR(100)) AS Price
              ,CAST(NULLIF(i.Column2, d.Column2) AS NVARCHAR(100)) AS Column2
              ,CAST(NULLIF(i.Column3, d.Column3) AS NVARCHAR(100)) AS Column3
        FROM inserted i
        INNER JOIN deleted d ON i.IMA_RecordID = d.IMA_RecordID
        WHERE i.Price <> d.Price
           OR i.Column2 <> d.Column2
           OR i.Column3 <> d.Column3
    )
    INSERT INTO zmerps_Item_costprice_history
        (unique_id, [CURRENT_date], [reason code], Item_Value)
    SELECT NEWID()
          ,GETDATE()
          ,ColumnName + '_Change'
          ,Value
    FROM CTE UNPIVOT (Value FOR ColumnName IN (Price, Column2, Column3)) up
END
If I understand your question correctly, you want to record a change if and only if the price column's value changes; you don't need any other column changes to be recorded.
Here is your code:
CREATE TRIGGER zmerps_Item_costprice__update_history_tr ON [ITEM]
FOR UPDATE
AS
if update(ima_price)
insert into zmerps_Item_costprice_history
select NEWID(), -- unique id
GETDATE(), -- CURRENT_date
'PRICE_CHANGE', -- reason code
a.ima_itemid, -- item id
a.ima_price -- item price
FROM Inserted b inner join item a
on b.ima_recordid = a.IMA_RecordID
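One caveat: in SQL Server, UPDATE(ima_price) is true whenever ima_price appears in the SET clause of the triggering statement, even if the new value equals the old one. To log only genuine changes, you can combine the column check with a value comparison against deleted (a sketch merging this answer with the previous one):
if update(ima_price)
insert into zmerps_Item_costprice_history
select NEWID(), GETDATE(), 'PRICE_CHANGE', i.ima_itemid, i.ima_price
from Inserted i
inner join Deleted d on d.ima_recordid = i.ima_recordid
where i.ima_price <> d.ima_price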

Creating tables with fields from 2 different tables

I want to create a table that stores values from two different tables:
From table 1: cust_id (varchar2), invoice_amt (float)
From table 2: cust_id (from table 1), payment_date
My table should have 3 fields:
cust_id, invoice_amt, payment_date
I tried the following, which is obviously wrong.
create table temp1 as (
select table_1.cust_id, table_1.invoice_amt, table_2.payment_date
from table_1#dblink, table_2#dblink)
Your valuable suggestions will be of great help.
create table temp1 as (
select
table_1.cust_id,
table_1.invoice_amt,
table_2.payment_date
from
table_1#dblink,
table_2#dblink
where
table_1.cust_id = table_2.cust_id
)
I'm no oracle guy, but that should do what you want (untested, though).
You were close:
create table temp1 as (
select t1.cust_id, t1.invoice_amt, t2.payment_date
from table_1#dblink t1, table_2#dblink t2
where t1.cust_id=t2.cust_id)
It depends on what you're going to use it for, but I'd be sorely tempted to use a view instead of a table:
create view temp1(cust_id, invoice_amt, payment_date) as
select t1.cust_id, t1.invoice_amt, t2.payment_date
from table_1#dblink t1 inner join table_2#dblink t2
on t1.cust_id = t2.cust_id
The advantage is it always contains the values from the current versions of table_1 and table_2. The disadvantage is that you cannot edit the view (or, if you can, your edits affect the underlying tables as well as the view).