Apache Ignite: join between partitioned and replicated tables


I'm evaluating Apache Ignite as a potential performance improvement in our setup.
I have two tables that need to be joined: one with 22M records and the other with 36k records.
As the second table is relatively small, I decided to keep it replicated across all nodes so the join could be executed locally on each node.
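To make the reasoning concrete, here is a minimal Python sketch (a toy model, not Ignite code) of why a replicated small table allows a purely local join. The big table is spread across hypothetical nodes by hashing its primary key, every node holds a full copy of the small table, and the union of the per-node LEFT JOINs equals the global LEFT JOIN. The node count, row shapes, and helper names are assumptions made for illustration only.

```python
from collections import defaultdict

def partition(rows, key, num_nodes):
    """Spread rows over hypothetical nodes by hashing one column (like a PARTITIONED cache)."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes

def left_join(left, right, cols):
    """Hash LEFT JOIN on the given columns; returns (left id, right id or None) pairs."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[c] for c in cols)].append(r["id"])
    result = []
    for l in left:
        for match in index.get(tuple(l[c] for c in cols), [None]):
            result.append((l["id"], match))
    return result

# Toy stand-ins for metrics_table (big) and user_privileges (small).
metrics = [{"id": i, "trader_id": i % 4, "firm_id": "F%d" % (i % 2)} for i in range(20)]
privileges = [{"id": 100 + t, "trader_id": t, "firm_id": "F%d" % (t % 2)} for t in range(3)]

# Replicated small table: each node joins only its own slice of the big table
# against a full local copy of the small table; no rows move between nodes.
local_results = []
for node_rows in partition(metrics, "id", num_nodes=3).values():
    local_results.extend(left_join(node_rows, privileges, ["trader_id", "firm_id"]))

global_result = left_join(metrics, privileges, ["trader_id", "firm_id"])
print(sorted(local_results) == sorted(global_result))  # True
```

The equality holds because a LEFT JOIN is evaluated row by row on the left side, so splitting the left table and joining each piece against the complete right table loses nothing.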
Here is my cache configuration:
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="metricsDailyCache"/>
<property name="groupName" value="group1"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="backups" value="0"/>
<property name="queryParallelism" value="1"/>
<!-- Other parameters -->
</bean>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="privilegesCache"/>
<property name="groupName" value="group2"/>
<property name="cacheMode" value="REPLICATED"/>
<property name="backups" value="0"/>
</bean>
</list>
</property>
I have no persistence enabled.
So metricsDailyCache stores the big table (22M rows) and privilegesCache stores the small table (36k rows).
Table DDL:
CREATE TABLE public.metrics_table (
id int PRIMARY key,
updated_at timestamp NULL,
period2 varchar NULL,
firm_id varchar NULL,
trader_id int NULL,
segment_name varchar NULL,
sector_name varchar NULL,
symbol varchar NULL,
measure_name varchar NULL,
measure_value float NULL,
market_average float NULL,
peer_average float NULL,
measure_rank int NULL,
measure_peer_rank int NULL
) WITH "CACHE_NAME=metricsDailyCache";
CREATE TABLE public.user_privileges (
id int PRIMARY KEY,
trader_id int NULL,
trader_name varchar NULL,
firm_id varchar NULL,
firm_name varchar NULL,
organization_profile varchar NULL,
tableau_user varchar NULL,
alias varchar NULL
) WITH "CACHE_NAME=privilegesCache";
Now, when I execute this join:
with
user_privillages_distinct as (
select DISTINCT trader_id,
firm_id,
alias
from user_privileges
) SELECT DISTINCT sector_name
FROM metrics_table t
LEFT JOIN user_privillages_distinct u
ON t.trader_id = u.trader_id AND t.firm_id = u.firm_id;
it doesn't finish in a reasonable time (on Postgres it takes ~7s).
Is my understanding wrong?
Should I instead use a partitioned cache for user_privileges and define an affinity key on the join columns (firm_id, trader_id)?
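The same kind of toy model (plain Python, not the Ignite API) can illustrate the affinity-key alternative. Assuming both tables are assigned to hypothetical nodes by hashing (trader_id, firm_id), matching rows always land on the same node, so the per-node joins are still complete. If the small table is instead partitioned by its own id, a purely local join can miss matches. How Ignite itself configures the affinity key is not modeled here.

```python
from collections import defaultdict

def partition(rows, key_fn, num_nodes):
    """Assign each row to a hypothetical node by hashing key_fn(row)."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash(key_fn(row)) % num_nodes].append(row)
    return nodes

def left_join(left, right):
    """LEFT JOIN on (trader_id, firm_id); returns (left id, right id or None) pairs."""
    index = defaultdict(list)
    for r in right:
        index[(r["trader_id"], r["firm_id"])].append(r["id"])
    return [(l["id"], m) for l in left
            for m in index.get((l["trader_id"], l["firm_id"]), [None])]

def distributed_join(big, small, big_key, small_key, num_nodes=2):
    """Join only rows that live on the same node, then union the per-node results."""
    big_parts = partition(big, big_key, num_nodes)
    small_parts = partition(small, small_key, num_nodes)
    out = []
    for node in range(num_nodes):
        out.extend(left_join(big_parts[node], small_parts[node]))
    return sorted(out, key=lambda pair: pair[0])

def affinity_key(row):
    """Hypothetical affinity key: the join columns."""
    return (row["trader_id"], row["firm_id"])

def by_id(row):
    return row["id"]

big = [{"id": 0, "trader_id": 1, "firm_id": "F"}]
small = [{"id": 1, "trader_id": 1, "firm_id": "F"}]

print(distributed_join(big, small, affinity_key, affinity_key))  # [(0, 1)]  colocated: match found
print(distributed_join(big, small, by_id, by_id))                # [(0, None)]  not colocated: match lost
```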
Cluster configuration:
JVM_OPTS = -Xms1g -Xmx1G -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
I have 6 nodes deployed on Kubernetes (image: apacheignite/ignite:2.13.0-jdk11).

Related

Inserting XML data into SQL Table with multiple nodes

So I have the following XML:
<NIACList>
<NIAC>
<Number></Number>
<SubmissionDate></SubmissionDate>
<ExpirationDate />
<IssuerIDNO></IssuerIDNO>
<IssuerName></IssuerName>
<SuspensionPeriod/>
<Cessation>
<Basis />
<Date />
</Cessation>
<Merchant>
<IDNx></IDNx>
<Name></Name>
<Address>
<Region></Region>
<Locality></Locality>
<Street></Street>
<House></House>
<Block />
<Flat />
<Phone />
<Fax />
<Email />
</Address>
</Merchant>
<CommercialUnit>
<IDNx />
<Name />
<Type></Type>
<Area></Area>
<Location></Location>
<Address>
<Region></Region>
<Locality></Locality>
<Street></Street>
<House></House>
<Block />
<Flat />
</Address>
<Activities>
<Activity>
<Code></Code>
<Name></Name>
</Activity>
</Activities>
<Goods>
<Good>
<Name></Name>
</Good>
</Goods>
<WorkProgram />
<PublicSupplyUnit>
<Capacity />
<TerraceCapacity />
</PublicSupplyUnit>
<TradingAlcohol />
<TradingBeer />
<TradingTobaccoProducts />
<AmbulatoryTrading />
<MobileUnitTrading></MobileUnitTrading>
<MobileUnit>
<Type />
<Length />
<Width />
<Height />
</MobileUnit>
<CommercialApparatusTrading></CommercialApparatusTrading>
<CommercialApparatus>
<Count />
<Length />
<Width />
<Height />
</CommercialApparatus>
</CommercialUnit>
<Modifications />
</NIAC>
</NIACList>
This is the script for the tables I created:
create table Merchant (
IdMerchant int identity primary key,
IDNX nvarchar(max) null,
Name nvarchar(max) null,
WorkProgram datetime2 null,
IdAddress int
);
create table Address (
IdAddress int identity primary key,
Region nvarchar(60) null,
Locality nvarchar(50) null,
Street nvarchar (60) null,
House nvarchar (10) null,
Block nvarchar (10) null,
Flat nvarchar(10) null,
Phone nvarchar(30) null,
Fax nvarchar(60) null,
Email nvarchar(60) null
);
create table CommercialUnit (
IDCommercialUt int identity primary key,
IDNx nvarchar(90) null,
Name nvarchar(90) null,
Type nvarchar(90) null,
Area int null,
Location nvarchar(50) null,
TerraceCapacity float null,
TradingAlcohol bit null,
TradingBeer bit null,
TradingTobaccoProducts bit null,
AmbulatoryTrading bit null,
MobileUnitTrading bit null,
CommercialApparatusTrading bit null,
IDActivities int ,
IDGoods int ,
IDMobileUnit int ,
IDCommercial int ,
IDPSU int
);
I'm not very good at XML, but here is the question:
I have the tables Merchant and Address. The problem is that the Address node appears twice (in both the Merchant and the CommercialUnit nodes) and holds different data in each. My task is to write the insert so that the data is split into two categories, one for the Merchant node and another for the CommercialUnit node. After inserting into Address, the records must be linked through the foreign key IdAddress in Merchant and CommercialUnit, so data is inserted there as well.
I've tried to insert the data, but it inserted the data from the CommercialUnit node.
Below is the code for inserting:
INSERT INTO Address(Region,Locality,Street,House,Block,Flat,Phone,Fax,Email)
SELECT
Region=c.value('Region[1]','nvarchar(60)'),
Locality=c.value('Locality[1]','nvarchar(50)') ,
Street=c.value('Street[1]','nvarchar(60)') ,
House=c.value('House[1]','nvarchar(10)') ,
Block=c.value('Block[1]','nvarchar(10)') ,
Flat=c.value('Flat[1]','nvarchar(10)') ,
Phone=c.value('Phone[1]','nvarchar(30)') ,
Fax=c.value('Fax[1]','nvarchar(60)') ,
Email=c.value('Email[1]','nvarchar(60)')
FROM @xml.nodes('/NIACList/NIAC/Merchant/Address') Address(c)

You need to do four INSERT statements sequentially:
INSERT INTO Address ... (for Merchant), and capture its newly generated identity value into a variable.
INSERT INTO Merchant ... and use the variable from above.
INSERT INTO Address ... (for CommercialUnit), and capture its newly generated identity value into a variable.
INSERT INTO CommercialUnit ... and use the variable from above.
Conceptual SQL:
DECLARE @IdAddress INT;
-- Merchant
INSERT INTO Address ...
-- get last IDENTITY value for the Address table
SET @IdAddress = SCOPE_IDENTITY();
INSERT INTO Merchant ...
-- CommercialUnit
INSERT INTO Address ...
-- get last IDENTITY value for the Address table
SET @IdAddress = SCOPE_IDENTITY();
INSERT INTO CommercialUnit ...
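The four-step pattern can be tried out with Python's built-in sqlite3 module. This is only an analogue of the T-SQL above: SQLite's `cursor.lastrowid` plays the role of `SCOPE_IDENTITY()`, the tables are trimmed to a few columns, and the values are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (IdAddress INTEGER PRIMARY KEY AUTOINCREMENT, Region TEXT, Street TEXT);
    CREATE TABLE Merchant (IdMerchant INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT,
                           IdAddress INTEGER REFERENCES Address(IdAddress));
    CREATE TABLE CommercialUnit (IDCommercialUt INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT,
                                 IdAddress INTEGER REFERENCES Address(IdAddress));
""")
cur = conn.cursor()

# 1. Address for the Merchant node; capture its generated key (stand-in for SCOPE_IDENTITY()).
cur.execute("INSERT INTO Address (Region, Street) VALUES (?, ?)", ("Region A", "Street 1"))
merchant_address_id = cur.lastrowid
# 2. Merchant row pointing at that address.
cur.execute("INSERT INTO Merchant (Name, IdAddress) VALUES (?, ?)", ("Merchant 1", merchant_address_id))
# 3. Address for the CommercialUnit node; capture the new key.
cur.execute("INSERT INTO Address (Region, Street) VALUES (?, ?)", ("Region B", "Street 2"))
unit_address_id = cur.lastrowid
# 4. CommercialUnit row pointing at its own address.
cur.execute("INSERT INTO CommercialUnit (Name, IdAddress) VALUES (?, ?)", ("Unit 1", unit_address_id))
conn.commit()

rows = conn.execute("""
    SELECT 'Merchant', a.Region FROM Merchant m JOIN Address a ON a.IdAddress = m.IdAddress
    UNION ALL
    SELECT 'CommercialUnit', a.Region FROM CommercialUnit c JOIN Address a ON a.IdAddress = c.IdAddress
""").fetchall()
print(sorted(rows))  # [('CommercialUnit', 'Region B'), ('Merchant', 'Region A')]
```

Each Address row is linked to exactly one parent because its key is read back immediately after its own insert; with several NIAC elements, the four steps would simply run once per element.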

One method is to use CROSS APPLY. Provided you need only distinct addresses:
SELECT distinct t.*
FROM @xml.nodes('/NIACList/NIAC') niac(n)
cross apply (
select Region=c.value('Region[1]','nvarchar(60)'),
Locality=c.value('Locality[1]','nvarchar(50)') ,
Street=c.value('Street[1]','nvarchar(60)') ,
House=c.value('House[1]','nvarchar(10)') ,
Block=c.value('Block[1]','nvarchar(10)') ,
Flat=c.value('Flat[1]','nvarchar(10)') ,
Phone=c.value('Phone[1]','nvarchar(30)') ,
Fax=c.value('Fax[1]','nvarchar(60)') ,
Email=c.value('Email[1]','nvarchar(60)')
from niac.n.nodes('Merchant/Address') ma(c)
union
select Region=c.value('Region[1]','nvarchar(60)'),
Locality=c.value('Locality[1]','nvarchar(50)') ,
Street=c.value('Street[1]','nvarchar(60)') ,
House=c.value('House[1]','nvarchar(10)') ,
Block=c.value('Block[1]','nvarchar(10)') ,
Flat=c.value('Flat[1]','nvarchar(10)') ,
Phone=c.value('Phone[1]','nvarchar(30)') ,
Fax=c.value('Fax[1]','nvarchar(60)') ,
Email=c.value('Email[1]','nvarchar(60)')
from niac.n.nodes('CommercialUnit/Address') ua(c)
) t
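For comparison, the "read both Address node sets under each NIAC, then keep distinct rows" logic of the query above can be sketched with Python's xml.etree.ElementTree. This only illustrates what the CROSS APPLY ... UNION query computes; the sample values are invented and only three Address fields are read.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<NIACList>
  <NIAC>
    <Merchant><Address><Region>R1</Region><Locality>L1</Locality><Street>S1</Street></Address></Merchant>
    <CommercialUnit><Address><Region>R2</Region><Locality>L2</Locality><Street>S2</Street></Address></CommercialUnit>
  </NIAC>
  <NIAC>
    <Merchant><Address><Region>R1</Region><Locality>L1</Locality><Street>S1</Street></Address></Merchant>
    <CommercialUnit><Address><Region>R3</Region><Locality>L3</Locality><Street>S3</Street></Address></CommercialUnit>
  </NIAC>
</NIACList>
"""

FIELDS = ("Region", "Locality", "Street")

def distinct_addresses(xml_text):
    """Collect Merchant/Address and CommercialUnit/Address under every NIAC, dropping duplicates (like UNION)."""
    root = ET.fromstring(xml_text)
    seen = []
    for niac in root.findall("NIAC"):
        for address in niac.findall("Merchant/Address") + niac.findall("CommercialUnit/Address"):
            # findtext returns None for a missing child element and "" for an empty one.
            row = tuple(address.findtext(f) for f in FIELDS)
            if row not in seen:
                seen.append(row)
    return seen

print(distinct_addresses(SAMPLE))
# [('R1', 'L1', 'S1'), ('R2', 'L2', 'S2'), ('R3', 'L3', 'S3')]
```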

Processing xml files and push data to sql table

Could anyone please help me with this? How can I parse the XML below into a table?
I want the data under Note to be concatenated and separated by semicolons (;).
For example, App action="A" id="9951" has two notes:
Reflector
Packaging Type: Box
so the Note column in the table should contain (Reflector;Packaging Type: Box).
The same goes for Qual: all text under it needs to be semicolon-separated.
XML data:
DECLARE @DociD INT,
@XML NVARCHAR(MAX) =
'<root><App action="A" id="1">
<BaseVehicle id="95989"/>
<EngineBase id="2572"/>
<Qty>2</Qty>
<Note>Power</Note>
<Note>Textured Finish</Note>
<Note>w/Heat</Note>
<Note>wo/Turn Signal</Note>
<Note>w/Puddle Lamps</Note>
<Note>wo/Dimming</Note>
<PartType id="11618"/>
<MfrLabel>Professional Grade</MfrLabel>
<Position id="23"/>
<Part>816-8130</Part>
</App>
<App action="A" id="2">
<BaseVehicle id="8198"/>
<Qty>2</Qty>
<PartType id="11618"/>
<MfrLabel>Professional Grade</MfrLabel>
<Position id="23"/>
<Part>816-8130</Part>
</App>
<App action="A" id="3">
<BaseVehicle id="8197"/>
<Qty>2</Qty>
<PartType id="11618"/>
<MfrLabel>Professional Grade</MfrLabel>
<Position id="23"/>
<Part>816-8130</Part>
</App>
<App action="A" id="11840">
<BaseVehicle id="3723" />
<Note>Power</Note>
<Note>Textured Finish</Note>
<Note>w/Heat</Note>
<Note>wo/Turn Signal</Note>
<Note>w/Puddle Lamps</Note>
<Note>wo/Dimming</Note>
<Qty>1</Qty>
<PartType id="13117" />
<Position id="2" />
<Part>955-1147</Part>
</App>
</root>';
EXEC sys.sp_xml_preparedocument
@DociD OUTPUT,
@XML;
SELECT *
FROM OPENXML(@DociD,'/root/App',3)
WITH
(appaction CHAR(1) '@action',id INT '@id',
BaseVehicleID INT './BaseVehicle/@id',
Note varchar(20) './Note/@id'
);
-- release the parsed document handle
EXEC sys.sp_xml_removedocument @DociD;
Table structure
DECLARE @TABLE TABLE
(
[App action] VARCHAR (50),
[APPID] VARCHAR (50),
[BaseVehicleid] VARCHAR (50),
[Qual] VARCHAR (50),
[Qty] VARCHAR (50),
[PartTypeID] VARCHAR (50),
MfrLabel VARCHAR (50),
PositionID VARCHAR (50),
Part VARCHAR (50),
[param value] VARCHAR (50),
SubModelID VARCHAR (50),
EngineBaseID VARCHAR (50),
EngineVINID VARCHAR (50),
RecordCount VARCHAR (50),
Note VARCHAR (50)
)
But the Note column comes back as NULL. Can anyone suggest a fix?
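The result the question is after (all Note texts of one App joined with ";") can be pinned down with a small Python sketch using xml.etree.ElementTree. It only illustrates the expected output and is not a T-SQL solution; the XML is a trimmed subset of the sample above, and an App without any Note gets None, mirroring a NULL column.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
<App action="A" id="1"><BaseVehicle id="95989"/><Note>Power</Note><Note>Textured Finish</Note><Qty>2</Qty></App>
<App action="A" id="2"><BaseVehicle id="8198"/><Qty>2</Qty></App>
</root>"""

def app_rows(xml_text):
    """One row per App: action, id, BaseVehicle id, and all Note texts joined with ';'."""
    rows = []
    for app in ET.fromstring(xml_text).findall("App"):
        # Note values are element text, not an id attribute.
        notes = [n.text for n in app.findall("Note") if n.text]
        base = app.find("BaseVehicle")
        rows.append({
            "action": app.get("action"),
            "id": int(app.get("id")),
            "base_vehicle_id": int(base.get("id")) if base is not None else None,
            "note": ";".join(notes) if notes else None,  # None when the App has no Note
        })
    return rows

for row in app_rows(SAMPLE):
    print(row)
# {'action': 'A', 'id': 1, 'base_vehicle_id': 95989, 'note': 'Power;Textured Finish'}
# {'action': 'A', 'id': 2, 'base_vehicle_id': 8198, 'note': None}
```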

NHibernate Profiler is lying to me?

I'm using Fluent NHibernate as my ORM, and NH Profiler shows me this SQL query when I execute it:
INSERT INTO [Location]
(Name,
Lat,
Lon)
VALUES ('my address' /* @p0 */,
-58.37538459999996 /* @p1 */,
-34.5969468 /* @p2 */);
which is absolutely correct, by the way.
This is the design of my Location table:
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](255) NULL,
[Lat] [decimal](23, 20) NULL,
[Lon] [decimal](23, 20) NULL,
But when I look at the inserted data in SQL Server Management Studio, I see
-58.37538000000000000000 instead of -58.37538459999996000000 and
-34.59694000000000000000 instead of -34.59694680000000000000.
When I execute this INSERT query manually, it stores the values correctly (all 14 decimals for the Lat column), but when NHibernate does it, only 5 decimals are stored.
Any ideas?

The issue comes from the default precision and scale. NHibernate's default decimal representation is decimal(28,5), which is why only 5 decimal places end up being stored. XML mapping:
<property name="Lat" precision="23" scale="20" />
<property name="Lon" precision="23" scale="20" />
Fluent:
...
.Precision(23)
.Scale(20)
Now the INSERT statement will use decimal(23,20).
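The stored values are consistent with the input being cut to 5 decimal places by truncation rather than rounding: rounding -34.5969468 to 5 places would give -34.59695, yet -34.59694 was stored. The quick check below uses Python's decimal module; that the driver truncates is an inference from these two numbers, not something the answer states.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def to_scale(value, scale, rounding):
    """Quantize a decimal string to `scale` fractional digits with the given rounding mode."""
    return Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=rounding)

# Compare truncation (ROUND_DOWN) with rounding at scale 5, the default scale discussed above.
for raw in ("-58.37538459999996", "-34.5969468"):
    print(raw, to_scale(raw, 5, ROUND_DOWN), to_scale(raw, 5, ROUND_HALF_EVEN))
# -58.37538459999996 -58.37538 -58.37538
# -34.5969468 -34.59694 -34.59695
```

Only truncation reproduces both stored values, and at scale 20 (the mapped decimal(23,20)) no digits are lost.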

Phonecall Database SQL Queries

I have been working on a database project for phone calls. Here is a quick view of the diagram: callID is auto-incremented in the call table, and sessionID is auto-incremented in the session table, so on a three-way call the calls share the same sessionID. I have entered fictitious data in every field except sessionStartTime and sessionEndTime.
I am using phpMyAdmin.
My question: I need a single query that gives the billable time for each customer, i.e. each phone number.
Example phone call:
A -> B from 12:00PM to 1:00PM
B -> C from 12:30PM to 1:30PM
A should be billed for 1 hour
B should be billed for 1 1/2 hours (1:30 hrs)
C should be billed for 1 hour
Another example:
A -> B 12:00PM to 1:00PM
A -> C 12:30PM to 1:30PM
A should be billed for 1 1/2 hours (1:30 hrs)
B should be billed for 1 hour
C should be billed for 1 hour
Here are the data formats:
<table name="Account">
<column name="AccountID">1</column>
<column name="AcctHolderNum">617-100-5001</column>
<column name="ProviderID">1</column>
</table>
<table name="call">
<column name="callID">4</column>
<column name="callSender">617-719-9000</column>
<column name="callReceiver">617-730-8100</column>
<column name="callStartTime">2012-11-06 06:44:50</column>
<column name="callEndTime">2012-11-06 06:55:50</column>
<column name="sessionID">1</column>
</table>
<table name="phoneNum">
<column name="phoneNum">617-300-2000</column>
<column name="phoneNumFN">Nigel</column>
<column name="phoneNumLN">Thornberry</column>
<column name="PhoneAccountID">2</column>
</table>
<table name="Provider">
<column name="ProviderID">1</column>
<column name="ProviderName">T-Mobile</column>
</table>
<table name="session">
<column name="sessionID">1</column>
<column name="sessionStartTime">2012-11-06 06:44:50</column>
<column name="sessionEndTime">2012-11-06 06:55:50</column>
</table>
Here is the ER diagram
http://i.stack.imgur.com/rrh4B.jpg
Here is what I started with, but I got lost trying to make one query fit every possible input in the call table:
FROM `call` AS `call1`, `call` AS `call2`, `call` AS `call3`
WHERE `call1`.`sessionID` = `call2`.`sessionID` AND `call2`.`sessionID` = `call3`.`sessionID`
AND <REST OF STUFF>
UNION /* not union all, but union*/
SELECT same as above but for three way calls
FROM `call` AS `call1`, `call` AS `call2`
WHERE `call1`.`sessionID` = `call2`.`sessionID`
AND <REST OF STUFF>
UNION
SELECT same as above but for two way calls
FROM `call`
WHERE <REST OF STUFF>
Here are also a couple of simple queries for reference.
Calculates length of each call:
SELECT TIMEDIFF(MAX(`callEndTime`), MIN(`callStartTime`))
FROM `call` GROUP BY `callID`
Calculates length of each session:
SELECT TIMEDIFF(MAX(`callEndTime`), MIN(`callStartTime`))
FROM `call` GROUP BY `sessionID`
Minutes of calls made (note callSender) by account:
SELECT SUM(TIMESTAMPDIFF(MINUTE, `callStartTime`, `callEndTime`))
FROM `call`, `phoneNum`
WHERE `phoneNum`.`phoneNum` = `call`.`callSender`
GROUP BY `PhoneAccountID`
Minutes of calls received (note callReceiver) by account:
SELECT SUM(TIMESTAMPDIFF(MINUTE, `callStartTime`, `callEndTime`))
FROM `call`, `phoneNum`
WHERE `phoneNum`.`phoneNum` = `call`.`callReceiver` GROUP BY `PhoneAccountID`
Here is the XML export of the schema:
<pma:structure_schemas>
<pma:database name="jr_Team5" collation="utf8_general_ci" charset="utf8">
<pma:table name="Account">CREATE TABLE `Account` ( `AccountID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'AI Primary Key', `AcctHolderNum` varchar(50) NOT NULL COMMENT 'Account Holder''s Phone Number i.e. "617-100-5001"', `ProviderID` int(11) DEFAULT NULL COMMENT 'Foreign Key from "ProviderID"', PRIMARY KEY (`AccountID`), KEY `AcctHolderNum` (`AcctHolderNum`), KEY `ProviderID` (`ProviderID`), CONSTRAINT `Account_ibfk_1` FOREIGN KEY (`ProviderID`) REFERENCES `Provider` (`ProviderID`) ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;</pma:table>
<pma:table name="call">CREATE TABLE `call` ( `callID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'AI Primary Key', `callSender` varchar(50) NOT NULL COMMENT 'Phone Number of Caller', `callReceiver` varchar(50) NOT NULL COMMENT 'Phone Number of Reciever', `callStartTime` datetime NOT NULL COMMENT 'Time Call Begins', `callEndTime` datetime NOT NULL COMMENT 'Time Call Ends', `sessionID` int(11) NOT NULL COMMENT 'Foreign Key from "SessionID"', PRIMARY KEY (`callID`), KEY `callSender` (`callSender`), KEY `callReceiver` (`callReceiver`), KEY `sessionID` (`sessionID`), CONSTRAINT `call_ibfk_1` FOREIGN KEY (`callSender`) REFERENCES `phoneNum` (`phoneNum`), CONSTRAINT `call_ibfk_2` FOREIGN KEY (`callReceiver`) REFERENCES `phoneNum` (`phoneNum`), CONSTRAINT `call_ibfk_3` FOREIGN KEY (`sessionID`) REFERENCES `session` (`sessionID`) ) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8;</pma:table>
<pma:table name="phoneNum">CREATE TABLE `phoneNum` ( `phoneNum` varchar(50) NOT NULL COMMENT 'Phone Number on Record', `phoneNumFN` varchar(50) DEFAULT NULL COMMENT 'First Name of Phone User', `phoneNumLN` varchar(100) DEFAULT NULL COMMENT 'Last Name of Phone User', `PhoneAccountID` int(11) DEFAULT NULL COMMENT 'Foreign Key from "AccountID"', PRIMARY KEY (`phoneNum`), KEY `PhoneAccountID` (`PhoneAccountID`), CONSTRAINT `phoneNum_ibfk_1` FOREIGN KEY (`PhoneAccountID`) REFERENCES `Account` (`AccountID`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;</pma:table>
<pma:table name="Provider">CREATE TABLE `Provider` ( `ProviderID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'AI Primary Key', `ProviderName` varchar(50) NOT NULL COMMENT 'Network Provider i.e. "Verizon" or "Sprint"', PRIMARY KEY (`ProviderID`) ) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;</pma:table>
<pma:table name="session">CREATE TABLE `session` ( `sessionID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'AI Primary Key', `sessionStartTime` datetime DEFAULT NULL COMMENT 'Session Begin Time', `sessionEndTime` datetime DEFAULT NULL COMMENT 'Session End Time', PRIMARY KEY (`sessionID`) ) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8;</pma:table>
</pma:database>
</pma:structure_schemas>
Thanks in advance; any help, input, or direction would be appreciated. Let me know if you think of any other complex queries possible for this database.
Four-way call data example in XML:
<table name="call">
<column name="callID">40</column>
<column name="callSender">617-292-1309</column>
<column name="callReceiver">617-300-2000</column>
<column name="callStartTime">2012-10-31 09:07:35</column>
<column name="callEndTime">2012-10-31 11:07:35</column>
<column name="sessionID">7</column>
</table>
<table name="call">
<column name="callID">41</column>
<column name="callSender">617-300-2000</column>
<column name="callReceiver">617-234-1234</column>
<column name="callStartTime">2012-10-31 09:37:35</column>
<column name="callEndTime">2012-10-31 12:37:35</column>
<column name="sessionID">7</column>
</table>
<table name="call">
<column name="callID">42</column>
<column name="callSender">617-234-1234</column>
<column name="callReceiver">617-200-4000</column>
<column name="callStartTime">2012-10-31 10:37:35</column>
<column name="callEndTime">2012-10-31 11:37:35</column>
<column name="sessionID">7</column>
</table>

I think you have two problems here. The first is determining how much time to allot to each caller in a session. The second is to aggregate this information.
Let me assume that all the time for a given phone number in a session is contiguous. That is, there is no call B --> C from 12:00 to 12:15 (because then C's time would not be contiguous). Then you can get the timing for each number within a session:
select c.sessionID, c.caller,
-- duration in seconds from the number's first call start to its last call end
timestampdiff(second, min(c.callStartTime), max(c.callEndTime)) as dur
from (select sessionID, callSender as caller, callStartTime, callEndTime
from `call`
union all
select sessionID, callReceiver, callStartTime, callEndTime
from `call`
) c
group by c.sessionID, c.caller
From this, you can then aggregate over all sessions.
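A runnable version of this idea, using Python's sqlite3 and the question's first example (A -> B 12:00-13:00, B -> C 12:30-13:30, one session), is sketched below. SQLite has no TIMESTAMPDIFF, so a `julianday()` difference stands in for it; the table and column names follow the question's schema, and the date is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE call (callID INTEGER PRIMARY KEY, callSender TEXT, callReceiver TEXT,
                callStartTime TEXT, callEndTime TEXT, sessionID INTEGER)""")
conn.executemany(
    "INSERT INTO call (callSender, callReceiver, callStartTime, callEndTime, sessionID) "
    "VALUES (?, ?, ?, ?, ?)",
    [("A", "B", "2012-11-06 12:00:00", "2012-11-06 13:00:00", 1),
     ("B", "C", "2012-11-06 12:30:00", "2012-11-06 13:30:00", 1)],
)

# Inner query: span from first start to last end per session and number (contiguity assumed).
# Outer query: sum those spans over all sessions to get billable minutes per number.
BILLABLE = """
SELECT caller, SUM(minutes) AS billable_minutes
FROM (
    SELECT sessionID, caller,
           ROUND((julianday(MAX(callEndTime)) - julianday(MIN(callStartTime))) * 24 * 60) AS minutes
    FROM (SELECT sessionID, callSender AS caller, callStartTime, callEndTime FROM call
          UNION ALL
          SELECT sessionID, callReceiver, callStartTime, callEndTime FROM call)
    GROUP BY sessionID, caller
)
GROUP BY caller
ORDER BY caller
"""
print(conn.execute(BILLABLE).fetchall())  # [('A', 60.0), ('B', 90.0), ('C', 60.0)]
```

These match the expected bills of 1 hour, 1 1/2 hours, and 1 hour from the question.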
If the call periods within a session are not contiguous, the problem is more challenging. The best way to solve it depends on the database and the functions it provides.
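For the non-contiguous case, one generic approach (a swapped-in technique, not taken from the answer) is to merge each number's call intervals within a session and bill the total length of the merged intervals, so overlapping calls are not double-counted and gaps are not billed. A minimal Python sketch, with times given as minutes after 12:00 and hypothetical sample calls:

```python
from collections import defaultdict

def billable_minutes(calls):
    """calls: (session, sender, receiver, start, end) tuples. Returns {number: billed minutes}."""
    intervals = defaultdict(list)
    for session, sender, receiver, start, end in calls:
        for number in (sender, receiver):
            intervals[(session, number)].append((start, end))

    totals = defaultdict(int)
    for (session, number), spans in intervals.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:          # overlaps or touches: extend the current block
                cur_end = max(cur_end, end)
            else:                         # gap: bill the finished block, start a new one
                totals[number] += cur_end - cur_start
                cur_start, cur_end = start, end
        totals[number] += cur_end - cur_start
    return dict(totals)

# B -> C 12:00-12:15, then A -> B 12:00-13:00 and B -> C 12:30-13:30, all in session 1.
calls = [(1, "B", "C", 0, 15), (1, "A", "B", 0, 60), (1, "B", "C", 30, 90)]
print(billable_minutes(calls))  # {'B': 90, 'C': 75, 'A': 60}
```

Here C is billed 75 minutes (15 + 60); the min-start/max-end approach would bill 90 because it also counts C's 15-minute gap.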

PK Violation on a history table

This is in SQL Server 2005.
I have an address table:
dbo.Address
(
AddressID INT IDENTITY(1, 1) PRIMARY KEY,
LastUpdateBy VARCHAR(30),
<bunch of address columns>
)
I also have a history table:
dbo.AddressHistory
(
AddressID INT,
AsOf DATETIME,
UpdateBy VARCHAR(30),
<all the address columns>,
CONSTRAINT PK_dbo_AddressHistory PRIMARY KEY CLUSTERED (AddressID, AsOf)
)
I have a trigger on dbo.Address to create history entries on both INSERT and UPDATE which will basically do this:
INSERT INTO dbo.AddressHistory(AddressID, AsOf, UpdateBy, <address columns>)
SELECT AddressID, CURRENT_TIMESTAMP, @UpdateBy, <address columns>
FROM INSERTED
But every once in a while, I get a PK violation on dbo.AddressHistory complaining about a duplicate key being inserted. How is this possible if part of the AddressHistory PK is the current timestamp of the insertion?
Even executing this will insert two rows into the history table successfully:
INSERT INTO dbo.Address
(LastUpdateBy, <address columns>)
SELECT 'test', <address columns>
FROM dbo.Address
WHERE AddressID < 3
And the only update sproc I have for the dbo.Address table will update a row for a given AddressID. So it should only be updating one row at a time. My insert sproc only inserts one row at a time as well.
Any idea what conditions cause this to occur?

Based on your description, two concurrent executions of the stored procedure with the same parameter seem likely.
datetime only has a precision of about 1/300 of a second, so two executions that happen very close together can produce the same AsOf value and therefore a duplicate (AddressID, AsOf) key.
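A simplified model of that rounding shows how two different insert times can collide. It assumes milliseconds map to the nearest 1/300-second tick with halves rounded up, which reproduces the rounding examples in the SQL Server datetime documentation (for instance .995 through .998 all becoming .997). This is plain Python, not SQL Server, and the AddressID 42 is hypothetical.

```python
def datetime_tick(ms):
    """Nearest 1/300 s tick for a millisecond offset, halves rounded up: floor(ms * 0.3 + 0.5)."""
    return (ms * 6 + 10) // 20  # integer form of floor((6 * ms + 10) / 20)

def history_key(address_id, ms):
    """Hypothetical (AddressID, AsOf) key with AsOf stored at datetime precision."""
    return (address_id, datetime_tick(ms))

first = history_key(42, 995)   # trigger fires at .995 s
second = history_key(42, 998)  # concurrent execution 3 ms later
print(first == second)         # True: both land on tick 299, i.e. .997 s
```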
