An efficient way to group columns in MonetDB - sql

I have the following INSERT ... SELECT statement in MonetDB:
insert into colombia.agregada_region_mes
(
cod_anomes
, cod_produto
, sg_estado
, cod_subcanal
, qtd_vendidas
, valor
, valor_dolar
, valor_euro
, fact_count
)
select
f.cod_anomes
, f.cod_produto
, f.sg_estado
, f.cod_subcanal
, sum(f.qtd_vendidas) as qtd_vendidas
, sum(f.valor) as valor
, sum(f.valor_dolar) as valor_dolar
, sum(f.valor_euro) as valor_euro
, count(*) as fact_count
from colombia.staging_rm_fact f
group by
f.cod_anomes
, f.cod_produto
, f.sg_estado
, f.cod_subcanal;
(Please note the GROUP BY part.)
The table "staging_rm_fact" has 50 million rows, and MonetDB exceeds 16 GB of memory while trying to execute the INSERT.
Is there a more efficient way to perform this GROUP BY?

Related

How can I fix this problem with an OR/AND clause in SQL

I have a set of patents with titles and abstracts, and I would like to search them so that the keyword “software” must be present and none of the keywords “chip”, “semiconductor”, “bus” or “circuit” may be present.
I tried this:
SELECT distinct
tls201_appln.docdb_family_id
, tls201_appln.appln_id
, [appln_auth]
, [appln_nr]
, [appln_kind]
, [appln_filing_date]
, [receiving_office]
, [earliest_publn_date]
, [granted]
, [nb_citing_docdb_fam]
, [nb_applicants]
, [nb_inventors]
, tls202_appln_title.appln_title
FROM tls201_appln
INNER JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
INNER JOIN tls203_appln_abstr ON tls201_appln.appln_id = tls203_appln_abstr.appln_id
WHERE (appln_title like '%software%'
or appln_abstract like '%software%')
AND appln_title not like '%chip%'
or '%semiconductor%'
or '%circuity%'
or '%circuitry%'
or '%bus'%'
or appln_abstract not like '%chip%'
or '%semiconductor%'
or '%circuity%'
or '%circuitry%'
or '%bus'%'
AND appln_filing_year between 2003 and 2008
But I'm getting this error: "An expression of non-boolean type specified in a context where a condition is expected, near 'or'." What should I do?
This is wrong:
and appln_title not like '%chip%' or '%semiconductor%' or '%circuity%'
This is right:
and appln_title not like '%chip%'
and appln_title not like '%semiconductor%'
and appln_title not like '%circuity%'
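To illustrate why each pattern needs its own NOT LIKE predicate, here is a minimal sketch that runs the same WHERE shape against SQLite through Python's sqlite3 module. The table `titles` and its four sample rows are made up for illustration; they are not the real PATSTAT tables, and SQLite is only a stand-in for SQL Server.

```python
import sqlite3

# Hypothetical sample data; the real query reads tls202_appln_title.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (appln_id INTEGER, appln_title TEXT)")
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [
        (1, "software for payroll"),
        (2, "software on a chip"),
        (3, "semiconductor test software"),
        (4, "mechanical gearbox"),
    ],
)

# Every NOT LIKE is a complete boolean predicate; AND combines them.
rows = conn.execute(
    """
    SELECT appln_id
    FROM titles
    WHERE appln_title LIKE '%software%'
      AND appln_title NOT LIKE '%chip%'
      AND appln_title NOT LIKE '%semiconductor%'
    ORDER BY appln_id
    """
).fetchall()

matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Only the row that mentions "software" and none of the excluded words survives the filter.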
Sample data would be helpful so I could test my code.
To help whoever maintains this code in the future, you'd do well to qualify your column names. It's difficult to know what's going on when I can't tell which table each column comes from.
This is considerably more verbose, but possibly easier to understand and maintain. Will this work?
with main as (
SELECT distinct
tls201_appln.docdb_family_id
, tls201_appln.appln_id
, [appln_auth]
, [appln_nr]
, [appln_kind]
, [appln_filing_date]
, [receiving_office]
, [earliest_publn_date]
, [granted]
, [nb_citing_docdb_fam]
, [nb_applicants]
, [nb_inventors]
, tls202_appln_title.appln_title
, appln_abstract
FROM tls201_appln
INNER JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
INNER JOIN tls203_appln_abstr ON tls201_appln.appln_id = tls203_appln_abstr.appln_id
WHERE appln_title + ' ' + appln_abstract like '%software%'
AND appln_filing_year between 2003 and 2008
)
select docdb_family_id
, appln_id
, [appln_auth]
, [appln_nr]
, [appln_kind]
, [appln_filing_date]
, [receiving_office]
, [earliest_publn_date]
, [granted]
, [nb_citing_docdb_fam]
, [nb_applicants]
, [nb_inventors]
, appln_title
from main
except
select docdb_family_id
, appln_id
, [appln_auth]
, [appln_nr]
, [appln_kind]
, [appln_filing_date]
, [receiving_office]
, [earliest_publn_date]
, [granted]
, [nb_citing_docdb_fam]
, [nb_applicants]
, [nb_inventors]
, appln_title
from main
where exists (
select 1
from (
select '%chip%' as cond
union select '%semiconductor%'
union select '%circuity%'
union select '%circuitry%'
union select '%bus%'
) c
where appln_title like c.cond
or appln_abstract like c.cond
)
based on https://stackoverflow.com/a/1127100/9937026
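As a rough sketch of the same idea, the excluded patterns can live in a small table and be checked with NOT EXISTS instead of EXCEPT. This is a variant, not the answer's exact query. It runs on SQLite through Python's sqlite3 module, and the tables `docs` and `excluded` plus their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the joined title/abstract data.
conn.execute(
    "CREATE TABLE docs (appln_id INTEGER, appln_title TEXT, appln_abstract TEXT)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "software for payroll", "computes salaries"),
        (2, "payroll software", "runs on a dedicated chip"),
        (3, "bus scheduling software", "timetables"),
        (4, "gearbox", "mechanical part"),
    ],
)

# One row per excluded LIKE pattern (assumed list, shortened).
conn.execute("CREATE TABLE excluded (cond TEXT)")
conn.executemany(
    "INSERT INTO excluded VALUES (?)",
    [("%chip%",), ("%semiconductor%",), ("%bus%",)],
)

# Keep rows that mention software and match no excluded pattern
# in either the title or the abstract.
rows = conn.execute(
    """
    SELECT d.appln_id
    FROM docs d
    WHERE (d.appln_title LIKE '%software%' OR d.appln_abstract LIKE '%software%')
      AND NOT EXISTS (
            SELECT 1
            FROM excluded e
            WHERE d.appln_title LIKE e.cond
               OR d.appln_abstract LIKE e.cond
          )
    ORDER BY d.appln_id
    """
).fetchall()

kept_ids = [row[0] for row in rows]
print(kept_ids)
```

Adding a keyword then means inserting one row into the pattern table rather than editing the WHERE clause.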

Inserting CTE results into temp table generates duplicate rows

I have a long series of CTEs, and I want to insert the results into a temporary table at the end. However, I am ending up with 16 rows of duplicate data instead of just the one. Here is my code:
INSERT INTO #lvl_1_results
(
[Level_1]
, [Level_0]
, [P]
, [Sim (%)]
, [V]
, [StAV (%)]
, [CV]
, [CV (%)]
, [Sim (Mean)]
, [SD]
, [IAV]
, [MAV]
, [BMP]
, [BMP (%)]
, [BMV]
, [BMV (%)]
, [BMCV]
, [BMCV (%)]
, [Sim BMP (Mean)]
, [BM SD]
, [SAP]
, [SAP (%)]
, [ActV]
, [ActV (%)]
, [ACV]
, [ACV (%)]
, [Act (Mean)]
, [Act SD]
, [MV]
, [MV (%)]
)
SELECT sc.[Level_1]
, sc.[Level_0]
, [P]
, [P] / NULLIF([MV], 0)
, [V]
, [V] / NULLIF([MV], 0)
, [CV]
, [CV] / NULLIF([upper VaR], 0)
, [Sim (Mean)]
, [SD]
, [IAV]
, [MAV]
, [BMP]
, [BMP] / NULLIF([MV], 0)
, [BMV]
, [BMV] / NULLIF([MV], 0)
, [BMCV]
, [BMCV] / NULLIF([upper bm VaR], 0)
, [Sim BMP (Mean)]
, [BM SD]
, [SAP]
, [SAP] / NULLIF([MV], 0)
, [ActV]
, [ActV] / NULLIF([MV], 0)
, [ACV]
, [ACV] / NULLIF([upper active VaR], 0)
, [Act (Mean)]
, [Act SD]
, [MV]
, [MV] / NULLIF([parent_MV], 0)
FROM sc
, s_p
, s_v
, bm_c
, bm_v
, bm_p
, a_c
, a_v
, a_p
, MV
, upper_mv
, c_l_v
, bm_c_l_v
, a_c_l_v
, s_m
, bm_m
, a_m
, sim_sd
, bm_sd
, a_sd
, s_i
, s_mar
How can I get this query to produce just one row? For performance reasons, I think I should opt for a solution that doesn't just select the distinct rows.
As others have indicated in their comments, this query doesn't use the ANSI JOIN syntax:
TableA
[INNER/LEFT] JOIN
TableB
ON
TableA.id=tableb.id
The advantage of this style is that you must supply an ON clause, which forces you to think about how the data in your tables is actually related.
You haven't told the database how this data is related at all, so it has joined every row it found to every other row, giving every possible combination of rows.
Because you're getting 16 rows, some of your tables must have 2 or more rows. If four tables have 2 rows each, the result is 2x2x2x2 = 16 rows. (Or maybe two tables have 2 rows and one has 4, or two tables have 4 rows, or one has 2 and one has 8; either way, when every row is joined to every other row, the row counts multiply to 16.)
If you have 2 tables that hold the values a,b and 1,2 and they're joined the way you have done here, the database will give you:
a,1
a,2
b,1
b,2
These rows are all unique in themselves, but if you selected only the letter you'd see a a b b and then complain that there were duplicate letters. If you looked only at the numbers, the complaint would be the same.
Add more tables without any relationship between them and it gets worse. Adding another 2-row table with x and y would increase your results to 8 rows.
You have to tell your database which columns in which tables link the tables together. If you want to see which tables cause your duplicates, do a select *.
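The letters-and-numbers example can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module; the `letters` and `numbers` tables and their `id` columns are invented for illustration and stand in for the CTEs in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE letters (id INTEGER, letter TEXT);
    CREATE TABLE numbers (id INTEGER, num INTEGER);
    INSERT INTO letters VALUES (1, 'a'), (2, 'b');
    INSERT INTO numbers VALUES (1, 1), (2, 2);
    """
)

# Comma-separated FROM list with no join condition: every combination.
cross = conn.execute(
    "SELECT letter, num FROM letters, numbers ORDER BY letter, num"
).fetchall()

# Explicit JOIN with an ON clause: only the rows that actually relate.
joined = conn.execute(
    """
    SELECT l.letter, n.num
    FROM letters l
    INNER JOIN numbers n ON n.id = l.id
    ORDER BY l.letter
    """
).fetchall()

print(len(cross), cross)    # 2 x 2 = 4 combinations
print(len(joined), joined)  # one row per matching id
```

The first query returns four rows and the second returns two, which is the same multiplication effect that produces 16 rows in the original query.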

ORA-30926: unable to get a stable set of rows in the source tables

I have a customer who gets ORA-30926: unable to get a stable set of rows in the source tables.
The log shows the error message for error 30926:
13:52:19 (00:00:02.406) ERROR : Error (30926) (00:00:02.406) ORA-30926: unable to get a stable set of rows in the source tables
TS03_MIN0100: UpdTable failed. Update inv_value in cMinTimeTable:
MERGE INTO HUBWBPMS5_ENTTS03005400223 a USING ( SELECT DISTINCT a.inv_value +
( a.inv_value_sum - h.inv_value ) AS inv_value , a.rowid xzfd_rid
FROM HUBWBPMS5_ENTTS03005700223 h , HUBWBPMS5_ENTTS03005400223 a
WHERE a.voucher_no = h.voucher_no AND a.sequence_no = h.max_seq_no
AND a.client = h.client ) xzfd_t ON ( xzfd_t.xzfd_rid = a.rowid )
WHEN MATCHED THEN
UPDATE
SET a.inv_value = xzfd_t.inv_value
I have checked for duplicate values in the tables but can't find anything unusual.
Maybe someone has an idea that could be useful.
The query causing the error (temp table) is:
INSERT INTO HUBWBPMS5_ENTTS03005700228 ( agg_flag , ace_code , activity , category , client , cost_dep , description , dim1 , dim2 , dim3 , dim4 , inc_ref , inv_value , max_seq_no , pd , period , project , resource_id , resource_typ , trans_date , unit , voucher_no , work_order , work_type )
SELECT agg_flag , ace_code , activity , category , client , cost_dep , description , dim1 , dim2 , dim3 , dim4 , inc_ref , SUM ( inv_value ) inv_value , max_seq_no , pd , period , project , resource_id , resource_typ , trans_date , unit , voucher_no , work_order , work_type
FROM HUBWBPMS5_ENTTS03005400228
WHERE agg_flag = 1
GROUP BY agg_flag , ace_code , activity , category , client , cost_dep , description , dim1 , dim2 , dim3 , dim4 , period , trans_date , voucher_no , max_seq_no , inc_ref , pd , project , resource_id , resource_typ , unit , work_order , work_type
When you get that error, it will be from a MERGE statement, and it indicates that there are multiple rows in the source dataset that match to a row you're joining to in the target table, and as such, Oracle doesn't know which one to use to do the update.
Taking your merge statement:
MERGE INTO HUBWBPMS5_ENTTS03005400223 a
USING (SELECT DISTINCT a.inv_value + ( a.inv_value_sum - h.inv_value ) AS inv_value,
a.rowid xzfd_rid
FROM HUBWBPMS5_ENTTS03005700223 h,
HUBWBPMS5_ENTTS03005400223 a
WHERE a.voucher_no = h.voucher_no
AND a.sequence_no = h.max_seq_no
AND a.client = h.client) xzfd_t
ON (xzfd_t.xzfd_rid = a.rowid)
WHEN MATCHED THEN
UPDATE SET a.inv_value = xzfd_t.inv_value;
it looks like the join between the two tables HUBWBPMS5_ENTTS03005700223 and HUBWBPMS5_ENTTS03005400223 in the xzfd_t subquery returns multiple rows for one or more of the HUBWBPMS5_ENTTS03005400223 rows (i.e. you get multiple rows returned for at least one a.rowid).
To check this, run:
SELECT xzfd_rid,
COUNT(*) cnt
FROM (SELECT DISTINCT a.inv_value + ( a.inv_value_sum - h.inv_value ) AS inv_value,
a.rowid xzfd_rid
FROM HUBWBPMS5_ENTTS03005700223 h,
HUBWBPMS5_ENTTS03005400223 a
WHERE a.voucher_no = h.voucher_no
AND a.sequence_no = h.max_seq_no
AND a.client = h.client)
GROUP BY xzfd_rid
HAVING COUNT(*) > 1;
To fix this, you need to make the xzfd_t subquery return a single row for each xzfd_rid. You could use row_number() to pick a single row, or replace the DISTINCT with an aggregate query that sums up all the h.inv_value values per a.rowid.
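The check and the aggregate fix can be sketched on toy data. This runs on SQLite through Python's sqlite3 module, not Oracle; the `target` and `source` tables, their columns, and the values are hypothetical, and `voucher_no` stands in for the rowid key of the real MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE target (voucher_no INTEGER, inv_value REAL);
    CREATE TABLE source (voucher_no INTEGER, inv_value REAL);
    INSERT INTO target VALUES (100, 0), (200, 0);
    -- voucher 100 appears twice, so an update from source is ambiguous
    INSERT INTO source VALUES (100, 10.0), (100, 15.0), (200, 7.0);
    """
)

# Diagnostic: which target rows would receive more than one source row?
ambiguous = conn.execute(
    """
    SELECT t.voucher_no, COUNT(*) AS cnt
    FROM target t
    JOIN source s ON s.voucher_no = t.voucher_no
    GROUP BY t.voucher_no
    HAVING COUNT(*) > 1
    """
).fetchall()

# One possible fix: collapse the source to one row per key first.
one_per_key = conn.execute(
    """
    SELECT voucher_no, SUM(inv_value) AS inv_value
    FROM source
    GROUP BY voucher_no
    ORDER BY voucher_no
    """
).fetchall()

print(ambiguous)
print(one_per_key)
```

Whether summing is the right way to collapse the duplicates depends on what the values mean; picking a single row per key is the other option mentioned above.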

Column 'city.POPULATION' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I was writing a query and got this error message.
This is the query I'm trying to execute:
select D.ID_city
, D.ID_year
, D.NB_deaths
, D.NB_births_M
, D.NB_births_F
, D.NB_population_AS
, D.NB_population_ACL
, COUNT( NB_births_M+ NB_births_F)/C.population as Birthrate
, COUNT(NB_deaths_M + NB_deaths_F) as Mortality
from Demography D
join region C
on (D.Id_city =C.Id_city)
group
by D.ID_city
, D.ID_year
, D.NB_deaths_M
, D.NB_births_M
, D.NB_births_F
, D.NB_population_AS
, D.NB_population_ACL
You should include c.population in the group by, since you use that column outside the count aggregate function:
COUNT( NB_births_M+ NB_births_F)/C.population
Use this (added D.NB_deaths_F too since it is necessary too):
select D.ID_city
, D.ID_year
, D.NB_deaths
, D.NB_births_M
, D.NB_births_F
, D.NB_population_AS
, D.NB_population_ACL
, COUNT( NB_births_M+ NB_births_F)/C.population as Birthrate
, COUNT(NB_deaths_M + NB_deaths_F) as Mortality
from Demography D
join region C
on (D.Id_city =C.Id_city)
group
by D.ID_city
, D.ID_year
, D.NB_deaths_M
, D.NB_deaths_F -- <-- added
, D.NB_births_M
, D.NB_births_F
, D.NB_population_AS
, D.NB_population_ACL
, C.population -- <-- added

How to retrieve the records of the table from database to another database

I have two tables in different databases:
AccntID from table 'account' <-- database 'ECPNWEB'
AccountTID from table 'tblPolicy' <-- database 'GENESIS'
Now I want to insert into 'tblPolicy' (in database 'GENESIS') like this:
INSERT INTO dbo.tblPolicy
(
PolicyID ,
AccountTID ,
DistributorID ,
CARDNAME ,
DENOMINATION ,
RETAILPRICE ,
COSTPAYABLE ,
ECPAYFEES ,
PLUCODE
)
-- Insert statements for procedure here
select t.* from
(Select AccountTID=#AccntID, DistributorID=#DistributorID, CARDNAME=#CARDNAME, DENOMINATION=#DENOMINATION, RETAILPRICE=#RETAILPRICE, COSTPAYABLE=#COSTPAYABLE, ECPAYFEES=#ECPAYFEES, PLUCODE=#PLUCODE) t,
account a
where a.AccntID = t.AccountTID --for account
What I want is to insert into tblPolicy (database 'GENESIS') "ONLY" the rows that match the 'account' table in the other database, 'ECPNWEB'.
You can select from two databases as shown below:
SELECT table1.SomeField, table2.SomeField
FROM [ServerName1].[Database1].[dbo].[Table1] table1
INNER JOIN [ServerName2].[Database2].[dbo].[Table2] table2
ON table1.SomeField = table2.SomeField
The key point is the fully qualified name: [ServerName].[DatabaseName].[databaseowner].[tableName].
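The idea of qualifying each table with its database can be tried out in a small sketch. It uses SQLite's ATTACH DATABASE through Python's sqlite3 module as a stand-in for two SQL Server databases, so the naming is database.table rather than the four-part SQL Server form. The schema names `ecpnweb` and `genesis` mirror the question, but the reduced column list and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate in-memory databases attached under their own names.
conn.execute("ATTACH DATABASE ':memory:' AS ecpnweb")
conn.execute("ATTACH DATABASE ':memory:' AS genesis")

conn.execute("CREATE TABLE ecpnweb.account (AccntID INTEGER)")
conn.execute("CREATE TABLE genesis.tblPolicy (AccountTID INTEGER, CARDNAME TEXT)")
conn.executemany("INSERT INTO ecpnweb.account VALUES (?)", [(1,), (2,)])

# Hypothetical incoming values; account 3 does not exist in ecpnweb.account.
candidates = [(1, "card A"), (3, "card C")]

for account_tid, cardname in candidates:
    # Insert only when the account exists in the other database.
    conn.execute(
        """
        INSERT INTO genesis.tblPolicy (AccountTID, CARDNAME)
        SELECT ?, ?
        FROM ecpnweb.account a
        WHERE a.AccntID = ?
        """,
        (account_tid, cardname, account_tid),
    )

inserted = conn.execute(
    "SELECT AccountTID, CARDNAME FROM genesis.tblPolicy"
).fetchall()
print(inserted)
```

Only the candidate whose account exists in the other database is inserted.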
This should work
;With Cte As
(
Select AccountTID=#AccntID
, DistributorID=#DistributorID
, CARDNAME=#CARDNAME
, DENOMINATION=#DENOMINATION
, RETAILPRICE=#RETAILPRICE
, COSTPAYABLE=#COSTPAYABLE
, ECPAYFEES=#ECPAYFEES
, PLUCODE=#PLUCODE
)
INSERT INTO GENESIS.dbo.tblPolicy
(
PolicyID ,
AccountTID ,
DistributorID ,
CARDNAME ,
DENOMINATION ,
RETAILPRICE ,
COSTPAYABLE ,
ECPAYFEES ,
PLUCODE
)
Select t.*
From Cte t,ECPNWEB..account a WITH (NOLOCK)
where a.AccntID = t.AccountTID