Convert a complex PostgreSQL query to Ruby - sql

I'm new here and have some difficult to convert a complex Postgres query to Ruby. Here is the query:
SELECT date_part('year', p.created_at) AS anio,
date_part('month'::text, p.created_at) AS mes,
p.id_pais, pa.nombre AS nombre_pais,
pr.id_region, re.nombre AS nombre_region,
co.id_provincia, pr.nombre AS nombre_provincia,
p.id_comuna, co.nombre AS nombre_comuna,
p.id_tipo_propiedad, p.id_modalidad,
count(*) AS total_propiedades,
sum((p.precio/mo.conversion_dolar)/p.dimension_propiedad) AS suma_precio_m2_dolar,
sum((p.precio/mo.conversion_dolar)/p.dimension_propiedad)/count(*) AS promedio_precio_m2_dolar
FROM propiedad AS p
INNER JOIN monedas AS mo ON (p.id_moneda = mo.id)
INNER JOIN comuna AS co ON (p.id_comuna = co.id)
INNER JOIN provincia AS pr ON (co.id_provincia = pr.id)
INNER JOIN region AS re ON (pr.id_region = re.id)
INNER JOIN pais AS pa ON (p.id_pais = pa.id)
WHERE p.id_modalidad IS NOT NULL
AND p.created_at IS NOT NULL
AND p.precio > 1
AND p.dimension_propiedad > 1
GROUP BY date_part('year',p.created_at), date_part('month'::text, p.created_at), p.id_tipo_propiedad,
p.id_modalidad, p.id_pais, pr.id_region, co.id_provincia, p.id_comuna, pa.nombre, re.nombre, pr.nombre, co.nombre
If you can help me with this i'll appreciate it.

I'd just leave something like that as plain SQL and hand it to select_rows:
connection.select_rows(%q{
-- your big pile of SQL goes here
}).each do |row|
# row will be an array of Strings in here
end
You'll have to unstringify things yourself but that shouldn't be terribly difficult. You could also use execute:
rows = connection.execute(%q{
-- SQL goes here...
})
rows.each do |row|
# row will be a hash
end
Your query is pulling a bunch of data out of the database that doesn't correspond to any particular model so there's not much point in trying to get ActiveRecord to produce that SQL. If the query was producing a model, then you'd use Model.find_by_sql (and still avoid trying to convince ActiveRecord to produce the SQL you're after).

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
Check your indexes on fragmentation, to speed up your query. And make sure you have indexes.
If you just need one result, just TOP 1
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
,tr.[EffectiveDate] AS [Date]
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added in table aliases and removed all the unnecessary parentheses for readability - you may disagree that it is more readable, but I would expect that most people would agree
This may not offer any performance benefits over simply grouping by the columns you are selecting and keeping your joins as they are - SQL is after all a declarative language where you tell the engine what you want, not how to get it. So you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advance of being more semantically tied to what you are trying to do though, so gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, then you may need to inspect the execution plan, and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.

Join 3 tables using column common to all 3 tables

I am a total SQL novice, so please bear with me. I have three tables that are set up in the following fashion:
date|country|Test 1|Test 2|Test 3|etc.
The data in the date and country columns are identical across the three tables, and the differences are in the data in the Test columns. I'd like to use Join to query one date column and the three corresponding Test columns from the three tables.
I'm planning on just re-building the table so that the Test columns in the other tables are additional columns in the one table, but I'd still like to know how to use Join in this way. This is what I have at the moment, although it's throwing an error saying that there's an error in the syntax of the FROM clause. It's worth noting that I'm running this query in VBA using an Access DB.
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM CountryRaw as r
INNER JOIN CountryPct as p ON p.CPctDate = r.CRDate
INNER JOIN CountryZ as z ON z.CZDate = p.CPctDate
WHERE r.Country = 'US' AND p.Country = 'US' AND z.Country = 'US'
I came across something using SELECT COALESCE(r.CRDate, p.CPctDate, z.CZDate) to start, but I didn't get anywhere with that.
MS Access requires extra parentheses. So try this:
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM (CountryRaw as r INNER JOIN
CountryPct as p
ON p.CPctDate = r.CRDate
) INNER JOIN
CountryZ as z
ON z.CZDate = p.CPctDate
WHERE r.Country = 'US' AND p.Country = 'US' AND z.Country = 'US'

script optimization in sql

I am running simple select script, which inner join with other 3 table . all the tables are big ( lots of data ) its taking around 20 sec to run. want to optimized it.
I tried to used nolock , but not much deference
SELECT RR.ReportID,
RR.RequestFormat,
RRP.SequenceNumber,
RRP.ParameterName,
RRP.ParameterValue
CASE WHEN RP.ParameterLabelOvrrd IS NULL THEN P.ParameterLabel ELSE .ParameterLabelOvrrd END AS ParameterLabelChosen,
RRP.ParameterValueEntered
FROM ReportRequestParameters AS RRP WITH (NOLOCK)
INNER JOIN ReportRequests AS RR WITH (NOLOCK) ON RRP.RequestID = RR.RequestID
INNER JOIN ReportParameter AS RP WITH (NOLOCK) ON RP.ReportID = RR.ReportID
AND RP.SequenceNumber = RRP.SequenceNumber
INNER JOIN Parameter AS P WITH (NOLOCK) ON P.ParameterID = RP.ParameterID
WHERE RRP.RequestID = '2226765'
ORDER BY SequenceNumber;
Please advice.
This is your query:
SELECT RR.ReportID, RR.RequestFormat, RRP.SequenceNumber,
RRP.ParameterName, RRP.ParameterValue
COALESCE(RP.ParameterLabelOvrrd, P.ParameterLabel) as ParameterLabelChosen,
RRP.ParameterValueEntered
FROM ReportRequestParameters RRP JOIN
ReportRequests RR
ON RRP.RequestID = RR.RequestID JOIN
ReportParameter RP
ON RP.ReportID = RR.ReportID AND
RP.SequenceNumber = RRP.SequenceNumber JOIN
Parameter P
ON P.ParameterID = RP.ParameterID
WHERE RRP.RequestID = 2226765
ORDER BY RRP.SequenceNumber;
I have removed the single quotes on 2226765, assuming that the id is a number. Mixing types can impede the optimizer.
Then, I recommend an index on ReportRequestParameters(RequestID, SequenceNumber). I assume the other tables have indexes on the appropriate columns, but these are:
ReportRequests(RequestID, ReportID, SequenceNumber)
ReportParameter(ReportID, SequenceNumber, ParameterID)
Parameter(ParameterID)
I strongly advise you not to use nolock, unless you know what you are doing. Aaron Bertrand has a good blog post on this subject.
I would suggest running with the execution plan turned on and see if SSMS can advise you on additional indexing.
Other than that your query looks straight-forward, nothing code wise that is going to help make it faster, other than perhaps getting rid of the case statement and definitely getting rid of the NOLOCK statements.

Combining two SQL select statements takes 30 to 40 minutes to get executed

RDBMS - MS SQL
How can I combine these two SQL select queries into one and get them executed quicly:
Query 1
select
VVO.VV_CODE,
V.Vessel_name,
VVO.Arrival_date,
isnull(IGM.VIR_NO,'NULL') as VIR_NO,
isnull(VVO.TERMINAL_CODE,'NULL') as TERMINAL_CODE
from Vessel_voyage VVO, Vessel V, IGM
where V.Vessel_code = substring(VVO.VV_CODE,1,3) and VVO.VV_CODE = IGM.VV_CODE
Query 2
select
BLD.BL_NO,
isnull(BLD.Parent_BL,'NULL') as Parent_BL,
BLD.Consignee_Description,
DO.DO_Issue_Date,
CA.CAgent_Name,
BLC.Container_No ,
CS.Container_Size_Description
from BL_DATA BLD, CAgent CA, Delivery_Order DO, BL_Container BLC, Container_Size CS
where BLD.BL_NO = DO.BL_NO and DO.CAgent_Code = CA.CAgent_Code and BLD.BL_NO = BLC.BL_NO and BLC.Container_Size_Code = CS.Container_Size_Code
Executing these select queries individually, they get executed within seconds.
But making them into a single select query, they take around 30 to 40 minutes to get executed.
This is what I tried:
select
VVO.VV_CODE,
V.Vessel_name,
VVO.Arrival_date,
isnull(IGM.VIR_NO,'NULL') as VIR_NO,
isnull(VVO.TERMINAL_CODE,'NULL') as TERMINAL_CODE,
BLD.BL_NO,
isnull(BLD.Parent_BL,'NULL') as Parent_BL,
BLD.Consignee_Description,
DO.DO_Issue_Date,
CA.CAgent_Name,
BLC.Container_No ,
CS.Container_Size_Description
from Vessel_voyage VVO, Vessel V, IGM ,BL_DATA BLD, CAgent CA, Delivery_Order DO, BL_Container BLC, Container_Size CS
where V.Vessel_code = substring(VVO.VV_CODE,1,3) and VVO.VV_CODE = IGM.VV_CODE and BLD.BL_NO = DO.BL_NO and DO.CAgent_Code = CA.CAgent_Code and BLD.BL_NO = BLC.BL_NO and BLC.Container_Size_Code = CS.Container_Size_Code
Writing the query using ANSI style joins would make it easier to read - so I've done exactly that. It also helps spot where there are problems within the join logic.
Re-writing the joins I get to this :
from Vessel_voyage VVO
inner join Vessel V on V.Vessel_code = substring(VVO.VV_CODE,1,3)
inner join IGM on VVO.VV_CODE = IGM.VV_CODE
inner join BL_DATA BLD on BLD.BL_NO = DO.BL_NO
inner join CAgent CA
inner join Delivery_Order DO on DO.CAgent_Code = CA.CAgent_Code
inner join BL_Container BLC on BLD.BL_NO = BLC.BL_NO
inner join Container_Size CS on BLC.Container_Size_Code = CS.Container_Size_Code
I could move around the DO to CA join predicate but ultimately I have 8 tables being joined, but only 6 predicates joining them - net result is that it is cartesian'ing in one of the tables which is likely to give incorrect results, but most certainly will cause a performance degradation.
If you use this style of join I suspect you will be able to fix it very easily.

Need help optimizing this tSQL Query

I'm definitely not a DBA and unfortunately we don't have a DBA to consult within at our company. I was wondering if someone could give me a recommendation on how to improve this query, either by changing the query itself or adding indexes to the database.
Looking at the execution plan of the query it seems like the outer joins are killing the query. This query only returns 350k results, but it takes almost 30 seconds to complete. I don't know much about DB's, but I don't think this is good? Perhaps I'm wrong?
Any suggestions would be greatly appreciated. Thanks in advance.
As a side note this is obviously being create by an ORM and not me directly. We are using Linq-to-SQL.
SELECT
[t12].[value] AS [DiscoveryEnabled],
[t12].[value2] AS [isConnected],
[t12].[Interface],
[t12].[Description] AS [InterfaceDescription],
[t12].[value3] AS [Duplex],
[t12].[value4] AS [IsEnabled],
[t12].[value5] AS [Host],
[t12].[value6] AS [HostIP],
[t12].[value7] AS [MAC],
[t12].[value8] AS [MACadded],
[t12].[value9] AS [PortFast],
[t12].[value10] AS [PortSecurity],
[t12].[value11] AS [ShortHost],
[t12].[value12] AS [SNMPlink],
[t12].[value13] AS [Speed],
[t12].[value14] AS [InterfaceStatus],
[t12].[InterfaceType],
[t12].[value15] AS [IsUserPort],
[t12].[value16] AS [VLAN],
[t12].[value17] AS [Code],
[t12].[Description2] AS [Description],
[t12].[Host] AS [DeviceName],
[t12].[NET_OUID],
[t12].[DisplayName] AS [Net_OU],
[t12].[Enclave]
FROM (
SELECT
[t1].[DiscoveryEnabled] AS [value],
[t1].[IsConnected] AS [value2],
[t0].[Interface],
[t0].[Description],
[t2].[Duplex] AS [value3],
[t0].[IsEnabled] AS [value4],
[t3].[Host] AS [value5],
[t6].[Address] AS [value6],
[t3].[MAC] AS [value7],
[t3].[MACadded] AS [value8],
[t2].[PortFast] AS [value9],
[t2].[PortSecurity] AS [value10],
[t4].[Host] AS [value11],
[t0].[SNMPlink] AS [value12],
[t2].[Speed] AS [value13],
[t2].[InterfaceStatus] AS [value14],
[t8].[InterfaceType],
[t0].[IsUserPort] AS [value15],
[t2].[VLAN] AS [value16],
[t9].[Code] AS [value17],
[t9].[Description] AS [Description2],
[t7].[Host], [t7].[NET_OUID],
[t10].[DisplayName],
[t11].[Enclave],
[t7].[Decommissioned]
FROM [dbo].[IDB_Interface] AS [t0]
LEFT OUTER JOIN [dbo].[IDB_InterfaceLayer2] AS [t1] ON [t0].[IDB_Interface_ID] = [t1].[IDB_Interface_ID]
LEFT OUTER JOIN [dbo].[IDB_LANinterface] AS [t2] ON [t1].[IDB_InterfaceLayer2_ID] = [t2].[IDB_InterfaceLayer2_ID]
LEFT OUTER JOIN [dbo].[IDB_Host] AS [t3] ON [t2].[IDB_LANinterface_ID] = [t3].[IDB_LANinterface_ID]
LEFT OUTER JOIN [dbo].[IDB_Infrastructure] AS [t4] ON [t0].[IDB_Interface_ID] = [t4].[IDB_Interface_ID]
LEFT OUTER JOIN [dbo].[IDB_AddressMapIPv4] AS [t5] ON [t3].[IDB_AddressMapIPv4_ID] = ([t5].[IDB_AddressMapIPv4_ID])
LEFT OUTER JOIN [dbo].[IDB_AddressIPv4] AS [t6] ON [t5].[IDB_AddressIPv4_ID] = [t6].[IDB_AddressIPv4_ID]
INNER JOIN [dbo].[ART_Asset] AS [t7] ON [t7].[ART_Asset_ID] = [t0].[ART_Asset_ID]
LEFT OUTER JOIN [dbo].[NSD_InterfaceType] AS [t8] ON [t8].[NSD_InterfaceTypeID] = [t0].[NSD_InterfaceTypeID]
INNER JOIN [dbo].[NSD_InterfaceCode] AS [t9] ON [t9].[NSD_InterfaceCodeID] = [t0].[NSD_InterfaceCodeID]
INNER JOIN [dbo].[NET_OU] AS [t10] ON [t10].[NET_OUID] = [t7].[NET_OUID]
INNER JOIN [dbo].[NET_Enclave] AS [t11] ON [t11].[NET_EnclaveID] = [t10].[NET_EnclaveID]
) AS [t12]
WHERE ([t12].[Enclave] = 'USMC') AND (NOT ([t12].[Decommissioned] = 1))
LINQ-TO-SQL Query:
return from t in db.IDB_Interfaces
join v in db.IDB_InterfaceLayer3s on t.IDB_Interface_ID equals v.IDB_Interface_ID
join u in db.ART_Assets on t.ART_Asset_ID equals u.ART_Asset_ID
join c in db.NET_OUs on u.NET_OUID equals c.NET_OUID
join w in
(from d in db.IDB_InterfaceIPv4s
select new { d.IDB_InterfaceIPv4_ID, d.IDB_InterfaceLayer3_ID, d.IDB_AddressMapIPv4_ID, d.IDB_AddressMapIPv4.IDB_AddressIPv4.Address })
on v.IDB_InterfaceLayer3_ID equals w.IDB_InterfaceLayer3_ID
join h in db.NET_Enclaves on c.NET_EnclaveID equals h.NET_EnclaveID into enclaveLeftJoin
from i in enclaveLeftJoin.DefaultIfEmpty()
join m in
(from z in db.IDB_StandbyIPv4s
select new
{
z.IDB_InterfaceIPv4_ID,
z.IDB_AddressMapIPv4_ID,
z.IDB_AddressMapIPv4.IDB_AddressIPv4.Address,
z.Preempt,
z.Priority
})
on w.IDB_InterfaceIPv4_ID equals m.IDB_InterfaceIPv4_ID into standbyLeftJoin
from k in standbyLeftJoin.DefaultIfEmpty()
where t.ART_Asset.Decommissioned == false
select new NetIDBGridDataResults
{
DeviceName = u.Host,
Host = u.Host,
Interface = t.Interface,
IPAddress = w.Address,
ACLIn = v.InboundACL,
ACLOut = v.OutboundACL,
VirtualAddress = k.Address,
VirtualPriority = k.Priority,
VirtualPreempt = k.Preempt,
InterfaceDescription = t.Description,
Enclave = i.Enclave
};
As a rule (and this is very general), you want an index on:
JOIN fields (both sides)
Common WHERE filter fields
Possibly fields you aggregate
For this query, start with checking your JOIN criteria. Any one of those missing will force a table scan which is a big hit.
Looking at the execution plan of the query it seems like the outer joins are killing the query.
This query only returns 350k results, but it takes almost 30 seconds to complete. I don't know
much about DB's, but I don't think this is good? Perhaps I'm wrong?
A man has got to do waht a mana has got to do.
The joins may kill you, but when you need them YOU NEED THEM. Some tasks take long.
Make sure you ahve all indices you need.
Make sure your sql server is not a sad joke hardware wise.
All you can do.
I woudl bet someone has no clue about SQL and needs to be enlighted to the power of indices.