I am trying to fill columns D and E.
Column A: varchar(64) - unique for each trip
Column B: smallint
Column C: timestamp without time zone (Excel mangled it in the image below, but you can assume it is a timestamp column)
Column D: numeric - time from origin in minutes
Column E: numeric - time to destination in minutes
Each trip has a different set of intermediate stations, and I am trying to figure out the time elapsed since the origin and the time remaining to the destination.
Cell D2 = C2 - C2 = 0
Cell D3 = C3 - C2
Cell D4 = C4 - C2
Cell E2 = C6 - C2
Cell E3 = C6 - C3
Cell E6 = C6 - C6 = 0
The main issue is that each trip contains a different number of stations for each trip_id. I can think of using PARTITION BY, but I can't figure out how to implement it.
A sub-question: I am dealing with a very large table (100 million rows). What is the best way PostgreSQL experts implement data modifications? Do you create a sample table from the original data and test everything on the sample before applying the modifications to the original table, or do you use something like BEGIN TRANSACTION on the original data so that you can roll back in case of any error?
PS: Help with question title appreciated.
You don't need to know the number of stops.
with a as (
    select *,
           -- epoch/60 gives total minutes; extract(minutes from ...) would
           -- return only the minutes component of the interval (0-59)
           extract(epoch from c - min(c) over (partition by a)) / 60 as dd,
           extract(epoch from max(c) over (partition by a) - c) / 60 as ee
    from td
)
update td
set d = dd, e = ee
from a
where a.a = td.a and a.b = td.b;
http://sqlfiddle.com/#!17/c9112/1
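If it helps to sanity-check the window-function approach outside Postgres, here is a minimal sketch using Python's bundled sqlite3 (SQLite 3.25+ is needed for window functions; the table and column names follow the question, the sample timestamps are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE td (a TEXT, b INTEGER, c TEXT, d REAL, e REAL);
    INSERT INTO td (a, b, c) VALUES
        ('trip1', 1, '2020-01-01 08:00:00'),
        ('trip1', 2, '2020-01-01 08:15:00'),
        ('trip1', 3, '2020-01-01 08:40:00'),
        ('trip2', 1, '2020-01-01 09:00:00'),
        ('trip2', 2, '2020-01-01 09:30:00');
""")

# d = minutes since the trip's first stop, e = minutes until its last stop,
# both computed per trip via PARTITION BY on the trip id (column a).
rows = conn.execute("""
    SELECT a, b,
           (julianday(c) - julianday(MIN(c) OVER (PARTITION BY a))) * 1440 AS d,
           (julianday(MAX(c) OVER (PARTITION BY a)) - julianday(c)) * 1440 AS e
    FROM td
    ORDER BY a, b
""").fetchall()

for r in rows:
    print(r)
```

In Postgres the same per-trip windows feed an UPDATE ... FROM, with extract(epoch from ...) / 60 in place of the julianday arithmetic.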
I'm no expert in MSAS cubes, so maybe this is obvious, but it is blocking an important feature for our team.
We have a fact table of "Indicators" (basically values from a calculator) that are computed for a specific date. Indicators have a VersionId, used to group them according to a functional rule.
It goes like :
From Date, Value, NodeId, VersionId
D0 - 1.45 - N2 - V0
We have a fact table of "VersionsAssociation" that lists all the versions (the very same versions as the ones in the "Indicator" fact table) that are valid and visible and for what date.
To fit with a customer need, some versions are visible at multiple dates.
For instance, a version computed for date D0, may be visible/recopied for date D1, D2, ...; so for a specific version V0, we would have in "VersionAssociation" :
VersionId , Date From (computed), Date To (Visible at what date)
V0 - D0 - D0
V0 - D0 - D1
V0 - D0 - D2
V0 - D0 - D3
...
In our cube model, "Indicators" facts have a "From Date", the date they are computed for, but no "To Date", because when they are visible is not up to the indicator but rather decided by the "VersionAssociation".
This means that in our "Dimension Usage" panel, we have a many-to-many relation from "Indicator" pointing to "VersionAssociation" on the "To Date" dimension.
So far, this part works as expected. When we select "To Date" = D1 in Excel, we see indicators recopied from D0, with the right values (no duplicates).
Then we have a thing called projection, where we split an indicator value alongside a specific dimension. For that we have a third measure group called "Projection", with values called "Weight".
Weights have a "To Date" because they are computed for a specific date, and even if an indicator is copied from D0 into D1, when projected it is projected using the D1 weights.
We also duplicate the weights across all the available From Dates; that's strange, but without it, the results are pure chaos.
Meaning we would have in the weights:
NodeId,From Date, To Date, Projection Axis, Weight
N2 , D0 , D0 , P1 , 0.75
N2 , D0 , D0 , P2 , 0.25 (a value on node N2 would be split into 2 different values, where the sum is still the same)
N2 , D0 , D1 , P1 , 0.70
N2 , D0 , D1 , P2 , 0.30
Here goes the issue:
The Measure Group "Projection" and "Indicator" are directly linked to the dimension "Projection".
"Projection" has a direct link to the "From Date" and the "To Date" dimension.
"Indicator" has a direct link to the "From Date" dimension, but only a m2m reference to the "To Date" dimension, through the "Version Association" measure group.
To apply the projection weights, we use a measure expression on the measures of the "Indicator" measure group, something like "[Value Unit] * [Weight]".
For reasons unknown, this causes MSAS to not properly discriminate which weights are eligible to apply to a given value in the "Indicator" measure group.
For instance, if we look in Excel and ask for the D1 date (same behavior for all dates), on the projection axis P1 we get:
Value Weight
1.45 * 0.75 (Weight: From Date D0, To Date D0, P1)
+ 1.45 * 0.70 (Weight: From Date D0, To Date D1, P1)
for D1 and P2 we have :
Value Weight
1.45 * 0.25 (Weight: From Date D0, To Date D0, P2)
+ 1.45 * 0.30 (Weight: From Date D0, To Date D1, P2)
This causes the values to be meaningless and unreadable.
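The double-counting in the two sums above can be made concrete with plain arithmetic (figures taken from the example; "expected" assumes only the To Date = D1 weight should apply):

```python
value = 1.45   # the indicator value, From Date D0

# Weights MSAS considers eligible for To Date = D1, projection axis P1:
w_d0 = 0.75    # From Date D0, To Date D0 (should NOT apply at D1)
w_d1 = 0.70    # From Date D0, To Date D1 (the only one that should apply)

actual = value * w_d0 + value * w_d1   # what the cube returns
expected = value * w_d1                # what the business rule wants

print(actual, expected)
```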
So what all of this is for is to ask for a way to limit the weights that can be applied in the measure expression. We tried using SCOPE on "From Date" and "To Date" with the "Weight" or "Value" measures, but the cube never steps into our SCOPE instructions.
This is long and complicated, but we're stuck.
I am not sure I understood your problem completely, but what I gather is that since there is no projection axis in the "Indicator" fact, for the same FromDate and ToDate the indicator value repeats when a projection is selected.
example from your data
D0 , D0 , P1 , 0.75
D0 , D0 , P2 , 0.25
For this, the indicator value 1.45 is repeated for both rows, whereas it should be 1.45*0.75 for the first row and 1.45*0.25 for the second.
If this is the issue, try the query below:
with member Measures.IndicatorTest as
    ([DimFromDate].[FromDate].CurrentMember,
     [DimToDate].[ToDate].CurrentMember,
     [Value Unit])
member Measures.ProjectionTest as
    ([DimFromDate].[FromDate].CurrentMember,
     [DimToDate].[ToDate].CurrentMember,
     [DimProjection].[Projection].CurrentMember,
     [Weight])
member Measures.WeightedIndicator as
    Measures.IndicatorTest * Measures.ProjectionTest
select Measures.WeightedIndicator
on columns,
nonempty
(
    [DimFromDate].[FromDate].[FromDate] *
    [DimToDate].[ToDate].[ToDate] *
    [DimProjection].[Projection].[Projection]
)
on rows
from yourCube
For closure: as it turns out, the expected behavior is not possible (as far as our team tried), so we reverted to merging two of the three tables together, and having only one many-to-many join in the measure groups.
Today I am trying to figure out the best way to create a solution to my problem.
I am trying to generate a due date (column J). This due date is based on another date (column N); starting from that date I need to check a priority level (column K), of which there are four values: 2, 3, 4, or 5. Then I need to check column C for the first two letters of the string. There are three different options that can come up in column C, such as DR, SR, and A4, but A4 can be ignored altogether. Below are the formulas for DR and SR.
DR'S --------------------------------------------------------
2A (or B) - Column N + 29 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+29)
3A (or B) - Column N + 89 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+89)
4A (or B) - Column N + 179 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+179)
5A (or B) - Column N + 364 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+364)
SR'S -----------------------------------------------------------
2A (or B) - Column N + 89 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+89)
3A (or B) - Column N + 179 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+179)
4A (or B) - Column N + 279 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+279)
5A (or B) - Column N + 364 = Column J(The solution)
=DATE(YEAR(N29)+0,MONTH(N29)+0,DAY(N29)+364)
I was hoping to get a nudge in the right direction, and some insight of the best way to implement this.
Try,
=N29+IF(LEFT(C29, 2)="DR", CHOOSE(LEFT(K29)-1, 29, 89, 179, 364), IF(LEFT(C29, 2)="SR", CHOOSE(LEFT(K29)-1, 89, 179, 279, 364), 0))
You may need to format the target cell as a date, otherwise you may receive a serial number like 43,089.
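The same lookup logic can be sketched outside Excel; here is a minimal Python version of the rule (the offset tables are taken from the question's DR/SR lists; function and variable names are invented for illustration):

```python
from datetime import date, timedelta

# Days to add, keyed by record type (first two letters of column C)
# and priority level (leading digit of column K).
OFFSETS = {
    "DR": {2: 29, 3: 89, 4: 179, 5: 364},
    "SR": {2: 89, 3: 179, 4: 279, 5: 364},
}

def due_date(base: date, record: str, priority: str) -> date:
    """Column J: the column N date plus the offset for the record type
    and priority level. A4 records (or unknown types) get no offset."""
    kind = record[:2].upper()
    level = int(priority[0])
    days = OFFSETS.get(kind, {}).get(level, 0)
    return base + timedelta(days=days)

print(due_date(date(2020, 1, 1), "DR-1234", "2A"))
```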
I am seeking to combine the results of two columns and view them in a single column:
select description1, description2 from daclog where description2 is not null;
This returns two rows:
1st row:
DESCRIPTION1
Initialization scans sent to RTU 1, 32 bit mask: 0x00000048. Initialization mask bits are as follows: B0 - status dump, B1 - analog dump B2 - accumulator dump, B3 - Group Data Dump, B4 - accumulat
(here begin DESCRIPTION2)
,or freeze, B5 - power fail reset, B6 - time sync.
2nd row:
DESCRIPTION1
Initialization scans sent to RTU 1, 32 bit mask: 0x00000048. Initialization mask bits are as follows: B0 - status dump, B1 - analog dump B2 - accumulator dump, B3 - Group Data Dump, B4 - accumulat
(here begin DESCRIPTION2)
,or freeze, B5 - power fail reset, B6 - time sync.
Then I need the values of description1 and description2 in the same column.
Is it possible?
Thank you!
You can combine two columns into one by using the || operator.
select description1 || description2 as description from daclog where description2 is not null;
If you would like to use substrings from each of the descriptions, you can use string functions and then combine the results: FNC(description1) || FNC(description2), where FNC is a function returning the desired substring of your column.
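A quick way to see the operator in action (sqlite3 is used here as a stand-in for your database; note that in standard SQL `||` yields NULL if either operand is NULL, so COALESCE is a common companion when you drop the IS NOT NULL filter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daclog (description1 TEXT, description2 TEXT)")
conn.execute("INSERT INTO daclog VALUES ('Initialization scans sent', ', or freeze')")
conn.execute("INSERT INTO daclog VALUES ('No second part here', NULL)")

# The query from the answer: concatenate the two columns into one.
rows = conn.execute("""
    SELECT description1 || description2 AS description
    FROM daclog
    WHERE description2 IS NOT NULL
""").fetchall()
print(rows)

# If rows with NULLs should be kept, COALESCE stops the result becoming NULL.
rows_all = conn.execute("""
    SELECT description1 || COALESCE(description2, '') AS description
    FROM daclog
""").fetchall()
print(rows_all)
```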
Hi everyone,
I would like to ask the community for help finding a way to cache our huge flat table by splitting it into multiple hashes or otherwise.
The sample of table, as an example for structure:
A1 B1 C1 D1 E1 X1
A1 B1 C1 D1 E1 X2
A7 B5 C2 D1 E2 X3
A8 B1 C1 D1 E2 X4
A1 B6 C3 D2 E2 X5
A1 B1 C1 D2 E1 X6
This is our denormalized data; we have no ability to normalize it.
So currently we must perform 'group by' to get the required items. For instance, to get all D* we perform data.GroupBy(A1).GroupBy(B1).GroupBy(C1), and it takes a lot of time.
We temporarily found a workaround by creating a composite string key:
A1 -> 'list of lines begin A1'
A1:B1 -> 'list of lines begin A1:B1'
A1:B1:C1 -> 'list of lines begin A1:B1:C1'
...
as a cache of results of grouping operations.
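A minimal sketch of that composite-key cache in Python (the rows are the sample from the question; one pass builds every prefix key, after which each lookup is a single dict access instead of chained grouping):

```python
from collections import defaultdict

rows = [
    ("A1", "B1", "C1", "D1", "E1", "X1"),
    ("A1", "B1", "C1", "D1", "E1", "X2"),
    ("A7", "B5", "C2", "D1", "E2", "X3"),
    ("A8", "B1", "C1", "D1", "E2", "X4"),
    ("A1", "B6", "C3", "D2", "E2", "X5"),
    ("A1", "B1", "C1", "D2", "E1", "X6"),
]

# Index every row under each prefix of its key columns:
# "A1", "A1:B1", "A1:B1:C1", ... matching the composite string keys above.
index = defaultdict(list)
for row in rows:
    for depth in range(1, len(row)):
        index[":".join(row[:depth])].append(row)

print(len(index["A1"]), len(index["A1:B1:C1"]))
```

The trade-off is memory: each row is stored under one entry per prefix depth, which is where the ~60M-entry estimate for 10M rows comes from.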
The question is: how can this be stored efficiently?
The estimated number of lines in the denormalized data is around 10M records, and since there are 6 columns in my example, that makes roughly 60M entries in the hash. So I'm looking for an approach to look up values in O(1) if possible.
Thanks.
Most files I am working with only have the following fields:
F00001 - usually 1 (f1) or 9 (f9)
K00001 - usually only 1-3 sub-fields of
zoned decimals and ebcdic
F00002 - sub-fields of ebcdic, zoned and
packed decimals
Occasionally other field names K00002, F00003 and F00004 will appear in cross reference files.
Example Data:
+---------+--------------------------------------------------+--------------------------------------------------------------------------------------------+
| F00001 | K00001 | F00002 |
+---------+--------------------------------------------------+--------------------------------------------------------------------------------------------+
| f1 | f0 f0 f0 f0 f1 f2 f3 f4 f5 f6 d7 c8 | e2 e3 c1 c3 d2 d6 e5 c5 d9 c6 d3 d6 e7 40 12 34 56 7F e2 d2 c5 c5 e3 |
+---------+--------------------------------------------------+--------------------------------------------------------------------------------------------+
Currently using:
SELECT SUBSTR(HEX(F00001), 1, 2) AS FNAME_1,
       SUBSTR(HEX(K00001), 1, 14) AS KNAME_1,
       SUBSTR(HEX(K00001), 15, 2) AS KNAME_2,
       SUBSTR(HEX(K00001), 17, 2) AS KNAME_3,
       SUBSTR(HEX(F00002), 1, 28) AS FNAME_2,
       SUBSTR(HEX(F00002), 29, 8) AS FNAME_3,
       SUBSTR(HEX(F00002), 37, 10) AS FNAME_4
FROM QS36F.FILE
Is this the best way to unpack EBCDIC values as strings?
You asked for 'the best way'. Manually fiddling with the bytes is categorically NOT the best way. @JamesA has a better answer: externally describe the table and use more traditional SQL to access it. I see in your comments that you have multiple layouts within the same table. This was typical years ago when we converted from punched cards to disk. I feel your pain, having experienced this many times.
If you are using SQL to run queries, I think you have several options, all of which revolve around having a sane DB2 table instead of a jumbled S/36 flat file. Without more details on the business problem, all we can do is offer suggestions.
1) Add a trigger to QS36F.FILE that will break out the intermingled records into separate SQL defined tables. Query those.
2) Write some UDFs that will pack and unpack numbers. If you're querying today, you'll be updating tomorrow and if you think you have some chance of maintaining the raw HEX(this) and HEX(that) for SELECTS, wait until you try to do an UPDATE that way.
3) Write stored procedures that will extract out the bits you need for a given query, put them into SQL tables - maybe even a GLOBAL TEMPORARY TABLE. Have the SP query those bits and return a result set that can be consumed by other SQL queries. IBM i supports user defined table functions as well.
4) Have the RPG team write you a conversion program that will read the old file and create a data warehouse that you can query against.
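For option 2), the nibble-level rules are simple enough to prototype before committing to UDFs. Here is a hedged sketch in Python (stdlib only, using the cp037 EBCDIC codec; the sample bytes come from the question's K00001/F00002 example, and the helper names are invented):

```python
def unpack_zoned(b: bytes) -> int:
    """Zoned decimal: the low nibble of each byte is a digit;
    the high nibble of the last byte carries the sign (D = negative)."""
    digits = int("".join(str(byte & 0x0F) for byte in b))
    return -digits if (b[-1] >> 4) == 0xD else digits

def unpack_packed(b: bytes) -> int:
    """Packed decimal: two digits per byte; the final nibble is the sign."""
    hexed = b.hex().upper()
    value = int(hexed[:-1])
    return -value if hexed[-1] == "D" else value

# EBCDIC character data just needs the right codec (CCSID 37 here).
text = bytes.fromhex("E2D2C5C5E3").decode("cp037")  # last 5 bytes of F00002

print(unpack_zoned(bytes.fromhex("F1F2F3")),
      unpack_packed(bytes.fromhex("1234567F")),
      text)
```

This is only a readability aid for understanding the layouts; the durable fix is still an externally described table or conversion step as the options above describe.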
It almost looks as if old S/36 files are being accessed and the system runs under CCSID 65535. That could cause the messy "hex" representation issue as well as at least some of the column name issues. A little more info about the server environment would be useful.