Join tables by timestamp and date columns? - abap

I need to retrieve data from two log tables (BALHDR and ZIF_LOG_XML_CONTENT). My problem is that the only commonality between the two log tables is the time when the entries were created. The query has to work for a PERIOD and not for a TIME POINT.
However, the time of the entries is not stored in the same format in the two tables. In ZIF_LOG_XML_CONTENT it is stored in a single TIMESTAMP column, while in BALHDR the DATE and TIME are stored in two separate columns.
I tried to transform all the times to STRING, but it is still not working.
What am I doing wrong?
DATA: GV_DATEANDTIMETO TYPE STRING,
GV_DATETO TYPE STRING,
GV_TIMETO TYPE STRING,
GV_DATEANDTIMEFROM TYPE STRING,
GV_DATEFROM TYPE STRING,
GV_TIMEFROM TYPE STRING,
GV_DATUM TYPE STRING.
SELECT * FROM BALHDR INTO @GS_MSG_STRUKT WHERE
  EXTNUMBER = @P_EXTID AND
  OBJECT = @P_OBJ AND
  SUBOBJECT = @P_SUBOBJ AND
  ALUSER = @P_USER AND
  ( ALDATE_BALHDR >= @GV_INPUT_DATETO AND ALTIME_BALHDR >= @GV__INPUT_TIMETO ) AND
  ( ALDATE_BALHDR <= @GV_INPUT_DATEFROM AND ALTIME_BALHDR <= @GV__INPUT_TIMEFROM ) AND
  MSG_CNT_E >= 1 OR MSG_CNT_AL IS ZERO.

  CONCATENATE GS_MSGTABLE-DATE GS_MSGTABLE-TIME INTO GV_DATUM.

  SELECT RES_CONTENT, REQ_CONTENT
    FROM zif_log_content
    INTO @GS_MSG_STRUKT
    WHERE TIMESTAMP >= @GV_DATE AND TIMESTAMP <= @GV_DATE.
  ENDSELECT.
ENDSELECT.

Concatenating works; you just need to pass a timestamp into your SELECT, not a string.
Here is a working, simplified example based on the standard BALHDR and MBEW tables:
TYPES: BEGIN OF struct,
         lognumber TYPE balhdr-lognumber,
         aldate    TYPE balhdr-aldate,
         altime    TYPE balhdr-altime,
         timestamp TYPE mbew-timestamp,
       END OF struct.

DATA: gs_msg_strukt TYPE struct.
DATA: gt_msg_strukt TYPE TABLE OF struct.
DATA: gv_datum      TYPE string.

SELECT *
  FROM balhdr
  INTO CORRESPONDING FIELDS OF @gs_msg_strukt
  WHERE aldate >= @gv_input_dateto AND altime <= @gv_input_timeto.

  CONCATENATE gs_msg_strukt-aldate gs_msg_strukt-altime INTO gv_datum.
  DATA(gv_date) = CONV timestamp( gv_datum ).

  SELECT timestamp
    FROM mbew
    INTO CORRESPONDING FIELDS OF @gs_msg_strukt
    WHERE timestamp >= @gv_date AND timestamp <= @gv_date.
  ENDSELECT.

  APPEND gs_msg_strukt TO gt_msg_strukt. "<--- move APPEND here
ENDSELECT.

This won't work, as a TIMESTAMP in SAP is a decimal type and is not equal to a concatenation of the date and time in any way.
You should create your time stamp using the following statement.
CONVERT DATE gs_msgtable-date TIME gs_msgtable-time INTO TIME STAMP DATA(gv_timestamp) TIME ZONE sy-zonlo.
Also be careful with the time zone. I do not know which time zone the entries in your Z-table are stored in; in the BAL tables they should be stored in UTC. Be sure to check this first.

QUESTION
I haven't got a fully working minimal example yet, but I can give you an example of the two tables which I would like to join together. The third table shows the wanted result. Thanks.
-----------------------------------------------------------------------------
BALHDR
-----------------------------------------------------------------------------
  | EXTNUMBER | DATE       | TIME     | OBJECT | SUBOBJECT | USER | MSG_ALL | MSG_ERROR
-----------------------------------------------------------------------------
A | 1236      | 2000.10.10 | 12:33:24 | KAT    | LEK       | NEK  | NULL    | NULL
B | 1936      | 2010.02.20 | 02:33:44 | KAT    | MOK       | NEK  | 3       | 1
C | 1466      | 2010.10.10 | 11:35:34 | KAT    | LEK       | NEK  | 2       | 0
D | 1156      | 2011.08.03 | 02:13:14 | KAT    | MOK       | NEK  | 3       | 0
E | 1466      | 2014.10.10 | 11:35:34 | KAT    | LEK       | NEK  | NULL    | NULL
F | 1156      | 2019.08.03 | 02:13:14 | KAT    | MOK       | NEK  | 1       | 1
-----------------------------------------------------------------------------
ZIF_LOG
-----------------------------------------------------------------------------
  | TIMESTAMP      | REQ  | RES
-----------------------------------------------------------------------------
1 | 20100220023344 | he   | hello
2 | 20101010113534 | bla  | blala
3 | 20110803021314 | to   | toto
4 | 20190803021314 | macs | ka
The following table shows the wanted result. The numbers from 1 to 4 and the letters from A to F are there to help show how the rows correspond to each other.
-----------------------------------------------------------------------------
WANTED RESULT TABLE
-----------------------------------------------------------------------------
   | EXTNUMBER | DATE       | TIME     | OBJECT | SUBOBJECT | USER | REQ  | RES
-----------------------------------------------------------------------------
A  | 1236      | 2000.10.10 | 12:33:24 | KAT    | LEK       | NEK  | NULL | NULL
B1 | 1936      | 2010.02.20 | 02:33:44 | KAT    | MOK       | NEK  | he   | hello
E  | 1466      | 2014.10.10 | 11:35:34 | KAT    | LEK       | NEK  | NULL | NULL
F4 | 1156      | 2019.08.03 | 02:13:14 | KAT    | MOK       | NEK  | macs | ka
THX

Related

PostgreSQL Compare value from row to value in next row (different column)

I have a table of encounters called user_dates that is ordered by 'user' and 'start' like below. I want to create a column indicating whether an encounter was followed up by another encounter within 30 days. So basically I want to go row by row checking if "encounter_stop" is within 30 days of "encounter_start" in the following row (as long as the following row is the same user).
user | encounter_start | encounter_stop
A | 4-16-1989 | 4-20-1989
A | 4-24-1989 | 5-1-1989
A | 6-14-1993 | 6-27-1993
A | 12-24-1999 | 1-2-2000
A | 1-19-2000 | 1-24-2000
B | 2-2-2000 | 2-7-2000
B | 5-27-2001 | 6-4-2001
I want a table like this:
user | encounter_start | encounter_stop | subsequent_encounter_within_30_days
A | 4-16-1989 | 4-20-1989 | 1
A | 4-24-1989 | 5-1-1989 | 0
A | 6-14-1993 | 6-27-1993 | 0
A | 12-24-1999 | 1-2-2000 | 1
A | 1-19-2000 | 1-24-2000 | 0
B | 2-2-2000 | 2-7-2000 | 1
B | 5-27-2001 | 6-4-2001 | 0
You can use select ..., exists (select ... criteria), which returns a boolean (always true or false); if you really want 1 or 0, just cast the result to integer: true => 1 and false => 0. See Demo.
select ts1.user_id
, ts1.encounter_start
, ts1.encounter_stop
, (exists ( select null
from test_set ts2
where ts1.user_id = ts2.user_id
and ts2.encounter_start
between ts1.encounter_stop
and (ts1.encounter_stop + interval '30 days')::date
)::integer
) subsequent_encounter_within_30_days
from test_set ts1
order by user_id, encounter_start;
Difference: The above (and demo) disagree with your expected result:
B | 2-2-2000 | 2-7-2000| 1
subsequent_encounter (last column) should be 0. This entry starts and ends in Feb 2000; the other B entry starts in May 2001. Please explain how these are within 30 days (other than it just being a simple typo, that is).
Caution: Do not use user as a column name. It is both a Postgres and SQL Standard reserved word. You can sometimes get away with it or double quote it; if you double quote it, you MUST always do so. The big problem is that it has a predefined meaning (run select user;), and if you forget to double quote it, it does not necessarily produce an error or exception; it is much worse: wrong results.
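A quick illustration of that last point, as a sketch against the original user_dates table (assuming its user column was created with double quotes):
select user;                        -- returns the current role name, e.g. postgres
select user from user_dates;        -- one row per table row, but every value is the role name, not your column
select "user" from user_dates;      -- only the double-quoted form reads the actual column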

How to find an Informix DATETIME field qualifier in an existing schema

I have a table like this:
create table t (
t0 datetime year to fraction,
t1 datetime year to fraction(1),
t2 datetime year to fraction(2),
t3 datetime year to fraction(3),
t4 datetime year to fraction(4)
);
Now I'd like to reverse engineer this table's data type information. I'm mostly interested in the fractional seconds part, but if I can find the other qualifier information, even better. The following query doesn't work:
select
c.colname::varchar(10) colname,
informix.schema_coltypename(c.coltype, c.extended_id)::varchar(10) coltypename,
c.collength,
informix.schema_precision(c.coltype, c.extended_id, c.collength) precision,
informix.schema_numscale(c.coltype, c.collength) numscale,
informix.schema_datetype(c.coltype, c.collength) datetype,
c.coltype
from syscolumns c
join systables t on c.tabid = t.tabid
where t.tabname = 't'
It yields
|colname |coltypename|collength|precision |numscale |datetype |coltype|
|----------|-----------|---------|-----------|-----------|-----------|-------|
|t0 |DATETIME |4365 |4365 | |60 |10 |
|t1 |DATETIME |3851 |3851 | |60 |10 |
|t2 |DATETIME |4108 |4108 | |60 |10 |
|t3 |DATETIME |4365 |4365 | |60 |10 |
|t4 |DATETIME |4622 |4622 | |60 |10 |
The collength seems to contain the relevant information, but I cannot extract it with schema_precision or schema_numscale as is otherwise possible for numeric precisions. Also, schema_datetype yields no interesting results.
How can I reverse engineer the coltype information back to datetime year to fraction(N)?
Based on documentation Time data types:
For columns of type DATETIME or INTERVAL, collength is determined using the following formula:
(length * 256) + (first_qualifier * 16) + last_qualifier
The length is the physical length of the DATETIME or INTERVAL field, and first_qualifier and last_qualifier have values that the following table shows.
+------------------+--------+------------------+-------+
| Field qualifier | Value | Field qualifier | Value |
+------------------+--------+------------------+-------+
| YEAR | 0 | FRACTION(1) | 11 |
| MONTH | 2 | FRACTION(2) | 12 |
| DAY | 4 | FRACTION(3) | 13 |
| HOUR | 6 | FRACTION(4) | 14 |
| MINUTE | 8 | FRACTION(5) | 15 |
| SECOND | 10 | | |
+------------------+--------+------------------+-------+
Calculation (hex values shown to make the pattern easier to spot):
t1 datetime year to fraction(1), 15*256 + 0*16+11 = 3851 0x0F0B
t2 datetime year to fraction(2), 16*256 + 0*16+12 = 4108 0x100C
t3 datetime year to fraction(3), 17*256 + 0*16+13 = 4365 0x110D
t4 datetime year to fraction(4), 18*256 + 0*16+14 = 4622 0x120E
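For reference, the same formula can also be inverted directly with integer arithmetic instead of a lookup. A minimal sketch (assuming Informix's TRUNC and MOD functions and, as in the output above, coltype 10 for the DATETIME columns):
SELECT c.colname::varchar(10) AS colname,
       TRUNC(c.collength / 256)          AS phys_length,     -- length part of the formula
       TRUNC(MOD(c.collength, 256) / 16) AS first_qualifier, -- 0 = YEAR, 2 = MONTH, ...
       MOD(c.collength, 16)              AS last_qualifier   -- 11 = FRACTION(1), ..., 15 = FRACTION(5)
FROM syscolumns c
JOIN systables t ON c.tabid = t.tabid
WHERE t.tabname = 't'
AND c.coltype = 10;
The numeric codes then map back to the qualifier names via the table above.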
If the length is known then it is possible to reverse engineer it even using "brute force".
Lookup:
WITH l(v) AS (
VALUES (12),(13),(14),(15),(16),(17),(18)
), first_q(v, first_qualifier) AS (
VALUES (0,'YEAR'),(2,'MONTH'),(4,'DAY'),(6,'HOUR'),(8,'MINUTE'),(10, 'SECOND')
), last_q(v, last_qualifier) AS (
VALUES (11, 'FRACTION(1)'),(12, 'FRACTION(2)'),(13, 'FRACTION(3)'),
(14, 'FRACTION(4)'),(15, 'FRACTION(5)')
), result AS (
SELECT l.v * 256 + (first_q.v * 16) + last_q.v AS collen, *
FROM l CROSS JOIN first_q CROSS JOIN last_q
)
SELECT *
FROM result
--WHERE collen = 3851
db<>fiddle demo

Spark: how to perform a loop function on dataframes

I have two dataframes as below. I'm trying to search the second df using the foreign key and then generate a new dataframe. I was thinking of doing spark.sql("""select history.value as previous_year_1 from df1, history where df1.key=history.key and history.date=add_months($currentdate,-1*12)"""), but then I would need to do it multiple times, for say 10 previous years, and join them back together. How can I create a function for this? Many thanks. Quite new here.
dataframe one:
+---+---+-----------+
|key|val| date |
+---+---+-----------+
| 1|100| 2018-04-16|
| 2|200| 2018-04-16|
+---+---+-----------+
dataframe two : historical data
+---+---+-----------+
|key|val| date |
+---+---+-----------+
| 1|10 | 2017-04-16|
| 1|20 | 2016-04-16|
+---+---+-----------+
The result I want to generate is
+---+----------+-----------------+-----------------+
|key|date | previous_year_1 | previous_year_2 |
+---+----------+-----------------+-----------------+
| 1|2018-04-16| 10 | 20 |
| 2|null | null | null |
+---+----------+-----------------+-----------------+
To solve this, the following approach can be applied:
1) Join the two dataframes by key.
2) Filter out all the rows where previous dates are not exactly years before reference dates.
3) Calculate the years difference for the row and put the value in a dedicated column.
4) Pivot the DataFrame around the column calculated in the previous step and aggregate on the value of the respective year.
// Assumes the usual Spark imports and an active SparkSession named spark:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"..." column syntax

private def generateWhereForPreviousYears(nbYears: Int): Column =
  (-1 to -nbYears by -1) // loop on each backwards year value
    .map(yearsBack =>
      /*
       * Each year-back count is transformed into an expression
       * to be included in the WHERE clause.
       * This is equivalent to "history.date=add_months($currentdate,-1*12)"
       * in the comment in the question.
       */
      add_months($"df1.date", 12 * yearsBack) === $"df2.date"
    )
    /*
     * The previous .map call produces a sequence of Column expressions;
     * we need to combine them with "or" in order to obtain
     * a single Spark Column reference. The .reduce() function is most
     * appropriate here.
     */
    .reduce(_ or _) or $"df2.date".isNull // the last "or" is added to include empty lines in the result

val nbYearsBack = 3

val result = sourceDf1.as("df1")
  .join(sourceDf2.as("df2"), $"df1.key" === $"df2.key", "left")
  .where(generateWhereForPreviousYears(nbYearsBack))
  .withColumn("diff_years", concat(lit("previous_year_"), year($"df1.date") - year($"df2.date")))
  .groupBy($"df1.key", $"df1.date")
  .pivot("diff_years")
  .agg(first($"df2.val"))
  .drop("null") // drop the unwanted extra column produced by the null values
The output is:
+---+----------+---------------+---------------+
|key|date |previous_year_1|previous_year_2|
+---+----------+---------------+---------------+
|1 |2018-04-16|10 |20 |
|2 |2018-04-16|null |null |
+---+----------+---------------+---------------+
Let me "read through the lines" and give you a "similar" solution to what you are asking:
val df1Pivot = df1.groupBy("key").pivot("date").agg(max("val"))
val df2Pivot = df2.groupBy("key").pivot("date").agg(max("val"))
val result = df1Pivot.join(df2Pivot, Seq("key"), "left")
result.show
+---+----------+----------+----------+
|key|2018-04-16|2016-04-16|2017-04-16|
+---+----------+----------+----------+
| 1| 100| 20| 10|
| 2| 200| null| null|
+---+----------+----------+----------+
Feel free to manipulate the data a bit if you really need to change the column names.
Or even better:
df1.union(df2).groupBy("key").pivot("date").agg(max("val")).show
+---+----------+----------+----------+
|key|2016-04-16|2017-04-16|2018-04-16|
+---+----------+----------+----------+
| 1| 20| 10| 100|
| 2| null| null| 200|
+---+----------+----------+----------+

Oracle SQL condition in range of dates

I need an Oracle SQL query to get all the rows that respect this condition:
I have a table in which there are products with a start date of validity and an end date of validity. As input I have a range of dates (e.g. 20170530 and 20170630). I want to get all the products that are valid in the given range. Thank you
Edit:
You are right, I try to be more clear with an example.
I have a table PRODUCTS in which I have two fields: START_DATE and END_DATE (yyyymmdd)
PRODUCTS
----------------------------
|id | start_date | end_date |
----------------------------
|1 | 20170101 | 20171230 |
|2 | 20170501 | 20170705 |
|3 | 20170101 | 20170501 |
|4 | 20170601 | 20170620 |
|5 | 20171010 | 20171110 |
|6 | 20170110 | 20170610 |
I would like to extract all the products that are valid in the range 20170530-20170630, meaning that the validity period of the product must fall at least partly within the given range 20170530-20170630.
So, from the table above, I will extract the products with id
1
2
4
6
Thank you
** SOLVED Edit 2 **
OK, what I wanted is to get the rows in which the dates overlap the input date range given as a parameter. To do so, there is a simple condition:
(StartDate1 <= EndDate2) and (StartDate2 <= EndDate1)
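Applied to the PRODUCTS table above, a minimal sketch of that condition (assuming the range boundaries are passed as the bind variables :range_start and :range_end in the same YYYYMMDD format as the columns):
SELECT id
FROM products
WHERE start_date <= :range_end     -- StartDate1 <= EndDate2
AND :range_start <= end_date;      -- StartDate2 <= EndDate1
With :range_start = 20170530 and :range_end = 20170630 this returns the ids 1, 2, 4 and 6 from the example data.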
Your question is not clear, but here is my interpretation of it. You have a table such as this:
Figure 1: My Product Table
If you want all products that are valid for the whole range 09/07/2017 to 11/07/2017, then you would expect ITEM 1 and ITEM 2 to be returned. The SQL query would look something like this:
SELECT *
FROM MY_PRODUCT_TABLE
WHERE MY_START_DATE BETWEEN START_DATE AND END_DATE
AND MY_END_DATE BETWEEN START_DATE AND END_DATE
Remember that BETWEEN is inclusive, meaning values equal to START_DATE or END_DATE are taken into consideration as well.
Note: If you are using string variables as input, it would be wise to use the TO_DATE function (e.g. TO_DATE(MY_START_DATE, 'DD.MM.YYYY'), depending on the format entered).
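For example, with string inputs, a sketch of the same query (here :my_start_date and :my_end_date are hypothetical bind variables holding the input range as DD.MM.YYYY strings):
SELECT *
FROM MY_PRODUCT_TABLE
WHERE TO_DATE(:my_start_date, 'DD.MM.YYYY') BETWEEN START_DATE AND END_DATE
AND TO_DATE(:my_end_date, 'DD.MM.YYYY') BETWEEN START_DATE AND END_DATE;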

Django returns wrong results when selecting from a postgres view

I have a view defined in postgres, in a separate schema to the data it is using.
It contains three columns:
mydb=# \d "my_views"."results"
View "my_views.results"
Column | Type | Modifiers
-----------+-----------------------+-----------
Date | date |
Something | character varying(60) |
Result | numeric |
When I query it from psql or adminer, I get results like these:
bb_adminpanel=# select * from "my_views"."results";
Date | Something | Result
------------+-----------------------------+--------------
2015-09-14 | Foo | -3.36000000
2015-09-14 | Bar | -16.34000000
2015-09-12 | Foo | -11.55000000
2015-09-12 | Bar | 11.76000000
2015-09-11 | Bar | 2.48000000
However, querying it through Django, I get a different set:
(c is a cursor object on the database)
c.execute('SELECT * from "my_views"."results"')
c.fetchall()
[(datetime.date(2015, 9, 14), 'foo', Decimal('-3.36000000')),
(datetime.date(2015, 9, 14), 'bar', Decimal('-16.34000000')),
(datetime.date(2015, 9, 11), 'foo', Decimal('-11.55000000')),
(datetime.date(2015, 9, 11), 'bar', Decimal('14.24000000'))]
This doesn't match at all: the first two rows are correct, but the last two are really weird - they have a shifted date, and the Result of the last record is the sum of the last two records from psql.
I have no idea why that's happening, any suggestions welcome.
Here is the view definition:
SELECT a."Timestamp"::date AS "Date",
a."Something",
sum(a."x") AS "Result"
FROM my_views.another_view a
WHERE a.status::text = ANY (ARRAY['DONE'::character varying::text, 'CLOSED'::character varying::text])
GROUP BY a."Timestamp"::date, a."Something"
ORDER BY a."Timestamp"::date DESC;
and "another_view" looks like this:
Column | Type | Modifiers
---------------------------+--------------------------+-----------
Timestamp | timestamp with time zone |
Something | character varying(60) |
x | numeric |
status | character varying(100) |
(some columns omitted)
The simple explanation of the problem is: time zones.
In detail: you're not declaring any time zone setting when connecting through the PostgreSQL console, but Django does. That way, the timestamp of some records will point to a different day depending on the time zone used, for example with this data:
+-------------------------+-----------+-------+--------+
| timestamp | something | x | status |
+-------------------------+-----------+-------+--------+
| 2015-09-11 12:00:00 UTC | foo | 2.48 | DONE |
| 2015-09-12 00:50:00 UTC | foo | 11.76 | DONE |
+-------------------------+-----------+-------+--------+
a query on your view executed with time zone UTC will give you two rows, but a query executed with time zone GMT-2 will give you only one row, because in the GMT-2 time zone the timestamp from the second row still falls on 2015-09-11.
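You can reproduce this from psql with a couple of statements (a sketch; the zone Django actually applies comes from its TIME_ZONE setting, 'America/New_York' is only used here as an arbitrary zone west of UTC):
SET TIME ZONE 'UTC';
SELECT * FROM "my_views"."results";   -- rows grouped into UTC days, as seen in psql/adminer

SET TIME ZONE 'America/New_York';
SELECT * FROM "my_views"."results";   -- late-evening UTC timestamps now fall on the previous day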
To fix that, you can edit your view so that it always groups days according to a specified time zone:
SELECT (a."Timestamp" AT TIME ZONE 'UTC')::date AS "Date",
a."Something",
sum(a."x") AS "Result"
FROM my_views.another_view a
WHERE a.status::text = ANY (ARRAY['DONE'::character varying::text, 'CLOSED'::character varying::text])
GROUP BY (a."Timestamp" AT TIME ZONE 'UTC'), a."Something"
ORDER BY (a."Timestamp" AT TIME ZONE 'UTC') DESC;
That way, days will always be counted according to the 'UTC' time zone.