I am using Spark SQL to process the following dataset, so it can fit a marketing attribution model:
| user_ID | timestamp | activity | campaign | event_name
------ --------- ------ -------- ------
| akalsds124 | 2021-12-31 10:00 | click | Holidays Campaign | NULL
| akalsds124 | 2022-03-01 16:32 | click | Super Campaign | NULL
| akalsds124 | 2022-04-27 20:55 | event | NULL | purchase
| akalsds124 | 2022-05-10 10:21 | event | NULL | purchase
| akalsds124 | 2022-06-25 09:22 | click | IG 3 Campaign | NULL
| akalsds124 | 2022-07-07 15:00 | event | NULL | purchase
| ijnbmshs33 | 2022-05-02 10:31 | click | New Campaign | NULL
| ijnbmshs33 | 2022-07-04 17:01 | click | Mega Campaign | NULL
A click activity is an ad click made by the user and an event is an interaction inside the app (e.g. a purchase, login, etc).
To create the table above, you can use this code:
df=spark.createDataFrame(
[('akalsds124','2021-12-31 10:00','click','Holidays Campaign','NULL'),
('akalsds124','2022-03-01 16:32','click','Super Campaign','NULL'),
('akalsds124','2022-04-27 20:55','event','NULL','purchase'),
('akalsds124','2022-05-10 10:21','event','NULL', 'purchase'),
('akalsds124','2022-06-25 09:22','click','IG 3 Campaign','NULL'),
('akalsds124','2022-07-07 15:00','event','NULL','purchase'),
('ijnbmshs33','2022-05-02 10:31','click','New Campaign','NULL'),
('ijnbmshs33','2022-07-04 17:01','click','Mega Campaign','NULL')],
['user_id','timestamp','activity','campaign','event_name']
)
I need to create a path containing each user's campaign touchpoints as a list. When a user purchases a product, a new path must be started for their next touchpoints.
I also need a column named 'converted' holding a flag (1 if the path led to a purchase, 0 if it did not), and another column, 'total_conversions', with the total number of purchases per path.
The expected output should be like this:
| user_ID | path | converted | total_conversions
----- ------ ----- -------
| akalsds124 | [Holidays Campaign,Super Campaign] | 1 | 2
| akalsds124 | [IG 3 Campaign] | 1 | 1
| ijnbmshs33 | [New Campaign,Mega Campaign] | 0 | 0
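As a reference for the rule described above, here is a minimal plain-Python sketch (not Spark) of the path-splitting logic. It assumes that a path is closed by the first click that follows one or more purchases; `build_paths` is a hypothetical helper name, and the rows are the sample data from the question, with None standing in for NULL.

```python
from itertools import groupby

# Sample rows: (user_id, timestamp, activity, campaign, event_name).
rows = [
    ("akalsds124", "2021-12-31 10:00", "click", "Holidays Campaign", None),
    ("akalsds124", "2022-03-01 16:32", "click", "Super Campaign", None),
    ("akalsds124", "2022-04-27 20:55", "event", None, "purchase"),
    ("akalsds124", "2022-05-10 10:21", "event", None, "purchase"),
    ("akalsds124", "2022-06-25 09:22", "click", "IG 3 Campaign", None),
    ("akalsds124", "2022-07-07 15:00", "event", None, "purchase"),
    ("ijnbmshs33", "2022-05-02 10:31", "click", "New Campaign", None),
    ("ijnbmshs33", "2022-07-04 17:01", "click", "Mega Campaign", None),
]


def build_paths(rows):
    """Return (user_id, path, converted, total_conversions) tuples."""
    result = []
    # Sort by user, then by timestamp (the string format sorts chronologically).
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user, events in groupby(ordered, key=lambda r: r[0]):
        path, conversions = [], 0
        for _, _, activity, campaign, event_name in events:
            if activity == "click":
                # Assumption: a click after at least one purchase starts a new path.
                if conversions:
                    result.append((user, path, 1, conversions))
                    path, conversions = [], 0
                path.append(campaign)
            elif event_name == "purchase":
                conversions += 1
        # Emit the last (possibly unconverted) path of this user.
        if path or conversions:
            result.append((user, path, int(conversions > 0), conversions))
    return result


paths = build_paths(rows)
for row in paths:
    print(row)
```

Because the clicks are appended in timestamp order, the touchpoints keep the order in which they happened, which matters for attribution models that weight first and last touches differently.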
Starting from the dataset you created, here is what I've done:
Data preparation:
from pyspark.sql import functions as F, Window as W
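# Turn event_name into a 0/1 purchase flag.
df = df.withColumn(
    "event_name", F.when(F.col("event_name") == "purchase", 1).otherwise(0)
)
# Fetch the previous row's purchase flag for each user, in time order.
df = df.withColumn(
    "rnk", F.lag("event_name").over(W.partitionBy("user_id").orderBy("timestamp"))
)
# Flag the first non-purchase row after a purchase: a new path starts there.
df = df.withColumn(
    "rnk", F.when((F.col("rnk") == 1) & (F.col("event_name") != 1), 1).otherwise(0)
)
# A running sum of those flags gives each path an id within its user.
df = df.withColumn(
    "rnk", F.sum("rnk").over(W.partitionBy("user_id").orderBy("timestamp"))
)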
Aggregation:
# One output row per user and path id.
df = df.groupBy("user_id", "rnk").agg(
    F.collect_set("campaign").alias("path"),
    F.max("event_name").alias("converted"),
    F.sum("event_name").alias("total_conversions"),
)
Result:
df.show()
+----------+---+--------------------+---------+-----------------+
| user_id|rnk| path|converted|total_conversions|
+----------+---+--------------------+---------+-----------------+
|akalsds124| 0|[Super Campaign, ...| 1| 2|
|akalsds124| 1|[NULL, IG 3 Campa...| 1| 1|
|ijnbmshs33| 0|[Mega Campaign, N...| 0| 0|
+----------+---+--------------------+---------+-----------------+
Note that collect_set does not guarantee the order of the elements, and that the paths contain the literal string 'NULL' because the sample data stores the text 'NULL' rather than a real null. With real nulls (None in the createDataFrame call), collect_set skips them.
I apologise in advance because I have no idea how to structure this question.
I have the following tables:
Sessions:
+----------+---------+
| login | host |
+----------+---------+
| breilly | node001 |
+----------+---------+
| pparker | node003 |
+----------+---------+
| jjameson | node004 |
+----------+---------+
| jjameson | node012 |
+----------+---------+
Userlist:
+----------+----------------+------------------+
| login | primary_server | secondary_server |
+----------+----------------+------------------+
| breilly | node001 | node010 |
+----------+----------------+------------------+
| pparker | node002 | node003 |
+----------+----------------+------------------+
| jjameson | node003 | node004 |
+----------+----------------+------------------+
What kind of SQL query should I run to get a table like this?
+----------+---------+------------+
| login | Host | Server |
+----------+---------+------------+
| jjameson | node004 | Secondary |
+----------+---------+------------+
| jjameson | node012 | Wrong Node |
+----------+---------+------------+
| pparker | node003 | Secondary |
+----------+---------+------------+
| breilly | node001 | Primary |
+----------+---------+------------+
Currently I'm just using Go with a bunch of structs / hashmaps to generate this.
I am planning to migrate the users / sessions to an in-memory SQLite database, but I can't seem to wrap my head around a query that produces this sort of table.
The Server column indicates whether the user is logged on to their primary server, their secondary server, or a wrong machine.
I've put this in SQL Fiddle as well.
Use a CASE expression:
select s.*,
       (case when s.host = ul.primary_server then 'Primary'
             when s.host = ul.secondary_server then 'Secondary'
             else 'Wrong Node'
        end) as server
from sessions s left join
     userlist ul
     on s.login = ul.login;
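Since the plan is to move to an in-memory SQLite database, the query can be tried out there directly. The following is a small sketch using Python's built-in sqlite3 module with the sample rows from the question; the ORDER BY clause is an addition made only to keep the output stable.

```python
import sqlite3

# In-memory SQLite database holding the two sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (login TEXT, host TEXT);
    CREATE TABLE userlist (login TEXT, primary_server TEXT, secondary_server TEXT);

    INSERT INTO sessions VALUES
        ('breilly', 'node001'),
        ('pparker', 'node003'),
        ('jjameson', 'node004'),
        ('jjameson', 'node012');

    INSERT INTO userlist VALUES
        ('breilly', 'node001', 'node010'),
        ('pparker', 'node002', 'node003'),
        ('jjameson', 'node003', 'node004');
    """
)

# Classify every session by comparing its host with the user's servers.
query = """
    SELECT s.login,
           s.host,
           CASE WHEN s.host = ul.primary_server   THEN 'Primary'
                WHEN s.host = ul.secondary_server THEN 'Secondary'
                ELSE 'Wrong Node'
           END AS server
    FROM sessions s
    LEFT JOIN userlist ul ON s.login = ul.login
    ORDER BY s.login, s.host
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The LEFT JOIN keeps sessions of users that are missing from userlist; for those rows both comparisons evaluate to NULL, so they fall through to 'Wrong Node'.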
Consider the following sample table ("Customer") with these records:
=========
Customer
=========
-----------------------------------------------------------------------------------------------
| customer-id | att-a | att-b | att-c | att-d | att-e | att-f | att-g | att-h | att-i | att-j |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-1 | att-a-7 | att-b-3 | att-c-10 | att-d-10 | att-e-15 | att-f-11 | att-g-2 | att-h-7 | att-i-5 | att-j-14 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-2 | att-a-9 | att-b-7 | att-c-12 | att-d-4 | att-e-10 | att-f-4 | att-g-13 | att-h-4 | att-i-1 | att-j-13 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-3 | att-a-10 | att-b-6 | att-c-1 | att-d-1 | att-e-13 | att-f-12 | att-g-9 | att-h-6 | att-i-7 | att-j-4 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| customer-19 | att-a-7 | att-b-9 | att-c-13 | att-d-5 | att-e-8 | att-f-5 | att-g-12 | att-h-14 | att-i-13 | att-j-15 |
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
I have these records, and many more, loaded into a SQL database, and I want to find the top 10 similar customers based on their attribute values. For example, customer-1 and customer-19 have at least one matching column value, i.e. "att-a-7", so the output should give me two customer IDs, customer-1 and customer-19, as the most similar customers.
P.S. One or more columns can match across rows.
I'm using a window function to find the top 10 similar customers, and I'm not sure whether this is correct.
This is the approach I used in my query:
row_number() over (partition by att-a, att-b,..,att-j order by customer-id) as customers
Is this correct?
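One way to express "at least one matching column value" is to compare every pair of customers and count the attributes that are equal. The sketch below does this with Python's built-in sqlite3 module on the four sample rows. The underscore column names (customer_id, att_a ... att_j) are assumptions made to avoid quoting the hyphenated names, and the pairwise self-join is only one possible approach that may need rethinking for large tables. Note that a PARTITION BY over all ten attributes only groups rows in which every attribute is identical.

```python
import sqlite3

letters = "abcdefghij"
cols = [f"att_{c}" for c in letters]  # assumed column names

# Numeric suffixes of the sample values, e.g. 7 stands for "att-a-7".
raw = {
    "customer-1": [7, 3, 10, 10, 15, 11, 2, 7, 5, 14],
    "customer-2": [9, 7, 12, 4, 10, 4, 13, 4, 1, 13],
    "customer-3": [10, 6, 1, 1, 13, 12, 9, 6, 7, 4],
    "customer-19": [7, 9, 13, 5, 8, 5, 12, 14, 13, 15],
}
rows = [
    (cid, *[f"att-{c}-{n}" for c, n in zip(letters, nums)])
    for cid, nums in raw.items()
]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE customer (customer_id TEXT, {', '.join(cols)})")
placeholders = ", ".join("?" * (len(cols) + 1))
conn.executemany(f"INSERT INTO customer VALUES ({placeholders})", rows)

# In SQLite a comparison yields 1 or 0, so the sum counts matching attributes.
match_expr = " + ".join(f"(c1.{c} = c2.{c})" for c in cols)
query = f"""
    SELECT c1.customer_id, c2.customer_id, {match_expr} AS matches
    FROM customer c1
    JOIN customer c2 ON c1.customer_id < c2.customer_id
    WHERE {match_expr} > 0
    ORDER BY matches DESC
    LIMIT 10
"""
pairs = conn.execute(query).fetchall()
for pair in pairs:
    print(pair)
```

The join condition `c1.customer_id < c2.customer_id` keeps each pair once and excludes a customer being compared with itself.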
There are two tables to join for an in-depth Excel report. I am trying to avoid creating duplicate metrics. I have already scraped the competitor data separately using a Python script.
The first table looks like this
name |occurances |hits | actions |avg $|Key
---------+------------+--------+-------------+-----+----
balls |53432 | 5001 | 5| 2$ |Hgdy24
bats |5389 | 4672 | 3| 4$ |dhfg12
The competitor data is as follows:
Key | Ad Copie |
---------+------------+
Hgdy24 |Click here! |
Hgdy24 |Free Trial! |
Hgdy24 |Sign Up now |
dhfg12 |Check it out|
dhfg12 |World known |
dhfg12 |Sign up |
I have already tried joins, with the following result (duplicate metric rows are created here):
name |occurances | hits | actions | avg$|Key |Ad Copie
---------+------------+--------+-------------+-----+------+---------
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Click here!
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Free Trial!
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Sign Up now
Bats |5389 | 4672 | 3| 4$ |dhfg12|Check it out
Bats |5389 | 4672 | 3| 4$ |dhfg12|World known
Bats |5389 | 4672 | 3| 4$ |dhfg12|Sign up
Here is the desired output
name |occurances | hits | actions | avg$|Key |Ad Copie
---------+------------+--------+-------------+-----+------+---------
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Click here!
Balls | | | | |Hgdy24|Free Trial!
Balls | | | | |Hgdy24|Sign Up now
Bats |5389 | 4672 | 3| 4$ |dhfg12|Check it out
Bats | | | | |dhfg12|World known
Bats | | | | |dhfg12|Sign up
Does anyone have a clue about a good course of action for this? The LAG function, perhaps?
Your desired output is not a proper use case for SQL. SQL is designed to create views of data with all the fields filled in. When you want to visualize that data, you should do so in your application code and suppress the "duplicate" values there, not in SQL.
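Suppressing the repeated values in application code could look roughly like the following Python sketch. It assumes the joined rows arrive as tuples in the column order shown in the question and are already grouped by key; `blank_repeated_metrics` is a hypothetical helper name.

```python
# Joined rows: (name, occurances, hits, actions, avg, key, ad_copie).
joined = [
    ("Balls", 53432, 5001, 5, "2$", "Hgdy24", "Click here!"),
    ("Balls", 53432, 5001, 5, "2$", "Hgdy24", "Free Trial!"),
    ("Balls", 53432, 5001, 5, "2$", "Hgdy24", "Sign Up now"),
    ("Bats", 5389, 4672, 3, "4$", "dhfg12", "Check it out"),
    ("Bats", 5389, 4672, 3, "4$", "dhfg12", "World known"),
    ("Bats", 5389, 4672, 3, "4$", "dhfg12", "Sign up"),
]


def blank_repeated_metrics(rows, key_index=5, metric_indexes=(1, 2, 3, 4)):
    """Keep the metrics on the first row of each key and blank them on the rest."""
    seen = set()
    cleaned = []
    for row in rows:
        row = list(row)
        if row[key_index] in seen:
            for i in metric_indexes:
                row[i] = ""
        else:
            seen.add(row[key_index])
        cleaned.append(tuple(row))
    return cleaned


report = blank_repeated_metrics(joined)
for row in report:
    print(row)
```

Keeping the query result complete and blanking values only at presentation time means the same data can still be summed or filtered correctly elsewhere.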
While running the SAP SD benchmark on a 3-tier SAP setup, a number of transactions are fired by automated users.
The following steps are executed:
6 /nva01 (Create Sales Order)
[ENTER]
7 Order Type or
Sales Organization 0001
Distribution Channel 01
Division 01
[ENTER]
8 Sold-to party sdd00000
PO Number perf500
Req.deliv.date 22.12.2009
Deliver.Plant 0001
Material Order quantity
sd000000 1
sd000001 1
sd000002 1
sd000003 1
sd000004 1
[F11] (Save)
9 [F3] (Back)
(This dialogstep is needed only to get 4 dialogsteps for VA01 as defined
for the SD benchmarks)
Whenever [F11] is pressed after entering the information, the order is saved successfully. However, when [F3] is pressed, the error “unable to update” is shown.
Then I manually executed the same steps:
6 /nva01 (Create Sales Order)
[ENTER]
7 Order Type or
Sales Organization 0001
Distribution Channel 01
Division 01
[ENTER]
8 Sold-to party sdd00000
PO Number perf500
Req.deliv.date 22.12.2009
Deliver.Plant 0001
Material Order quantity
sd000000 1
sd000001 1
sd000002 1
sd000003 1
sd000004 1
[F11] (Save)
9 [F3] (Back)
Pressing [F11] saves the order successfully. But when [F3] is pressed to go back to the previous screen, it gives an “update was terminated” error.
To locate the root cause of the error, I checked transaction SM13, which shows the following details for the error:
There is a large number of identical errors in the logs, and the update key is the same for all of the error entries: “4A08B4400C022793E10000000FD5F53D”. Is this normal?
From searching online, I found that the possible reasons for this error could be:
The key already exists in the table and duplicate entries are not allowed.
Which table is affected by this transaction? How can this be resolved?
A document number range issue.
Which document number range should be modified? How can this be resolved?
Kindly advise how to resolve this.
Edit: including the system log:
Runtime Errors    SAPSQL_ARRAY_INSERT_DUPREC
Exception         CX_SY_OPEN_SQL_DB
Date and Time     12.05.2009 06:59:27

Short text
    The ABAP/4 Open SQL array insert results in duplicate database records.

What happened?
    Error in the ABAP Application Program

    The current ABAP program "SAPLV05I" had to be terminated because it has
    come across a statement that unfortunately cannot be executed.

What can you do?
    Note down which actions and inputs caused the error.

    To process the problem further, contact you SAP system
    administrator.

    Using Transaction ST22 for ABAP Dump Analysis, you can look
    at and manage termination messages, and you can also
    keep them for a long time.

Error analysis
    An exception occurred that is explained in detail below.
    The exception, which is assigned to class 'CX_SY_OPEN_SQL_DB', was not caught in
    procedure "SD_PARTNER_UPDATE" "(FUNCTION)", nor was it propagated by a RAISING clause.
    Since the caller of the procedure could not have anticipated that the
    exception would occur, the current program is terminated.
    The reason for the exception is:
    If you use an ABAP/4 Open SQL array insert to insert a record in
    the database and that record already exists with the same key,
    this results in a termination.

    (With an ABAP/4 Open SQL single record insert in the same error
    situation, processing does not terminate, but SY-SUBRC is set to 4.)

How to correct the error
    Use an ABAP/4 Open SQL array insert only if you are sure that none of
    the records passed already exists in the database.

    If the error occures in a non-modified SAP program, you may be able to
    find an interim solution in an SAP Note.
    If you have access to SAP Notes, carry out a search with the following
    keywords:

    "SAPSQL_ARRAY_INSERT_DUPREC" "CX_SY_OPEN_SQL_DB"
    "SAPLV05I" or "LV05IU15"
    "SD_PARTNER_UPDATE"

    If you cannot solve the problem yourself and want to send an error
    notification to SAP, include the following information:

    1. The description of the current problem (short dump)

       To save the description, choose "System->List->Save->Local File
       (Unconverted)".

    2. Corresponding system log

       Display the system log by calling transaction SM21.
       Restrict the time interval to 10 minutes before and five minutes
       after the short dump. Then choose "System->List->Save->Local File
       (Unconverted)".

    3. If the problem occurs in a problem of your own or a modified SAP
       program: The source code of the program
       In the editor, choose "Utilities->More
       Utilities->Upload/Download->Download".

    4. Details about the conditions under which the error occurred or which
       actions and input led to the error.

    The exception must either be prevented, caught within proedure
    "SD_PARTNER_UPDATE" "(FUNCTION)", or its possible occurrence must be declared in the
    RAISING clause of the procedure.
    To prevent the exception, note the following:

System environment
    SAP-Release 701

    Application server... "hpvm-202"
    Network address...... "15.213.245.61"
    Operating system..... "HP-UX"
    Release.............. "B.11.31"
    Hardware type........ "ia64"
    Character length.... 16 Bits
    Pointer length....... 64 Bits
    Work process number.. 10
    Shortdump setting.... "full"

    Database server... "ghoul3"
    Database type..... "ORACLE"
    Database name..... "E64"
    Database user ID.. "SAPSR3"

    Terminal.......... "hpvmmsa"

    Char.set.... "C"

    SAP kernel....... 701
    created (date)... "Feb 24 2009 21:53:01"
    create on........ "HP-UX B.11.23 U ia64"
    Database version. "OCI_102 (10.2.0.4.0) "

    Patch level. 32
    Patch text.. " "

    Database............. "ORACLE 9.2.0.., ORACLE 10.1.0.., ORACLE 10.2.0.."
    SAP database version. 701
    Operating system..... "HP-UX B.11"

    Memory consumption
    Roll.... 2013408
    EM...... 0
    Heap.... 0
    Page.... 0
    MM Used. 1966160
    MM Free. 24336

User and Transaction
    Client.............. 900
    User................ "SAP_PERF000"
    Language key........ "E"
    Transaction......... "VA01 "
    Transactions ID..... "4A08B9BC0C022793E10000000FD5F53D"

    Program............. "SAPLV05I"
    Screen.............. "RSM13000 3000"
    Screen line......... 2

Information on where terminated
    Termination occurred in the ABAP program "SAPLV05I" - in "SD_PARTNER_UPDATE".
    The main program was "RSM13000 ".

    In the source code you have the termination point in line 480
    of the (Include) program "LV05IU15".
    The program "SAPLV05I" was started in the update system.
    The termination is caused because exception "CX_SY_OPEN_SQL_DB" occurred in
    procedure "SD_PARTNER_UPDATE" "(FUNCTION)", but it was neither handled locally nor declared
    in the RAISING clause of its signature.

    The procedure is in program "SAPLV05I "; its source code begins in line
    1 of the (Include program "LV05IU15 ".

Source Code Extract
Line  | SourceCde
  450 | POSNR = I_XVBPA-POSNR
  451 | PARVW = I_XVBPA-PARVW.
  452 | IF I_YVBPA-STCD1 <> I_XVBPA-STCD1 OR
  453 | I_YVBPA-STCD2 <> I_XVBPA-STCD2 OR
  454 | I_YVBPA-STCD3 <> I_XVBPA-STCD3 OR
  455 | I_YVBPA-STCD4 <> I_XVBPA-STCD4 OR
  456 | I_YVBPA-STCDT <> I_XVBPA-STCDT OR
  457 | I_YVBPA-STKZN <> I_XVBPA-STKZN OR
  458 | I_YVBPA-J_1KFREPRE <> I_XVBPA-J_1KFREPRE OR
  459 | I_YVBPA-J_1KFTBUS <> I_XVBPA-J_1KFTBUS OR
  460 | I_YVBPA-J_1KFTIND <> I_XVBPA-J_1KFTIND.
  461 | MOVE-CORRESPONDING I_XVBPA TO WA_XVBPA3I.
  462 | APPEND WA_XVBPA3I TO DA_XVBPA3I.
  463 | ENDIF.
  464 | ENDIF.
  465 | ENDIF.
  466 | WHEN UPDKZ_OLD.
  467 | IF DA_VBPA-ADRDA CA GCF_ADDR_IND_COMB_MAN_OLD OR
  468 | DA_VBPA-ADRDA CA GCF_ADDR_IND_COMB_MAN_ADRC.
  469 | YADR-ADRNR = DA_VBPA-ADRNR. COLLECT YADR.
  470 | ENDIF.
  471 | IF DA_VBPA-ADRDA CA GCF_ADDR_IND_COMB_MAN_OLD OR
  472 | DA_VBPA-ADRDA CA GCF_ADDR_IND_COMB_MAN_ADRC.
  473 | XADR-ADRNR = DA_VBPA-ADRNR. COLLECT XADR.
  474 | ENDIF.
  475 | ENDCASE.
  476 | ENDLOOP.
  477 | UPDATE (OBJECT) FROM TABLE DA_XVBPAU.
  478 | UPDATE VBPA3 FROM TABLE DA_XVBPA3U.
  479 |
>>>>> | INSERT (OBJECT) FROM TABLE DA_XVBPAI.
  481 | INSERT VBPA3 FROM TABLE DA_XVBPA3I.
  482 |
  483 | IF SY-SUBRC > 0.
  484 | MESSAGE A700 WITH OBJECT SY-SUBRC DA_XVBPAI(21).
  485 | ENDIF.
  486 |
  487 |* Sonderfall neue VBPA (VBPA2) für Rollen AA und AW
  488 | LOOP AT I_XVBPA2.
  489 | DA_VBPA2 = I_XVBPA2.
  490 | CASE DA_VBPA2-UPDKZ.
  491 | WHEN UPDKZ_NEW.
  492 | IF DA_VBPA2-ADRDA CA GCF_ADDR_IND_COMB_MAN_OLD OR
  493 | DA_VBPA2-ADRDA CA GCF_ADDR_IND_COMB_MAN_ADRC.
  494 | XADR-ADRNR = DA_VBPA2-ADRNR. COLLECT XADR.
  495 | ENDIF.
  496 | I_XVBPA-MANDT = SY-MANDT.
  497 | IF I_XVBPA2-VBELN IS INITIAL.
  498 | I_XVBPA2-VBELN = F_VBELN.
  499 | ENDIF.
The dump makes it clear that the system is trying to insert a record whose key already exists, which is why the update termination message pops up. Ask your ABAP team to help find the root cause. The same error can also occur if there is any customization in the sales order creation process, so check that with the ABAP team as well. Alternatively, if you have login credentials for the SAP Service Marketplace, have a look at OSS note 330904.
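The failure mode described in the dump, a batch insert that hits an existing key, can be reproduced outside SAP. The sketch below is only an analogy using Python's built-in sqlite3 module; the `partner` table, its columns, and the sample keys are made up for illustration and are not the real SAP table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a composite primary key (not the real SAP layout).
conn.execute(
    "CREATE TABLE partner ("
    " doc_no TEXT, item_no TEXT, role TEXT,"
    " PRIMARY KEY (doc_no, item_no, role))"
)
conn.execute("INSERT INTO partner VALUES ('0000000001', '000000', 'AG')")
conn.commit()

# The second record reuses a key that is already stored in the table.
batch = [
    ("0000000002", "000000", "AG"),
    ("0000000001", "000000", "AG"),
]

try:
    # "with conn" commits on success and rolls the whole batch back on error.
    with conn:
        conn.executemany("INSERT INTO partner VALUES (?, ?, ?)", batch)
    batch_failed = False
except sqlite3.IntegrityError as exc:
    batch_failed = True
    print("batch insert rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM partner").fetchone()[0]
print("rows in table:", row_count)
```

In this analogy the whole batch is discarded because one key already exists, which is comparable to the update task being terminated as a whole when the array insert in line 480 finds a duplicate record.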