Hive beginner, got FAILED: SemanticException error

Suppose I have two tables, actv_user and play_video:
actv_user:
| p_date   | user_id | country_name |
|----------|---------|--------------|
| 20210125 | 1       | Brazil       |
| 20210124 | 2       | ENG          |
| 20210125 | 3       | India        |
| 20210125 | 4       | Indonesia    |
| 20210125 | 5       | Indonesia    |
| 20210125 | 6       | Brazil       |
| 20210125 | 7       | Brazil       |
| 20210125 | 8       | Indonesia    |
user_id is unique, but country_name can be null.
play_video:
| user_id | video_id |
|---------|----------|
| 1       | 1001     |
| 1       | 1002     |
| 2       | 2001     |
| 3       | 1001     |
| 3       | 1002     |
| 3       | 3003     |
| 4       | 4004     |
| 5       | 1001     |
| 5       | 5005     |
| 6       | 1001     |
| 6       | 1002     |
| 7       | 1001     |
| 7       | 1002     |
| 8       | 3003     |
| 8       | 4004     |
What I want to do is find the top videos played on their first day by new users (p_date = 20210125) in Brazil, India, and Indonesia.
So the new users in Brazil are user_ids 1, 6, and 7; the new user in India is 3; the new users in Indonesia are 4, 5, and 8.
The outcome is something like this:
In Brazil the top videos played by new users are 1001 and 1002.
In India the top videos played by new users are 1001, 1002, and 3003.
In Indonesia the top videos played by new users are 4004, 3003, and 5005.
Desired outcome:
| country_name | video_id | count |
|--------------|----------|-------|
| Brazil       | 1001     | 3     |
| Brazil       | 1002     | 3     |
| India        | 1001     | 1     |
| India        | 1002     | 1     |
| India        | 3003     | 1     |
| Indonesia    | 4004     | 2     |
| Indonesia    | 3003     | 1     |
| Indonesia    | 5005     | 1     |
The error message I got is: FAILED: SemanticException error condition: user_id is not null. Table play_photo is missing partition restrictions in SQL! If there is a partition condition, please check whether there is an abnormal OR usage; please add brackets for OR conditions!
any ideas?
I tried:
select actv_user.country_name ,play_video.video_id, count(play_video.video_id) count_num
from actv_user join play_photo on actv_user.user_id = play_video.user_id
where p_date = 20210125 and (country_name = 'Brazil' or country_name = 'India ' or country_name = 'Indonesia ')
group by actv_user.country_name ;

Please try this:
country_name in ('Brazil', 'India', 'Indonesia')
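Beyond bracketing the OR conditions, a couple of other things stand out, so here is a sketch of the whole query under some assumptions: the play table is really named play_video (your query and the error message disagree on the name), video_id is added to the GROUP BY so the count per video is valid, and the trailing spaces in 'India ' and 'Indonesia ' are dropped to match the sample data. If play_video is also partitioned by p_date, it likely needs its own partition filter, which is what the SemanticException is demanding:

select au.country_name,
       pv.video_id,
       count(pv.video_id) as count_num
from actv_user au
join play_video pv
  on au.user_id = pv.user_id
where au.p_date = 20210125
  -- uncomment if play_video is also partitioned by p_date:
  -- and pv.p_date = 20210125
  and au.country_name in ('Brazil', 'India', 'Indonesia')
group by au.country_name, pv.video_id;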

Related

Counting the Number of Rows That Have a Specific Value in a Column in a Table in SQL

id | street | city | country | postal code |
--------------------------------------------
0 |a street|x city| Turkey | 12345 |
1 |b street|y city| Turkey | 12335 |
2 |c street|z city| USA | 12315 |
3 |d street|j city| Turkey | 32345 |
4 |e street|k city| Germany | 12135 |
5 |f street|l city| France | 13215 |
6 |g street|m city| Turkey | 42135 |
7 |h street|n city| Italy | 12135 |
8 |i street|z city| Spain | 32115 |
Hello. Let's say we have a table like the one above, named 'person_address'. In the DB there are lots of different tables like this. I want to find the number of rows whose country column value is "Turkey" in the person_address table, which is 4. How can I translate this into PostgreSQL?
SELECT * FROM person_address u WHERE u.country = 'Turkey';
With this query I can list what I want, but I need the count of that list. After that, I have to use this in a Java Spring Boot project. Should I do this with the @Query annotation, or is there a better way?
If you add the output you want to the question, you will get an answer very quickly. The following query returns the countries that are listed 4 times.
select *
from (
    select *,
           count(*) over (partition by country) as numberOfCountry
    from person_address
) t
where numberOfCountry = 4;
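If you just need the number itself rather than the listed rows, a plain aggregate is the direct translation:

SELECT count(*) FROM person_address WHERE country = 'Turkey';

On the Spring Boot side, this can be exposed via a repository method annotated with @Query (using nativeQuery = true for raw SQL), or more simply as a derived query method such as long countByCountry(String country), which Spring Data generates automatically.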

Using pyspark to create a segment array from a flat record

I have a sparsely populated table with values for various segments for unique user ids. I need to create an array with the user_id and the relevant segment headers only.
Please note that this is just an indicative dataset; I have several hundred segments like these.
------------------------------------------------
| user_id | seg1 | seg2 | seg3 | seg4 | seg5 |
------------------------------------------------
| 100 | M | null| 25 | null| 30 |
| 200 | null| null| 43 | null| 250 |
| 300 | F | 3000| null| 74 | null|
------------------------------------------------
I am expecting the output to be
-------------------------------
| user_id| segment_array |
-------------------------------
| 100 | [seg1, seg3, seg5] |
| 200 | [seg3, seg5] |
| 300 | [seg1, seg2, seg4] |
-------------------------------
Is there any function available in PySpark or Spark SQL to accomplish this?
Thanks for your help!
I cannot find a direct way, but you can do this:
from pyspark.sql.functions import array, array_remove, col, lit, when

cols = df.columns[1:]
# tag each non-null segment with its column name, then drop the placeholder entries
r = df.withColumn('array', array(*[when(col(c).isNotNull(), lit(c)).otherwise('notmatch') for c in cols])) \
    .withColumn('array', array_remove('array', 'notmatch'))
r.show()
+-------+----+----+----+----+----+------------------+
|user_id|seg1|seg2|seg3|seg4|seg5| array|
+-------+----+----+----+----+----+------------------+
| 100| M|null| 25|null| 30|[seg1, seg3, seg5]|
| 200|null|null| 43|null| 250| [seg3, seg5]|
| 300| F|3000|null| 74|null|[seg1, seg2, seg4]|
+-------+----+----+----+----+----+------------------+
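One caveat, if I remember correctly: array_remove was introduced in Spark 2.4, so on older versions the 'notmatch' placeholders would need to be filtered out another way (e.g. a UDF).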
Not sure this is the best way, but I'd attack it this way: there's the collect_set function, which will always give you unique values across the list of values you aggregate over. Do a union for each segment:
from pyspark.sql import functions as fn
from pyspark.sql.functions import col, lit

df_seg_1 = df.select(
    'user_id',
    fn.when(
        col('seg1').isNotNull(),
        lit('seg1')
    ).alias('segment')
)
# repeat for all segments
df = df_seg_1.union(df_seg_2).union(...)
df.groupBy('user_id').agg(fn.collect_set('segment'))
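Note that there is no .otherwise here, so unmatched rows carry a null segment; collect_set ignores nulls (and also deduplicates), so those rows drop out of the final array without any extra filtering.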

Executing a join while avoiding creating duplicate metrics in the first table's rows

There are two tables to join for an in-depth Excel report. I am trying to avoid creating duplicate metrics. I have already scraped the competitor data separately using a Python script.
The first table looks like this:
name |occurances |hits | actions |avg $|Key
---------+------------+--------+-------------+-----+----
balls |53432 | 5001 | 5| 2$ |Hgdy24
bats |5389 | 4672 | 3| 4$ |dhfg12
The competitor data is as follows:
Key | Ad Copie |
---------+------------+
Hgdy24 |Click here! |
Hgdy24 |Free Trial! |
Hgdy24 |Sign Up now |
dhfg12 |Check it out|
dhfg12 |World known |
dhfg12 |Sign up |
I have already tried joins to the following effect (duplicate metric rows are created here):
name |occurances | hits | actions | avg$|Key |Ad Copie
---------+------------+--------+-------------+-----+------+---------
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Click here!
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Free Trial!
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Sign Up now
Bats |5389 | 4672 | 3| 4$ |dhfg12|Check it out
Bats |5389 | 4672 | 3| 4$ |dhfg12|World known
Bats |5389 | 4672 | 3| 4$ |dhfg12|Sign up
Here is the desired output
name |occurances | hits | actions | avg$|Key |Ad Copie
---------+------------+--------+-------------+-----+------+---------
Balls |53432 | 5001 | 5| 2$ |Hgdy24|Click here!
Balls | | | | |Hgdy24|Free Trial!
Balls | | | | |Hgdy24|Sign Up now
Bats |5389 | 4672 | 3| 4$ |dhfg12|Check it out
Bats | | | | |dhfg12|World known
Bats | | | | |dhfg12|Sign up
Does anyone have a clue on a good course of action for this? Lag function perhaps?
Your desired output is not a proper use-case for SQL. SQL is designed to create views of data with all the fields filled in. When you want to visualize that data, you should do so in your application code and suppress the "duplicate" values there, not in SQL.
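That said, if it has to happen in the query layer anyway, one option is to number the ad copies per key and blank the metric columns on every row but the first. A rough sketch, where the table names metrics and competitor_data and the normalized column names avg_dollar and ad_copie are assumptions (the blanked cells come back as NULLs rather than empty strings, and which copy counts as "first" is arbitrary unless an ordering is chosen):

SELECT m.name,
       CASE WHEN c.rn = 1 THEN m.occurances END AS occurances,
       CASE WHEN c.rn = 1 THEN m.hits END AS hits,
       CASE WHEN c.rn = 1 THEN m.actions END AS actions,
       CASE WHEN c.rn = 1 THEN m.avg_dollar END AS avg_dollar,
       m.key,
       c.ad_copie
FROM metrics m
JOIN (SELECT key, ad_copie,
             ROW_NUMBER() OVER (PARTITION BY key ORDER BY ad_copie) AS rn
      FROM competitor_data) c
  ON c.key = m.key;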

SQL - Converting similar data while keeping different data

I will try to explain this in as much detail as I can; if the details are insufficient, please help edit my question or ask about the missing details so I can add them.
Problem Description
I am required to write a SELECT statement to convert the data in ORDERED_BY from the REQUESTED_AUTHORS table into AUTHOR_NAME data. For example, JJ as shown in ORDERED_BY must be converted into Jack Johnson as shown in AUTHOR_NAME, so the end result is Jack Johnson instead of JJ. My two tables are shown below:
REQUESTED_AUTHORS
+-----------+
| ORDERED_BY|
+-----------+
| JJ |
+-----------+
| AB |
+-----------+
| JonJey |
+-----------+
| Admin |
+-----------+
| Tech Assit|
+-----------+
| Dr.Ob |
+-----------+
| EL |
+-----------+
| TA |
+-----------+
| JD |
+-----------+
| ET |
+-----------+
AUTHOR_LIST
+----------------+---------------------+
| ORDER_INITIAL | AUTHOR_NAME |
+----------------+---------------------+
| JJ | Jack Johnson |
+----------------+---------------------+
| AB | Albert Bently |
+----------------+---------------------+
| AlecBor | Alec Baldwin |
+----------------+---------------------+
| KingSt | KingSton |
+----------------+---------------------+
| GaryNort | Gary Norton |
+----------------+---------------------+
| Prof.Li | Professor Li |
+----------------+---------------------+
| EL | Elton Langsey |
+----------------+---------------------+
| TA | Thomas Alecson |
+----------------+---------------------+
| JD | Johnny Depp |
+----------------+---------------------+
| ET | Elson Tarese |
+----------------+---------------------+
Solution Tried (1)
SELECT ru.*, al.AUTHOR_NAME
FROM REQUESTED_AUTHORS ru, AUTHOR_LIST al
WHERE al.ORDER_INITIAL = ru.ORDERED_BY;
But this did not work as I intended, since some values in ORDERED_BY have no match in ORDER_INITIAL. I tried using the DECODE function to convert them, but I am stuck there.
Solution Tried (2)
SELECT ru.ORDERED_BY,
       al.ORDER_INITIAL,
       DECODE(ru.ORDERED_BY, (ru.ORDERED_BY != al.ORDER_INITIAL), ru.ORDERED_BY,
              (ru.ORDERED_BY = al.ORDER_INITIAL), al.AUTHOR_NAME) results
FROM REQUESTED_AUTHORS ru, AUTHOR_LIST al;
What I intend to do is convert the values that have a match but keep the unmatched values as they are.
That is, the values shown below are to be kept the same and not converted, as there is nothing for them to convert to.
+-----------+
| ORDERED_BY|
+-----------+
| JonJey |
+-----------+
| Admin |
+-----------+
| Tech Assit|
+-----------+
| Dr.Ob |
+-----------+
My Question:
How may I write a query to convert the similar data and keep the different data?
You need an Outer Join (another reason to avoid old-style joins):
SELECT ru.*,
       -- if there's a match return AUTHOR_NAME, otherwise keep ORDERED_BY
       COALESCE(al.AUTHOR_NAME, ru.ORDERED_BY)
FROM REQUESTED_AUTHORS ru
LEFT JOIN AUTHOR_LIST al
  ON al.ORDER_INITIAL = ru.ORDERED_BY;
Use a left outer join here:
SELECT ru.*, NVL(al.AUTHOR_NAME, ru.ORDERED_BY)
FROM REQUESTED_AUTHORS ru, AUTHOR_LIST al
WHERE ru.ORDERED_BY = al.ORDER_INITIAL(+);
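For reference, the result either query should produce from the sample tables (the second column is the COALESCE/NVL expression):
+------------+----------------+
| ORDERED_BY | RESULT         |
+------------+----------------+
| JJ         | Jack Johnson   |
| AB         | Albert Bently  |
| JonJey     | JonJey         |
| Admin      | Admin          |
| Tech Assit | Tech Assit     |
| Dr.Ob      | Dr.Ob          |
| EL         | Elton Langsey  |
| TA         | Thomas Alecson |
| JD         | Johnny Depp    |
| ET         | Elson Tarese   |
+------------+----------------+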

Oracle SQL: select the table and cumulatively calculate the score by term

http://i.stack.imgur.com/IDMWU.jpg
+-----+-----------+----------+
| ID | TERM | SCORE |
+-----+-----------+----------+
| 1001| 201009 | 3 |
| 1001| 201009 | 1.5 |
| 1001| 201101 | 2 |
| 1001| 201101 | 1 |
| 1001| 201109 | 2 |
+-----+-----------+----------+
Here is table 1, which holds some kind of GPA score; one person has scores in several terms.
Is it possible, using a SELECT statement, to group by term and calculate the cumulative score?
http://i.stack.imgur.com/Zkrqu.jpg
+-----+-----------+--------------------+
|ID |TERM | GPA |
+-----+-----------+--------------------+
|1001 |201009 | (3+1.5)/2=2.25 |
|1001 |201101 |(3+1.5+2+1)/4=1.875 |
|1001 |201109 |(3+1.5+2+1+2)/5=1.9 |
+-----+-----------+--------------------+
etc....
I am using APEX to make a report, and it seems to build the report from a SELECT statement.
Is it possible to select the table like that?
I don't have access to an Oracle server at the moment, but if I remember correctly this should work:
SELECT DISTINCT id, term,
       AVG(score) OVER (PARTITION BY id ORDER BY term) AS gpa
FROM foo;
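For what it's worth, the ORDER BY term inside the OVER clause is what makes the average cumulative: the default window frame then runs from the start of the partition up to the current term, with peer rows sharing the same term included, so each term sees the running average of all scores so far. The DISTINCT collapses the duplicate id/term rows, which matches the expected output above (2.25, 1.875, 1.9).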