SQLite Create Table and Insert & Update Value - sql

While studying a Udemy SQL course, I got stuck on the problem below. Could you help me solve it? It is very difficult for me.
Here's the question:
The goal of this exercise is to find out whether each one of the 4 numeric attributes in TemporalData is
positively or negatively correlated with sales.
Requirements:
Each example in the sample consists of a pair (Store, WeekDate). This means that Sales data must be summed across all departments before the correlation is computed.
Return a relation with schema (AttributeName VARCHAR(20), CorrelationSign Integer).
The values of AttributeName can be hardcoded string literals, but the values
of CorrelationSign must be computed automatically using SQL queries over
the given database instance.
You can use multiple SQL statements to compute the result. It might be of help to create and update your own tables.
Here's the sales table:
| store | dept | weekdate   | weeklysales |
|-------+------+------------+-------------|
| 1     | 1    | 2010-07-01 | 2400.0      |
| 2     | 1    | 2010-07-02 | 40.0        |
Here's the temporaldata table:
| store | weekdate   | temperature | fuelprice | cpi      | unemploymentrate |
|-------+------------+-------------+-----------+----------+------------------|
| 1     | 2010-07-01 | 100.14      | 2.981     | 125.1222 | 9.402            |
| 2     | 2010-07-02 | 99.13       | 3.823     | 129.2912 | 14.81            |
So I wrote the code below. Could you show me how to solve this question?
DROP TABLE IF EXISTS CorrelationGroup;
CREATE TABLE CorrelationGroup(
attributeName VARCHAR(30),
CorrelationSign INTEGER
);
SELECT SUM(weeklysales)
FROM temporaldata t, sales s
WHERE s.weekdate = t.weekdate AND s.store = t.store;
SELECT AVG(t.temperature) AS 'temp', AVG(t.fuelprice) AS 'fuel', AVG(t.cpi) AS 'cpi', AVG(t.unemploymentrate) AS 'unemploymentrate'
FROM sales s, temporaldata t
WHERE s.store = t.store AND s.weekdate = t.weekdate;
INSERT INTO CorrelationGroup VALUES ('Temp', 'The Value Of CorrelationSign');
-- 3 more
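One possible approach (a sketch only, not a verified solution): the sign of the Pearson correlation is the sign of the covariance, so you can sum sales per (Store, WeekDate), compute the deviation of each attribute and of the summed sales from their averages, and take the sign of the summed cross-products. The statement below does this for temperature only; the other three attributes follow the same pattern, and the CTE and column names are my own.
-- Sketch for one attribute (temperature); repeat for fuelprice, cpi and
-- unemploymentrate. Assumes the sales and temporaldata tables shown above.
WITH weekly AS (
    SELECT s.store, s.weekdate, t.temperature,
           SUM(s.weeklysales) AS total_sales
    FROM sales s
    JOIN temporaldata t
      ON t.store = s.store AND t.weekdate = s.weekdate
    GROUP BY s.store, s.weekdate, t.temperature
),
stats AS (
    SELECT AVG(temperature) AS avg_temp, AVG(total_sales) AS avg_sales
    FROM weekly
)
INSERT INTO CorrelationGroup (attributeName, CorrelationSign)
SELECT 'Temperature',
       CASE WHEN SUM((w.temperature - st.avg_temp) * (w.total_sales - st.avg_sales)) >= 0
            THEN 1 ELSE -1
       END
FROM weekly w, stats st;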

Related

SQL question, query is not updating account_id's fields: income, customerid, customergroup?

I am executing this query through a Databricks notebook to join data from a stage table to a target table based on the shared join keys: account_id and stmt_end_dt. The stage table has 2 billion rows of data and the target table has 3 billion rows.
Here is the main query:
"UPDATE TARGET_TBL SET INCOME = S.INCOME, CUSTOMERGROUPID = S.CUSTOMERGROUPID, CUSTOMERID = S.CUSTOMERID
FROM STAGE_TBL AS S
WHERE CAST(S.ACCT_ID AS NUMBER(18,0)) = TARGET_TBL.ACCT_ID
AND CAST(S.STMT_END_DT AS DATE) = TARGET_TBL.STMT_END_DT"
What I want to do is add "income", "customerid", and "customergroup" data to the rows of the target table whose "account_id" and "stmt_end_dt" match the stage table. When I go into the target table I see that there are now fields for "income", "customerid", and "customergroup" (this is fine, because they were created through an earlier query). After my query has run and I click into the target table, I see that account_id is blank while "income", "customerid", and "customergroup" are filled with data. And when I run this query: SELECT * FROM TARGET_TBL WHERE INCOME IS NOT NULL; I get back 80000 rows (which seems low considering the stage table has 2 billion). Also, after that query runs I see again that "income", "customerid", and "customergroup" are all populated with data, but account_id is full of NULLs. It is as if this data is just being appended or tacked on, rather than updating each account_id's fields with the matching data. This is how I imagine it should look:
account_id | income | customerid | customergroupid
4321 | 60000 | 6345 | 3
5432 | 55000 | 4345 | 5
But instead it looks like this:
account_id | income | customerid | customergroupid
           | 60000  | 6345       | 3
           | 55000  | 4345       | 5
Or when I run: SELECT * FROM TARGET_TBL WHERE INCOME IS NOT NULL:
account_id | income | customerid | customergroupid
NULL | 60000 | 6345 | 3
NULL | 55000 | 4345 | 5
And if I simply open the target table it looks like this:
account_id | income | customerid | customergroupid
4321 | | |
5432 | | |
After that query runs, it is also NULL for all other fields outside of the last 3 shown.
Perhaps the data types coming from the stage table aren't compatible with the target table?
What could be causing this strange behavior?
You can't compare values with NULL; if a field is NULL there is nothing to compare. I believe this is your problem.
If you have NULL fields and you want to compare them, you can usually use IS NULL or NVL. Look up the syntax of these; it is very helpful.
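For reference, here is a hedged sketch of how this kind of update is usually expressed in Databricks/Delta Lake, where MERGE INTO is the supported way to update a target table from a staging table. The DECIMAL(18,0) cast is an assumption standing in for NUMBER(18,0), which is not a Spark SQL type:
-- Sketch only: update matching target rows from the stage table with MERGE.
MERGE INTO TARGET_TBL AS T
USING STAGE_TBL AS S
  ON CAST(S.ACCT_ID AS DECIMAL(18,0)) = T.ACCT_ID
 AND CAST(S.STMT_END_DT AS DATE) = T.STMT_END_DT
WHEN MATCHED THEN UPDATE SET
  INCOME          = S.INCOME,
  CUSTOMERGROUPID = S.CUSTOMERGROUPID,
  CUSTOMERID      = S.CUSTOMERID;
If the join keys can themselves be NULL, the null-safe equality operator <=> can be used in the ON clause instead of =.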

How to get a value inside of a JSON that is inside a column in a table in Oracle sql?

Suppose that I have a table named agents_timesheet that has a structure like this:
ID | name | health_check_record | date | clock_in | clock_out
---------------------------------------------------------------------------------------------------------
1 | AAA | {"mental":{"stress":"no", "depression":"no"}, | 6-Dec-2021 | 08:25:07 |
| | "physical":{"other_symptoms":"headache", "flu":"no"}} | | |
---------------------------------------------------------------------------------------------------------
2 | BBB | {"mental":{"stress":"no", "depression":"no"}, | 6-Dec-2021 | 08:26:12 |
| | "physical":{"other_symptoms":"no", "flu":"yes"}} | | |
---------------------------------------------------------------------------------------------------------
3 | CCC | {"mental":{"stress":"no", "depression":"severe"}, | 6-Dec-2021 | 08:27:12 |
| | "physical":{"other_symptoms":"cancer", "flu":"yes"}} | | |
Now I need to get all agents having flu on that day. As for getting the flu value from a single JSON in Oracle SQL, I can already do it with this SQL statement:
SELECT * FROM JSON_TABLE(
'{"mental":{"stress":"no", "depression":"no"}, "physical":{"fever":"no", "flu":"yes"}}', '$'
COLUMNS (flu VARCHAR2(3) PATH '$.physical.flu')
);
As for getting the values from the column health_check_record, I can get them with a plain SELECT statement.
But how do I get the value of flu inside the JSON in the health_check_record column of that table?
Additional question
Based on the table, how can I retrieve the full list of other_symptoms, so that it gives me this kind of output:
ID | name | other_symptoms
-------------------------------
1 | AAA | headache
2 | BBB | no
3 | CCC | cancer
You can use the JSON_EXISTS() function.
SELECT *
FROM agents_timesheet
WHERE JSON_EXISTS(health_check_record, '$.physical.flu == "yes"');
There is also the "plain old way" without JSON parsing, treating the column like a standard VARCHAR one. This will not work in 100% of cases, but if your data looks exactly as described it might be sufficient.
SELECT *
FROM agents_timesheet
WHERE health_check_record LIKE '%"flu":"yes"%';
How to get the values of flu in the JSON in the health_check_record of that table?
From Oracle 12, to get the values you can use JSON_TABLE with a correlated CROSS JOIN to the table:
SELECT a.id,
       a.name,
       j.*,
       a."DATE",
       a.clock_in,
       a.clock_out
FROM   agents_timesheet a
       CROSS JOIN JSON_TABLE(
         a.health_check_record,
         '$'
         COLUMNS (
           mental_stress     VARCHAR2(3) PATH '$.mental.stress',
           mental_depression VARCHAR2(3) PATH '$.mental.depression',
           physical_fever    VARCHAR2(3) PATH '$.physical.fever',
           physical_flu      VARCHAR2(3) PATH '$.physical.flu'
         )
       ) j
WHERE  physical_flu = 'yes';
db<>fiddle here
You can use "dot notation" to access data from a JSON column. Like this:
select "DATE", id, name
from agents_timesheet t
where t.health_check_record.physical.flu = 'yes'
;
DATE ID NAME
----------- --- ----
06-DEC-2021 2 BBB
Note that this approach requires that you use an alias for the table name (so you can use it in accessing the JSON data).
For testing I used the data posted by MT0 on dbfiddle. I am not a big fan of double-quoted column names; use something else for "DATE", such as dt or date_.
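As for the additional question (the full list of other_symptoms), the same JSON_TABLE pattern can be reused; this is only a sketch, and the VARCHAR2(50) size is an assumption:
-- Sketch: pull other_symptoms out of the physical section for every agent.
SELECT a.id,
       a.name,
       j.other_symptoms
FROM   agents_timesheet a
       CROSS JOIN JSON_TABLE(
         a.health_check_record,
         '$'
         COLUMNS (
           other_symptoms VARCHAR2(50) PATH '$.physical.other_symptoms'
         )
       ) j;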

Join two tables - One common column with different values

I have been searching around for how to do this for days - unfortunately I don't have much experience with SQL Queries, so it's been a bit of trial and error.
Basically, I have created two tables - both with one DateTime column and a different column with values in.
The DateTime column has different values in each table.
So...
ACOQ1 (Table 1)
===============
| DateTime | ACOQ1_Pump_Running |
|----------+--------------------|
| 7:14:12 | 1 |
| 8:09:03 | 1 |
ACOQ2 (Table 2)
===============
| DateTime | ACOQ2_Pump_Running |
|----------+--------------------|
| 3:54:20 | 1 |
| 7:32:57 | 1 |
I want to combine these two tables to look like this:
| DateTime | ACOQ1_Pump_Running | ACOQ2_Pump_Running |
|----------+--------------------+--------------------|
| 3:54:20 | 0 OR NULL | 1 |
| 7:14:12 | 1 | 0 OR NULL |
| 7:32:57 | 0 OR NULL | 1 |
| 8:09:03 | 1 | 0 OR NULL |
I have achieved this by creating a third table that UNIONs the DateTime columns from both tables and then uses that third table's DateTime column for the new table, but I was wondering whether there is a way to skip this step.
(Eventually I will be adding more and more columns from different tables and don't really want to add yet more processing time by creating a joint DateTime table that may not be necessary.)
My working code at the moment:
CREATE TABLE JointDateTime
(
    DateTime CHAR(50),
    CONSTRAINT [pk_Key3] PRIMARY KEY (DateTime)
);
INSERT INTO JointDateTime (DateTime)
SELECT ACOQ1.DateTime FROM ACOQ1
UNION
SELECT ACOQ2.DateTime FROM ACOQ2;
SELECT JointDateTime.DateTime, ACOQ1.ACOQ1_NO_1_PUMP_RUNNING, ACOQ2.ACOQ2_NO_1_PUMP_RUNNING
FROM (SELECT ACOQ1.DateTime FROM ACOQ1
      UNION
      SELECT ACOQ2.DateTime FROM ACOQ2) JointDateTime
LEFT OUTER JOIN ACOQ1
    ON JointDateTime.DateTime = ACOQ1.DateTime
LEFT OUTER JOIN ACOQ2
    ON JointDateTime.DateTime = ACOQ2.DateTime;
You need a plain old FULL OUTER JOIN like this.
SELECT COALESCE(A1.DateTime,A2.DateTime) DateTime,ACOQ1_Pump_Running, ACOQ2_Pump_Running
FROM ACOQ1 A1
FULL OUTER JOIN ACOQ2 A2
ON A1.DateTime = A2.DateTime
This will give you NULL for ACOQ1_Pump_Running, ACOQ2_Pump_Running for rows which do not match the date in the corresponding table. If you need 0 just use COALESCE or ISNULL.
Side note: in your script, I can see you are using DateTime CHAR(50). Please use appropriate data types.
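To illustrate the COALESCE suggestion above, a sketch that returns 0 instead of NULL for the side that has no matching row (column names taken from the question):
-- Sketch: same FULL OUTER JOIN, but non-matching sides show 0 instead of NULL.
SELECT COALESCE(A1.DateTime, A2.DateTime) AS DateTime,
       COALESCE(A1.ACOQ1_Pump_Running, 0) AS ACOQ1_Pump_Running,
       COALESCE(A2.ACOQ2_Pump_Running, 0) AS ACOQ2_Pump_Running
FROM ACOQ1 A1
FULL OUTER JOIN ACOQ2 A2
    ON A1.DateTime = A2.DateTime;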

SQL Query: Search with list of tuples

I have the following table (simplified version) in SQL Server.
Table Events
-----------------------------------------------------------
| Room | User | Entered | Exited |
-----------------------------------------------------------
| A | Jim | 2014-10-10T09:00:00 | 2014-10-10T09:10:00 |
| B | Jim | 2014-10-10T09:11:00 | 2014-10-10T09:22:30 |
| A | Jill | 2014-10-10T09:00:00 | NULL |
| C | Jack | 2014-10-10T09:45:00 | 2014-10-10T10:00:00 |
| A | Jack | 2014-10-10T10:01:00 | NULL |
.
.
.
I need to create a query that returns a person's whereabouts at given timestamps.
For example: where was (Jim at 2014-10-09T09:05:00), (Jim at 2014-10-10T09:01:00), (Jill at 2014-10-10T09:10:00), ...
The result set must contain the given User and Timestamp as well as the found room (if any).
------------------------------------------
| User | Timestamp | WasInRoom |
------------------------------------------
| Jim | 2014-10-09T09:05:00 | NULL |
| Jim | 2014-10-09T09:01:00 | A |
| Jim | 2014-10-10T09:10:00 | A |
The number of User-Timestamp tuples can be > 10 000.
The current implementation retrieves all records from the Events table and does the search in Java code. I am hoping that I could push this logic down to SQL. But how?
I am using MyBatis framework to create SQL queries so the tuples can be inlined to the query.
The basic query is:
select e.*
from events e
where e.user = 'Jim' and '2014-10-09T09:05:00' >= e.entered and ('2014-10-09T09:05:00' <= e.exited or e.exited is NULL) or
e.user = 'Jill' and '2014-10-10T09:10:00' >= e.entered and ('2014-10-10T09:10:00' <= e.exited or e.exited is NULL) or
. . .;
SQL Server can handle ridiculously large queries, so you can continue in this vein. However, if you have the name/time values in a table already (or it is the result of a query), then use a join:
select ut.*, e.*
from usertimes ut left join
     events e
     on e.user = ut.user and
        ut.thetime >= e.entered and (ut.thetime <= e.exited or e.exited is null);
Note the use of a left join here. It ensures that all the original rows are in the result set, even when there are no matches.
Answers from Jonas and Gordon got me on track, I think.
Here is a query that seems to do the job:
CREATE TABLE #SEARCH_PARAMETERS("User" VARCHAR(16), "Timestamp" DATETIME)
INSERT INTO #SEARCH_PARAMETERS("User", "Timestamp")
VALUES
('Jim', '2014-10-09T09:05:00'),
('Jim', '2014-10-10T09:01:00'),
('Jill', '2014-10-10T09:10:00')
SELECT #SEARCH_PARAMETERS.*, Events.Room FROM #SEARCH_PARAMETERS
LEFT JOIN Events
ON #SEARCH_PARAMETERS."User" = Events."User" AND
#SEARCH_PARAMETERS."Timestamp" > Events.Entered AND
(Events.Exited IS NULL OR Events.Exited > #SEARCH_PARAMETERS."Timestamp")
DROP TABLE #SEARCH_PARAMETERS
By declaring a table valued parameter type for the (user, timestamp) tuples, it should be simple to write a table valued user defined function which returns the desired result by joining the parameter table and the Events table. See http://msdn.microsoft.com/en-us/library/bb510489.aspx
Since you are using MyBatis it may be easier to just generate a table variable for the tuples inline in the query and join with that.
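As a hedged sketch of the table-valued-parameter suggestion above (the type and function names UserTimeList and WhereWasUser are made up for illustration):
-- Sketch only (SQL Server 2008+): a table type for the (User, Timestamp)
-- tuples and an inline table-valued function joining them to Events.
CREATE TYPE dbo.UserTimeList AS TABLE
(
    [User]      VARCHAR(16) NOT NULL,
    [Timestamp] DATETIME    NOT NULL
);
GO
CREATE FUNCTION dbo.WhereWasUser (@params dbo.UserTimeList READONLY)
RETURNS TABLE
AS
RETURN
    SELECT p.[User], p.[Timestamp], e.Room AS WasInRoom
    FROM @params AS p
    LEFT JOIN Events AS e
      ON e.[User] = p.[User]
     AND p.[Timestamp] >= e.Entered
     AND (e.Exited IS NULL OR e.Exited >= p.[Timestamp]);
GO
-- Usage:
DECLARE @t dbo.UserTimeList;
INSERT INTO @t VALUES ('Jim', '2014-10-09T09:05:00'), ('Jill', '2014-10-10T09:10:00');
SELECT * FROM dbo.WhereWasUser(@t);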

Crosstab splitting results due to presence of unrelated field

I'm using Postgres 9.1 with tablefunc's crosstab.
I have a table with the following structure:
CREATE TABLE marketdata.instrument_data
(
dt date NOT NULL,
instrument text NOT NULL,
field text NOT NULL,
value numeric,
CONSTRAINT instrument_data_pk PRIMARY KEY (dt , instrument , field )
)
This is populated by a script that fetches data daily. So it might look like so:
| dt | instrument | field | value |
|------------+-------------------+-----------+-------|
| 2014-05-23 | SGX.MiniJGB.2014U | PX_VOLUME | 1 |
| 2014-05-23 | SGX.MiniJGB.2014U | OPEN_INT | 2 |
I then use the following crosstab query to pivot the table:
select dt, instrument, vol, oi
FROM crosstab($$
select dt, instrument, field, value
from marketdata.instrument_data
where field = 'PX_VOLUME' or field = 'OPEN_INT'
$$::text, $$VALUES ('PX_VOLUME'),('OPEN_INT')$$::text
) vol(dt date, instrument text, vol numeric, oi numeric);
Running this I get the result:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | 2 |
The problem:
When running this with a lot of real data in the table, I noticed that for some fields the function was splitting the result over two rows:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | |
| 2014-05-23 | SGX.MiniJGB.2014U | | 2 |
I checked that the dt and instrument fields were identical and produced a work-around by grouping the output of the crosstab.
Analysis
I've discovered that it's the presence of one other entry in the input table that causes the output to be split over 2 rows. If I have the input as follows:
| dt | instrument | field | value |
|------------+-------------------+-----------+-------|
| 2014-04-23 | EUX.Bund.2014M | PX_VOLUME | 0 |
| 2014-05-23 | SGX.MiniJGB.2014U | PX_VOLUME | 1 |
| 2014-05-23 | SGX.MiniJGB.2014U | OPEN_INT | 2 |
I get:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-04-23 | EUX.Bund.2014M | 0 | |
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | |
| 2014-05-23 | SGX.MiniJGB.2014U | | 2 |
Where it gets really weird...
If I recreate the above input table manually then the output is as we would expect, combined into a single row.
If I run:
update marketdata.instrument_data
set instrument = instrument
where instrument = 'EUX.Bund.2014M'
Then again, the output is as we would expect, which is surprising as all I've done is set the instrument field to itself.
So I can only conclude that there is some hidden character/encoding issue in that Bund entry that is breaking crosstab.
Are there any suggestions as to how I can determine what it is about that entry that breaks crosstab?
Edit:
I ran the following on the raw table to try and see any hidden characters:
select instrument, encode(instrument::bytea, 'escape')
from marketdata.bloomberg_future_data_temp
where instrument = 'EUX.Bund.2014M';
And got:
| instrument | encode |
|----------------+----------------|
| EUX.Bund.2014M | EUX.Bund.2014M |
Two problems.
1. ORDER BY is required.
The manual:
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row.
With the one-parameter form of crosstab(), ORDER BY 1,2 would be necessary.
2. One column with distinct values per group.
The manual:
crosstab(text source_sql, text category_sql)
source_sql is a SQL statement that produces the source set of data.
...
This statement must return one row_name column, one category column,
and one value column. It may also have one or more "extra" columns.
The row_name column must be first. The category and value columns must
be the last two columns, in that order. Any columns between row_name
and category are treated as "extra". The "extra" columns are expected
to be the same for all rows with the same row_name value.
Bold emphasis mine. One column. It seems like you want to form groups over two columns, which does not work as you desire.
Related answer:
Pivot on Multiple Columns using Tablefunc
The solution depends on what you actually want to achieve. It's not in your question; you silently assumed the function would do what you hope for.
Solution
I guess you want to group on both leading columns: (dt, instrument). You could play tricks with concatenating or arrays, but that would be slow and / or unreliable. I suggest a cleaner and faster approach with a window function rank() or dense_rank() to produce a single-column unique value per desired group. This is very cheap, because ordering rows is the main cost and the order of the frame is identical to the required order anyway. You can remove the added column in the outer query if desired:
SELECT dt, instrument, vol, oi
FROM crosstab(
$$SELECT dense_rank() OVER (ORDER BY dt, instrument) AS rnk
, dt, instrument, field, value
FROM marketdata.instrument_data
WHERE field IN ('PX_VOLUME', 'OPEN_INT')
ORDER BY 1$$
, $$VALUES ('PX_VOLUME'),('OPEN_INT')$$
) vol(rnk int, dt date, instrument text, vol numeric, oi numeric);
More details:
PostgreSQL Crosstab Query
You could run a query that replaces irregular characters with an asterisk:
select regexp_replace(instrument, '[^a-zA-Z0-9]', '*', 'g')
from marketdata.instrument_data
where instrument = 'EUX.Bund.2014M'
Perhaps the instrument = instrument assignment discards trailing whitespace. That would also explain why where instrument = 'EUX.Bund.2014M' matches two values that crosstab sees as different.
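If trailing whitespace or another hidden character is the suspicion, here is a sketch of a check that compares character and byte lengths and dumps the raw bytes (hex output makes a trailing space visible, which the 'escape' encoding does not; convert_to assumes a UTF8 database encoding):
-- Sketch: compare lengths and inspect the raw bytes of the suspect instrument.
SELECT instrument,
       length(instrument)                            AS char_len,
       octet_length(instrument)                      AS byte_len,
       encode(convert_to(instrument, 'UTF8'), 'hex') AS hex_bytes
FROM marketdata.instrument_data
WHERE instrument LIKE 'EUX.Bund.2014M%';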