I'm trying to figure out how to duplicate my rows based on the Pass and Fail columns of the result table I've queried so far. The code for the query is shown below; I'm querying from JSON.
SELECT
    to_date(LotSummary ->> 'Start', 'HH24:MI:SS DD/MM/YY') AS "Date",
    Machine AS "Machine",
    LotSummary ->> 'Pass' AS "Pass",
    LotSummary ->> 'Fail' AS "Fail"
FROM
(
    SELECT
        CASE
            WHEN jsonb_typeof(json_data->'OEESummary'->'LotSummary'->'Lot') = 'array'
            THEN jsonb_array_elements(cast(json_data->'OEESummary'->'LotSummary'->'Lot' AS JSONB))
            ELSE json_data->'OEESummary'->'LotSummary'->'Lot'
        END AS LotSummary,
        json_data->'OEESummary'->>'MachineID' AS Machine
    FROM
    (
        SELECT
            jsonb_array_elements(cast(json_data->>'body' AS JSONB)) AS json_data
        FROM data
    ) t
    WHERE
        json_data ->> 'file_name' = 'OEE.xml'
) a
WHERE
    to_date(LotSummary ->> 'Start', 'HH24:MI:SS DD/MM/YY') IS NOT NULL
Let's say I want to duplicate each row so that the Pass and Fail values end up in separate rows, like this:
+----------------------------+--------------+------+------+
| Date | Machine | Pass | Fail |
+----------------------------+--------------+------+------+
| "2019-08-04T16:00:00.000Z" | TRS1000i-082 | 5 | NULL |
| "2019-08-04T16:00:00.000Z" | TRS1000i-082 | NULL | 2 |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | 0 | NULL |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | NULL | 0 |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | 20 | NULL |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | NULL | 0 |
+----------------------------+--------------+------+------+
Just in case you need the JSON format (do note it's not the exact document, but the structure is correct):
{
"body": [
{
"file_name": "OEE.xml",
"OEESummary": {
"MachineID": "TRS1000i-012",
"LotSummary": {
"Lot": [
{
"#i": "0",
"Start": "14:52:16 15/08/19",
"Pass": "3",
"Fail": "0"
},
{
"#i": "1",
"Start": "15:40:41 15/08/19",
"Pass": "3",
"Fail": "0"
}
]
},
"Utilisation": [
"0:01:42:48",
"19.04%"
],
"MTTR": "--",
"IdleTime": "0:07:16:39",
"MUBA": "57",
"OEE": "60.55%"
}
}
],
"header": {
"json_metadata_revision": "v1.0.0",
"json_metadata_datetime_creation": "14-OCT-2019_14:55:57",
"json_metadata_uuid": "14102019145557_65b425d8-09e5-48ec-be85-e69d9a50d2e3",
"json_metadata_type": "mvst_xml_to_json"
}
}
Do help if you know any techniques I could use to solve this issue. Your help is greatly appreciated! Thank you.
With your table, you can use a lateral join:
select t.date, t.machine, v.pass, v.fail
from t cross join lateral
(values (t.pass, null), (null, t.fail)) v(pass, fail);
I'm not quite sure what your query has to do with the question. But you can define it as a CTE and then use the results for t.
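A sketch of how the two pieces could be wired together, under the assumption that a stands for the question's derived subquery (the CTE name base is my own, not from your schema):

```sql
-- Sketch: the question's final query as a CTE, then a lateral VALUES
-- list to split each source row into a Pass row and a Fail row.
WITH base AS (
    SELECT to_date(LotSummary ->> 'Start', 'HH24:MI:SS DD/MM/YY') AS "Date",
           Machine AS "Machine",
           LotSummary ->> 'Pass' AS "Pass",
           LotSummary ->> 'Fail' AS "Fail"
    FROM a  -- assumption: "a" is the derived subquery from the question
)
SELECT b."Date", b."Machine", v.pass, v.fail
FROM base b
CROSS JOIN LATERAL (VALUES (b."Pass", NULL), (NULL, b."Fail")) AS v(pass, fail);
```

Each base row produces two rows from the VALUES list: one carrying Pass with Fail set to NULL, and one the other way around.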
Looking at the top of your expected result set, it's just a union query:
WITH A AS(
Select 1 id , 'TRS1000i-082' as Machine , 5 pass, 2 fail union all
Select 2 id , 'TRS1000i-001' as Machine , 0 pass, 0 fail union all
Select 3 id , 'TRS1000i-001' as Machine , 20 pass, 0 fail
)
SELECT ID
,MACHINE
,pass
,null fail
FROM a
UNION ALL
SELECT ID
,MACHINE
,null pass
,fail fail
FROM a
order by ID
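The same UNION ALL pattern applied to the asker's actual columns, as a hedged sketch (my assumption: the question's final query is available as a view or CTE named a exposing those four quoted columns):

```sql
-- Sketch: unpivot Pass/Fail via UNION ALL, keeping Date and Machine.
SELECT "Date", "Machine", "Pass", NULL AS "Fail" FROM a
UNION ALL
SELECT "Date", "Machine", NULL AS "Pass", "Fail" FROM a
ORDER BY "Date";
```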
I have a table in postgresql as follow:
id | chat_id | content | time | read_times
----+---------+-----------+------+-------------------------------------------------------------------------
1 | chat_10 | content_1 | t1 | [{"username": "user1", "time": 123}, {"username": "user2", "time": 111}]
2 | chat_10 | content_2 | t2 | [{"username": "user2", "time": 1235}]
3 | chat_10 | content_3 | t3 | []
4 | chat_11 | content_4 | t4 | [{"username": "user1", "time": 125}, {"username": "user3", "time": 121}]
5 | chat_11 | content_5 | t5 | [{"username": "user1", "time": 126}, {"username": "user3", "time": 127}]
Note: t1 < t2 < t3 < t4 < t5
After a user reads a message, we register it in the read_times column (e.g. user2 read the message with id 2 at time 1235). Now I want to get the user's chat list with the unread count per chat. For user1 the result is as follows:
chat_id | content | unread_count
--------+-----------+--------------
chat_10 | content_3 | 2
chat_11 | content_5 | 0
Note: unread_count is the number of messages the user didn't read in a chat_id.
Is it possible with one query?
First, extract the user names for each chat_id and content with the json_array_elements() function, and get the last content of each chat_id with the FIRST_VALUE() window function.
Then aggregate, combining the SUM() window function with the MAX() aggregate function to produce the unread_count column:
WITH cte AS (
SELECT t.chat_id, t.content,
FIRST_VALUE(t.content) OVER (PARTITION BY t.chat_id ORDER BY t.time DESC) last_content,
(r->>'username') username
FROM tablename t LEFT JOIN json_array_elements(read_times::json) r ON true
)
SELECT DISTINCT c.chat_id, MAX(c.last_content) "content",
SUM((MAX((COALESCE(username, '') = 'user1')::int) = 0)::int) OVER (PARTITION BY c.chat_id) unread_count
FROM cte c
GROUP BY c.chat_id, c.content
ORDER BY c.chat_id
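As a hedged alternative (my sketch, not part of the answer above): jsonb containment can test whether the read_times array holds an object for a given username, so the unread count can also be computed with a FILTER clause, assuming the same table name and that read_times casts to jsonb:

```sql
-- Sketch: count rows whose read_times array has no entry for user1,
-- and pick the latest content per chat via ARRAY_AGG ordered by time.
SELECT chat_id,
       (ARRAY_AGG(content ORDER BY time DESC))[1] AS content,
       COUNT(*) FILTER (
           WHERE NOT read_times::jsonb @> '[{"username": "user1"}]'
       ) AS unread_count
FROM tablename
GROUP BY chat_id
ORDER BY chat_id;
```

On the sample data this should yield 2 unread for chat_10 (rows 2 and 3 lack a user1 entry) and 0 for chat_11.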
Hello, I have a set of tables as follows, as a minimal example of my issue (I want to know how to retrieve that structure, not how to change the database structure):
fruit:
id | name | form| texture_id
-------------------------------
1 | Apple | round | 1
2 | Banana | long | 1
fruit_varieties:
id | name | fruit_id | color
-------------------------------
1 | golden | 1| green
2 | fuji | 1| red
3 | canarias | 2| yellow
fruit_texture
id | name
-------------------------------
1 | soft
2 | hard
variety_countries
id | name | fruit_variety_id | average_temperature
-------------------------------
1 | france | 1 | 21
2 | spain | 1 | 24
3 | italy | 2 | 23
I want to get this structure, for a given fruit.name = 'Apple':
{
"fruit_name" = "Apple",
"form" = "round",
"texture" = "soft",
"fruit_properties" = [{
"variety_name" = "Golden",
"color" = "green",
"countries" = [{
"country" = "france",
"avg_temperature" = "21",
}, {
"country" = "spain",
"avg_temperature" = "24",
}
]
}, {
"variety_name" = "fuji",
"color" = "red",
"countries" = [{
"country" = "italy",
"avg_temperature" = "23",
}
]
}
]
}
So I started with something like this
SELECT
fruit.name AS fruit_name,
fruit.form AS form,
fruit_texture.name AS texture,
(
# I don't know how to handle this
) AS fruit_properties
FROM fruit
JOIN fruit_varieties
ON fruit.id = fruit_varieties.fruit_id
WHERE fruit.name = 'Apple'
Now I don't know how to return that array inside a column, or how to build a JSON object with the whole response. I have already spent some hours trying the JSON path functions suggested in other questions, but I am not able to make them work.
Could someone give me a hint using this simple example?
Your output structure is not standard JSON: it should be : instead of = between key and value. Assuming you want standard JSON output, try the query below:
select row_to_json(d2) from (
    select
        name,
        form,
        texture,
        json_agg(json_build_object('variety_name', variety_name, 'color', color, 'countries', countries)) "fruit_properties"
    from
    (
        select
            t1.name "name",
            t1.form "form",
            t3.name "texture",
            t2.name "variety_name",
            t2.color "color",
            json_agg(json_build_object('country', t4.name, 'temp', t4.average_temperature)) "countries"
        from
            fruit t1
            inner join fruit_varieties t2 on t1.id = t2.fruit_id
            inner join fruit_texture t3 on t1.texture_id = t3.id
            inner join variety_countries t4 on t4.fruit_variety_id = t2.id
        group by 1, 2, 3, 4, 5
    ) d1
    group by 1, 2, 3
) d2
where d2.name = 'Apple'
The query above will return one row with a JSON value for each fruit if you don't use the where clause.
If you literally want the output exactly as written in your question, replace row_to_json(d2) with replace(row_to_json(d2)::text, ':', ' = ') in the query above, though note that this blindly replaces every colon, including any that appear inside string values.
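While developing a query like this, the nested output can be made easier to read with jsonb_pretty (my addition, not part of the answer; row_to_json returns json, so a cast to jsonb is needed). In the answer's query, only the outer projection changes:

```sql
-- Sketch: pretty-printed variant of the outer projection.
select jsonb_pretty(row_to_json(d2)::jsonb)
```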
{
"base": {
"code": "xm",
"name": "project"
},
"list": [{
"ACode": "cp1",
"AName": "Product1",
"BList": [{
"BCode": "gn1",
"BName": "Feature1"
}, {
"BCode": "gn2",
"BName": "Feature2"
}]
}, {
"ACode": "cp2",
"AName": "Product2",
"BList": [{
"BCode": "gn1",
"BName": "Feature1"
}]
}]
}
Given JSON like this, I want to get this:
| code | name | ACode | Aname | Bcode | Bname |
| ---- | ------- | ----- | -------- | ----- | -------- |
| xm | project | cp1 | Product1 | gn1 | Feature1 |
| xm | project | cp1 | Product1 | gn2 | Feature2 |
| xm | project | cp2 | Product2 | gn1 | Feature1 |
I tried this:
SELECT
code
, name
, get_json_object(t.list, '$.[*].ACode') AS ACode
, get_json_object(t.list, '$.[*].AName') AS AName
, get_json_object(t.list, '$.[*].BList[*].BCode') AS BCode
, get_json_object(t.list, '$.[*].BList[*].BName') AS BName
FROM
(
SELECT
get_json_object(t.value, '$.base.code') AS code
, get_json_object(t.value, '$.base.name') AS name
, get_json_object(t.value, '$.list') AS list
FROM
(
SELECT
'{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
)
t
)
t
;
and I get this:
xm project ["cp1","cp2"] ["Product1","Product2"] ["gn1","gn2","gn1"] ["Feature1","Feature2","Feature1"]
But I find it generates six rows; there seems to be a Cartesian product.
I also tried split(string, "\},\{"), but that splits the inner layer at the same time, so I hope to get some help.
I solved it!
SELECT
code
, name
, ai.ACode
, ai.AName
, p.BCode
, p.BName
FROM
(
SELECT
get_json_object(t.value, '$.base.code') AS code
, get_json_object(t.value, '$.base.name') AS name
, get_json_object(t.value, '$.list') AS list
FROM
(
SELECT
'{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
)
t
)
t
lateral view explode(split(regexp_replace(regexp_extract(list,'^\\[(.+)\\]$',1),'\\}\\]\\}\\,\\{', '\\}\\]\\}\\|\\|\\{'),'\\|\\|')) list as a
lateral view json_tuple(a,'ACode','AName','BList') ai as ACode
, AName
, BList
lateral view explode(split(regexp_replace(regexp_extract(BList,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|')) BList as b
lateral view json_tuple(b,'BCode','BName') p as BCode
, BName
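The core trick in the outer lateral view, annotated (my commentary on the same code, no behavior change): the regexp_extract strips the outer square brackets from the list, the regexp_replace rewrites only the top-level object separator into a marker that cannot occur in the nested BList arrays, and split/explode then emits one JSON object per row:

```sql
-- Sketch: annotated copy of the answer's outer explode step.
lateral view explode(
    split(
        regexp_replace(
            regexp_extract(list, '^\\[(.+)\\]$', 1),   -- strip the outer [ ... ]
            '\\}\\]\\}\\,\\{', '\\}\\]\\}\\|\\|\\{'    -- mark only top-level "}]},{" boundaries
        ),
        '\\|\\|'                                       -- split on the inserted marker
    )
) list as a
```

The inner lateral view repeats the same pattern on BList, using the plain "},{" separator, which is safe there because BList contains no further nesting.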
Given a table like so:
| JobId | Result |
|-------|--------|
| 1 | true |
| 1 | false |
| 1 | true |
| 1 | |
| 2 | false |
| 2 | false |
| 1 | true |
| 1 | true |
| 1 | true |
Is it possible for a SQL query to generate an output like this?
[{JobId: 1, true: 2, false: 1, undefined: 1},
{JobId: 2, true: 3, false: 2, undefined: 0}]
I am currently using the Sequelize ORM, but I can use raw queries. This is what I have so far:
db.JobResponse.findAll({
where: {
jobId: job
},
attributes: ['JobId', [fn('sum', col('result'))]],
group: ['JobId']
}).then(result => {
res.status(200).send({
data: result
});
})
Currently it tries to sum the result column, grouping by JobId; however, result is a boolean (expected values are true, false and undefined), so a simple sum won't work. Is it possible to count per distinct value within a group?
I think a basic pivot query to aggregate the true, false, and undefined tallies along with FOR JSON AUTO at the end of the query should generate the result you want:
SELECT
JobId,
SUM(CASE WHEN Result = 'true' THEN 1 ELSE 0 END) AS true,
SUM(CASE WHEN Result = 'false' THEN 1 ELSE 0 END) AS false,
SUM(CASE WHEN Result IS NULL THEN 1 ELSE 0 END) AS undefined
FROM yourTable
GROUP BY JobId
FOR JSON AUTO; -- convert each result record to a JSON element inside an outer array []
I couldn't actually test this, because both Rextester and SQLFiddle appear not to support the SQL Server JSON extensions. But this useful tutorial seems to support the answer I gave.
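A hedged variant of the same pivot (my rewording, not from the answer above): COUNT ignores NULLs, so the CASE expressions can drop their ELSE branches, and bracketing the aliases sidesteps any clash with true/false as identifiers:

```sql
-- Sketch: same tallies via COUNT, which skips NULL CASE results.
SELECT
    JobId,
    COUNT(CASE WHEN Result = 'true' THEN 1 END) AS [true],
    COUNT(CASE WHEN Result = 'false' THEN 1 END) AS [false],
    COUNT(CASE WHEN Result IS NULL THEN 1 END) AS [undefined]
FROM yourTable
GROUP BY JobId
FOR JSON AUTO;
```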
I have some BigQuery data in the following format:
"thing": [
{
"name": "gameLost",
"params": [
{
"key": "total_games",
"val": {
"str_val": "3",
"int_val": null
}
},
{
"key": "games_won",
"val": {
"str_val": "2",
"int_val": null
}
},
{
"key": "game_time",
"val": {
"str_val": "44",
"int_val": null
}
}
],
"dt_a": "1470625311138000",
"dt_b": "1470620345566000"
}
I'm aware of the FLATTEN() function that will result in an output of 3 rows like so:
+------------+------------------+------------------+--------------------+--------------------------+--------------------------+
| thing.name | thing.dt_a | event_dim.dt_b | thing.params.key | thing.params.val.str_val | thing.params.val.int_val |
+------------+------------------+------------------+--------------------+--------------------------+--------------------------+
| gameLost | 1470625311138000 | 1470620345566000 | total_games_played | 3 | null |
| | | | | | |
| gameLost | 1470625311138000 | 1470620345566000 | games_won | 2 | null |
| | | | | | |
| gameLost | 1470625311138000 | 1470620345566000 | game_time | 44 | null |
+------------+------------------+------------------+--------------------+--------------------------+--------------------------+
where the higher level keys/values get repeated into new rows for each deeper level object.
However, I need to output the deeper key/values as entirely new columns, and not repeat fields so the results would look like this:
+------------+------------------+------------------+--------------------+-----------+-----------+
| thing.name | thing.dt_a | event_dim.dt_b | total_games_played | games_won | game_time |
+------------+------------------+------------------+--------------------+-----------+-----------+
| gameLost | 1470625311138000 | 1470620345566000 | 3 | 2 | 44 |
+------------+------------------+------------------+--------------------+-----------+-----------+
How can I do this? Thanks!
Standard SQL makes this easier to express (uncheck "Use Legacy SQL" under "Show Options"):
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
thing.params[OFFSET(0)].val.str_val AS total_games_played,
thing.params[OFFSET(1)].val.str_val AS games_won,
thing.params[OFFSET(2)].val.str_val AS game_time
FROM T;
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| thing | total_games_played | games_won | game_time |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| {"name":"gameLost","dt_a":"1470625311138000","dt_b":"1470620345566000"} | 3 | 2 | 44 |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
If you don't know the order of the keys in the array, you can use subselects to extract the relevant values:
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
Try the below (Legacy SQL):
SELECT
    thing.name AS name,
    thing.dt_a AS dt_a,
    thing.dt_b AS dt_b,
    MAX(IF(thing.params.key = "total_games_played", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS total_games_played,
    MAX(IF(thing.params.key = "games_won", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS games_won,
    MAX(IF(thing.params.key = "game_time", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS game_time
FROM YourTable
For Standard SQL you can try the following (inspired by Elliott's answer, with one important difference: the array is ordered by key, so the order of the key values is guaranteed):
WITH Temp AS (
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
ARRAY(SELECT val.str_val AS val FROM UNNEST(thing.params) ORDER BY key) AS params
FROM YourTable
)
SELECT
thing,
params[OFFSET(2)] AS total_games_played,
params[OFFSET(1)] AS games_won,
params[OFFSET(0)] AS game_time
FROM Temp
Note: if you have other keys in params, you should add a WHERE clause to the SELECT inside ARRAY.
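A sketch of that note (my addition, assuming the key names from the question's data):

```sql
-- Sketch: restrict the ARRAY to the three known keys so extra
-- params entries cannot shift the OFFSET positions.
ARRAY(
    SELECT val.str_val
    FROM UNNEST(thing.params)
    WHERE key IN ("total_games", "games_won", "game_time")
    ORDER BY key
) AS params
```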