I'm new to PostgreSQL and am using version 9.4. I have a table of collected measurements stored as strings and need to convert it into a kind of pivot table, using something that is always up-to-date, like a VIEW.
Furthermore, some values need to be converted, e.g. multiplied by 1000, as you
can see in the example below for "sensor3".
Source Table:
CREATE TABLE source (
id bigint NOT NULL,
name character varying(255),
"timestamp" timestamp without time zone,
value character varying(32672),
CONSTRAINT source_pkey PRIMARY KEY (id)
);
INSERT INTO source VALUES
(15,'sensor2','2015-01-03 22:02:05.872','88.4')
, (16,'foo27' ,'2015-01-03 22:02:10.887','-3.755')
, (17,'sensor1','2015-01-03 22:02:10.887','1.1704')
, (18,'foo27' ,'2015-01-03 22:02:50.825','-1.4')
, (19,'bar_18' ,'2015-01-03 22:02:50.833','545.43')
, (20,'foo27' ,'2015-01-03 22:02:50.935','-2.87')
, (21,'sensor3','2015-01-03 22:02:51.044','6.56');
Source Table Result:
| id | name | timestamp | value |
|----+-----------+---------------------------+----------|
| 15 | "sensor2" | "2015-01-03 22:02:05.872" | "88.4" |
| 16 | "foo27" | "2015-01-03 22:02:10.887" | "-3.755" |
| 17 | "sensor1" | "2015-01-03 22:02:10.887" | "1.1704" |
| 18 | "foo27" | "2015-01-03 22:02:50.825" | "-1.4" |
| 19 | "bar_18" | "2015-01-03 22:02:50.833" | "545.43" |
| 20 | "foo27" | "2015-01-03 22:02:50.935" | "-2.87" |
| 21 | "sensor3" | "2015-01-03 22:02:51.044" | "6.56" |
Desired Final Result:
| timestamp | sensor1 | sensor2 | sensor3 | foo27 | bar_18 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:02:05.872" | | 88.4 | | | |
| "2015-01-03 22:02:10.887" | 1.1704 | | | -3.755 | |
| "2015-01-03 22:02:50.825" | | | | -1.4 | |
| "2015-01-03 22:02:50.833" | | | | | 545.43 |
| "2015-01-03 22:02:50.935" | | | | -2.87 | |
| "2015-01-03 22:02:51.044" | | | 6560.00 | | |
Using this:
-- CREATE EXTENSION tablefunc;
SELECT *
FROM
crosstab(
'SELECT
source."timestamp",
source.name,
source.value
FROM
public.source
ORDER BY
1'
,
'SELECT
DISTINCT
source.name
FROM
public.source
ORDER BY
1'
)
AS
(
"timestamp" timestamp without time zone,
"sensor1" character varying(32672),
"sensor2" character varying(32672),
"sensor3" character varying(32672),
"foo27" character varying(32672),
"bar_18" character varying(32672)
)
;
I got the result:
| timestamp | sensor1 | sensor2 | sensor3 | foo27 | bar_18 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:02:05.872" | | | | 88.4 | |
| "2015-01-03 22:02:10.887" | | -3.755 | 1.1704 | | |
| "2015-01-03 22:02:50.825" | | -1.4 | | | |
| "2015-01-03 22:02:50.833" | 545.43 | | | | |
| "2015-01-03 22:02:50.935" | | -2.87 | | | |
| "2015-01-03 22:02:51.044" | | | | | 6.56 |
Unfortunately,
the values aren't assigned to the correct columns,
the columns aren't dynamic, which means the query fails when there is an additional entry in the name column like 'sensor4', and
I don't know how to change (multiply) the values of some columns.
Your query works like this:
SELECT * FROM crosstab(
$$SELECT "timestamp", name
, CASE name
WHEN 'sensor3' THEN value::numeric * 1000
-- WHEN 'sensor9' THEN value::numeric * 9000 -- add more ...
ELSE value::numeric END AS value
FROM source
ORDER BY 1, 2$$
,$$SELECT unnest('{bar_18,foo27,sensor1,sensor2,sensor3}'::text[])$$
) AS (
"timestamp" timestamp
, bar_18 numeric
, foo27 numeric
, sensor1 numeric
, sensor2 numeric
, sensor3 numeric);
To multiply the value for selected columns, use a "simple" CASE expression. But you need to cast to a numeric type first; the example uses value::numeric.
Which raises the question: why not store value as a numeric type to begin with?
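If the table were yours to change, the conversion could be as simple as this (a sketch, assuming every existing value is a numeric string):
ALTER TABLE source
  ALTER COLUMN value TYPE numeric
  USING value::numeric;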
You need to use the version with two parameters. Detailed explanation:
PostgreSQL Crosstab Query
A truly dynamic cross tabulation is next to impossible, since SQL demands to know the result type in advance - at call time at the latest. But you can do something with polymorphic types:
Dynamic alternative to pivot with CASE and GROUP BY
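For reference, the CASE / GROUP BY alternative from that link would look roughly like this for the table at hand (only a sketch; the column list is still fixed in the query text):
SELECT "timestamp"
     , min(CASE WHEN name = 'sensor1' THEN value::numeric END) AS sensor1
     , min(CASE WHEN name = 'sensor2' THEN value::numeric END) AS sensor2
     , min(CASE WHEN name = 'sensor3' THEN value::numeric * 1000 END) AS sensor3
     , min(CASE WHEN name = 'foo27'   THEN value::numeric END) AS foo27
     , min(CASE WHEN name = 'bar_18'  THEN value::numeric END) AS bar_18
FROM   source
GROUP  BY "timestamp"
ORDER  BY "timestamp";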
@Erwin: It said "too long by 7128 characters" for a comment! Anyway:
Your post gave me the hints for the right direction, so thank you very much,
but in my case in particular I need it to be truly dynamic. Currently I've got
38886 rows with 49 different items (= columns to be pivoted).
To first answer yours and @Jasen's urgent question:
The source table layout is not up to me; I'm already very happy to get this
data into an RDBMS at all. If it were up to me, I'd always save UTC timestamps! But
there's also a reason for having the data saved as strings: it may contain
various data types, like boolean, integer, float, string, etc.
To avoid confusing myself further, I created a new demo dataset, prefixing the
column names (I know some hate this!) to avoid problems with keywords, and
changing the timestamps (--> minutes) for a better overview:
-- --------------------------------------------------------------------------
-- Create demo table of given schema and insert arbitrary data
-- --------------------------------------------------------------------------
DROP TABLE IF EXISTS table_source;
CREATE TABLE table_source
(
column_id BIGINT NOT NULL,
column_name CHARACTER VARYING(255),
column_timestamp TIMESTAMP WITHOUT TIME ZONE,
column_value CHARACTER VARYING(32672),
CONSTRAINT table_source_pkey PRIMARY KEY (column_id)
);
INSERT INTO table_source VALUES ( 15,'sensor2','2015-01-03 22:01:05.872','88.4');
INSERT INTO table_source VALUES ( 16,'foo27' ,'2015-01-03 22:02:10.887','-3.755');
INSERT INTO table_source VALUES ( 17,'sensor1','2015-01-03 22:02:10.887','1.1704');
INSERT INTO table_source VALUES ( 18,'foo27' ,'2015-01-03 22:03:50.825','-1.4');
INSERT INTO table_source VALUES ( 19,'bar_18','2015-01-03 22:04:50.833','545.43');
INSERT INTO table_source VALUES ( 20,'foo27' ,'2015-01-03 22:05:50.935','-2.87');
INSERT INTO table_source VALUES ( 21,'seNSor3','2015-01-03 22:06:51.044','6.56');
SELECT * FROM table_source;
Furthermore, based on @Erwin's suggestions, I created a view which already
converts the data type. Besides being fast, it has the nice property of only
adding the required transformations for known items, without impacting other
(new) items.
-- --------------------------------------------------------------------------
-- Create view to process source data
-- --------------------------------------------------------------------------
DROP VIEW IF EXISTS view_source_processed;
CREATE VIEW
view_source_processed
AS
SELECT
column_timestamp,
column_name,
CASE LOWER( column_name)
WHEN LOWER( 'sensor3') THEN CAST( column_value AS DOUBLE PRECISION) * 1000.0
ELSE CAST( column_value AS DOUBLE PRECISION)
END AS column_value
FROM
table_source
;
SELECT * FROM view_source_processed ORDER BY column_timestamp DESC LIMIT 100;
This is the desired result of the whole question:
-- --------------------------------------------------------------------------
-- Desired result:
-- --------------------------------------------------------------------------
/*
| column_timestamp | bar_18 | foo27 | sensor1 | sensor2 | seNSor3 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:01:05.872" | | | | 88.4 | |
| "2015-01-03 22:02:10.887" | | -3.755 | 1.1704 | | |
| "2015-01-03 22:03:50.825" | | -1.4 | | | |
| "2015-01-03 22:04:50.833" | 545.43 | | | | |
| "2015-01-03 22:05:50.935" | | -2.87 | | | |
| "2015-01-03 22:06:51.044" | | | | | 6560 |
*/
This is @Erwin's solution, adapted to the new demo source data. It's perfect,
as long as the items (= columns to be pivoted) don't change:
-- --------------------------------------------------------------------------
-- Solution by Erwin, modified for changed demo dataset:
-- http://stackoverflow.com/a/27773730
-- --------------------------------------------------------------------------
SELECT *
FROM
crosstab(
$$
SELECT
column_timestamp,
column_name,
column_value
FROM
view_source_processed
ORDER BY
1, 2
$$
,
$$
SELECT
UNNEST( '{bar_18,foo27,sensor1,sensor2,seNSor3}'::text[])
$$
)
AS
(
column_timestamp timestamp,
bar_18 DOUBLE PRECISION,
foo27 DOUBLE PRECISION,
sensor1 DOUBLE PRECISION,
sensor2 DOUBLE PRECISION,
seNSor3 DOUBLE PRECISION
)
;
When reading through the links @Erwin provided, I found a dynamic SQL example
by @Clodoaldo Neto and remembered that I had already done it this way in
Transact-SQL; this is my attempt:
-- --------------------------------------------------------------------------
-- Dynamic attempt based on:
-- http://stackoverflow.com/a/12989297/131874
-- --------------------------------------------------------------------------
DO $DO$
DECLARE
list_columns TEXT;
BEGIN
DROP TABLE IF EXISTS temp_table_pivot;
list_columns := (
SELECT
string_agg( DISTINCT column_name, ' ' ORDER BY column_name)
FROM
view_source_processed
);
EXECUTE(
FORMAT(
$format_1$
CREATE TEMP TABLE
temp_table_pivot(
column_timestamp TIMESTAMP,
%1$s
)
$format_1$
,
(
REPLACE(
list_columns,
' ',
' DOUBLE PRECISION, '
) || ' DOUBLE PRECISION'
)
)
);
EXECUTE(
FORMAT(
$format_2$
INSERT INTO temp_table_pivot
SELECT
*
FROM crosstab(
$crosstab_1$
SELECT
column_timestamp,
column_name,
column_value
FROM
view_source_processed
ORDER BY
column_timestamp, column_name
$crosstab_1$
,
$crosstab_2$
SELECT DISTINCT
column_name
FROM
view_source_processed
ORDER BY
column_name
$crosstab_2$
)
AS
(
column_timestamp TIMESTAMP,
%1$s
);
$format_2$
,
REPLACE( list_columns, ' ', ' DOUBLE PRECISION, ')
||
' DOUBLE PRECISION'
)
);
END;
$DO$;
SELECT * FROM temp_table_pivot ORDER BY column_timestamp DESC LIMIT 100;
Besides getting this into a stored procedure, I will, for performance reasons,
try to adapt this to an intermediate table where only new values are inserted.
I'll keep you up-to-date!
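A possible shape for that stored procedure, shown here only as a sketch (refresh_pivot is a hypothetical name; quote_ident is used so that mixed-case names like seNSor3 survive):
CREATE OR REPLACE FUNCTION refresh_pivot()
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
    list_columns TEXT;
BEGIN
    -- Build "name1 DOUBLE PRECISION, name2 DOUBLE PRECISION, ..." in the same
    -- order as the category query below.
    SELECT string_agg( quote_ident( column_name) || ' DOUBLE PRECISION', ', '
                       ORDER BY column_name)
    INTO list_columns
    FROM ( SELECT DISTINCT column_name FROM view_source_processed) AS t;

    DROP TABLE IF EXISTS temp_table_pivot;

    EXECUTE FORMAT(
        'CREATE TEMP TABLE temp_table_pivot AS
         SELECT * FROM crosstab(
             $q$ SELECT column_timestamp, column_name, column_value
                 FROM view_source_processed ORDER BY 1, 2 $q$,
             $q$ SELECT DISTINCT column_name
                 FROM view_source_processed ORDER BY 1 $q$
         ) AS ct ( column_timestamp TIMESTAMP, %s)', list_columns);
END
$func$;
After SELECT refresh_pivot();, a plain SELECT * FROM temp_table_pivot; reads the rebuilt pivot for the rest of the session.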
Thanks!!!
L.
PS: NO, I don't want to answer my own question, but the "comment"-field is too small!
Related
I have a column of type jsonb which contains json arrays of the form
[
{
"Id": 62497,
"Text": "BlaBla"
}
]
I'd like to update the Id to the value of a column word_id (type uuid) from a different table word.
I tried this
update inflection_copy
SET inflectionlinks = s.json_array
FROM (
SELECT jsonb_agg(
CASE
WHEN elems->>'Id' = (
SELECT word_copy.id::text
from word_copy
where word_copy.id::text = elems->>'Id'
) THEN jsonb_set(
elems,
'{Id}'::text [],
(
SELECT jsonb(word_copy.word_id::text)
from word_copy
where word_copy.id::text = elems->>'Id'
)
)
ELSE elems
END
) as json_array
FROM inflection_copy,
jsonb_array_elements(inflectionlinks) elems
) s;
Until now I always get the following error:
invalid input syntax for type json
DETAIL: Token "c66a4353" is invalid.
CONTEXT: JSON data, line 1: c66a4353...
The c66a4353 is part of one of the uuids of the word table. I don't understand why this is marked as invalid input.
EDIT:
To give an example of one of the uuids:
select to_jsonb(word_id::text) from word_copy limit(5);
returns
+----------------------------------------+
| to_jsonb |
|----------------------------------------|
| "078c979d-e479-4fce-b27c-d14087f467c2" |
| "ef288256-1599-4f0f-a932-aad85d666c9a" |
| "d1d95b60-623e-47cf-b770-de46b01042c5" |
| "f97464c6-b872-4be8-9d9d-83c0102fb26a" |
| "9bb19719-e014-4286-a2d1-4c0cf7f089fc" |
+----------------------------------------+
As requested the respective columns id and word_id from the word table:
+---------------------------------------------------+
| row |
|---------------------------------------------------|
| ('27733', '078c979d-e479-4fce-b27c-d14087f467c2') |
| ('72337', 'ef288256-1599-4f0f-a932-aad85d666c9a') |
| ('72340', 'd1d95b60-623e-47cf-b770-de46b01042c5') |
| ('27741', 'f97464c6-b872-4be8-9d9d-83c0102fb26a') |
| ('72338', '9bb19719-e014-4286-a2d1-4c0cf7f089fc') |
+---------------------------------------------------+
+----------------+----------+----------------------------+
| Column | Type | Modifiers |
|----------------+----------+----------------------------|
| id | bigint | |
| value | text | |
| homonymnumber | smallint | |
| pronounciation | text | |
| audio | text | |
| level | integer | |
| alpha | bigint | |
| frequency | bigint | |
| hanja | text | |
| typeeng | text | |
| typekr | text | |
| word_id | uuid | default gen_random_uuid() |
+----------------+----------+----------------------------+
I would suggest you modify your subquery as follows:
update inflection_copy AS ic
SET inflectionlinks = s.json_array
FROM
(SELECT jsonb_agg(CASE WHEN wc.word_id IS NULL THEN e.elems ELSE jsonb_set(e.elems, array['Id'], to_jsonb(wc.word_id::text)) END ORDER BY e.id ASC) AS json_array
FROM inflection_copy AS ic
CROSS JOIN LATERAL jsonb_path_query(ic.inflectionlinks, '$[*]') WITH ORDINALITY AS e(elems, id)
LEFT JOIN word_copy AS wc
ON wc.id::text = e.elems->>'Id'
) AS s
The LEFT JOIN clause will return wc.word_id = NULL when there is no wc.id corresponding to e.elems->>'Id', so that e.elems is left unchanged by the CASE.
The ORDER BY clause in the aggregate function jsonb_agg ensures that the order within the jsonb array is unchanged.
jsonb_path_query is used instead of jsonb_array_elements so as not to raise an error when ic.inflectionlinks is not a jsonb array; it runs in lax mode (the default behavior).
As for the original error: jsonb(word_copy.word_id::text) tries to parse the uuid text as a JSON document, and a bare uuid is not valid JSON; to_jsonb() wraps it as a JSON string instead.
see the test result in dbfiddle
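To see the replacement step in isolation (a standalone example built from values in the question):
SELECT jsonb_set('{"Id": 62497, "Text": "BlaBla"}'::jsonb,
                 array['Id'],
                 to_jsonb('078c979d-e479-4fce-b27c-d14087f467c2'::text));
-- => {"Id": "078c979d-e479-4fce-b27c-d14087f467c2", "Text": "BlaBla"}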
Problem
(This is for an open source, analytics library.)
Here's our query results from events_view:
id | visit_id | name | prop0 | prop1 | url
------+----------+--------+----------------------------+-------+------------
2004 | 4 | Magnus | 2021-10-26 02:25:55.790999 | 142 | cnn.com
2007 | 4 | Hartis | 2021-10-26 02:26:37.773999 | 25 | fox.com
Currently all columns are VARCHAR.
Column | Type | Collation | Nullable | Default
----------+-------------------+-----------+----------+---------
id | bigint | | |
visit_id | character varying | | |
name | character varying | | |
prop0 | character varying | | |
prop1 | character varying | | |
url | character varying | | |
They should be something like
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+---------
id | bigint | | |
visit_id | bigint | | |
name | character varying | | |
prop0 | time without time zone | | |
prop1 | bigint | | |
url | character varying | | |
Desired result
Hardcoding these casts as in SELECT visit_id::bigint, name::varchar, prop0::time, prop1::integer, url::varchar FROM tbl won't do; the column names are known at run time only.
To simplify things, we could cast each column into only three types: boolean, numeric, or varchar. Use the regexps below for matching types:
boolean: ^(true|false|t|f)$
numeric: ^(|-)[0-9]+(|\.[0-9]+)$
varchar: every result that does not match boolean and numeric above
What should the SQL be that discovers what type each column is and dynamically casts it?
These are a few ideas rather than a true solution for this tricky job. A slow but very reliable function can be used instead of regular expressions.
create or replace function can_cast(s text, vtype text)
returns boolean language plpgsql immutable as
$body$
begin
execute format('select %L::%s', s, vtype);
return true;
exception when others then
return false;
end;
$body$;
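For instance (hypothetical probes):
SELECT can_cast('t', 'boolean');                        -- true
SELECT can_cast('-3.755', 'numeric');                   -- true
SELECT can_cast('2021-10-26 02:25:55.790999', 'time');  -- true
SELECT can_cast('cnn.com', 'numeric');                  -- false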
Data may be presented like this (partial list of columns from your example):
create or replace temporary view tv(id, visit_id, prop0, prop1) as
values
(
2004::bigint,
4::bigint,
case when can_cast('2021-10-26 02:25:55.790999', 'time') then '2021-10-26 02:25:55.790999'::time end,
case when can_cast('142', 'bigint') then '142'::bigint end
), -- determine the types
(2007, 4, '2021-10-26 02:26:37.773999', 25)
-- the rest of the data here
;
I believe that it is possible to generate the temporary view DDL dynamically as a select from events_view too.
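One way to drive that generation could be to probe each column with can_cast plus an aggregate (a sketch; bool_and collapses the per-row checks, and prop1 stands in for any column name read from information_schema.columns):
SELECT CASE
         WHEN bool_and(can_cast(prop1, 'bigint'))  THEN 'bigint'
         WHEN bool_and(can_cast(prop1, 'numeric')) THEN 'numeric'
         ELSE 'varchar'
       END AS prop1_type
FROM events_view;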
I have string data in my column:
------------
name
------------
john-yolo
john-yolo
john-yolo
felix-goran
carine-carin
carine-carin
I want to select the name column with a suffix showing how many times each name has occurred so far,
e.g.:
------------
name
------------
john-yolo-1
john-yolo-2
john-yolo-3
felix-goran-1
carine-carin-1
carine-carin-2
How can I produce data like that?
MariaDB (10.2+) supports ROW_NUMBER:
CREATE TABLE test
(`name` varchar(12))
;
INSERT INTO test
(`name`)
VALUES
('john-yolo'),
('john-yolo'),
('john-yolo'),
('felix-goran'),
('carine-carin'),
('carine-carin')
;
SELECT CONCAT(name,'-', ROW_NUMBER() OVER(PARTITION BY name)) as name FROM test
| name |
| :------------- |
| carine-carin-1 |
| carine-carin-2 |
| felix-goran-1 |
| john-yolo-1 |
| john-yolo-2 |
| john-yolo-3 |
db<>fiddle here
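Note that without an ORDER BY inside OVER(), the numbering within each name group is arbitrary. Given a stable key (a hypothetical id column), it could be pinned down:
SELECT CONCAT(name, '-', ROW_NUMBER() OVER (PARTITION BY name ORDER BY id)) AS name
FROM test;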
Here is my script. Once I run it, the result is returned in a single column, with values like (val1,val2,val3, ...):
CREATE OR REPLACE FUNCTION meter_latest_read_custom()
RETURNS TABLE(
maxdate timestamp without time zone,
ertu integer,
meter integer,
meter_name character varying,
acq_9010 numeric)
AS
$BODY$
DECLARE
formal_table text;
BEGIN
FOR formal_table IN
SELECT
quote_ident(table_name)
FROM
information_schema.tables
WHERE
table_schema = 'public' AND
table_name LIKE 'task%_1'
LOOP
RETURN QUERY EXECUTE
'with groupedft as (
SELECT meter_id, MAX(acq_date) AS MaxDateTime
FROM ' ||formal_table|| ' GROUP BY meter_id),
foo as (
SELECT
ft.acq_date AS maxdate,
ft.ertu_id AS ertu,
ft.meter_id AS meter,
ft.acq_9010 AS acq_9010
FROM
'||formal_table|| ' ft
INNER JOIN groupedft
ON
ft.meter_id = groupedft.meter_id
AND ft.acq_date = groupedft.MaxDateTime)
SELECT
maxdate, ertu, meter, m.meter_name, acq_9010
FROM
foo
LEFT JOIN
meter_record m
ON
foo.meter=m.meter_id
AND foo.ertu=m.ertu_id';
END LOOP;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
ALTER FUNCTION meter_latest_read_custom() OWNER TO postgres;
The result will be return in single column
"("2017-02-16 10:45:00",201,6,"SPARE 6",)"
"("2017-02-16 10:45:00",201,14,"SPARE 14",)"
"("2017-02-16 10:45:00",201,8,"SPARE 8",)"
"("2017-02-16 10:45:00",201,12,"SPARE 12",)"
"("2017-02-16 10:45:00",201,1,"E.CO-PUAS KAJANG/AC PANEL ETS",16986.00000)"
"("2017-02-16 10:45:00",201,2,"SPARE 2",)"
"("2017-02-16 10:45:00",201,3,"SPARE 3",)"
"("2017-02-16 10:45:00",201,10,"SPARE 10",)"
"("2017-02-16 10:45:00",201,11,"SPARE 11",)"
"("2017-02-16 10:45:00",201,4,"SPARE 4",)"
I need the result returned in separate columns, not in one column. Where do I modify it?
-------------------------------------------------------------------
| maxdate | ertu | meter | meter_name | acq_9010 |
-------------------------------------------------------------------
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
The function itself is fine; the problem is how it is called. SELECT meter_latest_read_custom() returns each row as one composite value. Call the function in the FROM clause instead, so its result columns are expanded:
SELECT * FROM meter_latest_read_custom();
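For contrast, the call style from the question collapses each row into one composite value:
SELECT meter_latest_read_custom();  -- one column, each row a "(...)" composite
Placing the function in the FROM clause expands its OUT parameters into the five declared columns.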
I have created a Hive partitioned table, and when I run DESCRIBE I see other table properties as well as the table column details. If I want to see only the table column details, what command can I use?
create table t1 (x int, y int, s string) partitioned by (z date) stored as sequencefile;
describe t1;
+--------------------------+-----------------------+-----------------------+--+
| col_name | data_type | comment |
+--------------------------+-----------------------+-----------------------+--+
| x | int | |
| y | int | |
| s | string | |
| z | date | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| z | date | |
+--------------------------+-----------------------+-----------------------+--+
Can the last 5 rows be avoided?
| NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| z | date | |
Also, what does this NULL | NULL row mean?
What you're looking for is this configuration parameter:
set hive.display.partition.cols.separately=false
(The NULL | NULL rows are simply blank separator lines that Hive prints between the regular column list and the partition information block.)
From the Hive documentation:
In Hive 0.10.0 and earlier, no distinction is made between partition columns and non-partition columns while displaying columns for DESCRIBE TABLE. From Hive 0.12.0 onwards, they are displayed separately.
In Hive 0.13.0 and later, the configuration parameter hive.display.partition.cols.separately lets you use the old behavior, if desired (HIVE-6689). For an example, see the test case in the patch for HIVE-6689.
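With the flag set, DESCRIBE should show just the plain column list, with the partition column z folded in (a sketch of the expected output, based on the documented pre-0.12 behavior):
set hive.display.partition.cols.separately=false;
describe t1;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| x         | int        |          |
| y         | int        |          |
| s         | string     |          |
| z         | date       |          |
+-----------+------------+----------+--+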