Hive – partition table query failed when stored as parquet

Hive – partition table query failed when stored as parquetHiveisdevelopedbyFacebooktoanalyzeandextractusefulinformationfromtheirhugedatabutnowitis…

大家好,欢迎来到IT知识分享网。

Hive is developed by Facebook to analyze and extract useful information from their huge data but now it is very popular in other organizations too such as Netflix and FINRA.

Use-case:

Now a days most of us are using different ways to optimize query or we can say to improve the performance of the Hive query. Out of which 2 most common techniques are:

  1. Partitioning
  2. Storing data in parquet format.

Partitioning is very known concept to the folks who are processing/analyzing/aggregating their data thru Apache Hive and the Parquet file format incorporates several features that make it highly suited to data warehouse-style.

But most of us are unaware of the fact that Apache hive does not support the query, when storing a partitioned table in parquet format and executing a query on partitioned column.

Let’s have a detail look into it.

Below is the pipe delimiter sample data present in HDFS which we will load into managed non-partitioned Hive table

Hive – partition table query failed when stored as parquetBelow steps will create a managed hive table named “hive_emp1”.

Hive – partition table query failed when stored as parquet

Loading data from HDFS into hive table (hive_emp1) which we have created in above steps.

Hive – partition table query failed when stored as parquet

Take a look into data present in Hive table created above.

We have few Males and 2 Females which are represented by ‘M’ and ‘F’ respectively in last column (sex).

Hive – partition table query failed when stored as parquet

Now, we will create another table in hive name “hive_emp_dynpart”, which will be partitioned on 2 columns (dept and gender) and also data of this table will be stored in parquet format.

Hive – partition table query failed when stored as parquet

Set the hive.exec.dynamic.partition to true and hive.exec.dynamic.partition.mode to nonstrict to load the data dynamically in hive table.

Hive – partition table query failed when stored as parquet

We will insert the data from hive_emp1 table into hive_emp_dynpart table along with partitions too.

Hive – partition table query failed when stored as parquet

Issue:

While querying the hive_emp_dynpart table with one of the partition column, you will get the following error, for all other regular column it is working fine. 

Hive – partition table query failed when stored as parquet

Those who are unable to see above screen, can refer to below statements for error.

hive> select * from hive_emp_dynpart where gender = ‘M’;
OK
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Column [gender] was not found in schema!
Time taken: 0.255 seconds

Error Description:

It is a known bug in Apache Hive (HIVE-11401) filtering option, when the partitioned was stored as Parquet.

Resolution:

A known workaround is to disable predicate pushdown by setting property hive.optimize.index.filter to false.

Hive – partition table query failed when stored as parquet

Now query the table using same command.

Hive – partition table query failed when stored as parquet

Conclusion:

You need to set the property to false every time you execute the query.

hive2.3.0 fix it: https://issues.apache.org/jira/browse/HIVE-15782

免责声明:本站所有文章内容,图片,视频等均是来源于用户投稿和互联网及文摘转载整编而成,不代表本站观点,不承担相关法律责任。其著作权各归其原作者或其出版社所有。如发现本站有涉嫌抄袭侵权/违法违规的内容,侵犯到您的权益,请在线联系站长,一经查实,本站将立刻删除。 本文来自网络,若有侵权,请联系删除,如若转载,请注明出处:https://yundeesoft.com/27682.html

(0)
上一篇 2023-09-20 22:33
下一篇 2023-10-19 17:45

相关推荐

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注微信