2022 Changelog
ClickHouse release 22.12, 2022-12-15
Danger
This release contains a faulty systemd service comment that might break the ClickHouse installation on upgrade for some Linux distributions. The systemd service changes the directory owner permissions on /run/systemd, causing all subsequent systemd operations to fail. It is advised that you skip upgrading to this version and instead upgrade to a newer version of ClickHouse.
Refer to this issue on GitHub for more details: https://github.com/ClickHouse/ClickHouse/issues/48285
Upgrade Notes
- Fixed backward incompatibility in (de)serialization of states of min, max, any*, argMin, argMax aggregate functions with String argument. The incompatibility affects 22.9, 22.10 and 22.11 branches (fixed since 22.9.6, 22.10.4 and 22.11.2 correspondingly). Some minor releases of 22.3, 22.7 and 22.8 branches are also affected: 22.3.13...22.3.14 (fixed since 22.3.15), 22.8.6...22.8.9 (fixed since 22.8.10), 22.7.6 and newer (will not be fixed in 22.7, we recommend upgrading from 22.7.* to 22.8.10 or newer). This release note does not concern users that have never used affected versions. Incompatible versions append an extra '\0' to strings when reading states of the aggregate functions mentioned above. For example, if an older version saved state of anyState('foobar') to state_column then the incompatible version will print 'foobar\0' on anyMerge(state_column). Also incompatible versions write states of the aggregate functions without trailing '\0'. Newer versions (that have the fix) can correctly read data written by all versions including incompatible versions, except one corner case. If an incompatible version saved a state with a string that actually ends with null character, then newer version will trim trailing '\0' when reading state of affected aggregate function. For example, if an incompatible version saved state of anyState('abrac\0dabra\0') to state_column then newer versions will print 'abrac\0dabra' on anyMerge(state_column). The issue also affects distributed queries when an incompatible version works in a cluster together with older or newer versions. #43038 (Alexander Tokmakov, Raúl Marín). Note: all the official ClickHouse builds already include the patches. This is not necessarily true for unofficial third-party builds that should be avoided.
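The read/write behaviour described in the note above can be followed with a toy model. This is only a sketch: real aggregate function states are binary and carry more than the raw string, and the function names below are hypothetical, chosen for illustration.

```python
# Toy model of the trailing-'\0' incompatibility described in the upgrade note.
# Assumption: a "state" is modelled as just the stored bytes of the string.

def write_compatible(value: bytes) -> bytes:
    """Older and fixed versions: store the string with a trailing zero byte."""
    return value + b"\0"

def write_incompatible(value: bytes) -> bytes:
    """Incompatible versions: store the string without the trailing zero byte."""
    return value

def read_incompatible(state: bytes) -> bytes:
    """Incompatible versions: return the stored bytes as-is, so a zero byte
    written by a compatible version shows up as an extra '\\0'."""
    return state

def read_fixed(state: bytes) -> bytes:
    """Fixed versions: drop one trailing zero byte if there is one."""
    return state[:-1] if state.endswith(b"\0") else state

# State saved by an older version, then read by an incompatible one.
print(read_incompatible(write_compatible(b"foobar")))

# Corner case: an incompatible version saved a string that really ends in '\0'.
print(read_fixed(write_incompatible(b"abrac\0dabra\0")))
```

In this model the fixed reader handles states from both kinds of writers, and the only loss is the corner case from the note, where a genuine trailing null character is trimmed.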
New Feature
- Add BSONEachRow input/output format. In this format, ClickHouse formats/parses each row as a separate BSON document and each column is formatted/parsed as a single BSON field with the column name as the key. #42033 (mark-polokhov).
- Add grace_hash JOIN algorithm; it can be enabled with SET join_algorithm = 'grace_hash'. #38191 (BigRedEye, Vladimir C).
- Allow configuring password complexity rules and checks for creating and changing users. #43719 (Nikolay Degterinsky).
- Mask sensitive information in logs; mask secret parts in the output of queries SHOW CREATE TABLE and SELECT FROM system.tables. Also resolves #41418. #43227 (Vitaly Baranov).
- Add GROUP BY ALL syntax: #37631. #42265 (刘陶峰).
- Add FROM table SELECT column syntax. #41095 (Nikolay Degterinsky).
- Added function concatWithSeparator and concat_ws as an alias for Spark SQL compatibility. A function concatWithSeparatorAssumeInjective was added as a variant to enable GROUP BY optimization, similarly to concatAssumeInjective. #43749 (李扬).
- Added multiplyDecimal and divideDecimal functions for decimal operations with fixed precision. #42438 (Andrey Zvonov).
- Added system.moves table with a list of currently moving parts. #42660 (Sergei Trifonov).
- Add support for embedded Prometheus endpoint for ClickHouse Keeper. #43087 (Antonio Andelic).
- Support numeric literals with _ as the separator, for example, 1_000_000. #43925 (jh0x).
- Added possibility to use an array as a second parameter for the cutURLParameter function. It will cut multiple parameters. Close #6827. #43788 (Roman Vasin).
- Add a column with the expression of the index in the system.data_skipping_indices table. #43308 (Guillaume Tassery).
- Add column engine_full to system table databases so that users can access the entire engine definition of a database via system tables. #43468 (凌涛).
- New hash function xxh3 added. Also, the performance of xxHash32 and xxHash64 is improved on ARM thanks to a library update. #43411 (Nikita Taranov).
- Added support to define constraints for merge tree settings. For example, you can forbid overriding the storage_policy by users. #43903 (Sergei Trifonov).
- Add a new setting input_format_json_read_objects_as_strings that allows the parsing of nested JSON objects into Strings in all JSON input formats. This setting is disabled by default. #44052 (Kruglov Pavel).
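The BSONEachRow entry in this list describes a layout of one BSON document per row and one BSON field per column. A minimal sketch of that layout follows, using a hand-written encoder for a few BSON types taken from the BSON specification. The mapping from Python values to BSON types is an assumption made for illustration and says nothing about how ClickHouse maps its own column types; the function names are hypothetical.

```python
import struct

def bson_element(name: str, value) -> bytes:
    """Encode one column as a BSON element: type byte, key as cstring, value."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):   # assumption: every integer is written as int64
        return b"\x12" + key + struct.pack("<q", value)
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    raise TypeError(f"unsupported value type: {type(value).__name__}")

def bson_document(row: dict) -> bytes:
    """Encode one row as a BSON document: int32 total size, elements, 0x00."""
    body = b"".join(bson_element(k, v) for k, v in row.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def bson_each_row(rows) -> bytes:
    """Concatenate one document per row, with no separator between them."""
    return b"".join(bson_document(row) for row in rows)

payload = bson_each_row([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
print(len(payload), payload[:4])
```

Because every document starts with its own length, a reader can split such a stream back into rows without any delimiter.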
Experimental Feature
- Support deduplication for asynchronous inserts. Before this change, async inserts did not support deduplication, because multiple small inserts coexisted in one inserted batch. Closes #38075. #43304 (Han Fei).
- Add support for cosine distance for the experimental Annoy (vector similarity search) index. #42778 (Filatenkov Artur).
- Add CREATE / ALTER / DROP NAMED COLLECTION queries. #43252 (Kseniia Sumarokova). This feature is under development and the queries are not effective as of version 22.12. This changelog entry is added only to avoid confusion. Restrict default access to named collections to the user defined in config. This requires that show_named_collections = 1 is set to be able to see them. #43325 (Kseniia Sumarokova). The system.named_collections table is introduced. #43147 (Kseniia Sumarokova).
Performance Improvement
- Add settings max_streams_for_merge_tree_reading and allow_asynchronous_read_from_io_pool_for_merge_tree. Setting max_streams_for_merge_tree_reading limits the number of reading streams for MergeTree tables. Setting allow_asynchronous_read_from_io_pool_for_merge_tree enables a background I/O pool to read from MergeTree tables. This may increase performance for I/O bound queries if used together with max_streams_to_max_threads_ratio or max_streams_for_merge_tree_reading. #43260 (Nikolai Kochetov). This improves performance up to 100 times in case of high latency storage, low number of CPU and high number of data parts.
- Settings merge_tree_min_rows_for_concurrent_read_for_remote_filesystem/merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem did not respect adaptive granularity. Fat rows did not decrease the number of read rows (as it was done for merge_tree_min_rows_for_concurrent_read/merge_tree_min_bytes_for_concurrent_read), which could lead to high memory usage when using remote filesystems. #43965 (Nikolai Kochetov).
- Optimized the number of list requests to ZooKeeper or ClickHouse Keeper when selecting a part to merge. Previously it could produce thousands of requests in some cases. Fixes #43647. #43675 (Alexander Tokmakov).
- The optimization is now skipped if max_size_to_preallocate_for_aggregation has too small a value. The default value of this setting increased to 10^8. #43945 (Nikita Taranov).
- Speed up server shutdown by avoiding cleaning up of old data parts, because it is unnecessary after https://github.com/ClickHouse/ClickHouse/pull/41145. #43760 (Sema Checherinda).
- Merging on initiator now uses the same memory bound approach as merging of local aggregation results if enable_memory_bound_merging_of_aggregation_results is set. #40879 (Nikita Taranov).
- Keeper improvement: try syncing logs to disk in parallel with replication. #43450 (Antonio Andelic).
- Keeper improvement: requests are batched more often. The batching can be controlled with the new setting max_requests_quick_batch_size. #43686 (Antonio Andelic).
Improvement
- Implement referential dependencies and use them to create tables in the correct order while restoring from a backup. #43834 (Vitaly Baranov).
- Substitute UDFs in CREATE query to avoid failures during loading at startup. Additionally, UDFs can now be used as DEFAULT expressions for columns. #43539 (Antonio Andelic).
- Change how the following queries delete parts: TRUNCATE TABLE, ALTER TABLE DROP PART, ALTER TABLE DROP PARTITION. These queries now create empty parts that cover the old parts. This lets the TRUNCATE query work without a subsequent exclusive lock, so concurrent reads are not blocked. These queries are also durable now: if the request succeeds, no resurrected parts appear later. Note that atomicity is achieved only within transaction scope. #41145 (Sema Checherinda).
- The SET param_x query no longer requires manual string serialization for the value of the parameter. For example, the query SET param_a = '[\'a\', \'b\']' can now be written as SET param_a = ['a', 'b']. #41874 (Nikolay Degterinsky).
- Show read rows in the progress indication while reading from STDIN from client. Closes #43423. #43442 (Kseniia Sumarokova).
- Show progress bar while reading from s3 table function / engine. #43454 (Kseniia Sumarokova).
- Progress bar will show both read and written rows. #43496 (Ilya Yatsishin).
- filesystemAvailable and related functions support one optional argument with the disk name; filesystemFree is changed to filesystemUnreserved. Closes #35076. #42064 (flynn).
- Integration with LDAP: increased the default value of search_limit to 256, and added LDAP server config option to change that to an arbitrary value. Closes: #42276. #42461 (Vasily Nemkov).
- Allow the removal of sensitive information (see the query_masking_rules in the configuration file) from the exception messages as well. Resolves #41418. #42940 (filimonov).
- Support queries like SHOW FULL TABLES ... for MySQL compatibility. #43910 (Filatenkov Artur).
- Keeper improvement: Add 4lw command rqld which can manually assign a node as leader. #43026 (JackyWoo).
- Apply connection timeout settings for Distributed async INSERT from the query. #43156 (Azat Khuzhin).
- The unhex function now supports FixedString arguments. Issue #42369. #43207 (DR).
- Priority is given to deleting completely expired parts according to the TTL rules, see #42869. #43222 (zhongyuankai).
- More precise and reactive CPU load indication in clickhouse-client. #43307 (Sergei Trifonov).
- Support reading of subcolumns of nested types from storage S3 and table function s3 with formats Parquet, Arrow and ORC. #43329 (chen).
- Add table_uuid column to the system.parts table. #43404 (Azat Khuzhin).
- Added client option to display the number of locally processed rows in non-interactive mode (--print-num-processed-rows). #43407 (jh0x).
- Implement aggregation-in-order optimization on top of a query plan. It is enabled by default (but works only together with optimize_aggregation_in_order, which is disabled by default). Set query_plan_aggregation_in_order = 0 to use the previous AST-based version. #43592 (Nikolai Kochetov).
- Allow to collect profile events with trace_type = 'ProfileEvent' to system.trace_log on each increment with current stack, profile event name and value of the increment. It can be enabled by the setting trace_profile_events and used to investigate performance of queries. #43639 (Anton Popov).
- Add a new setting input_format_max_binary_string_size to limit string size in RowBinary format. #43842 (Kruglov Pavel).
- When ClickHouse requests a remote HTTP server, and it returns an error, the numeric HTTP code was not displayed correctly in the exception message. Closes #43919. #43920 (Alexey Milovidov).
- Correctly report errors in queries even when multiple JOINs optimization is taking place. #43583 (Salvatore).
Build/Testing/Packaging Improvement
- Systemd integration now correctly notifies systemd that the service is really started and is ready to serve requests. #43400 (Коренберг Марк).
- Added the option to build ClickHouse with OpenSSL using the OpenSSL FIPS Module. This build type has not been tested to validate security and is not supported. #43991 (Boris Kuschel).
- Upgrade the new DeflateQpl compression codec, which was implemented in a previous PR (details: https://github.com/ClickHouse/ClickHouse/pull/39494). This patch improves the codec in the following ways: 1. Upgrade Intel® Query Processing Library (QPL) from v0.2.0 to v0.3.0. 2. Improve the CMake file to fix QPL build issues for QPL v0.3.0. 3. Link the QPL library with libaccel-config at build time instead of loading it at runtime with dlopen, as was done with QPL v0.2.0. 4. Fix a log print issue in CompressionCodecDeflateQpl.cpp. #44024 (jasperzhu).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Fixed bug which could lead to deadlock while using asynchronous inserts. #43233 (Anton Popov).
- Fix some incorrect logic in AST level optimization optimize_normalize_count_variants. #43873 (Duc Canh Le).
- Fix a case when mutations are not making progress when checksums do not match between replicas (e.g. caused by a change in data format on an upgrade). #36877 (nvartolomei).
- Fix the skip_unavailable_shards optimization which did not work with the hdfsCluster table function. #43236 (chen).
- Fix s3 support for the ? wildcard. Closes #42731. #43253 (chen).
- Fix functions arrayFirstOrNull and arrayLastOrNull when the array contains Nullable elements. #43274 (Duc Canh Le).
- Fix incorrect UserTimeMicroseconds/SystemTimeMicroseconds accounting related to Kafka tables. #42791 (Azat Khuzhin).
- Do not suppress exceptions in web disks. Fix retries for the web disk. #42800 (Azat Khuzhin).
- Fixed (logical) race condition between inserts and dropping materialized views. A race condition happened when a Materialized View was dropped at the same time as an INSERT, where the MVs were present as a dependency of the insert at the beginning of the execution, but the table has been dropped by the time the insert chain tries to access it, producing either an UNKNOWN_TABLE or TABLE_IS_DROPPED exception, and stopping the insertion. After this change, we avoid these exceptions and just continue with the insert if the dependency is gone. #43161 (AlfVII).
- Fix undefined behavior in the quantiles function, which might lead to uninitialized memory. Found by fuzzer. This closes #44066. #44067 (Alexey Milovidov).
- Additional check on zero uncompressed size is added to CompressionCodecDelta. #43255 (Nikita Taranov).
- Flatten arrays from Parquet to avoid an issue with inconsistent data in arrays. These incorrect files can be generated by Apache Iceberg. #43297 (Arthur Passos).
- Fix bad cast from LowCardinality column when using short circuit function execution. #43311 (Kruglov Pavel).
- Fixed queries with SAMPLE BY with prewhere optimization on tables using Merge engine. #43315 (Antonio Andelic).
- Check and compare the content of the format_version file in MergeTreeData so that tables can be loaded even if the storage policy was changed. #43328 (Antonio Andelic).
- Fix possible (very unlikely) "No column to rollback" logical error during INSERT into Buffer tables. #43336 (Azat Khuzhin).
- Fix a bug that allowed the parser to parse an unlimited amount of round brackets into one function if allow_function_parameters is set. #43350 (Nikolay Degterinsky).
- MaterializeMySQL (experimental feature): support the DDL drop table t1, t2 and be compatible with most MySQL DROP DDL. #43366 (zzsmdfj).
- session_log (experimental feature): Fixed the inability to log in (because of failure to create the session_log entry) in a very rare case of messed up setting profiles. #42641 (Vasily Nemkov).
- Fix possible Cannot create non-empty column with type Nothing in functions if/multiIf. Closes #43356. #43368 (Kruglov Pavel).
- Fix a bug when a row level filter uses the default value of a column. #43387 (Alexander Gololobov).
- Fix queries with DISTINCT + LIMIT BY + LIMIT, which could return fewer rows than expected. Fixes #43377. #43410 (Igor Nikonov).
- Fix sumMap for Nullable(Decimal(...)). #43414 (Azat Khuzhin).
- Fix date_diff for hour/minute on macOS. Close #42742. #43466 (zzsmdfj).
- Fix incorrect memory accounting because of merges/mutations. #43516 (Azat Khuzhin).
- Fixed primary key analysis with conditions involving toString(enum). #43596 (Nikita Taranov). This error has been found by @tisonkun.
- Ensure consistency when clickhouse-copier updates status and attach_is_done in Keeper after partition attach is done. #43602 (lzydmxy).
- During the recovery of a lost replica of a Replicated database (experimental feature), there could be a situation where we need to atomically swap two table names (use EXCHANGE). Previously we tried to use two RENAME queries, which failed and, moreover, failed the whole recovery process of the database replica. #43628 (Nikita Mikhaylov).
- Fix the case when the s3Cluster function throws NOT_FOUND_COLUMN_IN_BLOCK error. Closes #43534. #43629 (chen).
- Fix possible logical error Array sizes mismatched while parsing JSON object with arrays with same key names but with different nesting level. Closes #43569. #43693 (Kruglov Pavel).
- Fixed possible exception in the case of distributed GROUP BY with an ALIAS column among aggregation keys. #43709 (Nikita Taranov).
- Fix bug which can lead to broken projections if zero-copy replication (experimental feature) is enabled and used. #43764 (alesapin).
- Fix using multipart upload for very large S3 objects in AWS S3. #43824 (ianton-ru).
- Fixed ALTER ... RESET SETTING with ON CLUSTER. It could have been applied to one replica only. Fixes #43843. #43848 (Elena Torró).
- Fix a logical error in JOIN with Join table engine at right hand side, if USING is being used. #43963 (Vladimir C). Fix a bug with wrong order of keys in Join table engine. #44012 (Vladimir C).
- Keeper fix: throw if the interserver port for Raft is already in use. #43984 (Antonio Andelic).
- Fix ORDER BY positional argument (example: ORDER BY 1, 2) in case of unneeded columns pruning from subqueries. Closes #43964. #43987 (Kseniia Sumarokova).
- Fixed exception when a subquery contains HAVING but doesn't contain an actual aggregation. #44051 (Nikita Taranov).
- Fix race in s3 multipart upload. This race could cause the error Part number must be an integer between 1 and 10000, inclusive. (S3_ERROR) while restoring from a backup. #44065 (Vitaly Baranov).
ClickHouse release 22.11, 2022-11-17
Backward Incompatible Change
- JSONExtract family of functions will now attempt to coerce to the requested type. #41502 (Márcio Martins).
New Feature
- Adds support for retries during INSERTs into ReplicatedMergeTree when a session with ClickHouse Keeper is lost. Apart from fault tolerance, it aims to provide a better user experience: avoid returning an error to the user during an insert if Keeper is restarted (for example, due to an upgrade). #42607 (Igor Nikonov).
- Add Hudi and DeltaLake table engines, read-only, only for tables on S3. #41054 (Daniil Rubin, Kseniia Sumarokova).
- Add table function hudi and deltaLake. #43080 (flynn).
- Support for composite time intervals. 1. Add, subtract and negate operations are now available on Intervals. In the case where the types of Intervals are different, they will be transformed into the Tuple of those types. 2. A tuple of intervals can be added to or subtracted from a Date/DateTime field. 3. Added parsing of Intervals with different types, for example: INTERVAL '1 HOUR 1 MINUTE 1 SECOND'. #42195 (Nikolay Degterinsky).
- Added ** glob support for recursive directory traversal of the filesystem and S3. Resolves #36316. #42376 (SmitaRKulkarni).
- Introduce s3_plain disk type for write-once-read-many operations. Implement ATTACH of MergeTree table for s3_plain disk. #42628 (Azat Khuzhin).
- Added applied row-level policies to system.query_log. #39819 (Vladimir Chebotaryov).
- Add four-letter command csnp for manually creating snapshots in ClickHouse Keeper. Additionally, lgif was added to get Raft information for a specific node (e.g. index of last created snapshot, last committed log index). #41766 (JackyWoo).
- Add function ascii like in Apache Spark: https://spark.apache.org/docs/latest/api/sql/#ascii. #42670 (李扬).
- Add function pmod which returns non-negative result based on modulo. #42755 (李扬).
- Add function formatReadableDecimalSize. #42774 (Alejandro).
- Add function randCanonical, which is similar to the rand function in Apache Spark or Impala. The function generates pseudo-random, independent and identically distributed values, uniformly distributed in [0, 1). #43124 (李扬).
- Add function displayName, closes #36770. #37681 (hongbin).
- Add min_age_to_force_merge_on_partition_only setting to optimize old parts for the entire partition only. #42659 (Antonio Andelic).
- Add generic implementation for arbitrary structured named collections, access type and system.named_collections. #43147 (Kseniia Sumarokova).
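The pmod entry in this list says the function returns a non-negative result based on modulo. The difference from a remainder that follows the sign of the dividend can be sketched as below. This is an illustration in Python, not the ClickHouse implementation; it assumes a positive divisor, and the helper names are hypothetical.

```python
def truncated_mod(a: int, n: int) -> int:
    """Remainder that takes the sign of the dividend (as in C-style integer %)."""
    r = abs(a) % abs(n)
    return -r if a < 0 else r

def pmod(a: int, n: int) -> int:
    """Positive modulo for n > 0: shift a negative remainder into [0, n)."""
    r = truncated_mod(a, n)
    return r + n if r < 0 else r

for a in (7, -7, -6):
    print(a, truncated_mod(a, 3), pmod(a, 3))
```

For positive dividends the two agree; they differ only when the dividend is negative and not a multiple of the divisor.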
Performance Improvement
- match function can use the index if it's a condition on string prefix. This closes #37333. #42458 (clarkcaoliu).
- Speed up AND and OR operators when they are sequenced. #42214 (Zhiguo Zhou).
- Support parallel parsing for LineAsString input format. This improves performance just slightly. This closes #42502. #42780 (Kruglov Pavel).
- ClickHouse Keeper performance improvement: improve commit performance for cases when many different nodes have uncommitted states. This should help with cases when a follower node can't sync fast enough. #42926 (Antonio Andelic).
- A condition like NOT LIKE 'prefix%' can use the primary index. #42209 (Duc Canh Le).
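Two entries in this list rely on the same idea: a condition on a string prefix maps to one contiguous range of a sorted key, so an index over that key can be used. The sketch below illustrates the general technique with a sorted Python list and byte-wise ordering. It is not the ClickHouse implementation, and the function names are hypothetical.

```python
from bisect import bisect_left

def prefix_range(prefix: bytes):
    """Return (lower, upper) such that a key has the prefix exactly when
    lower <= key < upper. upper is None when the range has no upper bound."""
    trimmed = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not trimmed:
        return prefix, None
    return prefix, trimmed[:-1] + bytes([trimmed[-1] + 1])

def rows_with_prefix(sorted_keys, prefix: bytes):
    """Return (start, stop) positions of the keys that start with the prefix."""
    lower, upper = prefix_range(prefix)
    start = bisect_left(sorted_keys, lower)
    stop = len(sorted_keys) if upper is None else bisect_left(sorted_keys, upper)
    return start, stop

keys = sorted([b"apple", b"banana", b"band", b"bar", b"cat"])
start, stop = rows_with_prefix(keys, b"ban")
print(keys[start:stop])               # rows matching LIKE 'ban%'
print(keys[:start] + keys[stop:])     # rows matching NOT LIKE 'ban%'
```

The negated condition is the complement of that range, which is still at most two contiguous ranges of the sorted key.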
Experimental Feature
- Support type Object inside other types, e.g. Array(JSON). #36969 (Anton Popov).
- Ignore MySQL binlog SAVEPOINT event for MaterializedMySQL. #42931 (zzsmdfj). Handle (ignore) SAVEPOINT queries in MaterializedMySQL. #43086 (Stig Bakken).
Improvement
- Trivial queries with small LIMIT will properly determine the number of estimated rows to read, so that the threshold will be checked properly. Closes #7071. #42580 (Han Fei).
- Add support for interactive parameters in INSERT VALUES queries. #43077 (Nikolay Degterinsky).
- Added new field allow_readonly in system.table_functions to allow using table functions in readonly mode. Resolves #42414. #42708 (SmitaRKulkarni).
- The system.asynchronous_metrics table gets embedded documentation. This documentation is also exported to Prometheus. Fixed an error with the metrics about cache disks: they were calculated only for one arbitrary cache disk instead of all of them. This closes #7644. #43194 (Alexey Milovidov).
- Throttling algorithm changed to token bucket. #42665 (Sergei Trifonov).
- Mask passwords and secret keys both in system.query_log and /var/log/clickhouse-server/*.log and also in error messages. #42484 (Vitaly Baranov).
- Remove covered parts for fetched part (to avoid possible growth of replication delay). #39737 (Azat Khuzhin).
- If /dev/tty is available, the progress in clickhouse-client and clickhouse-local will be rendered directly to the terminal, without writing to STDERR. It allows getting progress even if STDERR is redirected to a file, and the file will not be polluted by terminal escape sequences. The progress can be disabled by --progress false. This closes #32238. #42003 (Alexey Milovidov).
- Add support for FixedString input to base64 coding functions. #42285 (ltrk2).
- Add columns bytes_on_disk and path to system.detached_parts. Closes #42264. #42303 (chen).
- Improve using structure from insertion table in table functions: the setting use_structure_from_insertion_table_in_table_functions now has a new possible value, 2, which means that ClickHouse will try to determine automatically whether it can use the structure from the insertion table. Closes #40028. #42320 (Kruglov Pavel).
- Fix no progress indication on INSERT FROM INFILE. Closes #42548. #42634 (chen).
- Refactor function tokens to enable max tokens returned for related functions (disabled by default). #42673 (李扬).
- Allow to use Date32 arguments for formatDateTime and FROM_UNIXTIME functions. #42737 (Roman Vasin).
- Update tzdata to 2022f. Mexico will no longer observe DST except near the US border: https://www.timeanddate.com/news/time/mexico-abolishes-dst-2022.html. Chihuahua moves to year-round UTC-6 on 2022-10-30. Fiji no longer observes DST. See https://github.com/google/cctz/pull/235 and https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/1995209. #42796 (Alexey Milovidov).
- Add FailedAsyncInsertQuery event metric for async inserts. #42814 (Krzysztof Góralski).
- Implement read-in-order optimization on top of query plan. It is enabled by default. Set query_plan_read_in_order = 0 to use previous AST-based version. #42829 (Nikolai Kochetov).
- Increase the size of upload part exponentially for backup to S3 to avoid errors about max 10 000 parts limit of the multipart upload to s3. #42833 (Vitaly Baranov).
- When the merge task is continuously busy and disk space is insufficient, completely expired parts could not be selected and dropped, so the disk stayed full. Since dropping a completely expired part requires no additional disk space, such parts no longer need a disk space guarantee, which ensures the normal execution of TTL. #42869 (zhongyuankai).
- Add oss function and OSS table engine (this is convenient for users). oss is fully compatible with s3. #43155 (zzsmdfj).
- Improve error reporting in the collection of OS-related info for the system.asynchronous_metrics table. #43192 (Alexey Milovidov).
- Modify the INFORMATION_SCHEMA tables in a way so that ClickHouse can connect to itself using the MySQL compatibility protocol. Add columns instead of aliases (related to #9769). It will improve the compatibility with various MySQL clients. #43198 (Filatenkov Artur).
- Add some functions for compatibility with PowerBI, when it connects using MySQL protocol #42612 (Filatenkov Artur).
- Better usability for Dashboard on changes #42872 (Vladimir C).
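One entry in this list increases the upload part size exponentially so that a backup to S3 stays under the 10 000 part limit of a multipart upload. The sketch below shows why growing the part size helps. The growth factor, the number of parts per step and the sizes are hypothetical values chosen for illustration, not the ones ClickHouse uses.

```python
MAX_PARTS = 10_000  # multipart upload limit quoted in the entry above

def plan_parts(object_size: int, min_part_size: int,
               factor: int = 2, parts_per_step: int = 500) -> list:
    """Return the sizes of the parts for one upload. The part size is
    multiplied by `factor` after every `parts_per_step` parts."""
    sizes = []
    remaining = object_size
    part_size = min_part_size
    while remaining > 0:
        if sizes and len(sizes) % parts_per_step == 0:
            part_size *= factor
        size = min(part_size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes

MiB = 1024 ** 2
TiB = 1024 ** 4

fixed = plan_parts(1 * TiB, 16 * MiB, factor=1)   # constant part size
growing = plan_parts(1 * TiB, 16 * MiB)           # part size doubles every 500 parts
print(len(fixed) > MAX_PARTS, len(growing) <= MAX_PARTS)
```

With a constant 16 MiB part size a 1 TiB object needs far more than 10 000 parts, while the growing schedule covers the same object with a few thousand.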
Build/Testing/Packaging Improvement
- Run SQLancer for each pull request and commit to master. SQLancer is an OpenSource fuzzer that focuses on automatic detection of logical bugs. #42397 (Ilya Yatsishin).
- Update to latest zlib-ng. #42463 (Boris Kuschel).
- Add support for testing ClickHouse server with Jepsen. By the way, we already have support for testing ClickHouse Keeper with Jepsen. This pull request extends it to Replicated tables. #42619 (Antonio Andelic).
- Use https://github.com/matus-chochlik/ctcache for clang-tidy results caching. #42913 (Mikhail f. Shiryaev).
- Before the fix, the user-defined config was preserved by RPM in $file.rpmsave. The PR fixes it and won't replace the user's files from packages. #42936 (Mikhail f. Shiryaev).
- Remove some libraries from Ubuntu Docker image. #42622 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Updated QueryNormalizer to clone the alias AST when it is replaced. Previously, assigning the same node led to an exception in LogicalExpressionsOptimizer, because the same parent was inserted again. The bug does not occur with the new analyzer (allow_experimental_analyzer), so nothing changes there. Resolves #42452. #42827 (SmitaRKulkarni).
- Fix race for backup of tables in Lazy databases. #43104 (Vitaly Baranov).
- Fix for skip_unavailable_shards: it did not work with the s3Cluster table function. #43131 (chen).
- Fix schema inference in s3Cluster and improvement in hdfsCluster. #41979 (Kruglov Pavel).
- Fix retries while reading from URL table engines / table function (retriable errors could be retried more times than needed, and non-retriable errors resulted in a failed assertion in code). #42224 (Kseniia Sumarokova).
- A segmentation fault related to DNS & c-ares has been reported and fixed. #42234 (Arthur Passos).
- Fix LOGICAL_ERROR Arguments of 'plus' have incorrect data types which may happen in PK analysis (monotonicity check). Fix invalid PK analysis for monotonic binary functions with first constant argument. #42410 (Nikolai Kochetov).
- Fix incorrect key analysis when key types cannot be inside Nullable. This fixes #42456. #42469 (Amos Bird).
- Fix typo in a setting name that led to bad usage of schema inference cache while using setting input_format_csv_use_best_effort_in_schema_inference. Closes #41735. #42536 (Kruglov Pavel).
- Fix creating a Set with wrong header when data type is LowCardinality. Closes #42460. #42579 (flynn).
- (U)Int128 and (U)Int256 values are now correctly checked in PREWHERE. #42605 (Antonio Andelic).
- Fix a bug in functions parser that could have led to a segmentation fault. #42724 (Nikolay Degterinsky).
- Fix the locking in truncate table. #42728 (flynn).
- Fix a possible crash in web disks when a file does not exist (or on OPTIMIZE TABLE FINAL, which can eventually hit the same error). #42767 (Azat Khuzhin).
- Fix auth_type mapping in system.session_log, by including SSL_CERTIFICATE for the enum values. #42782 (Miel Donkers).
- Fix stack-use-after-return under ASAN build in the Create User query parser. #42804 (Nikolay Degterinsky).
- Fix lowerUTF8/upperUTF8 when a symbol straddles a 16-byte boundary (a very frequent case if you have strings longer than 16 bytes). #42812 (Azat Khuzhin).
- Additional bound check was added to LZ4 decompression routine to fix misbehaviour in case of malformed input. #42868 (Nikita Taranov).
- Fix rare possible hang on query cancellation. #42874 (Azat Khuzhin).
- Fix incorrect behavior with multiple disjuncts in hash join, close #42832. #42876 (Vladimir C).
- Fix a null pointer that could be produced by a SELECT using if in a three-table JOIN. #42883 (zzsmdfj).
- Fix memory sanitizer report in Cluster Discovery, close #42763. #42905 (Vladimir C).
- Improve DateTime schema inference in case of empty string. #42911 (Kruglov Pavel).
- Fix rare NOT_FOUND_COLUMN_IN_BLOCK error when projection is possible to use but there is no projection available. This fixes #42771. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/25563. #42938 (Amos Bird).
- Fix ATTACH TABLE in PostgreSQL database engine if the table contains DATETIME data type. Closes #42817. #42960 (Kseniia Sumarokova).
- Fix lambda parsing. Closes #41848. #42979 (Nikolay Degterinsky).
- Fix incorrect key analysis when nullable keys appear in the middle of a hyperrectangle. This fixes #43111. #43133 (Amos Bird).
- Fix several buffer over-reads in deserialization of carefully crafted aggregate function states. #43159 (Raúl Marín).
- Fix function if in case of NULL and const Nullable arguments. Closes #43069. #43178 (Kruglov Pavel).
- Fix decimal math overflow in parsing DateTime with the 'best effort' algorithm. Closes #43061. #43180 (Kruglov Pavel).
- The indent field produced by the git-import tool was miscalculated. See https://clickhouse.com/docs/en/getting-started/example-datasets/github/. #43191 (Alexey Milovidov).
- Fixed unexpected behaviour of Interval types with subquery and casting. #43193 (jh0x).
ClickHouse release 22.10, 2022-10-25
Backward Incompatible Change
- Rename cache commands: show caches -> show filesystem caches, describe cache -> describe filesystem cache. #41508 (Kseniia Sumarokova).
- Remove support for the WITH TIMEOUT section for LIVE VIEW. This closes #40557. #42173 (Alexey Milovidov).
- Remove support for the {database} macro from the client's prompt. It was displayed incorrectly if the database was unspecified and it was not updated on USE statements. This closes #25891. #42508 (Alexey Milovidov).
New Feature
- Composable protocol configuration is added. Now different protocols can be set up with different listen hosts. Protocol wrappers such as PROXYv1 can be set up over any other protocols (TCP, TCP secure, MySQL, Postgres). #41198 (Yakov Olkhovskiy).
- Add S3 as a new type of the destination of backups. Support BACKUP to S3 with as-is path/data structure. #42333 (Vitaly Baranov), #42232 (Azat Khuzhin).
- Added functions (randUniform, randNormal, randLogNormal, randExponential, randChiSquared, randStudentT, randFisherF, randBernoulli, randBinomial, randNegativeBinomial, randPoisson) to generate random values according to the specified distributions. This closes #21834. #42411 (Nikita Mikhaylov).
- An improvement for ClickHouse Keeper: add support for uploading snapshots to S3. S3 information can be defined inside keeper_server.s3_snapshot. #41342 (Antonio Andelic).
- Added an aggregate function analysisOfVariance (anova) to perform a statistical test over several groups of normally distributed observations to find out whether all groups have the same mean or not. Original PR #37872. #42131 (Nikita Mikhaylov).
- Support limiting of temporary data stored on disk using settings max_temporary_data_on_disk_size_for_user/max_temporary_data_on_disk_size_for_query. #40893 (Vladimir C).
- Add setting format_json_object_each_row_column_for_object_name to write/parse object name as column value in JSONObjectEachRow format. #41703 (Kruglov Pavel).
- Add BLAKE3 hash-function to SQL. #33435 (BoloniniD).
- The function javaHash has been extended to integers. #41131 (JackyWoo).
- Add OpenTelemetry support to ON CLUSTER DDL (require distributed_ddl_entry_format_version to be set to 4). #41484 (Frank Chen).
- Added system table asynchronous_insert_log. It contains information about asynchronous inserts (including results of queries in fire-and-forget mode (with wait_for_async_insert=0)) for better introspection. #42040 (Anton Popov).
- Add support for methods lz4, bz2, snappy in HTTP's Accept-Encoding which is a non-standard extension to HTTP protocol. #42071 (Nikolay Degterinsky).
- Adds Morton Coding (ZCurve) encode/decode functions. #41753 (Constantine Peresypkin).
- Add support for SET setting_name = DEFAULT. #42187 (Filatenkov Artur).
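To illustrate what the Morton (Z-curve) coding functions listed above compute, here is a minimal Python sketch of two-dimensional bit interleaving. The function names, the 16-bit coordinate width, and the bit order (first coordinate in the even bit positions) are assumptions made for illustration; this is not ClickHouse's implementation and its output may differ from the SQL functions.

```python
from typing import Tuple


def morton_encode_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single integer."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions (assumed order)
        code |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return code


def morton_decode_2d(code: int, bits: int = 16) -> Tuple[int, int]:
    """Inverse of morton_encode_2d: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y
```

Points that are close in (x, y) tend to get nearby codes, which is why such an encoding can be useful as a one-dimensional sort key for multi-dimensional data.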
Experimental Feature
- Added new infrastructure for query analysis and planning under the allow_experimental_analyzer setting. #31796 (Maksim Kita).
- Initial implementation of Kusto Query Language. Please don't use it. #37961 (Yong Wang).
Performance Improvement
- Relax the "Too many parts" threshold. This closes #6551. Now ClickHouse will allow more parts in a partition if the average part size is large enough (at least 10 GiB). This allows to have up to petabytes of data in a single partition of a single table on a single server, which is possible using disk shelves or object storage. #42002 (Alexey Milovidov).
- Implement operator precedence element parser to make the required stack size smaller. #34892 (Nikolay Degterinsky).
- DISTINCT in order optimization now leverages sorting properties of data streams. This improvement enables reading in order for DISTINCT if applicable (before it was necessary to provide ORDER BY for columns in DISTINCT). #41014 (Igor Nikonov).
- ColumnVector: optimize UInt8 index with AVX512VBMI. #41247 (Guo Wangyang).
- Optimize the lock contentions for ThreadGroupStatus::mutex. The performance experiments of SSB (Star Schema Benchmark) on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring a 2.95x improvement of the geomean of all subcases' QPS. #41675 (Zhiguo Zhou).
- Add ldapr capabilities to AArch64 builds. This is supported from Graviton 2+, Azure and GCP instances. Only appeared in clang-15 not so long ago. #41778 (Daniel Kutenin).
- Improve performance when comparing strings and one argument is an empty constant string. #41870 (Jiebin Sun).
- Optimize insertFrom of ColumnAggregateFunction to share Aggregate State in some cases. #41960 (flynn).
- Make writing to azure_blob_storage disks faster (respect max_single_part_upload_size instead of writing a block per each buffer size). Inefficiency mentioned in #41754. #42041 (Kseniia Sumarokova).
- Make thread ids in the process list and query_log unique to avoid waste. #42180 (Alexey Milovidov).
- Support skipping cache completely (both download to cache and reading cached data) in case the requested read range exceeds the threshold defined by cache setting bypass_cache_threashold (must be enabled with enable_bypass_cache_with_threshold). #42418 (Han Shukai). This helps on slow local disks.
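The relaxed "Too many parts" rule described in this list can be pictured with a small Python sketch. Only the 10 GiB average part size comes from the entry above; the function name, the parameter names, and the part-count threshold of 300 are hypothetical values chosen for illustration and do not claim to mirror the server's actual code or defaults.

```python
GIB = 1024 ** 3


def should_reject_insert(part_sizes_bytes, parts_threshold=300,
                         min_avg_part_size=10 * GIB):
    """Return True if an insert into this partition would be rejected.

    parts_threshold is an assumed, illustrative limit on the part count.
    min_avg_part_size is the 10 GiB average size named in the changelog.
    """
    count = len(part_sizes_bytes)
    if count <= parts_threshold:
        return False  # below the part-count limit: always accepted
    average = sum(part_sizes_bytes) / count
    # Above the limit, the insert is rejected only when parts are small on average.
    return average < min_avg_part_size
```

Under these assumptions, a partition with many small parts is still rejected, while a partition made of large parts may keep growing past the count limit.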
Improvement
- Add setting allow_implicit_no_password: in combination with allow_no_password it forbids creating a user with no password unless IDENTIFIED WITH no_password is explicitly specified. #41341 (Nikolay Degterinsky).
- Embedded Keeper will always start in the background allowing ClickHouse to start without achieving quorum. #40991 (Antonio Andelic).
- Made reestablishing a new connection to ZooKeeper more reactive in case of expiration of the previous one. Previously there was a task which spawns every minute by default and thus a table could be in readonly state for about this time. #41092 (Nikita Mikhaylov).
- Now projections can be used with zero copy replication (zero-copy replication is a non-production feature). #41147 (alesapin).
- Support expression (EXPLAIN SELECT ...) in a subquery. Queries like SELECT * FROM (EXPLAIN PIPELINE SELECT col FROM TABLE ORDER BY col) became valid. #40630 (Vladimir C).
- Allow changing async_insert_max_data_size or async_insert_busy_timeout_ms in scope of query. E.g. user wants to insert data rarely and she doesn't have access to the server config to tune default settings. #40668 (Nikita Mikhaylov).
- Improvements for reading from remote filesystems, made threadpool size for reads/writes configurable. Closes #41070. #41011 (Kseniia Sumarokova).
- Support all combinators combination in WindowTransform/arrayReduce*/initializeAggregation/aggregate functions versioning. Previously combinators like ForEach/Resample/Map didn't work in these places, using them led to exception like State function ... inserts results into non-state column. #41107 (Kruglov Pavel).
- Add function tryDecrypt that returns NULL when decrypt fails (e.g. decrypt with incorrect key) instead of throwing an exception. #41206 (Duc Canh Le).
- Add the unreserved_space column to the system.disks table to check how much space is not taken by reservations per disk. #41254 (filimonov).
- Support s3 authorization headers in table function arguments. #41261 (Kseniia Sumarokova).
- Add support for MultiRead in Keeper and internal ZooKeeper client (this is an extension to ZooKeeper protocol, only available in ClickHouse Keeper). #41410 (Antonio Andelic).
- Add support for decimal type comparing with floating point literal in IN operator. #41544 (liang.huang).
- Allow readable size values (like 1TB) in cache config. #41688 (Kseniia Sumarokova).
- ClickHouse could cache stale DNS entries for some period of time (15 seconds by default) until the cache is updated asynchronously. During these periods ClickHouse could nevertheless try to establish a connection and produce errors. This behavior is fixed. #41707 (Nikita Mikhaylov).
- Add interactive history search with fzf-like utility (fzf/sk) for clickhouse-client/clickhouse-local (note you can use FZF_DEFAULT_OPTS/SKIM_DEFAULT_OPTIONS to additionally configure the behavior). #41730 (Azat Khuzhin).
- Allow clients connecting to a secure server with an invalid certificate to proceed only with the '--accept-certificate' flag. #41743 (Yakov Olkhovskiy).
- Add function tryBase58Decode, similar to the existing function tryBase64Decode. #41824 (Robert Schulze).
- Improve feedback when replacing partition with different primary key. Fixes #34798. #41838 (Salvatore).
- Fix parallel parsing: segmentator now checks max_block_size. This fixed memory overallocation in case of parallel parsing and small LIMIT. #41852 (Vitaly Baranov).
- Don't add "TABLE_IS_DROPPED" exception to system.errors if it happened during SELECT from a system table and was ignored. #41908 (AlfVII).
- Improve option enable_extended_results_for_datetime_functions to return results of type DateTime64 for functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute and timeSlot. #41910 (Roman Vasin).
- Improve DateTime type inference for text formats. Now it respects setting date_time_input_format and doesn't try to infer datetimes from numbers as timestamps. Closes #41389 Closes #42206. #41912 (Kruglov Pavel).
- Remove confusing warning when inserting with perform_ttl_move_on_insert = false. #41980 (Vitaly Baranov).
- Allow user to write countState(*) similar to count(*). This closes #9338. #41983 (Amos Bird).
- Fix rankCorr size overflow. #42020 (Duc Canh Le).
- Added an option to specify an arbitrary string as an environment name in the Sentry's config for more handy reports. #42037 (Nikita Mikhaylov).
- Fix parsing out-of-range Date from CSV. #42044 (Andrey Zvonov).
- parseDateTimeBestEffort now supports comma between date and time. Closes #42038. #42049 (flynn).
- Improved stale replica recovery process for ReplicatedMergeTree. If a lost replica has some parts which are absent from a healthy replica, but these parts should appear in the future according to the replication queue of the healthy replica, then the lost replica will keep such parts instead of detaching them. #42134 (Alexander Tokmakov).
- Add a possibility to use Date32 arguments for date_diff function. Fix issue in date_diff function when using DateTime64 arguments with a start date before Unix epoch and end date after Unix epoch. #42308 (Roman Vasin).
- When uploading big parts to Minio, 'Complete Multipart Upload' can take a long time. Minio sends heartbeats every 10 seconds (see https://github.com/minio/minio/pull/7198). But clickhouse times out earlier, because the default send/receive timeout is set to 5 seconds. #42321 (filimonov).
- Fix a rare invalid cast of aggregate state types with complex types such as Decimal. This fixes #42408. #42417 (Amos Bird).
- Allow to use Date32 arguments for dateName function. #42554 (Roman Vasin).
- Now filters with NULL literals will be used during index analysis. #34063. #41842 (Amos Bird).
- Merge parts if every part in the range is older than a certain threshold. The threshold can be set by using min_age_to_force_merge_seconds. This closes #35836. #42423 (Antonio Andelic). This is a continuation of #39550 by @fastio who implemented most of the logic.
- Added new infrastructure for query analysis and planning under allow_experimental_analyzer setting. #31796 (Maksim Kita).
- Improve the time to recover lost keeper connections. #42541 (Raúl Marín).
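The age-based merge condition mentioned in this list (min_age_to_force_merge_seconds) amounts to a simple predicate over a candidate range of parts. The Python sketch below is only an illustration: the function name and the way part ages are passed in are hypothetical, and the real selection logic in the server considers more than this check.

```python
from typing import Sequence


def all_parts_old_enough(part_ages_seconds: Sequence[float],
                         min_age_to_force_merge_seconds: float) -> bool:
    """True if the range is non-empty and every part is older than the threshold."""
    return len(part_ages_seconds) > 0 and all(
        age > min_age_to_force_merge_seconds for age in part_ages_seconds
    )
```

A single young part in the range is enough to make the predicate false, which matches the "every part in the range" wording of the entry.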
Build/Testing/Packaging Improvement
- Add fuzzer for table definitions #40096 (Anton Popov). This represents the biggest advancement for ClickHouse testing in this year so far.
- Beta version of the ClickHouse Cloud service is released: https://clickhouse.cloud/. It provides the easiest way to use ClickHouse (even slightly easier than the single-command installation).
- Added support of WHERE clause generation to AST Fuzzer and possibility to add or remove ORDER BY and WHERE clause. #38519 (Ilya Yatsishin).
- Aarch64 binaries now require at least ARMv8.2, released in 2016. Most notably, this enables use of ARM LSE, i.e. native atomic operations. Also, CMake build option "NO_ARMV81_OR_HIGHER" has been added to allow compilation of binaries for older ARMv8.0 hardware, e.g. Raspberry Pi 4. #41610 (Robert Schulze).
- Allow building ClickHouse with Musl (small changes after it was already supported but broken). #41987 (Alexey Milovidov).
- Add the $CLICKHOUSE_CRONFILE file checking to avoid running the sed command to get the file not found error on install. #42081 (Chun-Sheng, Li).
- Update cctz to 2022e to support the new timezone changes. Palestine transitions are now Saturdays at 02:00. Simplify three Ukraine zones into one. Jordan and Syria switch from +02/+03 with DST to year-round +03. (https://data.iana.org/time-zones/tzdb/NEWS). This closes #42252. #42327 (Alexey Milovidov). #42273 (Dom Del Nano).
- Add Rust code support into ClickHouse with BLAKE3 hash-function library as an example. #33435 (BoloniniD).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Choose correct aggregation method for LowCardinality with big integer types. #42342 (Duc Canh Le).
- Several fixes for web disk. #41652 (Kseniia Sumarokova).
- Fixes an issue that causes docker run to fail if https_port is not present in config. #41693 (Yakov Olkhovskiy).
- Mutations were not cancelled properly on server shutdown or SYSTEM STOP MERGES query and cancellation might take long time, it's fixed. #41699 (Alexander Tokmakov).
- Fix wrong result of queries with ORDER BY or GROUP BY by columns from prefix of sorting key, wrapped into monotonic functions, with enabled "read in order" optimization (settings optimize_read_in_order and optimize_aggregation_in_order). #41701 (Anton Popov).
- Fix possible crash in SELECT from Merge table with enabled optimize_monotonous_functions_in_order_by setting. Fixes #41269. #41740 (Nikolai Kochetov).
- Fixed "Part ... intersects part ..." error that might happen in extremely rare cases if replica was restarted just after detaching some part as broken. #41741 (Alexander Tokmakov).
- Don't allow to create or alter merge tree tables with column name _row_exists, which is reserved for lightweight delete. Fixed #41716. #41763 (Jianmei Zhang).
- Fix a bug that CORS headers are missing in some HTTP responses. #41792 (Frank Chen).
- 22.9 might fail to startup ReplicatedMergeTree table if that table was created by 20.3 or older version and was never altered, it's fixed. Fixes #41742. #41796 (Alexander Tokmakov).
- When the batch sending fails for some reason, it cannot be automatically recovered, and if it is not processed in time, it will lead to accumulation, and the printed error message will become longer and longer, which will cause the http thread to block. #41813 (zhongyuankai).
- Fix compact parts with compressed marks setting. Fixes #41783 and #41746. #41823 (alesapin).
- Old versions of Replicated database don't have a special marker in [Zoo]Keeper. We need to check only whether the node contains some obscure data instead of special mark. #41875 (Nikita Mikhaylov).
- Fix possible exception in fs cache. #41884 (Kseniia Sumarokova).
- Fix use_environment_credentials for s3 table function. #41970 (Kseniia Sumarokova).
- Fixed "Directory already exists and is not empty" error on detaching broken part that might prevent ReplicatedMergeTree table from starting replication. Fixes #40957. #41981 (Alexander Tokmakov).
- toDateTime64 now returns the same output with negative integer and float arguments. #42025 (Robert Schulze).
- Fix write into azure_blob_storage. Partially closes #41754. #42034 (Kseniia Sumarokova).
- Fix the bzip2 decoding issue for specific bzip2 files. #42046 (Nikolay Degterinsky).
- Fix SQL function toLastDayOfMonth with setting "enable_extended_results_for_datetime_functions = 1" at the beginning of the extended range (January 1900). Fix SQL function "toRelativeWeekNum()" with setting "enable_extended_results_for_datetime_functions = 1" at the end of extended range (December 2299). Improve the performance of SQL functions "toISOYear()", "toFirstDayNumOfISOYearIndex()" and "toYearWeekOfNewyearMode()" by avoiding unnecessary index arithmetics. #42084 (Roman Vasin).
- The maximum size of fetches for each table accidentally was set to 8 while the pool size could be bigger. Now the maximum size of fetches for table is equal to the pool size. #42090 (Nikita Mikhaylov).
- A table might be shut down and a dictionary might be detached before checking if it can be dropped without breaking dependencies between tables, it's fixed. Fixes #41982. #42106 (Alexander Tokmakov).
- Fix bad inefficiency of remote_filesystem_read_method=read with filesystem cache. Closes #42125. #42129 (Kseniia Sumarokova).
- Fix possible timeout exception for distributed queries with use_hedged_requests = 0. #42130 (Azat Khuzhin).
- Fixed a minor bug inside function runningDifference in case of using it with Date32 type. Previously Date was used and it may cause some logical errors like Bad cast from type DB::ColumnVector<int> to DB::ColumnVector<unsigned short>. #42143 (Alfred Xu).
- Fix reusing of files > 4GB from base backup. #42146 (Azat Khuzhin).
- DISTINCT in order fails with LOGICAL_ERROR if first column in sorting key contains function. #42186 (Igor Nikonov).
- Fix a bug with projections and the aggregate_functions_null_for_empty setting. This bug is very rare and appears only if you enable the aggregate_functions_null_for_empty setting in the server's config. This closes #41647. #42198 (Alexey Milovidov).
- Fix read from Buffer tables with read in order desc. #42236 (Duc Canh Le).
- Fix a bug which prevents ClickHouse from starting when the background_pool_size setting is set on default profile but background_merges_mutations_concurrency_ratio is not. #42315 (nvartolomei).
- ALTER UPDATE of attached part (with columns different from table schema) could create an invalid columns.txt metadata on disk. Reading from such part could fail with errors or return invalid data. Fixes #42161. #42319 (Nikolai Kochetov).
- Setting additional_table_filters was not applied to Distributed storage. Fixes #41692. #42322 (Nikolai Kochetov).
- Fix a data race in query finish/cancel. This closes #42346. #42362 (Alexey Milovidov).
- This reverts #40217 which introduced a regression in date/time functions. #42367 (Alexey Milovidov).
- Fix assert cast in join on falsy condition, Close #42380. #42407 (Vladimir C).
- Fix buffer overflow in the processing of Decimal data types. This closes #42451. #42465 (Alexey Milovidov).
- AggregateFunctionQuantile now correctly works with UInt128 columns. Previously, the quantile state interpreted UInt128 columns as Int128 which could have led to incorrect results. #42473 (Antonio Andelic).
- Fix bad_cast assert during INSERT into Annoy indexes over non-Float32 columns. Annoy indices is an experimental feature. #42485 (Robert Schulze).
- Arithmetic operator with Date or DateTime and 128 or 256-bit integer was referencing uninitialized memory. #42453. #42573 (Alexey Milovidov).
- Fix unexpected table loading error when partition key contains alias function names during server upgrade. #36379 (Amos Bird).
ClickHouse release 22.9, 2022-09-22
Backward Incompatible Change
- Upgrade from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any ReplicatedMergeTree tables, otherwise server with the new version will not start. #40641 (Alexander Tokmakov).
- Remove the functions accurate_Cast and accurate_CastOrNull (they are different to accurateCast and accurateCastOrNull by underscore in the name and they are not affected by the value of cast_keep_nullable setting). These functions were undocumented, untested, unused, and unneeded. They appeared to be alive due to code generalization. #40682 (Alexey Milovidov).
- Add a test to ensure that every new table function will be documented. See #40649. Rename table function MeiliSearch to meilisearch. #40709 (Alexey Milovidov).
- Add a test to ensure that every new function will be documented. See #40649. The functions lemmatize, synonyms, stem were case-insensitive by mistake. Now they are case-sensitive. #40711 (Alexey Milovidov).
- For security and stability reasons, catboost models are no longer evaluated within the ClickHouse server. Instead, the evaluation is now done in the clickhouse-library-bridge, a separate process that loads the catboost library and communicates with the server process via HTTP. #40897 (Robert Schulze).
- Make interpretation of YAML configs to be more conventional. #41044 (Vitaly Baranov).
New Feature
- Support insert_quorum = 'auto' to use majority number. #39970 (Sachin).
- Add embedded dashboards to ClickHouse server. This is a demo project about how to achieve 90% results with 1% effort using ClickHouse features. #40461 (Alexey Milovidov).
- Added new settings constraint writability kind changeable_in_readonly. #40631 (Sergei Trifonov).
- Add support for INTERSECT DISTINCT and EXCEPT DISTINCT. #40792 (Duc Canh Le).
- Add new input/output format JSONObjectEachRow. Support import for formats JSON/JSONCompact/JSONColumnsWithMetadata. Add new setting input_format_json_validate_types_from_metadata that controls whether we should check if data types from metadata match data types from the header. Add new setting input_format_json_validate_utf8, when it's enabled, all JSON formats will validate UTF-8 sequences. It will be disabled by default. Note that this setting doesn't influence output formats JSON/JSONCompact/JSONColumnsWithMetadata, they always validate utf8 sequences (this exception was made because of compatibility reasons). Add new setting input_format_json_read_numbers_as_strings that allows to parse numbers in String column, the setting is disabled by default. Add new setting output_format_json_quote_decimals that allows to output decimals in double quotes, disabled by default. Allow to parse decimals in double quotes during data import. #40910 (Kruglov Pavel).
- Query parameters supported in DESCRIBE TABLE query. #40952 (Nikita Taranov).
- Add support to Parquet Time32/64 by converting it into DateTime64. Parquet time32/64 represents time elapsed since midnight, while DateTime32/64 represents an actual unix timestamp. Conversion simply offsets from 0. #41333 (Arthur Passos).
- Implement set operations on Apache Datasketches. #39919 (Fangyuan Deng). Note: there is no point in using Apache Datasketches, they are inferior to ClickHouse and only make sense for integration with other systems.
- Allow recording errors to specified file while reading text formats (CSV, TSV). #40516 (zjial).
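The Parquet Time32/64 entry in this list says the conversion "simply offsets from 0", i.e. a time-of-day value is treated as an offset from the Unix epoch. A minimal Python sketch of that idea follows. The function name, the unit strings, and the use of UTC are assumptions for illustration; how ClickHouse maps Parquet units to DateTime64 precision is not stated in the entry.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch; a time-of-day value is interpreted as an offset from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Ticks per second for each (assumed) unit name.
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def parquet_time_to_datetime(value: int, unit: str = "ms") -> datetime:
    """Convert 'time elapsed since midnight' into a timestamp on 1970-01-01."""
    # Python datetimes have microsecond resolution, so finer units are truncated.
    micros = value * 10**6 // TICKS_PER_SECOND[unit]
    return EPOCH + timedelta(microseconds=micros)
```

For example, 3723456 milliseconds since midnight (01:02:03.456) becomes 1970-01-01 01:02:03.456 UTC under these assumptions.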
Experimental Feature
- Add ANN (approximate nearest neighbor) index based on Annoy. #40818 (Filatenkov Artur). #37215 (VVMak).
- Add new storage engine KeeperMap, that uses ClickHouse Keeper or ZooKeeper as a key-value store. #39976 (Antonio Andelic). This storage engine is intended to store a small amount of metadata.
- Improvement for in-memory data parts: remove completely processed WAL files. #40592 (Azat Khuzhin).
Performance Improvement
- Implement compression of marks and primary key. Close #34437. #37693 (zhongyuankai).
- Allow to load marks with threadpool in advance. Regulated by setting load_marks_asynchronously (default: 0). #40821 (Kseniia Sumarokova).
- Virtual filesystem over s3 will use random object names split into multiple path prefixes for better performance on AWS. #40968 (Alexey Milovidov).
- Account max_block_size value while producing single-level aggregation results. Allows to execute following query plan steps using more threads. #39138 (Nikita Taranov).
- Software prefetching is used in aggregation to speed up operations with hash tables. Controlled by the setting enable_software_prefetch_in_aggregation, enabled by default. #39304 (Nikita Taranov).
- Better support of optimize_read_in_order in case when some of sorting key columns are always constant after applying WHERE clause. E.g. query like SELECT ... FROM table WHERE a = 'x' ORDER BY a, b, where table has storage definition: MergeTree ORDER BY (a, b). #38715 (Anton Popov).
- Filter joined streams for full_sorting_join by each other before sorting. #39418 (Vladimir C).
- LZ4 decompression optimised by skipping empty literals processing. #40142 (Nikita Taranov).
- Speedup backup process using native copy when possible instead of copying through clickhouse-server memory. #40395 (alesapin).
- Do not obtain storage snapshot for each INSERT block (slightly improves performance). #40638 (Azat Khuzhin).
- Implement batch processing for aggregate functions with multiple nullable arguments. #41058 (Raúl Marín).
- Speed up reading UniquesHashSet (uniqState from disk for example). #41089 (Raúl Marín).
- Fixed high memory usage while executing mutations of compact parts in tables with huge number of columns. #41122 (lthaooo).
- Enable the vectorscan library on ARM, this speeds up regexp evaluation. #41033 (Robert Schulze).
- Upgrade vectorscan to 5.4.8 which has many performance optimizations to speed up regexp evaluation. #41270 (Robert Schulze).
- Fix incorrect fallback to skip the local filesystem cache for VFS (like S3) which happened on very high concurrency level. #40420 (Kseniia Sumarokova).
- If row policy filter is always false, return empty result immediately without reading any data. This closes #24012. #40740 (Amos Bird).
- Parallel hash JOIN for Float data types might be suboptimal. Make it better. #41183 (Alexey Milovidov).
Improvement
- During startup and ATTACH call, ReplicatedMergeTree tables will be readonly until the ZooKeeper connection is made and the setup is finished. #40148 (Antonio Andelic).
- Add enable_extended_results_for_datetime_functions option to return results of type Date32 for functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toMonday and toLastDayOfMonth when argument is Date32 or DateTime64, otherwise results of Date type are returned. For compatibility reasons default value is ‘0’. #41214 (Roman Vasin).
- For security and stability reasons, CatBoost models are no longer evaluated within the ClickHouse server. Instead, the evaluation is now done in the clickhouse-library-bridge, a separate process that loads the catboost library and communicates with the server process via HTTP. Function modelEvaluate() was replaced by catboostEvaluate(). #40897 (Robert Schulze). #39629 (Robert Schulze).
- Add more metrics for on-disk temporary data, close #40206. #40239 (Vladimir C).
- Add config option warning_supress_regexp, close #40330. #40548 (Vladimir C).
- Add setting to disable limit on kafka_num_consumers. Closes #40331. #40670 (Kruglov Pavel).
- Support SETTINGS in DELETE ... query. #41533 (Kseniia Sumarokova).
- Detailed S3 profile events DiskS3* per S3 API call split for S3 ObjectStorage. #41532 (Sergei Trifonov).
- Two new metrics in system.asynchronous_metrics: NumberOfDetachedParts and NumberOfDetachedByUserParts. #40779 (Sema Checherinda).
- Allow CONSTRAINTs for ODBC and JDBC tables. #34551 (Alexey Milovidov).
- Don't print SETTINGS more than once during query formatting if it didn't appear multiple times in the original query. #38900 (Raúl Marín).
- Improve the tracing (OpenTelemetry) context propagation across threads. #39010 (Frank Chen).
- ClickHouse Keeper: add listeners for interserver_listen_host only in Keeper if specified. #39973 (Antonio Andelic).
- Improve recovery of Replicated user access storage after errors. #39977 (Vitaly Baranov).
- Add support for TTL in EmbeddedRocksDB. #39986 (Lloyd-Pottiger).
- Add schema inference to clickhouse-obfuscator, so the --structure argument is no longer required. #40120 (Nikolay Degterinsky).
- Improve and fix dictionaries in Arrow format. #40173 (Kruglov Pavel).
- More natural conversion of Date32, DateTime64, Date to narrower types: upper or lower normal value is considered when out of normal range. #40217 (Andrey Zvonov).
- Fix the case when Merge table over View cannot use index. #40233 (Duc Canh Le).
- Custom key names for JSON server logs. #40251 (Mallik Hassan).
- It is now possible to set a custom error code for the exception thrown by function throwIf. #40319 (Robert Schulze).
- Improve schema inference cache, respect format settings that can change the schema. #40414 (Kruglov Pavel).
- Allow parsing Date as DateTime and DateTime64. This implements the enhancement proposed in #36949. #40474 (Alexey Milovidov).
- Allow conversion from String with DateTime64 like 2022-08-22 01:02:03.456 to Date and Date32. Allow conversion from String with DateTime like 2022-08-22 01:02:03 to Date32. This closes #39598. #40475 (Alexey Milovidov).
- Better support for nested data structures in Parquet format #40485 (Arthur Passos).
- Support reading Array(Record) into flatten nested table in Avro. #40534 (Kruglov Pavel).
- Add read-only support for EmbeddedRocksDB. #40543 (Lloyd-Pottiger).
- Validate the compression method parameter of URL table engine. #40600 (Frank Chen).
- Better format detection for url table function/engine in presence of a query string after a file name. Closes #40315. #40636 (Kruglov Pavel).
- Disable projection when grouping set is used. It generated wrong result. This fixes #40635. #40726 (Amos Bird).
- Fix incorrect format of APPLY column transformer which can break metadata if used in table definition. This fixes #37590. #40727 (Amos Bird).
- Support the %z descriptor for formatting the timezone offset in formatDateTime. #40736 (Cory Levy).
- The interactive mode in clickhouse-client now interprets . and / as "run the last command". #40750 (Robert Schulze).
- Fix issue with passing MySQL timeouts for MySQL database engine and MySQL table function. Closes #34168. #40751 (Kseniia Sumarokova).
- Create status file for filesystem cache directory to make sure that cache directories are not shared between different servers or caches. #40820 (Kseniia Sumarokova).
- Add support for DELETE and UPDATE for EmbeddedRocksDB storage. #40853 (Antonio Andelic).
- ClickHouse Keeper: fix shutdown during long commit and increase allowed request size. #40941 (Antonio Andelic).
- Fix race in WriteBufferFromS3, add TSA annotations. #40950 (Kseniia Sumarokova).
- Grouping sets with group_by_use_nulls should only convert key columns to nullable. #40997 (Duc Canh Le).
- Improve the observability of INSERT on distributed table. #41034 (Frank Chen).
- More low-level metrics for S3 interaction. #41039 (mateng915).
- Support relative path in Location header after HTTP redirect. Closes #40985. #41162 (Kruglov Pavel).
- Apply changes to HTTP handlers on fly without server restart. #41177 (Azat Khuzhin).
- ClickHouse Keeper: properly close active sessions during shutdown. #41215 (Antonio Andelic). This lowers the period of "table is read-only" errors.
- Add ability to automatically comment SQL queries in clickhouse-client/local (with Alt-#, like in readline). #41224 (Azat Khuzhin).
- Fix incompatibility of cache after switching setting do_no_evict_index_and_mark_files from 1 to 0, 0 to 1. #41330 (Kseniia Sumarokova).
- Add a setting allow_suspicious_fixed_string_types to prevent users from creating columns of type FixedString with size > 256. #41495 (Duc Canh Le).
- Add has_lightweight_delete to system.parts. #41564 (Kseniia Sumarokova).
Build/Testing/Packaging Improvement
- Enforce documentation for every setting. #40644 (Alexey Milovidov).
- Enforce documentation for every current metric. #40645 (Alexey Milovidov).
- Enforce documentation for every profile event counter. Write the documentation where it was missing. #40646 (Alexey Milovidov).
- Allow minimal clickhouse-local build by correcting some dependencies. #40460 (Alexey Milovidov). It is less than 50 MiB.
- Calculate and report SQL function coverage in tests. #40593. #40647 (Alexey Milovidov).
- Enforce documentation for every MergeTree setting. #40648 (Alexey Milovidov).
- A prototype of embedded reference documentation for high-level uniform server components. #40649 (Alexey Milovidov).
- We will check all queries from the changed perf tests to ensure that all changed queries were tested. #40322 (Nikita Taranov).
- Fix TGZ packages. #40681 (Mikhail f. Shiryaev).
- Fix debug symbols. #40873 (Azat Khuzhin).
- Extended the CI configuration to create a x86 SSE2-only build. Useful for old or embedded hardware. #40999 (Robert Schulze).
- Switch to llvm/clang 15. #41046 (Azat Khuzhin).
- Continuation of #40938. Fix ODR violation for Loggers class. Fixes #40398, #40937. #41060 (Dmitry Novik).
- Add macOS binaries to GitHub release assets, it fixes #37718. #41088 (Mikhail f. Shiryaev).
- The c-ares library is now bundled with ClickHouse's build system. #41239 (Robert Schulze).
- Get rid of dlopen from the main ClickHouse code. It remains in the library-bridge and odbc-bridge. #41428 (Alexey Milovidov).
- Don't allow dlopen in the main ClickHouse binary, because it is harmful and insecure. We don't use it. But it can be used by some libraries for the implementation of "plugins". We absolutely discourage the ancient technique of loading 3rd-party uncontrolled dangerous libraries into the process address space, because it is insane. #41429 (Alexey Milovidov).
- Add source field to deb packages, update nfpm. #41531 (Mikhail f. Shiryaev).
- Support for DWARF-5 in the in-house DWARF parser. #40710 (Azat Khuzhin).
- Add fault injection in ZooKeeper client for testing #30498 (Alexander Tokmakov).
- Add stateless tests with s3 storage with debug and tsan #35262 (Kseniia Sumarokova).
- Trying stress on top of S3 #36837 (alesapin).
- Enable concurrency-mt-unsafe in clang-tidy. #40224 (Alexey Milovidov).
Bug Fix
- Fix potential data loss due to a bug in AWS SDK. The bug can be triggered only when ClickHouse is used over S3. #40506 (alesapin). This bug had been open for 5 years in AWS SDK and was closed after our report.
- Malicious data in Native format might cause a crash. #41441 (Alexey Milovidov).
- The aggregate function categorialInformationValue had incorrectly defined properties, which might cause a null pointer dereference at runtime. This closes #41443. #41449 (Alexey Milovidov).
- Writing data in Apache ORC format might lead to a buffer overrun. #41458 (Alexey Milovidov).
- Fix memory safety issues with functions encrypt and contingency if Array of Nullable is used as an argument. This fixes #41004. #40195 (Alexey Milovidov).
- Fix bugs in MergeJoin when 'not_processed' is not null. #40335 (liql2007).
- Fix incorrect result in case of decimal precision loss in IN operator, ref #41125. #41130 (Vladimir C).
- Fix filling of missed Nested columns with multiple levels. #37152 (Anton Popov).
- Fix SYSTEM UNFREEZE query for Ordinary (deprecated) database. Fix for https://github.com/ClickHouse/ClickHouse/pull/36424. #38262 (Vadim Volodin).
- Fix unused unknown columns introduced by WITH statement. This fixes #37812. #39131 (Amos Bird).
- Fix query analysis for ORDER BY in presence of window functions. Fixes #38741. Fixes #24892. #39354 (Dmitry Novik).
- Fixed Unknown identifier (aggregate-function) exception which appears when a user tries to calculate WINDOW ORDER BY/PARTITION BY expressions over aggregate functions. #39762 (Vladimir Chebotaryov).
- Limit number of analyze for one query with setting max_analyze_depth. It prevents exponential blow up of analysis time for queries with extraordinarily large number of subqueries. #40334 (Vladimir C).
- Fix rare bug with column TTL for MergeTree engines family: In case of repeated vertical merge the error Cannot unlink file ColumnName.bin ... No such file or directory. could happen. #40346 (alesapin).
- Use DNS entries for both IPv4 and IPv6 if present. #40353 (Maksim Kita).
- Allow to read snappy compressed files from Hadoop. #40482 (Kruglov Pavel).
- Fix crash while parsing values of type Object (experimental feature) that contains arrays of variadic dimension. #40483 (Duc Canh Le).
- Fix settings input_format_tsv_skip_first_lines. #40491 (mini4).
- Fix bug (race condition) when starting up MaterializedPostgreSQL database/table engine. #40262. Fix error with reaching limit of relcache_callback_list slots. #40511 (Maksim Buren).
- Fix possible error 'Decimal math overflow' while parsing DateTime64. #40546 (Kruglov Pavel).
- Fix vertical merge of parts with lightweight deleted rows. #40559 (Alexander Gololobov).
- Fix segmentation fault when writing data to URL table engine if it enables compression. #40565 (Frank Chen).
- Fix possible logical error 'Invalid Field get from type UInt64 to type String' in arrayElement function with Map. #40572 (Kruglov Pavel).
- Fix possible race in filesystem cache. #40586 (Kseniia Sumarokova).
- Removed skipping of mutations in unaffected partitions of MergeTree tables, because this feature never worked correctly and might cause resurrection of finished mutations. #40589 (Alexander Tokmakov).
- Fix a ClickHouse server crash that happened when a gRPC port that is already occupied was added to the configuration at runtime. #40597 (何李夫).
- Fix base58Encode / base58Decode handling leading 0 / '1'. #40620 (Andrey Zvonov).
- keeper-fix: fix race in accessing logs while snapshot is being installed. #40627 (Antonio Andelic).
- Fix short circuit execution of toFixedString function. Solves (partially) #40622. #40628 (Kruglov Pavel).
- Fixes SQLite int8 column conversion to int64 column in ClickHouse. Fixes #40639. #40642 (Barum Rho).
- Fix stack overflow in recursive Buffer tables. This closes #40637. #40643 (Alexey Milovidov).
- During insertion of a new query to the ProcessList allocations happen. If we reach the memory limit during these allocations we can not use OvercommitTracker, because ProcessList::mutex is already acquired. Fixes #40611. #40677 (Dmitry Novik).
- Fix LOGICAL_ERROR with max_read_buffer_size=0 during reading marks. #40705 (Azat Khuzhin).
- Fix memory leak while pushing to MVs w/o query context (from Kafka/...). #40732 (Azat Khuzhin).
- Fix possible error Attempt to read after eof in CSV schema inference. #40746 (Kruglov Pavel).
- Fix logical error in write-through cache "File segment completion can be done only by downloader". Closes #40748. #40759 (Kseniia Sumarokova).
- Make the result of GROUPING function the same as in SQL and other DBMS. #40762 (Dmitry Novik).
- In #40595 it was reported that the host_regexp functionality was not working properly with a name to address resolution in /etc/hosts. It's fixed. #40769 (Arthur Passos).
- Fix incremental backups for Log family. #40827 (Vitaly Baranov).
- Fix extremely rare bug which can lead to potential data loss in zero-copy replication. #40844 (alesapin).
- Fix key condition analyzing crashes when same set expression built from different column(s). #40850 (Duc Canh Le).
- Fix nested JSON Objects schema inference. #40851 (Kruglov Pavel).
- Fix 3-digit prefix directory for filesystem cache files not being deleted if empty. Closes #40797. #40867 (Kseniia Sumarokova).
- Fix uncaught DNS_ERROR on failed connection to replicas. #40881 (Robert Coelho).
- Fix bug when removing unneeded columns in subquery. #40884 (luocongkai).
- Fix extra memory allocation for remote read buffers. #40896 (Kseniia Sumarokova).
- Fixed a behaviour when a user with an explicitly revoked grant for dropping databases could still drop them. #40906 (Nikita Mikhaylov).
- A fix for ClickHouse Keeper: correctly compare paths in write requests to Keeper internal system node paths. #40918 (Antonio Andelic).
- Fix deadlock in WriteBufferFromS3. #40943 (Kseniia Sumarokova).
- Fix access rights for DESCRIBE TABLE url() and some other DESCRIBE TABLE <table_function>(). #40975 (Vitaly Baranov).
- Remove wrong parser logic for WITH GROUPING SETS which may lead to nullptr dereference. #41049 (Duc Canh Le).
- A fix for ClickHouse Keeper: fix possible segfault during Keeper shutdown. #41075 (Antonio Andelic).
- Fix possible segfaults, use-heap-after-free and memory leak in aggregate function combinators. Closes #40848. #41083 (Kruglov Pavel).
- Fix query_views_log with Window views. #41132 (Raúl Marín).
- Disables optimize_monotonous_functions_in_order_by by default, mitigates: #40094. #41136 (Denny Crane).
- Fixed "possible deadlock avoided" error on automatic conversion of database engine from Ordinary to Atomic. #41146 (Alexander Tokmakov).
- Fix SIGSEGV in SortedBlocksWriter in case of empty block (possible to get with optimize_aggregation_in_order and join_algorithm=auto). #41154 (Azat Khuzhin).
- Fix incorrect query result when trivial count optimization is in effect with array join. This fixes #39431. #41158 (Denny Crane).
- Fix stack-use-after-return in GetPriorityForLoadBalancing::getPriorityFunc(). #41159 (Azat Khuzhin).
- Fix positional arguments exception Positional argument out of bounds. Closes #40634. #41189 (Kseniia Sumarokova).
- Fix background clean up of broken detached parts. #41190 (Kseniia Sumarokova).
- Fix exponential query rewrite in case of lots of cross joins with where, close #21557. #41223 (Vladimir C).
- Fix possible logical error in write-through cache, which happened because not all types of exception were handled as needed. Closes #41208. #41232 (Kseniia Sumarokova).
- Fix String log entry in system.filesystem_cache_log. #41233 (jmimbrero).
- Queries with OFFSET clause in subquery and WHERE clause in outer query might return incorrect result, it's fixed. Fixes #40416. #41280 (Alexander Tokmakov).
- Fix possible wrong query result with query_plan_optimize_primary_key enabled. Fixes #40599. #41281 (Nikolai Kochetov).
- Do not allow invalid sequences to influence other rows in lowerUTF8/upperUTF8. #41286 (Azat Khuzhin).
- Fix ALTER <table> ADD COLUMN queries with columns of type Object. #41290 (Anton Popov).
- Fixed "No node" error when selecting from system.distributed_ddl_queue when there's no distributed_ddl.path in config. Fixes #41096. #41296 (young scott).
- Fix incorrect logical error Expected relative path in disk object storage. Related to #41246. #41297 (Kseniia Sumarokova).
- Add column type check before UUID insertion in MsgPack format. #41309 (Kruglov Pavel).
- Fix possible crash after inserting asynchronously (with enabled setting async_insert) malformed data to columns of type Object. It could happen, if JSONs in all batches of async inserts were invalid and could not be parsed. #41336 (Anton Popov).
- Fix possible deadlock with async_socket_for_remote/use_hedged_requests and parallel KILL. #41343 (Azat Khuzhin).
- Disables optimize_rewrite_sum_if_to_count_if by default, mitigates: #38605 #38683. #41388 (Denny Crane).
- Since 22.8 ON CLUSTER clause is ignored if database is Replicated and cluster name and database name are the same. Because of this DROP PARTITION ON CLUSTER worked in an unexpected way with Replicated. It's fixed, now ON CLUSTER clause is ignored only for queries that are replicated on database level. Fixes #41299. #41390 (Alexander Tokmakov).
- Fix possible hung/deadlock on query cancellation (KILL QUERY or server shutdown). #41467 (Azat Khuzhin).
- Fix possible server crash when using the JBOD feature. This fixes #41365. #41483 (Amos Bird).
- Fix conversion from nullable fixed string to string. #41541 (Duc Canh Le).
- Prevent crash when passing wrong aggregation states to groupBitmap*. #41563 (Raúl Marín).
- Queries with ORDER BY and 1500 <= LIMIT <= max_block_size could return incorrect result with missing rows from top. Fixes #41182. #41576 (Nikolai Kochetov).
- Fix read bytes/rows in X-ClickHouse-Summary with materialized views. #41586 (Raúl Marín).
- Fix possible pipeline stuck exception for queries with OFFSET. The error was found with enable_optimize_predicate_expression = 0 and always false condition in WHERE. Fixes #41383. #41588 (Nikolai Kochetov).
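The base58Encode / base58Decode entry above concerns leading zero bytes, which Base58 conventionally represents as leading '1' characters. The following is a minimal Python sketch of that convention, assuming the Bitcoin alphabet; the helper names b58encode and b58decode are hypothetical and this is not ClickHouse's implementation.

```python
# Assumption: the Bitcoin Base58 alphabet (no 0, O, I, l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Hypothetical helper: encode bytes, mapping each leading 0x00 byte to '1'."""
    n_zeros = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    digits = ""
    while num > 0:
        num, rem = divmod(num, 58)
        digits = ALPHABET[rem] + digits
    return "1" * n_zeros + digits


def b58decode(text: str) -> bytes:
    """Hypothetical helper: decode text, mapping each leading '1' back to 0x00."""
    n_ones = len(text) - len(text.lstrip("1"))
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * n_ones + body


payload = b"\x00\x00abc"
encoded = b58encode(payload)
# The two leading zero bytes survive the round trip as two leading '1' characters.
print(encoded, b58decode(encoded) == payload)
```

Without the explicit count of leading zeros, the integer conversion would silently drop them, which is the kind of edge case the entry refers to.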
ClickHouse release 22.8, 2022-08-18
Backward Incompatible Change
- Extended range of Date32 and DateTime64 to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with ISO 8601:2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of 1899-01-01 was clamped to 1925-01-01, in the new version it will be clamped to 1900-01-01. It changes the behavior of rounding with toStartOfInterval by up to one quarter if you pass INTERVAL 3 QUARTER, because the intervals are counted from an implementation-specific point of time. Closes #28216, improves #38393. #39425 (Roman Vasin).
- Now, all relevant dictionary sources respect remote_url_allow_hosts setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. #39184 (Nikolai Kochetov).
- Prebuilt ClickHouse x86 binaries now require support for AVX instructions, i.e. a CPU not older than Intel Sandy Bridge / AMD Bulldozer, both released in 2011. #39000 (Robert Schulze).
- Make the remote filesystem cache composable, allow not to evict certain files (regarding idx, mrk, ..), delete old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration changes, and the config file must be updated for the cache to work. Old cache will still be used with new configuration. The server will start up fine with the old cache configuration. Closes https://github.com/ClickHouse/ClickHouse/issues/36140. Closes https://github.com/ClickHouse/ClickHouse/issues/37889. #36171 (Kseniia Sumarokova).
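The Date32 / DateTime64 range change above can be illustrated with a small Python sketch. It models only the lower bounds quoted in the entry (1925-01-01 before 22.8, 1900-01-01 from 22.8); the helper clamp_lower is hypothetical, and the real out-of-range behavior is implementation-specific, as the entry notes.

```python
from datetime import date

# Lower bounds taken from the changelog entry; upper bounds are not modeled.
OLD_MIN = date(1925, 1, 1)  # versions before 22.8
NEW_MIN = date(1900, 1, 1)  # 22.8 and later


def clamp_lower(value: date, lower: date) -> date:
    """Hypothetical helper: saturate a too-early date to the lower bound."""
    return max(value, lower)


out_of_range = date(1899, 1, 1)
print(clamp_lower(out_of_range, OLD_MIN))  # 1925-01-01
print(clamp_lower(out_of_range, NEW_MIN))  # 1900-01-01

# A date between the two bounds is now representable instead of being clamped.
print(clamp_lower(date(1910, 5, 1), NEW_MIN))  # 1910-05-01
```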
New Feature
- Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families. #37893 (Jianmei Zhang) (Alexander Gololobov). Note: this new feature does not make ClickHouse an HTAP DBMS.
- Query parameters can be set in interactive mode as SET param_abc = 'def' and transferred via the native protocol as settings. #39906 (Nikita Taranov).
- Quota key can be set in the native protocol (Yakov Olkhovsky).
- Added a setting exact_rows_before_limit (0/1). When enabled, ClickHouse will provide exact value for rows_before_limit_at_least statistic, but with the cost that the data before limit will have to be read completely. This closes #6613. #25333 (kevin wan).
- Added support for parallel distributed insert select with s3Cluster table function into tables with Distributed and Replicated engine #34670. #39107 (Nikita Mikhaylov).
- Add new settings to control schema inference from text formats: input_format_try_infer_dates - try to infer dates from strings; input_format_try_infer_datetimes - try to infer datetimes from strings; input_format_try_infer_integers - try to infer Int64 instead of Float64; input_format_json_try_infer_numbers_from_strings - try to infer numbers from JSON strings in JSON formats. #39186 (Kruglov Pavel).
- An option to provide JSON formatted log output. The purpose is to allow easier ingestion and query in log analysis tools. #39277 (Mallik Hassan).
- Add function nowInBlock which allows getting the current time during long-running and continuous queries. Closes #39522. Note: there are no functions now64InBlock or todayInBlock. #39533 (Alexey Milovidov).
- Add ability to specify settings for an executable() table function. #39681 (Constantine Peresypkin).
- Implemented automatic conversion of database engine from Ordinary to Atomic. Create empty convert_ordinary_to_atomic file in flags directory and all Ordinary databases will be converted automatically on next server start. Resolves #39546. #39933 (Alexander Tokmakov).
- Support SELECT ... INTO OUTFILE '...' AND STDOUT. #37490. #39054 (SmitaRKulkarni).
- Add formats PrettyMonoBlock, PrettyNoEscapesMonoBlock, PrettyCompactNoEscapes, PrettyCompactNoEscapesMonoBlock, PrettySpaceNoEscapes, PrettySpaceMonoBlock, PrettySpaceNoEscapesMonoBlock. #39646 (Kruglov Pavel).
Performance Improvement
- Improved memory usage during memory efficient merging of aggregation results. #39429 (Nikita Taranov).
- Added concurrency control logic to limit total number of concurrent threads created by queries. #37558 (Sergei Trifonov). Add concurrent_threads_soft_limit parameter to increase performance in case of high QPS by means of limiting total number of threads for all queries. #37285 (Roman Vasin).
- Add SLRU cache policy for uncompressed cache and marks cache. (Kseniia Sumarokova). #34651 (alexX512). Decouple local cache function and cache algorithm. #38048 (Han Shukai).
- Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the Intel® Query Processing Library (QPL) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. #36654 (jasperzhu). #39494 (Robert Schulze).
- DISTINCT in order with ORDER BY: Deduce way to sort based on input stream sort description. Skip sorting if input stream is already sorted. #38719 (Igor Nikonov). Improve memory usage (significantly) and query execution time: use DistinctSortedChunkTransform for final distinct when DISTINCT columns match ORDER BY columns (it is shown as DistinctSortedStreamTransform in EXPLAIN PIPELINE), which improves memory usage significantly, and remove unnecessary allocations in hot loop in DistinctSortedChunkTransform. #39432 (Igor Nikonov). Use DistinctSortedTransform only when sort description is applicable to DISTINCT columns, otherwise fall back to ordinary DISTINCT implementation; this also allows making fewer checks during DistinctSortedTransform execution. #39528 (Igor Nikonov). Fix: DistinctSortedTransform didn't take advantage of sorting. It never cleared HashSet since clearing_columns were detected incorrectly (always empty). So, it basically worked as ordinary DISTINCT (DistinctTransform). The fix reduces memory usage significantly. #39538 (Igor Nikonov).
- Use local node as first priority to get structure of remote table when executing cluster and similar table functions. #39440 (Mingliang Pan).
- Optimize filtering by numeric columns with AVX512VBMI2 compress store. #39633 (Guo Wangyang). For systems with AVX512 VBMI2, this PR improves performance by ca. 6% for SSB benchmark queries 3.1, 3.2 and 3.3 (SF=100). Tested on Intel Icelake Xeon 8380 * 2 socket. #40033 (Robert Schulze).
- Optimize index analysis with functional expressions in multi-thread scenario. #39812 (Guo Wangyang).
- Optimizations for complex queries: Don't visit the AST for UDFs if none are registered. #40069 (Raúl Marín). Optimize CurrentMemoryTracker alloc and free. #40078 (Raúl Marín).
- Improved Base58 encoding/decoding. #39292 (Andrey Zvonov).
- Improve bytes to bits mask transform for SSE/AVX/AVX512. #39586 (Guo Wangyang).
Improvement
- Normalize AggregateFunction types and state representations because optimizations like #35788 will treat count(not null columns) as count(), which might confuse distributed interpreters with the following error: Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported. #39420 (Amos Bird). The functions with identical states can be used in materialized views interchangeably.
- Rework and simplify the system.backups table, remove the internal column, allow user to set the ID of operation, add columns num_files, uncompressed_size, compressed_size, start_time, end_time. #39503 (Vitaly Baranov).
- Improved structure of DDL query result table for Replicated database (separate columns with shard and replica name, more clear status). CREATE TABLE ... ON CLUSTER queries can be normalized on initiator first if distributed_ddl_entry_format_version is set to 3 (default value). It means that ON CLUSTER queries may not work if the initiator does not belong to the cluster specified in the query. Fixes #37318, #39500. Ignore ON CLUSTER clause if database is Replicated and cluster name equals database name. Related to #35570. Miscellaneous minor fixes for Replicated database engine. Check metadata consistency when starting up Replicated database, start replica recovery in case of mismatch of local metadata and metadata in Keeper. Resolves #24880. #37198 (Alexander Tokmakov).
- Add result_rows and result_bytes to progress reports (X-ClickHouse-Summary). #39567 (Raúl Marín).
- Improve primary key analysis for MergeTree. #25563 (Nikolai Kochetov).
- timeSlots now works with DateTime64; subsecond duration and slot size available when working with DateTime64. #37951 (Andrey Zvonov).
- Added support of LEFT SEMI and LEFT ANTI direct join with EmbeddedRocksDB tables. #38956 (Vladimir C).
- Add profile events for fsync operations. #39179 (Azat Khuzhin).
- Add the second argument to the ordinary function file(path[, default]), which the function returns if the file does not exist. #39218 (Nikolay Degterinsky).
- Some small fixes for reading via http, allow to retry partial content in case if 200 OK. #39244 (Kseniia Sumarokova).
- Support queries CREATE TEMPORARY TABLE ... (<list of columns>) AS .... #39462 (Kruglov Pavel).
- Add support of !/* (exclamation/asterisk) in custom TLDs (cutToFirstSignificantSubdomainCustom()/cutToFirstSignificantSubdomainCustomWithWWW()/firstSignificantSubdomainCustom()). #39496 (Azat Khuzhin).
- Add support for TLS connections to NATS. Implements #39525. #39527 (Constantine Peresypkin).
- clickhouse-obfuscator (a tool for database obfuscation for testing and load generation) now has the new --save and --load parameters to work with pre-trained models. This closes #39534. #39541 (Alexey Milovidov).
- Fix incorrect behavior of log rotation during restart. #39558 (Nikolay Degterinsky).
- Fix building aggregate projections when external aggregation is on. Marked as an improvement because the case is rare and there is an easy workaround via changing settings. This fixes #39667. #39671 (Amos Bird).
- Allow to execute hash functions with arguments of type Map. #39685 (Anton Popov).
- Add a configuration parameter to hide addresses in stack traces. It may improve security a little but generally, it is harmful and should not be used. #39690 (Alexey Milovidov).
- Change the prefix size of AggregateFunctionDistinct to make sure nested function data memory segment is aligned. #39696 (Pxl).
- Properly escape credentials passed to the clickhouse-diagnostic tool. #39707 (Dale McDiarmid).
- ClickHouse Keeper improvement: create a snapshot on exit. It can be controlled with the config keeper_server.create_snapshot_on_exit, true by default. #39755 (Antonio Andelic).
- Support primary key analysis for row_policy_filter and additional_filter. It also helps fix issues like #37454. #39826 (Amos Bird).
- Fix two usability issues in Play UI: - it was non-pixel-perfect on iPad due to parasitic border radius and margins; - the progress indication did not display after the first query. This closes #39957. This closes #39960. #39961 (Alexey Milovidov).
- Play UI: add row numbers; add cell selection on click; add hysteresis for table cells. #39962 (Alexey Milovidov).
- Play UI: recognize tab key in textarea, but at the same time don't mess up with tab navigation. #40053 (Alexey Milovidov).
- The client will show server-side elapsed time. This is important for the performance comparison of ClickHouse services in remote datacenters. This closes #38070. See also this for motivation. #39968 (Alexey Milovidov).
- Adds parseDateTime64BestEffortUS, parseDateTime64BestEffortUSOrNull, parseDateTime64BestEffortUSOrZero functions, closing #37492. #40015 (Tanya Bragin).
- Extend the system.processors_profile_log with more information such as input rows. #40121 (Amos Bird).
- Display server-side time in clickhouse-benchmark by default if it is available (since ClickHouse version 22.8). This is needed to correctly compare the performance of clouds. This behavior can be changed with the new --client-side-time command line option. Change the --randomize command line option from --randomize 1 to the form without argument. #40193 (Alexey Milovidov).
- Add counters (ProfileEvents) for cases when query complexity limitation has been set and has been reached (a separate counter for overflow_mode = break and throw). For example, if you have set up max_rows_to_read with read_overflow_mode = 'break', looking at the value of OverflowBreak counter will allow distinguishing incomplete results. #40205 (Alexey Milovidov).
- Fix memory accounting in case of "Memory limit exceeded" errors (previously [peak] memory usage took failed allocations into account). #40249 (Azat Khuzhin).
- Add metrics for filesystem cache: FilesystemCacheSize and FilesystemCacheElements. #40260 (Kseniia Sumarokova).
- Support hadoop secure RPC transfer (hadoop.rpc.protection=privacy and hadoop.rpc.protection=integrity). #39411 (michael1589).
- Avoid continuously growing memory consumption of pattern cache when using functions multi(Fuzzy)Match(Any|AllIndices|AnyIndex)(). #40264 (Robert Schulze).
Build/Testing/Packaging Improvement
- ClickFiddle: A new tool for testing ClickHouse versions in read/write mode (Igor Baliuk).
- ClickHouse binary is made self-extracting #35775 (Yakov Olkhovskiy, Arthur Filatenkov).
- Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). (Alexey Milovidov).
- Former packages used to install systemd.service file to /etc. The files there are marked as conf and are not cleaned out, and not updated automatically. This PR cleans them out. #39323 (Mikhail f. Shiryaev).
- Ensure LSan is effective. #39430 (Azat Khuzhin).
- TSAN has issues with clang-14 (https://github.com/google/sanitizers/issues/1552, https://github.com/google/sanitizers/issues/1540), so here we build the TSAN binaries with clang-15. #39450 (Mikhail f. Shiryaev).
- Remove the option to build ClickHouse tools as separate executable programs. This fixes #37847. #39520 (Alexey Milovidov).
- Small preparations for build on s390x (which is big-endian). #39627 (Harry Lee). #39656 (Harry Lee). Fixed Endian issue in BitHelpers for s390x. #39656 (Harry Lee). Implement a piece of code related to SipHash for s390x architecture (which is not supported by ClickHouse). #39732 (Harry Lee). Fixed an Endian issue in Coordination snapshot code for s390x architecture (which is not supported by ClickHouse). #39931 (Harry Lee). Fixed Endian issues in Codec code for s390x architecture (which is not supported by ClickHouse). #40008 (Harry Lee). Fixed Endian issues in reading/writing BigEndian binary data in ReadHelpers and WriteHelpers code for s390x architecture (which is not supported by ClickHouse). #40179 (Harry Lee).
- Support build with clang-16 (trunk). This closes #39949. #40181 (Alexey Milovidov).
- Prepare RISC-V 64 build to run in CI. This is for #40141. #40197 (Alexey Milovidov).
- Simplified function registration macro interface (FUNCTION_REGISTER*) to eliminate the step to add and call an extern function in the registerFunctions.cpp, it also makes incremental builds of a new function faster. #38615 (Li Yin).
- Docker: Now entrypoint.sh in docker image creates and executes chown for all folders it found in config for multidisk setup #17717. #39121 (Nikita Mikhaylov).
Bug Fix
- Fix possible segfault in CapnProto input format. This bug was found and sent through ClickHouse bug-bounty program by kiojj. #40241 (Kruglov Pavel).
- Fix a very rare case of incorrect behavior of array subscript operator. This closes #28720. #40185 (Alexey Milovidov).
- Fix insufficient argument check for encryption functions (found by query fuzzer). This closes #39987. #40194 (Alexey Milovidov).
- Fix the case when the order of columns can be incorrect if the IN operator is used with a table with ENGINE = Set containing multiple columns. This fixes #13014. #40225 (Alexey Milovidov).
- Fix seeking while reading from encrypted disk. This PR fixes #38381. #39687 (Vitaly Baranov).
- Fix duplicate columns in join plan. Finally, solve #26809. #40009 (Vladimir C).
- Fixed query hanging for SELECT with ORDER BY WITH FILL with different date/time types. #37849 (Yakov Olkhovskiy).
- Fix ORDER BY that matches projections ORDER BY (before it simply returns unsorted result). #38725 (Azat Khuzhin).
- Do not optimise functions in GROUP BY statements if they shadow one of the table columns or expressions. Fixes #37032. #39103 (Anton Kozlov).
- Fix wrong table name in logs after RENAME TABLE. This fixes #38018. #39227 (Amos Bird).
- Fix positional arguments in case of columns pruning when optimising the query. Closes #38433. #39293 (Kseniia Sumarokova).
- Fix bug in schema inference in case of empty messages in Protobuf/CapnProto formats that allowed to create column with empty Tuple type. Closes #39051. Add 2 new settings input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference that allow to skip fields with unsupported types while schema inference for Protobuf and CapnProto formats. #39357 (Kruglov Pavel).
- (Window View is an experimental feature) Fix segmentation fault on CREATE WINDOW VIEW .. ON CLUSTER ... INNER. Closes #39363. #39384 (Kseniia Sumarokova).
- Fix WriteBuffer finalize when cancelling insert into function (in previous versions it may lead to std::terminate). #39458 (Kruglov Pavel).
- Fix storing of columns of type Object in sparse serialization. #39464 (Anton Popov).
- Fix possible "Not found column in block" exception when using projections. This closes #39469. #39470 (小路).
- Fix exception on race between DROP and INSERT with materialized views. #39477 (Azat Khuzhin).
- A bug in Apache Avro library: fix data race and possible heap-buffer-overflow in Avro format. Closes #39094. Closes #33652. #39498 (Kruglov Pavel).
- Fix rare bug in asynchronous reading (with setting local_filesystem_read_method='pread_threadpool') with enabled O_DIRECT (enabled by setting min_bytes_to_use_direct_io). #39506 (Anton Popov).
- (only on FreeBSD) Fixes "Code: 49. DB::Exception: FunctionFactory: the function name '' is not unique. (LOGICAL_ERROR)" observed on FreeBSD when starting clickhouse. #39551 (Alexander Gololobov).
- Fix bug with the recently introduced "maxsplit" argument for splitByChar, which was not working correctly. #39552 (filimonov).
- Fix bug in ASOF JOIN with enable_optimize_predicate_expression, close #37813. #39556 (Vladimir C).
- Fixed CREATE/DROP INDEX query with ON CLUSTER or Replicated database and ReplicatedMergeTree. It used to be executed on all replicas (causing error or DDL queue stuck). Fixes #39511. #39565 (Alexander Tokmakov).
- Fix "column not found" error for push down with join, close #39505. #39575 (Vladimir C).
- Fix the wrong REGEXP_REPLACE alias. This fixes https://github.com/ClickHouse/ClickBench/issues/9. #39592 (Alexey Milovidov).
- Fixed point of origin for exponential decay window functions to the last value in window. Previously, decay was calculated by formula exp((t - curr_row_t) / decay_length), which is incorrect when right boundary of window is not CURRENT ROW. It was changed to: exp((t - last_row_t) / decay_length). There is no change in results for windows with ROWS BETWEEN (smth) AND CURRENT ROW. #39593 (Vladimir Chebotaryov).
- Fix Decimal division overflow, which can be detected based on operands scale. #39600 (Andrey Zvonov).
- Fix settings output_format_arrow_string_as_string and output_format_arrow_low_cardinality_as_dictionary work in combination. Closes #39624. #39647 (Kruglov Pavel).
- Fixed a bug in default database resolution in distributed table reads. #39674 (Anton Kozlov).
- (Only with the obsolete Ordinary databases) Select might read data of dropped table if cache for mmap IO is used and database engine is Ordinary and a new table was created with the same name as the dropped one had. It's fixed. #39708 (Alexander Tokmakov).
- Fix possible error Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got ColumnLowCardinality. Fixes #38460. #39716 (Arthur Passos).
- Field names in the meta section of JSON format were erroneously double escaped. This closes #39693. #39747 (Alexey Milovidov).
- Fix wrong index analysis with tuples and operator IN, which could lead to wrong query result. #39752 (Anton Popov).
- Fix EmbeddedRocksDB tables filtering by key using params. #39757 (Antonio Andelic).
- Fix error Invalid number of columns in chunk pushed to OutputPort which was caused by ARRAY JOIN optimization. Fixes #39164. #39799 (Nikolai Kochetov).
- A workaround for a bug in Linux kernel. Fix CANNOT_READ_ALL_DATA exception with local_filesystem_read_method=pread_threadpool. This bug affected only Linux kernel version 5.9 and 5.10 according to man. #39800 (Anton Popov).
- (Only on NFS) Fix broken NFS mkdir for root-squashed volumes. #39898 (Constantine Peresypkin).
- Remove dictionaries from prometheus metrics on DETACH/DROP. #39926 (Azat Khuzhin).
- Fix read of StorageFile with virtual columns. Closes #39907. #39943 (flynn).
- Fix big memory usage during fetches. Fixes #39915. #39990 (Nikolai Kochetov).
- (experimental feature) Fix hashId crash and salt parameter not being used. #40002 (Raúl Marín).
- EXCEPT and INTERSECT operators may lead to a crash if a specific combination of constant and non-constant columns was used. #40020 (Duc Canh Le).
- Fixed "Part directory doesn't exist" and "tmp_<part_name>... No such file or directory" errors during too slow INSERT or too long merge/mutation. Also fixed an issue that may cause some replication queue entries to get stuck without any errors or warnings in logs if a previous attempt to fetch a part failed, but the tmp-fetch_<part_name> directory was not cleaned up. #40031 (Alexander Tokmakov).
- Fix rare cases of parsing of arrays of tuples in format Values. #40034 (Anton Popov).
- Fixes ArrowColumn format Dictionary(X) & Dictionary(Nullable(X)) conversion to ClickHouse LowCardinality(X) & LowCardinality(Nullable(X)) respectively. #40037 (Arthur Passos).
- Fix potential deadlock in writing to S3 during task scheduling failure. #40070 (Maksim Kita).
- Fix bug in collectFilesToSkip() by adding correct file extension (.idx or idx2) for indexes to be recalculated, avoid wrong hard links. Fixed #39896. #40095 (Jianmei Zhang).
- A fix for reverse DNS resolution. #40134 (Arthur Passos).
- Fix unexpected result of arrayDifference for Array(UInt32). #40211 (Duc Canh Le).
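The exponential decay fix in the list above (#39593) changes only the point of origin used in the weight formula. The following is a minimal Python sketch, not ClickHouse code, that compares both formulas on a window whose right boundary lies beyond the current row. The function name decay_weight and the sample timestamps are hypothetical.

```python
import math


def decay_weight(t, origin_t, decay_length):
    """Weight of a value observed at time t, measured from origin_t."""
    return math.exp((t - origin_t) / decay_length)


# Hypothetical frame: timestamps of the rows inside one window.
window_ts = [1.0, 2.0, 3.0, 4.0]
curr_row_t = 2.0             # the row being evaluated
last_row_t = window_ts[-1]   # right boundary of the frame
decay_length = 2.0

# Old formula: origin at the current row, so later rows get weights above 1.
old_weights = [decay_weight(t, curr_row_t, decay_length) for t in window_ts]

# New formula: origin at the last row, so no weight exceeds 1.
new_weights = [decay_weight(t, last_row_t, decay_length) for t in window_ts]
```

When the frame ends at the current row, curr_row_t equals last_row_t and both formulas give the same weights, which matches the note that results are unchanged for such windows.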
ClickHouse release 22.7, 2022-07-21
Upgrade Notes
- Enable setting enable_positional_arguments by default. It allows queries like SELECT ... ORDER BY 1, 2 where 1, 2 are the references to the select clause. If you need to return the old behavior, disable this setting. #38204 (Alexey Milovidov).
- Disable format_csv_allow_single_quotes by default. See #37096. (Kruglov Pavel).
- The Ordinary database engine and the old storage definition syntax for *MergeTree tables are deprecated. By default it's not possible to create new databases with the Ordinary engine. If the system database has the Ordinary engine, it will be automatically converted to Atomic on server startup. There are settings to keep the old behavior (allow_deprecated_database_ordinary and allow_deprecated_syntax_for_merge_tree), but these settings may be removed in future releases. #38335 (Alexander Tokmakov).
- Force rewriting comma join to inner by default (set default value cross_to_inner_join_rewrite = 2). To have the old behavior, set cross_to_inner_join_rewrite = 1. #39326 (Vladimir C). If you face any incompatibilities, you can turn this setting back.
New Feature
- Support expressions with window functions. Closes #19857. #37848 (Dmitry Novik).
- Add new direct join algorithm for EmbeddedRocksDB tables, see #33582. #35363 (Vladimir C).
- Added full sorting merge join algorithm. #35796 (Vladimir C).
- Implement NATS table engine, which allows to pub/sub to NATS. Closes #32388. #37171 (tchepavel). (Kseniia Sumarokova)
- Implement table function mongodb. Allow writes into MongoDB storage / table function. #37213 (aaapetrenko). (Kseniia Sumarokova)
- Add SQLInsert output format. Closes #38441. #38477 (Kruglov Pavel).
- Introduced setting additional_table_filters. Using this setting, you can specify an additional filtering condition for a table, which will be applied directly after reading. Example: select number, x, y from (select number from system.numbers limit 5) f any left join (select x, y from table_1) s on f.number = s.x settings additional_table_filters={'system.numbers' : 'number != 3', 'table_1' : 'x != 2'}. Introduced setting additional_result_filter, which specifies an additional filtering condition for the query result. Closes #37918. #38475 (Nikolai Kochetov).
- Add compatibility setting and system.settings_changes system table that contains information about changes in settings through ClickHouse versions. Closes #35972. #38957 (Kruglov Pavel).
- Add functions translate(string, from_string, to_string) and translateUTF8(string, from_string, to_string). They translate some characters to others. #38935 (Nikolay Degterinsky).
- Support parseTimeDelta function. The characters ;-+,: can be used as separators, e.g. 1yr-2mo, 2m:6s. Example: SELECT parseTimeDelta('1yr-2mo-4w + 12 days, 3 hours : 1 minute ; 33 seconds'). #39071 (jiahui-97).
- Added CREATE TABLE ... EMPTY AS SELECT query. It automatically deduces table structure from the SELECT query, but does not fill the table after creation. Resolves #38049. #38272 (Alexander Tokmakov).
- Added options to limit IO operations with remote storage: max_remote_read_network_bandwidth_for_server and max_remote_write_network_bandwidth_for_server. #39095 (Sergei Trifonov).
- Add group_by_use_nulls setting to make aggregation key columns nullable in the case of ROLLUP, CUBE and GROUPING SETS. Closes #37359. #38642 (Dmitry Novik).
- Add the ability to specify compression level during data export. #38907 (Nikolay Degterinsky).
- Add an option to require explicit grants to SELECT from the system database. Details: #38970 (Vitaly Baranov).
- Functions multiMatchAny, multiMatchAnyIndex, multiMatchAllIndices and their fuzzy variants now accept non-const pattern array argument. #38485 (Robert Schulze). SQL function multiSearchAllPositions now accepts non-const needle arguments. #39167 (Robert Schulze).
- Add a setting zstd_window_log_max to configure max memory usage on zstd decoding when importing external files. Closes #35693. #37015 (wuxiaobai24).
- Add send_logs_source_regexp setting. Send server text logs with specified regexp to match log source name. Empty means all sources. #39161 (Amos Bird).
- Support ALTER for Hive tables. #38214 (lgbo).
- Support isNullable function. This function checks whether its argument is nullable and returns 1 or 0. Closes #38611. #38841 (lokax).
- Added functions for base58 encoding/decoding. #38159 (Andrey Zvonov).
- Add chart visualization to Play UI. #38197 (Alexey Milovidov).
- Added L2 Squared distance and norm functions for both arrays and tuples. #38545 (Julian Gilyadov).
- Add ability to pass HTTP headers to the url table function / storage via SQL. Closes #37897. #38176 (Kseniia Sumarokova).
- Add clickhouse-diagnostics binary to the packages. #38647 (Mikhail f. Shiryaev).
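For the L2 Squared distance and norm functions listed above (#38545), the underlying math is the ordinary Euclidean distance or norm without the final square root. The sketch below is a plain Python illustration of that math only; the helper names l2_squared_norm and l2_squared_distance are hypothetical and are not the ClickHouse function names or implementation.

```python
def l2_squared_norm(v):
    """Sum of squared components: the Euclidean norm without the square root."""
    return sum(x * x for x in v)


def l2_squared_distance(a, b):
    """Sum of squared component differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


norm = l2_squared_norm([3, 4])
dist = l2_squared_distance([1, 2, 3], [4, 6, 3])
```

Skipping the square root keeps the ordering of distances unchanged, which is why the squared form is often enough for nearest-neighbour style comparisons.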
Experimental Feature
- Adds new setting implicit_transaction to run standalone queries inside a transaction. It handles both creation and closing (via COMMIT if the query succeeded or ROLLBACK if it didn't) of the transaction automatically. #38344 (Raúl Marín).
Performance Improvement
- Distinct optimization for sorted columns. Use specialized distinct transformation in case input stream is sorted by column(s) in distinct. Optimization can be applied to pre-distinct, final distinct, or both. Initial implementation by @dimarub2000. #37803 (Igor Nikonov).
- Improve performance of ORDER BY, MergeTree merges, window functions using batch version of BinaryHeap. #38022 (Maksim Kita).
- More parallel execution for queries with FINAL. #36396 (Nikita Taranov).
- Fix significant join performance regression which was introduced in #35616. It's interesting that common join queries such as ssb queries have been 10 times slower for almost 3 months while no one complains. #38052 (Amos Bird).
- Migrate from the Intel hyperscan library to vectorscan, which speeds up many string matching operations on non-x86 platforms. #38171 (Robert Schulze).
- Increased parallelism of query plan steps executed after aggregation. #38295 (Nikita Taranov).
- Improve performance of insertion to columns of type JSON. #38320 (Anton Popov).
- Optimized insertion and lookups in the HashTable. #38413 (Nikita Taranov).
- Fix performance degradation from #32493. #38417 (Alexey Milovidov).
- Improve performance of joining with numeric columns using SIMD instructions. #37235 (zzachimed). #38565 (Maksim Kita).
- Norm and Distance functions for arrays speed up 1.2-2 times. #38740 (Alexander Gololobov).
- Add AVX-512 VBMI optimized copyOverlap32Shuffle for LZ4 decompression. In other words, LZ4 decompression performance is improved. #37891 (Guo Wangyang).
- ORDER BY (a, b) will use all the same benefits as ORDER BY a, b. #38873 (Igor Nikonov).
- Align branches within a 32B boundary to make benchmark more stable. #38988 (Guo Wangyang). It improves performance 1..2% on average for Intel.
- Executable UDF, executable dictionaries, and Executable tables will avoid wasting one second during wait for subprocess termination. #38929 (Constantine Peresypkin).
- Optimize accesses to system.stack_trace table if not all columns are selected. #39177 (Azat Khuzhin).
- Improve isNullable/isConstant/isNull/isNotNull performance for LowCardinality argument. #39192 (Kruglov Pavel).
- Optimized processing of ORDER BY in window functions. #34632 (Vladimir Chebotarev).
- The table system.asynchronous_metric_log is further optimized for storage space. This closes #38134. See the YouTube video. #38428 (Alexey Milovidov).
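The distinct optimization for sorted columns mentioned above (#37803) relies on a general property: when the input is already sorted by the distinct key, equal rows are adjacent, so each row only needs to be compared with the previous one. The following is a minimal Python sketch of that general idea, assuming rows arrive as tuples in sorted order. It is not the ClickHouse implementation, and the name distinct_sorted is hypothetical.

```python
def distinct_sorted(rows):
    """Yield distinct rows from an iterable that is already sorted by the distinct key."""
    previous = object()  # sentinel that compares unequal to any real row
    for row in rows:
        if row != previous:
            yield row
            previous = row


sorted_rows = [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "c")]
unique_rows = list(distinct_sorted(sorted_rows))
```

Compared with a hash-set based distinct, this approach keeps only one previous row in memory, at the cost of requiring sorted input.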
Improvement
- Support SQL standard CREATE INDEX and DROP INDEX syntax. #35166 (Jianmei Zhang).
- Send profile events for INSERT queries (previously only SELECT was supported). #37391 (Azat Khuzhin).
- Implement in order aggregation (optimize_aggregation_in_order) for fully materialized projections. #37469 (Azat Khuzhin).
- Remove subprocess run for kerberos initialization. Added new integration test. Closes #27651. #38105 (Roman Vasin).
- Add setting multiple_joins_try_to_keep_original_names to not rewrite identifier name on multiple JOINs rewrite, close #34697. #38149 (Vladimir C).
- Improved trace-visualizer UX. #38169 (Sergei Trifonov).
- Enable stack trace collection and query profiler for AArch64. #38181 (Maksim Kita).
- Do not skip symlinks in user_defined directory during SQL user defined functions loading. Closes #38042. #38184 (Maksim Kita).
- Added background cleanup of subdirectories in store/. In some cases clickhouse-server might leave garbage subdirectories in store/ (for example, on unsuccessful table creation), and those dirs were never removed. Fixes #33710. #38265 (Alexander Tokmakov).
- Add DESCRIBE CACHE query to show cache settings from config. Add SHOW CACHES query to show available filesystem caches list. #38279 (Kseniia Sumarokova).
- Add access check for system drop filesystem cache. Support ON CLUSTER. #38319 (Kseniia Sumarokova).
- Fix PostgreSQL database engine incompatibility on upgrade from 21.3 to 22.3. Closes #36659. #38369 (Kseniia Sumarokova).
- filesystemAvailable and similar functions now work in clickhouse-local. This closes #38423. #38424 (Alexey Milovidov).
- Add revision function. #38555 (Azat Khuzhin).
- Fix GCS via proxy tunnel usage. #38726 (Azat Khuzhin).
- Support \i file in clickhouse client / local (similar to psql \i). #38813 (Kseniia Sumarokova).
- New option optimize = 1 in EXPLAIN AST. If enabled, it shows the AST after it's rewritten, otherwise the AST of the original query. Disabled by default. #38910 (Igor Nikonov).
- Allow trailing comma in columns list. closes #38425. #38440 (chen).
- Bugfixes and performance improvements for parallel_hash JOIN method. #37648 (Vladimir C).
- Support hadoop secure RPC transfer (hadoop.rpc.protection=privacy and hadoop.rpc.protection=integrity). #37852 (Peng Liu).
- Add struct type support in StorageHive. #38118 (lgbo).
- S3 single objects are now removed with RemoveObjectRequest. Implement compatibility with GCP, which did not allow using removeFileIfExists, effectively breaking approximately half of the remove functionality. Automatic detection for the DeleteObjects S3 API, which is not supported by GCS. This will allow using GCS without explicit support_batch_delete=0 in configuration. #37882 (Vladimir Chebotarev).
- Expose basic ClickHouse Keeper related monitoring data (via ProfileEvents and CurrentMetrics). #38072 (lingpeng0314).
- Support auto_close option for PostgreSQL engine connection. Closes #31486. #38363 (Kseniia Sumarokova).
- Allow NULL modifier in columns declaration for table functions. #38816 (Kruglov Pavel).
- Deactivate mutations_finalizing_task before shutdown to avoid benign TABLE_IS_READ_ONLY errors during shutdown. #38851 (Raúl Marín).
- Eliminate unnecessary waiting of SELECT queries after ALTER queries in presence of INSERT queries if you use deprecated Ordinary databases. #38864 (Azat Khuzhin).
- New option rewrite in EXPLAIN AST. If enabled, it shows the AST after it's rewritten, otherwise the AST of the original query. Disabled by default. #38910 (Igor Nikonov).
- Stop reporting Zookeeper "Node exists" exceptions in system.errors when they are expected. #38961 (Raúl Marín).
- clickhouse-keeper: add support for real-time digest calculation and verification. It is disabled by default. #37555 (Antonio Andelic).
- Allow to specify globs * or {expr1, expr2, expr3} inside a key for clickhouse-extract-from-config tool. #38966 (Nikita Mikhaylov).
- clearOldLogs: Don't report KEEPER_EXCEPTION on concurrent deletes. #39016 (Raúl Marín).
- clickhouse-keeper improvement: persist meta-information about keeper servers to disk. #39069 (Antonio Andelic). This will make it easier to operate if you shutdown or restart all keeper nodes at the same time.
- Continue without exception when running out of disk space when using filesystem cache. #39106 (Kseniia Sumarokova).
- Handling SIGTERM signals from k8s. #39130 (Timur Solodovnikov).
- Add merge_algorithm column (Undecided, Horizontal, Vertical) to system.part_log. #39181 (Azat Khuzhin).
- Don't increment a counter in system.errors when the disk is not rotational. #39216 (Raúl Marín).
- The metric result_bytes for INSERT queries in system.query_log shows the number of bytes inserted. Previously the value was incorrect and stored the same value as result_rows. #39225 (Ilya Yatsishin).
- The CPU usage metric in clickhouse-client will be displayed in a better way. Fixes #38756. #39280 (Sergei Trifonov).
- Rethrow exception on filesystem cache initialization on server startup, better error message. #39386 (Kseniia Sumarokova).
- OpenTelemetry now collects traces without Processors spans by default (there are too many). To enable Processors spans collection, use the opentelemetry_trace_processors setting. #39170 (Ilya Yatsishin).
- Functions multiMatch[Fuzzy](AllIndices/Any/AnyIndex) - don't throw a logical error if the needle argument is empty. #39012 (Robert Schulze).
- Allow to declare RabbitMQ queue without default arguments x-max-length and x-overflow. #39259 (rnbondarenko).
Build/Testing/Packaging Improvement
- Apply Clang Thread Safety Analysis (TSA) annotations to ClickHouse. #38068 (Robert Schulze).
- Adapt universal installation script for FreeBSD. #39302 (Alexey Milovidov).
- Preparation for building on s390x platform. #39193 (Harry Lee).
- Fix a bug in jemalloc library. #38757 (Azat Khuzhin).
- Hardware benchmark now has support for automatic results uploading. #38427 (Alexey Milovidov).
- System table "system.licenses" is now correctly populated on Mac (Darwin). #38294 (Robert Schulze).
- Change all|noarch packages to architecture-dependent. Fix some documentation for it. Push aarch64|arm64 packages to artifactory and release assets. Fixes #36443. #38580 (Mikhail f. Shiryaev).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Fix rounding for Decimal128/Decimal256 with more than 19-digits long scale. #38027 (Igor Nikonov).
- Fixed crash caused by data race in storage Hive (integration table engine). #38887 (lgbo).
- Fix crash when executing GRANT ALL ON *.* with ON CLUSTER. It was broken in https://github.com/ClickHouse/ClickHouse/pull/35767. This closes #38618. #38674 (Vitaly Baranov).
- Correct glob expansion in case of {0..10} forms. Fixes #38498. The current implementation is similar to what the shell does, as mentioned by @rschu1ze here. #38502 (Heena Bansal).
- Fix crash for mapUpdate, mapFilter functions when using with constant map argument. Closes #38547. #38553 (hexiaoting).
- Fix toHour monotonicity information for query optimization which can lead to incorrect query result (incorrect index analysis). This fixes #38333. #38675 (Amos Bird).
- Fix checking whether s3 storage supports parallel writes. It resulted in s3 parallel writes not working. #38792 (chen).
- Fix s3 seekable reads with parallel read buffer. (Affected memory usage during query). Closes #38258. #38802 (Kseniia Sumarokova).
- Update simdjson. This fixes #38621 - a buffer overflow on machines with the latest Intel CPUs with AVX-512 VBMI. #38838 (Alexey Milovidov).
- Fix possible logical error for Vertical merges. #38859 (Maksim Kita).
- Fix settings profile with seconds unit. #38896 (Raúl Marín).
- Fix incorrect partition pruning when there is a nullable partition key. Note: most likely you don't use nullable partition keys - this is an obscure feature you should not use. Nullable keys are a nonsense and this feature is only needed for some crazy use-cases. This fixes #38941. #38946 (Amos Bird).
- Improve fsync_part_directory for fetches. #38993 (Azat Khuzhin).
- Fix possible deadlock inside OvercommitTracker. Fixes #37794. #39030 (Dmitry Novik).
- Fix bug in filesystem cache that could happen in some corner case which coincided with cache capacity hitting the limit. Closes #39066. #39070 (Kseniia Sumarokova).
- Fix some corner cases of interpretation of the arguments of window expressions. Fixes #38538. Allow using higher-order functions in window expressions. #39112 (Dmitry Novik).
- Keep LowCardinality type in tuple function. Previously the LowCardinality type was dropped and elements of the created tuple had the underlying type of LowCardinality. #39113 (Anton Popov).
- Fix error Block structure mismatch which could happen for INSERT into table with attached MATERIALIZED VIEW and enabled setting extremes = 1. Closes #29759 and #38729. #39125 (Nikolai Kochetov).
- Fix unexpected query result when both optimize_trivial_count_query and empty_result_for_aggregation_by_empty_set are set to true. This fixes #39140. #39155 (Amos Bird).
- Fixed error Not found column Type in block in selects with PREWHERE and read-in-order optimizations. #39157 (Yakov Olkhovskiy).
- Fix extremely rare race condition during hardlinks for remote filesystem. The only way to reproduce it is concurrent run of backups. #39190 (alesapin).
- (zero-copy replication is an experimental feature that should not be used in production) Fix fetch of in-memory part with allow_remote_fs_zero_copy_replication. #39214 (Azat Khuzhin).
- (MaterializedPostgreSQL - experimental feature). Fix segmentation fault in MaterializedPostgreSQL database engine, which could happen if some exception occurred at replication initialisation. Closes #36939. #39272 (Kseniia Sumarokova).
- Fix incorrect fetch of table metadata from PostgreSQL database engine. Closes #33502. #39283 (Kseniia Sumarokova).
- Fix projection exception when aggregation keys are wrapped inside other functions. This fixes #37151. #37155 (Amos Bird).
- Fix possible logical error ... with argument with type Nothing and default implementation for Nothing is expected to return result with type Nothing, got ... in some functions. Closes: #37610. Closes: #37741. #37759 (Kruglov Pavel).
- Fix incorrect columns order in subqueries of UNION (in case of duplicated columns in subselects may produce incorrect result). #37887 (Azat Khuzhin).
- Fix incorrect work of MODIFY ALTER Column with column names that contain dots. Closes #37907. #37971 (Kruglov Pavel).
- Fix reading of sparse columns from MergeTree tables that store their data in S3. #37978 (Anton Popov).
- Fix possible crash in Distributed async insert in case of removing a replica from config. #38029 (Nikolai Kochetov).
- Fix "Missing columns" for GLOBAL JOIN with CTE without alias. #38056 (Azat Khuzhin).
- Rewrite tuple functions as literals in backwards-compatibility mode. #38096 (Anton Kozlov).
- Fix redundant memory reservation for output block during ORDER BY. #38127 (iyupeng).
- Fix possible logical error Bad cast from type DB::IColumn* to DB::ColumnNullable* in array mapped functions. Closes #38006. #38132 (Kruglov Pavel).
- Fix temporary name clash in partial merge join, close #37928. #38135 (Vladimir C).
- Fix a minor issue with queries like CREATE TABLE nested_name_tuples (a Tuple(x String, y Tuple(i Int32, j String))) ENGINE = Memory; #38136 (lgbo).
- Fix bug with nested short-circuit functions that led to execution of arguments even if condition is false. Closes #38040. #38173 (Kruglov Pavel).
- (Window View is an experimental feature) Fix LOGICAL_ERROR for WINDOW VIEW with incorrect structure. #38205 (Azat Khuzhin).
- Update librdkafka submodule to fix crash when an OAUTHBEARER refresh callback is set. #38225 (Rafael Acevedo).
- Fix INSERT into Distributed hung due to ProfileEvents. #38307 (Azat Khuzhin).
- Fix retries in PostgreSQL engine. #38310 (Kseniia Sumarokova).
- Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result). #38324 (Azat Khuzhin).
- Fix RabbitMQ with formats based on PeekableReadBuffer. Closes #38061. #38356 (Kseniia Sumarokova).
- (MaterializedPostgreSQL - experimental feature) Fix possible Invalid number of rows in Chunk in MaterializedPostgreSQL. Closes #37323. #38360 (Kseniia Sumarokova).
- Fix RabbitMQ configuration with connection string setting. Closes #36531. #38365 (Kseniia Sumarokova).
- Fix PostgreSQL engine not using PostgreSQL schema when retrieving array dimension size. Closes #36755. Closes #36772. #38366 (Kseniia Sumarokova).
- Fix possibly incorrect result of distributed queries with DISTINCT and LIMIT. Fixes #38282. #38371 (Anton Popov).
- Fix wrong results of countSubstrings() & position() on patterns with 0-bytes. #38589 (Robert Schulze).
- Now it's possible to start a clickhouse-server and attach/detach tables even for tables with the incorrect values of IPv4/IPv6 representation. Proper fix for issue #35156. #38590 (alesapin).
- rankCorr function will work correctly if some arguments are NaNs. This closes #38396. #38722 (Alexey Milovidov).
- Fix parallel_view_processing=1 with optimize_trivial_insert_select=1. Fix max_insert_threads while pushing to views. #38731 (Azat Khuzhin).
- Fix use-after-free for aggregate functions with Map combinator that leads to incorrect result. #38748 (Azat Khuzhin).
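The glob fix above (#38502) describes {0..10} forms as behaving like shell expansion. Below is a minimal Python sketch of shell-style numeric range expansion, written only to illustrate the idea. The function expand_numeric_range is hypothetical; descending ranges are handled the way bash does, and zero-padded bounds are not modelled. ClickHouse's actual rules for these cases are not stated in the changelog.

```python
import re

# Matches a numeric range group such as {0..10}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_range(pattern):
    """Expand every {M..N} group in pattern into the list of concrete strings."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    step = 1 if start <= end else -1
    results = []
    for n in range(start, end + step, step):
        expanded = pattern[:match.start()] + str(n) + pattern[match.end():]
        # Recurse so that patterns with several groups are fully expanded.
        results.extend(expand_numeric_range(expanded))
    return results


files = expand_numeric_range("file_{0..3}.csv")
```

Both bounds are inclusive, so {0..10} yields eleven values.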
ClickHouse release 22.6, 2022-06-16
Backward Incompatible Change
- Remove support for octal number literals in SQL. In previous versions they were parsed as Float64. #37765 (Yakov Olkhovskiy).
- Changes how settings using seconds as type are parsed to support floating point values (for example: max_execution_time=0.5). Infinity or NaN values will throw an exception. #37187 (Raúl Marín).
- Changed format of binary serialization of columns of experimental type Object. New format is more convenient to implement by third-party clients. #37482 (Anton Popov).
- Turn on setting output_format_json_named_tuples_as_objects by default. It allows to serialize named tuples as JSON objects in JSON formats. #37756 (Anton Popov).
- LIKE patterns with trailing escape symbol ('\') are now disallowed (as mandated by the SQL standard). #37764 (Robert Schulze).
- If you run different ClickHouse versions on a cluster with AArch64 CPU or mix AArch64 and amd64 on a cluster, and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, and the size of the result is huge, the data will not be fully aggregated in the result of these queries during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
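The change to seconds-typed settings listed above (#37187) accepts fractional values and rejects Infinity and NaN. The sketch below shows that validation rule in Python under stated assumptions: the function name parse_seconds_setting and the use of ValueError are hypothetical, and the real parser's accepted spellings and error type are not described in the changelog.

```python
import math


def parse_seconds_setting(text):
    """Parse a seconds value such as '0.5', rejecting infinite and NaN inputs."""
    value = float(text)
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"invalid seconds value: {text!r}")
    return value


timeout = parse_seconds_setting("0.5")
```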
New Feature
- Add GROUPING function. It allows to disambiguate the records in the queries with ROLLUP, CUBE or GROUPING SETS. Closes #19426. #37163 (Dmitry Novik).
- A new codec FPC algorithm for floating point data compression. #37553 (Mikhail Guzov).
- Add new columnar JSON formats: JSONColumns, JSONCompactColumns, JSONColumnsWithMetadata. Closes #36338. Closes #34509. #36975 (Kruglov Pavel).
- Added open telemetry traces visualizing tool based on d3js. #37810 (Sergei Trifonov).
- Support INSERTs into system.zookeeper table. Closes #22130. #37596 (Han Fei).
- Support non-constant pattern argument for LIKE, ILIKE and match functions. #37251 (Robert Schulze).
- Executable user defined functions now support parameters. Example: SELECT test_function(parameters)(arguments). Closes #37578. #37720 (Maksim Kita).
- Add merge_reason column to system.part_log table. #36912 (Sema Checherinda).
- Add support for Maps and Records in Avro format. Add new setting input_format_avro_null_as_default that allows to insert null as default in Avro format. Closes #18925. Closes #37378. Closes #32899. #37525 (Kruglov Pavel).
- Add clickhouse-disks tool to introspect and operate on virtual filesystems configured for ClickHouse. #36060 (Artyom Yurkov).
- Adds H3 unidirectional edge functions. #36843 (Bharat Nallan).
- Add support for calculating hashids from unsigned integers. #37013 (Michael Nutt).
- Explicit SALT specification is allowed for CREATE USER <user> IDENTIFIED WITH sha256_hash. #37377 (Yakov Olkhovskiy).
- Add two new settings input_format_csv_skip_first_lines/input_format_tsv_skip_first_lines to allow skipping specified number of lines in the beginning of the file in CSV/TSV formats. #37537 (Kruglov Pavel).
- showCertificate function shows current server's SSL certificate. #37540 (Yakov Olkhovskiy).
- HTTP source for Data Dictionaries in Named Collections is supported. #37581 (Yakov Olkhovskiy).
- Added a new window function nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL x SECOND]). #37628 (Andrey Zvonov).
- Implemented changing the comment for ReplicatedMergeTree tables. #37416 (Vasily Nemkov).
- Added SYSTEM UNFREEZE query that deletes the whole backup regardless if the corresponding table is deleted or not. #36424 (Vadim Volodin).
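For the nonNegativeDerivative window function listed above (#37628), the general concept is a per-row rate of change of a metric over time, with negative rates clamped to zero. The following Python sketch models that concept only. The function name, the treatment of the first row (emitting 0), the handling of equal timestamps, and the interval scaling are all assumptions, since the changelog gives only the signature.

```python
def non_negative_derivative(metrics, timestamps, interval_seconds=1.0):
    """Rate of change between consecutive rows, scaled to interval_seconds, never below 0."""
    results = []
    prev_metric = prev_ts = None
    for metric, ts in zip(metrics, timestamps):
        if prev_ts is None or ts == prev_ts:
            # Assumption: no predecessor (or zero time delta) yields a rate of 0.
            rate = 0.0
        else:
            rate = (metric - prev_metric) / (ts - prev_ts) * interval_seconds
        results.append(max(rate, 0.0))
        prev_metric, prev_ts = metric, ts
    return results


# Hypothetical counter that is reset between the second and third sample.
rates = non_negative_derivative([10, 20, 15, 45], [0, 10, 20, 30])
```

Clamping is useful for counters that can be reset, where a drop in the raw value should not be reported as a negative rate.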
Experimental Feature
- Enables POPULATE for WINDOW VIEW. #36945 (vxider).
- ALTER TABLE ... MODIFY QUERY support for WINDOW VIEW. #37188 (vxider).
- Change the behavior of the ENGINE syntax in WINDOW VIEW to make it like in MATERIALIZED VIEW. #37214 (vxider).
Performance Improvement
- Added numerous optimizations for ARM NEON. #38093 (Daniel Kutenin), (Alexandra Pilipyuk). Note: if you run different ClickHouse versions on a cluster with ARM CPU and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, the result of the aggregation query will be wrong during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
- Improve performance and memory usage for select of subset of columns for formats Native, Protobuf, CapnProto, JSONEachRow, TSKV, all formats with suffixes WithNames/WithNamesAndTypes. Previously while selecting only subset of columns from files in these formats all columns were read and stored in memory. Now only required columns are read. This PR enables setting input_format_skip_unknown_fields by default, because otherwise in case of select of subset of columns exception will be thrown. #37192 (Kruglov Pavel).
- Now more filters can be pushed down for join. #37472 (Amos Bird).
- Load marks for only necessary columns when reading wide parts. #36879 (Anton Kozlov).
- Improved performance of aggregation in the case when sparse columns (can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization in MergeTree tables) are used as arguments in aggregate functions. #37617 (Anton Popov).
- Optimize function COALESCE with two arguments. #37666 (Anton Popov).
- Replace multiIf with if when multiIf has only one condition, because the function if is more performant. #37695 (Anton Popov).
- Improve performance of dictGetDescendants, dictGetChildren functions, create temporary parent to children hierarchical index per query, not per function call during query. Allow to specify BIDIRECTIONAL for HIERARCHICAL attributes, dictionary will maintain parent to children index in memory, that way functions dictGetDescendants, dictGetChildren will not create temporary index per query. Closes #32481. #37148 (Maksim Kita).
- Aggregates state destruction now may be posted on a thread pool. For queries with LIMIT and big state it provides significant speedup, e.g. select uniq(number) from numbers_mt(1e7) group by number limit 100 became around 2.5x faster. #37855 (Nikita Taranov).
- Improve sort performance by single column. #37195 (Maksim Kita).
- Improve performance of single column sorting using sorting queue specializations. #37990 (Maksim Kita).
- Improved performance on array norm and distance functions 2x-4x times. #37394 (Alexander Gololobov).
- Improve performance of number comparison functions using dynamic dispatch. #37399 (Maksim Kita).
- Improve performance of ORDER BY with LIMIT. #37481 (Maksim Kita).
- Improve performance of hasAll function using dynamic dispatch infrastructure. #37484 (Maksim Kita).
- Improve performance of greatCircleAngle, greatCircleDistance, geoDistance functions. #37524 (Maksim Kita).
- Improve performance of insert into MergeTree if there are multiple columns in ORDER BY. #35762 (Maksim Kita).
- Fix excessive CPU usage in background when there are a lot of tables. #38028 (Maksim Kita).
- Improve performance of not function using dynamic dispatch. #38058 (Maksim Kita).
- Optimized the internal caching of re2 patterns which occur e.g. in LIKE and MATCH functions. #37544 (Robert Schulze).
- Improve filter bitmask generator function all in one with AVX-512 instructions. #37588 (yaqi-zhao).
- Apply read method threadpool for Hive integration engine. This will significantly speed up reading. #36328 (李扬).
- When all the columns to read are partition keys, construct columns by the file's row number without real reading the Hive file. #37103 (lgbo).
- Support multi disks for caching hive files. #37279 (lgbo).
- Limiting the maximum cache usage per query can effectively prevent cache pool contamination. Related Issues. #37859 (Han Shukai).
- Previously ClickHouse downloaded all remote files to the local cache directly (even if they are only read once), which frequently causes IO on the local hard disk. In some scenarios this IO is unnecessary and can make performance worse; for example, when running SSB Q1-Q4, the cache slowed queries down. #37516 (Han Shukai).
- Allow to prune the list of files via virtual columns such as _file and _path when reading from S3. This is for #37174, #23494. #37356 (Amos Bird).
- In CompressedWriteBuffer::nextImpl(), there was an unnecessary write-copy step that happened frequently while inserting data. Before this patch: (1) compress "working_buffer" into "compressed_buffer", (2) write-copy into "out". After: compress "working_buffer" directly into "out". #37242 (jasperzhu).
Improvement
- Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS. Closes #37360. #37667 (Dmitry Novik).
- Fix stack traces collection on ARM. Closes #37044. Closes #15638. #37797 (Maksim Kita).
- Client will try every IP address returned by DNS resolution until successful connection. #37273 (Yakov Olkhovskiy).
- Allow to use String type instead of Binary in Arrow/Parquet/ORC formats. This PR introduces 3 new settings for it: output_format_arrow_string_as_string, output_format_parquet_string_as_string, output_format_orc_string_as_string. Default value for all settings is false. #37327 (Kruglov Pavel).
- Apply setting input_format_max_rows_to_read_for_schema_inference for all read rows in total from all files in globs. Previously the setting input_format_max_rows_to_read_for_schema_inference was applied to each file in a glob separately, and in case of a huge number of nulls we could read the first input_format_max_rows_to_read_for_schema_inference rows from each file and get nothing. Also increase the default value for this setting to 25000. #37332 (Kruglov Pavel).
- Add separate CLUSTER grant (and access_control_improvements.on_cluster_queries_require_cluster_grant configuration directive, for backward compatibility, defaults to false). #35767 (Azat Khuzhin).
- Added support for schema inference for hdfsCluster. #35812 (Nikita Mikhaylov).
- Implement least_used load balancing algorithm for disks inside volume (multi disk configuration). #36686 (Azat Khuzhin).
- Modify the HTTP Endpoint to return the full stats under the X-ClickHouse-Summary header when send_progress_in_http_headers=0 (before it would return all zeros). Modify the HTTP Endpoint to return the X-ClickHouse-Exception-Code header when progress has been sent before (send_progress_in_http_headers=1). Modify the HTTP Endpoint to return HTTP_REQUEST_TIMEOUT (408) instead of HTTP_INTERNAL_SERVER_ERROR (500) on TIMEOUT_EXCEEDED errors. #36884 (Raúl Marín).
- Allow a user to inspect grants from granted roles. #36941 (nvartolomei).
- Do not calculate an integral numerically but use CDF functions instead. This will speed up execution and will increase the precision. This fixes #36714. #36953 (Nikita Mikhaylov).
- Add default implementation for Nothing in functions. Now most of the functions will return a column with type Nothing in case one of its arguments is Nothing. It also solves a problem with functions like arrayMap/arrayFilter and similar when they have an empty array as an argument. Previously queries like select arrayMap(x -> 2 * x, []); failed because the function inside the lambda cannot work with type Nothing, now such queries return an empty array with type Array(Nothing). Also add support for arrays of nullable types in functions like arrayFilter/arrayFill. Previously, queries like select arrayFilter(x -> x % 2, [1, NULL]) failed, now they work (if the result of the lambda is NULL, then this value won't be included in the result). Closes #37000. #37048 (Kruglov Pavel).
- Now if a shard has local replica we create a local plan and a plan to read from all remote replicas. They have shared initiator which coordinates reading. #37204 (Nikita Mikhaylov).
- No longer abort server startup if configuration option "mark_cache_size" is not explicitly set. #37326 (Robert Schulze).
- Allows providing NULL/NOT NULL right after type in column declaration. #37337 (Igor Nikonov).
- Optimize getting the read buffer for a file segment in the PARTIALLY_DOWNLOADED state. #37338 (xiedeyantu).
- Try to improve short circuit functions processing to fix problems with stress tests. #37384 (Kruglov Pavel).
- Closes #37395. #37415 (Memo).
- Fix extremely rare deadlock during part fetch in zero-copy replication. Fixes #37423. #37424 (metahys).
- Don't allow to create storage with unknown data format. #37450 (Kruglov Pavel).
- Set global_memory_usage_overcommit_max_wait_microseconds default value to 5 seconds. Add info about OvercommitTracker to OOM exception message. Add MemoryOvercommitWaitTimeMicroseconds profile event. #37460 (Dmitry Novik).
- Do not display -0.0 CPU time in clickhouse-client. It can appear due to rounding errors. This closes #38003. This closes #38038. #38064 (Alexey Milovidov).
- Play UI: Keep controls in place when the page is scrolled horizontally. This makes edits comfortable even if the table is wide and it was scrolled far to the right. The feature proposed by Maksym Tereshchenko from CaspianDB. #37470 (Alexey Milovidov).
- Modify query div in play.html to be extendable beyond 20% height. For very long queries it is helpful to extend the textarea element, but because the div had a fixed height, the extended textarea hid the data div underneath. With this fix, extending the textarea element pushes the data div down/up so that the extended textarea does not hide it. Also, keep the query box width at 100% even while the user adjusts the size of the query textarea. #37488 (guyco87).
- Added ProfileEvents for introspection of type of written (inserted or merged) parts (Inserted{Wide/Compact/InMemory}Parts, MergedInto{Wide/Compact/InMemory}Parts). Added column part_type to system.part_log. Resolves #37495. #37536 (Anton Popov).
- clickhouse-keeper improvement: move broken logs to a timestamped folder. #37565 (Antonio Andelic).
- Do not write columns expired by TTL after subsequent merges (previously only the first merge/optimize of a part skipped writing columns expired by TTL; all later ones still wrote them). #37570 (Azat Khuzhin).
- More precise result of the dumpColumnStructure miscellaneous function in presence of LowCardinality or Sparse columns. In previous versions, these functions were converting the argument to a full column before returning the result. This is needed to provide an answer in #6935. #37633 (Alexey Milovidov).
- clickhouse-keeper: store only unique session IDs for watches. #37641 (Azat Khuzhin).
- Fix possible "Cannot write to finalized buffer". #37645 (Azat Khuzhin).
- Add setting support_batch_delete for DiskS3 to disable multiobject delete calls, which Google Cloud Storage doesn't support. #37659 (Fred Wulff).
- Add an option to disable connection pooling in ODBC bridge. #37705 (Anton Kozlov).
- Functions dictGetHierarchy, dictIsIn, dictGetChildren, dictGetDescendants added support for nullable HIERARCHICAL attribute in dictionaries. Closes #35521. #37805 (Maksim Kita).
- Expose BoringSSL version related info in the system.build_options table. #37850 (Bharat Nallan).
- Now clickhouse-server removes delete_tmp directories on server start. Fixes #26503. #37906 (alesapin).
- Clean up broken detached parts after timeout. Closes #25195. #37975 (Kseniia Sumarokova).
- Now in MergeTree table engines family failed-to-move parts will be removed instantly. #37994 (alesapin).
- Now if setting always_fetch_merged_part is enabled for ReplicatedMergeTree, merges will try to find parts on other replicas less frequently, which puts a smaller load on [Zoo]Keeper. #37995 (alesapin).
- Add implicit grants with grant option too. For example GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION now allows A to execute GRANT CREATE VIEW ON test.* TO B. #38017 (Vitaly Baranov).
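The client change listed above (trying every IP address returned by DNS resolution until a connection succeeds) can be illustrated with a minimal Python sketch. This is not ClickHouse's client code: the helper names connect_first_reachable and resolve are hypothetical, and the connect callable is injected only so that the retry order is easy to follow.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve(host: str, port: int) -> List[Address]:
    """Return every (ip, port) pair that name resolution yields for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # info[4] is the socket address; its first two items are (ip, port)
    # for both IPv4 and IPv6 results.
    return [info[4][:2] for info in infos]


def connect_first_reachable(
    addresses: Iterable[Address],
    connect: Callable[[Address], object],
) -> object:
    """Try each address in order and return the first successful connection."""
    last_error: Optional[Exception] = None
    for address in addresses:
        try:
            return connect(address)
        except OSError as error:
            last_error = error  # remember the failure and try the next address
    if last_error is None:
        raise OSError("no addresses to try")
    raise last_error  # every address failed: report the last failure
```

With a real socket this could be called as connect_first_reachable(resolve(host, port), lambda a: socket.create_connection(a, timeout=3)), where host and port are whatever the caller wants to reach.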
Build/Testing/Packaging Improvement
- Use clang-14 and LLVM infrastructure version 14 for builds. This closes #34681. #34754 (Alexey Milovidov). Note: clang-14 has a bug in ThreadSanitizer that makes our CI work worse.
- Allow to drop privileges at startup. This simplifies Docker images. Closes #36293. #36341 (Alexey Milovidov).
- Add docs spellcheck to CI. #37790 (Vladimir C).
- Fix overly aggressive stripping which removed the embedded hash required for checking the consistency of the executable. #37993 (Robert Schulze).
Bug Fix
- Fix SELECT ... INTERSECT and EXCEPT SELECT statements with constant string types. #37738 (Antonio Andelic).
- Fix GROUP BY AggregateFunction (i.e. you GROUP BY by the column that has AggregateFunction type). #37093 (Azat Khuzhin).
- (experimental WINDOW VIEW) Fix addDependency in WindowView. This bug can be reproduced like #37237. #37224 (vxider).
- Fix inconsistency in ORDER BY ... WITH FILL feature. Query, containing ORDER BY ... WITH FILL, can generate extra rows when multiple WITH FILL columns are present. #38074 (Yakov Olkhovskiy).
- Move addDependency from the constructor to startup() to avoid adding a dependency to a dropped table. Fixes #37237. #37243 (vxider).
- Fix inserting defaults for missing values in columnar formats. Previously missing columns were filled with defaults for types, not for columns. #37253 (Kruglov Pavel).
- (experimental Object type) Fix some cases of insertion nested arrays to columns of type Object. #37305 (Anton Popov).
- Fix unexpected errors with a clash of constant strings in aggregate function, prewhere and join. Close #36891. #37336 (Vladimir C).
- Fix projections with GROUP/ORDER BY in query and optimize_aggregation_in_order (before the result was incorrect since only finish sorting was performed). #37342 (Azat Khuzhin).
- Fixed error with symbols in key name in S3. Fixes #33009. #37344 (Vladimir Chebotarev).
- Throw an exception when GROUPING SETS used with ROLLUP or CUBE. #37367 (Dmitry Novik).
- Fix LOGICAL_ERROR in getMaxSourcePartsSizeForMerge during merges (in case of non standard, greater, values of background_pool_size/background_merges_mutations_concurrency_ratio has been specified in config.xml (new way) not in users.xml (deprecated way)). #37413 (Azat Khuzhin).
- Stop removing UTF-8 BOM in RowBinary format. #37428 (Paul Loyd).
- clickhouse-keeper bugfix: fix force recovery for single node cluster. #37440 (Antonio Andelic).
- Fix logical error in normalizeUTF8 functions. Closes #37298. #37443 (Maksim Kita).
- Fix cast lowcard of nullable in JoinSwitcher, close #37385. #37453 (Vladimir C).
- Fix named tuples output in ORC/Arrow/Parquet formats. #37458 (Kruglov Pavel).
- Fix optimization of monotonous functions in ORDER BY clause in presence of GROUPING SETS. Fixes #37401. #37493 (Dmitry Novik).
- Fix error on joining with dictionary on some conditions. Close #37386. #37530 (Vladimir C).
- Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR). #37542 (Azat Khuzhin).
- Fix wrong dump information of ActionsDAG. #37587 (zhanglistar).
- Fix converting types for UNION queries (may produce LOGICAL_ERROR). #37593 (Azat Khuzhin).
- Fix WITH FILL modifier with negative intervals in STEP clause. Fixes #37514. #37600 (Anton Popov).
- Fix illegal joinGet array usage when join_use_nulls = 1. This fixes #37562. #37650 (Amos Bird).
- Fix columns number mismatch in cross join, close #37561. #37653 (Vladimir C).
- Fix segmentation fault in show create table from mysql database when it is configured with named collections. Closes #37683. #37690 (Kseniia Sumarokova).
- Fix RabbitMQ Storage not being able to start up on server restart if the storage was created without SETTINGS clause. Closes #37463. #37691 (Kseniia Sumarokova).
- SQL user defined functions disable CREATE/DROP in readonly mode. Closes #37280. #37699 (Maksim Kita).
- Fix formatting of Nullable arguments for executable user defined functions. Closes #35897. #37711 (Maksim Kita).
- Fix optimization enabled by setting optimize_monotonous_functions_in_order_by in distributed queries. Fixes #36037. #37724 (Anton Popov).
- Fix possible logical error: Invalid Field get from type UInt64 to type Float64 in values table function. Closes #37602. #37754 (Kruglov Pavel).
- Fix possible segfault in schema inference in case of exception in SchemaReader constructor. Closes #37680. #37760 (Kruglov Pavel).
- Fix setting cast_ipv4_ipv6_default_on_conversion_error for internal cast function. Closes #35156. #37761 (Maksim Kita).
- Fix toString error on DatatypeDate32. #37775 (LiuNeng).
- The clickhouse-keeper setting dead_session_check_period_ms was transformed into microseconds (multiplied by 1000), which led to dead sessions only being cleaned up after several minutes (instead of 500ms). #37824 (Michael Lex).
- Fix possible "No more packets are available" for distributed queries (in case of async_socket_for_remote/use_hedged_requests is disabled). #37826 (Azat Khuzhin).
- (experimental WINDOW VIEW) Do not drop the inner target table when executing ALTER TABLE … MODIFY QUERY in WindowView. #37879 (vxider).
- Fix directory ownership of coordination dir in clickhouse-keeper Docker image. Fixes #37914. #37915 (James Maidment).
- Dictionaries fix custom query with update field and {condition}. Closes #33746. #37947 (Maksim Kita).
- Fix possible incorrect result of SELECT ... WITH FILL in the case when ORDER BY should be applied after WITH FILL result (e.g. for outer query). Incorrect result was caused by optimization for ORDER BY expressions (#35623). Closes #37904. #37959 (Yakov Olkhovskiy).
- (experimental WINDOW VIEW) Add missing default columns when pushing to the target table in WindowView, fix #37815. #37965 (vxider).
- Fixed too large stack frame that would cause compilation to fail. #37996 (Han Shukai).
- When enable_filesystem_query_cache_limit is turned on, throw Reserved cache size exceeds the remaining cache size. #38004 (xiedeyantu).
- Fix converting types for UNION queries (may produce LOGICAL_ERROR). #34775 (Azat Khuzhin).
- TTL merge may not be scheduled again if BackgroundExecutor is busy: (1) merges_with_ttl_counter is increased in selectPartsToMerge(); (2) the merge task is ignored if BackgroundExecutor is busy; (3) merges_with_ttl_counter is not decreased. #36387 (lthaooo).
- Fix overridden settings value of normalize_function_names. #36937 (李扬).
- Fix for exponential time decaying window functions. Now respecting boundaries of the window. #36944 (Vladimir Chebotarev).
- Fix possible heap-use-after-free error when reading system.projection_parts and system.projection_parts_columns. This fixes #37184. #37185 (Amos Bird).
- Fixed DateTime64 fractional seconds behavior prior to Unix epoch. #37697 (Andrey Zvonov). #37039 (李扬).
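The dead_session_check_period_ms entry above is easier to appreciate with the arithmetic written out. The short Python sketch below assumes a simplified model of the bug (the configured value is multiplied by 1000 but still consumed as milliseconds); the function name is hypothetical and only the 500 ms figure comes from the changelog entry.

```python
CONFIGURED_PERIOD_MS = 500  # the intended check period named in the entry


def buggy_effective_period_ms(period_ms: int) -> int:
    # Assumed model of the bug: the value is scaled by 1000 (as if converting
    # to microseconds) but is still treated as milliseconds afterwards.
    return period_ms * 1000


effective_ms = buggy_effective_period_ms(CONFIGURED_PERIOD_MS)
effective_minutes = effective_ms / 1000 / 60  # ms -> seconds -> minutes

print(f"{effective_ms} ms is about {effective_minutes:.1f} minutes")
```

Under this model a 500 ms period becomes 500 seconds, which matches the "several minutes" delay described in the entry.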
ClickHouse release 22.5, 2022-05-19
Upgrade Notes
- Now, background merges, mutations and OPTIMIZE will not increment SelectedRows and SelectedBytes metrics. They (still) will increment MergedRows and MergedUncompressedBytes as it was before. This only affects the metric values, and makes them better. This change does not introduce any incompatibility, but you may wonder about the changes of metrics, so we put in this category. #37040 (Nikolai Kochetov).
- Updated the BoringSSL module to the official FIPS compliant version. This makes ClickHouse FIPS compliant. #35914 (Meena-Renganathan). The ciphers aes-192-cfb128 and aes-256-cfb128 were removed, because they are not included in the FIPS certified version of BoringSSL.
- max_memory_usage setting is removed from the default user profile in users.xml. This enables flexible memory limits for queries instead of the old rigid limit of 10 GB.
- Disable log_query_threads setting by default. It controls the logging of statistics about every thread participating in query execution. After supporting asynchronous reads, the total number of distinct thread ids became too large, and logging into the query_thread_log has become too heavy. #37077 (Alexey Milovidov).
- Remove function groupArraySorted which has a bug. #36822 (Alexey Milovidov).
New Feature
- Enable memory overcommit by default. #35921 (Dmitry Novik).
- Add support of GROUPING SETS in GROUP BY clause. This implementation supports a parallel processing of grouping sets. #33631 (Dmitry Novik).
- Added system.certificates table. #37142 (Yakov Olkhovskiy).
- Adds h3Line, h3Distance and h3HexRing functions. #37030 (Bharat Nallan).
- New single binary based diagnostics tool (clickhouse-diagnostics). #36705 (Dale McDiarmid).
- Add output format Prometheus #36051. #36206 (Vladimir C).
- Add MySQLDump input format. It reads all data from INSERT queries belonging to one table in dump. If there are more than one table, by default it reads data from the first one. #36667 (Kruglov Pavel).
- Show the total_rows and total_bytes fields in system.tables for temporary tables. #36401. #36439 (xiedeyantu).
- Allow to override parts_to_delay_insert and parts_to_throw_insert with query-level settings. If they are defined, they will override table-level settings. #36371 (Memo).
Experimental Feature
- Implemented L1, L2, Linf, Cosine distance functions for arrays and L1, L2, Linf norm functions for arrays. #37033 (qieqieplus). Caveat: the functions will be renamed.
- Improve the WATCH query in WindowView: 1. Reduce the latency of providing query results by calling the fire_condition signal. 2. Makes the cancel query operation (ctrl-c) faster, by checking isCancelled() more frequently. #37226 (vxider).
- Introspection for remote filesystem cache. #36802 (Han Shukai).
- Added new hash function wyHash64 for SQL. #36467 (olevino).
- Improvement for replicated databases: Added SYSTEM SYNC DATABASE REPLICA query which allows to sync tables metadata inside Replicated database, because currently synchronisation is asynchronous. #35944 (Nikita Mikhaylov).
- Improvement for remote filesystem cache: Better read from cache. #37054 (Kseniia Sumarokova). Improve SYSTEM DROP FILESYSTEM CACHE query: <path> option and FORCE option. #36639 (Kseniia Sumarokova).
- Improvement for semistructured data: Allow to cast columns of type Object(...) to Object(Nullable(...)). #36564 (awakeljw).
- Improvement for parallel replicas: We create a local interpreter if we want to execute a query on the localhost replica. But when executing a query on multiple replicas, we relied on the fact that a connection exists so that replicas can talk to the coordinator. This is now improved, and the localhost replica can talk to the coordinator directly in the same process. #36281 (Nikita Mikhaylov).
Performance Improvement
- Improve performance of avg, sum aggregate functions if used without GROUP BY expression. #37257 (Maksim Kita).
- Improve performance of unary arithmetic functions (bitCount, bitNot, abs, intExp2, intExp10, negate, roundAge, roundDuration, roundToExp2, sign) using dynamic dispatch. #37289 (Maksim Kita).
- Improve performance of ORDER BY, MergeJoin, insertion into MergeTree using JIT compilation of sort columns comparator. #34469 (Maksim Kita).
- Change structure of system.asynchronous_metric_log. It will take about 10 times less space. This closes #36357. The field event_time_microseconds was removed, because it is useless. #36360 (Alexey Milovidov).
- Load marks for only necessary columns when reading wide parts. #36879 (Anton Kozlov).
- Improves performance of file descriptor cache by narrowing mutex scopes. #36682 (Anton Kozlov).
- Improve performance of reading from storage File and table functions file in case when path has globs and matched directory contains large number of files. #36647 (Anton Popov).
- Apply parallel parsing for input format HiveText, which can speed up HiveText parsing by 2x when reading local file. #36650 (李扬).
- The default HashJoin is not thread safe for inserting the right table's rows, so it runs in a single thread. When the right table is large, the join process is too slow, with low CPU utilization. #36415 (lgbo).
- Allow to rewrite select countDistinct(a) from t to select count(1) from (select a from t group by a). #35993 (zhanglistar).
- Transform OR LIKE chain to multiMatchAny. Will enable once we have more confidence it works. #34932 (Daniel Kutenin).
- Improve performance of some functions with inlining. #34544 (Daniel Kutenin).
- Add a branch to avoid unnecessary memcpy in readBig. It improves performance somewhat. #36095 (jasperzhu).
- Implement partial GROUP BY key for optimize_aggregation_in_order. #35111 (Azat Khuzhin).
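The countDistinct rewrite listed above relies on a simple equivalence: the number of distinct values of a column equals the number of groups produced by grouping on that column. The Python sketch below illustrates only that equivalence, assuming the column contains no NULLs; it does not reflect how ClickHouse implements the rewrite, and both function names are hypothetical.

```python
def count_distinct(values):
    """Direct form: number of distinct values in the column."""
    return len(set(values))


def count_after_group_by(values):
    """Rewritten form: build one group per distinct key, then count the groups."""
    groups = {}
    for value in values:
        groups[value] = groups.get(value, 0) + 1  # one entry per GROUP BY key
    return len(groups)


sample = [3, 1, 3, 2, 1, 3]
print(count_distinct(sample), count_after_group_by(sample))
```

Both functions return the same count for any input without NULLs, which is why the two query shapes are interchangeable in that case.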
Improvement
- Show names of erroneous files in case of parsing errors while executing table functions file, s3 and url. #36314 (Anton Popov).
- Allowed to increase the number of threads for executing background operations (merges, mutations, moves and fetches) at runtime if they are specified at top level config. #36425 (Nikita Mikhaylov).
- Now date time conversion functions that generate time before 1970-01-01 00:00:00 with partial hours/minutes timezones will be saturated to zero instead of overflowing. This is the continuation of https://github.com/ClickHouse/ClickHouse/pull/29953 which addresses https://github.com/ClickHouse/ClickHouse/pull/29953#discussion_r800550280 . Marked as an improvement because it is implementation-defined behavior (and a very rare case) and we are allowed to break it. #36656 (Amos Bird).
- Add a warning if someone running clickhouse-server with log level "test". The log level "test" was added recently and cannot be used in production due to inevitable, unavoidable, fatal and life-threatening performance degradation. #36824 (Alexey Milovidov).
- Parse collations in CREATE TABLE, throw exception or ignore. closes #35892. #36271 (yuuch).
- Option compatibility_ignore_auto_increment_in_create_table allows ignoring AUTO_INCREMENT keyword in a column declaration to simplify migration from MySQL. #37178 (Igor Nikonov).
- Add aliases JSONLines and NDJSON for JSONEachRow. Closes #36303. #36327 (flynn).
- Limit the maximum number of partitions that can be queried for each Hive table, to avoid resource overruns. #37281 (lgbo).
- Added implicit cast for h3kRing function second argument to improve usability. Closes #35432. #37189 (Maksim Kita).
- Fix progress indication for INSERT SELECT in clickhouse-local for any query and for file progress in client, more correct file progress. #37075 (Kseniia Sumarokova).
- Fix bug which can lead to forgotten outdated parts in MergeTree table engines family in case of filesystem failures during parts removal. Before fix they will be removed only after first server restart. #37014 (alesapin).
- Implemented a new mode of handling row policies which can be enabled in the main configuration which enables users without permissive row policies to read rows. #36997 (Vitaly Baranov).
- Play UI: Nullable numbers will be aligned to the right in table cells. This closes #36982. #36988 (Alexey Milovidov).
- Play UI: If there is one row in result and more than a few columns, display the result vertically. Continuation of #36811. #36842 (Alexey Milovidov).
- Cleanup CSS in Play UI. The pixels are more evenly placed. Better usability for long content in table cells. #36569 (Alexey Milovidov).
- Finalize write buffers in case of exception to avoid doing it in destructors. Hope it fixes: #36907. #36979 (Kruglov Pavel).
- After #36425 settings like background_fetches_pool_size became obsolete and can appear in top level config, but clickhouse throws an exception like Error updating configuration from '/etc/clickhouse-server/config.xml' config.: Code: 137. DB::Exception: A setting 'background_fetches_pool_size' appeared at top level in config /etc/clickhouse-server/config.xml. This is fixed. #36917 (Nikita Mikhaylov).
- Add extra diagnostic info (if applicable) when sending exception to other server. #36872 (tavplubix).
- Allow to execute hash functions with arguments of type Array(Tuple(..)). #36812 (Anton Popov).
- Added user_defined_path config setting. #36753 (Maksim Kita).
- Allow cluster macro in s3Cluster table function. #36726 (Vadim Volodin).
- Properly cancel INSERT queries in clickhouse-client/clickhouse-local. #36710 (Azat Khuzhin).
- Allow to cancel a query while still keeping a decent query id in MySQLHandler. #36699 (Amos Bird).
- Add is_all_data_sent column into system.processes, and improve internal testing hardening check based on it. #36649 (Azat Khuzhin).
- The metrics about time spent reading from s3 now calculated correctly. Close #35483. #36572 (Alexey Milovidov).
- Allow file descriptors in table function file if it is run in clickhouse-local. #36562 (wuxiaobai24).
- Allow names of tuple elements that start from digits. #36544 (Anton Popov).
- Now clickhouse-benchmark can read authentication info from environment variables. #36497 (Anton Kozlov).
- clickhouse-keeper improvement: add support for force recovery which allows you to reconfigure cluster without quorum. #36258 (Antonio Andelic).
- Improve schema inference for JSON objects. #36207 (Kruglov Pavel).
- Refactor code around schema inference with globs. Try next file from glob only if it makes sense (previously we tried next file in case of any error). Also it fixes #36317. #36205 (Kruglov Pavel).
- Add a separate CLUSTER grant (and access_control_improvements.on_cluster_queries_require_cluster_grant configuration directive, for backward compatibility, default to false). #35767 (Azat Khuzhin).
- If the required amount of memory is available before the selected query stopped, all waiting queries continue execution. Now we don't stop any query if memory is freed before the moment when the selected query knows about the cancellation. #35637 (Dmitry Novik).
- Nullables detection in protobuf. In proto3, default values are not sent on the wire. This makes it non-trivial to distinguish between null and default values for Nullable columns. A standard way to deal with this problem is to use Google wrappers to nest the target value within an inner message (see https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto). In this case, a missing field is interpreted as a null value, a field with a missing value is interpreted as the default value, and a field with a regular value is interpreted as a regular value. However, ClickHouse interprets Google wrappers as nested columns. We propose to introduce special behaviour to detect Google wrappers and interpret them like in the description above. For example, to serialize values for a Nullable column test, we would use google.protobuf.StringValue test in our .proto schema. Note that these types are so called "well-known types" in Protobuf, implemented in the library itself. #35149 (Jakub Kuklis).
- Added support for specifying content_type in predefined and static HTTP handler config. #34916 (Roman Nikonov).
- Warn properly when using clickhouse-client --file without a preceding --external. Close #34747. #34765 (李扬).
- Improve the MySQL database engine to be compatible with the binary(0) data type. #37232 (zzsmdfj).
- Improve JSON report of clickhouse-benchmark. #36473 (Tian Xinhui).
- Server might refuse to start if it cannot resolve hostname of external ClickHouse dictionary. It's fixed. Fixes #36451. #36463 (tavplubix).
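The three-way interpretation of Google wrappers described in the protobuf entry above (missing field, field with missing value, field with regular value) can be sketched in a few lines of Python. Decoded messages are modeled here as plain dictionaries purely for illustration; this is not the protobuf library API or ClickHouse's parser, and decode_wrapped_string is a hypothetical helper.

```python
from typing import Optional

_MISSING = object()  # sentinel: distinguishes "field absent" from any real value


def decode_wrapped_string(message: dict, field: str) -> Optional[str]:
    """Interpret a StringValue-like wrapper stored under `field`.

    - field absent           -> None (NULL)
    - wrapper without value  -> "" (the default value for strings)
    - wrapper with a value   -> that value
    """
    wrapper = message.get(field, _MISSING)
    if wrapper is _MISSING:
        return None
    return wrapper.get("value", "")


print(decode_wrapped_string({}, "test"))
print(repr(decode_wrapped_string({"test": {}}, "test")))
print(decode_wrapped_string({"test": {"value": "abc"}}, "test"))
```

The sentinel is what makes the NULL case distinguishable from the default case, which is the same role the wrapper message plays on the wire.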
Build/Testing/Packaging Improvement
- Now clickhouse-keeper for the x86_64 architecture is statically linked with musl and doesn't depend on any system libraries. #31833 (Alexey Milovidov).
- ClickHouse builds for PowerPC64LE architecture are now available in universal installation script curl https://clickhouse.com/ | sh and by direct link https://builds.clickhouse.com/master/powerpc64le/clickhouse. #37095 (Alexey Milovidov).
- Limit PowerPC code generation to Power8 for better compatibility. This closes #36025. #36529 (Alexey Milovidov).
- Simplify performance test. This will give a chance for us to use it. #36769 (Alexey Milovidov).
- Fail performance comparison on errors in the report. #34797 (Mikhail f. Shiryaev).
- Add ZSTD support for Arrow. This fixes #35283. #35486 (Sean Lafferty).
Bug Fix
- Extracts Version ID if present from the URI and adds a request to the AWS HTTP URI. Closes #31221. The change extracts Version ID from the URI if present and reassembles the URI without it, configures the AWS HTTP URI object with the request, and adds unit tests (gtest_s3_uri). #34571 (Saad Ur Rahman).
- Fix system.opentelemetry_span_log attribute.values alias to values instead of keys. #37275 (Aleksandr Razumov).
- Fix Nullable(String) to Nullable(Bool/IPv4/IPv6) conversion. Closes #37221. #37270 (Kruglov Pavel).
- Experimental feature: Fix execution of mutations in tables, in which there exist columns of type Object. Using subcolumns of type Object in WHERE expression of UPDATE or DELETE queries is not allowed yet, as well as manipulating (DROP, MODIFY) of separate subcolumns. Fixes #37205. #37266 (Anton Popov).
- Kafka does not need group.id on producer stage. In the console log you can find a Warning that describes this issue: 2022.05.15 17:59:13.270227 [ 137 ] {} <Warning> StorageKafka (topic-name): [rdk:CONFWARN] [thrd:app]: Configuration property group.id is a consumer property and will be ignored by this producer instance. #37228 (Mark Andreev).
- Experimental feature (WindowView): Update max_fired_watermark after blocks actually fired, in case delete data that hasn't been fired yet. #37225 (vxider).
- Fix "Cannot create column of type Set" for distributed queries with LIMIT BY. #37193 (Azat Khuzhin).
- Experimental feature: Now WindowView WATCH EVENTS query will not be terminated due to the nonempty Chunk created in WindowViewSource.h:58. #37182 (vxider).
- Enable enable_global_with_statement for subqueries, close #37141. #37166 (Vladimir C).
- Fix implicit cast for optimize_skip_unused_shards_rewrite_in. #37153 (Azat Khuzhin).
- The ILIKE function on FixedString columns could have returned wrong results (i.e. match less than it should). #37117 (Robert Schulze).
- Fix GROUP BY AggregateFunction (i.e. you GROUP BY by the column that has AggregateFunction type). #37093 (Azat Khuzhin).
- Experimental feature: Fix optimize_aggregation_in_order with prefix GROUP BY and *Array aggregate functions. #37050 (Azat Khuzhin).
- Fixed performance degradation of some INSERT SELECT queries with implicit aggregation. Fixes #36792. #37047 (tavplubix).
- Experimental feature: Fix in-order GROUP BY (optimize_aggregation_in_order=1) with *Array (groupArrayArray/...) aggregate functions. #37046 (Azat Khuzhin).
- Fix LowCardinality->ArrowDictionary invalid output when type of indexes is not UInt8. Closes #36832. #37043 (Kruglov Pavel).
- Fixed problem with infs in quantileTDigest. Fixes #32107. #37021 (Vladimir Chebotarev).
- Fix sending external tables data in HedgedConnections with max_parallel_replicas != 1. #36981 (Kruglov Pavel).
- Fixed logical error on TRUNCATE query in Replicated database. Fixes #33747. #36976 (tavplubix).
- Experimental feature: Fix stuck when dropping source table in WindowView. Closes #35678. #36967 (vxider).
- Experimental feature (rocksdb cache): Fix issue: #36671. #36929 (李扬).
- Experimental feature: Fix bugs when using multiple columns in WindowView by adding converting actions to make it possible to call writeIntoWindowView with a slightly different schema. #36928 (vxider).
- Fix bug in clickhouse-keeper which can lead to corrupted compressed log files in case of small load and restarts. #36910 (alesapin).
- Fix incorrect query result when doing constant aggregation. This fixes #36728. #36888 (Amos Bird).
- Experimental feature: Fix current_size count in cache. #36887 (Kseniia Sumarokova).
- Experimental feature: Fix fire in window view with hop window #34044. #36861 (vxider).
- Experimental feature: Fix incorrect cast in cached buffer from remote fs. #36809 (Kseniia Sumarokova).
- Fix creation of tables with flatten_nested = 0. Previously unflattened Nested columns could be flattened after server restart. #36803 (Anton Popov).
- Fix some issues with async reads from remote filesystem which happened when reading low cardinality. #36763 (Kseniia Sumarokova).
- Experimental feature: Fix insertion to columns of type Object from multiple files, e.g. via table function file with globs. #36762 (Anton Popov).
- Fix timeouts in Hedged requests. Connection hang right after sending remote query could lead to eternal waiting. #36749 (Kruglov Pavel).
- Experimental feature: Fix a bug of groupBitmapAndState/groupBitmapOrState/groupBitmapXorState on distributed table. #36739 (Zhang Yifan).
- Experimental feature: While testing a PR, it was found that one cache class was initialized twice, which throws an exception. Although the cause of this problem is not clear, there is probably code in ClickHouse that loads a disk repeatedly, so this situation needs special handling. #36737 (Han Shukai).
- Fix vertical merges in wide parts. Previously an exception There is no column can be thrown during merge. #36707 (Anton Popov).
- Fix server reload on port change (do not wait for current connections from query context). #36700 (Azat Khuzhin).
- Experimental feature: In the previous PR, testing (stateless tests, flaky check (address, actions)) timed out. Moreover, testing locally can also trigger unstable system deadlocks. This problem still exists when using the latest source code of master. #36697 (Han Shukai).
- Experimental feature: Fix server restart if cache configuration changed. #36685 (Kseniia Sumarokova).
- Fix possible heap-use-after-free in schema inference. Closes #36661. #36679 (Kruglov Pavel).
- Fixed parsing of query settings in CREATE query when engine is not specified. Fixes https://github.com/ClickHouse/ClickHouse/pull/34187#issuecomment-1103812419. #36642 (tavplubix).
- Experimental feature: Fix merges of wide parts with type Object. #36637 (Anton Popov).
- Fix a crash in formatting when the default expression following EPHEMERAL is not a literal. Closes #36618. #36633 (flynn).
- Fix Missing column exception which could happen while using INTERPOLATE with ENGINE = MergeTree table. #36549 (Yakov Olkhovskiy).
- Fix potential error with literals in WHERE for join queries. Close #36279. #36542 (Vladimir C).
- Fix offset update ReadBufferFromEncryptedFile, which could cause undefined behaviour. #36493 (Kseniia Sumarokova).
- Fix hostname sanity checks for Keeper cluster configuration. Add keeper_server.host_checks_enabled config to enable/disable those checks. #36492 (Antonio Andelic).
- Fix usage of executable user defined functions in GROUP BY. Previously, executable user defined functions could not be used as expressions in GROUP BY. Closes #36448. #36486 (Maksim Kita).
- Fix possible exception with unknown packet from server in client. #36481 (Kseniia Sumarokova).
- Experimental feature (please never use system.session_log, it is going to be removed): Add missing enum values in system.session_log table. Closes #36474. #36480 (Memo).
- Fix bug in s3Cluster schema inference that led to not all data being read in a SELECT from s3Cluster. The bug appeared in https://github.com/ClickHouse/ClickHouse/pull/35544. #36434 (Kruglov Pavel).
- Fix nullptr dereference in JOIN and COLUMNS matcher. This fixes #36416. This is for https://github.com/ClickHouse/ClickHouse/pull/36417. #36430 (Amos Bird).
- Fix dictionary reload for ClickHouseDictionarySource if it contains scalar subqueries. #36390 (lthaooo).
- Fix assertion in JOIN, close #36199. #36201 (Vladimir C).
- Queries with aliases inside special operators returned parsing error (was broken in 22.1). Example: SELECT substring('test' AS t, 1, 1). #36167 (Maksim Kita).
- Experimental feature: Fix insertion of complex JSONs with nested arrays to columns of type Object. #36077 (Anton Popov).
- Fix ALTER DROP COLUMN of nested column with compact parts (i.e. ALTER TABLE x DROP COLUMN n, when there is column n.d). #35797 (Azat Khuzhin).
- Fix range error in the substring function when offset and length are negative constants and s is not constant. #33861 (RogerYK).
ClickHouse release 22.4, 2022-04-19
Backward Incompatible Change
- Do not allow SETTINGS after FORMAT for INSERT queries (there is compatibility setting allow_settings_after_format_in_insert to accept such queries, but it is turned OFF by default). #35883 (Azat Khuzhin).
- Function yandexConsistentHash (consistent hashing algorithm by Konstantin "kostik" Oblakov) is renamed to kostikConsistentHash. The old name is left as an alias for compatibility. Although this change is backward compatible, we may remove the alias in subsequent releases, so it is recommended to update the usages of this function in your apps. #35553 (Alexey Milovidov).
New Feature
- Added INTERPOLATE extension to the ORDER BY ... WITH FILL. Closes #34903. #35349 (Yakov Olkhovskiy).
- Profiling on Processors level (under log_processors_profiles setting, ClickHouse will write time that processor spent during execution/waiting for data to system.processors_profile_log table). #34355 (Azat Khuzhin).
- Added functions makeDate(year, month, day), makeDate32(year, month, day). #35628 (Alexander Gololobov). Implementation of makeDateTime() and makeDateTime64(). #35934 (Alexander Gololobov).
- Support new type of quota WRITTEN BYTES to limit amount of written bytes during insert queries. #35736 (Anton Popov).
- Added function flattenTuple. It receives a nested named Tuple as an argument and returns a flattened Tuple whose elements are the paths from the original Tuple. E.g.: Tuple(a Int, Tuple(b Int, c Int)) -> Tuple(a Int, b Int, c Int). flattenTuple can be used to select all paths from type Object as separate columns. #35690 (Anton Popov).
- Added functions arrayFirstOrNull, arrayLastOrNull. Closes #35238. #35414 (Maksim Kita).
- Added functions minSampleSizeContinous and minSampleSizeConversion. Author achimbab. #35360 (Maksim Kita).
- New functions minSampleSizeContinous and minSampleSizeConversion. #34354 (achimbab).
- Introduce format ProtobufList (all records as repeated messages in out Protobuf). Closes #16436. #35152 (Nikolai Kochetov).
- Add h3PointDistM, h3PointDistKm, h3PointDistRads, h3GetRes0Indexes, h3GetPentagonIndexes functions. #34568 (Bharat Nallan).
- Add toLastDayOfMonth function which rounds up a date or date with time to the last day of the month. #33501. #34394 (Habibullah Oladepo).
- Added load balancing setting for [Zoo]Keeper client. Closes #29617. #30325 (小路).
- Add a new kind of row policies named simple. Before this PR we had two kinds of row policies: permissive and restrictive. A simple row policy adds a new filter on a table without any side-effects like it was for permissive and restrictive policies. #35345 (Vitaly Baranov).
- Added an ability to specify cluster secret in replicated database. #35333 (Nikita Mikhaylov).
- Added sanity checks on server startup (available memory and disk space, max thread count, etc). #34566 (Sergei Trifonov).
- INTERVAL improvement - can be used with [MILLI|MICRO|NANO]SECOND. Added toStartOf[Milli|Micro|Nano]second() functions. Added [add|subtract][Milli|Micro|Nano]seconds(). #34353 (Andrey Zvonov).
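To illustrate the toLastDayOfMonth entry above, here is a minimal Python sketch of the described rounding (a date or date with time is mapped to the last day of its month). The helper name to_last_day_of_month is hypothetical and this is not ClickHouse code; it only models the semantics stated in the changelog entry.

```python
import calendar
from datetime import date, datetime


def to_last_day_of_month(value):
    """Return the last calendar day of the month that contains `value`.

    Hypothetical helper: models the rounding described in the changelog,
    not the actual ClickHouse implementation.
    """
    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(value.year, value.month)[1]
    return date(value.year, value.month, days_in_month)


print(to_last_day_of_month(date(2022, 4, 19)))             # 2022-04-30
print(to_last_day_of_month(datetime(2024, 2, 1, 12, 30)))  # 2024-02-29
```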
Experimental Feature
- Added support for transactions for simple MergeTree tables. This feature is highly experimental and not recommended for production. Part of #22086. #24258 (tavplubix).
- Support schema inference for type Object in format JSONEachRow. Allow to convert columns of type Map to columns of type Object. #35629 (Anton Popov).
- Allow to write remote FS cache on all write operations. Add system.remote_filesystem_cache table. Add drop remote filesystem cache query. Add introspection for s3 metadata with system.remote_data_paths table. Closes #34021. Add cache option for merges by adding mode read_from_filesystem_cache_if_exists_otherwise_bypass_cache (turned on by default for merges and can also be turned on by query setting with the same name). Rename cache related settings (remote_fs_enable_cache -> enable_filesystem_cache, etc). #35475 (Kseniia Sumarokova).
- An option to store parts metadata in RocksDB. Speed up parts loading process of MergeTree to accelerate starting up of clickhouse-server. With this improvement, clickhouse-server was able to decrease starting up time from 75 minutes to 20 seconds, with 700k mergetree parts. #32928 (李扬).
Performance Improvement
- A new query plan optimization. Evaluate functions after ORDER BY when possible. As an example, for a query SELECT sipHash64(number) FROM numbers(1e8) ORDER BY number LIMIT 5, function sipHash64 would be evaluated after ORDER BY and LIMIT, which gives ~20x speed up. #35623 (Nikita Taranov).
- Sizes of hash tables used during aggregation are now collected and used in later queries to avoid hash table resizes. #33439 (Nikita Taranov).
- Improvement for hasAll function using SIMD instructions (SSE and AVX2). #27653 (youennL-cs). #35723 (Maksim Kita).
- Multiple changes to improve ASOF JOIN performance (1.2 - 1.6x as fast). It also adds support to use big integers. #34733 (Raúl Marín).
- Improve performance of ASOF JOIN if key is native integer. #35525 (Maksim Kita).
- Parallelization of multipart upload into S3 storage. #35343 (Sergei Trifonov).
- URL storage engine now downloads multiple chunks in parallel if the endpoint supports HTTP Range. Two additional settings were added, max_download_threads and max_download_buffer_size, which control maximum number of threads a single query can use to download the file and the maximum number of bytes each thread can process. #35150 (Antonio Andelic).
- Use multiple threads to download objects from S3. Downloading is controllable using max_download_threads and max_download_buffer_size settings. #35571 (Antonio Andelic).
- Narrow mutex scope when interacting with HDFS. Related to #35292. #35646 (shuchaome).
- Require mutations for per-table TTL only when it had been changed. #35953 (Azat Khuzhin).
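The idea behind the "evaluate functions after ORDER BY" optimization listed above can be sketched in Python. This is only a model of the principle, assuming the sort key does not depend on the function result; toy_hash is a stand-in for an expensive function such as sipHash64 (not the real algorithm), and all names here are hypothetical.

```python
class CallCounter:
    """Wrap a function and count how many times it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


def toy_hash(n):
    # Stand-in for an expensive function; NOT the real sipHash64.
    return (n * 2654435761) % 2**32


def run_eager(rows, limit, fn):
    """Evaluate fn for every row, then sort by the row key and apply the limit."""
    evaluated = [(row, fn(row)) for row in rows]
    evaluated.sort(key=lambda pair: pair[0])
    return [value for _, value in evaluated[:limit]]


def run_deferred(rows, limit, fn):
    """Sort and apply the limit first, then evaluate fn only for surviving rows."""
    top = sorted(rows)[:limit]
    return [fn(row) for row in top]


rows = list(range(100_000, 0, -1))  # unsorted input
eager_fn, deferred_fn = CallCounter(toy_hash), CallCounter(toy_hash)

eager_result = run_eager(rows, 5, eager_fn)
deferred_result = run_deferred(rows, 5, deferred_fn)

print(eager_result == deferred_result)    # True
print(eager_fn.calls, deferred_fn.calls)  # 100000 5
```

Both plans return the same rows, but the deferred plan calls the function only for the rows that survive the limit, which is where the speed-up comes from.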
Improvement
- Multiple improvements for schema inference. #35582 (Kruglov Pavel).
  - Use some tweaks and heuristics to determine numbers, strings, arrays, tuples and maps in CSV, TSV and TSVRaw data formats. Add setting input_format_csv_use_best_effort_in_schema_inference for CSV format that enables/disables using these heuristics; if it's disabled, we treat everything as string. Add similar setting input_format_tsv_use_best_effort_in_schema_inference for TSV/TSVRaw format. These settings are enabled by default.
  - Add Maps support for schema inference in Values format.
  - Fix possible segfault in schema inference in Values format.
  - Allow to skip columns with unsupported types in Arrow/ORC/Parquet formats. Add corresponding settings for it: input_format_{parquet|orc|arrow}_skip_columns_with_unsupported_types_in_schema_inference. These settings are disabled by default.
  - Allow to convert a column with type Null to a Nullable column with all NULL values in Arrow/Parquet formats.
  - Allow to specify column names in schema inference via setting column_names_for_schema_inference for formats that don't contain column names (like CSV, TSV, JSONCompactEachRow, etc).
  - Fix schema inference in ORC/Arrow/Parquet formats in terms of working with Nullable columns. Previously all inferred types were not Nullable and it blocked reading Nullable columns from data, now it's fixed and all inferred types are always Nullable (because we cannot understand that column is Nullable or not by reading the schema).
  - Fix schema inference in Template format with CSV escaping rules.
- Add parallel parsing and schema inference for format JSONAsObject. #35592 (Anton Popov).
- Added support for automatic schema inference to s3Cluster table function. Synced the signatures of s3 and s3Cluster. #35544 (Nikita Mikhaylov).
- Added support for schema inference for hdfsCluster. #35602 (Nikita Mikhaylov).
- Add new setting input_format_json_read_bools_as_numbers that allows to infer and parse bools as numbers in JSON input formats. It's enabled by default. Suggested by @alexey-milovidov. #35735 (Kruglov Pavel).
- Improve columns ordering in schema inference for formats TSKV and JSONEachRow, closes #35640. Don't stop schema inference when reading empty row in schema inference for formats TSKV and JSONEachRow. #35724 (Kruglov Pavel).
- Add settings input_format_orc_case_insensitive_column_matching, input_format_arrow_case_insensitive_column_matching, and input_format_parquet_case_insensitive_column_matching which allow ClickHouse to use case insensitive matching of columns while reading data from ORC, Arrow or Parquet files. #35459 (Antonio Andelic).
- Added is_secure column to system.query_log which denotes if the client is using a secure connection over TCP or HTTP. #35705 (Antonio Andelic).
- Now kafka_num_consumers can be bigger than the number of physical cores on low-resource machines (less than 16 cores). #35926 (alesapin).
- Add some basic metrics to monitor engine=Kafka tables. #35916 (filimonov).
- Now it's not allowed to ALTER TABLE ... RESET SETTING for non-existing settings for MergeTree engines family. Fixes #35816. #35884 (alesapin).
- Now some ALTER MODIFY COLUMN queries for Arrays and Nullable types can be done at metadata level without mutations. For example, alter from Array(Enum8('Option1'=1)) to Array(Enum8('Option1'=1, 'Option2'=2)). #35882 (alesapin).
- Added an animation to the hourglass icon to indicate to the user that a query is running. #35860 (peledni).
- Support ALTER TABLE t DETACH PARTITION (ALL). #35794 (awakeljw).
- Improve projection analysis to optimize trivial queries such as count(). #35788 (Amos Bird).
- Support schema inference for INSERT SELECT using the input table function. Get schema from insertion table instead of inferring it from the data in case of INSERT SELECT from table functions that support schema inference. Closes #35639. #35760 (Kruglov Pavel).
- Respect remote_url_allow_hosts for Hive tables. #35743 (李扬).
- Implement send_logs_level for clickhouse-local. Closes #35653. #35716 (Kseniia Sumarokova).
- Allow EPHEMERAL columns without explicit default expression. Closes #35641. #35706 (Yakov Olkhovskiy).
- Add profile event counter AsyncInsertBytes about size of async INSERTs. #35644 (Alexey Milovidov).
- Improve the pipeline description for JOIN. #35612 (何李夫).
- Deduce absolute hdfs config path. #35572 (李扬).
- Improve pasting performance and compatibility of clickhouse-client. This helps #35501. #35541 (Amos Bird).
- It was possible to get stack overflow in distributed queries if one of the settings async_socket_for_remote and use_hedged_requests is enabled while parsing very deeply nested data type (at least in debug build). Closes #35509. #35524 (Kruglov Pavel).
- Add sizes of subcolumns to system.parts_columns table. #35488 (Anton Popov).
- Add explicit table info to the scan node of query plan and pipeline. #35460 (何李夫).
- Allow server to bind to low-numbered ports (e.g. 443). ClickHouse installation script will set cap_net_bind_service to the binary file. #35451 (Alexey Milovidov).
- Fix INSERT INTO table FROM INFILE: it did not display the progress bar. #35429 (xiedeyantu).
- Add arguments --user, --password, --host, --port for clickhouse-diagnostics tool. #35422 (李扬).
- Support uuid for Postgres engines. Closes #35384. #35403 (Kseniia Sumarokova).
- For table functions s3cluster, HDFSCluster or hive, the right AccessType could not be obtained by StorageFactory::instance().getSourceAccessType(getStorageTypeName()). This PR fixes it. #35365 (李扬).
- Remove --testmode option for clickhouse-client, enable it unconditionally. #35354 (Kseniia Sumarokova).
- Don't allow wchc operation (four letter command) for clickhouse-keeper. #35320 (zhangyuli1).
- Add function getTypeSerializationStreams. For a specified type (which is detected from column), it returns an array with all the serialization substream paths. This function is useful mainly for developers. #35290 (李扬).
- If port is not specified in cluster configuration, default server port will be used. This closes #34769. #34772 (Alexey Milovidov).
- Use minmax index for orc/parquet file in Hive Engine. Related PR: https://github.com/ClickHouse/arrow/pull/10. #34631 (李扬).
- System log tables now allow to specify COMMENT in ENGINE declaration. Closes #33768. #34536 (Maksim Kita).
- Proper support of setting max_rows_to_read in case of reading in order of sorting key and specified limit. Previously the exception Limit for rows or bytes to read exceeded could be thrown even if the query actually required reading fewer rows. #33230 (Anton Popov).
- Respect only quota & period from cgroups, ignore shares (which do not really limit the number of cores that can be used). #35815 (filimonov).
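The cgroups entry above can be illustrated with a small Python sketch that derives a core limit from the CPU quota and period only, ignoring shares. The function name, the rounding up, and the clamping to the host core count are assumptions made for illustration; the changelog does not specify how ClickHouse rounds or clamps the value.

```python
import math


def effective_cpu_count(quota_us, period_us, host_cores):
    """Estimate usable cores from a cgroup CPU quota and period.

    Hypothetical helper. A non-positive quota (e.g. -1 in cgroup v1) is
    treated as "no limit". Rounding up and clamping are assumptions.
    """
    if quota_us <= 0 or period_us <= 0:
        return host_cores
    limit = math.ceil(quota_us / period_us)
    return max(1, min(host_cores, limit))


print(effective_cpu_count(150_000, 100_000, 8))  # 2 (1.5 cores, rounded up)
print(effective_cpu_count(-1, 100_000, 8))       # 8 (no quota set)
```

Shares only express relative weight under contention, so they do not appear in this calculation at all.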
Build/Testing/Packaging Improvement
- Add next batch of randomization settings in functional tests. #35047 (Kruglov Pavel).
- Add backward compatibility check in stress test. Closes #25088. #27928 (Kruglov Pavel).
- Migrate package building to nfpm. #33664 (Mikhail f. Shiryaev).
  - Deprecate release script in favor of packages/build.
  - Build everything in clickhouse/binary-builder image (cleanup: clickhouse/deb-builder).
  - Add symbol stripping to cmake (todo: use bin_dir/clickhouse/$binary.debug).
  - Fix issue with DWARF symbols.
  - Add Alpine APK packages.
  - Rename alien to additional_pkgs.
- Add a night scan and upload for Coverity. #34895 (Boris Kuschel).
- A dedicated small package for clickhouse-keeper. #35308 (Mikhail f. Shiryaev).
- Running with podman was failing: it complains about specifying the same volume twice. #35978 (Roman Nikonov).
- Minor improvement in contrib/krb5 build configuration. #35832 (Anton Kozlov).
- Add a label to recognize a building task for every image. #35583 (Mikhail f. Shiryaev).
- Apply black formatter to python code and add a per-commit check. #35466 (Mikhail f. Shiryaev).
- Redo alpine image to use clean Dockerfile. Create a script in tests/ci to build both ubuntu and alpine images. Add clickhouse-keeper image (cc @nikitamikhaylov). Add build check to PullRequestCI. Add a job to a ReleaseCI. Add a job to MasterCI to build and push clickhouse/clickhouse-server:head and clickhouse/clickhouse-keeper:head images for each merged PR. #35211 (Mikhail f. Shiryaev).
- Fix stress-test report in CI, now we upload the runlog with information about started stress tests only once. #35093 (Mikhail f. Shiryaev).
- Switch to libcxx / libcxxabi from LLVM 14. #34906 (Raúl Marín).
- Update unixodbc to mitigate CVE-2018-7485. Note: this CVE is not relevant for ClickHouse as it implements its own isolation layer for ODBC. #35943 (Mikhail f. Shiryaev).
Bug Fix
- Added settings input_format_ipv4_default_on_conversion_error, input_format_ipv6_default_on_conversion_error to allow insert of invalid ip address values as default into tables. Closes #35726. #35733 (Maksim Kita).
- Avoid erasing columns from a block if they don't exist while reading data from Hive. #35393 (lgbo).
- Add type checking when creating materialized view. Close: #23684. #24896 (hexiaoting).
- Fix formatting of INSERT INFILE queries (missing quotes). #35886 (Azat Khuzhin).
- Disable session_log because memory safety issue has been found by fuzzing. See #35714. #35873 (Alexey Milovidov).
- Avoid processing per-column TTL multiple times. #35820 (Azat Khuzhin).
- Fix inserts to columns of type Object in case when there is data related to several partitions in insert query. #35806 (Anton Popov).
- Fix bug in indexes of not presented columns in -WithNames formats that led to error INCORRECT_NUMBER_OF_COLUMNS when the number of columns is more than 256. Closes #35793. #35803 (Kruglov Pavel).
- Fixes #35751. #35799 (Nikolay Degterinsky).
- Fix for reading from HDFS in Snappy format. #35771 (shuchaome).
- Fix bug in conversion from custom types to string that could lead to segfault or unexpected error messages. Closes #35752. #35755 (Kruglov Pavel).
- Fix any/all (subquery) implementation. Closes #35489. #35727 (Kseniia Sumarokova).
- Fix dropping non-empty database in clickhouse-local. Closes #35692. #35711 (Kseniia Sumarokova).
- Fix bug in creating materialized view with subquery after server restart. Materialized view was not getting updated after inserts into underlying table after server restart. Closes #35511. #35691 (Kruglov Pavel).
- Fix possible Can't adjust last granule exception while reading subcolumns of experimental type Object. #35687 (Anton Popov).
- Enable build with JIT compilation by default. #35683 (Maksim Kita).
- Fix possible loss of subcolumns in experimental type Object. #35682 (Anton Popov).
- Fix check ASOF JOIN key nullability, close #35565. #35674 (Vladimir C).
- Fix part checking logic for parts with projections. Error happened when projection and main part had different types. This is similar to https://github.com/ClickHouse/ClickHouse/pull/33774 . The bug is addressed by @caoyang10. #35667 (Amos Bird).
- Fix server crash when large number of arguments are passed into format function. Please refer to the test file and see how to reproduce the crash. #35651 (Amos Bird).
- Fix usage of quotas with asynchronous inserts. #35645 (Anton Popov).
- Fix positional arguments with aliases. Closes #35600. #35620 (Kseniia Sumarokova).
- Check remote_url_allow_hosts before schema inference in URL engine. Closes #35064. #35619 (Kruglov Pavel).
- Fix HashJoin when columns with LowCardinality type are used. This closes #35548. #35616 (Antonio Andelic).
- Fix possible segfault in MaterializedPostgreSQL which happened if exception occurred when data, collected in memory, was synced into underlying tables. Closes #35611. #35614 (Kseniia Sumarokova).
- Setting database_atomic_wait_for_drop_and_detach_synchronously worked incorrectly for ATTACH TABLE query when a previously detached table was still in use. It's fixed. #35594 (tavplubix).
- Fix HTTP headers with named collections, add compression_method. Closes #35273. Closes #35269. #35593 (Kseniia Sumarokova).
- Fix s3 engine getting virtual columns. Closes #35411. #35586 (Kseniia Sumarokova).
- Fixed return type deduction for caseWithExpression. The type of the ELSE branch is now correctly taken into account. #35576 (Antonio Andelic).
- Fix parsing of IPv6 addresses longer than 39 characters. Closes #34022. #35539 (Maksim Kita).
- Fix cast into IPv4, IPv6 address in IN section. Fixes #35528. #35534 (Maksim Kita).
- Fix crash during short circuit function evaluation when one of arguments is nullable constant. Closes #35497. Closes #35496. #35502 (Maksim Kita).
- Fix crash for function throwIf with constant arguments. #35500 (Maksim Kita).
- Fix bug in Keeper which can lead to unstable client connections. Introduced in #35031. #35498 (alesapin).
- Fix bug in function if when resulting column type differs with resulting data type that led to logical errors like Logical error: 'Bad cast from type DB::ColumnVector<int> to DB::ColumnVector<long>'. Closes #35367. #35476 (Kruglov Pavel).
- Fix excessive logging when using S3 as backend for MergeTree or as separate table engine/function. Fixes #30559. #35434 (alesapin).
- Now merges executed with zero copy replication (experimental) will not spam logs with message Found parts with the same min block and with the same max block as the missing part _ on replica _. Hoping that it will eventually appear as a result of a merge.. #35430 (alesapin).
- Skip possible exception if empty chunks appear in GroupingAggregatedTransform. #35417 (Nikita Taranov).
- Fix working with columns that are not needed in query in Arrow/Parquet/ORC formats, it prevents possible errors like Unsupported <format> type <type> of an input column <column_name> when file contains column with unsupported type and we don't use it in query. #35406 (Kruglov Pavel).
- Fix for local cache for remote filesystem (experimental feature) for high concurrency on corner cases. #35381 (Kseniia Sumarokova). Fix possible deadlock in cache. #35378 (Kseniia Sumarokova).
- Fix partition pruning in case of comparison with constant in WHERE. If column and constant had different types, overflow was possible. Query could return an incorrect empty result. This fixes #35304. #35334 (Amos Bird).
- Fix schema inference for TSKV format while using small max_read_buffer_size. #35332 (Kruglov Pavel).
- Fix mutations in tables with enabled sparse columns. #35284 (Anton Popov).
- Do not delay final part writing by default (fixes possible Memory limit exceeded during INSERT by adding max_insert_delayed_streams_for_parallel_write with default to 1000 for writes to s3 and disabled as before otherwise). #34780 (Azat Khuzhin).
ClickHouse release v22.3-lts, 2022-03-17
Backward Incompatible Change
- Make arrayCompact function behave as other higher-order functions: perform compaction not of lambda function results but on the original array. If you're using nontrivial lambda functions in arrayCompact you may restore old behaviour by wrapping arrayCompact arguments into arrayMap. Closes #34010 #18535 #14778. #34795 (Alexandre Snarskii).
- Change implementation specific behavior on overflow of function toDateTime. It will be saturated to the nearest min/max supported instant of datetime instead of wraparound. This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. #32898 (HaiBo Li).
- Make function cast(value, 'IPv4'), cast(value, 'IPv6') behave same as toIPv4, toIPv6 functions. Changed behavior of incorrect IP address passed into functions toIPv4, toIPv6: now if an invalid IP address is passed into these functions an exception will be raised, whereas before these functions returned a default value. Added functions IPv4StringToNumOrDefault, IPv4StringToNumOrNull, IPv6StringToNumOrDefault, IPv6StringOrNull, toIPv4OrDefault, toIPv4OrNull, toIPv6OrDefault, toIPv6OrNull. Functions IPv4StringToNumOrDefault, toIPv4OrDefault, toIPv6OrDefault should be used if previous logic relied on IPv4StringToNum, toIPv4, toIPv6 returning default value for invalid address. Added setting cast_ipv4_ipv6_default_on_conversion_error; if this setting is enabled, then IP address conversion functions will behave as before. Closes #22825. Closes #5799. Closes #35156. #35240 (Maksim Kita).
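The three conversion flavours described in the IP address entry above (strict, OrDefault, OrNull) can be modelled with Python's standard ipaddress module. This is a sketch of the semantics only: the helper names are hypothetical, and using 0.0.0.0 as the default value is an assumption, since the changelog does not state what the default is.

```python
import ipaddress

# Assumption for illustration: the "default" IPv4 value is the all-zero address.
DEFAULT_IPV4 = ipaddress.IPv4Address("0.0.0.0")


def to_ipv4(text):
    """Strict conversion: raises ValueError for an invalid address."""
    return ipaddress.IPv4Address(text)


def to_ipv4_or_default(text):
    """Return the default address instead of raising on invalid input."""
    try:
        return ipaddress.IPv4Address(text)
    except ValueError:
        return DEFAULT_IPV4


def to_ipv4_or_null(text):
    """Return None instead of raising on invalid input."""
    try:
        return ipaddress.IPv4Address(text)
    except ValueError:
        return None


print(to_ipv4("192.168.0.1"))             # 192.168.0.1
print(to_ipv4_or_default("not-an-ip"))    # 0.0.0.0
print(to_ipv4_or_null("256.1.1.1"))       # None
```

Code that relied on the old silent fallback corresponds to the OrDefault variant; the strict variant is the new behaviour of the plain conversion.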
New Feature
- Support for caching data locally for remote filesystems. It can be enabled for s3 disks. Closes #28961. #33717 (Kseniia Sumarokova). In the meantime, we enabled the test suite on s3 filesystem and no more known issues exist, so it is becoming production ready.
- Add new table function hive. It can be used as follows: hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>'), for example SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day'). #34946 (lgbo).
- Support authentication of users connected via SSL by their X.509 certificate. #31484 (eungenue).
- Support schema inference for inserting into table functions file/hdfs/s3/url. #34732 (Kruglov Pavel).
- Now you can read system.zookeeper table without restrictions on path or using like expression. These reads can generate quite heavy load for zookeeper so to enable this ability you have to enable setting allow_unrestricted_reads_from_keeper. #34609 (Sergei Trifonov).
- Display CPU and memory metrics in clickhouse-local. Close #34545. #34605 (李扬).
- Implement startsWith and endsWith function for arrays, closes #33982. #34368 (usurai).
- Add three functions for Map data type: 1. mapReplace(map1, map2) - replaces values for keys in map1 with the values of the corresponding keys in map2; adds keys from map2 that don't exist in map1. 2. mapFilter 3. mapMap. mapFilter and mapMap are higher order functions, accepting two arguments, the first argument is a lambda function with k, v pair as arguments, the second argument is a column of type Map. #33698 (hexiaoting).
- Allow getting default user and password for clickhouse-client from the CLICKHOUSE_USER and CLICKHOUSE_PASSWORD environment variables. Close #34538. #34947 (DR).
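The Map functions entry above describes mapReplace and mapFilter in words; the following Python sketch models those descriptions with plain dictionaries. The helper names map_replace and map_filter are hypothetical and the code only mirrors the semantics stated in the changelog text, not the ClickHouse implementation.

```python
def map_replace(map1, map2):
    """Values from map2 override those in map1; keys only in map2 are added."""
    merged = dict(map1)   # copy so the inputs are not modified
    merged.update(map2)
    return merged


def map_filter(predicate, mapping):
    """Keep only the entries for which predicate(key, value) is true."""
    return {k: v for k, v in mapping.items() if predicate(k, v)}


m1 = {"a": 1, "b": 2}
m2 = {"b": 20, "c": 30}

print(map_replace(m1, m2))                                  # {'a': 1, 'b': 20, 'c': 30}
print(map_filter(lambda k, v: v > 1, map_replace(m1, m2)))  # {'b': 20, 'c': 30}
```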
Experimental Feature
- New data type Object(<schema_format>), which supports storing of semi-structured data (for now JSON only). Data is written to such types as string. Then all paths are extracted according to format of semi-structured data and written as separate columns in most optimal types, that can store all their values. Those columns can be queried by names that match paths in source data. E.g. data.key1.key2 or with cast operator data.key1.key2::Int64.
- Add database_replicated_allow_only_replicated_engine setting. When enabled, it only allows creating Replicated tables or tables with stateless engines in Replicated databases. #35214 (Nikolai Kochetov). Note that Replicated database is still an experimental feature.
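The path extraction described in the Object type entry above can be pictured with a short Python sketch that flattens a nested JSON document into dotted paths such as key1.key2. The function extract_paths is hypothetical and handles only nested objects (arrays and type selection are left out); it illustrates the idea, not how ClickHouse stores the data.

```python
import json


def extract_paths(value, prefix=""):
    """Flatten nested dicts into {dotted path: leaf value}."""
    paths = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict):
            # Descend into nested objects and extend the path.
            paths.update(extract_paths(child, path))
        else:
            paths[path] = child
    return paths


row = json.loads('{"key1": {"key2": 42, "key3": "abc"}, "id": 7}')
print(extract_paths(row))
# {'key1.key2': 42, 'key1.key3': 'abc', 'id': 7}
```

Each resulting path corresponds to what the entry calls a separate column that can be queried by a name matching the path in the source data.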
Performance Improvement
- Improve performance of insertion into MergeTree tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. #34750 (Maksim Kita).
- Columns pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes #34163. #34849 (Kseniia Sumarokova).
- Columns pruning when reading Parquet, ORC and Arrow files from Hive. #34954 (lgbo).
- A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with large IN section. Improve performance of direct dictionary if its source is ClickHouse. Improve performance of detectCharset, detectLanguageUnknown functions. #34888 (Maksim Kita).
- Improve performance of any aggregate function by using more batching. #34760 (Raúl Marín).
- Multiple improvements for performance of clickhouse-keeper: less locking #35010 (zhanglistar), lower memory usage by streaming reading and writing of snapshot instead of full copy. #34584 (zhanglistar), optimizing compaction of log store in the RAFT implementation. #34534 (zhanglistar), versioning of the internal data structure #34486 (zhanglistar).
Improvement
- Allow asynchronous inserts to table functions. Fixes #34864. #34866 (Anton Popov).
- Implicit type casting of the key argument for functions dictGetHierarchy, dictIsIn, dictGetChildren, dictGetDescendants. Closes #34970. #35027 (Maksim Kita).
- EXPLAIN AST query can output AST in form of a graph in Graphviz format: EXPLAIN AST graph = 1 SELECT * FROM system.parts. #35173 (李扬).
- When large files were written with s3 table function or table engine, the content type on the files was mistakenly set to application/xml due to a bug in the AWS SDK. This closes #33964. #34433 (Alexey Milovidov).
- Change restrictive row policies a bit to make them an easier alternative to permissive policies in easy cases. If for a particular table only restrictive policies exist (without permissive policies) users will be able to see some rows. Also SHOW CREATE ROW POLICY will always show AS permissive or AS restrictive in row policy's definition. #34596 (Vitaly Baranov).
- Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. #34465 (Kruglov Pavel).
- Play UI now correctly detects the preferred light/dark theme from the OS. #35068 (peledni).
- Added date_time_input_format = 'best_effort_us'. Closes #34799. #34982 (WenYao).
- New settings called allow_plaintext_password and allow_no_password are added in server configuration which turn on/off authentication types that can be potentially insecure in some environments. They are allowed by default. #34738 (Heena Bansal).
- Support for DateTime64 data type in Arrow format, closes #8280 and closes #28574. #34561 (李扬).
- Reload remote_url_allow_hosts (filtering of outgoing connections) on config update. #35294 (Nikolai Kochetov).
- Support --testmode parameter for clickhouse-local. This parameter enables interpretation of test hints that we use in functional tests. #35264 (Kseniia Sumarokova).
- Add distributed_depth to query log. It is like a more detailed variant of is_initial_query. #35207 (李扬).
- Respect remote_url_allow_hosts for MySQL and PostgreSQL table functions. #35191 (Heena Bansal).
- Added disk_name field to system.part_log. #35178 (Artyom Yurkov).
- Do not retry non-retriable errors when querying remote URLs. Closes #35161. #35172 (Kseniia Sumarokova).
- Support distributed INSERT SELECT queries (the setting parallel_distributed_insert_select) for the table function view(). #35132 (Azat Khuzhin).
- More precise memory tracking during INSERT into Buffer with AggregateFunction. #35072 (Azat Khuzhin).
- Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes #34787. #35032 (Alexey Milovidov).
- Add more sanity checks for keeper configuration: now mixing of localhost and non-local servers is not allowed, also add checks for same value of internal raft port and keeper client port. #35004 (alesapin).
- Currently, if the user changes the settings of the system tables there will be tons of logs and ClickHouse will rename the tables every minute. This fixes #34929. #34949 (Nikita Mikhaylov).
- Use connection pool for Hive metastore client. #34940 (lgbo).
- Ignore per-column TTL in CREATE TABLE AS if new table engine does not support it (i.e. if the engine is not of MergeTree family). #34938 (Azat Khuzhin).
- Allow LowCardinality strings for ngrambf_v1/tokenbf_v1 indexes. Closes #21865. #34911 (Lars Hiller Eidnes).
- Allow opening empty sqlite db if the file doesn't exist. Closes #33367. #34907 (Kseniia Sumarokova).
- Implement memory statistics for FreeBSD - this is required for max_server_memory_usage to work correctly. #34902 (Alexandre Snarskii).
- In previous versions the progress bar in clickhouse-client could jump forward near 50% for no reason. This closes #34324. #34801 (Alexey Milovidov).
- Now ALTER TABLE DROP COLUMN columnX queries for MergeTree table engines will work instantly when columnX is an ALIAS column. Fixes #34660. #34786 (alesapin).
- Show hints when user mistyped the name of a data skipping index. Closes #29698. #34764 (flynn).
- Support remote()/cluster() table functions for parallel_distributed_insert_select. #34728 (Azat Khuzhin).
- Do not reset logging that is configured via --log-file/--errorlog-file command line options in case of empty configuration in the config file. #34718 (Amos Bird).
- Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. #34684 (Kruglov Pavel).
- Allow specifying argument names for executable UDFs. This is necessary for formats where argument name is part of serialization, like Native, JSONEachRow. Closes #34604. #34653 (Maksim Kita).
- MaterializedMySQL (experimental feature) now supports materialized_mysql_tables_list (a comma-separated list of MySQL database tables, which will be replicated by the MaterializedMySQL database engine. Default value: empty list, which means all the tables will be replicated), mentioned at #32977. #34487 (zzsmdfj).
- Improve OpenTelemetry span logs for INSERT operation on distributed table. #34480 (Frank Chen).
- Make the znode ctime and mtime consistent between servers in ClickHouse Keeper. #33441 (小路).
Build/Testing/Packaging Improvement
- Package repository is migrated to JFrog Artifactory (Mikhail f. Shiryaev).
- Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes #32268. #34092 (Kruglov Pavel).
- Drop PVS-Studio from our CI. #34680 (Mikhail f. Shiryaev).
- Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. #35196 (alesapin).
- Smaller "fat-free" clickhouse-keeper build. #35031 (alesapin).
- Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. #34793 (Mikhail f. Shiryaev).
- Limit DWARF version for debug info by 4 max, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. #34777 (Alexey Milovidov).
- Remove clickhouse-test debian package as an unneeded complication. CI uses tests from the repository, and standalone testing via deb package is no longer supported. #34606 (Ilya Yatsishin).
Bug Fix (user-visible misbehaviour in official stable or prestable release)
- A fix for HDFS integration: When the inner buffer size is too small, NEED_MORE_INPUT in HadoopSnappyDecoder will run multiple times (>=3) for one compressed block. This makes the input data be copied into the wrong place in HadoopSnappyDecoder::buffer. #35116 (lgbo).
- Ignore obsolete grants in ATTACH GRANT statements. This PR fixes #34815. #34855 (Vitaly Baranov).
- Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes #35312. #35313 (Kseniia Sumarokova).
- Fix partial merge join duplicate rows bug, close #31009. #35311 (Vladimir C).
- Fix possible Assertion 'position() != working_buffer.end()' failed while using bzip2 compression with small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35300 (Kruglov Pavel). While using lz4 compression with a small max_read_buffer_size setting value. #35296 (Kruglov Pavel). While using lzma compression with small max_read_buffer_size setting value. #35295 (Kruglov Pavel). While using brotli compression with a small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35281 (Kruglov Pavel).
- Fix possible segfault in JSONEachRow schema inference. #35291 (Kruglov Pavel).
- Fix CHECK TABLE query in case when sparse columns are enabled in table. #35274 (Anton Popov).
- Avoid std::terminate in case of exception in reading from remote VFS. #35257 (Azat Khuzhin).
- Fix reading port from config, close #34776. #35193 (Vladimir C).
- Fix error in query with WITH TOTALS in case if HAVING returned empty result. This fixes #33711. #35186 (Amos Bird).
- Fix a corner case of replaceRegexpAll, close #35117. #35182 (Vladimir C).
- Schema inference didn't work properly in case of INSERT INTO FUNCTION s3(...) FROM ..., it tried to read schema from s3 file instead of from select query. #35176 (Kruglov Pavel).
- Fix MaterializedPostgreSQL (experimental feature) table overrides for partition by, etc. Closes #35048. #35162 (Kseniia Sumarokova).
- Fix MaterializedPostgreSQL (experimental feature) adding new table to replication (ATTACH TABLE) after manually removing (DETACH TABLE). Closes #33800. Closes #34922. Closes #34315. #35158 (Kseniia Sumarokova).
- Fix partition pruning error when non-monotonic function is used with IN operator. This fixes #35136. #35146 (Amos Bird).
- Fixed slightly incorrect translation of YAML configs to XML. #35135 (Miel Donkers).
- Fix optimize_skip_unused_shards_rewrite_in for signed columns and negative values. #35134 (Azat Khuzhin).
- The update_lag external dictionary configuration option was unusable, showing the error message Unexpected key `update_lag` in dictionary source configuration. #35089 (Jason Chu).
- Avoid possible deadlock on server shutdown. #35081 (Azat Khuzhin).
- Fix missing alias after function is optimized to a subcolumn when setting optimize_functions_to_subcolumns is enabled. Closes #33798. #35079 (qieqieplus).
- Fix reading from system.asynchronous_inserts table if there exists asynchronous insert into table function. #35050 (Anton Popov).
- Fix possible exception Reading for MergeTree family tables must be done with last position boundary (relevant to operation on remote VFS). Closes #34979. #35001 (Kseniia Sumarokova).
- Fix unexpected result when using -State type aggregate function in window frame. #34999 (metahys).
- Fix possible segfault in FileLog (experimental feature). Closes #30749. #34996 (Kseniia Sumarokova).
- Fix possible rare error Cannot push block to port which already has data. #34993 (Nikolai Kochetov).
- Fix wrong schema inference for unquoted dates in CSV. Closes #34768. #34961 (Kruglov Pavel).
- Integration with Hive: Fix unexpected result when using IN in the WHERE clause of a Hive query. #34945 (lgbo).
- Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. #34931 (Azat Khuzhin).
- Fix DateTime64 conversion from PostgreSQL. Closes #33364. #34910 (Kseniia Sumarokova).
- Fix possible "Part directory doesn't exist" during INSERT into MergeTree table backed by VFS over s3. #34876 (Azat Khuzhin).
- Support DDLs like CREATE USER to be executed on cross replicated cluster. #34860 (Jianmei Zhang).
- Fix bugs for multiple columns group by in WindowView (experimental feature). #34859 (vxider).
- Fix possible failures in S2 functions when queries contain const columns. #34745 (Bharat Nallan).
- Fix bug for H3 funcs containing const columns which cause queries to fail. #34743 (Bharat Nallan).
- Fix No such file or directory with enabled fsync_part_directory and vertical merge. #34739 (Azat Khuzhin).
- Fix serialization/printing for system queries RELOAD MODEL, RELOAD FUNCTION, RESTART DISK when used ON CLUSTER. Closes #34514. #34696 (Maksim Kita).
- Fix allow_experimental_projection_optimization with enable_global_with_statement (previously it could lead to a Stack size too large error in case of multiple expressions in the WITH clause, and it also executed scalar subqueries again and again, so now it will be more optimal). #34650 (Azat Khuzhin).
- Stop selecting parts to mutate when another replica has already updated the transaction log for ReplicatedMergeTree engine. #34633 (Jianmei Zhang).
- Fix incorrect result of trivial count query when part movement feature is used #34089. #34385 (nvartolomei).
- Fix inconsistency of max_query_size limitation in distributed subqueries. #34078 (Chao Ma).
ClickHouse release v22.2, 2022-02-17
Upgrade Notes
- Applying data skipping indexes for queries with FINAL may produce incorrect result. In this release we disabled data skipping indexes by default for queries with FINAL (a new setting use_skip_indexes_if_final is introduced and disabled by default). #34243 (Azat Khuzhin).
New Feature
- Projections are production ready. Set allow_experimental_projection_optimization by default and deprecate this setting. #34456 (Nikolai Kochetov).
- An option to create new files on insert for File/S3/HDFS engines. Allow to overwrite a file in HDFS. Throw an exception in attempt to overwrite a file in S3 by default. Throw an exception in attempt to append data to file in formats that have a suffix (and thus don't support appends, like Parquet, ORC). Closes #31640 Closes #31622 Closes #23862 Closes #15022 Closes #16674. #33302 (Kruglov Pavel).
- Add a setting that allows a user to provide own deduplication semantic in MergeTree/ReplicatedMergeTree. If provided, it's used instead of data digest to generate block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can avoid the same inserted data being deduplicated. This closes: #7461. #32304 (Igor Nikonov).
- Add support of DEFAULT keyword for INSERT statements. Closes #6331. #33141 (Andrii Buriachevskyi).
- EPHEMERAL column specifier is added to CREATE TABLE query. Closes #9436. #34424 (yakov-olkhovskiy).
- Support IF EXISTS clause for TTL expr TO [DISK|VOLUME] [IF EXISTS] 'xxx' feature. Parts will be moved to disk or volume only if it exists on replica, so MOVE TTL rules will be able to behave differently on replicas according to the existing storage policies. Resolves #34455. #34504 (Anton Popov).
- Allow setting the default table engine and creating tables without specifying ENGINE. #34187 (Ilya Yatsishin).
- Add table function format(format_name, data). #34125 (Kruglov Pavel).
- Detect format in clickhouse-local by file name even in the case when it is passed to stdin. #33829 (Kruglov Pavel).
- Add schema inference for values table function. Closes #33811. #34017 (Kruglov Pavel).
- Dynamic reload of server TLS certificates on config reload. Closes #15764. #15765 (johnskopis). #31257 (Filatenkov Artur).
- Now ReplicatedMergeTree can recover data when some of its disks are broken. #13544 (Amos Bird).
- Fault-tolerant connections in clickhouse-client: clickhouse-client ... --host host1 --host host2 --port port2 --host host3 --port port --host host4. #34490 (Kruglov Pavel). #33824 (Filippov Denis).
- Add DEGREES and RADIANS functions for MySQL compatibility. #33769 (Bharat Nallan).
- Add h3ToCenterChild function. #33313 (Bharat Nallan). Add new h3 miscellaneous functions: edgeLengthKm, exactEdgeLengthKm, exactEdgeLengthM, exactEdgeLengthRads, numHexagons. #33621 (Bharat Nallan).
- Add function bitSlice to extract bit subsequences from String/FixedString. #33360 (RogerYK).
- Implemented meanZTest aggregate function. #33354 (achimbab).
- Add confidence intervals to T-tests aggregate functions. #33260 (achimbab).
- Add function addressToLineWithInlines. Close #26211. #33467 (SuperDJY).
- Added #! and # as a recognised start of a single line comment. Closes #34138. #34230 (Aaron Katz).
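The user-provided deduplication setting in the list above can be pictured with a small Python sketch. It assumes, purely for illustration, that a block ID is a hash of either the inserted data or the user-supplied value; the block_id and insert helpers and the use of SHA-256 are hypothetical and do not reflect ClickHouse's actual implementation.

```python
import hashlib
from typing import Optional, Set


def block_id(data: bytes, token: Optional[str] = None) -> str:
    """Hypothetical block ID: derived from the token if given, else from the data."""
    if token is not None:
        source = b"token:" + token.encode("utf-8")
    else:
        source = b"data:" + data
    return hashlib.sha256(source).hexdigest()


def insert(seen_ids: Set[str], data: bytes, token: Optional[str] = None) -> bool:
    """Return True if the block is stored, False if it is deduplicated."""
    bid = block_id(data, token)
    if bid in seen_ids:
        return False
    seen_ids.add(bid)
    return True
```

Under these assumptions, repeating the same data without a token is deduplicated, while giving each INSERT a unique token lets identical data through.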
Experimental Feature
- Functions for text classification: language and charset detection. See #23271. #33314 (Nikolay Degterinsky).
- Add memory overcommit to MemoryTracker. Added guaranteed settings for memory limits which represent soft memory limits. In case when hard memory limit is reached, MemoryTracker tries to cancel the most overcommitted query. New setting memory_usage_overcommit_max_wait_microseconds specifies how long queries may wait for another query to stop. Closes #28375. #31182 (Dmitry Novik).
- Enable stream to table join in WindowView. #33729 (vxider).
- Support SET, YEAR, TIME and GEOMETRY data types in MaterializedMySQL (experimental feature). Fixes #18091, #21536, #26361. #33429 (zzsmdfj).
- Fix various issues when projection is enabled by default. Each issue is described in separate commit. This is for #33678. This fixes #34273. #34305 (Amos Bird).
Performance Improvement
- Support optimize_read_in_order if prefix of sorting key is already sorted. E.g. if we have sorting key ORDER BY (a, b) in table and query with WHERE a = const ORDER BY b clauses, now reading in order of the sorting key will be applied instead of a full sort. #32748 (Anton Popov).
- Improve performance of partitioned insert into table functions URL, S3, File, HDFS. Closes #34348. #34510 (Maksim Kita).
- Multiple performance improvements of clickhouse-keeper. #34484 #34587 (zhanglistar).
- FlatDictionary improve performance of dictionary data load. #33871 (Maksim Kita).
- Improve performance of mapPopulateSeries function. Closes #33944. #34318 (Maksim Kita).
- _file and _path virtual columns (in file-like table engines) are made LowCardinality - it will make queries for multiple files faster. Closes #34300. #34317 (flynn).
- Speed up loading of data parts. It was not parallelized before: the setting part_loading_threads did not have effect. See #4699. #34310 (alexey-milovidov).
- Improve performance of LineAsString format. This closes #34303. #34306 (alexey-milovidov).
- Optimize quantilesExact{Low,High} to use nth_element instead of sort. #34287 (Danila Kutenin).
- Slightly improve performance of Regexp format. #34202 (alexey-milovidov).
- Minor improvement for analysis of scalar subqueries. #34128 (Federico Rodriguez).
- Make ORDER BY tuple almost as fast as ORDER BY columns. We have special optimizations for multiple column ORDER BY: https://github.com/ClickHouse/ClickHouse/pull/10831 . It's beneficial to also apply to tuple columns. #34060 (Amos Bird).
- Rework and reintroduce the scalar subqueries cache to Materialized Views execution. #33958 (Raúl Marín).
- Slightly improve performance of ORDER BY by adding x86-64 AVX-512 support for memcmpSmall functions to accelerate memory comparison. It works only if you compile ClickHouse by yourself. #33706 (hanqf-git).
- Improve range_hashed dictionary performance if there are a lot of intervals for a key. Fixes #23821. #33516 (Maksim Kita).
- For inserts and merges into S3, write files in parallel whenever possible (TODO: check if it's merged). #33291 (Nikolai Kochetov).
- Improve clickhouse-keeper performance and fix several memory leaks in NuRaft library. #33329 (alesapin).
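The quantilesExact{Low,High} entry in the list above replaces a full sort with a selection step. The following Python sketch shows the general idea with a hypothetical nth_element helper based on quickselect; it stands in for the C++ standard library routine of the same name and is not ClickHouse code.

```python
import random
from typing import List, Sequence


def nth_element(values: Sequence[float], n: int) -> float:
    """Return the value that would sit at index n if values were sorted.

    Uses quickselect, which runs in expected linear time,
    whereas a full sort costs O(N log N).
    """
    items: List[float] = list(values)
    if not 0 <= n < len(items):
        raise IndexError("n is out of range")
    while True:
        pivot = random.choice(items)
        less = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        greater = [x for x in items if x > pivot]
        if n < len(less):
            items = less  # the answer is among the smaller values
        elif n < len(less) + len(equal):
            return pivot  # the answer equals the pivot
        else:
            n -= len(less) + len(equal)
            items = greater  # the answer is among the larger values
```

Only one rank is needed for an exact quantile, so ordering the rest of the data is wasted work.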
Improvement
- Support asynchronous inserts in clickhouse-client for queries with inlined data. #34267 (Anton Popov).
- Functions dictGet, dictHas implicitly cast key argument to dictionary key structure, if they are different. #33672 (Maksim Kita).
- Improvements for range_hashed dictionaries. Improve performance of load time if there are multiple attributes. Allow to create a dictionary without attributes. Added option to specify strategy when intervals start and end have Nullable type: convert_null_range_bound_to_open, by default is true. Closes #29791. Allow to specify Float, Decimal, DateTime64, Int128, Int256, UInt128, UInt256 as range types. RangeHashedDictionary added support for range values that extend Int64 type. Closes #28322. Added option range_lookup_strategy to specify range lookup type min, max; by default is min. Closes #21647. Fixed allocated bytes calculations. Fixed type name in system.dictionaries in case of ComplexKeyHashedDictionary. #33927 (Maksim Kita).
- flat, hashed, hashed_array dictionaries now support creating with empty attributes, with support of reading the keys and using dictHas. Fixes #33820. #33918 (Maksim Kita).
- Added support for DateTime64 data type in dictionaries. #33914 (Maksim Kita).
- Allow to write s3(url, access_key_id, secret_access_key) (autodetect of data format and table structure, but with explicit credentials). #34503 (Kruglov Pavel).
- Added sending of the output format back to client like it's done in HTTP protocol as suggested in #34362. Closes #34362. #34499 (Vitaly Baranov).
- Send ProfileEvents statistics in case of INSERT SELECT query (to display query metrics in clickhouse-client for this type of queries). #34498 (Dmitry Novik).
- Recognize .jsonl extension for JSONEachRow format. #34496 (Kruglov Pavel).
- Improve schema inference in clickhouse-local. Allow to write just clickhouse-local -q "select * from table" < data.format. #34495 (Kruglov Pavel).
- Privileges CREATE/ALTER/DROP ROW POLICY now can be granted on a table or on database.* as well as globally *.*. #34489 (Vitaly Baranov).
- Allow to export arbitrary large files to s3. Add two new settings: s3_upload_part_size_multiply_factor and s3_upload_part_size_multiply_parts_count_threshold. Now each time s3_upload_part_size_multiply_parts_count_threshold parts are uploaded to S3 from a single query, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor. Fixes #34244. #34422 (alesapin).
- Allow to skip not found (404) URLs for globs when using URL storage / table function. Also closes #34359. #34392 (Kseniia Sumarokova).
- Default input and output formats for clickhouse-local that can be overridden by --input-format and --output-format. Close #30631. #34352 (李扬).
- Add options for clickhouse-format, which closes #30528: max_query_size, max_parser_depth. #34349 (李扬).
- Better handling of pre-inputs before client start. This is for #34308. #34336 (Amos Bird).
- REGEXP_MATCHES and REGEXP_REPLACE function aliases for compatibility with PostgreSQL. Close #30885. #34334 (李扬).
- Some servers expect a User-Agent header in their HTTP requests. A User-Agent header entry has been added to HTTP requests of the form: User-Agent: ClickHouse/VERSION_STRING. #34330 (Saad Ur Rahman).
- Cancel merges before acquiring table lock for TRUNCATE query to avoid DEADLOCK_AVOIDED error in some cases. Fixes #34302. #34304 (tavplubix).
- Change severity of the "Cancelled merging parts" message in logs, because it's not an error. This closes #34148. #34232 (alexey-milovidov).
- Add ability to compose PostgreSQL-style cast operator :: with expressions using [] and . operators (array and tuple indexing). #34229 (Nikolay Degterinsky).
- Recognize YYYYMMDD-hhmmss format in parseDateTimeBestEffort function. This closes #34206. #34208 (alexey-milovidov).
- Allow carriage return in the middle of the line while parsing by Regexp format. This closes #34200. #34205 (alexey-milovidov).
- Allow to parse dictionary's PRIMARY KEY as PRIMARY KEY (id, value); previously supported only PRIMARY KEY id, value. Closes #34135. #34141 (Maksim Kita).
- An optional argument for splitByChar to limit the number of resulting elements. Close #34081. #34140 (李扬).
- Improving the experience of multiple line editing for clickhouse-client. This is a follow-up of #31123. #34114 (Amos Bird).
- Add UUID support in MsgPack input/output format. #34065 (Kruglov Pavel).
- Tracing context (for OpenTelemetry) is now propagated from GRPC client metadata (this change is relevant for GRPC client-server protocol). #34064 (andremarianiello).
- Supports all types of SYSTEM queries with ON CLUSTER clause. #34005 (小路).
- Improve memory accounting for queries that are using less than max_untracked_memory. #34001 (Azat Khuzhin).
- Fixed UTF-8 string case-insensitive search when lowercase and uppercase characters are represented by different number of bytes. Example is ẞ and ß. This closes #7334. #33992 (Harry Lee).
- Detect format and schema from stdin in clickhouse-local. #33960 (Kruglov Pavel).
- Correctly handle the case of misconfiguration when multiple disks are using the same path on the filesystem. #29072. #33905 (zhongyuankai).
- Try every resolved IP address while getting S3 proxy. S3 proxies are rarely used, mostly in Yandex Cloud. #33862 (Nikolai Kochetov).
- Support EXPLAIN AST CREATE FUNCTION query. EXPLAIN AST CREATE FUNCTION mycast AS (n) -> cast(n as String) will return EXPLAIN AST CREATE FUNCTION mycast AS n -> CAST(n, 'String'). #33819 (李扬).
- Added support for cast from Map(Key, Value) to Array(Tuple(Key, Value)). #33794 (Maksim Kita).
- Add some improvements and fixes for Bool data type. Fixes #33244. #33737 (Kruglov Pavel).
- Parse and store OpenTelemetry trace-id in big-endian order. #33723 (Frank Chen).
- Improvement for fromUnixTimestamp64 family functions. They now accept any integer value that can be converted to Int64. This closes: #14648. #33505 (Andrey Zvonov).
- Reimplement _shard_num from constants (see #7624) with shardNum() function (see #27020), to avoid possible issues (like those that had been found in #16947). #33392 (Azat Khuzhin).
- Enable binary arithmetic (plus, minus, multiply, division, least, greatest) between Decimal and Float. #33355 (flynn).
- Respect cgroups limits in max_threads autodetection. #33342 (JaySon).
- Add new clickhouse-keeper setting min_session_timeout_ms. Now clickhouse-keeper will determine client session timeout according to min_session_timeout_ms and session_timeout_ms settings. #33288 (JackyWoo).
- Added UUID data type support for functions hex and bin. #32170 (Frank Chen).
- Fix reading of subcolumns with dots in their names. In particular fixed reading of Nested columns, if their element names contain dots (e.g. Nested(`keys.name` String, `keys.id` UInt64, values UInt64)). #34228 (Anton Popov).
- Fixes parallel_view_processing = 0 not working when inserting into a table using VALUES. Fixes view_duration_ms in the query_views_log not being set correctly for materialized views. #34067 (Raúl Marín).
- Fix parsing tables structure from ZooKeeper: now metadata from ZooKeeper compared with local metadata in canonical form. It helps when canonical function names can change between ClickHouse versions. #33933 (sunny).
- Properly escape some characters for interaction with LDAP. #33401 (IlyaTsoi).
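The growing S3 upload part size described in the Improvement list above can be sketched in Python as follows. This is only an illustration of the rule as worded in the release note (multiply the part size each time a threshold number of parts has been uploaded); the part_sizes helper and its parameter names are hypothetical shorthand for s3_min_upload_part_size, s3_upload_part_size_multiply_factor and s3_upload_part_size_multiply_parts_count_threshold, and the real server logic may differ in detail.

```python
from typing import List


def part_sizes(total_bytes: int, min_part_size: int,
               multiply_factor: int, parts_count_threshold: int) -> List[int]:
    """Split total_bytes into upload parts whose size grows over time."""
    sizes: List[int] = []
    part_size = min_part_size
    remaining = total_bytes
    while remaining > 0:
        size = min(part_size, remaining)  # the last part may be shorter
        sizes.append(size)
        remaining -= size
        # After every `parts_count_threshold` parts, enlarge the next parts.
        if len(sizes) % parts_count_threshold == 0:
            part_size *= multiply_factor
    return sizes
```

Because the part size grows geometrically, the number of parts grows only logarithmically with the file size, which is what makes arbitrarily large exports fit under a fixed cap on the number of parts per upload.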
Build/Testing/Packaging Improvement
- Remove unbundled build support. #33690 (Azat Khuzhin).
- Ensure that tests don't depend on the result of non-stable sorting of equal elements. Added equal items ranges randomization in debug after sort to prevent issues when we rely on equal items sort order. #34393 (Maksim Kita).
- Add verbosity to a style check. #34289 (Mikhail f. Shiryaev).
- Remove clickhouse-test debian package because it's obsolete. #33948 (Ilya Yatsishin).
- Multiple improvements for build system to remove the possibility of occasionally using packages from the OS and to enforce hermetic builds. #33695 (Amos Bird).
Bug Fix (user-visible misbehaviour in official stable or prestable release)
- Fixed the assertion in case of using allow_experimental_parallel_reading_from_replicas with max_parallel_replicas equals to 1. This fixes #34525. #34613 (Nikita Mikhaylov).
- Fix rare bug while reading of empty arrays, which could lead to Data compressed with different methods error. It can reproduce if you have mostly empty arrays, but not always. And reading is performed in backward direction with ORDER BY ... DESC. This error is extremely unlikely to happen. #34327 (Anton Popov).
- Fix wrong result of round/roundBankers if integer values of small types are rounded. Closes #33267. #34562 (李扬).
- Sometimes query cancellation did not work immediately when we were reading multiple files from s3 or HDFS. Fixes #34301 Relates to #34397. #34539 (Dmitry Novik).
- Fix exception Chunk should have AggregatedChunkInfo in MergingAggregatedTransform (in case of optimize_aggregation_in_order = 1 and distributed_aggregation_memory_efficient = 0). Fixes #34526. #34532 (Anton Popov).
- Fix comparison between integers and floats in index analysis. Previously it could lead to skipping some granules for reading by mistake. Fixes #34493. #34528 (Anton Popov).
- Fix compression support in URL engine. #34524 (Frank Chen).
- Fix possible error 'file_size: Operation not supported' in files' schema autodetection. #34479 (Kruglov Pavel).
- Fixes possible race with table deletion. #34416 (Kseniia Sumarokova).
- Fix possible error Cannot convert column Function to mask in short circuit function evaluation. Closes #34171. #34415 (Kruglov Pavel).
- Fix potential crash when doing schema inference from url source. Closes #34147. #34405 (Kruglov Pavel).
- For UDFs access permissions were checked for database level instead of global level as it should be. Closes #34281. #34404 (Maksim Kita).
- Fix wrong engine syntax in result of SHOW CREATE DATABASE query for databases with engine Memory. This closes #34335. #34345 (alexey-milovidov).
- Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (tavplubix).
- Fix progress bar width. It was incorrectly rounded to integer number of characters. #34275 (alexey-milovidov).
- Fix current_user/current_address client information fields for inter-server communication (before this patch current_user/current_address will be preserved from the previous query). #34263 (Azat Khuzhin).
- Fix memory leak in case of some Exception during query processing with optimize_aggregation_in_order=1. #34234 (Azat Khuzhin).
- Fix metric Query, which shows the number of executing queries. In last several releases it was always 0. #34224 (Anton Popov).
- Fix schema inference for table function s3. #34186 (Kruglov Pavel).
- Fix rare and benign race condition in HDFS, S3 and URL storage engines which can lead to additional connections. #34172 (alesapin).
- Fix bug which can rarely lead to error "Cannot read all data" while reading LowCardinality columns of MergeTree table engines family which stores data on remote file system like S3 (virtual filesystem over s3 is an experimental feature that is not ready for production). #34139 (alesapin).
- Fix inserts to distributed tables in case of a change of native protocol. The last change was in the version 22.1, so there may be some failures of inserts to distributed tables after upgrade to that version. #34132 (Anton Popov).
- Fix possible data race in File table engine that was introduced in #33960. Closes #34111. #34113 (Kruglov Pavel).
- Fixed minor race condition that might cause "intersecting parts" error in extremely rare cases after ZooKeeper connection loss. #34096 (tavplubix).
- Fix asynchronous inserts with Native format. #34068 (Anton Popov).
- Fix bug which led to inability for server to start when both replicated access storage and keeper (embedded in clickhouse-server) are used. Introduced two settings for keeper socket timeout instead of settings from default user: keeper_server.socket_receive_timeout_sec and keeper_server.socket_send_timeout_sec. Fixes #33973. #33988 (alesapin).
- Fix segfault while parsing ORC file with corrupted footer. Closes #33797. #33984 (Kruglov Pavel).
- Fix parsing IPv6 from query parameter (prepared statements) and fix IPv6 to string conversion. Closes #33928. #33971 (Kruglov Pavel).
- Fix crash while reading of nested tuples. Fixes #33838. #33956 (Anton Popov).
- Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to Not found columns exception. #33938 (Anton Popov).
- Aggregate function combinator -If did not correctly process Nullable filter argument. This closes #27073. #33920 (alexey-milovidov).
- Fix potential race condition when doing remote disk read (virtual filesystem over s3 is an experimental feature that is not ready for production). #33912 (Amos Bird).
- Fix crash if SQL UDF is created with lambda with non identifier arguments. Closes #33866. #33868 (Maksim Kita).
- Fix usage of sparse columns (which can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization). #33849 (Anton Popov).
- Fixed replica is not readonly logical error on SYSTEM RESTORE REPLICA query when replica is actually readonly. Fixes #33806. #33847 (tavplubix).
- Fix memory leak in clickhouse-keeper in case compression is used (default). #33840 (Azat Khuzhin).
- Fix index analysis with no common types available. #33833 (Amos Bird).
- Fix schema inference for JSONEachRow and JSONCompactEachRow. #33830 (Kruglov Pavel).
- Fix usage of external dictionaries with redis source and large number of keys. #33804 (Anton Popov).
- Fix bug in client that led to 'Connection reset by peer' in server. Closes #33309. #33790 (Kruglov Pavel).
- Fix parsing query INSERT INTO ... VALUES SETTINGS ... (...), ... #33776 (Kruglov Pavel).
- Fix bug of check table when creating data part with wide format and projection. #33774 (李扬).
- Fix tiny race between count() and INSERT/merges/... in MergeTree (it is possible to return incorrect number of rows for SELECT with optimize_trivial_count_query). #33753 (Azat Khuzhin).
- Throw exception when directory listing request has failed in storage HDFS. #33724 (LiuNeng).
- Fix mutation when table contains projections. This fixes #33010. This fixes #33275. #33679 (Amos Bird).
- Correctly determine current database if CREATE TEMPORARY TABLE AS SELECT is queried inside a named HTTP session. This is a very rare use case. This closes #8340. #33676 (alexey-milovidov).
- Allow some queries with sorting, LIMIT BY, ARRAY JOIN and lambda functions. This closes #7462. #33675 (alexey-milovidov).
- Fix bug in "zero copy replication" (a feature that is under development and should not be used in production) which led to data duplication in case of TTL move. Fixes #33643. #33642 (alesapin).
- Fix Chunk should have AggregatedChunkInfo in GroupingAggregatedTransform (in case of optimize_aggregation_in_order = 1). #33637 (Azat Khuzhin).
- Fix error Bad cast from type ... to DB::DataTypeArray which may happen when table has Nested column with dots in name, and default value is generated for it (e.g. during insert, when column is not listed). Continuation of #28762. #33588 (Alexey Pavlenko).
- Export into lz4 files has been fixed. Closes #31421. #31862 (Kruglov Pavel).
- Fix potential crash if group_by_overflow_mode was set to any (approximate GROUP BY) and aggregation was performed by single column of type LowCardinality. #34506 (DR).
- Fix inserting to temporary tables via gRPC client-server protocol. Fixes #34347, issue #2. #34364 (Vitaly Baranov).
- Fix issue #19429. #34225 (Vitaly Baranov).
- Fix issue #18206. #33977 (Vitaly Baranov).
- This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). #33574 (Vitaly Baranov).
ClickHouse release v22.1, 2022-01-18
Upgrade Notes
- The functions left and right were previously implemented in the parser and are now full-featured. Distributed queries with left or right functions without aliases may throw an exception if the cluster contains different versions of clickhouse-server. If you are upgrading your cluster and encounter this error, you should finish upgrading your cluster to ensure all nodes have the same version. Also you can add aliases (AS something) to the columns in your queries to avoid this issue. #33407 (alexey-milovidov).
- Resource usage by scalar subqueries is fully accounted since this version. With this change, rows read in scalar subqueries are now reported in the query_log. If the scalar subquery is cached (repeated or called for several rows) the rows read are only counted once. This change allows KILLing queries and reporting progress while they are executing scalar subqueries. #32271 (Raúl Marín).
New Feature
- Implement data schema inference for input formats. Allow to skip structure (or write just auto) in table functions file, url, s3, hdfs and in parameters of clickhouse-local. Allow to skip structure in create query for table engines File, HDFS, S3, URL, Merge, Buffer, Distributed and ReplicatedMergeTree (if we add new replicas). #32455 (Kruglov Pavel).
- Detect format by file extension in file/hdfs/s3/url table functions and HDFS/S3/URL table engines and also for SELECT INTO OUTFILE and INSERT FROM INFILE. #33565 (Kruglov Pavel). Close #30918. #33443 (OnePiece).
- A tool for collecting diagnostics data if you need support. #33175 (Alexander Burmak).
- Automatic cluster discovery via Zoo/Keeper. It allows to add replicas to the cluster without changing configuration on every server. #31442 (vdimir).
- Implement hive table engine to access apache hive from clickhouse. This implements: #29245. #31104 (taiyang-li).
- Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate dependency (measure of association) between categorical values. All these functions are using cross-tab (histogram on pairs) for implementation. You can imagine it like a correlation coefficient but for any discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
- Added table function hdfsCluster which allows processing files from HDFS in parallel from many nodes in a specified cluster, similarly to s3Cluster. #32400 (Zhichang Yu).
- Adding support for disks backed by Azure Blob Storage, in a similar way it has been done for disks backed by AWS S3. #31505 (Jakub Kuklis).
- Allow COMMENT in CREATE VIEW (for all VIEW kinds). #31062 (Vasily Nemkov).
- Dynamically reinitialize listening ports and protocols when configuration changes. #30549 (Kevin Michel).
- Added left, right, leftUTF8, rightUTF8 functions. Fix error in implementation of substringUTF8 function with negative offset (offset from the end of string). #33407 (alexey-milovidov).
- Add new functions for H3 coordinate system: h3HexAreaKm2, h3CellAreaM2, h3CellAreaRads2. #33479 (Bharat Nallan).
- Add MONTHNAME function. #33436 (usurai).
- Added function arrayLast. Closes #33390. #33415 Added function arrayLastIndex. #33465 (Maksim Kita).
- Add function decodeURLFormComponent, slightly different to decodeURLComponent. Close #10298. #33451 (SuperDJY).
- Allow to split GraphiteMergeTree rollup rules for plain/tagged metrics (optional rule_type field). #33494 (Michail Safronov).
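To make the cramersV entry above more concrete, here is a minimal Python sketch of the textbook Cramér's V statistic computed from a cross-tab (a histogram on pairs). It illustrates the general formula only and is not ClickHouse's implementation; the function name cramers_v and the sample data are assumptions made for this example.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Textbook Cramér's V for an iterable of (a, b) categorical pairs.

    Hypothetical helper for illustration; not ClickHouse's code.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        return 0.0

    # Cross-tab: how often each (a, b) combination occurs.
    crosstab = Counter(pairs)
    a_totals = Counter(a for a, _ in pairs)
    b_totals = Counter(b for _, b in pairs)

    # With only one distinct value on either side there is no association to measure.
    k = min(len(a_totals), len(b_totals)) - 1
    if k == 0:
        return 0.0

    # Pearson chi-squared statistic over every cell of the cross-tab.
    chi2 = 0.0
    for a, a_count in a_totals.items():
        for b, b_count in b_totals.items():
            expected = a_count * b_count / n
            observed = crosstab.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * k))


# Perfectly associated values give 1.0, independent values give 0.0.
dependent = [("red", "apple")] * 10 + [("yellow", "banana")] * 10
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(cramers_v(dependent), cramers_v(independent))
```

The values need not be numbers: any hashable categories work, which matches the "correlation coefficient for discrete values" intuition in the entry.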
Performance Improvement
- Support moving conditions to PREWHERE (setting optimize_move_to_prewhere) for tables of Merge engine if all of its underlying tables support PREWHERE. #33300 (Anton Popov).
- More efficient handling of globs for URL storage. Now you can easily query million URLs in parallel with retries. Closes #32866. #32907 (Kseniia Sumarokova).
- Avoid exponential backtracking in parser. This closes #20158. #33481 (alexey-milovidov).
- Abuse of untuple function was leading to exponential complexity of query analysis (found by fuzzer). This closes #33297. #33445 (alexey-milovidov).
- Reduce allocated memory for dictionaries with string attributes. #33466 (Maksim Kita).
- Slight performance improvement of reinterpret function. #32587 (alexey-milovidov).
- Insignificant change. In extremely rare cases when a data part is lost on every replica, after merging of some data parts, the subsequent queries may skip fewer partitions during partition pruning. This hardly affects anything. #32220 (Azat Khuzhin).
- Improve clickhouse-keeper writing performance by optimizing the size calculation logic. #32366 (zhanglistar).
- Optimize single part projection materialization. This closes #31669. #31885 (Amos Bird).
- Improve query performance of system tables. #33312 (OnePiece).
- Optimize selecting of MergeTree parts that can be moved between volumes. #33225 (OnePiece).
- Fix sparse_hashed dict performance with sequential keys (wrong hash function). #32536 (Azat Khuzhin).
Experimental Feature
- Parallel reading from multiple replicas within a shard during distributed query without using sample key. To enable this, set allow_experimental_parallel_reading_from_replicas = 1 and max_parallel_replicas to any number. This closes #26748. #29279 (Nikita Mikhaylov).
- Implemented sparse serialization. It can reduce usage of disk space and improve performance of some queries for columns which contain a lot of default (zero) values. It can be enabled by setting ratio_for_sparse_serialization. Sparse serialization will be chosen dynamically for a column if its ratio of the number of default values to the number of all values is above that threshold. Serialization (default or sparse) will be fixed for every column in a part, but may vary between parts. #22535 (Anton Popov).
- Add "TABLE OVERRIDE" feature for customizing MaterializedMySQL table schemas. #32325 (Stig Bakken).
- Add EXPLAIN TABLE OVERRIDE query. #32836 (Stig Bakken).
- Support TABLE OVERRIDE clause for MaterializedPostgreSQL. RFC: #31480. #32749 (Kseniia Sumarokova).
- Change ZooKeeper path for zero-copy marks for shared data. Note that "zero-copy replication" is a non-production feature (in early stages of development) that you shouldn't use anyway. But if you have used it, keep this change in mind. #32061 (ianton-ru).
- Events clause support for WINDOW VIEW watch query. #32607 (vxider).
- Fix ACL with explicit digit hash in clickhouse-keeper: now the behavior is consistent with ZooKeeper and the generated digest is always accepted. #33249 (小路). #33246.
- Fix unexpected projection removal when detaching parts. #32067 (Amos Bird).
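The sparse serialization entry above describes a per-column, per-part decision based on the share of default values. A minimal Python sketch of that rule follows. It is only an illustration of the description: the function names, the strict "greater than" comparison and the (offsets, values) layout are assumptions for this example, not ClickHouse's actual on-disk format.

```python
def choose_serialization(values, ratio_for_sparse_serialization, default=0):
    """Pick "sparse" when the share of default values exceeds the threshold.

    Hypothetical helper; the strict comparison is an assumption.
    """
    if not values:
        return "default"
    default_ratio = sum(1 for v in values if v == default) / len(values)
    return "sparse" if default_ratio > ratio_for_sparse_serialization else "default"


def encode_sparse(values, default=0):
    """Keep only non-default values together with their row offsets (assumed layout)."""
    offsets = [i for i, v in enumerate(values) if v != default]
    non_defaults = [values[i] for i in offsets]
    return offsets, non_defaults


# One part with mostly zeros and another part with dense data: the choice can
# differ between parts of the same column, as the entry notes.
mostly_zero_part = [0] * 95 + [7] * 5
dense_part = [1, 2, 3, 0]
print(choose_serialization(mostly_zero_part, 0.9))
print(choose_serialization(dense_part, 0.9))
print(encode_sparse([0, 0, 5, 0, 9]))
```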
Improvement
- Now date time conversion functions that generate time before 1970-01-01 00:00:00 will be saturated to zero instead of overflowing. #29953 (Amos Bird). It also fixes a bug in index analysis if a date truncation function would yield a result before the Unix epoch.
- Always display resource usage (total CPU usage, total RAM usage and max RAM usage per host) in client. #33271 (alexey-milovidov).
- Improve Bool type serialization and deserialization, check the range of values. #32984 (Kruglov Pavel).
- If an invalid setting is defined using the SET query or using the query parameters in the HTTP request, the error message will contain suggestions that are similar to the invalid setting string (if any exist). #32946 (Antonio Andelic).
- Support hints for mistyped setting names for clickhouse-client and clickhouse-local. Closes #32237. #32841 (凌涛).
- Allow to use virtual columns in Materialized Views. Close #11210. #33482 (OnePiece).
- Add config to disable IPv6 in clickhouse-keeper if needed. This closes #33381. #33450 (Wu Xueyang).
- Add more info to system.build_options about current git revision. #33431 (taiyang-li).
- clickhouse-local: track memory under --max_memory_usage_in_client option. #33341 (Azat Khuzhin).
- Allow negative intervals in function intervalLengthSum. Their length will be added as well. This closes #33323. #33335 (alexey-milovidov).
- LineAsString can be used as output format. This closes #30919. #33331 (Sergei Trifonov).
- Support <secure/> in cluster configuration, as an alternative form of <secure>1</secure>. Close #33270. #33330 (SuperDJY).
- Pressing Ctrl+C twice will terminate clickhouse-benchmark immediately without waiting for in-flight queries. This closes #32586. #33303 (alexey-milovidov).
- Support Unix timestamp with milliseconds in parseDateTimeBestEffort function. #33276 (Ben).
- Allow cancelling a query while reading data from an external table in the formats Arrow/Parquet/ORC. Previously the query failed to be cancelled in case of big files with the setting input_format_allow_seeks set to false. Closes #29678. #33238 (Kseniia Sumarokova).
- If table engine supports SETTINGS clause, allow to pass the settings as key-value or via config. Add this support for MySQL. #33231 (Kseniia Sumarokova).
- Correctly prevent Nullable primary keys if necessary. This is for #32780. #33218 (Amos Bird).
- Add retry for PostgreSQL connections in case nothing has been fetched yet. Closes #33199. #33209 (Kseniia Sumarokova).
- Validate config keys for external dictionaries. #33095. #33130 (Kseniia Sumarokova).
- Send profile info inside clickhouse-local. Closes #33093. #33097 (Kseniia Sumarokova).
- Short circuit evaluation: support for function throwIf. Closes #32969. #32973 (Maksim Kita).
- (This only happens in unofficial builds). Fixed segfault when inserting data into compressed Decimal, String, FixedString and Array columns. This closes #32939. #32940 (N. Kolotov).
- Added support for specifying subquery as SQL user defined function. Example: CREATE FUNCTION test AS () -> (SELECT 1). Closes #30755. #32758 (Maksim Kita).
- Improve gRPC compression support for #28671. #32747 (Vitaly Baranov).
- Flush all In-Memory data parts when WAL is not enabled while shutting down the server or detaching a table. #32742 (nauta).
- Allow to control connection timeouts for MySQL (previously was supported only for dictionary source). Closes #16669. Previously default connect_timeout was rather small, now it is configurable. #32734 (Kseniia Sumarokova).
- Support authSource option for storage MongoDB. Closes #32594. #32702 (Kseniia Sumarokova).
- Support Date32 type in generateRandom table function. #32643 (nauta).
- Add settings max_concurrent_select_queries and max_concurrent_insert_queries to control concurrent queries by query kind. Close #3575. #32609 (SuperDJY).
- Improve handling nested structures with missing columns while reading data in Protobuf format. Follow-up to https://github.com/ClickHouse/ClickHouse/pull/31988. #32531 (Vitaly Baranov).
- Allow empty credentials for MongoDB engine. Closes #26267. #32460 (Kseniia Sumarokova).
- Disable some optimizations for window functions that may lead to exceptions. Closes #31535. Closes #31620. #32453 (Kseniia Sumarokova).
- Allow connecting to MongoDB 5.0. Closes #31483. #32416 (Kseniia Sumarokova).
- Enable comparison between Decimal and Float. Closes #22626. #31966 (flynn).
- Added settings command_read_timeout, command_write_timeout for StorageExecutable, StorageExecutablePool, ExecutableDictionary, ExecutablePoolDictionary, ExecutableUserDefinedFunctions. Setting command_read_timeout controls timeout for reading data from command stdout in milliseconds. Setting command_write_timeout controls timeout for writing data to command stdin in milliseconds. Added setting command_termination_timeout for ExecutableUserDefinedFunction, ExecutableDictionary, StorageExecutable. Added setting execute_direct for ExecutableUserDefinedFunction, by default true. Added setting execute_direct for ExecutableDictionary, ExecutablePoolDictionary, by default false. #30957 (Maksim Kita).
- Bitmap aggregate functions will give correct result for out of range argument instead of wraparound. #33127 (DR).
- Fix parsing incorrect queries with FROM INFILE statement. #33521 (Kruglov Pavel).
- Don't allow to write into S3 if path contains globs. #33142 (Kruglov Pavel).
- --echo option was not used by clickhouse-client in batch mode with single query. #32843 (N. Kolotov).
- Use --database option for clickhouse-local. #32797 (Kseniia Sumarokova).
- Fix surprisingly bad code in SQL ordinary function file. Now it supports symlinks. #32640 (alexey-milovidov).
- Updating modification_time for data part in system.parts after part movement #32964. #32965 (save-my-heart).
- Potential issue, cannot be exploited: integer overflow may happen in array resize. #33024 (varadarajkumar).
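The first entry of this list replaces overflow with saturation for conversions that land before 1970-01-01 00:00:00. The Python sketch below contrasts the two behaviours for an unsigned 32-bit Unix timestamp. The helper names are hypothetical and the code is an illustration of the idea, not a description of ClickHouse internals.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(dt):
    """Signed number of whole seconds between the Unix epoch and dt."""
    return int((dt - EPOCH).total_seconds())


def wrap_uint32(seconds):
    """Overflow behaviour: a negative value wraps around modulo 2**32."""
    return seconds % 2**32


def saturate_to_zero(seconds):
    """Saturating behaviour: anything before the epoch is clamped to zero."""
    return max(seconds, 0)


before_epoch = datetime(1969, 12, 31, tzinfo=timezone.utc)
after_epoch = datetime(1970, 1, 2, tzinfo=timezone.utc)

# One day before the epoch wraps to a timestamp far in the future,
# while saturation pins it to the epoch itself.
print(wrap_uint32(seconds_since_epoch(before_epoch)))
print(saturate_to_zero(seconds_since_epoch(before_epoch)))
print(saturate_to_zero(seconds_since_epoch(after_epoch)))
```

A wrapped value that looks like a far-future date is also a plausible way for a truncated date to confuse range-based index analysis, which is why the saturating form is easier to reason about.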
Build/Testing/Packaging Improvement
- Add packages, functional tests and Docker builds for AArch64 (ARM) version of ClickHouse. #32911 (Mikhail f. Shiryaev). #32415
- Prepare ClickHouse to be built with musl-libc. It is not enabled by default. #33134 (alexey-milovidov).
- Make installation script work on FreeBSD. This closes #33384. #33418 (alexey-milovidov).
- Add actionlint for GitHub Actions workflows and verify workflow files via act --list to check the correct workflow syntax. #33612 (Mikhail f. Shiryaev).
- Add more tests for the nullable primary key feature. Add more tests with different types and merge tree kinds, plus randomly generated data. #33228 (Amos Bird).
- Add a simple tool to visualize flaky tests in web browser. #33185 (alexey-milovidov).
- Enable hermetic build for shared builds. This is mainly for developers. #32968 (Amos Bird).
- Update libc++ and libc++abi to the latest. #32484 (Raúl Marín).
- Added integration test for external .NET client (ClickHouse.Client). #23230 (Oleg V. Kozlyuk).
- Inject git information into clickhouse binary file. So we can get source code revision easily from clickhouse binary file. #33124 (taiyang-li).
- Remove obsolete code from ConfigProcessor. Yandex specific code is not used anymore. The code contained one minor defect. This defect was reported by Mallik Hassan in #33032. This closes #33032. #33026 (alexey-milovidov).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Several fixes for format parsing. This is relevant if clickhouse-server is open for write access to an adversary. Specifically crafted input data for Native format may lead to reading uninitialized memory or crash. #33050 (Heena Bansal). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee). Fix null pointer dereference in LowCardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
- ClickHouse Keeper handler will correctly remove operation when response sent. #32988 (JackyWoo).
- Potential off-by-one miscalculation of quotas: quota limit was not reached, but the limit was exceeded. This fixes #31174. #31656 (sunny).
- Fixed CASTing from String to IPv4 or IPv6 and back. Fixed error message in case of failed conversion. #29224 (Dmitry Novik) #27914 (Vasily Nemkov).
- Fixed an exception like Unknown aggregate function nothing during an execution on a remote server. This fixes #16689. #26074 (hexiaoting).
- Fix wrong database for JOIN without explicit database in distributed queries (Fixes: #10471). #33611 (Azat Khuzhin).
- Fix segfault in Apache Avro format that appears after the second insert into file. #33566 (Kruglov Pavel).
- Fix segfault in Apache Arrow format if schema contains Dictionary type. Closes #33507. #33529 (Kruglov Pavel).
- Out of band offset and limit settings may be applied incorrectly for views. Close #33289 #33518 (hexiaoting).
- Fix an exception Block structure mismatch which may happen during insertion into table with default nested LowCardinality column. Fixes #33028. #33504 (Nikolai Kochetov).
- Fix dictionary expressions for range_hashed range min and range max attributes when created using DDL. Closes #30809. #33478 (Maksim Kita).
- Fix possible use-after-free for INSERT into Materialized View with concurrent DROP (Azat Khuzhin).
- Do not try to read past EOF (to work around a bug in the Linux kernel). This bug can be reproduced on kernels (3.14..5.9) and requires index_granularity_bytes=0 (i.e. adaptive index granularity turned off). #33372 (Azat Khuzhin).
- The commands SYSTEM SUSPEND and SYSTEM ... THREAD FUZZER missed access control. It is fixed. Author: Kevin Michel. #33333 (alexey-milovidov).
- Fix when COMMENT for dictionaries does not appear in system.tables, system.dictionaries. Allow to modify the comment for Dictionary engine. Closes #33251. #33261 (Maksim Kita).
- Add asynchronous inserts (with enabled setting async_insert) to query log. Previously such queries didn't appear in the query log. #33239 (Anton Popov).
- Fix sending WHERE 1 = 0 expressions for external databases query. Closes #33152. #33214 (Kseniia Sumarokova).
- Fix DDL validation for MaterializedPostgreSQL. Fix setting materialized_postgresql_allow_automatic_update. Closes #29535. #33200 (Kseniia Sumarokova). Make sure unused replication slots are always removed. Found in #26952. #33187 (Kseniia Sumarokova). Fix MaterializedPostgreSQL detach/attach (removing / adding to replication) tables with non-default schema. Found in #29535. #33179 (Kseniia Sumarokova). Fix DROP MaterializedPostgreSQL database. #33468 (Kseniia Sumarokova).
- The metric StorageBufferBytes sometimes was miscalculated. #33159 (xuyatian).
- Fix error Invalid version for SerializationLowCardinality key column in case of reading from LowCardinality column with local_filesystem_read_prefetch or remote_filesystem_read_prefetch enabled. #33046 (Nikolai Kochetov).
- Fix s3 table function reading empty file. Closes #33008. #33037 (Kseniia Sumarokova).
- Fix Context leak in case of cancel_http_readonly_queries_on_client_close (i.e. leaking of external tables that had been uploaded to the server and other resources). #32982 (Azat Khuzhin).
- Fix wrong tuple output in CSV format in case of custom csv delimiter. #32981 (Kruglov Pavel).
- Fix HDFS URL check that didn't allow using HA namenode address. Bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/31042. #32976 (Kruglov Pavel).
- Fix throwing exception like positional argument out of bounds for non-positional arguments. Closes #31173#event-5789668239. #32961 (Kseniia Sumarokova).
- Fix UB in case of unexpected EOF during filling a set from HTTP query (i.e. if the client interrupted in the middle, i.e. timeout 0.15s curl -Ss -F 's=@t.csv;' 'http://127.0.0.1:8123/?s_structure=key+Int&query=SELECT+dummy+IN+s' and with large enough t.csv). #32955 (Azat Khuzhin).
- Fix a regression in replaceRegexpAll function. The function worked incorrectly when matched substring was empty. This closes #32777. This closes #30245. #32945 (alexey-milovidov).
- Fix ORC format stripe reading. #32929 (kreuzerkrieg).
- topKWeightedState failed for some input types. #32487. #32914 (vdimir).
- Fix exception Single chunk is expected from view inner query (LOGICAL_ERROR) in materialized view. Fixes #31419. #32862 (Nikolai Kochetov).
- Fix optimization with lazy seek for async reads from remote filesystems. Closes #32803. #32835 (Kseniia Sumarokova).
- MergeTree table engine might silently skip some mutations if there are too many running mutations or in case of high memory consumption, it's fixed. Fixes #17882. #32814 (tavplubix).
- Avoid reusing the scalar subquery cache when processing MV blocks. This fixes a bug when the scalar query references the source table, but it means that all scalar subqueries in the MV definition will be calculated for each block. #32811 (Raúl Marín).
- Server might fail to start if database with MySQL engine cannot connect to MySQL server, it's fixed. Fixes #14441. #32802 (tavplubix).
- Fix crash when using fuzzBits function, close #32737. #32755 (SuperDJY).
- Fix error Column is not under aggregate function in case of MV with GROUP BY (list of columns) (which is parsed as GROUP BY tuple(...)) over Kafka/RabbitMQ. Fixes #32668 and #32744. #32751 (Nikolai Kochetov).
- Fix ALTER TABLE ... MATERIALIZE TTL query with TTL ... DELETE WHERE ... and TTL ... GROUP BY ... modes. #32695 (Anton Popov).
- Fix optimize_read_in_order optimization in case when table engine is Distributed or Merge and its underlying MergeTree tables have monotonous function in prefix of sorting key. #32670 (Anton Popov).
- Fix LOGICAL_ERROR exception when the target of a materialized view is a JOIN or a SET table. #32669 (Raúl Marín).
- Inserting into S3 with multipart upload to Google Cloud Storage may trigger abort. #32504. #32649 (vdimir).
- Fix possible exception at RabbitMQ storage startup by delaying channel creation. #32584 (Kseniia Sumarokova).
- Fix table lifetime (i.e. possible use-after-free) in case of parallel DROP TABLE and INSERT. #32572 (Azat Khuzhin).
- Fix async inserts with formats CustomSeparated, Template, Regexp, MsgPack and JSONAsString. Previously the async inserts with these formats didn't read any data. #32530 (Kruglov Pavel).
- Fix groupBitmapAnd function on distributed table. #32529 (minhthucdao).
- Fix crash in JOIN found by fuzzer, close #32458. #32508 (vdimir).
- Proper handling of the case with Apache Arrow column duplication. #32507 (Dmitriy Mokhnatkin).
- Fix issue with ambiguous query formatting in distributed queries that led to errors when some table columns were named ALL or DISTINCT. This closes #32391. #32490 (alexey-milovidov).
- Fix failures in queries that are trying to use skipping indices, which are not materialized yet. Fixes #32292 and #30343. #32359 (Anton Popov).
- Fix broken select query when there are more than 2 row policies on the same column, beginning with the second query in the same session. #31606. #32291 (SuperDJY).
- Fix fractional unix timestamp conversion to DateTime64, fractional part was reversed for negative unix timestamps (before 1970-01-01). #32240 (Ben).
- Some entries of replication queue might hang for temporary_directories_lifetime (1 day by default) with Directory tmp_merge_<part_name> or Part ... (state Deleting) already exists, but it will be deleted soon or similar error. It's fixed. Fixes #29616. #32201 (tavplubix).
- Fix parsing of APPLY lambda column transformer which could lead to client/server crash. #32138 (Kruglov Pavel).
- Fix base64Encode adding trailing bytes on small strings. #31797 (Kevin Michel).
- Fix possible crash (or incorrect result) in case of LowCardinality arguments of window function. Fixes #31114. #31888 (Nikolai Kochetov).
- Fix hang up with command DROP TABLE system.query_log sync. #33293 (zhanghuajie).
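The DateTime64 entry above concerns the fractional part of negative Unix timestamps. As a rough illustration of why values before 1970-01-01 are easy to get wrong, the Python sketch below splits a scaled integer tick count into whole seconds and a sub-second part in two ways. It is not claimed to reproduce the actual defect fixed in #32240; both function names are hypothetical.

```python
def split_ticks_floor(ticks, scale):
    """Floor-based split: the sub-second part is always in [0, 10**scale)."""
    return divmod(ticks, 10 ** scale)


def split_ticks_truncating(ticks, scale):
    """Truncation toward zero: negative inputs yield a negative sub-second part."""
    base = 10 ** scale
    whole = ticks // base if ticks >= 0 else -((-ticks) // base)
    return whole, ticks - whole * base


# -1.250 seconds relative to the epoch, stored with 3 decimal digits.
ticks = -1250
print(split_ticks_floor(ticks, 3))
print(split_ticks_truncating(ticks, 3))
```

Both results describe the same instant, but only the floor-based form keeps the fractional part non-negative, so code that formats or compares the two components has to agree on one convention.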