It’s my pleasure to announce the release of Debezium0.7.2!

Amongst the new features there’s support for geo-spatial types, a new snapshotting mode for recovering a lost DB history topic for the MySQL connector, and a message transformation for converting MongoDB change events into a structure which can be consumed by many more sink connectors. And of course we fixed a whole lot of bugs, too.

Debezium 0.7.2 is a drop-in replacement for previous 0.7.x versions. When upgrading from versions earlier than 0.7.0, please check out therelease notesof all 0.7.x releases to learn about any steps potentially required for upgrading.

A big thank you goes out to our fantastic community members for their hard work on this release:Andrey Pustovetov,Denis Mikhaylov,Peter Goransson,Robert Coup,Sairam PolavarapuandTom Bentley.

Now let’s take a closer look at some of new features.

MySQL Connector

The biggest change of the MySQL connector issupport for geo-spatial column typessuch asGEOMETRY,多边形,MULTIPOINTetc.

There are two new logical field types — io.debezium.data.geometry.Geometryandio.debezium.data.geometry.Geography — for representing geo-spatial columns in change data messages. These types represent geo-spatial data via WKB ("well-known binary") and SRID (coordinate reference system identifier), allowing downstream consumers to interpret the change events using any existing library with support for parsing WKB. A blog post with more details on this will follow soon.

Thenew snapshotting modeschema_only_recoverycomes in handy when for some reason you lost (parts of) the DB history topic used by the MySQL connector. It’s also useful if you’d like to compact that topic by re-creating it. Please refer to theconnector documentationfor the details of this mode, esp. when it’s safe (and when not) to make use of it.

Another new feature related to managing the size of the DB history topic isthe optionto control whether to include all DDL events or only those pertaining to tables captured as per the whitelist/blacklist configuration. Again, check out the connector docs to learn more about the specifics of that setting.

Finally, we fixed a few shortcomings of the MySQL DDL parser (DBZ-524,DBZ-530).

PostgreSQL Connector

Similar to the MySQL connector, there’s largely improved support for geo-spatial columns in Postgres now. More specifically, PostGIS column types can be represented in change data events now. Thanks a lot for Robert Coup who contributed this feature!

Also the support for Postgresarray columnshas been expanded, e.g. we now support to track changes toVARCHARandDATEarray columns. Note that the connector doesn’t yet work with geo-spatial array columns (should you ever have those), but thisshould be addedsoon, too.

If you’d like to include just a subset of the rows of a captured table in snapshots, you may like the ability tospecify dedicated SELECT statementsto do so. For instance this can be used to exclude any logically deleted records — which you can recognize based on some flag in that table — from the snapshot.

A few bugs in this connector where reported and fixed by community members, too, e.g. the connector can becorrectly pausednow (thanks, Andrey Pustovetov), and we fixed an issue which could potentially have committed anincorrect offsetto Kafka Connect (thanks, Thon Mekathikom).

MongoDB Connector

If you’ve ever compared the structures of change events emitted by the Debezium RDBMS connectors (MySQL, Postgres) and the MongoDB connector, you’ll know that the message structure of the latter is a bit different than the others. Due to the schemaless nature of MongoDB, the change events essentially contain a String with a JSON representation of the applied insert or patch. This structure cannot be consumed by existing sink connectors, such as the Confluent connectors for JDBC or Elasticsearch.

This getspossible nowby means of a newly added single message transformation (SMT), which parses these JSON strings and creates a structured Kafka Connect record from it (thanks, Sairam Polavarapu!). When applying this SMT to the JDBC sink connector, you can now stream data changes from MongoDB to any supported relational database.

Note that this SMT is work-in-progress, details of its emitted message structure may still change. Also there are some inherent limitations to what can be achieved with it, if you e.g. have arrays in your MongoDB documents, the record created by this SMT will be structured accordingly, but many sink connectors cannot process such structure.

We have some ideas for further development here, e.g. there could bean optionfor flattening out (non-array) nested structures, so that e.g.{“地址”{“街头”:“……”}}would be represented asaddress_street, which then could be consumed by sink connectors expecting a flat structure.

The new SMT is described in detail inour docs.

What’s next?

Please see thefull change logfor more details and the complete list of issues fixed in Debezium 0.7.2.

The 0.7.3 release is scheduled for February 14th.

We’ll focus on some more bug fixes, also we’re working on having Debezium regulary emitheartbeat messagesto a dedicated topic. This will be practical for diagnostic purposes but also help to regularly trigger commits of the offset in Kafka Connect. That’s beneficial in certain situations when capturing tables which only very infrequently change.

We’ve also worked outa roadmapdescribing our ideas for future work on Debezium, going beyond the next bugfix releases. While nothing is cast in stone, this is our idea of the features to add in the coming months. If you miss anything important on this roadmap, please tell us either in the comments below or send a message to our Google group. Looking forward to your feedback!

Gunnar Morling

Gunnar is a software engineer at Decodable and an open-source enthusiast by heart. He has been the project lead of Debezium over many years. Gunnar has created open-source projects like kcctl, JfrUnit, and MapStruct, and is the spec lead for Bean Validation 2.0 (JSR 380). He’s based in Hamburg, Germany.


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top ofKafkaand providesKafka Connectcompatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium isopen sourceunder theApache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter@debezium,chat with us on Zulip, or join ourmailing listto talk with the community. All of the code is open sourceon GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know orlog an issue.

Baidu
map