Short Description:
In this article we’ll cover a number debugging and monitoring features introduced in Apache Storm 1.0.
Article
Apache Storm is a distributed real-time computation system for processing large volumes of high-velocity data. Debugging a distributed system is inherently hard since there are many moving parts spread across a large number of hosts in a cluster. Tracing failures to a particular component or a host in the system is hard and requires one to collect and analyze a lot of logs and debug/trace processes running in the distributed cluster.
In the past the support for debugging topologies in an Apache storm cluster was very minimal. Users would have to first turn on debug logs, restart the workers, redeploy the topologies and then collect logs from different worker nodes and analyze them. If they wanted to take a jstack dump or profile the workers, they would have to login to the worker hosts and run the commands manually which was painful. If one wanted to inspect the tuples flowing through the topology pipeline, they would have to add special “debug” bolts to log the tuples to some log and then remove the debug bolts from the topology before putting it back into production.
In this article we will go through the new features added in Apache Storm 1.0 that makes the various aspects of debugging a storm topology much easier and user friendly.
1. Dynamic Log Levels
Storm allows users and administrators to dynamically change the log level settings of a running topology both from the Storm UI and the command line. None of the Storm processes need to be restarted for the settings to take effect. Users can also specify an optional timeout after which those changes will be automatically reverted. The resulting log files are also easily searchable from the Storm UI and logviewer service.
The log level settings apply the same way as you’d expect from log4j. If you set the log level of a parent logger, the child loggers start using that level (unless the children have a more restrictive level already).
1.1. Enabling via Storm UI
In order to set a level, click on a running topology and then click on “Change Log Level” in the Topology Actions section.
Figure 1: Changing the log levels of a topology.
Next, provide the logger name, select the level you want (e.g. WARN), and a timeout in seconds (or 0 if not needed). Then click on “Add”.
In the example above (Figure 1), while running a storm starter topology, the root logger is set to ERROR and storm.starter is set to DEBUG. That way one can look more specifically into the debug logs from the “storm.starter” packages while the other logs are suppressed.
To clear the log level click on the “Clear” button. This reverts the log level back to what it was before you added the setting. The log level line will disappear from the UI.
Figure 2: Clearing the dynamic log levels.
1.2. Using the CLI
Log level can be set via the command line as follows,
1 | ./bin/storm set_log_level [topology name]-l [logger name]=[LEVEL]:[TIMEOUT] |
For example:
1 | ./bin/storm set_log_level my_topology -l ROOT=DEBUG:30Sets the ROOT logger to DEBUG for 30 seconds. |
Clears the ROOT logger dynamic log level, resetting it to its original value.
See JIRA STORM-412 for more details.
2. Topology Event logging
Topology event inspector provides the ability to view the tuples as they flow through different stages in a storm topology. This could be useful for inspecting the tuples emitted from a spout or a bolt in the topology pipeline while the topology is running, without stopping or redeploying the topology. The normal flow of tuples from the spouts to the bolts is not affected by turning on event logging.
2.1. Enabling event logging
Note: Event logging is disabled by default and needs to be enabled first by setting the storm config “topology.eventlogger.executors” to a non zero value. Please see the Configuration section for more details.
Events can be logged by clicking the “Debug” button under the topology actions in the topology view. This logs the tuples from all the spouts and bolts in a topology at the specified sampling percentage.
Figure 3: Enable event logging at topology level.
You could also enable event logging at a specific spout or bolt level by going to the corresponding component page and clicking “Debug” under component actions.
Figure 4: Enable event logging at component level.
2.2. Viewing the event logs
The Storm “logviewer” should be running for viewing the logged tuples. If not already running log viewer can be started by running the “bin/storm logviewer” command from the storm installation directory. For viewing the tuples, go to the specific spout or bolt component page from storm UI and click on the “events” link under the component summary (as highlighted in Figure 4 above).
This would open up a view like below where you can navigate between different pages and view the logged tuples.
Figure 5: Viewing the logged events.
Each line in the event log contains an entry corresponding to a tuple emitted from a specific spout/bolt in a comma separated format:
1 | Timestamp, Component name, Component task-id, MessageId (in case of anchoring), List of emitted values |
2.3. Disabling the event logs
Event logging can be disabled at a specific component or at the topology level by clicking the “Stop Debug” under the topology or component actions in the Storm UI.
Figure 6: Disable event logging at topology level.
2.4. Configuration
Event logging sends events (tuples) from each component to an internal eventlogger bolt. By default Storm does not start any event logger tasks due to a slight performance degradation when event logging is turned on. This can be changed by setting the below parameter when submitting your topology (either globally in the storm.yaml or using the command line options).
Parameter | Meaning | When to use |
---|---|---|
topology.eventlogger.executors: 0 | No event logger tasks are created (default). | If you don’t intend to inspect tuples and don’t want the slight performance hit. |
topology.eventlogger.executors: 1 | One event logger task for the topology. | If you want to sample a low percentage of tuples from a specific spout or a bolt. This could be the most common use case. |
topology.eventlogger.executors: nil | One event logger task per worker. | If you want to sample entire topology (all spouts and bolt) at a very high sampling percentage and the tuple rate is very high. |
2.5. Extending event logging
Storm provides the IEventLogger interface, which is used by the event logger bolt to log the events. The default implementation is FileBasedEventLogger, which logs the events to an events.log file ( logs/workers-artifacts/<topology-id>/<worker-port>/events.log). Alternative implementations of the IEventLogger interface can be added to extend the event logging functionality (say build a search index or log the events into a database)
1 | /** |
See JIRA STORM-954 for more details.
3. Distributed Log Search
Another improvement to Storm’s UI is the addition of distributed log search. This capability allows users to search across all log files of a specific topology, including archived logs. The search results will include matches from all supervisor nodes.
Figure 7: A distributed log search output
This feature is very helpful to search for patterns across workers or supervisors of a topology. Similar log search is also supported within specific worker log files via the UI.
See JIRA STORM-902 for more details.
4. Dynamic Worker Profiling
Users can now request worker profile data directly from Storm UI, including Heap dumps, JStack output and JProfile recordings without restarting their topologies.
The generated files are then available for download for offline analysis via log viewer with various debugging tools. It is also now possible to restart workers via the Storm UI.
Figure 8: Profiling and debugging workers
The output can be viewed from the worker log viewer UI by switching to the appropriate file.
Figure 9: Jstack output viewed from the Storm UI
See JIRA STORM-1157 for more details.
5. Conclusion
We covered some of the new features added in Apache Storm 1.0 that should make the various aspects of debugging a Storm topology much easier, intuitive and more user friendly. These features allow Apache Storm users to quickly troubleshoot their topologies whenever any issues are encountered.
参考链接: