Title: Statistic oriented video coding and streaming methods with future insight
Author: Yu, L.
Awarding Body: University of Liverpool
Current Institution: University of Liverpool
Date of Award: 2016
As indicated by Cisco, IP video traffic represented 70 percent of all consumer Internet traffic globally in 2015, and it is expected to reach 82 percent by 2020. Given this, research on video compression, video transmission, and interactive playback is of vital importance. Most existing works solve one step of these tasks based on currently and/or previously acquired information. One common challenge behind all these tasks is uncertainty about the future. For example, the Dynamic Adaptive Streaming over HTTP (DASH) standard provides multiple quality levels to choose from for each video block. The benefit of offering several options is that streaming can adapt to bandwidth fluctuations and to varying client device capabilities. Most methods predict the bitrate of future video blocks from the already downloaded ones, which is usually imprecise. As a result, the mismatch between the predicted and actual bitrate of the chosen video block leads to latency or inefficient use of the bandwidth. Thus, one of our works proposes sending the exact bitrate information of all video blocks to the client at the beginning to avoid such problems. To sum up, the focus of this thesis is solving video coding and streaming problems with future insight. By analyzing the uncertainties of future information statistically, more efficient and suitable solutions are derived. This thesis describes how each problem is solved with future insight. In video compression, inter prediction, which removes temporal redundancies between frames, is one of the biggest contributors to the compression ratio. However, it is also one of the most computationally complex processes. Thus, the ideal scenario is that inter prediction is performed only within necessary areas, where similar content exists for reference.
However, existing encoding standards, such as H.264 and H.265, simply use inter prediction for all reference frames following a fixed prediction structure. Thus, resources are wasted on performing inter prediction in unnecessary areas that have a low probability of being referenced. Inspired by this idea, a statistical approach for motion estimation skipping (SAMEK) is proposed to recognize these unnecessary areas and avoid using them in the motion estimation stage while encoding future frames. By doing so, the overall complexity and encoding time are reduced. After the compression process (source coding), channel coding is needed to protect video content transmitted over unreliable networks. The Reed-Solomon (RS) erasure code is one of the most popular error-correcting codes; it detects and recovers erasures by adding parity packets. These parity packets should be optimally allocated according to the importance of each video packet. The importance of each packet can be evaluated through its influence on the quality of the whole video. Thus, using knowledge of the potential future influence of each packet, a rate-distortion optimized redundancy allocation scheme is proposed to automatically allocate parity packets based on the network conditions and video characteristics. RS-based error control mechanisms are usually used for real-time streaming over unreliable networks, such as IP and UDP, whereas for delay-insensitive video streaming over the reliable protocol TCP, DASH is commonly adopted. DASH is the de facto video delivery mechanism nowadays, taking advantage of the existing low-cost and widespread HTTP platforms. So far, most DASH works focus on CBR (constant bitrate) video delivery. The bit rate of CBR video is kept constant over each segment. In this thesis, VBR (variable bitrate) video delivery is investigated instead. Since the quality is kept constant in VBR video, the bit rate of each segment fluctuates.
Thus, it is important to know the instantaneous bit rate of future segments beforehand. In the proposed method, accurate bit rate information for every segment is sent at the beginning of a streaming session. Then, the proposed internal QoE (Quality of Experience) goal function takes the expected future influence of each request on buffer reservation into consideration. In addition to effective video streaming, user demands are increasing with the emergence of interactive multiview video streaming platforms, which provide immersive vision, seamless view switching, and interactive involvement. A probabilistic navigation model, which predicts the views that might be watched by the user, is incorporated in the proposed convolutional neural network (CNN) assisted seamless multiview video streaming and navigation system to guide the download of future views. In addition, a bit allocation mechanism guided by the navigation model is developed to prefetch all views that might be watched while adapting to network fluctuations. Besides, a CNN-assisted multiview representation method is proposed to prepare the multiview videos at the server. The proposed representation method maintains satisfactory compression efficiency while allowing random access to any subset of views with dynamic qualities. All the above methods work closely together to provide a seamless viewing experience to users. They can be integrated into any existing multiview video streaming framework to enhance overall performance. The main contribution of this thesis is incorporating future insight into various tasks related to video coding and streaming. By leveraging the methods proposed in this thesis, efficient results can be obtained in different application scenarios.
For example, with the proposed SAMEK method, up to 9.5% of encoding time (6.87% on average) is saved with negligible rate-distortion losses (0.006 dB on average) compared with the classical HEVC encoder. With the proposed RS redundancy allocation scheme, an average gain of 1 dB over the state-of-the-art approach is achieved. The proposed multiview video streaming and navigation system improves the overall quality over the benchmark by 0.6 dB on average at a lower bitrate.
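The abstract's point about VBR streaming, that knowing the exact size of each upcoming segment avoids mismatches between predicted and actual bitrate, can be illustrated with a minimal sketch. This is not the thesis's QoE goal function, which the abstract does not specify; it is a simple buffer-reserve rule written under stated assumptions. The function name `pick_quality` and all parameters (`sizes_bits`, `throughput_bps`, `buffer_s`, `segment_s`, `reserve_s`) are hypothetical and chosen only for illustration.

```python
from typing import List


def pick_quality(sizes_bits: List[float], throughput_bps: float,
                 buffer_s: float, segment_s: float, reserve_s: float) -> int:
    """Pick the highest quality level that keeps the buffer above a reserve.

    sizes_bits[q] is the exact size (in bits) of the next segment at quality
    level q, ordered from lowest to highest quality. In this sketch the sizes
    are assumed to be known in advance, e.g. sent at session start.
    """
    best = 0  # fall back to the lowest quality if no level satisfies the rule
    for q, size in enumerate(sizes_bits):
        download_s = size / throughput_bps        # exact download time
        drained = buffer_s - download_s           # buffer drains while downloading
        buffer_after = drained + segment_s        # then gains one segment of playback
        if drained >= 0 and buffer_after >= reserve_s:
            best = q
    return best


# A segment whose sizes match the nominal quality ladder.
typical = pick_quality([1e6, 2.5e6, 6e6], throughput_bps=2e6,
                       buffer_s=4.0, segment_s=2.0, reserve_s=4.0)

# A complex VBR segment: the middle level is far larger than its average.
complex_scene = pick_quality([1e6, 7e6, 9e6], throughput_bps=2e6,
                             buffer_s=4.0, segment_s=2.0, reserve_s=4.0)
```

With the same throughput and buffer, the sketch selects the middle level for the typical segment but drops to the lowest level for the complex one, a decision that a client relying only on the average bitrate of earlier segments could not make reliably.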
Supervisor: Tillo, Tammam
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral