WeChat (Chinese: 微信; pinyin: Wēixìn; lit. 'micro-message') is a Chinese instant messaging, social media, and mobile payment app developed by Tencent. First released in 2011, it became the world's largest standalone mobile app in 2018,[2][3] with over 1 billion monthly active users.[4][5][6] WeChat has been described as China's "app for everything" and a super-app because of its wide range of functions.[7]
WeChat provides many features similar to Snapchat, such as text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video calls and conferencing, video games, photograph and video sharing, and location sharing.[33] WeChat also allows users to exchange contacts with people nearby via Bluetooth, and offers various features for contacting strangers who have opted in to being contacted. It can also integrate with other social networking services such as Facebook and Tencent QQ.[34] Photographs may be embellished with filters and captions, and an automatic translation service is available.
Tencent targets video conferencing
Meeting solutions combine communications, collaboration, and content sharing to facilitate virtual meetings. Video conferencing gained significant traction after the coronavirus outbreak, when enterprises and governments had to act quickly to keep operations running; video conferencing became the way to connect employees and customers. However, moving the workforce to remote operation is not the same as ensuring that the assets it needs to access are properly secured. As a result, the need for strong visibility and security, together with granular application control, is more important than ever.
App-ID provides visibility into the video conferencing apps on your network. App-ID focuses on application identification and on in-app functions (e.g., meeting, messaging, desktop sharing, and remote access), along with file transfer capabilities such as download and upload. Administrators can block or control functions they deem risky, such as file transfer.
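To make the idea concrete, here is a minimal Python sketch of how a policy engine might act on App-ID-style classifications. The application names, function labels, and the `AppSession` structure are hypothetical illustrations for this article, not an actual vendor API.

```python
from dataclasses import dataclass

# Hypothetical result of App-ID-style classification for one session:
# the application plus the in-app function observed in the traffic.
@dataclass(frozen=True)
class AppSession:
    app: str        # e.g. "zoom", "webex"
    function: str   # e.g. "meeting", "file-upload", "remote-access"

# Hypothetical policy: allow the core meeting function, but deny
# risky in-app functions such as file transfer and remote access.
BLOCKED_FUNCTIONS = {"file-upload", "file-download", "remote-access"}

def evaluate(session: AppSession) -> str:
    """Return 'allow' or 'deny' for a classified session."""
    if session.function in BLOCKED_FUNCTIONS:
        return "deny"
    return "allow"

print(evaluate(AppSession("zoom", "meeting")))      # allow
print(evaluate(AppSession("zoom", "file-upload")))  # deny
```

The key point the sketch illustrates is that policy decisions key on the identified in-app function, not merely on the application name.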
In a cyber-attack chain, delivery is the phase in which the attacker sends a malicious payload to the victim. For example, an attacker could place a remote access Trojan inside a file that appears to contain important information, share it in a meeting, and lure users into downloading and running it. In Zoombombing incidents, for instance, a teleconferencing session is hijacked: attackers who join a meeting without authorization can upload and distribute malicious content into the video-conference room, potentially infecting the endpoints or mobile devices of participants who unwittingly download the files.
To minimize this threat, network security administrators can use App-ID to create a policy that granularly blocks file upload/download functions for specific users or groups. At a minimum, it is advisable to block file types known to carry threats or that have no real use case for upload/download, as in the sketch below. For policy rules that allow video conferencing applications, you can be strict with file blocking because of the risk that users unknowingly download malicious files.
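Extending the earlier hypothetical model, a file-blocking rule might key on file type and transfer direction. The rule structure, file-type names, and actions here are invented for illustration only.

```python
# Hypothetical file-blocking rules: (file_type, direction) -> action.
# Types known to carry threats are denied in both directions.
FILE_BLOCKING_RULES = {
    ("exe", "download"): "deny",
    ("exe", "upload"): "deny",
    ("bat", "download"): "deny",
    ("pdf", "upload"): "alert",   # allowed, but logged for review
}

def file_action(file_type: str, direction: str) -> str:
    """Look up the action for a file transfer; unknown combinations
    are allowed here, though a default-deny posture is stricter."""
    return FILE_BLOCKING_RULES.get((file_type, direction), "allow")

print(file_action("exe", "download"))  # deny
print(file_action("docx", "upload"))   # allow
```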
Uploading sensitive data to a third party through unmonitored software is dangerous and costly, and the application's network traffic can look innocuous, like that of any meeting solution. This type of data exfiltration typically originates with malicious insiders. Using App-ID reduces the risk of such incidents: policy restrictions on file uploads in video conferencing applications protect organizations against transferring sensitive data to vulnerable or unapproved systems.
Remote access to an organization can give an attacker the ability to manipulate and subvert its systems from outside the physical security perimeter, so unauthorized remote connections via video conferencing applications should be restricted by establishing and enforcing policies. Allowing remote control functionality in meeting solutions exposes users to remote access attacks that employees are usually unaware of, and it can put the organization at a severe security disadvantage by giving attackers the opportunity to gain remote access to a system. Once inside, attackers may upload malicious code, exfiltrate sensitive data, and use the compromised machine to attack other workstations or networks in the same environment.
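As a final illustration in the same hypothetical policy model, remote-control functionality can be denied by default and permitted only for an explicitly approved group; the group names are invented for the example.

```python
# Hypothetical group-based exception: remote access inside meeting
# apps is denied for everyone except an explicitly approved group.
APPROVED_REMOTE_ACCESS_GROUPS = {"it-support"}

def remote_access_allowed(user_group: str) -> bool:
    """Default-deny: only listed groups may use remote control."""
    return user_group in APPROVED_REMOTE_ACCESS_GROUPS

print(remote_access_allowed("it-support"))  # True
print(remote_access_allowed("sales"))       # False
```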
Video coding systems were originally developed for TV broadcasting services over bandwidth-limited satellite and cable networks, and were later applied to surveillance and internet video. These systems target higher compression ratios with lower quality loss under a rate-distortion optimization (RDO) trade-off, judged by human experts. In other words, current video coding standards are designed for human visual perception, not for machine intelligence. However, more and more industrial applications today require video coding for machines (VCM), which compresses images and video for machine usage (object detection and tracking, image classification, event analysis, and so on) and targets higher compression ratios with higher recognition accuracy under a rate-accuracy optimization (RAO) trade-off, judged by the system. In this case, video coding must perform feature compression, preserving and transmitting the information most critical for computer vision and pattern recognition rather than for human visual perception. Video coding for humans and video coding for machines are therefore quite different, even though the two systems will coexist for a long time. In this talk, I will introduce the history of VCM, review early work on pattern analysis in the compressed data domain, cover the ISO/IEC MPEG efforts on MPEG-7 CDVS (Compact Descriptors for Visual Search) and CDVA (Compact Descriptors for Video Analysis) as well as ongoing projects in the AVS and MPEG working groups, present the key techniques and challenges of VCM, and give an overview of its future.
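The contrast between the two optimization models can be written schematically as follows. The RDO form is the standard Lagrangian formulation; the RAO form is an analogous sketch for exposition, not a formula taken from the talk.

```latex
% Classical rate-distortion optimization (RDO): minimize distortion D
% under a rate budget via the Lagrangian cost with multiplier \lambda.
\min_{\text{coding params}} \; J_{\mathrm{RDO}} = D + \lambda R

% Schematic rate-accuracy optimization (RAO) for VCM: trade rate R
% against task accuracy A (e.g., detection or classification accuracy).
\min_{\text{coding params}} \; J_{\mathrm{RAO}} = (1 - A) + \lambda R
```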
Multi-Object Tracking (MOT) and person search both require localizing and identifying specific targets in raw image frames. Existing methods can be classified into two categories, namely the two-step strategy and the end-to-end strategy. Two-step approaches have high accuracy but suffer from costly computation, while end-to-end methods show greater efficiency but limited performance. In this paper, we dissect the gap between the two-step and end-to-end strategies and propose a simple yet effective end-to-end framework with knowledge distillation. Our proposed framework is simple in concept and easily benefits from external datasets. Experimental results demonstrate that our model performs competitively with sophisticated two-step and end-to-end methods in multi-object tracking and person search.
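The abstract does not spell out its distillation objective, so the PyTorch sketch below shows only the generic knowledge-distillation pattern it builds on: the student is trained on the task loss plus a KL term that matches the teacher's softened predictions. The temperature T and weight alpha are illustrative defaults, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Generic knowledge distillation: task loss + softened KL term."""
    task = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 scaling keeps gradient magnitudes comparable
    return alpha * task + (1.0 - alpha) * kd
```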
Prevailing Multiple Object Tracking (MOT) works following the Tracking-by-Detection (TBD) paradigm pay most of their attention either to object detection in a first step or to data association in a second step. In this paper, we approach the MOT problem from a different perspective by directly obtaining the embedded spatial-temporal information of trajectories from raw video data. To this end, we propose a joint trajectory-locating and attribute-encoding framework for real-time, online MOT. We first introduce a trajectory attribute representation scheme designed for each tracked target (instead of each object), in which the extracted Trajectory Map (TM) encodes the spatial-temporal attributes of a trajectory across a window of consecutive video frames. We then present a Temporal Priors Embedding (TPE) method to infer these attributes with a logical reasoning strategy based on long-term feature dynamics. The proposed MOT framework projects multiple attributes of tracked targets, e.g., presence, enter/exit, location, scale, and motion, into a continuous TM to perform one-shot regression for real-time MOT. Experimental results show that our proposed video-based method runs at 33 FPS and is more accurate and robust than detection-based tracking methods and several other state-of-the-art (SOTA) approaches on the MOT16/17/20 benchmarks.
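The abstract leaves the exact layout of the Trajectory Map unspecified; the NumPy sketch below is one plausible reading, in which each tracked target gets a per-frame attribute vector (presence, center, scale, motion) stacked over a temporal window. The shapes and attribute ordering are assumptions made for illustration.

```python
import numpy as np

WINDOW = 16   # temporal window length (assumed)
ATTRS = 7     # presence, cx, cy, w, h, vx, vy (assumed ordering)

def empty_trajectory_map(num_targets: int) -> np.ndarray:
    """One attribute row per frame, stacked per tracked target."""
    return np.zeros((num_targets, WINDOW, ATTRS), dtype=np.float32)

tm = empty_trajectory_map(num_targets=4)
# Mark target 0 present at frame 3 with a box centered at (0.5, 0.4),
# size (0.1, 0.2), and small rightward motion (normalized coordinates).
tm[0, 3] = [1.0, 0.5, 0.4, 0.1, 0.2, 0.02, 0.0]
print(tm.shape)  # (4, 16, 7)
```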
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in a given video with only video-level categorical supervision. Previous works use the appearance and motion features extracted by a pre-trained feature encoder directly, e.g., via feature concatenation or score-level fusion. In this work, we argue that features extracted by pre-trained extractors such as I3D, which are trained for trimmed-video action classification rather than for the WS-TAL task, carry inevitable redundancy and lead to sub-optimal results. Feature re-calibration is therefore needed to reduce the task-irrelevant information redundancy. Here, we propose a cross-modal consensus network (CO2-Net) to tackle this problem. In CO2-Net, we introduce two identical cross-modal consensus modules (CCM) that implement a cross-modal attention mechanism to filter out task-irrelevant redundancy using the global information from the main modality and the cross-modal local information from the auxiliary modality. Moreover, we further exploit inter-modality consistency: we treat the attention weights derived from each CCM as pseudo-targets for the attention weights derived from the other CCM, maintaining consistency between the predictions of the two CCMs in a mutual learning manner. Finally, we conduct extensive experiments on two commonly used temporal action localization datasets, THUMOS14 and ActivityNet1.2, and achieve state-of-the-art results. The experimental results show that our proposed cross-modal consensus module produces more representative features for temporal action localization.
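Since the abstract only names the mechanism, the PyTorch sketch below shows one plausible form of such a cross-modal gate: channel attention computed from the main modality's global (temporally pooled) feature and the auxiliary modality's local features, used to re-weight the main features, with a mutual-learning consistency term between the two modules' attention weights. The dimensions and sigmoid gating are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Illustrative cross-modal channel attention (assumed design).

    main, aux: (batch, time, dim) features from two modalities,
    e.g., RGB (appearance) and optical flow (motion).
    """
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, main: torch.Tensor, aux: torch.Tensor):
        global_main = main.mean(dim=1, keepdim=True)           # (B, 1, D)
        global_main = global_main.expand_as(aux)               # (B, T, D)
        attn = self.fc(torch.cat([global_main, aux], dim=-1))  # (B, T, D)
        return main * attn, attn  # re-calibrated features and weights

rgb, flow = torch.randn(2, 100, 1024), torch.randn(2, 100, 1024)
ccm_rgb, ccm_flow = CrossModalGate(1024), CrossModalGate(1024)
rgb_out, w_rgb = ccm_rgb(rgb, flow)
flow_out, w_flow = ccm_flow(flow, rgb)
# Mutual learning: each module's weights serve as (detached)
# pseudo-targets for the other's, enforcing consistency.
consistency = ((w_rgb - w_flow.detach()) ** 2).mean() + \
              ((w_flow - w_rgb.detach()) ** 2).mean()
```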